App crash when connecting while a bucket is being created

I think opening a bucket after it is deleted should give ‘unknown bucket’. Just like trying to open a bucket before it is created.

Reading the Couchbase log, I see:

I don’t see any retry there. Just processing of responses for requests that were sent (probably before cluster.close()).

To be fair, I don’t think this is the root of the issue: the bucket could be advertised as ready by the API (which is where waitUntilReady gets its information), but if the client has not yet updated its config map, the operation could still fail, at least in theory. A perfect storm of race conditions would have to happen to get there, but it is possible. Unless I’m wrong :man_shrugging:

In the Java implementation, WaitUntilReady actually issues a KV request to every data node.

Agreed. It is just about the semantics of the error, though; ultimately it will fail for the same, valid reason.

The retries are done client-side, right? I mean it’s not the server that retries opening the bucket, am I wrong about this?

Regardless, once the connection is closed, the SDK should not throw an error. It should do so while the connection close is being awaited, or better, offer the option to either terminate immediately or close gracefully.

Yes and no. In this case, yes. If the bucket has been deleted and will never be re-created, then yes, the outcome is the same. But if it is possible for the bucket to be recreated within the timeout, the outcome would be different: ‘unknown bucket’ would be retried, while 0x89 NO_COLLECTIONS_MANIFEST is not handled and not retried.

Regardless, once the connection is closed, the SDK should not throw an error.

Why not? If one thread is, say, reading a list of escaped prisoners, and another thread closes the cluster, it’s very wrong to assume there are no escaped prisoners.

Also, I don’t see any exception being thrown, just logging messages.

Correct. So the C++ client should handle code 89 and retry.

If I’m not mistaken, this is only the case during tests. Production code only issues an HTTP request for the buckets per se: https://github.com/couchbase/couchbase-jvm-clients/blob/master/core-io/src/main/java/com/couchbase/client/core/diagnostics/WaitUntilReadyHelper.java#L236.

I think this crosses the line between protecting the user and assuming user intent.
In your example, I assume you are talking about streaming results. A non-streaming operation would throw if the connection is closed while it is being processed, right? If not, this is an issue.

Now, while it is wrong to assume there are no escaped prisoners, once the stream is closed, it is the job of the application to check the reason.

If you take the perspective of the closing thread, it is also very wrong to assume that closing the connection does not close the connection.

Because it fails to perform the operation, it ultimately crawls back up to Node.js and throws an error, crashing the app/tests. The crash may come from the fact that the v8 context has since been destroyed.

it is the job of the application to check the reason.

Without an exception or error, how can it know the reason? I actually don’t know offhand what is specced out in the RFC, but if you find that the behavior doesn’t match the RFC, you can open a ticket.

I’ve browsed the RFC and nothing is mentioned about it.

The exception should be thrown at the moment of the connection shutdown, not after.
Let’s recap what’s happening:

  1. The app creates threads T1 and T2.
  2. T1 creates a connection and performs some operations; some are awaited, some are not.
  3. T1 closes the connection and terminates itself (the thread).
  4. One of T1’s non-awaited operations throws an error.
  5. The app crashes.

That looks wrong to me.
Also, the non-awaited operation is performed by the SDK on behalf of the user, who is completely unaware of it AND cannot await that operation anyway.
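To make the sequence concrete, here is a minimal TypeScript sketch of it (the worker-thread framing, names and credentials are made up; the couchbase package’s connect()/close() API is assumed):

// Minimal sketch of the scenario above. T1 is modelled as a worker thread;
// the main thread stands in for "the rest of the app" (T2).
import { Worker, isMainThread } from 'node:worker_threads';
import { connect } from 'couchbase';

async function runT1(): Promise<void> {
  // 2. T1 creates a connection and performs operations.
  const cluster = await connect('couchbase://localhost', {
    username: 'Administrator',
    password: 'password',
  });

  // Accessing the bucket triggers the SDK's internal, non-awaitable openBucket.
  const collection = cluster.bucket('store').defaultCollection();
  await collection.upsert('doc-1', { hello: 'world' }); // an awaited operation

  // 3. T1 closes the connection; the internal openBucket may still be settling.
  await cluster.close();
  // 4./5. If the native layer later reports an error for that internal operation,
  // the callback targets a context that no longer exists and the process dies.
}

if (isMainThread) {
  new Worker(new URL(import.meta.url)); // 1. spawn T1; T2's work continues here
} else {
  void runT1(); // the worker exits once runT1() settles, terminating T1
}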

You keep saying the “App crashes”. I’ve never been able to reproduce that, and I don’t see where you’ve posted a stack trace for it. If Couchbase is going to add code to prevent it, they’re going to need to know where.

T2 is mentioned only once (in step 1) in your scenario. Is it related to the scenario?

If T1 is executing cluster.close(), how can the same T1 also be throwing an exception for an operation? Doesn’t the cluster.close() need to complete? And what’s the difference if the exception is thrown during cluster.close() or after? To the caller of the operation, it’s an exception either way. Also, cluster.close() can have a timeout arg to allow some time for in-progress operations to complete.

There is no app trace except the errors from the Couchbase logs that I’ve already shared with you.

[Screen recording attachment: Screen-Recording-2024-05-03-at-16.49.52]

I mentioned T2 only to illustrate that the app is doing something else that should not be affected.

T1 is closing and waiting for the close to complete. It also terminates itself. So if an error is thrown after the connection is closed and the thread is terminated, it should be swallowed and should not crash the rest of the process, because this is not an exposed operation, so the user cannot await openBucket. Ideally one would have a gracefullyShutdown() method to await pending operations, but this does not exist.

The Node.js SDK close() method cannot receive any options, only a callback.
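The closest user-side approximation I can think of is a wrapper that awaits close() and then leaves a short grace period before the calling thread goes away (assuming close() can be awaited, as in SDK 4.x; the delay value is arbitrary):

import type { Cluster } from 'couchbase';

// Not an SDK feature: a stopgap "graceful close" on the application side.
// Await close(), then give the native layer a moment so in-flight internal
// operations (like openBucket) can settle before the thread terminates.
async function closeGracefully(cluster: Cluster, graceMs = 500): Promise<void> {
  await cluster.close();
  await new Promise((resolve) => setTimeout(resolve, graceMs));
}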

When Node.js starts, a main thread is created and it loads the native module.
When a new thread is created in the Node.js world, a new context becomes available to the native module. The native module should be aware of that and operate accordingly to manage its resources and avoid typical threading errors, like race conditions, within the C++ code.
So when I say "One of T1’s non-awaited operations throws an error", it means "the underlying C++ binding call from the Node.js thread T1, which was not awaited by that thread, throws an error".
The binding should be aware that T1 no longer exists and swallow the exception, because that exception will now crash the main thread and therefore the entire process.
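As a rough illustration of the guard I am describing (plain TypeScript standing in for the native layer; every name here is made up):

// Conceptual sketch only. The binding tracks which JS contexts (threads) are still
// alive, and when an internal operation such as openBucket completes with an error,
// it drops the result silently if the originating context is gone, instead of
// delivering it to the main thread.
type ContextId = number;

const liveContexts = new Set<ContextId>();

function onContextCreated(id: ContextId): void {
  liveContexts.add(id);
}

function onContextDestroyed(id: ContextId): void {
  liveContexts.delete(id);
}

function completeInternalOperation(
  originContext: ContextId,
  deliver: (err?: Error) => void,
  err?: Error,
): void {
  if (!liveContexts.has(originContext)) {
    // The thread that triggered the internal operation no longer exists;
    // delivering the error now would crash the whole process, so swallow it.
    return;
  }
  deliver(err);
}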

I do think that swallowing an error is a bad idea in most scenarios, but given the circumstances, it is the best option in my view. Those circumstances are:

  1. The user is unaware of the openBucket operation.
  2. Even if the user is somehow aware of that operation, they cannot await it.
  3. The user has no way of asking for a graceful shutdown, even one that hides the implementation details.
  4. Node.js is not designed to be resilient to crashes (unlike Java apps, for example, which are designed to survive no matter what), so it blows up hard on the user.
  5. The crash is particularly difficult to diagnose. We both have 20+ years of experience, deep knowledge of Couchbase and its SDK, and access to the C++ code, and it still took us several hours/days to find the culprit. And we still can’t fix it without modifying the SDK code.

I see several solutions (rough signature sketches for options 2 and 3 follow the list):

  1. Swallow the error - nasty, but fast, and with an extremely low likelihood of having consequences given the operation involved.
  2. Make the openBucket operations awaitable using a dedicated method. - acceptable
  3. Expose closeGracefully(), which hides the implementation details and awaits the openBucket operations under the hood. - cleaner, better DX
  4. Change the SDK API so that cluster.bucket() becomes an async method - DX nightmare + breaking change
  5. Restructure the C++ bindings so any operation on a bucket will first await the openBucket operation of the bucket involved - expensive + real value to be determined.
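For illustration, here is what options 2 and 3 could look like from the caller’s side (hypothetical method names; none of this exists in the SDK today):

// Hypothetical API sketch only.
import type { Bucket, Cluster } from 'couchbase';

interface AwaitableBucket extends Bucket {
  // Option 2: expose the internal openBucket as something the caller can await.
  waitUntilOpened(options?: { timeout?: number }): Promise<void>;
}

interface GracefulCluster extends Cluster {
  // Option 3: a close() variant that first awaits pending internal operations
  // (openBucket included) before tearing the connection down.
  closeGracefully(options?: { timeout?: number }): Promise<void>;
}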

Yes. And I stated I don’t know what the handwaving refers to. ‘select bucket failed, bucket probably does not exist’ is not a crash. It means that the bucket does not exist. The 0x89 NO_COLLECTIONS_MANIFEST is not a crash, it means that there was no collections manifest.

it ultimately crawls back up to Node.js and throws an error, crashing the app/tests.

That’s not helping. Maybe you can show me that error. Or that crash.

T1 is closing and waiting for the close to complete.

Is it waiting? I thought it only waits if a timeout is provided. Or by waiting do you just mean the ‘await’?

So if an error is thrown after the connection is closed

What error is thrown? Is it an error on a Couchbase operation? Show me.

it should be swallowed

If this is a Couchbase operation to read or write data, no, it should not be swallowed. Couchbase should never swallow errors that could inform an application that an operation it initiated did not complete successfully. The application always has the option of ignoring the error.

I didn’t go through the rest of your post. Without knowing what this mysterious error is, there’s nothing I can do about it.

swallowing errors based on “extremely low likelihood of having consequences” is not acceptable.

making openBucket awaitable using a dedicated method - can’t the application make such a method? Other SDKs have WaitUntilReady

Expose closeGracefully() - other SDKs have close(timeout)

Restructure the C++ bindings so any operation on a bucket will first await the openBucket operation of the bucket involved. - isn’t that what it does? Well, retries until

I have shown you the crash and the error. The crash happens immediately after the error happens, but for some reason it seems to me that you refuse to accept that the error triggers the crash. I explained it in my previous post: the error is wrongfully sent to the main thread after the calling thread has been terminated, leading to a crash.

I completely agree with your statement. I am exclusively talking about the openBucket operation.

No, the application cannot make such a method. The SDK calls openBucket transparently under the hood when accessing the bucket object. Nothing can be done on the user side.

const result = await cluster.bucket('foo').scope('bar').query('...');
//                                 ^ this is when the SDK calls `openBucket`

I’m very happy for the users of other SDKs. The Node.js SDK does not have such an option.

No, that is not what is done, apparently. The operations are retried regardless of the status of the openBucket call. If they awaited the openBucket operation, the tests which perform an insert would not terminate before openBucket is settled (success or failure).

This is community support and I appreciate the time you spent trying to help me figure this out.

but for some reason it seems to me that you refuse to accept that the error triggers the crash

I accept it - I’m just asking you to show me where it crashes so it can be fixed.

No, the application cannot make such a method. The SDK calls openBucket transparently under the hood when accessing the bucket object. Nothing can be done on the user side.

The application can attempt to get() or exists() on the bucket/collection until it gets a DocumentNotFoundException.
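For example, something along these lines (assuming the Node SDK surfaces that error as DocumentNotFoundError; the probe key and names are placeholders):

import { Cluster, DocumentNotFoundError } from 'couchbase';

// Poll the collection until the KV service answers. A DocumentNotFoundError for a
// key that does not exist means the bucket/collection is reachable and serving
// requests; any other error (timeout, "unknown bucket", ...) means keep trying.
async function waitForCollection(cluster: Cluster, bucketName: string): Promise<void> {
  const collection = cluster.bucket(bucketName).defaultCollection();
  for (;;) {
    try {
      await collection.get('probe-key', { timeout: 1000 });
      return; // the probe key unexpectedly exists: the collection is certainly ready
    } catch (err) {
      if (err instanceof DocumentNotFoundError) {
        return; // KV answered: the collection is ready
      }
      await new Promise((resolve) => setTimeout(resolve, 100)); // not ready yet, retry
    }
  }
}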

the tests which perform an insert would not terminate before openBucket is settled (success or failure).

insert has to adhere to the timeout.

I’m very happy for the users of other SDKs. The Node.js SDK does not have such an option.

I am pointing out that since it is implemented in other SDKs, there is a case for getting it implemented in the Node.js SDK. But if you’d rather give snarky replies instead, we’ll keep it as-is.

It happens when creating a new connection. If you put a breakpoint at line 364 in packages/vitest/src/keyspaceIsolation/KeyspaceIsolationPool.ts, you will see that past that breakpoint, it crashes.

No, because the operation can go through before openBucket succeeds. That’s actually what happens in my tests, allowing the test and its thread to terminate.

It does, but again, the operation can go through before openBucket settles. I’ve tried what you suggested with a timeout of 5s; the test completes in under 3s, yet the next one fails.

You kept pointing out things about other SDKs without bringing up the possibility of having them exposed in the Node.js SDK. Sorry about that. I do appreciate your help.

I’m trying to put together a clean/lean reproduction, but it’s quite hard to reproduce the exact conditions.

I’ve managed to get a core dump. The full trace is available below, but here is the trace of thread #15:

  thread #15
    frame #0: 0x0000000181cf190c libsystem_kernel.dylib`__semwait_signal_nocancel + 8
    frame #1: 0x0000000181be7b98 libsystem_c.dylib`nanosleep$NOCANCEL + 216
    frame #2: 0x0000000181c1329c libsystem_c.dylib`usleep$NOCANCEL + 68
    frame #3: 0x0000000181c37a28 libsystem_c.dylib`abort + 188
    frame #4: 0x00000001055733f4 node`uv_mutex_lock + 28
    frame #5: 0x0000000104905748 node`napi_call_threadsafe_function + 44
    frame #6: 0x00000001314135d0 couchbase-native.node`___lldb_unnamed_symbol3930 + 156
    frame #7: 0x0000000131413fac couchbase-native.node`___lldb_unnamed_symbol3975 + 68
    frame #8: 0x000000013159d480 couchbase-native.node`___lldb_unnamed_symbol8552 + 136
    frame #9: 0x00000001315627ec couchbase-native.node`asio::detail::executor_op<asio::detail::binder0<asio::executor_binder<couchbase::core::bucket_impl::bootstrap(couchbase::core::utils::movable_function<void (std::__1::error_code, couchbase::core::topology::configuration)>&&)::'lambda'(std::__1::error_code, couchbase::core::topology::configuration)::operator()(std::__1::error_code, couchbase::core::topology::configuration)::'lambda0'(), asio::io_context::basic_executor_type<std::__1::allocator<void>, 0ul>>>, std::__1::allocator<void>, asio::detail::scheduler_operation>::do_complete(void*, asio::detail::scheduler_operation*, std::__1::error_code const&, unsigned long) + 260
    frame #10: 0x0000000131516314 couchbase-native.node`___lldb_unnamed_symbol7334 + 676
    frame #11: 0x0000000131515e4c couchbase-native.node`___lldb_unnamed_symbol7332 + 208
    frame #12: 0x00000001315167d0 couchbase-native.node`___lldb_unnamed_symbol7338 + 72
    frame #13: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
Backtrace
(llnode) bt
* thread #1
  * frame #0: 0x0000000181cf0340 libsystem_kernel.dylib`kevent + 8
    frame #1: 0x0000000105577d50 node`uv__io_poll + 776
    frame #2: 0x0000000105565e54 node`uv_run + 476
    frame #3: 0x0000000104861714 node`node::SpinEventLoopInternal(node::Environment*) + 256
    frame #4: 0x0000000104983094 node`node::NodeMainInstance::Run(node::ExitCode*, node::Environment*) + 316
    frame #5: 0x0000000104982da8 node`node::NodeMainInstance::Run() + 124
    frame #6: 0x0000000104901380 node`node::Start(int, char**) + 640
    frame #7: 0x00000001819a20e0 dyld`start + 2360
(llnode) bt all
* thread #1
  * frame #0: 0x0000000181cf0340 libsystem_kernel.dylib`kevent + 8
    frame #1: 0x0000000105577d50 node`uv__io_poll + 776
    frame #2: 0x0000000105565e54 node`uv_run + 476
    frame #3: 0x0000000104861714 node`node::SpinEventLoopInternal(node::Environment*) + 256
    frame #4: 0x0000000104983094 node`node::NodeMainInstance::Run(node::ExitCode*, node::Environment*) + 316
    frame #5: 0x0000000104982da8 node`node::NodeMainInstance::Run() + 124
    frame #6: 0x0000000104901380 node`node::Start(int, char**) + 640
    frame #7: 0x00000001819a20e0 dyld`start + 2360
  thread #2
    frame #0: 0x0000000181cf0340 libsystem_kernel.dylib`kevent + 8
    frame #1: 0x0000000105577d50 node`uv__io_poll + 776
    frame #2: 0x0000000105565e54 node`uv_run + 476
    frame #3: 0x00000001049a9b3c node`node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Run() + 336
    frame #4: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #3
    frame #0: 0x0000000181ced9ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d2b55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001055737a4 node`uv_cond_wait + 40
    frame #3: 0x00000001049a9d1c node`node::TaskQueue<v8::Task>::BlockingPop() + 60
    frame #4: 0x00000001049a75d0 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 356
    frame #5: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #4
    frame #0: 0x0000000181ced9ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d2b55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001055737a4 node`uv_cond_wait + 40
    frame #3: 0x00000001049a9d1c node`node::TaskQueue<v8::Task>::BlockingPop() + 60
    frame #4: 0x00000001049a75d0 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 356
    frame #5: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #5
    frame #0: 0x00000001051e2684 node`v8::internal::maglev::MaglevGraphBuilder::GetSmiConstant(int) + 48
    frame #1: 0x00000001051e292c node`v8::internal::maglev::MaglevGraphBuilder::VisitLdaSmi() + 40
    frame #2: 0x00000001051b260c node`v8::internal::maglev::MaglevGraphBuilder::VisitSingleBytecode() + 1100
    frame #3: 0x00000001051b1e28 node`v8::internal::maglev::MaglevGraphBuilder::BuildBody() + 260
    frame #4: 0x00000001051b00ac node`v8::internal::maglev::MaglevGraphBuilder::Build() + 532
    frame #5: 0x00000001051af6e0 node`v8::internal::maglev::MaglevCompiler::Compile(v8::internal::LocalIsolate*, v8::internal::maglev::MaglevCompilationInfo*) + 864
    frame #6: 0x00000001051d94d0 node`v8::internal::maglev::MaglevCompilationJob::ExecuteJobImpl(v8::internal::RuntimeCallStats*, v8::internal::LocalIsolate*) + 108
    frame #7: 0x0000000104b939e0 node`v8::internal::OptimizedCompilationJob::ExecuteJob(v8::internal::RuntimeCallStats*, v8::internal::LocalIsolate*) + 60
    frame #8: 0x00000001051dacc4 node`v8::internal::maglev::MaglevConcurrentDispatcher::JobTask::Run(v8::JobDelegate*) + 912
    frame #9: 0x00000001059a345c node`v8::platform::DefaultJobWorker::Run() + 216
    frame #10: 0x00000001049a75e4 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 376
    frame #11: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #6
    frame #0: 0x0000000181ced9ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d2b55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001055737a4 node`uv_cond_wait + 40
    frame #3: 0x00000001049a9d1c node`node::TaskQueue<v8::Task>::BlockingPop() + 60
    frame #4: 0x00000001049a75d0 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 356
    frame #5: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #7
    frame #0: 0x0000000181cea170 libsystem_kernel.dylib`semaphore_wait_trap + 8
    frame #1: 0x0000000105573620 node`uv_sem_wait + 24
    frame #2: 0x0000000104a2ea68 node`node::inspector::(anonymous namespace)::StartIoThreadMain(void*) + 32
    frame #3: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #8
    frame #0: 0x0000000181ced9ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d2b55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001055737a4 node`uv_cond_wait + 40
    frame #3: 0x0000000105562204 node`worker + 368
    frame #4: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #9
    frame #0: 0x0000000181ced9ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d2b55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001055737a4 node`uv_cond_wait + 40
    frame #3: 0x0000000105562204 node`worker + 368
    frame #4: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #10
    frame #0: 0x0000000181ced9ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d2b55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001055737a4 node`uv_cond_wait + 40
    frame #3: 0x0000000105562204 node`worker + 368
    frame #4: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #11
    frame #0: 0x0000000181ced9ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d2b55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001055737a4 node`uv_cond_wait + 40
    frame #3: 0x0000000105562204 node`worker + 368
    frame #4: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #12
    frame #0: 0x0000000181cebea4 libsystem_kernel.dylib`__workq_kernreturn + 8
  thread #13
    frame #0: 0x0000000181cebea4 libsystem_kernel.dylib`__workq_kernreturn + 8
  thread #14
    frame #0: 0x0000000181cea1f4 libsystem_kernel.dylib`mach_msg2_trap + 8
    frame #1: 0x0000000181cfcb24 libsystem_kernel.dylib`mach_msg2_internal + 80
    frame #2: 0x0000000181cf2e34 libsystem_kernel.dylib`mach_msg_overwrite + 476
    frame #3: 0x0000000181cea578 libsystem_kernel.dylib`mach_msg + 24
    frame #4: 0x0000000181e0a058 CoreFoundation`__CFRunLoopServiceMachPort + 160
    frame #5: 0x0000000181e0891c CoreFoundation`__CFRunLoopRun + 1208
    frame #6: 0x0000000181e07e0c CoreFoundation`CFRunLoopRunSpecific + 608
    frame #7: 0x0000000181e85e3c CoreFoundation`CFRunLoopRun + 64
    frame #8: 0x000000010b47caa0 fsevents.node`fse_run_loop + 116
    frame #9: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #15
    frame #0: 0x0000000181cf190c libsystem_kernel.dylib`__semwait_signal_nocancel + 8
    frame #1: 0x0000000181be7b98 libsystem_c.dylib`nanosleep$NOCANCEL + 216
    frame #2: 0x0000000181c1329c libsystem_c.dylib`usleep$NOCANCEL + 68
    frame #3: 0x0000000181c37a28 libsystem_c.dylib`abort + 188
    frame #4: 0x00000001055733f4 node`uv_mutex_lock + 28
    frame #5: 0x0000000104905748 node`napi_call_threadsafe_function + 44
    frame #6: 0x00000001314135d0 couchbase-native.node`___lldb_unnamed_symbol3930 + 156
    frame #7: 0x0000000131413fac couchbase-native.node`___lldb_unnamed_symbol3975 + 68
    frame #8: 0x000000013159d480 couchbase-native.node`___lldb_unnamed_symbol8552 + 136
    frame #9: 0x00000001315627ec couchbase-native.node`asio::detail::executor_op<asio::detail::binder0<asio::executor_binder<couchbase::core::bucket_impl::bootstrap(couchbase::core::utils::movable_function<void (std::__1::error_code, couchbase::core::topology::configuration)>&&)::'lambda'(std::__1::error_code, couchbase::core::topology::configuration)::operator()(std::__1::error_code, couchbase::core::topology::configuration)::'lambda0'(), asio::io_context::basic_executor_type<std::__1::allocator<void>, 0ul>>>, std::__1::allocator<void>, asio::detail::scheduler_operation>::do_complete(void*, asio::detail::scheduler_operation*, std::__1::error_code const&, unsigned long) + 260
    frame #10: 0x0000000131516314 couchbase-native.node`___lldb_unnamed_symbol7334 + 676
    frame #11: 0x0000000131515e4c couchbase-native.node`___lldb_unnamed_symbol7332 + 208
    frame #12: 0x00000001315167d0 couchbase-native.node`___lldb_unnamed_symbol7338 + 72
    frame #13: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #16
    frame #0: 0x0000000181ced9ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d2b55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000131516188 couchbase-native.node`___lldb_unnamed_symbol7334 + 280
    frame #3: 0x0000000131515e4c couchbase-native.node`___lldb_unnamed_symbol7332 + 208
    frame #4: 0x0000000131515d6c couchbase-native.node`___lldb_unnamed_symbol7331 + 44
    frame #5: 0x0000000131515d04 couchbase-native.node`asio_detail_posix_thread_function + 28
    frame #6: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #17
    frame #0: 0x0000000181cf0340 libsystem_kernel.dylib`kevent + 8
    frame #1: 0x0000000131514ea8 couchbase-native.node`___lldb_unnamed_symbol7316 + 280
    frame #2: 0x00000001315161fc couchbase-native.node`___lldb_unnamed_symbol7334 + 396
    frame #3: 0x0000000131515e4c couchbase-native.node`___lldb_unnamed_symbol7332 + 208
    frame #4: 0x00000001315167d0 couchbase-native.node`___lldb_unnamed_symbol7338 + 72
    frame #5: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #18
    frame #0: 0x0000000181ced9ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d2b55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000131516188 couchbase-native.node`___lldb_unnamed_symbol7334 + 280
    frame #3: 0x0000000131515e4c couchbase-native.node`___lldb_unnamed_symbol7332 + 208
    frame #4: 0x0000000131515d6c couchbase-native.node`___lldb_unnamed_symbol7331 + 44
    frame #5: 0x0000000131515d04 couchbase-native.node`asio_detail_posix_thread_function + 28
    frame #6: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #19
    frame #0: 0x0000000181cebfac libsystem_kernel.dylib`__ulock_wait + 8
    frame #1: 0x0000000181d2d48c libsystem_pthread.dylib`_pthread_join + 608
    frame #2: 0x0000000181c62aa8 libc++.1.dylib`std::__1::thread::join() + 36
    frame #3: 0x0000000131516be8 couchbase-native.node`___lldb_unnamed_symbol7348 + 56
    frame #4: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #20
    frame #0: 0x0000000181cebfac libsystem_kernel.dylib`__ulock_wait + 8
    frame #1: 0x0000000181d2d48c libsystem_pthread.dylib`_pthread_join + 608
    frame #2: 0x0000000181c62aa8 libc++.1.dylib`std::__1::thread::join() + 36
    frame #3: 0x0000000131516be8 couchbase-native.node`___lldb_unnamed_symbol7348 + 56
    frame #4: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136
  thread #21
    frame #0: 0x0000000181ceac50 libsystem_kernel.dylib`__open + 8
    frame #1: 0x0000000181cf5de8 libsystem_kernel.dylib`open + 64
    frame #2: 0x00000001055684d8 node`uv__fs_work + 248
    frame #3: 0x000000010556a5c8 node`uv_fs_open + 208
    frame #4: 0x0000000104a29f78 node`node::ReadFileSync(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, char const*) + 88
    frame #5: 0x000000010498d6c4 node`node::modules::BindingData::GetPackageJSON(node::Realm*, std::__1::basic_string_view<char, std::__1::char_traits<char>>, node::modules::BindingData::ErrorContext*) + 404
    frame #6: 0x000000010498fb50 node`node::modules::BindingData::TraverseParent(node::Realm*, std::__1::basic_string_view<char, std::__1::char_traits<char>>) + 796
    frame #7: 0x000000010498fce0 node`node::modules::BindingData::GetNearestParentPackageJSON(v8::FunctionCallbackInfo<v8::Value> const&) + 256
    frame #8: 0x000000010558b118 node`Builtins_CallApiCallbackGeneric + 184
    frame #9: 0x00000008000556a0
    frame #10: 0x0000000800062e0c
    frame #11: 0x0000000800062aa0
    frame #12: 0x000000080005fa3c
    frame #13: 0x0000000800059e2c
    frame #14: 0x000000080005a03c
    frame #15: 0x0000000105588ef0 node`Builtins_InterpreterEntryTrampoline + 272
    frame #16: 0x0000000105588ef0 node`Builtins_InterpreterEntryTrampoline + 272
    frame #17: 0x0000000105588ef0 node`Builtins_InterpreterEntryTrampoline + 272
    frame #18: 0x0000000105588ef0 node`Builtins_InterpreterEntryTrampoline + 272
    frame #19: 0x0000000105588ef0 node`Builtins_InterpreterEntryTrampoline + 272
    frame #20: 0x0000000105588ef0 node`Builtins_InterpreterEntryTrampoline + 272
    frame #21: 0x0000000105588ef0 node`Builtins_InterpreterEntryTrampoline + 272
    frame #22: 0x0000000105586c0c node`Builtins_JSEntryTrampoline + 172
    frame #23: 0x00000001055868f4 node`Builtins_JSEntry + 148
    frame #24: 0x0000000104c2d140 node`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) + 2648
    frame #25: 0x0000000104c2c6b8 node`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) + 124
    frame #26: 0x0000000104b13680 node`v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) + 536
    frame #27: 0x00000001048f8f58 node`node::loader::ModuleWrap::SyntheticModuleEvaluationStepsCallback(v8::Local<v8::Context>, v8::Local<v8::Module>) + 392
    frame #28: 0x0000000104fc7834 node`v8::internal::SyntheticModule::Evaluate(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SyntheticModule>) + 112
    frame #29: 0x0000000104f70c38 node`v8::internal::Module::Evaluate(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Module>) + 264
    frame #30: 0x0000000104fb378c node`v8::internal::SourceTextModule::InnerModuleEvaluation(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SourceTextModule>, v8::internal::ZoneForwardList<v8::internal::Handle<v8::internal::SourceTextModule>>*, unsigned int*) + 584
    frame #31: 0x0000000104fb370c node`v8::internal::SourceTextModule::InnerModuleEvaluation(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SourceTextModule>, v8::internal::ZoneForwardList<v8::internal::Handle<v8::internal::SourceTextModule>>*, unsigned int*) + 456
    frame #32: 0x0000000104fb370c node`v8::internal::SourceTextModule::InnerModuleEvaluation(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SourceTextModule>, v8::internal::ZoneForwardList<v8::internal::Handle<v8::internal::SourceTextModule>>*, unsigned int*) + 456
    frame #33: 0x0000000104fb370c node`v8::internal::SourceTextModule::InnerModuleEvaluation(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SourceTextModule>, v8::internal::ZoneForwardList<v8::internal::Handle<v8::internal::SourceTextModule>>*, unsigned int*) + 456
    frame #34: 0x0000000104fb3418 node`v8::internal::SourceTextModule::Evaluate(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SourceTextModule>) + 240
    frame #35: 0x0000000104f70c30 node`v8::internal::Module::Evaluate(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Module>) + 256
    frame #36: 0x0000000104b03bb4 node`v8::Module::Evaluate(v8::Local<v8::Context>) + 644
    frame #37: 0x00000001048fac14 node`node::loader::ModuleWrap::Evaluate(v8::FunctionCallbackInfo<v8::Value> const&) + 984
    frame #38: 0x000000010558b118 node`Builtins_CallApiCallbackGeneric + 184
    frame #39: 0x0000000105588ef0 node`Builtins_InterpreterEntryTrampoline + 272
    frame #40: 0x00000001055c5410 node`Builtins_AsyncFunctionAwaitResolveClosure + 80
    frame #41: 0x0000000105690578 node`Builtins_PromiseFulfillReactionJob + 56
    frame #42: 0x00000001055b5714 node`Builtins_RunMicrotasks + 564
    frame #43: 0x0000000105586af4 node`Builtins_JSRunMicrotasksEntry + 148
    frame #44: 0x0000000104c2d118 node`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) + 2608
    frame #45: 0x0000000104c2d5a8 node`v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) + 88
    frame #46: 0x0000000104c2d6e4 node`v8::internal::Execution::TryRunMicrotasks(v8::internal::Isolate*, v8::internal::MicrotaskQueue*) + 60
    frame #47: 0x0000000104c577a4 node`v8::internal::MicrotaskQueue::RunMicrotasks(v8::internal::Isolate*) + 356
    frame #48: 0x0000000104c57f44 node`v8::internal::MicrotaskQueue::PerformCheckpoint(v8::Isolate*) + 112
    frame #49: 0x0000000104860c4c node`node::InternalCallbackScope::Close() + 252
    frame #50: 0x00000001048607bc node`node::InternalCallbackScope::~InternalCallbackScope() + 20
    frame #51: 0x00000001049479cc node`node::fs::FileHandle::CloseReq::Resolve() + 184
    frame #52: 0x00000001049600c4 node`node::fs::FileHandle::ClosePromise()::$_0::__invoke(uv_fs_s*) + 552
    frame #53: 0x0000000104933478 node`node::MakeLibuvRequestCallback<uv_fs_s, void (*)(uv_fs_s*)>::Wrapper(uv_fs_s*) + 116
    frame #54: 0x0000000105561dbc node`uv__work_done + 184
    frame #55: 0x0000000105565890 node`uv__async_io + 268
    frame #56: 0x0000000105577e54 node`uv__io_poll + 1036
    frame #57: 0x0000000105565e54 node`uv_run + 476
    frame #58: 0x0000000104861714 node`node::SpinEventLoopInternal(node::Environment*) + 256
    frame #59: 0x00000001049fdf84 node`node::worker::Worker::Run() + 2284
    frame #60: 0x0000000104a011f4 node`node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::$_3::__invoke(void*) + 56
    frame #61: 0x0000000181d2af94 libsystem_pthread.dylib`_pthread_start + 136

Line 364 is a comment.
And the comment says that if I comment out the sleep, it will crash. So I comment out the call with the sleep, uncomment the call without the sleep, and add a console.log after.

364       /**
365        * Remove the sleep() and the tests crashes.
366        * If you change the value for 200, it also crashes.
367        */
368 
369       //this.clusterPromise = sleep(500).then(() =>
370       //  connect(params.connectionString, params.credentials)
371       //);
372 
373       this.clusterPromise = connect(params.connectionString, params.credentials);
374       
375       console.log('this.clusterPromise : '+this.clusterPromise);

Then I build it with ‘npm run build’
and run it with ./node_modules/.bin/vitest kv.spec.ts --project=project:keyspace-isolation --run --disable-console-intercept --reporter=basic

And it does not crash there. Instead it logs this.clusterPromise.

this.clusterPromise : [object Promise]

In fact, the ‘should isolate an insert’ test passes (it inserts a doc in cbjs_store_xxx,cbjs_library_xxx.cbjs_books_xxx and reads it back).

Then, when running ‘should isolate a get’, the test fails with “document not found”. I don’t know why. At that time, the document is in cbjs_store_xxx,cbjs_library_xxx.cbjs_books_xxx.

❯ |project:keyspace-isolation| tests/kv.spec.ts (2 tests | 1 failed) 6944ms
❯ tests/kv.spec.ts > kv > should isolate a get
→ document not found

UNTIL. As in a loop.

It never crashes when you run a single file, because the issue comes from the context switching of threads.
If you run multiple files but set isolate: false in vitest.config.ts at the root, it doesn’t crash either.
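For reference, that is the standard Vitest option, set in the root config:

// vitest.config.ts at the repository root
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // Run all test files in the same worker instead of isolating each file;
    // with this, the context switching that triggers the crash never happens.
    isolate: false,
  },
});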

Isn’t that just waiting on a semaphore?

thread #15
frame #0: 0x0000000181cf190c libsystem_kernel.dylib`__semwait_signal_nocancel + 8

The rest look like regular threads doing regular thread things.

Even running multiple files, it doesn’t crash. Two tests fail, that’s all. I had to increase the timeout for kv.spec.ts to 20 seconds, as 5 seconds was not enough for it to complete.

./node_modules/.bin/vitest  --project=project:keyspace-isolation --run --disable-console-intercept --reporter=basic
.
.
.
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯[2/2]⎯
 Test Files  2 failed | 3 passed (5)
      Tests  2 failed | 15 passed | 1 skipped (18)
   Start at  17:02:42
   Duration  28.24s (transform 1.53s, setup 1.43s, collect 39ms, tests 21.80s, environment 1ms, prepare 2.68s)

The two tests that did fail, failed with unambiguous timeout errors (ie. they sent the request but never received a response within the timeout). I didn’t investigate why.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.