Random new-connection problems from libcouchbase to server
Running the simplest of tests, I find that couchbase-libcouchbase-1.1.0dp7-26-g4f9f40c only intermittently manages to establish a connection with the server. Running the simple app below repeatedly produces something like:
[get_callback: CONNECT_ERROR (23)
Failed to libcouchbase_mget, callback given CONNECT_ERROR (23)]
[get_callback: CONNECT_ERROR (23)
Failed to libcouchbase_mget, callback given CONNECT_ERROR (23)]
[get_callback: SUCCESS (0)
Got it! It already existed.]
[get_callback: CONNECT_ERROR (23)
Failed to libcouchbase_mget, callback given CONNECT_ERROR (23)]
If I ignore these errors, the later attempt to set the value fails with ETMPFAIL.
The example simply tests whether a value already exists for a key and, if the key is not known, sets it with default content.
Any help appreciated.
Stephen Minifie
Couchbase Server 1.8.0, Windows XP SP3, Visual Studio 2005 SP2
#include <stdio.h>
#include <string.h>
#include <tchar.h>
#include <libcouchbase/couchbase.h>

/* Error names, indexed by libcouchbase_error_t value. */
static const char* errors[] = {
    "SUCCESS", "AUTH_CONTINUE", "AUTH_ERROR", "DELTA_BADVAL", "E2BIG",
    "EBUSY", "EINTERNAL", "EINVAL", "ENOMEM", "ERANGE", "ERROR", "ETMPFAIL",
    "KEY_EEXISTS", "KEY_ENOENT", "LIBEVENT_ERROR", "NETWORK_ERROR",
    "NOT_MY_VBUCKET", "NOT_STORED", "NOT_SUPPORTED", "UNKNOWN_COMMAND",
    "UNKNOWN_HOST", "PROTOCOL_ERROR", "ETIMEDOUT", "CONNECT_ERROR",
    "BUCKET_ENOENT"
};

static void error_callback(libcouchbase_t instance,
                           libcouchbase_error_t error,
                           const char *errinfo)
{
    fprintf(stderr, "error_callback: %s (%d)\r\n", errors[error], error);
}

/* Reports the result of the get and passes it back through the cookie. */
static void get_callback(libcouchbase_t instance, const void *cookie,
                         libcouchbase_error_t error,
                         const void *key, libcouchbase_size_t nkey,
                         const void *bytes, libcouchbase_size_t nbytes,
                         libcouchbase_uint32_t flags, libcouchbase_cas_t cas)
{
    fprintf(stdout, "get_callback: %s (%d)\r\n", errors[error], error);
    *(libcouchbase_error_t*)cookie = error;
}

/* Reports the result of the store and passes it back through the cookie. */
static void set_callback(libcouchbase_t instance, const void *cookie,
                         libcouchbase_storage_t operation,
                         libcouchbase_error_t error,
                         const void *key, libcouchbase_size_t nkey,
                         libcouchbase_cas_t cas)
{
    fprintf(stdout, "set_callback: %s (%d)\r\n", errors[error], error);
    *(libcouchbase_error_t*)cookie = error;
}

int _tmain(int argc, _TCHAR* argv[])
{
    libcouchbase_t instance = libcouchbase_create("localhost:8091", "Administrator", "password", NULL, NULL);
    if(!instance) {
        fprintf(stderr, "Failed to create libcouchbase instance\r\n");
        return -1;
    }
    libcouchbase_set_error_callback(instance, error_callback);

    /* Connect and wait for the connection to complete. */
    libcouchbase_error_t error = libcouchbase_connect(instance);
    if(!error) {
        libcouchbase_wait(instance);
        error = libcouchbase_get_last_error(instance);
    }
    if(error) {
        fprintf(stderr, "Failed to connect libcouchbase instance to server %s (%d)\r\n", errors[error], error);
        libcouchbase_destroy(instance);
        return -1;
    }
    libcouchbase_set_get_callback(instance, get_callback);
    libcouchbase_set_storage_callback(instance, set_callback);

    /* Look the key up; the callback writes its result into callbackError. */
    libcouchbase_error_t callbackError = LIBCOUCHBASE_SUCCESS;
    const char* key = "Hello";
    libcouchbase_size_t nkey = static_cast<libcouchbase_size_t>(strlen(key));
    error = libcouchbase_mget(instance, &callbackError, 1, (const void *const *)&key, &nkey, NULL);
    if(!error) {
        libcouchbase_wait(instance);
        error = libcouchbase_get_last_error(instance);
    }
    if(error) {
        fprintf(stderr, "Failed to libcouchbase_mget, returned %s (%d)\r\n", errors[error], error);
        return -1;
    }
    if(!callbackError) {
        fprintf(stdout, "Got it! It already existed.\r\n");
        return 0;
    }
    if(callbackError != LIBCOUCHBASE_KEY_ENOENT) {
        fprintf(stderr, "Failed to libcouchbase_mget, callback given %s (%d)\r\n", errors[callbackError], callbackError);
        return -1;
    }

    /* The key does not exist yet, so add it with default content. */
    callbackError = LIBCOUCHBASE_SUCCESS;
    error = libcouchbase_store(instance, &callbackError, LIBCOUCHBASE_ADD, key, nkey, "World", static_cast<libcouchbase_size_t>(strlen("World")), 0, 0, 0);
    if(!error) {
        libcouchbase_wait(instance);
        error = libcouchbase_get_last_error(instance);
    }
    if(error) {
        fprintf(stderr, "Failed to libcouchbase_store, returned %s (%d)\r\n", errors[error], error);
        return -1;
    }
    if(callbackError) {
        fprintf(stderr, "Failed to libcouchbase_store, callback given %s (%d)\r\n", errors[callbackError], callbackError);
        return -1;
    }
    fprintf(stdout, "Set it! It didn't already exist.\r\n");
    libcouchbase_destroy(instance);
    return 0;
}
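One caveat about the test app: it indexes the errors[] table directly with whatever code the library hands back, so a code beyond the 25 listed names would read past the end of the array. A minimal sketch of a bounds-checked lookup follows. The name error_name() and the "UNKNOWN" fallback are my own choices for illustration, not part of libcouchbase.

```c
#include <stddef.h>

/* Same 25 names as the test app, indexed by error code. */
static const char *const error_names[] = {
    "SUCCESS", "AUTH_CONTINUE", "AUTH_ERROR", "DELTA_BADVAL", "E2BIG",
    "EBUSY", "EINTERNAL", "EINVAL", "ENOMEM", "ERANGE", "ERROR", "ETMPFAIL",
    "KEY_EEXISTS", "KEY_ENOENT", "LIBEVENT_ERROR", "NETWORK_ERROR",
    "NOT_MY_VBUCKET", "NOT_STORED", "NOT_SUPPORTED", "UNKNOWN_COMMAND",
    "UNKNOWN_HOST", "PROTOCOL_ERROR", "ETIMEDOUT", "CONNECT_ERROR",
    "BUCKET_ENOENT"
};

/* Hypothetical helper: returns the name for a code, or "UNKNOWN" when the
   code falls outside the table instead of reading out of bounds. */
static const char *error_name(int code)
{
    const size_t count = sizeof(error_names) / sizeof(error_names[0]);

    if (code < 0 || (size_t)code >= count) {
        return "UNKNOWN";
    }
    return error_names[code];
}
```

With this in place, the callbacks could print error_name(error) rather than errors[error].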
It seems no-one is using the Windows libcouchbase client. The connect() retry fix worked for a single node, but the client hangs in select() when connecting to a cluster. I narrowed this second issue down to the linked list of socket events, which has bugs in both adding and removing items: link_event overwrites the head's next pointer, so earlier entries are lost, and unlink_event never advances prev while walking the list. With more than 2 entries the list malfunctions. See these corrections:
static void link_event(struct winsock_io_cookie *instance,
                       struct winsock_event *event)
{
    /* Previously:
    if (instance->events == NULL) {
        instance->events = event;
    } else {
        instance->events->next = event;
        event->next = NULL;
    }*/
    if (instance->events == NULL) {
        instance->events = event;
        event->next = NULL;
    } else {
        event->next = instance->events;
        instance->events = event;
    }
}

static void unlink_event(struct winsock_io_cookie *instance,
                         struct winsock_event *event)
{
    if (instance->events == event) {
        instance->events = event->next;
    } else {
        struct winsock_event *prev = instance->events;
        struct winsock_event *next;
        for (next = prev->next; next != NULL; next = next->next) {
            if (event == next) {
                prev->next = next->next;
                return;
            }
            prev = next; /* added */
        }
    }
}
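For anyone who wants to check the list logic outside the library, here is a minimal standalone sketch of the same head-insert and unlink behaviour. The types event_node and event_list are hypothetical stand-ins for winsock_event and winsock_io_cookie, and the unlink uses a pointer-to-pointer walk rather than the prev/next pair in the correction above, which also makes it safe on an empty list.

```c
#include <stddef.h>

/* Hypothetical stand-ins for winsock_event / winsock_io_cookie. */
struct event_node {
    int id;
    struct event_node *next;
};

struct event_list {
    struct event_node *head;
};

/* Insert at the head; works for an empty list as well. */
static void list_link(struct event_list *list, struct event_node *node)
{
    node->next = list->head;
    list->head = node;
}

/* Remove node from the list. Returns 1 if it was found, 0 otherwise. */
static int list_unlink(struct event_list *list, struct event_node *node)
{
    struct event_node **cursor;

    /* cursor always points at the link that leads to the current node,
       so head and interior removals are handled by the same assignment. */
    for (cursor = &list->head; *cursor != NULL; cursor = &(*cursor)->next) {
        if (*cursor == node) {
            *cursor = node->next;
            node->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Count entries, handy for checking that nothing was dropped. */
static size_t list_length(const struct event_list *list)
{
    size_t count = 0;
    const struct event_node *it;

    for (it = list->head; it != NULL; it = it->next) {
        count++;
    }
    return count;
}
```

Linking three or more nodes and then unlinking the oldest one exercises exactly the case where the original code lost entries.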
I guess not many are using libcouchbase with Windows, so I wanted to provide a solution having debugged the problem myself.
Essentially, the problem is that the second connect attempt (the first having returned WSAEWOULDBLOCK) can occasionally return WSAEINVAL on Windows, instead of the assumed WSAEISCONN.
http://www.sockets.com/err_lst1.htm says: "With datastream sockets, don't call connect() more than once (use select() or WSAAsyncSelect() to detect connection completion)."
A simple retry fixes the asynchronous confusion. Here is a suggested replacement connect function:
static int libcouchbase_io_connect(struct libcouchbase_io_opt_st *iops,
                                   libcouchbase_socket_t sock,
                                   const struct sockaddr *name,
                                   unsigned int namelen)
{
    int ret;
    unsigned retries;

    /* Retry up to 3 times while the error is the one getError() reports
       as EINVAL; any other outcome ends the loop immediately. */
    for (retries = 3; retries; retries--) {
        ret = WSAConnect(sock, name, namelen, NULL, NULL, NULL, NULL);
        if (ret != SOCKET_ERROR)
            break;
        iops->error = getError();
        if (iops->error != EINVAL)
            break;
        Sleep(1);
    }
    return ret;
}

I would also like to voice my distaste for the use of stdout, and particularly abort(), in this library. A library does not know the context in which it is used, and killing my process just because it hit an error the author did not recognise is shocking (see plugin-win32.c, function "getError()").
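As a sketch of what I would prefer instead, here is an error-mapping function that hands unrecognised codes back to the caller rather than aborting. This is not the actual getError() from plugin-win32.c. The function name, the EIO fallback and the recognised flag are assumptions of mine, and the Winsock codes are copied as numeric constants only so the sketch compiles without <winsock2.h>.

```c
#include <errno.h>
#include <stddef.h>

/* Winsock error values, copied numerically for this sketch. */
enum {
    SKETCH_WSAEINVAL      = 10022,
    SKETCH_WSAEWOULDBLOCK = 10035,
    SKETCH_WSAEINPROGRESS = 10036,
    SKETCH_WSAEALREADY    = 10037,
    SKETCH_WSAEISCONN     = 10056
};

/* Hypothetical replacement for an aborting error mapper.
   Returns an errno-style code. If recognised is non-NULL it is set to 1
   for known codes and 0 for unknown ones, so the caller decides what to
   do instead of the library killing the process. */
static int map_winsock_error(int wsa_error, int *recognised)
{
    if (recognised != NULL) {
        *recognised = 1;
    }
    switch (wsa_error) {
    case SKETCH_WSAEINVAL:
        return EINVAL;
    case SKETCH_WSAEWOULDBLOCK:
        return EWOULDBLOCK;
    case SKETCH_WSAEINPROGRESS:
        return EINPROGRESS;
    case SKETCH_WSAEALREADY:
        return EALREADY;
    case SKETCH_WSAEISCONN:
        return EISCONN;
    default:
        /* Unknown code: report a generic I/O error, never abort(). */
        if (recognised != NULL) {
            *recognised = 0;
        }
        return EIO;
    }
}
```

A caller that cares can log the raw Winsock code when recognised comes back 0, through whatever logging hook the application provides rather than stdout.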