Get operation timeout 0x17

I am actually quite excited about deploying Couchbase and am running some tests on a Couchbase cluster. However, I am now stuck and wonder whether anyone here has encountered a similar problem.

I have written a simple client application using the C SDK 2.0 and put it on two identical CentOS client machines. Then I installed Couchbase Enterprise Edition v2.1.1 on another CentOS machine.
I followed the example program closely, as follows:

    lcb_error_t lcb_err;
    lcb_t instance;
    struct lcb_create_st options;

    /* Zero the options struct that is actually passed to lcb_create(). */
    memset(&options, 0, sizeof(options));
    options.version = 1;
    options.v.v1.host = "cb1:8091";
    options.v.v1.user = "mybucket";
    options.v.v1.bucket = "mybucket";
    options.v.v1.passwd = "password";
    options.v.v1.type = LCB_TYPE_BUCKET;

    lcb_err = lcb_create(&instance, &options);
    if (lcb_err != LCB_SUCCESS) {
        fprintf(stderr, "Failed to create libcouchbase instance: %s\n",
            lcb_strerror(NULL, lcb_err));
        return LCB_ERROR;
    }

    (void)lcb_set_error_callback(instance, error_callback);
    if ((lcb_err = lcb_connect(instance)) != LCB_SUCCESS) {
        fprintf(stderr, "Failed to initiate connection: %s\n",
            lcb_strerror(NULL, lcb_err));
        lcb_destroy(instance);
        return LCB_ERROR;
    }
    /* Block until the connection attempt has completed. */
    lcb_wait(instance);
    (void)lcb_set_store_callback(instance, store_callback);
    (void)lcb_set_get_callback(instance, get_callback);

    {
        data_rec rec;   /* passed to get_callback as the cookie */
        lcb_get_cmd_t cmd;
        const lcb_get_cmd_t *commands[1];
        commands[0] = &cmd;
        memset(&cmd, 0, sizeof(cmd));
        cmd.v.v0.key = "key001";
        cmd.v.v0.nkey = 6;
        lcb_err = lcb_get(instance, &rec, 1, commands);
        if (lcb_err != LCB_SUCCESS) {
            fprintf(stderr, "Failed to get: %s\n", lcb_strerror(NULL, lcb_err));
            return LCB_ERROR;
        }
        /* Block until get_callback has been invoked. */
        lcb_wait(instance);
    }

My get callback function looks like this:

    data_rec *rec = (data_rec *)cookie;
    if (error == LCB_SUCCESS) {
        rec->nbytes = resp->v.v0.nbytes;
        memcpy(rec->buf, resp->v.v0.bytes, rec->nbytes);
        rec->cas = resp->v.v0.cas;
    } else {
        fprintf(stderr, "GET ERROR: %s (0x%x)\n",
                lcb_strerror(instance, error), error);
        rec->nbytes = 0;
    }

Both clients were able to establish a connection to the Couchbase server, but client A hit a get timeout while client B had no problem getting the right data. I tried many things with no success. Then I set up another Couchbase server on a similar machine. This time, both clients A and B hit the get timeout! In desperation, I set up yet another Couchbase server on a third machine. This time, both clients were able to read the same data successfully.

I found that whenever the get operation succeeded, there were established connections between the client and Couchbase on ports 8091 and 11210. Whenever the get operation timed out, there was only an established connection on port 8091. There was no firewall, and I tested connections to both ports using telnet with no problems.

What is going on? Any help will be much appreciated.

Thanks.

1 Answer

After two days, I finally figured out the problem and the solution: the client program must be able to resolve the Couchbase node names. When I installed the third Couchbase server, I used 127.0.0.1 as the node name, which did not cause any problem. On the other two, however, I used the machine's hostname. The client learns the node names from the cluster over port 8091 and then connects to each node by that name on port 11210, so every client machine needs to be able to resolve these hostnames (through DNS or /etc/hosts, for example).
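
A quick way to check this from a client machine is to ask the system resolver for each node name. The following is only a minimal sketch and was not part of my test program: `node_resolves` is a hypothetical helper name, and it assumes you pass in the node names exactly as the cluster reports them (for example, as listed in the web console).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper: returns 0 if `host` resolves for TCP `port`,
 * otherwise the non-zero getaddrinfo() error code.
 * It only checks name resolution; it does not open a connection.
 */
static int node_resolves(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "cannot resolve %s: %s\n", host, gai_strerror(rc));
        return rc;
    }
    freeaddrinfo(res);
    return 0;
}
```

Calling `node_resolves("cb1", "11210")` on each client, with `cb1` replaced by each node name, should return 0 everywhere. If it fails on a client, that client can be expected to show the same symptom I saw: a connection on port 8091 but none on port 11210.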