Data Insert on Single Node Down

The Python code used in the test:

from datetime import timedelta
import datetime
from time import sleep
import random


# these modules are used to access and authenticate with your database cluster:
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster

# needed for options -- cluster, timeout, SQL++ (N1QL) query, etc.
from couchbase.options import (ClusterOptions, ClusterTimeoutOptions,
                               QueryOptions)
# Update this to your cluster
username = "Administrator"
password = "password"
bucket_name = "comp_test"
# User Input ends here.

# Connect options - authentication
auth = PasswordAuthenticator(
    username,
    password,
)
timeout_options = ClusterTimeoutOptions(kv_timeout=timedelta(seconds=50),
                                        query_timeout=timedelta(seconds=50))
options = ClusterOptions(auth, timeout_options=timeout_options, enable_compression=False)

# Get a reference to our cluster
# NOTE: For TLS/SSL connection use 'couchbases://<your-ip-address>' instead
cluster = Cluster('couchbase://192.168.10.138', options)

# Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=10))

# get a reference to our bucket
cb = cluster.bucket(bucket_name)
cb_coll = cb.scope("test").collection("test")

# Get a reference to the default collection, required for older Couchbase server versions
cb_coll_default = cb.default_collection()

test_scope = cb.scope('test')
sql_query = 'select count(*) as Cnt from comp_test.test.test'
result = test_scope.query(sql_query)
for row in result.rows():
    print("Found row: {}".format(row))
    No = row['Cnt'] + 1
    print('No:', No)

print('start : ', datetime.datetime.now())

# insert documents one at a time
for i in range(100000):
    ii = No + i

    person_dic = {}
    person_dic["custid"] = 'emp_'+str(ii)
    person_dic["phone"] = '010-'+str(random.randint(1000,9999)) + '-' + str(random.randint(0,9999)).zfill(4)
    person_dic["birthday"] = str(random.randint(1940,2010))+'-'+str(random.randint(1,12))+'-'+str(random.randint(1,28))
    person_dic["zipcode"] = str(random.randint(10000,91000))

    while True:
        try:
            rv = cb_coll.upsert('emp_'+str(ii), person_dic)
            if i % 10000 == 0:
                print(rv.key)
            break
        except Exception as e:
            print(e)
            print('Node is down. Sleeping for 10 seconds...')
            sleep(10)


print('End : ', datetime.datetime.now())

There are four data nodes and two query nodes. When one data node is stopped, and it is neither recovered nor auto-failed over, I can't insert data. Is that normal?
Shouldn't the data still be inserted?

Messages when one node is stopped:

Found row: {'Cnt': 0}
No: 1
start :  2023-09-06 13:09:17.294335
emp_1
emp_10001
RequestCanceledException(<ec=2, category=couchbase.common, message=request_canceled (2), context=KeyValueErrorContext:{'retry_attempts': 0, 'key': 'emp_11990', 'bucket_name': 'comp_test', 'scope_name': 'test', 'collection_name': 'test', 'opaque': 3150}, C Source=/home/ec2-user/workspace/python/sdk/python-packaging-pipeline/py-client/src/kv_ops.cxx:650>)
Node is down. Sleeping for 10 seconds...
UnAmbiguousTimeoutException(<ec=14, category=couchbase.common, message=unambiguous_timeout (14), context=KeyValueErrorContext:{'retry_attempts': 0, 'key': 'emp_11990', 'bucket_name': 'comp_test', 'scope_name': 'test', 'collection_name': 'test', 'opaque': 0}, C Source=/home/ec2-user/workspace/python/sdk/python-packaging-pipeline/py-client/src/kv_ops.cxx:650>)
Node is down. Sleeping for 10 seconds...
UnAmbiguousTimeoutException(<ec=14, category=couchbase.common, message=unambiguous_timeout (14), context=KeyValueErrorContext:{'retry_attempts': 0, 'key': 'emp_11990', 'bucket_name': 'comp_test', 'scope_name': 'test', 'collection_name': 'test', 'opaque': 0}, C Source=/home/ec2-user/workspace/python/sdk/python-packaging-pipeline/py-client/src/kv_ops.cxx:650>)
Node is down. Sleeping for 10 seconds...

A document for a particular documentId is stored in the Active node for that documentId. If that Active node goes down, and there is no replacement Active Node for that documentId, then the document cannot be saved. If you have replicas, and the replica node for that documentId is made Active, then the document will be stored. So (a) you need to have at least one replica on the bucket; and (b) the Replica partitions need to become Active partitions (fail over).


Timeout: the number of seconds that must elapse, after a node or group has become unresponsive, before auto-failover is triggered. This number is configurable: the default is 120 seconds; the minimum permitted is 5; the maximum 3600. Note that a low number reduces the potential time-period during which a consistently unresponsive node remains unresponsive before auto-failover is triggered; but may also result in auto-failover being unnecessarily triggered, in consequence of short, intermittent periods of node unavailability.
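That timeout can be changed through the cluster's REST interface (`/settings/autoFailover` on port 8091, per the Couchbase REST documentation). A minimal sketch, assuming the default admin port and the credentials from the test script; the `autofailover_request` helper is a made-up name for illustration:

```python
# Sketch: build a POST to the /settings/autoFailover REST endpoint.
# Endpoint and field names follow the Couchbase REST docs; host and
# credentials are placeholders from the test script.
import base64
from urllib.parse import urlencode
from urllib.request import Request

def autofailover_request(host, user, password, timeout_secs):
    """Build a request that enables auto-failover with the given timeout."""
    if not 5 <= timeout_secs <= 3600:
        raise ValueError("timeout must be between 5 and 3600 seconds")
    body = urlencode({"enabled": "true", "timeout": timeout_secs}).encode()
    req = Request(f"http://{host}:8091/settings/autoFailover",
                  data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req  # pass to urllib.request.urlopen(req) to apply the setting

req = autofailover_request("192.168.10.138", "Administrator", "password", 30)
```

Lowering the timeout from the 120-second default would make the test above recover sooner, at the cost of failovers being triggered by short blips.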

The replica count is set to 2, and there are four data nodes.
When one node goes down, three nodes are still operating.
Yet with those 3 nodes, data is not inserted and an 'UnambiguousTimeoutException' occurs.
With 1 node down, can't the remaining 3 nodes insert the data?
This is without auto-failover.

With 1 node down, can't the remaining 3 nodes insert the data?

The three nodes can indeed insert data, but only for documents whose ids map to those three nodes. If the documentId maps to the node that is down, the document cannot be inserted. Once the cluster configuration is updated so that the document id maps to one of the three remaining nodes, it can then be inserted. But while the configuration has no active node for the document, it cannot be inserted.

see Intra-Cluster Replication | Couchbase Docs
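The "document id maps to a node" idea can be sketched in a few lines. Couchbase hashes each key (CRC32-based) to one of 1024 vBuckets, and the cluster map assigns every vBucket to exactly one active node; the round-robin map below is a toy simplification for illustration, not the server's actual algorithm:

```python
# Simplified illustration of key -> vBucket -> active-node mapping.
# Real Couchbase uses a CRC32-derived hash into 1024 vBuckets plus a
# cluster-managed vBucket map; the round-robin striping here is a stand-in.
import zlib

NUM_VBUCKETS = 1024

def vbucket_for(key: str) -> int:
    return zlib.crc32(key.encode()) % NUM_VBUCKETS

def active_node_for(key: str, nodes):
    # toy cluster map: vBuckets striped round-robin across the data nodes
    return nodes[vbucket_for(key) % len(nodes)]

nodes = ["node1", "node2", "node3", "node4"]
# every upsert of a given key always lands on the same node; if that node
# is down and no failover has happened, the write has nowhere to go
print(active_node_for("emp_11990", nodes))
```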

While there is no active node for the document, a replica of the document can still be retrieved using get-from-replica.


Even for 'insert' (not just 'upsert'), an 'UnambiguousTimeoutException' occurs when one node goes down. Is that normal?

Yes, more or less. When the active node for a document is unavailable, the KV operation for that document is retried until (a) it succeeds because the node becomes available again, or (b) the timeout is reached.

If a replica node for the document is available, a replica of the document can be read using a replica API.
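A sketch of that fallback, assuming Python SDK 4.x where `Collection.get_any_replica` is the replica-read call; the wrapper function is a made-up name for illustration:

```python
# Hedged sketch: fall back to a replica read when the active node is down.
# `get_any_replica` is the Python SDK 4.x replica-read API; a replica may
# lag the (unreachable) active copy, so the result can be stale.
def get_with_replica_fallback(coll, key):
    """Try a normal get; on failure, read whichever replica answers first."""
    try:
        return coll.get(key).content_as[dict]
    except Exception:
        return coll.get_any_replica(key).content_as[dict]

# e.g. doc = get_with_replica_fallback(cb_coll, 'emp_11990')
```

Note this only helps reads; writes still require an active vBucket for the key (or a failover that promotes a replica).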

Is there a way to reconnect when an 'UnambiguousTimeoutException' occurs?

Just re-execute the operation; it will reconnect automatically. Alternatively, you can set the timeout to a really long value and the SDK will keep retrying on its own until it succeeds.
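Both suggestions can be sketched together: a small re-execute loop (essentially what the test script's `while True` block already does), plus a longer per-operation timeout, shown commented because it needs a live cluster. `UpsertOptions` and its `timeout` parameter follow the SDK 4.x docs; the `retry` helper is a made-up name:

```python
# Sketch: re-execute an operation until it succeeds or attempts run out.
# `op` is any zero-argument callable wrapping the SDK call.
from time import sleep

def retry(op, attempts=5, delay=1.0):
    for n in range(attempts):
        try:
            return op()
        except Exception:
            if n == attempts - 1:
                raise  # give up after the last attempt
            sleep(delay)

# With the SDK, the per-operation timeout can also be raised, e.g.:
#   from couchbase.options import UpsertOptions
#   cb_coll.upsert(key, doc, UpsertOptions(timeout=timedelta(minutes=5)))
# so a single call keeps retrying internally until the node recovers.
```

Usage would look like `retry(lambda: cb_coll.upsert('emp_1', person_dic))`, replacing the hand-rolled loop in the test script.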
