Data loss issue with one or more replicas

We are evaluating the performance of Couchbase (version 6.0.0, build 6.0.0-1693) using Gatling.

We have formed a Couchbase cluster with 2 nodes (2 physical Mac machines) and 1 bucket configured with 1 replica. We are hitting our data service from Gatling to create and insert new records into the Couchbase DB.

Currently we have configured Gatling to generate a load of 50,000 inserts over 400 seconds from each of 2 different Mac machines, so the total number of records inserted should be 100,000. But we consistently see data loss when we create the bucket with a replica. We don't see any issues or exceptions in the Gatling or data service logs.
Without a replica, the total number of records and the data distribution across nodes look perfectly fine.

We have run multiple iterations of the test but keep seeing the same data loss with one or more replicas. Any suggestions on replica management to fix the issue are deeply appreciated.

Check the screenshot: the total number of records is less than 100,000.
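One way to tell whether the missing records were never acknowledged, or were acknowledged and then went missing, is to count every write outcome on the client side and compare the totals with the bucket's item count after the run. Below is a minimal sketch of such a counter; the WriteLedger class and its methods are hypothetical, and the Runnable stands in for whatever call the data service actually makes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side ledger: counts every write attempt and its outcome so
// that attempted == acknowledged + failed can be checked after a load run.
class WriteLedger {
    private final AtomicLong attempted = new AtomicLong();
    private final AtomicLong acknowledged = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    // Wraps one insert; 'write' stands in for the real persistence call.
    void record(Runnable write) {
        attempted.incrementAndGet();
        try {
            write.run();
            acknowledged.incrementAndGet();
        } catch (RuntimeException e) {
            failed.incrementAndGet();
            throw e; // rethrow so the caller still sees (and reports) the failure
        }
    }

    long attempted() { return attempted.get(); }
    long acknowledged() { return acknowledged.get(); }
    long failed() { return failed.get(); }

    // failed > 0: some records were never stored.
    // itemsInBucket < acknowledged: acknowledged writes are missing, which could
    // also mean two writes shared a document key and one overwrote the other.
    String summary(long itemsInBucket) {
        return String.format("attempted=%d acknowledged=%d failed=%d bucket=%d",
                attempted(), acknowledged(), failed(), itemsInBucket);
    }

    public static void main(String[] args) {
        WriteLedger ledger = new WriteLedger();
        for (int i = 0; i < 10; i++) {
            final int n = i;
            try {
                ledger.record(() -> {
                    if (n % 5 == 4) {
                        throw new IllegalStateException("simulated failure");
                    }
                });
            } catch (IllegalStateException e) {
                // already counted by the ledger; a real caller would report it
            }
        }
        // prints attempted=10 acknowledged=8 failed=2 bucket=8
        System.out.println(ledger.summary(8));
    }
}
```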

Hi Asutosh,
Thanks for using our product.
I have a question. Could you tell me the OS X version of your Mac, along with its memory size and disk size?
Also, when you load data into the bucket, what is the bucket's resident ratio?
I will try your case on my Mac and update here.
Thanks

Hey @simplyasutosh, I also replied to a similar Stack Overflow posting. Is this from the same project?

I’m pretty sure it’s related to error handling somewhere. Can you share the Gatling configuration you’re using, by chance? Feel free to message me directly here on the forums if need be.

Hello ingenthr,

The Stack Overflow posting and this one are from the same project. My teammate has already added the Gatling class screenshot to the Stack Overflow post.

Hello @thuan,

Here is my Mac's configuration:

Model Name: MacBook Pro
Model Identifier: MacBookPro11,4
Processor Name: Intel Core i7
Processor Speed: 2.2 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB

Disk: 500 GB

Ah, so it’s Gatling driving a webapp? Any chance you can share that webapp code?

At the moment, I suspect it may be missing some error handling. Note that Couchbase can return a TMPFAIL under load, and in going from one node to two you’re basically doubling the load because of replication (assuming you took the defaults). Also note that some query service responses come back with HTTP 200 but with errors embedded later in the body.
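A minimal sketch of what handling a temporary failure could look like: retry only that failure, back off exponentially, and rethrow once the attempts run out so the error is never silently dropped. TemporaryFailure here is a hypothetical placeholder for the SDK's own temporary-failure exception, and maxAttempts and initialDelayMs are illustrative parameters, not SDK settings.

```java
import java.util.function.Supplier;

// Hypothetical placeholder for the SDK's temporary-failure exception.
class TemporaryFailure extends RuntimeException {
    TemporaryFailure(String message) { super(message); }
}

final class Retry {
    private Retry() {}

    // Runs 'op', retrying only on TemporaryFailure with exponential backoff
    // (initialDelayMs, 2x, 4x, ...). Other exceptions propagate immediately, and
    // the last TemporaryFailure is rethrown once maxAttempts is reached.
    static <T> T withBackoff(Supplier<T> op, int maxAttempts, long initialDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (TemporaryFailure e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the failure
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    e.addSuppressed(ie);
                    throw e;
                }
                delay *= 2;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice with a temporary failure, then succeeds on the third call.
        String result = withBackoff(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new TemporaryFailure("TMPFAIL");
            }
            return "stored";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " calls"); // prints stored after 3 calls
    }
}
```

In the web app this would wrap the insert, e.g. `Retry.withBackoff(() -> repository.save(user), 5, 10)`, where `repository` is whatever persistence call the service already uses.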

Of course, as you’d expect, we test this area pretty carefully ourselves. While there could be a defect here somewhere, a quick review of the test code seems prudent to be sure no error handling was missed.

@ingenthr @thuan, the webapp code cannot be shared as it is confidential, but we have all error handling in place (let me know if you are looking for a specific error handling mechanism; we use try/catch statements per the standard Java mechanism).
We are considering Couchbase for our organisation's needs, so we are load testing under various scenarios.

This is the memory I allocated when creating the cluster:

This is my bucket before the data load:

This is the bucket status after the data load:

This time I ran a scenario with 50,000 records, but as you can see there are only 49,921.

Besides this, I am also seeing two other things (not sure whether they are actual issues):

1: We constantly get a pop-up saying:
Audit Write Failure. Attempt to write to audit log on node “xxx.xx.xxx.xxx” was unsuccessful.

2: Whenever we try to log in from the other node in the cluster, it logs in but then immediately logs out.

Could you please shed some light on this?

Can you show me, approximately, how you’re handling temporary failures, sometimes called temp OOMs (though they can come up in other circumstances too)? Actually, can you check for those in the console statistics from your data load? You should be able to see whether temporary failures were returned.

@ingenthr, we checked all the statistics but there is nothing there; everything looks fine.
We are trying other configurations and will let you know the results. One more question, regarding document ID creation.

I am trying to create a document ID of the form id::userid.
I am following https://docs.spring.io/spring-data/couchbase/docs/current/reference/html/#couchbase.autokeygeneration.usingattributes

My entity class looks like this:

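import com.couchbase.client.java.repository.annotation.Field;
import org.springframework.data.annotation.Id;
import org.springframework.data.couchbase.core.mapping.Document;
import org.springframework.data.couchbase.core.mapping.id.GeneratedValue;
import org.springframework.data.couchbase.core.mapping.id.GenerationStrategy;
import org.springframework.data.couchbase.core.mapping.id.IdAttribute;

@Document
public class User {

    @Id
    @GeneratedValue(strategy = GenerationStrategy.USE_ATTRIBUTES, delimiter = "::")
    private String id;
    @IdAttribute
    private String userid;
    @Field
    private String fname;
    @Field
    private String lname;

    // getters and setters omitted
}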

The POST method body is:

{
  "id": "888",
  "userid": "user1",
  "fname": "xyz",
  "lname": "abc"
}

It creates the document ID as 888 only, but it is supposed to generate the document ID 888::user1.

I have tested this many times but the result is the same. Could you please suggest what I am missing here?
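If the goal is simply a key of the form 888::user1, one workaround (a sketch, not a diagnosis of why the annotation-driven generation kept 888) is to compose the key explicitly and set it on the entity before saving; you may need to drop @GeneratedValue from the field for this, which is untested here. The UserKeys helper below is hypothetical, and the setter/getter names in the comment assume standard accessors on User.

```java
import java.util.Objects;

// Hypothetical helper: composes the document key explicitly before saving,
// instead of relying on annotation-driven key generation.
final class UserKeys {
    static final String DELIMITER = "::";

    private UserKeys() {}

    static String compositeKey(String id, String userid) {
        Objects.requireNonNull(id, "id must not be null");
        Objects.requireNonNull(userid, "userid must not be null");
        return id + DELIMITER + userid;
    }

    public static void main(String[] args) {
        // e.g. user.setId(UserKeys.compositeKey(user.getId(), user.getUserid()));
        System.out.println(compositeKey("888", "user1")); // prints 888::user1
    }
}
```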

The entity definition won’t help, unfortunately. What I’m looking for is how you’re handling any exceptions thrown. That’s how a TMPFAIL will manifest itself in your app.
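To make sure a failure reaches Gatling rather than stopping in a log line, the web app's handler can translate each outcome into an HTTP status. A rough sketch; InsertHandler, its nested TemporaryFailure, and the specific status codes chosen are assumptions for illustration.

```java
// Hypothetical sketch: the web app's insert handler turns every outcome into an
// HTTP status, so a failed write shows up in Gatling instead of hiding behind a 200.
final class InsertHandler {
    // Hypothetical marker for a temporary, retryable server-side failure.
    static class TemporaryFailure extends RuntimeException {
        TemporaryFailure(String message) { super(message); }
    }

    private InsertHandler() {}

    static int handle(Runnable insert) {
        try {
            insert.run();
            return 201; // Created: the write was acknowledged
        } catch (TemporaryFailure e) {
            return 503; // Service Unavailable: temporary, the client may retry
        } catch (RuntimeException e) {
            return 500; // any other failure is still reported, never a silent 200
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(() -> {}));                                           // prints 201
        System.out.println(handle(() -> { throw new TemporaryFailure("TMPFAIL"); }));   // prints 503
        System.out.println(handle(() -> { throw new IllegalStateException("boom"); })); // prints 500
    }
}
```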

Let’s do this. Can you create a cbcollect_info per the instructions and put it somewhere I can look at it? Please note the time you start the run so I can correlate it with the logs.

Hi @ingenthr, we ran the same performance evaluation in the cloud and it went fine.
One question regarding our data-fetching approach:

First Approach:

Let's say I have two documents:

userdoc1
{
  "status": "pending",
  "usertype": "VIP",
  "userid": "123"
}
For the above document, let's say my document ID is status::usertype, i.e. pending::VIP (just to clarify, this document ID will be unique in our case).
userdoc2
{
  "userid": "123",
  "fname": "abc",
  "lname": "xyz",
  "age": 20,
  "address": "asdf"
}
For userdoc2, let's say userid is my document ID.

For a get operation I would proceed like this (the idea here is to fetch data based on document ID):
SELECT userid FROM userdoc1 USE KEYS "pending::VIP";
and then
SELECT * FROM userdoc2 USE KEYS "123";
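The first approach comes down to two lookups by key. The sketch below uses in-memory maps as a stand-in for the key-value store (KeyLookupSketch is hypothetical and is not Couchbase SDK code); with the SDK, each step would typically be a direct key-value get rather than a N1QL query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// In-memory stand-in for a key-value store (NOT Couchbase SDK code):
// approach 1 resolves a user with two lookups by key and no index.
final class KeyLookupSketch {
    // "status::usertype" -> userid      (userdoc1-style documents)
    private final Map<String, String> statusDocs = new HashMap<>();
    // userid -> profile fields          (userdoc2-style documents)
    private final Map<String, Map<String, Object>> userDocs = new HashMap<>();

    static String statusKey(String status, String usertype) {
        return status + "::" + usertype;
    }

    void putStatusDoc(String status, String usertype, String userid) {
        statusDocs.put(statusKey(status, usertype), userid);
    }

    void putUserDoc(String userid, Map<String, Object> profile) {
        userDocs.put(userid, profile);
    }

    // Lookup 1: "pending::VIP" -> "123"; lookup 2: "123" -> profile.
    Optional<Map<String, Object>> find(String status, String usertype) {
        return Optional.ofNullable(statusDocs.get(statusKey(status, usertype)))
                .map(userDocs::get);
    }

    public static void main(String[] args) {
        KeyLookupSketch store = new KeyLookupSketch();
        store.putStatusDoc("pending", "VIP", "123");
        Map<String, Object> profile = new HashMap<>();
        profile.put("fname", "abc");
        profile.put("lname", "xyz");
        store.putUserDoc("123", profile);
        // prints abc
        System.out.println(store.find("pending", "VIP").map(p -> p.get("fname")).orElse(null));
    }
}
```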

Second Approach:

I have only one document

userdoc
{
  "status": "pending",
  "usertype": "VIP",
  "userid": "123",
  "fname": "abc",
  "lname": "xyz",
  "age": 20,
  "address": "asdf"
}
Here, the document ID is status::usertype,
and we have a secondary index on userid.

Here I would get the data like this (the idea here is to fetch data based on the secondary index):

SELECT * FROM userdoc WHERE userid = "123";

Could you please explain which approach will give better read performance, assuming a high data load, hundreds of nodes in a cluster, XDCR, and other factors?
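For comparison, the second approach adds an index step: look up userid in a secondary index, then fetch the matching documents by key. Again this is an in-memory stand-in (SecondaryIndexSketch is hypothetical and is not GSI), meant only to show the extra hop; in Couchbase, a N1QL query filtering on an indexed field involves the query and index services before the documents are fetched, so measuring both approaches at the expected scale is the reliable way to compare them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory stand-in (NOT Couchbase GSI): approach 2 keeps one document per
// "status::usertype" key plus a secondary index from userid to document keys.
final class SecondaryIndexSketch {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();
    private final Map<String, Set<String>> byUserid = new HashMap<>();

    void put(String docKey, Map<String, Object> doc) {
        Map<String, Object> previous = docs.put(docKey, doc);
        if (previous != null) {
            // keep the index consistent when a document is replaced
            Set<String> keys = byUserid.get(String.valueOf(previous.get("userid")));
            if (keys != null) {
                keys.remove(docKey);
            }
        }
        byUserid.computeIfAbsent(String.valueOf(doc.get("userid")), k -> new HashSet<>())
                .add(docKey);
    }

    // Step 1: index lookup on userid; step 2: fetch each matching document by key.
    List<Map<String, Object>> findByUserid(String userid) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (String key : byUserid.getOrDefault(userid, Collections.emptySet())) {
            result.add(docs.get(key));
        }
        return result;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch store = new SecondaryIndexSketch();
        Map<String, Object> doc = new HashMap<>();
        doc.put("status", "pending");
        doc.put("usertype", "VIP");
        doc.put("userid", "123");
        doc.put("fname", "abc");
        store.put("pending::VIP", doc);
        System.out.println(store.findByUserid("123").size()); // prints 1
    }
}
```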