Randomly delayed responses from Sync Gateway

We are using Sync Gateway 1.2.0-79 with Couchbase Server 4.0.0-4051, both Community Edition.

Traffic is pumped at 200 ops/sec. A few of the responses are very delayed (more than 800 ms), whereas most requests are processed within 1 to 5 ms. Why are a few responses delayed?

15:36:06.078878 2016-05-26T15:36:06.078+05:30 HTTP+: #136153810: --> 201 (1.9 ms)
15:36:06.098106 2016-05-26T15:36:06.098+05:30 HTTP+: #136153815: --> 201 (1.9 ms)
15:36:06.108664 2016-05-26T15:36:06.108+05:30 HTTP+: #136153817: --> 201 (1.9 ms)
15:36:06.117947 2016-05-26T15:36:06.117+05:30 HTTP+: #136153822: --> 201 (2.1 ms)
15:36:06.118967 2016-05-26T15:36:06.118+05:30 HTTP+: #136153823: --> 201 (1.9 ms)
15:36:06.386150 2016-05-26T15:36:06.386+05:30 HTTP+: #136153833: --> 201 (220.1 ms)
15:36:06.437838 2016-05-26T15:36:06.437+05:30 HTTP+: #136153832: --> 201 (274.7 ms)
15:36:06.596281 2016-05-26T15:36:06.596+05:30 HTTP+: #136153885: --> 201 (244.9 ms)
15:36:06.596380 2016-05-26T15:36:06.596+05:30 HTTP+: #136153897: --> 201 (211.5 ms)
15:36:06.837080 2016-05-26T15:36:06.837+05:30 HTTP+: #136153904: --> 201 (429.7 ms)

15:36:06.908146 2016-05-26T15:36:06.908+05:30 HTTP+: #136153935: --> 201 (454.1 ms)
15:36:06.908635 2016-05-26T15:36:06.908+05:30 HTTP+: #136153887: --> 201 (555.0 ms)
15:36:06.909390 2016-05-26T15:36:06.909+05:30 HTTP+: #136153983: --> 201 (372.4 ms)
15:36:06.909349 2016-05-26T15:36:06.909+05:30 HTTP+: #136153947: --> 201 (445.2 ms)
15:36:06.951681 2016-05-26T15:36:06.951+05:30 HTTP+: #136154093: --> 201 (3.5 ms)
15:36:06.952128 2016-05-26T15:36:06.952+05:30 HTTP+: #136154095: --> 201 (3.9 ms)
15:36:06.953740 2016-05-26T15:36:06.953+05:30 HTTP+: #136154111: --> 201 (5.0 ms)
15:36:06.954059 2016-05-26T15:36:06.954+05:30 HTTP+: #136154115: --> 201 (5.2 ms)

I am seeing the following errors in the Sync Gateway error log around the same timestamps:

2016/05/26 15:36:06 go-couchbase: call to Do("_sync:rev:s919591210642_Location.doc:37:4854-a9a85ce88aec73a8fd16180997265f47") in (*Bucket).Write took 230.707033ms
2016/05/26 15:36:06 go-couchbase: call to Do("s919591210774_Location.doc") in (*Bucket).GetsRaw took 246.80042ms
2016/05/26 15:36:06 go-couchbase: call to Do("s919591210674_Location.doc") in (*Bucket).casNext took 240.351377ms
2016/05/26 15:36:06 go-couchbase: call to Do("s919591214500_LocatedGroupsMap.doc") in (*Bucket).GetsRaw took 276.108809ms
2016/05/26 15:36:06 go-couchbase: call to Do("_sync:rev:s919591210654_Location.doc:37:4855-35c251fc051ab3397a8fe3894505becd") in (*Bucket).Write took 291.707726ms
2016/05/26 15:36:06 go-couchbase: call to Do("_sync:rev:s919591210666_Location.doc:37:4854-716be7ca2aa4e2f99a0721c9121044b7") in (*Bucket).Write took 240.52547ms
2016/05/26 15:36:06 go-couchbase: call to Do("s919591212478_SrvrProfile.doc") in (*Bucket).GetsRaw took 232.270948ms
2016/05/26 15:36:06 go-couchbase: call to Do("s919591210687_Location.doc") in (*Bucket).casNext took 232.986553ms
2016/05/26 15:36:06 go-couchbase: call to Do("s919591213825_ContactPresence.doc") in (*Bucket).GetsRaw took 225.307305ms
2016/05/26 15:36:06 go-couchbase: call to Do("_sync:rev:s919591210653_Location.doc:37:4855-2edd5578f9c1c6014131c6e5dcbe709a") in (*Bucket).Write took 347.307214ms
2016/05/26 15:36:06 go-couchbase: call to Do("_sync:rev:s919591210667_Location.doc:37:4854-2d7679c5e7049b719af8f62a891bde1e") in (*Bucket).Write took 295.643024ms
2016/05/26 15:36:06 go-couchbase: call to Do("_sync:rev:s919591210685_Location.doc:37:4854-9d5d6065808f27c0decf56d15ae4a409") in (*Bucket).Write took 236.675357ms
2016/05/26 15:36:06 go-couchbase: call to Do("_sync:rev:s919591210643_Location.doc:37:4855-46bf1743153223723d7abdd6d54f105c") in (*Bucket).Write took 348.934783ms

Is it possible to know what is actually slowing down Sync Gateway's responses intermittently? Is there any bottleneck visible in the logs that is hindering 200 ops/sec? We want to achieve approximately 500 ops/sec but are stuck here. Any pointers for analysis would be helpful.

Can someone please help with this?

I'm unable to help, but I have laggy sync too. Could you tell me where you got these logs, please?


Those "call to … took … ms" messages report unusually long round trips for Couchbase Server calls (the go-couchbase client only logs them when a call exceeds its slow-call warning threshold). So the database server appears to be under heavy load. You may want to ask on the Couchbase Server forums how to troubleshoot that.

Most of the delays you're seeing are slow response times between Sync Gateway and Couchbase Server (the "Write took …ms" messages). From the information provided it's difficult to identify the source of the delay; it could be on either the Sync Gateway or the Couchbase Server side.

There are a few areas for potential follow-up:

  1. Review the CPU usage on the Sync Gateway side. High CPU contention can result in slower operations.
  2. Review whether Couchbase Server is performing additional work during the slower operations (heavy indexing load, rebalance, etc.).
  3. Are all the documents being processed identical, or could the delay be related to document size?

From the Sync Gateway log file.


I did post on the Couchbase Server forum, since I am seeing the indexing messages in Couchbase.

But I want to confirm from the Sync Gateway side that there is no issue with Sync Gateway and that it is only Couchbase. Right now I am not sure whether the performance bottleneck is in Sync Gateway or Couchbase.

Is there something I can do to isolate whether the issue is in Sync Gateway or Couchbase Server?

Thanks, Adam, for the inputs.

  1. CPU usage in Sync Gateway was less than 10% the whole time the traffic was running.
  2. While traffic was running, "Indexing" kept flashing in the Couchbase GUI. Will that cause any issue? How can I debug further to determine whether that is the performance bottleneck?
  3. All the documents processed during the traffic are identical, with the same document size.

If the CPU usage is low on the Sync Gateway side, and the doc size is identical, then it’s most likely that Jens is correct - the slower processing is on the Couchbase Server side, and following up on your other forum ticket is probably the right approach.

You could also test whether increasing the size of your Couchbase Server cluster alleviates the issue at all; that's probably the simplest way to check whether your test is bottlenecking on the server side.