Hi there,
Yesterday I spotted that a large number of updates coming into SG from mobile devices are failing on the bulk docs update from Couchbase Lite. Separately, my colleagues have noticed over the last few weeks that operations that sync data from SG or read views from SG have been taking longer and longer.
The errors look like this:
2018-10-03T13:54:35.566Z CRUD+: Invoking sync on doc "files.shapes.966|97b59d2d9155be283f91040e36746a51aa2675576d664bd50031ec9df93048a6" rev 2-c8fc7b55e8d0319aa4bff2c9acf406ee
2018-10-03T13:54:35.567Z CRUD+: No old revision "files.shapes.966|97b59d2d9155be283f91040e36746a51aa2675576d664bd50031ec9df93048a6" / "1-25f1e1dab482852272b683ac7fc25058"
2018-10-03T13:54:35.567Z CRUD+: Saving doc (seq: #18585714, id: files.shapes.966|97b59d2d9155be283f91040e36746a51aa2675576d664bd50031ec9df93048a6 rev: 2-c8fc7b55e8d0319aa4bff2c9acf406ee)
2018-10-03T13:54:35.568Z WARNING: Unrecoverable error attempting to update xattr for key:files.shapes.966|97b59d2d9155be283f91040e36746a51aa2675576d664bd50031ec9df93048a6 cas:0 deleteBody:false error:key not found -- base.CouchbaseBucketGoCB.UpdateXattr.func1() at bucket_gocb.go:1029
2018-10-03T13:54:35.568Z CRUD+: Did not update document "files.shapes.966|97b59d2d9155be283f91040e36746a51aa2675576d664bd50031ec9df93048a6" w/ xattr: key not found
2018-10-03T13:54:35.569Z CRUD+: Released unused sequence #18585714
2018-10-03T13:54:35.569Z BulkDocs: Doc "files.shapes.966|97b59d2d9155be283f91040e36746a51aa2675576d664bd50031ec9df93048a6" --> 404 missing (key not found)
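For what it's worth, this is how I've been quantifying the failure rate - a quick throwaway Python sketch of mine (the log format is taken from the lines above) that buckets the "Released unused sequence" lines per hour:

```python
import re
from collections import Counter

# Matches lines like:
# 2018-10-03T13:54:35.569Z CRUD+: Released unused sequence #18585714
RELEASED = re.compile(
    r'^(\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}\.\d+Z CRUD\+: '
    r'Released unused sequence #(\d+)'
)

def released_per_hour(lines):
    """Count 'Released unused sequence' log lines, bucketed by hour."""
    counts = Counter()
    for line in lines:
        m = RELEASED.match(line)
        if m:
            counts[m.group(1)] += 1  # key is 'YYYY-MM-DDTHH'
    return counts

# Typical use: released_per_hour(open('sync_gateway_error.log'))
```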
We’re running “Couchbase Sync Gateway/1.5.1(4;cb9522c)” on top of Couchbase “couchbase-server-community/now 5.0.1-5003-1”. This setup has been stable for us since January 2018. The only recent issue was a power outage on one of our three CB nodes, but we were able to bring that node back and everything seemed fine afterwards.
Our SG settings look like:
"databases": {
  "construct": {
    "server": "http://*******:8091",
    "bucket": "construct",
    "username": "******",
    "password": "******",
    "enable_shared_bucket_access": true,
    "unsupported": {
      "user_views": {"enabled": true}
    }
  }
},
What made me spot this was a graph I keep of the sequence number in use - it’s increasing by around 2M per day, as tens of thousands of these updates come in per hour and fail. This has been happening since around the time of the power outage - the sequence-number graph for the last 6 months looks like a hockey stick, with the start of the curve at roughly the time of the outage.
Looking at the nature of the failing updates, around 96% of them are on documents that SG cannot retrieve via the admin port - so as far as ADMIN is concerned, they don’t exist at all. The remaining 4% are in all sorts of states; the failed revs:
- are older than the current rev known to SG
- are equal to the current rev in SG, but with a different rev hash, so have failed
- are newer than the current rev known to SG.
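To check those doc states I’ve been hitting the admin port by hand; here’s a small helper I use to build the URLs (it assumes the default admin port 4985 on localhost and our db name "construct" - adjust for your setup). `GET` on the db root returns the current update_seq, and `GET` on a doc with `?revs=true` returns its revision history, or a 404 when SG doesn’t know the doc:

```python
from urllib.parse import quote

ADMIN = "http://localhost:4985"  # assumed SG admin address; adjust as needed
DB = "construct"

def db_info_url():
    # A GET here returns database info, including the current update_seq.
    return f"{ADMIN}/{DB}/"

def doc_revs_url(doc_id):
    # A GET here (admin port, so no auth/channel filtering) returns the doc
    # with its revision history when it exists; 404 "missing" otherwise.
    # Our doc IDs contain '|', so the ID must be percent-encoded.
    return f"{ADMIN}/{DB}/{quote(doc_id, safe='')}?revs=true"
```

I then just fetch the URLs with curl or urllib and look at the _revisions field.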
The nearest thing I can find while searching is this closed issue: https://github.com/couchbase/sync_gateway/issues/3307 - I’m wondering if it could be related?
Any help or suggestions you can give would be great - we’ve been pulling our hair out on this one.
Cheers,
James