Couchbase lite not syncing all channels from sync gateway

Lately(the last few weeks) sometimes when a user logs in to couchbase lite (on Android and iOS) the documents for their specific sync gateway channels are not pulled. We have been using couchbase for almost a year now but this issue just came up in the last few weeks, and we haven’t updated couchbase lite anytime in the last few weeks: Android v1.0.4, iOS:v1.0.3.1 build 8.
They always receive all documents in the ‘all’ channel, but anything in channels specific to that user are not synced. We are using basic auth to connect to sync gateway and do the sync, and we just set the SyncGatewaySession cookie in the header. I can verify using Postman that a cookie has access to a specific document, but that document is never synced to our app on the user’s phone. Usually they have to try doing a full DB sync 3 or more times before it is finally synced down. I don’t see any errors or anything suspicious in the sync gateway logs or in the couchbase lite logs. In the server log I see a few errors that don’t seem to correspond to the right times, but they do match the right day:
Control connection to memcached on ‘ns_1@10.0.0.10’ disconnected: {badmatch,
{error,
timeout}}
I posted more details about that error on this forum before and you said it was unlikely to be related to this problem though and was probably just a connection issue. It is always resolved in 1 second too.

We don’t know what else to check at this point. We’ve verified that the app is using a valid cookie and that using that cookie grants us access to all the user’s documents. Is there anything else anyone can suggest that we should be checking for when debugging? Or has anyone else run into this issue where couchbase-lite reports no errors but doesn’t sync all documents?

It would be helpful to look at detailed logging on the Sync Gateway side when the user attempts to pull documents, to see what’s happening when the _changes request comes through from the client. Enabling logging for “HTTP”, “Changes”,“Changes+”,“Cache” and “Cache+” would probably be most helpful for this particular issue.

You can enable this logging through the SG Admin REST API (without restarting your Sync Gateway). Here’s the curl command for reference:

curl -i -X POST http://yourserver:4985/_logging -d '{"HTTP":true, "Cache":true, "Cache+":true, "Changes":true, "Changes+":true}' -H "Content-Type:application/json"

Actually just started having this same issue today, though we’re using CBL iOS 1.0.4 and the bleeding edge master branch from SG. The error seemed to coincide with occurrences of JSON error parsing _changes feed: lexical error: invalid char in json text. on our iOS client.

Here’s some output when I update a document user_555c0a0e9b81f01e6d522563 (that my device should sync but does not):

2015-05-21T00:27:09.073Z HTTP:  #004: GET /db_3_19_14_2/user_555c0a0e9b81f01e6d522563  (ADMIN)
2015-05-21T00:27:09.098Z HTTP:  #005: PUT /db_3_19_14_2/user_555c0a0e9b81f01e6d522563?rev=91-d048076a0213fb860cbb3f3e7f725a0f  (ADMIN)
2015-05-21T00:27:09.127Z Cache: SAVING #1137632
2015-05-21T00:27:09.130Z CRUD: Stored doc "user_555c0a0e9b81f01e6d522563" / "92-64335499215bcbc8692cad0e85d46ff9"
2015-05-21T00:27:10.677Z Cache: Received #1137632 after 1552ms ("user_555c0a0e9b81f01e6d522563" / "92-64335499215bcbc8692cad0e85d46ff9")
2015-05-21T00:27:10.677Z Cache:     #1137632 ==> channel "user_555c0a0e9b81f01e6d522563"
2015-05-21T00:27:10.677Z Cache:     #1137632 ==> channel "*"
2015-05-21T00:27:10.677Z Changes+: Notifying that "3_19_14_2" changed (keys="{*, user_555c0a0e9b81f01e6d522563}") count=3
2015-05-21T00:27:11.089Z Cache: Received #1137632 after 1965ms ("user_555c0a0e9b81f01e6d522563" / "92-64335499215bcbc8692cad0e85d46ff9")
2015-05-21T00:27:11.089Z Cache+:   Ignoring duplicate of #1137632

Note: The channel name in this case is the same as the document ID.

The iOS client currently has rev 89-b43f3826bbb6f5599edab884bc2097b9 of the same document.

@sapieneptus I don’t see anything unusual in that Sync Gateway log. Does anything else appear in the log (particularly Changes/Changes+ entries) after the excerpt you provided?

The ‘error parsing _changes feed’ sounds very suspicious as well - are you able to issue the _changes directly via CURL to see what the results look like?

We’ve noticed that restarting sync gateway resolves the issue. I restarted it to do our test so that the logs would start fresh and they would be small enough to open and view without having to run split on them. It resolved the issue though and so now we have to wait for it to come up again for a different user, which could take days or weeks.

Is there a way to fix the sync gateway logging so it won’t keep the same location in the log file when it rolls over? These are our logrotate.d settings for sync-gateway:
/var/log/sync-gateway.log {
dateext
dateformat -%s
rotate 10
size 10M
missingok
notifempty
compress
copytruncate
}

I have this setup to run every hour since the logs get so big so fast. They said to use these settings in the sync gateway documentation (http://developer.couchbase.com/mobile/develop/guides/sync-gateway/deployment/in-production/index.html) but sync gateway just keeps logging to the same point in the file, so it ends up huge with 100’s of MBs of NULLs at the beginning.

@alexegli We’re looking into the log rotation issue - it shouldn’t be generating the 100 MBs of NULLs. Until that gets resolved, you could run with logging disabled until the issue reappears, and then enable logging via the Admin REST API (i.e. no SG restart).

One additional question - what SG build are you using, and has that been updated recently?

Thanks,
Adam

The log files get very large, which is why we have them roll over every hour. So it could generate a few dozen MBs of log data in an hour, then the copytruncate is run which copies it and deletes everything in the log file, but then SG keeps writing to the same location in the file it was using before the copytruncate, so everything before that point is treated as NULL and takes up space.

We’re using sync gateway enterprise 1.0.3 for Ubuntu. We updated it in early March.

There was an fix made to the log rotation handling that’s included in SG 1.1.0 (see https://github.com/couchbase/sync_gateway/pull/594 for details) - I think that would address your issue.

Until then, I was proposing that you could run with logging disabled on Sync Gateway (or at least minimized smaller set of log keys), and then re-enable logging via the admin API when the problem resurfaces.

How do I disable logging? Starting sync gateway with log:[] in the sync gateway config file doesn’t disable it and it still does a ton of logging. Is there a flag I have to set to explicitly disable logging?

Nevermind, figured out it was logging HTTP no matter what. Also got log rollover working properly now.