Lost TAP feed for bucket; problems with resuming

Hi, I’m trying to debug an issue. I’m running Sync Gateway and Couchbase Server in Docker, and both had been running fine for a few weeks until a couple of days ago.

I received the following messages in the logs:
INFO _msg=Got new configuration for bucket <bucketname>

Not sure what this message means. To my knowledge, the configuration was not updated.

ERROR _msg= Unable to decode response unexpected EOF

Unclear which response this message is referring to.
INFO _msg= Trying with http://<server_ip>:8091/pools/default/bucketsStreaming/<bucketname>
WARNING: Bucket Updater for bucket <bucketname> returned error: Get http://<server_ip>:8091/pools/default/bucketsStreaming/<bucketname>: dial tcp <server_ip>:8091: getsockopt: connection refused -- base.GetCouchbaseBucket.func1() at bucket.go:405
WARNING: Lost TAP feed for bucket <bucketname>, with error: Get http://<server_ip>:8091/pools/default/bucketsStreaming/<bucketname>: dial tcp <server_ip>:8091: getsockopt: connection refused -- rest.(*ServerContext)._getOrAddDatabaseFromConfig.func1() at server_context.go:352
CRUD: Taking Database : <bucketname>, offline
CRUD: Waiting for all active calls to complete on Database : <bucketname>

Not sure why the connection was lost here.
I’m unfamiliar with the pools/default/bucketsStreaming/ endpoint mentioned in the log, but I’ve now tried the /pools/default/buckets endpoint on the server and got a response.

I’ve tried bringing the database back online:
curl -X POST http://<syncgatewayserver>:4985/<db_name>/_online

and got the following log messages in Sync Gateway:
HTTP: #27908: POST /<db_name>/_online (ADMIN)
CRUD: Taking Database : <db_name>, online in 0 seconds
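
The log line alone doesn’t confirm that the database is actually serving requests again. A minimal Go sketch that sends the same _online request and then polls GET /<db_name> until the database reports itself online. It assumes the admin response to GET /<db_name> includes a "state" field ("Online"/"Offline"), which I believe Sync Gateway 1.x returns; the 30-second wait and 500 ms poll interval are arbitrary choices.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// dbInfo holds the single field we need from GET /<db_name> on the admin port.
// Assumption: the response carries a "state" field such as "Online" or "Offline".
type dbInfo struct {
	State string `json:"state"`
}

// parseState extracts the "state" field from a GET /<db_name> response body.
func parseState(body []byte) (string, error) {
	var info dbInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return "", err
	}
	return info.State, nil
}

// bringOnline POSTs to /<db_name>/_online, then polls GET /<db_name> until the
// database reports "Online" or the wait time runs out.
func bringOnline(client *http.Client, adminURL, db string, wait time.Duration) error {
	resp, err := client.Post(adminURL+"/"+db+"/_online", "application/json", nil)
	if err != nil {
		return err
	}
	resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("_online returned %s", resp.Status)
	}

	deadline := time.Now().Add(wait)
	for time.Now().Before(deadline) {
		if resp, err := client.Get(adminURL + "/" + db); err == nil {
			body, readErr := io.ReadAll(resp.Body)
			resp.Body.Close()
			if readErr == nil {
				if state, err := parseState(body); err == nil && state == "Online" {
					return nil
				}
			}
		}
		time.Sleep(500 * time.Millisecond)
	}
	return fmt.Errorf("database %q not Online after %s", db, wait)
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: go run . http://<syncgatewayserver>:4985 <db_name>")
		return
	}
	client := &http.Client{Timeout: 10 * time.Second}
	if err := bringOnline(client, os.Args[1], os.Args[2], 30*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("database reports Online")
}
```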

But I still can’t see anything in the Sync Gateway admin web interface, and the _all_docs request seems to get stuck for some reason and isn’t logged either. I do get responses to some other API requests, such as /<db_name>.

So I’m trying to understand the following:

  1. Why am I not able to see any documents in the Sync Gateway admin web interface (:4985/_admin/db/<db_name>) or query _all_docs from the REST API?
  2. I don’t know what a TAP feed is, but I wouldn’t have expected a connection loss of any kind, considering I’m running Couchbase Server and Sync Gateway on the same host machine in separate Docker containers.

The TAP feed is how Sync Gateway is made aware of all changes that happen on Couchbase Server buckets. It works over sockets, and Sync Gateway keeps a long-running socket connection to Couchbase Server.

It looks like in your case the TAP connection was somehow severed, and Sync Gateway went into the offline state. Going offline is the right thing to do when the TAP connection is severed, but I think it’s worth trying to figure out why it was severed in the first place. The “connection refused” errors in your logs usually mean that nothing was listening on port 8091 at that address at that moment. How are you doing the networking between the containers? Is it a docker compose file, and are you specifying the container name? Are the containers running on the same host? If Couchbase Server, or the container it was running in, somehow restarted, you would expect to see this, because the TAP feed would be temporarily lost. Or, if you are running this on a laptop and the entire Linux VM was paused because the laptop went to sleep, that could also explain the networking hiccup.

It sounds like once it went offline, trying to put it back online failed and it somehow got into an unresponsive state. Which version of Sync Gateway are you using?
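
If it isn’t obvious which version is running (for example, when an image tag floats), the root endpoint reports it. A minimal Go sketch, assuming GET / on the admin port returns JSON with a top-level "version" string; the default URL http://localhost:4985 is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// rootInfo holds the version string from GET / on Sync Gateway.
// Assumption: the root response has a top-level "version" field.
type rootInfo struct {
	Version string `json:"version"`
}

// parseVersion extracts the "version" field from a GET / response body.
func parseVersion(body []byte) (string, error) {
	var info rootInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return "", err
	}
	return info.Version, nil
}

func main() {
	base := "http://localhost:4985" // placeholder; pass the admin URL as the first argument
	if len(os.Args) > 1 {
		base = os.Args[1]
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(base + "/")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	version, err := parseVersion(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(version)
}
```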

Our Quality Engineering team is currently working on docker-based testing, so they might run into this same issue.

I’ll try to reproduce this by kicking off a Couchbase Server + Sync Gateway instance on an AWS EC2 instance and letting it run for a few days.

I did notice that the default “latest” tag for the Sync Gateway image was pointing to an older 1.3 version of Sync Gateway. It’s fixed now, so running “docker pull couchbase/sync-gateway” will give you the latest release of Sync Gateway, 1.4.0.2, which might include a fix for the issue where Sync Gateway was not going back online correctly.

Hi @traun
I’m running both Couchbase Server and Sync Gateway on a single host machine on AWS via docker-compose. I’m using the community images right now. Is Sync Gateway 1.4 also available as a community release?
Also, as I observed, the Sync Gateway API wasn’t completely unresponsive: I did try a couple of other endpoints such as /db_name, but /db/_all_docs seemed to be unresponsive, both as a REST endpoint and from the admin web interface.
Also, I’m trying to understand the DCP feed type, which is an alternative to TAP. Can you shed some light on what the differences are and which cases suit each feed type?
Thanks.
Here is what my docker-compose file looks like:

  version: '2'
  services:
    couchbase-server:
      image: couchbase:community-4.5.0
      networks:
        couchbase:
          aliases: 
            - couchbase-server
      ports:
        - "8091-8094:8091-8094"
        - "11210:11210"
    couchbase-sync-gateway: 
      image: couchbase/sync-gateway:1.3.1-community
      networks:
        couchbase:
          aliases: 
            - couchbase-sync-gateway
        default:
          aliases: 
            - couchbase-sync-gateway
      ports:
        - "4984:4984"
        - "4985:4985"
      command: "-adminInterface :4985 /tmp/config/config.json"

  networks: 
    couchbase:
      driver: bridge

Here is the docker-compose file I created for this test:

I didn’t see any issues with this test.

Yep. Is there somewhere you are looking for SG 1.4 community and not seeing it? If so, can you post a link?

I think the db went offline and got stuck. Can you retest with Sync Gateway 1.4 and see if you can reproduce the same issue?

DCP is a newer protocol and will be the default soon: see “Make DCP the default feed type” by ajres (couchbase/sync_gateway pull request #2462). That said, TAP is currently still the default and undergoes more release testing, so I would probably recommend sticking with TAP until DCP becomes the default.

DCP has additional features like being able to restart the stream at a particular snapshot point. I’d recommend watching https://www.youtube.com/watch?v=j5T7wELj9Wc and checking out some of the docs to learn more.
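
To make “restart the stream at a particular snapshot point” more concrete: a DCP consumer can remember, per vbucket, the last sequence number it processed together with the vbucket UUID and the snapshot bounds, and on reconnect ask the server to resume from there rather than starting over. This Go sketch only builds those resume parameters. The types and names are hypothetical, it doesn’t speak the wire protocol, and it ignores the case where the server rejects the saved UUID and asks the client to roll back.

```go
package main

import "fmt"

// Hypothetical types for illustration only; they are not Sync Gateway's or
// any Couchbase client library's API.

// checkpoint is what a consumer could persist per vbucket after processing.
type checkpoint struct {
	VBUUID    uint64 // which branch of the vbucket's history Seqno belongs to
	Seqno     uint64 // last sequence number fully processed
	SnapStart uint64 // bounds of the snapshot Seqno was read from
	SnapEnd   uint64
}

// streamRequest mirrors the parameters carried by a DCP stream request.
type streamRequest struct {
	VBucket   uint16
	VBUUID    uint64
	StartSeq  uint64
	EndSeq    uint64
	SnapStart uint64
	SnapEnd   uint64
}

// maxSeqno used as the end sequence asks for an open-ended stream.
const maxSeqno = ^uint64(0)

// resumeRequests builds one stream request per vbucket: vbuckets with a saved
// checkpoint resume from it, the rest start from sequence 0.
func resumeRequests(numVBuckets uint16, saved map[uint16]checkpoint) []streamRequest {
	reqs := make([]streamRequest, 0, numVBuckets)
	for vb := uint16(0); vb < numVBuckets; vb++ {
		req := streamRequest{VBucket: vb, EndSeq: maxSeqno}
		if cp, ok := saved[vb]; ok {
			req.VBUUID = cp.VBUUID
			req.StartSeq = cp.Seqno
			req.SnapStart = cp.SnapStart
			req.SnapEnd = cp.SnapEnd
		}
		reqs = append(reqs, req)
	}
	return reqs
}

func main() {
	saved := map[uint16]checkpoint{
		2: {VBUUID: 0x1234, Seqno: 150, SnapStart: 100, SnapEnd: 200},
	}
	for _, r := range resumeRequests(4, saved) {
		fmt.Printf("vb %d: start=%d snapshot=[%d,%d] uuid=%#x\n",
			r.VBucket, r.StartSeq, r.SnapStart, r.SnapEnd, r.VBUUID)
	}
}
```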

Thanks @traun

I was looking at Docker and didn’t find a “community” suffix on any 1.4 tag, so I assumed only 1.3 was community and 1.4 was enterprise.

I’ll see what I can do. The last time, it sort of just happened, so I’m not sure I can reproduce the exact steps. It doesn’t seem to have happened since then. Thanks.