HTTP/1.1 400 error when creating replication across datacenters

I’m using Couchbase 2.2 on CentOS 6.5. I have 4 clusters configured in Rackspace: 2 in one datacenter (called ORD), one in each of 3 others (DFW, IAD, LON).

Replication works when set up between the two clusters in the same datacenter, but not between different datacenters. I’ve verified with Rackspace that the network settings are all correct: the iptables firewalls are (currently) turned off, and all ports are open and listening. I can telnet on both 8091 and 8092 between machines in the clusters.
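The telnet check described above can be repeated from each node as a script. This is a minimal sketch, not something taken from this setup: it assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command, and the function names check_port and check_ports are made up for illustration. The caller passes the ports in, so nothing beyond what the post names is hard-coded.

```shell
# check_port HOST PORT
# Prints "open" if a TCP connection succeeds within 5 seconds, else "closed".
# Uses bash's /dev/tcp pseudo-device, so no telnet or nc is needed.
check_port() {
  local host=$1 port=$2
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# check_ports HOST PORT [PORT...]
# Prints one "HOST:PORT state" line per port for a single remote node.
check_ports() {
  local host=$1 port
  shift
  for port in "$@"; do
    printf '%s:%s %s\n' "$host" "$port" "$(check_port "$host" "$port")"
  done
}

# Example call (hypothetical host name):
#   check_ports dfw-node-1.example.com 8091 8092 11210
```

Ports 8091 and 8092 are the ones tested above. The example call also probes 11210, on the assumption that the data port may be involved in cross-cluster replication; drop it if that does not apply. A successful TCP connect shows only that the port is reachable, not that the remote node advertises an address the source cluster can use.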

I’ve tried setting up replication both with the browser GUI and with the REST API. From the command line I can query the other datacenters and see all the connections I’ve set up, but I cannot create a replication. Here is the output from two failed attempts to connect ORD-1 to DFW, with two slightly different errors, and a successful connection from ORD-1 to ORD-2.

Any suggestions/help is appreciated.

thx
anthony

[dev1@prod-ord-couch-01 ~]$ curl -v -X POST -u root: http://:8091/controller/createReplication -d 056cc0565912c179567668d907b95189 -d fromBucket=default -d toCluster=in-dfw-cluster -d toBucket=default -d replicationType=continuous

* About to connect() to port 8091 (#0)
* Trying … connected
* Connected to () port 8091 (#0)
* Server auth using Basic with user 'root'

> POST /controller/createReplication HTTP/1.1
> Authorization: Basic cm9vdDpsYXN0Y2hhbmNl
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: 192.237.244.139:8091
> Accept: */*
> Content-Length: 120
> Content-Type: application/x-www-form-urlencoded

< HTTP/1.1 400 Bad Request
< Server: Couchbase Server 2.2.0-837-rel-community
< Pragma: no-cache
< Date: Fri, 07 Nov 2014 23:36:26 GMT
< Content-Type: application/json
< Content-Length: 71
< Cache-Control: no-cache
<

* Connection #0 to host left intact
* Closing connection #0
{"errors":{"_":"Timeout exceeded when trying to reach remote cluster"}}

[dev1@prod-ord-couch-01 ~]$ curl -v -X POST -u root: http://192.237.244.139:8091/controller/createReplication -d 056cc0565912c179567668d907b95189 -d fromBucket=default -d toCluster=in-dfw-cluster -d toBucket=default -d replicationType=continuous

* About to connect() to port 8091 (#0)
* Trying … connected
* Connected to () port 8091 (#0)
* Server auth using Basic with user 'root'

> POST /controller/createReplication HTTP/1.1
> Authorization: Basic cm9vdDpsYXN0Y2hhbmNl
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: :8091
> Accept: */*
> Content-Length: 120
> Content-Type: application/x-www-form-urlencoded

< HTTP/1.1 400 Bad Request
< Server: Couchbase Server 2.2.0-837-rel-community
< Pragma: no-cache
< Date: Fri, 07 Nov 2014 23:37:23 GMT
< Content-Type: application/json
< Content-Length: 80
< Cache-Control: no-cache
<

* Connection #0 to host left intact
* Closing connection #0
{"errors":{"_":"Failed to grab remote bucket defaultfrom any of known nodes"}}

=============================================
And here’s a successful connection from ORD-1 to ORD-2:

[dev1@prod-ord-couch-01 ~]$ curl -v -X POST -u root: http://:8091/controller/createReplication -d 056cc0565912c179567668d907b95189 -d fromBucket=default -d toCluster=ord-cluster-02 -d toBucket=default -d replicationType=continuous

* About to connect() to port 8091 (#0)
* Trying … connected
* Connected to () port 8091 (#0)
* Server auth using Basic with user 'root'

> POST /controller/createReplication HTTP/1.1
> Authorization: Basic cm9vdDpsYXN0Y2hhbmNl
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: :8091
> Accept: */*
> Content-Length: 120
> Content-Type: application/x-www-form-urlencoded

< HTTP/1.1 200 OK
< Server: Couchbase Server 2.2.0-837-rel-community
< Pragma: no-cache
< Date: Fri, 07 Nov 2014 23:35:24 GMT
< Content-Type: application/json
< Content-Length: 109
< Cache-Control: no-cache
<

* Connection #0 to host left intact
* Closing connection #0
{"database":"http://:8092/_replicator","id":"f71fd537dabe8f3e31935a57f82ac6cf/default/default"}

Since the REST request here asks the cluster to do something, and the details of what it is doing are on the node and in its logs, it may be best to file an issue with a cbcollect_info from the node that you’re making the request to and getting the 400 back from. That would give us some detail on what’s happening behind this:
{"errors":{"_":"Failed to grab remote bucket defaultfrom any of known nodes"}}
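Collecting that archive on the node that returned the 400 might look like the sketch below. cbcollect_info ships with Couchbase Server and takes the output file name as its argument. The install path /opt/couchbase/bin is the usual Linux default but is an assumption here, as are the wrapper name collect_node_logs and the default output location under /tmp.

```shell
# Path to the collector; override CBCOLLECT if Couchbase is installed elsewhere.
# /opt/couchbase/bin is assumed, not confirmed for this installation.
CBCOLLECT=${CBCOLLECT:-/opt/couchbase/bin/cbcollect_info}

# collect_node_logs [OUTFILE]
# Runs cbcollect_info on the local node and prints the archive path on success.
# Returns 1 with a message on stderr if the tool is not where we expect it.
collect_node_logs() {
  local out=${1:-/tmp/cbcollect-$(hostname)-$(date +%Y%m%d-%H%M%S).zip}
  if [ ! -x "$CBCOLLECT" ]; then
    echo "cbcollect_info not found at $CBCOLLECT" >&2
    return 1
  fi
  "$CBCOLLECT" "$out" && echo "$out"
}

# Example call on the node that answered with the 400:
#   collect_node_logs /tmp/ord-01-collect.zip
```

Run it as a user that can read the Couchbase log directory (often root), then attach the resulting zip to the issue.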

Done, thanks

https://www.couchbase.com/issues/browse/MB-12610

thx
anthony