Inconsistent N1QL query performance

Hi all,

I’m experiencing N1QL query performance issues on the basis of the following factors:

  • Cluster vs non-cluster
  • Queries run from the web console vs SDK

In Production I have a cluster of 3 Couchbase 4.5 CE nodes running on CentOS machines in Amazon AWS. In local/development environment I’m using the same Couchbase version running OS X. I’m also using the Java SDK v2.4.7.

Among other document types/structure, I have some like this (~100 in Production):

{
  "identifier": "663a10ba-3c4f-445f-bdb2-dcc5f15b703e",
  "targetFlight": {
    "identifier": "e62530b5-4104-4f0c-ae02-10688f1f00ab",
    "flightEvent": {
      "deptUTCOffset": 0,
      "identifier": "LA8124 SCL#20171030T1440~ANF#20171030T1820",
      "hoursToDepart": 169,
      "departureAsString": "Mon, 30 Oct 2017 14:40",
      "arrival": "2017-10-30T18:20:00.000Z",
      "direct": false,
      "numOfStops": 0,
      "arrivalStation": "ANF",
      "arrvUTCOffset": 0,
      "fares": [],
      "segments": [
        {
          "deptUTCOffset": 0,
          "arrival": "2017-10-30T18:20:00.000Z",
          "firstLeg": {
            "deptUTCOffset": 0,
            "sold": 160,
            "flightCode": {
              "flightNumberAsString": "8124",
              "flightCode": "LA8124",
              "carrierCode": "LA",
              "flightNumber": 8124
            },
            "arrival": "2017-10-30T18:20:00.000Z",
            "lid": 168,
            "inventoryDetails": {
              "totalAdjustedCapacity": 168,
              "totalLid": 168,
              "totalSoldSeats": 160,
              "countsByCabin": [
                {
                  "availableSeatsByLid": 12,
                  "loadFactor": 0,
                  "lid": 12,
                  "availableSeats": 12,
                  "loadFactorOverLid": 0,
                  "adjustedAvailableSeats": 12,
                  "adjustedCapacity": 12,
                  "soldSeats": 0,
                  "travelClassCode": "W"
                },
                {
                  "availableSeatsByLid": -4,
                  "loadFactor": 1.0256,
                  "lid": 156,
                  "availableSeats": -4,
                  "loadFactorOverLid": 1.0256,
                  "adjustedAvailableSeats": -4,
                  "adjustedCapacity": 156,
                  "soldSeats": 160,
                  "travelClassCode": "Y"
                }
              ],
              "totalLoadFactor": 0.9524
            },
            "arrivalStation": "ANF",
            "arrvUTCOffset": 0,
            "capacity": 168,
            "departure": "2017-10-30T14:40:00.000Z",
            "departureStation": "SCL",
            "irop": false,
            "key": "LA8124 SCL#20171030T1440~ANF#20171030T1820"
          },
          "flightReference": "20171030 LA8124 SCLANF",
          "arrivalStation": "ANF",
          "fares": [],
          "arrvUTCOffset": 0,
          "segmentNumber": -1,
          "ticketed": false,
          "outbound": false,
          "seatDesignatorList": [],
          "legs": [
            {
              "deptUTCOffset": 0,
              "sold": 160,
              "flightCode": {
                "flightNumberAsString": "8124",
                "flightCode": "LA8124",
                "carrierCode": "LA",
                "flightNumber": 8124
              },
              "arrival": "2017-10-30T18:20:00.000Z",
              "lid": 168,
              "inventoryDetails": {
                "totalAdjustedCapacity": 168,
                "totalLid": 168,
                "totalSoldSeats": 160,
                "countsByCabin": [
                  {
                    "availableSeatsByLid": 12,
                    "loadFactor": 0,
                    "lid": 12,
                    "availableSeats": 12,
                    "loadFactorOverLid": 0,
                    "adjustedAvailableSeats": 12,
                    "adjustedCapacity": 12,
                    "soldSeats": 0,
                    "travelClassCode": "W"
                  },
                  {
                    "availableSeatsByLid": -4,
                    "loadFactor": 1.0256,
                    "lid": 156,
                    "availableSeats": -4,
                    "loadFactorOverLid": 1.0256,
                    "adjustedAvailableSeats": -4,
                    "adjustedCapacity": 156,
                    "soldSeats": 160,
                    "travelClassCode": "Y"
                  }
                ],
                "totalLoadFactor": 0.9524
              },
              "arrivalStation": "ANF",
              "arrvUTCOffset": 0,
              "capacity": 168,
              "departure": "2017-10-30T14:40:00.000Z",
              "departureStation": "SCL",
              "irop": false,
              "key": "LA8124 SCL#20171030T1440~ANF#20171030T1820"
            }
          ],
          "ssrs": [],
          "originalSeatsAsString": "-",
          "paxSegments": [],
          "departureStation": "SCL",
          "international": false,
          "key": "LA8124 SCL#20171030T1440~ANF#20171030T1820",
          "paxSeats": [],
          "flightCode": {
            "flightNumberAsString": "8124",
            "flightCode": "LA8124",
            "carrierCode": "LA",
            "flightNumber": 8124
          },
          "lastLeg": {
            "deptUTCOffset": 0,
            "sold": 160,
            "flightCode": {
              "flightNumberAsString": "8124",
              "flightCode": "LA8124",
              "carrierCode": "LA",
              "flightNumber": 8124
            },
            "arrival": "2017-10-30T18:20:00.000Z",
            "lid": 168,
            "inventoryDetails": {
              "totalAdjustedCapacity": 168,
              "totalLid": 168,
              "totalSoldSeats": 160,
              "countsByCabin": [
                {
                  "availableSeatsByLid": 12,
                  "loadFactor": 0,
                  "lid": 12,
                  "availableSeats": 12,
                  "loadFactorOverLid": 0,
                  "adjustedAvailableSeats": 12,
                  "adjustedCapacity": 12,
                  "soldSeats": 0,
                  "travelClassCode": "W"
                },
                {
                  "availableSeatsByLid": -4,
                  "loadFactor": 1.0256,
                  "lid": 156,
                  "availableSeats": -4,
                  "loadFactorOverLid": 1.0256,
                  "adjustedAvailableSeats": -4,
                  "adjustedCapacity": 156,
                  "soldSeats": 160,
                  "travelClassCode": "Y"
                }
              ],
              "totalLoadFactor": 0.9524
            },
            "arrivalStation": "ANF",
            "arrvUTCOffset": 0,
            "capacity": 168,
            "departure": "2017-10-30T14:40:00.000Z",
            "departureStation": "SCL",
            "irop": false,
            "key": "LA8124 SCL#20171030T1440~ANF#20171030T1820"
          },
          "paxSSRs": [],
          "blockHours": 3.6666667,
          "route": "SCL-ANF",
          "nowDeparted": false,
          "seatDesignatorListOrderByPosition": [],
          "totalFaresByPaxType": {},
          "paxTickets": [],
          "blockTime": 220,
          "departure": "2017-10-30T14:40:00.000Z"
        }
      ],
      "flightNumber": -1,
      "arrivalAsString": "Mon, 30 Oct 2017 18:20",
      "route": "SCLANF",
      "flightCodes": "LA8124",
      "departure": "2017-10-30T14:40:00.000Z",
      "departureStation": "SCL",
      "stops": [],
      "key": "LA8124 SCL#20171030T1440~ANF#20171030T1820"
    },
    "id": "e62530b5-4104-4f0c-ae02-10688f1f00ab",
    "bookingCurveId": "19cf82ac-6401-4b19-bf94-abe8a824ba85",
    "type": "SR-TARGET-FLIGHT",
    "programId": "09cf82ac-6401-4b19-bf94-abe8a824ba85"
  },
  "incomplete": false,
  "alternativeFlights": [],
  "id": "663a10ba-3c4f-445f-bdb2-dcc5f15b703e",
  "program": {
    "identifier": "09cf82ac-6401-4b19-bf94-abe8a824ba85",
    "id": "09cf82ac-6401-4b19-bf94-abe8a824ba85",
    "type": "SR-PROGRAM",
    "flightsTargetingRules": {
      "flightsCriteria": {
        "targetCabin": "Y",
        "routes": [
          "SCL-ANF",
          "SCL-CJC",
          "LIM-CUZ",
          "BOG-CTG",
          "SCL-LIM",
          "SCL-EZE",
          "SCL-GRU",
          "SCL-MVD",
          "SCL-MIA"
        ]
      },
      "targetFlightsBookingCurve": [
        {
          "timeToDepartRange": {
            "start": -30,
            "unit": "DAYS",
            "end": -5
          },
          "goal": {
            "metric": "LOAD_FACTOR",
            "value": 1
          },
          "id": "19cf82ac-6401-4b19-bf94-abe8a824ba85",
          "inventory": {
            "metric": "LID_LOAD_FACTOR",
            "value": 1,
            "operator": "GEQT"
          }
        }
      ],
      "alternativeFlightsBookingCurve": [
        {
          "inventory": {
            "metric": "LOAD_FACTOR",
            "value": 0.85,
            "operator": "LEQT"
          },
          "timeToDepartRange": {
            "start": -32,
            "unit": "DAYS",
            "end": -3
          },
          "id": "29cf82ac-6401-4b19-bf94-abe8a824ba85"
        }
      ],
      "flightsMatchingCriteria": {
        "departureRange": {
          "start": -2,
          "unit": "DAYS",
          "end": 2
        }
      }
    }
  },
  "type": "SR-CAMPAIGN",
  "complete": true,
  "status": "ACTIVE"
}

I have the following indices in place:

CREATE PRIMARY index ON la;

CREATE INDEX `all-programs-idx` ON `default`(`type`) WHERE type = 'SR-PROGRAM';

CREATE INDEX `campaigns-search-idx` ON `default`(`type`, `program.id`, `targetFlight.flightEvent.key`, `status`) WHERE type = 'SR-CAMPAIGN';

CREATE INDEX `classified-flights-search-idx` ON `default`(`type`, `programId`) WHERE type = 'SR-TARGET-FLIGHT' OR type = 'SR-ALTERNATIVE-FLIGHT';

One of the problematic queries is the following, for instance:

SELECT id FROM la WHERE type = "SR-CAMPAIGN" AND program.id = "09cf82ac-6401-4b19-bf94-abe8a824ba85"

And it’s explain plan:

{
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "PrimaryScan",
                        "index": "#primary",
                        "keyspace": "la",
                        "namespace": "default",
                        "using": "gsi"
                    },
                    {
                        "#operator": "Fetch",
                        "keyspace": "la",
                        "namespace": "default"
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "Filter",
                                    "condition": "(((`la`.`type`) = \"SR-CAMPAIGN\") and (((`la`.`program`).`id`) = \"09cf82ac-6401-4b19-bf94-abe8a824ba85\"))"
                                },
                                {
                                    "#operator": "InitialProject",
                                    "result_terms": [
                                        {
                                            "expr": "(`la`.`id`)"
                                        }
                                    ]
                                },
                                {
                                    "#operator": "FinalProject"
                                }
                            ]
                        }
                    }
                ]
            }

I’m experiencing the following behaviour patterns:

  • Development environment
    • Always works fine, with both test data and Production data (bucket restored locally)
  • Production
    • Web console: always works (only tested from the same node, though)
    • Java SDK
      • Cluster: times out
      • Single node: works
    • Command line cbc-n1ql from application host
      • Cluster: times out
      • Single node: works
    • Command line cbc-n1ql from one of the nodes
      • Cluster: times out
      • Single node: works

Fun fact: if I remove the program.id filter from the WHERE clause it works fine in all cases.

Hypotheses I have worked out so far:

  • Indices problem (discarded). The explain plan always shows that it resolves the query by using the index campaigns-search-idx. Anyways, if I delete all the indices except the primary the result is still the same in all the scenarios mentioned above.
  • Indices deployment/distribution (plausible). While in a cluster, the indices are distributed among the three nodes, maybe this is the issue?
  • Network issue (plausible). Same reason as before, however I have tried to remove all security groups, stateful filters and so forth getting the same results.

My conclusion is that the problem only seems to happen while running in the cluster, however I cannot find out why. I’d appreciate if anybody can provide some advice for such strange behaviour (at least from my position).

Additionally, in case it’s meaningful here is the cbc-n1ql output:

[ec2-user@america ~]$ cbc-n1ql 'SELECT id FROM la WHERE type = "SR-CAMPAIGN" AND program.id = "09cf82ac-6401-4b19-bf94-abe8a824ba85"' -U couchbase://cb1.caravelo.vpc/la -P ********** -vvv
0ms [I0] {26741} [INFO] (instance - L:420) Version=2.8.2, Changeset=5b8d57a3a0051f43792b14db541880a2638b357a
0ms [I0] {26741} [INFO] (instance - L:421) Effective connection string: couchbase://cb1.caravelo.vpc/la?console_log_level=4&. Bucket=la
1ms [I0] {26741} [INFO] (instance - L:135) DNS SRV lookup failed: DNS/Hostname lookup failed. Ignore this if not relying on DNS SRV records
1ms [I0] {26741} [DEBUG] (instance - L:77) Adding host cb1.caravelo.vpc:8091 to initial HTTP bootstrap list
1ms [I0] {26741} [DEBUG] (instance - L:77) Adding host cb1.caravelo.vpc:11210 to initial CCCP bootstrap list
1ms [I0] {26741} [DEBUG] (confmon - L:85) Preparing providers (this may be called multiple times)
1ms [I0] {26741} [DEBUG] (confmon - L:92) Provider CCCP is ENABLED
1ms [I0] {26741} [DEBUG] (confmon - L:92) Provider HTTP is ENABLED
1ms [I0] {26741} [TRACE] (confmon - L:272) Refreshing current cluster map
1ms [I0] {26741} [TRACE] (confmon - L:259) Attempting to retrieve cluster map via CCCP
1ms [I0] {26741} [INFO] (cccp - L:149) Requesting connection to node cb1.caravelo.vpc:11210 for CCCP configuration
1ms [I0] {26741} [DEBUG] (lcbio_mgr - L:413) <cb1.caravelo.vpc:11210> (HE=0x1025100) Creating new connection because none are available in the pool
1ms [I0] {26741} [TRACE] (lcbio_mgr - L:328) <cb1.caravelo.vpc:11210> (HE=0x1025100) New pool entry: I=0x1026370
1ms [I0] {26741} [INFO] (connection - L:464) <cb1.caravelo.vpc:11210> (SOCK=0x1026500) Starting. Timeout=2000000us
2ms [I0] {26741} [DEBUG] (connection - L:230) <cb1.caravelo.vpc:11210> (SOCK=0x1026500) Created new socket with FD=3
2ms [I0] {26741} [TRACE] (connection - L:331) <cb1.caravelo.vpc:11210> (SOCK=0x1026500) Scheduling I/O watcher for asynchronous connection completion.
2ms [I0] {26741} [INFO] (connection - L:139) <cb1.caravelo.vpc:11210> (SOCK=0x1026500) Connected established
2ms [I0] {26741} [DEBUG] (connection - L:98) <cb1.caravelo.vpc:11210> (SOCK=0x1026500) Successfully set TCP_NODELAY
2ms [I0] {26741} [DEBUG] (connection - L:98) <cb1.caravelo.vpc:11210> (SOCK=0x1026500) Successfully set TCP_KEEPALIVE
2ms [I0] {26741} [DEBUG] (lcbio_mgr - L:287) <cb1.caravelo.vpc:11210> (HE=0x1025100) Received result for I=0x1026370,C=(nil); E=0x0
2ms [I0] {26741} [DEBUG] (lcbio_mgr - L:247) <cb1.caravelo.vpc:11210> (HE=0x1025100) Assigning R=0x10233f0 SOCKET=0x1026500
2ms [I0] {26741} [DEBUG] (ioctx - L:101) <cb1.caravelo.vpc:11210> (CTX=0x1028210,unknown) Pairing with SOCK=0x1026500
3ms [I0] {26741} [DEBUG] (negotiation - L:418) <cb1.caravelo.vpc:11210> (SASLREQ=0x1028030) Server supports feature: 0x3 (TCP NODELAY)
3ms [I0] {26741} [TRACE] (negotiation - L:563) <cb1.caravelo.vpc:11210> (SASLREQ=0x1028030) GET_ERRORMAP unsupported/disabled
4ms [I0] {26741} [DEBUG] (ioctx - L:151) <cb1.caravelo.vpc:11210> (CTX=0x1028210,sasl) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1
4ms [I0] {26741} [DEBUG] (ioctx - L:101) <cb1.caravelo.vpc:11210> (CTX=0x1030770,unknown) Pairing with SOCK=0x1026500
4ms [I0] {26741} [DEBUG] (ioctx - L:151) <cb1.caravelo.vpc:11210> (CTX=0x1030770,bc_cccp) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1
4ms [I0] {26741} [INFO] (lcbio_mgr - L:463) <cb1.caravelo.vpc:11210> (HE=0x1025100) Placing socket back into the pool. I=0x1026370,C=0x1026500
4ms [I0] {26741} [INFO] (confmon - L:158) Setting new configuration. Received via CCCP
4ms [I0] {26741} [DEBUG] (bootstrap - L:56) Instance configured
5ms [I0] {26741} [DEBUG] (confmon - L:85) Preparing providers (this may be called multiple times)
5ms [I0] {26741} [DEBUG] (confmon - L:92) Provider CCCP is ENABLED
---> Encoded query: {"statement":"SELECT id FROM la WHERE type = \"SR-CAMPAIGN\" AND program.id = \"09cf82ac-6401-4b19-bf94-abe8a824ba85\""}

5ms [I0] {26741} [TRACE] (http-io - L:228) <cb2.caravelo.vpc:8093> POST  http://cb2.caravelo.vpc:8093/query/service. Body=150 bytes
5ms [I0] {26741} [DEBUG] (lcbio_mgr - L:413) <cb2.caravelo.vpc:8093> (HE=0x1030b30) Creating new connection because none are available in the pool
5ms [I0] {26741} [TRACE] (lcbio_mgr - L:328) <cb2.caravelo.vpc:8093> (HE=0x1030b30) New pool entry: I=0x1030c50
5ms [I0] {26741} [INFO] (connection - L:464) <cb2.caravelo.vpc:8093> (SOCK=0x1030df0) Starting. Timeout=75000000us
5ms [I0] {26741} [DEBUG] (connection - L:230) <cb2.caravelo.vpc:8093> (SOCK=0x1030df0) Created new socket with FD=4
5ms [I0] {26741} [TRACE] (connection - L:331) <cb2.caravelo.vpc:8093> (SOCK=0x1030df0) Scheduling I/O watcher for asynchronous connection completion.
5ms [I0] {26741} [INFO] (connection - L:139) <cb2.caravelo.vpc:8093> (SOCK=0x1030df0) Connected established
5ms [I0] {26741} [DEBUG] (connection - L:98) <cb2.caravelo.vpc:8093> (SOCK=0x1030df0) Successfully set TCP_NODELAY
5ms [I0] {26741} [DEBUG] (connection - L:98) <cb2.caravelo.vpc:8093> (SOCK=0x1030df0) Successfully set TCP_KEEPALIVE
5ms [I0] {26741} [DEBUG] (lcbio_mgr - L:287) <cb2.caravelo.vpc:8093> (HE=0x1030b30) Received result for I=0x1030c50,C=(nil); E=0x0
5ms [I0] {26741} [DEBUG] (lcbio_mgr - L:247) <cb2.caravelo.vpc:8093> (HE=0x1030b30) Assigning R=0x10309c0 SOCKET=0x1030df0
5ms [I0] {26741} [DEBUG] (ioctx - L:101) <cb2.caravelo.vpc:8093> (CTX=0x10255b0,unknown) Pairing with SOCK=0x1030df0
2507ms [I0] {26741} [TRACE] (bootstrap - L:143) Background-polling for new configuration
2507ms [I0] {26741} [TRACE] (confmon - L:272) Refreshing current cluster map
2507ms [I0] {26741} [TRACE] (confmon - L:259) Attempting to retrieve cluster map via CCCP
2507ms [I0] {26741} [DEBUG] (cccp - L:143) Re-Issuing CCCP Command on server struct 0x10353d0 (cb1.caravelo.vpc:11210)
2507ms [I0] {26741} [INFO] (lcbio_mgr - L:406) <cb1.caravelo.vpc:11210> (HE=0x1025100) Found ready connection in pool. Reusing socket and not creating new connection
2507ms [I0] {26741} [DEBUG] (lcbio_mgr - L:247) <cb1.caravelo.vpc:11210> (HE=0x1025100) Assigning R=0x10309c0 SOCKET=0x1026500
2507ms [I0] {26741} [DEBUG] (ioctx - L:101) <cb1.caravelo.vpc:11210> (CTX=0x1030d20,unknown) Pairing with SOCK=0x1026500
2508ms [I0] {26741} [DEBUG] (confmon - L:150) Not applying configuration received via CCCP. No changes detected. A.rev=6277, B.rev=6277
5009ms [I0] {26741} [TRACE] (bootstrap - L:143) Background-polling for new configuration
5009ms [I0] {26741} [TRACE] (confmon - L:272) Refreshing current cluster map
5009ms [I0] {26741} [TRACE] (server - L:576) <cb1.caravelo.vpc:11210> (SRV=0x10353d0,IX=0) Scheduling next timeout for 2500 ms. This is not an error
5009ms [I0] {26741} [DEBUG] (confmon - L:150) Not applying configuration received via CCCP. No changes detected. A.rev=6277, B.rev=6277
5009ms [I0] {26741} [TRACE] (confmon - L:259) Attempting to retrieve cluster map via CCCP
5009ms [I0] {26741} [DEBUG] (cccp - L:143) Re-Issuing CCCP Command on server struct 0x1037d60 (cb2.caravelo.vpc:11210)
5009ms [I0] {26741} [DEBUG] (lcbio_mgr - L:413) <cb2.caravelo.vpc:11210> (HE=0x104cac0) Creating new connection because none are available in the pool
5009ms [I0] {26741} [TRACE] (lcbio_mgr - L:328) <cb2.caravelo.vpc:11210> (HE=0x104cac0) New pool entry: I=0x104cc20
5009ms [I0] {26741} [INFO] (connection - L:464) <cb2.caravelo.vpc:11210> (SOCK=0x10313c0) Starting. Timeout=2500000us
5009ms [I0] {26741} [DEBUG] (connection - L:230) <cb2.caravelo.vpc:11210> (SOCK=0x10313c0) Created new socket with FD=5
5009ms [I0] {26741} [TRACE] (connection - L:331) <cb2.caravelo.vpc:11210> (SOCK=0x10313c0) Scheduling I/O watcher for asynchronous connection completion.
5009ms [I0] {26741} [INFO] (connection - L:139) <cb2.caravelo.vpc:11210> (SOCK=0x10313c0) Connected established
5009ms [I0] {26741} [DEBUG] (connection - L:98) <cb2.caravelo.vpc:11210> (SOCK=0x10313c0) Successfully set TCP_NODELAY
5009ms [I0] {26741} [DEBUG] (connection - L:98) <cb2.caravelo.vpc:11210> (SOCK=0x10313c0) Successfully set TCP_KEEPALIVE
5009ms [I0] {26741} [DEBUG] (lcbio_mgr - L:287) <cb2.caravelo.vpc:11210> (HE=0x104cac0) Received result for I=0x104cc20,C=(nil); E=0x0
5009ms [I0] {26741} [DEBUG] (lcbio_mgr - L:247) <cb2.caravelo.vpc:11210> (HE=0x104cac0) Assigning R=0x10309c0 SOCKET=0x10313c0
5009ms [I0] {26741} [TRACE] (server - L:633) <cb2.caravelo.vpc:11210> (SRV=0x1037d60) Session not yet negotiated. Negotiating
5009ms [I0] {26741} [DEBUG] (ioctx - L:101) <cb2.caravelo.vpc:11210> (CTX=0x10768e0,unknown) Pairing with SOCK=0x10313c0
5010ms [I0] {26741} [DEBUG] (negotiation - L:418) <cb2.caravelo.vpc:11210> (SASLREQ=0x1076700) Server supports feature: 0x3 (TCP NODELAY)
5010ms [I0] {26741} [TRACE] (negotiation - L:563) <cb2.caravelo.vpc:11210> (SASLREQ=0x1076700) GET_ERRORMAP unsupported/disabled
5011ms [I0] {26741} [DEBUG] (ioctx - L:151) <cb2.caravelo.vpc:11210> (CTX=0x10768e0,sasl) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1
5011ms [I0] {26741} [DEBUG] (ioctx - L:101) <cb2.caravelo.vpc:11210> (CTX=0x104d340,unknown) Pairing with SOCK=0x10313c0
5011ms [I0] {26741} [DEBUG] (confmon - L:150) Not applying configuration received via CCCP. No changes detected. A.rev=6277, B.rev=6277
7510ms [I0] {26741} [TRACE] (bootstrap - L:143) Background-polling for new configuration
7510ms [I0] {26741} [TRACE] (confmon - L:272) Refreshing current cluster map
7510ms [I0] {26741} [TRACE] (server - L:576) <cb1.caravelo.vpc:11210> (SRV=0x10353d0,IX=0) Scheduling next timeout for 2500 ms. This is not an error
7510ms [I0] {26741} [TRACE] (server - L:576) <cb2.caravelo.vpc:11210> (SRV=0x1037d60,IX=1) Scheduling next timeout for 2500 ms. This is not an error
7510ms [I0] {26741} [DEBUG] (confmon - L:150) Not applying configuration received via CCCP. No changes detected. A.rev=6277, B.rev=6277
7510ms [I0] {26741} [TRACE] (confmon - L:259) Attempting to retrieve cluster map via CCCP
7510ms [I0] {26741} [DEBUG] (cccp - L:143) Re-Issuing CCCP Command on server struct 0x1038b50 (cb3.caravelo.vpc:11210)
7510ms [I0] {26741} [DEBUG] (lcbio_mgr - L:413) <cb3.caravelo.vpc:11210> (HE=0x104d490) Creating new connection because none are available in the pool
7510ms [I0] {26741} [TRACE] (lcbio_mgr - L:328) <cb3.caravelo.vpc:11210> (HE=0x104d490) New pool entry: I=0x1076950
7510ms [I0] {26741} [INFO] (connection - L:464) <cb3.caravelo.vpc:11210> (SOCK=0x1076b10) Starting. Timeout=2500000us
7511ms [I0] {26741} [DEBUG] (connection - L:230) <cb3.caravelo.vpc:11210> (SOCK=0x1076b10) Created new socket with FD=6
7511ms [I0] {26741} [TRACE] (connection - L:331) <cb3.caravelo.vpc:11210> (SOCK=0x1076b10) Scheduling I/O watcher for asynchronous connection completion.
7511ms [I0] {26741} [INFO] (connection - L:139) <cb3.caravelo.vpc:11210> (SOCK=0x1076b10) Connected established
7511ms [I0] {26741} [DEBUG] (connection - L:98) <cb3.caravelo.vpc:11210> (SOCK=0x1076b10) Successfully set TCP_NODELAY
7511ms [I0] {26741} [DEBUG] (connection - L:98) <cb3.caravelo.vpc:11210> (SOCK=0x1076b10) Successfully set TCP_KEEPALIVE
7511ms [I0] {26741} [DEBUG] (lcbio_mgr - L:287) <cb3.caravelo.vpc:11210> (HE=0x104d490) Received result for I=0x1076950,C=(nil); E=0x0
7511ms [I0] {26741} [DEBUG] (lcbio_mgr - L:247) <cb3.caravelo.vpc:11210> (HE=0x104d490) Assigning R=0x1076a60 SOCKET=0x1076b10
7511ms [I0] {26741} [TRACE] (server - L:633) <cb3.caravelo.vpc:11210> (SRV=0x1038b50) Session not yet negotiated. Negotiating
7511ms [I0] {26741} [DEBUG] (ioctx - L:101) <cb3.caravelo.vpc:11210> (CTX=0x107b130,unknown) Pairing with SOCK=0x1076b10
7513ms [I0] {26741} [DEBUG] (negotiation - L:418) <cb3.caravelo.vpc:11210> (SASLREQ=0x107af50) Server supports feature: 0x3 (TCP NODELAY)
7513ms [I0] {26741} [TRACE] (negotiation - L:563) <cb3.caravelo.vpc:11210> (SASLREQ=0x107af50) GET_ERRORMAP unsupported/disabled
7514ms [I0] {26741} [DEBUG] (ioctx - L:151) <cb3.caravelo.vpc:11210> (CTX=0x107b130,sasl) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1
7514ms [I0] {26741} [DEBUG] (ioctx - L:101) <cb3.caravelo.vpc:11210> (CTX=0x106d8c0,unknown) Pairing with SOCK=0x1076b10
7514ms [I0] {26741} [DEBUG] (confmon - L:150) Not applying configuration received via CCCP. No changes detected. A.rev=6277, B.rev=6277
^C

As you can see, there is a patter that repeats until the time out occurs (time out error hidden to reduce verbosity).

Thanks in advance for any help provided.

Cheers.

Create following index

CREATE INDEX ix1 ON la(program.id, id) WHERE type = "SR-CAMPAIGN" ;

Thanks for your reply but I already tried it in my previous tests as in the beginning I thought it was an indices problem; it doesn’t make any difference.

I just found a solution which I’ll post shortly.

Issue solved, but not the underlying root cause could not be determined.

These are the steps we followed:

  1. Remove all nodes from the cluster to leave it with just a single one (let’s name it node 1). The query works.
  2. Add the second node (node 2) to the cluster. The query works.
  3. Add the third node (node 3) to the cluster. The query doesn’t work any more (same problem as explained in my first post).
  4. Clone the bucked and restore it in a completely different 3-nodes cluster. The query works.

Originally the concerned cluster was a 2-nodes cluster with Couchbase 4.0 CE installed from a bundled AMI. Recently we:

  1. Upgraded them to Couchbase 4.5
  2. Launched a new blank machine in which we installed Couchbase 4.5 from scratch (no bundled AMI used)
  3. Added the third node to the cluster (node 3)

Even though we could not find the exact problem that caused the strange behaviour, it seems to be clear that the problem is caused by the different setups used for nodes 1 & 2 on one hand and node 3 on the other.

I hope it helps others.

Cheers.