Facet query `size` behavior

(We’re running Enterprise Edition 6.6.1 build 9213)

In a bucket we have many (thousands of) Thing documents:

[
  {
    "objectType": "Thing",
    "id": "c67536b1",
    "contactInfo": {
      "address": {
        "state": "WA"
      }
    }
  },
  {
    "objectType": "Thing",
    "id": "48af2bcc",
    "contactInfo": {
      "address": {
        "state": "CA"
      }
    }
  },
  {
    "objectType": "Thing",
    "id": "b9f9476c",
    "contactInfo": {
      "address": {
        "state": "IL"
      }
    }
  }
  // etc...
]

and we have an FTS index for these documents:

{
  "type": "fulltext-index",
  "name": "fts_idx_thing",
  "sourceType": "couchbase",
  "sourceName": "thing-bucket",
  "planParams": {
    "maxPartitionsPerPIndex": 171,
    "indexPartitions": 6
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "type_field",
      "type_field": "objectType"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": true,
      "index_dynamic": true,
      "store_dynamic": true,
      "type_field": "_type",
      "types": {
        "Thing": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "contactInfo": {
              "dynamic": false,
              "enabled": true,
              "properties": {
                "address": {
                  "dynamic": false,
                  "enabled": true,
                  "properties": {
                    "state": {
                      "dynamic": false,
                      "enabled": true,
                      "fields": [
                        {
                          "analyzer": "keyword",
                          "docvalues": true,
                          "include_in_all": true,
                          "include_term_vectors": true,
                          "index": true,
                          "name": "state",
                          "type": "text"
                        }
                      ]
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch"
    }
  },
  "sourceParams": {}
}

When we execute a facet query against that index:

{
    "query": {
        "must_not": {
            "disjuncts": [{
                "field": "deletedAt",
                "end": "2021-02-12T17:35:06.901Z",
                "inclusive_end": true
            }]
        }
    },
    "facets": {
        "states": {
            "field": "contactInfo.address.state",
            "size": 50
        }
    }
}
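For anyone trying to reproduce this, here is a minimal Python sketch that builds the request body above with a configurable facet `size` and posts it to the Search service. It assumes the FTS REST query endpoint `POST /api/index/<index>/query` on the default port 8094 with basic auth. `build_facet_query` and `run_query` are hypothetical helpers, and the host and credentials in the usage line are placeholders. Keeping the `deletedAt` cutoff fixed between runs ensures that only `size` changes.

```python
import base64
import json
import urllib.request


def build_facet_query(deleted_before: str, size: int) -> dict:
    """Build the request body used above: exclude docs whose deletedAt is on or
    before `deleted_before`, and facet on contactInfo.address.state."""
    return {
        "query": {
            "must_not": {
                "disjuncts": [{
                    "field": "deletedAt",
                    "end": deleted_before,
                    "inclusive_end": True,
                }]
            }
        },
        "facets": {
            "states": {"field": "contactInfo.address.state", "size": size}
        },
    }


def run_query(host: str, index: str, user: str, password: str, body: dict) -> dict:
    """POST the body to the FTS query endpoint (assumes default port 8094)."""
    url = f"http://{host}:8094/api/index/{index}/query"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Same cutoff, different facet sizes (placeholder host/credentials):
body_50 = build_facet_query("2021-02-12T17:35:06.901Z", size=50)
body_5 = build_facet_query("2021-02-12T17:35:06.901Z", size=5)
# result = run_query("localhost", "fts_idx_thing", "Administrator", "password", body_5)
```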

we get back 50 facet terms, and they show the correct count for each term. In this case, it’s something like:

{
  "total_hits": 141,
  "facets": {
    "states": {
      "field": "contactInfo.address.state",
      "total": 87,
      "missing": 54,
      "other": 0,
      "terms": [
        {
          "term": "WA",
          "count": 13
        },
        {
          "term": "CA",
          "count": 11
        },
        {
          "term": "IL",
          "count": 9
        }
        // etc...
      ]
    }
  }
}

However, when we change the size property for that field in the facets query, the counts change:

{
    "query": {
        "must_not": {
            "disjuncts": [{
                "field": "deletedAt",
                "end": "2021-02-12T17:41:51.080Z",
                "inclusive_end": true
            }]
        }
    },
    "facets": {
        "states": {
            "field": "contactInfo.address.state",
            "size": 5
        }
    }
}

With the smaller size, the returned terms now have different counts than in the first query:

{
  "total_hits": 141,
  "facets": {
    "states": {
      "field": "contactInfo.address.state",
      "total": 87,
      "missing": 54,
      "other": 0,
      "terms": [
        {
          "term": "WA",
          "count": 13
        },
        {
          "term": "CA",
          "count": 10
        },
        {
          "term": "IL",
          "count": 7
        }
        // etc...
      ]
    }
  }
}
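To spot exactly which terms disagree between two runs, a small comparison helper is handy. This Python sketch (the helper names are hypothetical) reads the `facets.<name>.terms` array from two responses and reports terms present in both whose counts differ. The sample data below are abbreviated copies of the two responses above.

```python
def term_counts(response: dict, facet: str = "states") -> dict:
    """Map each term returned for `facet` to its count."""
    return {t["term"]: t["count"] for t in response["facets"][facet]["terms"]}


def count_mismatches(resp_a: dict, resp_b: dict, facet: str = "states") -> dict:
    """Return {term: (count_a, count_b)} for terms that both responses
    returned but with different counts."""
    a, b = term_counts(resp_a, facet), term_counts(resp_b, facet)
    return {t: (a[t], b[t]) for t in sorted(a.keys() & b.keys()) if a[t] != b[t]}


# Abbreviated copies of the two responses above (only the listed terms).
size_50 = {"facets": {"states": {"terms": [
    {"term": "WA", "count": 13}, {"term": "CA", "count": 11}, {"term": "IL", "count": 9}]}}}
size_5 = {"facets": {"states": {"terms": [
    {"term": "WA", "count": 13}, {"term": "CA", "count": 10}, {"term": "IL", "count": 7}]}}}

print(count_mismatches(size_50, size_5))  # {'CA': (11, 10), 'IL': (9, 7)}
```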

Our expectation is that the counts for each term should be the same regardless of the size provided in the query definition; that is, size should behave like OFFSET 0 LIMIT 5. So far we haven’t been able to find any guidance in the Couchbase documentation or elsewhere that explains this behavior.
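The expected semantics can be stated precisely with a short Python sketch (plain Python, not Couchbase code): count every term over all hits first, then keep the `size` most common. Under that model `size` only truncates the list and never changes a returned term's count. `exact_top_terms` and the sample values are hypothetical, chosen to mirror the size-50 counts.

```python
from collections import Counter


def exact_top_terms(values: list, size: int) -> list:
    """Count each term over *all* hits first, then keep the `size` most common.

    With these semantics `size` acts only as a LIMIT on the list; it cannot
    change the count reported for any term that is returned."""
    return Counter(values).most_common(size)


# Hypothetical hit values mirroring the counts from the size-50 response.
states = ["WA"] * 13 + ["CA"] * 11 + ["IL"] * 9

top_5 = exact_top_terms(states, 5)  # [("WA", 13), ("CA", 11), ("IL", 9)]
top_2 = exact_top_terms(states, 2)  # [("WA", 13), ("CA", 11)]
```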

Yes, your expectation is accurate. Regardless of the OFFSET/LIMIT you set for the query, the facet contents should remain unchanged for a fixed number of hits. Your index definition looks correct to me as well.

I’m unable to reproduce the issue with a set of documents I set up for this. Would you be able to share a minimal set of documents with which I can consistently reproduce this issue? It’ll certainly be helpful.

@abhinav to be clear, I don’t think that setting OFFSET/LIMIT is a problem. The problem is setting the size in the facet query to 50 and getting one set of counts, and then setting it to 5 and getting another set of counts. Does that change your response?

@schleg It does not. The counts should remain unchanged for a fixed number of hits, regardless of the “size” you’re using for facets.

I’m assuming you don’t have any mutations occurring to the data on your source bucket, yes?

@schleg Looking into this again: this can happen with a partitioned index if you set a relatively small facet “size”. Let me explain with an example.

Here’s facet information with 2 partitions …

+------------+                +------------+
|     P1     |                |     P2     |
+------------+                +------------+
    CA: 5                         UT: 4
    UT: 4                         WA: 2
    WA: 3                         CA: 1

Now, when the partition results are combined, the aggregated facets with size 3 would return (all 3 rows from each partition combined) …

UT: 8
CA: 6
WA: 5

and with size 2 (only the top 2 rows from each partition are combined) …

UT: 8
CA: 5
WA: 2
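The merge in this example can be simulated in a few lines of Python. This is a sketch of the behavior as described here, not Couchbase's actual implementation: each partition keeps only its top `size` terms, and the truncated lists are summed. Like the example, it does not trim the merged list back to `size`. The function names are hypothetical; the partition data are the P1/P2 numbers from the diagram.

```python
from collections import Counter


def partition_top(counts: dict, size: int) -> dict:
    """Keep only the `size` highest-count terms from one partition."""
    return dict(Counter(counts).most_common(size))


def merge_facets(partitions: list, size: int) -> list:
    """Sum each partition's truncated term list, then sort by merged count.
    Counts a partition trimmed away never reach the per-term totals."""
    merged = Counter()
    for counts in partitions:
        merged.update(partition_top(counts, size))
    return merged.most_common()


p1 = {"CA": 5, "UT": 4, "WA": 3}
p2 = {"UT": 4, "WA": 2, "CA": 1}

print(merge_facets([p1, p2], 3))  # [('UT', 8), ('CA', 6), ('WA', 5)]
print(merge_facets([p1, p2], 2))  # [('UT', 8), ('CA', 5), ('WA', 2)]
```

With size 2, P2 drops its CA: 1 and P1 drops its WA: 3, which is exactly why CA falls from 6 to 5 and WA from 5 to 2.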

So with facets, the recommendation is to use a reasonably large size (equal to or greater than the number of unique terms in the field) if you need accurate, consistent counts.
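Why the recommendation works can be checked with the same lossy-merge model. Once `size` is at least the number of unique terms, no partition can drop a term, so the summed counts equal the exact totals. The Python sketch below spreads 141 hypothetical hits over 6 partitions (matching `"indexPartitions": 6` in the index definition) and asserts that property. The data and helper names are assumptions made for illustration.

```python
import random
from collections import Counter


def merged_counts(partitions: list, size: int) -> Counter:
    """Sum each partition's top-`size` terms (the lossy merge modelled above)."""
    merged = Counter()
    for counts in partitions:
        merged.update(dict(counts.most_common(size)))
    return merged


def exact_counts(partitions: list) -> Counter:
    """Sum every term from every partition (the true per-term totals)."""
    total = Counter()
    for counts in partitions:
        total.update(counts)
    return total


# Hypothetical data: 141 hits spread randomly over 6 partitions.
rng = random.Random(7)
terms = ["WA", "CA", "IL", "UT", "NY", "TX", "OR"]
parts = [Counter() for _ in range(6)]
for _ in range(141):
    parts[rng.randrange(6)][rng.choice(terms)] += 1

unique_terms = len(exact_counts(parts))
# With size >= the number of unique terms, no partition trims anything,
# so the merged counts match the exact counts.
assert merged_counts(parts, unique_terms) == exact_counts(parts)
```

With a smaller size the two can diverge, as in the two-partition example.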