FTS sort not working correctly

@abhinav

I think I have found another bug in the FTS. Not sure I am doing something wrong or if it is really a bug, but it seems that the sorting is not working.

trying to run this query using a sort object:

SELECT a_app_release_date
FROM `search` AS a
WHERE sub_type="me"
    AND SEARCH(a,{ "fields": [ "*" ], "highlight": {}, "sort": [{ "by" : "field", "field" : "a_app_release_date", "mode" : "max", "missing" : "last", "type": "number" }], "query": { "query": "+dark" }, "size": 100, "from": 0 })

and the list of results looks like this:

[
  {
    "a_app_release_date": 1607587200
  },
  {
    "a_app_release_date": 1533649045
  },
  {
    "a_app_release_date": 1553612837
  },
  {
    "a_app_release_date": 1585897200
  },
  {
    "a_app_release_date": 1547295010
  }
]

As you can see the results are not in the order of the release date.

If I try without using a sort order object

 SELECT a_app_release_date
    FROM `search` AS a
    WHERE sub_type="me"
        AND SEARCH(a,{ "fields": [ "*" ], "highlight": {}, "sort": ["a_app_release_date"], "query": { "query": "+dark" }, "size": 100, "from": 0 })

List of results looks like this:

[
  {
    "a_app_release_date": 1579852800
  },
  {
    "a_app_release_date": 1564815600
  },
  {
    "a_app_release_date": 1588921200
  },
  {
    "a_app_release_date": 1556402530
  },
  {
    "a_app_release_date": 1595660400
  }
]

As you can see, again the results are not int the correct order

The a_app_release_date is indexed as a number as you can see in the below index definition

{
 "name": "search_reduced",
 "type": "fulltext-index",
 "params": {
  "doc_config": {
   "docid_prefix_delim": "",
   "docid_regexp": "",
   "mode": "type_field",
   "type_field": "sub_type"
  },
  "mapping": {
   "default_analyzer": "standard",
   "default_datetime_parser": "dateTimeOptional",
   "default_field": "_all",
   "default_mapping": {
    "dynamic": true,
    "enabled": false
   },
   "default_type": "_default",
   "docvalues_dynamic": true,
   "index_dynamic": true,
   "store_dynamic": false,
   "type_field": "_type",
   "types": {
    "me": {
     "dynamic": false,
     "enabled": true,
     "properties": {
      "a_app_id": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "docvalues": true,
         "include_in_all": true,
         "index": true,
         "name": "app_id",
         "type": "text"
        }
       ]
      },
      "a_app_bundle_id": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "docvalues": true,
         "include_in_all": true,
         "index": true,
         "name": "app_bundle_id",
         "type": "text"
        }
       ]
      },
      "a_app_name": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "docvalues": true,
         "include_in_all": true,
         "include_term_vectors": true,
         "index": true,
         "name": "app_name",
         "type": "text"
        }
       ]
      },
      "a_app_provider_name": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "docvalues": true,
         "include_in_all": true,
         "include_term_vectors": true,
         "index": true,
         "name": "app_provider_name",
         "type": "text"
        }
       ]
      },
      "a_app_release_date": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "docvalues": true,
         "include_in_all": true,
         "include_term_vectors": true,
         "index": true,
         "name": "app_release_date",
         "type": "number"
        }
       ]
      },
      "a_app_seller_name": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "docvalues": true,
         "include_in_all": true,
         "include_term_vectors": true,
         "index": true,
         "name": "app_seller_name",
         "type": "text"
        }
       ]
      },
      "a_hp": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "docvalues": true,
         "include_in_all": true,
         "include_term_vectors": true,
         "index": true,
         "name": "hp",
         "type": "number"
        }
       ]
      },
      "app_uuid": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "docvalues": true,
         "include_in_all": true,
         "index": true,
         "name": "app_uuid",
         "type": "text"
        }
       ]
      }
     }
    }
   }
  },
  "store": {
   "indexType": "scorch"
  }
 },
 "sourceType": "couchbase",
 "sourceName": "search",
 "sourceUUID": "0784c2c848bd1dcf8d8c5148fe6a2bdd",
 "sourceParams": {},
 "planParams": {
  "maxPartitionsPerPIndex": 171,
  "indexPartitions": 6,
  "numReplicas": 0
 },
 "uuid": "50a151c870d281bb"
}

Am I doing something wrong or is really a bug?
Thank you,

N1QL doesn’t guarantee sort unless query has ORDER BY a_app_release_date. N1QL process pipeline can change sort given by FTS.

What you need

SELECT a_app_release_date
    FROM `search` AS a
    WHERE sub_type="me"
        AND SEARCH(a,{ "fields": [ "*" ], "highlight": {}, "sort": ["a_app_release_date"], "query": { "query": "+dark" }, "size": 100, "from": 0 })
ORDER BY a.a_app_release_date;

Do EXPLAIN and check FtsSearch uses index order

if I put the ORDER BY outside of the FTS search, wouldn’t just order the 100 results returned by the FTS search (there is a limit of 100 results in there)?

this is the explain

{
    "#operator": "Sequence",
    "~children": [
        {
            "#operator": "IntersectScan",
            "scans": [
                {
                    "#operator": "IndexScan3",
                    "as": "a",
                    "index": "s_me_rd_ASC",
                    "index_id": "4899f6375c341903",
                    "index_projection": {
                        "primary_key": true
                    },
                    "keyspace": "search",
                    "namespace": "default",
                    "spans": [
                        {
                            "exact": true,
                            "range": [
                                {
                                    "high": "\"me\"",
                                    "inclusion": 3,
                                    "low": "\"me\""
                                }
                            ]
                        }
                    ],
                    "using": "gsi"
                },
                {
                    "#operator": "IndexFtsSearch",
                    "as": "a",
                    "index": "search_reduced",
                    "index_id": "50a151c870d281bb",
                    "keyspace": "search",
                    "namespace": "default",
                    "search_info": {
                        "field": "\"\"",
                        "outname": "out",
                        "query": "{\"fields\": [\"*\"], \"from\": 0, \"highlight\": {}, \"query\": {\"query\": \"+dark \"}, \"size\": 100, \"sort\": [{\"by\": \"field\", \"field\": \"app_release_date\", \"missing\": \"last\", \"mode\": \"max\", \"type\": \"number\"}]}"
                    },
                    "using": "fts"
                }
            ]
        },
        {
            "#operator": "Fetch",
            "as": "a",
            "keyspace": "search",
            "namespace": "default"
        },
        {
            "#operator": "Parallel",
            "~child": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "Filter",
                        "condition": "(((`a`.`sub_type`) = \"me\") and search(`a`, {\"fields\": [\"*\"], \"from\": 0, \"highlight\": {}, \"query\": {\"query\": \"+dark \"}, \"size\": 100, \"sort\": [{\"by\": \"field\", \"field\": \"app_release_date\", \"missing\": \"last\", \"mode\": \"max\", \"type\": \"number\"}]}))"
                    },
                    {
                        "#operator": "InitialProject",
                        "result_terms": [
                            {
                                "expr": "(`a`.`a_app_release_date`)"
                            }
                        ]
                    },
                    {
                        "#operator": "FinalProject"
                    }
                ]
            }
        }
    ]
}

In this case it uses IntersectScan between GSI/FTS. Intersect scan can’t use index/search pagination.
Avoid that by specifying USE INDEX (search_reduced)

If there is Pagination in SEARCH() document keys produce those keys.
Then query rules follows.
If you need ORDER offset, limit specify on query.
Pagination on seacrh() and pagination on query applied independently one after other
Pagination on seacrh() and no-pagination on query FTS applies pagination and query results might not be sorted
no-pagination on search() and pagination on query
Query decides if pagination can be qualified to push FTS index ). If yes it push other wise applied by query.

Assume SEARCH() is subquery and issued on FTS and results are used in query. With In the SEARCH() FTS rules applies outside SEARCH() N1QL rules applies. One small use case sorting on score if possible N1QL will push that info to FTS If there is no pagination in SEARCH()

Not sure I understand completely. Can you PLEASE be so kind and explain to me how should I do to make this work as it seems that I can’t get the logic of all the indexes, queries filtering order, pagination.

let’s say we have documents with 5 fields:

{
"id":  123,
"release_date":  12345,
"rank": 5,
"product_name": "product name",
"product_vendor": "vendor name"
}

if our goal is to paginate and also to order (by release_date or by rank), how should we create the index?

Do we need to create multiple indexes to have the correct order?

order by release date ASC
create index idx1 on “bucket”(release_date, id)

order by release date DESC
create index idx2 on “bucket”(release_date DESC, id)

order by rank ASC
create index idx3 on “bucket”(rank, id)

order by rank DESC
create index idx4 on “bucket”(rank DESC, id)

Is there a better option to do this?

If we add the limit and offset would be used from the index itself?

select * from bucket order by rank desc limit 10 offset 5

now, taking the above into consideration, if we would like to filter by some fields but also to benefit from the index definition when we do order and limit, how should the indexes look like?

select * from bucket where product_vendor= "asdsa" order by rank desc limit 10 offset 5
select * from bucket where product_name= "bcedf" order by rank desc limit 10 offset 5
select * from bucket where product_vendor= "asdsa" and product_name= "bcedf" order by rank desc limit 10 offset 5
select * from bucket where product_name= "bcedf" and product_vendor= "asdsa" order by rank desc limit 10 offset 5

In case of an index with multiple fields, how is the order of the fields in the query affected by the order of fields in the index?
If a field is in the index is defined before the field is used (or not used at all in the query) how this affect the query performance?

Can you please explain this a little bit to me… I read all the docs from your website, also, I read some blog posts related to this, search on youtube for forestDB and other Couchbase presentation and still it seems I cannot completely comprehend how this work and this become super frustrated to me.

I know these may be super basic questions which makes everything even more frustrating for me.

GSI Index keys, position of keys, order all important. So you need to create that many indexes and provide USE INDEX pick right one.

Other option is use FTS index.
Use with N1QL. Easy to use benefits more of complex queries (JOINs, Aggregates, want to apply additional predicates etc)
Use directly from SDK on the specific FTS index.

From above queries it looks like simple. It will be better to create single FTS index on all the required fields. Then use FTS query on that index. cc @abhinav

GSI Index:
GSI index ( c0, c1, c2)
The indexing structure is like b-tree (assume like dictionary each letter position is key position). It sorts c0, when duplicates then sorts c1, when duplicates c0,c1 then sorts c2 and stores data.
Unless you know previous position key, you need to do lot more entries to scan. i.e if you know c1 = “x”, not know c0, every c0 value you need to search c1
Unless query ORDER BY follows, index keys order it can’t use index order and must do own sort.
If query need to sort it must produce all qualified entries, lost entry might be first in the sort.
Same for pagination with sort
Also does not Support REVERSE scan (you need ASC/DESC indexes).
If you add order by keys first you many not able to push predicates efficiently. If you put predicates first query must do sort.

FTS Index:
FTS index each field in the index creates sort of separate single filed index internally. The order of index field is irrelevant.
On Search of index each field does separate search on index and results are produced via roaring bit map. skipping fields on search no affect.
Final roaring bit map it does sort and pagination.

Thanks for the explanation… I will need to read it multiple times to be sure I understand everything correctly, but my first thought is that I need to create indexes for any combination of filters and order and then chose the right index.

The problem I have using the FTS from the SDK is that the sort order doesn’t seem to work at all.

here are some examples:

Search using sort object order ASCENDING

SEARCH OBJECT: { “fields”: [*], “sort”: [{ “by” : “field”, “field” : “a_app_release_date”, “asc”:true, “missing” : “last”, “type”: “number” }], “query”: { “query”: “+a_app_name:dark” }}

Here are the search results, as you can see the values are not in the correct order:

[a_app_release_date] => Array
    (
        [0] => 1504555531
        [1] => 1476162635
        [2] => 1421207173
        [3] => 1496138494
        [4] => 1558332907
        [5] => 1568617200
        [6] => 1533936926
        [7] => 1481655020
        [8] => 1563346800
        [9] => 1441104653
    )

These are the document ids of the FTS result:

[FTSDocumentIds] => Array
    (
        [0] => s:05abc0ad-5175-42f2-97b1-1bedda289ddc:me
        [1] => s:cc349bf8-feb1-405d-af06-718b5b99f480:me
        [2] => s:ae67f62a-2255-45e5-8156-fc789f09973a:me
        [3] => s:e4df17ff-1601-43cc-976d-49f4fb46953e:me
        [4] => s:292d1880-681a-41e0-817d-93c0e2bd4f42:me
        [5] => s:cf92750d-8764-4254-aa58-88a957aad5ed:me
        [6] => s:8e03dc7c-5ca5-461a-8968-a0bf09c89787:me
        [7] => s:3b11e5d5-8e8d-4c25-838b-4cf27aa0d730:me
        [8] => s:16863927-03c6-4715-aa84-303d442bd5aa:me
        [9] => s:8e0bfbac-b9e2-4a39-af05-dd4a0f4de420:me
    )

Search using sort object order DESCENDING

SEARCH OBJECT: { “fields”: [*], “sort”: [{ “by” : “field”, “field” : “a_app_release_date”, “desc”:true, “missing” : “last”, “type”: “number” }], “query”: { “query”: “+a_app_name:dark” }}

Here are the search results, as you can see the result are the same like in the Ascending query, so somehow the order is completely ignored:

[a_app_release_date] => Array
    (
        [0] => 1504555531
        [1] => 1476162635
        [2] => 1421207173
        [3] => 1496138494
        [4] => 1558332907
        [5] => 1568617200
        [6] => 1533936926
        [7] => 1481655020
        [8] => 1563346800
        [9] => 1441104653
    )

These are the document ids of the FTS result:

[FTSDocumentIds] =>Array
(
    [0] => s:05abc0ad-5175-42f2-97b1-1bedda289ddc:me
    [1] => s:cc349bf8-feb1-405d-af06-718b5b99f480:me
    [2] => s:ae67f62a-2255-45e5-8156-fc789f09973a:me
    [3] => s:e4df17ff-1601-43cc-976d-49f4fb46953e:me
    [4] => s:292d1880-681a-41e0-817d-93c0e2bd4f42:me
    [5] => s:cf92750d-8764-4254-aa58-88a957aad5ed:me
    [6] => s:8e03dc7c-5ca5-461a-8968-a0bf09c89787:me
    [7] => s:3b11e5d5-8e8d-4c25-838b-4cf27aa0d730:me
    [8] => s:16863927-03c6-4715-aa84-303d442bd5aa:me
    [9] => s:8e0bfbac-b9e2-4a39-af05-dd4a0f4de420:me
)

Search using simple order ASCENDING

SEARCH OBJECT: { “fields”: [*], “sort”: [“a_app_release_date”], “query”: { “query”: “+a_app_name:dark” }}

The results are in a different order compared with the one above, but not in the correct order. Also, as the results are a little bit different, I would say that either this one or the object sort is taken in consideration

[a_app_release_date] => Array
        (
            [0] => 1504555531
            [1] => 1476162635
            [2] => 1600758000
            [3] => 1525405785
            [4] => 1597042800
            [5] => 1593327600
            [6] => 1600326000
            [7] => 1529511967
            [8] => 1512757772
            [9] => 1463252390
        )

These are the document ids of the FTS result:

[FTSDocumentIds] => Array
    (
        [0] => s:05abc0ad-5175-42f2-97b1-1bedda289ddc:me
        [1] => s:cc349bf8-feb1-405d-af06-718b5b99f480:me
        [2] => s:a01c7de5-0290-4e84-a0d1-7e0442575218:me
        [3] => s:81fd7f6c-2bd5-4cec-9ff3-63461d66a389:me
        [4] => s:2404a000-eb73-4ded-acac-e425f69106bb:me
        [5] => s:2bdb3406-98ea-4455-95e9-24dcaf534ef7:me
        [6] => s:ca737a94-b5a4-443f-a570-f7e1cbb3af5d:me
        [7] => s:1b81f7d7-aaff-42ff-8423-172c9c70dbfc:me
        [8] => s:d2768516-5ac7-4177-8a7f-808226c6c5dd:me
        [9] => s:0cc79349-a26e-4621-9d37-4d75704d06d7:me
    )

Search using simple order DESCENDING

SEARCH OBJECT: { “fields”: [*], “sort”: [“-a_app_release_date”], “query”: { “query”: “+a_app_name:dark” }}

Here, the order is a little bit different than the above order, but still, the order is incorect

   [a_app_release_date] => Array
        (
            [0] => 1504555531
            [1] => 1476162635
            [2] => 1496138494
            [3] => 1533936926
            [4] => 1481655020
            [5] => 1558332907
            [6] => 1568617200
            [7] => 1563346800
            [8] => 1441104653
            [9] => 1582704000
        )

These are the document ids of the FTS result:

[FTSDocumentIds] => Array
    (
        [0] => s:05abc0ad-5175-42f2-97b1-1bedda289ddc:me
        [1] => s:cc349bf8-feb1-405d-af06-718b5b99f480:me
        [2] => s:e4df17ff-1601-43cc-976d-49f4fb46953e:me
        [3] => s:8e03dc7c-5ca5-461a-8968-a0bf09c89787:me
        [4] => s:3b11e5d5-8e8d-4c25-838b-4cf27aa0d730:me
        [5] => s:292d1880-681a-41e0-817d-93c0e2bd4f42:me
        [6] => s:cf92750d-8764-4254-aa58-88a957aad5ed:me
        [7] => s:16863927-03c6-4715-aa84-303d442bd5aa:me
        [8] => s:8e0bfbac-b9e2-4a39-af05-dd4a0f4de420:me
        [9] => s:e440261b-c6f3-4ebd-8947-ec03a4cc195f:me
    )

@flaviu ,

Does the sort fail to work even with direct SDK/curl queries to FTS for the above search requests?
Also, have you enabled the doc_values field option in the corresponding index definition for the field to be sorted ?(eg: a_app_release_date). If not, enable it and try.

If the answer to both the above queries is “YES”, then please share the index definition, a sample document with a non-working query. (else I may reuse any of the above query)

Cheers!

There are multiple problems here …

  1. The intersect scan between GSI and FTS is possibly causing the query to scramble the output despite ordering within the FTS search. The intersect scan needs to be avoided for FTS to handle this query properly - try your query with USE INDEX(USING FTS) for this.
  2. We’ve found a bug as highlighted here: How to configure the FTS scoring? - #15 by abhinav , I’ve tracked it internally with https://issues.couchbase.com/browse/MB-44356. We’ve addressed it for an upcoming release. That said there’s a workaround for this.
  3. Also if you’re trying to retrieve fields from your documents (in your example a_app_release_date) - I’d recommend you to “store” it when adding the field as a number within your FTS index definition. By doing this, you can improve the latency of your query by retrieving the field from the search_meta().

Try this query and you should see results you expect …

SELECT search_meta().fields.a_app_release_date
FROM `search` AS a USE INDEX(USING FTS)
WHERE sub_type="me"
    AND SEARCH(a,{ "fields": [ "a_app_release_date" ], "highlight": {}, "sort": [{ "by" : "field", "field" : "a_app_release_date", "mode" : "max", "missing" : "last", "type": "number" }], "query": { "match": "dark" }, "size": 100, "from": 0 })
1 Like