Search with a Disjunct Query is slow

Hi all,
I have a bucket with 205,665 documents.
The document structure is as follows:

  • name → string
  • uuid → string
  • readers → array of string

If I search with a simple query:

{
  "query": {
    "query": "text"
  }
}

the took is 0.005 on average, which is perfect.

But I want to filter the results by readers, so I use this query:

{
  "explain": true,
  "fields": [
    "*"
  ],
  "highlight": {},
  "query": {
    "conjuncts": [
      {
        "query": "text"
      },
      {
        "disjuncts": [
          {"field": "readers", "match": "reader_1"},
          {"field": "readers", "match": "reader_2"},
          {"field": "readers", "match": "reader_3"},
          {"field": "readers", "match": "reader_4"},
          {"field": "readers", "match": "reader_5"},
          {"field": "readers", "match": "reader_6"},
          {"field": "readers", "match": "reader_7"},
          {"field": "readers", "match": "reader_8"},
          {"field": "readers", "match": "reader_9"},
          {"field": "readers", "match": "reader_10"},
          {"field": "readers", "match": "reader_11"},
          {"field": "readers", "match": "reader_12"},
          {"field": "readers", "match": "reader_13"},
          {"field": "readers", "match": "reader_14"},
          {"field": "readers", "match": "reader_15"},
          {"field": "readers", "match": "reader_16"},
          {"field": "readers", "match": "reader_17"},
          {"field": "readers", "match": "reader_18"},
          {"field": "readers", "match": "reader_19"},
          {"field": "readers", "match": "reader_20"},
          {"field": "readers", "match": "reader_21"},
          {"field": "readers", "match": "reader_22"},
          {"field": "readers", "match": "reader_23"},
          {"field": "readers", "match": "reader_24"}
        ]
      }
    ]
  }
}

and the took is 0.89 on average.

I think the difference is very high, especially since the number of documents in production will grow beyond 10,000,000.

Am I missing something?
Is it possible to write a query with a lower took?

Thanks
J

@jempis02 Thank you for raising this concern. A few questions for you …

  • Did you create a specific index, or is it just a default index?
  • What analyzer are you using to build the index?
  • If you share your index definition here, I’ll have the answers to my previous 2 questions.

By not using a wildcard query for your use case, I suppose you have already reduced the number of term searchers for your query request.

If we can fine-tune your index (assuming you haven't done that already), we could look at some additional savings. What I mean here is: when you search for the term "text", do you need to look across all fields, or only a specific field? If it is only a specific field, you should include that mapping in your index definition.

Hi @abhinav,

Thanks for the response.
I use a specific index, and I don't use a custom analyzer.
I also use wildcards, and the purpose of the disjuncts query is to filter the response depending on the user executing the query.
I'll explain:
my bucket holds the list of files ("name") in a folder together with their permissions ("readers").
When a user searches for a file, the disjuncts check whether at least one of the user's permissions is present in the readers array.

I hope I explained myself.
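
Since the readers list differs per user, the disjuncts block is typically built per request. A minimal Python sketch (field names follow the document structure above; the permission list is illustrative):

```python
import json

def build_permission_query(text, permissions):
    """Build the FTS request body: match `text` in the indexed fields,
    restricted to documents whose `readers` array contains at least one
    of the caller's permissions. (Sketch only: field names follow the
    document structure described above; inputs are examples.)"""
    return {
        "query": {
            "conjuncts": [
                {"query": text},  # the free-text part of the search
                {
                    # OR over the user's permissions against `readers`
                    "disjuncts": [
                        {"field": "readers", "match": p} for p in permissions
                    ]
                },
            ]
        }
    }

body = build_permission_query("text", [f"reader_{i}" for i in range(1, 25)])
print(json.dumps(body)[:80])
```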

the index is shown below:

{
 "name": "test_index",
 "type": "fulltext-index",
 "params": {
  "doc_config": {
   "docid_prefix_delim": "",
   "docid_regexp": "",
   "mode": "type_field",
   "type_field": "filename"
  },
  "mapping": {
   "default_analyzer": "standard",
   "default_datetime_parser": "dateTimeOptional",
   "default_field": "_all",
   "default_mapping": {
    "default_analyzer": "",
    "dynamic": true,
    "enabled": true,
    "properties": {
     "ext": {
      "enabled": true,
      "dynamic": false,
      "fields": [
       {
        "include_in_all": true,
        "include_term_vectors": true,
        "index": true,
        "name": "ext",
        "type": "text"
       }
      ]
     },
     "name": {
      "enabled": true,
      "dynamic": false,
      "fields": [
       {
        "include_in_all": true,
        "include_term_vectors": true,
        "index": true,
        "name": "name",
        "type": "text"
       }
      ]
     },
     "path": {
      "enabled": true,
      "dynamic": false,
      "fields": [
       {
        "include_in_all": true,
        "include_term_vectors": true,
        "index": true,
        "name": "path",
        "type": "text"
       }
      ]
     },
     "readers": {
      "enabled": true,
      "dynamic": false,
      "fields": [
       {
        "include_term_vectors": true,
        "index": true,
        "name": "readers",
        "type": "text"
       }
      ]
     }
    }
   },
   "default_type": "_default",
   "docvalues_dynamic": true,
   "index_dynamic": true,
   "store_dynamic": false,
   "type_field": "_type"
  },
  "store": {
   "indexType": "upside_down",
   "kvStoreName": "mossStore"
  }
 },
 "sourceType": "couchbase",
 "sourceName": "files-ele",
 "sourceUUID": "161267103d57a3fd63e2ca7a4d11e4a7",
 "sourceParams": {},
 "planParams": {
  "maxPartitionsPerPIndex": 171,
  "numReplicas": 0
 },
 "uuid": "54d52e766588941c"
}

Thanks
jempis

@jempis02 Cool. So one thing I was asking about was: within the query, when you search for the file/text, could you make do with searching over a single field? For example, if the "name" field carries all the information you need, your query could look like:

{
  "query": {
    "conjuncts": [
      {
        "term": "text",
        "field": "name"
      },
      {
        "disjuncts": [
          {"field": "readers", "match": "reader_1"},
          {"field": "readers", "match": "reader_2"},
          ...
          {"field": "readers", "match": "reader_24"}
        ]
      }
    ]
  }
}

Also, what release of Couchbase are you using? I see that you're using upside_down/moss. We have a new index type available with 6.0, which we've named scorch. We've noted pretty good performance improvements with it, especially in latency and throughput for compound queries such as the ones you're using.

@abhinav perfect.
With your new query, the request is faster than the query-string version.
But I decided to use the query-string query because I want a dynamic, Google-like search.
I want, for example, to add new fields, like the extension or the file path, and to use field scoping (like "ext:.pdf").
In short: I want a search that supports every kind of query (match queries, field scoping, required/optional terms, numeric ranges, etc.), so I suppose your specific query is not suitable for my project.

That's why I want to understand how I can improve and speed up my query. Is it possible?
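
For reference, one option (a sketch, not a verified tuning) is to keep the query-string syntax only for the user's free-form input and attach the permission filter as a second conjunct, so field scoping like "ext:.pdf" still works while the readers check stays a plain disjunction:

```python
import json

def build_flexible_query(user_input, permissions):
    """Combine a Google-like query-string search (field scoping such as
    'ext:pdf', required '+term', optional terms, etc.) with the per-user
    readers filter as a separate conjunct. The field name 'readers'
    matches the index above; the inputs below are examples."""
    return {
        "query": {
            "conjuncts": [
                # free-form user input, parsed with query-string syntax
                {"query": user_input},
                # permission filter: at least one reader must match
                {"disjuncts": [
                    {"field": "readers", "match": p} for p in permissions
                ]},
            ]
        }
    }

body = build_flexible_query("report +ext:pdf", ["reader_1", "reader_2"])
print(json.dumps(body, indent=1))
```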

I'm using release 5.5.2, but I'm downloading the new version (6.0) now.

Thanks for now
jempis