Help creating N1QL index

Hello,

I am having trouble creating a proper index for an N1QL query/document.

Here is my document structure:

{
  "viewEvents": [
    {
      "deviceType": "",
      "country": "USA",
      "endOffset": 19615,
      "product": "",
      "postalCode": "90045",
      "type": "LINEAR",
      "userId": "641469777",
      "viewDuration": 114,
      "airingId": "4321341",
      "itemId": "1233442",
      "startOffset": 9185,
      "profileId": "653667813",
      "contentDuration": 28800,
      "startTime": 1503376385539,
      "portionViewed": 0,
      "attributes": {},
      "endTime": 1503376500492,
      "airingStartTime": 1503367200000,
      "stationId": "54325432",
      "timestamp": 0
    }
  ]
}

Here is my N1QL query.

select viewEvents.*, meta(sentvUserViewEventList).id as _ID, meta(sentvUserViewEventList).cas as _CAS from sentv_user_data sentvUserViewEventList UNNEST sentvUserViewEventList.viewEvents viewEvents where meta(sentvUserViewEventList).id like 'user_view_event::%' and viewEvents.startTime is not missing and viewEvents.startTime >= $startTime$ order by viewEvents.startTime ASC;

It currently takes about 6-7 seconds to return around 50k documents. I need this to be much quicker if possible.

Here is the start to my index. Any pointers/help is appreciated. Thanks

CREATE INDEX test ON sentv_user_data (DISTINCT ARRAY i.startTime FOR i IN viewEvents END)

CREATE INDEX ix1 ON sentv_user_data(DISTINCT ARRAY ve.startTime FOR ve IN viewEvents END) WHERE META().id LIKE "user_view_event::%";
SELECT ve.*, META(sud).id AS _ID, META(sud).cas AS _CAS
FROM sentv_user_data AS sud
UNNEST sud.viewEvents AS ve
WHERE META(sud).id LIKE "user_view_event::%" AND ve.startTime >= $startTime$
ORDER BY ve.startTime ASC;
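
To confirm the optimizer actually picks up ix1, you can run EXPLAIN on the query above (here with a sample startTime value from your document in place of the parameter). The plan should show a DistinctScan wrapping an IndexScan on ix1 rather than a primary scan:

EXPLAIN SELECT ve.*, META(sud).id AS _ID, META(sud).cas AS _CAS
FROM sentv_user_data AS sud
UNNEST sud.viewEvents AS ve
WHERE META(sud).id LIKE "user_view_event::%" AND ve.startTime >= 1503376385539
ORDER BY ve.startTime ASC;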

Thanks for your help. I tried implementing what you suggested and it's still not performing well. I also added an upper bound on startTime to try to make it quicker, but still nothing. Below is the explain plan in case that helps. Thanks

SELECT ve.*, META(sud).id AS _ID, META(sud).cas AS _CAS FROM sentv_user_data AS sud USE INDEX(startTime_idx) UNNEST sud.viewEvents AS ve WHERE META(sud).id LIKE 'user_view_event::%' AND ve.startTime >= $startTime$ AND ve.startTime < $endTime$ ORDER BY ve.startTime ASC


[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "DistinctScan",
              "scan": {
                "#operator": "IndexScan",
                "index": "startTime_idx",
                "index_id": "c43cfbd6918ae60a",
                "keyspace": "sentv_user_data",
                "namespace": "default",
                "spans": [
                  {
                    "Range": {
                      "High": [
                        "1503422700000"
                      ],
                      "Inclusion": 1,
                      "Low": [
                        "1503422400000"
                      ]
                    }
                  }
                ],
                "using": "gsi"
              }
            },
            {
              "#operator": "Fetch",
              "as": "sud",
              "keyspace": "sentv_user_data",
              "namespace": "default"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Unnest",
                    "as": "ve",
                    "expr": "(`sud`.`viewEvents`)"
                  },
                  {
                    "#operator": "Filter",
                    "condition": "((((meta(`sud`).`id`) like \"user_view_event::%\") and (1503422400000 <= (`ve`.`startTime`))) and ((`ve`.`startTime`) < 1503422700000))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "`ve`",
                        "star": true
                      },
                      {
                        "as": "_ID",
                        "expr": "(meta(`sud`).`id`)"
                      },
                      {
                        "as": "_CAS",
                        "expr": "(meta(`sud`).`cas`)"
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Order",
          "sort_terms": [
            {
              "expr": "(`ve`.`startTime`)"
            }
          ]
        },
        {
          "#operator": "FinalProject"
        }
      ]
    },
    "text": "SELECT ve.*, META(sud).id AS _ID, META(sud).cas AS _CAS\nFROM sentv_user_data AS sud USE INDEX(startTime_idx)\nUNNEST sud.viewEvents AS ve\nWHERE META(sud).id LIKE \"user_view_event::%\" AND ve.startTime >= 1503422400000 and ve.startTime < 1503422700000\nORDER BY ve.startTime ASC;"
  }
]

The plan looks optimal. The query still has to fetch each document, and because the documents contain a lot of elements, that fetch is what takes the time.
You can try pretty=false to cut down on the whitespace in the response, which reduces the amount of data the server has to serialize and send.
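
If you want to see the effect of pretty on a single request first, it can also be passed as a request-level parameter to the query REST endpoint (host, port and Administrator:password below are just the usual defaults; adjust for your setup):

curl http://localhost:8093/query/service \
  -u Administrator:password \
  --data-urlencode 'statement=SELECT ve.* FROM sentv_user_data AS sud UNNEST sud.viewEvents AS ve WHERE META(sud).id LIKE "user_view_event::%" AND ve.startTime >= 1503376385539' \
  --data-urlencode 'pretty=false'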

If you're fetching a lot of documents, you can increase throughput by doing the following:

curl http://localhost:8093/admin/settings -u Administrator:password > /tmp/z.json

cat /tmp/z.json

{"completed-limit":4000,"completed-threshold":1000,"controls":false,"cpuprofile":"","debug":false,"keep-alive-length":16384,"loglevel":"INFO","max-index-api":2,"max-parallelism":1,"memprofile":"","pipeline-batch":16,"pipeline-cap":512,"pretty":true,"profile":"off","request-size-cap":67108864,"scan-cap":512,"servicers":64,"timeout":0}

Update the pipeline-batch and pretty parameters in z.json:

cat /tmp/z.json

{"completed-limit":4000,"completed-threshold":1000,"controls":false,"cpuprofile":"","debug":false,"keep-alive-length":16384,"loglevel":"INFO","max-index-api":2,"max-parallelism":1,"memprofile":"","pipeline-batch":32,"pipeline-cap":512,"pretty":false,"profile":"off","request-size-cap":67108864,"scan-cap":512,"servicers":64,"timeout":0}
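
Then POST the edited file back to the same admin endpoint so the new values take effect (assuming the same Administrator:password credentials as the GET above):

curl http://localhost:8093/admin/settings -u Administrator:password -X POST -H 'Content-Type: application/json' -d @/tmp/z.json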

To make the change on the cluster, so the values are retained even after a restart, update the password and IP address in the command below:

curl -X POST -u Administrator: http://127.0.0.1:9000/diag/eval/ -d 'ns_config:set({node, node(), {query, extra_args}}, ["--pipeline-batch=32", "--pretty=false"])'
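
Once the query node is running with the new extra_args, re-reading the settings endpoint (same credentials as above) should show pipeline-batch at 32 and pretty set to false:

curl http://localhost:8093/admin/settings -u Administrator:password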

Thanks for everyone's help. I'll give the node params a shot.