N1QL vs MapReduce Views with Sync-Gateway - Performance issues

Hi,

Your index needs to match your query WHERE clause. Can you go back to the index and query that were working as covering indexes, and then make the changes incrementally, checking after each change?

It is a programming exercise, so you have to be precise.

Trying to improve the performance of this query

Select id
FROM sync_gateway_sw1 use index(resident8_idx)
WHERE meta().id NOT LIKE '_sync:%' 
AND (_deleted = FALSE OR _deleted IS MISSING)  
AND type = 'user' and resident=true 
AND ANY l IN OBJECT_NAMES(locations) SATISFIES l = '25486355-dfdd-49df-888a-abe123791bd2'  END
limit 100

Using Index

create index resident8_idx ON sync_gateway_sw1(type,meta().id, resident, DISTINCT ARRAY i FOR i IN OBJECT_NAMES(locations)  END)
WHERE meta().id NOT LIKE '_sync:%' AND (_deleted = FALSE OR _deleted IS MISSING) and resident=true
AND type = 'user'

EXPLAIN

[
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Sequence”,
“~children”: [
{
#operator”: “DistinctScan”,
“scan”: {
#operator”: “IndexScan”,
“index”: “resident8_idx”,
“index_id”: “d614cbbce45b4ed0”,
“keyspace”: “sync_gateway_sw1”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“successor("user")”
],
“Inclusion”: 1,
“Low”: [
“"user"”
]
}
}
],
“using”: “gsi”
}
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Fetch”,
“keyspace”: “sync_gateway_sw1”,
“namespace”: “default”
},
{
#operator”: “Filter”,
“condition”: “(((((not ((meta(sync_gateway_sw1).id) like "_sync:%")) and (((sync_gateway_sw1._deleted) = false) or ((sync_gateway_sw1._deleted) is missing))) and ((sync_gateway_sw1.type) = "user")) and ((sync_gateway_sw1.resident) = true)) and any l in object_names((sync_gateway_sw1.locations)) satisfies (l = "25486355-dfdd-49df-888a-abe123791bd2") end)”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “(sync_gateway_sw1.id)”
}
]
},
{
#operator”: “FinalProject”
}
]
}
}
]
},
{
#operator”: “Limit”,
“expr”: “100”
}
]
},
“text”: “Select id\nFROM sync_gateway_sw1 use index(resident8_idx)\nWHERE meta().id NOT LIKE ‘_sync:%’ \nAND (_deleted = FALSE OR _deleted IS MISSING) \nAND type = ‘user’ and resident=true \nAND ANY l IN OBJECT_NAMES(locations) SATISFIES l = ‘25486355-dfdd-49df-888a-abe123791bd2’ END\nlimit 100”
}
]

The locations is a hash and need to see specific key exists.
It takes 1.29s with 55 results

If I remove the following from the query

AND ANY l IN OBJECT_NAMES(locations) SATISFIES l = ‘25486355-dfdd-49df-888a-abe123791bd2’ END
It takes 83 ms with 109 results (because of no filter)

I do improve this?

Yes, you can improve the performance significantly. Keep the ANY predicate in the query. Look up array indexing in N1QL. There are documents and examples you can find. Play around with array indexing, and when you have it working, you can post here and we will make any additional suggestions.

When creating the index I already used “DISTINCT ARRAY i FOR i IN OBJECT_NAMES(locations)”

Your index uses variable i, and your query uses variable l. You need to use the same variable in both.

Changing I to i change the execution time to 1.3s

Testing more, even with covered indexes (See explain) N1QL is slower. I have tested with other views and n1ql and produced similar results. Any idea on how to improve it? I even tried to return just the meta().id in the N1QL and

“view_duration = 0.332000941 rows=221”
“n1ql_duration = 0.691879602 rows=221”

Current View

function (doc, meta) {
    if (meta.id && (meta.id.indexOf('_sync') === -1) && !doc._deleted) {
        if (doc.type === 'change-log-entry') {
            if( doc.changedAt && doc.changeType === 'task-create' && doc.changeInfo && doc.changeInfo.details) {
                var normalizeDate = new Date(doc.changedAt);
                var isEmergency = !!doc.changeInfo.details.isEmergency;
                emit([doc.location, normalizeDate.toISOString()],  {
                    'taskId': doc.primaryTarget,
                    'isEmergency':isEmergency});
            }
        }
    }
}

N1QL

Index

CREATE INDEX `change_log_task_create_v1` ON `sync_gateway_sw1`(`type`,(meta().`id`),`_deleted`,softDelete, `location`,`changeType`,`changedAt`, changeInfo.details.isEmergency)
WHERE ((((((not ((meta().`id`) like "_sync:%")) and ((`_deleted` = false) or (`_deleted` is missing))) AND (softDelete is MISSING or softDelete = false) and (`type` = "change-log-entry")) and
(`changeType` = "task-create")) and (`location` is not null)) and (`changedAt` is not null))
and changeInfo is NOT NULL and changeInfo.details is NOT NULL

Query

select meta().id,location, changeInfo.details.isEmergency
FROM sync_gateway_sw1 use index(change_log_task_create_v1)
WHERE meta().id NOT LIKE ‘_sync:%’
AND (_deleted = FALSE OR _deleted IS MISSING)
AND (softDelete is MISSING or softDelete = false)
AND type = ‘change-log-entry’ and changeType = ‘task-create’
and location is not null and changedAt is not null
and location = ‘79497d97-3eb6-400c-8ed0-d48d28d327cb’
and changedAt between ‘2015-08-22’ and ‘2016-08-22’
and changeInfo is NOT NULL and changeInfo.details is NOT NULL

EXPLAIN

[
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IndexScan”,
“covers”: [
“cover ((sync_gateway_sw1.type))”,
“cover ((meta(sync_gateway_sw1).id))”,
“cover ((sync_gateway_sw1._deleted))”,
“cover ((sync_gateway_sw1.softDelete))”,
“cover ((sync_gateway_sw1.location))”,
“cover ((sync_gateway_sw1.changeType))”,
“cover ((sync_gateway_sw1.changedAt))”,
“cover ((((sync_gateway_sw1.changeInfo).details).isEmergency))”,
“cover ((meta(sync_gateway_sw1).id))”
],
“filter_covers”: {
“cover ((((sync_gateway_sw1.changeInfo).details) is not null))”: true,
“cover (((sync_gateway_sw1.changeInfo) is not null))”: true,
“cover (((sync_gateway_sw1.changedAt) is not null))”: true,
“cover (((sync_gateway_sw1.location) is not null))”: true,
“cover ((sync_gateway_sw1.changeType))”: “task-create”,
“cover ((sync_gateway_sw1.type))”: “change-log-entry”,
“cover ((not ((meta(sync_gateway_sw1).id) like "_sync:%")))”: true
},
“index”: “change_log_task_create_v1”,
“index_id”: “1d6cfba2866ddb20”,
“keyspace”: “sync_gateway_sw1”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“successor("change-log-entry")”
],
“Inclusion”: 1,
“Low”: [
“"change-log-entry"”
]
}
}
],
“using”: “gsi”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Filter”,
“condition”: “((((((((((cover ((not ((meta(sync_gateway_sw1).id) like "_sync:%"))) and ((cover ((sync_gateway_sw1._deleted)) = false) or (cover ((sync_gateway_sw1._deleted)) is missing))) and ((cover ((sync_gateway_sw1.softDelete)) is missing) or (cover ((sync_gateway_sw1.softDelete)) = false))) and (cover ((sync_gateway_sw1.type)) = "change-log-entry")) and (cover ((sync_gateway_sw1.changeType)) = "task-create")) and cover (((sync_gateway_sw1.location) is not null))) and cover (((sync_gateway_sw1.changedAt) is not null))) and (cover ((sync_gateway_sw1.location)) = "79497d97-3eb6-400c-8ed0-d48d28d327cb")) and (cover ((sync_gateway_sw1.changedAt)) between "2015-08-22" and "2016-08-22")) and cover (((sync_gateway_sw1.changeInfo) is not null))) and cover ((((sync_gateway_sw1.changeInfo).details) is not null)))”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “cover ((meta(sync_gateway_sw1).id))”
},
{
“expr”: “cover ((sync_gateway_sw1.location))”
},
{
“expr”: “cover ((((sync_gateway_sw1.changeInfo).details).isEmergency))”
}
]
},
{
#operator”: “FinalProject”
}
]
}
}
]
},
“text”: “select meta().id,location, changeInfo.details.isEmergency\nFROM sync_gateway_sw1 use index(change_log_task_create_v1)\nWHERE meta().id NOT LIKE ‘_sync:%’ \nAND (_deleted = FALSE OR _deleted IS MISSING) \nAND (softDelete is MISSING or softDelete = false) \nAND type = ‘change-log-entry’ and changeType = ‘task-create’\nand location is not null and changedAt is not null\nand location = ‘79497d97-3eb6-400c-8ed0-d48d28d327cb’\nand changedAt between ‘2015-08-22’ and ‘2016-08-22’\nand changeInfo is NOT NULL and changeInfo.details is NOT NULL”
}
]

METRICS:

“view_duration = 0.332000941 rows=221”
“n1ql_duration = 0.691879602 rows=221”

Remove type, meta().id, _deleted, softDelete, etc from the index KEY definition…
you already have in the WHERE clause of create index.

You need to to include things like _chagedat, location, changeInfo.details.isEmergency in the index key.
Right now, the index scan isn’t very selective.

Not much change in the execution time.

Should N1QL be faster then views?

Index:

CREATE INDEX change_log_task_create_v2 ON sync_gateway_sw1(location,changedAt, changeInfo.details.isEmergency)
WHERE ((((((not ((meta().id) like “_sync:%”)) and ((_deleted = false) or (_deleted is missing))) AND (softDelete is MISSING or softDelete = false) and (type = “change-log-entry”)) and
(changeType = “task-create”)) and (location is not null)) and (changedAt is not null))
and changeInfo is NOT NULL and changeInfo.details is NOT NULL

Explain:

[
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IndexScan”,
“index”: “change_log_task_create_v2”,
“index_id”: “b137106aee076d92”,
“keyspace”: “sync_gateway_sw1”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“"79497d97-3eb6-400c-8ed0-d48d28d327cb"”,
“successor("2016-08-22")”
],
“Inclusion”: 1,
“Low”: [
“"79497d97-3eb6-400c-8ed0-d48d28d327cb"”,
“"2015-08-22"”
]
}
}
],
“using”: “gsi”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Fetch”,
“keyspace”: “sync_gateway_sw1”,
“namespace”: “default”
},
{
#operator”: “Filter”,
“condition”: “(((((((((((not ((meta(sync_gateway_sw1).id) like "_sync:%")) and (((sync_gateway_sw1._deleted) = false) or ((sync_gateway_sw1._deleted) is missing))) and (((sync_gateway_sw1.softDelete) is missing) or ((sync_gateway_sw1.softDelete) = false))) and ((sync_gateway_sw1.type) = "change-log-entry")) and ((sync_gateway_sw1.changeType) = "task-create")) and ((sync_gateway_sw1.location) is not null)) and ((sync_gateway_sw1.changedAt) is not null)) and ((sync_gateway_sw1.location) = "79497d97-3eb6-400c-8ed0-d48d28d327cb")) and ((sync_gateway_sw1.changeInfo) is not null)) and (((sync_gateway_sw1.changeInfo).details) is not null)) and ((sync_gateway_sw1.changedAt) between "2015-08-22" and "2016-08-22"))”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “(meta(sync_gateway_sw1).id)”
},
{
“expr”: “(sync_gateway_sw1.location)”
},
{
“expr”: “(((sync_gateway_sw1.changeInfo).details).isEmergency)”
}
]
},
{
#operator”: “FinalProject”
}
]
}
}
]
},
“text”: “select meta().id, location, changeInfo.details.isEmergency\n FROM sync_gateway_sw1 use index(change_log_task_create_v2)\n WHERE meta().id NOT LIKE ‘_sync:%’\n AND (_deleted = FALSE OR _deleted IS MISSING)\n AND (softDelete is MISSING or softDelete = false)\n AND type = ‘change-log-entry’ and changeType = ‘task-create’\n and location is not null and changedAt is not null\n and location = ‘79497d97-3eb6-400c-8ed0-d48d28d327cb’\n and changeInfo is NOT NULL and changeInfo.details is NOT NULL\n and changedAt between ‘2015-08-22’ and ‘2016-08-22’”
}
]

Now your index is not covering…

Previous post was covered, this was from keshav_m suggestion