Best practices for creating indexes in Couchbase Lite

I’m trying to improve performance of queries in my app and therefore revisit a topic that I have looked at quite some time ago. I tried to get “best practices” during another thread - but that disappeared through the process.

Up till now I have only had one index - on the type field.

I have now looked at some queries and thought I would try to improve it by adding some more specific indexes based on the queries. So now I have:

Db.CreateIndex("type", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property("type"))));
Db.CreateIndex("type_user", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property("type")), ValueIndexItem.Expression(Expression.Property("userkey").NotNullOrMissing())));
Db.CreateIndex("stats_catch", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property("type")),
                ValueIndexItem.Expression(Expression.Property("userkey").NotNullOrMissing()),
                ValueIndexItem.Expression(Expression.Property("specieskey").NotNullOrMissing()),
                ValueIndexItem.Expression(Expression.Property("locationtype").NotNullOrMissing())
    ));

I have then added a query.Explain() to the various calls to see what index they use. And now the all seem to use the type_user index…???

These are the explains (that also show the various queries):

GetList: SELECT fl_result(fl_root(_doc.body)) FROM kv_default AS _doc WHERE (fl_value(_doc.body, 'type') = 'Catch' AND fl_value(_doc.body, 'ispublic') IS fl_bool(1)) AND (_doc.flags & 1 = 0) ORDER BY fl_value(_doc.body, 'date') DESC, fl_value(_doc.body, 'catchtime') DESC LIMIT MAX(0, 20) OFFSET MAX(0, 0)

20|0|0| SEARCH TABLE kv_default AS _doc USING INDEX type_user (<expr>=?)
49|0|0| USE TEMP B-TREE FOR ORDER BY

{"WHERE":["AND",["=",[".type"],"Catch"],["IS",[".ispublic"],true]],"LIMIT":20,"OFFSET":0,"ORDER_BY":[["DESC",[".date"]],["DESC",[".catchtime"]]],"WHAT":[["."]]}


GetStats (MY counters): SELECT fl_result(sum(fl_value(_doc.body, 'count'))) AS "s1", fl_result(sum(fl_value(_doc.body, 'takenhome'))) AS "s2" FROM kv_default AS _doc WHERE (fl_value(_doc.body, 'type') = 'Catch' AND fl_value(_doc.body, 'userkey') = '2124DEFEC111BA8FC1257ED20034B387') AND (_doc.flags & 1 = 0)

4|0|0| SEARCH TABLE kv_default AS _doc USING INDEX type_user (<expr>=?)

{"WHERE":["AND",["=",[".type"],"Catch"],["=",[".userkey"],"2124DEFEC111BA8FC1257ED20034B387"]],"WHAT":[["AS",["SUM()",[".count"]],"s1"],["AS",["SUM()",[".takenhome"]],"s2"]]}


GetStats: SELECT fl_result(fl_value(_doc.body, 'targetspecies')), fl_result(fl_value(_doc.body, 'hoursfished')), fl_result(fl_value(_doc.body, 'minutesfished')), fl_result(fl_value(_doc.body, 'date')), fl_result(fl_value(_doc.body, 'zerotrip')) FROM kv_default AS _doc WHERE (fl_value(_doc.body, 'type') = 'FishingTrip' AND fl_value(_doc.body, 'userkey') = '2124DEFEC111BA8FC1257ED20034B387') AND (_doc.flags & 1 = 0)

3|0|0| SEARCH TABLE kv_default AS _doc USING INDEX type_user (<expr>=?)

{"WHERE":["AND",["=",[".type"],"FishingTrip"],["=",[".userkey"],"2124DEFEC111BA8FC1257ED20034B387"]],"WHAT":[[".targetspecies"],[".hoursfished"],[".minutesfished"],[".date"],[".zerotrip"]]}


GetStats (species count): SELECT fl_result(fl_value(_doc.body, 'specieskey')), fl_result(sum(fl_value(_doc.body, 'count'))) AS "s1" FROM kv_default AS _doc WHERE (fl_value(_doc.body, 'type') = 'Catch' AND fl_value(_doc.body, 'userkey') = '2124DEFEC111BA8FC1257ED20034B387') AND (_doc.flags & 1 = 0) GROUP BY fl_value(_doc.body, 'specieskey') ORDER BY "s1"

8|0|0| SEARCH TABLE kv_default AS _doc USING INDEX type_user (<expr>=?)
19|0|0| USE TEMP B-TREE FOR GROUP BY
61|0|0| USE TEMP B-TREE FOR ORDER BY

{"WHERE":["AND",["=",[".type"],"Catch"],["=",[".userkey"],"2124DEFEC111BA8FC1257ED20034B387"]],"ORDER_BY":[[".s1"]],"WHAT":[[".specieskey"],["AS",["SUM()",[".count"]],"s1"]],"GROUP_BY":[[".specieskey"]]}

GetFishPerHour: SELECT fl_result(fl_value(_doc.body, 'targetspecies')), fl_result(sum(fl_value(_doc.body, 'hoursfished'))) AS "s1", fl_result(sum(fl_value(_doc.body, 'minutesfished'))) AS "s2", fl_result(fl_value(_doc.body, 'zerotrip')), fl_result(count(fl_root(_doc.body))) AS "count" FROM kv_default AS _doc WHERE (((((fl_value(_doc.body, 'type') = 'FishingTrip' AND fl_value(_doc.body, 'userkey') = '2124DEFEC111BA8FC1257ED20034B387') AND (NOT (fl_value(_doc.body, 'locationtype') = fl_null() OR fl_value(_doc.body, 'locationtype') IS NULL) AND NOT (fl_value(_doc.body, 'statslocation') = fl_null() OR fl_value(_doc.body, 'statslocation') IS NULL))) AND NOT (fl_value(_doc.body, 'year') = fl_null() OR fl_value(_doc.body, 'year') IS NULL)) AND NOT (fl_value(_doc.body, 'month') = fl_null() OR fl_value(_doc.body, 'month') IS NULL)) AND NOT (fl_value(_doc.body, 'statspublic') = fl_null() OR fl_value(_doc.body, 'statspublic') IS NULL)) AND (_doc.flags & 1 = 0) GROUP BY fl_value(_doc.body, 'targetspecies'), fl_value(_doc.body, 'zerotrip')

7|0|0| SEARCH TABLE kv_default AS _doc USING INDEX type_user (<expr>=?)
58|0|0| USE TEMP B-TREE FOR GROUP BY

{"WHERE":["AND",["AND",["AND",["AND",["AND",["=",[".type"],"FishingTrip"],["=",[".userkey"],"2124DEFEC111BA8FC1257ED20034B387"]],["AND",["NOT",["OR",["IS",[".locationtype"],null],["IS",[".locationtype"],["MISSING"]]]],["NOT",["OR",["IS",[".statslocation"],null],["IS",[".statslocation"],["MISSING"]]]]]],["NOT",["OR",["IS",[".year"],null],["IS",[".year"],["MISSING"]]]]],["NOT",["OR",["IS",[".month"],null],["IS",[".month"],["MISSING"]]]]],["NOT",["OR",["IS",[".statspublic"],null],["IS",[".statspublic"],["MISSING"]]]]],"WHAT":[[".targetspecies"],["AS",["SUM()",[".hoursfished"]],"s1"],["AS",["SUM()",[".minutesfished"]],"s2"],[".zerotrip"],["AS",["COUNT()",["."]],"count"]],"GROUP_BY":[[".targetspecies"],[".zerotrip"]]}


GetStats (best catch): SELECT fl_result(fl_value(_doc.body, 'specieskey')), fl_result(max(fl_value(_doc.body, 'length'))) AS "longest" FROM kv_default AS _doc WHERE ((fl_value(_doc.body, 'type') = 'Catch' AND fl_value(_doc.body, 'userkey') = '2124DEFEC111BA8FC1257ED20034B387') AND NOT (fl_value(_doc.body, 'length') = fl_null() OR fl_value(_doc.body, 'length') IS NULL)) AND (_doc.flags & 1 = 0) GROUP BY fl_value(_doc.body, 'specieskey') ORDER BY "longest" DESC

8|0|0| SEARCH TABLE kv_default AS _doc USING INDEX type_user (<expr>=?)
27|0|0| USE TEMP B-TREE FOR GROUP BY
70|0|0| USE TEMP B-TREE FOR ORDER BY

{"WHERE":["AND",["AND",["=",[".type"],"Catch"],["=",[".userkey"],"2124DEFEC111BA8FC1257ED20034B387"]],["NOT",["OR",["IS",[".length"],null],["IS",[".length"],["MISSING"]]]]],"ORDER_BY":[["DESC",[".longest"]]],"WHAT":[[".specieskey"],["AS",["MAX()",[".length"]],"longest"]],"GROUP_BY":[[".specieskey"]]}


GetList: SELECT fl_result(fl_root(_doc.body)) FROM kv_default AS _doc WHERE (fl_value(_doc.body, 'type') = 'FishingTrip' AND fl_value(_doc.body, 'userkey') = '2124DEFEC111BA8FC1257ED20034B387') AND (_doc.flags & 1 = 0) ORDER BY fl_value(_doc.body, 'date') DESC, fl_value(_doc.body, 'started') DESC LIMIT MAX(0, 2147483647) OFFSET MAX(0, 0)

20|0|0| SEARCH TABLE kv_default AS _doc USING INDEX type_user (<expr>=?)
46|0|0| USE TEMP B-TREE FOR ORDER BY

{"WHERE":["AND",["=",[".type"],"FishingTrip"],["=",[".userkey"],"2124DEFEC111BA8FC1257ED20034B387"]],"LIMIT":2147483647,"OFFSET":0,"ORDER_BY":[["DESC",[".date"]],["DESC",[".started"]]],"WHAT":[["."]]}


GetList: SELECT fl_result(fl_root(_doc.body)) FROM kv_default AS _doc WHERE (fl_value(_doc.body, 'type') = 'Catch' AND fl_value(_doc.body, 'userkey') = '2124DEFEC111BA8FC1257ED20034B387') AND (_doc.flags & 1 = 0) ORDER BY fl_value(_doc.body, 'date') DESC, fl_value(_doc.body, 'catchtime') DESC LIMIT MAX(0, 2147483647) OFFSET MAX(0, 0)

20|0|0| SEARCH TABLE kv_default AS _doc USING INDEX type_user (<expr>=?)
46|0|0| USE TEMP B-TREE FOR ORDER BY

{"WHERE":["AND",["=",[".type"],"Catch"],["=",[".userkey"],"2124DEFEC111BA8FC1257ED20034B387"]],"LIMIT":2147483647,"OFFSET":0,"ORDER_BY":[["DESC",[".date"]],["DESC",[".catchtime"]]],"WHAT":[["."]]}


GetFishPerHourStats: SELECT fl_result(sum(fl_value(_doc.body, 'hoursfished'))) AS "s1", fl_result(sum(fl_value(_doc.body, 'minutesfished'))) AS "s2", fl_result(count(fl_root(_doc.body))) AS "count", fl_result(fl_value(_doc.body, 'year')), fl_result(fl_value(_doc.body, 'zerotrip')), fl_result(fl_value(_doc.body, 'userkey')) FROM kv_default AS _doc WHERE ((((((fl_value(_doc.body, 'type') = 'FishingTrip' AND NOT (fl_value(_doc.body, 'userkey') = fl_null() OR fl_value(_doc.body, 'userkey') IS NULL)) AND NOT (fl_value(_doc.body, 'statslocation') = fl_null() OR fl_value(_doc.body, 'statslocation') IS NULL)) AND NOT (fl_value(_doc.body, 'month') = fl_null() OR fl_value(_doc.body, 'month') IS NULL)) AND fl_value(_doc.body, 'statspublic') IS fl_bool(1)) AND fl_value(_doc.body, 'locationtype') IN ('1', '2', '4', '5')) AND fl_value(_doc.body, 'year') >= 2016) AND (_doc.flags & 1 = 0) GROUP BY fl_value(_doc.body, 'year'), fl_value(_doc.body, 'zerotrip'), fl_value(_doc.body, 'userkey')

7|0|0| SEARCH TABLE kv_default AS _doc USING INDEX type_user (<expr>=?)
65|0|0| USE TEMP B-TREE FOR GROUP BY

{"WHERE":["AND",["AND",["AND",["AND",["AND",["AND",["=",[".type"],"FishingTrip"],["NOT",["OR",["IS",[".userkey"],null],["IS",[".userkey"],["MISSING"]]]]],["NOT",["OR",["IS",[".statslocation"],null],["IS",[".statslocation"],["MISSING"]]]]],["NOT",["OR",["IS",[".month"],null],["IS",[".month"],["MISSING"]]]]],["IS",[".statspublic"],true]],["IN",[".locationtype"],["[]","1","2","4","5"]]],[">=",[".year"],2016]],"WHAT":[["AS",["SUM()",[".hoursfished"]],"s1"],["AS",["SUM()",[".minutesfished"]],"s2"],["AS",["COUNT()",["."]],"count"],[".year"],[".zerotrip"],[".userkey"]],"GROUP_BY":[[".year"],[".zerotrip"],[".userkey"]]}

GetStats (locations): SELECT fl_result(fl_value(_doc.body, 'locationtype')), fl_result(fl_value(_doc.body, 'statslocation')), fl_result(count(fl_root(_doc.body))) AS "count" FROM kv_default AS _doc WHERE ((((((fl_value(_doc.body, 'type') = 'FishingTrip' AND NOT (fl_value(_doc.body, 'userkey') = fl_null() OR fl_value(_doc.body, 'userkey') IS NULL)) AND NOT (fl_value(_doc.body, 'statslocation') = fl_null() OR fl_value(_doc.body, 'statslocation') IS NULL)) AND NOT (fl_value(_doc.body, 'month') = fl_null() OR fl_value(_doc.body, 'month') IS NULL)) AND fl_value(_doc.body, 'statspublic') IS fl_bool(1)) AND fl_value(_doc.body, 'locationtype') IN ('1', '2', '4', '5')) AND fl_value(_doc.body, 'year') >= 2016) AND (_doc.flags & 1 = 0) GROUP BY fl_value(_doc.body, 'locationtype'), fl_value(_doc.body, 'statslocation')

7|0|0| SEARCH TABLE kv_default AS _doc USING INDEX type_user (<expr>=?)
65|0|0| USE TEMP B-TREE FOR GROUP BY

{"WHERE":["AND",["AND",["AND",["AND",["AND",["AND",["=",[".type"],"FishingTrip"],["NOT",["OR",["IS",[".userkey"],null],["IS",[".userkey"],["MISSING"]]]]],["NOT",["OR",["IS",[".statslocation"],null],["IS",[".statslocation"],["MISSING"]]]]],["NOT",["OR",["IS",[".month"],null],["IS",[".month"],["MISSING"]]]]],["IS",[".statspublic"],true]],["IN",[".locationtype"],["[]","1","2","4","5"]]],[">=",[".year"],2016]],"WHAT":[[".locationtype"],[".statslocation"],["AS",["COUNT()",["."]],"count"]],"GROUP_BY":[[".locationtype"],[".statslocation"]]}

GetStats (user's locations): SELECT fl_result(fl_value(_doc.body, 'locationtype')), fl_result(fl_value(_doc.body, 'statslocation')) FROM kv_default AS _doc WHERE ((((fl_value(_doc.body, 'type') = 'FishingTrip' AND fl_value(_doc.body, 'userkey') = '2124DEFEC111BA8FC1257ED20034B387') AND NOT (fl_value(_doc.body, 'statslocation') = fl_null() OR fl_value(_doc.body, 'statslocation') IS NULL)) AND fl_value(_doc.body, 'locationtype') IN ('1', '2', '4', '5')) AND fl_value(_doc.body, 'year') >= 2016) AND (_doc.flags & 1 = 0) GROUP BY fl_value(_doc.body, 'locationtype'), fl_value(_doc.body, 'statslocation')

7|0|0| SEARCH TABLE kv_default AS _doc USING INDEX type_user (<expr>=?)
52|0|0| USE TEMP B-TREE FOR GROUP BY

{"WHERE":["AND",["AND",["AND",["AND",["=",[".type"],"FishingTrip"],["=",[".userkey"],"2124DEFEC111BA8FC1257ED20034B387"]],["NOT",["OR",["IS",[".statslocation"],null],["IS",[".statslocation"],["MISSING"]]]]],["IN",[".locationtype"],["[]","1","2","4","5"]]],[">=",[".year"],2016]],"WHAT":[[".locationtype"],[".statslocation"]],"GROUP_BY":[[".locationtype"],[".statslocation"]]}

So right now it seems that CB Lite only uses one of the indexes…
This obviously leads to the question - what is the best way to find out which indexes to create for CB Lite?

Edit:
Should have mentioned om on Couchbase Lite 2.8.6 :slight_smile:

Well, your type_user index is better than the type index since it adds another column. That might be why SQLite prefers it. (In general, if you have an index on (x, y) you don’t also need an index on (x) alone.)

Your type_user index’s definition is sort of weird, and probably not what you were expecting. The second column will consist of boolean values, which are the result of the NotNullOrMissing function. So this doesn’t help you when trying to look up a particular value of userkey as in the second query. For that you just want to add the userkey column directly.

(I’m guessing you were trying to create a “partial index”, i.e. just index the documents where the userkey property exists. Unfortunately Couchbase Lite doesn’t support partial indexes yet.)

Ahhh… yes, you are right - I was trying to create a partial index. So those two indexes would in effect be equally good (or bad).

So following that knowledge then I guess it wouldn’t make sense to have both the type_user and stats_catch indexes. The extra columns only make sense in a context with partial indexes, I suppose? So should I only have the more generic (type, userkey) instead of adding the more specific columns that only apply to certain queries?