Is there a way to tell if an index has been used?

I have started playing with the external index advisor. This leads to creating more indexes :wink:

However, then I would like to get rid of indexes that may not be used. The previous indexes have been created a while ago with more like a “guess” approach (sometimes fueled by good advices here).

I suppose I shouldn’t create a new index for each and every different type of query but ideally have some overlap - or what is the good advice?

A little more specific question about the index suggested by the advisor:

Here is an example. This query:

SELECT data.date,data.count,data.length
FROM data
WHERE type='Catch' AND year=2023
    AND statslocation IN [ "150", "153", "647", "127", "605", "655", "641", "673", "597", "644", "643", "45", "671", "645", "8" ]

leads to a recommended index:

CREATE INDEX adv_type_year_statslocation ON `data`(`type`,`year`,`statslocation`)

I know that this index is only relevant for documents of type “Catch” - so should I add a WHERE type='Catch' phrase to the index creation statement - or doesn’t it matter?

And another question/clarification. In the advisor I get this full response for the above query:

{
  "results": [
    {
      "#operator": "Advise",
      "advice": {
        "#operator": "IndexAdvice",
        "adviseinfo": {
          "recommended_indexes": {
            "covering_indexes": [
              {
                "index_statement": "CREATE INDEX adv_type_year_statslocation_date_count_length ON `data`(`type`,`year`,`statslocation`,`date`,`count`,`length`)",
                "keyspace_alias": "data"
              }
            ],
            "indexes": [
              {
                "index_statement": "CREATE INDEX adv_type_year_statslocation ON `data`(`type`,`year`,`statslocation`)",
                "keyspace_alias": "data",
                "recommending_rule": "Index keys follow order of predicate types: 2. equality/null/missing, 3. in."
              }
            ]
          }
        }
      },
      "query": "SELECT data.date,data.count,data.length\nFROM data\nWHERE type='Catch' AND year=2023\n    AND statslocation IN [ \"150\", \"153\", \"647\", \"127\", \"605\", \"655\", \"641\", \"673\", \"597\", \"644\", \"643\", \"45\", \"671\", \"645\", \"8\" ]"
    }
  ]
}

Does this mean that I should also create the first “covering” index?

The system:indexes collection has a field metadata.stats.last_scan_time; if this is non-null then you can see when the index was last used since the last restart.
So keep an eye on this and you may get an idea of which indices are no longer used.

You can get similar information directly via the Index statistics endpoint: https://docs.couchbase.com/server/current/rest-api/rest-index-stats.html#_get_index_stats

Whether or not to create partial indices depends on your usage. If for example you will always filter on type='Catch' then it does make sense to create the partial index, assuming the data volume warrants it. (One of the main advantages to a partial index is that it is smaller - and is therefore faster to traverse - but if (almost) all your data matches the filter then the effect is diminished.) (Using collections may be a better fit in some cases, doing away with a “type” field.)

The advisor will attempt to suggest a covering index if it can. If your query won’t change - including the fields/expressions that are projected - then the covering index will typically give you superior performance, usually at the cost of being a larger index (more fields are typically included). If you will be running the query often then it likely makes sense to trade the space for the speed; if this is an occasional-only query or one where you may change the selected fields (but not the filtering criteria) then likely it doesn’t. (There will of course always be exceptions.)

You may want to refer to https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/covering-indexes.html

HTH,

1 Like

Ok, so it is really a matter of creating either the covering index or the other index. Great to know - that makes totally sense.

I will have a look at system:indexes to see if some of the indexes appear obsolete :+1:

Is there any way I can get this information from the Admin interface (e.g. via a query)? I can see that I can get it via the REST API. However, I have a server where I only have easy access via the browser admin interface on port 8091 :wink:

Can you not query via the Query Workbench ? (http://:8091/ui/index.html#/query/workbench)

Otherwise AFAIK you’d need access to the appropriate Indexer port (9102 typically) to get the stats that way.

Well, I thought so, but if I run:

SELECT *
FROM system:indexes

I don’t get that property as far as I can see…

[
  {
    "indexes": {
      "condition": "((`type` = \"Catch\") and (`userkey` is valued))",
      "datastore_id": "http://127.0.0.1:8091",
      "id": "14e2468855630e91",
      "index_key": [
        "`userkey`",
        "`specieskey`",
        "`locationtype`",
        "`statslocation`",
        "`statspublic`",
        "`statstrippublic`",
        "`statstriptarget`",
        "`year`",
        "`month`",
        "`count`",
        "`length`",
        "`weight`"
      ],
      "keyspace_id": "data",
      "metadata": {
        "num_replica": 0
      },
      "name": "stats_catch_species_count_2",
      "namespace_id": "default",
      "state": "online",
      "using": "gsi"
    }
  },
:
:

Sorry, I forgot it is only in 7.6+ (MB-34863 exposed it, MB-54629 refined it) and you’re undoubtedly on an earlier version.

I believe the stat is still available via the index API though - just not through Query/SQL++.

1 Like

Yeah, I am on 7.2.4 and see that 7.6 requires an OS update as our servers are on CentOS7 - so it will take a little more to upgrade to the next version :innocent:

… but good to know that it is coming!