GSI Indexes Too Large for RAM

I am attempting to create GSI indexes on a bucket with 38 million items (couchbase 6.6.0 enterprise with plasma storage, hosted on the kubernetes operator version 2.1.0). Currently the cluster is setup with 2 index nodes, both of which have a guaranteed service class with 1.5 CPUs and 4 GBs RAM (with an index quota of 3275 MBs). I’m having trouble creating large indexes on the bucket - I can reliably create 1 or 2 large indexes, but if I try to create too many then almost all of the indexes across all buckets go into “warmup” state until I delete the large index. This screenshot shows the error right as it’s happening, after leaving things in this state for a few hours all the indexes went into the “warmup” state.

I’m seeing a lot of log messages during that time (while a service is attempting to query the indexes in “warmup” state):

Service 'indexer' exited with status 137. Restarting. Messages:
2021-05-05T09:10:00.074+00:00 [Info] SCAN##2 RESPONSE status:(error = Index not ready for serving queries), requestId: 44aa4324-686a-4885-8a4c-ec3bdfbcf626
2021-05-05T09:10:00.085+00:00 [Info] SCAN##3 REQUEST defnId:5791000603602569377, instId:0, index:/, type:scan, partitions:[0], span:<ud>(range (%!s(<nil>),%!s(<nil>) incl:none))</ud>, limit:9223372036854775807, requestId:44aa4324-686a-4885-8a4c-ec3bdfbcf626
2021-05-05T09:10:00.085+00:00 [Info] SCAN##3 RESPONSE status:(error = Index not ready for serving queries), requestId: 44aa4324-686a-4885-8a4c-ec3bdfbcf626
2021-05-05T09:10:18.600+00:00 [Info] ServiceMgr::GetTaskList []
2021-05-05T09:10:18.601+00:00 [Info] ServiceMgr::GetTaskList returns &{[0 0 0 0 0 0 0 1] []}
2021-05-05T09:10:18.601+00:00 [Info] ServiceMgr::GetCurrentTopology []
2021-05-05T09:10:18.601+00:00 [Info] ServiceMgr::GetCurrentTopology returns &{[0 0 0 0 0 0 0 1] [6c2e523d80b121be5fc51b8c65588f0e] true []}
2021-05-05T09:10:18.602+00:00 [Info] ServiceMgr::GetTaskList [0 0 0 0 0 0 0 1]
2021-05-05T09:10:18.602+00:00 [Info] ServiceMgr::GetCurrentTopology [0 0 0 0 0 0 0 1]




Service 'indexer' exited with status 137. Restarting. Messages:
2021-05-05T09:14:06.780+00:00 [Info] ServiceMgr::GetCurrentTopology returns &{[0 0 0 0 0 0 0 1] [6c2e523d80b121be5fc51b8c65588f0e] true []}
2021-05-05T09:14:06.781+00:00 [Info] ServiceMgr::GetTaskList []
2021-05-05T09:14:06.781+00:00 [Info] ServiceMgr::GetTaskList returns &{[0 0 0 0 0 0 0 1] []}
2021-05-05T09:14:06.782+00:00 [Info] ServiceMgr::GetCurrentTopology [0 0 0 0 0 0 0 1]
2021-05-05T09:14:06.782+00:00 [Info] ServiceMgr::GetTaskList [0 0 0 0 0 0 0 1]
2021-05-05T09:14:11.284+00:00 [Info] fs-bucket-v0/sg_allDocs_x1/Backstore#13579134874878103257:0 Plasma: Enable page eviction before reaching quota.
2021-05-05T09:14:11.284+00:00 [Info] fs-bucket-v0/sg_allDocs_x1/Mainstore#13579134874878103257:0 Plasma: Enable page eviction before reaching quota.
2021-05-05T09:14:11.389+00:00 [Info] fs-bucket-v0/sg_syncDocs_x1/Mainstore#17284487810525332057:0 Plasma: Enable page eviction before reaching quota.
2021-05-05T09:14:11.484+00:00 [Info] fs-bucket-v0/sg_syncDocs_x1/Backstore#17284487810525332057:0 Plasma: Enable page eviction before reaching quota.


Service 'indexer' exited with status 137. Restarting. Messages:
2021-05-05T09:15:16.416+00:00 [Info] PeriodicStats = {"cpu_utilization":100.16666666666667,"index_not_found_errcount":0,"indexer_state":"Warmup","memory_free":6946410496,"memory_quota":3434086400,"memory_rss":1593073664,"memory_total":33676562432,"memory_total_storage":0,"memory_used":0,"memory_used_queue":0,"memory_used_storage":0,"needs_restart":false,"num_connections":0,"num_cpu_core":8,"storage_mode":"plasma","timestamp":"1620206116416462577","timings/stats_response":"72 7724012 889244961924","uptime":"1m2.294666563s"}
2021-05-05T09:15:17.296+00:00 [Info] ServiceMgr::GetTaskList []
2021-05-05T09:15:17.296+00:00 [Info] ServiceMgr::GetTaskList returns &{[0 0 0 0 0 0 0 1] []}
2021-05-05T09:15:17.297+00:00 [Info] ServiceMgr::GetCurrentTopology []
2021-05-05T09:15:17.297+00:00 [Info] ServiceMgr::GetCurrentTopology returns &{[0 0 0 0 0 0 0 1] [6c2e523d80b121be5fc51b8c65588f0e] true []}
2021-05-05T09:15:17.297+00:00 [Info] ServiceMgr::GetTaskList [0 0 0 0 0 0 0 1]
2021-05-05T09:15:17.298+00:00 [Info] ServiceMgr::GetCurrentTopology [0 0 0 0 0 0 0 1]


Service 'indexer' exited with status 137. Restarting. Messages:
2021-05-05T09:02:13.518+00:00 [Info] rostercache/Index_Type/Backstore#5194750392116224388:0 Plasma: checkpoint recovery replayOffset:8631005184 startOffset:8087875811, endOffset:11201191936, recoverTime:1.518611296s
2021-05-05T09:02:42.670+00:00 [Info] ServiceMgr::GetCurrentTopology []
2021-05-05T09:02:42.670+00:00 [Info] ServiceMgr::GetCurrentTopology returns &{[0 0 0 0 0 0 0 1] [6c2e523d80b121be5fc51b8c65588f0e] true []}
2021-05-05T09:02:42.671+00:00 [Info] ServiceMgr::GetTaskList []
2021-05-05T09:02:42.671+00:00 [Info] ServiceMgr::GetTaskList returns &{[0 0 0 0 0 0 0 1] []}
2021-05-05T09:02:42.671+00:00 [Info] ServiceMgr::GetCurrentTopology [0 0 0 0 0 0 0 1]
2021-05-05T09:02:42.672+00:00 [Info] ServiceMgr::GetTaskList [0 0 0 0 0 0 0 1]
2021-05-05T09:03:02.108+00:00 [Info] fs-bucket-v0/sg_roleAccess_x1/Mainstore#8459312647145716814:0 Plasma: Enable page eviction before reaching quota.
2021-05-05T09:03:02.108+00:00 [Info] fs-bucket-v0/sg_roleAccess_x1/Backstore#8459312647145716814:0 Plasma: Enable page eviction before reaching quota.

I’ve enabled “full ejection” on the buckets that are having the problem, this did not help. I had assumed that GSI indexes could fit into a relatively small RAM size, is this not the case? If so, what is the minimum amount of RAM required for a GSI index? From the documentation, it seems that 256 MB should be sufficient to run GSI indexes - but in practice 4 GB per node doesn’t seem to be enough:

https://docs.couchbase.com/server/current/install/sizing-general.html#sizing-index-service-nodes

Other documentation indicates that partial indexes can help (What Is Database Indexing? – Best Practices and Examples), does this mean that a large number of small partial indexes will fit even if a single large GSI will not?

Hi @Louis.Racette,

Your nodes are very small

If you look at couchbase-cloud’s sizing (which runs on version 6.6.0) you will see that the minimums are are much higher than the resources you dedicated for your index nodes .
https://docs.couchbase.com/cloud/clusters/sizing.html#minimum-specification
The above shows 4 vCPUs and 16GB of RAM. The RAM in your case is more important than the vCPUs

I am definitely not an expert at sizing Couchbase with small nodes it does work but you hits a few speed bumps, but I believe you also have two other options a) if you lower the number of buckets or b) add more RAM I think you might escape from your current low resource situation. I think there may also be a third option i.e. creating a sharded index.

Getting back to your issue, it like you have at least (2) buckets and about 40M docs total across your buckets (are there more buckets and more data?)

You say

if I try to create too many then almost all of the indexes across all buckets go into “warmup” state until I delete the large index

This looks like a bug if you could take a cbcollect_info and open a formal support ticket that would be helpful.

Best

Jon Strabala

I ended up using a larger number of partial indexes and that worked, but scaling up to the minimum spec sounds like a better option - we’ll try that out. Thanks!

@Louis.Racette Since you are able to make things work with larger number of partial indexes , it appears it was a resource (cpu, ram) sizing issue that caused the original problem. Couchbase typically recommends maintaining 20% resident ratio for GSI indexes using plasma.