Create index is slow on a huge number of documents

webber · May 18, 2016, 7:06pm

I am creating two indexes on a Couchbase cluster.

One is the primary index, the other is a secondary index. I am creating these indexes on approximately 30 billion documents. The secondary index covers 3 elements.

I started creating them 7 hours ago, but the current progress is only 35 and 39 percent.
Does it usually take this long to create indexes on such a large data set, or is something wrong with my environment?
When do you think the creation will finish?
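
For a rough sense of the finish time, the figures above can be extrapolated linearly. This is only a sketch: it assumes the build rate stays constant for the rest of the build, which may not hold, and the function name `estimate_remaining_hours` is made up for illustration.

```python
def estimate_remaining_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linearly extrapolate the time left, assuming a constant build rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total_hours = elapsed_hours * 100.0 / percent_done
    return total_hours - elapsed_hours


# Figures from the post: 7 hours elapsed, indexes at 35% and 39%.
for percent in (35, 39):
    remaining = estimate_remaining_hours(7, percent)
    print(f"{percent}% done after 7 h: about {remaining:.1f} h remaining")
```

Under that assumption the two builds would need roughly 13 and 11 more hours.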

The cluster has 4 nodes (16 cores each) and the total index RAM quota is 10GB. 2 of the nodes are index servers.

The index settings are as follows:

Indexer Threads: 8
In Memory Snapshot Interval: 200 ms
Stable Snapshot Interval: 5000 ms
Max Rollback Points: 5
Indexer Log Level: info

Thanks

cihangirb · May 18, 2016, 8:49pm

Index build times can be high for a few reasons:

Retrieval of the information from the data service is slow.
Index nodes can’t save the index to disk fast enough.

There are a few options:

Use the defer_build option to build both indexes together. A deferred build ensures the data is scanned once to build both indexes.
You could also partition your indexes and add more nodes to parallelize the index build. For partitioning, you can specify a filter (a WHERE clause in CREATE INDEX). However, I should note that some queries may not be able to take advantage of range scans on a partitioned index.
Last, we have another option in 4.5 called memory optimized indexes that can build the index much faster in memory. However, given the number of docs, I don’t think you will be able to fit your index into memory.
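
The defer_build suggestion could look roughly like the following. This is a sketch, not the exact statements for this cluster: the bucket name `mybucket`, the index names, and the field names `f1`, `f2`, `f3` are placeholders, and the Python helper `deferred_index_statements` is hypothetical. It only assembles the N1QL text, which would then be run through cbq or the query workbench.

```python
import json


def deferred_index_statements(bucket, primary_name, secondary_name, fields):
    """Build N1QL text that defers both index builds, then builds them together."""
    with_clause = json.dumps({"defer_build": True})  # -> {"defer_build": true}
    field_list = ", ".join(f"`{name}`" for name in fields)
    return [
        f"CREATE PRIMARY INDEX `{primary_name}` ON `{bucket}` USING GSI WITH {with_clause};",
        f"CREATE INDEX `{secondary_name}` ON `{bucket}`({field_list}) USING GSI WITH {with_clause};",
        # A single BUILD INDEX naming both indexes triggers one combined build.
        f"BUILD INDEX ON `{bucket}`(`{primary_name}`, `{secondary_name}`) USING GSI;",
    ]


# Placeholder names; substitute your own bucket, index and field names.
for statement in deferred_index_statements(
    "mybucket", "idx_primary", "idx_secondary", ["f1", "f2", "f3"]
):
    print(statement)
```

The two CREATE statements only register the index definitions; nothing is scanned until the BUILD INDEX statement runs.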

What are the document key size and index key size? Just curious.
Thanks
-cihan

webber · May 19, 2016, 3:16pm

Hi cihangirb,

Thank you for your reply. The index key size is 45 bytes.
The cluster has 4 nodes on AWS, each with 16 cores and SSD storage.
I don’t think retrieval or saving is slow, but what do you think?
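
As a back-of-the-envelope check on whether the index could fit in memory, the numbers in this thread can be multiplied out. This is a lower-bound sketch: it counts only the raw index key bytes (document IDs and storage overhead are ignored), it assumes all 30 billion documents are indexed, and the function name `raw_index_key_bytes` is illustrative.

```python
def raw_index_key_bytes(doc_count: int, key_size_bytes: int) -> int:
    """Raw bytes of index keys only; excludes document IDs and overhead."""
    return doc_count * key_size_bytes


DOC_COUNT = 30_000_000_000      # "approximately 30 billion documents"
KEY_SIZE_BYTES = 45             # index key size reported above
RAM_QUOTA_BYTES = 10 * 1024**3  # 10GB index RAM quota, read here as GiB

key_bytes = raw_index_key_bytes(DOC_COUNT, KEY_SIZE_BYTES)
print(f"raw key data: {key_bytes / 1e12:.2f} TB")
print(f"ratio to index RAM quota: {key_bytes / RAM_QUOTA_BYTES:.0f}x")
```

That is about 1.35 TB of raw key data, more than a hundred times the 10GB index RAM quota.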

Thanks

eldorado · December 25, 2019, 10:36pm

@webber - Did you ever find a solution to this problem?

varun.velamuri · December 26, 2019, 7:26am

@eldorado,

Just for your information, the underlying storage engine in use when @webber tried this use case was probably ForestDB (considering that the initial post dates from May ’16). The current underlying storage engine is Plasma, which is very different and performs better than ForestDB.

Thanks,
Varun

eldorado · December 26, 2019, 8:48am

@varun.velamuri - Sure, I know Plasma is a better bet than ForestDB, but I was looking for information on what his choice was, if he ever resolved the issue. In a lot of cases I see threads left dangling with no solution, so it would be really helpful to close the case with a resolution. But thanks for pointing that out.
