Global Secondary Index Size

I created the following GSI index on my bucket:

CREATE PRIMARY INDEX `#primary` ON `parse_with_index` USING GSI WITH {"nodes":""};

Here is the size of the index folder on my node before the creation:

/dev/xvdh1      296G   65M  281G   1% /index

Here is the size after the index creation:

/dev/xvdh1      296G  162G  119G  58% /index

My bucket size says the following on the console:

63.6GB /

I am wondering why the index size would be greater than the actual data on disk.


Could you share your version, please? GSI behavior here has changed between versions.

There may be a few reasons:

  • Compaction may not have run yet: we use an append-only write mode that favors fast sequential writes over space efficiency. A process called compaction removes the orphaned pages at an interval. To see the fragmentation ratio, you can look at the stats for the index under "% fragmentation".
  • Maintaining a tree is more expensive than storing raw data: an index carries some overhead over the data it indexes because of the tree structure it maintains, in either its skiplist or B-tree flavor. You can also see the index disk size and the index data size (the size of the data being indexed) under the index stats (click the bucket name under the "Data Buckets" tab in the web console). Let me know if you are seeing a discrepancy between the stat and what you observe on the filesystem.
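As a rough illustration of the first point (this is not Couchbase code, just the arithmetic behind the "% fragmentation" stat): fragmentation is essentially the share of the on-disk index file occupied by orphaned pages rather than live data.

```python
def fragmentation_pct(disk_size_gb, data_size_gb):
    """Share of the on-disk index file that is orphaned (reclaimable)
    space rather than live index data, as a percentage."""
    return (disk_size_gb - data_size_gb) / disk_size_gb * 100

# Numbers from this thread: ~162 GB on disk before compaction,
# ~8.3 GB after compaction (roughly the live data size).
print(round(fragmentation_pct(162.0, 8.3), 1))  # 94.9
```

A ~95% fragmented file is consistent with an index file that shrinks from 162 GB to 8.3 GB once compaction runs.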

Hello Cihan,

I am testing on 4.1.1-5914 Enterprise Edition (build-5914). I had to remove my node because of a disk space issue. I should have another up and running in an hour or two. I will keep you posted on the index stats.

Thanks for the help.


Thanks. I’d also recommend looking into 4.5 with circular writes.
4.0, 4.1, and 4.5 all provide the append-only write mode. With 4.5 we have introduced another write mode that eliminates the need for frequent compaction: a write looks for orphaned pages and reuses their space instead of appending everything to the end of the file. It is now the default mode when the standard GSI storage mode is used.
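As a toy model of the difference (assumptions: one page per write, no metadata overhead, and no compaction; this is not the actual storage-engine implementation), here is why an append-only file grows with every update while circular reuse keeps it flat:

```python
def index_file_pages(live_pages, updates, circular):
    """Toy model of an index file under repeated page rewrites.

    Every rewrite orphans the old copy of a page. In append-only mode
    the new copy is appended, so the file grows until compaction runs.
    In circular mode the new copy overwrites an orphaned slot, so the
    file size stays put.
    """
    file_size = live_pages   # pages currently in the file
    free_slots = 0           # orphaned pages available for reuse
    for _ in range(updates):
        free_slots += 1      # old copy of the rewritten page is orphaned
        if circular:
            free_slots -= 1  # new copy reuses an orphaned slot
        else:
            file_size += 1   # new copy is appended at the end of the file
    return file_size

print(index_file_pages(100, 900, circular=False))  # 1000 pages, 90% garbage
print(index_file_pages(100, 900, circular=True))   # 100 pages, no growth
```

This is how a 63.6 GB bucket can produce a much larger index file under append-only writes between compaction runs.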
If you have enough memory to keep your indexes in memory, I’d recommend memory-optimized indexes as well. The I/O overhead and the I/O subsystem required are much less demanding under the memory-optimized storage mode. All index maintenance happens in memory, so both scans and index maintenance are much faster.
You can find the indexing options for N1QL here.


You should definitely use 4.5.
The 4.0/4.1 indexes have a pretty ugly implementation, or, to "rephrase politely", "there are a lot of really significant improvements in 4.5". With 4.1.x GSI, expect various sporadic bugs plus fragmentation of up to 98% :wink: There is a lot of fun there…

Here is my index stats info:


Disk space returned to normal after compaction of the bucket. 8.3 GB! Wow, this is great.