Multiple copies of fts indexes

I'm running 5.0.1 build 5003 and am having an issue with FTS using all of my disk space via multiple copies of full text indexes. When I delete the FTS index, the moss files aren't deleted.

Hi – I’ve heard a similar report of this kind of thing, and we’re still trying to track it down; we’re currently trying to reproduce it. If you can share any more info about your situation, that’d be great. A cbcollect_info would be terrific if you can share that.

That’s good to hear – I’m not the only one.
Yes, I can run the cbcollect_info command; which options would you like?

-Thanks

Easiest way, imho, is to use the web admin UI approach, which automatically uploads the cbcollect_info output to a Couchbase S3 bucket (as the output’s potentially ginormous – it’s meant to capture everything tech support might ever want to know about a cluster to diagnose issues). – steve

That worked, dwilliams/collectinfo-2018-02-20T180337-ns_1%4010.10.10.8.zip

Thanks – I just pinged a colleague who’s been tracking down the other reported issue (similar but slightly different) to see if he can weave together any clues/theories on this.

Thanks, if you need any more info please ping me.

So it seems that the issue with growing disk size is on one index only: swapacd_cd_info_fts

At about 8:20 is when the disk usage of the FTS index started to increase.

I see a number of SET operations on the Couchbase bucket “swapacd_cd_info” (to which the index is tied) – which is why there is a steady increase in the indexed document count, and thereby in the number of bytes on disk used by FTS. What is your average document size on the Couchbase bucket?

It seems to me that you either killed the FTS process or deleted the index at about 8:49, as we don’t have stats after that point. It looks like the index was still building at that point, which is why disk usage continued to grow.

The files seem pretty large, so when you mentioned that the moss files weren’t cleaned up on index delete –
had the delete operation completed, or had it timed out?

Let me restate my first question: under the folder “/@fts/”, why are there multiple copies of the same index?

Now to try and answer your question: at that time, I was adding around 10 million attributes to 1.7 million documents. While monitoring the process, the Couchbase UI warned me that my disk space was low. I stopped the upload and started to track down why disk usage was high. At that point, I found 6 copies of the same FTS index, cd_info_fts. To try and recover space, I deleted the FTS index using the UI; after the UI showed the FTS index gone, I checked my disk space and found that the FTS indexes were still there, all 6 of them.

I will work on getting you the per-document size.

Thanks

Ok, so you seem to have had 4 indexes on the node, and the thing is, I don’t see the index ‘swapacd_cd_info_fts’ in the mossScope diag stats. You can look into this yourself in fts_mossScope_stats.log in the cbcollect_info you shared. I only see the 3 other indexes (which have 6 partitions each):

  • swapacd_search_terms_fts
  • swapacd_help_center_items_fts
  • swapacd_general_search_suggestions_fts

Note that we partition each FTS index into 6 by default, so we’d expect to see 6 files per FTS index.

Ok, so if I have a bucket that’s 166MB and I create a full text search index on that bucket, will FTS create 6 partitions of 166MB each, a total of 996MB?

My document size is around 4K

Thanks

Well, the bucket size isn’t directly related to the FTS index size. The FTS index size (which is an aggregate of the sizes of all its partitions) depends on the kind of index you create. Say you build a very specific index with type mappings – your index size would be much smaller compared to a default FTS index with no type mappings. The FTS index size is usually much larger than the bucket size on disk.
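As a toy illustration of that point (made-up numbers, not Couchbase internals): if you assume index size is roughly proportional to the number of term occurrences you index, then dropping large fields from the mapping shrinks the index dramatically.

```python
# Toy model (NOT Couchbase internals): assume index size is roughly
# proportional to the number of term occurrences you choose to index.
num_docs = 1_700_000                  # document count mentioned in this thread
avg_terms_per_field = {               # made-up average term counts per field
    "title": 5,
    "body": 400,
    "tags": 3,
}

def indexed_occurrences(fields):
    """Total term occurrences indexed when only `fields` are mapped."""
    return num_docs * sum(avg_terms_per_field[f] for f in fields)

everything = indexed_occurrences(["title", "body", "tags"])  # default mapping
title_only = indexed_occurrences(["title"])                  # narrow type mapping

print(everything // title_only)  # the narrow index is ~80x smaller here
```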

I’m not sure if I’ve misunderstood what you’re asking, but the partitions are not copies of each other; each contains only a portion of the data that makes up the entire index. So when you issue a search query over your index, the query is applied to all the partitions, results are fetched from all of them, aggregated, and then presented to you.
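To make that scatter/gather idea concrete, here’s a minimal illustrative sketch (not Couchbase/FTS code – the partition contents and "scoring" are made up):

```python
# Illustrative sketch of scatter/gather search across index partitions.
# Each partition indexes only a slice of the documents; a query is sent
# to every partition and the partial hits are merged into one result set.

partitions = [
    {"doc1": "red sports car", "doc2": "blue sedan"},   # partition 1's slice
    {"doc3": "red bicycle"},                            # partition 2's slice
]

def search_partition(partition, term):
    """Return (doc_id, hit_count) pairs for docs in this partition containing term."""
    return [(doc_id, text.split().count(term))
            for doc_id, text in partition.items()
            if term in text.split()]

def search(term):
    """Scatter the query to all partitions, then gather and rank the merged hits."""
    hits = []
    for p in partitions:
        hits.extend(search_partition(p, term))              # scatter + gather
    return sorted(hits, key=lambda h: h[1], reverse=True)   # merge/rank

print(search("red"))   # hits come from both partitions
```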

Ok, I understand now. After I narrowed down which attributes are indexed, my FTS index decreased to a manageable size.

Thanks for the help.

Are there any sizing guidelines for full text search?

@dwilliams - old post but worth updating… we are working on sizing guidelines, but at the same time we are working on a new and improved indexing engine, so it’s a bit in flux at the moment. More to come.

I have the same problem. FTS created an index but never completed it. I deleted the FTS index in the GUI and it shows as deleted, but no space was freed on the server node.
The index files were not deleted from disk (@fts/*.pindex), and no space was freed on the server node.
Can I delete the *.pindex folders on the Linux file system?
Do I need to restart/rebalance the node?

@abhinav @steve

This discussion may be related to my issue posted here.

For terms FTS-indexed across multiple partitions, how does one bring docFreq and maxDocs in sync across partitions so that the term scores match?
Search term: YAMA AUTOMOTIVE
These results are both for the identical document SLIMS AUTOMOTIVE, but drawn from two different partitions, thus yielding different scores.

from partition 1:

```json
{
  "value": 7.086092676186764,
  "message": "idf(docFreq=29, maxDocs=13191)"
}
```

from partition 2:

```json
{
  "value": 7.5472383777016825,
  "message": "idf(docFreq=18, maxDocs=13249)"
}
```
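For what it’s worth, both scores line up exactly with the classic Lucene-style idf formula, idf = 1 + ln(maxDocs / (docFreq + 1)), evaluated with each partition’s own local statistics – which is why identical text can score differently depending on which partition served the hit. A quick check (my own sketch, inferred from the numbers above, not FTS code):

```python
import math

def idf(doc_freq, max_docs):
    # Classic Lucene-style inverse document frequency. Each partition
    # evaluates this with its OWN doc_freq/max_docs, so identical text
    # can score differently across partitions.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

print(idf(29, 13191))  # partition 1 -> ~7.086092676186764
print(idf(18, 13249))  # partition 2 -> ~7.5472383777016825
```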

JG