Multiple copies of fts indexes

I'm running 5.0.1 build 5003 and am having an issue with FTS using all of my disk space via multiple copies of full text indexes. When I delete the FTS index, the moss files aren't deleted.

Hi – I’ve heard a similar report of this kind of thing, and we’re still trying to track it down; we’re currently trying to reproduce it. If you can share any more info about your situation, that’d be great. A cbcollect_info would be terrific if you can share that.

That’s good to hear – I’m not the only one.
Yes, I can run the cbcollect_info command; which options would you like?

-Thanks

Easiest way, imho, is to use the web admin UI approach, which automatically uploads the cbcollect_info output to a Couchbase S3 bucket (as the output’s potentially ginormous – it’s meant to capture everything tech support might ever want to know about a cluster to diagnose issues). – steve

That worked, dwilliams/collectinfo-2018-02-20T180337-ns_1%4010.10.10.8.zip

Thanks – I just pinged a colleague who’s been tracking down the other reported issue (similar but slightly different) to see if he can weave together any clues/theories on this.

Thanks, if you need any more info please ping me.

So it seems that the issue with growing disk size is on one index only: swapacd_cd_info_fts

At about 8:20 is when the disk usage of the FTS index started to increase.

I see a number of SET operations on the Couchbase bucket “swapacd_cd_info” (to which the index is tied) – which is why there is a steady increase in the indexed document count, and thereby in the number of bytes on disk used by FTS. What is your average document size on the Couchbase bucket?

It seems to me that you either killed the FTS process or deleted the index at about 8:49, as we don’t have stats after that point. It looks like the index was still building at that point, which is why disk usage continued to grow.

The files seem pretty large, so when you mentioned that the moss files weren’t cleaned up on index delete –
had the delete operation completed, or had it timed out?

Let me restate my first question: under the folder “/@fts/”, why are there multiple copies of the same index?

Now to try and answer your question: at that time, I was adding around 10 million attributes to 1.7 million documents. While monitoring the process, the Couchbase UI warned me that my disk space was low. I stopped the upload and started to track down why disk usage was high. At that point, I found 6 copies of the same FTS index, cd_info_fts. To try and recover space, I deleted the FTS index using the UI; after the UI showed the FTS index gone, I checked my disk space and found that the FTS indexes were still there, all 6 of them.

I will work on getting you the per-document size.

Thanks

Ok, so you seem to have had 4 indexes on the node, and the thing is, I don’t see the index ‘swapacd_cd_info_fts’ in the mossScope diag stats. You can look into this yourself in fts_mossScope_stats.log in the cbcollect_info you shared. I only see the 3 other indexes (which have 6 partitions each):

  • swapacd_search_terms_fts
  • swapacd_help_center_items_fts
  • swapacd_general_search_suggestions_fts

Note that we partition each FTS index into 6 by default, so we’d expect to see 6 files per FTS index.

Ok, so if I have a bucket that’s 166MB and I create a full text search index on that bucket, will FTS create 6 partitions of 166MB each, a total of 996MB?

My document size is around 4K

Thanks

Well, the bucket size isn’t directly related to the FTS index size. The FTS index size (which is an aggregate of the sizes of all its partitions) depends on the kind of index you create. Say you build a very specific index with type mappings – your index size would be much smaller compared to a default FTS index with no type mappings. The FTS index size is usually much larger than the bucket size on disk.
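As a toy illustration of that point (made-up numbers, not Couchbase internals): if you assume index size is roughly proportional to the number of term occurrences you index, then dropping large fields from the mapping shrinks the index dramatically.

```python
# Toy model (NOT Couchbase internals): assume index size is roughly
# proportional to the number of term occurrences you choose to index.
num_docs = 1_700_000                  # document count mentioned in this thread
avg_terms_per_field = {               # made-up average term counts per field
    "title": 5,
    "body": 400,
    "tags": 3,
}

def indexed_occurrences(fields):
    """Total term occurrences indexed when only `fields` are mapped."""
    return num_docs * sum(avg_terms_per_field[f] for f in fields)

everything = indexed_occurrences(["title", "body", "tags"])  # default mapping
title_only = indexed_occurrences(["title"])                  # narrow type mapping

print(everything // title_only)  # the narrow index is ~80x smaller here
```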

I’m not sure if I’ve misunderstood what you’re asking, but the partitions are not copies of each other; each contains only a portion of the data that makes up the entire index. So when you issue a search query over your index, the query is applied to all the partitions, results are fetched from all of them, aggregated, and then presented to you.
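To make that scatter/gather idea concrete, here’s a minimal illustrative sketch (not Couchbase/FTS code – the partition contents and "scoring" are made up):

```python
# Illustrative sketch of scatter/gather search across index partitions.
# Each partition indexes only a slice of the documents; a query is sent
# to every partition and the partial hits are merged into one result set.

partitions = [
    {"doc1": "red sports car", "doc2": "blue sedan"},   # partition 1's slice
    {"doc3": "red bicycle"},                            # partition 2's slice
]

def search_partition(partition, term):
    """Return (doc_id, hit_count) pairs for docs in this partition containing term."""
    return [(doc_id, text.split().count(term))
            for doc_id, text in partition.items()
            if term in text.split()]

def search(term):
    """Scatter the query to all partitions, then gather and rank the merged hits."""
    hits = []
    for p in partitions:
        hits.extend(search_partition(p, term))              # scatter + gather
    return sorted(hits, key=lambda h: h[1], reverse=True)   # merge/rank

print(search("red"))   # hits come from both partitions
```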

Ok, I understand now. After I narrowed down which attributes are indexed, my FTS index decreased to a manageable size.

Thanks for the help.

Are there any sizing guidelines for full text search?

@dwilliams - old post but worth updating… we are working on sizing guidelines, but at the same time we are working on a new and improved indexing engine, so it’s a bit in flux at the moment. More to come.

I have the same problem. FTS created an index but never completed it. I deleted the FTS index in the GUI and it shows as deleted, but no space was freed on the server node.
The index files were not deleted from disk (@fts/*.pindex), and no space was freed on the server node.
Can I delete the *.pindex folders on the Linux file system?
Do I need to restart/rebalance the node?

@abhinav @steve

This discussion may be related to my issue posted here.

For terms FTS-indexed across multiple partitions, how does one bring docFreq and maxDocs in sync across partitions so that the term scores match?
Search term: YAMA AUTOMOTIVE
These results are both for the identical document SLIMS AUTOMOTIVE, but drawn from two different partitions, thus yielding different scores.

from partition 1:

```json
{
  "value": 7.086092676186764,
  "message": "idf(docFreq=29, maxDocs=13191)"
}
```

from partition 2:

```json
{
  "value": 7.5472383777016825,
  "message": "idf(docFreq=18, maxDocs=13249)"
}
```
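For what it’s worth, both scores line up exactly with the classic Lucene-style idf formula, idf = 1 + ln(maxDocs / (docFreq + 1)), evaluated with each partition’s own local statistics – which is why identical text can score differently depending on which partition served the hit. A quick check (my own sketch, inferred from the numbers above, not FTS code):

```python
import math

def idf(doc_freq, max_docs):
    # Classic Lucene-style inverse document frequency. Each partition
    # evaluates this with its OWN doc_freq/max_docs, so identical text
    # can score differently across partitions.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

print(idf(29, 13191))  # partition 1 -> ~7.086092676186764
print(idf(18, 13249))  # partition 2 -> ~7.5472383777016825
```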

JG