Choose number and configuration for search nodes

Nitesh_Gupta · February 25, 2021, 7:34am

Hi,
I have the following Couchbase (CB) cluster configuration:

Version - 6.0.2
5 data, 5 index, 8 query nodes on Azure
Bucket has 130 million documents
Each node configuration:
Operating system : Linux (centos 7.7.1908)
Size: Standard E64-32s_v3 (32 vCPUs, 432 GB memory)
vCPUs : 32
RAM : 432 GB

I want to know how to calculate the number of FTS nodes and their machine configuration. Out of the 130 million documents, I only need to index 60 million, and only by document ID. I also want to store one more field from each document on the search nodes. The expected throughput is 100 requests per second.

What number of search nodes, and what configuration, would you suggest for the above cluster?

Thanks
Nitesh

sreeks · February 25, 2021, 8:08am

Hi @Nitesh_Gupta ,

You may start with 2 FTS nodes of the above configuration, using an index with the default 6 partitions and 1 replica enabled for HA.
After that, depending on your latency requirements, you can adjust the partition count empirically to make use of the available CPU cores.

You may also set the FTS memory quota to ~70% of the RAM, i.e. ~290 GB.

You also need to account for future data growth when provisioning the nodes/partitions.

After these initial trials and growth considerations, if the numbers look good,
you may explore nodes with a slightly smaller configuration, such as 16 cores with less RAM.
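
For reference, the quota arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `fts_quota_mb` and the choice to work in whole megabytes are assumptions made for the example, not anything Couchbase prescribes. Taking exactly 70% of 432 GB gives roughly 302 GB, so the ~290 GB figure is a slightly more conservative round number.

```python
def fts_quota_mb(ram_gb: int, percent: int = 70) -> int:
    """Return `percent` of the node RAM, expressed in whole megabytes.

    Hypothetical helper for illustration only; integer math avoids
    floating-point rounding surprises.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    ram_mb = ram_gb * 1024
    return ram_mb * percent // 100


if __name__ == "__main__":
    quota_mb = fts_quota_mb(432)          # node size from the original post
    print(f"FTS quota: {quota_mb} MB (~{quota_mb / 1024:.1f} GB)")
```

The same helper can be rerun with the RAM size of any smaller node type you evaluate later.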

Nitesh_Gupta · February 25, 2021, 9:14am

Thanks @sreeks for this initial configuration to start with.

Nitesh_Gupta · March 2, 2021, 11:05am

Hi,
I tried creating an index on 6 lakh (600,000) documents out of the 130 million, with the configuration and requirements mentioned above. Creating the index took almost 1 hour, and after it completed, the doc count showed 130 million.

Why did it take so much time?
Why did the doc count show the complete document count?
During index create/update, CPU utilization was at 98%. Why?

The index was created on one field with only the "index" checkbox selected. Later I added 3 more fields, again with only the "index" checkbox selected, and it again took 1 hour. Why is it so time consuming?

sreeks · March 2, 2021, 12:35pm

Doc count indicates the number of documents processed from the bucket, not the actual count of documents in the index.
This stat label has been updated in the latest server software.

Why did it take so much time?
FTS has to process/parse all 130M documents, which is why it takes this long. Only by inspecting a document's contents can it tell whether the document is of the type the user needs to index.
Why did the doc count show the complete document count?
It's the count of documents processed so far.
During index create/update, CPU utilization was at 98%. Why?
FTS has to run text analysis on the documents and index them. There is also background compaction and a lot of other activity going on.

FTS has to parse all 130M documents again whenever you change the index definition or mapping. As mentioned in the other thread, it's a rebuild from zero.

How many nodes do you have, and what is their hardware configuration? Are they hosting only the FTS service?
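
To make the "processed" versus "indexed" distinction concrete, here is a toy Python sketch. It is not how FTS is implemented; the `type` field, the field names, and the function `build_index` are hypothetical and chosen only for illustration. It shows why every document in the bucket must be visited even when only a subset ends up in the index.

```python
from typing import Dict, Iterable, Tuple


def build_index(
    docs: Iterable[Tuple[str, dict]], wanted_type: str, field: str
) -> Tuple[int, Dict[str, object]]:
    """Scan every document, but index only those matching the type mapping.

    Returns (processed_count, index), where index maps doc ID -> field value.
    """
    processed = 0
    index: Dict[str, object] = {}
    for doc_id, doc in docs:
        processed += 1                      # every document is inspected
        if doc.get("type") == wanted_type and field in doc:
            index[doc_id] = doc[field]      # only matching documents are indexed
    return processed, index


if __name__ == "__main__":
    bucket = [
        ("d1", {"type": "order", "status": "open"}),
        ("d2", {"type": "user", "name": "a"}),
        ("d3", {"type": "order", "status": "closed"}),
        ("d4", {"type": "invoice", "total": 10}),
    ]
    processed, index = build_index(bucket, "order", "status")
    print(f"processed={processed}, indexed={len(index)}")
```

In this toy run the processed count equals the size of the whole bucket, while the index holds only the matching documents, which mirrors the doc count behaviour described above.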

Nitesh_Gupta · March 2, 2021, 12:54pm

Hi,
Following the suggestion in the initial reply:

I added 2 search nodes with a memory quota of 70% of RAM. The index is being created with 6 partitions (possibly the default value; I am not sure, because I do not have an option to set it in the index definition), 1 replica, and the scorch index type.

And yes, these 2 nodes are hosting only the search service.

Thanks
Nitesh
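
On where the 6 partitions likely come from: the sketch below assumes a bucket with 1024 vBuckets and an index plan parameter (`maxPartitionsPerPIndex`) with a default of 171. Both values are assumptions about this server version and should be checked against the index definition JSON on your cluster. Under those assumptions, the partition count is simply the vBucket count divided by the per-partition limit, rounded up.

```python
import math

# Assumption: a Linux bucket is split into 1024 vBuckets.
VBUCKETS = 1024


def pindex_count(max_partitions_per_pindex: int, vbuckets: int = VBUCKETS) -> int:
    """Number of index partitions needed to cover all vBuckets."""
    if max_partitions_per_pindex <= 0:
        raise ValueError("max_partitions_per_pindex must be positive")
    return math.ceil(vbuckets / max_partitions_per_pindex)


if __name__ == "__main__":
    # 171 is the assumed default; smaller values yield more partitions.
    for limit in (171, 64, 32):
        print(f"maxPartitionsPerPIndex={limit} -> {pindex_count(limit)} partitions")
```

If the assumption holds, lowering that limit in the index definition is one way to raise the partition count when experimenting with CPU utilization.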
