How to justify the resource allocation for Search Service?

TimWong · January 31, 2024, 2:42am

Referring to the sizing guidelines (Sizing Guidelines | Couchbase Docs), the memory quota for the Search Service is 256 MB minimum, with 2048 MB and above recommended. Are there any more hints on planning memory for the Search Service?

Another official recommendation, quoted here: “The usual guideline is to set the Search memory quota to 60-70% of the available RAM in a Search node.”
(ftsMemoryQuota | Couchbase Docs)

What if I am running a cluster where each node has 512 GB of memory and runs all of the services (except the Analytics Service)? Let’s say I have allocated 150 GB of memory each to the Data Service and the Index Service; should I allocate 100 GB to the Search Service?

jon.strabala · February 8, 2024, 8:17pm

@TimWong,

If you are running Search as an MDS service (i.e. on dedicated Search nodes), the second recommendation is the preferred approach: give Search 60-70% of the node’s RAM. With 512 GB of memory you can ignore that rule and go up to 90%, since that still leaves about 51 GB for the OS and other processes.
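
As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name `search_quota_gb` is hypothetical and the numbers are only the ones mentioned in this thread; it is not a substitute for a proper sizing exercise.

```python
def search_quota_gb(node_ram_gb, fraction):
    """Return (Search quota, RAM left for the OS and other processes) in GB."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    quota = node_ram_gb * fraction
    return quota, node_ram_gb - quota


# Compare the 60-70% guideline with the 90% suggestion for a 512 GB node.
for fraction in (0.60, 0.70, 0.90):
    quota, headroom = search_quota_gb(512, fraction)
    print(f"{fraction:.0%} of 512 GB: quota {quota:.1f} GB, headroom {headroom:.1f} GB")
```

At 90% the headroom works out to roughly 51 GB, which is where the figure above comes from.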

Resource utilization is highly dependent on base sizing, e.g. what you are indexing (the field types: keyword, text, datetime, geospatial, and now vector), the size of the items indexed, and finally the number of items indexed.

Storage: Inverted indexes tend to use more storage space than b-trees because they store a mapping for every significant word or token found in the document set. For each word, the index must store a list of all document IDs (or positions within documents) where that word appears. This can become quite large for extensive document collections with a broad vocabulary.
Memory: To ensure fast search performance, parts of the inverted index (or sometimes the entire index, depending on its size and the system’s capabilities) are often loaded into memory. This can lead to high memory usage, especially for large datasets.
Update Costs: Updating inverted indexes can be resource-intensive. Adding, removing, or updating documents requires recalculating and updating the index entries for potentially many words, which can be more demanding than updates to a B-tree structure used for simpler key-value lookups.
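
To make the storage and update-cost points concrete, here is a toy inverted index in Python. It is only a conceptual sketch: the class `InvertedIndex` and its whitespace tokenizer are assumptions for illustration and say nothing about how the Search service is actually implemented.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each token to the set of doc IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> doc IDs (the storage cost)
        self.doc_tokens = {}              # doc ID -> tokens, needed to undo old entries

    def upsert(self, doc_id, text):
        # An update first removes every old posting, then adds the new ones,
        # so one document change can touch many index entries.
        self.delete(doc_id)
        tokens = set(text.lower().split())
        self.doc_tokens[doc_id] = tokens
        for token in tokens:
            self.postings[token].add(doc_id)

    def delete(self, doc_id):
        for token in self.doc_tokens.pop(doc_id, set()):
            self.postings[token].discard(doc_id)
            if not self.postings[token]:
                del self.postings[token]

    def search(self, token):
        return sorted(self.postings.get(token.lower(), set()))


idx = InvertedIndex()
idx.upsert("hotel_1", "quiet beach hotel")
idx.upsert("hotel_2", "busy city hotel")
print(idx.search("hotel"))
idx.upsert("hotel_1", "quiet mountain lodge")  # rewrites three postings for one doc
print(idx.search("hotel"))
```

Every distinct token gets its own postings entry, which is why a broad vocabulary inflates the index, and why a single document update fans out into many small index changes.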

The Couchbase Search service relies on memory-mapped disk files, so the more RAM available, the more data the OS can cache without swapping pages when access is needed. You therefore want to allocate as much as possible if you have large base sizing and a large corpus of documents.
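
For readers unfamiliar with memory mapping, the following Python sketch shows the general technique using the standard `mmap` module. The file contents and layout are made up for illustration and are not the Search service’s on-disk format.

```python
import mmap
import os
import tempfile

# Write a small stand-in "index file": a header, padding, then a term entry.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"segment-header" + b"\0" * 4096 + b"term:couchbase")
    path = f.name

# Map the file read-only. Slices are served from the OS page cache; pages are
# loaded from disk on first touch and stay cached while RAM is available.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        header = mapped[:14]
        offset = mapped.find(b"term:")
        term = mapped[offset:offset + 14]

os.remove(path)
print(header, offset, term)
```

The process never reads the whole file explicitly; the OS decides which pages stay resident, which is why spare RAM on the node matters in addition to the configured quota.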

Now your final question:

“let say I have allocate 150 GB memory each to data service and index service, should I allocate 100 GB to search services”

It seems like you are attempting to run all services on a single node. Couchbase is designed as a loose set of microservices to enable both horizontal and vertical scaling. When you combine all services together, you lose the ability to easily add a node for a specific service to achieve horizontal scaling. Moreover, you forfeit service isolation, not to mention the high availability aspects of Couchbase.

I’m not saying it won’t work (all on one node), but if you’re reindexing 10TB of data and 800 fields via search, you might want to skew the allocation towards search. However, if you’re just indexing a few fields and less than 100MB of data, you would be wasting resources that might be better utilized elsewhere. This is why we recommend sizing exercises with our professional services.
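
For the all-on-one-node case in the question, a small Python sketch can at least check the budget and build the form body for a quota update. The helper `plan_quotas` is hypothetical. The parameter names `memoryQuota`, `indexMemoryQuota` and `ftsMemoryQuota`, and the assumption that values are given in megabytes, reflect my understanding of the cluster memory-quota REST settings and should be verified against the documentation for your server version. Nothing is sent to a cluster here.

```python
from urllib.parse import urlencode

GB_IN_MB = 1024  # assumption: quotas are expressed in megabytes


def plan_quotas(node_ram_gb, data_gb, index_gb, search_gb):
    """Return (quota settings in MB, unallocated RAM in GB) for one node."""
    allocated = data_gb + index_gb + search_gb
    if allocated > node_ram_gb:
        raise ValueError("quotas exceed node RAM")
    payload = {
        "memoryQuota": data_gb * GB_IN_MB,        # Data service
        "indexMemoryQuota": index_gb * GB_IN_MB,  # Index service
        "ftsMemoryQuota": search_gb * GB_IN_MB,   # Search service
    }
    return payload, node_ram_gb - allocated


# The split from the question: 150 GB data, 150 GB index, 100 GB search.
payload, unallocated_gb = plan_quotas(512, 150, 150, 100)
print(urlencode(payload))
print(f"{unallocated_gb} GB left for Query, the OS and file cache")
```

With that split, 112 GB stays outside the three quotas; whether 100 GB is right for Search still depends on the base sizing discussed above.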

Best

Jon Strabala
Principal Product Manager - Server

TimWong · February 9, 2024, 2:55am

Thanks for the detailed explanation.

I do agree with the benefits of service isolation. Let’s say I split the Data, Index, Query and Search services out onto isolated node(s), and I also choose Memory Optimized Global Secondary Index (MOI) as the index storage option (performance is my primary objective, and I don’t really care about wasting resources). Does the index storage option have any side effects on the Search Service?

Some of what you mentioned makes the Search Service sound similar to, or reliant on, the Index Service, so maybe I am just confused.

jon.strabala · April 15, 2024, 9:12pm

Hi @TimWong

“I choose the Memory Optimized Global Secondary Index (MOI) as the storage option of index (as the performance is my primary objective, not really care to wasting resources ), any side-effect to the Search Service by the storage option of index?”

No, none at all. Of course, if you are using the Search service it will need DCP feed(s) from the Data Service (the source of truth) to update the inverted indexes (or vector indexes) that Search maintains.

The Search service and the Index service don’t share anything if they are on different nodes (except for the TCP connections to the Data Service and some buffers on the Data Service).

“Because you mentions like the Search Service seems are similar or relies on Index Service, maybe I just confused.”

Think of Couchbase as a high-speed KV engine (that’s the Data Service); all other services feed off of this core service via DCP (Database Change Protocol), which is like a massive set of fire hoses. This means that if you are running both the “Index” service and the “Search” service, both may be receiving data, so some portion of the Data Service’s resources will go to feeding each of them. If you don’t have any mutations (upserts/deletes) going to “Search”, the Index service would basically get all of those Data Service resources.
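
The fan-out idea can be pictured with a toy publish/subscribe model in Python. This is only an analogy: the classes `DataService` and `Consumer` are hypothetical, and real DCP is a network streaming protocol rather than in-process callbacks.

```python
class DataService:
    """Toy KV store that pushes every change to all subscribed services."""

    def __init__(self):
        self.kv = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def upsert(self, key, doc):
        self.kv[key] = doc
        for notify in self.subscribers:  # one write, one event per consumer
            notify("mutation", key, doc)

    def delete(self, key):
        self.kv.pop(key, None)
        for notify in self.subscribers:
            notify("deletion", key, None)


class Consumer:
    """Stands in for a service such as Index or Search."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def on_change(self, kind, key, doc):
        self.events.append((kind, key))


data = DataService()
index, search = Consumer("Index"), Consumer("Search")
data.subscribe(index.on_change)
data.subscribe(search.on_change)

data.upsert("user::1", {"name": "Tim"})
data.delete("user::1")
print(index.events)
print(search.events)
```

Each consumer keeps its own independent copy of what it needs, and the only shared cost is the work the data side does to feed every subscriber.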

I hope this helps

Best

Jon Strabala
Principal Product Manager - Server
