HASH Join and memory usage

pccb · October 7, 2020, 2:28pm

Hello,

Is it that a HASH join has to be always instructed via a hint? i.e. the optimizer will never choose a HASH join on its own? If yes, is that because the optimizer is a rule-based one? If yes, does the upcoming cost based optimizer choose a HASH join over NL join if it finds the cost to be lower?
A HASH join will need memory for holding the in-memory hash table. That memory will be used on which node: the node running the Data service, Index service or the Query service?

Thanks

vsr1 · October 7, 2020, 2:37pm

Yes. HASH JOIN available in EE version. In rule based optimizer HASH JOIN is based on Hints. In Cost Based Optimizer, if stats are available it choose HASH JOIN or NL based on lower cost. If no cost available it falls back rule based approach.

HASH table is based on query predicate, built on query node where query is running. At present in-memory hash table only (no spill to disk)

cc @bingjie.miao

pccb · October 10, 2020, 3:22am

Thank you.

But then there is no provision to assign memory quota to query service. So I researched and found that Query service will compete for memory. Does this mean it can potentially hog all the memory on the node (if many queries are running)?

Is there a way to identify the actual memory used by the query when we use HASH JOIN so that it can be ensured that the nodes are sized correctly?

Marco_Greco · October 10, 2020, 4:46pm

In 7.0 there will be per request memory quota meaning that at startup you can enforce what’s the maximum amount of memory taken by values for the processing of a query (there’s several reasons why we prefer per request quota to per service quota - but that’s too long to discuss here) .
This includes memory used for values exchanged in between operators, and memory for aggregates, sorts and hash joins.
When memory quota is on, at the end of a request you get, in the metrics, the maximum amount of value memory used for executing the request, so you could use that for sizing the request quota at development time.

pccb · October 12, 2020, 6:02am

Thank you Marco.

Will both (per request memory quota & metrics) be available only in 7.0? If yes, in 6.x, is there absolutely no way (metadata, statistics, etc.) to find how much memory was used ?

Looking at system tools like top, vmstat, etc. could be one way but a lot of our queries are very shortlived, few hundred millsecs to a couple of seconds. In that case, it will be almost impossible to capture the same and hence the query.

Regards

Topic		Replies	Views
Hash Join In Couchbase Couchbase Server n1ql	4	1443	November 21, 2022
Different results for nested loop and hash join SQL++ query , n1ql	4	1014	January 24, 2021
Memory Quota Node level Couchbase Server	3	70	February 2, 2025
Cbq-engine consuming massive memory for simple queries Couchbase Server	14	4544	January 24, 2017
Couchbase query nodes keep crashing Kubernetes	4	248	March 8, 2024

HASH Join and memory usage

Related topics