Example: list all of the beers manufactured by brewery “Yards Brewing”.
The following query works:
select br.name brewery, b.name beer, b.style style from beer-sample b join beer-sample br on keys b.brewery_id where br.name = “Yards Brewing” order by beer, style;
However, I am not able to make this join work using the USE KEYS argument to define the parent join key. It seems to me that this query would perform much faster if I could specify the parent (brewery) item key. Anybody know how I would use the USE KEYS argument in this kind of parent-child join?
Hi @morrie, If your objective is to optimize the query, I’d recommend optimizing the beer segment of the query. You could create the following index
create index beer_brewery on beer-sample(type,brewery_id) where type=“beer” using GSI;
and push the optimization before the join to filter as much as possible using both the b.type=“beer” and b.brewery_id="…". with the nested loop join, this will eliminate the number of items you need to loop over in “b” (beers).
select br.name brewery, b.name beer, b.style style, b.brewery_id from beer-sample b join beer-sample br on keys b.brewery_id where b.type=“beer” and b.brewery_id=“yards_brewing” order by beer, style;
if you can compare the “explains”, you will see that you can avoid the primary scan with the index and push the filter earlier in the exec plan.
Thanks for the response. Those are all valid performance optimizations.
I’m actually specifically interested in the question of how/if this query would be written to use the “USE KEYS” clause. I’m wondering if USE KEYS can only be used in joins of separate buckets and cannot be used on a same-bucket parent/child join.
Your query has the correct syntax. You could put a USE KEYS clause on the LHS of the join:
select br.name brewery, b.name beer, b.style style
from beer-sample b use keys ["b1", "b2", "b3" ] join beer-sample br on keys b.brewery_id
where br.name = "Yards Brewing" order by beer, style;
This would only use beers that have keys “b1”, “b2” and “b3”.
For a join, “ON KEYS” is the correct syntax to use to specify the keys from the LHS that the RHS is joined with.
Actually what I think I’m trying to do is use USE KEYS for the brewery id , which in this case is “yards_brewing” . I want to eliminate the where -clause reference to br.name, which is forcing a full scan of the bucket.
USE KEYS only applies to the leftmost bucket in your query. So in your example, you would need to start with a set of known keys, in order to benefit from USE KEYS.
Your query is fundamentally based on brewery name, not brewery id. That is the mismatch with USE KEYS. As @cihangirb said, you can create an index on brewery name, which would avoid your full bucket scan. You can confirm this by using CREATE INDEX followed by EXPLAIN, which should show an IndexScan (as opposed to a PrimaryScan, which is a full bucket scan).
Thanks Gerald - that was the issue I was addressing and which you have now confirmed: USE KEYS can only be used against the left most bucket in the join. This means that in the parent-child join which I am trying to execute, the parent cannot be addressed by its key.
Incidentally, I’d like to see that changed: it should be possible to access the parent by its key; we should not need to support a 2nd index if we already know the key Id.
I’m satisfied with Gerald’s response: I believe we can close this thread. Thanks to all who replied.
Hi @morrie, one more point about the change you’d like to see.
If you model your data so that each brewery contains an array of beer_ids, then you can achieve your desired access paths: USE KEYS on the brewery_id, then JOIN or NEST the corresponding beers using the array of beer_ids.