FTS over multiple collections / N1QL JOINing multiple collections that match given SEARCH

Hello
I have a case when I need to FTS over multiple collections, preserve scoring and JOIN them into one object later on.
The perfect way would be to do it all on db side and within a single query.
I’ve noticed already it can’t be done just with FTS because it returns only single object (from one of those collections). So the other way would be to use N1QL query, JOIN collection and use subqueries to get id’s that match the SEARCH query. But in second case I don’t have the scoring.
What is the best way to achieve both - scoring and join across collections here? Any advice?

You should be able to create a search (FTS) index over multiple collections from a couchbase bucket as long as all those collections reside within a single scope.

Here’s some documentation on it - Specifying Type Mappings | Couchbase Docs

Now while querying, you search request will have this extra attribute called collections. This takes in an array giving you the ability to search data from specific collections if the index covers multiple collections. Not specifying anything for this attribute, will bring you a single object with search hits from all the collections that the index covers.

Here’s documentation on searching multi-collection search indexes - Multi-Collection Behaviour | Couchbase Docs

The results will be scored based on your search criteria.

Thank you for quick response :smile:
Actually you didn’t understand my problem. I know I can create FTS index on multiple collections and that’s what I did. The problem is that all of those collections are part of a bigger object (let’s call it “dashboard”). I want to do FTS over multiple collection but I want the database to return full “dashboard” object which consists of multiple smaller objects from different collections.
So if one of those smaller pieces fits the query I want to be able to JOIN other pieces together (i’ve got the id’s so I can join across collections) and return full “dashboard” object to the app. Best approach would be using N1QL + FTS hybrid I guess. The question is - how to do it to preserve scoring?

In order to make it more readable for you here an example:


This is what I get from FTS index - centrals and companies. Each company has centrals, and the opposite way. The dashboard consists out of bigger objects - each contains company and central.
So when I do FTS i want to be able to build this virtual object and return all data, not just the one that I got directly from FTS.

Here is also an example N1QL query that works but there are 2 major issues:

  1. it doesn’t use FTS index (I guess it’s going through GSI indexes)
  2. it doesn’t have the scoring - each collection has it’s own score here which sucks
SELECT * FROM data._default.centrals AS centrals
JOIN data._default.companies AS companies ON ANY v IN companies.centralUuids SATISFIES v = centrals.uuid END
WHERE centrals.uuid IN (SELECT RAW centralsSub.uuid FROM centrals AS centralsSub WHERE SEARCH(centralsSub, "*v*"))
OR companies.uuid IN (SELECT RAW companiesSub.uuid FROM companies AS companiesSub WHERE SEARCH(companiesSub, "*v*"))

Scoring from two different SEARCH queries may not make the best sense.
But that said, I will rope in @vsr1 for his advise on how to achieve the JOIN operation using N1QL here.

1 Like

First you need to FTS search on individual queries and then JOIN.
Is this what you want

WITH centrals AS (SELECT ce.*, SEARCH_SCORE() AS ce_score
                  FROM data._default.centrals AS ce
                  WHERE SEARCH(ce, "*v*")),
     companies AS (SELECT cm.*, SEARCH_SCORE() AS cm_score
                  FROM data._default.companies AS cm
                  WHERE SEARCH(cm, "*v*"))
SELECT ce1, cm1 FROM centrals AS ce1
JOIN companies AS cm1 ON cm1.centralUuids = ce1.uuid ;

FTS Search on inner side of NL there is no use and performance over head unless search values depends on outer, so inner side of NL will not use FTS search.
One way use HASH JOIN. Also you used right side of IN subquery (no results from that are available for main query). .

Write queries such way that FTS search materialize the results and do SQL (JOIN, aggregates etc)

2 Likes

Ok, I got the point now but still it doesn’t solve the issue completely. It seems that now it requires central to have the term at the first place but what if only company has the term but connected central doesn’t? I want it to be equal in terms of priorities. Just like it works for standard FTS index - you type the query - it finds all related documents. In the example above it’s narrowing-down the search results with each JOIN.
And second thing - it seems SEARCH function doesn’t use FTS index that I already created (specified mappings etc… it’s just working on it’s own correct?)

FTS search doesn’t validate the correctness.
SQL validate if underneath FTS index validity (is false negatives possible index will not qualify). It needs keyword analyzer.
It will not use without search predicate (like primary index)
cc @abhinav

Ok, I will work on query definition later on but let’s assume simple wildcard for now.
When I tested your query (this one, modified a bit)

WITH
    centrals AS (SELECT ce.*, SEARCH_SCORE() AS ce_score FROM data._default.centrals AS ce WHERE SEARCH(ce, "*v*")),
    companies AS (SELECT cm.*, SEARCH_SCORE() AS cm_score FROM data._default.companies AS cm WHERE SEARCH(cm, "*v*"))
SELECT ce1, cm1 FROM centrals AS ce1
JOIN companies AS cm1 ON ANY v IN cm1.centralUuids SATISFIES v = ce1.uuid END

It works for short wildcard queries (like *v* which will most probably exist in most of the documents) but when I changes it to real value - a word like name of the company (which indeed exists in the collection) it didn’t work, 0 results returned. I guess it’s because this name doesn’t exist in centrals obviously. Any idea how to make it work for all collections no matter which is the “base” collection that we are starting with (in this case - centrals)?

Eg. I could achieve similar thing by first running FTS search, get all types of documents that matches the query then based on it’s type “lazy load” data that is missing to build the full view (other collections). This would be 1 FTS followed by 1 N1QL query that will use different JOINs to fetch all the data. But I guess it’s not optimal way to do it. That’s why I’m looking for db-only solution that will help me achieve the same thing.

N1QL only works on collection level (not bucket, scope level). Only way is UNION queries.

Do I understand correctly that it would look like this?

# centrals first
SELECT ...
JOIN ...
UNION
# companies first 
SELECT ..
JOIN ...
UNION
# other collections first

Each query within the union would approach the search from a different direction then UNION is going to merge everything together?
Quite a lot of job :confused:

collection is similar concept of table RDBMS.
bucket as database
scope as schema

In RDBMS you can’t write queries at database level.

FTS index has different concept fts index can include data from different collections and query outside SQL. Not with SQL.

This I understand and we are working here within a single scope just on different collections.
The point is I need data from couple collections in order to serve a full dashboard view. And FTS queries (or rather data that it has to query) are spread across different collections. So I can’t use 2 or more separate FTS queries because it’s not objective (scoring). I can use just FTS because it doesn’t provide me all the data I need. Of course when it comes to just query and get IDs and scoring of the documents - it works. But I need to get the related data somehow. It seems that application has to do it separately with couple queries, right? N1QL is not able to glue everything together.

If you want run FTS query on scope(with multiple collections) and JOIN with single collection (get additional data) ?

See if you can use 2.1 (search on FTS index) (2.2 is deprecated)
N1QL & SEARCH: Leverage Full-Text Search (FTS) Index in N1QL