Difference between collections

mahendravarman · December 15, 2022, 12:51pm

Hi,

I have 2 collections with same documents but there are differences between the documents . How to find the difference between the two?

Example:

collection1:

{
    "name" : "Adam",
    "Vehicles" :
      [
       "car","bike"
      ]
}
{
    "name" : "Wright",
    "Vehicles" :
      [
       "Aeroplane"
      ]
}

collection 2:

{
    "name" : "Adam",
    "Vehicles" :
      [
       "car,bike,jeep"
      ]
}
{
    "name" : "Wright",
    "Vehicles" :
      [
       "Aeroplane"
      ]
}

Output:

{
    "name" : "Adam",
    "Vehicles" :
      [
       "car,bike,jeep"
      ]
}

dh · December 15, 2022, 1:22pm

If “name” is actually a key you’d like to join on and you are only interested in records where the “Vehicles” array doesn’t match between the two collections:

SELECT *                                                                                                                            
FROM default.s1.c1, default.s1.c2
WHERE c1.name = c2.name
AND c1.Vehicles != c2.Vehicles;

(returns both collections’ documents where they don’t match; change the * to c1.* or c2.* should you want to report only the one)
This could be made more efficient with a suitable index.

If you want to include those that exist in each collection but are absent from the other:

SELECT *                                                                                                                            
FROM default.s1.c1 LEFT OUTER JOIN default.s1.c2 ON c1.name = c2.name
WHERE c2.name is MISSING OR c1.Vehicles != c2.Vehicles
UNION
SELECT *
FROM default.s1.c2 LEFT OUTER JOIN default.s1.c1 ON c2.name = c1.name
WHERE c1.name IS MISSING
;

If you want to match other fields as well as “Vehicles”, you should be able to include them as additional conditions.
HTH.

mahendravarman · December 22, 2022, 4:53pm

thanks for the solution . when using spring boot how to rewrite this query using placeholders like #n1q1.bucket,#n1ql.collection,#n1ql.selectEntity etc so that the bucket , scope , collection names are not hardcoded. Since we are using two different documents doubt these placeholders can be used.

mreiche · December 22, 2022, 10:07pm

When constructing the @Query :
You can use #n1ql place holders for the bucket and scope and the first collection.
For the second collection, you can use the SPEL positional place-holders and pass the second collection name as an argument. The parameter indexes start at 0. i.e.: #{[0]}

There is mention of using the SPEL positional parameters for a different purpose in Fluent API needs a parametrized Query [DATACOUCH-551] · Issue #859 · spring-projects/spring-data-couchbase · GitHub

mahendravarman · December 28, 2022, 12:07pm

thank you . not sure if it is possible . From the query given we can find the difference between the collections . is there a way to also find what is the difference ? Like whether it is insert or update or delete

dh · January 3, 2023, 8:01am

There is no way to tell [edit: with an SQL query] if the absence of data is because it was never added or if it was added and later deleted.

Similarly there is no way to tell if the difference in an individual field comes about because it was inserted with the difference or later updated with it.

ianmccloy · January 3, 2023, 2:20pm

You can use the audit log to track when changes have been made to documents. Auditing can be configured on what is logged or not, tracking every document mutation will use a lot of resources though. Auditing | Couchbase Docs

mahendravarman · January 24, 2023, 1:46pm

Hi ,

Can you please suggest index for this query? I created index on name but still it takes around 50 seconds

dh · January 26, 2023, 7:41am

How many documents are there in each of the collections?

The index advisor suggests these two:

CREATE INDEX adv_name_Vehicles ON `default`:`default`.`s1`.`c1`(`name`,`Vehicles`);
CREATE INDEX adv_name_Vehicles ON `default`:`default`.`s1`.`c2`(`name`,`Vehicles`);

for the first (non-UNION) statement.

(Edit: Deleted advice for the second statement; on review the suggest indices won’t help.)

HTH.

dh · January 26, 2023, 7:59am

If you change the second statement to:

SELECT c1.name                                                                                                                      
FROM default.s1.c1 LEFT OUTER JOIN default.s1.c2 ON c1.name = c2.name
WHERE c1.name IS VALUED AND (c2.name is MISSING OR c1.Vehicles != c2.Vehicles)
UNION
SELECT c2.name
FROM default.s1.c2 LEFT OUTER JOIN default.s1.c1 ON c2.name = c1.name
WHERE c2.name IS VALUED AND c1.name IS MISSING

the above two indices should be used and may help. The addition of the IS VALUED clauses allows the selection of indices on “name” to be considered. Of course this does subtly change the possible results but I believe this to still be in line with your intentions (i.e. don’t return documents that don’t include “name”).

Topic		Replies	Views
Diff between 2 buckets Couchbase Server	3	2430	May 8, 2017
N1QL - How do you compare 2 document in N1QL and show only difference on all the attributes including array with simple N1QL statement Couchbase Server query , n1ql	2	1036	September 22, 2020
Would like some help with comparing two buckets for differences SQL++	3	1845	November 2, 2017
Couchbase N1QL / Couchbase SDK - Query multiple scopes Java SDK query , java , n1ql	2	578	October 10, 2023
Clarification on Scopes & Collections Adoption in Existing Application Couchbase Server	8	84	December 7, 2025

Difference between collections

Related topics