Updating Document Fields Based on External Mapping for Large Number of Keys

Problem Description
Hello Couchbase Community,

I’m working on a project where I need to update a specific field (image.declared.identityInformation.externalIDs) in my documents within the default bucket. The update should map an existing agencyCode(found withinimage.declared.identityInformation.names.code) to a new COMID` value based on a predefined mapping.

Here’s the mapping between agencyCode and COMID:

  • 54720020 → 000001
  • 54720021 → 000002
  • 54720029 → 000000

The challenge is that I have a large number of document keys (thousands) that need to be updated, and these keys are organized into chunks of 1000 for processing efficiency.

I’m looking for a way to efficiently update the externalIDs field for each document based on its agencyCode, using the USE KEYS clause to target specific documents in batches of 1000. Ideally, I’d like to perform this update with a single N1QL query per batch if possible.

Here’s an example of what I’ve tried so far (simplified for brevity):

UPDATE `default` USE KEYS ["54720020"]
SET `image`.`declared`.`identityInformation`.`externalIDs` = "000001";

UPDATE `default` USE KEYS [54720021"]
SET `image`.`declared`.`identityInformation`.`externalIDs` = "000002";

Another Example
UPDATE `default`
USE KEYS ["54720020", "54720021", "54720022", "54720023", "54720024", "54720025", "54720026", "54720027", "54720028", "54720029"]
SET imaged.pmsId = CASE META().id
    WHEN "54720020" THEN "000001"
    WHEN "54720021" THEN "000002"
    WHEN "54720022" THEN "000003"
    WHEN "54720023" THEN "000004"
    WHEN "54720024" THEN "000005"
    WHEN "54720025" THEN "000006"
    WHEN "54720026" THEN "000007"
    WHEN "54720027" THEN "000008"
    WHEN "54720028" THEN "000009"
    WHEN "54720029" THEN "000000"
    END;
...

Question:

Is there a more efficient way to apply this mapping dynamically within a single N1QL query for each batch of 100,000 keys?

A quick question based on the explanation + queries you have provided:

Where is the value of the existing “agencyCode” ?

Is it from the document id I.e meta().id?

Or from the “code” field of elements in the the image.identityInformation.name[ ] array?

1 Like

Hello Dhanya,

I made a mistake .

  • Where is the value of the existing “agencyCode” ? from the column AgencyCode
    Is it from the document id I.e meta().id? No no. IT’s me I didn’t know how to create the second query
  • Or from the “code” field of elements in the the image.identityInformation.name array? yes From here. correct this is correct

Dhanya.

I have a csv file of 100,000 rows. I can only copy paste the query in the couchbase console. I was trying to create queries on my side similar to the first 2 queries. But Creating 100,000 queries is really hard to manage. It will take hours to finish.

I don’t know how to create a more compact good query that can handle batches of 1000 AgencyCodes.

Thank you so much for your help

Based on my understanding, there is an array image.declared.identityInformation.names that has elements with the field ‘code’ which represents the existing agencyCode.

The goal is to update a field image.declared.identityInformation.externalIDs to a value based on a custom mapping of agency code to COMID.

( If there is no mapping for a value I have simply mapped it to “no-mapping” - please change as necessary )

  1. To update all documents in one single query:
UPDATE default
SET image.declared.identityInformation.externalIDs = CASE i.code
    WHEN "54720020" THEN "000001"
    WHEN "54720021" THEN "000002"
    WHEN "54720022" THEN "000003"
    WHEN "54720023" THEN "000004"
    WHEN "54720024" THEN "000005"
    WHEN "54720025" THEN "000006"
    WHEN "54720026" THEN "000007"
    WHEN "54720027" THEN "000008"
    WHEN "54720028" THEN "000009"
    WHEN "54720029" THEN "000000"
    ELSE "no-mapping" END
FOR i in  image.declared.identityInformation.names END;
  1. If you wish to run the UPDATE in batches of 1000:

a. Create a primary index.

b. Below is a query you can run in batches. The query will modify 1000 documents in the primary index order. The query will also return the document ids i.e meta().ids of the documents that were modified in primary index order.
It can be run in batches. Since your keyspace has 100,000 documents you would require 100 batches.

UPDATE default
SET image.declared.identityInformation.externalIDs = CASE i.code
    WHEN "54720020" THEN "000001"
    WHEN "54720021" THEN "000002"
    WHEN "54720022" THEN "000003"
    WHEN "54720023" THEN "000004"
    WHEN "54720024" THEN "000005"
    WHEN "54720025" THEN "000006"
    WHEN "54720026" THEN "000007"
    WHEN "54720027" THEN "000008"
    WHEN "54720028" THEN "000009"
    WHEN "54720029" THEN "000000" 
 ELSE "no-mapping" END
FOR i in  image.declared.identityInformation.names END
WHERE meta().id > $id 
LIMIT 1000
RETURNING meta().id;

c. Running the batches:

  • First run / batch: Set the $id named parameter to an empty string i.e “”

  • Run the above query ( in step b )

  • Note down the last value of the “id” returned by the above query. This is the
    document id of the last document that was modified in this batch.

  • Next run/ batch: Set the $id named parameter to this noted value from the previously run batch. And run the query.

  • Repeat this process in a loop till there are no more documents left to modify i.e
    mutation count is 0.

Named parameter references:

https://docs.couchbase.com/server/current/n1ql/n1ql-rest-api/exnamed.html

1 Like

The number of elements in the image.declared.identityInformation.names array could be more than one.

If you wish that the image.declared.identityInformation.externalIDs field be an array of all the mapped values of agencyCode to COMID for the elements that have the ‘code’ field in the image.declared.identityInformation.names array:

UPDATE default
SET image.declared.identityInformation.externalIDs = ARRAY CASE i.code
    WHEN "54720020" THEN "000001"
    WHEN "54720021" THEN "000002"
    WHEN "54720022" THEN "000003"
    WHEN "54720023" THEN "000004"
    WHEN "54720024" THEN "000005"
    WHEN "54720025" THEN "000006"
    WHEN "54720026" THEN "000007"
    WHEN "54720027" THEN "000008"
    WHEN "54720028" THEN "000009"
    WHEN "54720029" THEN "000000" 
    ELSE "no-mapping" END
FOR i in  image.declared.identityInformation.names END;

And similarly the batch query:

UPDATE default
SET image.declared.identityInformation.externalIDs = ARRAY CASE i.code
    WHEN "54720020" THEN "000001"
    WHEN "54720021" THEN "000002"
    WHEN "54720022" THEN "000003"
    WHEN "54720023" THEN "000004"
    WHEN "54720024" THEN "000005"
    WHEN "54720025" THEN "000006"
    WHEN "54720026" THEN "000007"
    WHEN "54720027" THEN "000008"
    WHEN "54720028" THEN "000009"
    WHEN "54720029" THEN "000000" 
      ELSE "no-mapping" END
FOR i in  image.declared.identityInformation.names END
WHERE meta().id > $id 
LIMIT 1000
RETURNING meta().id;

[ I have assumed that every element has a “code” field. If not - you can always add a WHEN i.code IS NOT MISSING clause to the FOR clause ]

[ Similarly modify the batch query & the queries in my previous post if necessary. ]

UPDATE default
SET image.declared.identityInformation.externalIDs = ARRAY CASE i.code
    WHEN "54720020" THEN "000001"
    WHEN "54720021" THEN "000002"
    WHEN "54720022" THEN "000003"
    WHEN "54720023" THEN "000004"
    WHEN "54720024" THEN "000005"
    WHEN "54720025" THEN "000006"
    WHEN "54720026" THEN "000007"
    WHEN "54720027" THEN "000008"
    WHEN "54720028" THEN "000009"
    WHEN "54720029" THEN "000000" 
    ELSE "no-mapping" END
FOR i in  image.declared.identityInformation.names 
WHEN i.code IS NOT MISSING END;

Why not compute the value for imaged.pmsId from the id? There are function to convert TONUMBER and TOSTRING. Or you could do it in an eventing function if it is an on-going requirement.

$obj = {"54720020":"000001", 
            "54720021":"000002",
            ...................
          };
UPDATE `default` AS d USE KEYS OBJECT_NAMES($obj)
SET d.image.declared.identityInformation.externalIDs =  $obj.[META(d).id];

Expand @Dhanya answer

UPDATE default AS d
SET image.declared.identityInformation.externalIDs = ARRAY $obj.[i.code]
                                                     FOR i in image.declared.identityInformation.names
                                                     WHEN i.code IS NOT MISSING AND $obj.[i.code] IS VALUED
                                                     END
WHERE meta().id > $id
LIMIT 1000
RETURNING meta().id;
;

Dynamic access $obj.[xxx]
xxx must evaluate string and it extract corresponding filed (like map or python dictionary). As you see after dot it starts array bracket

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.