Efficiently querying deleted documents with N1QL’s MISSING

When you’re building an application, a situation often comes up where your users want to delete some content but you still need to know that the data once existed.

Let’s say you’re building a messaging app. You might want to give your users the chance to restore deleted conversations.

So, when someone comes to delete a conversation you need to do something other than simply deleting the document from Couchbase. Instead, you could use a flag in the JSON body that identifies the document as somehow non-live. For example:

{
    "deleted": true
}

{

"deleted": true

}

Underneath that you’d create a GSI index to give you efficient access to all of those deleted conversations:

CREATE INDEX deletedmsgs ON `conversations`(deleted);

1	CREATE INDEX deletedmsgs ON `conversations`(deleted);

Then, to populate the view of deleted conversations in your app’s UI you’d run a N1QL query such as:

SELECT * FROM `conversations` WHERE deleted = true;

1	SELECT * FROM `conversations` WHERE deleted = true;

So far, everything seems pretty obvious.

Adding this to an existing data set

Now, what happens if you’re only just introducing this feature to your application? The likelihood is that you’d already have many thousands of conversations whose underlying documents don’t have the deleted flag.

You might think of this as a great opportunity to use N1QL’s IS MISSING. You’d be right but you might be thinking along slightly different lines to what I’m going to suggest.

You coulc account for documents that don’t have the deleted flag by changing your N1QL query to something like this:

SELECT * FROM `conversations` WHERE deleted = true OR deleted IS MISSING;

1	SELECT * FROM `conversations` WHERE deleted = true OR deleted IS MISSING;

To make it a bit more realistic, we might also filter our query based on the person involved:

SELECT * FROM `conversations` WHERE sender = user123 AND deleted = true OR deleted IS MISSING;

1	SELECT * FROM `conversations` WHERE sender = user123 AND deleted = true OR deleted IS MISSING;

This would do the job but it wouldn’t be the most efficient way to do it.

Once we introduce N1QL’s MISSING keyword in an OR situation, then we set N1QL off on a full document scan: our query is asking N1QL to check every document for the presence of the deleted JSON key. Our index helps only with those documents where deleted exists, because it doesn’t index documents where the key isn’t present.

Instead, in this case, we’d be better running a one-off query that would return all of the documents that are missing deleted. We could then work through each of them and update them to include deleted = false and so all those documents would now be in our efficient index.

If the query takes a little longer, that’s okay: we’d be running it as part of a background data janitoring task, so none of our users would even be aware of it. Once we’d run this task over all the documents, we’d no longer need MISSING in our user-facing query because we’d know that all our conversation documents would be either deleted: true or deleted: false.

Using schema versions instead

We could avoid using MISSING altogether by maintaing schema versioning:

{
   "schema": "1.2.3"
}

{

"schema": "1.2.3"

}

If we imagine that our new schema version, featuring the deleted flag, was 2.0.0 then our background janitor task could look for all the documents that have a schema version older than 2.0.0 and then run them through an upgrade process, part of which would be adding the deleted flag.

Matthew Revell, Lead Developer Advocate, EMEA, Couchbase

Author

Posted by Matthew Revell, Lead Developer Advocate, EMEA, Couchbase

Matthew Revell is a Lead Dev Advocate, EMEA Couchbase. He developed a global strategy for putting Couchbase front in the minds of the product's developers.

All Posts

Platform

Self-Managed

Services

Capabilities

Why Couchbase?

Migrate to Capella

By Use Case

By Industry

By Application Need

Popular Docs

By Developer Role

Quickstart

Resource Center

About

Partnerships

Our Services

Partners: Register a Deal

Ready to register a deal with Couchbase?

Marriott

All Posts

Efficiently querying for deleted documents with N1QL’s MISSING

Adding this to an existing data set

Using schema versions instead

Author

Posted by Matthew Revell, Lead Developer Advocate, EMEA, Couchbase

Leave a reply Cancel reply