Problem with deletion using TTL/expiration of a large number of documents

I am using Couchbase Server 6.6.0 and am trying to set the TTL using the Java SDK as shown below so that documents expire immediately. I want to do this in bulk, as my operations will be in the millions. My intention is to expire the documents immediately as below, and then I am hoping that Couchbase Server will delete them. After the code below is run, I still see the documents in the web console with a N1QL query. But when I run my code again to look for those records, it can’t find them. So something did happen, but I'm not sure what state they are in. Also, I noticed the expiration meta tag is still set to 0 in the web console. (I think this is because I did not set up an index on the expiry field, though.)

I tried to retrieve the document in the web console, and I can still retrieve the data. Why is this happening, and what do I need to do to ensure it’s completely deleted? I tried setting auto-compaction to 1 hour in the web console, but that didn’t purge it.

        AsyncBucket queryBucket = bucket.async();
        Observable.from(documentIds)
                .flatMap(id -> queryBucket.touch(id, 1, 0, TimeUnit.SECONDS))
                .toBlocking()
                .subscribe();

Hi @abakran
I might be missing something obvious, but would it not be more efficient to do queryBucket.remove() rather than set a TTL of 0?


0 means no expiration. If you set it to 1 second, the document will expire after one second. https://docs.couchbase.com/server/current/learn/buckets-memory-and-storage/expiration.html#post-expiration-purging

Also check out Eventing if you want to delete millions of documents: Delete v Expiry | Couchbase Docs cc @jon.strabala

I tried a “delete using keys” approach; however, I found that it wasn’t actually deleting all the documents, even though it was accepting the commands. I had to run it multiple times, and slowly things were getting deleted, so I didn’t find it a reliable way to delete the volume I was dealing with. I didn’t use “remove”, but I think delete using keys is similar under the hood? I set the TTL to 1; 0 is no expiration, from my understanding.

Yes, I tried your recommended approach (setting the TTL to 1 second). But the background cleanup process doesn’t seem to be working, as I can still query these documents in Couchbase. I prefer this approach if possible. Any ideas how I can debug this further, or ensure that the cleanup process is working as expected?

Check out the expiration process.

https://docs.couchbase.com/server/5.0/architecture/db-engine-architecture.html#working-set-management-and-ejection

Hi @abakran,

You might have “tombstones”. Either be patient or adjust your bucket settings:

  • Expiry Pager : Scans for items that have expired, and erases them from memory and disk; after which, a tombstone remains for a default period of 3 days. The expiry pager runs every 60 minutes by default: for information on changing the interval, see cbepctl set flush_param. For more information on item-deletion and tombstones, see Expiration.

You can tune this way down in Buckets/Edit/Advanced Bucket Settings: check “Override the default auto-compaction settings?” and then adjust “Metadata Purge Interval”.

Yes I did set the Metadata Purge Interval to .4, which is every 60 minutes, but I still see the documents through the web console.

Yes, delete with USE KEYS and bucket.remove() will issue the same underlying request to the Key-Value service.
Can I assume that your queries were using N1QL (as opposed to the Key-Value API e.g. bucket.get(), lookupIn() or exists())? If so, they will have been hitting an index, and my guess is that this index had not yet been told by the Key-Value server about the removed documents. This eventual consistency is a great thing, as it means that you can decide at read time what level of consistency you require - e.g. you can request that the N1QL read is consistent with any mutations at the time of the query. Please take a look at the scan consistency docs for details: https://docs.couchbase.com/java-sdk/2.7/scan-consistency-examples.html
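For illustration, here is a sketch of two ways to request stronger consistency in the 2.x Java SDK. The bucket name and statement are placeholders, and the AT_PLUS variant via consistentWith() additionally requires mutationTokensEnabled(true) on the CouchbaseEnvironment; this is a sketch under those assumptions, not a definitive recipe:

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.MutationState;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.query.N1qlParams;
import com.couchbase.client.java.query.N1qlQuery;
import com.couchbase.client.java.query.consistency.ScanConsistency;

public class ScanConsistencyExample {

    // Sketch only: assumes an already-opened SDK 2.x Bucket.
    static void queryAfterMutation(Bucket bucket, String id) {
        // Option 1: REQUEST_PLUS waits for the index to catch up with all
        // mutations the Key-Value service received before the query.
        N1qlParams requestPlus = N1qlParams.build()
                .consistency(ScanConsistency.REQUEST_PLUS);
        bucket.query(N1qlQuery.simple(
                "SELECT META().id FROM `mybucket`", requestPlus));

        // Option 2: AT_PLUS via consistentWith() waits only for the specific
        // mutations you just made (needs mutation tokens enabled on the
        // environment), so it is usually cheaper than REQUEST_PLUS.
        JsonDocument removed = bucket.remove(id);
        N1qlParams atPlus = N1qlParams.build()
                .consistentWith(MutationState.from(removed));
        bucket.query(N1qlQuery.simple(
                "SELECT META().id FROM `mybucket`", atPlus));
    }
}
```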

As for why you are seeing the current behaviour, e.g. setting TTL=0 and seeing Key-Value lookups reflecting this, but the documents still existing when you do a N1QL query, it comes down to the same eventual consistency. The Key-Value service has not yet told the N1QL index about the change, but it will do when the expiry pager runs, or compaction is run. Please see these docs for more details: https://docs.couchbase.com/server/current/learn/buckets-memory-and-storage/expiration.html#post-expiration-purging (specifically the Post-Expiration Purging section).

I tried setting auto compaction on for 1 hour in the web console, but that didn’t purge it.

The expiries should be processed on compaction - can I double-check that you were running your query at least an hour after doing the queryBucket.touch(), e.g. 100% after compaction had run?

Still, the TTL discussion is a bit of a red herring to follow, as IMO queryBucket.remove() is the way to go - together with setting the scanConsistency if you’re doing subsequent queries.

Yes, I ran the query 1 hour after doing the queryBucket.touch() command.

Can I specify the scan consistency in the web console?

I just tried calling the query below after deleting all the documents, but it is still (partially) returning documents:

    N1qlParams param = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS);

    JsonObject args =
            JsonObject.create()
                    .put("myparam1", myparam)
                    .put("offset", offset)
                    .put("limit", limit);

    ParameterizedN1qlQuery query = N1qlQuery.parameterized(getAllDocumentIdsTemplate, args, param);

    N1qlQueryResult result = bucket.query(query);

You might be doing a covered query. Unless the Expiry Pager has run and removed the documents, REQUEST_PLUS will not help: no mutation has been recorded yet, so the indexer will not be updated.
Once the expiration time has passed, but before the Expiry Pager runs, directly retrieving the document via the SDK or N1QL marks it as deleted.

If you use a non-covered query, you will not get expired documents.

How do I know what my Expiry Pager settings are? Is there a way to retrieve that information easily? My subsequent query has an index on it. BTW, I am no longer “expiring” the documents; I am deleting them, as per the suggestions above. Does the same system apply? i.e., do I need to wait until the expiry pager runs? I thought I didn't need to if I am setting REQUEST_PLUS and doing a delete.

If you are deleting the document, you can use REQUEST_PLUS on subsequent requests and they will not return the deleted document.

Yes, that’s what I tried; I pasted the Java code above, which is my query after I do the delete (the query uses an existing index). I am batching my deletes 10,000 at a time.

delete from mybucket use keys $documentIds

If my query is using an existing index, and I am using REQUEST_PLUS consistency, I should NEVER see the document? Because I do, and this is where I am currently at.
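As a side note, the 10,000-at-a-time batching described above can be done with a plain-Java helper that splits the key list before issuing one USE KEYS delete per batch. This is only a sketch; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchIds {

    // Split a list of document IDs into batches of at most `batchSize`
    // elements, e.g. one batch per "DELETE FROM mybucket USE KEYS
    // $documentIds" statement. 10,000 matches the poster's setup; tune it
    // to your environment.
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            batches.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b", "c", "d", "e");
        List<List<String>> batches = partition(ids, 2);
        System.out.println(batches); // → [[a, b], [c, d], [e]]
    }
}
```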

Try this; it might be slow. (You might be hitting some bug.)

delete from mybucket use keys $documentIds WHERE 1=1;

The best options will be:

  1. Eventing
  2. As you already have the document keys, remove them directly via the SDK, asynchronously/in parallel

Before I go for Eventing: for option #2, how would you suggest I do the delete, i.e. which SDK call? I thought I was already doing that with the DELETE ... USE KEYS option under the hood, as described above.

Check out Delete bulk documents using Java SDK

FYI, I just tried adding 1=1 in the WHERE clause of the query, and it is still returning documents.

I have no idea what is wrong. But as you have the document keys, you can use the approach suggested here: Delete bulk documents using Java SDK

As @graham.pople suggested in his first reply. Then, on the following N1QL calls, use REQUEST_PLUS so that the deletions are reflected in the index.

        AsyncBucket queryBucket = bucket.async();
        Observable.from(documentIds)
                .flatMap(id -> queryBucket.remove(id))
                .toBlocking()
                .subscribe();
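For completeness, here is a hedged variant of that snippet. It caps the number of in-flight removes and treats DocumentDoesNotExistException as success, so re-running it over the same key set does not fail. The concurrency limit of 128 and the class/method names are arbitrary illustrations, not recommendations from the SDK docs:

```java
import java.util.List;

import com.couchbase.client.java.AsyncBucket;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.error.DocumentDoesNotExistException;

import rx.Observable;

public class BulkRemove {

    // Sketch only: removes every key in `documentIds`, limiting the number
    // of concurrent Key-Value operations and ignoring keys that are
    // already gone.
    static void removeAll(AsyncBucket queryBucket, List<String> documentIds) {
        Observable.from(documentIds)
                .flatMap(id -> queryBucket.remove(id)
                                .onErrorResumeNext(err ->
                                        err instanceof DocumentDoesNotExistException
                                                ? Observable.<JsonDocument>empty()
                                                : Observable.<JsonDocument>error(err)),
                        128) // at most 128 removes in flight at once
                .toBlocking()
                .lastOrDefault(null); // block until every remove completes
    }
}
```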