Problem with deletion using TTL/expiration of larger number of documents

abakran · June 28, 2021, 4:36pm

I am using Couchbase Server 6.6.0 and am trying to set the TTL using the Java SDK, as shown below, so that documents expire immediately. I want to do this in bulk, as my operations will number in the millions. I am then hoping that Couchbase Server will delete them. After the code below is run, I still see the documents in the web console with a N1QL query. But when I run my code again to look for those records, it can’t find them. So something did happen, but I am not sure what state the documents are in. Also, I noticed the expiration meta tag is still set to 0 in the web console. (I think this is because I did not set up an index on the expiry field, though.)

I tried to retrieve the document in the web console, and I can still retrieve the data. How come this is happening and what do I need to do to ensure it’s completely deleted? I tried setting auto compaction on for 1 hour in the web console, but that didn’t purge it.

        AsyncBucket queryBucket = bucket.async();
        Observable.from(documentIds)
                // The async touch takes only the id and the expiry (here 1 second)
                .flatMap(id -> queryBucket.touch(id, 1))
                .toBlocking()
                .subscribe();

graham.pople · June 28, 2021, 5:03pm

Hi @abakran
I might be missing something obvious, but would it not be more efficient to do queryBucket.remove() rather than set a TTL of 0?

vsr1 · June 28, 2021, 5:06pm

0 might mean no expiration. Setting it to 1 second means the document will expire after a second. https://docs.couchbase.com/server/current/learn/buckets-memory-and-storage/expiration.html#post-expiration-purging

Also check out Eventing if you want to do this for millions of documents: Delete v Expiry | Couchbase Docs. cc @jon.strabala

abakran · June 28, 2021, 5:07pm

I tried doing a “delete using keys” approach, however, I was finding that it wasn’t actually deleting all the documents, even though it was accepting the commands. I had to run it multiple times and slowly things were getting deleted, so I didn’t find it a reliable way to delete the volume I was doing. I didn’t do the “remove”, but I think delete using keys is similar under the hood? I set the TTL to 1. 0 is no expiration from my understanding.

abakran · June 28, 2021, 5:18pm

Yes, I tried your recommended approach (setting TTL to 1 second). But the background/cleanup process doesn’t seem to be working as I still can query these documents in couchbase. I prefer this approach if possible. Any ideas how I can debug this further or ensure that cleanup process is working as expected?

vsr1 · June 28, 2021, 5:34pm

Check out the expiration process.

https://docs.couchbase.com/server/5.0/architecture/db-engine-architecture.html#working-set-management-and-ejection

jon.strabala · June 28, 2021, 5:46pm

Hi @abakran,

You might have “tombstones”. Either be patient or adjust your bucket settings.

Expiry Pager : Scans for items that have expired, and erases them from memory and disk; after which, a tombstone remains for a default period of 3 days. The expiry pager runs every 60 minutes by default: for information on changing the interval, see cbepctl set flush_param. For more information on item-deletion and tombstones, see Expiration.

You can tune this way down in Buckets/Edit/Advanced Bucket Settings.
Check “Override the default auto-compaction settings?”, then adjust “Metadata Purge Interval”.

abakran · June 28, 2021, 6:29pm

Yes I did set the Metadata Purge Interval to .4, which is every 60 minutes, but I still see the documents through the web console.
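
As a quick arithmetic check on the interval value, assuming the Metadata Purge Interval is expressed in days (as the 3-day default quoted above suggests), a small sketch in Java converts a fractional-day value to minutes. The class and method names here are hypothetical and are not part of any Couchbase API.

```java
import java.time.Duration;

public class PurgeIntervalCheck {
    // Converts an interval given in fractional days into a Duration.
    // Assumption: the bucket setting is expressed in days.
    static Duration fromDays(double days) {
        long seconds = Math.round(days * 24 * 60 * 60);
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        // 0.4 days
        System.out.println(fromDays(0.4).toMinutes());  // 576
        // 0.04 days
        System.out.println(fromDays(0.04).toMinutes()); // 57
    }
}
```

Under that assumption, 0.4 days works out to 576 minutes (9.6 hours), while an interval of roughly one hour corresponds to about 0.04 days.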

graham.pople · June 28, 2021, 9:22pm

Yes delete with USE KEYS and bucket.remove() will do the same underlying request to the Key-Value server.
Can I assume that your queries were using N1QL (as opposed to the Key-Value API e.g. bucket.get(), lookupIn() or exists())? If so, they will have been hitting an index, and my guess is that this index had not yet been told by the Key-Value server about the removed documents. This eventual consistency is a great thing, as it means that you can decide at read time what level of consistency you require - e.g. you can request that the N1QL read is consistent with any mutations at the time of the query. Please take a look at the scan consistency docs for details: https://docs.couchbase.com/java-sdk/2.7/scan-consistency-examples.html

As for why you are seeing the current behaviour, e.g. setting TTL=0 and seeing Key-Value lookups reflecting this, but the documents still existing when you do a N1QL query, it comes down to the same eventual consistency. The Key-Value service has not yet told the N1QL index about the change, but it will do when the expiry pager runs, or compaction is run. Please see these docs for more details: https://docs.couchbase.com/server/current/learn/buckets-memory-and-storage/expiration.html#post-expiration-purging (specifically the Post-Expiration Purging section).

“I tried setting auto compaction on for 1 hour in the web console, but that didn’t purge it.”

The expiries should be processed on compaction - can I double-check that you were running your query at least an hour after doing the queryBucket.touch(), e.g. 100% after compaction had run?

Still, the TTL discussion is a bit of a red herring to follow, as IMO queryBucket.remove() is the way to go - together with setting the scanConsistency if you’re doing subsequent queries.

abakran · June 28, 2021, 9:40pm

Yes, I ran the query 1 hour after doing the queryBucket.touch() command.

Can I specify the scan consistency in the web console?

abakran · June 28, 2021, 11:52pm

I just tried to call the below query after deleting all the documents, but it is still returning documents back (partially):

    N1qlParams param = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS);

    JsonObject args =
            JsonObject.create()
                    .put("myparam1", myparam)
                    .put("offset", offset)
                    .put("limit", limit);

    ParameterizedN1qlQuery query = N1qlQuery.parameterized(getAllDocumentIdsTemplate, args, param);

vsr1 · June 29, 2021, 12:07am

You might be doing a covered query. Unless the Expiry Pager has run and removed the documents, REQUEST_PLUS will not help: no mutation is recorded, so the indexer will not be updated.
Once the expiration time has passed, even if the Expiry Pager has not run, a document that is retrieved directly via the SDK or N1QL is marked as deleted.

If you use a non-covered query, you will not get expired documents.
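
The difference between covered and non-covered queries described above can be illustrated with a toy model. This is only a sketch of the reasoning, not Couchbase's implementation: the class `LazyExpiryModel`, its methods, and the integer clock are all hypothetical, and the model deliberately ignores that a direct fetch can itself trigger the deletion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class LazyExpiryModel {
    // Toy Key-Value store: document id -> absolute expiry time (clock ticks).
    private final Map<String, Long> kvExpiry = new HashMap<>();
    // Toy index: it only changes when a mutation is recorded.
    private final Set<String> indexedIds = new TreeSet<>();

    void upsert(String id, long expiresAt) {
        kvExpiry.put(id, expiresAt);
        indexedIds.add(id);
    }

    // Direct fetch: an expired document counts as missing even if not yet purged.
    boolean kvGet(String id, long now) {
        Long expiresAt = kvExpiry.get(id);
        return expiresAt != null && now < expiresAt;
    }

    // Covered query: answered from the index alone, with no document fetch.
    List<String> coveredQuery() {
        return new ArrayList<>(indexedIds);
    }

    // Non-covered query: every index hit is fetched, so expired documents drop out.
    List<String> nonCoveredQuery(long now) {
        List<String> result = new ArrayList<>();
        for (String id : indexedIds) {
            if (kvGet(id, now)) {
                result.add(id);
            }
        }
        return result;
    }

    // Expiry pager: purges expired documents and records the deletion for the index.
    void runExpiryPager(long now) {
        kvExpiry.entrySet().removeIf(entry -> {
            boolean expired = now >= entry.getValue();
            if (expired) {
                indexedIds.remove(entry.getKey());
            }
            return expired;
        });
    }

    public static void main(String[] args) {
        LazyExpiryModel model = new LazyExpiryModel();
        model.upsert("doc1", 1L);                       // expires at tick 1
        System.out.println(model.coveredQuery());       // index still lists doc1
        System.out.println(model.nonCoveredQuery(5L));  // fetch sees it as expired
        model.runExpiryPager(5L);
        System.out.println(model.coveredQuery());       // index updated after the pager
    }
}
```

In this model the stricter scan consistency would not change the covered result, because the index has nothing new to catch up on until the pager records the deletion.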

abakran · June 29, 2021, 12:10am

How do I know what my Expiry Pager settings are? Is there a way to retrieve that information easily? My subsequent query has an index on it. BTW I am no longer “expiring” the document. I am deleting it, as per the suggestions above. Does the same system apply? i.e. Do I need to wait until expiry pager runs? I thought I don’t need to if I am setting REQUEST_PLUS and doing delete.

vsr1 · June 29, 2021, 12:12am

If you are deleting the document, you can use REQUEST_PLUS on the subsequent requests, and they will not return the deleted document.

abakran · June 29, 2021, 12:16am

Yes, that’s what I tried, I pasted the java code above which is my query after I do delete (query is using existing index). I am batching my deletes in 10000 at a time.

    delete from mybucket use keys $documentIds

If my query is using an existing index, and I am using REQUEST_PLUS consistency, I should NEVER see the document? Because I do, and this is where I am currently at.
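
For reference, splitting a large key list into batches of 10000, as described above, might look like the following. It is a minimal sketch using only the Java standard library; `KeyBatcher` and `partition` are hypothetical names, and each resulting batch would then be bound to `$documentIds` in the DELETE statement.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyBatcher {
    // Splits the ids into consecutive batches of at most batchSize elements.
    static List<List<String>> partition(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 25000; i++) {
            ids.add("doc::" + i);
        }
        // 25000 ids in batches of 10000 gives batches of 10000, 10000 and 5000.
        for (List<String> batch : partition(ids, 10000)) {
            System.out.println(batch.size());
        }
    }
}
```

Checking that the batch sizes add up to the original key count is a cheap way to rule out keys being dropped before the DELETE is even issued.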

vsr1 · June 29, 2021, 12:22am

Try this. It might be slow. (You might be hitting some bug.)

    delete from mybucket use keys $documentIds WHERE 1=1;

The best options will be:

1. Eventing
2. As you already have the document keys, remove them directly from the SDK asynchronously/in parallel

abakran · June 29, 2021, 12:27am

Before I go for eventing, for option #2 how would you suggest I do the delete, which SDK call? I thought I am doing that with the DELETE use keys option under the hood as described above.

vsr1 · June 29, 2021, 12:33am

Check out Delete bulk documents using Java SDK

abakran · June 29, 2021, 12:33am

FYI I just tried adding 1=1 in the where clause of the query, and it is still returning documents.

vsr1 · June 29, 2021, 12:36am

I have no idea what is wrong. But as you have the document keys, you can use the approach suggested here: Delete bulk documents using Java SDK

This is what @graham.pople suggested in the first reply. Then have the following N1QL calls use REQUEST_PLUS so that they reflect the updated index.

        AsyncBucket queryBucket = bucket.async();
        Observable.from(documentIds)
                // Issue one Key-Value remove per document id
                .flatMap(id -> queryBucket.remove(id))
                .toBlocking()
                .subscribe();
