Problem with deletion using TTL/expiration of a large number of documents

Hi @abakran,

You might have “tombstones”. Either be patient or adjust your bucket settings:

  • Expiry Pager: Scans for items that have expired, and erases them from memory and disk; after which, a tombstone remains for a default period of 3 days. The expiry pager runs every 60 minutes by default: for information on changing the interval, see cbepctl set flush_param. For more information on item-deletion and tombstones, see Expiration.

You can tune this way down in Buckets/Edit/Advanced Bucket Settings: check “Override the default auto-compaction settings?”, then adjust “Metadata Purge Interval”.

Yes, I did set the Metadata Purge Interval to .4, which is every 60 minutes, but I can still see the documents through the web console.

Yes, a DELETE with USE KEYS and bucket.remove() will issue the same underlying request to the Key-Value server.
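
For illustration, a minimal sketch of the two equivalent deletes (SDK 2.x; the bucket name and the key are placeholders):

    // Key-Value delete; throws DocumentDoesNotExistException if the key is absent.
    bucket.remove("order::123");

    // The same delete expressed in N1QL.
    bucket.query(N1qlQuery.parameterized(
            "DELETE FROM mybucket USE KEYS $ids",
            JsonObject.create().put("ids", JsonArray.from("order::123"))));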
Can I assume that your queries were using N1QL (as opposed to the Key-Value API e.g. bucket.get(), lookupIn() or exists())? If so, they will have been hitting an index, and my guess is that this index had not yet been told by the Key-Value server about the removed documents. This eventual consistency is a great thing, as it means that you can decide at read time what level of consistency you require - e.g. you can request that the N1QL read is consistent with any mutations at the time of the query. Please take a look at the scan consistency docs for details: https://docs.couchbase.com/java-sdk/2.7/scan-consistency-examples.html
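
For instance, a minimal sketch of read-your-own-writes with the 2.x SDK (this assumes mutationTokensEnabled(true) on the CouchbaseEnvironment; the key and bucket name are placeholders):

    // Remove the document, then make the very next query consistent with that mutation.
    JsonDocument removed = bucket.remove("order::123");
    N1qlParams atPlus = N1qlParams.build().consistentWith(removed);
    N1qlQueryResult result = bucket.query(
            N1qlQuery.simple("SELECT RAW META().id FROM mybucket", atPlus));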

As for why you are seeing the current behaviour, e.g. setting TTL=0 and seeing Key-Value lookups reflecting this, but the documents still existing when you do a N1QL query, it comes down to the same eventual consistency. The Key-Value service has not yet told the N1QL index about the change, but it will do so when the expiry pager runs or compaction is run. Please see these docs for more details: https://docs.couchbase.com/server/current/learn/buckets-memory-and-storage/expiration.html#post-expiration-purging (specifically the Post-Expiration Purging section).
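
To see the lazy expiry on the Key-Value side, here is a hypothetical illustration (the key and the 1-second TTL are made up):

    // Set a 1-second TTL on the document, then read it back after it has passed.
    bucket.touch("order::123", 1);
    Thread.sleep(2000);  // assumes a method that may throw InterruptedException
    // The direct KV get observes the expiry immediately (prints null), even though
    // the N1QL index may still contain the document until purge/compaction.
    System.out.println(bucket.get("order::123"));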

I tried setting auto compaction on for 1 hour in the web console, but that didn’t purge it.

The expiries should be processed on compaction - can I double-check that you were running your query at least an hour after doing the queryBucket.touch(), i.e. definitely after compaction had run?

Still, the TTL discussion is a bit of a red herring, as IMO queryBucket.remove() is the way to go - together with setting the scanConsistency if you’re doing subsequent queries.

Yes, I ran the query 1 hour after doing the queryBucket.touch() command.

Can I specify the scan consistency in the web console?

I just tried calling the query below after deleting all the documents, but it is still returning some of the documents:

    N1qlParams param = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS);

    JsonObject args =
            JsonObject.create()
                    .put("myparam1", myparam)
                    .put("offset", offset)
                    .put("limit", limit);

    ParameterizedN1qlQuery query = N1qlQuery.parameterized(getAllDocumentIdsTemplate, args, param);
    N1qlQueryResult result = bucket.query(query);  // still returns some already-deleted documents

You might be doing a covered query. Unless the Expiry Pager has run and removed the documents, REQUEST_PLUS will not help: the expiry mutation has not been recorded, so the indexer will not be updated.
Once the expiration has passed but the Expiry Pager has not yet run, directly retrieving the document via the SDK or N1QL marks the document as deleted.

If you use a non-covered query, you will not get expired documents.
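
For illustration, a minimal sketch of the difference, assuming a hypothetical index on (dealer) and a field someFieldNotInTheIndex that the index does not cover:

    // Covered: answered entirely from the index, so it can still return the keys
    // of expired-but-not-yet-purged documents.
    String covered = "SELECT META().id FROM mybucket WHERE dealer = $dealer";

    // Non-covered: someFieldNotInTheIndex forces a fetch of each document, and
    // the fetch drops documents that have already expired.
    String nonCovered = "SELECT META().id, someFieldNotInTheIndex FROM mybucket WHERE dealer = $dealer";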

How do I know what my Expiry Pager settings are? Is there a way to retrieve that information easily? My subsequent query has an index on it. BTW, I am no longer “expiring” the document; I am deleting it, as per the suggestions above. Does the same system apply? i.e. do I need to wait until the expiry pager runs? I thought I didn’t need to if I am using REQUEST_PLUS and doing a delete.

If you are deleting the document, you can use REQUEST_PLUS on subsequent requests and they will not return the deleted document.

Yes, that’s what I tried; I pasted the Java code above, which is my query after I do the delete (the query uses an existing index). I am batching my deletes 10,000 at a time.

delete from mybucket use keys $documentIds

If my query is using an existing index, and I am using REQUEST_PLUS consistency, I should NEVER see the document? Because I do, and this is where I am currently at.

Try this; it might be slow. (You might be hitting some bug.)

delete from mybucket use keys $documentIds WHERE 1=1;

The best option will be:

  1. Eventing
  2. As you already have the document keys, remove them directly via the SDK, asynchronously/in parallel

Before I go for Eventing: for option #2, how would you suggest I do the delete - which SDK call? I thought I was already doing that under the hood with the DELETE USE KEYS option, as described above.

Check out Delete bulk documents using Java SDK

FYI, I just tried adding 1=1 in the WHERE clause of the query, and it is still returning documents.

I have no idea what is wrong, but as you have the document keys you can use the approach suggested here: Delete bulk documents using Java SDK

As @graham.pople suggested in the first reply. Then have the following N1QL calls use REQUEST_PLUS so they reflect the index updates.

    AsyncBucket queryBucket = bucket.async();
    Observable.from(documentIds)
            .flatMap(id -> queryBucket.remove(id))
            .toBlocking()
            .last();  // blocks until every remove completes; any error aborts the chain

Yeah, I tried this in the beginning, and just tried again. It is MOSTLY deleting the documents; however, some documents still remain in the subsequent query. Would Eventing give me guaranteed results if I went that route? Because this .remove() method does not seem to be working with full accuracy.

Couchbase indexes are eventually consistent.
As you are making async calls, you should wait for them to finish before issuing the N1QL query with REQUEST_PLUS to get the next list of documents. Also, I am not sure how you are getting the next list of documents - are you using ORDER BY/LIMIT/OFFSET?
If you are using multiple threads, then you need to synchronize them.

You mentioned you want to delete millions of records and want to use OFFSET/LIMIT.

Note: Couchbase indexes are eventually consistent; you must use REQUEST_PLUS.
Even then, running the same query a second time might give different results.

Say you have 100 items. You query with OFFSET 10 LIMIT 15 and delete the 15 items returned. Running the query a second time with REQUEST_PLUS gives another 15 items, because the offset now skips 10 of the remaining documents. After the 6th run only 10 items remain, so OFFSET 10 skips all of them and the query returns no results - even though 10 documents were never deleted.

The best approach will be to use keyset pagination: Using OFFSET and Keyset in N1QL | The Couchbase Blog

If you are running it a second time, run the SELECT query with REQUEST_PLUS, or check the Indexes page in the UI for pending mutations and let them go down to 0.

CREATE INDEX ix1 ON mybucket (dealer, META().id, entityType);
id = ""
Query = "SELECT META().id AS id, entityType FROM mybucket WHERE dealer = $dealer AND META().id > $id LIMIT 10000;"

Do in a loop:
    Execute the N1QL query
    If the result length is 0:
        break the loop
    Else:
        run the delete query asynchronously on all the documents
        set $id to the last key in the results and let the loop continue
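
A minimal sketch of that loop in Java (SDK 2.x with RxJava 1.x; "mybucket", the dealer value and the ix1 index are taken from the example above, everything else is a placeholder):

    import com.couchbase.client.java.AsyncBucket;
    import com.couchbase.client.java.Bucket;
    import com.couchbase.client.java.document.json.JsonObject;
    import com.couchbase.client.java.query.N1qlParams;
    import com.couchbase.client.java.query.N1qlQuery;
    import com.couchbase.client.java.query.N1qlQueryRow;
    import com.couchbase.client.java.query.consistency.ScanConsistency;
    import rx.Observable;

    import java.util.List;

    public class KeysetDelete {
        static void deleteAllForDealer(Bucket bucket, String dealer) {
            // ORDER BY META().id makes the "last key in the results" step well-defined.
            String template = "SELECT META().id AS id FROM mybucket "
                    + "WHERE dealer = $dealer AND META().id > $id "
                    + "ORDER BY META().id LIMIT 10000";
            N1qlParams params = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS);
            AsyncBucket asyncBucket = bucket.async();

            String lastId = "";
            while (true) {
                JsonObject args = JsonObject.create().put("dealer", dealer).put("id", lastId);
                List<N1qlQueryRow> rows =
                        bucket.query(N1qlQuery.parameterized(template, args, params)).allRows();
                if (rows.isEmpty()) {
                    break;  // nothing left to delete
                }
                // Delete this page in parallel over the Key-Value API and block
                // until the whole page has been removed.
                Observable.from(rows)
                        .flatMap(row -> asyncBucket.remove(row.value().getString("id")))
                        .toBlocking()
                        .last();
                // Keyset step: continue after the last key seen, instead of using OFFSET.
                lastId = rows.get(rows.size() - 1).value().getString("id");
            }
        }
    }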

I ran the query 1 hour after doing the queryBucket.touch() command. Can I specify the scan consistency in the web console?

Yes. Check out Query Preferences in Query Workbench | Couchbase Docs

Thanks for your quick and helpful reply.