You might have “tombstones”; either be patient or adjust your bucket settings.
Expiry Pager: scans for items that have expired and erases them from memory and disk, after which a tombstone remains for a default period of 3 days. The expiry pager runs every 60 minutes by default; for information on changing the interval, see cbepctl set flush_param. For more information on item deletion and tombstones, see Expiration.
You can tune this way down under Buckets > Edit > Advanced Bucket Settings:
check “Override the default auto-compaction settings?”, then adjust “Metadata Purge Interval”.
Yes, a DELETE with USE KEYS and bucket.remove() issue the same underlying request to the Key-Value server.
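A minimal sketch of the two equivalent approaches, assuming an open SDK 2.x Bucket named `bucket` and the usual com.couchbase.client.java imports; the document ID is a placeholder:

```java
// Key-Value removal:
bucket.remove("order::1001");

// Equivalent N1QL removal; under the hood this issues the same
// delete against the Key-Value service:
bucket.query(N1qlQuery.parameterized(
        "DELETE FROM mybucket USE KEYS $id",
        JsonObject.create().put("id", "order::1001")));
```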
Can I assume that your queries were using N1QL (as opposed to the Key-Value API e.g. bucket.get(), lookupIn() or exists())? If so, they will have been hitting an index, and my guess is that this index had not yet been told by the Key-Value server about the removed documents. This eventual consistency is a great thing, as it means that you can decide at read time what level of consistency you require - e.g. you can request that the N1QL read is consistent with any mutations at the time of the query. Please take a look at the scan consistency docs for details: https://docs.couchbase.com/java-sdk/2.7/scan-consistency-examples.html
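For example, with the 2.x Java SDK (bucket name and filter are placeholders):

```java
// REQUEST_PLUS blocks the query until the index has caught up with
// all mutations made before the request was issued.
N1qlParams params = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS);
N1qlQueryResult result = bucket.query(N1qlQuery.simple(
        "SELECT META().id FROM mybucket WHERE dealer = 'd1'", params));
```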
As for why you are seeing the current behaviour (e.g. setting TTL=0, seeing Key-Value lookups reflect this, but the documents still existing when you run a N1QL query): it comes down to the same eventual consistency. The Key-Value service has not yet told the N1QL index about the change, but it will do so when the expiry pager runs or compaction occurs. Please see these docs for more details: https://docs.couchbase.com/server/current/learn/buckets-memory-and-storage/expiration.html#post-expiration-purging (specifically the Post-Expiration Purging section).
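A sketch of that behaviour, assuming an open SDK 2.x Bucket and an existing document; the ID and the 1-second expiry are placeholders:

```java
bucket.touch("order::1001", 1);                // expire the document in 1 second
Thread.sleep(2000);
JsonDocument doc = bucket.get("order::1001");  // null: KV checks expiry on read
// A N1QL query against an index may still return this key until the
// expiry pager or compaction purges the tombstone and updates the index.
```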
I tried turning auto-compaction on with a 1-hour interval in the web console, but that didn’t purge it.
The expiries should be processed on compaction. Can I double-check that you ran your query at least an hour after doing the queryBucket.touch(), i.e. definitely after compaction had run?
Still, the TTL discussion is a bit of a red herring to chase; IMO queryBucket.remove() is the way to go, together with setting the scanConsistency if you’re doing subsequent queries.
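Putting the two together, a sketch assuming an open SDK 2.x Bucket named `queryBucket` and a placeholder key:

```java
queryBucket.remove("order::1001");  // delete through the Key-Value service
N1qlQueryResult res = queryBucket.query(N1qlQuery.simple(
        "SELECT COUNT(*) AS cnt FROM mybucket",
        N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS)));
// REQUEST_PLUS ensures the index has absorbed the remove before the SELECT runs.
```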
You might be doing a covered query. Unless the Expiry Pager has run and removed the documents, REQUEST_PLUS will not help: no mutation has been recorded, so the indexer will not be updated.
Once the expiration time has passed but the Expiry Pager has not yet run, directly retrieving the document via the SDK or N1QL marks it as deleted.
If you use a non-covered query, you will not get expired documents back.
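To illustrate the difference, assuming a hypothetical index CREATE INDEX ix_dealer ON mybucket(dealer, entityType):

```java
// Covered: every projected field is in the index, so the query is answered
// from the index alone and can still return expired-but-unpurged documents.
String covered = "SELECT dealer, entityType FROM mybucket WHERE dealer = 'd1'";

// Non-covered: SELECT * forces a fetch of each document from the Key-Value
// service, which filters out expired items at read time.
String nonCovered = "SELECT * FROM mybucket WHERE dealer = 'd1'";
```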
How do I know what my Expiry Pager settings are? Is there a way to retrieve that information easily? My subsequent query uses an index. BTW, I am no longer “expiring” the document; I am deleting it, as per the suggestions above. Does the same mechanism apply, i.e. do I need to wait until the expiry pager runs? I thought I didn’t, since I am setting REQUEST_PLUS and doing a delete.
Yes, that’s what I tried; I pasted the Java code above, which is my query after I do the delete (the query uses an existing index). I am batching my deletes 10,000 at a time.
DELETE FROM mybucket USE KEYS $documentIds
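For reference, a sketch of issuing that statement with the batch of keys as a named parameter (SDK 2.x; the keys are placeholders):

```java
JsonArray documentIds = JsonArray.from("order::1", "order::2", "order::3");
bucket.query(N1qlQuery.parameterized(
        "DELETE FROM mybucket USE KEYS $documentIds",
        JsonObject.create().put("documentIds", documentIds)));
```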
If my query is using an existing index, and I am using REQUEST_PLUS consistency, I should NEVER see the document? Because I do, and this is where I am currently at.
Before I go for Eventing: for option #2, which SDK call would you suggest for the delete? I thought I was already doing that under the hood with the DELETE ... USE KEYS option, as described above.
Yeah, I tried this in the beginning, and just tried again. It is MOSTLY deleting the documents; however, some documents still remain in the subsequent query. Would Eventing give me guaranteed results if I went that route? Because this .remove() method does not seem to be working with full accuracy.
Couchbase indexes are eventually consistent.
As you are making async calls, you should wait for them to finish before issuing the N1QL query with REQUEST_PLUS to get the next list of documents. Also, I am not sure how you are getting the next list of documents; are you using ORDER BY/LIMIT/OFFSET?
If you are using multiple threads, you need to synchronize them.
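One way to wait for a batch of async removes to complete before the next query, assuming SDK 2.x with RxJava 1 and a List&lt;String&gt; of keys named documentIds:

```java
Observable.from(documentIds)
        .flatMap(id -> bucket.async().remove(id))
        .toList()
        .toBlocking()
        .single();  // blocks until every removal has completed
```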
You mentioned you want to delete millions of records and want to use OFFSET/LIMIT.
Note: Couchbase indexes are eventually consistent, so you must use REQUEST_PLUS.
Even then, running the same paged query a second time can give different results, because the deletes shift the offsets. Say you have 100 matching items and run OFFSET 10 LIMIT 15, deleting the 15 items returned. A second run with REQUEST_PLUS gives a different 15 items (85 remain, so the window now covers different survivors). After the 6th run only 10 items remain, so OFFSET 10 returns no results even though documents are left undeleted.
Before running again, issue the SELECT query with REQUEST_PLUS, or check the Indexes page in the UI and let any pending mutations drain to 0.
CREATE INDEX ix1 ON mybucket(dealer, META().id, entityType);
id = “”
query = "SELECT META().id AS key, entityType FROM mybucket WHERE dealer = $dealer AND META().id > $id ORDER BY META().id LIMIT 10000;"
Loop:
    Execute the N1QL query.
    If the result is empty, break the loop.
    Otherwise, run the delete asynchronously on all returned documents,
    then set $id to the last key in the results and let the loop continue.
A Java sketch of this loop follows.
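A minimal runnable sketch of the whole loop, using the Couchbase Java SDK 2.x and RxJava 1; the class name, cluster address, bucket name, and dealer value are placeholders:

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.json.JsonObject;
import com.couchbase.client.java.query.N1qlParams;
import com.couchbase.client.java.query.N1qlQuery;
import com.couchbase.client.java.query.N1qlQueryRow;
import com.couchbase.client.java.query.consistency.ScanConsistency;
import rx.Observable;

import java.util.List;

public class BatchDelete {
    public static void main(String[] args) {
        Bucket bucket = CouchbaseCluster.create("localhost").openBucket("mybucket");
        String select = "SELECT META().id AS key, entityType FROM mybucket "
                + "WHERE dealer = $dealer AND META().id > $id "
                + "ORDER BY META().id LIMIT 10000";
        N1qlParams requestPlus = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS);

        String lastId = "";  // keyset cursor; "" sorts before every document key
        while (true) {
            List<N1qlQueryRow> rows = bucket.query(N1qlQuery.parameterized(select,
                    JsonObject.create().put("dealer", "d1").put("id", lastId),
                    requestPlus)).allRows();
            if (rows.isEmpty()) {
                break;  // nothing left to delete
            }
            // Delete the batch asynchronously and wait for all removals to finish.
            Observable.from(rows)
                    .map(row -> row.value().getString("key"))
                    .flatMap(key -> bucket.async().remove(key))
                    .toList()
                    .toBlocking()
                    .single();
            // Advance the cursor past the last key returned.
            lastId = rows.get(rows.size() - 1).value().getString("key");
        }
        bucket.close();
    }
}
```

Because the cursor only ever moves forward past keys that have already been processed, stale index entries behind the cursor cannot cause the same batch to be returned twice, which avoids the shifting-offset problem described above.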