Deletion is extremely slow

I have this query to delete 400,000 of a total of 900,000 docs:

DELETE
FROM data
WHERE type='ActivityLog'
    AND SUBSTR(revisioninfo.created,0,10) < SUBSTR(DATE_ADD_STR(NOW_STR(), -1, 'year'),0,10)

It is extremely slow. It only deletes around 500-1,000 docs before it times out after 10 minutes!

I am running this on CE 7.2.4. I used the exact same query on the server before it was upgraded to the latest version (it was on 7.1.x), and it was slow, but nothing like this. I think I had to run the query 7-10 times before everything was gone.

I have tried to build an index using the external Query Advisor, but it makes no real difference. I have also tried to remove almost all indexes (apart from 3) to see if it was the index updates that killed performance, but that does not seem to make a big difference either. It gets a little further during the 10 minutes (4,000-5,000 docs), but there is still a long way to go…

Any ideas how I could improve the deletion time?

That seems really slow - maybe someone from the server team has some input(?)

The general way to operate on many documents as fast as possible is to use an SDK to execute a query that streams all the document IDs into a reactive/asynchronous subscriber/loop that does a KV operation on each of the IDs.
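
For illustration, here is a minimal sketch of that pattern using the Couchbase Java SDK 3.x reactive API. The connection string, credentials, the bucket name, the use of the default collection and the concurrency limit of 16 are all assumptions to adapt to your environment:

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.ReactiveCollection;

public class StreamingDelete {
    public static void main(String[] args) {
        // Assumed connection details - replace with your own.
        Cluster cluster = Cluster.connect("couchbase://192.168.42.211", "user", "password");
        Bucket bucket = cluster.bucket("data");
        ReactiveCollection reactive = bucket.defaultCollection().reactive();

        cluster.reactive()
            .query("SELECT RAW META().id FROM `data` "
                 + "WHERE type = 'ActivityLog' "
                 + "AND revisioninfo.created < DATE_ADD_STR(NOW_STR('1111-11-11'), -1, 'year')")
            // Stream the ids as they arrive instead of collecting them all first.
            .flatMapMany(result -> result.rowsAs(String.class))
            // Bounded concurrency so the KV removes don't overwhelm the node.
            .flatMap(reactive::remove, 16)
            .blockLast();

        cluster.disconnect();
    }
}

The IDs are processed as they stream in, and each delete is a cheap KV operation instead of being part of one long-running DELETE statement in the query service.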

I’m curious whether finding the documents is the issue (vs. the actual deletes).

SELECT COUNT(*) FROM data WHERE ...

Is that query slow? (I did note you had already investigated the Index Advisor, but maybe more investigation is required.)

CREATE INDEX ix1 ON data(revisioninfo.created) WHERE type='ActivityLog';

DELETE
FROM data
WHERE type='ActivityLog' 
     AND revisioninfo.created < DATE_ADD_STR(NOW_STR("1111-11-11"),-1,"year");

Ok, that index makes sense. The Query Advisor gave me:

CREATE INDEX `adv_type_created` ON `data`(`type`,substr0((`revisioninfo`.`created`), 0, 10))

@mreiche, if I do a count of the docs:

SELECT COUNT(0)
FROM data
WHERE type='ActivityLog'
    AND SUBSTR(revisioninfo.created,0,10) < SUBSTR(DATE_ADD_STR(NOW_STR(), -1, 'year'),0,10)

that took 2 seconds.

I created that index and deletion is still very slow: around a couple of hundred docs before it times out. These are test servers (with one Sync Gateway connected). They only have 8 GB of RAM, but it still seems slower than it should be…

I see that it starts rebuilding the index as soon as the deletion starts. Could that be deferred until the DELETE statement has completed - or does that not matter?

You could delete the indexes to see if it speeds things up. I don’t know if there is a way to temporarily suspend indexing.

Another approach you might take is to just set the expiration of the documents you want deleted, and let the server delete them at its leisure.
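
As a rough sketch of that expiration route (assuming the server accepts setting META().expiration from an UPDATE statement, and using the same assumed connection details as above):

import com.couchbase.client.java.Cluster;

public class ExpireInsteadOfDelete {
    public static void main(String[] args) {
        // Assumed connection details - replace with your own.
        Cluster cluster = Cluster.connect("couchbase://192.168.42.211", "user", "password");

        // Stamp a TTL on the matching docs; the expiry pager then purges them
        // in the background instead of one long-running DELETE.
        cluster.query(
            "UPDATE `data` AS d "
          + "SET META(d).expiration = 3600 "   // roughly an hour from now
          + "WHERE d.type = 'ActivityLog' "
          + "AND d.revisioninfo.created < DATE_ADD_STR(NOW_STR('1111-11-11'), -1, 'year')");

        cluster.disconnect();
    }
}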

Or you could delete them in batches of 100 or so using keyset paging.
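
A sketch of the batched keyset-paging idea, again with the Java SDK; the batch size of 100, the META().id cursor and the connection details are assumptions:

import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.query.QueryOptions;
import java.util.List;

public class KeysetBatchDelete {
    public static void main(String[] args) {
        // Assumed connection details - replace with your own.
        Cluster cluster = Cluster.connect("couchbase://192.168.42.211", "user", "password");
        Collection collection = cluster.bucket("data").defaultCollection();

        String lastKey = "";  // keyset cursor: META().id of the last doc handled so far
        while (true) {
            // Fetch the next batch of ids after the cursor, in META().id order.
            List<String> ids = cluster.query(
                "SELECT RAW META().id FROM `data` "
              + "WHERE type = 'ActivityLog' "
              + "AND revisioninfo.created < DATE_ADD_STR(NOW_STR('1111-11-11'), -1, 'year') "
              + "AND META().id > $lastKey "
              + "ORDER BY META().id LIMIT 100",
                QueryOptions.queryOptions()
                    .parameters(JsonObject.create().put("lastKey", lastKey)))
                .rowsAs(String.class);

            if (ids.isEmpty()) break;

            // Remove each doc in the batch with a plain KV operation.
            for (String id : ids) {
                collection.remove(id);
            }

            lastKey = ids.get(ids.size() - 1);  // advance the cursor past this batch
        }
        cluster.disconnect();
    }
}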

Have you tried my first suggestion?

“The general way to operate on many documents as fast as possible is to use an SDK to execute a query that streams all the documentIds into a reactive/asynchronous subscriber/loop that does a kv operation on each of the ids.”

No, I haven’t tried using an SDK yet.

I have tried to stop the Sync Gateway, but that does not seem to make a big difference.

I also tried to remove almost all indexes, but that didn’t speed things up either.

I will try to have a look at setting the expiration to see if an update is faster.

I am not sure why it is taking so long.
One option would be to use a covered query and then use the SDK to remove (or change the TTL of) the documents, and see if that speeds things up:

SELECT RAW META().id
FROM ....
WHERE .....

I haven’t had a chance to use the SDK yet (it’s not so urgent, as this is my test environment). However, when trying to run a simpler query:

DELETE
FROM data
WHERE type='ActivityLog'

I got this error:

[
  {
    "code": 5000,
    "msg": " read tcp 192.168.42.211:41230->192.168.42.212:9101: i/o timeout - cause:  read tcp 192.168.42.211:41230->192.168.42.212:9101: i/o timeout"
  }
]

I guess that indicates that there is a problem with the communication between the two DB servers.

The other thing I have thought of: I believe these problems started after upgrading to 7.2.4 (from 7.1.x… IIRC). And it seems general - for example, when trying to create a number of docs via my code (an import function), it times out before completing, and the exact same code worked OK earlier…

How can I troubleshoot this to get an idea of what I need to do? I have tried to vary the memory allocated to the Data and Index services in Settings, without any real success…

Looks like a problem with the indexer node.

https://docs.couchbase.com/server/current/rest-api/rest-list-node-services.html#example
→ port 9101 is typically the indexScan port.

I would suggest you start with the indexer.log file (typically in /opt/couchbase/var/lib/couchbase/logs) on .212 to see if it holds any clues. (Obviously any system logs on .212 could also help - is the node nearly out of memory and paging, etc.?)

HTH.

Ok, so I just jumped into the helicopter and looked at things from above…

I found out that the two DB servers’ VMs (on ESXi) had warnings that the disks needed to be “consolidated” - and after that everything seems to be a little more normal. I also gave the VMs a little more RAM (now 10 GB).

I’ll follow up on this tomorrow (way past midnight here now)…

Just a quick follow-up. I now delete 12,000-15,000 docs before the timeout (instead of 150-200). And I suppose that is OK for these relatively small test servers.

/John