Interesting find! I tried to render the two files (heap and goroutine) myself using gperftools (gperftools/gperftools on GitHub), but I failed. Could you please share the command I need to run? Also, will I be able to find the user ID of the user who pushed all those revisions?
Back to the issue at hand. Since almost no extra RAM was allocated and the BUSY issue still appeared, I think this is linked to: Replicator in Android get stuck at busy state when it goes back to online from offline - #23 by benjamin_glatzeder
Back when I had 3 SG nodes (currently I have 1), the BUSY issue appeared, too. There was a big BUT though! I observed this issue with several clients connected to the cluster (through a load balancer, you are correct!). Some clients (group A) were able to update documents and push them up. Other clients (group B) could update documents locally, but the changes were not pushed up; the sync status on those devices was stuck at BUSY. Yet these group B clients still received document updates from group A and displayed them. By that I mean that, most likely, one SG node had the issue you describe in your last post while the other SG nodes did not. This makes it harder to figure out which SG service to restart, and so far only a restart solves the issue.
At the moment I'm coding a makeshift app which periodically updates a specific document, waits a few seconds, and checks the replicator status. If it is still in the BUSY state, the app restarts the SG service by ssh'ing into the SG node.
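Roughly, the watchdog logic looks like the Python sketch below. It is only an illustration under assumptions: the three callbacks (`touch_document`, `get_status`, `restart_sync_gateway`) are hypothetical stand-ins for whatever the client app exposes, the status is assumed to be available as a plain string such as `"BUSY"`, and the `sync_gateway` service name and the use of `systemctl` depend on how SG is installed on the node.

```python
import subprocess
import time
from typing import Callable


def restart_via_ssh(host: str, service: str = "sync_gateway") -> None:
    """Restart the SG service on a remote node over ssh.

    Assumption: the service is managed by systemd under the given name
    and the ssh user is allowed to run sudo without a password prompt.
    """
    subprocess.run(
        ["ssh", host, "sudo", "systemctl", "restart", service],
        check=True,  # raise if ssh or the restart command fails
    )


def watchdog_cycle(
    touch_document: Callable[[], None],        # hypothetical: updates the probe document
    get_status: Callable[[], str],             # hypothetical: returns e.g. "IDLE" or "BUSY"
    restart_sync_gateway: Callable[[], None],  # e.g. lambda: restart_via_ssh("sg-node-1")
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run one probe: update the document, wait, restart SG if still BUSY.

    Returns True if a restart was triggered, False otherwise.
    """
    touch_document()
    sleep(wait_seconds)  # give the replicator time to push the change
    if get_status() == "BUSY":
        restart_sync_gateway()
        return True
    return False
```

Calling `watchdog_cycle` from a loop with a pause between probes gives the periodic check. With several SG nodes behind a load balancer this would still not tell you which node is stuck, so it fits the single-node setup best.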
Tagging @humpback_whale and @fatmonkey45: Could it be that you are facing a similar issue, where many, many revisions got stuck and this results in the BUSY state? Please see the previous post.