Unable to restore Analyics Service after upgrade from 6.6.0 to 6.6.2

We upgraded our Couchbase cluster’s version from 6.6.0 to 6.6.2. All nodes were able to upgrade fine, except for the Analytics node. We added a new node with just the Analytics service running to act as a replica, took the old one offline to do the upgrade, then added it back, and removed the new one. At each step in this process the dataverse was still visible in the Analytics tab. Except after the final rebalance, the service got stuck in a restart loop. Currently, I can only get the Analytics tab to load in the web console if I go to it just as the service is restarting, and it shows the dataverse and its datasets, though the link is disconnected.

I’ve pulled what seem to me like the most relevant logs to show what is happening each time the service restarts.

2021-06-17T17:01:15.028+00:00 INFO CBAS.bootstrap.NCApplication [main] System state: CORRUPTED

Gets the list of checkpoints

2021-06-17T17:01:16.264+00:00 INFO CBAS.replication.NcLifecycleCoordinator [Executor-4:ClusterController] Building registration tasks for node ec099cf5c29301305441d006c4f2d1ad with status IDLE and system state: CORRUPTED

2021-06-17T17:01:16.274+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting startup task: { “class” : “UpdateNodeStatusTask” }
2021-06-17T17:01:16.274+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed startup task: { “class” : “UpdateNodeStatusTask” }
2021-06-17T17:01:16.275+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting startup task: { “class” : “LocalRecoveryTask”, “partitions” : [76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91] }
2021-06-17T17:01:16.275+00:00 INFO CBAS.nc.RecoveryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] starting recovery …
2021-06-17T17:01:16.275+00:00 INFO CBAS.recovery.AbstractCheckpointManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Getting latest checkpoint
2021-06-17T17:01:16.275+00:00 INFO CBAS.recovery.AbstractCheckpointManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Listing of files in the checkpoint dir returned [/opt/couchbase/var/lib/couchbase/data/@analytics/v_iodevice_0/txn-log/checkpoint_80, /opt/couchbase/var/lib/couchbase/data/@analytics/v_iodevice_0/txn-log/checkpoint_81, /opt/couchbase/var/lib/couchbase/data/@analytics/v_iodevice_0/txn-log/checkpoint_82]
2021-06-17T17:01:16.275+00:00 WARN CBAS.recovery.AbstractCheckpointManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reading checkpoint file: /opt/couchbase/var/lib/couchbase/data/@analytics/v_iodevice_0/txn-log/checkpoint_80
2021-06-17T17:01:16.275+00:00 WARN CBAS.recovery.AbstractCheckpointManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reading checkpoint file: /opt/couchbase/var/lib/couchbase/data/@analytics/v_iodevice_0/txn-log/checkpoint_81
2021-06-17T17:01:16.275+00:00 WARN CBAS.recovery.AbstractCheckpointManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reading checkpoint file: /opt/couchbase/var/lib/couchbase/data/@analytics/v_iodevice_0/txn-log/checkpoint_82
2021-06-17T17:01:16.278+00:00 INFO CBAS.nc.RecoveryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Logs analysis phase completed.
2021-06-17T17:01:16.278+00:00 INFO CBAS.nc.RecoveryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Analysis log count update/entityCommit/jobCommit/abort = 0/0/0/0
2021-06-17T17:01:16.308+00:00 INFO CBAS.nc.RecoveryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Logs REDO phase completed. Redo logs count: 0
2021-06-17T17:01:16.700+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed startup task: { “class” : “LocalRecoveryTask”, “partitions” : [76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91] }
2021-06-17T17:01:16.700+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting startup task: { “class” : “StartReplicationServiceTask” }
2021-06-17T17:01:16.700+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed startup task: { “class” : “StartReplicationServiceTask” }
2021-06-17T17:01:16.700+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting startup task: { “class” : “MetadataBootstrapTask” }
2021-06-17T17:01:16.701+00:00 INFO CBAS.management.ReplicationChannel [ec099cf5c29301305441d006c4f2d1ad Replication Channel Thread] opened Replication Channel @ IP Address: 0.0.0.0:9120
2021-06-17T17:01:16.797+00:00 INFO CBAS.nc.RecoveryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Logs analysis phase completed.
2021-06-17T17:01:16.797+00:00 INFO CBAS.nc.RecoveryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Analysis log count update/entityCommit/jobCommit/abort = 0/0/0/0
2021-06-17T17:01:16.820+00:00 INFO CBAS.nc.RecoveryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Logs REDO phase completed. Redo logs count: 0
2021-06-17T17:01:16.822+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset -2147483547 and partition 76 is empty… skipping flush
… repeats of the last one, going through all the datasets and partitions

It gets the list od checkpoint files again and reads them, then this

2021-06-17T17:01:16.858+00:00 INFO CBAS.nc.NCAppRuntimeContext [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Bootstrapping metadata
2021-06-17T17:01:16.867+00:00 INFO CBAS.metadata.CachingTxnIdFactory [Executor-10:ec099cf5c29301305441d006c4f2d1ad] block exhausted; obtaining new block from supplier
2021-06-17T17:01:16.869+00:00 INFO CBAS.messaging.CCMessageBroker [Executor-4:ClusterController] Received message: TxnIdBlockRequest
2021-06-17T17:01:16.870+00:00 INFO CBAS.messaging.NCMessageBroker [Worker:ec099cf5c29301305441d006c4f2d1ad] Received message: TxnIdBlockResponse
2021-06-17T17:01:16.873+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(1)
2021-06-17T17:01:16.876+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(2)
2021-06-17T17:01:16.877+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(3)
2021-06-17T17:01:16.878+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(4)
2021-06-17T17:01:16.879+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(5)
2021-06-17T17:01:16.880+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(6)
2021-06-17T17:01:16.881+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(7)
2021-06-17T17:01:16.881+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(8)
2021-06-17T17:01:16.882+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(10)
2021-06-17T17:01:16.883+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(12)
2021-06-17T17:01:16.884+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(9)
2021-06-17T17:01:16.884+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(13)
2021-06-17T17:01:16.885+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(14)
2021-06-17T17:01:16.885+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(11)
2021-06-17T17:01:16.886+00:00 INFO CBAS.bootstrap.MetadataBootstrap [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Finished enlistment of metadata B-trees in old universe
2021-06-17T17:01:16.923+00:00 INFO CBAS.bootstrap.MetadataBootstrap [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting DDL recovery …
2021-06-17T17:01:16.967+00:00 INFO CBAS.bootstrap.MetadataBootstrap [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed DDL recovery.
2021-06-17T17:01:16.968+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(53)
2021-06-17T17:01:16.969+00:00 INFO CBAS.context.DatasetMemoryManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Reserved(1048576) for dataset(52)
2021-06-17T17:01:16.982+00:00 INFO CBAS.nc.NCAppRuntimeContext [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Metadata node bound
2021-06-17T17:01:16.982+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed startup task: { “class” : “MetadataBootstrapTask” }
2021-06-17T17:01:16.982+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting startup task: { “class” : “ExternalLibrarySetupTask”, “metadata-node” : true }
2021-06-17T17:01:16.982+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed startup task: { “class” : “ExternalLibrarySetupTask”, “metadata-node” : true }
2021-06-17T17:01:16.982+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting startup task: { “class” : “CheckpointTask” }
2021-06-17T17:01:16.982+00:00 INFO CBAS.recovery.CheckpointManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting sharp checkpoint…
2021-06-17T17:01:16.984+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 1 and partition 0 is empty… skipping flush
2021-06-17T17:01:16.986+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 2 and partition 0 is empty… skipping flush
2021-06-17T17:01:16.989+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 3 and partition 0 is empty… skipping flush
2021-06-17T17:01:16.991+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 4 and partition 0 is empty… skipping flush
2021-06-17T17:01:16.994+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 5 and partition 0 is empty… skipping flush
2021-06-17T17:01:16.996+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 6 and partition 0 is empty… skipping flush
2021-06-17T17:01:17.000+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 7 and partition 0 is empty… skipping flush
2021-06-17T17:01:17.003+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 8 and partition 0 is empty… skipping flush
2021-06-17T17:01:17.006+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 9 and partition 0 is empty… skipping flush
2021-06-17T17:01:17.009+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 10 and partition 0 is empty… skipping flush
2021-06-17T17:01:17.011+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 11 and partition 0 is empty… skipping flush
2021-06-17T17:01:17.014+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 12 and partition 0 is empty… skipping flush
2021-06-17T17:01:17.016+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 13 and partition 0 is empty… skipping flush
2021-06-17T17:01:17.018+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset 14 and partition 0 is empty… skipping flush
2021-06-17T17:01:17.021+00:00 INFO CBAS.context.PrimaryIndexOperationTracker [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Primary index on dataset -2147483547 and partition 76 is empty… skipping flush
… lots of repeating the last line again

2021-06-17T17:01:17.061+00:00 INFO CBAS.recovery.CheckpointManager [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed sharp checkpoint.
2021-06-17T17:01:17.061+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed startup task: { “class” : “CheckpointTask” }
2021-06-17T17:01:17.061+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting startup task: { “class” : “StartLifecycleComponentsTask” }
2021-06-17T17:01:17.062+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed startup task: { “class” : “StartLifecycleComponentsTask” }
2021-06-17T17:01:17.062+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting startup task: { “class” : “ExportMetadataNodeTask”, “export-stub” : true }
2021-06-17T17:01:17.072+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed startup task: { “class” : “ExportMetadataNodeTask”, “export-stub” : true }
2021-06-17T17:01:17.072+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting startup task: { “class” : “BindMetadataNodeTask” }
2021-06-17T17:01:17.077+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed startup task: { “class” : “BindMetadataNodeTask” }
2021-06-17T17:01:17.077+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting startup task: { “class” : “UpdateNodeStatusTask” }
2021-06-17T17:01:17.077+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed startup task: { “class” : “UpdateNodeStatusTask” }
2021-06-17T17:01:17.077+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Starting startup task: { “class” : “CheckRebalanceRequiredTask” }
2021-06-17T17:01:17.100+00:00 INFO CBAS.message.RegistrationTasksResponseMessage [Executor-10:ec099cf5c29301305441d006c4f2d1ad] Completed startup task: { “class” : “CheckRebalanceRequiredTask” }
2021-06-17T17:01:17.128+00:00 INFO CBAS.utils.ClusterStateManager [Executor-4:ClusterController] Metadata node ec099cf5c29301305441d006c4f2d1ad is now active
2021-06-17T17:01:17.129+00:00 INFO CBAS.utils.ClusterStateManager [Executor-4:ClusterController] updating cluster state from UNUSABLE to PENDING
2021-06-17T17:01:17.129+00:00 INFO CBAS.utils.ClusterStateManager [Executor-4:ClusterController] Cluster State is now PENDING
2021-06-17T17:01:17.129+00:00 INFO CBAS.utils.ClusterStateManager [Executor-4:ClusterController] updating cluster state from PENDING to RECOVERING
2021-06-17T17:01:17.129+00:00 INFO CBAS.utils.ClusterStateManager [Executor-4:ClusterController] Cluster State is now RECOVERING
2021-06-17T17:01:17.130+00:00 INFO CBAS.bootstrap.GlobalRecoveryManager [Executor-5:ClusterController] Starting Global Recovery
2021-06-17T17:01:17.138+00:00 INFO CBAS.rebalance.ReplicationUtil [Executor-5:ClusterController] Current replicas
2021-06-17T17:01:17.138+00:00 WARN CBAS.rebalance.ReplicationUtil [Executor-5:ClusterController] Couldn’t find additional candidate nodes for metadata replicas
2021-06-17T17:01:17.139+00:00 INFO CBAS.rebalance.ReplicationUtil [Executor-5:ClusterController] Refreshing state of replicas
2021-06-17T17:01:17.139+00:00 INFO CBAS.rebalance.ReplicationUtil [Executor-5:ClusterController] Updating metadata replicas to:
2021-06-17T17:01:17.203+00:00 INFO CBAS.util.EventLoopProvider [Executor-5:ClusterController] Initializing Couchbase Environment
2021-06-17T17:01:17.253+00:00 INFO CBAS.bootstrap.GlobalRecoveryManager [Executor-5:ClusterController] Global Recovery Completed. Refreshing cluster state…
2021-06-17T17:01:17.253+00:00 INFO CBAS.active.RecoveryTask [RecoveryTask (dataverse-name.Local.bucket-name(CouchbaseMetadataExtension))] Actual Recovery task has started
2021-06-17T17:01:17.253+00:00 INFO CBAS.utils.ClusterStateManager [Executor-5:ClusterController] updating cluster state from RECOVERING to ACTIVE
2021-06-17T17:01:17.253+00:00 INFO CBAS.utils.ClusterStateManager [Executor-5:ClusterController] Cluster State is now ACTIVE
2021-06-17T17:01:17.253+00:00 INFO CBAS.active.ActiveNotificationHandler [Executor-7:ClusterController] Starting active recovery

This line repeats a few times

2021-06-17T17:01:17.708+00:00 WARN CBAS.config.ConfigManager [RecoveryTask (dataverse-name.Local.bucket-name(CouchbaseMetadataExtension))] NC option [nc] analytics.broadcast.dcp.state.mutation.count being accessed outside of NC-scoped configuration.

It requests shadow states, gets a response, then initializes couchbase environment

This line repeats a few times

2021-06-17T17:01:19.339+00:00 INFO CBAS.messaging.CCMessageBroker [Executor-7:ClusterController] Received message: (dataverse-name.Local.bucket-name(CouchbaseMetadataExtension))[2]:BucketOperatorDescriptor:ActivePartitionMessage-RUNTIME_REGISTERED

Successfully recovered, according to the logs

2021-06-17T17:01:19.356+00:00 INFO CBAS.active.RecoveryTask [RecoveryTask (dataverse-name.Local.bucket-name(CouchbaseMetadataExtension))] Recovery completed successfully

This line repeats a few times

2021-06-17T17:01:19.517+00:00 INFO CBAS.dataflow.FeedRecordDataFlowController [Executor-14:ec099cf5c29301305441d006c4f2d1ad:JID:6.1:TAID:TID:ANID:ODID:0:0:0:0:SuperActivityOperatorNodePushable:BucketOperatorNodePushable:(dataverse-name.Local.bucket-name(CouchbaseMetadataExtension))[0]:BucketOperatorDescriptor] State is being set from CREATED to STARTED

And right after this is where it fails

2021-06-17T17:01:19.843+00:00 INFO CBAS.context.DatasetMemoryManager [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:6.1:TAID:TID:ANID:ODID:2:0:7:0:0] Allocated(2702442496) for dataset(-2147483547)
2021-06-17T17:01:19.865+00:00 INFO CBAS.context.DatasetMemoryManager [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:6.1:TAID:TID:ANID:ODID:5:0:4:0:0] Allocated(2702442496) for dataset(-2147483546)
2021-06-17T17:01:19.899+00:00 INFO CBAS.context.DatasetMemoryManager [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:6.1:TAID:TID:ANID:ODID:8:0:7:0:0] Allocated(2702442496) for dataset(-2147483545)
2021-06-17T17:01:19.911+00:00 INFO CBAS.context.DatasetMemoryManager [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:6.1:TAID:TID:ANID:ODID:11:0:7:0:0] Allocated(2702442496) for dataset(-2147483544)
2021-06-17T17:01:20.083+00:00 INFO CBAS.context.DatasetMemoryManager [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:6.1:TAID:TID:ANID:ODID:14:0:7:0:0] Allocated(2702442496) for dataset(-2147483543)
2021-06-17T17:01:20.194+00:00 INFO CBAS.context.DatasetMemoryManager [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:6.1:TAID:TID:ANID:ODID:17:0:9:0:0] Allocated(2702442496) for dataset(-2147483542)
2021-06-17T17:01:20.227+00:00 INFO CBAS.context.DatasetMemoryManager [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:6.1:TAID:TID:ANID:ODID:26:0:7:0:0] Allocated(2702442496) for dataset(-2147483537)
2021-06-17T17:01:20.228+00:00 INFO CBAS.context.DatasetMemoryManager [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:6.1:TAID:TID:ANID:ODID:29:0:5:0:0] Allocated(2702442496) for dataset(-2147483539)
2021-06-17T17:01:20.232+00:00 WARN CBAS.context.DatasetLifecycleManager [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:6.1:TAID:TID:ANID:ODID:23:0:13:0:0] failed to allocate memory for dataset -2147483540. Currently allocated {“availableBudget”:2701918208,“allocated”:[{“datasetId”:-2147483547,“budget”:2702442496},{“datasetId”:-2147483546,“budget”:2702442496},{“datasetId”:-2147483545,“budget”:2702442496},{“datasetId”:-2147483544,“budget”:2702442496},{“datasetId”:-2147483543,“budget”:2702442496},{“datasetId”:-2147483542,“budget”:2702442496},{“datasetId”:-2147483539,“budget”:2702442496},{“datasetId”:-2147483537,“budget”:2702442496}],“reserved”:[{“datasetId”:1,“budget”:1048576},{“datasetId”:2,“budget”:1048576},{“datasetId”:3,“budget”:1048576},{“datasetId”:4,“budget”:1048576},{“datasetId”:5,“budget”:1048576},{“datasetId”:6,“budget”:1048576},{“datasetId”:7,“budget”:1048576},{“datasetId”:8,“budget”:1048576},{“datasetId”:9,“budget”:1048576},{“datasetId”:10,“budget”:1048576},{“datasetId”:11,“budget”:1048576},{“datasetId”:12,“budget”:1048576},{“datasetId”:13,“budget”:1048576},{“datasetId”:14,“budget”:1048576},{“datasetId”:52,“budget”:1048576},{“datasetId”:53,“budget”:1048576}]}
2021-06-17T17:01:20.243+00:00 WARN CBAS.operators.FeedMetaStoreNodePushable [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:6.1:TAID:TID:ANID:ODID:23:0:13:0:0] Failure Processing a frame at store side
org.apache.hyracks.api.exceptions.HyracksDataException: Cannot allocate dataset -2147483540 memory since memory budget would be exceeded.

This one repeats a lot and then the service finally stops

2021-06-17T17:01:20.377+00:00 WARN CBAS.operators.FeedMetaStoreNodePushable [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:6.1:TAID:TID:ANID:ODID:20:0:11:0:0] Failure Processing a frame at store side
org.apache.hyracks.api.exceptions.HyracksDataException: Cannot allocate dataset -2147483541 memory since memory budget would be exceeded.
at org.apache.asterix.common.context.DatasetLifecycleManager.allocateMemory(DatasetLifecycleManager.java:579) ~[asterix-common.jar:6.6.2-9588]

So it seems that it’s managing to reserve the memory it needs for the datasets in v_iodevice_0/storage/partition_0/Metadata/, but then when it gets to our actual datasets, it tries to allocate massive amounts of memory and dies. What is the cause of this, and can anything be done to fix it?