Large volume of sync records and query slowness

We have a bucket in our Production Couchbase Server with 23K documents.
However, the total document count, including sync documents, is around 2.4 million.

First, we were using this query to fetch all 23K documents.
1)

SELECT
    A.externalPersonId AS StudentId,
    A.balance AS AccountBalance
FROM
    POS_Account A
WHERE
    A.type = 'Account'
    AND A.regionId = 23
    AND A.isStudent = true
    AND META(A).id NOT LIKE '_sync%'

After some time we started seeing slowness, so we modified the query as shown below.
2)

SELECT 
	A.externalPersonId AS StudentId, 
	A.balance AS AccountBalance 
FROM 
	POS_Account A 
WHERE 
	A.type = 'Account' 
	AND A.regionId = {0} 
	AND A.isStudent = true 
	AND META(A).id = CONCAT('Account_', A.accountId)

After some time we again started seeing slowness, this time with query #2.
It is taking around 3 minutes 48 seconds.
We added all the required indexes that came up when we used Index Recommendation (see the sketch after this paragraph), but with no success.
We then tried query #1 again and found that it is now running much faster (5-10 seconds) compared to query #2.
We are not sure what causes these two queries to slow down after a period of time.
We have to keep modifying this query to keep it working in our Production environment.
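
For reference, index recommendations like the ones mentioned above can be reproduced by prefixing the query with ADVISE; this is only a sketch, assuming the ADVISE statement available in recent Couchbase Server Enterprise Edition releases (the Query Workbench's Index Advisor gives equivalent output), and using a literal regionId for illustration:

ADVISE
SELECT
    A.externalPersonId AS StudentId,
    A.balance AS AccountBalance
FROM
    POS_Account A
WHERE
    A.type = 'Account'
    AND A.regionId = 23
    AND A.isStudent = true
    AND META(A).id = CONCAT('Account_', A.accountId)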

Can we please get some insight from Couchbase on how to resolve this issue permanently?

Also, we tried to reproduce this issue in our QA environment by adding 23K records, but there both queries complete within 5-10 seconds.

The only difference between our Production and QA environments is that QA does not have nearly as many sync documents as Production.

We are seeing this slowness due to the presence of a large volume of sync documents.
So far we have 132 sync documents for each document, and we are afraid this will keep increasing as we add more documents to this bucket.

Please let us know if there is a way to purge the old sync documents.

Please feel free to ask if any additional information is needed.

“Explain” is useful for understanding what is going on.
It’s difficult to make suggestions without seeing what the “Explain” is, and what indexes you have.
Can you please show them?
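
For example, something along these lines (a sketch, using the bucket and predicates from your post) would show the plan and the existing indexes:

EXPLAIN
SELECT
    A.externalPersonId AS StudentId,
    A.balance AS AccountBalance
FROM
    POS_Account A
WHERE
    A.type = 'Account'
    AND A.regionId = 23
    AND A.isStudent = true;

SELECT * FROM system:indexes WHERE keyspace_id = 'POS_Account';
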
Do you have an index on ‘type’? It sounds like you might not:

The only difference between our Production and QA environments is that QA does not have nearly as many sync documents as Production.

Doesn’t A.type = 'Account' already filter the same documents as:
AND META(A).id NOT LIKE '_sync%'

Doesn’t A.type = 'Account' already filter the same documents as:
AND META(A).id = CONCAT('Account_', A.accountId)
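
One quick way to check (a sketch): if the type filter already excludes the sync documents, this count should come back as zero.

SELECT COUNT(*) AS syncDocsTypedAsAccount
FROM POS_Account A
WHERE A.type = 'Account'
    AND META(A).id LIKE '_sync%';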

So far we have 132 sync documents for each document, and we are afraid this will keep increasing as we add more documents to this bucket.

It might make sense to put these in a different bucket.

mreiche,

This is the index we have applied to the bucket.

CREATE INDEX adv_Account_concat_accountId_regionId_meta_self_id_type_accountI659650442
ON POS_Account(('Account_' || accountId), regionId, (meta(self).id), accountId, isStudent, siteId, balance, personId, isActive)
WHERE (type = 'Account')

Query Explain:

{
    "#operator": "Sequence",
    "~children": [
        {
            "#operator": "IndexScan3",
            "as": "A",
            "index": "adv_Account_concat_accountId_regionId_meta_self_id_type_accountI659650442",
            "index_id": "553bbdee64cb9f43",
            "index_projection": {
                "primary_key": true
            },
            "keyspace": "POS_Account",
            "namespace": "default",
            "spans": [
                {
                    "range": [
                        {
                            "inclusion": 0,
                            "low": "null"
                        },
                        {
                            "high": "1496",
                            "inclusion": 3,
                            "low": "1496"
                        },
                        {
                            "inclusion": 0,
                            "low": "null"
                        },
                        {
                            "inclusion": 0,
                            "low": "null"
                        },
                        {
                            "high": "true",
                            "inclusion": 3,
                            "low": "true"
                        }
                    ]
                }
            ],
            "using": "gsi"
        },
        {
            "#operator": "Fetch",
            "as": "A",
            "keyspace": "POS_Account",
            "namespace": "default"
        },
        {
            "#operator": "Parallel",
            "~child": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "Filter",
                        "condition": "(((((`A`.`type`) = \"Account\") and ((`A`.`regionId`) = 1496)) and ((`A`.`isStudent`) = true)) and ((meta(`A`).`id`) = (\"Account_\" || (`A`.`accountId`))))"
                    },
                    {
                        "#operator": "InitialProject",
                        "result_terms": [
                            {
                                "as": "StudentId",
                                "expr": "(`A`.`externalPersonId`)"
                            },
                            {
                                "as": "AccountBalance",
                                "expr": "(`A`.`balance`)"
                            }
                        ]
                    },
                    {
                        "#operator": "FinalProject"
                    }
                ]
            }
        }
    ]
}

Could you please let me know how we can move the sync documents to a different bucket?

Also, if we resolve all conflicts as mentioned in this article (Resolving Conflicts | Couchbase Docs), will all the sync documents be purged, or will they remain?

Thanks
Jignesh

Add an index on just regionId and isStudent with WHERE type = 'Account', and remove the META(A).id predicate from the query (see the index sketch after the query below):

SELECT 
	A.externalPersonId AS StudentId, 
	A.balance AS AccountBalance 
FROM 
	POS_Account A 
WHERE 
	A.type = 'Account' 
	AND A.regionId = {0}
	AND A.isStudent = true
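
A sketch of such an index (the index name is illustrative):

CREATE INDEX idx_account_regionId_isStudent
ON POS_Account(regionId, isStudent)
WHERE type = 'Account';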

You can also create the recommended 'covering index' for that same query, again defined with WHERE (type = 'Account') rather than having type as one of the index keys (and delete the index you already had); a sketch follows below.
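
For example, a covering variant might look like this (a sketch; the exact definition should come from the advisor's recommendation for the simplified query, and the index name is again illustrative):

CREATE INDEX idx_account_regionId_isStudent_covering
ON POS_Account(regionId, isStudent, externalPersonId, balance)
WHERE type = 'Account';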

So the issue seems to be that the index advisor recommends multiple indexes: (a) a 'covering index', which you created, and (b) a plain index, which you didn't create.

Also, if you create recommended indexes for a query and then change the query, the indexes you created may not be helpful for the new query.

I added these two indexes as shown in the screenshot.
I am using our QA environment for this exercise.

Since the query no longer has the META(A).id predicate, we are getting duplicate records that we don’t need.

Also, could you please help with how we can minimize the generation of _sync:rb documents?

In Production, the bucket has 23K documents and there are 2.6 million sync documents.

Thanks

I asked this question earlier and there was no answer, so I assumed the predicate was redundant. If the predicate is needed, then certainly include it; then re-do the indexes and retry the query.

I assume those come from Sync Gateway. I’m not familiar with Sync Gateway; if you don’t find anything in the documentation, then open a support case.

see What are "_sync:rb" documents?

Maybe Ben can direct you on how to store them in a different collection: What are "_sync:rb" documents? - #6 by bbrks

As old as that previous comment is, there is unfortunately still no support for storing SG metadata in a non-default collection.
We will be able to provide isolation from user documents stored in named collections once Couchbase Mobile’s collection-aware release is out.