Optimize count on IS NOT MISSING subdocuments

Hello,

I currently experience some bad query performance, with a large dataset of documents that have publication keys like draft, staging etc, which are subdocuments.

Our goal is to count the documents with a publication state missing, like:
SELECT count(*) FROM data WHERE draft IS NOT MISSING AND domain='www'

We tried multiple things like creating an index over data(draft,domain) but it takes a huge part of our disk size.
We also tried an index like data(IFMISSING(draft,null),domain) but the query is to long as well.

Thanks for your help.

How about create index by

CREATE INDEX `idx_draft_domain` ON `data`(IFMISSINGORNULL(draft,1), domain) USING GSI;

and query by

SELECT count(1) FROM `data` WHERE IFMISSINGORNULL(draft,1) !=1 AND domain == 'www'

Try with one of the following indexes.

CREATE INDEX ix1 ON `data`(draft) WHERE domain = "www";
CREATE INDEX ix2 ON `data`(domain, draft);

Just run it, but what will be indexed? The entire draft subdocument?

Actually, these are really large document and we don’t really want to index them, but only the count of them.

Creating a simple index on the draft key kills the server because of the size of the document…

CREATE INDEX ix1 ON default(domain) WHERE draft IS NOT MISSING;
SELECT COUNT(1) FROM default WHERE draft IS NOT MISSING AND domain = "www";