Given a set of documents in couchbase where each document has an integer value which is initially assigned sequentially, but is then later released out of order, find the lowest numbered “hole” in the sequence
For example, for documents with the following values:
1
3
4
6 (edited)
the query should return 0 (edited)
with these values:
0
1
3
4
6
the query should return 2
with these values:
0
1
2
the query should return 3
SELECT FIRST pos FOR pos:v IN av WHEN pos != av[pos] END AS hole
LET av = ARRAY_APPEND((SELECT RAW d.id FROM default AS d WHERE d.id >= 0 ORDER BY d.id ASC ), -1);
SELECT FIRST pos FOR pos:v IN av WHEN pos != av[pos] END AS hole LET av = ARRAY_APPEND([1,3,4,6], -1);
SELECT FIRST pos FOR pos:v IN av WHEN pos != av[pos] END AS hole LET av = ARRAY_APPEND([0,1,3,4,6], -1);
SELECT FIRST pos FOR pos:v IN av WHEN pos != av[pos] END AS hole LET av = ARRAY_APPEND([0,1,2], -1);
One requirement, I forget to mention is The query must support up to 25 million documents
The query supports but 25million means it needs CPU/Memory because it needs to sort and find the hole.
In this case you need to follow different approach.
Using your application and Using Index order
CREATE INDEX ix1 ON default(id);
startpos = 0
endpos = 1000
pos=startpos
WHILE
do
id = SELECT FIRST pos FOR pos:v IN av WHEN pos != av[pos] END AS hole
LET av = ARRAY_APPEND((SELECT RAW d.id
FROM default AS d USE INDEX (ix1)
WHERE d.id BETWEEN $startpos AND $endpos
ORDER BY d.id ASC), -1);
if id <= $endpos
pos = startpos+id
break
else
startpos = endpos+1
endpos = endpos+1000
fi
done
pos will have actual value
I am thinking in the same direction but with a small change. If we can get oracle ROWNUM feature,then the query would be simpler
select min(id) from bucket where ROWNUM != id
Is there a way to get ROWNUM working in couchbase n1ql?
It will be there in the next release https://blog.couchbase.com/json-to-insights-fast-and-easy/
CREATE INDEX ix1 ON default(id);
SELECT RAW d.rn-1 AS pos
FROM (select ROW_NUMBER() OVER() AS rn, t.id as id
from default AS t USE INDEX (ix1)
WHERE t.id >= 0
) AS d
WHERE (d.rn-1) != d.id
LIMIT 1;
2 Likes
I like it and query is getting simpler and easier to understand