We have a pretty simple query that is taking as long as 8-10 minutes and then timing out. On checking the plan, we see that authorize phase it taking 3m+. Resident Ratio is 100%. Index is MOI. There are 400k docs in the bucket. The query returns about 2k docs.
SELECT P.X , Q.Y
FROM bucket1 P
INNER JOIN bucket1 Q
ON P.A = Q.A
WHERE P.__t=“BB” and Q.__t=“CC”;
It times out with the message 12008 Error performing bulk get operation.
There are only 2.5k docs for __t=“BB”.
And only 1k docs for __t=“CC”.
Looking at the name, authorize looks like a phase related to RBAC. The user is an admin and hence we are unable to understand why is it taking as much as 3m+?
Also, can somebody point to a link in documentation that explains different times (servTime, kernTime, etc.) and the phases?
Authorize should take a millisecond tops.
Could I have a look at the complete profile?
You can get it from the UI after execution (hit the plan text button).
PFA the profile (pls rename from .zip to .gz and then use gunzip to uncompress). Note that when I was running the N1QL N1QL_Profile.txt.zip (1.4 KB) yesterday on the Query Workbench, it was timing out after 10 minutes. For those attempts I had checked the plan and it showed authorize taking 3m+. Later, when you asked for the profile, I could not fetch it for those attempts. So I generated a new one. And in it, as we see, execTime for authorize phase shows 6hours+.
Right - the authorize execTime (and that of the sequences, for that matter), is a red herring: the actual authorize time is 1.5ms (the servTime).
I’ll have a look at why it may be giving duff times, and log a bug.
Just to confirm did you get that profile from system:active_requests (I’m guessing that there might be an issue generating an execution time from a thread that is presumed t be running).
Your issue is that the fetch that’s supporting the inner side of the join takes the best part of 10 minutes.
This is because you are fetching the 1K docs where Q.__t = “CC” 2.5K times, because of the post join filter.
What indexes do you have on Q?
I navigated to Query Monitor and selected completed and then scrolled down to the query. Identified the original execution from the run at column and then clicked on plan. So that is a correction from what I mentioned before. I did that right now and I see that its showing 33h for the authorize phase. So you are right, it looks like a bug. Possibly its thinking that the query is still running and hence shows the elapsed time since the time when the query was run.
There is an index on bucket1.__t. You would have noted its a self join.
How did you deduce that its the inner side of the join taking most of the time? I scanned through and I found many other phases having timings like 6m+. Some filters and fetches also have 9m+
Why is it fetching 1k docs 2.5k times? For each doc in that 1k docs, it should check for the matches in the 2.5k docs, isnt it?
Also, why is it a post join filter? Coming from the RDBMS world, I thought it would apply the filter first and then join that would reduce the join to 2 sets, 1 containing 1k docs and other 2.5k docs.
In the profile, I see:
{
“#operator”: “Fetch”,
“#stats”: {
“#itemsIn”: 1989295,
“#itemsOut”: 1989278,
“#phaseSwitches”: 8213020,
“execTime”: “6.461664798s”,
“kernTime”: “15.141693578s”,
“servTime”: “9m37.388664028s”
},
Is it getting docs in bulk? May be, 1k docs each time thinking that it will be faster?