Authorize phase taking as much as 3 min on CB server 6.5.1

pccb · September 9, 2020, 7:11pm

Hello,

We have a pretty simple query that is taking as long as 8-10 minutes and then timing out. On checking the plan, we see that the authorize phase is taking 3m+. Resident Ratio is 100%. The index is MOI. There are 400k docs in the bucket. The query returns about 2k docs.

SELECT P.X , Q.Y
FROM bucket1 P
INNER JOIN bucket1 Q
ON P.A = Q.A
WHERE P.__t="BB" and Q.__t="CC";

It times out with the message 12008 Error performing bulk get operation.

There are only 2.5k docs for __t="BB".
And only 1k docs for __t="CC".

Looking at the name, authorize looks like a phase related to RBAC. The user is an admin, so we are unable to understand why it is taking as much as 3m+.

Also, can somebody point to a link in documentation that explains different times (servTime, kernTime, etc.) and the phases?

Thanks

vsr1 · September 9, 2020, 7:17pm

Authorize phase will be done once per query and it should not take that long. cc @Marco_Greco

Marco_Greco · September 9, 2020, 8:32pm

Authorize should take a millisecond tops.
Could I have a look at the complete profile?
You can get it from the UI after execution (hit the plan text button).

pccb · September 10, 2020, 11:32am

Thanks to both of you. I am going through the manual: I found the meaning of kernTime, servTime and execTime, but authorize is not defined.

pccb · September 10, 2020, 2:21pm

Thanks Marco.

PFA the profile, N1QL_Profile.txt.zip (1.4 KB) (pls rename from .zip to .gz and then use gunzip to uncompress). Note that when I was running the N1QL yesterday on the Query Workbench, it was timing out after 10 minutes. For those attempts I had checked the plan and it showed authorize taking 3m+. Later, when you asked for the profile, I could not fetch it for those attempts, so I generated a new one. In it, as we see, execTime for the authorize phase shows 6 hours+.

Marco_Greco · September 10, 2020, 2:54pm

Right - the authorize execTime (and that of the sequences, for that matter), is a red herring: the actual authorize time is 1.5ms (the servTime).
I’ll have a look at why it may be giving duff times, and log a bug.
Just to confirm, did you get that profile from system:active_requests? (I'm guessing that there might be an issue generating an execution time from a thread that is presumed to be running.)

Your issue is that the fetch that’s supporting the inner side of the join takes the best part of 10 minutes.
This is because you are fetching the 1K docs where Q.__t = "CC" 2.5K times, because of the post join filter.
What indexes do you have on Q?
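
To illustrate the point about repeated fetches, here is a minimal Python sketch. It is not Couchbase code: the document shapes and the helper names make_docs and nested_loop_fetches are made up for illustration. It models a nested-loop join in which the inner side can only be located through an index on __t, so every "CC" document is fetched again for each "BB" document before the join condition on A can be checked.

```python
# Illustrative model only: counts document fetches in a nested-loop join
# where the inner side is located through an index on __t alone.

def make_docs(n_bb, n_cc):
    """Build hypothetical BB and CC documents that share the join field A."""
    bb = [{"__t": "BB", "A": i % n_cc, "X": i} for i in range(n_bb)]
    cc = [{"__t": "CC", "A": i, "Y": i} for i in range(n_cc)]
    return bb, cc

def nested_loop_fetches(outer_docs, inner_docs):
    """Return (fetch_count, joined_rows) for a nested-loop join on A."""
    fetches = 0
    rows = []
    for p in outer_docs:
        # The index on __t cannot narrow by A, so each inner candidate
        # is fetched once more for every outer document.
        for q in inner_docs:
            fetches += 1
            if q["A"] == p["A"]:  # join condition applied only after the fetch
                rows.append((p["X"], q["Y"]))
    return fetches, rows

bb, cc = make_docs(2500, 1000)
fetches, rows = nested_loop_fetches(bb, cc)
print(fetches)    # 2500 * 1000 = 2500000
print(len(rows))  # one match per BB document in this made-up data: 2500
```

With the counts from this thread (2.5k outer, 1k inner) the model performs about 2.5 million inner fetches to produce a few thousand rows.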

pccb · September 10, 2020, 6:36pm

I navigated to Query Monitor, selected completed, and then scrolled down to the query. I identified the original execution from the run at column and then clicked on plan. So that is a correction to what I mentioned before. I did that just now and I see that it's showing 33h for the authorize phase. So you are right, it looks like a bug. Possibly it thinks that the query is still running and hence shows the time elapsed since the query was started.

There is an index on bucket1.__t. You will have noted that it's a self join.

How did you deduce that it's the inner side of the join that takes most of the time? I scanned through the profile and found many other phases with timings like 6m+. Some filters and fetches also show 9m+.

Why is it fetching 1k docs 2.5k times? For each doc in those 1k docs, it should check for matches in the 2.5k docs, shouldn't it?

Also, why is it a post join filter? Coming from the RDBMS world, I thought it would apply the filter first and then join, which would reduce the join to 2 sets, one containing 1k docs and the other 2.5k docs.

In the profile, I see:
{
"#operator": "Fetch",
"#stats": {
"#itemsIn": 1989295,
"#itemsOut": 1989278,
"#phaseSwitches": 8213020,
"execTime": "6.461664798s",
"kernTime": "15.141693578s",
"servTime": "9m37.388664028s"
},

Is it getting docs in bulk? Maybe 1k docs each time, on the assumption that it will be faster?
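
For comparing the three timings in a #stats block like the one above, a small Python sketch can convert the duration strings to seconds. The helper name to_seconds is hypothetical, and the parser assumes the Go-style duration format seen in this profile (for example "9m37.388664028s" or "1.5ms"); it is not part of any Couchbase tooling.

```python
import re

# Seconds per unit for Go-style duration strings such as "9m37.388664028s".
_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0,
          "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
# Two-letter units are listed before "m" and "s" so that "ms" is not read as "m".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|ms|us|µs|ns|m|s)")

def to_seconds(text):
    """Convert a duration string like '9m37.3s' to seconds (float)."""
    total = 0.0
    pos = 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError("unexpected characters in duration: %r" % text)
        total += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError("not a duration: %r" % text)
    return total

# Timings copied from the Fetch operator in the profile.
stats = {
    "execTime": "6.461664798s",
    "kernTime": "15.141693578s",
    "servTime": "9m37.388664028s",
}
seconds = {name: to_seconds(value) for name, value in stats.items()}
dominant = max(seconds, key=seconds.get)
print(dominant)  # servTime
```

For this Fetch operator, servTime is by far the largest of the three values.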

Thanks

vsr1 · September 10, 2020, 7:34pm

For optimization:

Follow this: https://blog.couchbase.com/ansi-join-support-n1ql/
Use the right index: https://index-advisor.couchbase.com/indexadvisor/#1

In addition:
Since it is an INNER JOIN, switch the join order so that the inner side is executed fewer times, and see if that improves things.
On EE, use HASH JOIN.

CREATE INDEX ix1 ON bucket1 (__t,A,X,Y);

SELECT P.X , Q.Y
FROM bucket1 Q
INNER JOIN bucket1 P
ON P.A = Q.A
WHERE P.__t="BB" and Q.__t="CC";

OR

   WITH WQ AS (SELECT q1.Y, q1.A FROM bucket1 AS q1 WHERE q1.__t="CC")
   SELECT P.X , Q.Y
   FROM bucket1 P
   INNER JOIN WQ AS Q ON P.A = Q.A
   WHERE P.__t="BB";

OR

EE Only

SELECT P.X , Q.Y
FROM bucket1 Q
INNER JOIN bucket1 P USE HASH(PROBE)
ON P.A = Q.A
WHERE P.__t="BB" and Q.__t="CC";
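
As a rough picture of why a hash join helps here, the following Python sketch shows the general build/probe idea. It is an illustration under assumed data shapes, not how the query engine is implemented; hash_join and the sample documents are hypothetical. The smaller "CC" set is the build side and the "BB" set is the probe side, matching the USE HASH(PROBE) hint on P above.

```python
from collections import defaultdict

def hash_join(build_docs, probe_docs, key):
    """Join two lists of dicts on `key`; return (docs_read, rows)."""
    table = defaultdict(list)
    docs_read = 0
    # Build phase: read each build-side document once and index it by key.
    for b in build_docs:
        docs_read += 1
        table[b[key]].append(b)
    rows = []
    # Probe phase: read each probe-side document once and look up matches.
    for p in probe_docs:
        docs_read += 1
        for b in table.get(p[key], []):
            rows.append((p["X"], b["Y"]))
    return docs_read, rows

# Made-up documents with the same counts as in the thread.
cc = [{"__t": "CC", "A": i, "Y": i} for i in range(1000)]
bb = [{"__t": "BB", "A": i % 1000, "X": i} for i in range(2500)]

docs_read, rows = hash_join(cc, bb, "A")
print(docs_read)  # 1000 + 2500 = 3500
print(len(rows))  # 2500
```

In this model each document is read once, so the work grows with the sum of the two set sizes rather than their product.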
