Slow unnest and union all query

Hi I have document of following format:

{
“_id”: “Employee::33243”,
“userID”: 33243,
“type”: “Employee”,
“firstName”: “Kevin”,
“gender” : “M”,
“nationality” : “US”,
“departmentCode” : “DB”,
“birthDate” : “1981-03-26”,
“joiningDate” : “2014-03-29”,
“employeeStatuses”: [
{
“status”: “IN”,
“statusChangedDate”: “2017-07-20T08:20:00Z”,
},
{
“status”: “OUT”,
“statusChangedDate”: “2017-07-20T21:21:00Z”,
},
{
“status”: “IN”,
“statusChangedDate”: “2017-07-21T08:25:00Z”,
},
{
“status”: “OUT”,
“statusChangedDate”: “2017-07-21T20:21:00Z”,
}
.
.
.
.
]
}

and two other documents of similar structure but different ‘type’.
I have created a simple query to show basic details of and their statuses using unnest and combined the results of those three queries using union all
but this query is taking a lot of time. This query is executed every time the report page is loaded and it takes approx. 15 seconds to load. the report has further filtering options on name, gender, date etc.

Select * From
(
Select d.firstName, d.lastName, d.gender, s.status, s.statuschangedDate
From default d
unnest employeeStatuses s
where d.type = ‘Employee’
and d.departmentCode = ‘DB’

Union All

Select d.firstName, d.lastName, d.gender, s.status, s.statuschangedDate
From default d
unnest staffStatuses s
where d.type = ‘Staff’
and d.departmentCode = “HS”

Union All

Select d.firstName, d.lastName, d.gender, s.status, s.statuschangedDate
From default d
unnest visitorStatuses s
where d.type = ‘Visitor’
and d.departmentCode = ‘EE’
) PersonInfo
ORDER BY PersonInfo.firstName ASC LIMIT 10 OFFSET 0;

Can you please suggest me a good index or mapreduce view to improve the query execution time?

*Note : The actual document design and fields are different and sample document is shown here for privacy purpose.

Résultats de recherche

Hi,

First of all, I would have begin in creating an index on the “type” property, did you try that ?

Regards,

Steeve

Hi Steeve,
There are indexes already created on fields:

  1. type
  2. departmentCode, firstName, lastName (where type = …) /* three diferent indexes for all three types*/

You have ORDER BY and LIMIT that means query needs to produce all possible values first.
The UNINON ALL query needs to produce all the documents and UNNEST (most probably it is doing Fetch UNNEST array) this might be taking time.

As you have 3 different indexes you can rewrite the query as follows.

SELECT PersonInfo.firstName, PersonInfo.lastName, PersonInfo.gender, s.status, s.statuschangedDate FROM
(
(SELECT d.firstName, d.lastName, d.gender, d.employeeStatuses
FROM default d
WHERE d.type = "Employee" AND d.departmentCode = "DB" ORDER BY d.departmentCode, d.firstName LIMIT 10)
UNION ALL
(SELECT d.firstName, d.lastName, d.gender, d.employeeStatuses
FROM default d
WHERE d.type = "Staff" AND d.departmentCode = "HS" ORDER BY d.departmentCode, d.firstName LIMIT 10)
UNION ALL
(SELECT d.firstName, d.lastName, d.gender, d.employeeStatuses
FROM default d
WHERE d.type = "Visitor" AND d.departmentCode = "EE" ORDER BY d.departmentCode, d.firstName LIMIT 10)
) AS PersonInfo UNNEST PersonInfo.employeeStatuses AS s
ORDER BY PersonInfo.firstName ASC LIMIT 10 OFFSET 0;

In above query each UNION ALL arm uses different index and has you have equality predicates and uses index order. So we add the LIMIT 10 to stop the query once it produces 10 items.
The result of UNION ALL produces maximum 30 rows.
Parent query does UNNEST and sorts produces 10 rows.

Hi @vsr1, this query is certainly faster. But applying pagination is a bit complex in such condition. Can you suggest any alternate solution?

You can pass limit and offset as $limit and $offset modify query such that inside query limit will be $limit+$offset

Hi @vsr1, Limit with Union All is not working. I am getting an error when applying limit in inner queries.
“syntax error - at UNION”

use () around each union all arm. Updated above query

Hi @vsr1, using this way, when I hit the next page button, the query execution will become slower as the query is slowly turning into the query I just posted because of the increased limit.
I don’t think this is an ideal solution to the problem. It would be great if I can get a stable performance of the query without any compromise. Any suggestion?

@vsr1, When I use the query that I posted in the question (selecting 10-12 fields from the document and giving them proper aliases) then the query takes around 14 seconds to execute for an offset of 21000. But If I use Select * in inner queries , then the time comes down to approx. 2.5 seconds. Can you tell me why is this happening?