CASE WHEN vs UNION SELECT

let’s say the following N1QL with CASE WHEN

SELECT CASE WHEN condition1(fieldA,fieldB,fieldC) THEN field1
            WHEN condition2(fieldA,fieldD,fieldE) THEN field2
            WHEN condition3(fieldA,fieldF,fieldG) THEN field3
            WHEN condition4(fieldA,fieldF,fieldH) THEN field4
            ELSE MISSING
       END AS field
  FROM default
 WHERE condition(fieldX,fieldY)

AS evaluation process of CASE WHEN shows this N1QL can rewrite as following

 SELECT field1 AS field
   FROM default
  WHERE condition(fieldX,fieldY)
    AND condition1(fieldA,fieldB,fieldC)

UNION

SELECT field2 AS field
  FROM default
 WHERE condition(fieldX,fieldY)
   AND NOT (condition1(fieldA,fieldB,fieldC))
   AND condition2(fieldA,fieldD,fieldE)

UNION

SELECT field3 AS field
  FROM default
 WHERE condition(fieldX,fieldY)
   AND NOT (condition1(fieldA,fieldB,fieldC))
   AND NOT (condition2(fieldA,fieldD,fieldE))
   AND condition3(fieldA,fieldF,fieldG)

UNION

   SELECT field4 AS field
     FROM default
    WHERE condition(fieldX,fieldY)
      AND NOT (condition1(fieldA,fieldB,fieldC))
      AND NOT (condition2(fieldA,fieldD,fieldE))
      AND NOT (condition3(fieldA,fieldF,fieldG))
      AND condition4(fieldA,fieldF,fieldH)

am I right?

if so,let’s talk about which is better for N1QL query efficiency.

If I can use correct index I think the second N1QL should be more efficiency, because it will use index to get more documents.

the test step (on CB 5.0.0-2194 Enterprise Edition (build-2194)) as following:
I have create 2 index as following

create index idx_case_when_union_test_1 on default(fieldX,fieldY,fieldA,fieldB,fieldC,fieldD,fieldE,fieldF,fieldG,fieldH,field1,field2,field3,field4);
create index idx_case_when_union_test_2 on default(fieldA,fieldB,fieldC,fieldD,fieldE,fieldF,fieldG,fieldH,field1,field2,field3,field4) WHERE (fieldX == "X" AND fieldY == "Y");

and for the first N1QL

SELECT CASE WHEN fieldA == "A" AND fieldB == "B" AND fieldC == "C" THEN field1
            WHEN fieldA == "A" AND fieldD == "D" AND fieldE == "E" THEN field2
            WHEN fieldA == "A" AND fieldF == "F" AND fieldG == "G" THEN field3
            WHEN fieldA == "A" AND fieldF == "F" AND fieldH == "H" THEN field4
            ELSE MISSING
       END AS field
  FROM default
 WHERE fieldX == "X" AND fieldY == "Y"

the explain shows

and the second N1QL

SELECT field1 AS field
  FROM default
 WHERE fieldX == "X" AND fieldY == "Y"
   AND fieldA == "A" AND fieldB == "B" AND fieldC == "C"

UNION

SELECT field2 AS field
 FROM default
WHERE fieldX == "X" AND fieldY == "Y"
  AND NOT (fieldA == "A" AND fieldB == "B" AND fieldC == "C")
  AND fieldA == "A" AND fieldD == "D" AND fieldE == "E"

UNION

SELECT field3 AS field
 FROM default
WHERE fieldX == "X" AND fieldY == "Y"
  AND NOT (fieldA == "A" AND fieldB == "B" AND fieldC == "C")
  AND NOT (fieldA == "A" AND fieldD == "D" AND fieldE == "E")
  AND fieldA == "A" AND fieldF == "F" AND fieldG == "G"

UNION

  SELECT field4 AS field
    FROM default
   WHERE fieldX == "X" AND fieldY == "Y"
     AND NOT (fieldA == "A" AND fieldB == "B" AND fieldC == "C")
     AND NOT (fieldA == "A" AND fieldD == "D" AND fieldE == "E")
     AND NOT (fieldA == "A" AND fieldF == "F" AND fieldG == "G")
     AND fieldA == "A" AND fieldF == "F" AND fieldH == "H"

the explain shows

it looks like both N1QL use cover index with index idx_case_when_union_test_1.
which one is better?

PS: I use USE INDEX (`idx_case_when_union_test_2`) for both N1QL,only the second N1QL works.

Query1 cannot use index idx_case_when_union_test_2 because leading index key is not part of the predicate.

Which one is better depends on the data and how many CASE statements etc.

In my opinion CASE WHEN will be better. Pros and Cons

  • Predicates are pushed exactly and single IndexScan will be used (less load on indexer)
  • CASE evaluation might be expensive depends on which WHEN condition it matches.
  • For single query latency might be higher than UNION but loaded system can scale better.

UNION Pros and Cons

  • As each UNION arm projects different attributes, there will not be any duplicates across UNION arms you can use UNION ALL.
  • Each UNION query will have different precise IndexScans and each IndexScan will have precise Spans. This result in more IndexScans and there might be load on indexer Also for each Spans all predicates may not pushed to indexer. This may lead to false positives form indexer.
    Take example : fieldX == “X” AND fieldY == “Y”
    AND fieldA == “A” AND fieldD == “D” AND fieldE == “E”
    Given index only filedX,filedY,fieldA can be pushed to indexer not fieldD, fieldE
  • Query has NOT predicate which needs to be DNF transformed. NOT (fieldA == “A” AND fieldB == “B” AND fieldC == “C”) ===> (fieldA != “A” OR fieldB != “B” OR fieldC != “C”). This may require more indexes, IndexScans and face same problem as described above.
1 Like

Thank you for detail explanation.