Using UNNEST for array field with no matching predicate

Using Couchbase server 7.1.5 and Java SDK 3.4.7:

I have a document with an array field called anomalies:

{
  "propCode": "PTEST",
  "roomNumber": "201",
  "macAddress": "ABCDEF123456",
  "date": "2021-08-04",
  "anomalies": [
    {
      "message": "More than 90% CPU used",
      "firstReportedTime": 1628101158,
      "lastReportedTime": 1628101158,
      "reportedTimesWithStay": [
        1628101158,
        1628101158,
        1628101158,
        1628101158,
        1628101158
      ]
    },
    {
      "message": "MPI is enabled",
      "firstReportedTime": 1628101158,
      "lastReportedTime": 1628101158,
      "reportedTimesWithStay": [
        1628101158,
        1628101158,
        1628101158,
        1628101158,
        1628101158
      ]
    }
  ]
}

My Java query is:

@Override
public QueryResult search(ConnectedRoomAnomaliesApi.GetAnomaliesParams getAnomaliesParams) {
  final String statement = "SELECT anomaliesDocument.*"
      + " FROM anomalies anomaliesDocument"
      + " UNNEST anomaliesDocument.anomalies anomaliesArray"
      + " WHERE ($message IS NULL OR anomaliesArray.message = $message)"
      + " AND ($propCode IS NULL OR anomaliesDocument.propCode = $propCode)"
      + " AND ($roomNumber IS NULL OR anomaliesDocument.roomNumber = $roomNumber)"
      + " AND ($macAddress IS NULL OR anomaliesDocument.macAddress = $macAddress)"
      + " AND anomaliesDocument.date BETWEEN $startDate AND $endDate"
      + " ORDER BY anomaliesDocument.date";

  try {
    return scope.query(
        statement,
        queryOptions()
            .parameters(JsonObject.create()
                .put("message", getAnomaliesParams.getMessage())
                .put("propCode", getAnomaliesParams.getPropCode())
                .put("roomNumber", getAnomaliesParams.getRoomNumber())
                .put("macAddress", getAnomaliesParams.getMacAddress())
                .put("startDate", getAnomaliesParams.getStartDate().toString())
                .put("endDate", getAnomaliesParams.getEndDate().toString()))
            .scanConsistency(QueryScanConsistency.REQUEST_PLUS)
            .metrics(true)
            .readonly(true)
            .adhoc(false));
  } catch (CouchbaseException e) {
    LOGGER.error("query or index error: " + e.getMessage());
    throw new AnomaliesRepositoryException(e);
  }
}

It UNNESTs the anomalies array and matches on the message field. If the message field is provided, this works as expected. I can reproduce the correct results using N1QL:

SELECT anomaliesDocument.*
FROM connectedroom.ecmp.anomalies anomaliesDocument
UNNEST anomaliesDocument.anomalies anomaliesArray
WHERE anomaliesArray.message = "MPI is enabled"
AND anomaliesDocument.propCode = "PTEST"
AND anomaliesDocument.roomNumber = "201"
AND anomaliesDocument.date BETWEEN "2021-08-01" AND "2021-08-30"
ORDER BY anomaliesDocument.date;

If the message field is NULL in the application, duplicate results (array.size value) are returned. I assume this is because I’m using UNNEST without matching a predicate?

This appears to be equivalent to the following N1QL query:

SELECT anomaliesDocument.*
FROM connectedroom.ecmp.anomalies anomaliesDocument
UNNEST anomaliesDocument.anomalies anomaliesArray
WHERE anomaliesDocument.propCode = "PTEST"
AND anomaliesDocument.roomNumber = "201"
AND anomaliesDocument.date BETWEEN "2021-08-01" AND "2021-08-30"
ORDER BY anomaliesDocument.date;

where the WHERE clause is not matching on the message field, because it’s NULL in the application. If I do NOT provide a message field to match on, I must remove the UNNEST clause to get the expected results.

My current work around is to create two different queries in my application code as follows:

private String buildQuery(ConnectedRoomAnomaliesApi.GetAnomaliesParams getAnomaliesParams) {
    if (Objects.isNull(getAnomaliesParams.getMessage())) {
      return "SELECT anomaliesDocument.*"
          + " FROM anomalies anomaliesDocument"
          + " WHERE ($propCode IS NULL OR anomaliesDocument.propCode = $propCode)"
          + " AND ($roomNumber IS NULL OR anomaliesDocument.roomNumber = $roomNumber)"
          + " AND ($macAddress IS NULL OR anomaliesDocument.macAddress = $macAddress)"
          + " AND anomaliesDocument.date BETWEEN $startDate AND $endDate"
          + " ORDER BY anomaliesDocument.date";
    } else {
      return "SELECT anomaliesDocument.*"
          + " FROM anomalies anomaliesDocument"
          + " UNNEST anomaliesDocument.anomalies anomaliesArray"
          + " WHERE anomaliesArray.message = $message"
          + " AND ($propCode IS NULL OR anomaliesDocument.propCode = $propCode)"
          + " AND ($roomNumber IS NULL OR anomaliesDocument.roomNumber = $roomNumber)"
          + " AND ($macAddress IS NULL OR anomaliesDocument.macAddress = $macAddress)"
          + " AND anomaliesDocument.date BETWEEN $startDate AND $endDate"
          + " ORDER BY anomaliesDocument.date";
    }
  }

Is there a way to restructure my query to avoid making two different ones if I do not provide a message field to match inside the array?

The duplicates are due to UNNEST usage. UNNEST is self JOIN and expands.
remove UNNEST and use ANY because it used only in WHERE, not projecting or using in ORDER BY. ($message IS NULL OR ANY a IN ad.anomalies SATISFIES a.message = $message END)

Your approach works only when message is unique in array. The moment it has duplicates it will repeat in results. Another way is SELECT DISTINCT.
In your case UNNEST/DISTINCT expensive so use ANY like below

SELECT ad.*
FROM anomalies AS ad
WHERE ad.date BETWEEN $startDate AND $endDate
      AND ($propCode IS NULL OR ad.propCode = $propCode)
      AND ($roomNumber IS NULL OR ad.roomNumber = $roomNumber)
      AND ($macAddress IS NULL OR ad.macAddress = $macAddress)
      AND ($message IS NULL OR ANY a IN ad.anomalies SATISFIES  a.message = $message END)
ORDER BY ad.date;
1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.