Querying double time series with n1ql?

Suppose a table that contains currency exchange rates, containing the schema:

{ date: "2017-01-03", quote: "2017-01-03", from: "EUR", to: "USD", rate: "A" }
{ date: "2017-01-03", quote: "2017-01-03", from: "GBP", to: "USD", rate: "B" }
{ date: "2017-01-03", quote: "2017-01-03", from: "CHF", to: "USD", rate: "C" }

{ date: "2017-01-04", quote: "2017-01-04", from: "EUR", to: "USD", rate: "D" }
{ date: "2017-01-04", quote: "2017-01-04", from: "GBP", to: "USD", rate: "E" }
{ date: "2017-01-04", quote: "2017-01-05", from: "GBP", to: "USD", rate: "F" }

Question is how to write a n1ql query that for any given date will return
all of the latest rates for these currencies.

What I’m trying so far which isn’t working is variants of things like this:

SELECT * from bucket 
WHERE from IN ["CHF", "EUR", "GBP"] AND 
      to = "USD"
ORDER BY date DESC, quote DESC LIMIT 3;

(But this doesn’t work because it fails to de-duplicate pairs)

This is not simple because:

  • On a given day, not all pairs may be present. On Jan 4, CHF->USD is missing,
    and should be taken from Jan 3.
  • Double timeseries, you can have > 1 observation for a pair on a given day. On
    Jan 4, there are two GBP->USD rates, one quoted on the 4th, one quoted on the 5th.
    The most up to date one (quote: “2017-01-05”) should be taken
  • Must have 1 and only 1 reference for each from->to pair on any given day.

Ideally, this query, for 2017-01-04 would return (complex case):

{ date: "2017-01-04", quote: "2017-01-04", from: "EUR", to: "USD", rate: "D" }
{ date: "2017-01-04", quote: "2017-01-05", from: "GBP", to: "USD", rate: "F" }
{ date: "2017-01-03", quote: "2017-01-03", from: "CHF", to: "USD", rate: "C" }

and for 2017-01-03 would return (simple case):

{ date: "2017-01-03", quote: "2017-01-03", from: "EUR", to: "USD", rate: "A" }
{ date: "2017-01-03", quote: "2017-01-03", from: "GBP", to: "USD", rate: "B" }
{ date: "2017-01-03", quote: "2017-01-03", from: "CHF", to: "USD", rate: "C" }

I’m thinking that this may require correlated subqueries, but I’m struggling with how to express this. Any suggestions?

(SELECT * FROM default WHERE `from` = "CHF" AND `to` = "USD" ORDER BY date DESC, quote DESC LIMIT 1)
UNION ALL
(SELECT * FROM default WHERE `from` = "EUR" AND `to` = "USD" ORDER BY date DESC, quote DESC LIMIT 1)
UNION ALL
(SELECT * FROM default WHERE `from` = "GBP" AND `to` = "USD" ORDER BY date DESC, quote DESC LIMIT 1);

The following index and query performs better and avoid sort. In 5.0.0 you can create DESC index instead of negative index.

CREATE INDEX ix10 ON default(`from`,`to`,-STR_TO_MILLIS(date), -STR_TO_MILLIS(quote));
(SELECT * FROM default WHERE `from` = "CHF" AND `to` = "USD" ORDER BY `from`,`to`,-STR_TO_MILLIS(date),-STR_TO_MILLIS(quote) LIMIT 1)
UNION ALL
(SELECT * FROM default WHERE `from` = "EUR" AND `to` = "USD" ORDER BY `from`,`to`,-STR_TO_MILLIS(date),-STR_TO_MILLIS(quote) LIMIT 1)
UNION ALL
(SELECT * FROM default WHERE `from` = "GBP" AND `to` = "USD" ORDER BY `from`,`to`,-STR_TO_MILLIS(date),-STR_TO_MILLIS(quote) LIMIT 1);
1 Like

This is a good solution, but has the drawback that if there are an non-fixed number of currencies, that I could potentially need 30 sub-queries to assemble all of the data. Additionally, if there’s a lot of data, then in the case of un-indexed “from” and “to”, the DB would be repeatedly scanning through the same data in the independent queries over and over again, but I suppose at least it can (maybe) parallelize those independent queries?

Based on the requirements this is best solution. I think performance will be out weigh the inconvenience of multiple UNION ALLs due to reverse order sort and one for each value.

CREATE INDEX ix10 ON default(`from`,`to`,-STR_TO_MILLIS(date), -STR_TO_MILLIS(quote));

(SELECT * FROM default WHERE `from` = "CHF" AND `to` = "USD" ORDER BY `from`,`to`,-STR_TO_MILLIS(date),-STR_TO_MILLIS(quote) LIMIT 1)
UNION ALL
(SELECT * FROM default WHERE `from` = "EUR" AND `to` = "USD" ORDER BY `from`,`to`,-STR_TO_MILLIS(date),-STR_TO_MILLIS(quote) LIMIT 1)
UNION ALL
(SELECT * FROM default WHERE `from` = "GBP" AND `to` = "USD" ORDER BY `from`,`to`,-STR_TO_MILLIS(date),-STR_TO_MILLIS(quote) LIMIT 1);
  1. Each UNINON ALL arm using the same index ix10.
  2. from, to able to push to the indexer and IndexScan is done only the required range.
  3. As ix10 created negated index on date, quote filed and ORDER BY is modified to add from,to these are equality predicate. So Query able to use Index Order and avoid Sort (Can be found in EXPLAIN, there is no Order Operator)
  4. As all the predicates are pushed to indexer and no false positives produced by indexer and query uses index order, can push the LIMIT 1 to indexer. So that Indexer can produce single value (avoids Index Scan). (Can be found in EXPLAIN, there will be “limit”: 1 in Operator IndexScan section).
  5. Each Union arm query does IndexScan of 1 item and Fetch the query time will not vary for the number of qualified items by predicate (i.e 100 or million)

Summary: It is in convenient many Union ALLs, but each arm query does 1 item scan and 1 fetch. Less resources, faster.

Option 2)

CREATE INDEX ix11 ON default(`from`,`to`,date, quote);
SELECT d1
     FROM ( SELECT RAW MAX([q1.quote,q1.id])[1]
                 FROM ( SELECT  `from`, `to`, quote, MAX([date,META().id])[1] AS id
                              FROM default
                              WHERE `from` IN ["CHF", "EUR", "GBP"]  AND `to` = "USD"
                              GROUP BY `from`,`to`, quote
                      ) AS q1
                 GROUP BY q1.`from`, q1.`to`
          ) AS q2ids
      JOIN default d1 ON KEYS q2ids;

Inner most subquery does covering indexscan on ix11 and produces required meta().ids and does fetch.