Problem Statement:
The data is stored in the Couchbase with the date fields in the ISO 8601 format. Also, the date contains Offset information. When trying to SORT the records by the date-time field using the Order By Clause, the offset is not considered in calculating the sorting order.
Question
Why does the default sorting order not working with DateTime offset?
What is the recommended way to sort the date with offset information?
Reproducing the issue
INSERT INTO Bucket VALUES (“k01”, {“created”:“2020-11-23T03:55:12.5922223-06:00”, “docType” : “TEST”});
INSERT INTO Bucket VALUES (“k02”, {“created”:“2020-11-23T03:59:51.0496435-05:00”, “docType” : “TEST”});
INSERT INTO Bucket VALUES (“k03”, {“created”:“2020-11-23T04:00:07.7449352-05:00”, “docType” : “TEST”});
INSERT INTO Bucket VALUES (“k04”, {“created”:“2020-11-23T04:10:04.4547325-06:00”, “docType” : “TEST”});
INSERT INTO Bucket VALUES (“k05”, {“created”:“2020-11-23T04:20:19.6912516-05:00”, “docType” : “TEST”});
Query to Select data by created Ascending order
SELECT created, STR_TO_UTC(created) as utc FROM Bucket
WHERE docType = “TEST” and created > “2020-11-23”
ORDER BY created ASC
Actual Result
created utc
“2020-11-23T03:55:12.5922223-06:00” “2020-11-23T09:55:12.592Z”
“2020-11-23T03:59:51.0496435-05:00” “2020-11-23T08:59:51.049Z”
“2020-11-23T04:00:07.7449352-05:00” “2020-11-23T09:00:07.744Z”
“2020-11-23T04:10:04.4547325-06:00” “2020-11-23T10:10:04.454Z”
“2020-11-23T04:20:19.6912516-05:00” “2020-11-23T09:20:19.691Z”
Expected Result
created utc
“2020-11-23T03:59:51.0496435-05:00” “2020-11-23T08:59:51.049Z”
“2020-11-23T04:00:07.7449352-05:00” “2020-11-23T09:00:07.744Z”
“2020-11-23T04:20:19.6912516-05:00” “2020-11-23T09:20:19.691Z”
“2020-11-23T03:55:12.5922223-06:00” “2020-11-23T09:55:12.592Z”
“2020-11-23T04:10:04.4547325-06:00” “2020-11-23T10:10:04.454Z”
Any comparison operators (ORDER BY, predicates) on string will do left to right by byte comparison.
ISO-8601 formatted string are reliable string comparable when all the values are converted to same timezone.
As JSON doesn’t have timestamp data type, no way to know it is timestamp or actual string.
As if all the times are local time (instead of absolute UTC time). If the timestamp has different timezones and want absolute UTC comparison user must explicitly convert to common timezone using date functions in comparison or convert to number of MILLIS.
Thanks @vsr1 for the explanation.
I am able to understand how the underlying Date comparison is working with N1QL.
In certain programming languages with rich Data Types like C#, the DateTime comparison works fine, even when the JSON data is having DateTime with offset information. So, it was a bit confusing and definitely a learning.
Regarding the solution, which would you recommend from the following w.r.t performance.
Use the function, STR_TO_UTC(created) in the Query and then sort by the STR_TO_UTC(created).
Add the UTC information in the Document, and then sort by the UTC date as a string comparison.
Does adding the function in the query incur a considerable cost. The approximate data size is around 1 million records.
Any language accessing field vs doing function on top will have overhead. If that only few times don’t see it doing millions times you might see difference.