Select all document id from exactly x days ago?

Chi · July 4, 2017, 7:38am

The challenge I am facing is to select some bunch of documents id using n1ql query.

I have a bucket containing nearly thousands of records/documents. Each document contains an expiry field. My document looks something like below:-

{
  "timestamp": "2017-06-30 18:40:12",
  "title": "http://www.sample.com",
  "clicks": 0,
  "keyword": "7s9HHp",
  "expiryDay": 90,
  "user": "admin",
  "url": "http://www.sample.com",
  "ip": "192.12.35.40"
}

Here keyword also represent document id.

Right now what I can possibly think is:-
1)First I need to find all the distinct expiryDay from my bucket.
2)Then I need to query for keywords or documentIds whose expiryDay is let’s say 90 days and find the difference between the document’s timestamp with current-date and then again need to check the resulted difference is equal or greater than 90 or not. If yes then return the keywords or documentIds.

So there might be two different query to achieve my objective. And I don’t know how to do index for the above query as I don’t know how the query will look.

I wonder if it is possible using couchbase-java api, something similar to below:-
Statement expiryDays= Select.select("Don't know how to select distinct expiryDay").from("My_Bucket");
Statement keywords= Select.select("keyword").from("My_Bucket").where("Don't know what to supply here");

atom_yang · July 4, 2017, 9:09am

Try create index by

CREATE INDEX `expiryDay-timestamp-idx` ON `My_Bucket` (`expiryDay`,`timestamp`)

and try this N1QL:

SELECT META().id
  FROM My_Bucket 
 WHERE expiryDay == 90
   AND DATE_DIFF_STR(NOW_UTC(), timestamp, 'day') >= 90

it will use CoverIndexScan.

vsr1 · July 4, 2017, 9:12am

CREATE INDEX  ix1 ON default(expiryDay,timestamp);
SELECT DISTINCT expiryDay FROM default WHERE expiryDay IS NOT NULL;

SELECT META().id FROM default WHERE expiryDay = 90 
               AND DATE_DIFF_STR(NOW_STR(),timestamp,"day") >= 90;

Chi · July 4, 2017, 12:20pm

Thanks for the reply. However, NOW_UTC() function is not working giving "msg": "Invalid function NOW_UTC. - at )". Although it is working with NOW_STR() as mentioned by @vsr1.

vsr1 · July 4, 2017, 3:04pm

NOW_UTC() available from 4.6.0 onwards (https://issues.couchbase.com/browse/MB-21000). Pre 4.6.0 you can convert date string to UTC using STR_TO_UTC() (I.e. NOW_UTC() can be achieved by STR_TO_UTC(NOW_STR()).

Chi · July 5, 2017, 1:45pm

Okay. But I am using 4.1.1. I should have mentioned it before. Anyway now I am facing problem while executing DISTINCT query. It is throwing java.lang.RuntimeException: java.util.concurrent.TimeoutException when executing below query
Statement statement = Select.selectDistinct("expiryDay").from("My_Bucket") .where(Expression.x("expiryDay").isNotNull());
which is similar to below n1ql
SELECT DISTINCT expiryDay FROM My_Bucket WHERE expiryDay IS NOT NULL.

But surprisingly if I switch to Couchbase 4.5.0v it is working fine.

Is it a bug?

vsr1 · July 5, 2017, 1:55pm

It is not a bug. It is timeout query might taking time in 4.1.1 and it might faster in 4.5.0 because of improvements. Increase timeout. Also post the EXPLAIN of query in 4.1.1

Chi · July 5, 2017, 1:57pm

I increased to 30 seconds and kept only 4 document in Bucket but still getting same exception.

vsr1 · July 5, 2017, 1:58pm

Run the query in cbq shell or through curl and post the explain

Chi · July 5, 2017, 2:06pm

The server where my Couchbase 4.1.1 is running is temporarily blocked. Once it is accessible I’ll update.

Chi · July 6, 2017, 8:06am

Okay, this is definitely a bug in Couchbase server (In my case 4.1.1 and don’t know about 4.5.0). Earlier I actually kept nearly 10 million of documents in CB(4.1.1) for testing purpose. Just a week back I flushed the entire Bucket and kept only 4 document for the purpose of above testing.

Now when I run my Distinct query with a timeout of 30 seconds it throws ConcurrentTimeoutException. So I changed the query and run only Select query instead of Distinct and saw thousands/millions of expiryDay value printing in my java console.

I am doubting that the indexing is keeping a separate set of data for himself which is intact even after the data is flushed from the bucket. I wonder if it is not fixed in new version of CB. If so then this is a real threat.

vsr1 · July 6, 2017, 12:59pm

Please run the query through cbq shell or through CURL. Also post the explain.

Indexer should have caught up and kept info of 4 documents when bucket is reduced to 10 million to 4 documents. @deepkaran.salooja.

You can also drop the index and re-create it and check in 4.1.1.

You mentioned it works in 4.5.0 you can upgrade to latest version. Note: Distinct on 10million items takes time because it needs to process all 10 million items.

Chi · July 6, 2017, 2:11pm

1)Definitely, I’ll post the explain query as soon as my server is accessible to me.

2)Just mentioning, I did not reduce the bucket. I simply flush the bucket and then entered 4 document.

3)That’s a good suggestion but in real time I have to delete some data or move the data to some different location. In that case, I can’t drop or create index every time I perform this task.

4)I am not saying the issue will not come in 4.5.0. I am facing this problem in 4.1.1 because I stored nearly 10million data nearly 3 to 4 months back and flushed it just a week before. So there is 90% chance that this issue might come in 4.5.0 or any other higher version if we re-iterate the same test scenario.

deepkaran.salooja · July 6, 2017, 11:09pm

@Chi, it would be great if you could provide us with some more information on the bucket flush issue:

After bucket flush, what was the rough duration after which the query was tried? Index rebuild after bucket flush is an async operation and can take a few minutes.
Did you try the query only once or multiple times? There was a known issue in 4.1 MB-16957, which caused the first query after flush to timeout. This is fixed for 4.5.
What is the consistency level of the query?

Thanks.

Chi · July 7, 2017, 6:15am

As I already mention the bucket was flushed a week or two back ago and in my case, I don’t think index rebuild is working. Note:- My Couchbase is running in non-sudo user i.e Couchbase user.

2)Multiple times, in fact, it is still giving the same response. Can’t say it is the same one. And the bug report is against 4.1.0 which is different.

3)My organisation stopped direct use of putty. We need to migrate to PIM. So this will take time. Once done I will post everything.

Topic		Replies	Views
How to get documents based on date difference Couchbase Server query	3	1848	August 9, 2016
Couchbase Java SDK 3.1.2 Bulk document retrieval Couchbase Server query , java , n1ql	0	464	July 20, 2021
Accessing expiration of a document using N1QL SQL++	11	7646	August 13, 2018
Issues on querying data SQL++ query , java , n1ql	4	574	October 3, 2022
Search for document by partial ID SQL++ java	10	3333	February 4, 2020

Select all document id from exactly x days ago?

Related topics