How many leading digits of the CAS constitute the timestamp

How many leading digits of the CAS constitute the timestamp?

The whole thing is a nanosecond-since-epoch timestamp.
HTH.
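For illustration only (note the caveat below about treating the CAS as opaque): if the whole CAS really is a nanoseconds-since-epoch value, decoding it might look like the Python sketch below. The CAS value shown is made up.

```python
from datetime import datetime, timezone

def cas_to_datetime(cas: int) -> datetime:
    """Interpret a CAS value as nanoseconds since the Unix epoch.

    NOTE: this leans on an implementation detail of the CAS, which
    should really be treated as an opaque value.
    """
    return datetime.fromtimestamp(cas / 1_000_000_000, tz=timezone.utc)

# Hypothetical CAS value captured from a document mutation:
cas = 1_654_041_600_000_000_000
print(cas_to_datetime(cas).isoformat())  # 2022-06-01T00:00:00+00:00
```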

Please note that the CAS should be treated as opaque; the fact that it’s currently a timestamp should not be relied on.


@pvarley OH NO! :slight_smile: Way back when, when we were told about this CAS property, there was cheering on the company Slack for its usefulness. That’s really sad to hear.

But this is incredibly important. May we rely on a later change to a document having a CAS that is numerically greater than the CAS of any previous change?

Please tell me THAT we can rely on (or tell me if we cannot). I don’t know how you do any data engineering without such a guarantee, short of updating a field with every single document change, which is totally onerous and far too late for our application.

But this is incredibly important. May we rely on a later change to a document having a CAS that is numerically greater than the CAS of any previous change?

Generally yes, but there are some limited edge cases where this is not true. Stepping back for a second what’s the problem you are trying to solve? Maybe there is a better way to do it?

For example if you want the modify time of the document you can use the Virtual Extended Attributes.

Sure. We are using the Kafka source connector to bring documents into Kafka and ultimately ship them into our data warehouse. Since we do not rely on the DCP stream to reflect the actual change history of the documents, we are content simply to have the latest state of each document at any given time stored in the DW.

We do not want to think about or rely on the order of the data elements in the Kafka topics, or worry about duplicates; we want the headache-free ability to simply import into the DW and have the latest document state.

We have large ingest tables with no primary keys that simply reflect the data as it comes out of the pipeline.

Then we import from these ingest tables like so:

with ingest_table as (
    select document_key, isdemo, cas
    from (
        select document_key, isdemo, cas,
               row_number() over (partition by document_key order by cas desc) as rownum
        from cb2pg_isdemo_user
    ) i
    where rownum = 1
)
insert into isdemo_user (document_key, isdemo, cas)
select document_key, isdemo, cas from ingest_table
on conflict (document_key) do update set
    cas = excluded.cas,
    isdemo = excluded.isdemo,
    updatedate = current_timestamp
where isdemo_user.cas < excluded.cas;

As you can see, by simply selecting the document version with the largest CAS, we are sure to be handling its latest state.

We then insert into the clean table, or else update the state already there, provided the existing row has a lower CAS value.
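The pick-latest-by-CAS logic of the SQL above can be sketched in plain Python; the record shapes and values here are made-up illustrations:

```python
# Hypothetical mutation records as they might land in the ingest table
# (duplicates and out-of-order arrivals included on purpose).
rows = [
    {"document_key": "user::1", "isdemo": True,  "cas": 100},
    {"document_key": "user::1", "isdemo": False, "cas": 300},
    {"document_key": "user::2", "isdemo": True,  "cas": 200},
    {"document_key": "user::1", "isdemo": False, "cas": 300},  # duplicate
]

def latest_by_cas(rows):
    """Keep, per document_key, only the row with the numerically largest
    CAS -- mirroring the row_number()/on conflict SQL above.

    Assumes a later change always carries a larger CAS, which (per this
    thread) holds except for rare edge cases.
    """
    latest = {}
    for row in rows:
        current = latest.get(row["document_key"])
        if current is None or current["cas"] < row["cas"]:
            latest[row["document_key"]] = row
    return latest

result = latest_by_cas(rows)
print(result["user::1"]["isdemo"])  # False -- the cas=300 state wins
```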

This is the very starting point of our Data Transformation process.

While we are stepping back, can you please elaborate on

“Generally yes, but there are some limited edge cases where this is not true.” because this system is very simple and works very well for us and we have to understand where it can break down.

@pvarley please do get back to us as soon as possible, because to say this is of critical import to us is an understatement. We really need to know precisely the edge cases where the CAS ordering breaks down. We are using version 7 Enterprise, if that makes a difference.

Virtual Extended Attributes looks promising because of the last_modified date, but this looks like something only exposed to the subdoc API, not something coming over DCP that could feed the Kafka connector and the downstream pipeline.

According to my notes…

In order for the timestamp in the CAS to be reliable, the bucket must have been created by Couchbase Server 4.6 or later, and the document change must have been performed by Couchbase Server 7.0 or later. Even then, it’s possible for a set_with_meta operation to assign an arbitrary CAS value (and therefore timestamp) to a document.

Also, yes, end-users should treat the CAS as opaque.

Thanks,
David


Ok thank G-d, this is workable. Coupled with this we are okay.

@pvarley is more of an expert in this area than I am, so if he’s advising against using the CAS as a timestamp I would give more weight to his opinion.

Yeah, there are two issues here: the timestamp element, which I will not use at all, and the very fact that there is an ordering to the values. He mentioned a few rare edge cases where it wouldn’t be reliable, implying to me that in the other cases it is reliable.

Those edge cases, the ones you laid out, are not ones we are concerned with, but we still would not rely on it at all if not for the fact that we can recover what we need from the DCP/connector should something truly strange ever happen (like CB version 8 changing the CAS scheme).

@pvarley please let me know if this is off base.

DCP will always give the latest version of the document at the time.

With regards to the edge cases, the one that jumps to mind is restoring a backup with --force-update. This forces conflict resolution to take an older document from the backup, including its metadata, i.e. the CAS. Using --force-update should be a rare case.

Thank you very much. With much gnashing of teeth we have removed any dependency on the CAS from the downstream systems. We treat it as an opaque value that is often useful. I hope that one day CB makes a consumable change log that durably records the historical states of documents.