Continuing our previous discussion: I added the try/catch block you mentioned, but we are now getting a CouchbaseLiteException when calling document.Delete(). It comes from SqliteCouchStore.PutRevision… Could you please shed some light on this issue? Please let me know if you need more info…
Couchbase.Lite.CouchbaseLiteException: Previous revision not found
   at Couchbase.Lite.Store.SqliteCouchStore.<>c__DisplayClass95_0.b__0()
   at Couchbase.Lite.Store.SqliteCouchStore.RunInTransaction(RunInTransactionDelegate block)
   at Couchbase.Lite.Store.SqliteCouchStore.RunInOuterTransaction(RunInTransactionDelegate action)
   at Couchbase.Lite.Store.SqliteCouchStore.PutRevision(String inDocId, String inPrevRevId, IDictionary`2 properties, Boolean deleting, Boolean allowConflict, StoreValidation validationBlock)
   at Couchbase.Lite.Database.PutDocument(String docId, IDictionary`2 properties, String prevRevId, Boolean allowConflict)
   at Couchbase.Lite.Document.PutProperties(IDictionary`2 properties, String prevID, Boolean allowConflict)
   at Couchbase.Lite.SavedRevision.CreateRevision(IDictionary`2 properties)
   at Couchbase.Lite.Document.Delete()
This is an odd error to get. It means that you are calling Delete() on a document that does not exist in the local database (or you have somehow called Delete() on a document with a null revision ID). Or at least the reference you have thinks that it doesn’t exist: if you are using multiple Database instances on multiple threads against the same file, they will not all be consistent with each other all the time.
Do you have a certain method that you can provide for reproducing this issue?
That makes sense, because we have a job that retrieves all existing documents, processes them, and then deletes them. The job uses a query to get a list of documents, then loops through the list to create and load a DTO from each document, and then deletes each document. We kept getting a null pointer exception while the job was populating the DTO by calling document.GetProperty("OurProperty")… The query must have returned a list of items but, from the db point of view, one of the items does not have any detail.
I will try to isolate the code out of our project. We are in a web environment so it is multiple threads; however, the object that contains the Couchbase db is a singleton object managed by our DI container so there is only one instance of that object in the whole web app.
We are constantly getting this issue but it only happens after the application has been running for a day or so. We had to blow away the database to fix this issue though. Do you have an email address that I can send you some code?
@borrrden, I am trying to gather more info for debugging.
When I open sqlite3 file with sqlite tools, I see 6 tables. Could you please confirm my understanding is correct about the following tables?
docs - contains all documents that ever existed in the database
localdocs - contains documents that currently exist?
revs - changes that happened to each doc in the docs table
maps_1 - existing documents as well? What maps to what though?
Also, I found that in the revs table most of the documents have at least 2 revisions.
doc Id 33
However, one record that was created around the time the error happened only had one revision record.
doc Id 31
Does this mean anything?
What code do you want to send? If it is a reproduction case, then ok, but if it is a project that needs to run for a day before maybe seeing a problem then I have to decline to receive it.
As far as the tables go:
docs -> These are all the documents that have been or can be replicated
localdocs -> These are documents that live in the local database only and don’t get replicated
revs -> These are all the revisions of the documents in the docs table that the local database knows about
maps_1 -> These are the cached results of a Couchbase Lite view (most likely the one you showed me before). Their contents are decided by your “emit” function.
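To make the relationship between docs and revs a bit more concrete, here is a rough Python sketch that builds a simplified in-memory stand-in for the two tables and counts the revisions stored for each document. The column names (doc_id, docid, revid, parent, deleted, json) and the sample rows are assumptions made up for illustration, so check the real layout with .schema in the sqlite3 tool, and only ever run queries against a copy of the actual file.

```python
import sqlite3


def build_sample_db():
    """Create a simplified stand-in for the docs/revs tables (assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, docid TEXT UNIQUE NOT NULL);
        CREATE TABLE revs (
            sequence INTEGER PRIMARY KEY AUTOINCREMENT,
            doc_id   INTEGER NOT NULL REFERENCES docs(doc_id),
            revid    TEXT NOT NULL,
            parent   INTEGER REFERENCES revs(sequence),
            deleted  INTEGER DEFAULT 0,
            json     BLOB
        );
        """
    )
    # Made-up sample rows, not data from the thread.
    conn.execute("INSERT INTO docs VALUES (1, 'doc-one')")
    conn.execute("INSERT INTO docs VALUES (2, 'doc-two')")
    insert = (
        "INSERT INTO revs (doc_id, revid, parent, deleted, json) "
        "VALUES (?, ?, ?, ?, ?)"
    )
    # doc 1: a create followed by a delete (the delete is just another row).
    conn.execute(insert, (1, "1-aaa", None, 0, "{}"))
    conn.execute(insert, (1, "2-bbb", 1, 1, None))
    # doc 2: a lone row with no parent and no body.
    conn.execute(insert, (2, "3-ccc", None, 0, None))
    return conn


def revision_counts(conn):
    """Map each doc_id to the number of rows it has in revs."""
    rows = conn.execute(
        """
        SELECT d.doc_id, COUNT(r.sequence)
        FROM docs d LEFT JOIN revs r ON r.doc_id = d.doc_id
        GROUP BY d.doc_id
        ORDER BY d.doc_id
        """
    ).fetchall()
    return dict(rows)
```

Calling revision_counts(build_sample_db()) gives one count per document, which makes it easy to spot a document with an unexpectedly short history.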
And for the last part, do you mean that document #31 has only one entry in the revs table (the one you showed)?
There is one document in the localdocs table. I don’t think my code ever specified that this document shouldn’t be replicated to the other database… What tells Couchbase Lite not to replicate this document?
I think you mean doesn’t exist in docs (not localdocs). So you mean that there is no document with the doc_id 31 in the docs table? If so that is a concerning situation, but I can’t say for sure whether or not it is related to the problem you are seeing.
You are probably seeing the local checkpoint document. The library uses this table to save some data that shouldn’t be replicated. Documents in here were created using the PutLocalDocument API, either by the library or by you.
Sorry for confusing you. I realized I wrote the wrong words. #31 actually exists in the docs table. However, there is only one entry for it in the revs table, and there is no content in that rev record. I would think that if there is only one record in revs then it would be the “create”, and so it would have the document content in the rev record…
This is not the first revision of the document (the “3” in the revision ID means that there are 2 revisions that come before this one), so I am more concerned with where the first and second revisions are. Are you calling Purge() on any documents or revisions?
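As a small illustration of reading that number, here is a Python sketch that pulls the generation out of a revision ID of the form "N-suffix" and reports how many earlier revisions should exist. The function names are made up for this example and are not part of the library.

```python
def generation(rev_id: str) -> int:
    """Return the numeric prefix of a revision ID such as '3-abc'."""
    prefix, sep, _ = rev_id.partition("-")
    if not sep or not (prefix.isascii() and prefix.isdigit()):
        raise ValueError(f"not a generation-prefixed revision ID: {rev_id!r}")
    return int(prefix)


def expected_ancestors(rev_id: str) -> int:
    """Number of revisions that should come before this one."""
    return generation(rev_id) - 1
```

If a document's only stored revision reports two expected ancestors, those two revisions are missing from the local revs table.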
Compact() will only start deleting revisions after (by default) 20 revisions have been stored. I guess it might happen if you set MaxRevTreeDepth to a very small number (but I don’t think you are doing that). Otherwise, it is being inserted that way, which is an error (but I won’t be able to figure out how it got in there without a reproduction case, unfortunately). Do both databases show the same situation for this document?
I think I found what led to the issue. This is what I have observed so far.
We have node A and node B running push and pull replication on both nodes. Our app performs CRUD operation on node A. Node B is our failover system. No request goes to B. The replication process keeps B’s database in sync.
During the day, due to an IIS appPool recycle, Node B’s website was restarted, but the app didn’t get activated because no request went to B (I will fix this).
While Node B was down, 3 requests went to A. A then processed the messages and deleted them. No error so far. When I came into the office, I went to check both nodes. The check request would have triggered Node B to become active again. When the request came back, I noticed there were 3 documents on Node B while there were none on Node A.
It looked like Node B performed some replication actions after it started back up. Then 2 of the 3 documents got replicated back onto Node A. Then we got “Previous revision not found” on Node A when A tried to process those 2 records.
Clear as mud?
I will try again and confirm that I can reproduce the sequence of events and the issue. But if that is what happened, the question would be: why did Couchbase replicate deleted docs to the other node? And why, when those records got replicated back, couldn’t the original node find the previous revisions? After all, those records were deleted on Node A…
Now it looks like the replication is still working because newly created documents on Node A get replicated to node B. However, Node B is not deleting any document…
Could it be related to the try catch block I added to ignore duplicate revision on deletion? I stopped the exception but the document was not deleted?
This is normal, and required in the replication process. Otherwise there is no difference between “deleted” and “never existed” or simply “unchanged.” Deleting does not remove items from the database, it simply makes a new revision with a deleted flag.
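A toy model may help picture this. The Python sketch below is not how the library is implemented; it is a deliberately simplified, linear-history store (all names are hypothetical) in which a delete appends a new revision flagged as deleted, and writing on top of a parent revision the store has never seen raises an error analogous to “Previous revision not found”.

```python
class PreviousRevisionNotFound(Exception):
    """Raised by the toy store when the parent revision is unknown."""


class ToyStore:
    def __init__(self):
        # doc ID -> list of {"rev", "deleted", "body"} in insertion order
        self._history = {}

    def put(self, doc_id, prev_rev, body=None, deleted=False):
        history = self._history.get(doc_id, [])
        known = [entry["rev"] for entry in history]
        if prev_rev is not None and prev_rev not in known:
            raise PreviousRevisionNotFound(f"{doc_id}: {prev_rev}")
        # The new generation is the parent's generation plus one.
        gen = 1 if prev_rev is None else int(prev_rev.split("-")[0]) + 1
        rev = f"{gen}-toy"
        self._history.setdefault(doc_id, []).append(
            {"rev": rev, "deleted": deleted, "body": body}
        )
        return rev

    def delete(self, doc_id, prev_rev):
        # Deleting adds a revision with the deleted flag; nothing is removed.
        return self.put(doc_id, prev_rev, deleted=True)

    def is_deleted(self, doc_id):
        history = self._history.get(doc_id)
        return bool(history) and history[-1]["deleted"]

    def revision_count(self, doc_id):
        return len(self._history.get(doc_id, []))
```

In this model, creating a document and then deleting it leaves two revisions behind, and the second one is the marker that a replication would need to carry to the other side so that it also learns about the delete.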
So to make another summary this is what is likely happening?
A and B are pushing and pulling to each other
B goes offline
Question: At this point, does A create new documents, process them, and delete them as a result of the REST call? Or does it create new revisions on an existing document, process them, and then delete the document?
B comes online and starts replicating again, and receives the history of the deleted documents from A. But at this point why does it have documents on it that are unique to B if no requests are being served by B?
My recommendation is that you don’t set up a push replication from A to B. If B is just a backup, then all it needs to do is pull from A normally, and push in case A comes back online after an outage. Or, alternatively, you could just set up a push from A → B and B → A. Ideally you would use Sync Gateway to sync between the two machines instead of P2P, since it allows for a more robust solution.
I understand. However, in my case, those documents got replicated over as existing docs (not deleted), so maybe the revision associated with the delete didn’t get replicated over…
A creates new documents, processes them, and then deletes them using the .NET API.
There were no documents on B before it came back online. Those two documents were replicated across from A but somehow showed up on B as not deleted. So the two documents exist on B but not on A.
I will give this a try. The idea is if A goes offline, we will switch to use B. So B should be able to push changes to A as well when A comes back online. Both nodes need to be able to do the same thing when it comes to replication.
If it is a continuous push then it will try again at 60 second intervals to push things that it was unable to push, but if there are changes to the database in the meantime it will push them first.
Although this works a little more reliably if it doesn’t immediately fail upon start. The scenario where it tries to start a replication against an already offline endpoint is a scenario that deserves a little more looking into I think.
OK, here is a bit more detail from my testing. When I get the changes from Node A (which I believe is how the replication detects changes), I found this.
So -AoWVxTpIuka3yvx36udp7g was deleted from Node A.
However, the document was replicated to Node B and was not deleted from Node B after it was deleted on Node A. The changes metadata I got back from A did mark the item as deleted…
I’m having trouble understanding this part. So did you get a changes feed from Node B that had an entry for the same document ID (-AoWVxTpIuka3yvx36udp7g) but with a lower revision (one starting with 2 or 1)?
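One way to answer that kind of question is to compare the two feeds side by side. Here is a minimal Python sketch, assuming each feed has been saved as a CouchDB-style dictionary with a "results" list whose entries carry "id", "changes" (a list of {"rev": ...}) and an optional "deleted" flag. The function names and any sample IDs used with it are hypothetical.

```python
def latest_state(feed):
    """Map each document ID to (revision ID, deleted flag) from a changes feed."""
    state = {}
    for entry in feed["results"]:
        # Later entries for the same ID overwrite earlier ones.
        rev = entry["changes"][0]["rev"]
        state[entry["id"]] = (rev, entry.get("deleted", False))
    return state


def deleted_on_a_but_live_on_b(feed_a, feed_b):
    """List (doc ID, rev on A, rev on B) where A has a tombstone and B does not."""
    state_a = latest_state(feed_a)
    state_b = latest_state(feed_b)
    mismatches = []
    for doc_id, (rev_a, deleted_a) in sorted(state_a.items()):
        if not deleted_a or doc_id not in state_b:
            continue
        rev_b, deleted_b = state_b[doc_id]
        if not deleted_b:
            mismatches.append((doc_id, rev_a, rev_b))
    return mismatches
```

Each tuple in the result shows the revision each node currently holds for the same document, so a lower generation on B would indicate that the deletion revision never arrived there.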