Strategy for overcoming the key length limit?

How do I enforce a unique constraint in Couchbase where the combined url and siteName values must be unique, but the total length of url and siteName can exceed Couchbase's key length limit (250 bytes)?

{
    "url": "http://google.com",
    "siteName": "google.com",
    "data": {
        // more properties
    }
}

I currently have a few solutions in mind, but I don't think any of them is good enough.

Solution 1: The document key is the SHA-1 hash of url + siteName.

Advantages: easy to implement.
Disadvantages: collisions can occur.
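A minimal sketch of this key derivation (the `\x00` separator is my own addition to avoid ambiguity between, say, `("ab", "c")` and `("a", "bc")`):

```python
import hashlib

def make_key(url: str, site_name: str) -> str:
    # Derive a fixed-length document key from the potentially very long
    # url + siteName combination. The separator prevents different
    # (url, siteName) pairs from concatenating to the same string.
    raw = url + "\x00" + site_name
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()

key = make_key("http://google.com", "google.com")
# A SHA-1 hex digest is always 40 characters, well under the key limit.
```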

Solution 2: The document key is hash(*url* + *siteName*) + *index*.
This is the same as Solution 1, but the key includes an *index* in case a collision occurs.

Retrieving a document by *url* + *siteName* takes the following steps:

1. Set *index* to 0.
2. Get the document with key hash(*url* + *siteName*) + *index*.
3. Does the document's *url* + *siteName* match?
4. If yes, return the document.
5. If no, increment *index* and go back to step 2.

This is my favorite solution so far.
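The retrieval steps above can be sketched as follows. A plain dict stands in for the Couchbase bucket (with a real SDK the lookup would be a get call), and the `\x00` separator inside the hash input is my own addition:

```python
import hashlib

def make_key(url: str, site_name: str, index: int) -> str:
    # Key = hash(url + siteName) + index, per Solution 2.
    digest = hashlib.sha1(f"{url}\x00{site_name}".encode("utf-8")).hexdigest()
    return f"{digest}{index}"

def get_document(bucket: dict, url: str, site_name: str):
    index = 0
    while True:
        doc = bucket.get(make_key(url, site_name, index))
        if doc is None:
            return None  # nothing stored under this url/siteName
        if doc["url"] == url and doc["siteName"] == site_name:
            return doc  # step 4: url + siteName matches
        index += 1  # collision: probe the next index (back to step 2)
```

Note that the loop only terminates on a miss or a match, so deleting a document in the middle of a probe chain would orphan the later entries; the chain would need compacting on delete.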

Solution 3: Allow duplicates, then delete them at a later time.

In this solution, the unique constraint is enforced by the application server. The key is just a GUID or timestamp and is NOT referenced by other documents.

  1. To add a document, the application server:
    1. Searches for existing documents that have the url and siteName. If a document is found, the operation fails.
    2. Inserts the document.
  2. To update a document, the application server:
    1. Searches for existing documents that have the url and siteName. If no document is found, the operation fails.
    2. Updates only the latest (last inserted) document.
  3. To search for a document by url and siteName, the application server:
    1. Returns only the latest (last inserted) document that has the url and siteName.
  4. A background job regularly scans for documents added in the last X minutes and deletes the older duplicates.
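A rough sketch of this flow, with in-memory stand-ins for the bucket (a dict of GUID keys) and for the view query (`find_by_url_site`, a name I made up for illustration):

```python
import time
import uuid

def find_by_url_site(bucket: dict, url: str, site_name: str) -> list:
    # Plays the role of a Couchbase view query: latest document first.
    matches = [d for d in bucket.values()
               if d["url"] == url and d["siteName"] == site_name]
    return sorted(matches, key=lambda d: d["inserted_at"], reverse=True)

def add_document(bucket: dict, doc: dict) -> None:
    if find_by_url_site(bucket, doc["url"], doc["siteName"]):
        raise ValueError("duplicate url + siteName")  # step 1.1: fail
    doc["inserted_at"] = time.time()
    bucket[str(uuid.uuid4())] = doc  # key is just a GUID

def dedupe(bucket: dict) -> None:
    # Background job: keep only the latest document per (url, siteName).
    seen = set()
    for key, doc in sorted(bucket.items(),
                           key=lambda kv: kv[1]["inserted_at"],
                           reverse=True):
        pair = (doc["url"], doc["siteName"])
        if pair in seen:
            del bucket[key]  # an older duplicate
        else:
            seen.add(pair)
```

The check in `add_document` and the insert are not atomic, which is exactly why the background `dedupe` pass is needed: two concurrent writers can both pass the check.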

I am a NoSQL n00b! How can I enforce unique constraints in Couchbase? Thanks!

1 Answer


Personally, given how incredibly unlikely a collision is with SHA-256, I would go for Solution 1 and deal with a collision if it should ever come up (which I don't expect) by showing an error and letting monitoring catch it for me to resolve.

You are right that a unique constraint is best enforced through the key. Use Add to insert documents: Add enforces uniqueness on the server side and throws an error in case of a key clash.

Solution 2 will work, but it complicates things a lot for a very small chance of a clash. I would also suggest a simpler alternative: instead of appending an index, store an array of documents under the key. In the common case the element at index 0 is the match; only when there is more than one element do you need to compare the stored url and siteName.
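A sketch of that array-under-one-key variant, again with a dict standing in for the bucket (a real implementation would need a CAS or locking operation to append atomically):

```python
import hashlib

def make_key(url: str, site_name: str) -> str:
    return hashlib.sha1(f"{url}\x00{site_name}".encode("utf-8")).hexdigest()

def get_document(bucket: dict, url: str, site_name: str):
    # Each key maps to a list of documents. In the common case the list
    # has exactly one element, so no field comparison is needed.
    docs = bucket.get(make_key(url, site_name), [])
    if len(docs) == 1:
        return docs[0]
    for doc in docs:  # rare collision: compare the stored url/siteName
        if doc["url"] == url and doc["siteName"] == site_name:
            return doc
    return None

def add_document(bucket: dict, doc: dict) -> None:
    key = make_key(doc["url"], doc["siteName"])
    docs = bucket.setdefault(key, [])
    if any(d["url"] == doc["url"] and d["siteName"] == doc["siteName"]
           for d in docs):
        raise ValueError("duplicate url + siteName")
    docs.append(doc)
```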

Solution 3 seems to rely on views and does some complicated processing that adds significant overhead, so I would advise against it for performance reasons.

Hope I could help out a bit.