Document Modeling

Dear Couchbase Colleges,

I got stuck in a problem and I'm not sure I'm handling it property.

In my relational database I have PRODUCTS, COMPATIBILITY, DEVICES, PRODUCT_GROUP and REGION. Products has a relation N:M to Devices using Compatibility table to make the association. And each product can be sold in one or more region being associated using PRODUCT_GROUP (informing the group in which the product will be placed in that region). So I designed my documents like:

{ type:"PRODUCT",
docId:"pd1",
name:"Product Name",
groups: {"rgid1":"2013","rgid1":"Comedy" ...},
compatibility: ["cp1","cp3"]}
 
{ type:"COMPATIBILITY",
docId:"cp1",
name:"Compatibility Name",
provisioningKey:"XY012301CCD",
devices: ["dv1", "dv3"]}

This would be my first idea, I just separated the compatibility record because have a property (provisioningKey) that is specific for each compatibility record and changes each 2 days, this would then avoid to update all documents with the new key. My map function with this approach was something like:

function (doc, meta) {
  for (i in doc.groups) {
    for (j in doc.region[i].compatibility) {
      emit([doc.region[i].compatibility[j], i, doc.groups[i]], 1);
    }
  }
}

And using _count reduce function. The number of devices associated to compatibility records is huge, and the idea was to fetch all compatibility records associated to the device and use it to fetch all compatible products using this view, for example: I retrieve from another view that Device dv03 is assocated to cp1 and cp3, in a second step I use it to fetch all products from the given region for groups from A to B that have cp1 or cp3 in the complex key. Is this possible? Is there a simple/better way to solve this?

My second approach would be to integrate the device with my product like this:

{ type:"PRODUCT",
docId:"pd1",
name:"Product Name",
groups: {"rgid1":"2013","rgid1":"Comedy"},
devices: ["dv1", "dv2", "dv3"]}
 
{ type:"COMPATIBILITY",
docId:"cp1",
name:"Compatibility Name",
provisioningKey:"XY012301CCD"}

And the map-function like:

function (doc, meta) {
  for (i in doc.groups) {
    for (j in doc.region[i].devices) {
      emit([doc.region[i].devices[j], i, doc.groups[i]], 1);
    }
  }
}

This is ok for a few hundred of products. Each product is associated at the end with more than 3000 devices, and the indexing time will take too much time when importing new bunch of products to the database. The import of new products happens at least twice a day. Considering this I would think the first approach is better since new products are indexed easily by using compatibility keys instead of each device associated to the compatibility record.

Does someone faced the same problem? In this case what was the solution? Or maybe someone has a better solution/idea for the problem I exposed?

1 Answer

« Back to question.

Moving from the RDBMS world to a scheme free world can be tough.

The jest of your question is why does it take so long to create an pre-built index between two tables for every possible combination between the two tables respective to their foreign keys.

In your 1st method of having the data in two different documents( like two tables) when you did a join in your RDBMS the system would do lots of the leg work for you.
ie. scan the first table bring back a small result set ... scan the 2nd table bring back the result set ... compare the two results and bring back where they match. Boom you have your info.

In your 2nd method of having the data nested inside the document. So you can have redundant data between two or more products. Example if you had a "DocumnetType":"product" and had an array of possible states sold "states":["AZ","AR".....]
lot of your "DocumentType":"product" would have the same state data,but hard drives are cheap so its not that big of a deal.

So back to your original questions. Can I make it faster. Answer: Maybe.

1. Spinning disk to SSD & more CPU. 4 SSD with RAID 1+0 will take you from 100MB/sec-200MB/sec to 1.5GB/sec to 2GB/sec

2. play around with making your indexes "skinny". Do you need every possible combination or can you do an array with in array like emit( ["main_key","2nd_key",["item1","item2", ........ ]] , 1)

3. Put some logic inside the map function to not index (emit) items that you don't need, EX.

if(doc,meta){
 
if(doc.DocumentType == 'products' ){
 
  if(doc.Region =="East Coast"){
  emit( doc.LotOfDataHere,1)
   }
     if(doc.Region =="West Coast"){
  emit( doc.NotSoMuchDataHere,1)
   }
}
}

4. Stream your data into RDBMS to CouchBase ETL tool. try http://blog.couchbase.com/mysql-couchbase-using-talend-etl