
Some newbie questions (Couchbase and NoSQL).

Sat, 01/07/2012 - 10:33
drakmir
Joined: 01/07/2012

First off, I'm really intrigued by the Couchbase system and what it seems to offer in terms of scalability and high availability.

I'm looking at converting an existing ASP.NET solution to Couchbase. The current data store is SQL Server, and we are running into the typical "scale up" issues associated with a system like this. We don't yet have a caching layer between our application and the database. I'm researching the alternatives (including denormalizing our data, memcached, sharding, etc.). Everything seems to point toward an approach like Couchbase, although I'm not yet convinced our data fits the "document" model.

I apologize for the density of this post. I also know that many of the questions are not Couchbase-specific, so if this isn't the best forum for them, I apologize (and if someone could redirect me to a better place to ask, I'll gladly go there). Then again, I assume these kinds of questions are common among people looking at Couchbase for the first time.

Here are my questions:

1) Is there any good reference of how to model a "document oriented" business model? My team and I are very used to relational models and are having some difficulty determining how to model some concepts in the document paradigm.

My concern is that I'm not sure NoSQL is the best approach for us. We have a lot of interrelated data in our database that is accessed individually. In other words, there are lots of small "documents/objects" with relationships between them. Below is a specific use case illustrating one of the many issues that make me worried about the NoSQL approach.

UPDATE: Just saw the 8-page white paper on the main website. I'm reading through it now to see if it helps.
UPDATE2: It was informative, but a bit too high-level to help with this particular question. I'm still searching.

2) We are considering a more event-driven approach to delivering data to our end users. This is something the RDBMS doesn't help us with at all. Does Couchbase support the /db/_changes API? It seems perfect for delivering events to a web front end, so we can keep our UI up to date with things happening elsewhere in the system. If it does support it, how does one specify the "filter functions" needed to use it well? (The current design document UI doesn't appear to have a section for filters.)

A) Is TAP the right way to do this? It appears that TAP is per server, so would I have to open connections to every server in my cluster? And how does that deal with replicas (where multiple servers are writing the same document)?
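For reference, CouchDB's `_changes` feed takes a `filter=designdoc/filtername` parameter naming a function stored under `filters` in a design document; the function receives each changed document plus the request and returns true to include it. Whether Couchbase Server exposes the same mechanism is exactly the open question, so this is only a sketch of the CouchDB convention. The `type` and `userId` fields, the `events/byUser` names and the `user` query parameter are hypothetical. In a real design document the function is stored as a source string; here it is a plain function so it can be exercised locally.

```javascript
// CouchDB-style design document holding a _changes filter (names are hypothetical).
const designDoc = {
  _id: "_design/events",
  filters: {
    // Pass through only event documents addressed to the user named in the
    // request's query string (e.g. ?filter=events/byUser&user=alice).
    byUser: function (doc, req) {
      return doc.type === "event" && doc.userId === req.query.user;
    },
  },
};

// Local simulation: apply the filter to a batch of changed documents.
const changes = [
  { _id: "e1", type: "event", userId: "alice" },
  { _id: "e2", type: "event", userId: "bob" },
  { _id: "o1", type: "order", userId: "alice" },
];
const req = { query: { user: "alice" } };
const delivered = changes.filter((doc) => designDoc.filters.byUser(doc, req));
console.log(delivered.map((d) => d._id)); // [ 'e1' ]
```

Filtering on the server side means the web front end only receives the changes it will actually render, rather than every write in the database.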

3) While moving to this new architecture, we will have to run in a "hybrid" mode where we monitor what is happening in our relational database, since the entire application can't be moved to the new paradigm at once. My thought was to keep "proxy" objects in Couchbase during this period (basically the primary keys of the tables needed to construct each document) and keep the real data in the relational database.

A) Are there any "best practices" for this out in the real world? I assume this isn't much different from the "cache coherency" issue with memcached. Is it viable to keep fairly rapidly changing data in the "cache"? (Can anyone point me to material on keeping memcached objects up to date with relational data, or is the typical use case only rarely edited objects that are refreshed on expiration?)

B) Is it better to model the documents as DAO-style objects that read from and write to the underlying relational database, and then move to Couchbase Server once the entire solution has been converted to use those more document-like DAOs?
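The usual memcached answer to the coherency question is cache-aside with invalidation on write. Below is a minimal sketch of that pattern combined with the "proxy" idea from question 3: the proxy document holds only primary keys, reads assemble the full document from relational rows and cache it, and writes update the relational store first and then drop the cached copy. Both stores are in-memory `Map` stand-ins, and the `customers`/`orders` tables and key names are hypothetical.

```javascript
// In-memory stand-ins (hypothetical): `sql` plays the relational database,
// `cache` plays Couchbase/memcached.
const sql = {
  customers: new Map([[7, { id: 7, name: "Acme" }]]),
  orders: new Map([[42, { id: 42, customerId: 7, total: 100 }]]),
};
const cache = new Map();

// A "proxy" document stores only the primary keys needed to build the real object.
cache.set("proxy:order:42", { orderId: 42, customerId: 7 });

// Cache-aside read: return the assembled document if cached; otherwise build it
// from the relational rows the proxy points at and cache the result.
function getOrder(orderId) {
  const key = `order:${orderId}`;
  if (cache.has(key)) return cache.get(key);
  const proxy = cache.get(`proxy:order:${orderId}`);
  const order = sql.orders.get(proxy.orderId);
  // Copy the row so later SQL updates don't silently leak into the cached document.
  const doc = { ...order, customer: sql.customers.get(proxy.customerId) };
  cache.set(key, doc);
  return doc;
}

// Write path: update the system of record first, then invalidate the cached
// document so the next read rebuilds it from current data.
function updateOrderTotal(orderId, total) {
  sql.orders.get(orderId).total = total;
  cache.delete(`order:${orderId}`);
}

getOrder(42);              // caches total 100
updateOrderTotal(42, 250); // SQL updated, cached copy dropped
console.log(getOrder(42).total); // 250
```

Invalidating rather than updating the cached copy keeps the cache from holding a value the database never committed. It still leaves a window where a concurrent reader re-caches the old row, and any process that writes to SQL without going through this path bypasses the invalidation entirely, which is the core difficulty of the hybrid period.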

4) Any advice on how to model a message queue system using Couchbase? It seems well suited to this purpose. Most of the service bus / message queue systems I've researched for .NET don't seem to handle node failure very well. (UPDATE: See question 5 below for my concerns about Couchbase's ability to handle node failure.) It seems like a good fit for our messaging layer even if I choose not to store the actual data objects in it.
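One common way to build a queue on a memcached-style key-value store uses only atomic `incr` and `add`: producers `incr` a tail counter to get a sequence number and store the message under it, and consumers claim a sequence number with `add`, which succeeds for exactly one caller. The sketch below assumes that approach; `FakeKV` is a hypothetical in-memory stand-in, not a Couchbase client API, and the `queue:tail` / `queue:msg:N` / `queue:claim:N` key layout is illustrative.

```javascript
// Hypothetical in-memory stand-in for a key-value store with memcached-style
// atomic operations; a real client would send these to the cluster.
class FakeKV {
  constructor() { this.data = new Map(); }
  get(key) { return this.data.get(key); }
  set(key, value) { this.data.set(key, value); }
  // add: store only if the key is absent; returns false if it already exists.
  add(key, value) {
    if (this.data.has(key)) return false;
    this.data.set(key, value);
    return true;
  }
  // incr: increment a counter (treated as 0 if absent) and return the new value.
  incr(key) {
    const next = (this.data.get(key) || 0) + 1;
    this.data.set(key, next);
    return next;
  }
  delete(key) { this.data.delete(key); }
}

// Producer: allocate a sequence number, then write the message under it.
function enqueue(kv, queue, message) {
  const seq = kv.incr(`${queue}:tail`);
  kv.set(`${queue}:msg:${seq}`, message);
  return seq;
}

// Consumer: claim the next unclaimed sequence number with `add` (only one
// consumer can win each claim); return null when nothing is left.
function dequeue(kv, queue) {
  const tail = kv.get(`${queue}:tail`) || 0;
  for (let seq = (kv.get(`${queue}:head`) || 0) + 1; seq <= tail; seq++) {
    if (kv.add(`${queue}:claim:${seq}`, true)) {
      const message = kv.get(`${queue}:msg:${seq}`);
      kv.delete(`${queue}:msg:${seq}`);
      kv.set(`${queue}:head`, seq); // scan hint only; claims are authoritative
      return message;
    }
  }
  return null;
}

const kv = new FakeKV();
enqueue(kv, "jobs", "a");
enqueue(kv, "jobs", "b");
console.log(dequeue(kv, "jobs"), dequeue(kv, "jobs"), dequeue(kv, "jobs")); // a b null
```

The gap between the producer's `incr` and its `set` is a real race: a consumer can claim a slot whose message has not been written yet, and this sketch would return `undefined` for it, so real code needs to wait and retry that read. The queue is also only as durable as the store underneath it, which ties back to question 5.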

5) I'm concerned about durability of the data in couchbase.
A) Is it true that objects are only written to RAM and then spooled to disk later?
B) Is there an option to require the write to disk to happen?
C) Any support for Quorum writes / reads?
D) Auto-failover seems limited in the product. I understand the desire to keep a network partition or a thundering herd from ruining your day, but what if the deployed solution doesn't have a dedicated, Couchbase-trained DBA? I understand having to rebalance manually when adding nodes, but requiring manual rebalancing when cluster nodes go down seems shortsighted. Am I misreading this, or does the cluster re-elect a new master from the replicas when it detects that a node is down? Is this due to a lack of quorum-based reads/writes, so you can't tell whether the replicas are actually consistent? (This is a major disappointment for me right now, so I hope I'm wrong about the cluster not healing itself.)
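The quorum question in 5C comes down to a piece of arithmetic from Dynamo-style systems: with N copies of a document, a write acknowledged by W copies and a read that consults R copies are guaranteed to share at least one copy when R + W > N. The sketch below only illustrates that general rule; it is not a description of what Couchbase implements, and the function names are hypothetical.

```javascript
// Dynamo-style quorum arithmetic: with n copies, a read quorum r and a write
// quorum w always share at least one copy when r + w > n, so the read is
// guaranteed to see at least one copy of the latest acknowledged write.
function quorumsOverlap(n, r, w) {
  return r + w > n;
}

// Minimum number of copies present in both quorums (0 means no guarantee).
function guaranteedOverlap(n, r, w) {
  return Math.max(0, r + w - n);
}

// Majority quorum for n copies.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

// With 3 copies, majority reads and majority writes always overlap.
const n = 3;
console.log(majority(n), quorumsOverlap(n, majority(n), majority(n))); // 2 true
// Writing to one copy and reading from an arbitrary single copy gives no guarantee.
console.log(quorumsOverlap(n, 1, 1)); // false
```

This is also why quorum systems can tolerate a failed node without a coordinator deciding which replica is current: as long as a majority survives, every read quorum still intersects every acknowledged write.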

6) I'm seeing people report problems getting the view layer to return correct data quickly. I was planning to use views extensively to implement business logic. Is that the wrong approach? Is it better to build "lists" in the datastore (similar to how Membase seems to work) rather than relying on views? Do views only return data that has been written to disk?

7) I've seen some suggestions to link Couchbase to Hadoop for more complex or ad-hoc queries. Is that a "best practice", or more of a hack/kludge?

8) When using the Java client (spymemcached), I noticed that it makes a connection to every node in the cluster. That seems reasonable when the cluster is small, but how does it work with hundreds of servers in a cluster? (Am I doing the wrong thing by using http://localhost:8091/pools?) Is there a way to use multicast UDP instead of TCP? Or is this a non-issue?
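The per-node connections follow from how "smart" clients route keys: each key is hashed to one of a fixed number of vBuckets, and a cluster map (fetched via the `/pools` REST endpoint) says which server currently owns each vBucket, so the client talks to the owning node directly. The sketch below assumes a CRC32-based key-to-vBucket hash; the exact bit selection `(crc >>> 16) & 0x7fff`, the 1024 vBucket count, the round-robin map and the `node1:11210`-style addresses are all assumptions for illustration.

```javascript
// Bitwise CRC-32 (IEEE polynomial, reflected form 0xEDB88320).
function crc32(str) {
  let crc = 0xffffffff;
  for (const byte of Buffer.from(str, "utf8")) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) {
      crc = crc & 1 ? (crc >>> 1) ^ 0xedb88320 : crc >>> 1;
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Assumed key -> vBucket hash: take bits 16..30 of the CRC, then reduce modulo
// the number of vBuckets configured for the bucket.
function vbucketForKey(key, numVbuckets) {
  return ((crc32(key) >>> 16) & 0x7fff) % numVbuckets;
}

// Hypothetical cluster map: vBucket id -> index of the server that owns it.
const servers = ["node1:11210", "node2:11210", "node3:11210"];
const numVbuckets = 1024;
const vbucketMap = Array.from({ length: numVbuckets }, (_, vb) => vb % servers.length);

function serverForKey(key) {
  return servers[vbucketMap[vbucketForKey(key, numVbuckets)]];
}

for (const key of ["user:1", "user:2", "order:42"]) {
  console.log(key, "->", serverForKey(key));
}
```

Because any key can land on any node, a client that routes directly without a proxy needs a connection to each node that owns vBuckets; the number of connections grows with the number of nodes, not with the number of keys or vBuckets.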

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A scenario that keeps me up at night about NoSQL (hopefully insight into this will help me wrap my brain around NoSQL):

I have a set of objects arranged in a somewhat flexible hierarchy (not a true tree, since some objects can have multiple "parents"). We query through this hierarchy constantly, and access needs to be fast. This data is also "semi-rigid": it doesn't change often, but when it does, it has to change consistently.

Example:

Object type F is a leaf
Object type E can contain F
Object type D can contain E
Object type C can contain F
Object type B can contain D
Object type A can contain B, C

So, a typical hierarchy would look something like this:

A1
|---B1
|    |
|    |---D1
|    |   |----E1
|    |   |     |---F1
|    |   |     |---F2
|    |   |----E2
|
|---C1
|    |--F1
|    |--F2
...

We query for any of the sub-objects by any of the objects above them in the hierarchy.
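The containment rules and the example hierarchy above can be written down directly as data, which makes the multi-parent structure explicit (F1 and F2 each have two parents, E1 and C1, so this is a DAG rather than a tree). This is a minimal sketch; deriving an object's type from the first letter of its id is a simplification that only holds for this example.

```javascript
// Containment rules from the example: parent type -> child types it may contain.
const canContain = {
  A: ["B", "C"],
  B: ["D"],
  C: ["F"],
  D: ["E"],
  E: ["F"],
  F: [],
};

// Simplification for this example: the type is the id's leading letter ("D1" -> "D").
const typeOf = (id) => id[0];

// Direct parent links only.
const parents = {
  B1: ["A1"], C1: ["A1"], D1: ["B1"],
  E1: ["D1"], E2: ["D1"],
  F1: ["E1", "C1"], F2: ["E1", "C1"],
};

// Return every parent link that violates the containment rules.
function invalidLinks(links) {
  const bad = [];
  for (const [child, ps] of Object.entries(links)) {
    for (const p of ps) {
      if (!canContain[typeOf(p)].includes(typeOf(child))) bad.push(`${p} -> ${child}`);
    }
  }
  return bad;
}

console.log(invalidLinks(parents)); // []
```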

Is it better for a deeply held object to be modeled like:

F1 { parentE: E1, parentD: D1, parentB: B1, parentC: C1, parentA: A1 }

or only hold its direct parents?

F1 { parentE: E1, parentC: C1 }
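With the first (denormalized) layout, a view can index each object under every ancestor it lists, so "everything under B1" becomes a single key lookup. The sketch below assumes the two-argument `function (doc, meta)` map signature with the document id in `meta.id`; treat that signature as an assumption about the Couchbase 2.0 view engine. `emit` is stubbed so the map function can run locally, and the sample documents are taken from the example hierarchy.

```javascript
// Local stub for the view engine's emit(): collect rows instead of indexing them.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: index each document under every ancestor it references,
// so querying the view with key = "B1" returns everything beneath B1.
function map(doc, meta) {
  for (const field of ["parentA", "parentB", "parentC", "parentD", "parentE"]) {
    if (doc[field]) emit(doc[field], meta.id);
  }
}

const docs = [
  { meta: { id: "F1" }, doc: { parentE: "E1", parentD: "D1", parentB: "B1", parentC: "C1", parentA: "A1" } },
  { meta: { id: "E1" }, doc: { parentD: "D1", parentB: "B1", parentA: "A1" } },
  { meta: { id: "C1" }, doc: { parentA: "A1" } },
];
for (const { doc, meta } of docs) map(doc, meta);

// Equivalent of querying the view with key "B1".
console.log(rows.filter((r) => r.key === "B1").map((r) => r.value)); // [ 'F1', 'E1' ]
```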

The first one seems very handy for the typical use case of searching by any of the parents, whereas with the second one it seems I'd need multiple queries to find what I'm looking for, possibly with a very wide search. But what happens if I move D1 to a different parent? Then I have to update every document in the system under D1 to change its "parentB" and "parentA" fields.
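The cost of the second layout shows up when resolving all ancestors of an object: each level of the hierarchy needs another round of lookups. The sketch below walks the direct-parent links breadth-first, counting each `fetch` as one round trip; the in-memory `store` and the `fetch` helper are hypothetical stand-ins for key-value gets.

```javascript
// Direct-parent-only documents (second layout), keyed by id.
const store = new Map([
  ["A1", { parents: [] }],
  ["B1", { parents: ["A1"] }],
  ["C1", { parents: ["A1"] }],
  ["D1", { parents: ["B1"] }],
  ["E1", { parents: ["D1"] }],
  ["F1", { parents: ["E1", "C1"] }],
]);

let fetches = 0;
// Hypothetical stand-in for a key-value get; each call is one round trip.
function fetch(id) {
  fetches++;
  return store.get(id);
}

// Walk up the parent links level by level, collecting every ancestor once
// (the Set handles multiple paths to the same ancestor, e.g. A1 via B1 and C1).
function ancestors(id) {
  const seen = new Set();
  let frontier = fetch(id).parents;
  while (frontier.length > 0) {
    const next = [];
    for (const p of frontier) {
      if (seen.has(p)) continue;
      seen.add(p);
      next.push(...fetch(p).parents);
    }
    frontier = next;
  }
  return [...seen];
}

console.log(ancestors("F1"), fetches); // [ 'E1', 'C1', 'D1', 'A1', 'B1' ] 6
```

Batching each level into one multi-get would cut this to one round trip per level (four here) instead of one per object, but that is still several round trips, versus a single read of F1 with the denormalized layout.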

I'm okay with that update (since getting the list of objects under D1 seems like it would be easy), but I can't see a way to do it atomically in Couchbase.

Alternatively, if I were to store the objects in the simpler format, I can't see a way to use Map-Reduce to help with this.

Is there a way to use the "Map/Reduce" concept to do a bulk update across the cluster? Something like:

function(doc)
{
  // Desired effect: move everything under D1 beneath B2 (and therefore A2).
  if (doc.parentD === "D1")
  {
    doc.parentB = "B2";
    doc.parentA = "A2";
  }
}
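A view's map function only reads documents and emits index rows; it cannot write changes back, so a bulk update like the one above has to be driven from the client: find the documents under D1, then rewrite each one. The sketch below uses an in-memory `Map` as a stand-in for the bucket and applies an optimistic compare-and-swap (CAS) check per document, retrying a document if it changed between read and write. The `put`/`casPut` helpers are hypothetical, and scanning every key stands in for querying a view keyed on `parentD`.

```javascript
// Hypothetical in-memory bucket: key -> { value, cas }.
const bucket = new Map();
let casCounter = 0;
function put(key, value) {
  bucket.set(key, { value, cas: ++casCounter });
}
// Write only if the document has not changed since it was read (optimistic CAS).
function casPut(key, value, expectedCas) {
  const current = bucket.get(key);
  if (!current || current.cas !== expectedCas) return false;
  bucket.set(key, { value, cas: ++casCounter });
  return true;
}

put("E1", { parentD: "D1", parentB: "B1", parentA: "A1" });
put("F1", { parentE: "E1", parentD: "D1", parentB: "B1", parentA: "A1" });
put("F9", { parentE: "E7", parentD: "D4", parentB: "B3", parentA: "A1" });

// Client-side version of the map function above: move everything under `d`
// to newB/newA, one document at a time, retrying a document on CAS conflict.
function reparentUnder(d, newB, newA) {
  let updated = 0;
  for (const key of bucket.keys()) {
    while (true) {
      const { value, cas } = bucket.get(key);
      if (value.parentD !== d) break;
      if (casPut(key, { ...value, parentB: newB, parentA: newA }, cas)) {
        updated++;
        break;
      }
    }
  }
  return updated;
}

console.log(reparentUnder("D1", "B2", "A2")); // 2
```

Each CAS write is atomic only for its own document. If the client stops midway, some documents point at B2 and others still at B1; because the update is idempotent it can simply be re-run, but nothing here makes the whole move atomic across documents.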

Am I looking at the problem incorrectly? I'm starting to feel that the document store approach isn't going to work well for highly hierarchical data like this. Then again, this whole paradigm is new to me, so I may be overthinking it.

Thanks for any insight you can provide to any of these questions!
