
Couchbase views - just how far can I push this thing :) ... 10's or 100's of millions of index entries?

4 replies
Mon, 08/20/2012 - 21:37
Tim Pedersen
Offline
Joined: 08/16/2012
Groups: None

I'm wondering just how far I can push Couchbase views. What are the limits of the technology with regard to a realistic maximum number of design documents, and of views within a design document? And what is a realistic maximum number of elements, and overall amount of data, that I can store in a view index and still get reliable performance?

Are we talking millions, tens of millions or hundreds of millions of possible elements/nodes in a view index? How many views containing large numbers of elements are possible in a single bucket?

How far have Couchbase views been stress tested? What is the largest amount of data and number of elements seen in the field so far?

How far should I realistically push Couchbase views/indexes - am I pushing it too far to have dozens of views in a single bucket with 50-100 million indexed elements/nodes each?

My use case:

I'm primarily interested in grouping related data through the use of collated views. I'm mostly only creating views that have null values, as I'm only interested in the collated keys and the doc._ids.

I'm currently prototyping a database that contains around 2 million entities and around 8 million 'facts' relating to these entities. The database is around 4GB on disk. I'm using collated views to relate/aggregate the entities and their facts. (See any recent presentations by Rich Hickey about value-oriented and fact-oriented programming and databases for the rationale). Essentially I'm creating a precalculated query dataset implemented via views, using map functions to crawl the database and generate and maintain the query dataset.

For example, a typical view is:
function (doc) {
  var key;
  if (doc.schemaType == "person") {
    // The person entity itself sorts first within its key group.
    emit([doc._id, 0], null);
  } else if (doc.schemaType == "personLink") {
    // A person-to-person link is collated under both ends.
    key = "person:" + doc.fromPersonId;
    emit([key, 1], null);
    key = "person:" + doc.toPersonId;
    emit([key, 2], null);
  } else if (doc.schemaType == "personIdentityLink") {
    key = "person:" + doc.personId;
    emit([key, 3], null);
  } else if (doc.schemaType == "personAddressLink") {
    key = "person:" + doc.personId;
    emit([key, 4], null);
  } else if (doc.schemaType == "personSystemLink") {
    key = "person:" + doc.personId;
    emit([key, 5], null);
  }
}

... a pretty standard collated view. There is a 'person' entity and lots of 'facts' about what this person is linked to. I have similar views, e.g. collating on addresses and the entities linked to them.
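To make the range queries against such a view concrete, here is a minimal sketch of how the start and end keys for one person could be built. It assumes the [personKey, n] key layout emitted by the map function above and standard view collation (a shorter array sorts before a longer one with the same prefix, and an object sorts after any number). The helper names personRange and toQueryString are hypothetical, not part of any Couchbase API.

```javascript
// Build startkey/endkey covering every row for one person in the collated view.
// Assumes keys of the form [personKey, n], as emitted by the map function above.
function personRange(personKey) {
  return {
    startkey: [personKey],     // sorts before [personKey, 0]
    endkey: [personKey, {}]    // an object sorts after any numeric second element
  };
}

// Encode the range as view query parameters; each value is sent as JSON.
function toQueryString(range) {
  return Object.keys(range)
    .map(function (name) {
      return name + "=" + encodeURIComponent(JSON.stringify(range[name]));
    })
    .join("&");
}
```

The resulting string would be appended to the view's query URL, so a single request returns the person row followed by all of that person's fact rows in collation order.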

On a single-node, 8-core machine this view initially took around 1.5 hours to index.

Querying this particular view reports 7 million 'total_rows', and so far pulling down results (including documents) using range queries averages only a few tens of milliseconds. The longest query, for a person with a lot of facts and documents, takes around 70 milliseconds.
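As an illustration of what the client does with such a result, the sketch below groups the rows returned for one person's key range by the numeric second element of the key. It assumes each row has the usual view-row shape { id, key, value } and that 0 marks the person row, as in the map function above. The function name groupRows and the result field names are made up for this example.

```javascript
// Group the rows returned for one person's key range by fact type.
// Assumes each row looks like { id: <doc id>, key: [personKey, n], value: null }.
function groupRows(rows) {
  var result = { personDocId: null, factDocIds: {} };
  rows.forEach(function (row) {
    var type = row.key[1];
    if (type === 0) {
      // Type 0 is the person entity itself.
      result.personDocId = row.id;
    } else {
      // Types 1..5 are the different kinds of link documents.
      if (!result.factDocIds[type]) {
        result.factDocIds[type] = [];
      }
      result.factDocIds[type].push(row.id);
    }
  });
  return result;
}
```

Because the view values are null, the grouped result holds only document ids; the documents themselves still have to be fetched separately or included in the query response.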

So far so good - scaling up and out is probably only going to improve this.

My question is, once I increase my dataset, with a couple of dozen similar views, across maybe 20 million entities and 50-100 million associated facts, will Couchbase hack it?

BTW: Any tips to speed indexing up?

Cheers,

Tim Pedersen

__________________

Tim Pedersen

Tue, 08/21/2012 - 14:37
scalabl3
Offline
Joined: 07/18/2012
Groups:

This is awesome! Couchbase can hack it. For the most part, indexing is more a question of the rate of data influx than of volume, since views are indexed incrementally. That's why the initial index build takes a while: it has to go over the entire data set. 1.5 hours is actually pretty fast; there are SQL indexes that can take days. It used to be that we allowed partial datasets (incomplete views) during the initial indexing, but that process made indexing take too long, so we changed it to an initial indexing period.

But this is a good question, and I know we have done testing on this; let me see what I can find out.

I would love to know more about your project, would you mind if I contact you about it?

Thanks!

__________________

@scalabl3
Technical Evangelist
Couchbase Inc.
 

Tue, 08/21/2012 - 16:07
Frank
Offline
Joined: 06/28/2010
Groups: None

The neat thing about Couchbase Server's indexing is that it is distributed evenly across all nodes. Each node holds an index for the data active on it, so the amount of data indexed per node is roughly 1/(number of nodes) of the total.

As a result you can index very large data sets, simply by adding additional nodes to distribute the indexing load and storage.
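As a rough back-of-the-envelope check of the 1/(number of nodes) rule, the sketch below estimates how many index entries each node would hold. It assumes a perfectly even distribution of active data and ignores replicas and any per-node overhead; the function name entriesPerNode is hypothetical.

```javascript
// Estimate index entries held per node, assuming active data is spread evenly.
// This ignores replicas and per-node overhead, so treat it as a lower bound.
function entriesPerNode(totalEntries, nodeCount) {
  if (nodeCount < 1) {
    throw new Error("nodeCount must be at least 1");
  }
  return Math.ceil(totalEntries / nodeCount);
}
```

Under these assumptions, a view with 100 million entries spread over 4 nodes would leave each node indexing about 25 million entries.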

There is still performance work ongoing, so retry the test with Beta :)

Cheers,

Frank

Tue, 08/21/2012 - 21:15
dipti
Offline
Joined: 11/02/2011
Groups:

Hi Tim

Which build of 2.0 are you currently using? If you are on DP4, I would recommend using a "Recent build", now also available on our download page (http://www.couchbase.com/download). Build 1495 is the latest.

This particular build includes specific improvements around initial index building. You should see a significant improvement for this scenario. In addition, several performance improvements have been made to improve query latency.

Another aspect that you may want to think about is the file system cache available for the index. Depending on the size of your indexes, you may want to size for additional RAM. The OS will take care of keeping the working set of actively accessed index pages in RAM. We have seen query latency halve when the OS cache available for the index, beyond the bucket quota, was doubled. Throughput also improved significantly.

Hope this helps

- Dipti

Thu, 08/23/2012 - 05:30
Tim Pedersen
Offline
Joined: 08/16/2012
Groups: None

Sure, you can get me at Tim dot Pedersen at police dot tas dot gov dot au

__________________

Tim Pedersen
