
Store large documents

7 replies
Wed, 11/28/2012 - 05:06
Castlejoe
Joined: 10/27/2012

Hi,

I'm considering storing invoices as JSON documents, where each document, in addition to the standard invoice data (invoice number, amount, due date, etc.), would also have to store an image of the invoice. The images are up to 5 MB each.

We are storing these files in MS SQL Server today which works fine for now, but we will have a dramatic increase in the number of processed files and would like to take advantage of the scaling possibilities in CB.

Is this a use case for CB? If I'm not mistaken, most CB documents are only a few KB in size...

Thanks,
Joe

Wed, 11/28/2012 - 08:34
tgrall
Joined: 09/05/2012

Hello,

Couchbase allows you to store documents of up to 20 MB, so your project is feasible. Note that inside a JSON document the images will be stored Base64-encoded, which makes them roughly a third larger; if you prefer, you can take another approach and store only a reference to third-party storage.
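To put numbers on the Base64 point: Base64 turns every 3 bytes into 4 characters, so a 5 MB image grows to roughly 6.7 MB inside a JSON document, which is still under the 20 MB limit. The following is a minimal Python sketch using only the standard library; the field names (`invoice_number`, `image_base64`, ...) are hypothetical, and "MB" is treated as MiB (1024 × 1024 bytes) for the arithmetic.

```python
import base64
import json
import os

# The 20 MB per-document limit quoted in this thread, taken here as 20 MiB.
MAX_DOC_BYTES = 20 * 1024 * 1024


def base64_size(raw_bytes: int) -> int:
    """Length of the padded Base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def build_invoice_doc(number: str, amount: float, due_date: str, image: bytes) -> str:
    """Serialize an invoice with its image embedded as Base64 (hypothetical schema)."""
    doc = {
        "type": "invoice",
        "invoice_number": number,
        "amount": amount,
        "due_date": due_date,
        "image_base64": base64.b64encode(image).decode("ascii"),
    }
    return json.dumps(doc)


if __name__ == "__main__":
    five_mb = 5 * 1024 * 1024
    print(base64_size(five_mb))  # 6990508 (about 6.67 MiB)
    doc = build_invoice_doc("INV-0001", 125.50, "2012-12-31", os.urandom(five_mb))
    print(len(doc) <= MAX_DOC_BYTES)  # True: the whole JSON document fits
```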

What is important in your case is to size your Couchbase cluster correctly to get good performance, since, as you know, documents are stored in memory (memcached), and to have a good network connection between your Couchbase cluster and your application.

I invite you to look at this chapter of the documentation: Sizing Guidelines
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-bestpractic...

Note that in our labs we have a project to store large files in an efficient way:
https://github.com/couchbaselabs/cbfs/wiki

__________________

Tug
@tgrall

Wed, 11/28/2012 - 13:25
Castlejoe
Joined: 10/27/2012

Thank you, tgrall. I'm totally new to CB, and I thought that only the keys (int, GUID, etc.) were stored in memory, not the whole document.
Since I will have a lot less RAM than the size of my working set, most documents will be fetched from disk. Do you have any estimate (order of magnitude) of the response times I can expect in this case?

Thu, 11/29/2012 - 11:33
tgrall
Joined: 09/05/2012

No, I do not have this number, because it depends a lot on your infrastructure (disk cache, disk speed, ...).
The statistics available in Couchbase, which you can view in the Web Admin Console, will give you a lot of information about this (and show you whether or not you have a bottleneck).

You will also be able to look at the cache misses, to see how your application uses documents once they have been fetched from disk.

But once again, all of this (fetching data from disk, managing the cache) is really fast in Couchbase 2.0.

Have you tested it with a good dataset?

Regards
Tug


Fri, 11/30/2012 - 04:53
Castlejoe
Joined: 10/27/2012

tgrall wrote:
Have you tested it with a good dataset?

Not yet; I'm still waiting for my servers to arrive. I only did a very small proof of concept with CB and MongoDB on a VMware image, and both worked just fine. We'll see how it goes with 2 nodes and a realistic dataset.
I have a better feeling about Couchbase now, but Mongo seems to have a lot more momentum...

Fri, 11/30/2012 - 09:47
tgrall
Joined: 09/05/2012

I can only confirm the momentum, but as you know, Couchbase 2.0 as a document store is new and still in beta (we are all impatient to see it GA, which should be very soon now).

For me it is a no-brainer: the Couchbase architecture and clustering are fantastic, which is why I joined Couchbase as a Technical Evangelist.

To come back to the momentum, "we" (the Couchbase community) can change that simply by using the product and promoting it ;)

When you have your 2 nodes, you will probably be surprised at how easy and fast the Couchbase cluster is compared to the other one...

Tug


Sun, 12/02/2012 - 12:49
dipti
Joined: 11/02/2011

In general, the guideline is that if you have very large documents (read: media files), the best approach is to store the metadata in Couchbase. The metadata is what is heavily accessed and what you need low-latency responses for. You can store the URL of the content in the metadata document, and the content itself can live in S3, a CDN, or some other system.
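A minimal sketch of this pattern, in Python for illustration: the large content goes to an external blob store, and Couchbase holds only a small JSON metadata document containing its URL. Both stores are plain in-memory dictionaries standing in for S3/CDN and a Couchbase bucket; the key scheme (`invoice::<number>`), the URL prefix, and the field names are hypothetical.

```python
import json
from typing import Dict

# Hypothetical stand-ins: in practice blob_store would be S3/CDN and
# doc_store a Couchbase bucket accessed through an SDK.
blob_store: Dict[str, bytes] = {}
doc_store: Dict[str, str] = {}

BLOB_BASE_URL = "https://blobs.example.com/"  # hypothetical URL prefix


def save_invoice(number: str, amount: float, due_date: str, image: bytes) -> str:
    """Store the image externally, then a small JSON metadata document."""
    blob_key = f"invoices/{number}.png"
    # 1. Write the large content first, so metadata never points at a missing blob.
    blob_store[blob_key] = image
    # 2. Write the small, frequently accessed metadata document.
    doc = {
        "type": "invoice",
        "invoice_number": number,
        "amount": amount,
        "due_date": due_date,
        "image_url": BLOB_BASE_URL + blob_key,
        "image_bytes": len(image),
    }
    doc_key = f"invoice::{number}"
    doc_store[doc_key] = json.dumps(doc)
    return doc_key


def load_image(doc_key: str) -> bytes:
    """Follow the URL stored in the metadata document to the blob."""
    doc = json.loads(doc_store[doc_key])
    blob_key = doc["image_url"][len(BLOB_BASE_URL):]
    return blob_store[blob_key]
```

Writing the blob before the metadata document means a reader that finds the metadata can always follow its URL; the reverse order could briefly expose a dangling reference.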

You can store larger files in Couchbase as well. Currently, the system-maintained metadata for all documents is kept in RAM, which allows for quick existence checks. Documents that have not been accessed recently are automatically ejected from RAM. But if your documents are large, less of your data will fit in memory.
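A rough back-of-the-envelope model of that trade-off, as a Python sketch: it assumes that the keys and per-document metadata of every copy stay in RAM and that whatever quota remains holds document values. The per-document metadata overhead is a parameter you should take from the Sizing Guidelines for your version (the 64 bytes used below is a placeholder, not an official figure), and the model ignores memory water marks, fragmentation, and other overheads, so real resident ratios will be lower.

```python
def resident_fraction(
    num_docs: int,
    avg_value_bytes: int,
    avg_key_bytes: int,
    metadata_bytes_per_doc: int,
    ram_quota_bytes: int,
    replicas: int = 0,
) -> float:
    """Rough fraction (0.0 to 1.0) of document values that can stay in RAM.

    Assumes keys + metadata for every copy (active and replica) are always
    resident and the rest of the quota holds values. Ignores water marks,
    fragmentation and other overheads.
    """
    copies = 1 + replicas
    metadata_total = num_docs * copies * (avg_key_bytes + metadata_bytes_per_doc)
    values_total = num_docs * copies * avg_value_bytes
    if metadata_total >= ram_quota_bytes:
        return 0.0
    if values_total == 0:
        return 1.0
    return min(1.0, (ram_quota_bytes - metadata_total) / values_total)


if __name__ == "__main__":
    GiB = 1024 ** 3
    METADATA = 64  # placeholder per-document overhead, not an official figure
    small = resident_fraction(1_000_000, 5 * 1024, 20, METADATA, 32 * GiB, replicas=1)
    large = resident_fraction(1_000_000, 5 * 1024 * 1024, 20, METADATA, 32 * GiB, replicas=1)
    print(f"5 KB documents: {small:.1%} resident")  # 100.0% resident
    print(f"5 MB documents: {large:.2%} resident")  # 0.33% resident
```

With the same 32 GiB quota, one million 5 KB documents fit entirely in RAM, while one million 5 MB documents leave only a fraction of a percent of values resident.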

In general, you should monitor two metrics: the "resident ratio", particularly for active documents, and the "cache miss ratio". As the resident ratio decreases, your data set is increasingly disk-based. As the cache miss ratio increases, your working set of data is spilling over the available RAM. For best performance, most users maintain cache miss ratios of 1-2%.
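Both ratios are simple percentages. A minimal Python sketch, assuming you already have the underlying counts for an interval (for example from the Web Admin Console statistics); the field names are hypothetical and are not Couchbase statistic names, and the 2% threshold is the rule of thumb from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class BucketSample:
    """Counts over one sampling interval (hypothetical field names)."""
    active_items: int           # active documents in the bucket
    active_resident_items: int  # of those, documents whose values are in RAM
    get_ops: int                # read requests served
    disk_fetches: int           # reads that had to fetch the value from disk


def resident_ratio(s: BucketSample) -> float:
    """Percentage of active documents held in RAM."""
    return 100.0 * s.active_resident_items / s.active_items if s.active_items else 100.0


def cache_miss_ratio(s: BucketSample) -> float:
    """Percentage of reads that missed RAM and went to disk."""
    return 100.0 * s.disk_fetches / s.get_ops if s.get_ops else 0.0


def working_set_fits(s: BucketSample, max_miss_pct: float = 2.0) -> bool:
    """True if the cache miss ratio stays within the 1-2% rule of thumb."""
    return cache_miss_ratio(s) <= max_miss_pct


if __name__ == "__main__":
    sample = BucketSample(active_items=1_000_000, active_resident_items=600_000,
                          get_ops=10_000, disk_fetches=150)
    print(resident_ratio(sample))    # 60.0
    print(cache_miss_ratio(sample))  # 1.5
    print(working_set_fits(sample))  # True
```

A low resident ratio is not a problem by itself as long as the cache miss ratio stays low: it simply means the rarely read documents live on disk.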

In 2.0, we have improved performance for disk-based scenarios, but this is an area we will continue to improve on in future releases.

Wed, 12/05/2012 - 13:24
Castlejoe
Joined: 10/27/2012

dipti wrote:
You can store the URL to the content in the metadata document. And the document itself could live in S3 or a CDN or some other system.

I know that this is the traditional recommendation, but I think that highly scalable NoSQL databases are excellent candidates for distributed file storage.

Are there any plans to include cbfs or something similar in Couchbase Server and the SDKs (like GridFS in Mongo)?
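For reference, the GridFS-style idea being asked about can be sketched on top of any key-value store: split the file into chunks small enough to be ordinary values, then write a manifest document that lists them. This Python sketch is not how cbfs works and not its API; the dictionary stands in for a bucket, and the key scheme and 1 MiB chunk size are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

CHUNK_SIZE = 1024 * 1024  # hypothetical 1 MiB chunks, far below a 20 MB value limit


def put_file(store: Dict[str, bytes], name: str, data: bytes,
             chunk_size: int = CHUNK_SIZE) -> None:
    """Store `data` as numbered chunks plus a JSON manifest under `file::<name>`."""
    chunk_keys: List[str] = []
    for i, offset in enumerate(range(0, len(data), chunk_size)):
        key = f"file::{name}::chunk::{i}"
        store[key] = data[offset:offset + chunk_size]
        chunk_keys.append(key)
    manifest = {
        "name": name,
        "length": len(data),
        "chunk_size": chunk_size,
        "chunks": chunk_keys,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    # Written last, so readers never see a manifest whose chunks are missing.
    store[f"file::{name}"] = json.dumps(manifest).encode("utf-8")


def get_file(store: Dict[str, bytes], name: str) -> bytes:
    """Reassemble a file from its manifest and verify the checksum."""
    manifest = json.loads(store[f"file::{name}"].decode("utf-8"))
    data = b"".join(store[key] for key in manifest["chunks"])
    if hashlib.sha256(data).hexdigest() != manifest["sha256"]:
        raise ValueError(f"checksum mismatch for {name}")
    return data


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    image = bytes(range(256)) * 10_000  # about 2.4 MiB, stored as 3 chunks
    put_file(store, "INV-0001.png", image)
    print(get_file(store, "INV-0001.png") == image)  # True
```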

© 2013 COUCHBASE All rights reserved.
