We are building a large inventory of documents with a picture attached to each document. We were thinking of storing the images inline with the document in Couchbase, base64 encoded, so that we can quickly retrieve the image along with the document. The images may run anywhere from a few KB to several MB. Total storage will eventually scale to several terabytes, all of which would have to be kept in Couchbase with this solution.
We eventually expect tens of thousands of reads/second across our Couchbase infrastructure, and we expect to serve up the document with its image in under 20 ms.
A few more reasons we’d like to do this:
- Simplify our cross data center replication of documents along with the images
- Simplify our infrastructure. Just our web app and Couchbase to deploy.
- Reduce network trips.
So…why is this a really bad idea?
Caveat: You thought of this solution for a reason, and I can’t guess what that is. It’s not a “bad” idea, but it may not be the most efficient one in some respects. Generally, static content like images is much easier to serve from S3 and CloudFront. It usually costs less, and it removes the network and CPU overhead of base64 encoding/decoding and of moving the image data through your app servers. It could even be faster for the clients (users) because CloudFront can serve content from edge locations. Keeping the images in Couchbase means:
- Retrieving a larger document containing the metadata and the base64-encoded image (from Couchbase to app server)
- Decoding and transferring a larger amount of data (metadata + image) to the client/user (from app server to client)
- The client must retrieve both the information and the images from the same cluster (the app servers in front of Couchbase)
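To put a rough number on the encoding overhead in the list above: base64 represents every 3 bytes of binary data as 4 ASCII characters, so the stored text is about a third larger than the image itself. Here is a minimal sketch using only the Python standard library. The field names and the 3 MB random payload are made up for illustration, and no Couchbase call is involved.

```python
import base64
import json
import os


def inline_document(image_bytes: bytes) -> dict:
    """Build a document with the image embedded as base64 text."""
    return {
        "type": "inventory_item",  # hypothetical field names
        "name": "example",
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
    }


# Random bytes stand in for a 3 MB picture (assumption for the demo).
image = os.urandom(3 * 1024 * 1024)
doc = inline_document(image)

# This is roughly what would travel from the database to the app server.
payload = json.dumps(doc).encode("utf-8")

encoded_len = len(doc["image_b64"])
print(f"raw image:    {len(image)} bytes")
print(f"base64 text:  {encoded_len} bytes ({encoded_len / len(image):.2f}x)")
print(f"JSON payload: {len(payload)} bytes")

# The app server has to decode again before sending the image to the client.
decoded = base64.b64decode(doc["image_b64"])
```

At tens of thousands of reads per second, that extra third applies to every read, on top of the CPU time spent decoding.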
With the S3 approach, you keep only the S3 URL in the Couchbase document (as metadata) rather than the actual image. This also reduces the RAM footprint required to keep all your documents in RAM, since the image data has been decoupled from the documents. This scenario looks like this:
- Retrieve a tiny document containing the URL from Couchbase (Couchbase to app server)
- Transfer URL + metadata to the client/user (app server to client)
- Client retrieves the image from S3/CloudFront using the closest location (highest speed)
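The flow above could look something like the following sketch. A plain dictionary stands in for the Couchbase bucket so the example runs on its own, and the CDN domain, key format, and field names are all hypothetical.

```python
import json

# Hypothetical CloudFront domain in front of an S3 bucket (assumption).
IMAGE_BASE_URL = "https://cdn.example.com/images"

# Stand-in for a Couchbase bucket: key -> JSON string. Not the real SDK.
store = {}


def reference_document(item_id: str, name: str) -> dict:
    """Build a document that points at the image instead of embedding it."""
    return {
        "type": "inventory_item",
        "name": name,
        "image_url": f"{IMAGE_BASE_URL}/{item_id}.jpg",
    }


def save(item_id: str, doc: dict) -> None:
    store[item_id] = json.dumps(doc)


def load(item_id: str) -> dict:
    return json.loads(store[item_id])


save("item-1001", reference_document("item-1001", "Widget"))

# Step 1: retrieve the tiny document (database to app server).
doc = load("item-1001")

# Step 2: send the URL + metadata to the client (app server to client).
response_body = json.dumps({"name": doc["name"], "image_url": doc["image_url"]})

print(len(store["item-1001"]), "bytes stored")
print(response_body)
# Step 3 happens in the client, which fetches image_url from the CDN directly.
```

The stored document stays around a hundred bytes no matter how large the picture is, which is where the RAM saving comes from.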
Do you not want clients (users) to asynchronously retrieve images?
Other than that, there is no reason you couldn’t use this method, and it’s not a terrible idea or anything, but it limits your ability to take advantage of a CDN (content distribution network).