I need to figure out which entity design is best for my situation.
I am currently migrating from MySQL to Couchbase, so I need to find the best way to model my entities during the migration. Here is my case:
We have a Job entity that is scheduled to run on specific dates, so we need to store its future execution times. Previous executions should also be stored, along with additional statistics.
At first, I stored the next execution times in the Job document as an array for quick access. But that slows down retrieving the execution times of all jobs, since we have to iterate over all Job documents.
That's why I moved them into a single document called “nextExecutions”. Whenever a job is scheduled, this document is modified. When an execution finishes, its entry is moved to an “executionHistory” document per Job.
But now finding the executions of all jobs is slow.
In short, I have a tendency to combine similar items into an array instead of storing them in separate documents. Is this tendency harmful? And how would you design such a scenario?
How large are your arrays of execution times? I wouldn’t have expected much slowdown in retrieving them unless they are very large…
You say you are “iterating over all Job documents” - that sounds like you might be using a N1QL query to search across all Jobs which match a particular execution time?
Note that while N1QL is a very flexible and powerful query language, that flexibility comes at a price - for very high throughput situations you may find that direct Key-Value lookup is faster, assuming you can model your data in a suitable way.
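To make the difference concrete, here is a rough Python sketch. It uses a plain dict as a stand-in for a bucket, not the Couchbase SDK, and the key format "job::<id>" and the field names are assumptions for illustration only. The first function has to look at every Job document, as a query without a suitable index would; the second reads a single document whose key can be computed up front.

```python
# In-memory stand-in for a bucket: key -> JSON-like document.
# Assumption: the "job::<id>" key format and field names are hypothetical.
bucket = {
    "job::1": {"type": "job", "name": "backup",
               "nextExecutions": ["2024-01-01T10:00", "2024-01-02T10:00"]},
    "job::2": {"type": "job", "name": "report",
               "nextExecutions": ["2024-01-01T10:00"]},
}


def jobs_due_by_scan(bucket, when):
    """Query-style access: inspects every document, so cost grows with the number of jobs."""
    return sorted(
        key
        for key, doc in bucket.items()
        if doc.get("type") == "job" and when in doc["nextExecutions"]
    )


def next_executions_by_key(bucket, job_id):
    """Key-value access: a single lookup with a key we already know."""
    return bucket[f"job::{job_id}"]["nextExecutions"]
```

With real data the scan would normally be backed by an index, but the shape of the trade-off is the same: one call touches a set of documents chosen at query time, the other touches exactly one.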
Without knowing the details of your application it’s hard to make specific recommendations, but consider if you can structure your data such that:
Your most frequent / performance-sensitive operations can be achieved by reading & writing a fixed number of documents (and ones which you already know the Key of).
Your less frequent / less performance-critical / background processes use indexes and N1QL queries (if necessary) to access a range of documents selected from the whole bucket.
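As one possible way to apply this to the Job case, here is a minimal Python sketch, again using a dict in place of a bucket. Everything in it is an assumption for illustration: the key formats ("job::<id>", "job::<id>::history", "schedule::<hour>"), the field names, and the idea of grouping due jobs into one document per hour. The point is only that scheduling, finding due jobs, and recording a finished execution each touch a fixed, small number of documents with computable keys.

```python
from datetime import datetime

# Assumption: all key formats and field names below are hypothetical.


def job_key(job_id):
    return f"job::{job_id}"


def history_key(job_id):
    return f"job::{job_id}::history"


def slot_key(when):
    # One schedule document per hour, e.g. "schedule::2024-01-01T10".
    return "schedule::" + when.strftime("%Y-%m-%dT%H")


def schedule(store, job_id, when):
    """Touches two documents: the job itself and the hourly schedule slot."""
    job = store.setdefault(job_key(job_id), {"nextExecutions": []})
    job["nextExecutions"].append(when.isoformat())
    slot = store.setdefault(slot_key(when), {"jobs": []})
    slot["jobs"].append(job_id)


def due_jobs(store, when):
    """Touches one document: the slot for the given hour."""
    return store.get(slot_key(when), {"jobs": []})["jobs"]


def complete(store, job_id, when, stats):
    """Touches three documents: job, schedule slot, and per-job history."""
    store[job_key(job_id)]["nextExecutions"].remove(when.isoformat())
    store[slot_key(when)]["jobs"].remove(job_id)
    history = store.setdefault(history_key(job_id), {"executions": []})
    history["executions"].append({"at": when.isoformat(), **stats})
```

Used like this:

```python
store = {}
run_at = datetime(2024, 1, 1, 10, 30)
schedule(store, 7, run_at)
print(due_jobs(store, run_at))
complete(store, 7, run_at, {"durationMs": 120})
```

In a real cluster these multi-document updates are not atomic by default, and a per-job history array can grow large over time, so treat this as a starting point for discussion rather than a finished design.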
Couchbase (and NoSQL in general) is about making use of the “right tool for the job”: direct Key/Value access when you need extremely fast and scalable (~O(1) per document) access patterns; Map/Reduce views when you need pre-computed values for a large percentage of your dataset; N1QL and GSI for more complex patterns that access multiple documents and perform aggregations / selections on them; and FTS for full-text searches.