Hello,
I’ve started researching Couchbase in the last few days and will be using Couchbase 4.5 EE on a production server very soon. From what I’ve been reading, the following are the main document lookup strategies:
- Document Lookup Strategies
1) Lookup by Key. Totally preferred over all other methods and the data structure should be built with this in mind.
2) Lookup by View Queries using MapReduce. Should be the second choice, for when we need data but don’t have the document keys. We should create a view for each feature with MapReduce, and if we need specific data, instead of joins, we should get the Object IDs with Reduce and do a MultiGET request with those Object IDs.
3) Lookup by N1QL. Should be used only as a last resort, as it uses neither primary nor secondary indexes. It tries to optimize the query but it is not as performant as the first and second options. It should be used mainly for dynamic data search or when we aren’t able to get the data with option 2).
First, I would like to know if my current assessment of each concept is correct.
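To make option 2) concrete, here is a minimal sketch of the "get the keys from a view, then MultiGET" flow as I understand it. It uses a plain Python dict as a stand-in for a bucket and does not call the Couchbase SDK; `activities_by_creator` and `multi_get` are hypothetical names I made up for illustration.

```python
# In-memory stand-in for a bucket: document key -> document body.
# This is NOT the Couchbase SDK; all function names here are hypothetical.
bucket = {
    "user_activity::134": {"_type": "user_activity", "type": "user_follow", "created_by": "user::1"},
    "user_activity::135": {"_type": "user_activity", "type": "user_follow", "created_by": "user::2"},
    "user::1": {"_type": "user", "name": "John Perez"},
}


def activities_by_creator(store):
    """Mimic a view's map step: group activity keys by their creator."""
    index = {}
    for key, doc in store.items():
        if doc.get("_type") == "user_activity":
            index.setdefault(doc["created_by"], []).append(key)
    return index


def multi_get(store, keys):
    """Mimic a bulk get: fetch several documents by key, skipping missing ones."""
    return {key: store[key] for key in keys if key in store}


# Step 1: ask the "view" for the keys. Step 2: fetch the documents by key.
keys = activities_by_creator(bucket).get("user::1", [])
docs = multi_get(bucket, keys)
```

The point is only the two-step shape: the view answers "which keys?", and the actual documents still come from key lookups.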
- Data Modelling
Taking into consideration the previous points, I’ve been thinking about the structure for my data.
As I’ve pointed out in 1) (Document Lookup Strategies), I am making an effort to design documents so they can be found by document key.
I am currently building a social feed, so let me present one of the concepts I am currently wondering about. Consider the following example:
user_activity::134
{ "_id": 134, "_type": "user_activity", "type": "user_follow", "related_id": "user::8", "created_by": "user::1", "updated_at": 1472741437, "created_at": 1472741410 }
Problem 1:
The activity document holds what could be a social feed item. I already see one problem here. I come from an RDBMS background, and my first instinct is to reference related data instead of embedding it directly in the document.
Of course there are advantages and disadvantages to both approaches. However, in this case, since I want up-to-date user data for the users I follow (and for my own information, of course), I guess this could be the best way.
Otherwise, I would need to update every entry whenever a user changes their username, for example. Let’s say the user is related to or created ~300,000 activities. It would be impossible to handle that with a single call to the server, and background workers would have to be implemented to solve this (adding another layer of complexity). Is my thinking correct?
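To illustrate the referencing approach I mean, here is a minimal sketch that resolves the user references at read time. The dict is an in-memory stand-in for a bucket (not the Couchbase SDK), and `load_activity`, the `*_doc` field names, and the user names are placeholders for illustration.

```python
# Hypothetical in-memory stand-in for a bucket (not the Couchbase SDK).
# The user names are placeholders.
bucket = {
    "user::1": {"_type": "user", "name": "John Perez"},
    "user::8": {"_type": "user", "name": "Another User"},
    "user_activity::134": {
        "_type": "user_activity",
        "type": "user_follow",
        "related_id": "user::8",
        "created_by": "user::1",
    },
}


def load_activity(store, activity_key):
    """Fetch an activity and attach the current user documents it references."""
    activity = dict(store[activity_key])  # copy so the stored document is untouched
    for field in ("created_by", "related_id"):
        activity[field + "_doc"] = store.get(activity[field])
    return activity


# Renaming the user touches one document; every activity sees the new name.
bucket["user::1"]["name"] = "John P."
feed_item = load_activity(bucket, "user_activity::134")
```

With this layout a rename is a single document update, at the cost of extra key lookups per feed item.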
Problem 2:
Activities may have likes. People may like activities on their social feed. Activities show the total number of likes, so I changed activities to the following (ignore the fact that people like a “user_follow” action, it is just an example):
user_activity::134
{ "_id": 134, "_type": "user_activity", "type": "user_follow", "related_id": "user::8", "total_likes": 27, "created_by": "user::1", "updated_at": 1472741437, "created_at": 1472741410 }
As I don’t need to present which users liked an activity, I haven’t added that to the document. However, in the social feed I need to show the user whether he has liked the post or not (to present an active button or not). So, as 1) (Document Lookup Strategies) is preferred, I thought about having something like the following:
user::1737::user_activity::134::user_activity_like
{ "_type": "user_activity_like", "user_id": 1737, "user_activity_id": 134, "created_at": 1472741416 }
At this point, I already have the ID of the user I want to check (the logged-in user) and the user_activity the user is viewing.
This has the disadvantage of requiring another GET request to check whether the document exists (whether the user has liked the activity or not); however, it should be extremely performant as it is a direct key lookup.
In the future, if I need to see which users like activity X or which activities are liked by user Y, I can use approach 2) (Document Lookup Strategies).
Am I correct on this approach?
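As a rough sketch of the check I have in mind, the like key can be built from the two IDs I already have and then tested for existence. The dict stands in for a bucket (this is not the Couchbase SDK), and `like_key` and `has_liked` are hypothetical helper names.

```python
def like_key(user_id, activity_id):
    """Build the like document key in the format shown above."""
    return "user::{}::user_activity::{}::user_activity_like".format(user_id, activity_id)


def has_liked(store, user_id, activity_id):
    """True if a like document exists for this user and activity."""
    return like_key(user_id, activity_id) in store


# Hypothetical in-memory stand-in for a bucket (not the Couchbase SDK).
bucket = {
    like_key(1737, 134): {
        "_type": "user_activity_like",
        "user_id": 1737,
        "user_activity_id": 134,
        "created_at": 1472741416,
    },
}
```

Here `has_liked(bucket, 1737, 134)` is true, while any other user/activity pair is false because no document exists under that key.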
- Data Parsing
Thirdly, I’ve noticed that people relate data by storing the key of another document. That seems good to me; however, I am a little curious about how people build their output for web servers based on that. Take the following example:
user::184
{ "_id": 184, "_type": "user", "name": "John Perez", "best_friends": [ "user::184", "user::9062", "user::123" ], "created_at": 1472741416 }
Given this example, if we want to grab the best friends’ user IDs, what would you do?
It doesn’t make sense to me to join each document just to get the user IDs, as we already have them in the keys (like user::184). Do you just extract the integer from the key? Or do you do a join anyway to guarantee data consistency?
Is it just a matter of personal choice?
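For reference, this is a minimal sketch of what I mean by getting the integer out of the key, assuming every key follows the `type::id` pattern used above. `parse_key` is a hypothetical helper name.

```python
def parse_key(key):
    """Split a key like "user::184" into its type prefix and integer ID."""
    doc_type, sep, raw_id = key.rpartition("::")
    if not sep or not raw_id.isdigit():
        raise ValueError("unexpected key format: {!r}".format(key))
    return doc_type, int(raw_id)


best_friends = ["user::184", "user::9062", "user::123"]
friend_ids = [parse_key(key)[1] for key in best_friends]
```

This avoids fetching the friend documents at all, but it only tells me the IDs, not whether those documents still exist.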
Sorry for the long questions, but I’ve tried to be as clear as possible to avoid any doubts about my approaches.
Please let me know your feedback.
Thanks