Index and how to find data inside a doc
We are working on an app that has Facebook-style tags, where users can tag someone at a location or in a picture. We are planning to store the complete picture info and all its related data, like comments and tags, in a single doc, so that we don't have to search for them separately. Here is my question: is it possible to search docs for data, for example find all tags in a doc where the userid = x?
The array stored in my doc looks like this:
taggs (array)
    taggid
    userid
    username
If so, how would I go about it, and how scalable and fast would it be? The other option we are considering, which will certainly be the faster one, is to create a taggs_user_userid doc and store all tagged items for that user in it. Even though it is faster, it will require extra storage, as well as extra code to maintain data integrity: if a tag gets deleted, it must also be removed from the user's tag doc, and vice versa.
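The second option (a separate taggs_user_userid doc per user) can be sketched as follows to show the extra integrity code it implies. This is a minimal sketch, assuming a plain Python dict stands in for the bucket; the helper names user_tag_key, add_tag and remove_tag and the tagged_in field are hypothetical, not part of any Couchbase SDK. The picture write and the user-doc write are two separate steps here, so a failure between them would leave the two docs out of sync.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for a key/value bucket: key -> JSON-like dict.
Bucket = Dict[str, Dict[str, Any]]


def user_tag_key(user_id: Any) -> str:
    # Follows the "taggs_user_<userid>" naming proposed in the question.
    return f"taggs_user_{user_id}"


def add_tag(bucket: Bucket, picture_key: str, tag: Dict[str, Any]) -> None:
    """Embed the tag in the picture doc, then record it in the user's tag doc."""
    picture = bucket[picture_key]
    picture.setdefault("tags", []).append(tag)

    # Second, separate write: the reverse-lookup doc for the tagged user.
    key = user_tag_key(tag["tagged"]["id"])
    user_doc = bucket.setdefault(key, {"tagged_in": []})
    user_doc["tagged_in"].append({"picture": picture_key, "tag_id": tag["tag_id"]})


def remove_tag(bucket: Bucket, picture_key: str, tag_id: int) -> None:
    """Delete a tag from the picture doc and from the tagged user's doc."""
    picture = bucket[picture_key]
    tags = picture.get("tags", [])
    removed = [t for t in tags if t["tag_id"] == tag_id]
    picture["tags"] = [t for t in tags if t["tag_id"] != tag_id]

    # Keep the reverse-lookup docs consistent with the picture doc.
    for tag in removed:
        user_doc = bucket.get(user_tag_key(tag["tagged"]["id"]))
        if user_doc is None:
            continue
        user_doc["tagged_in"] = [
            entry for entry in user_doc["tagged_in"]
            if not (entry["picture"] == picture_key and entry["tag_id"] == tag_id)
        ]
```

For example, after add_tag(bucket, "picture_1", {"tag_id": 1, "tagged": {"id": 17, "name": "New User 1"}}), bucket["taggs_user_17"] holds one entry pointing at picture_1, and remove_tag(bucket, "picture_1", 1) empties it again.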
So hopefully someone can tell me whether it is possible and how it scales. We expect to add about 10 million+ docs a month, each of which can have one or more user tags.
Thanks
Here is a document we are working with. The tags array holds the tagged.id values, of which there can be 0, 1 or many in a doc, and I want to be able to find all docs in which a specific user appears in a tagged.id field. Think of it like Facebook, where there is a function that shows all pictures a user has been tagged in. I hope that makes it clearer.
{
"object_type": "picture",
"from": {
"id": 5,
"name": "Tom Miller"
},
"name": "Great day at the Park",
"note": "This picture shows everyone having a great day in the park",
"icon": "http://pic.myicon.com/abc.jpeg",
"picture": "https://fbcdn-photos-a.akamaihd.net/hphotos-ak-ash4/400507_2529780763684_785627980_s.jpg",
"source": "https://fbcdn-photos-a.akamaihd.net/hphotos-ak-ash4/400507_2529780763684_785627980_s.jpg",
"height": 720,
"width": 480,
"format": "jpg",
"original_file_size": 87092,
"orientation": 0,
"privacy": 0,
"xxx_protect": 0,
"location_privacy": "",
"originalfilesize": 87092,
"place": {
"lat": "",
"long": "",
"altitude": "",
"name": "",
"thoroughfare": "",
"sub_thoroughfare": "",
"locality": "",
"sub_locality": "",
"administrative_area": "",
"sub_administrative_area": "",
"postal_code": "",
"iso_country": "",
"country": "",
"inland_water": "",
"ocean": "",
"areas_of_intrest": ""
},
"created_time": "12/02/2012 11:04:12",
"updated_time": "12/03/2012 11:04:12",
"image_date": "12/03/2012 11:04:12",
"last_coment_id": "",
"last_tag_id": 10,
"images": [
{
"image_type": "320x320",
"height": 320,
"width": 320,
"source": "https://fbcdn-sphotos-a-a.akamaihd.net/hphotos-ak-ash4/s320x320/400507_2529780763684_785627980_n.jpg"
}, {
"image_type": "135x180",
"height": 135,
"width": 180,
"source": "https://fbcdn-photos-a.akamaihd.net/hphotos-ak-ash4/400507_2529780763684_785627980_a.jpg"
}, {
"image_type": "97x130",
"height": 97,
"width": 130,
"source": "https://fbcdn-photos-a.akamaihd.net/hphotos-ak-ash4/400507_2529780763684_785627980_s.jpg"
}, {
"image_type": "2048x2048",
"height": 1536,
"width": 2048,
"source": "https://fbcdn-sphotos-a-a.akamaihd.net/hphotos-ak-ash4/s2048x2048/400507_2529780763684_785627980_n.jpg"
}
],
"comments": [
{
"comment_id": 3,
"from": {
"id": 8,
"name": "Test User 1"
},
"message": "Message 1, test 1",
"created_time": "12/01/2012 10:15:20",
"likes": 0,
"dislikes": 0
}, {
"comment_id": 4,
"from": {
"id": 12,
"name": "Test user 3"
},
"message": "Message 1, test 2",
"created_time": "12/01/2012 10:18:20",
"likes": 2,
"dislikes": 0
}, {
"comment_id": 9,
"from": {
"id": 24,
"name": "Test User 2"
},
"message": "Message 1, test 3",
"created_time": "12/01/2012 11:15:20",
"likes": 1,
"dislikes": 1
}, {
"comment_id": 10,
"from": {
"id": 17,
"name": "New User 1"
},
"message": "Message 1, test 4",
"created_time": "12/02/2012 10:15:20",
"likes": 0,
"dislikes": 0
}
],
"likes": [
{
"like_id": 1,
"id": 4,
"name": "Tome miller"
}, {
"like_id": 2,
"id": 5,
"name": "Frank Thomson"
}, {
"like_id": 3,
"comment_id": 4,
"id": 4,
"name": "Tome miller"
}, {
"like_id": 4,
"comment_id": 4,
"id": 5,
"name": "Frank Thomson"
}
],
"dislikes": [
{
"dislike_id": 1,
"id": 8,
"name": "The grinch",
"comment_id": 9
}
],
"tags": [
{
"tag_id": 1,
"owner": {
"id": 5,
"name": "Frank Tagg"
},
"tagged": {
"id": "",
"name": "frank Miller"
},
"x": 150,
"y": 25,
"xpercent": 55,
"ypercent": 88
}
]
}
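To make the requirement concrete: without any index, "all pictures user x is tagged in" means reading every picture doc and checking its tags array, as in this minimal Python sketch (pictures_tagging_user is a hypothetical name). It assumes tagged.id is actually filled in with the user's id; in the sample above it is an empty string, and it should use the same type as the other ids (numbers such as 5 or 8), since 5 and "5" do not compare equal. The cost of this scan grows with the total number of docs, not with the number of matches.

```python
from typing import Any, Dict, Iterable, List, Tuple


def pictures_tagging_user(
    docs: Iterable[Tuple[str, Dict[str, Any]]], user_id: Any
) -> List[str]:
    """Return the keys of picture docs whose tags array mentions user_id.

    Full scan: every doc is read, however few of them match.
    """
    matches = []
    for key, doc in docs:
        if doc.get("object_type") != "picture":
            continue
        for tag in doc.get("tags", []):
            if tag.get("tagged", {}).get("id") == user_id:
                matches.append(key)
                break  # one matching tag is enough to include this picture
    return matches
```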
I am not sure I accurately understand your description; for example, what is the difference between a "tagg" and a tag doc? Are there also "keyword" tags? Your description is a little confusing.
But I think this is what you are saying:
You have several options for retrieving tagged documents. One would be to create a View (Map/Reduce) keyed by user id and tag, to find comments and user tags within pictures; querying that View by a user id returns every picture the user is tagged in. You can create another, Spatial View (Map/Reduce) to locate items within a geographic range. Alternatively, you can use Elasticsearch to query for documents that involve user_id = x.
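The View option can be illustrated with a small simulation. In Couchbase Server the map function itself is written in JavaScript and calls emit(key, value) for each index row; the server keeps the emitted rows sorted by key, and a query with a key parameter returns only the matching rows. The Python below only mimics that idea under simplifying assumptions (everything in memory, all user ids of the same type); map_tags and TagIndex are hypothetical names, not Couchbase APIs.

```python
import bisect
from typing import Any, Dict, List, Tuple


def map_tags(doc_id: str, doc: Dict[str, Any]) -> List[Tuple[Any, str]]:
    """Mimic a view map function: one (tagged user id, doc id) row per tag."""
    rows = []
    if doc.get("object_type") == "picture":
        for tag in doc.get("tags", []):
            user_id = tag.get("tagged", {}).get("id")
            if user_id not in (None, ""):  # skip tags with no user attached
                rows.append((user_id, doc_id))
    return rows


class TagIndex:
    """Rows kept sorted by key, like a view index, so a lookup is a range read."""

    def __init__(self) -> None:
        self.rows: List[Tuple[Any, str]] = []

    def index(self, doc_id: str, doc: Dict[str, Any]) -> None:
        for row in map_tags(doc_id, doc):
            bisect.insort(self.rows, row)

    def query(self, key: Any) -> List[str]:
        # Binary-search to the first row with this key, then read until it changes.
        i = bisect.bisect_left(self.rows, (key, ""))
        out = []
        while i < len(self.rows) and self.rows[i][0] == key:
            out.append(self.rows[i][1])
            i += 1
        return out
```

A lookup is then a binary search plus a read of that user's rows, so its cost depends on the number of matches rather than on the total number of pictures. The price is that the index has to be kept up to date as docs change, which is what consumes CPU for Views.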
Can you confirm the structure of the doc, or help me understand it better, so I can help you write the Views if you need assistance with that? As far as scalability goes, it will be more about providing enough CPU for Views and enough horizontal scale for writes. 10 million docs a month is fairly modest, and you should be fine with either solution.
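For a sense of scale, here is a quick back-of-the-envelope check of the write rate, assuming a 30-day month and writes spread evenly over it (real traffic will have peaks well above this average):

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month: 2,592,000 s


def average_writes_per_second(docs_per_month: int) -> float:
    """Average sustained insert rate if docs arrive evenly over the month."""
    return docs_per_month / SECONDS_PER_MONTH


print(round(average_writes_per_second(10_000_000), 2))  # 3.86
```

That is roughly 4 new docs per second on average (plus, with the View approach, one index row per tag), which is why the volume counts as modest.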
@scalabl3
Technical Evangelist
Couchbase Inc.