Drinking the best beer

itay · January 19, 2015, 8:37pm

I want to drink a beer but first I want to taste the best 3 beers

How can I construct a View / Query / … , that will return the top 3 beers in the DB according to their Score field ?

How can I construct a View that will return the top 3 beers per brewery (given in a key parameter in the View query) ?

itay · January 21, 2015, 7:11am

Bounce Bounce Bounce

cihangirb · January 21, 2015, 7:47am

Hi @itay, you should be able to do this with a reduce but before I plunge into that, on your last query, what do you want the top 3 sorted on per brewery? you want the top 3 to be picked with a different order than score?

itay · January 21, 2015, 9:01am

Hi @cihangirb,

To make things more interesting, I actually need a 3rd level, let’s call it the city.

So,

The best 3 beers from all the beers
The best 3 beers from a specific city (specified in a key)
The best 3 beers from a brewery in a specific city (in a compound key of [city, brewery]

Beers are only be sorted by score.

Thanks

cihangirb · January 21, 2015, 11:17pm

@itay, I may be missing something but I don’t see a score on the beer sample that I can sort on but for the samples below I’ll assume there is a score on the beer.

for #1 you can emit the following and limit to 3.
if (doc.type==“beer” && doc.brewery_id && doc.name)
{
emit([doc.score, doc.brewery_id, doc.name], null);
}

the result would come in sorted on the score than on brewery_id and on beer name.
let me see if I can take a stab at #2 and #3. they need a custom reducer.

by the way, N1ql is a great option for these queries too. have you tried that? DP4 is available for a download and we’ll have N1ql in the next major version with indexing.

itay · January 22, 2015, 1:14pm

Hi @cihangirb,

You are correct, there is no score in beers.

Assuming there are million beers. If I understand the View mechanism, the View you suggest will occupy 1,000,000 (compound) keys in memory all the time and also in persistent memory.
This is a lot when all I want is just the top 3.
Am I correct ?
Is there a way to limit the View in memory ?

Regarding N1QL:

Assume I need the top 3 every sec. Is querying a view better (time, memory, CPU) then using N1QL (that runs through the million docs every call) ?
Can you point me to how to install and play with DP4 ?

Waiting for 2 & 3.

Thanks for your efforts.

cihangirb · January 22, 2015, 3:54pm

my bad, I did not mention that you can put the limit=3 on the query to limit the results to top 3. that is easy to do with view.

N1ql is certainly designed for fast response and for expressing all 3 queries above, would be much easier to work with. However I don’t have detailed perf data at this point yet given it is in development to tell you a direct side by side comparison.

you can download and play with N1QL today and it is here: http://docs.couchbase.com/developer/n1ql-dp3/n1ql-intro.html

itay · January 22, 2015, 4:24pm

You did mention to limit to 3 but this will only limit the returned result, not the occupied memory in the server.
Am I right ?

Regarding N1QL,
Although I have read that the DP version is very slow, I hope that the production version will be faster.
Regardless of specific performance, if a N1QL query has to run million docs every time while the View has it ready in memory, isn’t it obvious that in this scenario, View is better ?

cihangirb · January 22, 2015, 5:28pm

Limit is pushed down to each node so you do get the efficiency of reading only what you need with LIMIT and SKIP etc.

Views are great and they are available today and they are very unique. So views certainly can provide what you need. N1ql is in developer preview and isn’t performance tuned at this point but it comes with a number of differences over views for queries like the ones you mentioned and some of these can improve life greatly for your app.
- Expressiveness of SQL: SQL for JSON documents is very powerful. some of the queries you have above can be easier to express and maintain in N1QL.
- Flexible Consistency: We have not talked about this in your case but we provide fully consistent or a eventually consistent view of your data in views today. N1QL extends that to anything in between the 2 which can improve the response times drastically.
- Caching in N1QL and indexing will be very powerful compared to views. we will bring this into views as well later but in the initial release N1QL processing and secondary indexes will have some caching advantages.
- Scalability: N1QL will have the ability to be isolated this will have more dedicated horse power for processing queries. N1QL will be able to utilize views and the new secondary indexes. but secondary indexes will also have the same isolation benefits. Views for now stay aligned to data distribution so depending on your queries, N1QL can give you both options as opposed to views API is tied to views.
- Ad-hoc Queries: N1QL does not require you have an underlying index to process the query. With the view examples, I’ll need to give more than 1 view.

Sorry for the long post but this is just a few of the differences. there is a lot more but hopefully this scratches part of the surface on the differences.

itay · January 22, 2015, 8:32pm

I think that N1QL is great for periodical queries and can help a lot, basically like any other SQL query that the world knew before some time.

However, if the queries are frequent, then going through all the data every time is not efficient and this is when View helps as it pre-calcs the required info and even reducing it, preparing it for super fast retrieval.

But please correct me if I’m wrong:

Assuming there are million beers. If I understand the View mechanism,
the View you suggest will occupy 1,000,000 (compound) keys in memory all
the time and also in persistent memory

Waiting for 2 & 3.

Thanks for your efforts.

itay · January 29, 2015, 2:50pm

@cihangirb, still waiting for the Views for 2&3
Thanks

cihangirb · January 29, 2015, 4:26pm

Sorry @itay, got busy. We are all heads down on the next release. Do you need this urgently? I should be able to get to this next week. is that ok?
thanks
-cihan

itay · January 29, 2015, 6:40pm

I don’t want to delay the next release but would be happy to see a solution to this in the near future.

itay · February 5, 2015, 1:19pm

@cihangirb,

Although I don’t fully understand how this query works (are ORDER BY and LIMIT missing ?) , I do wander about its performance cost in real time.
Is the N1QL query being evaluated every call, based on millions of item in the bucket ?
Is the view being evaluated only offline ?
What if this query is called many times ?

http://query.pub.couchbase.com/tutorial/#37

SELECT
    product.name, 
    COUNT(reviews) AS reviewCount,
    ROUND(AVG(reviews.rating),1) AS AvgRating,
    category 
        FROM reviews AS reviews
    JOIN product AS product 
        ON KEYS reviews.productId
    UNNEST product.categories AS category

Plus, according to N1QL roadmap, when is it scheduled to be released ?
Will it be available with the community server version as well at the same time ?

cihangirb · February 6, 2015, 11:05pm

@itay, thanks for all the questions and being so active on these forums. The answers are a little lengthy but I have a great presentation I can walk you thought on how views and n1ql works under the hood. I can walk you through in 10-15 mins on the differences. Is there a good time to get on a video call next week. I am on PST in San Fran and my email is cihan @ couchbase.com. We can also touch on the 2 queries I owe you.
The next version of Couchbase Server is scheduled for Q2/Q3 this year so in a few months. I don’t have the answer for community edition question at the moment.
thanks
-cihan