SQL has been around for a loooooong time. It is very intuitive and efficient way to process structured data and has been the choice for databases for many years. With the world of BIG DATA however, data has velocity, variety and volume. SQL can tackle the 2 “v”s fine with optimizations: “velocity” and “volume”. In fact, many new dielects of SQL (N1QL, Spark, U-SQL, Impala, Drill and more) are doing exactly that. However, “variety” is a different ballgame! Big Data is complex in the new world; it has unpredictable, constantly evolving, jagged scheme, sparse values and deeply nested structures. For that SQL needs to be extended!

Well, I am super excited to report that we have built N1QL in Couchbase Server 4 to tackle exactly these issues! N1QL can easily traverse over the complex structures in JSON (the de-facto serialization standard for complex data). Using N1QL, you can work with not only NULLs but attributes that are MISSING in various shapes of JSON you process. OR you can use operators like ANY/EVERY to query over arrays embedded within JSON document. OR you can use commands like UNNEST & NEST that can flatten or unflatten nested arrays. There are too many of these powerful extensions to count here so I won’t do that. Instead I am going to show you one hidden gem that was originally posted here by Gerald. This gem is very useful with aggregates if you are using N1QL. vs a relational database like SQL Server.

One of the big advantages of N1QL is its ability to understand the array types. Aggregation functions like MAX() are not revolutionary but with the additions of nesting and arrays, something as simple as MAX() can be super powerful. One note before I dive in – I am going to pick on TSQL and SQL Server as I contributed to TSQL in past life at Microsoft. However this is as applicable to Oracle, Postgres, Mysql, Informix or DB2 and more… Ok so: Imagine trying to find attributes of the product that has the MAX price. In SQL Server that is fairly simple query to write using TSQL;

That is good. Here is the output – the product with the highest price is product named “c” with ID 3.

Here is the execution plan looks like. Basically  The execution plan scans for the MAX price value. Once you have the value, It is a nested loop join to search for the other attributes of the product in the table like productID and name.

However, N1QL has a great advantage here. Since it can process MAX() with arrays, you can return all the attributes of the document without an additional scan.

OR you can simply return the full document using the following query;

Lets take a look at the execution plan for the N1QL query. Here comes the spoiler: you see a single FETCH operation that can perform the MAX and the projection is done without a second FETCH.

Even though, this is a pretty cool trick and a great performance gain, We just scratched the surface of what N1QL is capable of. There is a ton more to discover with N1QL. You can get started with Couchbase Server and N1QL here with the getting started guide.

———————————————————

By the way if you’d like to try this out with SQL Server 2016 and Couchbase 4, here are the scripts to see this yourself in action;

SQL Server Script

Note: Run this in a database called “test”

Here is the Couchbase Server Script

Note: create a bucket called “test”

insert into test(key,value) values(“1”,{productID:1, price:10, name:’a’});

Author

Posted by Cihan Biyikoglu, Director of Product Management, Couchbase

Cihan Biyikoglu is a director of product management at Couchbase, responsible for the Couchbase Server product. Cihan is a big data enthusiast who brings over twenty years of experience to Redis Labs’ product team. Cihan started his career as a C/C++ developer.

Leave a reply