UC San Diego, Couchbase Collaborate on Next-Generation Query Language for Big Data
June 03, 2015
Combines flexibility of JSON with power of SQL
Common Vision: SQL + JSON
Prior to their collaboration, both Couchbase and Prof. Papakonstantinou independently concluded that existing approaches did not provide a complete and efficient solution for querying semi-structured data. Both shared a common vision of combining SQL, the leading database query language, with JSON, the leading format for modeling semi-structured data in modern applications. Both had launched work in that direction, and their decision to collaborate is based on this common vision.
Couchbase will fund continued research at UC San Diego to further the development of SQL++, a formally defined, SQL-backwards-compatible declarative language for semi-structured data developed by Papakonstantinou’s team at UC San Diego’s Database Group. Couchbase will also continue to enhance N1QL, the company’s query language that extends SQL for JSON and is consistent with the specifications defined by SQL++.
SQL++ is easy to learn, especially for developers who are familiar with the syntax of SQL. But unlike SQL over a relational database, where all data must fit neatly into tables, SQL++ operates on JSON, a lightweight data-interchange format that is easy for humans to read and write, and for machines to generate and parse.
As detailed in a recent technical report* from the UC San Diego Database Group, SQL++ co-creators Papakonstantinou and Kian Win Ong (PhD ’12), a researcher and CSE alumnus, specify the syntax and semantics of SQL++, which is much cleaner and introduces only a small number of query language extensions to SQL. “SQL capabilities are most often extended by removing semantic restrictions of SQL, rather than inventing new features,” said Papakonstantinou. “This allows SQL++ to avoid unnecessary extensions over SQL.” Ease of use is further enhanced because the SQL++ semantics tend to be significantly shorter than those of prior query languages.
SQL++ and N1QL
After looking at 11 query languages, Papakonstantinou concluded that none provided full-fledged querying of semi-structured data. With funding from the National Science Foundation (NSF) and Informatica for UCSD’s FORWARD project, he and his team developed and launched the SQL++ specification. Concurrently, Couchbase independently developed N1QL to provide a comprehensive query language that combines the query power of SQL with the flexibility of JSON data.
“Enterprises began to ask for declarative queries on semi-structured databases. With SQL++ you have a declarative query language that queries JSON and is backwards compatible with SQL,” said Papakonstantinou. “This is a query language for the new era of big data, because it operates on semi-structured data but is fully declarative and SQL compatible. It gives you the best of both worlds. Couchbase N1QL aligns with the SQL++ specifications and the requirements of querying semi-structured data.”
“We are delighted to work with professor Papakonstantinou and his research team because they share our vision that a declarative query language for JSON should be based on SQL,” said Gerald Sangudi, Chief Architect for query engineering at Couchbase. “SQL++ also brings rigor and completeness that are beneficial to our users.”
In fact, Couchbase and UCSD have formally established that N1QL is a dialect of SQL++. The formal mapping of N1QL to SQL++ is being published separately.
Others to Join Collaboration
In addition to Couchbase, UCSD will invite other academic and industry partners to join a query language collaboration, in order to benefit users and ease the adoption of semi-structured and NoSQL databases. Already, UC Irvine’s AsterixDB*, led by professor Mike Carey, supports most of SQL++ and is on the path to supporting the full language. The collaboration has already provided important language design feedback.
*Kian Win Ong, Yannis Papakonstantinou, Romain Vernoux, The SQL++ Query Language: Configurable, Unifying and Semi-structured, Technical Report 2015, Department of Computer Science and Engineering, University of California, San Diego, 29 April 2015. http://arxiv.org/pdf/1405.3631v7.pdf
About UC San Diego Database Group
The Database Group is located in UC San Diego’s Computer Science and Engineering department, and is led by CSE professor Yannis Papakonstantinou, a leading expert on databases and data management technologies. He is also a co-director and on the faculty of the university’s new professional Master of Advanced Studies in Data Science and Engineering, launched in Fall 2014. Papakonstantinou is also an entrepreneur: in 2000 he founded Enosys Software, which was acquired by BEA Systems in 2003. Enosys was one of the first companies to feature a semi-structured data query processor, using XML, which is currently being rapidly replaced by JSON. More recently, Papakonstantinou, researchers Kian Win Ong and Yannis Katsis, and their team of PhD and MS graduate students worked on the FORWARD project, a rapid development platform for analytics applications that uses SQL++ to create and incrementally update integrated views of data across multiple databases (SQL, NoSQL, or both). FORWARD includes a middleware query processor that uses SQL++ to issue distributed queries over a variety of data sources, including SQL, NoSQL, NewSQL and SQL-on-Hadoop. The FORWARD project’s SQL++-based visualization and app development platform has been commercially deployed. More information about the FORWARD project is available at http://forward.ucsd.edu/.
About Couchbase
At Couchbase, we believe data is at the heart of the enterprise. We empower developers and architects to build, deploy, and run their most mission-critical applications. Couchbase delivers a high-performance, flexible and scalable modern database that runs across the data center and any cloud. Many of the world’s largest enterprises rely on Couchbase to power the core applications their businesses depend on. For more information, visit www.couchbase.com.