{"id":1572,"date":"2019-08-07T11:27:57","date_gmt":"2019-08-07T18:27:57","guid":{"rendered":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/"},"modified":"2019-08-07T11:27:57","modified_gmt":"2019-08-07T18:27:57","slug":"cost-based-optimizer-for-couchbase-n1ql-sql-for-json","status":"publish","type":"post","link":"https:\/\/www.couchbase.com\/blog\/pt\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/","title":{"rendered":"Cost Based Optimizer for Couchbase N1QL (SQL for JSON)"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><span>Authors &amp; Engineering Team: Bingjie Miao, Keshav Murthy,\u00a0 Marco Greco, Prathibha Bisarahalli. <\/span><span>Couchbase, Inc.\u00a0<\/span><\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><span>A rule-based optimizer knows rules for everything and cost for nothing &#8211; <a href=\"https:\/\/en.wikiquote.org\/wiki\/Oscar_Wilde\">Oscar Wilde<\/a><\/span><\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Abstract<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Couchbase is a distributed JSON database. It provides distributed data processing, high performance indexing and declarative query language N1QL along with search index, eventing and analytics.\u00a0 N1QL (Non First Normal Query Language) is SQL for JSON. N1QL is to JSON what SQL is to relations. Developers say what needs to be done and the N1QL engine will figure out the \u201chow\u201d. Current N1QL optimizer is a <\/span><a href=\"https:\/\/dzone.com\/articles\/a-deep-dive-into-couchbase-n1ql-query-optimization\"><span>rule based optimizer<\/span><\/a><span>. \u00a0 Cost Based Optimizer(CBO) for SQL was invented at <\/span><a href=\"https:\/\/www2.cs.duke.edu\/courses\/compsci516\/cps216\/spring03\/papers\/selinger-etal-1979.pdf\"><span>IBM some 40 years ago<\/span><\/a><span> and has been critical to the success of RDBMS and the developer\/DBA productivity.\u00a0 NoSQL databases were invented some 10 years ago. Shortly after their invention, NoSQL databases started adding SQL-like query language with limited access paths and optimizer capabilities. \u00a0 Most use rule based optimizer or simply support cost based optimization on simple scalar values (strings, numbers, boolean,etc) only.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>To create a CBO on JSON model, you first need to collect and organize the statistics. How do you collect, store and use statistics on JSON\u2019s flexible schema? How do you collect statistics on objects, arrays, elements within objects? How do you use them efficiently within your optimizer?<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/blog.jooq.org\/author\/lukaseder\/\"><span>Lukas Eder<\/span><\/a><span> once told me, \u201cCost-based optimizer makes SQL fully declarative.\u201d \u00a0 He has a point. Couchbase 6.5 (now GA) has Cost Based Optimizer for N1QL.\u00a0 This article presents the introduction of N1QL Cost Based Optimizer (CBO) in <\/span><a href=\"https:\/\/www.couchbase.com\/downloads?family=server&amp;product=couchbase-server-developer\"><span>Couchbase 6.5<\/span><\/a><span>.\u00a0 CBO is a patent-pending developer-preview feature.\u00a0 In this article, we describe on how you can use CBO as well as\u00a0 its implementation details.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A PDF version of this article can be <a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Cost-Based-Optimizer-for-Couchbase-N1QL-SQL-for-JSON.pdf\">downloaded here<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Table of Contents<\/strong><\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Abstract<\/strong><\/li>\n\n\n<li><strong>Introduction to N1QL<\/strong><\/li>\n\n\n<li><strong>Using Cost-Based Optimizer for N1QL<\/strong><\/li>\n\n\n<li><strong>N1QL Optimizer<\/strong><\/li>\n\n\n<li><strong>Cost-Based Optimizer for N1QL<\/strong><\/li>\n\n\n<li><strong>Statistics Collection for N1QL CBO<\/strong><\/li>\n\n\n<li><strong>Summary<\/strong><\/li>\n\n\n<li><strong>Resources N1QL rule-based Optimizer<\/strong><\/li>\n\n\n<li><strong>References<\/strong><\/li>\n\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><b>Introduction to N1QL<\/b><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span>As JSON has been increasingly accepted by the information technology industry as the lingua franca for information change, there has been an exponential rise in the need for repositories that natively store, update and query JSON documents.\u00a0 SQL has added features manipulating JSON in <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/SQL:2016\"><span>SQL:2016<\/span><\/a><span>. SQL:2016 adds new scalar and table functions to manipulate JSON.\u00a0 An alternative approach is to treat JSON as the data model and to design the language to operate on JSON natively.\u00a0 <\/span><a href=\"https:\/\/query.couchbase.com\/\"><span>N1QL<\/span><\/a><span> and <\/span><a href=\"https:\/\/forward.ucsd.edu\/sqlpp.html\"><span>SQL++<\/span><\/a><span> use the latter approach and provide natural access to scalars, objects, arrays, arrays of objects, arrays within objects and so on.\u00a0<\/span><\/p>\n\n\n<p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; wrap=&#8221;true&#8221; whitespace-before=&#8221;2&#8243; whitespace-after=&#8221;2&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]SQL:2016:  From:  Microsoft SQL Server. <\/p>\n<p>SELECT Name,Surname,<br \/>\n JSON_VALUE(jsonCol,&#8217;$.info.address.PostCode&#8217;) AS PostCode,<br \/>\n JSON_VALUE(jsonCol,&#8217;$.info.address.&#8221;Address Line 1&#8243;&#8216;)+&#8217; &#8216;<br \/>\n  +JSON_VALUE(jsonCol,&#8217;$.info.address.&#8221;Address Line 2&#8243;&#8216;) AS Address,<br \/>\n JSON_QUERY(jsonCol,&#8217;$.info.skills&#8217;) AS Skills<br \/>\nFROM People<br \/>\nWHERE ISJSON(jsonCol) &gt; 0<br \/>\n AND JSON_VALUE(jsonCol,&#8217;$.info.address.Town&#8217;)=&#8217;Belgrade&#8217;<br \/>\n AND Status=&#8217;Active&#8217;<br \/>\nORDER BY JSON_VALUE(jsonCol,&#8217;$.info.address.PostCode&#8217;)<br \/>\n[\/crayon]<\/p>\n\n\n<p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]N1QL:  Same query written in N1QL on the JSON model<br \/>\nSELECT Name,Surname, jsonCol.info.address.PostCode  AS PostCode,<br \/>\n       (jsonCol.info.address.`Address Line 1` + &#8216; &#8216;<br \/>\n  + jsonCol.`info.address.`Address Line 2`) AS Address,<br \/>\n       jsonCol.info.skills&#8217; AS Skills<br \/>\nFROM People<br \/>\nWHERE<br \/>\n jsonCol.info.address.Town = &#8216;Belgrade&#8217;<br \/>\n AND Status=&#8217;Active&#8217;<br \/>\nORDER BY jsonCol.info.address.PostCode<br \/>\n[\/crayon]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Learn N1QL at <\/span><a href=\"https:\/\/query-tutorial.couchbase.com\/\"><span>https:\/\/query-tutorial.couchbase.com\/<\/span><\/a><b>\u00a0<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Couchbase Server:<\/b><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-8.06.07-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-7517 aligncenter\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/Screen-Shot-2019-08-10-at-8.06.07-AM-300x103-1.png\" alt=\"\" width=\"777\" height=\"267\"><\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Couchbase has engines supporting N1QL:<\/span><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>N1QL for interactive applications in the Query service.<\/span><\/li>\n\n\n<li><span>N1QL for Analytics in the Analytics service.<\/span><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><span>In this article, we focus on N1QL for Query (interactive applications) implemented in the query service. \u00a0 All of the data manipulated by N1QL is saved in JSON within Couchbase data store managed by the data service.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>To support query processing on JSON, N1QL extends the SQL language in many ways:\u00a0<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>Support for flexible schema on semi-structured self-describing JSON.<\/span><\/li>\n\n\n<li><span>access and manipulate elements in JSON: scalar values, objects, arrays, objects of scalar values, arrays of scalar values, arrays of objects, arrays of arrays, etc.\u00a0<\/span><\/li>\n\n\n<li><span>Introduce a new boolean value, MISSING to represent a missing key-value pair in a document, this is distinct from a known null value.\u00a0 This extends the <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Three-valued_logic#Application_in_SQL\"><span>three-valued logic<\/span><\/a><span> to <\/span><a href=\"https:\/\/docs.couchbase.com\/server\/5.5\/n1ql\/n1ql-language-reference\/booleanlogic.html\"><span>four-valued logic<\/span><\/a><span>.<\/span><\/li>\n\n\n<li><span>New operations for NEST and UNNEST operations to create arrays and to flatten arrays, respectively.<\/span><\/li>\n\n\n<li><span>Extend the JOIN operations to work with JSON scalars, objects and arrays.<\/span><\/li>\n\n\n<li><span>To speedup the processing of these JSON documents the global secondary indexes can be created on one or more scalar values, scalar values of arrays,\u00a0 nested objects, nested arrays, objects, array objects, array elements.<\/span><\/li>\n\n\n<li><span>Add integrated search capability using inverted search index.<\/span><\/li>\n\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Using CBO for N1QL<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span>We\u2019ve introduced (CBO) in Couchbase 6.5 (now GA). Let\u2019s look at how to use the feature before delving into the details.<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>CREATE a new bucket and load the data from the sample bucket travel-sample.<\/span><\/li>\n\n<\/ul>\n\n\n<p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; whitespace-before=&#8221;2&#8243; whitespace-after=&#8221;2&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]INSERT INTO hotel(KEY id, VALUE h)<br \/>\nSELECT META().id id,<br \/>\n       h<br \/>\nFROM `travel-sample` h<br \/>\nWHERE type = &#8220;hotel&#8221;<br \/>\n[\/crayon]<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>Sample hotel document<\/span><\/li>\n\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Here\u2019s an example hotel document. These values are scalars, objects, and arrays.\u00a0 A query on this will access and process on all of these fields. <\/span><\/p>\n\n\n<p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; whitespace-before=&#8221;2&#8243; whitespace-after=&#8221;2&#8243; lang=&#8221;js&#8221; decode=&#8221;true&#8221;]{<br \/>\n    &#8220;hotel&#8221;: {<br \/>\n      &#8220;address&#8221;: &#8220;Capstone Road, ME7 3JE&#8221;,<br \/>\n      &#8220;alias&#8221;: null,<br \/>\n      &#8220;checkin&#8221;: null,<br \/>\n      &#8220;checkout&#8221;: null,<br \/>\n      &#8220;city&#8221;: &#8220;Medway&#8221;,<br \/>\n      &#8220;country&#8221;: &#8220;United Kingdom&#8221;,<br \/>\n      &#8220;description&#8221;: &#8220;40 bed summer hostel about 3 miles from Gillingham, housed in a districtive converted Oast House in a semi-rural setting.&#8221;,<br \/>\n      &#8220;directions&#8221;: null,<br \/>\n      &#8220;email&#8221;: null,<br \/>\n      &#8220;fax&#8221;: null,<br \/>\n      &#8220;free_breakfast&#8221;: true,<br \/>\n      &#8220;free_internet&#8221;: false,<br \/>\n      &#8220;free_parking&#8221;: true,<br \/>\n      &#8220;geo&#8221;: {<br \/>\n        &#8220;accuracy&#8221;: &#8220;RANGE_INTERPOLATED&#8221;,<br \/>\n        &#8220;lat&#8221;: 51.35785,<br \/>\n        &#8220;lon&#8221;: 0.55818<br \/>\n      },<br \/>\n      &#8220;id&#8221;: 10025,<br \/>\n      &#8220;name&#8221;: &#8220;Medway Youth Hostel&#8221;,<br \/>\n      &#8220;pets_ok&#8221;: true,<br \/>\n      &#8220;phone&#8221;: &#8220;+44 870 770 5964&#8221;,<br \/>\n      &#8220;price&#8221;: null,<br \/>\n      &#8220;public_likes&#8221;: [<br \/>\n        &#8220;Julius Tromp I&#8221;,<br \/>\n        &#8220;Corrine Hilll&#8221;,<br \/>\n        &#8220;Jaeden McKenzie&#8221;,<br \/>\n        &#8220;Vallie Ryan&#8221;,<br \/>\n        &#8220;Brian Kilback&#8221;,<br \/>\n        &#8220;Lilian McLaughlin&#8221;,<br \/>\n        &#8220;Ms. Moses Feeney&#8221;,<br \/>\n        &#8220;Elnora Trantow&#8221;<br \/>\n      ],<br \/>\n      &#8220;reviews&#8221;: [<br \/>\n        {<br \/>\n          &#8220;author&#8221;: &#8220;Ozella Sipes&#8221;,<br \/>\n          &#8220;content&#8221;: &#8220;This was our 2nd trip here and we enjoyed it as much or more than last year. Excellent location across from the French Market and just across the street from the streetcar stop. Very convenient to several small but good restaurants. Very clean and well maintained. Housekeeping and other staff are all friendly and helpful. We really enjoyed sitting on the 2nd floor terrace over the entrance and &#8220;people-watching&#8221; on Esplanade Ave., also talking with our fellow guests. Some furniture could use a little updating or replacement, but nothing major.&#8221;,<br \/>\n          &#8220;date&#8221;: &#8220;2013-06-22 18:33:50 +0300&#8221;,<br \/>\n          &#8220;ratings&#8221;: {<br \/>\n            &#8220;Cleanliness&#8221;: 5,<br \/>\n            &#8220;Location&#8221;: 4,<br \/>\n            &#8220;Overall&#8221;: 4,<br \/>\n            &#8220;Rooms&#8221;: 3,<br \/>\n            &#8220;Service&#8221;: 5,<br \/>\n            &#8220;Value&#8221;: 4<br \/>\n          }<br \/>\n        },<br \/>\n        {<br \/>\n          &#8220;author&#8221;: &#8220;Barton Marks&#8221;,<br \/>\n          &#8220;content&#8221;: &#8220;We found the hotel de la Monnaie through Interval and we thought we&#8217;d give it a try while we attended a conference in New Orleans. This place was a perfect location and it definitely beat staying downtown at the Hilton with the rest of the attendees. We were right on the edge of the French Quarter withing walking distance of the whole area. The location on Esplanade is more of a residential area so you are near the fun but far enough away to enjoy some quiet downtime. We loved the trolly car right across the street and we took that down to the conference center for the conference days we attended. We also took it up Canal Street and nearly delivered to the WWII museum. From there we were able to catch a ride to the Garden District &#8211; a must see if you love old architecture &#8211; beautiful old homes(mansions). We at lunch ate Joey K&#8217;s there and it was excellent. We ate so many places in the French Quarter I can&#8217;t remember all the names. My husband loved all the NOL foods &#8211; gumbo, jambalya and more. I&#8217;m glad we found the Louisiana Pizza Kitchen right on the other side of the U.S. Mint (across the street from Monnaie). Small little spot but excellent pizza! The day we arrived was a huge jazz festival going on across the street. However, once in our rooms, you couldn&#8217;t hear any outside noise. Just the train at night blowin it&#8217;s whistle! We enjoyed being so close to the French Market and within walking distance of all the sites to see. And you can&#8217;t pass up the Cafe du Monde down the street &#8211; a busy happenning place with the best French dougnuts!!!Delicious! We will defintely come back and would stay here again. We were not hounded to purchase anything. My husband only received one phone call regarding timeshare and the woman was very pleasant. The staff was laid back and friendly. My only complaint was the very firm bed. Other than that, we really enjoyed our stay. Thanks Hotel de la Monnaie!&#8221;,<br \/>\n          &#8220;date&#8221;: &#8220;2015-03-02 19:56:13 +0300&#8221;,<br \/>\n          &#8220;ratings&#8221;: {<br \/>\n            &#8220;Business service (e.g., internet access)&#8221;: 4,<br \/>\n            &#8220;Check in \/ front desk&#8221;: 4,<br \/>\n            &#8220;Cleanliness&#8221;: 4,<br \/>\n            &#8220;Location&#8221;: 4,<br \/>\n            &#8220;Overall&#8221;: 4,<br \/>\n            &#8220;Rooms&#8221;: 3,<br \/>\n            &#8220;Service&#8221;: 3,<br \/>\n            &#8220;Value&#8221;: 5<br \/>\n          }<br \/>\n        }<br \/>\n      ],<br \/>\n      &#8220;state&#8221;: null,<br \/>\n      &#8220;title&#8221;: &#8220;Gillingham (Kent)&#8221;,<br \/>\n      &#8220;tollfree&#8221;: null,<br \/>\n      &#8220;type&#8221;: &#8220;hotel&#8221;,<br \/>\n      &#8220;url&#8221;: &#8220;https:\/\/www.yha.org.uk&#8221;,<br \/>\n      &#8220;vacancy&#8221;: true<br \/>\n    }<br \/>\n  }<br \/>\n[\/crayon]<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>Once you know the queries you want to run, you simply create the indexes with the keys.<\/span><\/li>\n\n<\/ul>\n\n\n<p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]CREATE INDEX i3 ON `hotel`(name, country);<br \/>\nCREATE INDEX i4 ON `hotel`(country, name);<br \/>\nCREATE INDEX i3 ON `hotel`(country, city);<br \/>\nCREATE INDEX i4 ON `hotel`(city, country);<\/p>\n<p>\/* Array indexes on the array keys you want to filter on.<br \/>\nCREATE INDEX i5 ON `hotel`(DISTINCT public_likes);<br \/>\nCREATE INDEX i6 ON `hotel`(DISTINCT ARRAY r.ratings.Overall FOR r IN reviews END);<\/p>\n<p>\/* Index on the fields within the geo object *\/<br \/>\nCREATE INDEX i7 ON `hotel`(geo.lat, geo.lon)<br \/>\n[\/crayon]<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>Now,collect statistics on the field you will have filters on.\u00a0 Typically, you index the fields you index on. So, you want to collect statistics on them as well.\u00a0 Unlike CREATE INDEX statement, the order of the keys has no consequence for UPDATE STASTICS statement<\/span><\/li>\n\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><span class=\"lang:mysql decode:true crayon-inline \">UPDATE STATISTICS for `hotel` (type, address, city, country, free_breakfast, id, phone); <\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>Array index on a simple array of scalars.\u00a0 public_likes is an array of strings. DISTINCT public_likes creates an index on each element of public_likes rather than the whole public_likes array.\u00a0 Details of array statistics later in the article.<\/span><\/li>\n\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><span class=\"lang:mysql decode:true crayon-inline \">UPDATE STATISTICS for `hotel`(DISTINCT public_likes); <\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>Now execute, explain and observe these statements.\u00a0 The CBO, based on the statistics you collected above, calculated the selectivity for the predicate (<\/span><b>country = &#8216;France&#8217;)<\/b><\/li>\n\n<\/ul>\n\n\n<p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]SELECT count(*) FROM `hotel` WHERE  country = &#8216;France&#8217;;<br \/>\n    {<br \/>\n        &#8220;$1&#8221;: 140<br \/>\n    }<br \/>\n[\/crayon]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u00a0<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>Here\u2019s the snippet from EXPLAIN.\u00a0 Explain output will have cardinality estimates and profile output will have the actual documents (rows, keys) qualified at each operator.<\/span><\/li>\n\n<\/ul>\n\n\n<p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;js&#8221; decode=&#8221;true&#8221;] &#8220;#operator&#8221;: &#8220;IndexScan3&#8221;,<br \/>\n            &#8220;cardinality&#8221;: 141.54221635883903,<br \/>\n            &#8220;cost&#8221;: 71.19573482849603,<br \/>\n            &#8220;covers&#8221;: [<br \/>\n                &#8220;cover ((`hotel`.`country`))&#8221;,<br \/>\n                &#8220;cover ((`hotel`.`type`))&#8221;,<br \/>\n                &#8220;cover ((meta(`hotel`).`id`))&#8221;,<br \/>\n                &#8220;cover (count(*))&#8221;<br \/>\n            ],<br \/>\n            &#8220;index&#8221;: &#8220;i2&#8221;,<\/p>\n<p>SELECT count(*) FROM `hotel` WHERE  country = &#8216;United States&#8217;;<br \/>\n    {<br \/>\n        &#8220;$1&#8221;: 361<br \/>\n    }<br \/>\n            &#8220;cardinality&#8221;: 361.7189973614776,<br \/>\n            &#8220;cost&#8221;: 181.94465567282322,<br \/>\n            &#8220;covers&#8221;: [<br \/>\n                &#8220;cover ((`hotel`.`country`))&#8221;,<br \/>\n                &#8220;cover ((`hotel`.`type`))&#8221;,<br \/>\n                &#8220;cover ((meta(`hotel`).`id`))&#8221;,<br \/>\n                &#8220;cover (count(*))&#8221;<br \/>\n            ],<br \/>\n            &#8220;index&#8221;: &#8220;i2&#8221;,<br \/>\n[\/crayon]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u00a0<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>combining cost calculation on multiple predicates.\u00a0 Notice that the actual results are proportional to cardinality estimates.\u00a0 <\/span><a href=\"https:\/\/wp.sigmod.org\/?p=1075\"><span>Join selectivity estimates<\/span><\/a><span> are difficult to estimate due to correlations and require additional techniques. <\/span><\/li>\n\n<\/ul>\n\n\n<p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;js&#8221; decode=&#8221;true&#8221;]SELECT count(*) FROM `hotel` WHERE  country = &#8216;United States&#8217; and name LIKE &#8216;A%&#8217;;<br \/>\n    {<br \/>\n        &#8220;$1&#8221;: 7<br \/>\n    }<\/p>\n<p>            &#8220;cardinality&#8221;: 13.397476328354337,<br \/>\n            &#8220;cost&#8221;: 8.748552042415382,<br \/>\n            &#8220;covers&#8221;: [<br \/>\n                &#8220;cover ((`hotel`.`country`))&#8221;,<br \/>\n                &#8220;cover ((`hotel`.`name`))&#8221;,<br \/>\n                &#8220;cover ((meta(`hotel`).`id`))&#8221;,<br \/>\n                &#8220;cover (count(*))&#8221;<br \/>\n            ],<br \/>\n            &#8220;index&#8221;: &#8220;i4&#8221;,<\/p>\n<p>SELECT count(*) FROM `hotel` WHERE  country = &#8216;United States&#8217; and name = &#8216;Ace Hotel DTLA&#8217;<\/p>\n<p>    { &#8220;$1&#8221;: 1 }<\/p>\n<p>           &#8220;#operator&#8221;: &#8220;IndexScan3&#8221;,<br \/>\n            &#8220;cardinality&#8221;: 0.39466325234388644,<br \/>\n            &#8220;cost&#8221;: 0.25771510378055784,<br \/>\n            &#8220;covers&#8221;: [<br \/>\n                &#8220;cover ((`hotel`.`name`))&#8221;,<br \/>\n                &#8220;cover ((`hotel`.`country`))&#8221;,<br \/>\n                &#8220;cover ((meta(`hotel`).`id`))&#8221;,<br \/>\n                &#8220;cover (count(*))&#8221;<br \/>\n            ],<br \/>\n            &#8220;index&#8221;: &#8220;i3&#8221;,<\/p>\n<p>select count(1) from hotel where country = &#8216;United States&#8217; and city = &#8216;San Francisco&#8217;;<br \/>\n    {<br \/>\n        &#8220;$1&#8221;: 132<br \/>\n    }<\/p>\n<p>            &#8220;#operator&#8221;: &#8220;IndexScan3&#8221;,<br \/>\n            &#8220;cardinality&#8221;: 361.7189973614776,<br \/>\n            &#8220;cost&#8221;: 181.94465567282322,<br \/>\n            &#8220;index&#8221;: &#8220;i2&#8221;,<br \/>\n            &#8220;index_id&#8221;: &#8220;a020ba7594f7c045&#8221;,<br \/>\n            &#8220;index_projection&#8221;: {<br \/>\n                &#8220;primary_key&#8221;: true<br \/>\n            },<br \/>\n            &#8220;keyspace&#8221;: &#8220;hotel&#8221;,<br \/>\n[\/crayon]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u00a0<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>Calculation on array predicate: ANY.\u00a0 This uses the statistics collection on the expression (DISTINCT public_likes) in the UPDATE statistics above. Array statistics are different from normal scalar statistics in the same way array index keys are different from normal index keys.\u00a0 The histogram on public_keys will contain more than one value from the same document. So, all the calculations will have to account for that to get the estimates closer to reality. <\/span><\/li>\n\n<\/ul>\n\n\n<p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; whitespace-before=&#8221;2&#8243; whitespace-after=&#8221;2&#8243; lang=&#8221;js&#8221; decode=&#8221;true&#8221;]SELECT COUNT(1)<br \/>\nFROM hotel<br \/>\nWHERE ANY p IN public_likes SATISFIES p LIKE &#8216;A%&#8217; END<br \/>\n{<br \/>\n        &#8220;$1&#8221;: 272<br \/>\n    }<br \/>\n            &#8220;#operator&#8221;: &#8220;DistinctScan&#8221;,<br \/>\n            &#8220;cardinality&#8221;: 151.68407386905272,<br \/>\n            &#8220;cost&#8221;: 144.52983565532256,<br \/>\n            &#8220;scan&#8221;: {<br \/>\n                &#8220;#operator&#8221;: &#8220;IndexScan3&#8221;,<br \/>\n                &#8220;cardinality&#8221;: 331.49044875073974,<br \/>\n                &#8220;cost&#8221;: 143.53536430907033,<br \/>\n                &#8220;covers&#8221;: [<br \/>\n                    &#8220;cover ((distinct ((`hotel`.`public_likes`))))&#8221;,<br \/>\n                    &#8220;cover ((meta(`hotel`).`id`))&#8221;<br \/>\n                ],<br \/>\n[\/crayon]<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>Calculation on array predicate on a field within an object of an array: ANY r IN reviews SATISFIES r.ratings.Overall = 4 END. Statistics is collected on the expression: (DISTINCT ARRAY r.ratings.Overall FOR r IN reviews END).\u00a0 The statistics collection expression should be exactly the same as the index key array expression.<\/span><\/li>\n\n<\/ul>\n\n\n<p>[crayon theme=&#8221;github&#8221; font-size=&#8221;15&#8243; whitespace-before=&#8221;2&#8243; whitespace-after=&#8221;2&#8243; lang=&#8221;js&#8221; decode=&#8221;true&#8221;]SELECT COUNT(1)<br \/>\nFROM hotel<br \/>\nWHERE ANY r IN reviews SATISFIES r.ratings.Overall = 4 END<br \/>\n    {<br \/>\n        &#8220;$1&#8221;: 617<br \/>\n    }<\/p>\n<p>            &#8220;#operator&#8221;: &#8220;IndexScan3&#8221;,<br \/>\n            &#8220;cardinality&#8221;: 621.4762784966113,<br \/>\n            &#8220;cost&#8221;: 206.95160073937154,<br \/>\n            &#8220;covers&#8221;: [<br \/>\n                &#8220;cover ((distinct (array ((`r`.`ratings`).`Overall`) for `r` in (`hotel`.`reviews`) end)))&#8221;,<br \/>\n                &#8220;cover ((meta(`hotel`).`id`))&#8221;,<br \/>\n                &#8220;cover (count(1))&#8221;<br \/>\n            ],<br \/>\n            &#8220;filter_covers&#8221;: {<br \/>\n                &#8220;cover (any `r` in (`hotel`.`reviews`) satisfies (((`r`.`ratings`).`Overall`) = 4) end)&#8221;: true<br \/>\n            },<br \/>\n            &#8220;index&#8221;: &#8220;i6&#8221;,<\/p>\n<p>SELECT COUNT(1)<br \/>\nFROM hotel<br \/>\nWHERE ANY r IN reviews SATISFIES r.ratings.Overall &lt; 2 END<br \/>\n    {<br \/>\n        &#8220;$1&#8221;: 201<br \/>\n    }<\/p>\n<p>            &#8220;#operator&#8221;: &#8220;DistinctScan&#8221;,<br \/>\n            &#8220;cardinality&#8221;: 182.14723292266834,<br \/>\n            &#8220;cost&#8221;: 69.4615304990758,<br \/>\n            &#8220;scan&#8221;: {<br \/>\n                &#8220;#operator&#8221;: &#8220;IndexScan3&#8221;,<br \/>\n                &#8220;cardinality&#8221;: 206.73074553296368,<br \/>\n                &#8220;cost&#8221;: 68.84133826247691,<br \/>\n                &#8220;covers&#8221;: [<br \/>\n                    &#8220;cover ((distinct (array ((`r`.`ratings`).`Overall`) for `r` in (`hotel`.`reviews`) end)))&#8221;,<br \/>\n                    &#8220;cover ((meta(`hotel`).`id`))&#8221;<br \/>\n                ],<br \/>\n                &#8220;filter_covers&#8221;: {<br \/>\n                    &#8220;cover (any `r` in (`hotel`.`reviews`) satisfies (((`r`.`ratings`).`Overall`) &lt; 2) end)&#8221;: true<br \/>\n                },<br \/>\n                &#8220;index&#8221;: &#8220;i6&#8221;,<\/p>\n<p>[\/crayon]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u00a0<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><b>N1QL Optimizer<\/b><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Query Execution Flow:<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-8.29.20-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-7518\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/Screen-Shot-2019-08-10-at-8.29.20-AM-300x170-1.png\" alt=\"\" width=\"768\" height=\"435\"><\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>The optimizer, broadly speaking, does the following:<\/span><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Rewrite the query to its optimal equivalent form to make the optimization and its choices easier.<\/span><\/li>\n\n\n<li><span>Select the access path for each keyspace (equivalent to tables)<\/span>\n<ol>\n<li><span>Select one or more indexes for each keyspace.<\/span><\/li>\n<\/ol>\n<\/li>\n\n\n<li><span>Select the join order for all the joins in the FROM clause.\u00a0 N1QL Optimizer doesn\u2019t reorder the joins yet.<\/span><\/li>\n\n\n<li><span>Select join type (e.g. nested loop or hash join) for each join<\/span><\/li>\n\n\n<li><span>Finally create the query execution tree (plan).<\/span><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><span>We have described N1QL\u2019s rule-based optimizer in this paper: <\/span><a href=\"https:\/\/dzone.com\/articles\/a-deep-dive-into-couchbase-n1ql-query-optimization\"><span>A Deep Dive Into Couchbase N1QL Query Optimization<\/span><\/a><span>.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>While the discussion in this article is mainly on SELECT statements, the CBO choose query plans for UPDATE, DELETE, MERGE and INSERT (into with SELECT) statements.\u00a0 The challenges, motivation and the solution is equally applicable for all of these DML statements.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>N1QL has the following access methods available:<\/span><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Value scan<\/span><\/li>\n\n\n<li><span>Key scan<\/span><\/li>\n\n\n<li><span>Index scan<\/span><\/li>\n\n\n<li><span>Covering Index scan<\/span><\/li>\n\n\n<li><span>Primary scan<\/span><\/li>\n\n\n<li><span>Nested loop join<\/span><\/li>\n\n\n<li><span>Hash join<\/span><\/li>\n\n\n<li><span>Unnest scan<\/span><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><span><strong>Motivation for a Cost Based Optimizer (CBO)<\/strong><\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-8.31.11-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-7519\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/Screen-Shot-2019-08-10-at-8.31.11-AM-300x187-1.png\" alt=\"\" width=\"694\" height=\"432\"><\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Imagine google maps give you the same route irrespective of the current traffic situation?\u00a0 A cost-based routing takes into consideration the current cost (estimated time based on current traffic flow) and find the fastest route.\u00a0 Similarly, a cost-based optimizer takes into consideration the probable amount of processing (memory, CPU, I\/O) for each operation, estimates the cost of alternative routes and selects the query plan (query execution tree) with the least cost. In the example above, the routing algorithm considers the distance, traffic and gave you the three best routes.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>For relational systems, CBO was invented in 1970 at IBM, described in this <\/span><a href=\"https:\/\/people.eecs.berkeley.edu\/~brewer\/cs262\/3-selinger79.pdf\"><span>seminal paper<\/span><\/a><span>. In N1QL\u2019s current rule-based optimizer, planning decisions are independent of the data skew for various access paths and the amount of data that qualifies for the predicates.\u00a0 This results in inconsistent query plan generation and inconsistent performance because the decisions can be less than optimal.\u00a0\u00a0<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>There are many JSON databases: MongoDB, Couchbase, Cosmos DB, CouchDB. Many relational databases support JSON type and accessor functions to index and access data within JSON. \u00a0 Many of them, Couchbase, CosmosDB, MongoDB have a declarative query language for JSON and do access path selection and plan generation. All of them implement rule-based optimizer based on heuristics. We\u2019re yet to see a paper or documentation indicating a cost-based optimizer for any JSON database.\u00a0<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>In NoSQL and Hadoop world, there are some examples of a cost-based optimizer.\u00a0<\/span><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.cockroachlabs.com\/blog\/building-cost-based-sql-optimizer\/\"><span>CockroachDB Optimizer<\/span><\/a><\/li>\n\n\n<li><a href=\"https:\/\/rockset.com\/blog\/sql-query-planning-for-operational-analytics\/\"><span>Rocket Optimizer<\/span><\/a><\/li>\n\n\n<li><a href=\"https:\/\/hortonworks.com\/blog\/hive-0-14-cost-based-optimizer-cbo-technical-overview\/\"><span>Hive<\/span><\/a><\/li>\n\n\n<li><a href=\"https:\/\/www.slideshare.net\/cloudera\/performance-of-apache-impala\"><span>Impala<\/span><\/a><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><span>But, these only handle the basic scalar types, just like relational database optimizer. They do not handle changing types, changing schema, objects, arrays, and array elements &#8212; all these are crucial to the success of declarative query language over JSON.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u00a0<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]N1QL:  Same query written in N1QL on the JSON model<br \/>\nSELECT Name,Surname, jsonCol.info.address.PostCode  AS PostCode,<br \/>\n       (jsonCol.info.address.`Address Line 1` + &#8216; &#8216;<br \/>\n  + jsonCol.`info.address.`Address Line 2`) AS Address,<br \/>\n       jsonCol.info.skills&#8217; AS Skills<br \/>\nFROM People<br \/>\nWHERE<br \/>\n jsonCol.info.address.Town = &#8216;Belgrade&#8217;<br \/>\n AND Status=&#8217;Active&#8217;<br \/>\nORDER BY jsonCol.info.address.PostCode<br \/>\n[\/crayon]<\/p>\n0<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Example Query:<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>SELECT * <\/strong><strong>FROM customer <\/strong><strong>WHERE name = &#8220;Joe&#8221; <\/strong><strong>AND zip = 94587 <\/strong><strong>AND status = &#8220;Premium&#8221;;<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>The simple question is:\u00a0<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>All of the indexes above are valid access path to evaluate the query. Which of the several indexes should N1QL use to run the query efficiently?\u00a0<\/span><\/li>\n\n\n<li><span>The correct answer for this is, <\/span><i><span>it depends<\/span><\/i><span>. It depends on the cardinality which depends on the statistical distribution of data for each key.<\/span><\/li>\n\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><span>There could be a million people with the name Joe, ten million people in zip code 94587 and only 5 people with Premium status. \u00a0 It could be just a few people with the name Joe or more people with Premium status or fewer customers in the zip 94587. The number of documents qualifying for each filter and the combined statistics affect the decision.\u00a0\u00a0<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>So far, the problems are the same as SQL optimization.\u00a0 Following this approach is safe and sound for collecting statistics, calculating selectivities and coming up with the query plan.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>But, JSON is different:<\/span><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>The data type can change between multiple documents. Zip can be numeric in one document, string in another, the object in the third. <\/span><i><span>How do you collect statistics, store it and use it efficiently?<\/span><\/i><\/li>\n\n\n<li><span>It can store complex, a nested structure using arrays and objects.\u00a0 <\/span><i><span>What does it mean to collect statistics on nested structures, arrays, etc?<\/span><\/i><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">\u00a0<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Scalars: <\/b><span>numbers, boolean, string, null, missing. In the document below, a, b, c,d, e are all scalars.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>{ \u201ca\u201d: 482, \u201cb\u201d: 2948.284, \u201cc\u201d: \u201cHello, World\u201d, \u201cd\u201d: null, \u201ce\u201d: missing }<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Objects:<\/b><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Search for the whole objects<\/span><\/li>\n\n\n<li><span>Search for elements within an objects<\/span><\/li>\n\n\n<li><span>Search for exact value of an attribute within the objects<\/span><\/li>\n\n\n<li><span>Match the elements, arrays, objects anywhere within the hierarchy.<\/span><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><span>This structure is known only after the user specifies the references to these in the query.\u00a0 If these expressions are in predicates, it would be good to know if they actually exist and then determine their selectivity.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Here are some examples.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Objects:<\/b><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Refer to a scalar inside an object. \u00a0 <\/span><span>E.g.\u00a0 Name.fname, Name.lname<\/span><\/li>\n\n\n<li><span>Refer to a scalar inside an array of an object.\u00a0 <\/span><span>E.g. Billing[*].status<\/span><\/li>\n\n\n<li><span>Nested case of (1), (2) and (3). \u00a0 Using <\/span><span>UNNEST<\/span><span> operation.<\/span><\/li>\n\n\n<li><span>Refer to an object or an array in the cases (1) through (4).<\/span><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Arrays:<\/b><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Match the full array.<\/span><\/li>\n\n\n<li><span>Match scalar elements of an array with supported types (number, string, etc)<\/span><\/li>\n\n\n<li><span>Match objects within an array.<\/span><\/li>\n\n\n<li><span>Match the elements within an array of an array.<\/span><\/li>\n\n\n<li><span>Match the elements, arrays, objects anywhere within the hierarchy.<\/span><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><b>LET Expressions:<\/b><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Need to get selectivity on the expressions used in the WHERE clause.<\/span><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><b>UNNEST operation:<\/b><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Selectivities on the filters on UNNESTed doc to help pick up the right (array) index.<\/span><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><b>JOINs: <\/b><span>INNER, LEFT OUTER, RIGHT OUTER<\/span><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Join selectivities.\u00a0<\/span>\n<ol>\n<li><span>Usually a big problem in RDBMS as well.\u00a0 May not be in v1.<\/span><\/li>\n<\/ol>\n<\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Predicates:<\/b><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>USE KEYS<\/span><\/li>\n\n\n<li><span>Comparison of scalar values: =, &gt;, &lt;, &gt;=, &lt;=, BETWEEN, IN<\/span><\/li>\n\n\n<li><span>Array predicates: ANY, EVERY, ANY &amp; EVERY, WITHIN<\/span><\/li>\n\n\n<li><span>Subqueries<\/span><\/li>\n\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><b>Cost-Based Optimizer for N1QL<\/b><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span>The cost-based optimizer will now estimate the cost based on the available statistics on data, index, calculate the cost of each operation and choose the best path.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-11.14.42-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-7520\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/Screen-Shot-2019-08-10-at-11.14.42-AM-300x177-1.png\" alt=\"\" width=\"850\" height=\"502\"><\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Challenges to Cost-Based Optimizer for JSON<\/b><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Collect statistics on scalars, objects, arrays, array elements &#8212; anything on which you can apply a predicate (filter)<\/span>\n<ol>\n<li><span>Create the right data structure to store statistics on a field whose type can very from one type to another.<\/span><\/li>\n<\/ol>\n<\/li>\n\n\n<li><span>Create methods to use the statistics to efficiently calculate accurate estimates on complex set of statistics collected above.<\/span><\/li>\n\n\n<li><span>Use appropriate statistics, consider valid access paths, and create a query plan.<\/span><\/li>\n\n\n<li><span>A field can be an integer in one document, string in next, array in another and go missing in yet another. The histograms\u00a0<\/span><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Approach to N1QL cost-based optimizer<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u00a0<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>N1QL optimizer will be responsible for determining the most efficient execution strategy for a query.\u00a0 There are typically a large number of alternative evaluation strategies for a given query. These alternatives may differ significantly in their system resources requirement and\/or response time.\u00a0 A cost based query optimizer uses a sophisticated enumeration engine (i.e., an engine that enumerates the search space of access and join plans) to efficiently generate a profusion of alternative query evaluation strategies and a detailed model of execution cost to choose among those alternative strategies.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>N1QL Optimizer work can be categorized into these:<\/span><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Collect statistics<\/span>\n<ol>\n<li><span>Collect statistics on individual fields and create a single histogram for each field (inclusive of all data types that may appear in this field).\u00a0<\/span><\/li>\n<li><span>Collect statistics on each available index.<\/span><\/li>\n<\/ol>\n<\/li>\n\n\n<li><span>Query rewrite.\u00a0<\/span>\n<ol>\n<li><span>Basic rule based query rewrite.<\/span><\/li>\n<\/ol>\n<\/li>\n\n\n<li><span>Cardinality estimates<\/span>\n<ol>\n<li><span>Use available histogram and index statistics for selectivity estimation.<\/span><\/li>\n<li><span>Use this selectivity for cardinality estimation<\/span>\n<ol>\n<li><span>This is not so straightforward in case of arrays.<\/span><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<\/li>\n\n\n<li><span>Join ordering.\u00a0<\/span>\n<ol>\n<li><span>CONSIDER a query:\u00a0 a JOIN b JOIN c<\/span>\n<ol>\n<li><span>This is same as ( b JOIN a JOIN c),\u00a0 (a JOIN c JOIN b), etc.<\/span><\/li>\n<li><span>Choosing the right order makes a huge impact on the query.<\/span><\/li>\n<\/ol>\n<\/li>\n<li><span>The Couchbase 6.5 implementation does not yet do this.\u00a0 This is a well understood problem for which we can borrow solutions from SQL.\u00a0 JSON does not introduce new issues. The ON clause predicates can include array predicates.\u00a0 This is in the roadmap.<\/span><\/li>\n<\/ol>\n<\/li>\n\n\n<li><span>Join type\u00a0<\/span>\n<ol>\n<li><span>The rule based optimizer used\u00a0 block-nested-loop by default. Need to use directives for forcing hash join.\u00a0 Directive also needs to specify the build\/probe side. Both are undesirable.<\/span><\/li>\n<li><span>CBO should select the join type.\u00a0 If a hash join is chosen, it should automatically choose the build and the probe side. Choosing the best inner\/outer keyspace for nested loop is in our roadmap as well.<\/span><\/li>\n<\/ol>\n<\/li>\n\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><b>Statistics Collection for N1QL CBO<\/b><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Optimizer statistics is an essential part of cost-based optimization. The optimizer calculates costs for various steps of query execution, and the calculation is based on the optimizer\u2019s knowledge on various aspects of the physical entities in the server &#8211; known as optimizer statistics.\u00a0<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Handling of mixed types<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Unlike relational databases, a field in a JSON document does not have a type, i.e., different types of values can exist in the same field. A distribution therefore needs to handle different types of values. To avoid confusion we put different types of values in different distribution bins. This means we may have partial bins (as last bin) for each type. There are also special handling of the special values MISSING, NULL, TRUE and FALSE. These values (if present) always reside in an overflow bin. N1QL has predefined sorting order for different types and MISSING\/NULL\/TRUE\/FALSE appears at the beginning of the sorted stream.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>collection\/bucket statistics<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>For collections or buckets, we gather:<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>number of documents in the collection\/bucket<\/span><\/li>\n\n\n<li><span>average document size<\/span><\/li>\n\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Index statistics<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>For a GSI index, we gather:<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span>number of items in the index<\/span><\/li>\n\n\n<li><span>number of index pages<\/span><\/li>\n\n\n<li><span>resident ratio<\/span><\/li>\n\n\n<li><span>average item size<\/span><\/li>\n\n\n<li><span>average page size<\/span><\/li>\n\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Distribution statistics<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>For certain fields we also gather distribution statistics, this allows more accurate selectivity estimation for predicates like \u201cc1 = 100\u201d, or \u201cc1 &gt;= 20\u201d, or \u201cc1 &lt; 150\u201d. It also produces more accurate selectivity estimates for join predicates such as \u201ct1.c1 = t2.c2\u201d, assuming distribution statistics exist for both t1.c1 and t2.c2.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Gathering optimizer statistics<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>While our vision is to have the server automatically update necessary optimizer statistics, for the initial implementation optimizer statistics will be updated via a new UPDATE STATISTICS command.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-8.31.11-AM.png\"><span class=\"lang:mysql decode:true crayon-inline \">UPDATE STATISTICS [FOR] &lt;keyspace_reference&gt; (&lt;index_expressions&gt;) [WITH &lt;options&gt;] <\/span>\u00a0\u00a0<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>&lt;keyspace_reference&gt; is a collection name (we may support bucket as well, it\u2019s undecided at this point).\u00a0<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>The command above is for gathering distribution statistics, &lt;index_expressions&gt; is one or more (comma separated) expressions for which distribution statistics is to be collected. We support the same expressions as in CREATE INDEX command, e.g., a field, a nested fields (inside nested objects, e.g. location.lat), an ALL expression for arrays, etc.The WITH clause is optional, if present, it specifies options for the UPDATE STATISTICS command. The options are specified in JSON format similar to how options are specified for other commands like CREATE INDEX or INFER.\u00a0<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Currently the following options are supported in the WITH clause<\/span><span>:<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><b>sample_size<\/b><span>: for gathering distribution statistics, user can specify a sample size to use. It is an integer number. Note that we also calculate a minimum sample size and we take the larger of the user-specified sample size and calculated minimum sample size.<\/span><\/li>\n\n\n<li><b>resolution<\/b><span>: for gathering distribution statistics, indicate how many distribution bins is desired (granularity of the distribution bins). It is specified as a float number, taken as a percentage. E.g., {\u201cresolution\u201d: 1.0} means each distribution bin contains approximately 1 percent of the total documents, i.e., ~100 distribution bins are desired. Default resolution is 1.0 (100 distribution bins). A minimum resolution of 0.02 (5000 distribution bins) and a maximum resolution of 5.0 (20 distribution bins) will be enforced<\/span><\/li>\n\n\n<li><b>update_statistics_timeout<\/b><span>: a time-out value (in seconds) can be specified. The UPDATE STATISTICS command times out with an error when the time-out period is reached. If not specified, a default time-out value will be calculated based on the number of samples used.<\/span><\/li>\n\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Handling of mixed types<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Unlike relational databases, a field in a JSON document does not have a type, i.e., different types of values can exist in the same field. A distribution therefore needs to handle different types of values. To avoid confusion we put different types of values in different distribution bins. This means we may have partial bins (as last bin) for each type. There are also special handling of the special values MISSING, NULL, TRUE and FALSE. These values (if present) always reside in an overflow bin. N1QL has predefined sorting order for different types and MISSING\/NULL\/TRUE\/FALSE appears at the beginning of the sorted stream.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Boundary bins<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Since we only keep the max value for each bin, the min boundary is derived from the max value of the previous bin. This also means that the very first distribution bin does not have a min value. To resolve that, we put a \u201cboundary bin\u201d at the very beginning, this is a special bin with bin size 0, and the only purpose of the bin is to provide a max value, which is the min boundary of the next distribution bin.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Since a distribution may contain multiple types, we separate the types to different distribution bins, and we also put a \u201cboundary bin\u201d for each type, such that we know the minimum value for each type in a distribution.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Example of handling mixed types and boundary bins<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span class=\"lang:mysql decode:true crayon-inline \">UPDATE STATISTICS for CUSTOMER(quantity);<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Histogram: Total number of documents: 5000 with quantity in simple integers.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-11.17.36-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-7521\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/Screen-Shot-2019-08-10-at-11.17.36-AM-300x50-1.png\" alt=\"\" width=\"846\" height=\"141\"><\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Predicates:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p dir=\"ltr\">(quantity = 100):\u00a0 Estimate 1%<\/p>\n<\/li>\n\n\n<li>\n<p dir=\"ltr\">(quantity between 200 and 100) : Estimate 20%<\/p>\n<\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">We also use additional techniques like keeping the highest\/second-highest, lowest, second lowest values for each bin, keep an overflow bin for values which occur more than 25% of the time to improve this selectivity calculation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In JSON, quantity can be any of the types: MISSING, null, boolean, integer, string, array and an object.\u00a0 For the sake of simplicity, we show quantity histogram with three types: integers, strings and arrays. This has been extended to include all of the types.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-11.19.38-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-7522\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/Screen-Shot-2019-08-10-at-11.19.38-AM-300x33-1.png\" alt=\"\" width=\"982\" height=\"108\"><\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>N1QL defines the method by with values of different types can be compared.<\/span><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Type order: from lowest to highest<\/span>\n<ol>\n<li><span>Missing &lt; null &lt; false &lt; true &lt; numeric &lt; string &lt; array &lt; object<\/span>\n<ol>\n<li><a href=\"https:\/\/docs.couchbase.com\/server\/4.5\/n1ql\/n1ql-language-reference\/orderby.html\"><span>https:\/\/docs.couchbase.com\/server\/4.5\/n1ql\/n1ql-language-reference\/orderby.html<\/span><\/a><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<\/li>\n\n\n<li><span>After we sample the documents, we first group them by types, sort them within the type group and create the mini-histogram for each type.<\/span><\/li>\n\n\n<li><span>We then stitch these mini-histograms into a large histogram with a boundary bin between each type.\u00a0 This helps the optimizer to calculate selectivities efficiently either on a single type or across multiple types.<\/span><\/li>\n\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Examples:<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]N1QL:  Same query written in N1QL on the JSON model<br \/>\nSELECT Name,Surname, jsonCol.info.address.PostCode  AS PostCode,<br \/>\n       (jsonCol.info.address.`Address Line 1` + &#8216; &#8216;<br \/>\n  + jsonCol.`info.address.`Address Line 2`) AS Address,<br \/>\n       jsonCol.info.skills&#8217; AS Skills<br \/>\nFROM People<br \/>\nWHERE<br \/>\n jsonCol.info.address.Town = &#8216;Belgrade&#8217;<br \/>\n AND Status=&#8217;Active&#8217;<br \/>\nORDER BY jsonCol.info.address.PostCode<br \/>\n[\/crayon]<\/p>\n1<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Simple data types<\/b><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-11.19.38-AM.png\"><br>\n<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Distribution statistics for simple data types are straight forward. Boolean values will have two overflow bins storing TRUE and FALSE values. Numeric and string values are also easy to handle. An open question remains as to whether we want to limit the size of a string value as a bin boundary, i.e., if a string is very long, do we want to truncate the string before storing as a max value for a distribution bin. Long string values in an overflow bin will not be truncated since that requires an exact match.<\/span><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-11.19.38-AM.png\"><br>\n<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Arrays<\/b><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-11.19.38-AM.png\"><br>\n<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>The design for how to collect distribution statistics has not been finalized. What we want to do is probably gather distribution statistics on individual elements of the array, since that\u2019s how array index works. We may need to support DISTINCT\/ALL variations of the array index by including the same keyword in front of array field specification, which determines whether we remove duplicates from the array before constructing histogram.<\/span><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-11.19.38-AM.png\"><br>\n<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Estimating selectivity of an array predicate (ANY predicate) based on such a histogram is a bit challenging. There is no easy way to account for variable lengths of arrays in the collection. In the first release, we\u2019ll just keep an average array size as part of distribution statistics. This assumes some form of uniformity, which is certainly not ideal, but is a good start.<\/span><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-11.19.38-AM.png\"><br>\n<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Estimating selectivity of ALL predicate is even trickier, we may need to use some sort of default value for this.<\/span><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-11.19.38-AM.png\"><br>\n<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Consider this JSON document in the keyspace\u00a0<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]N1QL:  Same query written in N1QL on the JSON model<br \/>\nSELECT Name,Surname, jsonCol.info.address.PostCode  AS PostCode,<br \/>\n       (jsonCol.info.address.`Address Line 1` + &#8216; &#8216;<br \/>\n  + jsonCol.`info.address.`Address Line 2`) AS Address,<br \/>\n       jsonCol.info.skills&#8217; AS Skills<br \/>\nFROM People<br \/>\nWHERE<br \/>\n jsonCol.info.address.Town = &#8216;Belgrade&#8217;<br \/>\n AND Status=&#8217;Active&#8217;<br \/>\nORDER BY jsonCol.info.address.PostCode<br \/>\n[\/crayon]<\/p>\n2<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>The field \u201ca\u201d is a scalar, b is an array of scalars and c is an array of objects.\u00a0 When is issue a query, you can have predicates on any or all of the fields: a, b, c.\u00a0 So far we\u2019ve discussed the scalars whose type can change. Now let\u2019s discuss array predicates statistics collection and selectivity calculation.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]N1QL:  Same query written in N1QL on the JSON model<br \/>\nSELECT Name,Surname, jsonCol.info.address.PostCode  AS PostCode,<br \/>\n       (jsonCol.info.address.`Address Line 1` + &#8216; &#8216;<br \/>\n  + jsonCol.`info.address.`Address Line 2`) AS Address,<br \/>\n       jsonCol.info.skills&#8217; AS Skills<br \/>\nFROM People<br \/>\nWHERE<br \/>\n jsonCol.info.address.Town = &#8216;Belgrade&#8217;<br \/>\n AND Status=&#8217;Active&#8217;<br \/>\nORDER BY jsonCol.info.address.PostCode<br \/>\n[\/crayon]<\/p>\n3<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>These are simple predicates on array of scalars and array of objects.This is a generalized implementation where the query can be written to filter elements and value in arrays of arrays, arrays of objects of arrays, etc.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>When you have a billion of these documents, you create array indexes to efficiently execute the filter.\u00a0 Now, for the optimizer, it\u2019s important to estimate the number of documents that qualify for a given predicate.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]N1QL:  Same query written in N1QL on the JSON model<br \/>\nSELECT Name,Surname, jsonCol.info.address.PostCode  AS PostCode,<br \/>\n       (jsonCol.info.address.`Address Line 1` + &#8216; &#8216;<br \/>\n  + jsonCol.`info.address.`Address Line 2`) AS Address,<br \/>\n       jsonCol.info.skills&#8217; AS Skills<br \/>\nFROM People<br \/>\nWHERE<br \/>\n jsonCol.info.address.Town = &#8216;Belgrade&#8217;<br \/>\n AND Status=&#8217;Active&#8217;<br \/>\nORDER BY jsonCol.info.address.PostCode<br \/>\n[\/crayon]<\/p>\n4<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Index i1 with the key DISTINCT b creates an index with only the distinct (unique) elements of b.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Index i2 with the key ALL b creates an index with all the elements of b.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>This exists to manage the size of the index, possibility of getting large intermediate results from the index.\u00a0 In both cases, there will be MULTIPLE index entries for EACH element of an array. This is UNLIKE a scalar which has ONLY one entry in the index PER JSON document.\u00a0\u00a0<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>For more on array indexing, see <\/span><a href=\"https:\/\/docs.couchbase.com\/server\/current\/n1ql\/n1ql-language-reference\/indexing-arrays.html\"><span>array indexing documentation in Couchbase<\/span><\/a><span>.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>How do you collect statistics on this array or array of objects?\u00a0 The key insight is to collect statistics on EXACTLY the same expression as the expression you\u2019re creating the index on.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>In this case, we collect statistics on the following:<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]N1QL:  Same query written in N1QL on the JSON model<br \/>\nSELECT Name,Surname, jsonCol.info.address.PostCode  AS PostCode,<br \/>\n       (jsonCol.info.address.`Address Line 1` + &#8216; &#8216;<br \/>\n  + jsonCol.`info.address.`Address Line 2`) AS Address,<br \/>\n       jsonCol.info.skills&#8217; AS Skills<br \/>\nFROM People<br \/>\nWHERE<br \/>\n jsonCol.info.address.Town = &#8216;Belgrade&#8217;<br \/>\n AND Status=&#8217;Active&#8217;<br \/>\nORDER BY jsonCol.info.address.PostCode<br \/>\n[\/crayon]<\/p>\n5<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Now, within the histogram, there can be zero, one or more values originating from the same document. \u00a0 So, calculating the selectivity (estimated percentage of documents qualifying the filter) is not so easy!<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Here\u2019s the novel solution to address the issue with arrays:<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>For normal stats:\u00a0 there\u2019s one index entry per document.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Cardinality becomes a simple: selectivity x table cardinality;<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>For array stats:\u00a0 There is N index entry per document;\u00a0<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0N-&gt; Number of distinct values in the array.\u00a0\u00a0<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0N = 0 to n, n &lt;= ARRAY_LENGTH(a)<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>This additional statistics has to be collected and stored in the histogram.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Now, when a index is chosen for the evaluation of a particular predicate, index scan will return all of the qualified document keys, which contain duplicates.\u00a0 The query engine will then do a distinct operation to get the unique keys to get the correct (non-duplicate) results. The cost-based optimizer will have to take this into account while calculating the number of documents (not the number of index entries) that\u2019ll qualify the predicate.\u00a0 So, we divide the estimate by the estimate of average array size length.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>This cardinality can be used to cost and compare the cost of using the array-index path versus other legal access path to find the best access path.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Object, JSON and binary values<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>It\u2019s unclear how useful a histogram on an object\/JSON value or a binary value will be. It should be rare to see comparisons with such values in the query. We can either handle these exactly like other simple types, i.e., put number of values, number of distinct values, and a max boundary on each distribution bin; or we can simplify and just put the count in the distribution bin without number of distinct and max value. The issue with max value is similar to long strings, where the value can be large, and storing such large values may not be beneficial in the histogram. This remains an open question for now.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Statistics for fields in nested objects<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Consider this JSON document in the keyspace:<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><p>[crayon theme=&#8221;github&#8221; font=&#8221;consolas&#8221; font-size=&#8221;15&#8243; lang=&#8221;mysql&#8221; decode=&#8221;true&#8221;]N1QL:  Same query written in N1QL on the JSON model<br \/>\nSELECT Name,Surname, jsonCol.info.address.PostCode  AS PostCode,<br \/>\n       (jsonCol.info.address.`Address Line 1` + &#8216; &#8216;<br \/>\n  + jsonCol.`info.address.`Address Line 2`) AS Address,<br \/>\n       jsonCol.info.skills&#8217; AS Skills<br \/>\nFROM People<br \/>\nWHERE<br \/>\n jsonCol.info.address.Town = &#8216;Belgrade&#8217;<br \/>\n AND Status=&#8217;Active&#8217;<br \/>\nORDER BY jsonCol.info.address.PostCode<br \/>\n[\/crayon]<\/p>\n6<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Following is the dotted expression to access nested objects.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>FROM k WHERE b.p = &#8220;NY&#8221; AND c.e.x = &#8220;hello&#8221; AND c.e.z = 48;\u00a0<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span>Since each path is unique, collecting and using the histogram is just like scalar.<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>UPDATE STATISTICS FOR k(b.p, c.e.x, c.e.z)<\/b><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-10-at-11.19.38-AM.png\">\u00a0<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><b>Summary<\/b><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span>We\u2019ve described how we\u2019ve implemented a cost based optimizer for N1QL (SQL for JSON) and handled the following challenges.<\/span><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>N1QL CBO can handle flexible JSON schema.<\/span><\/li>\n\n\n<li><span>N1QL CBO can handle scalars, objects, and arrays.<\/span><\/li>\n\n\n<li><span>N1QL CBO can handle stats collection and calculate estimates on any field of any type within JSON.<\/span><\/li>\n\n\n<li><span>All these have improved query plans and therefore improve the performance of the system.\u00a0 It\u2019ll also reduce the TCO by reducing the DBA performance debugging overhead.<\/span><\/li>\n\n\n<li><span><strong>Download <a href=\"https:\/\/www.couchbase.com\/downloads?family=server&amp;product=couchbase-server-developer\">Couchbase 6.5<\/a> and try it out yourself!<\/strong><\/span><\/li>\n\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><b>Resources N1QL rule-based Optimizer<\/b><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><span>The first article describes the Couchbase Optimizer as of 5.0.\u00a0 We added ANSI joins in Couchbase 5.5. The second article includes its description and some of the optimization done for it.\u00a0\u00a0<\/span><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>A Deep Dive Into Couchbase N1QL Query Optimization <\/span><a href=\"https:\/\/dzone.com\/articles\/a-deep-dive-into-couchbase-n1ql-query-optimization\"><span>https:\/\/dzone.com\/articles\/a-deep-dive-into-couchbase-n1ql-query-optimization<\/span><\/a><\/li>\n\n\n<li><span>ANSI JOIN Support in N1QL <\/span><a href=\"https:\/\/dzone.com\/articles\/ansi-join-support-in-n1ql\"><span>https:\/\/dzone.com\/articles\/ansi-join-support-in-n1ql<\/span><\/a><\/li>\n\n\n<li><span>Create the right index, Get the right Performance for the rule-based optimizer.<\/span>\n<ol>\n<li><a href=\"https:\/\/dzone.com\/articles\/create-the-right-index-get-the-right-performance\"><span>https:\/\/dzone.com\/articles\/create-the-right-index-get-the-right-performance<\/span><\/a><\/li>\n<li><a href=\"https:\/\/dzone.com\/articles\/create-the-right-index-get-the-right-performance-p\"><span>https:\/\/dzone.com\/articles\/create-the-right-index-get-the-right-performance-p<\/span><\/a><\/li>\n<\/ol>\n<\/li>\n\n\n<li><span>Index selection algorithm<\/span>\n<ol>\n<li><a href=\"https:\/\/www.slideshare.net\/journalofinformix\/n1ql-query-optimizer-improvements-in-couchbase-50-by-sitaram-vemulapalli\"><span>https:\/\/www.slideshare.net\/journalofinformix\/n1ql-query-optimizer-improvements-in-couchbase-50-by-sitaram-vemulapalli<\/span><\/a><\/li>\n<\/ol>\n<\/li>\n\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><b>\u00a0References<\/b><\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span>Access Path Selection in a Relational Database Management System. <\/span><a href=\"https:\/\/people.eecs.berkeley.edu\/~brewer\/cs262\/3-selinger79.pdf\"><span>https:\/\/people.eecs.berkeley.edu\/~brewer\/cs262\/3-selinger79.pdf<\/span><\/a><\/li>\n\n\n<li><span>Cost based optimization in DB2 XML. <\/span>\u00a0<a href=\"https:\/\/www2.hawaii.edu\/~lipyeow\/pub\/ibmsys06-xmlopt.pdf\">https:\/\/www2.hawaii.edu\/~lipyeow\/pub\/ibmsys06-xmlopt.pdf<\/a><\/li>\n\n\n<li><span>Access Path Selection in a Relational Database Management System. <\/span><a href=\"https:\/\/people.eecs.berkeley.edu\/~brewer\/cs262\/3-selinger79.pdf\"><span>https:\/\/people.eecs.berkeley.edu\/~brewer\/cs262\/3-selinger79.pdf<\/span><\/a><\/li>\n\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Authors &amp; Engineering Team: Bingjie Miao, Keshav Murthy,\u00a0 Marco Greco, Prathibha Bisarahalli. Couchbase, Inc.\u00a0 A rule-based optimizer knows rules for everything and cost for nothing &#8211; Oscar Wilde Abstract Couchbase is a distributed JSON database. It provides distributed data processing, high performance indexing and declarative query language N1QL along with search index, eventing and analytics.\u00a0 [&hellip;]<\/p>\n","protected":false},"author":55,"featured_media":1571,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"_acf":"","footnotes":""},"categories":[127,75,179,54,189,18],"tags":[349,370,371,30,6,372,373,374],"ppma_author":[291],"class_list":["post-1572","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-application-design","category-analytics","category-couchbase-architecture","category-couchbase-server","category-data-modeling","category-n1ql-query","tag-6-5","tag-cbo","tag-explain","tag-json","tag-nosql-database","tag-optimizer","tag-plan","tag-statistics"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.6 (Yoast SEO v27.6) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Cost Based Optimizer for Couchbase N1QL (SQL for JSON) - The Couchbase Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.couchbase.com\/blog\/pt\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/\" \/>\n<meta property=\"og:locale\" content=\"pt_BR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Cost Based Optimizer for Couchbase N1QL (SQL for JSON)\" \/>\n<meta property=\"og:description\" content=\"Authors &amp; Engineering Team: Bingjie Miao, Keshav Murthy,\u00a0 Marco Greco, Prathibha Bisarahalli. Couchbase, Inc.\u00a0 A rule-based optimizer knows rules for everything and cost for nothing &#8211; Oscar Wilde Abstract Couchbase is a distributed JSON database. It provides distributed data processing, high performance indexing and declarative query language N1QL along with search index, eventing and analytics.\u00a0 [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.couchbase.com\/blog\/pt\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/\" \/>\n<meta property=\"og:site_name\" content=\"The Couchbase Blog\" \/>\n<meta property=\"article:published_time\" content=\"2019-08-07T18:27:57+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/Transactions-alternative-3.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1400\" \/>\n\t<meta property=\"og:image:height\" content=\"553\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Keshav Murthy\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@rkeshavmurthy\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Keshav Murthy\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"32 minutos\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/\"},\"author\":{\"name\":\"Keshav Murthy\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/person\\\/c261644262bf98e146372fe647682636\"},\"headline\":\"Cost Based Optimizer for Couchbase N1QL (SQL for JSON)\",\"datePublished\":\"2019-08-07T18:27:57+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/\"},\"wordCount\":6324,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/Transactions-alternative-3.jpg\",\"keywords\":[\"6.5\",\"cbo\",\"explain\",\"JSON\",\"NoSQL Database\",\"Optimizer\",\"plan\",\"statistics\"],\"articleSection\":[\"Application Design\",\"Couchbase Analytics\",\"Couchbase Architecture\",\"Couchbase Server\",\"Data Modeling\",\"SQL++ \\\/ N1QL Query\"],\"inLanguage\":\"pt-BR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/\",\"name\":\"Cost Based Optimizer for Couchbase N1QL (SQL for JSON) - The Couchbase Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/Transactions-alternative-3.jpg\",\"datePublished\":\"2019-08-07T18:27:57+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/#breadcrumb\"},\"inLanguage\":\"pt-BR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-BR\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/Transactions-alternative-3.jpg\",\"contentUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/Transactions-alternative-3.jpg\",\"width\":1400,\"height\":553},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Cost Based Optimizer for Couchbase N1QL (SQL for JSON)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/\",\"name\":\"The Couchbase Blog\",\"description\":\"Couchbase, the NoSQL Database\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"pt-BR\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#organization\",\"name\":\"The Couchbase Blog\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-BR\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/06\\\/logo.svg\",\"contentUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/06\\\/logo.svg\",\"width\":\"1024\",\"height\":\"1024\",\"caption\":\"The Couchbase Blog\"},\"image\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/person\\\/c261644262bf98e146372fe647682636\",\"name\":\"Keshav Murthy\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-BR\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/af74df754db27152971d0aed2f323ead5a1f9fe5afd0209af91e12e784451224?s=96&d=mm&r=g4e51d72fc07c662aa791316deafffac4\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/af74df754db27152971d0aed2f323ead5a1f9fe5afd0209af91e12e784451224?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/af74df754db27152971d0aed2f323ead5a1f9fe5afd0209af91e12e784451224?s=96&d=mm&r=g\",\"caption\":\"Keshav Murthy\"},\"description\":\"Keshav Murthy is a Vice President at Couchbase R&amp;D. Previously, he was at MapR, IBM, Informix, Sybase, with more than 20 years of experience in database design &amp; development. He lead the SQL and NoSQL R&amp;D team at IBM Informix. He has received two President's Club awards at Couchbase, two Outstanding Technical Achievement Awards at IBM. Keshav has a bachelor's degree in Computer Science and Engineering from the University of Mysore, India, and has received twenty four US patents.\",\"sameAs\":[\"https:\\\/\\\/blog.planetnosql.com\\\/\",\"https:\\\/\\\/x.com\\\/rkeshavmurthy\"],\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/pt\\\/author\\\/keshav-murthy\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Cost Based Optimizer for Couchbase N1QL (SQL for JSON) - The Couchbase Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.couchbase.com\/blog\/pt\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/","og_locale":"pt_BR","og_type":"article","og_title":"Cost Based Optimizer for Couchbase N1QL (SQL for JSON)","og_description":"Authors &amp; Engineering Team: Bingjie Miao, Keshav Murthy,\u00a0 Marco Greco, Prathibha Bisarahalli. Couchbase, Inc.\u00a0 A rule-based optimizer knows rules for everything and cost for nothing &#8211; Oscar Wilde Abstract Couchbase is a distributed JSON database. It provides distributed data processing, high performance indexing and declarative query language N1QL along with search index, eventing and analytics.\u00a0 [&hellip;]","og_url":"https:\/\/www.couchbase.com\/blog\/pt\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/","og_site_name":"The Couchbase Blog","article_published_time":"2019-08-07T18:27:57+00:00","og_image":[{"width":1400,"height":553,"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/Transactions-alternative-3.jpg","type":"image\/jpeg"}],"author":"Keshav Murthy","twitter_card":"summary_large_image","twitter_creator":"@rkeshavmurthy","twitter_misc":{"Written by":"Keshav Murthy","Est. reading time":"32 minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/#article","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/"},"author":{"name":"Keshav Murthy","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/c261644262bf98e146372fe647682636"},"headline":"Cost Based Optimizer for Couchbase N1QL (SQL for JSON)","datePublished":"2019-08-07T18:27:57+00:00","mainEntityOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/"},"wordCount":6324,"commentCount":0,"publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/Transactions-alternative-3.jpg","keywords":["6.5","cbo","explain","JSON","NoSQL Database","Optimizer","plan","statistics"],"articleSection":["Application Design","Couchbase Analytics","Couchbase Architecture","Couchbase Server","Data Modeling","SQL++ \/ N1QL Query"],"inLanguage":"pt-BR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/","url":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/","name":"Cost Based Optimizer for Couchbase N1QL (SQL for JSON) - The Couchbase Blog","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/#primaryimage"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/Transactions-alternative-3.jpg","datePublished":"2019-08-07T18:27:57+00:00","breadcrumb":{"@id":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/#breadcrumb"},"inLanguage":"pt-BR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/"]}]},{"@type":"ImageObject","inLanguage":"pt-BR","@id":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/#primaryimage","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/Transactions-alternative-3.jpg","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/Transactions-alternative-3.jpg","width":1400,"height":553},{"@type":"BreadcrumbList","@id":"https:\/\/www.couchbase.com\/blog\/cost-based-optimizer-for-couchbase-n1ql-sql-for-json\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.couchbase.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Cost Based Optimizer for Couchbase N1QL (SQL for JSON)"}]},{"@type":"WebSite","@id":"https:\/\/www.couchbase.com\/blog\/#website","url":"https:\/\/www.couchbase.com\/blog\/","name":"The Couchbase Blog","description":"Couchbase, the NoSQL Database","publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"pt-BR"},{"@type":"Organization","@id":"https:\/\/www.couchbase.com\/blog\/#organization","name":"The Couchbase Blog","url":"https:\/\/www.couchbase.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"pt-BR","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/06\/logo.svg","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/06\/logo.svg","width":"1024","height":"1024","caption":"The Couchbase Blog"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/c261644262bf98e146372fe647682636","name":"Keshav Murthy","image":{"@type":"ImageObject","inLanguage":"pt-BR","@id":"https:\/\/secure.gravatar.com\/avatar\/af74df754db27152971d0aed2f323ead5a1f9fe5afd0209af91e12e784451224?s=96&d=mm&r=g4e51d72fc07c662aa791316deafffac4","url":"https:\/\/secure.gravatar.com\/avatar\/af74df754db27152971d0aed2f323ead5a1f9fe5afd0209af91e12e784451224?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/af74df754db27152971d0aed2f323ead5a1f9fe5afd0209af91e12e784451224?s=96&d=mm&r=g","caption":"Keshav Murthy"},"description":"Keshav Murthy is a Vice President at Couchbase R&amp;D. Previously, he was at MapR, IBM, Informix, Sybase, with more than 20 years of experience in database design &amp; development. He lead the SQL and NoSQL R&amp;D team at IBM Informix. He has received two President's Club awards at Couchbase, two Outstanding Technical Achievement Awards at IBM. Keshav has a bachelor's degree in Computer Science and Engineering from the University of Mysore, India, and has received twenty four US patents.","sameAs":["https:\/\/blog.planetnosql.com\/","https:\/\/x.com\/rkeshavmurthy"],"url":"https:\/\/www.couchbase.com\/blog\/pt\/author\/keshav-murthy\/"}]}},"acf":[],"authors":[{"term_id":291,"user_id":55,"is_guest":0,"slug":"keshav-murthy","display_name":"Keshav Murthy","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","author_category":"","first_name":"Keshav","last_name":"Murthy","user_url":"","job_title":"","description":"Keshav Murthy is a Vice President at Couchbase R&amp;D. Previously, he was at MapR, IBM, Informix, Sybase, with more than 20 years of experience in database design &amp; development. He lead the SQL and NoSQL R&amp;D team at IBM Informix. He has received two President's Club awards at Couchbase, two Outstanding Technical Achievement Awards at IBM. Keshav has a bachelor's degree in Computer Science and Engineering from the University of Mysore, India,  and has received twenty four US patents."}],"_links":{"self":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/posts\/1572","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/users\/55"}],"replies":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/comments?post=1572"}],"version-history":[{"count":0,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/posts\/1572\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/media\/1571"}],"wp:attachment":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/media?parent=1572"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/categories?post=1572"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/tags?post=1572"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/ppma_author?post=1572"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}