Couchbase Server uses a Unicode collation algorithm to order letters, so you should be aware of how it works. Most developers are used to Byte order, such as that found in ASCII, which most programming languages use to order strings during string comparisons.
The following shows the order of precedence used in Byte order, such as ASCII:
1234567890 < A-Z < a-z
This means any items that start with integers will appear before any items that start with letters; any items beginning with capital letters will appear before items beginning with lowercase letters. This means the item named "Apple" will appear before "apple" and the item "Zebra" will appear before "apple". Compare this with the order of precedence used in Unicode collation, which is used in Couchbase Server:
1234567890 < aAbBcCdDeEfFgGhH...
Notice again that items that start with integers will appear before any items that start with letters. However, in this case, the lowercase and then uppercase forms of the same letter are grouped together. This means that "apple" will appear before "Apple" and will also appear before "Zebra". In addition, be aware that accented characters will follow this ordering:
a < á < A < Á < b
This means that all items starting with "a" and accented variants of the letter will occur before "A" and any accented variants of "A".
Ordering Example
In Byte order, keys in an index would appear as follows:
"ABC123" < "ABC223" < "abc123" < "abc223" < "abcd23" < "bbc123" < "bbcd23"
The same items will be ordered this way by Couchbase Server under Unicode collation:
"abc123" < "ABC123" < "abc223" < "ABC223" < "abcd23" < "bbc123" < "bbcd23"
This is particularly important for you to understand if you
query Couchbase Server with a startkey and
endkey to get back a range of results. The
items you would retrieve under Byte order are different compared
to Unicode collation. For more information about ordering
results, see
Section 9.8.2.2, “Partial Selection and Key Ranges”.
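The difference between the two orderings can be sketched in a few lines of Python. This is only an illustration, not the collation code Couchbase Server runs: collation_key is a hypothetical, simplified stand-in that compares strings ignoring case first and then places lowercase before uppercase, and it ignores accents entirely. The key_range helper is also hypothetical and assumes an inclusive endkey.

```python
# Keys from the ordering example above, deliberately shuffled.
KEYS = ["bbcd23", "abc223", "ABC123", "abcd23", "ABC223", "bbc123", "abc123"]


def byte_key(s):
    # Byte order: compare the UTF-8 encoded bytes directly.
    return s.encode("utf-8")


def collation_key(s):
    # Simplified assumption, not the real collation algorithm:
    # level 1 compares the case-folded string (letters ignoring case),
    # level 2 breaks ties with lowercase (False) before uppercase (True).
    # Accented characters and locale rules are not handled.
    return (s.casefold(), tuple(c.isupper() for c in s))


def key_range(keys, startkey, endkey, sort_key):
    # Return keys with startkey <= key <= endkey under the given ordering.
    lo, hi = sort_key(startkey), sort_key(endkey)
    return [k for k in sorted(keys, key=sort_key) if lo <= sort_key(k) <= hi]


byte_order = sorted(KEYS, key=byte_key)
collated = sorted(KEYS, key=collation_key)

print(byte_order)
print(collated)

# The same startkey and endkey select different items under each ordering.
print(key_range(KEYS, "abc123", "abc223", byte_key))
print(key_range(KEYS, "abc123", "abc223", collation_key))
```

With these keys, the range from "abc123" to "abc223" contains only the two lowercase keys in Byte order, while the simplified collation ordering also places "ABC123" inside the range.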
Ordering and Query Example
The following example demonstrates Unicode collation in
Couchbase Server and the impact on query results returned with a
startkey and endkey. It is
based on the beer-sample database provided
with Couchbase Server 2.0. For more information, see
Section C.2, “Beer Sample Bucket”.
Imagine you want to retrieve all breweries with names starting with uppercase Y. Your query parameters would appear as follows:
startkey="Y"&endkey="z"
If you want breweries starting with lowercase y or uppercase Y, you would provide a query as follows:
startkey="y"&endkey="z"
This will return all names starting with lowercase y and all items up to, but not including, lowercase z, thereby including names starting with uppercase Y as well. To retrieve the names of breweries starting with lowercase y only, you would terminate your range with capital Y:
startkey="y"&endkey="Y"
As it happens, this query returns no results against the sample database because there are no beers in it which start with lowercase y.
If you want to learn more about Unicode collation, refer to these resources: Unicode Technical Standard #10 and ICU User Guide, Customization, Default Options.