This is Part 3 of a series examining Data Structures in NoSQL databases. In this post, we use Full-Text Search natural language queries against data structures in Couchbase.

What is Full-Text Search Indexing?

Full-text search indexing analyzes the text-based data structures in documents, recording the words found in each document. Queries can then use the index to efficiently find the documents that contain the requested search terms. Full-text search engines use a variety of search algorithms to solve different types of search problems.

Couchbase uses NoSQL data indexing and SQL-based querying services to retrieve information from data structures or documents. Full-text search indexes and search requests find document matches in a more flexible way.

A full-text search returns a list of matching documents. In contrast, N1QL queries return a set of rows. Text indexes are known as inverted indexes: they map each term to the documents that contain it, whereas tabular indexes focus on finding rows or columns with specific values.
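To make the idea of an inverted index concrete, here is a minimal pure-Python sketch. It is not how Couchbase implements its indexes; the document IDs and text are made up for illustration, and tokenization is reduced to lowercase word splitting.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical documents, keyed by document ID.
docs = {
    "doc1": "Tyler lives in Canada",
    "doc2": "Full-text search in Couchbase",
    "doc3": "Search for players in Canada",
}
index = build_inverted_index(docs)
print(sorted(search(index, "canada")))         # documents mentioning "canada"
print(sorted(search(index, "search canada")))  # documents mentioning both terms
```

The lookup never scans the document text at query time; it only intersects the sets stored under each term, which is what makes this kind of index efficient for word searches.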

Full-text search also scores and orders search results so that the most relevant document matches are returned first.
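As a rough illustration of relevance ordering, the sketch below ranks documents for a single term with a simplified TF-IDF weight. This is a generic textbook technique shown under my own assumptions, not Couchbase's actual scoring code, and the sample documents are invented.

```python
import math
from collections import Counter

def score_documents(docs, term):
    """Rank documents for a single term using a simple TF-IDF weight."""
    term = term.lower()
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    containing = [d for d, tokens in tokenized.items() if term in tokens]
    if not containing:
        return []
    # Terms found in fewer documents get a higher inverse document frequency.
    idf = math.log(1 + len(docs) / len(containing))
    scores = {}
    for doc_id in containing:
        tokens = tokenized[doc_id]
        tf = Counter(tokens)[term] / len(tokens)  # share of the document's words
        scores[doc_id] = tf * idf
    # Highest score first, like a search result list.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical documents.
docs = {
    "a": "canada canada hockey",
    "b": "travel guide to canada and the united states",
    "c": "python tutorial",
}
ranked = score_documents(docs, "canada")
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

Document "a" ranks above "b" because the term makes up a larger share of its text, and "c" is left out because it never mentions the term.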

Once a full-text index is created, it is updated as new data enters the database. Database administrators can optimize indexing for specific use cases, e.g., high-volume writes vs. high-volume queries.

The Basic Data Structures

There are a handful of simple data structures that Couchbase exposes: maps, lists, sets, and queues. Each is represented by a JSON data type and stored as a NoSQL document. See the other posts in this series for basic key-based access to this data.

Primary NoSQL data structure examples

Specific fields in the structures can be indexed depending on the need of users/applications. When a search is requested, the string is compared with the indexes and a list of matching documents is returned. 

Only map-based structures have field names; counter and list structures do not. Therefore, only maps can be indexed by field. A list can also contain a map, and the fields of that nested map can be indexed as well.
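The difference between maps and lists is easy to see by walking a JSON value and collecting the field names it exposes. The helper below is a small sketch of that idea; `field_paths` and the sample values are hypothetical and only meant to show which structures offer names an index could target.

```python
def field_paths(value, prefix=""):
    """Yield dotted paths for every named field reachable in a JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from field_paths(child, path)
    elif isinstance(value, list):
        # List items have no names of their own; only maps nested
        # inside the list contribute field names.
        for item in value:
            yield from field_paths(item, prefix)

# Hypothetical sample structures.
profile_map = {"name": "Tyler", "address": {"country": "Canada"}}
plain_list = ["red", "green", "blue"]
list_of_maps = [{"game": "chess"}, {"game": "go"}]

print(sorted(set(field_paths(profile_map))))   # named fields, including nested ones
print(sorted(set(field_paths(plain_list))))    # no field names at all
print(sorted(set(field_paths(list_of_maps))))  # names come from the nested maps
```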

Creating Sample Data

Using the Python SDK, we create a few specific structures for the purposes of this post. See the previous posts for detailed instructions; only simple code is shown here for brevity. The idea is that you are building a user profile, adding more information as it becomes available.
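A minimal sketch of building such profiles might look like the following. It assumes only that the collection object has an `upsert(key, value)` method, as the Couchbase Python SDK collection does, and uses an in-memory stand-in so it runs without a server. The document keys `tylerM` and `player1` are the ones used in this post; the helper names and field values are invented for illustration, and the map-specific SDK calls from the earlier posts are not reproduced here.

```python
def build_profile(name, country=None, **extra):
    """Assemble a user-profile map, adding fields only when they are known."""
    profile = {"name": name}
    if country is not None:
        profile["address"] = {"country": country}
    profile.update(extra)
    return profile

def save_profiles(collection, profiles):
    """Store each profile as a JSON document under its key."""
    for key, profile in profiles.items():
        collection.upsert(key, profile)

class InMemoryCollection:
    """Stand-in for a Couchbase collection so the sketch runs without a server."""
    def __init__(self):
        self.docs = {}

    def upsert(self, key, value):
        self.docs[key] = value

# Field values are illustrative only.
profiles = {
    "tylerM": build_profile("Tyler", country="Canada"),
    "player1": build_profile("Player One", score=42),
}

collection = InMemoryCollection()
save_profiles(collection, profiles)
print(collection.docs["tylerM"])
print(collection.docs["player1"])
```

Swapping `InMemoryCollection` for a real collection handle would write the same JSON maps to the database.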

Full-Text Searching Data Structures

Couchbase text searching is similar to using data structures with N1QL in that it also requires an index. Search indexes vary significantly in complexity depending on the documents in the database and the use case. The built-in web UI Search tab makes it easy to create new search indexes.

See Full-Text Search Indexing Best Practices for more detailed information.

Indexing Data Structures

Here we create an index for the particular fields that we know exist in the data. Including the default mapping allows new fields to be indexed as they come online. You may want to turn that feature off when optimizing for production. I also chose the store option for each field, which shows the matched text in the search results rather than just the list of matching documents.

Couchbase full-text search index creation for data map entries.
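For readers who prefer to see the index as JSON, the sketch below builds an index definition as a Python dictionary. The layout approximates what the web console generates, but exact keys can vary between server versions, so treat it as an assumption rather than a reference. The index name `profile-search` and the bucket name `default` are hypothetical.

```python
import json

def text_field(name, store=True):
    """Child-field entry: index the field as text, optionally storing its value."""
    return {
        "name": name,
        "type": "text",
        "index": True,
        "store": store,  # stored values can be shown in search results
        "include_in_all": True,
    }

index_definition = {
    "name": "profile-search",  # hypothetical index name
    "type": "fulltext-index",
    "sourceType": "couchbase",
    "sourceName": "default",   # hypothetical bucket name
    "params": {
        "mapping": {
            "default_mapping": {
                "enabled": True,
                "dynamic": True,  # also index fields that appear later
                "properties": {
                    "name": {"enabled": True, "fields": [text_field("name")]},
                    "address": {
                        "enabled": True,
                        "properties": {
                            "country": {
                                "enabled": True,
                                "fields": [text_field("country")],
                            },
                        },
                    },
                },
            },
        },
    },
}

print(json.dumps(index_definition, indent=2))
```

Setting `dynamic` to `False` in this structure corresponds to turning off indexing of new, unmapped fields when tuning for production.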

After creating the search index, a search bar appears for entering basic text queries. More advanced queries (fuzzy matching, prefix matching, geospatial) can be run using the REST API or a Couchbase SDK.
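As a sketch of the REST route, the snippet below builds a fuzzy-match request and a prefix query body for the Search service without sending anything. It assumes the service listens on its default port 8094; the host, the index name `profile-search`, and the search terms are hypothetical.

```python
import json
import urllib.request

def build_search_request(host, index_name, query, size=10):
    """Build (but do not send) a POST request for the Search REST endpoint."""
    url = f"http://{host}:8094/api/index/{index_name}/query"
    body = {"query": query, "size": size, "fields": ["*"]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# A fuzzy match tolerates one edit, so a misspelled term can still hit.
fuzzy = {"match": "tylr", "field": "name", "fuzziness": 1}
# A prefix query matches terms that start with the given characters.
prefix = {"prefix": "ty", "field": "name"}

request = build_search_request("localhost", "profile-search", fuzzy)
print(request.full_url)
print(request.data.decode("utf-8"))
```

To actually run the query, add credentials to the request and pass it to `urllib.request.urlopen`.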

Searching Data Structures

Using the built-in search tool, we can find those data structure objects that match input text. Update the map data structures, and the values here will change automatically with no further indexing required.

Searching a Couchbase data structure through the web UI

In this example, we searched for a single word/term and found a match in the name field. If other documents had contained that term in any field, the search would have returned those as well.

A field can also be specified in the search box with a prefix, including a dot syntax to specify child items nested within another field. For example: address.country:Canada. Numeric ranges can also be used here. For example, I added another user to my database and added a new field to the map. Since the tylerM map did not include it, only the new player1 item is returned.

Field scoping with a numeric range search.

It returned the matching document and the fields that I had specifically indexed. The score field was not explicitly stored in the index, but it was indexed, so you can still find matching documents with it.
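The field-scoped syntax is just text, so it is easy to assemble in code. The two helpers below are a small sketch; the function names are hypothetical, and the `score` threshold of 10 is an invented value rather than the one used in the screenshot.

```python
def field_term(field, value):
    """Scope a term to one field, e.g. address.country:Canada."""
    return f"{field}:{value}"

def field_range(field, op, number):
    """Scope a numeric comparison to one field, e.g. score:>=10."""
    if op not in (">", ">=", "<", "<="):
        raise ValueError(f"unsupported operator: {op}")
    return f"{field}:{op}{number}"

print(field_term("address.country", "Canada"))
print(field_range("score", ">=", 10))
```

Either string can be typed into the search box as-is or passed along as a query string from application code.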

Full-Text Search SDK (Python)

Developers will use the Couchbase Search SDK to make direct calls to the database from their applications. The SDK returns a list of documents that match the text along with their relevance score values.

For more information on the SDK for your language, see Full-Text Search in the docs.

All languages supported by the Couchbase SDKs can run searches, with dedicated methods for each type of query. This Python code includes the basic connection and search process that replicates the above examples.
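As a rough sketch of that connect-and-search flow, the code below assumes the 4.x Python SDK module layout (`couchbase.auth`, `couchbase.cluster`, `couchbase.options`, `couchbase.search`); older SDK versions import these classes from different modules. The function names, connection details, and index name are hypothetical. The SDK imports sit inside the function so the small formatting helper can be used and tested without the SDK installed.

```python
def format_hits(rows):
    """Reduce search rows to (document ID, score, stored fields) tuples."""
    return [(row.id, row.score, row.fields) for row in rows]

def search_profiles(conn_str, username, password, index_name, text):
    """Connect to a cluster and run a query-string search (assumes SDK 4.x)."""
    # Imported here so format_hits stays usable without the SDK installed.
    from couchbase.auth import PasswordAuthenticator
    from couchbase.cluster import Cluster
    from couchbase.options import ClusterOptions, SearchOptions
    import couchbase.search as search

    cluster = Cluster(
        conn_str, ClusterOptions(PasswordAuthenticator(username, password))
    )
    result = cluster.search_query(
        index_name,
        search.QueryStringQuery(text),         # same syntax as the web UI search box
        SearchOptions(limit=10, fields=["*"]),  # return stored fields with each hit
    )
    return format_hits(result.rows())

# Example call with hypothetical connection details and index name:
# hits = search_profiles("couchbase://localhost", "Administrator", "password",
#                        "profile-search", "address.country:Canada")
# for doc_id, score, fields in hits:
#     print(doc_id, score, fields)
```

Each hit carries the document ID, its relevance score, and whichever fields were stored in the index.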

The search produces the details of the one document that matched.

Scopes and Collections in Search

With the launch of Couchbase 7.0, documents can be organized into subsets using scopes and collections. When creating an index using the web console, there is a “use non-default scopes” option to check. You then select a scope from the dropdown and create a new type mapping to specify a particular collection.

Selecting a scope and collection for the search index.
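In JSON terms, my understanding is that a scoped index names its type mapping after the scope and collection and switches the document configuration mode accordingly. The sketch below builds that fragment as a Python dictionary. The key names reflect my reading of 7.x index definitions and should be checked against your server version; the helper name and the scope, collection, and field names are hypothetical.

```python
def scoped_mapping(scope, collection, properties):
    """Build the mapping section for one collection in a non-default scope."""
    return {
        "doc_config": {
            "mode": "scope.collection.type_field",
            "type_field": "type",
        },
        "mapping": {
            # Disable the default mapping so only the named collection is indexed.
            "default_mapping": {"enabled": False},
            "types": {
                f"{scope}.{collection}": {
                    "enabled": True,
                    "dynamic": False,
                    "properties": properties,
                }
            },
        },
    }

# Hypothetical scope, collection, and field.
name_field = {"name": "name", "type": "text", "index": True, "store": True}
params = scoped_mapping(
    "game", "profiles", {"name": {"enabled": True, "fields": [name_field]}}
)
print(sorted(params["mapping"]["types"]))
```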

We won’t go into it here, but scope and collection-level searching with an SDK uses connection-level properties.

Bringing it all together

Creating documents, maps, and other data structures is very simple with some basic SDK usage. Likewise, through the strategic use of text search indexing methods, there are even more ways to access the data your applications are managing.

Practically speaking, basic JSON arrays, strings, etc. can be mapped and indexed for use in other access methods.

Because Couchbase is an all-inclusive platform, your system architecture can be greatly simplified. Developers can get started right away without a lot of heavy lifting.

Author

Posted by Tyler Mitchell

Works as Senior Product Marketing Manager at Couchbase, helping bring knowledge about products into the public limelight while also supporting our field teams with valuable content. His personal passion is all things geospatial, having worked in GIS for half his career. Now AI and Vector Search are top of mind.
