Introduction

Full text search (FTS) indexing can be challenging for those who are not familiar with search in general.  In this post, we’ll take some common search use cases and work through the creation of appropriate indexes following best practices for Couchbase FTS indexing. 

Use cases

The following common search use cases will be explored: 

  • Simple search
  • Field-independent search
  • Finding the nearest neighbor
  • Dynamic search (documents whose model is changing all the time)

Travel Sample Data Model

Throughout this document, we’ll be referencing the Travel Sample dataset available to install in any Couchbase Server instance. The travel-sample bucket has several distinct document types: airline, route, airport, landmark, and hotel.  The document model for each kind of document contains:

  • A key that acts as a primary key
  • An id field that identifies the document
  • A type field that identifies the kind of document

 

The following figure illustrates the relationship between the different kinds of documents in the travel-sample bucket. It shows the primary key, ID, and type fields that each document has, plus a few representative fields in each type of document. (Landmark documents are not pictured.)

 

Figure 0. Documents in the travel-sample data model

 

Example Use Cases

Simple Search

Use case summary: a user will find hotels by searching the hotel description field for a keyword and see a list of matching hotels. 

Index creation steps: 

  1. On the Full Text Search UI, click “Add Index”.
  2. Specify an index name, e.g. “hotel_desc”, and select the travel-sample bucket. 
  3. Since each document in the travel-sample bucket has a “type” field indicating the type of document, leave “JSON type field” set to “type”.
  4. Under type mappings:  
    1. Click “+ Add Type Mapping”, and specify “hotel” as the type name, since the requirement is to search all hotel documents.  
    2. A list of available analyzers can be accessed by means of the pull-down menu to the right of the type name field.  For this use case, leave “inherit” selected so that the type mapping inherits the default analyzer from the index.
    3. Since the requirement is to search only the hotel description field, check “only index specified fields”.  With this checked, only user-specified fields from the document are included in the index for the hotel type mapping. The mapping will not be dynamic; a dynamic mapping would treat all fields as available for indexing. 
    4. Click OK.  
    5. Mouse over the row with the hotel type mapping, click the + button, and then click “insert child field”.  This will allow the description field to be individually included in the index. Specify the following: 
      1. field: Enter the name of the field to be indexed, “description”.
      2. type: Leave this set to text for the description field.
      3. searchable as: Leave this the same as the field name for the current use case.  It can be used to indicate an alternate field name. 
      4. analyzer: As was done for the type mapping, leave “inherit” selected so that the field inherits the default analyzer from the type mapping.
      5. index checkbox: Leave this checked, so that the field is included in the index.  Unchecking the box would explicitly remove the field from the index.
      6. store checkbox: This setting includes the field content in the values returned from the search and permits highlighting of matched expressions in the results, but it also increases the size of the index and the build time. Leave this unchecked since there’s no requirement to highlight the results for this use case. 
      7. “include in _all field” checkbox: Uncheck this since the use case is to specifically search the description field. 
      8. “include term vectors” checkbox: Certain kinds of functionality (such as highlighting and phrase search) require term vectors, but they increase the size of the index and the build time. Uncheck this box since the use case is a keyword search without highlighting.  
      9. Click OK.
    6. Finally, uncheck the checkbox next to the “default” type mapping.  If the default mapping is left enabled, all documents in the bucket are included in the index, regardless of whether the user actively specifies type mappings. Only the hotel documents are required, and they are included by the hotel type mapping added previously. 
  5. The default values suffice for the remaining collapsed panels (Analyzers, Custom Filters, Date/Time Parsers, and Advanced). 
  6. Index Replicas can be set to 1, 2 or 3, provided that the cluster is running the Search service on at least n+1 nodes, where n is the number of replicas. With a single-node development cluster, keep the default value of 0. 
  7. For Index Type, the default value of “Version 6.0 (Scorch)” is appropriate for any newly created indexes. Scorch reduces the size of the index on disk, and provides enhanced performance for indexing and mutation-handling.
  8. At this point, the create index page should look like the last frame captured in Figure 1.  Click “Create Index” to complete the process. 

Figure 1 – Index Creation “hotel_desc”

 

Note: See “Index: hotel_desc” in Appendix A for the JSON payload used to create this index with the REST API.  

Testing queries against the index: 

  1. On the Full Text Search UI, wait for indexing progress to show 100%, then click on the index name “hotel_desc”. 
  2. To search for any hotels with the keyword “farm” in the description, in the “search this index…” text box, enter “description:farm” and click Search.  This field-scoping of the search is required because the “include in _all field” checkbox was left unchecked.
  3. The results are shown (similar to Figure 2) with the key of each matching document.

Figure 2 – Index “hotel_desc” search for “description:farm”

4. The same search can be run through the REST API using curl: 
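The curl command itself is not shown here. As a rough equivalent, the following Python sketch builds the same kind of request against the Search service query endpoint. The host, port 8094 (the default non-TLS Search port), and the credentials are assumptions for a local development cluster, and `build_search_request` is a hypothetical helper name, not part of any Couchbase SDK.

```python
import base64
import json
import urllib.request


def build_search_request(host, index_name, query_string, user, password):
    """Build (but do not send) a POST to the Search service query endpoint."""
    url = f"http://{host}:8094/api/index/{index_name}/query"
    # A "query string" query, the same syntax accepted by the Search UI text box.
    body = json.dumps({"query": {"query": query_string}}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder host and credentials (assumptions); replace with real values.
request = build_search_request(
    "localhost", "hotel_desc", "description:farm", "Administrator", "password"
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

To actually run the search, the request can be passed to `urllib.request.urlopen(request)` and the response decoded with `json.load`. The field prefix `description:` is kept in the query string because the field was not included in `_all`.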

Field-independent Search

Use case summary: a user will find hotels by searching the hotel name, alias, description, address, and reviews fields for a keyword and see a list of matching hotels that includes the highlighted matches.

Index creation steps: 

  1. On the Full Text Search UI, click “Add Index”.
  2. Specify an index name, e.g. “hotel_mult_fields”, and select the travel-sample bucket. 
  3. Since each document in the travel-sample bucket has a “type” field indicating the type of document, leave “JSON type field” set to “type”.
  4. Under type mappings:  
    1. Click “+ Add Type Mapping”, and specify “hotel” as the type name, since the requirement is to search all hotel documents.  
    2. A list of available analyzers can be accessed by means of the pull-down menu to the right of the type name field.  For this use case, leave “inherit” selected so that the type mapping inherits the default analyzer from the index.
    3. Since the requirement is to search the hotel name, alias, description, address, and reviews fields, check “only index specified fields”.  With this checked, only user-specified fields from the document are included in the index for the hotel type mapping. The mapping will not be dynamic; a dynamic mapping would treat all fields as available for indexing. 
    4. Click OK.  
    5. Add each of the 5 fields to the hotel type mapping:
      1. Name: Mouse over the row with the hotel type mapping, click the + button, and then click “insert child field”.  This will allow the hotel name field to be individually included in the index. Specify the following: 
        1. field: Enter the name of the field to be indexed, “name”.
        2. type: Leave this set to text for the name field.
        3. searchable as: Leave this the same as the field name for the current use case.  It can be used to indicate an alternate field name. 
        4. analyzer: As was done for the type mapping, leave “inherit” selected so that the field inherits the default analyzer from the type mapping.
        5. index checkbox: Leave this checked so that the field is included in the index.  Unchecking the box would explicitly remove the field from the index.
        6. store checkbox: Check this setting to include the field content in the search results which permits highlighting of matched expressions in the results. 
        7. “include in _all field” checkbox: Leave this checked since the use case requirement is to search multiple fields. 
        8. “include term vectors” checkbox: Leave this checked since the use case requires highlighting of results.  
        9. Click OK.
      2. Alias: Mouse over the row with the hotel type mapping, click the + button, and then click “insert child field”.  This will allow the hotel alias field to be individually included in the index. Specify the following: 
        1. field: Enter the name of the field to be indexed, “alias”.
        2. type: Leave this set to text for the alias field.
        3. searchable as: Leave this the same as the field name for the current use case.  It can be used to indicate an alternate field name. 
        4. analyzer: As was done for the type mapping, leave “inherit” selected so that the field inherits the default analyzer from the type mapping.
        5. index checkbox: Leave this checked so that the field is included in the index.  Unchecking the box would explicitly remove the field from the index.
        6. store checkbox: Check this setting to include the field content in the search results which permits highlighting of matched expressions in the results. 
        7. “include in _all field” checkbox: Leave this checked since the use case requirement is to search multiple fields. 
        8. “include term vectors” checkbox: Leave this checked since the use case requires highlighting of results.  
        9. Click OK.
      3. Description: Mouse over the row with the hotel type mapping, click the + button, and then click “insert child field”.  This will allow the hotel description field to be individually included in the index. Specify the following: 
        1. field: Enter the name of the field to be indexed, “description”.
        2. type: Leave this set to text for the description field.
        3. searchable as: Leave this the same as the field name for the current use case.  It can be used to indicate an alternate field name. 
        4. analyzer: As was done for the type mapping, leave “inherit” selected so that the field inherits the default analyzer from the type mapping.
        5. index checkbox: Leave this checked so that the field is included in the index.  Unchecking the box would explicitly remove the field from the index.
        6. store checkbox: Check this setting to include the field content in the search results which permits highlighting of matched expressions in the results. 
        7. “include in _all field” checkbox: Leave this checked since the use case requirement is to search multiple fields. 
        8. “include term vectors” checkbox: Leave this checked since the use case requires highlighting of results.  
        9. Click OK.
      4. Address: Mouse over the row with the hotel type mapping, click the + button, and then click “insert child field”.  This will allow the hotel address field to be individually included in the index. Specify the following: 
        1. field: Enter the name of the field to be indexed, “address”.
        2. type: Leave this set to text for the address field.
        3. searchable as: Leave this the same as the field name for the current use case.  It can be used to indicate an alternate field name. 
        4. analyzer: As was done for the type mapping, leave “inherit” selected so that the field inherits the default analyzer from the type mapping.
        5. index checkbox: Leave this checked so that the field is included in the index.  Unchecking the box would explicitly remove the field from the index.
        6. store checkbox: Check this setting to include the field content in the search results which permits highlighting of matched expressions in the results. 
        7. “include in _all field” checkbox: Leave this checked since the use case requirement is to search multiple fields. 
        8. “include term vectors” checkbox: Leave this checked since the use case requires highlighting of results.  
        9. Click OK.
      5. Reviews: Mouse over the row with the hotel type mapping, click the + button, and then click “insert child field”.  This will allow the hotel reviews field to be individually included in the index. Specify the following: 
        1. field: Enter the name of the field to be indexed, “reviews”.
        2. type: Leave this set to text for the reviews field.
        3. searchable as: Leave this the same as the field name for the current use case.  It can be used to indicate an alternate field name. 
        4. analyzer: As was done for the type mapping, leave “inherit” selected so that the field inherits the default analyzer from the type mapping.
        5. index checkbox: Leave this checked so that the field is included in the index.  Unchecking the box would explicitly remove the field from the index.
        6. store checkbox: Check this setting to include the field content in the search results which permits highlighting of matched expressions in the results. 
        7. “include in _all field” checkbox: Leave this checked since the use case requirement is to search multiple fields. 
        8. “include term vectors” checkbox: Leave this checked since the use case requires highlighting of results.  
        9. Click OK.
    6. Finally, uncheck the checkbox next to the “default” type mapping.  If the default mapping is left enabled, all documents in the bucket are included in the index, regardless of whether the user actively specifies type mappings. Only the hotel documents are required, and they are included by the hotel type mapping added previously. 
  5. The default values suffice for the remaining collapsed panels (Analyzers, Custom Filters, Date/Time Parsers, and Advanced). 
  6. Index Replicas can be set to 1, 2 or 3, provided that the cluster is running the Search service on at least n+1 nodes, where n is the number of replicas. With a single-node development cluster, keep the default value of 0. 
  7. For Index Type, the default value of “Version 6.0 (Scorch)” is appropriate for any newly created indexes. Scorch reduces the size of the index on disk, and provides enhanced performance for indexing and mutation-handling.
  8. At this point, the create index page should look like the last frame captured in Figure 3.  Click “Create Index” to complete the process. 

Figure 3 – Index Creation “hotel_mult_fields”

 

Note: See “Index: hotel_mult_fields” in Appendix A for the JSON payload used to create this index through the REST API.  

Testing queries against the index: 

  1. On the Full Text Search UI, wait for indexing progress to show 100%, then click on the index name “hotel_mult_fields”. 
  2. To search for any hotels with the keyword “farm” in the name, alias, description, address, or reviews fields, in the “search this index…” text box, enter “farm” and click Search.  Field-scoping of the search is not required because the “include in _all field” checkbox was checked for each field indexed in the hotel type mapping.
  3. The results are shown (similar to Figure 4) with the key of each matching document and the highlighted matches in each matching document.

Figure 4 – Index “hotel_mult_fields” search results for “farm”

 

4. The same search can be run through the REST API using curl: 
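The curl command is not reproduced here. As a sketch of what the request body could look like, the Python snippet below builds a query-string search with a `highlight` section, which asks the Search service to return highlighted fragments for the stored fields. The helper name `highlighted_query` and the `size` value are illustrative assumptions; the body would be sent as a POST to `/api/index/hotel_mult_fields/query` on the Search service.

```python
import json


def highlighted_query(term, size=10):
    """Return a query body that searches the _all field and requests highlights."""
    return {
        "query": {"query": term},        # no field prefix: matched against _all
        "highlight": {"style": "html"},  # ask for highlighted fragments
        "size": size,                    # number of hits to return (illustrative)
    }


body = highlighted_query("farm")
print(json.dumps(body, indent=2))
```

Highlighting only works here because the fields were indexed with both the store and "include term vectors" options checked.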

Finding the Nearest Neighbor

Use case summary: a user will find hotels within a certain distance from a specified location and see a list of matching hotel names ordered by distance from the specified location.

Index creation steps: 

  1. On the Full Text Search UI, click “Add Index”.
  2. Specify an index name, e.g. “hotel_geo”, and select the travel-sample bucket. 
  3. Since each document in the travel-sample bucket has a “type” field indicating the type of document, leave “JSON type field” set to “type”.
  4. Under type mappings:  
    1. Click “+ Add Type Mapping”, and specify “hotel” as the type name, since the requirement is to search all hotel documents.  
    2. A list of available analyzers can be accessed by means of the pull-down menu to the right of the type name field.  For this use case, leave “inherit” selected so that the type mapping inherits the default analyzer from the index.
    3. Since the requirement is to search by hotel location only, check “only index specified fields”.  With this checked, only user-specified fields from the document are included in the index for the hotel type mapping. The mapping will not be dynamic; a dynamic mapping would treat all fields as available for indexing. 
    4. Click OK.  
    5. Add the geo and name fields to the hotel type mapping:
      1. Geo: Mouse over the row with the hotel type mapping, click the + button, and then click “insert child field”.  This will allow the hotel geo field to be individually included in the index. Specify the following: 
        1. field: Enter the name of the field to be indexed, “geo”.
        2. type: Set this to geopoint.
        3. searchable as: Leave this the same as the field name for the current use case.  It can be used to indicate an alternate field name. 
        4. index checkbox: Leave this checked so that the field is included in the index.  Unchecking the box would explicitly remove the field from the index.
        5. store checkbox: Check this setting to include the field content in the search results. 
        6. “include in _all field” checkbox: Leave this checked since the use case requirement is to search the geo field. 
        7. Click OK.
      2. Name: Mouse over the row with the hotel type mapping, click the + button, and then click “insert child field”.  This will allow the hotel name field to be individually included in the index. Specify the following: 
        1. field: Enter the name of the field to be indexed, “name”.
        2. type: Leave this set to text for the name field.
        3. searchable as: Leave this the same as the field name for the current use case.  It can be used to indicate an alternate field name. 
        4. analyzer: As was done for the type mapping, leave “inherit” selected so that the field inherits the default analyzer from the type mapping.
        5. index checkbox: Leave this checked so that the field is included in the index.  Unchecking the box would explicitly remove the field from the index.
        6. store checkbox: Check this setting to include the field content in the search results. 
        7. “include in _all field” checkbox: Uncheck this since the use case requirement is to search only by location but to display the name in the search results. 
        8. “include term vectors” checkbox: Certain kinds of functionality (such as highlighting and phrase search) require term vectors, but they increase the size of the index and the build time. Uncheck this box since the use case is to search by geo location only.  
        9. Click OK.
    6. Finally, uncheck the checkbox next to the “default” type mapping.  If the default mapping is left enabled, all documents in the bucket are included in the index, regardless of whether the user actively specifies type mappings. Only the hotel documents are required, and they are included by the hotel type mapping added previously. 
  5. The default values suffice for the remaining collapsed panels (Analyzers, Custom Filters, Date/Time Parsers, and Advanced). 
  6. Index Replicas can be set to 1, 2 or 3, provided that the cluster is running the Search service on at least n+1 nodes, where n is the number of replicas. With a single-node development cluster, keep the default value of 0. 
  7. For Index Type, the default value of “Version 6.0 (Scorch)” is appropriate for any newly created indexes. Scorch reduces the size of the index on disk, and provides enhanced performance for indexing and mutation-handling.
  8. At this point, the create index page should look like Figure 5.  Click “Create Index” to complete the process. 

Figure 5 – Index Creation “hotel_geo”

 

Note: See “Index: hotel_geo” in Appendix A for the JSON payload used to create this index through the REST API.  

Testing queries against the index: 

  1. On the Full Text Search UI, wait for indexing progress to show 100%. 
  2. Only “query string” queries can be tested through the Search UI in the Couchbase web console, so this geospatial search will be tested through the REST API using curl.  Search for the 2 nearest hotels within a mile of a location in San Francisco, returning the hotel name and coordinates ordered by distance using the following JSON query body:  

3. Query result JSON: 
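Neither the query body nor the result JSON is reproduced here. The request described in step 2 can be sketched in Python as follows: a geo distance query on the `geo` field, limited to two hits, sorted by distance, and returning the stored `name` and `geo` fields. The coordinates are an illustrative point in central San Francisco, not necessarily the location used originally, and `nearest_hotels_query` is a hypothetical helper name.

```python
import json


def nearest_hotels_query(lat, lon, radius="1mi", limit=2):
    """Build a geo distance query sorted by distance from (lat, lon)."""
    location = {"lat": lat, "lon": lon}
    return {
        "from": 0,
        "size": limit,  # only the nearest `limit` hotels
        "query": {
            "location": location,
            "distance": radius,  # search radius, e.g. "1mi"
            "field": "geo",      # the geopoint field in the index
        },
        "sort": [
            {
                "by": "geo_distance",  # order hits by distance from location
                "field": "geo",
                "unit": "mi",
                "location": location,
            }
        ],
        "fields": ["name", "geo"],  # stored fields to return with each hit
    }


# Illustrative coordinates for central San Francisco (an assumption).
query = nearest_hotels_query(37.7749, -122.4194)
print(json.dumps(query, indent=2))
```

The body would be posted to `/api/index/hotel_geo/query`. The `fields` list relies on the store option having been checked for both `name` and `geo`.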

Dynamic Search

Use case summary: a user will find hotels by searching for a keyword that appears in any attribute of the hotel document and see a list of matching hotels.

In this case, we want to search data that is changing all the time: fields are going to be added or removed, or fields may change from a simple string to an object, and our search should find the matching hotels at all times.  To meet this requirement, we will create a dynamic index.

Index creation steps: 

  1. On the Full Text Search UI, click “Add Index”.
  2. Specify an index name, e.g. “hotel_dynamic”, and select the travel-sample bucket. 
  3. Since each document in the travel-sample bucket has a “type” field indicating the type of document, leave “JSON type field” set to “type”.
  4. Under type mappings:  
    1. Click “+ Add Type Mapping”, and specify “hotel” as the type name, since the requirement is to search all hotel documents.  
    2. A list of available analyzers can be accessed by means of the pull-down menu to the right of the type name field.  For this use case, leave “inherit” selected so that the type mapping inherits the default analyzer from the index.
    3. Since the requirement is to search all hotel document fields, leave “only index specified fields” unchecked.  With this unchecked, the mapping will be dynamic, meaning that all fields are considered available for indexing. 
    4. Click OK.  
    5. Finally, uncheck the checkbox next to the “default” type mapping.  If the default mapping is left enabled, all documents in the bucket are included in the index, regardless of whether the user actively specifies type mappings. Only the hotel documents are required, and they are included by the hotel type mapping added previously. 
  5. The default values suffice for the remaining collapsed panels (Analyzers, Custom Filters, Date/Time Parsers, and Advanced). 
  6. Index Replicas can be set to 1, 2 or 3, provided that the cluster is running the Search service on at least n+1 nodes, where n is the number of replicas. With a single-node development cluster, keep the default value of 0. 
  7. For Index Type, the default value of “Version 6.0 (Scorch)” is appropriate for any newly created indexes. Scorch reduces the size of the index on disk, and provides enhanced performance for indexing and mutation-handling.
  8. At this point, the create index page should look like Figure 6.  Click “Create Index” to complete the process. 

Figure 6 – Index Creation “hotel_dynamic”

Note: See “Index: hotel_dynamic” in Appendix A for the JSON payload used to create this index through the REST API.  

Testing queries against the index: 

  1. On the Full Text Search UI, wait for indexing progress to show 100%, then click on the index name “hotel_dynamic”. 
  2. To search for any hotels with the keyword “farm” in any fields of the hotel documents, in the “search this index…” text box, enter “farm” and click Search.  Field-scoping of the search is not required because all fields are included in the dynamic index.
  3. The results are shown (similar to Figure 7) with the key of each matching document.

Figure 7 – Index “hotel_dynamic” search results for “farm”

 

4. The same search can be run through the REST API using curl: 
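The curl command is not shown here. As a sketch, the query body for the dynamic index is a plain query-string search, and the document keys can be read from the `hits` array of the decoded response. The helper names are hypothetical, and the response fragment below is made up purely to exercise the helper; it is not output from a real cluster.

```python
import json


def dynamic_query(term):
    """Query-string search; no field prefix is needed for the dynamic index."""
    return {"query": {"query": term}}


def matching_keys(response):
    """Return the document keys from a decoded Search response."""
    return [hit["id"] for hit in response.get("hits", [])]


# Made-up response fragment (an assumption, not real output); real responses
# carry more members, such as status information and a score for each hit.
sample_response = {"total_hits": 2, "hits": [{"id": "hotel_1"}, {"id": "hotel_2"}]}

print(json.dumps(dynamic_query("farm")))
print(matching_keys(sample_response))
```

The body would be posted to `/api/index/hotel_dynamic/query` on the Search service.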

 

 

Appendix A – Index Definition JSON

Index: hotel_desc
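The JSON payload is not reproduced here. The Python sketch below builds a definition that mirrors the UI choices for this index: a disabled default mapping and a non-dynamic `hotel` type mapping with a single `description` field that is indexed but not stored. It is an approximation rather than the exact payload; partition and replica settings (`planParams`) are omitted on the assumption that server defaults apply, and `standard` is assumed as the default analyzer.

```python
import json

hotel_desc = {
    "type": "fulltext-index",
    "name": "hotel_desc",
    "sourceType": "couchbase",
    "sourceName": "travel-sample",
    "params": {
        # Use the document's "type" field to select the type mapping.
        "doc_config": {"mode": "type_field", "type_field": "type"},
        "mapping": {
            "default_analyzer": "standard",  # assumed index default
            "default_field": "_all",
            # Default mapping disabled: only hotel documents are indexed.
            "default_mapping": {"dynamic": True, "enabled": False},
            "types": {
                "hotel": {
                    "dynamic": False,  # "only index specified fields"
                    "enabled": True,
                    "properties": {
                        "description": {
                            "dynamic": False,
                            "enabled": True,
                            "fields": [
                                {
                                    "name": "description",
                                    "type": "text",
                                    "index": True,
                                    "store": False,
                                    "include_in_all": False,
                                    "include_term_vectors": False,
                                }
                            ],
                        }
                    },
                }
            },
        },
        "store": {"indexType": "scorch"},
    },
}

print(json.dumps(hotel_desc, indent=2))
```

A definition of this shape is created with a PUT to `/api/index/hotel_desc` on the Search service.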

Index: hotel_mult_fields
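The JSON payload is not reproduced here. As a partial sketch, the snippet below builds only the `hotel` type mapping for this index, which would sit under `params.mapping.types` in the full definition. The rest of the definition (source bucket, `doc_config`, store settings) is omitted. Each of the five fields is stored, included in `_all`, and indexed with term vectors, matching the UI choices; `text_field` is a hypothetical helper name.

```python
import json

FIELDS = ["name", "alias", "description", "address", "reviews"]


def text_field(name):
    """Child-field mapping for a stored, highlightable text field."""
    return {
        "dynamic": False,
        "enabled": True,
        "fields": [
            {
                "name": name,
                "type": "text",
                "index": True,
                "store": True,                 # needed to return and highlight content
                "include_in_all": True,        # searchable without a field prefix
                "include_term_vectors": True,  # needed for highlighting
            }
        ],
    }


hotel_mapping = {
    "dynamic": False,  # "only index specified fields"
    "enabled": True,
    "properties": {name: text_field(name) for name in FIELDS},
}

print(json.dumps(hotel_mapping, indent=2))
```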

Index: hotel_geo
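The JSON payload is not reproduced here. As a partial sketch, the snippet below builds only the `hotel` type mapping for this index, which would sit under `params.mapping.types` in the full definition; everything else is omitted. The `geo` field is mapped as a `geopoint` and the `name` field as stored text that is kept out of `_all`, matching the UI choices.

```python
import json

hotel_geo_mapping = {
    "dynamic": False,  # "only index specified fields"
    "enabled": True,
    "properties": {
        "geo": {
            "dynamic": False,
            "enabled": True,
            "fields": [
                {
                    "name": "geo",
                    "type": "geopoint",  # enables geo distance queries
                    "index": True,
                    "store": True,
                    "include_in_all": True,
                }
            ],
        },
        "name": {
            "dynamic": False,
            "enabled": True,
            "fields": [
                {
                    "name": "name",
                    "type": "text",
                    "index": True,
                    "store": True,             # returned in results
                    "include_in_all": False,   # search is by location only
                    "include_term_vectors": False,
                }
            ],
        },
    },
}

print(json.dumps(hotel_geo_mapping, indent=2))
```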

Index: hotel_dynamic
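The JSON payload is not reproduced here. As a partial sketch, the snippet below builds only the `mapping` section for this index: the default mapping is disabled and the `hotel` type mapping is dynamic with no child fields. The `index_dynamic` and `store_dynamic` values are assumed defaults (dynamic fields indexed but not stored), and `standard` is assumed as the default analyzer.

```python
import json

dynamic_mapping = {
    "default_analyzer": "standard",  # assumed index default
    "default_field": "_all",
    # Default mapping disabled so that only hotel documents are indexed.
    "default_mapping": {"dynamic": True, "enabled": False},
    "index_dynamic": True,   # index dynamically discovered fields (assumed default)
    "store_dynamic": False,  # do not store their content (assumed default)
    "types": {
        # No properties listed: every field of a hotel document is considered.
        "hotel": {"dynamic": True, "enabled": True},
    },
}

print(json.dumps(dynamic_mapping, indent=2))
```

Because no fields are listed, new or restructured attributes in hotel documents are picked up without changing the index definition.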

 

Author

Posted by Brian Kane, Solutions Engineer, Couchbase

Brian Kane is a Solutions Engineer at Couchbase and has been working in application development and with database technologies since 1996. He is based in the Houston, Texas area.
