Couchbase Mobile 2.0 introduces powerful Full Text Search (FTS) capabilities on your JSON documents. This is part of the new Query interface based on N1QL, Couchbase's declarative query language that extends SQL for JSON. If you are familiar with SQL, you will feel right at home with the semantics of the new API.

Full Text Search enables natural language querying. This is the third in a series of posts that discuss the query interface in Couchbase Lite. This post assumes you are familiar with the fundamentals, so if you haven't done so already, be sure to review the earlier post first. If you are interested, links to posts that discuss other features of the Query interface are provided at the end of this post.

You can download the latest pre-release version of Couchbase Mobile 2.0 from here.

Background

If you were using 1.x versions of Couchbase Mobile, you are probably familiar with Map-Views for creating indexes and queries. In 2.0, you no longer need to create views and map functions! Instead, a simple interface allows you to create indexes, and you can use a Query Builder interface to construct your queries. The new query interface is simpler to use and much more powerful in comparison. We will discover some of its features in this post.

Sample Project

While the examples discussed here use Swift for iOS, note that barring some minor differences, the same query interface is supported on the Android and Windows platforms as well.

So, with some minor tweaks, you should be able to reuse the query examples in this post when working with other platforms.

Follow the instructions below if you are interested in a sample Swift project:

  • Clone the iOS Swift Playground from GitHub
  • Follow the installation instructions in the corresponding README to build and run the playground.

Sample Data Model

We will use the Travel Sample database located here. You can embed this prebuilt database in your mobile app and start using it for your queries.

The sample data set includes several types of documents, as identified by the type property in the document. We will focus on documents of type "landmark". The JSON document model is shown below. For brevity, we have omitted some of the properties that are not relevant to this post from the model below.

** Refer to the model above for each of the query examples below. **

The Database Handle

In the queries below, we will use the Database API to open/create the Couchbase Lite database.
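
As a minimal sketch of obtaining the database handle, assuming the Couchbase Lite 2.0 Swift API as released. The database name "travel-sample" and the helper function name are assumptions for illustration and are not taken from the playground:

```swift
import CouchbaseLite

// Opens the database with the given name, creating it if it does not exist.
// "travel-sample" is an assumed name for the prebuilt Travel Sample database.
func openTravelDatabase(named name: String = "travel-sample") throws -> Database {
    // The default configuration stores the database in the default location.
    let config = DatabaseConfiguration()
    return try Database(name: name, config: config)
}
```

The returned Database value is what the later sketches refer to as db.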

The Basics

Full Text Search enables natural language querying. In our post on the Query Fundamentals, we discussed the like and regex expressions for pattern matching operations. FTS supersedes that capability by enabling support for stemming, relevance-based ranking and locale-specific natural language querying.

Full Text Searches are case insensitive and use the match query expression. In order to perform FTS, you must create a Full Text Index on the appropriate properties. You can create an index on one or more properties.

Stemming

Before we proceed with the examples, first a word on stemming. Stemming is the process of reducing words to their root stem word. So, for instance, “catty”, “catlike” and “cats” are reduced to the word “cat”. Searching for the term “cats” would therefore give us results that match “cat”, “catlike” and so on.

Couchbase Lite currently supports stemming in the following languages:
* danish
* dutch
* english
* finnish
* french
* german
* hungarian
* italian
* norwegian
* portuguese
* romanian
* russian
* spanish
* swedish
* turkish

If no specific language is used, the tokenizer will still break the text into words at Unicode whitespace characters. So it should work, although less well, with any language that puts spaces between words.

Full Text Index

The name that is associated with the index during creation is important. The query examples that we will see later refer to the appropriate index by this name.

Single Property Index

The following example creates a fullTextIndex on the “content” property of a Document. Stemming is enabled by default and the locale is assumed to be the locale of the device. While not shown below, you also have the option of specifying whether accents are ignored via the ignoreAccents option. By default, accents are not ignored.
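
A minimal sketch of this index creation, assuming the IndexBuilder and FullTextIndexItem types of the Couchbase Lite 2.0 Swift API as released (pre-release builds may differ). The function name is hypothetical; the index name “ContentFTSIndex” is the one used by the queries later in this post:

```swift
import CouchbaseLite

// Creates a full-text index named "ContentFTSIndex" on the "content" property.
// Stemming stays enabled because no language override is given.
func createContentIndex(on db: Database) throws {
    let index = IndexBuilder
        .fullTextIndex(items: FullTextIndexItem.property("content"))
        .ignoreAccents(false) // the default, spelled out for illustration
    try db.createIndex(index, withName: "ContentFTSIndex")
}
```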

Multiple Property Index

The following example creates a fullTextIndex on the “content” and “name” properties of a Document.
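
A sketch of a multi-property index under the same assumptions (Couchbase Lite 2.0 Swift API as released, hypothetical function name). The index name “ContentAndNameFTSIndex” matches the one referenced later in this post:

```swift
import CouchbaseLite

// Creates one full-text index that covers both "content" and "name".
func createContentAndNameIndex(on db: Database) throws {
    let index = IndexBuilder.fullTextIndex(items:
        FullTextIndexItem.property("content"),
        FullTextIndexItem.property("name"))
    try db.createIndex(index, withName: "ContentAndNameFTSIndex")
}
```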

Index without stemming

The following example creates a fullTextIndex on the “content” property of a Document with stemming disabled. Stemming is enabled by default using the current device language settings. Setting the language to nil disables stemming.
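
A sketch of the same index with stemming turned off, assuming the language(_:) option of FullTextIndex in the Couchbase Lite 2.0 Swift API as released. The function name is hypothetical; the index name “ContentFTSIndexNoStemming” is the one used later in this post:

```swift
import CouchbaseLite

// Creates a full-text index on "content" with stemming disabled.
func createContentIndexWithoutStemming(on db: Database) throws {
    let index = IndexBuilder
        .fullTextIndex(items: FullTextIndexItem.property("content"))
        .language(nil) // nil language: no stemmer is applied
    try db.createIndex(index, withName: "ContentFTSIndexNoStemming")
}
```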

FTS Search with Stemming

The query below fetches the id and content properties of "landmark" type documents containing the term “Mechanical” in the “content” property. We use the “ContentFTSIndex” that was created earlier.

Request
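
A minimal sketch of such a query, assuming the QueryBuilder and FullTextExpression types of the Couchbase Lite 2.0 Swift API as released. The function name and the choice to return only the document IDs are illustrative assumptions:

```swift
import CouchbaseLite

// Returns the IDs of "landmark" documents whose "content" matches the term.
// Assumes the "ContentFTSIndex" full-text index already exists on db.
func searchContent(for term: String, in db: Database) throws -> [String] {
    let ftsExpression = FullTextExpression.index("ContentFTSIndex")
    let query = QueryBuilder
        .select(SelectResult.expression(Meta.id),
                SelectResult.property("content"))
        .from(DataSource.database(db))
        // The match expression is an operand of the top-level AND.
        .where(Expression.property("type").equalTo(Expression.string("landmark"))
            .and(ftsExpression.match(term)))

    var ids: [String] = []
    for result in try query.execute() {
        if let id = result.string(forKey: "id") {
            ids.append(id)
        }
    }
    return ids
}
```

Calling searchContent(for: "Mechanical", in: db) would run the search described above. Passing a different index name in place of “ContentFTSIndex” is all that changes for the other indexes in this post.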

Sample Response

The response to the above query will include documents that contain the terms “mechanical”, “mechanism”, “mechanisms”, “mechanic” and so on.

FTS Search without Stemming

The query below fetches the id and content properties of "landmark" type documents containing the exact term “Mechanical” in the “content” property. We use the “ContentFTSIndexNoStemming” that was created earlier, which specified the option to disable stemming.

Request

Sample Response

The response to the above query will include only documents that contain exactly the term “mechanical”. Note again that all searches are case insensitive.

FTS Search on Multiple Properties

The query below fetches the id, name and content properties of "landmark" type documents containing the term “Mechanical” in either the “name” or the “content” property. We use the “ContentAndNameFTSIndex” that was created earlier. This index enabled indexing on the “name” and “content” properties.

Request

Sample Response

The response to the above query will include documents that contain the term “mechanical” (or variants of it derived through stemming) in either the “name” or “content” property.

FTS Search with Logical Expressions

In an earlier example, you saw that by disabling stemming, you can look for the exact search string. But what if you wanted to look for more than one search term? The match query expression accepts logical expressions including AND and OR.

The query below fetches the id and content properties of "landmark" type documents containing the term “Mechanical” or “Mechanism” in the “content” property. We use the “ContentFTSIndexNoStemming” that was created earlier to disable stemming.

Request
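
A sketch of a search with a logical expression, assuming the Couchbase Lite 2.0 Swift API as released. The exact form of the search string, with an uppercase OR between the two terms, is an assumption and should be checked against the release you are using; the function name is hypothetical:

```swift
import CouchbaseLite

// Returns the IDs of "landmark" documents whose "content" contains
// either of the two exact terms (the index has stemming disabled).
func searchMechanicalOrMechanism(in db: Database) throws -> [String] {
    let ftsExpression = FullTextExpression.index("ContentFTSIndexNoStemming")
    let query = QueryBuilder
        .select(SelectResult.expression(Meta.id),
                SelectResult.property("content"))
        .from(DataSource.database(db))
        .where(Expression.property("type").equalTo(Expression.string("landmark"))
            // Assumed syntax: the OR operator lives inside the search string.
            .and(ftsExpression.match("mechanical OR mechanism")))

    var ids: [String] = []
    for result in try query.execute() {
        if let id = result.string(forKey: "id") {
            ids.append(id)
        }
    }
    return ids
}
```

Note that the OR here is part of a single match string, which is different from combining two match expressions with the query builder's or operator.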

Sample Response

The response to the above query will include documents that contain exactly the terms “mechanical” or “mechanism” in the “content” property.

FTS Search with Wildcard Expression

You can use the “*” character in the search string to represent zero or more character matches.

The query below fetches the id and content properties of "landmark" type documents containing the term “walt*” in the “content” property. This will match all search terms that start with “walt” followed by zero or more characters. We use the “ContentFTSIndex” that was created earlier.

NOTE: One could argue that the use of a wildcard in the search term could be a naive way of implementing stemming. But then you may end up with derived forms that do not correspond to the terms derived through stemming. So it is preferable to use stemming if that’s what you need.

Request

Sample Response

The response to the above query will include documents that contain the terms “walt”, “Walter”, “Waltham”, “Walthamstow” and so on.

FTS Search with Stop Words

Stop words are common words in a language. In English, these would be terms like “the”, “is”, “and”, “which” and so on.

Example 1: Search String contains stop words

Couchbase Lite ignores stop words that appear in the search string.

The query below fetches the id and content properties of "landmark" type documents containing the term “on the history” in the “content” property. We use the “ContentFTSIndex” that was created earlier.

Couchbase Lite ignores the stop words “on” and “the”, so the query fetches documents that include the term “history” or derived forms of that stem word.

Request

Sample Response

The response to the above query will include documents that contain the term “history” or derived forms of this word, such as “historical”.

Example 2: Ignoring Stop Words while Searching

By default, Couchbase Lite ignores stop words within the search content.

The query below fetches the id and content properties of "landmark" type documents containing the terms “blue fin yellow fin” in the “content” property. We use the “ContentFTSIndex” that was created earlier.

Couchbase Lite ignores stop words during search, so you would fetch documents that include the terms “blue”, “fin” and “yellow” in that order, separated by any number of stop words.

Request

Sample Response

The response to the above query will include documents that contain the terms “blue”, “fin” and “yellow” separated by any number of stop words, such as “blue fin and yellow fin”.

FTS Search with Ranking

You can use FullTextFunction.rank to specify the rank order of the search results. This is useful for ordering the matches from best to worst.

The query below fetches the id and content properties of "landmark" type documents containing the term “attract” in the “content” property. The documents are sorted in descending order of rank, which means that the document with the maximum number of matches is sorted higher than the rest.

Request
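
A minimal sketch of a ranked search, assuming FullTextFunction.rank and Ordering as found in the Couchbase Lite 2.0 Swift API as released. The function name and the use of “ContentFTSIndex” for both matching and ranking are illustrative assumptions:

```swift
import CouchbaseLite

// Returns the IDs of "landmark" documents matching "attract",
// ordered from the highest rank to the lowest.
func searchAttractRanked(in db: Database) throws -> [String] {
    let indexName = "ContentFTSIndex"
    let ftsExpression = FullTextExpression.index(indexName)
    let query = QueryBuilder
        .select(SelectResult.expression(Meta.id),
                SelectResult.property("content"))
        .from(DataSource.database(db))
        .where(Expression.property("type").equalTo(Expression.string("landmark"))
            .and(ftsExpression.match("attract")))
        // Rank is computed against the same index used for matching.
        .orderBy(Ordering.expression(FullTextFunction.rank(indexName)).descending())

    var ids: [String] = []
    for result in try query.execute() {
        if let id = result.string(forKey: "id") {
            ids.append(id)
        }
    }
    return ids
}
```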

Sample Response

The response to the above query will include documents that include the term “attract” or derived versions of it. Documents with the maximum number of matches are sorted higher.

Limitations

While the FTS capability in Couchbase Lite 2.0 is extremely powerful and would suffice for use cases typical of an embedded database, there are a few limitations:

  • Match Expression
    A match expression can appear only at the top level of the query's WHERE clause or as an operand of a top-level AND expression. This means that the following expression is not allowed: ftsExpression.match("attract").or(ftsExpression2.match("museum"))
  • Custom Language Tokenizers
    The list of supported languages was specified earlier. At the time of writing this post, you cannot plug in a custom tokenizer in order to extend support to other languages.
  • Fuzzy Search Support
    You cannot specify a “fuzziness” factor on the query that would allow less relevant matches to be considered.
  • Facets
    There is no support for faceted search.

Bear in mind that Couchbase Lite is an embedded database. So one could argue that its FTS capabilities do not have to be as extensive as those of a server-side database implementation. Support for these features will be evaluated in future releases.

What's Next

This blog post looked at how you can leverage the Full Text Search (FTS) capabilities in the new Query API in Couchbase Mobile 2.0. This is a start. Expect to see more functionality in future releases. You can download the latest release from our downloads page.

Here are some other posts related to Couchbase Mobile Query that may be of interest:
- This blog post discusses the fundamentals
- This blog post discusses how to query array collections
- This blog post discusses how to do JOIN queries

If you have questions or feedback, please leave a comment below or feel free to reach out to me on Twitter @rajagp or email me at priya.rajagopal@couchbase.com. The Couchbase Forums are another good place to reach out with questions.

 

Author

Posted by Priya Rajagopal, Senior Director, Product Management

Priya Rajagopal is a Senior Director of Product Management at Couchbase, responsible for developer platforms for the cloud and the edge. She has been developing software professionally for over 20 years in several technical and product leadership positions, with over 10 years focused on mobile technologies. As a TISPAN IPTV standards delegate, she was a key contributor to the IPTV standards specifications. She holds 22 patents in the areas of networking and platform security.
