{"id":5816,"date":"2018-09-12T08:10:08","date_gmt":"2018-09-12T15:10:08","guid":{"rendered":"https:\/\/www.couchbase.com\/blog\/?p=5816"},"modified":"2025-06-13T20:59:13","modified_gmt":"2025-06-14T03:59:13","slug":"how-analyzers-tokenizers-filters-work-fts-part-2","status":"publish","type":"post","link":"https:\/\/www.couchbase.com\/blog\/pt\/how-analyzers-tokenizers-filters-work-fts-part-2\/","title":{"rendered":"Building a Shazam-like app to understand how Tokenizers and Filters work | FTS Part 2"},"content":{"rendered":"<p>In the previous blog post, we talked about <a href=\"https:\/\/www.couchbase.com\/blog\/pt\/why-you-should-avoid-like-deep-dive-on-fts-part-1\/\">why full-text search is a better solution at scale for implementing a well-designed search in your application<\/a>. In this second part, we take a deep dive into the inverted index and explore how analyzers, tokenizers, and filters can shape the results of your searches.<\/p>\n<p>It does not matter whether you are indexing and searching logs, genes in DNA, your own data structure or, of course, natural language. They all work in basically the same way.<\/p>\n<p>To show how you can use FTS even when you have your own custom structure, let's take advantage of the fact that Apple has finally bought Shazam and build an imaginary Shazam-like application. However, instead of listening to a short fragment of music as Shazam does, we will ask the user to whistle.<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Wait... why do I need Full-text Search for that?<\/strong><\/h2>\n<p>Because the user may whistle some parts of the song incorrectly, we need to break the song into \"small melody blocks\" and try to match them against our library. 
Assuming our library holds thousands or even millions of songs (the Apple and Spotify libraries each have more than 30 million), a simple LIKE \"%melody%\" has no chance of returning results in a reasonable amount of time.<\/p>\n<p>An inverted index looks like the right tool for the job, because it lets us easily find every song that contains a given melody block. If you are not yet familiar with the concept, check out <a href=\"https:\/\/www.couchbase.com\/blog\/pt\/why-you-should-avoid-like-deep-dive-on-fts-part-1\/\">my previous blog post<\/a> on the subject.<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>The Parsons Code<\/strong><\/h2>\n<p>The first thing we need to do is convert our music library to text. We can do that with the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Parsons_code\">Parsons code<\/a>, a notation that identifies a piece of music by the up and down movements of its <u><a href=\"https:\/\/en.wikipedia.org\/wiki\/Pitch_(music)\">pitch<\/a><\/u>:<\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li>* = first tone, used as the reference,<\/li>\n<li>u = \"up\", for when the note is higher than the previous note,<\/li>\n<li>d = \"down\", for when the note is lower than the previous note,<\/li>\n<li>r = \"repeat\", for when the note has the same pitch as the previous note.<\/li>\n<\/ul>\n<p>Using the Parsons code, a song like \"<u><a href=\"https:\/\/en.wikipedia.org\/wiki\/Twinkle_Twinkle_Little_Star\">Twinkle Twinkle Little Star<\/a><\/u>\" is converted to <strong>*rururddrdrdrdurdrdrdurdrdrddrururddrdrdrd<\/strong>.<\/p>\n<p>Here is the full song:<\/p>\n<!--[if lt IE 
9]><script>document.createElement('audio');<\/script><![endif]-->\n<audio class=\"wp-audio-shortcode\" id=\"audio-5816-1\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/mpeg\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2018\/09\/Twinkle_Twinkle_Little_Star_plain.mp3?_=1\" \/><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2018\/09\/Twinkle_Twinkle_Little_Star_plain.mp3\">https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2018\/09\/Twinkle_Twinkle_Little_Star_plain.mp3<\/a><\/audio>\n<p>and here is its visualization using the Parsons code:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-5819\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2018\/09\/Screen-Shot-2018-09-12-at-4.34.00-PM.png\" alt=\"\" width=\"640\" height=\"374\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-4.34.00-PM.png 640w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-4.34.00-PM-300x175.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-4.34.00-PM-20x12.png 20w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/p>\n<h2><strong>Analyzers<\/strong><\/h2>\n<p>To build our inverted index, we first need to prepare our text: split it into smaller parts, convert it to lowercase, remove irrelevant words, and so on. The preparation\/analysis phase usually runs both during <a href=\"https:\/\/docs.couchbase.com\/server\/current\/n1ql\/n1ql-language-reference\/createindex.html\">index creation<\/a> and before a query is executed. 
That way, we can guarantee that both the target text and the term being matched have gone through exactly the same transformations.<\/p>\n<p>The code responsible for this transformation is called an analyzer and, broadly speaking, an analyzer is built from two main kinds of components: tokenizers and filters.<\/p>\n<p>&nbsp;<\/p>\n<h3><strong>Tokenizers<\/strong><\/h3>\n<p>When we are dealing with natural languages, the default tokenizer splits a text into words. The tokenization strategy changes slightly from language to language, since characters other than whitespace must also be considered, as in \"l'amour\" in French or \"I'm\" in English.<\/p>\n<p>In Couchbase FTS, the default tokenizer works out of the box most of the time, but we also provide tokenizers for <a href=\"https:\/\/docs.couchbase.com\/server\/5.5\/fts\/fts-using-analyzers.html\">HTML and a few other data structures<\/a>. So it is always worth checking that you are using the most suitable one.<\/p>\n<p>Ideally, for our Shazam-like app we would create a custom n-gram tokenizer, but to keep things simple let's try to take advantage of the default one. To do that, we need to alter the Parsons code slightly by inserting a whitespace after every 5 letters. 
The reasoning is that if the user manages to whistle at least 5 notes in a row correctly, I will treat that as a \"melody block\" and try to match it against our inverted index.<\/p>\n<p>That way, our \"<u><a href=\"https:\/\/en.wikipedia.org\/wiki\/Twinkle_Twinkle_Little_Star\">Twinkle Twinkle Little Star<\/a><\/u>\" will be stored as <strong>*rurur ddrdr drdur drdrd urdrd rddru rurdd rdrdr d<\/strong>.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<h3><strong>Filters<\/strong><\/h3>\n<p>&nbsp;<\/p>\n<p>Couchbase FTS also ships with <a href=\"https:\/\/docs.couchbase.com\/server\/5.5\/fts\/fts-using-analyzers.html\">a variety of filters<\/a>; the three most popular are probably <strong>to_lower<\/strong>, <strong>stop_tokens<\/strong>, and <strong>stemmer<\/strong>:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-5820\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2018\/09\/Screen-Shot-2018-09-12-at-4.44.25-PM.png\" alt=\"\" width=\"393\" height=\"687\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-4.44.25-PM.png 533w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-4.44.25-PM-172x300.png 172w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-4.44.25-PM-300x524.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-4.44.25-PM-11x20.png 11w\" sizes=\"auto, (max-width: 393px) 100vw, 393px\" \/><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li><strong>to_lower<\/strong>: Converts all characters to lowercase. 
For example, HTML becomes html.<\/li>\n<li><strong>stop_tokens<\/strong>: Removes from the stream tokens considered unnecessary for a full-text search: for example, and, is, and the.<\/li>\n<li><strong>Stemmer<\/strong>: Uses <a href=\"https:\/\/snowball.tartarus.org\/\">libstemmer<\/a> to reduce tokens to word stems. For example, words like <em>fishing<\/em>, <em>fished<\/em>, and <em>fisher<\/em> are reduced to <em>fish<\/em>.<\/li>\n<\/ul>\n<p>Ideally, you will have several indexes over the same data, each using a composition of filters focused on highlighting a specific characteristic. We will talk more about this in upcoming articles.<\/p>\n<p>For our Shazam-like app, filters may not be necessary, but if we want to improve our results we could also add some kind of custom <strong>stop_tokens<\/strong> filter or a custom character filter.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-5821\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2018\/09\/Screen-Shot-2018-09-12-at-4.46.34-PM.png\" alt=\"\" width=\"473\" height=\"385\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-4.46.34-PM.png 531w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-4.46.34-PM-300x244.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-4.46.34-PM-20x16.png 20w\" sizes=\"auto, (max-width: 473px) 100vw, 473px\" \/><\/p>\n<p>For example, in many pop songs the singer may hold an \"<strong>Ahhhhhhh<\/strong>\" or \"<strong>Ohhhhhh<\/strong>\" for a few seconds. 
In the Parsons code, that is translated into a run of <strong>r<\/strong> (\"repeat\", for when the note has the same pitch as the previous note). Therefore, our custom stop_tokens or character filter could remove any run of ten or twenty consecutive \"<strong>r<\/strong>\" characters.<\/p>\n<p><strong>Ex: <\/strong><strong>*rururddrdrdrdurdrdrdurdrdrddrururddrdrdrdrrrrrrrrrr<\/strong> becomes <strong>*rururddrdrdrdurdrdrdurdrdrddrururddrdrdrd<\/strong><\/p>\n<p>That way, the song is identified by its main melody rather than by a sequence of repeated notes, which could produce wrong results.<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Querying the data<\/strong><\/h2>\n<p>Now that our music library is properly indexed, all we need to do is record the user's whistle, convert it to Parsons code and, finally, query the database. FTS will automatically transform our query term using the same tokenizers and filters that we used to index the data.<\/p>\n<p>For now, let's assume the query simply returns results ordered by the total number of matches.<\/p>\n<p><strong>Ex:<\/strong><\/p>\n<p>A query such as <strong>rurur ddrdr<\/strong> will potentially return \"<u><a href=\"https:\/\/en.wikipedia.org\/wiki\/Twinkle_Twinkle_Little_Star\">Twinkle Twinkle Little Star<\/a><\/u>\", since those two blocks occur 4 times in it:<\/p>\n<p>*<span style=\"color: #0000ff\"><strong>rurur<\/strong><\/span><span style=\"color: #ff0000\"><strong>ddrdr<\/strong><\/span><strong>drdurdrdrdurdrdrdd<span style=\"color: #ff0000\"><span style=\"color: #0000ff\">rurur<\/span>ddrdr<\/span>drd<\/strong><\/p>\n<p><strong>\u00a0<\/strong><strong>\u00a0<\/strong><\/p>\n
<h2><strong>Where is the demo?<\/strong><\/h2>\n<p>We are going to build a different kind of app during this blog series, but if you are interested in trying a real application that implements something similar to what I described here, check out <a href=\"https:\/\/beta.midomi.com\/\">Midomi<\/a>.<\/p>\n<p><a href=\"https:\/\/beta.midomi.com\/\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-5822\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2018\/09\/Screen-Shot-2018-09-12-at-5.06.36-PM.png\" alt=\"\" width=\"651\" height=\"221\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-5.06.36-PM.png 651w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-5.06.36-PM-300x102.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Screen-Shot-2018-09-12-at-5.06.36-PM-20x7.png 20w\" sizes=\"auto, (max-width: 651px) 100vw, 651px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Conclusion<\/strong><\/h2>\n<p>The goal of this article was to show the importance of tokenizers and filters even when we are dealing with other kinds of structures. I strongly recommend reading the official documentation on them to understand the best use case for each one.<\/p>\n<p>If you already have a good understanding of FTS, you may have noticed some potential problems with our Shazam-like app: since the user usually does not start whistling from the beginning of the song, we may tokenize the whistle from a different point than the one at which we tokenized the original song. 
Because we group the song into tokens of 5 notes, the chance of tokenizing both the song and the query term at the same offset is only 1 in 5.<\/p>\n<p><strong>Ex:<\/strong><\/p>\n<p>\"<u><a href=\"https:\/\/en.wikipedia.org\/wiki\/Twinkle_Twinkle_Little_Star\">Twinkle Twinkle Little Star<\/a><\/u>\": <strong>rururddrdrdrdurdrdrdurdrdrddrururddrdrdrd<\/strong><\/p>\n<p>Tokenized \"<u><a href=\"https:\/\/en.wikipedia.org\/wiki\/Twinkle_Twinkle_Little_Star\">Twinkle Twinkle Little Star<\/a><\/u>\"<strong>: rurur ddrdr drdur drdrd urdrd rddru rurdd rdrdr d<\/strong><\/p>\n<p>User's whistle: <strong>rdrdrdurdrdrdurdrd<\/strong> (a random part in the middle of the song)<\/p>\n<p>Tokenized user's whistle: <strong>rdrdr durdr drdur drd<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>In the example above, we got 2 matches (<strong>rdrdr<\/strong> and <strong>drdur<\/strong>) by chance, but because they are out of order, the score of this song will be seriously compromised, which can lead to unexpected results.<\/p>\n<p>&nbsp;<\/p>\n<h4><strong>Full-text search series<\/strong><\/h4>\n<ul>\n<li><a href=\"https:\/\/www.couchbase.com\/blog\/pt\/why-you-should-avoid-like-deep-dive-on-fts-part-1\/\">Why you should avoid LIKE %<\/a> - Part 1<\/li>\n<li><a href=\"https:\/\/www.couchbase.com\/blog\/pt\/fuzzy-matching\/\">Fuzzy matching<\/a> - Part 3<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>We will see how to solve this problem, and a few others, in the upcoming articles of this series. 
In the meantime, if you have any questions, tweet me at <a href=\"https:\/\/twitter.com\/deniswsrosa\">@deniswsrosa<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In the previous blog post, we talked about why full-text search is a better solution at scale to implement a well-designed search in your application. In this second part, we are going to deep-dive on the Inverted Index and explore [&hellip;]<\/p>","protected":false},"author":8754,"featured_media":5817,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[2165],"tags":[],"ppma_author":[9059],"class_list":["post-5816","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-full-text-search"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.1 (Yoast SEO v26.1.1) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Explore how analyzers, tokenizers, and filters works<\/title>\n<meta name=\"description\" content=\"This post focuses on the Inverted Index and also explore how analyzers, tokenizers, and filters might shape the result of your searches.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.couchbase.com\/blog\/pt\/how-analyzers-tokenizers-filters-work-fts-part-2\/\" \/>\n<meta property=\"og:locale\" content=\"pt_BR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building a Shazam-like app to understand how Tokenizers and Filters work | FTS Part 2\" \/>\n<meta property=\"og:description\" content=\"This post focuses on the Inverted Index and also explore how analyzers, tokenizers, and filters might shape the result of your searches.\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/www.couchbase.com\/blog\/pt\/how-analyzers-tokenizers-filters-work-fts-part-2\/\" \/>\n<meta property=\"og:site_name\" content=\"The Couchbase Blog\" \/>\n<meta property=\"article:published_time\" content=\"2018-09-12T15:10:08+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-14T03:59:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Couchbase-FTS-Part2.png\" \/>\n\t<meta property=\"og:image:width\" content=\"728\" \/>\n\t<meta property=\"og:image:height\" content=\"210\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Denis Rosa, Developer Advocate, Couchbase\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@deniswsrosa\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Denis Rosa, Developer Advocate, Couchbase\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutos\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/\"},\"author\":{\"name\":\"Denis Rosa, Developer Advocate, Couchbase\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/fe3c5273e805e72a5294611a48f62257\"},\"headline\":\"Building a Shazam-like app to understand how Tokenizers and Filters work | FTS Part 2\",\"datePublished\":\"2018-09-12T15:10:08+00:00\",\"dateModified\":\"2025-06-14T03:59:13+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/\"},\"wordCount\":1324,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Couchbase-FTS-Part2.png\",\"articleSection\":[\"Full-Text Search\"],\"inLanguage\":\"pt-BR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/\",\"name\":\"Explore how analyzers, tokenizers, and filters 
works\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Couchbase-FTS-Part2.png\",\"datePublished\":\"2018-09-12T15:10:08+00:00\",\"dateModified\":\"2025-06-14T03:59:13+00:00\",\"description\":\"This post focuses on the Inverted Index and also explore how analyzers, tokenizers, and filters might shape the result of your searches.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#breadcrumb\"},\"inLanguage\":\"pt-BR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-BR\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#primaryimage\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Couchbase-FTS-Part2.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Couchbase-FTS-Part2.png\",\"width\":728,\"height\":210},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.couchbase.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building a Shazam-like app to understand how Tokenizers and Filters work | FTS Part 2\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"name\":\"The Couchbase 
Blog\",\"description\":\"Couchbase, the NoSQL Database\",\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"pt-BR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\",\"name\":\"The Couchbase Blog\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-BR\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"width\":218,\"height\":34,\"caption\":\"The Couchbase Blog\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/fe3c5273e805e72a5294611a48f62257\",\"name\":\"Denis Rosa, Developer Advocate, Couchbase\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-BR\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/be0716f6199cfb09417c92cf7a8fa8d6\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f8d1f5c13115122cab89d0f229b904480bfe20d3dfbb093fe9734cda5235d419?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f8d1f5c13115122cab89d0f229b904480bfe20d3dfbb093fe9734cda5235d419?s=96&d=mm&r=g\",\"caption\":\"Denis Rosa, Developer Advocate, Couchbase\"},\"description\":\"Denis Rosa is a Developer Advocate for Couchbase and lives in Munich - Germany. He has a solid experience as a software engineer and speaks fluently Java, Python, Scala and Javascript. 
Denis likes to write about search, Big Data, AI, Microservices and everything else that would help developers to make a beautiful, faster, stable and scalable app.\",\"sameAs\":[\"https:\/\/x.com\/deniswsrosa\"],\"url\":\"https:\/\/www.couchbase.com\/blog\/pt\/author\/denis-rosa\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Explore how analyzers, tokenizers, and filters works","description":"This post focuses on the Inverted Index and also explore how analyzers, tokenizers, and filters might shape the result of your searches.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.couchbase.com\/blog\/pt\/how-analyzers-tokenizers-filters-work-fts-part-2\/","og_locale":"pt_BR","og_type":"article","og_title":"Building a Shazam-like app to understand how Tokenizers and Filters work | FTS Part 2","og_description":"This post focuses on the Inverted Index and also explore how analyzers, tokenizers, and filters might shape the result of your searches.","og_url":"https:\/\/www.couchbase.com\/blog\/pt\/how-analyzers-tokenizers-filters-work-fts-part-2\/","og_site_name":"The Couchbase Blog","article_published_time":"2018-09-12T15:10:08+00:00","article_modified_time":"2025-06-14T03:59:13+00:00","og_image":[{"width":728,"height":210,"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Couchbase-FTS-Part2.png","type":"image\/png"}],"author":"Denis Rosa, Developer Advocate, Couchbase","twitter_card":"summary_large_image","twitter_creator":"@deniswsrosa","twitter_misc":{"Written by":"Denis Rosa, Developer Advocate, Couchbase","Est. 
reading time":"7 minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#article","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/"},"author":{"name":"Denis Rosa, Developer Advocate, Couchbase","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/fe3c5273e805e72a5294611a48f62257"},"headline":"Building a Shazam-like app to understand how Tokenizers and Filters work | FTS Part 2","datePublished":"2018-09-12T15:10:08+00:00","dateModified":"2025-06-14T03:59:13+00:00","mainEntityOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/"},"wordCount":1324,"commentCount":0,"publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Couchbase-FTS-Part2.png","articleSection":["Full-Text Search"],"inLanguage":"pt-BR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/","url":"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/","name":"Explore how analyzers, tokenizers, and filters 
works","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#primaryimage"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Couchbase-FTS-Part2.png","datePublished":"2018-09-12T15:10:08+00:00","dateModified":"2025-06-14T03:59:13+00:00","description":"This post focuses on the Inverted Index and also explore how analyzers, tokenizers, and filters might shape the result of your searches.","breadcrumb":{"@id":"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#breadcrumb"},"inLanguage":"pt-BR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/"]}]},{"@type":"ImageObject","inLanguage":"pt-BR","@id":"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#primaryimage","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Couchbase-FTS-Part2.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2018\/09\/Couchbase-FTS-Part2.png","width":728,"height":210},{"@type":"BreadcrumbList","@id":"https:\/\/www.couchbase.com\/blog\/how-analyzers-tokenizers-filters-work-fts-part-2\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.couchbase.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Building a Shazam-like app to understand how Tokenizers and Filters work | FTS Part 2"}]},{"@type":"WebSite","@id":"https:\/\/www.couchbase.com\/blog\/#website","url":"https:\/\/www.couchbase.com\/blog\/","name":"Blog do Couchbase","description":"Couchbase, o banco de dados 
NoSQL","publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"pt-BR"},{"@type":"Organization","@id":"https:\/\/www.couchbase.com\/blog\/#organization","name":"Blog do Couchbase","url":"https:\/\/www.couchbase.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"pt-BR","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","width":218,"height":34,"caption":"The Couchbase Blog"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/fe3c5273e805e72a5294611a48f62257","name":"Denis Rosa, defensor dos desenvolvedores, Couchbase","image":{"@type":"ImageObject","inLanguage":"pt-BR","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/be0716f6199cfb09417c92cf7a8fa8d6","url":"https:\/\/secure.gravatar.com\/avatar\/f8d1f5c13115122cab89d0f229b904480bfe20d3dfbb093fe9734cda5235d419?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f8d1f5c13115122cab89d0f229b904480bfe20d3dfbb093fe9734cda5235d419?s=96&d=mm&r=g","caption":"Denis Rosa, Developer Advocate, Couchbase"},"description":"Denis Rosa \u00e9 um Developer Advocate do Couchbase e mora em Munique, na Alemanha. Ele tem uma s\u00f3lida experi\u00eancia como engenheiro de software e fala fluentemente Java, Python, Scala e Javascript. 
Denis gosta de escrever sobre pesquisa, Big Data, IA, microsservi\u00e7os e tudo o mais que possa ajudar os desenvolvedores a criar um aplicativo bonito, mais r\u00e1pido, est\u00e1vel e escal\u00e1vel.","sameAs":["https:\/\/x.com\/deniswsrosa"],"url":"https:\/\/www.couchbase.com\/blog\/pt\/author\/denis-rosa\/"}]}},"authors":[{"term_id":9059,"user_id":8754,"is_guest":0,"slug":"denis-rosa","display_name":"Denis Rosa, Developer Advocate, Couchbase","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/f8d1f5c13115122cab89d0f229b904480bfe20d3dfbb093fe9734cda5235d419?s=96&d=mm&r=g","author_category":"","last_name":"Rosa, Developer Advocate, Couchbase","first_name":"Denis","job_title":"","user_url":"","description":"Denis Rosa \u00e9 um Developer Advocate do Couchbase e mora em Munique, na Alemanha. Ele tem uma s\u00f3lida experi\u00eancia como engenheiro de software e fala fluentemente Java, Python, Scala e Javascript. Denis gosta de escrever sobre pesquisa, Big Data, IA, microsservi\u00e7os e tudo o mais que possa ajudar os desenvolvedores a criar um aplicativo bonito, mais r\u00e1pido, est\u00e1vel e 
escal\u00e1vel."}],"_links":{"self":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/posts\/5816","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/users\/8754"}],"replies":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/comments?post=5816"}],"version-history":[{"count":0,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/posts\/5816\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/media\/5817"}],"wp:attachment":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/media?parent=5816"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/categories?post=5816"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/tags?post=5816"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/ppma_author?post=5816"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}