Previously, we showed how to use Couchbase's RAG capabilities through a Python application that lets the user "chat with your PDF" or with X. It is simple to build, but can we build it even more simply? I have been playing with Couchbase Shell a lot recently, and it should let me do something similar.
Set up a scope and a collection
I assume you are already familiar with Couchbase Shell (cbsh) and have a cluster and a model configured.
Create and select a scope and a collection, then create a primary index:
> scopes create pdf
> cb-env scope pdf
> collections create pdf
> cb-env collection pdf
> query "CREATE PRIMARY INDEX ON `default`:`cbsh`.`pdf`.`pdf`"
Turn a PDF into chunked text
There are a variety of tools that convert a PDF to text. On most Linux distributions, you should find pdftotext.
> pdftotext ~/monopolyInstruction.pdf
This creates a text version of the file at the same path, but with a .txt extension.
With Nushell (cbsh is based on Nushell), it is easy to split the text thanks to the split command. The tricky part is finding the right delimiter to split the file on. Fortunately, it supports multiline strings, so I copied and pasted the text found in the file between two paragraphs. You should, however, be able to do something more sophisticated using regex. That is the difference between blog material and production 😇.
> open ~/monopolyInstruction.txt |split row "
:::
::: "|wrap text
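For comparison with the Python demo, here is a rough sketch of the regex-based splitting mentioned above. It assumes the pasted delimiter amounts to a blank line between paragraphs (the `:::` markers look like Nushell's multiline continuation prompt); `split_paragraphs` is a hypothetical helper, not something from the original post.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split text wherever one or more blank lines separate paragraphs."""
    # A newline, optional whitespace, then another newline marks a paragraph break.
    chunks = re.split(r"\n\s*\n", text)
    # Trim each chunk and drop the ones that end up empty.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    sample = "MONOPOLY rules\n\nGo to jail.\n \n\nPay $50 to get out.\n"
    for paragraph in split_paragraphs(sample):
        print(repr(paragraph))
```

A blank-line split is only one possible chunking strategy; real documents may need a different delimiter or size-based chunks.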
The split pipeline gives you a table of text strings. To import it into Couchbase, we wrap each string in a text field, inside a content JSON object, add a randomly generated UUID, and upsert the result.
> open ~/monopolyInstruction.txt |split row "
:::
::: "|wrap text |wrap content | each { insert id { random uuid } } | doc upsert
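To make the row shape concrete, here is a small Python sketch of what I understand the `wrap text | wrap content | insert id` steps to produce before `doc upsert`: one row per chunk, with a random `id` and a `content` object holding the `text` field. `to_documents` is a hypothetical name used only for illustration.

```python
import json
import uuid


def to_documents(chunks: list[str]) -> list[dict]:
    """Wrap each chunk as {"id": <random uuid>, "content": {"text": <chunk>}}."""
    return [
        {"id": str(uuid.uuid4()), "content": {"text": chunk}}
        for chunk in chunks
    ]


if __name__ == "__main__":
    docs = to_documents(["Go to jail.", "Pay $50 to get out."])
    # Print the first row to show its structure; the id differs on every run.
    print(json.dumps(docs[0], indent=2))
```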
The next step is to create embeddings, or vector representations of the text:
> query "SELECT meta().id as id, p.* from pdf as p" | wrap content| vector enrich-doc text | doc upsert
Next, create the vector search index. Here it is named pdf, indexes the textVector field, expects vectors of 1536 dimensions, and uses l2_norm as the similarity algorithm, since that is the default.
> vector create-index pdf textVector 1536
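As a rough illustration of what l2_norm similarity means, the sketch below ranks a few toy vectors by Euclidean distance to a query vector, where a smaller distance means a closer match. The 3-dimensional vectors, the document names, and the brute-force `nearest` helper are all made up for the example; the real index works on 1536-dimension embeddings and does not necessarily search this way.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k vectors closest to the query."""
    ranked = sorted(vectors.items(), key=lambda item: l2_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


if __name__ == "__main__":
    toy_vectors = {
        "jail": [0.9, 0.1, 0.0],
        "rent": [0.1, 0.8, 0.2],
        "bank": [0.0, 0.2, 0.9],
    }
    print(nearest([1.0, 0.0, 0.0], toy_vectors, k=2))  # ['jail', 'rent']
```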
I imported the Monopoly rules and am asking how to get out of jail. In the original example, we had one answer with context and one without. First, without context:
> ask "how to get out of jail"
If you or someone you know is in jail and needs to be released, here are some general steps to take:

Contact a lawyer: If you have legal representation or know of a lawyer who can help with your case, reach out to them for assistance in navigating the legal process.

Obtain a bail bond: In many cases, individuals can be released from jail by posting bail. This requires paying a set amount of money to the court, which is typically returned once the individual attends all required court dates.

Attend court hearings: It's important to comply with all court requirements, including attending scheduled hearings and following any conditions of release set by the court.

Seek support: Consider reaching out to family members, friends, or local organizations that may be able to provide assistance or guidance during this challenging time.

Please keep in mind that the process of getting released from jail can vary depending on the specific circumstances of the case and the jurisdiction. It's always best to consult with legal professionals for personalized advice and assistance.
And with context:
> let question = "how to get out of jail"; vector enrich-text $question | vector search pdf textVector | select id |doc get| select content.text | ask $question
Embedding batch 1/1
You can get out of jail by following these methods:

**Roll Doubles:** If you roll a double with the white dice on any of your next three turns, you can immediately move out of Jail. You then move the number of spaces shown by your doubles roll.

**"Get Out of Jail Free" Card:** If you have a "Get Out of Jail Free" card, you can use it to get out of Jail without rolling doubles. This card can be obtained by purchasing it from another player or drawing it from the Chance or Community Chest cards.

**Pay Fine:** You can also choose to pay a fine of $50 before you roll the dice on either of your next two turns. After paying the fine, you are free to move and continue playing.

Remember, if you do not roll doubles by your third turn or use a "Get Out of Jail Free" card, you must pay the $50 fine to get out of Jail.
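Conceptually, the difference between the two answers is that the retrieved chunks travel with the question. The post does not show how `ask` formats the piped-in context, so the Python sketch below is only an assumption about the general idea; `build_prompt` and its wording are hypothetical.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved text chunks and the user question into one prompt."""
    # Separate the chunks with blank lines so they stay readable.
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    retrieved = [
        "Roll doubles on any of your next three turns.",
        "Pay a fine of $50 before you roll the dice.",
    ]
    print(build_prompt("how to get out of jail", retrieved))
```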
Let's simplify this by putting everything in a script. This is the content of ragScript.nu:
# Create the pdf scope and collection, the primary index, and the vector index.
def initRAGPipeline [] {
    scopes create pdf
    cb-env scope pdf
    collections create pdf
    cb-env collection pdf
    query "CREATE PRIMARY INDEX ON `default`:`cbsh`.`pdf`.`pdf`"
    vector create-index pdf textVector 1536
}

# Store the incoming text chunks as documents, then add embeddings to them.
def storeRAGDoc [] {
    wrap text |wrap content | each { insert id { random uuid } } | doc upsert
    query "SELECT meta().id as id, p.* from `pdf` as p" | wrap content| vector enrich-doc text | doc upsert
}

# Ask the same question without and with the retrieved context.
def myAsk [$question: string] {
    let norag = ask $question
    let rag = vector enrich-text $question | vector search pdf textVector | select id |doc get| select content.text | ask $question
    {"rag":$rag, "norag":$norag}
}
You can create the script file, source it, and then call these functions:
> source ./ragDemo/ragScript.nu
> initRAGPipeline
> open monopolyInstruction.txt |split row "
:::
::: "| storeRAGDoc
> myAsk "how to get out of jail"
Embedding batch 1/1

rag:
  Here are the ways to get out of jail in the game of Monopoly:

  1. **Roll Doubles:** The most common way to get out of jail is by rolling doubles on your turn. If you roll doubles with the regular white dice on any of your next three turns after being sent to jail, you can immediately move your token out of jail and advance the corresponding number of spaces. Remember that you can only use the white dice for this purpose.

  2. **Using "Get Out of Jail Free" Card:** If you have a "Get Out of Jail Free" card, you can use it to get out of jail without rolling doubles. Simply present the card to the Banker to get out of jail for free. The card is then returned to the bottom of the deck.

  3. **Purchase the Card:** If another player has a "Get Out of Jail Free" card and is willing to sell it, you can purchase the card from them at a mutually agreed-upon price. This allows you to get out of jail even if you don't have the card yourself.

  4. **Pay the Fine:** If you do not roll doubles within three turns or do not have a "Get Out of Jail Free" card, you must pay a fine of $50 to the Bank before you roll the dice on either of your next two turns. Once you pay the fine, you are immediately released from jail and can move your token as per the dice roll.

  These are the four main ways to get out of jail in Monopoly.

norag:
  If you or someone you know is currently in jail and looking to get released, here are some general steps to consider:

  1. Contact a lawyer: A criminal defense attorney can provide guidance on legal options and help navigate the legal process for release.

  2. Attend court hearings: It is important to attend all court hearings and follow any conditions set by the court to demonstrate cooperation with the legal system.

  3. Consider bail: If bail is an option, you may be able to pay a set amount to be released from jail pending trial. If you cannot afford the bail amount, you may seek assistance from a bail bond agent.

  4. Seek alternative options: Depending on the circumstances of your case, there may be alternative options for release such as pretrial services, diversion programs, or supervised release.

  5. Follow legal advice: It is crucial to follow the advice of your legal counsel and comply with all legal requirements to increase the chances of a successful release.

  It's important to note that the process of getting out of jail can vary depending on the specific circumstances of the case and the laws in your jurisdiction. For personalized guidance, it's recommended to speak with a lawyer or legal professional specializing in criminal law.
Here you can see the same kind of result we got in the Python RAG demo, but this time using Couchbase Shell. It should be easier to manipulate, change, or extend, because you do not need to deploy an application or know Python. However, it will be less flexible than what you can achieve with Python and LangChain.
If this interests you, stay tuned: more content about AI and Couchbase Shell is on the way!
- Learn more about Couchbase Shell
- Learn more about Couchbase Vector Search capabilities