Previously, we showed how to use Couchbase's RAG capabilities through a Python application that lets the user 'chat' with their PDF or with X. It is simple to build, but can we make it even simpler? I have been playing a lot with Couchbase Shell recently, and it should let me do something similar.
Set up a scope and a collection
I assume you already know Couchbase Shell (cbsh) and have a cluster and a model configured.
Create and select a scope and a collection, then create a primary index:
    > scopes create pdf
    > cb-env scope pdf
    > collections create pdf
    > cb-env collection pdf
    > query "CREATE PRIMARY INDEX ON `default`:`cbsh`.`pdf`.`pdf`"
Convert a PDF to chunked text
There is a wide variety of tools for converting a PDF to text. On most Linux distributions you will find pdftotext.
    > pdftotext ~/monopolyInstruction.pdf
This creates a text version of the file at the same path, but with a .txt extension.
With Nushell (cbsh is based on Nushell), splitting text is easy thanks to the split command. The hard part is finding the right delimiter to chunk the file. Fortunately, it supports multi-line strings, so I copied and pasted the text found between two paragraphs in the file. You should be able to do something more sophisticated with a regex, though. That is the difference between blog material and production 😇.
    > open ~/monopolyInstruction.txt |split row " ::: ::: "|wrap text
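The regex-based splitting mentioned above could look roughly like the following. This is a minimal Python sketch, not part of the cbsh workflow; Python is used only so it runs without a cluster. The function name chunk_paragraphs, the min_chars threshold, and the sample text are all made up for illustration. It assumes paragraphs in the pdftotext output are separated by blank lines, which may not hold for every PDF.

```python
import re


def chunk_paragraphs(text: str, min_chars: int = 40) -> list[str]:
    """Split text on blank lines, merging short fragments into the next chunk."""
    # One or more blank lines (possibly containing whitespace) end a paragraph.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    buffer = ""
    for paragraph in paragraphs:
        buffer = f"{buffer}\n{paragraph}".strip()
        # Flush only once the buffer is long enough to be a useful chunk,
        # so that headings stay attached to the paragraph that follows them.
        if len(buffer) >= min_chars:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        chunks.append(buffer)  # keep any trailing short fragment
    return chunks


# Made-up sample text, not the actual wording of the rules file.
sample = (
    "JAIL\n\n"
    "You land in Jail when your token lands on the space marked Go to Jail.\n\n\n"
    "FREE PARKING\n\n"
    "A player landing on this place does not receive any money."
)
chunks = chunk_paragraphs(sample)
for chunk in chunks:
    print(repr(chunk))
```

Merging short fragments is one possible design choice: it keeps a heading such as "JAIL" in the same chunk as its paragraph, which tends to make the chunk easier to match against a question.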
This gives you a table of text strings. To import it into Couchbase, we wrap each string in a text field, wrap that in a content JSON object, add a randomly generated UUID as the id, and upsert the result.
    > open ~/monopolyInstruction.txt |split row " ::: ::: "|wrap text |wrap content | each { insert id { random uuid } } | doc upsert
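To make the resulting document shape concrete, here is a small Python sketch of what the wrap and insert steps produce for each chunk. It is only an illustration of the structure and does not talk to Couchbase; the helper name to_documents and the sample chunks are hypothetical.

```python
import json
import uuid


def to_documents(chunks: list[str]) -> list[dict]:
    """Build one record per chunk: the text nested under content, plus a random id."""
    # Mirrors: wrap text | wrap content | each { insert id { random uuid } }
    return [{"content": {"text": chunk}, "id": str(uuid.uuid4())} for chunk in chunks]


docs = to_documents(["first chunk", "second chunk"])
print(json.dumps(docs[0], indent=2))
```

In the pipeline above, doc upsert then consumes the id and content fields of each record.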
The next step is to create embeddings, or vector representations of the text:
    > query "SELECT meta().id as id, p.* from pdf as p" | wrap content| vector enrich-doc text | doc upsert
Next, create the vector search index. Here it is called pdf, indexes the textVector field, expects vectors of 1536 dimensions, and uses l2_norm as the similarity metric, since that is the default.
    > vector create-index pdf textVector 1536
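As a rough intuition for what an l2_norm index ranks by, the sketch below does a brute-force nearest-neighbour lookup using Euclidean distance. It is a toy Python illustration under stated assumptions: the vectors have 3 dimensions instead of 1536, the values and document ids are invented, and a real search index does not scan every vector this way.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k vectors closest to the query, closest first."""
    ranked = sorted(vectors.items(), key=lambda item: l2_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Invented 3-dimensional stand-ins for the stored textVector values.
vectors = {
    "jail": [0.9, 0.1, 0.0],
    "rent": [0.1, 0.8, 0.2],
    "bank": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stand-in for the embedded question
top = nearest(query, vectors)
print(top)
```

A smaller distance means a closer match, so the chunk whose vector sits nearest to the question's vector comes back first.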
I imported the Monopoly rules and asked how to get out of jail. In the original example, we had one answer with context and one without. Here is the one without:
    > ask "how to get out of jail"
    If you or someone you know is in jail and needs to be released, here are some general steps to take:

    Contact a lawyer: If you have legal representation or know of a lawyer who can help with your case, reach out to them for assistance in navigating the legal process.
    Obtain a bail bond: In many cases, individuals can be released from jail by posting bail. This requires paying a set amount of money to the court, which is typically returned once the individual attends all required court dates.
    Attend court hearings: It's important to comply with all court requirements, including attending scheduled hearings and following any conditions of release set by the court.
    Seek support: Consider reaching out to family members, friends, or local organizations that may be able to provide assistance or guidance during this challenging time.

    Please keep in mind that the process of getting released from jail can vary depending on the specific circumstances of the case and the jurisdiction. It's always best to consult with legal professionals for personalized advice and assistance.
And with context:
    > let question = "how to get out of jail"; vector enrich-text $question | vector search pdf textVector | select id |doc get| select content.text | ask $question
    Embedding batch 1/1
    You can get out of jail by following these methods:

    **Roll Doubles:** If you roll a double with the white dice on any of your next three turns, you can immediately move out of Jail. You then move the number of spaces shown by your doubles roll.
    **"Get Out of Jail Free" Card:** If you have a "Get Out of Jail Free" card, you can use it to get out of Jail without rolling doubles. This card can be obtained by purchasing it from another player or drawing it from the Chance or Community Chest cards.
    **Pay Fine:** You can also choose to pay a fine of $50 before you roll the dice on either of your next two turns. After paying the fine, you are free to move and continue playing.

    Remember, if you do not roll doubles by your third turn or use a "Get Out of Jail Free" card, you must pay the $50 fine to get out of Jail.
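The last stage of the pipeline above hands the retrieved chunks to ask along with the question. The source does not show how cbsh combines them, so the following Python sketch only illustrates the general RAG idea with a hypothetical prompt template; build_prompt, the template wording, and the sample chunks are all assumptions.

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the user question into one prompt string."""
    # Drop empty chunks and separate the rest with blank lines.
    context = "\n\n".join(c.strip() for c in context_chunks if c.strip())
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up chunks standing in for the content.text values returned by doc get.
retrieved = [
    "A player in Jail may throw doubles to get out.",
    "A player may also pay a fine of $50.",
]
prompt = build_prompt("how to get out of jail", retrieved)
print(prompt)
```

Grounding the model in the retrieved text is what turns the generic legal advice from the first answer into an answer about the board game.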
Let's simplify this by putting everything in a script. Here is the content of ragScript.nu:
    # Create the scope, collection, primary index, and vector index.
    def initRAGPipeline [] {
        scopes create pdf
        cb-env scope pdf
        collections create pdf
        cb-env collection pdf
        query "CREATE PRIMARY INDEX ON `default`:`cbsh`.`pdf`.`pdf`"
        vector create-index pdf textVector 1536
    }

    # Store the piped-in text chunks, then enrich every document with embeddings.
    def storeRAGDoc [] {
        wrap text |wrap content | each { insert id { random uuid } } | doc upsert
        query "SELECT meta().id as id, p.* from `pdf` as p" | wrap content| vector enrich-doc text | doc upsert
    }

    # Ask the same question without and with retrieved context.
    def myAsk [$question: string] {
        let norag = ask $question
        let rag = vector enrich-text $question | vector search pdf textVector | select id |doc get| select content.text | ask $question
        {"rag":$rag, "norag":$norag}
    }
You can create the script file and then call those functions:
    > source ./ragDemo/ragScript.nu
    > initRAGPipeline
    > open monopolyInstruction.txt |split row " ::: ::: "| storeRAGDoc
    > myAsk "how to get out of jail"
    Embedding batch 1/1
    rag    Here are the ways to get out of jail in the game of Monopoly:

           1. **Roll Doubles:** The most common way to get out of jail is by rolling doubles on your turn. If you roll doubles with the regular white dice on any of your next three turns after being sent to jail, you can immediately move your token out of jail and advance the corresponding number of spaces. Remember that you can only use the white dice for this purpose.

           2. **Using "Get Out of Jail Free" Card:** If you have a "Get Out of Jail Free" card, you can use it to get out of jail without rolling doubles. Simply present the card to the Banker to get out of jail for free. The card is then returned to the bottom of the deck.

           3. **Purchase the Card:** If another player has a "Get Out of Jail Free" card and is willing to sell it, you can purchase the card from them at a mutually agreed-upon price. This allows you to get out of jail even if you don't have the card yourself.

           4. **Pay the Fine:** If you do not roll doubles within three turns or do not have a "Get Out of Jail Free" card, you must pay a fine of $50 to the Bank before you roll the dice on either of your next two turns. Once you pay the fine, you are immediately released from jail and can move your token as per the dice roll.

           These are the four main ways to get out of jail in Monopoly.
    norag  If you or someone you know is currently in jail and looking to get released, here are some general steps to consider:

           1. Contact a lawyer: A criminal defense attorney can provide guidance on legal options and help navigate the legal process for release.

           2. Attend court hearings: It is important to attend all court hearings and follow any conditions set by the court to demonstrate cooperation with the legal system.

           3. Consider bail: If bail is an option, you may be able to pay a set amount to be released from jail pending trial. If you cannot afford the bail amount, you may seek assistance from a bail bond agent.

           4. Seek alternative options: Depending on the circumstances of your case, there may be alternative options for release such as pretrial services, diversion programs, or supervised release.

           5. Follow legal advice: It is crucial to follow the advice of your legal counsel and comply with all legal requirements to increase the chances of a successful release.

           It's important to note that the process of getting out of jail can vary depending on the specific circumstances of the case and the laws in your jurisdiction. For personalized guidance, it's recommended to speak with a lawyer or legal professional specializing in criminal law.
Here you can see the same kind of result we got in the Python RAG demo, but this time using Couchbase Shell. It should be easier to manipulate, change, or extend, because you don't need to deploy an application or know Python. However, it will be less flexible than what you can achieve with Python and LangChain.
If this interests you, stay tuned: more content about AI and Couchbase Shell is on the way!
- Learn more about Couchbase Shell
- Learn more about vector search capabilities in Couchbase