Previously, we showed how to use Couchbase’s RAG capabilities through a Python app that lets the user ‘chat’ with a PDF or with X. It was simple to build, but can we make it even simpler? I have been playing a lot with Couchbase Shell recently, and it should let me do something similar.

Set up a scope and collection

I am assuming you are already familiar with Couchbase Shell (cbsh), and have a configured cluster and model.

Create and select a scope and collection, and then create a primary index:

Turn a PDF into chunked text

There are a variety of tools that can convert a PDF to text. On most Linux distributions, you should find pdftotext.

This will create a text version of the file with the same path, but with a .txt extension.

With Nushell (cbsh is based on Nushell), it’s easy to split text thanks to the split command. The hard part is finding the right delimiter to chunk the file. Fortunately, Nushell supports multiline strings, so I copied and pasted the text that sits between two paragraphs in the file and used it as the delimiter. You should be able to do something more sophisticated using regex. That’s the difference between blog material and production 😇.
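For readers who want to try the regex route outside cbsh, here is a minimal Python sketch of paragraph chunking. It is not what the cbsh session above does: `chunk_text` and its `min_chars` threshold are hypothetical names chosen for illustration, and it assumes paragraphs in the pdftotext output are separated by blank lines or form feeds.

```python
import re


def chunk_text(text, min_chars=40):
    """Split text into paragraph chunks on blank lines (hypothetical helper)."""
    # pdftotext marks page breaks with form feeds; treat them as paragraph breaks.
    normalized = text.replace("\f", "\n\n")
    chunks = []
    pending = ""  # short fragments (for example headings) waiting for the next paragraph
    for paragraph in re.split(r"\n\s*\n", normalized):
        # Collapse line wraps and repeated spaces inside the paragraph.
        cleaned = " ".join(paragraph.split())
        if not cleaned:
            continue
        candidate = f"{pending} {cleaned}".strip()
        if len(candidate) < min_chars:
            # Too short to stand alone: carry it forward into the next chunk.
            pending = candidate
        else:
            chunks.append(candidate)
            pending = ""
    if pending:
        # Keep whatever is left at the end rather than dropping it silently.
        chunks.append(pending)
    return chunks


if __name__ == "__main__":
    sample = "GOING TO JAIL\n\nYou go to jail if you land on the\nGo To Jail space."
    for chunk in chunk_text(sample, min_chars=20):
        print(chunk)
```

Carrying short fragments forward keeps a heading attached to the paragraph it introduces, which tends to give the embedding model more useful context than a heading on its own.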

This gives you a table of text strings. To import it into Couchbase, we wrap each string in a text field inside a content JSON object, add a randomly generated UUID, and upsert the result.
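To make the document shape concrete, here is a small Python sketch of the same wrapping step. It only builds the records and does not talk to Couchbase. The `to_documents` helper is hypothetical, and using `id` as the key for the UUID is an assumption; the `content` object and its `text` field follow the description above.

```python
import uuid


def to_documents(chunks):
    """Wrap each text chunk as {"id": <uuid>, "content": {"text": <chunk>}}."""
    return [
        # uuid4() produces a random UUID, used here as the document key.
        {"id": str(uuid.uuid4()), "content": {"text": chunk}}
        for chunk in chunks
    ]


if __name__ == "__main__":
    for doc in to_documents(["first paragraph", "second paragraph"]):
        print(doc)
```

Each record is then ready to be upserted with its UUID as the key and the `content` object as the document body.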

The next step is to create embeddings, or vector representations of the text:

Then create the vector search index. Here it is called pdf, indexes the textVector field, uses vectors of 1536 dimensions, and uses l2_norm as the similarity metric, since that is the default.
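As an illustration of what an L2 (Euclidean) similarity metric means, here is a brute-force Python sketch that ranks documents by distance to a query vector. It is not how the Couchbase index works internally; `l2_distance` and `nearest` are hypothetical helpers, and the two-dimensional vectors stand in for the 1536-dimension embeddings.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, docs, k=3):
    """Return the k docs whose 'textVector' is closest to the query vector."""
    # Smaller distance means more similar, so sort in ascending order.
    return sorted(docs, key=lambda d: l2_distance(query, d["textVector"]))[:k]


if __name__ == "__main__":
    docs = [
        {"id": "a", "textVector": [1.0, 0.0]},
        {"id": "b", "textVector": [0.0, 1.0]},
        {"id": "c", "textVector": [0.9, 0.1]},
    ]
    for doc in nearest([1.0, 0.0], docs, k=2):
        print(doc["id"])
```

With this metric a smaller distance means a closer match, so the query vector and the stored vectors must come from the same embedding model and have the same number of dimensions.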

I have imported the rules of Monopoly and I am asking how to get out of jail. In the original example, we had one answer with context and one without.

And with context:

Let’s simplify this by putting everything in a script. This is the content of myScript.nu:

You can source the script file and then call those functions:

Here, you can see the same kind of result we achieved in the Python RAG demo, but this time using Couchbase Shell. It should be easier to manipulate, change, or extend because you don’t need to deploy an app or know Python. However, it will be less flexible than what you can achieve with Python and LangChain.

If this interests you, stay tuned: more AI and Couchbase Shell content is on the way!

Author

Posted by Laurent Doguin

Laurent is a nerdy metal head who lives in Paris. He mostly writes code in Java and structured text in AsciiDoc, and often talks about data, reactive programming and other buzzwordy stuff. He is also a former Developer Advocate for Clever Cloud and Nuxeo where he devoted his time and expertise to helping those communities grow bigger and stronger. He now runs Developer Relations at Couchbase.
