{"id":16518,"date":"2024-10-29T05:50:01","date_gmt":"2024-10-29T12:50:01","guid":{"rendered":"https:\/\/www.couchbase.com\/blog\/?p=16518"},"modified":"2025-06-16T10:43:45","modified_gmt":"2025-06-16T17:43:45","slug":"supercharge-rag-couchbase-vector-unstructured-io","status":"publish","type":"post","link":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/","title":{"rendered":"Supercharge Your RAG application With Couchbase Vector Search and Unstructured.io"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Today we\u2019re excited to announce the launch of the Couchbase and <a href=\"https:\/\/unstructured.io\">Unstructured.io<\/a> connector which streamlines the process of ingesting unstructured data into your RAG pipeline built on top of Couchbase as the vector store. Using this connector, you can now convert unstructured and loosely-structured documents into JSON files and make them ready for consumption by RAG applications via the generation of vector embeddings in just a few lines of code.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Why is unstructured data ingestion important for developers?\u00a0<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">An overwhelming amount of enterprise data is unstructured and this is unlikely to change in the foreseeable future. The presence of data in unstructured formats has implications for developers beyond time and cost. It means that decision making in enterprises is predicated on the limited amount of consumable, structured data instead of all data residing within it. In addition to this, it means that a large variety of enterprise workflows (internal and customer facing) require manual intervention making them costlier, slower, and more error-prone. 
This problem is likely to become more acute as enterprise data footprints grow.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">How is unstructured data leveraged by developers?<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">One of the most effective ways of leveraging unstructured data is to ingest it into a RAG pipeline, making the data available for retrieval via <a href=\"https:\/\/www.couchbase.com\/blog\/what-is-vector-search\/\">vector searches<\/a>. This has wide-ranging applications in various industries. RAG applications can be leveraged to drive operational efficiency by making it easier to access more relevant documents, resulting in faster resolution times and lower costs. Some of the use cases that can be solved for are:\u00a0<\/span><\/p>\n<ol>\n<li style=\"list-style-type: none;\">\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Enabling customer support teams across industries to find relevant troubleshooting documents<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Enabling medical professionals to extract relevant articles and patient records stored in document databases to assist in diagnosis and treatment planning<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Recommendation systems that leverage customer data to suggest the most suitable product<\/span><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<div id=\"attachment_16519\" style=\"width: 910px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image2-8.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-16519\" class=\"wp-image-16519 size-large\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image2-8-1024x506.png\" alt=\"\" width=\"900\" height=\"445\" 
srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image2-8-1024x506.png 1024w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image2-8-300x148.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image2-8-768x380.png 768w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image2-8-1536x760.png 1536w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image2-8-1320x653.png 1320w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image2-8.png 1990w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><p id=\"caption-attachment-16519\" class=\"wp-caption-text\">Fig 1. Unstructured data ingestion pipeline with unstructured.io and Capella VectorDB<\/p><\/div>\n<h2><span style=\"font-weight: 400;\">What is the current way of processing unstructured data?<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Without the connector, ingesting unstructured data for RAG applications with Couchbase Capella requires developers to write an application that connects to an unstructured data extractor, parses its output, chunks it, sends the chunks to an embedding model to generate vectors, and then writes those vectors to a vector database on Couchbase Capella.\u00a0<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">How does our connector improve the current method of ingesting unstructured data?\u00a0<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The unstructured.io &#8211; Couchbase connectors simplify the process of connecting the two primary elements of the ingestion pipeline, unstructured.io and Couchbase Capella, making it easier to:<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><span style=\"font-weight: 400;\">Convert unstructured text data into structured JSON documents<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Generate the corresponding 
vectors<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Insert them into Couchbase Capella<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">The <\/span><a href=\"https:\/\/docs.unstructured.io\/api-reference\/ingest\/source-connectors\/couchbase\"><span style=\"font-weight: 400;\">source connector<\/span><\/a><span style=\"font-weight: 400;\"> fetches data from Couchbase Capella before it is chunked (and optionally vectorized), while the <\/span><a href=\"https:\/\/docs.unstructured.io\/api-reference\/ingest\/destination-connector\/couchbase\"><span style=\"font-weight: 400;\">destination connector<\/span><\/a><span style=\"font-weight: 400;\"> ingests processed data from unstructured.io into Couchbase Capella.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Capella provides a high-performance vector database that you can quickly set up, index, and query. Here\u2019s how you can leverage the connectors to start processing your documents with just a few lines of code.\u00a0<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">Step 1: Prerequisites<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Before you start using the connector, you will need to get a few prerequisites in place. 
You will need:<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">An <\/span><a href=\"https:\/\/docs.unstructured.io\/api-reference\/api-services\/saas-api-development-guide\"><span style=\"font-weight: 400;\">API key from unstructured.io<\/span><\/a><span style=\"font-weight: 400;\">, which can be obtained by creating an unstructured.io account<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">An active Capella account with a cluster and database set up, as well as a scope and collections defined within the database<\/span>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><a href=\"https:\/\/docs.couchbase.com\/cloud\/get-started\/create-account.html\"><span style=\"font-weight: 400;\">Create a free account and a database<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><a href=\"https:\/\/docs.couchbase.com\/cloud\/clusters\/data-service\/about-buckets-scopes-collections.html\"><span style=\"font-weight: 400;\">Set up a collection<\/span><\/a><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/docs.couchbase.com\/cloud\/clusters\/allow-ip-address.html\"><span style=\"font-weight: 400;\">A cluster configured to allow connections from your IP address<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/docs.couchbase.com\/cloud\/clusters\/manage-database-users.html\"><span style=\"font-weight: 400;\">Database credentials configured for the cluster<\/span><\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span style=\"font-weight: 400;\">Step 2: Define the source of your unstructured data and the destination<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Once the prerequisites are in place, you can define the source for the documents that you want to process and use as inputs for your production RAG pipeline. 
The connector supports ingestion from various sources: Couchbase, local directories, S3 buckets, and other storage services. Unstructured.io <\/span><a href=\"https:\/\/docs.unstructured.io\/open-source\/introduction\/supported-file-types\"><span style=\"font-weight: 400;\">supports a wide variety of unstructured document formats<\/span><\/a><span style=\"font-weight: 400;\"> including PDFs, image files (JPEG, PNG), text documents (DOCX, DOC), emails, spreadsheets, and presentation file formats (PPT).\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Similarly, define the intermediate location that will store the output generated by unstructured.io before the text is vectorized. This can be a collection in a Couchbase database or any other storage service that you\u2019re currently using. You can then define the vector database collection on Couchbase where the JSON documents containing the original text, metadata, and the corresponding embedding vector will be stored.\u00a0<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">Step 3: Define your chunking strategy and select an embedding model for generation of vector embeddings<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Once the input and output locations are defined, you can <\/span><a href=\"https:\/\/docs.unstructured.io\/api-reference\/api-services\/chunking\"><span style=\"font-weight: 400;\">select one of the chunking strategies<\/span><\/a><span style=\"font-weight: 400;\"> supported by unstructured.io and <\/span><a href=\"https:\/\/docs.unstructured.io\/open-source\/core-functionality\/embedding\"><span style=\"font-weight: 400;\">pick an embedding model<\/span><\/a><span style=\"font-weight: 400;\"> of your choice. 
Unstructured.io supports embedding models from several providers, including Hugging Face, OpenAI, and Amazon Bedrock.\u00a0<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">Step 4: Run your application!<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Test your application. Once unstructured.io has executed all the processing steps, you should be able to view the new structured JSON documents in your Capella collection. Below is an example of a PDF that we converted to JSON and ingested into a Couchbase Capella collection. For a step-by-step guide along with the code on how to do this, check out our <\/span><a href=\"https:\/\/docs.unstructured.io\/api-reference\/ingest\/destination-connector\/couchbase\"><span style=\"font-weight: 400;\">full tutorial here<\/span><\/a><span style=\"font-weight: 400;\">. You can also use our notebook to follow along.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Unstructured document example:<\/span><\/p>\n<p><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image3-5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16520\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image3-5.png\" alt=\"\" width=\"596\" height=\"756\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image3-5.png 596w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image3-5-237x300.png 237w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image3-5-300x381.png 300w\" sizes=\"auto, (max-width: 596px) 100vw, 596px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Output from unstructured.io:<\/span><\/p>\n<p><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image4-6.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16521\" 
src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image4-6.png\" alt=\"\" width=\"836\" height=\"551\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image4-6.png 836w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image4-6-300x198.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image4-6-768x506.png 768w\" sizes=\"auto, (max-width: 836px) 100vw, 836px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Documents ingested into Capella:<\/span><\/p>\n<p><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image5-6.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-16522\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image5-6-1024x323.png\" alt=\"\" width=\"900\" height=\"284\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image5-6-1024x323.png 1024w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image5-6-300x95.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image5-6-768x242.png 768w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image5-6-1536x485.png 1536w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image5-6-1320x417.png 1320w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/image5-6.png 1999w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">You can now run your application to process unstructured text documents, identify the components, extract them as JSON documents and generate vector embeddings before inserting them into your Capella collection.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Resources<\/span><\/h2>\n<ul>\n<li style=\"list-style-type: 
none;\">\n<ul>\n<li><a href=\"https:\/\/www.couchbase.com\/blog\/rag-applications-with-vector-search-and-couchbase\/\"><span style=\"font-weight: 400;\">Building End-to-End RAG Applications With Couchbase Vector Search<\/span><\/a><\/li>\n<li><a href=\"https:\/\/www.couchbase.com\/blog\/couchbase-bedrock-rag-applications\/\"><span style=\"font-weight: 400;\">Build Performant RAG Applications Using Couchbase Vector Search and Amazon Bedrock<\/span><\/a><\/li>\n<li><a href=\"https:\/\/info.couchbase.com\/webinar_Coding_With_AI_Vector_Search_RAG_2024M4_LP.html\"><span style=\"font-weight: 400;\">Coding With AI: Vector Search and RAG<\/span><\/a><span style=\"font-weight: 400;\">\u00a0(Webcast)<\/span><\/li>\n<li><a href=\"https:\/\/cloud.couchbase.com\/sign-up\"><span style=\"font-weight: 400;\">Try Couchbase Capella for free today<\/span><\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><br style=\"font-weight: 400;\" \/><br style=\"font-weight: 400;\" \/><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today we\u2019re excited to announce the launch of the Couchbase and Unstructured.io connector which streamlines the process of ingesting unstructured data into your RAG pipeline built on top of Couchbase as the vector store. 
Using this connector, you can now [&hellip;]<\/p>\n","protected":false},"author":85541,"featured_media":16557,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[1814,3917,2242,2225,9973,9921,9937],"tags":[10049,9924,10048],"ppma_author":[10050,10051],"class_list":["post-16518","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-application-design","category-company","category-connectors","category-cloud","category-generative-ai-genai","category-partners","category-vector-search","tag-data-prep","tag-rag-retrieval-augmented-generation","tag-unstructured-io"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.8 (Yoast SEO v25.8) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Supercharge Your RAG application With Couchbase Vector Search and Unstructured.io - The Couchbase Blog<\/title>\n<meta name=\"description\" content=\"Announcing the Couchbase and Unstructured.io connector\u2014quickly convert unstructured data into JSON and vector embeddings for seamless integration into your RAG pipeline.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Supercharge Your RAG application With Couchbase Vector Search and Unstructured.io\" \/>\n<meta property=\"og:description\" content=\"Announcing the Couchbase and Unstructured.io connector\u2014quickly convert unstructured data into JSON and vector embeddings for seamless integration into your RAG pipeline.\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/\" \/>\n<meta property=\"og:site_name\" content=\"The Couchbase Blog\" \/>\n<meta property=\"article:published_time\" content=\"2024-10-29T12:50:01+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-16T17:43:45+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/Unstructured-data-ingestion-pipeline-Diagram_3.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2400\" \/>\n\t<meta property=\"og:image:height\" content=\"1256\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Vishwa Yeruru - Sr. Product Manager, Maria Khalusova - Staff Developer Advocate, Unstructured.io\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Vishwa Yeruru - Sr. Product Manager\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/\"},\"author\":{\"name\":\"Vishwa Yeruru - Sr. 
Product Manager\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/0670782b8878056390b6a256511c8858\"},\"headline\":\"Supercharge Your RAG application With Couchbase Vector Search and Unstructured.io\",\"datePublished\":\"2024-10-29T12:50:01+00:00\",\"dateModified\":\"2025-06-16T17:43:45+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/\"},\"wordCount\":970,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/Unstructured-data-ingestion-pipeline-Diagram_3.png\",\"keywords\":[\"data prep\",\"RAG retrieval-augmented generation\",\"unstructured.io\"],\"articleSection\":[\"Application Design\",\"Company\",\"Connectors\",\"Couchbase Capella\",\"Generative AI (GenAI)\",\"Partners\",\"Vector Search\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/\",\"name\":\"Supercharge Your RAG application With Couchbase Vector Search and Unstructured.io - The Couchbase 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/Unstructured-data-ingestion-pipeline-Diagram_3.png\",\"datePublished\":\"2024-10-29T12:50:01+00:00\",\"dateModified\":\"2025-06-16T17:43:45+00:00\",\"description\":\"Announcing the Couchbase and Unstructured.io connector\u2014quickly convert unstructured data into JSON and vector embeddings for seamless integration into your RAG pipeline.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#primaryimage\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/Unstructured-data-ingestion-pipeline-Diagram_3.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/Unstructured-data-ingestion-pipeline-Diagram_3.png\",\"width\":2400,\"height\":1256},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.couchbase.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Supercharge Your RAG application With Couchbase Vector Search and 
Unstructured.io\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"name\":\"The Couchbase Blog\",\"description\":\"Couchbase, the NoSQL Database\",\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\",\"name\":\"The Couchbase Blog\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"width\":218,\"height\":34,\"caption\":\"The Couchbase Blog\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/0670782b8878056390b6a256511c8858\",\"name\":\"Vishwa Yeruru - Sr. Product Manager\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/a7609300b8d22762330c56f24bc36684\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/vishwa-yeruru.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/vishwa-yeruru.png\",\"caption\":\"Vishwa Yeruru - Sr. Product Manager\"},\"url\":\"https:\/\/www.couchbase.com\/blog\/author\/vishwayeruru\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Supercharge Your RAG application With Couchbase Vector Search and Unstructured.io - The Couchbase Blog","description":"Announcing the Couchbase and Unstructured.io connector\u2014quickly convert unstructured data into JSON and vector embeddings for seamless integration into your RAG pipeline.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/","og_locale":"en_US","og_type":"article","og_title":"Supercharge Your RAG application With Couchbase Vector Search and Unstructured.io","og_description":"Announcing the Couchbase and Unstructured.io connector\u2014quickly convert unstructured data into JSON and vector embeddings for seamless integration into your RAG pipeline.","og_url":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/","og_site_name":"The Couchbase Blog","article_published_time":"2024-10-29T12:50:01+00:00","article_modified_time":"2025-06-16T17:43:45+00:00","og_image":[{"width":2400,"height":1256,"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/Unstructured-data-ingestion-pipeline-Diagram_3.png","type":"image\/png"}],"author":"Vishwa Yeruru - Sr. Product Manager, Maria Khalusova - Staff Developer Advocate, Unstructured.io","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Vishwa Yeruru - Sr. Product Manager","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#article","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/"},"author":{"name":"Vishwa Yeruru - Sr. 
Product Manager","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/0670782b8878056390b6a256511c8858"},"headline":"Supercharge Your RAG application With Couchbase Vector Search and Unstructured.io","datePublished":"2024-10-29T12:50:01+00:00","dateModified":"2025-06-16T17:43:45+00:00","mainEntityOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/"},"wordCount":970,"commentCount":0,"publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/Unstructured-data-ingestion-pipeline-Diagram_3.png","keywords":["data prep","RAG retrieval-augmented generation","unstructured.io"],"articleSection":["Application Design","Company","Connectors","Couchbase Capella","Generative AI (GenAI)","Partners","Vector Search"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/","url":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/","name":"Supercharge Your RAG application With Couchbase Vector Search and Unstructured.io - The Couchbase 
Blog","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#primaryimage"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/Unstructured-data-ingestion-pipeline-Diagram_3.png","datePublished":"2024-10-29T12:50:01+00:00","dateModified":"2025-06-16T17:43:45+00:00","description":"Announcing the Couchbase and Unstructured.io connector\u2014quickly convert unstructured data into JSON and vector embeddings for seamless integration into your RAG pipeline.","breadcrumb":{"@id":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#primaryimage","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/Unstructured-data-ingestion-pipeline-Diagram_3.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/Unstructured-data-ingestion-pipeline-Diagram_3.png","width":2400,"height":1256},{"@type":"BreadcrumbList","@id":"https:\/\/www.couchbase.com\/blog\/supercharge-rag-couchbase-vector-unstructured-io\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.couchbase.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Supercharge Your RAG application With Couchbase Vector Search and Unstructured.io"}]},{"@type":"WebSite","@id":"https:\/\/www.couchbase.com\/blog\/#website","url":"https:\/\/www.couchbase.com\/blog\/","name":"The Couchbase 
Blog","description":"Couchbase, the NoSQL Database","publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.couchbase.com\/blog\/#organization","name":"The Couchbase Blog","url":"https:\/\/www.couchbase.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","width":218,"height":34,"caption":"The Couchbase Blog"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/0670782b8878056390b6a256511c8858","name":"Vishwa Yeruru - Sr. Product Manager","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/a7609300b8d22762330c56f24bc36684","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/vishwa-yeruru.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/vishwa-yeruru.png","caption":"Vishwa Yeruru - Sr. Product Manager"},"url":"https:\/\/www.couchbase.com\/blog\/author\/vishwayeruru\/"}]}},"authors":[{"term_id":10050,"user_id":85541,"is_guest":0,"slug":"vishwayeruru","display_name":"Vishwa Yeruru - Sr. 
Product Manager","avatar_url":{"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/vishwa-yeruru.png","url2x":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/vishwa-yeruru.png"},"author_category":"","last_name":"Yeruru - Sr. Product Manager","first_name":"Vishwa","job_title":"Sr. Product Manager","user_url":"","description":""},{"term_id":10051,"user_id":85542,"is_guest":0,"slug":"mariakhalusova","display_name":"Maria Khalusova - Staff Developer Advocate, Unstructured.io","avatar_url":{"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/maria-khalusova.jpeg","url2x":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/10\/maria-khalusova.jpeg"},"author_category":"","last_name":"Khalusova - Staff Developer Advocate, Unstructured.io","first_name":"Maria","job_title":"","user_url":"","description":""}],"_links":{"self":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts\/16518","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/users\/85541"}],"replies":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/comments?post=16518"}],"version-history":[{"count":0,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts\/16518\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/media\/16557"}],"wp:attachment":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/media?parent=16518"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/categories?post=16518"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\
/v2\/tags?post=16518"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=16518"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}