{"id":225,"date":"2014-12-16T17:43:14","date_gmt":"2014-12-16T17:43:14","guid":{"rendered":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/"},"modified":"2014-12-16T17:43:14","modified_gmt":"2014-12-16T17:43:14","slug":"want-get-rid-documents-duplicate-content","status":"publish","type":"post","link":"https:\/\/www.couchbase.com\/blog\/ko\/want-get-rid-documents-duplicate-content\/","title":{"rendered":"Want to get rid of documents with duplicate content?"},"content":{"rendered":"\n<p><img decoding=\"async\" alt=\"\" src=\"https:\/\/www.couchbase.com\/blog\/sites\/default\/files\/uploads\/all\/duplicate-content.jpg\"><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>Whether you\u2019re combining data from two different data sources, handling multiple purchases from the same customer or have just entered the same data in a web form twice, it seems like everyone faces the problem of duplicate data at one point or another. <\/span><\/b><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>In this blog post, we&#8217;ll look at using <a href=\"https:\/\/www.couchbase.com\/docs\/couchbase-manual-2.0\/couchbase-views.html\">views<\/a> in Couchbase Server 2.0 to find documents with matching fields and retain only the non-duplicate documents. For the sake of this example, assume each document has three common user-specified fields: <\/span><span>first_name<\/span><span>, <\/span><span>last_name<\/span><span>, <\/span><span>postal_code<\/span><span>. Using the <\/span><span><font class=\"Apple-style-span\" color=\"#1155cc\"><u>Ruby client<\/u><\/font><\/span><span> for Couchbase Server and the <\/span><a href=\"https:\/\/rubygems.org\/gems\/faker\"><span>faker<\/span><\/a><span> Ruby gem, you can build a simple <\/span><a href=\"https:\/\/gist.github.com\/859df6c1db21a9bb561b#file_generate.rb\"><span>data generator<\/span><\/a><span> to load some sample duplicate data into Couchbase. 
To use Ruby with Couchbase, download the Ruby SDK <\/span><a href=\"https:\/\/www.couchbase.com\/develop\/ruby\/next\/\"><span>here<\/span><\/a><span>.<\/span><\/b><\/p>\n\n\n\n<p>\u00a0<\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>Here is an execution sample:<\/span><\/b><\/p>\n\n\n\n<p>\u00a0<\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.7940624845214188\"><span>$ ruby .\/generate.rb --help<\/span><\/b><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.7940624845214188\"><span>Usage: generate.rb [options]<br class=\"kix-line-break\">\u00a0\u00a0\u00a0-h, --hostname HOSTNAME \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<span class=\"Apple-tab-span\"> <\/span>Hostname to connect to (default: 127.0.0.1:8091)<br class=\"kix-line-break\">\u00a0\u00a0\u00a0-u, --user USERNAME \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<span class=\"Apple-tab-span\"> <\/span>Username to log with (default: none)<br class=\"kix-line-break\">\u00a0\u00a0\u00a0-p, --passwd PASSWORD \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<span class=\"Apple-tab-span\"> <\/span>Password to log with (default: none)<br class=\"kix-line-break\">\u00a0\u00a0\u00a0-b, --bucket NAME \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<span class=\"Apple-tab-span\"> <\/span><span class=\"Apple-tab-span\"> <\/span>Name of the bucket to connect to (default: default)<br class=\"kix-line-break\">\u00a0\u00a0\u00a0-t, --total-records NUM \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<span class=\"Apple-tab-span\"> <\/span><span class=\"Apple-tab-span\"> <\/span>The total number of the records to generate (default: 10000)<br class=\"kix-line-break\">\u00a0\u00a0\u00a0-d, --duplicate-rate NUM \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<span class=\"Apple-tab-span\"> <\/span><span class=\"Apple-tab-span\"> 
<\/span>Each NUM-th record will be duplicate (default: 30)<br class=\"kix-line-break\">\u00a0\u00a0\u00a0-?, --help \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<span class=\"Apple-tab-span\"> <\/span><span class=\"Apple-tab-span\"> <\/span>Show this message<\/span><br><br class=\"kix-line-break\"><span>$ ruby .\/generate.rb -t 1000 -d 5<br class=\"kix-line-break\">\u00a0\u00a0\u00a0\u00a0\u00a01000 \/ 1000<\/span><\/b><\/p>\n\n\n\n<p>\u00a0<\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>Each document in Couchbase has a user-specified key, which is accessible as <\/span><span>meta.id<\/span><span> in the map function of the view. In Figure 1 below, there are multiple documents loaded into Couchbase Server using the data generator client above.<\/span><\/b><\/p>\n\n\n\n<p>\u00a0<\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>Step 1<\/span><\/b><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>Write a custom map function that emits the document ID (<\/span><span>meta.id<\/span><span>) of each document, keyed by the fields that define a duplicate (<\/span><span>first_name<\/span><span>, <\/span><span>last_name<\/span><span>, <\/span><span>postal_code<\/span><span> in this case).<\/span><\/b><\/p>\n\n\n\n<p>\u00a0<\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.7940624845214188\"><span>function (doc, meta) {<\/span><\/b><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.7940624845214188\"><span>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0emit([doc.first_name + &#39;-&#39; + doc.last_name + &#39;-&#39; + doc.postal_code], meta.id);<\/span><\/b><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.7940624845214188\"><span>}<\/span><\/b><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>The map function defines when two documents are duplicates. 
\u00a0According to the map function defined above, two documents are duplicates when their first name, last name and postal code all match. We use \u2018-\u2019 as a separator to prevent aliasing when the first name, last name and postal code are concatenated.<\/span><\/b><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><b id=\"internal-source-marker_0.37370411586016417\"><span>Step 2<\/span><\/b><\/h2>\n\n\n\n<p><font class=\"Apple-style-span\" color=\"#000000\" face=\"Arial\">The reduce function looks like this: <\/font><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.7940624845214188\"><span>function (keys, values, rereduce) {<\/span><\/b><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.7940624845214188\"><span>\u00a0 if (rereduce) {<\/span><br><span>\u00a0\u00a0\u00a0 var res = [];<\/span><br><span>\u00a0\u00a0\u00a0 for (var i = 0; i &lt; values.length; i++){<\/span><br><span>\u00a0\u00a0\u00a0\u00a0\u00a0 res = res.concat(values[i]);<\/span><br><span>\u00a0\u00a0\u00a0 }<\/span><br><span>\u00a0\u00a0\u00a0 return res;<\/span><br><span>\u00a0 } else {<\/span><br><span>\u00a0\u00a0\u00a0 return values;<\/span><br><span>\u00a0 }<\/span><br><span>}<\/span><\/b><\/p>\n\n\n\n<p>\u00a0<\/p>\n\n\n\n<p><font class=\"Apple-style-span\" color=\"#000000\" face=\"Arial\">After grouping, if there is more than one meta.id value for a key, we concatenate them to get a list of meta.id values that refer to duplicate documents.<\/font><\/p>\n\n\n\n<p><img decoding=\"async\" alt=\"\" src=\"https:\/\/www.couchbase.com\/blog\/sites\/default\/files\/uploads\/all\/Screen%20Shot%202012-10-25%20at%203.07.13%20PM.png\"><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>Step 3<\/span><\/b><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>The core part of the <\/span><a href=\"https:\/\/gist.github.com\/859df6c1db21a9bb561b#file_cleanup.rb\"><span>data cleaner <\/span><\/a><span>is written in Ruby. 
<\/span><\/b><\/p>\n\n\n\n<p>\u00a0<\/p>\n\n\n\n<p>\u00a0<\/p>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\"><span class=\"nb\">require<\/span> <span class=\"s1\">&#39;couchbase&#39;<\/span><\/font>\n\n\n\n<p>\u00a0<\/p>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\"><span class=\"n\">connection<\/span> <span class=\"o\">=<\/span> <span class=\"no\">Couchbase<\/span><span class=\"o\">.<\/span><span class=\"n\">connect<\/span><span class=\"p\">(<\/span><span class=\"n\">options<\/span><span class=\"p\">)<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\"><span class=\"n\">ddoc<\/span> <span class=\"o\">=<\/span> <span class=\"n\">connection<\/span><span class=\"o\">.<\/span><span class=\"n\">design_docs<\/span><span class=\"o\">[<\/span><span class=\"n\">options<\/span><span class=\"o\">[<\/span><span class=\"ss\">:design_document<\/span><span class=\"o\">]]<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\"><span class=\"n\">view<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ddoc<\/span><span class=\"o\">.<\/span><span class=\"n\">send<\/span><span class=\"p\">(<\/span><span class=\"n\">options<\/span><span class=\"o\">[<\/span><span class=\"ss\">:view<\/span><span class=\"o\">]<\/span><span class=\"p\">)<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\"><span class=\"n\">connection<\/span><span class=\"o\">.<\/span><span class=\"n\">run<\/span> <span class=\"k\">do<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\">\u00a0<span class=\"n\">view<\/span><span class=\"o\">.<\/span><span class=\"n\">each<\/span><span class=\"p\">(<\/span><span class=\"ss\">:group<\/span> <span class=\"o\">=&gt;<\/span> <span class=\"kp\">true<\/span><span class=\"p\">)<\/span> <span class=\"k\">do<\/span> 
<span class=\"o\">|<\/span><span class=\"n\">doc<\/span><span class=\"o\">|<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\">\u00a0\u00a0\u00a0<span class=\"n\">dup_num<\/span> <span class=\"o\">=<\/span> <span class=\"n\">doc<\/span><span class=\"o\">.<\/span><span class=\"n\">value<\/span><span class=\"o\">.<\/span><span class=\"n\">size<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\">\u00a0\u00a0\u00a0<span class=\"k\">if<\/span> <span class=\"n\">dup_num<\/span> <span class=\"o\">&gt;<\/span> <span class=\"mi\">1<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<span class=\"nb\">puts<\/span> <span class=\"s2\">&quot;left doc <\/span><span class=\"si\">#{<\/span><span class=\"n\">doc<\/span><span class=\"o\">.<\/span><span class=\"n\">value<\/span><span class=\"o\">[<\/span><span class=\"mi\">0<\/span><span class=\"o\">]<\/span><span class=\"si\">}<\/span><span class=\"s2\">, &quot;<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<span class=\"c1\"># delete documents from second to last<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<span class=\"n\">connection<\/span><span class=\"o\">.<\/span><span class=\"n\">delete<\/span><span class=\"p\">(<\/span><span class=\"n\">doc<\/span><span class=\"o\">.<\/span><span class=\"n\">value<\/span><span class=\"o\">[<\/span><span class=\"mi\">1<\/span><span class=\"o\">.<\/span><span class=\"n\">.<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"o\">]<\/span><span class=\"p\">)<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<span 
class=\"nb\">puts<\/span> <span class=\"s2\">&quot;removed <\/span><span class=\"si\">#{<\/span><span class=\"n\">dup_num<\/span><span class=\"si\">}<\/span><span class=\"s2\"> duplicate(s)&quot;<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\">\u00a0\u00a0\u00a0<span class=\"k\">end<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\">\u00a0<span class=\"k\">end<\/span><\/font>\n\n\n\n<font class=\"Apple-style-span\" color=\"#0086b3\" face=\"Arial\" size=\"2\"><span class=\"k\">end<\/span><\/font>\n\n\n\n<p>\u00a0<\/p>\n\n\n\n<p><span>Connect to Couchbase Server and query the view. The value field is an array of meta.id values that correspond to duplicate documents (matching first name, last name and postal code). If the array size is greater than 1, we delete all the documents except the one corresponding to the first meta.id.<\/span><\/p>\n\n\n\n<p><b><span><img decoding=\"async\" alt=\"\" src=\"https:\/\/www.couchbase.com\/blog\/sites\/default\/files\/uploads\/all\/Screen%20Shot%202012-10-25%20at%203.08.32%20PM.png\"><\/span><\/b><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>If the number of meta.id values in the value array is greater than 1, there are duplicate documents corresponding to that key. As shown in the figure above, id19 and id20 are duplicate documents.<\/span><\/b><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>The output of the data cleaner script looks like this:<\/span><\/b><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.couchbase.com\/blog\/sites\/default\/files\/uploads\/all\/Screen%20Shot%202012-10-25%20at%203.48.21%20PM.png\"\/><\/figure>\n\n\n\n<p>\u00a0<\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>As shown in the figure below, duplicate documents are now eliminated. 
<\/span><\/b><\/p>\n\n\n\n<p>Enjoy!<\/p>\n\n\n\n<p><b><span>&#8212;<\/span><\/b><\/p>\n\n\n\n<p><b id=\"internal-source-marker_0.37370411586016417\"><span>Thanks to Sergey for putting together the ruby code.<\/span><\/b><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Whether you\u2019re combining data from two different data sources, have multiple purchases from the same customer or just entered the same data in a web form twice, it seems like everyone faces the problem of duplicate data at one point or the other. In this blog post, we&#8217;ll look at using views in Couchbase Server [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":18,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[1],"tags":[],"ppma_author":[35],"class_list":["post-225","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.6 (Yoast SEO v27.6) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Want to get rid of documents with duplicate content? 
- The Couchbase Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.couchbase.com\/blog\/ko\/want-get-rid-documents-duplicate-content\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Want to get rid of documents with duplicate content?\" \/>\n<meta property=\"og:description\" content=\"Whether you\u2019re combining data from two different data sources, have multiple purchases from the same customer or just entered the same data in a web form twice, it seems like everyone faces the problem of duplicate data at one point or the other. In this blog post, we&#8217;ll look at using views in Couchbase Server [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.couchbase.com\/blog\/ko\/want-get-rid-documents-duplicate-content\/\" \/>\n<meta property=\"og:site_name\" content=\"The Couchbase Blog\" \/>\n<meta property=\"article:published_time\" content=\"2014-12-16T17:43:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1800\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Don Pinto, Principal Product Manager, Couchbase\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Don Pinto, Principal Product Manager, Couchbase\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/\"},\"author\":{\"name\":\"Don Pinto, Principal Product Manager, Couchbase\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/person\\\/eb130a1e0278989e089a7fbbf8bc754c\"},\"headline\":\"Want to get rid of documents with duplicate content?\",\"datePublished\":\"2014-12-16T17:43:14+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/\"},\"wordCount\":620,\"commentCount\":4,\"publisher\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/couchbase-nosql-dbaas.png\",\"articleSection\":[\"Uncategorized\"],\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/\",\"name\":\"Want to get rid of documents with duplicate content? 
- The Couchbase Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/couchbase-nosql-dbaas.png\",\"datePublished\":\"2014-12-16T17:43:14+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/couchbase-nosql-dbaas.png\",\"contentUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/couchbase-nosql-dbaas.png\",\"width\":1800,\"height\":630},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/want-get-rid-documents-duplicate-content\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Want to get rid of documents with duplicate content?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/\",\"name\":\"The Couchbase Blog\",\"description\":\"Couchbase, the NoSQL 
Database\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#organization\",\"name\":\"The Couchbase Blog\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/06\\\/logo.svg\",\"contentUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/06\\\/logo.svg\",\"width\":\"1024\",\"height\":\"1024\",\"caption\":\"The Couchbase Blog\"},\"image\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/person\\\/eb130a1e0278989e089a7fbbf8bc754c\",\"name\":\"Don Pinto, Principal Product Manager, Couchbase\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/574759a111515cb8c5d5a1f5268d2759050bd8383654dc0d9393324f0c35fae0?s=96&d=mm&r=g39c6d6178c73f0dc09af63f930a4f37d\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/574759a111515cb8c5d5a1f5268d2759050bd8383654dc0d9393324f0c35fae0?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/574759a111515cb8c5d5a1f5268d2759050bd8383654dc0d9393324f0c35fae0?s=96&d=mm&r=g\",\"caption\":\"Don Pinto, Principal Product Manager, Couchbase\"},\"description\":\"Don Pinto is a Principal Product 
Manager at Couchbase and is currently focused on advancing the capabilities of Couchbase Server. He is extremely passionate about data technology, and in the past has authored several articles on Couchbase Server including technical blogs and white papers. Prior to joining Couchbase, Don spent several years at IBM where he maintained the role of software developer in the DB2 information management group and most recently as a program manager on the SQL Server team at Microsoft. Don holds a master's degree in computer science and a bachelor's in computer engineering from the University of Toronto, Canada.\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/ko\\\/author\\\/don-pinto\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Want to get rid of documents with duplicate content? - The Couchbase Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.couchbase.com\/blog\/ko\/want-get-rid-documents-duplicate-content\/","og_locale":"ko_KR","og_type":"article","og_title":"Want to get rid of documents with duplicate content?","og_description":"Whether you\u2019re combining data from two different data sources, have multiple purchases from the same customer or just entered the same data in a web form twice, it seems like everyone faces the problem of duplicate data at one point or the other. 
In this blog post, we&#8217;ll look at using views in Couchbase Server [&hellip;]","og_url":"https:\/\/www.couchbase.com\/blog\/ko\/want-get-rid-documents-duplicate-content\/","og_site_name":"The Couchbase Blog","article_published_time":"2014-12-16T17:43:14+00:00","og_image":[{"width":1800,"height":630,"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png","type":"image\/png"}],"author":"Don Pinto, Principal Product Manager, Couchbase","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Don Pinto, Principal Product Manager, Couchbase","Est. reading time":"4\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/#article","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/"},"author":{"name":"Don Pinto, Principal Product Manager, Couchbase","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/eb130a1e0278989e089a7fbbf8bc754c"},"headline":"Want to get rid of documents with duplicate 
content?","datePublished":"2014-12-16T17:43:14+00:00","mainEntityOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/"},"wordCount":620,"commentCount":4,"publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png","articleSection":["Uncategorized"],"inLanguage":"ko-KR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/","url":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/","name":"Want to get rid of documents with duplicate content? - The Couchbase Blog","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/#primaryimage"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png","datePublished":"2014-12-16T17:43:14+00:00","breadcrumb":{"@id":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/"]}]},{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/#primaryimage","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png","contentUrl":"https:\/\/www.couchb
ase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png","width":1800,"height":630},{"@type":"BreadcrumbList","@id":"https:\/\/www.couchbase.com\/blog\/want-get-rid-documents-duplicate-content\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.couchbase.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Want to get rid of documents with duplicate content?"}]},{"@type":"WebSite","@id":"https:\/\/www.couchbase.com\/blog\/#website","url":"https:\/\/www.couchbase.com\/blog\/","name":"The Couchbase Blog","description":"Couchbase, the NoSQL Database","publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/www.couchbase.com\/blog\/#organization","name":"The Couchbase Blog","url":"https:\/\/www.couchbase.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/06\/logo.svg","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/06\/logo.svg","width":"1024","height":"1024","caption":"The Couchbase Blog"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/eb130a1e0278989e089a7fbbf8bc754c","name":"Don Pinto, Principal Product Manager, 
Couchbase","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/secure.gravatar.com\/avatar\/574759a111515cb8c5d5a1f5268d2759050bd8383654dc0d9393324f0c35fae0?s=96&d=mm&r=g39c6d6178c73f0dc09af63f930a4f37d","url":"https:\/\/secure.gravatar.com\/avatar\/574759a111515cb8c5d5a1f5268d2759050bd8383654dc0d9393324f0c35fae0?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/574759a111515cb8c5d5a1f5268d2759050bd8383654dc0d9393324f0c35fae0?s=96&d=mm&r=g","caption":"Don Pinto, Principal Product Manager, Couchbase"},"description":"Don Pinto is a Principal Product Manager at Couchbase and is currently focused on advancing the capabilities of Couchbase Server. He is extremely passionate about data technology, and in the past has authored several articles on Couchbase Server including technical blogs and white papers. Prior to joining Couchbase, Don spent several years at IBM where he maintained the role of software developer in the DB2 information management group and most recently as a program manager on the SQL Server team at Microsoft. 
Don holds a master's degree in computer science and a bachelor's in computer engineering from the University of Toronto, Canada.","url":"https:\/\/www.couchbase.com\/blog\/ko\/author\/don-pinto\/"}]}},"acf":[],"authors":[{"term_id":35,"user_id":4,"is_guest":0,"slug":"don-pinto","display_name":"Don Pinto, Principal Product Manager, Couchbase","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.couchbase.com\/blog\/ko\/wp-json\/wp\/v2\/posts\/225","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.couchbase.com\/blog\/ko\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.couchbase.com\/blog\/ko\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/ko\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/ko\/wp-json\/wp\/v2\/comments?post=225"}],"version-history":[{"count":0,"href":"https:\/\/www.couchbase.com\/blog\/ko\/wp-json\/wp\/v2\/posts\/225\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/ko\/wp-json\/wp\/v2\/media\/18"}],"wp:attachment":[{"href":"https:\/\/www.couchbase.com\/blog\/ko\/wp-json\/wp\/v2\/media?parent=225"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/ko\/wp-json\/wp\/v2\/categories?post=225"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/ko\/wp-json\/wp\/v2\/tags?post=225"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/ko\/wp-json\/wp\/v2\/ppma_author?post=225"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}