{"id":2095,"date":"2015-12-09T22:23:53","date_gmt":"2015-12-09T22:23:52","guid":{"rendered":"https:\/\/www.couchbase.com\/blog\/?p=2095"},"modified":"2024-09-12T01:27:40","modified_gmt":"2024-09-12T08:27:40","slug":"bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark","status":"publish","type":"post","link":"https:\/\/www.couchbase.com\/blog\/pt\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/","title":{"rendered":"Bulk Transformations of Couchbase Data Using Apache Spark with an External Source"},"content":{"rendered":"<h2><span style=\"line-height: 1.6em;\">The Need<\/span><\/h2>\n<p><span style=\"line-height: 1.6em;\">Ah, database migrations. Once you've migrated to Couchbase, where the JSON document representation of data is much more flexible, you won't need to twist your development process through the change request pipeline nearly as often, with the downtime, possible errors and general unpleasantness that follow when you are bound to rows and columns.<\/span><\/p>\n<p>That said, being on Couchbase doesn't mean you'll always be free from having to transform the data you keep there. It will just be a lot less common.<\/p>\n<p>This is, in fact, a great place for Apache Spark!<\/p>\n<h2>An Aside<span class=\"s1\">...<\/span><\/h2>\n<p>With the Spark connector and Couchbase 4.0, you have not just one but four interfaces that are relevant to Spark users. 
They are the K-V interface, the Database Change Protocol (DCP) streaming interface, the N1QL query interface through Spark SQL, and the View Query interface.<\/p>\n<p>These can be combined with a number of different data sources from the Spark ecosystem to bring together and manipulate data in many ways. For example, you may want to stream data out of Couchbase through DCP, mix it with an HDFS data source and put the results back into a different Couchbase bucket.<\/p>\n<h2>The Solution<span class=\"s1\">...<\/span><\/h2>\n<p>Taking a simple case, how could we use Spark to write some code that efficiently transforms a set of data inside Couchbase?<\/p>\n<p>Imagine you've acquired a new set of game player data in JSON format. The players will all be playing your new FizzBuzz game soon, and their profiles have been delivered by a partner. The incoming profiles all look something like this:<\/p>\n<pre><code class=\"language-json\">{\r\n  \"givenname\": \"Joel\",\r\n  \"surname\": \"Smith\",\r\n  \"email\": \"joelsmith@g00glemail.com\",\r\n  \"entitlementtoken\": 78238743\r\n}<\/code><\/pre>\n<p>The problem is that your FizzBuzz profiles all look like this:<\/p>\n<pre><code class=\"language-json\">{\r\n  \"fname\": \"Matt\",\r\n  \"lname\": \"Ingenthron\",\r\n  \"email\": \"matt@couchbase.com\",\r\n  \"currentscore\": 1000000\r\n}<\/code><\/pre>\n<p>Normally, if the data had some other shape, you'd just add a little mapping logic at read and write time. However, this particular transition is a one-time process, and it comes with an additional wrinkle. 
That \"entitlementtoken\" needs to be looked up in a backup of the MySQL database that you also have. You don't want to provision or maintain a large MySQL deployment just to handle launch-day traffic, so a one-time transformation before launch is the better option.<\/p>\n<p>Ideally, we would stream out the data, find the documents with the \"shape\" we want, and transform them with Spark based on a SQL query.<\/p>\n<p>First, we need to set up our connection and stream out the data, looking for the shape of the JSON already imported. This uses Couchbase's DCP interface to stream the data.<\/p>\n<pre><code class=\"language-scala\">val ssc = new StreamingContext(sc, Seconds(5))\r\n\r\nssc.couchbaseStream(\"transformative\")\r\n  .filter(_.isInstanceOf[Mutation])\r\n  .map(m =&gt; (new String(m.asInstanceOf[Mutation].key), new String(m.asInstanceOf[Mutation].content)))\r\n<\/code><\/pre>\n<p>One current limitation is that the DStream never stops. As a workaround for this simple case, we can just watch for the point at which no more data is being transformed.<\/p>\n<p>Then, for each item, we need to apply a transformation based on the MySQL lookup. To do this, we'll need to load the data from MySQL. 
Assume the MySQL table looks like this:<\/p>\n<pre><code class=\"language-bash\">mysql&gt; describe profiles;\r\n+------------------+-------------+------+-----+---------+-------+\r\n| Field            | Type        | Null | Key | Default | Extra |\r\n+------------------+-------------+------+-----+---------+-------+\r\n| givenname        | varchar(20) | YES  |     | NULL    |       |\r\n| surname          | varchar(20) | YES  |     | NULL    |       |\r\n| email            | varchar(20) | YES  |     | NULL    |       |\r\n| entitlementtoken | int(11)     | YES  |     | NULL    |       |\r\n+------------------+-------------+------+-----+---------+-------+\r\n4 rows in set (0.00 sec)<\/code><\/pre>\n<p>We want to load the MySQL data as a DataFrame. Since the StreamingContext gives us RDDs to join with, we will convert the DataFrame to an RDD for a later join inside the stream. Spark 1.6 may make this easier. The stream side of that join needs to be keyed by email address, which looks like this (extracted to a function for readability):<\/p>\n<pre><code class=\"language-scala\">\/** Returns an (email, JsonDocument) pair keyed by the email address extracted from the document *\/\r\ndef CreateMappableRdd(s: (String, String)): (String, JsonDocument) = {\r\n  val return_doc = JsonDocument.create(s._1, JsonObject.fromJson(s._2))\r\n  (return_doc.content().getString(\"email\"), return_doc)\r\n}\r\n<\/code><\/pre>\n<p>We also need to add in the new entitlement token (also extracted):<\/p>\n<pre><code class=\"language-scala\">\/** Returns a JsonDocument enriched with the entitlement token *\/\r\ndef mergeIntoDoc(t: (String, (JsonDocument, Integer))): JsonDocument = {\r\n  val jsonToEnrich = t._2._1.content()\r\n  val entitlementFromJoin = t._2._2\r\n  jsonToEnrich.put(\"entitlementtoken\", entitlementFromJoin)\r\n  t._2._1\r\n}\r\n<\/code><\/pre>\n<p>In the end, we have a nice, fluent description of our transformation, 
modifying in flight the RDDs that need changes. Finally, this writes the transformed data back to Couchbase, overwriting the items through the K-V interface.<\/p>\n<pre><code class=\"language-scala\">\/\/ load the DataFrame of all of the users from MySQL.\r\n\/\/ mysqlReader is assumed to be a DataFrameReader already configured for the MySQL source (see the full example).\r\n\/\/ Note, appending .cache() may make sense here (or not) depending on amount of data.\r\nval entitlements = mysqlReader.load()\r\n\r\n\/* loading this:\r\n  +---------+-----------+-----------------+----------------+\r\n  |givenname|    surname|            email|entitlementtoken|\r\n  +---------+-----------+-----------------+----------------+\r\n  |     Matt| Ingenthron|   matt@email.com|           11211|\r\n  |  Michael|Nitschinger|michael@email.com|           11210|\r\n  +---------+-----------+-----------------+----------------+\r\n *\/\r\n\r\nval entitlementsSansSchema = entitlements.rdd.map[(String, Integer)](f =&gt; (f.getAs[String](\"email\"), f.getAs[Integer](\"entitlementtoken\")))\r\n\r\nval ssc = new StreamingContext(sc, Seconds(5))\r\n\r\nssc.couchbaseStream(\"transformative\")\r\n  .filter(_.isInstanceOf[Mutation])\r\n  .map(m =&gt; (new String(m.asInstanceOf[Mutation].key), new String(m.asInstanceOf[Mutation].content)))\r\n  .map(s =&gt; CreateMappableRdd(s))\r\n  .filter(_._2.content().get(\"entitlementtoken\").eq(null))\r\n  .foreachRDD(rdd =&gt; {\r\n    rdd\r\n      .join(entitlementsSansSchema)\r\n      .map(mergeIntoDoc)\r\n      \/\/.foreach(println) \/\/ a good place to see the effect\r\n      .saveToCouchbase(\"transformative\")\r\n  })\r\n\r\nssc.start()\r\nssc.awaitTermination()\r\n<\/code><\/pre>\n<p>The complete example is in the <a href=\"https:\/\/github.com\/couchbaselabs\/couchbase-spark-samples\/blob\/43594b5c1e8eb3bd010d6eca3f9318c561dde382\/src\/main\/scala\/TransformationExample.scala\">couchbase-spark-samples<\/a> repository.<\/p>\n<p>The beauty of this sample is that it's easy to understand what is going on and fairly trivial 
to scale. Your own transformation is likely to be more complex, but this should give you an idea of what is possible and something to build upon.<\/p>\n<p>There is always room for improvement.<\/p>\n<p>One issue is that the MySQL data may be larger than what I want to load into memory. Spark takes this into account by offering a way to split DataFrames. I didn't need that here, and I wanted to keep the sample readable. Another thing that may help is the ability to reference a SparkContext from within an existing StreamingContext. Spark doesn't allow that at the moment, for good reasons, but I'd argue that this simple use case of doing a single-record lookup from inside the stream makes sense.<\/p>\n<p>In the Couchbase Connector, the DCP interface is currently classified as volatile and should be considered experimental. Also, the sample above is quite fast, but it needs some help to scale. A future update from my colleague Sergey Avseyev will allow the DCP streams to be split across Spark workers to parallelize this transformation.<\/p>\n<h2>In Conclusion<\/h2>\n<p>Spark is a great new tool for this kind of transformation. The same techniques could certainly be applied to migrating to Couchbase from a different data source, such as a relational database. The technique could even be extended with Spark's machine learning to build a model around the Couchbase streaming data and anticipate results.<\/p>","protected":false},"excerpt":{"rendered":"<p>The Need Ah, database migrations. 
After you\u2019ve migrated to Couchbase where the JSON document data representation is much more flexible, you won\u2019t as often need to twist your development process through the change request pipeline followed by the downtime, possible [&hellip;]<\/p>","protected":false},"author":41,"featured_media":13873,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[1],"tags":[],"ppma_author":[8993],"class_list":["post-2095","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.7.1 (Yoast SEO v25.7) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Bulk Transformations of Couchbase Data Using Apache Spark<\/title>\n<meta name=\"description\" content=\"Spark is a great new tool for bulk transformation. Check out the techniques for migrating to Couchbase from a different data source.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.couchbase.com\/blog\/pt\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/\" \/>\n<meta property=\"og:locale\" content=\"pt_BR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Bulk Transformations of Couchbase Data Using Apache Spark with an External Source\" \/>\n<meta property=\"og:description\" content=\"Spark is a great new tool for bulk transformation. 
Check out the techniques for migrating to Couchbase from a different data source.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.couchbase.com\/blog\/pt\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/\" \/>\n<meta property=\"og:site_name\" content=\"The Couchbase Blog\" \/>\n<meta property=\"article:published_time\" content=\"2015-12-09T22:23:52+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-09-12T08:27:40+00:00\" \/>\n<meta name=\"author\" content=\"Matt Ingenthron, Senior Director, SDK Engineering, Couchbase\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@ingenthr\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Matt Ingenthron, Senior Director, SDK Engineering, Couchbase\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutos\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/\"},\"author\":{\"name\":\"Matt Ingenthron, Senior Director, SDK Engineering, Couchbase\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/35e939d9fe3dfb1a06f1714ee54bb098\"},\"headline\":\"Bulk Transformations of Couchbase Data Using Apache Spark with an External 
Source\",\"datePublished\":\"2015-12-09T22:23:52+00:00\",\"dateModified\":\"2024-09-12T08:27:40+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/\"},\"wordCount\":872,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2022\/11\/couchbase-nosql-dbaas.png\",\"articleSection\":[\"Uncategorized\"],\"inLanguage\":\"pt-BR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/\",\"name\":\"Bulk Transformations of Couchbase Data Using Apache Spark\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2022\/11\/couchbase-nosql-dbaas.png\",\"datePublished\":\"2015-12-09T22:23:52+00:00\",\"dateModified\":\"2024-09-12T08:27:40+00:00\",\"description\":\"Spark is a great new tool for bulk transformation. 
Check out the techniques for migrating to Couchbase from a different data source.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#breadcrumb\"},\"inLanguage\":\"pt-BR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-BR\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#primaryimage\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2022\/11\/couchbase-nosql-dbaas.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2022\/11\/couchbase-nosql-dbaas.png\",\"width\":1800,\"height\":630},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.couchbase.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Bulk Transformations of Couchbase Data Using Apache Spark with an External Source\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"name\":\"The Couchbase Blog\",\"description\":\"Couchbase, the NoSQL 
Database\",\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"pt-BR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\",\"name\":\"The Couchbase Blog\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-BR\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"width\":218,\"height\":34,\"caption\":\"The Couchbase Blog\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/35e939d9fe3dfb1a06f1714ee54bb098\",\"name\":\"Matt Ingenthron, Senior Director, SDK Engineering, Couchbase\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-BR\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/e8b5b257dfa7206fd7c2a5d628fc580b\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fd6787feb079d2c67a3835a47901bbb9c03b8921abced82a2a1f6975816df2ad?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fd6787feb079d2c67a3835a47901bbb9c03b8921abced82a2a1f6975816df2ad?s=96&d=mm&r=g\",\"caption\":\"Matt Ingenthron, Senior Director, SDK Engineering, Couchbase\"},\"description\":\"Matt Ingenthron is the Senior Director in Engineering at Couchbase where he focuses on the developer interface across SDKs, connectors and other projects. 
He has been a contributor to the memcached project, one of the maintainers of the Java spymemcached client, and a core developer on Couchbase.\",\"sameAs\":[\"https:\/\/x.com\/ingenthr\"],\"url\":\"https:\/\/www.couchbase.com\/blog\/pt\/author\/matt-ingenthron\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Bulk Transformations of Couchbase Data Using Apache Spark","description":"Spark is a great new tool for bulk transformation. Check out the techniques for migrating to Couchbase from a different data source.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.couchbase.com\/blog\/pt\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/","og_locale":"pt_BR","og_type":"article","og_title":"Bulk Transformations of Couchbase Data Using Apache Spark with an External Source","og_description":"Spark is a great new tool for bulk transformation. Check out the techniques for migrating to Couchbase from a different data source.","og_url":"https:\/\/www.couchbase.com\/blog\/pt\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/","og_site_name":"The Couchbase Blog","article_published_time":"2015-12-09T22:23:52+00:00","article_modified_time":"2024-09-12T08:27:40+00:00","author":"Matt Ingenthron, Senior Director, SDK Engineering, Couchbase","twitter_card":"summary_large_image","twitter_creator":"@ingenthr","twitter_misc":{"Written by":"Matt Ingenthron, Senior Director, SDK Engineering, Couchbase","Est. 
reading time":"4 minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#article","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/"},"author":{"name":"Matt Ingenthron, Senior Director, SDK Engineering, Couchbase","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/35e939d9fe3dfb1a06f1714ee54bb098"},"headline":"Bulk Transformations of Couchbase Data Using Apache Spark with an External Source","datePublished":"2015-12-09T22:23:52+00:00","dateModified":"2024-09-12T08:27:40+00:00","mainEntityOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/"},"wordCount":872,"commentCount":0,"publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2022\/11\/couchbase-nosql-dbaas.png","articleSection":["Uncategorized"],"inLanguage":"pt-BR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/","url":"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/","name":"Bulk Transformations of Couchbase Data Using Apache 
Spark","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#primaryimage"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2022\/11\/couchbase-nosql-dbaas.png","datePublished":"2015-12-09T22:23:52+00:00","dateModified":"2024-09-12T08:27:40+00:00","description":"Spark is a great new tool for bulk transformation. Check out the techniques for migrating to Couchbase from a different data source.","breadcrumb":{"@id":"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#breadcrumb"},"inLanguage":"pt-BR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/"]}]},{"@type":"ImageObject","inLanguage":"pt-BR","@id":"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#primaryimage","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2022\/11\/couchbase-nosql-dbaas.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2022\/11\/couchbase-nosql-dbaas.png","width":1800,"height":630},{"@type":"BreadcrumbList","@id":"https:\/\/www.couchbase.com\/blog\/bulk-transformations-of-couchbase-data-with-an-external-source-using-apache-spark\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.couchbase.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Bulk Transformations of Couchbase Data Using Apache Spark with an External 
Source"}]},{"@type":"WebSite","@id":"https:\/\/www.couchbase.com\/blog\/#website","url":"https:\/\/www.couchbase.com\/blog\/","name":"Blog do Couchbase","description":"Couchbase, o banco de dados NoSQL","publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"pt-BR"},{"@type":"Organization","@id":"https:\/\/www.couchbase.com\/blog\/#organization","name":"Blog do Couchbase","url":"https:\/\/www.couchbase.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"pt-BR","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","width":218,"height":34,"caption":"The Couchbase Blog"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/35e939d9fe3dfb1a06f1714ee54bb098","name":"Matt Ingenthron, diretor s\u00eanior de engenharia de SDK, Couchbase","image":{"@type":"ImageObject","inLanguage":"pt-BR","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/e8b5b257dfa7206fd7c2a5d628fc580b","url":"https:\/\/secure.gravatar.com\/avatar\/fd6787feb079d2c67a3835a47901bbb9c03b8921abced82a2a1f6975816df2ad?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fd6787feb079d2c67a3835a47901bbb9c03b8921abced82a2a1f6975816df2ad?s=96&d=mm&r=g","caption":"Matt Ingenthron, Senior Director, SDK Engineering, Couchbase"},"description":"Matt Ingenthron is the Senior Director in Engineering at Couchbase where he focuses on the developer interface across SDKs, connectors and other projects. 
He has been a contributor to the memcached project, one of the maintainers of the Java spymemcached client, and a core developer on Couchbase.","sameAs":["https:\/\/x.com\/ingenthr"],"url":"https:\/\/www.couchbase.com\/blog\/pt\/author\/matt-ingenthron\/"}]}},"authors":[{"term_id":8993,"user_id":41,"is_guest":0,"slug":"matt-ingenthron","display_name":"Matt Ingenthron, Senior Director, SDK Engineering, Couchbase","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/fd6787feb079d2c67a3835a47901bbb9c03b8921abced82a2a1f6975816df2ad?s=96&d=mm&r=g","first_name":"Matt","last_name":"Ingenthron","user_url":"","author_category":"","description":"Matt Ingenthron \u00e9 o diretor s\u00eanior de engenharia da Couchbase, onde se concentra na interface do desenvolvedor em SDKs, conectores e outros projetos.  Ele contribuiu para o projeto memcached, foi um dos mantenedores do cliente Java spymemcached e um dos principais desenvolvedores do Couchbase."}],"_links":{"self":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/posts\/2095","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/users\/41"}],"replies":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/comments?post=2095"}],"version-history":[{"count":0,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/posts\/2095\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/media\/13873"}],"wp:attachment":[{"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/media?parent=2095"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/categories?post=2095"},{"taxonomy":"post_tag","embeddable":tr
ue,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/tags?post=2095"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/pt\/wp-json\/wp\/v2\/ppma_author?post=2095"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}