{"id":1191,"date":"2017-11-30T15:47:51","date_gmt":"2017-11-30T23:47:51","guid":{"rendered":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/"},"modified":"2017-11-30T15:47:51","modified_gmt":"2017-11-30T23:47:51","slug":"zero-effort-machine-learning-couchbase-spark-mllib","status":"publish","type":"post","link":"https:\/\/www.couchbase.com\/blog\/es\/zero-effort-machine-learning-couchbase-spark-mllib\/","title":{"rendered":"Zero Effort Machine Learning with Couchbase and Spark MLlib"},"content":{"rendered":"\n<p>Over the past few years, machine learning has proven to be a technology worth investing in heavily: you can easily find dozens of papers describing how company X saved tons of money by adding some level of AI to its processes.<br>\nSurprisingly, I still see many industries that are skeptical about it, and others that think it is &#8220;cool&#8221; but do not have a use case in mind yet.<\/p>\n\n\n\n<p>I believe this dissonance comes down to two main factors: many companies have no idea how AI fits into their business, and to most developers it still sounds like black magic.<\/p>\n\n\n\n<p>That is why I would like to show you today how you can get started with machine learning with almost zero effort.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Linear Regression<\/h4>\n\n\n\n<p>At the most basic level of machine learning we have Linear Regression, which is roughly an algorithm that tries to &#8220;explain&#8221; a number by assigning weights to a set of features. Let&#8217;s see some examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The price of a house could be explained by things like size, location, number of bedrooms and bathrooms.<\/li>\n\n\n<li>The price of a car could be explained by its model, year, mileage, condition, etc.<\/li>\n\n\n<li>The time spent on a given task could be predicted by the number of subtasks, level of 
difficulty, worker experience, etc.<\/li>\n\n<\/ul>\n\n\n\n<p>There are plenty of use cases where Linear Regression (or other regression types) can be used, but let&#8217;s focus on the first one, house prices.<\/p>\n\n\n\n<p>Imagine we are running a real estate company in a particular region of the country. As we are an old company, we have records of which houses were sold in the past and for how much.<\/p>\n\n\n\n<p>In this case, each row in our historical data will look like this:<\/p>\n\n\n<p>[crayon lang=&#8221;js&#8221; decode=&#8221;true&#8221;]{<br \/>\n&quot;id&quot;: 7129300520,<br \/>\n&quot;date&quot;: &quot;20141013T000000&quot;,<br \/>\n&quot;price&quot;: 221900,<br \/>\n&quot;bedrooms&quot;: 3,<br \/>\n&quot;bathrooms&quot;: 1,<br \/>\n&quot;sqft_living&quot;: 1180,<br \/>\n&quot;sqft_lot&quot;: 5650,<br \/>\n&quot;floors&quot;: 1,<br \/>\n&quot;waterfront&quot;: 0,<br \/>\n&quot;view&quot;: 0,<br \/>\n&quot;condition&quot;: 3,<br \/>\n&quot;grade&quot;: 7,<br \/>\n&quot;sqft_above&quot;: 1180,<br \/>\n&quot;sqft_basement&quot;: 0,<br \/>\n&quot;yr_built&quot;: 1955,<br \/>\n&quot;yr_renovated&quot;: 0,<br \/>\n&quot;zipcode&quot;: 98178,<br \/>\n&quot;lat&quot;: 47.5112,<br \/>\n&quot;long&quot;: -122.257,<br \/>\n&quot;sqft_living15&quot;: 1340,<br \/>\n&quot;sqft_lot15&quot;: 5650<br \/>\n}[\/crayon]<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The problem &#8211; How to price a house<\/h3>\n\n\n\n<p>Now, imagine you just joined the company and you have to sell the following house:<\/p>\n\n\n<p>[crayon lang=&#8221;js&#8221; decode=&#8221;true&#8221;]{<br \/>\n&quot;id&quot;: 1000001,<br \/>\n&quot;date&quot;: &quot;20150422T000000&quot;,<br \/>\n&quot;bedrooms&quot;: 6,<br \/>\n&quot;bathrooms&quot;: 3,<br \/>\n&quot;price&quot;: null,<br \/>\n&quot;sqft_living&quot;: 2400,<br \/>\n&quot;sqft_lot&quot;: 9373,<br \/>\n&quot;floors&quot;: 2,<br \/>\n&quot;waterfront&quot;: 0,<br \/>\n&quot;view&quot;: 0,<br \/>\n&quot;condition&quot;: 3,<br \/>\n&quot;grade&quot;: 7,<br \/>\n&quot;sqft_above&quot;: 2400,<br \/>\n&quot;sqft_basement&quot;: 0,<br \/>\n&quot;yr_built&quot;: 1991,<br \/>\n&quot;yr_renovated&quot;: 0,<br \/>\n&quot;zipcode&quot;: 98002,<br \/>\n&quot;lat&quot;: 47.3262,<br \/>\n&quot;long&quot;: -122.214,<br \/>\n&quot;sqft_living15&quot;: 2060,<br \/>\n&quot;sqft_lot15&quot;: 7316<br \/>\n}<br \/>\n[\/crayon]<\/p>\n\n\n\n<p><strong>How much would you sell it for?<\/strong><\/p>\n\n\n\n<p>This question would be very challenging if you had never sold a similar house before. Luckily, you have the right tool for the job: Linear Regression.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Answer &#8211; Predicting house prices with Linear Regression<\/h3>\n\n\n\n<p>Before you go further, you will need to install the following items:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.couchbase.com\/downloads\/\">Couchbase Server 5<\/a><\/li>\n\n\n<li><a href=\"https:\/\/spark.apache.org\/releases\/spark-release-2-2-0.html\">Spark 2.2<\/a><\/li>\n\n\n<li><a href=\"https:\/\/www.scala-sbt.org\/download.html\">SBT<\/a> (as we are using Scala)<\/li>\n\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Loading the Dataset<\/h4>\n\n\n\n<p>With your Couchbase Server running, go to the administrative portal (usually at http:\/\/127.0.0.1:8091) and create a new bucket called <strong>houses_prices<\/strong>.<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-4281 aligncenter\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/bucket_creation-300x258-1.png\" alt=\"\" width=\"333\" height=\"286\"><\/p>\n\n\n\n<p>Now, let&#8217;s clone our tutorial code:<\/p>\n\n\n<p>[crayon lang=&#8221;default&#8221; decode=&#8221;true&#8221;]git 
clone https:\/\/github.com\/couchbaselabs\/couchbase-spark-mllib-sample.git[\/crayon]<\/p>\n\n\n\n<p>In the root folder there is a file called <strong>house_prices_train_data.zip<\/strong>. It is our dataset, which I borrowed from an old machine learning course on <a href=\"https:\/\/www.coursera.org\/learn\/ml-foundations\/\">Coursera<\/a>. Please unzip it and then run the following command:<\/p>\n\n\n<p>[crayon lang=&#8221;default&#8221; decode=&#8221;true&#8221;].\/cbimport json -c couchbase:\/\/127.0.0.1 -u YOUR_USER -p YOUR_PASSWORD -b houses_prices -d file:\/\/&lt;PATH_TO_UNZIPPED_FILE&gt;\/house_prices_train_data -f list -g key::%id% -t 4[\/crayon]<\/p>\n\n\n\n<p><strong>TIP<\/strong>: If you are not familiar with <strong>cbimport<\/strong>, please <a href=\"https:\/\/developer.couchbase.com\/documentation\/server\/current\/tools\/cbimport.html\">check this tutorial<\/a>.<\/p>\n\n\n\n<p>If the command ran successfully, you should see that your <strong>houses_prices<\/strong> bucket has been populated:<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-4282 aligncenter\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/filled_bucket-300x135-1.png\" alt=\"\" width=\"749\" height=\"337\"><\/p>\n\n\n\n<p>Let&#8217;s also quickly add a primary index for it:<\/p>\n\n\n<p>[crayon lang=&#8221;default&#8221; decode=&#8221;true&#8221;]CREATE PRIMARY INDEX ON `houses_prices`[\/crayon]<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-4283 aligncenter\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/index_creation-300x172-1.png\" alt=\"\" width=\"680\" height=\"390\"><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Time to Code!<\/h4>\n\n\n\n<p>Our environment is ready, so it is time to code!<\/p>\n\n\n\n<p>In the <a 
href=\"https:\/\/github.com\/couchbaselabs\/couchbase-spark-mllib-sample\/blob\/master\/src\/main\/scala\/LinearRegressionExample.scala\">LinearRegressionExample<\/a> class we start by creating the Spark session with our bucket credentials:<\/p>\n\n\n<p>[crayon lang=&#8221;scala&#8221; decode=&#8221;true&#8221;]val spark = SparkSession<br \/>\n    .builder()<br \/>\n    .appName(&quot;SparkSQLExample&quot;)<br \/>\n    .master(&quot;local[*]&quot;) \/\/ run Spark locally inside this JVM, great for testing<br \/>\n    .config(&quot;spark.couchbase.nodes&quot;, &quot;127.0.0.1&quot;) \/\/ connect to couchbase on localhost<br \/>\n    .config(&quot;spark.couchbase.bucket.houses_prices&quot;, &quot;&quot;) \/\/ open the houses_prices bucket with empty password<br \/>\n    .config(&quot;com.couchbase.username&quot;, &quot;YOUR_USER&quot;)<br \/>\n    .config(&quot;com.couchbase.password&quot;, &quot;YOUR_PASSWORD&quot;)<br \/>\n    .getOrCreate()[\/crayon]<\/p>\n\n\n\n<p>Then we load all the data from the bucket:<\/p>\n\n\n<p>[crayon lang=&#8221;scala&#8221; decode=&#8221;true&#8221;]val houses = spark.read.couchbase()[\/crayon]<\/p>\n\n\n\n<p>As Spark evaluates lazily, the data is not loaded until it is really needed. 
You can clearly see the beauty of the <strong>Couchbase Connector<\/strong> above: we just converted JSON documents into a Spark DataFrame with zero effort.<\/p>\n\n\n\n<p>With other databases, for example, you would have to export the data to a CSV file in some specific format, copy it to your machine, load it, and run some extra steps to convert it to a dataframe (not to mention the cases where the generated file is too big).<\/p>\n\n\n\n<p>In the real world you would need to do some filtering instead of just grabbing all the data. Again, our connector is there for you, as you can even run N1QL queries with it:<\/p>\n\n\n<p>[crayon lang=&#8221;scala&#8221; decode=&#8221;true&#8221;]\/\/ loading documents by their type<br \/>\nval airlines = spark.read.couchbase(EqualTo(&quot;type&quot;, &quot;airline&quot;))<\/p>\n<p>\/\/ loading data using N1QL<br \/>\n\/\/ This query groups airports by country and counts them.<br \/>\nval query = N1qlQuery.simple(&quot;&quot; +<br \/>\n    &quot;select country, count(*) as count &quot; +<br \/>\n    &quot;from `travel-sample` &quot; +<br \/>\n    &quot;where type = 'airport' &quot; +<br \/>\n    &quot;group by country &quot; +<br \/>\n    &quot;order by count desc&quot;)<\/p>\n<p>val schema = StructType(<br \/>\n   StructField(&quot;count&quot;, IntegerType) ::<br \/>\n   StructField(&quot;country&quot;, StringType) :: Nil<br \/>\n)<\/p>\n<p>val rdd = spark.sparkContext.couchbaseQuery(query).map(<br \/>\n      r =&gt; Row(r.value.getInt(&quot;count&quot;), r.value.getString(&quot;country&quot;)))<br \/>\nspark.createDataFrame(rdd, schema).show()[\/crayon]<\/p>\n\n\n\n<p><strong>TIP<\/strong>: There are a lot of examples of how to use the Couchbase connector <a href=\"https:\/\/github.com\/couchbaselabs\/couchbase-spark-samples\/tree\/master\/src\/main\/scala\">here<\/a>.<\/p>\n\n\n\n<p>Our dataframe still looks exactly like what we had in our database:<\/p>\n\n\n<p>[crayon 
lang=&#8221;scala&#8221; decode=&#8221;true&#8221;]houses.show(10)[\/crayon]<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-4284 aligncenter\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/dataframe_data-300x59-1.png\" alt=\"\" width=\"906\" height=\"178\"><\/p>\n\n\n\n<p>There are two different types of data here: &#8220;<em>scalar numbers<\/em>&#8221; such as <strong>bathrooms<\/strong> and <strong>sqft_living<\/strong>, and &#8220;<em>categorical variables<\/em>&#8221; such as <strong>zipcode<\/strong> and <strong>yr_renovated<\/strong>. Categorical variables are not just simple numbers; they carry a deeper meaning because they describe a property. The zipcode, for example, represents the location of the house.<\/p>\n\n\n\n<p>Linear Regression does not handle categorical variables well, so if we really want to use zipcode in our Linear Regression, as it seems to be a relevant field for predicting the price of a house, we have to convert it to <strong>dummy variables<\/strong>, which is a fairly simple process:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Select all distinct values of the target column. <strong>Ex: <\/strong><span class=\"lang:default decode:true crayon-inline \">SELECT DISTINCT(ZIPCODE) FROM HOUSES_PRICES<\/span><\/li>\n\n\n<li>Convert each distinct value into a column. 
<strong>Ex:<\/strong> zipcode_98002, zipcode_98188, zipcode_98059<\/li>\n\n\n<li>Update those new columns with 1s and 0s according to the zipcode value of each row:<\/li>\n\n<\/ol>\n\n\n\n<p><strong>Ex:<\/strong><\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-4285\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/data_before_transformation-300x179-1.png\" alt=\"\" width=\"300\" height=\"179\"><\/p>\n\n\n\n<p>The table above will be transformed to:<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-4288\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/data_after_transformation-300x74-1.png\" alt=\"\" width=\"690\" height=\"170\"><\/p>\n\n\n\n<p>That is what we are doing on the line below:<\/p>\n\n\n<p>[crayon lang=&#8221;scala&#8221; decode=&#8221;true&#8221;]val df = transformCategoricalFeatures(houses)[\/crayon]<\/p>\n\n\n\n<p>Converting categorical variables is a very standard procedure, and Spark already has utilities that do this work for you.<\/p>\n\n\n\n<p><strong>NOTE:<\/strong> The final dataframe will not look exactly like the example shown above, as it is already optimized to avoid the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sparse_matrix\">sparse matrix<\/a> problem.<\/p>\n\n\n\n<p>Now we can select the fields we would like to use and group them in a vector called <strong>features<\/strong>. As this linear regression implementation expects a field called <strong>label<\/strong>, we also have to rename the <strong>price<\/strong> column.<\/p>\n\n\n\n<p>You can play around with those features, removing or adding them as you wish. Later you can try, for example, removing the &#8220;<em>sqft_living<\/em>&#8221; feature to see how much worse the algorithm performs.<\/p>\n\n\n\n<p>Finally, we will only use houses whose price is not null to train our machine learning algorithm, as our whole goal is to make our Linear Regression &#8220;learn&#8221; how to predict the price from a given set of features.<\/p>\n\n\n\n<p>Here is where the magic happens. First we split our data into training (<em>80%<\/em>) and test (<em>20%<\/em>) sets, although for the purpose of this article we will ignore the test data. Then we create our LinearRegression instance and <strong>fit<\/strong> it to our training data.<\/p>\n\n\n\n<p><em>The <strong>lrModel<\/strong> variable is already a trained model capable of predicting house prices!<\/em><\/p>\n\n\n\n<p>Before we start predicting things, let&#8217;s check some metrics of our trained model.<\/p>\n\n\n\n<p>The one you should care about here is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Root-mean-square_deviation\">RMSE &#8211; Root Mean Squared Error<\/a>, which is roughly the <strong>average deviation between what our model predicts and the actual sale price<\/strong>.<\/p>\n\n\n\n<p>On average we miss the actual price by <em>$147556.0841305963<\/em>, which is not bad at all considering we barely did any <a href=\"https:\/\/en.wikipedia.org\/wiki\/Feature_engineering\">feature engineering<\/a> and did not remove any outliers (some houses might have inexplicably high or low prices, which can throw off your Linear Regression).<\/p>\n\n\n\n<p>There is only one house with a missing price in this dataset, exactly the one we pointed out at the beginning.<\/p>\n\n\n\n<p>And now we can finally predict the expected house price:<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-4287\" src=\"https:\/\/www.couchbase.com\/wp-content\/uploads\/sites\/5\/2026\/05\/predicted_price.png\" alt=\"\" width=\"292\" height=\"164\"><\/p>\n\n\n\n<p>Awesome, isn&#8217;t it?<\/p>\n\n\n\n<p>For production use, you would still need to do <a href=\"https:\/\/en.wikipedia.org\/wiki\/Model_selection\">model selection<\/a> first, check other metrics of your regression, and save the model instead of training it on the fly, but it&#8217;s amazing how much can be done with less than 100 lines of code!<\/p>\n\n\n\n<p>If you have any questions, feel free to ask me on Twitter at <a href=\"https:\/\/twitter.com\/deniswsrosa\">@deniswsrosa<\/a> or on our <a href=\"https:\/\/www.couchbase.com\/forums\/\">forums<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The past few years we noticed how machine learning had been proven to be a technology in which companies should invest massively, you can easily find dozens of papers talking about how company X saved tons of money by adding some level of AI into their process. 
Surprisingly I still notice many industries being skeptical [&hellip;]<\/p>\n","protected":false},"author":8754,"featured_media":18,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[1],"tags":[],"ppma_author":[287],"class_list":["post-1191","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.6 (Yoast SEO v27.6) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Zero Effort Machine Learning with Couchbase and Spark MLlib - The Couchbase Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.couchbase.com\/blog\/es\/zero-effort-machine-learning-couchbase-spark-mllib\/\" \/>\n<meta property=\"og:locale\" content=\"es_MX\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Zero Effort Machine Learning with Couchbase and Spark MLlib\" \/>\n<meta property=\"og:description\" content=\"The past few years we noticed how machine learning had been proven to be a technology in which companies should invest massively, you can easily find dozens of papers talking about how company X saved tons of money by adding some level of AI into their process. 
Surprisingly I still notice many industries being skeptical [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.couchbase.com\/blog\/es\/zero-effort-machine-learning-couchbase-spark-mllib\/\" \/>\n<meta property=\"og:site_name\" content=\"The Couchbase Blog\" \/>\n<meta property=\"article:published_time\" content=\"2017-11-30T23:47:51+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1800\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Denis Rosa, Developer Advocate, Couchbase\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@deniswsrosa\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Denis Rosa, Developer Advocate, Couchbase\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutos\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/\"},\"author\":{\"name\":\"Denis Rosa, Developer Advocate, Couchbase\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/person\\\/fe3c5273e805e72a5294611a48f62257\"},\"headline\":\"Zero Effort Machine Learning with Couchbase and Spark MLlib\",\"datePublished\":\"2017-11-30T23:47:51+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/\"},\"wordCount\":1876,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/couchbase-nosql-dbaas.png\",\"articleSection\":[\"Uncategorized\"],\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/\",\"name\":\"Zero Effort Machine Learning with Couchbase and Spark MLlib - The Couchbase 
Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/couchbase-nosql-dbaas.png\",\"datePublished\":\"2017-11-30T23:47:51+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/#breadcrumb\"},\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/couchbase-nosql-dbaas.png\",\"contentUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/05\\\/couchbase-nosql-dbaas.png\",\"width\":1800,\"height\":630},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/zero-effort-machine-learning-couchbase-spark-mllib\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Zero Effort Machine Learning with Couchbase and Spark MLlib\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/\",\"name\":\"The Couchbase Blog\",\"description\":\"Couchbase, the NoSQL 
Database\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"es\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#organization\",\"name\":\"The Couchbase Blog\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/06\\\/logo.svg\",\"contentUrl\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/5\\\/2026\\\/06\\\/logo.svg\",\"width\":\"1024\",\"height\":\"1024\",\"caption\":\"The Couchbase Blog\"},\"image\":{\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/#\\\/schema\\\/person\\\/fe3c5273e805e72a5294611a48f62257\",\"name\":\"Denis Rosa, Developer Advocate, Couchbase\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f8d1f5c13115122cab89d0f229b904480bfe20d3dfbb093fe9734cda5235d419?s=96&d=mm&r=gbe0716f6199cfb09417c92cf7a8fa8d6\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f8d1f5c13115122cab89d0f229b904480bfe20d3dfbb093fe9734cda5235d419?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f8d1f5c13115122cab89d0f229b904480bfe20d3dfbb093fe9734cda5235d419?s=96&d=mm&r=g\",\"caption\":\"Denis Rosa, Developer Advocate, Couchbase\"},\"description\":\"Denis Rosa is a Developer Advocate for Couchbase and lives 
in Munich - Germany. He has a solid experience as a software engineer and speaks fluently Java, Python, Scala and Javascript. Denis likes to write about search, Big Data, AI, Microservices and everything else that would help developers to make a beautiful, faster, stable and scalable app.\",\"sameAs\":[\"https:\\\/\\\/x.com\\\/deniswsrosa\"],\"url\":\"https:\\\/\\\/www.couchbase.com\\\/blog\\\/es\\\/author\\\/denis-rosa\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Zero Effort Machine Learning with Couchbase and Spark MLlib - The Couchbase Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.couchbase.com\/blog\/es\/zero-effort-machine-learning-couchbase-spark-mllib\/","og_locale":"es_MX","og_type":"article","og_title":"Zero Effort Machine Learning with Couchbase and Spark MLlib","og_description":"The past few years we noticed how machine learning had been proven to be a technology in which companies should invest massively, you can easily find dozens of papers talking about how company X saved tons of money by adding some level of AI into their process. Surprisingly I still notice many industries being skeptical [&hellip;]","og_url":"https:\/\/www.couchbase.com\/blog\/es\/zero-effort-machine-learning-couchbase-spark-mllib\/","og_site_name":"The Couchbase Blog","article_published_time":"2017-11-30T23:47:51+00:00","og_image":[{"width":1800,"height":630,"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png","type":"image\/png"}],"author":"Denis Rosa, Developer Advocate, Couchbase","twitter_card":"summary_large_image","twitter_creator":"@deniswsrosa","twitter_misc":{"Written by":"Denis Rosa, Developer Advocate, Couchbase","Est. 
reading time":"9 minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/#article","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/"},"author":{"name":"Denis Rosa, Developer Advocate, Couchbase","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/fe3c5273e805e72a5294611a48f62257"},"headline":"Zero Effort Machine Learning with Couchbase and Spark MLlib","datePublished":"2017-11-30T23:47:51+00:00","mainEntityOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/"},"wordCount":1876,"commentCount":0,"publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png","articleSection":["Uncategorized"],"inLanguage":"es","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/","url":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/","name":"Zero Effort Machine Learning with Couchbase and Spark MLlib - The Couchbase 
Blog","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/#primaryimage"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png","datePublished":"2017-11-30T23:47:51+00:00","breadcrumb":{"@id":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/#breadcrumb"},"inLanguage":"es","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/"]}]},{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/#primaryimage","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/05\/couchbase-nosql-dbaas.png","width":1800,"height":630},{"@type":"BreadcrumbList","@id":"https:\/\/www.couchbase.com\/blog\/zero-effort-machine-learning-couchbase-spark-mllib\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.couchbase.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Zero Effort Machine Learning with Couchbase and Spark MLlib"}]},{"@type":"WebSite","@id":"https:\/\/www.couchbase.com\/blog\/#website","url":"https:\/\/www.couchbase.com\/blog\/","name":"The Couchbase Blog","description":"Couchbase, the NoSQL 
Database","publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"es"},{"@type":"Organization","@id":"https:\/\/www.couchbase.com\/blog\/#organization","name":"The Couchbase Blog","url":"https:\/\/www.couchbase.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/06\/logo.svg","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/5\/2026\/06\/logo.svg","width":"1024","height":"1024","caption":"The Couchbase Blog"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/fe3c5273e805e72a5294611a48f62257","name":"Denis Rosa, Developer Advocate, Couchbase","image":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/secure.gravatar.com\/avatar\/f8d1f5c13115122cab89d0f229b904480bfe20d3dfbb093fe9734cda5235d419?s=96&d=mm&r=gbe0716f6199cfb09417c92cf7a8fa8d6","url":"https:\/\/secure.gravatar.com\/avatar\/f8d1f5c13115122cab89d0f229b904480bfe20d3dfbb093fe9734cda5235d419?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f8d1f5c13115122cab89d0f229b904480bfe20d3dfbb093fe9734cda5235d419?s=96&d=mm&r=g","caption":"Denis Rosa, Developer Advocate, Couchbase"},"description":"Denis Rosa is a Developer Advocate for Couchbase and lives in Munich - Germany. He has a solid experience as a software engineer and speaks fluently Java, Python, Scala and Javascript. 
Denis likes to write about search, Big Data, AI, Microservices and everything else that would help developers to make a beautiful, faster, stable and scalable app.","sameAs":["https:\/\/x.com\/deniswsrosa"],"url":"https:\/\/www.couchbase.com\/blog\/es\/author\/denis-rosa\/"}]}},"acf":[],"authors":[{"term_id":287,"user_id":8754,"is_guest":0,"slug":"denis-rosa","display_name":"Denis Rosa, Developer Advocate, Couchbase","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.couchbase.com\/blog\/es\/wp-json\/wp\/v2\/posts\/1191","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.couchbase.com\/blog\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.couchbase.com\/blog\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/es\/wp-json\/wp\/v2\/users\/8754"}],"replies":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/es\/wp-json\/wp\/v2\/comments?post=1191"}],"version-history":[{"count":0,"href":"https:\/\/www.couchbase.com\/blog\/es\/wp-json\/wp\/v2\/posts\/1191\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/es\/wp-json\/wp\/v2\/media\/18"}],"wp:attachment":[{"href":"https:\/\/www.couchbase.com\/blog\/es\/wp-json\/wp\/v2\/media?parent=1191"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/es\/wp-json\/wp\/v2\/categories?post=1191"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/es\/wp-json\/wp\/v2\/tags?post=1191"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/es\/wp-json\/wp\/v2\/ppma_author?post=1191"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}