{"id":12617,"date":"2021-12-21T08:00:27","date_gmt":"2021-12-21T16:00:27","guid":{"rendered":"https:\/\/www.couchbase.com\/blog\/?p=12617"},"modified":"2025-06-13T17:20:50","modified_gmt":"2025-06-14T00:20:50","slug":"how-couchbase-simplifies-data-science-part-1","status":"publish","type":"post","link":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/","title":{"rendered":"How Couchbase Simplifies Data Science (Part 1)"},"content":{"rendered":"<p><em><span style=\"font-weight: 400\">This post was co-authored by Karen Yuan, a High School Intern.\u00a0<\/span><\/em><\/p>\n<p>Data science extracts knowledge from data and applies that knowledge to solve problems. In this two-part series, we will learn how the Couchbase Data Platform can meet various data science needs while reducing the number of tools needed during the process.<\/p>\n<h4>Overview<\/h4>\n<p>Data science workflows involve several steps, as shown in Figure 1. Data scientists often have to use different tools for different steps, which complicates the process and makes it less efficient.<\/p>\n<p>For example, data scientists perform exploratory data analysis to determine which attributes in the training data are important for their use case. To do this, data scientists usually load training data from a database into a different tool, e.g., a Jupyter notebook. But training datasets can be huge and consume a lot of memory. Transferring large datasets also consumes network bandwidth and slows the process. <strong>Clearly, the best place to analyze data is the database where the data is stored.<\/strong><\/p>\n<p>The data scientist needs to read only the essential attributes from the database. This simplifies data analysis, reduces memory usage in the training session and limits the amount of data transferred over the network. 
For example, consider training data with millions of JSON documents; each has ten fields. If only 8 of these fields are needed for training, the data scientist can save ~20% of memory by ignoring the rest (assuming all fields are of the same size).<\/p>\n<p>In this and the following article, we will learn:<\/p>\n<ol>\n<li>How to do exploratory data analysis (EDA) and visualize data science results using the Couchbase Query service. The Query and Analytics services run on the same Couchbase cluster as the training data and predictions. Using these services to analyze the training data and the predictions makes the data science process easy and performant.<\/li>\n<li>How to efficiently read training data using the Query and Analytics APIs in the Couchbase Python SDK and seamlessly save it to a data structure suitable for machine learning (ML), e.g., a pandas dataframe.<\/li>\n<li>How Couchbase can meet all data science process storage needs by storing not just the training data and predictions but also ML models (up to 20MB in size).<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-12618 size-full\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2021\/12\/Screen-Shot-2021-12-20-at-2.53.24-PM.png\" alt=\"\" width=\"741\" height=\"159\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/Screen-Shot-2021-12-20-at-2.53.24-PM.png 741w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/Screen-Shot-2021-12-20-at-2.53.24-PM-300x64.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/Screen-Shot-2021-12-20-at-2.53.24-PM-20x4.png 20w\" sizes=\"auto, (max-width: 741px) 100vw, 741px\" \/><\/p>\n<h4>Frame the problem<\/h4>\n<p>The data science process (Figure 1) usually starts with a problem definition. 
In this article, we will use customer churn prediction as an example.<\/p>\n<p>The sales team at an online streaming service company wants to know whether increasing the monthly subscription cost can lead to higher revenues. However, increasing the monthly cost too high could lead to higher customer churn.<\/p>\n<p>What monthly cost maximizes the revenue while keeping the customer churn in check? To answer this, the data scientist needs to predict the churn scores if the monthly costs are increased by specific amounts, e.g., $1, $2, etc.<\/p>\n<h4>Collect and prepare relevant data<\/h4>\n<p>After defining the problem, the data scientist will collect the right data needed to solve the problem. This raw data may need to be cleaned and pre-processed, e.g., to handle missing values.<\/p>\n<p>For this article, we will use the <em>online_streaming.csv<\/em> dataset available <a href=\"https:\/\/github.com\/couchbaselabs\/datasets\">here<\/a>. This is a synthetic dataset created by Couchbase. It simulates 500,000 customer records of a hypothetical online streaming service company.<\/p>\n<p>Load this dataset into your Couchbase cluster using the steps mentioned in the README <a href=\"https:\/\/github.com\/couchbaselabs\/datasets\">here<\/a>.<\/p>\n<h4>Explore the data<\/h4>\n<p>After collecting and preparing the needed data, the data scientist explores it to gain insight into the data. Exploratory data analysis (EDA) is an approach that often employs visualization techniques to uncover the structure of data and extract important variables.<\/p>\n<p>As mentioned earlier, the best place to do this analysis is in the database where the data is stored.<\/p>\n<p>Data scientists can use Couchbase Query and Analytics services for EDA. These services run on the same Couchbase cluster as the one where the training data is stored. Training data need not be moved elsewhere for analysis. 
This makes the process simple and efficient.<\/p>\n<p>Couchbase Multi-Dimensional Scaling (MDS), described <a href=\"https:\/\/www.couchbase.com\/multi-dimensional-scalability-overview\/\">here<\/a>, allows these services to be scaled independently. This results in faster queries. The parallel data management capabilities of the Couchbase Analytics service make data analysis even more efficient.<\/p>\n<p>To use the Query features described here, first create a primary index on the online_streaming bucket by running the following command in the Query UI. Select <em>query context<\/em> from the dropdown menu and enter: <em>CREATE PRIMARY INDEX ON<\/em> <em>online_streaming<\/em><\/p>\n<p>Instead of creating a primary index, you could also use the Index Advisor feature (<a href=\"https:\/\/docs.couchbase.com\/server\/current\/n1ql\/n1ql-language-reference\/advise.html\">ADVISE<\/a> statement in the UI or command line <a href=\"https:\/\/docs.couchbase.com\/server\/current\/tools\/cbq-shell.html\">cbq<\/a> tool) in the Couchbase Query service to find appropriate indexes to create, per query, based on the <em>where<\/em> clause.<\/p>\n<p>One of the key steps in exploratory data analysis is to understand the attributes in the dataset.<br \/>\nThe <em>INFER<\/em> statement for SQL++ queries can help. It allows users to infer the structure of documents, data types of various attributes, etc. Note that <em>INFER<\/em> is statistical in nature rather than deterministic, as explained <a href=\"https:\/\/docs.couchbase.com\/server\/current\/n1ql\/n1ql-language-reference\/infer.html\">here<\/a>.<\/p>\n<p>As seen in the query below, executing the <em>INFER<\/em> statement on the online_streaming bucket shows that there are 500,000 documents in this bucket and the documents contain attributes such as Plan and CustomerID. Expanding on individual attributes gives additional details. 
E.g., MonthlyCost has three possible values &#8211; $5.99, $11.99 and $19.99.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-12619 size-full\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2021\/12\/infer.png\" alt=\"\" width=\"468\" height=\"550\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/infer.png 468w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/infer-255x300.png 255w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/infer-300x353.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/infer-17x20.png 17w\" sizes=\"auto, (max-width: 468px) 100vw, 468px\" \/><br \/>\nWe will use logistic regression, a supervised learning algorithm, to train the churn predictor.<br \/>\nThe <em>Churn<\/em> attribute in the dataset will be used as the label (output; target) for the supervised learning. <em>Churn<\/em> is set to either 0 or 1; 1 indicates the customer has churned. The training process tries to learn the relationship between the label (churn) and the inputs (other attributes in the dataset).<\/p>\n<p>Query and Analytics Charts (in Couchbase Server 7.0.2 and later) can be used to visualize patterns or correlations between the attributes. This can help decide which attributes to include in the training process. The chart below shows that customers with higher monthly costs are more likely to churn. 
Clearly, monthly cost is an important attribute to include while training the churn predictor.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-12620 size-full\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2021\/12\/monthlycost.png\" alt=\"\" width=\"468\" height=\"323\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/monthlycost.png 468w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/monthlycost-300x207.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/monthlycost-20x14.png 20w\" sizes=\"auto, (max-width: 468px) 100vw, 468px\" \/><\/p>\n<p>Couchbase Query and Analytics services also provide many built-in functions such as mean and standard deviation that can be used for statistical analysis. Other Query functions such as date and string can help with data preprocessing.<\/p>\n<h4>Train the model<\/h4>\n<p>After exploring the data, the data scientist proceeds to train the model. We will train the customer churn predictor using the steps shown in Figure 2. 
If needed, the features can also be stored in a Couchbase bucket.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-12621 size-full\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2021\/12\/Screen-Shot-2021-12-20-at-3.00.12-PM.png\" alt=\"\" width=\"643\" height=\"276\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/Screen-Shot-2021-12-20-at-3.00.12-PM.png 643w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/Screen-Shot-2021-12-20-at-3.00.12-PM-300x129.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/Screen-Shot-2021-12-20-at-3.00.12-PM-20x9.png 20w\" sizes=\"auto, (max-width: 643px) 100vw, 643px\" \/><\/p>\n<h4>Step 1: Efficiently Read Training Data from Couchbase<\/h4>\n<p>We have already identified the important attributes during EDA using the Query service. We will fetch only those attributes while reading the training data. This reduces the size of the data read over the network and the amount of memory needed to store it in the training session.<\/p>\n<p>We will use the Query API in the Couchbase Python SDK (version 3) to <em>efficiently<\/em> read the entire training dataset by <em>selecting<\/em> only the attributes that are relevant to training our churn predictor.<\/p>\n<p>Data read using the Query API can be easily stored in a pandas dataframe, as seen in this code.<\/p>\n<pre class=\"\">import numpy as np\r\nimport pandas as pd\r\n# Connect to Couchbase cluster using the Python SDK\r\nfrom couchbase.cluster import Cluster, ClusterOptions\r\nfrom couchbase.auth import PasswordAuthenticator\r\n# Fill in the hostname or IP address, user_name and password for your cluster. \r\n# E.g. 
Cluster.connect(\"couchbase:\/\/localhost\", ClusterOptions(PasswordAuthenticator(\"Administrator\", \"password\")))\r\ncluster = Cluster.connect(&lt;host&gt;,\r\n    ClusterOptions(PasswordAuthenticator(&lt;user_name&gt;, &lt;password&gt;)))\r\n# Connect to online_streaming bucket\r\ncb = cluster.bucket('online_streaming')\r\n# Use the Query API to get all documents from the bucket\r\n# Specify only the needed attributes in the SELECT clause \r\nquery_result = cb.query(\"SELECT c.AverageAge, c.AvgHoursWatchedPerWeek, c.Churn, c.MonthlyCost, c.NumViewersInHousehold, c.Plan FROM `online_streaming` as c\")\r\n# Store the result rows in a pandas dataframe\r\ndata = pd.DataFrame(list(query_result))\r\ndata.head()<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-12622 size-full\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2021\/12\/avgage.png\" alt=\"\" width=\"468\" height=\"108\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/avgage.png 468w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/avgage-300x69.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/avgage-20x5.png 20w\" sizes=\"auto, (max-width: 468px) 100vw, 468px\" \/><br \/>\nRefer to the <a href=\"https:\/\/docs.couchbase.com\/python-sdk\/current\/hello-world\/overview.html\">Couchbase Python SDK documentation<\/a> for information on using the SDK.<\/p>\n<p>The Analytics API in the Couchbase Python SDK can also read the training data, as described in the next article.<\/p>\n<h4>Step 2: Extract Features<\/h4>\n<p>The data scientist will convert the raw data into suitable features to be passed to the training function. The choice of feature engineering techniques depends on the type of data.<\/p>\n<p>One of the most common feature engineering steps is to convert categorical data to numerical values using one-hot encoding. 
We will use this Python code to encode categorical data and create the input (X) and label (Y) dataframes.<\/p>\n<pre class=\"\"># Get one-hot encoding for categorical features\r\ncategoricals = data.select_dtypes(include=object).columns\r\ndata = pd.get_dummies(data, columns=categoricals)\r\n# Drop the 'Churn' column since it is a label and not a feature\r\nfeature_names = list(set(list(data.columns)) - set(['Churn']))\r\nX = data[feature_names]\r\nY = data['Churn']<\/pre>\n<p>As seen in the <em>X.head()<\/em> output below, one-hot encoding has replaced the Plan column, which was set earlier to one of <em>Basic, Standard or Premium<\/em>, with three numeric columns: <em>Plan_Standard<\/em>, <em>Plan_Basic<\/em> and <em>Plan_Premium<\/em>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-12625\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2021\/12\/xhead-300x63.png\" alt=\"\" width=\"300\" height=\"63\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/xhead-300x63.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/xhead-20x4.png 20w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/xhead.png 512w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<h4>Step 3: Train ML Model<\/h4>\n<p>Next, the data scientist will proceed with training the ML model relevant to their use case.<br \/>\nWe will use the code below to train a churn predictor using the features created in Step 2.<\/p>\n<pre class=\"\">from sklearn.model_selection import train_test_split\r\n# train_test_split splits the data into two subsets for training and testing.\r\n# test_size=0.2 holds out 20% of the full dataset as the test subset.\r\n# With shuffle=False the rows are not shuffled, so the last 20% of rows form the test subset.\r\nX_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=101, shuffle=False)\r\n\r\nfrom sklearn.linear_model import LogisticRegression\r\n# We will 
use logistic regression to train the churn predictor\r\nlm = LogisticRegression(max_iter=200)\r\nlm.fit(X_train, Y_train)\r\n\r\n# Mean accuracy on the training and test subsets\r\nprint(lm.score(X_train, Y_train))\r\nprint(lm.score(X_test, Y_test))<\/pre>\n<h4>Step 4: Store ML Model and its Metadata<\/h4>\n<p>The data scientist needs to save the trained model to generate predictions later. This step is also important for reproducing the research. Data science is an iterative process and there could be multiple versions of the model.<\/p>\n<p>We will use the code below to store our churn prediction model (version 1) and its metadata in a <em>model_repository<\/em> bucket on Couchbase. Create this bucket on your Couchbase cluster before proceeding.<\/p>\n<p>The size of our churn predictor model is less than 1KB. Models up to 20MB in size can be stored in Couchbase in JSON or binary format, as described in the article <a href=\"https:\/\/www.couchbase.com\/blog\/couchbase-machine-learning-model-store\/\">here<\/a>. The names of the features expected by the trained model are stored in the model metadata.<\/p>\n<pre class=\"\">import pickle\r\nfrom datetime import datetime\r\n\r\ndef store_model_on_couchbase(model, feature_names, model_id):\r\n    # Store model in Binary format\r\n    from couchbase_core._libcouchbase import FMT_BYTES\r\n    bucket = cluster.bucket('model_repository')\r\n    model_bytes = pickle.dumps(model)\r\n    bucket.upsert(model_id, model_bytes, format=FMT_BYTES)\r\n    now = datetime.now()\r\n    model_metadata = {'model_id': model_id,\r\n                      'feature_names': list(feature_names),\r\n                      \"creation time\": now.strftime(\"%d\/%m\/%Y %H:%M:%S\")}\r\n    # Store model metadata under a separate key\r\n    key = model_id + \"_metadata\"\r\n    bucket.upsert(key, model_metadata)<\/pre>\n<pre class=\"\">store_model_on_couchbase(lm, feature_names, 'churn_predictor_model_v1')<\/pre>\n<p>Verify that the model and its metadata were successfully stored on Couchbase.<\/p>\n<h4><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-12623 size-full\" 
src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2021\/12\/keyspace.png\" alt=\"\" width=\"799\" height=\"146\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/keyspace.png 799w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/keyspace-300x55.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/keyspace-768x140.png 768w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/keyspace-20x4.png 20w\" sizes=\"auto, (max-width: 799px) 100vw, 799px\" \/><br \/>\nConclusion<\/h4>\n<p>In this article, we learned how Couchbase makes data science easy. Using customer churn prediction as an example, we saw how to perform exploratory analysis using the Query service, and how to efficiently read large training datasets using the Python SDK and store them in a data structure suitable for ML, e.g., a pandas dataframe. We also saw how to store ML models (up to 20MB in size) and their metadata on Couchbase.<\/p>\n<p>In the next article, we will learn how to make predictions, store them on Couchbase, and use the Query charts to analyze them.<\/p>\n<h4>Next Steps<\/h4>\n<p>If you&#8217;re interested in learning more about machine learning and Couchbase, here are some great next steps and resources to get you started:<\/p>\n<ul>\n<li style=\"font-weight: 400\"><a href=\"https:\/\/www.couchbase.com\/products\/capella\/\"><span style=\"font-weight: 400\">Start your free trial of Couchbase Cloud<\/span><\/a><span style=\"font-weight: 400\"> \u2013 no installation required.<\/span><\/li>\n<li style=\"font-weight: 400\"><a href=\"https:\/\/resources.couchbase.com\/c\/server-arc-overview?x=V3nd_e&amp;ref=blog\"><span style=\"font-weight: 400\">Couchbase Under the Hood: An Architectural Overview<\/span><\/a><span style=\"font-weight: 400\"> \u2013 dive deeper into the technical details with this white paper.<\/span><\/li>\n<li style=\"font-weight: 400\"><span 
style=\"font-weight: 400\">Explore the Couchbase <\/span><a href=\"https:\/\/www.couchbase.com\/products\/n1ql\/?ref=blog\"><span style=\"font-weight: 400\">Query<\/span><\/a><span style=\"font-weight: 400\">, <\/span><a href=\"https:\/\/www.couchbase.com\/products\/full-text-search\/?ref=blog\"><span style=\"font-weight: 400\">Full-Text Search<\/span><\/a><span style=\"font-weight: 400\">, <\/span><a href=\"https:\/\/www.couchbase.com\/products\/eventing\/?ref=blog\"><span style=\"font-weight: 400\">Eventing<\/span><\/a><span style=\"font-weight: 400\">, and <\/span><a href=\"https:\/\/www.couchbase.com\/products\/analytics\/?ref=blog\"><span style=\"font-weight: 400\">Analytics<\/span><\/a><span style=\"font-weight: 400\"> services.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Check out these ML blogs:\u00a0<\/span>\n<ul>\n<li style=\"font-weight: 400\"><a href=\"https:\/\/www.couchbase.com\/blog\/5-use-cases-for-prediction-serving-systems-with-couchbase\/?ref=blog\"><span style=\"font-weight: 400\">Five use cases for using Couchbase with your real-time prediction serving system<\/span><\/a><\/li>\n<li style=\"font-weight: 400\"><a href=\"https:\/\/www.couchbase.com\/blog\/couchbase-machine-learning-model-store\/\"><span style=\"font-weight: 400\">How to use Couchbase as a machine learning model store<\/span><\/a><span style=\"font-weight: 400\">\u00a0<\/span><\/li>\n<li style=\"font-weight: 400\"><a href=\"https:\/\/www.couchbase.com\/blog\/ml-meets-nosql-integrating-python-user-defined-functions-with-n1ql-for-analytics\/\"><span style=\"font-weight: 400\">Running ML models using Couchbase Analytics Python UDF<\/span><\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post was co-authored by Karen Yuan, a High School Intern.\u00a0 Data science extracts knowledge from data and applies that knowledge to solve problems. 
In the next two posts, we will learn how the Couchbase Data Platform can meet various [&hellip;]<\/p>\n","protected":false},"author":77870,"featured_media":12624,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[2294,1816,1812],"tags":[9231,2140],"ppma_author":[9310],"class_list":["post-12617","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-analytics","category-couchbase-server","category-n1ql-query","tag-data-science","tag-machine-learning"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.7.1 (Yoast SEO v25.7) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How Couchbase Simplifies Data Science (Part 1) - The Couchbase Blog<\/title>\n<meta name=\"description\" content=\"Data science processes can be simplified using Couchbase services to build models, reduce data migration, query, analyze, and more.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How Couchbase Simplifies Data Science (Part 1)\" \/>\n<meta property=\"og:description\" content=\"Data science processes can be simplified using Couchbase services to build models, reduce data migration, query, analyze, and more.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/\" \/>\n<meta property=\"og:site_name\" content=\"The Couchbase Blog\" \/>\n<meta property=\"article:published_time\" content=\"2021-12-21T16:00:27+00:00\" \/>\n<meta property=\"article:modified_time\" 
content=\"2025-06-14T00:20:50+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/ilyuza-mingazova-HaTIYO87qWQ-unsplash-scaled.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1772\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Poonam Dhavale, Principal Software Engineer\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Poonam Dhavale, Principal Software Engineer\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/\"},\"author\":{\"name\":\"Poonam Dhavale, Principal Software Engineer\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/f9b85919fd77b1ea51f7fd68a8be03fc\"},\"headline\":\"How Couchbase Simplifies Data Science (Part 
1)\",\"datePublished\":\"2021-12-21T16:00:27+00:00\",\"dateModified\":\"2025-06-14T00:20:50+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/\"},\"wordCount\":1696,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/ilyuza-mingazova-HaTIYO87qWQ-unsplash-scaled.jpg\",\"keywords\":[\"data science\",\"Machine Learning (ML)\"],\"articleSection\":[\"Couchbase Analytics\",\"Couchbase Server\",\"SQL++ \/ N1QL Query\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/\",\"name\":\"How Couchbase Simplifies Data Science (Part 1) - The Couchbase Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/ilyuza-mingazova-HaTIYO87qWQ-unsplash-scaled.jpg\",\"datePublished\":\"2021-12-21T16:00:27+00:00\",\"dateModified\":\"2025-06-14T00:20:50+00:00\",\"description\":\"Data science processes can be simplified using Couchbase services to build models, reduce data migration, query, analyze, and 
more.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#primaryimage\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/ilyuza-mingazova-HaTIYO87qWQ-unsplash-scaled.jpg\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/ilyuza-mingazova-HaTIYO87qWQ-unsplash-scaled.jpg\",\"width\":2560,\"height\":1772},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.couchbase.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How Couchbase Simplifies Data Science (Part 1)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"name\":\"The Couchbase Blog\",\"description\":\"Couchbase, the NoSQL Database\",\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\",\"name\":\"The Couchbase 
Blog\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"width\":218,\"height\":34,\"caption\":\"The Couchbase Blog\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/f9b85919fd77b1ea51f7fd68a8be03fc\",\"name\":\"Poonam Dhavale, Principal Software Engineer\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/2055ba12b300559d639fe9ab89303c2b\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/07\/poonam-dhavale-couchbase.jpeg\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/07\/poonam-dhavale-couchbase.jpeg\",\"caption\":\"Poonam Dhavale, Principal Software Engineer\"},\"description\":\"Poonam Dhavale is a Principal Software Engineer at Couchbase. She has over 20 years of experience in design and development of distributed systems, NoSQL, high availability, and storage technologies. She holds multiple patents in distributed storage systems and holds certifications in machine learning and data science.\",\"url\":\"https:\/\/www.couchbase.com\/blog\/author\/poonam-dhavale\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"How Couchbase Simplifies Data Science (Part 1) - The Couchbase Blog","description":"Data science processes can be simplified using Couchbase services to build models, reduce data migration, query, analyze, and more.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/","og_locale":"en_US","og_type":"article","og_title":"How Couchbase Simplifies Data Science (Part 1)","og_description":"Data science processes can be simplified using Couchbase services to build models, reduce data migration, query, analyze, and more.","og_url":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/","og_site_name":"The Couchbase Blog","article_published_time":"2021-12-21T16:00:27+00:00","article_modified_time":"2025-06-14T00:20:50+00:00","og_image":[{"width":2560,"height":1772,"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/ilyuza-mingazova-HaTIYO87qWQ-unsplash-scaled.jpg","type":"image\/jpeg"}],"author":"Poonam Dhavale, Principal Software Engineer","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Poonam Dhavale, Principal Software Engineer","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#article","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/"},"author":{"name":"Poonam Dhavale, Principal Software Engineer","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/f9b85919fd77b1ea51f7fd68a8be03fc"},"headline":"How Couchbase Simplifies Data Science (Part 1)","datePublished":"2021-12-21T16:00:27+00:00","dateModified":"2025-06-14T00:20:50+00:00","mainEntityOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/"},"wordCount":1696,"commentCount":0,"publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/ilyuza-mingazova-HaTIYO87qWQ-unsplash-scaled.jpg","keywords":["data science","Machine Learning (ML)"],"articleSection":["Couchbase Analytics","Couchbase Server","SQL++ \/ N1QL Query"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/","url":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/","name":"How Couchbase Simplifies Data Science (Part 1) - The Couchbase 
Blog","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#primaryimage"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/ilyuza-mingazova-HaTIYO87qWQ-unsplash-scaled.jpg","datePublished":"2021-12-21T16:00:27+00:00","dateModified":"2025-06-14T00:20:50+00:00","description":"Data science processes can be simplified using Couchbase services to build models, reduce data migration, query, analyze, and more.","breadcrumb":{"@id":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#primaryimage","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/ilyuza-mingazova-HaTIYO87qWQ-unsplash-scaled.jpg","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/12\/ilyuza-mingazova-HaTIYO87qWQ-unsplash-scaled.jpg","width":2560,"height":1772},{"@type":"BreadcrumbList","@id":"https:\/\/www.couchbase.com\/blog\/how-couchbase-simplifies-data-science-part-1\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.couchbase.com\/blog\/"},{"@type":"ListItem","position":2,"name":"How Couchbase Simplifies Data Science (Part 1)"}]},{"@type":"WebSite","@id":"https:\/\/www.couchbase.com\/blog\/#website","url":"https:\/\/www.couchbase.com\/blog\/","name":"The Couchbase Blog","description":"Couchbase, the NoSQL 
Database","publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.couchbase.com\/blog\/#organization","name":"The Couchbase Blog","url":"https:\/\/www.couchbase.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","width":218,"height":34,"caption":"The Couchbase Blog"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/f9b85919fd77b1ea51f7fd68a8be03fc","name":"Poonam Dhavale, Principal Software Engineer","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/2055ba12b300559d639fe9ab89303c2b","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/07\/poonam-dhavale-couchbase.jpeg","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/07\/poonam-dhavale-couchbase.jpeg","caption":"Poonam Dhavale, Principal Software Engineer"},"description":"Poonam Dhavale is a Principal Software Engineer at Couchbase. She has over 20 years of experience in design and development of distributed systems, NoSQL, high availability, and storage technologies. 
She holds multiple patents in distributed storage systems and holds certifications in machine learning and data science.","url":"https:\/\/www.couchbase.com\/blog\/author\/poonam-dhavale\/"}]}},"authors":[{"term_id":9310,"user_id":77870,"is_guest":0,"slug":"poonam-dhavale","display_name":"Poonam Dhavale, Principal Software Engineer","avatar_url":{"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/07\/poonam-dhavale-couchbase.jpeg","url2x":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2021\/07\/poonam-dhavale-couchbase.jpeg"},"author_category":"","last_name":"Dhavale","first_name":"Poonam","job_title":"","user_url":"","description":"Poonam Dhavale is a Principal Software Engineer at Couchbase. She has over 20 years of experience in design and development of distributed systems, NoSQL, high availability, and storage technologies. She holds multiple patents in distributed storage systems and holds certifications in machine learning and data science."}],"_links":{"self":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts\/12617","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/users\/77870"}],"replies":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/comments?post=12617"}],"version-history":[{"count":0,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts\/12617\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/media\/12624"}],"wp:attachment":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/media?parent=12617"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/categories?post=12617"},{"tax
onomy":"post_tag","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/tags?post=12617"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=12617"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}