Back in November, Raju Suravarjjala, (Sr. Director, Couchbase QE & Performance) and I  presented “Mindmap: Oracle to Couchbase For Developers”. You can see the slide deck here: http://bit.ly/2mPdJfo. Overview article for the series is at is at https://www.couchbase.com/blog/couchbase-oracle-developers-part-2-architecture/. This series is written in collaboration with Raju.

This is the second part of the “Couchbase for Oracle Developers” series discussing the high-level architecture of the two databases from a developer perspective. The blog assumes you have knowledge of Oracle and does not explain Oracle features, except for compare and contrast reasons. This comparison and contrast is for just for that.  It’s not intended to evaluate feature or performance against another.  

Both Oracle and Couchbase were developed to handle specific enterprise application jobs.  They both have expanded and are expanding. They coexist in enterprises.  It’s with this sensibility, we’re providing a comparison of these products to help Oracle developers to understand what’s Couchbase.

Now, on to architecture.

Topic Oracle Couchbase
Single Node Deployment Oracle is a single monolithic single instance database (in the deployment shown in 1).  It was indeed originally developed to run on a single machine and works well in an SMP (multi-socket, multi-core) machine.  All the layers of a database (logical, physical space management, locking, logging, are all managed from a single machine).   The code executes in a single thread (or cooperating threads) with resource locking designed for a single machine.

This configuration is the majority of the development and production deployment. When you run out of disk capacity, you add capacity.  When you run out of CPU capacity, you migrate the workload to a larger machine with more sockets, more cores.  Or these days, bigger VMs or containers.

The APP for server communication uses proprietary oracle protocol. Higher level SDKs work on top of this.

Couchbase server can be installed on a single server and it’s usually installed on a single node (MacOS, Linux or Windows) for development. All of the database functions are abstracted into services (data layer, index, query, with APIs to access data directly or via query).  These services co-ordinate to expose the services via REST API.  The applications use Couchbase SDKs (available all popular languages) to get data, create indexes, run N1QL queries.

This configuration is used mainly for development on your laptop, VM or a container. It’s rarely if ever, used in deployment. In fact, Couchbase gives you a warning when you’ve deployed on a single node there’s only one copy of your data!

The API for server communication is via REST API and Memcached binary protocol.  All other higher level SDKs use REST or Memcached protocols underneath.

Languages SQL, PL/SQL,XQuery, SES (Secure Enterprise Search) N1QL (SQL for JSON), Direct data access, FTS (full text search).
Connections A persistent connection between user process and database process/thread running on behalf of the client application. HTTP over TCP/IP connection.
Connection for MemcachedEach request for data fetch/modification, query is a distinct request. While we use the TCP and HTTP keep-alive to make the connection request efficient.
Session A session starts from CONNECT to QUIT. Each application “session” has a corresponding session on the Oracle server. Within a session, you can execute a 1 query or one million queries spanning multiple transactions. Just like connection, in Couchbase, each request in a distinct request. Each request has to carry both bind parameters and any context parameters (e.g. timeout). Once the request is processed, the session context disappears in the server.

However, nothing prevents the SDKs to remember the context, parameters and pass it on in subsequent request without application intervention, making it easy to code.

User, authentication Supports OS user, database user. Authentication can be done by OS, database, third party like LDAP, kerberos. Supports OS user, database user. Authentication can be done by OS, database, third party like LDAP, kerberos.
Database Lower than the instance, database is the unit of data management and top level access control for users and applications.

One Oracle instance can have multiple databases.
One Oracle database can have multiple tables (in addition to system tables/catalogs).

One Couchbase instance can have upto 10 BUCKETS. Each bucket is allocated fixed amount of memory (in each node). That’s used to cache the data in the data nodes. Think of it as the buffer pool for the data in Oracle.

Within each bucket, you store JSON documents. Each JSON document in a single bucket should have a unique key (along the lines of primary key in your tables).

Within Each bucket, you can have multiple types of document (like tables and collections).

To allow of uniqueness and to identify the type of document just by looked at the key, we prefix the document key with the type of the document.

E.g: “cust.x817.022.4u2”
“cust.x2317.402.2742”
“parts.lkfh38.sldkv”
“parts.lkfh38.sldkv”
“parts.lkfh38.sldkv”
It’s also typical to store the type of the document within the document.

{ “type”: “cust, “lname”:”Smith”}
{“type”: “parts”, “dept”:”engine”}

Table Table Types of documents within a bucket.
The customer documents (with a cust prefix) become a group logically and parts documents become a group as well. There’s no physical separation except in the context of an index and query.
Column Column All the documents in Couchbase have to be JSON documents conforming to http://json.org/
Consider a simple 1-level JSON document.
Document Key: “cust:2984”
Doc: {“a”:1, “b”: True, “c”: “Hello”}
In your mind map, you can think of this document as a row, individual attributes, “a”, “b”, “c” as columns, the document key as a primary key.
This document contains simple scalar values.JSON can contain arrays and objects, arrays of objects, objects containing arrays. Attribute names are referenced from top down (think of objects in Oracle.This will be discussed in details in “Database Types” section.
Logical and Physical Schema structures.
SDKs http://bit.ly/20TCGn0

http://bit.ly/2FoiFP8

https://www.couchbase.com/downloads  (scroll down to SDKs) 
Multi-node architecture
(Homogeneous deployment).
Multi Dimensional Deployment Not Available. Each node in Oracle RAC has full Oracle SQL capability including the transactional and administrative capability.  

Each node in the cluster can have one or more combination of the services: Data, Query, Index, Search an Analytics. The cluster manager (co-ordinator) is aware of the distribution and availability of the services and will inform the services about each other.

The SDK is also aware of the data distribution, query and FTS nodes. SDK tries to balance the workload among different query nodes for query workload, data nodes for data/KV workload, FTS nodes for search workload.

System Architecture Single node deployment is suited for SMP.
Multi-node deployment share the same disk/storage. Hence, Shared Disk System. You can scale up to a point by adding more compute and sharing the same disk. Eventually, shared disk and IO throughput becomes the bottleneck.Developer is oblivious (for most part) to the instance type: single node or multi-node RAC. The SQL supported are the same and transactions work seamlessly in a multi-node environment.Oracle has developed a sophisticated distributed lock manager, buffer pool synchronization, etc to achieve this. It does require the expensive infiniband to reduce any negative effect on the performance.

Despite this, it’s not uncommon to partition the workload between multiple RAC nodes to minimize the conflicts in locking, etc.

Single node can scale up, implemented as co-operating multiple services running on a single system. Coordination is done via message passing, even on a single system.

This loosely coupled set of services lend themselves to scale out seamlessly. You can have all of the services in each node and simply add new nodes with the same services. All the services will understand the multi-node topology. This is known as scale-out.

Applications don’t use all of the services uniformly. The bottlenecks could be on data, indexer of the query. So, in Couchbase, you size and add new nodes simply to run the bottlenecked services. This will provide optimal resource utilization and better performance compared to a homogeneous deployment. It’ll also reduce your cost. The multi-dimensional scaling is easy to deploy as any other configuration and manageability remains the same.

Query Oracle has Full fledged SQL support.
The SQL extensions include support for JSON and text search.
Oracle also has XML DB, supporting XML, XQuery, SQL/XML, etc.
Couchbase can store binary and JSON.
Developer APIs can GET and SET documents, subdocuments directly.
N1QL (SQL for JSON) provides declarative language. It comes with indexes and a full fledged query engine to execute queries efficiently. Details at : query.couchbase.com.
Couchbase FTS (Full Text Search) helps you create text index and search.
http://bit.ly/2vbcbOF
High Availability Oracle’s MAA (Maximum Availability Architecture) recommends: “On the Cluster configuration screen, in the Disk Group Details section, MAA recommends choosing Oracle Automatic Storage Management (ASM) HIGH redundancy for ALL (DATA and RECO) disk groups for best protection and highest operational simplicity”

Oracle has facility to create hot standby servers, schema based replication, change capture and probably more.

Within a single cluster, you can have multiple copies of data and index just by specifying the number of copies you’d like to have. You can have up to 3 copies of the data and any number of index copies.

Across-multiple clusters, you can replicate the data with built-in XDCR (cross-data center replication).

Transactions ACID

Multi-statement

Single document atomicity
Data Service consistency, Index – eventual consistency
optimistic locking (CAS)
additional confirmation for durability
Drivers JDBC, ODBC, .NET, LINQ and more Couchbase SDK (Java, .NET, LINQ, PHP, Python, Go), Simba JDBC/ODBC
Data Model Supports both normalized and denormalized data model. Denormalized (aggregated) JSON model.
Relationships between multiple type of documents (e.g. Orders to Customers) can be represented and processed. Either the child (orders) or the parent (customer) can store the primary key of the related documents and then JOIN them. This relationship is implied, not codified by a constraint (e.g. Foreign Key Constraint) in Couchbase. This is the reason Couchbase is more flexible.

Author

Posted by Keshav Murthy

Keshav Murthy is a Vice President at Couchbase R&D. Previously, he was at MapR, IBM, Informix, Sybase, with more than 20 years of experience in database design & development. He lead the SQL and NoSQL R&D team at IBM Informix. He has received two President's Club awards at Couchbase, two Outstanding Technical Achievement Awards at IBM. Keshav has a bachelor's degree in Computer Science and Engineering from the University of Mysore, India, holds ten US patents and has three US patents pending.

Leave a reply