When using Couchbase as a persistent system of record, the underlying storage infrastructure needs to be considered. Infrastructure decisions affect the overall performance and availability of the data store, with a knock-on effect on the applications that consume the data.
As we’re all aware, poor performance, slow response times and lack of availability are not things that customers or businesses will tolerate; they frequently lead to frustration, loss of customers to alternative vendors, financial impact, regulatory fines and maybe even death (e.g., with medical systems)!
Customers seek a solution that meets their expectations and needs.
Particular challenges for persistent data stores
One of Couchbase’s key tenets, our DNA if you like, is providing sub-millisecond response times, and this has been key to Couchbase’s adoption by many customers.
However, when changing from a caching layer to a persistent data store, the data storage requirements and considerations change. Many more things need to be considered when we’re talking about storing data in the petabyte range, as each can have a big impact on the overall performance of the solution.
Typically, before being adopted as the enterprise storage solution, various parameters must be guaranteed to meet specific business requirements.
Storage solutions aren’t just about storage!
They must meet multiple Service Level Agreements (SLAs) that the business mandates, such as Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs).
Other parameters to consider include IOPS, latency, resilience and High Availability (commonly addressed at the disk level with RAID), Disaster Recovery (DR), backups, data access (multi-pathing), Quality of Storage Service (QoSS), and more.
Key persistent storage infrastructure considerations
While this blog post can’t address all of these points (there will be subsequent articles), it highlights various points for consideration.
Couchbase natively provides HA, DR and backups, but the underlying storage infrastructure still needs attention in each of the following areas:
- meeting the application’s IOPS & latency requirements
- the ability to tune
- different OS support: Linux (CentOS, RHEL, SUSE), Windows, Unix
- VMware & Kubernetes
- online cross-platform migrations
Storage Independence and Agility
- NAS, SAN, DAS, SSD, Cloud
- avoid vendor lock-in if your requirements change
- integration between different storage vendors
- the size of the datastore: TBs, PBs, ZBs …
- the ability to grow / shrink on demand
- budget constraints
- ease of management / complexity: centralized management, specialist teams, single vs. multiple tools
Once we have answers to the above considerations, we can start looking at which physical storage layer to use: DAS, NAS or SAN storage (HDD & SSD).
Each of these has different scale, performance, space, location, power consumption and cost implications.
Then, following the above choices, start looking at the Operating System (OS) level requirements such as:
- File Systems
- Distributed File Systems
- SDS – Software-Defined Storage
- Cloud/Object Storage
Choosing the correct storage solution can have a big impact on your applications!
Getting the initial storage fit right is better than trying to retrofit later. Most organizations have storage teams and storage standards in place, and you will need to involve them to approve and provide the storage profile you require. You will also need to be realistic and clearly demonstrate your performance requirements, all the while taking into consideration the cost implications and source of funding.
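As a starting point for that conversation with the storage team, a back-of-the-envelope estimate can help. The Python sketch below is only illustrative: the `WorkloadProfile` fields, the `estimate_storage_profile` helper and all of the numbers are hypothetical assumptions, not Couchbase sizing guidance, and it ignores compression, compaction and index storage.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Hypothetical workload description; every field is an illustrative assumption."""
    document_count: int
    avg_document_bytes: int
    writes_per_second: int
    replica_copies: int      # extra copies of each document kept for availability
    headroom_factor: float   # e.g. 1.5 = 50% spare capacity for growth and overhead


def estimate_storage_profile(w: WorkloadProfile) -> dict:
    """Rough capacity and sustained write throughput across the whole datastore."""
    copies = 1 + w.replica_copies
    raw_bytes = w.document_count * w.avg_document_bytes * copies
    capacity_bytes = raw_bytes * w.headroom_factor
    write_bytes_per_second = w.writes_per_second * w.avg_document_bytes * copies
    return {
        "capacity_tb": capacity_bytes / 1e12,
        "write_mb_per_second": write_bytes_per_second / 1e6,
    }


if __name__ == "__main__":
    # Assumed example: 2 billion documents of ~2 KB, 50k writes/s, one replica.
    profile = WorkloadProfile(
        document_count=2_000_000_000,
        avg_document_bytes=2_000,
        writes_per_second=50_000,
        replica_copies=1,
        headroom_factor=1.5,
    )
    print(estimate_storage_profile(profile))
```

Figures like these are only a first approximation; measured numbers from a representative test workload should replace the assumptions before any storage profile is requested.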
The first step is to understand the business objectives and application needs. From there, investigate what will meet your SLAs and the enterprise’s goals.
Next time: The Physical Storage layer.