Where does it begin…
IoT, edge devices and NoSQL databases have all surged in popularity in recent years, letting people comfortably build interaction-heavy applications without worrying about stability and availability. One downside of the freedom and flexibility these technologies bring is keeping the cost under control. Aggregating documents to reduce your data footprint is a common requirement for write-heavy and analytical applications, and Couchbase 6.6 gives us even more tools to help lower your TCO. Let’s look at where the problem starts and how we can use the Couchbase toolset to resolve it.
Applications and Interactions
I remember when applications used to be simple, when they set out to achieve a single goal. A retail application, for example, made the shopping process digitally available, so people no longer had to travel to their local retailer to carry out the same task. Films and music became available at the click of a button, instead of having to buy CDs or DVDs from a media store. The initial digital transformation of almost everything was a huge leap in convenience for the consumer and gave time back to everybody, and for a while this change was enough to keep us happy. Any technical issues seemed minor because they were overshadowed by the convenience on offer.
These days, however, a new retailer ‘going digital’ isn’t really anything to shout about; those expectations are already baked in. What consumers now care about is the small stuff that was disregarded before, and those previously minor technical issues are now the ‘make or break’ for any successful application. As digital transformation embeds itself into businesses’ everyday activities, expectations of these applications and systems have risen, and will continue to do so.
As we develop software to meet these high expectations of user experience, we provide more interaction between the end user and the application. More complex applications go hand in hand with more interactions, and this ever-increasing number of interactions needs an ever-increasing amount of storage to record them. This is where our databases fit in.
Like the applications themselves, there was a time when a relational database was a perfect fit for storing the information of almost every system, and that satisfied us. But just as expectations of the user experience started to rise, so did expectations of the underlying database. The amount of data we store grows dramatically over time, and we began to see the relational database’s flaws. More data means more storage and essentially ‘more database’, and anyone familiar with relational systems knows that growing one isn’t as simple as incrementing a number. To start with, making adjustments to a relational database would often come with downtime, and downtime in today’s world just isn’t acceptable anymore. Scaling relational nodes out isn’t easily achievable, so we were limited to scaling up, which became costly and has its own limitations. You would also expect performance to increase linearly with the extra compute power and storage, but this is not the case. To top it off, the enforced schema made the overall architecture rigid, so any changes to the underlying dataset were hard to make. Overall, the TCO of relational systems was unmaintainable, and in a world where technology changes at ever faster rates, you can see why the relational database needed a replacement for these highly interactive applications.
This is where NoSQL technologies came into the picture. NoSQL embodied a schemaless, scalable, distributed approach to storing information. It provided the ability to scale the database both out and up without the need for downtime, and placed no restrictions on the dataset, allowing rapid and flexible development. The overall TCO of these databases is far smaller and more maintainable, suiting applications that require a high level of interaction.
IoT and Edge Devices
One major development in the software world was the sudden boom of IoT and edge devices. These devices normally combine small hardware and software components and can act as entry points into the underlying technology stack. A prime example is the use of sensors that continuously record data to be fed back into the database. NoSQL databases allowed these devices to multiply and grow without the worries about scaling, rigidity and high availability that a relational model would struggle to address. In many of these cases the application performs a high volume of write operations, with the information analysed at a later stage.
Now, although the ability to scale comfortably is a sigh of relief for most DBAs, we quickly run into a common problem that everybody is trying to solve: rising cost. The flexibility and freedom to store as much data as possible has its downsides, and the size of these clusters can quickly get out of control.
Take a Formula 1 racing team during a race. They could have hundreds or even thousands of sensors built into the car, each recording statistics down to the millisecond throughout the race. Although the individual readings are small, the sheer number of documents being stored will run into the millions, and that’s me being conservative! Once they have all of this data, they need to run near-real-time analytics, whose results let the team make adjustments and improve as the race goes on. The problem isn’t any of the above; it’s when you look at the cluster supporting it that you see how quickly the amount of data, and the TCO of the database, can get out of hand. For this Formula 1 team, however, it turns out that after 15 minutes the millisecond granularity of the sensor readings is no longer necessary, and they are happy to store averages by the second instead; after an hour, averages by the minute; and this process can be repeated as often as required.
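To make that rollup idea concrete, here is a minimal sketch in plain JavaScript. The reading shape (`{ ts, value }` with an epoch-millisecond timestamp) and the function name are illustrative assumptions, not part of any Couchbase API; the point is simply averaging fine-grained readings into one value per second:

```javascript
// Roll up fine-grained sensor readings into per-second averages.
// `readings` is a hypothetical array of { ts: epochMillis, value: number }.
function rollupBySecond(readings) {
  const buckets = new Map(); // second -> { sum, count }
  for (const { ts, value } of readings) {
    const second = Math.floor(ts / 1000);
    const b = buckets.get(second) || { sum: 0, count: 0 };
    b.sum += value;
    b.count += 1;
    buckets.set(second, b);
  }
  // Emit one averaged reading per second, in chronological order.
  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([second, { sum, count }]) => ({
      ts: second * 1000,
      value: sum / count,
    }));
}
```

The same shape of function, with a coarser bucket size, would produce the per-minute averages mentioned above.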
For example, if we look at a sensor that produces multiple temperature and pressure readings, we could end up with a set of documents that look like this:
key = "sensor::temp-press::2020-01-02T12:34:01"
{
  "ts": "2020-01-02 12:34:01",
  "temp": 110.8,
  "pressure": 21.2
}

key = "sensor::temp-press::2020-01-02T12:34:58"
{
  "ts": "2020-01-02 12:34:58",
  "temp": 112.7,
  "pressure": 21.6
}

key = "sensor::temp-press::2020-01-02T12:34:59"
{
  "ts": "2020-01-02 12:34:59",
  "temp": 113.1,
  "pressure": 22.5
}
To reduce the dataset and compress the information we hold, we could aggregate these readings into arrays inside a single, much smaller document:
key = "sensor::tps-001::2020-01-02T12:34"
{
  "t": [110.8, ... 112.7, 113.1],
  "p": [21.2, ... 21.6, 22.5]
}
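One way to build that compact per-minute document from the individual readings might look like the sketch below. The field names follow the documents above, but the function name, the input shape and the returned `{ key, doc }` wrapper are all illustrative assumptions:

```javascript
// Collapse individual temp/pressure documents into one per-minute
// document holding parallel arrays, as shown above.
// `docs` is an array of { ts: "YYYY-MM-DD HH:MM:SS", temp, pressure }.
function aggregateMinute(docs, sensorId) {
  // Order readings chronologically so the arrays line up with time.
  const sorted = [...docs].sort((a, b) => a.ts.localeCompare(b.ts));
  // Truncate the first timestamp to minute precision for the key,
  // e.g. "2020-01-02 12:34:01" -> "2020-01-02T12:34".
  const minute = sorted[0].ts.slice(0, 16).replace(" ", "T");
  return {
    key: `sensor::${sensorId}::${minute}`,
    doc: {
      t: sorted.map((d) => d.temp),
      p: sorted.map((d) => d.pressure),
    },
  };
}
```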
Couchbase 6.6 introduced the recursive timer, allowing users to create a timer from within the callback of another timer. At the point of execution, the handler logic aggregates the documents within a given time frame and then, before exiting, creates another timer to repeat the process after a set period.
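In outline, an Eventing handler using a recursive timer could be sketched as below. `createTimer` is the real Eventing API; everything else here is an assumption for illustration: the control document id, the `aggregateWindow` helper (standing in for whatever rollup logic you write) and the one-minute interval. This is a sketch of the pattern, not a complete, production handler:

```javascript
// Couchbase Eventing handler sketch: a timer that re-arms itself.
// Assumptions (hypothetical names): a control document that seeds the
// cycle, and an aggregateWindow() helper that performs the actual rollup.
function OnUpdate(doc, meta) {
  // Seed the first timer when the control document is written.
  if (meta.id === "aggregation::control") {
    scheduleNext(meta.id);
  }
}

function AggregateCallback(context) {
  // Roll up the documents that arrived in the last window.
  aggregateWindow(context);
  // Recursive step (new in 6.6): create the next timer from inside
  // this timer's callback, so the cycle continues indefinitely.
  scheduleNext(context.ref);
}

function scheduleNext(ref) {
  const fireAt = new Date(Date.now() + 60 * 1000); // one minute from now
  createTimer(AggregateCallback, fireAt, ref, { ref: ref });
}
```

Before 6.6, keeping such a cycle going typically meant driving it from outside the cluster; with recursive timers the whole loop lives in the Eventing function itself.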
Although the feature sounds trivial, it greatly simplifies recursion and removes the need to rely on external technologies for something that can now be done entirely within Couchbase.
So, you may not feel the need for document aggregation as you get off the ground with a NoSQL database. But for write-heavy use cases, or simply when the overall TCO starts to rise, you’ll be looking for any ideas and answers you can find to reduce the amount of data you are storing. Hopefully this blog post, and the new recursive timer, can be part of your solution.