Internet of Things (IoT) applications bring a new dimension to the database workload. What makes IoT click is the availability of data from edge devices at the gateways and servers for instant analysis, rollups, and so on. The data generated is heterogeneous in schema and constantly evolving. For example, each camera or smartphone produces photo metadata differently, and a given camera or smartphone changes its schema from one version to the next. The same is true of other types of devices. A JSON representation of data is self-describing, which keeps the model flexible, so modeling the basic data representation in JSON makes sense.
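To make the schema point concrete, here is a minimal Python sketch of two devices reporting photo metadata with different shapes. The device names and field names (`iso`, `exposure`, `gps`, `hdr`) are invented for illustration and do not come from any real device.

```python
import json

# Hypothetical payloads: two devices describe a photo with different fields.
camera_v1 = json.loads('{"device": "camera-a", "iso": 200, "exposure": "1/250"}')
phone_v2 = json.loads(
    '{"device": "phone-b", "iso": 100, "hdr": true,'
    ' "gps": {"lat": 37.77, "lon": -122.42}}'
)


def describe(doc):
    """Return the sorted top-level field names a document carries."""
    return sorted(doc.keys())


def read_iso(doc, default=None):
    """Read a field that many, but not necessarily all, devices report."""
    return doc.get("iso", default)


# Each document is self-describing, so no shared schema is declared up front.
for doc in (camera_v1, phone_v2):
    print(doc["device"], describe(doc), read_iso(doc))
```

Because each document carries its own field names, a consumer can inspect what is present and fall back to a default for what is missing, rather than migrating a fixed schema every time a device changes.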
The systems generating the data for IoT can use RDBMS, NoSQL systems, simple JSON, XML, or proprietary formats. With heterogeneous data arriving from all of these sources and devices, the Internet of Things becomes the Internet of Data.
An architecture for IoT data management
Data from sensors can be aggregated, filtered, and analyzed in the sensor or the device itself; for example, a temperature sensor can turn the heater on or off. The gateways collect the data from multiple sensors and try to make sense of it. They aggregate the data over time, albeit for a limited window such as a day or a week. This data is then sent to the cloud, which maintains the full data set and performs deeper analysis, for example season-over-season or year-over-year comparisons.
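The division of labor between device and gateway can be sketched in a few lines of Python. This is only an illustration under assumed names: `heater_should_run`, `daily_rollup`, the 20 °C setpoint, and the `(sensor_id, unix_ts, value)` reading format are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def heater_should_run(temperature_c, setpoint_c=20.0):
    """Device-level decision: act on a single reading without any gateway."""
    return temperature_c < setpoint_c


def daily_rollup(readings):
    """Gateway-level rollup of (sensor_id, unix_ts, value) tuples.

    Returns per-sensor, per-UTC-day count, min, max, and mean.
    """
    buckets = defaultdict(list)
    for sensor_id, ts, value in readings:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        buckets[(sensor_id, day)].append(value)
    return {
        key: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
        for key, values in buckets.items()
    }


readings = [("t1", 0, 18.0), ("t1", 3600, 22.0), ("t1", 86400, 20.0)]
for key, summary in sorted(daily_rollup(readings).items()):
    print(key, summary)
```

The gateway would forward the rollups (and, optionally, the raw readings) to the cloud, where longer-horizon comparisons can run over the full history.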
Traditional workloads (represented by TPC-C and TPC-E) tend to be read-heavy workloads on an OLTP schema. IoT data generation and usage, by contrast, is write-heavy. The gateway receives all of the data from the sensors, generates the first round of immediate intelligence, and then optionally filters and aggregates the data before sending it to the backend. So the Transaction Processing Performance Council (TPC) has created a new benchmark, TPCx-IoT, to measure the price-performance of IoT gateway systems.
Outline of TPCx-IoT:
TPCx-IoT provides a full kit to implement the benchmark for a database. It includes a framework to generate the data, issue queries, measure the performance, and then calculate the price-performance ratio. The dataset is based on data from sensors in modern electric power substations. The data is loaded into the gateway continuously while real-time analytical queries run against it.
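As a rough illustration of what a price-performance ratio means, the sketch below divides system cost by measured ingest throughput. It is a simplification under stated assumptions: the function names and the input numbers are made up, and the TPCx-IoT specification, not this snippet, defines the official metric and the rules for measuring it.

```python
def throughput_per_second(records_ingested, elapsed_seconds):
    """Records ingested per second over a measured run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return records_ingested / elapsed_seconds


def price_per_thousand_ops(total_system_price, throughput):
    """Cost of the system per thousand records ingested per second."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 * total_system_price / throughput


# Illustrative inputs only; these are not benchmark results.
rate = throughput_per_second(records_ingested=1_000_000, elapsed_seconds=100)
print(rate, price_per_thousand_ops(total_system_price=50_000, throughput=rate))
```

A lower price-performance figure means the system delivers the same ingest rate for less money, which is what lets differently sized configurations be compared.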
Here is a block diagram of the scenario, showing where the gateways fit in the IoT framework.
The workload itself is straightforward. The basic framework should be familiar to anyone who has worked with YCSB. The workload operations and their distribution have been customized for the IoT use case. The insert (load) and scan operations are executed in parallel.
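The idea of running inserts and scans in parallel can be sketched with two Python threads against an in-memory stand-in for the database. This is not the TPCx-IoT driver or the YCSB API; `InMemoryStore`, `run_workload`, and the key layout are hypothetical names chosen for the example.

```python
import bisect
import threading


class InMemoryStore:
    """Stand-in for the database under test; keys stay sorted for range scans."""

    def __init__(self):
        self._lock = threading.Lock()
        self._keys = []
        self._values = {}

    def insert(self, key, value):
        with self._lock:
            if key not in self._values:
                bisect.insort(self._keys, key)
            self._values[key] = value

    def scan(self, start_key, end_key):
        """Return all (key, value) pairs with start_key <= key <= end_key."""
        with self._lock:
            lo = bisect.bisect_left(self._keys, start_key)
            hi = bisect.bisect_right(self._keys, end_key)
            return [(k, self._values[k]) for k in self._keys[lo:hi]]


def run_workload(store, num_inserts=1000):
    """Insert readings on one thread while another thread scans the same range."""
    done = threading.Event()
    scan_sizes = []
    first, last = ("sensor-1", 0), ("sensor-1", num_inserts)

    def writer():
        for i in range(num_inserts):
            store.insert(("sensor-1", i), i * 0.1)
        done.set()

    def reader():
        while not done.is_set():
            scan_sizes.append(len(store.scan(first, last)))
        # One last scan after the writer finishes sees every record.
        scan_sizes.append(len(store.scan(first, last)))

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return scan_sizes


sizes = run_workload(InMemoryStore())
print(len(sizes), "scans; final scan saw", sizes[-1], "records")
```

Each scan sees however many records have been inserted so far, which is the essence of querying a store while it is still ingesting.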
TPCx-IoT run with Couchbase
Cisco and Couchbase implemented the workload driver for Couchbase, including the insert (load) and scan operations. Couchbase is a high-performance, JSON-based, distributed NoSQL database for scalable web, mobile, and IoT applications. Here are the pre-audit results for Couchbase on Cisco M4 hardware. For more details, see https://www.couchbase.com/benchmarks, and for comparative numbers, see the TPCx-IoT site. Thanks to the Cisco team for their leadership in TPCx-IoT and for their collaboration in porting the benchmark drivers to Couchbase and running the benchmark.