In part 2 of the “how to model data” series, we’ll look at a variety of logical models, their strengths and weaknesses, and pros and cons.

Part 1 of this series covered the three steps of data modeling: conceptual, logical, and physical.

Common Data Models

After you’ve defined a conceptual model, the next step is to start building a logical data model. This is still too early to pick a specific technology or vendor.

The logical data models that I find to be the most common and popular include: relational, JSON (document), key-value, graph, wide-column, and text search.

Relational Data Model

The relational data model was proposed in 1970 by E.F. Codd. Generally speaking, the relational data model consists of tables that contain rows and columns, which contain data that follow pre-defined schemas and constraints.

The term “relational” does not refer to relationships between tables and entities, but rather the theoretical concept of a set of tuples that are a member of a data domain, each containing attributes.

Actual relationships are not part of the relational model (or any other model except Graph, discussed later), but are usually implemented with multiple tables and foreign key constraints.

Relational is the most popular logical data model type in the world.

SQL was not part of the original relational proposal, but is strongly (but not exclusively) associated with relational databases. In fact, SQL is generally the only way to interact with data in relational databases.

JSON Data Model

Most document databases store data as JSON (XML and other representations are far less popular).

Highly scalable NoSQL implementations avoid relational constraints in favor of independent data that can be distributed amongst multiple servers (via clustering and sharding). JSON is a popular format for this kind of distributed data because:

If you are coming from a relational background, but want to explore the benefits of “NoSQL” (scaling, flexibility, performance, familiarity), then a JSON document model is a great place to start.

Data access methods for document databases typically include (but are not limited to) a querying language (a standard like SQL++ or a proprietary vendor implementation like MQL) that can filter based on individual JSON values.

Key-Value Data Model

A key-value data model associates each piece of data with a key. The actual value can be in any format. It could be JSON, XML, binary, text, or anything else.

These databases are great for certain use cases: caching, session storage, user profile, shopping carts.

Beyond key-value reads and writes, functionality like “subdocument” and “map/reduce” adds more data access options, but using a pure key-value logical model on data with complex relationships can get very difficult.

Graph Data Model

A graph data model is the only model that has both concepts of data and relationships built in. In a graph model, a piece of data is called a “node”, and a relationship between two nodes is called an “edge”.

A graph data model can be useful for modeling a complex system of relationships. For instance, modeling a physical computer network and the computers that are on the network. It’s often used for specialized use cases like fraud detection and shopping recommendations.

For queries, graph databases use a specialized graph query language like Gremlin or Cypher.

Note: GraphQL and the Graph Data Model are not related.

Wide-Column Data Model

A wide-column data model has tables (sometimes called “key families”) that are similar to relational tables, but they are column-based instead of row based.

This means that any given row can have an arbitrary amount of columns, different from other rows. Furthermore, columns can continuously be added to existing rows.

Each row has a key, so you can also think of this model as a 2d evolution of a key value store.

Query languages for wide-column data models are database specific. A popular example is CQL (Cassandra Query Language), which looks similar to SQL, but is a very limited subset.

Text Search Data Model

A text search data model is optimized for storing and searching text. This includes faceting and fuzziness, but can also include geographical search.

Creating indexes is a key part of interacting with text search. Searches provide parameters to active indexes, and are often performed via REST requests or a specialized SDK. Types of queries that might be used include string, match, phrase, compound, range, and geospatial.

Data Modeling Examples

Multi-model databases are those which can support multiple types of the data models listed above with the same pool of data. That being said, here are some examples of physical data model implementation tools (databases) for each model:

    • Relational: SQL Server, Oracle, MySQL, MariaDB, PostgreSQL, SQLite, Microsoft Access, Snowflake.
    • JSON Document: Couchbase, CosmosDB*, DynamoDB, CouchDB (similar name, but this is different and separate from Couchbase), MongoDB.
    • Key-value: There are very few pure key-value databases, but examples that have good key-value support: Memcached, Redis, Couchbase.
    • Graph: Neo4j, CosmosDB*, ArangoDB.
    • Wide-column: Cassandra, HBase, CosmosDB*.
    • Text/Search: Elasticsearch and Solr (Lucene), Couchbase (Bleve)

*Note that you must choose one data model up-front with CosmosDB.

Data Modeling Tools

Data modeling doesn’t require any tools, and can be done on a dry erase board or with pencil and paper (with the exception of physical data modeling, of course). However, there are many data modeling tools that can make your team’s job easier. Here are some popular ones:

JetBrains DataGrip provides a database diagramming tool to construct a model from an existing database. These diagrams can be saved to UML or PNG. This is great for physical data modeling.

Erwin Data Modeler is a very popular tool that supports conceptual, logical, and physical data models. Quest, the makers of Erwin, have partnered with Couchbase for modeling and migration use cases.

Hackolade is another data modeling tool that supports a variety of data models (including all the physical models mentioned in this post) as well as data model migration.

Idera has a full suite of tools for modeling. For instance, ER/Studio Business Architect provides conceptual modeling, and ER/Studio Data Architect provides logical and physical modeling.

If these tools seem like overkill for getting started, remember that you can use a more general purpose diagramming tool, like Diagrams.net to get your team working collaboratively.

Next Steps and Resources

Conceptual, logical, and physical modeling were all introduced in the previous post. This post took a closer look at logical data model choices. The next (and final) post in this series will dive into physical JSON data modeling specifically.

In the meantime, check out these resources for more info:

Author

Author

Posted by Matthew Groves

Matthew D. Groves is a guy who loves to code. It doesn't matter if it's C#, jQuery, or PHP: he'll submit pull requests for anything. He has been coding professionally ever since he wrote a QuickBASIC point-of-sale app for his parent's pizza shop back in the 90s. He currently works as a Senior Product Marketing Manager for Couchbase. His free time is spent with his family, watching the Reds, and getting involved in the developer community. He is the author of AOP in .NET, Pro Microservices in .NET, a Pluralsight author, and a Microsoft MVP.

Leave a reply