NoSQL vs SQL |
A common thread in technology — or for that matter, just about anything humans attempt over time — is evolution.
It's not unusual, then, to see things grow and adapt as the world
around them changes. That very adaptation means that sometimes older
variations of our favorite tools remain, while newer variations appear
alongside them to better serve certain, specific purposes.
The database world is no exception. Since the days of E.F. Codd and his introduction of the relational data model,
we've seen data stores and their requirements grow from purely
transactional models doing ACID (Atomic, Consistent, Isolated, Durable),
to requiring more flexibility, ability to scale (via distributed
deployment), and speed (in terms of latency and throughput). These
improvements may, in many cases, be at the very expense of what made RDBMS solutions preferable in the first place. But how to know?
In this piece, we'll refer to traditional SQL databases —
relational database systems — as RDBMSs. Their counterpart, in our
comparison, is the category of databases — known as NoSQL — which
includes many subtypes.
Overview of SQL/RDBMS
RDBMSs have been the default choice for information storage
in financial, manufacturing, logistical, personnel, and other systems
since the 1980's. Since Oracle's first commercial offering, by what was
then known as Relational Software, was introduced in 1979, the world saw
a phasing out of their predecessors — legacy hierarchical and network
databases — partly because of ease of use and administration. Examples
of RDBMS databases available today include Oracle, MySQL, Microsoft SQL Server, and PostgreSQL.
RDBMSs improve on their predecessors via a few other attributes that make them indispensable: support for the ACID transaction, and the facilitation of referential integrity.
The fault-tolerance, accuracy, and trustworthiness born of
these attributes have made RDBMSs the go-to for transactional data.
However, with the advent of the web in the 90's and "Big Data" in the
mid-aughts, new requirements appeared.
To deal with the exponentially-growing, semi-structured flood
of data that was, by then, becoming commonplace, admins asked
questions. Could RDBMSs be distributed, or were they capacity-limited to
the hardware they resided on? How did distribution affect throughput?
Could RDBMSs support multi-tenancy, and did this threaten transaction
isolation? Could relational systems support very wide columns,
variable-length records, semi- or even unstructured data?
Often, the next evolutionary step is to tear down your
existing assumptions and start over, with your new requirements in mind.
Enter NoSQL.
Overview of NoSQL
Actually, a few "NoSQL" solutions had been around a while
already, but hadn't gained much traction outside of specific domains. As
early as 1966, the unfortunately-named MUMPS programming
language supported a key-value database (with ACID transactions) for,
ironically, the healthcare industry. Another early key-value store (for
Unix) was Ken Thompson's dbm (of which a current successor is BerkeleyDB).
NoSQL is an umbrella term for any database that stores data in a way other than
RDBMS's rigidly-typed, schema-immutable, relational tables. Today,
there are several common types of NoSQL databases, best suited to
different purposes. Among them:
- Key-Value stores: lightning-fast lookup, via a key-value pair, with caching capabilities.
- Document-oriented databases: databases that, like key-value databases, use a key to access an associated value; in a document-oriented database however, the data in the value is not opaque to the system and may be used to optimize the query. Document-oriented databases may store their data in a notation like JSON or XML.
- Distributed/Tunably Consistent databases: distributed databases that let the user configure the tradeoff between consistency, availability, and partition tolerance.
- Wide-Column stores: databases that use tables, rows, and columns, like an RDBMS, but where names and formats of columns can vary by row within the same table.
- Time-Series databases: database optimized for storing and manipulating time series data (where timestamps are best treated as discrete quantities rather than continuous mathematical dimensions).
- Text Search: these databases let you store data in unstructured format or even as plain text.
It's worth mentioning that many NoSQL databases out there
today provide features from multiple NoSQL types. For example, both
Apache Cassandra and ScyllaDB are wide-column stores and distributed, tunably-consistent databases.
Now that many NoSQL databases achieve their (com)promises, many, if not most, are going back to
an SQL-like dialect, to grow their user base with users already
familiar with SQL. CQL (Cassandra Query Language), Spark, Hive, or
Presto are all notable examples. Elasticsearch has its own SQL variant.
So does MongoDB. Even streaming platforms like Kafka are getting in on
the SQL act!
Key Differences Between NoSQL and SQL/RDBMS
In addition to these SQL (-like) query languages, NoSQL and
RDBMS databases share at least one more thing in common: the ability to
store data. The difference is in how.
NoSQL databases make certain compromises to get a leg up on
flexibility, ingestion speed and query throughput. As noted earlier, for
example, a tunably-consistent database offers users an adjustable
tradeoff on consistency if higher availability or partition tolerance is preferred.
Also, NoSQL databases may be missing or may compromise some
of the features inherent to RDBMS, for example, primary keys, foreign
keys, rigid schemas, and (hardcoded) data types.
Which Is Best, When?
If you could answer the question "Why NoSQL?" with only a few words, they would probably boil down to scale (including flexibility of scale), flexibility (including of schema and type constraints), or latency and performance
(including throughput). Secondary considerations may include the size
of the community (and the support and documentation you're likely to
find) or the level of industry adoption (as a measure of popularity, and
thus reliability and fitness for purpose) of a given solution.
Of course, there's more to it than just that. While NoSQL
offerings are built around these concepts, RDBMS databases remain the
preferred choice in industries and applications involving transactional
data, where ACID-based durability is essential: user authentication and
access-rights management, healthcare, banking, resource allocation and
inventory management, and shopping carts, to name but a few. RDBMS
databases come with the long-established promises of isolation,
security, and referential integrity, and there's no seeing those
requirements go away any time soon.
However, in applications involving high-volume, fast-moving analytics
data from sensors, applications, or complementary systems, a NoSQL
solution with different features or attributes might be a better fit.
What kind of needs do you have? A fast-lookup with caching features,
like a key-value store? A more flexible document-store for searching
through larger records, or even text search for unstructured data? Maybe
a distributed wide-column store with tunable consistency, or a store
specialized for time-series data?
NoSQL is an evolution from RDBMS databases to better serve specific needs. The "best" in any case is what suits your needs; remember to assess your existing infrastructure and team to find areas where gaps can become strengths.
1 comment:
Worth for my valuable time, I am very much satisfied with your blog. Thanks for sharing.
Python Training in Chennai
Python Course in Chennai
Ethical Hacking Course in Chennai
RPA Training in Chennai
Post a Comment