Do we need so many databases?

We have hundreds of different databases to store data — and we need more

The world used to get by on just a smattering of databases. You know, trusty relational workhorses like Oracle, Microsoft SQL Server, Ingres, and IBM DB2.

Soon enough, however, open source crashed the party with MySQL and PostgreSQL. A bit later, NoSQL databases hit the market, with the likes of MongoDB, Redis, and Apache Cassandra growing in popularity.

In terms of numbers, by January 2013, DB-Engines, which ranks database popularity, listed 109 databases. Today? DB-Engines catalogs 356 databases, more than triple the number from just seven years ago.

Which may prompt a question: Is this proliferation of databases a good thing? Do we really need 356 databases? I mean, of course we’re going to keep using PostgreSQL (#4 on the DB-Engines ranking). But are Yanza (#336) or Upscaledb (#299) necessary?

The answer seems to be “Yes.”

Only a few will hit escape velocity

Not everyone agrees. Take MongoDB CEO Dev Ittycheria, for example. In an interview, Ittycheria opined that “there’s only a few platforms or technologies that really have escape velocity.”

Which ones? Enterprises “have some sort of legacy standard, some relational database, either Oracle or something open source, and then they’ll have some modern standard.” Not surprisingly, Ittycheria sees MongoDB as such a standard, and he has plenty of proof points to bear out his argument.

And why won’t developers explore those 356 options? Ittycheria’s response:

Developers don’t want to take on the cognitive load of learning all these types of databases; [they think] that after some point [it] becomes diminishing returns. So people want to consolidate their platforms, because it becomes easier to manage, there’s more developers that know those platforms, and so on, so forth.

Again, he has a point. It’s the same point I used to make when I worked at MongoDB.

The problem is that MongoDB’s own success belies that point. If we look at the top 10 databases on the DB-Engines list, almost half of them (40 percent) either didn’t exist or just barely came into existence 10 years ago.

Dig a bit deeper into the top 25 and this phenomenon is even more pronounced: more than half didn’t exist a decade ago. The reason they were created might be summed up as “data changes.” New data from new workloads necessitates new ways of managing it.

New itches to scratch

While MongoDB is a great example of this (founded in 2009, and 11 years later it’s the fifth most popular database on earth), perhaps an even clearer example is Redis. As I wrote recently, Redis founder Salvatore Sanfilippo would have preferred just to use MySQL for his real-time analytics engine.

Unfortunately, MySQL didn’t scale cost-effectively for him, so he broke all sorts of conventions to create an in-memory NoSQL database that has become insanely popular.

As he said, Redis is a great example of how it’s “possible to explore new things,” even in areas like databases, where people assume the current crop of options has “solved” all potential problems. We made that assumption for years with relational databases, trying to cram all of our semi-structured or unstructured data into neat-and-tidy rows and columns. It didn’t work.

Ittycheria is, of course, correct that the vast majority of these 356 databases won’t hit “escape velocity” and become the next darling of developers and enterprises. But as MongoDB and Redis prove, we’re far from done inventing new databases that developers will love. We might be able to pick the database winners for today’s data with some accuracy, but we’d be fools to presume to know what tomorrow’s data (and databases) will look like.

We simply don’t know yet.

What we do know is that while developers tend to congregate around a few general-purpose databases, they’ve also been aggressively embracing more purpose-built options.

Redis, for example, was often used “just” as a cache early on, but has started to grow well beyond that. Perhaps more than any other area of software, databases are an area ripe for continued evolution and disruption. This is something to celebrate, not constrain.