MongoDB grows up
09 June 2022, 10:45
When I rejoined MongoDB in 2021, I got to hear all the old jokes rehashed: that MongoDB is “web scale,” that it loses data, that it is only eventually consistent, and so on. The “web scale” parody video is funny, but the other claims have largely been wrong since the day they were made.
For example, MongoDB has always been strongly consistent. The criticisms that did contain a grain of truth have gone stale, one MongoDB release at a time. As Senior Developer Advocate Mark Smith puts it, “Everything you know about MongoDB is wrong.”
Of course I’d say that. I work for MongoDB, after all.
Even so, I think it’s worthwhile to gut-check our assumptions. Consider PostgreSQL: for years we were told that enterprises couldn’t replace Oracle Database or SQL Server with it.
For many workloads, that’s simply not true today, and it almost certainly wasn’t as “true” before as some would have had us believe. PostgreSQL has always had a great community, but it has also had a chorus of critics. Even so, every major cloud provider now offers a PostgreSQL database service.
Going further, in 2021 AWS launched Babelfish, an open source project that understands SQL Server’s T-SQL dialect and wire protocol, making it simpler to run applications written for SQL Server on PostgreSQL.
Likewise, there’s a reason that every major cloud provider offers MongoDB in some form, and that the database was downloaded more times in the past 12 months (265 million) than in the previous 12 years combined. Both PostgreSQL and MongoDB have made dramatic gains in popularity relative to Oracle and SQL Server.
I don’t want to fanboy this article. But if you’ll indulge me, I’d love to catch you up on a MongoDB you might not know and finish with a suggestion that just might shock you: MongoDB now leads the industry in security, thanks to the release of Queryable Encryption.
Atomicity, transactions, etc.
I left MongoDB in 2014, just before the company went into overdrive on improving the core database. MongoDB had always been a developer darling thanks to its convenience, but around this time it announced the acquisition of WiredTiger. That acquisition paved the way for document-level concurrency control and compression, first as an optional storage engine in MongoDB 3.0 and then as the default in MongoDB 3.2.
Much of the work of integrating WiredTiger deeply into MongoDB happened in 2015, setting up a steady drumbeat of database improvements over the next few years. One of my absolute favourites, multi-document ACID transactions, arrived in MongoDB 4.0. In 2018, MongoDB co-founder Eliot Horowitz wryly announced that “MongoDB drops ACID,” and the MongoDB world was never quite the same thereafter.
MongoDB 5.0 added native time series collections, live resharding, a new serverless offering, and more. It also introduced a versioned API (since renamed the Stable API) that lets developers upgrade the database without having to change their application code.
Most recently, at MongoDB World 2022, the company announced a number of things to make developers’ lives easier, including the Atlas Data API, serverless instances, and the Atlas CLI. For me, the most interesting announcements fell under two themes: analytics and security. Oh, and open source.
In MongoDB’s view, analytics is about helping developers build better apps, not about data analysts running offline reports. Businesses increasingly need a real-time view of their operations, and that pushes data analysis and reporting closer to the applications that generate the data.
MongoDB clearly recognised this growing need and announced several new ways to make running analytics against operational data easy, including a new SQL interface, Atlas Analytics Node Tiers, Atlas Data Federation, and Atlas Data Lake.
Even so, to my mind the company’s key announcement for applying analytics to real-time data was Columnstore Indexing. Columnar formats suit analytical workloads because a query that aggregates a few fields only has to read those fields, not every whole document. A columnstore index therefore lets developers keep documents in the model that best fits their application, without moving the data elsewhere, while still running fast analytical queries against that data in real time.
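To make the columnar point concrete, here is a toy Python sketch. It assumes nothing about how MongoDB’s columnstore index is actually built: the order documents and the helpers `build_column_index` and `total_by_region` are hypothetical. It copies two fields into per-field lists, then answers a sum-by-region question by scanning only those lists rather than every whole document.

```python
# Toy illustration of row- vs column-oriented layouts for analytics.
# Hypothetical data and helpers; this does NOT model MongoDB's actual
# columnstore index, only the general idea behind columnar formats.
from collections import defaultdict
from typing import Any

# Row-oriented: each document keeps all of its fields together,
# which is the shape the application reads and writes.
orders: list[dict[str, Any]] = [
    {"_id": 1, "customer": "ana", "region": "EU", "amount": 120.0, "items": ["pen"]},
    {"_id": 2, "customer": "bo", "region": "US", "amount": 75.5, "items": ["ink", "pad"]},
    {"_id": 3, "customer": "cy", "region": "EU", "amount": 42.0, "items": ["pad"]},
]


def build_column_index(docs: list[dict[str, Any]], fields: list[str]) -> dict[str, list[Any]]:
    """Copy selected top-level fields into per-field lists aligned by position.

    The source documents are left untouched; a missing field becomes None.
    """
    columns: dict[str, list[Any]] = {f: [] for f in fields}
    for doc in docs:
        for f in fields:
            columns[f].append(doc.get(f))
    return columns


def total_by_region(columns: dict[str, list[Any]]) -> dict[str, float]:
    """Sum 'amount' per 'region' by scanning only those two columns."""
    totals: defaultdict[str, float] = defaultdict(float)
    for region, amount in zip(columns["region"], columns["amount"]):
        if region is not None and amount is not None:
            totals[region] += amount
    return dict(totals)


cols = build_column_index(orders, ["region", "amount"])
print(total_by_region(cols))  # {'EU': 162.0, 'US': 75.5}
```

In a real database the payoff comes mostly from storage: a column’s values sit together and typically compress well, so an aggregation over a handful of fields reads far less data, while the documents themselves keep their original shape.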
If all this comes as a surprise to those in the “MongoDB is web scale” camp, Queryable Encryption is an even bigger shocker.
Upping the ante on data security
Nothing that MongoDB (or any database company) releases would be worth much if its features and functionality weren’t matched by strong security. MongoDB has long offered excellent security, but bringing structured encryption to field-level encryption takes things to another level.
Most databases have figured out how to secure data at rest and in transit but fail to secure data in use, when it’s vulnerable to insider access and active database breaches. Enter field-level encryption. Because sensitive fields are encrypted on the client before they ever reach the database, field-level encryption keeps that data protected on the server, both in memory and on disk.
It’s the strongest protection against breaches, but it has a downside: It doesn’t allow rich, expressive querying of encrypted data. Yes, you can do exact equality matches, but only with deterministic encryption, which always turns a given value into the same ciphertext.
Nice, but not nearly enough.
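To see why equality is the one query that survives, consider a toy Python sketch. It uses HMAC-SHA256 as a stand-in for deterministic encryption (a so-called blind-index token, which unlike real encryption cannot be decrypted), and the field names, key handling, and data are all hypothetical; it is not MongoDB’s scheme. The client turns each value into a token with a key the server never sees, and the server answers an exact-match query by comparing tokens.

```python
# Toy sketch: why equality search over encrypted fields needs determinism.
# HMAC-SHA256 stands in for deterministic encryption (a "blind index" token).
# Unlike real encryption it cannot be decrypted, and it is NOT MongoDB's scheme.
import hashlib
import hmac
import os

KEY = os.urandom(32)  # hypothetical client-side key; the server never sees it


def deterministic_token(value: str, key: bytes = KEY) -> str:
    """Same value + same key -> same token, so the server can compare tokens."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def randomized_token(value: str, key: bytes = KEY) -> str:
    """Mixes in a fresh random nonce, so equal values yield different tokens."""
    nonce = os.urandom(16)
    mac = hmac.new(key, nonce + value.encode(), hashlib.sha256).hexdigest()
    return nonce.hex() + mac


# What the server stores: opaque tokens only, never the plaintext values.
stored = [
    {"id": 1, "ssn_det": deterministic_token("123-45-6789"), "ssn_rnd": randomized_token("123-45-6789")},
    {"id": 2, "ssn_det": deterministic_token("987-65-4321"), "ssn_rnd": randomized_token("987-65-4321")},
]


def find_ids(field: str, query_token: str) -> list[int]:
    """Server-side exact match: compares tokens, never sees plaintext."""
    return [doc["id"] for doc in stored if doc[field] == query_token]


# The client tokenizes the search value with the same key and sends only the token.
print(find_ids("ssn_det", deterministic_token("123-45-6789")))  # [1]
print(find_ids("ssn_rnd", randomized_token("123-45-6789")))     # [] (nonce differs)
```

That determinism is also the trade-off: equal values always produce equal tokens, so anyone who can read the stored data learns which records share a value, and the tokens carry no order, so a question such as “amounts between 100 and 500” can’t be answered this way.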
Researchers have been working on this problem since 2001, but this week MongoDB announced the first-ever commercially available structured encryption model, called Queryable Encryption.
With structured encryption, MongoDB transforms an encrypted field in a cryptographically secure way and stores opaque metadata alongside it, which allows expressive and efficient queries to be performed. For example, structured encryption lets a developer build a banking application that finds transactions within a range of dates or dollar amounts for a fraud investigation.
This is best-in-industry stuff, and it’s built for everyday developers rather than cryptographers. Queryable Encryption helps developers keep their focus on building engaging, data-driven applications while meeting the industry’s most demanding data privacy challenges. No PhD in cryptography required.
This would be interesting in and of itself, but MongoDB took an especially noteworthy angle with Queryable Encryption: It will be 100 per cent open. As MongoDB CTO Mark Porter declared in his keynote, “We will be publishing the code, the algorithms, and the math behind it because we believe in white-box security, not black-box security.”
This may come as a surprise to those who still carp about MongoDB’s licence change in 2018. (Developers themselves don’t seem to mind; they have higher priorities.) But it shouldn’t. MongoDB is a contributor to Apache Lucene, releases WiredTiger under an open source licence, and also offers its Realm mobile data store as open source. It’s easy to paint companies as either open or closed, but that binary is usually wrong, as it is in this case.
All of this is a long way of saying that perhaps it’s time to hit “refresh” on your view of MongoDB. No, it’s not going to be the right data platform for all of your workloads. Nothing is. But it’s better to make that decision based on current reality than on outdated myths.