One of the dilemmas many architects face when designing a system is: To SQL or to NoSQL?
Of course, this is not a “One size fits all” question, and the answers will vary based on many factors: Schema flexibility, type of queries, importance of transactions, scale and more.
A few weeks ago, with one of my clients, we faced the same dilemma. Our conclusion was to use NoSQL, and store our data (of this specific type) as a JSON string, knowing we will have to find an efficient way to query it later.
So the question now became: Which NoSQL database should we use?
As of today, the quite natural answer for this question will usually be “MongoDB, of course!”. But we, as petty as we are, decided to make an informed decision.
The service we were working on was a logging service. In a very early point it was decided that the most important aspect of this service should be performance. We knew this service is going to be queried a lot, and we wanted to make sure the calling services will get the answer in a timely manner.
Since the organization’s IT supported SQL Server 2016 in addition to MongoDB, we decided to compare the performance of those two databases.
We devised a test that was supposed to demonstrate which database – SQL Server 2016 or MongoDB -was faster when querying the JSON data.
It’s important to note that our test measured querying only, and not other types of data actions (ie. Insert, update, delete). The reason is, as stated above, that we knew we’re going to get a lot of queries, and wanted to make sure they’ll perform very well under load.
So here is what we did:
- We loaded 100K records with JSON data to each database. The records are not too large, and contain ~500 bytes each.
- We tested 5 types of queries on the data. Each query was designed to demonstrate a different type of query behavior we anticipated our users to execute. The queries were:
- Retrieve the whole 100K records
- Query by a date range
- Query by a single parameter
- Query by 4 parameters
- Query by nested property of the JSON
We ran each test 1000 times, for each database, and then got the average of the results.
Before posting the results, some technical info:
- We didn’t make any special tuning or configuration change on either of the databases. We used the default installation on both.
- We didn’t define any index on either database.
- The code for querying the databases was written in .NET Core.
- The machine we used for testing has i7-4785T 2.20GHz CPU, 8GB RAM, SSD Drive, Windows 10.
And here are the results:
Surprised? Me too… I was sure MongoDB will be the clear winner here, but the actual results show quite the opposite. In 4 out of our 5 tests, SQL Server 2016 was the faster DB, in some cases by a large margin.
Only in one type of query, nested properties, MongoDB came out the winner, but even in this case it was a very close call.
For our system this was quite a convincing test, and we went with SQL Server 2016.
However, this test was tailor made for our scenarios, and your mileage may vary.
The important lesson for me here was – never assume anything. Always test, and work by the numbers.
Could we see a sample row of data that was stored and the query syntax for each test?
Unfortunately I can’t show this data, as it contains some proprietary info.
However, the JSON is quite large, contains ~20 fields, one of them has ~1KB of base64 string data.
I hope this helps.
Nice test, but to my oppinion it was obvious results, the noSql technology always never defeet Sql engine in quering, not to talk about aggregations..
The benefits are in writing in hige throughputs, scaling, and the flexibility of the schema (cap theorem)
You should try the same test with 100M records, it was realy interested
I’m not sure about that.
In almost all articles related to NoSQL vs SQL, one of the main arguments of the Pro-NoSQL is the great JSON handling. It’s true that other aspects are also important (performance due to eventual consistency, and so on), but the JSON handling is definitely at the top of the list.
Keep testing, you will be surprised. There is no scenario where mongoDB will outperform SQL Server. The ones they always call mean sharding. You can shard in SQL Server too. If you do it right it will outperform again MongoDB.
I wounder what would happen if you added some indices (including Full Text index, I believe MongoDB 4.4 supports it).
That’s a very good question… I believe the results might be different, but I’m not sure to which side…
which unit is used for time s or ms
Thank you for this comparison!
Another strength of SQL Server and most RDBs is the query optimizer, especially when it comes to complicated queries.
In MongoDB, you the programmer have to explicitly code the “access path” – step by step, how to read the data. Consider a self-join query with NOT EXISTS, and how simple that is to write in SQL. Now try to write that for MongoDB, and there are many ways you could code it. Which is best?
In SQL Server, the query optimizer 99% of the time determines an excellent “access path” the first time (especially if the statistics are current). Done.
Now try to make this query fast in Mongo DB. You have quite a project. There are many ways it can be written, and who knows which one is fastest? Time to call for MongoDB premium support. Where are your cost savings now?
Oh, MongoDB has an explain. Compare the results of a MongoDB explain to the SQL Server graphical explain. The MongoDB explain is… really hard to understand. The SQL Server explain is – graphical, with percentages all over the place, and lots of useful information at your fingertips.
After you have written your whole big application, and you “run out of database”, where are you? With SQL Server, SQL is almost universal, I could pretty easily switch to PostgreSQL or MariaDB. With my whole app written for MongoDB — it’s all proprietary. Switching to PostgreSQL or MariaDB or even another NoSQL DB is a very big project. Talk about vendor lock-in.
Considering that in a typical app, SELECTs make up 95% of all requests, and Insert, Update and Delete are pretty rare, the SELECT is the 800 pound gorilla. That’s where I want the power of the Query Optimizer.