SQL vs NoSQL vs Search Engine

Oded Niv
5 min readOct 14, 2018

--

Post relocated to odedniv.me/blog/software/sql-nosql-search, go there for an updated version.

Which engine should you use for storing your data? Searching this question on Google, brought up a few stackoverflows, the first few of the bunch say something like:

“There is no way to tell without knowing the specific problem you are trying to solve.”

Or (exact quote):

“Elasticsearch has *this* and *that* problem.”

Or even:

“Use OrientDB.”

Now as much as I love non-generic solutions, well, I hate them. As a first post to this blog, here goes a mandatory “this is what this blog is all about”. Enjoy.

I will try to avoid using implementation-specific features as proof for my statements. Oracle may offer this feature, and MongoDB can offer another. The way I see it is that all these developers are putting code where it doesn’t belong, in order to help you become lazy. Those features give you the option to only use one product, so you don’t have to go spend your money anywhere else. You get a jet to drive around the city. Don’t be lazy, you might get people sucked into your engine.

I will only talk about the 3 in the title. More options probably exist, but I only used these 3, so I can’t use my “think like the author” ability on the others.

Relational database (or SQL)

As you read this beautiful and large heading you already know what this kind of database is supposed to be used for: relations.

While structuring your problem with this data store model is the most painful of the three, reminding you of coding before OOP, this is also the simplest structure. You define the most basic entities, and you define their relationships. There is no hierarchy.

I will write it clearly: This is where you define the entities and their relationships. If you have more than one entity (most likely) — this is your primary data store. This will maintain the relationship integrity (by implementing foreign key constraints and transactions), and this will choose your primary keys.

You can use this database to store files, or to create awesome indices that will let you search quickly, and this is probably what most programmers do when it’s the first database they learn to use. Some implementations will even help you out, giving you the tools and the optimizations. This is not what a relational database is meant for. The indices are there for foreign key constraints, and the blobs are just a patch for the lazy (they even stand out in most implementations with their need to be fetched explicitly).

OK, I get it, you want an example for why you shouldn’t use it for non-relational purposes: This is the hardest data store to scale due to the inability to stretch related entities across multiple servers without compromising the performance of using said relationships. *inhale, exhale*. You should store in it as little as possible.

Non relational database (or NoSQL)

Well, here be the data. Like the name suggests, there are no relationships here. It can be as simple as a filesystem by saying: “Give me a file name (or another primary key), I will give you its content”. Easy to scale, efficient to store.

Taking MongoDB as an example, some implementations may offer you a key-value format, so instead of giving a raw document, you send a map (or dictionary?). Fine, still cool, data is data, and this gives you a nicer way to structure your problem into it. Instead of having more and more 1-dimensional collections (equivalent to different directories in a filesystem), each resource in the collection is a bunch of values under the same primary key.

Then you also get to index some of said values, to search a resource by a value other than the primary key, I’ll grab that too. Not what I meant when I chose this data store type, but if there are no drawbacks on scale, sure. When they offer you relations though, nope-the-fuck-out. Can’t they read the heading?! It’s even beautifully formatted and large.

Search engine

TBH I’m not really sure it’s called a “search engine”, I think it doesn’t have a name yet. I also only know of one example implementation for this generic idea called Lucene (but you should really use one of its wrappers: Solr or the more up to date Elasticsearch).

If you used custom indices for searching in SQL and NoSQL, you are going to love this. By that I really mean: don’t use custom indices for searching in SQL and NoSQL, use a search engine. If it could talk like its NoSQL counterpart, it would be saying: “Give me a question, I will give you the primary keys that answer this question”. It’s also great for aggregations and analytics.

This engine gives you all the tools you need to search your data. It can be scaled and optimized greatly (when implemented correctly) since it doesn’t really have relations, and it can give you a lot of index mechanisms such as full-text search, geolocation, or whatever the implementation supports.

You might be tempted to say: “This is an all-in-one solution, why should I use SQL/NoSQL?”. The downside here is that the documents you put in it need to be structured according the way you wish to search them. This means duplicates, which pretty much means a big pile of mess. This is not really a data-base, it comes with much more logic, and it may require rebuilding more often according to new requirements, so you should still keep the single source of truth within the basic databases (SQL/NoSQL).

I recommend combining the above 3 types of tools which should solve most common problems, no matter which implementation you choose. You also get to learn new technologies and eventually age (your career) slower.

Got questions? Need help with designing your architecture? Feel free to comment about general questions/thoughts, or message me with your specific problem. You know how much I love to help open minded programmers, because as soon as I won’t this line will be removed or rephrased to something like “never contact me unless you are a Persian prince that needs money for a flight home, you will be rewarded”.

--

--

Oded Niv
Oded Niv

Written by Oded Niv

Software engineer by day, founder of dexHive by night.

No responses yet