Sunday, 26 April 2015

SQL and NoSQL: What's the difference, why would I care?

Most of us need to store data as part of our products. At small scale and low volumes this isn't all that problematic, and at the proof of concept and early trial stage one technology choice is possibly pretty much the same as another.

As things scale out the technology choice becomes more important. When it comes to storage (or persistence, more accurately, if we mean the stored data survives loss of power) there are a couple of choices. Lots of people will have heard of SQL and NoSQL.

As a business person, ideally you should actually not have to care. In this post there are a couple of tips for you to think about when your engineering team burbles excitedly about the wonders of Mongo, the cruftiness of MySQL, the thrill of the new ... OK, you'd prefer not to care.

Before getting on to the tips, here is a brief and somewhat incorrect guide to the technology (never mind, it's brief).

What is SQL?

SQL is actually a language for accessing storage rather than a storage technology itself, the storage technology being Relational Database Management System or RDBMS for short. This technology has been around since the dawn of time (the early 1970s) and so has the advantage of being extremely mature. 

SQL databases can usually be configured with a high degree of fault tolerance and offer strong guarantees about the integrity of data. For example, if a bank receives two requests to debit £10 from an account containing £15 only one can succeed. SQL databases support transactions that have so-called "ACID" properties - which NoSQL databases usually lack to some degree or other.

Unsurprisingly, given they do so much, SQL systems can be rather big, cumbersome and have a reputation for being quite inflexible. 

MySQL is a well-known example of a SQL system in use among smaller companies. PostgreSQL seems to remain a choice for some too. Larger companies may use Oracle (which also owns MySQL) and Microsoft SQL Server.

NoSQL

NoSQL is possibly not a brilliant term for the extremely wide variety of non-relational storage mechanisms, the term having been popularised around 6 or 7 years ago, possibly because SQL was not used as a data access language at that time. 

There was and is an increasingly bewildering variety of technologies that fall under this label, which may have little in common with each other and each of which had the purpose of  addressing perceived shortcomings of or over-engineering of the traditional relational approach. Today some NoSQL databases actually do provide SQL access to data, making the term even less pertinent.

Some common NoSQL choices are:  

Redis: A blindingly fast key/value store. "In the case of a complete system failure on default settings, only a few seconds of data would be lost." Often used for caching - i.e. it's not the reference data store for anything and if it crashes can be easily rebuilt without loss of business data. Given the above quote, though, probably not the place to store financial transactions on the default settings.

MongoDB: A document database, grand daddy of them all in some ways. Possibly showing signs of its age a bit, though the new version 3.0 is promoted as being really quite shiny. Lots of people love it, and it has many detractors too. You have to configure Mongo very carefully to be sure that you are not open to data loss in some failure scenarios.

Others: Couchbase, Cassandra, Riak, Aerospike ... oh goodness, the list goes on. All with pros and cons.

How to Choose

Choosing which technologies to use where is quite hard and requires quite a lot of thought. Despite the inconvenience of using more than one technology to do what is ostensibly "the same thing", this is actually a respectable engineering choice and is probably "current best practice".

It's inconvenient to use more than one technology because your engineering staff need to know what to use and when. The operations aspect becomes quite a lot more complex, since performance tuning, resilience and backup strategies need to be thought about separately for each of the different technologies you deploy. 

Roughly speaking two different technologies means twice the operational burden and cost, and diagnosis of problems becomes correspondingly more complex and if there are more things that can break, well, more things will break.

Nonetheless, cost/performance trade off may still make a heterogeneous persistence choice sensible.

I write about this at more length: Persistence and Constancy.

Conclusions

So what's the answer?

For a while people would think If it's about persistence stick it in MySQL. Nowadays that's changed to If it's about persistence that means stick it in Mongo.

If you are still at the prototype or early trial stage and if I'm talking to you over at Wayra, I won't bat an eyelid almost no matter what you're using. It probably won't scale. You're probably going to rewrite everything you have before this becomes an issue. Concentrate on getting the functionality right and as far as persistence is concerned, well, if you can bear loss of data it doesn't much matter.

If you're doing a finance or banking solution, or something like that, it really does matter, no matter what stage you are at. Read on. Likewise if you're at a later stage then you need to be more careful about considering if the technology is appropriate to your use case.
  1. Don't reject SQL/Relational persistence because it's old and out of date. It's mature and battle hardened and does stuff that other technologies just don't.
  2. Over the years many management reporting tools have been written that interface with SQL systems. Theoretically, at least, they allow non technologists to get reports on their data without having to take engineering time to do this. That is a significant advantage and says that if you're using a system that has SQL access you will hopefully be able to use these tools. 
  3. If the above is not an issue and if you don't need strong transactional guarantees or you don't need to recover to "point in time" in the event of a failure then key value stores can be blindingly fast and may suit your needs just as well or indeed much better than an RDBMS.
  4. Read the documentation carefully, and test in a simulated environment. You can't possibly tell whether a persistence solution is good for you until you've thought carefully about the fit for your data and tried it out. 
  5. Despite increased operational complexity, mix and match may be a good compromise for your applications.
  6. Make sure your data is duplicated reliably. In Mongo that means the write concern is Majority. That's not the default.
  7. NoSQL can appear to offer liberation from having to think carefully about your data structures up front as it it offers the seeming possibility of saying "it doesn't matter, we can change it easily". Really don't do that.
  8. Be very, very careful about concurrency. For example, simple actions like "create or update" are prone to disastrous consequences, if you don't guard against near simultaneous duplicate requests. This is much harder and more subtle than it seems at first glance. Almost everyone has this problem and it's rarely dealt with properly.
  9. Don't wear a MongoDB T-Shirt at Wayra London on Wednesdays, because if I see you wearing one I'll be resisting a temptation to punch you, based on recent experience of performance testing it. Not those of you who are bigger than me, obvs.
Hope the above helps.
Jo

7 comments:

  1. Excellant post!!!. The strategy you have posted on this technology helped me to get into the next level and had lot of information in it.
    SAS Training in Chennai | SAS Course in Chennai

    ReplyDelete
  2. I don't know, without SQL it's not a data base, that's why in all CRM systems we are using SQL.
    Thx for article, it was useful.

    Best regards
    Toby, ideals

    ReplyDelete
  3. Well Said, you have furnished the right information that will be useful to anyone at all time. Thanks for sharing your Ideas.
    Python Training in Chennai | Python Course in Chennai

    ReplyDelete
  4. Nice Post! It is really interesting to read from the beginning & I would like to share your blog to my circles, keep your blog as updated.
    Regards,
    JAVA Training in Chennai|JAVA Course in Chennai|PHP Training in Chennai

    ReplyDelete
  5. The strategy you posted was nice. The people who want to shift their career to the IT sector then it is the right option to go with the ethical hacking course.
    Ethical hacking course in Chennai | Ethical hacking training in chennai

    ReplyDelete
  6. Thanks for sharing this informative content which provided me the required information about the latest technology.
    Salesforce training in Chennai | Salesforce CRM training in Chennai

    ReplyDelete
  7. Excellent post!!! The future of cloud computing is on positive side. With most of the companies integrate Salesforce CRM to power their business; there is massive demand for Salesforce developers and administrators across the world.Salesforce Training in Chennai | Salesforce Training Institutes in Chennai

    ReplyDelete