Sunday 26 April 2015

Persistence and Constancy: How to store your data permanently but change your storage solution

Your requirements are simple: you want to store data safely, and you don't want to be tied to a particular vendor or technology.

I can't remember a time when thinking and available products were moving faster, or when it was more difficult to formulate tactics, let alone strategy. But there are some general themes.

Keep your Deployment Minimal

The more technology you have, the harder your operations are. You have to know a lot about each additional piece of technology. The more instances you use to run your technology, the more costly it is. The long and short of it is that it's desirable to minimise your dependencies.

Having a three-node MongoDB cluster, a two-node MySQL setup and duplicated Redis is obviously overkill for a small deployment, isn't it?

Horses for Courses

In the world of persistence, like everywhere else, different technologies are good at different things. Some technologies are very poor at some things but good at others. If your requirement spans a range of functions then you're going to find that you need more than one type of technology.

Someone said to me recently that best current practice is now to have more than one type of technology for persistence. I have to agree, reluctantly, despite the aim of keeping your deployment minimal, as above.

Choosing Technology is Hard

There are many choices. Much of the information is vague and much of the discussion around it is misleading or wrong. The only practical choice for a small engineering team is to narrow down the choices and try them one by one. Of course, while you are doing that everyone is busy upgrading their technology.

A good example is the arrival of MongoDB's WiredTiger storage engine in MongoDB 3.0. In my tests it's slower, not (as advertised) much faster. How can this be? My results are not consistent with other benchmarks, even those from people with an axe to grind (e.g. Couchbase).

Perhaps it's because I tested it on slow rotating disks and it's designed for SSDs. Perhaps it's because it's not suitable for the workload we have. Perhaps that's why MongoDB now maintains two storage engines. Perhaps it really is just slower at the moment. That's a lot of perhapses. In any case it does not appear to be the answer to this particular MongoDB maiden's prayer. I just have to focus my efforts on functional application enhancements, not on delving around in the innards of the persistence engine that I have chosen.

In the same tests the TokuMX storage engine was much (5x) faster. But it looks like it's at Mongo 2.4, and it was provided by a small and relatively unknown company. That has changed now that Tokutek has been acquired by Percona, which may help accelerate Tokutek's plan to make it a third storage engine choice in Mongo 3.0.

You can't choose your technologies off the spec sheet or by hearsay; you have to try them. As a small company, you may not have the resources to try them properly, and by the time you discover their limitations you are committed. Because of time and money constraints you need to stick with your choice and try to make it work out. Hopefully your technology vendor is moving fast and fixes and enhancements arrive in time ...
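Trying them need not mean an elaborate benchmark suite. As a rough sketch only, here is the kind of small timing harness that will at least tell you whether one candidate is wildly slower than another on your own hardware. The names `time_writes` and `compare` are made up for illustration, the record shape is an assumption, and the only "store" wired in is a plain Python dict standing in for a real driver call. You would pass in your own write function for each candidate.

```python
import time
from typing import Callable, Dict

# A "write" is any function taking (key, record); in a real test this would
# wrap a call to the storage driver under evaluation (an assumption here).
WriteFn = Callable[[str, dict], None]


def time_writes(write: WriteFn, count: int) -> float:
    """Return the seconds taken to perform `count` writes through `write`."""
    start = time.perf_counter()
    for i in range(count):
        write(f"key-{i}", {"n": i, "payload": "x" * 100})
    return time.perf_counter() - start


def compare(candidates: Dict[str, WriteFn], count: int = 10_000) -> Dict[str, float]:
    """Run the same write workload against each named candidate."""
    return {name: time_writes(write, count) for name, write in candidates.items()}


# Stand-in candidate: an in-memory dict, used only so the sketch runs as is.
store_a: dict = {}
results = compare({"in-memory dict": store_a.__setitem__}, count=1000)
for name, seconds in results.items():
    print(f"{name}: {seconds:.4f}s for 1000 writes")
```

A harness this small says nothing about durability, concurrency or read patterns, which is rather the point of the section above: it's what you have time for, not what you would need in order to be sure.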

When it Comes to Change

It's foreseeable that, despite your constancy, your solution doesn't grow with you. It's impossible for you to foresee this unless you have done more testing than you probably have time to do. You have to take a punt, and there's a good chance you will lose. You're going to tell your CEO that they are not going to get the new features they need for sales and that you're going to have to replace something with no functional gain. I recommend that you practice the argument in front of a mirror before trying it in the office.

Another reason for change is that you find out something you really don't want to hear. My thanks (I suppose!) to Nick Reffitt, CTO over at Tapdaq, for bringing this article about flaws in MongoDB to my attention. It's a very good but very long article, not for the faint hearted, which in brief asserts that Mongo has some failure modes that mean you will lose data or that the data you retrieve will not be consistent. Even if I'd been sublimely happy with MongoDB before, I'm not now. A chat with my CEO is due. Cough, Hi Dan ...

It's also possible that something new will come along which appears to have features that are a better fit for what you're doing. This one is harder to explain to your CEO. Really practice in front of a mirror. Try smiling. Try to make sure that a change of this kind has measurable bottom line benefits.

Either way, how can you best be prepared for change that in some time frame is inevitable? There will be pain and anguish. Data migration is not fun. How can you minimise the disruption?

There are textbook, or accepted, answers. If you haven't separated your storage concerns into a service layer and a DAO layer, people will sneer at you. And if your domain objects don't have nice ORM annotations, well, people are going to say that your code doesn't smell good, so don't go out to any parties till you fix that.
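For anyone who hasn't met the textbook answer, here is a minimal sketch of the service layer and DAO layer split. Everything in it is hypothetical: `CustomerDao`, `InMemoryCustomerDao` and `CustomerService` are names invented for illustration, and the in-memory implementation stands in for whatever real store you would put behind the interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class CustomerDao(ABC):
    """Hypothetical storage-neutral interface: the only thing the service sees."""

    @abstractmethod
    def save(self, customer_id: str, record: dict) -> None: ...

    @abstractmethod
    def find(self, customer_id: str) -> Optional[dict]: ...


class InMemoryCustomerDao(CustomerDao):
    """Stand-in implementation; a real one would wrap a database driver."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def save(self, customer_id: str, record: dict) -> None:
        self._rows[customer_id] = dict(record)  # store a copy, not the caller's dict

    def find(self, customer_id: str) -> Optional[dict]:
        row = self._rows.get(customer_id)
        return dict(row) if row is not None else None


class CustomerService:
    """Business logic lives here and knows nothing about the storage engine."""

    def __init__(self, dao: CustomerDao) -> None:
        self._dao = dao

    def rename(self, customer_id: str, new_name: str) -> dict:
        record = self._dao.find(customer_id)
        if record is None:
            raise KeyError(customer_id)
        record["name"] = new_name
        self._dao.save(customer_id, record)
        return record


dao = InMemoryCustomerDao()
dao.save("c1", {"name": "Ann"})
service = CustomerService(dao)
service.rename("c1", "Bob")
```

The theory is that changing storage means writing one new `CustomerDao` implementation and leaving `CustomerService` alone. Whether the interface really stays neutral once queries, transactions and consistency guarantees differ between stores is another matter.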

Even so, at risk of a spot of sneering, let's challenge accepted wisdom. I'm not saying don't use these accepted techniques, just be aware that they may not be doing for you what you were hoping.

ORMs: A Faustian Pact goes into that in more detail.


In the end there are no good off-the-shelf answers other than to quite simple questions. Try it, stick with it if you can, move on if you can't. Try to reduce your dependencies so moving on is less painful than it might be.


Oh, one more thought: in general, if your framework locks you into a single persistence solution, think again whether you should be using it. Specifically, Meteor users, I mean you.
