Sunday, 26 April 2015

ORMs: A Faustian Pact

This post is a deep dive on one of the topics that I mention in Persistence and Constancy.

The discussion is about a) abstraction of persistence technology b) the practicality of switching storage solutions c) schemas and d) choosing NoSQL or RDBMS.

The separation of concerns into service, DAO and domain objects is sensible and is good practice but how much is it going to help you with the above?

How Decoupled are your Domain Objects?

First of all, look at the data types of the fields of your domain objects. And in particular look at the id field. If you're using Mongo this is an ObjectId. Bzzz. Error, this is a layer violation. Your domain objects which you hoped were independent of the underlying storage now are not. This is not a domain object any more, it's a Mongo object.

OK, you're not using a strongly typed language so you don't care. I'm sorry, your get out of jail free card doesn't work. Typically a domain object refers to other domain objects by id. It doesn't matter what these "foreign keys" are stored as, they are particular to the storage mechanism you are using. i.e. if you want to migrate your storage the best these references can become are secondary index fields, if the actual id in a different type of storage is now different.

Never mind, you can always rewrite those fields during data migration. Hmmm, yes you can. Sounds tricky and error prone.

Annotations or @Mephistopheles

There are standard annotations. JPA and JDO are both standard, if you are using Java. However, if you are using e.g. Hibernate then you're probably going to want to use some of its proprietary features and extensions. That's OK as long as you don't mind being locked into Hibernate. Hibernate supports lots of things, after all. Cross fingers it will support what you want to use next.

Bear in mind, also, that a standard annotation mechanism of necessity is a least common denominator solution. That means that you're not exploiting the best features of the underlying persistence mechanism. The ones that make that solution stand out from the crowd. Hmmm, that is not altogether clever. This point is made very cogently by Ronen from Aerospike in the video linked below.

And of course, the annotations that you use may not be applicable to a solution that you choose. Imagine that you're using MongoDB and that you've chosen to use Morphia. Not a bad choice at all. The problem is, of course, that when you decide to change the underlying persistence mechanism, you won't find an ORM that supports Morphia annotations. So you have to re-annotate.

During migration you're most likely going to have two sets of annotations on the same domain objects. Never mind, it's a bit ugly, but it can be made to work, given of course that the fields of the domain objects are not either strongly or implicitly typed to underlying persistence data types. Remember that whereas the underlying type is explicit in strongly types languages, things do still have a data type whatever language you are using, and even if you're storing an id as a string it's still an id that is closely coupled with the underlying storage.

A Schema is a Schema is a Schema

Your domain objects are probably, in JPA terms, "entities", i.e. they are rows of a table. They probably refer to other entities by id. We talked about that above.

Now, if you find that all your domain objects are entities, well, that means you have nicely normalised all your data. That's good practice, right? And you're using an RDBMS, right? You're not? You're using MongoDB? Really? Why?

If your domain objects are basically a projection of an RDBMS schema then why aren't you using an RDBMS? Using a non-relational store is potentially the worst of both worlds. If your domain objects are basically structured according to a schema then you can get the benefit of joins and all the tremendously useful things that RDBMS does and that NoSQL doesn't. What's more you are not taking advantage of the basically schema free-nature of NoSQL, and schema changes are going to have similar impact as a result of your domain object and ORM choices as if your persistence was RDBMS based, and that's why you chose NoSQL in the first place, probably?

It is true that, for example, MongoDB provides for queries by field missing or not. And that helps with schema upgrades, because your DAOs can use this information and be flexible. But you need to be careful. I mean very careful, you're not careless about your data, I'm sure.

Plus it may not work. If you're using MongoDB you're probably aware that in place updates are quicker than updates that extend a record (document) beyond the current storage size. Consequently it may be important for performance reasons to pre-populate your records to their expected maximum size, so that they don't move on disk and require allocation to take place on an update. I probably don't need to point out that this negates any possibility of using "field not present" techniques.

More Hate for ORMs

That ORM is lying to you is really long and quite definitely for geeks. But this guy, Ronen Botzer from Aerospike, really really knows his stuff. As I say, at 1hr 20min is not a light confection, so something to watch while your significant other is watching reruns of Star Wars or Sex in the City or whatever it is that your tastes diverge on. He makes some tremendously interesting points, not least of which is that Aerospike is Open Source, i.e. no FoundationDB disappointments.

Take Outs

It is, in short, hard. Very hard. Think about the following:
  1. You are not going to get away from the fact that your persistence solution can't just be swapped out. It's going to be painful.
  2. Just because you're using schema-free storage doesn't mean that you don't in practice have a schema. All the vendors say you need to do careful up-font thinking about your data. Don't skip that step.
  3. Don't normalise your data for the sake of normalising it. Economy of storage is not usually a concern that trumps performance and flexibility.
  4. Don't think that because you have services, DAOs and domain objects nicely separated that you're independent of underlying storage. You're not. And worse still you may not be using its best features.
  5. If you need joins use an RDBMS.
  6. Choose open source to avoid FoundationDB style disappointments.
  7. Always question accepted wisdom.
  8. Always ignore fashion.
  9. Never ignore hygiene.
Happy persisting

No comments:

Post a Comment