Monday 28 December 2015

Star Trek's Scotty, Plumbers and CTOs

I read Sustainable Engineering by Douglas Squirrel on The Path Forward back in September. It provoked some thoughts that I think go beyond "one CTO will never agree with another".

This post is mainly directed at CTOs but there's a bit here for CEOs as well.

The Dilithium Crystals

In fact I agree with lots of what is said there. I suppose that my main concern is that it adds fuel to the fire for a CEO and board inclined to treat their CTO's concerns as Scotty-like pleadings to Captain Kirk. "The engines won't take it, captain", says Scotty, but somehow they always seem to.

Of course, it would be boring for the producers of Star Trek to show the consequences, namely that the Enterprise goes into extensive refit every time to recover the serviceability of those Dilithium Crystals and the supporting infrastructure. So we don't see that, all we see is that the captain's reckless treatment of his main asset has no consequences.

A former colleague of mine thinks that it's the influence of Star Trek on non-technologists that prevents a sensible dialogue on this subject. "It's a point of view, Jim ..."

MVP and Engineering Pragmatism

Moving on from Star Trek analogies, what is a balanced perspective, then?

At the early stages of an initiative you do not know what it is you are trying to build in sufficient detail to want to spend money on the quality that you will eventually require once the details have stabilised.

You seek to create an MVP, and you'll pivot etc. along the way. In pivoting you'll make do and mend with whatever technology assets you can - to move quickly and test the new business hypothesis. You'll build on where you are if the hypothesis proves to be promising, and you'll dump it if it's no good. In the course of this you will compromise, often quite dramatically, on textbook engineering. However, that's not bad engineering, it's necessary engineering pragmatism, which has a price.

If the software you have created is anything other than "adequate" (for you to engage with an initial set of customers) you have been wasting engineering resources, since you are quite likely to be throwing any particular piece of code away, or bending it considerably to some new purpose as you find your way.

Maybe you do things a certain way because the skills you have at hand demand that they are done that way. It's no good thinking "if only we'd known at the time that the problem we are actually trying to solve is an ideal fit for technology x" if you don't have technology x skills and you don't have time or money to acquire them.

The code you have at this stage is hence by definition unlikely to be better than "adequate". It's unlikely to be easily maintainable or extensible, and may not be very scalable.

The net result of this is hopefully that the company has proved its commercial purpose, gets a pat on the back from investors in the form of follow-on funding, and that the technology has somehow maintained some kind of architectural shape through the process.

Good enough, at least, to set about a programme of orderly change.

Technical Debt

This is a topic that is a bit belaboured. I think it is an important and, despite the belabouring, underserved concept. Technical Debt is healthy, essential even, in some quantities - and disastrous in greater quantities.

If I ask the question "how much technical debt do you think you have?", that's going to be a hard question to answer, since I'm not clear that anyone shares a clean enough conception of it. Things that form part of it include the realisation that "now we understand the problem, it's clear we built it wrongly".

If anyone says "we don't have any" that's a bad sign, either because they haven't thought about it or because they have spent too much time polishing things that should not be polished. It's inevitable and a sign of proper engineering compromise that there will be at least some debt, probably quite a bit.

At the point at which you have reached a significant funding point, then, you'll have some, maybe quite a lot of technical debt. Presumably you'll have made a business case for your funding which includes things like "going beyond the MVP in function" and "scaling, reliability" and other non-functional things you need to do to your product or service.

Hopefully (but in reality more rarely seen than one might wish) in seeking to go beyond, some portion of the raise is allocated to a proper consideration of what needs fixing and with what urgency you need to do this.

Reducing the Debt

In thinking about improving the quality of your code base the imperative is to keep things going while you do that - which means operating what you've got while building the new. Incremental change of some kind is going to be the regime. As often observed, by Squirrel and others, thinking you are going to achieve renewal through a "from scratch" rewrite is at best highly risky and at worst delusional.

If the software is monolithic you'll need to break it apart into smaller pieces. This can be quite hard depending on what the interdependencies are between the bits. So a strong piece of advice is to try to maintain a clean separation (service interfaces) between major components irrespective of pivoting and whatever.

Once you have the smaller bits it may be practical to tear them up and replace them, if they are small enough. And you should not shrink from doing that.

That kind of renewal is going to allow you to introduce a healthy level of decoupling. Database access is a good example, where there may be direct dependencies on a schema from different pieces of code which are otherwise independent of each other. Separating out access to the database via a service layer or APIs is an essential first step. If you don't take this step it's very hard to change the structure of the database, or to tune it, without changing substantial portions of the system when you do so.
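To make the service-layer point concrete, here's a minimal sketch (the class name and schema are illustrative, not from any particular codebase) of routing all database access through one class, so the schema can change without touching the rest of the system:

```python
# Callers depend on this interface, not on the table layout, so the
# schema can be restructured or tuned behind it.
import sqlite3

class UserStore:
    """All access to the 'users' table goes through this class."""

    def __init__(self, conn):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT)"
        )

    def add_user(self, email):
        cur = self._conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return cur.lastrowid

    def email_for(self, user_id):
        row = self._conn.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

store = UserStore(sqlite3.connect(":memory:"))
uid = store.add_user("scotty@example.com")
print(store.email_for(uid))  # -> scotty@example.com
```

If later you denormalise the table or move it to another engine, only this class changes; the rest of the system keeps calling the same two methods.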

Also, as Squirrel mentions, you will want to address more minor issues through having a list of changes you will make as and when you make other modifications to the code. I think that you need to have a target in mind for when you plan to complete that process, as otherwise it's likely that you will never do it. The example of taking a year to move to Symfony looks to me to be a rather long time to achieve an objective.

Things I'd advocate as part of this "minor" incremental process include adding the comments that got missed off when someone was in a rush, and looking critically at the inputs to and outputs of methods: what happens if you pass an out-of-range value? Really try to do something about improving your test coverage. Good underlying test coverage is worth investing in and makes for many fewer unwelcome surprises when adding features and improving performance.
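As a tiny illustration of that input/output scrutiny (the function here is hypothetical, invented for the example): instead of letting a bad argument produce silent nonsense, reject it explicitly, and pin the behaviour down with tests.

```python
def discount_price(price, percent):
    """Apply a percentage discount; reject out-of-range inputs
    instead of silently returning nonsense."""
    if price < 0:
        raise ValueError("price must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100

# The kind of checks worth adding to your test suite:
assert discount_price(200, 25) == 150
try:
    discount_price(200, 150)   # out of range: must not pass silently
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```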

Cost of Ownership

All of this is a bit abstract and theoretical, perhaps. So let's try to bring it back to some concrete things that fall in the management domain and need to be considered by the management team, rather than being an "engine room only" concern.

Although software doesn't rust, as Spolsky aptly points out while saying "never do a rewrite" in his "Things you should never do" (and I must say I do look up to him), that doesn't mean that once built, it's an asset you don't need to look after. The fact that it's intangible perhaps makes it harder to understand this, so another analogy comes along.

If you own a building you make provision for cleaning and periodic refurbishment. The gutters need cleaning every year. The walls need repainting every few years. Carpets wear out. Things break and need to be replaced.

While software doesn't rust, or wear out in the way that physical things do, it does the equivalent. It becomes brittle, it breaks, it becomes inefficient.

Things that used to work stop working. Why? Because they are being used in unanticipated ways. Because the versions of dependencies change (at least in response to security advisories - you do upgrade your operating systems and other platform libraries, don't you?). Code running on Java 6 will typically run significantly slower than the same code on Java 8. You may care about that, and you may care that security fixes are ongoing for Java 8 but stopped for Java 6 some time ago.

So given the above it seems odd that so few companies seem to make a specific provision in their business planning for fixes, general quality improvement or for periodic major refactoring.

Simply put, once you have software there is a cost of ownership and treating it as a liability rather than an asset will put you in a better place.

A deepening spiral of bad code

You want your new code to be written well. You can't do that if you're breaking the contracts of existing code. "We'll have to do it that way because it fits best with existing code" is something one has to say, from time to time. Adding quickly: "But please understand that this isn't a good way to do it in general because (... lesson n+1 on sound engineering practice)". It's not unreasonable for the team to ask "when are we going to be allowed to write good quality code, then?".

At this point it's sensible to realise that you must have a definite plan to do a rewrite of that bit of the code. If you do not, you are digging the hole deeper, compounding the technical debt and increasing the cost of refactoring.

It seems that it is pervasive (or at best very common) for major corporations to be stuck in a deepening spiral of that kind. I recall reading a couple of years ago that two banks which were to merge did not do so because their systems were incompatible. W.T.F. Don't get to that point, folks.

Using the wrong programming language

Oh, dear. This is a "whole nother" topic. For now, let's just say that by separating your code into interoperable components there is no big deal in using different languages in the different components.

Shall we leave it at that for now? Um, not quite. Some programming languages attract a significant salary premium for practitioners of that language. That premium may be justified by skill, or by inherent efficiencies and other considerations, but my advice would be to be sceptical about such claims.

You need to use a completely new technology

Let's say that you didn't realise that proper transactional guarantees were needed at the outset, because that's not what you set out to do. You used MongoDB, a good choice for lots of applications, no doubt about that. But now your counter-cultural CTO realises that some parts of the system would be a lot better off and a lot safer if you switched those portions that need the functionality to MySQL (counter-cultural because everyone knows that MySQL is old and sad and MongoDB is happy and trendy).

Usually you cannot fiddle around the edges to make such a change. If you have used sound engineering principles you'll have a nice service interface behind which you make the changes, invisibly to the other dependent parts of the system. Either way, there's a lot of work to do to completely rewrite the service layer to work with a different underlying technology. And you can't do this in an incremental way - for example, it's unlikely that you'll be able to progressively migrate your transactions from one technology to another.
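A sketch of the kind of service interface that makes such a swap tractable. The class names are invented for illustration, and an in-memory store stands in for the original document-store-backed implementation:

```python
# If callers only ever see OrderStore, swapping the storage engine
# underneath (document store -> transactional SQL) is a rewrite of
# one layer, not of every caller.
class OrderStore:
    def place_order(self, customer, amount): ...
    def orders_for(self, customer): ...

class InMemoryOrderStore(OrderStore):
    # Stand-in for the original backend; a MySQL-backed class would
    # implement the same two methods.
    def __init__(self):
        self._orders = []

    def place_order(self, customer, amount):
        self._orders.append((customer, amount))

    def orders_for(self, customer):
        return [a for c, a in self._orders if c == customer]

store = InMemoryOrderStore()
store.place_order("kirk", 5)
print(store.orders_for("kirk"))  # -> [5]
```

The rewrite of the layer itself is still real work, but it is bounded work: the contract above is the only thing the rest of the system knows about.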

Spolsky makes it clear that he thinks that is the way to address such changes too, btw, while at the same time as saying "don't rewrite from scratch". The point of difference here is only, I think, don't rewrite the whole system from scratch all at the same time. Do rewrite parts of it from scratch while minimising changes to other parts.

If you like classical analogies, think of this as the "Ship of Theseus" approach. While you can replace individual timbers on the deck in a piecemeal way, you have to replace the mast and rigging all in one go.

The Incoming CTO-like-a-plumber Thing

You may have lost your earlier incumbent. You may be scaling up beyond the capabilities of existing team members. You may have outsourced early stage development and now need to insource it. A condition of your investment (if you don't already have one) might be "get a CTO". This isn't always good advice, actually, but that would be too long a digression for here.

Let's assume that for some reason or another you are looking for someone to head up the engineering team. If you are expecting the incoming person to pick up the reins and continue the charge using the same technology and methodologies that you already have in place, that is a somewhat problematic assumption.

Technology choices and development methodologies are not neutral. You won't be surprised to hear candidates sucking through their teeth, and saying the equivalent of "who fitted this sink for you then?", "This fuse box isn't fitted to regulations and should be condemned" and the like.

Yes, that goes with the territory. Getting one CTO to wear another CTO's clothes is going to be hard. You're asking them to adopt the pain of someone else's (no doubt flawed) decision making, and to live through the painful late night consequences of that. If the problem was caused under my watch it's my responsibility to fix it. If it was under someone else's watch it's still my responsibility, but I am only human, and will probably resent it!

As part of the hiring process you may need to promise fairly substantial flexibility around change (read cost). Or to turn that around another way, if you hire someone on a "steady as she goes" agenda, they will need to feel very confident that they are comfortable with what you have got. If I were to consider a role of that kind, I'd want to do some serious code review before signing the contract.


Yes, incremental change is the right way to do things if you can. But be realistic, and don't think that everything can be changed in that kind of way. When making necessary major changes make sure the tasks are suitably divided up so that the changes you do make are reasonable and controlled.

Most of all, plan for maintenance. It's no surprise at all that software needs continual ongoing minor maintenance and less frequent major changes. That should be part of your business planning.

Friday 13 November 2015

The Responsive Emperor has no Clothes

Originally published by Computer Weekly

For some time it's been apparent that like the famous fairy tale emperor, responsive design lacks some essential items of clothing, but debate about this has been limited by almost overwhelming group think and fashion.

Google recently played the role of the small boy in that fairy tale in making the announcement of Accelerated Mobile Pages. In Google's announcement we see a refreshing acceptance that there is a problem with the Web on mobile. Even more refreshing is that the argument is conducted on a down to earth pragmatic and commercial basis, rather than on an abstract technological, aesthetic basis which resolutely ignores the commercial point of an organisation having a Web site in the first place.

So Web site owners are suffering, their users are suffering too. Responsive Web Design on its own is quite simply not enough of an answer. And it’s unhealthy - lashings of Javascript poured over everything leads to sclerosis.

The first step, they say, is to acknowledge that there is a problem; the next step, apparently, is to seek help.

So what help does Google offer? Well, sensibly, they say they are tackling the problem one step at a time. They offer a remedy for primarily static pages that carry advertising. That's great, but the approach they advocate is not startlingly different to advice that has been available from the W3C in the form of Mobile Web Best Practices for many years. 

Things were quite different when that document was written, but the basics are still quite sound, especially when you realise that those recommendations were written at a time when most Web pages were primarily static and responsive design had not become a creed. So it's not surprising that Google's recommendations for static pages and the historic advice are reasonably well aligned.

To take a specific example: in AMP you note the size of an image in the HTML, and that is the size of the image once and for all. This avoids the browser having to shift pages around as they load, one of the main causes of poor user experience - if you start reading something and then suddenly it changes position, well, that isn't good, is it?

Knowing what size you want an image to be up front requires an understanding of the context in which the image is to be displayed - i.e. is this being shown on a 27 inch desktop monitor, or is it to be displayed on a small hand-held screen? The techniques that allow Web sites to determine this kind of information have been around for a while. Businesses and brands that require better than a hit or miss user experience already use device detection as part of their Web presence.
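As a toy illustration only - real device detection relies on a maintained device database (such as 51Degrees), not a hand-rolled User-Agent sniff like this - the idea is simply to choose an image size up front from what you know about the requesting device:

```python
# Crude, purely illustrative rules; the widths are invented examples.
def image_width_for(user_agent):
    ua = user_agent.lower()
    if "mobile" in ua or "iphone" in ua:
        return 320    # small hand-held screen
    if "tablet" in ua or "ipad" in ua:
        return 768
    return 1200       # assume a large desktop monitor

print(image_width_for("Mozilla/5.0 (iPhone; ...) Mobile Safari"))  # -> 320
```

With the width known server-side, the HTML can state the image dimensions up front, which is exactly what AMP asks for.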

Determining the size of an image in advance is just one specific example of what AMP requires and what device detection provides the answer to. Many other aspects of user experience are improved by using this technique which is highly complementary to AMP.

Responsive design is by no means dead, but it's really beyond time that its limitations were acknowledged and that debate moves on to discussing how to improve the real world of the Web, improve user experience and help Web site owners to improve what is now an essential part of their business.

Kudos to Google for extending the emperor's wardrobe.

Monday 1 June 2015

Desktop may be big, but it's rather stupid ...

... A pot pourri of Internet Statistics, Mobile Friendliness and Apps vs Web

Mary Meeker's Internet Trends Report

As widely trailed elsewhere, the much valued and often cited Internet Trends Report by Mary Meeker has hit the virtual bookshelves in its 2015 edition. It's lengthy but very much worth taking the time to study.

Mobile Friendliness

While I don't want to belabour the point - well, I'm going to - having a mobile friendly Web site improves your search rankings.

As I mentioned a couple of weeks ago, Google's algorithm has changed and they now favour mobile-friendly sites. When they say mobile-friendly the bar is actually set rather low, it should not be very hard to achieve this "qualification". So that should not be considered an aspirational target. Rather it should be considered a minimum on a journey to providing your users a productive and pleasant experience of using your Web site.

Recently published is a blog post from Vision Mobile on the State of the Mobile Web in which they conclude that 40% of sites are oblivious to their mobile users. You'll find several other interesting points in that piece.

Vision Mobile publishes lots of statistically based material which is of tremendous value when considering your Internet strategy. If you don't already, you might consider checking out some of their material, especially their Developer Economics reports: Developer Economics provides fact-based insights to help developers choose the right platform, tool or API and build a scalable business.

While I'm here, I urge you to contribute to the currently ongoing survey by taking 10 minutes to tell them what you think about IoT, mobile, desktop and cloud. To incentivise you, there are some fab prizes, including an iPhone 6, an Oculus Rift Dev Kit and an Apple Watch Sport.

Before moving on, allow me to repeat myself (for emphasis, obviously): "even if it's responsive it won't necessarily work well on mobile".

Apps vs Web

Q: Should I build an app or should I build a mobile Web site? 

A: Either may be appropriate for development of your product.

Each has contrasting advantages. Fidelity of user experience usually speaks to having an app. Cross platform considerations may favour a Web solution.

Consider this, though. As a start-up people need to find you and find out about you, so it's likely that you should have a mobile friendly Web site before considering the next step. If you don't do that then it may be hard to develop your relationship with your users to the point where they want to engage with your product or service, at which point they may want to download your app.

Ben Evans - a leading commentator on such matters - recently put it nicely "Do people want to put your icon on their home screen?". If they don't know who you are, he suggests, they won't.

Mobile Not the Dumb Little Brother 

Another post from Ben talks about "Mobile first" - a somewhat well-rehearsed theme, admittedly, in which he comes to the conclusion that it's the PC that has the basic, cut-down, limited version of the Internet, not the mobile.

This view echoes something that Tomi Ahonen, another well-known commentator (though somewhat less mellifluous and more hectoring in tone than Ben) said a long time ago, namely "Mobile is not the dumb little brother of the (Desktop) Internet, it's the 7th mass medium". Dating from 2007, still worth a read. 

My own view of the Desktop vs Mobile debate is "the desktop may be big, but it's rather stupid", which I came up with in a ponderous piece of my own, deliberating the question "What does mobile mean, anyway". 

Hoping as ever that some of that helps.


Wednesday 6 May 2015

More Web site compatibility testing on its way?

Mozilla recently announced that it intends to enforce use of HTTPS for various Web site features in its browser Firefox. On reading this I groaned inwardly and thought "more Web site compatibility testing, just what we were asking for".

Before asking "what or who is Mozilla, and why should I care anyway?", let's do a quick survey of Web browsers.

Web Browsers

There are lots of browsers, but among the top 10 you'll find Google's Chrome, Microsoft's Internet Explorer (IE), Apple's Safari, Google's Android browser (not the same as Chrome for Android, and steadily declining in importance), a couple of browsers from Opera, and Firefox.

It's very hard to say how the browsers rank in popularity, since this varies over time, according to geography and what type of Web site is being accessed. However, world wide, Chrome and IE are generally numbers 1 and 2 and Firefox is number 3 in popularity.

Here's a chart showing data gathered for the UK by 51Degrees from sites that use their device and browser detection software (disclosure: I am an advisor to 51Degrees):

In this sample you'll see that Mobile Safari comes out top (meaning iPhones and iPads) and Firefox comes out about fifth. If you're interested you can play with the 51Degrees data, and likewise you can do so at StatCounter. There are many other sources of similar data.

I won't attempt to explain the difference between the stats, other than to note that you have to be careful to distinguish whether you are measuring visitors, visits or hits, and you have to be clear whether you think mobile Safari is the same as Safari, etc. etc. etc. (x1000) and indeed how accurate the analysis is at telling which browser is which.

For the purposes of this discussion, (Mobile) Safari is important in the UK, as unsurprisingly are Chrome and IE.

Which Browsers Visit Your Site?

You can measure what proportion of which browsers visit your Web site, and it's important to do so. If you're targeting particular UK demographics then it's likely you're going to have a lot of Safari and Mobile Safari users. There are many tools that help you to do this; for example, lots of people use Google Analytics.

You're going to want to measure the proportion of users of particular browsers who make it through to the various parts of your site you'd like them to reach. If there is a skew, perhaps between mobile and non-mobile users, then you may have a usability problem or a mobile friendliness problem.
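A rough sketch of the first part of that measurement - tallying which browsers appear in your access logs. The classification rules here are crude and invented for illustration; real analytics tools do far more (and note that Chrome's User-Agent string also contains "Safari", so order matters):

```python
from collections import Counter

def classify(user_agent):
    # Check Chrome before Safari: Chrome UAs also say "Safari".
    if "Firefox" in user_agent:
        return "Firefox"
    if "Chrome" in user_agent:
        return "Chrome"
    if "Safari" in user_agent:
        return "Safari"
    return "Other"

# In practice these would be read from your server's access log.
log_user_agents = [
    "Mozilla/5.0 ... Chrome/42.0 Safari/537.36",
    "Mozilla/5.0 ... Firefox/37.0",
    "Mozilla/5.0 ... Chrome/42.0 Safari/537.36",
]
print(Counter(classify(ua) for ua in log_user_agents))
# -> Counter({'Chrome': 2, 'Firefox': 1})
```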

I mentioned a couple of weeks ago that Google had announced its intention to change its ranking algorithm depending on its perception of a site's mobile friendliness. You should have an eye to mobile friendliness - at a minimum because of SEO, but preferably because you care about your users' experiences.

Originally, the Web wasn't designed for Web pages to look the same in different browsers. Indeed it was thought that you, the reader of a Web page, might want to control the appearance of Web pages created by other people. With that in mind it's possibly not a huge shock that Web pages don't appear the same in different browsers, even though the Web site owner would like them to. And this is nothing to do with Responsive Design, it's because different browsers behave differently. Responsive Design just makes that more complicated than ever.

It's made more complicated by the fact that the Web is frequently adding new features, different browser vendors implement those features in a different order from each other (and sometimes don't implement them at all), and their users update their browser software haphazardly, if at all.

Mozilla's announcement looks set to make that more complicated still by proposing to switch off features in the browser unless those features are accessed using HTTPS.

Test, Test, Test

Web pages look and behave differently depending among other things on:

  • Which browser it is
  • What version it is at
  • Which platform it is being used on (e.g. Windows vs OS X)
  • What device it is being used on (e.g. mobile, tablet or desktop)
  • What type of network connection is in use (e.g. Fixed connection vs WiFi vs 3G)
  • And now, whether you are using HTTP or HTTPS

All in all, quite a headache.


Some things to think about:
  1. Try to test your Web site in as many browsers as possible especially those that you find in your logs
  2. Especially if your site has dynamic content make sure to visit your own site often (can get overlooked!)
  3. Have different members of your team use different browsers when they visit your site
  4. Make sure to visit your site from mobile frequently
  5. Don't just assume that because your site is "responsive" it will work well across browsers or different formats of device
  6. Make sure you or your engineering team have done a thorough analysis of what goes on behind the scenes and use e.g. Chrome Developer Tools, especially the profiling and performance bits to look for ways of optimising your users' experience

As ever hope all this is some use.



Here are some addenda:

What is HTTPS?

HTTPS is the allegedly secure version of HTTP, which is the mechanism (protocol) by which browsers request content from Web servers and get it back. With (non-S) HTTP your request and the response can be read by any intervening equipment in the network. HTTPS prevents that happening directly, but is open to some criticisms of its overall effectiveness.

Who or What is Mozilla?

There is a well known saying which goes, if it's free, you are the product.

We, as users of Microsoft's, Apple's or Google's browser, find that integration with their environment and services has lots to recommend it in terms of usability and convenience. We probably know that they are using our data and that our privacy is compromised at least to some degree. (Well if you didn't realise that then it's well past time to become aware).

Mozilla is the foundation that creates the Firefox browser amongst other open source software.

Here's what Mozilla says about itself:

Committed to you, your privacy and an open Web

Our mission is to promote openness, innovation & opportunity on the Web.

A key point about Firefox is that, unlike other browsers, it's not owned by a commercial organisation. This means that Mozilla is free to champion browser users' interests, such as privacy, over commercial gain for shareholders. If you're concerned about such things, you've got to welcome the fact that this organisation exists and is working towards making the Web more private.

One possible outcome of Mozilla's HTTPS initiative is that everyone (Web site owners) ignores it, and users of Mozilla browsers will get fed up as more and more sites don't work for them in Firefox, whereas they do in other browsers. You'd think that Mozilla has considered that risk, but then again, it's not a commercial organisation, so maybe its success metrics don't include maintaining or increasing market share.

In the meantime, as Web site owners, I suppose we need to grin and bear yet more complicated and costly testing regimes - just for the sake of getting our Web pages onto the screens of our users in a form that is approximately as we intended.

Sunday 26 April 2015

On Persistence

If at first you don't succeed ... then try, try and try again said my primary school headmistress. Today's primary school teacher would probably say something different, like if at first you don't succeed fail fast and pivot.

Facetious ramblings aside, I have been thinking a lot about persistence recently. I mean in the sense of data storage that survives a crash or the power being turned off.

I thought I would write some stuff down.

1) I have heard about SQL and NoSQL, what is that? A short, opinionated guide for the perplexed.

2) Persistence and Constancy. Or the need to be prepared to switch persistence solutions and some thoughts about how to lessen the pain.

3) ORMs, a Faustian Pact. A rather detailed discussion. Of interest to engineers, if anyone.

Hope of some use

SQL and NoSQL: What's the difference, why would I care?

Most of us need to store data as part of our products. At small scale and low volumes this isn't all that problematic, and at the proof of concept and early trial stage one technology choice is possibly pretty much the same as another.

As things scale out the technology choice becomes more important. When it comes to storage (or persistence, more accurately, if we mean the stored data survives loss of power) there are a couple of choices. Lots of people will have heard of SQL and NoSQL.

As a business person, ideally you should actually not have to care. In this post there are a couple of tips for you to think about when your engineering team burbles excitedly about the wonders of Mongo, the cruftiness of MySQL, the thrill of the new ... OK, you'd prefer not to care.

Before getting on to the tips, here is a brief and somewhat incorrect guide to the technology (never mind, it's brief).

What is SQL?

SQL is actually a language for accessing storage rather than a storage technology itself, the storage technology being the Relational Database Management System, or RDBMS for short. This technology has been around since the dawn of time (the early 1970s) and so has the advantage of being extremely mature.

SQL databases can usually be configured with a high degree of fault tolerance and offer strong guarantees about the integrity of data. For example, if a bank receives two requests to debit £10 from an account containing £15 only one can succeed. SQL databases support transactions that have so-called "ACID" properties - which NoSQL databases usually lack to some degree or other.
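The bank example can be sketched with SQLite, which is transactional: two £10 debits against a £15 balance, with a CHECK constraint standing in for the business rule. Each debit runs as one atomic transaction, so only the first can succeed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES (1, 15)")

def debit(amount):
    try:
        with conn:  # one atomic transaction per debit
            conn.execute("UPDATE account SET balance = balance - ? "
                         "WHERE id = 1", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # would overdraw: the transaction is rolled back

print(debit(10), debit(10))  # -> True False
print(conn.execute("SELECT balance FROM account").fetchone()[0])  # -> 5
```

The second debit fails cleanly and the balance is left consistent - the "atomicity" and "consistency" of ACID in miniature.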

Unsurprisingly, given they do so much, SQL systems can be rather big, cumbersome and have a reputation for being quite inflexible. 

MySQL is a well-known example of a SQL system in use among smaller companies. PostgreSQL seems to remain a choice for some too. Larger companies may use Oracle (which also owns MySQL) and Microsoft SQL Server.


What is NoSQL?

NoSQL is possibly not a brilliant term for the extremely wide variety of non-relational storage mechanisms. The term was popularised around six or seven years ago, possibly because those systems did not offer SQL as a data access language at the time.

There was, and is, an increasingly bewildering variety of technologies that fall under this label, which may have little in common with each other, and each of which had the purpose of addressing perceived shortcomings of, or over-engineering in, the traditional relational approach. Today some NoSQL databases actually do provide SQL access to data, making the term even less pertinent.

Some common NoSQL choices are:  

Redis: A blindingly fast key/value store. "In the case of a complete system failure on default settings, only a few seconds of data would be lost." Often used for caching - i.e. it's not the reference data store for anything and if it crashes can be easily rebuilt without loss of business data. Given the above quote, though, probably not the place to store financial transactions on the default settings.

MongoDB: A document database, granddaddy of them all in some ways. Possibly showing its age a bit, though the new version 3.0 is promoted as being really quite shiny. Lots of people love it, and it has many detractors too. You have to configure Mongo very carefully to be sure that you are not open to data loss in some failure scenarios.

Others: Couchbase, Cassandra, Riak, Aerospike ... oh goodness, the list goes on. All with pros and cons.
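To make the Redis point above concrete - a cache, not the reference store - here is a minimal cache-aside sketch. A plain Python dict stands in for Redis, and `expensive_lookup` is a hypothetical placeholder for a slow query against your reference data store:

```python
import time

cache = {}   # stands in for Redis; real code would use a Redis client instead
TTL = 60     # cache entries expire after 60 seconds

def expensive_lookup(key):
    # hypothetical placeholder for a slow database query or API call
    return "value-for-" + key

def cached_get(key):
    """Cache-aside: serve from the cache, fall back to the source of truth.

    If the cache is lost we lose nothing but speed: the next request
    simply repopulates it from the reference data store."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                       # cache hit
    value = expensive_lookup(key)             # cache miss: go to the source
    cache[key] = (value, time.time() + TTL)   # repopulate
    return value

print(cached_get("user:42"))  # miss: computed, then cached
print(cached_get("user:42"))  # hit: served from the cache
```

This is why losing "a few seconds of data" is acceptable for a cache and catastrophic for a system of record.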

How to Choose

Choosing which technologies to use where is hard and requires a lot of thought. Despite the inconvenience of using more than one technology to do what is ostensibly "the same thing", this is actually a respectable engineering choice and is probably current best practice.

It's inconvenient to use more than one technology because your engineering staff need to know what to use and when. The operations aspect becomes quite a lot more complex, since performance tuning, resilience and backup strategies need to be thought about separately for each of the different technologies you deploy. 

Roughly speaking, two different technologies means twice the operational burden and cost. Diagnosis of problems becomes correspondingly more complex, and if there are more things that can break, well, more things will break.

Nonetheless, cost/performance trade off may still make a heterogeneous persistence choice sensible.

I write about this at more length: Persistence and Constancy.


So what's the answer?

For a while the received wisdom was "if it's about persistence, stick it in MySQL". Nowadays that has changed to "if it's about persistence, stick it in Mongo".

If you are still at the prototype or early trial stage and if I'm talking to you over at Wayra, I won't bat an eyelid almost no matter what you're using. It probably won't scale. You're probably going to rewrite everything you have before this becomes an issue. Concentrate on getting the functionality right and as far as persistence is concerned, well, if you can bear loss of data it doesn't much matter.

If you're doing a finance or banking solution, or something like that, it really does matter, no matter what stage you are at. Read on. Likewise, if you're at a later stage you need to be more careful about whether the technology is appropriate to your use case.
  1. Don't reject SQL/Relational persistence because it's old and out of date. It's mature and battle hardened and does stuff that other technologies just don't.
  2. Over the years many management reporting tools have been written that interface with SQL systems. Theoretically, at least, they allow non-technologists to get reports on their data without taking up engineering time. That is a significant advantage: if you're using a system that has SQL access you will hopefully be able to use these tools.
  3. If the above is not an issue and if you don't need strong transactional guarantees or you don't need to recover to "point in time" in the event of a failure then key value stores can be blindingly fast and may suit your needs just as well or indeed much better than an RDBMS.
  4. Read the documentation carefully, and test in a simulated environment. You can't possibly tell whether a persistence solution is good for you until you've thought carefully about the fit for your data and tried it out. 
  5. Despite increased operational complexity, mix and match may be a good compromise for your applications.
  6. Make sure your data is duplicated reliably. In Mongo that means the write concern is Majority. That's not the default.
  7. NoSQL can appear to offer liberation from having to think carefully about your data structures up front, as it offers the seeming possibility of saying "it doesn't matter, we can change it easily". Really don't do that.
  8. Be very, very careful about concurrency. For example, simple actions like "create or update" are prone to disastrous consequences, if you don't guard against near simultaneous duplicate requests. This is much harder and more subtle than it seems at first glance. Almost everyone has this problem and it's rarely dealt with properly.
  9. Don't wear a MongoDB T-Shirt at Wayra London on Wednesdays, because if I see you wearing one I'll be resisting a temptation to punch you, based on recent experience of performance testing it. Not those of you who are bigger than me, obvs.
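To illustrate point 8, here is a sketch of the "create or update" trap, again using SQLite purely for brevity (the table and names are invented). The naive read-then-write version has a window in which two near-simultaneous requests both decide to INSERT; the safe version pushes the decision into the database inside a single serialised transaction:

```python
import sqlite3
import threading

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT)")
conn.commit()
lock = threading.Lock()

def racy_create_or_update(email, name):
    """Naive version: between the SELECT and the INSERT another request
    can slip in, so one of two simultaneous creators will fail."""
    row = conn.execute("SELECT 1 FROM users WHERE email = ?", (email,)).fetchone()
    if row is None:
        conn.execute("INSERT INTO users VALUES (?, ?)", (email, name))
    else:
        conn.execute("UPDATE users SET name = ? WHERE email = ?", (name, email))
    conn.commit()

def safe_create_or_update(email, name):
    """Portable atomic upsert: INSERT OR IGNORE plus UPDATE inside one
    serialised transaction, so there is no check-then-act window."""
    with lock, conn:
        conn.execute("INSERT OR IGNORE INTO users VALUES (?, ?)", (email, name))
        conn.execute("UPDATE users SET name = ? WHERE email = ?", (name, email))

safe_create_or_update("a@example.com", "Alice")
safe_create_or_update("a@example.com", "Alice B.")  # updates, no duplicate
```

The same check-then-act hazard exists whatever store you use; the cure is always to make the operation atomic at the storage layer, not in application code.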
Hope the above helps.

Persistence and Constancy: How to store your data permanently but change your storage solution

Your requirements are simple: you want to store data safely, you don't want to be tied to a particular vendor or technology.

I can't remember a time at which thinking and available products have been moving faster and where it has been more difficult to formulate tactics - let alone strategy. But there are some general themes.

Keep your Deployment Minimal

The more technology you have the harder your operations are. You have to know a lot about each additional piece of technology, and the more instances you use to run it the more costly it is. The long and short of it is that it's desirable to minimise your dependencies.

Having a three node MongoDB cluster and a two node MySQL and duplicated Redis is obviously overkill for a small deployment, isn't it?

Horses for Courses

In the world of persistence, like everywhere else, different technologies are good at different things. Some technologies are very poor at some things but good at others. If your requirement spans a range of functions then you're going to find that you need more than one type of technology.

Someone said to me recently that it has become true that best current practice is to have more than one type of technology for persistence. I have to reluctantly agree, irrespective of trying to keep your deployment minimal, as above.

Choosing Technology is Hard

There are many choices. Much of the information is vague and much of the discussion around is misleading or wrong. The only practical choice for a small engineering team is to narrow down the choices and try them one by one. Of course, while you are doing that everyone is busy upgrading their technology.

A good example is the arrival of MongoDB's WiredTiger storage engine in MongoDB 3.0. In my tests it's slower, not - as advertised - much faster. How can this be? My results are not consistent with other benchmarks, even those from people with an axe to grind (e.g. Couchbase).

Perhaps it's because I tested it on slow rotating disks and it's designed for SSDs. Perhaps it's not suitable for the workload we have. Perhaps that's why MongoDB now maintains two storage engines. Perhaps it really is just slower at the moment. That's a lot of "perhapses". In any case it does not appear to be the answer to this particular MongoDB maiden's prayer, and I need to focus my efforts on functional application enhancements, not on delving around in the innards of the persistence engine I have chosen.

In the same tests the TokuMX storage engine was much (5x) faster. But it tracks Mongo 2.4 and was provided by a small and relatively unknown company. That has changed: Tokutek has been acquired by Percona, which may help accelerate the plan to make TokuMX a third storage engine choice in Mongo 3.0.

You can't choose your technologies off the spec sheet or by hearsay, you have to try them. As a small company, you may not have resources to try them properly and by the time you discover their limitations you are committed. Because of time and money constraints you need to stick with it and try to make it work out. Hopefully your technology vendor is moving fast and fixes and enhancements arrive in time ...

When it Comes to Change

It's foreseeable that despite your constancy your solution doesn't grow with you. It's impossible for you to foresee this unless you have done more testing than you probably have time to do. You have to take a punt, there's a good chance you will lose. You're going to tell your CEO that they are not going to get new features they need for sales and that you're going to have to replace something with no functional gain. I recommend that you practice the argument in front of a mirror before trying it in the office.

Another reason for change is that you find out something you really don't want to hear. My thanks (I suppose!) to Nick Reffitt, CTO over at Tapdaq, for bringing this article about flaws in MongoDB to my attention. It's a very good but very long article, not for the faint-hearted, which in brief asserts that Mongo has some failure modes that mean you will lose data or that the data you retrieve will not be consistent. Even if I'd been sublimely happy with MongoDB before, I'm not now. A chat with my CEO is due. Cough, Hi Dan ...

It's also possible that something new will come along which appears to have features that are a better fit for what you're doing. This one is harder to explain to your CEO. Really practice in front of a mirror. Try smiling. Try to make sure that a change of this kind has measurable bottom line benefits.

Either way, how can you best be prepared for change that in some time frame is inevitable? There will be pain and anguish. Data migration is not fun. How can you minimise the disruption?

There are textbook, or accepted answers. If you haven't separated your storage concerns into a service layer and a DAO layer, people will sneer at you and if your domain objects don't have nice ORM annotations, well, people are going to say that your code doesn't smell good, so don't go out to any parties till you fix that.

Even so, at risk of a spot of sneering, let's challenge accepted wisdom. I'm not saying don't use these accepted techniques, just be aware that they may not be doing for you what you were hoping.

ORMs: A Faustian Pact goes into that in more detail.


In the end there are no good off-the-shelf answers other than to quite simple questions. Try it, stick with it if you can, move on if you can't. Try to reduce your dependencies so that moving on is less painful than it might be.


Oh, one more thought: in general, if your framework locks you into a single persistence solution, think again whether you should be using it. Specifically, Meteor users, I mean you.

ORMs: A Faustian Pact

This post is a deep dive on one of the topics that I mention in Persistence and Constancy.

The discussion is about a) abstraction of persistence technology b) the practicality of switching storage solutions c) schemas and d) choosing NoSQL or RDBMS.

The separation of concerns into service, DAO and domain objects is sensible and is good practice but how much is it going to help you with the above?

How Decoupled are your Domain Objects?

First of all, look at the data types of the fields of your domain objects. And in particular look at the id field. If you're using Mongo this is an ObjectId. Bzzz. Error, this is a layer violation. Your domain objects which you hoped were independent of the underlying storage now are not. This is not a domain object any more, it's a Mongo object.

OK, you're not using a strongly typed language so you don't care. I'm sorry, your get-out-of-jail-free card doesn't work. Typically a domain object refers to other domain objects by id. It doesn't matter what these "foreign keys" are stored as; they are particular to the storage mechanism you are using. If you migrate your storage, and the actual id type in the new store is different, the best these references can become is secondary index fields.

Never mind, you can always rewrite those fields during data migration. Hmmm, yes you can. Sounds tricky and error prone.
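One mitigation is to keep the translation at the boundary. Here is a hypothetical sketch (the class and field names are invented) in which the DAO converts the store's native id to a plain string at the edges, so the domain object never sees an ObjectId:

```python
class User:
    """Domain object: the id is a plain string, no storage type leaks in."""
    def __init__(self, user_id, name):
        self.user_id = user_id  # str, not a Mongo ObjectId
        self.name = name

class MongoUserDao:
    """Boundary layer: converts the store's native id at the edges only."""
    def __init__(self, collection):
        self.collection = collection  # e.g. a pymongo collection

    def to_domain(self, doc):
        # str(...) works for ObjectId and for any other native id type
        return User(user_id=str(doc["_id"]), name=doc["name"])

    def to_document(self, user):
        # real pymongo code would convert back, e.g. ObjectId(user.user_id)
        return {"_id": user.user_id, "name": user.name}

dao = MongoUserDao(collection=None)  # no real store needed for the sketch
user = dao.to_domain({"_id": 12345, "name": "Ann"})
print(user.user_id)  # prints 12345; user.user_id is the string "12345"
```

It doesn't make migration painless, but at least the id translation lives in one place rather than being smeared across your domain model.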

Annotations or @Mephistopheles

There are standard annotations. JPA and JDO are both standard, if you are using Java. However, if you are using e.g. Hibernate then you're probably going to want to use some of its proprietary features and extensions. That's OK as long as you don't mind being locked into Hibernate. Hibernate supports lots of things, after all. Cross fingers it will support what you want to use next.

Bear in mind, also, that a standard annotation mechanism of necessity is a least common denominator solution. That means that you're not exploiting the best features of the underlying persistence mechanism. The ones that make that solution stand out from the crowd. Hmmm, that is not altogether clever. This point is made very cogently by Ronen from Aerospike in the video linked below.

And of course, the annotations that you use may not be applicable to a solution that you choose. Imagine that you're using MongoDB and that you've chosen to use Morphia. Not a bad choice at all. The problem is, of course, that when you decide to change the underlying persistence mechanism, you won't find an ORM that supports Morphia annotations. So you have to re-annotate.

During migration you're most likely going to have two sets of annotations on the same domain objects. Never mind, it's a bit ugly, but it can be made to work, provided of course that the fields of the domain objects are not strongly or implicitly typed to underlying persistence data types. Remember that whereas the underlying type is explicit in strongly typed languages, things still have a data type whatever language you are using, and even if you're storing an id as a string it's still an id that is closely coupled with the underlying storage.

A Schema is a Schema is a Schema

Your domain objects are probably, in JPA terms, "entities", i.e. they are rows of a table. They probably refer to other entities by id. We talked about that above.

Now, if you find that all your domain objects are entities, well, that means you have nicely normalised all your data. That's good practice, right? And you're using an RDBMS, right? You're not? You're using MongoDB? Really? Why?

If your domain objects are basically a projection of an RDBMS schema then why aren't you using an RDBMS? Using a non-relational store here is potentially the worst of both worlds. If your domain objects are structured according to a schema then you could be getting the benefit of joins and all the tremendously useful things that an RDBMS does and NoSQL doesn't. What's more, you are not taking advantage of the schema-free nature of NoSQL: thanks to your domain object and ORM choices, schema changes will have much the same impact as if your persistence were RDBMS based. Avoiding that was probably why you chose NoSQL in the first place.

It is true that, for example, MongoDB provides for queries by field missing or not. And that helps with schema upgrades, because your DAOs can use this information and be flexible. But you need to be careful. I mean very careful, you're not careless about your data, I'm sure.
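As a hypothetical sketch of that flexibility (the field names and defaults are invented), a DAO can upgrade documents lazily as it reads them, treating an absent field as "pre-upgrade" and supplying a default, rather than migrating the whole collection in one big bang:

```python
# Defaults for fields added after the collection already contained data.
CURRENT_DEFAULTS = {
    "email_verified": False,  # field added in "schema" version 2
    "tags": [],               # field added in "schema" version 3
}

def load_user(doc):
    """Return a document guaranteed to carry every current field.

    Note: a production version should deep-copy mutable defaults such
    as the empty list, to avoid sharing them between documents."""
    upgraded = dict(CURRENT_DEFAULTS)
    upgraded.update(doc)  # real stored values always win over defaults
    return upgraded

old_doc = {"_id": "u1", "name": "Ann"}       # written before version 2
print(load_user(old_doc)["email_verified"])  # False - default supplied
```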

Plus it may not work. If you're using MongoDB you're probably aware that in-place updates are quicker than updates that extend a record (document) beyond its current storage size. Consequently it may be important for performance reasons to pre-populate your records to their expected maximum size, so that they don't move on disk and require allocation to take place on an update. I probably don't need to point out that this negates any possibility of using "field not present" techniques.

More Hate for ORMs

"That ORM is lying to you" is really long and quite definitely for geeks. But this guy, Ronen Botzer from Aerospike, really, really knows his stuff. As I say, at 1hr 20min it is not a light confection, so something to watch while your significant other is watching reruns of Star Wars or Sex and the City or whatever it is that your tastes diverge on. He makes some tremendously interesting points, not least of which is that Aerospike is open source, i.e. no FoundationDB disappointments.

Take Outs

It is, in short, hard. Very hard. Think about the following:
  1. You are not going to get away from the fact that your persistence solution can't just be swapped out. It's going to be painful.
  2. Just because you're using schema-free storage doesn't mean that you don't in practice have a schema. All the vendors say you need to do careful up-front thinking about your data. Don't skip that step.
  3. Don't normalise your data for the sake of normalising it. Economy of storage is not usually a concern that trumps performance and flexibility.
  4. Don't think that because you have services, DAOs and domain objects nicely separated that you're independent of underlying storage. You're not. And worse still you may not be using its best features.
  5. If you need joins use an RDBMS.
  6. Choose open source to avoid FoundationDB style disappointments.
  7. Always question accepted wisdom.
  8. Always ignore fashion.
  9. Never ignore hygiene.
Happy persisting

Tuesday 14 April 2015

Some thoughts on Software Development: Stack Overflow, Maslow, Trello and The Joel Test

Think of the following as "amuse-bouches" or canapés (why don't we have words in English for these?), i.e. rather than something specifically actionable, a set of loosely themed ideas ...

Stack Overflow

Voltaire said that if God did not exist it would be necessary to invent him; likewise, if Stack Overflow did not exist, it would be necessary to invent it.

Don't worry unduly if you've not heard of Stack Overflow - well unless you're in the business of writing software, in which case tremble mightily. As far as software engineers are concerned, Stack Overflow is the 7th layer of the 5 layer Maslow Hierarchy of Needs 

(You've seen the updated 6 layer model with WiFi, right?)

When writing software one often gets stuck, wonders what to do and Stack Overflow is the place to turn to. You find that you're not alone; that people have faced this problem before and that there are 12 different and inconsistent, often wrong, answers to your problem. It makes a massive difference and helps modern engineering productivity hugely.


Actually I'm not here to talk about Stack Overflow; I want to mention Joel Spolsky. As well as being the co-creator of Stack Overflow, he is also responsible for Trello. "Drop the lengthy email threads, out-of-date spreadsheets, no-longer-so-sticky notes, and clunky software for managing your projects. Trello lets you see everything about your project in a single glance."

Not using Trello for organising your company? Consider it. It's not for geeks, it's for real people. Read Joel about the success story behind this technology start-up and in particular take note of his discussion of investors in Trello.

The Joel Test

Trello isn't what I want to talk about either, you can check it out for yourself. What I'm here to talk about and what I've finally got round to mentioning is The Joel Test: 12 Steps to Better Code - which is a "highly irresponsible, sloppy test to rate the quality of a software team".

It was written in 2000. A really, really long time ago, and in part it is showing signs of its age. Have a look at it and adapt it to your circumstances. If you're not writing code yourself, think about it as a source of questions for people who write software for you, or who want to write software for you.


Sunday 5 April 2015

FoundationDB: A tale to envy with a sting in its tail

I've been watching FoundationDB for a while now, they've been very good at promoting themselves and have had a clear story. 

Fast, scalable, resilient ... and best of all, combining the merits of both SQL and NoSQL, you know, like Hovis.

Having raised USD 22M in two rounds, they have been bought by Apple for an undisclosed sum.


Here's the sting in the tail.

FoundationDB had in parts been open source. Here's the Git repo now: "This organization has no public repositories."

Not only that but all (including paid) downloads have been removed from the site. This is incredibly bad news for those dependent on it, given the likelihood that they'll need to move to something else. 

Even if they have a properly structured interface to their persistence (meaning that code changes are minimised), the fact that the data model is likely to be somewhat or wildly different to what it is in FoundationDB means the data migration is likely to be anything from a nuisance to extremely painful.

There are several lessons one can draw from this.

  1. Make sure your deployment stack is not a stove pipe and that pieces can be swapped in and out of it.
  2. Another way of saying that: minimise your hard dependencies as far as you can (build to standards and abstractions).
  3. Be boring, don't reach for the newest shiniest things
  4. Check your licenses. Not only are you not necessarily free to use open source stuff for anything you like, you don't necessarily have any guarantee of ongoing access.


More Reading 

Friday 3 April 2015

Google Ranking Algorithm Changes 21 April

For those interested in SEO (and who isn't?) please see this article in SearchEngineWatch. In the past Google has adjusted its algorithm periodically, causing surprise and dismay among companies that suddenly find themselves relegated.

The subject of Mobile Friendliness is, unfortunately, rather detailed. Please be aware that "being responsive" and "being mobile friendly" are not the same thing. Some studies claim to show that responsive sites actually contribute to a poorer user experience ... others show the opposite.

For the sake of pure pragmatism and for those less interested in the ins and outs of this discussion, do make sure to follow the links in the article linked above to Google's mobile friendly test.

But also please be aware that sites that would benefit from better mobile optimisation seem to pass this test. If your site varies its layout but doesn't adjust what gets sent to the device you may well be sending things to the mobile user that have unnecessary and undesirable cost and performance implications. If you're interested in improving user experience rather than just checking the SEO box, please also look at Google's PageSpeed.

Also check out the Chrome Developer tools which simulate your site experience on various screen sizes and under various typical network conditions (3G etc.) - hamburger icon | More tools | Developer tools | phone icon - in your Chrome browser.


Recent news on responsiveness:

The BBC attracted quite a lot of criticism for its recent "move to responsive".

The following is a relatively-speaking polite debate on the topic between religious adherents of both sides of the argument, from the extraordinary Bruce Lawson.

For those with a deeper level of interest in Web technology: among the comments on the above from the great and the good of the Web world please note in particular that of Andrew Betts of the FT. You may or may not remember that the FT famously eschewed an app-based approach in favour of Web only. They know a thing or two about this topic.

Monday 2 March 2015

CTO in Residence Wayra

I had the great pleasure and honour of being appointed the CTO in Residence at Wayra UK.

It's been amazing fun and very rewarding to meet everyone, to hear about their exciting adventures and to try to help them and offer some opinions.

In the course of doing so I found myself gathering some thoughts and whimsies, from time to time, which I circulated to the teams by email. Encouraged by some nice feedback, I continue.

I've decided, in the interests of everyone's inbox and in order to do things that often don't work well in email (like images), to resurrect this occasional blog and put the material here, initially by retrospectively inserting some earlier emails.

Posts tagged with Wayra and CTO.