Monday 28 December 2015

Star Trek's Scotty, Plumbers and CTOs

I read Sustainable Engineering by Douglas Squirrel on The Path Forward back in September. It provoked some thoughts that I think go beyond "one CTO will never agree with another".

This post is mainly directed at CTOs but there's a bit here for CEOs as well.

The Dilithium Crystals

In fact I agree with lots of what is said there. I suppose that my main concern is that it adds fuel to the fire, encouraging the CEO and the board to treat their CTO's concerns as Scotty-like pleadings to Captain Kirk. "The engines won't take it, Captain", says Scotty, but somehow they always seem to.

Of course, it would be boring for the producers of Star Trek to show the consequences, namely that the Enterprise goes into extensive refit every time to recover the serviceability of those Dilithium Crystals and the supporting infrastructure. So we don't see that, all we see is that the captain's reckless treatment of his main asset has no consequences.

A former colleague of mine thinks that it's the influence of Star Trek on non technologists that prevents a sensible dialogue on this subject. "It's a point of view, Jim ..."

MVP and Engineering Pragmatism

Moving on from Star Trek analogies, what is a balanced perspective, then?

At the early stages of an initiative you do not know what it is you are trying to build in sufficient detail to want to spend money on the quality that you will eventually require once the details have stabilised.

You seek to create an MVP, and you'll pivot etc. along the way. In pivoting you'll make do and mend with whatever technology assets you can - to move quickly and test the new business hypothesis. You'll build on where you are if the hypothesis proves to be promising, you'll dump it if it's no good. In the course of this, you will compromise, often quite dramatically, on textbook engineering. However, that's not bad engineering, it's necessary engineering pragmatism, which has a price.

If the software you have created is anything other than "adequate" (for you to engage with an initial set of customers) you have been wasting engineering resources, since you are quite likely to be throwing any particular piece of code away, or bending it considerably to some new purpose as you find your way.

Maybe you do things a certain way because the skills you have at hand demand that they are done that way. It's no good thinking "if only we'd known at the time that the problem we are actually trying to solve is an ideal fit for technology x" if you don't have technology x skills and you don't have time or money to acquire them.

The code you have at this stage is hence by definition unlikely to be better than "adequate". It's unlikely to be easily maintainable or extensible, and may not be very scalable.

The net result of this is hopefully that the company has proved its commercial purpose, gets a pat on the back from investors in the form of follow-on funding and that the technology has somehow maintained some kind of architectural shape through the process.

Good enough, at least, to set about a programme of orderly change.

Technical Debt

This is a topic that is a bit belaboured. I think it is an important and, despite the belabouring, underserved concept. Technical Debt is healthy, essential even, in some quantities - and disastrous in greater quantities.

If I ask the question "how much technical debt do you think you have?", that's going to be a hard question to answer since I'm not clear that anyone shares a clean enough conception of what it is. One thing that forms part of it is "now we understand the problem it's clear we built it wrongly".

If anyone says "we don't have any" that's a bad sign, either because they haven't thought about it or because they have spent too much time polishing things that should not be polished. It's inevitable and a sign of proper engineering compromise that there will be at least some debt, probably quite a bit.

At the point at which you have reached a significant funding point, then, you'll have some, maybe quite a lot of technical debt. Presumably you'll have made a business case for your funding which includes things like "going beyond the MVP in function" and "scaling, reliability" and other non-functional things you need to do to your product or service.

Hopefully (but in reality more rarely seen than one might wish) in seeking to go beyond, some portion of the raise is allocated to a proper consideration of what needs fixing and with what urgency you need to do this.

Reducing the Debt

In thinking about improving the quality of your code base the imperative is to keep things going while you do that - which means operating what you've got while building the new. Incremental change of some kind is going to be the regime. As often observed, by Squirrel and others, thinking you are going to achieve renewal through a "from scratch" rewrite is at best highly risky and at worst delusional.

If the software is monolithic you'll need to break it apart into smaller pieces. This can be quite hard depending on what the interdependencies are between the bits. So a strong piece of advice is to try to maintain a clean separation (service interfaces) between major components irrespective of pivoting and whatever.

Once you have the smaller bits it may be practical to tear them up and replace them, if they are small enough. And you should not shrink from doing that.

That kind of renewal is going to allow you to introduce a healthy level of decoupling. Database access is a good example where there may be direct dependencies on a schema from different pieces of code, which are otherwise independent of each other. Separating out access to the database via a service layer or APIs is an essential first step. If you don't take this step it's very hard to change the structure of the database or to tune it without changing substantial portions of the system when you do so.
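As a rough illustration of putting database access behind a service layer, here is a minimal Python sketch. All of the names (Customer, CustomerStore, InMemoryCustomerStore, change_email) are hypothetical and invented for this example; the in-memory class merely stands in for whatever would really wrap your schema.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Customer:
    customer_id: int
    email: str


class CustomerStore(Protocol):
    """The only surface other components may use to reach customer data."""

    def get(self, customer_id: int) -> Optional[Customer]: ...

    def save(self, customer: Customer) -> None: ...


class InMemoryCustomerStore:
    """Hypothetical stand-in; a real one would hide the tables and columns."""

    def __init__(self) -> None:
        self._rows: dict[int, Customer] = {}

    def get(self, customer_id: int) -> Optional[Customer]:
        return self._rows.get(customer_id)

    def save(self, customer: Customer) -> None:
        self._rows[customer.customer_id] = customer


def change_email(store: CustomerStore, customer_id: int, new_email: str) -> bool:
    """Business logic that depends on the interface, not on the schema."""
    customer = store.get(customer_id)
    if customer is None:
        return False
    store.save(Customer(customer.customer_id, new_email))
    return True
```

Because change_email only sees CustomerStore, the schema behind it can be restructured or tuned without touching the calling code.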

Also as Squirrel mentions you will want to address more minor issues through having a list of changes you will make as and when you make other modifications to the code. I think that you need to have a target in mind for when you plan to complete that process, as otherwise it's likely that you will never do it. The example of taking a year to move to Symfony looks to me to be a rather long time to achieve an objective.

Things I'd advocate as part of this "minor" incremental process include adding the comments that got missed off when someone was in a rush, and looking critically at the inputs to and outputs of methods: what happens if you pass an out of range value? Really try to do something about improving your test coverage. Good underlying test coverage is worth investing in and makes for many fewer unwelcome surprises when adding features and improving performance.
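To make the "out of range value" point concrete, here is a small Python sketch of a method that rejects bad inputs, plus a test that pins that behaviour down. The function apply_discount and its parameters are made up purely for illustration.

```python
def apply_discount(price_pence: int, percent: float) -> int:
    """Return the discounted price in pence, rejecting out-of-range inputs."""
    if price_pence < 0:
        raise ValueError(f"price_pence must be >= 0, got {price_pence}")
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be between 0 and 100, got {percent}")
    return round(price_pence * (100 - percent) / 100)


def test_apply_discount() -> None:
    # Normal values and both boundaries.
    assert apply_discount(1000, 25) == 750
    assert apply_discount(1000, 0) == 1000
    assert apply_discount(1000, 100) == 0
    # Out-of-range percentages must be refused, not silently accepted.
    for bad in (-1, 101):
        try:
            apply_discount(1000, bad)
        except ValueError:
            pass
        else:
            raise AssertionError(f"expected ValueError for percent={bad}")
```

A test like this is cheap to add while you are in the code for some other reason, which is the spirit of the incremental list.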

Cost of Ownership

All of this is a bit abstract and theoretical, perhaps. So let's try to bring it back to some concrete things that fall in the management domain and need to be considered by the management team, rather than being an "engine room only" concern.

Although software doesn't rust, as Spolsky aptly points out while saying "never do a rewrite" in his "Things you should never do" (and I must say I do look up to him), that doesn't mean that once built, it's an asset you don't need to look after. The fact that it's intangible perhaps makes it harder to understand this, so another analogy comes along.

If you own a building you make provision for cleaning and periodic refurbishment. The gutters need cleaning every year. The walls need repainting every few years. Carpets wear out. Things break and need to be replaced.

While software doesn't rust, or wear out in the way that physical things do, it does the equivalent. It becomes brittle, it breaks, it becomes inefficient.

Things that used to work, stop working. Why? Because they are being used in unanticipated ways. Because the versions of dependencies change (at least in response to security advisories, you do upgrade your operating systems and other platform libraries, don't you?). Code running on Java 6 will often run slower than the same code on Java 8. You may care about that, and you may care that security fixes are ongoing to Java 8 but stopped for Java 6 some time ago.

So given the above it seems odd that so few companies seem to make a specific provision in their business planning for fixes, general quality improvement or for periodic major refactoring.

Simply put, once you have software there is a cost of ownership and treating it as a liability rather than an asset will put you in a better place.

A deepening spiral of bad code

You want your new code to be written well. You can't do that if you're breaking the contracts of existing code. "We'll have to do it that way because it fits best with existing code" is something one has to say, from time to time. Adding quickly: "But please understand that this isn't a good way to do it in general because (... lesson n+1 on sound engineering practice)". It's not unreasonable for the team to ask "when are we going to be allowed to write good quality code, then?".

At this point it's sensible to realise that you must have a definite plan to do a rewrite of that bit of the code. If you do not, you are digging the hole deeper, compounding the technical debt and increasing the cost of refactoring.

It seems that it is pervasive (or at best very common) for major corporations to be stuck in a deepening spiral of that kind. I recall reading a couple of years ago that two banks which were to merge did not do so, because their systems were incompatible. W.T.F. Don't get to that point folks.

Using the wrong programming language

Oh, dear. This is a "whole nother" topic. For now, let's just say that by separating your code into interoperable components there is no big deal in using different languages in the different components.

Shall we leave it at that for now? Um, not quite. Some programming languages attract a significant salary premium for practitioners of that language. That premium may be justified by skill, or by inherent efficiencies and other considerations, but my advice would be to be sceptical about such claims.

You need to use a completely new technology

Let's say that you didn't realise that proper transactional guarantees were needed at the outset, because that's not what you set out to do. You used MongoDB, a good choice for lots of applications, no doubt about that. But now your counter-cultural CTO realises that some parts of the system would be a lot better off and a lot safer if you switched those portions that need the functionality to MySQL (counter-cultural because everyone knows that MySQL is old and sad and MongoDB is happy and trendy).

Usually you cannot fiddle around the edges to make such a change. If you have used sound engineering principles you'll have a nice service interface behind which you make the changes invisibly to the other dependent parts of the system. Either way, there's a lot of work to do to completely rewrite the service layer to work with different underlying technology. And you can't do this in an incremental way. e.g. it's unlikely that you'll be able to progressively migrate your transactions from one technology to another.

Spolsky makes it clear that he thinks that is the way to address such changes too, btw, while at the same time saying "don't rewrite from scratch". The point of difference here is only, I think, don't rewrite the whole system from scratch all at the same time. Do rewrite parts of it from scratch while minimising changes to other parts.

If you like classical analogies, think of this as the "Ship of Theseus" approach. While you can replace individual timbers on the deck in a piecemeal way, you have to replace the mast and rigging all in one go.
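As a sketch of what "a nice service interface behind which you make the changes" might look like, here is a hypothetical Python example. The Ledger interface and SqlLedger class are invented for illustration, and the standard library's sqlite3 stands in for MySQL purely so that the example is self-contained; the point is only that callers see the interface while the transactional behaviour lives behind it.

```python
import sqlite3
from typing import Protocol


class Ledger(Protocol):
    """What the rest of the system sees, whatever the storage technology."""

    def balance(self, account: str) -> int: ...

    def transfer(self, source: str, target: str, amount: int) -> None: ...


class SqlLedger:
    """Hypothetical transactional implementation (sqlite3 as a stand-in)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS accounts ("
                "name TEXT PRIMARY KEY, "
                "balance INTEGER NOT NULL CHECK (balance >= 0))"
            )

    def open_account(self, name: str, balance: int) -> None:
        with self._conn:
            self._conn.execute(
                "INSERT INTO accounts (name, balance) VALUES (?, ?)",
                (name, balance),
            )

    def balance(self, account: str) -> int:
        row = self._conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (account,)
        ).fetchone()
        if row is None:
            raise KeyError(account)
        return row[0]

    def transfer(self, source: str, target: str, amount: int) -> None:
        # Using the connection as a context manager commits if the block
        # succeeds and rolls back if it raises, so both updates apply or neither.
        with self._conn:
            self._conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            # Violates the CHECK constraint if the source would go negative.
            self._conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
```

A usage example, again only a sketch: create SqlLedger(sqlite3.connect(":memory:")), open two accounts, and call transfer. An overdrawing transfer raises sqlite3.IntegrityError and leaves both balances untouched, which is the guarantee the callers never had to know about.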

The Incoming CTO-like-a-plumber Thing

You may have lost your earlier incumbent. You may be scaling up beyond the capabilities of existing team members. You may have outsourced early stage development and now need to insource it. A condition of your investment (if you don't already have one) might be "get a CTO". This isn't always good advice, actually, but that would be too long a digression for here.

Let's assume that for some reason or another you are looking for someone to head up the engineering team. If you are expecting the incoming person to pick up the reins and continue the charge using the same technology and methodologies that you already have in place, that is a somewhat problematic assumption.

Technology choices and development methodologies are not neutral. You won't be surprised to hear candidates sucking through their teeth, and saying the equivalent of "who fitted this sink for you then?", "This fuse box isn't fitted to regulations and should be condemned" and the like.

Yes, that goes with the territory. Getting one CTO to wear another CTO's clothes is going to be hard. You're asking them to adopt the pain of someone else's (no doubt flawed) decision making, and living through the painful late night consequences of that. If the problem was caused under my watch it's my responsibility to fix it. If it was under someone else's watch it's still my responsibility, but I am only human, and will probably resent it!

As part of the hiring process you may need to promise fairly substantial flexibility around change (read cost). Or to turn that around another way, if you hire someone on a "steady as she goes" agenda, they will need to feel very confident that they are comfortable with what you have got. If I were to consider a role of that kind, I'd want to do some serious code review before signing the contract.

Conclusion

Yes, incremental change is the right way to do things if you can. But be realistic, and don't think that everything can be changed in that kind of way. When making necessary major changes make sure the tasks are suitably divided up so that the changes you do make are reasonable and controlled.

Most of all, plan for maintenance. It's no surprise at all that software needs continual ongoing minor maintenance and less frequent major changes. That should be part of your business planning.