Letting go of frustration is the first step to tackling legacy or inherited code
Legacy code: for some programmers, it’s job security. For others, it’s the reason that they receive threatening letters from unknown addresses. But for devs with an eye toward the future, it’s an opportunity to turn a mountain cliff into an escalator: to carve out the perilous and crumbling logic and replace it with an intuitive flow from one functional concept to the next. In this article, we’ll walk you through a few strategies and philosophies that have proved useful to us in the past.
The tips in this article emphasize simplicity and comprehensibility in software because, if you aim for anything else, you’re likely to run into trouble. Tony Hoare, inventor of quicksort (among many other things), captures this philosophy succinctly, saying: “I conclude that there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.”
Step 1: Practice letting go of frustration with the original code.
Before we get into the nuts and bolts, it is essential to attain the correct mindset for an endeavor that is surely even more difficult than scaling that mountain cliff: rebuilding it, better. The faint of heart will not complete this journey of ten thousand lines, although it will begin with merely a single character. The ancient lump of code sitting before you was not a malicious act; it was likely just an accident of time and human error that grew unmanageable in the shadow of a hundred small ignorances.
Sometimes, though, even a dumpster fire can be beautiful. That doesn’t mean we don’t have to put out the fire and clean up the mess, but it does mean that dumpster fires are a part of life, and the victory over chaos will be all the sweeter once we have tamed it. We do it because it is hard, because we must, and because our legacy depends in part upon the code that we are about to write.
Step 2: Try pair whiteboarding.
It’s beyond the scope of this article to extol the virtues of pair programming, but suffice it to say that the practice is catching on at fast- and slow-moving code shops alike, and for good reason. The complementary brains of two programmers can eliminate many of the oversights that a single human is liable to commit. Dealing with legacy code is no different: tackling it in teams of two tends to deliver better results.
Before you start writing the new code, though, the legacy problem in front of you is an excellent opportunity to design a code base that won’t make future devs cringe when they think about it. You can do this by “pair whiteboarding” with a partner:
- Sketch out a model of the legacy “algorithm” (aka the set of actions that the legacy code currently accomplishes).
- Model the ideal version of the algorithm, one that is not too hard to implement but also not so similar to the current spaghetti code that it’ll need a rewrite in another year.
- Determine whether the ideal version (i) can be built. If not, sketch out a compromise (c) between the legacy and (i) and repeat this step for (c).
- Build either (i) or (c).
You may of course also complete this process with a team, but remember that “too many cooks can spoil the broth.” (Much debate has raged about the ideal size of a team. We recommend you start with 2 and work your way up.)
Step 3: Use linters and type checkers.
Many programming languages feature optional syntax, and many times we are thankful for the variety of tools with which we can write our programs. However, the flexibility that these languages allow can also be a downside for the programmer who is tasked with maintaining a mature codebase. This is a particular concern for dynamically typed languages like JavaScript, where type errors do not surface until runtime.
Programmers across the industry, ourselves included, have turned to two solutions for these ailments: linters and type checkers. Linters, which can hook into your IDE or run on your code file as it’s updated, enforce optional and mandatory standards for punctuation, indentation, and demarcation on the code you write. These standards can be exported and delivered to all members of your team or organization to ensure consistency across platforms and offices.
The benefits of these tools are difficult to overstate. Suddenly everything is written in a dialect you can easily understand. Types, which many Haskell and Java programmers sorely miss when transitioning to Python or JS stacks, can be enforced at compile time for both browser and backend code with TypeScript or PureScript. From personal experience and independent testimony, it is tough to go back to vanilla JS once you have tasted the sweet nectar of type-safe web programming.
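To make the difference concrete, here is a small TypeScript sketch. The function names `addLoose` and `addStrict` are hypothetical and exist only for illustration; `addLoose` uses `any` to imitate how untyped JavaScript behaves.

```typescript
// Hypothetical helpers for illustration; neither comes from a real codebase.

// Mimics untyped JavaScript: `any` switches off type checking for these parameters.
function addLoose(a: any, b: any): any {
  return a + b;
}

// With annotations, the compiler checks the argument types at every call site.
function addStrict(a: number, b: number): number {
  return a + b;
}

const loose = addLoose("2", 3); // "+" concatenates here, yielding the string "23"
const strict = addStrict(2, 3); // numeric addition, yielding 5

// The compiler rejects the following call before the program ever runs,
// because "2" is not a number:
// addStrict("2", 3);

console.log(loose, strict);
```

The untyped version runs without complaint and quietly produces the wrong kind of value, which is the sort of bug that tends to surface far from its cause in a mature codebase.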
Linting tools will vary depending on your dev environment. For JavaScript, the de facto standard is ESLint. TSLint was long its TypeScript counterpart, but it has since been deprecated in favor of typescript-eslint, which brings TypeScript support to ESLint. These tools are available as standalone packages, as extensions for your editor, or as part of your continuous integration process for catching “write-time” errors. Flow (flow.org) is a widely used static type checker for JavaScript.
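Linters can also flag risky constructs, not just formatting. As one illustration, the TypeScript sketch below shows a loose comparison of the kind that ESLint’s `eqeqeq` rule reports when it is enabled. The function names are hypothetical, and whether the rule is active depends on your own configuration.

```typescript
// Hypothetical examples; the names are for illustration only.

// Loose equality coerces its operands before comparing them.
// ESLint's `eqeqeq` rule, when enabled, reports the `==` below.
function looseEquals(a: unknown, b: unknown): boolean {
  return a == b;
}

// Strict equality compares without coercion, which is what the rule asks for.
function strictEquals(a: unknown, b: unknown): boolean {
  return a === b;
}

console.log(looseEquals(0, ""));  // true: the empty string is coerced to 0
console.log(strictEquals(0, "")); // false: a number is never strictly equal to a string
```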
Step 4: Keep it simple at every level.
The only way to keep this code alive, healthy, and maintainable is this: build each part such that it’s easy to understand. That way, anyone can come in, spend some time studying the intuitive logic, make the needed changes, and live to code another day. Gall’s law makes the same point: complex systems that work evolve from simpler systems that worked, so if you want to build a complex system that works, build a simpler one first and improve it over time.
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
John Gall, author of "Systemantics: How Systems Really Work and How They Fail."
A neural network is an example of a program that accepts complex inputs and returns simple or complex outputs. For example, it can train on millions of websites and generate arbitrary amounts of near-human text, or it can train on images and classify their contents. Despite the sophistication of the finished product, each network is built from relatively simple parts, and a researcher or a team of them can tweak the details of its training process or its outputs. At the fundamental level, each of the network’s millions of neurons simply accepts one or more inputs and applies a simple operation to them: typically it multiplies each input by a weight, adds the results together, and passes the sum through a basic function. In simplified terms, think (1, 1) -> (add) -> (2). The applied functions can get slightly more complex than arithmetic, but not by much. Out of these elementary building blocks, we’ve built titans.
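To give a sense of how small one of these building blocks is, here is a minimal TypeScript sketch of a single neuron. It assumes the common textbook formulation (a weighted sum of the inputs plus a bias, passed through an activation function), which is more specific than the description above. The names `neuron`, `relu`, and `Activation` are ours, for illustration only.

```typescript
// A sketch of one artificial neuron; all names here are hypothetical.
type Activation = (x: number) => number;

// ReLU: pass positive values through unchanged, clamp negatives to zero.
const relu: Activation = (x) => Math.max(0, x);

function neuron(
  inputs: number[],
  weights: number[],
  bias: number,
  activate: Activation
): number {
  if (inputs.length !== weights.length) {
    throw new Error("inputs and weights must have the same length");
  }
  // Start from the bias, then add each input multiplied by its weight.
  let weightedSum = bias;
  inputs.forEach((input, i) => {
    weightedSum += input * (weights[i] ?? 0);
  });
  return activate(weightedSum);
}

// With weights of 1 and no bias, the neuron reduces to the "add" example: (1, 1) -> 2.
console.log(neuron([1, 1], [1, 1], 0, relu));

// A negative weighted sum (3 - 6 = -3) is clamped to 0 by ReLU.
console.log(neuron([3, 3], [1, -2], 0, relu));
```

The whole unit fits in a dozen lines, yet networks built from such units can be very capable; the sophistication comes from composition rather than from any one part.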
You probably won’t be replacing your legacy code with a neural network, but you can still take away this core insight. There’s no telling where an error in the code might bubble up to, nor where maintenance or feature-addition might be needed. Thus, each component, function, and algorithm ought to be digestible at a glance, and resilient to time and developer assumptions.
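As a rough sketch of what “digestible at a glance” can look like in practice, the TypeScript below splits a price calculation into tiny single-purpose functions. The domain (an order total) and every name in it are hypothetical, chosen only to illustrate the idea.

```typescript
// Hypothetical domain and names, for illustration only.
interface LineItem {
  price: number;
  quantity: number;
}

// Each function does exactly one thing and can be read in a few seconds.
function subtotal(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

function applyDiscount(amount: number, discountRate: number): number {
  return amount * (1 - discountRate);
}

function addTax(amount: number, taxRate: number): number {
  return amount * (1 + taxRate);
}

// The top-level function reads like a description of the business rule.
function orderTotal(items: LineItem[], discountRate: number, taxRate: number): number {
  return addTax(applyDiscount(subtotal(items), discountRate), taxRate);
}

console.log(orderTotal([{ price: 10, quantity: 2 }], 0.5, 0.5));
```

When a rule changes (say, tax is applied before the discount), the edit touches one short line, and each helper can be tested on its own.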
Please share your legacy code experiences with us, too. We’d love to hear your stories, and whether this advice helped you on the job!