How we didn't rewrite Shortcut
"We should just rewrite that."
Show any software developer a working system, and the first words you’ll hear are “This could be so much better if we rewrote it from scratch.” This even applies to developers evaluating their own work. As engineers, we like to build things, to create something new. In software, this shows up as the desire to rewrite.
Software-based organizations live with this tension all the time. To some businesses, accustomed to thinking of software as a capital expense, rewriting something that already exists seems like all cost and no benefit. But the savvier organization knows that code is a liability, not an asset, and the only way to manage its cost is continual replacement.
The challenge is deciding what to replace and when. The “big bang” rewrite — just raze it to the ground and start over — is appealing in theory but usually disastrous in practice. The industry is littered with failed big-rewrite projects. On the other hand, engineering organizations that are too afraid of change can become trapped by legacy systems that no one dares touch. That’s why billions of dollars’ worth of commerce still flows through systems written in COBOL.
Shortcut doesn’t have any COBOL lying around. But startups rely on growth, and growth implies change. Changing the business means changing the software.
If we were aiming for maximum clicks, this article would have been titled “From Monolith to Microservices.” But that’s not quite the story. We did transition from a monolith to multiple services, but those services vary widely in size and complexity.
It’s not a story about rewrites either, although plenty of code got written and, yes, rewritten. The story we want to tell is about the evolution of the Shortcut backend architecture over the past couple of years. We hope this series will provide some useful insights into the software development process for our colleagues in the industry and our customers, many of whom are software developers themselves.
Break up the Monolith
In software terms, a monolith is a single large application with multiple responsibilities. The monolith is typically developed, deployed, and run as a single unit. Although it may have internal structure and subsystems, these divisions are more a matter of convention than a strict physical separation.
Software monoliths were the norm for decades, going back to the mainframe era (all that COBOL). But as hardware has become exponentially cheaper and network-based services have proliferated, the trend has been toward deploying software in smaller and smaller units, culminating in today’s fashion for microservices.
When considering a major change to an existing system, it is useful to frame it in terms of the desirable outcomes you expect the change will produce, e.g., we can scale to X more users while improving performance by a factor of Y. We’ll have more to say on that in future posts. But it is equally important to identify the desirable properties of the existing system, to make sure they are not lost or forgotten.
The original Shortcut backend was a monolithic Clojure application deployed to AWS Elastic Beanstalk. A monolith is a natural place to start when you’re not really sure what the eventual structure of the system will be. There’s a reason most applications start out this way. “Monolith first” has even been recommended as a design strategy.
Although a monolith itself may be a large, complex piece of software, its most attractive quality, at least at the start, is simplicity. A monolith is simple in the sense that it is a single, unified whole rather than a composition of parts. This makes life easier for developers. When you are working in a monolith, there are fewer decisions to make about where new features should be implemented. Everything goes in the same “place.” Deployment is likewise straightforward because there’s only one thing to deploy.
The monolith has other advantages that are often forgotten amidst the microservice hype. Code within a monolith can be tightly integrated, with every feature at most a function call away. Common behaviors or domain invariants can be enforced simply by calling shared code, and sharing code across components is trivial. The failure modes of a monolith are usually pretty clear: either the application works or it doesn’t. Partial failures, one of the major headaches of distributed systems, are rare in a monolith.
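As a rough illustration of these points, the Python sketch below contrasts an in-process call with a call across a service boundary. It is a toy under stated assumptions, not Shortcut code: every name in it (`validate_story`, `create_story_monolith`, `FlakyStoryService`, `create_story_distributed`) is hypothetical, and the "network" is simulated by a class that can be told to fail.

```python
class ServiceUnavailable(Exception):
    """Raised when the simulated remote service cannot be reached."""


def validate_story(story):
    # Shared domain invariant (hypothetical): a story needs a non-empty title.
    if not story.get("title", "").strip():
        raise ValueError("story title must not be empty")
    return story


# Monolith style: every feature is a plain function call away.
def create_story_monolith(db, story):
    validate_story(story)   # shared code, called directly
    db.append(story)        # same process, same failure domain
    return len(db)          # number of stories stored so far


# Service style: the same step now crosses a (simulated) network boundary.
class FlakyStoryService:
    """Hypothetical stand-in for a remote service; not a real API."""

    def __init__(self, available=True):
        self.available = available
        self.stored = []

    def save(self, story):
        if not self.available:
            raise ServiceUnavailable("story service did not respond")
        self.stored.append(validate_story(story))
        return len(self.stored)


def create_story_distributed(service, audit_log, story):
    # The audit entry is written locally before the remote call is made.
    audit_log.append(("create-requested", story["title"]))
    try:
        return service.save(story)
    except ServiceUnavailable:
        # Partial failure: the audit log has changed, but no story was saved.
        audit_log.append(("create-failed", story["title"]))
        return None
```

In the monolith version, the call either succeeds or raises, and there is only one place where state can change. In the distributed version, the caller has to decide what to do when one side of the operation succeeded and the other did not.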
In summary, a monolith is optimized for rapid development and low operational overhead. In other words, it is exactly what an early-stage startup needs. The backend monolith served Shortcut well through the early years, when a small team was racing to build up a minimum viable feature set to compete in the crowded project-management space.
Time to change
By late 2017, the disadvantages of the monolithic design were starting to emerge. Users were starting to complain that the application felt “slow.” At the same time, our AWS bill was growing out of proportion to the number of users. Outages were becoming more common. These were clear signals that we needed to make our backend systems faster, cheaper to run, and more reliable, and it was difficult to do that within the monolith.
Optimizing the performance of a software system is largely a matter of specialization. A component that has only one job can be carefully crafted for that job, making it more efficient: it does more work while consuming fewer hardware resources.
For a non-software analogy, think of a Formula One race car compared with a standard car. The race car is highly optimized for the conditions of a race track, but you can’t drive it home. The standard car can drive on all kinds of roads, but it will never beat the race car on the track.
A monolith is the antithesis of specialization: It does everything. Even though it may be deployed to multiple machines, each machine must be capable of performing any task. Homogeneity is an attractive quality at first, because it seems simpler, but it means that specialization is all but impossible. In a monolith, there are fewer opportunities to optimize for specific tasks.
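A back-of-the-envelope model can make the cost of homogeneity concrete. The Python sketch below is only an illustration: the workload names and every number in `WORKLOADS` are invented for the example and do not describe Shortcut's actual traffic or infrastructure. It assumes that in a homogeneous fleet every worker must be sized for the most memory-hungry task, while specialized pools can be sized per task.

```python
import math

# Hypothetical workloads. All figures are made up for illustration:
# "load" is requests per second, "per_worker" is how many requests per
# second one worker can handle, "memory_gb" is memory needed per worker.
WORKLOADS = {
    "api":     {"load": 900, "per_worker": 100, "memory_gb": 2},
    "search":  {"load": 100, "per_worker": 10,  "memory_gb": 16},
    "reports": {"load": 10,  "per_worker": 5,   "memory_gb": 8},
}


def workers_needed(workload):
    # Round up: a fraction of a worker still requires a whole one.
    return math.ceil(workload["load"] / workload["per_worker"])


def homogeneous_memory_gb(workloads):
    # Every machine must be able to run any task, so each worker is
    # sized for the most demanding task in the whole application.
    total_workers = sum(workers_needed(w) for w in workloads.values())
    largest = max(w["memory_gb"] for w in workloads.values())
    return total_workers * largest


def specialized_memory_gb(workloads):
    # Each pool runs only one kind of task and is sized for that task.
    return sum(workers_needed(w) * w["memory_gb"] for w in workloads.values())


if __name__ == "__main__":
    print("homogeneous fleet:", homogeneous_memory_gb(WORKLOADS), "GB")
    print("specialized pools:", specialized_memory_gb(WORKLOADS), "GB")
```

With these invented inputs the homogeneous fleet needs 336 GB against 194 GB for the specialized pools. The exact gap depends entirely on the assumed numbers; the point is only that a one-size-fits-all worker has to be sized for the worst case.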
The evolution of the Shortcut backend architecture is largely a story of migrating from generic solutions to specialized ones, each better optimized for its job. By necessity, this included breaking the monolithic application into smaller services. Some of those services are small enough to qualify as “microservices,” while others are not. So our story might be called “From Monolith to Multiplicity.”
The next post will describe the first steps we took on this journey. Stay tuned.
Note: The Shortcut frontend (web and mobile) architecture went through equally significant changes in the same time frame, but this series will focus on the backend.