At Shortcut, we’ve just launched a completely rebuilt Search infrastructure. Search in Shortcut is now faster and returns more relevant results than ever before, and we’re excited to be able to share what we’ve been working on for so long.
While we continue to iterate on this feature, we wanted to take a moment to discuss why we made these changes, what challenges we faced, and also share any tips for teams thinking about how best to implement search in their own products.
I was joined in this discussion by co-founder/product designer Andrew Childs, software developer Jeremy Heiler, and dev-ops lead Paul Groudas.
Andrew, you were here when we implemented the first version of search. What was the thinking when you first set it up? Did you always feel like, “we’ll fix this later”?
Andrew: The way it worked originally was we would load all the organization’s data in the client and then just execute the search in the client. It worked ok in that it instantly presented search results, but they were sorted by workflow state, not by any kind of actual relevance score, and you had to wait for all of the organization’s story data to load before search would work.
So as we started growing as a business and started having some growing pains, we tried a number of things to optimize our web application performance, and one of those things was to remove the story description from the data we initially fetch from the server, so that we were sending less data over the wire. That did help, but it also made search a lot less useful, because from that point forward you could only search against a story title.
It was always “good enough” because you knew we were going to eventually move search to the server?
Andrew: It’s not feasible to replicate on the client-side what Elasticsearch can do, in the sense of creating a corpus of words, and then scoring them based on how often they appear. That would require us to load some giant library, along with all of the client’s data, which doesn’t scale at all — not to mention having a mobile app on the way.
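To make the idea of "a corpus of words, scored by how often they appear" concrete, here is a minimal sketch of term-frequency scoring in Python. It is an illustration only, not Shortcut's code and not Elasticsearch's actual ranking algorithm, which is considerably more sophisticated. The function names and the sample stories are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(stories):
    """Build the word corpus from a dict of story id -> text (hypothetical shape).

    Returns per-story word counts and, for each word, the number of
    stories that contain it.
    """
    term_counts = {sid: Counter(tokenize(text)) for sid, text in stories.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def search(query, term_counts, doc_freq):
    """Return story ids ranked by a simple TF-IDF style score, best first."""
    n_docs = len(term_counts)
    scores = {}
    for sid, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                # Words that appear in fewer stories carry more weight.
                idf = math.log(1 + n_docs / doc_freq[term])
                score += counts[term] * idf
        if score > 0:
            scores[sid] = score
    return sorted(scores, key=scores.get, reverse=True)


stories = {
    1: "Fix login bug",
    2: "Login page: login button breaks the login flow",
    3: "Update billing page",
}
term_counts, doc_freq = build_index(stories)
ranked = search("login", term_counts, doc_freq)
```

Even this toy version needs every story's full text in memory before it can rank anything, which is the scaling problem with doing the work in the client.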
From the feedback we’d seen in Zendesk and NPS surveys, we knew that Search had become a real pain point for a lot of companies. Search is a core requirement of any modern collaboration software, and user expectation keeps rising; so even though it was a massive undertaking, we knew it was worth the investment.
Once you were tasked with porting Search to the backend, Jeremy, how did you initially approach it? What tools did you look at, and how did you settle on Elasticsearch?
Jeremy: I initially started by learning what Search currently looked like, because it was really new to me. And then we talked high level as a team about where we wanted Search to go to help us understand what we needed the tool to do in the future. Ultimately, for version one of this, we decided we wanted to replicate the Search functionality as-is, but on the backend, without really adding features. Our goal was to make the current version faster and better.
Elasticsearch is the industry standard for this type of thing. There are other services, like AWS CloudSearch and Algolia, but we decided on Elasticsearch because it is widely used, and it was easy for us to get up and running locally for testing.
Also, AWS has a managed Elasticsearch service, so that made it a little easier... Maybe Paul could talk about whether it actually is easier or not. (laughs)
As far as building out the pipeline, I actually started thinking about using Onyx just because that was new tech that was starting to gain some ground. It ended up being more than we needed, so we went a simpler route.
I remember there being some talk of difficulties setting up Apache Zookeeper.
Paul: Yeah, that was part of the complexity of Onyx.
It probably helps to describe a little bit about how our search pipeline ended up working. I mean, ultimately what we were trying to do was take the real-time transactions that are occurring in Datomic and get them manifested in a search index, using a scheme that is both as fast as possible today and scalable over time.
We were interested in Onyx because it is a parallel framework that we could scale up with many nodes, in an effort to build for the future. It’s not enough for what we design to work for today; it needs to work for today through another order of magnitude of transaction rate.
For example, today we are leveraging Amazon’s queuing service to basically put Datomic transactions in a queue in realtime, and then we have an opportunity to fan out how we process that, and index those transactions and put them in Elasticsearch. It just so happens that we only need one node right now, but we can simply run more nodes to do the processing of the transactions, and then if Elasticsearch becomes the bottleneck we can also scale up the Elasticsearch domain to increase the rate at which we can add new records to the index.
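The queue-and-fan-out shape described here can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: an in-memory `queue.Queue` stands in for Amazon's queuing service, a plain dict stands in for the Elasticsearch index, threads stand in for worker nodes, and the transaction shape (`id`, `doc`) is hypothetical.

```python
import queue
import threading


def run_pipeline(transactions, num_workers=1):
    """Queue every transaction, then let num_workers workers index them."""
    tx_queue = queue.Queue()   # stand-in for the hosted queue
    index = {}                 # stand-in for the search index
    lock = threading.Lock()

    def worker():
        while True:
            tx = tx_queue.get()
            if tx is None:     # sentinel: no more work for this worker
                break
            with lock:
                index[tx["id"]] = tx["doc"]

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for tx in transactions:
        tx_queue.put(tx)
    for _ in workers:
        tx_queue.put(None)     # one sentinel per worker
    for w in workers:
        w.join()
    return index


transactions = [{"id": i, "doc": {"name": f"Story {i}"}} for i in range(20)]
one_node = run_pipeline(transactions, num_workers=1)
three_nodes = run_pipeline(transactions, num_workers=3)
```

Scaling the processing side is just a matter of raising `num_workers`. One thing the sketch ignores is ordering: with more than one worker, two updates to the same record could be applied out of order, so a real pipeline would need to account for that.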
Just to follow up on the point earlier, one of the big benefits of using Elasticsearch was that Amazon does in fact offer a turnkey solution of Elasticsearch as a service that enables us to very easily scale it up or down based on our performance criteria.
Jeremy: Datomic was very helpful in making this easier. If we did not have the Datomic transaction log, we would have had to come up with some other solution for making sure all creates, updates, and deletes are queued up so that we could index them, which would have added complexity to the application.
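As a rough illustration of why a transaction log helps, the sketch below replays a simplified log into a search index in Python. The assumptions are significant: each log entry is reduced to a hypothetical `(entity, attribute, value, added)` tuple, every attribute holds a single value, and the index is a plain dict. It does not use the real Datomic API.

```python
def apply_log(index, entries):
    """Replay simplified log entries into a dict of entity id -> document.

    Each entry is a hypothetical (entity, attribute, value, added) tuple:
    added=True asserts a value, added=False retracts it.
    """
    for entity, attribute, value, added in entries:
        if added:
            # Covers both creates and updates.
            index.setdefault(entity, {})[attribute] = value
        else:
            doc = index.get(entity)
            # Only remove the value if it is still the current one, so a
            # retraction of an old value cannot erase its replacement.
            if doc is not None and doc.get(attribute) == value:
                del doc[attribute]
                if not doc:
                    # Every attribute retracted: treat as a delete.
                    del index[entity]
    return index


log = [
    (1, "name", "Fix login bug", True),   # create story 1
    (1, "state", "todo", True),
    (1, "state", "done", True),           # update: new value asserted...
    (1, "state", "todo", False),          # ...and the old one retracted
    (2, "name", "Scratch story", True),   # create story 2
    (2, "name", "Scratch story", False),  # delete story 2
]
index = apply_log({}, log)
```

Because the log already records every create, update, and delete in order, the indexer only has to read it, and the application itself never needs to know that search exists.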
Andrew, do you have any advice for somebody who’s starting a new application and thinking about how to implement search? Is there anything, in hindsight, you would’ve done differently?
Andrew: I think we made trade-offs because that was the only option at the time, and this obviously was always going to be a big chunk of work. I think it’s just one of those things that you have to push to get prioritized among all of the big picture things that you want to accomplish early on. There are also search tools like Algolia for a small team that doesn’t have a lot of resources.
In implementing V1 of Search, we were really just trying to replicate what we had been doing on the frontend, but the change right now hasn’t been wholly additive in some ways. For example, we’ve temporarily lost the ability to sort on the search page. How did you think about making that concession and saying, “We’re gonna subtract something now, but we’ll add it back later”?
Andrew: To me it wasn’t that big of a deal because the overall experience and the overall product is better.
Jeremy: We lost sorting, but then we gained so much more in ranking and scoring that, as Andrew said, it was worth it.
So Elasticsearch was the right tool for revamping this feature, but can you talk about what Elasticsearch might not be so good for?
Jeremy: Elasticsearch is a secondary data store, so there will be some latency between the data being recorded in Datomic and ending up in Elasticsearch, which is usually fine in practice. Also, Elasticsearch has different consistency semantics, and therefore the data may not be 100% accurate. If you rely on it for something as concrete as a report, it may not be the best option.
Finally, can you talk a little bit about what else we want to make available in future versions of Search?
Jeremy: We definitely want to support other entities: Comments, Epics, Milestones, all that. We also need to decide how best to support different languages, and then just continue to make improvements to the overall algorithm for relevance.
How is Search in Shortcut working for you? We welcome your feedback on Twitter, or via the Feedback button in the app.