Summary of the Shortcut service disruption on June 1, 2017

Kurt Schrader
CEO and Co-Founder
June 9, 2017

On the afternoon of June 1, 2017, Shortcut experienced a partial service outage between 2:07pm ET and 3:52pm ET.

Organizations that added or modified something in Shortcut during that period may not have had their data recorded in our database. We have recovered all customer data from the affected period and are in the process of communicating the recovered data to all affected customers.

The root cause of the issue was the interaction of two previously undiscovered bugs in Datomic, the database software that we use to store data on the backend, that caused our database to enter an unstable state.

We have been working with the Datomic team over the last week to address these issues, and they have released a new version of Datomic that fixes the issues we encountered (http://docs.datomic.com/release-notices.html).

Statement from the Datomic team:

"Release 0.9.5561.50 fixes a bug in the catalog that, in the unlikely circumstance where one has deleted a database and restored it from a backup without first having called gc-deleted-dbs, can cause a subsequent gc-deleted-dbs to delete that (active) database."
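
To make the sequence in that statement concrete, here is a toy model in Python. It is not Datomic's code or API: the `Catalog` class, its methods, the `clear_marker` flag, and the database name "prod" are all hypothetical, invented only to show how a stale deletion marker could let a later garbage collection remove a database that was restored in the meantime.

```python
class Catalog:
    """Toy catalog. Hypothetical; not Datomic's implementation."""

    def __init__(self):
        self.active = {}         # name -> data for live databases
        self.pending_gc = set()  # names marked deleted, awaiting garbage collection

    def create(self, name, data):
        self.active[name] = data

    def delete(self, name):
        # Deletion only marks the database; storage is reclaimed later by gc.
        self.active.pop(name)
        self.pending_gc.add(name)

    def restore(self, name, backup, clear_marker=False):
        # Assumed bug: unless clear_marker is set, the stale deletion
        # marker survives the restore.
        self.active[name] = backup
        if clear_marker:
            self.pending_gc.discard(name)

    def gc_deleted_dbs(self):
        # Reclaims everything still marked deleted, which in the buggy
        # case includes the restored (active) database.
        for name in self.pending_gc:
            self.active.pop(name, None)
        self.pending_gc.clear()


def run(clear_marker):
    """Delete, restore, then gc. Returns True if the database survives."""
    catalog = Catalog()
    catalog.create("prod", {"stories": 3})
    backup = dict(catalog.active["prod"])
    catalog.delete("prod")
    catalog.restore("prod", backup, clear_marker=clear_marker)
    catalog.gc_deleted_dbs()
    return "prod" in catalog.active
```

In this sketch, `run(clear_marker=False)` loses the restored database, while `run(clear_marker=True)` keeps it. The real fix in Datomic may work quite differently; the model only illustrates the order of operations described in the statement.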

We have now deployed the updated version of Datomic and do not expect to encounter this problem again.

Every day, thousands of companies trust Shortcut to keep mission-critical data safe and secure, and we take that responsibility very seriously. Now that the issue is resolved, we are planning a complete review of our operations and recovery procedures to confirm that we’re taking every preventative measure we can to ensure the reliability and security of your data.

Incident Timeline:

  • Jun 1, 12:00pm ET: We began the process of deleting old databases from the system.
  • Jun 1, 2:07pm ET: Our monitoring showed an increased rate of 500 errors occurring on our servers and we began our investigation.
  • Jun 1, 3:52pm ET: Our database was restored to a known good state. Some of the transactions from the prior 3 hours and 10 minutes were lost. The indexes that power historical reporting and the historical activity feed were also non-functional.
  • Jun 7, 4:00pm ET: The database indexes were rebuilt and merged to restore reporting and activity feed functionality. All lost data was recovered and is being communicated to affected customers.
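
The durations implied by the timeline can be checked with a few lines of Python. This is only a sketch: it assumes that "the prior 3 hours and 10 minutes" counts back from the 3:52pm ET restore, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Times from the timeline, all in ET on June 1, 2017.
errors_seen = datetime(2017, 6, 1, 14, 7)  # 2:07pm: elevated 500 errors
restored = datetime(2017, 6, 1, 15, 52)    # 3:52pm: database restored

# Length of the visible outage.
outage = restored - errors_seen

# Assumed start of the window in which transactions could have been lost.
loss_window_start = restored - timedelta(hours=3, minutes=10)

print(outage)                               # 1:45:00
print(loss_window_start.strftime("%H:%M"))  # 12:42
```

Under that assumption, the visible outage lasted 1 hour and 45 minutes, while the window of potentially affected transactions opens at 12:42pm ET, after the database cleanup began at 12:00pm ET and before the first errors were observed.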

If you have any questions about the outage, please don’t hesitate to contact us at support@clubhouse.io
