Outage and incident data over the last 30 days for Percolate.
OutLogger tracks the status of these components for Percolate:
Component | Status |
---|---|
Percolate API | Active |
Percolate Application | Active |
Percolate Website | Active |
View the latest incidents for Percolate and check for official updates:
Description: Between 5:30 AM ET on Friday, May 10, and 12:45 PM ET on Tuesday, May 14, the Percolate Asset Manager experienced a severe performance degradation that affected customers accessing the Asset Gallery and individual Assets within the application. We know that this is a critical system that our customers and their businesses rely on. We apologize for this incident and for the length of time it took to resolve. We hold high standards for our software and for the experience we deliver to our customers, and we want to be transparent about what happened, the steps we took to fix it, and how we are changing our tools and processes to prevent similar issues from occurring again.

**What Happened**

During the initial investigation on Friday, Percolate Engineering identified the issue as an overloaded database cluster. We made configuration and scaling changes to the cluster to bring it back to a healthy, performant state. This was a standard response to an infrastructure scaling issue and was executed as planned according to our procedures.

The overloaded database cluster, however, masked another, more severe issue that was causing most of the performance problems. We then identified a networking issue that was producing excessive, slow network calls to a helper service. This issue was caused by a recent code deployment intended to prepare our infrastructure for a migration to a new technology. The change was tested thoroughly before deployment. Unfortunately, the heavier load of the production environment caused it to behave in ways that were not foreseen during testing, and it had to be rolled back. Once we rolled back the change, Asset Manager performance returned to historical levels.

**Prevention/Mitigation Steps**

Several steps have been taken or are in progress to prevent an issue like this from recurring:

1. We improved our monitoring and alerting on overloaded database clusters so that we detect these situations earlier and can act before slowness shows up in the application.
2. We increased resources for all services and databases involved, and applied a new data distribution strategy to improve overall performance.
3. We improved logging in our Asset service so that we can pinpoint issues more quickly.
4. We changed our deployment processes for more sensitive changes so that we can better confirm they work as expected as they roll out to customers.
5. (In Progress) Our new infrastructure technology proactively detects these issues and heals the system automatically, avoiding the need for manual debugging and intervention.
6. (In Progress) We are remodeling one database to improve query performance and resiliency.

We feel strongly that these mitigation steps will prevent or minimize any recurrence of the issue. We are constantly working to improve our processes so that you can continue to rely on us for your critical business needs.
Status: Postmortem
Impact: None | Started At: May 13, 2019, 2:37 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Major | Started At: May 10, 2019, 10:37 a.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Major | Started At: May 10, 2019, 10:37 a.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Critical | Started At: Feb. 21, 2019, 4:10 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Critical | Started At: Feb. 19, 2019, 4:14 p.m.