Last checked: 6 minutes ago
Get notified about any outages, downtime or incidents for Alloy and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Alloy.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Alloy:
Component | Status |
---|---|
Customer Dashboard | Active |
Production API | Active |
Sandbox API | Active |
SDK | Active |
Webhooks | Active |
View the latest incidents for Alloy and check for official updates:
Description: Between 10:34 and 10:37 ET, we rolled out a scaling event to our API to mitigate some of the issues we are seeing with the AWS outage in one of the us-east-1 datacenters we use. This scaling event did not execute correctly due to the ongoing issues, making the situation temporarily worse. We immediately rolled back that change and are continuing to monitor the situation with AWS.
Status: Resolved
Impact: Minor | Started At: Dec. 22, 2021, 3:36 p.m.
Description: Between 18:21 and 18:25 ET this evening, a change was made to our web application firewall (WAF) rules which caused some clients to experience a loss of service. This WAF change was suggested by Amazon to add an extra layer of security to our environment, but instead it flagged most users as malicious. Once we saw that the problem began when the WAF rule was turned on, we quickly rolled it back. The change seemed low-risk because the suggestion came from Amazon, but we should have collected more data on it before rolling it out to production. We will be making sure future WAF rules follow that process.
Status: Resolved
Impact: Major | Started At: Dec. 17, 2021, 11:30 p.m.
Description: Reason codes are now populating in the evaluations page for all new evaluations. This took longer than usual to patch due to a large schema migration we had to run overnight. We are auditing our data coming out of this to make sure all other similar instances are identified and fixed. We will be backfilling reason codes for yesterday's evaluations throughout the day today.
Status: Resolved
Impact: Minor | Started At: Sept. 27, 2021, 7:07 p.m.
Description: Summary: On August 30th, our dashboard was inaccessible for about 30 minutes and slow to update for several hours. The API (processing applications and data) was not affected and no data was lost or put at risk. Operations teams performing manual reviews were the primary group impacted by this incident.

Root Cause: The API server received thousands of requests with an unusual volume of entities in groups. This feature was not designed to support (or defend against) groups of the size received. The Groups feature is intended to manage situations where multiple entities belong to the same Application - such as 3 people applying for a joint checking account. Our dashboard uses a special database for displaying data in that dashboard. That database performs several queries that were not optimized to process a group of entities this large. The database slowed down, a backlog of queries developed, and eventually, the database went totally offline causing a dashboard outage. We were able to restore service by rate-limiting the client, removing the impacted entity links, and restarting the services that power the dashboard. This immediately restored service to the dashboard and after roughly another hour all of the backed-up queries were back to real-time.

Actions taken or planned to avoid future incidents: In order to avoid this situation in the future we are:

* Implementing rate limits on this feature to prevent unexpected behavior from misuse
* Improving our proactive monitoring and alerting of the database that powers the dashboard to identify a potential issue more quickly
* Implementing generalized guardrails and optimizations against intensive queries impacting the overall system
Status: Postmortem
Impact: Critical | Started At: Aug. 30, 2021, 3:45 p.m.
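The postmortem above names two kinds of guardrail: rate limits on the Groups feature and protection against oversized groups. The following is a minimal sketch of how such checks might be combined, not Alloy's actual implementation. Every name here (`TokenBucket`, `check_group_request`, `MAX_ENTITIES_PER_GROUP`) and every limit value is a hypothetical chosen for illustration, and a token bucket is only one of several possible rate-limiting techniques.

```python
import time
from dataclasses import dataclass, field

# Hypothetical cap; the postmortem does not state what limit was chosen.
MAX_ENTITIES_PER_GROUP = 10


class GroupTooLargeError(ValueError):
    """Raised when a request links more entities than the cap allows."""


@dataclass
class TokenBucket:
    """Per-client token bucket: each request spends one token."""
    capacity: float
    refill_per_second: float
    tokens: float = field(init=False)
    updated_at: float = field(init=False)

    def __post_init__(self):
        # Start full so a well-behaved client is never throttled at first.
        self.tokens = self.capacity
        self.updated_at = time.monotonic()

    def allow(self, now=None):
        """Return True and spend a token if one is available."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def check_group_request(entity_ids, bucket, now=None):
    """Reject oversized groups outright, then apply the rate limit.

    Returns True if the request may proceed and False if the client
    is currently rate-limited.
    """
    if len(entity_ids) > MAX_ENTITIES_PER_GROUP:
        raise GroupTooLargeError(
            f"group has {len(entity_ids)} entities; "
            f"limit is {MAX_ENTITIES_PER_GROUP}"
        )
    return bucket.allow(now)
```

Checking the group size before spending a token means a malformed request is rejected cheaply and does not count against the client's quota; a real service might reasonably choose the opposite order.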
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: Aug. 30, 2021, 2 p.m.