Get notified about any outages, downtime or incidents for Aptible and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Aptible.
OutLogger tracks the status of these components for Aptible:
Component | Status |
---|---|
api.aptible.com | Active |
Aptible Deploy | Active |
auth.aptible.com | Active |
dashboard.aptible.com | Active |
Let's Encrypt | Active |
AWS EC2 | Active |
AWS EC2 (ap-northeast-1 — Tokyo) | Active |
AWS EC2 (ap-northeast-2 — Seoul) | Active |
AWS EC2 (ap-south-1 — Mumbai) | Active |
AWS EC2 (ap-southeast-1 — Singapore) | Active |
AWS EC2 (ap-southeast-2 — Sydney) | Active |
AWS EC2 (ca-central-1 — Canada) | Active |
AWS EC2 (eu-central-1 — Frankfurt) | Active |
AWS EC2 (eu-west-1 — Ireland) | Active |
AWS EC2 (eu-west-2 — London) | Active |
AWS EC2 (eu-west-3 — Paris) | Active |
AWS EC2 (sa-east-1 — São Paulo) | Active |
AWS EC2 (us-east-1 — Virginia) | Active |
AWS EC2 (us-east-2 — Ohio) | Active |
AWS EC2 (us-west-1 — California) | Active |
AWS EC2 (us-west-2 — Oregon) | Active |
View the latest incidents for Aptible and check for official updates:
Description: This incident has been resolved.
Status: Resolved
Impact: None | Started At: Sept. 28, 2023, 11:31 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: None | Started At: July 29, 2023, 3:02 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: July 20, 2023, 6:50 p.m.
Description:

# Incident Postmortem: Metric Drains Interrupted for Some Dedicated Stacks

## Executive Summary

On June 20, 2023, our platform experienced a service degradation incident for Metric Drains while rolling out a new feature for Metric Drains. This was due to unexpected side effects of a new internal utility used to deploy the feature. Some of our customers experienced interruptions in their metric drains during this incident. All issues were subsequently addressed, and service has been fully restored.

## Detailed Incident Description

Configuration Change Initiation: The rollout of the change relied on a two-step configuration process to update the software for the metric drain emitter and aggregator components within each dedicated stack. This process was initiated using a new utility that had been successfully deployed in the past, but not at the scale required for this rollout.

Utility Timeouts and Delays: During the rollout, the configuration utility started experiencing cascading timeouts as operations queued up and configuration changes were executed with increasing delays. Because these delays prevented the configuration from being updated uniformly, some customer stacks were left only partially configured for the updated metric drain software.

Customer Impact: A small number of customers who were deploying or scaling services during this period had their metric drains interrupted due to the aforementioned configuration issues.

Resolution: Our team immediately worked on fixing the configuration issues. By 16:24 EDT, we had restored the configuration state for the affected customers, and service returned to its normal state.

Follow-up Audit: On the following morning, June 21, a follow-up audit revealed that two additional customers still needed configuration updates for their metric drains. We immediately addressed these issues.

## Root Cause Analysis

The root cause of this issue was a combination of the increased scale of the rollout and the relative novelty of the utility used for the configuration changes. Although this utility had performed successfully under previous workloads, it did not scale sufficiently to handle the increased demand of this particular rollout.

## Lessons Learned and Preventative Measures

Testing Deployment Tools at Scale: Testing new deployment tools and utilities under maximum practical loads is crucial to ensure they can handle expected full-scope workloads without disruption.

Audit Processes: Though our follow-up audit process effectively identified additional affected customers, we will make such audits more timely to catch any lingering issues sooner.

We sincerely apologize for any inconvenience caused to our customers during this incident. We take this issue seriously and are committed to ensuring that such incidents do not occur in the future.
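
As an illustration only, the kind of check a follow-up audit might run for a two-step rollout can be sketched as below. This is not Aptible's tooling: the `StackConfig` record, the version strings, and the `find_partially_configured` helper are all hypothetical names, and the sketch assumes that a stack is "partially configured" when exactly one of its two components (emitter or aggregator) has reached the target version.

```python
from dataclasses import dataclass


@dataclass
class StackConfig:
    # Hypothetical record of the metric drain software versions on one stack.
    name: str
    emitter_version: str
    aggregator_version: str


def find_partially_configured(stacks, target_version):
    """Return names of stacks where only one of the two components is updated."""
    partial = []
    for stack in stacks:
        emitter_done = stack.emitter_version == target_version
        aggregator_done = stack.aggregator_version == target_version
        # Exactly one component updated means the two-step process stopped midway.
        if emitter_done != aggregator_done:
            partial.append(stack.name)
    return partial


# Made-up example data: one fully updated, one half updated, one untouched stack.
stacks = [
    StackConfig("stack-a", "v2", "v2"),
    StackConfig("stack-b", "v2", "v1"),
    StackConfig("stack-c", "v1", "v1"),
]
print(find_partially_configured(stacks, "v2"))
```

Running a check like this right after a rollout, rather than the next morning, is one way the "more timely" audit mentioned above could be approached.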
Status: Postmortem
Impact: None | Started At: June 21, 2023, 3:28 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: None | Started At: June 20, 2023, 8 p.m.