Get notified about any outages, downtime or incidents for Opsgenie and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Opsgenie.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Opsgenie:
Component | Status |
---|---|
Public Website | Active |
EU | Active |
Alert Flow | Active |
Alert REST API | Active |
Configuration REST APIs | Active |
Email Notification Delivery | Active |
Heartbeat Monitoring | Active |
Heartbeat REST API | Active |
Incident Flow | Active |
Incident REST API | Active |
Incoming Call Routing | Active |
Incoming Email Service | Active |
Incoming Integration Flow | Active |
Logs | Active |
Mobile Application | Active |
Mobile Notification Delivery | Active |
Opsgenie Actions | Active |
Outgoing Integration Flow | Active |
Pricing & Billing | Active |
Reporting & Analytics | Active |
Signup, Login & Authorization | Active |
SMS Notification Delivery | Active |
Voice Notification Delivery | Active |
Web Application | Active |
US | Active |
Alert Flow | Active |
Alert REST API | Active |
Configuration REST APIs | Active |
Email Notification Delivery | Active |
Heartbeat Monitoring | Active |
Heartbeat REST API | Active |
Incident Flow | Active |
Incident REST API | Active |
Incoming Call Routing | Active |
Incoming Email Service | Active |
Incoming Integration Flow | Active |
Logs | Active |
Mobile Application | Active |
Mobile Notification Delivery | Active |
Opsgenie Actions | Active |
Outgoing Integration Flow | Active |
Pricing & Billing | Active |
Reporting & Analytics | Active |
Signup, Login & Authorization | Active |
SMS Notification Delivery | Active |
Voice Notification Delivery | Active |
Web Application | Active |
View the latest incidents for Opsgenie and check for official updates:
Description: We're excited to inform you that we've shipped upgrades to our production environment, enabling scheduled reports once again.

What's Changed: To continuously improve and ensure the security of our services, we've implemented additional controls including domain restrictions and a limitation on the number of recipients per email. This is specifically for mitigation purposes. From now on, users will start receiving emails for the reports they've scheduled for themselves, and they will also have the ability to create new tasks.

Impact: Please note that any existing scheduled jobs with external recipients will no longer be editable. However, users can delete these and create new jobs using their email IDs.

Thank you for your patience during these changes. We want to assure you that future updates and communications will be shared promptly to keep you informed. We appreciate your understanding and continued support.
Status: Resolved
Impact: Minor | Started At: Nov. 18, 2023, 11:41 a.m.
Description:

### **SUMMARY**

On Sep 13, 2023, between 12:00 PM UTC and 03:30 PM UTC, some Atlassian users were unable to sign in to their accounts and use multiple Atlassian cloud products. The event was triggered by a misconfiguration of rate limits in an internal service, which caused a cascading failure in sign-in and signup-related APIs. The incident was quickly detected by multiple automated monitoring systems. The incident was mitigated on Sep 13, 2023, 03:30 PM UTC by the rollback of a feature and additional scaling of services, which put Atlassian systems into a known good state. The total time to resolution was about 3 hours and 30 minutes.

### **IMPACT**

The overall impact was between Sep 13, 2023, 12:00 PM UTC and Sep 13, 2023, 03:30 PM UTC on multiple products. The incident caused intermittent service disruption across all regions. Some users were unable to sign in for sessions. Other scenarios that temporarily failed were new user signups, profile retrieval, and password reset. During the incident we had a peak of 90% of requests failing across authentication, user profile retrieval, and password reset use cases.

### **ROOT CAUSE**

The issue was caused by a misconfiguration of a rate limit in an internal core service. As a result, some sign-in requests over the limit received HTTP 429 errors. However, retry behavior for requests caused a multiplication of load, which led to higher service degradation. As many internal services depend on each other, the call graph complexity led to a longer time to detect the actual faulty service.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We are continuously improving our system's resiliency. We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Audit and improve service rate limits and client retry and backoff behavior.
* Improve scale and load test automation for complex service interactions.
* Audit cross-service dependencies related to sign-in flows and minimize them where possible.

Due to the unavailability of sign-in, some customers were unable to create support tickets. We are making additional process improvements to:

* Enable our unauthenticated support contact form and notify users that it should be used when standard channels are not available.
* Create status page notifications more quickly and ensure that for severe incidents, notifications to all subscribers are enabled.

We apologize to users who were impacted during this incident; we are taking immediate steps to improve the platform’s reliability and availability.

Thanks,
Atlassian Customer Support
Status: Postmortem
Impact: Major | Started At: Sept. 13, 2023, 2:08 p.m.
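
The root cause in the postmortem above describes retries multiplying load once requests started receiving HTTP 429 errors, and the remedial actions mention auditing client retry and backoff behavior. As an illustration only, here is a minimal Python sketch of one common mitigation: capped exponential backoff with full jitter that also waits at least as long as any server-provided retry hint. It is not Atlassian's implementation. The `send` callable, its `(status, retry_after)` return shape, and all parameter names and defaults are assumptions made for this example.

```python
import random
import time


def call_with_backoff(send, max_retries=4, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=None):
    """Call send() until it returns a non-429 status or retries run out.

    send is a hypothetical callable returning (status_code, retry_after),
    where retry_after is a number of seconds or None.
    Returns (final_status, total_attempts).
    """
    rng = rng or random.Random()
    status, retry_after = send()
    attempts = 1
    while status == 429 and attempts <= max_retries:
        # The ceiling doubles per retry (base, 2*base, 4*base, ...) up to cap.
        ceiling = min(cap, base * 2 ** (attempts - 1))
        # Full jitter: a random delay in [0, ceiling] spreads clients out
        # so they do not all retry at the same instant.
        delay = rng.uniform(0, ceiling)
        # If the server supplied a retry hint, never retry sooner than that.
        if retry_after is not None:
            delay = max(delay, retry_after)
        sleep(delay)
        status, retry_after = send()
        attempts += 1
    return status, attempts
```

Bounding the number of retries caps the worst-case load amplification at `max_retries + 1` requests per original call, and the jitter avoids synchronized retry waves. Passing `sleep` and `rng` as parameters keeps the function easy to exercise in tests without real waiting.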
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: Sept. 10, 2023, 6:37 p.m.
Description:

### **SUMMARY**

On August 30, 2023, between 4:07 and 5:30 UTC, some customers were unable to log in to Atlassian's Cloud products using [id.atlassian.com](http://id.atlassian.com). Logged-in users were also unable to switch accounts, change passwords, or log out. Users with existing sessions were not impacted. Between 5:32 and 6:00 UTC, traffic was incrementally restored to a previous build, mitigating the impact for users. The total time to resolution was one hour and 53 minutes.

### **IMPACT**

Users were not able to log in using Atlassian's shared account management system ([id.atlassian.com](http://id.atlassian.com)). This affected users who were trying to log in to the following products: Jira, Confluence, Trello, Opsgenie, mobile apps and ecosystem apps. Aside from the inability to log in, there was no impact on other Atlassian products or features.

### **ROOT CAUSE**

Multiple Set-Cookie headers were unintentionally modified so that only the last Set-Cookie header remained in the response to users' browsers. The issue was caused by a change to Network Extensions within the Edge Network. As a result, users who needed a new session could not log in. Upon login, the users were redirected to log in again and no session was created for them.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. While we have a number of testing and preventative processes in place, this specific issue was not detected in Atlassian's staging environment. End-to-end tests did not cover the use case of multiple Set-Cookie headers in a single response and therefore this bug went unnoticed. We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Automated tests to be put in place to validate that cookies are not being removed from responses.
* Configuration of networking extensions will be guaranteed to be identical in staging and production to ensure errors are picked up earlier.

Furthermore, we typically deploy our changes progressively by cloud region to avoid broad impact, but in this case, the change was not deemed risky and was deployed to all regions. To minimize the impact of breaking changes to our environments, we will implement additional preventative measures:

* Changes to network extensions in the future will use progressive rollouts.
* With staging being properly utilized, errors similar to this one will not be deployed to any production environments.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Thanks,
Atlassian Customer Support
Status: Postmortem
Impact: None | Started At: Aug. 30, 2023, 5:19 a.m.
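
The postmortem above says that only the last Set-Cookie header survived in responses, and that automated tests will validate that cookies are not removed. The Python sketch below illustrates what such a check could look like. The postmortem does not say how the headers were actually modified, so the dict-based `collapse_headers_buggy` transform is purely an assumed stand-in for the faulty behavior, and every function name and sample cookie here is hypothetical.

```python
def collapse_headers_buggy(headers):
    """Hypothetical faulty transform: building a dict keeps one value per
    header name, so only the last Set-Cookie value survives."""
    return list(dict(headers).items())


def passthrough(headers):
    """Well-behaved transform: forwards every header line unchanged."""
    return list(headers)


def set_cookie_values(headers):
    """Collect all Set-Cookie values, matching the name case-insensitively."""
    return [value for name, value in headers if name.lower() == "set-cookie"]


def cookies_preserved(transform, headers):
    """True if the transform keeps every Set-Cookie header, in order."""
    return set_cookie_values(transform(headers)) == set_cookie_values(headers)


# Sample origin response with two cookies (values are made up).
origin_headers = [
    ("Content-Type", "text/html"),
    ("Set-Cookie", "session=abc; Path=/; HttpOnly"),
    ("Set-Cookie", "csrf=def; Path=/"),
]
```

Representing headers as a list of `(name, value)` pairs rather than a mapping matters here: HTTP allows several Set-Cookie lines in one response, and any structure keyed by header name silently drops all but one of them. A check like `cookies_preserved(passthrough, origin_headers)` could run against staging before a network-layer change is rolled out.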