Last checked: 4 minutes ago
Get notified about any outages, downtime or incidents for Opsgenie and 1800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for Opsgenie.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Opsgenie:
Component | Status |
---|---|
Public Website | Active |
**EU** | Active |
Alert Flow | Active |
Alert REST API | Active |
Configuration REST APIs | Active |
Email Notification Delivery | Active |
Heartbeat Monitoring | Active |
Heartbeat REST API | Active |
Incident Flow | Active |
Incident REST API | Active |
Incoming Call Routing | Active |
Incoming Email Service | Active |
Incoming Integration Flow | Active |
Logs | Active |
Mobile Application | Active |
Mobile Notification Delivery | Active |
Opsgenie Actions | Active |
Outgoing Integration Flow | Active |
Pricing & Billing | Active |
Reporting & Analytics | Active |
Signup, Login & Authorization | Active |
SMS Notification Delivery | Active |
Voice Notification Delivery | Active |
Web Application | Active |
**US** | Active |
Alert Flow | Active |
Alert REST API | Active |
Configuration REST APIs | Active |
Email Notification Delivery | Active |
Heartbeat Monitoring | Active |
Heartbeat REST API | Active |
Incident Flow | Active |
Incident REST API | Active |
Incoming Call Routing | Active |
Incoming Email Service | Active |
Incoming Integration Flow | Active |
Logs | Active |
Mobile Application | Active |
Mobile Notification Delivery | Active |
Opsgenie Actions | Active |
Outgoing Integration Flow | Active |
Pricing & Billing | Active |
Reporting & Analytics | Active |
Signup, Login & Authorization | Active |
SMS Notification Delivery | Active |
Voice Notification Delivery | Active |
Web Application | Active |
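The table above lists each region's Heartbeat REST API among the tracked components. As a rough, hedged sketch only (the API key and heartbeat name below are placeholders, and the endpoint shape should be confirmed against Opsgenie's official Heartbeat API documentation), a customer could independently check that this component is reachable by pinging one of their own heartbeats:

```python
# Hedged sketch: ping an Opsgenie heartbeat to confirm the Heartbeat REST API
# is reachable from your own infrastructure. The key and heartbeat name are
# placeholders; verify the endpoint against Opsgenie's Heartbeat API docs.
import requests

OPSGENIE_API_KEY = "your-genie-key"       # placeholder API key
HEARTBEAT_NAME = "nightly-backup-job"     # placeholder heartbeat name

resp = requests.get(
    f"https://api.opsgenie.com/v2/heartbeats/{HEARTBEAT_NAME}/ping",
    headers={"Authorization": f"GenieKey {OPSGENIE_API_KEY}"},
    timeout=10,
)

if resp.ok:
    print("Heartbeat REST API reachable; ping accepted.")
else:
    print(f"Ping failed: HTTP {resp.status_code} - {resp.text}")
```

EU-hosted accounts typically use a separate EU API base URL rather than the default host shown above; confirm the exact host in your account's API settings.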
View the latest incidents for Opsgenie and check for official updates:
Description:

### **SUMMARY**

On July 19, 2022, between 05:40 and 07:10 UTC, Atlassian customers in the EU region using Jira, Confluence and Opsgenie experienced problems loading pages through the web UI. The incident was automatically detected at 05:14 by one of Atlassian's automated monitoring systems. The main disruption was resolved within 16 minutes, with full recovery taking an additional 74 minutes.

### **IMPACT**

Between July 19, 2022, 05:40 UTC and July 19, 2022, 07:10 UTC, Jira, Confluence and Opsgenie users saw some web pages fail to load. During the 16-minute period from 06:40 UTC to 06:56 UTC, customers were unable to access the Jira, Confluence and Opsgenie web UIs because the Atlassian Proxy (the ingress point for service requests) was unable to service most requests.

### **ROOT CAUSE**

The issue was caused by an AWS-initiated change that degraded Elastic Block Store (EBS) volume performance to such an extent that new instance creation, and therefore auto scaling, was blocked. As a result, the products above, as well as essential internal Atlassian services, could not auto scale to meet the increasing incoming service requests as the EU region came online. Once the AWS change had been rolled back, most Atlassian services recovered. Some internal services required manual scaling because unhealthy nodes prevented scaling from initiating, which prolonged complete recovery.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity and we apologize to customers whose services were impacted during this incident. We see two main avenues to increase our resiliency during an incident where AWS auto scaling is blocked:

* Implement step scaling: simple scaling works well in most cases. In this case, because nodes became unhealthy, simple scaling stopped responding to scaling alarms, so the service can become "stuck" and will not recover once scaling is possible again. We are exploring the use of step scaling, as this will allow scaling even when instances become unhealthy.
* Implement improved alarming to identify "stuck" scaling and improve the time to recovery (TTR) when scaling is available again.

We are taking these immediate steps to improve the platform's resiliency.

Thanks, Atlassian
Status: Postmortem
Impact: None | Started At: July 19, 2022, 8:43 a.m.
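The remediation plan above hinges on moving from simple scaling to step scaling so an Auto Scaling group keeps reacting to alarms even while some instances are unhealthy or a scaling activity is in progress. The sketch below is illustrative only: the group name, thresholds and alarm are assumptions, not Atlassian's actual configuration; it uses boto3's standard Auto Scaling and CloudWatch calls.

```python
# Hedged sketch of a step scaling policy, assuming boto3 and an existing
# Auto Scaling group named "app-asg" (group name and thresholds are
# illustrative, not Atlassian's real setup).
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Step scaling: each band adds a different number of instances. Bounds are
# offsets from the alarm threshold (70% CPU), so 70-90% adds 1 instance and
# >=90% adds 3. Unlike simple scaling, a step policy keeps responding to the
# alarm even while earlier scaling activity or health-check replacement is
# still in progress.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="app-asg",
    PolicyName="cpu-step-scale-out",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    EstimatedInstanceWarmup=300,
    StepAdjustments=[
        {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 1},
        {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 3},
    ],
)

# CloudWatch alarm that drives the step policy. A second alarm on prolonged
# high load with no successful scaling activity is the kind of "stuck scaling"
# signal the postmortem mentions, though its exact metric is not specified there.
cloudwatch.put_metric_alarm(
    AlarmName="app-asg-cpu-high",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "app-asg"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=70,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```

The design point is that a step policy evaluates the size of the alarm breach directly and can keep issuing adjustments while earlier activity is underway, which addresses the "stuck" behaviour the postmortem describes for simple scaling.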
Description: The problem has been resolved and the services are operating normally! Opsgenie faced partial outages due to a minor update by the cloud provider, and the team worked with the cloud provider to resolve the incident in a timely manner. Only 15% of total requests and 4.1% of customers were affected by the incident. We will take the necessary actions to prevent a similar incident.
Status: Resolved
Impact: Major | Started At: July 19, 2022, 6:01 a.m.
Description: The problem has been resolved and the services are operating normally! We will take the necessary actions to prevent a similar incident, and we apologize for the inconvenience while we worked on this.
Status: Resolved
Impact: Major | Started At: June 30, 2022, 9:15 p.m.