Outage and incident data over the last 30 days for Jira Product Discovery.
OutLogger tracks the status of these components for Jira Product Discovery:
Component | Status |
---|---|
Add/view data points ("Capture") | Active |
Create and delete fields | Active |
Open projects | Active |
View the latest incidents for Jira Product Discovery and check for official updates:
Description:

### **SUMMARY**

Between September 18 2023 5:47 PM UTC and September 19 2023 04:15 AM UTC, customers using Confluence Cloud, Jira Software, Jira Service Management, Jira Work Management, Jira Product Discovery and Trello products with services hosted in the AWS us-east-1 and us-west-2 regions experienced slow performance and/or page load failures as a result of an AWS issue that began on September 18 2023, 4:00 PM UTC. This was triggered by an underlying networking fault in our cloud provider AWS, which affected multiple AWS services used by Atlassian in the us-west-2 and us-east-1 regions. The incident was detected within one minute by our monitoring systems. Recovery of affected Atlassian services occurred on a product-by-product basis, with full recovery for all products completed by September 19 2023 04:15 AM UTC.

### **IMPACT**

Product impact varied based on which regions and availability zones services are using, with services hosted in us-west-2 being affected more than services hosted in us-east-1. Product-specific impacts are listed below:

* **Jira Software -** A number of Jira nodes were affected with highly elevated error rates due to Jira databases in us-east-1 and us-west-2 being impacted. The impact varied, with some Jira nodes being unusable whilst others were in a usable but degraded state.
* **Jira Service Management -** Some users hosted in us-east-1 and us-west-2 experienced problems when creating issues through the Help Center, viewing issues, transitioning issues, posting comments and using queues.
* **Jira Work Management -** Users based in us-west-2 experienced minor service degradation.
* **Jira Product Discovery -** Users experienced some issues when loading insights.
* **Confluence Cloud -** Impact was limited to customers hosted in the us-west-2 region. During this time, users attempting to load Confluence pages experienced sporadic product degradation, including brief periods where Confluence was inaccessible, complete and partial page load failures, page timeouts, and increased request latency.
* **Trello -** Users had minimal service degradation; only 0.1% of Trello users had automation rules impacted.

### **ROOT CAUSE**

The root cause was an issue with the subsystem responsible for network mapping propagation within the Amazon Virtual Private Cloud in the us-east-1 (use1-az1) and us-west-2 (usw2-az1 and usw2-az2) regions, which impacted network connectivity for multiple AWS services that Atlassian products rely upon. There was a delay between the AWS incident and Atlassian being affected because existing compute instances and resources were not affected by the issue. However, any changes to networking state, such as scaling up with additional compute nodes, experienced delays in the propagation of network mappings. This led to network connectivity issues until these network mappings had been fully propagated. Other AWS services that create or modify networking resources also saw impact as a result of this issue. No relevant Atlassian-driven events in the lead-up have been identified as causing or contributing to this incident.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

To avoid repeating this type of incident, we are prioritizing documenting and evaluating ways to improve Availability Zone failure resiliency.

Thanks,
Atlassian Customer Support
Status: Postmortem
Impact: Minor | Started At: Sept. 19, 2023, 2:53 a.m.
Description: Between 04:00 UTC and 05:00 UTC, we experienced a partial outage for Bitbucket Pipelines, issues viewing in-product attachments, and degraded performance across all products. The issue has been resolved and all services are operating normally.
Status: Resolved
Impact: None | Started At: Sept. 15, 2023, 4:02 a.m.
Description:

### **SUMMARY**

On Sep 13, 2023, between 12:00 PM UTC and 03:30 PM UTC, some Atlassian users were unable to sign in to their accounts and use multiple Atlassian cloud products. The event was triggered by a misconfiguration of rate limits in an internal service, which caused a cascading failure in sign-in and signup-related APIs. The incident was quickly detected by multiple automated monitoring systems. The incident was mitigated on Sep 13, 2023, 03:30 PM UTC by the rollback of a feature and additional scaling of services, which put Atlassian systems into a known good state. The total time to resolution was about 3 hours and 30 minutes.

### **IMPACT**

The overall impact was between Sep 13, 2023, 12:00 PM UTC and Sep 13, 2023, 03:30 PM UTC on multiple products. The incident caused intermittent service disruption across all regions. Some users were unable to sign in for sessions. Other scenarios that temporarily failed were new user signups, profile retrieval, and password reset. During the incident we had a peak of 90% of requests failing across authentication, user profile retrieval, and password reset use cases.

### **ROOT CAUSE**

The issue was caused by a misconfiguration of a rate limit in an internal core service. As a result, some sign-in requests over the limit received HTTP 429 errors. However, retry behavior for requests multiplied the load, which led to higher service degradation. As many internal services depend on each other, the call graph complexity led to a longer time to detect the actual faulty service.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We are continuously improving our system's resiliency. We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Audit and improve service rate limits and client retry and backoff behavior.
* Improve scale and load test automation for complex service interactions.
* Audit cross-service dependencies related to sign-in flows and minimize them where possible.

Due to the unavailability of sign-in, some customers were unable to create support tickets. We are making additional process improvements to:

* Enable our unauthenticated support contact form and notify users that it should be used when standard channels are not available.
* Create status page notifications more quickly and ensure that for severe incidents, notifications to all subscribers are enabled.

We apologize to users who were impacted during this incident; we are taking immediate steps to improve the platform's reliability and availability.

Thanks,
Atlassian Customer Support
Status: Postmortem
Impact: Major | Started At: Sept. 13, 2023, 2:08 p.m.
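The Sep 13 postmortem above attributes the escalation to clients retrying rate-limited (HTTP 429) requests, and lists "client retry and backoff behavior" as a remediation item. The sketch below illustrates one common way a client can retry without multiplying load: capped exponential backoff with full jitter and a bounded number of attempts. It is not Atlassian's implementation; `RateLimited`, `backoff_delay`, `call_with_backoff`, and all default values are hypothetical names and numbers chosen for illustration.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a client raises when the server answers HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("HTTP 429 Too Many Requests")
        # Seconds the server asked us to wait, if it said so (assumed field).
        self.retry_after = retry_after


def backoff_delay(attempt: int, base: float, cap: float,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: pick a delay uniformly in [0, min(cap, base * 2**attempt)]."""
    upper = min(cap, base * (2 ** attempt))
    source = rng if rng is not None else random
    return source.uniform(0.0, upper)


def call_with_backoff(request: Callable[[], T],
                      max_attempts: int = 5,
                      base: float = 0.5,
                      cap: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Call `request`, retrying only on RateLimited, at most `max_attempts` times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited as err:
            # Give up after the final attempt instead of retrying forever.
            if attempt == max_attempts - 1:
                raise
            delay = backoff_delay(attempt, base, cap, rng)
            # Never retry sooner than the server asked us to.
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)
            sleep(delay)
    raise AssertionError("unreachable")
```

The jitter spreads retries from many clients over time so they do not arrive in synchronized waves, and the attempt limit bounds how much extra load a single failing request can generate. In a layered call graph, retrying at only one layer also helps, since retries at every layer multiply each other.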