Get notified about any outages, downtime or incidents for Guard and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Guard.
OutLogger tracks the status of these components for Guard:
Component | Status |
---|---|
Account Management | Active |
API tokens | Active |
Audit Logs | Active |
Domain Claims | Active |
SAML-based SSO | Active |
Signup | Active |
User Provisioning | Active |
Guard Premium | Active |
Data Classification | Active |
Data Security Policies | Active |
Guard Detect | Active |
View the latest incidents for Guard and check for official updates:
Description: Impact on User Management Syncs has been mitigated.
Status: Resolved
Impact: None | Started At: Feb. 12, 2021, 2:18 a.m.
Description:

### **SUMMARY**

From Feb 10th, 2021, at 3:15 AM UTC to Feb 11th at 12:23 AM UTC, a subset of Atlassian customers using Trello, Jira, Opsgenie, Access, and Confluence products were unable to log in. The event was caused by a faulty change in Atlassian Access that was deployed to production. The changes included Atlassian Access verifying domains and claiming accounts associated with organizations, even though those organizations did not initiate the domain verifications or account claims. However, this did not have any impact on customer privacy. This impacted customers in all regions.

When a scheduled job executed, the faulty change was activated and the incident was triggered. The incident was detected after 118 minutes by customer support and mitigated by rolling back the faulty change and by progressively setting affected domains and accounts to a good state. The total time to resolution was about 21 hours and 8 minutes. The impact on the products affected is listed below.

### **IMPACT**

The product-specific impact is between Feb 10th, 2021, 3:15 AM UTC and Feb 11th, 12:23 AM UTC.

### **Atlassian Access**

* Some [domains were verified and user accounts of the domains were claimed](https://support.atlassian.com/user-management/docs/verify-a-domain-to-manage-accounts) without admin consent. The accounts associated with these domains became managed accounts, but this did not have any impact on customer privacy.
* The users of such accounts received an email stating that their account was now being managed by their organization.

### **Confluence, Trello, Jira, Opsgenie**

* A subset of users were unable to log in to the products during this time.

### **ROOT CAUSE**

The issue was caused by a faulty background job in Atlassian Access, which was periodically executed to verify domain ownership, verify domains, and claim accounts for the domain. This resulted in some end-user accounts being locked out. As a result, the products called out above did not allow login to those end users, and the users received login failure messages. The faulty change was in one of the key services of our system, which had an impact on downstream systems including the products mentioned above. Determining a good state took longer than anticipated.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. We deploy our changes progressively (by cloud region) to avoid broad impact. However, in this case our detection of the domain verification and accounts claim did not work as expected. Moving forward, to minimize the impact of breaking changes to our environments, we will implement preventative measures such as the ones listed below.

**Prevention and Detection**

* While we have very good coverage on testing of the affected service with the faulty change, additional use cases are being identified and tests are being added. These additional tests would help us verify the changes at various stages of deployment.
* We are improving our process of deployment of the affected service to increase confidence in our deployments by taking some steps such as:
  * Progressive rollouts to production.
  * Increased level of scrutiny on changes to be deployed to sensitive services.

**Restoration Time**

* We are improving our end-to-end processes for recovering from such incidents and to reduce the outage/degradation time by:
  * Introducing runbooks for identifying impact quickly and restoring the data to a good state.
  * Investigating the architecture between our Access and Identity systems to identify quick recovery opportunities.
* We will be conducting a review of our architecture to identify any opportunities for faster recovery under such circumstances.

We have identified multiple improvement actions across the affected products to improve resiliency on failures. At the time of writing, we are in the process of implementing some of these.

We apologize to customers who were impacted during this incident; we are taking immediate steps to improve the reliability of the domain verification and accounts claim services.

Thanks,
Atlassian Customer Support
Status: Postmortem
Impact: Major | Started At: Feb. 10, 2021, 7:33 p.m.
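The timing figures quoted in the postmortem can be cross-checked with a few lines of Python. This is only a sketch for readers who want to verify the arithmetic: the start and end timestamps and the 118-minute detection delay come from the postmortem text, while the variable names and the helper `hours_minutes` are illustrative and not part of any Atlassian or OutLogger tooling.

```python
from datetime import datetime, timedelta, timezone

# Incident window as stated in the postmortem (both times in UTC).
incident_start = datetime(2021, 2, 10, 3, 15, tzinfo=timezone.utc)
incident_end = datetime(2021, 2, 11, 0, 23, tzinfo=timezone.utc)

# Detection delay as stated in the postmortem.
detection_delay = timedelta(minutes=118)


def hours_minutes(delta: timedelta) -> tuple[int, int]:
    """Split a duration into whole hours and remaining minutes."""
    total_minutes = int(delta.total_seconds() // 60)
    return divmod(total_minutes, 60)


time_to_resolution = hours_minutes(incident_end - incident_start)
detected_at = incident_start + detection_delay

print("Time to resolution: %d h %d min" % time_to_resolution)
print("Detected at:", detected_at.isoformat())
```

Under these inputs the window works out to 21 hours and 8 minutes, which matches the figure in the summary, and the detection time falls at 05:13 UTC on Feb 10th.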
Description: Same postmortem as published for the Feb. 10, 2021, 7:33 p.m. incident above.
Status: Postmortem
Impact: Minor | Started At: Feb. 10, 2021, 5:59 a.m.