Outage and incident data over the last 30 days for Trello.
OutLogger tracks the status of these components for Trello:
Component | Status |
---|---|
API | Active |
Atlassian Support Knowledge Base | Active |
Atlassian Support - Support Portal | Active |
Atlassian Support Ticketing | Active |
Trello.com | Active |
View the latest incidents for Trello and check for official updates:
Description: This incident has been resolved.
Status: Resolved
Impact: None | Started At: April 27, 2023, 4:53 p.m.
Description:

### **SUMMARY**

On April 24, 2023, between 10:50 p.m. and 11:18 p.m. UTC, most Trello users experienced errors when trying to view or edit their boards and cards. The incident occurred during routine database maintenance that erroneously updated DNS records. It affected customers in all regions and on all devices, including web browsers, desktop apps, and mobile apps. Our automated monitoring systems detected the incident within three minutes, and we mitigated it by identifying and reverting the erroneous DNS changes. The total time to resolution was approximately 28 minutes.

### **IMPACT**

Trello experienced a service disruption lasting approximately 28 minutes that affected a large share of active users during the outage window. During this time, key actions such as loading boards and cards frequently failed. Some boards and cards may have loaded successfully, but for most users the application failed to load and was unusable.

### **ROOT CAUSE**

During database maintenance, DNS records for two database servers were erroneously updated to point to new servers that were not yet ready for service. As a result, database queries sent to those hostnames failed.

The database is designed with redundancy and should quickly and automatically fail over to a healthy server; we test this behavior regularly. In this instance, however, the replica set itself continued to operate normally among its participating nodes, so the normal failover process never triggered. Meanwhile, the erroneous DNS update directed services that query this replica set to the newly added servers, which held no data, instead of to the replica set. This partial failure state had not been tested before, and it led to a longer diagnosis and recovery time than expected.

It took approximately 3 minutes to detect the outage, 19 minutes to discover the root cause, 3 minutes to implement the fix, and 3 minutes for systems to recover, for a total of about 28 minutes.
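The partial failure described in the root cause can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Trello's code: it assumes replica set members judge health only by talking to each other directly, while application services reach the database through DNS names. The hostnames, addresses, `Server`, `ReplicaSet`, and the `client_probe` helper are all hypothetical. It shows why the cluster's own view reports nothing wrong (so no failover is triggered) while a probe that resolves names the way clients do catches the misrouting.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Server:
    address: str
    has_data: bool  # False for a newly provisioned server not yet ready for service


@dataclass
class ReplicaSet:
    # Assumption: members heartbeat each other by address, so a DNS change
    # does not affect how they see one another.
    members: list[Server]

    def needs_failover(self) -> bool:
        # The cluster's own view: fail over only if some member is unhealthy.
        return not all(m.has_data for m in self.members)


def client_probe(dns: dict[str, str], servers: dict[str, Server]) -> list[str]:
    """Resolve each database hostname the way an application service would and
    return the names that do not lead to a server holding data."""
    broken = []
    for name, address in dns.items():
        server = servers.get(address)
        if server is None or not server.has_data:
            broken.append(name)
    return broken


# Hypothetical setup: two healthy members and two new, empty servers.
old_a, old_b = Server("10.0.0.1", True), Server("10.0.0.2", True)
new_a, new_b = Server("10.0.1.1", False), Server("10.0.1.2", False)
servers = {s.address: s for s in (old_a, old_b, new_a, new_b)}
replica_set = ReplicaSet([old_a, old_b])

# Hypothetical erroneous maintenance update: both names now point at the empty servers.
dns = {"db-1.internal": "10.0.1.1", "db-2.internal": "10.0.1.2"}

print(replica_set.needs_failover())  # False: the members still see each other as healthy
print(client_probe(dns, servers))    # ['db-1.internal', 'db-2.internal']
```

The point of the sketch is the gap between the two checks: a health signal computed inside the cluster cannot see a fault that lives in the path clients use to reach it.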
### **REMEDIAL ACTION PLAN & NEXT STEPS**

We know that outages affect your productivity, and we strive to avoid incidents like this one. We are prioritizing the following efforts to avoid repeating this type of incident:

* Create additional safety checks for DNS record changes in our infrastructure management systems. These checks have been developed, tested, and deployed.
* Research and test methods for improving automatic database failover during this partial failure state.

We apologize to customers who were affected during this incident. We are taking these immediate steps to improve the platform’s availability.

Thanks,
The Trello team
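The first remedial action, additional safety checks for DNS record changes, could take a form like the following. This is a hypothetical sketch only: the postmortem does not describe how Trello's checks work, and the `DnsChange` type, the `is_ready` readiness callback, and the rule that a record may only be repointed at a server reporting itself ready are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class DnsChange:
    name: str        # record being updated (hypothetical hostname)
    old_target: str  # address the record currently points to
    new_target: str  # address the change would point it to


class UnsafeDnsChange(Exception):
    """Raised when a proposed DNS change fails a safety check."""


def validate_changes(changes: Iterable[DnsChange], is_ready: Callable[[str], bool]) -> None:
    """Refuse a batch of DNS changes if any record would be repointed at a
    server that does not report itself ready for service (assumed rule)."""
    for change in changes:
        if change.new_target == change.old_target:
            continue  # not actually a change
        if not is_ready(change.new_target):
            raise UnsafeDnsChange(
                f"{change.name} -> {change.new_target}: target not ready for service"
            )


# Hypothetical readiness data: the new server has not been loaded with data yet.
readiness = {"10.0.0.1": True, "10.0.1.1": False}
proposed = [DnsChange("db-1.internal", "10.0.0.1", "10.0.1.1")]

try:
    validate_changes(proposed, lambda address: readiness.get(address, False))
except UnsafeDnsChange as err:
    print(f"blocked: {err}")  # blocked: db-1.internal -> 10.0.1.1: target not ready for service
```

Treating unknown addresses as not ready (the `False` default in the lookup) makes the check fail closed, which is usually the safer choice for a guard that runs before infrastructure changes.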
Status: Postmortem
Impact: Major | Started At: April 24, 2023, 11:23 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: March 30, 2023, 6:35 p.m.