Is there a Trello outage?

Trello status: Systems Active

Last checked: 37 seconds ago

Get notified about any outages, downtime or incidents for Trello and 1800+ other cloud vendors. Monitor 10 companies, for free.

Subscribe for updates

Trello outages and incidents

Outage and incident data over the last 30 days for Trello.

There have been 0 outages or incidents for Trello in the last 30 days.

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Trello

OutLogger tracks the status of these components for Trello:

Component Status
API Active
Atlassian Support Knowledge Base Active
Atlassian Support - Support Portal Active
Atlassian Support Ticketing Active
Trello.com Active

Latest Trello outages and incidents.

View the latest incidents for Trello and check for official updates:

Updates:

  • Time: Dec. 8, 2023, 4:03 p.m.
    Status: Postmortem
    Update:

    ### **SUMMARY**

    On Nov 30 2023, between 14:04 and 16:57 UTC, Atlassian customers using Trello experienced errors when accessing and interacting with the application. This incident impacted Trello users on the iOS and Android mobile apps as well as those using the Trello web app. The event was triggered by the release of a code change that eventually overloaded a critical part of the Trello database. The incident was detected immediately by our automated monitoring systems and was mitigated by disabling the relevant code change.

    The issue was extended by the failure of a secondary service whose recovery caused an increase in load on the same critical part of the Trello database, which created a negative feedback loop. This secondary service recovery involved reestablishing over a million connections, with each connection attempt adding load to the same part of the Trello database. We attempted to aid the service recovery by intentionally blocking some of the inbound Trello traffic to reduce load on the database and by increasing the capacity of the Trello database to better handle the high load. Over time the connections were all successfully reestablished, which returned Trello to a known good state. The total time to resolution was just under 3 hours.

    ### **IMPACT**

    The overall impact was between Nov 30 2023, 14:04 UTC and Nov 30 2023, 16:57 UTC on the Trello product. The incident caused service disruption to all Trello customers. Our metrics show there were elevated API response times and increased error rates through the entire incident period, which indicates that most users were unable to load Trello at all or easily interact with the application in any way. The particular database collection that was overloaded was one that is necessary for the Trello service to make authorization decisions, which meant that all requests were impacted.

    ### **ROOT CAUSE**

    The issue was caused by a series of changes that were intended to standardize Trello’s approach to authorizing requests but had the unintended side effect of modifying a database query from a [targeted operation](https://www.mongodb.com/docs/manual/core/sharded-cluster-query-router/#std-label-sharding-mongos-targeted) to a [broadcast operation](https://www.mongodb.com/docs/manual/core/sharded-cluster-query-router/#std-label-sharding-mongos-broadcast). Broadcast operations are more resource-intensive as they must be sent to _all_ database servers to be satisfied. These broadcast operations eventually overloaded some of the Trello database servers as Trello approached its daily peak usage period on Nov 30 2023.

    1. The first change of this type was deployed over a period of seven days at the end of August and changed the authorization type used by our websocket service. This meant that newly established websocket connections required this new broadcast query. At any given moment, we have a great deal of _established_ websocket connections, but the usual rate of _new_ websocket connections is relatively low. Therefore, our monitoring systems only detected a slight increase in resource usage and flagged this change as a low-priority performance regression. We acknowledged the regression and created a task to identify and reduce the resource demands of these new queries.
    2. The second change of this type was deployed over the course of a few days before being fully rolled out on Nov 29, 2023, the day before this incident. This change caused the Trello application server to use the new broadcast query while authorizing standard web browser traffic, which is the vast majority of our traffic. The change was fully deployed at 19:34 UTC on Nov 29, which was during a low traffic period. The next day, as the application approached its daily peak traffic period, our monitoring on the database servers indicated they were overloaded.

    When these database nodes were overloaded, users' HTTP requests received very slow responses or HTTP 504 errors. As we activated our load shedding strategies, some users received HTTP 429 errors.

    The incident’s length can be attributed to a secondary failure where our websocket servers experienced a rapid increase in memory leading to processes crashing with OutOfMemoryErrors. As new servers came online and the websockets attempted to reconnect, they once again generated the broadcast queries on the Trello database servers. These broadcast queries continued to put load on the database, which meant the Trello API continued to have high latency, thus perpetuating the negative feedback loop. We are working to determine the root cause of the OutOfMemoryErrors.

    We also determined after the incident that due to the Trello application server making the load shedding decision AFTER performing the authorization step, the overloaded database servers were still being queried before the request was rejected. We are working to improve our load shedding strategies post incident.

    ### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

    We know that outages impact your productivity and we are continually working to improve our testing and preventative processes to prevent similar outages in the future. We are prioritizing the following improvement actions to avoid repeating this type of incident:

    * Increase the capacity of our database (completed during the incident).
      * This action is the most critical and is aimed at preventing a recurrence of this particular incident and at recovering gracefully if the websocket service were to fail again.
    * Refactor the new authorization approach to avoid [broadcast operations](https://www.mongodb.com/docs/manual/core/sharded-cluster-query-router/#std-label-sharding-mongos-broadcast).
    * Add pre-deployment tests to avoid releasing unnecessary broadcast operations.
    * Determine the root cause of the secondary failure of the websocket service.

    Furthermore, we deploy our changes only after thorough review and automated testing, and we deploy them progressively using feature flags to avoid broad impact. To minimize the impact of breaking changes to our environments, we will implement additional preventative measures:

    * Ensure that our load-shedding strategies fail fast.
    * Add monitoring to observe [broadcast operations](https://www.mongodb.com/docs/manual/core/sharded-cluster-query-router/#std-label-sharding-mongos-broadcast) in all our environments.

    We apologize to customers who were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

    Thanks,
    Atlassian Customer Support
  • Time: Nov. 30, 2023, 5:37 p.m.
    Status: Resolved
    Update: The fix we released was successful, and all the issues our users were experiencing with Trello have been resolved. Thank you for your patience and understanding!
  • Time: Nov. 30, 2023, 5:14 p.m.
    Status: Monitoring
    Update: We've implemented a new fix to the issue affecting our database, and we're now seeing signs of recovery for all users on Trello. We'll keep monitoring this latest fix for now. Once again, we appreciate your understanding!
  • Time: Nov. 30, 2023, 4:35 p.m.
    Status: Identified
    Update: Our engineering team has identified an issue with Trello's database, and we're now working on implementing a fix to restore Trello's availability to all users. We appreciate everyone's understanding!
  • Time: Nov. 30, 2023, 3:56 p.m.
    Status: Investigating
    Update: Our teams are still investigating the issue affecting Trello's availability, and a new update will be provided soon. We appreciate your understanding!
  • Time: Nov. 30, 2023, 3:22 p.m.
    Status: Investigating
    Update: Our team is still investigating the issue that's affecting Trello's availability, and we'll provide a new update soon. Thanks for your patience and understanding!
  • Time: Nov. 30, 2023, 2:52 p.m.
    Status: Investigating
    Update: We're still investigating the root cause of this problem, and a new update will be shared soon! Thanks for your patience!
  • Time: Nov. 30, 2023, 2:21 p.m.
    Status: Investigating
    Update: We're currently investigating an issue that's causing Trello to be slow or unavailable to our users. A new update will be shared soon.
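
The Nov 30 postmortem attributes the outage to an authorization query that changed from a targeted operation to a broadcast operation. In a sharded MongoDB cluster, a query that includes the shard key can be routed to the shard that owns that key, while a query that omits it has to be sent to every shard. The sketch below is a toy model of that routing rule only. It does not use MongoDB's API, and the class name `ToyShardedCollection`, the shard key `member_id`, and the `token` field are hypothetical names chosen for illustration; the postmortem does not say which collection or fields were involved.

```python
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class ToyShardedCollection:
    """Toy model of query routing in a sharded collection (not MongoDB's API)."""

    shard_key: str
    num_shards: int = 4
    shards: list = field(default_factory=list)

    def __post_init__(self):
        # One in-memory list of documents per shard.
        self.shards = [[] for _ in range(self.num_shards)]

    def _shard_for(self, key_value) -> int:
        # Deterministic hash: a given shard-key value always maps to one shard.
        return crc32(str(key_value).encode()) % self.num_shards

    def insert(self, doc: dict) -> None:
        self.shards[self._shard_for(doc[self.shard_key])].append(doc)

    def find(self, query: dict):
        """Return (matching documents, number of shards contacted)."""
        if self.shard_key in query:
            # Targeted: the shard key tells the router which shard to ask.
            targets = [self._shard_for(query[self.shard_key])]
        else:
            # Broadcast: without the shard key, every shard must be asked.
            targets = range(self.num_shards)
        matches = [
            doc
            for i in targets
            for doc in self.shards[i]
            if all(doc.get(k) == v for k, v in query.items())
        ]
        return matches, len(targets)


# Hypothetical data: 100 documents sharded on "member_id".
tokens = ToyShardedCollection(shard_key="member_id")
for n in range(100):
    tokens.insert({"member_id": n, "token": f"t{n}"})

_, contacted = tokens.find({"member_id": 42})
print(contacted)  # 1 (targeted)

_, contacted = tokens.find({"token": "t42"})
print(contacted)  # 4 (broadcast)
```

Both queries return the same document, but the second one costs work on every shard. That is consistent with the postmortem's account: the regression looked minor while only new websocket connections issued the broadcast query, and became an overload once standard web traffic did as well.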

Updates:

  • Time: Nov. 29, 2023, 5:58 a.m.
    Status: Resolved
    Update: Forge Invocations had an 8-minute outage between 2023-11-29 03:05:13 UTC and 2023-11-29 03:13:27 UTC, which caused Smart Links to fail. The service has since recovered.

Updates:

  • Time: Nov. 16, 2023, 11:58 p.m.
    Status: Resolved
    Update: This incident has been resolved. If you're still seeing issues, please reach out at https://trello.com/contact/
  • Time: Nov. 16, 2023, 11:32 p.m.
    Status: Monitoring
    Update: Our Engineers have rolled out a fix and have seen services recovering. We're now monitoring it.
  • Time: Nov. 16, 2023, 10:52 p.m.
    Status: Identified
    Update: Our Engineers have identified the issue and are working on a fix. We have found the following services to be affected:
      - Inability to create new Automation commands
      - Card changes not saving
      - Some file uploads failing
  • Time: Nov. 16, 2023, 10:37 p.m.
    Status: Investigating
    Update: We are currently investigating an issue affecting some of Trello's services. We have identified that some users may experience trouble with the following services:
      - Automation commands not running
      - Card changes not saving

Check the status of similar companies and alternatives to Trello

Atlassian: Systems Active
Zoom: Systems Active
Dropbox: Systems Active
Miro: Systems Active
TeamViewer: Systems Active
Lucid Software: Systems Active
Restaurant365: Systems Active
Mural: Systems Active
Zenefits: Systems Active
Retool: Systems Active
Splashtop: Systems Active
Hiver: Systems Active

Frequently Asked Questions - Trello

Is there a Trello outage?
The current status of Trello is: Systems Active
Where can I find the official status page of Trello?
The official status page for Trello is here
How can I get notified if Trello is down or experiencing an outage?
To get notified of any status changes to Trello, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Trello every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Trello do?
Trello is a project management tool that enables teams to collaborate and automate tasks. It's easy to set up and accessible on mobile devices.