Last checked: 9 minutes ago
Get notified about any outages, downtime or incidents for Rollbar and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Rollbar.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Rollbar:
Component | Status |
---|---|
API Tier (api.rollbar.com) | Active |
Rollbar Docs | Active |
rollbar.min.js | Active |
SCIM and SSO | Active |
Web App (rollbar.com) | Active |
External notification services | Active |
Mailgun Outbound Delivery | Active |
Mailgun SMTP | Active |
Processing pipeline | Active |
Core Processing Pipeline | Active |
iOS Symbolication pipeline | Active |
Proguard processing pipeline | Active |
Source map symbolication pipeline | Active |
View the latest incidents for Rollbar and check for official updates:
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: April 12, 2024, 2:34 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: None | Started At: April 4, 2024, 7:43 p.m.
Description: **Summary of the Incident and Impact**

On March 25th, 2024, between 10:58 and 12:08 PDT, Rollbar experienced a platform latency increase affecting the Web Application ([rollbar.com](http://rollbar.com)) and Pipeline services. The cause can be traced to a combination of 2 releases that occurred in relatively quick succession. One release transitioned the package management for the Summarization service; the other was a code release containing a poorly optimized query that increased load on our database.

At 07:34 PDT on March 25th, a release of the Summarization service was completed using a new package management system. The release changed an IP address that was used to configure a DNS that connected to this service. As a result, requests timed out and page load latency increased on certain views of items in the Web Application.

At 10:08 PDT, a release was deployed to the Web Application and Pipeline services with a code change that resulted in a query that significantly increased disk IO on one of Rollbar’s main databases. Pipeline latency started to build as load increased on the server, and this further affected page load times on the Web Application.

Alerts triggered and drew engineers' attention as thresholds were breached at 10:21 PDT, but because these 2 issues were compounding to affect latency, it was not immediately clear what the problem was. The application was still usable but significantly slow for some customers. A series of reverts brought the system back to stability.

**Timeline**

* March 25 07:34 PDT - Summarization service was deployed using a new package manager
* 10:08 PDT - Changes to Rollbar’s Web Application and Pipeline were released with a poorly optimized database query
* 10:21 PDT - Alerts internal to Rollbar started to trigger as latency spiked in various places
* 10:58 PDT - General stability of the Web Application and Pipeline is affected, with some customers reporting slow loading or unreachable pages
* 11:26 PDT - The changes to the Web Application and Pipeline were reverted and deployed
* 12:08 PDT - The changes to the Summarization service were reverted and full stability was reached

**Follow-Up Actions**

To mitigate future risks and avoid similar incidents, we are undertaking the following actions:

* We are actively working on how we reconcile the IP addresses with our DNS for the Summarization service and looking to improve this process.
* We will hold a full internal postmortem on this event by April 5, 2024, and expect to identify further action items to improve our systems.
Status: Postmortem
Impact: Minor | Started At: March 25, 2024, 5:50 p.m.