Outage and incident data over the last 30 days for RebelMouse.
OutLogger tracks the status of these components for RebelMouse:
Component | Status |
---|---|
AWS ec2-us-east-1 | Active |
AWS elb-us-east-1 | Active |
AWS RDS | Active |
AWS route53 | Active |
AWS s3-us-standard | Active |
AWS ses-us-east-1 | Active |
Braintree API | Active |
Braintree PayPal Processing | Active |
CDN | Active |
Celery | Active |
Content Delivery API | Active |
Discovery | Active |
EKS Cluster | Active |
Fastly Amsterdam (AMS) | Active |
Fastly Hong Kong (HKG) | Active |
Fastly London (LHR) | Active |
Fastly Los Angeles (LAX) | Active |
Fastly New York (JFK) | Active |
Fastly Sydney (SYD) | Active |
Full Platform | Active |
Google Apps Analytics | Active |
Logged In Users | Active |
Media | Active |
Mongo Cluster | Active |
Pharos | Active |
RabbitMQ | Active |
Redis Cluster | Active |
Sentry Dashboard | Active |
Stats | Active |
Talaria | Active |
WFE | Active |
View the latest incidents for RebelMouse and check for official updates:
Description:

## **Chronology of the incident**

- Feb 8, 2024, 4:20 PM EST – An increase in error rate was observed.
- Feb 8, 2024, 4:25 PM EST – Monitoring systems detected anomalies, prompting the RebelMouse team to initiate an investigation.
- Feb 8, 2024, 5:00 PM EST – Error rates surged significantly.
- Feb 8, 2024, 5:16 PM EST – The RebelMouse team officially categorized the incident as Major and communicated it through the Status Portal.
- Feb 8, 2024, 5:30 PM EST – The root cause was pinpointed: new instances could not be launched within the EKS cluster.
- Feb 8, 2024, 6:00 PM EST – The RebelMouse team mitigated the issue by updating the network configuration and manually launching the required instances to restore system performance.
- Feb 8, 2024, 8:51 PM EST – RebelMouse initiated a support request with AWS regarding the services outage.
- Feb 8, 2024, 9:10 PM EST – Systems reconfiguration was completed, and the team entered monitoring mode.
- Feb 8, 2024, 10:10 PM EST – The incident was officially resolved.
- Feb 10, 2024, 2:30 AM EST – AWS confirmed an issue with the EKS service in the us-east-1 region during the specified period; services have been restored.

## **The impact of the incident**

Multiple key services hosted in the AWS us-east-1 region for RebelMouse were impacted, leading to partial unavailability.

## **The underlying cause**

The root cause was identified as a networking issue within AWS, specifically affecting the EKS service in the us-east-1 region. AWS acknowledged the issue, and its team actively worked on resolving it.

## **Actions taken**

RebelMouse engineering teams were engaged as soon as the problem was identified. They worked to resolve the issue as quickly as possible while keeping customers updated on the situation.

## **Preventive Measures**

We have recognized the importance of strengthening our strategies for handling networking issues of this kind. Going forward, we will mitigate such outages by implementing more extensive caching and increasing our redundant caching capacity.
Status: Postmortem
Impact: Minor | Started At: Feb. 8, 2024, 10:16 p.m.
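The failure mode described above, an EKS cluster unable to launch new instances, is typically surfaced through node-group health checks. The following Python sketch is illustrative only: it uses boto3 to list EKS node groups in us-east-1 and print any reported health issues, and the cluster name is a placeholder rather than RebelMouse's actual infrastructure or tooling.

```python
# Hypothetical check: report EKS node-group health issues in us-east-1.
# The cluster name is a placeholder, not RebelMouse's real cluster.
import boto3

eks = boto3.client("eks", region_name="us-east-1")
cluster = "example-production-cluster"  # placeholder name

for ng in eks.list_nodegroups(clusterName=cluster)["nodegroups"]:
    detail = eks.describe_nodegroup(clusterName=cluster, nodegroupName=ng)["nodegroup"]
    issues = detail.get("health", {}).get("issues", [])
    if issues:
        for issue in issues:
            # Issue codes such as AsgInstanceLaunchFailures indicate that new
            # instances are failing to join the cluster.
            print(f"{ng}: {issue['code']} - {issue['message']}")
    else:
        print(f"{ng}: healthy ({detail['status']})")
```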
Description:

Chronology of the incident (EST timezone):

- 06:46 AM – Deployment to production took place.
- 09:35 AM – The Service Delivery team received the bug report.
- 12:53 PM – The RebelMouse tech team started a rollback procedure, involving the CTO and the Director of IT Operations.
- 01:10 PM – RebelMouse received reports of multiple ongoing issues on some pages, such as missing styles or text.
- 03:47 PM – The release was reverted from production.

The impact of the incident: Page rendering issues on pages using the sections intersection feature.

The underlying cause: The problem was caused by an application release to the website rendering and routing service that introduced a change to our routing system. This change was needed to make it possible to implement new routing features, such as wildcard redirects from the redirects dashboard, which were impossible in the previous implementation.

Actions taken:

- Initiated a meeting between the Development and QA teams to thoroughly review the incident, classify it, and identify the tests needed to prevent similar issues in the future.
- Classified the incident as not Major.
- Updated the regression test suite with new tests specifically covering the sections intersection functionality, ensuring comprehensive testing going forward.
- Implemented a fix for the bug introduced in the initial release, restoring the intended behavior of the sections intersection functionality.

Preventive Measures:

- By Feb 9 – Conduct a comprehensive review of our custom functionalities to identify potential points of vulnerability.
- By Feb 16 – Implement additional checks in the QA phase to catch nuanced issues in custom functionalities, and strengthen collaboration between the Development and QA teams to improve test coverage for less commonly used features.
Status: Resolved
Impact: None | Started At: Feb. 1, 2024, 11:30 a.m.
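The routing change at the center of this incident was intended to enable wildcard redirects. Purely as an illustrative sketch, not RebelMouse's actual routing code, wildcard redirect resolution can be expressed roughly as follows; the rule format and function name are hypothetical.

```python
# Hypothetical sketch of wildcard redirect matching; the rules and the
# resolve_redirect() helper are illustrative placeholders.
from fnmatch import fnmatch

REDIRECT_RULES = [
    # (source pattern, target template): "*" captures the trailing path
    ("/old-section/*", "/new-section/{rest}"),
    ("/2023/promo/*", "/promotions/{rest}"),
]

def resolve_redirect(path: str) -> str | None:
    """Return the redirect target for `path`, or None if no rule matches."""
    for pattern, target in REDIRECT_RULES:
        if fnmatch(path, pattern):
            prefix = pattern.rstrip("*")
            rest = path[len(prefix):]
            return target.format(rest=rest)
    return None

# Example: /old-section/story-slug -> /new-section/story-slug
print(resolve_redirect("/old-section/story-slug"))
```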
Description: On November 7, 2023, from 2:43 PM EST to 2:55 PM EST, an incident occurred, resulting in an outage of our editorial tools. The incident was caused by a combination of two unexpected factors:

1. Celery queue name update: In a recent software release, the name of one of the Celery queues was unintentionally changed. As a result, processing for that queue moved from a dedicated instance to a default instance that handles several other queues, and the queue was inadvertently excluded from our monitoring rules. The backlog of unprocessed tasks grew rapidly as incoming tasks exceeded processing capacity. These unprocessed tasks were stored in Redis memory, leading to memory exhaustion and the onset of swap usage.

2. Script overload: Concurrently, a routine development script was executed by one of our developers against the same Redis infrastructure. This increased the load on the already strained Redis service, exacerbating the problem.

In response to the incident, our team took the following immediate actions:

* Stopped the developer's script to reduce the load on Redis.
* Deployed several powerful instances to handle the backlog of unprocessed tasks.
* Updated the Celery queue name to its correct configuration to prevent a recurrence of this issue.
Status: Postmortem
Impact: None | Started At: Nov. 8, 2023, 12:29 p.m.
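A queue-name change of the kind described above typically lives in Celery's task routing configuration, and the resulting backlog shows up as a growing Redis list. The sketch below is a minimal, hypothetical example; the task name, queue name, broker URL, and threshold are placeholders, not RebelMouse's configuration.

```python
# Hypothetical sketch: pin a task to a dedicated Celery queue and watch its
# backlog in Redis. Names and thresholds are placeholders.
import redis
from celery import Celery

app = Celery("editorial", broker="redis://localhost:6379/0")

# Explicit routing: renaming the queue here without updating the workers and
# the monitoring rules is the kind of drift that caused the backlog above.
app.conf.task_routes = {
    "editorial.tasks.render_post": {"queue": "editorial_render"},
}

def queue_backlog(queue_name: str, threshold: int = 1000) -> bool:
    """Return True if the Redis-backed Celery queue exceeds `threshold` tasks."""
    r = redis.Redis(host="localhost", port=6379, db=0)
    backlog = r.llen(queue_name)  # Celery stores each queue as a Redis list
    print(f"{queue_name}: {backlog} pending tasks")
    return backlog > threshold

if __name__ == "__main__":
    if queue_backlog("editorial_render"):
        print("ALERT: task backlog above threshold")
```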