Outage and incident data over the last 30 days for RebelMouse.
OutLogger tracks the status of these components for RebelMouse:
Component | Status
---|---
AWS ec2-us-east-1 | Active |
AWS elb-us-east-1 | Active |
AWS RDS | Active |
AWS route53 | Active |
AWS s3-us-standard | Active |
AWS ses-us-east-1 | Active |
Braintree API | Active |
Braintree PayPal Processing | Active |
CDN | Active |
Celery | Active |
Content Delivery API | Active |
Discovery | Active |
EKS Cluster | Active |
Fastly Amsterdam (AMS) | Active |
Fastly Hong Kong (HKG) | Active |
Fastly London (LHR) | Active |
Fastly Los Angeles (LAX) | Active |
Fastly New York (JFK) | Active |
Fastly Sydney (SYD) | Active |
Full Platform | Active |
Google Apps Analytics | Active |
Logged In Users | Active |
Media | Active |
Mongo Cluster | Active |
Pharos | Active |
RabbitMQ | Active |
Redis Cluster | Active |
Sentry Dashboard | Active |
Stats | Active |
Talaria | Active |
WFE | Active |
View the latest incidents for RebelMouse and check for official updates:
Description: **Incident During Application Deployment** During a regular application deployment, our team encountered two critical issues that affected the functionality of our application.
1. During the deployment, we introduced a new field to the post model. After the deployment, we observed that some posts were not loading, which required an immediate update of the post cache version to accommodate the change to the post model. We promptly updated the post cache version to align with the modified model, restoring the functionality of the affected posts. We are reviewing our process as a team to ensure that no deployment adding new fields or changing data storage goes out without a clear understanding of how to avoid this kind of incident.
2. Recovery was slower than it should have been because a network connection error occurred during the deployment, preventing code from being deployed to one of our clusters. This left us unable to restart the Celery processes even though all other clusters were ready. As a result, we had to restart the deployment and wait for it to complete, causing an unexpected delay.
**Immediate Actions Taken:** For the post model field addition, we acted swiftly by updating the post cache version to ensure compatibility with the modified model.
**Mitigation and Preventive Measures:** Based on the incident analysis, we have already added a test that blocks deployment if the post model is changed without a corresponding post cache version update. The network issue occurred randomly and, regrettably, coincided with the ongoing deployment process.
Status: Postmortem
Impact: Minor | Started At: Sept. 21, 2023, 11:08 a.m.
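The preventive measure described in this postmortem, a test that blocks deployment when the post model changes without a post cache version bump, could look roughly like the sketch below. This is only an illustration of the idea; the field list, constants, and function names are hypothetical and not taken from RebelMouse's codebase.

```python
# Hypothetical pre-deploy check (not RebelMouse's actual code): fail the
# pipeline when the post model's schema changes but the post cache version
# was not bumped alongside it.
import hashlib
import json

# Placeholder stand-ins for the real model definition and cache version.
POST_MODEL_FIELDS = ["id", "title", "body", "author", "new_field"]
POST_CACHE_VERSION = 7

# Fingerprint and version recorded the last time the cache version was bumped.
KNOWN_SCHEMA_HASH = hashlib.sha256(
    json.dumps(sorted(["id", "title", "body", "author"])).encode()
).hexdigest()
KNOWN_CACHE_VERSION = 7


def schema_hash(fields):
    """Stable fingerprint of the post model's field list."""
    return hashlib.sha256(json.dumps(sorted(fields)).encode()).hexdigest()


def test_cache_version_bumped_when_post_model_changes():
    if schema_hash(POST_MODEL_FIELDS) != KNOWN_SCHEMA_HASH:
        # The model changed, so the cache version must move too; otherwise
        # posts cached under the old schema would fail to load after deploy.
        assert POST_CACHE_VERSION > KNOWN_CACHE_VERSION, (
            "Post model changed without bumping POST_CACHE_VERSION; "
            "blocking deploy to avoid serving stale cached posts."
        )
```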
Description: This incident resulted in an interruption of our editorial tools and impacted our editors twice, from 7:15 AM EST to 7:45 AM EST and from 8:35 AM EST to 9:40 AM EST.
**Incident Summary:** The incident during the Redis upgrade stemmed from two primary issues:
1. **Failover Switch Configuration**: During the switch to the new Redis version, most of the instances did not automatically transition to the master role, requiring manual intervention. This unexpected behavior occurred despite successful testing in our staging environment, and the failover errors proved unpredictable. Fortunately, our DevOps team quickly identified the problem, and we promptly resolved it.
2. **Backend Application Restart**: The second service outage was attributed to our backend application not restarting correctly, resulting in the continued use of old Redis endpoints. This was traced back to our deployment script, which was not set to restart the application when no changes were made. Identifying the root cause took some time, but once found, it was resolved promptly.
**Impact**: The incident led to the unavailability of our editorial tools. Periodic tasks such as feeds, newsletters, and post scheduling were delayed because Celery processes were paused.
**Resolution and Mitigation**: To prevent such incidents in the future, we are taking the following measures:
* **Deployment Procedure Enhancement:** Our deployment procedure will be improved to ensure that the backend application is correctly restarted after deployment, even when no changes have been made.
Status: Postmortem
Impact: Minor | Started At: Sept. 10, 2023, 11:50 a.m.
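The "Deployment Procedure Enhancement" above boils down to restarting the backend unconditionally so it picks up new configuration, such as new Redis endpoints, even when the code deploy produced no changes. Below is a minimal sketch of that idea, assuming a hypothetical sync script and systemd-managed service names; it is not RebelMouse's actual tooling.

```python
# Hypothetical deploy step: restart the backend even when no code changed,
# so it re-reads configuration such as new Redis endpoints after an upgrade.
import subprocess


def deploy(code_changed: bool, services=("backend", "celery-worker")) -> None:
    if code_changed:
        # Placeholder for the real code-sync step.
        subprocess.run(["./sync_code.sh"], check=True)

    # Previously the restart only ran when code_changed was True, which left
    # the old Redis endpoints in use after the Redis upgrade.
    for service in services:
        subprocess.run(["systemctl", "restart", service], check=True)
```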
Description: On September 5, 2023, between 4:17 PM EST and 4:22 PM EST, our services experienced a performance degradation. Our team is actively investigating the root causes, and we will furnish further information once the investigation is complete.
Status: Resolved
Impact: None | Started At: Sept. 5, 2023, 8:30 p.m.
Description: ## **Celery Beat Service Outage**
**Summary:**
* **Incident Date & Time:** September 1, 2023, from 12:01 PM EST to 1:22 PM EST
* **Affected Service:** Celery Beat
* **Root Cause:** Unforeseen hardware failure of the instance hosting Celery Beat
**Incident Details:** During the period of September 1, 2023, from 12:01 PM EST to 1:22 PM EST, our Celery Beat service experienced an outage, resulting in periodic tasks not processing as expected. These tasks were accumulating in the queue, causing disruption to our operations. The primary cause of this incident was the unexpected failure and subsequent termination of the instance hosting the Celery Beat service.
**Actions Taken:** Upon detecting the issue, our incident response team promptly initiated recovery procedures. Initially, we attempted to launch a new instance with the same configuration as the failed instance. Regrettably, these attempts ended in failure due to unanticipated complications. To expedite service restoration and mitigate the risk of recurrence, we made the following critical decisions:
* **Instance Family Change:** We transitioned to a more robust instance family with enhanced hardware capabilities. This change was aimed at reducing the likelihood of hardware-related failure.
* **Increased Instance Power:** We selected a more powerful instance type to improve the launch speed and overall performance of the Celery Beat service.
**Resolution:** With the aforementioned adjustments in place, we successfully launched a new instance, and the Celery Beat service was fully restored.
**Root Cause Analysis:** The initial failure was challenging to predict or prevent since it was attributed to a hardware problem within the instance hosting the Celery Beat service. Hardware failures can be unpredictable and fall outside the scope of traditional preventive measures.
Status: Postmortem
Impact: Major | Started At: Sept. 1, 2023, 4:48 p.m.
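The impact of this outage was periodic tasks piling up while Celery Beat was down. A common, generic way to catch that class of failure quickly is a Beat heartbeat check; the sketch below shows the general pattern under assumed names and broker settings, and is not something RebelMouse has described using: a scheduled task refreshes a timestamp, and monitoring alerts when the timestamp goes stale.

```python
# Generic Celery Beat heartbeat sketch: a Beat-scheduled task refreshes a
# timestamp in Redis, and a monitoring check flags the scheduler as
# unhealthy when the timestamp goes stale.
import time

import redis
from celery import Celery

app = Celery("heartbeat", broker="redis://localhost:6379/0")  # assumed broker URL
store = redis.Redis()

HEARTBEAT_KEY = "celery_beat_heartbeat"
STALE_AFTER_SECONDS = 300  # alert if Beat misses roughly five minutes of ticks


@app.task
def beat_heartbeat() -> None:
    """Scheduled every minute by Beat; records the last successful tick."""
    store.set(HEARTBEAT_KEY, time.time())


app.conf.beat_schedule = {
    "beat-heartbeat": {"task": beat_heartbeat.name, "schedule": 60.0},
}


def beat_is_healthy() -> bool:
    """Called from external monitoring; False means Beat has likely stopped."""
    last_tick = store.get(HEARTBEAT_KEY)
    return last_tick is not None and time.time() - float(last_tick) < STALE_AFTER_SECONDS
```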