Last checked: 2 minutes ago
Get notified about any outages, downtime or incidents for RebelMouse and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for RebelMouse.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
Sign Up Now

OutLogger tracks the status of these components for RebelMouse:
Component | Status |
---|---|
AWS ec2-us-east-1 | Active |
AWS elb-us-east-1 | Active |
AWS RDS | Active |
AWS route53 | Active |
AWS s3-us-standard | Active |
AWS ses-us-east-1 | Active |
Braintree API | Active |
Braintree PayPal Processing | Active |
CDN | Active |
Celery | Active |
Content Delivery API | Active |
Discovery | Active |
EKS Cluster | Active |
Fastly Amsterdam (AMS) | Active |
Fastly Hong Kong (HKG) | Active |
Fastly London (LHR) | Active |
Fastly Los Angeles (LAX) | Active |
Fastly New York (JFK) | Active |
Fastly Sydney (SYD) | Active |
Full Platform | Active |
Google Apps Analytics | Active |
Logged In Users | Active |
Media | Active |
Mongo Cluster | Active |
Pharos | Active |
RabbitMQ | Active |
Redis Cluster | Active |
Sentry Dashboard | Active |
Stats | Active |
Talaria | Active |
WFE | Active |
View the latest incidents for RebelMouse and check for official updates:
Description: ## Celery Beat Service Outage

**Summary:**

* **Incident Date & Time:** September 1, 2023, from 12:01 PM EST to 1:22 PM EST
* **Affected Service:** Celery Beat
* **Root Cause:** Unforeseen hardware failure of the instance hosting Celery Beat

**Incident Details:** From 12:01 PM EST to 1:22 PM EST on September 1, 2023, our Celery Beat service experienced an outage and periodic tasks were not processed as expected. These tasks accumulated in the queue, disrupting our operations. The primary cause was the unexpected failure and subsequent termination of the instance hosting the Celery Beat service.

**Actions Taken:** Upon detecting the issue, our incident response team promptly initiated recovery procedures. We first attempted to launch a new instance with the same configuration as the failed one; those attempts failed due to unanticipated complications. To expedite service restoration and reduce the risk of recurrence, we made two critical decisions:

* **Instance Family Change:** We transitioned to a more robust instance family with enhanced hardware capabilities, reducing the likelihood of hardware-related failure.
* **Increased Instance Power:** We selected a more powerful instance type to improve launch speed and the overall performance of the Celery Beat service.

**Resolution:** With these adjustments in place, we successfully launched a new instance and fully restored the Celery Beat service.

**Root Cause Analysis:** The initial failure was difficult to predict or prevent because it stemmed from a hardware problem on the instance hosting the Celery Beat service. Hardware failures can be unpredictable and fall outside the scope of traditional preventive measures.
Status: Postmortem
Impact: Major | Started At: Sept. 1, 2023, 4:48 p.m.
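The postmortem above notes that periodic tasks piled up in the queue while Celery Beat was down. As a rough illustration only (not RebelMouse's actual configuration; the app name, broker URL, and task names below are hypothetical), a Celery Beat schedule can attach an `expires` option to each periodic task so that entries left sitting in the queue during an outage are discarded rather than all firing at once when service returns:

```python
# Minimal sketch of a Celery Beat schedule (hypothetical names throughout).
# The "expires" option lets workers drop periodic tasks that waited in the
# queue too long, e.g. while Beat or the workers were unavailable.
from celery import Celery

app = Celery("example_app", broker="redis://localhost:6379/0")

app.conf.beat_schedule = {
    "refresh-stats-every-5-minutes": {
        "task": "example_app.tasks.refresh_stats",
        "schedule": 300.0,            # run every 5 minutes
        "options": {"expires": 240},  # discard if not executed within 4 minutes
    },
}

@app.task(name="example_app.tasks.refresh_stats")
def refresh_stats():
    """Placeholder periodic task; real work would go here."""
    return "ok"
```

With a schedule shaped like this, an outage of the scheduler host delays the periodic work but does not produce a burst of stale tasks once a replacement instance comes online.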
Description: **Incident Summary:** On August 24 from 11:21 EDT to 11:25 EDT, a subset of our users encountered difficulties accessing our services. We take this matter seriously and immediately initiated a thorough investigation to identify the root cause and mitigate the impact.

**Root Cause:** The incident was traced back to a failure in our AWS VPC internal DNS resolver, which led to connectivity issues within our Mongo cluster. The secondary instances in the cluster were unable to establish connections with the primary instance, so the entire traffic load, reads and writes alike, was directed solely to the primary. This sudden influx of traffic overwhelmed the primary instance and caused the service disruption.

**Mitigation Steps:** Upon identifying the issue, our emergency response team was promptly assembled to address the problem and restore services to normal operation. The team resolved the DNS resolver problem and restored proper connectivity within the Mongo cluster. We understand the importance of maintaining a resilient and reliable service environment.

**Preventive Measures:** We are committed to preventing similar incidents in the future. To this end, we are implementing the following measures:

* **Redundancy and Failover:** We will enhance the redundancy and failover mechanisms within our AWS infrastructure so that connectivity disruptions are quickly mitigated without impacting the user experience. We are also reviewing the MongoDB cluster setup to ensure the most efficient configuration is in use.

Additionally, we have reached out to AWS to request further details about the DNS failure that occurred. We believe this will help us fortify our systems against similar issues in the future.
Status: Postmortem
Impact: Minor | Started At: Aug. 24, 2023, 3:32 p.m.
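The root cause above describes all reads and writes landing on the Mongo primary once the secondaries lost connectivity. As a hedged sketch (hypothetical hostnames and database names, not RebelMouse's setup), a replica-set client can declare a read preference and a short server-selection timeout so that reads favor healthy secondaries and connectivity problems surface quickly:

```python
# Minimal sketch of a MongoDB replica-set connection (hypothetical hosts).
# "secondaryPreferred" routes reads to secondaries when they are reachable,
# so the primary is not the sole target for read traffic; writes still go
# to the primary as usual.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://mongo-1.internal:27017,mongo-2.internal:27017,mongo-3.internal:27017",
    replicaSet="rs0",
    readPreference="secondaryPreferred",
    serverSelectionTimeoutMS=5000,  # fail fast if DNS or connectivity breaks
)

# Example read: served by a secondary when one is available.
doc = client.exampledb.articles.find_one({"status": "published"})
print(doc)
```

A setting like this does not prevent a DNS resolver failure, but it limits how much read load collapses onto the primary while connectivity is degraded.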
Description: During a recent maintenance window focused on improving page load performance, modifications to the nginx headers for images inadvertently caused certain images to respond with a 404 status, rendering them inaccessible to users. One scenario had not been accounted for during testing, which led to the oversight. For future nginx changes we will add a test step to verify image loading.
Status: Resolved
Impact: None | Started At: Aug. 18, 2023, 11:34 a.m.
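The entry above commits to adding a test step that verifies image loading after nginx changes. A minimal sketch of such a check, assuming a small list of representative image URLs (the URLs below are placeholders, not RebelMouse endpoints), could look like this:

```python
# Minimal post-deploy check (placeholder URLs): confirm that sampled image
# paths still return HTTP 200 with an image content type after an nginx
# configuration change.
import requests

SAMPLE_IMAGE_URLS = [
    "https://example.com/assets/img/logo.png",
    "https://example.com/media/uploads/hero.jpg",
]

def verify_images(urls, timeout=10):
    """Return a list of (url, status, content_type) tuples that failed the check."""
    failures = []
    for url in urls:
        resp = requests.get(url, timeout=timeout)
        content_type = resp.headers.get("Content-Type", "")
        if resp.status_code != 200 or not content_type.startswith("image/"):
            failures.append((url, resp.status_code, content_type))
    return failures

if __name__ == "__main__":
    failed = verify_images(SAMPLE_IMAGE_URLS)
    if failed:
        raise SystemExit(f"Image check failed: {failed}")
    print("All sampled images loaded correctly.")
```

Run against a representative sample after each configuration rollout, a check like this catches header changes that break image delivery before users do.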
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.