Last checked: 8 minutes ago
Outage and incident data over the last 30 days for Mezmo.
OutLogger tracks the status of these components for Mezmo:
| Component | Status |
| --- | --- |
| Log Analysis | Active |
| Alerting | Active |
| Archiving | Active |
| Livetail | Active |
| Log Ingestion (Agent/REST API/Code Libraries) | Active |
| Log Ingestion (Heroku) | Active |
| Log Ingestion (Syslog) | Active |
| Search | Active |
| Web App | Active |
| Pipeline | Active |
| Destinations | Active |
| Ingestion / Sources | Active |
| Processors | Active |
| Web App | Active |
View the latest incidents for Mezmo and check for official updates:
Description: All logs submitted to our service are now available when running searches.
Status: Resolved
Impact: Minor | Started At: May 8, 2021, 6:36 p.m.
Description: **Start Time:** April 28, 2021 at 12:37 UTC
**End Time:** May 3, 2021 at 00:54 UTC
**What happened:** Newly submitted log lines from all customers were significantly delayed before becoming available in our WebUI for searching, graphing, and timelines. Alerting, Live Tail, and the uploading of archives to their destinations were significantly delayed as well. The incident was opened on April 28, 12:37 UTC. Typical mitigation steps were taken but were unsuccessful. Live Tail and alerting, which were also significantly degraded, were halted about 14 hours after the start of the incident. This step was taken to keep other services, such as Search, functioning and to give more resources to processing log lines. Logs submitted before the incident continued to be searchable. By May 1, 19:17 UTC, about 99% of newly submitted logs were again available in our WebUI at normal rates. Other essential services needed more time and manual intervention to recover. The incident was closed on May 3, 00:54 UTC. Ingestion of new log lines from clients continued normally throughout the incident.
**Why it happened:** We deployed an updated version of our proprietary messaging bus / parsing pipeline. This version had been tested in staging and in multiple production regions beforehand and worked as expected. It was deployed and worked normally in production for four days. The cumulative traffic to our service over those four days revealed a performance issue that affected the processing of new log lines: logs were processed, but at a very slow rate. We identified the cause of the slow performance as an update to Node.js (version 14) that was part of the new version of our messaging bus.
**How we fixed it:** Once the source of the failure had been identified, we reverted our messaging bus to its last stable version, which kept the delays in processing from degrading further. Our services still needed to process logs ingested up to that point, which required time, manual intervention, and more resources. We temporarily increased the number of servers dedicated to processing logs by about 60%. We also halted Live Tail and alerting, which were degraded almost to the point of being non-functional. Through the combination of these efforts, all logs were eventually processed and our service was again entirely operational.
**What we are doing to prevent it from happening again:** During the incident, the new version of our messaging bus was reverted to its previous version. The version in production today does not contain the upgrade to Node.js 14, which caused the performance degradation. We’ve removed Node.js 14 from any future upgrades until we’ve had time to carefully examine its performance issues.
Status: Postmortem
Impact: Major | Started At: April 28, 2021, 12:37 p.m.
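The prevention step described in the postmortem amounts to blocking a known-problematic runtime major version until it has been vetted. As a minimal, hypothetical sketch (not Mezmo's actual tooling, and assuming a Node.js service where such a guard runs at startup), a TypeScript check like the following refuses to boot on a blocked major version:

```typescript
// runtime-version-guard.ts
// Illustrative startup guard: refuse to run on a Node.js major version
// that is known (in this hypothetical setup) to cause throughput regressions.
// The blocked-version list and behavior are assumptions, not Mezmo's real config.

const BLOCKED_MAJOR_VERSIONS = new Set<number>([14]);

function nodeMajorVersion(): number {
  // process.version looks like "v14.17.0"; strip the leading "v" and take the major part.
  return Number(process.version.slice(1).split(".")[0]);
}

const major = nodeMajorVersion();
if (BLOCKED_MAJOR_VERSIONS.has(major)) {
  console.error(
    `Refusing to start: Node.js ${process.version} is blocked pending performance review.`
  );
  process.exit(1);
}

console.log(`Node.js ${process.version} accepted; starting service.`);
```

A similar constraint is often expressed declaratively through the `engines` field in package.json, though whether it is enforced at install time depends on the package manager's settings.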
Description: The issue is resolved, Web UI is fully operational.
Status: Resolved
Impact: Major | Started At: April 23, 2021, 6:17 a.m.