Get notified about any outages, downtime, or incidents for Mezmo and 1800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for Mezmo.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Mezmo:
Component | Status |
---|---|
Log Analysis | Active |
Alerting | Active |
Archiving | Active |
Livetail | Active |
Log Ingestion (Agent/REST API/Code Libraries) | Active |
Log Ingestion (Heroku) | Active |
Log Ingestion (Syslog) | Active |
Search | Active |
Web App | Active |
Pipeline | Active |
Destinations | Active |
Ingestion / Sources | Active |
Processors | Active |
Web App | Active |
View the latest incidents for Mezmo and check for official updates:
Description:
Start Time: Thursday, October 7, 2021, at 17:52 UTC
End Time: Thursday, October 7, 2021, at 18:46 UTC
Duration: 0:54:00

## What happened:
Our Web UI returned the error message “This site can’t be reached” when some users tried to log in or load pages. The ingestion of logs was unaffected.

## Why it happened:
The Telia carrier service in Europe experienced a major network routing outage caused by a faulty configuration update. The routing policy contained an error that impacted traffic to our service hosting provider, Equinix Metal. The Washington DC data center that houses our services was impacted. During this incident the [app.logdna.com](http://app.logdna.com) site was unreachable for some customers, depending on their location.

## How we fixed it:
No remedial action was possible by LogDNA. We waited until the incident from Equinix Metal, our service hosting provider, was resolved.

## What we are doing to prevent it from happening again:
For this type of incident, LogDNA cannot take proactive preventive measures.
Status: Postmortem
Impact: Major | Started At: Oct. 7, 2021, 5:52 p.m.
Description:
Start Time: Wednesday, October 6, 2021, at 17:30:06 UTC
End Time: Wednesday, October 6, 2021, at 21:30:27 UTC
Duration: 4:00:21

## What happened:
Email notifications from alerts were partially halted for about 4 hours. Notifications sent by Slack and Webhooks were not affected.

## Why it happened:
Our email service provider’s daily limit of 250,000 email messages was exceeded. All email notifications from triggered alerts bounced and could not be resent. Further investigation revealed that a HackerOne Security Analyst looking for flaws in our service to report to us had made our system send 450,000 emails. This was accomplished by manually adding an array of many email addresses into the “Change Owner” request field within our Web UI. Adding multiple email addresses in that field is not permitted through normal usage of the Web UI. The Security Analyst intercepted the HTTP request sent when the form was submitted and manually inserted a JSON list in the field, effectively sending an array of email addresses rather than a string. LogDNA had no server-side (i.e., backend) validation to ensure only a string could be accepted.

## How we fixed it:
We took remedial action by contacting our email provider, who temporarily increased our daily email sending limit to 625,000 messages. This allowed email notifications from alerts to resume. We then added server-side validation for the “Change Owner” field in our Web UI so that only strings are accepted, even if the request is manually intercepted and an array of email addresses is added.

## What we are doing to prevent it from happening again:
We will audit our Web UI to find all places where multiple email inputs can be added. We’ll then add server-side validation, so only strings are accepted. We’ll emphasize to our HackerOne Security Analysts that they should not take potentially damaging actions as they proactively search for vulnerabilities in our service.
Status: Postmortem
Impact: Major | Started At: Oct. 6, 2021, 9:29 p.m.
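The remediation described in this postmortem amounts to validating the “Change Owner” payload on the server instead of trusting the Web UI form. Below is a minimal sketch of that kind of check, assuming a Node/Express-style backend; the route path, the `newOwnerEmail` field name, and the handler are illustrative placeholders, not LogDNA's actual API.

```typescript
// Hypothetical Express endpoint sketch: the route and field name are
// illustrative, not LogDNA's actual API.
import express, { Request, Response } from "express";

const app = express();
app.use(express.json());

// Accept only a single plain string that looks like one email address,
// even if a tampered request submits a JSON array of addresses.
function parseSingleEmail(value: unknown): string | null {
  if (typeof value !== "string") return null; // arrays, objects, numbers all fail here
  const email = value.trim();
  // Deliberately simple shape check; a real service would use a vetted validator.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) return null;
  return email;
}

app.post("/account/change-owner", (req: Request, res: Response) => {
  const email = parseSingleEmail(req.body?.newOwnerEmail);
  if (email === null) {
    // Server-side rejection: the client-side form cannot be trusted on its own.
    return res.status(400).json({ error: "newOwnerEmail must be a single email address string" });
  }
  // ... proceed with the ownership-change flow (at most one notification email) ...
  return res.status(202).json({ ok: true });
});

// app.listen(3000);
```

With a check like this, an intercepted request carrying a JSON array of addresses fails the `typeof value !== "string"` test and is rejected with a 400 before any notification email is sent.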
Description:
**Start Time:** Thursday, August 19, 2021, at 13:56 UTC
**End Time:** Thursday, August 19, 2021, at 20:48 UTC
**Duration:** 6:52:00

**What happened:**
Searches in our Web UI and Live Tail were intermittently failing. Additionally, for a small set of customers (about 12%), there were delays in newly submitted logs being available for searching, graphing, and timelines.

**Why it happened:**
Our service uses the Calico networking solution to ensure network-level connectivity between all nodes and the pods running on them. On several nodes, Calico stopped running. This put one of our Elasticsearch clusters into an unhealthy red state. For customers using this cluster (about 12% of all our customers), there were delays in newly submitted logs being available for searching, graphing, and timelines. When Calico stopped running on some nodes, it also led to failures with tribe nodes, which make searching across multiple clusters possible. This caused intermittent failures in searching and Live Tail for all customers.

**How we fixed it:**
We took remedial action by restarting Calico and restoring networking connections between all nodes. We also restarted our tribe nodes, repaired the Elasticsearch cluster that was in a red state, and then provided temporary resources so our service could more quickly process the backlog of logs sent by customers since the beginning of the incident.

**What we are doing to prevent it from happening again:**
We are investigating why Calico stopped working on several nodes. We’re also updating our runbooks to recover more quickly in similar situations and limit any customer impact.
Status: Postmortem
Impact: Major | Started At: Aug. 19, 2021, 1:56 p.m.
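For context on the “unhealthy red state” mentioned above: Elasticsearch reports cluster health as green, yellow, or red via its `_cluster/health` API, where red means at least one primary shard is unassigned and some data is temporarily unavailable for indexing and search. The sketch below shows one way such a state could be detected, using the built-in `fetch` available in Node 18+; the endpoint URL is a placeholder, and this is not Mezmo's actual monitoring code.

```typescript
// Illustrative health probe only; the endpoint URL is a placeholder and this
// is not Mezmo/LogDNA's internal monitoring code.
interface ClusterHealth {
  cluster_name: string;
  status: "green" | "yellow" | "red";
  number_of_nodes: number;
  unassigned_shards: number;
}

async function checkClusterHealth(baseUrl: string): Promise<void> {
  // Elasticsearch's cluster health API reports green/yellow/red for the cluster.
  const res = await fetch(`${baseUrl}/_cluster/health`);
  if (!res.ok) throw new Error(`health request failed: ${res.status}`);
  const health = (await res.json()) as ClusterHealth;

  if (health.status === "red") {
    // Red means at least one primary shard is unallocated: some data is
    // unavailable for indexing/search, matching the delays described above.
    console.error(
      `Cluster ${health.cluster_name} is RED: ${health.unassigned_shards} unassigned shards ` +
        `across ${health.number_of_nodes} nodes`
    );
  } else {
    console.log(`Cluster ${health.cluster_name} is ${health.status}`);
  }
}

// Example: checkClusterHealth("http://localhost:9200").catch(console.error);
```

A probe like this only surfaces the red state; the postmortem’s actual fix was restarting Calico and the tribe nodes and repairing the affected cluster.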
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.