Get notified about any outages, downtime, or incidents for Mezmo and 1800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for Mezmo.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Mezmo:
Component | Status |
---|---|
Log Analysis | Active |
Alerting | Active |
Archiving | Active |
Livetail | Active |
Log Ingestion (Agent/REST API/Code Libraries) | Active |
Log Ingestion (Heroku) | Active |
Log Ingestion (Syslog) | Active |
Search | Active |
Web App | Active |
Pipeline | Active |
Destinations | Active |
Ingestion / Sources | Active |
Processors | Active |
Web App | Active |
View the latest incidents for Mezmo and check for official updates:
Description:
Start Time: Thursday, October 7, 2021, at 17:52 UTC
End Time: Thursday, October 7, 2021, at 18:46 UTC
Duration: 0:54:00

## What happened:
Our Web UI returned the error message “This site can’t be reached” when some users tried to log in or load pages. The ingestion of logs was unaffected.

## Why it happened:
The Telia carrier service in Europe experienced a major network routing outage caused by a faulty configuration update. The routing policy contained an error that impacted traffic to our service hosting provider, Equinix Metal. The Washington DC data center that houses our services was impacted. During this incident the [app.logdna.com](http://app.logdna.com) site was unreachable for some customers, depending on their location.

## How we fixed it:
No remedial action was possible by LogDNA. We waited until the incident from Equinix Metal, our service hosting provider, was resolved.

## What we are doing to prevent it from happening again:
For this type of incident, LogDNA cannot take proactive preventive measures.
Status: Postmortem
Impact: Major | Started At: Oct. 7, 2021, 5:52 p.m.
Description:
Start Time: Wednesday, October 6, 2021, at 17:30:06 UTC
End Time: Wednesday, October 6, 2021, at 21:30:27 UTC
Duration: 4:00:21

## What happened:
Email notifications from alerts were partially halted for about 4 hours. Notifications sent by Slack and Webhooks were not affected.

## Why it happened:
Our email service provider’s daily limit of 250,000 email messages was exceeded. All email notifications from triggered alerts bounced and could not be resent. Further investigation revealed that a HackerOne Security Analyst looking for flaws in our service to report to us had made our system send 450,000 emails. This was accomplished by manually adding an array of many email addresses into the “Change Owner” request field within our Web UI. Adding multiple email addresses in that field is not permitted through normal usage of the Web UI. The Security Analyst intercepted the HTTP request sent when the form was submitted and manually inserted a JSON list in the field, effectively sending an array of email addresses rather than a string. LogDNA had no server-side (i.e., backend) validation to ensure only a string could be accepted.

## How we fixed it:
We took remedial action by contacting our email provider, who temporarily increased our daily email sending limit to 625,000 messages. This allowed email notifications from alerts to resume. We then added server-side validation for the “Change Owner” field in our Web UI so that only strings are accepted, even if the request is manually intercepted and an array of email addresses is added.

## What we are doing to prevent it from happening again:
We will audit our Web UI to find all places where multiple email inputs can be added. We’ll then add server-side validation, so only strings are accepted. We’ll emphasize to our HackerOne Security Analysts that they should not take potentially damaging actions as they proactively search for vulnerabilities in our service.
Status: Postmortem
Impact: Major | Started At: Oct. 6, 2021, 9:29 p.m.
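The remediation described in this postmortem amounts to validating the “Change Owner” payload on the server instead of trusting the Web UI form. Below is a minimal sketch of that kind of check, assuming a Node/Express-style backend; the route path, the `newOwnerEmail` field name, and the handler are illustrative placeholders, not LogDNA's actual API.

```typescript
// Hypothetical Express endpoint sketch: the route and field name are
// illustrative, not LogDNA's actual API.
import express, { Request, Response } from "express";

const app = express();
app.use(express.json());

// Accept only a single plain string that looks like one email address,
// even if a tampered request submits a JSON array of addresses.
function parseSingleEmail(value: unknown): string | null {
  if (typeof value !== "string") return null; // arrays, objects, numbers all fail here
  const email = value.trim();
  // Deliberately simple shape check; a real service would use a vetted validator.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) return null;
  return email;
}

app.post("/account/change-owner", (req: Request, res: Response) => {
  const email = parseSingleEmail(req.body?.newOwnerEmail);
  if (email === null) {
    // Server-side rejection: the client-side form cannot be trusted on its own.
    return res.status(400).json({ error: "newOwnerEmail must be a single email address string" });
  }
  // ... proceed with the ownership-change flow (at most one notification email) ...
  return res.status(202).json({ ok: true });
});

// app.listen(3000);
```

With a check like this, an intercepted request carrying a JSON array of addresses fails the `typeof value !== "string"` test and is rejected with a 400 before any notification email is sent.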
Description:
**Start Time:** Thursday, August 19, 2021, at 13:56 UTC
**End Time:** Thursday, August 19, 2021, at 20:48 UTC
**Duration:** 6:52:00

**What happened:**
Searches in our Web UI and Live Tail were intermittently failing. Additionally, for a small set of customers (about 12%), there were delays in newly submitted logs being available for searching, graphing, and timelines.

**Why it happened:**
Our service uses the Calico networking solution to ensure network-level connectivity between all nodes and the pods running on them. On several nodes, Calico stopped running. This put one of our Elasticsearch clusters into an unhealthy red state. For customers using this cluster (about 12% of all our customers), there were delays in newly submitted logs being available for searching, graphing, and timelines. When Calico stopped running on some nodes, it also led to failures with tribe nodes, which make searching across multiple clusters possible. This caused intermittent failures in searching and Live Tail for all customers.

**How we fixed it:**
We took remedial action by restarting Calico and restoring networking connections between all nodes. We also restarted our tribe nodes, repaired the Elasticsearch cluster that was in a red state, and then provided temporary resources so our service could more quickly process the backlog of logs sent by customers since the beginning of the incident.

**What we are doing to prevent it from happening again:**
We are investigating why Calico stopped working on several nodes. We’re also updating our runbooks to recover more quickly in similar situations and limit any customer impact.
Status: Postmortem
Impact: Major | Started At: Aug. 19, 2021, 1:56 p.m.
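For context on the “unhealthy red state” mentioned above: Elasticsearch reports cluster health as green, yellow, or red via its `_cluster/health` API, where red means at least one primary shard is unassigned and some data is temporarily unavailable for indexing and search. The sketch below shows one way such a state could be detected, using the built-in `fetch` available in Node 18+; the endpoint URL is a placeholder, and this is not Mezmo's actual monitoring code.

```typescript
// Illustrative health probe only; the endpoint URL is a placeholder and this
// is not Mezmo/LogDNA's internal monitoring code.
interface ClusterHealth {
  cluster_name: string;
  status: "green" | "yellow" | "red";
  number_of_nodes: number;
  unassigned_shards: number;
}

async function checkClusterHealth(baseUrl: string): Promise<void> {
  // Elasticsearch's cluster health API reports green/yellow/red for the cluster.
  const res = await fetch(`${baseUrl}/_cluster/health`);
  if (!res.ok) throw new Error(`health request failed: ${res.status}`);
  const health = (await res.json()) as ClusterHealth;

  if (health.status === "red") {
    // Red means at least one primary shard is unallocated: some data is
    // unavailable for indexing/search, matching the delays described above.
    console.error(
      `Cluster ${health.cluster_name} is RED: ${health.unassigned_shards} unassigned shards ` +
        `across ${health.number_of_nodes} nodes`
    );
  } else {
    console.log(`Cluster ${health.cluster_name} is ${health.status}`);
  }
}

// Example: checkClusterHealth("http://localhost:9200").catch(console.error);
```

A probe like this only surfaces the red state; the postmortem’s actual fix was restarting Calico and the tribe nodes and repairing the affected cluster.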
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.