Is there a Mezmo outage?

Mezmo status: Systems Active

Last checked: 7 minutes ago

Get notified about any outages, downtime, or incidents for Mezmo and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Mezmo outages and incidents

Outage and incident data over the last 30 days for Mezmo.

There has been 1 outage or incident for Mezmo in the last 30 days.

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Mezmo

OutLogger tracks the status of these components for Mezmo:

Alerting Active
Archiving Active
Livetail Active
Log Ingestion (Agent/REST API/Code Libraries) Active
Log Ingestion (Heroku) Active
Log Ingestion (Syslog) Active
Search Active
Web App Active
Destinations Active
Ingestion / Sources Active
Processors Active
Web App Active

Latest Mezmo outages and incidents

View the latest incidents for Mezmo and check for official updates:

Updates for the Feb. 8, 2022 incident:

  • Time: Feb. 15, 2022, 7:23 p.m.
    Status: Postmortem
Update: **Dates:** Start Time: Tuesday, February 8, 2022, at 13:17 UTC. End Time: Tuesday, February 8, 2022, at 14:21 UTC. Duration: 1:04:00.
    **What happened:** Our Web UI was unresponsive for about 10 minutes. Newly submitted logs were not immediately available for Alerting, Searching, Live Tail, Graphing, and Timelines. No data was lost and ingestion was not halted.
    **Why it happened:** Our Redis database had a failover, and the services that depend on it, including the Parser, were unable to reconnect after it recovered. The Parser is upstream of many other services, so newly submitted logs were not passed on to downstream services such as Alerting, Live Tail, Searching, Graphing, and Timelines. The Web UI was also intermittently unavailable because it requires a connection to Redis.
    **How we fixed it:** We manually restarted the Redis service, which allowed a new master to be elected. After Redis recovered, the Parser, Web UI, and other services were restarted and were then able to reestablish a connection to Redis. This restored the Web UI and allowed newly submitted logs to pass from our Parser service to all downstream services. Over a short period of time, these services processed the backlog of logs and newly submitted logs were again available without delays.
    **What we are doing to prevent it from happening again:** We recently added functionality to track the flow rate of newly submitted logs. This new feature requires more memory than expected in the event of a Redis failover, which is why services could not reconnect to Redis. We've increased the limits of the memory buffer for the relevant portions of our service. We will also add additional Redis monitoring to more quickly detect unhealthy sentinels, and we continue an ongoing project to make all services more tolerant of Redis failovers (a reconnect sketch follows this list).
  • Time: Feb. 8, 2022, 2:40 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Feb. 8, 2022, 2:25 p.m.
    Status: Monitoring
    Update: We have implemented a fix and are monitoring the results. Newly sent logs are being processed again with minimal delays and all services are operational.
  • Time: Feb. 8, 2022, 1:55 p.m.
    Status: Investigating
Update: We are continuing to investigate the issue. The web app may only be intermittently accessible. Logs may arrive with a delay, which will impact searching and alerting.
  • Time: Feb. 8, 2022, 1:38 p.m.
    Status: Investigating
    Update: We are currently investigating the issue.
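
The remedy this postmortem closes with, making services more tolerant of Redis failovers, usually comes down to re-resolving the newly elected master and retrying, instead of holding a dead connection. Below is a minimal generic sketch in Python using the redis-py client's Sentinel support; the Sentinel hosts, the "mymaster" service name, and the retry limits are illustrative assumptions, not Mezmo's actual configuration.

```python
# Illustrative sketch only: a failover-tolerant Redis call via Sentinel.
import time

import redis
from redis.sentinel import Sentinel

# Assumed Sentinel endpoints; not Mezmo's real topology.
SENTINELS = [("sentinel-0", 26379), ("sentinel-1", 26379), ("sentinel-2", 26379)]
sentinel = Sentinel(SENTINELS, socket_timeout=1.0)

def failover_tolerant_call(command, *args, max_attempts=5, base_delay=0.5):
    """Run a Redis command, re-resolving the master if a failover occurs."""
    for attempt in range(max_attempts):
        try:
            # Ask Sentinel for the current master; after a failover this
            # returns the newly elected master rather than the dead one.
            master = sentinel.master_for("mymaster", socket_timeout=1.0)
            return getattr(master, command)(*args)
        except (redis.ConnectionError, redis.TimeoutError):
            # The master may be mid-election; back off, then re-resolve.
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("Redis still unavailable after failover retries")

# Example use (hypothetical key name):
# failover_tolerant_call("incr", "parser:lines_ingested")
```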

Updates for the Jan. 26, 2022 incident:

  • Time: Jan. 31, 2022, 8:14 p.m.
    Status: Postmortem
Update: **Dates:** Start Time: Wednesday, January 26, 2022, at 15:45:00 UTC. End Time: Wednesday, January 26, 2022, at 16:30:00 UTC. Duration: 00:45:00.
    **What happened:** Ingestion was halted and newly submitted logs were not immediately available for Alerting, Live Tail, Searching, Graphing, and Timelines. Some alerts were never triggered. Once ingestion had resumed, LogDNA agents running on customer environments resent all locally cached logs to our service for ingestion. No data was lost.
    **Why it happened:** Our Redis database had a failover and the services that depend on it were unable to recover automatically. Normally, the pods running our ingestion service deliberately crash until they are able to access Redis again (see the startup-gate sketch after this list). However, these pods were in a bad state and unable to reconnect when Redis returned. Since ingestion was halted, newly submitted logs were not passed on to many downstream services, such as Alerting, Live Tail, Searching, Graphing, and Timelines.
    **How we fixed it:** We manually restarted all the pods of our ingestion service, then restarted all the sentinel pods of Redis. The ingestion service became operational again and logs were passed on to all downstream services. Over a short period of time, these services processed the backlog of logs and newly submitted logs were again available without delays.
    **What we are doing to prevent it from happening again:** The ingestion pods were in a bad state because they had not been restarted after a configuration change made several days earlier, for reasons unrelated to this incident. The runbook for making such configuration changes has been updated to prevent this procedural failure in the future. We're also in the middle of a project to make all services more tolerant of Redis failovers.
  • Time: Jan. 26, 2022, 5:15 p.m.
    Status: Resolved
    Update: This incident has been resolved. All services are operational.
  • Time: Jan. 26, 2022, 4:58 p.m.
    Status: Monitoring
    Update: We have implemented a fix and are monitoring the results. Logs are being ingested again and all services are operational.
  • Time: Jan. 26, 2022, 4:23 p.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: Jan. 26, 2022, 4:10 p.m.
    Status: Investigating
    Update: Ingestion services are currently halted. Customers will also experience delays with Searching, Live Tail, Alerting, Graphing, and Timelines.
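
The "deliberately crash until they are able to access Redis again" behavior described in this postmortem is a common startup-gate pattern. Here is a minimal sketch of it in Python, assuming a redis-py client and an orchestrator such as Kubernetes that restarts processes which exit nonzero; the host name and service loop are hypothetical, not Mezmo's code.

```python
# Minimal sketch of the "crash until Redis is reachable" startup gate.
import sys

import redis

def main():
    # Hypothetical endpoint; replace with the real Redis master address.
    client = redis.Redis(host="redis-master", port=6379, socket_timeout=2.0)
    try:
        client.ping()  # fail fast if Redis is unreachable
    except redis.RedisError:
        # Exit nonzero so the orchestrator restarts the pod; the restart
        # loop repeats until Redis has recovered from its failover.
        sys.exit(1)
    run_ingestion(client)

def run_ingestion(client):
    """Placeholder for the real ingestion loop."""
    ...

if __name__ == "__main__":
    main()
```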

Updates for the Jan. 20, 2022 incident:

  • Time: Jan. 21, 2022, 7:47 p.m.
    Status: Postmortem
Update: **Dates:** Start Time: Thursday, January 20, 2022, at 19:13:00 UTC. End Time: Thursday, January 20, 2022, at 21:24:00 UTC. Duration: 02:11:00.
    **What happened:** Ingestion was halted and our Web UI was unresponsive for about 5-10 minutes. Newly submitted logs were not immediately available for Alerting, Searching, Live Tail, Graphing, and Timelines.
    **Why it happened:** Our service hosting provider Equinix Metal had an outage caused by the failure of one of their main switches (more details at https://status.equinixmetal.com/incidents/gjmh37y6rkjp). The outage impacted traffic and global network connectivity to the LogDNA service. During the Equinix Metal incident, Ingestion, Alerting, and Live Tail were halted and our Web UI was unresponsive for a period of 5-10 minutes. Multiple ElasticSearch (ES) clusters went into an unhealthy state, which delayed newly submitted logs from being immediately available for Searching, Graphing, and Timelines for about one hour.
    **How we fixed it:** No remedial action was possible by LogDNA. We waited until the incident from Equinix Metal, our service hosting provider, was resolved. The ES clusters were repaired and the backlog of newly submitted logs was processed in about one hour.
    **What we are doing to prevent it from happening again:** For this type of incident, LogDNA cannot take proactive preventive measures.
  • Time: Jan. 20, 2022, 9:38 p.m.
    Status: Resolved
    Update: This incident has been resolved. All services are operational.
  • Time: Jan. 20, 2022, 9:07 p.m.
    Status: Monitoring
    Update: Logs are being ingested again without delays. All services are working normally. We will monitor until our Cloud Provider closes their incident.
  • Time: Jan. 20, 2022, 7:57 p.m.
    Status: Investigating
Update: Our Cloud Provider Equinix is having an incident (see https://status.equinixmetal.com/incidents/gjmh37y6rkjp). For about 5-10 minutes, ingestion was halted and the Web UI was not responsive. Some alerts may not have been triggered. Currently all services are working, and there are some delays in processing recently sent logs. We are monitoring Equinix's incident closely.

Check the status of similar companies and alternatives to Mezmo

Hudl: Systems Active
OutSystems: Systems Active
Postman: Systems Active
Mendix: Systems Active
DigitalOcean: Issues Detected
Bandwidth: Systems Active
DataRobot: Systems Active
Grafana Cloud: Systems Active
SmartBear Software: Systems Active
Test IO: Systems Active
Copado Solutions: Systems Active
CircleCI: Systems Active

Frequently Asked Questions - Mezmo

Is there a Mezmo outage?
The current status of Mezmo is: Systems Active
Where can I find the official status page of Mezmo?
The official status page for Mezmo is here
How can I get notified if Mezmo is down or experiencing an outage?
To get notified of any status changes to Mezmo, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Mezmo every few minutes and will notify you of any changes (a polling sketch follows this FAQ). You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Mezmo do?
Mezmo (formerly LogDNA) is a cloud-based log management and observability platform that helps application owners ingest, search, analyze, and route log data across their environments.
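
For the curious, the "checks the official status every few minutes" model can be approximated with a simple poller. Mezmo's incident format suggests an Atlassian Statuspage site, and Statuspage instances conventionally expose a /api/v2/status.json summary; the URL, field names, and interval below are assumptions for illustration, not OutLogger's actual implementation.

```python
# Hedged sketch of a status poller against an assumed Statuspage endpoint.
import json
import time
import urllib.request

STATUS_URL = "https://status.mezmo.com/api/v2/status.json"  # assumed endpoint

def poll(interval_seconds=300):
    last_indicator = None
    while True:
        with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
            # Statuspage summaries report an overall indicator such as
            # "none", "minor", "major", or "critical".
            indicator = json.load(resp)["status"]["indicator"]
        if last_indicator is not None and indicator != last_indicator:
            # A real monitor would notify subscribers here.
            print(f"Mezmo status changed: {last_indicator} -> {indicator}")
        last_indicator = indicator
        time.sleep(interval_seconds)
```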