Get notified about any outages, downtime or incidents for Mezmo and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Mezmo.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Mezmo:
Component | Status |
---|---|
Log Analysis | Active |
Alerting | Active |
Archiving | Active |
Livetail | Active |
Log Ingestion (Agent/REST API/Code Libraries) | Active |
Log Ingestion (Heroku) | Active |
Log Ingestion (Syslog) | Active |
Search | Active |
Web App | Active |
Pipeline | Active |
Destinations | Active |
Ingestion / Sources | Active |
Processors | Active |
Web App | Active |
View the latest incidents for Mezmo and check for official updates:
Description:
**Dates:** Start Time: Tuesday, April 5, 2022, at 13:20:00 UTC. End Time: Tuesday, April 5, 2022, at 18:20:00 UTC. Duration: 5:00:00.
**What happened:** Alerting was halted for all accounts for the entire duration of the incident. Most alerts – any whose trigger time was more than 15 minutes in the past – were discarded.
**Why it happened:** We restarted our parser service for reasons unrelated to this incident. Any restart of the parser service should be followed by a restart of the alerting service; this second step was overlooked, so all alerts stopped triggering. The need to restart alerting after a parser restart is documented and well known to our infrastructure team, but the parser restart was performed by a team less familiar with the correct procedure.
**How we fixed it:** We manually restarted the alerting service, which then returned to normal operation.
**What we are doing to prevent it from happening again:** The documented restart procedure has been reviewed with all teams that are allowed to restart services. We will also add monitoring of the alerting service and automated notifications so that we learn of similar incidents more quickly in the future.
Status: Postmortem
Impact: Critical | Started At: April 5, 2022, 6:23 p.m.
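To make the discard rule above concrete: the postmortem states that alerts whose trigger time was more than 15 minutes in the past were dropped rather than delivered late. The sketch below is not Mezmo's code; the function name and the cutoff constant are assumptions used only to illustrate that behavior.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical illustration of the 15-minute discard window described in the
# postmortem above: once alerting recovers, alerts that triggered more than
# 15 minutes ago are dropped instead of being delivered late.
MAX_ALERT_AGE = timedelta(minutes=15)  # assumed name/value, not Mezmo's code

def should_deliver(trigger_time: datetime, now=None) -> bool:
    """Return True if the alert is still fresh enough to deliver."""
    now = now or datetime.now(timezone.utc)
    return now - trigger_time <= MAX_ALERT_AGE

# An alert that triggered 20 minutes ago falls outside the window and is discarded.
stale = datetime.now(timezone.utc) - timedelta(minutes=20)
print(should_deliver(stale))  # False
```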
Description:
**Dates:** Start Time: Saturday, February 26, 2022, at 19:51 UTC. End Time: Sunday, February 27, 2022, at 22:13 UTC. Duration: 26:22:00.
**What happened:** Ingestion of new logs to our Syslog endpoint – only for logs sent using a custom port – was intermittently delayed.
**Why it happened:** We recently introduced a new service (Syslog Forwarder) to handle the ingestion of logs sent over Syslog. As the name implies, it forwards logs to downstream services. Logs are sent from a range of ports on the Syslog Forwarder to a range of ports used by clients running on downstream services. This design worked well in our advance testing, which used a limited number of custom ports. Once running in production, however, the Syslog Forwarder needed to connect to a much larger number of custom ports, and the ephemeral port ranges of the clients running on downstream services overlapped with the port ranges used by the Syslog Forwarder. This led to occasional port conflicts when services or clients tried to start; they would retry until they found an open port without conflicts, which delayed ingestion.
**How we fixed it:** We changed the ephemeral port ranges of the clients running on downstream services so that they no longer overlap with the port ranges used by the Syslog Forwarder.
**What we are doing to prevent it from happening again:** The new ephemeral port range has been rolled out and has proven resilient in production. No further work is needed to prevent this kind of incident from happening again.
Status: Postmortem
Impact: None | Started At: Feb. 26, 2022, 7:51 p.m.
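The root cause above is an overlap between a service's source-port range and the operating system's ephemeral (client-side) port range. The snippet below is a hypothetical, Linux-specific sanity check, not part of Mezmo's stack: the forwarder port range is an assumed example, and the ephemeral range is read from the kernel's `ip_local_port_range` setting.

```python
from pathlib import Path

# Hypothetical check illustrating the fix described above: verify that the
# port range a forwarder binds to does not overlap the kernel's ephemeral
# port range (Linux exposes it via procfs).
FORWARDER_PORTS = range(20000, 25001)  # assumed range, for illustration only

def ephemeral_port_range() -> range:
    low, high = Path("/proc/sys/net/ipv4/ip_local_port_range").read_text().split()
    return range(int(low), int(high) + 1)

def overlaps(a: range, b: range) -> bool:
    return a.start < b.stop and b.start < a.stop

if __name__ == "__main__":
    eph = ephemeral_port_range()
    if overlaps(FORWARDER_PORTS, eph):
        print(f"Conflict: forwarder ports {FORWARDER_PORTS} overlap ephemeral range {eph}")
    else:
        print("No overlap between forwarder ports and the ephemeral port range")
```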
Description:
**Dates:** Start Time: Friday, February 18, 2022, at 00:10 UTC. End Time: Thursday, February 24, 2022, at 23:43 UTC. Duration: 167:33:00.
**What happened:** Ingestion of new logs to our Syslog endpoint was intermittently failing.
**Why it happened:** We recently introduced a new service (Syslog Forwarder) to handle the ingestion of logs sent over Syslog. As the name implies, it forwards logs to downstream services. It was designed to send all logs submitted for each account to a single port opened on the downstream services; no load balancing was implemented in the original design, which performed well in our advance testing. Once in production, however, it became apparent that some customer accounts submit logs at a higher volume than the downstream services could process. When this happened, log lines were buffered in memory by the Syslog Forwarder, memory usage grew until the pods crashed, and any log lines held on those pods were lost and never ingested.
**How we fixed it:** We improved the design of the Syslog Forwarder by adding a pool of connections to the downstream services – in effect, adding traffic shaping to the Syslog Forwarder.
**What we are doing to prevent it from happening again:** The new architecture has been rolled out and has proven resilient in production. No further work is needed to prevent this kind of incident from happening again.
Status: Postmortem
Impact: Major | Started At: Feb. 24, 2022, 10:45 p.m.
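The fix described above, a pool of connections with traffic shaping, can be sketched as a bounded queue feeding a fixed set of workers, so that a high-volume account applies backpressure instead of growing memory without limit. This is only an illustrative sketch under assumed names (`downstream_send`, the pool size, the queue bound); it is not Mezmo's implementation.

```python
import queue
import threading

# Hypothetical sketch of "connection pool + traffic shaping": a bounded queue
# provides backpressure instead of unbounded in-memory buffering, and a fixed
# pool of workers fans log lines out across several downstream connections.
POOL_SIZE = 4
buffer = queue.Queue(maxsize=10_000)  # bounded: a full queue blocks producers

def downstream_send(worker_id: int, line: str) -> None:
    # Placeholder for writing to one pooled downstream connection.
    pass

def worker(worker_id: int) -> None:
    while True:
        line = buffer.get()
        if line is None:          # sentinel tells the worker to stop
            break
        downstream_send(worker_id, line)
        buffer.task_done()

workers = [threading.Thread(target=worker, args=(i,), daemon=True) for i in range(POOL_SIZE)]
for t in workers:
    t.start()

# Producer side: block (rather than grow memory) when the buffer is full.
for n in range(100):
    buffer.put(f"log line {n}")

buffer.join()                     # wait until all queued lines are forwarded
for _ in workers:
    buffer.put(None)
```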
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.