Is there a Mezmo outage?

Mezmo status: Systems Active

Last checked: 8 minutes ago

Get notified about any outages, downtime, or incidents for Mezmo and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Mezmo outages and incidents

Outage and incident data over the last 30 days for Mezmo.

There has been 1 outage or incident for Mezmo in the last 30 days.

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Mezmo

OutLogger tracks the status of these components for Mezmo:

Alerting: Active
Archiving: Active
Livetail: Active
Log Ingestion (Agent/REST API/Code Libraries): Active
Log Ingestion (Heroku): Active
Log Ingestion (Syslog): Active
Search: Active
Web App: Active
Destinations: Active
Ingestion / Sources: Active
Processors: Active
Web App: Active

Latest Mezmo outages and incidents

View the latest incidents for Mezmo and check for official updates:

Updates:

  • Time: Oct. 12, 2022, 6:52 p.m.
    Status: Postmortem
    Update:
    **Dates:** Start Time: Wednesday, October 5, 2022, at 14:27 UTC. End Time: Wednesday, October 5, 2022, at 14:45 UTC. Duration: 00:18.
    **What happened:** The ingestion of logs was partially halted. The WebUI was mostly unresponsive and most API calls failed. Because many newly submitted logs were not being ingested, new logs were not immediately available for Alerting, Searching, Live Tail, Graphing, Timelines, and Archiving.
    **Why it happened:** We recently added a new API gateway, Kong, to our service; it acts as a proxy for all other services. We had gradually increased the amount of traffic directed through the API gateway over several weeks and seen no ill effects. Prior to the incident, only some of the traffic for ingestion went through the gateway. Kong was restarted after a routine configuration change. After the restart, all traffic for our ingestion service began to go through Kong. Our monitoring quickly revealed the Kong service did not have enough pods to keep up with the increased workload, causing many requests to fail.
    **How we fixed it:** We manually added more pods to the Kong service. Ingestion, the WebUI, and API calls began to work normally again. Once ingestion had resumed, LogDNA agents running on customer environments resent all locally cached logs to our service for ingestion. No data was lost.
    **What we are doing to prevent it from happening again:** We updated Kubernetes to always assign enough pods for the Kong API gateway service to handle all traffic. We'll update the Kong gateway to more evenly distribute ingestion traffic across available pods. We will adjust our deployment processes so pods are restarted more slowly, which will reduce the impact in a similar scenario. We'll explore autoscaling policies so more pods can be added automatically in a similar situation.
  • Time: Oct. 5, 2022, 4:05 p.m.
    Status: Resolved
    Update: This incident has been resolved. All services are fully operational.
  • Time: Oct. 5, 2022, 2:58 p.m.
    Status: Monitoring
    Update: Service is restored but we are still monitoring.
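
The October 5 postmortem above notes that no data was lost because the agents resent locally cached logs once ingestion recovered. As a rough illustration of that pattern only (this is not Mezmo's agent code; the endpoint URL and payload format are invented), a minimal Python sketch could buffer lines on disk and discard them only after the ingestion endpoint confirms receipt with a 2xx response:

```python
import json
import time
import urllib.request
from pathlib import Path

# Hypothetical ingestion endpoint -- not Mezmo's real API, purely for illustration.
INGEST_URL = "https://logs.example.com/ingest"
CACHE_FILE = Path("log_cache.jsonl")


def cache_line(line: str) -> None:
    """Append a log line to the local on-disk cache before trying to ship it."""
    with CACHE_FILE.open("a") as f:
        f.write(json.dumps({"line": line, "ts": time.time()}) + "\n")


def flush_cache() -> bool:
    """Try to send every cached line; clear the cache only on a 2xx response."""
    if not CACHE_FILE.exists():
        return True
    payload = CACHE_FILE.read_bytes()
    req = urllib.request.Request(
        INGEST_URL, data=payload, headers={"Content-Type": "application/x-ndjson"}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            if 200 <= resp.status < 300:
                CACHE_FILE.unlink()  # safe to discard: server confirmed receipt
                return True
    except OSError:
        pass  # network error or non-2xx response: keep the cache and retry later
    return False


if __name__ == "__main__":
    cache_line("example log line")
    while not flush_cache():
        time.sleep(30)  # simple fixed backoff; a real agent would be more careful
```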

Updates:

  • Time: July 1, 2022, 10:40 p.m.
    Status: Postmortem
    Update:
    **Dates:** Start Time: Thursday, June 30, 21:40 UTC. End Time: Thursday, June 30, 23:32 UTC. Duration: 1 hour and 52 minutes.
    **What happened:** Some log lines for some customers were discarded by our service. The log lines were successfully accepted by our ingestion service, but a downstream service – the parser – removed some of them. All further downstream services, such as Alerting, Live Tail, Searching, and Archiving, never received these logs. In some cases, lines were received by Live Tail and were appended with the phrase “(not retained)”. The great majority of customers – 94.2% – were unaffected and had no log lines discarded. Approximately 3.5% had a relatively small number of log lines discarded. Approximately 2.3% had most or all of the log lines submitted during the incident discarded.
    **Why it happened:** We inadvertently released code into production that contained a bug in the parser service. This bug was known to us and in the process of being fixed in our development environment, but was not yet ready for release to production. The parser service is where exclusion rules are applied to recently submitted log lines that have been ingested but not yet passed to downstream services (e.g. Alerting, Live Tail, Searching, and Archiving). The bug made the parser exclude log lines that matched inactive exclusion rules. This included exclusion rules made by customers in the past and then disabled. Customers with such rules had some log lines excluded: whichever lines matched the inactive rules. If those rules had the “Preserve these lines for live-tail and alerting” option enabled, then the excluded lines would still be processed for alerts and appear in Live Tail with the phrase “(not retained)” appended. This affected 3.5% of our customer accounts. The usage quota feature is implemented as a particular type of exclusion rule even though it is not presented in the UI as an exclusion rule. The bug made the parser exclude all log lines if the usage quota feature was enabled for an account. This affected 2.3% of our customer accounts. Our monitoring did not detect the decrease in lines being passed from the parser to downstream services because the change was within the range of normal fluctuation rates. These rates vary significantly as traffic changes and as customers choose to enable/disable exclusion rules.
    **How we fixed it:** We reverted the last release of parser code to the previous version. Once the previous version was deployed to all pods running the parser service, log lines stopped being discarded.
    **What we are doing to prevent it from happening again:** We added a code-level test to ensure inactive exclusion rules are never applied by the parser (such tests are part of our standard operating procedure). We will review our release process to understand how the code containing the bug was moved into production and improve our processes to prevent a similar event in the future.
  • Time: June 30, 2022, 10:11 p.m.
    Status: Resolved
    Update: Newly submitted logs are now being processed and retained. Some logs submitted by some customers during the incident were discarded and not successfully retained. [Reference #2792]
  • Time: June 30, 2022, 9:36 p.m.
    Status: Investigating
    Update: Some logs submitted to our service in the last 1.5 hours have not been processed. We are taking remedial action now.
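
The June 30 postmortem above attributes the discarded lines to inactive exclusion rules being applied by the parser, and says a code-level test was added so disabled rules can never exclude lines again. As an illustration only (the class and function names below are invented, not taken from Mezmo's codebase), a minimal Python sketch of that invariant and its regression test might look like this:

```python
import re
from dataclasses import dataclass


@dataclass
class ExclusionRule:
    pattern: str   # regex matched against the raw log line
    active: bool   # disabled rules must never exclude anything
    preserve_for_livetail_and_alerting: bool = False


def apply_exclusion_rules(lines, rules):
    """Return the lines that should be retained.

    Only *active* rules may exclude a line. The bug described above was,
    in effect, that this `rule.active` check was not honoured.
    """
    retained = []
    for line in lines:
        excluded = any(rule.active and re.search(rule.pattern, line) for rule in rules)
        if not excluded:
            retained.append(line)
    return retained


def test_inactive_rules_are_never_applied():
    """Code-level regression test: a disabled rule must not drop matching lines."""
    rules = [ExclusionRule(pattern=r"debug", active=False)]
    lines = ["debug: cache miss", "error: timeout"]
    assert apply_exclusion_rules(lines, rules) == lines


if __name__ == "__main__":
    test_inactive_rules_are_never_applied()
    print("ok")
```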

Updates:

  • Time: June 3, 2022, 10:49 p.m.
    Status: Postmortem
    Update:
    **Dates:** Start Time: Thursday, June 2, 2022, at 20:25 UTC. End Time: Thursday, June 2, 2022, at 20:50 UTC. Duration: 00:25.
    **What happened:** The ingestion of logs was halted for about 25 minutes. During that time, newly submitted logs were never ingested and therefore not available for Alerting, Searching, Live Tail, Graphing, Timelines, and Archiving.
    **Why it happened:** We manually reverted our ingester service to an older version (to solve a minor problem unrelated to this incident). During the procedure, the version of the container was reverted, but not the container’s configuration. Because of this versioning mismatch, logs from the ingester stopped being accepted by a downstream service (the “buzzsaw broker”). The ingester is currently not designed to confirm logs are accepted by downstream services; it therefore returned HTTP 200 responses to our customers’ agents, indicating logs had been successfully received. At this point the agents discarded any locally cached log files. Consequently, all log lines sent during the incident (25 minutes) were never ingested.
    **How we fixed it:** We reverted the container’s configuration correctly, so it matched the version of the container itself. Ingestion began working normally again.
    **What we are doing to prevent it from happening again:** We will review and update our runbooks for reverting services to earlier versions to prevent similar mistakes. We also plan to automate the reversion process. We will add internal confirmations to the ingester so it is always certain log lines were received by downstream services. This will prevent the ingester from sending erroneous 200 responses back to the agent, should the ingester be unable to pass log lines downstream.
  • Time: June 2, 2022, 10:05 p.m.
    Status: Resolved
    Update: Ingestion has resumed. All services are fully operational.
  • Time: June 2, 2022, 9:44 p.m.
    Status: Identified
    Update: Ingestion of log lines has halted from all sources. We are taking remedial action.
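
The June 2 postmortem above explains that the ingester returned HTTP 200 to agents even though the downstream broker was rejecting its output, which led agents to discard their locally cached logs. A hedged sketch of the proposed fix, acknowledging the agent only after the downstream service confirms receipt, is shown below in Python (the broker and handler names are invented for illustration):

```python
from http import HTTPStatus


class DownstreamUnavailable(Exception):
    """Raised when the broker does not confirm receipt of a batch."""


class ToyBroker:
    """Stand-in for the real downstream broker, just for this sketch."""

    def __init__(self, healthy: bool = True):
        self.healthy = healthy
        self.received = []

    def publish(self, batch):
        if not self.healthy:
            raise DownstreamUnavailable("broker did not acknowledge the batch")
        self.received.append(batch)


def handle_ingest_request(batch, broker):
    """Acknowledge the agent only after the downstream broker confirms receipt.

    A non-2xx status tells the agent to keep its locally cached copy of the
    batch and retry later, so lines are not silently dropped.
    """
    try:
        broker.publish(batch)
    except DownstreamUnavailable:
        return HTTPStatus.SERVICE_UNAVAILABLE  # 503: agent keeps its cache
    return HTTPStatus.OK  # 200: agent may now discard the cached batch


if __name__ == "__main__":
    assert handle_ingest_request(["line 1"], ToyBroker(healthy=True)) == HTTPStatus.OK
    assert handle_ingest_request(["line 2"], ToyBroker(healthy=False)) == HTTPStatus.SERVICE_UNAVAILABLE
```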

Check the status of similar companies and alternatives to Mezmo

Hudl: Systems Active
OutSystems: Systems Active
Postman: Systems Active
Mendix: Systems Active
DigitalOcean: Issues Detected
Bandwidth: Issues Detected
DataRobot: Systems Active
Grafana Cloud: Systems Active
SmartBear Software: Systems Active
Test IO: Systems Active
Copado Solutions: Systems Active
CircleCI: Systems Active

Frequently Asked Questions - Mezmo

Is there a Mezmo outage?
The current status of Mezmo is: Systems Active
Where can I find the official status page of Mezmo?
The official status page for Mezmo is here
How can I get notified if Mezmo is down or experiencing an outage?
To get notified of any status changes to Mezmo, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Mezmo every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here. If you prefer to roll your own check, a minimal polling sketch follows these FAQs.
What does Mezmo do?
Mezmo, formerly known as LogDNA, is a cloud-based log management and observability platform that helps application owners centralize, analyze, and route log data from across their applications and infrastructure.
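
If you would rather poll the status page yourself, the sketch below shows one way to do it in Python. It assumes Mezmo's status page is hosted on Atlassian Statuspage, which exposes a standard /api/v2/status.json endpoint; confirm the exact URL against the official status page linked above before relying on it.

```python
import json
import time
import urllib.request

# Assumed URL: the standard Atlassian Statuspage endpoint. Verify it against
# Mezmo's official status page before relying on it.
STATUS_URL = "https://status.mezmo.com/api/v2/status.json"
POLL_INTERVAL_SECONDS = 300  # "every few minutes"


def fetch_status() -> str:
    """Return the overall status description, e.g. 'All Systems Operational'."""
    with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
        payload = json.load(resp)
    return payload["status"]["description"]


if __name__ == "__main__":
    last = None
    while True:
        try:
            current = fetch_status()
            if current != last:
                print(f"Mezmo status changed: {current}")
                last = current
        except OSError as exc:
            print(f"Could not reach the status page: {exc}")
        time.sleep(POLL_INTERVAL_SECONDS)
```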