Mezmo Status: Check if Mezmo down or having an outage.

Mezmo outages and incidents

Outage and incident data over the last 30 days for Mezmo.

There have been 1 outages or incidents for Mezmo in the last 30 days.

Severity Breakdown:

None: 0

Minor: 0

Major: 0

Critical: 1

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Components and Services Monitored for Mezmo

Outlogger tracks the status of these components for Xero:

Log Analysis

Alerting Active

Archiving Active

Livetail Active

Log Ingestion (Agent/REST API/Code Libraries) Active

Log Ingestion (Heroku) Active

Log Ingestion (Syslog) Active

Search Active

Web App Active

Pipeline

Destinations Active

Ingestion / Sources Active

Processors Active

Web App Active

Component	Status
Log Analysis	Active
Alerting	Active
Archiving	Active
Livetail	Active
Log Ingestion (Agent/REST API/Code Libraries)	Active
Log Ingestion (Heroku)	Active
Log Ingestion (Syslog)	Active
Search	Active
Web App	Active
Pipeline	Active
Destinations	Active
Ingestion / Sources	Active
Processors	Active
Web App	Active

Latest Mezmo outages and incidents.

View the latest incidents for Mezmo and check for official updates:

Pipeline Web UI is Unresponsive

Description: The Pipeline UI is now fully functional.

Status: Resolved

Impact: Critical | Started At: Oct. 26, 2024, 1 a.m.

Updates:

Time: Oct. 26, 2024, 8:23 p.m.

Status: Resolved

Update: The Pipeline UI is now fully functional.
Time: Oct. 26, 2024, 6:34 p.m.

Status: Monitoring

Update: The Pipeline WebUI is available and loading pages normally. We're working to resolve the root cause permanently and continuing to monitor.
Time: Oct. 26, 2024, 5:24 a.m.

Status: Monitoring

Update: The Pipeline WebUI is available, but at times pages are slow to load and metrics may be unavailable. We are taking remedial action and continuing to monitor.
Time: Oct. 26, 2024, 3:35 a.m.

Status: Monitoring

Update: We are continuing to monitor for any further issues.
Time: Oct. 26, 2024, 3:35 a.m.

Status: Monitoring

Update: The Pipeline WebUI is working now. We are monitoring.
Time: Oct. 26, 2024, 2:12 a.m.

Status: Investigating

Update: The Pipeline WebUI is still unavailable. Ingress and egress are unaffected. Our engineers are investigating.
Time: Oct. 26, 2024, 1 a.m.

Status: Investigating

Update: Our Pipeline WebUI is not loading pages. We are investigating.

Intermittent user session timeouts, requiring periodic re-authentication

Description: **Dates:** Start Time: Monday, December 4, 2023, at 10:29 UTC End Time: Monday, December 4, 2023, at 12:01 UTC Duration: 92 minutes ‌ **What happened:** Web UI users were logged out frequently – usually within 1-2 minutes of logging in. Users could successfully login again without any issues, but the session would expire shortly afterwards. ‌ **Why it happened:** It was identified that both Web UI pods and the Redis database pods, which are responsible for storing user sessions, experienced a critical memory shortage, leading to uncontrolled data purging. When this same issue happened in July 2023, our engineering team deployed a fix that enhanced how Redis stores the user session keys. This fix successfully prevented any recurrence of the problem until today. The team is still determining what made it exceed the memory limit this time. ‌ **How we fixed it:** Initially, the Web UI pods were restarted, but that did not resolve the problem permanently. The engineering team then restarted the Redis database pods and the session stopped expiring. ‌ **What we are doing to prevent it from happening again:** The team will revise the previous fix, including implementing a mechanism for the pod to automatically restart upon reaching its limit and setting up alerts to notify an engineer when it's approaching that threshold.

Status: Postmortem

Impact: Minor | Started At: Dec. 4, 2023, 12:06 p.m.

Updates:

Time: Dec. 4, 2023, 1:46 p.m.

Status: Postmortem

Update: **Dates:** Start Time: Monday, December 4, 2023, at 10:29 UTC End Time: Monday, December 4, 2023, at 12:01 UTC Duration: 92 minutes ‌ **What happened:** Web UI users were logged out frequently – usually within 1-2 minutes of logging in. Users could successfully login again without any issues, but the session would expire shortly afterwards. ‌ **Why it happened:** It was identified that both Web UI pods and the Redis database pods, which are responsible for storing user sessions, experienced a critical memory shortage, leading to uncontrolled data purging. When this same issue happened in July 2023, our engineering team deployed a fix that enhanced how Redis stores the user session keys. This fix successfully prevented any recurrence of the problem until today. The team is still determining what made it exceed the memory limit this time. ‌ **How we fixed it:** Initially, the Web UI pods were restarted, but that did not resolve the problem permanently. The engineering team then restarted the Redis database pods and the session stopped expiring. ‌ **What we are doing to prevent it from happening again:** The team will revise the previous fix, including implementing a mechanism for the pod to automatically restart upon reaching its limit and setting up alerts to notify an engineer when it's approaching that threshold.
Time: Dec. 4, 2023, 1:19 p.m.

Status: Resolved

Update: The issue has been resolved, and no further issues have been observed with user sessions.
Time: Dec. 4, 2023, 12:13 p.m.

Status: Monitoring

Update: We have implemented a fix for the user session timeouts on the Web UI, but will continue to monitor the situation closely.
Time: Dec. 4, 2023, 12:06 p.m.

Status: Investigating

Update: The Web UI is currently encountering user session timeouts, prompting customers to log in every 1-2 minutes. Our team is actively investigating the root cause of this issue, while the remaining aspects of the service remain fully functional.

Web UI is unresponsive and ingestion of log lines halted

Description: **Dates:** Start Time: 8:32 pm UTC, Tuesday August 29th, 2023 End Time: 10:04 pm UTC, Tuesday August 29th, 2023 Duration: 92 minutes ‌ **What happened:** Our Kong Gateway service stopped functioning and all connection requests to our ingestion service and web service failed. The Web UI did not load and log lines could not be sent by either our agent or API. Log lines sent using syslog were unaffected. Kong was unavailable for two periods of time: one lasting 27 minutes \(8:32 pm UTC to 8:59 pm UTC\) and another lasting 9 minutes \(9:43 pm UTC to 9:52 pm UTC\). Once Kong became available, the Web UI was immediately accessible again. Agents resent locally cached log lines \(as did any APIs implemented with retry strategies\). Our service then processed the backlog of log lines, passing them to downstream services such as alerting, live tail, archiving, and indexing \(which makes lines visible in the Web UI for searching, graphing, and timelines\). The extra processing was completed ~20 minutes after Kong returned to normal usage the first time, and ~10 minutes after the second time. ‌ **Why it happened:** The pods running our Kong Gateway were overwhelmed with connection requests. CPU increased to a point that health checks started to fail and the pods were shut down. We’ve determined through research and experimentation that the cause was a sudden, brief increase in the volume of traffic directed to our service. Our service is designed to handle increases in traffic, but these were approximately 100 times above normal usage. The source\(s\) of the traffic are unknown. The increase came in two spikes, which correspond to the two periods when Kong became unavailable. ‌ **How we fixed it:** We manually scaled up the number of pods devoted to running our Kong Gateway. During the first spike of traffic, we doubled the number of pods; during the second, we quadrupled the number. This certainly helped speed up the processing of the backlog of log lines sent by agents once Kong was again available. It’s unclear whether the higher number of pods would have been able to process the spikes of traffic as they were happening. ‌ **What we are doing to prevent it from happening again:** We are running our Kong service with more pods so there are more resources to handle any similar spikes in traffic. We will add auto-scaling to the Kong service so more pods are made available automatically as needed. We’ll also add metrics to identify the origin of any similar spikes in traffic.

Status: Postmortem

Impact: Major | Started At: Aug. 29, 2023, 9:01 p.m.

Updates:

Time: Sept. 6, 2023, 12:38 a.m.

Status: Postmortem

Update: **Dates:** Start Time: 8:32 pm UTC, Tuesday August 29th, 2023 End Time: 10:04 pm UTC, Tuesday August 29th, 2023 Duration: 92 minutes ‌ **What happened:** Our Kong Gateway service stopped functioning and all connection requests to our ingestion service and web service failed. The Web UI did not load and log lines could not be sent by either our agent or API. Log lines sent using syslog were unaffected. Kong was unavailable for two periods of time: one lasting 27 minutes \(8:32 pm UTC to 8:59 pm UTC\) and another lasting 9 minutes \(9:43 pm UTC to 9:52 pm UTC\). Once Kong became available, the Web UI was immediately accessible again. Agents resent locally cached log lines \(as did any APIs implemented with retry strategies\). Our service then processed the backlog of log lines, passing them to downstream services such as alerting, live tail, archiving, and indexing \(which makes lines visible in the Web UI for searching, graphing, and timelines\). The extra processing was completed ~20 minutes after Kong returned to normal usage the first time, and ~10 minutes after the second time. ‌ **Why it happened:** The pods running our Kong Gateway were overwhelmed with connection requests. CPU increased to a point that health checks started to fail and the pods were shut down. We’ve determined through research and experimentation that the cause was a sudden, brief increase in the volume of traffic directed to our service. Our service is designed to handle increases in traffic, but these were approximately 100 times above normal usage. The source\(s\) of the traffic are unknown. The increase came in two spikes, which correspond to the two periods when Kong became unavailable. ‌ **How we fixed it:** We manually scaled up the number of pods devoted to running our Kong Gateway. During the first spike of traffic, we doubled the number of pods; during the second, we quadrupled the number. This certainly helped speed up the processing of the backlog of log lines sent by agents once Kong was again available. It’s unclear whether the higher number of pods would have been able to process the spikes of traffic as they were happening. ‌ **What we are doing to prevent it from happening again:** We are running our Kong service with more pods so there are more resources to handle any similar spikes in traffic. We will add auto-scaling to the Kong service so more pods are made available automatically as needed. We’ll also add metrics to identify the origin of any similar spikes in traffic.
Time: Aug. 30, 2023, 12:17 a.m.

Status: Resolved

Update: This incident has been resolved.
Time: Aug. 29, 2023, 9:15 p.m.

Status: Investigating

Update: The webUI is loading consistently now, but we are still investigating.
Time: Aug. 29, 2023, 9:01 p.m.

Status: Investigating

Update: Our WebUI is not loading pages consistently. We are investigating. [Reference #3204]

User sessions are timing out and customers are required to login again

Description: **Dates:** Start Time: Monday, June 19, 2023, at 10:31 UTC End Time: Monday, June 19, 2023, at 12:35 UTC Duration: 124 minutes **What happened:** Users were being logged out of our WebUI frequently – within 1-2 minutes of logging in. Users could successfully login again, but the new session would also expire quickly. **Why it happened:** The cache of logged in users held in our Redis database was being cleared every 1-2 minutes. This caused all user sessions to expire and new logins to be required. We have yet to ascertain why the cache was being periodically cleared at frequent intervals. **How we fixed it:** We restarted the pods running the Redis database and the cache behavior returned to normal. **What we are doing to prevent it from happening again:** We will investigate further to learn why the Redis cache was being frequently cleared.

Status: Postmortem

Impact: Minor | Started At: June 19, 2023, 11:09 a.m.

Updates:

Time: June 28, 2023, 6:30 p.m.

Status: Postmortem

Update: **Dates:** Start Time: Monday, June 19, 2023, at 10:31 UTC End Time: Monday, June 19, 2023, at 12:35 UTC Duration: 124 minutes **What happened:** Users were being logged out of our WebUI frequently – within 1-2 minutes of logging in. Users could successfully login again, but the new session would also expire quickly. **Why it happened:** The cache of logged in users held in our Redis database was being cleared every 1-2 minutes. This caused all user sessions to expire and new logins to be required. We have yet to ascertain why the cache was being periodically cleared at frequent intervals. **How we fixed it:** We restarted the pods running the Redis database and the cache behavior returned to normal. **What we are doing to prevent it from happening again:** We will investigate further to learn why the Redis cache was being frequently cleared.
Time: June 19, 2023, 1:58 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: June 19, 2023, 11:27 a.m.

Status: Monitoring

Update: The fix was implemented and we are now monitoring the user login sessions.
Time: June 19, 2023, 11:13 a.m.

Status: Identified

Update: The issue has been identified, and a fix is being implemented.
Time: June 19, 2023, 11:09 a.m.

Status: Investigating

Update: User sessions to our Web UI are timing out and customers using the UI have to log in every 1-2 minutes. We are investigating why this is happening, but the rest of the service is fully functional. No other components are affected.
Time: June 19, 2023, 11:09 a.m.

Status: Investigating

Update: User sessions to our Web UI are timing out and customers using the UI have to log in every 1-2 minutes. We are investigating why this is happening, but the rest of the service is fully functional. No other components are affected.

The Web UI is not accessible

Description: **Dates:** Start Time: Monday, May 1, 2023, at 19:55 UTC End Time: Monday, May 1, 2023, at 20:11 UTC Duration: 16 minutes ‌ **What happened:** The WebUI was unresponsive, returning an error of “failure to get a peer from the ring-balancer.” **Why it happened:** All Mezmo services run within a service mesh. The portion of the mesh dedicated to the pods running our Mongo database began receiving many connection requests, more than its allocated CPU and memory could handle at once. This portion of the mesh \(which itself runs on pods\) quickly ran out of memory. This made the Mongo database unavailable to other services. The WebUI relies entirely on Mongo for account information and therefore became unresponsive, returning an error of “failure to get a peer from the ring-balancer.” While the immediate reason for the incident is clear, the root cause is still unknown. We suspect there was a change in user usage patterns \(e.g. increased traffic, login attempts, etc\) which triggered the incident. **How we fixed it:** We removed the WebUI from the service mesh. The Mongo service has more CPU and memory resources allocated to it and was able to accept the high level of connection requests successfully. WebUI usage immediately returned to normal. **What we are doing to prevent it from happening again:** We will change the default settings for the service mesh to allocate more CPU and memory resources, permanently. Afterwards, we will add the Mongo service back to the service mesh.

Status: Postmortem

Impact: Critical | Started At: May 1, 2023, 8:18 p.m.

Updates:

Time: May 8, 2023, 7:09 p.m.

Status: Postmortem

Update: **Dates:** Start Time: Monday, May 1, 2023, at 19:55 UTC End Time: Monday, May 1, 2023, at 20:11 UTC Duration: 16 minutes ‌ **What happened:** The WebUI was unresponsive, returning an error of “failure to get a peer from the ring-balancer.” **Why it happened:** All Mezmo services run within a service mesh. The portion of the mesh dedicated to the pods running our Mongo database began receiving many connection requests, more than its allocated CPU and memory could handle at once. This portion of the mesh \(which itself runs on pods\) quickly ran out of memory. This made the Mongo database unavailable to other services. The WebUI relies entirely on Mongo for account information and therefore became unresponsive, returning an error of “failure to get a peer from the ring-balancer.” While the immediate reason for the incident is clear, the root cause is still unknown. We suspect there was a change in user usage patterns \(e.g. increased traffic, login attempts, etc\) which triggered the incident. **How we fixed it:** We removed the WebUI from the service mesh. The Mongo service has more CPU and memory resources allocated to it and was able to accept the high level of connection requests successfully. WebUI usage immediately returned to normal. **What we are doing to prevent it from happening again:** We will change the default settings for the service mesh to allocate more CPU and memory resources, permanently. Afterwards, we will add the Mongo service back to the service mesh.
Time: May 1, 2023, 8:27 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: May 1, 2023, 8:18 p.m.

Status: Identified

Update: The Web UI is not accessible.

Check the status of similar companies and alternatives to Mezmo

Hudl

Systems Active

OutSystems

Systems Active

Postman

Systems Active

Mendix

Systems Active

DigitalOcean

Systems Active

Bandwidth

Issues Detected

DataRobot

Systems Active

Grafana Cloud

Systems Active

SmartBear Software

Systems Active

Test IO

Systems Active

Copado Solutions

Systems Active

CircleCI

Systems Active

Frequently Asked Questions - Mezmo

Is there a Mezmo outage?

The current status of Mezmo is: Systems Active

Where can I find the official status page of Mezmo?

The official status page for Mezmo is here

How can I get notified if Mezmo is down or experiencing an outage?

To get notified of any status changes to Mezmo, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Mezmo every few minutes and will notify you of any changes. You can veiw the status of all your cloud vendors in one dashboard. Sign up here

What does Mezmo do?

Mezmo is a cloud-based tool that helps application owners manage and analyze important business data across different areas.

Is there an Mezmo outage?

Mezmo status: Systems Active

Mezmo outages and incidents

There have been 1 outages or incidents for Mezmo in the last 30 days.

Severity Breakdown:

Tired of searching for status updates?

Components and Services Monitored for Mezmo

Log Analysis

Pipeline

Latest Mezmo outages and incidents.

Pipeline Web UI is Unresponsive

Updates:

Intermittent user session timeouts, requiring periodic re-authentication

Updates:

Web UI is unresponsive and ingestion of log lines halted

Updates:

User sessions are timing out and customers are required to login again

Updates:

The Web UI is not accessible

Updates:

Check the status of similar companies and alternatives to Mezmo

Hudl

OutSystems

Postman

Mendix

DigitalOcean

Bandwidth

DataRobot

Grafana Cloud

SmartBear Software

Test IO

Copado Solutions

CircleCI

Frequently Asked Questions - Mezmo

Is there a Mezmo outage?

Where can I find the official status page of Mezmo?

How can I get notified if Mezmo is down or experiencing an outage?

What does Mezmo do?

Start monitoring now!