Get notified about any outages, downtime, or incidents for Mezmo and 1,800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for Mezmo.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
Sign Up Now
OutLogger tracks the status of these components for Mezmo:
| Component | Status |
|---|---|
| Log Analysis | Active |
| Alerting | Active |
| Archiving | Active |
| Livetail | Active |
| Log Ingestion (Agent/REST API/Code Libraries) | Active |
| Log Ingestion (Heroku) | Active |
| Log Ingestion (Syslog) | Active |
| Search | Active |
| Web App | Active |
| Pipeline | Active |
| Destinations | Active |
| Ingestion / Sources | Active |
| Processors | Active |
| Web App | Active |
View the latest incidents for Mezmo and check for official updates:
Description: We have identified and resolved the issue that was making https://app.logdna.com unavailable.
Status: Resolved
Impact: Major | Started At: Dec. 16, 2020, 10:54 p.m.
Description: Ingestion for logs is working again. All systems are fully operational.
Status: Resolved
Impact: Critical | Started At: Dec. 3, 2020, 8:03 p.m.
Description:
### **Dates:**
The incident was opened on October 14, 2020 - 20:55 UTC. The impact was largely resolved by October 16, 2020 - 23:00 UTC. We monitored usage until the incident was closed on October 19, 2020 - 23:35 UTC.
### **What happened:**
Attempts to send new logs to our service timed out approximately 3% to 5% of the time. This resulted in intermittent failures to ingest logs from agents, code libraries, and REST API calls. Most customers use our agents, which resend logs that fail to be ingested. Customers using other means to submit logs had to use their own retry methods.
### **Why it happened:**
A node was added to our service to handle an increased need for resources. This node had previously been cordoned off because of networking failures. When it became operational, our load balancers directed a percentage of ingestion calls to it. Those calls, which amounted to about 3% to 5% of the total, would fail and eventually time out.
### **How we fixed it:**
Monitoring revealed the failures were particular to pods running on this node. We found other means to handle the increased need for resources, then stopped the pods running on the problematic node and cordoned it off again. The rate of timeouts returned to normal levels and ingestion proceeded normally.
### **What we are doing to prevent it from happening again:**
We're improving how we identify nodes that have been cordoned off because of problematic behavior and should not be reintroduced to our service.
Status: Postmortem
Impact: Minor | Started At: Oct. 14, 2020, 8:55 p.m.
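The postmortem above notes that customers sending logs through code libraries or direct REST API calls had to supply their own retry logic when ingestion requests timed out. Below is a minimal Python sketch of such a retry with exponential backoff; the endpoint URL, the `lines` payload shape, and the use of the ingestion key as the basic-auth username are assumptions based on LogDNA's public ingestion API, not details taken from this status page.

```python
import time
import requests  # third-party HTTP client: pip install requests

# Illustrative values only -- substitute your own ingestion key and endpoint.
INGEST_URL = "https://logs.logdna.com/logs/ingest"
INGESTION_KEY = "YOUR_INGESTION_KEY"


def send_logs_with_retry(lines, hostname, max_attempts=5, base_delay=1.0):
    """POST log lines, retrying with exponential backoff on timeouts."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            resp = requests.post(
                INGEST_URL,
                params={"hostname": hostname, "now": int(time.time() * 1000)},
                auth=(INGESTION_KEY, ""),  # ingestion key as basic-auth username
                json={"lines": lines},
                timeout=10,
            )
            resp.raise_for_status()  # non-2xx responses raise immediately
            return resp
        except (requests.Timeout, requests.ConnectionError) as exc:
            # Transient failure, e.g. the ~3-5% of calls that timed out during
            # this incident: back off 1s, 2s, 4s, ... and try again.
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))
    raise last_error


if __name__ == "__main__":
    # Example: a single log line that is resent if the first attempt times out.
    send_logs_with_retry(
        [{"line": "payment service started", "app": "payments"}],
        hostname="web-01",
    )
```

Agent users get this behavior without extra work, since (as the postmortem states) the agent already resends lines that fail to be ingested.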
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.