Is there a Zencargo outage?

Zencargo status: Systems Active

Last checked: 5 minutes ago

Get notified about any outages, downtime or incidents for Zencargo and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Zencargo outages and incidents

Outage and incident data over the last 30 days for Zencargo.

There have been 0 outages or incidents for Zencargo in the last 30 days.

Severity Breakdown:

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Zencargo

OutLogger tracks the status of these components for Zencargo:

Component              Status
Analytics Dashboards   Active
API                    Active
app.zencargo.com       Active
sandbox.zencargo.com   Active

Latest Zencargo outages and incidents

View the latest incidents for Zencargo and check for official updates:

Updates:

  • Time: May 16, 2020, 10:33 p.m.
    Status: Postmortem
    Update:
    # Incident summary
    Between 19:55 on Friday, May 15th 2020 and 17:01 on Saturday, May 16th we had 7 incidents on our application, adding up to 4 hours and 56 minutes of downtime and affecting 5 users of our application. The incidents were caused by introducing a new background job processing technology, which exhausted the memory on our application servers and made them unavailable to our customers (full downtime).

    # Impact
    5 customers were affected by a total downtime of 4 hours and 56 minutes (mostly during the night). There was no further impact (no mentions by the team, no social media mentions, no calls to our KAM team) in relation to this incident. The event was triggered by a change to our background processing infrastructure on Friday, which caused a memory leak and interrupted our service at the following times:

    * 19:55 to 20:01 (6 min)
    * 20:02 to 20:04 (2 min)
    * 20:07 to 20:31 (24 min)
    * 23:25 to 00:04 (39 min)
    * 03:25 to 03:30 (5 min)
    * 03:32 to 07:04 (3 h 32 min)
    * 16:53 to 17:01 (8 min)

    To put this in context, our downtime has been 2 hours in total over the last year. We usually do better, and I am sorry for the impact this had on your business. We're doing everything to have your back in the future.

    # Leadup
    The change to the background processing infrastructure was part of simplifying our Kubernetes migration by getting rid of cron jobs and moving to [Sidekiq](https://sidekiq.org), which is considered the industry standard for background processing in the Ruby world. A bug in our code caused the same background jobs to be processed multiple times at once, duplicated both on a single host and simultaneously across two instances. The team started working on the event by first making sure that our application was reachable.

    # Detection
    We started a triple pairing session to debug the situation by checking our monitoring infrastructure. We tried to find clues about what the issue was, but because we couldn't log in to either instance, we couldn't identify the root cause. We assumed it was one of:

    1. a full disk due to logs
    2. a memory leak
    3. a network outage at AWS

    We couldn't prove 1, but scheduled a review of the instance in 2 weeks. We couldn't prove 2, so we thought we would have to wait and see what happened; the last deployment had been 5 hours earlier, so we assumed the problem might not be related to the deployment. That left us with 3, which is out of our control, and we thought it might have been a temporary issue. After an hour of investigation without results, we decided to monitor the new instances that had been deployed. There we saw something odd: a background process with no jobs running was consuming 80% of the system memory:

    ```
    > ps aux --sort -rss | head -n 2
    USER   PID   %CPU %MEM VSZ     RSS     TTY STAT START TIME  COMMAND
    webapp 17417 27.8 79.4 7838288 6490376 ?   Sl   11:51 37:28 sidekiq 5.2.7 current [0 of 5 busy]
    ```

    We saw memory consumption of up to 92% of available memory by one process, and we saw the same job being run in parallel, both on one machine and across multiple machines.

    # Response
    We started introducing additional metrics logging that would give us memory reporting for the application even if the application were no longer accessible. We started reconfiguring the background processes so that they don't run with multiple instances at the same time where it made sense (especially the ocean insights subscription task). We also started kicking off a garbage collection run after each job finishes to win back the memory that was allocated for it. [Illustrative sketches of both approaches follow this update list.]

    # Recovery
    We deployed the fixes mentioned in Response. Memory consumption for the background process we monitored has been stable at under 60% for more than 3 hours. We also discovered an inefficient background process that needs to be addressed in an upcoming cooldown cycle.

    ## Root cause
    1. Moving background processes from a short-running process (a cron job) to a long-running process (a Sidekiq worker) has implications for the memory management of the Ruby process. This led to a memory leak that crashed the application server (whereas previously each cron job acted as a process manager, running the work as a sub-process that freed its memory once the job finished).
    2. Previously only one machine (the deployment lead) executed the cron jobs, so even if that machine had run out of memory, the other machine would have been unaffected.
    3. The deployment of a major infrastructure piece was done midday on a Friday, and the issue surfaced outside business hours.
    4. A lack of monitoring and logs for instance health (memory and disk space metrics) made the issue hard to reason about and hard to detect, which led to incorrect assumptions and misidentification of the issue at hand.

    ## Backlog check
    There was no ticket in our technical debt project proposing the move of the background job infrastructure. The move was mostly triggered by our Kubernetes objective to simplify background job management, without a proper risk assessment or understanding of the existing background jobs. Sidekiq and its impact on the production environment were not well understood by us.

    ## Recurrence
    We have not seen this root cause before.

    ## Lessons learned
    * The team was responsive and came together to solve the issue without any process being in place, which is a great sign that peer accountability and ownership are lived as values in the team.
    * The lack of visibility into instance health and application state in our metrics hurt us, but we're happy that we're transitioning to Kubernetes, because this would have been caught in different ways:
      * background jobs will be managed by Kubernetes independently of the application instances, so a failing background job would not have taken the application down
    * We're addressing all our outstanding monitoring shortcomings as part of the Kubernetes transition:
      * the Prometheus, Grafana and Kibana stack
      * Istio telemetry (e.g. application metrics, improved log management, distributed tracing)
    * PagerDuty for only one person no longer makes sense; we've scaled a lot and we're going to address this to make sure we have no single point of failure.

    ## Corrective actions
    * Train all engineers on the manual auto-scaling rate-limit increase that is already in place, so uptime issues can be fixed pragmatically.
    * Set up PagerDuty devices for more people on the team (plus a rotation).
    * Review the Sidekiq implementation and the queueing strategy.
    * Review the memory-leak strategy (monit or similar) to kill processes that consume too much memory until Kubernetes is in place.
  • Time: May 15, 2020, 7:50 p.m.
    Status: Resolved
    Update: We identified the issue which was related to running out of memory on our application servers. We consider the incident resolved and apologise for the inconvenience caused.
  • Time: May 15, 2020, 7:43 p.m.
    Status: Monitoring
    Update: We had a loss of network connectivity to our application servers. We resolved it by starting new instances and are still investigating the issue.
  • Time: May 15, 2020, 7:33 p.m.
    Status: Investigating
    Update: We're back online and are still investigating the issues.
  • Time: May 15, 2020, 7:25 p.m.
    Status: Investigating
    Update: We started investigating this issue.
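
The Response section of the postmortem above mentions reconfiguring background processes so that the same job no longer runs on several instances at once, but it does not say how that was implemented. A common way to get that behaviour with Sidekiq is a short-lived Redis lock around the job. The sketch below is a hypothetical illustration of that approach only; the worker name, lock key and TTL are invented for the example, and this is not Zencargo's actual code.

```ruby
# Hypothetical sketch: a Sidekiq worker that skips its run while another
# instance of the same job holds a short-lived Redis lock, so the job cannot
# run twice on one host or in parallel across hosts.
require 'sidekiq'
require 'redis'

class OceanInsightsSubscriptionJob
  include Sidekiq::Worker

  LOCK_KEY = 'locks:ocean_insights_subscription'.freeze
  LOCK_TTL = 15 * 60 # seconds; should comfortably exceed the job's normal runtime

  def perform
    redis = Redis.new(url: ENV.fetch('REDIS_URL', 'redis://localhost:6379'))

    # SET with NX/EX acquires the lock only if nobody else currently holds it.
    acquired = redis.set(LOCK_KEY, Process.pid, nx: true, ex: LOCK_TTL)
    unless acquired
      Sidekiq.logger.info('Skipping run: another instance of this job holds the lock')
      return
    end

    begin
      process_subscriptions
    ensure
      redis.del(LOCK_KEY) # release the lock even if the job raises
    end
  end

  private

  def process_subscriptions
    # ... the actual subscription work would live here ...
  end
end
```

Off-the-shelf gems such as sidekiq-unique-jobs solve the same problem; the manual lock is shown only to make the mechanism explicit.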
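The same update also describes kicking off a garbage collection run after each background job to win back its memory. With Sidekiq this is usually done via server middleware that wraps every job; the snippet below is a minimal sketch of that idea, assuming a Sidekiq 5.x setup like the one visible in the ps output above, and is illustrative rather than Zencargo's published fix.

```ruby
require 'sidekiq'

# Minimal sketch: Sidekiq server middleware that forces a GC run after every
# job so memory allocated during the job is reclaimed promptly.
class GcAfterJobMiddleware
  def call(_worker, _job, _queue)
    yield # run the actual job
  ensure
    GC.start # reclaim memory once the job has finished (or failed)
  end
end

# Typically registered in an initializer, e.g. config/initializers/sidekiq.rb:
Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add GcAfterJobMiddleware
  end
end
```

Forcing GC after every job trades a little CPU time for a flatter memory profile, which is usually an acceptable trade-off on a worker that runs a small number of long jobs like the one shown in the ps output.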

Updates:

  • Time: March 19, 2020, 8:08 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: March 19, 2020, 6:50 p.m.
    Status: Identified
    Update: The issue has been identified and a fix is being implemented.
  • Time: March 19, 2020, 6:46 p.m.
    Status: Investigating
    Update: We are investigating intermittent issues with the shipments page

Updates:

  • Time: Feb. 17, 2020, 2:18 p.m.
    Status: Resolved
    Update: This incident has been resolved and functionality has been restored. Thanks for your patience.
  • Time: Feb. 17, 2020, 1:35 p.m.
    Status: Monitoring
    Update: We've pushed a fix and are monitoring
  • Time: Feb. 17, 2020, 1:30 p.m.
    Status: Identified
    Update: We've temporarily removed some filters to reduce issues. We'll bring these back as soon as we have a fix for the underlying problem. In the meantime, please use the global search or purchase orders page to find older shipments.
  • Time: Feb. 17, 2020, 11:22 a.m.
    Status: Identified
    Update: We've pushed a fix for an issue that was causing problems with table sorting. We're still monitoring some intermittent issues some customers are experiencing and working on a fix.
  • Time: Feb. 17, 2020, 9:32 a.m.
    Status: Identified
    Update: We have identified an intermittent issue with our shipments page and are working on a fix. Thanks for your patience.

Check the status of similar companies and alternatives to Zencargo

NetSuite

Systems Active

ZoomInfo

Systems Active

SPS Commerce

Systems Active

Miro

Systems Active

Field Nation

Systems Active

Outreach

Systems Active

Own Company

Systems Active

Mindbody

Systems Active

TaskRabbit

Systems Active

Nextiva

Systems Active

6Sense

Systems Active

BigCommerce

Systems Active

Frequently Asked Questions - Zencargo

Is there a Zencargo outage?
The current status of Zencargo is: Systems Active
Where can I find the official status page of Zencargo?
The official status page for Zencargo is here
How can I get notified if Zencargo is down or experiencing an outage?
To get notified of any status changes to Zencargo, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Zencargo every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here