Get notified about any outages, downtime, or incidents for Avochato and 1,800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for Avochato.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Avochato:
Component | Status |
---|---|
API | Active |
avochato.com | Active |
Mobile | Active |
View the latest incidents for Avochato and check for official updates:
Description:
## What Happened
A subset of our cloud infrastructure suffered a hardware failure, which left the specific customers associated with that hardware unable to view broadcasts and contacts, and caused intermittent issues polling for live inbox updates. Once the hardware issue was resolved, traffic for the affected inboxes returned to normal. No data was lost.
## Resolution
Avochato cloud engineers escalated the issue to our cloud partner, who escalated it internally. The hardware failure was resolved, and our team will continue working with the vendor to ensure that future incidents of this kind properly rotate faulty hardware out of the cluster.
Status: Postmortem
Impact: Minor | Started At: July 27, 2021, 8:03 p.m.
Description:
_Note: This incident was unrelated to a CDN outage that may have impacted customers sending MMS images with public-facing URLs powered by Fastly ([read more here](https://www.fastly.com/blog/summary-of-june-8-outage))._
## What Happened
Avochato cloud services were temporarily unable to split traffic across our secure cloud databases, causing a single Avochato database to handle all load on the platform at one time. This resulted in slower than average speeds when processing updates to records and when serving pages throughout the app on mobile and desktop, as well as for requests to our API. The slowdown compounded during the middle of the day, causing delays in fetching and writing data, as well as delays syncing data to our search nodes. It is also possible that the delays impacted time-sensitive operations such as adding contacts to a broadcast before the broadcast's scheduled date.
## Resolution
The engineering team patched the issue once it was determined safe to do so. Traffic patterns returned to normal and our throughput returned to expected levels. We have also made proactive performance improvements to double the normal throughput for certain data pipelines and implemented monitoring to immediately detect this regression in traffic.
We apologize for the inconvenience to you and your team,
Christopher, CTO
Status: Postmortem
Impact: Minor | Started At: June 8, 2021, 6 p.m.
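For context on why losing the database traffic split matters: the sketch below is a hedged illustration (not Avochato's actual implementation) of how read queries are commonly rotated across a pool of replicas, and how an empty pool forces every query onto a single node. The `ReplicaPool` class, node names, and connection strings are all hypothetical.

```typescript
// Hypothetical sketch: round-robin read splitting across database replicas.
// If the replica pool is empty (the "unable to split traffic" failure mode),
// every query falls back to the primary, which must then absorb all load.

interface DbNode {
  name: string;
  url: string; // placeholder connection string
}

class ReplicaPool {
  private nextIndex = 0;

  constructor(
    private readonly primary: DbNode,
    private readonly replicas: DbNode[],
  ) {}

  /** Writes always go to the primary. */
  pickWriter(): DbNode {
    return this.primary;
  }

  /** Reads rotate across replicas; fall back to the primary if none are usable. */
  pickReader(): DbNode {
    if (this.replicas.length === 0) {
      return this.primary; // degraded mode: one node handles everything
    }
    const node = this.replicas[this.nextIndex % this.replicas.length];
    this.nextIndex += 1;
    return node;
  }
}

// Usage example with placeholder nodes.
const pool = new ReplicaPool(
  { name: "primary", url: "postgres://primary.internal/app" },
  [
    { name: "replica-1", url: "postgres://replica-1.internal/app" },
    { name: "replica-2", url: "postgres://replica-2.internal/app" },
  ],
);

console.log(pool.pickReader().name); // "replica-1"
console.log(pool.pickReader().name); // "replica-2"
```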
Description:
## What Happened
During a routine development cycle, database provisioning for a new service within our cloud services provider inadvertently applied an authorization change to our production database cluster. This caused requests to the Avochato app and homepage to fail temporarily until the change was reverted.
## Resolution
We reverted the change immediately once operations identified that it impacted production, and the rollback was deployed.
## Impact
While Avochato was unavailable, queued automations and uploads entered our retry queues and successfully continued once we recovered. Incoming messages received during the period were not lost. Moving forward, our team has adjusted our processes for provisioning services regardless of their proximity to production services.
Status: Postmortem
Impact: Critical | Started At: May 19, 2021, 8:38 p.m.
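The impact note above mentions that queued automations and uploads sat in retry queues and completed once service recovered. As a hedged illustration only (not Avochato's actual pipeline), the sketch below shows a minimal retry loop that re-enqueues failed jobs and waits between passes; the `Job` type, pass limit, and delay are assumptions.

```typescript
// Hypothetical sketch: a retry queue in which failed jobs (e.g. queued
// automations or uploads) are re-enqueued instead of dropped, so they
// complete automatically once the database recovers.

type Job = { id: string; run: () => Promise<void> };

async function drainWithRetries(queue: Job[], maxPasses = 100): Promise<void> {
  for (let pass = 0; pass < maxPasses && queue.length > 0; pass++) {
    const failed: Job[] = [];
    for (const job of queue) {
      try {
        await job.run();
      } catch {
        failed.push(job); // keep the job; it will be retried on the next pass
      }
    }
    queue = failed;
    if (queue.length > 0) {
      // wait before the next pass so a recovering database is not hammered
      await new Promise((resolve) => setTimeout(resolve, 5_000));
    }
  }
}
```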
Description:
## What Happened
During routine auto-scaling in response to automated rotation of application servers, the Avochato platform suffered network failures brokering client-side websocket requests to application servers. An application-layer fix for client-side JavaScript errors experienced by some customers inadvertently amplified the volume of retry requests, which created an insurmountable queue of requests to our websocket broker database.
Secure websocket connections are used to deliver real-time notifications and app updates in the live inbox, and have built-in retry mechanisms to keep clients connected even if they lose connectivity intermittently. A high volume of concurrent retry requests timed out and filled the retry queue, where they continued to time out and fail as browsers interacted with Avochato. This led to an effective denial of service: the retry mechanisms generated a volume of requests that compounded with peak platform usage by our user base. Exponential back-off did not keep the rate of requests from individual clients below a threshold our network could process expediently. Unlike the control we have over server-side resources, the Avochato engineering team did not have an effective means to prevent problematic clients from reconnecting, and rushed to isolate and stem the root cause, specifically by deauthenticating certain sessions remotely.
Avochato servers remained operational and available on the open internet during the impact period, but interactions with the app were queued at the network level, causing extreme delays for end users and API requests, delays tagging data and uploading contacts, and delays in attempting to make outbound calls or route incoming calls. The incident persisted while the massive queue of requests was processed, because the Avochato engineering team did not have tools available to clear the queue without risking data loss.
## Resolution
The Avochato platform auto-scaled application servers in response to the increase in traffic to handle peaks in usage. Engineers were alerted and immediately began triaging reports of latency. After evaluating network traffic and logs, our team identified the root cause and began developing mechanisms to stem websocket retry requests. Various measures taken by the engineering team decreased, but did not eliminate, the above-average in-app latency while problematic clients were still online. Some cohorts of users were securely logged out remotely in order to prevent their clients from overloading Avochato.
Back-off mechanisms have been modified to dramatically increase the period between retry requests. Meanwhile, upgrades to the open-source websocket broker libraries used by the platform were identified, patched, tested, and deployed to production application servers in order to prevent the root cause from recurring. Additional logging was implemented to better identify the volume of these requests for internal triage. Functionality to securely reload or disable runaway client requests has been developed and deployed to production in order to prevent the root cause from occurring across the platform. Additional points of failure were identified at the networking level, and upgrades to those parts of the system have been proposed and prioritized to prevent this type of service disruption from occurring in the future.
## Final Thoughts
We know how critical real-time conversations are to your team, and how important it is to be able to serve your customers throughout the business day. Our team is committed to responding as promptly as possible to incoming support requests and providing as much information as possible during incidents.
Thank you again for choosing Avochato,
Christopher Neale, CTO and Co-founder
Status: Postmortem
Impact: Minor | Started At: March 9, 2021, 6:03 p.m.
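This postmortem describes retry storms from reconnecting websocket clients and a fix that dramatically increased the period between retries. As a hedged illustration only (not Avochato's client code), the sketch below shows exponential back-off with full jitter and a retry cap, which keeps clients that disconnected together from reconnecting in lockstep against the broker. The `connect` callback and the specific limits are assumptions.

```typescript
// Hypothetical sketch: websocket reconnection with exponential back-off,
// full jitter, and a hard retry cap, so many clients retrying at once do
// not synchronize into a thundering herd against the broker.

const BASE_DELAY_MS = 1_000; // first retry after roughly 1 second
const MAX_DELAY_MS = 60_000; // never wait longer than 1 minute
const MAX_ATTEMPTS = 10;     // give up (or require re-auth) after this many tries

function backoffDelay(attempt: number): number {
  // Exponential growth capped at MAX_DELAY_MS, then randomized ("full jitter")
  // so clients that disconnected together do not reconnect together.
  const exp = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  return Math.random() * exp;
}

async function connectWithBackoff(
  connect: () => Promise<WebSocket>, // placeholder connect function
): Promise<WebSocket> {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    try {
      return await connect();
    } catch {
      const delay = backoffDelay(attempt);
      console.warn(`reconnect attempt ${attempt + 1} failed; retrying in ${Math.round(delay)}ms`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw new Error("websocket reconnection abandoned after max attempts");
}
```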
Description: This incident has been resolved. We will keep monitoring throughout the day.
Status: Resolved
Impact: Minor | Started At: March 4, 2021, 5:18 p.m.