Get notified about any outages, downtime or incidents for Kustomer and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Kustomer.
OutLogger tracks the status of these components for Kustomer:
Component | Status |
---|---|
Regional Incident | Active |
**Prod1 (US)** | Active |
Analytics | Active |
API | Active |
Bulk Jobs | Active |
Channel - Chat | Active |
Channel - Email | Active |
Channel - Facebook | Active |
Channel - Instagram | Active |
Channel - SMS | Active |
Channel - Twitter | Active |
Channel - WhatsApp | Active |
CSAT | Active |
Events / Audit Log | Active |
Exports | Active |
Knowledge base | Active |
Kustomer Voice | Active |
Notifications | Active |
Registration | Active |
Search | Active |
Tracking | Active |
Web Client | Active |
Web/Email/Form Hooks | Active |
Workflow | Active |
**Prod2 (EU)** | Active |
Analytics | Active |
API | Active |
Bulk Jobs | Active |
Channel - Chat | Active |
Channel - Email | Active |
Channel - Facebook | Active |
Channel - Instagram | Active |
Channel - SMS | Active |
Channel - Twitter | Active |
Channel - WhatsApp | Active |
CSAT | Active |
Events / Audit Log | Active |
Exports | Active |
Knowledge base | Active |
Kustomer Voice | Active |
Notifications | Active |
Registration | Active |
Search | Active |
Tracking | Active |
Web Client | Active |
Web/Email/Form Hooks | Active |
Workflow | Active |
**Third Party** | Active |
OpenAI | Active |
PubNub | Active |
View the latest incidents for Kustomer and check for official updates:
Description: Kustomer has resolved an event affecting MessageBird by WhatsApp in Prod1 that prevented outgoing messages in MessageBird by WhatsApp conversations from reaching clients. After careful monitoring, our team has determined that all affected areas are now fully restored. Please reach out to Kustomer support via email or chat if you have additional questions or concerns.
Status: Resolved
Impact: Minor | Started At: May 24, 2024, 10:53 p.m.
Description:
**Summary**
On May 12, 2024, the Kustomer system experienced 1 hour of high latency, internal errors, and agent statuses changing to Unavailable in our chat services and chat SDK. This was traced to increased errors and latency at one of our third-party vendors.
**Root Cause**
High latency and errors in PubNub, our third-party real-time event and message broadcasting provider, made chat services intermittently unusable. See [https://status.pubnub.com/incidents/gwxcdp4rqg85](https://status.pubnub.com/incidents/gwxcdp4rqg85)
**Timeline**
* 05/12 10:15 am ET: The first error in the PubNub authentication process is reported by our internal system. The Kustomer engineering team starts investigating.
* 05/12 10:32 am ET: Internal errors across several chat services are confirmed, and the investigating team declares an incident.
* 05/12 10:50 am ET: Communication with the PubNub team confirms increased errors and latency on their side and an ongoing investigation on their end.
* 05/12 11:20 am ET: PubNub's service issues are fixed; as a result, latency and errors in the Kustomer system stop occurring.
* 05/12 11:28 am ET: The Kustomer engineering team completes the redrive of pending messages and confirms the issue is resolved. The incident status is set to stable.
**Lessons/Improvements**
The PubNub team confirmed that their incident affected only auth requests that rely on their Access Manager Version 2 API, and PubNub has increased its allocation of resources to continue supporting this service. To mitigate future occurrences, Kustomer is evaluating a migration from Version 2 to Version 3.
Status: Postmortem
Impact: Minor | Started At: May 12, 2024, 3:13 p.m.
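The final timeline step above, redriving pending messages once PubNub recovered, follows a common recovery pattern: messages that could not be delivered while a dependency was degraded are parked and replayed later. Kustomer does not describe its own mechanism, so the following is only a minimal sketch, assuming an in-memory queue and a caller-supplied `send` function; the names `Message`, `Redriver`, `park`, and `redrive` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Message:
    id: str
    body: str


@dataclass
class Redriver:
    """Hypothetical holding area for messages that failed during an upstream outage."""

    pending: Deque[Message] = field(default_factory=deque)

    def park(self, msg: Message) -> None:
        # Keep the failed message in arrival order so it can be replayed later.
        self.pending.append(msg)

    def redrive(self, send: Callable[[Message], bool]) -> List[str]:
        """Replay each parked message once; re-park any that still fail."""
        delivered: List[str] = []
        # Snapshot the count so re-parked messages are not retried in this pass.
        for _ in range(len(self.pending)):
            msg = self.pending.popleft()
            if send(msg):
                delivered.append(msg.id)
            else:
                self.pending.append(msg)
        return delivered
```

Retrying each message at most once per pass keeps a still-failing dependency from turning the redrive into a tight loop; a real system would usually add backoff and persistent storage.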
Description: **Post Mortem: Workflow Failures and Platform Latency 05/10/24**
**Summary**
On May 10, 2024, the Kustomer system experienced 4 hours of workflow failures and latency due to a faulty change to core workflow actions code. After the incident was resolved, all failed workflows were redriven and no data was lost.
**Root Cause**
A change deployed to the core Kustomer workflow actions codebase caused workflows to fail. Two factors delayed resolution of the bug: failures in Kustomer's continuous integration (CI) system caused by an update pushed by a provider, and overly aggressive caching of workflows that prevented a revert of the faulty release from taking effect immediately. During recovery, the volume of events caused temporary high latency across the system until it had fully scaled up.
**Timeline**
* 05/10 9:45 am ET - First workflow failure; the engineering team is alerted and begins investigating.
* 05/10 10:20 am ET - Workflow calls begin failing at a high rate. Customers begin reporting conversations not being processed correctly.
* 05/10 10:58 am ET - The problem is identified, and the team begins rolling back a change to workflow actions, but caching issues prevent the rollback from taking effect immediately.
* 05/10 ~12:00 pm ET - An alternative fix is identified, but its rollout is blocked by CI system failures.
* 05/10 1:22 pm ET - A fix for the core problem is deployed.
* 05/10 1:45 pm ET - As the system catches up on data, message processing begins to lag.
* 05/10 2:15 pm ET - The message processing issue is identified and resolved. The platform is operating stably for new messages.
* 05/10 3:27 pm ET - All workflow events are fully redriven.
**Lessons/Improvements**
* **Deployment Improvements** - We will introduce mechanisms for bypassing caching for critical fixes to workflow actions.
* **Better Emergency Release Documentation** - We will document and test a runbook for manually releasing changes to mitigate risks introduced by failures in our CI system.
* **Monitoring** - Several monitoring gaps were identified in both production and pre-production environments that, had they been closed, could have prevented or mitigated this incident. These will be resolved as part of the incident follow-up.
Status: Postmortem
Impact: Minor | Started At: May 10, 2024, 2:48 p.m.
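The first improvement above, bypassing caching for critical fixes, addresses the delay in which a revert could not take effect while cached workflows were still served. The postmortem does not say how Kustomer will implement this. The following is only a hypothetical sketch of one way such a bypass can work: a time-to-live (TTL) cache whose `get` accepts a `bypass` flag that forces a fresh load and overwrites the stale entry. `WorkflowCache`, `ttl_seconds`, `bypass`, and the injectable `clock` are all illustrative assumptions.

```python
import time
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

V = TypeVar("V")


class WorkflowCache(Generic[V]):
    """Hypothetical TTL cache with an explicit bypass for emergency fixes."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, V]] = {}

    def get(self, key: str, loader: Callable[[str], V], bypass: bool = False) -> V:
        now = self.clock()
        entry = self._entries.get(key)
        # Serve from cache only when not bypassed and the entry is still fresh.
        if not bypass and entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = loader(key)
        # Store the freshly loaded value so later normal reads also see the fix.
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Optional[str] = None) -> None:
        # Drop one entry, or the whole cache when no key is given.
        if key is None:
            self._entries.clear()
        else:
            self._entries.pop(key, None)
```

Because a bypassed read also refreshes the stored entry, a single forced load is enough to make a hotfix visible to every later caller. Without the bypass, normal reads would keep returning the faulty version until the TTL expired.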