Get notified about any outages, downtime or incidents for Kustomer and 1800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for Kustomer.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Kustomer:
Component | Status |
---|---|
**Regional Incident** | Active |
**Prod1 (US)** | Active |
Analytics | Active |
API | Active |
Bulk Jobs | Active |
Channel - Chat | Active |
Channel - Email | Active |
Channel - Facebook | Active |
Channel - Instagram | Active |
Channel - SMS | Active |
Channel - Twitter | Active |
Channel - WhatsApp | Active |
CSAT | Active |
Events / Audit Log | Active |
Exports | Active |
Knowledge base | Active |
Kustomer Voice | Active |
Notifications | Active |
Registration | Active |
Search | Active |
Tracking | Active |
Web Client | Active |
Web/Email/Form Hooks | Active |
Workflow | Active |
**Prod2 (EU)** | Active |
Analytics | Active |
API | Active |
Bulk Jobs | Active |
Channel - Chat | Active |
Channel - Email | Active |
Channel - Facebook | Active |
Channel - Instagram | Active |
Channel - SMS | Active |
Channel - Twitter | Active |
Channel - WhatsApp | Active |
CSAT | Active |
Events / Audit Log | Active |
Exports | Active |
Knowledge base | Active |
Kustomer Voice | Active |
Notifications | Active |
Registration | Active |
Search | Active |
Tracking | Active |
Web Client | Active |
Web/Email/Form Hooks | Active |
Workflow | Active |
**Third Party** | Active |
OpenAI | Active |
PubNub | Active |
View the latest incidents for Kustomer and check for official updates:
Description: The issue affecting Gmail conversations across ALL PODS has been resolved. After careful monitoring, our team has determined that all affected areas are now fully restored. Please reach out to Kustomer support via email or chat if you have additional questions or concerns.
Status: Resolved
Impact: Minor | Started At: Sept. 13, 2024, 4:05 p.m.
Description: **Summary**
On September 12, 2024, customers on our Prod 1 cluster experienced elevated latency on multiple features of the Kustomer product.

**Root Cause**
An error during a Sobjects service deployment consumed all available hardware resources and prevented the event processor from scaling in response to increased load. This slowed event processing, which in turn delayed responses. Our standard auto-recovery attempts failed, so our engineers had to fix the issue manually.

**Timeline**
**Sep 12, 2024**
* 2:44 PM EDT: Our on-call engineers were alerted to increased latency in the platform and began investigating.
* 2:48 PM EDT: Kustomer’s support team began receiving reports of high latency across the platform.
* 3:30 PM EDT: The issue was identified in our event processor, which had hit a limit during scale-out. The on-call engineer manually increased this limit to quickly restore operations.
* 4:08 PM EDT: Latency metrics returned to normal levels.
* 5:21 PM EDT: All delayed events were processed and dead-lettered items were fully redriven.

**Lessons/Improvements**
* **Improve monitoring and alerting**: Our team was alerted to the failures and began investigating immediately, but did not have immediate visibility into their cause. We’ve begun improving our monitoring to allow quicker response times if a similar failure occurs.
  * _\[DONE\] We have fixed the observability issue in our observability tool, which will help us investigate such issues faster in the future._
* **Investigate mitigation techniques**: Although improved monitoring would let us respond to and resolve this issue faster in the future, ideally we want to prevent it from happening at all. We’ve already begun researching ways to reduce the chance of recurrence.
  * _\[DONE\] Optimize release schedule and cadence._
  * _\[IN PROGRESS\] Investigate memory and CPU limits on the event handling service._
Status: Postmortem
Impact: Critical | Started At: Sept. 12, 2024, 7:23 p.m.
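The September 12 postmortem says that, once latency recovered, "deadlettered items were fully redriven." To illustrate that term only, here is a minimal in-memory Python sketch, assuming a simple retry-then-dead-letter policy: an event that fails `max_attempts` times is parked in a dead-letter queue, and a redrive moves it back to the main queue with a fresh retry budget once the processor is healthy again. `EventQueue`, `max_attempts`, `overloaded`, and `healthy` are hypothetical names; the postmortem does not describe how Kustomer's event pipeline actually implements this.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EventQueue:
    """Hypothetical in-memory event queue with a dead-letter queue (DLQ)."""
    max_attempts: int = 3
    main: deque = field(default_factory=deque)
    dead_letter: deque = field(default_factory=deque)
    attempts: dict = field(default_factory=dict)

    def publish(self, event_id: str) -> None:
        self.main.append(event_id)

    def process_all(self, handler: Callable[[str], None]) -> list:
        """Drain the main queue; an event that fails max_attempts times is dead-lettered."""
        processed = []
        while self.main:
            event_id = self.main.popleft()
            try:
                handler(event_id)
                processed.append(event_id)
            except Exception:
                self.attempts[event_id] = self.attempts.get(event_id, 0) + 1
                if self.attempts[event_id] >= self.max_attempts:
                    self.dead_letter.append(event_id)  # stop retrying, park it in the DLQ
                else:
                    self.main.append(event_id)  # retry on a later pass
        return processed

    def redrive(self) -> int:
        """Move every dead-lettered event back to the main queue with a fresh retry budget."""
        count = len(self.dead_letter)
        while self.dead_letter:
            event_id = self.dead_letter.popleft()
            self.attempts.pop(event_id, None)
            self.main.append(event_id)
        return count


def overloaded(event_id: str) -> None:
    # Simulates a processor that cannot keep up while it is unable to scale out.
    raise TimeoutError(f"{event_id}: event processor at capacity")


def healthy(event_id: str) -> None:
    # Simulates normal processing after capacity is restored.
    pass


q = EventQueue()
for i in range(3):
    q.publish(f"evt-{i}")

q.process_all(overloaded)        # every event exhausts its retries
print(len(q.dead_letter))        # 3
print(q.redrive())               # 3
print(q.process_all(healthy))    # ['evt-0', 'evt-1', 'evt-2']
```

The key design point the sketch shows is that dead-lettering stops an overloaded system from retrying the same events forever, while redrive is a separate, deliberate step taken only after the underlying capacity problem is fixed.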
Description: **Summary**
The Search service became unavailable for multiple orgs. Clients were unable to run searches for a period of time, but after the suspected cause was identified, the service was restored and operational for all but one client by end of day. The following day, a fix was implemented that reduced erroneous search queries by 88%, restoring access to the Search service for all clients.

**What happened**
* Clients were not able to access or interact with the Search service.
* A temporary fix was applied to bring Search service utilization metrics back to normal.
* The Search service was restored.
* The investigation identified the suspected issue that brought the Search service down.
* Manual testing with queries confirmed a viable solution.
* A solution was implemented to improve query times and reopen the service to all clients.

**Timeline of Events**
_**08/07/2024**_
* 11:53: Alerted to the issue via threshold alerts.
* 12:17: Customer Support receives the first reports of search issues.
* 12:42: Dev finds that multiple search cluster nodes have high CPU and memory utilization.
* 13:17: Dev proposes a temporary solution to allow searches to function.
* 13:25: Dev tracks all impacted clients according to Sentry alerts.
* 13:49: Dev applies the solution.
* 14:43: Improvements are noted as node CPU and memory utilization recover.

_**08/08/2024**_
* 08:45: Dev applies a fix to the problem searches that were still present.
* 10:23: Dev finishes adding further fixes to the client's searches and restores search aggregation functionality for the client.
* 12:05: Incident marked resolved.
Status: Postmortem
Impact: Minor | Started At: Aug. 7, 2024, 7:32 p.m.
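The August 7 Search timeline starts with a threshold alert and notes high CPU and memory utilization on multiple search cluster nodes. As a rough illustration, here is a minimal Python sketch of a per-node threshold check, assuming point-in-time samples. The node names, the 90% CPU and 85% memory limits, `NodeMetrics`, and `breaching_nodes` are all hypothetical; the postmortem does not state the real thresholds or tooling.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical limits; the postmortem does not give the real values.
CPU_THRESHOLD = 90.0
MEM_THRESHOLD = 85.0


@dataclass(frozen=True)
class NodeMetrics:
    """One utilization sample for a single (hypothetical) search cluster node."""
    node: str
    cpu_pct: float
    mem_pct: float


def breaching_nodes(
    metrics: Iterable[NodeMetrics],
    cpu_limit: float = CPU_THRESHOLD,
    mem_limit: float = MEM_THRESHOLD,
) -> List[Tuple[str, List[str]]]:
    """Return (node, reasons) for every node whose CPU or memory is strictly above its limit."""
    alerts = []
    for m in metrics:
        reasons = []
        if m.cpu_pct > cpu_limit:
            reasons.append(f"cpu {m.cpu_pct:.0f}% > {cpu_limit:.0f}%")
        if m.mem_pct > mem_limit:
            reasons.append(f"mem {m.mem_pct:.0f}% > {mem_limit:.0f}%")
        if reasons:
            alerts.append((m.node, reasons))
    return alerts


sample = [
    NodeMetrics("search-1", 97.0, 91.0),
    NodeMetrics("search-2", 45.0, 60.0),
    NodeMetrics("search-3", 93.0, 70.0),
]

for node, reasons in breaching_nodes(sample):
    print(node, "; ".join(reasons))
# search-1 cpu 97% > 90%; mem 91% > 85%
# search-3 cpu 93% > 90%
```

Production alerting typically also requires a breach to persist for some duration before paging, so that a single noisy sample does not trigger an incident; this sketch checks one sample per node only.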
Description: Kustomer has resolved an event affecting core platform systems that caused latency in searches and timelines. After careful monitoring, our team has concluded that all affected areas are now fully restored. Please reach out to Kustomer support by email with any additional questions or concerns.
Status: Resolved
Impact: Minor | Started At: July 22, 2024, 3:01 p.m.