Last checked: 4 minutes ago
Get notified about any outages, downtime or incidents for Kustomer and 1800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for Kustomer.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Kustomer:
Component | Status |
---|---|
Regional Incident | Active |
Prod1 (US) | Active |
Analytics | Active |
API | Active |
Bulk Jobs | Active |
Channel - Chat | Active |
Channel - Email | Active |
Channel - Facebook | Active |
Channel - Instagram | Active |
Channel - SMS | Active |
Channel - Twitter | Active |
Channel - WhatsApp | Active |
CSAT | Active |
Events / Audit Log | Active |
Exports | Active |
Knowledge base | Active |
Kustomer Voice | Active |
Notifications | Active |
Registration | Active |
Search | Active |
Tracking | Active |
Web Client | Active |
Web/Email/Form Hooks | Active |
Workflow | Active |
Prod2 (EU) | Active |
Analytics | Active |
API | Active |
Bulk Jobs | Active |
Channel - Chat | Active |
Channel - Email | Active |
Channel - Facebook | Active |
Channel - Instagram | Active |
Channel - SMS | Active |
Channel - Twitter | Active |
Channel - WhatsApp | Active |
CSAT | Active |
Events / Audit Log | Active |
Exports | Active |
Knowledge base | Active |
Kustomer Voice | Active |
Notifications | Active |
Registration | Active |
Search | Active |
Tracking | Active |
Web Client | Active |
Web/Email/Form Hooks | Active |
Workflow | Active |
Third Party | Active |
OpenAI | Active |
PubNub | Active |
View the latest incidents for Kustomer and check for official updates:
Description: Kustomer has resolved an event affecting Search and Reporting. Please reach out to the Kustomer Support team if you have additional questions or concerns.
Status: Resolved
Impact: Minor | Started At: July 10, 2023, 9:47 a.m.
Description: # **Summary**

On June 13th, 2023 at 3:00 PM EDT, our engineering team was alerted to increased errors in Lambda functions hosted on AWS. AWS published a status update at 3:08 PM EDT regarding a system-wide issue that was causing these errors. These Lambda functions have various responsibilities; their primary user-facing purpose is to index search results and audit log events. Once the issue was acknowledged to be contained to the us-east-1 region, recovery measures were initiated in anticipation of a broader AWS outage. AWS resolved the issue on their end at 4:48 PM EDT. After resolution, Lambdas resumed processing data, and all new search data was populated by 5:44 PM EDT. In addition, AWS reported increased error rates for Amazon Connect during this outage.

# **Root Cause**

Search results were delayed and not populating in prod1 due to an AWS outage affecting Lambdas in us-east-1.

# **Timeline**

June 13, 2023:

* 3:00 PM EDT - On-call engineers were paged about increased errors in an AWS Lambda function and began investigating. At this time, the AWS console was also not accessible, causing a small delay in troubleshooting.
* 3:08 PM EDT - AWS publishes their first update reporting increased error rates and latencies on their status page.
* 3:15 PM EDT - On-call engineers determine that Lambdas in prod1 are degraded or not functioning.
* 3:19 PM EDT - AWS publishes an update reporting that Lambda functions are experiencing elevated error rates.
* 3:21 PM EDT - Pre-emptive efforts begin to transition to a different region.
* 3:26 PM EDT - AWS reports that they've identified the root cause of increased errors in AWS Lambda functions and are working to resolve it.
* 3:55 PM EDT - Kustomer Statuspage update for the incident is published.
* 4:05 PM EDT - AWS status page update **regarding Amazon Connect errors**: "We are experiencing degraded contact handling in the US-EAST-1 Region. Callers may fail to connect and chats may fail to initiate. Agents may experience issues logging in or being connected with end-customers."
* 4:29 PM EDT - Impact to Kustomer systems is contained to search results not being updated. Efforts shift to focus on populating search results with new data.
* 4:40 PM EDT - AWS status page update **regarding Amazon Connect errors**: "We have identified the root cause of the degraded contact handling in the US-EAST-1 Region. Callers may fail to connect and chats and tasks may fail to initiate. Agents may also experience issues logging in or being connected with end-customers. Mitigation efforts are underway."
* 4:48 PM EDT - AWS reports that a fix was implemented and services are recovering. Internal metrics show search results are populating.
* 5:00 PM EDT - AWS status page update: "Many AWS services are now fully recovered and marked Resolved on this event. We are continuing to work to fully recover all services."
* 5:02 PM EDT - AWS status page update **regarding Amazon Connect errors**: "Between 11:49 AM and 1:40 PM PDT, we experienced degraded contact handling in the US-EAST-1 Region. Callers may have failed to connect and chats and tasks may have failed to initiate. Agents may also have experienced issues logging in or being connected with end-customers. The issue has been resolved and the service is operating normally."
* 5:29 PM EDT - AWS status page update: "Lambda synchronous invocation APIs have recovered. We are still working on processing the backlog of asynchronous Lambda invocations that accumulated during the event, including invocations from other AWS services (such as SQS and EventBridge). Lambda is working to process these messages during the next few hours and during this time, we expect to see continued delays in the execution of asynchronous invocations."
* 5:49 PM EDT - AWS status page update: "We are working to accelerate the rate at which Lambda asynchronous invocations are processed, and now estimate that the queue will be fully processed over the next hour. We expect that all queued invocations will be executed."

# **Lessons/Improvements**

* Investigate additional methods and strategies to improve search indexing reliability.
* Add new alarms for detecting lower than usual traffic to our indexing services. Although our system alerted us to events building up in our event streams, we observed that we could improve our response time by adding alarms for lower-than-usual event traffic to our indexers (see the sketch after this incident entry).
Status: Postmortem
Impact: Minor | Started At: June 13, 2023, 7:55 p.m.
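The "lower than usual traffic" alarm mentioned in the lessons above is not spelled out in the postmortem. Purely as an illustration of the idea, a minimal sketch of a fixed-floor CloudWatch alarm on an indexing Lambda's invocation count might look like the following; the function name, SNS topic ARN, and thresholds are hypothetical placeholders, not Kustomer's values.

```python
# Minimal sketch of a "lower than usual traffic" alarm for an indexing Lambda.
# Fires when the function receives far fewer invocations than its normal floor,
# which can indicate an upstream outage even when error rates look fine.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="search-indexer-low-invocations",
    AlarmDescription="Indexer Lambda is receiving far fewer invocations than normal",
    Namespace="AWS/Lambda",
    MetricName="Invocations",
    Dimensions=[{"Name": "FunctionName", "Value": "search-indexer"}],  # hypothetical function name
    Statistic="Sum",
    Period=300,                    # evaluate 5-minute windows
    EvaluationPeriods=3,           # alarm only after 15 minutes of sustained low traffic
    Threshold=50,                  # placeholder floor for invocations per window
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="breaching",  # no data at all also means the indexer is idle
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall-alerts"],  # hypothetical SNS topic
)
```

A fixed floor is the simplest version of this idea; a CloudWatch anomaly-detection alarm could replace the hard-coded threshold with a learned baseline that tracks normal daily traffic patterns.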
Description: # **Summary**

On Friday, June 9th, 2023, Elastic Cloud clusters in our US Prod1 POD were non-responsive. Events originating from this region were not properly indexed in ElasticSearch, leading to degraded Search and Reporting functionality in Kustomer. This affected a small subset of customers from 7:00 pm EST to 8:40 pm EST.

# **Root Cause**

Elastic Cloud suffered an incident resulting in connectivity loss to clusters in their us-east-1 region, which is where Kustomer's US Prod1 POD resides. Elastic Cloud has provided a Root Cause Analysis, which indicates the following:

* Proxies were deployed to us-east-1 with an invalid configuration, causing all proxies to be non-functional.
* The deployment process did not detect the failure during deployment, which eventually resulted in all proxies being deployed with invalid configurations.

# **Timeline**

* 06/09 6:59 pm - Elastic Cloud clusters in the us-east-1 region become unresponsive, leading to failed requests to index data in ElasticSearch.
* 06/09 7:02 pm - Engineers are alerted of Elastic Cloud cluster connectivity issues.
* 06/09 7:05 pm - Incident Engineer begins the internal incident process.
* 06/09 7:11 pm - Incident Engineer escalates the issue.
* 06/09 7:24 pm - Elastic Cloud updates their [StatusPage](https://status.elastic.co/incidents/07bw653d2677) to indicate an ongoing investigation into connectivity issues with us-east-1 clusters.
* 06/09 7:42 pm - Elastic Cloud confirms the proxy incident in us-east-1 via [StatusPage](https://status.elastic.co/incidents/07bw653d2677) and that they are working towards a solution.
* 06/09 8:20 pm - Elastic Cloud cluster connectivity is restored, and Kustomer resumes indexing events in Elastic Cloud clusters.
* 06/09 8:42 pm - Search and Reporting functionality in Kustomer is fully restored.

# **Lessons/Improvements**

While Kustomer does have multi-region support and region-specific DR strategies for our primary cloud-hosted databases and search clusters, there are additional opportunities to improve time to recover in these situations. The engineering team is continually improving our system availability and actively working on projects that will further improve our performance and uptime. Some specific action items include:

* Reviewing internal processes surrounding ElasticSearch Disaster Recovery.
* Researching additional mitigation strategies for regional ElasticSearch Disaster Recovery (see the sketch after this incident entry).
Status: Postmortem
Impact: Minor | Started At: June 9, 2023, 11:37 p.m.
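The DR action items above are stated only at a high level. As a rough sketch of what a regional failover for indexing writes could look like, the snippet below health-checks a primary ElasticSearch cluster and falls back to a replica in another region; the cluster URLs, index name, and the omission of authentication are all assumptions for illustration, not Kustomer's actual configuration.

```python
# Minimal sketch of the regional failover idea: index into a primary ElasticSearch
# cluster, but fall back to a replica cluster in another region when the primary's
# health check fails. URLs and index name are hypothetical; auth is omitted.
import requests

PRIMARY = "https://es-prod1-us-east-1.example.com:9243"   # hypothetical primary cluster
FALLBACK = "https://es-prod1-us-west-2.example.com:9243"  # hypothetical replica cluster


def cluster_is_healthy(base_url: str) -> bool:
    """Return True if the cluster responds and reports green or yellow health."""
    try:
        resp = requests.get(f"{base_url}/_cluster/health", timeout=5)
        return resp.ok and resp.json().get("status") in ("green", "yellow")
    except requests.RequestException:
        return False


def index_event(doc: dict, index: str = "audit-events") -> requests.Response:
    """Index a document, preferring the primary cluster and failing over if needed."""
    target = PRIMARY if cluster_is_healthy(PRIMARY) else FALLBACK
    resp = requests.post(f"{target}/{index}/_doc", json=doc, timeout=5)
    resp.raise_for_status()
    return resp


# Example: index a single event document.
index_event({"type": "conversation.updated", "at": "2023-06-09T19:05:00Z"})
```

Redirecting writes is only half of a DR strategy: the replica also has to be kept in sync, for example via cross-cluster replication or by replaying the same event stream into both clusters, which is the kind of mitigation the action items above refer to.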
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or down time. Join for free - no credit card required.