Get notified about any outages, downtime or incidents for Ultimate AI and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Ultimate AI.
OutLogger tracks the status of these components for Ultimate AI:
Component | Status |
---|---|
Analytics | Active |
Backend Integrations | Active |
Core Services | Active |
Dashboard | Active |
Public API | Active |
Web Chat Widget | Active |
Chat integrations | Active |
Freshchat | Active |
Freshdesk Automation | Active |
Giosg Automation | Active |
Intercom Automation | Active |
LiveChat.com CRM Integration | Active |
Salesforce | Active |
Sunshine | Active |
Zendesk Chat | Active |
Zendesk Support Automation | Active |
Thirdparty Providers | Active |
LiveChat Agent apps | Active |
LiveChat API | Active |
LiveChat Chat widget | Active |
LiveChat Integrations | Active |
LiveChat Subprocessors' service | Active |
Salesforce EU37 | Active |
Salesforce EU40 | Active |
Salesforce UM7 | Active |
Zendesk Sunshine Conversations Sunshine Conversations Core API | Active |
Zendesk Sunshine Conversations Web SDK | Active |
View the latest incidents for Ultimate AI and check for official updates:
Description: This incident has been resolved.
Status: Resolved
Impact: Major | Started At: Nov. 22, 2024, 10:08 a.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Major | Started At: Nov. 15, 2024, 2 p.m.
Description: We have not observed any errors, and all of our services are functioning normally.
Status: Resolved
Impact: Major | Started At: Nov. 11, 2024, 3:21 p.m.
Description: RCA that we received from Azure regarding the incident:

> **Azure OpenAI – 408 and 503 errors while using GPT-4o model**
>
> **What happened?**
> Between 05:08 UTC on 15 October 2024 and 15:37 UTC on 17 October 2024, a platform issue resulted in an impact to the Azure OpenAI service. Customers experienced high latency and 5XX/408 errors when making requests to GPT-4o, version 2024-08-06.
>
> **What went wrong and why?**
> The system is generally designed for resiliency, ensuring it recovers after a crash and continues to serve traffic. Upon investigating this incident, we identified two issues. Firstly, large JSON schema inputs caused crashes due to specific customer usage patterns. Secondly, slow memory leaks led to low available memory, causing system component failures. This prevented the service from restarting normally after crashes, resulting in a deadlock that required a full node reboot to recover. Notably, the two issues are interrelated - the first leads to failures with crashes, and the second worsens due to crashes. However, any failure causing a crash can exacerbate the latter, not just those crashes caused by the former.
>
> **How did we respond?**
> After identifying a usage pattern causing a code regression in the service infrastructure, we deployed a hotfix following our Safe Deployment Practices to resolve the issue.
>
> * 05:08 UTC on 15 October 2024 – Customer impact began, trace number of errors and latency.
> * 06:27 UTC on 15 October 2024 – Service monitoring threshold exceeded, and internal alerts fired for an increase of errors.
> * 07:29 UTC on 15 October 2024 – The team starts the investigation and attempts to use the automated solution to mitigate the problem.
> * 08:08 UTC on 15 October 2024 – Engineers remain engaged and determined impact to a small subset of customers.
> * 12:46 UTC on 15 October 2024 – We concluded that the automated solution was ineffective in assisting with mitigation efforts.
> * 13:30 UTC on 15 October 2024 – Additional impact detected, and initial communications sent to targeted impacted subscriptions.
> * 14:45 UTC on 15 October 2024 – We identified the usage pattern that triggered the bug and initiated throttling measures to mitigate its impact.
> * 14:45 UTC on 15 October 2024 – Availability and latency stabilizing for most customers.
> * 15:44 UTC on 15 October 2024 – A hotfix has been developed and validated for a permanent solution, following safe deployment practices.
> * 19:00 UTC on 15 October 2024 – Additional improvement in availability and latency for a subset of customers.
> * 20:21 UTC on 15 October 2024 – The deployment of the hotfix in the Sweden Central region has been successfully completed.
> * 23:28 UTC on 15 October 2024 – A temporary solution was communicated, advising customers to switch to model version 2024-05-13 as a workaround.
> * 00:00 UTC on 16 October 2024 – Mitigation measures were being maintained for customer workloads. The deployment of the hotfix was ongoing.
> * 06:03 UTC on 16 October 2024 – During the hotfix deployment across the remaining impacted regions, we initiated manual mitigation that required rebooting some nodes.
> * 10:20 UTC on 16 October 2024 – We have also started the automated mitigation, which has begun the reboot process for any affected nodes.
> * 14:22 UTC on 16 October 2024 – All nodes have recovered from auto reboot mitigation.
> * 15:37 UTC on 17 October 2024 – All regions have been deployed with the hotfix. The issue has been mitigated, and customer subscriptions that were throttled have returned to normal. Service is restored, and customer impact is resolved.
>
> **How are we making incidents like this less likely or less impactful?**
>
> * We are working to resolve the memory leak issue in the service. (Estimated completion: December 2024)
> * We are optimizing hotfix deployment to reduce the time from 1.5 days. (Estimated completion: February 2025)

Status: Postmortem
Impact: Minor | Started At: Oct. 17, 2024, 2:29 p.m.
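
The workaround mentioned in the RCA (switching from model version 2024-08-06 to 2024-05-13 while 408 and 5XX errors persist) could be handled on the client side with a retry-and-fallback wrapper. The following is only a minimal sketch under stated assumptions: `call_with_fallback`, `is_retryable`, and `fake_send` are hypothetical names introduced here for illustration, `send` stands in for whatever client call an integration actually makes, and none of this is taken from Ultimate AI's or Azure's code.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def is_retryable(status: int) -> bool:
    """True for the error classes named in the RCA: 408 and any 5XX."""
    return status == 408 or 500 <= status <= 599


def call_with_fallback(
    send: Callable[[str], Tuple[int, str]],
    versions: Sequence[str] = ("2024-08-06", "2024-05-13"),
    attempts_per_version: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try each model version in order and return (version_used, body).

    `send` is a hypothetical callable that takes a model version and
    returns (http_status, body); it is an assumption of this sketch.
    """
    last_status: Optional[int] = None
    for version in versions:
        for attempt in range(attempts_per_version):
            status, body = send(version)
            if 200 <= status < 300:
                return version, body
            if not is_retryable(status):
                # Client errors such as 400 or 401 will not improve on retry.
                raise RuntimeError(f"non-retryable status {status}")
            last_status = status
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError(f"all versions failed, last status {last_status}")


def fake_send(version: str) -> Tuple[int, str]:
    """Hypothetical stand-in: the newer version fails, the older one works."""
    if version == "2024-08-06":
        return 503, ""
    return 200, "ok"


# Skip real sleeping in this demonstration.
result = call_with_fallback(fake_send, sleep=lambda _: None)
```

In this sketch the fallback is only tried after the preferred version has exhausted its retries, so normal traffic keeps using the newer version once the service recovers.
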
Description: We have not observed any errors, and all of our services are functioning normally.
Status: Resolved
Impact: Minor | Started At: Oct. 15, 2024, 7:29 a.m.