Get notified about any outages, downtime or incidents for UiPath and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for UiPath.
OutLogger tracks the status of these components for UiPath:
Component | Status |
---|---|
Action Center | Active |
AI Center | Active |
Apps | Active |
Automation Cloud | Active |
Automation Hub | Active |
Automation Ops | Active |
Autopilot for Everyone | Active |
Cloud Robots - VM | Active |
Communications Mining | Active |
Computer Vision | Active |
Context Grounding | Active |
Customer Portal | Active |
Data Service | Active |
Documentation Portal | Active |
Document Understanding | Active |
Insights | Active |
Integration Service | Active |
Marketplace | Active |
Orchestrator | Active |
Process Mining | Active |
Serverless Robots | Active |
Solutions Management | Active |
Studio Web | Active |
Task Mining | Active |
Test Manager | Active |
View the latest incidents for UiPath and check for official updates:
Description: A recent deployment caused an issue in the queue processing workflow, leading to failures in processing in some specific scenarios. The issue was resolved by rolling back the deployment. We apologise for the inconvenience and are working on measures to prevent similar incidents in the future.
Status: Resolved
Impact: Minor | Started At: Oct. 21, 2024, 11:18 a.m.
Description: During an infrastructure change in Orchestrator, the DNS cache did not flush completely, which caused a portion of requests to fail. We will work to avoid such issues in the future. Thank you for your patience and understanding.
Status: Resolved
Impact: None | Started At: Oct. 18, 2024, 12:34 p.m.
Description:

## Customer impact

Between October 17, 2024, at 21:00 UTC and October 18, 2024, at 13:00 UTC, some customers with tenants hosted in the US region may have experienced errors or increased latency while using the Generative AI capabilities in Document Understanding.

## Background context

Document Understanding leverages Azure OpenAI GPT to power features that require large language models (LLMs). UiPath partners with Azure to secure a specific capacity for these advanced AI services. However, this Azure capacity is limited, and acquiring additional resources on short notice is only sometimes feasible. The allocated capacity is shared among several UiPath products, with Document Understanding receiving a more significant portion of this quota.

To ensure the fair and efficient use of these limited resources, UiPath has implemented a quota system that allocates capacity to each customer. This system prevents any single customer from consuming excessive resources, thereby safeguarding the service's performance and accessibility for all users. It ensures that high usage by one customer does not negatively impact the experience of others.

In addition to the quota system, Document Understanding incorporates an internal retry mechanism to shield customers from intermittent errors, such as brief periods of quota exhaustion. This mechanism automatically retries failed requests, enhancing the service's reliability and robustness. It helps maintain a seamless user experience even during temporary resource constraints.

Despite these measures, customers may experience errors or increased latency on occasion due to inherent limitations in resource capacity.

## Root cause

Due to a misconfiguration in our quota management system, the existing quota allocated for Document Understanding was too low to support the increased traffic during the incident. Consequently, a small subset of customers could consume all the LLM capacity allocated for Document Understanding. This unintended usage caused other customers to experience significantly increased latency and required multiple attempts to complete operations that relied on this resource. The issue arose when we resized and reallocated resources but did not update the quota configuration.

## Detection

Although we received alerts about the increased error rate, the internal retry mechanism, designed to handle intermittent errors, led to these alerts being incorrectly categorized as low severity. This misclassification delayed our awareness and response to the situation.

A few customers reported experiencing issues with Document Understanding not functioning correctly. Their reports enabled us to correlate these incidents with the lower severity alerts we had received, leading to the start of the investigation.

## Response

We took the time to identify the root cause of the incident. After fully understanding the problem, we increased the quota allocated for Document Understanding, which resolved the incident.

## Follow-up

To prevent similar issues in the future and enhance our service reliability, we are implementing several key improvements:

* **Enhance Alerting Mechanisms**: We are improving our alert systems to provide immediate notifications for both quota issues and retried errors. This enhancement will enable us to respond more swiftly to potential problems, minimizing any impact on our customers.
* **Transition to a Dynamic Quota System**: We will replace our fixed-rate quota with a dynamic allocation system that adjusts in real-time based on the current load, available vendor capacity, customer eligibility, and other pertinent factors. This approach will ensure a more equitable and efficient distribution of resources across all customers.
* **Offer "Bring Your Own LLM Subscription" Options**: We are investigating the possibility of offering a "Bring Your Own LLM Subscription" feature. This option would allow customers to utilize their language model subscriptions within our platform, providing greater flexibility and potentially reducing dependency on shared resources.
Status: Postmortem
Impact: Major | Started At: Oct. 17, 2024, 9 p.m.
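The Document Understanding postmortem above describes two interacting mechanisms: a per-customer quota over shared LLM capacity, and an internal retry that hides brief quota exhaustion from callers. The Python sketch below is only an illustration of that general pattern, not UiPath's implementation. Every name in it (`PerCustomerQuota`, `QuotaExceeded`, `call_with_retry`, the `metrics` dictionary) is hypothetical, and a fixed-window counter is assumed in place of whatever scheme the real system uses. It counts each retried attempt so that alerting can still see errors the retry would otherwise mask, which is the gap the postmortem's detection section points to.

```python
import time
from collections import defaultdict


class QuotaExceeded(Exception):
    """Raised when a customer has used up its share for the current window."""


class PerCustomerQuota:
    """Hypothetical fixed-window limiter.

    Each customer may make at most `per_customer_limit` requests per window,
    so one customer cannot drain capacity that others depend on.
    """

    def __init__(self, per_customer_limit, window_seconds=60.0, clock=time.monotonic):
        self.per_customer_limit = per_customer_limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._window_start = clock()
        self._used = defaultdict(int)

    def acquire(self, customer_id):
        now = self.clock()
        # Start a fresh window once the current one has elapsed.
        if now - self._window_start >= self.window_seconds:
            self._window_start = now
            self._used.clear()
        if self._used[customer_id] >= self.per_customer_limit:
            raise QuotaExceeded(customer_id)
        self._used[customer_id] += 1


def call_with_retry(operation, max_attempts=3, base_delay=0.5,
                    sleep=time.sleep, metrics=None):
    """Retry `operation` on QuotaExceeded with exponential backoff.

    Retried attempts are counted in metrics["retried"] and exhausted calls in
    metrics["failed"], so monitoring can alert on errors hidden by the retry.
    """
    metrics = metrics if metrics is not None else {}
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except QuotaExceeded:
            if attempt == max_attempts:
                metrics["failed"] = metrics.get("failed", 0) + 1
                raise
            metrics["retried"] = metrics.get("retried", 0) + 1
            sleep(base_delay * 2 ** (attempt - 1))
```

In a setup like this, a caller would wrap each request as `call_with_retry(lambda: quota.acquire(customer_id), metrics=metrics)` and export `metrics["retried"]` to monitoring. A rising retry count can then raise an alert even while most requests still succeed on a later attempt.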
Description: This incident has been resolved.
Status: Resolved
Impact: Critical | Started At: Oct. 17, 2024, 3:07 p.m.
Description: A deployment error caused Apps services to go down for our customers in the US region. The issue was identified and resolved by rolling back the changes. We are implementing measures to prevent similar incidents in the future. We apologise for the inconvenience caused.
Status: Resolved
Impact: Critical | Started At: Oct. 17, 2024, 5:08 a.m.