Get notified about any outages, downtime, or incidents for UiPath and 1800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for UiPath.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for UiPath:
Component | Status |
---|---|
Action Center | Active |
AI Center | Active |
Apps | Active |
Automation Cloud | Active |
Automation Hub | Active |
Automation Ops | Active |
Autopilot for Everyone | Active |
Cloud Robots - VM | Active |
Communications Mining | Active |
Computer Vision | Active |
Context Grounding | Active |
Customer Portal | Active |
Data Service | Active |
Documentation Portal | Active |
Document Understanding | Active |
Insights | Active |
Integration Service | Active |
Marketplace | Active |
Orchestrator | Active |
Process Mining | Active |
Serverless Robots | Active |
Solutions Management | Active |
Studio Web | Active |
Task Mining | Active |
Test Manager | Active |
View the latest incidents for UiPath and check for official updates:
Description:

## Background

UiPath Automation Cloud services are deployed globally across multiple regions. UiPath has its own identity service that creates tokens, and, like other cloud services, the identity service is available in multiple regions. Each time there is a cloud code release, we follow safe deployment practices (SDP): we gradually introduce changes to our code so that we can balance the release's exposure against its proven performance. Once a release has proven itself in production, it becomes available to tiers of broader audiences until everyone is using it. SDP also helps protect against retry storms by limiting the rate of requests.

## Customer impact

Enterprise customers in the European region had issues connecting to UiPath Automation Cloud on March 13, 2024, starting at 13:04 UTC. The incident had three phases:

* Between 13:04 UTC and 14:07 UTC, all requests failed with an HTTP 500 error because one of our essential background services wasn't working.
> All new token refreshes started working successfully from 14:07 UTC onwards.
* Between 14:07 UTC and 14:50 UTC, some customers experienced HTTP 429 errors. There were too many retries, so our network system started throttling certain processes to manage the load.
> All new token refreshes, even from IPs that may previously have received errors, started succeeding from 14:50 UTC onwards.
* Some customers saw HTTP 400 errors from 14:07 UTC until they signed in to their robots again. Their auth tokens expired during the outage and could not be refreshed automatically, so users had to sign in again to get a new token.
> If customers still see errors, they may have to log back in to attended robots and portal pages, because their old login sessions may have expired.

## Root cause

The identity service had a new dependency, which caused this problem. We introduced the change gradually, following SDP: we started with our development environments, then Community, and then moved onward through the different regions. This lets us find as many issues as possible in a lower environment before they affect our critical customers. However, the resource didn't have the right scaling configuration. The change worked fine across many regions for several days until March 13, when an increase in traffic caused the quota threshold to be reached, which led to this incident. The failure also had a cascading effect: all the clients started trying to connect again, which triggered our rate-limiting policies for those specific IPs. This returned an HTTP 429 error, which limited the impact on other services and customers.

## Detection

Our monitors automatically detected the problem and notified our disaster recovery team within a few minutes.

## Response

To fix the problem, we turned off the extra rights and temporarily raised the throttling limits so the retries could drain away. We then followed our usual procedures to resolve the problem and clear the backup of retries that had caused us to hit the throttling limits. (A rough client-side sketch of this retry-and-re-sign-in behavior follows this incident entry.)

## Follow-ups

We're continuing to look into this issue in more detail, and we'll share more about how we plan to fix it soon. In the meantime, we're going to:

* Improve our failure drill process for new components in test environments to catch these kinds of issues earlier.
* Reduce the time it takes to find the root cause of these types of failures.
* Review our network throttling policies to make sure they're set correctly.
* Look at ways to make our authentication and refresh token processes more resilient.
Status: Postmortem
Impact: Critical | Started At: March 13, 2024, 1:30 p.m.
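The client behavior described in the postmortem (HTTP 500 during the outage, HTTP 429 once piled-up retries triggered throttling, and HTTP 400 for refresh tokens that expired and could not be renewed) maps to a fairly standard retry pattern. The sketch below is not UiPath's SDK or API; the endpoint, parameter names, and function are hypothetical placeholders, shown only to illustrate backoff-aware token refresh that honors Retry-After on 429 and falls back to an interactive sign-in on 400.

```python
import random
import time

import requests

# Hypothetical endpoint -- a placeholder, not UiPath's real identity API.
TOKEN_URL = "https://example-identity.local/connect/token"


def refresh_access_token(refresh_token, client_id, max_attempts=6):
    """Refresh an OAuth-style token, backing off on 429/5xx responses.

    Returns the new token payload, or None if the refresh token itself is
    rejected (HTTP 400) and the user must sign in interactively again.
    """
    delay = 1.0  # initial backoff in seconds
    for attempt in range(1, max_attempts + 1):
        resp = requests.post(
            TOKEN_URL,
            data={
                "grant_type": "refresh_token",
                "refresh_token": refresh_token,
                "client_id": client_id,
            },
            timeout=10,
        )

        if resp.status_code == 200:
            return resp.json()  # contains the new access/refresh tokens

        if resp.status_code == 400:
            # Refresh token expired or revoked: retrying will not help;
            # the caller has to trigger an interactive sign-in.
            return None

        if resp.status_code in (429, 500, 502, 503):
            # Throttled or transient server failure: wait and retry.
            # Assumes Retry-After, when present, is given in seconds.
            retry_after = resp.headers.get("Retry-After")
            wait = float(retry_after) if retry_after else delay
            # Full jitter keeps many clients from retrying in lockstep,
            # i.e. the retry storm that trips rate limiting.
            time.sleep(random.uniform(0, wait))
            delay = min(delay * 2, 60.0)
            continue

        resp.raise_for_status()

    raise RuntimeError("token refresh failed after %d attempts" % max_attempts)
```

If the function returns None, the caller would prompt the user to sign back in, which matches the guidance above about logging back in to attended robots and portal pages.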
Description: We are actively monitoring performance, but some customers may still experience small delays exporting logs. The team will continue working with support to improve performance.
Status: Resolved
Impact: Minor | Started At: March 13, 2024, 1:23 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Major | Started At: March 13, 2024, 9:29 a.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: Feb. 27, 2024, 4:30 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: Feb. 12, 2024, 6:08 p.m.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.