
Is there a UiPath outage?

UiPath status: Systems Active

Last checked: 3 minutes ago

Get notified about any outages, downtime or incidents for UiPath and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

UiPath outages and incidents

Outage and incident data over the last 30 days for UiPath.

There have been 9 outages or incidents for UiPath in the last 30 days.

Severity Breakdown:

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for UiPath

OutLogger tracks the status of these components for UiPath:

Action Center Active
AI Center Active
Apps Active
Automation Cloud Active
Automation Hub Active
Automation Ops Active
Autopilot for Everyone Active
Cloud Robots - VM Active
Communications Mining Active
Computer Vision Active
Context Grounding Active
Customer Portal Active
Data Service Active
Documentation Portal Active
Document Understanding Active
Insights Active
Integration Service Active
Marketplace Active
Orchestrator Active
Process Mining Active
Serverless Robots Active
Solutions Management Active
Studio Web Active
Task Mining Active
Test Manager Active

Latest UiPath outages and incidents.

View the latest incidents for UiPath and check for official updates:

Updates:

  • Time: April 4, 2024, 4:27 p.m.
    Status: Postmortem
    Update:
    ## Customer impact
    From 2024-04-03 23:45 UTC to 2024-04-04 02:35 UTC our customers experienced errors when accessing some of the services located in the US region of Automation Cloud. Impacted products include Automation Cloud, Orchestrator, Automation Hub, Automation Ops, Document Understanding, Serverless Robots, Cloud Robots - VM, Solutions Management, and Insights.
    ## Root cause
    UiPath makes extensive use of Azure SQL. At the beginning of the outage, Microsoft performed routine SQL maintenance in the East US region. Typically this is done without any visible impact to our customers, but this maintenance caused the SQL databases in the region to become unavailable. We are still waiting for a root cause from Microsoft and will update this document once we receive it.
    ## Detection
    Automated alerts immediately detected the issue and notified UiPath on-call engineers. They confirmed the scope of the outage and updated [status.uipath.com](http://status.uipath.com/).
    ## Response
    After a brief investigation, we determined that the problem was with Azure SQL and reached out to Microsoft Support to request assistance. For the US region of all UiPath products, we place the primary database in Azure's East US region and a failover database in Azure's West US region. By default, Azure fails over from the primary to the secondary after the primary has been unavailable for 60 minutes. During this incident, most databases automatically failed over to the secondary region. Unfortunately, the Orchestrator, Automation Hub, and Insights databases did not. UiPath engineers investigated the databases and began to trigger a manual failover, but by that time Microsoft had resolved the underlying issue in the East US region.
    ## Follow up
    * Work with Microsoft to get a root cause for the underlying Azure SQL outage.
    * Determine why Orchestrator, Automation Hub, and Insights did not fail over to the secondary region. Perform a failover drill to confirm the problem has been fixed.
    * Investigate whether the automatic failover period can be reduced from 60 minutes.
    (A conceptual sketch of the failover grace-period behaviour described here appears after this update list.)
  • Time: April 4, 2024, 2:36 a.m.
    Status: Resolved
    Update: Issue is resolved and we are continuously monitoring our services. Marking the status as resolved.
  • Time: April 4, 2024, 2:04 a.m.
    Status: Monitoring
    Update: We are seeing improvements in the health of the databases. We are continuously monitoring the status.
  • Time: April 4, 2024, 2:02 a.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: April 4, 2024, 1:19 a.m.
    Status: Investigating
    Update: We have engaged Microsoft and are continuing to investigate, but at this time we are waiting for the issue to be resolved on the Microsoft Azure SQL backend.
  • Time: April 4, 2024, 1:18 a.m.
    Status: Investigating
    Update: We have engaged Microsoft and are continuing to investigate, but at this time we are waiting for the issue to be resolved on the Microsoft Azure SQL backend.
  • Time: April 4, 2024, 12:54 a.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: April 4, 2024, 12:53 a.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: April 4, 2024, 12:40 a.m.
    Status: Investigating
    Update: We are continuing the investigation and are also seeing some issues with the backend Azure SQL from the cloud provider.
  • Time: April 4, 2024, 12:09 a.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: April 4, 2024, 12:06 a.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: April 4, 2024, 12:02 a.m.
    Status: Investigating
    Update: We are seeing an increasing number of 503 errors from Cloudflare. We are further investigating the issue from our backend applications and monitoring.
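
The postmortem above describes a primary/secondary database pair in which automatic failover triggers only after the primary has been continuously unavailable for a fixed grace period (60 minutes by default). The Python sketch below is a conceptual illustration of that behaviour, not UiPath or Azure tooling; `primary_is_healthy` and `promote_secondary` are hypothetical callables the caller would supply.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, Optional

def monitor_and_failover(
    primary_is_healthy: Callable[[], bool],   # hypothetical health probe
    promote_secondary: Callable[[], None],    # hypothetical failover action
    grace_period: timedelta = timedelta(minutes=60),
    probe_interval_s: float = 30.0,
) -> None:
    """Promote the secondary only after the primary has been down for the
    entire grace period; any recovery in between resets the clock."""
    down_since: Optional[datetime] = None
    while True:
        if primary_is_healthy():
            down_since = None                  # primary recovered, reset timer
        else:
            down_since = down_since or datetime.utcnow()
            if datetime.utcnow() - down_since >= grace_period:
                promote_secondary()
                return
        time.sleep(probe_interval_s)
```

The longer the grace period, the longer an outage like this one can run before automatic failover kicks in, which is why the follow-up items consider shortening the window or failing over manually sooner.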

Updates:

  • Time: March 28, 2024, 8:40 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: March 28, 2024, 7:11 a.m.
    Status: Monitoring
    Update: We have successfully mitigated the issue affecting AI Center ML Skills and the creation of new data labelling sessions in AI Center. The disruption was caused by a discrepancy in secret refresh between the service and the identity provider, impacting customer authentication and inter-service communication. Our team has rectified the issue and restored normal functionality. We apologize for any inconvenience this may have caused and thank you for your understanding. (A hedged sketch of tolerating this kind of secret-rotation lag appears after this update list.)
  • Time: March 28, 2024, 6:05 a.m.
    Status: Monitoring
    Update: A fix has been implemented and we are monitoring the results.
  • Time: March 28, 2024, 5:16 a.m.
    Status: Investigating
    Update: We discovered a problem where ML Skills in AI Center are failing to return results for users in the Australia and Canada regions. Our engineers are investigating this further.
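
The mitigation note above attributes the disruption to a secret-refresh discrepancy between the service and the identity provider. As a loose, hedged illustration only (the callables below are hypothetical, not AI Center or UiPath APIs), a client can tolerate a rotation that has not yet propagated by re-reading the secret and retrying once when the provider rejects the cached value:

```python
from typing import Callable

def acquire_token(
    fetch_secret: Callable[[str, bool], str],   # hypothetical secret-store read
    request_token: Callable[[str, str], str],   # hypothetical token-endpoint call
    client_id: str,
) -> str:
    """Try the cached client secret first; if the identity provider rejects it
    (e.g. because the secret was rotated but not yet propagated everywhere),
    force a fresh read of the secret and retry once."""
    secret = fetch_secret(client_id, True)       # True = cached value allowed
    try:
        return request_token(client_id, secret)
    except PermissionError:                      # stand-in for an invalid-client error
        secret = fetch_secret(client_id, False)  # False = bypass cache, re-read
        return request_token(client_id, secret)
```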

Updates:

  • Time: March 25, 2024, 3:15 p.m.
    Status: Resolved
    Update: The fix has been applied in all geos and all communications have now been restored.
  • Time: March 25, 2024, 2:32 p.m.
    Status: Identified
    Update: The issue is now resolved for our customers in Australia, Singapore, and Japan. The fix is currently being deployed to the rest of the geos. We will keep you posted with updates.
  • Time: March 25, 2024, 1:23 p.m.
    Status: Identified
    Update: We are continuing to work on the fix for the issue. The ETA for mitigation is around 1 hour across all geos.
  • Time: March 25, 2024, 11:47 a.m.
    Status: Identified
    Update: Notification updates in Task Mining are not working. We have identified the issue and are working on applying the fix for it. The ETA for mitigation is around 2 hours across all geos.

Updates:

  • Time: April 9, 2024, 4:06 p.m.
    Status: Postmortem
    Update:
    # Background
    UiPath Communications Mining is deployed globally across multiple regions. Each region is independent of all others, with independent deployments of databases and stateless services. Multiple different distributed database solutions are deployed in each region for different purposes. Historically we used a strongly consistent, horizontally scalable document store for most ground-truth data storage, but for a variety of reasons, including operational concerns relevant to this outage, over the last year we have been migrating away from this store to a distributed SQL database instead. Today, however, much of our data (~1B rows, ~5 TiB) is still stored in this legacy document store.
    # Customer Impact
    * Performance degradation and elevated error rates (HTTP 500 error codes) for tenants in the EU region over the weekend, starting Mar 16 at 10:02 UTC and continuing into Mar 18.
    * From Monday, Mar 18, 11:37 UTC, analytics and the UI were fully back up, but training, ingestion, and streams continued to experience issues.
    * All functionality was fully restored on Wednesday, Mar 20 at 10:20 UTC.
    * 35 tenants in the EU were affected; no tenants in other regions were impacted.
    # Root Cause
    The outage was caused by an interaction of multiple issues. At its core, the incident was triggered by a manual scaling operation, started on Saturday, Mar 16, that exposed fundamental problems in our legacy document store:
    1. Explicit table re-sharding causes a temporary reduction in fault tolerance.
    2. Unexpected memory-mapped page count exhaustion caused multiple DB nodes to crash simultaneously.
    3. Kubernetes security controls (read-only filesystems with unprivileged containers) prevented in-place updates to sysctls, requiring further DB restarts to increase the memory-mapped page limit (vm.max_map_count).
    4. The crashes exposed flaws in our document store's failover mechanism, causing nodes to enter a "viral" state in which failover nodes also entered a backfilling state.
    5. The eventual solution was manually re-creating a subset of the database tables and repopulating them with data from the old, now read-only tables.
    6. The new tables suffered from very slow secondary index reconstruction in our document store.
    # Detection
    Due to increased usage in the EU region, we started scaling up our document store cluster on Jan 30. We added two new nodes and, over the next month and a half, re-sharded and moved tables to the new nodes during weekends to avoid customer impact. Until the weekend of Mar 16, these operations all completed without a hitch. As soon as we started re-sharding one of the only two remaining tables at 10:02 UTC on Mar 16, two database nodes crashed simultaneously due to exhaustion of memory-mapped pages (`vm.max_map_count`). An on-call engineer was actively monitoring the process at the time, and the issue was also picked up within minutes by our automated alerts. (A diagnostic sketch of this limit appears after this update list.)
    # Response
    Since all our workloads run on read-only, unprivileged containers, increasing this limit is impossible without restarting all the nodes. The focus was therefore on bringing the cluster into a fully replicated state so we could run a controlled restart to increase `max_map_count` on all the nodes. Because of the hard crash during a re-sharding operation, the database entered a degraded state: it would sporadically become read-only and would not accept writes before a full integrity check. Furthermore, the recovery process never seemed to fully complete. By Sunday evening a sufficient number of replicas had become available. Our automatic nightly backup process started at 23:00 UTC on Sunday, Mar 17, adding enough load to the database that it experienced another four node crashes between 01:00 and 06:00 on Monday, Mar 18, again due to `max_map_count` exhaustion. The DB reverted to the same degraded state as above, with very lengthy automated "backfilling" processes that never completed and during which the DB entered read-only mode. Due to the risk of further crashes before recovery was complete, at 08:42 UTC on Monday, Mar 18, we decided to go ahead with the controlled restart to increase `max_map_count`, even though the database was not in a fully recovered state. This resulted in many additional hours of downtime, but in exchange gave us confidence that the restart would complete successfully without further unexpected crashes. By 11:37 UTC on Monday, Mar 18, all but two tables were fully available, allowing us to restore most functionality. The remaining two (very large) tables failed to recover through the automated process multiple times. We rapidly built and, after significant testing and iteration, deployed an emergency batch job at 04:20 UTC on Tuesday, Mar 19. It created new tables and copied all rows into them while maintaining availability of the rest of the product. This process completed at 07:30 UTC on Tuesday, after which we could start rebuilding the secondary indexes in the new tables. Reindexing ~200M rows in these tables took over 24 hours, finally completing at 10:20 UTC on Wednesday, Mar 20, and restoring all functionality.
    # Follow-ups
    This is the most significant outage UiPath Communications Mining has ever experienced, and it was caused by one of our core data stores. We had been aware of issues with this document store and have been migrating away from it slowly over the last year. The next steps are:
    1. Halt further scaling of the document store. The number of replicas today can handle current and forecasted load for at least another year, and we know the database is resilient in its steady state.
    2. Reduce the amount of data stored in the database by more aggressively garbage-collecting old data and moving larger objects into blob storage, referenced from the database instead.
    3. Reprioritise the migration away from this legacy store as critical, aiming to complete it in the next six months, starting with the database tables that caused the most problems during this incident.
  • Time: March 20, 2024, 10:33 a.m.
    Status: Resolved
    Update: We've validated all functionalities and the incident is now fully resolved.
  • Time: March 20, 2024, 10:22 a.m.
    Status: Monitoring
    Update: The index restoration is completed. We are conducting tests to ensure that all functionality has been successfully restored.
  • Time: March 20, 2024, 10:14 a.m.
    Status: Identified
    Update: The database index reconstruction process is taking longer than anticipated. We now expect it to require a few additional (approx. 2) hours to complete. We will provide updates as we gain more insights into the situation.
  • Time: March 19, 2024, 5:33 p.m.
    Status: Identified
    Update: Database index reconstruction is significantly slower than previously expected. Now expecting full availability at midnight UTC tonight.
  • Time: March 19, 2024, 7:47 a.m.
    Status: Identified
    Update: The issue is partially mitigated; we expect a complete recovery to take around 3 hours.
  • Time: March 18, 2024, 6:23 p.m.
    Status: Identified
    Update: Email ingestion functionality restored, new data can be added. Streams functionality is still down.
  • Time: March 18, 2024, 4:12 p.m.
    Status: Identified
    Update: The database is currently running through the backfill process. We estimate this may take multiple hours to fully catch up.
  • Time: March 18, 2024, 11:37 a.m.
    Status: Identified
    Update: The issue is partially mitigated: 1. Analytics functionality restored. 2. Most of the UI now operates correctly. Ingestion and streams functionality is still down; mitigation is in progress.
  • Time: March 18, 2024, 10:22 a.m.
    Status: Identified
    Update: Issue is identified, mitigation is in progress
  • Time: March 18, 2024, 9:18 a.m.
    Status: Investigating
    Update: We are experiencing an issue with Communications Mining and an investigation is in progress. Currently the service is only available in read-only mode.
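
The postmortem above traces the node crashes to exhaustion of memory-mapped regions (`vm.max_map_count`), a kernel limit that read-only, unprivileged containers cannot raise in place. The Linux-only sketch below is a diagnostic illustration, not UiPath tooling; it compares a process's current mapping count against the limit via procfs:

```python
from pathlib import Path

def max_map_count() -> int:
    """Per-process kernel limit on memory-mapped regions."""
    return int(Path("/proc/sys/vm/max_map_count").read_text())

def mapping_count(pid: str = "self") -> int:
    """Number of memory mappings the given process currently holds."""
    with Path(f"/proc/{pid}/maps").open() as maps:
        return sum(1 for _ in maps)

if __name__ == "__main__":
    used, limit = mapping_count(), max_map_count()
    print(f"{used}/{limit} memory mappings in use ({used / limit:.1%})")
    # A process near the limit fails on its next mmap(); raising the limit
    # requires a host-level `sysctl -w vm.max_map_count=<value>`, which is why
    # the affected database nodes had to be restarted rather than tuned in place.
```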

Updates:

  • Time: March 13, 2024, 3:47 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: March 13, 2024, 3:37 p.m.
    Status: Monitoring
    Update: We have implemented a fix and all impacted services are back to an operational state. We are carefully observing the services for any issues.
  • Time: March 13, 2024, 3:31 p.m.
    Status: Investigating
    Update: Orchestrator, Apps, and Solutions Management services are impacted. We are gauging the extent of the impact and working on a fast resolution. Please bear with us while we resolve this and get services back to operational.

Check the status of similar companies and alternatives to UiPath

Scale AI

Systems Active

Notion

Systems Active

Brandwatch

Systems Active

Harness

Systems Active

Olive AI

Systems Active

Sisense

Systems Active

HeyJobs

Systems Active

Joveo

Systems Active

Seamless AI

Systems Active

hireEZ

Systems Active

Alchemy

Systems Active

Frequently Asked Questions - UiPath

Is there a UiPath outage?
The current status of UiPath is: Systems Active
Where can I find the official status page of UiPath?
The official status page for UiPath is here
How can I get notified if UiPath is down or experiencing an outage?
To get notified of any status changes to UiPath, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of UiPath every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
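
If you prefer to poll the official status page yourself, the sketch below assumes status.uipath.com exposes a standard Statuspage-style `/api/v2/status.json` summary endpoint; that is an assumption to verify, and the page's actual API should be substituted if it differs.

```python
import json
import time
from urllib.request import urlopen

# Assumed endpoint: a Statuspage-style JSON summary. Verify it exists for
# status.uipath.com before relying on it.
STATUS_URL = "https://status.uipath.com/api/v2/status.json"

def current_status() -> str:
    """Return the page's overall status description, e.g. 'All Systems Operational'."""
    with urlopen(STATUS_URL, timeout=10) as resp:
        return json.load(resp)["status"]["description"]

def watch(interval_s: int = 300) -> None:
    """Print a line whenever the reported status changes."""
    last = None
    while True:
        status = current_status()
        if status != last:
            print(f"UiPath status is now: {status}")
            last = status
        time.sleep(interval_s)
```

Running `watch()` with a five-minute interval approximates what a monitoring service like OutLogger does on your behalf, minus the multi-vendor dashboard and notifications.
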
What does UiPath do?
UiPath Business Automation Platform automates knowledge work, accelerating innovation and human achievement through AI and automation.