Is there a Harness outage?

Harness status: Systems Active

Last checked: 5 minutes ago

Get notified about any outages, downtime or incidents for Harness and 1800+ other cloud vendors. Monitor 10 companies for free.

Harness outages and incidents

Outage and incident data over the last 30 days for Harness.

There have been 3 outages or incidents for Harness in the last 30 days.

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Components and Services Monitored for Harness

OutLogger tracks the status of these components for Harness:

Service Reliability Management - Error Tracking FirstGen (fka OverOps) Active
Software Engineering Insights FirstGen (fka Propelo) Active
Chaos Engineering Active
Cloud Cost Management (CCM) Active
Continuous Delivery (CD) - FirstGen - EOS Active
Continuous Delivery - Next Generation (CDNG) Active
Continuous Error Tracking (CET) Active
Continuous Integration Enterprise(CIE) - Cloud Builds Active
Continuous Integration Enterprise(CIE) - Linux Cloud Builds Active
Continuous Integration Enterprise(CIE) - Self Hosted Runners Active
Continuous Integration Enterprise(CIE) - Windows Cloud Builds Active
Custom Dashboards Active
Feature Flags (FF) Active
Infrastructure as Code Management (IaCM) Active
Internal Developer Portal (IDP) Active
Security Testing Orchestration (STO) Active
Service Reliability Management (SRM) Active
Software Engineering Insights (SEI) Active
Software Supply Chain Assurance (SSCA) Active

Latest Harness outages and incidents.

View the latest incidents for Harness and check for official updates:

Updates:

  • Time: Oct. 16, 2024, 6:03 p.m.
    Status: Resolved
    Update: This incident has been resolved. We will provide an RCA after findings are complete.
  • Time: Oct. 16, 2024, 5:49 p.m.
    Status: Monitoring
    Update: The issue has been mitigated. We are still monitoring the system to ensure healthy operation of the cluster.
  • Time: Oct. 16, 2024, 5:37 p.m.
    Status: Identified
    Update: We have identified the service that is causing the degradation. We have scaled up the DB resource for that service. We are still working to mitigate the issue.
  • Time: Oct. 16, 2024, 5:09 p.m.
    Status: Investigating
    Update: We have internally identified an issue that is degrading performance for Prod1 customers. We are actively investigating it.

Updates:

  • Time: Oct. 31, 2024, 4:28 a.m.
    Status: Postmortem
    Update:
      Summary: Pipeline executions on Prod2 were failing with a time-out error. This affected ~3% of pipeline executions.
      What was the issue? Tasks are execution units that run on a delegate as part of a pipeline execution. As a pipeline runs, its tasks are broadcast to delegates, and one eligible delegate picks up each task for execution. If no delegate acquires a task within the stipulated time, the task is rebroadcast. During this incident, the rebroadcast functionality was broken, so pipeline executions timed out.
      Resolution: We rolled back the service to resolve the issue.
      RCA: An incompatible change was rolled out in one of our microservices, causing deserialization failures for a subset of task types. The rebroadcast threads went into an error state because of these deserialization errors, so pipelines that required task rebroadcasts failed. The system recovered once the service was rolled back.
      Action items:
        1. Added a critical alert for rebroadcast events.
        2. Made the rebroadcast logic resilient to task deserialization errors.
        3. Added a unit test to catch incompatible contract changes to task data.
  • Time: Oct. 14, 2024, 5:25 p.m.
    Status: Resolved
    Update: The incident has been resolved. We will share an RCA along with improvements to monitoring and other steps.
  • Time: Oct. 14, 2024, 5:08 p.m.
    Status: Monitoring
    Update: The issue has been fixed and we are monitoring the system.
  • Time: Oct. 14, 2024, 4:01 p.m.
    Status: Identified
    Update: The issue has been identified and we are still working on a fix.
  • Time: Oct. 14, 2024, 3:08 p.m.
    Status: Investigating
    Update: We are currently investigating an issue where the clone codebase step is failing for a subset of customers in Prod2.
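
The rebroadcast mechanism described in the Prod2 postmortem, together with its second action item, can be sketched as below. This is a minimal Python sketch under stated assumptions, not Harness's implementation: `PendingTask`, `deserialize`, `rebroadcast_pass`, the JSON payload format and the set of known task types are all hypothetical. What it illustrates is that catching the decode error per task keeps one undecodable task from stopping the rebroadcast of every other task.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


class TaskDeserializationError(Exception):
    """Raised when stored task data cannot be decoded into a known task type."""


@dataclass
class PendingTask:
    task_id: str
    payload: str            # serialized task data (JSON in this sketch)
    broadcast_at: float     # time of the latest broadcast, in seconds
    acquired: bool = False  # set once an eligible delegate picks the task up


KNOWN_TASK_TYPES = {"shell", "http", "git_clone"}  # hypothetical task types


def deserialize(payload: str) -> dict:
    """Decode task data; an unknown type stands in for an incompatible contract change."""
    data = json.loads(payload)
    if not isinstance(data, dict) or data.get("type") not in KNOWN_TASK_TYPES:
        raise TaskDeserializationError(f"cannot decode task payload: {payload!r}")
    return data


def rebroadcast_pass(tasks: List[PendingTask], now: float, timeout: float,
                     broadcast: Callable[[str, dict], None]) -> List[str]:
    """One pass of a rebroadcast loop; returns the IDs of tasks that failed to decode.

    Tasks not acquired within `timeout` seconds are broadcast again. A task that
    cannot be deserialized is skipped and reported instead of raising, so one bad
    task cannot put the whole rebroadcast thread into an error state.
    """
    failed: List[str] = []
    for task in tasks:
        if task.acquired or now - task.broadcast_at < timeout:
            continue
        try:
            data = deserialize(task.payload)
        except (TaskDeserializationError, json.JSONDecodeError):
            failed.append(task.task_id)  # e.g. feed a critical alert, then keep going
            continue
        broadcast(task.task_id, data)
        task.broadcast_at = now
    return failed


if __name__ == "__main__":
    sent: List[str] = []
    tasks = [
        PendingTask("t1", '{"type": "shell"}', broadcast_at=0.0),
        PendingTask("t2", '{"type": "new_format"}', broadcast_at=0.0),
        PendingTask("t3", '{"type": "http"}', broadcast_at=0.0, acquired=True),
    ]
    failed = rebroadcast_pass(tasks, now=60.0, timeout=30.0,
                              broadcast=lambda task_id, data: sent.append(task_id))
    print(sent, failed)  # ['t1'] ['t2']
```

Here `t2` plays the role of a task written with an incompatible contract: it is reported rather than crashing the pass, and `t1` is still rebroadcast.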

Updates:

  • Time: Sept. 17, 2024, 10:29 a.m.
    Status: Postmortem
    Update:
      Summary: CI-hosted MacOS pipelines were failing during the initialisation step, impacting specific customers using our MacOS-hosted service.
      What was the issue? We tightened a firewall rule for our Mac VM registry that was previously too permissive. As a result, another component couldn't access the registry, leading to pipeline failures.
      Resolution:
        Sept 1, 17:00 UTC: Restricted the firewall rule.
        Sept 4, 06:03 UTC: Issue reported by the customer.
        Sept 4, 08:39 UTC: We re-created the firewall rule and validated that the issue was fixed.
      RCA: Our MacOS production setup includes several components. When we restricted the permissive firewall rule, the new rule did not account for the NAT IP address of one of these components. After the change, we ran a full sanity pipeline on the Mac machines, which passed successfully. The issue didn't surface immediately because the affected component maintains a persistent socket connection, which is unaffected by the firewall until the connection is re-established or restarted. This explains why the failure didn't occur immediately after we removed the permissive rule on September 1st. We restored the rule, and the issue was resolved.
      Action items:
        1. Restrict the firewall rule again, ensuring that necessary NAT IPs are included.
        2. Restart all relevant services when applying firewall rule restrictions.
        3. Ensure that all connections are properly drained and re-established when the change is implemented.
  • Time: Sept. 4, 2024, 6:47 a.m.
    Status: Resolved
    Update: We apologise for the inconvenience caused by this outage. We will make sure to provide the root cause analysis soon.
  • Time: Sept. 4, 2024, 6:39 a.m.
    Status: Monitoring
    Update: The issue is resolved now. We will share an RCA for the problem as soon as possible.
  • Time: Sept. 4, 2024, 6:33 a.m.
    Status: Investigating
    Update: We are currently investigating this issue.
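
The MacOS RCA hinges on a stateful firewall evaluating its rules only when a connection is opened. The toy model below is a hypothetical sketch, not a model of the actual Mac VM registry setup: it shows why the restricted rule went unnoticed until the component reconnected, plus the kind of pre-flight check implied by the first action item. The class and function names are illustrative, and the addresses come from the IPv4 documentation ranges (198.51.100.0/24 and 203.0.113.0/24) rather than real infrastructure.

```python
import ipaddress
from typing import Iterable, List, Set


def _networks(cidrs: Iterable[str]):
    return [ipaddress.ip_network(c) for c in cidrs]


def uncovered_sources(allow_cidrs: Iterable[str], required_ips: Iterable[str]) -> List[str]:
    """Return the required source IPs (including NAT egress IPs) that a proposed allow-list would block."""
    nets = _networks(allow_cidrs)
    return [ip for ip in required_ips
            if not any(ipaddress.ip_address(ip) in net for net in nets)]


class StatefulFirewall:
    """Toy stateful firewall: rules are evaluated only when a connection is opened."""

    def __init__(self, allow_cidrs: Iterable[str]) -> None:
        self.allow = _networks(allow_cidrs)
        self.established: Set[str] = set()

    def set_rules(self, allow_cidrs: Iterable[str]) -> None:
        # Replacing the rules leaves already-established connections untouched.
        self.allow = _networks(allow_cidrs)

    def connect(self, src_ip: str) -> bool:
        addr = ipaddress.ip_address(src_ip)
        if any(addr in net for net in self.allow):
            self.established.add(src_ip)
            return True
        return False

    def send(self, src_ip: str) -> bool:
        # Packets on a tracked connection pass without re-checking the rules.
        return src_ip in self.established

    def reset(self, src_ip: str) -> None:
        # A service restart or reconnect drops the tracked connection.
        self.established.discard(src_ip)


if __name__ == "__main__":
    component_nat_ip = "203.0.113.7"      # hypothetical NAT IP of the affected component
    restricted = ["198.51.100.0/24"]      # hypothetical tightened allow-list

    fw = StatefulFirewall(["0.0.0.0/0"])  # the old, permissive rule
    fw.connect(component_nat_ip)
    fw.set_rules(restricted)
    print(fw.send(component_nat_ip))      # True: the persistent socket still works
    fw.reset(component_nat_ip)
    print(fw.connect(component_nat_ip))   # False: reconnecting is now blocked
    print(uncovered_sources(restricted, ["198.51.100.10", component_nat_ip]))  # ['203.0.113.7']
```

Running `uncovered_sources` against every component's egress address before tightening a rule would flag the missing NAT IP up front, and forcing a reset (action items 2 and 3) surfaces any remaining gap immediately instead of days later.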

Updates:

  • Time: Sept. 11, 2024, 11 p.m.
    Status: Postmortem
    Update:
      Summary: Logged-in users were redirected to the enrollment screen with an "Email verified successfully" message and forced to enter their user details again. Pipeline executions and backend tasks were not impacted. The impact was on accounts in the Prod4 cluster.
      What was the issue? We released an incompatible version of the NextGen UI service, which sent existing users through the new-user sign-up flow. This was a human error.
      Timeline:
        September 3, 7:45 PM UTC: Customer reported that login redirected to the sign-up page.
        September 3, 8:15 PM UTC: A new deployment had happened around the same time; decided to roll back.
        September 3, 8:20 PM UTC: Started a partial rollback of the FF Proxy changes.
        September 3, 8:30 PM UTC: The partial rollback didn't fix the issue; initiated a full rollback.
        September 3, 9:00 PM UTC: Full rollback completed and the issue was resolved.
      Resolution: The rollback resolved the issue.
      RCA: There was a human error in picking the version of the NextGen UI service, and the post-deployment sanity check did not catch it. Rolling back took longer than expected because multiple services had been deployed together.
      Action items:
        1. Remove the manual process of picking service versions; automate promotion from lower environments.
        2. Improve the sanity test to catch the above UI flow.
        3. Make the rollback process atomic, based on the previous known good state.
  • Time: Sept. 11, 2024, 10:57 p.m.
    Status: Resolved
    Update: We can confirm normal operation. Get Ship Done! We will continue to monitor and ensure stability.
  • Time: Sept. 11, 2024, 10:55 p.m.
    Status: Investigating
    Update: Logged-in users started getting redirected to the enrollment screen. We are currently investigating.
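
The first and third action items of the Prod4 login postmortem (automated promotion and an atomic rollback to the previous known good state) can be illustrated with a small model in which every deployment is one snapshot of all service versions. This is a hypothetical Python sketch: `Release`, `Environment`, the service names `nextgen-ui` and `ff-proxy`, and the version strings are illustrative only, and it assumes the release immediately before the current one is the known good state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Release:
    """One deployment: the versions of every service that ship together."""
    versions: Dict[str, str]


class Environment:
    def __init__(self, name: str) -> None:
        self.name = name
        self.history: List[Release] = []  # deployed releases, oldest first

    @property
    def current(self) -> Optional[Release]:
        return self.history[-1] if self.history else None

    def deploy(self, release: Release) -> None:
        self.history.append(release)

    def promote_from(self, lower: "Environment") -> None:
        """Deploy exactly the release the lower environment runs; no versions are picked by hand."""
        if lower.current is None:
            raise ValueError(f"{lower.name} has nothing to promote")
        self.deploy(Release(dict(lower.current.versions)))

    def rollback(self) -> Release:
        """Return every service to the previous release in a single step."""
        if len(self.history) < 2:
            raise ValueError(f"{self.name} has no earlier release to roll back to")
        self.history.pop()
        return self.history[-1]


if __name__ == "__main__":
    staging = Environment("staging")
    prod = Environment("prod4")
    prod.deploy(Release({"nextgen-ui": "1.0.0", "ff-proxy": "2.0.0"}))
    staging.deploy(Release({"nextgen-ui": "1.1.0", "ff-proxy": "2.1.0"}))

    prod.promote_from(staging)   # prod now runs staging's exact versions
    restored = prod.rollback()   # both services revert together
    print(restored.versions)     # {'nextgen-ui': '1.0.0', 'ff-proxy': '2.0.0'}
```

Because a rollback restores the whole snapshot, there is no intermediate "partial rollback" state like the one in the timeline, where only the FF Proxy changes had been reverted.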

Updates:

  • Time: Sept. 4, 2024, 5:31 p.m.
    Status: Postmortem
    Update:
      Summary: Customers experienced login failures with 5xx errors on the Prod4 cluster.
      What was the issue? The Harness platform uses a managed memStore internally. The memStore experienced a "Host error", which triggered a master switchover within seconds. Backend microservices that connect to memStore were not able to reconnect quickly. The issue affected Java-based services; Go services reconnected properly.
      Timeline:
        21 August, 4:06:41 PM UTC: Primary memStore went down.
        21 August, 4:07:00 PM UTC: Secondary memStore promoted to primary.
        21 August, 4:06:41 PM UTC: Harness services experienced RedisResponseTimeoutException.
        21 August, 4:14:30 PM UTC: Harness services restored connectivity to the new primary.
        21 August, 4:14:53 PM UTC: New memStore instance added and promoted as secondary.
      Resolution: After about 8 minutes, services reconnected to the new primary memStore on their own and recovered.
      RCA: Java services use the Redisson library to connect to memStore. The established connection pool does not detect that the endpoint has gone away, so these connections eventually time out. This issue does not occur during a graceful failover; we encounter it only on a catastrophic failure.
      Action item: Detect this kind of catastrophic failure and have services reconnect more quickly.
  • Time: Aug. 21, 2024, 6:47 p.m.
    Status: Resolved
    Update: We can confirm normal operation. Get Ship Done! We will continue to monitor and ensure stability.
  • Time: Aug. 21, 2024, 6:46 p.m.
    Status: Investigating
    Update: We are currently investigating this issue.
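
One way to approach the memStore action item is to treat a run of consecutive command timeouts as a dead endpoint and rebuild the connection against whichever node is now primary. The sketch below is a hypothetical Python illustration of that idea, not Redisson's API and not Harness's actual fix: `ReconnectingClient`, `CommandTimeout`, the in-memory `primary`/`down_nodes` stand-ins for a failover, and the threshold of three timeouts are all assumptions.

```python
from typing import Callable, Dict, Set


class CommandTimeout(Exception):
    """Stands in for a client-side response timeout on a cache command."""


Connection = Callable[[str], str]


class ReconnectingClient:
    """Rebuilds its connection after a run of consecutive timeouts."""

    def __init__(self, connect: Callable[[], Connection], max_consecutive_timeouts: int = 3) -> None:
        self._connect = connect  # returns a fresh connection to the current primary
        self._conn = connect()
        self._limit = max_consecutive_timeouts
        self._timeouts = 0
        self.reconnects = 0

    def execute(self, command: str) -> str:
        try:
            result = self._conn(command)
        except CommandTimeout:
            self._timeouts += 1
            if self._timeouts >= self._limit:
                # Treat repeated timeouts as a dead endpoint rather than a slow one:
                # drop the connection and connect to the (possibly new) primary.
                self._conn = self._connect()
                self._timeouts = 0
                self.reconnects += 1
            raise
        self._timeouts = 0
        return result


# Hypothetical in-memory stand-ins for a primary/secondary pair during failover.
primary: Dict[str, str] = {"node": "node-a"}  # which node currently holds the primary role
down_nodes: Set[str] = set()                  # nodes that stopped responding


def connect() -> Connection:
    node = primary["node"]  # resolved once, like a pooled connection

    def conn(command: str) -> str:
        if node in down_nodes:
            raise CommandTimeout(f"{node} did not respond to {command}")
        return f"{node}:{command}"

    return conn


if __name__ == "__main__":
    client = ReconnectingClient(connect)
    print(client.execute("PING"))  # node-a:PING
    down_nodes.add("node-a")       # catastrophic primary failure
    primary["node"] = "node-b"     # secondary promoted
    for _ in range(3):
        try:
            client.execute("PING")
        except CommandTimeout:
            pass
    print(client.execute("PING"))  # node-b:PING
```

The threshold trades false reconnects during brief slowdowns against recovery time; in the incident above, connections stayed pointed at the dead primary for roughly eight minutes.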

Check the status of similar companies and alternatives to Harness

UiPath

Systems Active

Scale AI

Systems Active

Notion

Systems Active

Brandwatch

Systems Active

Olive AI

Systems Active

Sisense

Systems Active

HeyJobs

Systems Active

Joveo

Systems Active

Seamless AI

Systems Active

hireEZ

Systems Active

Alchemy

Systems Active

Frequently Asked Questions - Harness

Is there a Harness outage?
The current status of Harness is: Systems Active
Where can I find the official status page of Harness?
The official status page for Harness is here
How can I get notified if Harness is down or experiencing an outage?
To get notified of any status changes to Harness, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Harness every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Harness do?
Harness is a software delivery platform that enables engineers and DevOps teams to build, test, deploy, and verify software on demand.