
Is there a Harness outage?

Harness status: Systems Active

Last checked: 4 minutes ago

Get notified about any outages, downtime, or incidents for Harness and 1,800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Harness outages and incidents

Outage and incident data over the last 30 days for Harness.

There have been 3 outages or incidents for Harness in the last 30 days.


Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Harness

OutLogger tracks the status of these components for Harness:

Service Reliability Management - Error Tracking FirstGen (fka OverOps) Active
Software Engineering Insights FirstGen (fka Propelo) Active
Chaos Engineering Active
Cloud Cost Management (CCM) Active
Continuous Delivery (CD) - FirstGen - EOS Active
Continuous Delivery - Next Generation (CDNG) Active
Continuous Error Tracking (CET) Active
Continuous Integration Enterprise(CIE) - Cloud Builds Active
Continuous Integration Enterprise(CIE) - Linux Cloud Builds Active
Continuous Integration Enterprise(CIE) - Self Hosted Runners Active
Continuous Integration Enterprise(CIE) - Windows Cloud Builds Active
Custom Dashboards Active
Feature Flags (FF) Active
Infrastructure as Code Management (IaCM) Active
Internal Developer Portal (IDP) Active
Security Testing Orchestration (STO) Active
Service Reliability Management (SRM) Active
Software Engineering Insights (SEI) Active
Software Supply Chain Assurance (SSCA) Active

Latest Harness outages and incidents.

View the latest incidents for Harness and check for official updates:

Updates:

  • Time: Dec. 26, 2023, 6:44 a.m.
    Status: Postmortem
    Update:
    **Summary:**
    * On December 22nd, a load test triggered by the Harness Performance Team caused a full outage of [app.harness.io](http://app.harness.io/) for approximately 9 minutes (12:36 PM UTC to 12:45 PM UTC).
    * This impacted customers in the Prod1 and Prod2 clusters.
    * The Prod3 cluster was unaffected because it runs a separate instance of the affected component.
    **Impact:**
    * Customers in the Prod1 and Prod2 clusters were unable to access [app.harness.io](http://app.harness.io/) for 9 minutes.
    **Root Cause:**
    * During the high volume of traffic from the load test, the component responsible for managing incoming requests and routing them to the correct internal services (the Kubernetes Ingress Controller) became overloaded.
    * This caused the ingress controller to become unhealthy, leading to the outage.
    **Resolution:**
    * The system automatically recovered without manual intervention.
    **Action Items:**
    * **Resource Scaling:** We are exploring options to automatically scale the ingress controller based on demand to handle high traffic volumes more effectively. (An illustrative autoscaling sketch follows this update list.)
    We understand the importance of a reliable platform for your operations and sincerely apologize for any inconvenience caused by this incident. Our team is dedicated to ensuring the continued improvement of the Harness platform's performance and reliability. We appreciate your trust and remain committed to providing you with a seamless experience.
  • Time: Dec. 22, 2023, 1:41 p.m.
    Status: Resolved
    Update: Downtime was observed on the Harness platform for the Prod1 and Prod2 clusters: the login page was not accessible and 502 errors were returned. All functionality has now been restored. We are investigating the root cause of the issue and will post the RCA here.
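
The action item in the postmortem above mentions automatically scaling the ingress controller based on demand. As a rough, hypothetical illustration (not Harness's actual setup), the decision logic behind that kind of demand-based scaling can be sketched as a small control-loop calculation; the capacity figure and replica bounds below are assumptions chosen only for the example.

```python
# Hypothetical sketch of demand-based scaling for an ingress controller.
# TARGET_RPS_PER_REPLICA, MIN_REPLICAS, and MAX_REPLICAS are made-up values.
import math

TARGET_RPS_PER_REPLICA = 2_000   # assumed requests/sec one ingress replica can absorb
MIN_REPLICAS = 3
MAX_REPLICAS = 20

def desired_replicas(observed_rps: float) -> int:
    """Same shape as a Kubernetes HPA calculation: replicas = ceil(load / target), clamped."""
    wanted = math.ceil(observed_rps / TARGET_RPS_PER_REPLICA)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, wanted))

# During a load test the request rate can spike well past what a fixed replica
# count absorbs; a loop evaluating this every few seconds would add capacity instead.
print(desired_replicas(observed_rps=25_000))  # -> 13
```

In practice this is roughly what a Kubernetes HorizontalPodAutoscaler does against a request-rate or CPU metric; the point is only that ingress capacity tracks demand rather than staying fixed.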

Updates:

  • Time: Dec. 26, 2023, 6:13 a.m.
    Status: Postmortem
    Update:
    # Summary
    On December 21st at 1:10 PM PST, we received reports from two of our customers about issues with their CI pipeline executions in our Prod-2 cluster. A FireHydrant incident was triggered shortly afterwards for the same issue.
    ## Timeline (PST)
    | Time | Event |
    | --- | --- |
    | 1:24 PM | Confirmed no pipeline-service deployment had been done and the issue was observed only for a few Prod2 and Prod1 CI customers. |
    | 1:28 PM | Verified CI automation was running fine, but we were able to reproduce the issue. |
    | 2:04 PM | Prod2 CI service was rolled back to the previous version, and we confirmed with customers that the issue was mitigated. |
    # Resolution
    We rolled back the CI build in the Prod2 cluster to unblock the customers.
    # Total Downtime
    * Downtime taken: none
    * Resolution time*: 1 hour 46 minutes
    * Resolution time = time reported to time restored, either through rollback or hotfix
    # RCA
    There was a change in a common deserialiser: we added handling so that if a value is a string containing a JSON list (for example `"[1,2]"`), it is converted to a list of strings even when the field expects a plain string, which throws an exception during execution. This was mainly observed for a customer who had such a value set in the envVariables of a Run step in their CI stage. (An illustrative sketch of this failure mode follows this update list.)
    # Action Items
    1. Update our customer setup automation to include this setup, as well as any others, so that our test suite stays up to date and existing customer setups are not impacted by new feature development.
    2. Add failovers to code paths when changing existing flows, to minimize the impact of new feature/bug development on existing running setups.
  • Time: Dec. 21, 2023, 10:37 p.m.
    Status: Resolved
    Update: The incident has been resolved. We plan to publish the Postmortem early next week. Our pipeline execution failure rate was less than 1% during this incident. As a result, no downtime was taken.
  • Time: Dec. 21, 2023, 10:11 p.m.
    Status: Monitoring
    Update: We have reverted the services back to the previous version and we are monitoring the results.
  • Time: Dec. 21, 2023, 10:06 p.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: Dec. 21, 2023, 10:05 p.m.
    Status: Investigating
    Update: We are currently investigating this issue.
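
To make the RCA above concrete, here is a hypothetical, minimal sketch (not Harness source code) of how a shared deserializer that coerces JSON-list-looking strings into lists breaks a caller that expects a plain string; the function and variable names are invented for illustration.

```python
# Hypothetical sketch of the deserializer regression described in the RCA above.
import json

def deserialize_env_value(raw: str):
    """Overly eager handling: turns a string like '[1,2]' into a list of strings."""
    stripped = raw.strip()
    if stripped.startswith("[") and stripped.endswith("]"):
        try:
            return [str(item) for item in json.loads(stripped)]
        except json.JSONDecodeError:
            pass  # not valid JSON; fall through and keep the raw string
    return raw

def run_step(env_vars: dict) -> None:
    """A CI Run step that requires every env var value to stay a string."""
    for key, value in env_vars.items():
        parsed = deserialize_env_value(value)
        if not isinstance(parsed, str):
            # The exception path from the postmortem: the field is declared as a
            # string, but the shared deserializer handed back a list instead.
            raise TypeError(f"env var {key!r} expected str, got {type(parsed).__name__}")

try:
    run_step({"TARGETS": "[1,2]"})   # a value shaped like the customer's envVariables
except TypeError as exc:
    print(f"pipeline step failed: {exc}")
```

The stated action items follow from this shape of bug: exercising the change against representative customer configurations, and keeping the old code path as a fallback, would have caught or contained the regression.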

Updates:

  • Time: Dec. 18, 2023, 7:43 p.m.
    Status: Resolved
    Update: The problem was primarily with the image being used, and we have notified the affected customers to use a stable version of DinD (Docker-in-Docker) to prevent this issue: https://github.com/docker-library/docker/commit/4c2674df4f40c965cdb8ccc77b8ce9dbc247a6c9
  • Time: Dec. 18, 2023, 6:52 p.m.
    Status: Investigating
    Update: We continue to look into the situation. At the moment, two customers are affected, and our team is working on debugging the issue.
  • Time: Dec. 18, 2023, 6:22 p.m.
    Status: Investigating
    Update: We are currently investigating this issue.

Updates:

  • Time: Dec. 13, 2023, 6:59 p.m.
    Status: Postmortem
    Update:
    # Incident
    On December 11th, starting at 4:25 PM (all times UTC), Harness had a service outage that affected pipelines in our Prod2 environment. Specifically, CI and CD pipeline executions in NextGen which used secrets failed. The incident was resolved on December 11 at 4:50 PM. This incident is related to the [incident](https://status.harness.io/incidents/w2w7btby70xs) from last week.
    # Timeline
    | Time | Event |
    | --- | --- |
    | Dec 11, 4:25 PM | Harness detected pipelines were failing to resolve secrets. |
    | Dec 11, 4:28 PM | Incident was acknowledged, and the P0 incident was called. |
    | Dec 11, 4:35 PM | Root cause identified. |
    | Dec 11, 4:50 PM | Incident resolved. |
    # Root Cause
    ## Background
    Harness uses connectors to external secret managers (e.g. Google Secret Manager or HashiCorp Vault) to resolve/store secrets used by pipelines and elsewhere in the Harness platform. External secret manager connectors require configuration, including a means to authenticate to the external secret manager.
    On 2023-12-07, there was an incident where a bad secret manager configuration was leading to thread exhaustion. To mitigate that incident, we updated the faulty configuration in the database and restarted the affected services. The present incident was a downstream effect of that earlier incident.
    # Mitigation and Remediation
    * In the prior incident, we manually updated the config that controlled the broken secret manager connector. In this cleanup, we unintentionally left a dangling database entry. Had we updated the connector by API, this entry would have been cleaned up correctly.
    * After discovery, we deleted the secret through the API and restarted the affected services.
    # Followup/Action Items
    * On Friday, we rolled out a hotfix to prevent the creation of such faulty configurations. However, it did not help in this case since this was an existing configuration.
    * Additional runtime validation was in the works, which detects the self-reference when a secret is used in pipeline execution. It has since been rolled out.
  • Time: Dec. 11, 2023, 5:04 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Dec. 11, 2023, 5:01 p.m.
    Status: Monitoring
    Update: The issue has been resolved. We are continuing to monitor the incident.
  • Time: Dec. 11, 2023, 4:42 p.m.
    Status: Identified
    Update: The issue has been identified and a fix is being implemented.

Updates:

  • Time: Dec. 9, 2023, 12:18 a.m.
    Status: Postmortem
    Update:
    # Incident
    On December 7th, starting around 9 PM (all times UTC), Harness experienced an outage that affected pipelines in our Prod2 environment. Specifically, CI and CD pipelines in NextGen which used secrets were failing during execution. There was also intermittent downtime for FirstGen pipelines during the service restart events. The incident was resolved on December 8th at 3:01 AM.
    # Timeline
    | Time | Event |
    | --- | --- |
    | Dec 7, 9:13 PM | First customer reported the issue. Triaged as likely a result of a separate ongoing incident. |
    | Dec 7, 10:34 PM | Issue acknowledged as independent of the separate incident, and an incident was called. |
    | Dec 8, 2:13 AM | Root cause identified. |
    | Dec 8, 3:01 AM | Incident resolved. |
    # Response
    Performance degradation and execution failure issues were reported across Continuous Integration (CI) and Continuous Deployment (CD) pipelines starting at 9:13 PM on Dec 7th. A high severity incident was declared at 10:30 PM.
    # Root Cause
    ## Background
    Harness uses connectors to external secret managers (e.g. Google Secret Manager or HashiCorp Vault) to resolve/store secrets used by pipelines and elsewhere in the Harness platform. External secret manager connectors require configuration, including a means to authenticate to the external secret manager.
    ## Sequence of Events
    * A customer configured their secret manager connector to use a secret stored in the same secret manager. This issue was not apparent at the time of the change to either Harness or the user, because validation rules did not catch it.
    * Several hours later, a pipeline was run that referenced a secret contained in that secret manager.
    * The pipeline execution tried to resolve the secret. Secret resolution created a recursive loop, filling the thread pool devoted to secret resolution.
    * Thread pool exhaustion stalled secret resolution across the environment. End users experienced this stall as pipeline failures, because failed secret resolution fails a pipeline. (An illustrative sketch of this failure mode, and of the self-reference check, follows this update list.)
    # Mitigation and Remediation
    Mitigation consisted of:
    1. Updating the faulty configuration to break the self-dependency.
    2. Aborting the affected in-flight pipeline executions.
    3. Scaling all replicas of the service which manages secret resolution to zero, to stop the job from being picked back up by the scheduler. Note that redeploying or restarting the service didn't fix the issue, because any surviving replica would instantly poison the others.
    A hotfix has been released to ensure configuration validation includes checking for self-reference.
    # Followup/Action Items
    * Improve fault isolation and layering between services in a way that makes causal issues easier to detect.
    * Our observability systems were operational and functioning normally; however, they were not configured to alert on this type of issue. We will be implementing two classes of fixes across the platform:
      1. Log-volume-based alerting. Although this would not have identified the specific issue sooner, it would have decreased time to detection.
      2. Closing the loop between observability metrics and alerting thresholds on those metrics. As metrics are added, they need alerting thresholds configured at the same time and adjusted as needed, rather than creating metrics and configuring alerting in a separate workstream. An alert on thread pool size would have greatly reduced the incident resolution time.
    * Our incident response playbooks include triage steps for individual modules, and steps for fault isolation at the platform level, but didn't fully cover the scope of actions needed to isolate this issue. We will enhance our playbooks to provide additional depth for platform-level triage.
    We understand that the Harness platform is mission critical for our customers. We are committed to living up to our promise of reliability and availability. We are determined to learn from this incident and make the necessary improvements to meet our shared world-class standards. Your trust is of utmost importance, and we appreciate your understanding.
  • Time: Dec. 8, 2023, 3:18 a.m.
    Status: Resolved
    Update: This incident has been resolved. The impacted components were Prod2 Continuous Delivery - Next Generation (CDNG) and Continuous Integration Enterprise (CIE) - Self Hosted Runners.
  • Time: Dec. 8, 2023, 3:03 a.m.
    Status: Monitoring
    Update: We are continuing to monitor for any further issues.
  • Time: Dec. 8, 2023, 3:01 a.m.
    Status: Monitoring
    Update: The incident is now resolved. Detailed RCA to follow.
  • Time: Dec. 8, 2023, 2:13 a.m.
    Status: Identified
    Update: We have identified the root cause and we are in the process of recovering.
  • Time: Dec. 8, 2023, 2:08 a.m.
    Status: Identified
    Update: The secret decryption task is failing, and we are looking into a recovery.
  • Time: Dec. 8, 2023, 1:44 a.m.
    Status: Identified
    Update: We are rolling back the services to the previously deployed version. We will keep you updated on the progress.
  • Time: Dec. 8, 2023, 12:23 a.m.
    Status: Identified
    Update: We are currently working on debugging the issue. We have identified that there may be a problem with the gRPC calls between services. We will keep you updated on the progress.
  • Time: Dec. 7, 2023, 11:44 p.m.
    Status: Identified
    Update: We are currently in the process of identifying the incident. As soon as it is identified, we will provide an update.
  • Time: Dec. 7, 2023, 10:36 p.m.
    Status: Identified
    Update: We are continuing to work on a fix for this issue.
  • Time: Dec. 7, 2023, 10:35 p.m.
    Status: Identified
    Update: We continue to look into the issue and are considering rolling back the latest deployment.
  • Time: Dec. 7, 2023, 10:34 p.m.
    Status: Investigating
    Update: Pipelines that reference secrets began experiencing failures in the Prod-2 cluster starting around 9:13 PM UTC. The Harness team started looking into the issue, and a high-severity incident was declared at 10:34 PM UTC.
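
The sequence of events in the postmortem above comes down to a self-referencing connector plus a bounded thread pool. Below is a hypothetical, simplified sketch (not Harness code) of both the deadlock and the kind of self-reference validation the hotfix is described as adding; all names and structures are invented for illustration.

```python
# Hypothetical sketch of the recursive secret-resolution loop and the
# self-reference check described in the RCA above. Not Harness source code.
from concurrent.futures import ThreadPoolExecutor

POOL = ThreadPoolExecutor(max_workers=4)   # stand-in for the secret-resolution thread pool

CONNECTORS = {
    # The connector's own auth token is a secret stored behind the same connector.
    "vault-prod": {"auth_secret_ref": "vault-prod/api-token"},
}

def resolve_secret(ref: str) -> str:
    """Resolving any secret first needs the connector's auth secret, which is
    submitted back onto the same pool. The recursion never terminates: once the
    pool is full, every worker blocks on a result that can never arrive (the
    thread pool exhaustion described above)."""
    connector, _, name = ref.partition("/")
    auth_ref = CONNECTORS[connector]["auth_secret_ref"]
    auth_token = POOL.submit(resolve_secret, auth_ref).result()  # deadlocks once the pool fills
    return f"secret:{name} (auth={auth_token})"

def validate_connector(connector: str) -> None:
    """The kind of self-reference validation the hotfix added: reject a connector
    whose auth secret is stored in the connector itself."""
    auth_ref = CONNECTORS[connector]["auth_secret_ref"]
    if auth_ref.partition("/")[0] == connector:
        raise ValueError(f"connector {connector!r} authenticates with a secret it stores itself")

try:
    validate_connector("vault-prod")   # fails fast instead of letting pipelines hang
except ValueError as exc:
    print(f"rejected configuration: {exc}")
```

Rejecting the configuration at save time, as in the validation sketch, matches the hotfix described in the follow-up items; the runtime check mentioned in the later postmortem would perform the same test when the secret is resolved during pipeline execution.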

Check the status of similar companies and alternatives to Harness

UiPath

Systems Active

Scale AI

Systems Active

Notion

Systems Active

Brandwatch

Systems Active

Olive AI

Systems Active

Sisense

Systems Active

HeyJobs

Systems Active

Joveo

Systems Active

Seamless AI

Systems Active

hireEZ

Systems Active

Alchemy

Systems Active

Frequently Asked Questions - Harness

Is there a Harness outage?
The current status of Harness is: Systems Active
Where can I find the official status page of Harness?
The official status page for Harness is here
How can I get notified if Harness is down or experiencing an outage?
To get notified of any status changes to Harness, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Harness every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Harness do?
Harness is a software delivery platform that enables engineers and DevOps to build, test, deploy, and verify software as needed.