Is there a Harness outage?

Harness status: Systems Active

Last checked: 8 minutes ago

Get notified about any outages, downtime or incidents for Harness and 1800+ other cloud vendors. Monitor 10 companies, for free.

Harness outages and incidents

Outage and incident data over the last 30 days for Harness.

There have been 3 outages or incidents for Harness in the last 30 days.

Components and Services Monitored for Harness

OutLogger tracks the status of these components for Harness:

Service Reliability Management - Error Tracking FirstGen (fka OverOps) Active
Software Engineering Insights FirstGen (fka Propelo) Active
Chaos Engineering Active
Cloud Cost Management (CCM) Active
Continuous Delivery (CD) - FirstGen - EOS Active
Continuous Delivery - Next Generation (CDNG) Active
Continuous Error Tracking (CET) Active
Continuous Integration Enterprise(CIE) - Cloud Builds Active
Continuous Integration Enterprise(CIE) - Linux Cloud Builds Active
Continuous Integration Enterprise(CIE) - Self Hosted Runners Active
Continuous Integration Enterprise(CIE) - Windows Cloud Builds Active
Custom Dashboards Active
Feature Flags (FF) Active
Infrastructure as Code Management (IaCM) Active
Internal Developer Portal (IDP) Active
Security Testing Orchestration (STO) Active
Service Reliability Management (SRM) Active
Software Engineering Insights (SEI) Active
Software Supply Chain Assurance (SSCA) Active

Latest Harness outages and incidents.

View the latest incidents for Harness and check for official updates:

Updates:

  • Time: April 26, 2024, 11:42 p.m.
    Status: Postmortem
    Update:
    **Overview:** Security Testing Orchestration (STO) and IaCM modules impacted

    **What was the issue?** The STO and IaCM modules couldn't complete execution, causing the pipeline execution to time out. The reason was that the Redis keys were rotated, but the two microservices responsible for these modules were still using the older keys.

    **Timeline:**

    | **Time** | **Event** |
    | --- | --- |
    | 25th Apr 2024 7:03 AM PDT | Issue was noticed & investigation started. |
    | 25th Apr 2024 7:35 AM PDT | Issue Identified. |
    | 25th Apr 2024 7:43 AM PDT | Issue was resolved for STO. We continued Monitoring. |
    | 25th Apr 2024 7:49 AM PDT | Issue was resolved for IaCM. We continued Monitoring. |
    | 25th Apr 2024 8:00 AM PDT | All modules are declared Operational. |

    **Resolution:** The STO and IaCM modules were updated to use the new keys.

    **RCA & Action Items:** Two microservices were missed in the update because they had different configuration formats in QA vs. Production. Our change management process did not account for this discrepancy. As part of the improvement process, we will standardize the configurations across environments and add relevant checks for key rotation in the change management process.
  • Time: April 25, 2024, 3 p.m.
    Status: Resolved
    Update: This incident has been resolved. The team will work on an RCA and share it as soon as possible. We apologize for any inconvenience this may have caused.
  • Time: April 25, 2024, 2:53 p.m.
    Status: Monitoring
    Update: We are monitoring the systems now.
  • Time: April 25, 2024, 2:53 p.m.
    Status: Identified
    Update: IaCM is operational again as well. We will be monitoring the system now.
  • Time: April 25, 2024, 2:46 p.m.
    Status: Identified
    Update: We have resolved the issue for Feature Flags (FF), and the service is operational again.
  • Time: April 25, 2024, 2:43 p.m.
    Status: Identified
    Update: The issue has been identified and we have resolved it for STO in Prod1/Prod2
  • Time: April 25, 2024, 2:40 p.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: April 25, 2024, 2:35 p.m.
    Status: Investigating
    Update: We are currently investigating this issue.
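
The postmortem above commits to adding key-rotation checks that cope with configuration formats differing between environments. A minimal sketch of one such check follows. It is not Harness's implementation: the function names, the service names, the config layouts, and the idea of a per-service "path" to the key identifier are all assumptions made for illustration.

```python
def lookup(config, path):
    """Follow a tuple of nested keys into a dict; return None if any step is missing."""
    for part in path:
        if not isinstance(config, dict) or part not in config:
            return None
        config = config[part]
    return config


def services_with_stale_key(current_key_id, service_configs, key_paths):
    """Return the sorted names of services that do not reference current_key_id.

    key_paths maps a service name to the nested keys that locate its key id,
    because (hypothetically) each service may store it in a different place.
    A service with no registered path is reported as stale instead of skipped,
    so an unknown config format cannot pass the check silently.
    """
    stale = []
    for service, config in service_configs.items():
        path = key_paths.get(service)
        value = lookup(config, path) if path is not None else None
        if value != current_key_id:
            stale.append(service)
    return sorted(stale)
```

Run before a rotation is declared complete, a check of this shape would list any service still pointing at the old key. Treating an unregistered format as a failure is the design choice that matters here, since the incident came from two services whose format was not accounted for.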

Updates:

  • Time: April 19, 2024, 10:04 a.m.
    Status: Postmortem
    Update:
    **Issue Description:** Customers experienced a disruption in the Azure copy of cloud cost data due to authentication failures caused by incorrect token formatting.

    **Resolution Time:** 3 hours 48 minutes

    **Root Cause Analysis:**

    * Token formatting had changed, leading to failures.
    * The token format changed because of a caching issue in the component responsible for populating the configmap, which produced a malformed configmap that is used in the Azure data sync.
    * This was a partial outage impacting a few customers.

    **Prevention Measures and Follow-Up Actions:**

    1. Improve metric accuracy and alerting for datasync jobs.
    2. Enhance our automation suites.

    **Conclusion:** The issue stemmed from token formatting inconsistencies, which have been addressed with updated tokens and preventive measures implemented to avoid future disruptions.
  • Time: April 19, 2024, 10:03 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: April 19, 2024, 10:01 a.m.
    Status: Identified
    Update: Cause of token failure found.
  • Time: April 19, 2024, 10 a.m.
    Status: Identified
    Update: New token was created. Rolled out in prod1 and 2.
  • Time: April 19, 2024, 10 a.m.
    Status: Investigating
    Update: FireHydrant incident created
  • Time: April 19, 2024, 9:54 a.m.
    Status: Investigating
    Update: Jira opened

Updates:

  • Time: March 28, 2024, 1:05 a.m.
    Status: Postmortem
    Update:
    ## **Summary**

    Pipeline executions across CI/CD/STO are not advancing as anticipated, with some even getting stuck across various customers in the prod2 cluster.

    ## **Timelines**

    | **Time (PST)** | **Event** |
    | --- | --- |
    | 08:50 am | System instability alert was received and investigation was initiated. |
    | 09:20 am | Identified the culprit configuration that led to increased load on our systems. |
    | 10:00 am | Increased resource allocation to help with increased load. |
    | 10:15 am | Updated the invalid configuration from the system to make it valid. |
    | 11:00 am | Systems back to normal |

    ## **RCA**

    The Harness pipeline engine functions within a microservice ecosystem, working alongside various framework components to manage expressions. These expressions often involve variables and configuration files, which can be stored in Git repositories. One of these configuration files contained a self-referential expression. This recursive reference repeatedly called for the resolution of the same configuration file, triggering a loop that led the service to exhaust its resources.

    ## **Resolution**

    We've refactored the configuration to remove the recursive reference and restarted the service. Additionally, we've deployed hotfixes to prevent the reintroduction of such configurations and implemented mechanisms to auto-detect and halt recursion within the service.

    ## **Additional Action Items**

    To expedite the RCA and mitigate incidents promptly, we're implementing additional logging and alerting mechanisms to detect specific instabilities. This will enhance our ability to identify and address issues swiftly.
  • Time: March 26, 2024, 6:07 p.m.
    Status: Resolved
    Update: This incident has been resolved. All pipelines should be running with normal latency.
  • Time: March 26, 2024, 5:31 p.m.
    Status: Monitoring
    Update: Pipelines are running with healthy metrics and we are currently monitoring the systems
  • Time: March 26, 2024, 5:15 p.m.
    Status: Investigating
    Update: We have identified a possible cause, and are continuing to investigate the source of the issue.
  • Time: March 26, 2024, 4:28 p.m.
    Status: Investigating
    Update: Our engineering teams have discovered that pipelines are not running smoothly and are working to identify the issue as soon as possible.
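
The RCA above describes a self-referential expression that kept resolving itself until the service ran out of resources, and a fix that detects and halts recursion. The sketch below shows one common way to do that: track the chain of names currently being resolved and fail as soon as a name reappears. The `${name}` syntax, the `resolve` function, and the error class are hypothetical and are not taken from the Harness expression engine.

```python
import re

# Hypothetical reference syntax: ${name} points at another definition.
_REF = re.compile(r"\$\{([^}]+)\}")


class RecursiveReferenceError(ValueError):
    """Raised when a definition refers back to itself, directly or indirectly."""


def resolve(name, definitions, _stack=()):
    """Expand every ${...} reference in definitions[name].

    _stack holds the names currently being resolved. Seeing a name twice
    on that chain means the expansion would never finish, so it is rejected.
    """
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise RecursiveReferenceError(f"recursive reference: {chain}")
    if name not in definitions:
        raise KeyError(name)
    stack = _stack + (name,)
    return _REF.sub(
        lambda match: resolve(match.group(1), definitions, stack),
        definitions[name],
    )
```

For example, `resolve("url", {"url": "https://${host}/api", "host": "example.test"})` expands normally, while a definition such as `{"a": "${b}", "b": "${a}"}` raises `RecursiveReferenceError` instead of looping.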

Updates:

  • Time: March 22, 2024, 10:26 a.m.
    Status: Postmortem
    Update:
    ## Summary

    The 'Harness Overview Page' failed to load intermittently in the Prod-2 cluster while the critical functionality remained unaffected.

    ## Timeline

    | **Time (UTC)** | **Event** |
    | --- | --- |
    | 01:33 PM | Received internal alerts for synthetic monitoring failure. |
    | 01:47 PM | An internal incident was raised. |
    | 01:48 PM | The root cause was identified. |
    | 02:16 PM | Incident was resolved |

    ## Resolution

    The high CPU-intensive queries were terminated in the backend database to resume normal operations.

    ## RCA

    The Overview dashboard failed to retrieve data from the backend database as the CPU utilisation reached critical levels, resulting in significant delays in processing regular queries. We have isolated configurations within the database which, in combination with the application’s retry mechanism, led to undue load on the database server and left it in an unhealthy state.

    ## Action Items

    1. We have modified the database configurations, which show significant promise as per initial observations.
    2. We are in the process of moving the queries to horizontally scaled database nodes.
    3. More database optimizations are being planned for the application calls.
  • Time: March 20, 2024, 2:31 p.m.
    Status: Resolved
    Update: This incident has been resolved now.
  • Time: March 20, 2024, 2:19 p.m.
    Status: Monitoring
    Update: We have fixed the issue with Overview dashboard/landing dashboard and are actively monitoring it.
  • Time: March 20, 2024, 2:01 p.m.
    Status: Investigating
    Update: We are currently investigating an issue with Harness overview dashboard.
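
The RCA above notes that the application's retry mechanism added to the load on an already struggling database. The postmortem does not say how retries were changed, so the following is only a general illustration of a widely used mitigation: a bounded number of attempts with capped exponential backoff and random jitter. The function name and all parameter defaults are assumptions.

```python
import random
import time


def call_with_backoff(operation, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Run operation(), retrying on any exception with capped exponential backoff.

    Each wait is a random fraction of the current cap ("full jitter"), so many
    clients retrying at once do not hit the database in lockstep. sleep and rng
    are injectable so the behaviour can be tested without real delays.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)
```

Bounding the attempts is as important as the delay: an unbounded retry loop keeps adding queries to a database that is already saturated.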

Updates:

  • Time: March 15, 2024, 7:32 p.m.
    Status: Postmortem
    Update:
    #### Overview

    The 'Customer Overview Page' was loading slowly in the Prod-2 cluster. All other critical functions remained unaffected.

    #### Timeline

    #### What was the issue?

    The incident occurred when the dashboard failed to retrieve data from the backend database, which was traced back to the CPU utilization of the database exceeding 90%. This critical level of utilization triggered alerts. The surge in CPU usage was primarily due to an increase in load from the application's operations. The simultaneous demands on the database resources led to significant constraints, hindering its ability to process requests efficiently.

    #### Resolution

    To mitigate the issue and restore normal operations, immediate action was taken to terminate long-running queries that were contributing to the high CPU utilization. Additionally, the number of data-consuming services was reduced temporarily. These measures effectively decreased the load on the database, allowing its operations to resume at a normal pace and ensuring the availability of the dashboard data retrieval functionality.

    #### Action Items

    In response to this incident, the following action items have been identified and are being implemented to prevent recurrence and improve system resilience:

    1. **Distribute Database Load:** To better manage and distribute the incoming query load, especially during peak times, we will distribute database query load across 2 database instances.
    2. **Annotate Logs for Better Analysis:** Work is underway to enhance our logging strategy by annotating logs with details that will help in identifying patterns in query behavior. This enhancement will facilitate more granular analysis and understanding of how queries interact with the database resources.
  • Time: March 14, 2024, 6:03 p.m.
    Status: Resolved
    Update: We can confirm normal operation. We will continue to monitor and ensure stability.
  • Time: March 14, 2024, 5:26 p.m.
    Status: Monitoring
    Update: The Overview page latency is back to normal limits at this time. We are still monitoring the system for any issues.
  • Time: March 14, 2024, 5:01 p.m.
    Status: Identified
    Update: Due to additional load, the system is still not back to normal operations. We are actively debugging this incident.
  • Time: March 14, 2024, 4:39 p.m.
    Status: Monitoring
    Update: We are monitoring the service to ensure normal performance continues.
  • Time: March 14, 2024, 4:21 p.m.
    Status: Identified
    Update: The resource constraint has been identified and we are working to mitigate the situation.
  • Time: March 14, 2024, 4:07 p.m.
    Status: Investigating
    Update: We are currently investigating this issue.

Check the status of similar companies and alternatives to Harness

UiPath

Systems Active

Scale AI

Systems Active

Notion

Systems Active

Brandwatch

Systems Active

Olive AI

Systems Active

Sisense

Systems Active

HeyJobs

Systems Active

Joveo

Systems Active

Seamless AI

Systems Active

hireEZ

Systems Active

Alchemy

Systems Active

Frequently Asked Questions - Harness

Is there a Harness outage?
The current status of Harness is: Systems Active
Where can I find the official status page of Harness?
The official status page for Harness is here
How can I get notified if Harness is down or experiencing an outage?
To get notified of any status changes to Harness, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Harness every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Harness do?
Harness is a software delivery platform that enables engineers and DevOps to build, test, deploy, and verify software as needed.