Is there a Harness outage?

Harness status: Systems Active

Last checked: 7 minutes ago

Get notified about any outages, downtime, or incidents for Harness and 1,800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Harness outages and incidents

Outage and incident data over the last 30 days for Harness.

There have been 3 outages or incidents for Harness in the last 30 days.

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Harness

OutLogger tracks the status of these components for Harness:

Service Reliability Management - Error Tracking FirstGen (fka OverOps) Active
Software Engineering Insights FirstGen (fka Propelo) Active
Chaos Engineering Active
Cloud Cost Management (CCM) Active
Continuous Delivery (CD) - FirstGen - EOS Active
Continuous Delivery - Next Generation (CDNG) Active
Continuous Error Tracking (CET) Active
Continuous Integration Enterprise(CIE) - Cloud Builds Active
Continuous Integration Enterprise(CIE) - Linux Cloud Builds Active
Continuous Integration Enterprise(CIE) - Self Hosted Runners Active
Continuous Integration Enterprise(CIE) - Windows Cloud Builds Active
Custom Dashboards Active
Feature Flags (FF) Active
Infrastructure as Code Management (IaCM) Active
Internal Developer Portal (IDP) Active
Security Testing Orchestration (STO) Active
Service Reliability Management (SRM) Active
Software Engineering Insights (SEI) Active
Software Supply Chain Assurance (SSCA) Active

Latest Harness outages and incidents.

View the latest incidents for Harness and check for official updates:

Updates:

  • Time: Oct. 6, 2023, 11:19 p.m.
    Status: Postmortem
    Update:
    # Summary
    CI pipelines using git connectors with a `Delegate` as the mode of connection did not update their status back to SCM/Git providers. Pipelines whose git connectors connect via the `Harness Platform` were not impacted.
    # Mitigation
    Steps taken to resolve the issue immediately:
    1. Rolled back the delegate for the rings the affected customers belong to.
    2. Rolled back the delegate for all rings.
    # Detailed Timeline (PST)
    | Time | Event |
    | --- | --- |
    | 10/05/2023 | |
    | 3:30 AM | Delegate deployment |
    | 10:53 AM | First customer ticket for status checks not reporting back |
    | 11 AM-12 PM | Two more customer tickets for the same issue |
    | 12:20 PM | Incident channel created |
    | 2:20 PM | Reproduced the issue: PR checks not reporting when run via delegate |
    | 2:40 PM | Rolled back the delegate in rings |
    | 3:30 PM | Informed customers about the rollback |
    | 3:40 PM | Customers confirmed restoration |
    ## RCA
    ### Why didn't certain CI pipelines update SCM/Git status?
    The delegate task created to update the status on Git failed, so the status was not reflected on the Git provider.
    ### Why did the delegate task fail?
    A change made to support the new Harness Code module introduced a missing dependency that failed at run time.
    ### Why didn't it impact all CI pipelines?
    Harness provides two ways to connect to Git providers: via the Harness Platform or via a Harness Delegate. Only CI pipelines using git connectors with a Delegate as the mode of connection failed to update status back to SCM/Git providers; pipelines connected via the Harness Platform were unaffected, because the missing dependency was in the Delegate.
    ### Why was the missing dependency not caught in the testing phase?
    We have automated tests that run pipelines via the Harness Delegate, but tests checking Git status updates in this mode of connectivity were missing.
    ## Steps taken
    Harness CI engineers tried to reproduce the issue in-house with various combinations of infrastructure (Kubernetes, Harness Cloud, virtual machines, etc.), but it took some time to realize it happens only when the git connector connects via the Harness Delegate instead of the Harness Platform. Once we realized this, we engaged the delegate engineering team, who reverted the delegate to a previous version that did not have this code.
    ## Follow-up actions
    Add automation to catch this case, and set up internal alerts for when the issue happens so it can be handled proactively.
  • Time: Oct. 4, 2023, 10:18 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Oct. 4, 2023, 10:18 p.m.
    Status: Identified
    Update: The incident has been resolved. Please reach out to Harness Support if you continue to see this behavior.
  • Time: Oct. 4, 2023, 9:51 p.m.
    Status: Identified
    Update: We have identified the issue and are currently rolling back to a previous version of the delegate.
  • Time: Oct. 4, 2023, 9:50 p.m.
    Status: Investigating
    Update: Impact: CI pipelines running on self-hosted infra and using the latest delegate (v23.09.80804) with Git Connectors that have `executeOnDelegate: true` will fail to see Git status updates. Workaround: Use `executeOnDelegate: false` (Connect via Harness Platform).
  • Time: Oct. 4, 2023, 9:50 p.m.
    Status: Investigating
    Update: The team has been notified of an issue and is currently investigating.
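
The workaround in the updates above amounts to flipping the connection mode on the affected git connector. As a rough illustration only, a Harness GitHub connector YAML with that setting might look like the sketch below; every name, identifier, URL, and secret reference here is hypothetical, and the exact schema should be checked against the Harness connector documentation:

```yaml
# Illustrative sketch of a Harness GitHub connector (all names hypothetical).
# executeOnDelegate: false routes connectivity through the Harness Platform,
# sidestepping the delegate code path affected by this incident.
connector:
  name: example-github-connector        # hypothetical name
  identifier: example_github_connector  # hypothetical identifier
  type: Github
  spec:
    url: https://github.com/example-org/example-repo  # hypothetical repo
    type: Repo
    authentication:
      type: Http
      spec:
        type: UsernameToken
        spec:
          username: example-user
          tokenRef: account.example_token  # hypothetical secret reference
    executeOnDelegate: false  # workaround: connect via Harness Platform
```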

Updates:

  • Time: Oct. 6, 2023, 10:26 a.m.
    Status: Postmortem
    Update:
    # Summary
    CI/CD pipeline executions slowed down or timed out because of high latency on Redis calls from the log service.
    # Mitigation
    1. Engaged Redis support.
    2. Increased shards for the log service.
    3. Monitored the primary shard until CPU usage returned to the expected range.
    # Detailed Timeline (PST)
    | Time | Event | Notes |
    | --- | --- | --- |
    | 8:42 AM | FireHydrant triggered for CI pipeline performance degradation | |
    | 8:45 AM | Checked Redis memory; it was under the limit | |
    | 9:05 AM | Determined that stream write calls were taking a very long time, resulting in longer execution times | |
    | 9:25 AM | Created a P0 with Redis support | |
    | 9:50 AM | Increased log-service memory | Since writes were taking longer, API payloads remained in the log service, increasing memory usage |
    | 10:30 AM | Redis support joined the call and requested shard logs to understand what was causing the high latency of Redis operations | Explained the chain of events, including the Redis memory increase from the previous week |
    | 10:51 AM | Deployed a change to decrease the number of lines in a log stream | Temporary fix to decrease the size per stream |
    | 11:15 AM | Discovered 100% CPU utilization on a Redis shard; performed a failover in an attempt to decrease CPU utilization, which did not help | CPU had been at 100% since Friday, including the weekend when load is low |
    | 12:00-1:30 PM | Gradually increased the number of shards; received logs for a 30-second window for the hot shard (requested at 10:30 AM) | Keys were distributed evenly across shards, but CPU utilization of the hot shard did not come down (hot shard at 100%, other shards at 30-40%); saw CRDT operation entries in the shard logs |
    | 2:30 PM | Redis team still investigating; requested all shard logs and CRDT sync logs | |
    | 3:34 PM | Received logs for a 30-second window for all shards | |
    | 4:01 PM | Redis team pointed out that even though keys were distributed evenly, the hot shard was consuming more than twice the memory of the newly provisioned shards | Harness team pointed out that replication was out of sync and there were many CRDT.MERGE entries in the hot shard logs that were missing from the other shard logs |
    | 4:08 PM | Failover and primary got in sync | |
    | 4:12 PM | CPU utilization started dropping, along with the high memory usage on the hot shard | Incident was marked as resolved |
    # RCA
    ### Why were the pipelines running slow?
    The Redis shard could not handle the log streaming load; its CPU was running at 100%, causing higher latency.
    ### Why was the Redis shard CPU running at 100%?
    Observations from the call with the Redis support team:
    1. On 09/29 we noticed the Redis instance was running close to capacity. The CloudOps team increased the size of the Redis instance to accommodate the increased load, but failed to make the corresponding change in the secondary cluster.
    2. Per the [log-service DB alerts](https://docs.google.com/spreadsheets/d/1uw9U-bqlaXZUv44jEW64tMHd2mU2s5oVXVIftQ3QaQ0/edit?usp=sharing) in Redis, the sync had been failing since 09/29 (OOM and connection errors) and did not recover until Monday 10/02.
    3. We suspect the sync process was in a bad state, keeping the CPU at 100% through the weekend as well. We are awaiting a detailed RCA from Redis support.
    # Follow-up actions
    1. Updated the memory of the failover to match the primary.
    2. Increased the number of shards.
    3. Enhancing monitoring and alerts around latency spikes.
  • Time: Oct. 3, 2023, 12:16 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Oct. 2, 2023, 11:22 p.m.
    Status: Monitoring
    Update: We have a solution in place, and pipelines should now complete within standard execution times; we are actively monitoring and will keep the status updated.
  • Time: Oct. 2, 2023, 10:04 p.m.
    Status: Identified
    Update: After increasing the number of shards on our caching database, we see most executions are now completing within the standard execution times. We are still seeing high CPUs on one of the shards, and we will continue working with our caching-managed service vendor to address this. We will keep you updated on the progress.
  • Time: Oct. 2, 2023, 8:43 p.m.
    Status: Investigating
    Update: We have identified the caching component as the bottleneck at this point. We are working closely with support to increase the number of shards on our caching database to mitigate the issue. We will keep you updated on the progress.
  • Time: Oct. 2, 2023, 5:06 p.m.
    Status: Investigating
    Update: Pipeline executions may be running slowly due to an ongoing issue with log streaming.
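
The postmortem above notes that keys were distributed evenly across shards yet one shard stayed at 100% CPU: sharding balances keys, not traffic. The sketch below is not Harness code; the shard count, key names, and op counts are invented, and real Redis clustering hashes keys with CRC16 into 16,384 slots rather than Python's `hash`. It only illustrates how a single hot key can dominate one shard:

```python
from collections import Counter

def shard_for(key: str, num_shards: int) -> int:
    # Illustrative key -> shard mapping (Redis actually uses CRC16 mod 16384).
    return hash(key) % num_shards

def ops_per_shard(ops: list, num_shards: int) -> Counter:
    # Count how many operations land on each shard.
    counts = Counter()
    for key in ops:
        counts[shard_for(key, num_shards)] += 1
    return counts

# 10,000 distinct log-stream keys spread evenly across shards...
keys = [f"log-stream-{i}" for i in range(10_000)]
# ...but one busy stream receives 100,000 extra writes (a hot key).
ops = keys + ["log-stream-0"] * 100_000

load = ops_per_shard(ops, num_shards=8)
hot_shard = shard_for("log-stream-0", 8)
# The hot key's shard carries the overwhelming majority of operations,
# even though the *keys* are spread evenly across all 8 shards —
# so adding shards alone cannot cool it down.
```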

Updates:

  • Time: Oct. 3, 2023, 7:21 a.m.
    Status: Postmortem
    Update:
    ## Impact
    Users were unable to log in to [app.propelo.ai](http://app.propelo.ai/) and [app.levelops.io](http://app.levelops.io/) through the [https://app.propelo.ai/signin](https://app.propelo.ai/signin) flow. Data ingestion, data processing, propels, and any users already logged in in the US region were unaffected. The EMEA and APAC regions were not affected.
    **Workaround**: Users could still log into the system via the [SEI](https://app.propelo.ai/auth/login-page) flow, as this issue impacted only the **/signin** flow of the application.
    ## Root Cause
    One of the login flows is affected by tenant deletion tasks, because it looks at all tenants to determine which ones the user trying to log in has access to.
    * If the users table of an 'available' tenant doesn't exist, the flow fails entirely.
    * During tenant deletions, several tenants were removed, but the entries marking those tenants as 'available' were not deleted.
    * The connection to the DB was severed due to a VPN disconnect.
    ## Timeline
    | Time | Event |
    | --- | --- |
    | 2023-10-02 05:45 AM PDT | The issue was resolved. |
    | 2023-10-02 05:40 AM PDT | The tenants marked for deletion were removed from the global list of tenants; operations returned to normal after the deletion of these unused tenant IDs. |
    | 2023-10-02 05:10 AM PDT | The issue was identified via internal testing and an incident was triggered. |
    ## Action Items
    * Institute a downtime window and an alerting mechanism to stakeholders for maintenance activity.
    * Perform verification / run sanity tests across tenants in the respective regions to ensure the app is up and running.
    * Review the logic for /login and add guardrails around the check on the global tenants list.
    * Institute a process/tool for tenant de-provisioning in the legacy SEI module.
  • Time: Oct. 3, 2023, 6:56 a.m.
    Status: Resolved
    Update: We can confirm normal operation. Get Ship Done! We will continue to monitor and ensure stability.
  • Time: Oct. 3, 2023, 6:56 a.m.
    Status: Monitoring
    Update: Harness SEI login flow issue has been addressed and normal operations have been resumed. We are monitoring the service to ensure normal performance continues.
  • Time: Oct. 3, 2023, 6:55 a.m.
    Status: Identified
    Update: We have identified a potential cause of the login issue and are working hard to address it. Please continue to monitor this page for updates.
  • Time: Oct. 3, 2023, 6:55 a.m.
    Status: Investigating
    Update: A few users are facing issues while trying to log in to Harness SEI through one of the login flows. We are working to identify the cause and restore normal operations as soon as possible.

Updates:

  • Time: Oct. 25, 2023, 3:28 a.m.
    Status: Postmortem
    Update: Root Cause was due to DockerHub incident [https://www.dockerstatus.com/pages/533c6539221ae15e3f000031](https://www.dockerstatus.com/pages/533c6539221ae15e3f000031)
  • Time: Sept. 29, 2023, 12:24 a.m.
    Status: Resolved
    Update: We are resolving the incident on the DockerHub side for now. Please follow the DockerHub status page for any further updates.
  • Time: Sept. 28, 2023, 11:31 p.m.
    Status: Monitoring
    Update: Docker has identified and mitigated the issue and is actively monitoring for any recurrences; we will continue to monitor their status in parallel. Latest update from the provider: We have mitigated the issue as of about 21:50 UTC, but will continue to monitor until an underlying incident at our provider is cleared.
  • Time: Sept. 28, 2023, 9:16 p.m.
    Status: Investigating
    Update: Currently, DockerHub has an active incident causing timeouts while authenticating to or pulling from DockerHub: https://www.dockerstatus.com/pages/incident/533c6539221ae15e3f000031/6515dbc89fabff05350ae18d. Harness customers using Docker in their pipelines may run into issues because of the ongoing DockerHub incident. Please follow the DockerHub status page for further updates.
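
Incidents like this one are a reminder that registry logins and image pulls can fail transiently. A common generic mitigation is retrying with exponential backoff; the sketch below is not Harness's implementation, and the function names and the simulated `flaky_pull` are invented for illustration:

```python
import time

def retry_with_backoff(fn, attempts=4, base_delay=1.0, sleep=time.sleep):
    # Call fn(); on exception, wait base_delay * 2**i seconds and retry.
    # Re-raises the last error once all attempts are exhausted.
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            sleep(base_delay * (2 ** i))

# Example: a pull that times out twice (simulating registry trouble),
# then succeeds on the third try.
calls = {"n": 0}
def flaky_pull():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("registry timeout")
    return "pulled"

# Inject a no-op sleep so the example runs instantly.
result = retry_with_backoff(flaky_pull, sleep=lambda s: None)
# result == "pulled" after two simulated timeouts
```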

Updates:

  • Time: Oct. 24, 2023, 7 p.m.
    Status: Postmortem
    Update:
    # Overview
    All customers experienced disruptions with Dashboards and Perspectives. However, users could still access CCM and use other sections of the system.
    # Timeline (PST)
    | Time | Event |
    | --- | --- |
    | 10:17 AM, Sept 28, 2023 | Users reported slowness in Dashboards and Perspectives. The engineering team started investigating. |
    | 11:46 AM, Sept 28, 2023 | Determined that GCP BigQuery was having autoscaling issues in the US multi-region. |
    | 12:52 PM, Sept 28, 2023 | BigQuery usage was redirected to a different service account from another GCP project that uses on-demand pricing. |
    | ~2:30 PM, Sept 28, 2023 | Perspectives were restored and operational. Dashboards were functioning to some extent, though a few customers were still experiencing issues. |
    | 5:12 PM, Sept 28, 2023 | Google reported the BigQuery issue resolved. |
    | 5:12 PM, Sept 28, 2023 | Dashboards were fully restored and operational for all customers. |
    # Resolution
    Redirecting BigQuery usage to a service account in a different GCP project with on-demand billing resolved the problem.
    # Affected Users
    Users in Prod1 and Prod2 were impacted. Only the Perspectives and Dashboards features were affected; the rest of CCM operated without issues.
    # RCA
    Google BigQuery uses slot autoscaling to improve slot availability and performance. An incident with BigQuery hindered the slot autoscaling functionality. Since CCM Perspectives and Dashboards rely on BigQuery, the incident impacted their query response times.
  • Time: Sept. 29, 2023, 12:36 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Sept. 28, 2023, 11:33 p.m.
    Status: Monitoring
    Update: We are continuing to monitor the workaround in place for any issues. Additionally, we are seeing responses come back from our primary upstream provider, but we are awaiting a further status update/confirmation from the provider (GCP) before confirming resolution.
  • Time: Sept. 28, 2023, 9:51 p.m.
    Status: Monitoring
    Update: Harness has recently implemented a workaround for the majority of customers, which should restore CCM Perspectives as well as Custom Dashboards. Our primary upstream provider (GCP), while showing some signs of recovery, is still encountering issues. Harness will continue to provide status updates as we closely monitor the workaround we have put in place to mitigate issues related to CCM Perspectives and Dashboards until the incident is fully resolved. (Note: Existing GCP/CCM connector validation/test functionality will error out due to the workaround in place, but this does not affect product functionality.)
  • Time: Sept. 28, 2023, 9:39 p.m.
    Status: Identified
    Update: Harness has recently implemented a workaround for the majority of customers which should restore CCM Perspectives as well as Dashboards. Our primary upstream provider is still experiencing issues and is showing signs of recovery, and we expect an update within the next 30 minutes to the upstream status. Note: Existing GCP/CCM connector validation/tests will error out due to the workaround in place.
  • Time: Sept. 28, 2023, 8:53 p.m.
    Status: Identified
    Update: We have identified that the issue we are experiencing is related to the Google BigQuery incident: https://status.cloud.google.com/incidents/V8br4RDzg1RsCw6zWQEv
  • Time: Sept. 28, 2023, 8:23 p.m.
    Status: Investigating
    Update: CCM Perspectives and Dashboards are down because of a GCP BigQuery incident. We are currently investigating this issue.

Check the status of similar companies and alternatives to Harness

UiPath

Systems Active

Scale AI

Systems Active

Notion

Systems Active

Brandwatch

Systems Active

Olive AI

Systems Active

Sisense

Systems Active

HeyJobs

Systems Active

Joveo

Systems Active

Seamless AI

Systems Active

EdCast by Cornerstone

Issues Detected

hireEZ

Systems Active

Alchemy

Systems Active

Frequently Asked Questions - Harness

Is there a Harness outage?
The current status of Harness is: Systems Active
Where can I find the official status page of Harness?
The official status page for Harness is here
How can I get notified if Harness is down or experiencing an outage?
To get notified of any status changes to Harness, sign up for OutLogger's free monitoring service. OutLogger checks the official status of Harness every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
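The polling described above can be approximated with a short script. Many vendor status pages are Statuspage-style and expose a JSON summary endpoint; the URL below is an assumption (verify it against the vendor's actual status page), and the helper names are illustrative:

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint: Statuspage-style pages commonly serve
# /api/v2/status.json; confirm the real URL before relying on it.
STATUS_URL = "https://status.harness.io/api/v2/status.json"

def parse_indicator(payload: dict) -> str:
    # Extract the overall status indicator; 'none' means all clear.
    return payload.get("status", {}).get("indicator", "unknown")

def check_status(url: str = STATUS_URL) -> str:
    # Fetch and parse the live status page (network call).
    with urlopen(url, timeout=10) as resp:
        return parse_indicator(json.load(resp))

# Offline example with a payload shaped like Statuspage output:
sample = {"status": {"indicator": "none",
                     "description": "All Systems Operational"}}
# parse_indicator(sample) == "none"
```

A real monitor would call `check_status()` on a schedule and alert when the indicator changes away from `"none"`.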
What does Harness do?
Harness is a software delivery platform that enables engineers and DevOps to build, test, deploy, and verify software as needed.