Is there a Harness outage?

Harness status: Systems Active

Last checked: a minute ago

Get notified about any outages, downtime or incidents for Harness and 1800+ other cloud vendors. Monitor 10 companies, for free.

Harness outages and incidents

Outage and incident data over the last 30 days for Harness.

There have been 3 outages or incidents for Harness in the last 30 days.

Components and Services Monitored for Harness

OutLogger tracks the status of these components for Harness:

Service Reliability Management - Error Tracking FirstGen (fka OverOps) Active
Software Engineering Insights FirstGen (fka Propelo) Active
Chaos Engineering Active
Cloud Cost Management (CCM) Active
Continuous Delivery (CD) - FirstGen - EOS Active
Continuous Delivery - Next Generation (CDNG) Active
Continuous Error Tracking (CET) Active
Continuous Integration Enterprise(CIE) - Cloud Builds Active
Continuous Integration Enterprise(CIE) - Linux Cloud Builds Active
Continuous Integration Enterprise(CIE) - Self Hosted Runners Active
Continuous Integration Enterprise(CIE) - Windows Cloud Builds Active
Custom Dashboards Active
Feature Flags (FF) Active
Infrastructure as Code Management (IaCM) Active
Internal Developer Portal (IDP) Active
Security Testing Orchestration (STO) Active
Service Reliability Management (SRM) Active
Software Engineering Insights (SEI) Active
Software Supply Chain Assurance (SSCA) Active

Latest Harness outages and incidents.

View the latest incidents for Harness and check for official updates:

Updates:

  • Time: Jan. 31, 2024, 6:44 p.m.
    Status: Postmortem
    Update: After deployment on the prod-3 cluster, the NextGen UI got stuck on the initial loading screen. The issue was observed immediately during post-deployment sanity checks. We identified the problem as required static resources failing to load. This release included a change to how we build and load the UI for different environments: the source for static files became configurable per environment. An incompatible configuration for the prod-3 cluster prevented the correct URL from being formed, resulting in 404 errors for our JS resources. We mitigated the incident by updating the service configuration for this environment and redeploying the NextGen UI service. With the new configuration, the UI service generated the correct URLs, and the issue was resolved.

    ### Timeline

    | **Time (UTC)** | **Event** |
    | --- | --- |
    | 12:44 AM | Incident was first detected after the new deployment. An internal incident was raised, and the team started looking into the issue. |
    | 12:46 AM | Root cause identified and the fix was deployed. |
    | 12:47 AM | Incident resolved |

    ### Action Items

    * We are auditing the service configurations for all environments with an aim to minimize the differences.
    * Improve the NextGen UI build process to handle incompatible configurations.
  • Time: Jan. 30, 2024, 12:56 a.m.
    Status: Resolved
    Update: The incident has been resolved. We will provide a postmortem once we have gathered all the details.
  • Time: Jan. 30, 2024, 12:47 a.m.
    Status: Monitoring
    Update: A fix has been implemented and we are monitoring the results.
  • Time: Jan. 30, 2024, 12:44 a.m.
    Status: Investigating
    Update: We are currently investigating this issue.
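
The postmortem above traces the 404s to a per-environment static-file source that produced a malformed URL. As a rough illustration of the "handle incompatible configurations" action item, the following sketch validates each environment's base URL before building asset URLs. The environment names, URLs, and function names are hypothetical and are not taken from Harness's actual build process.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical per-environment settings; names and URLs are illustrative only.
STATIC_SOURCES = {
    "prod-1": "https://static.example.com/ng/",
    "prod-3": "static.example.com/ng",  # missing scheme and trailing slash
}


def static_url(env: str, asset: str) -> str:
    """Build the URL for a static asset, rejecting malformed base URLs."""
    base = STATIC_SOURCES[env]
    parsed = urlparse(base)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{env}: static source {base!r} is not an absolute URL")
    if not base.endswith("/"):
        raise ValueError(f"{env}: static source {base!r} must end with '/'")
    return urljoin(base, asset)


def validate_all(asset: str = "main.js") -> dict:
    """Return {env: error message} for every environment that fails the check."""
    errors = {}
    for env in STATIC_SOURCES:
        try:
            static_url(env, asset)
        except ValueError as exc:
            errors[env] = str(exc)
    return errors
```

Running a check like `validate_all()` as a build or pre-deployment step would surface the bad environment before the UI is served, rather than during post-deployment checks.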

Updates:

  • Time: Jan. 31, 2024, 5:18 p.m.
    Status: Postmortem
    Update: **Incident Summary:** On January 29, 2024, a disruption occurred in the Prod 2 environment, affecting the execution of AutoStopping rules. Users reported issues, resulting in a total downtime of 56 minutes. The incident was promptly addressed, with a resolution time of 1 hour and 17 minutes.

    **Timeline of Events:**

    | Timestamp (UTC) | Event |
    | --- | --- |
    | January 29, 2024, 06:13 AM | FireHydrant incident was opened. |
    | January 29, 2024, 06:13 AM | Incident acknowledged, and internal investigation initiated on the incident Slack channel. |
    | January 29, 2024, 06:24 AM | Root cause identified: A component critical for rule execution encountered errors. |
    | January 29, 2024, 06:57 AM | Immediate resolution applied to address the identified component issue. |
    | January 29, 2024, 07:20 AM | System stability restored; rule executions were near optimal. |
    | January 29, 2024, 07:34 AM | FireHydrant incident closed, and the incident marked as resolved. |

    **Root Cause Analysis:** The incident originated from the AutoStopping feature in the Prod 2 environment, causing a critical failure in a component crucial for rule execution. This resulted in a disruption of rule operations and a failure to transition messages to the enqueued state. The system relies on a data store that encountered difficulties persisting data, leading to operational failures. The root cause was related to capacity limitations in a specific data storage component, causing it to be unable to handle the increased volume of messages during the incident.

    **Immediate Resolution:** To address the incident promptly, the team increased the capacity of the affected component. This allowed for the expedited processing of rule operations and a swift resolution of the issue.

    **Preventive Measures:** To prevent similar incidents in the future, the team has implemented enhanced monitoring to receive timely notifications of potential capacity issues. Proactive measures are being taken to ensure the system can effectively handle increased loads.

    **Conclusion:** The incident was successfully resolved through immediate actions to increase resource capacity. The team is committed to implementing proactive measures to enhance system monitoring and prevent similar occurrences, ensuring the stability and reliability of the system for all users.
  • Time: Jan. 29, 2024, 7:23 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Jan. 29, 2024, 7:21 a.m.
    Status: Monitoring
    Update: A fix has been implemented and we are monitoring the results.
  • Time: Jan. 29, 2024, 6:56 a.m.
    Status: Identified
    Update: The issue has been identified and a fix is being implemented.
  • Time: Jan. 29, 2024, 6:32 a.m.
    Status: Investigating
    Update: We are currently investigating this issue.

Updates:

  • Time: Jan. 23, 2024, 6:42 p.m.
    Status: Postmortem
    Update: ## Summary

    The 'Customer Overview Page' was loading slowly in the Prod-2 cluster. All other critical functions remained unaffected.

    ## Timeline

    | **Time (UTC)** | **Event** |
    | --- | --- |
    | 04:30 PM | We got an alert, and the customer also reported the issue. |
    | 04:45 PM | An internal incident was raised, and the team started looking into the issue. |
    | 05:11 PM | Root cause identified |
    | 06:04 PM | Incident resolved |

    ## Resolution

    The high CPU-intensive maintenance task and the long-running queries were terminated to resume normal operations.

    ## RCA

    The dashboard failed to retrieve data from the backend database as the CPU utilization had reached > 90%. The alert came into the system as a Warning event that got overlooked. We observed the CPU spike due to maintenance tasks, some sub-optimal queries running on the primary node, and several active connections from the application side. We proceeded after validating that the queries and the maintenance task could be terminated without any potential data loss.

    ## Action Items

    1. We have moved the maintenance tasks to the secondary node.
    2. We are working on addressing the long-running queries coming from the application side.
    3. We are also working on implementing the server-side timeout for long-running queries.
    4. We will ensure the alerts immediately trigger an incident to the person on-call.
  • Time: Jan. 18, 2024, 6:03 p.m.
    Status: Resolved
    Update: The incident has been resolved.
  • Time: Jan. 18, 2024, 5:58 p.m.
    Status: Monitoring
    Update: The issue has been resolved and the overview page is back to normal. We are actively monitoring the systems.
  • Time: Jan. 18, 2024, 5:53 p.m.
    Status: Identified
    Update: The issue has been identified. Team is working to mitigate the issue and provide a solution as soon as possible.
  • Time: Jan. 18, 2024, 5:20 p.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: Jan. 18, 2024, 5:14 p.m.
    Status: Investigating
    Update: We are currently investigating an issue where customer dashboards are slow to load or failing to load in some specific environment. This does not impact the pipelines running or deployments.
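
The action items in the postmortem above mention a timeout for long-running queries. The postmortem does not name the database involved, so the sketch below uses Python's built-in `sqlite3` module purely as a stand-in to show the idea of aborting a query once it exceeds a time budget. The function name and the budget values are assumptions made for illustration; a production database would normally enforce this with its own server-side setting.

```python
import sqlite3
import time


def run_with_timeout(conn: sqlite3.Connection, sql: str, timeout_s: float):
    """Run a query, aborting it if it runs longer than timeout_s seconds."""
    deadline = time.monotonic() + timeout_s

    def check_deadline() -> int:
        # A non-zero return value tells SQLite to interrupt the running query.
        return 1 if time.monotonic() > deadline else 0

    # Invoke the handler every 1000 SQLite virtual-machine instructions.
    conn.set_progress_handler(check_deadline, 1000)
    try:
        return conn.execute(sql).fetchall()
    finally:
        # Remove the handler so later queries are not affected.
        conn.set_progress_handler(None, 1)
```

An aborted query raises `sqlite3.OperationalError`, which the caller can catch and report instead of letting the query keep consuming CPU.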

Updates:

  • Time: Jan. 10, 2024, 2:26 p.m.
    Status: Postmortem
    Update: **Overview**

    Multiple Harness customers in the Prod-2 cluster reported 500 errors while accessing or trying to run pipelines. Licensing information was also inaccessible.

    **Timeline**

    | **Time** | **Event** |
    | --- | --- |
    | 8 Jan 7:23 AM UTC | Issue was reported internally and by customers. |
    | 8 Jan 7:23 AM UTC | Internal incident created. |
    | 8 Jan 7:23 AM UTC | Rolled back system deployment, which immediately resolved the issue. |
    | 8 Jan 7:28 AM UTC | Internal incident resolved. |

    **Resolution**

    We rolled back our latest system deployment, which resolved the issue within 5 minutes of it being reported.

    **Root Cause Analysis**

    After our manager service release, a change in the licensing resource resulted in cache failures. The License API is called to fetch license information to check entitlements of services. The addition of new fields in license resources caused failures, which resulted in unhandled exceptions.

    **Action Items**

    * We have implemented exception handling around API calls to handle cache failures and avoid service breakdown.
    * Review cache management during software releases to avoid such failures.
  • Time: Jan. 8, 2024, 6:33 p.m.
    Status: Resolved
    Update: The issue was resolved once we did the rollback of the most recent deployment.
  • Time: Jan. 8, 2024, 6:31 p.m.
    Status: Investigating
    Update: We are currently investigating this issue.
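
The first action item in the postmortem above is exception handling around the license lookup so that a cache failure no longer takes the service down. A minimal sketch of that pattern follows; the exception class, the fallback values, and the `cache_lookup` callable are hypothetical stand-ins, since the postmortem does not describe the real License API client.

```python
import logging

logger = logging.getLogger("license")

# Conservative default used when the license lookup fails (an assumption).
DEFAULT_ENTITLEMENTS = {"modules": [], "source": "fallback"}


class LicenseCacheError(Exception):
    """Raised by the hypothetical cache client when an entry cannot be read."""


def fetch_entitlements(account_id, cache_lookup, fallback=None):
    """Return entitlements for an account without letting cache errors propagate.

    cache_lookup is any callable that takes an account id and returns a dict.
    """
    try:
        return cache_lookup(account_id)
    except LicenseCacheError:
        logger.warning("license cache failure for %s; using fallback", account_id)
        return fallback if fallback is not None else dict(DEFAULT_ENTITLEMENTS)
```

Whether the fallback should deny or allow entitlements is a product decision; the empty default here is only one possible choice.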

Updates:

  • Time: Jan. 9, 2024, 6:48 p.m.
    Status: Postmortem
    Update: **Incident Summary:** There was a recent incident related to delays in the evaluation of Asset Governance Rules, stemming from a queue build-up that caused temporary slowness in rule execution.

    **Timeline:**

    * **2024-01-04 06:18 PM UTC:** Incident reported.
    * **2024-01-04 06:20 PM UTC:** Incident acknowledged; investigation initiated.
    * **2024-01-04 06:20 PM UTC:** Root cause identified.
    * **2024-01-04 06:39 PM UTC:** Immediate resolution applied to expedite job processing.
    * **2024-01-04 06:48 PM UTC:** Queue size normalized, incident resolved.

    **Root Cause Analysis:** The delay was traced back to a build-up in the job queue utilized by the CCM Asset Governance feature. This model employs an asynchronous execution approach using a job queue, where rule executions are enqueued for processing. Workers asynchronously dequeue jobs from this queue to perform actual rule evaluations.

    **Analysis:** The queue build-up was notable for specific types of evaluations, with customers noticing slowness in Asset Governance execution.

    **Immediate Resolution:** To promptly address the issue, the team increased the replica count for the services involved, facilitating quicker job consumption from the queue.

    **Total Downtime:** There was no downtime during the incident.

    **Follow-up Actions:**

    1. Implementation of separate queues for ad-hoc queries and enforcements/recommendations.
    2. Enhanced telemetry and metrics monitoring, including alerts on queue lengths for various types.
    3. Ongoing investigation to improve asynchronous job execution for faster evaluations.
  • Time: Jan. 4, 2024, 7 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Jan. 4, 2024, 6:59 p.m.
    Status: Monitoring
    Update: A fix has been implemented and we are monitoring the results.
  • Time: Jan. 4, 2024, 6:59 p.m.
    Status: Identified
    Update: The issue has been identified and a fix is being implemented.
  • Time: Jan. 4, 2024, 6:57 p.m.
    Status: Investigating
    Update: We are currently investigating this issue.
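
The postmortem above describes rule executions being enqueued and then dequeued by asynchronous workers, with separate queues per job type as a follow-up action. The sketch below illustrates that general pattern with Python's standard `queue` and `threading` modules. The job kinds, function name, and worker counts are hypothetical, and this is not a description of how CCM Asset Governance is actually implemented.

```python
import queue
import threading


def run_workers(jobs, evaluate, workers_per_queue=2):
    """Evaluate jobs using one queue and one worker pool per job kind.

    jobs is an iterable of (kind, payload) pairs. Returns {kind: [results]}.
    """
    queues = {}
    results = {}
    lock = threading.Lock()
    for kind, payload in jobs:
        queues.setdefault(kind, queue.Queue()).put(payload)
        results.setdefault(kind, [])

    def worker(kind, q):
        while True:
            try:
                payload = q.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker exits
            outcome = evaluate(payload)
            with lock:
                results[kind].append(outcome)

    threads = [
        threading.Thread(target=worker, args=(kind, q))
        for kind, q in queues.items()
        for _ in range(workers_per_queue)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each kind has its own queue and workers, a backlog of one job type does not delay the others, and raising `workers_per_queue` plays the same role as the replica increase used during the incident.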

Check the status of similar companies and alternatives to Harness

UiPath

Systems Active

Scale AI

Systems Active

Notion

Systems Active

Brandwatch

Systems Active

Olive AI

Systems Active

Sisense

Systems Active

HeyJobs

Systems Active

Joveo

Systems Active

Seamless AI

Systems Active

hireEZ

Systems Active

Alchemy

Systems Active

Frequently Asked Questions - Harness

Is there a Harness outage?
The current status of Harness is: Systems Active
Where can I find the official status page of Harness?
The official status page for Harness is here
How can I get notified if Harness is down or experiencing an outage?
To get notified of any status changes to Harness, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Harness every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Harness do?
Harness is a software delivery platform that enables engineers and DevOps to build, test, deploy, and verify software as needed.