Get notified about any outages, downtime, or incidents for Harness and 1800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for Harness.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Harness:
Component | Status |
---|---|
Service Reliability Management - Error Tracking FirstGen (fka OverOps) | Active |
Software Engineering Insights FirstGen (fka Propelo) | Active |
**Prod 1** | Active |
Chaos Engineering | Active |
Cloud Cost Management (CCM) | Active |
Continuous Delivery (CD) - FirstGen - EOS | Active |
Continuous Delivery - Next Generation (CDNG) | Active |
Continuous Error Tracking (CET) | Active |
Continuous Integration Enterprise(CIE) - Cloud Builds | Active |
Continuous Integration Enterprise(CIE) - Linux Cloud Builds | Active |
Continuous Integration Enterprise(CIE) - Self Hosted Runners | Active |
Continuous Integration Enterprise(CIE) - Windows Cloud Builds | Active |
Custom Dashboards | Active |
Feature Flags (FF) | Active |
Infrastructure as Code Management (IaCM) | Active |
Internal Developer Portal (IDP) | Active |
Security Testing Orchestration (STO) | Active |
Service Reliability Management (SRM) | Active |
Software Engineering Insights (SEI) | Active |
Software Supply Chain Assurance (SSCA) | Active |
**Prod 2** | Active |
Chaos Engineering | Active |
Cloud Cost Management (CCM) | Active |
Continuous Delivery (CD) - FirstGen - EOS | Active |
Continuous Delivery - Next Generation (CDNG) | Active |
Continuous Error Tracking (CET) | Active |
Continuous Integration Enterprise(CIE) - Cloud Builds | Active |
Continuous Integration Enterprise(CIE) - Linux Cloud Builds | Active |
Continuous Integration Enterprise(CIE) - Self Hosted Runners | Active |
Continuous Integration Enterprise(CIE) - Windows Cloud Builds | Active |
Custom Dashboards | Active |
Feature Flags (FF) | Active |
Infrastructure as Code Management (IaCM) | Active |
Internal Developer Portal (IDP) | Active |
Security Testing Orchestration (STO) | Active |
Service Reliability Management (SRM) | Active |
Software Engineering Insights (SEI) | Active |
Software Supply Chain Assurance (SSCA) | Active |
**Prod 3** | Active |
Chaos Engineering | Active |
Cloud Cost Management (CCM) | Active |
Continuous Delivery (CD) - FirstGen - EOS | Active |
Continuous Delivery - Next Generation (CDNG) | Active |
Continuous Error Tracking (CET) | Active |
Continuous Integration Enterprise(CIE) - Cloud Builds | Active |
Continuous Integration Enterprise(CIE) - Linux Cloud Builds | Active |
Continuous Integration Enterprise(CIE) - Self Hosted Runners | Active |
Continuous Integration Enterprise(CIE) - Windows Cloud Builds | Active |
Custom Dashboards | Active |
Feature Flags (FF) | Active |
Infrastructure as Code Management (IaCM) | Active |
Internal Developer Portal (IDP) | Active |
Security Testing Orchestration (STO) | Active |
Service Reliability Management (SRM) | Active |
Software Supply Chain Assurance (SSCA) | Active |
**Prod 4** | Active |
Chaos Engineering | Active |
Cloud Cost Management (CCM) | Active |
Continuous Delivery - Next Generation (CDNG) | Active |
Continuous Error Tracking (CET) | Active |
Continuous Integration Enterprise(CIE) - Cloud Builds | Active |
Continuous Integration Enterprise(CIE) - Linux Cloud Builds | Active |
Continuous Integration Enterprise(CIE) - Self Hosted Runners | Active |
Continuous Integration Enterprise(CIE) - Windows Cloud Builds | Active |
Custom Dashboards | Active |
Feature Flags (FF) | Active |
Infrastructure as Code Management (IaCM) | Active |
Internal Developer Portal (IDP) | Active |
Security Testing Orchestration (STO) | Active |
Service Reliability Management (SRM) | Active |
**Prod Eu1** | Active |
Chaos Engineering | Active |
Cloud Cost Management (CCM) | Active |
Continuous Delivery - Next Generation (CDNG) | Active |
Continuous Error Tracking (CET) | Active |
Continuous Integration Enterprise(CIE) - Cloud Builds | Active |
Continuous Integration Enterprise(CIE) - Linux Cloud Builds | Active |
Continuous Integration Enterprise(CIE) - Self Hosted Runners | Active |
Continuous Integration Enterprise(CIE) - Windows Cloud Builds | Active |
Custom Dashboards | Active |
Feature Flags (FF) | Active |
Infrastructure as Code Management (IaCM) | Active |
Internal Developer Portal (IDP) | Active |
Security Testing Orchestration (STO) | Active |
Service Reliability Management (SRM) | Active |
View the latest incidents for Harness and check for official updates:
Description:

# Overview

A small subset of pipelines encountered VM initialization failures on Cloud Mac builds.

## Timeline (PST)

| Time | Event |
| --- | --- |
| 1:17 AM | Customer reported this issue via a support ticket |
| 1:29 AM | Internal incident created |
| 1:32 AM | Identified that the Mac pools had reached full capacity |
| 2:11 AM | Completed a manual cleanup of stale VMs ahead of the scheduled cleanup, allowing customer pipelines to proceed |
| 2:30 AM | Mac pool returned to a normal state |

## Resolution

Stale VMs were cleaned up manually, ahead of the scheduled cleanup, so that the active pipelines could complete.

## Affected accounts

One customer using Mac-hosted builds in Prod 2 was impacted.

## RCA

During VM provisioning, VMs were not cleaned up as they should have been when an abort call was made. This issue arose from a recent change we made to address other cleanup issues. The situation would have been handled correctly if appropriate error handling and cleanup had been incorporated into the VM creation logic of the Anka driver (the driver for macOS deployments). However, because the Anka driver returned an error, we didn't create a database entry for cleanup, and without that entry the dlite instance did not perform the necessary cleanup.

## Action Items

* Make the Anka create API idempotent
* Add monitoring/alerting for stale VMs
Status: Postmortem
Impact: Major | Started At: Nov. 2, 2023, 9:56 a.m.
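One way to read the RCA above is as an ordering problem: the cleanup record was only created when the Anka driver call succeeded, so a failed or aborted create could leave a VM consuming Mac pool capacity with nothing tracking it. The following is a minimal Python sketch of recording the VM for cleanup before calling the driver, plus an idempotent create, under a simplified in-memory model. `FlakyDriver`, `Provisioner`, `DriverError`, and the sets standing in for the database and the Mac host are all hypothetical; they are not Harness or Anka APIs.

```python
# Hypothetical model of the Mac VM provisioning flow; not Harness or Anka code.


class DriverError(Exception):
    """Raised by the hypothetical driver when VM creation fails."""


class FlakyDriver:
    """Hypothetical driver that allocates a VM on the host and may then fail."""

    def __init__(self, fail: bool):
        self.fail = fail
        self.host: set[str] = set()  # VMs that actually exist on the Mac host
        self.calls = 0               # number of create calls reaching the driver

    def create(self, vm_id: str) -> None:
        self.calls += 1
        self.host.add(vm_id)          # capacity is consumed...
        if self.fail:
            raise DriverError(vm_id)  # ...even though the call reports an error

    def destroy(self, vm_id: str) -> None:
        self.host.discard(vm_id)


class Provisioner:
    def __init__(self, driver: FlakyDriver):
        self.driver = driver
        self.cleanup_records: set[str] = set()  # stands in for the cleanup DB entries
        self.ready: set[str] = set()

    def create(self, vm_id: str) -> str:
        # Idempotent: retrying a create for a VM that is already up is a no-op.
        if vm_id in self.ready:
            return vm_id
        # Record the VM for cleanup BEFORE calling the driver, so the record
        # exists even if the driver raises or the run is aborted mid-call.
        self.cleanup_records.add(vm_id)
        self.driver.create(vm_id)  # may raise DriverError
        self.ready.add(vm_id)
        return vm_id

    def cleanup(self) -> None:
        # Destroy every recorded VM, whether or not its creation succeeded.
        for vm_id in sorted(self.cleanup_records):
            self.driver.destroy(vm_id)
            self.ready.discard(vm_id)
        self.cleanup_records.clear()


if __name__ == "__main__":
    driver = FlakyDriver(fail=True)
    provisioner = Provisioner(driver)
    try:
        provisioner.create("vm-1")
    except DriverError:
        pass  # the VM still exists on the host at this point
    provisioner.cleanup()
    print(driver.host)  # set(): the failed VM was reclaimed
```

Because the record is written first, the cleanup pass can reclaim the VM even though creation reported an error, which is the gap the RCA describes.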
Description:

# Overview

We experienced a service disruption that affected pipeline functionality, specifically pipelines with Test Intelligence steps responsible for uploading call graphs.

## Timeline (PST)

| **Time** | **Event** |
| --- | --- |
| 7:37 AM Nov 1 2023 | A problematic Lite Engine service version was deployed |
| 11:06 AM Nov 1 2023 | Internal customer reported the issue |
| 11:43 AM Nov 1 2023 | Lite Engine service rolled back |

## Resolution

We rolled back the recent hotfix to restore the service to its previous state.

## Affected accounts

A total of 2 customers and 12 pipelines on CIE - Cloud Builds were impacted by this incident.

## RCA

The hotfix aimed to improve the Test Intelligence service by skipping the upload of empty call graphs. However, this change had the unintended consequence of not serializing empty call graphs, which subsequent steps in the pipeline expected to be present. The hotfix was tested in an environment that did not accurately reflect the production infrastructure where the Test Intelligence service operates, so the issue was not caught before the hotfix was deployed to production.

## Action Items

* We are reviewing our testing environments and procedures to ensure they accurately mirror our production settings.
* Thoroughly review and test the change required for the skip-upload call graph functionality.
Status: Postmortem
Impact: Major | Started At: Nov. 1, 2023, 2:37 p.m.
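The failure mode in the RCA above is that skipping the upload of an empty call graph also skipped writing it, while later pipeline steps expected it to exist. A minimal Python sketch of keeping the two concerns separate, assuming the call graph is persisted as a JSON file: `finalize_callgraph`, the `callgraph.json` filename, and the `upload` callback are hypothetical and do not reflect the actual Test Intelligence implementation.

```python
import json
import os
import tempfile
from typing import Callable, Sequence


def finalize_callgraph(
    nodes: Sequence[str], out_dir: str, upload: Callable[[str], None]
) -> str:
    """Always serialize the call graph; skip only the upload when it is empty."""
    path = os.path.join(out_dir, "callgraph.json")  # hypothetical filename
    # Write the file unconditionally so later steps that expect it can read it.
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"nodes": list(nodes)}, f)
    # The optimization: an empty graph carries nothing worth uploading.
    if nodes:
        upload(path)
    return path


if __name__ == "__main__":
    uploaded: list[str] = []
    with tempfile.TemporaryDirectory() as d:
        p = finalize_callgraph([], d, uploaded.append)
        print(os.path.exists(p), uploaded)  # True []
```

The design choice is that the optimization only touches the network call; the on-disk contract that downstream steps rely on stays the same for empty and non-empty graphs.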
Description:

## Overview

After the 63xx release of ci-manager was deployed, we started observing failures in pipelines where the clone codebase property was disabled.

## Timeline

| **Time** | **Event** |
| --- | --- |
| 24 Oct 2023 4:00 AM IST | Internally identified the issue |
| 24 Oct 2023 4:20 AM IST | Reverted the release to 62xx |

## Resolution

Reverted CI Manager in Prod 2 to the previous (62xx) release.

## RCA & Action Items

As part of an error-handling improvement, we added a check that the codebase input variable must not be passed as null. This check should have applied only to stages where clone is enabled and the input was not passed, but due to a code bug it was applied to all stages, which caused customer pipelines to fail. We will add automated tests for pipelines that contain stages both with and without clone codebase enabled, to catch such scenarios.
Status: Postmortem
Impact: None | Started At: Oct. 24, 2023, 10:30 p.m.
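The bug in the RCA above is a validation rule applied more broadly than intended. A minimal Python sketch of the condition as the RCA says it should have been scoped, together with the kind of mixed-stage pipeline check the action item calls for. `Stage`, `validate_stage`, `validate_pipeline`, and `ValidationError` are hypothetical names, not ci-manager code.

```python
from dataclasses import dataclass
from typing import Optional


class ValidationError(ValueError):
    """Hypothetical error raised for an invalid stage definition."""


@dataclass
class Stage:
    name: str
    clone_codebase: bool
    codebase_input: Optional[dict] = None


def validate_stage(stage: Stage) -> None:
    # Only stages that clone the codebase need codebase input. Running this
    # check on every stage (the bug described above) rejects valid stages
    # that have clone disabled.
    if stage.clone_codebase and stage.codebase_input is None:
        raise ValidationError(f"stage {stage.name!r}: codebase input must not be null")


def validate_pipeline(stages: list[Stage]) -> None:
    for stage in stages:
        validate_stage(stage)


if __name__ == "__main__":
    # A pipeline mixing stages with and without clone codebase, as in the
    # action item, should validate cleanly.
    validate_pipeline([
        Stage("build", clone_codebase=True, codebase_input={"branch": "main"}),
        Stage("deploy", clone_codebase=False),
    ])
    print("ok")
```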
Description:

# Overview

There was a delay in newly published artifacts showing up in Harness. Impact was limited to a few customers in one of our clusters (prod-1).

## Timeline (PST)

| **Time** | **Event** |
| --- | --- |
| 1:19 AM | Incident declared; the team engaged in incident response |
| 8:55 AM | Issue identified; mitigation in progress |
| 10:25 AM | Mitigated; monitoring |
| 12:17 PM | Issue resolved |

## Resolution

To mitigate, we purged old, redundant background job records that were slowing down the collection process.

## RCA & Action Items

There was up to a 30-minute delay in our system for collecting newly published artifacts; the typical delay is under 4 minutes. Collection slowed because of a spike in the backlog of background job requests in our database. After we purged the old, redundant background job records, the system caught up automatically. The root cause of the background job buildup is unknown. We have added more aggressive alerts for job queue buildup.
Status: Postmortem
Impact: Minor | Started At: Oct. 18, 2023, 9:01 a.m.
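The mitigation and follow-up in the RCA above amount to two operations: purging redundant job records and alerting when the queue builds up. A minimal Python sketch of both, assuming that a record counts as redundant when it is completed and older than a cutoff; the RCA does not say how redundancy was actually determined. `Job`, `purge_stale_jobs`, and `backlog_alert` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Job:
    job_id: str
    created_at: datetime
    done: bool


def purge_stale_jobs(jobs: list[Job], now: datetime, max_age: timedelta) -> list[Job]:
    """Drop completed records older than max_age (assumed definition of 'redundant')."""
    return [j for j in jobs if not (j.done and now - j.created_at > max_age)]


def backlog_alert(jobs: list[Job], threshold: int) -> bool:
    """Return True when the number of pending jobs exceeds the alert threshold."""
    pending = sum(1 for j in jobs if not j.done)
    return pending > threshold


if __name__ == "__main__":
    now = datetime(2023, 10, 18, tzinfo=timezone.utc)
    jobs = [
        Job("old-done", now - timedelta(days=2), done=True),
        Job("recent-done", now - timedelta(minutes=5), done=True),
        Job("pending", now - timedelta(days=2), done=False),
    ]
    kept = purge_stale_jobs(jobs, now, max_age=timedelta(days=1))
    print([j.job_id for j in kept])  # ['recent-done', 'pending']
    print(backlog_alert(kept, threshold=0))  # True
```

Pending jobs are never purged here, regardless of age, so the purge cannot drop work that still needs to run; the alert counts only pending jobs because that is what a growing backlog consists of.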
Description: This incident is the first instance of the incident at [https://status.harness.io/incidents/w047bt2yqjm5](https://status.harness.io/incidents/w047bt2yqjm5). Please find the RCA there.
Status: Postmortem
Impact: Minor | Started At: Oct. 17, 2023, 10:47 p.m.