
Is there a CircleCI outage?

CircleCI status: Systems Active

Last checked: a minute ago

Get notified about any outages, downtime or incidents for CircleCI and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

CircleCI outages and incidents

Outage and incident data over the last 30 days for CircleCI.

There have been 6 outages or incidents for CircleCI in the last 30 days.

Severity Breakdown:

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for CircleCI

OutLogger tracks the status of these components for CircleCI:

Artifacts Active
Billing & Account Active
CircleCI Insights Active
CircleCI Releases Active
CircleCI UI Active
CircleCI Webhooks Active
Docker Jobs Active
Machine Jobs Active
macOS Jobs Active
Notifications & Status Updates Active
Pipelines & Workflows Active
Runner Active
Windows Jobs Active
AWS Active
Google Cloud Platform Google Cloud DNS Active
Google Cloud Platform Google Cloud Networking Active
Google Cloud Platform Google Cloud Storage Active
Google Cloud Platform Google Compute Engine Active
mailgun API Active
mailgun Outbound Delivery Active
mailgun SMTP Active
OpenAI Active
Atlassian Bitbucket API Active
Atlassian Bitbucket Source downloads Active
Atlassian Bitbucket SSH Active
Atlassian Bitbucket Webhooks Active
Docker Authentication Active
Docker Hub Active
Docker Registry Active
GitHub API Requests Active
GitHub Git Operations Active
GitHub Packages Active
GitHub Pull Requests Active
GitHub Webhooks Active
GitLab Active

Latest CircleCI outages and incidents.

View the latest incidents for CircleCI and check for official updates:

Updates:

  • Time: Oct. 28, 2024, 3:50 p.m.
    Status: Resolved
    Update: The incident has been resolved. Thanks for your patience.
  • Time: Oct. 28, 2024, 3:26 p.m.
    Status: Monitoring
Update: Jobs are working again. If you had any jobs showing failures, you will need to re-run them. We will continue monitoring.
  • Time: Oct. 28, 2024, 2:58 p.m.
    Status: Investigating
    Update: Some jobs are failing to start, and some jobs are having infrastructure failures. We are looking into it.

Updates:

  • Time: Nov. 15, 2024, 1:23 p.m.
    Status: Postmortem
Update:

    ## Summary:

    On October 22, 2024, from 14:45 to 15:52 and again from 17:41 to 18:22 UTC, CircleCI customers experienced failures on new job submissions as well as failures on jobs that were in progress. A sudden increase in the number of tasks completing simultaneously, together with requests to upload artifacts from jobs, overloaded the service responsible for managing job output. On October 28, 2024, from 13:27 to 14:13 and from 14:58 to 15:50, CircleCI customers experienced a recurrence of these effects due to a similar cause. During these incidents, customers would have seen their jobs fail to start with an infrastructure failure; jobs that were already in progress also failed with an infrastructure failure. We want to thank our customers for your patience and understanding as we worked to resolve these incidents. The original status pages for the incidents on October 22 can be found [here](https://status.circleci.com/incidents/6yjv79g764yc) and [here](https://status.circleci.com/incidents/0crxbhkflndc). The status pages for the incidents on October 28 can be found [here](https://status.circleci.com/incidents/xk37ycndxbhc) and [here](https://status.circleci.com/incidents/8ktdwlsf2lm8).

    ## What Happened:

    (All times UTC.) On October 22, 2024, at 14:45 there was a sudden increase in customer tasks completing at the same time within CircleCI. In order to record each of these task end events, including the amount of storage the task used, the system that manages task state (distributor) made calls to our internal API gateway, which subsequently queried the system responsible for storing job output (output service). At this point, output service became overwhelmed with requests; although some requests were handled successfully, the vast majority were delayed before finally receiving a `499 Client Closed Request` error response.

    ![Distributor task end calls to the internal API gateway](https://global.discourse-cdn.com/circleci/original/3X/2/b/2b68322aaf27124eb5ae63a15bc0f8f2118c3f7b.png)

    Additionally, at 14:50, output service received an influx of artifact upload requests, further straining resources in the service. An incident was officially declared at 14:57. Output service was scaled horizontally at 15:16 to handle the additional load it was receiving. Internal health checks began to recover at 15:25, and we continued to monitor output service until incoming requests returned to normal levels. The incident was resolved at 15:52, and we kept output service horizontally scaled.

    At 17:41, output service received another sharp increase in requests to upload artifacts and was unable to keep up with the additional load, causing jobs to fail again. An incident was declared at 17:57. Because output service was still horizontally scaled from the initial incident, it automatically recovered by 18:00. As a proactive measure, we further scaled output service horizontally at 18:02. We continued to monitor our systems until the incident was resolved at 18:22.

    Following incident resolution, we continued our investigation and uncovered on October 25 that our internal API gateway was configured with low values for the maximum number of connections allowed to each of the services that experienced increased load on October 22. We immediately increased these values so that the gateway could handle an increased volume of task end events moving forward.

    Despite these improvements, on October 28, 2024, at 13:27, customer jobs started to fail in the same way as they previously did on October 22. An incident was officially declared at 13:38. By 13:48, the system had automatically recovered without any intervention, and the incident was resolved at 14:13. We continued to investigate the root cause of the delays and failures, but at 14:45 customer jobs started to fail again in the same way. We declared another incident at 14:50. In order to reduce the load on output service, we removed the retry logic when requesting storage used per task from output service. This allowed tasks to complete even if storage used could not be retrieved (to the customer's benefit). Additionally, we scaled distributor horizontally at 15:19 in order to handle the increased load. At 15:21, our systems began to recover. We continued to monitor and resolved the incident at 15:51. We then returned to our investigation into the root cause of this recurring behavior and discovered an additional client in distributor that was configured with a low value for the maximum number of connections to our internal API gateway. We increased this value at 17:33.

    ## Future Prevention and Process Improvement:

    Following the remediation on October 28, we conducted an audit of **all** of the HTTP clients in the execution environment and proactively increased the limits on those that were configured similarly to the ones in the internal API gateway and distributor. Additionally, we identified a gap in observability with these HTTP clients that prevented us from identifying the root cause of these incidents sooner. We immediately added additional observability to all of the clients in order to enable better alerting if connection pools were to become exhausted again in the future. (An illustrative connection-pool configuration sketch follows this list of updates.)
  • Time: Oct. 28, 2024, 2:13 p.m.
    Status: Resolved
    Update: The incident has been resolved. Thanks for your patience.
  • Time: Oct. 28, 2024, 2:02 p.m.
    Status: Monitoring
Update: Jobs are working again. If you had any jobs showing failures, you will need to re-run them. We will continue monitoring.
  • Time: Oct. 28, 2024, 1:44 p.m.
    Status: Investigating
    Update: Some jobs are failing to start, and some jobs are having infrastructure failures. We are looking into it.
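
The postmortem above attributes both recurrences to HTTP clients whose connection pools were capped too low for bursts of task-end traffic. As a rough illustration of the kind of setting involved, the sketch below sizes a client's per-host connection pool using Go's `net/http`; the language, endpoint, and limit values are assumptions for illustration only, not details of CircleCI's actual stack.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newPooledClient builds an HTTP client with an explicitly sized connection
// pool. The limits below are illustrative; the postmortem does not disclose
// CircleCI's real values.
func newPooledClient(maxConnsPerHost int) *http.Client {
	transport := &http.Transport{
		// Hard cap on connections (dialing, active, and idle) to a single
		// upstream host. If this is too low for a burst of requests, callers
		// queue waiting for a free connection and may time out or be
		// cancelled, which the server can record as 499 responses.
		MaxConnsPerHost: maxConnsPerHost,

		// Keep idle connections around so bursts can be absorbed without
		// paying for new TCP/TLS handshakes each time.
		MaxIdleConns:        maxConnsPerHost,
		MaxIdleConnsPerHost: maxConnsPerHost,
		IdleConnTimeout:     90 * time.Second,
	}
	return &http.Client{Transport: transport, Timeout: 10 * time.Second}
}

func main() {
	client := newPooledClient(256)

	// Hypothetical gateway endpoint, used only to exercise the client.
	resp, err := client.Get("https://internal-gateway.example/task-end")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

In the same spirit, the observability improvement the postmortem mentions would plausibly amount to exporting a metric for how often requests wait on an exhausted pool, so the limit can be alerted on before jobs start failing.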

Updates:

  • Time: Oct. 24, 2024, 6:30 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Oct. 24, 2024, 6:19 p.m.
    Status: Monitoring
    Update: We are seeing recovery and will continue to monitor.
  • Time: Oct. 24, 2024, 6:04 p.m.
    Status: Identified
    Update: Wait times continue to decrease. We are monitoring the fix.
  • Time: Oct. 24, 2024, 5:41 p.m.
    Status: Identified
Update: macOS job starts are delayed for the M2 Pro medium resource class. We've identified the issue and are working to resolve it. We will provide more updates as information becomes available, and we appreciate your continued patience.
  • Time: Oct. 24, 2024, 5:38 p.m.
    Status: Identified
    Update: The issue has been identified and a fix is being implemented.

Updates:

  • Time: Oct. 24, 2024, 11:27 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Oct. 24, 2024, 11:06 a.m.
    Status: Monitoring
Update: The plans and usage pages are now accessible and are functioning normally.
  • Time: Oct. 24, 2024, 11 a.m.
    Status: Identified
    Update: We have identified the cause of the issue and have begun remediating it. We appreciate your patience whilst we work through the issue.
  • Time: Oct. 24, 2024, 10:49 a.m.
    Status: Investigating
    Update: We're continuing to investigate this issue. Thank you for your patience.
  • Time: Oct. 24, 2024, 10:33 a.m.
    Status: Investigating
    Update: Users are unable to view the plans or usage pages. We're investigating this issue.

Updates:

  • Time: Oct. 23, 2024, 12:30 p.m.
    Status: Resolved
    Update: During this incident, customers could not access the Runner Inventory page and experienced infrastructure failures for Runner jobs.

Check the status of similar companies and alternatives to CircleCI

Hudl

Systems Active

OutSystems

Systems Active

Postman

Systems Active

Mendix

Systems Active

DigitalOcean

Issues Detected

Bandwidth

Systems Active

DataRobot

Systems Active

Grafana Cloud

Systems Active

SmartBear Software

Systems Active

Test IO

Systems Active

Copado Solutions

Systems Active

LaunchDarkly

Systems Active

Frequently Asked Questions - CircleCI

Is there a CircleCI outage?
The current status of CircleCI is: Systems Active
Where can I find the official status page of CircleCI?
The official status page for CircleCI is here: https://status.circleci.com/
How can I get notified if CircleCI is down or experiencing an outage?
To get notified of any status changes to CircleCI, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of CircleCI every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does CircleCI do?
CircleCI provides CI/CD for any platform, either on its hosted cloud or on your own infrastructure, with a free plan available.