
Is there a CircleCI outage?

CircleCI status: Systems Active

Last checked: 8 minutes ago

Get notified about any outages, downtime, or incidents for CircleCI and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

CircleCI outages and incidents

Outage and incident data over the last 30 days for CircleCI.

There have been 6 outages or incidents for CircleCI in the last 30 days.


Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for CircleCI

OutLogger tracks the status of these components for CircleCI:

Artifacts Active
Billing & Account Active
CircleCI Insights Active
CircleCI Releases Active
CircleCI UI Active
CircleCI Webhooks Active
Docker Jobs Active
Machine Jobs Active
macOS Jobs Active
Notifications & Status Updates Active
Pipelines & Workflows Active
Runner Active
Windows Jobs Active
AWS Active
Google Cloud Platform Google Cloud DNS Active
Google Cloud Platform Google Cloud Networking Active
Google Cloud Platform Google Cloud Storage Active
Google Cloud Platform Google Compute Engine Active
mailgun API Active
mailgun Outbound Delivery Active
mailgun SMTP Active
OpenAI Active
Atlassian Bitbucket API Active
Atlassian Bitbucket Source downloads Active
Atlassian Bitbucket SSH Active
Atlassian Bitbucket Webhooks Active
Docker Authentication Active
Docker Hub Active
Docker Registry Active
GitHub API Requests Active
GitHub Git Operations Active
GitHub Packages Active
GitHub Pull Requests Active
GitHub Webhooks Active
GitLab Active

Latest CircleCI outages and incidents.

View the latest incidents for CircleCI and check for official updates:

Updates:

  • Time: Sept. 20, 2024, 6:29 p.m.
    Status: Postmortem
    Update:

    ## Summary

    On September 11, 2024, from 10:16 to 21:00 UTC, CircleCI customers encountered multiple issues, including delays in starting jobs, slow processing of job outputs and task status messages (such as the completion of steps and tasks), dropped workflows, and a rise in infrastructure failures. These problems collectively affected all jobs during this time frame. To address these issues, we worked to stabilize the service responsible for ingesting and serving step output and for tracking the start and end times of individual steps (**step service**) until 19:00 UTC, at which point it was determined to be in a sustainable state. Despite this progress, delays in starting Mac jobs persisted until 20:30 UTC, largely due to a backlog of jobs waiting to start and a failure to properly garbage collect (GC) old virtual machines (VMs). This combination of factors contributed to a challenging operational environment for CircleCI customers. The original status page can be found [here](https://status.circleci.com/incidents/lsv2ry3jr16c).

    ## What Happened (All times UTC)

    On September 11, 2024, for approximately 10 hours, CircleCI experienced significant service disruptions. The incident began at 10:05 when a particularly potent configuration was executed during an internal test. By 10:14, the job had ended, but efforts to generate test results led to a spike in memory usage, causing Out of Memory (OOM) errors for several internal services. This resulted in failures in processing job submissions and dispatching tasks, which impacted all customer jobs. By 10:16, job starts across all executors had completely failed, as the service responsible for processing and storing test results, as well as handling storage of job records (**output service**), became overwhelmed and unable to service requests. An official incident declaration occurred at 10:20. We triggered a deployment restart at 10:23, which initially allowed for some recovery before the service was again overwhelmed at approximately 10:27. To address these issues, we horizontally and vertically scaled the service. This adjustment allowed the service to stabilize and customer jobs to start flowing again.

    Throughout the incident, machine jobs faced specific challenges due to timeouts. By 13:00, we detected abnormal resource utilization in step service, prompting us to monitor the situation closely. We believed the ongoing issues were related to a thundering herd effect stemming from an earlier incident. Between 14:47 and 15:05, our efforts to stabilize the system included increasing memory for the service processing step output, which we repeated multiple times throughout this incident in an ongoing attempt to manage the backlog and prevent OOM kills. At 16:21, in order to process the built-up workload from the thundering herd, we raised memory limits in multiple locations so that work could proceed without causing further outages. This marked the beginning of a significant recovery. The existing Redis cluster was under heavy CPU load, prompting a decision at 16:30 to spin up a second Redis cluster to alleviate the pressure.

    ![Redis Engine CPU Utilization Impact Timeline](https://global.discourse-cdn.com/circleci/original/3X/c/f/cfbf14664563f62c4d331c0aed80b152ccdc1d5c.png)

    By 17:00, the job queue began to decrease significantly as the service stabilized. Throughout the afternoon, we continued to monitor and adjust resources, ultimately doubling Redis shards around 18:11, which had an immediate positive effect on reducing load.

    During this incident, customers experienced significantly longer response times for API calls from servers running customer workloads reporting back the output of jobs. The 95th percentile (p95) response times spiked to between 5 and 15 seconds from 14:20 to 19:50, compared to the usual expectation of around 100 milliseconds. This led to degraded step output on the jobs page, with issues such as delays in displaying the output of customer steps, missing output of customer steps, and, in some cases, no output of customer steps at all. These delays likely resulted in slower Task performance, as sending step output to the step receiver took longer, blocking other actions within the Tasks. While the average Task runtime increased, the specific impact varied depending on the Task's contents.

    ![Task Wait Time](https://global.discourse-cdn.com/circleci/original/3X/c/6/c600d16b22a0f27de0827edca9d4c55a2e56ea25.png)

    _Linux:_
    * **12:10 - 13:05:** Wait times under 1 minute.
    * **13:40 - 15:30:** Degraded wait times, generally under 5 minutes.
    * **15:30 - 18:05:** Wait times increased to tens of minutes, with some recovery starting around 17:45.
    * **18:05 - 20:00:** Continued degraded wait times of 2-3 minutes.
    * **20:00:** Fully recovered.

    _Windows:_
    * **12:10 - 15:40:** Degraded wait times, typically under 5 minutes.
    * **15:40 - 17:15:** Wait times reached tens of minutes.
    * **17:15 - 19:35:** Returned to degraded wait times, usually under 5 minutes.
    * **19:35 - 19:55:** Wait times again increased to tens of minutes.
    * **19:55:** Fully recovered.

    _macOS:_
    * **12:10 - 15:30:** Degraded wait times, generally under 5 minutes.
    * **15:30 - 21:00:** Wait times escalated to tens of minutes.
    * **21:00:** Fully recovered.

    ## Future Prevention and Process Improvement

    In response to this incident, we are implementing several key improvements to enhance service reliability. First, we will improve how tasks are cleared during infrastructure failures, which will help streamline operations. We will add guardrails to the system to prevent execution of pathological workloads. Additionally, we will implement a mechanism that allows us to temporarily prevent jobs that have failed due to infrastructure issues from being retried (a sketch of this idea follows this incident's updates). We will adjust Redis health checks by moving them from liveness probes to readiness probes, and we plan to increase the number of Redis shards to better distribute load and minimize the impact of a single shard being blocked. During our investigation we identified a self-reinforcing cycle of poorly performing Redis commands (scans) that was the root cause of the Redis failure, and we are going to address this as well. To enhance stability, we will introduce a timeout for step data based on job maximum runtime and reduce the pressure a single job can place on the S3 connection pool. We are also looking to develop a method to pause live deployments from the CircleCI app during incidents, ensuring that delayed changes do not overwrite manual adjustments made in the interim.
  • Time: Sept. 11, 2024, 8:52 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Sept. 11, 2024, 8:36 p.m.
    Status: Monitoring
    Update: Machine, MacOS and Windows jobs are now within normal operating parameters. We will continue to monitor.
  • Time: Sept. 11, 2024, 6:55 p.m.
    Status: Identified
    Update: Machine and Windows jobs are now within normal operating parameters. Wait times continue to be longer than the expected normal for MacOS jobs as we work through the backlog of jobs. We thank you for your patience while our engineers continue to work towards mitigation.
  • Time: Sept. 11, 2024, 6:19 p.m.
    Status: Identified
    Update: We are continuing to optimize our infrastructure and resource usage due to slower than expected recovery times. While wait times have improved, they continue to be longer than the expected normal. We thank you for your patience while our engineers continue to work towards mitigation.
  • Time: Sept. 11, 2024, 5:27 p.m.
    Status: Identified
    Update: We are beginning to see early signs of recovery with Linux and Windows jobs processing. As the system works through the backlog, you should start to see reduced wait times. Our engineers continue to monitor our resources closely to ensure a steady improvement in our systems. macOS jobs may see a longer recovery period. We thank you for your patience during this time as we work towards a complete mitigation.
  • Time: Sept. 11, 2024, 4:44 p.m.
    Status: Investigating
    Update: We are continuing to investigate and mitigate the high resource pressure on our platform. The mitigating efforts may lead to some steps within the running jobs having an empty output or being skipped. We thank you for your patience while our engineers work to bring the platform to a healthy state
  • Time: Sept. 11, 2024, 4:15 p.m.
    Status: Investigating
    Update: We are continuing to alleviate the resource pressure our platform is currently experiencing. We thank you for your patience while we work to mitigate the issue causing high wait times on multiple job types
  • Time: Sept. 11, 2024, 3:31 p.m.
    Status: Investigating
    Update: Our engineers continue to investigate the issue and are working on changes to our infrastructure to relieve the pressure caused by high resource usage. We thank you for your patience while we work to reduce these high queue times.
  • Time: Sept. 11, 2024, 2:54 p.m.
    Status: Investigating
    Update: We are continuing to investigate this issue and are noticing high pressure on other job types at this time.
  • Time: Sept. 11, 2024, 2:51 p.m.
    Status: Investigating
    Update: Our engineers are continuing to investigate the issue causing high utilization of our resources and higher wait times for machine and other job types. We thank you for your patience as we work to mitigate this issue.
  • Time: Sept. 11, 2024, 2:29 p.m.
    Status: Investigating
    Update: Our engineers are currently investigating an issue causing longer wait times for machine jobs running on the platform. We will provide further updates as more information becomes available.
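
The postmortem above mentions, among other remediations, a mechanism to temporarily prevent jobs that failed due to infrastructure issues from being retried. CircleCI has not published how that mechanism works; the sketch below only illustrates the general idea as a time-windowed retry guard, and every name in it (`RetryGuard`, `record_infrastructure_failure`, `may_retry`) is invented for this example.

```python
import time
from collections import defaultdict


class RetryGuard:
    """Hypothetical sketch: temporarily block retries of jobs that failed
    due to infrastructure issues, so a backlog of automatic retries cannot
    turn into a thundering herd while the platform is still recovering."""

    def __init__(self, cooldown_seconds: float = 300.0):
        self.cooldown_seconds = cooldown_seconds
        # job_id -> monotonic time until which retries stay blocked
        self._blocked_until = defaultdict(float)

    def record_infrastructure_failure(self, job_id: str) -> None:
        # Start (or extend) the cooldown window for this job.
        self._blocked_until[job_id] = time.monotonic() + self.cooldown_seconds

    def may_retry(self, job_id: str) -> bool:
        # A job may be retried only once its cooldown window has elapsed.
        return time.monotonic() >= self._blocked_until[job_id]


if __name__ == "__main__":
    guard = RetryGuard(cooldown_seconds=2.0)
    guard.record_infrastructure_failure("job-123")
    print(guard.may_retry("job-123"))   # False: still inside the cooldown window
    time.sleep(2.1)
    print(guard.may_retry("job-123"))   # True: cooldown elapsed, retry allowed
```

A real implementation would likely key the cooldown on the failing infrastructure or failure class rather than on individual job IDs, and persist state outside a single process, but the time-windowed gate is the core of the idea.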

Updates:

  • Time: Sept. 11, 2024, 11:30 a.m.
    Status: Resolved
    Update: This incident has been resolved. Some workflows impacted during this incident won't be able to finish; those will need to be rerun.
  • Time: Sept. 11, 2024, 11:18 a.m.
    Status: Monitoring
    Update: Jobs are starting normally.
  • Time: Sept. 11, 2024, 11 a.m.
    Status: Investigating
    Update: We are starting to see recovery while we continue to investigate.
  • Time: Sept. 11, 2024, 10:29 a.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: Sept. 11, 2024, 10:27 a.m.
    Status: Investigating
    Update: We have identified an issue preventing jobs from starting. We are investigating the cause

Updates:

  • Time: Sept. 11, 2024, 9:02 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Sept. 11, 2024, 8:52 a.m.
    Status: Monitoring
    Update: We have restored all functionality to the UI, API and GitHub Checks. We are monitoring as service is fully restored.
  • Time: Sept. 11, 2024, 8:32 a.m.
    Status: Identified
    Update: We are continuing work to resolve this issue. We have also identified that this impacts GitHub Checks. Jobs will still continue to run.
  • Time: Sept. 11, 2024, 8:10 a.m.
    Status: Identified
    Update: We have identified an issue with loading pipeline pages and API endpoints. We are working to resolve it. Jobs will continue to run, but it may not be possible to view them.

Updates:

  • Time: Aug. 21, 2024, 6:28 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Aug. 21, 2024, 6:21 p.m.
    Status: Monitoring
    Update: A fix has been implemented and we are monitoring the results.
  • Time: Aug. 21, 2024, 5:54 p.m.
    Status: Identified
    Update: The issue has been identified and a fix is being implemented.

Updates:

  • Time: Aug. 15, 2024, 6:35 p.m.
    Status: Resolved
    Update: Wait times for MacOS jobs have returned to normal. If you have any questions or require additional assistance please reach out to our customer support team.
  • Time: Aug. 15, 2024, 6:09 p.m.
    Status: Monitoring
    Update: We are working with our service provider to identify and resolve the cause of the problem. Users may experience longer than normal wait times for some MacOS jobs on an intermittent basis.
  • Time: Aug. 15, 2024, 5:48 p.m.
    Status: Investigating
    Update: We are investigating an issue with one of our service providers that is causing longer than normal wait times for MacOS jobs on `m1.large` and `m1.medium` resource classes.

Check the status of similar companies and alternatives to CircleCI

Hudl
Systems Active

OutSystems
Systems Active

Postman
Systems Active

Mendix
Systems Active

DigitalOcean
Issues Detected

Bandwidth
Issues Detected

DataRobot
Systems Active

Grafana Cloud
Systems Active

SmartBear Software
Systems Active

Test IO
Systems Active

Copado Solutions
Systems Active

LaunchDarkly
Systems Active

Frequently Asked Questions - CircleCI

Is there a CircleCI outage?
The current status of CircleCI is: Systems Active
Where can I find the official status page of CircleCI?
The official status page for CircleCI is here: https://status.circleci.com
How can I get notified if CircleCI is down or experiencing an outage?
To get notified of any status changes to CircleCI, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of CircleCI every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here. (A minimal example of polling the official status page yourself is sketched after this FAQ.)
What does CircleCI do?
Access top-notch CI/CD for any platform, on our cloud or your own infrastructure, at no cost.
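
As noted in the FAQ above, OutLogger works by checking CircleCI's official status page every few minutes. If you want to poll it yourself, the sketch below shows one way to do so. It assumes that status.circleci.com exposes the standard Statuspage summary endpoint at `/api/v2/status.json`; that path and the response shape are conventions of Statuspage-hosted pages and may change without notice.

```python
import json
import time
import urllib.request

# Assumed endpoint: CircleCI's status page is hosted on Statuspage, which
# typically serves a machine-readable summary at /api/v2/status.json.
STATUS_URL = "https://status.circleci.com/api/v2/status.json"


def fetch_status() -> str:
    """Return the current status indicator, e.g. 'none', 'minor', 'major', 'critical'."""
    with urllib.request.urlopen(STATUS_URL, timeout=10) as response:
        payload = json.load(response)
    return payload["status"]["indicator"]


def poll(interval_seconds: int = 300) -> None:
    """Check the status every few minutes and report any change."""
    last = None
    while True:
        try:
            current = fetch_status()
            if current != last:
                print(f"CircleCI status changed: {last} -> {current}")
                last = current
        except Exception as exc:  # a transient network error should not kill the loop
            print(f"status check failed: {exc}")
        time.sleep(interval_seconds)


if __name__ == "__main__":
    poll()
```

Running `poll()` prints a line whenever the reported indicator changes, for example from `none` to `minor` when an incident is opened.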