
Is there a Render outage?

Render status: Systems Active

Last checked: 47 seconds ago

Get notified about any outages, downtime, or incidents for Render and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Render outages and incidents

Outage and incident data over the last 30 days for Render.

There have been 2 outages or incidents for Render in the last 30 days.

Severity Breakdown:

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Render

OutLogger tracks the status of these components for Render:

Custom Domains: Active
Render API: Active
Render Dashboard: Active
Render Website: Active
Static Sites: Active
Autoscaling: Active
Background Workers: Active
Builds and Deploys: Active
Cron Jobs: Active
Metrics/Logs: Active
PostgreSQL: Active
Redis: Active
Web Services: Active
Web Services - Free Tier: Active

Latest Render outages and incidents.

View the latest incidents for Render and check for official updates:

Updates:

  • Time: March 30, 2024, 12:01 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: March 29, 2024, 11:51 p.m.
    Status: Monitoring
    Update: A fix has been implemented and we are monitoring the results.
  • Time: March 29, 2024, 11:15 p.m.
    Status: Identified
    Update: We have identified an issue impacting external connections to Redis with IP AllowList in all regions. Our team is currently working on a solution.

Updates:

  • Time: March 29, 2024, 9:43 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: March 29, 2024, 9:14 p.m.
    Status: Monitoring
    Update: Traffic has resumed to services in the region. We are monitoring the results.
  • Time: March 29, 2024, 8:40 p.m.
    Status: Investigating
    Update: We've identified an issue where traffic is not reaching web services in the Singapore region. Our team is aware of the issue and is currently investigating.

Updates:

  • Time: March 29, 2024, 7:33 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: March 29, 2024, 7:12 p.m.
    Status: Monitoring
    Update: We have successfully mitigated the issue but we're still monitoring. The issue was isolated to load-balancing external Redis requests.
  • Time: March 29, 2024, 7 p.m.
    Status: Investigating
    Update: We are still investigating but we are now seeing some recovery.
  • Time: March 29, 2024, 6:36 p.m.
    Status: Investigating
    Update: We are still investigating.
  • Time: March 29, 2024, 6:13 p.m.
    Status: Investigating
    Update: We are still exploring different mitigation strategies.
  • Time: March 29, 2024, 5:37 p.m.
    Status: Investigating
    Update: We're still investigating and looking at different mitigation strategies.
  • Time: March 29, 2024, 4:58 p.m.
    Status: Investigating
    Update: We believe this is only affecting connections from the external network. Internal connections and the instances themselves are fine.
  • Time: March 29, 2024, 4:56 p.m.
    Status: Investigating
    Update: We have received reports of connections to Redis instances hosted in Ohio hanging. We're investigating.

Updates:

  • Time: March 26, 2024, 10:09 p.m.
    Status: Resolved
    Update: All of the free PostgreSQL databases have recovered.
  • Time: March 26, 2024, 9:44 p.m.
    Status: Monitoring
    Update: The fix has rolled out to the affected clusters, and we have observed recovery. We are currently verifying that there is no further impact.
  • Time: March 26, 2024, 9:26 p.m.
    Status: Identified
    Update: A fix has been implemented for our affected free PostgreSQL datastores, and is currently rolling out to our clusters.
  • Time: March 26, 2024, 9 p.m.
    Status: Identified
    Update: Some of our free PostgreSQL datastores are unavailable. We are working to bring these databases up as quickly as possible.

Updates:

  • Time: March 29, 2024, 10:39 p.m.
    Status: Postmortem
    Update:

    Summary

    On March 26, 2024, at 16:07 UTC, services on Render's platform were disrupted after an unintended restart of all customer and system workloads. No data was lost, and static sites and services not dependent on PostgreSQL, Redis, or Disks recovered within 20 minutes. However, recovery of managed data stores and other services with attached disks took significantly longer due to the scale of the event and underlying rate limits associated with moving disks between machines. Full functionality across all regions was restored at 20:00 UTC. Most PostgreSQL databases, Redis instances, and services with attached disks saw much longer recovery times due to system-level throttling and rate limits that weren't designed for an event of this nature and scale. Centralized logging and metrics services were also slow to recover due to these limits. We increased the underlying rate limits after discovering the root cause of the delay, and took other actions to improve recovery times. However, even with these mitigations, the scale of the event delayed complete recovery of paid services until 18:45 UTC and free services until 20:00 UTC.

    We are incredibly sorry for the extended disruption we caused for many customers. Platform reliability has always been our top priority as a company, and we let you down. We have implemented measures to prevent an incident of this scale and nature going forward. We are prioritizing further prevention and mitigation measures to improve platform resilience.

    Timeline (all times in UTC on 2024-03-26)

    - 16:07 - An unintended change causes a restart of all customer and system workloads.
    - 16:07 - Render engineers are paged.
    - 16:09 - We open an internal critical incident to investigate.
    - 16:19 - We identify the source of the restart and disable it to prevent further restarts.
    - 16:19 - We open a public incident on https://status.render.com and continue to investigate and mitigate.
    - 16:21 - All static sites, stateless services in all regions, and all services hosted on GCP are restored.
    - 17:48 - All paid services are restored in Singapore.
    - 17:53 - All paid services are restored in Ohio.
    - 18:34 - All paid services are restored in Oregon.
    - 18:45 - All paid services are restored in Frankfurt.
    - 19:40 - Logs are restored in all regions.
    - 20:00 - All free services are restored in all regions.

    What happened

    On March 26, 2024, at 16:07 UTC, a faulty code change caused a restart of all workloads on our platform. This change was put behind a feature flag and tested manually and automatically in Render's development and staging environments, but a combination of issues ultimately led to the bug making it to production:

    - The testing infrastructure for the code change was inconsistent across production and non-production environments.
    - The change was feature-flagged, but a subtle bug in the feature-flagging code prevented the faulty code from running in non-production environments and surfacing sooner.

    Our systems paged our engineers as soon as the incident started, and we opened an internal incident to investigate. We declared a public incident 12 minutes after the initial report. We quickly identified the faulty code and disabled it to prevent additional workload restarts.

    While many services without attached disks recovered within minutes, components responsible for restarting services with attached disks (PostgreSQL, Redis, and services with explicitly attached disks) were severely overloaded due to the unprecedented scale of the event, leading to significantly increased recovery times for many of these services. When services restart, they are often transparently moved to a different host. When services with an attached disk (including managed PostgreSQL and Redis instances) are restarted and moved to a different host, our systems detach the disk from the source host and attach it to the target host. In isolation, a single detach-attach operation takes, at most, a few minutes. However, hundreds of thousands of services with disks were restarted simultaneously during the incident, overloading the systems responsible for moving disks between machines and significantly slowing down our ability to restore service.

    As we worked to expedite the recovery process, we discovered and quickly increased default rate limits for the detach-attach process. We also noticed throttling in an underlying infrastructure provider and worked with the provider to increase rate limits across all impacted regions. We increased these limits to the maximum values possible without creating further instability in our systems. We also prioritized paid service recovery by temporarily suspending free PostgreSQL instances during the incident. These changes enabled considerably faster recovery of impacted services; however, full recovery took longer due to the overwhelming volume of the restarts. Some monitoring and logging systems that rely on attached disks were also unavailable during the incident, leading to gaps in some service metrics and logs.

    Mitigations

    This incident was the most severe and widespread outage in Render's history and surfaced multiple opportunities for us to improve platform reliability further and minimize time to recovery. They are listed below and are being implemented with the highest priority.

    Increased disk management rate limits: As discussed above, we increased multiple rate limits in the systems responsible for moving disks between machines. While we are confident in our ability to prevent similar incidents in the future, we are also now equipped to recover much faster than before.

    Ensure consistency between production and non-production systems: Our investigation found subtle differences in testing infrastructure between our production and non-production environments. We are working to standardize and improve our testing processes to prevent similar incidents going forward.

    Improve our disk management infrastructure: In addition to increased rate limits, we are also making code changes to the components that manage disks. Specifically, we will rely more on batched operations, increasing disk management throughput by an order of magnitude.

    Restrict permissions for control plane components: The control plane code that triggered the incident only needed to interact with a small subset of system resources and should not have had permission to restart existing customer services and data stores. We are adding system-level restrictions so that only necessary control plane components and services can interact with customer services.

    Improve incident communication: Our investigation uncovered multiple gaps in incident communications. It took 12 minutes after opening an internal incident to update Render's public status; while our engineers worked to collect enough information to provide a meaningful update, we should have opened a public incident sooner. In our initial update, we incorrectly used the 'Degraded Performance' status instead of 'Partial Outage' or 'Full Outage'. As a result, individual component statuses did not reflect the severity of the incident until our next update 22 minutes later. We understand the critical importance of timely and accurate updates during incidents; we are working on automation and improving our incident response processes to ensure that our status page and other communication channels are updated as soon as relevant information becomes available.
  • Time: March 26, 2024, 8:02 p.m.
    Status: Resolved
    Update: Everything is operating normally. We're deeply sorry for the outage; we will soon follow up with a detailed incident report, including mitigations and prevention measures.
  • Time: March 26, 2024, 7:58 p.m.
    Status: Monitoring
    Update: All services have recovered. We're continuing to monitor isolated cases.
  • Time: March 26, 2024, 7:28 p.m.
    Status: Identified
    Update: All paid services are now operating normally. We're working to restore availability for free tier services.
  • Time: March 26, 2024, 6:56 p.m.
    Status: Identified
    Update: Nearly all services have recovered across all regions. We're working towards 100% recovery for all paid services, and will start bringing back free services next.
  • Time: March 26, 2024, 6:23 p.m.
    Status: Identified
    Update: Singapore and Ohio have now fully recovered. We have accelerated recovery for the remaining Oregon and Frankfurt databases and services with disks. Free-tier databases will remain unavailable until further notice as we prioritize recovery for paid services.
  • Time: March 26, 2024, 5:58 p.m.
    Status: Identified
    Update: Services in Singapore have recovered fully. Over the last 15 minutes, we have seen a recovery of the majority of PostgreSQL, Redis, and services with attached disks, and we continue to observe the recovery of others. Engineers are working on improving recovery times for these services. We aim for full recovery for all paid services before 12 PM PT.
  • Time: March 26, 2024, 5:29 p.m.
    Status: Identified
    Update: Due to the scope of the incident, we need to intentionally and sequentially recover PostgreSQL and Redis functionality across the fleet. We're actively working towards full recovery and collaborating with upstream providers to speed things up.
  • Time: March 26, 2024, 4:45 p.m.
    Status: Identified
    Update: Data Services (Postgres/Redis) and services with attached disks are taking additional work to recover. Application Services that connect to those data services will experience failures or degradation as a result.
  • Time: March 26, 2024, 4:28 p.m.
    Status: Identified
    Update: Many services recovered automatically; engineering is continuing to identify still-affected services and mitigate issues as necessary.
  • Time: March 26, 2024, 4:20 p.m.
    Status: Identified
    Update: We are continuing to work on a fix for this issue.
  • Time: March 26, 2024, 4:19 p.m.
    Status: Identified
    Update: We are encountering a broad range of outages across the Render Platform affecting connections and services. Engineering is working on mitigating the cause of these issues and narrowing down any non-affected components.
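
The postmortem above notes that the faulty change was behind a feature flag, but a subtle bug in the flag-checking code kept the new code path from ever running outside production, so it never surfaced in development or staging. As a purely hypothetical illustration of that class of bug (not Render's actual code), the sketch below shows how an extra, inverted environment check can turn a rollout flag into a production-only switch; the names ENVIRONMENT, ROLLOUT_FLAGS, and flag_enabled are invented for the example.

```python
# Hypothetical sketch of a feature-flag gating bug; not Render's code.
import os

ENVIRONMENT = os.getenv("ENVIRONMENT", "development")   # invented config name
ROLLOUT_FLAGS = {"new-workload-controller": True}       # invented flag store

def flag_enabled(name: str) -> bool:
    """Intended behavior: the flag alone decides whether the new code runs,
    so development and staging exercise the same path as production."""
    return ROLLOUT_FLAGS.get(name, False)

def flag_enabled_buggy(name: str) -> bool:
    """Buggy variant: an extra environment guard means the flagged path only
    runs in production, so tests and staging can never surface its bugs."""
    return ENVIRONMENT == "production" and ROLLOUT_FLAGS.get(name, False)

if __name__ == "__main__":
    # In a staging environment, the buggy guard silently skips the new path.
    print("intended:", flag_enabled("new-workload-controller"))       # True everywhere
    print("buggy:   ", flag_enabled_buggy("new-workload-controller")) # False outside production
```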

Check the status of similar companies and alternatives to Render

Akamai: Systems Active
Nutanix: Systems Active
MongoDB: Systems Active
LogicMonitor: Systems Active
Acquia: Systems Active
Granicus System: Systems Active
CareCloud: Systems Active
Redis: Systems Active
integrator.io: Systems Active
NinjaOne Trust: Systems Active
Pantheon Operations: Systems Active
Securiti US: Systems Active

Frequently Asked Questions - Render

Is there a Render outage?
The current status of Render is: Systems Active
Where can I find the official status page of Render?
The official status page for Render is at https://status.render.com
How can I get notified if Render is down or experiencing an outage?
To get notified of any status changes to Render, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Render every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Render do?
Render provides a cloud platform for building and running apps and websites with TLS certificates, CDN, private networks, and Git auto-deploys.
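
For reference, the kind of check OutLogger describes (polling the official status page every few minutes and alerting on changes) can be approximated with a short script. The sketch below assumes status.render.com is hosted on Atlassian Statuspage and exposes the standard /api/v2/status.json endpoint; the 3-minute interval and the print-based "notification" are illustrative placeholders rather than OutLogger's actual implementation.

```python
# Minimal status-change poller (sketch). Assumes status.render.com exposes the
# standard Statuspage /api/v2/status.json endpoint; swap the print call for a
# real notifier (email, Slack, webhook) in your own setup.
import json
import time
import urllib.request

STATUS_URL = "https://status.render.com/api/v2/status.json"  # assumed endpoint
POLL_SECONDS = 180  # "every few minutes"

def fetch_indicator() -> str:
    """Return the overall status indicator, e.g. 'none', 'minor', 'major', 'critical'."""
    with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
        payload = json.load(resp)
    return payload["status"]["indicator"]

def main() -> None:
    last = None
    while True:
        try:
            current = fetch_indicator()
        except Exception as exc:  # transient network errors shouldn't stop polling
            print(f"status check failed: {exc}")
        else:
            if last is not None and current != last:
                print(f"Render status changed: {last} -> {current}")
            last = current
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()
```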