
Is there a Render outage?

Render status: Systems Active

Last checked: 47 seconds ago

Get notified about any outages, downtime, or incidents for Render and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Render outages and incidents

Outage and incident data over the last 30 days for Render.

There have been 2 outages or incidents for Render in the last 30 days.

Severity Breakdown:

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Render

OutLogger tracks the status of these components for Render:

Custom Domains: Active
Render API: Active
Render Dashboard: Active
Render Website: Active
Static Sites: Active
Autoscaling: Active
Background Workers: Active
Builds and Deploys: Active
Cron Jobs: Active
Metrics/Logs: Active
PostgreSQL: Active
Redis: Active
Web Services: Active
Web Services - Free Tier: Active

Latest Render outages and incidents.

View the latest incidents for Render and check for official updates:

Updates:

  • Time: March 30, 2024, 12:01 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: March 29, 2024, 11:51 p.m.
    Status: Monitoring
    Update: A fix has been implemented and we are monitoring the results.
  • Time: March 29, 2024, 11:15 p.m.
    Status: Identified
    Update: We have identified an issue impacting external connections to Redis with IP AllowList in all regions. Our team is currently working on a solution.

Updates:

  • Time: March 29, 2024, 9:43 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: March 29, 2024, 9:14 p.m.
    Status: Monitoring
    Update: Traffic has resumed to services in the region. We are monitoring the results.
  • Time: March 29, 2024, 8:40 p.m.
    Status: Investigating
    Update: We've identified an issue where traffic is not reaching web services in the Singapore region. Our team is aware of the issue and is currently investigating.

Updates:

  • Time: March 29, 2024, 7:33 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: March 29, 2024, 7:12 p.m.
    Status: Monitoring
    Update: We have successfully mitigated the issue but we're still monitoring. The issue was isolated to load-balancing external Redis requests.
  • Time: March 29, 2024, 7 p.m.
    Status: Investigating
    Update: We are still investigating but we are now seeing some recovery.
  • Time: March 29, 2024, 6:36 p.m.
    Status: Investigating
    Update: We are still investigating.
  • Time: March 29, 2024, 6:13 p.m.
    Status: Investigating
    Update: We are still exploring different mitigation strategies.
  • Time: March 29, 2024, 5:37 p.m.
    Status: Investigating
    Update: We're still investigating and looking at different mitigation strategies.
  • Time: March 29, 2024, 4:58 p.m.
    Status: Investigating
    Update: We believe this is only affecting connections from the external network. Internal connections and the instances themselves are fine.
  • Time: March 29, 2024, 4:56 p.m.
    Status: Investigating
    Update: We have received reports of connections to Redis instances hosted in Ohio hanging. We're investigating.

Updates:

  • Time: March 26, 2024, 10:09 p.m.
    Status: Resolved
    Update: All of the free PostgreSQL databases have recovered.
  • Time: March 26, 2024, 9:44 p.m.
    Status: Monitoring
    Update: The fix has rolled out to the affected clusters, and we have observed recovery. We are currently verifying that there is no further impact.
  • Time: March 26, 2024, 9:26 p.m.
    Status: Identified
    Update: A fix has been implemented for our affected free PostgreSQL datastores, and is currently rolling out to our clusters.
  • Time: March 26, 2024, 9 p.m.
    Status: Identified
    Update: Some of our free PostgreSQL datastores are unavailable. We are working to bring these databases up as quickly as possible.

Updates:

  • Time: March 29, 2024, 10:39 p.m.
    Status: Postmortem
    Update:

    Summary

    On March 26, 2024, at 16:07 UTC, services on Render's platform were disrupted after an unintended restart of all customer and system workloads. No data was lost, and static sites and services not dependent on PostgreSQL, Redis, or Disks recovered within 20 minutes. However, recovery of managed data stores and other services with attached disks took significantly longer due to the scale of the event and underlying rate limits associated with moving disks between machines. Full functionality across all regions was restored at 20:00 UTC. Most PostgreSQL databases, Redis instances, and services with attached disks saw much longer recovery times due to system-level throttling and rate limits that weren't designed for an event of this nature and scale. Centralized logging and metrics services were also slow to recover due to these limits. We increased the underlying rate limits after discovering the root cause of the delay, and took other actions to improve recovery times. However, even with these mitigations, the scale of the event delayed complete recovery of paid services until 18:45 UTC and free services until 20:00 UTC.

    We are incredibly sorry for the extended disruption we caused for many customers. Platform reliability has always been our top priority as a company, and we let you down. We have implemented measures to prevent an incident of this scale and nature going forward. We are prioritizing further prevention and mitigation measures to improve platform resilience.

    Timeline (all times in UTC on 2024-03-26)

    - 16:07 - An unintended change causes a restart of all customer and system workloads.
    - 16:07 - Render engineers are paged.
    - 16:09 - We open an internal critical incident to investigate.
    - 16:19 - We identify the source of the restart and disable it to prevent further restarts.
    - 16:19 - We open a public incident on https://status.render.com and continue to investigate and mitigate.
    - 16:21 - All static sites, stateless services in all regions, and all services hosted on GCP are restored.
    - 17:48 - All paid services are restored in Singapore.
    - 17:53 - All paid services are restored in Ohio.
    - 18:34 - All paid services are restored in Oregon.
    - 18:45 - All paid services are restored in Frankfurt.
    - 19:40 - Logs are restored in all regions.
    - 20:00 - All free services are restored in all regions.

    What happened

    On March 26, 2024, at 16:07 UTC, a faulty code change caused a restart of all workloads on our platform. This change was put behind a feature flag and tested manually and automatically in Render's development and staging environments, but a combination of issues ultimately led to the bug making it to production:

    - The testing infrastructure for the code change was inconsistent across production and non-production environments.
    - The change was feature-flagged, but a subtle bug in the feature-flagging code prevented the faulty code from running in non-production environments and surfacing sooner.

    Our systems paged our engineers as soon as the incident started, and we opened an internal incident to investigate. We declared a public incident 12 minutes after the initial report. We quickly identified the faulty code and disabled it to prevent additional workload restarts.

    While many services without attached disks recovered within minutes, components responsible for restarting services with attached disks (PostgreSQL, Redis, and services with explicitly attached disks) were severely overloaded due to the unprecedented scale of the event, leading to significantly increased recovery times for many of these services. When services restart, they are often transparently moved to a different host. When services with an attached disk (including managed PostgreSQL and Redis instances) are restarted and moved to a different host, our systems detach the disk from the source host and attach it to the target host. In isolation, a single detach-attach operation takes, at most, a few minutes. However, hundreds of thousands of services with disks were restarted simultaneously during the incident, overloading the systems responsible for moving disks between machines and significantly slowing down our ability to restore service.

    As we worked to expedite the recovery process, we discovered and quickly increased default rate limits for the detach-attach process. We also noticed throttling in an underlying infrastructure provider and worked with the provider to increase rate limits across all impacted regions. We increased these limits to the maximum values possible without creating further instability in our systems. We also prioritized paid service recovery by temporarily suspending free PostgreSQL instances during the incident. These changes enabled considerably faster recovery of impacted services; however, full recovery took longer due to the overwhelming volume of the restarts. Some monitoring and logging systems that rely on attached disks were also unavailable during the incident, leading to gaps in some service metrics and logs.

    Mitigations

    This incident was the most severe and widespread outage in Render's history and surfaced multiple opportunities for us to improve platform reliability further and minimize time to recovery. They are listed below and are being implemented with the highest priority.

    Increased disk management rate limits: As discussed above, we increased multiple rate limits in the systems responsible for moving disks between machines. While we are confident in our ability to prevent similar incidents in the future, we are also now equipped to recover much faster than before.

    Ensure consistency between production and non-production systems: Our investigation found subtle differences in testing infrastructure between our production and non-production environments. We are working to standardize and improve our testing processes to prevent similar incidents going forward.

    Improve our disk management infrastructure: In addition to increased rate limits, we are also making code changes to the components that manage disks. Specifically, we will rely more on batched operations, increasing disk management throughput by an order of magnitude.

    Restrict permissions for control plane components: The control plane code that triggered the incident only needed to interact with a small subset of system resources and should not have had permission to restart existing customer services and data stores. We are adding system-level restrictions so that only necessary control plane components and services can interact with customer services.

    Improve incident communication: Our investigation uncovered multiple gaps in incident communications. It took 12 minutes after opening an internal incident to update Render's public status; while our engineers worked to collect enough information to provide a meaningful update, we should have opened a public incident sooner. In our initial update, we incorrectly used the 'Degraded Performance' status instead of 'Partial Outage' or 'Full Outage'. As a result, individual component statuses did not reflect the severity of the incident until our next update 22 minutes later. We understand the critical importance of timely and accurate updates during incidents; we are working on automation and improving our incident response processes to ensure that our status page and other communication channels are updated as soon as relevant information becomes available.
  • Time: March 26, 2024, 8:02 p.m.
    Status: Resolved
    Update: Everything is operating normally. We're deeply sorry for the outage; we will soon follow up with a detailed incident report, including mitigations and prevention measures.
  • Time: March 26, 2024, 7:58 p.m.
    Status: Monitoring
    Update: All services have recovered. We're continuing to monitor isolated cases.
  • Time: March 26, 2024, 7:28 p.m.
    Status: Identified
    Update: All paid services are now operating normally. We're working to restore availability for free tier services.
  • Time: March 26, 2024, 6:56 p.m.
    Status: Identified
    Update: Nearly all services have recovered across all regions. We're working towards 100% recovery for all paid services, and will start bringing back free services next.
  • Time: March 26, 2024, 6:23 p.m.
    Status: Identified
    Update: Singapore and Ohio have now fully recovered. We have accelerated recovery for the remaining Oregon and Frankfurt databases and services with disks. Free-tier databases will remain unavailable until further notice as we prioritize recovery for paid services.
  • Time: March 26, 2024, 5:58 p.m.
    Status: Identified
    Update: Services in Singapore have recovered fully. Over the last 15 minutes, we have seen a recovery of the majority of PostgreSQL, Redis, and services with attached disks, and we continue to observe the recovery of others. Engineers are working on improving recovery times for these services. We aim for full recovery for all paid services before 12 PM PT.
  • Time: March 26, 2024, 5:29 p.m.
    Status: Identified
    Update: Due to the scope of the incident, we need to intentionally and sequentially recover PostgreSQL and Redis functionality across the fleet. We're actively working towards full recovery and collaborating with upstream providers to speed things up.
  • Time: March 26, 2024, 4:45 p.m.
    Status: Identified
    Update: Data Services (Postgres/Redis) and services with attached disks are taking additional work to recover. Application Services that connect to those data services will experience failures or degradation as a result.
  • Time: March 26, 2024, 4:28 p.m.
    Status: Identified
    Update: Many services recovered automatically; engineering is continuing to identify still-affected services and mitigate issues as necessary.
  • Time: March 26, 2024, 4:20 p.m.
    Status: Identified
    Update: We are continuing to work on a fix for this issue.
  • Time: March 26, 2024, 4:19 p.m.
    Status: Identified
    Update: We are encountering a broad range of outages across the Render Platform affecting connections and services. Engineering is working on mitigating the cause of these issues and narrowing down any non-affected components.
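
The postmortem above notes that the faulty change was behind a feature flag, but a subtle bug in the flag-checking code kept the new code path from ever running outside production, so it never surfaced in development or staging. As a purely hypothetical illustration of that class of bug (not Render's actual code), the sketch below shows how an extra, inverted environment check can turn a rollout flag into a production-only switch; the names ENVIRONMENT, ROLLOUT_FLAGS, and flag_enabled are invented for the example.

```python
# Hypothetical sketch of a feature-flag gating bug; not Render's code.
import os

ENVIRONMENT = os.getenv("ENVIRONMENT", "development")   # invented config name
ROLLOUT_FLAGS = {"new-workload-controller": True}       # invented flag store

def flag_enabled(name: str) -> bool:
    """Intended behavior: the flag alone decides whether the new code runs,
    so development and staging exercise the same path as production."""
    return ROLLOUT_FLAGS.get(name, False)

def flag_enabled_buggy(name: str) -> bool:
    """Buggy variant: an extra environment guard means the flagged path only
    runs in production, so tests and staging can never surface its bugs."""
    return ENVIRONMENT == "production" and ROLLOUT_FLAGS.get(name, False)

if __name__ == "__main__":
    # In a staging environment, the buggy guard silently skips the new path.
    print("intended:", flag_enabled("new-workload-controller"))       # True everywhere
    print("buggy:   ", flag_enabled_buggy("new-workload-controller")) # False outside production
```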

Check the status of similar companies and alternatives to Render

Akamai: Systems Active
Nutanix: Systems Active
MongoDB: Systems Active
LogicMonitor: Systems Active
Acquia: Systems Active
Granicus System: Systems Active
CareCloud: Systems Active
Redis: Systems Active
integrator.io: Systems Active
NinjaOne Trust: Systems Active
Pantheon Operations: Systems Active
Securiti US: Systems Active

Frequently Asked Questions - Render

Is there a Render outage?
The current status of Render is: Systems Active
Where can I find the official status page of Render?
The official status page for Render is at https://status.render.com
How can I get notified if Render is down or experiencing an outage?
To get notified of any status changes to Render, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Render every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Render do?
Render provides a cloud platform for building and running apps and websites with TLS certificates, CDN, private networks, and Git auto-deploys.
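
For reference, the kind of check OutLogger describes (polling the official status page every few minutes and alerting on changes) can be approximated with a short script. The sketch below assumes status.render.com is hosted on Atlassian Statuspage and exposes the standard /api/v2/status.json endpoint; the 3-minute interval and the print-based "notification" are illustrative placeholders rather than OutLogger's actual implementation.

```python
# Minimal status-change poller (sketch). Assumes status.render.com exposes the
# standard Statuspage /api/v2/status.json endpoint; swap the print call for a
# real notifier (email, Slack, webhook) in your own setup.
import json
import time
import urllib.request

STATUS_URL = "https://status.render.com/api/v2/status.json"  # assumed endpoint
POLL_SECONDS = 180  # "every few minutes"

def fetch_indicator() -> str:
    """Return the overall status indicator, e.g. 'none', 'minor', 'major', 'critical'."""
    with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
        payload = json.load(resp)
    return payload["status"]["indicator"]

def main() -> None:
    last = None
    while True:
        try:
            current = fetch_indicator()
        except Exception as exc:  # transient network errors shouldn't stop polling
            print(f"status check failed: {exc}")
        else:
            if last is not None and current != last:
                print(f"Render status changed: {last} -> {current}")
            last = current
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()
```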