
Is there a Scaleway outage?

Scaleway status: Minor Outage

Last checked: a minute ago

Get notified about any outages, downtime or incidents for Scaleway and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Scaleway outages and incidents

Outage and incident data over the last 30 days for Scaleway.

There have been 37 outages or incidents for Scaleway in the last 30 days.

Severity Breakdown:

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Scaleway

OutLogger tracks the status of these components for Scaleway:

Element product Active
Serverless-Database Active
Website Active
AMS Minor Outage
DC2 Minor Outage
DC3 Performance Issues
DC5 Performance Issues
Dedibackup Active
Dedibox Minor Outage
Dedibox Console Active
Dedirack Active
Domains Minor Outage
Hosting Active
RPN Performance Issues
SAN Active
fr-par-1 Minor Outage
fr-par-2 Active
fr-par-3 Minor Outage
nl-ams-1 Minor Outage
nl-ams-2 Minor Outage
nl-ams-3 Minor Outage
pl-waw-1 Minor Outage
pl-waw-2 Minor Outage
pl-waw-3 Minor Outage
Account API Active
Apple Silicon M1 Active
Billing API Active
Block Storage Active
BMaaS Active
C14 Cold Storage Active
Container Registry Performance Issues
DBaaS Active
Domains Active
Elastic Metal Active
Elements Console Active
Elements - Products Active
Functions and Containers Performance Issues
Hosting Active
Instances Active
IoT Hub Active
Jobs Active
Kapsule Active
LBaaS Active
Network Active
Object Storage Active
Observability Active
Private Network Active
Transactional Email Active
Excellence Active

Latest Scaleway outages and incidents.

View the latest incidents for Scaleway and check for official updates:

Updates:

  • Time: Oct. 21, 2024, 12:39 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Oct. 21, 2024, 12:01 p.m.
    Status: Identified
    Update: Today, from 11:33 to 11:46 AM (UTC+2:00), one of our machines in production was running with the wrong configuration, leading to an InvalidStorageClass error even though the storage class was valid.
  • Time: Oct. 21, 2024, 8:55 a.m.
    Status: Investigating
    Update: Following electrical maintenance in a datacenter, around 30 machines unexpectedly rebooted; we are currently putting them back into production. This led to an increase in tail latencies.
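
According to the updates above, the InvalidStorageClass errors came from a misconfigured machine on the service side, so valid requests were rejected until that machine was pulled from production; the only client-side mitigation is to retry. As a minimal illustration (not an official Scaleway example), the Python sketch below uses the S3-compatible boto3 client against the public fr-par Object Storage endpoint and retries briefly on that specific error code; the bucket and key names are hypothetical.

```python
import time

import boto3
from botocore.exceptions import ClientError

# Scaleway Object Storage is S3-compatible; the region/endpoint below are the
# public fr-par values. Bucket and key names are hypothetical placeholders, and
# credentials are taken from the usual boto3 configuration sources.
s3 = boto3.client(
    "s3",
    region_name="fr-par",
    endpoint_url="https://s3.fr-par.scw.cloud",
)

def upload_with_retry(bucket: str, key: str, body: bytes, attempts: int = 3) -> None:
    """Upload an object, retrying only on transient InvalidStorageClass errors."""
    for attempt in range(1, attempts + 1):
        try:
            s3.put_object(
                Bucket=bucket,
                Key=key,
                Body=body,
                StorageClass="STANDARD",  # a valid class that was still rejected during the incident
            )
            return
        except ClientError as err:
            code = err.response.get("Error", {}).get("Code")
            if code == "InvalidStorageClass" and attempt < attempts:
                time.sleep(2 ** attempt)  # simple backoff before retrying
                continue
            raise

upload_with_retry("my-bucket", "reports/2024-10-21.json", b"{}")
```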

Updates:

  • Time: Oct. 18, 2024, 2:57 p.m.
    Status: Postmortem
    Update: **Post Mortem: 2024-10-18, Scaleway Elements Partial Loss of VPC Connectivity in FR-PAR-1**

    **Summary**

    On Friday, October 18, 2024, the Scaleway Elements cloud ecosystem experienced a network incident that caused cascade effects in various products in the FR-PAR-1 Availability Zone. The incident started at 6:20 UTC and had two main impact periods:

    * From 6:20 to 6:21 UTC: around a minute of instability for a very limited number of instances hosted in a single rack. Some customers might have experienced heavy packet loss for all types of network connectivity for a subset of their instances.
    * From 7:50 to 8:01 UTC: unavailability of VPC network connectivity for a subset of around 25% of the hypervisors of the FR-PAR-1 AZ and subsequent impact for VPC-dependent products. Public internet connectivity was not impacted during this period. Elastic Metal VPC connectivity was not impacted.

    ## Scaleway Elements Infrastructure

    The Scaleway Elements cloud ecosystem is built on top of stacked layers of infrastructure:

    1. Data centers: electrical power systems, cooling systems, optical fiber cabling and physical security.
    2. Hardware and physical network infrastructure: servers, data center fabric networks, backbone routers, and inter-datacenter links.
    3. Virtualized multi-tenant cloud foundation: virtual machines running on top of hypervisors.
    4. Virtualized software-defined network, providing multi-tenant VPC networks and VPC edge services, such as DHCP and DNS within the VPC.
    5. High-level PaaS products: K8S, Database, Load Balancer, Serverless, Observability and many more, running on top of VM instances and using VPC networks for internal communication.

    These layers run on top of each other: the higher layers depend on the lower layers of the infrastructure. In a high-load, massive-scale environment consisting of many thousands of physical machines, occasional hardware failures are routine events happening several times per week. The vast majority of them are invisible to customers, as we build our infrastructure in a redundant fashion. All of the layers have their own redundancy and failover mechanisms, which make them resilient to failures in the lower layers. All critical systems have at least two instances (often more), deployed in an active-active fashion with a 50% load threshold, so if a critical instance fails, the remaining capacity is able to handle 100% of the load.

    ## Timeline

    * At 6:20 UTC, one of the Top of Rack (ToR) switches experienced a software crash. Due to the unstable state of its software modules, it took around 50 seconds for the network protocols to fail over all traffic of the rack to the second ToR switch.
    * During the convergence time, traffic to and from the hypervisors in the impacted rack experienced a high percentage of packet drops of a non-deterministic nature.
    * By 6:21 UTC, all traffic was fully rerouted to the backup device and the instability was resolved.
    * At 6:38 UTC, the crashed switch completed its automatic reboot and restored normal operation of the redundant ToR mode.
    * However, this instability caused a cascade effect on one of the infrastructure blocks of the virtualized network infrastructure: a BGP route reflector (RR) used for VPC services. This software was hosted on one of the hypervisors in the impacted rack.
    * The RR software stack experienced an instability and got stuck in an abnormal state which could not be resolved by the auto-restart process.
    * At this point, customers didn't experience any impact on VPC services, as the backup RR was operating normally. However, RR redundancy was lost.
    * At 7:50 UTC, the second RR experienced a critical condition and also got stuck in a non-operational state.
    * At this point, customers experienced a disruption of VPC connectivity for a subset of FR-PAR-1 AZ Scaleway Elements virtual products. Around 25% of the hypervisors lost VPC connectivity with the rest of the region.
    * Both RR software stacks have health-check monitoring and an auto-restart mechanism which should have addressed this type of failure. The health-check monitoring successfully detected the anomaly; however, the auto-restart mechanism failed.
    * The impact was detected by the Scaleway Customer Excellence and SRE teams, and an incident was opened with subsequent notification of technical top management.
    * Both VPC route reflectors were fixed by manual action.
    * By 8:01 UTC, VPC connectivity was fully restored.
    * By 8:07 UTC, the situation was back to a nominal state with redundancy operating normally.

    ## VPC-Dependent Products Impact

    ### Managed Database and Redis

    Impacted during both periods:

    * Short connectivity loss of the impacted rack (6:20-6:21 UTC): some Database customers experienced HTTP 500 errors for connections to their databases.
    * VPC connectivity loss for 25% of FR-PAR-1 hypervisors (7:50-8:01 UTC): impacted customers could not use VPC to connect to their managed databases.

    ### Serverless Functions and Containers

    During the VPC connectivity loss period (7:50-8:01 UTC), one of the serverless compute nodes was unavailable. A subset of customers with workloads on this node could experience service disruptions, including potential 500 errors when calling their functions/containers while their workloads were being rescheduled.

    ### Kapsule

    During the VPC connectivity loss period (7:50-8:01 UTC), there was a network partition between nodes, preventing applications running on different nodes from communicating. The infrastructure hosting the control planes uses VPC and was impacted too, causing some control-plane unavailability. Unfortunately, some nodes were replaced by autohealing because they were unable to report their availability, causing workloads to be rescheduled/restarted. By 8:04 UTC, almost all clusters were recovered.

    ## Lessons Learned and Further Actions

    * We have fixed the autohealing mechanism which failed to recover the route reflectors from a stale state: https://github.com/FRRouting/frr/pull/17163
    * We are planning to introduce software version diversity for route reflectors to avoid multiple instances being impacted by a single bug.
    * We plan to investigate in depth the software issue which caused that stale state.
  • Time: Oct. 18, 2024, 2:24 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Oct. 18, 2024, 8:50 a.m.
    Status: Monitoring
    Update: We are continuing to monitor for any further issues.
  • Time: Oct. 18, 2024, 8:30 a.m.
    Status: Monitoring
    Update: A fix has been implemented and we are monitoring the results.
  • Time: Oct. 18, 2024, 8:17 a.m.
    Status: Investigating
    Update: The service has been degraded since 08:20 AM CEST and was down between 09:49 and 10:04 AM CEST. Our teams are working towards a solution. Thank you for your patience.
  • Time: Oct. 18, 2024, 8:10 a.m.
    Status: Investigating
    Update: We are investigating VPC issues in fr-par-1
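
The postmortem above centers on a health check that correctly detected the stale BGP route reflectors while the paired auto-restart step failed. Scaleway has not published the implementation of that mechanism, so the following is only a minimal Python sketch of what such a watchdog typically looks like, assuming an FRR-based route reflector managed by systemd; the `frr.service` unit name, the `vtysh` probe, and the thresholds are illustrative assumptions, not details from the incident report.

```python
import subprocess
import time

# Generic watchdog sketch: a health probe plus an auto-restart step, the two
# pieces the postmortem says existed but did not recover the route reflectors.
# Unit name, probe command, and thresholds are illustrative assumptions.
SERVICE = "frr.service"
PROBE = ["vtysh", "-c", "show bgp summary"]
FAILURES_BEFORE_RESTART = 3
CHECK_INTERVAL_S = 10

def probe_healthy() -> bool:
    """Return True if the BGP daemon answers the liveness probe in time."""
    try:
        result = subprocess.run(PROBE, capture_output=True, timeout=5)
        return result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False

def restart_service() -> bool:
    """Ask systemd to restart the routing daemon; report whether it succeeded."""
    result = subprocess.run(["systemctl", "restart", SERVICE], capture_output=True)
    return result.returncode == 0

def watchdog() -> None:
    failures = 0
    while True:
        if probe_healthy():
            failures = 0
        else:
            failures += 1
            if failures >= FAILURES_BEFORE_RESTART:
                # If the restart itself fails (as it did during this incident),
                # the only remaining safety net is paging a human.
                if not restart_service():
                    print("restart failed, escalating to on-call")
                failures = 0
        time.sleep(CHECK_INTERVAL_S)

if __name__ == "__main__":
    watchdog()
```

The postmortem's other action item, software version diversity for the route reflectors, guards against the failure mode a watchdog cannot: the same bug putting both instances into the same unrecoverable state at once.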

Updates:

  • Time: Oct. 17, 2024, 2:44 p.m.
    Status: Resolved
    Update: All the machines are now back up
  • Time: Oct. 17, 2024, 2:20 p.m.
    Status: Identified
    Update: We are in the process of powering the affected machines back on; 80% of the rack is back up at the moment.
  • Time: Oct. 17, 2024, 1:58 p.m.
    Status: Investigating
    Update: Following a planned operation in fr-par-3, some Apple Silicon machines were unexpectedly powered off. We are in the process of rebooting the affected machines. IPs of affected machines: 51.159.120.2 -> 51.159.120.97

Updates:

  • Time: Oct. 17, 2024, 9:36 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Oct. 17, 2024, 8:56 a.m.
    Status: Investigating
    Update: Cluster scaling issues on WAW for clusters with an ENT1-L pool on WAW2. Customers can remove the ENT1-L pools to fix the issue.
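
The workaround in the update above is to remove the affected ENT1-L pools. As a rough sketch of how that cleanup could be scripted against the Scaleway Kapsule HTTP API (this is not an official Scaleway example), the Python below lists a cluster's pools and deletes those on the ENT1-L node type; the endpoint paths, the `ent1_l` node-type spelling, and the placeholder cluster ID are assumptions that should be checked against the current Scaleway API reference.

```python
import os

import requests

# Hypothetical cleanup script for the ENT1-L workaround described above.
# Endpoint paths and the "ent1_l" node-type spelling are assumptions; verify
# them against the Scaleway API reference before running anything like this.
API = "https://api.scaleway.com/k8s/v1/regions/pl-waw"
HEADERS = {"X-Auth-Token": os.environ["SCW_SECRET_KEY"]}
CLUSTER_ID = "11111111-2222-3333-4444-555555555555"  # placeholder cluster ID

def delete_ent1l_pools(cluster_id: str) -> None:
    """Delete every pool of the cluster whose node type looks like ENT1-L."""
    resp = requests.get(f"{API}/clusters/{cluster_id}/pools", headers=HEADERS)
    resp.raise_for_status()
    for pool in resp.json().get("pools", []):
        node_type = pool.get("node_type", "").lower().replace("-", "_")
        if node_type.startswith("ent1_l"):
            print(f"deleting pool {pool['name']} ({pool['id']})")
            requests.delete(f"{API}/pools/{pool['id']}", headers=HEADERS).raise_for_status()

delete_ent1l_pools(CLUSTER_ID)
```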

Check the status of similar companies and alternatives to Scaleway

Akamai
Systems Active

Nutanix
Systems Active

MongoDB
Systems Active

LogicMonitor
Systems Active

Acquia
Systems Active

Granicus System
Systems Active

CareCloud
Systems Active

Redis
Systems Active

integrator.io
Systems Active

NinjaOne Trust
Systems Active

Pantheon Operations
Systems Active

Securiti US
Systems Active

Frequently Asked Questions - Scaleway

Is there a Scaleway outage?
The current status of Scaleway is: Minor Outage
Where can I find the official status page of Scaleway?
The official status page for Scaleway is here
How can I get notified if Scaleway is down or experiencing an outage?
To get notified of any status changes to Scaleway, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Scaleway every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Scaleway do?
Scaleway provides a robust, eco-friendly cloud platform for creating, training, deploying, and scaling smart applications.