Files.com Status: Check if Files.com is down or having an outage.

Component	Status
Australia Region	Active
Background Jobs, including Sync and Webhooks	Active
Canada Region	Active
Core Services / API	Active
EU (Germany) Region	Active
Files Tools	Active
FTP/FTPS	Active
Japan Region	Active
Remote Server Integrations (Sync and Mount)	Active
SFTP	Active
Singapore Region	Active
UK Region	Active
USA Region	Active
WebDAV	Active
Web Interface	Active

Reports of Elevated DNS Errors

Description: On May 12th, 2023, at AM/PM PST, [Files.com](http://Files.com) received customer reports of elevated DNS errors, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. [Files.com](http://Files.com) released an initial Status Page posting on May 12th, 2023, at 3:45 PM PST stating: _**“Reports of Elevated DNS Errors:** We are investigating reports of DNS errors on the_ [_Files.com_](http://Files.com) _service._ _This is intermittently affecting some logins for all services._ _We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.”_ The elevated DNS errors were resolved on May 12th, 2023, at 4:16 PM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 12th, 2023, at 4:23 PM PST stating _“All services have been restored and are operating normally._ _We resolved a DNS issue resulting in some intermittent errors on accessing_ [_Files.com_](http://Files.com) _sites. Users without the site name cached were potentially affected from approximately 2:25 p.m. PST to 4:16 p.m. PST. This issue did not affect anyone with dedicated IP addresses._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.”_ This incident occurred during the deployment of changes to our corporate domain registrations as part of the post-mortem/resolution process for the incident that occurred on May 5.
As discussed in the RCA for that incident, we moved the registration records for all domain names owned by [Files.com](http://Files.com) to CSC Domains, an enterprise and security-focused domain name registrar, for the purpose of mitigating domain name registrar risk. During the process of the domain transfer, the nameservers for one of our domain names were inadvertently entered incorrectly into the new registrar. As a result, DNS lookups for certain domains resulted in failure. This issue only affected a subset of our customers, and did not affect any customers using custom domain names or custom IP addresses. Once we diagnosed the problem, we were able to call CSC Domains and get the matter resolved immediately. As of now, all domains owned by [Files.com](http://Files.com) are managed by CSC Domains, and we do not expect any further registrar-related incidents to occur in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
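The failure mode described above, nameservers entered incorrectly at the new registrar, is the kind of error a pre-cutover comparison can catch. The following is a minimal sketch, not the process Files.com uses: it assumes the expected and registered nameserver lists have already been gathered by some other means (for example, from the registrar's control panel), and the function names and the `example.*` hostnames are hypothetical.

```python
def normalize(ns: str) -> str:
    """Lowercase a nameserver hostname and drop any trailing dot."""
    return ns.strip().lower().rstrip(".")


def nameserver_mismatches(expected, registered):
    """Compare expected and registered nameservers per domain.

    Both arguments map a domain name to an iterable of nameserver hostnames.
    Returns {domain: (missing, unexpected)} for every domain whose registered
    set differs from the expected set; matching domains are omitted.
    """
    problems = {}
    for domain, want in expected.items():
        want_set = {normalize(ns) for ns in want}
        have_set = {normalize(ns) for ns in registered.get(domain, [])}
        if want_set != have_set:
            missing = sorted(want_set - have_set)
            unexpected = sorted(have_set - want_set)
            problems[domain] = (missing, unexpected)
    return problems


# Hypothetical data: one domain matches, one has a mistyped nameserver.
expected = {
    "example.com": ["ns1.example.net", "ns2.example.net"],
    "example.org": ["ns1.example.net."],
}
registered = {
    "example.com": ["NS1.example.net.", "ns2.example.net"],
    "example.org": ["ns1.exmaple.net"],
}
result = nameserver_mismatches(expected, registered)
```

Here `result` flags only `example.org`, listing the nameserver that is missing and the mistyped one that was entered in its place. Normalizing case and trailing dots avoids false alarms from purely cosmetic differences.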

Status: Postmortem

Impact: None | Started At: May 12, 2023, 10:45 p.m.

Updates:

Time: June 1, 2023, 6:20 p.m.

Status: Postmortem

Time: May 12, 2023, 11:23 p.m.

Status: Resolved

Update: All services have been restored and are operating normally. We resolved a DNS issue resulting in some intermittent errors on accessing Files.com sites. Users without the site name cached were potentially affected from approximately 2:25 p.m. PST to 4:16 p.m. PST. This issue did not affect anyone with dedicated IP addresses. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
Time: May 12, 2023, 10:45 p.m.

Status: Investigating

Update: We are investigating reports of DNS errors on the Files.com service. This is intermittently affecting some logins for all services. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.

SFTP, FTP/FTPS, WebDAV Service Degraded

Description: On May 8th and May 9th, 2023, [Files.com](http://Files.com) received multiple automated alerts and customer reports of intermittent issues with the [Files.com](http://Files.com) platform, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. [Files.com](http://Files.com) released an initial Status Page posting on May 8th, 2023, at 5:12 PM PST stating: _**“SFTP, FTP/FTPS, WebDAV Service Degraded:** FTP/FTPS, SFTP, WebDAV only: We are investigating elevated error rates on these services on_ [_Files.com_](http://Files.com) _in all regions._ _This incident does not impact other network services such as API, AS2, and others._ _We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.”_ [Files.com](http://Files.com) released a resolution Status Page posting on May 8th, 2023, at 5:37 PM PST stating _“All services have been restored and are operating normally._ _Users connecting to accounts with a custom namespace, an ExaVault host key, a custom host key, or an enforced IP whitelist experienced authentication errors. Logins were impacted between 1:34 p.m. PST and 5:33 p.m. PST. Other users may have experienced elevated error rates as well._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.”_ Customers continued reporting other intermittent issues with the platform, which resulted in a second incident being declared on May 9th, 2023, at 6:47 AM PST.
The IMT convened and immediately began investigation. The intermittent issues with the [Files.com](http://Files.com) platform were resolved on May 9th, 2023, at 8:07 AM PST, returning the platform to full functionality. This incident occurred due to a complex set of circumstances with times that vary by region. This narrative will focus on the overall story of what happened. On May 5, [Files.com](http://Files.com) experienced an incident that resulted in a 3\+ hour service outage. Prior to that, on May 3, [Files.com](http://Files.com) conducted a successful upgrade of certain regional proxy servers in certain regions from Intel architecture to ARM architecture as part of our overall transition from Intel to ARM across all of our services. As we explained in the RCA of the May 5 incident, our Incident Management Team originally misidentified the root cause of that incident as being related to the new ARM servers and made the decision to roll back from our new ARM servers to the old Intel servers in certain regions on May 5. Unfortunately, that rollback was not correctly performed. We make use of AWS \(Amazon Web Services\) EC2 \(Elastic Compute Cloud\) for all of our compute resources on [Files.com](http://Files.com). Both the Intel and ARM servers being discussed run inside AWS EC2. The EC2 networking backplane suffers from a long-standing bug that we have long been aware of, where migrating an IP from one server to another can result in erroneous data reported by EC2 to our instances. In short, if you live migrate an IP on EC2 from one server to another, EC2 can report to both servers that they still “own” the IP. Because of this bug, we have a complicated procedure for migrating IPs from one server to another. This procedure is highly automated and provides that we always fully shut down servers after IPs are moved off of them. This procedure works around the EC2 bug.
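The condition the shutdown step guards against, two running servers both believing they "own" the same IP, can be illustrated with a small check. This is a sketch under stated assumptions, not Files.com's tooling: it assumes each server's self-reported IP list is already available as a plain mapping, and the function name, server names, and documentation-range IP addresses are hypothetical.

```python
from collections import defaultdict


def conflicting_claims(reports):
    """Find IPs that more than one server claims to own.

    `reports` maps a server name to the set of IPs that server says it owns.
    Returns {ip: [servers...]} only for IPs claimed by two or more servers.
    """
    claims = defaultdict(set)
    for server, ips in reports.items():
        for ip in ips:
            claims[ip].add(server)
    return {ip: sorted(servers) for ip, servers in claims.items() if len(servers) > 1}


# Hypothetical reports: the old server was not shut down after the IP moved.
reports = {
    "intel-1": {"203.0.113.10", "203.0.113.11"},
    "arm-1": {"203.0.113.10"},
}
conflicts = conflicting_claims(reports)
```

In this example `conflicts` contains only `203.0.113.10`, claimed by both `arm-1` and `intel-1`. A migration runbook could treat any non-empty result as a signal that the source server has not been fully shut down.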
When we performed the rollback from ARM to Intel servers on May 5, we failed to fully follow our procedure and fully shut down the ARM servers. They were “disabled” using a softer disabling mechanism, but at some point they rebooted and once they rebooted, EC2 began to report conflicting information about which server “owned” the IPs related to this incident. In our architecture, servers report their internal and external IP list to our central routing system on a regular schedule. As a result of the two sets of servers reporting conflicting information, our routing systems began to oscillate routing traffic between the Intel and ARM servers every few minutes, and only one set of servers would work at a given time. The root cause of this incident was our failure to follow our own procedure during the transition between ARM and Intel servers. A major contributing factor was our failure to detect a situation where IP addresses appear to oscillate between multiple servers. Another contributing factor is the AWS EC2 bug that results in incorrect IP address information being reported to instances. As a result of this incident, we have conducted remedial training with all of our Infrastructure team to re-train them on the procedure to migrate IPs from one server to another. We have additionally added new protection to our routing system that will detect a situation where IP addresses oscillate between servers and raise an alarm when that happens in the future. Furthermore, we have improved our internal synthetic monitoring systems with the ability to detect the situation that occurred during this incident and treat it as a failure. On a more general note, we have added a considerable amount of sophistication to our monitoring and routing systems as a result of the several incidents that occurred in May, and we are adding more. 
These improvements amount to over 5,000 lines of code and we are optimistic that they will reduce the frequency and impact of incidents in the future. We greatly appreciate your patience and understanding as we resolved these issues. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
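The oscillation detection mentioned above can be sketched as a sliding-window counter of ownership changes per IP. This is an illustrative sketch only, not the protection Files.com added: the class name, the thresholds, and the server names are assumptions chosen for the example.

```python
from collections import defaultdict, deque


class OwnershipFlapDetector:
    """Flag IPs whose reported owner changes too often within a time window."""

    def __init__(self, max_changes=3, window_seconds=900):
        self.max_changes = max_changes        # changes that count as flapping
        self.window_seconds = window_seconds  # how far back to look
        self._owner = {}                      # ip -> last reported owner
        self._changes = defaultdict(deque)    # ip -> timestamps of owner changes

    def report(self, timestamp, ip, server):
        """Record that `server` claims `ip` at `timestamp` (in seconds).

        Returns True if the IP has changed owner at least `max_changes`
        times within the last `window_seconds`, otherwise False.
        """
        previous = self._owner.get(ip)
        self._owner[ip] = server
        changes = self._changes[ip]
        if previous is not None and previous != server:
            changes.append(timestamp)
        # Forget changes that have fallen out of the window.
        while changes and timestamp - changes[0] > self.window_seconds:
            changes.popleft()
        return len(changes) >= self.max_changes


# Hypothetical reports arriving every two minutes from alternating servers.
detector = OwnershipFlapDetector(max_changes=3, window_seconds=600)
alarms = [
    detector.report(0, "203.0.113.10", "intel-1"),
    detector.report(120, "203.0.113.10", "arm-1"),
    detector.report(240, "203.0.113.10", "intel-1"),
    detector.report(360, "203.0.113.10", "arm-1"),
]
```

With these inputs the first three reports do not raise an alarm and the fourth does, because it is the third ownership change inside the ten-minute window. A single legitimate migration produces only one change per IP, so it stays below the threshold.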

Status: Postmortem

Impact: Major | Started At: May 9, 2023, 12:12 a.m.

Updates:

Time: June 1, 2023, 6:34 p.m.

Status: Postmortem

Time: May 9, 2023, 12:37 a.m.

Status: Resolved

Update: All services have been restored and are operating normally. Users connecting to accounts with a custom namespace, an ExaVault host key, a custom host key, or an enforced IP whitelist experienced authentication errors. Logins were impacted between 1:34 p.m. PST and 5:33 p.m. PST. Other users may have experienced elevated error rates as well. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
Time: May 9, 2023, 12:12 a.m.

Status: Investigating

Update: FTP/FTPS, SFTP, WebDAV only: We are investigating elevated error rates on these services on Files.com in all regions. This incident does not impact other network services such as API, AS2, and others. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.

SFTP, FTP/FTPS, WebDAV Service Degraded

Description: On May 8th, an d May 9th, 2023, [Files.com](http://Files.com) received multiple automated alerts and customer reports of intermittent issues with the [Files.com](http://Files.com) platform, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. [Files.com](http://Files.com) released an initial Status Page posting on May 8th, 2023, at 5:12 PM PST stating: _**“SFTP, FTP/FTPS, WebDAV Service Degraded:** FTP/FTPS, SFTP, WebDAV only: We are investigating elevated error rates on these services on_ [_Files.com_](http://Files.com) _in all regions._ _This incident does not impact other network services such as API, AS2, and others._ _We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.”_ [Files.com](http://Files.com) released a resolution Status Page posting on May 8th, 2023, at 5:37 PM PST stating _“All services have been restored and are operating normally._ _Users connecting to accounts with a custom namespace, an ExaVault host key, a custom host key, or an enforced IP whitelist experienced authentication errors. Logins were impacted between 1:34 p.m. PST and 5:33 p.m. PST. Other users may have experienced elevated error rates as well._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.”_ Customers continued reporting other intermittent issues with the platform, which resulted in second incident being declared on May 9th, 2023, at 6:47 AM PST. 
The IMT convened and immediately began investigation The intermittent issues with the [Files.com](http://Files.com) platform were resolved on May 9th, 2023, at 8:07 AM PST, returning the platform to full functionality. This incident occurred due to a complex set of circumstances with times that vary by region. This narrative will focus on the overall story of what happened. On May 5, [Files.com](http://Files.com) experienced an incident that resulted in a 3\+ hour service outage. Prior to that, on May 3, [Files.com](http://Files.com) conducted a successful upgrade of certain regional proxy servers in certain regions from Intel architecture to ARM architecture as part of our overall transition from Intel to ARM across all of our services. As we explained in the RCA of the May 5 incident, our Incident Management Team originally misidentified the root cause of that incident as being related to the new ARM servers and made the decision to roll back from our new ARM servers to the old Intel servers in certain regions on May 5. Unfortunately, that rollback was not correctly performed. We make use AWS \(Amazon Web Services\) EC2 \(Elastic Compute Cloud\) for all of our compute resources on [Files.com](http://Files.com). Both the Intel and ARM servers being discussed run inside AWS EC2. The EC2 networking backplane suffers from a long-standing bug that we have long been aware of where migrating an IP from one server to another can result in erroneous data reported by EC2 to our instances. In short, if you live migrate an IP on EC2 from one server to another, EC2 can report to both servers that they still “own” the IP. Because of this bug, we have a complicated procedure for migrating IPs from one server to another. This procedure is highly automated and provides that we always fully shut down servers after IPs are moved off of them. This procedure works around the EC2 bug. 
When we performed the rollback from ARM to Intel servers on May 5, we failed to fully follow our procedure and fully shut down the ARM servers. They were “disabled” using a softer disabling mechanism, but at some point they rebooted and once they rebooted, EC2 began to report conflicting information about which server “owned” the IPs related to this incident. In our architecture, servers report their internal and external IP list to our central routing system on a regular schedule. As a result of the two sets of servers reporting conflicting information, our routing systems began to oscillate routing traffic between the Intel and ARM servers every few minutes, and only one set of servers would work at a given time. The root cause of this incident was our failure to follow our own procedure during the transition between ARM and Intel servers. A major contributing factor was our failure to detect a situation where IP addresses appear to oscillate between multiple servers. Another contributing factor is the AWS EC2 bug that results in incorrect IP address information being reported to instances. As a result of this incident, we have conducted remedial training with all of our Infrastructure team to re-train them on the procedure to migrate IPs from one server to another. We have additionally added new protection to our routing system that will detect a situation where IP addresses oscillate between servers and raise an alarm when that happens in the future. Furthermore, we have improved our internal synthetic monitoring systems with the ability to detect the situation that occurred during this incident and treat it as a failure. On a more general note, we have added a considerable amount of sophistication to our monitoring and routing systems as a result of the several incidents that occurred in May, and we are adding more. 
These improvements amount to over 5,000 lines of code and we are optimistic that they will reduce the frequency and impact of incidents in the future. We greatly appreciate your patience and understanding as we resolved these issues. If you need additional assistance or continue to experience issues, please contact our Customer Support team.

Status: Postmortem

Impact: Major | Started At: May 9, 2023, 12:12 a.m.

Updates:

Time: June 1, 2023, 6:34 p.m.

Status: Postmortem

Update: On May 8th, an d May 9th, 2023, [Files.com](http://Files.com) received multiple automated alerts and customer reports of intermittent issues with the [Files.com](http://Files.com) platform, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. [Files.com](http://Files.com) released an initial Status Page posting on May 8th, 2023, at 5:12 PM PST stating: _**“SFTP, FTP/FTPS, WebDAV Service Degraded:** FTP/FTPS, SFTP, WebDAV only: We are investigating elevated error rates on these services on_ [_Files.com_](http://Files.com) _in all regions._ _This incident does not impact other network services such as API, AS2, and others._ _We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.”_ [Files.com](http://Files.com) released a resolution Status Page posting on May 8th, 2023, at 5:37 PM PST stating _“All services have been restored and are operating normally._ _Users connecting to accounts with a custom namespace, an ExaVault host key, a custom host key, or an enforced IP whitelist experienced authentication errors. Logins were impacted between 1:34 p.m. PST and 5:33 p.m. PST. Other users may have experienced elevated error rates as well._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.”_ Customers continued reporting other intermittent issues with the platform, which resulted in second incident being declared on May 9th, 2023, at 6:47 AM PST. 
The IMT convened and immediately began investigation The intermittent issues with the [Files.com](http://Files.com) platform were resolved on May 9th, 2023, at 8:07 AM PST, returning the platform to full functionality. This incident occurred due to a complex set of circumstances with times that vary by region. This narrative will focus on the overall story of what happened. On May 5, [Files.com](http://Files.com) experienced an incident that resulted in a 3\+ hour service outage. Prior to that, on May 3, [Files.com](http://Files.com) conducted a successful upgrade of certain regional proxy servers in certain regions from Intel architecture to ARM architecture as part of our overall transition from Intel to ARM across all of our services. As we explained in the RCA of the May 5 incident, our Incident Management Team originally misidentified the root cause of that incident as being related to the new ARM servers and made the decision to roll back from our new ARM servers to the old Intel servers in certain regions on May 5. Unfortunately, that rollback was not correctly performed. We make use AWS \(Amazon Web Services\) EC2 \(Elastic Compute Cloud\) for all of our compute resources on [Files.com](http://Files.com). Both the Intel and ARM servers being discussed run inside AWS EC2. The EC2 networking backplane suffers from a long-standing bug that we have long been aware of where migrating an IP from one server to another can result in erroneous data reported by EC2 to our instances. In short, if you live migrate an IP on EC2 from one server to another, EC2 can report to both servers that they still “own” the IP. Because of this bug, we have a complicated procedure for migrating IPs from one server to another. This procedure is highly automated and provides that we always fully shut down servers after IPs are moved off of them. This procedure works around the EC2 bug. 
When we performed the rollback from ARM to Intel servers on May 5, we failed to fully follow our procedure and fully shut down the ARM servers. They were “disabled” using a softer disabling mechanism, but at some point they rebooted and once they rebooted, EC2 began to report conflicting information about which server “owned” the IPs related to this incident. In our architecture, servers report their internal and external IP list to our central routing system on a regular schedule. As a result of the two sets of servers reporting conflicting information, our routing systems began to oscillate routing traffic between the Intel and ARM servers every few minutes, and only one set of servers would work at a given time. The root cause of this incident was our failure to follow our own procedure during the transition between ARM and Intel servers. A major contributing factor was our failure to detect a situation where IP addresses appear to oscillate between multiple servers. Another contributing factor is the AWS EC2 bug that results in incorrect IP address information being reported to instances. As a result of this incident, we have conducted remedial training with all of our Infrastructure team to re-train them on the procedure to migrate IPs from one server to another. We have additionally added new protection to our routing system that will detect a situation where IP addresses oscillate between servers and raise an alarm when that happens in the future. Furthermore, we have improved our internal synthetic monitoring systems with the ability to detect the situation that occurred during this incident and treat it as a failure. On a more general note, we have added a considerable amount of sophistication to our monitoring and routing systems as a result of the several incidents that occurred in May, and we are adding more. 
These improvements amount to over 5,000 lines of code and we are optimistic that they will reduce the frequency and impact of incidents in the future. We greatly appreciate your patience and understanding as we resolved these issues. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
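The routing-system protection described above (detecting an IP whose reported owner keeps flipping between servers) can be illustrated with a minimal sketch. This is not the Files.com implementation; the class name, the window size, the change threshold, and the server identifiers are all assumptions made for illustration. It counts owner changes per IP inside a sliding time window and signals an alarm once the count reaches a threshold, so a single planned migration does not trigger it.

```python
from collections import defaultdict, deque


class IpOscillationDetector:
    """Flags an IP whose reported owner keeps changing within a time window.

    Hypothetical sketch: names and thresholds are illustrative assumptions.
    """

    def __init__(self, window_seconds=1800, max_owner_changes=3):
        self.window_seconds = window_seconds
        self.max_owner_changes = max_owner_changes
        self._last_owner = {}
        self._changes = defaultdict(deque)  # ip -> timestamps of owner changes

    def record_report(self, ip, server_id, timestamp):
        """Record that server_id claimed ip at timestamp; return True to alarm."""
        previous = self._last_owner.get(ip)
        self._last_owner[ip] = server_id
        changes = self._changes[ip]
        if previous is not None and previous != server_id:
            changes.append(timestamp)
        # Drop owner changes that have fallen out of the sliding window.
        while changes and timestamp - changes[0] > self.window_seconds:
            changes.popleft()
        return len(changes) >= self.max_owner_changes


# Two servers alternately claim the same IP every five minutes.
detector = IpOscillationDetector(window_seconds=1800, max_owner_changes=3)
reports = [(0, "intel-1"), (300, "arm-1"), (600, "intel-1"), (900, "arm-1")]
alarms = [detector.record_report("203.0.113.10", server, t) for t, server in reports]
```

In this example the fourth report is the third ownership change within 30 minutes, so only the last call returns `True`. A real deployment would tune the window and threshold to the reporting schedule of its own servers.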
Time: May 9, 2023, 12:37 a.m.

Status: Resolved

Update: All services have been restored and are operating normally. Users connecting to accounts with a custom namespace, an ExaVault host key, a custom host key, or an enforced IP whitelist experienced authentication errors. Logins were impacted between 1:34 p.m. PST and 5:33 p.m. PST. Other users may have experienced elevated error rates as well. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
Time: May 9, 2023, 12:12 a.m.

Status: Investigating

Update: FTP/FTPS, SFTP, WebDAV only: We are investigating elevated error rates on these services on Files.com in all regions. This incident does not impact other network services such as API, AS2, and others. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.

SFTP Entirely Down – US East Region (Primary)

Description: On May 8th, 2023, at 1:39 PM PST, [Files.com](http://Files.com) received automated alerts that SFTP was entirely down in the US East region, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. [Files.com](http://Files.com) released an initial Status Page posting on May 8th, 2023, at 1:47 PM PST stating: _**“SFTP Entirely Down – US East Region \(Primary\):** SFTP only: We are investigating a major outage of the SFTP service on_ [_Files.com_](http://Files.com) _in our primary USA region._ _This incident does not impact other network services such as API, FTP, WebDAV, AS2, and others._ _If you have an urgent need to access_ [_Files.com_](http://Files.com)_, we recommend using FTP in lieu of SFTP. If you must connect via SFTP, you should be able to immediately connect \(and access your existing files and account\) using the hostname of our Canada region, which is_ [_app-ca-central-1.files.com_](http://app-ca-central-1.files.com)_._ _We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.”_ The SFTP outage in the US East region was resolved on May 8th, 2023, at 1:47 PM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 8th, 2023, at 1:51 PM PST stating: _“All services have been restored and are operating normally._ _We have resolved a major outage of the SFTP service on_ [_Files.com_](http://Files.com) _in our primary USA region. This incident did not impact other network services such as API, FTP, WebDAV, AS2, and others. The SFTP service was down from 1:34 p.m. 
to 1:47 p.m., with a total downtime of 13 minutes, but only in the primary USA region._ _If you previously moved any workloads to another region in response to this incident, you are cleared to move those regional workloads back to the USA region._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.”_ This incident occurred during a time period that also contained multiple other incidents, some of which overlapped. This report focuses specifically on the symptoms described here, but many customers who experienced this incident also experienced one of the other incidents. This incident had two distinct parts and root causes. First, [Files.com](http://Files.com) deployed a change to its SFTP server as part of our overall project to dramatically improve the logging and handling of errors on SFTP. The deployment of that change crashed our SFTP servers in several of our smaller regions due to an “out of memory” condition. Our SFTP server is developed in Java, and anyone familiar with Java can tell you how sensitive Java can be to memory configuration settings. We immediately identified the issue with the Java memory settings and pushed a change to Chef, our infrastructure configuration management system, to tweak the SFTP memory settings and resolve the initial crash. The root cause of this first part was [Files.com](http://Files.com)’s failure to monitor Java runtime parameters such as memory usage to defend against an out of memory condition. We have added additional monitoring around Java memory usage and are optimistic that this situation will be avoided in the future. 
One benefit of the [Files.com](http://Files.com) architecture as compared with many of our peers is that on [Files.com](http://Files.com), SFTP is a completely isolated subsystem, so this incident did not impact other network services such as FTP, AS2, WebDAV, or API. Unfortunately, when we deployed the configuration change via Chef, we inadvertently deployed an unrelated configuration change at the same time that had been previously merged but not deployed to the SFTP servers. This is due to the fact that we use one unified Chef repository for server configuration where certain recipes can be shared by different server types. That configuration change introduced an error into the upstream communication with our API, resulting in inability to connect via SFTP for certain customers. After investigating the issue, we were able to identify the bad configuration change and revert it. The root cause of the second part is [Files.com](http://Files.com)’s failure to operate adequate change management procedures to prevent an unintended change from being deployed. Our incident management team was quite disappointed to learn about the chain of events that led to this incident. We have already improved our internal synthetic monitoring systems with the ability to detect the situation that occurred during this incident and alert on it immediately. Additionally, as a result of this incident, we are implementing major changes to our change management procedures designed to prevent this sort of configuration management error from happening again. Those changes are fairly complicated and will require a great deal of internal development. As such, they will likely not be deployed until the middle of Q3. It is our goal to have them implemented before our next SOC 2 Type II observation period \(which runs from Q2-Q3 2023\) and documented in our next SOC 2 Type II report. 
On a more general note, we have added a considerable amount of sophistication to our monitoring and routing systems as a result of the several incidents that occurred in May, and we are adding more. These improvements amount to over 5,000 lines of code and we are optimistic that they will reduce the frequency and impact of incidents in the future. We hope to share more about the improvements in our next SOC 2 Type II report. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
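The second root cause above (a previously merged but undeployed change riding along with an urgent fix in a shared configuration repository) suggests a pre-deployment check. The following is a minimal sketch of that idea, not a description of Files.com's planned change management system or of any Chef feature. The `Change` type, the recipe names, and the change identifiers are hypothetical. The check lists every merged change that is not yet deployed and touches a recipe used by the target server type, then reports any that the operator did not explicitly intend to ship.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    change_id: str
    recipes: frozenset  # names of the recipes this change touches (hypothetical)


def pending_changes(merged, deployed_ids, server_recipes):
    """Return merged changes not yet deployed that touch this server type's recipes."""
    return [
        c for c in merged
        if c.change_id not in deployed_ids and c.recipes & server_recipes
    ]


def unintended_changes(merged, deployed_ids, server_recipes, intended_ids):
    """Return pending changes the operator did not explicitly intend to ship."""
    return [
        c for c in pending_changes(merged, deployed_ids, server_recipes)
        if c.change_id not in intended_ids
    ]


# Illustrative data: c3 is the urgent memory fix, c2 is an unrelated shared-recipe change.
merged = [
    Change("c1", frozenset({"base"})),
    Change("c2", frozenset({"api_upstream"})),
    Change("c3", frozenset({"sftp_jvm"})),
    Change("c4", frozenset({"webdav"})),
]
sftp_recipes = frozenset({"base", "api_upstream", "sftp_jvm"})
blocked = unintended_changes(merged, {"c1"}, sftp_recipes, {"c3"})
```

Here `blocked` contains only `c2`: it is merged, not yet deployed, touches a recipe the SFTP servers use, and was not part of the intended deployment. A deployment tool built on this idea could refuse to proceed until the operator acknowledges each blocked change.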

Status: Postmortem

Impact: Critical | Started At: May 8, 2023, 8:47 p.m.

Updates:

Time: June 1, 2023, 6:31 p.m.

Status: Postmortem

Update: On May 8th, 2023, at 1:39 PM PST, [Files.com](http://Files.com) received automated alerts that SFTP was entirely down in the US East region, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. [Files.com](http://Files.com) released an initial Status Page posting on May 8th, 2023, at 1:47 PM PST stating: _**“SFTP Entirely Down – US East Region \(Primary\):** SFTP only: We are investigating a major outage of the SFTP service on_ [_Files.com_](http://Files.com) _in our primary USA region._ _This incident does not impact other network services such as API, FTP, WebDAV, AS2, and others._ _If you have an urgent need to access_ [_Files.com_](http://Files.com)_, we recommend using FTP in lieu of SFTP. If you must connect via SFTP, you should be able to immediately connect \(and access your existing files and account\) using the hostname of our Canada region, which is_ [_app-ca-central-1.files.com_](http://app-ca-central-1.files.com)_._ _We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.”_ The SFTP outage in the US East region was resolved on May 8th, 2023, at 1:47 PM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 8th, 2023, at 1:51 PM PST stating: _“All services have been restored and are operating normally._ _We have resolved a major outage of the SFTP service on_ [_Files.com_](http://Files.com) _in our primary USA region. This incident did not impact other network services such as API, FTP, WebDAV, AS2, and others. The SFTP service was down from 1:34 p.m. 
to 1:47 p.m., with a total downtime of 13 minutes, but only in the primary USA region._ _If you previously moved any workloads to another region in response to this incident, you are cleared to move those regional workloads back to the USA region._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.”_ This incident occurred during a time period that also contained multiple other incidents, some of which overlapped. This report focuses specifically on the symptoms described here, but many customers who experienced this incident also experienced one of the other incidents. This incident had two distinct parts and root causes. First, [Files.com](http://Files.com) deployed a change to its SFTP server as part of our overall project to dramatically improve the logging and handling of errors on SFTP. The deployment of that change crashed our SFTP servers in several of our smaller regions due to an “out of memory” condition. Our SFTP server is developed in Java, and anyone familiar with Java can tell you how sensitive Java can be to memory configuration settings. We immediately identified the issue with the Java memory settings and pushed a change to Chef, our infrastructure configuration management system, to tweak the SFTP memory settings and resolve the initial crash. The root cause of this first part was [Files.com](http://Files.com)’s failure to monitor Java runtime parameters such as memory usage to defend against an out of memory condition. We have added additional monitoring around Java memory usage and are optimistic that this situation will be avoided in the future. 
One benefit of the [Files.com](http://Files.com) architecture as compared with many of our peers is that on [Files.com](http://Files.com), SFTP is a completely isolated subsystem, so this incident did not impact other network services such as FTP, AS2, WebDAV, or API. Unfortunately, when we deployed the configuration change via Chef, we inadvertently deployed an unrelated configuration change at the same time that had been previously merged but not deployed to the SFTP servers. This is due to the fact that we use one unified Chef repository for server configuration where certain recipes can be shared by different server types. That configuration change introduced an error into the upstream communication with our API, resulting in inability to connect via SFTP for certain customers. After investigating the issue, we were able to identify the bad configuration change and revert it. The root cause of the second part is [Files.com](http://Files.com)’s failure to operate adequate change management procedures to prevent an unintended change from being deployed. Our incident management team was quite disappointed to learn about the chain of events that led to this incident. We have already improved our internal synthetic monitoring systems with the ability to detect the situation that occurred during this incident and alert on it immediately. Additionally, as a result of this incident, we are implementing major changes to our change management procedures designed to prevent this sort of configuration management error from happening again. Those changes are fairly complicated and will require a great deal of internal development. As such, they will likely not be deployed until the middle of Q3. It is our goal to have them implemented before our next SOC 2 Type II observation period \(which runs from Q2-Q3 2023\) and documented in our next SOC 2 Type II report. 
On a more general note, we have added a considerable amount of sophistication to our monitoring and routing systems as a result of the several incidents that occurred in May, and we are adding more. These improvements amount to over 5,000 lines of code and we are optimistic that they will reduce the frequency and impact of incidents in the future. We hope to share more about the improvements in our next SOC 2 Type II report. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
Time: May 8, 2023, 8:51 p.m.

Status: Resolved

Update: All services have been restored and are operating normally. We have resolved a major outage of the SFTP service on Files.com in our primary USA region. This incident did not impact other network services such as API, FTP, WebDAV, AS2, and others. The SFTP service was down from 1:34 p.m. to 1:47 p.m., with a total downtime of 13 minutes, but only in the primary USA region. If you previously moved any workloads to another region in response to this incident, you are cleared to move those regional workloads back to the USA region. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
Time: May 8, 2023, 8:47 p.m.

Status: Investigating

Update: SFTP only: We are investigating a major outage of the SFTP service on Files.com in our primary USA region. This incident does not impact other network services such as API, FTP, WebDAV, AS2, and others. If you have an urgent need to access Files.com, we recommend using FTP in lieu of SFTP. If you must connect via SFTP, you should be able to immediately connect (and access your existing files and account) using the hostname of our Canada region, which is app-ca-central-1.files.com. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
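For clients that want to automate the regional workaround suggested in this update, a minimal sketch follows. It only checks which hostname accepts a TCP connection on the SFTP port and does not authenticate or transfer files. The function name and the primary hostname `primary.example.com` are placeholders invented for illustration; only `app-ca-central-1.files.com` comes from the update above. The `connect` parameter is injectable so the logic can be exercised without network access.

```python
import socket


def first_reachable(hosts, port=22, timeout=5.0, connect=socket.create_connection):
    """Return the first host that accepts a TCP connection on port, else None."""
    for host in hosts:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            continue  # unreachable, refused, or timed out; try the next region
        conn.close()
        return host
    return None


# Preferred region first, then the Canada region named in the status update.
# "primary.example.com" is a placeholder, not a real Files.com hostname.
candidates = ["primary.example.com", "app-ca-central-1.files.com"]
```

Calling `first_reachable(candidates)` returns the hostname to hand to an SFTP client, or `None` if no candidate answers. An open port does not prove that logins succeed, so a production client would still need to handle authentication errors after connecting.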

Is there a Files.com outage?

Files.com status: Systems Active

Files.com outages and incidents

There has been 1 outage or incident for Files.com in the last 30 days.
