Get notified about any outages, downtime or incidents for Files.com and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Files.com.
OutLogger tracks the status of these components for Files.com:
Component | Status |
---|---|
Australia Region | Active |
Background Jobs, including Sync and Webhooks | Active |
Canada Region | Active |
Core Services / API | Active |
EU (Germany) Region | Active |
Files Tools | Active |
FTP/FTPS | Active |
Japan Region | Active |
Remote Server Integrations (Sync and Mount) | Active |
SFTP | Active |
Singapore Region | Active |
UK Region | Active |
USA Region | Active |
WebDAV | Active |
Web Interface | Active |
View the latest incidents for Files.com and check for official updates:
Description: We have resolved a major partial outage of the Files.com service in all regions. This outage affected all services except for the Files.com API. This incident occurred between 3:08 AM and 4:09 AM Pacific Time. We are compiling a Root Cause Analysis that we will post here.
Status: Resolved
Impact: Major | Started At: Sept. 10, 2024, 11 a.m.
Description: We have resolved an incident causing elevated error rates on the FTP, SFTP, and WebDAV services on Files.com in all regions. This incident did not impact other network services such as our API or AS2. This incident occurred between 8:31 AM and 8:40 AM Pacific Time. We are compiling a final Root Cause Analysis for this incident, which we will post here when it is complete.
Status: Resolved
Impact: None | Started At: Sept. 5, 2024, 3:30 p.m.
Description: On August 6th, 2024, at 3:05 PM PST, [Files.com](http://Files.com) received multiple monitoring alerts indicating _‘SFTP Service Only: Elevated Error Rates’_, which resulted in an incident being declared. The Incident Management Team (IMT) convened and immediately began investigation. The _‘SFTP Service Only: Elevated Error Rates’_ issue was resolved on August 6th, 2024, at 4:06 PM PST, returning the platform to full functionality.
From 3:01 PM PST through 4:06 PM PST, [Files.com](http://Files.com) customers experienced elevated error rates when connecting via the SFTP protocol. Although this incident seems similar to the incident which occurred on August 2, it was a completely distinct situation. The elevated error rates during this period were actually caused by a denial-of-service (“DoS”) attack against [Files.com](http://Files.com)’s SFTP service.
Like all large providers of services on the Internet, we are under constant attack from a variety of threat actors. [Files.com](http://Files.com) uses a variety of sophisticated tools to defend against attacks against its infrastructure. There are no commercial providers (that we know of) who produce DoS mitigation tools which work specifically for SFTP, and so we’ve had to invest heavily in developing our own protection and mitigation tools specifically for SFTP.
One of our mitigation strategies is to completely block connections from SFTP counterparties who appear to be abusive. A very hard challenge associated with this is correctly determining whether a counterparty is being intentionally abusive as opposed to being a misconfigured script or automation from an otherwise legitimate customer. Accidentally blocking a legitimate customer can take down a major workflow for a customer, and we try very hard to never have that happen. It’s a delicate balance and we spend a lot of time and engineering resources trying to get this right.
About 4 weeks ago, [Files.com](http://Files.com) released an update to our internal security tools to add more logic to the part of our code where we try to determine abusive connections via SFTP. This was done with the hope of making it even less likely for a legitimate customer to ever be blocked inadvertently. While this improvement was good overall, it turns out that this update introduced a regression that allowed a particular type of malicious counterparty to open up SFTP connections and leave them hanging in an idle state.
That’s what happened on August 6. A malicious counterparty “used up” a number of our connection pool slots by opening them and letting them hang idle, leaving them unavailable for legitimate use. After fixing the logic error in our security software, the malicious counterparty was automatically blocked and full SFTP functionality was restored.
We want to be very clear about two things:
1. This was not a full outage of SFTP, rather it was a degradation due to partial inability to connect. If you operated SFTP software which used retries, it is likely that your connections worked on retry.
2. The *only* thing that this malicious actor was able to do was hold open connection pool slots so that legitimate customers weren’t able to connect to them. That’s what a denial-of-service attack is: they denied service to you, the legitimate customer. There was absolutely no access to our systems at all beyond the denial-of-service. Even denial-of-service attacks cause real economic impact, and we work hard to defend against them.
The root cause of this issue was [Files.com](http://Files.com)'s incomplete testing of the security software change from 4 weeks ago. It is hard to produce synthetic testing that simulates anything that a malicious actor might do, but we learned from this incident and have updated our testing accordingly.
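The postmortem notes that clients which retried were likely to connect during the degradation. As an illustration only, a client-side retry with exponential backoff and jitter might look like the sketch below. The function name `connect_with_retry`, its parameters, and the default delays are hypothetical and are not part of any Files.com SDK; `connect` stands in for whatever call your SFTP client uses to open a session.

```python
import random
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    `connect` is any zero-argument callable that opens a session and raises
    ConnectionError or TimeoutError on failure (an assumption; adapt the
    exception types to your SFTP library).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            if attempt == attempts - 1:
                break  # no more attempts left, give up
            # Exponential backoff, capped at max_delay, plus random jitter so
            # that many clients do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise last_error
```

The `sleep` parameter exists only so the waiting can be stubbed out in tests; in normal use the default `time.sleep` applies.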
We promise a system that works perfectly, all of the time, and we are disappointed that you may have experienced issues today that were caused by a malicious actor. Defense against the ever-present threat environment is one of the main reasons you chose to use [Files.com](http://Files.com) as opposed to operating your own on-premise server, and it is absolutely our job to prevent these sorts of things from ever affecting your workflows. We take that mission seriously. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
Status: Postmortem
Impact: Major | Started At: Aug. 6, 2024, 10:20 p.m.
Description: On August 2nd, 2024, at 7:35 AM PST, [Files.com](http://Files.com) correlated multiple customer tickets indicating _‘Connection failures over SFTP, FTP, and WebDAV for recently logged in users attempting new connections’_, which resulted in an incident being declared. The Incident Management Team (IMT) convened and immediately began investigation. The _‘Connection failures over SFTP, FTP, and WebDAV for recently logged in users attempting new connections’_ issue was resolved on August 2nd, 2024, at 7:41 AM PST, returning the platform to full functionality.
At 6:52 AM PST on August 2, [Files.com](http://Files.com) made a routine code deployment which introduced a bug that prevented more than one session from being opened via the SFTP, FTP, or WebDAV protocols. [Files.com](http://Files.com) reverted this deployment at 7:41 AM PST, restoring proper functionality. This resulted in 49 minutes of degraded performance for many customer use cases.
It is common for many automated and ad-hoc processes to use several connections at once when communicating via SFTP or FTP. During the degraded period, only one of those connections was likely to work. Depending on the exact software in use, this might have resulted in failures of your process to run, or it might have worked with only a single connection. In either case, the situation was clearly unacceptable because it likely broke a number of critical customer workflows.
The fix to the bug was simple and involved a one-line change. The bug was not caught originally because we did not consider testing multiple simultaneous connections in our testing environment. While we are disappointed by the original bug making it past our testing pipeline, our true disappointment relates to our systems that monitor and alert on the status of our production environment. If our monitoring had operated perfectly, we would have solved the original bug in 2 minutes, not 49 minutes.
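The missing test described above, holding more than one session open at the same time, can be sketched roughly as follows. This is a hypothetical illustration, not Files.com's test suite: `count_simultaneous_sessions`, `FakeServer`, and `FakeSession` are invented names, and the fake server simply models a per-user session limit. A real check would pass in a callable that opens an actual SFTP, FTP, or WebDAV session.

```python
from contextlib import ExitStack


def count_simultaneous_sessions(open_session, wanted=4):
    """Open up to `wanted` sessions and hold them all open at once.

    Returns how many sessions were open at the same time. Every session
    that was opened is closed again before returning.
    """
    opened = 0
    with ExitStack() as stack:
        for _ in range(wanted):
            try:
                session = open_session()
            except OSError:
                break  # the server refused another concurrent session
            stack.callback(session.close)
            opened += 1
    return opened


class FakeSession:
    """Hypothetical stand-in for a protocol session."""

    def __init__(self, server):
        self.server = server

    def close(self):
        self.server.active -= 1


class FakeServer:
    """Hypothetical server that allows at most `limit` open sessions."""

    def __init__(self, limit):
        self.limit = limit
        self.active = 0

    def open(self):
        if self.active >= self.limit:
            raise ConnectionError("session limit reached")
        self.active += 1
        return FakeSession(self)


healthy = FakeServer(limit=10)
buggy = FakeServer(limit=1)  # models the single-session regression
print(count_simultaneous_sessions(healthy.open))  # prints 4
print(count_simultaneous_sessions(buggy.open))    # prints 1
```

A check that asserts the returned count equals `wanted` would fail against the `buggy` server, which is the signal a single-connection test cannot give.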
This incident revealed an interesting set of weaknesses in our monitoring systems. First, our automated testing platform which tests our production environment did not attempt multiple simultaneous connections when testing SFTP. In this incident, the issue/downtime only occurred when attempting multiple simultaneous connections. We will update our automated testing platform to attempt multiple simultaneous connections in the future.
Additionally, when responding to this incident we discovered that the original bug occurred in a section of server-side code which was excluded from reporting to Sentry, a platform we use for exception tracking and real time alerting. This exclusion was in error. As a result, our on-call team was not immediately paged as it should have been. We have updated our code to ensure that future bugs in this part of the code result in immediate reporting to Sentry, which would result in immediate notification to our on-call team in a future similar incident.
To cover the possibility of Sentry alerts failing to fire in the future, we have added additional belt-and-suspenders alerting to look for spikes in 5xx HTTP error codes from our web proxy layer which don’t have a corresponding alert in Sentry. This provides a backup mechanism to ensure that our on-call team will be paged in the future in a situation like this one.
The root cause of this issue was [Files.com](http://Files.com)’s failure to have a robust, multi-layered monitoring system to detect production failures and alert our on-call team. We have already implemented multiple mitigations at different layers to reduce the odds of a similar issue occurring in the future.
We promise a system that works perfectly, all of the time, and today we failed to deliver that to you. Our entire engineering team is working hard to prevent issues like this one from occurring in the future. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
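The backup alerting idea described above (page when 5xx errors spike at the proxy layer but the exception tracker raised nothing) can be illustrated with a small sketch. Everything here is an assumption for illustration: the `Window` structure, the `should_page` function, and the threshold values are hypothetical, and the postmortem does not describe how Files.com actually implements this rule.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Counts observed during one monitoring window (hypothetical shape)."""
    total_requests: int
    errors_5xx: int
    sentry_alerts: int  # alerts the exception tracker raised in this window


def should_page(window, max_error_rate=0.05, min_requests=100):
    """Return True when 5xx errors spiked and no tracker alert covers them.

    The thresholds are illustrative defaults, not values from the postmortem.
    """
    if window.total_requests < min_requests:
        return False  # too little traffic to judge a rate reliably
    error_rate = window.errors_5xx / window.total_requests
    spiking = error_rate > max_error_rate
    # Only page from this backup rule when the primary alert stayed silent.
    return spiking and window.sentry_alerts == 0
```

The design point is that the backup rule keys off a signal from a different layer (proxy response codes) than the primary alert (application exceptions), so a reporting gap in one layer does not silence both.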
Status: Postmortem
Impact: None | Started At: Aug. 2, 2024, noon
Description: We have resolved an issue that was preventing certain logins on Files.com via Single Sign On (SSO). This incident occurred between 10:57 AM and 11:12 AM PST. Logins that do not use SSO were not affected.
Status: Resolved
Impact: None | Started At: July 8, 2024, 6 p.m.