Get notified about any outages, downtime or incidents for Files.com and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Files.com.
OutLogger tracks the status of these components for Files.com:
Component | Status |
---|---|
Australia Region | Active |
Background Jobs, including Sync and Webhooks | Active |
Canada Region | Active |
Core Services / API | Active |
EU (Germany) Region | Active |
Files Tools | Active |
FTP/FTPS | Active |
Japan Region | Active |
Remote Server Integrations (Sync and Mount) | Active |
SFTP | Active |
Singapore Region | Active |
UK Region | Active |
USA Region | Active |
WebDAV | Active |
Web Interface | Active |
View the latest incidents for Files.com and check for official updates:
Description: We have resolved elevated error rates on the FTP service on Files.com in our primary USA region. This incident did not impact other network services such as API, SFTP, WebDAV, AS2, and others. The incident occurred between 7:25am PST and 8:12am PST. If you moved any workloads to another region in response to this incident, you are cleared to move them back to the USA region. We are compiling a Root Cause Analysis for this incident, which we will post here.
Status: Resolved
Impact: Minor | Started At: Nov. 13, 2024, 3:54 p.m.
Description: Files.com has resolved an incident that caused elevated errors on the web interface. No other services were impacted by this incident. Certain pages in the web interface failed to load between 11:37am and 1:24pm Pacific time. We are still investigating this incident and compiling a Root Cause Analysis, which we will post here.
Status: Resolved
Impact: None | Started At: Nov. 6, 2024, 7:30 p.m.
Description: Files.com is aware of elevated error rates on the FTP, SFTP, and WebDAV services on Files.com in all regions from 10:29am to 10:36am Pacific Time. We believe this incident impacted less than half of total traffic across these services during this time period. This incident did not impact any other network services such as API, AS2, HTTP, or others. We are compiling a Root Cause Analysis for this incident, which we will post here as soon as it is ready.
Status: Resolved
Impact: None | Started At: Oct. 29, 2024, 5:30 p.m.
Description: We have resolved an incident related to download failures of more than one file at a time via the web interface. This incident occurred between 5:14 PM PT on October 10 and 6:42 AM PT on October 11. We are compiling a final Root Cause Analysis for this incident, which we will post here when it is complete.
Status: Resolved
Impact: None | Started At: Oct. 11, 2024, midnight
Description: From 8:55 AM PST through 9:08 AM PST, Files.com customers experienced much slower than normal response times to our core API, which affected other downstream services. Most API requests still completed successfully, albeit slower than normal. The root cause of the slower than normal response times was that one of our databases was running at a dramatically higher load than normal. Upon investigation, Files.com determined that database queries were performing many orders of magnitude slower than intended due to the misconfiguration of an index in the database. This misconfiguration, it turns out, has existed for over a decade, but the particular query pattern had never been seen before in production.

Files.com immediately reacted to this situation by first disabling the problematic jobs that were generating the unoptimized queries. This returned the system to normal performance. Files.com then fixed the database index configuration and re-enabled the problematic jobs, which then ran to completion quickly with no further impact on system performance.

As part of our incident post-mortem process, we discovered and remedied a few deficiencies that contributed to this incident taking 13 minutes to resolve. First, we discovered a 5 minute delay in importing the relevant time series data from one of our monitoring systems (Amazon CloudWatch) into another of our monitoring systems (InfluxDB), the latter of which is used to trigger our internal alerting. We have made configuration changes to remedy this delay. Second, in addition to the delay in importing the time series data, we also had a poorly configured alert threshold that introduced an additional 6 minutes of delay before an on-call engineer was paged. We have made configuration changes to remove this delay, ensuring that an on-call engineer will be paged immediately in the event of a similar situation in the future.
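
The delay figures above can be put side by side with a little arithmetic. The sketch below is illustrative only: it assumes the two delays ran back to back from the start of the incident, which the postmortem implies ("an additional 6 minutes") but does not state outright. The function and constant names are made up for this example.

```python
# Figures stated in the postmortem above, in minutes.
INCIDENT_MINUTES = 13        # 8:55 AM to 9:08 AM
IMPORT_DELAY_MINUTES = 5     # CloudWatch -> InfluxDB import lag
THRESHOLD_DELAY_MINUTES = 6  # extra delay from the alert threshold


def paging_delay(import_delay: int, threshold_delay: int) -> int:
    """Minutes before a page fires, assuming the delays run back to back."""
    return import_delay + threshold_delay


delay = paging_delay(IMPORT_DELAY_MINUTES, THRESHOLD_DELAY_MINUTES)
remaining = INCIDENT_MINUTES - delay
print(f"Delay before paging: {delay} min")
print(f"Left for diagnosis and mitigation: {remaining} min")
```

Under that assumption, most of the 13 minutes passed before anyone was paged, which is consistent with the postmortem's focus on the alerting path.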
Additionally, as part of the post-mortem process for this incident, we implemented much stricter controls to detect and reject slow queries at the database itself. We conducted a simulated recreation of this incident in our staging environment and determined that our new controls are sufficient to prevent a recurrence of this incident. Additionally, after reviewing this incident, we built a new tool for our on-call engineers that implements a much faster, one-click action to quarantine a problematic job type once it has been flagged as problematic. This will improve our ability to react quickly to newly discovered performance deficiencies in the future. We will begin incorporating training on this new tool into our training for on-call engineers in our next recurrent training cycle. We greatly appreciate your patience and understanding as we resolved this issue.
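
The postmortem does not describe how the quarantine tool works. As a rough illustration of the general idea only, the sketch below keeps a registry of blocked job types that a worker consults before running each job. `JobQuarantine`, `run_pending`, and the job type names are all hypothetical and are not Files.com's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Job = Tuple[str, Callable[[], None]]  # (job type, work to run)


@dataclass
class JobQuarantine:
    """Hypothetical registry of job types that workers must not run."""
    _blocked: Set[str] = field(default_factory=set)

    def quarantine(self, job_type: str) -> None:
        # The "one-click" action: block every job of this type.
        self._blocked.add(job_type)

    def release(self, job_type: str) -> None:
        # Re-enable the job type once the underlying problem is fixed.
        self._blocked.discard(job_type)

    def is_runnable(self, job_type: str) -> bool:
        return job_type not in self._blocked


def run_pending(jobs: List[Job], quarantine: JobQuarantine) -> List[Job]:
    """Run jobs whose type is not quarantined; return the deferred ones."""
    deferred: List[Job] = []
    for job_type, task in jobs:
        if quarantine.is_runnable(job_type):
            task()
        else:
            deferred.append((job_type, task))
    return deferred


# Usage with made-up job types: quarantine one, run the rest, then release.
completed: List[str] = []
pending: List[Job] = [
    ("thumbnail", lambda: completed.append("thumbnail")),
    ("usage_report", lambda: completed.append("usage_report")),
]
registry = JobQuarantine()
registry.quarantine("usage_report")
held_back = run_pending(pending, registry)   # only "thumbnail" runs
registry.release("usage_report")
run_pending(held_back, registry)             # "usage_report" runs now
```

This mirrors the sequence in the incident: the problematic jobs were disabled first, the index was fixed, and the jobs were then re-enabled and allowed to finish.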
Status: Postmortem
Impact: None | Started At: Sept. 19, 2024, 1 p.m.