Get notified about any outages, downtime or incidents for Files.com and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Files.com.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Files.com:
Component | Status |
---|---|
Australia Region | Active |
Background Jobs, including Sync and Webhooks | Active |
Canada Region | Active |
Core Services / API | Active |
EU (Germany) Region | Active |
Files Tools | Active |
FTP/FTPS | Active |
Japan Region | Active |
Remote Server Integrations (Sync and Mount) | Active |
SFTP | Active |
Singapore Region | Active |
UK Region | Active |
USA Region | Active |
WebDAV | Active |
Web Interface | Active |
View the latest incidents for Files.com and check for official updates:
Description: On May 2nd, 2023, at 12:40 PM PDT, [Files.com](http://Files.com) received automated alerts of elevated error rates on web services, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigating. [Files.com](http://Files.com) released an initial Status Page posting on May 2nd, 2023, at 1:11 PM PDT stating: “**US Region Only: Web Service Elevated Error Rates:** _US Web services only: We are investigating elevated error rates on the web service on_ [_Files.com_](http://Files.com) _in the US region. This is causing preview delays in the web interface. This incident does not impact other network services such as API, FTP, WebDAV, AS2, and others, nor does it impact regions other than US. At this time, we believe that all network services are currently up in our other regional locations.”_ The incident was resolved on May 2nd, 2023, at 1:04 PM PDT, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 2nd, 2023, at 1:18 PM PDT stating: _“All services have been restored and are operating normally. All web services should be operating as normal. The issue with preview processing began at 12:35 PDT and was resolved completely by 1:04 PDT.”_

This incident started when a deadlock occurred in one of [Files.com](http://Files.com)’s backend job processing systems, specifically the system that generates image and PDF previews of large images and documents for web viewing. A recent code change caused the system to enter a state in which it locked up and stopped processing preview generation on 1 out of 6 backend servers. As a result of “backflow” caused by very high error rates, other jobs such as syncs were delayed by 5 minutes on two separate occasions.

The root cause of this incident was a failure of [Files.com](http://Files.com)’s internal job scheduling system to properly route around the failed preview worker and prevent its failure from causing broader impact. Ultimately, this was caused by a design failure in the internal job scheduling system, which we have now redesigned to avoid this type of issue (see below). A contributing cause was the failure of the preview worker itself, which resulted from [Files.com](http://Files.com)’s failure to properly test the recent code change under high load.

As a result of this incident and several other recent incidents, [Files.com](http://Files.com) worked on dramatic improvements to its internal job scheduling code during the last week of April and first week of May, and those improvements have been tested in staging and are now in production. These improvements provide multiple new protection mechanisms to prevent issues with specific customers, job types, or regions from “backflowing” and impacting other customers, job types, or regions. Extensive review and testing were conducted by [Files.com](http://Files.com) staff to verify this resolution, and we have already taken steps internally to prevent this issue from recurring in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
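The postmortem does not say how the redesigned scheduler detects and routes around a locked-up worker. One common approach is a heartbeat check: each worker periodically reports that it is alive, and the dispatcher stops sending jobs to any worker whose last heartbeat is too old. The sketch below illustrates that idea only; `Worker`, `Scheduler`, and `HEARTBEAT_TIMEOUT` are hypothetical names, and the 30-second timeout is an assumed value, not anything Files.com has published.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical backend worker that reports liveness via heartbeats."""
    name: str
    last_heartbeat: float = field(default_factory=time.monotonic)

    def heartbeat(self, now: float) -> None:
        # Should be called from the job-processing loop itself, so a
        # deadlocked loop stops heartbeating.
        self.last_heartbeat = now


class Scheduler:
    """Round-robin dispatcher that skips workers with stale heartbeats."""

    HEARTBEAT_TIMEOUT = 30.0  # assumed: seconds of silence before a worker is treated as failed

    def __init__(self, workers):
        self.workers = list(workers)
        self._next = 0

    def healthy(self, now: float):
        """Return only workers that have heartbeated within the timeout."""
        return [w for w in self.workers
                if now - w.last_heartbeat <= self.HEARTBEAT_TIMEOUT]

    def pick(self, now: float) -> Worker:
        """Choose the next healthy worker, cycling through them in order."""
        candidates = self.healthy(now)
        if not candidates:
            raise RuntimeError("no healthy workers available")
        worker = candidates[self._next % len(candidates)]
        self._next += 1
        return worker


# Six preview workers; preview-0 stops heartbeating (e.g., it deadlocked).
workers = [Worker(f"preview-{i}", last_heartbeat=0.0) for i in range(6)]
sched = Scheduler(workers)
now = 100.0
for w in workers[1:]:
    w.heartbeat(now)
picked = {sched.pick(now).name for _ in range(12)}
print(sorted(picked))  # ['preview-1', 'preview-2', 'preview-3', 'preview-4', 'preview-5']
```

The key design choice is where the heartbeat comes from: if a separate thread emitted it, a worker whose job loop had deadlocked would keep reporting itself as healthy and continue to receive jobs.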
Status: Postmortem
Impact: Major | Started At: May 2, 2023, 8:11 p.m. UTC
Description: On May 2nd, 2023, at 12:40 PM PDT, [Files.com](http://Files.com) received automated alerts of delays in batch and scheduled operations, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigating. [Files.com](http://Files.com) released an initial Status Page posting on May 2nd, 2023, at 1:07 PM PDT stating: “**Delays To Batch and Scheduled Operations:** _We are investigating reports of delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. This situation should not affect real-time operations such as FTP, SFTP, AS2, and other operations where_ [_Files.com_](http://Files.com) _acts as a server.”_ The delays in batch and scheduled operations were resolved on May 2nd, 2023, at 12:55 PM PDT, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 2nd, 2023, at 1:15 PM PDT stating: _“We have resolved a situation causing delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. Operations were delayed beginning at 12:35 PDT and ending by 12:55 PDT. All operations did successfully complete despite delays. This situation was only a delay of scheduled and batch processing and did not affect real-time operations such as FTP, SFTP, AS2, and other operations where_ [_Files.com_](http://Files.com) _acts as a server.”_

This incident started when a deadlock occurred in one of [Files.com](http://Files.com)’s backend job processing systems, specifically the system that generates image and PDF previews of large images and documents for web viewing. A recent code change caused the system to enter a state in which it locked up and stopped processing preview generation on 1 out of 6 backend servers. As a result of “backflow” caused by very high error rates, other jobs such as syncs were delayed by 5 minutes on two separate occasions.

The root cause of this incident was a failure of [Files.com](http://Files.com)’s internal job scheduling system to properly route around the failed preview worker and prevent its failure from causing broader impact. Ultimately, this was caused by a design failure in the internal job scheduling system, which we have now redesigned to avoid this type of issue (see below). A contributing cause was the failure of the preview worker itself, which resulted from [Files.com](http://Files.com)’s failure to properly test the recent code change under high load.

As a result of this incident and several other recent incidents, [Files.com](http://Files.com) worked on dramatic improvements to its internal job scheduling code during the last week of April and first week of May, and those improvements have been tested in staging and are now in production. These improvements provide multiple new protection mechanisms to prevent issues with specific customers, job types, or regions from “backflowing” and impacting other customers, job types, or regions. Extensive review and testing were conducted by [Files.com](http://Files.com) staff to verify this resolution, and we have already taken steps internally to prevent this issue from recurring in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
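The postmortem mentions new protections that stop one job type from “backflowing” into others but does not describe them. A widely used pattern for this is a bulkhead: each job type gets its own queue and a cap on how many of the shared worker slots it may occupy, so a stuck job type cannot consume all capacity. The following is a minimal sketch under that assumption; `IsolatedDispatcher`, `total_slots`, and `per_type_cap` are hypothetical names, and the numbers in the usage are illustrative.

```python
from __future__ import annotations

from collections import deque


class IsolatedDispatcher:
    """Per-job-type queues with a per-type concurrency cap (a "bulkhead").

    A failing or hung job type can fill only its own slots, so it cannot
    starve other job types of the shared capacity.
    """

    def __init__(self, total_slots: int, per_type_cap: int):
        self.total_slots = total_slots
        self.per_type_cap = per_type_cap
        self.queues: dict[str, deque] = {}
        self.running: dict[str, int] = {}
        self._order: list[str] = []  # job types in round-robin order

    def submit(self, job_type: str, job_id: str) -> None:
        if job_type not in self.queues:
            self.queues[job_type] = deque()
            self.running[job_type] = 0
            self._order.append(job_type)
        self.queues[job_type].append(job_id)

    def dispatch(self) -> list[tuple[str, str]]:
        """Start queued jobs while slots remain, cycling fairly across job types."""
        started = []
        progress = True
        while progress and sum(self.running.values()) < self.total_slots:
            progress = False
            for job_type in self._order:
                if sum(self.running.values()) >= self.total_slots:
                    break
                queue = self.queues[job_type]
                if queue and self.running[job_type] < self.per_type_cap:
                    started.append((job_type, queue.popleft()))
                    self.running[job_type] += 1
                    progress = True
        return started

    def finish(self, job_type: str) -> None:
        """Release a slot when a job of this type completes."""
        self.running[job_type] -= 1


d = IsolatedDispatcher(total_slots=6, per_type_cap=4)
for i in range(100):
    d.submit("preview", f"preview-{i}")
first = d.dispatch()   # previews take at most 4 slots, then hang and never finish
for i in range(3):
    d.submit("sync", f"sync-{i}")
second = d.dispatch()  # syncs still get the 2 slots previews could not take
print(len(first), second)  # 4 [('sync', 'sync-0'), ('sync', 'sync-1')]
```

Without the per-type cap, the hung preview jobs would have occupied all six slots and the syncs would have waited behind them, which is the kind of cross-job-type delay the incident describes.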
Status: Postmortem
Impact: Major | Started At: May 2, 2023, 8:07 p.m. UTC
Description: On May 1st, 2023, at 7:39 AM PDT, [Files.com](http://Files.com) received automated alerts of delays and errors related to certain background processing in the Canada region, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigating. [Files.com](http://Files.com) released an initial Status Page posting on May 1st, 2023, at 8:10 AM PDT, stating: “**Canada Region Only: Delays and Errors Related to Certain Background Processing:** _Canada only: We are investigating elevated error rates related to certain background processing performed as part of the core_ [_Files.com_](http://Files.com) _file transfer pipeline in the \[REGION\] region. Impacted functions of_ [_Files.com_](http://Files.com) _include file transformations such as zip/unzip, GPG, preview generation, and file statistics calculation such as MD5. This situation should not impact customers at all unless they have files stored in the Canada region. This situation should not affect real-time operations such as the_ [_Files.com_](http://Files.com) _API, FTP, SFTP, AS2, and other operations where_ [_Files.com_](http://Files.com) _acts as a server.”_ The delays and errors related to certain background processing in the Canada region were resolved on May 1st, 2023, at 11:46 AM PDT, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 1st, 2023, at 12:18 PM PDT stating: _“All services have been restored and are operating normally. Canada only: We have resolved an issue with certain background processing performed as part of the core_ [_Files.com_](http://Files.com) _file transfer pipeline in all regions. Impacted functions of_ [_Files.com_](http://Files.com) _included file transformations such as zip/unzip, GPG, preview generation, and file statistics calculation such as MD5. The issue with background processing began at 6:46 AM PST and was resolved completely by 11:46 AM PST. Resolution means that any background jobs that were previously delayed were now been processed successfully.”_

This incident started when a customer uploaded a large amount of data via our web interface to our Canada region. Due to the exact nature of the files uploaded, our Canada region’s worker servers became overloaded and unresponsive to any type of communication. As a result, all regional background jobs in Canada began failing for customers using our Canada region. Upon investigation, we determined that the overload was caused by a design flaw in our checksum calculation code, which failed to use all available CPU cores on the machine and instead attempted to use only a single core. In short, the machine locked up because dozens of jobs were competing for the same core rather than spreading out across all available cores.

As part of the incident resolution, [Files.com](http://Files.com) pushed an update that introduced more parallelism to this calculation and allowed all available CPU cores to be used. Additionally, one CPU core is now reserved for communication with our job scheduling system, which will prevent such communication problems in high load situations in the future.

As a result of “back pressure” caused by very high error rates, other jobs on [Files.com](http://Files.com)’s background job scheduling system outside the Canada region were also delayed. The root cause of the broader \(non-Canada\) delays was a failure of [Files.com](http://Files.com)’s internal job scheduling system to properly route around the failed Canada workers and prevent their failure from causing broader impact. Ultimately, this was caused by a design failure in the internal job scheduling system, which we have now redesigned to avoid this type of issue. \(See next paragraph.\)

As a result of this incident and several other recent incidents, [Files.com](http://Files.com) worked on dramatic improvements to its internal job scheduling code during the last week of April and first week of May, and those improvements have been tested in staging and are now in production. These improvements provide multiple new protection mechanisms to prevent issues with specific customers, job types, or regions from “backflowing” and impacting other customers, job types, or regions. Extensive review and testing were conducted by [Files.com](http://Files.com) staff to verify this resolution, and we have already taken steps internally to prevent this issue from recurring in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
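The fix described above has two parts: spread checksum work across all CPU cores, and keep one core free for talking to the job scheduler. The postmortem does not say what language or mechanism Files.com uses, so the following is only a minimal Python sketch of the general technique, assuming a process pool sized to the core count minus one reserved core. `md5_of_file`, `worker_count`, and `checksum_all` are hypothetical helpers; processes are used instead of threads so each checksum runs in its own OS process and can land on its own core.

```python
from __future__ import annotations

import hashlib
import os
from concurrent.futures import ProcessPoolExecutor


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> tuple[str, str]:
    """Stream a file in 1 MiB chunks so large uploads never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return path, digest.hexdigest()


def worker_count(reserved: int = 1) -> int:
    """Use every core except `reserved`, which stays free for scheduler communication."""
    cores = os.cpu_count() or 1
    return max(1, cores - reserved)


def checksum_all(paths: list[str]) -> dict[str, str]:
    """Compute MD5 checksums in parallel, one worker process per usable core."""
    with ProcessPoolExecutor(max_workers=worker_count()) as pool:
        return dict(pool.map(md5_of_file, paths))
```

Call `checksum_all([...])` from under an `if __name__ == "__main__":` guard, because `ProcessPoolExecutor` may re-import the main module in its child processes. Note that reserving a core this way only limits how many checksum processes run; pinning the scheduler-communication process to the spare core would be a separate, OS-level step.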
Status: Postmortem
Impact: Major | Started At: May 1, 2023, 3:10 p.m. UTC
Description: We have resolved a performance degradation affecting Files.com services in our primary USA and Australia regions. This incident occurred between 11:51 AM and 3:04 PM PDT, and only in the primary USA and Australia regions. If you previously moved any workloads to another region in response to this incident, you are cleared to move those regional workloads back to the USA or Australia regions.
Status: Resolved
Impact: Major | Started At: April 27, 2023, 6:51 p.m. UTC