
Is there a Fasterize outage?

Fasterize status: Systems Active

Last checked: 8 minutes ago

Get notified about any outages, downtime, or incidents for Fasterize and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Fasterize outages and incidents

Outage and incident data over the last 30 days for Fasterize.

There have been 0 outages or incidents for Fasterize in the last 30 days.

Severity Breakdown:

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Fasterize

OutLogger tracks the status of these components for Fasterize:

Component Status
Acceleration: Active
API: Active
CDN: Active
Collect: Active
Dashboard: Active
Logs delivery: Active
Website: Active

Latest Fasterize outages and incidents.

View the latest incidents for Fasterize and check for official updates:

Updates:

  • Time: June 18, 2020, 2:03 p.m.
    Status: Postmortem
    Update:
    # Description
    Between 8:53 and 9:52 a.m. (UTC+2), the front layer was overloaded after some machines failed to restart automatically. During this period, only a limited number of machines handled the traffic. Traffic was rerouted for only a few minutes, even though availability probes reported the platform as unavailable.
    # Facts and timeline
    * 6:30 am: automatic renewal of the Let's Encrypt certificates
    * 6:30 am: start of the machines for the day
    * 6:35 am: the fronts fail to start and remain unavailable on the load balancer
    * 8:27 am: first bandwidth alert, automatically resolved at 8:32 am
    * 8:45 am: second bandwidth alert, automatically resolved at 8:51 am; the team tries to start new machines
    * 8:51 am: third bandwidth alert, automatically resolved at 9:00 am
    * 8:53 am: the front layer starts to be overloaded at the network level
    * 9:01 am: first ticket to the support desk
    * 9:19 am: global availability alert => global disabling of Fasterize
    * 9:20 am: identification of the issue on the defective machines
    * 9:25 am: first communication on [status.fasterize.com](http://status.fasterize.com)
    * 9:35 am: first attempt to deploy the fix fails
    * 9:45 am: deployment of the fix is relaunched
    * 9:52 am: end of the deployment and restart of the machines
    # Analysis
    Every day, the Fasterize infrastructure is adjusted to the expected traffic, and machines are switched off or started up accordingly. At 6:30 am, the front machines were started normally, but the HTTP service did not start correctly. The HTTP service could not start because of a configuration problem related to the automatic renewal of the Let's Encrypt certificates: it no longer had access to the private keys of the certificates renewed during the night and refused to start. This access was broken by the new certificate renewal mechanism, which sets different permissions on certificates and private keys, combined with the automatic renewal performed at 6:30 am.
    The load balancer saw the machines as started but marked them unhealthy. The remaining machines therefore took all the traffic and became overloaded once they reached their maximum capacity. The CDN layer then had trouble reaching the origin and returned 50x errors.
    The availability of the optimization layer, as measured by the external probes, shows the unavailability from 8:52 am, whereas the global availability probes show an unavailability of only 3 minutes starting at 9:18 am. Customer origins are rerouted based on the global probes (which cover both the CDN and the optimization layer) and therefore were not rerouted from the beginning of the incident. The global availability probes were configured with a lower sensitivity, and since part of the traffic continued to pass, they did not detect the same unavailability.
    The alerts raised to the on-call engineer concerned the excessive network traffic but not the 504 errors, because the average 504 error rate did not exceed our usual thresholds, even at the peak of the incident around 9:17 am. No alert was raised on the unavailability of the HTTP service on the fronts.
    # Metrics
    * Incident severity level: Severity 2 (site degradation, performance problem and/or broken feature that is difficult to work around and impacts a significant number of users)
    * Time to detect: 2h (from the start of the fronts)
    * Time to resolve: 3h
    * Duration of the incident: 60 minutes
    # Impacts
    Over the duration of the incident, 50x errors accounted for 10.77% of HTML page traffic, 3.52% of non-cached traffic and 1.15% of total traffic. At the peak of the incident (9:17 am), these rates rose to 38.7%, 16.3% and 5.5% respectively. Eleven customers reported errors via support.
    # Action plan
    [ ] planned, [-] in progress, [x] done
    **Short term:**
    * [-] Change the Let's Encrypt certificate synchronization mechanism
    * [ ] Improve the feedback (logs and alerts) in case of problems during renewal and/or synchronization
    * [x] Correct the sensitivity of the global availability probe
    * [x] Organization: systematically disconnect the platform in the event of an incident impacting all customers
    * [-] Test manual disconnection in a staging environment
    * [x] Review the alert thresholds on the 504 errors seen by CloudFront
    * [-] Add an alert on the availability of the HTTP service on the fronts
    * [ ] Organization: improve reaction time before publishing an incident
    **Medium term:**
    * Improve the resilience of the fronts against an invalid or missing SSL certificate
    **Long term:**
    * Review the SSL certificate management system
  • Time: June 18, 2020, 8:04 a.m.
    Status: Resolved
    Update: This incident is now resolved. A postmortem will follow. We are sorry for this incident and for the impact on your customers.
  • Time: June 18, 2020, 7:54 a.m.
    Status: Monitoring
    Update: The fix is now deployed. Traffic is now accelerated.
  • Time: June 18, 2020, 7:41 a.m.
    Status: Identified
    Update: The issue has been identified and the fix is being applied
  • Time: June 18, 2020, 7:25 a.m.
    Status: Investigating
    Update: We're currently investigating an issue causing intermittent 50x errors.
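
The postmortem above traces the June 18 outage to the front HTTP service being unable to read the private keys of the certificates renewed overnight, and the action plan calls for better feedback when renewal or synchronization goes wrong. As a rough illustration only, here is a minimal Python sketch of a pre-start check that verifies a service account can read a key file; the `/etc/letsencrypt/live/example.com/privkey.pem` path and the `www-data` user are assumptions for the example, not details of Fasterize's stack.

```python
# Minimal sketch (Unix-only): verify that the account running the HTTP
# service can read a renewed private key *before* the service is restarted.
# The path and user name below are illustrative assumptions.
import grp
import os
import pwd
import stat
import sys

PRIVKEY = "/etc/letsencrypt/live/example.com/privkey.pem"  # hypothetical path
SERVICE_USER = "www-data"                                   # hypothetical user


def can_read(path: str, username: str) -> bool:
    """Return True if `username` could open `path` for reading."""
    st = os.stat(path)
    user = pwd.getpwnam(username)
    if st.st_uid == user.pw_uid:                      # owner permission bit
        return bool(st.st_mode & stat.S_IRUSR)
    groups = {g.gr_gid for g in grp.getgrall() if username in g.gr_mem}
    groups.add(user.pw_gid)
    if st.st_gid in groups:                           # group permission bit
        return bool(st.st_mode & stat.S_IRGRP)
    return bool(st.st_mode & stat.S_IROTH)            # world permission bit


if __name__ == "__main__":
    if not can_read(PRIVKEY, SERVICE_USER):
        # Fail loudly at renewal time, not silently at the next scale-up.
        sys.exit(f"{SERVICE_USER} cannot read {PRIVKEY}: fix ownership/permissions")
    print("private key readable; safe to (re)start the HTTP service")
```

Run from a renewal hook, a check like this would surface the permission problem at 6:30 am, when the certificates are renewed, rather than when the morning traffic ramp-up exposes the unhealthy fronts.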

Updates:

  • Time: June 4, 2020, 2:41 p.m.
    Status: Resolved
    Update: Optimizations are now enabled.
  • Time: June 4, 2020, 1:42 p.m.
    Status: Identified
    Update: Following the previously scheduled maintenance that started at 3 pm, optimizations have been disabled and will be restored within the next hour.

Updates:

  • Time: May 22, 2020, 1:47 p.m.
    Status: Postmortem
    Update:
    # Description
    Between 10:18 and 11:20 a.m. (UTC+2), the static resources of some clients responded with 503 errors. Internet users did not necessarily see these errors, but some sites may have displayed broken pages because of these missing objects, especially for users who did not already have them in their browser cache.
    # Facts and timeline
    * 10:18: manual update of one of our components
    * 10:28: first alert
    * 10:36: start of the CDN-layer bypass for the impacted domains
    * 10:52: all impacted domains bypass the CDN layer; due to DNS propagation delays, errors occur until 11:20
    * 13:42: start of reconnecting the impacted domains to the CDN
    * 14:04: impacted domains are reconnected to the CDN
    # Analysis
    The incident was caused by an update to one of our components that was not supposed to affect the production stack. As a side effect of this update, an execution role needed by the edge processes on the CDN layer was removed.
    # Metrics
    * Severity: level 2 (site degradation, performance problem and/or broken feature that is difficult to work around and impacts a significant number of users)
    * Time to detect: 10 min
    * Time to resolve: 60 min
    # Impacts
    Only a few sites were impacted (<10).
    # Countermeasures
    * Short term:
      * adjust alerting on the edge processes to improve diagnosis
      * adjust the alert level on 5xx errors seen from the CDN layer
    * Mid term:
      * secure the execution role of the edge processes
      * make it easier to unplug the CDN layer for a specific customer
  • Time: May 22, 2020, 1:47 p.m.
    Status: Resolved
    Update: Everything is now back to normal. A post-mortem will follow in the next few hours. Sorry for the inconvenience :-(
  • Time: May 22, 2020, 12:08 p.m.
    Status: Monitoring
    Update: The problem has been fixed for all impacted customers. We are monitoring errors to confirm everything is back to normal.
  • Time: May 22, 2020, 9:05 a.m.
    Status: Identified
    Update: For some customers, static assets are no longer served by the CDN layer; we're actively working to fix this. In the meantime, websites are served normally.
  • Time: May 22, 2020, 9:01 a.m.
    Status: Identified
    Update: Some 503 errors have occurred for static assets on the CDN layer. This was limited to some customers.
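
Both postmortems touch on alerting thresholds: the June 18 incident notes that the fleet-wide average 504 rate never crossed the usual thresholds even at the peak, and the May 22 countermeasures include adjusting the alert level on 5xx errors seen from the CDN layer. The sketch below shows a generic per-layer sliding-window error-rate alert; the 5-minute window, 5% threshold, and minimum sample size are illustrative assumptions, not Fasterize's actual alerting configuration.

```python
# Generic sliding-window 5xx-rate alert for a single layer (e.g. the CDN).
# The window, threshold, and minimum sample size are illustrative assumptions.
from collections import deque
from time import time


class ErrorRateAlert:
    def __init__(self, window_seconds=300, threshold=0.05, min_requests=100):
        self.window_seconds = window_seconds
        self.threshold = threshold        # fire when >5% of responses are 5xx
        self.min_requests = min_requests  # avoid firing on tiny samples
        self.samples = deque()            # (timestamp, is_5xx)

    def record(self, status_code, now=None):
        """Record one response; return True if the alert should fire."""
        now = time() if now is None else now
        self.samples.append((now, status_code >= 500))
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] < now - self.window_seconds:
            self.samples.popleft()
        if len(self.samples) < self.min_requests:
            return False
        errors = sum(1 for _, is_err in self.samples if is_err)
        return errors / len(self.samples) > self.threshold


# A burst of 503s on one layer trips that layer's alert even if the
# site-wide average error rate stays below a global threshold.
cdn = ErrorRateAlert()
firing = False
for i in range(200):
    firing = cdn.record(503 if i % 10 == 0 else 200)
print("CDN-layer alert firing:", firing)
```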

Updates:

  • Time: May 7, 2020, 10:18 a.m.
    Status: Resolved
    Update: The fix has been deployed. The image inlining is enabled again. We are sorry for the inconvenience.
  • Time: May 7, 2020, 9:31 a.m.
    Status: Identified
    Update: The issue has been identified in the engine code that could cause a conflict. We are deploying a hotfix.
  • Time: May 7, 2020, 8:16 a.m.
    Status: Investigating
    Update: We are currently investigating an issue with the image inlining feature; the feature has been deactivated. We observed inlined images that were not the right ones.
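
For context on the feature named above: image inlining generally means embedding image bytes directly in the page as base64 `data:` URIs so they load without extra requests. The sketch below is a generic illustration of that technique, not Fasterize's engine code; the `resolve` callback and the 4 KB size cutoff are assumptions, and the inline comment only hints at one way a "wrong image inlined" mix-up could happen (for example, a miskeyed cache of image payloads).

```python
# Generic illustration of image inlining: replace small <img src="..."> URLs
# with base64 data: URIs. This is not Fasterize's engine code.
import base64
import mimetypes
import re
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data: URI."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def inline_small_images(html: str, resolve, max_bytes: int = 4096) -> str:
    """Inline images whose files are at most `max_bytes` bytes.

    `resolve` maps an image URL to a local file path (hypothetical helper).
    A real optimizer would fetch and cache the bytes instead; if that cache
    were keyed incorrectly, two different URLs could end up sharing one
    payload, i.e. the wrong image would be inlined.
    """
    def repl(match):
        url = match.group(1)
        path = resolve(url)
        if path and Path(path).stat().st_size <= max_bytes:
            return match.group(0).replace(url, to_data_uri(path))
        return match.group(0)

    return re.sub(r'<img[^>]+src="([^"]+)"', repl, html)
```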

Updates:

  • Time: April 30, 2020, 7:17 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: April 30, 2020, 6:20 a.m.
    Status: Investigating
    Update: Our website has been attacked, so security systems have blocked traffic. We are investigating.

Check the status of similar companies and alternatives to Fasterize

Akamai: Systems Active
Nutanix: Systems Active
MongoDB: Systems Active
LogicMonitor: Systems Active
Acquia: Systems Active
Granicus System: Systems Active
CareCloud: Systems Active
Redis: Systems Active
integrator.io: Systems Active
NinjaOne Trust: Systems Active
Pantheon Operations: Systems Active
Securiti US: Systems Active

Frequently Asked Questions - Fasterize

Is there a Fasterize outage?
The current status of Fasterize is: Systems Active
Where can I find the official status page of Fasterize?
The official status page for Fasterize is here
How can I get notified if Fasterize is down or experiencing an outage?
To get notified of any status changes to Fasterize, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Fasterize every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Fasterize do?
Fasterize is a SaaS solution that increases website performance and speed, improving loading times and boosting conversions, revenue, SEO, and user experience.