
Is there a Fasterize outage?

Fasterize status: Systems Active

Last checked: 19 seconds ago

Get notified about any outages, downtime or incidents for Fasterize and 1800+ other cloud vendors. Monitor 10 companies, for free.

Subscribe for updates

Fasterize outages and incidents

Outage and incident data over the last 30 days for Fasterize.

There have been 0 outages or incidents for Fasterize in the last 30 days.


Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Fasterize

OutLogger tracks the status of these components for Fasterize:

Component       Status
Acceleration    Active
API             Active
CDN             Active
Collect         Active
Dashboard       Active
Logs delivery   Active
Website         Active

Latest Fasterize outages and incidents.

View the latest incidents for Fasterize and check for official updates:

Updates:

  • Time: July 27, 2021, 12:14 p.m.
    Status: Resolved
    Update: Due to issues on our logs cluster, some logs were lost between Friday, July 23 and Tuesday, July 27, 2021. The cluster had been experiencing performance problems since Sunday at 16:30 UTC. Some log indices appear to have suffered a sharding issue that corrupted the July 23 and 24 indexes, and the indexes from Sunday, July 25 to Tuesday, July 27, 9:00 UTC may be incomplete. We deleted the corrupted indexes, which allowed us to remove the unhealthy nodes and rebalance the cluster. Logs are now fully operational. We are sorry for the logs lost during this incident.
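
The update above mentions corrupted log indices, unhealthy nodes and cluster rebalancing, which suggests an Elasticsearch-style logs cluster, although Fasterize does not name the technology. As a rough sketch under that assumption only, cleaning up unhealthy ("red") indices could look like this; the cluster URL is a placeholder and the procedure is illustrative, not Fasterize's actual runbook.

```python
# Sketch only: assumes an Elasticsearch-compatible logs cluster. The base URL
# is a hypothetical placeholder; the API paths are standard Elasticsearch
# endpoints (_cat/indices, _cluster/health, DELETE /<index>).
import requests

LOGS_CLUSTER_URL = "http://logs-cluster.internal:9200"  # hypothetical address


def delete_red_indices():
    """Delete indices whose health is 'red', then report the cluster status."""
    # Indices in red health have unassigned primary shards and are the most
    # likely candidates for the corruption described in the update.
    resp = requests.get(
        f"{LOGS_CLUSTER_URL}/_cat/indices",
        params={"health": "red", "format": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    red_indices = [row["index"] for row in resp.json()]

    for index in red_indices:
        # Deleting a corrupted index frees its shards so the cluster can
        # rebalance the remaining healthy indices onto healthy nodes.
        requests.delete(f"{LOGS_CLUSTER_URL}/{index}", timeout=30).raise_for_status()

    health = requests.get(f"{LOGS_CLUSTER_URL}/_cluster/health", timeout=10).json()
    return red_indices, health["status"]


if __name__ == "__main__":
    deleted, status = delete_red_indices()
    print(f"Deleted {len(deleted)} red indices; cluster status is now {status}")
```

As the update notes, this kind of cleanup is destructive: whatever was stored in the deleted indexes is lost, which is why logs from July 23 to 27 could not be fully recovered.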

Updates:

  • Time: May 4, 2021, 4:37 p.m.
    Status: Postmortem
    Update: # Description
    On 27/04/21 at 4:45 pm, our workers became unavailable following a maintenance operation on the database that manages the engine configuration and the client configurations. The dashboard and the APIs became very slow or even unavailable, and generated errors. The database maintenance operation aimed at updating its configuration code. Some pages of clients' sites were slowed down, especially non-cached HTML pages and SmartCache fragments. SmartCache pages caused redirects to the non-cached versions.

    # Facts and timeline
    * At 16:33, end of maintenance; the update of the workers is triggered. The functional and API tests are OK (but not reliable, because the tests are still run against the old workers).
    * From 16:34, service discovery no longer finds the new instances of the configuration database.
    * From 16:35, no async workers are seen as available by the proxies.
    * From 16:45, no sync workers are seen as available by the proxies.
    * 16:55: alert on the sync workers; the proxies open the circuit breaker and bypass the workers, so most pages are no longer optimized.
    * 17:10: restart of service discovery; the configuration database service is back in service discovery.
    * 17:13: incident communicated on Statuspage.
    * 17:23: a new update of the workers is launched.
    * 17:30: end of the worker unavailability.
    * 17:35: end of incident communicated on Statuspage.
    * 17:38: restart of the API services to restore normal service on the dashboard.

    # Analysis
    From 16:35, following the migration of the configuration database, renaming the new machines with the names of the old ones created a conflict in service discovery, which ejected the new machines. As a result, the internal DNS of service discovery for the configuration database service no longer returned any IP address. On the new workers relying on service discovery for this service, the worker process kept waiting to retrieve the configuration files and would not start. Restarting the service discovery agents was enough to restore the situation; this restart happened along with the update of all workers. The unavailability of the configuration database also impacted the old API still used by the Dashboard (retrieval of the connection status). This old API, directly connected to the configuration database, no longer had access to the configurations for the same reason and caused very long Dashboard response times. Restarting this API was enough to force it to redo a DNS resolution via service discovery and recover the connection to the configuration database.

    # Metrics
    * Incident severity: Severity 2 (degradation of the site, performance problem and/or broken feature that is difficult to work around, impacting a significant number of users)
    * Detection time: 10 minutes (16:45 ⇢ 16:55)
    * Resolution time: 45 minutes (16:45 ⇢ 17:30)
    * Incident duration: 45 minutes

    # Impacts
    * ⅔ of pages were not optimized; ⅓ of pages were slowed down (500 ms timeout).
    * One customer ticket about redirects caused by SmartCache errors.

    # Countermeasures
    ## Actions during the incident
    * Service restarts

    # Action plan
    ## Short term
    * Review whether the circuit-breaker settings in the proxy are correct (⅓ of tasks were still sent to the brokers while the proxies' circuit breakers were open); see the circuit-breaker sketch after this update list.
    * Alerting when a service, or a percentage of a service's nodes, is down in service discovery.
    * Update the service discovery documentation.
    * Statuspage: do not auto-close maintenances.
  • Time: April 27, 2021, 3:35 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: April 27, 2021, 3:24 p.m.
    Status: Monitoring
    Update: A fix has been deployed and we're monitoring the result.
  • Time: April 27, 2021, 3:20 p.m.
    Status: Identified
    Update: Issue has been identified and a fix is being deployed. ETA 5-10 min
  • Time: April 27, 2021, 3:13 p.m.
    Status: Investigating
    Update: Starting from 4:45 pm UTC+2, we are experiencing issues on our European infrastructure. A fix is in progress. There is some impact on acceleration; some pages may see slowdowns (<500 ms).
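
The April 27 postmortem above describes the proxies opening a circuit breaker and bypassing the optimization workers once they stopped responding, and the action plan calls for reviewing the breaker settings. The sketch below is a generic illustration of that pattern, not Fasterize's proxy code; the thresholds, timeout and function names are assumptions.

```python
# Minimal circuit-breaker sketch illustrating the bypass behaviour described in
# the postmortem: after repeated worker failures the breaker opens and requests
# are served un-optimized until a cool-down elapses. All values are illustrative.
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds before retrying the worker
        self.failures = 0
        self.opened_at = None

    def call(self, optimize, fallback, request):
        # While the breaker is open, skip the worker and serve the page
        # un-optimized (the "bypass" seen during the incident).
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback(request)
            self.opened_at = None  # half-open: give the worker another chance
            self.failures = 0
        try:
            result = optimize(request)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # open the breaker
            return fallback(request)
        self.failures = 0  # a success closes the breaker again
        return result
```

One point the action plan raises is that ⅓ of tasks were still being sent to the brokers while the breakers were open, so in a real proxy the open state has to short-circuit every downstream hand-off, not just the synchronous optimization call.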

Updates:

  • Time: March 18, 2021, 10:15 p.m.
    Status: Postmortem
    Update: # Description
    On 16/03/2021, between 18:45 and 19:20, the entire Fasterize platform experienced slowdowns, with potentially high response times regardless of the site. Between 18:49 and 18:59, the platform automatically switched traffic to the customer origins to ensure continuity of traffic. From 18:58, new machines were added and started taking traffic to mitigate the impact while the root cause was being investigated. At 19:00, traffic was routed back to the Fasterize platform and only a few requests were still slowed down. At 19:18, the cause was identified, leading 10 minutes later to the blocking of an IP address whose requests were overloading the platform.

    # Facts and timeline
    * From 08:23, increase in the number of requests for large files (> 1 GB) on a client site.
    * From 14:00, second increase in the number of requests for these large files.
    * 18:49: alert on a high number of requests cancelled by users (499).
    * 18:50: Route 53 alert on the availability of our fronts; traffic is automatically rerouted to the client origins.
    * 18:55: machines are added to the pool.
    * 18:58: the new machines start serving traffic.
    * 18:59: the platform is seen as up again; traffic is rerouted to Fasterize, with some slowdowns.
    * 19:18: the offending client site is disconnected after the traffic overload is identified.
    * 19:20: back to normal (no more slowdowns).
    * 19:27: the offending IP address is blocked.

    # Analysis
    From 08:23, a server hosted on GCP started issuing several hundred requests for large files transiting through our proxies (XML files > 1 GB). Until then, this server had made a few dozen requests per day. Bandwidth on the fronts and proxies increased progressively throughout the day (up to a factor of 2.5 compared with the previous day and the previous week). Starting at 18:45, overall response times began to degrade without any additional bandwidth being used. This can be explained by the sudden increase in load on the front ends, which had been stable until then. The cause of the load increase remains unexplained at this time.

    # Metrics
    * Incident severity: Severity 2 (degradation of the site, performance problem and/or broken feature that is difficult to work around, impacting a significant number of users)
    * Detection time: 5 minutes (18:45 ⇢ 18:49)
    * Resolution time: 35 minutes (18:45 ⇢ 19:20)
    * Incident duration: 35 minutes

    # Impacts
    * Automatic disconnection of all customers for 10 minutes.
    * No support tickets.
    * Manual disconnection of a few sites by a customer.

    # Countermeasures
    ## Actions during the incident
    * Addition of front ends
    * Disconnection of the offending website
    * Blocking of the offending IP address

    # Action plan
    [ ] planned, [-] doing, [x] done
    ## Short term
    * [x] Adjust the bandwidth alerts.
    * [-] Adjust the alerts on ping-fstrz-engine.
    * [-] Detect the largest objects in order to bypass them.
    ## Medium term
    * [ ] Rate limiting on large objects (see the sketch after this update list).
  • Time: March 16, 2021, 6:39 p.m.
    Status: Resolved
    Update: Incident is now closed. Sorry for the inconvenience. A post-mortem will follow.
  • Time: March 16, 2021, 6:33 p.m.
    Status: Monitoring
    Update: Fix has been deployed, acceleration has been enabled and everything is back to normal. We're still monitoring.
  • Time: March 16, 2021, 6:20 p.m.
    Status: Identified
    Update: The issue has been identified and a mitigation is being deployed.
  • Time: March 16, 2021, 6:01 p.m.
    Status: Investigating
    Update: We currently have some issues on our European infrastructure. A fix is in progress. Speed-up is disabled but traffic is OK.
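
The March 16 postmortem above lists "rate limiting on large objects" as a medium-term action after a single GCP-hosted server overloaded the platform with requests for files larger than 1 GB. The sketch below shows one generic way to budget large-object requests per client IP; the size threshold, limits and names are assumptions, not Fasterize's implementation.

```python
# Sketch of per-IP rate limiting applied only to large objects. The threshold,
# window and request budget are illustrative assumptions.
import time
from collections import defaultdict

LARGE_OBJECT_BYTES = 1 * 1024**3   # treat responses over 1 GB as "large"
MAX_LARGE_REQUESTS = 10            # large-object requests allowed per window
WINDOW_SECONDS = 3600.0

_history = defaultdict(list)       # client IP -> timestamps of large requests


def allow_large_request(client_ip: str, content_length: int) -> bool:
    """Return False once a client exhausts its large-object budget."""
    if content_length < LARGE_OBJECT_BYTES:
        return True                # small objects are never rate limited here
    now = time.monotonic()
    recent = [t for t in _history[client_ip] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_LARGE_REQUESTS:
        _history[client_ip] = recent
        return False               # budget exhausted: reject or bypass optimization
    recent.append(now)
    _history[client_ip] = recent
    return True
```

The short-term action item "detect the largest objects in order to bypass them" suggests a softer variant: instead of rejecting the request, the proxy could simply pass large objects through without optimization so they never tie up the front ends.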

Updates:

  • Time: Feb. 3, 2021, 8:49 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Feb. 3, 2021, 8:24 p.m.
    Status: Identified
    Update: Starting from 8:40pm UTC+2, we're seeing slightly degraded performance for optimized pages (~ +200ms on response time). Cached and optimized objects are not affected. We are currently working on a fix and starting to deploy it in the next couple of minutes.

Updates:

  • Time: Dec. 18, 2020, 6:04 p.m.
    Status: Resolved
    Update: Everything is now OK. Some logs may still be missing from 8 AM UTC to 11 AM UTC; sorry for the inconvenience.
  • Time: Dec. 18, 2020, 3:24 p.m.
    Status: Monitoring
    Update: Log delivery is now OK. Some delays may occur and some logs between 9:30 UTC and 14:30 UTC may be missing. We're monitoring until it's completely resolved.
  • Time: Dec. 18, 2020, 2:43 p.m.
    Status: Investigating
    Update: We currently have some issues on our logging infrastructure. It's being actively fixed but there are impacts on log delivery. No impact on acceleration.

Check the status of similar companies and alternatives to Fasterize

Akamai

Systems Active

Nutanix

Systems Active

MongoDB

Systems Active

LogicMonitor

Systems Active

Acquia

Systems Active

Granicus System

Systems Active

CareCloud

Systems Active

Redis

Systems Active

integrator.io

Systems Active

NinjaOne Trust

Systems Active

Pantheon Operations

Systems Active

Securiti US

Systems Active

Frequently Asked Questions - Fasterize

Is there a Fasterize outage?
The current status of Fasterize is: Systems Active
Where can I find the official status page of Fasterize?
The official status page for Fasterize is here
How can I get notified if Fasterize is down or experiencing an outage?
To get notified of any status changes to Fasterize, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Fasterize every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
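
The incident updates above mention that Fasterize communicates on Statuspage, and Statuspage-hosted pages expose a standard /api/v2/status.json endpoint. Assuming the official Fasterize status page is hosted there, a minimal poller could look like the sketch below; the base URL is a placeholder, since the official address is only linked (not spelled out) on this page.

```python
# Sketch of a simple poller for a Statuspage-hosted status page.
# /api/v2/status.json is the standard Statuspage endpoint; the base URL below
# is a placeholder to replace with the official Fasterize status page address.
import time

import requests

STATUS_PAGE_URL = "https://status.example.com"  # placeholder


def poll_status(interval_seconds: int = 300) -> None:
    """Print a line whenever the overall status description changes."""
    last_description = None
    while True:
        resp = requests.get(f"{STATUS_PAGE_URL}/api/v2/status.json", timeout=10)
        resp.raise_for_status()
        description = resp.json()["status"]["description"]
        if description != last_description:
            print(f"Status changed: {description}")  # plug in email/Slack here
            last_description = description
        time.sleep(interval_seconds)


if __name__ == "__main__":
    poll_status()
```

This is essentially what OutLogger automates: it checks the official status every few minutes and notifies you of changes across all your vendors from one dashboard.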
What does Fasterize do?
Fasterize is a SaaS solution that increases website performance and speed, improving loading times, conversions, revenue, SEO, and user experience.