Last checked: 9 minutes ago
Get notified about any outages, downtime or incidents for Fasterize and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Fasterize.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
Outlogger tracks the status of these components for Fasterize:
Component | Status |
---|---|
Acceleration | Active |
API | Active |
CDN | Active |
Collect | Active |
Dashboard | Active |
Logs delivery | Active |
Website | Active |
View the latest incidents for Fasterize and check for official updates:
Description: # Error 521 for some customers

Date: 04/05/2023

# Description of incident

Some customers experienced 521 resource errors. The Fasterize 521 error corresponds to a configuration that is not found in the engine. After an engine update, some proxies failed to load some configurations in V2 format.

# Facts and Timeline

* **16h38**: Launch of the engine update, after validation on the staging environment and then in canary mode.
* **17h10**: First alert: high proxy error ratio detected.
* **17h12**: 521 errors start to appear.
* **17h27**: The technical team turns off the problematic proxies.
* **17h30**: Traffic is back to normal.
* **17h36**: Publication of a message on StatusPage.
* **17h56**: Rollback of the workers is triggered.
* **18h25**: The technical team fixes the issue by returning to the previous version of the engine.

# Analysis

On February 15, 2023, the deployment of the website-config package (4.14.1) changed the JSON schema used for client configurations in order to introduce a new key. This change should not have been included in the package because the feature was not finished. The new version of the website-config package moved this new key to another location in the JSON schema. During deployment, removing the key that had previously (and incorrectly) been introduced in the validation schema invalidated all V2 configs containing this key; this key had been added automatically by the API whenever it was not present. A mechanism to load a configuration even if it is not valid had been introduced during the update, but it did not work (a sketch of this fallback idea follows this incident entry). When processing requests associated with unloaded configurations, the engine responded with a 521 error.

The fallback mechanism at the front level mitigated the problem at the cache layer: in the event of a 521 error, a second attempt is triggered on another proxy. However, the return-to-origin system is not in place for 521 errors (to prevent the discovery of configurations). The message shown for 521 errors is not clear enough and should render a page like the one used for 592 or 594 errors.

At the rollback level, retrieving the commit corresponding to version N-1 was not straightforward. The rollback was not possible via the CI because it took too long to execute, so it was run from a developer workstation.

# Metrics

## Error 521

* A first peak around 5:05 p.m. (which triggered the alert).
* From 5:10 p.m. to 5:30 p.m., a large number of 521 requests per second is observed.

![](https://frz-statuspage.s3.eu-west-3.amazonaws.com/521+Fronts.png)

As a percentage of all traffic:

![](https://frz-statuspage.s3.eu-west-3.amazonaws.com/521+over+time+All+Traffic.png)

Over the duration of the incident:

![](https://frz-statuspage.s3.eu-west-3.amazonaws.com/521+distribution+All+Traffic.png)

Only on impacted customers:

![](https://frz-statuspage.s3.eu-west-3.amazonaws.com/521+over+time+Impacted+Customers.png)

![](https://frz-statuspage.s3.eu-west-3.amazonaws.com/521+distribution+Impacted+Customers.png)

# Impacts

* Number of customers impacted: 12 sites (< 2%)
* Percentage of requests impacted across all customers:
  * Maximum: 1.5%
  * Over the duration of the incident: 0.32%
* Percentage of requests impacted on impacted customers:
  * Maximum: 7.3%
  * Over the duration of the incident: 1.54%

# Countermeasures

## Short term

1. Fix the engine and the faulty package to remove the breaking change.
2. Secure changes to the V2 config schema validation.
3. Enable fallback to origin on 521 errors.

## Medium term

1. Set up a system for migrating V2 configs from one schema version to another.
2. Improve some documentation (rollback, release).
3. Improve the internal crisis organization.
4. Add an extra step to run the canary phase with actual production traffic before triggering the rest of the update.
Status: Postmortem
Impact: Major | Started At: May 4, 2023, 3:36 p.m.
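The analysis above describes a mechanism that was meant to keep a configuration loadable even when it no longer validates, but which did not work during this incident. The snippet below is a minimal sketch of that idea only, under assumed names (the schema, the `load_config` function and the strict `additionalProperties` setting are hypothetical, not Fasterize's website-config code): validate a V2 config against the JSON schema and, on failure, keep serving the last known valid version instead of leaving the configuration unloaded.

```python
# Hypothetical sketch: not Fasterize's code, only the fallback idea from the postmortem.
from typing import Optional

from jsonschema import Draft7Validator
from jsonschema.exceptions import ValidationError

# Stand-in for the schema shipped by the website-config package.
# With "additionalProperties": False (an assumption here), removing a key from the
# schema instantly invalidates every stored config that still carries that key.
V2_SCHEMA = {
    "type": "object",
    "properties": {"version": {"const": 2}, "rules": {"type": "array"}},
    "required": ["version", "rules"],
    "additionalProperties": False,
}


def load_config(candidate: dict, last_valid: Optional[dict]) -> Optional[dict]:
    """Return a usable config: the candidate if it validates, otherwise the previous one."""
    try:
        Draft7Validator(V2_SCHEMA).validate(candidate)
        return candidate
    except ValidationError as err:
        # Keep serving with the last valid config rather than leaving the site
        # unloaded and answering every request with a 521.
        print(f"config rejected ({err.message}); keeping previous version")
        return last_valid
```

With such a fallback, a config that still carries a key removed from the schema (for example one the API auto-injected) would degrade to its previous version rather than disappear from the engine and turn every request into a 521.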
Description: Between 10:00 AM (GMT+1) and 12:45 PM (GMT+1), the purge on the CDN was not working as expected. We found the issue and applied a fix to the API. This is now resolved. Sorry for the inconvenience.
Status: Resolved
Impact: None | Started At: March 9, 2023, 9 a.m.
Description: From approximately 7 AM (CEST) until 3 PM (CEST), we saw performance degradation of our engine due to a defect in the autoscaling process. Some users / websites may have noticed a slight slowdown on non-cacheable objects, on the order of a couple of hundred milliseconds. This is now resolved. Sorry for the inconvenience.
Status: Resolved
Impact: Minor | Started At: March 8, 2023, 6 a.m.
Description: This incident is now closed. Sorry for the inconvenience.
Status: Resolved
Impact: Minor | Started At: Feb. 22, 2023, 10:43 a.m.
Description: # Description of incident

This incident relates to an infinite loop on our platform caused by an incorrect configuration setting.

**Impacts**: Severe slowdowns of customer sites due to saturation of the platform.

**Timeline**: Start of incident: 17h08. End of incident: 17h40.

# Facts and Timeline

* **17h08-17h43**: Several disturbances occurred on the platform at intervals of roughly 10 minutes. Each disturbance lasted about 180 seconds.
* **17h14**: First internal alert raised about the problem.
* **17h27**: The team identifies the origin of the incident. An emergency meeting with the technical team is set up immediately.
* **17h31**: First corrective actions to mitigate the problem.
* **17h35**: Update of the [public platform status](https://status.fasterize.com/incidents/lh51wzsdvs1n) ([statuspage.io](https://status.fasterize.com/incidents/lh51wzsdvs1n)).
* **17h40**: Resolution of the problem.

# Metrics

* Incident severity level:
  * Severity 2: site degradation, performance issue and/or broken feature that is hard to work around and impacts a significant number of users.
* Time to detection: **6 minutes**
* Time to resolution: **32 minutes**

# Analysis

A configuration had its origin set incorrectly: the configured origin pointed to Fasterize instead of pointing to the hosting provider. The platform's protection against infinite loops did not work. This saturated the platform and produced severely degraded response times. The automatic platform stability detection repeatedly detected the unavailability; however, these instabilities were triggered at regular intervals, so the websites were routed to the origin and then routed back to Fasterize at the end of each loop.

# Action plan

**Short term:**

* Fix the API to better validate the origin and thus prevent an origin pointing to Fasterize.
* Fix the detection of infinite loops on the request path (a minimal sketch of such a check follows this incident entry).

**Medium term:**

* Improve the platform's protection with a rate-limiting system.
Status: Postmortem
Impact: Major | Started At: Feb. 16, 2023, 4:12 p.m.
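The action plan above includes fixing the detection of infinite loops on the request path. As a rough illustration only, with an assumed header name and hop limit rather than Fasterize's actual implementation, a proxy can refuse to forward a request that has already passed through the platform too many times, so an origin misconfigured to point back at Fasterize cannot loop indefinitely:

```python
# Hypothetical loop guard: the header name and limit are assumptions for illustration.
MAX_HOPS = 3
LOOP_HEADER = "X-Frz-Hops"


def forward_headers(request_headers: dict) -> dict:
    """Increment the hop counter and refuse to forward a request that is looping."""
    hops = int(request_headers.get(LOOP_HEADER, "0")) + 1
    if hops > MAX_HOPS:
        # Short-circuit with an error instead of sending the request back through
        # the platform again and saturating it.
        raise RuntimeError("request loop detected: origin points back to the platform")
    forwarded = dict(request_headers)
    forwarded[LOOP_HEADER] = str(hops)
    return forwarded
```

The short-term API fix is the complementary guard: reject a configuration at save time when its declared origin resolves back to the platform itself.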