Get notified about any outages, downtime or incidents for Flagsmith and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Flagsmith.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Flagsmith:
Component | Status |
---|---|
Admin Dashboard | Active |
Core API | Active |
Edge API | Active |
Public Website | Active |
View the latest incidents for Flagsmith and check for official updates:
Description:

## Overview

On the 20th of January 2021, at 16:47 UTC, our REST API suffered a partial outage for 38 minutes, with partial service resuming over the course of 6 minutes, resulting in total downtime of 44 minutes. The core reason for the outage was a database migration that failed to apply correctly. We manually corrected the migration and service was resumed.

We're really sorry for this downtime. We work hard to try to ensure 100% uptime, and will apply these learnings to improve the service in future.

## Background

As part of the development work around 3rd party integrations, we have been working on an integration with [Amplitude](https://amplitude.com/). This integration requires a new table to be created in the core Postgres database, so a Django database migration was created to add it.

As part of this work, one of our developers manually edited the migration to make a change to the data schema. This was an error: migrations should not be manually edited, and the engineer should have created a second migration to modify the data schema.

We have also been migrating our code to use the Black Python formatter. This hampered our code review process by polluting the review with additional formatting changes that made the code harder to read than it ought to be.

## Testing

The code worked in our local, development and staging environments. This was because test data was present in the prod environment but not in the development or staging environments. The migration failed to apply everywhere (because the app thought there was no migration to apply), but the exception was only thrown because there was data in the table in production.

## Outage

Once our code review had progressed, we merged our code to master and the CI/CD pipelines pushed it into production. This caused the outage. We were also late to be alerted to this because it did not take down endpoints like /health; /health was still reporting 200 OK response codes.

## Immediate Fix

We identified the issue quickly, wrote and tested a fix, and then deployed it into production.

## Learnings

* We will ensure a third set of eyes reviews any commits that include data migration code.
* We will ensure better consistency of data during testing.
* Our downtime alerting has been improved to make synthetic API calls to core SDK endpoints like retrieving flags. This better simulates real world usage.
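The last learning, probing a core SDK endpoint rather than only /health, could be sketched roughly as follows. This is a minimal illustration and not Flagsmith's actual monitoring code: the URL, the `X-Environment-Key` header name, the expectation that the endpoint returns a JSON list, and the function names `http_fetch` and `check_flags_endpoint` are all assumptions made for the example.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, Tuple

# Assumed values for illustration only; replace with your own deployment details.
FLAGS_URL = "https://api.flagsmith.com/api/v1/flags/"
ENV_KEY_HEADER = "X-Environment-Key"

Fetcher = Callable[[str, Dict[str, str]], Tuple[int, bytes]]


def http_fetch(url: str, headers: Dict[str, str], timeout: float = 5.0) -> Tuple[int, bytes]:
    """Perform a GET request and return (status_code, body)."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; report them as plain status codes.
        return err.code, err.read()


def check_flags_endpoint(env_key: str, fetch: Fetcher = http_fetch,
                         url: str = FLAGS_URL) -> Tuple[bool, str]:
    """Return (healthy, reason) based on a real flag-retrieval call."""
    try:
        status, body = fetch(url, {ENV_KEY_HEADER: env_key})
    except OSError as err:
        # URLError and socket timeouts are OSError subclasses.
        return False, f"request failed: {err}"
    if status != 200:
        return False, f"unexpected status {status}"
    try:
        payload = json.loads(body)
    except ValueError:
        return False, "response body is not valid JSON"
    if not isinstance(payload, list):
        return False, "expected a JSON list of flags"
    return True, "ok"
```

Unlike a /health probe, this check exercises the same code path that SDK clients use, so a failure in flag retrieval fails the check even while /health still returns 200 OK. The `fetch` parameter is injectable so the check can be tested without network access.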
Status: Postmortem
Impact: None | Started At: Jan. 20, 2021, 5:30 p.m.
Description: Over a period of 5 days we experienced either slowdowns or 500 error response codes from our API. These brown-outs lasted around 1 minute each and occurred 4 times a day. They were caused by a misconfiguration in a client's SDK implementation, which sent us very high bursts of traffic following a push notification that was sent out to a large user population. To mitigate this outage, we have upsized our core database to 8x the capacity and our app server cluster to 16x the capacity. This has provided us with enough capacity to serve these traffic bursts. We have also been in contact with the customer to help improve their SDK implementation to reduce the load on our API. We apologise for the degradation in service.
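The incident report does not say how the customer's SDK implementation was changed. One common way to soften this kind of burst is to add random jitter on the client, so that devices woken by the same push notification do not all request flags in the same second. The sketch below only illustrates that general idea; the function names and the numbers used are hypothetical.

```python
import random
from collections import Counter


def jittered_delay(max_spread_seconds: float, rng: random.Random) -> float:
    """Pick a random wait in [0, max_spread_seconds] before refreshing flags."""
    if max_spread_seconds < 0:
        raise ValueError("max_spread_seconds must be non-negative")
    return rng.uniform(0.0, max_spread_seconds)


def peak_requests_per_second(num_clients: int, max_spread_seconds: float,
                             seed: int = 0) -> int:
    """Simulate clients reacting to one push and return the busiest second's load."""
    rng = random.Random(seed)
    # Bucket each client's request by the whole second in which it would fire.
    buckets = Counter(
        int(jittered_delay(max_spread_seconds, rng)) for _ in range(num_clients)
    )
    return max(buckets.values(), default=0)
```

With no spread, every simulated client lands in the same second, so the peak equals the number of clients. Spreading the same clients over a longer window lowers the peak, at the cost of some devices seeing updated flags slightly later.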
Status: Resolved
Impact: None | Started At: Oct. 8, 2020, 8:59 a.m.