Flagsmith Status: Check if Flagsmith is down or having an outage.

Flagsmith outages and incidents

Outage and incident data over the last 30 days for Flagsmith.

There has been 1 outage or incident for Flagsmith in the last 30 days.

Severity Breakdown:

None: 1

Minor: 0

Major: 0

Critical: 0

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Components and Services Monitored for Flagsmith

OutLogger tracks the status of these Flagsmith components:

Component	Status
Admin Dashboard	Active
Core API	Active
Edge API	Active
Public Website	Active

Latest Flagsmith outages and incidents.

View the latest incidents for Flagsmith and check for official updates:

Service Outage

Description: The incident was related to an erroneous DNS change. The change has now been reverted and service should be back up and running. Some failures may still be seen for a period while cached DNS records expire and the corrected records propagate.

Status: Resolved

Impact: Critical | Started At: Dec. 1, 2023, 3:10 p.m.

Updates:

Time: Dec. 1, 2023, 3:29 p.m.

Status: Resolved

Update: The incident was related to an erroneous DNS change. The change has now been reverted and service should be back up and running. Some failures may still be seen for a period while cached DNS records expire and the corrected records propagate.
Time: Dec. 1, 2023, 3:10 p.m.

Status: Investigating

Update: We are currently investigating reports of a service outage on all of our infrastructure.

Identity integrations are not being triggered

Description:
Summary
On September 5 at 09:45 UTC, we initiated a release that included a database migration to add a new constraint to the table that stores flag-related information. Based on our pre-live tests, this migration should have taken no more than 50 milliseconds. During the production release, however, the migration needed to acquire a temporary lock on a particular table with very high throughput, and this created a backlog of blocked connections waiting for the migration to complete. The knock-on effect exhausted the available database connections, and a full restart was necessary. Once the restart was complete, connections were restored and service resumed at 10:20 UTC.

Next Steps
We have researched the cause of the issue, although some aspects still require further investigation. In the meantime, we plan to implement the safeguards described in the following pages of the PostgreSQL documentation, which should help reduce the impact of similar issues in the future:
https://www.postgresql.org/docs/11/runtime-config-client.html
https://www.postgresql.org/docs/11/runtime-config-logging.html (log_lock_waits)

Status: Postmortem

Impact: Minor | Started At: Sept. 12, 2023, 11:14 a.m.

Updates:

Time: Sept. 13, 2023, 8:43 a.m.

Status: Postmortem

Update:
Summary
On September 5 at 09:45 UTC, we initiated a release that included a database migration to add a new constraint to the table that stores flag-related information. Based on our pre-live tests, this migration should have taken no more than 50 milliseconds. During the production release, however, the migration needed to acquire a temporary lock on a particular table with very high throughput, and this created a backlog of blocked connections waiting for the migration to complete. The knock-on effect exhausted the available database connections, and a full restart was necessary. Once the restart was complete, connections were restored and service resumed at 10:20 UTC.

Next Steps
We have researched the cause of the issue, although some aspects still require further investigation. In the meantime, we plan to implement the safeguards described in the following pages of the PostgreSQL documentation, which should help reduce the impact of similar issues in the future:
https://www.postgresql.org/docs/11/runtime-config-client.html
https://www.postgresql.org/docs/11/runtime-config-logging.html (log_lock_waits)
Time: Sept. 12, 2023, 12:15 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: Sept. 12, 2023, 11:35 a.m.

Status: Monitoring

Update: We are continuing to monitor for any further issues.
Time: Sept. 12, 2023, 11:26 a.m.

Status: Monitoring

Update: A fix has been implemented and we are monitoring the results.
Time: Sept. 12, 2023, 11:14 a.m.

Status: Investigating

Update: We are currently investigating this issue.

Core API is not responding

Description:
Summary
On September 5 at 09:45 UTC, we initiated a release that included a database migration to add a new constraint to the table that stores flag-related information. Based on our pre-live tests, this migration should have taken no more than 50 milliseconds. During the production release, however, the migration needed to acquire a temporary lock on a particular table with very high throughput, and this created a backlog of blocked connections waiting for the migration to complete. The knock-on effect exhausted the available database connections, and a full restart was necessary. Once the restart was complete, connections were restored and service resumed at 10:20 UTC.

Next Steps
We have researched the cause of the issue, although some aspects still require further investigation. In the meantime, we plan to implement the safeguards described in the following pages of the PostgreSQL documentation, which should help reduce the impact of similar issues in the future:
https://www.postgresql.org/docs/11/runtime-config-client.html
https://www.postgresql.org/docs/11/runtime-config-logging.html (log_lock_waits)

Status: Postmortem

Impact: Critical | Started At: Sept. 5, 2023, 9:45 a.m.

Updates:

Time: Sept. 12, 2023, 4:50 p.m.

Status: Postmortem

Update:
Summary
On September 5 at 09:45 UTC, we initiated a release that included a database migration to add a new constraint to the table that stores flag-related information. Based on our pre-live tests, this migration should have taken no more than 50 milliseconds. During the production release, however, the migration needed to acquire a temporary lock on a particular table with very high throughput, and this created a backlog of blocked connections waiting for the migration to complete. The knock-on effect exhausted the available database connections, and a full restart was necessary. Once the restart was complete, connections were restored and service resumed at 10:20 UTC.

Next Steps
We have researched the cause of the issue, although some aspects still require further investigation. In the meantime, we plan to implement the safeguards described in the following pages of the PostgreSQL documentation, which should help reduce the impact of similar issues in the future:
https://www.postgresql.org/docs/11/runtime-config-client.html
https://www.postgresql.org/docs/11/runtime-config-logging.html (log_lock_waits)
Time: Sept. 5, 2023, 12:01 p.m.

Status: Resolved

Update: This incident has been resolved. A postmortem will follow.
Time: Sept. 5, 2023, 10:25 a.m.

Status: Monitoring

Update: We are continuing to monitor for any further issues.
Time: Sept. 5, 2023, 10:25 a.m.

Status: Monitoring

Update: A fix has been implemented and we are monitoring the results.
Time: Sept. 5, 2023, 9:51 a.m.

Status: Identified

Update: We have identified a database migration that has failed as part of a new release. We are working to re-apply the migration.
Time: Sept. 5, 2023, 9:45 a.m.

Status: Investigating

Update: We are currently investigating this issue.
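The Next Steps in the postmortem above point to two PostgreSQL documentation pages without showing how the settings apply to a migration. The following is a minimal sketch under stated assumptions, not Flagsmith's actual migration code: it assumes a migration can be expressed as a single plain SQL statement, and the helper `guarded_migration_sql`, the table `hypothetical_flags`, its columns, the constraint name, and the timeout values are all invented for illustration. `lock_timeout` and `statement_timeout` (client connection settings) bound how long the DDL may wait, so a migration that cannot obtain its lock gives up instead of leaving other connections queued behind it; `log_lock_waits` (logging settings) records any lock wait that lasts longer than `deadlock_timeout`.

```python
from typing import List

# Hypothetical helper: wraps one DDL statement in a transaction that uses
# PostgreSQL's lock_timeout and statement_timeout client settings.
def guarded_migration_sql(ddl: str, lock_timeout_ms: int = 2000,
                          statement_timeout_ms: int = 30000) -> str:
    if lock_timeout_ms <= 0 or statement_timeout_ms <= 0:
        # In PostgreSQL a value of 0 disables the timeout, which defeats the purpose here.
        raise ValueError("timeouts must be positive")
    statement = ddl.strip().rstrip(";")
    return "\n".join([
        "BEGIN;",
        # SET LOCAL only lasts until the end of this transaction.
        f"SET LOCAL lock_timeout = '{lock_timeout_ms}ms';",
        f"SET LOCAL statement_timeout = '{statement_timeout_ms}ms';",
        statement + ";",
        "COMMIT;",
    ])

# Server-side (superuser): log any lock wait longer than deadlock_timeout (default 1s).
ENABLE_LOCK_WAIT_LOGGING: List[str] = [
    "ALTER SYSTEM SET log_lock_waits = on;",
    "SELECT pg_reload_conf();",
]

if __name__ == "__main__":
    # Hypothetical table, columns, and constraint name.
    print(guarded_migration_sql(
        "ALTER TABLE hypothetical_flags "
        "ADD CONSTRAINT hypothetical_unique_flag UNIQUE (feature_id, environment_id)"
    ))
```

If the lock timeout fires, PostgreSQL aborts the transaction and the final `COMMIT` is treated as a rollback, so the migration can be retried later rather than holding up live traffic.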

Performance impacted to our Core API

Description: This incident has been resolved.

Status: Resolved

Impact: Minor | Started At: Aug. 10, 2023, 12:15 p.m.

Updates:

Time: Aug. 10, 2023, 12:54 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: Aug. 10, 2023, 12:26 p.m.

Status: Monitoring

Update: A fix has been implemented and we are monitoring the results.
Time: Aug. 10, 2023, 12:15 p.m.

Status: Identified

Update: We were alerted to a slow DB query in newly released code at 13:09 BST. We are reverting the code and expect to be back to normal latency in the next 10 minutes. Edge API is not affected.

We are currently encountering difficulties with our task processing system.

Description:
Timeline
We were alerted at 23:39 UTC on 18/07/2023 that the queue for our asynchronous task processor was above the acceptable threshold. Once our team was online in India at 2:59am UTC, the status page was updated. By this time, the task processor queue had backed up and the application was unable to write flag change events to the datastore that powers the Edge API.

We investigated multiple avenues, but several overlapping "symptoms" made the root cause very difficult to determine. One issue, which turned out to be a red herring, related to the functionality that forwards Core API requests to the Edge API. This process appeared to be taking much longer than expected, and much of the investigation was spent restricting its use.

At around 9:30am UTC, the cause was traced to a particular set of tasks in the queue that were causing the processor units to run out of memory. Once it was determined to be safe to do so, these tasks were removed from the queue. By 10:19 UTC the issue was resolved and the queue had returned to normal, meaning that flag change events were again being written to the Edge API datastore. Any changes that had not been processed at the time were re-run to ensure that the state was consistent with the changes made in the database.

Issue Details
The issue was caused by an environment in the Flagsmith platform that included 400 segments and nearly 5000 segment overrides. As a result, the environment document generated to power the Edge API was too large for the task processor instances to load into memory and subsequently write to the Edge API datastore. To compound the issue, these changes were made via the Flagsmith API, which generated thousands of tasks to update the document in the Edge API datastore in a short space of time. Each of these tasks needed to load the offending environment, so the task processor instances fell into a cycle of running out of memory. The processors were gradually blocking these tasks from being picked up again, but the sheer quantity meant there were always new versions of the same (or very similar) tasks to pick up.

Next Steps
* Implement limits on the size of the environment document. This will primarily consist of limiting the number of segments and features in a given project, as well as the total number of segment overrides in a given project.
* Deprecate the functionality that forwards requests from the Core API to the Edge API. All projects using the Edge API will need to ensure that all connected SDKs use the Edge API only.

Status: Postmortem

Impact: Minor | Started At: July 19, 2023, 2:58 a.m.

Updates:

Time: July 19, 2023, 4:24 p.m.

Status: Postmortem

Update:
Timeline
We were alerted at 23:39 UTC on 18/07/2023 that the queue for our asynchronous task processor was above the acceptable threshold. Once our team was online in India at 2:59am UTC, the status page was updated. By this time, the task processor queue had backed up and the application was unable to write flag change events to the datastore that powers the Edge API.

We investigated multiple avenues, but several overlapping "symptoms" made the root cause very difficult to determine. One issue, which turned out to be a red herring, related to the functionality that forwards Core API requests to the Edge API. This process appeared to be taking much longer than expected, and much of the investigation was spent restricting its use.

At around 9:30am UTC, the cause was traced to a particular set of tasks in the queue that were causing the processor units to run out of memory. Once it was determined to be safe to do so, these tasks were removed from the queue. By 10:19 UTC the issue was resolved and the queue had returned to normal, meaning that flag change events were again being written to the Edge API datastore. Any changes that had not been processed at the time were re-run to ensure that the state was consistent with the changes made in the database.

Issue Details
The issue was caused by an environment in the Flagsmith platform that included 400 segments and nearly 5000 segment overrides. As a result, the environment document generated to power the Edge API was too large for the task processor instances to load into memory and subsequently write to the Edge API datastore. To compound the issue, these changes were made via the Flagsmith API, which generated thousands of tasks to update the document in the Edge API datastore in a short space of time. Each of these tasks needed to load the offending environment, so the task processor instances fell into a cycle of running out of memory. The processors were gradually blocking these tasks from being picked up again, but the sheer quantity meant there were always new versions of the same (or very similar) tasks to pick up.

Next Steps
* Implement limits on the size of the environment document. This will primarily consist of limiting the number of segments and features in a given project, as well as the total number of segment overrides in a given project.
* Deprecate the functionality that forwards requests from the Core API to the Edge API. All projects using the Edge API will need to ensure that all connected SDKs use the Edge API only.
Time: July 19, 2023, 10:19 a.m.

Status: Resolved

Update: This incident has been resolved. We will publish a full post-mortem imminently.
Time: July 19, 2023, 9:12 a.m.

Status: Monitoring

Update: We have deployed an update which has resumed consumption of the task queue. We are now processing the task queue and expect to be caught up in the next hour.
Time: July 19, 2023, 8:02 a.m.

Status: Identified

Update: We have identified a database lock that has caused this issue with the task processor. We are working on an interim fix as we identify the root cause.
Time: July 19, 2023, 6:35 a.m.

Status: Investigating

Update: We are continuing to investigate this issue with the utmost priority.
Time: July 19, 2023, 2:58 a.m.

Status: Investigating

Update: We are currently investigating. Initial findings indicate that any flag changes made in approximately the last two hours may not be visible to clients.
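The first Next Steps item in the task processor postmortem amounts to validating a project before an oversized environment document can be generated. The sketch below is a hypothetical illustration only: `ProjectLimits`, `limit_violations`, and every threshold value are invented for this example, since the postmortem does not state Flagsmith's actual limits or how they are enforced. It reports each limit a project exceeds so that a caller could refuse the change up front instead of queueing document rebuilds that will run out of memory.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ProjectLimits:
    # Hypothetical thresholds for illustration; not Flagsmith's real values.
    max_segments: int = 100
    max_features: int = 400
    max_segment_overrides: int = 100

def limit_violations(segment_count: int, feature_count: int,
                     segment_override_count: int,
                     limits: ProjectLimits = ProjectLimits()) -> List[str]:
    """Return one message per exceeded limit; an empty list means the project is within limits."""
    checks = [
        ("segments", segment_count, limits.max_segments),
        ("features", feature_count, limits.max_features),
        ("segment overrides", segment_override_count, limits.max_segment_overrides),
    ]
    return [f"{name}: {count} exceeds limit of {limit}"
            for name, count, limit in checks if count > limit]

if __name__ == "__main__":
    # Roughly the shape of the environment described above; the feature count is made up.
    for problem in limit_violations(segment_count=400, feature_count=50,
                                    segment_override_count=5000):
        print("rejected:", problem)
```

With the default hypothetical limits, an environment shaped like the one in the incident is flagged for both its segment count and its segment override count, while the feature count passes.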

Check the status of similar companies and alternatives to Flagsmith

Avalara

Systems Active

Crisis Text Line

Systems Active

Jamf

Systems Active

Mulesoft

Systems Active

Meltwater

Systems Active

HashiCorp

Systems Active

Datto

Issues Detected

Vox Media

Systems Active

Cradlepoint

Systems Active

Liferay

Systems Active

Zapier

Systems Active

Workato US

Systems Active

Frequently Asked Questions - Flagsmith

Is there a Flagsmith outage?

The current status of Flagsmith is: Systems Active

Where can I find the official status page of Flagsmith?

The official status page for Flagsmith is here

How can I get notified if Flagsmith is down or experiencing an outage?

To get notified of any status changes to Flagsmith, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Flagsmith every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here

What does Flagsmith do?

Flagsmith simplifies feature flag creation and management. Utilize their hosted API, deploy to a private cloud, or run on-premises.
