Flagsmith Status: Check if Flagsmith is down or having an outage.

Flagsmith outages and incidents

Outage and incident data over the last 30 days for Flagsmith.

There has been 1 outage or incident for Flagsmith in the last 30 days.

Severity Breakdown:

None: 1

Minor: 0

Major: 0

Critical: 0

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Components and Services Monitored for Flagsmith

OutLogger tracks the status of these components for Flagsmith:

Component	Status
Admin Dashboard	Active
Core API	Active
Edge API	Active
Public Website	Active

Latest Flagsmith outages and incidents.

View the latest incidents for Flagsmith and check for official updates:

Increased API latency

Description: This incident has been resolved.

Status: Resolved

Impact: None | Started At: Nov. 5, 2024, 10:19 a.m.

Updates:

Time: Nov. 5, 2024, 1:28 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: Nov. 5, 2024, 10:37 a.m.

Status: Monitoring

Update: Environment creation is functional again now.
Time: Nov. 5, 2024, 10:32 a.m.

Status: Monitoring

Update: API latency is back to normal. We are continuing to monitor.
Time: Nov. 5, 2024, 10:25 a.m.

Status: Identified

Update: The cause is a migration applied to the production database that consumed more resources than we anticipated. The migration has completed and the load is normalising. We are monitoring and managing the load. While this issue is ongoing, it is not possible to create new environments.
Time: Nov. 5, 2024, 10:19 a.m.

Status: Investigating

Update: We are currently investigating this issue.

Core API outage

Description: At around 23:35 UTC on July 9th, we received an alert that our Core API was not responding. As a result, our SaaS customers were unable to use the Flagsmith dashboard (app.flagsmith.com). Customers' SDKs serving flags through the Edge API were not impacted. However, customers still using our Core API to serve flags were also impacted; this number is limited, as we have advised customers to migrate to the Edge API since June 2022. Our team resolved the issue at 3:06 UTC on July 10th, and the Core API was fully responsive again.

The root cause was a database running at maximum CPU, caused by requests to an endpoint that triggered an inefficient query. In addition, our load balancer was repeatedly recycling unhealthy API tasks, which further strained the system through unnecessary database connections. Combined, these two issues made the Core API unresponsive. We recovered the database by dropping all traffic and terminating all open connections, which allowed it to recover and process traffic normally.

We are mitigating future issues like this by:

* Optimizing the query that used too much CPU capacity (this has been completed and deployed to our production SaaS environment)
* Adding better alerting when inefficient queries are identified in the application
* Improving our internal tooling (e.g. PagerDuty) to speed up issue identification, which was slowed in this incident by some team members being out of office

Status: Postmortem

Impact: None | Started At: July 9, 2024, 11:30 p.m.

Updates:

Time: July 10, 2024, 1:31 p.m.

Status: Postmortem

Update: Postmortem published; the full text appears in the incident description above.
Time: July 10, 2024, 1:31 p.m.

Status: Resolved

Update: Core API and admin dashboard outage on 10th July 2024.

Core API Outage

Description: Our Core API was overwhelmed by a massive traffic spike, which made the core SQL database extremely slow. This led to ECS tasks failing their health checks, prompting the load balancer to start and stop tasks, which in turn added more load to the already maxed-out database. We tried several approaches to rate-limiting the source of the traffic, but eventually had to stop traffic at the load balancer for 2 minutes to stabilise the system. We are working on implementing AWS API Gateway to add rate limiting at the gateway level and avoid this sort of incident in the future.

Status: Resolved

Impact: None | Started At: July 4, 2024, 8:36 a.m.

Updates:

Time: July 4, 2024, 8:36 a.m.

Status: Resolved

Update: Resolved; the full explanation appears in the incident description above.

Issues with flag updates

Description: This incident has been resolved.

Status: Resolved

Impact: Major | Started At: March 5, 2024, 10:27 a.m.

Updates:

Time: March 5, 2024, 1:38 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: March 5, 2024, 10:30 a.m.

Status: Monitoring

Update: We are continuing to monitor for any further issues.
Time: March 5, 2024, 10:29 a.m.

Status: Monitoring

Update: We have rolled back a recent change which has cleared out the backlog of tasks. Updates to flags should be propagated as normal.
Time: March 5, 2024, 10:27 a.m.

Status: Investigating

Update: We have been alerted to an issue with our asynchronous task processor which handles replicating flag updates across our network. We are currently investigating.

Increased error rates on the Edge API

Description:

Timeline

At around 13:45 today, we deployed a change to resolve a validation issue that had been introduced in a release earlier the same day. That validation issue affected only requests which provided a numeric value for the identity identifier. The new validation, however, caused an issue for certain integrations: it also required the traits key to be provided (and not omitted), which some of our clients do not do (the Go client, for example, omits the traits key if the list is empty). As a result, valid requests from these clients for identities with no traits were incorrectly rejected as invalid. Once we received alerts from our monitoring and from some affected customers, we began investigating. At 14:54 we deployed a change which resolved the validation issue for some cases, but not all. At 15:06 we therefore decided to roll back the affected regions, and at 15:48 we deployed a permanent fix, including additional test cases covering this behaviour.

Impact

Since only requests with no traits were affected, the impact was fairly limited and no trait data was lost. Some identities will not have been created during this period; however, due to the nature of the Flagsmith integration, subsequent calls to identify those users will create them.

Next Steps

We have already been working on improving our release process for the Edge API. The first step, due for release next week, is to make our automated releases roll back based on a number of additional alerting factors, including more granular analysis of our error rates. This will ensure that, in future, even a small subset of errors like this one triggers an immediate automated rollback. The next step after this is to create a more comprehensive end-to-end testing suite which exercises each of our SDKs to verify that all integrations are compatible with any new changes.

Status: Postmortem

Impact: Minor | Started At: Jan. 18, 2024, 2:40 p.m.

Updates:

Time: Jan. 18, 2024, 5:34 p.m.

Status: Postmortem

Update: Postmortem published; the full text appears in the incident description above.
Time: Jan. 18, 2024, 5:33 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: Jan. 18, 2024, 3:43 p.m.

Status: Identified

Update: We are deploying the permanent fix now.
Time: Jan. 18, 2024, 3:36 p.m.

Status: Identified

Update: We are continuing to work on a fix for this issue.
Time: Jan. 18, 2024, 3:34 p.m.

Status: Identified

Update: We have rolled back in certain affected regions and have completed the work for the permanent fix. This is in the final stages of testing now and will be rolled out imminently.
Time: Jan. 18, 2024, 3:15 p.m.

Status: Identified

Update: We have identified the remaining issue and are implementing a fix. ETA for full resolution: 15 minutes.
Time: Jan. 18, 2024, 3:06 p.m.

Status: Investigating

Update: We are continuing to investigate this issue.
Time: Jan. 18, 2024, 3:06 p.m.

Status: Investigating

Update: Issues are still persisting for integrations using the Go client. We are investigating further.
Time: Jan. 18, 2024, 2:52 p.m.

Status: Monitoring

Update: A fix has been implemented and we are monitoring the results.
Time: Jan. 18, 2024, 2:44 p.m.

Status: Identified

Update: The issue has been identified and a fix is being implemented.
Time: Jan. 18, 2024, 2:40 p.m.

Status: Investigating

Update: We are currently investigating this issue.

Check the status of similar companies and alternatives to Flagsmith

Avalara

Systems Active

Crisis Text Line

Systems Active

Jamf

Systems Active

Mulesoft

Systems Active

Meltwater

Systems Active

HashiCorp

Systems Active

Datto

Issues Detected

Vox Media

Systems Active

Cradlepoint

Systems Active

Liferay

Systems Active

Zapier

Systems Active

Workato US

Systems Active

Frequently Asked Questions - Flagsmith

Is there a Flagsmith outage?

The current status of Flagsmith is: Systems Active

Where can I find the official status page of Flagsmith?

The official status page for Flagsmith is here

How can I get notified if Flagsmith is down or experiencing an outage?

To get notified of any status changes to Flagsmith, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Flagsmith every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here.

What does Flagsmith do?

Flagsmith simplifies feature flag creation and management. You can use their hosted API, deploy it to a private cloud, or run it on-premises.
