Flagsmith Status: Check if Flagsmith is down or having an outage.

Flagsmith outages and incidents

Outage and incident data over the last 30 days for Flagsmith.

There has been 1 outage or incident for Flagsmith in the last 30 days.

Severity Breakdown:

None: 1

Minor: 0

Major: 0

Critical: 0

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Components and Services Monitored for Flagsmith

OutLogger tracks the status of these components for Flagsmith:

Component	Status
Admin Dashboard	Active
Core API	Active
Edge API	Active
Public Website	Active

Latest Flagsmith outages and incidents.

View the latest incidents for Flagsmith and check for official updates:

Subscription status is missing

Description: This incident has been resolved.

Status: Resolved

Impact: None | Started At: Feb. 27, 2023, 2:53 p.m.

Updates:

Time: Feb. 27, 2023, 3:28 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: Feb. 27, 2023, 3:07 p.m.

Status: Monitoring

Update: We have restored the subscription status data.
Time: Feb. 27, 2023, 2:53 p.m.

Status: Investigating

Update: We have an issue where we are not showing the correct subscription status on the Organisation settings page. We are investigating.

Issue processing analytics data

Description: The issue with our downstream provider has been resolved. We will continue to monitor.

Status: Resolved

Impact: Minor | Started At: Feb. 24, 2023, 10:16 p.m.

Updates:

Time: Feb. 25, 2023, 2:38 a.m.

Status: Resolved

Update: The issue with our downstream provider has been resolved. We will continue to monitor.
Time: Feb. 24, 2023, 11:23 p.m.

Status: Investigating

Update: We're currently awaiting further information from the third party about the issues. As of yet, we don't have an ETA on resolution, however, the issue is still limited to analytics reads / writes.
Time: Feb. 24, 2023, 10:19 p.m.

Status: Investigating

Update: We are continuing to investigate this issue.
Time: Feb. 24, 2023, 10:16 p.m.

Status: Investigating

Update: We are currently investigating issues with a downstream provider impacting our ability to handle requests to write / read feature analytics. This is not affecting any critical services for managing or retrieving feature flags.

Major Core API outage

Description: At 12:46 UTC on Thursday 18th August, our monitoring picked up an increased number of HTTP 502s being served by our API. Upon investigation, it became evident that an unexpected increase in load on the PostgreSQL database that serves our Core API was causing our application to struggle to serve some requests, and we saw increased latency on those that were being served.

In an attempt to resolve the issue, we adjusted the settings in our ECS cluster to reduce the number of connections to the database. Unfortunately, making this change via our infrastructure-as-code (IaC) workflow meant that the ECS service tried to recreate all the tasks but couldn't do so, as the health reporting was unable to consistently report a healthy status. This meant that our Core API was essentially flapping up and down while it tried to reinstate all the tasks. During this period, our API continued to serve some requests with increased latency, but a large proportion of responses would still have been HTTP 502s.

Following the above, our engineering team looked into the requests that were causing the increased load. From our investigation, it was apparent that the increased load was all to our environment document endpoint (which powers the local evaluation in our latest server-side clients). This endpoint, although usable in our Core API, is very intensive, as it generates the whole environment document from our PostgreSQL database to return to the client in JSON form. This involves a large number of queries.

The compounding factor was a bug in our Node client regarding request timeouts. The Node client takes an argument of requestTimeoutSeconds on instantiation; however, it passes this directly into the call to the Node Fetch library's fetch function, which expects the timeout in milliseconds. As such, if requestTimeoutSeconds was set to e.g. 3, the request would time out after 3 ms and retry (3 times by default). So, every time a Node client polled for the environment, it would make 3 requests in ~9 ms (or as close to it as Node can manage).

We were able to block the traffic to this endpoint for the customer that was putting an unusual amount of load through it due to their configuration and the above bug in the Node client. Once we had blocked this traffic, the application began serving traffic as normal again. This occurred at 15:24 UTC. At this point, traffic to the Core API was back to normal and all requests were served successfully.

To remediate this issue, we are stepping up our efforts to encourage all of our clients to move over to our Edge API, which is immune to issues of this nature. We are also planning to make improvements to the existing Core API platform to help guard against these issues in the future:

1. The addition of caching to our environment document endpoint to improve performance / minimise database impact
2. The implementation of automated rate limiting to better protect the platform from issues of this nature

If you've read this and are unsure how to migrate to our Edge API, you can find out everything you need to know at https://docs.flagsmith.com/advanced-use/edge-api.
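The timeout bug described above is a unit mismatch, and the arithmetic can be sketched as follows. This is a minimal illustration, not the Flagsmith Node client's actual code: the helper names `secondsToMs` and `worstCasePollMs` are hypothetical, and only `requestTimeoutSeconds`, the value 3, and the 3 attempts come from the postmortem.

```typescript
// Hypothetical helpers for illustration; not taken from the Flagsmith client.
const MS_PER_SECOND = 1000;

/** Convert a timeout in seconds to the milliseconds a fetch-style API expects. */
function secondsToMs(seconds: number): number {
  if (!Number.isFinite(seconds) || seconds < 0) {
    throw new RangeError(`timeout must be a non-negative number, got ${seconds}`);
  }
  return seconds * MS_PER_SECOND;
}

/** Time spent on one poll when every attempt runs into the timeout. */
function worstCasePollMs(timeoutMs: number, attempts: number): number {
  return timeoutMs * attempts;
}

const requestTimeoutSeconds = 3;
const attempts = 3;

// Bug as described: the seconds value is used as if it were milliseconds.
const buggyTimeoutMs = requestTimeoutSeconds;
// Fix: convert before handing the value to the HTTP layer.
const fixedTimeoutMs = secondsToMs(requestTimeoutSeconds);

console.log(worstCasePollMs(buggyTimeoutMs, attempts)); // 9
console.log(worstCasePollMs(fixedTimeoutMs, attempts)); // 9000
```

With the unit mismatch, a request that needs more than 3 ms can never succeed, so every poll burns all of its attempts almost immediately. After conversion, the same settings allow up to 9 seconds per poll.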

Status: Postmortem

Impact: Major | Started At: Aug. 18, 2022, 1 p.m.

Updates:

Time: Aug. 23, 2022, 3:51 p.m.

Status: Postmortem

Update: The postmortem text is identical to the description above.
Time: Aug. 18, 2022, 3:43 p.m.

Status: Resolved

Update: Major outage of our Core API affecting flag retrieval for users that have not yet migrated to Edge and dashboard usage.
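The first planned improvement in the postmortem, caching the environment document endpoint, could look roughly like the sketch below. It assumes a simple in-memory time-to-live cache; `TtlCache`, `buildEnvironmentDocument`, the 60-second TTL, and the injected clock are all hypothetical names and values chosen for illustration, not Flagsmith's implementation.

```typescript
// Hypothetical names throughout; a sketch of the caching idea only.
interface Entry<T> {
  value: T;
  expiresAt: number;
}

class TtlCache<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** Return the cached value, calling `load` only when the entry is missing or expired. */
  get(key: string, load: (key: string) => T): T {
    const t = this.now();
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > t) {
      return hit.value;
    }
    const value = load(key);
    this.entries.set(key, { value, expiresAt: t + this.ttlMs });
    return value;
  }
}

// Count how often the expensive "build from the database" step runs.
let builds = 0;
const buildEnvironmentDocument = (environmentKey: string): string => {
  builds += 1; // stands in for the many PostgreSQL queries
  return JSON.stringify({ environment: environmentKey, flags: [] });
};

let clock = 0; // fake clock so the example is deterministic
const cache = new TtlCache<string>(60_000, () => clock);
cache.get("env-a", buildEnvironmentDocument); // miss: builds the document
cache.get("env-a", buildEnvironmentDocument); // hit: served from the cache
clock = 60_001;
cache.get("env-a", buildEnvironmentDocument); // expired: rebuilt
console.log(builds); // 2
```

Under this design, a client that polls far more often than intended costs the database at most one rebuild per TTL window per environment, at the price of flag changes taking up to one TTL to become visible.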

Core API is not responding

Description: This incident has been resolved.

Status: Resolved

Impact: Critical | Started At: July 10, 2022, 5:53 p.m.

Updates:

Time: July 10, 2022, 9:15 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: July 10, 2022, 8:36 p.m.

Status: Monitoring

Update: We are still seeing issues with the ECS cluster scaling out correctly, again down to the recent AWS eu-west-2 outage. We are monitoring.
Time: July 10, 2022, 8:09 p.m.

Status: Monitoring

Update: The DB has recovered. API latency is high as our ECS cluster scales out.
Time: July 10, 2022, 7:51 p.m.

Status: Identified

Update: AWS are restoring services. We hope to have the API back up shortly.
Time: July 10, 2022, 6:24 p.m.

Status: Identified

Update: AWS have confirmed an outage in their EU-West-2 data center. We are monitoring and waiting on AWS to provide updates.
Time: July 10, 2022, 6:04 p.m.

Status: Identified

Update: This looks to be an issue with AWS where our Core API is located. We are investigating.
Time: July 10, 2022, 5:53 p.m.

Status: Investigating

Update: We are investigating an outage on our Core API.

Core API: Increased error rates

Description: As part of a new feature rollout, there was a large database migration that needed to take place. We knew that the migration would take some time, however, it should not have affected production traffic. Unfortunately, despite our health check returning unhealthy until all migrations are complete, AWS ECS promoted the new version of the API application before the migrations were complete. This meant that the code that was running was expecting certain columns / data to be available in the database which weren’t there yet. We are still investigating what caused ECS to promote the new version before the migrations were complete.

Status: Postmortem

Impact: None | Started At: July 1, 2022, 11:31 a.m.

Updates:

Time: July 1, 2022, 11:40 a.m.

Status: Postmortem

Update: The postmortem text is identical to the description above.
Time: July 1, 2022, 11:40 a.m.

Status: Resolved

Update: This incident has been resolved.
Time: July 1, 2022, 11:31 a.m.

Status: Investigating

Update: We are seeing increased 502 responses to our Core API. We are aware of the cause and working on a fix. The Edge API is unaffected.
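The postmortem for this incident mentions a health check that returns unhealthy until all migrations are complete. A minimal sketch of that kind of gate is shown below. The function name, response shape, and migration names are hypothetical, and the sketch says nothing about why ECS promoted the new version anyway, which the postmortem leaves open.

```typescript
// Hypothetical types and names; a sketch of the idea, not Flagsmith's health check.
interface HealthResponse {
  statusCode: 200 | 503;
  body: { status: "ok" | "migrating"; pendingMigrations: string[] };
}

/** Report unhealthy (503) while any required migration has not been applied. */
function healthCheck(required: string[], applied: Set<string>): HealthResponse {
  const pending = required.filter((name) => !applied.has(name));
  if (pending.length > 0) {
    return { statusCode: 503, body: { status: "migrating", pendingMigrations: pending } };
  }
  return { statusCode: 200, body: { status: "ok", pendingMigrations: [] } };
}

const required = ["0001_initial", "0002_add_columns"];
console.log(healthCheck(required, new Set(["0001_initial"])).statusCode); // 503
console.log(healthCheck(required, new Set(required)).statusCode); // 200
```

A gate like this only protects production traffic if the orchestrator waits for a healthy response before routing requests to the new tasks.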

Check the status of similar companies and alternatives to Flagsmith

Avalara

Systems Active

Crisis Text Line

Systems Active

Jamf

Systems Active

Mulesoft

Systems Active

Meltwater

Systems Active

HashiCorp

Systems Active

Datto

Issues Detected

Vox Media

Systems Active

Cradlepoint

Systems Active

Liferay

Systems Active

Zapier

Systems Active

Workato US

Systems Active

Frequently Asked Questions - Flagsmith

Is there a Flagsmith outage?

The current status of Flagsmith is: Systems Active

Where can I find the official status page of Flagsmith?

The official status page for Flagsmith is here

How can I get notified if Flagsmith is down or experiencing an outage?

To get notified of any status changes to Flagsmith, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Flagsmith every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here

What does Flagsmith do?

Flagsmith simplifies feature flag creation and management. Utilize their hosted API, deploy to a private cloud, or run on-premises.
