Flagsmith Status: Check if Flagsmith is down or having an outage.

Flagsmith outages and incidents

Outage and incident data over the last 30 days for Flagsmith.

There has been 1 outage or incident for Flagsmith in the last 30 days.

Severity Breakdown:

None: 1

Minor: 0

Major: 0

Critical: 0

Components and Services Monitored for Flagsmith

OutLogger tracks the status of these components for Flagsmith:

Component	Status
Admin Dashboard	Active
Core API	Active
Edge API	Active
Public Website	Active

Latest Flagsmith outages and incidents.

View the latest incidents for Flagsmith and check for official updates:

Issue affecting permissions for project admin users

Description: This incident has been resolved.

Status: Resolved

Impact: Minor | Started At: July 12, 2023, 1:46 p.m.

Updates:

Time: July 12, 2023, 4:17 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: July 12, 2023, 3:56 p.m.

Status: Monitoring

Update: A fix has been implemented and we are monitoring the results.
Time: July 12, 2023, 2:25 p.m.

Status: Identified

Update: The issue has been identified and a fix is being implemented.
Time: July 12, 2023, 1:47 p.m.

Status: Investigating

Update: We are continuing to investigate this issue.
Time: July 12, 2023, 1:46 p.m.

Status: Investigating

Update: We are investigating an issue where project admin users do not receive the inherited permissions on each environment in the project.

Slow response times for Edge API requests

Description:

## Timeline

At 12:15pm UTC, we were notified of increased response times on a number of our Edge API endpoints. Investigation showed nothing immediately obvious but we suspected that it could be caused by Sentry, our APM tool. We set about removing the Sentry initialisation from our code and deployed it as soon as we could.

At 12:48pm UTC, this change was deployed and we observed the response times decrease immediately. At 12:52pm UTC our monitoring confirmed that the average response time had returned to normal.

## Next Steps

* Look into improvements to reduce / remove the impact of Sentry issues on our Edge API.
* Decrease the shutdown timeout of the Sentry SDK.
* Look at using [Sentry relay](https://docs.sentry.io/product/relay) to remove the impact on core Edge API services.
* Add integration tests to simulate performance degradation / outages from all downstream services.
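The last next step, integration tests that simulate a degraded downstream service, could be sketched roughly as follows. This is only an illustration using the Python standard library: `slow_report`, `handle_request`, and the 0.25 second latency budget are hypothetical names and values, not Flagsmith code or Sentry APIs. It assumes that telemetry can be handed to a background thread so a slow backend does not add to request latency; whether that suits a given runtime is not something the postmortem states.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a telemetry client whose backend has become slow.
def slow_report(event: str, delay_s: float = 0.5) -> None:
    time.sleep(delay_s)

# Small pool that absorbs telemetry calls outside the request path (assumption).
_executor = ThreadPoolExecutor(max_workers=2)

def handle_request(payload: dict, report=slow_report) -> dict:
    """Build the response and hand telemetry to a background thread without waiting."""
    response = {"flags": payload.get("flags", [])}
    _executor.submit(report, "request_served")  # fire-and-forget
    return response

def timed_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Simulated degradation: the downstream call takes 0.5 s, the handler should not.
result, elapsed = timed_call(handle_request, {"flags": ["a"]})
LATENCY_BUDGET_S = 0.25  # hypothetical budget for this sketch
within_budget = elapsed < LATENCY_BUDGET_S

_executor.shutdown(wait=True)  # let the background report finish before exiting
```

A test along these lines fails as soon as the downstream call moves back onto the request path, which is the kind of regression the next step is meant to catch.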

Status: Postmortem

Impact: Minor | Started At: July 10, 2023, 12:29 p.m.

Updates:

Time: July 10, 2023, 1:23 p.m.

Status: Postmortem

Update: Postmortem published; the full text appears in the Description above.
Time: July 10, 2023, 12:58 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: July 10, 2023, 12:50 p.m.

Status: Monitoring

Update: The downstream service has been successfully removed. Response times have returned to normal. We are continuing to monitor the situation.
Time: July 10, 2023, 12:44 p.m.

Status: Identified

Update: We have identified an issue caused by a downstream service which is having a knock-on effect on our performance. We are currently deploying a change to remove the downstream service.
Time: July 10, 2023, 12:29 p.m.

Status: Investigating

Update: We are currently investigating this issue.

Erroneous flag values

Description:

## Summary of the issue

Following a release of the Core API at 10:35 UTC, a regression was introduced which meant that the generated environment document contained erroneous flag values for flags that had recent change requests which had not been committed (and had potentially been deleted). Since the environment document is used to generate the flags for the Edge API and for SDKs running in local evaluation mode, certain customers using these methods to evaluate their flags would have received erroneous flag values. In this situation, the flag values served were those that were included in the uncommitted change requests.

## Resolution steps

At 17:20 UTC we were notified of this issue by a customer that was affected by the erroneous values. At 17:59 UTC the issue was identified and a fix was being developed. This fix was fully developed and released by 19:10 UTC and all affected environments were regenerated by 19:30 UTC. The PR for the fix can be found [here](https://github.com/Flagsmith/flagsmith/pull/2378) for those interested in reviewing further.

## Next steps / preventative measures

In order to prevent issues like this in the future, we plan to expand our end-to-end testing suite to further cover our change request workflows so that we can identify these issues earlier.
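The failure mode described above can be illustrated with a minimal sketch. Every name here (`ChangeRequest`, `FeatureState`, `build_environment_document`) is hypothetical and chosen for illustration; it is not Flagsmith's data model or the code from the linked PR. The sketch assumes each flag value may be attached to a change request, and that only values with no change request or a committed one should reach the generated document.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

@dataclass(frozen=True)
class ChangeRequest:
    id: int
    committed: bool

@dataclass(frozen=True)
class FeatureState:
    feature: str
    value: str
    version: int
    change_request: Optional[ChangeRequest] = None

def is_live(state: FeatureState) -> bool:
    """A value is live if it has no change request or its change request is committed."""
    return state.change_request is None or state.change_request.committed

def build_environment_document(states: Iterable[FeatureState]) -> Dict[str, str]:
    """Keep the highest-version live value per feature; skip uncommitted values."""
    latest: Dict[str, FeatureState] = {}
    for state in states:
        if not is_live(state):
            continue  # the regression amounted to this check not being applied
        current = latest.get(state.feature)
        if current is None or state.version > current.version:
            latest[state.feature] = state
    return {name: state.value for name, state in latest.items()}

states = [
    FeatureState("banner", "off", version=1),
    FeatureState("banner", "on", version=2,
                 change_request=ChangeRequest(7, committed=False)),
    FeatureState("limit", "10", version=1),
    FeatureState("limit", "20", version=2,
                 change_request=ChangeRequest(8, committed=True)),
]
document = build_environment_document(states)
```

Here `banner` stays `"off"` because its newer value belongs to an uncommitted change request, while `limit` moves to `"20"` because its change request was committed.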

Status: Postmortem

Impact: Minor | Started At: July 4, 2023, 10:30 a.m.

Updates:

Time: July 5, 2023, 10:02 a.m.

Status: Postmortem

Update: Postmortem published; the full text appears in the Description above.
Time: July 5, 2023, 8:43 a.m.

Status: Resolved

Update: Customer reports of erroneous flag values being served in local evaluation mode.

Environment based integrations not working for Edge API

Description:

On June 7th at 10:33 UTC we released a change to our Edge API as part of [this issue](https://github.com/Flagsmith/flagsmith/issues/430) that filters out server-side only features when a client API key is used. These changes also affected the logic responsible for triggering environment-level integrations, causing them to fail.

**Which integrations were affected?**

* Mixpanel
* Segment
* Heap
* Webhooks
* Amplitude

On June 15th at 22:25 UTC we were notified by a customer that they were not seeing data populated from their integration. First thing on June 16th the engineering team began investigating the issue. The team immediately identified that the change described above had changed the signature of a function that was also used by the integrations logic. Unfortunately, this change had not been picked up by our tests or code review, and the subsequent errors were not picked up by our monitoring.

**Why didn’t our tests pick this up?**

The unit tests covering the integrations logic utilised mocking, meaning that the change to the method signature was not correctly identified as an issue, and our end-to-end test suite did not include the verification of successful integrations.

**Why didn’t our monitoring pick this up?**

The monitoring in place to track the error rate on the function responsible was using an incorrect aggregation algorithm, meaning that the threshold for alert was never breached.

At 18:03 UTC on June 16th a fix was released and service was resumed to all environment-based identity integrations (listed above). Unfortunately, due to the implementation of the asynchronous call to the lambda function that handles the integrations, it is not possible to recover the data that was lost during this period.

### What are we doing to prevent this from happening in the future

* Improving our unit tests to rely less on mocks and, where they do rely on mocks, ensuring they utilise `spec` correctly (see unittest documentation [here](https://docs.python.org/3/library/unittest.mock.html) for further reading)
* Extending our E2E testing suite on the Edge API to include tests for all integrations.
* Alerting & monitoring:
  * Immediately this morning, we are changing our alerting to use the correct aggregation algorithm
  * In the near future, we will be improving these alerts to use percentiles and anomaly detection to ensure that errors are picked up quicker and are more accurate
  * Introducing Sentry to better track error logs that are reported from the Edge API
* Moving asynchronous invocations of other lambda functions to use persistent queues and/or an event messaging system so that, after issues such as this, tasks can be re-run to ensure no data is lost.
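The point about mocks and `spec` can be shown with a small sketch. The functions `build_identity_payload` and `trigger_integrations` are hypothetical and are not Flagsmith code; the sketch uses `unittest.mock.create_autospec`, a spec-based helper from the standard library, as one way of making a mock enforce the real call signature. A plain `Mock()` accepts any arguments, so a caller that still uses an outdated signature goes unnoticed.

```python
from unittest import mock

# Hypothetical function whose signature changed: `api_key_type` is newly required.
def build_identity_payload(identity_id: str, traits: dict, api_key_type: str) -> dict:
    return {"id": identity_id, "traits": traits, "key_type": api_key_type}

# Hypothetical caller that was not updated and still passes two arguments.
def trigger_integrations(identity_id: str, traits: dict, build=build_identity_payload):
    return build(identity_id, traits)

def call_raises_type_error(build) -> bool:
    """Return True if calling the outdated caller with this dependency raises TypeError."""
    try:
        trigger_integrations("user-1", {"plan": "free"}, build=build)
    except TypeError:
        return True
    return False

# A plain Mock accepts any call, so the stale call site passes silently.
plain_mock_catches = call_raises_type_error(mock.Mock())

# An autospecced mock copies the real signature and rejects the stale call.
autospec_catches = call_raises_type_error(
    mock.create_autospec(build_identity_payload)
)
```

With the plain mock the outdated call succeeds; with the autospecced mock it raises `TypeError`, so a unit test built on it would have failed when the signature changed.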

Status: Postmortem

Impact: Minor | Started At: June 16, 2023, 5:40 p.m.

Updates:

Time: June 21, 2023, 10:26 a.m.

Status: Postmortem

Update: Postmortem published; the full text appears in the Description above.
Time: June 16, 2023, 6:13 p.m.

Status: Resolved

Update: Issue has been resolved. Integrations all tested as working. Post-Mortem report to come next week.
Time: June 16, 2023, 5:40 p.m.

Status: Identified

Update: We are aware of issues firing environment-based integrations in our Edge API. We have identified a fix and are testing it now. Performance of the Edge API itself is not affected.
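The postmortem for this incident says the error-rate alert used an incorrect aggregation but does not say which one. As a purely hypothetical illustration of how the choice of aggregation can keep an alert below its threshold, the sketch below compares the mean of per-minute error ratios with the ratio of total errors to total invocations. The sample data, the 0.5 threshold, and the function names are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical per-minute samples: (invocations, errors).
# Most minutes are idle; every invocation that does happen fails.
samples: List[Tuple[int, int]] = [(0, 0)] * 8 + [(5, 5), (3, 3)]

ALERT_THRESHOLD = 0.5  # hypothetical: alert when the error rate exceeds 50%

def mean_of_ratios(data: List[Tuple[int, int]]) -> float:
    """Average the per-minute error ratios, counting idle minutes as 0."""
    ratios = [(errors / calls if calls else 0.0) for calls, errors in data]
    return sum(ratios) / len(ratios)

def ratio_of_sums(data: List[Tuple[int, int]]) -> float:
    """Divide total errors by total invocations over the whole window."""
    total_calls = sum(calls for calls, _ in data)
    total_errors = sum(errors for _, errors in data)
    return total_errors / total_calls if total_calls else 0.0

averaged = mean_of_ratios(samples)   # idle minutes dilute the failures
overall = ratio_of_sums(samples)     # reflects that every call failed

averaged_alerts = averaged > ALERT_THRESHOLD
overall_alerts = overall > ALERT_THRESHOLD
```

In this example the averaged rate is 0.2 and stays under the threshold, while the overall rate is 1.0 and would trigger the alert, even though both are computed from the same samples.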

Delayed Flag updates

Description: This incident has been resolved.

Status: Resolved

Impact: Minor | Started At: April 4, 2023, 8:17 a.m.

Updates:

Time: April 4, 2023, 8:22 a.m.

Status: Resolved

Update: This incident has been resolved.
Time: April 4, 2023, 8:18 a.m.

Status: Identified

Update: The issue has been identified and a fix is being implemented.
Time: April 4, 2023, 8:17 a.m.

Status: Investigating

Update: We are seeing increased task queue sizes, which are delaying flag updates from propagating to our Edge API. We have isolated the issue and are working on a fix.

Check the status of similar companies and alternatives to Flagsmith

Avalara

Systems Active

Crisis Text Line

Systems Active

Jamf

Systems Active

Mulesoft

Systems Active

Meltwater

Systems Active

HashiCorp

Systems Active

Datto

Issues Detected

Vox Media

Systems Active

Cradlepoint

Systems Active

Liferay

Systems Active

Zapier

Systems Active

Workato US

Systems Active

Frequently Asked Questions - Flagsmith

Is there a Flagsmith outage?

The current status of Flagsmith is: Systems Active

Where can I find the official status page of Flagsmith?

The official status page for Flagsmith is here

How can I get notified if Flagsmith is down or experiencing an outage?

To get notified of any status changes to Flagsmith, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Flagsmith every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here

What does Flagsmith do?

Flagsmith simplifies feature flag creation and management. Utilize their hosted API, deploy to a private cloud, or run on-premises.
