InfluxDB Status: Check if InfluxDB is down or having an outage.

InfluxDB outages and incidents

Outage and incident data over the last 30 days for InfluxDB.

There has been 1 outage or incident for InfluxDB in the last 30 days.

Severity Breakdown:

None: 0

Minor: 1

Major: 0

Critical: 0

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Components and Services Monitored for InfluxDB

OutLogger tracks the status of these components for InfluxDB:

AWS: Sydney (Discontinued)

API Queries Active

API Writes Active

Compute Active

Other Active

Persistent Storage Active

Tasks Active

Web UI Active

Cloud Dedicated

API Reads Active

API Writes Active

Management API Active

Cloud Serverless: AWS, EU-Central

API Queries Active

API Writes Active

Compute Active

Other Active

Persistent Storage Active

Tasks Active

Web UI Active

Cloud Serverless: AWS, US-East-1

API Queries Active

API Writes Active

Compute Active

Other Active

Persistent Storage Active

Tasks Active

Web UI Active

Cloud Serverless: AWS, US-West-2-1

API Queries Active

API Writes Active

Compute Active

Other Active

Persistent Storage Active

Tasks Active

Web UI Active

Cloud Serverless: AWS, US-West-2-2

API Queries Active

API Writes Active

Compute Active

Other Active

Persistent Storage Active

Tasks Active

Web UI Active

Cloud Serverless: Azure, East US

API Queries Active

API Writes Active

Compute Active

Other Active

Persistent Storage Active

Tasks Active

Web UI Active

Cloud Serverless: Azure, W. Europe

API Queries Active

API Writes Active

Compute Active

Other Active

Persistent Storage Active

Tasks Active

Web UI Active

Cloud Serverless: GCP

API Queries Active

API Writes Active

Compute Active

Other Active

Persistent Storage Active

Tasks Active

Web UI Active

Google Cloud: Belgium (Discontinued)

API Queries Active

API Writes Active

Compute Active

Other Active

Persistent Storage Active

Tasks Active

Web UI Active

Other Services

Auth0 User Authentication Active

Marketplace integrations Active

Web UI Authentication (Auth0) Active

Latest InfluxDB outages and incidents.

View the latest incidents for InfluxDB and check for official updates:

Login issues affecting multiple regions

Description: This incident has been resolved.

Status: Resolved

Impact: Critical | Started At: March 13, 2023, 4:22 p.m.

Updates:

Time: March 13, 2023, 7:59 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: March 13, 2023, 4:40 p.m.

Status: Monitoring

Update: A fix has been implemented and we are monitoring the results.
Time: March 13, 2023, 4:22 p.m.

Status: Investigating

Update: We are currently investigating this issue.

Elevated Error Rate -- eu-central

Description: We are continuing to monitor but there are no delays in writes or other issues at this time.

Status: Resolved

Impact: Minor | Started At: March 11, 2023, 1:16 a.m.

Updates:

Time: March 11, 2023, 2:11 a.m.

Status: Resolved

Update: We are continuing to monitor but there are no delays in writes or other issues at this time.
Time: March 11, 2023, 2:10 a.m.

Status: Investigating

Update: We are continuing to investigate this issue.
Time: March 11, 2023, 1:17 a.m.

Status: Investigating

Update: We are continuing to investigate this issue.
Time: March 11, 2023, 1:16 a.m.

Status: Investigating

Update: We are currently investigating this issue.

Reads-Writes affected in eu-central-1

Description: Incident RCA: 503 errors in eu-central on March 7, 2023

Summary

We use HashiCorp's Vault product to secure access to secrets within our clusters. Vault is configured in a highly available active-standby configuration, with one active instance and two standby instances (the recommended HA configuration). On March 7, 2023, at 09:26 UTC, we saw a high rate of errors on the write and read paths in eu-central because Vault had become unreachable. We restarted Vault, then restarted the query, write, and storage pods in a controlled manner, and the cluster recovered.

Cause of the Incident

The root cause was that our Vault infrastructure became overloaded. We use Vault to securely handle authorization tokens and store customer secrets. The number of requests became too high for Vault to respond in a timely manner, causing requests to fail. Because the write and read services rely on secrets in Vault to process customer data, these services started failing health checks and were restarted in an attempt to recover. Each restart increased the load on Vault further, creating a positive feedback loop from which the system could not recover without intervention.

Recovery

As we had learned in the incident of Feb 24, 2023, Vault can become overwhelmed when too many pods try to connect to it at the same time. Our remedial action was to scale down the worker pools for all customer-facing services to reduce the load on Vault so that it could recover, and then to slowly increase them back to their previous levels in a controlled manner to avoid stressing Vault further. This returned the cluster to normal operation.

Future mitigations

1. We are reviewing our Vault readiness check to determine why the active Vault instance did not fail over to a standby when it was no longer able to respond to incoming requests.
2. We are investigating strategies to avoid having so many components rely on Vault access to work correctly.
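The recovery described above (scale the worker pools down so Vault can recover, then grow them back in a controlled manner) can be sketched in Python. This is an illustration under stated assumptions, not InfluxDB's tooling: the per-step growth cap `max_step_fraction`, the Vault latency signal, and `latency_budget_ms` are all hypothetical parameters chosen for the example.

```python
import math
from typing import List


def ramp_schedule(start: int, target: int, max_step_fraction: float = 0.5) -> List[int]:
    """Replica counts to step through when scaling a worker pool back up.

    Each step grows the pool by at most `max_step_fraction` of its current size
    (and by at least one replica), so the number of new pods connecting to
    Vault at once stays bounded. Hypothetical pacing rule, for illustration.
    """
    if start < 1 or target < start:
        raise ValueError("need 1 <= start <= target")
    steps = [start]
    current = start
    while current < target:
        growth = max(1, math.floor(current * max_step_fraction))
        current = min(target, current + growth)
        steps.append(current)
    return steps


def next_size(current: int, target: int, vault_latency_ms: float,
              latency_budget_ms: float = 200.0, max_step_fraction: float = 0.5) -> int:
    """Advance one step only while the (hypothetical) Vault latency looks healthy."""
    if vault_latency_ms > latency_budget_ms:
        return current  # hold: adding pods now would feed the overload
    growth = max(1, math.floor(current * max_step_fraction))
    return min(target, current + growth)


if __name__ == "__main__":
    print(ramp_schedule(start=4, target=40))  # [4, 6, 9, 13, 19, 28, 40]
```

Capping each step at a fraction of the current pool size means the number of pods connecting to Vault at the same moment grows with the pool instead of jumping straight to the full target, which is the situation both RCAs identify as overwhelming Vault.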

Status: Postmortem

Impact: None | Started At: March 7, 2023, 9:53 a.m.

Updates:

Time: March 17, 2023, 9:47 p.m.

Status: Postmortem

Update: See the RCA in the description above.
Time: March 7, 2023, 8:48 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: March 7, 2023, 5:09 p.m.

Status: Monitoring

Update: We are continuing to monitor for any further issues.
Time: March 7, 2023, 1:21 p.m.

Status: Monitoring

Update: Reads and writes are back to normal levels. We are continuing to monitor.
Time: March 7, 2023, 12:31 p.m.

Status: Identified

Update: Writes have been returned to full capacity. We are now slowly increasing the query capacity back to previous levels.
Time: March 7, 2023, 11:19 a.m.

Status: Identified

Update: The dev team has identified the issue and is working on a fix. We will have further updates soon.
Time: March 7, 2023, 10:28 a.m.

Status: Investigating

Update: The investigation continues. There are no further updates.
Time: March 7, 2023, 9:53 a.m.

Status: Investigating

Update: The developers are currently working on this. We will keep you updated on the progress.

Write and read outage in AWS: Frankfurt, EU-Central-1, AWS: Oregon, US-West-2-1 and AWS: Virginia, US-East-1

Description: Incident RCA: Write and read outage in AWS: Frankfurt, EU-Central-1, AWS: Oregon, US-West-2-1 and AWS: Virginia, US-East-1

Summary

On Feb 24, 2023, at 19:30 UTC, we deployed a software change to multiple production clusters, which caused a significant percentage of writes and queries to fail in our larger clusters. Both the duration of the outage and the level of disruption (the percentage of writes and queries that failed during the incident) differed from cluster to cluster. The table below summarizes the time ranges during which service was impacted in each cluster (all times UTC).

| Cluster | Write failure start | Write failure end | Query failure start | Query failure end |
| --- | --- | --- | --- | --- |
| prod01-us-west-2 | 19:38 | 22:17 | 19:36 | 22:20 |
| prod01-eu-central-1 | 19:36 | 23:49 | 19:34 | 23:38 |
| prod101-us-east-1 | 19:34 | 22:44 | 19:34 | 00:44 (Feb 25) |

Cause of the Incident

Our software is deployed via a CD pipeline to three staging clusters (one per cloud provider), where a suite of automated tests is run. If those tests pass, it is deployed to an internal cluster, where another round of testing occurs, and finally it is deployed to all of our production clusters in parallel. This is our standard software deployment methodology for our cloud service.

On February 24, 2023, an engineer changed a health check to ensure that our query and write pods can reach the vault within the cluster (where credentials are managed). In the past, a query or write pod could get stuck if it lost access to the vault. To address that problem, a health check was added so that if a pod could not reach the vault, the pod would stop and restart automatically. This health check was tested in all three staging clusters and worked fine. The change was promoted to our internal cluster, where it also worked fine, and then to our production clusters.

In the larger clusters, when the pods were restarted (with the new health check in place), too many pods made health-check calls to the vault in quick succession. These calls overwhelmed the vault, and it was unable to service all the requests. As the health check failed, the pods attempted to recover by restarting, which put an even heavier load on the vault, from which it was unable to recover.

Recovery

As soon as we detected the problem and identified the offending software change, we rolled back to an earlier version of our production software and redeployed it to all production clusters. In our smaller clusters, this happened quickly, without any significant customer impact. In our three largest clusters (the clusters listed above), the vault was deadlocked, so we were unable to deploy the rolled-back version without manually restarting the vault instances and then gradually restarting the services that depend on the vault. This is why recovery took longer in these clusters.

Future mitigations

1. We are re-implementing the offending health check so that we can detect a stuck pod without putting such a burden on the vault.
2. As the vault is a critical element of our service, we are adding an extra peer-review step to all software changes that interact with the vault.
3. We are enhancing the vault configuration so that the vault degrades more gracefully when overloaded.
4. We are enhancing our runbooks so that we can intervene more quickly with manual steps if the regular deployment/rollback process fails, reducing our overall time to recover when a cluster fails to recover normally.
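Mitigation 1 does not say how the health check will be re-implemented. One common way to detect a lost vault connection without one vault call per probe per pod is to cache the result of the vault check for a short, jittered interval. The Python sketch below assumes a hypothetical `probe` callable that performs the actual vault request; the class name `CachedVaultCheck`, the TTL, and the jitter values are illustrative and are not InfluxDB's implementation.

```python
import random
import time
from typing import Callable, Optional


class CachedVaultCheck:
    """Health check that calls the vault at most once per jittered TTL.

    Probes may fire every few seconds on every pod; answering most of them from
    a cached result keeps vault traffic roughly proportional to
    pods / (ttl_s + jitter) rather than pods / probe_period.
    """

    def __init__(
        self,
        probe: Callable[[], bool],  # hypothetical: returns True if the vault answered
        ttl_s: float = 30.0,
        jitter_s: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
        rng: Optional[random.Random] = None,
    ) -> None:
        self._probe = probe
        self._ttl_s = ttl_s
        self._jitter_s = jitter_s
        self._clock = clock
        self._rng = rng if rng is not None else random.Random()
        self._last_ok: Optional[bool] = None
        self._expires_at = 0.0

    def healthy(self) -> bool:
        now = self._clock()
        if self._last_ok is None or now >= self._expires_at:
            try:
                self._last_ok = bool(self._probe())
            except Exception:
                self._last_ok = False  # timeouts and errors count as "vault unreachable"
            # A random extra delay spreads refreshes out, so pods restarted
            # together do not all call the vault again at the same instant.
            self._expires_at = now + self._ttl_s + self._rng.uniform(0.0, self._jitter_s)
        return self._last_ok
```

The trade-off is detection latency: a pod that really has lost the vault is only noticed after up to `ttl_s + jitter_s` seconds, which is the price of bounding vault load. Kubernetes probe settings such as `initialDelaySeconds` and `failureThreshold` are complementary knobs for spreading out and tolerating failures.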

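The impact windows in the table above are given as start and end times rather than durations, and the us-east-1 query window ends at 00:44, after midnight UTC. A small Python sketch that converts each window to a duration, assuming (as the table implies) that every window starts on Feb 24, 2023:

```python
from datetime import date, datetime, timedelta

# Impact windows (UTC) copied from the RCA table; every window starts on Feb 24, 2023.
WINDOWS = {
    "prod01-us-west-2": {"write": ("19:38", "22:17"), "query": ("19:36", "22:20")},
    "prod01-eu-central-1": {"write": ("19:36", "23:49"), "query": ("19:34", "23:38")},
    "prod101-us-east-1": {"write": ("19:34", "22:44"), "query": ("19:34", "00:44")},
}

START_DAY = date(2023, 2, 24)


def minutes_impacted(start: str, end: str) -> int:
    """Length of an HH:MM to HH:MM window in minutes; an end before the start rolls to the next day."""
    t0 = datetime.combine(START_DAY, datetime.strptime(start, "%H:%M").time())
    t1 = datetime.combine(START_DAY, datetime.strptime(end, "%H:%M").time())
    if t1 < t0:
        t1 += timedelta(days=1)  # e.g. 19:34 -> 00:44 crosses midnight UTC
    return int((t1 - t0).total_seconds()) // 60


if __name__ == "__main__":
    for cluster, paths in WINDOWS.items():
        for path, (start, end) in paths.items():
            m = minutes_impacted(start, end)
            print(f"{cluster:<20} {path:<5} {m // 60}h{m % 60:02d}m")
```

Run on the table's values, this gives write-path outages of 2h39m (us-west-2), 4h13m (eu-central-1), and 3h10m (us-east-1), and query-path outages of 2h44m, 4h04m, and 5h10m respectively; the us-east-1 query path was impacted the longest.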
Status: Postmortem

Impact: Critical | Started At: Feb. 24, 2023, 8 p.m.

Updates:

Time: March 1, 2023, 1:34 a.m.

Status: Postmortem

Update: See the RCA in the description above.
Time: Feb. 25, 2023, 1:55 a.m.

Status: Resolved

Update: The issue has been fully resolved in all regions. We will continue to monitor.
Time: Feb. 25, 2023, 1:40 a.m.

Status: Monitoring

Update: The issue has been fully resolved in all regions. We will continue to monitor.
Time: Feb. 25, 2023, 1:21 a.m.

Status: Monitoring

Update: Writes and reads are back in AWS: Virginia, US-East-1, and we are continuing to monitor for any further issues.
Time: Feb. 25, 2023, 1:19 a.m.

Status: Monitoring

Update: We are continuing to monitor for any further issues.
Time: Feb. 25, 2023, 1:04 a.m.

Status: Monitoring

Update: Writes and reads are down in AWS: Virginia, US-East-1.
Time: Feb. 25, 2023, 1:02 a.m.

Status: Monitoring

Update: We are continuing to monitor for any further issues.
Time: Feb. 25, 2023, 12:32 a.m.

Status: Monitoring

Update: Writes and reads are working in all regions now.
Time: Feb. 24, 2023, 10:50 p.m.

Status: Investigating

Update: Writes and reads in AWS: Oregon, US-West-2-1 and AWS: Virginia, US-East-1 are recovering; we are still working on AWS: Frankfurt, EU-Central-1.
Time: Feb. 24, 2023, 10:44 p.m.

Status: Investigating

Update: We are continuing to investigate this issue.
Time: Feb. 24, 2023, 8 p.m.

Status: Investigating

Update: We are currently investigating this issue.

Check the status of similar companies and alternatives to InfluxDB

Smartsheet

Systems Active

ESS (Public)

Systems Active

Cloudera

Systems Active

New Relic

Systems Active

Boomi

Systems Active

AppsFlyer

Systems Active

Imperva

Systems Active

Bazaarvoice

Issues Detected

Optimizely

Systems Active

Electric

Systems Active

ABBYY

Issues Detected

Frequently Asked Questions - InfluxDB

Is there an InfluxDB outage?

The current status of InfluxDB is: Systems Active

Where can I find the official status page of InfluxDB?

The official status page for InfluxDB is here

How can I get notified if InfluxDB is down or experiencing an outage?

To get notified of any status changes to InfluxDB, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of InfluxDB every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here.

What does InfluxDB do?

InfluxDB lets you efficiently store and access time series data in a specialized database designed for speed, available in cloud, on-premises, or edge environments.
