Last checked: 2 minutes ago
Outage and incident data over the last 30 days for InfluxDB.
Outlogger tracks the status of these components for InfluxDB:
Component | Status |
---|---|
**AWS: Sydney (Discontinued)** | Active |
API Queries | Active |
API Writes | Active |
Compute | Active |
Other | Active |
Persistent Storage | Active |
Tasks | Active |
Web UI | Active |
**Cloud Dedicated** | Active |
API Reads | Active |
API Writes | Active |
Management API | Active |
**Cloud Serverless: AWS, EU-Central** | Active |
API Queries | Active |
API Writes | Active |
Compute | Active |
Other | Active |
Persistent Storage | Active |
Tasks | Active |
Web UI | Active |
**Cloud Serverless: AWS, US-East-1** | Active |
API Queries | Active |
API Writes | Active |
Compute | Active |
Other | Active |
Persistent Storage | Active |
Tasks | Active |
Web UI | Active |
**Cloud Serverless: AWS, US-West-2-1** | Active |
API Queries | Active |
API Writes | Active |
Compute | Active |
Other | Active |
Persistent Storage | Active |
Tasks | Active |
Web UI | Active |
**Cloud Serverless: AWS, US-West-2-2** | Active |
API Queries | Active |
API Writes | Active |
Compute | Active |
Other | Active |
Persistent Storage | Active |
Tasks | Active |
Web UI | Active |
**Cloud Serverless: Azure, East US** | Active |
API Queries | Active |
API Writes | Active |
Compute | Active |
Other | Active |
Persistent Storage | Active |
Tasks | Active |
Web UI | Active |
**Cloud Serverless: Azure, W. Europe** | Active |
API Queries | Active |
API Writes | Active |
Compute | Active |
Other | Active |
Persistent Storage | Active |
Tasks | Active |
Web UI | Active |
**Cloud Serverless: GCP** | Active |
API Queries | Active |
API Writes | Active |
Compute | Active |
Other | Active |
Persistent Storage | Active |
Tasks | Active |
Web UI | Active |
**Google Cloud: Belgium (Discontinued)** | Active |
API Queries | Active |
API Writes | Active |
Compute | Active |
Other | Active |
Persistent Storage | Active |
Tasks | Active |
Web UI | Active |
**Other Services** | Active |
Auth0 User Authentication | Active |
Marketplace integrations | Active |
Web UI Authentication (Auth0) | Active |
View the latest incidents for InfluxDB and check for official updates:
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: June 19, 2024, 6:35 p.m.
Description:

# RCA: Sustained query error rate in AWS eu-central-1 on June 7, 2024

### Background

Data stored in InfluxDB Cloud is distributed across 64 partitions. Distribution is performed using a persistent hash of the series key, with the intent that write and query load will, on average, be spread evenly across the partitions.

When an InfluxDB Cloud user writes data, the writes first go into a durable queue; storage pods consume ingest data from the other end of this queue rather than handling user writes directly. Among other things, this allows writes to be accepted even if storage is encountering issues. One of the metrics used to reflect the status of this pipeline is [Time To Become Readable (TTBR)](https://docs.influxdata.com/influxdb/cloud/reference/internals/ttbr/), the time between a write being accepted and its data becoming available to queries after passing through the queue.

To respond to a query, the compute tier needs to request relevant data from each of the 64 partitions. For a query to succeed, the compute tier **must** receive a response from every partition; this ensures that incomplete results are never returned. Each partition has multiple pods responsible for it, and query activity is distributed across them.

### Start of Incident

On June 7, 2024, partition 44 started to report large increases in TTBR. This meant that, while customers' writes were being safely accepted into the durable queue, they were delayed in becoming available to queries. At around the same time, alerts were received indicating an elevated query failure rate, accompanied by an increase in query queue depth.

### Investigation

Investigation showed that the pods responsible for partition 44 were periodically trying to consume more RAM than permitted, causing them to exit and report an out-of-memory (OOM) event. InfluxData allocated additional RAM to the pods to try to mitigate the customer-facing impact quickly. However, the pods continued to OOM, so the investigation moved on to identifying the source of the excessive resource usage.

In a multi-tenant system, a single user's resource usage impacting other users is known as a noisy-neighbor issue. The best way to address the problem is to identify the tenant whose query is responsible and temporarily block their queries while we engage with them to correct the problematic query. In this case, the customer had automated the execution of a query that attempted to run an in-memory `sort()` against data taken from a particularly dense series. With the problematic query being submitted regularly, more and more RAM was consumed until the storage pods ultimately OOMed.

As a result of these repeated OOMs, Kubernetes moved the pods into a CrashLoopBackOff state, which lengthened the recovery time between each OOM. The extended recovery periodically left all pods responsible for partition 44 offline, preventing the query tier from authoritatively answering queries.

### Actions

We are working on several changes to better identify the source of incidents and reduce the likelihood of them recurring. These changes include:

* Better visualization to make it easier to identify small noisy neighbors
* Disabling CrashLoopBackOff for storage pods, once support for doing so has been added to Kubernetes
Status: Postmortem
Impact: Major | Started At: June 7, 2024, 10:37 p.m.
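The background section of the RCA above describes two behaviours worth making concrete: series keys are mapped to one of 64 partitions with a persistent hash, and a query only succeeds if every partition responds. The Python sketch below is illustrative only; the hash function, series-key format, and merge logic are assumptions rather than InfluxData's implementation, but it shows why a single unhealthy partition (such as partition 44 during this incident) fails whole queries instead of returning partial results.

```python
# Illustrative sketch only -- not InfluxData's code. It mimics two behaviours
# from the RCA: stable hashing of a series key onto one of 64 partitions, and
# a query fan-out that refuses to return incomplete results.
import hashlib

NUM_PARTITIONS = 64  # per the RCA, data is spread across 64 partitions


def partition_for(series_key: str) -> int:
    """Map a series key to a partition with a persistent (stable) hash."""
    digest = hashlib.sha256(series_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def merge_query_results(partition_responses: dict) -> list:
    """Merge per-partition results, but only if every partition answered.

    The compute tier must hear back from all 64 partitions; if any are
    missing (e.g. partition 44 while its pods were in CrashLoopBackOff),
    the whole query fails rather than returning incomplete data.
    """
    missing = [p for p in range(NUM_PARTITIONS) if p not in partition_responses]
    if missing:
        raise RuntimeError(f"query failed: no response from partitions {missing}")
    merged = []
    for p in range(NUM_PARTITIONS):
        merged.extend(partition_responses[p])
    return merged


# The same series key always lands on the same partition.
print(partition_for("cpu,host=server01,region=eu-central-1"))
```

Because the write path goes through the durable queue, a partition in this state still accepts writes; its TTBR grows instead, which is exactly the symptom that opened the incident.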
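The investigation section points to an automated query running an in-memory `sort()` over a particularly dense series. A hypothetical Flux query of that shape, submitted through the official `influxdb-client` Python package, might look like the sketch below; the bucket, measurement, endpoint, and credentials are placeholders, and this is a reconstruction of the query pattern described, not the customer's actual query.

```python
# Hypothetical reconstruction of the query shape described in the RCA -- not
# the customer's actual query. A wide range() over a dense series followed by
# sort() forces everything matched to be held in memory before any row can be
# emitted, which is the kind of memory growth the postmortem describes.
from influxdb_client import InfluxDBClient

FLUX = """
from(bucket: "example-bucket")                      // placeholder bucket
  |> range(start: -30d)                             // wide window over a dense series
  |> filter(fn: (r) => r._measurement == "sensor")  // placeholder measurement
  |> sort(columns: ["_value"])                      // in-memory sort of the full result
"""

# Endpoint, token, and org are placeholders for an InfluxDB Cloud account.
with InfluxDBClient(url="https://eu-central-1-1.aws.cloud2.influxdata.com",
                    token="<api-token>", org="<org>") as client:
    for table in client.query_api().query(FLUX):
        for record in table.records:
            print(record.get_value())
```

Run on an automated schedule, a query like this repeatedly claims a large slab of RAM on the storage pods serving that series' partition, matching the OOM and CrashLoopBackOff cycle the RCA describes.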
Description: This incident has been resolved.
Status: Resolved
Impact: Critical | Started At: May 27, 2024, 8:25 a.m.