Is there an InfluxDB outage?

InfluxDB status: Systems Active

Last checked: 2 minutes ago

Get notified about any outages, downtime, or incidents for InfluxDB and 1,800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

InfluxDB outages and incidents

Outage and incident data over the last 30 days for InfluxDB.

There have been 2 outages or incidents for InfluxDB in the last 30 days.

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for InfluxDB

OutLogger tracks the status of these components for InfluxDB (the same core component set is monitored in each InfluxDB Cloud region):

API Queries Active
API Writes Active
Compute Active
Other Active
Persistent Storage Active
Tasks Active
Web UI Active
API Reads Active
Management API Active
Auth0 User Authentication Active
Marketplace integrations Active
Web UI Authentication (Auth0) Active

Latest InfluxDB outages and incidents.

View the latest incidents for InfluxDB and check for official updates:

Updates:

  • Time: June 20, 2024, 9:14 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: June 19, 2024, 11:22 p.m.
    Status: Monitoring
    Update: A fix has been implemented and we are monitoring the results.
  • Time: June 19, 2024, 10:01 p.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: June 19, 2024, 8:12 p.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: June 19, 2024, 6:35 p.m.
    Status: Investigating
    Update: We are currently investigating an increase in rate of read errors in the Azure US East region.

Updates:

  • Time: June 13, 2024, 3:08 a.m.
    Status: Postmortem
    Update: # RCA **Sustained query error rate in AWS eu-central-1 on June 7, 2024**

    ### Background
    Data stored in InfluxDB Cloud is distributed across 64 partitions. Distribution is performed using a persistent hash of the series key, with the intent that write and query load will, on average, be spread evenly across partitions. When an InfluxDB Cloud user writes data, their writes first go into a durable queue; storage pods then consume ingest data from the other end of this queue rather than being written to directly by users. Among other things, this allows writes to be accepted even if storage is encountering issues. One of the metrics used to reflect the health of this pipeline is [Time To Become Readable (TTBR)](https://docs.influxdata.com/influxdb/cloud/reference/internals/ttbr/), the time between a write being accepted and its data passing through the queue so that it is available to queries. To respond to a query, the compute tier requests relevant data from each of the 64 partitions. For a query to succeed, the compute tier **must** receive a response from every partition (this ensures that incomplete results are not returned). Each partition has multiple pods responsible for it, and query activity is distributed across them.

    ### Start of Incident
    On June 7, 2024, partition 44 started to report large increases in TTBR. This meant that, while customers' writes were being safely accepted into the durable queue, they were delayed in becoming available to queries. At around the same time, alerts were received indicating an elevated query failure rate, accompanied by an increase in query queue depth.

    ### Investigation
    Investigation showed that the pods responsible for partition 44 were periodically trying to consume more RAM than permitted, causing them to exit and report an out-of-memory (OOM) event. InfluxData allocated additional RAM to the pods to try to mitigate the customer-facing impact quickly. However, the pods continued to OOM, so the investigation moved on to identifying the source of the excessive resource usage. In a multi-tenant system, the resource usage of a single user impacting other users is known as a noisy neighbor issue. The best way to address the problem is to identify the tenant that is the source of the problematic query and temporarily block their queries while engaging with them to correct the query. In this case, the customer had automated the execution of a query that attempted to run an in-memory _sort()_ against data taken from a particularly dense series. With the problematic query being submitted regularly, more RAM was consumed until the storage pods ultimately OOMed. As a result of these regular OOMs, Kubernetes moved the pods into a CrashLoopBackOff state, which lengthened the recovery time between each OOM. The extended recovery periodically left all pods responsible for partition 44 offline, preventing the query tier from authoritatively answering queries.

    ### Actions
    We are working on several changes to better identify the source of incidents and reduce the likelihood of them occurring in the future. These changes include:
    * Better visualization to make it easier to identify small noisy neighbors
    * Disabling CrashLoopBackOff for storage pods once Kubernetes supports doing so
  • Time: June 8, 2024, 1:08 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: June 8, 2024, 12:51 a.m.
    Status: Monitoring
    Update: We are continuing to monitor for any further issues.
  • Time: June 8, 2024, 12:51 a.m.
    Status: Monitoring
    Update: The issue has been addressed and we are monitoring
  • Time: June 8, 2024, 12:50 a.m.
    Status: Investigating
    Update: The issue has been addressed and we are monitoring
  • Time: June 8, 2024, 12:02 a.m.
    Status: Investigating
    Update: We have added more capacity to support the increased query workload that we are seeing and continue to investigate.
  • Time: June 7, 2024, 10:37 p.m.
    Status: Investigating
    Update: We are currently investigating this issue.
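
The postmortem above assumes some familiarity with how InfluxDB Cloud spreads series across partitions. As a rough illustration only, the sketch below hashes a series key to one of 64 partitions; the hash function and key format here are placeholders, not InfluxDB Cloud's internal implementation.

```python
import hashlib

NUM_PARTITIONS = 64  # partition count described in the postmortem


def partition_for(series_key: str) -> int:
    """Map a series key to a partition with a persistent (stable) hash.

    Illustrative only: InfluxDB Cloud's real hash function and series-key
    layout are internal. The point is that the same key always maps to the
    same partition, so a single hot series concentrates its load (like the
    dense series behind the partition 44 OOMs) on one partition's pods.
    """
    digest = hashlib.sha256(series_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# A hypothetical series key: measurement name plus its tag set.
print(partition_for("cpu,host=server01,region=eu-central-1"))
```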

Updates:

  • Time: May 27, 2024, 11:41 a.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: May 27, 2024, 10:22 a.m.
    Status: Monitoring
    Update: A fix has been implemented and we are monitoring the results.
  • Time: May 27, 2024, 8:25 a.m.
    Status: Investigating
    Update: We are currently investigating this issue.

Check the status of similar companies and alternatives to InfluxDB

Smartsheet: Systems Active
ESS (Public): Systems Active
Cloudera: Systems Active
New Relic: Systems Active
Boomi: Systems Active
AppsFlyer: Systems Active
Imperva: Systems Active
Bazaarvoice: Issues Detected
Optimizely: Systems Active
Electric: Systems Active
ABBYY: Systems Active

Frequently Asked Questions - InfluxDB

Is there an InfluxDB outage?
The current status of InfluxDB is: Systems Active
Where can I find the official status page of InfluxDB?
The official status page for InfluxDB is here
How can I get notified if InfluxDB is down or experiencing an outage?
To get notified of any status changes to InfluxDB, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of InfluxDB every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does InfluxDB do?
InfluxDB efficiently stores and provides access to time series data in a specialized database designed for speed, and it is available in cloud, on-premises, and edge environments.
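
For a concrete sense of what that looks like in practice, here is a minimal sketch using the official influxdb-client Python package against an InfluxDB 2.x or Cloud instance; the URL, token, org, and bucket values are placeholders for your own deployment.

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection details: substitute your own URL, token, org, and bucket.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

# Write one time series point: measurement "cpu", tag "host", field "usage".
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api.write(bucket="my-bucket",
                record=Point("cpu").tag("host", "server01").field("usage", 12.5))

# Query the last hour of data back with Flux.
tables = client.query_api().query('from(bucket: "my-bucket") |> range(start: -1h)')
for table in tables:
    for record in table.records:
        print(record.get_time(), record.get_measurement(),
              record.get_field(), record.get_value())

client.close()
```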