Get notified about any outages, downtime or incidents for Bitmovin and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Bitmovin.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
Sign Up Now

OutLogger tracks the status of these components for Bitmovin:
| Component | Status |
| --- | --- |
| Bitmovin Dashboard | Active |
| Player Licensing | Active |
| Analytics Service | Active |
| Analytics Ingress | Active |
| Export Service | Active |
| Query Service | Active |
| Bitmovin API | Active |
| Account Service | Active |
| Configuration Service | Active |
| Encoding Service | Active |
| Infrastructure Service | Active |
| Input Service | Active |
| Manifest Service | Active |
| Output Service | Active |
| Player Service | Active |
| Statistics Service | Active |
View the latest incidents for Bitmovin and check for official updates:
Description: All systems are fully operational again and all incoming data that was buffered during the outage is fully available again. We will conduct a thorough RCA and post a postmortem explaining the incident as well as the actions derived from it to prevent similar issues in the future.
Status: Resolved
Impact: Major | Started At: Nov. 6, 2024, 8:10 a.m.
Description: The issue is fully resolved and we suggest that customers re-run any exports that failed (see the sketch after this entry).
Status: Resolved
Impact: Minor | Started At: Oct. 30, 2024, 5:25 p.m.
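
For the export incident above, one way to act on the re-run suggestion is to list recent analytics export tasks through the Bitmovin REST API and re-create any that ended in an error state. The sketch below is a minimal illustration, not an official tool: the `/analytics/exports` path, the response envelope, the task fields copied into the new request, and the `ERROR` status value are assumptions based on common Bitmovin API conventions, so verify them against the official API reference before use.

```python
# Hypothetical sketch: re-submit failed Bitmovin analytics exports.
# Endpoint path, field names, and status values are assumptions; check
# the official Bitmovin API reference before relying on this.
import os
import requests

API_BASE = "https://api.bitmovin.com/v1"
HEADERS = {"X-Api-Key": os.environ["BITMOVIN_API_KEY"]}  # standard Bitmovin auth header


def list_export_tasks(limit: int = 100) -> list[dict]:
    """Fetch recent analytics export tasks (assumed endpoint and envelope)."""
    resp = requests.get(
        f"{API_BASE}/analytics/exports",
        headers=HEADERS,
        params={"limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]["items"]


def rerun_failed_exports() -> None:
    """Re-create any export task whose status indicates failure."""
    for task in list_export_tasks():
        if task.get("status") != "ERROR":  # assumed status value for failed exports
            continue
        # Re-submit the export with the same query window and output target.
        payload = {
            key: task[key]
            for key in ("name", "licenseKey", "start", "end", "output")
            if key in task
        }
        resp = requests.post(
            f"{API_BASE}/analytics/exports", headers=HEADERS, json=payload, timeout=30
        )
        resp.raise_for_status()
        new_id = resp.json()["data"]["result"]["id"]
        print(f"re-submitted export {task.get('id')} as {new_id}")


if __name__ == "__main__":
    rerun_failed_exports()
```

Set the `BITMOVIN_API_KEY` environment variable before running the script.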
Description: All analytics data is now backfilled and the system is fully operational again.
Status: Resolved
Impact: Major | Started At: Oct. 27, 2024, 8:40 p.m.
Description:

## Summary
A component in charge of provisioning infrastructure resources became overloaded, which caused long queue times and, in some cases, scheduling errors on AWS. To stabilize the system we scaled down job processing and then gradually scaled it up again in a controlled way.

## Date
The issue occurred on September 12, 2024, between 12:04 and 15:03. All times are in UTC.

## Root Cause
An unusual spike in encoding job submissions that was not smoothed out by our scheduling algorithm overloaded the component responsible for requesting instances for encoding job processing on AWS. The component could not handle the amount of work or recover on its own, affecting all other jobs on AWS.

## Implications
Encoding jobs that were started remained in the queued state. Some jobs failed to start and transitioned to the error state with a "Scheduling failed" message.

## Remediation
The engineering team quickly identified the affected component as the cause of the long queue times and "Scheduling failed" errors. The load on this component was reduced by delaying the processing of encoding jobs, which allowed it to recover. Once it had recovered, job processing was ramped back up to normal operations. The reduction in job processing also delayed non-AWS encoding jobs.

## Timeline
* 12:04 - The monitoring systems alerted the engineering team about an overloaded system component, and the team started investigating.
* 12:15 - The engineering team closely monitored the impacted component to assess the impact.
* 12:32 - The engineering team started investigating different approaches to let the impacted system recover.
* 13:30 - The engineering team identified that customer job processing on AWS was impacted and reduced the number of jobs processed in the system.
* 14:00 - The component recovered and the engineering team started to scale up encoding job processing again.
* 14:24 - Full processing capacity was restored and the system continued to process the queued jobs normally.
* 15:03 - The engineering team continued closely monitoring the systems.

## Prevention
Following the initial investigation, the engineering team will take the following actions to prevent a similar overload of this component in the future:
* Scale the underlying database to a larger instance type
* Improve the scheduling algorithm to smooth out peak load patterns
* Review data access patterns to avoid high load on the component

Finally, the specific scenario that led to the overload will be simulated in a separate environment to validate that the prevention measures work as expected. (A minimal sketch of this kind of load smoothing follows this entry.)
Status: Postmortem
Impact: Major | Started At: Sept. 12, 2024, 1:31 p.m.
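
The postmortem above attributes the overload to a spike in encoding job submissions that the scheduling algorithm did not smooth out, and both the remediation (temporarily delaying job processing) and the prevention plan (smoothing peak load patterns) amount to throttling how fast work reaches the provisioning component. The sketch below is a generic token-bucket style throttle that illustrates the idea; the class and job names are hypothetical, and this is not Bitmovin's actual scheduler.

```python
# Illustrative token-bucket throttle that smooths bursts of encoding-job
# provisioning requests before they reach an instance-provisioning backend.
# Hypothetical sketch; not Bitmovin's actual scheduling algorithm.
import collections
import time


class SmoothedScheduler:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec          # steady-state provisioning requests per second
        self.capacity = burst             # how much burst the provisioner can absorb
        self.tokens = float(burst)        # current provisioning budget
        self.last = time.monotonic()
        self.queue = collections.deque()  # jobs waiting for a provisioning slot

    def submit(self, job_id: str) -> None:
        """Queue a job instead of provisioning immediately; the queue absorbs spikes."""
        self.queue.append(job_id)

    def _refill(self) -> None:
        """Accrue tokens at the configured rate, capped at the burst capacity."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def dispatch(self) -> list[str]:
        """Release at most as many queued jobs as the current token budget allows."""
        self._refill()
        released = []
        while self.queue and self.tokens >= 1.0:
            self.tokens -= 1.0
            released.append(self.queue.popleft())
        return released


if __name__ == "__main__":
    scheduler = SmoothedScheduler(rate_per_sec=5, burst=10)
    # Simulate an unusual spike: 100 jobs submitted at once.
    for i in range(100):
        scheduler.submit(f"encoding-job-{i}")
    # The provisioner now sees a steady ~5 requests/second instead of a 100-job burst.
    while scheduler.queue:
        batch = scheduler.dispatch()
        if batch:
            print(f"provisioning {len(batch)} job(s), starting with {batch[0]}")
        time.sleep(0.5)
```

The design choice here is that a sudden burst of submissions fills a queue rather than hitting the provisioning component directly, and jobs are then released at a steady rate the downstream component is known to handle.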
Description: Between 11:15 and 13:45, Bitmovin Analytics data collection experienced load balancing issues in our European datacenter. The load balancer began excessively auto-scaling our fleet of instances and subsequently performed a full traffic shift to the US datacenter. Following the traffic shift, the instances in Europe were terminated and restarted, causing some requests on those instances to be lost and not written to our database. We stabilized the system by modifying the load balancing behavior and are currently investigating the root cause of this incident. We apologize for the inconvenience and will post a full Root Cause Analysis (RCA) once the investigation is complete, along with corrective actions taken to prevent similar issues in the future.
Status: Resolved
Impact: None | Started At: June 14, 2024, 9:15 a.m.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.