Is there a Bitmovin outage?

Bitmovin status: Systems Active

Last checked: 14 minutes ago

Get notified about any outages, downtime, or incidents for Bitmovin and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Bitmovin outages and incidents

Outage and incident data over the last 30 days for Bitmovin.

There have been 0 outages or incidents for Bitmovin in the last 30 days.

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Bitmovin

OutLogger tracks the status of these components for Bitmovin:

Bitmovin Dashboard Active
Player Licensing Active
Analytics Ingress Active
Export Service Active
Query Service Active
Account Service Active
Configuration Service Active
Encoding Service Active
Infrastructure Service Active
Input Service Active
Manifest Service Active
Output Service Active
Player Service Active
Statistics Service Active

Latest Bitmovin outages and incidents.

View the latest incidents for Bitmovin and check for official updates:

Updates:

  • Time: June 14, 2024, 12:44 p.m.
    Status: Resolved
    Update: Between 11:15 and 13:45, Bitmovin Analytics data collection experienced load balancing issues in our European datacenter. The load balancer began excessively auto-scaling our fleet of instances and subsequently performed a full traffic shift to the US datacenter. Following the traffic shift, the instances in Europe were terminated and restarted, causing some requests on those instances to be lost and not written to our database. We stabilized the system by modifying the load balancing behavior and are currently investigating the root cause of this incident. We apologize for the inconvenience and will post a full Root Cause Analysis (RCA) once the investigation is complete, along with corrective actions taken to prevent similar issues in the future.

Updates:

  • Time: May 22, 2024, 6:16 a.m.
    Status: Resolved
    Update: The incident has been resolved and all data was successfully backfilled as of 16:55 UTC on May 21, 2024.
  • Time: May 21, 2024, 2:30 p.m.
    Status: Monitoring
    Update: The issue has been resolved and we are now monitoring the situation. Query capabilities are restored and the buffered data is currently being backfilled. During the backfill operation queries might return partial data.
  • Time: May 21, 2024, 2:22 p.m.
    Status: Investigating
    Update: We are currently investigating an issue with our main analytics database causing query interruptions and latency issues.

Updates:

  • Time: Dec. 5, 2023, 4:14 p.m.
    Status: Postmortem
    Update:
      Summary: A manual cleanup routine got stalled and caused a lock on certain database tables that are necessary to manage encoding jobs. The API-related endpoints returned HTTP 500 errors during that time. Customers depending on those endpoints (either directly via the API or indirectly via the dashboard) could not properly use them. After identifying and fixing the cause, the involved endpoints returned to normal operation.
      Date: The issue occurred on December 1, 2023, between 07:40 and 08:15. All times in UTC.
      Root Cause: A routine manual cleanup procedure caused a lock on certain database tables and stalled so that the locks could not be released. Services depending on this database resource were then impacted and unable to process API requests.
      Implications: Customers were not able to start encodings. Some encoding jobs had longer than expected turnaround times. The involved API requests targeting the encoding endpoint returned HTTP 500 errors.
      Remediation: The faulty database operation was identified and terminated.
      Timeline:
        07:40 - Internal alerts notified the team about failures.
        07:50 - The team began investigating.
        08:00 - The faulty component was identified. The team began investigating the involved operations.
        08:15 - The faulty operation was identified and terminated. The affected service recovered.
        08:20 - The team kept monitoring and verifying the proper operation of the service.
      Prevention: The process for the cleanup procedure has been updated to not use the procedure that caused this incident. The team will analyze this procedure in detail to understand why it caused a lock on the database and stalled. Measures to prevent this procedure from stalling will be taken. As soon as the updated procedure is safe again, the team will continue to use it to fulfill the required maintenance tasks.
  • Time: Dec. 1, 2023, 8:38 a.m.
    Status: Resolved
    Update: All services continue to work normally again. The incident is resolved. The team will come back with an RCA at the beginning of next week.
  • Time: Dec. 1, 2023, 8:19 a.m.
    Status: Monitoring
    Update: Error rates are back to normal and encoding jobs are processing normally again. The team is monitoring and further investigating the root cause.
  • Time: Dec. 1, 2023, 8:04 a.m.
    Status: Investigating
    Update: We are currently investigating an elevated error rate on our API. Starting encoding jobs seems to be impacted. We will come back with more information as soon as we have it.

Updates:

  • Time: Sept. 15, 2023, 1:20 p.m.
    Status: Postmortem
    Update:
      Summary: Bitmovin’s engineering team observed failing encoding jobs configured to run on Azure. They were also informed of suspicious activity in Bitmovin’s Microsoft Azure subscription used for Bitmovin Managed Encoding running in Azure regions. Launching infrastructure on this subscription was deactivated without prior notification. This prevented Bitmovin from launching new computing infrastructure, leading to encoding job failures with “Scheduling failed” error messages. Encoding jobs configured to run on other cloud regions such as AWS or Google were not affected at any time. Customers were instructed to fall back to cloud regions in AWS and Google. Bitmovin moved all compute to an alternative Azure subscription to unblock customers running encoding jobs in Azure regions. Microsoft admitted an incorrect detection and resulting resource block on Bitmovin’s Azure subscription.
      Date: The issue occurred between September 11, 2023, 14:12 and September 13, 2023, 17:09. All times in UTC.
      Root Cause: Microsoft Azure’s Suspicious Activity Detector incorrectly identified Bitmovin’s request for additional resources as suspicious, leading to the deactivation of Bitmovin’s main subscription. Microsoft has since identified that the detection logic was too stringent in looking at abuse patterns, has adjusted the detection, and has applied further quality controls to avoid resource blocks being applied incorrectly. Bitmovin’s scheduling logic received missing-capacity errors while requesting new instances in the main Azure subscription, caused by the incorrectly applied resource block. This led to “Scheduling failed” error messages for customers running encoding jobs in Azure regions.
      Implications: Workloads scheduled by customers using Managed Encoding in Azure could not be processed; the encoding jobs immediately transitioned to the error state. Other cloud vendors were not affected. The Cloud Connect feature for Azure infrastructure was partially and temporarily impacted.
      Remediation: The affected customers were notified and advised to change their encoding job configuration to utilize another cloud provider to process the encoding jobs. Customer communication was handled directly by the Bitmovin Customer Experience team and the status page. The Bitmovin engineering team switched the managed Azure workloads to an alternative subscription which was not affected by the resource blocks.
      Timeline:
        Sep 11, 14:12 - The engineering team observed failing encoding jobs configured to run on Azure and started investigating.
        Sep 11, 14:16 - The engineering team identified a resource block on the Bitmovin Azure subscription as the root cause of the failures.
        Sep 11, 15:30 - A support case with the Azure partner was opened and escalated via partner contacts.
        Sep 11, 15:40 - The Bitmovin support team started contacting customers running on Azure regions and advised them to switch their encoding job configuration to an alternative cloud provider.
        Sep 12, 07:00 - The engineering team started working on a solution to switch the Azure encoding workloads to another Azure subscription.
        Sep 12, 10:04 - The engineering team updated the scheduling logic to make a limited set of Azure regions available again for encoding workloads running on the prepared Azure subscription. Turnaround times were longer than usual as workloads did not yet run at full capacity.
        Sep 12, 16:09 - The remaining Azure regions were also made available for Azure encoding workloads using the same strategy.
        Sep 12, 21:17 - The Azure support ticket to remove the resource blocks was manually escalated directly via Microsoft.
        Sep 13, 13:15 - Engineering rolled out an update that restored normal turnaround times for Azure encoding jobs.
        Sep 13, 17:09 - The Bitmovin incident was resolved; all Bitmovin customers could run encoding jobs on Azure again.
        Sep 14, 08:00 - The engineering team worked with the partner and Microsoft to reactivate the original Azure subscription, to understand why the subscription was disabled, and on a solution to prevent this in the future.
        Sep 14, 17:30 - Microsoft provided Bitmovin with an RCA stating that newly added compromise-detection logic was too stringent and that a response analyst inaccurately validated the subscription, leading to the resource blocks being applied incorrectly.
      Prevention: The engineering team will work with Microsoft Azure and the partner to prevent such situations in the future, will keep the Azure subscription failover implemented as a temporary solution, and will adapt tooling to make switching between Azure subscriptions easier. Microsoft confirmed it has adjusted the incorrect detection and applied further quality controls to avoid resource blocks being applied incorrectly.
  • Time: Sept. 13, 2023, 5:09 p.m.
    Status: Resolved
    Update: Encoding turnaround times are back to normal levels in all Azure regions. Our team was able to move the entire workload to an alternative Azure subscription. The team will come back with a post-mortem in the coming days.
  • Time: Sept. 12, 2023, 4:08 p.m.
    Status: Monitoring
    Update: Encoding times in Azure across all regions are not yet back to normal. The team is investigating multiple ways to solve this and will provide an update once this is fully resolved. The recommendation is still to fall back to other cloud providers in the meantime.
  • Time: Sept. 12, 2023, noon
    Status: Monitoring
    Update: Encoding on Azure can be scheduled again as usual. The team is monitoring the system closely and will resolve the issue if we don't encounter any further issues.
  • Time: Sept. 12, 2023, 11:08 a.m.
    Status: Monitoring
    Update: Encoding on Azure can be scheduled again in the regions Australia East, Europe North, and Europe West. The team is monitoring the system closely and working on adding additional regions. We will provide further updates as we add additional regions.
  • Time: Sept. 11, 2023, 3:57 p.m.
    Status: Identified
    Update: There is a problem with our Azure subscription. Our engineering team is in contact with Azure support to resolve the issue with the subscription. We advise all customers to use a different cloud in the meantime. Customers encoding with other cloud providers are not affected by this incident.
  • Time: Sept. 11, 2023, 3:18 p.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: Sept. 11, 2023, 3:15 p.m.
    Status: Investigating
    Update: The team is investigating failed scheduling for encoding jobs on Azure Cloud
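The remediation in this incident was a configuration change on the customer side: point the encoding job at a different cloud region while Azure scheduling was blocked. As a rough, hedged illustration (not official Bitmovin sample code), the sketch below assumes the public REST endpoint POST /v1/encoding/encodings, its X-Api-Key header, and its cloudRegion field behave as described in Bitmovin's API documentation; the region names used are assumptions and should be checked against the current API reference.

```python
import os

import requests

API_BASE = "https://api.bitmovin.com/v1"
API_KEY = os.environ["BITMOVIN_API_KEY"]  # assumed environment variable holding your API key


def create_encoding(name: str, cloud_region: str) -> dict:
    """Create an encoding pinned to a specific cloud region.

    During the Azure incident, switching cloud_region from an Azure value
    (e.g. "AZURE_EUROPE_WEST") to an AWS or Google one (e.g. "AWS_EU_WEST_1")
    is the kind of fallback the status updates describe. The region names
    here are assumptions based on Bitmovin's public documentation.
    """
    resp = requests.post(
        f"{API_BASE}/encoding/encodings",
        headers={"X-Api-Key": API_KEY, "Content-Type": "application/json"},
        json={"name": name, "cloudRegion": cloud_region},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    # Fall back to an AWS region while Azure scheduling is impaired.
    encoding = create_encoding("my-encoding-azure-fallback", "AWS_EU_WEST_1")
    print(encoding)
```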

Updates:

  • Time: June 22, 2023, 11:27 a.m.
    Status: Resolved
    Update: We continued to monitor the situation; everything is working as expected and the error rates in our monitoring are at normal levels. As the duration of the incident was rather short, our recommended retry behavior for 5xx errors on the API should have kept the impact minimal. The root cause of the incident was a database migration in which a column was added to a very large table, causing it to lock for a few minutes. This table lock prevented successful execution of most encoding-related API calls. With the reduction of the database size that will start in July, this will not happen again. We have additionally raised awareness in the team to pay extra attention when modifying very large database tables.
  • Time: June 22, 2023, 11:16 a.m.
    Status: Investigating
    Update: We see a recovery of error rates on our API and will continue to monitor the situation.
  • Time: June 22, 2023, 11:12 a.m.
    Status: Investigating
    Update: We're experiencing an elevated level of API errors on encoding endpoints and are currently looking into the issue.
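The resolved update above credits the recommended retry behavior for 5xx API errors with keeping the impact of this short incident minimal. The snippet below is a generic sketch of that pattern (retry with exponential backoff on HTTP 5xx responses); it is illustrative only and not Bitmovin-provided code, and the endpoint you call is up to you.

```python
import time

import requests


def request_with_retries(method: str, url: str, *, max_attempts: int = 5, **kwargs):
    """Retry transient HTTP 5xx failures with exponential backoff.

    Illustrative sketch of the "retry 5xx responses" guidance referenced in
    the incident update; not official Bitmovin code.
    """
    for attempt in range(1, max_attempts + 1):
        resp = requests.request(method, url, timeout=30, **kwargs)
        if resp.status_code < 500:
            return resp  # success, or a client error that retrying will not fix
        if attempt == max_attempts:
            resp.raise_for_status()  # give up and surface the server error
        time.sleep(2 ** attempt)  # back off: 2s, 4s, 8s, ...
    return resp
```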

Check the status of similar companies and alternatives to Bitmovin

Canva: Systems Active
Figma: Systems Active
Superside: Systems Active
Matterport: Systems Active
InVision: Systems Active
Design Pickle: Systems Active
mux: Systems Active
SmugMug: Systems Active
WeVideo: Systems Active
movingimage EVP GmbH: Systems Active
Threekit: Systems Active
GoReact: Systems Active

Frequently Asked Questions - Bitmovin

Is there a Bitmovin outage?
The current status of Bitmovin is: Systems Active
Where can I find the official status page of Bitmovin?
The official status page for Bitmovin is here
How can I get notified if Bitmovin is down or experiencing an outage?
To get notified of any status changes to Bitmovin, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Bitmovin every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
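
If you prefer to check the status programmatically rather than subscribe, a small polling script also works. The sketch below assumes Bitmovin's status page is hosted on Atlassian Statuspage, which typically exposes a JSON summary endpoint; the URL used here is a hypothetical example and should be replaced with the address of the official status page linked above.

```python
import requests

# Hypothetical status page URL; confirm against Bitmovin's official status page.
STATUS_URL = "https://status.bitmovin.com/api/v2/status.json"


def check_status() -> str:
    """Return the overall status indicator (e.g. "none", "minor", "major").

    Assumes an Atlassian Statuspage-hosted page, whose sites commonly expose
    this JSON summary endpoint.
    """
    resp = requests.get(STATUS_URL, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    return data["status"]["indicator"]


if __name__ == "__main__":
    print("Bitmovin status indicator:", check_status())
```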