Get notified about any outages, downtime or incidents for OpenAI and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for OpenAI.
OutLogger tracks the status of these components for OpenAI:
Component | Status |
---|---|
API | Active |
ChatGPT | Active |
Labs | Active |
Playground | Active |
View the latest incidents for OpenAI and check for official updates:
Description: From 9:14 AM to 9:35 AM PT, users experienced increased error rates while trying to log into ChatGPT and the API Platform due to an outage from a third-party authentication provider. This issue has now been resolved.
Status: Resolved
Impact: Critical | Started At: Aug. 21, 2024, 4:27 p.m.
Description: Between 5:48pm and 6:04pm PT, users experienced elevated error rates when attempting to log into ChatGPT and the API Platform due to an outage from a third-party auth provider. This issue is now resolved.
Status: Resolved
Impact: None | Started At: Aug. 21, 2024, 1:04 a.m.
Description: File uploads should be working as expected. This issue is resolved.
Status: Resolved
Impact: Major | Started At: Aug. 19, 2024, 6:51 p.m.
Description: On August 16, 2024 from 11:38 AM to 1:16 PM PT, a significant issue impacted the reliability of the OpenAI primary API, resulting in degraded service for users. This incident led to reduced success rates for ChatGPT conversations and affected login and account creation processes. The incident occurred in two waves, lasting 44 and 15 minutes respectively.

The root cause was a combination of factors. A scheduled maintenance and an upgrade to the ingress of the OpenAI user-facing clusters introduced a networking control plane regression. This manifested itself in a short-lived data plane outage. As a result of the momentary loss of connectivity, a set of services became unhealthy and were automatically restarted. The restarts, however, took much longer than expected, as the services starting up overwhelmed a backend persistence store with a heavy first-start query. The backend persistence store required additional time to catch up and recover.

As part of the incident response, we have already implemented the following measures:

1. We have mitigated the networking control plane regression, and validated that control plane restarts do not interfere with the clusters' data plane.
2. We implemented software changes to improve services' start time and remove the first-start query, alleviating pressure on the persistence layer and speeding up the start and restart of services.
3. Deployed configuration changes to optimize the networking control plane's effect on the clusters' ability to handle traffic.
4. Removed the expensive database query from critical startup paths.
5. Implemented additional monitoring and alerting for networking control-plane related issues.

Additionally, we will be implementing the following changes to prevent future incidents altogether:

1. We are introducing staged rollouts for infrastructure changes with longer soak time to ensure regressions are caught as early as possible and affect as few systems and users as possible.
2. We are auditing our systems for other slow queries that may affect service start time.

We are continuing to improve our infrastructure to ensure greater resilience and faster recovery in the event of future incidents. We know that extended API outages affect our customers’ products and business, and outages of this magnitude are particularly damaging. While we came up short here, we are committed to preventing such incidents in the future and improving our service reliability.
Status: Postmortem
Impact: Major | Started At: Aug. 16, 2024, 6:53 p.m.
Description: Between 8:40AM and 9:23AM PDT, all requests to the Assistants API were failing. Assistants in the OpenAI Playground were also impacted. This issue is now resolved.
Status: Resolved
Impact: Major | Started At: Aug. 15, 2024, 4:48 p.m.