Get notified about any outages, downtime or incidents for OpenAI and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for OpenAI.
OutLogger tracks the status of these components for OpenAI:
Component | Status |
---|---|
API | Active |
ChatGPT | Active |
Labs | Active |
Playground | Active |
View the latest incidents for OpenAI and check for official updates:
Description: As of 2:12 a.m. GMT, this incident is resolved.
Status: Resolved
Impact: Major | Started At: June 27, 2024, 1:46 a.m.
Description: This issue has now been resolved. If you are experiencing any continuing elevated error rates, please reach out to our support team.
Status: Resolved
Impact: Major | Started At: June 23, 2024, 3:20 a.m.
Description: This incident has now been resolved.
Status: Resolved
Impact: Major | Started At: June 21, 2024, 7:44 p.m.
Description: This incident has now been resolved. Customers should no longer be seeing elevated error rates when calling our API endpoints.
Status: Resolved
Impact: Major | Started At: June 20, 2024, 7:38 p.m.
Description: On June 17th, 2024, from 11:39 am to 2:02 pm PT, ChatGPT experienced an elevated error rate, with the majority of requests failing at one point. The incident involved three main issues:
* An inference engine issue prompted the initial rollback.
* A series of cascading issues occurred with our event bus infrastructure, resulting in IO-blocking across the ChatGPT service, which prevented requests from completing.
* A bug caused ChatGPT users to receive empty completion responses.

During the initial rollback, there was an unexpected degradation in an event publishing flow. Due to recent infrastructure changes and the deployment, we experienced an abnormally high number of requests to a schema service, which led to increased latencies. A third-party library executing these requests used an IO-blocking behavior that caused processes to stall, resulting in ChatGPT requests timing out and returning 504 errors. We rolled forward to mitigate this, and ChatGPT requests no longer experienced 504s.

We then began to notice that conversation responses appeared to be successful but were usually empty. We identified this as a regression due to recent code changes. The regression was fixed, and we rolled forward again to mitigate it.

As part of the incident response, we have already implemented the following measures:
* Removed the IO-blocking behavior that occurred during event publishing.
* Added caching to the schema service.
* Implemented additional monitoring for the schema service.

Additionally, we will be implementing the following changes to prevent future incidents:
* Reduce environment mismatch between testing and prod.
* Add a monitor for shortened ChatGPT responses.
* Improve revert deploy time.
* Remove the dependency on the schema service.
Status: Postmortem
Impact: Major | Started At: June 17, 2024, 6:39 p.m.
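Two of the mitigations in the June 17 postmortem (removing IO-blocking calls from event publishing, and caching schema lookups) can be illustrated with a minimal Python sketch. This is not OpenAI's implementation, which the postmortem does not describe. `SchemaCache`, `slow_fetch`, and the `conversation.v1` schema ID are hypothetical names chosen for illustration, and the sketch assumes an `asyncio`-based service on Python 3.9 or later.

```python
import asyncio
import time


class SchemaCache:
    """Hypothetical TTL cache in front of a slow, blocking schema lookup."""

    def __init__(self, fetch, ttl_seconds=60.0):
        self._fetch = fetch        # blocking callable: schema_id -> schema
        self._ttl = ttl_seconds
        self._entries = {}         # schema_id -> (expires_at, schema)
        self._locks = {}           # schema_id -> asyncio.Lock
        self.fetch_count = 0       # number of calls that reached the backend

    def _fresh(self, schema_id):
        entry = self._entries.get(schema_id)
        if entry is not None and entry[0] > time.monotonic():
            return entry
        return None

    async def get(self, schema_id):
        entry = self._fresh(schema_id)
        if entry is not None:
            return entry[1]
        # One lock per schema ID so concurrent misses trigger a single fetch.
        lock = self._locks.setdefault(schema_id, asyncio.Lock())
        async with lock:
            entry = self._fresh(schema_id)  # re-check after waiting
            if entry is not None:
                return entry[1]
            self.fetch_count += 1
            # Run the blocking call in a worker thread so the event loop
            # keeps serving other requests while the lookup is in flight.
            schema = await asyncio.to_thread(self._fetch, schema_id)
            self._entries[schema_id] = (time.monotonic() + self._ttl, schema)
            return schema


def slow_fetch(schema_id):
    """Stand-in for a blocking schema-service request (assumed ~50 ms)."""
    time.sleep(0.05)
    return {"id": schema_id, "fields": ["event", "timestamp"]}


async def publish_many(cache, n):
    # Simulate n concurrent event publishes that all need the same schema.
    return await asyncio.gather(*(cache.get("conversation.v1") for _ in range(n)))


def demo():
    cache = SchemaCache(slow_fetch)
    results = asyncio.run(publish_many(cache, 100))
    return cache.fetch_count, len(results)
```

Calling `demo()` issues 100 concurrent lookups for the same schema. The per-key lock collapses them into one backend request, and `asyncio.to_thread` keeps that request from stalling the event loop. Together these address the two failure modes the postmortem names: a burst of requests to the schema service and processes blocked on IO.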