Last checked: 6 minutes ago
Outage and incident data over the last 30 days for OpenAI.
OutLogger tracks the status of these components for OpenAI:
Component | Status |
---|---|
API | Active |
ChatGPT | Active |
Labs | Active |
Playground | Active |
View the latest incidents for OpenAI and check for official updates:
Description: On June 3, 2024, at 11:49 PM PDT, ChatGPT experienced a significant outage affecting all user tiers (paid, enterprise, free, anonymous). By 4:10 AM PDT, service was fully restored. A second phase of the outage began a few hours later, at 7:14 AM PDT on June 4, again impacting the same user cohorts. Service was restored for a second time at 10:07 AM PDT.

The issue resulted from a database that ChatGPT depends on becoming unavailable due to traffic surges initiated by the connection pooling service and the way that service was configured. The team initially attempted to mitigate in a variety of ways, including restarting the primary server and assessing failover options to other replicas. Despite the various attempts at recovery, the primary database continued to be unreachable. We eventually blocked all traffic to ChatGPT to remove all load from the DB, and were then able to promote a secondary target to be the new primary and begin redirecting traffic to it. Re-ramping incoming traffic concluded at 10:07 AM, at which time all services were recovered.

As part of the incident response, we have already implemented the following measures:

* Tuned the number of connections the pooling service makes to the DB backend.
* Increased timeouts on connections made to the DB to avoid deadlocks.
* Implemented exponential backoff, gradually increasing the wait time between subsequent retry attempts for DB connection failures.
* Modified our load shedding tooling to make it easier to degrade more gracefully.

Additionally, we will be implementing the following changes to prevent future incidents of this type altogether:

* Re-architect the DB design to increase its redundancy.
* Improve our ability to load shed at the DB layer (in addition to the clients).
* Expand the load testing and benchmarking we do for the backend layer.
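
For readers unfamiliar with the exponential backoff measure mentioned in the postmortem, the following is a minimal Python sketch of the general technique, not OpenAI's actual implementation. The function names (`backoff_delays`, `connect_with_backoff`), the default values, and the random jitter are all assumptions made for illustration. The postmortem only states that the wait time between retries grows gradually.

```python
import random
import time


def backoff_delays(base=0.5, factor=2.0, cap=30.0, retries=4):
    """Yield the wait (in seconds) before each retry: base, base*factor, ...

    Each delay is limited to `cap` so the wait cannot grow without bound.
    All defaults are illustrative assumptions, not values from the postmortem.
    """
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor


def connect_with_backoff(connect, max_attempts=5, base=0.5, factor=2.0,
                         cap=30.0, sleep=time.sleep, jitter=random.random):
    """Call `connect()` until it succeeds, waiting longer after each failure.

    `connect` is a hypothetical callable that raises ConnectionError on failure.
    `sleep` and `jitter` are injectable so the behaviour can be tested.
    """
    delays = list(backoff_delays(base, factor, cap, retries=max_attempts - 1))
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Assumption: scale the delay by a random fraction ("jitter") so
            # that many clients do not all retry at the same instant.
            sleep(delays[attempt] * jitter())
```

Spreading retries out in this way is one common approach to keeping a recovering database from being hit by a synchronized wave of reconnection attempts, which is the kind of traffic surge the postmortem describes.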
Status: Postmortem
Impact: Critical | Started At: June 4, 2024, 2:33 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Critical | Started At: June 4, 2024, 7:21 a.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Critical | Started At: June 4, 2024, 7:21 a.m.
Description: Engineers have resolved the issue.
Status: Resolved
Impact: Minor | Started At: June 3, 2024, 9:30 p.m.
Description: Engineers have resolved the issue.
Status: Resolved
Impact: Minor | Started At: June 3, 2024, 9:30 p.m.