Outage and incident data over the last 30 days for CometChat.
OutLogger tracks the status of these components for CometChat:
Component | Status |
---|---|
CometChat APIs | Active |
App Management API | Active |
Rest API (AU) | Active |
Rest API (EU) | Active |
Rest API (IN) | Active |
Rest API (US) | Active |
CometChat Frontends | Active |
Dashboard | Active |
Website | Active |
CometChat v2 | Active |
Client API (EU) | Active |
Client API (US) | Active |
WebRTC (EU) | Active |
WebRTC (US) | Active |
WebSockets (EU) | Active |
WebSockets (US) | Active |
CometChat v3 | Active |
Client API (EU) | Active |
Client API (IN) | Active |
Client API (US) | Active |
WebRTC (EU) | Active |
WebRTC (IN) | Active |
WebRTC (US) | Active |
WebSocket (IN) | Active |
WebSockets (EU) | Active |
WebSockets (US) | Active |
View the latest incidents for CometChat and check for official updates:
Description: Starting around 9:35am MST on June 22, 2021, some customers started experiencing occasional errors while using CometChat. The engineering team identified the database cluster as the source of the issue and moved customers to a backup database cluster, which immediately resolved the problem. A root cause analysis revealed that a [bug](https://bugs.mysql.com/bug.php?id=98704) in MySQL caused the issue. A database upgrade was already scheduled for next month; we are now expediting it to the end of this week. We're truly sorry for the disruption.
Status: Postmortem
Impact: Critical | Started At: June 22, 2021, 3:30 p.m.
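The postmortem above describes resolving the outage by shifting customers to a backup database cluster. CometChat has not published how that switch was implemented, so the following is only a minimal client-side failover sketch in Python using mysql-connector-python; the hostnames, credentials, and the `connect_with_failover` helper are hypothetical placeholders.

```python
# Minimal sketch: try a primary MySQL cluster endpoint, fall back to a backup.
# Hostnames and credentials below are placeholders, not CometChat's.
import mysql.connector
from mysql.connector import Error

CLUSTERS = [
    {"host": "db-primary.example.internal", "port": 3306},  # primary cluster
    {"host": "db-backup.example.internal", "port": 3306},   # backup cluster
]

def connect_with_failover(user: str, password: str, database: str):
    """Return the first healthy connection among the configured clusters."""
    last_error = None
    for cluster in CLUSTERS:
        try:
            conn = mysql.connector.connect(
                user=user,
                password=password,
                database=database,
                connection_timeout=5,
                **cluster,
            )
            conn.ping(reconnect=False)  # confirm the connection is usable
            return conn
        except Error as exc:
            last_error = exc  # move on to the next endpoint
    raise RuntimeError(f"All database clusters unreachable: {last_error}")
```

In practice a managed proxy or DNS-level failover would usually sit in front of the application instead of per-client retry logic like this; the sketch just illustrates the idea of falling back to a standby cluster.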
Description: On Thursday at 5:03AM MDT, our engineers noticed degraded performance on a few servers in our API clusters in the EU region, which resulted in intermittent performance issues for customers on those servers. Further analysis pointed to a malfunctioning database server at the infrastructure level, and our team worked with the Amazon AWS team to resolve the issue. Based on the AWS team's review, this appears to have been a one-off anomaly: a network issue caused reverse DNS lookups to fail, and the flood of connections that could not be resolved created temporary files until the instance hit its open-files limit ("too many open files"). Unable to open new file handles, the database server failed to modify its binary log index file and ultimately could not restart itself, since it could not open a connection to its socket file. To bring the instance back online, an Amazon AWS engineer had to intervene and restart the system manually to get it out of this hung condition. Our current priority is working alongside our cloud vendor (Amazon AWS), who has assured us this issue won't happen again. We're truly sorry for the disruption.
Status: Resolved
Impact: None | Started At: May 27, 2021, 11 a.m.
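The failure chain above (failed reverse DNS lookups, accumulating temporary files, the open-files limit being hit) can be watched for on any MySQL server. As an illustration only, not CometChat's or AWS's tooling, here is a rough health-check sketch that reads the standard MySQL variables `open_files_limit`, `Open_files`, and `skip_name_resolve` (enabling the latter skips reverse DNS lookups for client connections); the host and credentials are placeholders.

```python
# Rough sketch: check how close a MySQL server is to its open-files limit
# and whether reverse DNS lookups are disabled. Connection details are
# placeholders.
import mysql.connector

def check_open_files(conn, warn_ratio: float = 0.8) -> None:
    cur = conn.cursor()

    cur.execute("SHOW VARIABLES LIKE 'open_files_limit'")
    limit = int(cur.fetchone()[1])

    cur.execute("SHOW GLOBAL STATUS LIKE 'Open_files'")
    open_now = int(cur.fetchone()[1])

    cur.execute("SHOW VARIABLES LIKE 'skip_name_resolve'")
    skip_dns = cur.fetchone()[1]  # 'ON' means no reverse DNS lookups for clients

    print(f"open files: {open_now}/{limit}, skip_name_resolve={skip_dns}")
    if open_now > warn_ratio * limit:
        print("WARNING: approaching open_files_limit")

if __name__ == "__main__":
    conn = mysql.connector.connect(
        host="db.example.internal", user="monitor", password="***", database="mysql"
    )
    check_open_files(conn)
```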
Description: Starting around 12:35pm MST on January 27, 2021, some customers started experiencing occasional errors and increased latency while using CometChat. Around 12:45pm MST there was a rapid increase in errors, and CometChat wasn't usable for most customers with apps hosted in our US region. At that point we began migrating our customers to a separate database cluster, and some customers started seeing improvements. By 1:35pm MST the migration was complete and all customers were able to use CometChat again. A root cause analysis revealed that our backup policies coincided with an infrastructure issue at our cloud vendor's end, and as a result our I/O operations were suspended for an extended period of time. Internal monitoring tools at our cloud vendor's end observed this behavior, which eventually caused the underlying hosts to be replaced. While this operation was being performed, it caused a backlog of transactions that ultimately led to the outage. Our current priority is working alongside our cloud vendor and putting safeguards in place to prevent similar problems from happening again. We're truly sorry for the disruption.
Status: Postmortem
Impact: Critical | Started At: Jan. 27, 2021, 7:35 p.m.
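The postmortem does not say which safeguards were put in place, but the transaction backlog it describes is the kind of condition that standard MySQL counters can surface. The sketch below is purely illustrative, not CometChat's monitoring; the host, credentials, and 30-second threshold are assumptions. It samples the `Threads_running` status variable and counts long-running statements in `information_schema.processlist`.

```python
# Illustrative sketch: snapshot of query backlog on a MySQL server.
# Host, credentials, and thresholds are placeholders.
import mysql.connector

def backlog_snapshot(conn, slow_seconds: int = 30):
    cur = conn.cursor()

    cur.execute("SHOW GLOBAL STATUS LIKE 'Threads_running'")
    running = int(cur.fetchone()[1])

    cur.execute(
        "SELECT COUNT(*) FROM information_schema.processlist "
        "WHERE command <> 'Sleep' AND time > %s",
        (slow_seconds,),
    )
    stuck = cur.fetchone()[0]

    return running, stuck

if __name__ == "__main__":
    conn = mysql.connector.connect(
        host="db.example.internal", user="monitor", password="***"
    )
    running, stuck = backlog_snapshot(conn)
    print(f"threads running: {running}, queries stuck > 30s: {stuck}")
```

Alerting when these counters climb during a backup window is one way an I/O stall like the one described above could be caught before it turns into a full outage.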