Last checked: 8 minutes ago
Get notified about any outages, downtime or incidents for PubNub and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for PubNub.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
Sign Up Now

Outlogger tracks the status of these components for PubNub:
Component | Status |
---|---|
Functions | Active |
Functions Service | Active |
Key Value store | Active |
Scheduler Service | Active |
Vault | Active |
Points of Presence | Active |
Asia Pacific Points of Presence | Active |
European Points of Presence | Active |
North America Points of Presence | Active |
Southern Asia Points of Presence | Active |
Realtime Network | Active |
Access Manager Service | Active |
App Context Service | Active |
DNS Service | Active |
Mobile Push Gateway | Active |
MQTT Gateway | Active |
Presence Service | Active |
Publish/Subscribe Service | Active |
Realtime Analytics Service | Active |
Storage and Playback Service | Active |
Stream Controller Service | Active |
Website and Portals | Active |
Administration Portal | Active |
PubNub Support Portal | Active |
SDK Documentation | Active |
Website | Active |
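The component statuses shown above can also be checked programmatically. The sketch below is a minimal illustration that assumes PubNub's public status page is hosted on Atlassian Statuspage at status.pubnub.com and exposes the standard `/api/v2/components.json` endpoint; both the host and the API path are assumptions, so substitute the status page you actually want to poll.

```python
# Minimal sketch: poll a Statuspage-style components endpoint and print each
# component name with its reported status. The URL below is an assumption;
# replace it with the status page you actually monitor.
import json
import urllib.request

STATUS_URL = "https://status.pubnub.com/api/v2/components.json"  # assumed endpoint

def fetch_component_statuses(url: str = STATUS_URL) -> dict[str, str]:
    """Return {component name: status} from a Statuspage components API."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return {c["name"]: c["status"] for c in payload.get("components", [])}

if __name__ == "__main__":
    for name, status in fetch_component_statuses().items():
        print(f"{name}: {status}")
```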
View the latest incidents for PubNub and check for official updates:
Description: Beginning at around 11:00 UTC, we observed elevated latency and server errors for our Presence service across all of our server endpoints. The issue has been resolved as of 14:11 UTC. We will continue to monitor the incident to ensure service stability has been fully restored. Your trust is our top priority, and we are committed to ensuring smooth operations.
Status: Resolved
Impact: None | Started At: Oct. 24, 2024, 12:01 p.m.
Description:

### **Problem Description, Impact, and Resolution**

At 7:35 UTC on October 5, 2024, we received a report of intermittent failures (5xx errors) for History API requests. The issue was triggered by an unexpectedly high volume of data requests processed through our shared infrastructure, overwhelming the shared history reader containers responsible for fetching this data from our storage nodes. As data was retrieved and processed by the history reader containers, we observed memory exhaustion (OOM-kills), even though the memory capacity had been significantly increased. This impacted the performance of our system, causing History API requests to fail when the memory overload occurred. We took action by isolating the requests responsible for the high data volume and deploying dedicated infrastructure for them. This ensured that the issue was resolved at 00:43 UTC on October 6, and no further impact was observed across the broader customer base.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent this issue from recurring, we deployed dedicated infrastructure for high-volume data requests, and we implemented dynamic data bucket creation to distribute large data volumes more efficiently, reducing strain on our nodes. These steps ensure that our system can handle sudden spikes in resource usage while maintaining stability for all customers.
Status: Postmortem
Impact: Minor | Started At: Oct. 5, 2024, 11:16 a.m.
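During the incident above, History API requests failed intermittently with 5xx errors. A common client-side mitigation for transient 5xx responses is retrying with exponential backoff. The sketch below is a generic illustration of that pattern, not PubNub's implementation or SDK; the URL in the usage note is a hypothetical placeholder for whatever history request your client actually makes.

```python
# Minimal sketch: retry a transient-failure-prone HTTP call with exponential
# backoff and jitter. Retries only on 5xx responses; 4xx errors are raised
# immediately because retrying them will not help.
import random
import time
import urllib.error
import urllib.request

def request_with_backoff(url: str, max_attempts: int = 5) -> bytes:
    for attempt in range(1, max_attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code < 500 or attempt == max_attempts:
                raise  # client error, or retries exhausted
            # Exponential backoff with jitter: 1s, 2s, 4s, ... plus up to 1s of noise.
            time.sleep(2 ** (attempt - 1) + random.random())
    raise RuntimeError("unreachable")

# Usage (hypothetical URL; substitute your actual history request):
# data = request_with_backoff("https://example.invalid/v3/history/...")
```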
Description:

### **Problem Description, Impact, and Resolution**

At 22:42 UTC on August 22, 2024, we observed increased latency and errors for our Presence service. We found evidence of network issues from our own testing and monitoring, and we created a ticket with our network service provider for additional investigation. The issue was resolved as of 22:50 UTC. The root cause of this issue was a lack of monitoring and alerting around transient network issues within our network service provider's inter-VPC routing.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we have configured the proper monitoring and alerting to provide us with enough time to address this issue before it can affect the QoS of our services.
Status: Postmortem
Impact: Minor | Started At: Aug. 22, 2024, 10:32 p.m.
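The mitigation above centers on monitoring and alerting for transient network latency. As a rough illustration of the idea only (not PubNub's tooling), the sketch below probes an endpoint, measures round-trip time, and flags samples that exceed a threshold; the probe URL, threshold, and interval are all placeholders.

```python
# Minimal sketch: measure round-trip latency to an endpoint and flag samples
# above a threshold. A real alerting pipeline would ship these samples to a
# metrics/alerting system instead of printing them.
import time
import urllib.request

PROBE_URL = "https://example.invalid/health"   # placeholder endpoint
LATENCY_THRESHOLD_S = 0.5                      # placeholder alert threshold

def probe_latency(url: str) -> float:
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=5):
        pass
    return time.monotonic() - start

def run_probe_loop(interval_s: float = 30.0) -> None:
    while True:
        latency = probe_latency(PROBE_URL)
        if latency > LATENCY_THRESHOLD_S:
            print(f"ALERT: latency {latency:.3f}s exceeds {LATENCY_THRESHOLD_S}s")
        time.sleep(interval_s)
```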
Description:

### **Problem Description, Impact, and Resolution**

At 18:16 UTC on June 13, 2024, we observed increased latency for delivery of mobile push messages in our Frankfurt and US-East points of presence. In response, we increased the resources available to the service and redeployed it. The issue was resolved at 21:21 UTC on June 13, 2024. Upon further investigation, we identified that this issue occurred due to malformed message payloads creating a backlog in the message queue.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we increased the memory available to the service so it can handle similar malformed payloads, and we added additional monitoring.
Status: Postmortem
Impact: Major | Started At: June 13, 2024, 6:51 p.m.
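Since the backlog above was triggered by malformed message payloads, one defensive measure on the publishing side is validating payloads before they are sent. The sketch below shows a generic pre-publish check (JSON-serializability and size); the 32 KiB limit is an illustrative assumption, not a figure taken from this page, so check your provider's documented limits.

```python
# Minimal sketch: validate a message payload before publishing so malformed or
# oversized messages are rejected client-side instead of clogging a delivery queue.
import json

MAX_PAYLOAD_BYTES = 32 * 1024  # illustrative limit; confirm against your provider's docs

def validate_payload(payload: object) -> bytes:
    """Serialize and size-check a payload, raising ValueError if it is unusable."""
    try:
        encoded = json.dumps(payload).encode("utf-8")
    except (TypeError, ValueError) as err:
        raise ValueError(f"payload is not JSON-serializable: {err}") from err
    if len(encoded) > MAX_PAYLOAD_BYTES:
        raise ValueError(
            f"payload is {len(encoded)} bytes, over the {MAX_PAYLOAD_BYTES}-byte limit"
        )
    return encoded

# Usage: validate before handing the message to your publish call.
# body = validate_payload({"title": "hello", "badge": 1})
```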
Description: At 18:53 UTC on June 7, 2024, we observed excessively latent deliveries of mobile push messages in our Frankfurt point of presence. We discovered that a previously undetected bug was being triggered by malformed messages being sent to the service. We increased the resources available to that service, which allowed the system to catch up, and deliveries resumed normally. The issue was declared resolved at 19:59 UTC.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we are leaving the system running in the new configuration. We will also increase monitoring for this area and will modify the push notification service to rectify the bug that originally triggered this scenario.
Status: Postmortem
Impact: None | Started At: June 7, 2024, 7:25 p.m.
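One of the follow-ups above is additional monitoring of the push delivery path. A simple form of that is watching queue backlog, as sketched below: alert when queue depth or the age of the oldest unprocessed message crosses a threshold. `get_queue_depth` and `get_oldest_message_age_s` are hypothetical hooks into whatever queue system you operate, and the thresholds are placeholders.

```python
# Minimal sketch: alert when a delivery queue backs up, either by depth or by
# the age of the oldest unprocessed message. The two getter callables are
# placeholders for your queue's own metrics API.
MAX_DEPTH = 10_000        # illustrative thresholds
MAX_OLDEST_AGE_S = 60.0

def check_backlog(get_queue_depth, get_oldest_message_age_s) -> list[str]:
    alerts = []
    depth = get_queue_depth()
    if depth > MAX_DEPTH:
        alerts.append(f"queue depth {depth} exceeds {MAX_DEPTH}")
    age = get_oldest_message_age_s()
    if age > MAX_OLDEST_AGE_S:
        alerts.append(f"oldest message is {age:.0f}s old, over {MAX_OLDEST_AGE_S:.0f}s")
    return alerts
```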