Outage and incident data over the last 30 days for PubNub.
OutLogger tracks the status of these components for PubNub:
Component | Status |
---|---|
Functions | Active |
Functions Service | Active |
Key Value store | Active |
Scheduler Service | Active |
Vault | Active |
Points of Presence | Active |
Asia Pacific Points of Presence | Active |
European Points of Presence | Active |
North America Points of Presence | Active |
Southern Asia Points of Presence | Active |
Realtime Network | Active |
Access Manager Service | Active |
App Context Service | Active |
DNS Service | Active |
Mobile Push Gateway | Active |
MQTT Gateway | Active |
Presence Service | Active |
Publish/Subscribe Service | Active |
Realtime Analytics Service | Active |
Storage and Playback Service | Active |
Stream Controller Service | Active |
Website and Portals | Active |
Administration Portal | Active |
PubNub Support Portal | Active |
SDK Documentation | Active |
Website | Active |
View the latest incidents for PubNub and check for official updates:
Description:

### **Problem Description, Impact, and Resolution**

At 12:00 UTC on May 3, 2022, we observed latency in our Storage & Playback service in the US West PoP, which manifested as missing messages for clients using that service to look up messages in that location. Publish and Subscribe operations were unaffected. We identified the cause as an issue with a downstream data storage service provider in that region and took steps to have other regions assist in processing the message backlog. This caused the secondary effect of temporarily increased latency and error rates in the Storage & Playback and Push services in the US East and AP Northeast PoPs from 13:42 to 13:54, after which all services were operating nominally. An additional secondary effect manifested as increased latency in the Push service from 14:25 to 14:36. All systems were then performing within normal bounds, and the incident was considered resolved at 14:36 UTC the same day.

This issue occurred because 2 of 3 database nodes failed at a database provider in the US West region. The provider completed the replacement of the failed nodes at 18:00 UTC the same day, after which we returned to our normal operating posture for the affected services.

### **Mitigation Steps and Recommended Future Preventative Measures**

To help minimize the impact of a similar issue in the future, we updated our operational runbooks for dealing with a regional database failure based on observations made during this incident. We noted the secondary effects on the Push system caused by the runbook used to route around the issue by bringing in other regions' capacity, and have scheduled work to prevent that kind of effect in any similar procedure. We are continuing to work with our database provider to analyze the root cause in their service and mitigate it going forward.
Status: Postmortem
Impact: Minor | Started At: May 3, 2022, 1:15 p.m.
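For context, the "look up messages" traffic affected here goes through PubNub's history (Storage & Playback) API. The sketch below is a minimal, hypothetical example of such a lookup using the PubNub JavaScript/TypeScript SDK (v7+ assumed); the keys, channel name, and count are placeholders and are not taken from the incident report.

```typescript
import PubNub from 'pubnub';

// Placeholder credentials and identifiers; not part of the incident report.
const pubnub = new PubNub({
  subscribeKey: 'demo',
  publishKey: 'demo',
  userId: 'history-reader',
});

// A Storage & Playback lookup: fetch the most recent stored messages on a
// channel. During the incident, calls like this in the US West PoP could
// return slowly or with messages missing until the backlog was processed.
async function readRecentMessages(channel: string) {
  const response = await pubnub.fetchMessages({
    channels: [channel],
    count: 25, // most recent 25 stored messages
  });
  return response.channels[channel] ?? [];
}

readRecentMessages('my_channel')
  .then((messages) => console.log(`fetched ${messages.length} stored messages`))
  .catch((err) => console.error('history lookup failed', err));
```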
Description:

### **Problem Description, Impact, and Resolution**

At 18:42 UTC on 2022-02-16, we observed an increase in subscribe (to channel groups) errors due to issues with channel group registrations (adding/removing channels to/from channel groups) in our EU PoP. We notified our storage provider and began rerouting storage traffic to our Mumbai PoP to mitigate the issue. At 21:56 UTC on 2022-02-16, the issue was resolved. A new usage pattern at scale exposed sub-optimal behavior that required us to scale our storage services on short notice to mitigate the issue.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we are fixing the bottleneck so that we can scale our storage service more quickly.
Status: Postmortem
Impact: None | Started At: Feb. 16, 2022, 9:33 p.m.
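The "channel group registration" calls mentioned above are the add/remove-channel operations that back channel-group subscribes. Below is a minimal, hypothetical sketch of that flow using the PubNub JavaScript/TypeScript SDK (v7+ assumed); the group and channel names are placeholders, not details from the incident.

```typescript
import PubNub from 'pubnub';

// Placeholder credentials; not part of the incident report.
const pubnub = new PubNub({
  subscribeKey: 'demo',
  publishKey: 'demo',
  userId: 'cg-client',
});

async function registerAndSubscribe() {
  // "Channel group registration": add channels to a named group. These are
  // the add/remove calls that were returning errors during the incident.
  await pubnub.channelGroups.addChannels({
    channelGroup: 'orders-group',
    channels: ['orders.eu', 'orders.us'],
  });

  // Subscribing to the group depends on the registration above, which is why
  // subscribe-to-channel-group errors rose while registrations were failing.
  pubnub.subscribe({ channelGroups: ['orders-group'] });
}

registerAndSubscribe().catch((err) => console.error('registration failed', err));
```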
Description:

### **Problem Description, Impact, and Resolution**

At 03:27 UTC on 2022-02-16, we observed errors with channel group registrations (adding/removing channels to/from channel groups) in the South AP PoP (Mumbai). We immediately escalated to our storage provider while we routed storage traffic to US West to mitigate the issue. The issue was resolved at 04:30 UTC on 2022-02-16. This issue occurred due to a bug in our storage provider's platform. Our internal monitoring detected the errors, which allowed us to take immediate action by rerouting traffic and escalating to our provider.

### **Mitigation Steps and Recommended Future Preventative Measures**

We plan to migrate our Mumbai PoP to Kubernetes, which will allow us to more efficiently reroute traffic to another PoP when needed. Our storage provider will resolve the existing bug.
Status: Postmortem
Impact: Critical | Started At: Feb. 16, 2022, 3:41 a.m.