Get notified about any outages, downtime or incidents for Cronofy and 1800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for Cronofy.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
Sign Up Now
OutLogger tracks the status of these components for Cronofy:
Component | Status |
---|---|
API | Active |
Background Processing | Active |
Developer Dashboard | Active |
Scheduler | Active |
Conferencing Services | Active |
GoTo | Active |
Zoom | Active |
Major Calendar Providers | Active |
Apple | Active |
Google | Active |
Microsoft 365 | Active |
Outlook.com | Active |
View the latest incidents for Cronofy and check for official updates:
Description: The incident has now been resolved. We will conduct a full root cause analysis and publish the outcomes in the coming days. In the meantime, if you have further questions, please email [email protected].
Status: Resolved
Impact: Critical | Started At: May 14, 2021, 12:53 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: None | Started At: April 13, 2021, 3:09 p.m.
Description:
# What happened
On Monday 22nd March 2021, our US data center experienced degraded performance for 25 minutes, between 20:05:17 UTC and 20:30:30 UTC. During this time the performance of background processing, such as calendar syncing, was affected. While we initially thought that the API and App were also affected, investigation showed that the impact on these services was minimal. We first became aware of the issue at 20:11 UTC, when we received our first PagerDuty alert and began investigating. The maximum delay to background processing tasks would have been 25 minutes, but, as not all tasks were failing, the delay seen by customers was likely less than this.

# Our investigation
Due to the nature of the alert, we immediately knew that background processing was affected because queues were backing up. Our application logs showed that DNS lookups were failing. Subsequent investigation found that one of our five CoreDNS pods had stopped processing DNS requests; although we have measures in place to replace unresponsive pods, it was still responding to health checks and so remained in service. This meant that each DNS lookup had a 20% chance of being routed to the broken CoreDNS pod, so most background processing jobs had a 20% chance of failing; because Google calendar synchronization requires three DNS lookups, those tasks had roughly a 60% chance of failing (see the sketch below). We chose to replace all of our CoreDNS pods, and after doing so DNS lookups stopped failing and the messages that had backed up on our queues were processed by 20:35 UTC.

# What we're doing
We've already increased the amount of logging CoreDNS outputs and adjusted CoreDNS pod auto-scaling so that we have more pods in service at busier times. This means that if this event were to happen again, we would have more information at our disposal and the chance of hitting a broken CoreDNS pod would be reduced. In the short term, we will look at getting metrics from the CoreDNS pods into our observability tooling, which will allow us to keep a closer eye on how the CoreDNS pods are performing.
Status: Postmortem
Impact: Minor | Started At: March 22, 2021, 8:21 p.m.
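The failure percentages in the postmortem come from compounding the per-lookup risk across the lookups a job makes. Here is a minimal sketch, assuming each DNS lookup is routed independently with a 1-in-5 chance of hitting the broken pod and that a job fails if any of its lookups fails; the lookup counts are illustrative, not taken from Cronofy's systems:

```python
# Sketch: how the chance of hitting one broken CoreDNS pod (1 of 5)
# compounds with the number of DNS lookups a background job performs.
# Assumption (not from the postmortem): lookups are routed independently
# and a job fails if any single lookup fails.

BROKEN_PODS = 1
TOTAL_PODS = 5
P_LOOKUP_FAILS = BROKEN_PODS / TOTAL_PODS  # 20% per lookup


def job_failure_probability(lookups_per_job: int) -> float:
    """Probability that at least one of the job's lookups hits the broken pod."""
    return 1 - (1 - P_LOOKUP_FAILS) ** lookups_per_job


if __name__ == "__main__":
    for name, lookups in [("single-lookup job", 1),
                          ("Google calendar sync (3 lookups)", 3)]:
        print(f"{name}: {job_failure_probability(lookups):.0%} chance of failing")
    # single-lookup job: 20% chance of failing
    # Google calendar sync (3 lookups): 49% chance of failing
    # (the postmortem quotes roughly 60%, treating the three lookups additively)
```

Under these assumptions, multi-lookup jobs are more than twice as exposed as single-lookup ones, which is consistent with the postmortem's observation that calendar synchronization was hit hardest.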
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.