Last checked: 4 minutes ago
Get notified about any outages, downtime or incidents for Cronofy and 1800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for Cronofy.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
Sign Up Now
OutLogger tracks the status of these components for Cronofy:
Component | Status |
---|---|
API | Active |
Background Processing | Active |
Developer Dashboard | Active |
Scheduler | Active |
Conferencing Services | Active |
GoTo | Active |
Zoom | Active |
Major Calendar Providers | Active |
Apple | Active |
Google | Active |
Microsoft 365 | Active |
Outlook.com | Active |
View the latest incidents for Cronofy and check for official updates:
Description: On Wednesday, 13th July 2022 we experienced up to 50 minutes of degraded performance in all of our data centers between 16:10 and 17:00 UTC. This was caused by an upgrade to our Kubernetes clusters (how the Cronofy platform is hosted) from version 1.20 to 1.21. This involves upgrading several components, one of which, CoreDNS, was the source of this incident. CoreDNS was being upgraded from version 1.8.3 to 1.8.4, as this is the AWS-recommended version to use with Kubernetes 1.21 hosted on Amazon's Elastic Kubernetes Service. Upgrading these components is usually a zero-downtime operation and so was being performed during working hours. Reverting the update to components, including CoreDNS, resolved the issue.

This would have presented as interactions with the Cronofy platform and calendar synchronization operations taking longer than usual. For example, the 99th percentile of Cronofy API response times is usually around 0.5 seconds, while during the incident it increased to around 5 seconds. Calendar synchronization operations were delayed by up to 30 minutes during the incident.

Our investigations following the incident identified that CoreDNS version 1.8.4 included a regression in behavior from 1.8.3 which caused the high level of errors within our clusters, leading to the performance degradation. We are improving our processes around such infrastructure changes to avoid similar incidents in future.

# Timeline

_All times UTC on Wednesday, 13th July 2022 and approximate for clarity_

**16:10** Upgrade of components including CoreDNS started across all data centers.
**16:15** Upgrade completed.
**16:16** First alert received relating to the US data center. Manual checks show that the application was responding.
**16:18** Second alert received for degraded background worker performance in the CA and DE data centers. Investigations show that CPU utilization is high on all servers, in all Kubernetes clusters. Additional servers were provisioned automatically and then more added manually.
**16:19** Multiple alerts being received from all data centers.
**16:31** This incident was opened on our status page informing customers of the issue. We decided to roll back the component upgrade.
**16:45** As the components including CoreDNS were rolled back in each data center, errors dropped to normal levels and performance improved.
**16:47** Rollback completed. The backlog of background work was being processed.
**17:00** The backlog of background work was cleared.
**17:05** Incident status changed to monitoring.
**17:49** Incident closed.

# Actions

Although there wasn't an outage, we certainly want to prevent this from happening again in the future. This led us to ask three questions:

1. Why was this not picked up in our test environment?
2. What could we have done to identify the root cause sooner?
3. How could the impact of the change be reduced?

## Why was this not picked up in our test environment?

Although this was tested in our test environment, the time between finishing the testing and deploying the change to the production environments was too short. This meant that we missed the performance degradation it introduced. We are going to review the test plan for such infrastructure changes in our test environment. This will include a soaking period: we will wait a set amount of time between implementing new changes in our test environment and rolling them out to the production environments.
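To make the soaking-period idea concrete, here is a minimal sketch. The names (`InfraChange`, `SOAK_PERIOD`, `alerts_since`) and the two-day soak are assumptions for illustration only and do not reflect Cronofy's actual tooling.

```python
# A minimal sketch of a "soaking period" gate for infrastructure changes.
# All names and the two-day soak are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

SOAK_PERIOD = timedelta(days=2)  # assumed; the post does not state a duration

@dataclass
class InfraChange:
    name: str
    applied_to_test_at: datetime

def alerts_since(environment: str, since: datetime) -> int:
    """Placeholder for a query against the monitoring/alerting system."""
    return 0

def ready_for_production(change: InfraChange) -> bool:
    """Allow promotion only after the change has soaked in the test
    environment for SOAK_PERIOD with no alerts raised in that window."""
    now = datetime.now(timezone.utc)
    soaked = now - change.applied_to_test_at >= SOAK_PERIOD
    quiet = alerts_since("test", change.applied_to_test_at) == 0
    return soaked and quiet

change = InfraChange(
    name="CoreDNS 1.8.3 -> 1.8.4",
    applied_to_test_at=datetime(2022, 7, 11, 9, 0, tzinfo=timezone.utc),
)
print(ready_for_production(change))
```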
## What could we have done to identify the root cause sooner?

Previous Kubernetes upgrades had been straightforward, which led to over-confidence. Multiple infrastructure components were changed at once, so we were unable to easily identify which component was responsible. In future, we will split infrastructure component upgrades into multiple phases to help identify the cause of any problems that occur.

## How could the impact of the change be reduced?

As mentioned above, previous Kubernetes upgrades had been straightforward, which led to over-confidence. We rolled out the component updates, including CoreDNS, to all environments in a short amount of time, and it wasn't until they had all been completed that we started to receive alerts. To prevent this from happening again, we are going to phase the rollout of such changes to our production environments. This means an issue like this will only affect some environments rather than all of them, reducing the impact and aiding a faster resolution. A rough sketch of this approach follows below.

# Further questions?

If you have any further questions, please contact us at [[email protected]](mailto:[email protected])
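For illustration only, here is a minimal sketch of the phased rollout described in the actions above. The data-center names, bake time, and helper functions (`apply_upgrade`, `healthy`, `roll_back`) are assumptions for the example, not Cronofy's actual deployment tooling.

```python
# A minimal sketch of a phased production rollout: upgrade one data center at
# a time, wait for alerts to surface, and revert everything applied so far if
# health checks fail. All names and values here are illustrative assumptions.
import time

DATA_CENTERS = ["US", "CA", "DE"]  # ordering is illustrative
BAKE_TIME_SECONDS = 5  # in practice this would be far longer, e.g. 30+ minutes

def apply_upgrade(dc: str) -> None:
    print(f"applying component upgrade in {dc}")

def roll_back(dc: str) -> None:
    print(f"rolling back component upgrade in {dc}")

def healthy(dc: str) -> bool:
    """Placeholder for checks such as p99 API latency and error rates."""
    return True

def phased_rollout() -> None:
    upgraded: list[str] = []
    for dc in DATA_CENTERS:
        apply_upgrade(dc)
        upgraded.append(dc)
        time.sleep(BAKE_TIME_SECONDS)  # let alerts surface before continuing
        if not healthy(dc):
            # Revert only the environments touched so far, limiting the
            # blast radius to a subset of data centers.
            for done in reversed(upgraded):
                roll_back(done)
            return
    print("rollout complete in all data centers")

phased_rollout()
```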
Status: Postmortem
Impact: Minor | Started At: July 13, 2022, 4:31 p.m.
Description: Cronofy's calls to Zoom's API experienced a heightened number of errors for roughly 40 minutes starting at around 14:00 UTC. Normal operation has been restored for around an hour, and our spot checks indicate that conferencing details were eventually provisioned as expected.
Status: Resolved
Impact: Minor | Started At: June 21, 2022, 2:40 p.m.
Description: An internal process initiated from our centralized billing system appears to be responsible for rendering our UK data center largely unreachable between 11:04 UTC and 11:06 UTC. Our internal billing-related API was invoked at such a rate that our web servers were starved of resources for handling further requests. We will be reviewing this process and others like it to avoid such things happening in future.
Status: Resolved
Impact: None | Started At: May 3, 2022, 11:09 a.m.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.