
Is there a Cronofy outage?

Cronofy status: Systems Active

Last checked: 8 minutes ago

Get notified about any outages, downtime, or incidents for Cronofy and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Cronofy outages and incidents

Outage and incident data over the last 30 days for Cronofy.

There have been 0 outages or incidents for Cronofy in the last 30 days.

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Cronofy

OutLogger tracks the status of these components for Cronofy:

API Active
Background Processing Active
Developer Dashboard Active
Scheduler Active
GoTo Active
Zoom Active
Apple Active
Google Active
Microsoft 365 Active
Outlook.com Active

Latest Cronofy outages and incidents.

View the latest incidents for Cronofy and check for official updates:

Updates:

  • Time: July 28, 2022, 6:41 p.m.
    Status: Resolved
    Update: From 18:23 to 18:28 UTC we saw reachability problems for our US data center. Symptomatically this is extremely similar to the outage observed on Saturday 23rd July 2022, details of which can be found here: https://status.cronofy.com/incidents/32fc8mjcr1zw Steps are already underway to alleviate the believed root cause of this.

Updates:

  • Time: July 28, 2022, 8:40 a.m.
    Status: Postmortem
    Update: On Saturday, 23rd July 2022, we experienced a 12-minute outage in our US data center between 17:29 and 17:41 UTC. During this time, our API at [api.cronofy.com](http://api.cronofy.com) and our web application at [app.cronofy.com](http://app.cronofy.com) were not reachable. Any requests made are likely to have failed to connect or to have received a 500-range status code rather than being handled successfully. Our web application hosts the developer dashboard, Scheduler, Real-Time Scheduling pages, and end-user authorization flows. Our background processing of jobs, such as calendar synchronization, was not affected.
    Cronofy records all API calls into an API request table before processing them. The outage was triggered when the database locked this table. Unable to write requests to the table, all API requests began to queue up and time out and, once the queue was full, were rejected outright. This, in turn, caused our infrastructure to mark these servers as unhealthy and take them out of service.
    We experienced a [very similar incident](https://status.cronofy.com/incidents/mz84qh5n29cq) in February 2021. Since that incident, we have [performed major version upgrades](https://status.cronofy.com/incidents/wzj1vnhj31zc) to our PostgreSQL clusters, and we had thought those upgrades had fixed this issue, as we had not had a recurrence for a long time. It is now clear that the major version upgrades have, unfortunately, not fixed this particular issue. To help prevent this issue from happening again, we will be making changes to how data is stored within our PostgreSQL cluster.
    # Timeline
    _All times UTC on Saturday, 23rd July 2022 and approximate for clarity_
    **17:29** App and API requests began to fail.
    **17:31** The on-call engineer is alerted to the app and API being unresponsive.
    **17:35** Attempts to mitigate the issue are made, including launching more servers. These result in temporary improvements but do not fix the issue.
    **17:37** The initial alerts clear as connectivity is temporarily restored by our mitigation attempts.
    **17:38** New alerts are raised for the app and API being unresponsive.
    **17:39** Incident channel created, and other engineers come online to help.
    **17:41** This incident is created. While this is being done, telemetry shows that API and app requests are being processed again.
    **17:52** Incident status is changed to monitoring and we continue to investigate the root cause.
    **18:47** Incident status is resolved.
    # Actions
    The actions for this incident fall into two categories: what we can do straight away, and what we can do in the medium/long term.
    ## Short term
    To improve the performance of database queries we use several indexes within our PostgreSQL clusters; these help to locate data quickly and efficiently. This locking issue always seems to occur when these indexes are being updated and the database gets into a state where it is waiting for some operations to resolve. Therefore, we are going to review which indexes are actively used and determine whether any can safely be removed or consolidated, as this will reduce the chances of the issue occurring by reducing the number of indexes which need updating.
    We are also going to look at whether we can improve our alerts to help us identify the root cause of this type of issue faster and give our on-call engineers a clearer signal that this is the root cause (a sketch of the kind of checks this implies appears after this list of updates). While we currently don't have a way of resolving the issue directly (the database eventually resolves the locks), this will help us provide clearer messaging and faster investigations.
    ## Medium/long term
    In the medium to long term, we will review the storage of API and app requests and determine whether PostgreSQL is the correct storage technology. This is likely to lead to re-architecting how we store some types of data to ensure our service is robust in the future.
    ## Further questions?
    If you have any further questions, please contact us at [[email protected]](mailto:[email protected])
  • Time: July 23, 2022, 6:47 p.m.
    Status: Resolved
    Update: The service is still healthy, and we have identified the likely root cause as a rare case in our database management system being triggered. This caused high levels of locking and degraded performance. This occurred at 17:29 UTC and lasted until the locks resolved at 17:41 UTC. We are investigating short and medium-term solutions to change our infrastructure to avoid a repeat incident.
  • Time: July 23, 2022, 6:09 p.m.
    Status: Monitoring
    Update: Everything is continuing to perform at normal levels. We are still investigating the root cause and monitoring the service.
  • Time: July 23, 2022, 5:52 p.m.
    Status: Monitoring
    Update: We have identified an unusually high number of locks in our database, causing a performance degradation due to high contention. This has now passed and we are monitoring the service while continuing to investigate the root cause.
  • Time: July 23, 2022, 5:45 p.m.
    Status: Investigating
    Update: We are seeing normal service resuming and are still investigating the source of the issue.
  • Time: July 23, 2022, 5:41 p.m.
    Status: Investigating
    Update: We are currently investigating high levels of errors when trying to communicate with the API, Scheduler, and Developer Dashboard.
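
The short-term actions in the postmortem above center on two PostgreSQL checks: giving on-call engineers a clearer signal when backends are stuck behind ungranted locks, and reviewing which indexes are actually used so unneeded ones can be removed or consolidated. Below is a minimal sketch of both checks against PostgreSQL's built-in statistics views; it assumes psycopg2 and read access to the cluster, and the DSN is a placeholder rather than anything from Cronofy's setup.

```python
# Sketch: surface ungranted lock waits and never-scanned indexes in PostgreSQL.
# Assumes psycopg2 and read access to the cluster's statistics views;
# the DSN below is a placeholder, not Cronofy's configuration.
import psycopg2

BLOCKED_BACKENDS = """
    SELECT a.pid, a.wait_event_type, a.wait_event,
           now() - a.query_start AS waiting_for, left(a.query, 80) AS query
    FROM pg_stat_activity a
    JOIN pg_locks l ON l.pid = a.pid AND NOT l.granted
    ORDER BY waiting_for DESC;
"""

UNUSED_INDEXES = """
    SELECT relname AS table_name, indexrelname AS index_name,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY pg_relation_size(indexrelid) DESC;
"""

def report(dsn: str = "postgresql://localhost/example") -> None:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # 1. A clearer on-call signal: backends waiting on ungranted locks.
        cur.execute(BLOCKED_BACKENDS)
        blocked = cur.fetchall()
        print(f"{len(blocked)} backend(s) waiting on ungranted locks")
        for row in blocked:
            print("  ", row)

        # 2. Index review: indexes that have never been scanned are candidates
        #    for removal or consolidation, reducing work on every table write.
        cur.execute(UNUSED_INDEXES)
        for table_name, index_name, index_size in cur.fetchall():
            print(f"  unused index {index_name} on {table_name} ({index_size})")

if __name__ == "__main__":
    report()
```

Both queries read only monitoring views, so running them during an incident adds no further contention on the locked request table itself.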

Updates:

  • Time: July 21, 2022, 11:19 p.m.
    Status: Resolved
    Update: At approximately 22:16 UTC, we observed a much higher number of errors for Google calendar API calls than we would expect (mostly no data received for the events page) in our German data center. The on-call engineer was alerted to this issue at 22:32 UTC. After investigating, we decided to open an incident about this at 22:49 UTC to inform customers of service degradation in our German data center. While opening the incident, we were alerted about the US data center also being impacted. We saw that around 10% of Google calendar API calls in our US data center were returning an error, and so the incident was updated at 22:56 UTC. Errors communicating with the Google calendar API returned to normal levels in both our German and US data centers at around 22:52 UTC. Errors have remained at normal levels since then, so we are resolving this incident. There does not appear to have been a pattern to the accounts affected by this. (A sketch of this kind of per-data-center error-rate check appears after this list of updates.)
  • Time: July 21, 2022, 11:01 p.m.
    Status: Monitoring
    Update: Errors returned to usual levels at around 22:52 UTC, as the previous message was being sent. We continue to monitor the situation.
  • Time: July 21, 2022, 10:56 p.m.
    Status: Monitoring
    Update: Initial investigations showed that this was only affecting our German data center. However, we can now see that this is also affecting our US data center, but on a much smaller scale. We are continuing to monitor the situation. Our monitoring shows that the synchronization performance of other calendar providers is not affected.
  • Time: July 21, 2022, 10:49 p.m.
    Status: Monitoring
    Update: Since approximately 22:16 UTC, we have seen a higher level of errors when communicating with Google calendars than we would normally expect in our German data center. We are monitoring the situation. Synchronization performance for Google calendars will be affected by this; other calendar providers are not affected.
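
The updates above describe noticing that roughly 10% of Google calendar API calls in one data center were failing while other providers stayed healthy. The sketch below shows one way such a per-provider, per-data-center error-rate check could work; the rolling window, the 10% threshold, and the alert hook are illustrative placeholders, not Cronofy's actual monitoring.

```python
# Sketch: per-data-center error-rate check for upstream calendar API calls.
# The window size, threshold, and alert hook are hypothetical placeholders.
from collections import deque
from dataclasses import dataclass, field

ERROR_RATE_THRESHOLD = 0.10  # alert when >10% of recent calls fail
WINDOW = 500                 # number of recent calls to consider

@dataclass
class ProviderErrorMonitor:
    data_center: str
    provider: str
    outcomes: deque = field(default_factory=lambda: deque(maxlen=WINDOW))

    def record(self, ok: bool) -> None:
        """Record one upstream API call and alert if the error rate is high."""
        self.outcomes.append(ok)
        if len(self.outcomes) == WINDOW:
            error_rate = 1 - sum(self.outcomes) / WINDOW
            if error_rate > ERROR_RATE_THRESHOLD:
                self.alert(error_rate)

    def alert(self, error_rate: float) -> None:
        # Placeholder for paging the on-call engineer.
        print(f"[{self.data_center}] {self.provider} error rate "
              f"{error_rate:.0%} over last {WINDOW} calls")

# Example: monitor = ProviderErrorMonitor("de", "google"); monitor.record(ok=False)
```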

Updates:

  • Time: July 15, 2022, 3:37 p.m.
    Status: Postmortem
    Update: On Wednesday, 13th July 2022, we experienced up to 50 minutes of degraded performance in all of our data centers between 16:10 and 17:00 UTC. This was caused by an upgrade to our Kubernetes clusters (how the Cronofy platform is hosted) from version 1.20 to 1.21. This involved upgrading several components, one of which, CoreDNS, was the source of this incident. CoreDNS was being upgraded from version 1.8.3 to 1.8.4, as this is the AWS-recommended version to use with Kubernetes 1.21 hosted on Amazon's Elastic Kubernetes Service. Upgrading these components is usually a zero-downtime operation and so was being performed during working hours. Reverting the update to components, including CoreDNS, resolved the issue.
    This would have presented as interactions with the Cronofy platform and calendar synchronization operations taking longer than usual. For example, the 99th percentile of Cronofy API response times is usually around 0.5 seconds, while during the incident it increased to around 5 seconds. Calendar synchronization operations were delayed by up to 30 minutes during the incident.
    Our investigations following the incident have identified that CoreDNS version 1.8.4 included a regression in behavior from 1.8.3 which caused the high level of errors within our clusters, leading to the performance degradation. We are improving our processes around such infrastructure changes to avoid such incidents in future.
    # Timeline
    _All times UTC on Wednesday, 13th July 2022 and approximate for clarity_
    **16:10** Upgrade of components including CoreDNS started across all data centers.
    **16:15** Upgrade completed.
    **16:16** First alert received relating to the US data center. Manual checks show that the application was responding.
    **16:18** Second alert received for degraded background worker performance in CA and DE data centers. Investigations show that CPU utilization is high on all servers, in all Kubernetes clusters. Additional servers were provisioned automatically and then more added manually.
    **16:19** Multiple alerts being received from all data centers.
    **16:31** This incident was opened on our status page informing customers of the issue. We decided to roll back the component upgrade.
    **16:45** As the components, including CoreDNS, were rolled back in each data center, errors dropped to normal levels and performance improved.
    **16:47** Rollback completed. The backlog of background work was being processed.
    **17:00** The backlog of background work was cleared.
    **17:05** Incident status changed to monitoring.
    **17:49** Incident closed.
    # Actions
    Although there wasn't an outage, we certainly want to prevent this from happening again in the future. This led us to ask three questions:
    1. Why was this not picked up in our test environment?
    2. What could we have done to identify the root cause sooner?
    3. How could the impact of the change be reduced?
    ## Why was this not picked up in our test environment?
    Although this was tested in our test environment, the time between finishing the testing and deploying the change to the production environments was too short. This meant that we missed the performance degradation it introduced. We are going to review the test plan for such infrastructure changes in our test environment. This will include a soaking period, which will see us wait a set amount of time between implementing new changes in our test environment and rolling them out to the production environments.
    ## What could we have done to identify the root cause sooner?
    Previous Kubernetes upgrades had been straightforward, which led to over-confidence. Multiple infrastructure components were changed at once, so we were unable to easily identify which component was responsible. In future, we will split infrastructure component upgrades into multiple phases to help identify the cause of problems if they occur.
    ## How could the impact of the change be reduced?
    As mentioned above, previous Kubernetes upgrades had been straightforward, which led to over-confidence. We rolled out the component updates, including CoreDNS, to all environments in a short amount of time, and it wasn't until they had all been completed that we started to receive alerts. To prevent this from happening in the future for such changes, we are going to have a phased rollout to our production environments. This will mean such an issue will only impact some environments rather than all of them, reducing the impact and aiding a faster resolution (a sketch of such a phased rollout with a soak period appears after this list of updates).
    # Further questions?
    If you have any further questions, please contact us at [[email protected]](mailto:[email protected])
  • Time: July 13, 2022, 5:49 p.m.
    Status: Resolved
    Update: This afternoon we were upgrading our Kubernetes clusters, which are all hosted using AWS Elastic Kubernetes Service. There are multiple steps to this process, all of which had been performed successfully in our testing environment, and it wasn't until the last step of the process had been applied that we started to see issues. The last step of the process was upgrading CoreDNS and Kube Proxy to the versions recommended by AWS for the new version of EKS. This started at approximately 16:10 UTC. Shortly after this, we received alerts informing us of degraded performance when processing messages. The CoreDNS and Kube Proxy logs didn't contain any errors, so we thought our worker processes might have been stuck and restarted them; however, this did not resolve the issue. At 16:31 UTC this incident was created while we continued to identify the cause. We decided the best course of action was to start rolling back the last change that was made. We started by doing this in a single environment to see if it had the desired effect. Rolling back Kube Proxy had no effect, but when we rolled back CoreDNS we very quickly saw that messages were being processed and the backlog in our queues started to reduce. We then started to roll out the CoreDNS rollback to all environments; this was completed by approximately 16:46 UTC. It then took a further 15 minutes for the backlog of messages to be cleared. Normal performance was resumed at 17:01 UTC. We will be conducting a postmortem of this incident and will share our findings by Monday 18th July.
  • Time: July 13, 2022, 5:05 p.m.
    Status: Monitoring
    Update: The backlog of work generated by the degraded performance has now been processed. We're continuing to monitor the situation.
  • Time: July 13, 2022, 4:49 p.m.
    Status: Identified
    Update: We had recently upgraded CoreDNS within our Kubernetes clusters. Although initial signs suggested that CoreDNS was operating normally, we decided to roll back. After rolling back, performance appears to have returned to normal; however, we will continue to monitor the situation.
  • Time: July 13, 2022, 4:31 p.m.
    Status: Investigating
    Update: We are investigating degraded performance in all data centers.
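
The postmortem above commits to a soak period after testing and a phased rollout across production environments so that a bad component upgrade, such as CoreDNS 1.8.4, is caught before it reaches every data center. Below is a minimal sketch of that rollout shape; the environment names, health check, and rollback hooks are hypothetical placeholders, not Cronofy's actual automation.

```python
# Sketch: phased rollout of an infrastructure change with a soak period
# between environments. The hooks and environment names are hypothetical
# placeholders, not Cronofy's actual automation.
import time
from typing import Callable, Sequence

def phased_rollout(
    environments: Sequence[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    roll_back: Callable[[str], None],
    soak_seconds: int = 30 * 60,
    check_interval: int = 60,
) -> None:
    """Apply a change one environment at a time, soaking before moving on."""
    for env in environments:
        apply_change(env)
        deadline = time.monotonic() + soak_seconds
        while time.monotonic() < deadline:
            if not is_healthy(env):
                # Stop the rollout and revert only the environment just changed,
                # so later environments are never exposed to the bad change.
                roll_back(env)
                raise RuntimeError(f"rollout halted: {env} unhealthy after change")
            time.sleep(check_interval)
        print(f"{env}: healthy for the full {soak_seconds}s soak, continuing")

# Example wiring with placeholder hooks:
# phased_rollout(["test", "de", "ca", "us"], upgrade_coredns, p99_latency_ok, revert_coredns)
```

The key property is that the rollout stops at the first unhealthy environment, matching the stated goal that an issue should only impact some environments rather than all of them.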

Check the status of similar companies and alternatives to Cronofy

NetSuite

Systems Active

ZoomInfo

Systems Active

SPS Commerce

Systems Active

Miro

Systems Active

Field Nation

Systems Active

Outreach

Systems Active

Own Company

Systems Active

Mindbody

Systems Active

TaskRabbit

Systems Active

Nextiva

Systems Active

6Sense

Systems Active

BigCommerce

Systems Active

Frequently Asked Questions - Cronofy

Is there a Cronofy outage?
The current status of Cronofy is: Systems Active
Where can I find the official status page of Cronofy?
The official status page for Cronofy is here
How can I get notified if Cronofy is down or experiencing an outage?
To get notified of any status changes to Cronofy, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Cronofy every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Cronofy do?
Cronofy offers scheduling technology that enables users to share their availability across various applications. It also provides enterprise-level scheduling tools, UI elements, and APIs.