Is there a Cronofy outage?

Cronofy status: Systems Active

Last checked: 9 minutes ago

Get notified about any outages, downtime, or incidents for Cronofy and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Cronofy outages and incidents

Outage and incident data over the last 30 days for Cronofy.

There have been 0 outages or incidents for Cronofy in the last 30 days.

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Cronofy

OutLogger tracks the status of these components for Cronofy:

API Active
Background Processing Active
Developer Dashboard Active
Scheduler Active
GoTo Active
Zoom Active
Apple Active
Google Active
Microsoft 365 Active
Outlook.com Active

Latest Cronofy outages and incidents.

View the latest incidents for Cronofy and check for official updates:

Updates:

  • Time: Oct. 2, 2024, 2:38 a.m.
    Status: Resolved
    Update: US data center performance has remained normal and the incident is resolved. Around 00:56 UTC inbound traffic to api.cronofy.com and app.cronofy.com began to show signs of performance degradation. This was observed to be an issue routing traffic from our load balancers to their respective target groups and on to our servers. This resulted in an increase in processing time which, in turn, resulted in some requests timing out. By 01:04 UTC the issue with the load balancers routing traffic had been resolved and traffic flow returned to usual levels. A small backlog of requests was worked through by 01:10 UTC and normal operations resumed. A postmortem of the incident will take place and be attached to this incident in the next 48 hours. If you have any queries in the interim, please contact us at [email protected].
  • Time: Oct. 2, 2024, 2:11 a.m.
    Status: Monitoring
    Update: We're continuing to monitor traffic flow, but aside from an increase in incoming traffic being retried by remote callers, all indicators show that routing returned to normal as of 01:06 UTC.
  • Time: Oct. 2, 2024, 1:55 a.m.
    Status: Identified
    Update: Performance has returned to expected levels. Between 00:56 and 01:04 UTC, traffic making its way from our load balancers to our servers did not do so in a timely manner. This will have resulted in possible timeouts for requests to api.cronofy.com and app.cronofy.com and potential server errors for API integrators and Scheduler users.
  • Time: Oct. 2, 2024, 1:42 a.m.
    Status: Investigating
    Update: We have seen some performance degradation in our US data center. Initial findings appear similar to those of 26 Sept 2024. Improved monitoring has highlighted this issue earlier and we are in the process of investigating further.
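
The updates above describe a short window in which requests to api.cronofy.com and app.cronofy.com could time out or return server errors while load balancer routing recovered. As a purely illustrative sketch (not official Cronofy guidance), an API integrator might absorb a brief blip like this by retrying transient failures with exponential backoff; the endpoint path and access token below are placeholders:

```python
# Illustrative only: generic retry-with-backoff for transient timeouts or
# 5xx responses when calling an HTTP API such as api.cronofy.com.
# The endpoint path and ACCESS_TOKEN are placeholders, not Cronofy guidance.
import time
import requests

def get_with_retries(url: str, token: str, attempts: int = 4) -> requests.Response:
    delay = 1.0
    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(attempts):
        try:
            resp = requests.get(
                url,
                headers={"Authorization": f"Bearer {token}"},
                timeout=10,  # give up on a single request after 10 seconds
            )
            if resp.status_code < 500:
                return resp  # success or client error: do not retry
            last_error = RuntimeError(f"server error {resp.status_code}")
        except requests.RequestException as exc:
            last_error = exc  # covers timeouts and connection resets
        if attempt < attempts - 1:
            time.sleep(delay)
            delay *= 2  # exponential backoff between attempts
    raise last_error

# Example usage (placeholder endpoint and token):
# calendars = get_with_retries("https://api.cronofy.com/v1/calendars", "ACCESS_TOKEN")
```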

Updates:

  • Time: Sept. 27, 2024, 1:43 p.m.
    Status: Resolved
    Update: Between 15:16-15:18 and 15:44-15:46 UTC we experienced degraded performance in our US data center. During these times, a little under 3% of requests to api.cronofy.com and app.cronofy.com resulted in a server error that potentially affected API integrators and Scheduler users. These errors coincided with an AWS issue in the North Virginia region - https://status.aws.amazon.com/#multipleservices-us-east-1_1727378355 - where load balancer target groups experienced slower than normal registration times. We are recording this incident retrospectively as, whilst we were aware of the issue with target groups, we had a gap in our alerting that led us to believe there was no impact to customers related to it. That gap has now been filled. If you have any questions, please email [email protected].

Updates:

  • Time: April 23, 2024, 5:31 p.m.
    Status: Postmortem
    Update: On Monday April 22nd between 11:00 and 13:30 UTC our background processing services suffered a major performance degradation, meaning background work was delayed for around 2 hours in some cases. This impacted operations such as synchronizing schedules to push events into calendars and to update people's availability. A change in our software's dependencies led to our background processors pulling work from queues but not processing that work as expected. This left work messages stuck in a state where the queues believed they were being worked on, so other background processors were not allowed to perform the work instead. For a subset of the background processing during this period we had to wait for a configured timeout of 2 hours to expire, at which point the background work messages became available again and the backlog was cleared. Full service was resumed to all data centers, including processing any delayed messages, by 13:30 UTC. Further details, lessons learned, and further actions we will be taking can be found below.

    ## Timeline
    _All times rounded for clarity and UTC_

    On Monday April 22nd at 10:55 a change was merged which incorporated some minor version changes in dependencies that we use to interact with AWS services. This was to facilitate work against an AWS service we were not previously using. This change in dependencies interacted with a dependency that had not changed such that our calls to fetch work messages from AWS Simple Queue Service (SQS) reported as containing no messages when in fact they did. This meant that messages were being processed as far as AWS SQS was concerned (in-flight), but our application code did not see them in order to process them.

    This change went live from 10:58, with the first alert as a result of the unexpected behavior being triggered at 11:12. The bad change was reverted at 11:15 and fully removed by 11:20. This meant that background work queued between 10:58 and 11:20 was stuck in limbo where AWS SQS thought it was being processed. For our data centers in Australia, Canada, the UK, and Singapore, regular service was resumed at this point. New messages could be received and processed, and we could only wait for the messages in limbo to be released by AWS SQS to process those.

    In our German and US data centers we had hit a hard limit of SQS, with 120,000 messages being considered "in flight" for our high priority queue. This meant that we were unable to read from those queues, but were still allowed to write to them. Once we realised and understood this issue, we released a change to route all new messages to other queues and avoid this problem. This was in place at 12:00.

    Whilst we were able to make changes to remove the initial problem and avoid the effects of the secondary problem caused by hitting the hard limit, the status of the individual work messages was outside of our control. AWS SQS does not have a way to force messages back onto the queue, which is the operation we needed to resolve the issue. We looked for other alternatives, but the work messages aren't accessible in any way via AWS APIs when in this state. Instead we had to wait for the configured timeout to expire, which would release the messages again. We took more direct control over capacity throughout this incident, including preparing additional capacity for the backlog of work messages being released.

    Once the work messages became visible after reaching their two hour timeout, we were able to process them successfully, with full service resumed to all data centers, including processing any delayed messages, by 13:30 UTC. We then reverted changes applied during the incident to help handle it, returning things back to their regular configuration.

    ## Retrospective
    The questions we ask ourselves in an incident retrospective are:
    * Could it have been identified sooner?
    * Could it have been resolved sooner?
    * Could it have been prevented?

    Also, we don't want to focus too heavily on the specifics of an individual incident, but instead look for holistic improvements alongside targeted ones.

    ### Could it have been identified sooner?
    For something with this significant an impact, taking 12 minutes to alert us was too slow. Halving the time to alert would have significantly reduced the impact of this incident, potentially avoiding the second-order issue experienced in our German and US data centers. The false-negative nature of the behavior meant that other safeguards were not triggered. Cronofy's code was not mishandling or ignoring an error; the silent failure meant our application code was unaware of a problem.

    ### Could it have been resolved sooner?
    The key constraint on the resolution of the incident was the "in flight" timeout we had configured for the respective queues. We don't want to rush such a change to a critical part of our infrastructure, but our initial analysis suggests a timeout of 15-30 minutes is likely reasonable and would have made a significant difference to the time to full service recovery.

    ### Could it have been prevented?
    As the cause was a change deployed by ourselves rather than an external factor, undoubtedly. In hindsight, anything touching AWS-related dependencies must always be tested in our staging environment, and this change was not. This would likely have led to the issue being noticed before being deployed at all.

    ## Actions
    We will be creating additional alerts around metrics that went well outside of normal bounds that would have drawn our attention much sooner.

    We will be reducing the timeouts configured on our AWS SQS queues to reduce the time messages are considered "in-flight" without any other interaction, to align more closely with observed background processing execution times.

    We are changing how we reference AWS-related dependencies to make them more explicit and to carry a warning ensuring full testing is performed in our staging environment first. We will also be adding the AWS dependencies to our quarterly patching cycle to keep them contemporary, reducing the possibility of such cross-version incompatibilities.

    ## Further questions?
    If you have any further questions, please contact us at [[email protected]](mailto:[email protected])
  • Time: April 22, 2024, 1:54 p.m.
    Status: Resolved
    Update: Background task processing has remained normal and the incident is resolved. Around 11:00 UTC, a change was deployed to production which inadvertently broke the background processing. We received our first alert at 11:12 UTC. We reverted the offending change at 11:15 UTC, but this did not restore background processing completely. Multiple changes were made to minimize the impact of the issue, which restored most functionality for newly created jobs by 11:54 UTC. This left a backlog of work queued for processing between 11:00-11:54 UTC stuck in an unprocessable state. The stuck backlog was due to requeue at around 13:00 UTC, when a configured timeout of 2 hours would be reached. We attempted to find a way to process this work sooner but were unsuccessful. As anticipated, the work became available from around 13:00 UTC and was processed over about half an hour, completing by 13:30 UTC. This fully restored normal operations. A postmortem of the incident will take place and be attached to this incident in the next 48 hours. If you have any queries in the interim, please contact us at [email protected].
  • Time: April 22, 2024, 1:29 p.m.
    Status: Monitoring
    Update: The backlog of jobs has been processed and we are monitoring to verify that everything remains normal.
  • Time: April 22, 2024, 12:57 p.m.
    Status: Identified
    Update: New work continues to process as normal, and the stuck jobs are expected to be processed in the next hour.
  • Time: April 22, 2024, 12:32 p.m.
    Status: Identified
    Update: Capacity has been increased but the stuck jobs are still not processing due to an issue with the queue. We are working to find a way to process these jobs. New jobs are avoiding the stuck queue and are processing as expected.
  • Time: April 22, 2024, 12:04 p.m.
    Status: Investigating
    Update: We have mitigated most of the impact for background processing but have jobs stuck in US & DE. We are working to further increase capacity and execute these jobs.
  • Time: April 22, 2024, 11:41 a.m.
    Status: Investigating
    Update: Processing has recovered in all data centers apart from US & DE, which have only partly recovered. We are working to mitigate the issue in these two data centers.
  • Time: April 22, 2024, 11:23 a.m.
    Status: Investigating
    Update: We have seen background processing degrade in all data centers after a recent deployment. We have reverted the change and are investigating the cause.
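
The postmortem above hinges on the SQS visibility timeout: once a consumer receives a message, SQS hides it from other consumers until the message is deleted or the timeout expires, which is why work that was fetched but never processed stayed stuck for two hours. Below is a minimal sketch of that mechanic, using boto3 with a hypothetical queue URL and handler (not Cronofy's actual code) and a shorter per-receive timeout in the 15-30 minute range the postmortem suggests:

```python
# A minimal sketch (not Cronofy's actual code) of the SQS "in flight" mechanic
# described above. The queue URL and handler are hypothetical placeholders.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

def process(message: dict) -> None:
    """Hypothetical background-work handler."""
    print(message["MessageId"])

def consume_once() -> None:
    # Messages returned here become "in flight": SQS hides them from all other
    # consumers until they are deleted or their visibility timeout expires.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,          # long polling
        VisibilityTimeout=15 * 60,   # 15 minutes for this receive, rather than a
                                     # 2-hour queue default as in the incident
    )
    for msg in resp.get("Messages", []):
        process(msg)
        # Deleting the message acknowledges the work. If a consumer receives a
        # message but never processes or deletes it (the failure mode in the
        # incident), it stays hidden until the visibility timeout expires and
        # only then becomes receivable again.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```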

Updates:

  • Time: Nov. 30, 2023, 10:05 a.m.
    Status: Resolved
    Update: At 07:47 UTC, we saw a sharp increase in the number of Service Unavailable errors being returned from Apple's calendar servers across all of our data centers, causing sync operations to fail. This was escalated to our engineering team, who investigated and found that no other calendar providers were affected, so the issue was likely not within our infrastructure. However, very few operations between Cronofy and Apple were succeeding. At 08:20 UTC, we opened an incident to mark Apple sync as degraded, as customers may have seen an increased delay in calendar sync. This coincided with a sharp drop in the level of failed network calls, which returned to normal levels at 08:18 UTC. The service stabilized and Cronofy automatically retried failed sync operations to reconcile calendars. Over the next hour, we saw communications with Apple return to a mostly healthy state, though there were still occasional spikes in the number of errors. Cronofy continued to automatically retry failed operations, so the impact on users was minimal. At 09:15 UTC, these low numbers of errors decreased back to baseline levels and stayed there. As we have now seen more than 30 minutes of completely healthy service, we are resolving the incident.
  • Time: Nov. 30, 2023, 9:25 a.m.
    Status: Monitoring
    Update: We are still observing small numbers of errors. These are being automatically retried so the service is largely unaffected, and we are continuing to monitor.
  • Time: Nov. 30, 2023, 8:50 a.m.
    Status: Monitoring
    Update: We are still seeing occasional, smaller numbers of errors. These are being retried automatically by Cronofy and the service is largely unaffected. We are continuing to monitor error levels.
  • Time: Nov. 30, 2023, 8:29 a.m.
    Status: Monitoring
    Update: We saw a rise in Service Unavailable errors from Apple's calendar servers between 07:46 UTC and 08:19 UTC. Normal operation has resumed. Cronofy will automatically retry any failed communications with Apple, so no further intervention is required. We are continuing to monitor the situation to be sure that the incident is over.
  • Time: Nov. 30, 2023, 8:20 a.m.
    Status: Investigating
    Update: We are investigating unusually high numbers of errors when syncing Apple calendars.

Updates:

  • Time: Oct. 28, 2023, 4:05 p.m.
    Status: Resolved
    Update: Error rates from Apple's API have returned to normal levels, and calendar syncs for Apple-backed calendars are healthy again. Apple Calendar sync performance was degraded from 13:36 UTC until 15:43 UTC. During this time no other calendar provider sync operations were affected.
  • Time: Oct. 28, 2023, 3:03 p.m.
    Status: Monitoring
    Update: We continue to observe a high error rate on Apple's Calendar API. Around 22% of Apple Calendar operations are resulting in an error. We'll continue to monitor things and update accordingly.
  • Time: Oct. 28, 2023, 2:07 p.m.
    Status: Monitoring
    Update: Errors when communicating with Apple calendar increased significantly across all data centers from 13:36 UTC. This is not affecting communications with any other calendar providers. We continue to monitor the situation.

Check the status of similar companies and alternatives to Cronofy

NetSuite

Systems Active

ZoomInfo

Systems Active

SPS Commerce

Systems Active

Miro

Systems Active

Field Nation

Systems Active

Outreach

Systems Active

Own Company

Systems Active

Mindbody

Systems Active

TaskRabbit

Systems Active

Nextiva

Systems Active

6Sense

Systems Active

BigCommerce

Systems Active

Frequently Asked Questions - Cronofy

Is there a Cronofy outage?
The current status of Cronofy is: Systems Active
Where can I find the official status page of Cronofy?
The official status page for Cronofy is here
How can I get notified if Cronofy is down or experiencing an outage?
To get notified of any status changes to Cronofy, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Cronofy every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Cronofy do?
Cronofy offers scheduling technology that enables users to share their availability across various applications. It also provides enterprise-level scheduling tools, UI elements, and APIs.