Get notified about any outages, downtime or incidents for Cronofy and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Cronofy.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Cronofy:
Component | Status |
---|---|
API | Active |
Background Processing | Active |
Developer Dashboard | Active |
Scheduler | Active |
Conferencing Services | Active |
GoTo | Active |
Zoom | Active |
Major Calendar Providers | Active |
Apple | Active |
Google | Active |
Microsoft 365 | Active |
Outlook.com | Active |
View the latest incidents for Cronofy and check for official updates:
Description: Outlook.com request performance has remained in line with its usual levels for over an hour. At 07:54 UTC we saw a sharp increase in the number of connections to Outlook.com failing, which lasted until 09:30 UTC. This may have caused some changes to take longer to sync. Due to recent issues with Microsoft platforms, we monitored for an extended period before resolving this issue. We will continue monitoring, but we believe the issue has been resolved.
Status: Resolved
Impact: Minor | Started At: June 6, 2023, 8:47 a.m.
Description: From 19:22 to 22:27 UTC, calendar sync operations to Microsoft platforms were degraded. During this time, around 60% of operations between Cronofy and Microsoft resulted in failure. We took steps to mitigate the effects of this on other platforms and began monitoring the issue, which has now been resolved. This was a recurrence of an earlier issue, https://status.cronofy.com/incidents/5ftvqz5tp40d.
Status: Resolved
Impact: Minor | Started At: June 5, 2023, 7:42 p.m.
Description: From 14:13 to 15:47 UTC, calendar sync operations to Microsoft platforms were degraded. During this time, around 69% of operations between Cronofy and Microsoft resulted in failure, leading to a backlog when processing requests. We took steps to mitigate the effects of this on other platforms and began monitoring the issue, which has now been resolved. We believe this is related to Microsoft alert MO571683 (https://portal.office.com/adminportal/home?#/servicehealth/:/alerts/MO571683).
Status: Resolved
Impact: Minor | Started At: June 5, 2023, 2:31 p.m.
Description: Apple calendar synchronization has returned to normal operation and credentials have been reinstated where the invalidation was related to the incident.
Status: Resolved
Impact: Minor | Started At: May 11, 2023, 7:46 a.m.
Description: On Wednesday April 12th 2023, between 03:16 and 05:35 (2 hours 19 minutes), Cronofy experienced an issue causing 17% of the API traffic to our US environments to fail, returning an HTTP 500 error to the API caller. The underlying cause was that our system failed to heal following a database failover. An incorrectly configured alarm and a gap in our playbook resulted in the issue lasting longer than it should have. In line with our [principles](https://www.cronofy.com/about#our-principles), we are publishing a public post-mortem to describe what happened, why it impacted users, and what we will do to prevent it from happening in the future.

## Timeline

_Times are from Wednesday April 12th 2023, in UTC and rounded for clarity._

At 03:16 the primary node that writes to the database cluster failed. The secondary node was promoted and the cluster failed over automatically, recovering as planned. This incident alerted us via our pager service, triggering an investigation, which started at 03:20.

After the database failover, all visible metrics pointed towards a successful automated resolution of the issue. Our core metrics looked healthy, so we marked the incident as recovered without manual intervention at 03:30. It is normal for some of our incidents to be resolved in this way; the on-call engineer makes a decision based on available information as to whether an incident room is necessary to handle an alarm.

When a failover occurs, the original failed node is rebooted and then becomes the read-only partner in the cluster, whilst the existing read-only secondary node is promoted to primary. Some of the connections from the web application nodes were still connected to the read-only secondary database node, but were treating it as if they were connected to the primary writable node. This led to failures in some of the actions that were taking place in the API.

At this time, our monitoring system was not alerting: although the metric measuring HTTP 500 responses was reporting correctly, the alarm was misconfigured. When the alarm received no data, this was treated the same as 0 errors. This resulted in no further alarms to alert us to the degradation of the service, and reinforced the belief that the service was healthy again.

At 05:00 an automated notification was posted into our internal Slack monitoring channel to show that two health metrics had not fully recovered. This wasn't an alarm-level notification, so it did not re-trigger the on-call pager service.

At 05:30 an engineer reviewed the notification in Slack and inspected the health metrics. The increased level of HTTP 500 errors being returned by the Cronofy API was identified. The incident was reopened, and investigation restarted. Our site reliability team was reactivated to triage the issue.

At 05:35 Cronofy took the decision to replace all application instances in the US environment. An automated redeployment was triggered. This reset all of the nodes' connections to the database cluster, flushing out the connections to the read-only node, and returned API error responses to normal levels by 05:37.

## Retrospective

We ask three primary questions in our retrospective:

* Could we have resolved it sooner?
* Could we have identified it sooner?
* Could we have prevented it?

The root cause for this issue is that our systems did not correctly self-heal when a database failover occurred.
Although the database failover was the event that triggered the failure, it is a rare, expected, and unavoidable event. The database cluster correctly recovered and was back in an operating mode within a few minutes.

Another significant factor in the severity of this issue's impact was the robustness of Cronofy's response to it. This was the first occurrence of a database failover happening outside of core working hours. During working hours, multiple engineers would see the error and respond to it, each checking a different set of metrics or systems. This collaborative approach was not correctly translated into the guidance available in the incident response playbook, resulting in an incomplete set of checks taking place once the database failover had completed.

This could definitely have been resolved sooner, by forcing the nodes to reconnect to the database. This should trigger automatically in cases of database failover, and there should also be enough information available to the issue response team to know that a wider issue is still ongoing. The failure to identify that the impact of the issue had not been fully resolved is what prolonged the incident, and this is linked to several factors:

* Confirmation bias from errors trending downwards, and the early-hours wake-up, caused the on-call engineer to miss the elevated error rate.
* A misconfigured alarm added to the confirmation bias by not making it clear that the issue was ongoing.
* A reliance on tacit group knowledge instead of explicit documented steps for database failovers meant that the on-call engineer didn't know the additional validation and checks that would have identified the issue sooner.

## Actions to be taken

We are disappointed with the length of time that it took us to resolve this incident. There were multiple smaller failures that led to this incident having a higher impact. A few different checks, technical changes, or alarms being raised could have mitigated this error and prevented the longer outage.

* We are improving the way our applications self-heal in the event of a database failover. Some error messages occurred indicating that the system was trying to write to the wrong part of the database cluster, but these messages did not cause the system to reset (see the connection-reset sketch after this incident record).
* We will update our guidance for our internal teams on the actions to take when a database failover occurs. This will include a more precise checklist and specific metrics to review.
* We are going through all our alarms to ensure that they are looking at the right data, at the right time (see the alarm-configuration sketch after this incident record).

## Further questions?

If you have any further questions, please contact us at [[email protected]](mailto:[email protected])
Status: Postmortem
Impact: Major | Started At: April 12, 2023, 3 a.m.
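The post-mortem's first action item is to make applications reset their database connections when writes land on a node that has been demoted to read-only after a failover. The report does not describe Cronofy's actual stack or implementation, so the following is only a minimal sketch of that pattern, assuming a PostgreSQL-style cluster and the psycopg2 driver; the DSN, class name, and one-retry policy are illustrative.

```python
# Minimal sketch: self-heal when a write hits a demoted, read-only node.
# Assumptions (not from the incident report): PostgreSQL-style cluster behind
# a single writer hostname, psycopg2 driver. Names below are hypothetical.
import psycopg2
from psycopg2 import errors

WRITER_DSN = "host=db-writer.internal dbname=app user=app"  # hypothetical DSN


class SelfHealingWriter:
    """Holds a connection to the writer and rebuilds it if a write
    lands on a node that has been demoted to read-only."""

    def __init__(self, dsn: str):
        self._dsn = dsn
        self._conn = psycopg2.connect(dsn)

    def _reconnect(self):
        # Discard the stale connection so the next connect resolves the
        # new primary instead of reusing a socket to the demoted node.
        try:
            self._conn.close()
        finally:
            self._conn = psycopg2.connect(self._dsn)

    def execute_write(self, sql: str, params=()):
        for attempt in (1, 2):
            try:
                with self._conn:  # commit on success, roll back on error
                    with self._conn.cursor() as cur:
                        cur.execute(sql, params)
                return
            except errors.ReadOnlySqlTransaction:
                # The "writing to the wrong part of the cluster" signal from
                # the post-mortem: reset the connection instead of failing.
                if attempt == 2:
                    raise
                self._reconnect()
            except psycopg2.OperationalError:
                # Connection dropped mid-failover: rebuild and retry once.
                if attempt == 2:
                    raise
                self._reconnect()
```

The key design point matches the incident narrative: the redeployment at 05:35 fixed things precisely because it flushed stale connections, so doing the same flush automatically on the first read-only write error removes the need for human intervention.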
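The timeline also notes that the alarm received no data and treated it the same as 0 errors. The monitoring system is not named in the report; the sketch below shows one way this misconfiguration can arise in a CloudWatch-style alarm, where the TreatMissingData setting decides whether gaps in a metric look like a healthy zero or a breach. All names and thresholds are hypothetical.

```python
# Minimal sketch: configure a CloudWatch-style alarm so that missing data is
# surfaced rather than silently treated as "no errors".
# Assumption (not from the incident report): an HTTP 500 count metric in AWS
# CloudWatch; Cronofy's actual monitoring stack is not described.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="api-http-500-rate",   # hypothetical alarm name
    Namespace="App/API",             # hypothetical custom namespace
    MetricName="Http500Count",       # hypothetical metric name
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=5,
    Threshold=50,
    ComparisonOperator="GreaterThanThreshold",
    # The crux: with "notBreaching", gaps in the metric are treated as being
    # within the threshold (effectively 0 errors), so the alarm stays green
    # even while the service degrades. "breaching" (or "missing") surfaces
    # the gap instead of hiding it.
    TreatMissingData="breaching",
    AlarmActions=[],                 # pager/notification target ARNs go here
)
```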
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or down time. Join for free - no credit card required.