Last checked: 7 minutes ago
Get notified about any outages, downtime or incidents for Cronofy and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Cronofy.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Cronofy:
Component | Status |
---|---|
API | Active |
Background Processing | Active |
Developer Dashboard | Active |
Scheduler | Active |
Conferencing Services | Active |
GoTo | Active |
Zoom | Active |
Major Calendar Providers | Active |
Apple | Active |
Google | Active |
Microsoft 365 | Active |
Outlook.com | Active |
View the latest incidents for Cronofy and check for official updates:
Description: On Tuesday, February 22nd 2022 our US data center experienced 95 minutes of degraded performance between 15:45 and 17:20 UTC. This was caused by the primary PostgreSQL database hitting bandwidth limits and its performance being throttled as a result. The situation was caused or exacerbated by PostgreSQL's internal housekeeping working on two of our largest tables at the same time. To our customers this would have surfaced as interactions with the US Cronofy platform, i.e. using the website or API, being much slower than normal. For example, the 99th percentile of API response times is usually around 0.5 seconds; during this incident it peaked at around 14 seconds. We have upgraded the underlying instances of this database, broadly doubling capacity and putting us far from the limit we were hitting.

## Timeline

_All times UTC on Tuesday, February 22nd 2022 and approximate for clarity._

**15:45** Our primary database in our US data center started showing signs of performance degradation.

**16:05** First alert received by the on-call engineer for a potential performance issue. Attempts were made to reduce load on the database through interventions such as temporarily disabling some of its background housekeeping processes.

**16:45** Incident opened on our status page informing customers of degraded performance in the US data center.

**17:00** Began provisioning more capacity for the primary database as a fallback plan if efforts continued to be unsuccessful.

**17:10** New capacity available.

**17:15** Failed over to fully take advantage of the new capacity by promoting the larger node to be the writer.

**17:20** Performance had returned to normal levels in the US data center.

**17:45** Decided we could close the incident.

**18:00** Decided to lock in the capacity change and provisioned an additional reader node at the new size.

**18:15** Removed the smaller nodes from the database cluster.

## Actions

Whilst there was not an outage, this felt like a close call for us. This led to three key questions:

* Why had we not foreseen this capacity issue?
* Could the capacity issue have been prevented?
* Why had we not resolved the issue sooner?

### Foreseeing the capacity issue

We had recently performed a major version upgrade on this database, and in the following weeks monitored performance closely. If there was a time we should have spotted a potential issue in the near future, this was it. We believe we may have focussed too heavily on CPU and memory metrics in our monitoring, when it was networking capacity that led to this degradation in performance. We will be reviewing our monitoring to set alerts that would have pointed us in the right direction sooner, as well as lower-priority alerts that would flag an upcoming capacity issue days or weeks in advance.

### Preventing the capacity issue

As PostgreSQL's internal housekeeping processes appeared to contribute significantly to the problem, we will be revisiting the configuration of these processes and seeing if they can be altered to reduce the likelihood of such an impact in future.

### Resolving the issue sooner

As this was a performance degradation rather than an outage, the scale of the problem was not clear. This led to the on-call engineer investigating the issue whilst performance degraded further without additional alerts being raised. We will be adding additional alerts relating to performance degradation in several subsystems to make the impact of a problem clearer to the on-call engineer.

We are also updating our guidance on incident handling for the team to encourage switching to a more visible channel for communication sooner, and encouraging the escalation of alerts to involve other on-call engineers, particularly when the cause is not immediately clear.

## Further questions?

If you have any further questions, please contact us at [[email protected]](mailto:[email protected])
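The postmortem above points to PostgreSQL's internal housekeeping (autovacuum) hitting two of the largest tables at once, and to revisiting its configuration. As a minimal sketch only, using hypothetical table names, values, and connection details rather than Cronofy's actual settings, per-table autovacuum storage parameters can be adjusted so that very large tables are vacuumed in smaller, earlier passes:

```python
# Minimal sketch, not Cronofy's actual change: per-table autovacuum overrides so
# PostgreSQL's housekeeping runs in smaller, earlier passes on very large tables.
# Table names, parameter values, and the DSN below are hypothetical.
import psycopg2

# Hypothetical large tables singled out for per-table overrides.
LARGE_TABLES = ["events", "calendar_sync_jobs"]

# Vacuum once ~1% of a table's rows are dead (the server default is 20%), and let
# each vacuum pass do more work between its throttling pauses.
AUTOVACUUM_SETTINGS = (
    "autovacuum_vacuum_scale_factor = 0.01, "
    "autovacuum_vacuum_cost_limit = 1000"
)


def apply_autovacuum_overrides(dsn: str) -> None:
    """Store autovacuum overrides as storage parameters on each large table."""
    with psycopg2.connect(dsn) as conn:  # commits on clean exit
        with conn.cursor() as cur:
            for table in LARGE_TABLES:
                # Identifiers cannot be bound as query parameters; the list is fixed.
                cur.execute(f"ALTER TABLE {table} SET ({AUTOVACUUM_SETTINGS})")


if __name__ == "__main__":
    apply_autovacuum_overrides("dbname=app host=localhost user=admin")
```

Starting vacuum before dead rows pile up on the largest tables may reduce the chance of two long housekeeping passes coinciding, as they did during this incident, though the right values depend heavily on the workload.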
Status: Postmortem
Impact: Minor | Started At: Feb. 22, 2022, 4:51 p.m.
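The postmortem also cites a 99th-percentile API response time of roughly 0.5 seconds in normal operation versus around 14 seconds at the peak, and commits to adding more degradation alerts. As a rough illustration of that kind of check, assuming a made-up threshold factor, window, and data source rather than Cronofy's monitoring stack:

```python
# Minimal sketch of a p99 latency degradation check, in the spirit of the extra
# alerts the postmortem describes. The baseline matches the ~0.5 s figure quoted
# there; the threshold factor, window, and wiring are made up for illustration.
import statistics

BASELINE_P99_SECONDS = 0.5  # usual p99 API response time quoted in the postmortem
DEGRADATION_FACTOR = 4      # alert once p99 exceeds 4x the baseline (illustrative)


def p99(samples: list[float]) -> float:
    """99th percentile of a window of response times (needs at least 2 samples)."""
    return statistics.quantiles(samples, n=100)[98]


def check_latency(samples: list[float]) -> str | None:
    """Return an alert message if the window's p99 looks degraded, else None."""
    observed = p99(samples)
    if observed > BASELINE_P99_SECONDS * DEGRADATION_FACTOR:
        return (f"API p99 is {observed:.1f}s against a {BASELINE_P99_SECONDS:.1f}s "
                f"baseline - page the on-call engineer")
    return None


if __name__ == "__main__":
    # Mostly normal traffic with a slow tail, roughly like the incident's profile.
    window = [0.3] * 95 + [9.0, 11.0, 12.5, 13.5, 14.0]
    print(check_latency(window) or "p99 within normal range")
```

Alerting on user-facing latency rather than on host metrics alone reflects the postmortem's observation that CPU and memory monitoring missed a bottleneck that was actually in network bandwidth.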
Description: At approximately 17:00 UTC we observed a much higher number of errors for Google Calendar API calls than we would expect (mostly 503 Service Unavailable responses) across all of our data centers. There does not appear to have been a pattern to the accounts affected. We decided to open an incident at 17:10 UTC to inform customers of potential service degradation, as it seemed like it could be a more persistent issue. Whilst we were opening this incident, errors when communicating with the Google Calendar API returned to normal levels at around 17:12 UTC. Errors have remained at normal levels since that time, so we are resolving this incident.
Status: Resolved
Impact: Minor | Started At: Jan. 27, 2022, 5:14 p.m.
Description: Our Engineering team has resolved the Scheduler issue, and users can now log in again. Please get in touch with [email protected] if you have any further questions.
Status: Resolved
Impact: Major | Started At: Jan. 10, 2022, 3:52 p.m.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.