Outage and incident data over the last 30 days for Firstup.
OutLogger tracks the status of these components for Firstup:
Component | Status |
---|---|
**3rd-Party Dependencies** | Active |
Identity Access Management | Active |
Image Transformation API | Active |
SendGrid API v3 | Active |
Zoom Virtual Agent | Active |
**Ecosystem** | Active |
Connect | Active |
Integrations | Active |
Partner API | Active |
User Sync | Active |
**Platforms** | Active |
EU Firstup Platform | Active |
US Firstup Platform | Active |
**Products** | Active |
Classic Studio | Active |
Creator Studio | Active |
Insights | Active |
Microapps | Active |
Mobile Experience | Active |
Web Experience | Active |
View the latest incidents for Firstup and check for official updates:
Description:

## **Summary:**

On April 22nd, 2024, at 6:15 AM PT (13:15 UTC), we began receiving reports of scheduled campaigns that were delayed or had not been published at all. Two sources of the delays were identified and subsequently addressed in two separate hotfixes.

## **Impact:**

Impact was most visible in campaign reporting delivery metrics, which showed that campaigns had either not gone out at the expected time or that email deliveries arrived well after the scheduled time. Not all campaigns were affected, and actual delays ranged from several minutes up to an hour or longer in a small number of instances.

## **Root Cause:**

The root cause was determined to be a scheduled database upgrade performed on April 19th, which degraded the performance of the scheduling service. There were two underlying observable symptoms:

1. On April 22nd, the actual delivery of some emails was slower than expected because several database queries were not optimized for the new database software version deployed on April 19th. These queries ran slower after the upgrade under load levels higher than those initially tested against.
2. The number of scheduled campaigns not executing at their precise scheduled time increased dramatically, also following the database upgrade, as a result of several newly uncovered bugs in the scheduling service itself.

## **Mitigation:**

A number of mitigation measures were put in place over the course of several days to address different aspects of this platform incident:

* The database query optimizations were deployed in a hotfix on April 22nd at 4:30 PM PT (23:30 UTC), specifically aimed at addressing the email delivery slowness.
* For customers who opened support tickets about specific delayed scheduled campaigns, those campaigns were manually published as part of the individual support tickets. A separate query was also run on an as-needed basis to proactively identify other campaigns in a similar state and manually publish those as well.
* A second hotfix was deployed on April 24th at 11:30 AM PT (18:30 UTC) to add an automated backstop that catches and publishes any campaigns that were scheduled at an earlier time but had not actually started (see the sketch below).

## **Recurrence Prevention:**

The following actions have been committed to in order to fully resolve the incident and eliminate reliance on the mitigation measures currently in place:

* Create improved platform alerting for campaign delivery times to identify and address a degraded state earlier.
* Fix the remaining 3 bugs uncovered during the incident investigation and make the scheduler service itself more robust.
Status: Postmortem
Impact: None | Started At: April 22, 2024, 4:53 p.m.
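Firstup has not published the implementation of the "automated backstop" described above. As a rough illustration only, a minimal sketch of such a catch-and-publish job might look like the following; the `campaigns` table, its columns, the `publish` call, and the database DSN are all hypothetical assumptions, not Firstup's actual schema or code:

```python
import time
from datetime import datetime, timedelta, timezone

import psycopg2  # assumption: a PostgreSQL-backed scheduler

# Hypothetical scheduler "backstop": periodically find campaigns whose
# scheduled time has passed but which never started, and publish them.
GRACE = timedelta(minutes=5)  # assumed: how far past schedule before intervening


def publish(campaign_id: int) -> None:
    """Placeholder for the real publish path (assumed to exist)."""
    print(f"publishing stuck campaign {campaign_id}")


def run_backstop(conn) -> None:
    cutoff = datetime.now(timezone.utc) - GRACE
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id FROM campaigns
            WHERE status = 'scheduled'
              AND scheduled_at <= %s
              AND started_at IS NULL
            """,
            (cutoff,),
        )
        for (campaign_id,) in cur.fetchall():
            publish(campaign_id)


if __name__ == "__main__":
    conn = psycopg2.connect("dbname=scheduler")  # assumed DSN
    while True:  # run as a simple periodic job
        run_backstop(conn)
        time.sleep(60)
```

The value of a backstop like this is that it is independent of the primary scheduling path, so a bug in the scheduler's callback logic cannot leave campaigns stranded indefinitely.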
Description:

### **Summary:**

On Tuesday, April 16th, 2024, from approximately 9:54 AM UTC to 11:09 AM UTC, EU Studio experienced multiple service disruptions, including general slowness when loading Studio functions, login issues, and HTTP 500 system error messages. A number of backend services were experiencing TCP (Transmission Control Protocol) networking issues that manifested as a variety of user-visible errors and unpredictable product interactions.

### **Impact:**

Affected users were unable to log in to Studio and experienced general slowness and system error messages such as "504 Gateway Timeout" or "502 Bad Gateway" caused by network errors in the backend services.

### **Root Cause:**

The root cause was determined to be an unexpected spike in traffic, which caused the number of nodes (worker machines) to rapidly increase to handle the additional workload. The added nodes pushed inbound DNS (Domain Name System) traffic past its overall capacity, leading to DNS request timeouts.

### **Mitigation:**

The immediate problem was mitigated by increasing DNS capacity within the EU infrastructure and restarting the affected services, restoring system services and performance by 11:09 AM UTC.

### **Recurrence Prevention:**

The following changes have been implemented to prevent an unexpected loss of DNS service capacity:

* An alert now fires within the EU infrastructure any time internal DNS capacity drops below the minimum viable threshold determined by Site Reliability Engineering (see the sketch below).
* Load testing has been performed to ensure scalability and an appropriate buffer for potential spikes and organic growth in DNS request volume.
Status: Postmortem
Impact: None | Started At: April 16, 2024, 10:04 a.m.
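The postmortem does not describe how the capacity alert is implemented. As one possible illustration, the check could reduce to a threshold comparison against a metrics source; the headroom value, metric names, and alert sink below are hypothetical, not Firstup's actual tooling:

```python
# Hypothetical sketch of the DNS-capacity alert described above: compare the
# current ratio of DNS queries to provisioned capacity against a minimum
# headroom threshold set by SRE.
MIN_HEADROOM = 0.30  # assumed: alert when less than 30% capacity remains


def dns_capacity_low(current_qps: float, provisioned_qps: float) -> bool:
    """Return True when remaining DNS capacity falls below the threshold."""
    headroom = 1.0 - (current_qps / provisioned_qps)
    return headroom < MIN_HEADROOM


def alert(message: str) -> None:
    print(f"ALERT: {message}")  # placeholder for a real paging integration


if __name__ == "__main__":
    # Example: 7,500 queries/s against 10,000 queries/s of capacity leaves
    # 25% headroom, which is below the assumed 30% threshold, so this alerts.
    if dns_capacity_low(current_qps=7_500.0, provisioned_qps=10_000.0):
        alert("internal DNS capacity below minimum viable threshold")
```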
Description:

**Summary:**

On March 15th, 2024, we began receiving reports that scheduled campaigns were publishing late or not publishing at all at their scheduled time.

**Impact:**

The impact was restricted to campaigns on the Firstup platform scheduled to publish on March 15th, 2024, between 1:00 AM ET (05:00 UTC) and 8:04 PM ET (March 16th, 2024, 00:04 UTC).

**Root Cause:**

The root cause was determined to be a regression in a software change to the "scheduled campaign callback service," deployed during the previous day's scheduled software release window, which caused the callback to the "scheduling service" (which publishes a scheduled campaign at its scheduled time) to fail.

**Mitigation:**

A hotfix addressing the software regression in the campaign scheduling software was deployed by 8:04 PM ET (March 16th, 2024, 00:04 UTC). Any delayed scheduled campaigns were also manually published by the same time.

**Recurrence Prevention:**

The Incident Response Team has taken the following actions to prevent a recurrence of this incident:

* Implemented additional pre-release regression testing around the "scheduling service."
* Documented the SQL rake task used to identify failed or delayed scheduled campaigns in a runbook, to speed mitigation of any similar future incidents.
* Created monitors that alert on the first instance of a failed or delayed scheduled campaign, enabling us to proactively get ahead of campaign scheduling issues and prevent similar platform-wide incidents (see the sketch below).
Status: Postmortem
Impact: None | Started At: March 15, 2024, 8:19 p.m.
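The runbook's SQL rake task is not public. A sketch of the kind of query such a monitor might wrap is shown below; the schema, the one-minute tolerance, and the database DSN are entirely hypothetical assumptions:

```python
import psycopg2  # assumption: a PostgreSQL-backed scheduler

# Hypothetical sketch of the monitor described above: detect the first
# scheduled campaign that missed its publish time. The query mirrors the
# kind of SQL a runbook rake task might wrap; names are assumptions.


def find_delayed_campaigns(conn) -> list[int]:
    """Return IDs of campaigns whose scheduled time passed without publishing."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id FROM campaigns
            WHERE status = 'scheduled'
              AND scheduled_at < NOW() - INTERVAL '1 minute'
              AND published_at IS NULL
            ORDER BY scheduled_at
            """
        )
        return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = psycopg2.connect("dbname=scheduler")  # assumed DSN
    delayed = find_delayed_campaigns(conn)
    if delayed:  # alert on the first instance rather than waiting for many
        print(f"ALERT: {len(delayed)} delayed campaign(s), first: {delayed[0]}")
```

Alerting on the very first delayed campaign, as the postmortem describes, turns what would otherwise be a platform-wide incident into a single early warning.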