Get notified about any outages, downtime or incidents for Firstup and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Firstup.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
Sign Up Now

OutLogger tracks the status of these components for Firstup:
Component | Status |
---|---|
3rd-Party Dependencies | Active |
Identity Access Management | Active |
Image Transformation API | Active |
SendGrid API v3 | Active |
Zoom Virtual Agent | Active |
Ecosystem | Active |
Connect | Active |
Integrations | Active |
Partner API | Active |
User Sync | Active |
Platforms | Active |
EU Firstup Platform | Active |
US Firstup Platform | Active |
Products | Active |
Classic Studio | Active |
Creator Studio | Active |
Insights | Active |
Microapps | Active |
Mobile Experience | Active |
Web Experience | Active |
View the latest incidents for Firstup and check for official updates:
Description:

**Summary:** On Thursday, May 31st, 2024, we received reports from several EU communities that the Web and Mobile Experiences were malfunctioning. End users reported that the Web or Mobile Experiences weren't loading as expected, and a blank screen was observed.

**Scope:** This incident was restricted to end users trying to access the Web and Mobile Experiences for communities in the EU only.

**Root Cause:** A platform incident was declared at 08:30 UTC, and the incident response team identified a misconfiguration in the CloudFront cache policy on the EU instance. The misconfiguration caused the first request for an asset from any origin to be cached as the response to asset requests from all other origins. Subsequent requests from all other origins in the EU then failed Cross-Origin Resource Sharing (CORS) checks, resulting in the blank screen observed.

**Mitigation:** The immediate impact was mitigated by invalidating the incorrect cached response in the EU instance of CloudFront, which restored Web and Mobile Experience access by 09:35 UTC. Additional asset reindexing was performed to force any end-user mobile devices that had locally cached the incorrect responses to reload the Mobile Experience.

**Recurrence Prevention:** The EU CloudFront cache policy configuration has been updated to match the US settings to prevent this incident from recurring.
Status: Postmortem
Impact: None | Started At: May 31, 2024, 8:42 a.m.
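The cache-key failure mode described in the postmortem can be illustrated with a short sketch. This is not Firstup's or CloudFront's actual code; it only simulates how a cache that omits the request's `Origin` header from its cache key ends up serving one origin's CORS response to every other origin, which browsers then reject.

```python
class CdnCache:
    """Toy CDN cache illustrating per-origin vs. shared cache keys."""

    def __init__(self, vary_on_origin: bool):
        self.vary_on_origin = vary_on_origin  # include Origin in the cache key?
        self.store = {}

    def fetch(self, path: str, origin: str) -> dict:
        # Misconfigured policy: the cache key ignores the requesting origin.
        key = (path, origin) if self.vary_on_origin else (path,)
        if key not in self.store:
            # Upstream echoes the first requester's origin in the CORS header.
            self.store[key] = {"Access-Control-Allow-Origin": origin}
        return self.store[key]


def browser_allows(response: dict, origin: str) -> bool:
    # Browsers reject cross-origin responses whose Access-Control-Allow-Origin
    # header matches neither "*" nor the requesting origin.
    allowed = response["Access-Control-Allow-Origin"]
    return allowed in ("*", origin)


# Misconfigured cache: the first origin's response is cached for everyone.
bad = CdnCache(vary_on_origin=False)
bad.fetch("/app.js", "https://a.example")            # warms the cache
blocked = bad.fetch("/app.js", "https://b.example")  # served a.example's headers
print(browser_allows(blocked, "https://b.example"))  # False -> blank screen

# Corrected policy: Origin is part of the cache key; each origin gets its own entry.
good = CdnCache(vary_on_origin=True)
good.fetch("/app.js", "https://a.example")
ok = good.fetch("/app.js", "https://b.example")
print(browser_allows(ok, "https://b.example"))       # True
```

Invalidating the stale entry (as the incident team did) clears the bad cached response, but only keying the cache on the origin prevents it from being repopulated incorrectly.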
Description:

## **Summary:**

On Tuesday, May 28th, 2024, from approximately 8:32 AM PT until 9:32 AM PT, we received multiple reports of email campaigns not being sent. Several campaigns scheduled for delivery were either delayed or not delivered at all. The issue was confirmed in both customer environments and internal test environments, indicating a widespread problem with the email campaign functionality. It was traced to a code error in a governor hotfix deployed earlier that day.

## **Impact:**

Affected scheduled campaigns were delayed by up to one hour. Some customers reported a 10-minute delay before a campaign started sending; others reported that time-sensitive campaigns were delivered on time for certain recipients but delayed for others, or not delivered at all.

## **Root Cause:**

The latest deployment of a hotfix for an audience restrictions feature caused an elevated error rate because an expected default value was not set.

## **Mitigation:**

Once the root cause was identified, the software change was rolled back, and the intake DLQ (dead-letter queue) was manually re-driven to prevent further degradation and ensure the affected campaigns were sent despite the delay.

## **Recurrence Prevention:**

The following changes have been implemented to prevent campaign delays caused by internal deployment activities:

* Comprehensive feature flag testing: Our regression tests primarily covered scenarios with the feature flag ON, missing issues when the flag was OFF. Going forward, each feature flag will be rigorously tested in both states, and we will collaborate with other domains for comprehensive testing to ensure compatibility and robustness.
* Optimal deployment scheduling: We will study platform usage to determine optimal times for hotfix deployments, ensuring minimal impact. This schedule will be documented and approved comprehensively to balance the urgency of fixes with operational stability.
* Enhanced monitoring and alerting: The initial alert was missed because it was bundled with other warnings. We will enhance our monitoring and alerting systems so that all critical alerts are promptly noticed, and continue to refine our dashboards and tailored alerts to maintain proactive, continuous monitoring across Development, UAT, and production environments.
* High-risk multi-service hotfixes: Deployments of high-risk multi-service hotfixes will be scheduled outside of full release and regression cycles, and such changes will be evaluated to determine whether they can be deferred to scheduled releases so that they undergo thorough regression testing.
Status: Postmortem
Impact: Minor | Started At: May 28, 2024, 5:06 p.m.
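The root cause above, a default value exercised only when a feature flag is OFF, is a common failure mode. The sketch below is hypothetical (the function and field names are invented, not Firstup's code) and shows why regression tests that only run with the flag ON can miss a missing default entirely.

```python
def can_send_buggy(campaign: dict, flag_on: bool) -> bool:
    """Hotfix as shipped: only the flag-ON path handles a missing field."""
    if flag_on:
        # This path was covered by regression tests and has a safe default.
        restrictions = campaign.get("audience_restrictions", [])
        return len(restrictions) == 0
    # Flag-OFF path assumes the key always exists: raises KeyError for
    # every campaign created before the hotfix introduced the field.
    return campaign["audience_restrictions"] is None


def can_send_fixed(campaign: dict, flag_on: bool) -> bool:
    """Fix: one explicit default covers both flag states."""
    restrictions = campaign.get("audience_restrictions") or []
    return len(restrictions) == 0


legacy_campaign = {"id": 42}  # created before the field existed

try:
    can_send_buggy(legacy_campaign, flag_on=False)
except KeyError:
    print("delivery pipeline errors out")  # failed sends pile up in the DLQ

print(can_send_fixed(legacy_campaign, flag_on=False))  # True: campaign sends
```

Testing both flag states against pre-existing data, as the recurrence-prevention item calls for, is exactly what would have caught this before deployment.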
Description:

## **Summary:**

On Wednesday, May 15th, 2024, from approximately 3:00 AM PT until 10:22 AM PT, we received customer reports that scheduled campaigns were not being delivered as expected. These included campaigns sitting in the 'Scheduled' queue and not being sent on time, as well as campaigns marked as 'Delivered' that showed no actual sends to users in the Campaign Delivery Report. The problem was traced to a regression in a service used for scheduling, which introduced a new post-related variable that was not correctly initialized in a specific campaign. As a result, that campaign, and all subsequently scheduled campaigns, were not automatically published during the affected window.

## **Impact:**

Scheduled campaigns remained in the scheduled queue, or were incorrectly marked as delivered without actually being sent to the target audience. We confirmed that at least 50 scheduled campaigns experienced some form of delay.

## **Root Cause:**

The latest deployment of the campaign-scheduling service had an internal permissions-check failure that produced an elevated error rate because an expected default value was not set.

## **Mitigation:**

Once the root cause was identified through a source code analyzer, we corrected the post with the missing default value, which allowed the remaining scheduled campaigns to be published. Campaigns that had missed their window were then manually published by the incident management team.

## **Recurrence Prevention:**

The following changes have been implemented to prevent campaign delays caused by internal deployment activities:

* A hotfix was deployed on Monday, June 3rd, 2024, updating the internal permissions method to set the correct default value for the new post variable regardless of whether the associated feature flag is enabled.
* Improved error logging in the scheduling software to make it easier and faster to identify irrecoverable errors that can cause the scheduler to queue up and delay sends.
* Updated runbooks, linked from monitors, with steps to troubleshoot delayed campaigns.
* Improved monitoring of the schedule\_publish endpoint for any errors.
Status: Postmortem
Impact: Minor | Started At: May 15, 2024, 4:22 p.m.
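The incident's defining feature is that a single bad campaign blocked every campaign scheduled after it. The sketch below is illustrative only (invented names, not the actual scheduler): it contrasts a publish loop that aborts on the first irrecoverable error with a hardened loop that logs the failure and keeps going, in the spirit of the improved-error-logging prevention item. The actual fix, per the postmortem, was correcting the missing default and manually publishing the stalled campaigns.

```python
def publish(campaign: dict) -> str:
    # Raises if the new post-related field was never initialized.
    if campaign["post_variant"] is None:
        raise ValueError(f"campaign {campaign['id']}: post_variant not set")
    return f"sent:{campaign['id']}"


queue = [
    {"id": 1, "post_variant": "default"},
    {"id": 2, "post_variant": None},      # the one misconfigured campaign
    {"id": 3, "post_variant": "default"},
]


def run_fragile(queue: list) -> list:
    """First failure aborts the run: everything after it stays queued."""
    sent = []
    for campaign in queue:
        sent.append(publish(campaign))
    return sent


def run_hardened(queue: list) -> tuple:
    """Isolate each failure: log it and continue publishing the rest."""
    sent, errors = [], []
    for campaign in queue:
        try:
            sent.append(publish(campaign))
        except ValueError as exc:
            errors.append(str(exc))  # surfaced to monitors/on-call
    return sent, errors


try:
    run_fragile(queue)
except ValueError:
    print("campaign 3 never sent")   # all later campaigns stall

sent, errors = run_hardened(queue)
print(sent)          # ['sent:1', 'sent:3']
print(len(errors))   # 1
```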