Is there a Firstup outage?

Firstup status: Systems Active

Last checked: 5 minutes ago

Get notified about any outages, downtime or incidents for Firstup and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Firstup outages and incidents

Outage and incident data over the last 30 days for Firstup.

There have been 2 outages or incidents for Firstup in the last 30 days.

Severity Breakdown:

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Firstup

OutLogger tracks the status of these components for Firstup:

Component Status

Latest Firstup outages and incidents.

View the latest incidents for Firstup and check for official updates:

Updates:

  • Time: July 26, 2024, 5:42 p.m.
    Status: Postmortem
    Update: **Summary:** On July 10th, 2024, beginning at approximately 1:30 PM ET (17:30 UTC), we started receiving reports of published email campaigns that hadn’t been delivered to the intended audiences in over 1 hour. Due to the number of reports received, a platform incident was declared at 2:18 PM ET (18:18 UTC) and an incident response team began investigating these reports. Another platform incident was declared on July 11th, 2024, after reports of audiences being inaccessible or taking too long to load were received.
    **Impact:** The service degradation was intermittent in nature, and the impact was restricted to the US platform for access to some audiences and some campaigns published on July 10th, 2024, at or after 11:20 AM ET (15:20 UTC) through July 15th, 2024, at 6:23 PM ET (22:23 UTC).
    **Root Cause:** Both incidents stemmed from an overload of the ElasticSearch service, which resolves Audiences to User IDs and email addresses. A surge in error messages temporarily stored in a queue (for messages ElasticSearch couldn't process) overwhelmed the service, causing it to intermittently stop serving requests until it could catch up.
    **Mitigation:** The issue was immediately addressed by reducing the number of workers sending requests to ElasticSearch and increasing the number of nodes processing those requests. This reduced the strain on ElasticSearch, allowing the request queue to clear faster. Additionally, the error messages were manually reprocessed, making audiences accessible and campaigns publishable again. (A sketch of this throttled reprocessing pattern follows this update list.)
    **Recurrence Prevention:** Errors in the queue are normal and typically resolve through automatic reprocessing. However, to prevent future occurrences:
    * We doubled ElasticSearch’s processing power on July 15th, 2024, at 6:23 PM ET to better handle any spikes.
    * We enabled additional monitoring and dashboards for early detection and mitigation of potential issues.
    * We will investigate and address the sources of the errors to ensure a healthier service.
  • Time: July 26, 2024, 5:42 p.m.
    Status: Resolved
    Update: This incident is now resolved, and all associated services are fully operational.
  • Time: July 15, 2024, 8:59 p.m.
    Status: Monitoring
    Update: Studio performance has now been restored and all functionality should be available. We will be placing these services under monitoring for now.
  • Time: July 15, 2024, 8:15 p.m.
    Status: Investigating
    Update: We continue to investigate the recurrence of this issue and will provide another update within 1 hour.
  • Time: July 15, 2024, 7:13 p.m.
    Status: Investigating
    Update: We have received reports of a recurrence of this issue (including delayed campaign deliveries), and are actively investigating it again. Another update within 1 hour.
  • Time: July 11, 2024, 9:32 p.m.
    Status: Monitoring
    Update: Studio performance has now been restored and all functionality should be available. We will be placing these services under monitoring for now.
  • Time: July 11, 2024, 9:21 p.m.
    Status: Identified
    Update: We continue to work on a fix for this issue and will provide another update within the next hour, or as soon as the fix is deployed.
  • Time: July 11, 2024, 8:16 p.m.
    Status: Identified
    Update: We continue to work on a fix for this issue and will provide another update within the next hour, or as soon as the fix is deployed.
  • Time: July 11, 2024, 7:15 p.m.
    Status: Identified
    Update: We have identified the cause of this service degradation, and are working to mitigate the issue. Another update within 1 hour.
  • Time: July 11, 2024, 7:07 p.m.
    Status: Investigating
    Update: We are currently investigating reports of slow Studio performance with errors loading some pages, including Audiences. We will provide you with an update within 1 hour.
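
The postmortem above attributes both incidents to an overloaded ElasticSearch service and a backlog of error messages that had to be reprocessed. The following is a minimal illustrative sketch (not Firstup's implementation) of that kind of controlled reprocessing in Python: a small, fixed worker pool drains a hypothetical error queue and backs off exponentially whenever the cluster signals overload. The fetch_error_batch and index_into_elasticsearch helpers are placeholders, not Firstup APIs.

```python
# Hypothetical sketch: drain an error queue into an overloaded search cluster
# with bounded concurrency and exponential backoff. All names are illustrative.
import time
import random
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4          # deliberately small to limit pressure on the cluster
MAX_BACKOFF_SECONDS = 60


class ClusterOverloaded(Exception):
    """Raised when the search cluster rejects a request (e.g. HTTP 429)."""


def fetch_error_batch(batch_size=100):
    """Placeholder: pull up to batch_size failed messages from the error queue."""
    raise NotImplementedError


def index_into_elasticsearch(message):
    """Placeholder: re-submit one message; raise ClusterOverloaded on rejection."""
    raise NotImplementedError


def reprocess(message):
    """Retry a single message with exponential backoff plus jitter."""
    delay = 1.0
    while True:
        try:
            index_into_elasticsearch(message)
            return
        except ClusterOverloaded:
            time.sleep(delay + random.uniform(0, 1))
            delay = min(delay * 2, MAX_BACKOFF_SECONDS)


def drain_error_queue():
    """Reprocess the backlog in small batches with a bounded worker pool."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        while True:
            batch = fetch_error_batch()
            if not batch:
                break  # backlog cleared
            list(pool.map(reprocess, batch))
```

Keeping the worker count small mirrors the mitigation described above: fewer concurrent writers give the cluster room to catch up while the backlog drains.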

Updates:

  • Time: July 26, 2024, 5:40 p.m.
    Status: Postmortem
    Update: **Summary:** On July 10th, 2024, beginning at approximately 1:30 PM ET (17:30 UTC), we started receiving reports of published email campaigns that hadn’t been delivered to the intended audiences in over 1 hour. Due to the number of reports received, a platform incident was declared at 2:18 PM ET (18:18 UTC) and an incident response team began investigating these reports. Another platform incident was declared on July 11th, 2024, after reports of audiences being inaccessible or taking too long to load were received.
    **Impact:** The service degradation was intermittent in nature, and the impact was restricted to the US platform for access to some audiences and some campaigns published on July 10th, 2024, at or after 11:20 AM ET (15:20 UTC) through July 15th, 2024, at 6:23 PM ET (22:23 UTC).
    **Root Cause:** Both incidents stemmed from an overload of the ElasticSearch service, which resolves Audiences to User IDs and email addresses. A surge in error messages temporarily stored in a queue (for messages ElasticSearch couldn't process) overwhelmed the service, causing it to intermittently stop serving requests until it could catch up.
    **Mitigation:** The issue was immediately addressed by reducing the number of workers sending requests to ElasticSearch and increasing the number of nodes processing those requests. This reduced the strain on ElasticSearch, allowing the request queue to clear faster. Additionally, the error messages were manually reprocessed, making audiences accessible and campaigns publishable again.
    **Recurrence Prevention:** Errors in the queue are normal and typically resolve through automatic reprocessing. However, to prevent future occurrences:
    * We doubled ElasticSearch’s processing power on July 15th, 2024, at 6:23 PM ET to better handle any spikes.
    * We enabled additional monitoring and dashboards for early detection and mitigation of potential issues.
    * We will investigate and address the sources of the errors to ensure a healthier service.
  • Time: July 26, 2024, 5:39 p.m.
    Status: Resolved
    Update: This incident is now resolved, and all associated services are fully operational.
  • Time: July 10, 2024, 6:35 p.m.
    Status: Monitoring
    Update: This issue has now been mitigated, and customers whose campaign emails had not been delivered should start seeing them coming in now. We will provide you with another update as more information is made available.
  • Time: July 10, 2024, 6:24 p.m.
    Status: Investigating
    Update: We have received reports where campaign emails are delayed or have not been delivered at all after an extended period of time. We are investigating these reports and will provide you with an update within 1 hour.

Updates:

  • Time: June 10, 2024, 3:10 p.m.
    Status: Postmortem
    Update: **Summary:** On Thursday, May 31st, 2024, we received reports from several EU communities that the Web and Mobile Experiences were malfunctioning. End users reported that the Web or Mobile Experiences weren’t loading as expected, and a blank screen was observed.
    **Scope:** The scope of this incident was restricted to end users trying to access the Web and Mobile Experiences for communities in the EU only.
    **Root Cause:** A platform incident was declared at 08:30 UTC, and the incident response team identified a misconfiguration in the EU instance of the CloudFront cache policy. The misconfiguration meant that the first request for an asset from any origin was cached as the response to asset requests from all other origins. This caused subsequent requests from all other origins in the EU to fail Cross-Origin Resource Sharing (CORS) checks, resulting in the blank screen observed.
    **Mitigation:** The immediate impact was mitigated by invalidating the incorrect cached response in the EU instance of CloudFront, which restored Web and Mobile Experience access by 09:35 UTC. Additional asset reindexing was performed to force any end-user mobile devices that had locally cached the incorrect responses to reload the Mobile Experience. (A sketch of an origin-aware cache policy and invalidation follows this update list.)
    **Recurrence Prevention:** The EU CloudFront cache policy configuration has been updated to match the US settings to prevent this incident from recurring.
  • Time: June 4, 2024, 10:56 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: May 31, 2024, 9:39 a.m.
    Status: Monitoring
    Update: We have applied a fix for the issue affecting Web Experience and Partner API for EU communities and are currently monitoring.
  • Time: May 31, 2024, 9:09 a.m.
    Status: Identified
    Update: We have identified a possible root cause for the issue affecting Web Experience and the Partner API for EU customers and will provide further updates once we have applied a fix.
  • Time: May 31, 2024, 8:42 a.m.
    Status: Investigating
    Update: We are investigating a service disruption affecting Web Experience for EU customers. Our Engineering Team is urgently investigating. Updates to follow.
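
The root cause above is a CloudFront cache policy that served one origin's cached CORS response to every other origin. A generic AWS-side remedy is to include the Origin header in the cache key and invalidate the bad cached objects; the boto3 sketch below illustrates that pattern only and does not reflect Firstup's actual configuration (the policy name and distribution ID are placeholders).

```python
# Illustrative only: make CloudFront vary its cache on the Origin header so a
# CORS response cached for one origin is never served to another, then
# invalidate the incorrectly cached objects. IDs below are placeholders.
import time
import boto3

cloudfront = boto3.client("cloudfront")

# Cache policy that includes the Origin header in the cache key.
response = cloudfront.create_cache_policy(
    CachePolicyConfig={
        "Name": "assets-vary-on-origin",
        "Comment": "Cache asset responses per requesting origin (CORS-safe)",
        "MinTTL": 0,
        "DefaultTTL": 86400,
        "MaxTTL": 31536000,
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,
            "EnableAcceptEncodingBrotli": True,
            "HeadersConfig": {
                "HeaderBehavior": "whitelist",
                "Headers": {"Quantity": 1, "Items": ["Origin"]},
            },
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": {"QueryStringBehavior": "none"},
        },
    }
)
print("Created cache policy:", response["CachePolicy"]["Id"])

# Invalidate the incorrectly cached asset responses (as in the mitigation above).
cloudfront.create_invalidation(
    DistributionId="EDFDVBD6EXAMPLE",  # placeholder distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/*"]},
        "CallerReference": str(time.time()),
    },
)
```

Attaching the new policy to the distribution's cache behaviors is a separate update_distribution step, omitted here for brevity.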

Updates:

  • Time: June 15, 2024, 1:55 a.m.
    Status: Postmortem
    Update: **Summary:** On Tuesday, May 28th, 2024, from approximately 8:32 AM PT until 9:32 AM PT, we received multiple reports of email campaigns not being sent. Several campaigns scheduled for delivery were either delayed or not delivered at all. The issue was confirmed in both customer environments and internal test environments, indicating a widespread problem affecting the email campaign functionality. It was related to a code error in a governor hotfix deployed earlier that day.
    **Impact:** Affected scheduled campaigns were temporarily delayed by up to one hour. Some customers reported a 10-minute delay before a campaign started sending; others reported that time-sensitive campaigns were delivered on time for certain recipients but delayed for others or not delivered at all.
    **Root Cause:** The latest deployment of a hotfix for an audience restrictions feature caused an elevated error rate because an expected default value was not set.
    **Mitigation:** Once the root cause was identified, the software change was rolled back and the intake DLQ (dead-letter queue) was manually re-driven to prevent further degradation and ensure the affected campaigns were sent despite the delay. (A sketch of a DLQ re-drive follows this update list.)
    **Recurrence Prevention:** The following changes have been implemented to ensure campaign delays are prevented during internal deployment activities:
    * Comprehensive feature flag testing: While we conducted regression tests, we primarily focused on scenarios with the feature flag ON, missing issues when the flag was OFF. Moving forward, we will ensure that each feature flag is rigorously tested in both states. We will also collaborate with other domains for comprehensive testing, ensuring compatibility and robustness across all domains.
    * Optimal deployment scheduling: We will study platform usage to determine optimal times for hotfix deployments, ensuring minimal impact. This schedule will be documented and approved to balance the urgency of fixes with operational stability.
    * Enhanced monitoring and alerting: The initial alert was missed because it was bundled with other warnings. To prevent this, we will enhance our monitoring and alerting systems, ensuring that all critical alerts are promptly noticed. We will continue to refine our dashboards and tailored alerts to maintain proactive and continuous monitoring across development, UAT, and production environments.
    * High-risk multi-service hotfixes: Deployments of high-risk multi-service hotfixes scheduled outside of full release and regression cycles will be evaluated to determine whether they can be deferred to scheduled releases, ensuring they undergo thorough regression testing to prevent widespread issues.
  • Time: May 28, 2024, 6:25 p.m.
    Status: Resolved
    Update: We have identified that the root cause was related to a governor service change: a hotfix version caused an increased error rate. This caused a brief delay; however, impacted delivery systems are processing as normal. Additional details will be outlined in our postmortem for this service degradation.
  • Time: May 28, 2024, 5:06 p.m.
    Status: Investigating
    Update: We are currently investigating reports of email campaigns not sending. We will provide you with an update within 1 hour.
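
The mitigation above mentions rolling back the change and manually re-driving the intake DLQ (dead-letter queue). As an illustration of that generic pattern, and not of Firstup's internal tooling, here is a short boto3/SQS sketch that moves messages from a dead-letter queue back onto the main queue; the queue URLs are placeholders.

```python
# Illustrative only: re-drive messages from a dead-letter queue back to the
# main intake queue. Queue URLs are placeholders; Firstup's actual queueing
# stack is not public.
import boto3

sqs = boto3.client("sqs")

DLQ_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/intake-dlq"
MAIN_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/intake"


def redrive_dlq():
    """Move every message from the DLQ back onto the main queue."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=DLQ_URL,
            MaxNumberOfMessages=10,   # SQS maximum per receive
            WaitTimeSeconds=2,        # short long-poll to detect an empty queue
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # DLQ drained

        for msg in messages:
            # Re-enqueue first, then delete, so a crash can only duplicate
            # a message, never lose it.
            sqs.send_message(QueueUrl=MAIN_QUEUE_URL, MessageBody=msg["Body"])
            sqs.delete_message(QueueUrl=DLQ_URL, ReceiptHandle=msg["ReceiptHandle"])


if __name__ == "__main__":
    redrive_dlq()
```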

Updates:

  • Time: June 11, 2024, 4:37 p.m.
    Status: Postmortem
    Update: **Summary:** On Wednesday, May 15th, 2024, from approximately 3:00 AM PT until 10:22 AM PT, we received customer reports that scheduled campaigns were not being delivered as expected. These reports included campaigns sitting in the ‘Scheduled’ queue and not being sent on time, as well as campaigns marked as ‘Delivered’ that showed no actual sends to users in the Campaign Delivery Report. The problem was linked to a regression in a service used for scheduling, which introduced a new post-related variable that was not correctly initialized in a specific campaign. As a result, that campaign, and all subsequently scheduled campaigns, were not automatically published during the affected time.
    **Impact:** Scheduled campaigns remained in the scheduled queue, or were incorrectly marked as delivered without actually being sent to the target audience. We confirmed that at least 50 scheduled campaigns experienced some form of delay.
    **Root Cause:** The latest deployment of the campaign-scheduling service had an internal permissions check failure that resulted in an elevated error rate because an expected default value was not set.
    **Mitigation:** Once the root cause was identified through a source code analyzer, we corrected the post with the missing expected default value. This allowed the remaining scheduled campaigns to be published. Any other campaigns that did not go out on time were then manually published by the incident management team.
    **Recurrence Prevention:** The following changes have been implemented to ensure campaign delays are prevented during internal deployment activities:
    * A hotfix was deployed on Monday, June 3rd, 2024, updating the internal permissions method to ensure the correct default value is set for the new post variable irrespective of whether the associated feature flag is enabled. (A sketch of testing both flag states follows this update list.)
    * Improved error logging in the scheduling software to make it easier and faster to identify irrecoverable errors that can cause the scheduler to queue up and delay sends.
    * Updated runbooks with steps to troubleshoot delayed campaigns, linked from monitors.
    * Improved monitoring of the schedule_publish endpoint for any errors.
  • Time: June 5, 2024, 4:04 a.m.
    Status: Resolved
    Update: We have identified that the root cause was related to a governor service change: the latest deployment of the governor service caused an increased error rate. This caused a brief delay; however, impacted delivery systems are processing as normal. Additional details will be outlined in our postmortem for this service degradation.
  • Time: May 15, 2024, 5:59 p.m.
    Status: Identified
    Update: We are continuing to work on a fix for this issue.
  • Time: May 15, 2024, 5:57 p.m.
    Status: Identified
    Update: We have identified a database performance issue and are working to address it. A permission error caused publishing delays; the problematic post has now been fixed. We will provide another update as soon as more information is made available.
  • Time: May 15, 2024, 4:22 p.m.
    Status: Investigating
    Update: We are currently investigating reports of scheduled campaigns not sending. We will provide you with an update within 1 hour.
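
Both May postmortems trace the failures to a default value that was only set when a feature flag was enabled, and both list testing each flag state as a prevention step. The pytest sketch below illustrates that practice with a hypothetical schedule_campaign function; none of these names come from Firstup's codebase.

```python
# Illustrative only: exercise a code path with its feature flag both ON and
# OFF so a missing default surfaces in tests rather than in production.
# schedule_campaign and its flag handling are hypothetical stand-ins.
import pytest


def schedule_campaign(campaign, *, audience_restrictions_enabled=False):
    """Hypothetical scheduler: must behave sanely in both flag states."""
    # Guard against the class of bug described above: fall back to a safe
    # default when the flag (and its associated value) is absent.
    restrictions = campaign.get("restrictions") if audience_restrictions_enabled else None
    return {
        "campaign_id": campaign["id"],
        "restrictions": restrictions or [],
        "status": "scheduled",
    }


@pytest.mark.parametrize("flag_enabled", [True, False])
def test_schedule_campaign_in_both_flag_states(flag_enabled):
    campaign = {"id": 42}  # note: no "restrictions" key at all
    result = schedule_campaign(campaign, audience_restrictions_enabled=flag_enabled)
    # Regardless of the flag state, a default must be set and scheduling must succeed.
    assert result["status"] == "scheduled"
    assert result["restrictions"] == []
```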

Check the status of similar companies and alternatives to Firstup

Akamai
Systems Active

Nutanix
Systems Active

MongoDB
Systems Active

LogicMonitor
Systems Active

Acquia
Systems Active

Granicus System
Systems Active

CareCloud
Systems Active

Redis
Systems Active

integrator.io
Systems Active

NinjaOne Trust
Systems Active

Pantheon Operations
Systems Active

Securiti US
Systems Active

Frequently Asked Questions - Firstup

Is there a Firstup outage?
The current status of Firstup is: Systems Active
Where can I find the official status page of Firstup?
The official status page for Firstup is here
How can I get notified if Firstup is down or experiencing an outage?
To get notified of any status changes to Firstup, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Firstup every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
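
Firstup's incident updates follow the common Statuspage-style flow (Investigating, Identified, Monitoring, Resolved, Postmortem), so a monitor can poll a status summary endpoint every few minutes and flag changes. The sketch below illustrates such a poller; the URL is a placeholder, not Firstup's actual status API, and the payload shape assumes a Statuspage-style response.

```python
# Illustrative only: poll a Statuspage-style status endpoint every few minutes
# and report changes. The URL is a placeholder, not Firstup's actual API.
import time
import requests

STATUS_URL = "https://status.example.com/api/v2/status.json"  # placeholder
POLL_INTERVAL_SECONDS = 300


def current_status():
    resp = requests.get(STATUS_URL, timeout=10)
    resp.raise_for_status()
    # Statuspage-style payload: {"status": {"indicator": "none", "description": "..."}}
    return resp.json()["status"]["description"]


def watch():
    last = None
    while True:
        try:
            status = current_status()
            if status != last:
                print(f"Status changed: {last!r} -> {status!r}")
                last = status
        except requests.RequestException as exc:
            print(f"Status check failed: {exc}")
        time.sleep(POLL_INTERVAL_SECONDS)


if __name__ == "__main__":
    watch()
```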