Firstup Status: Check if Firstup is down or having an outage.

Platform Service Degradation - Some Users Unable To View Or Edit Studio Campaigns

Description: **Summary:** On September 4th, 2024, starting at 4:27 AM EDT, we received reports of Studio users being unable to view or edit campaigns in Studio. After correlating customer reports and performing initial troubleshooting, a platform service degradation incident was declared at 9:21 AM EDT and published on our Status Page at 9:41 AM EDT.

**Scope:** The scope of this service degradation was restricted to Studio users with multiple audiences assigned to them.

**Impact:** Studio users who had multiple audiences assigned to them were unable to view or edit campaigns for the duration of this incident (12 hrs 46 mins). No scheduled campaigns were affected, and campaign viewing, editing, and publishing were not inhibited for users with no audiences or a single audience assigned.

**Root Cause:** The root cause of this incident was determined to be a regression caused by a misconfigured platform enhancement policy change intended to improve the efficiency of how user-assigned audiences were queried. The change had been released at 12:07 AM EDT as part of the scheduled software release maintenance the same day.

**Mitigation:** A rollback of the offending policy change was performed and completed by 12:53 PM EDT to restore access to Studio campaigns.

**Recurrence Prevention:** To prevent this incident from recurring, we will perform the following actions before releasing the platform enhancement policy change in the future:

* Review and correct any misconfiguration in the platform enhancement policy change code.
* Add more unit test cases to cover multiple audiences on the modified queries.
* Add more regression test cases to cover users with multiple audiences.
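
The test coverage described under Recurrence Prevention can be sketched as follows. This is a minimal illustration, not Firstup's code: the `Campaign` type, the `visible_campaign_ids` function, and the access rule itself (no assigned audiences means unrestricted access) are all assumptions made for the example. The point is only that the zero-, single-, and multiple-audience cases each get their own test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Campaign:
    id: int
    audience_id: int


def visible_campaign_ids(assigned_audiences, campaigns):
    """Hypothetical access rule, assumed for illustration only.

    A user with no assigned audiences is unrestricted; otherwise a campaign
    is visible when it targets any one of the user's assigned audiences.
    """
    if not assigned_audiences:
        return [c.id for c in campaigns]
    allowed = set(assigned_audiences)
    return [c.id for c in campaigns if c.audience_id in allowed]


CAMPAIGNS = [Campaign(1, 10), Campaign(2, 20), Campaign(3, 30)]


def test_user_with_no_audiences_sees_everything():
    assert visible_campaign_ids([], CAMPAIGNS) == [1, 2, 3]


def test_user_with_single_audience():
    assert visible_campaign_ids([20], CAMPAIGNS) == [2]


def test_user_with_multiple_audiences():
    # The case that regressed in this incident: more than one audience.
    assert visible_campaign_ids([10, 30], CAMPAIGNS) == [1, 3]
```

Keeping the three cases as separate tests means a regression that only affects multi-audience users fails one named test instead of hiding behind passing single-audience checks.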

Status: Postmortem

Impact: None | Started At: Sept. 4, 2024, 1:41 p.m.

Updates:

Time: Sept. 9, 2024, 9:38 p.m.

Status: Postmortem

Update: The postmortem was published; its full text is given in the Description above.
Time: Sept. 9, 2024, 9:38 p.m.

Status: Resolved

Update: This incident has been resolved.
Time: Sept. 4, 2024, 5:01 p.m.

Status: Monitoring

Update: The proposed fix has been successfully deployed in the production environment. Please notify our Customer Support team if any issues persist. We are now placing the affected services under monitoring.
Time: Sept. 4, 2024, 4:09 p.m.

Status: Identified

Update: We are currently deploying the proposed fix in the production environment and will provide another update once this is completed.
Time: Sept. 4, 2024, 3:10 p.m.

Status: Identified

Update: We are currently validating the proposed fix for this issue in a staging environment and will deploy it in the production environment upon successful testing. Another update within 1 hour.
Time: Sept. 4, 2024, 2:21 p.m.

Status: Identified

Update: We have identified a potential cause of the service disruption, and are working on mitigating this issue. Another update within 1 hour.
Time: Sept. 4, 2024, 1:41 p.m.

Status: Investigating

Update: We are currently investigating reports that some users are unable to view or edit campaigns they have access to. We will provide you with an update within 1 hour.

Platform Service Unavailable - US Web Experience and Studio

Description: **Summary:** On August 28th, 2024, at 1:03 PM EDT, system monitors alerted us to failing database health checks, and our engineering team immediately started investigating these alerts. Customer reports of core platform endpoints being unresponsive and/or returning error messages were received beginning at 1:12 PM EDT, and a platform incident was declared at 1:21 PM EDT.

**Scope:** Any user on the US platform attempting to access or navigate through the Web and Mobile Experience, as well as Studio, was impacted by this incident.

**Impact:** Core US platform endpoints such as Web and Mobile Experiences, as well as Studio, were slow to load or intermittently unavailable for the duration of the incident (48 minutes).

**Root Cause:** The root cause was determined to be a slow-running query for “user unread posts” that saw a large spike in traffic after a campaign was published to a large audience. As a result, the database CPU spiked and the database stopped accepting new connection requests, which caused new Web and Mobile Experience requests, as well as Studio requests, to fail and made the system appear unresponsive.

**Mitigation:** The immediate problem was mitigated by halving the number of pods submitting requests to the database, which alleviated the load and restored database responsiveness and platform endpoint availability by 1:51 PM EDT.

**Recurrence Prevention:** To prevent this incident from recurring, our engineering incident response team has optimized the offending slow-running query to perform 2x faster, thereby reducing the database CPU resources it requires. We are also working on implementing circuit breakers on the offending downstream services to prevent database CPU overutilization and ensure platform endpoint availability.
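
The circuit breakers mentioned under Recurrence Prevention follow a general pattern that can be sketched as below. This is an illustrative sketch under stated assumptions, not Firstup's implementation: the `CircuitBreaker` class, its thresholds, and the injectable `clock` are hypothetical. After a run of consecutive failures the breaker rejects calls immediately for a cool-down period, so a struggling database is not hit with further requests.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds while open
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: shed the request without calling fn.
                raise CircuitOpenError("circuit open; request shed")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure reopens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

A caller would wrap each database request, for example `breaker.call(run_query, user_id)` where `run_query` is whatever function issues the query, and treat `CircuitOpenError` as a signal to return a cached or degraded response.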

Status: Postmortem

Impact: None | Started At: Aug. 28, 2024, 5:21 p.m.

Updates:

Time: Sept. 5, 2024, 10:38 p.m.

Status: Postmortem

Update: The postmortem was published; its full text is given in the Description above.
Time: Sept. 5, 2024, 10:37 p.m.

Status: Resolved

Update: This incident is now resolved.
Time: Aug. 28, 2024, 8:06 p.m.

Status: Monitoring

Update: Moving the platform incident into a monitoring state. There has been no further recurrence of the service disruption to the web experience endpoint. A software hotfix has been deployed and verified. This fix is intended to address the suspected root cause, a non-optimal database query that resulted in the unresponsiveness and 500 error responses observed by users before the incident was mitigated. All components remain fully operational.
Time: Aug. 28, 2024, 6:18 p.m.

Status: Identified

Update: We have identified the cause of this service outage, and are working on a fix. However, Web Experience and Studio continue to be available. Another update will be provided as more information is made available.
Time: Aug. 28, 2024, 5:51 p.m.

Status: Investigating

Update: As we continue investigating this incident, we have relieved some pressure on back-end resources and services to mitigate the issue, and Web Experience and Studio are now available. Another update in 30 minutes.
Time: Aug. 28, 2024, 5:21 p.m.

Status: Investigating

Update: We are currently investigating reports of the US Web Experience being unavailable. We will provide you with an update in 30 minutes.

Intermittent unavailability issues are being experienced for both Studio and Web Experience.

Description: **Summary:** From approximately 11:08 am - 11:38 am PT (18:08 - 18:38 UTC) on Thursday, August 22nd, both Studio and Web Experience were unavailable due to the release of Version 2 of Personalized Fields (PFV2), a new feature in the Q3 quarterly update that was more resource intensive than initially planned. This caused high CPU usage, increased query latency, and database connection pool exhaustion.

**Impact:** This incident primarily affected users who attempted to access Studio services and Web Experience between 11:08 am and 11:38 am PT. The issue surfaced as the following errors on the frontend of the platform:

* We’re sorry, but something went wrong.
* 502 Bad Gateway.
* There was an error processing your request. Please try again.

**Root Cause:** The root cause was determined to be the release of Version 2 of Personalized Fields (PFV2), a new feature in the Q3 quarterly update that was more resource intensive than initially planned. The feature caused a significant increase in CPU usage and query latency on the shared database cluster, along with database connection pool exhaustion. This resulted in the Studio/Web Experience unavailability and the error messages observed by impacted users.

**Mitigation:** The immediate impact was mitigated by temporarily disabling the newly released feature that was causing excessive resource consumption. The cache Time-To-Live (TTL) was also changed from 1 minute to 3 hours to reduce load and stabilize performance. After service was restored, we conducted platform tuning and scaled up infrastructure outside business hours to accommodate the increased load introduced by this new feature.
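
To see why lengthening the cache TTL from 1 minute to 3 hours reduces database load, consider the small simulation below. It is a sketch under assumptions: the `TTLCache` class, the single key, and the 30-second request interval are hypothetical and chosen only to make the arithmetic visible. A steadily requested key is reloaded from the backend at most once per TTL, so a TTL that is 180 times longer yields roughly 180 times fewer backend loads for that key.

```python
class TTLCache:
    """Tiny TTL cache that counts how often it falls through to the backend."""

    def __init__(self, ttl_seconds, loader):
        self.ttl = ttl_seconds
        self.loader = loader
        self.store = {}  # key -> (value, expires_at)
        self.loads = 0   # number of backend loads (cache misses)

    def get(self, key, now):
        entry = self.store.get(key)
        if entry is None or now >= entry[1]:
            # Miss or expired entry: reload from the backend.
            self.loads += 1
            value = self.loader(key)
            self.store[key] = (value, now + self.ttl)
            return value
        return entry[0]


def backend_loads(ttl_seconds, window_seconds=6 * 3600, request_interval=30):
    """Count backend loads for one key requested every `request_interval` seconds."""
    cache = TTLCache(ttl_seconds, loader=lambda key: f"fields-for-{key}")
    for now in range(0, window_seconds, request_interval):
        cache.get("user-1", now)
    return cache.loads


loads_1_min = backend_loads(60)        # one reload per minute over 6 hours
loads_3_hr = backend_loads(3 * 3600)   # one reload per 3 hours over 6 hours
print(loads_1_min, loads_3_hr, loads_1_min / loads_3_hr)
```

The trade-off is staleness: with a 3-hour TTL, a changed value can take up to 3 hours to appear, which is presumably acceptable as a temporary stabilizing measure.
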
**Recurrence Prevention:** To prevent a recurrence of this incident, the following actions have been or are being implemented:

* Load Testing and Analysis - More rigorous load testing and analysis to detect N+1 calls or latency spikes before a feature goes live.
* Infrastructure Planning and Caching Strategy - Refactor the caching for the affected feature, including pre-warming caches in batches to prevent cache-miss cascades, and optimize the infrastructure to handle increased load efficiently while caching only what is needed.
* Remove custom attributes for blocked users who have been inactive for a specific period to reduce table size and improve query performance.
* Feature Flagging and Gradual Rollouts - Future high-risk changes will be rolled out gradually, with improved resource monitoring performed before full deployment.

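A gradual, flag-controlled rollout of the kind listed under Recurrence Prevention is often implemented with deterministic percentage bucketing. The sketch below assumes that approach; the `in_rollout` function and the flag name are hypothetical and do not come from the postmortem. Each user is hashed into one of 100 stable buckets, so raising the percentage only ever adds users and never removes anyone who already has the feature.

```python
import hashlib


def in_rollout(flag_name, user_id, percent):
    """Return True if `user_id` falls inside the first `percent` buckets.

    Hashing the flag name together with the user id gives each flag its own
    stable assignment of users to buckets 0..99.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


# Example: widen exposure step by step while watching resource metrics.
users = [f"user-{n}" for n in range(1000)]
for percent in (1, 10, 50, 100):
    enabled = sum(in_rollout("personalized-fields-v2", u, percent) for u in users)
    print(f"{percent:>3}% target -> {enabled} of {len(users)} users enabled")
```

Setting the percentage back to 0 acts as the kill switch, which mirrors the mitigation used in this incident of temporarily disabling the new feature.
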
Status: Postmortem

Impact: None | Started At: Aug. 22, 2024, 6:27 p.m.

Updates:

Time: Aug. 30, 2024, 7:26 p.m.

Status: Postmortem

Update: The postmortem was published; its full text is given in the Description above.
Time: Aug. 23, 2024, 3:35 p.m.

Status: Resolved

Update: A fix was put in place and the service disruption to the platform has been resolved. Thank you for your patience whilst we carried out our investigation. Please contact Support at support.firstup.io should you encounter any further issues.
Time: Aug. 22, 2024, 6:39 p.m.

Status: Identified

Update: We are continuing to work on a fix for this issue.
Time: Aug. 22, 2024, 6:39 p.m.

Status: Identified

Update: Both Studio and Web Experience are now available. The component which is causing the issue has been identified and steps have been taken to mitigate. Further updates to follow.
Time: Aug. 22, 2024, 6:27 p.m.

Status: Investigating

Update: We are urgently investigating an issue with intermittent unavailability of the platform being experienced for both Studio and Web Experience. Updates to follow asap.

Platform Service Degradation - Studio Email Alias Error

Description: A hotfix has been deployed and verified to correct the underlying behavior of the author alias reverting on campaign drafts. Marking all components fully operational and resolving the incident.

Status: Resolved

Impact: None | Started At: Aug. 7, 2024, 4:31 p.m.

Updates:

Time: Aug. 7, 2024, 7:45 p.m.

Status: Resolved

Update: A hotfix has been deployed and verified to correct the underlying behavior of the author alias reverting on campaign drafts. Marking all components fully operational and resolving the incident.
Time: Aug. 7, 2024, 6:01 p.m.

Status: Identified

Update: We have determined that this problem is related to a regression introduced by a bug fix where a restricted user could potentially publish as an unassigned email alias. We are going to roll back that change in a forthcoming hotfix which should mitigate the problem, and rework a more comprehensive fix to the original bug. We are going through the change management process now to get the hotfix reviewed, verified, and pushed to production.
Time: Aug. 7, 2024, 5:28 p.m.

Status: Investigating

Update: We are continuing to investigate this Studio service disruption. Another update will be provided within the next hour.
Time: Aug. 7, 2024, 4:31 p.m.

Status: Investigating

Update: We are currently investigating reports of the email alias on campaigns reverting to the default in Studio. We will provide you with an update within 1 hour.

EU Communities - Campaign Send Service Disruption

Description: Queued campaigns are now successfully being delivered and we are currently monitoring the issue.

Status: Monitoring

Impact: None | Started At: July 23, 2024, 11:42 a.m.

Updates:

Time: July 23, 2024, 11:47 a.m.

Status: Monitoring

Update: Queued campaigns are now successfully being delivered and we are currently monitoring the issue.
Time: July 23, 2024, 11:42 a.m.

Status: Investigating

Update: We are investigating a potential service disruption affecting campaign sending from EU communities. Users in communities hosted on our EU infrastructure may not be receiving campaign emails, or may be experiencing email delays at this time. Our next update will be in 30 minutes.

Is there a Firstup outage?

Firstup status: Systems Active

Firstup outages and incidents

There have been 2 outages or incidents for Firstup in the last 30 days.
