Get notified about any outages, downtime, or incidents for Firstup and 1,800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for Firstup.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
View the latest incidents for Firstup and check for official updates:
Description: On February 12th, 2024, starting at around 9:20 AM ET, we received reports that some users were experiencing issues accessing Creator Studio. The reports included general slowness loading Studio functions, as well as system error messages such as “Failed to fetch. Try again” or “504 Bad Gateway”. During our investigation, it was identified that the number of available database connections was exhausted, so new Creator Studio requests could not establish a connection to the database. To mitigate this, a rolling redeployment of the dependent backend services that held connections to the database was performed to free up any “stuck” connections, allowing new Creator Studio requests to connect to the database. As a recurrence prevention measure for this incident, we are working on reducing the demand for database connections in the short term via methods such as:
* Connection pooling (a minimal pooling sketch follows this incident entry)
* Read/write splitting

In the long term, we will be looking at:
* Upgrading our database infrastructure
* Increasing our database connection limits.
Status: Postmortem
Impact: None | Started At: Feb. 12, 2024, 2:47 p.m.
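The remediation steps above mention connection pooling as a way to reduce demand for database connections. As an illustration only, and not Firstup's implementation, the sketch below assumes a Python backend service using SQLAlchemy against a PostgreSQL database; the connection URL, table, and pool limits are hypothetical. It shows how a bounded, pre-pinged pool caps the connections a single service can hold and recycles idle ones so they cannot get “stuck”.

```python
# Minimal sketch of client-side connection pooling (hypothetical names and limits).
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://studio:secret@db.internal:5432/creator_studio",
    pool_size=10,        # steady-state connections this service may hold
    max_overflow=5,      # short-lived extra connections allowed under burst load
    pool_timeout=30,     # seconds to wait for a free connection before failing fast
    pool_recycle=1800,   # periodically replace long-lived connections
    pool_pre_ping=True,  # test a connection before handing it to a request
)

def fetch_recent_campaigns():
    # Connections are borrowed from the pool and returned when the block exits,
    # so concurrent requests share a fixed budget instead of opening new ones.
    with engine.connect() as conn:
        rows = conn.execute(text("SELECT id, title FROM campaigns LIMIT 50"))
        return rows.fetchall()
```

With a cap like this on every backend service, a traffic spike exhausts the pool's wait queue and fails fast rather than exhausting the database's global connection limit.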
Description:
## Summary:
On February 8th, 2024, beginning at approximately 1:56 PM EST (18:56 UTC), we started receiving reports of Studio not performing as expected. The symptoms observed by some Studio users included:
* A “failed to fetch” or a “504 Gateway Timeout” error message.
* Unusually slow performance.

A recurrence of this incident was also observed on April 24th, 2024.
## Impact:
Studio users who were actively trying to navigate through and use any Studio functions during these incidents were impacted by the service disruption.
## Root Cause:
It was identified that Studio services were failing to establish a TCP connection to the Identity and Access Management (IAM) service due to a backup of TCP connection requests. The backup resulted from other, already-failed connection requests that were not dropped because they kept retrying to establish a connection for an extended period.
## Mitigation:
On both days, the immediate problem was mitigated by restarting the backend services that had failed TCP connection attempts, in effect purging the connection request queue of stale connections and allowing new connections to be established with the IAM service.
## Remediation Steps:
Our engineering team is working on reducing the time-to-live of TCP connection requests to the IAM service from the default 60 seconds to 10 seconds, so that failed connections are dropped sooner and the backup of connection requests to IAM is reduced (a minimal timeout sketch follows this incident entry). In addition, we have implemented dashboards to track TCP connection failures, and set alerting thresholds on failed TCP connections to help us get ahead of a potential platform service disruption.
Status: Postmortem
Impact: None | Started At: Feb. 8, 2024, 7:16 p.m.
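The remediation above centers on dropping failed connection attempts to IAM sooner so they do not back up behind healthy ones. As an illustration only, not Firstup's code, the sketch below assumes a Python caller using the requests library against a hypothetical IAM endpoint, and shows an explicit 10-second connect/read timeout in place of a much longer default.

```python
# Minimal sketch: fail fast on calls to a dependency (hypothetical IAM endpoint).
import requests

IAM_BASE_URL = "https://iam.internal.example"  # placeholder, not a real service URL

def introspect_token(token: str) -> dict:
    resp = requests.post(
        f"{IAM_BASE_URL}/v1/introspect",
        data={"token": token},
        # (connect timeout, read timeout): abandon an unresponsive IAM endpoint
        # after 10 seconds instead of letting the attempt linger and pile up
        # behind other stalled requests.
        timeout=(10, 10),
    )
    resp.raise_for_status()
    return resp.json()
```

Combined with the dashboards and alert thresholds mentioned above, a short timeout turns a slow upstream dependency into quick, visible errors rather than a growing queue of stale connection attempts.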
Description: The root cause of this service disruption was identified as the same as the one described in another incident's postmortem. Please follow this [link](https://status.firstup.io/incidents/4kpmpkmvkrz0) for additional details.
Status: Postmortem
Impact: None | Started At: Jan. 25, 2024, 3:13 p.m.
Description: On January 10th, 2024, starting at 11:11 AM ET, we observed that the “User Export” jobs queue started growing, causing such jobs to take longer than usual to process and complete. Our investigation revealed that a large User Export job caused our systems to hit a memory error, resulting in the job failing and restarting in a loop and consequently preventing other User Export jobs from being processed. As a mitigation step, we increased the worker pod memory to accommodate the large User Export job, allowing it to complete successfully and freeing up the queue for backlogged jobs to resume processing. The queue backlog took approximately 2 hours to catch up, and all jobs were completed by 1:23 PM ET. As a long-term solution to the memory limitation, we implemented autoscaling so that memory is allocated automatically as needed, improving the performance of queue processing (a brief sketch of this kind of memory adjustment follows this incident entry).
Status: Postmortem
Impact: None | Started At: Jan. 10, 2024, 4:21 p.m.
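The long-term fix above is autoscaling so that worker memory grows with job size. As a hedged sketch, not Firstup's tooling, the example below assumes the export workers run as a Kubernetes Deployment and uses the official Kubernetes Python client to apply the kind of memory-limit increase that a vertical autoscaler would make automatically; the deployment name, namespace, and sizes are hypothetical.

```python
# Minimal sketch: raising a worker Deployment's memory limit via the Kubernetes API
# (hypothetical names and sizes; a vertical autoscaler automates this adjustment).
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run in-cluster
apps = client.AppsV1Api()

patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "user-export-worker",
                        "resources": {
                            "requests": {"memory": "1Gi"},
                            "limits": {"memory": "4Gi"},  # headroom for oversized export jobs
                        },
                    }
                ]
            }
        }
    }
}

apps.patch_namespaced_deployment(name="user-export-worker", namespace="jobs", body=patch)
```

Patching the pod template rolls the worker pods with the new limits, which mirrors the manual mitigation described above; an autoscaler watches actual usage and applies such changes without operator intervention.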
Description: This service disruption was caused by the deployment of a hot fix that was intended to fix another issue on the platform. The hot fix was rolled back in order to restore the affected services. Our Engineering and Quality Assurance teams reviewed the problematic hot fix, corrected any negative dependencies, and ensured that its redeployment caused no further regressions on the platform.
Status: Postmortem
Impact: None | Started At: Nov. 15, 2023, 11:31 p.m.