Is there a Firstup outage?

Firstup status: Systems Active

Last checked: 8 minutes ago

Get notified about any outages, downtime or incidents for Firstup and 1800+ other cloud vendors. Monitor 10 companies, for free.

Firstup outages and incidents

Outage and incident data over the last 30 days for Firstup.

There have been 2 outages or incidents for Firstup in the last 30 days.

Components and Services Monitored for Firstup

OutLogger tracks the status of these components for Firstup:

Component Status

Latest Firstup outages and incidents.

View the latest incidents for Firstup and check for official updates:

Updates:

  • Time: Nov. 16, 2024, 2:03 a.m.
    Status: Postmortem
    Update:
      ## Summary:
      On Wednesday, November 13th, 2024, starting at around 6:14 AM PT, we received reports of some users experiencing general slowness and decreased responsiveness while navigating around Firstup’s Studio platform. Additional reports of some published campaign emails not being delivered, and of error messages being returned while navigating within the Employee Experience, followed shortly thereafter.
      ## Severity:
      Sev2
      ## Scope:
      The scope of this service degradation included all Firstup customers utilizing the Studio and the Employee Experience (EE) endpoints.
      ## Impact:
      For approximately 2.5 hours after the onset of the incident, users were unable to efficiently and consistently navigate around Studio and the EE. Symptoms included a loading spinner for an extended amount of time in Studio, and missing shortcuts in EE, with some of the errors noted below being returned:
      * Oops, an error occurred! Sorry about that.
      * TypeError: toMoment(…).format is not a function
      Concurrently, campaign emails were not being delivered consistently to their intended recipients, and user sync files were not being actively processed. Most campaign emails were delivered over the course of the next 3 hours, with a few rare exceptions that required manual intervention and Customer coordination. User sync was not restored until the following day, when it was discovered that the process was not actively running. Total incident duration, excluding the user sync process, was just over 6 hours.
      ## Root Cause:
      The root cause has been attributed to a code change that was released during the Platform Software Release maintenance window the night before. The change caused a slow-running query from one of our back-end services to run for too long and utilize an exceptionally large amount of database resources, including network connections and CPU processing cycles. This, coupled with the normal increase in database requests from our customer base during our platform utilization peak hours starting at around 5:54 AM PT, resulted in the exhaustion of available database connections and CPU overutilization in the database. As a result, new connections to the database could not be freely established until current connections were closed and made available for new requests from various platform services such as Studio and EE. The backend service responsible for campaign email delivery was also subject to this condition and could not process email deliveries as expected. Similarly, user sync file processing was also delayed beyond normal limits.
      ## Mitigation:
      Various symptoms exhibited during the course of the incident were mitigated in phases. The most significant service impact was mitigated after a hotfix was released at 8:29 AM PT to halt the aforementioned slow-running query, in effect relieving some resource pressure on the database and allowing customer-facing service requests from Studio and EE to successfully re-establish connections with the database. Additional resources were also spun up to process the email delivery queue backlog that had been increasing during the incident, which started draining at 12:02 PM PT. Almost all campaign emails were confirmed delivered by 1:10 PM PT, with the exception of ~150 email messages, which had experienced an internal error but were later cleared and delivered.
      ## Recurrence Prevention:
      The following actions have been taken or have been identified as follow-up actions to commit to as a part of the formal RCA (Root Cause Assessment) process:
      * Moved the slow-running query to run during after-hours (off customer peak hours).
      * Isolate the campaign email delivery back-end service to its own database cluster to avoid the general database impact on email deliveries.
      * Enhance our software change management policies to release risky backend changes behind a feature flag for a more controlled release.
      * Update post-incident service verification to ensure that user sync processing has been fully restored and remains functional.
  • Time: Nov. 16, 2024, 2:03 a.m.
    Status: Resolved
    Update: All impacted Firstup platform endpoints have remained stable and fully available. This incident is now resolved.
  • Time: Nov. 13, 2024, 8:01 p.m.
    Status: Monitoring
    Update: The hotfix for the campaign email delivery issue has now been vetted and deployed in the production environment. Campaign emails have resumed delivery and may take a few minutes to reach the end user’s inbox. The affected services will be placed back under monitoring for now.
  • Time: Nov. 13, 2024, 7:48 p.m.
    Status: Identified
    Update: We continue working on developing and verifying another hotfix for the email delivery issue. Another update within 1 hour.
  • Time: Nov. 13, 2024, 6:47 p.m.
    Status: Identified
    Update: We have identified a potential root cause for the residual impact of this incident where email campaigns are yet to be delivered following the hotfix. We are working on developing and verifying another hotfix for that issue. Another update within 1 hour.
  • Time: Nov. 13, 2024, 5:50 p.m.
    Status: Identified
    Update: We are currently investigating a residual impact of this incident where some email campaigns are yet to be delivered after the hotfix was released. We will provide an update within 1 hour.
  • Time: Nov. 13, 2024, 4:35 p.m.
    Status: Monitoring
    Update: The hotfix for this incident has been released in the production environment. All services should now be restored. We will be placing the affected services in a monitoring state for now.
  • Time: Nov. 13, 2024, 4:25 p.m.
    Status: Identified
    Update: A hotfix for this incident has been developed and is currently being tested in our staging environment. Once vetted, it will be released in the production environment. Another update within 1 hour.
  • Time: Nov. 13, 2024, 3:30 p.m.
    Status: Identified
    Update: We have identified a potential root cause of this incident, and are working towards mitigation. Affected components are only on the US datacenter. The EU datacenter remains unaffected. Another update within 1 hour.
  • Time: Nov. 13, 2024, 3:07 p.m.
    Status: Investigating
    Update: We are currently investigating reports of intermittent Studio slow performance issues. We will provide you with an update within 1 hour.
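
The root cause in the postmortem above (long-running queries holding database connections until none are left for customer-facing requests) can be illustrated with a toy model. This is a minimal sketch, not Firstup's implementation: `BoundedPool`, the pool size of 2, and the 10 ms timeout are all hypothetical values chosen for illustration.

```python
import threading


class BoundedPool:
    """Toy connection pool: at most `size` connections can be checked out."""

    def __init__(self, size: int) -> None:
        self._slots = threading.BoundedSemaphore(size)

    def acquire(self, timeout: float) -> bool:
        # Returns False when no connection frees up within `timeout` seconds.
        return self._slots.acquire(timeout=timeout)

    def release(self) -> None:
        self._slots.release()


pool = BoundedPool(size=2)  # hypothetical pool size

# Two long-running queries check out every available connection.
held_by_slow_queries = [pool.acquire(timeout=0.01) for _ in range(2)]

# A customer-facing request now cannot get a connection and times out.
request_during_incident = pool.acquire(timeout=0.01)

# Halting one slow query (the hotfix, in this analogy) frees a connection.
pool.release()
request_after_hotfix = pool.acquire(timeout=0.01)

print(held_by_slow_queries, request_during_incident, request_after_hotfix)
```

In this model the third request fails only because the pool is full, not because the database is down, which mirrors the postmortem's description of new connections being blocked until existing ones were released.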

Updates:

  • Time: Nov. 13, 2024, 1:23 a.m.
    Status: Postmortem
    Update:
      **Summary:** On Monday, November 4th, 2024, starting at around 7:53 AM PT, we received reports that some published campaign emails were not being delivered to their intended audiences. While some emails were delivered as expected, others were either delayed or appeared to be stuck. All emails were handed off in a timely manner from the Firstup platform to the third-party email provider. The problem worsened over the next several hours: email throughput appeared to be highly restricted, and the backlog in the email delivery queue not only continued to grow but also did not drain in the chronological order in which messages were initially queued. Through a joint troubleshooting call with the third-party email provider, it was determined that a large volume of email delivery errors starting at around 7:11 AM PT had put the entire pool of our email delivery IP addresses in a state of reduced performance. After jointly reviewing the highest volume of errors with the third-party email provider, the sender IPs were restored to a fully functioning state, resulting in the entire email backlog being drained fully by 3:30 PM PT.
      **Severity:** Sev2
      **Scope:** The scope of this service degradation was restricted to customers who use Firstup campaign email delivery as a channel, as well as any other non-campaign email content sent from the Firstup platform, such as password reset request emails. Push notifications, assistant notifications, and web or mobile experience channels were unaffected and remained fully functional.
      **Impact:** Within an individual campaign sent to email as a channel, some emails may not have been delivered as expected, while others wound up being stuck in a "processing" state on the third-party email delivery platform. During the incident (7hrs 58mins), some of the emails in the “processing” state were successfully delivered but heavily delayed, while others remained stuck in a “processing” status. Observed email throughput was reduced to approximately 46k messages per hour from a theoretical max of 30k messages per second. The total outstanding backlog prior to mitigation was over a million email messages.
      **Root Cause:** Root cause has been attributed to an elevated level of email delivery errors that triggered a protection mechanism on the third-party email provider platform. This resulted in reduced throughput for the entire pool of our sender IP addresses, to the point where mostly retries for deferrals from earlier delivery errors were being processed and very few queued-up emails were delivered. This was essentially the queue-processing equivalent of running in place.
      **Mitigation:** After analyzing the top contributors to email delivery errors, which appeared to be correlated to a single misconfigured email security endpoint, all addresses associated with that endpoint were force-unsubscribed until it could be correctly configured, to avoid any further email delivery errors contributing to the underlying log jam. 80k email errors were attributed to that endpoint in just a couple of hours. Through a joint incident bridge with the third-party email provider, Firstup demonstrated the irrelevance of the deferral rates to the overall email backlog queue. A data pipeline engineer was paged out and was able to verify that the sender IPs had been relegated to a lower-performing state that was actually contributing to a circular problem. Backend system changes were made at 3:09 PM PT on the third-party platform to restore the prior state of the sender IPs, and the entire email message backlog subsequently drained fully in less than 25 minutes.
      **Recurrence Prevention:** The following actions have been taken or have been identified as follow-up actions to commit to as a part of the formal RCA (Root Cause Assessment) process:
      * Email addresses contributing to elevated error rates will be bulk unsubscribed from the platform (or otherwise quarantined) until underlying conditions can be corrected.
      * Coordinate with the third-party provider to better understand the characteristics of the platform safety mechanism, including why it triggered, how to avoid it entirely, and how to improve joint monitoring and prevent elevated error rates from affecting overall delivery.
      * Implement any reasonable recommendations from the third-party RFO.
  • Time: Nov. 12, 2024, 5:36 p.m.
    Status: Resolved
    Update: Email delivery has remained available and fully functional throughout the monitoring phase of this incident. This incident is now resolved. Once available, a Root Cause Analysis for this incident will be published here.
  • Time: Nov. 4, 2024, 11:24 p.m.
    Status: Monitoring
    Update: This issue has now been mitigated, and any emails that had not been delivered should now be hitting their user endpoints. We will place the impacted services under monitoring.
  • Time: Nov. 4, 2024, 9:23 p.m.
    Status: Identified
    Update: As we continue to work with our upstream third-party email delivery vendor towards a solution to this issue, we have identified that platform emails such as password reset emails are in the scope of this incident. Another update within 1 hour.
  • Time: Nov. 4, 2024, 8:11 p.m.
    Status: Investigating
    Update: We continue to work with our upstream third-party email delivery vendor towards a solution to this issue. Another update within 1 hour.
  • Time: Nov. 4, 2024, 7:11 p.m.
    Status: Investigating
    Update: We continue to work with our upstream third-party email delivery vendor towards a solution to this issue. Another update within 1 hour.
  • Time: Nov. 4, 2024, 6:11 p.m.
    Status: Investigating
    Update: Our investigations have not revealed any issues on our platform. However, we see a potential issue with our upstream third-party email delivery vendor, and have reached out to them for additional troubleshooting on their end. We will provide you with another update within 1 hour.
  • Time: Nov. 4, 2024, 5:11 p.m.
    Status: Investigating
    Update: As we continue to investigate these reports, our current observation is that email campaigns are being delivered, albeit with some delays. Another update within 1 hour.
  • Time: Nov. 4, 2024, 4:40 p.m.
    Status: Investigating
    Update: We are investigating reports where some email campaigns are not being delivered as expected. We will provide you with an update within 1 hour.
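
The throughput figures quoted in the postmortem above are easier to compare once they share a unit. The sketch below uses only the numbers stated there; because the backlog is given as "over a million" and the drain time as "less than 25 minutes", the derived post-fix rate is a lower bound. The time-to-drain figure is a rough assumption-laden estimate that ignores newly queued mail and retries.

```python
OBSERVED_PER_HOUR = 46_000       # "approximately 46k messages per hour"
THEORETICAL_PER_SECOND = 30_000  # "theoretical max of 30k messages per second"
BACKLOG = 1_000_000              # "over a million", treated as a lower bound
DRAIN_MINUTES = 25               # "less than 25 minutes", treated as an upper bound

# Degraded throughput expressed per second, and as a share of the stated maximum.
observed_per_second = OBSERVED_PER_HOUR / 3600
fraction_of_max = observed_per_second / THEORETICAL_PER_SECOND

# How long the backlog alone would take to clear at the degraded rate.
hours_to_drain_at_observed = BACKLOG / OBSERVED_PER_HOUR

# Minimum average rate implied by draining the backlog within the stated window.
min_rate_after_fix = BACKLOG / (DRAIN_MINUTES * 60)

print(f"degraded rate: {observed_per_second:.1f} msg/s ({fraction_of_max:.4%} of max)")
print(f"backlog at degraded rate: {hours_to_drain_at_observed:.1f} hours")
print(f"implied rate after fix: at least {min_rate_after_fix:.0f} msg/s")
```

Under these figures the degraded rate works out to roughly 13 messages per second, which would have needed most of a day to clear the backlog, while the recovery implies an average of several hundred messages per second or more.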

Updates:

  • Time: Oct. 31, 2024, 6:39 p.m.
    Status: Postmortem
    Update:
      **Summary:** On Monday, October 14th, 2024, starting at 9:41 AM PDT, we received reports of Studio users receiving the error message “We’re sorry, but something went wrong” while attempting to log into Studio via Single Sign-On (SSO). Following a correlation of customer reports and initial troubleshooting, a platform service disruption incident was declared at 11:08 AM PDT and published on our Status Page at 11:11 AM PDT.
      **Severity:** Sev2
      **Scope:** The scope of this service disruption was restricted to Studio users on the US platform attempting to log into Studio via SSO. Users who were already logged in before the incident, or who used other authentication methods to log into Studio, were unaffected.
      **Impact:** Users could not log into Studio via SSO for the duration of this incident (1hr 37mins).
      **Root Cause:** The root cause of this incident was attributed to an unexpected hardware failure on the AWS Redis cluster, which triggered a failover event at 9:35 AM PDT. The failover event caused disruptions to the authentication flow in the Identity and Access Management (IAM) Redis service, which did not re-establish connections to the failover cluster, leading to the SSO login error.
      **Mitigation:** To mitigate this incident, the IAM service was restarted at 11:12 AM PDT to refresh the connections to the failover Redis cluster, which restored the Studio SSO login service.
      **Recurrence Prevention:** To prevent this incident from recurring, we will perform the following:
      * Introduce self-healing for IAM to automatically reconnect to Redis following failover events. This enhancement will be released in our upcoming Scheduled Software Release maintenance window on November 12th, 2024.
      * Perform a gap analysis of the already existing IAM monitoring and alerting dashboard.
  • Time: Oct. 22, 2024, 2:14 p.m.
    Status: Resolved
    Update: SSO log-in into Studio has remained available and fully functional throughout the monitoring phase of this incident. This incident is now resolved.
  • Time: Oct. 14, 2024, 6:20 p.m.
    Status: Monitoring
    Update: The reported issue has been mitigated, and Studio is now available via SSO. We will place the impacted services under monitoring for now.
  • Time: Oct. 14, 2024, 6:13 p.m.
    Status: Identified
    Update: We have identified a potential cause of this issue, and are working to resolve it. Another update will be provided within 1 hour.
  • Time: Oct. 14, 2024, 6:11 p.m.
    Status: Investigating
    Update: We are investigating reports of users being unable to log into Studio via Single Sign-On (SSO). We will provide you with an update within 1 hour.
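
The postmortem above describes a service that kept using connections to a node that had failed over, and names automatic reconnection as the planned fix. A minimal sketch of that general pattern follows. It does not use a real Redis client and is not Firstup's code: `FakeCacheClient`, `SelfHealingSession`, `ConnectionLost`, and the node names are hypothetical stand-ins.

```python
class ConnectionLost(Exception):
    """Raised by the stand-in client when its node is no longer serving."""


class FakeCacheClient:
    """Hypothetical stand-in for a cache client bound to one node."""

    def __init__(self, node: str, live_nodes: set) -> None:
        self.node = node
        self._live_nodes = live_nodes

    def get(self, key: str) -> str:
        if self.node not in self._live_nodes:
            raise ConnectionLost(self.node)
        return f"{key}@{self.node}"


class SelfHealingSession:
    """Rebuilds the client once when a call fails, then retries the call."""

    def __init__(self, connect) -> None:
        self._connect = connect  # factory returning a client for the current primary
        self._client = connect()

    def get(self, key: str) -> str:
        try:
            return self._client.get(key)
        except ConnectionLost:
            # Reconnect to whatever is primary now instead of waiting for a restart.
            self._client = self._connect()
            return self._client.get(key)


live_nodes = {"primary"}
current_primary = ["primary"]


def connect() -> FakeCacheClient:
    return FakeCacheClient(current_primary[0], live_nodes)


session = SelfHealingSession(connect)
before = session.get("sso-token")

# Simulated failover: the old primary disappears and a replica takes over.
live_nodes.discard("primary")
live_nodes.add("replica")
current_primary[0] = "replica"

after = session.get("sso-token")
print(before, after)
```

Without the `except` branch, the second `get` would keep failing until the process was restarted, which is the behavior the restart at 11:12 AM PDT worked around.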

Updates:

  • Time: Oct. 9, 2024, 2:47 p.m.
    Status: Postmortem
    Update:
      ## Summary:
      On September 30th, 2024, beginning at approximately 1:24 PM PDT (20:24 UTC), we started receiving reports of Shortcuts intermittently being unavailable and the Assistant returning an error in the Employee Experience. A platform incident was declared at 2:36 PM PDT (21:36 UTC) after initial investigations revealed the issue to be platform-wide.
      ## Severity:
      Sev2
      ## Scope:
      Any user on the US platform accessing the Web or Mobile Experiences intermittently experienced missing Shortcuts and/or received an error message while accessing the Assistant. A refresh of the Employee Experience page occasionally restored these endpoints. All other services in the Employee Experience remained available and functional.
      ## Impact:
      Shortcuts and the Assistant endpoints in the Employee Experience were intermittently unavailable during the incident.
      ## Root Cause:
      The root cause was determined to be due to an uncharacteristically high number of new user integrations introduced within a short period of time that exacerbated a newly uncovered non-optimized content caching behavior. This caused downstream latency and increased error rates served by the web service responsible for rendering shortcuts and the assistant notification page.
      ## Mitigation:
      The immediate impact was mitigated by restarting the Employee Experience integrations API, and services were restored by 2:42 PM PDT (21:42 UTC). While investigations into the root cause continued, the incident recurred the following day – October 1st, 2024, at 12:54 PM PDT (19:54 UTC). The Employee Experience integrations API and the dependent Employee Experience user-integrations request processing service (Pythia) were restarted, restoring Shortcuts and the Assistant endpoints by 1:46 PM PDT (20:46 UTC). Cache resources for Pythia were increased to mitigate the observed latency.
      ## Recurrence Prevention:
      To prevent this incident from recurring, our engineering incident response team:
      * Has developed a fix to optimize how user-integrations requests use the cache to reduce memory consumption and eliminate latency.
      * This fix will be released during our scheduled Software Release maintenance window on October 15th, 2024.
      * Will be adding a monitoring and alerting dashboard for the Employee Experience user-integrations requests processing service (Pythia).
  • Time: Oct. 9, 2024, 2:47 p.m.
    Status: Resolved
    Update: Employee Experience Shortcuts and the Assistant have remained available and fully functional throughout the monitoring phase of this incident. This incident is now resolved.
  • Time: Oct. 1, 2024, 9:19 p.m.
    Status: Monitoring
    Update: We have restarted the offending backend service to restore the affected functionalities. Shortcuts and the Assistant are now available. We will place these services back to monitoring for now.
  • Time: Oct. 1, 2024, 8:28 p.m.
    Status: Investigating
    Update: We are currently investigating a recurrence of this issue. We will provide you with an update in 1 hour.
  • Time: Sept. 30, 2024, 9:55 p.m.
    Status: Monitoring
    Update: We have identified and restarted the offending backend services to restore the affected services. Shortcuts and the Assistant are now available. We will place these services under monitoring for now.
  • Time: Sept. 30, 2024, 9:36 p.m.
    Status: Investigating
    Update: We are currently investigating reports of shortcuts in the Employee Experience intermittently being unavailable, as well as an error message being returned while trying to access the assistant. We will provide you with an update in 1 hour.
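
The postmortem above gives each time in both PDT and UTC, while the timeline entries carry a single unlabeled time. The first entry (9:36 p.m.) matches the 21:36 UTC declaration time, which suggests, though the page does not say so, that the timeline is in UTC. The sketch below checks the stated PDT/UTC pairs with a fixed UTC-7 offset, an assumption that holds for these dates because US daylight saving time was still in effect.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Pacific Daylight Time (assumed valid for Sept. 30, 2024).
PDT = timezone(timedelta(hours=-7), "PDT")


def pdt_to_utc(year: int, month: int, day: int, hour: int, minute: int) -> datetime:
    """Interpret a wall-clock time as PDT and convert it to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=PDT)
    return local.astimezone(timezone.utc)


reports_began = pdt_to_utc(2024, 9, 30, 13, 24)      # 1:24 PM PDT
incident_declared = pdt_to_utc(2024, 9, 30, 14, 36)  # 2:36 PM PDT
time_to_declare = incident_declared - reports_began

print(reports_began.strftime("%H:%M"), incident_declared.strftime("%H:%M"))
print(time_to_declare)
```

The converted values agree with the UTC times quoted in the postmortem, and the gap between first reports and the incident declaration comes to 72 minutes.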

Updates:

  • Time: Sept. 18, 2024, 12:43 a.m.
    Status: Postmortem
    Update:
      **Summary:** On September 16th, 2024, starting at around 11:00 AM PDT, we started receiving customer reports stating that the Web and Mobile Experiences endpoints were unavailable. Following a correlation of these reports and system monitors, a platform incident was declared at 11:14 AM PDT.
      **Severity:** Sev1
      **Scope:** Any user on the US platform attempting to access the Web and Mobile Experiences intermittently received an error message, and the Employee Experience failed to load.
      **Impact:** The core Web and Mobile Experiences platform endpoints were intermittently unavailable for the duration of the incident (1hr 38mins).
      **Root Cause:** The root cause was determined to be an exhaustion of the available database connections due to a sudden burst of user engagement activity that correlated to a small number of high-visibility campaigns. At 10:50 AM PDT, a dependent back-end service entered into a crash loop back-off state due to the database connection requests being refused and returned the error message to end users.
      **Mitigation:** The immediate problem was mitigated by fully redeploying the Employee Experience microservice after initial attempts at more surgical, standardized mitigation maneuvers proved ineffective. Earlier maneuvers focused on reducing database load by temporarily disabling platform features and functionality that make heavy use of database transactions, which reduced error rates overall but did not eliminate Customer impact. Web and Mobile Experience availability was restored by 12:28 PM PDT.
      **Recurrence Prevention:** To prevent this incident from recurring, our engineering incident response team has:
      * Increased the available database connections by 40% to account for any unforeseen spikes in platform traffic.
      * Added circuit breakers that would intercept abnormal increases in platform traffic, thereby maintaining platform endpoints availability.
      * Added an additional incident mitigation maneuver to disable campaign reactions such that a full-service redeploy would not be required to restore platform availability.
  • Time: Sept. 18, 2024, 12:43 a.m.
    Status: Resolved
    Update: All affected endpoints have remained stable and available. This incident is now resolved.
  • Time: Sept. 17, 2024, 6:09 p.m.
    Status: Monitoring
    Update: We are continuing to monitor for any further issues.
  • Time: Sept. 16, 2024, 9:38 p.m.
    Status: Monitoring
    Update: The unplanned performance enhancement maintenance to the Firstup cloud infrastructure is now completed. All services are now available and fully functional. Please notify our Customer Support team if you experience any issues with Firstup services following this notice.
  • Time: Sept. 16, 2024, 8:56 p.m.
    Status: Monitoring
    Update: Today at 2:30 PM PT / 9:30 PM UTC we will be performing unplanned maintenance to shore up Firstup cloud infrastructure as a preventative measure based on technical troubleshooting done since the incident was initially mitigated earlier today. This change may result in a service disruption lasting from a few seconds to several minutes as the changes take effect. We expect to be in a much more stable state as root cause troubleshooting continues following the completion of the maintenance.
  • Time: Sept. 16, 2024, 7:41 p.m.
    Status: Monitoring
    Update: Web and Mobile Experiences have now been restored. We will be placing the offending services under monitoring for now.
  • Time: Sept. 16, 2024, 7:27 p.m.
    Status: Identified
    Update: We are continuing to work on a fix for this issue.
  • Time: Sept. 16, 2024, 7:26 p.m.
    Status: Identified
    Update: We continue to work on relieving the pressure on database resources, and the current user experience is intermittent and partial access to the Employee Experience (on both desktop and mobile EE). Another update in 30 minutes.
  • Time: Sept. 16, 2024, 6:55 p.m.
    Status: Identified
    Update: We are working on relieving pressure on database resources to restore services. Another update in 30 minutes.
  • Time: Sept. 16, 2024, 6:31 p.m.
    Status: Identified
    Update: We have identified a potential cause of this service outage, and are working to restore services. Another update in 30 minutes.
  • Time: Sept. 16, 2024, 6:14 p.m.
    Status: Investigating
    Update: We are currently investigating reports of the US Web Experience being unavailable. Studio remains available.
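
The postmortem above lists circuit breakers among the recurrence-prevention measures. The page does not describe how Firstup's breakers work (they are said to react to abnormal traffic increases), so the sketch below shows the classic failure-count variant instead: after a set number of consecutive failures it stops calling the dependency for a cool-down period. The class, thresholds, and the injected `clock` are hypothetical choices made so the example runs deterministically.

```python
class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; rejects calls while open."""

    def __init__(self, max_failures: int, reset_after: float, clock) -> None:
        self._max_failures = max_failures
        self._reset_after = reset_after
        self._clock = clock  # callable returning the current time in seconds
        self._failures = 0
        self._opened_at = None

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                return "rejected"  # fail fast without touching the dependency
            # Cool-down elapsed: close the breaker and count failures afresh.
            self._opened_at = None
            self._failures = 0
        try:
            result = fn()
        except ConnectionError:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()
            return "failed"
        self._failures = 0
        return result


now = [0.0]  # fake clock so the example does not need to sleep
breaker = CircuitBreaker(max_failures=2, reset_after=30.0, clock=lambda: now[0])


def exhausted_db():
    raise ConnectionError("connection refused")


def healthy_db():
    return "ok"


outcomes = [breaker.call(exhausted_db) for _ in range(3)]
now[0] = 31.0  # cool-down has passed and the database has recovered
outcomes.append(breaker.call(healthy_db))
print(outcomes)
```

The third call is rejected without reaching the database, which is the property that keeps a struggling dependency from being hit by every incoming request while it recovers.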

Check the status of similar companies and alternatives to Firstup

Akamai

Systems Active

Nutanix

Systems Active

MongoDB

Issues Detected

LogicMonitor

Systems Active

Acquia

Systems Active

Granicus System

Systems Active

CareCloud

Issues Detected

Redis

Systems Active

integrator.io

Systems Active

NinjaOne Trust

Systems Active

Pantheon Operations

Systems Active

Securiti US

Systems Active

Frequently Asked Questions - Firstup

Is there a Firstup outage?
The current status of Firstup is: Systems Active
Where can I find the official status page of Firstup?
The official status page for Firstup is here
How can I get notified if Firstup is down or experiencing an outage?
To get notified of any status changes to Firstup, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Firstup every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here