Get notified about any outages, downtime or incidents for Firstup and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Firstup.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Firstup:
Component | Status |
---|---|
3rd-Party Dependencies | Active |
Identity Access Management | Active |
Image Transformation API | Active |
SendGrid API v3 | Active |
Zoom Zoom Virtual Agent | Active |
Ecosystem | Active |
Connect | Active |
Integrations | Active |
Partner API | Active |
User Sync | Active |
Platforms | Active |
EU Firstup Platform | Active |
US Firstup Platform | Active |
Products | Active |
Classic Studio | Active |
Creator Studio | Active |
Insights | Active |
Microapps | Active |
Mobile Experience | Active |
Web Experience | Active |
View the latest incidents for Firstup and check for official updates:
Description:

## **Summary:**
On Tuesday, May 14th, 2024, from approximately 4 PM PT until Wednesday, May 15th, 2024, at 10:06 AM PT, outbound email that had previously been sent from [sendgrid.net](http://sendgrid.net), a backstop sender domain, was instead sent via another, non-allowlisted domain. This resulted in delivery failures or delays for some Customers whose inbound email rules explicitly match the [sendgrid.net](http://sendgrid.net) sender domain.

## **Impact:**
Any campaigns set to publish or re-engage users during the affected time window would have appeared to originate from a sender domain other than [sendgrid.net](http://sendgrid.net) for any Customers without their own authenticated sender domain records (a setup that goes against Firstup and industry best practices). Depending on individual email rules, those emails could have been blocked, quarantined, marked as spam, or subjected to other policy-specific actions.

## **Root Cause:**
The root cause was a default sender domain being set while a new authenticated email domain was being configured for a newly onboarded Firstup customer. Previously, no default domain was configured, which resulted in [sendgrid.net](http://sendgrid.net) being used as a backstop sender domain, despite not being included in the allowlisting article at [https://support.firstup.io/hc/en-us/articles/4417455533975-Allowlist-Emails-from-Firstup](https://support.firstup.io/hc/en-us/articles/4417455533975-Allowlist-Emails-from-Firstup).

Both user error and a UI design limitation in the 3rd-party software used to add new authenticated domains contributed to the errantly created default sender domain. Specifically, the configuration page does not have any explicit save button or confirmation dialogue protecting the checkbox that sets the new record as the default sender domain platform-wide. The checkbox was inadvertently selected while creating a screenshot of the configuration settings to be shared with the newly onboarded customer.

## **Mitigation:**
To mitigate, the default sender domain was restored as soon as the root cause became clear.

## **Recurrence Prevention:**
The following actions have been committed to in order to fully resolve the incident and eliminate reliance on the mitigation measure currently in place:

* A scheduled maintenance window has been posted for June 15th, 2024, outlining a planned update to a new sender domain that should already be allowlisted, [email.socialchorus.net](http://email.socialchorus.net). Specific details of the maintenance can be found at [https://status.firstup.io/incidents/jfv1s06qyv3v](https://status.firstup.io/incidents/jfv1s06qyv3v).
* Firstup will again review any Customers who do not have authenticated sender domains configured, in order to set up DMARC/SPF records and Customer-specific sender domains so that the backstop or default is never needed to send program-specific email.
* A feature request has been filed with SendGrid, the 3rd-party email provider, to add protections around the checkbox in the user interface so that an unintentional user action cannot change the default sender domain.
Status: Postmortem
Impact: None | Started At: May 14, 2024, 11 p.m.
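To illustrate why a change of sender domain can break delivery, here is a minimal sketch of an inbound rule that matches on an exact sender domain. The function names, the rule logic, and the `other-domain.example` address are hypothetical; the postmortem does not name the replacement domain, and real mail gateways may match on the envelope sender or other headers instead.

```python
from email.utils import parseaddr


def sender_domain(from_header: str) -> str:
    """Return the lowercased domain of a From header, or '' if there is none."""
    _, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower() if "@" in address else ""


def inbound_action(from_header: str, allowlist: set[str]) -> str:
    """Hypothetical inbound rule: deliver only exact sender-domain matches."""
    return "deliver" if sender_domain(from_header) in allowlist else "quarantine"


# Hypothetical customer rule that allowlists only the backstop domain.
ALLOWLIST = {"sendgrid.net"}

before = inbound_action("Firstup <bounce@sendgrid.net>", ALLOWLIST)
during = inbound_action("Firstup <bounce@other-domain.example>", ALLOWLIST)
print(before, during)
```

Under these assumptions, mail sent during the incident window no longer matches the rule, which is consistent with the blocked or quarantined messages described above.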
Description:

**Summary:**
On Tuesday, May 14th, 2024, starting at around 11:25 AM EDT (15:25 UTC), we received reports that some users saw errors while accessing the Studio platform or the Web Experience. Reported error messages included:

* We’re sorry, but something went wrong.
* 502 Bad Gateway.
* There was an error processing your request. Please try again.

**Scope:**
This incident primarily affected users who attempted to access Studio services and, to a lesser degree, users who tried to access the Web Experience between 11:25 AM EDT and 12:09 PM EDT.

**Root Cause:**
An underlying service (Athena), which is used as part of our machine learning and AI infrastructure, experienced issues connecting to one of our core database servers due to high network latency. The service's configured timeouts were too long for its access pattern and the data it uses, causing it to block incoming connections for an inordinate period. Subsequently, services that depend on Athena also timed out, resulting in the Studio service degradation and the error messages observed by impacted users.

**Mitigation:**
The immediate impact was mitigated by performing a rolling restart of the affected services, and all Studio functions were restored by 12:09 PM EDT (16:09 UTC).

**Recurrence Prevention:**
To prevent a recurrence of this incident, the Time-To-Live (TTL) for connection requests from Athena to our core database will be reduced from the default 60 seconds to 5 seconds. This will greatly reduce the backup of requests from other services to Athena.
Status: Postmortem
Impact: None | Started At: May 14, 2024, 3:56 p.m.
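The effect of shortening the timeout from 60 seconds to 5 seconds can be sketched with a simplified model: if every call waits out the full timeout, each worker frees up only once per timeout period. The worker count and arrival rate below are hypothetical values chosen for illustration, not figures from the postmortem, and the model ignores retries and queue limits.

```python
def backlog_growth_per_second(arrival_rate: float, workers: int, timeout_s: float) -> float:
    """Requests per second that pile up when every call waits out the full timeout."""
    drain_rate = workers / timeout_s  # each worker is released once per timeout
    return max(0.0, arrival_rate - drain_rate)


# Hypothetical load: 50 workers and 5 incoming requests per second.
WORKERS, ARRIVAL_RATE = 50, 5.0

for timeout_s in (60.0, 5.0):
    growth = backlog_growth_per_second(ARRIVAL_RATE, WORKERS, timeout_s)
    print(f"timeout={timeout_s:g}s: backlog grows by {growth:.2f} requests/s")
```

In this model the 60-second timeout lets the backlog grow continuously while the database is unreachable, whereas the 5-second timeout releases workers fast enough to keep up with the assumed load.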
Description:

**Summary:**
On May 2nd, 2024, at 10:39 AM EDT, a system monitor alerted us to a potential issue where the disk space on a service used to pass messages between backend workers was approaching critical “free disk space limits”. As we started investigating the alert, customer reports of issues with various Studio functions started coming in, including but not limited to the following:

* Unable to send test campaigns
* Processing error messages
* Test campaign emails are not being delivered
* White screens
* Studio loading issues

A platform incident was declared at 12:41 PM EDT, and the incident response team was engaged to diagnose the reported issues.

**Impact:**
The incident affected all Studio users who attempted to connect to Studio or initiate new Studio activities.

**Root Cause:**
The incident response team identified that one of the queues in the impacted service was backed up and was using too much memory, which led to an out-of-memory condition. As a result, new Studio service requests could not establish connections to this service. This inability to connect presented itself as the customer-reported issues listed above.

**Mitigation:**
To restore Studio services, the backed-up queue was purged at around 1:00 PM EDT to free up memory, which increased the available disk space for the service. This allowed other queues to continue processing and allowed new Studio service requests to connect to the service and process successfully. Affected transactions that were stuck during the purge, such as scheduled campaigns that did not publish, were manually published. No customer data was lost from purging the queue.

**Recurrence Prevention:**
To prevent a recurrence of this incident, we have since deployed a hotfix that checks whether the queue size is over a certain limit before queueing more messages, to prevent this exact out-of-memory failure scenario.
Status: Postmortem
Impact: None | Started At: May 2, 2024, 4:50 p.m.
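The size check described in the recurrence prevention step can be sketched as a bounded queue that refuses new messages once a limit is reached. The `BoundedQueue` class, its method names, and the limit of 3 are hypothetical; the postmortem does not describe the actual hotfix, the messaging service, or the threshold used.

```python
from collections import deque


class BoundedQueue:
    """Hypothetical in-process queue that rejects messages above a size limit."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._items: deque[str] = deque()

    def try_enqueue(self, message: str) -> bool:
        # Check the size first so a stalled consumer cannot grow the queue without bound.
        if len(self._items) >= self.max_size:
            return False
        self._items.append(message)
        return True

    def __len__(self) -> int:
        return len(self._items)


queue = BoundedQueue(max_size=3)
results = [queue.try_enqueue(f"msg-{i}") for i in range(5)]
print(results, len(queue))
```

A producer that receives `False` would need its own policy, such as retrying later or dropping the message; which policy fits depends on the workload and is left open here.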