Outage and incident data over the last 30 days for SimpliGov.
Outlogger tracks the status of these components for SimpliGov:
Component | Status
---|---
SendGrid API v3 | Active
**Preproduction** | Active
&nbsp;&nbsp;API | Active
&nbsp;&nbsp;Authorization | Active
&nbsp;&nbsp;Email Interaction | Active
&nbsp;&nbsp;eSignature | Active
&nbsp;&nbsp;Export | Active
&nbsp;&nbsp;File Conversion | Active
&nbsp;&nbsp;Metaquery | Active
&nbsp;&nbsp;Portal | Active
&nbsp;&nbsp;SimpliSign | Active
&nbsp;&nbsp;Submission | Active
**Production** | Active
&nbsp;&nbsp;API | Active
&nbsp;&nbsp;Authorization | Active
&nbsp;&nbsp;Email Interaction | Active
&nbsp;&nbsp;eSignature | Active
&nbsp;&nbsp;Export | Active
&nbsp;&nbsp;File Conversion | Active
&nbsp;&nbsp;Metaquery | Active
&nbsp;&nbsp;Portal | Active
&nbsp;&nbsp;SimpliSign | Active
&nbsp;&nbsp;Submission | Active
**Staging** | Active
&nbsp;&nbsp;API | Active
&nbsp;&nbsp;Authorization | Active
&nbsp;&nbsp;Email Interaction | Active
&nbsp;&nbsp;eSignature | Active
&nbsp;&nbsp;Export | Active
&nbsp;&nbsp;File Conversion | Active
&nbsp;&nbsp;Metaquery | Active
&nbsp;&nbsp;Portal | Active
&nbsp;&nbsp;SimpliSign | Active
&nbsp;&nbsp;Submission | Active
**Training** | Active
&nbsp;&nbsp;API | Active
&nbsp;&nbsp;Authorization | Active
&nbsp;&nbsp;Email Interaction | Active
&nbsp;&nbsp;eSignature | Active
&nbsp;&nbsp;Export | Active
&nbsp;&nbsp;File Conversion | Active
&nbsp;&nbsp;Metaquery | Active
&nbsp;&nbsp;Portal | Active
&nbsp;&nbsp;SimpliSign | Active
&nbsp;&nbsp;Submission | Active
View the latest incidents for SimpliGov and check for official updates:
Description: **Preliminary Root Cause:** SimpliGov uses Azure Service Bus as its message queuing system for operations such as submissions and dashboard updates. Azure Government support informed SimpliGov that they performed updates to Azure Service Bus infrastructure, and that between 9:00 and 10:00 PST on 21-10-2021 SimpliGov was identified as a customer using Service Bus in USGov Arizona that experienced increased error rates. SimpliGov received "Connection reset by peer" error messages from the Azure Service Bus service, which resulted in partial connection loss to the service. Azure support states that the primary cause of the issue on their side was that a subset of backend instances experienced unexpectedly high utilization due to a platform upgrade in US Gov Arizona. The Azure Government product engineering group allocated more bandwidth to the host nodes, which brought the instances back to a healthy state. From the SimpliGov side, this mitigating action allowed our services to connect to Azure Service Bus as expected. During this period, most intake submissions should have worked as expected, but some users would have experienced slower than expected dashboard synchronization times and potential duplicates, depending on their workflow configuration, as submissions were retried when initial connections to Service Bus failed.

**Mitigation:** In the immediate term, SimpliGov support employed manual dashboard synchronization processes to ensure that any incoming records were reflected as soon as possible. Azure support resolved the Service Bus issue on their end, which allowed SimpliGov processing and dashboard synchronization services to function as expected. Going forward, SimpliGov has implemented several new features within our upcoming Dec 2021/Jan 2022 Production release allowing for better failover and fault tolerance in such scenarios.
SimpliGov will be hosted on Azure Kubernetes Service instead of Azure Service Fabric, a split Service Bus queue architecture will be enabled, and additional retry policies have been added to handle scenarios where Azure services return transient errors. These three items, in addition to smaller individual fixes and improvements, should reduce the propensity for such incidents going forward.

**Next Steps:** We apologize for the impact to affected customers. SimpliGov will continue to monitor the situation with Azure Service Bus and configure specific alerts relating to Service Bus and "Connection reset by peer" messages. We will be deploying additional updates and architectural changes as part of our upcoming production release scheduled for Jan 2022 to further improve fault tolerance. All customer records processed throughout the incident period should be consistent with their expected statuses. Customers do not need to take any reconciliatory actions in their production tenants unless directly notified to do so by the SimpliGov team.

---

_Additional RCA from Microsoft Azure Government support:_

**Incident Summary:** Between 10/21 17:07 UTC and 10/21 19:14 UTC, Event Hub, Service Bus, and Relay service customers may have intermittently experienced increased latency or timeouts on runtime operations.

**Root Cause:** On 10/21/21 at 17:07 UTC, an upgrade was performed on clusters servicing Service Bus, Event Hub, and Relay services. During the upgrade, as TCP connections were disconnected from each gateway machine, there was a noticeable increase in TCP connection requests as existing TCP connections were terminated, resulting in high CPU utilization by the LSASS process that handles TLS/SSL handshakes.
As the LSASS process hit the CPU threshold set by the service teams, processing of TLS/SSL handshakes was slowed, causing increased latency and delay in serving send and receive requests made by client applications. The issue was mitigated on 10/21/21 at 19:14 UTC after the team scaled out the cluster and added machines to allocate more resources to process incoming requests.

**Next Steps:** Microsoft apologizes for the impact to affected customers. We have scaled out our gateway machines to increase capacity and are taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):

* Enhancements to monitoring related to LSASS CPU utilization
* Performance improvements to the throttle/delay response related to invalid tokens/SAS
* Adjustment of diagnostic steps to include tracing namespace/audience when clients present invalid tokens/SAS
Status: Postmortem
Impact: Minor | Started At: Oct. 21, 2021, 7:55 p.m.
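The "additional retry policies" for transient Service Bus errors mentioned in the postmortem above can be sketched as follows. This is a minimal, hypothetical example of retrying an operation that raises "Connection reset by peer" errors with exponential backoff; `send_with_retry` and its parameters are illustrative, not SimpliGov's actual implementation:

```python
import time

def send_with_retry(send, max_attempts=4, base_delay=0.1):
    """Call send(); on a transient connection reset, back off and retry.

    A transient "Connection reset by peer" surfaces in Python as
    ConnectionResetError. Delays grow exponentially: base, 2x, 4x, ...
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except ConnectionResetError:
            if attempt == max_attempts:
                raise  # error persisted; surface it to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))  # back off, retry
```

A policy like this lets intake submissions eventually succeed during a brief broker disruption, at the cost of possible duplicates if the first attempt was actually accepted — which matches the duplicate behavior the postmortem describes.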
Description: **Preliminary Root Cause:** As a result of an upgrade activity initiated by the Azure Government team, Service Fabric clusters used to provide the production services failed to retrieve node statuses from the underlying virtual machine scale sets. The upgrade pushed by the Azure Government team was expected to apply the latest fabric updates to the cluster on each node sequentially, applying the update on a single seed node per fault domain within the Azure Service Fabric cluster. In cases of upgrade failure or node failure, Azure Service Fabric is expected to move services from affected nodes to active or "live" nodes. After applying the upgrade, a seed node required for the Service Fabric clusters failed to report its status to the Service Fabric management services, and the Service Fabric cluster could not service any requests for SimpliGov's Portal, Authentication, API, MetaQuery Sync, and Submission services from this node. SimpliGov customers identified the downtime event when they tried to access the SimpliGov portal via web browser and received a 502/504 status code page, indicating that the Azure Application Gateway used for load balancing requests to SimpliGov Service Fabric clusters was not receiving appropriate health status responses to allow it to service requests to customers.

**Mitigation:** Most services self-healed once the Service Fabric failover manager failed over services from the affected node to an active node. However, some dashboard synchronization services were affected after the healing event. In the immediate term, SimpliGov restarted the affected node containing dashboard synchronization services in the production virtual machine scale set, triggering move procedures for services originally hosted on the affected node. At 12:30 PM PST, the move activity completed and normal service was restored.
In addition to working to restore services as soon as possible, SimpliGov contacted the Azure Government support team to request assistance with the issue and a full root cause analysis on why the upgrades failed to apply correctly through each fault domain as expected. Information was also requested on how the upgrade caused all nodes in the underlying virtual machine scale sets to fail to report appropriate statuses to the Service Fabric management services.

**Next Steps:** We apologize for the impact to affected customers. Azure Government will provide additional details on why the upgrade process did not apply in the expected manner, why it caused the nodes to fail to report to the Service Fabric management service, and why the node failure did not trigger the expected move processes without manual interaction from SimpliGov. All customer records processed throughout the incident period should be consistent with their expected statuses. Customers do not need to take any reconciliatory actions in their production tenants unless directly notified to do so by the SimpliGov team.
Status: Postmortem
Impact: Minor | Started At: Oct. 20, 2021, 7:32 p.m.
Description: **Executive Summary:** Between 5:19 AM PST and 7:54 AM PST, SimpliGov preproduction was unavailable, causing users to receive pages stating "502" errors. As a result of this outage, users of affected customer websites were unable to submit new forms or work with existing forms in the preproduction environment. The cause of this partial outage was that periodic cluster updates failed to apply correctly and automated rollback processes on the cluster stalled. At 7:54 AM PST, normal service resumed for the preproduction environment. After working with our hosting provider (Azure Government), it was determined that the best course of action was to redeploy the cluster and the services running on it. As a result of redeploying the cluster, the IP address for preproduction changed from 52.244.80.86 to 52.244.79.177. Customers should replace the old IP address (52.244.80.86) with the new IP address (52.244.79.177) if they are whitelisting communication from SimpliGov preproduction to internal systems. We apologize for the impact to affected customers. A more detailed summary of events can be seen below.

**Detailed Summary:** Preliminary Root Cause: As a result of a long-running upgrade activity on our preproduction Service Fabric cluster, Service Fabric initiated automatic rollback procedures to revert the updates applied and move back to the last successfully applied version. The rollback activity ran for much longer than expected and was deemed "stalled". In this scenario, the Service Fabric cluster could not service any requests for SimpliGov's Portal, Authentication, API, MetaQuery Sync, and Submission services and remained in a stalled status.
SimpliGov users trying to access the preproduction environment via web browser received 502 status code pages, indicating that the Azure Application Gateway used for load balancing requests to SimpliGov Service Fabric clusters was not receiving appropriate health status responses to allow it to service requests to customers. Likewise, API calls to preproduction received a 502 status code.

Mitigation: After discussing available options with the Azure Government support team, SimpliGov redeployed the affected cluster and subsequently redeployed all affected preproduction services. In addition to working to restore services as soon as possible, the Azure Government support team is working to identify why the upgrade and rollback processes failed. As a result of this mitigation strategy, customers should update any IP whitelists they maintain for non-production environments by replacing preproduction's old IP address (52.244.80.86) with the new preproduction IP address (52.244.79.177).

Next Steps: We apologize for the impact to affected customers. Azure Government will provide additional details on why the upgrade and rollback processes did not work as expected, causing preproduction to become unavailable. Note that as this event occurred on preproduction with submission, API, and portal services unavailable, all customer records processed throughout the downtime event should be consistent with their expected statuses. Customers do not need to take any reconciliatory actions in their preproduction tenants.
Status: Postmortem
Impact: Major | Started At: Aug. 5, 2021, 12:19 p.m.
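For customers who track the preproduction allow-list in code or configuration, the IP change described above is a one-for-one replacement. A minimal sketch — the list and the helper name are illustrative; apply the same substitution in whatever firewall or proxy configuration you actually maintain:

```python
# IP addresses from the incident notice above.
OLD_IP = "52.244.80.86"   # retired preproduction address
NEW_IP = "52.244.79.177"  # current preproduction address

def update_allow_list(allow_list):
    """Return a copy of the allow-list with the old preproduction IP
    replaced by the new one; all other entries are left untouched."""
    return [NEW_IP if ip == OLD_IP else ip for ip in allow_list]
```

For example, `update_allow_list(["52.244.80.86", "10.0.0.1"])` yields `["52.244.79.177", "10.0.0.1"]`.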
Description: **Preliminary Root Cause:** As a result of an upgrade activity initiated by the Azure Government team, Service Fabric clusters used to provide the production services failed to retrieve node statuses from the underlying virtual machine scale sets. The upgrade pushed by the Azure Government team was expected to apply the latest fabric updates to the cluster on each node sequentially, applying the update on a single seed node per fault domain within the Azure Service Fabric cluster. In cases of upgrade failure or node failure, Azure Service Fabric is expected to move services from affected nodes to active or "live" nodes. After applying the upgrade, a seed node required for the Service Fabric clusters failed to report its status to the Service Fabric management services, and the Service Fabric cluster could not service any requests for SimpliGov's Portal, Authentication, API, MetaQuery Sync, and Submission services from this node. SimpliGov customers identified the downtime event when they tried to access the SimpliGov portal via web browser and received a 502 status code page, indicating that the Azure Application Gateway used for load balancing requests to SimpliGov Service Fabric clusters was not receiving appropriate health status responses to allow it to service requests to customers.

**Mitigation:** In the immediate term, SimpliGov restarted the affected node in the production virtual machine scale set, triggering move procedures for services originally hosted on the affected node. At 8:44 AM PST, the move activity completed and normal service was restored. In addition to working to restore services as soon as possible, SimpliGov contacted the Azure Government support team to request assistance with the issue and a full root cause analysis on why the upgrades failed to apply correctly through each fault domain as expected.
Information was also requested on how the upgrade caused all nodes in the underlying virtual machine scale sets to fail to report appropriate statuses to the Service Fabric management services. Azure Government support also confirmed the successful application of upgrades after the actions taken by SimpliGov, and the movement of services from the affected node to other live nodes was confirmed.

**Next Steps:** We apologize for the impact to affected customers. Azure Government will provide additional details on why the upgrade process did not apply in the expected manner, why it caused the nodes to fail to report to the Service Fabric management service, and why the node failure did not trigger the expected move processes without manual interaction from SimpliGov. Note that as this event occurred on production with submission, API, and portal services unavailable, all customer records processed throughout the downtime event should be consistent with their expected statuses. Customers do not need to take any reconciliatory actions in their production tenants.
Status: Postmortem
Impact: Major | Started At: Aug. 4, 2021, 3:30 p.m.
Description: The incident has been resolved.
Status: Resolved
Impact: Minor | Started At: Aug. 3, 2021, 6:57 a.m.