Last checked: 8 minutes ago
Get notified about any outages, downtime or incidents for SimpliGov and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for SimpliGov.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for SimpliGov:
Component | Status |
---|---|
SendGrid API v3 | Active |
Preproduction | Active |
Preproduction: API | Active |
Preproduction: Authorization | Active |
Preproduction: Email Interaction | Active |
Preproduction: eSignature | Active |
Preproduction: Export | Active |
Preproduction: File Conversion | Active |
Preproduction: Metaquery | Active |
Preproduction: Portal | Active |
Preproduction: SimpliSign | Active |
Preproduction: Submission | Active |
Production | Active |
Production: API | Active |
Production: Authorization | Active |
Production: Email Interaction | Active |
Production: eSignature | Active |
Production: Export | Active |
Production: File Conversion | Active |
Production: Metaquery | Active |
Production: Portal | Active |
Production: SimpliSign | Active |
Production: Submission | Active |
Staging | Active |
Staging: API | Active |
Staging: Authorization | Active |
Staging: Email Interaction | Active |
Staging: eSignature | Active |
Staging: Export | Active |
Staging: File Conversion | Active |
Staging: Metaquery | Active |
Staging: Portal | Active |
Staging: SimpliSign | Active |
Staging: Submission | Active |
Training | Active |
Training: API | Active |
Training: Authorization | Active |
Training: Email Interaction | Active |
Training: eSignature | Active |
Training: Export | Active |
Training: File Conversion | Active |
Training: Metaquery | Active |
Training: Portal | Active |
Training: SimpliSign | Active |
Training: Submission | Active |
View the latest incidents for SimpliGov and check for official updates:
Description: **Preliminary Root Cause:** SimpliGov’s dashboard and "Queryable Workflow Instance" API endpoints use SimpliGov's MetaQuery ETL process and database as their underlying data source. All MetaQuery databases are geo-replicated to a secondary region, US Gov Texas in Azure Government, using Azure's built-in geo-replication functionality for Azure SQL. Between 2:30 AM PST and 1:58 PM PST on 06/04, Azure reported that connections to existing databases in the secondary region were experiencing errors or timeouts and that existing connections may have been terminated. Issues replicating from our primary to secondary databases caused slower than expected transfer of data to our MetaQuery databases. Azure Government noted that a recent change to their DNS registry contained a configuration error that caused the cluster hosting the nodes to become unhealthy, leading to control plane failures and connection issues. The Azure Government team mitigated the issue by rolling back the configuration change. The resulting backlog of dashboard updates caused customers to experience a delay in records being updated with appropriate statuses etc. While the backlog was processing, customers continued to experience lags on dashboard updates. **Mitigation:** In the immediate term, SimpliGov manually synchronized any records with long synchronization times and monitored sync processes during the time period noted. After the Azure Government team gave notice that the issue had been mitigated, the backlog of records to be synced to the MetaQuery database was processed, and APIs should have been returning appropriate results. If you were not directly contacted about dashboard delays or invalid responses from the "Queryable Workflow Instance" API endpoint, your tenant was not impacted by this incident. **Next Steps:** We apologize for the impact to affected customers. 
SimpliGov will continue to monitor dashboard synchronization performance for the affected customer. All customer records being processed throughout the incident period should be consistent with their expected statuses. Customers do not need to take any reconciliatory actions in their production tenants unless they are directly notified to do so by the SimpliGov team. Customers with a critical reliance or dependency on the SimpliGov "Queryable Workflow Instance" API endpoint for retrieving workflow tokens etc. may want to consider using the @WorkflowTokenId formula to pull the latest TokenID for an open workflow record. This method also removes any potential delays resulting from the asynchronous transfer of data from our transactional databases to customer MetaQuery databases.
Status: Postmortem
Impact: Minor | Started At: June 4, 2021, 9:30 a.m.
Description: **Preliminary Root Cause:** SimpliGov’s dashboard uses our proprietary MetaQuery ETL process and database as its underlying data source. In this case, a single record being manually synced caused a backlog in the synchronization of other records, and users experienced a delay in information coming through on their dashboard. The backlog of dashboard updates caused customers to experience a delay in records being updated with appropriate statuses etc. While the backlog was processing, customers continued to experience lags on dashboard updates. **Mitigation:** In the immediate term, SimpliGov synchronized the backlog of records to be synced to the dashboard. Note that this only affected a single customer; if you were not contacted about dashboard delays, your tenant was not impacted by this incident. **Next Steps:** We apologize for the impact to affected customers. SimpliGov will continue to monitor dashboard synchronization performance for the affected customer. All customer records being processed throughout the incident period should be consistent with their expected statuses. Customers do not need to take any reconciliatory actions in their production tenants unless they are directly notified to do so by the SimpliGov team.
Status: Postmortem
Impact: Minor | Started At: May 18, 2021, 9 p.m.
Description: **Preliminary Root Cause:** SimpliGov uses Azure Government Service Fabric for hosting submission services within the platform. Service Fabric uses virtual machine scale sets, with nodes being added to and removed from the scale set when auto-scaling operations run. During a recent auto-scaling process, nodes were deprovisioned from the virtual machine scale set, which caused submission services to move between nodes, and some of those services did not initialize correctly. Because of this incorrect initialization, the SimpliGov-developed submission processing services did not respond in a timely manner, and customers experienced 504 status code responses, indicating that submission processing was not occurring as quickly as expected. SimpliGov restarted the affected submission services, and the backlog of submissions built up during the incident period was processed as expected. While the backlog of submits, auto-submits etc. was processing, customers may have experienced some lags on dashboard updates etc. while dashboard update services scaled to handle the immediate increase in dashboard update requests. **Mitigation:** In the immediate term, SimpliGov restarted all backend processing services to initialize the affected submission services. The auto-scaling rules have also been reconfigured to use a higher threshold for scale-down operations. **Next Steps:** We apologize for the impact to affected customers. SimpliGov will continue to monitor the submission services. All customer records being processed throughout the downtime event should be consistent with their expected statuses. Customers do not need to take any reconciliatory actions in their production tenants unless they are directly notified to do so by the SimpliGov team.
Status: Postmortem
Impact: Minor | Started At: May 18, 2021, 3:08 a.m.
Description: **Preliminary Root Cause:** SimpliGov uses Azure Government Service Bus for message queuing activities within the application. Message queuing is how submissions, auto-submits, reassignments and a host of other activities get their information into a queue for processing as soon as processing services are available. Upon attempting to retrieve messages from Service Bus message queues, SimpliGov received several bad responses, and this created some backlogged items within submission tables in SimpliGov databases. The SimpliGov-developed submit processing services used to process these messages from the queues crashed, and this caused customers to experience 504 status code responses, indicating that submission processing was not happening in a timely manner. SimpliGov identified the submit records that were causing the submit processing services to crash: the services were trying to enter duplicate submission records in the database, causing primary key constraint errors to be raised. SimpliGov fixed the statuses on the problematic records, and the submit services self-healed at this point, allowing the backlog of submissions from the affected period to process as expected. While the backlog of submits, auto-submits etc. was processing, customers may have experienced some lags on dashboard updates etc. while dashboard update services scaled to handle the immediate increase in dashboard update requests. **Mitigation:** In the immediate term, SimpliGov restarted all backend processing services to remove any potentially hung SQL sessions and identified the problematic hung submissions in the database. After the statuses of these records were updated, the submit services which were previously crashing self-healed and began to process submits, auto-submits etc. 
As noted, SimpliGov has included a fix for this scenario in the upcoming May 8th 2021 release (which has already been deployed to staging and preproduction environments as of March 28th 2021). This update improves SimpliGov's submission service stability by avoiding the situation in which an exception on a single tenant can affect other tenants' processing. To prevent an endless exception loop in the scenario noted, auto-submits with any exception will be immediately marked as “Failed” in the Auto-Submit dashboard, from which tenant administrators can retry them. This will prevent any blocking submissions, and individually failing auto-submits can be fixed as needed. **Next Steps:** We apologize for the impact to affected customers. SimpliGov will continue to monitor the fix applied and submission services and will be deploying an additional mitigating update as part of the May 8th 2021 production release. All customer records being processed throughout the downtime event should be consistent with their expected statuses. Customers do not need to take any reconciliatory actions in their production tenants unless they are directly notified to do so by the SimpliGov team.
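The fix described above (mark a failing auto-submit as "Failed" instead of letting it block the queue) can be illustrated with a minimal sketch. This is not SimpliGov's implementation: the `AutoSubmit` type, its field names, the `process_queue` and `retry_failed` functions, and the in-memory duplicate check standing in for a primary key constraint are all hypothetical, chosen only to show the isolation idea.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AutoSubmit:
    """Hypothetical auto-submit work item; field names are illustrative."""
    tenant: str
    record_id: str
    status: str = "Pending"
    error: str = ""


def process_queue(queue, handler):
    """Handle each item exactly once; an exception fails only that item."""
    processed = []
    while queue:
        item = queue.popleft()
        try:
            handler(item)
            item.status = "Completed"
        except Exception as exc:
            # Do not re-queue: mark as Failed so other tenants keep flowing.
            item.status = "Failed"
            item.error = str(exc)
        processed.append(item)
    return processed


def retry_failed(items, handler):
    """Re-run only the Failed items, as an administrator-triggered retry might."""
    failed = deque(i for i in items if i.status == "Failed")
    for item in failed:
        item.status, item.error = "Pending", ""
    return process_queue(failed, handler)


def make_duplicate_rejecting_handler():
    """Stand-in for a database insert that rejects duplicate keys (assumption)."""
    seen = set()

    def handler(item):
        key = (item.tenant, item.record_id)
        if key in seen:
            raise ValueError(f"duplicate submission {key}")
        seen.add(key)

    return handler


if __name__ == "__main__":
    work = deque([
        AutoSubmit("tenant-a", "r1"),
        AutoSubmit("tenant-a", "r1"),  # duplicate that would previously block
        AutoSubmit("tenant-b", "r2"),
    ])
    for item in process_queue(work, make_duplicate_rejecting_handler()):
        print(item.tenant, item.record_id, item.status, item.error)
```

In this sketch the duplicate record for one tenant ends up "Failed" while the other tenant's record still completes, which is the behavior the postmortem describes as the goal of the update.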
Status: Postmortem
Impact: Critical | Started At: April 28, 2021, 5:15 p.m.
Description: **Preliminary Root Cause:** As a result of an upgrade activity initiated by the Azure Government team, Service Fabric clusters used to provide the preproduction services failed to retrieve node statuses from the underlying virtual machine scale sets. The upgrade pushed by the Azure Government team was expected to apply the latest fabric updates to the cluster on each node sequentially, applying the update on a single seed node per fault domain within the Azure Service Fabric cluster. After the upgrade was applied, the seed nodes required for the Service Fabric clusters failed to report their status to the Service Fabric management services, and the Service Fabric cluster could not service any requests for SimpliGov’s Portal, Authentication, API, MetaQuery Sync and Submission services. SimpliGov customers identified the downtime event when they tried to access the SimpliGov portal via web browser and received a 502 status code page, indicating that the Azure Application Gateway used for load balancing requests to SimpliGov Service Fabric clusters was not receiving the health status responses it needs to service customer requests. **Mitigation:** In the immediate term, SimpliGov restarted all nodes in all virtual machine scale sets powering the Service Fabric clusters at approximately 2:15 PM PST. This triggered rollback procedures for the Service Fabric upgrades initiated by the Azure Government team, and the rollback completed at approximately 2:27 PM PST, at which point SimpliGov services became available for customers. In addition to working to restore services as soon as possible, SimpliGov contacted the Azure Government support team to request assistance with the rollback if required and a full root cause analysis of why the upgrades failed to apply sequentially and through each fault domain as expected. 
Information was also requested on how the upgrade caused all nodes in the underlying virtual machine scale sets to fail to report appropriate statuses to the Service Fabric management services. Azure Government support also confirmed the successful rollback of the upgrade to the last known successful version of the Service Fabric services. Outside of US business hours, SimpliGov IT/Operations personnel manually applied the latest Service Fabric upgrades and monitored their successful application throughout and after the upgrade process. It was noted that the upgrades were applied to nodes in a sequential manner as expected, with no effect on the services used by customers. **Next Steps:** We apologize for the impact to affected customers. Azure Government will provide additional details on why the upgrade process didn’t apply in the expected manner, why it caused the nodes to fail to report to the Service Fabric management service, and why a rollback of the upgrade was required to bring SimpliGov services back online. These additional details will be in the form of a Root Cause Analysis report. Note that as this event occurred on preproduction with submission, API and portal services being unavailable, all customer records being processed throughout the downtime event should be consistent with their expected statuses. Customers do not need to take any reconciliatory actions in their preproduction tenants.
Status: Postmortem
Impact: Major | Started At: April 20, 2021, 2:05 p.m.