Last checked: 1 month ago
Get notified about any outages, downtime or incidents for CyberGrants and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for CyberGrants.
OutLogger tracks the status of these components for CyberGrants:
Component | Status |
---|---|
APIs | Active |
CyberGrants | Active |
Insights | Active |
Sandbox | Active |
View the latest incidents for CyberGrants and check for official updates:
Description:

### What Occurred?

Configuration changes in the production environment, together with data changes, resulted in unplanned downtime for the CyberGrants application for 30 minutes, from 7:25 to 7:55 PM Eastern on April 4. These changes also contributed to a failure in the On-Demand Reporting sub-systems from 5 PM Eastern on April 4 to 10 AM on April 5. Monitoring for the On-Demand Reporting sub-systems did not surface a persistent failure mode.

### Why Did This Occur (Root Cause)?

Concurrent production system changes had adverse interactions. While our monitoring continues to expand and improve, we did not have sufficient alerting to address this issue immediately, and relied on staff reports of reporting feature issues.

### How Will CyberGrants Prevent Further Occurrences?

We are expanding and enhancing our change control processes in April to prevent this kind of conflict in the future. We are also expanding our system monitoring in April to cover multiple sub-systems, so that partial-failure modes can be pinpointed more precisely.
Status: Postmortem
Impact: Critical | Started At: April 4, 2022, 11:16 p.m.
Description: **System Incident – 3/8/2022 - Root Cause Analysis**

**What Occurred?**

Clients started reporting intermittent slowness and site errors during the morning on 3/8/2022. Our Operations team identified the failing system and initiated the recovery that resulted in a brief outage from 10:48am EST to 10:53am EST. The system returned to normal operation at 11:00am EST.

**Why Did This Occur (Root Cause)?**

This incident was caused by an intermittent hardware/disk failure on our primary web server. As the load on this server increased during the morning, the impact on the system became worse. After failing over to the secondary web server, the system returned to normal operation. We also determined that a gap in our monitoring failed to notify our Operations team of the hardware failure, which delayed our ability to pinpoint and resolve the problem.

**How Will CyberGrants Prevent Further Occurrences?**

As a result of this incident, the following actions were taken:

1. We have updated our monitoring system to generate notifications for this class of hardware failure.
2. After our Cloud Migration is completed (this summer), we will be implementing multiple active web servers (such that if one server fails, all traffic will be routed to the remaining active servers).
Status: Postmortem
Impact: Critical | Started At: March 8, 2022, 4:31 p.m.
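
The "multiple active web servers" approach mentioned in this postmortem can be illustrated with a minimal sketch. This is not CyberGrants' implementation; the `Server` and `ActivePool` names and the server names `web-1` and `web-2` are hypothetical, and a real deployment would typically rely on a load balancer with its own health checks. The sketch only shows the routing idea: requests rotate across servers marked healthy, and a failed server is skipped.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical record for one web server in the pool.
    name: str
    healthy: bool = True


class ActivePool:
    """Round-robin routing over servers that are currently marked healthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # index where the next search starts

    def mark(self, name, healthy):
        # Record the result of a health check for the named server.
        for server in self.servers:
            if server.name == name:
                server.healthy = healthy

    def pick(self):
        # Walk the pool once, starting at the rotation point,
        # and return the first healthy server found.
        count = len(self.servers)
        for offset in range(count):
            server = self.servers[(self._next + offset) % count]
            if server.healthy:
                self._next = (self._next + offset + 1) % count
                return server
        raise RuntimeError("no healthy servers available")


pool = ActivePool([Server("web-1"), Server("web-2")])
first = pool.pick().name   # both healthy: traffic alternates
second = pool.pick().name
pool.mark("web-1", False)  # simulate the failed health check
after_failure = [pool.pick().name for _ in range(3)]
```

With both servers healthy, traffic alternates between them; once `web-1` is marked unhealthy, every request goes to `web-2` without a manual failover step.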
Description: **System Incident – 1/13/22 - External RCA**

**What Occurred?**

On Thursday morning (1/13/22), the CyberGrants system experienced a period of intermittent degradation from 8:11 a.m. until 9:13 a.m. ET. While the system was still processing transactions, many users experienced slow response times or timeouts during this period. Our Operations team identified the cause of the issue, fixed the problem, and the system returned to normal operation at 9:13 a.m. ET.

**Why Did This Occur (Root Cause)?**

When regular database maintenance was performed, the removal of a large number of application debugging records dramatically affected the performance of a key table within the application. This resulted in extreme system degradation when writing new records to this table. As the transaction volume increased during the morning, this degradation affected the entire system. Our team temporarily disabled the application debugging function and the system returned to normal operation. Further research identified an application defect that created the excessive number of records in this table, and this code was corrected.

**How Will CyberGrants Prevent Further Occurrences?**

As a result of this incident, CyberGrants took the following steps:

1. We identified and corrected the application code that caused this problem.
2. We are revising the database maintenance strategy for this table so that inserting or deleting large amounts of data in this table will not result in severe performance degradation.
Status: Postmortem
Impact: Critical | Started At: Jan. 13, 2022, 1:11 p.m.
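
The postmortem does not say how the database maintenance strategy will be revised. One common way to keep a large cleanup from degrading a busy table is to delete in small batches, committing after each one. The sketch below illustrates that idea only; it uses SQLite as a stand-in database, and the `debug_log` table, its columns, and the `delete_in_batches` helper are hypothetical, not taken from CyberGrants' system.

```python
import sqlite3


def delete_in_batches(conn, table, where, batch_size=1000):
    """Delete rows matching `where` in small batches; return the total removed.

    `table` and `where` are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    total = 0
    while True:
        cursor = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} WHERE {where} LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # end the transaction after every batch
        if cursor.rowcount == 0:
            break
        total += cursor.rowcount
    return total


# Stand-in data: 2500 debugging records plus 10 records to keep.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE debug_log (id INTEGER PRIMARY KEY, level TEXT)")
conn.executemany(
    "INSERT INTO debug_log (level) VALUES (?)",
    [("debug",)] * 2500 + [("error",)] * 10,
)
conn.commit()

removed = delete_in_batches(conn, "debug_log", "level = 'debug'", batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM debug_log").fetchone()[0]
```

Committing between batches keeps each transaction short, which gives concurrent writers a chance to proceed; the right batch size depends on the database engine and workload.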