Last checked: 1 month ago
Get notified about any outages, downtime or incidents for CyberGrants and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for CyberGrants.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for CyberGrants:
Component | Status |
---|---|
APIs | Active |
CyberGrants | Active |
Insights | Active |
Sandbox | Active |
View the latest incidents for CyberGrants and check for official updates:
Description: **What Occurred?**

On May 3, 2022 at 7:30 a.m. ET, CyberGrants experienced a total shutdown of all production systems when all redundant power supplies at its co-location data center (Navisite) were severed. Although power was restored within 1 hour, the abrupt loss of power meant the CyberGrants Operations team had to assess the status of the critical network, storage, and server components, recover key configuration settings, and ensure that no data corruption had occurred. The primary production system was fully operational at 2:50 p.m. ET.

**Why Did This Occur (Root Cause)?**

The CyberGrants co-location data center (Navisite) is a tier-1 facility with fully redundant power. The power loss was caused by a service technician employed by Navisite mistakenly hitting the Emergency Power Off (EPO) button for the site. This button is required by law so that local fire marshals can cut all power to the site in the event of a fire. The EPO button is intentionally designed to sever not only the main power supplied to infrastructure but also the battery backup and generators. All EPO buttons within the Navisite data center are clearly marked and have two layers of physical protection.

After power was restored, CyberGrants' recovery of the failed systems was prolonged by a number of factors, including getting Operations personnel on-site at the data center, assessing the state of all systems, fixing component failures, verifying the integrity of the storage systems, and restarting servers for each tier (DB, App, Web).

**How Will CyberGrants Prevent Further Occurrences?**

As a result of this incident, CyberGrants is taking the following steps:

1. Confirming plans and execution with Navisite on the remediations to be implemented as a result of this incident (i.e., staff retraining and additional audio, visual, and physical controls for the EPO buttons).
2. Reviewing and updating CyberGrants operational procedures for recovering all failed systems in the event of power loss (e.g., monitoring, network configurations, power loss health checks, and run books).
3. Scheduling planned sessions where operational procedures can be tested and verified.
4. Enhancing disaster recovery and business continuity capabilities, including faster full-site failover, which is already part of our infrastructure evolution.
Status: Postmortem
Impact: Critical | Started At: May 3, 2022, 11:31 a.m.
Description: What Occurred? A production software release updated our client-side performance monitoring software. The update caused an error for some clients' NPO portal pages that removed styling and reduced functionality. The functionality has been disabled, awaiting a patch to production. We've added additional test cases for impacted clients' NPO portal pages.
Status: Postmortem
Impact: None | Started At: April 15, 2022, 1 p.m.
Description:

### What Occurred?

Configuration changes in the production environment, together with data changes, resulted in unplanned downtime for the CyberGrants application for 30 minutes, from 7:25 to 7:55 PM Eastern on April 4. These changes also contributed to a failure in the On-Demand Reporting subsystems from 5 PM Eastern on April 4 to 10 AM Eastern on April 5. Monitoring for the On-Demand Reporting subsystems did not surface a persistent failure mode.

### Why Did This Occur (Root Cause)?

Concurrent production system changes had adverse interactions. While our monitoring continues to expand and improve, we did not have sufficient alerting to address this issue immediately, and relied on staff reports of reporting feature issues.

### How Will CyberGrants Prevent Further Occurrences?

We are expanding and enhancing our change control processes in April to prevent this kind of conflict in the future. We are also expanding our system monitoring in April to cover multiple subsystems, so that more partial-failure modes can be pinpointed precisely.
Status: Postmortem
Impact: Major | Started At: April 5, 2022, 1:02 p.m.
Description:

### What Occurred?

Configuration changes in the production environment, together with data changes, resulted in unplanned downtime for the CyberGrants application for 30 minutes, from 7:25 to 7:55 PM Eastern on April 4. These changes also contributed to a failure in the On-Demand Reporting subsystems from 5 PM Eastern on April 4 to 10 AM Eastern on April 5. Monitoring for the On-Demand Reporting subsystems did not surface a persistent failure mode.

### Why Did This Occur (Root Cause)?

Concurrent production system changes had adverse interactions. While our monitoring continues to expand and improve, we did not have sufficient alerting to address this issue immediately, and relied on staff reports of reporting feature issues.

### How Will CyberGrants Prevent Further Occurrences?

We are expanding and enhancing our change control processes in April to prevent this kind of conflict in the future. We are also expanding our system monitoring in April to cover multiple subsystems, so that more partial-failure modes can be pinpointed precisely.
Status: Postmortem
Impact: Critical | Started At: April 4, 2022, 11:16 p.m.
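For readers comparing the incidents above, the outage windows quoted in the postmortems can be turned into durations with a little date arithmetic. The following is a minimal sketch, not part of OutLogger or CyberGrants tooling: the `WINDOWS` dictionary and `duration_minutes` helper are hypothetical names, the timestamps are copied from the postmortem text, and it assumes both April times fall in 2022 and that every time is US Eastern (so naive datetimes are sufficient).

```python
from datetime import datetime

# Start/end times as stated in the postmortems. All are US Eastern and no
# daylight-saving change falls inside any window, so naive datetimes suffice.
# The labels are illustrative, not official incident names.
WINDOWS = {
    "May 3 power loss (shutdown to fully operational)": (
        datetime(2022, 5, 3, 7, 30),
        datetime(2022, 5, 3, 14, 50),
    ),
    "April 4 application downtime": (
        datetime(2022, 4, 4, 19, 25),
        datetime(2022, 4, 4, 19, 55),
    ),
    "April 4-5 On-Demand Reporting failure": (
        datetime(2022, 4, 4, 17, 0),
        datetime(2022, 4, 5, 10, 0),
    ),
}


def duration_minutes(start: datetime, end: datetime) -> int:
    """Return the whole number of minutes between start and end."""
    return int((end - start).total_seconds() // 60)


if __name__ == "__main__":
    for label, (start, end) in WINDOWS.items():
        minutes = duration_minutes(start, end)
        print(f"{label}: {minutes} min ({minutes / 60:.1f} h)")
```

Under these assumptions, the reporting failure lasted far longer than the application downtime it accompanied, which is consistent with the postmortem's note that monitoring did not surface it.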