Get notified about any outages, downtime, or incidents for ChargeOver and 1800+ other cloud vendors. Monitor 10 companies for free.
Outage and incident data over the last 30 days for ChargeOver.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for ChargeOver:
Component | Status |
---|---|
Developer Docs | Active |
Email Sending | Active |
Integrations | Active |
Main Application | Active |
Payment Processing | Active |
Public Website | Active |
Search | Active |
View the latest incidents for ChargeOver and check for official updates:
Description: Email delivery is operating normally. A postmortem will follow.
Status: Resolved
Impact: Minor | Started At: July 28, 2024, 3:09 a.m.
Description: This issue has been resolved. A postmortem will follow.
Status: Resolved
Impact: Minor | Started At: Dec. 4, 2023, 5:05 p.m.
Description:

## Incident details

The ChargeOver team follows an agile software development lifecycle for rolling out new features and updates, and we routinely roll out new features and updates to the platform multiple times per week. Our typical continuous integration/continuous deployment roll-outs look something like this:

* Developers make changes
* Code review by other developers
* Automated security, linting, and testing is performed
* Senior-level developer code review before deploying features to production
* Automated deploy of new updates to the production environment
* If an error occurs, roll back the changes to a previously known-good configuration

ChargeOver uses `Docker` containers and a redundant set of `Docker Swarm` nodes for production deployments. On `July 19th at 10:24am CT` we reviewed and deployed an update which, although it passed all automated acceptance tests, caused the deployment of `Docker` container images to silently fail. This made the application unavailable, and a `503 Service Unavailable` error message was shown to all users.

The deployment appeared successful to our automated systems, but due to a syntax error it actually only removed the existing application servers rather than replacing them with the new software version. No automated roll back occurred, because the deployment appeared successful while having in fact failed silently.

## Root cause

A single extra space (a single errant spacebar press!) was accidentally added to the very beginning of a `docker-compose` `YAML` file, which made the file invalid `YAML` syntax. The single-space change was subtle enough to be missed when reviewing the code change. All automated tests passed, because the automated tests do not use the production `docker-compose` deployment configuration file.

When deploying the service to `Docker Swarm`, `Docker Swarm` interpreted the invalid syntax in the `YAML` file as an empty set of services to deploy, rather than a set of valid application services to be deployed. This caused the deployment to look successful (it successfully deployed, removing all existing application servers and replacing them with nothing), and thus an automated roll-back to a known-good set of services did not happen.
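To illustrate the failure mode, here is a minimal sketch (not the actual ChargeOver configuration; the service names and the use of PyYAML are assumptions): a stray leading space shifts the first top-level key of a compose file, and once later keys appear back at column zero, a strict `YAML` parser rejects the document.

```python
import yaml  # PyYAML, assumed installed (pip install pyyaml)

# Hypothetical compose snippet: note the single leading space before "services".
BROKEN_COMPOSE = """\
 services:
  web:
    image: example/app:latest
networks:
  default: {}
"""

try:
    yaml.safe_load(BROKEN_COMPOSE)
except yaml.YAMLError as exc:
    # The stray space makes the top-level indentation inconsistent, so parsing fails.
    print(f"compose file rejected: {exc}")
```

How a given deployment tool reacts to such a file can differ; in this incident, `Docker Swarm` ended up treating it as an empty set of services rather than reporting an error.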
## Incident timeline

* 10:21am CT - Change was reviewed and merged from a staging branch to our production branch.
* 10:24am CT - Change was deployed to production, immediately causing an outage.
* 10:29am CT - Our team posted a status update here, notifying affected customers.
* 10:36am CT - Our team identified the related errant change, and started to revert to a known-good set of services.
* 11:06am CT - All services became operational again after deploying the last known-good configuration.
* 11:09am CT - Our team identified exactly what was wrong - an accidentally added single space character at the beginning of a configuration file, causing the file to be invalid `YAML` syntax.
* 11:56am CT - Our team made a change to validate the syntax of the `YAML` configuration files, to ensure this failure scenario cannot happen again.

## Remediation plan

Our team has identified several items as part of a remediation plan:

* We have already deployed multiple checks to ensure that invalid `YAML` syntax and/or configuration errors cannot pass automated tests, and thus cannot reach testing/UAT or production environments (see the sketch after this list).
* Our team will work to improve the very generic `503 Service Unavailable` message that customers received, directing affected customers to [https://status.chargeover.com](https://status.chargeover.com) where they can see real-time updates regarding any system outages.
* Customers logging in via [https://app.chargeover.com](https://app.chargeover.com) received a generic `The credentials that you provided are not correct.` message instead of a notification of the outage. This will be improved.
* Our team will do a review of our deployment pipelines to see if we can identify any other similar potential failure points.
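As a rough sketch of the kind of check described in the first remediation item (the exact tooling ChargeOver deployed is not public; the file name, the `services` check, and the use of PyYAML here are assumptions), a pre-deploy pipeline step can parse each compose file and refuse to continue if the file is invalid or defines no services:

```python
#!/usr/bin/env python3
"""Pre-deploy guard: fail the pipeline if a compose file is invalid or defines nothing."""
import sys

import yaml  # PyYAML, assumed installed (pip install pyyaml)


def check_compose(path: str) -> bool:
    try:
        with open(path) as fh:
            doc = yaml.safe_load(fh)
    except yaml.YAMLError as exc:
        print(f"{path}: invalid YAML: {exc}")
        return False
    # Guard against the "parsed, but defines nothing to deploy" failure mode.
    if not isinstance(doc, dict) or not doc.get("services"):
        print(f"{path}: no services defined")
        return False
    return True


if __name__ == "__main__":
    paths = sys.argv[1:] or ["docker-compose.yml"]  # assumed default file name
    results = [check_compose(p) for p in paths]
    sys.exit(0 if all(results) else 1)
```

A command such as `docker-compose config --quiet`, which validates a compose file without printing it, could serve as a similar gate if shelling out to the compose tooling is preferred.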
Status: Postmortem
Impact: Critical | Started At: July 19, 2023, 3:29 p.m.
Description: This issue has been resolved. Please contact us if you continue to have any trouble.
Status: Resolved
Impact: Major | Started At: May 19, 2023, 1:40 a.m.
Description: The login issue has been resolved.
Status: Resolved
Impact: Minor | Started At: May 9, 2023, 3:22 p.m.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.