Is there a Castle outage?

Castle status: Systems Active

Last checked: 17 seconds ago

Get notified about any outages, downtime or incidents for Castle and 1800+ other cloud vendors. Monitor 10 companies, for free.

Castle outages and incidents

Outage and incident data over the last 30 days for Castle.

There has been 1 outage or incident for Castle in the last 30 days.

Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Components and Services Monitored for Castle

OutLogger tracks the status of these components for Castle:

Dashboard Active
Legacy APIs Active
Log API Active
Risk and Filter APIs Active

Latest Castle outages and incidents.

View the latest incidents for Castle and check for official updates:

Updates:

  • Time: Sept. 9, 2021, 2:43 p.m.
    Status: Resolved
    Update: At 2021-09-06 20:04 UTC we experienced an AWS hardware failure with one of our main databases which led to 7 minutes of downtime impacting our APIs. During this time, the APIs were returning a 500 response code and no data was processed. The database in question is configured to be multi-node with automatic failover, but for unknown reasons the failover didn't happen as expected when the hardware fault occurred. Instead, a full backup had to be recreated, which led to the extended period of downtime. We're currently debugging this with AWS support to make sure we can trust the resiliency of our platform. While the current setup should provide good redundancy, we're simultaneously looking into alternative options to prevent this from happening again.

Updates:

  • Time: Aug. 31, 2021, 4:08 p.m.
    Status: Resolved
    Update: Between 14:43 and 15:31 UTC, Castle experienced an infrastructure issue with our message queuing system that caused some customer event data to be lost. While risk scoring and inline responses were functioning normally, the requests sent during the incident will not be visible or searchable in the Castle Dashboard. We're prioritizing efforts to add extra redundancy to our system to prevent this from happening again.

Updates:

  • Time: April 8, 2021, 2:28 p.m.
    Status: Postmortem
    Update: On Sunday, April 4th, 2021, beginning at 13:56 UTC, Castle's `/authenticate` endpoint was unavailable. Our teams promptly responded and service was restored at 14:09 UTC. We've conducted a full retrospective and root-cause analysis and determined that the original cause of the incident was the hardware failure (as confirmed by AWS Support) of an AWS host instance that contained Castle's managed cache service. This hardware failure caused an accumulation of timeouts, resulting in some app instances being marked unhealthy and automatically restarted in a loop. Although rare, we do expect occasional hardware-level failures, and our system is designed to be resilient to these failures whenever possible. In this case, the accumulated timeouts caused the system to behave in a way we have not seen before. We have re-prioritized our engineering team to implement '[circuit breaker](https://martinfowler.com/bliki/CircuitBreaker.html)'-style handling around cache look-ups which will prevent subsequent cache layer failures from impacting synchronous endpoints like `/authenticate`.
  • Time: April 4, 2021, 2:26 p.m.
    Status: Resolved
    Update: System is back to normal. We will follow up with more details about this incident.
  • Time: April 4, 2021, 2:13 p.m.
    Status: Investigating
    Update: API endpoints are responding normally again. Queued requests are catching up. Monitoring.
  • Time: April 4, 2021, 2:06 p.m.
    Status: Investigating
    Update: We’re experiencing timeouts in API endpoints. Investigating.
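
The '[circuit breaker](https://martinfowler.com/bliki/CircuitBreaker.html)'-style handling mentioned in the April 8 postmortem can be illustrated with a minimal sketch. This is not Castle's implementation: the `CircuitBreaker` class, its `max_failures` and `reset_after` parameters, and the fallback behaviour are all hypothetical, chosen only to show how repeated cache failures could be short-circuited so that a synchronous endpoint skips the cache instead of waiting on it.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive errors."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds (assumed value)
        self.clock = clock              # injectable so the breaker is testable
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, allow a trial call (half-open).
        return self.clock() - self.opened_at < self.reset_after

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        # While open, skip the failing dependency entirely.
        if self.is_open():
            return fallback()
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

With a breaker like this wrapped around each cache look-up, a handler would call `breaker.call(read_from_cache, read_from_source)` (both names hypothetical) and fall back immediately while the breaker is open, rather than accumulating timeouts.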

Updates:

  • Time: March 30, 2021, 5:06 p.m.
    Status: Postmortem
    Update: On March 30, 2021, Castle’s API became degraded during three distinct windows of time:
      * 12:02 UTC - 12:45 UTC
      * 12:59 UTC - 13:41 UTC
      * 14:48 UTC - 15:25 UTC
    During this time, some Castle API calls failed, including calls to our synchronous `authenticate` endpoint. The Castle dashboard was up but, because the API was unavailable, was not rendering data. Service was fully restored as of 15:25 UTC, and some data generated from requests to our asynchronous `track` and `batch` endpoints during the incident was recovered from queues and subsequently processed.
    As we communicated to all active customers yesterday, we take these sorts of incidents very seriously, and want to share some of the factors that led to this incident.
    The root cause of the incident was a failure of one of our primary data clusters. This is a multi-node, fault-tolerant commercial solution and a complete cluster failure is extremely rare. Castle’s infrastructure team responded immediately to the incident and found an unbounded memory leak that caused all nodes to shut down simultaneously. Over the course of the incident, we learned this memory leak was exacerbated by a specific class of background job that actually began running a day prior but did not begin leaking memory for some time.
    When the incident began, we detected the issue and immediately restarted the cluster. A full 'cold start' of the entire cluster takes around 40 minutes, and this accounts for the first downtime window. After the cluster restarted, our fault-tolerant job scheduling system attempted to run the jobs again, which caused the cluster to require full cold restarts twice more as we worked to clear out the job queue and replicas.
    At this time, we believe the reason for the memory leak is a bug in our data cluster provider’s software. We have been able to successfully reproduce the issue in a test environment and have a high-priority case open with their support team. In the meantime, we have audited all active background job systems to ensure performance-affecting jobs are temporarily disabled or worked around.
    Once again, we apologize for the impact of this interruption. Please feel free to contact us at [email protected] if you have any further questions.
  • Time: March 30, 2021, 4:28 p.m.
    Status: Resolved
    Update: Systems are operating normally and we have put mitigation measures in place to ensure the issue does not reoccur. We'll have a full retrospective and root cause teardown of the incident published within the next few days.
  • Time: March 30, 2021, 3:47 p.m.
    Status: Monitoring
    Update: API endpoints are responsive again and the system is stabilizing. We're monitoring the situation.
  • Time: March 30, 2021, 2:53 p.m.
    Status: Identified
    Update: We are seeing degraded performance on API endpoints once more, and are working on restoring functionality as quickly as possible.
  • Time: March 30, 2021, 2:09 p.m.
    Status: Monitoring
    Update: Database cluster operating normally and API endpoints are responding. We're continuing to monitor the situation.
  • Time: March 30, 2021, 1:18 p.m.
    Status: Investigating
    Update: We are experiencing issues with our main database cluster which affects all API endpoints. We're currently investigating this issue.
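
As a rough check on the March 30 postmortem, the three degraded windows it lists can be totalled with a few lines of Python. The times are taken from the postmortem; the `WINDOWS_UTC` list and the `minutes_between` helper are illustrative names, not part of any Castle or OutLogger tooling, and the sketch assumes each window starts and ends on the same day.

```python
from datetime import datetime

# Degraded windows (UTC) as listed in the March 30, 2021 postmortem.
WINDOWS_UTC = [("12:02", "12:45"), ("12:59", "13:41"), ("14:48", "15:25")]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two same-day HH:MM times."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


durations = [minutes_between(start, end) for start, end in WINDOWS_UTC]
total_minutes = sum(durations)
```

The first window comes out at 43 minutes, in line with the roughly 40-minute cold start the postmortem describes, and the three windows together total 122 minutes of degraded service.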

Check the status of similar companies and alternatives to Castle

NetSuite: Systems Active
ZoomInfo: Systems Active
SPS Commerce: Systems Active
Miro: Systems Active
Field Nation: Systems Active
Outreach: Systems Active
Own Company: Systems Active
Mindbody: Systems Active
TaskRabbit: Systems Active
Nextiva: Systems Active
6Sense: Systems Active
BigCommerce: Systems Active

Frequently Asked Questions - Castle

Is there a Castle outage?
The current status of Castle is: Systems Active
Where can I find the official status page of Castle?
The official status page for Castle is here
How can I get notified if Castle is down or experiencing an outage?
To get notified of any status changes to Castle, simply sign up to OutLogger's free monitoring service. OutLogger checks the official status of Castle every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here
What does Castle do?
A service that efficiently addresses fraud rings, account takeovers, and suspicious transactions in a timely manner.