Get notified about any outages, downtime or incidents for TaxJar and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for TaxJar.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for TaxJar:
| Component | Status |
|---|---|
| TaxJar Autofile | Active |
| TaxJar Reporting | Active |
| TaxJar API | Active |
| Other API Services | Active |
| Tax Calculations API | Active |
| Tax Rates API | Active |
| Transaction Push API | Active |
View the latest incidents for TaxJar and check for official updates:
Description: During this incident, TaxJar customers were not able to access the TaxJar App or use the TaxJar API. We know this was impactful, and we are truly sorry it happened. We have already implemented the following operational changes to ensure this type of failure does not happen again:

* We updated our deployment pattern to a blue-green deployment pattern, allowing us to better verify changes to production environments.
* We are conducting a full audit of our vendor-provided managed services that lack an acceptable level of rollback capability.

**Incident Root Cause Analysis**

* The incident started with a routine Kubernetes minor version upgrade using our vendor's managed Kubernetes service.
* This is a routine upgrade operation that we have completed 15 times in the past, across 3 accounts and 2 regions. We perform this upgrade quarterly to keep pace with Kubernetes releases.
* Immediately after the upgrade of our production cluster completed, Kubernetes workers began reporting "Not Ready" status.
* Within a few minutes, all nodes were in a "Not Ready" state, which caused all workloads to be marked as offline by our load balancers.
* Kubernetes upgrades on our vendor's managed Kubernetes service cannot be rolled back. Furthermore, new deployments and upgrades to the managed Kubernetes service can take 30-50 minutes to complete, forcing us to resolve the immediate issue rather than roll back.
* The vendor's support team was able to identify the issue:
  * Starting with Kubernetes version 1.14, clusters create a cluster security group at creation time.
  * This security group is designed to allow all traffic between the control plane and [managed node groups](https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html) to flow freely.
* After the upgrade completed, the vendor identified that the security group no longer had the rules required to allow this traffic to pass, even though prior minor version upgrades had always preserved them.
* We manually added the missing rule, which restored connectivity to our managed Kubernetes cluster. At this point our services started coming back online.
* Several other security groups, managed with CloudFormation, that relied on this rule for connectivity between our K8s workloads and other vendor-provided services (such as memory caches and databases) were found to have been unexpectedly altered by the upgrade, and also had to be repaired before all services could be restored.
* We continue to work with the vendor to understand why the managed service did not operate as documented.
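The failure mode described above (every node flipping to "Not Ready" at roughly the same time) is straightforward to detect programmatically. Below is a minimal, hypothetical sketch — not TaxJar's actual tooling — that parses node-status lines in the shape produced by `kubectl get nodes --no-headers` and flags a cluster-wide outage:

```python
# Hypothetical monitoring sketch: detect a cluster-wide "NotReady" event
# like the one in this postmortem. Assumes input lines shaped like the
# output of `kubectl get nodes --no-headers`:
#   <name> <status> <roles> <age> <version>

def parse_node_statuses(kubectl_output: str) -> dict:
    """Map node name -> status from `kubectl get nodes --no-headers` output."""
    statuses = {}
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            statuses[fields[0]] = fields[1]
    return statuses

def cluster_wide_outage(statuses: dict) -> bool:
    """True only when every node reports NotReady -- the incident scenario."""
    return bool(statuses) and all(s == "NotReady" for s in statuses.values())

if __name__ == "__main__":
    sample = """\
ip-10-0-1-10   NotReady   <none>   91d   v1.14.9
ip-10-0-2-11   NotReady   <none>   91d   v1.14.9
ip-10-0-3-12   NotReady   <none>   91d   v1.14.9"""
    print(cluster_wide_outage(parse_node_statuses(sample)))  # prints True
```

This only illustrates the detection logic; a real monitor would query the Kubernetes API directly and page on-call rather than parse CLI output.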
Status: Postmortem
Impact: Critical | Started At: Jan. 11, 2021, 6:30 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: Dec. 27, 2020, 12:05 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Major | Started At: Oct. 28, 2020, 5:40 p.m.