
Is there a Kentik SaaS US Cluster outage?

Kentik SaaS US Cluster status: Systems Active

Last checked: 8 minutes ago

Get notified about any outages, downtime, or incidents for Kentik SaaS US Cluster and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Kentik SaaS US Cluster outages and incidents

Outage and incident data over the last 30 days for Kentik SaaS US Cluster.

There has been 1 outage or incident for Kentik SaaS US Cluster in the last 30 days.


Components and Services Monitored for Kentik SaaS US Cluster

OutLogger tracks the status of these components for Kentik SaaS US Cluster:

Alerting and Mitigation Services: Active
Flow Ingest: Active
NMS: Active
Notifications: Active
Query: Active
REST API: Active
Web Portal: Active
BGP Monitoring and Alerting: Active
BGP Peering and Enrichment: Active
AWS Ingest: Active
Azure Ingest: Active
GCP Ingest: Active
Synthetics Alerting: Active
Synthetics Ingest: Active

Latest Kentik SaaS US Cluster outages and incidents.

View the latest incidents for Kentik SaaS US Cluster and check for official updates:

Updates:

  • Time: Oct. 7, 2022, 5:52 p.m.
    Status: Postmortem
    Update: **ROOT CAUSE** A new session management and security model was put in place, causing an issue with establishment of SSO sessions.
    **RESOLUTION** Given this software change was related to session management, it was highly difficult to roll back immediately. Kentik Engineering decided to roll forward with a fix to more flexibly accommodate session origination.
  • Time: Sept. 13, 2022, 11:50 p.m.
    Status: Resolved
    Update: A bug in a new session management service caused an irreversible block of new logins for certain users that attempted logins after 20:45 UTC. Engineering has rolled forward our session management system to address the issue as of 00:10 UTC
  • Time: Sept. 13, 2022, 11:34 p.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: Sept. 13, 2022, 6:53 p.m.
    Status: Investigating
    Update: We have identified an issue with SSO and are working to deploy a fix.

Updates:

  • Time: Oct. 7, 2022, 5:51 p.m.
    Status: Postmortem
    Update: **ROOT CAUSE** This incident was part of a series of incidents caused by bottlenecking in a load balancing system we placed in front of our query engine on 2022-09-01. This load balancer is shared across many of our underlying services, so many upstream Kentik portal pages were affected in different ways. The bottlenecking only occurred during peak query usage, at which time the load balancer would begin hitting its global connection limits.
    **RESOLUTION** Because this issue only occurred during our peak query times, it took us much longer than desired to identify the pattern and isolate a root cause. Each business day starting 2022-09-06, we would see elevated response times around the same time of day, but no obvious culprits based on metrics, logs, or traces. For the first few days, Kentik Engineering teams were identifying potential performance bottlenecks in various software services based on trace data, rolled out patches, and saw improved response times. While these changes did result in improved performance of various services, the observed improvement in response times immediately following patch deployments were false positives due to the patches rolling out during off peak hours and the root issue actually coinciding with our query peak. After hitting our query peaks on 2022-09-06 to 2022-09-08, we began to see the pattern emerge, but still could not clearly point at a root cause.
    The biggest blocker to identifying the root cause was that our load balancer was not reporting the bottlenecking in any fashion. In fact, when a Kentik Portal user loaded a page that ran a request that went through this load balancer, we would see nominal response times reported by the load balancer, but elevated response times reported by the web server. This led us to believe there was a performance issue on our web servers and focus much of our efforts there for the first few days. In addition to software improvements, the team allocated 66% more hardware capacity for our web servers, hoping this would buy us headroom to identify the true root cause, but to no avail.
    It was only after looking back at macro trends several days into the incident and seeing a very slight decrease in overall responsiveness and increased error rates that coincided with our load balancer changes that we began to investigate it as a potential root cause. Our load balancer employs several concurrency limits, and the addition of query load to it caused us to hit these limits during query peaks. We could clearly see this in concurrent connection metrics, but did not have monitoring for this scenario, nor did the load balancer log or otherwise indicate this was occurring. It would queue requests and silently incur delays while reporting nominal request and response times in its latency metrics.
    On 2022-09-15, Kentik Engineering removed the query load from this load balancer, and performance returned to consistently nominal levels. However, doing this rollback in conjunction with rapidly deploying new hardware for the web portal caused different bottlenecks in our query system during query peaks – ones that we were anticipating and trying to get ahead of by putting the load balancer in play in the first place. On 2022-09-21, Kentik Engineering was able to get all affected systems into a nominal state in terms of query performance and overall latency.
    **FOLLOW UP** The team is now focused on adding several layers of observability to our platform in order to improve our ability to respond to these types of incidents. In addition to more thorough monitoring of all components of our infrastructure, we are focused on identifying performance issues more proactively. During Q4 2022, our team will be working towards:
      * Adding more tracing to the Kentik Portal itself in order to get more visibility into browser-side/browser-observed performance
      * Leveraging Kentik Synthetics to actively monitor performance of key workflows in the Kentik Portal
      * Increasing our usage of Kentik Host Monitoring to more quickly identify performance issues via Kentik Alerting
    Please contact your Customer Success team or [[email protected]](mailto:[email protected]) if you have any further questions or concerns.
  • Time: Sept. 8, 2022, 2:12 a.m.
    Status: Resolved
    Update: This incident has been resolved; we are continuing to investigate the root cause.
  • Time: Sept. 7, 2022, 11:15 p.m.
    Status: Investigating
    Update: A subset of customers are being affected by slower query performance in the portal, but the platform is up. We are investigating.
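
The postmortem above hinges on a subtle measurement gap: a proxy that queues requests once it hits a concurrency limit can keep reporting nominal latency, because it only times requests after they leave the queue. The sketch below is not Kentik's implementation; it is a minimal Python simulation with made-up numbers (a 100-connection limit, 50 ms service time, a 300-request burst) showing why client-observed latency diverges from proxy-reported latency in that situation, and why watching concurrent connections against the configured limit is the signal that actually catches it.

```python
# Minimal simulation (illustrative numbers only, not Kentik's configuration):
# a proxy admits at most CONCURRENCY_LIMIT requests at once and queues the rest.
CONCURRENCY_LIMIT = 100   # hypothetical in-flight request limit at the proxy
SERVICE_TIME_MS = 50      # hypothetical backend service time per request
BURST_SIZE = 300          # hypothetical burst arriving at a query peak

proxy_reported = []   # latency measured from dequeue to response (what the proxy logs)
client_observed = []  # latency measured from arrival to response (what users experience)

for i in range(BURST_SIZE):
    wave = i // CONCURRENCY_LIMIT        # which batch of requests this one falls into
    queue_wait = wave * SERVICE_TIME_MS  # time spent queued before a slot frees up
    proxy_reported.append(SERVICE_TIME_MS)               # queueing delay is invisible here
    client_observed.append(queue_wait + SERVICE_TIME_MS)

print("proxy-reported  max latency:", max(proxy_reported), "ms")   # stays at 50 ms
print("client-observed max latency:", max(client_observed), "ms")  # grows to 150 ms
print("peak concurrent connections:", min(BURST_SIZE, CONCURRENCY_LIMIT),
      "of", CONCURRENCY_LIMIT)  # saturating the limit is the metric worth alerting on
```

The same effect is what the follow-up items target: browser-side tracing and synthetic tests measure the client-observed number, so a silent queue anywhere in the path still shows up.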

Updates:

  • Time: Oct. 7, 2022, 5:51 p.m.
    Status: Postmortem
    Update: **ROOT CAUSE** This incident was part of a series of incidents caused by bottlenecking in a load balancing system we placed in front of our query engine on 2022-09-01. This load balancer is shared across many of our underlying services, so many upstream Kentik portal pages were affected in different ways. The bottlenecking only occurred during peak query usage, at which time the load balancer would begin hitting its global connection limits.
    **RESOLUTION** Because this issue only occurred during our peak query times, it took us much longer than desired to identify the pattern and isolate a root cause. Each business day starting 2022-09-06, we would see elevated response times around the same time of day, but no obvious culprits based on metrics, logs, or traces. For the first few days, Kentik Engineering teams were identifying potential performance bottlenecks in various software services based on trace data, rolled out patches, and saw improved response times. While these changes did result in improved performance of various services, the observed improvement in response times immediately following patch deployments were false positives due to the patches rolling out during off peak hours and the root issue actually coinciding with our query peak. After hitting our query peaks on 2022-09-06 to 2022-09-08, we began to see the pattern emerge, but still could not clearly point at a root cause.
    The biggest blocker to identifying the root cause was that our load balancer was not reporting the bottlenecking in any fashion. In fact, when a Kentik Portal user loaded a page that ran a request that went through this load balancer, we would see nominal response times reported by the load balancer, but elevated response times reported by the web server. This led us to believe there was a performance issue on our web servers and focus much of our efforts there for the first few days. In addition to software improvements, the team allocated 66% more hardware capacity for our web servers, hoping this would buy us headroom to identify the true root cause, but to no avail.
    It was only after looking back at macro trends several days into the incident and seeing a very slight decrease in overall responsiveness and increased error rates that coincided with our load balancer changes that we began to investigate it as a potential root cause. Our load balancer employs several concurrency limits, and the addition of query load to it caused us to hit these limits during query peaks. We could clearly see this in concurrent connection metrics, but did not have monitoring for this scenario, nor did the load balancer log or otherwise indicate this was occurring. It would queue requests and silently incur delays while reporting nominal request and response times in its latency metrics.
    On 2022-09-15, Kentik Engineering removed the query load from this load balancer, and performance returned to consistently nominal levels. However, doing this rollback in conjunction with rapidly deploying new hardware for the web portal caused different bottlenecks in our query system during query peaks – ones that we were anticipating and trying to get ahead of by putting the load balancer in play in the first place. On 2022-09-21, Kentik Engineering was able to get all affected systems into a nominal state in terms of query performance and overall latency.
    **FOLLOW UP** The team is now focused on adding several layers of observability to our platform in order to improve our ability to respond to these types of incidents. In addition to more thorough monitoring of all components of our infrastructure, we are focused on identifying performance issues more proactively. During Q4 2022, our team will be working towards:
      * Adding more tracing to the Kentik Portal itself in order to get more visibility into browser-side/browser-observed performance
      * Leveraging Kentik Synthetics to actively monitor performance of key workflows in the Kentik Portal
      * Increasing our usage of Kentik Host Monitoring to more quickly identify performance issues via Kentik Alerting
    Please contact your Customer Success team or [[email protected]](mailto:[email protected]) if you have any further questions or concerns.
  • Time: Sept. 7, 2022, 8:59 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Sept. 7, 2022, 6:25 p.m.
    Status: Investigating
    Update: Query performance has normalized; the Kentik Ops team continues to investigate the root cause.
  • Time: Sept. 7, 2022, 4:36 p.m.
    Status: Investigating
    Update: We are continuing to investigate this issue.
  • Time: Sept. 7, 2022, 4:02 p.m.
    Status: Investigating
    Update: We are currently investigating reports of degraded performance.

Updates:

  • Time: Oct. 7, 2022, 4:52 p.m.
    Status: Postmortem
    Update: **ROOT CAUSE** This incident was caused by a software bug in query cancellation that caused our query engine to become unresponsive.
    **RESOLUTION** Internal alerting notified our teams of this issue and we immediately rolled out a fix later in the day.
  • Time: July 8, 2022, 6:52 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: July 8, 2022, 4 p.m.
    Status: Monitoring
    Update: We are continuing to monitor for any further issues.
  • Time: July 8, 2022, 3:46 p.m.
    Status: Monitoring
    Update: A fix has been implemented and we are closely monitoring system health.
  • Time: July 8, 2022, 3:44 p.m.
    Status: Identified
    Update: A subset of Kentik Portal sessions are experiencing failed KDE queries with 5xx errors.

Check the status of similar companies and alternatives to Kentik SaaS US Cluster

NetSuite: Systems Active
ZoomInfo: Systems Active
SPS Commerce: Systems Active
Miro: Systems Active
Field Nation: Systems Active
Outreach: Systems Active
Own Company: Systems Active
Mindbody: Systems Active
TaskRabbit: Systems Active
Nextiva: Systems Active
6Sense: Systems Active
BigCommerce: Systems Active

Frequently Asked Questions - Kentik SaaS US Cluster

Is there a Kentik SaaS US Cluster outage?
The current status of Kentik SaaS US Cluster is: Systems Active
Where can I find the official status page of Kentik SaaS US Cluster?
The official status page for Kentik SaaS US Cluster is here.
How can I get notified if Kentik SaaS US Cluster is down or experiencing an outage?
To get notified of any status changes to Kentik SaaS US Cluster, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Kentik SaaS US Cluster every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here.
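
If you would rather poll the status page yourself than rely on a monitoring service, a few lines of Python are enough. The URL below is a placeholder assumption: many hosted status pages expose a machine-readable summary endpoint, but you should confirm the actual URL and payload shape on the official Kentik status page before depending on this sketch.

```python
# Minimal polling sketch. STATUS_URL is a hypothetical placeholder; replace it
# with the real JSON endpoint of the status page you want to watch.
import json
import time
import urllib.request

STATUS_URL = "https://status.example.com/api/v2/status.json"  # hypothetical endpoint
POLL_SECONDS = 300  # "every few minutes", matching the cadence described above

def fetch_status(url: str) -> str:
    """Return the overall status description reported by the endpoint."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    # Statuspage-style payloads keep the overall indicator under "status";
    # adjust the key path if the real endpoint differs.
    return payload.get("status", {}).get("description", "unknown")

if __name__ == "__main__":
    last = None
    while True:
        current = fetch_status(STATUS_URL)
        if current != last:  # only report transitions, e.g. Active -> Degraded
            print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} status changed: {current}")
            last = current
        time.sleep(POLL_SECONDS)
```
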
What does Kentik SaaS US Cluster do?
Kentik provides network observability solutions to enhance network performance, security, and diagnostics through traffic monitoring, routing, synthetic testing, and cloud integration.