
Is there a Kentik SaaS US Cluster outage?

Kentik SaaS US Cluster status: Systems Active

Last checked: 4 minutes ago

Get notified about any outages, downtime, or incidents for Kentik SaaS US Cluster and 1800+ other cloud vendors. Monitor 10 companies for free.

Subscribe for updates

Kentik SaaS US Cluster outages and incidents

Outage and incident data over the last 30 days for Kentik SaaS US Cluster.

There has been 1 outage or incident for Kentik SaaS US Cluster in the last 30 days.


Tired of searching for status updates?

Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!

Sign Up Now

Components and Services Monitored for Kentik SaaS US Cluster

OutLogger tracks the status of these components for Kentik SaaS US Cluster:

Alerting and Mitigation Services Active
Flow Ingest Active
NMS Active
Notifications Active
Query Active
REST API Active
Web Portal Active
BGP Monitoring and Alerting Active
BGP Peering and Enrichment Active
AWS Ingest Active
Azure Ingest Active
GCP Ingest Active
Synthetics Alerting Active
Synthetics Ingest Active

Latest Kentik SaaS US Cluster outages and incidents.

View the latest incidents for Kentik SaaS US Cluster and check for official updates:

Updates:

  • Time: Oct. 25, 2022, 4:45 p.m.
    Status: Postmortem
    Update: **ROOT CAUSE** Our inbound proxy/loadbalancer was configured for a small concurrent connection pool for these API paths. At approximately 13:30 UTC, a few high volume paths were brought online that filled this pool and caused requests to queue to a point where we could not catch up. This caused periodic 503 and 429 responses from our API. **RESOLUTION** At approximately 17:45 UTC, the connection pool size was increased to address this issue. We have raised the severity of the internal alerts we have monitoring these metrics to more quickly identify and resolve future similar events.
  • Time: Oct. 25, 2022, 4:44 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Oct. 25, 2022, 4:44 p.m.
    Status: Monitoring
    Update: The issue has been resolved and we will continue to monitor
  • Time: Oct. 25, 2022, 4:43 p.m.
    Status: Identified
    Update: A fix has been issued and engineering continues to monitor the API
  • Time: Oct. 25, 2022, 4:42 p.m.
    Status: Identified
    Update: The root cause has been identified and a fix is being put into place.
  • Time: Oct. 25, 2022, 4:41 p.m.
    Status: Investigating
    Update: At approximately 1530UTC, we saw elevated error levels for API v6 (synthetics, cloud, etc.) and Kentik agent (kbgp, kproxy) APIs. We have determined these APIs to be non-operational and are investigating the root cause.
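
The Oct. 25 postmortem above traces the 429/503 responses to a small concurrent connection pool on the inbound proxy/load balancer: once a few high-volume API paths filled the pool, requests queued faster than they could drain. Kentik has not published its proxy configuration, so the Python sketch below is only an illustration of that failure mode; the pool size, queue timeout, and backend latency are made-up numbers, not Kentik's values.

```python
# Illustrative only -- models a proxy whose small connection pool saturates
# under a burst, causing queued requests to time out with 429-style errors.
import asyncio

POOL_SIZE = 8          # hypothetical "small" per-path connection pool
QUEUE_TIMEOUT_S = 0.5  # how long a queued request may wait before rejection

async def handle_request(pool: asyncio.Semaphore, backend_latency_s: float = 0.2) -> int:
    """Return an HTTP-style status code for one proxied request."""
    try:
        # Wait for a pool slot; reject the request if the queue does not drain in time.
        await asyncio.wait_for(pool.acquire(), timeout=QUEUE_TIMEOUT_S)
    except asyncio.TimeoutError:
        return 429
    try:
        await asyncio.sleep(backend_latency_s)  # stand-in for the upstream API call
        return 200
    finally:
        pool.release()

async def main() -> None:
    pool = asyncio.Semaphore(POOL_SIZE)
    # A burst from "high volume paths", far larger than the pool can absorb:
    codes = await asyncio.gather(*(handle_request(pool) for _ in range(200)))
    print({code: codes.count(code) for code in sorted(set(codes))})
    # Raising POOL_SIZE -- the documented fix -- lets far more of the burst succeed.

asyncio.run(main())
```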

Updates:

  • Time: Oct. 7, 2022, 5:52 p.m.
    Status: Postmortem
    Update: **ROOT CAUSE** This incident was part of a series of incidents caused by bottlenecking in a load balancing system we placed in front of our query engine on 2022-09-01. This load balancer is shared across many of our underlying services, so many upstream Kentik portal pages were affected in different ways. The bottlenecking only occurred during peak query usage, at which time the load balancer would begin hitting its global connection limits. **RESOLUTION** Because this issue only occurred during our peak query times, it took us much longer than desired to identify the pattern and isolate a root cause. Each business day starting 2022-09-06, we would see elevated response times around the same time of day, but no obvious culprits based on metrics, logs, or traces. For the first few days, Kentik Engineering teams were identifying potential performance bottlenecks in various software services based on trace data, rolled out patches, and saw improved response times. While these changes did result in improved performance of various services, the observed improvement in response times immediately following patch deployments were false positives due to the patches rolling out during off peak hours and the root issue actually coinciding with our query peak. After hitting our query peaks on 2022-09-06 to 2022-09-08, we began to see the pattern emerge, but still could not clearly point at a root cause. The biggest blocker to identifying the root cause was that our load balancer was not reporting the bottlenecking in any fashion. In fact, when a Kentik Portal user loaded a page that ran a request that went through this load balancer, we would see nominal response times reported by the load balancer, but elevated response times reported by the web server. This led us to believe there was a performance issue on our web servers and focus much of our efforts there for the first few days. In addition to software improvements, the team allocated 66% more hardware capacity for our web servers, hoping this would buy us headroom to identify the true root cause, but to no avail. It was only after looking back at macro trends several days into the incident and seeing a very slight decrease in overall responsiveness and increased error rates that coincided with our load balancer changes that we began to investigate it as a potential root cause. Our load balancer employs several concurrency limits, and the addition of query load to it caused us to hit these limits during query peaks. We could clearly see this in concurrent connection metrics, but did not have monitoring for this scenario, nor did the load balancer log or otherwise indicate this was occurring. It would queue requests and silently incur delays while reporting nominal request and response times in its latency metrics. On 2022-09-15, Kentik Engineering removed the query load from this load balancer, and performance returned to consistently nominal levels. However, doing this rollback in conjunction with rapidly deploying new hardware for the web portal caused different bottlenecks in our query system during query peaks – ones that we were anticipating and trying to get ahead of by putting the load balancer in play in the first place. On 2022-09-21, Kentik Engineering was able to get all affected systems into a nominal state in terms of query performance and overall latency. **FOLLOW UP** The team is now focused on adding several layers of observability to our platform in order to improve our ability to respond to these types of incidents. 
In addition to more thorough monitoring of all components of our infrastructure, we are focused on identifying performance issues more proactively. During Q4 2022, our team will be working towards: * Adding more tracing to the Kentik Portal itself in order to get more visibility into browser-side/browser-observed performance * Leveraging Kentik Synthetics to actively monitor performance of key workflows in the Kentik Portal * Increasing our usage of Kentik Host Monitoring to more quickly identify performance issues via Kentik Alerting Please contact your Customer Success team or [[email protected]](mailto:[email protected]) if you have any further questions or concerns.
  • Time: Sept. 21, 2022, 8:19 p.m.
    Status: Resolved
    Update: This incident has been resolved.
  • Time: Sept. 21, 2022, 7:25 p.m.
    Status: Monitoring
    Update: Engineering has identified the root cause and has put a mitigation in place at 1830 UTC; we are continuing to monitor.
  • Time: Sept. 21, 2022, 6:15 p.m.
    Status: Investigating
    Update: Approximately 25% of our netflow customers and all Synthetics users are experiencing degraded performance within the portal. We are investigating and should have a fix shortly.
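
A notable detail in the postmortem above is that the load balancer hit its concurrency limits silently: its own latency metrics stayed nominal while requests queued, and no alert watched the concurrent-connection gauge. The Python sketch below is a hedged illustration of the kind of check the postmortem says was missing (not Kentik's actual tooling); the metric names, sample values, and 90% threshold are assumptions.

```python
# Illustrative saturation check: alert on connection-limit utilization rather
# than on the load balancer's self-reported latency, which can look healthy
# even while requests are silently queued.
from dataclasses import dataclass

@dataclass
class LbSample:
    """One scrape of hypothetical load-balancer metrics."""
    concurrent_connections: int  # in-flight connections at sample time
    connection_limit: int        # configured global concurrency limit
    p99_latency_ms: float        # latency as reported by the LB itself

def saturation_alerts(samples: list[LbSample], threshold: float = 0.9) -> list[str]:
    alerts = []
    for i, s in enumerate(samples):
        utilization = s.concurrent_connections / s.connection_limit
        if utilization >= threshold:
            # Fire even though the LB's latency metric still looks nominal --
            # the queuing happens before the point where that latency is measured.
            alerts.append(
                f"sample {i}: {utilization:.0%} of connection limit in use "
                f"(reported p99 {s.p99_latency_ms:.0f} ms)"
            )
    return alerts

# Example peak-hour samples: reported latency looks fine, but the pool is nearly full.
peak = [
    LbSample(concurrent_connections=180, connection_limit=200, p99_latency_ms=45.0),
    LbSample(concurrent_connections=198, connection_limit=200, p99_latency_ms=47.0),
]
print("\n".join(saturation_alerts(peak)))
```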

Updates:

  • Time: Oct. 7, 2022, 5:51 p.m.
    Status: Postmortem
    Update: **ROOT CAUSE** This incident was part of a series of incidents caused by bottlenecking in a load balancing system we placed in front of our query engine on 2022-09-01. This load balancer is shared across many of our underlying services, so many upstream Kentik portal pages were affected in different ways. The bottlenecking only occurred during peak query usage, at which time the load balancer would begin hitting its global connection limits. **RESOLUTION** Because this issue only occurred during our peak query times, it took us much longer than desired to identify the pattern and isolate a root cause. Each business day starting 2022-09-06, we would see elevated response times around the same time of day, but no obvious culprits based on metrics, logs, or traces. For the first few days, Kentik Engineering teams were identifying potential performance bottlenecks in various software services based on trace data, rolled out patches, and saw improved response times. While these changes did result in improved performance of various services, the observed improvement in response times immediately following patch deployments were false positives due to the patches rolling out during off peak hours and the root issue actually coinciding with our query peak. After hitting our query peaks on 2022-09-06 to 2022-09-08, we began to see the pattern emerge, but still could not clearly point at a root cause. The biggest blocker to identifying the root cause was that our load balancer was not reporting the bottlenecking in any fashion. In fact, when a Kentik Portal user loaded a page that ran a request that went through this load balancer, we would see nominal response times reported by the load balancer, but elevated response times reported by the web server. This led us to believe there was a performance issue on our web servers and focus much of our efforts there for the first few days. In addition to software improvements, the team allocated 66% more hardware capacity for our web servers, hoping this would buy us headroom to identify the true root cause, but to no avail. It was only after looking back at macro trends several days into the incident and seeing a very slight decrease in overall responsiveness and increased error rates that coincided with our load balancer changes that we began to investigate it as a potential root cause. Our load balancer employs several concurrency limits, and the addition of query load to it caused us to hit these limits during query peaks. We could clearly see this in concurrent connection metrics, but did not have monitoring for this scenario, nor did the load balancer log or otherwise indicate this was occurring. It would queue requests and silently incur delays while reporting nominal request and response times in its latency metrics. On 2022-09-15, Kentik Engineering removed the query load from this load balancer, and performance returned to consistently nominal levels. However, doing this rollback in conjunction with rapidly deploying new hardware for the web portal caused different bottlenecks in our query system during query peaks – ones that we were anticipating and trying to get ahead of by putting the load balancer in play in the first place. On 2022-09-21, Kentik Engineering was able to get all affected systems into a nominal state in terms of query performance and overall latency. **FOLLOW UP** The team is now focused on adding several layers of observability to our platform in order to improve our ability to respond to these types of incidents. 
In addition to more thorough monitoring of all components of our infrastructure, we are focused on identifying performance issues more proactively. During Q4 2022, our team will be working towards: * Adding more tracing to the Kentik Portal itself in order to get more visibility into browser-side/browser-observed performance * Leveraging Kentik Synthetics to actively monitor performance of key workflows in the Kentik Portal * Increasing our usage of Kentik Host Monitoring to more quickly identify performance issues via Kentik Alerting Please contact your Customer Success team or [[email protected]](mailto:[email protected]) if you have any further questions or concerns.
  • Time: Sept. 15, 2022, 10:42 p.m.
    Status: Resolved
    Update: Occurrences of high response times have stayed at a nominal level compared to latency patterns over the past week. We are continuing to monitor, but consider the issue resolved.
  • Time: Sept. 15, 2022, 10:38 p.m.
    Status: Monitoring
    Update: A fix has been deployed and occurrences of high response times have diminished. We will continue to monitor for the next 24-48 hours.
  • Time: Sept. 15, 2022, 10:37 p.m.
    Status: Identified
    Update: A potential root cause has been identified and we are working on implementing a fix.
  • Time: Sept. 15, 2022, 10:36 p.m.
    Status: Investigating
    Update: We are seeing intermittent spikes in response times affecting a subset of users in both the web portal and API. We are actively investigating.

Check the status of similar companies and alternatives to Kentik SaaS US Cluster

NetSuite

Systems Active

ZoomInfo

Systems Active

SPS Commerce

Systems Active

Miro

Systems Active

Field Nation

Systems Active

Outreach

Systems Active

Own Company

Systems Active

Mindbody

Systems Active

TaskRabbit

Systems Active

Nextiva

Systems Active

6Sense

Systems Active

BigCommerce

Systems Active

Frequently Asked Questions - Kentik SaaS US Cluster

Is there a Kentik SaaS US Cluster outage?
The current status of Kentik SaaS US Cluster is: Systems Active
Where can I find the official status page of Kentik SaaS US Cluster?
The official status page for Kentik SaaS US Cluster is here
How can I get notified if Kentik SaaS US Cluster is down or experiencing an outage?
To get notified of any status changes to Kentik SaaS US Cluster, simply sign up for OutLogger's free monitoring service. OutLogger checks the official status of Kentik SaaS US Cluster every few minutes and will notify you of any changes. You can view the status of all your cloud vendors in one dashboard. Sign up here. (A minimal do-it-yourself polling sketch is shown after this FAQ.)
What does Kentik SaaS US Cluster do?
Kentik provides network observability solutions to enhance network performance, security, and diagnostics through traffic monitoring, routing, synthetic testing, and cloud integration.
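
As noted in the FAQ, OutLogger works by checking the official status page every few minutes and notifying you of changes. For a minimal do-it-yourself equivalent, the Python sketch below polls a status endpoint on an interval. The URL is a placeholder: many hosted status pages expose a JSON summary at /api/v2/status.json, but confirm the real endpoint on Kentik's official status page before relying on it, and replace the print with your own notification hook.

```python
# Minimal status poller -- placeholder URL and interval; notification is just a print.
import json
import time
import urllib.request

STATUS_URL = "https://status.example.com/api/v2/status.json"  # placeholder, not Kentik's real endpoint
POLL_INTERVAL_S = 300  # "every few minutes"

def fetch_status(url: str = STATUS_URL) -> str:
    """Return the overall status indicator (e.g. none/minor/major/critical)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return payload.get("status", {}).get("indicator", "unknown")

def watch() -> None:
    last = None
    while True:
        current = fetch_status()
        if current != last:
            print(f"status changed: {last} -> {current}")  # hook notifications in here
            last = current
        time.sleep(POLL_INTERVAL_S)

if __name__ == "__main__":
    watch()
```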