Get notified about any outages, downtime or incidents for ServiceChannel and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for ServiceChannel.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for ServiceChannel:
Component | Status |
---|---|
WorkForce | Active |
Analytics | Active |
Analytics Dashboard | Active |
Analytics Download | Active |
Data Direct | Active |
API | Active |
API Response | Active |
Authentication | Active |
Budget Insights | Active |
SendXML | Active |
SFTP | Active |
Universal Connector | Active |
Mobile Applications | Active |
SC Mobile | Active |
SC Provider | Active |
Provider Automation | Active |
Fixxbook | Active |
Invoice Manager | Active |
IVR | Active |
Login | Active |
Proposal Manager | Active |
Work Order Manager | Active |
Service Automation | Active |
Asset Manager | Active |
Compliance Manager | Active |
Dashboard | Active |
Inventory Manager | Active |
Invoice Manager | Active |
Locations List | Active |
Login | Active |
Maps | Active |
Project Tracker | Active |
Proposal Manager | Active |
Supply Manager | Active |
Weather | Active |
Work Order Manager | Active |
Service Center | Active |
Email - servicechannel.com | Active |
Email - servicechannel.net | Active |
Phone - Inbound | Active |
Phone - Outbound | Active |
Third Party Components | Active |
Avalara Tax Calculation Service | Active |
Rackspace - Inbound Email | Active |
Twilio REST API | Active |
Zendesk | Active |
View the latest incidents for ServiceChannel and check for official updates:
Description: **Incident Report: Increased platform latency and work order reports unresponsive**

**Date of Incident:** 04/01/2024
**Time/Date Incident Started:** 04/01/2024, 10:34 am EST
**Time/Date Stability Restored:** 04/01/2024, 12:45 pm EST
**Time/Date Incident Resolved:** 04/01/2024, 1:05 pm EST
**Users Impacted:** All Users
**Frequency:** Intermittent
**Impact:** Major

**Incident description:** Users experienced sporadic latency and timeout issues while engaging with the ServiceChannel Platform, particularly for work order report services.

**Root Cause Analysis:** The automated monitoring systems of the ServiceChannel SRE and DBA teams detected elevated CPU utilization on database read replicas. A subsequent investigation of the logs identified that the incident coincided with a spike in user traffic. This surge in activity caused extended wait times for certain ServiceChannel services, notably the Excel report services, leading to slower page loads and timeouts. The SRE team acted swiftly by scaling up infrastructure resources to accommodate the increased traffic. Following the expansion of capacity, normal system operations resumed.

**Actions Taken:**
1. Manually tested our services to replicate the issue.
2. Isolated the performance degradation to report queues and related database services.
3. Enhanced the capacity of affected services to manage the load and restore full functionality.

**Mitigation Measures:**
1. Expansion of database resources to more effectively manage reporting queues.
2. Implementation of refined monitoring systems for better oversight of reporting queues.
Status: Postmortem
Impact: Major | Started At: April 1, 2024, 3:58 p.m.
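The root cause analysis above hinges on monitoring that flags elevated CPU on database read replicas before users feel the slowdown. As a rough illustration only, and not ServiceChannel's actual tooling, the Python sketch below polls a hypothetical metrics endpoint and raises an alert when a replica stays above a CPU threshold for several consecutive samples; the endpoint URL, threshold, and alert mechanism are all assumptions.

```python
"""Minimal sketch of read-replica CPU alerting, as described in the postmortem above.

This is NOT ServiceChannel's monitoring stack; the metrics endpoint, threshold,
and alert hook below are hypothetical placeholders.
"""
import time

import requests

METRICS_URL = "https://metrics.example.internal/replica-cpu"  # hypothetical endpoint
CPU_ALERT_THRESHOLD = 80.0   # percent; illustrative value only
CONSECUTIVE_SAMPLES = 3      # require sustained load before alerting
POLL_INTERVAL_SECONDS = 60


def fetch_replica_cpu() -> dict:
    """Return {replica_name: cpu_percent} from the (hypothetical) metrics API."""
    response = requests.get(METRICS_URL, timeout=10)
    response.raise_for_status()
    return response.json()


def main() -> None:
    breaches: dict[str, int] = {}
    while True:
        for replica, cpu in fetch_replica_cpu().items():
            if cpu >= CPU_ALERT_THRESHOLD:
                breaches[replica] = breaches.get(replica, 0) + 1
            else:
                breaches[replica] = 0
            if breaches[replica] >= CONSECUTIVE_SAMPLES:
                # A real system would page the on-call engineer instead of printing.
                print(f"ALERT: {replica} CPU at {cpu:.1f}% for "
                      f"{breaches[replica]} consecutive samples")
        time.sleep(POLL_INTERVAL_SECONDS)


if __name__ == "__main__":
    main()
```

Requiring several consecutive breaches rather than a single sample keeps brief traffic spikes, like the one described in this incident, from triggering alerts unnecessarily.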
Description: **Incident Report: SFTP Service Disruption**

**Date of Incident:** 12/12/2023
**Time/Date Incident Started:** 12/11/2023, 05:43 pm EST
**Time/Date Stability Restored:** 12/12/2023, 01:24 pm EST
**Time/Date Incident Resolved:** 12/12/2023, 01:54 pm EST
**Users Impacted:** Few
**Frequency:** Continuous
**Impact:** Major

**Incident description:** On December 11th at 5:43 pm EST, an unexpected disruption occurred in the Production ServiceChannel SFTP service. By the morning of December 12th, 2023, the ServiceChannel Support team began to receive customer reports of timeout errors when attempting to connect to the ServiceChannel SFTP server.

**Root Cause Analysis:** A comprehensive investigation by the Site Reliability Engineering (SRE) team revealed no resource contention issues with the affected server instance. Nevertheless, to preemptively avoid any hardware bottleneck, the SRE team scaled the server up to the next larger instance size. Despite this effort, tests indicated ongoing issues with external connections to port 22, while all internal network tests were successful. The SRE team shifted their efforts to pinpointing potential network irregularities and found that the security policy governing the SFTP server had been altered to exclude access to port 22. Upon further investigation with the Security team, we determined that this change was part of a broad initiative to harden our platform's security posture. Regrettably, this policy update was executed without the normal change management process, and the broader engineering organization was not notified in advance. The network modification was subsequently reversed, and SFTP functionality was restored.

**Actions Taken:**
1. The SRE team inspected the SFTP server and confirmed it was operating within defined parameters. The team also scaled up the infrastructure to proactively address the possibility of any system bottlenecks.
2. The SRE team identified a suspected change in the security policy, wherein port 22 access was removed for all but private network address spaces. System event logs confirmed that this change was implemented by the Security team. Upon identifying the issue, the Security team was informed, and an emergency rollback was requested.

**Mitigation Measures:** In light of this incident, the following preventative measures have been put in place:
1. Improvements to internal communications, including ensuring that all network changes are announced to and approved by the wider engineering organization prior to implementation.
2. Ensuring that, going forward, infrastructure changes to the ServiceChannel Platform are made by the SRE team using the normal Infrastructure as Code process.
3. Additional monitoring of the SFTP infrastructure, using both network ping tests and end-to-end synthetic transaction tests, has been implemented to test from both internal and external network paths.
Status: Postmortem
Impact: Critical | Started At: Dec. 12, 2023, 3:37 p.m.
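The mitigation measures above mention end-to-end synthetic transaction tests run from both internal and external network paths, which is the kind of check that would have surfaced the port 22 policy change quickly. Below is a minimal sketch of such a check, assuming the third-party paramiko library and hypothetical host and credentials; it is illustrative, not ServiceChannel's monitoring code. It distinguishes a network-level failure (port 22 unreachable, as in this incident) from an application-level failure (authentication or directory listing).

```python
"""Sketch of an end-to-end SFTP synthetic check, under assumed host/credentials."""
import socket
import time

import paramiko  # third-party: pip install paramiko

SFTP_HOST = "sftp.example.com"   # hypothetical host
SFTP_PORT = 22
SFTP_USER = "synthetic-monitor"  # hypothetical service account
SFTP_PASSWORD = "change-me"      # in practice, pull from a secrets manager


def check_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Network-level check: can we open a TCP connection to port 22 at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_sftp_login_and_list(host: str, port: int, user: str, password: str) -> float:
    """End-to-end check: authenticate and list the home directory, returning latency."""
    start = time.monotonic()
    transport = paramiko.Transport((host, port))
    try:
        transport.connect(username=user, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        sftp.listdir(".")
    finally:
        transport.close()
    return time.monotonic() - start


if __name__ == "__main__":
    if not check_port_reachable(SFTP_HOST, SFTP_PORT):
        print(f"FAIL: port {SFTP_PORT} on {SFTP_HOST} is unreachable")
    else:
        elapsed = check_sftp_login_and_list(SFTP_HOST, SFTP_PORT, SFTP_USER, SFTP_PASSWORD)
        print(f"OK: SFTP login and directory listing completed in {elapsed:.2f}s")
```

Running the same script from both an internal host and an external vantage point covers the scenario seen here, where internal tests passed while external connections to port 22 were blocked.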
Description: **Incident Report: Infrastructure/Hardware Instability**

**Date of Incident:** 09/08/2023
**Time/Date Incident Started:** 09/08/2023, 04:18 pm EDT
**Time/Date Stability Restored:** 09/08/2023, 05:08 pm EDT
**Time/Date Incident Resolved:** 09/08/2023, 05:15 pm EDT
**Users Impacted:** All
**Frequency:** Intermittent
**Impact:** Major

**Incident description:** On September 8th at 04:18 pm EDT, the Site Reliability Engineering (SRE) team received an alert regarding "SQL timeout errors" and subsequent reports of dashboard slowness. This slowness had a significant impact on a large number of users, resulting in a suboptimal experience.

**Root Cause Analysis:** Upon conducting a thorough investigation, the Database Administration (DBA) team identified a series of database requests that were causing blocks and imposing a high CPU load on the database replica servers. This, in turn, led to an increased number of "resource waits." As a preemptive measure, the DBA team initiated a restart of the SQL service on both database replica servers. Following the successful restart of the SQL service, the system's stability was closely monitored and subsequently restored.

**Actions Taken:**
1. Investigated system-generated alerts and identified affected platform functionality.
2. The DBA team proactively initiated a SQL service restart on the database replica servers.

**Mitigation Measures:** In response to this incident, the following mitigation measures have been implemented:
1. Ongoing Investigation: The team is continuing to investigate the root causes of the high CPU usage and blockages on the database servers.
2. Database Query Performance Improvements: Efforts are being made to enhance the performance of database queries to ensure the overall stability of the platform.
Status: Postmortem
Impact: Critical | Started At: Sept. 8, 2023, 9:03 p.m.
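For readers curious what "investigating blocks and resource waits" looks like in practice on SQL Server, here is a hedged sketch that lists currently blocked requests through the sys.dm_exec_requests dynamic management view. The use of pyodbc, the driver name, and the connection string are assumptions for illustration; this is not ServiceChannel's internal tooling.

```python
"""Sketch: surface blocked/high-wait requests on a SQL Server replica via DMVs."""
import pyodbc  # third-party: pip install pyodbc

CONNECTION_STRING = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=replica.example.internal;"  # hypothetical read replica
    "DATABASE=master;Trusted_Connection=yes;TrustServerCertificate=yes;"
)

# Requests that are currently blocked, plus the session blocking them.
BLOCKING_QUERY = """
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       r.cpu_time,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;
"""


def report_blocking_sessions() -> None:
    with pyodbc.connect(CONNECTION_STRING) as conn:
        cursor = conn.cursor()
        rows = cursor.execute(BLOCKING_QUERY).fetchall()
        if not rows:
            print("No blocked requests found.")
            return
        for row in rows:
            print(f"session {row.session_id} blocked by {row.blocking_session_id} "
                  f"({row.wait_type}, {row.wait_time} ms wait, {row.cpu_time} ms CPU)")
            print(f"  statement: {row.sql_text[:120]}")


if __name__ == "__main__":
    report_blocking_sessions()
```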
Description: **Incident Report: Infrastructure/Hardware Instability**

**Date of Incident:** 09/05/2023
**Time/Date Incident Started:** 09/05/2023, 09:15 am EDT
**Time/Date Stability Restored:** 09/05/2023, 10:19 am EDT
**Time/Date Incident Resolved:** 09/05/2023, 10:25 am EDT
**Users Impacted:** All
**Frequency:** Intermittent
**Impact:** Major

**Incident description:** Third-party vendor infrastructure/hardware instability.

**Root Cause Analysis:** A third-party vendor infrastructure issue affected performance and system availability for the underlying data storage layer that services platform resources.

**Actions Taken:**
1. Investigated system-generated alerts and identified affected platform functionality.
2. SRE and DBA teams initiated a platform infrastructure redeployment, forcing the new infrastructure to be spun up on unaffected infrastructure/hardware.

**Mitigation Measures:**
1. Continue the ongoing investigation into root causes of infrastructure issues within our cloud hosting provider.
2. Continue to implement high availability improvements to prepare the platform to respond better to unexpected hardware issues that are beyond our control.
Status: Postmortem
Impact: Critical | Started At: Sept. 5, 2023, 2:05 p.m.
Description: **Incident Report: Infrastructure/Hardware Instability**

**Date of Incident:** 08/31/2023
**Time/Date Incident Started:** 08/31/2023, 02:15 pm EDT
**Time/Date Stability Restored:** 08/31/2023, 02:47 pm EDT
**Time/Date Incident Resolved:** 08/31/2023, 02:50 pm EDT
**Users Impacted:** All
**Frequency:** Intermittent
**Impact:** Major

**Incident description:** On August 31st at 02:15 pm EDT, the ServiceChannel Site Reliability Engineering (SRE) team received a large number of SQL timeout errors, followed by reports of dashboard slowness.

**Root Cause Analysis:** The Database Administration (DBA) team discovered a growing queue of active database queries and increasing resource waits, resulting from functionality that was causing database blocks and high CPU load on the database cluster.

**Actions Taken:**
1. Investigated system-generated alerts and identified affected platform functionality.
2. Recompiled the affected stored procedures and dropped all blocking connections to return the database cluster to the steady state.
3. Compiled incident findings for future remediation by the Application Engineering and SRE teams.

**Mitigation Measures:**
1. Coordinate with the Application Engineering team to identify and remediate the root causes of the high database CPU and blocks.
2. Identify and implement general performance improvements for database queries to increase overall platform stability.
3. Implement infrastructural modifications to distribute database I/O across additional read replicas.
Status: Postmortem
Impact: Major | Started At: Aug. 31, 2023, 6:36 p.m.
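The remediation described in this postmortem, recompiling the affected stored procedures and dropping blocking connections, maps to standard SQL Server operations (sp_recompile and KILL). The sketch below shows roughly what that looks like when scripted from Python; the procedure name, server, database, and driver are hypothetical, and in practice a DBA would run these steps deliberately rather than automatically.

```python
"""Sketch: recompile a stored procedure and drop head-blocker sessions on SQL Server.

All identifiers and the connection string are placeholders, not ServiceChannel's runbook.
"""
import pyodbc  # third-party: pip install pyodbc

CONNECTION_STRING = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=primary.example.internal;DATABASE=AppDb;"  # hypothetical server/database
    "Trusted_Connection=yes;TrustServerCertificate=yes;"
)


def recompile_procedure(cursor: pyodbc.Cursor, procedure_name: str) -> None:
    """Mark a stored procedure for recompilation so a stale execution plan is discarded."""
    cursor.execute("EXEC sp_recompile ?", procedure_name)


def kill_head_blockers(cursor: pyodbc.Cursor) -> None:
    """Find sessions that block others but are not blocked themselves, and drop them."""
    cursor.execute("""
        SELECT DISTINCT blocking_session_id
        FROM sys.dm_exec_requests
        WHERE blocking_session_id <> 0
          AND blocking_session_id NOT IN (
              SELECT session_id FROM sys.dm_exec_requests WHERE blocking_session_id <> 0
          );
    """)
    for (session_id,) in cursor.fetchall():
        # KILL cannot be parameterized; the session id comes from the DMV query above.
        cursor.execute(f"KILL {int(session_id)};")


if __name__ == "__main__":
    with pyodbc.connect(CONNECTION_STRING, autocommit=True) as conn:
        cur = conn.cursor()
        recompile_procedure(cur, "dbo.usp_ExampleHotProcedure")  # hypothetical procedure
        kill_head_blockers(cur)
```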
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.