Outage and incident data over the last 30 days for DrChrono.
OutLogger tracks the status of these components for DrChrono:
Component | Status |
---|---|
drchrono.com | Active |
drchrono iPad Check-In Kiosk Application | Active |
drchrono iPad EHR | Active |
DrChrono Telehealth Platform | Active |
onpatient.com | Active |
onpatient iPhone PHR | Active |
www.drchrono.com Public Website | Active |
3rd party services | Active |
Box Uploads/Downloads | Active |
Emdeon/Change Healthcare | Active |
Health Gorilla | Active |
HelloFax | Active |
ID.ME | Active |
Labcorp | Active |
Quest Diagnostics | Active |
Square | Active |
Stripe API | Active |
support.drchrono.com | Active |
Trizetto | Active |
Twilio Authy | Active |
Twilio REST API | Active |
Twilio SMS | Active |
Twilio TwiML | Active |
Updox | Active |
Waystar | Active |
View the latest incidents for DrChrono and check for official updates:
Description: DrChrono believes the issues have been resolved. Please reach out to support if you are still having issues.
Status: Resolved
Impact: Major | Started At: Jan. 24, 2024, 3:44 p.m.
Description: At 2:15 PM ET on December 18th, a system hosted by our cloud provider became degraded and intermittently unavailable, reducing the capacity we had to serve requests. We had to pause application traffic and place a maintenance page to allow the system to recover. At 4:45 PM ET, we lifted the maintenance page and the system returned to health; traffic levels and background processing returned to normal.

The failing system is the same system that caused other degradations and outages in the last 90 days. In every case, the outage was not caused by a recent DrChrono release but by hardware failure occurring on an ad-hoc basis. Since the October 18th outage, the team has been working as quickly as possible to patch the system to prevent recurrence and, in parallel, to replace it entirely. In late November, based on assurances from our cloud provider, we patched the system in place with the expectation that this would remove the memory leak causing the degradation. Unfortunately, the memory leak remains in our running version, as experienced on December 18th.

We recognize it is unacceptable for this system to disrupt you and your patients. We believed the patch would provide relief and give the replacement workstream time to complete its project without an additional event occurring. The patch did not offer that relief, and the team is working to further expedite the replacement. The code for using the new system backing the background task processor was released to production on December 14th in beta mode so we can roll it out and test slowly (a sketch of this kind of gradual rollout follows this incident's details). The full rollout of the replacement is currently slated to be complete in the first half of January. The team is working quickly, including over the holidays, to expedite this further, and we will communicate once it is complete.

Please know that this is our team's number one priority. Fundamentally, we are replacing a vital piece of DrChrono's foundation, and significant testing is required to ensure you and your patients are supported appropriately. In the meantime, while the DrChrono application still depends on the system we are replacing, the infrastructure team has provisioned a spare cluster in case the existing cluster fails again. A contributing factor to the length of these downtimes is that provisioning a new cluster takes roughly one hour; running a spare lets us move to a new cluster and restore service faster than waiting on server restarts or new cluster provisioning. The team is adding this approach to their playbooks for all systems that support the DrChrono application.
**Corrective actions related to this event include:**

* Further expediting the system replacement workstream – **In progress**
* Updating the maintenance page wording to make it clearer that the maintenance was _unplanned_ or emergency maintenance – **In progress**
* Developing a playbook to maintain the spare cluster and a procedure to bring the spare cluster online – **Complete as of December 19th**
* Researching and implementing further upgrades to the current system if they are expected to further reduce the occurrence rate of the memory leak – **In progress**
* Investigating and publishing a plan, with timelines, for an “offline viewer” that DrChrono customers can use during a degradation or outage to view schedule and clinical information in read-only mode – **In progress, with a plan to be published in the coming weeks**

We architected our solution around a cloud provider-managed service for our background task processors. This solution has served us well for many years, but while cloud provider solutions are generally highly resilient and highly available, that is no longer the case for this service. Ultimately, your and your patients' experience is our responsibility and is taken very seriously. We are working aggressively to deliver a solution that provides consistent service. In addition, as the last corrective action above notes, we consider it an action item to provide our customers with an “offline viewer” to lessen the disruption and enable continued, safe care. We will share more on those plans in the coming weeks. We appreciate your patience as we roll out these changes and find more stable ground.
Status: Postmortem
Impact: Critical | Started At: Dec. 18, 2023, 7:14 p.m.
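As an illustration only (DrChrono has not published how its beta rollout works), the sketch below shows one common pattern for the kind of gradual cutover described above: a configurable percentage of newly enqueued background tasks is routed to the replacement backend while the rest stay on the legacy one. The function names, queues, and rollout percentage are all hypothetical.

```python
import random

# Illustrative sketch only -- not DrChrono's actual rollout mechanism.
# Route a configurable fraction of newly enqueued background tasks to the
# replacement backend; everything else continues to use the legacy backend.

NEW_BACKEND_ROLLOUT_PERCENT = 5  # hypothetical dial, raised as confidence grows


def enqueue_task(task_name, payload, legacy_queue, new_queue):
    """Send a task to either the legacy or the replacement queue."""
    if random.uniform(0, 100) < NEW_BACKEND_ROLLOUT_PERCENT:
        new_queue.append((task_name, payload))     # replacement backend
    else:
        legacy_queue.append((task_name, payload))  # existing backend


if __name__ == "__main__":
    legacy, new = [], []
    for i in range(1000):
        enqueue_task("sync_patient_record", {"id": i}, legacy, new)
    print(f"legacy: {len(legacy)}, new: {len(new)}")
```

Routing by percentage lets operators raise the dial gradually and compare error rates between the two backends before committing to a full cutover.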
Description: Between 3:25 PM EST and 5:28 PM EST, we experienced an extreme traffic flood via our public API that caused resource pressure on our backend message broker, resulting in degraded performance. The team worked quickly to identify and limit the source of the traffic flood. Once the flood was halted, the team worked to relieve broker resource pressure before reinstating general access to the system. We are evaluating improvements to fortify our throttling capability to prevent this issue from recurring (a sketch of this kind of throttling follows this incident's details).
Status: Resolved
Impact: Minor | Started At: Nov. 27, 2023, 8:35 p.m.
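DrChrono has not described its throttling design in detail; as a hedged sketch only, a per-client token bucket is one common way to blunt a traffic flood at the API edge. The rate, burst size, and client identifier below are assumptions, not DrChrono's configuration.

```python
import time
from collections import defaultdict

# Illustrative sketch only: a token bucket per API client caps the sustained
# request rate while allowing short bursts.

RATE = 10.0   # hypothetical: tokens replenished per second, per client
BURST = 20.0  # hypothetical: maximum bucket size (burst allowance)

_buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})


def allow_request(client_id: str) -> bool:
    """Return True if the client may proceed, False if it should be throttled."""
    bucket = _buckets[client_id]
    now = time.monotonic()
    # Refill tokens based on elapsed time, capped at the burst size.
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["last"]) * RATE)
    bucket["last"] = now
    if bucket["tokens"] >= 1.0:
        bucket["tokens"] -= 1.0
        return True
    return False


if __name__ == "__main__":
    allowed = sum(allow_request("api-key-123") for _ in range(50))
    print(f"{allowed} of 50 burst requests allowed")  # roughly BURST allowed
```

A caller that exhausts its bucket would typically receive an HTTP 429 response, so the excess traffic never reaches the message broker.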
Description: **RCA for DrChrono application outage on 18-October-2023**

**Issue Start Date/Time:** 18-Oct-2023 at 10:50 AM ET
**Issue Resolution Date/Time:** 18-Oct-2023 at 6:30 PM ET

**Issue Summary:** Customers experienced 503 errors and slow performance while logging into the DrChrono web application from 10:50 AM ET on Oct 18th, 2023. Service was fully restored by 6:30 PM ET on Oct 18th, 2023.

**How were customers impacted?** Customers could not log into or use the DrChrono web application, mobile application, or public APIs.

**Root Cause:** At 10:50 AM ET, two systems hosted by our cloud provider became degraded and intermittently unavailable. Due to the interplay of the two systems, customers had an often severely degraded experience in the window mentioned above. This was not caused by a change made by the DrChrono team but was a point-in-time event due to apparent hardware failure.

The first system that failed backs our audit log storage, and its failure prevented any action that would have been written to the audit log. This was determined to be caused by disk pressure, likely from a bad disk in the cloud provider's environment, as write latency was very high while the volume of writes was normal. The pressure created by this degradation caused the second system, our background task processor, to fail to complete tasks and build up work to the degree that it deadlocked. Of note, the number of tasks was within expected bounds and should _not_ have caused this degradation, but it did nonetheless – our cloud provider has confirmed this to be a bug in the version they provide to us.

To ensure writes are processed for all incoming requests and to provide a high level of data integrity, all requests check that the background task processor is available. With the background task processor intermittently degraded, all requests that were unable to reach it returned a 503 error. Both systems therefore needed repair to return to healthy service.

Both systems were configured for high availability and had automatic failover. The systems' attempts at automatically fixing the issue, and cloud provider dashboards reporting as healthy, masked the true issues for some time. Additionally, the background task processor is configured to add more instances when it degrades or queues build; in this instance, that counterintuitively made its performance _worse_ due to the bug mentioned above. Because the root causes were masked, significant investigation was performed by the DrChrono _and_ cloud provider support teams, and it took several hours to pinpoint the true root causes and the necessary resolutions. Once found, the resolutions were performed.

**Resolution(s):** At roughly 4:00 PM ET, the audit log datastore was reprovisioned with more CPU, RAM, and storage – not to provide more resources, but to force the instance to shift from failed to healthy hardware within the cloud provider. Within 15 minutes, the new instance was up, running, and healthy, keeping up with the same volume of writes that had just been failing. Unfortunately, the background task processor was still in a failed state due to the bug mentioned above and the pressure created while the audit log datastore was intermittently out of service. With page loads checking on background task processor health, DrChrono still had issues serving customer requests.

At roughly 6:15 PM ET, it was determined that _reducing_ the number of servers performing tasks for the background task processor would put less pressure on the bug causing deadlocks. Within 10 minutes of reducing the number of servers, the background task processor recovered and service returned to normal. We did receive reports of slowness for roughly 10 minutes, but the data we collected indicates this was the system continuing to recover. Based on our own testing, monitoring, and data, we consider DrChrono to have returned to healthy service at 6:30 PM ET.

Throughout the investigation and resolution, we noted many opportunities to implement processes, changes, and enhanced monitoring to prevent this from occurring in the future.

**Mitigation steps planned/taken:**

1. Review and implement configuration to set the number of servers for the background task processor to an ideal level to avoid the bug experienced. This has been completed as of 10/18.
2. Replace the existing background task processor with a system that handles our volume of tasks more performantly and does not have defects that affect our workloads. This is in progress and is our highest-priority open item. We expect it to be completed in the next 30 days, with temporary relief implemented via item #1 above.
3. Introduce a buffer for audit log writes so that if audit log storage is unavailable, operations can continue and no data loss occurs. This is in progress, and we expect it to be completed in the next 60 days (a minimal sketch follows this incident's details).
4. Improve alerts for the issues experienced above and other issues that may occur in these secondary systems within the DrChrono infrastructure stack. This is in progress, and we expect it to be completed in the next 30 days.
Status: Postmortem
Impact: None | Started At: Oct. 18, 2023, 3:07 p.m.
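Mitigation #3 above (buffering audit log writes) is described only at a high level. The sketch below is a minimal illustration, not DrChrono's implementation, assuming an in-memory queue drained by a background worker so the request path never blocks on the audit datastore.

```python
import queue
import threading
import time

# Illustrative sketch only: requests append audit entries to a buffer and
# return immediately; a background worker drains the buffer to the audit
# datastore, re-queuing entries when writes fail, so a storage outage does
# not fail the request path and no events are dropped.

_audit_buffer: "queue.Queue[dict]" = queue.Queue()


def record_audit_event(event: dict) -> None:
    """Called on the request path; never blocks on the audit datastore."""
    _audit_buffer.put(event)


def _drain_forever(write_to_datastore) -> None:
    """Background worker: persist events, re-queue and back off on failure."""
    while True:
        event = _audit_buffer.get()
        try:
            write_to_datastore(event)
        except Exception:
            _audit_buffer.put(event)  # keep the event; retry later
            time.sleep(1.0)


if __name__ == "__main__":
    written = []
    worker = threading.Thread(target=_drain_forever, args=(written.append,), daemon=True)
    worker.start()
    record_audit_event({"action": "chart_viewed", "user": 42})
    time.sleep(0.1)
    print(written)
```

In practice a durable buffer (local disk or a separate queueing service) would be preferable so buffered events survive a process restart; the in-memory queue here only illustrates the decoupling.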
Description: The issue with custom data fields has now been fixed by the DrChrono engineering team. We are sorry for any inconvenience this may have caused. Please reach out to support if you believe the issue has not been resolved for you.
Status: Resolved
Impact: None | Started At: Sept. 29, 2023, 2:34 p.m.