Outage and incident data over the last 30 days for PandaDoc.
Outlogger tracks the status of these components for PandaDoc:
Component | Status |
---|---|
EU | Active |
API | Active |
Creating and editing documents | Active |
CRMs & Integrations | Active |
Mobile application | Active |
Public (recipient) view | Active |
Sending and opening documents | Active |
Signup | Active |
Uploading and downloading documents | Active |
Web application | Active |
Webhooks | Active |
Website | Active |
US & Global | Active |
API | Active |
Creating and editing documents | Active |
CRMs & Integrations | Active |
Mobile application | Active |
Public (recipient) view | Active |
Sending and opening documents | Active |
Signup | Active |
Uploading and downloading documents | Active |
Web application | Active |
Webhooks | Active |
Website | Active |
View the latest incidents for PandaDoc and check for official updates:
Description:

## A summary of what happened

At **14:01 PDT on Friday, April 7th**, our monitoring indicated that our Public API request rate had dropped and health checks were failing. The situation deteriorated rapidly: some of our API endpoints became unresponsive, which impacted the availability of the PandaDoc platform. We followed our protocol, immediately started our incident response procedure, rolled back recent updates, and put engineers on multiple investigation paths. After dismissing some initial theories, we understood that the issue was at the infrastructure level and began investigating it together with our cloud provider (AWS).

After a deep investigation that lasted several hours, we traced the issue to network problems: several pods on a specific Kubernetes node were experiencing intermittent low-level network issues that caused connection leaks (connections were repeatedly opened without being closed, or only partially closed). This eventually led to increased latency and memory consumption and caused some of our core services to enter a chain of crashes. As a consequence, the application and API were not available during the downtime. Once the root cause was identified, the broken machine was removed from the cluster and the system resumed normal operation. The issue was fully resolved by **01:23 PDT, April 8**.

## A deep dive - how we investigated the root cause

When the incident started, we noticed a spike in the number of connections in our database pool, with many API calls waiting for connections to be released before they could process incoming requests. We quickly determined that connections were not being released because a large number of uncommitted transactions were sitting idle. We then analyzed database locks and deadlocks, since those are the usual causes of this behavior, and wrote a hotfix for one of our API endpoints to reduce the number of processed events, expecting this would release connections faster.

Soon after, we understood that the database was not the bottleneck, although stalled transactions were still growing and connections in the pool were being taken and not released. A deeper analysis of API endpoint metrics revealed that external calls made inside transactions could be the culprit. Further investigation showed a common trait among the unresponsive API calls: they all interacted with our message queue (a RabbitMQ HA cluster). The RabbitMQ cluster had been running without disruption for the previous 1.5 years, and monitoring showed nothing suspicious. It did not seem like a likely cause, since queues process messages independently and asynchronously (which is why they are used to offload tasks for later execution), but we still decided to look into it more closely.

After analyzing the machines in the cluster and connecting to them directly, we saw that they were periodically shutting down and reloading, although this was not visible in the cluster monitoring on our Grafana dashboards, nor did we receive any alerts. Because the message queue was unresponsive, API calls sat waiting for a broker connection, which kept their database transactions open, which increased the number of blocked connections in the database connection pool, which in turn left other API requests waiting indefinitely for a new connection: a loop that caused a chain of failures.
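The failure mode described in the deep dive (a synchronous broker call held open inside a database transaction) can be illustrated with a minimal sketch. The libraries (psycopg2, pika), pool size, table, and queue names below are assumptions for illustration only, not PandaDoc's actual code:

```python
# Minimal sketch of the anti-pattern described above: publishing to
# RabbitMQ *inside* an open database transaction. If the broker stalls,
# the transaction stays uncommitted and the pooled connection is never
# returned, starving other requests. Names and libraries are illustrative
# assumptions, not PandaDoc's actual code.

import psycopg2.pool
import pika

db_pool = psycopg2.pool.SimpleConnectionPool(
    minconn=1, maxconn=20, dsn="dbname=app user=app"
)

def handle_request(document_id: int) -> None:
    conn = db_pool.getconn()          # one of only 20 pooled connections
    try:
        with conn:                    # opens a transaction, commits on exit
            with conn.cursor() as cur:
                cur.execute(
                    "UPDATE documents SET status = 'sent' WHERE id = %s",
                    (document_id,),
                )
                # External call while the transaction is still open.
                # If RabbitMQ is unresponsive, this call hangs, the UPDATE
                # stays uncommitted, and the pool slot is held the whole time.
                mq = pika.BlockingConnection(
                    pika.ConnectionParameters("rabbitmq.internal")
                )
                channel = mq.channel()
                channel.basic_publish(
                    exchange="",
                    routing_key="document.sent",
                    body=str(document_id),
                )
                mq.close()
    finally:
        # Only runs after the publish returns, so the connection is not
        # released until the broker call completes or fails.
        db_pool.putconn(conn)
```

With a pool of 20 connections, only a handful of hung broker calls are enough to exhaust the pool and leave every other API request waiting, which matches the chain of failures described above.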
We immediately started addressing the situation by scaling the cluster up vertically, upgrading the machines it runs on with more processing power and networking capacity. After the upgrade, we added additional monitoring metrics to the cluster. In parallel, we pursued an investigation into the probable root cause: intermittent networking issues on a Kubernetes node that were causing pods on that node to repeatedly open connections without closing them. We dug deeper and concluded that an underlying networking issue was the most probable root cause after we observed and correlated several facts:

* Metrics were randomly missing from our Prometheus monitoring across several systems, coinciding in time with the degradation of the RabbitMQ cluster metrics (the number of sockets started growing linearly).
* All pods on one particular Kubernetes node (added to the cluster on Friday morning) were having trouble connecting to other parts of the system (our NATS cluster). We also noticed error patterns in logs related to closed network connections and client timeouts, in numbers higher than normal. At the same time, the number of slow NATS consumers had been growing abnormally since the start of the incident.
* Most of the connections to the RabbitMQ nodes during the incident period were coming from pods residing on the faulty node.

Once the broken machine was removed from the cluster, the system resumed normal operation. To sum up: we consider the main cause of the incident to be a problem with an AWS EC2 instance provisioned as part of our EKS (managed Kubernetes) cluster during the normal process of a release. Network-related errors on that instance caused a number of connection issues on the RabbitMQ cluster, leading to a chain failure.

## What we have done and will be doing next

As our investigation wraps up, we want to highlight our continuous improvement mindset and provide clarity on what we are doing to improve our systems:

* We have improved the robustness and scale of our RabbitMQ cluster to reduce the likelihood of failure when the number of network connections grows, and reviewed the HA RabbitMQ setup and its replication settings.
* We have added additional logging and metrics to our RabbitMQ cluster, as well as early-detection alarms for any deviation in the cluster's network traffic patterns.
* We have engaged AWS in the investigation and resolution of this outage; AWS support is running its own investigation into the issue.
* We will make further improvements to our observability stack, reviewing which additional metrics we can add to improve the detection of underlying problems in AWS-managed services (e.g. EKS), reduce alerting noise, and ensure certain alerts are highlighted (RabbitMQ / failing pods).
* As an additional step to prevent this in the future, we are planning to review all the external calls in our API handlers and move them to a transactional outbox, so that transactions are not blocked when external services become unavailable (see the sketch after this list).
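A minimal sketch of the transactional outbox approach mentioned in the last point, assuming a relational `outbox` table and a separate relay process. The schema, names, and libraries (psycopg2, pika) are illustrative assumptions rather than PandaDoc's implementation:

```python
# Transactional outbox sketch, assuming an `outbox` table with columns
# (id, routing_key, payload, sent_at). Schema, names, and libraries are
# illustrative assumptions, not PandaDoc's implementation.

import psycopg2
import pika

def handle_request(conn, document_id: int) -> None:
    """Write the business change and the event in ONE local transaction.
    No external call is made while the transaction is open, so a broker
    outage can no longer exhaust the database connection pool."""
    with conn:
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE documents SET status = 'sent' WHERE id = %s",
                (document_id,),
            )
            cur.execute(
                "INSERT INTO outbox (routing_key, payload) VALUES (%s, %s)",
                ("document.sent", str(document_id)),
            )

def relay_once(conn, channel) -> None:
    """Separate relay process: publish pending outbox rows, then mark them
    as sent. Broker slowness only delays delivery here; it never blocks
    the API request path."""
    with conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, routing_key, payload FROM outbox "
                "WHERE sent_at IS NULL ORDER BY id LIMIT 100 "
                "FOR UPDATE SKIP LOCKED"
            )
            for row_id, routing_key, payload in cur.fetchall():
                channel.basic_publish(
                    exchange="", routing_key=routing_key, body=payload
                )
                cur.execute(
                    "UPDATE outbox SET sent_at = now() WHERE id = %s",
                    (row_id,),
                )
```

The trade-off is at-least-once delivery (the relay may republish a row if it crashes between publishing and marking it sent), so consumers need to be idempotent, but an unavailable broker can no longer hold API transactions open.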
Status: Postmortem
Impact: Critical | Started At: April 7, 2023, 9:01 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Critical | Started At: March 31, 2023, 4:08 p.m.
Description: We're all set! If you continue to experience any issues with this, please reach out to us at [email protected].
Status: Resolved
Impact: Major | Started At: March 20, 2023, 5:15 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: March 16, 2023, 12:10 a.m.
Description: We're all set! Users can now create documents via API. If you continue to experience any issues with this, please reach out to us at [email protected].
Status: Resolved
Impact: Major | Started At: Feb. 13, 2023, 5:23 p.m.