Last checked: 6 minutes ago
Get notified about any outages, downtime or incidents for DANAConnect and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for DANAConnect.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
Sign Up Now
OutLogger tracks the status of these components for DANAConnect:
Component | Status |
---|---|
Site DANAConnect | Active |
API | Active |
Bulk Contacts Load API | Active |
Conversation API | Active |
Document Upload API | Active |
Do Not Contact List API | Active |
One Time Password API | Active |
SMTP Service | Active |
ETL, Audit & Reports | Active |
Audit Reports & Delivery Logs | Active |
Automated Contact Loading | Active |
Custom ETL Processing | Active |
Real-Time Reporting | Active |
Outbound Dispatchers | Active |
Director - Workflow Orchestration | Active |
Email platform | Active |
Webhooks (API Request Node) | Active |
SMS | Active |
SMS - Infobip API | Active |
SMS - Venezuela - Digitel | Active |
SMS - Venezuela - Telefónica Movistar | Active |
Third-Party Components & Services | Active |
AWS cloud9-us-east-1 | Active |
AWS cloud9-us-east-2 | Active |
AWS cloudfront | Active |
AWS ec2-us-east-1 | Active |
AWS ec2-us-east-2 | Active |
AWS route53 | Active |
AWS s3-us-east-2 | Active |
Web | Active |
Landing Pages | Active |
Portal Apps US-EST1 | Active |
Portal US-EST1 | Active |
Web Forms | Active |
View the latest incidents for DANAConnect and check for official updates:
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: Aug. 28, 2024, 2:32 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: None | Started At: April 24, 2024, 9:44 p.m.
Description: **Incident Overview:** At 1:07 AM on April 17th, our team encountered a significant issue impacting our ability to process bulk conversations efficiently. The root cause was identified as a failure in the master database, which disrupted normal operations and led to delays in query execution and failures in processors.

**Logs related to the incident:**

`watchdog: BUG: soft lockup - CPU#31 stuck`
`watchdog: BUG: soft lockup - CPU#31 stuck for 17964s! [scp:3499982]`
`[6791981.871733] Modules linked in: binfmt_misc raid0 xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay tls sunrpc nls_ascii nls_cp437 vfat fat ena ghash_clmulni_intel ptp aesni_intel i8042 serio pps_core crypto_simd cryptd button sch_fq_codel dm_mod fuse configfs loop dax dmi_sysfs crc32_pclmul crc32c_intel efivarfs`
`watchdog: BUG: soft lockup - CPU#32 stuck`
`[6837474.767511] watchdog: BUG: soft lockup - CPU#32 stuck for 544s! [xtrabackup:3496469]`
`[6837474.798684] Modules linked in: binfmt_misc raid0 xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay tls sunrpc nls_ascii nls_cp437 vfat fat ena ghash_clmulni_intel ptp aesni_intel i8042 serio pps_core crypto_simd cryptd button sch_fq_codel dm_mod fuse configfs loop dax dmi_sysfs crc32_pclmul crc32c_intel efivarfs`
`[6837474.999592] CPU: 32 PID: 3496469 Comm: xtrabackup Tainted: G L 6.1.72-96.166.amzn2023.x86_64 #1`
`[6837475.048435] Hardware name: Amazon EC2 m6idn.12xlarge/, BIOS 1.0 10/16/2017`
`[6837475.082238] RIP: 0010:xas_descend+0x16/0x80`
`[6837475.103266] Code: 07 48 c1 e8 20 48 89 57 08 c3 cc cc cc cc cc cc cc cc cc cc 0f b6 0e 48 8b 47 08 48 d3 e8 48 89 c1 83 e1 3f 89 c8 48 83 c0 04 <48> 8b 44 c6 08 48 89 77 18 48 89 c2 83 e2 03 48 83 fa 02 74 0e 88`

**Impact:** The impact of this incident was felt across multiple aspects of our operations:

* Delays in query execution: the failure in the master database resulted in significant delays in executing queries, slowing down our overall processing speed.
* Processor failures: due to the disruption in the database, several processors failed to function as expected, further exacerbating the processing delays.

**Timeline:**

* April 17th, 1:07 AM: Incident detected; delays in query execution and failures observed.
* April 17th, 1:20 AM: Immediate investigation initiated to identify the root cause.
* April 17th, 2:00 AM: Root cause identified as a failure in the master database.
* April 17th, 2:30 AM: Recovery efforts initiated to restore normal database functionality.
* April 17th, 8:42 AM: Database functionality restored; bulk conversation processing resumed.

**Resolution:** Upon identifying the root cause of the issue, our team immediately mobilized to address the situation and mitigate the impact on our operations. We initiated a series of troubleshooting steps to restore functionality to the master database and implemented temporary workarounds to minimize the impact on our processing capabilities.

**Maintenance Window Scheduled:** To ensure that the failure is not repeated, an emergency maintenance window was scheduled to perform a full recovery of the master database server on April 17th at 9:00 PM EDT.
Upon identifying the root cause, our team initiated recovery efforts to restore normal database functionality. This involved implementing backup systems and rerouting traffic to ensure minimal disruption to our services. Database functionality was successfully restored by April 18th at 12:06 AM.

**Lessons Learned:** This incident has provided us with valuable insights that will guide our future actions:

1. **Database Monitoring and Redundancy:** We recognize the need to enhance our database monitoring systems to detect issues proactively and to implement redundancy measures that ensure continuity of operations in the event of a failure.
2. **Communication Protocols:** Clear and timely communication is essential during incidents to keep all stakeholders informed about the situation, the steps being taken to address it, and the expected timelines for resolution.
3. **Resilience Testing:** Regular testing of our systems' resilience and failover mechanisms will help us identify potential weaknesses and ensure that we are adequately prepared to handle similar incidents in the future.

**Next Steps:** Moving forward, we are committed to implementing the necessary improvements to strengthen our infrastructure and processes, minimizing the likelihood of similar incidents in the future. We will also conduct a thorough review of our incident response procedures to identify areas for refinement and enhancement. We want to extend our sincere appreciation to everyone involved in responding to this incident; your dedication and expertise were instrumental in minimizing the impact on our operations and restoring functionality. If you have any further questions or concerns regarding this incident or our response efforts, please don't hesitate to reach out. Thank you for your understanding and continued support as we work together to ensure the reliability and resilience of our systems.
Status: Postmortem
Impact: Major | Started At: April 17, 2024, 10:49 a.m.
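The first lesson in the postmortem above calls for proactive database monitoring. As a rough illustration only, here is a minimal sketch of the kind of liveness probe such monitoring could start from; DANAConnect's actual tooling is not described, and the hosts, addresses, port, and thresholds below are hypothetical (the port assumes a MySQL/Percona-style stack, suggested only by the xtrabackup reference in the incident logs).

```python
import smtplib
import socket
import time
from email.message import EmailMessage

DB_HOST = "db-master.example.internal"  # hypothetical master database host
DB_PORT = 3306                          # assumed MySQL/Percona port
CHECK_INTERVAL = 30                     # seconds between probes
ALERT_AFTER_FAILURES = 3                # consecutive failures before alerting


def db_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the database port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def send_alert(subject: str, body: str) -> None:
    """Send a plain-text alert through a local SMTP relay (hypothetical addresses)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "monitor@example.internal"
    msg["To"] = "oncall@example.internal"
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


def main() -> None:
    failures = 0
    while True:
        if db_reachable(DB_HOST, DB_PORT):
            failures = 0
        else:
            failures += 1
            if failures == ALERT_AFTER_FAILURES:
                send_alert(
                    f"Master DB unreachable: {DB_HOST}:{DB_PORT}",
                    f"{failures} consecutive failed probes; investigate before "
                    "queries start backing up.",
                )
        time.sleep(CHECK_INTERVAL)


if __name__ == "__main__":
    main()
```

A probe like this only detects reachability; covering the soft-lockup scenario described above would also require replication-lag and query-latency checks plus a tested failover path, in line with the redundancy and resilience-testing lessons.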
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: Feb. 24, 2024, 9:36 p.m.
Description: This incident has been resolved.
Status: Resolved
Impact: Minor | Started At: Feb. 6, 2024, 1:52 p.m.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.