Outage and incident data over the last 30 days for SYNAQ.
OutLogger tracks the status of these components for SYNAQ:
| Component | Status |
|---|---|
| SYNAQ Archive | Active |
| SYNAQ Branding | Active |
| SYNAQ Cloud Mail | Active |
| SYNAQ Continuity | Active |
| SYNAQ Q Portal | Active |
| SYNAQ Securemail | Active |
View the latest incidents for SYNAQ and check for official updates:
Description: **Summary and Impact to Customers** On Tuesday 25th January 2022 from 11:30 to 16:38, SYNAQ Cloud Mail experienced an authentication incident affecting a subset of users. The resultant impact of the event was that certain users were unable to authenticate and access their mailboxes. **Root Cause and Solution** The root cause of this event was that Virtual Machine (VM) backup snapshots on two Cloud Mail mailbox stores were not removed as part of a maintenance task. As part of our routine nightly backup processes, or when server maintenance is being performed, backup snapshots are created for each VM to ensure quick rollback and minimal impact to clients should there be an issue during maintenance or otherwise. Once a maintenance task has been completed, the snapshots are removed manually. If snapshots are not removed, the performance of the VM slowly degrades over time. Human error resulted in snapshots not being removed from two of the stores after maintenance was performed on the preceding Thursday. Whilst a missed snapshot removal is not normally a problem in isolation, a further error occurred with our monitoring scripts that check for this condition – they did not detect that these VMs had old, out-of-date snapshots, and therefore no alerts were raised. As such, these snapshots started to impact the performance of the stores on Monday. To resolve this degradation, the snapshots were removed on Tuesday morning. Five of the seven snapshots were removed successfully; however, two did not complete their consolidation step, which degraded the VMs' performance even further and forced them into an unresponsive state. To resolve this, a manual consolidation of both VMs had to be run. Due to business-day load and the degraded performance of these VMs, this took a couple of hours to complete. **Remediation Actions** • Process improvements will be made to the existing maintenance process to ensure that the snapshot removal task is triple-checked for completion. • Snapshot monitoring checks to be repaired and updated to alert on old snapshots. • Snapshot removal to only take place outside of business hours, despite possible degradation of VM performance. An illustrative snapshot-age check follows this incident record.
Status: Postmortem
Impact: Major | Started At: Jan. 25, 2022, 9:59 a.m.
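The remediation above calls for monitoring that alerts on stale VM snapshots. The sketch below is illustrative only and is not SYNAQ's actual tooling; it assumes a hypothetical `list_snapshots()` helper that returns each snapshot's name and creation time from the virtualisation platform's API, and the threshold and store names are invented for the example.

```python
"""Illustrative sketch: alert on VM snapshots older than a threshold.

Assumption: list_snapshots(vm_name) is a hypothetical helper that queries the
virtualisation platform and returns (snapshot_name, created_at) tuples.
"""
from datetime import datetime, timedelta, timezone

MAX_SNAPSHOT_AGE = timedelta(hours=24)  # assumed threshold: nightly snapshots should be gone within a day


def list_snapshots(vm_name):
    """Hypothetical placeholder: query the hypervisor API for this VM's snapshots."""
    raise NotImplementedError("replace with the platform-specific API call")


def stale_snapshots(vm_names, now=None):
    """Return (vm, snapshot_name, age) for every snapshot older than the threshold."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for vm in vm_names:
        for name, created_at in list_snapshots(vm):
            age = now - created_at
            if age > MAX_SNAPSHOT_AGE:
                stale.append((vm, name, age))
    return stale


if __name__ == "__main__":
    # Hypothetical mailbox-store names, for illustration only.
    for vm, snap, age in stale_snapshots(["mailstore-01", "mailstore-02"]):
        print(f"ALERT: {vm} snapshot '{snap}' is {age} old and should be consolidated")
```

In practice a check like this would run on a schedule and raise an alert through the monitoring system rather than print, so that a missed manual removal surfaces before VM performance degrades.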
Description: **Summary and Impact to Customers** On Wednesday 13th October 2021 from 12:03 to 14:06, SYNAQ Cloud Mail experienced a mail authentication incident. The resultant impact of the event was that users received an authentication pop-up message when trying to log in and experienced slow access to mail. **Root Cause and Solution** The root cause of this event was an abnormal number of modification requests being received from the SYNAQ API. The increased writes and load on the master LDAP server prevented the LDAP replicas from replicating the LDAP changes from the master. Since the LDAP replicas are responsible for servicing authentication and mail delivery requests, and were unable to sync correctly from the master LDAP server in this instance, authentication and mail delivery requests were not processed. To resolve this issue, the primary server and all replicas had to be restarted to re-establish syncing between the LDAP servers. Once all the LDAP servers were in sync, authentication requests were once again processed and users could access their mail. **Remediation Actions** • More stringent limits are to be put in place on the API to restrict the number of requests that can come in from a single IP address, preventing increased load on the LDAP servers. • The SYNAQ infrastructure teams are working in collaboration with the third-party vendor to improve and harden our LDAP configuration. An illustrative per-IP rate-limiting sketch follows this incident record.
Status: Postmortem
Impact: Critical | Started At: Oct. 13, 2021, 10:31 a.m.
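The first remediation item above is a per-source-IP request limit on the API. The sketch below is not SYNAQ's implementation; it illustrates one common approach, a sliding-window counter per client IP, using only the Python standard library. The window size, request limit, and example IP are assumed values.

```python
"""Illustrative per-IP sliding-window rate limiter; not SYNAQ's actual API code."""
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # assumed window
MAX_REQUESTS = 100    # assumed per-IP limit within the window


class PerIpRateLimiter:
    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Return True if this request from `ip` is within the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen outside the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # the caller would respond with HTTP 429
        hits.append(now)
        return True


# Usage sketch: consult the limiter before passing a request to the provisioning API.
limiter = PerIpRateLimiter()
if not limiter.allow("198.51.100.7"):
    print("429 Too Many Requests")
```

Capping per-IP write bursts at the API edge keeps a misbehaving integration from flooding the master LDAP server with modifications faster than the replicas can consume them.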
Description: **Summary and Impact to Customers** From Tuesday 22nd June 2021 at 14:48 to Monday 28th June 2021 at 14:55, SYNAQ Cloud Mail experienced an intermittent, degraded mail authentication incident. The resultant impact of the event was that certain users would receive authentication pop-up messages when trying to log in via HTTPS, POP3/S, IMAP/S or SMTP/S, as well as slow access to webmail.

**Root Cause and Solution** On the 22nd of June 2021 at 14:48, SYNAQ Cloud Mail began to experience incoming mail delays. The delay occurred at the Zimbra MTA (Mail Transfer Agent) layer. Once identified, we attempted a series of fixes to resolve the incident. At 15:30 a change was made to increase the processing threads of the MTA servers from 100 to 150, increasing the amount of mail an MTA handles at any one time when trying to deliver mail. The change was made in an attempt to speed up processing of the mail building up in the queue; however, it did not have the desired effect. At 16:30 a new Exim server was added to the MTA cluster (a project that was going to take place within the next couple of weeks – as detailed below in the root cause section), and this server was able to process mail with no delays. We ceased new mail delivery to the MTAs, one at a time, until they cleared their queues. Thereafter, normal mail flow was restored by 16:51.

On the 23rd of June 2021 at 09:09, mail delays re-occurred, coupled with a select group of users receiving authentication failure "pop-up" messages when trying to log in to their mailboxes. At 09:52 debugging was performed on the anti-virus functions on the MTA servers, as this appeared to be where the delay was taking place. Configuration changes were then made to timeout settings and processing times to try to resolve the issue; unfortunately, this did not have the desired effect. At 10:15 the replacement of the rest of the current Cloud Mail MTAs with new Exim servers commenced, because the one Exim server already in production was processing mail without delay. This was fully completed by 20:30; mail delays and authentication had recovered by 17:14.

On the 24th of June 2021 at 09:41, a select group of users were receiving authentication pop-up messages when trying to log in and experienced slow access to mail. As a result, at 10:17 we moved our focus from the MTAs to the LDAP servers, as mail flow was no longer affected following the change to the new Exim servers. A data dump of the master LDAP database was performed and reloaded on all the replicas to rule out any memory page fragmentation (a performance-inhibiting side-effect) across these LDAP servers. At 13:00 file system errors were discovered on the master LDAP server. All Cloud Mail servers were adjusted to point to the secondary master, and a file system check and repair was run on the primary LDAP master. While the server was down, its memory and CPU resources were increased. By 13:10 mail authentication and slow access had recovered.

On the 25th of June 2021 at 09:22, a select group of users were receiving authentication pop-up messages when trying to log in and experiencing slow access to mail. At 10:35, TCP connection and timeout settings were adjusted on all the LDAP servers. At 12:35, connection tracking was disabled on the load balancer, so that if there was a problem with an individual LDAP replica, connections would move seamlessly to another replica. Mail authentication and slow access then recovered.

On the 26th and 27th of June we experienced no further recurrences of the issues. On the 28th of June at 09:30, a select group of users were receiving authentication pop-up messages when trying to log in and experienced slow access to mail. At 10:00, two new LDAP replicas were built to be added to the cluster. At 11:03, the global address list feature was turned off (for classes of service with large domains that did not need this feature) to try to reduce traffic to the LDAP servers. At 13:05, we deleted 30 data sources (external account configurations) that were stored in LDAP but were showing errors during LDAP replication. At 14:30, the two new LDAP servers were added, and each unique component of Cloud Mail – stores, MTAs, and proxies – was pointed to its own unique set of LDAP replicas. At 14:55, mail authentication and slow access had recovered.

The root cause of this event was a project we initiated last year to replace the standard Zimbra MTAs with custom-built Exim MTAs. The purpose of this project was to vastly increase the security and delivery of clients' mail. The initial project phase (last year) was to replace the outbound servers, with the inbound servers to follow in July. A test inbound server was added, and this resulted in the start of the experienced issues. In addition, replacing all of the remaining MTAs with the new inbound servers in an attempt to resolve the issue only exacerbated the problem. The problem introduced was that all servers native to Zimbra establish a persistent connection through to the LDAP servers, whereas the new MTAs, introduced to reduce load and traffic to the LDAP servers, establish short-term connections. The load balancer tried to handle both connection patterns in the same way; it would overload a single LDAP server and then affect the rest in a cascading manner as the load was redistributed. To resolve this issue, two different load balancer IP addresses were configured, each with its own separate LDAP servers behind it: one to manage persistent connections and the other to manage short-term connections. Thereafter, the relevant servers were pointed to the load balancer IP that suits how they communicate and connect to LDAP.

**Remediation Actions** • Two additional LDAP replicas have been built and added to the LDAP cluster. • Two different load balancer IP addresses have been configured, each with its own separate LDAP servers behind it: one to manage persistent connections and one to manage short-term connections. Thereafter, the relevant servers were pointed to the load balancer IP that suits how they communicate through to LDAP. • A third load balancer IP will be added to improve LDAP redundancy. This will allow store servers to attempt a new connection rather than remaining connected to an LDAP server that is no longer responding. An illustrative LDAP replication-health check follows this incident record.
Status: Postmortem
Impact: Critical | Started At: June 23, 2021, 7:09 a.m.
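Both this incident and the October one above hinged on LDAP replicas falling behind or becoming unresponsive behind a load balancer. The sketch below is not part of SYNAQ's described remediation; it shows one way to watch OpenLDAP replica health by comparing the contextCSN operational attribute of each replica's suffix entry against the master, shelling out to the standard ldapsearch tool. The host names and base DN are placeholders, and comparing CSNs lexically is a simplification that works because they begin with a timestamp.

```python
"""Illustrative OpenLDAP replication-lag check; hosts and base DN are placeholders."""
import subprocess

BASE_DN = "dc=example,dc=com"          # placeholder suffix
MASTER = "ldap://ldap-master.example"  # placeholder master URI
REPLICAS = ["ldap://replica1.example", "ldap://replica2.example"]  # placeholder replicas


def context_csn(uri):
    """Read the suffix entry's contextCSN via ldapsearch (anonymous simple bind)."""
    result = subprocess.run(
        ["ldapsearch", "-x", "-LLL", "-H", uri, "-b", BASE_DN, "-s", "base", "contextCSN"],
        capture_output=True, text=True, timeout=10,
    )
    if result.returncode != 0:
        return None  # treat an unreachable or erroring server as unhealthy
    csns = [line.split(":", 1)[1].strip()
            for line in result.stdout.splitlines() if line.startswith("contextCSN:")]
    return max(csns) if csns else None


if __name__ == "__main__":
    master_csn = context_csn(MASTER)
    for replica in REPLICAS:
        replica_csn = context_csn(replica)
        if replica_csn is None:
            print(f"ALERT: {replica} is not responding to LDAP queries")
        elif master_csn and replica_csn < master_csn:
            # contextCSN values begin with a timestamp, so string order tracks recency.
            print(f"ALERT: {replica} is behind the master ({replica_csn} < {master_csn})")
```

A periodic check along these lines would flag a replica that has stopped syncing or stopped answering, so that it can be pulled from the load balancer pool before authentication and mail delivery requests start failing.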