-
Time: Dec. 22, 2023, 10:19 p.m.
Status: Resolved
Update: Dear customers,
I'm writing to give you the final update on the recent incident, which we are now marking as resolved.
At around 7pm UTC on Wednesday 20th December, a change was rolled out to all customers. Since then we've observed stable response times and performance, even during periods of high traffic on the platform. This has given us confidence that we've addressed the root cause and solved the underlying issue.
During this process, we identified other areas of improvement and have already rolled out several changes which will provide longer term benefits to our customers. We'll continue to double down on further areas of improvement off the back of this.
We sincerely apologise for any inconvenience caused to you and your teams, especially as we approach the holiday season.
Best wishes,
Danny Hannah
CTO
-
Time: Dec. 21, 2023, 3:56 p.m.
Status: Monitoring
Update: Yesterday's deployment has brought a significant improvement in performance and stability. We're continuing to monitor activity closely and make additional improvements. We will leave this incident open whilst we observe the coming days and provide updates accordingly.
-
Time: Dec. 20, 2023, 7:02 p.m.
Status: Monitoring
Update: We have deployed a change which looks promising. We will continue to monitor and provide any updates in due course. We apologise for the continued inconvenience this has caused.
-
Time: Dec. 20, 2023, 2:37 p.m.
Status: Monitoring
Update: We're continuing to see some issues with certain customers' response times and are working to restore stability as soon as possible.
-
Time: Dec. 19, 2023, 5:12 p.m.
Status: Monitoring
Update: We've managed to stabilise the system and will continue to monitor as we work towards a root cause. We apologise for the continued disruption.
-
Time: Dec. 19, 2023, 2:13 p.m.
Status: Investigating
Update: Unfortunately we've started to see a recurrence of slow response times, and are working to mitigate as soon as possible.
-
Time: Dec. 17, 2023, 2:06 p.m.
Status: Monitoring
Update: Over the weekend, the Convertr team has migrated all customers to a new cluster and performed a series of stress tests which have all been successful.
We will continue to monitor as customer workloads increase in the coming days and update the incident accordingly.
We are continuing to work with AWS on the root cause, which appears to be related to the underlying AWS cluster. We will provide more details in due course.
We thank you for your continued support, and apologise for the disruption which has been caused.
Best regards
Danny Hannah
CTO
-
Time: Dec. 16, 2023, 1:09 a.m.
Status: Investigating
Update: We are continuing to investigate this issue as a priority. We have migrated some customers to a new cluster and will continue to monitor the situation. The team are continuing to work on this issue over the weekend to restore stability as soon as possible.
-
Time: Dec. 15, 2023, 10:16 p.m.
Status: Investigating
Update: The team are continuing to investigate and explore options to restore service to normal.
-
Time: Dec. 15, 2023, 8:30 p.m.
Status: Investigating
Update: We are continuing to work with AWS and explore options to restore service as soon as possible. We are sorry for the inconvenience caused.
-
Time: Dec. 15, 2023, 7:24 p.m.
Status: Investigating
Update: We are continuing to work with AWS and explore options to restore service as soon as possible. We are sorry for the inconvenience caused.
-
Time: Dec. 15, 2023, 6:06 p.m.
Status: Investigating
Update: We are continuing to work with AWS and explore options to restore service as soon as possible. We are sorry for the inconvenience caused.
-
Time: Dec. 15, 2023, 4:16 p.m.
Status: Investigating
Update: Dear customers,
As we near the end of the week we wanted to provide you with an update on the intermittent platform issues you may have experienced starting from Tuesday 12th this week.
Around 2.06pm (UK) we had a partial outage. By 3.24pm we were fully operational again. Throughout the majority of that time, the system was available albeit with degraded performance. The cause of this was a recommended AWS configuration change.
On Wednesday 13th around 7.44am we reverted our status to Degraded Performance as we'd identified slower processing times overnight. We still believed this to be an AWS issue and continued working with their Support team on the investigation. At 3.14pm on the 13th we updated our status to Operational again, having implemented changes to work around the issues identified.
Overnight it became clear that these workarounds weren't having the desired effect and at 12.18pm on Thurs 14th we returned to a status of Degraded Performance. By 3.06pm we'd implemented more workarounds that appeared to have addressed the issue and we became Operational once again. Our work with AWS to monitor the situation and resolve the root cause was still ongoing throughout.
This morning (Fri 15th) we had to return to a Degraded Performance status. By this afternoon things were largely stabilised (although we know some customers are still experiencing intermittent issues). At the time of writing, we're still monitoring the situation and working with AWS on a fix for the root cause.
Despite the platform being available throughout almost the entirety of this issue, we understand how slower response times impact your work and we want to apologise for any inconvenience this has caused or is causing. Whilst issues persist, our advice is to wait for processing to complete rather than retrying.
We'd like to assure you that we're doing everything possible to restore stability and performance, and have a team working round the clock together with AWS Support.
The best way to stay updated with the latest information is to subscribe to our status page at https://status.convertr.io/ where you can see a full history of this incident together with updates as they're posted.
We thank you for your patience and will continue to work on this until full service is restored.
Kind regards
Danny Hannah
CTO
-
Time: Dec. 15, 2023, 9:31 a.m.
Status: Investigating
Update: Unfortunately we continue to see issues this morning after a sustained stable period. We are continuing to investigate as a priority.
-
Time: Dec. 14, 2023, 3:06 p.m.
Status: Monitoring
Update: We have implemented a fix and have seen positive signs of recovery. We are continuing to monitor the situation and work with AWS (our hosting provider) on the root cause.
-
Time: Dec. 14, 2023, 12:18 p.m.
Status: Investigating
Update: Unfortunately, mitigations put in place yesterday, which were monitored through the night, have improved stability but response times continue to be slow. The team are continuing to work on this issue as a priority. We are very sorry for the inconvenience and are working hard to restore normal service as soon as possible.
-
Time: Dec. 13, 2023, 3:14 p.m.
Status: Monitoring
Update: A fix has been implemented and we're monitoring the results. We will continue to provide updates.
-
Time: Dec. 13, 2023, 1:37 p.m.
Status: Identified
Update: This issue is ongoing. We're continuing to explore options to restore normal service as quickly as possible.
-
Time: Dec. 13, 2023, 11:58 a.m.
Status: Identified
Update: We are continuing to work on a resolution to this issue. We're very sorry for the inconvenience caused, and continue to work on this as a priority.
-
Time: Dec. 13, 2023, 11:04 a.m.
Status: Identified
Update: We are continuing to work on a resolution to this issue. We're very sorry for the inconvenience caused, and continue to work on this as a priority.
-
Time: Dec. 13, 2023, 9:11 a.m.
Status: Identified
Update: The team have been working with AWS and identified a potential issue. We are working with them to implement a resolution as soon as possible.
-
Time: Dec. 13, 2023, 7:44 a.m.
Status: Investigating
Update: The team are investigating slow response times on the Convertr application. We will keep this incident updated as we understand more.