Get notified about any outages, downtime or incidents for AskNicely and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for AskNicely.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
Sign Up Now

OutLogger tracks the status of these components for AskNicely:
| Component | Status |
| --- | --- |
| AskNicely Application | Active |
View the latest incidents for AskNicely and check for official updates:
Description: We experienced a brief period of application unavailability due to malicious requests causing high server load. This issue has been resolved.
Status: Resolved
Impact: None | Started At: Jan. 25, 2021, 12:30 p.m.
Description: We experienced an application outage starting at 11:20 AM due to a database issue. This incident was resolved within an hour.
Status: Resolved
Impact: Major | Started At: Aug. 24, 2020, midnight
Description: All AskNicely services are back to normal.
Status: Resolved
Impact: Major | Started At: Feb. 21, 2019, 2:51 a.m.
Description:

## The 502 error

Today a number of customers may have experienced a 502 error and were not able to access the AskNicely platform. We are super proud of the platform we have built, and when we let our customers down, we know we need to do a better job; it really hurts. We are sorry you were not able to access our platform. Very sorry. We have a fantastic engineering team, and over the next week we will be focusing on our infrastructure to help minimise outages like the one you may have seen today.

## What went wrong

AskNicely is built on AWS (Amazon), an amazing platform which allows us to scale our solution very easily. Today we hit an issue with extremely heavy load on our USA database server (RDS). The symptoms we saw:

* 502 error rates
* Load balancer errors: "unhealthy web server in load balancer pool"
* Database load in RDS going from under 5% to 100% in a matter of seconds. Very abnormal.
* Our 502 error page did not tell our customers what was happening, nor link to our status page. Bad.

## What went right

We have extensive monitoring on AskNicely, and some fantastic services that we love kicked in as soon as they detected something abnormal. The services we use today:

* [PagerDuty.com](http://PagerDuty.com) We love PagerDuty: the mobile app, email, SMS and automated phone calls for alerting, plus auto-escalation policies to other team members.
* [Datadog.com](http://Datadog.com) provides us with detailed metrics around our application performance and servers. We send a massive amount of data back to Datadog, and it's a valuable asset that we use for real-time monitoring and debugging.
* [Loggly.com](http://Loggly.com) All our log files and error logs are managed in Loggly. We can easily visualise and quantify requests from customers in seconds using their powerful log query tool.
* [NewRelic.com](http://NewRelic.com) provides incredibly detailed analysis of which parts of our application are used the most, how well that code is performing and which parts of the code are the slowest. It also monitors how long our application takes to load for our customers. We absolutely love NewRelic, and it is our litmus test to see whether our code changes have resolved our issues or not.
* [Slack.com](http://Slack.com) It makes it so easy for our team to stay on the same page and communicate instantly, no matter where we are in the world.
* [Statuspage.io](http://Statuspage.io) You can find a link to our status page from the [www.asknicely.com](http://www.asknicely.com) homepage and our 404 pages.

## What we discovered

During this time, we came under very heavy API load from one customer. Normally our API rate limiter would kick in and prevent any single customer from causing an outage, but due to the size of this customer's dataset, our API was too slow to respond to all their requests, causing massive congestion. Our API rate limiter is tuned for the number of requests, not the time taken to process a request.
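As an aside for readers, here is a minimal sketch of what a rate limiter budgeted on processing time, rather than raw request count, could look like. The class name, thresholds and the `handle_api_request` wrapper are illustrative assumptions, not AskNicely's actual code.

```python
import time
from collections import defaultdict

class TimeBudgetLimiter:
    """Per-tenant limiter that budgets processing time instead of request count."""

    def __init__(self, budget_seconds=5.0, window_seconds=60.0):
        self.budget = budget_seconds      # processing time allowed per window
        self.window = window_seconds
        self.usage = defaultdict(list)    # tenant_id -> [(timestamp, cost), ...]

    def allow(self, tenant_id):
        """True if the tenant still has processing-time budget left in the window."""
        now = time.monotonic()
        recent = [(t, c) for t, c in self.usage[tenant_id] if now - t < self.window]
        self.usage[tenant_id] = recent
        return sum(c for _, c in recent) < self.budget

    def record(self, tenant_id, elapsed_seconds):
        """Charge the actual time a completed request took to process."""
        self.usage[tenant_id].append((time.monotonic(), elapsed_seconds))


limiter = TimeBudgetLimiter()

def handle_api_request(tenant_id, handler):
    """Reject requests once a tenant has used up its time budget, so one tenant
    with slow requests cannot saturate the shared database."""
    if not limiter.allow(tenant_id):
        return 429, "Rate limit exceeded"
    start = time.monotonic()
    try:
        return 200, handler()
    finally:
        limiter.record(tenant_id, time.monotonic() - start)
```

Under this scheme, a tenant whose requests are slow to process exhausts its budget after a few calls, even if the raw request count stays low.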
## What we did

We have a number of strategies that we use to scale our platform. One strategy allows us to move a single customer from one database host (RDS instance) to another. Once we isolated the issue, this customer was moved to their own database instance. The AskNicely application instantly became responsive and all our server metrics returned to what we would consider normal parameters. We have also worked on several bottlenecks, including:

* Autoscaling our primary USA database server: we have tripled the capacity of this server, in both size and dedicated IOPS.
* We have increased our Redis instance, which provides a powerful and fast caching service for parts of the application, to 6x its previous capacity.
* We have changed several variables on our RDS instance to allow higher loads.
* We have added another application server to the server pool.

## What we are planning to do

* Add detailed API monitoring - time, frequency, tenant and database.
* Improve our API rate limiter.
* Refactor the API code that caused us issues, and most likely refactor a particular query that caused the heavy load on our database.
* Provide a way to gracefully degrade AskNicely so that core/key services are not affected.
* Improve our 502 error page to link to our StatusPage so we can give our customers more timely updates.

Again, we are sorry, and we are working hard to rectify these issues.

John // CTO and co-founder, AskNicely
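The per-customer database move described under "What we did" above amounts to a routing-table change: most tenants share a default RDS instance, and a heavy tenant can be remapped to a dedicated one. A minimal sketch of that idea follows; the hostnames, tenant IDs and helper name are hypothetical, not AskNicely's actual configuration.

```python
# Hypothetical tenant-to-database routing table. Hostnames and tenant IDs are
# illustrative only; a real deployment would load this mapping from config.

DEFAULT_DSN = "shared-rds.us-east-1.example.com/asknicely"

# Tenants that have been migrated off the shared RDS instance onto their own.
TENANT_DSN_OVERRIDES = {
    "tenant-42": "dedicated-rds-01.us-east-1.example.com/asknicely",
}

def dsn_for_tenant(tenant_id: str) -> str:
    """Resolve which database host should serve a given tenant's queries."""
    return TENANT_DSN_OVERRIDES.get(tenant_id, DEFAULT_DSN)

# Isolating a noisy tenant then becomes a configuration change: add an override
# entry, and new connections for that tenant go to the dedicated instance,
# leaving load on the shared pool unaffected.
```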
Status: Postmortem
Impact: None | Started At: Oct. 8, 2018, 3:13 p.m.
Description: We’ve identified endpoints that were not properly rate limited and, when receiving a high volume of traffic, were causing infrastructure issues. We’re working on rolling out better rate-limiting coverage to prevent further outages.
Status: Postmortem
Impact: Major | Started At: Oct. 5, 2018, 3:33 p.m.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage or downtime. Join for free - no credit card required.