Outage and incident data over the last 30 days for imgix.
OutLogger tracks the status of these components for imgix:
Component | Status |
---|---|
API Service | Active |
Docs | Active |
Purging | Active |
Rendering Infrastructure | Active |
Sandbox | Active |
Stripe API | Active |
Web Administration Tools | Active |
Content Delivery Network | Active |
Amsterdam (AMS) | Active |
Ashburn (BWI) | Active |
Ashburn (DCA) | Active |
Ashburn (IAD) | Active |
Atlanta (ATL) | Active |
Atlanta (FTY) | Active |
Atlanta (PDK) | Active |
Auckland (AKL) | Active |
Boston (BOS) | Active |
Brisbane (BNE) | Active |
Buenos Aires (EZE) | Active |
Cape Town (CPT) | Active |
Chennai (MAA) | Active |
Chicago (CHI) | Active |
Chicago (MDW) | Active |
Chicago (ORD) | Active |
Columbus (CMH) | Active |
Content Delivery Network | Active |
Copenhagen (CPH) | Active |
Curitiba (CWB) | Active |
Dallas (DAL) | Active |
Dallas (DFW) | Active |
Denver (DEN) | Active |
Dubai (FJR) | Active |
Frankfurt (FRA) | Active |
Frankfurt (HHN) | Active |
Helsinki (HEL) | Active |
Hong Kong (HKG) | Active |
Houston (IAH) | Active |
Johannesburg (JNB) | Active |
London (LCY) | Active |
London (LHR) | Active |
Los Angeles (BUR) | Active |
Los Angeles (LAX) | Active |
Madrid (MAD) | Active |
Melbourne (MEL) | Active |
Miami (MIA) | Active |
Milan (MXP) | Active |
Minneapolis (MSP) | Active |
Montreal (YUL) | Active |
Mumbai (BOM) | Active |
Newark (EWR) | Active |
New York (JFK) | Active |
New York (LGA) | Active |
Osaka (ITM) | Active |
Palo Alto (PAO) | Active |
Paris (CDG) | Active |
Perth (PER) | Active |
Rio de Janeiro (GIG) | Active |
San Jose (SJC) | Active |
Santiago (SCL) | Active |
São Paulo (GRU) | Active |
Seattle (SEA) | Active |
Singapore (SIN) | Active |
Stockholm (BMA) | Active |
Sydney (SYD) | Active |
Tokyo (HND) | Active |
Tokyo (NRT) | Active |
Tokyo (TYO) | Active |
Toronto (YYZ) | Active |
Vancouver (YVR) | Active |
Wellington (WLG) | Active |
DNS | Active |
imgix DNS Network | Active |
NS1 Global DNS Network | Active |
Docs | Active |
Netlify Content Distribution Network | Active |
Netlify Origin Servers | Active |
Storage Backends | Active |
Google Cloud Storage | Active |
s3-ap-northeast-1 | Active |
s3-ap-northeast-2 | Active |
s3-ap-southeast-1 | Active |
s3-ap-southeast-2 | Active |
s3-ca-central-1 | Active |
s3-eu-central-1 | Active |
s3-eu-west-1 | Active |
s3-eu-west-2 | Active |
s3-eu-west-3 | Active |
s3-sa-east-1 | Active |
s3-us-east-2 | Active |
s3-us-standard | Active |
s3-us-west-1 | Active |
s3-us-west-2 | Active |
View the latest incidents for imgix and check for official updates:
Description:

# What happened?

On September 09, 2021, at 14:02 UTC, an improper configuration prevented imgix servers from connecting to some Web Folder and Web Proxy origins, which caused non-cached derivative image requests for affected Web Folder / Web Proxy customer origins to return a `503` error.

# How were customers impacted?

The impact of this incident was isolated to some Web Folder and Web Proxy customers sharing a common configuration setting. Between 14:02 UTC and 18:56 UTC, affected Web Folder and Web Proxy customers experienced a variable increase in errors for non-cached derivative images. At the height of the incident, a small percentage of Web Folder and Web Proxy requests returned a `503` error, which amounted to 0.16% of all imgix requests. At 18:56 UTC, a fix was applied, allowing the service to be completely restored.

# What went wrong during the incident?

At 14:20 UTC, our team was alerted to a small increase in fetch errors for some Web Folder and Web Proxy origins. Due to the small number of errors reported by our monitoring service, it was unclear whether this was the result of some customer origins misbehaving or an issue with our service's ability to fetch images. Eventually, our engineering team tracked the change down to a specific service provider, which we correlated with the increase in errors for some Web Folder / Web Proxy customers. As our team looked into solutions, several external factors severely slowed remediation efforts:

* Our internal communication platform was experiencing connectivity issues
* Some critical database services were unavailable during the incident
* Service error messaging was ambiguous as to the cause of the issue
* We experienced discrepancies between applied system changes and running processes

Eventually, the imgix team deployed a fix that enabled our servers to successfully talk to all Web Folder and Web Proxy origins.

# What will imgix do to prevent this in the future?

We will be updating our configurations for fetching assets from customer origins to prevent similar issues from occurring, along with updating our service runbooks to include rolling restarts for some types of configuration updates. We will also be migrating some of our database tooling to mitigate connectivity limitations and updating our internal processes to address cases where communication outages occur.
Status: Postmortem
Impact: Minor | Started At: Sept. 30, 2021, 3:16 p.m.
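During an incident like this one, the `503` responses were transient and limited to non-cached derivatives, so a brief client-side retry can often paper over them. The following is a minimal sketch under that assumption, not anything imgix provides; the example domain, parameters, and retry budget are illustrative.

```python
import time
import requests

def fetch_derivative(url, retries=3, backoff=1.0):
    """Fetch a derivative image URL, retrying briefly on transient 503 responses."""
    resp = None
    for attempt in range(retries):
        resp = requests.get(url, timeout=30)
        if resp.status_code != 503:
            return resp
        # Transient origin-fetch failure: back off and try again.
        time.sleep(backoff * (2 ** attempt))
    return resp

# Hypothetical example URL:
# resp = fetch_derivative("https://example.imgix.net/photo.jpg?w=800&auto=format")
```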
Description:

# What happened?

On September 13, 2021, between 14:53 UTC and 15:44 UTC, the imgix service experienced increased rendering latency, primarily for non-cached derivative images. Error rates remained low, though a very small percentage of images returned a timeout error during the peak of the incident.

# How were customers impacted?

Between 14:53 UTC and 15:27 UTC, some customers experienced dramatically increased latency for requests to non-cached and cached assets served by imgix. At the peak of the incident, cached requests averaged 1s/request to complete, while non-cached requests averaged 40s/request. The majority of requests to the service returned a `200` response, with a small percentage of images (<2%) returning a timeout error during the peak of the incident. By 15:03 UTC, response times began to gradually recover, though the average response time was still higher than normal, especially for non-cached derivatives. By 15:27 UTC, the service had recovered to the point where timeouts were no longer occurring. Though higher-than-normal response times were still being reported in our monitoring, by this time the service was considered mostly recovered. By 15:44 UTC, the service had completely recovered.

# What went wrong during the incident?

At 14:53 UTC, we started receiving reports from customers regarding increased latency in the rendering service. At the time of these reports (and during the incident), our monitoring had not observed any behavior indicating service degradation. Because of this, manual alarms had to be raised, which slowed our initial response and investigation. Our engineers identified that, while our service's reported errors were very low, our rendering latency was rapidly increasing. After verifying the issue, our team began tuning our rendering infrastructure to improve rendering performance. After additional investigation, our team correlated the increased latency to the enablement of a new feature that resulted in more render requests than expected. This feature interacted with our caching and rendering infrastructure by causing many requests to be immediately re-cached and re-rendered. The immediate increase in caching activity and volume created a bottleneck, which eventually resolved itself once the cache had been mostly rebuilt.

# What will imgix do to prevent this in the future?

We will be revisiting our procedures for rolling out new features, which will include:

* Implementing traffic configurations for controlling the flow of feature roll-outs
* Improving our internal documentation and processes so that our teams are synchronized across feature roll-outs
* Doing better analysis of the caching and rendering impact of each newly released feature

This incident also exposed an error with our rendering performance monitoring, which we have now fixed.
Status: Postmortem
Impact: Minor | Started At: Sept. 13, 2021, 3:29 p.m.
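Because this incident showed up as latency rather than errors, a simple timing probe against a rendered URL can surface it from the outside. The sketch below is an illustration only: the example URLs are hypothetical, and the `X-Cache` header name is an assumption about what the CDN happens to expose, not a documented imgix guarantee.

```python
import requests

def probe(url):
    """Time a single image request and report status, latency, and any cache header."""
    resp = requests.get(url, timeout=60)
    elapsed = resp.elapsed.total_seconds()
    cache_status = resp.headers.get("X-Cache", "unknown")  # header name is an assumption
    print(f"{resp.status_code} in {elapsed:.2f}s (cache: {cache_status}) for {url}")

# At the peak of this incident, a cached derivative (~1s) and a freshly
# parameterized, non-cached derivative (~40s) would show very different timings:
# probe("https://example.imgix.net/photo.jpg?w=800")          # likely cached
# probe("https://example.imgix.net/photo.jpg?w=799&blur=37")  # likely not cached
```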
Description:

# What happened?

On September 01, 2021, at 15:33 UTC, analytics and logging for imgix usage abruptly stopped. During this time, no customer analytics were recorded. This includes data related to image bandwidth, Origin Image counts, and other usage data typically generated from image requests. The issue went unnoticed until the next day, on September 02 at 15:44 UTC, when a fix was pushed to immediately resume logging.

# How were customers impacted?

Customers lost approximately 23 hours of imgix analytics data, though we were able to completely recover Origin Image counts. The affected time range for missing analytics spans from September 01, 2021, 15:33 UTC to September 02, 2021, 15:44 UTC. In the dashboard, this is represented as dramatically lower bandwidth counts for the dates between 09/01/2021 and 09/02/2021. All other analytics data (such as network usage, audience analytics, network health, etc.) will also show data missing during that time period.

# What went wrong during the incident?

On September 01 at 15:33 UTC, a breaking change was deployed by our engineering team which affected data logging in imgix. This change had been tested prior to being pushed to production, but we lacked monitoring on key measurements that would have let us catch the issue before going live with the change. Consequently, the issue went unnoticed until the next day, when one of our staff members noticed that analytics was not reporting any data in the dashboard. Once the issue was identified, our engineers rolled back the change to restore logging functionality. While we were able to recover Origin Image counts, most of the other analytical data (bandwidth, audience analytics, network logs) was lost during the logging outage.

# What will imgix do to prevent this in the future?

On the monitoring side, we will implement monitoring that tracks metrics such as bandwidth and usage data and triggers internal alerts when data deviates greatly. These changes will be implemented across all applicable systems. We will also be updating our tooling to allow us to recover and replay data in the event that usage logging is disrupted.
Status: Postmortem
Impact: Minor | Started At: Sept. 2, 2021, 5:33 p.m.
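The remediation described above hinges on alerting when a usage metric "deviates greatly." A minimal sketch of that idea, assuming a rolling baseline and an arbitrary 20% floor (the window size, threshold, and alert hook are all illustrative, not imgix's implementation):

```python
from collections import deque

def make_deviation_monitor(window=24, min_ratio=0.2):
    """Return a checker that alerts when a reading falls far below its rolling baseline."""
    history = deque(maxlen=window)

    def check(value):
        baseline = sum(history) / len(history) if history else None
        history.append(value)
        if baseline and value < baseline * min_ratio:
            # In a real system this would page on-call rather than print.
            print(f"ALERT: reading {value:.1f} is below {min_ratio:.0%} of baseline {baseline:.1f}")
            return True
        return False

    return check

# check = make_deviation_monitor()
# for hourly_bandwidth_gb in readings:  # hypothetical stream of logged usage
#     check(hourly_bandwidth_gb)
```

A sudden stop in logging, as in this incident, would drive the metric toward zero and trip the alert within a single interval instead of going unnoticed for a day.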
Description:

# What happened?

On August 26, 2021, at 15:00 UTC, the imgix service experienced a disruption caused by long-running processes within our origin cache. Once our engineers identified the issue, remediation changes were applied at 15:09 UTC. After the changes were pushed out, the service sharply recovered at 15:20 UTC.

# How were customers impacted?

Starting at 15:00 UTC, requests to non-cached derivative images returned a `503` response. These errors accounted for about 5% of all requests to the rendering service and were sustained until 15:20 UTC, when the service sharply recovered.

# What went wrong during the incident?

While investigating the cause of the incident, our engineers identified a scenario in which origin connections were misbehaving due to customer configuration settings. While this by itself is not normally a problem, some origin activity caused the performance of the origin cache to severely degrade, eventually affecting rendering.

# What will imgix do to prevent this in the future?

We will be modifying our infrastructure's configuration to eliminate scenarios where customer configurations are able to cause origin connection issues in our infrastructure. We will also be working with existing customers to optimize their configurations so that they will not be affected by the new changes in our infrastructure.
Status: Postmortem
Impact: Major | Started At: Aug. 26, 2021, 3:09 p.m.
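Since this incident traced back to origin connections misbehaving under certain customer configurations, a quick customer-side sanity check of an origin can be useful before or after configuration changes. This is a hedged sketch with assumed URL, threshold, and function name; it is not an imgix tool.

```python
import requests

def origin_looks_healthy(origin_image_url, max_seconds=5.0):
    """Check that an origin serves an image quickly and without errors."""
    try:
        resp = requests.get(origin_image_url, timeout=max_seconds)
    except requests.RequestException as exc:
        print(f"origin check failed: {exc}")
        return False
    elapsed = resp.elapsed.total_seconds()
    ok = resp.ok and elapsed < max_seconds
    print(f"{resp.status_code} in {elapsed:.2f}s -> {'healthy' if ok else 'slow or erroring'}")
    return ok

# origin_looks_healthy("https://origin.example.com/images/photo.jpg")
```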