Get notified about any outages, downtime or incidents for Sanity and 1800+ other cloud vendors. Monitor 10 companies, for free.
Outage and incident data over the last 30 days for Sanity.
Join OutLogger to be notified when any of your vendors or the components you use experience an outage. It's completely free and takes less than 2 minutes!
OutLogger tracks the status of these components for Sanity:
Component | Status |
---|---|
apicdn.sanity.io | Active |
api.sanity.io | Active |
Asset Pipeline | Active |
Authentication | Active |
cdn.sanity.io | Active |
Datasets | Active |
GraphQL | Active |
Integrations | Active |
Manage | Active |
Studio | Active |
Webhooks | Active |
www.sanity.io | Active |
View the latest incidents for Sanity and check for official updates:
Description: We experienced an issue with our asset pipeline that affected a small subset of customers. This issue may have impacted those trying to upload an asset for a subsequent time, which is a common occurrence during the dataset import process. The issue would have started at or shortly after 17:47 UTC on 2023-09-13 and was resolved at 08:26 UTC on 2023-09-14. We apologize for the impact this had and any backup processes that were disrupted.
Status: Resolved
Impact: Minor | Started At: Sept. 14, 2023, 8:30 a.m.
Description:

## **Incident summary**

On Tuesday, 5 September 2023, from 11:40 to 12:10 UTC, customers observed errors consisting of 503 response codes when trying to access cached objects through the API CDN. Access to Studio and the ability to log in to Sanity was also disrupted during this incident. The incident was identified and mitigated within 30 minutes.

## **Incident timeline**

All times UTC on 2023-Sep-05.

- **11:40 Tracing changes rolled-out - Outage begins**
- 11:44 First system alert is fired (elevated 5xx errors)
- **11:47 Incident declared**
- 12:06 Health check identified as root cause of API CDN outage
- 12:08 Health check corrected
- 12:08 Tracing work rolled back - **Incident mitigated**
- 12:09 503 error rate returns to normal level
- 12:10 API CDN service fully recovered, customers can log in to Sanity
- **12:15 Incident state moved to monitoring - Root cause analysis underway**

## **Root cause**

We recently developed more advanced tracing capabilities for our platform to improve system observability. This change was rolled out several weeks ago, but hit an edge case that unexpectedly increased the load on our identity management service in a way that was not caught in our testing and staging environments. The safety mechanism in the tracing library to prevent this sort of failure had a default value set too high for the system to cope with, causing our identity management service to fail. Our API CDN depended on a health check to this service and this dependency caused our API CDN to stop serving traffic. The ability to use Sanity Studio or log in to Sanity was also blocked by the unavailability of the identity service.

_This is our current understanding of the incident and as we continue to investigate, if anything new and material comes to light, we will update with further details._

## **Remediation and prevention**

Sanity engineers were alerted to the issue at 11:44 UTC and began investigating the API CDN failures promptly. At 12:06 UTC the team had determined that the root cause for the API CDN outage was a health check to the identity service, which was immediately corrected. The team also rolled back the tracing change which was the root cause for the identity service outage. As these changes were rolled out, error rates subsided, the identity service started answering requests again, and regular API CDN traffic resumed.

In addition to resolving the underlying cause, we will be implementing updates to both prevent and minimize the impact of this type of failure in the future. Given the critical nature of our CDN infrastructure, we are also initiating a complete audit of our caching layer, including making sure no additional legacy dependencies exist.

We would like to apologize to our customers for the impact this incident had on their operations and business. We take the reliability of our platform extremely seriously, especially when it comes to availability across regions.
Status: Postmortem
Impact: Major | Started At: Sept. 5, 2023, 11:59 a.m.
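The failure mode described in the postmortem above, where a serving layer stops taking traffic because its health check depends on another service, can be illustrated with a small sketch. This is not Sanity's implementation: the `Dependency` type, the `evaluate_health` function, and the dependency names are all hypothetical, chosen only to show one common way of separating critical from non-critical dependencies so that an outage in a non-critical one marks the service as degraded instead of pulling it out of rotation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Dependency:
    """Hypothetical description of one downstream dependency."""
    name: str
    probe: Callable[[], bool]  # returns True when the dependency responds
    critical: bool             # only critical failures should stop traffic


def evaluate_health(deps: List[Dependency]) -> Dict[str, object]:
    """Probe every dependency and summarise the result.

    A failing non-critical dependency yields "degraded" but keeps
    serve_traffic True; a failing critical one yields "unhealthy".
    """
    failed_critical: List[str] = []
    failed_optional: List[str] = []
    for dep in deps:
        try:
            ok = dep.probe()
        except Exception:
            # A probe that raises is treated the same as a failed probe.
            ok = False
        if not ok:
            (failed_critical if dep.critical else failed_optional).append(dep.name)

    if failed_critical:
        status = "unhealthy"
    elif failed_optional:
        status = "degraded"
    else:
        status = "healthy"
    return {
        "status": status,
        "serve_traffic": not failed_critical,
        "failed": failed_critical + failed_optional,
    }


def identity_down() -> bool:
    """Simulated probe for an unavailable identity service (assumption)."""
    raise ConnectionError("identity service unavailable")


result = evaluate_health([
    Dependency("cache_store", lambda: True, critical=True),
    Dependency("identity_service", identity_down, critical=False),
])
print(result["status"], result["serve_traffic"])  # prints "degraded True"
```

Whether a given dependency should count as critical is a design decision for each service; the sketch only shows that the decision can be made explicit in the health check itself.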
Description: This incident has been resolved.
Status: Resolved
Impact: Major | Started At: July 11, 2023, 10:33 a.m.
Description: From 01:59-02:22 UTC, we experienced an elevated level of API errors for some of our customers. We have implemented a fix and are monitoring results.
Status: Resolved
Impact: Major | Started At: June 28, 2023, 2 a.m.