Azure Redis Cache - Connectivity Issues - Investigating

Incident
March 16, 5:24am EDT

Status: closed
Start: March 16, 5:24am EDT
End: March 16, 5:24am EDT
Duration: 1m
Affected Components:
Cloud Providers Azure
Resolved

March 16, 5:24am EDT

You were identified as a customer using Azure Redis Cache who may have experienced some service degradation, such as unexpected failovers, timeouts and intermittent connectivity issues, between 03:30 UTC and 13:52 UTC on 16 Mar 2023.

This issue is now mitigated; more information will follow shortly. 

Update

March 16, 5:24am EDT

We are investigating an alert for Azure Redis Cache. We will provide more information as it becomes available. 

Update

March 16, 5:24am EDT

Impact Statement: You have been identified as a customer using Azure Redis Cache who, starting at 03:30 UTC on 16 Mar 2023, may experience some service degradation, such as unexpected failovers, timeouts and intermittent connectivity issues.


Current Status: We are aware of the issue and are actively investigating. The next update will be provided in 60 minutes or as events warrant.

Update

March 16, 5:24am EDT

Impact Statement: You have been identified as a customer using Azure Redis Cache who, starting at 03:30 UTC on 16 Mar 2023, may experience some service degradation, such as unexpected failovers, timeouts and intermittent connectivity issues.


Current Status: We have identified the cause of this to be an issue with a recent deployment. We are in the early stages of developing a hotfix to mitigate this issue. The next update will be provided in 1 hour or as events warrant.

Update

March 16, 5:24am EDT

Impact Statement: You have been identified as a customer using Azure Redis Cache who, starting at 03:30 UTC on 16 Mar 2023, may experience some service degradation, such as unexpected failovers, timeouts and intermittent connectivity issues.


Current Status: We have identified the potential root cause to be an issue with a recent deployment. We are in the early stages of developing a hotfix to mitigate this. In the meantime, we are looking into pausing the deployment to avoid further disruption. The next update will be provided in 2 hours, or as events warrant.

Update

March 16, 5:24am EDT

Impact Statement: You have been identified as a customer using Azure Redis Cache who, starting at 03:30 UTC on 16 Mar 2023, may experience some service degradation, such as unexpected failovers, timeouts and intermittent connectivity issues.


Current Status: We have identified the potential root cause to be a bug that was introduced during a recent deployment and is causing the unexpected failovers. We are working to pause the deployment to avoid further disruption. There is currently no workaround for this issue; however, once the failover process has completed for all nodes of a cache resource, cache health should return to normal. We are also working on a hotfix that will be deployed in the coming days to resolve the underlying issue.

The next update will be provided in 2 hours, or as events warrant.

Update

March 16, 5:24am EDT

Impact Statement: You have been identified as a customer using Azure Redis Cache who, starting at 03:30 UTC on 16 Mar 2023, may experience some service degradation, such as unexpected failovers, timeouts and intermittent connectivity issues.


Current Status: We have identified the cause of this to be a bug that was introduced during a recent deployment and is causing the unexpected failovers. We have paused the deployment as a short-term fix while we continue to develop the hotfix to address the underlying issue, which we expect to fully mitigate this. There is currently no workaround; however, once the failover process has completed for all nodes of a cache resource, cache health should return to normal.

The next update will be provided in 2 hours, or as events warrant.

Resolved

March 16, 5:24am EDT

Summary of Impact: You were identified as a customer using Azure Redis Cache who may have experienced some service degradation, such as unexpected failovers, timeouts and intermittent connectivity issues, between 03:30 UTC and 13:52 UTC on 16 Mar 2023.


Preliminary Root Cause: We identified that a recent deployment introduced a bug, which led to the unexpected failovers, timeouts and connectivity issues mentioned above. 


Mitigation: We mitigated this by stopping the deployment that was causing these issues, and we can confirm that this has mitigated the impact for customers. We are still in the process of rolling out the hotfix as a long-term fix.


Next steps: We will continue to investigate to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.
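
Client-side note: the updates above state that there was no workaround for the underlying bug, so the following is not a fix. It is only a minimal sketch of how an application might tolerate the brief timeouts and dropped connections that occur while a failover completes, by retrying with exponential backoff and jitter. The names `TransientCacheError` and `call_with_backoff` are hypothetical and used for illustration; a real Redis client library raises its own timeout and connection exceptions, which would be caught instead.

```python
import random
import time


class TransientCacheError(Exception):
    """Hypothetical stand-in for a client library's timeout/connection error."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.2, max_delay=5.0,
                      sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff.

    The delay before retry n is drawn uniformly from
    [0, min(max_delay, base_delay * 2**(n-1))] ("full jitter"), so that many
    clients reconnecting after a failover do not all retry at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientCacheError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

A caller would wrap each cache read or write, for example `call_with_backoff(lambda: fetch_value("key"))`, where `fetch_value` is whatever function the application already uses to talk to the cache. The retry limits shown are assumed defaults and should be tuned to the application's own latency budget.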