Network Infrastructure

Cloud Providers Azure Network Infrastructure
 
Jan-18, 2:48pm EST

Impact Statement: Starting at 14:12 UTC on 18 Jan 2024, a limited subset of customers in East US may experience short periods of application latency, intermittent HTTP 500-level response codes, and/or timeouts when connecting to resources hosted in this region. Internal telemetry indicates that these interruptions are brief and occur in spikes lasting approximately 2-5 minutes each, with fewer than 5 spikes over a 3-hour period.


Current Status: Engineering teams have identified a root cause for this issue and are currently exploring mitigation options. The next update will be provided in 2 hours or as events warrant.

 
Jan-18, 3:04pm EST

Impact Statement: Starting at 14:12 UTC on 18 Jan 2024, customers in East US may experience short periods of application latency, intermittent HTTP 500-level response codes, and/or timeouts when connecting to resources hosted in this region. Internal telemetry indicates that these interruptions are brief and occur in spikes lasting approximately 2-5 minutes each, with fewer than 5 spikes over a 3-hour period.

 

Current Status: Engineering teams have identified a root cause for this issue and are currently exploring mitigation options. The next update will be provided in 2 hours or as events warrant.

 
Jan-18, 5:07pm EST

Impact Statement: Starting at 14:12 UTC on 18 Jan 2024, customers in East US may experience short periods of application latency, intermittent HTTP 500-level response codes, and/or timeouts when connecting to resources hosted in this region. Internal telemetry indicates that these interruptions are brief and occur in spikes lasting approximately 2-5 minutes each, with fewer than 5 spikes over a 3-hour period.

 

Current Status: Engineering teams have identified a root cause for this issue and are currently exploring mitigation options. We have continued to monitor the service, and our telemetry indicates that there have been no additional spikes in the past 2-3 hours. We will continue to monitor and will provide an update in 2 hours or as events warrant.

 
Jan-18, 5:19pm EST

Summary of Impact: Between 14:12 UTC and 16:52 UTC on 18 Jan 2024, customers in East US may have experienced short periods of application latency, intermittent HTTP 500-level response codes, and/or timeouts when connecting to resources hosted in this region. Internal telemetry indicated that these interruptions were brief and occurred in spikes lasting approximately 2-5 minutes each, with fewer than 5 spikes over a 3-hour period.

 

Current Status: This incident is now mitigated. More details will be provided shortly.

 
Jan-18, 5:50pm EST

Summary of Impact: Between 14:12 UTC and 16:52 UTC on 18 Jan 2024, customers in East US may have experienced short periods of application latency or intermittent HTTP 500-level response codes and/or timeouts when connecting to resources hosted in this region. 

 

Preliminary Root Cause: Engineers observed a sudden increase in traffic to an underlying network endpoint in the East US region. This increase occurred in quick spikes (fewer than 5) over the course of 2-3 hours. During these spikes, customers whose resources in the region had network traffic routed through this endpoint may have encountered periods of packet loss and service interruption.

 

Mitigation: Engineers identified and isolated the source of the sudden increases in network traffic.

 

Next Steps: Our team will be completing an internal retrospective to understand the incident in more detail. Once that is completed, generally within 14 days, we will publish a Post Incident Review to all impacted customers. To get notified when that happens, and/or to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts – these can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts. For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs. Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness.

Cloud Providers Azure Network Infrastructure
 
Nov-22, 3:22pm EST

Summary of Impact: At 18:29 UTC, 20:04 UTC, and 20:24 UTC on 22 Nov 2023, for periods of 1 minute, 15 minutes, and 5 minutes respectively, resources in a single availability zone in East US may have seen intermittent network connection failures, delays, or packet loss when reaching services across the region. Retries would have been successful.
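
Because the impact above was intermittent and retries would have succeeded, client-side retry logic is the relevant safeguard for this kind of event. The following is a minimal sketch, not Azure guidance: the helper name call_with_retries, its parameters, and the default delays are all hypothetical choices made for illustration. It retries a failing operation with exponential backoff and full jitter so that many clients do not retry in lockstep during a short network interruption.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,          # hypothetical default, tune per workload
    base_delay: float = 0.5,        # seconds before the first retry (upper bound)
    max_delay: float = 30.0,        # cap on any single wait
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with exponential backoff.

    Only exceptions listed in retry_on are retried; anything else
    propagates immediately. The last failure is re-raised once
    max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise
            # Exponential backoff: 0.5s, 1s, 2s, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))

    raise AssertionError("unreachable")  # loop always returns or raises
```

Note that the defaults above cover only a few seconds of waiting in total, which would not span the longest interruption reported here (15 minutes); a workload that needs to ride out an event of that length would need a larger retry budget or a queue-and-replay approach.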


Preliminary Root Cause: While conducting an urgent break-fix repair on some network capacity in a single availability zone of East US, live traffic was impacted. Traffic was re-routed to alternate spans with sufficient capacity. The region is now stable. Customers can continue to operate workloads in the impacted Availability Zone. 


Mitigation: We have moved traffic to healthy optical fiber spans in the region. The issue is mitigated.


Next Steps: We will investigate why this impact occurred to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.

 
Nov-22, 3:23pm EST
Resolved
Cloud Providers Azure Network Infrastructure
 
Jun-10, 4:30pm EDT

Summary of Impact: Between 20:33 UTC and 21:00 UTC on 10 Jun 2023, customers in East US may have experienced impact to network communications due to a hardware failure of a router during planned maintenance. Retries would have been successful.

 

Preliminary Root Cause: We have determined that the router self-healed at 21:00 UTC.

 

Mitigation: We have isolated the device as a precaution and stopped further upgrades.

 

Next Steps: We will continue to investigate to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.

 
Jun-10, 4:31pm EDT
Resolved
Cloud Providers Azure Network Infrastructure
 
Jan-25, 2:08am EST

Summary of Impact: Between 07:08 UTC and 12:30 UTC on 25 Jan 2023, you were identified as a customer in Canada East, East US, South Central US, West US, or Canada Central who may have experienced latency or timeouts when deploying networking services through the Azure portal.

 

Preliminary Root Cause: We determined that our services were affected by network latency that arose when a router was taken out of rotation for maintenance during a spike in traffic. This led to increased congestion on some links.

 

Mitigation: The router was removed from service, and routes were optimized for the flow of traffic so that services could resume normally.

 

Next Steps: We will investigate why this router caused widespread impact in order to establish the full root cause. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.

 
Jan-25, 2:09am EST
Resolved