Intermittent Connectivity Failures

Incident
May 02, 6:45pm EDT

Status: closed
Start: May 02, 5:22pm EDT
End: May 02, 6:45pm EDT
Duration: 1 hour 23 minutes
Affected Components: Notification services

Update

May 02, 5:22pm EDT

Please be aware that StatusCast is currently experiencing intermittent connectivity issues, which are causing notifications to be delayed or possibly not sent. The issues are caused by a global network infrastructure problem at Microsoft Azure, StatusCast's hosting provider. Microsoft is currently investigating.

At this time the application appears to be running normally; however, we will continue to monitor it until Microsoft deems the issue fully resolved. If you experience an issue sending notifications, please contact our support team at support@statuscast.com. We apologize for any inconvenience caused by these intermittent connectivity issues.

Update

May 02, 6:35pm EDT

StatusCast's services remain operational. We will continue to monitor the system closely for as long as Microsoft's incident remains active.

Resolved

May 02, 6:45pm EDT

Microsoft has confirmed that the issue has been mitigated, and connectivity to all services should have returned to normal.

Root Cause

May 02, 6:45pm EDT

A summary from Microsoft regarding this issue is below:

Network Connectivity - DNS Resolution

Summary of impact: Between 19:43 and 22:35 UTC on 02 May 2019, customers may have experienced intermittent connectivity issues with Azure and other Microsoft services (including M365, Dynamics, DevOps, etc). Most services were recovered by 21:30 UTC with the remaining recovered by 22:35 UTC. 
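
Microsoft reports its impact window in UTC, while the StatusCast timeline above is in EDT. As a rough aid for comparing the two, the sketch below converts the UTC times quoted in the summary to EDT. It assumes a fixed UTC-4 offset (correct for May 2019); the helper name utc_to_edt is illustrative only and is not part of any StatusCast or Microsoft tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EDT modeled as a fixed UTC-4 offset, which holds for 02 May 2019.
EDT = timezone(timedelta(hours=-4), "EDT")


def utc_to_edt(hhmm: str) -> datetime:
    """Parse an HH:MM UTC time on 02 May 2019 and convert it to EDT."""
    naive = datetime.strptime(f"2019-05-02 {hhmm}", "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=timezone.utc).astimezone(EDT)


# Times taken from Microsoft's summary of impact.
impact_start = utc_to_edt("19:43")
mostly_recovered = utc_to_edt("21:30")
impact_end = utc_to_edt("22:35")

for label, moment in [
    ("Impact start", impact_start),
    ("Most services recovered", mostly_recovered),
    ("Remaining services recovered", impact_end),
]:
    print(f"{label}: {moment.strftime('%I:%M%p %Z')}")

print("Total impact window:", impact_end - impact_start)
```

Under that assumption, Microsoft's window runs from 3:43pm to 6:35pm EDT, so it began before the first StatusCast update at 5:22pm EDT.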

Preliminary root cause: Engineers identified the underlying root cause as a nameserver delegation change affecting DNS resolution and resulting in downstream impact to Compute, Storage, App Service, AAD, and SQL Database services. During the migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated. No customer DNS records were impacted during this incident, and the availability of Azure DNS remained at 100% throughout the incident. The problem impacted only records for Microsoft services.

Mitigation: To mitigate, engineers corrected the nameserver delegation issue. Applications and services that accessed the incorrectly configured domains may have cached the incorrect information, leading to a longer restoration time until their cached information expired.
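
The caching effect Microsoft describes can be illustrated with a toy model. The sketch below is not Microsoft's or StatusCast's resolver code; the class TtlDnsCache, the .example host names, and the 300-second TTL are all hypothetical, chosen only to show why a corrected record is not seen until a cached answer expires.

```python
class TtlDnsCache:
    """Toy resolver cache: an answer is reused until its TTL expires."""

    def __init__(self, lookup, clock):
        self._lookup = lookup   # callable: name -> (answer, ttl_seconds)
        self._clock = clock     # callable returning the current time in seconds
        self._entries = {}      # name -> (answer, expires_at)

    def resolve(self, name):
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]    # still fresh, so upstream is not consulted
        answer, ttl = self._lookup(name)
        self._entries[name] = (answer, now + ttl)
        return answer


# Hypothetical upstream data: a wrong delegation that is later corrected.
upstream = {"service.example": ("ns.wrong.example", 300)}
now = [0]  # simulated clock, in seconds
cache = TtlDnsCache(lambda name: upstream[name], lambda: now[0])

first = cache.resolve("service.example")        # bad answer cached at t=0

upstream["service.example"] = ("ns.right.example", 300)  # fix applied upstream

now[0] = 60
during_ttl = cache.resolve("service.example")   # stale answer still served

now[0] = 301
after_ttl = cache.resolve("service.example")    # TTL expired, answer refreshed

print(first, during_ttl, after_ttl)
```

In this model the fix at the source takes effect for a given client only after its cached entry expires, which matches the longer restoration time noted in the mitigation summary.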

For more information from Microsoft, please visit their status page at https://azure.microsoft.com/en-us/status/history/