Application Insights - Mitigated - Additional Information

Incident
September 13, 2:44pm EDT

Status: closed
Start: August 18, 4:30pm EDT
End: August 18, 4:31pm EDT
Duration: 1 minute
Affected Components:
Cloud Providers Azure
Update

August 18, 4:30pm EDT

Summary of Impact: Between 20:30 UTC on 18 Aug 2023 and 05:10 UTC on 19 Aug 2023, you were identified as a customer using Workspace-based Application Insights resources who may have experienced data gaps of 7-10% during the impact window, and potentially incorrect alert activations.
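The impact window above is given in UTC, while the timestamps on this notice are in EDT. A minimal sketch for converting the window and checking whether a given timestamp falls inside it follows. It assumes "23" means the year 2023 and uses a fixed UTC-4 offset for EDT; the helper name `in_impact_window` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EDT is modeled as a fixed UTC-4 offset, which holds in August.
EDT = timezone(timedelta(hours=-4), "EDT")

# Impact window as stated in the summary (UTC); the year 2023 is assumed.
start_utc = datetime(2023, 8, 18, 20, 30, tzinfo=timezone.utc)
end_utc = datetime(2023, 8, 19, 5, 10, tzinfo=timezone.utc)


def in_impact_window(ts: datetime) -> bool:
    """Return True if a timezone-aware timestamp falls inside the window."""
    return start_utc <= ts <= end_utc


# Length of the window and its local (EDT) equivalents.
duration = end_utc - start_utc
start_edt = start_utc.astimezone(EDT)
end_edt = end_utc.astimezone(EDT)

print(duration)
print(start_edt.isoformat(), end_edt.isoformat())
```

Under these assumptions the window spans 8 hours 40 minutes, from 4:30pm EDT on August 18 to 1:10am EDT on August 19.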

Preliminary Root Cause: We identified that the issue was caused by a code bug in the latest deployment, which resulted in some data being dropped.

Mitigation: We rolled back the deployment to the last known good build to mitigate the issue.

Additional Information: Following additional recovery efforts, we have re-ingested the data that was not correctly ingested due to this event. After further investigation, we discovered that the initially re-ingested data had incorrect TimeGenerated values instead of the original TimeGenerated values. This may cause incorrect query results, which may in turn cause incorrect alerts or report generation. We have investigated the issue that caused this behavior so that future events utilizing data recovery processes will re-ingest the data with the correct, original TimeGenerated value.
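To illustrate why incorrect TimeGenerated values can skew time-based queries and alerts, here is a small sketch. It is not the service's actual pipeline: the event times are made up, and it assumes, purely for illustration, that the incorrect value was the time of re-ingestion. The notice itself does not say what the incorrect values were.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_counts(timestamps):
    """Count records per UTC hour bucket, as a time-binned query would."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


# Hypothetical events that occurred during the impact window.
event_times = [
    datetime(2023, 8, 18, 21, 5, tzinfo=timezone.utc),
    datetime(2023, 8, 18, 21, 40, tzinfo=timezone.utc),
    datetime(2023, 8, 18, 22, 15, tzinfo=timezone.utc),
]

# Assumption: every recovered record was stamped with this re-ingestion time.
reingested_at = datetime(2023, 8, 25, 10, 0, tzinfo=timezone.utc)

# Counts when the original event time is preserved.
correct = hourly_counts(event_times)

# Counts when each record carries the re-ingestion time instead.
shifted = hourly_counts([reingested_at for _ in event_times])

print(correct)
print(shifted)
```

In this sketch the original hour buckets stay empty and all three records land in a single later bucket, so a query or alert rule scoped to the impact window would still see a gap, while one scoped to the later hour would see a spurious spike.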

If you need further assistance with this issue, please raise a support ticket.

Resolved

August 18, 4:31pm EDT

Resolved

Resolved

September 13, 2:44pm EDT

Summary of Impact: Between 20:30 UTC on 18 Aug 2023 and 05:10 UTC on 19 Aug 2023, you were identified as a customer using Workspace-based Application Insights resources who may have experienced data gaps of 7-10% during the impact window, and potentially incorrect alert activations.

Preliminary Root Cause: We identified that the issue was caused by a code bug in the latest deployment, which resulted in some data being dropped.

Mitigation: We rolled back the deployment to the last known good build to mitigate the issue.

Additional Information: Following additional recovery efforts, we re-ingested the data that was not correctly ingested due to this event. After further investigation, we discovered that the initially re-ingested data had incorrect TimeGenerated values instead of the original TimeGenerated values. This may have caused incorrect query results, which may in turn have caused incorrect alerts or report generation. Our investigation extended beyond the previous mitigation, and we identified a secondary code bug that caused this behavior. We deployed a hotfix using our Safe Deployment Procedures that re-ingested the data with the correct, original TimeGenerated value. All regions have now recovered, and the previously incorrect TimeGenerated values have been corrected.

If you need further assistance with this issue, please raise a support ticket.

Next Steps: We will continue to investigate to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.