Cloud Providers

Cloud Providers Twilio
 
Nov-1, 8:49am EDT
[closed] The issue causing message delays to Twilio Phone Numbers has been identified. Customers may experience delays or failures when receiving SMS messages from various mobile networks. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 1 hour or as soon as more information becomes available.
 
Nov-1, 9:54am EDT
Receiving SMS messages from various mobile networks is now operating normally. We will continue to monitor for system stability. We'll provide another update in 30 minutes or as soon as more information becomes available.
 
Nov-1, 10:26am EDT
The issue causing delays to Twilio Phone Numbers has been resolved, and the service is operating normally at this time.
Cloud Providers Azure
 
Oct-7, 2:47pm EDT

Impact statement: Beginning as early as 11 Aug 2023, you have been identified as a customer experiencing timeouts and high server load for smaller size caches (C0/C1/C2).


Current status: Investigation revealed the cause to be a change in the behavior of one of the Azure security monitoring agents used by Azure Cache for Redis. The monitoring agent subscribes to the event log and uses a scheduled backoff to reset its subscription when no events are received. In some cases the scheduled backoff does not work as expected, which can increase the frequency of subscription resets and significantly affect CPU usage for smaller size caches. We are currently rolling out a hotfix to the impacted regions; the rollout is 80% complete. We initially estimated completion by 13 Oct 2023, but current progress indicates we expect to complete by 11 Oct 2023. To prevent impact until the fix is rolled out, we are applying a short-term mitigation to all caches that reduces the log file size. The next update will be provided by 19:00 UTC on 8 Oct 2023 or as events warrant, to allow time for the short-term mitigation to progress.

 
Oct-7, 2:54pm EDT

Impact statement: Beginning as early as 11 Aug 2023, you have been identified as a customer experiencing timeouts and high server load for smaller size caches (C0/C1/C2).


Current status: Investigation revealed the cause to be a change in the behavior of one of the Azure security monitoring agents used by Azure Cache for Redis. The monitoring agent subscribes to the event log and uses a scheduled backoff to reset its subscription when no events are received. In some cases the scheduled backoff does not work as expected, which can increase the frequency of subscription resets and significantly affect CPU usage for smaller size caches. We are currently rolling out a hotfix to the impacted regions; the rollout is 80% complete. We initially estimated completion by 11 Oct 2023, but current progress indicates we expect to complete by 09 Oct 2023. To prevent impact until the fix is rolled out, we are applying a short-term mitigation to all caches that reduces the log file size. The next update will be provided by 19:00 UTC on 8 Oct 2023 or as events warrant, to allow time for the short-term mitigation to progress.

 
Oct-8, 2:11pm EDT

Summary of Impact: Between as early as 11 Aug 2023 and 18:00 UTC on 8 Oct 2023, you were identified as a customer who may have experienced timeouts and high server load for smaller size caches (C0/C1/C2).


Current Status: This issue is now mitigated. More information will be provided shortly.

 
Oct-8, 2:55pm EDT

What happened? 

Between as early as 11 Aug 2023 and 18:00 UTC on 8 Oct 2023, you were identified as a customer who may have experienced timeouts and high server load for smaller size caches (C0/C1/C2).

 

What do we know so far? 

We identified a change in the behavior of one of the Azure security monitoring agents used by Azure Cache for Redis. The monitoring agent subscribes to the event log and uses a scheduled backoff to reset its subscription when no events are received. In some cases, the scheduled backoff does not work as expected, which can increase the frequency of subscription resets and significantly affect CPU usage for smaller size caches.
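
For illustration only, the sketch below is not the agent's actual implementation; it simply models the reset-with-backoff behavior described above (the interval values are assumptions) to show why a broken backoff multiplies the number of subscription resets during a quiet window:

```python
# Illustrative model of the described mechanism -- not the monitoring agent's code.
# The 1-second starting interval and 300-second cap are assumptions for the example.

def reset_subscription() -> None:
    """Placeholder for the relatively expensive tear-down/re-create of the subscription."""


def simulate_resets(window_seconds: float, backoff_working: bool) -> int:
    """Count subscription resets during a quiet window in which no events arrive."""
    delay, max_delay = 1.0, 300.0
    elapsed, resets = 0.0, 0
    while elapsed < window_seconds:
        elapsed += delay          # wait out the current backoff interval
        reset_subscription()      # no events arrived, so the subscription is reset
        resets += 1
        if backoff_working:
            delay = min(delay * 2, max_delay)  # intended behavior: progressively back off
        # failure mode described above: the delay never grows, so resets stay frequent
    return resets


# Roughly 20 resets per quiet hour with a working backoff versus 3,600 without it --
# the kind of extra per-reset work that shows up as CPU load on small caches.
print(simulate_resets(3600, backoff_working=True))
print(simulate_resets(3600, backoff_working=False))
```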

 

How did we respond?

To address this issue, engineers performed manual actions on the underlying Virtual Machines of the impacted caches. After further monitoring, internal telemetry confirmed that the issue is mitigated and full service functionality has been restored.

 

What happens next? 

We will continue to investigate to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.

Cloud Providers Twilio
 
Sep-1, 10:28pm EDT
We continue to experience SMS delivery delays and failures when sending messages to Movicel network in Angola. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 2 hours or as soon as more information becomes available.
 
Sep-2, 12:22am EDT
We are continuing to experience SMS delivery delays and failures when sending messages to Movicel network in Angola. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 4 hours or as soon as more information becomes available.
 
Sep-2, 4:09am EDT
We are continuing to experience SMS delivery delays and failures when sending messages to Movicel network in Angola. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 8 hours or as soon as more information becomes available.
 
Sep-2, 12:05pm EDT
We continue to experience SMS delivery delays and failures when sending messages to Movicel network in Angola. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 16 hours or as soon as more information becomes available.
 
Sep-2, 4:05pm EDT
We continue to experience SMS delivery delays and failures when sending messages to Movicel network in Angola. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 24 hours or as soon as more information becomes available.
 
Sep-3, 4:06pm EDT
We continue to experience SMS delivery delays and failures when sending messages to Movicel network in Angola. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 24 hours or as soon as more information becomes available.
 
Sep-4, 4:06pm EDT
We continue to experience SMS delivery delays and failures when sending messages to Movicel network in Angola. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 24 hours or as soon as more information becomes available.
 
Sep-5, 7:45am EDT
We are observing successful SMS delivery when sending messages to Movicel network in Angola. We will continue to monitor to ensure full service recovery. We expect to provide another update in 2 hours or as soon as more information becomes available.
 
Sep-5, 9:47am EDT
We are no longer experiencing SMS delivery delays and failures when sending messages to Movicel network in Angola. This incident has been resolved.
 
Sep-1, 6:51pm EDT
We are observing recovery in SMS delivery delays for a subset of small networks in the US for a subset of short codes. We will continue monitoring the service to ensure a full recovery. We will provide another update in 2 hours or as soon as more information becomes available.
 
Sep-1, 8:42pm EDT
We are no longer experiencing SMS delivery delays for a subset of small networks in the US for a subset of short codes. This incident has been resolved.
Cloud Providers Twilio
 
Aug-31, 8:41am EDT
[closed] Our monitoring systems have detected a potential issue with Outbound SMS where messages may remain in SENT status and subsequently move to DELIVERY_UNKNOWN status. Our engineering team has been alerted and is actively investigating. We will update as soon as we have more information.
 
Aug-31, 9:31am EDT
Outbound SMS is now operating normally; messages are no longer remaining in SENT status and moving to DELIVERY_UNKNOWN status. We will continue to monitor for system stability. We'll provide another update in 30 minutes or as soon as more information becomes available.
 
Aug-31, 10:00am EDT
The issue with SMS delivery status has been resolved, and the service is operating normally at this time.
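
For reference only, the minimal sketch below (using Twilio's Python helper library; the credentials, lookback window, and 30-minute threshold are placeholders, and this is not an official Twilio remediation) shows how a customer might list outbound messages that were still sitting in SENT status during an incident like the one above:

```python
# Illustrative sketch: flag outbound messages still in SENT status well after sending.
# Credentials, the lookback window, and the 30-minute threshold are placeholders.
from datetime import datetime, timedelta, timezone

from twilio.rest import Client

client = Client("ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", "your_auth_token")

now = datetime.now(timezone.utc)
cutoff = now - timedelta(minutes=30)

stuck = [
    m
    for m in client.messages.list(date_sent_after=now - timedelta(hours=6), limit=500)
    if m.status == "sent" and m.date_sent and m.date_sent < cutoff
]
for m in stuck:
    print(m.sid, m.status, m.date_sent)
```
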
Cloud Providers Twilio
 
Aug-30, 8:42pm EDT
We are no longer experiencing SMS delivery delays when sending messages to Claro network in Brazil. This incident has been resolved.
Cloud Providers Azure
 
Aug-18, 4:30pm EDT
[closed]

Summary of Impact: Between 20:30 UTC on 18 Aug. 23 and 05:10 UTC on 19 Aug. 23, you were identified as a customer using Workspace-based Application Insights resources who may have experienced 7-10% data gaps during the impact window, and potentially incorrect alert activations.

Preliminary Root Cause: We identified that the issue was caused by a code bug in the latest deployment, which caused some data to be dropped.

Mitigation: We rolled back the deployment to the last known good build to mitigate the issue.

Additional Information: Following additional recovery efforts, we re-ingested the data that was not correctly ingested due to this event. After further investigation, it was discovered that the initially re-ingested data had incorrect TimeGenerated values instead of the original TimeGenerated values. This may cause incorrect query results, which may in turn cause incorrect alerts or report generation. We have investigated the issue that caused this behavior so that future events utilizing data recovery processes will re-ingest the data with the correct, original TimeGenerated values.

If you need any further assistance with this, please raise a support ticket.

 
Aug-18, 4:31pm EDT
Resolved
 
Sep-13, 2:44pm EDT

Summary of Impact: Between 20:30 UTC on 18 Aug. 23 and 05:10 UTC on 19 Aug. 23, you were identified as a customer using Workspace-based Application Insights resources who may have experienced 7-10% data gaps during the impact window, and potentially incorrect alert activations.

 

Preliminary Root Cause: We identified that the issue was caused by a code bug in the latest deployment, which caused some data to be dropped.

 

Mitigation: We rolled back the deployment to the last known good build to mitigate the issue.

 

Additional Information: Following additional recovery efforts, we re-ingested the data that was not correctly ingested due to this event. After further investigation, it was discovered that the initially re-ingested data had incorrect TimeGenerated values instead of the original TimeGenerated values. This may have caused incorrect query results, which may in turn have caused incorrect alerts or report generation. Our investigation extended past the previous mitigation, and we were able to identify a secondary code bug that caused this behavior. We deployed a hotfix using our Safe Deployment Procedures that re-ingested the data with the correct, original TimeGenerated values. All regions are now recovered, and previously incorrect TimeGenerated values have been corrected.
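
As an illustrative check only (not Microsoft's guidance; the workspace ID, table name, and one-day threshold below are assumptions), a customer could compare TimeGenerated against ingestion time over the impact window to spot rows that appear to carry a re-ingestion timestamp, for example with the azure-monitor-query package:

```python
# Illustrative sketch: flag rows whose ingestion time is far ahead of TimeGenerated,
# which could indicate re-ingested records carrying an incorrect TimeGenerated value.
# The workspace ID, the AppRequests table, and the 1-day threshold are assumptions.
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

query = """
AppRequests
| extend IngestedAt = ingestion_time()
| where IngestedAt - TimeGenerated > 1d
| summarize SuspectRows = count() by bin(TimeGenerated, 1h)
"""

response = client.query_workspace(
    "<log-analytics-workspace-id>",  # placeholder workspace ID
    query,
    timespan=(
        datetime(2023, 8, 18, 20, 30, tzinfo=timezone.utc),
        datetime(2023, 8, 19, 5, 10, tzinfo=timezone.utc),
    ),
)
for table in response.tables:
    for row in table.rows:
        print(row)
```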

 

If you need any further assistance with this, please raise a support ticket.

 

Next Steps: We will continue to investigate to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.

Cloud Providers SendGrid
 
Aug-9, 3:33am EDT
Our engineers have identified the issue and are working towards a fix. We will provide another update in 1 hour or as soon as more information becomes available.
 
Aug-9, 5:14am EDT
Our engineers found the cause of the mail send request delay issue and implemented a solution; we are still monitoring the situation. Customers in Europe might have experienced request delays as a result of an issue with one of our network providers that also affected our datacenter. We will provide more information when it becomes available.
 
Aug-9, 8:18am EDT
Our engineers are monitoring the system's performance and are working towards a fix. We will provide another update in an hour or as soon as more information becomes available.
 
Aug-9, 9:26am EDT
Our engineers are still monitoring the system's performance and are working towards a fix. We will provide another update in an hour or as soon as more information becomes available.
 
Aug-9, 10:26am EDT
Our engineers are still monitoring the system's performance and are working towards a fix. We will provide another update in an hour or as soon as more information becomes available.
 
Aug-9, 11:20am EDT
Our engineers are still monitoring the system's performance and are working towards a fix. We will provide another update in an hour or as soon as more information becomes available.
 
Aug-9, 12:15pm EDT
Our engineers have implemented a fix and are monitoring system performance. We will provide another update in 1 hour or as soon as more information becomes available.
 
Aug-9, 1:13pm EDT
Our engineers have monitored the fix and confirmed the issue with Email Send request delays has been resolved. All services are now operating normally at this time.
Cloud Providers SendGrid
 
Aug-7, 5:16pm EDT
[closed] Our engineers have investigated and resolved the issues with links. From 10:15 PM PST to 2:15 PM PST, customers may have noticed intermittent latency when clicking links within emails, resulting in longer-than-expected loading times or misdirections. The issue has been resolved, and all impacted services are operating normally.
 
Aug-3, 11:10pm EDT
Our engineering team has identified the issue. We will provide another update as soon as more information becomes available.
 
Aug-3, 11:26pm EDT
Our engineers have monitored the behaviour and confirmed that the issue regarding deferred emails has been resolved. All services are now operating normally at this time.
 
Aug-3, 11:42pm EDT
Our engineers have monitored the Mail Send behaviour and confirmed that the issue related to deferred emails has been resolved. All services are now operating normally at this time.