Notification services
 
Nov-25, 2:24pm EST

StatusCast engineers identified a backup in background processing that is delaying the completion of some actions.

 
Nov-25, 3:32pm EST

Engineers have determined the root cause of the backup in processing. In the meantime, we have scaled out the responsible service to help clear the remaining items faster.

 
Nov-25, 5:06pm EST

Services should be operating as expected.

Status pages Admin application
 
Nov-15, 8:00pm EST

StatusCast engineers have been alerted to a possible performance-impacting event affecting status page and admin application response times. This event is not impacting notification processing. We apologize for the inconvenience and will provide an update shortly.

 
Nov-18, 2:00pm EST

All services should be fully functional and performing normally.

 
Nov-27, 3:00pm EST

StatusCast's engineers determined that at approximately 8:00PM EST on November 15th, 2024, several of StatusCast's application servers experienced an issue that caused response times to spike. StatusCast's infrastructure in Azure is designed to perform scaling procedures for services under duress, and for all but one of the application servers in question these procedures completed successfully, restoring the service to an acceptable level of performance.

The remaining application server did not scale correctly and stayed in a degraded state for multiple days. On November 18th, engineers were alerted that some customers were still experiencing load time delays, and at that point the last server was corrected.

After this event, StatusCast's DevOps team began an audit of all scaling procedures to ensure that all application services across our Azure operational regions adhere to a consistent scaling process and, more importantly, that monitoring is properly deployed to all services.
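
For illustration only, the kind of check such an audit might perform can be sketched in Python. This is a minimal sketch under stated assumptions, not StatusCast's actual tooling: the AppService fields, the service names, the region names, and the inventory are all hypothetical, and a real audit would read this information from the hosting provider rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class AppService:
    # All fields are hypothetical; they stand in for whatever a real
    # inventory of application services would record.
    name: str
    region: str
    has_autoscale_rule: bool
    has_monitoring: bool


def audit(services):
    """Return (name, region, problems) for every service missing a control."""
    findings = []
    for svc in services:
        problems = []
        if not svc.has_autoscale_rule:
            problems.append("no autoscale rule")
        if not svc.has_monitoring:
            problems.append("no monitoring")
        if problems:
            findings.append((svc.name, svc.region, problems))
    return findings


# Hypothetical inventory used only to exercise the check.
inventory = [
    AppService("app-1", "region-a", True, True),
    AppService("app-2", "region-a", False, False),
    AppService("app-3", "region-b", True, False),
]

for name, region, problems in audit(inventory):
    print(f"{name} ({region}): {', '.join(problems)}")
```

Listing every gap per service, rather than stopping at the first one, reflects the two goals named above: a consistent scaling process and monitoring on all services.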

Status pages Admin application
 
Jul-30, 9:30am EDT

StatusCast engineers were alerted earlier that some users were experiencing sporadic issues attempting to connect to the status page and admin portal. Our hosting provider, Microsoft Azure, has alerted us via their status page that they are experiencing some network issues globally. We will provide an update as soon as more information is available. 

 
Jul-30, 10:44am EDT

Access to status pages has remained stable and Azure has updated their status indicating failover processes have been engaged to improve their service availability. StatusCast's engineers will continue to watch this closely and will post additional updates as necessary.  

 
Jul-30, 4:47pm EDT

StatusCast's application has remained stable. Our engineers will continue to watch the system closely, as Microsoft has not fully closed out the event on their side. For more specific details on Azure's issue, please refer to their status page. We will provide additional updates as necessary.

 
Jul-30, 6:00pm EDT

Microsoft has closed the issue on their side and StatusCast's platform continues to operate as expected. Once Microsoft has published more details, we will provide them here in the form of an RCA.

 
Aug-1, 1:07pm EDT
FROM MICROSOFT:
Mitigation Statement - Azure Front Door Issues accessing a subset of Microsoft services
Tracking ID: KTY1-HW8

What happened?

Between approximately 11:45 UTC and 19:43 UTC on 30 July 2024, a subset of customers may have experienced issues connecting to a subset of Microsoft services globally. Impacted services included Azure App Services, Application Insights, Azure IoT Central, Azure Log Search Alerts, Azure Policy, as well as the Azure portal itself and a subset of Microsoft 365 and Microsoft Purview services.

What do we know so far?

An unexpected usage spike resulted in Azure Front Door (AFD) and Azure Content Delivery Network (CDN) components performing below acceptable thresholds, leading to intermittent errors, timeouts, and latency spikes. While the initial trigger event was a Distributed Denial-of-Service (DDoS) attack, which activated our DDoS protection mechanisms, initial investigations suggest that an error in the implementation of our defenses amplified the impact of the attack rather than mitigating it.

How did we respond?

Customer impact began at 11:45 UTC and we started investigating. Once the nature of the usage spike was understood, we implemented networking configuration changes to support our DDoS protection efforts, and performed failovers to alternate networking paths to provide relief. Our initial network configuration changes successfully mitigated the majority of the impact by 14:10 UTC. Some customers reported less than 100% availability, which we began mitigating at around 18:00 UTC. We proceeded with an updated mitigation approach, first rolling this out across regions in Asia Pacific and Europe. After validating that this revised approach successfully eliminated the side effect impacts of the initial mitigation, we rolled it out to regions in the Americas. Failure rates returned to pre-incident levels by 19:43 UTC. After monitoring traffic and services to ensure that the issue was fully mitigated, we declared the incident mitigated at 20:48 UTC. Some downstream services took longer to recover, depending on how they were configured to use AFD and/or CDN.

What happens next?

Our team will be completing an internal retrospective to understand the incident in more detail. We will publish a Preliminary Post Incident Review (PIR) within approximately 72 hours, to share more details on what happened and how we responded. After our internal retrospective is completed, generally within 14 days, we will publish a Final Post Incident Review with any additional details and learnings. To get notified when that happens, and/or to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts – these can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts. For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs. Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness.
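
Note from StatusCast, not part of Microsoft's statement: Microsoft reports times in UTC, while the updates on this page use EDT. As a convenience, the impact window quoted above (11:45 UTC to 19:43 UTC on 30 July 2024) can be converted with a short Python sketch. The only assumption is the fixed UTC-4 offset for EDT, and the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# EDT is UTC-4; a fixed offset avoids depending on a timezone database.
EDT = timezone(timedelta(hours=-4), "EDT")

# Impact window as stated in Microsoft's mitigation statement.
start_utc = datetime(2024, 7, 30, 11, 45, tzinfo=timezone.utc)
end_utc = datetime(2024, 7, 30, 19, 43, tzinfo=timezone.utc)

start_edt = start_utc.astimezone(EDT)
end_edt = end_utc.astimezone(EDT)
duration = end_utc - start_utc

print("Start:", start_edt.strftime("%H:%M %Z"))  # 07:45 EDT
print("End:  ", end_edt.strftime("%H:%M %Z"))    # 15:43 EDT
print("Duration:", duration)                     # 7:58:00
```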
Status pages Admin application
 
Apr-3, 8:19pm EDT

At approximately 8:19PM EDT, StatusCast’s engineers were alerted that some status page and admin applications were inaccessible. The team identified that its hosting partner, Microsoft, was experiencing some issues in its US East region related to app services and SQL database connections. As of 9:03PM EDT services have been restored and StatusCast’s team is currently working with Microsoft to fully investigate the incident. Once the team has completed its investigation we will follow up with an RCA.

At this time StatusCast should be operating fully as expected. If you continue to have any issues, please contact us at support@statuscast.com.

 
Apr-3, 9:03pm EDT

As of 9:03PM EDT services have been restored and StatusCast’s team is currently working with Microsoft to fully investigate the incident. Once the team has completed its investigation we will follow up with an RCA.

 
Apr-5, 5:00pm EDT

Working with Microsoft, StatusCast’s team confirmed that the disruption was caused by an outage of SQL databases in Azure’s US East region, where StatusCast is primarily hosted.

StatusCast itself was impacted by this outage from approximately 8:19 PM EDT and had fully recovered by 9:03 PM EDT. StatusCast’s team will continue to work closely with Microsoft to further optimize its offering so that the impact of service provider outages is as small as possible.

Cloud Providers Twilio
 
Mar-8, 8:26am EST
We are continuing to experience SMS Delivery Delays when sending messages to Multiple Networks in Thailand. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 2 hours or as soon as more information becomes available.
 
Mar-8, 10:20am EST
We are observing recovery in SMS delivery delays when sending messages to Multiple Networks in Thailand. We will continue monitoring the service to ensure a full recovery. We will provide another update in 2 hours or as soon as more information becomes available.
 
Mar-8, 12:05pm EST
We are no longer experiencing SMS delivery delays when sending messages to Multiple Networks in Thailand. This incident has been resolved.
Cloud Providers Twilio
 
Mar-6, 7:44am EST
We are continuing to investigate this issue.
 
Mar-6, 8:27am EST
We are continuing to experience SMS Delivery Delays to Telesom Network in Somalia. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 2 hours or as soon as more information becomes available.
 
Mar-6, 10:29am EST
We are still experiencing SMS delivery delays to Telesom Network in Somalia. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 4 hours or as soon as more information becomes available.
 
Mar-6, 2:24pm EST
We are still experiencing SMS delivery delays to Telesom network in Somalia. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 8 hours or as soon as more information becomes available.
 
Mar-6, 10:12pm EST
We are observing recovery in SMS delivery delays when sending messages to Telesom network in Somalia. We will continue monitoring the service to ensure a full recovery. We will provide another update in 2 hours or as soon as more information becomes available.
 
Mar-7, 12:12am EST
We are no longer experiencing SMS delivery delays when sending messages to Telesom network in Somalia. This incident has been resolved.
Cloud Providers Twilio
 
Mar-6, 8:09am EST
We are continuing to experience SMS Delivery Delays to WOM Network in Colombia. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 2 hours or as soon as more information becomes available.
 
Mar-6, 10:15am EST
We are still experiencing SMS Delivery Delays to WOM Network in Colombia. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 4 hours or as soon as more information becomes available.
 
Mar-6, 2:18pm EST
We are still experiencing SMS Delivery Delays to WOM network in Colombia. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 8 hours or as soon as more information becomes available.
 
Mar-6, 2:20pm EST
We are continuing to experience SMS delivery delays to WOM Network in Colombia. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 8 hours or as soon as more information becomes available.
 
Mar-6, 10:18pm EST
We are continuing to experience SMS delivery delays to WOM Network in Colombia. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 16 hours or as soon as more information becomes available.
 
Mar-7, 2:14pm EST
We are still experiencing SMS delivery delays to WOM Network in Colombia. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 24 hours or as soon as more information becomes available.
 
Mar-7, 6:18pm EST
We are observing recovery in SMS delivery delays when sending messages to WOM Network in Colombia. We will continue monitoring the service to ensure a full recovery. We will provide another update in 2 hours or as soon as more information becomes available.
 
Mar-7, 8:19pm EST
We are no longer experiencing SMS delivery delays to WOM Network in Colombia. The incident has been resolved.
 
Mar-6, 3:56am EST
We are observing recovery in SMS delivery delays when sending messages to Multiple US and Canadian Networks via a Subset of Longcodes. We will continue monitoring the service to ensure a full recovery. We will provide another update in 2 hours or as soon as more information becomes available.
 
Mar-6, 5:43am EST
This incident has been resolved.
Cloud Providers Twilio
 
Feb-26, 8:28am EST
We are experiencing SMS delivery delays when sending messages to the T-Mobile network in Montenegro. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 1 hour or as soon as more information becomes available.
 
Feb-26, 9:18am EST
We are observing recovery in SMS delivery delays when sending messages to the T-Mobile network in Montenegro. We will continue monitoring the service to ensure a full recovery. We will provide another update in 2 hours or as soon as more information becomes available.
 
Feb-26, 11:16am EST
We are no longer experiencing SMS delivery delays when sending messages to the T-Mobile network in Montenegro. This incident has been resolved.
Cloud Providers Twilio
 
Feb-25, 11:29am EST
We are still experiencing SMS delivery failures to Evatis network in Djibouti. Our engineers are working with our carrier partner to resolve the issue. We expect to provide another update in 4 hours or as soon as more information becomes available.
 
Feb-25, 3:11pm EST
We are still experiencing SMS delivery failures to Evatis network in Djibouti. Our engineers are working with our carrier partner to resolve the issue. We expect to provide another update in 8 hours or as soon as more information becomes available.
 
Feb-25, 8:18pm EST
We are observing successful SMS delivery to Evatis Network in Djibouti. We will continue to monitor to ensure full service recovery. We expect to provide another update in 2 hours or as soon as more information becomes available.
 
Feb-25, 10:18pm EST
We are no longer experiencing SMS delivery failures to the Evatis network in Djibouti. This incident has been resolved.