Status page and admin portal degraded performance

Incident
November 27, 3:00pm EST

Status: closed
Start: November 15, 8:00pm EST
End: November 18, 2:00pm EST
Duration: 2 days 18 hours
Affected Components:
Status pages, Admin application

Update

November 15, 8:00pm EST

StatusCast engineers have been alerted to a possible performance-impacting event affecting response times for status pages and the admin application. This event is not impacting notification processing. We apologize for the inconvenience and will provide an update shortly.

Resolved

November 18, 2:00pm EST

All services should be fully functional and performing normally.

Root Cause

November 27, 3:00pm EST

StatusCast's engineers determined that at approximately 8:00pm EST on November 15, 2024, several of StatusCast's application servers experienced an issue that caused response times to spike. StatusCast's infrastructure in Azure is designed to perform scaling procedures for services under duress. For all but one of the application services in question, scaling completed successfully and restored the service to an acceptable level of performance.

The remaining application service, located in the EU, did not scale correctly and stayed in a degraded state for multiple days. On November 18, engineers were alerted that some customers were still experiencing load time delays, and at that point the last server was corrected.

After this event, StatusCast's DevOps team began an audit of all scaling procedures. The audit has two goals: to ensure that application services across our Azure operational regions adhere to a consistent scaling process, and to ensure that monitoring is properly deployed to all services. Having a single application service remain in a degraded state for days is not acceptable.