Status pages

Status pages Notification services
 
Aug-29, 9:30pm EDT
[closed]

Starting August 30th, 2023, for public status pages that allow SMS subscriptions, StatusCast will require that a valid email address be confirmed before a person can fully establish a new SMS subscription.

This change in the subscription workflow is intended to help prevent malicious parties from attempting to commit SMS fraud, which has become a growing concern for many SaaS companies dealing with mass notifications. We at StatusCast have witnessed this trend: in the past 6 months the quantity of malicious traffic attempting to commit SMS fraud has increased drastically. While we have continued to implement industry best practices to safeguard against this sort of activity, real user confirmation is ultimately the most effective way to prevent such unwanted attention.
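
For readers who want a concrete picture of the workflow described above, the following is a minimal sketch of email-gated SMS activation. It is not StatusCast's implementation; the `Subscriber` class, the token scheme, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
import secrets


@dataclass
class Subscriber:
    """Hypothetical subscriber record; field names are illustrative only."""
    email: str
    phone: str
    email_confirmed: bool = False
    sms_active: bool = False
    # Random token that would be emailed to the subscriber (assumed scheme).
    token: str = field(default_factory=lambda: secrets.token_urlsafe(16))


def confirm_email(sub: Subscriber, token: str) -> bool:
    """Mark the email as confirmed only if the supplied token matches."""
    if secrets.compare_digest(sub.token, token):
        sub.email_confirmed = True
    return sub.email_confirmed


def activate_sms(sub: Subscriber) -> bool:
    """Enable SMS delivery only once the email address has been confirmed."""
    sub.sms_active = sub.email_confirmed
    return sub.sms_active
```

In this sketch an automated signup that never opens the confirmation email can never reach the point where an SMS is sent, which is the property the new workflow relies on.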

Status pages Admin application
 
Jul-21, 9:40am EDT
[closed] At approximately 9:40am EDT StatusCast engineers were alerted to errors on the application that were preventing users from accessing both their status page and the administrative portal. StatusCast’s engineers have identified a potential issue with its service provider, Azure, and are currently working with Microsoft to diagnose and resolve the issue. 
 
Jul-21, 10:55am EDT

At this time services have been restored and should be operating as normal. If you continue to have any issues, please contact support@statuscast.com to open a ticket. We will follow up on this event with an RCA detailing what occurred and how we will handle this moving forward. 

 
Jul-21, 12:02pm EDT

Describe the full incident details below:

On July 21st, 2023 at approximately 9:40am EDT, StatusCast’s engineers received alerts that the application was displaying an HTTP Error 500.30 when attempting to access any *.status.page status page or admin portal. During this period any notifications in progress or from scheduled maintenance would have continued to work as expected. Additionally, anyone using StatusCast’s legacy (*.statuscast.com) version of the application was not impacted. 

Describe action taken by StatusCast to mitigate issue:

Engineers immediately began to investigate the cause of the problem. StatusCast’s service provider, Azure, indicated that it was undergoing maintenance in the region in which StatusCast is primarily hosted (US East). Engineers contacted Microsoft to confirm this and to get additional insight, as the issue was also impacting the failover region (US West). During this process StatusCast deployed an additional instance to another Azure region, which experienced the same errors as both East and West.

The root cause of the problem was ultimately related to Azure’s maintenance and the availability of one of StatusCast’s databases used for managing connections to the application. Leading up to the outage, StatusCast’s operations team was preparing for its monthly penetration test, which regularly involves a fresh test database for a reserved test application. The updated connection was not properly propagated to all of StatusCast’s application servers and traffic manager, which unfortunately caused the subsequent errors. 

Once the issue had been identified, StatusCast’s engineers were quickly able to restore service. StatusCast’s development team will be performing an emergency patch today (July 21st, 2023) to ensure that an issue like this can be caught without the application becoming unavailable. 



 


Status pages Admin application
 
May-10, 4:11pm EDT
[closed]

StatusCast engineers have detected a possible performance-impacting event affecting status pages and the admin application. This event is not impacting notification processing. We apologize for this inconvenience and will provide an update shortly. 

 
May-10, 7:06pm EDT

This event has been resolved. 

Status pages Admin application
 
Feb-17, 6:00am EST
[closed]

The StatusCast team will be performing maintenance on February 17 at 6:00am EST; the estimated duration is 60 minutes. We do not expect any impact to your service, but in some cases there may be a brief interruption.

 
Feb-17, 7:00am EST
StatusCast's maintenance has been completed. All services should be operational. If you encounter any issues, please contact StatusCast's support team at support@statuscast.com.
Status pages
 
Jan-12, 8:40am EST
[closed] StatusCast's engineers have been alerted that some users receive a 500 error when attempting to access a status page. 
 
Jan-12, 9:25am EST
Engineers have determined that a subset of accounts experienced an error related to recurring scheduled maintenance events that had been created before January 1st, 2023. This caused the error that some users experienced when they attempted to access an event. This issue has been corrected and we do not anticipate any further issues. We apologize for the inconvenience.  
Status pages Admin application
 
Nov-17, 9:47am EST
[closed]

StatusCast's engineers have been alerted that some users, while attempting to access their status page and/or administrative portal, experienced significant delays in response time, and in some cases the application would not load at all. We are working to diagnose and resolve this issue ASAP and will provide updates as available. 

 
Nov-17, 11:00am EST

Engineers have confirmed that this morning StatusCast experienced an unexpected, significant spike in traffic that affected response time for many users. In some cases, occasional timeouts were also reported when loading status pages. We have scaled our servers temporarily while we investigate the root cause of the spike, and we will be making changes to improve long-term scalability.


At this time all services should be operating as expected, and we will follow up with a detailed RCA once our investigation is concluded.

 
Nov-18, 3:58pm EST

On November 17th at approximately 9:45am EST StatusCast experienced a tremendous spike in inbound traffic (over 3x our historical max), which caused the primary caching mechanism for the application to become overloaded. As a result, many connection requests to the application experienced either major delays in page loads or complete timeouts. During this time StatusCast’s own status page was also affected, which prevented customers from checking on the status of the service and the actions being taken.


Engineers mitigated the issue by scaling out the service and performing an emergency flush of the caching system in order to restore service while investigating the source of the traffic spike. 


Once the system had been fully restored, engineers continued their investigation into the traffic spike and determined that it was not malicious in nature. The engineering and development teams have spent the last 24 hours making and preparing the following changes to StatusCast’s service offering:


  1. Permanently scaled up the resource baseline for all of StatusCast’s servers

  2. Added additional servers into the pool of servers used to maintain the application 

  3. Revisited auto-scaling rules around resource baselines for auto-mitigation purposes

  4. Planned caching updates for StatusCast’s December release that will aid in caching resource constraints

  5. Migrated StatusCast’s own page to an environment that is totally separate from the production space that clients are deployed to.


StatusCast’s team will continue to monitor both the health of its service offerings and analyze traffic patterns in order to gauge if additional changes to its infrastructure are necessary.

Status pages Admin application
 
Oct-24, 2:17pm EDT
[closed]

StatusCast engineers were alerted to an issue affecting some users' access to the status.page and admin versions of the application, resulting in slow load times or page timeouts.

 
Oct-24, 2:42pm EDT
Engineers identified that certain servers within the rotation had encountered memory issues and were able to resolve the issue. An RCA will follow this update.

At this point service should be operating as expected for all users, however if you continue to experience any issues please contact support@statuscast.com.
 
Oct-24, 6:55pm EDT
At 2:17pm EDT, StatusCast engineers were alerted to an issue that resulted in some customers experiencing slow load times and page timeouts when accessing status pages and admin portals. Engineers discovered that we had a major spike in traffic and that our servers' existing scaling logic was not able to keep up, which resulted in a couple of our resources being maxed out. By the time the engineers had determined the cause, the influx had resolved itself at 2:42pm EDT and traffic had returned to normal. As a temporary solution, we have scaled out the service to handle additional traffic spikes. As a more permanent solution, additional automated scaling rules have been implemented to allow the application to handle traffic spikes such as the one experienced today.
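
As a rough illustration of what an automated scaling rule of this kind can look like, the sketch below uses a simple target-tracking formula: choose the instance count that would bring average CPU back to a target level. It is not StatusCast's actual rule set; the function name, the 60% target, and the instance limits are assumptions chosen for the example.

```python
import math


def desired_instances(current: int, cpu_percent: float,
                      target_percent: float = 60.0,
                      min_instances: int = 2,
                      max_instances: int = 20) -> int:
    """Return the instance count that would bring average CPU near the target.

    All thresholds are illustrative assumptions, not real configuration.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    # Total load is roughly current * cpu_percent; divide by the target per instance.
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Clamp so a spike cannot scale without bound and a lull cannot scale to zero.
    return max(min_instances, min(max_instances, wanted))
```

For example, four instances averaging 90% CPU against a 60% target would be scaled out to six, while the upper bound keeps an extreme spike from requesting an unbounded number of servers.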
Status pages Admin application
 
Jul-19, 4:39pm EDT
[closed]

StatusCast's engineers have been alerted that some users are experiencing latency when attempting to access their status page as well as their administrative portal. At this time this latency does not appear to be impacting all users.

Engineers are working to resolve this now and we will post an update shortly when more information is available. 



 
Jul-19, 6:48pm EDT

At this time all services should be operating as expected. If you continue to experience any further issues please reach out to StatusCast support at support@statuscast.com.


We will follow up with additional information related to the root cause of this latency at a later time. 

 
May-30, 6:24am EDT
[closed]

StatusCast’s engineers were alerted to an issue affecting some customers accessing the status.page version of the application. Engineers confirmed that a certificate renewal was not properly propagated to all servers. This did not impact customers utilizing the statuscast.com domain or those utilizing a custom domain name.

Once we were made aware of this issue, the updated certificate was pushed out directly to all instances. At this point service should be operating as expected for all users; however, if you continue to experience any issues, please contact support@statuscast.com.
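
A propagation gap like the one described above can be detected by comparing the certificate each instance serves against the renewed one. The following is a minimal sketch under stated assumptions: the instance names and the `stale_instances` helper are hypothetical, and the certificates are passed in as raw DER bytes rather than fetched from live servers.

```python
import hashlib


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as a hex string."""
    return hashlib.sha256(cert_der).hexdigest()


def stale_instances(expected_der: bytes, served: dict[str, bytes]) -> list[str]:
    """Return the names of instances whose served certificate differs from the expected one."""
    expected = fingerprint(expected_der)
    return sorted(
        name for name, der in served.items()
        if fingerprint(der) != expected
    )
```

In practice the served bytes could be collected per instance with the standard library's `ssl.get_server_certificate` and `ssl.PEM_cert_to_DER_cert`, provided each instance can be addressed individually rather than through a load balancer.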

Status pages Notification services
 
May-19, 11:50am EDT
[closed]

StatusCast's engineers were alerted that scheduled maintenance events created from StatusCast's legacy application ("V2") were not properly auto-closing after their estimated duration had been reached. After an initial investigation, engineers confirmed the cause in the responsible service and applied a patch to correct the error. Any maintenance that was overdue for closure should now have been resolved, and StatusCast's engineers will continue to monitor the legacy process to ensure no other issues occur.
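
To make the auto-close behavior concrete, here is a minimal sketch of a job that closes maintenance events once their estimated duration has elapsed. It is an illustration only, not the patched StatusCast service; the `Maintenance` class and `close_overdue` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Maintenance:
    """Hypothetical scheduled maintenance record."""
    title: str
    start: datetime
    estimated_minutes: int
    closed: bool = False


def close_overdue(events: list[Maintenance], now: datetime) -> list[Maintenance]:
    """Close every open event whose estimated end time has passed.

    Returns the events that were closed by this call.
    """
    closed_now = []
    for event in events:
        due = event.start + timedelta(minutes=event.estimated_minutes)
        if not event.closed and now >= due:
            event.closed = True
            closed_now.append(event)
    return closed_now
```

Running such a function on a schedule is idempotent: events that are already closed are skipped, so a missed run is recovered by the next one.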