What actually happened?
On Friday 19 July 2024, a faulty update to security software vendor CrowdStrike's Falcon Sensor application triggered widespread disruption. Companies worldwide were affected, with Microsoft Windows-based computers running CrowdStrike software displaying a blue screen of death. According to news reports, CrowdStrike removed the update soon after release, once the issue became apparent. A fix became widely known on Friday afternoon (Australian time) but had to be applied manually to each affected computer.
How can this happen?
Cybercrime, and more recently AI-driven cybercrime, requires software problems to be rectified in ever shorter time frames. Cyber criminals now run enterprise-grade business operations with global distribution of services. For example, phishing-as-a-service uses a SaaS business model, providing access to a phishing kit (phishing pages, fake websites, etc.) in exchange for a fee. This fast pace puts pressure on Microsoft and security software vendors to deploy updates more rapidly. That in turn raises the risk of a problem being missed before an update is released and, once released, propagating globally at speed.
How did it affect Sentrian and Sentrian customers?
We do not use CrowdStrike software. However, a small number of clients were affected because their parent company mandated CrowdStrike or it had been implemented by a third-party security provider. For those firms we had a response organised late on the Friday, with our team of engineers volunteering to help rectify the issue over the weekend. We have a wonderful team!
Could it happen again?
Yes, and it could be much worse. This issue was not malicious, and the fix was not technically difficult and was quickly shared. Reportedly only one per cent of Windows computers worldwide were affected. If a similar event were caused by a malicious actor and left the impacted computers difficult or impossible to recover, the disruption would be significantly worse. Because the time to recover is largely determined by the number of technical people available, the IT industry could be overwhelmed, slowing the recovery and exacerbating the disruption.
What can we learn?
In my view there are three key takeaways from this event.
1. The root cause of the problem may well be human error (or maybe AI error), or we may never know, but large-scale disruptions like the CrowdStrike issue will almost certainly happen again at some point. There are already reasonably frequent outages from the top tech companies, including Microsoft. These are a fact of life, and the reason most vendors have a service status page. Being a big player (with a great many organisations depending on your services) doesn't make you immune from the problems that come with increasing tech complexity.
2. The impact of this event has made global disruption a real and enticing proposition for cyber criminals. Effective cyber security is more important than ever.
3. Business continuity planning needs to consider how your organisation will respond, and continue to operate, in the event of widespread disruption to your own computers, internet connectivity, power or cloud services such as Microsoft 365.
What does it mean for businesses?
It's a reminder that tech is rapidly increasing in complexity, driven first by the internet, then by cloud adoption, cybercrime and now AI. Things can and will go wrong. Last Friday was the first truly global demonstration of a significant tech failure. It is not difficult to imagine a more significant issue that causes greater disruption and chaos and is potentially much harder to rectify.
What can you do?
In short, there is a limited amount you can do other than review your cyber security stance and your business continuity and incident response plans. From a technical standpoint, consider the key dependencies of your business (for example power, systems, network and connectivity) and build in redundancy where practical. Multiple backups of your data in different locations also provide an important failsafe. Finally, last Friday illustrated the potential value of a tested plan to keep critical functions, such as payroll, getting paid, making payments and communications, running in the event of a disaster.
Please contact your Sentrian Customer Success Manager if you would like further guidance.