Matt Holland is the chief executive and founder of Field Effect, a global cybersecurity solution provider based in Ottawa. He started his career with Communications Security Establishment Canada (CSE).
The last thing I expected when I woke up on July 19 was that a large portion of the world’s computers would be stuck in what we tech nerds call the “blue screen of death” (BSOD) state, especially because of an entirely preventable error.
The incident was caused by a faulty update pushed out by CrowdStrike, one of the world’s largest cybersecurity companies. Windows computers running its software were rendered inoperable.
This was clearly an unacceptable event for the planet, with the cost of downtime estimated in the billions of dollars and the impact on human lives immeasurable.
Some tech experts have reacted by saying that Microsoft made a mistake in allowing security vendors such as CrowdStrike into the “Windows kernel.” I disagree.
First, let me explain what a kernel is. A kernel is the core of an operating system: the portion where its most highly privileged, protected instructions reside and run.
The advantage of accessing this space for security vendors is that we can easily monitor for, and detect, malware doing bad things, and ideally take action, such as blocking malware that is carrying out a ransomware attack. The downside is what happened in July: if a security vendor introduces a particularly bad bug, there is a risk the host computer will crash.
Microsoft is a unique vendor in that its flagship operating system, Windows, has historically been the most open and commercialized platform for vendors of all types to build upon. It allows hardware vendors to write kernel drivers (operating system extensions) that support their hardware. It allows security vendors to write drivers that add cybersecurity value (i.e., the protection from attackers that their customers desperately need).
Pulling security vendors out of the Windows kernel is an understandable response to the CrowdStrike incident, but it is a superficial one that won’t have the outcome its supporters expect.
It will, in fact, weaken security providers’ ability to protect their customers and defend against hackers.
Before founding my own cybersecurity firm, I was with Canada’s digital intelligence agency, part of a world-class team that built incredible offensive capabilities that could quite literally hack into almost any server, desktop or mobile phone on the planet.
The platforms that gave cybersecurity vendors limited access to the kernel were the easiest to hack. Just look at how readily NSO Group, the Israeli cyber-intelligence firm, continues to build attack chains that hack phones around the world; mobile operating systems give security vendors little or no kernel access. And NSO Group isn’t the only one.
This is because no system is perfect. There will always be “exploits” that allow attackers to get inside. To counter that, one of the key principles of defending an operating system is having a level of visibility or access at least equal to what an attacker can attain.
Windows grants security vendors this kernel access, allowing them to observe and defend at the same level an attacker operates. If we remove security vendors’ kernel access, kernel-level attacks and exploits would be almost impossible to detect and defend against.
Overreacting to one vendor’s serious mistake, without considering real-world data, does not lead to a better outcome for the world.
Rather, this issue is about company culture, developer experience and development mindset, which are what lead to great, stable code. Excellent quality assurance, well-designed architectures and rollout procedures that assume the worst are what lead to stable operational outcomes.
Those who buy cybersecurity technology should be demanding these things, because it is these things that matter.
Editor’s note: A previous version of this article incorrectly referred to Communications Security Establishment Canada (CSE) as the Canadian Security Establishment. This version has been updated.