
CrowdStrike / Microsoft outage: are we completely helpless?


While the outage might not be the global catastrophe some commentators imagined, it still meant a lot of computers had to be restarted individually. More importantly, it disrupted many services, most noticeably in areas where we are used to immediate results, like the media and transportation.
So, what does this incident reveal in very basic terms?
- We are increasingly dependent on digital technology.
- Going back to “paper and pencil” is painful—if not impossible—due to the complexity of certain processes.
- Some IT solutions have systemic characteristics: about 70% of workstations today run on Windows.
- The software supply chain is becoming more complex, which excessively dilutes responsibility among multiple parties and increases risk levels, whether accidental or malicious.
- Our digital space is much more centralized than we might think.
- There will always be some residual risk, even after applying best practices for anticipation, prevention, protection, or insurance.
This is not new knowledge for most of us, but it’s a reminder that technology, like all human creations, can fail. It would be unreasonable to blame the type of technology involved—an EDR (Endpoint Detection & Response) system that protects us from numerous attacks daily with unmatched automated detection and response capabilities.
Should we just blame bad luck and consider this incident an unfortunate accident? No! Unlike the famous “black swan” events described by Nassim Nicholas Taleb, which are unpredictable, low-probability, and high-impact, there are ways to reduce the likelihood and limit the impact of such risks.
- The failure stemmed from a coding error, not in the CrowdStrike Falcon EDR software itself but in an update to its signature database. This isn’t the first time something like this has happened. In cybersecurity alone, faulty updates have caused incidents at McAfee in 2010, Symantec in 2012, Webroot SecureAnywhere and Avast in 2017, and Sophos in 2019, and this one won’t be the last. Because human error is inevitable, such updates should undergo preliminary testing and gradual deployment, a step that doesn’t seem to have been taken here. Should we then consider the limits of “speed security,” which often confuses speed with haste, and of its counterpart, “speed dev,” driven by the pace of digital transformation? Maybe it is time for “slow security” and “slow dev,” with thorough code reviews, pair programming, etc.
- The incident’s visible impact, the so-called “blue screen of death,” is related to the privileges that the EDR solution has on Windows, which allow it to access the system’s core (the “kernel”). A Microsoft spokesperson even tried to shift the blame to the European Commission, which, in 2009, required the company to open its operating system to facilitate interoperability with other solutions. However, this agreement, aimed at preventing anti-competitive practices, never required Microsoft to grant access to the kernel. It is entirely possible for security software to have the access it needs to monitor and react to threats without running in the kernel, where a fault can bring down the whole system.
- The often overly complex IT supply chain must be simplified by relying on highly integrated, automated, and interoperable solutions and services. Strengthening collaboration with suppliers and reviewing their contracts ensures that the number of involved parties doesn’t lead to collective irresponsibility. This is crucial for maintaining organizational sovereignty over their processes, data, employees, and, consequently, their business.
- Even though the IT supply chain is highly globalized, it has Single Points of Failure (SPoFs), that is, components whose failure could lead to a complete or partial failure of the system or service. This underscores the importance of implementing redundant and decentralized architectures, as well as regularly tested business continuity and disaster recovery plans (BCP and DRP). I recall a CISO from a local government who mandated a complete return to “paper and pencil” once a year for all teams in his organization. However, this is difficult to imagine in all sectors unless we accept, once again, a deliberate slowdown.
- Fortunately, the outage didn’t result in any deaths, injuries, or property damage. While essential services like transportation, media, or even healthcare were affected (mainly in activities like passenger check-in for airlines), air traffic control, navigation systems, industrial automation, and operational environments did not seem to be impacted. This aligns with CrowdStrike’s warranty limitations, which state that its products and services are “not fault-tolerant and are neither designed nor intended to be used in a hazardous environment requiring performance or safe operation in the event of failure.” Nevertheless, it would have been wise to assess the criticality of some affected services based on their role in broader chains and to apply differentiated security policies, especially for updates, wherever a dedicated environment and full isolation could not be achieved.
- The low “genetic diversity” of operating systems and cybersecurity solutions has been widely criticized since this crisis began. And rightly so. This certainly amplifies the impact of any problem, regardless of its origin. In terms of security, a compromise must be struck between the benefits of a widely deployed environment or solution—like an EDR benefiting from collective intelligence in facing threats (“crowd security”)—and alternative solutions that reduce systemic risk. However, this discussion should not be conflated with the equally strategic debate on digital sovereignty and technological independence! Ideally, solutions should be both secure and sovereign, but there are also secure but non-sovereign solutions, and vice versa.
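To make the idea of preliminary testing, gradual deployment, and differentiated update policies more concrete, here is a minimal sketch of a ring-based rollout. It is only an illustration under stated assumptions: the names `Ring`, `staged_rollout`, and `apply_update` are hypothetical, the host names are invented, and nothing here reflects how CrowdStrike or any other vendor actually ships updates. The principle is simply that an update reaches the least critical machines first, and the rollout halts as soon as a ring exceeds its tolerated failure rate.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Ring:
    """A hypothetical deployment ring: a group of hosts sharing one update policy."""
    name: str
    hosts: tuple[str, ...]
    max_failure_rate: float  # tolerated share of failed hosts before halting


def staged_rollout(
    rings: list[Ring],
    apply_update: Callable[[str], bool],
) -> tuple[list[str], Optional[str]]:
    """Deploy ring by ring, in the order given.

    `apply_update` is an assumed callback that installs the update on one host
    and returns True if the host is still healthy afterwards.

    Returns the names of the rings that completed and the name of the ring
    where the rollout was halted (None if every ring succeeded).
    """
    completed: list[str] = []
    for ring in rings:
        failures = sum(1 for host in ring.hosts if not apply_update(host))
        rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if rate > ring.max_failure_rate:
            # Stop here: later (more critical) rings never receive the update.
            return completed, ring.name
        completed.append(ring.name)
    return completed, None


if __name__ == "__main__":
    rings = [
        Ring("internal-test", ("lab-01", "lab-02"), 0.0),
        Ring("early-adopters", ("office-01", "office-02", "office-03"), 0.05),
        Ring("critical-services", ("checkin-01", "checkin-02"), 0.0),
    ]
    # Simulate a faulty update that crashes every host it touches.
    done, halted_at = staged_rollout(rings, lambda host: False)
    print("completed:", done, "halted at:", halted_at)
```

In this sketch a faulty update only ever reaches the first ring, so the most critical services are never exposed to it. The price is the one discussed above: each ring adds delay, which is precisely the trade-off between speed and safety.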
Ultimately, this failure might negatively impact the IT risk insurance market and the emerging cyber risk insurance market. Anything that increases the systemic nature of risk, regardless of the cause, makes it much less attractive to insurers. This could destabilize a market that is still precarious, as indicated by the latest LUCY report from AMRAE.