Incident

CrowdStrike agent update caused global disruption in Windows, impacting most industries

Take action: The CrowdStrike incident put on full display a concern shared by many professionals: the risk of third-party, auto-updating software, such as an EDR agent, running with root-level privileges on most of our infrastructure. We place implicit trust in such programs because they are the *security* software.


Learn More

On July 19, 2024, a large share of businesses worldwide suffered an outage of their Windows computers.

The incident was caused by a defective software update from CrowdStrike to its endpoint agent (Falcon sensor), which is installed on a huge number of Windows computers. The update triggered a logic error and a widespread Blue Screen of Death (BSOD) loop on Windows hosts, leaving affected systems unusable. The faulty update did not impact Mac and Linux hosts.

Although the incident was initially speculated to be a cyberattack, CrowdStrike confirmed that it was non-malicious and not the result of a security breach.

The outage led to substantial disruptions across numerous sectors, including:

  • Healthcare: Hospitals and health systems reported varied impacts, from minor disruptions to significant issues affecting medical technology and communications, resulting in delays, diversions, or cancellations of clinical procedures.
  • Airlines: Major airlines such as Delta, United, and American Airlines were forced to ground flights due to the outage.
  • Financial Institutions: Banks and other financial services, especially in Australia and New Zealand, experienced outages.
  • Technology Services: Microsoft 365 apps and services, as well as systems running on Microsoft’s Azure cloud service, were notably affected. Google's Compute Engine also faced issues.
  • Stock Exchanges: The London Stock Exchange reported disruptions to its RNS news feed service.
  • Emergency Services: Some 911 call centers in the U.S. experienced disruptions, although there were no reported impacts on New Jersey emergency services.

Neither CrowdStrike nor Microsoft has explained what was broken in Windows to cause the BSOD loop, or whether the same logic problem can be abused by malicious attackers.

CrowdStrike identified and isolated the defect and deployed a fix. Unfortunately, computers already stuck in a BSOD loop could not load the fix because they were unusable. Each of these Windows hosts had to be fixed individually by rebooting into Safe Mode and manually deleting the faulty files. Because the remediation was manual, recovery was long, tedious, and expensive.
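The file-deletion step of that manual remediation can be sketched in a few lines of Python. This is only an illustration of the logic, not a recovery tool: a host in Safe Mode would normally be repaired by hand or with a vendor-supplied procedure. The directory and file pattern below are not stated in this article; they reflect the workaround CrowdStrike published at the time and should be treated as assumptions to verify against current vendor guidance. The function name `remove_faulty_files` is hypothetical.

```python
from pathlib import Path

# Assumptions (not from this article): location and name pattern of the
# faulty channel files, per CrowdStrike's published workaround.
ASSUMED_DIRECTORY = r"C:\Windows\System32\drivers\CrowdStrike"
ASSUMED_PATTERN = "C-00000291*.sys"


def remove_faulty_files(directory, pattern, dry_run=True):
    """Find files in `directory` matching `pattern` and optionally delete them.

    With dry_run=True (the default) nothing is deleted; the matches are
    only returned so an operator can review them first.
    """
    matches = sorted(Path(directory).glob(pattern))
    if not dry_run:
        for path in matches:
            path.unlink()  # delete the matching file
    return matches
```

Calling `remove_faulty_files(ASSUMED_DIRECTORY, ASSUMED_PATTERN)` lists the candidate files without touching them; passing `dry_run=False` deletes them. Defaulting to a dry run is a deliberate choice, since the function operates on a driver directory.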

Many experts have offered opinions on how this issue could have been avoided. Most suggestions revolve around each organization deploying updates to a test group before rolling them out company-wide. This may work for some companies, but it is an expensive and tedious process.

One interesting proposal is for the vendor (CrowdStrike) to make a canary deployment to a subset of customers first and proceed to a global deployment only if their systems are not adversely affected.
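A canary or staggered rollout of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how CrowdStrike or any other vendor deploys updates: the `deploy` and `is_healthy` callbacks, the stage fractions, and the failure threshold are all hypothetical placeholders.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class RolloutResult:
    completed: bool                          # True if every stage passed
    updated_hosts: List[str] = field(default_factory=list)
    halted_at_stage: Optional[int] = None    # index of the failing stage


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 1.0),
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Deploy to growing cumulative fractions of the fleet, halting on failure."""
    updated: List[str] = []
    for stage, fraction in enumerate(stage_fractions):
        # Cumulative number of hosts that should be updated after this stage.
        target = min(len(hosts), math.ceil(fraction * len(hosts)))
        batch = list(hosts[len(updated):target])
        if not batch:
            continue
        for host in batch:
            deploy(host)
        updated.extend(batch)
        # Check only the hosts updated in this stage before going wider.
        failures = sum(1 for host in batch if not is_healthy(host))
        if failures / len(batch) > max_failure_rate:
            return RolloutResult(False, updated, stage)
    return RolloutResult(True, updated, None)
```

With the default fractions, a defective update would reach roughly 1% of hosts before the rollout halts, instead of the whole fleet. In practice each stage would also wait for a soak period before the health check, which this sketch omits.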

Each company will now need to make its own impact assessment and decide whether to accept global updates as they are released or to test and stagger them.
