Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On the CrowdStrike Incident, published by Zvi on July 22, 2024 on LessWrong.
Things went very wrong on Friday.
A bugged CrowdStrike update temporarily bricked quite a lot of computers, bringing down such fun things as airlines, hospitals and 911 services.
It was serious out there.
Ryan Peterson: CrowdStrike outage has forced Starbucks to start writing your name on a cup in marker again and I like it.
What (Technically) Happened
My understanding is that it was a rather stupid bug: a NULL pointer dereference in the memory-unsafe C++ language.
Zack Vorhies: Memory in your computer is laid out as one giant array of numbers. We represent these numbers here as hexadecimal, which is base 16, because it's easier to work with… for reasons.
The problem area? The computer tried to read memory address 0x9c (aka 156).
Why is this bad?
This is an invalid region of memory for any program. Any program that tries to read from this region WILL IMMEDIATELY GET KILLED BY WINDOWS.
So why is memory address 0x9c trying to be read from? Well because… programmer error.
It turns out that C++, the language CrowdStrike is using, likes to use address 0x0 as a special value to mean "there's nothing here, don't try to access it or you'll die."
…
And what's bad about this is that this is a special program called a system driver, which has PRIVILEGED access to the computer. So the operating system is forced to, out of an abundance of caution, crash immediately.
This is what is causing the blue screen of death. A computer can recover from a crash in non-privileged code by simply terminating the program, but not a system driver. When your computer crashes, 95% of the time it's because it's a crash in the system drivers.
If the programmer had done a check for NULL, or if they had used modern tooling that checks these sorts of things, it could have been caught. But somehow it made it into production and then got pushed as a forced update by CrowdStrike… OOPS!
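To make the 0x9c detail concrete, here is a minimal C++ sketch. It is not CrowdStrike's actual code: the `Record` struct, its fields, and both helper functions are hypothetical, chosen only so that one field sits 0x9c bytes from the start of the struct. It illustrates why reading a field through a null pointer would land on address 0x9c rather than 0x0, and what the missing NULL check could look like.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical layout (an assumption for illustration): 0x9c bytes of
// other data, followed by the field the code wants to read.
struct Record {
    unsigned char header[0x9c];
    std::uint32_t value;  // sits at offset 0x9c (156) from the start
};

static_assert(offsetof(Record, value) == 0x9c,
              "value is expected at offset 0x9c in this sketch");

// The address a read of `value` would touch, given the struct's base
// address. With a null base (0), the result is 0 + 0x9c = 0x9c.
std::uintptr_t address_of_value(std::uintptr_t base) {
    return base + offsetof(Record, value);
}

// Checked read: returns an empty optional instead of dereferencing
// a null pointer.
std::optional<std::uint32_t> read_value(const Record* r) {
    if (r == nullptr) {
        return std::nullopt;  // nothing here, so do not touch memory
    }
    return r->value;
}
```

In this sketch, `address_of_value(0)` gives 0x9c (156), the address quoted above, while `read_value(nullptr)` returns an empty optional instead of touching that address. A system driver gets no such safety net unless the programmer writes it.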
Here is another technical breakdown.
A non-technical breakdown would be:
1. CrowdStrike is set up to run whenever you start the computer.
2. Then someone pushed an update to a ton of computers.
3. Which is something CrowdStrike was authorized to do.
4. The update contained a stupid bug that would have been caught if those involved had used standard practices and tests.
5. With the bug, it tries to access memory in a way that causes a crash.
6. Which also crashes the computer.
7. So you have to do a manual fix to each computer to get around this.
8. If this had been malicious it could probably have permawiped all the computers, or inserted Trojans, or other neat stuff like that.
9. So we dodged a bullet.
10. Also, your AI safety plan needs to take into account that this was the level of security mindset and caution at CrowdStrike, despite CrowdStrike having this level of access and being explicitly in the security mindset business. They were given this level of access to millions of computers, and their stock was only down 11% on the day, so they probably keep most of that access, and we aren't going to fine them out of existence either.
Yep.
Who to Blame?
George Kurtz (CEO CrowdStrike): CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted. This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed.
We refer customers to the support portal for the latest updates and will continue to provide complete and continuous updates on our website. We further recommend organizations ensure they're communicating with CrowdStrike representatives through official channels. Our team is fully mobilized to ensure the security and stability of CrowdStrike customers.
Dan Elton: No apology. Many people have...