Microsoft reports that massive 10-hour Azure outage was caused by a DDoS attack amplified by an error in its defenses
Take action: This is another example of how an assumed configuration can cause a massive problem. An error like this one is very difficult to catch, especially in an infrastructure as large as Azure. The only somewhat reliable check is to run a DDoS test once the configuration is in place. The takeaway: when you build a setup, do your best to test it in a scenario as close to reality as possible. It is better for your own test to reveal the errors than for a real disaster to do so.
Learn More
On July 30, 2024, Microsoft confirmed that a nine-hour outage affecting multiple Microsoft 365 and Azure services was caused by a distributed denial-of-service (DDoS) attack. The outage impacted services including:
- Microsoft Entra
- Microsoft 365
- Intune
- Power BI
- Power Platform
- Microsoft Purview
- Azure App Services
- Azure Application Insights
- Azure IoT Central
- Azure Log Search Alerts
- Azure Policy
- Azure portal
The DDoS attack triggered Microsoft's DDoS protection mechanisms, but an error in the implementation of these defenses inadvertently amplified the impact of the attack instead of mitigating it. Microsoft responded by making networking configuration changes and failing over to alternate networking paths to alleviate the issue.
Microsoft plans to release a Preliminary Post-Incident Review (PIR) within 72 hours, followed by a Final Post-Incident Review within two weeks. These reports will provide additional details and insights into the incident.