Incident

Microsoft AI research team leaks 38TB of data

Take action: Never use Azure account SAS tokens for external sharing - it is too easy to grant overly broad permissions. And never store internal data in the same storage account as content intended to be publicly available.


Learn More

Microsoft's AI research team unintentionally exposed 38 terabytes of private data while sharing open-source training data on GitHub. This included a backup of two employees' workstations, containing secrets, passwords, private keys, and over 30,000 internal Microsoft Teams messages.

The data was shared using an Azure SAS token that granted access to the entire storage account rather than to the specific files being published. The exposed token was also highly permissive, allowing full control over the storage account. The incident illustrates the security risks that can accompany sharing AI models and training data at scale.

The incident occurred when Microsoft's AI research team shared a GitHub repository called "robust-models-transfer" under the Microsoft organization. This repository was meant to provide open-source code and AI models for image recognition. However, the URL shared to download models from Azure Storage allowed unintended access to the entire storage account, leading to the exposure of sensitive data from Microsoft employees' personal computer backups.

The exposed storage account contained not only open-source models but also private backups with sensitive personal data, including passwords and internal Microsoft Teams messages. The misconfigured SAS token granted full control permissions, enabling an attacker not only to view but also to modify files in the storage account. This was particularly concerning given the repository's purpose: anyone downloading the AI models could have received tampered files.

Azure Shared Access Signature (SAS) tokens, which were misconfigured in this incident, are URLs granting access to Azure Storage data with customizable permissions and access scopes. SAS tokens pose security risks because they can grant very broad access to a storage account, and Azure offers no built-in way to track or monitor the tokens that have been issued; an account SAS can effectively be revoked only by rotating the account key.
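Because a SAS token is just a set of query parameters appended to a storage URL, its scope and lifetime can be inspected before it is ever shared. The sketch below uses only the Python standard library to flag the risk signals involved in this incident; the query-parameter names (`sp` for permissions, `se` for expiry, `sr`/`ss` for resource scope) are real SAS parameters, but the `audit_sas_url` helper and its thresholds are hypothetical choices for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

# Meanings of the letters in the SAS "sp" (signed permissions) parameter.
PERMISSIONS = {
    "r": "read", "a": "add", "c": "create",
    "w": "write", "d": "delete", "l": "list",
}

def audit_sas_url(url: str) -> list[str]:
    """Return a list of warnings about an Azure Storage SAS URL."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    warnings = []

    # "sr" scopes a service SAS: "b" = one blob, "c" = whole container.
    # An account SAS instead carries "ss"/"srt" and spans the whole account.
    if "ss" in params:
        warnings.append("account SAS: grants access across the entire storage account")
    elif params.get("sr") == "c":
        warnings.append("container-scoped SAS: exposes every blob in the container")

    perms = params.get("sp", "")
    if set(perms) - {"r", "l"}:
        granted = [PERMISSIONS.get(ch, ch) for ch in perms]
        warnings.append("write-capable permissions granted: " + ", ".join(granted))

    expiry = params.get("se")
    if expiry:
        exp = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if (exp - datetime.now(timezone.utc)).days > 365:
            warnings.append("expiry more than a year away: " + expiry)
    return warnings

# A hypothetical URL shaped like the leaked token: container-wide scope,
# full permissions, and a 2051 expiry.
url = ("https://example.blob.core.windows.net/models/model.tar"
       "?sv=2020-08-04&sr=c&sp=racwdl&se=2051-10-06T00:00:00Z&sig=REDACTED")
for w in audit_sas_url(url):
    print("WARNING:", w)
```

A token intended for one-off public download of a single file would instead carry `sr=b`, `sp=r`, and a short-lived `se`, and would produce no warnings from a check like this.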

The SAS token was committed to GitHub with an expiry of Oct. 5, 2021, which was later extended to 2051. Security researchers discovered the token on Jun. 22, 2023 and reported the issue to Microsoft. Microsoft completed its internal investigation of the potential impact on Aug. 16, 2023.
