In a startling turn of events, Microsoft’s AI research division found itself in trouble after it accidentally exposed 38 terabytes of sensitive data to the public. Cloud security startup Wiz has stumbled upon a GitHub repository hosting tens of terabytes of sensitive data.
The breach exposed important private keys and passwords, along with messages from internal Microsoft Teams and the personal backups of two systems belonging to the employees.
The ongoing investigation of Wiz into cloud-hosted data exposures led to the discovery of this gnawing security loophole. The GitHub repository was meant to provide open-source code and AI models for image recognition.
These instructed users to download these models from an Azure Storage URL. However, this URL mistakenly granted permissions to the entire storage account. This exposed sensitive private data to the public.
Ami Luttwak, the co-founder and CTO of Wiz, focused on the growing importance of securing vast amounts of data that development teams involved in AI projects have been manipulating. With AI solutions exponentially growing, the key complexity lies in securing sensitive details.
The Gnawing Exposure Persisted Since 2020
The exposure, which had persisted since 2020, was the result of an overly permissive shared access signature (SAS) token included in the URL. Microsoft was quick to respond after being alerted by Wiz on June 22, 2023. A couple of days later, they revoked the problematic SAS token.
No customer data was exposed, and no other internal services were put at risk because of this issue.Microsoft
Microsoft’s Security Response Center detailed its response in a blog post, stating that the incident led to improvements in GitHub’s secret scanning service.
Now, this improvement monitors every change in public open-source codes for exposed secrets and credentials, including any SAS tokens with excessive permissions or expirations.
How the Leak Took Place
The leak originated from the “robust-models-transfer” repository on Microsoft’s AI GitHub. Due to carelessness, this repository became public during the publication of an open-source training data bucket. Besides valuable source code and machine learning models related to a 2020 research paper, it contained sensitive information from the workstations of employees.
This misconfiguration posed a significant security risk, as reported by Wiz researchers Ronny Greenberg and Hillai Ben-Sasson.
Anyone with knowledge of the SAS token could access the files, and one might have also deleted or overwritten the data.
To prevent vulnerabilities in the future, Microsoft focused on the importance of considering Account SAS tokens as sensitive as the account key itself. They advised not to use Account SAS tokens for external sharing, considering the potential for mistakes during token creation going unnoticed.
With the tech industry becoming largely reliant on AI, it serves as a stark reminder that data security remains paramount. Even the most technically advanced organizations can unintentionally end up exposing sensitive information, which calls for robust vigilance measures.