Incident

Scraped Duolingo user data sold again on a new hacking forum

Take action: While the data leak is not new, it offers two lessons: (1) If you use Duolingo, expect phishing and spam. (2) If you develop APIs, enforce authentication, limit the number of requests, and carefully check which request content you accept and which data the API returns.


Learn More

Data scraped from approximately 2.6 million users of Duolingo, a prominent language learning platform, is available for purchase once again on a hacking forum.

Duolingo is one of the largest language learning websites in the world, with more than 74 million monthly users.

The data first surfaced in January 2023, when a hacker offered the scraped data of 2.6 million Duolingo users for sale on the Breached hacking forum, which has since been shut down.

The compromised dataset combines publicly available and non-public information:

  • login and real names (public),
  • email addresses (non-public),
  • internal details related to the Duolingo service (non-public).

When the data was first offered for sale, Duolingo confirmed to The Record that the information had been scraped from public profiles.

In the last two days, the scraped dataset of 2.6 million entries was posted on a new version of the Breached hacking forum, where it can be obtained for 8 site credits, equivalent to just US$2.

The data was scraped through a publicly accessible application programming interface (API) that has been openly shared since March 2023. This API lets anyone submit a username and retrieve JSON output containing public profile details. However, it also accepts an email address and reveals whether that address is associated with a valid Duolingo account.
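The weakness described above can be illustrated with a minimal sketch of a safer lookup handler. It is not Duolingo's code: the names `PUBLIC_FIELDS`, `USERS`, `LookupRequest` and `lookup_profile` are hypothetical, and the in-memory list stands in for a real user database. The sketch refuses email lookups from unauthenticated callers with the same answer whether or not the address exists, and returns only an allowlisted set of public fields.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist: only these fields may ever leave the API.
PUBLIC_FIELDS = ("username", "display_name")

# Hypothetical in-memory user store standing in for a real database.
USERS = [
    {
        "username": "ana",
        "display_name": "Ana P.",
        "email": "ana@example.com",
        "internal_id": 17,
    },
]


@dataclass
class LookupRequest:
    username: Optional[str] = None
    email: Optional[str] = None
    authenticated: bool = False


def lookup_profile(req: LookupRequest) -> dict:
    """Return public profile fields only; refuse anonymous email lookups."""
    if req.email is not None and not req.authenticated:
        # Identical response for known and unknown addresses, so the
        # endpoint cannot confirm that an email belongs to an account.
        return {"status": 401, "error": "authentication required"}

    for user in USERS:
        by_name = req.username is not None and req.username == user["username"]
        by_email = req.email is not None and req.email == user["email"]
        if by_name or by_email:
            # Copy allowlisted fields only; email and internal_id stay private.
            profile = {key: user[key] for key in PUBLIC_FIELDS}
            return {"status": 200, "profile": profile}

    return {"status": 404, "error": "not found"}
```

Building the response from an allowlist, rather than deleting sensitive keys from the stored record, means a field added to the record later is private by default.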

It has been confirmed that this API remains openly accessible on the internet, even after its misuse was reported to Duolingo in January. An attacker can feed the API large numbers of email addresses, potentially sourced from previous data breaches, and confirm which of them are linked to Duolingo accounts. This information was then used to compile a dataset containing both publicly available and confidential details.
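Bulk enumeration of this kind depends on sending a very large number of requests, which is why limiting the request rate matters. The following is a minimal sketch of a per-client sliding-window rate limiter, assuming a single process and an in-memory store. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and a production service would typically keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a client identifier to the timestamps of its accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        # The optional 'now' argument exists only to make the sketch testable.
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]

        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()

        if len(hits) >= self.max_requests:
            return False  # over the limit: reject without recording the hit

        hits.append(now)
        return True


# Example: three lookups per minute per client (hypothetical values).
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60.0)
```

A limiter keyed on IP address or API key does not stop a determined scraper using many addresses, so it complements authentication rather than replacing it.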

While real names and login names can be found in a user's public Duolingo profile, combining them with email addresses gives malicious actors an excellent opportunity to run targeted phishing campaigns using the leaked data.

Duolingo has not commented on the continuing public availability of the API.
