Dark web: from the theft of customer databases to the resale of access, anatomy of the data available

Customer databases, internal documents and access to information systems feed a structured market on forums and messaging platforms frequented by cybercriminal actors. Once exfiltrated, this data circulates rapidly, is monetized and sometimes later distributed for free, serving increasingly targeted social engineering operations.

The “dark web” refers to a set of spaces where data originating from cybercriminal activities circulates and is traded. These environments include forums sometimes indexed by search engines, semi-public platforms and messaging channels such as Telegram. Content initially reserved for restricted circles often ends up being relayed into more open spaces, or even distributed free of charge. This shift can be explained by several factors: absence of buyers, conflicts between malicious actors or strategies aimed at strengthening a reputation within these communities.

Personal databases, the first family of available data

The first major category of data visible in these spaces consists of personal databases resulting from breaches. “It is very common to find, on generalist forums such as ‘BreachForums’, the sale or sharing of files originating from breaches at consumer services or highly visible organizations. These files contain e-mail addresses, phone numbers, postal addresses and sometimes additional attributes such as dates of birth, purchase histories or amounts spent on e-commerce sites. The data circulates quickly: announcement, sample as proof, then transaction,” explains David Sygula, Head of CTI at Anozr Way.

Several analysis tools help characterize this compromised personal data. Surfshark’s “Data breach statistics globally” page highlights astronomical figures: since 2004, 23.5 billion e-mail addresses have been disclosed worldwide, along with 58.5 billion personal data points. France alone accounts for 717 million leaked e-mail addresses and nearly two billion data points.

Surfshark’s experts have classified 100 types of data points into nine categories. Their analysis shows that three of these categories form the core of most data breaches. The first is passwords (30.4% of all breaches): passwords themselves, but also password hints, security questions and their answers. France ranks second worldwide in this category, with 588 million data points.

The second category of data points concerns personal information (28.8% of all breaches). It contains highly sensitive data such as full names, social security numbers, phone numbers, dates of birth and identification document numbers. France ranks second worldwide in this category with 492 million data points. Finally, the third category relates to location (22.9%): physical addresses, postal codes, time zones and IP-based locations. France ranks third worldwide with 307 million data points.

Exfiltrated files and access to companies’ networks and information systems

After personal data come internal documents from companies or organizations. “These are files exfiltrated or retrieved through more opportunistic exposures: insufficiently secured shares, misconfigured cloud spaces, compromised VPN access or servers, insiders,” notes David Sygula. These documents may come from structured attacks, but also from accidental exposures or internal leaks. The expert emphasizes both the diversity of this data's origins and its use for extortion: some attackers threaten to publish the documents unless the company pays a certain sum. They then justify publication with the argument: “We contacted them, they refused.”

The third major family of data available on the dark web concerns the sale of access to the networks and information systems of certain organizations. David Sygula describes a very active market where specialized actors obtain and maintain persistent access, then resell it, often without explicitly naming the target in the listing. “The target is described through clues: the sector, size and revenue of the company concerned. Negotiation then quickly moves to encrypted messaging platforms. Payment is mostly made in cryptocurrencies (Bitcoin, Ethereum…) in order to limit traceability and avoid overly explicit geographical markers,” he explains.

The operational value of this information lies mainly in its capacity to be aggregated

Alongside these three main families, a more diffuse economy linked to the sale of accounts and credentials has developed. Batches of Netflix or Spotify accounts are thus available, as well as credentials recovered via infostealers (malicious software designed to automatically collect sensitive information from a compromised device). “The unit value is often low, but the logic relies on volume and on the ability to test and then sort the credentials that are actually usable before resale,” specifies the cybersecurity expert.

It should also be noted that, in the recent large-scale public breaches (France Travail, Viamedis, Almerys, Free, Boulanger, Colis Privé, etc.), passwords appear less frequently than before. David Sygula nevertheless explains that the exposed databases aggregate a large amount of personal information that allows attacks to be contextualized. “We are moving from a logic of account compromise to a logic of social engineering, where the data is used to make a call, an SMS or a visit credible, and then to obtain approvals or payments through manipulation,” he comments.

Because aggregating several breaches makes it possible to build highly exploitable profiles, some cybercriminals go so far as to create “enriched” directories accessible by subscription. “All the available data feeds phishing and hybrid scams (digital + physical). These include personalized messages and reassurance steps (phone call, courier) made possible by knowledge of the address and contextual elements,” emphasizes David Sygula.

Finally, the cybersecurity expert points out that some open data or legally accessible files may be reposted on dark web forums and presented as a “hack” to gain reputation. “This republication also plays an operational role: it makes public data easier to access for people who do not know how to retrieve it, which mechanically increases the possible malicious uses (scams, context-based impersonation, targeting). One example is a file of deceased persons made available by INSEE that is redistributed in cybercriminal spaces with comments explaining possible offensive uses,” concludes David Sygula.