The rise of artificial intelligence is reshaping OSINT practices without overturning their core principles. Used as a tool for exploring data and putting findings on an objective footing, AI makes it possible to process unprecedented volumes of information, provided it remains strictly framed by human methods of verification and control.

OSINT is based on the methodical exploitation of open sources in order to produce verifiable and contextualized information. According to the latest report published by Global Market Insights, the global OSINT market was valued at USD 12.7 billion in 2025. It is expected to grow from USD 15.9 billion in 2026 to USD 133.6 billion by 2035, with a compound annual growth rate of 26.7%. The arrival of artificial intelligence is profoundly transforming this practice, not by redefining its principles, but by extending its operational capabilities. By facilitating the processing of large volumes of heterogeneous data and helping to structure complex corpora, AI lets analysts intervene earlier in investigative work, as a tool for exploring and prioritizing leads.

For Camille Pettineo, Deputy Editor-in-Chief in charge of editorial data exploitation at the INA (Institut National de l’Audiovisuel), AI is a lever for turning the vast quantities of data held by the INA into objective, analyzable material through its data portal, data.ina.fr. Launched in 2024, this platform makes available to the general public data drawn from hundreds of thousands of hours of French audiovisual archives. It uses artificial intelligence tools such as Whisper and TextRazor (an AI-based semantic analysis tool) to transcribe, analyze, and extract metadata from television and radio content, making it possible to explore media trends through interactive visualizations. Covering more than five years of history, the data.ina.fr site currently stores more than 27 million hours of programming.

Analyzing Media Coverage of Key Topics in Detail

Among the themes closely monitored by Camille Pettineo is gender parity in the media. Drawing on the tools available on data.ina.fr, the data journalist precisely documents the distribution of speaking time. “Gender parity cannot be reduced to declarative figures published each year by broadcasters or regulatory authorities. Saying that a channel features a certain number of women and men does not suffice to describe the reality of editorial practices. We went to look concretely at when speaking time is allocated and in which editorial contexts. A fine-grained analysis of time slots reveals marked disparities—for example on Sundays at 7 p.m., when women represented only 23% of speakers in November 2025,” Camille Pettineo stated during a roundtable organized at the OSINT Festival 2025 in Paris.
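The INA's exact pipeline is not public, but the kind of aggregation behind a figure like "23% of speakers in a given slot" can be sketched in a few lines. In this minimal Python illustration, the list of speaking turns, the field names, and the durations are all hypothetical; they stand in for whatever the real metadata extraction produces.

```python
from collections import defaultdict

def speaking_share_by_slot(turns):
    """For each time slot, return the share (in %) of total speaking
    time attributed to women."""
    totals = defaultdict(float)   # slot -> total seconds of speech
    women = defaultdict(float)    # slot -> seconds spoken by women
    for turn in turns:
        totals[turn["slot"]] += turn["duration"]
        if turn["gender"] == "F":
            women[turn["slot"]] += turn["duration"]
    return {slot: round(100 * women[slot] / totals[slot], 1)
            for slot in totals}

# Hypothetical speaking turns: (time slot, speaker gender, seconds spoken)
turns = [
    {"slot": "Sunday 19:00", "gender": "F", "duration": 115},
    {"slot": "Sunday 19:00", "gender": "M", "duration": 385},
    {"slot": "Monday 08:00", "gender": "F", "duration": 240},
    {"slot": "Monday 08:00", "gender": "M", "duration": 260},
]
print(speaking_share_by_slot(turns))
# {'Sunday 19:00': 23.0, 'Monday 08:00': 48.0}
```

The point of slicing by slot rather than by channel is exactly the one Pettineo makes: a channel-level average can look balanced while individual time slots remain heavily skewed.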

Another example concerns sexist and sexual violence. In an article published in La Revue des médias (INA), Camille Pettineo analyzes media coverage of the issue by 16 French audiovisual outlets over the 2019–2024 period. “The conclusion that emerges is very clear: between January 2019, a few months after the emergence of the term #MeToo, and the end of July 2024, #MeToo has never been mentioned as much as it has since the beginning of 2024. Quite simply, during the first half of 2024, the term had already been uttered more often on the airwaves of the 16 media outlets studied than over the entirety of each of the previous years, with a ratio ranging from one to two,” explains the Deputy Editor-in-Chief.
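The counting rule described in the article's source note (each speaking turn counted once if the term appears at least once) is simple to express in code. The sketch below is a hypothetical reconstruction in Python, not the INA's actual implementation; the corpus and its structure are invented for illustration.

```python
from collections import Counter

def mentions_per_year(turns, term):
    """Count speaking turns in which `term` appears at least once,
    grouped by year. A turn is counted once regardless of how many
    times the term occurs within it."""
    counts = Counter()
    for turn in turns:
        if term.lower() in turn["text"].lower():
            counts[turn["year"]] += 1
    return dict(counts)

# Hypothetical corpus of transcribed speaking turns
turns = [
    {"year": 2023, "text": "Le mouvement MeToo a changé le débat."},
    {"year": 2023, "text": "La réforme des retraites domine l'antenne."},
    {"year": 2024, "text": "MeToo revient au premier plan."},
    {"year": 2024, "text": "Plusieurs affaires relancent #MeToo."},
]
print(mentions_per_year(turns, "metoo"))
# {2023: 1, 2024: 2}
```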

Source: data.ina.fr – * Mentions counted in speaking turns in which the term “MeToo” is mentioned at least once. ** The channels concerned are TF1, France 2, France 3, Arte, M6, BFM TV, LCI, CNews, France Info (radio and TV), Europe 1, France Culture, France Inter, RMC, RTL, and Sud Radio.

AI Helps Find a Needle in a Haystack

Another testimony complements that of Camille Pettineo: that of Manon Romain, data journalist at Les Décodeurs (Le Monde). In her daily work, Manon Romain uses AI above all as a lever for technical efficiency, particularly for code generation. She notes that her job requires constant programming, whether to produce visualizations, process complex datasets, or develop internal tools for the newsroom. In this area, AI provides a tangible time-saving benefit, especially through advanced auto-completion systems. “I spend a lot of time coding, and AI is an enormous help in this respect,” she explains. She cites tools such as Cursor, an IDE (Integrated Development Environment) capable of anticipating changes several hundred lines of code away.

Another widely shared use concerns the automatic transcription of interviews, using an internal tool based on Whisper. “The newsroom uses it extensively, because this tool works extremely well,” she notes, comparing it with previous solutions considered less reliable. Alongside these established uses, Manon Romain also mentions more occasional experiments, particularly to explore corpora of political reactions at the European level or to monitor debates relating to the budget in the National Assembly or the Senate. These approaches remain marginal and tightly supervised, without becoming routine practices within the newsroom.

Manon Romain insists on a central distinction: AI can help explore a corpus, but never establish journalistic proof. She uses a vivid metaphor to describe this exploratory function. “AI has many characteristics that help us ‘find a needle in a haystack.’ It can suggest a lead, propose a hypothesis, or bring out a recurring pattern—provided the journalist retains control over the final validation.” In the case of sensitive work, such as the analysis of political discourse or the construction of message corpora, she describes a systematic process of manual verification of results. “AI seems to have saved us time. But after checking everything, I don’t know how much time was really saved,” she admits. This remark neatly sums up her position: AI accelerates certain steps, but it never eliminates the verification work that remains at the heart of journalistic practice.
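The "needle in a haystack" role Romain describes, surfacing recurring patterns for a human to examine, can be illustrated with something as simple as bigram counting. This is a generic sketch, not a tool Les Décodeurs has described using; the corpus is invented.

```python
import re
from collections import Counter

def recurring_bigrams(corpus, top_n=3):
    """Surface the most frequent word bigrams across a corpus.
    The output is a list of leads to examine, not a conclusion:
    every pattern still needs human verification."""
    counts = Counter()
    for doc in corpus:
        words = re.findall(r"\w+", doc.lower())
        counts.update(zip(words, words[1:]))
    return counts.most_common(top_n)

# Hypothetical corpus of short political reactions
corpus = [
    "budget cuts dominate the debate",
    "the debate over budget cuts continues",
    "senators reject the proposed budget cuts",
]
print(recurring_bigrams(corpus, top_n=2))
# [(('budget', 'cuts'), 3), (('the', 'debate'), 2)]
```

A real exploration would use richer signals (embeddings, named entities), but the workflow is the same: the machine ranks candidates, the journalist validates.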

“Ground Truth”: Comparing Automated Results and Human Work

A view fully shared by Camille Pettineo. Before data from the data.ina.fr site is published, several methodological safeguards are put in place. One of them involves comparing automated results with human work. “We do what we call ‘ground truth.’ We compare the result produced by AI with the result that would have been produced by humans, starting from the assumption that the human is perfect, that they score twenty out of twenty. We then establish what we call a confidence rate for the differential between the two results,” explains Camille Pettineo.
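At its core, the "ground truth" comparison Pettineo describes is an agreement rate between AI output and a human reference treated as perfect. The sketch below is a minimal illustration of that idea, assuming hypothetical speaker-gender labels; the INA's actual metric and annotation scheme may differ.

```python
def confidence_rate(ai_labels, human_labels):
    """Share of items (in %) where the AI output matches the human
    reference, which is treated as perfect ('twenty out of twenty')."""
    assert len(ai_labels) == len(human_labels)
    matches = sum(a == h for a, h in zip(ai_labels, human_labels))
    return round(100 * matches / len(human_labels), 1)

# Hypothetical speaker-gender labels for the same ten audio segments
human = ["F", "M", "M", "F", "F", "M", "F", "M", "M", "F"]
ai    = ["F", "M", "F", "F", "F", "M", "F", "M", "M", "M"]
print(confidence_rate(ai, human))
# 80.0
```

The differential (here, 20 points of disagreement) is what gets reported as a confidence indicator alongside the published figures, rather than being silently corrected.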

This comparison, which makes it possible to identify discrepancies, does not necessarily seek to correct them artificially. “We do not want to add human bias to algorithmic bias,” she insists. These anomalies instead become a pedagogical tool for the public. When a result is considered “fragile,” it is flagged and contextualized. “We invite users to click on an orange icon to understand which AI made the error, the impact on the displayed data, and the prospects for correction. The objective remains constant: to foster awareness of biases, supported by detailed methodological documentation,” concludes Camille Pettineo.

Don’t miss the OSINT Day (Forum INCYBER Lille) on April 1st!
