Microsoft releases tool to test generative AI security

Microsoft teams have been using PyRIT internally for a few months.

On February 22, 2024, Microsoft published a risk detection tool for generative AI, named Python Risk Identification Toolkit for generative AI (PyRIT). Microsoft’s AI Red Teams have been using it internally for several months, including on Copilot, the Redmond-based firm’s chatbot.

In a year, Microsoft researchers tested more than 60 generative AI systems, putting themselves in the shoes of cybercriminals. The company noted significant differences in attack strategies, compared to traditional software and AI. First off, generative AI combines the cyber risks inherent to any computer system and risks that are specific to the new tech, such as the potential development of malicious content and disinformation.

Moreover, the architecture of AI systems varies considerably from one model to another. Microsoft finally points out that the same request can generate different results according to the model, but also the context. The last two issues make it significantly more complicated to standardize security tests, requiring a heavy workload to reach reliable results.

This observation led to the creation of the PyRIT, which makes it possible to automate as many tasks as possible, and thus keep the most sensitive tasks for human teams. In particular, Microsoft’s tool can send malicious prompts to a generative AI model, and assess the response according to a scoring system. PyRIT is then able to generate a new test prompt, based on the score.

“For instance, in one of our red teaming exercises on a Copilot system, we were able to pick a harm category, generate several thousand malicious prompts, and use PyRIT’s scoring engine to evaluate the output from the Copilot system all in the matter of hours instead of weeks,” reads the Microsoft press release.