Hacking the future: Notes from DEF CON’s Generative Red Team Challenge

The 2023 DEF CON hacker convention in Las Vegas was billed as the world’s largest hacker event, focused on areas of interest from lockpicking to hacking autos (where the entire brains of a vehicle were reimagined on one badge-sized board) to satellite hacking to artificial intelligence. My researcher, Barbara Schluetter, and I had come to see the Generative Red Team Challenge, which purported to be “the first instance of a live hacking event of a generative AI system at scale.”

It was perhaps the first public incarnation of the White House’s May 2023 wish to see large language models (LLMs) stress-tested by red teams. The line to participate was always longer than the time available, that is, there was more interest than capability. We spoke with one of the organizers of the challenge, Austin Carson of SeedAI, an organization founded to “create a more robust, responsive, and inclusive future for AI.”

Carson shared with us the “Hack the Future” theme of the challenge — to bring together “a large number of unrelated and diverse testers in one place at one time with varied backgrounds, some having no experience, while others have been deep in AI for years, and producing what is expected to be interesting and useful results.”

Participants were issued the rules of engagement, a “referral code,” and brought to one of the challenge’s terminals (provided by Google). The instructions included:

A 50-minute time limit to complete as many challenges as possible.
No attacking the infrastructure/platform (we’re hacking only the LLMs).
Select from a bevy of challenges (20+) of varying degrees of difficulty.
Submit information demonstrating successful completion of the challenge.

Challenges included prompt leaking, jailbreaking, and domain switching

The challenges included a variety of goals, including prompt leaking, jailbreaking, roleplay, and domain switching. The organizers then handed the keys to us to take a shot at breaking the LLMs. We took our seats and became a part of the body of testers and quickly recognized ourselves as fitting firmly in the “slightly above zero knowledge” category.

We perused the various challenges and chose to attempt…

Source…