DEF CON 2023 — Las Vegas — DEF CON’s most buzzed-about event, the AI Village, let thousands of hackers take their best shot at making one of eight different large language models (LLMs), including models from Google and OpenAI, say something dangerous.
According to spokespeople for the AI Village’s Hack the Future event, the challenge was a huge hit, but for now that’s all that’s being made public — results won’t be available for at least a week, maybe more.
The final AI hacking challenge leaderboard showed the first- and third-place prizes went to the handles “cody3” and “cody2,” respectively. The DEF CON AI Village itself was tight-lipped about any details on the winner, or even the prizes, but reports identified the person behind both top-three contest entries as Stanford master’s computer science student Truc Cody Ho, adding that he entered the competition a total of five times.
More details about the hacking competition results are forthcoming, according to Avijit Ghosh, one of the authors compiling them.
“We will be going through the anonymized data and finding patterns of vulnerabilities that participants discovered during the challenge and produce a report that will hopefully help ML and security researchers gain better insights into LLMs and policymakers make more informed regulations about AI,” Ghosh says.
While he wouldn’t directly answer questions about any of the winning LLM hacks, Ghosh says he himself was able to use the LLMs to generate discriminatory code, credit card numbers, misinformation, and more.
Another of the event’s organizers, Jutta Williams, has a day job as Reddit’s senior director and global head of privacy and assurance. On the side, she is the founder of Humane Intelligence, a nonprofit that provides safety, ethical, and other guidance to companies offering consumer AI products.
Historic Turnout For Event
Williams touted the event as the “largest LLM red teaming to date.”
All told, Williams said the AI Village attracted 2,240 hackers over the course of DEF CON 31, and explained that the goal was to make one of its LLMs “do something unsavory.” That could mean generating misinformation, or crafting just the right question to prompt the chatbot into doing something illegal —…