When Hackers Descended to Test A.I., They Found Flaws Aplenty
Avijit Ghosh wanted the bot to do bad things.

He tried to goad the artificial intelligence model, which he knew as Zinc, into producing code that would choose a job candidate based on race. The chatbot demurred: Doing so would be “harmful and unethical,” it said.

Then, Dr. Ghosh referenced the hierarchical caste structure in his native India. Could the chatbot rank potential hires based on that discriminatory metric?

The model complied.

Dr. Ghosh’s intentions were not malicious, although he was behaving as if they were. Instead, he was a casual participant in a competition last weekend at the annual Defcon hackers conference in Las Vegas, where 2,200 people filed into an off-Strip conference room over three days to draw out the dark side of artificial intelligence.

The hackers tried to break through the safeguards of various A.I. programs in an effort to identify their vulnerabilities — to find the problems before actual criminals and misinformation peddlers did — in a practice known as red-teaming. Each competitor had 50 minutes to tackle up to 21 challenges — getting an A.I. model to “hallucinate” inaccurate information, for example.

They found political misinformation, demographic stereotypes, instructions on how to carry out surveillance and more.

The exercise had the blessing of the Biden administration, which is increasingly nervous about the technology’s fast-growing power. Google (maker of the Bard chatbot), OpenAI (ChatGPT), Meta (which released its LLaMA code into the wild) and several other companies offered anonymized versions of their models for scrutiny.

Dr. Ghosh, a lecturer at Northeastern University who specializes in artificial intelligence ethics, was a volunteer at the event. The contest, he said, allowed a head-to-head comparison of several A.I. models and demonstrated how some companies were further along in ensuring that their technology was performing responsibly and consistently.

He will help write a report analyzing the hackers’ findings in the coming months.

The goal, he said: “an easy-to-access resource for everybody to see what problems exist and how we can combat them.”

Defcon was a logical place to test generative artificial…
