Hacker demonstrates security flaws in GPT-4 just one day after launch


OpenAI’s powerful new language model, GPT-4, was barely out of the gates when a student uncovered vulnerabilities that could be exploited for malicious ends. The discovery is a stark reminder of the security risks that accompany increasingly capable AI systems.

Last week, OpenAI released GPT-4, a “multimodal” system that reaches human-level performance on a range of language tasks. But within days, Alex Albert, a University of Washington computer science student, found a way to override its safety mechanisms. In a demonstration posted to Twitter, Albert showed how a user could prompt GPT-4 to generate instructions for hacking a computer by exploiting flaws in the way the model interprets and responds to text.

While Albert says he won’t promote using GPT-4 for harmful purposes, his work highlights the threat of advanced AI models in the wrong hands. As companies rapidly release ever more capable systems, can we ensure they are rigorously secured? What are the implications of AI models that can generate human-sounding text on demand?

VentureBeat spoke with Albert through Twitter direct messages to understand his motivations, assess the risks of large language models, and explore how to foster a broad discussion about the promise and perils of advanced AI. (Editor’s note: This interview has been edited for length and clarity.)


VentureBeat: What got you into jailbreaking and why are you actively breaking ChatGPT?

Alex Albert: I got into jailbreaking because it’s a fun thing to do and it’s interesting to test these models in unique and novel ways. I am actively jailbreaking for three main reasons, which I outlined in the first section of my newsletter. In…
