OpenAI’s GPT-4 Can Autonomously Exploit 87% of One-Day Vulnerabilities

The GPT-4 large language model from OpenAI can exploit real-world vulnerabilities without human intervention, a new study by University of Illinois Urbana-Champaign researchers has found. Other open-source models, including GPT-3.5 and vulnerability scanners, are not able to do this.

A large language model agent — an advanced system based on an LLM that can take actions via tools, reason, self-reflect and more — running on GPT-4 successfully exploited 87% of “one-day” vulnerabilities when provided with their National Institute of Standards and Technology description. One-day vulnerabilities are those that have been publicly disclosed but yet to be patched, so they are still open to exploitation.

“As LLMs have become increasingly powerful, so have the capabilities of LLM agents,” the researchers wrote in the arXiv preprint. They also speculated that the comparative failure of the other models is because they are “much worse at tool use” than GPT-4.

The findings show that GPT-4 has an “emergent capability” of autonomously detecting and exploiting one-day vulnerabilities that scanners might overlook.

Daniel Kang, assistant professor at UIUC and study author, hopes that the results of his research will be used in the defensive setting; however, he is aware that the capability could present an emerging mode of attack for cybercriminals.

He told TechRepublic in an email, “I would suspect that this would lower the barriers to exploiting one-day vulnerabilities when LLM costs go down. Previously, this was a manual process. If LLMs become cheap enough, this process will likely become more automated.”

How successful is GPT-4 at autonomously detecting and exploiting vulnerabilities?

GPT-4 can autonomously exploit one-day vulnerabilities

The GPT-4 agent was able to autonomously exploit web and non-web one-day vulnerabilities, even those that were published on the Common Vulnerabilities and Exposures database after the model’s knowledge cutoff date of November 26, 2023, demonstrating its impressive capabilities.

“In our previous experiments, we found that GPT-4 is excellent at planning and following a plan, so we were not surprised,” Kang told…

Source…