Hacking using AI-written exploits: human tests of LLM agents’ PoCs show they are almost all fake. 🤖🖋️📜👨🏻💻🤡
Back in 2025, researchers Vikram Nitin, Baishakhi Ray, and Roshanak Zilouchian Moghaddam reported that their LLM-based framework was able to generate PoC code for vulnerabilities based on CVE descriptions with a success rate of 36%. Impressive results!
In 2026, a group of academic researchers tested the same approach but added manual validation of PoC exploits (not only LLM-based tools). Unfortunately, the manual validation showed that 71.5% of “successful” PoCs were actually invalid.
The LLM-driven exploit generation pipeline looks like this:
Take a CVE description.
Use an LLM agent to reason about possible data flows and branch conditions.
Generate a proof-of-concept (PoC) exploit.
Run it.
Refine it in a feedback loop.
Now, after manual review, researchers found that the LLM was simulating the success of the PoC to complete the task.
For example:
🧨 The LLM was not actually triggering the exploit to confirm successful exploitation. Instead, the LLM-generated exploit was printing “[VULN]” or “[Success]” without even touching the target.
🧨 Instead of analyzing the target code logic, the LLM hardcoded a simplified version of the vulnerable logic directly into the PoC exploit to “simulate” execution after exploitation. And then - it reported “[Success].”
🧨 In some cases, the LLM used legitimate methods to create evidence of exploitation - instead of running the exploit to create a file, it simply created the file directly in the target directory.
Very interesting results, especially keeping in mind recent announcements from Anthropic about entering the cybersecurity business. 🙂 I’ve added both papers below - have a look and please share!
More details:
I Can’t Believe It’s Not a Valid Exploit [PDF, 2026]: https://arxiv.org/abs/2602.04165
FaultLine: Automated Proof-of-Vulnerability Generation Using LLM Agents [PDF, 2025]: https://arxiv.org/abs/2507.15241
目录
最新
- Multiple vulnerabilities in [Censored] electric motorcycles: how to hack a vehicle without one. ⚡🏍️📱🔬🤭
- Hacking wireless security cameras with a laser: from 120 meters away with a 100% success rate! 🛜📹 🔫☠️🥷🏻
- Hack 3D printers into robots: cheap, easy-to-manage & modify lab automation. 📇🛠️🤖🦾🌟
- New attack on OpenClaw or similar AI agents that infects them with “malicious skills.” 🤖💉👾😵💫🤡
- Hacking the Freebox HD (TV/Internet box): how one can use the DooM to exploit a router. 📺📦🔌👨🏻💻💰
- Jamming and spoofing of GPS on ships: real-world attacks on global navigation systems 🛳️🧭😵💫👁️👽
- Hacking ECU of heavy-duty vehicles (trucks, buses, etc.): How to attack the J1939 protocol at scale. 🚜🔧👨🏻💻🦠🎉
- Hacking crypto (cryptography! 🤪) without hard math: how to find bugs in implementation.👨🏻💻🔏👩🏿💻🤔💥