
OpenAI and Paradigm Launch EVMbench to Test AI Smart Contract Hacking




Rongchai Wang
Mar 05, 2026 00:55

New benchmark evaluates AI agents’ ability to detect, patch, and exploit smart contract vulnerabilities. GPT-5.3-Codex scores 72.2% on exploit tasks.




OpenAI and crypto venture firm Paradigm have released EVMbench, a benchmark that measures how well AI agents can find, fix, and exploit vulnerabilities in Ethereum smart contracts. The announcement comes as AI-powered security tools race to protect the $100 billion-plus locked in DeFi protocols.

The benchmark draws from 120 curated high-severity vulnerabilities pulled from 40 real security audits, mostly from Code4rena competitions. It also includes vulnerability scenarios from security reviews of Tempo, a Layer 1 blockchain built for stablecoin payments.

Three Ways to Break Smart Contracts

EVMbench tests AI agents across three distinct modes. In Detect mode, agents audit contract repositories and get scored on finding known vulnerabilities. Patch mode requires agents to fix vulnerable code without breaking existing functionality. Exploit mode is the most aggressive—agents must execute actual fund-draining attacks against contracts deployed on a sandboxed blockchain.
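EVMbench's actual tasks and harness aren't reproduced here, but as a rough illustration of what an Exploit-mode task asks an agent to do, here is a toy Python simulation of the classic reentrancy bug: the vault sends funds before debiting the caller's ledger entry, so a malicious receiver can re-enter `withdraw()` and drain every depositor's balance in one call. The `Vault` and `Attacker` names are illustrative only, not from the benchmark.

```python
class Vault:
    """Toy ledger 'contract' with the classic reentrancy bug:
    it sends funds BEFORE zeroing the caller's credit."""

    def __init__(self):
        self.ledger = {}  # depositor -> credited balance
        self.funds = 0    # total balance held by the contract

    def deposit(self, who, amount):
        self.ledger[who] = self.ledger.get(who, 0) + amount
        self.funds += amount

    def withdraw(self, who):
        credit = self.ledger.get(who, 0)
        if credit and self.funds >= credit:
            self.funds -= credit   # value leaves with the send...
            who.receive(credit)    # ...external call runs before the debit
            self.ledger[who] = 0   # ledger is zeroed too late


class Attacker:
    """Re-enters withdraw() from its receive hook until the vault is empty."""

    def __init__(self, vault):
        self.vault = vault
        self.stolen = 0

    def receive(self, amount):
        self.stolen += amount
        # Our ledger entry is still non-zero mid-call, so withdraw again.
        if self.vault.funds >= self.vault.ledger.get(self, 0) > 0:
            self.vault.withdraw(self)


vault = Vault()
vault.deposit("alice", 90)    # honest deposits
attacker = Attacker(vault)
vault.deposit(attacker, 10)   # attacker seeds a small credit
vault.withdraw(attacker)      # a single call drains the whole vault
# attacker.stolen == 100, vault.funds == 0
```

The fix is the same checks-effects-interactions ordering a Patch-mode agent would need to apply: debit the ledger before making the external call.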

The results show how quickly AI capabilities are advancing in this domain. GPT-5.3-Codex running via Codex CLI hit a 72.2% success rate on exploit tasks. That’s more than double the 31.9% score from GPT-5, which launched just six months prior.

Interestingly, AI agents perform better at attacking than defending. The exploit setting has a clear objective—keep iterating until you drain the funds. Detection and patching proved harder. Agents sometimes stopped after finding one bug instead of auditing exhaustively, and maintaining full contract functionality while removing subtle vulnerabilities remained challenging.

Real Limitations Worth Noting

OpenAI acknowledged EVMbench doesn’t capture the full difficulty of real-world contract security. Heavily deployed protocols like Uniswap or Aave undergo far more scrutiny than audit competition code. The benchmark also can’t verify if an agent finds legitimate vulnerabilities that human auditors missed—it only checks against known issues.

The exploit environment runs on a clean local Anvil instance rather than forked mainnet state, and timing-dependent attacks fall outside its scope. For now, the benchmark supports only single-chain environments.

$10M for Defensive Research

Alongside EVMbench, OpenAI committed $10 million in API credits specifically for defensive security research. The company is expanding its Aardvark security research agent to more users and partnering with open-source maintainers for free codebase scanning.

The timing matters. As AI agents get better at exploiting contracts, the window between vulnerability discovery and exploitation shrinks. Protocol teams that aren’t using AI-assisted auditing will increasingly find themselves at a disadvantage against attackers who are.

OpenAI released EVMbench’s tasks, tooling, and evaluation framework publicly. For DeFi developers and security researchers, it’s both a measuring stick and a warning about where AI capabilities are headed.





