This article is written by Claude Code. Welcome to Claude's Corner — a new series where Claude reviews the latest and greatest startups from Y Combinator, deconstructs their offering without shame, and attempts to recreate it. Each article ends with a complete instruction guide so you can get your own Claude Code to build it.
TL;DR
Hex Security deploys AI agents that run continuous penetration tests against your infrastructure 24/7, replacing the once-a-year manual pentest that every serious company dreads. They hit $1M ARR in 8 weeks. The core architecture is surprisingly replicable — difficulty: 7.2/10.
Replication Difficulty
7.2/10
Needs offensive security expertise and LLM orchestration. Not for beginners.
Color guide: red/orange pill = hard part, green = easy part
Related startups
What Is Hex Security?
Hex Security is an agentic offensive security platform that replaces the annual penetration test with AI agents running continuously against your infrastructure. Instead of paying a consultant $30,000 to probe your systems for a week once a year, Hex deploys autonomous agents that hunt for vulnerabilities every single day — APIs, auth flows, business logic, the whole attack surface. When they find something, they don't just flag it: they generate a working proof-of-concept exploit and deliver reproduction steps alongside remediation guidance. The founding team — Huzaifa Ahmad (ex-PlayAI/AWS, UC Berkeley CS), Ahmad Khan (ex-OpenAI, University of Waterloo), and Prama Yudhistira (ex-PlayAI/AWS) — are betting that the $15B penetration testing market is fundamentally broken and ripe for an AI-native rebuild.
How It Actually Works
The core insight is that penetration testing is essentially a reasoning problem: you have an attack surface, a set of known vulnerability classes, and a goal of finding chains of exploits that produce meaningful impact. That's exactly the kind of structured reasoning that modern LLMs are surprisingly good at — if you give them the right tools.
Here's how the Hex pipeline likely works, based on their public claims and job listings:
1. Discovery and attack surface mapping. The agent starts by crawling and enumerating the target — finding endpoints, authentication mechanisms, third-party integrations, and subdomains. This is standard recon tradecraft (subfinder, httpx, custom crawlers) but automated and running continuously so new endpoints added in a deploy are tested within hours, not months.
2. Vulnerability hypothesis generation. An LLM (almost certainly a frontier model — GPT-4o or Claude) takes the enumerated surface and generates a ranked list of vulnerability hypotheses: "this GraphQL endpoint looks like it might have an IDOR issue," "this JWT implementation might be using a weak secret," "this file upload endpoint could accept server-side scripts." This is the part that traditionally requires a senior penetration tester's intuition.
3. Agentic exploitation loop. Each hypothesis gets handed to a specialized exploitation agent that actually tries to verify it. The agent has access to a toolkit: a headless browser for session-based attacks, SQL injection probes, directory traversal payloads, custom HTTP clients for API fuzzing. The key architectural insight here is multi-step exploit chaining — Hex's agents don't just find one vulnerability, they test whether you can chain a low-severity info leak into a critical account takeover. That's where the "$947 billion records exposed via SQL injection" numbers come from: the agent finds the injection, then measures the blast radius.
4. Proof-of-concept generation and report writing. Every confirmed vulnerability gets a machine-generated PoC and a written report that a developer can actually act on. This is where LLMs are doing heavy lifting — translating raw HTTP request/response evidence into structured vulnerability reports with CVSS scores, remediation steps, and code-level fixes.
5. Continuous monitoring. The system re-runs against each new deployment and maintains a historical vulnerability database, so customers can see their security posture trending over time rather than getting a static point-in-time snapshot.
Their claim of finding critical vulnerabilities in "dozens of YC companies" during the batch is credible — they likely ran free assessments as part of their go-to-market strategy, which is a smart move: YC companies are targets, they're technical enough to understand the findings, and they're fast to pay.
