Claude's Corner: Sonarly — Your On-Call Engineer Just Called In Sick (Permanently)

Sonarly is an autonomous AI agent that triages production alerts, finds root causes with 78% accuracy, and opens fix PRs—while your on-call engineer sleeps.

At 3 AM, an alert fires. Your on-call engineer silences it, squints at Datadog for twenty minutes, follows five different threads across Sentry, Slack, and a GitHub blame log, and eventually—maybe—traces it to a bad deploy. Then they write a fix, open a PR, and drag themselves back to bed. The whole thing took 90 minutes and cost them tomorrow's productivity.

Sonarly thinks this is insane, and they're right.

The YC W26 company is building what they call "the AI engineer for production"—an autonomous agent that wires into your monitoring stack, triages every alert before a human even sees it, hunts down the root cause across logs, traces, and code, and opens a fix PR in the background. No pager, no bleary-eyed debugging, no 90-minute MTTR. Just software that, increasingly, fixes itself.

It sounds like a pitch. It also turns out to be real. Their root-cause analysis (RCA) accuracy sits at 78%—compared to 53% for Claude Code with raw MCP connections to your monitoring tools. That 25-point gap is what a startup is made of.


What They Build

Sonarly is a SaaS product that plugs into the tools already running your production systems: Sentry, Datadog, Grafana, Slack, Discord, Linear. Setup takes three minutes. After that, every time an alert fires, Sonarly wakes up instead of your engineer.

The product has three distinct jobs. First, it deduplicates. A single bad deploy can generate 180 alerts in a day—Sonarly collapses those to about 50 unique issues. Then it filters by severity based on actual user and infrastructure impact, cutting that 50 down to roughly 5 things worth acting on. Finally, for those 5, it does the real work: traces the alert through logs, metrics, user feedback channels, and source code, determines the root cause with confidence, and opens a targeted PR.
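
The funnel is simple enough to sketch. Here is a minimal illustration in Python, assuming each alert carries a stable fingerprint and basic impact fields; the field names, thresholds, and clustering key are assumptions, since Sonarly hasn't published its actual logic:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Alert:
    source: str          # "sentry", "datadog", "slack", ...
    fingerprint: str     # error signature or monitor id (hypothetical field)
    users_affected: int
    infra_impact: float  # 0.0 to 1.0 impact score (hypothetical)

def dedupe(alerts: list[Alert]) -> dict:
    """Collapse raw alerts into unique issues keyed by signature (roughly 180 -> 50)."""
    clusters = defaultdict(list)
    for a in alerts:
        clusters[(a.source, a.fingerprint)].append(a)
    return clusters

def worth_acting_on(cluster: list[Alert], min_users: int = 25, min_infra: float = 0.5) -> bool:
    """Severity filter on real user/infrastructure impact (roughly 50 -> 5)."""
    return (sum(a.users_affected for a in cluster) >= min_users
            or max(a.infra_impact for a in cluster) >= min_infra)

def triage(alerts: list[Alert]) -> list[list[Alert]]:
    return [c for c in dedupe(alerts).values() if worth_acting_on(c)]
```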

The target customer is any engineering team that spends real hours on alert triage. That's almost every team past a certain size. The business model is classic developer SaaS: free tier to land, usage- or seat-based pricing to expand. With a two-person founding team and $500K in seed funding from YC, they're not burning fast—they're threading the needle between product-market fit and revenue.


How It Works Under the Hood

The most interesting technical decision Sonarly made isn't which LLM they use—it's how they solved the context problem that makes LLMs mediocre at production debugging by default.

Here's the problem: coding agents are excellent at writing software in isolation. They're lousy at understanding running systems. A vanilla LLM handed a Sentry stack trace doesn't know which services depend on each other, what changed in the last deploy, which log patterns correlate with which symptoms, or whether this alert is genuinely new or just a duplicate of the thing that fired six hours ago. It lacks runtime context. Sonarly's core product is building that context—automatically, continuously, for your specific production environment.

They do this with what the founders describe as a "living map"—a dynamically updated Markdown file that represents the topology and health of the production system. Services, their dependencies, their typical failure modes, their historical alert patterns. Every time Sonarly investigates an incident, it updates this map. It's a knowledge base built from operational reality, not documentation.
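
Sonarly hasn't published what that file looks like, but the mechanic is easy to picture. Here is a small sketch of appending one incident's lessons to such a map; the path, section format, and example values are hypothetical:

```python
from datetime import datetime, timezone
from pathlib import Path

MAP_PATH = Path("system_map.md")  # hypothetical location of the living map

def record_incident(service: str, root_cause: str, offending_deploy: str = "") -> None:
    """Append what an investigation learned to the Markdown system map."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    lines = [f"### {service}: incident on {stamp}", f"- Root cause: {root_cause}"]
    if offending_deploy:
        lines.append(f"- Introduced by: {offending_deploy}")
    with MAP_PATH.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n\n")

# Example call after a diagnosed outage (all values illustrative):
record_incident(
    "payments-api",
    "connection pool exhausted by a long-running query added in the last migration",
    offending_deploy="deploy abc1234",
)
```

The point isn't the file format; it's that every investigation leaves the next one starting from a richer picture of the system.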

The pipeline looks like this:

  1. Ingestion: Alerts arrive via webhooks from Sentry, Datadog, and user feedback channels (Slack, Discord).
  2. Deduplication: Before any LLM call, a clustering step groups alerts that share the same underlying cause. This is the efficiency unlock—you only pay compute and API costs for unique issues.
  3. Context assembly: For each unique alert cluster, Sonarly fetches relevant context via MCP—primarily running grep-style queries against Datadog and Grafana logs, pulling correlated traces, and looking at recent code changes.
  4. RCA: Claude Code runs against the assembled context, given the living system map as background. It reasons through the alert, correlates the evidence, and determines root cause.
  5. Severity scoring: The proposed root cause is scored against user/infrastructure impact metrics. Below a threshold, it gets logged but no PR is opened—avoiding what would otherwise be a GitHub spam problem.
  6. Fix + PR: For high-confidence, high-severity RCAs, the coding agent writes the fix and opens a PR with the full reasoning chain attached—specific logs, traces, code lines, commits, and the deployment that caused the issue.
  7. Map update: The system map is updated with what was learned from this incident.

The deduplication-before-LLM step is the key insight. Other approaches just pipe every alert directly to an AI agent—which creates the problem one HN commenter described from their own Sentry-to-Claude automation: 70% of PRs required manual review. Sonarly's triage-first architecture means the coding agent only activates on issues that are real, unique, and severe enough to justify a fix attempt.
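
To make the ordering concrete, here is a skeleton of how a single already-deduplicated cluster might flow through steps 3 to 7. Every function and threshold below is a stand-in rather than Sonarly's actual code; the real context-assembly and RCA steps would call MCP tools and a coding agent instead of returning stubs:

```python
SEVERITY_THRESHOLD = 0.7  # hypothetical cut-off for opening a PR

def assemble_context(cluster: list[dict]) -> dict:
    # Stand-in for step 3: grep-style queries against Datadog/Grafana logs,
    # correlated traces, and the diff of recent deploys, fetched via MCP.
    return {"alerts": cluster, "logs": [], "traces": [], "recent_commits": []}

def run_rca(context: dict, system_map: str) -> dict:
    # Stand-in for step 4: the agent reasons over the context with the living
    # system map as background, returning a root cause and a severity score.
    return {"cause": "tbd", "severity": 0.0, "evidence": context}

def open_fix_pr(rca: dict) -> None: ...        # stand-in for step 6
def update_system_map(rca: dict) -> None: ...  # stand-in for step 7

def handle_cluster(cluster: list[dict], system_map: str) -> None:
    rca = run_rca(assemble_context(cluster), system_map)
    if rca["severity"] < SEVERITY_THRESHOLD:
        return                      # step 5: logged, but no PR is opened
    open_fix_pr(rca)                # step 6: the PR ships with the reasoning chain
    update_system_map(rca)          # step 7: the map learns from the incident
```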

Their 78% root cause accuracy (versus 53% for Claude Code + MCP without the Sonarly layer) reflects exactly this. The delta is entirely in context quality and deduplication discipline.


Difficulty Score

| Dimension | Score | Why |
| --- | --- | --- |
| ML/AI | 6/10 | Heavy LLM orchestration and prompt engineering; no model training required, but the context assembly and agent coordination are genuinely hard to get right |
| Data | 7/10 | Real-time ingestion and correlation of heterogeneous telemetry (logs, traces, metrics, user signals); deduplication at scale is non-trivial |
| Backend | 7/10 | Multi-tenant agent pipeline, async alert processing, GitHub integration, webhook handling across 6+ platforms, stateful system map management |
| Frontend | 3/10 | Dashboard is secondary; the product value is invisible (it fires at 3 AM and you wake up to a PR) |
| DevOps | 8/10 | Deeply entangled with customer infra; must handle diverse Datadog/Sentry configurations, multi-cloud deployments, and operate reliably enough to be trusted with automated PRs |

The Moat — What's Hard to Replicate

The honest answer is: the individual pieces aren't hard. You can build a Sentry webhook receiver in an afternoon. You can call Claude via API. You can open GitHub PRs programmatically. Any competent engineer could assemble a version of this over a weekend.
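
To be concrete, here is roughly what that weekend version looks like when stitched together from real libraries (Flask, the Anthropic SDK, PyGithub). The route, repo name, model id, and payload handling are illustrative; deduplication, runtime context, severity gating, and the fix branch itself are exactly the parts this sketch skips:

```python
import os

import anthropic                      # official Anthropic SDK
from flask import Flask, request
from github import Github            # PyGithub

app = Flask(__name__)
claude = anthropic.Anthropic()        # reads ANTHROPIC_API_KEY from the environment
repo = Github(os.environ["GITHUB_TOKEN"]).get_repo("acme/payments-api")  # illustrative repo

@app.route("/webhooks/sentry", methods=["POST"])  # illustrative route; payload shapes vary
def on_alert():
    alert = str(request.get_json(force=True))[:4000]  # naive: dump whatever arrived

    # Ask Claude for a diagnosis with no runtime context beyond the alert itself.
    msg = claude.messages.create(
        model="claude-sonnet-4-20250514",  # example model id
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Diagnose this production alert:\n{alert}"}],
    )

    # Open a PR carrying the diagnosis. Assumes a fix branch was already pushed,
    # which is the part a real system actually has to produce and gate.
    repo.create_pull(
        title="Automated fix attempt for production alert",
        body=msg.content[0].text,
        head="auto/fix-attempt",       # pre-existing branch, hand-waved here
        base="main",
    )
    return "", 204
```

Every line is a library call. None of it knows what changed in the last deploy or whether this alert is a duplicate of the one from six hours ago, which is the 25-point gap in practice.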

What that weekend version can't replicate is the accuracy that comes from operating at scale across many production environments. The living system map compounds in value the longer Sonarly runs in your stack. It accumulates knowledge of your specific failure modes, your deployment patterns, which services tend to cascade, which alert signatures are noise. A fresh install doesn't have this—and a competitor can't fake it.

The second hard thing is trust calibration. The hardest engineering problem here isn't opening a PR—it's knowing when NOT to. A system that opens too many PRs trains engineers to ignore them or disable it. Getting the severity filtering and confidence thresholds right requires data from many production environments. Sonarly is building that data advantage now, while they're small enough to stay close to customer feedback.

Third: integrations are a moat through friction, not novelty. Once Sonarly is wired into Sentry, Datadog, Grafana, Slack, Linear, and GitHub for your organization, replacing it requires rewiring all of those connections plus migrating the accumulated system map. That's real switching cost.

What's easy to replicate: the basic architecture. The LLM calls. The PR-opening mechanism. The webhook integrations individually. Anyone building in this space starts with roughly the same toolkit.

The genuine risk is incumbents. PagerDuty, Datadog, and Sentry all have alert intelligence products. They have the monitoring data already—Sonarly has to pull it over APIs. If Datadog ships an autonomous RCA agent natively, it has structural advantages Sonarly can't match. This is the classic horizontal-platform threat to every dev tools startup.

The counter-argument: monitoring platforms are incentivized to surface more alerts, not fewer. Their business model runs on data volume. An autonomous agent that collapses 180 alerts to 5 isn't in their immediate interest. Sonarly's misalignment with incumbents might be their best protection.


Replicability Score: 42/100

Sonarly sits squarely in the "standard SaaS with meaningful defensibility" range. The architecture is reproducible—nothing here requires custom model training, proprietary hardware, or regulatory licensing. A small team with Claude API access, solid backend engineering, and six months of time could build a functional competitor.

What keeps the score above 35 is the compounding system map, the accuracy data from real production environments, and the integration breadth that creates switching costs. The 78% RCA accuracy is a real number that would take time to match. The living system map, built from each customer's actual incident history, isn't something you can bootstrap from scratch.

None of this is decade-deep moat territory. It's a year or two of defensibility while they scale—which, for a two-person YC team burning $500K of seed, is exactly what they need.


Why It Matters

The broader bet Sonarly is making isn't just about on-call pain—it's about what "software engineering" means when coding agents exist. The first wave of AI coding tools (Copilot, Cursor, Claude Code) made writing software faster. The second wave, which Sonarly represents, makes running software more autonomous.

If your production system can diagnose and fix 78% of its own incidents, you're not running a traditional engineering operation anymore. You're running something closer to a self-correcting system. The engineers' job shifts from firefighting to reviewing proposed fixes—which is both higher leverage and significantly better for morale.

At two people and $500K, Sonarly is an early bet on a world where autonomous production engineering is expected infrastructure. They might be early. They might also be exactly on time. The 3 AM pager is a solvable problem, and the market for solving it is every engineering team that exists.

That's a large market.
