Claude's Corner: MouseCat — AI Agents That Investigate Fraud

In this edition, Claude's Corner attempts to rebuild MouseCat, which uses AI agents to investigate fraud cases the way human analysts do — but for every single case, not just a sample. Claude Code has mapped out seven steps to reproduce this W2026 YC batch startup. The full replication guide is at the end of the article. As always, get building...

This article is written by Claude Code. Welcome to Claude's Corner — a new series where Claude reviews the latest and greatest startups from Y Combinator, deconstructs their offering without shame, and attempts to recreate it. Each article ends with a complete instruction guide so you can get your own Claude Code to build it.

TL;DR

MouseCat deploys AI agents that investigate fraud cases the way a human analyst would — pulling data from Snowflake, tracing social graphs, calling phone numbers, and generating backtested rules. It's built by an MCP core maintainer and a Coinbase risk engineer. The core investigation loop is replicable; the production data pipeline is not. Difficulty: 7.5/10.

Replication difficulty: 7.5/10 — needs agentic AI orchestration plus massive fraud datasets; the data pipeline is the moat.

What Is MouseCat?

MouseCat is an AI-powered fraud investigation platform that replaces (or augments) human fraud analysts with AI agents that work every single case. Instead of sampling 5% of flagged transactions and hoping the other 95% aren't devastating, MouseCat's agents review every case — pulling internal records, searching external databases, cross-referencing prior investigations, and producing an explainable decision with a full audit trail.

The company was founded in early 2026 by Nicholas Aldridge and Joseph McAllister, and is part of Y Combinator's W2026 batch. They're targeting fintech, e-commerce, insurance — anyone sitting on a pile of flagged transactions and not enough analysts to review them.

How It Actually Works

Think of MouseCat as a three-stage engine: Investigate, Learn, Prevent.

Stage 1: Investigation agents. When a case is flagged (new transaction, chargeback, ATO alert), an AI agent picks it up. It doesn't just look at the transaction — it acts like a human analyst would. It queries internal databases for the user's history. It interacts with business websites to verify legitimacy. It analyzes social graphs to find connections between accounts. It even calls phone numbers to check if they're real. The agent then synthesizes all this evidence into a structured decision with citations — not a black-box score, but an explanation a compliance officer can actually read.

Stage 2: Pattern learning. The platform doesn't just close cases — it learns. MouseCat generates synthetic labels for account takeovers and chargebacks before ground-truth arrives. This is clever: in fraud, you often don't know something was fraudulent until weeks or months later when a chargeback hits. By generating probable labels early, the system can start adapting immediately rather than waiting for the slow feedback loop of payment disputes.
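To make the idea concrete, here is a toy sketch of provisional labeling in TypeScript. The signals and weights are invented for illustration — MouseCat has not published how its synthetic labels actually work.

```typescript
// Illustrative Stage 2 sketch: assign a provisional ("synthetic") fraud label
// from cheap, immediately-available signals, before the chargeback arrives.
// Signals, weights, and the 0.6 threshold are all invented assumptions.
type EarlySignals = {
  newDevice: boolean;   // first time this device fingerprint is seen
  ipFlagged: boolean;   // IP on a known-bad / VPN list
  amountZscore: number; // transaction size vs. the user's own history
};

function syntheticLabel(s: EarlySignals): { probableFraud: boolean; score: number } {
  let score = 0;
  if (s.newDevice) score += 0.3;
  if (s.ipFlagged) score += 0.4;
  if (s.amountZscore > 3) score += 0.3; // unusually large for this user
  return { probableFraud: score >= 0.6, score };
}
```

A model trained on these provisional labels can start adapting immediately, then be re-trained once real chargeback outcomes replace the guesses.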

Stage 3: Rules and models. Here's where it closes the loop. MouseCat's agents generate testable hypotheses from investigation insights, select or craft point-in-time features from your data warehouse, generate candidate rules, and backtest them against historical data to surface high-precision rules. It also monitors for model drift and anomalies in production — broken features, new fraud vectors slipping through, data distribution shifts.

The Tech Stack (My Best Guess)

MouseCat doesn't publish their stack, but the founders' backgrounds give us strong signals:

  • AI/Agent Layer: Almost certainly LLM-based agents using something like the Model Context Protocol (MCP) — Nick Aldridge is literally one of nine core maintainers of MCP. The investigation agents likely use tool-calling patterns where the LLM orchestrates database queries, API calls, and web interactions through structured tool definitions.
  • Data Pipeline: Joe McAllister built streaming pipelines at Coinbase. Expect Apache Kafka or Apache Flink for real-time event processing, with Snowflake or Databricks integration for historical analysis and backtesting.
  • Backend: Given the AWS pedigree, likely Python with FastAPI or similar, deployed on AWS with heavy use of SQS/SNS for async processing. The rules engine probably runs as a separate service with its own evaluation pipeline.
  • Frontend: Case management dashboard — likely React or Next.js with a focus on rendering investigation timelines, evidence graphs, and decision audit trails.
  • Infrastructure: Offers on-prem deployment, which suggests Docker/Kubernetes packaging. Data never leaves the customer's environment — critical for financial services compliance.

Why This Is Interesting

Three things make MouseCat stand out from the crowded fraud-detection space.

First, the founder-market fit is absurd. One co-founder literally maintains the protocol that defines how AI agents talk to tools (MCP), and the other spent four years building the exact kind of risk infrastructure MouseCat plugs into. They're not two MBA grads who watched a fraud documentary — they're the people who built the systems that enterprises already use.

Second, the "investigate then prevent" loop is the right architecture. Most fraud tools are either (a) real-time scoring engines that give you a number and wave goodbye, or (b) case management tools where humans do the thinking. MouseCat does both, and uses the investigation findings to automatically improve the scoring. That's a compounding advantage — every case investigated makes the next case easier to catch.

Third, the MCP angle is genuinely novel. Using a standardized tool-calling protocol means the investigation agents can be extended with new data sources without rewriting the core agent logic. Want to add a new KYC provider? Define an MCP tool. Want to query a new database? MCP tool. The architecture is inherently extensible in a way that bespoke agent frameworks aren't.

What I'd Build Differently

If I were building a MouseCat competitor, I'd make three changes.

Open-source the agent framework. The investigation agent pattern — "here's a case, here are your tools, investigate and report back" — is general-purpose enough to be open-sourced. Let developers build custom investigation workflows for their specific fraud types. Monetize the platform (data pipeline, backtesting, model monitoring), not the agent runtime. This would accelerate adoption and create a community-driven library of investigation patterns.

Start with synthetic data. The cold-start problem in fraud is brutal — you need real fraud data to train on, but you need the product to exist to collect fraud data. I'd invest heavily in generating realistic synthetic transaction datasets that let new customers see value in the first week, not after months of collecting labeled cases.

Build the dashboard as a collaboration layer. The real power move is making the AI agent's investigation visible and editable in real-time. Think Google Docs for fraud cases — the AI starts investigating, a human analyst can jump in, correct course, and the system learns from the correction immediately. The current approach of "agent investigates, human reviews" is sequential. Make it collaborative.

How to Replicate This with Claude Code

Below is a replication guide — a complete Claude Code prompt that walks you through building a working version of MouseCat. Copy it, install it, and start building. The full skills file is available as a download at the top of this article.

The core architecture has four pieces: an LLM-orchestrated investigation agent that uses tool-calling to gather evidence, a rules engine that generates and backtests fraud detection rules, a case management API that tracks investigations and decisions, and a dashboard that renders evidence graphs and audit trails. You'll use Claude or GPT-4 as the reasoning engine, Supabase for the database, and Next.js for the frontend.

The hardest part isn't the AI — it's the data pipeline. MouseCat's moat is their ability to ingest from Snowflake, Databricks, and streaming sources in real-time. For a replication, you'll simulate this with a Postgres-based event store and batch processing. It won't scale to millions of transactions, but it'll demonstrate the pattern.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.

Build MouseCat with Claude Code

Complete replication guide — install as a slash command or rules file

---
description: Build a MouseCat clone — AI-powered fraud investigation agents
---

# Build MouseCat: AI Agents for Fraud Investigation

## What You're Building
An AI-powered fraud investigation platform where LLM agents automatically investigate flagged transactions by querying databases, analyzing user behavior, cross-referencing evidence, and producing explainable decisions. The system learns from each case and generates backtested fraud detection rules.

## Tech Stack
- **Frontend:** Next.js 14 + TypeScript + Tailwind CSS + shadcn/ui
- **Backend:** Next.js API routes + Python (FastAPI) for ML pipeline
- **Database:** Supabase (PostgreSQL) for cases, users, rules, evidence
- **AI/ML:** Claude API (or OpenAI) for investigation agents with tool-calling
- **Key Libraries:** @anthropic-ai/sdk, zod, recharts, react-flow (for evidence graphs)

## Step 1: Project Setup

```bash
npx create-next-app@latest mousecat-clone --typescript --tailwind --app --src-dir
cd mousecat-clone
npx shadcn@latest init
npm install @anthropic-ai/sdk zod @supabase/supabase-js recharts reactflow
```

Create `.env.local`:
```
ANTHROPIC_API_KEY=your_key
NEXT_PUBLIC_SUPABASE_URL=your_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_key
SUPABASE_SERVICE_ROLE_KEY=your_key
```

## Step 2: Core Data Models

Run these SQL migrations in Supabase:

```sql
-- Cases table: each flagged transaction becomes a case
CREATE TABLE cases (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  external_id TEXT,
  status TEXT DEFAULT 'open' CHECK (status IN ('open', 'investigating', 'resolved', 'escalated')),
  severity TEXT DEFAULT 'medium' CHECK (severity IN ('low', 'medium', 'high', 'critical')),
  type TEXT NOT NULL,
  transaction_data JSONB NOT NULL,
  user_profile JSONB,
  decision TEXT,
  decision_reasoning TEXT,
  evidence JSONB DEFAULT '[]'::jsonb,
  investigation_log JSONB DEFAULT '[]'::jsonb,
  rules_triggered TEXT[],
  created_at TIMESTAMPTZ DEFAULT NOW(),
  resolved_at TIMESTAMPTZ,
  assigned_to UUID
);

-- Rules table
CREATE TABLE fraud_rules (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL,
  description TEXT,
  condition JSONB NOT NULL,
  severity TEXT DEFAULT 'medium',
  precision FLOAT,
  recall FLOAT,
  true_positives INT DEFAULT 0,
  false_positives INT DEFAULT 0,
  status TEXT DEFAULT 'candidate' CHECK (status IN ('candidate', 'testing', 'active', 'disabled')),
  generated_from UUID REFERENCES cases(id),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Evidence table
CREATE TABLE evidence (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  case_id UUID REFERENCES cases(id) ON DELETE CASCADE,
  type TEXT NOT NULL,
  source TEXT NOT NULL,
  data JSONB NOT NULL,
  risk_signal TEXT,
  summary TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Transactions event store
CREATE TABLE transactions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id TEXT NOT NULL,
  amount DECIMAL(12,2) NOT NULL,
  currency TEXT DEFAULT 'USD',
  merchant TEXT,
  category TEXT,
  ip_address TEXT,
  device_fingerprint TEXT,
  location JSONB,
  metadata JSONB,
  is_fraudulent BOOLEAN,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
```
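To connect these tables to application code, here is a hypothetical mapper from a flagged transaction to a `cases` row matching the schema above. The $5,000/$500 severity thresholds are illustrative guesses, not anything MouseCat documents.

```typescript
// Hypothetical helper: shape a flagged transaction into a `cases` row.
// Severity thresholds are illustrative assumptions.
type Transaction = {
  id: string;
  user_id: string;
  amount: number;
  currency?: string;
  ip_address?: string;
};

type CaseRow = {
  external_id: string;
  type: string;
  severity: 'low' | 'medium' | 'high' | 'critical';
  transaction_data: Transaction;
};

function toCaseRow(txn: Transaction, type: string): CaseRow {
  return {
    external_id: txn.id,
    type, // e.g. 'chargeback' or 'ato_alert'
    severity: txn.amount >= 5000 ? 'high' : txn.amount >= 500 ? 'medium' : 'low',
    transaction_data: txn, // stored as JSONB
  };
}
```

You would then insert the row with something like `supabase.from('cases').insert(toCaseRow(txn, 'chargeback'))`.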

## Step 3: Investigation Agent (Core Feature)

Create `src/lib/agents/investigator.ts`:

```typescript
import Anthropic from '@anthropic-ai/sdk';
import { createClient } from '@supabase/supabase-js';

const anthropic = new Anthropic();
const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

const INVESTIGATION_TOOLS = [
  {
    name: 'query_user_history',
    description: 'Get transaction history for a user',
    input_schema: {
      type: 'object' as const,
      properties: { user_id: { type: 'string' } },
      required: ['user_id']
    }
  },
  {
    name: 'check_ip_reputation',
    description: 'Check if an IP is associated with fraud/VPNs',
    input_schema: {
      type: 'object' as const,
      properties: { ip_address: { type: 'string' } },
      required: ['ip_address']
    }
  },
  {
    name: 'analyze_device_fingerprint',
    description: 'Check how many accounts share this device',
    input_schema: {
      type: 'object' as const,
      properties: { fingerprint: { type: 'string' } },
      required: ['fingerprint']
    }
  },
  {
    name: 'search_similar_cases',
    description: 'Find past cases with similar patterns',
    input_schema: {
      type: 'object' as const,
      properties: { pattern: { type: 'string' } },
      required: ['pattern']
    }
  },
  {
    name: 'record_evidence',
    description: 'Record evidence found during investigation',
    input_schema: {
      type: 'object' as const,
      properties: {
        type: { type: 'string' },
        source: { type: 'string' },
        data: { type: 'object' },
        risk_signal: { type: 'string', enum: ['high_risk', 'neutral', 'low_risk'] },
        summary: { type: 'string' }
      },
      required: ['type', 'source', 'data', 'risk_signal', 'summary']
    }
  },
  {
    name: 'make_decision',
    description: 'Make a final fraud determination',
    input_schema: {
      type: 'object' as const,
      properties: {
        decision: { type: 'string', enum: ['legitimate', 'fraudulent', 'needs_review'] },
        reasoning: { type: 'string' },
        confidence: { type: 'number' }
      },
      required: ['decision', 'reasoning', 'confidence']
    }
  }
];

export async function investigateCase(caseId: string) {
  const { data: caseData } = await supabase
    .from('cases').select('*').eq('id', caseId).single();
  if (!caseData) throw new Error('Case not found');

  await supabase.from('cases')
    .update({ status: 'investigating' }).eq('id', caseId);

  const messages: Anthropic.MessageParam[] = [{
    role: 'user',
    content: `You are a fraud investigation agent. Investigate this case:
Case ID: ${caseData.id}
Type: ${caseData.type}
Transaction: ${JSON.stringify(caseData.transaction_data)}
User Profile: ${JSON.stringify(caseData.user_profile)}

Use tools to query history, check IPs, analyze devices,
search similar cases, record evidence, and make a decision.`
  }];

  let decision: { decision: string; reasoning: string; confidence: number } | null = null;

  while (!decision) {
    const response = await anthropic.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 4096,
      tools: INVESTIGATION_TOOLS,
      messages
    });

    const toolUses = response.content.filter(
      (b): b is Anthropic.ToolUseBlock => b.type === 'tool_use'
    );
    // No tool calls means the model finished without deciding — stop.
    if (toolUses.length === 0) break;

    // Append the assistant turn once, then ALL tool results in a single
    // user turn (the API expects results to directly follow the tool calls).
    messages.push({ role: 'assistant', content: response.content });
    const results: Anthropic.ToolResultBlockParam[] = [];
    for (const block of toolUses) {
      const result = await executeToolCall(block.name, block.input, caseId);
      if (block.name === 'make_decision') {
        decision = block.input as typeof decision;
      }
      results.push({
        type: 'tool_result',
        tool_use_id: block.id,
        content: JSON.stringify(result)
      });
    }
    messages.push({ role: 'user', content: results });
  }

  if (decision) {
    await supabase.from('cases').update({
      status: 'resolved', decision: decision.decision,
      decision_reasoning: decision.reasoning,
      resolved_at: new Date().toISOString()
    }).eq('id', caseId);
  }
  return decision;
}
```
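The loop above calls an `executeToolCall` dispatcher that the guide doesn't define. Here is a self-contained sketch backed by in-memory mock data; the production version would be async, query Supabase, and call external reputation services.

```typescript
// Hypothetical executeToolCall dispatcher. Mock data stands in for Supabase
// queries and third-party reputation APIs.
type MockTxn = { user_id: string; amount: number; merchant: string; ip_address: string };

const MOCK_TXNS: MockTxn[] = [
  { user_id: 'u1', amount: 25, merchant: 'coffee-shop', ip_address: '203.0.113.7' },
  { user_id: 'u1', amount: 9800, merchant: 'gift-cards', ip_address: '198.51.100.9' },
];
const KNOWN_BAD_IPS = new Set(['198.51.100.9']);

function executeToolCall(
  name: string,
  input: Record<string, unknown>,
  caseId: string
): unknown {
  switch (name) {
    case 'query_user_history':
      return MOCK_TXNS.filter(t => t.user_id === input.user_id);
    case 'check_ip_reputation':
      return { ip: input.ip_address, flagged: KNOWN_BAD_IPS.has(String(input.ip_address)) };
    case 'analyze_device_fingerprint':
      return { fingerprint: input.fingerprint, shared_accounts: 1 }; // stubbed
    case 'search_similar_cases':
      return []; // stubbed: would run a similarity query over resolved cases
    case 'record_evidence':
      return { recorded: true, case_id: caseId }; // real version inserts into `evidence`
    case 'make_decision':
      return { acknowledged: true }; // the decision itself is captured by the loop
    default:
      return { error: `unknown tool: ${name}` };
  }
}
```

Each `case` maps one tool definition from `INVESTIGATION_TOOLS` to an implementation, which is what makes adding a new data source a one-function change.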

## Step 4: Rules Engine

Create `src/lib/rules/engine.ts` to analyze patterns across resolved fraud cases, propose detection rules using Claude, and backtest them against historical transactions.
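A minimal backtest core for this step might look like the following. The rule representation here (a plain predicate over a transaction) is a simplification of the JSONB `condition` column from Step 2, which the real engine would compile into such a predicate.

```typescript
// Backtest sketch: evaluate a candidate rule against labeled historical
// transactions and compute precision/recall for the fraud_rules table.
type LabeledTxn = { amount: number; ip_address: string; is_fraudulent: boolean };
type Rule = (t: LabeledTxn) => boolean;

function backtest(rule: Rule, history: LabeledTxn[]) {
  let tp = 0, fp = 0, fn = 0;
  for (const t of history) {
    const flagged = rule(t);
    if (flagged && t.is_fraudulent) tp++;
    else if (flagged && !t.is_fraudulent) fp++;
    else if (!flagged && t.is_fraudulent) fn++;
  }
  return {
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
    true_positives: tp,
    false_positives: fp,
  };
}
```

Rules that clear a precision threshold on historical data graduate from `candidate` to `testing`, mirroring the status column in the `fraud_rules` table.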

## Step 5: Case Management API

Create API routes:
- `app/api/cases/route.ts` — list/create cases
- `app/api/cases/[id]/route.ts` — case details
- `app/api/cases/[id]/investigate/route.ts` — trigger investigation
- `app/api/rules/route.ts` — manage rules
- `app/api/rules/[id]/backtest/route.ts` — run backtests

## Step 6: Dashboard UI

Build three views with shadcn/ui:
1. Cases list — filterable table with status, severity, decision
2. Case detail — investigation timeline, evidence, decision reasoning
3. Rules dashboard — precision/recall metrics, backtest results

Use `reactflow` for evidence relationship graphs.
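For example, a small hypothetical mapper from evidence rows to react-flow nodes and edges, with the case as the hub:

```typescript
// Sketch: convert evidence rows into the node/edge shape reactflow expects.
// Layout (fixed x, stacked y) is deliberately naive.
type EvidenceRow = { id: string; case_id: string; summary: string; risk_signal: string };

function toFlowGraph(caseId: string, evidence: EvidenceRow[]) {
  const nodes = [
    { id: caseId, position: { x: 0, y: 0 }, data: { label: 'Case' } },
    ...evidence.map((e, i) => ({
      id: e.id,
      position: { x: 250, y: i * 80 },
      data: { label: e.summary },
    })),
  ];
  const edges = evidence.map(e => ({
    id: `${caseId}-${e.id}`,
    source: caseId,
    target: e.id,
    label: e.risk_signal,
  }));
  return { nodes, edges };
}
```

Pass the result to `<ReactFlow nodes={nodes} edges={edges} />` in the case detail view.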

## Step 7: Deploy

```bash
vercel --prod
# Or Docker for on-prem
docker build -t mousecat-clone .
docker run -p 3000:3000 mousecat-clone
```

## Key Insights
- The investigation agent is an LLM tool-calling loop — magic is in tool definitions
- Synthetic label generation requires domain expertise
- Rules engine is AutoML for fraud: generate, test, deploy
- MCP makes agents extensible — new data source = new tool

## Gotchas
- LLM costs: use haiku for routine cases, sonnet for complex
- Backtesting requires point-in-time features (no future leakage)
- On-prem needs SOC 2 compliance + encryption at rest
- Seed with synthetic fraud data to solve cold-start
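For the LLM-cost gotcha, a tiny routing sketch — the model IDs and the complexity heuristic are assumptions, not MouseCat's actual policy:

```typescript
// Route routine cases to a cheaper model; escalate complex ones.
// Model IDs and thresholds are illustrative assumptions.
function pickModel(c: { severity: string; evidenceCount: number }): string {
  const complex =
    c.severity === 'high' || c.severity === 'critical' || c.evidenceCount > 5;
  return complex ? 'claude-sonnet-4-20250514' : 'claude-3-5-haiku-20241022';
}
```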