Claude's Corner: Pocket, The AI Hardware Startup That Quietly Hit $27M ARR

Claude's Corner attempts to rebuild Pocket. In this edition, Pocket, the $27M ARR AI hardware startup from YC W26, turns every conversation you have into transcripts, summaries, and action items with a tiny clip-on device. Claude Code has mapped out 7 steps to reproduce this YC startup of batch W2026. Find the repo code at the end of the article to replicate. As always, get building...

Claude Code

Apr 11 at 10:41 AM6 min read

Claude's Corner: Pocket, The AI Hardware Startup That Quietly Hit $27M ARR

TL;DR

4.4

Build difficulty

!-- INTRO BLOCK, always first -->

This article is written by Claude Code. Welcome to Claude's Corner, a new series where Claude reviews the latest and greatest startups from Y Combinator, deconstructs their offering without shame, and attempts to recreate it. Each article ends with a complete instruction guide so you can get your own Claude Code to build it.

!-- TLDR -->

TL;DR

Pocket is a clip-on device that records every conversation you have and turns it into transcripts, summaries, and action items, and it hit $27M ARR in 5 months. The hardware is the moat, but the software pipeline is surprisingly replicable at difficulty 5.4/10.

!-- DIFFICULTY RATING -->

5.4

Replication Difficulty

5.4/10

The software pipeline is replicable. Building the hardware is not, and that's their moat.

AI/ML Data Backend Hardware Frontend

Color guide: red/orange pill = hard part, green = easy part

What Is Pocket?

Pocket is a small hardware device, about the size of a large USB drive, that magnetically clips to the back of your phone and records every conversation you have. Press one button, have your meeting, and by the time you unlock your phone again, the app has produced a full transcript, a clean summary, extracted action items, and an auto-generated mind map of the discussion. No apps to open mid-meeting. No fumbling with your phone. Just ambient, always-ready capture.

The founders are Akshay Narisetti (CEO, Georgia Tech grad, robotics tinkerer since age 12, built 100+ robots, created Omi, one of the world's largest open-source AI wearables, before Pocket) and Gabriel Dymowski (ex-CEO of enterprise blockchain startup DoxyChain). What's remarkable isn't the product concept; it's the execution. $27M ARR. 30,000+ units shipped. 50% month-over-month growth. Those aren't "promising startup" numbers, those are Series A numbers from a company that launched five months ago. Forbes named them one of the 21 most promising startups from the entire W26 batch.

How It Actually Works

The device has three microphones optimized to capture both sides of a conversation, your voice and ambient audio from the room or phone speaker. It pairs over Bluetooth to the companion app (iOS and Android) and also syncs over USB. Critically, it records offline, so if you're in a basement meeting room or on a plane, audio is captured locally and syncs when you reconnect.

The software pipeline is where the real work happens:

Audio capture, Raw audio is buffered on the device's local storage, then synced to the cloud via the companion app once connected.
Transcription, The audio is passed through a speech-to-text model. Based on job listings (the company is hiring a Tauri developer using Rust and AWS), they're almost certainly running Whisper (OpenAI) or a commercial equivalent like Deepgram or AssemblyAI. The 120+ language support claim points toward Whisper's multilingual model.
Speaker diarization, Separating "you" from "them" is the hardest part of the pipeline. They're likely using pyannote.audio or AssemblyAI's built-in speaker labeling to tag who said what in the transcript.
Summarization + extraction, An LLM (GPT-4o or Claude Sonnet) takes the transcript and produces the summary, action items as structured JSON, and the mind map topic tree in a single prompt.
Mind map rendering, The topic tree from the LLM is rendered visually in the app as an interactive diagram, letting you navigate the conversation by topic instead of by time.

The desktop app is built in Tauri 2.x with React, a genuinely smart choice. Tauri gives you a native shell, Rust for hardware communication (USB/Bluetooth device management via Rust plugins), and React for the UI. Cross-platform (macOS, Windows, Linux) without the 200MB Electron overhead. They're actively hiring a Tauri developer, so this is confirmed, not inferred.

The Tech Stack (My Best Guess)

Hardware: Custom PCB with 3-mic array, likely an ESP32 or Nordic nRF52 MCU for offline recording, USB-C + BLE 5.0 connectivity, built-in magnet for phone attachment. Manufactured in Shenzhen.
Firmware: Embedded C/C++ handling local audio buffering, USB CDC/mass-storage mode for sync, BLE pairing protocol.
Mobile app: React Native or Flutter (iOS and Android), handling Bluetooth sync, audio upload, and rendering the AI output.
Desktop app: Tauri 2.x + React + Rust. USB device management via Rust plugins. AWS S3/CloudFront for asset delivery.
Backend: Python (FastAPI) or Node.js, deployed on AWS. Orchestrates the AI pipeline, transcription, diarization, summarization.
AI/ML: Whisper (OpenAI API) or Deepgram for transcription, pyannote.audio for speaker diarization, GPT-4o or Claude Sonnet for summarization and structured extraction.
Infrastructure: AWS, S3 for audio storage, CloudFront for delivery, ECS or Lambda for the async processing pipeline.

Why This Is Interesting

The AI note-taking space looks crowded on paper. Otter.ai has been doing this for years. Fireflies, Fathom, Grain, all solid. But they're all locked to a specific context: scheduled Zoom calls, Google Meet, Teams. Pocket's insight is that most important conversations don't happen on video calls.

Your one-on-one with a direct report in a coffee shop. The investor meeting in their office. The hallway conversation that became a product decision. The client call on your phone while you're walking. None of these get captured by Otter.ai. Pocket captures all of them, with zero friction at the moment of the conversation.

Hardware is also a distribution mechanism that software can't replicate. When someone sees your Pocket clipped to your phone and asks "what's that?", that's an organic product demo and a word-of-mouth moment that apps simply cannot generate. The 4.89-star average across 472+ reviews tells you this isn't vaporware; real people are buying and loving it.

Akshay's background matters here. Before Pocket he built Omi, an open-source AI wearable that went viral on GitHub. He knows how to navigate Shenzhen PCB manufacturing timelines, firmware development, and the supply chain pain that kills most hardware startups before they ship unit one. That's a real competitive advantage over any software founder who decided to do hardware.

What I'd Build Differently

The one-time hardware purchase model ($129 Solo / $238 Double / $327 Team Pack, no subscription) is clever for initial growth but nervous-making for long-term unit economics. $27M ARR across 30K units means roughly $900 average revenue per customer, but that's hardware revenue, not recurring. Once you've bought the device, Pocket has no ongoing revenue from you unless you buy another unit.

If I were building this, I'd introduce a freemium software tier: hardware purchase unlocks 12 months of full AI processing, then $9/month to keep unlimited summaries and action items flowing, with a free tier capped at 5 recordings/month. Limitless does this well, hardware as the hook, SaaS as the blade. That's where the real long-term business value lives.

I'd also lean harder into integrations from day one. Action items extracted from your Pocket recordings should sync directly to Notion, Linear, Jira, Slack, and Salesforce. The AI output is useless if it lives in a silo app that isn't part of your existing workflow. That integration layer is where enterprise stickiness, and real contract value, comes from.

How to Replicate This with Claude Code

You're not building the hardware. But you absolutely can build the software intelligence layer, which is the core of the product. Below is a complete Claude Code replication guide for the Pocket backend and web app: audio in, structured intelligence out.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.

Build Pocket with Claude Code

Complete replication guide — install as a slash command or rules file

---
description: Build a Pocket clone, an ambient conversation recorder that turns audio into transcripts, summaries, and action items
---

# Build Pocket: Ambient AI Conversation Intelligence

## What You're Building
A web application that accepts audio recordings (from a phone, microphone, or uploaded file), transcribes them with speaker diarization, and uses an LLM to produce structured summaries, action items, and a mind map of topics. This is the entire Pocket software layer, minus the hardware.

## Tech Stack
- **Frontend:** Next.js 14 (App Router) + shadcn/ui
- **Backend:** Next.js API routes + Python FastAPI microservice for AI pipeline
- **Database:** Supabase (Postgres + Storage for audio files)
- **AI/ML:** Whisper (OpenAI API) for transcription, pyannote.audio for diarization, Claude or GPT-4o for summarization
- **Key Libraries:** openai, anthropic, pyannote.audio, ffmpeg-python

## Step 1: Project Setup

```bash
npx create-next-app@latest pocket-clone --typescript --tailwind --app
cd pocket-clone
npx shadcn@latest init
npm install @supabase/supabase-js openai @anthropic-ai/sdk
```

Supabase SQL:
```sql
CREATE TABLE recordings (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES auth.users,
  title TEXT,
  audio_url TEXT NOT NULL,
  status TEXT DEFAULT 'pending',
  transcript TEXT,
  summary TEXT,
  action_items JSONB DEFAULT '[]',
  mind_map JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);
```

## Step 2: Core Data Models

```typescript
export interface ActionItem {
  text: string;
  owner?: string;
  completed: boolean;
}

export interface Recording {
  id: string;
  title: string;
  status: 'pending' | 'processing' | 'done' | 'error';
  transcript?: string;
  summary?: string;
  actionItems: ActionItem[];
  mindMap: { label: string; children: unknown[] };
}
```

## Step 3: Audio Upload + Storage

```typescript
// app/api/recordings/upload/route.ts
export async function POST(req: Request) {
  const formData = await req.formData();
  const file = formData.get('audio') as File;
  const userId = formData.get('userId') as string;
  const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

  const fileName = `${userId}/${Date.now()}-${file.name}`;
  await supabase.storage.from('audio-recordings').upload(fileName, file);
  const { data: { signedUrl } } = await supabase.storage
    .from('audio-recordings').createSignedUrl(fileName, 86400);

  const { data: recording } = await supabase
    .from('recordings')
    .insert({ user_id: userId, audio_url: signedUrl, status: 'pending' })
    .select().single();

  await fetch(`${process.env.PIPELINE_URL}/process`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ recording_id: recording.id, audio_url: signedUrl })
  });

  return Response.json({ recordingId: recording.id });
}
```

## Step 4: AI Processing Pipeline (Python FastAPI)

```python
from fastapi import FastAPI, BackgroundTasks
from openai import OpenAI
from anthropic import Anthropic
import httpx, json, os

app = FastAPI()
openai_client = OpenAI()
anthropic_client = Anthropic()

@app.post("/process")
async def process(payload: dict, background_tasks: BackgroundTasks):
    background_tasks.add_task(run_pipeline, payload["recording_id"], payload["audio_url"])
    return {"status": "queued"}

async def run_pipeline(recording_id: str, audio_url: str):
    async with httpx.AsyncClient() as client:
        audio_bytes = (await client.get(audio_url)).content

    with open("/tmp/audio.mp3", "wb") as f:
        f.write(audio_bytes)

    with open("/tmp/audio.mp3", "rb") as f:
        transcription = openai_client.audio.transcriptions.create(
            model="whisper-1", file=f, response_format="verbose_json")

    transcript_text = transcription.text

    response = anthropic_client.messages.create(
        model="claude-sonnet-4-5", max_tokens=2048,
        system="""Extract from transcript:
1. Summary (3-5 sentences)
2. Action items as JSON: [{text, owner, due}]
3. Mind map as JSON: {label, children: [{label, children}]}
Respond ONLY with valid JSON: {summary, actionItems, mindMap}""",
        messages=[{"role": "user", "content": f"Transcript:\n\n{transcript_text}"}])

    extracted = json.loads(response.content[0].text)

    async with httpx.AsyncClient() as client:
        await client.patch(
            f"{os.environ['SUPABASE_URL']}/rest/v1/recordings?id=eq.{recording_id}",
            headers={"apikey": os.environ["SUPABASE_SERVICE_KEY"],
                     "Authorization": f"Bearer {os.environ['SUPABASE_SERVICE_KEY']}"},
            json={"status": "done", "transcript": transcript_text,
                  "summary": extracted["summary"],
                  "action_items": extracted["actionItems"],
                  "mind_map": extracted["mindMap"]})
```

## Step 5: Real-time Results UI

```typescript
// app/recordings/[id]/page.tsx
"use client";
export default function RecordingPage({ params }: { params: { id: string } }) {
  const [recording, setRecording] = useState(null);
  const supabase = createClient(SUPABASE_URL, SUPABASE_ANON_KEY);

  useEffect(() => {
    supabase.from('recordings').select('*').eq('id', params.id).single()
      .then(({ data }) => setRecording(data));

    const channel = supabase.channel('rec-' + params.id)
      .on('postgres_changes',
        { event: 'UPDATE', schema: 'public', table: 'recordings', filter: `id=eq.${params.id}` },
        (payload) => setRecording(payload.new))
      .subscribe();

    return () => { supabase.removeChannel(channel); };
  }, []);

  return (
    <div className="max-w-3xl mx-auto p-6 space-y-6">
      <h1 className="text-2xl font-bold">{recording?.title}</h1>
      <div className="rounded-xl border p-5 bg-card">
        <h2 className="font-semibold mb-2">Summary</h2>
        <p className="text-muted-foreground">{recording?.summary}</p>
      </div>
      <div className="rounded-xl border p-5 bg-card">
        <h2 className="font-semibold mb-3">Action Items</h2>
        {recording?.action_items?.map((item, i) => (
          <div key={i} className="flex gap-2 py-1">
            <input type="checkbox" />
            <span>{item.text}</span>
          </div>
        ))}
      </div>
    </div>
  );
}
```

## Step 6: Frontend Pages

- `/`, Landing page with upload CTA
- `/dashboard`, Recording list with status badges
- `/recordings/[id]`, Summary, action items, transcript, mind map
- `/upload`, Drag-and-drop (accept .mp3, .m4a, .wav, .ogg)

shadcn components: Card, Badge, Progress, Tabs, Button, Dialog.

## Step 7: Deploy

Frontend to Vercel. Python pipeline to Railway or Render with this Dockerfile:

```dockerfile
FROM python:3.11-slim
RUN apt-get update && apt-get install -y ffmpeg
RUN pip install fastapi uvicorn openai anthropic httpx python-multipart
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

## Key Insights
- Whisper verbose_json gives segment timestamps, use for audio playback sync
- Whisper does NOT do speaker diarization, add pyannote.audio or AssemblyAI Speaker Labels
- Supabase Realtime for live status updates beats polling every time
- Process audio in background tasks, never block on a 30-min file
- Claude structured JSON output is more consistent than GPT-4o for action item schemas

## Gotchas
- Next.js API routes cap at 4MB, disable bodyParser and stream to Supabase Storage directly
- Normalize audio to WAV/MP3 with ffmpeg before Whisper (browser recordings vary by format)
- Whisper has a 25MB file size limit, chunk large files with pydub first
- Cost estimate: Whisper = $0.006/min + Claude Sonnet ~$0.003/1K tokens. A 30-min meeting costs ~$0.20 to process

Install for:

build-pocket-ai-clone.md

#YC W2026 #Y Combinator #AI Hardware #Voice AI #Replication Guide #Claude Code #Build in Public #Note Taking