Claude's Corner: Button Computer — The Wearable AI Button Betting Against Your Phone

Two ex-Apple Vision Pro engineers built a $179 wearable AI button that responds in 500ms. No always-on microphone, no fishing for your phone, no nonsense. Here's how it works and how hard it is to clone.


TL;DR

Button Computer is a $179 wearable AI device from two ex-Apple Vision Pro engineers. Press the button, speak, get a response in 500ms — no wake words, no always-on listening. The hardware moat is real; the software is cloneable.

Build difficulty: 6.0 (grade C)

A Button That Thinks. Hardware That Doesn't Apologize for Existing.

Here's a hot take: the smartest thing about Button Computer isn't the AI. It's the button.

In a world where everyone is building voice assistants that listen constantly, hallucinate constantly, and require a six-step "Hey Siri, actually, no wait—" correction loop, two ex-Apple engineers decided to strip it back to one interaction: press, speak, get an answer in half a second. That's the entire product.

Most people's reaction is some version of "why not just use your phone?" And that's a fair question. But the same question got asked about iPods when iTunes existed on Windows. The answer, then and now, is that purpose-built hardware creates purpose-built experiences. Whether Button has found the right purpose is the interesting debate. The execution, at least, is sharp.

What They're Building

Button is a small wearable device — think iPod Shuffle form factor — that clips to your shirt or bag. Press the button, speak your query, get a voice response in approximately 500ms. Release the button and it's done. No wake words, no microphone arrays sweeping the room, no ambient surveillance. The device only activates when your thumb is physically holding it down.

The product ships at $179 (down from a $229 launch price) and includes three months of Button AI Pro. After that, the subscription runs $7.99/month. BYOK (bring your own API key) is also supported, which is a smart move that keeps the hardware accessible to developers and the price-sensitive crowd without giving up the subscription flywheel.

Target users are people who frequently need AI answers while their hands are busy — drivers, warehouse workers, clinicians on rounds, anyone who's sick of fishing for their phone every 20 minutes. The device supports Bluetooth pairing with speakers and smart glasses for fully eyes-free and hands-free operation.

At launch, it integrates with email, Slack, and Salesforce via voice commands, positioning it as a professional productivity tool rather than a gadget. Shipping is planned for December 2026, U.S. first with iOS support, Android to follow.


The Founding Story

Chris Nolet (CEO) spent years as a Staff Software Engineer at Apple on Vision Pro, with a mechanical engineering background. He's a second-time founder and former venture partner — the kind of person who understands both why hardware is hard and why it's worth doing anyway.

Ryan Burgoyne (CTO) put in six years at Apple, was part of the team that kicked off Vision Pro, and previously founded Skyglass, a mobile virtual production startup. He moved to Colorado, stayed close friends with Chris, and eventually they decided to stop working for other people's visions and build their own.

This matters. Founders who shipped Vision Pro — arguably the most complex consumer hardware Apple has ever built — understand what it takes to get audio latency under 20ms, to design firmware that handles Bluetooth reconnection gracefully, and to make a device small enough that people actually clip it to their clothes. These are not app developers who decided hardware sounds fun.

How It Works

Button's technical architecture is deceptively straightforward, which is why it's so easy to dismiss as "just an app." The reality is that what looks simple from the outside represents an obsessive commitment to the one metric that matters: latency.

Hardware layer: A microcontroller (likely an ESP32 or nRF52 family chip) handles button state, audio capture via MEMS microphone, and Bluetooth Low Energy communication. The device connects to the user's phone over BLE, which serves as the internet gateway. There's a small speaker plus Bluetooth audio output for headphones or smart glasses.

Audio pipeline: The moment the button is depressed, the device begins streaming compressed audio frames over BLE to the companion mobile app. The app decompresses and forwards the audio stream to Button's backend via WebSocket — low-latency, persistent connection, no HTTP round-trip overhead.

Inference backend: This is where the 500ms number lives or dies. Button is running a speech-to-text model on the incoming audio stream (likely a fine-tuned Whisper variant with streaming), pipelining the transcript immediately into an LLM inference call with a heavily pre-configured system prompt, and piping the LLM output tokens into a TTS engine in parallel. The response audio starts playing as soon as the first tokens arrive, not after the LLM finishes generating. Streaming TTS is non-trivial to do cleanly — there are prosody issues at sentence boundaries that require careful buffering.
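
To make the pipelining concrete, here's a minimal TypeScript sketch of the handoff. The three stage functions are illustrative stand-ins for the vendor SDKs covered in the guide below, not Button's actual code:

```typescript
// Illustrative stage signatures — real vendor SDKs differ (see the guide below).
declare function transcribeStreaming(frames: AsyncIterable<Buffer>): Promise<string>;
declare function streamLLM(transcript: string): AsyncIterable<string>;            // yields tokens
declare function streamTTS(tokens: AsyncIterable<string>): AsyncIterable<Buffer>; // yields audio

// One press-to-release utterance: each stage consumes the previous stage's
// stream, so playback starts on the first TTS chunk, not after the LLM
// finishes generating.
async function handleUtterance(
  frames: AsyncIterable<Buffer>,        // mic audio from the device, via BLE
  playAudio: (chunk: Buffer) => void,   // push to device speaker / Bluetooth audio
) {
  const transcript = await transcribeStreaming(frames);
  for await (const chunk of streamTTS(streamLLM(transcript))) {
    playAudio(chunk);                   // first chunk ≈ time-to-first-audio
  }
}
```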

"Voice apps" architecture: Button describes these as lightweight integrations sitting between the button press and the LLM. For Slack, it means the app has permission to read recent threads and DMs, which get injected into the system context. For email it's similar. The key constraint is that the context window needs to stay small enough to not blow your latency budget. Clever but bounded.

Firmware + OTA: Consumer hardware without solid OTA updates is dead hardware. Button uses a secure bootloader with differential OTA pushed via the companion app — which also means Apple's App Store review timelines become part of the firmware release cycle. This is one of those iOS-specific headaches that nobody warns you about until you're three months deep into launch prep.

Difficulty Score

| Dimension | Score | Why |
|-----------|-------|-----|
| ML / AI | 7 / 10 | Streaming ASR + streaming LLM + streaming TTS, all pipelined with a sub-500ms SLA. Each piece is open-source. Orchestrating them cleanly under real-world BLE conditions is not. |
| Data | 3 / 10 | No proprietary training data required at the core. Integration adapters need per-app context retrieval, but it's retrieval not training. |
| Backend | 7 / 10 | Real-time audio streaming infrastructure, WebSocket servers with tight SLAs, session management for push-to-talk state, multi-tenant BLE device pairing. |
| Frontend / Mobile | 5 / 10 | iOS companion app with CoreBluetooth, background audio session, OAuth flows for workspace integrations. Annoying but not exotic. |
| DevOps / Hardware | 8 / 10 | Hardware manufacturing, supply chain, FCC/CE certification, OTA firmware, DFM iterations. This is the part that kills hardware startups. |

The Moat

What's genuinely hard to replicate:

The hardware supply chain is real. Getting a 5,000-unit run of a custom PCB with custom enclosure tolerances through a contract manufacturer, clearing FCC, surviving Amazon fulfillment, and still shipping before Christmas is not something a first-time founder with $500K figures out in a year. Chris and Ryan have done hardware before. That's worth 18 months of lead time to a competitor who hasn't.

The latency architecture is also harder than it looks. Getting streaming ASR + LLM + TTS to behave cleanly under real-world Bluetooth conditions — dropout, reconnect, background app throttling by iOS — requires the kind of obsessive instrumentation and edge-case hunting that only happens when the latency number is literally your marketing headline.

What's easy to replicate:

The software, honestly. Any competent team with six months and a cloud budget can build the backend. The BLE protocol is straightforward. The mobile app is straightforward. The "voice app" concept is essentially system prompt engineering with OAuth. There's nothing proprietary in the ML stack — they're almost certainly using APIs from Deepgram or similar for ASR, Claude or GPT-4o-mini for LLM inference, and ElevenLabs or Cartesia for TTS.

The hardware design is also not proprietary. The device is elegantly simple — and simple is both a virtue and a vulnerability. A Shenzhen PCB house could have a clone in 90 days. The moat is brand trust, supply chain relationships, and the software ecosystem they build around it. None of those are insurmountable.

The existential risk:

Apple ships an AirPod with a button. OpenAI finishes building its io wearable. Google integrates Gemini into everything at the OS level. Any of these makes Button Computer irrelevant overnight — not because the product is bad, but because distribution is everything in consumer hardware and Button doesn't have a retail channel yet. The window is 18–24 months and they know it.

Replicability Score: 62 / 100

This isn't a software-only startup. The hardware manufacturing moat, the founders' Apple pedigree, and the latency engineering are real differentiators that a weekend project cannot touch. A well-funded team could absolutely clone the software layer — and probably will. But shipping the actual hardware, building the brand, and doing it fast enough to matter before the giants absorb this category? That's genuinely hard. 62 feels right: cloneable in principle, non-trivial in practice, but nobody should be sleeping soundly if Apple decides to ship this feature in iOS 21.


Build This Startup with Claude Code

Complete replication guide — install as a slash command or rules file

# Build a Button Computer Clone with Claude Code

A step-by-step guide to building a push-to-talk wearable AI device — hardware + backend + mobile app — in 7 steps.

---

## Step 1: Design the Hardware (BLE Microcontroller + Audio)

**Target chip:** Nordic nRF52840 (BLE 5.0, good audio support, Arduino-compatible)  
**Components:** MEMS microphone (SPH0645), small speaker (8Ω 1W), tactile button, LiPo battery + charging IC (TP4056), USB-C port.

**PCB layout goals:**
- Keep mic far from speaker to avoid feedback
- Route BLE antenna away from ground planes
- Add decoupling caps on VDDIO lines
- Design for DFM: avoid fine-pitch ICs if you're prototyping yourself

**Firmware (Zephyr RTOS or Arduino):**
```c
// On button press: start ADC capture, stream PCM frames over BLE NOTIFY.
// On button release: send EOF frame, stop capture.
#include <zephyr/drivers/gpio.h>

// Zephyr GPIO callback, registered for both edges of the button pin.
// Debounce in hardware (RC filter) or defer to a short k_work delay.
void button_isr_handler(const struct device *dev, struct gpio_callback *cb, uint32_t pins) {
    if (gpio_pin_get(dev, BUTTON_PIN) > 0) {  // pin active: button held down
        audio_stream_start();
    } else {
        audio_stream_stop();                  // also emits the EOF frame downstream
    }
}
```

**BLE GATT service schema:**
- Service UUID: custom `0xFEEF`
- Characteristic: `AUDIO_TX` (NOTIFY) — streams audio frames captured at 16kHz mono 16-bit. Raw PCM at that rate is 32 kB/s, more than default 20-byte notification payloads can carry, so negotiate a larger ATT MTU and/or compress the frames before streaming
- Characteristic: `CONTROL` (WRITE) — receives commands (mute, volume, pair)
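
Before writing any iOS code, it helps to validate this schema from a laptop. A quick sketch using the community `@abandonware/noble` library (an assumed dev dependency for testing, not part of the product stack):

```typescript
// Desktop test harness: subscribe to AUDIO_TX and verify frame cadence.
import noble from '@abandonware/noble';

noble.on('stateChange', (state) => {
  if (state === 'poweredOn') noble.startScanning(['feef'], false); // our service UUID
});

noble.on('discover', (peripheral) => {
  noble.stopScanning();
  peripheral.connect((err) => {
    if (err) throw err;
    peripheral.discoverSomeServicesAndCharacteristics(
      ['feef'], [], (err, _services, characteristics) => {
        if (err) throw err;
        const audioTx = characteristics?.find((c) => c.properties.includes('notify'));
        audioTx?.subscribe();
        audioTx?.on('data', (frame: Buffer) => {
          console.log(`audio frame: ${frame.length} bytes`); // check size + rate
        });
      });
  });
});
```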

---

## Step 2: Build the Streaming Backend (WebSocket + Audio Pipeline)

**Stack:** Node.js + `ws` library, Redis for session state, deployed on Fly.io (low-latency edge nodes)

**Architecture:**
```
BLE Device → iOS App → WebSocket → Audio Router → ASR → LLM → TTS → WebSocket → iOS App → Speaker
```

**Key API design:**
```typescript
// WebSocket message protocol
type AudioFrame = { type: 'audio'; data: Buffer; sessionId: string };
type EOFFrame   = { type: 'eof';   sessionId: string };
type TTSChunk   = { type: 'tts';   data: Buffer; isFinal: boolean };
```
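
A minimal sketch of the router side of that protocol with the `ws` library. In production the session state lives in Redis and frames stream straight into the ASR socket (Step 3) rather than a local map; the base64 encoding of `data` is an assumption for JSON transport:

```typescript
import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });
const sessions = new Map<string, Buffer[]>(); // sessionId -> buffered audio frames

wss.on('connection', (ws) => {
  ws.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type === 'audio') {
      // Assumed encoding: audio bytes arrive base64-encoded inside JSON.
      const frames = sessions.get(msg.sessionId) ?? [];
      frames.push(Buffer.from(msg.data, 'base64'));
      sessions.set(msg.sessionId, frames);   // in reality: pipe to streaming ASR
    } else if (msg.type === 'eof') {
      sessions.delete(msg.sessionId);        // close ASR stream, await final transcript
    }
  });
});
```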

**Latency budget breakdown:**
- BLE→phone: ~20ms
- Phone→WebSocket server: ~30ms (with edge node)
- ASR (streaming Deepgram): ~80ms to first word
- LLM first token (Claude Haiku or GPT-4o-mini): ~150ms
- TTS first chunk (Cartesia): ~50ms
- **Total realistic TTFA (Time To First Audio): ~330–400ms**
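
Hitting that budget in production means instrumenting every stage boundary per utterance. A small sketch of the kind of stage timer you'd want (names are illustrative):

```typescript
import { performance } from 'node:perf_hooks';

// Created per utterance; call mark() at each stage boundary
// (first audio frame, first interim transcript, first LLM token, first TTS chunk).
class StageTimer {
  private t0 = performance.now();
  private stages: Array<[string, number]> = [];
  mark(stage: string) {
    this.stages.push([stage, performance.now() - this.t0]);
  }
  report(sessionId: string) {
    const line = this.stages.map(([s, ms]) => `${s}=${ms.toFixed(0)}ms`).join(' ');
    console.log(`[${sessionId}] ${line}`); // e.g. asr_first=82ms llm_first=230ms tts_first=330ms
  }
}
```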

---

## Step 3: Speech-to-Text Integration (Streaming ASR)

**Use Deepgram's streaming WebSocket API** — lowest latency in the market.

```typescript
import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';

const dg = createClient(process.env.DEEPGRAM_API_KEY);
const live = dg.listen.live({ 
  model: 'nova-3', 
  language: 'en-US',
  encoding: 'linear16',
  sample_rate: 16000,
  interim_results: true  // fire LLM call on interim for lower latency
});

live.on(LiveTranscriptionEvents.Transcript, (data) => {
  if (data.is_final) triggerLLMCall(data.channel.alternatives[0].transcript);
});
```

**Key optimization:** Send interim results to the LLM speculatively. If the final transcript differs, cancel and re-send. This cuts ~100ms in practice.
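
A sketch of that speculative pattern using `AbortController` (`triggerLLMCall` is the same assumed helper as above, extended to accept a cancellation signal):

```typescript
// Assumed helper: a cancellable LLM call (fetch-style AbortSignal support).
declare function triggerLLMCall(text: string, signal: AbortSignal): Promise<void>;

let inflight: AbortController | null = null;
let speculative = '';

function onInterimTranscript(text: string) {
  if (text === speculative) return;     // no new words, keep the current call
  inflight?.abort();                    // cancel the stale speculative call
  inflight = new AbortController();
  speculative = text;
  void triggerLLMCall(text, inflight.signal);
}

function onFinalTranscript(text: string) {
  if (text === speculative) return;     // speculation was right — let it finish
  inflight?.abort();
  inflight = new AbortController();
  void triggerLLMCall(text, inflight.signal);
}
```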

---

## Step 4: LLM Inference with Context Injection (Voice Apps)

**Model selection:** Use a fast model (Claude Haiku, GPT-4o-mini) not a smart one. A 500ms SLA leaves no room for Opus or GPT-4o.

**DB schema for voice app integrations:**
```sql
CREATE TABLE voice_app_integrations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES users(id),
  app_type TEXT NOT NULL,  -- 'slack', 'gmail', 'salesforce'
  oauth_token TEXT,
  oauth_refresh_token TEXT,
  scopes TEXT[],
  last_context_fetched_at TIMESTAMPTZ,
  context_cache JSONB,  -- cached for 60s to avoid latency hit
  created_at TIMESTAMPTZ DEFAULT now()
);
```

**Context injection pattern:**
```typescript
async function buildSystemPrompt(userId: string, appType: string): Promise<string> {
  const ctx = await getCachedContext(userId, appType);  // 60s TTL
  return `You are a fast, concise voice assistant. ${ctx.summary}
Answer in 1-2 sentences max. Never use markdown. Be direct.`;
}
```
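
And the inference call itself, sketched with the Anthropic TypeScript SDK. The model ID and token cap are assumptions; any fast model with a streaming API slots in the same way:

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Yields answer tokens as they arrive so TTS (Step 5) can start immediately.
async function* answerTokens(userId: string, appType: string, transcript: string) {
  const stream = client.messages.stream({
    model: 'claude-3-5-haiku-latest',   // assumed fast-model ID
    max_tokens: 150,                    // voice answers are 1-2 sentences
    system: await buildSystemPrompt(userId, appType),
    messages: [{ role: 'user', content: transcript }],
  });
  for await (const event of stream) {
    if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
      yield event.delta.text;
    }
  }
}
```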

---

## Step 5: Text-to-Speech (Streaming TTS)

**Use Cartesia** — they stream audio chunks as tokens arrive, not after the full response is generated.

```typescript
// NB: exact SDK surface varies by version — treat this as a sketch.
import Cartesia from '@cartesia/cartesia-js';

const cartesia = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

async function streamTTS(textStream: AsyncIterable<string>, ws: WebSocket) {
  const tts = cartesia.tts.websocket({ model: 'sonic-english', voice: { id: 'VOICE_ID' } });
  // Register the audio handler once, outside the loop — registering it per
  // token would stack duplicate listeners and duplicate every chunk sent.
  tts.on('message', (chunk) => ws.send(JSON.stringify({ type: 'tts', data: chunk.audio })));
  for await (const token of textStream) {
    await tts.send({ transcript: token, continue: true });  // continue: more text follows
  }
  await tts.send({ transcript: '', continue: false });      // end of utterance, flush audio
}
```

**Critical:** Buffer 2–3 audio chunks before sending to the device to prevent choppy playback at sentence boundaries.
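
A minimal sketch of that buffering (the prime depth is an assumption to tune on real hardware):

```typescript
// Hold back the first few TTS chunks so playback never starves at
// sentence boundaries; flush everything once primed.
class ChunkBuffer {
  private queue: Buffer[] = [];
  private primed = false;
  constructor(
    private send: (chunk: Buffer) => void,
    private primeDepth = 3,            // assumed: 2-3 chunks before first send
  ) {}
  push(chunk: Buffer) {
    this.queue.push(chunk);
    if (!this.primed && this.queue.length < this.primeDepth) return;
    this.primed = true;
    while (this.queue.length) this.send(this.queue.shift()!);
  }
  flush() {                            // call on end-of-response
    while (this.queue.length) this.send(this.queue.shift()!);
  }
}
```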

---

## Step 6: iOS Companion App (CoreBluetooth + Background Audio)

**Key iOS APIs:**
- `CBCentralManager` for BLE device discovery and connection
- `AVAudioSession` with `.playAndRecord` category for background audio
- `bluetooth-central` and `audio` background modes (`UIBackgroundModes` in Info.plist) for keeping the BLE link and WebSocket alive while backgrounded — `BackgroundTasks` alone won't keep a socket open

```swift
// Maintain BLE connection in background
func centralManager(_ central: CBCentralManager, didConnect peripheral: CBPeripheral) {
    peripheral.discoverServices([CBUUID(string: "FEEF")])
    setupAudioSession()
}

func setupAudioSession() {
    do {
        try AVAudioSession.sharedInstance().setCategory(.playAndRecord,
            options: [.allowBluetooth, .defaultToSpeaker])
        try AVAudioSession.sharedInstance().setActive(true)
    } catch {
        // Surface this: without an active audio session, push-to-talk fails silently.
        print("Audio session setup failed: \(error)")
    }
}
```

**Subscription management:** Use RevenueCat for in-app subscriptions. Handle BYOK via settings screen — user pastes API key, stored in Keychain.

---

## Step 7: Hardware Manufacturing & OTA Updates

**Prototype path:**
1. Order PCBs from JLCPCB with assembly (PCBA) — ~$200 for 10 boards
2. Test with nRF52840 dev kit before committing to custom PCB
3. Use Adafruit Feather nRF52840 for rapid prototyping

**Enclosure:** Design in Fusion 360, print on Bambu Lab X1C. For production, injection mold through a Chinese CM (Foxlink, Flex). MOQ is usually 5,000 units.

**FCC certification path:**
- Pre-scan at a local EMC lab (~$2K) to catch obvious failures
- Full FCC Part 15B + Part 15C (BLE) certification: ~$15K, 8–12 weeks
- Use a pre-certified BLE module (Nordic nRF52840 module has FCC ID) to simplify certification significantly

**OTA firmware updates:**
```swift
// iOS companion triggers OTA on app open. `api` and `dfuManager` are
// app-level wrappers (dfuManager wrapping Nordic's iOS DFU library).
func checkForFirmwareUpdate() async {
    let latest = await api.getLatestFirmwareVersion()
    guard latest > currentDeviceVersion else { return }   // semantic version compare
    let binary = await api.downloadFirmware(version: latest)
    await dfuManager.performUpdate(peripheral: device, firmware: binary)
}
```
Use Nordic's DFU library (nRF DFU) — handles secure bootloader, CRC validation, and rollback on failure.

---

## Cost Estimates to Ship v1

| Item | Cost |
|------|------|
| PCB + assembly (1K units) | ~$45/unit |
| Enclosure (injection mold amortized) | ~$8/unit |
| BOM components | ~$22/unit |
| FCC + BT SIG certification | ~$20K one-time |
| **Total COGS at 1K units** | **~$75/unit** |

At $179 retail with a $7.99/month subscription, roughly eight months of subscription per device gets total revenue to about 3× COGS: $179 + 8 × $7.99 ≈ $243 ≈ 3.2 × $75. That's the business.