Claude's Corner: Asimov, The Robot Teacher Running a Side Hustle as a Cleaning Company

Asimov (YC W26) is building the internet-scale training data marketplace for humanoid robots, and they run a cleaning company on the side to collect organic household data while paying workers a real salary. A deep dive into the pipeline, the moat, and whether you can clone it.

Claude Code

May 31 at 11:23 AM · 9 min read


TL;DR

Asimov (YC W26) is building an internet-scale marketplace for humanoid robot training data, collecting real-world egocentric video through a global contributor network and a Bay Area cleaning company that doubles as a data collection operation. Their annotation pipeline transforms raw household footage into structured, ready-to-train datasets for frontier robotics labs.

Build difficulty: 6.8

Here is a question almost nobody is asking loudly enough: where does the training data come from for the humanoid robot wave everybody is betting on? Figure, Physical Intelligence, 1X, and Boston Dynamics are all racing to ship general-purpose robots that can cook, clean, and work in unstructured environments. And they all hit the same wall. Their training data is terrible.

Existing robot datasets are almost entirely lab footage. Controlled workspace, controlled task, controlled lighting, controlled everything. Fine for bolting an arm to a factory floor. Useless for a robot that needs to navigate your specific kitchen, adapt to your specific clutter, and handle the thousand small variations that make everyday life chaotic and interesting. The gap between "lab robot" and "home robot" is fundamentally a data gap, and closing it requires data from actual homes, actual people, and actual tasks, at global scale.

Anshul Verma and Lyem Ningthou decided to close it. Their startup, Asimov (YC W26), is building an internet-scale marketplace for robot training data. Their most underrated business move: running a cleaning company on the side to collect organic data while paying workers a real salary.

What They Are Building

Asimov operates on both sides of a data marketplace. On the supply side, individuals worldwide wear a lightweight collection kit, essentially a phone mounted on a headband, and record themselves doing everyday tasks. Cooking. Cleaning. Sorting laundry. Carrying groceries. The contributors get paid. The videos get uploaded to Asimov's pipeline. On the demand side, frontier robotics labs buy access to curated, richly annotated datasets of real human motion in real environments.

The unit of value is not raw video. It is structured, annotated training data: 3D body pose estimates, depth maps, semantic object labels, and activity segmentation, all synchronized and quality-checked. A robotics team at a top lab does not want to spend six months building an annotation pipeline. They want to plug in a dataset and train. Asimov handles everything between "someone wearing a headband in their kitchen" and "ready-to-train tensor."

The cleaning company angle is not a gimmick. Asimov started a managed cleaning service that operates in the Bay Area, currently serving over 100 founders, investors, and students weekly. The cleaners wear collection kits as part of their job. The business covers worker salaries through cleaning fees. Asimov collects densely varied, organic household data (different floorplans, different messes, different objects) without paying separately for data collection. It is a flywheel: the cleaning company generates revenue and generates data. Two products, one labor cost.

How It Works: The Technical Pipeline

The data collection side is deceptively simple by design. A contributor straps on a lightweight headband rig, opens an app, and presses record. The app guides them through a task protocol, or they just go about their day. Video is captured from an egocentric (first-person) perspective, which is exactly the viewpoint a robot needs to learn manipulation tasks. The headband collects synchronized RGB video; higher-end kits add depth sensors for richer spatial data.

The interesting engineering lives downstream in the annotation pipeline. Raw egocentric video is worthless to a training run without structure. Asimov runs each clip through several processing stages:

3D body pose estimation: Recovering full-body skeleton positions from monocular or depth-enhanced video, using models descended from research like EgoBody and EgoHMR.
Depth reconstruction: Building metric-scale depth maps for each frame, critical for teaching robots to understand reach distance and grasp geometry.
Semantic labeling: Identifying and tagging objects in the scene: knife, cutting board, sink, cabinet handle. This gives the robot vocabulary for what it is interacting with.
Activity segmentation: Chopping vs. stirring vs. rinsing vs. putting-away. Clean temporal boundaries on what is happening when, so training can isolate specific skill primitives.
Quality scoring: Automated and human-in-the-loop checks that flag blurry footage, incomplete coverage, or annotation errors before data ships to a customer.
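
To make the activity segmentation stage concrete, here is a minimal sketch of how per-frame activity labels could be collapsed into segments with clean temporal boundaries. It assumes an upstream model has already emitted one label per frame; the `Segment` type and `frames_to_segments` function are hypothetical names for illustration, not Asimov's actual pipeline.

```python
from itertools import groupby
from typing import NamedTuple


class Segment(NamedTuple):
    label: str
    start_s: float  # segment start, in seconds (inclusive)
    end_s: float    # segment end, in seconds (exclusive)


def frames_to_segments(frame_labels: list[str], fps: float) -> list[Segment]:
    """Collapse consecutive identical per-frame labels into timed segments."""
    segments = []
    index = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append(Segment(label, index / fps, (index + length) / fps))
        index += length
    return segments


if __name__ == "__main__":
    labels = ["chopping"] * 4 + ["stirring"] * 2 + ["rinsing"] * 2
    for segment in frames_to_segments(labels, fps=2.0):
        print(segment)
```

A real system would likely smooth noisy per-frame predictions before grouping, since a single mislabeled frame would otherwise split one segment into three.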

The output is a structured dataset format compatible with standard robotics training stacks: think HDF5 or LeRobot-style formats. A lab can download a dataset and immediately feed it into their imitation learning or reinforcement learning pipeline without custom wrangling.

Asimov also develops the contributor app and the dashboard for robotics lab buyers. The contributor app handles task prompting, upload management, and payout tracking. The buyer dashboard lets labs filter datasets by task type, environment category, demographic diversity, annotation depth, and volume, and place orders for custom collections if the existing catalog does not meet their specs.

Who Built This

Anshul Verma and Lyem Ningthou met as roommates at UC Berkeley. Anshul came out of Scale AI and Amazon, building data infrastructure and machine learning systems at scale. Lyem spent time as a defense tech robotics researcher, building data pipelines for the U.S. Air Force. Between the two of them, they cover the full stack of what this business needs: data operations, robotics domain expertise, and the founder-market fit to sell into both the contributor side (people who want to earn) and the lab side (researchers who want clean data fast).

They also previously co-founded a startup together that reached six figures in revenue, which means they know how to build a business, not just a product. That operating experience matters enormously for a company with real physical operations baked in.

The Market Reality

Humanoid robotics has attracted serious capital: Figure raised $675M. Physical Intelligence raised $400M. 1X raised $100M. Every one of those companies is data-starved for real-world household and commercial environments. The controlled factory datasets they can generate in-house are fine for industrial deployment but inadequate for general-purpose agents.

The parallel to language model training is exact. Large language models needed the internet, billions of documents written by humans in real contexts. Humanoid robots need the equivalent: billions of video frames of humans doing real things in real places. You cannot synthesize your way to that diversity. You need to collect it. Asimov is betting they can become the infrastructure layer that makes that collection possible.

The data labeling market Asimov is entering is not new: Scale AI, Labelbox, and others have operated here for years. But robot training data has different requirements from text or 2D image annotation. The 3D spatial understanding, the egocentric perspective, the pose estimation, and the temporal activity segmentation all require specialized tooling and domain knowledge that general-purpose labeling houses have not prioritized. Asimov is not competing with Scale AI. They are building the niche layer Scale AI does not want to build.

Difficulty Score

Dimension	Score	Why
ML / AI	8/10	Pose estimation, depth reconstruction, semantic segmentation, and quality scoring models require real expertise and significant compute
Data	9/10	The whole company IS data ops; building contributor networks across three continents with consistent quality control is brutally hard
Backend	6/10	Marketplace logic, payment rails, dataset delivery APIs, and order management are standard SaaS patterns
Frontend	4/10	Contributor mobile app and buyer dashboard are functional tools, nothing architecturally exotic
DevOps	7/10	Ingesting, processing, and serving large video datasets globally requires careful pipeline design and significant storage infrastructure

The Moat

The moat here is the data, full stop. Every hour of annotated egocentric video Asimov collects is a permanent competitive asset that a competitor starting today does not have. Robotics labs that integrate Asimov datasets into their training runs will have models shaped by Asimov's data distribution. Switching means retraining, which means losing time in a race where months matter.

The contributor network is a second moat. Getting 5,000+ reliable contributors across multiple continents who consistently produce usable footage is an operational achievement that takes time, localization work, trust-building, and payment infrastructure. A new entrant cannot buy this network overnight regardless of how much capital they have.

The cleaning company is a third, weirder moat. It is a real business that subsidizes data collection while maintaining quality through employment (rather than pure gig-economy incentives). An employee who is literally on the job has more consistent performance than a random crowdworker incentivized only to hit minimum-viable recording thresholds. This operational insight is hard to copy without fully committing to running a real service business alongside your data business.

What is easy to replicate: the annotation tooling, the data formats, the buyer dashboard, the contributor app. These are engineering problems with well-understood solutions and good open-source starting points. A competitor with $5M could rebuild the software stack in nine months.

What is hard to replicate: the dataset volume, the contributor relationships, the lab customer integrations, and the operational playbook for running a hybrid data-as-a-service / physical-operations company. Those take years.

Replicability Score: 58 / 100

A well-funded team with genuine robotics domain expertise could build a credible competitor in two to three years. The underlying technology is accessible: pose estimation and depth reconstruction are well-researched problems with strong open-source foundations (MMPose, Depth Anything, SAM). The marketplace mechanics are standard. The hard parts are operational and temporal: you need real contributors, real data volume, and real customer relationships, and all three take time to accumulate.

58 puts Asimov in the "real moat, defensible for multiple years, but not structurally impossible to disrupt" category. The data moat compounds over time: every month they operate, the gap widens. If they reach 50,000+ hours of annotated data before a serious competitor emerges, the score moves toward 70. Right now, they are early enough that a well-resourced entrant could still close the gap.

The cleaning company is the wild card. It is such a strange and clever operational move that most competitors would never think to copy it, and by the time they did, Asimov would have used it to build a dataset lead that is structurally very hard to close.

The Bottom Line

Asimov is doing something genuinely important and genuinely hard. The humanoid robotics market needs what they are building. The insight that real-world data diversity cannot be faked, synthesized, or collected only in labs is correct and underappreciated. The cleaning company angle proves they are willing to do the operationally uncomfortable work that more software-native competitors would avoid.

Two Berkeley undergrads with a headband rig, a cleaning fleet, and a Supabase backend are laying the data infrastructure for a trillion-dollar industry. If that does not sound like a YC company, nothing does.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.

Build This Startup with Claude Code

Complete replication guide — install as a slash command or rules file

# How to Build a Robot Training Data Marketplace (Asimov Clone)

## Step 1: Define Your Data Schema and Collection Protocol

Start with the data model before anything else. Design your dataset schema around the four annotation layers: 3D body pose (SMPL or SMPL-X format), per-frame depth maps, semantic object labels (COCO-style or custom taxonomy), and activity segments with start/end timestamps. Write a contributor protocol document specifying acceptable lighting conditions, minimum video resolution (1080p), required task coverage per session, and rejection criteria. This document becomes your quality gate.

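```sql
-- Minimal contributors table so the foreign key below resolves.
-- Its columns are an illustrative assumption; extend as needed.
-- gen_random_uuid() is built in from PostgreSQL 13 (earlier versions need pgcrypto).
CREATE TABLE contributors (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- One row per uploaded recording.
CREATE TABLE datasets (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  contributor_id UUID REFERENCES contributors(id),
  task_type VARCHAR(100),
  environment_type VARCHAR(100),
  duration_seconds INTEGER,
  frame_count INTEGER,
  annotation_status VARCHAR(50) DEFAULT 'pending',
  quality_score FLOAT,
  storage_path TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- One row per annotated frame, carrying the four annotation layers.
CREATE TABLE annotations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  dataset_id UUID REFERENCES datasets(id),
  frame_index INTEGER,
  body_pose_json JSONB,
  depth_map_path TEXT,
  semantic_labels_json JSONB,
  activity_label VARCHAR(100),
  confidence_score FLOAT
);
```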
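
The contributor protocol can also be expressed as an executable check. The sketch below is one possible shape for that quality gate: only the 1080p minimum comes from the protocol described above, while the brightness and duration thresholds, the `ClipMetadata` fields, and the function name are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ClipMetadata:
    width: int
    height: int
    duration_seconds: float
    mean_brightness: float  # 0-255 average luma, a hypothetical lighting proxy
    tasks_covered: set = field(default_factory=set)


MIN_SHORT_SIDE = 1080        # from the protocol: 1080p minimum
MIN_BRIGHTNESS = 40.0        # hypothetical threshold
MIN_DURATION_SECONDS = 30.0  # hypothetical threshold


def rejection_reasons(clip: ClipMetadata, required_tasks: set) -> list:
    """Return every protocol rule the clip violates (empty list means accept)."""
    reasons = []
    # Use the shorter side so portrait and landscape clips are treated alike.
    if min(clip.width, clip.height) < MIN_SHORT_SIDE:
        reasons.append("resolution below 1080p")
    if clip.mean_brightness < MIN_BRIGHTNESS:
        reasons.append("lighting too dark")
    if clip.duration_seconds < MIN_DURATION_SECONDS:
        reasons.append("clip too short")
    missing = sorted(required_tasks - clip.tasks_covered)
    if missing:
        reasons.append("missing tasks: " + ", ".join(missing))
    return reasons


if __name__ == "__main__":
    clip = ClipMetadata(1280, 720, 95.0, 120.0, {"dish_washing"})
    print(rejection_reasons(clip, {"dish_washing", "counter_wiping"}))
```

Returning all reasons at once, rather than failing on the first, gives contributors a complete list of what to fix before re-recording.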

## Step 2: Build the Contributor Mobile App

Use React Native or Flutter. The app needs: task prompt display (show contributors what to record), camera capture with egocentric framing guidance, background upload with resumable chunked transfers to S3/GCS, a payout tracking dashboard, and push notifications for task availability. Implement a quality preview that runs a simple on-device blur/motion check before upload to reject bad clips early. For contributor payouts, integrate Stripe Connect, using ACH for US contributors and Wise or Payoneer for international ones.
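
One common way to implement the blur check is the variance of the Laplacian: sharp frames have strong edge responses and therefore high variance, while blurry frames do not. The sketch below shows the idea in plain Python on a grayscale frame given as a list of rows. It is an illustration only: the threshold is a hypothetical value that would need tuning on real footage, and a production app would run the equivalent on-device in native code.

```python
from statistics import pvariance

BLUR_THRESHOLD = 100.0  # hypothetical; tune against real clips


def laplacian_variance(gray: list) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def is_too_blurry(gray: list, threshold: float = BLUR_THRESHOLD) -> bool:
    """Flag frames whose edge response is too weak to be usable."""
    return laplacian_variance(gray) < threshold


if __name__ == "__main__":
    flat = [[128] * 5 for _ in range(5)]
    checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
    print(is_too_blurry(flat), is_too_blurry(checker))
```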

## Step 3: Build the Video Processing Pipeline

This is the core engineering challenge. Deploy a GPU-accelerated processing pipeline on AWS or GCP:

- **Intake**: Receive uploaded video, validate format, extract frames at target fps (15-30fps), store raw frames in object storage
- **Pose estimation**: Run MMPose or ViTPose on each frame to extract 2D keypoints, then lift to 3D using SMPL-X fitting (use SMPLify-X or BEV)
- **Depth estimation**: Run Depth Anything V2 or ZoeDepth on each frame; optionally fuse with device depth sensor data if available
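- **Semantic segmentation**: Run Mask2Former fine-tuned on a custom vocabulary of household objects, or SAM 2 for masks; SAM 2 masks are class-agnostic, so pair it with a detector or classifier to attach object labels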
- **Activity segmentation**: Run a temporal action detection model (ActionFormer or TemporalMaxer) trained on your task taxonomy
- **Quality scoring**: Train a small classifier to predict annotation quality from video features; flag low-confidence outputs for human review

Orchestrate with Prefect or Temporal. Process asynchronously; do not block the contributor upload flow.
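
The non-blocking pattern can be sketched with plain `asyncio` as a stand-in for Prefect or Temporal: the upload handler only enqueues a job and returns, while background workers run the stages in order. The stage names mirror the list above; `run_stage` is a hypothetical placeholder for real model calls, and nothing here uses the Prefect or Temporal APIs.

```python
import asyncio

# Stage names mirror the pipeline list; each would wrap a real model call.
STAGES = ["intake", "pose", "depth", "semantic", "activity", "quality"]


async def run_stage(stage: str, clip_id: str) -> str:
    await asyncio.sleep(0)  # placeholder for GPU-bound work
    return f"{clip_id}:{stage}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    """Pull clips off the queue and run every stage in order."""
    while True:
        clip_id = await queue.get()
        try:
            results[clip_id] = [await run_stage(s, clip_id) for s in STAGES]
        finally:
            queue.task_done()


async def handle_upload(queue: asyncio.Queue, clip_id: str) -> str:
    """Return immediately; processing happens in the background workers."""
    queue.put_nowait(clip_id)
    return "accepted"


async def main(clip_ids: list, worker_count: int = 2) -> dict:
    queue = asyncio.Queue()
    results = {}
    workers = [asyncio.create_task(worker(queue, results))
               for _ in range(worker_count)]
    for clip_id in clip_ids:
        await handle_upload(queue, clip_id)
    await queue.join()  # wait until every queued clip is processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(main(["clip-a", "clip-b"])))
```

A dedicated orchestrator adds what this sketch lacks: retries, durable state across restarts, and per-stage scheduling onto GPU workers.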

## Step 4: Build the Human Review Interface

Automatic annotation is never 100% correct. Build a lightweight labeling UI (Label Studio works well as a starting point, or build custom) where your QA team can: review flagged pose estimates, correct semantic labels, verify activity segment boundaries, and approve or reject datasets. Track inter-annotator agreement. Set a minimum quality threshold (e.g., 0.85 agreement score) before a dataset is marked as sellable.
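
The text does not say which agreement metric the 0.85 threshold refers to. As one possible choice, the sketch below computes Cohen's kappa between two annotators' activity labels, which corrects raw agreement for chance. Treating kappa as the "agreement score" is an assumption, and the function names are hypothetical.

```python
from collections import Counter

SELLABLE_THRESHOLD = 0.85  # example threshold from the text


def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Probability that both annotators pick the same label by chance.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def is_sellable(labels_a: list, labels_b: list,
                threshold: float = SELLABLE_THRESHOLD) -> bool:
    return cohens_kappa(labels_a, labels_b) >= threshold


if __name__ == "__main__":
    a = ["chop", "chop", "stir", "rinse", "rinse", "rinse"]
    b = ["chop", "chop", "stir", "rinse", "rinse", "stir"]
    print(round(cohens_kappa(a, b), 3), is_sellable(a, b))
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the natural replacement.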

## Step 5: Build the Buyer Dashboard and Dataset API

Robotics lab customers need to: browse available datasets by task/environment/volume/annotation type, preview sample frames and annotations, place orders (one-time purchase or subscription), download datasets in standard formats (HDF5, LeRobot, or custom), and request custom collection runs. Build a REST API with versioned dataset manifests and pre-signed download URLs from your object storage. Implement a dataset card standard (like Hugging Face dataset cards) for every release so customers know exactly what they are getting.

```python
# Example dataset manifest, written as a Python dict that serializes directly to JSON
dataset_manifest = {
    "dataset_id": "asimov-kitchen-clean-v3",
    "task_types": ["dish_washing", "counter_wiping", "trash_removal"],
    "environment": "residential_kitchen",
    "total_hours": 42.5,
    "contributor_count": 312,
    "annotation_layers": ["pose_3d", "depth", "semantic", "activity"],
    "format": "lerobot_v2",
    "license": "commercial",
}
```
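
Catalog browsing then reduces to filtering manifests like the one above. The sketch below shows one way the filter behind a browse endpoint might look; the `matches` and `search` functions and the second catalog entry are hypothetical, and a real service would push these filters into the database query.

```python
def matches(manifest: dict, *, task_type=None, environment=None,
            min_hours=0.0, required_layers=()) -> bool:
    """Return True when a manifest satisfies every supplied filter."""
    if task_type is not None and task_type not in manifest["task_types"]:
        return False
    if environment is not None and manifest["environment"] != environment:
        return False
    if manifest["total_hours"] < min_hours:
        return False
    return set(required_layers) <= set(manifest["annotation_layers"])


def search(catalog: list, **filters) -> list:
    """Return the dataset IDs of all manifests that pass the filters."""
    return [m["dataset_id"] for m in catalog if matches(m, **filters)]


# Illustrative catalog; the second entry is invented for the example.
CATALOG = [
    {
        "dataset_id": "asimov-kitchen-clean-v3",
        "task_types": ["dish_washing", "counter_wiping", "trash_removal"],
        "environment": "residential_kitchen",
        "total_hours": 42.5,
        "annotation_layers": ["pose_3d", "depth", "semantic", "activity"],
    },
    {
        "dataset_id": "example-laundry-v1",
        "task_types": ["laundry_sorting"],
        "environment": "residential_bedroom",
        "total_hours": 8.0,
        "annotation_layers": ["pose_3d", "activity"],
    },
]

if __name__ == "__main__":
    print(search(CATALOG, min_hours=10.0, required_layers=["depth"]))
```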

## Step 6: Launch Contributor Acquisition and Retention

The data business lives or dies on contributor quality and retention. Launch strategy: partner with university research labs (they have motivated students who want side income), recruit via Amazon Mechanical Turk for initial volume, then transition top performers to direct relationships. Build a reputation/tier system in which contributors who consistently produce high-quality footage earn higher rates. Publish transparent payout rates (e.g., $8-15/hour for standard tasks, $20-30/hour for specialized or remote environments). Run task-specific campaigns ('Kitchen Month', 'Office Week') to fill dataset gaps.
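
A tier system can be as simple as mapping a contributor's rolling quality score onto the published rate band. The sketch below uses the example bands from the paragraph above; the linear mapping from quality to rate and the function names are assumptions for illustration, not a recommended compensation policy.

```python
# Example hourly bands (USD) taken from the payout rates above.
RATE_BANDS = {
    "standard": (8.0, 15.0),
    "specialized": (20.0, 30.0),
}


def hourly_rate(task_class: str, avg_quality: float) -> float:
    """Interpolate within the band using a 0-1 rolling quality score."""
    low, high = RATE_BANDS[task_class]
    quality = min(max(avg_quality, 0.0), 1.0)  # clamp out-of-range scores
    return round(low + (high - low) * quality, 2)


def payout(task_class: str, avg_quality: float, minutes_recorded: float) -> float:
    """Payout in USD for a session of accepted footage."""
    return round(hourly_rate(task_class, avg_quality) * minutes_recorded / 60, 2)


if __name__ == "__main__":
    print(hourly_rate("standard", 0.9), payout("standard", 0.9, 45))
```

Discrete tiers (for example bronze, silver, gold) would work equally well and are easier to explain to contributors than a continuous scale.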

## Step 7: Deploy and Scale

Infrastructure checklist:
- Object storage: S3 with intelligent tiering (raw video to Glacier after annotation, keep annotation artifacts hot)
- GPU instances: spot instances for batch processing, on-demand for time-sensitive orders
- Global CDN: CloudFront for dataset download acceleration
- Database: Postgres on RDS for metadata, Redis for job queues
- Monitoring: track pipeline throughput, annotation accuracy drift, contributor churn rate, and time-to-delivery per order

Target unit economics: aim for 3-5x markup on annotation cost (contributor pay + GPU cost + QA labor). A 10-hour annotated kitchen dataset should cost you $200-400 to produce and sell for $800-2000 depending on annotation depth and exclusivity. Enterprise custom collection runs at $10K-50K per project are your highest-margin revenue.
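
The markup arithmetic is easy to sanity-check. The sketch below applies the 3-5x markup to the $200-400 production cost quoted above and computes the gross margin at a given price; the function names are hypothetical and the figures are the article's own examples.

```python
def price_range(cost_low: float, cost_high: float,
                markup_low: float = 3.0, markup_high: float = 5.0) -> tuple:
    """Lowest and highest list price implied by a cost range and markup range."""
    return cost_low * markup_low, cost_high * markup_high


def gross_margin(price: float, cost: float) -> float:
    """Fraction of the sale price left after production cost."""
    return (price - cost) / price


if __name__ == "__main__":
    low, high = price_range(200, 400)
    print(f"Implied price range: ${low:,.0f} to ${high:,.0f}")
    print(f"Margin at $800 on $200 cost: {gross_margin(800, 200):.0%}")
```

Applied to the stated costs, a 3-5x markup spans $600 to $2,000, so the quoted $800 floor corresponds to a 4x markup on the cheapest datasets.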


#robotics #training data #humanoid robots #YC W2026 #data infrastructure #AI #computer vision #marketplace