The training data gold rush hit a wall. Frontier labs spent years hoovering up everything on the public internet (Wikipedia, Reddit, GitHub, Common Crawl), and now the well is functionally dry. GPT-4 was trained on roughly 13 trillion tokens of text. The entire crawlable web is estimated at around 5 to 8 trillion tokens. The math doesn't work anymore. You can't build a better model by scraping harder.
Meanwhile, the next generation of AI problems (voice agents that understand regional dialects, robotics models that watch humans manipulate objects, video models that need to understand unscripted human behavior) requires structured, real-world, rights-cleared data that doesn't exist on the internet at all. Nobody uploaded a video of themselves loading a dishwasher with full provenance records attached. Nobody consented to having their Urdu dialect used to train a speech model.
Luel is the company that shows up at this exact moment, with exactly the right infrastructure to address that gap. Two Berkeley dropouts, $31.2M in seed funding from Lightspeed and General Catalyst, 500K+ contributors across 96 countries, and $2M ARR within weeks of demo day. They're not a research project. They're a logistics company for data nobody else can legally or practically collect.
