A $294,000 training run with no human-labelled reasoning data produced DeepSeek-R1, the model whose paper subsequently reached the cover of Nature. Liang Wenfeng, co-founder and CEO of DeepSeek, is listed as the corresponding author. The paper, posted to arXiv in January 2025 under the title "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning", demonstrated that a language model trained with pure reinforcement learning, without any supervised fine-tuning on human-curated reasoning chains, could develop strong mathematical and coding reasoning.
Liang Wenfeng's $294K DeepSeek-R1 RL Breakthrough Reached Nature
How Liang Wenfeng's DeepSeek-R1 used Group Relative Policy Optimization and pure reinforcement learning to produce emergent reasoning capabilities for $294,000 in training compute, and why the paper reached the cover of Nature.
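The core idea behind Group Relative Policy Optimization can be illustrated with a short sketch: several answers are sampled for the same prompt, each is scored, and every score is normalized against the mean and standard deviation of its own group, so no separate value model is needed. The function name, the example rewards, the small `eps` term, and the choice of population standard deviation below are assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group it was sampled in.

    rewards: scores for several answers sampled for the SAME prompt.
    eps: small constant (an assumption here) that avoids division by
         zero when every answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rule-based rewards: 1.0 for a correct final answer, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct answers get a positive advantage, incorrect ones a negative one.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

Answers that beat their group average are reinforced and those below it are discouraged. When all answers in a group score the same, every advantage is zero and that prompt contributes no learning signal.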
© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.