Liang Wenfeng's DeepSeek: $5.6M Training Budget, $45B Valuation

DeepSeek-V3 cost $5.6 million to train. V4-Pro charges $1.74 per million tokens. And Liang Wenfeng is now raising $10 billion at a $45 billion valuation — here is what we know about DeepSeek's finances.

Jun 9 at 7:00 AM · 6 min read

Figure: DeepSeek V4-Flash input pricing versus comparable frontier models at June 2026 API rates (StartupHub.ai original chart). Sources: DeepSeek API Docs, CostGoat.

DeepSeek-V3, the model that turned heads in early 2025, cost roughly $5.6 million to train on H800 GPUs, according to the company's own technical report — a number that helped reframe global assumptions about the compute cost of frontier AI. Its successor, DeepSeek-V4-Pro, released on April 24, 2026, scales to 1.6 trillion total parameters while keeping API input pricing at $1.74 per million tokens, around 13 times cheaper than comparable US models at launch.

How DeepSeek built frontier AI for under $6 million

The headline number from DeepSeek's V3 technical report, published in December 2024 on arXiv, is 2.788 million H800 GPU hours for the full training run, which at roughly $2 per GPU-hour comes to approximately $5.576 million. The model has 671 billion total parameters but uses a Mixture-of-Experts architecture that activates only 37 billion per token, reducing inference and training compute relative to a dense model of the same nominal size. Pre-training covered 14.8 trillion tokens in under two months on a 2,048-GPU cluster.
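The arithmetic behind the headline figure can be checked directly. The sketch below uses only the numbers quoted above; the $2 per GPU-hour rate is a rental-price assumption rather than a measured cost, and the variable names are illustrative.

```python
# Figures quoted from the V3 technical report, as cited in this article.
GPU_HOURS = 2_788_000          # total H800 GPU hours for the full training run
RATE_USD_PER_GPU_HOUR = 2.00   # assumed rental rate, not a measured cost

training_cost_usd = GPU_HOURS * RATE_USD_PER_GPU_HOUR

# Share of the 671B total parameters that is active for each token.
TOTAL_PARAMS_B = 671
ACTIVE_PARAMS_B = 37
active_share = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"Training cost: ${training_cost_usd / 1e6:.3f} million")
print(f"Active share per token: {active_share:.1%}")
```

The result is a rental-equivalent cost for the final training run only. It moves linearly with the assumed hourly rate, so a $3 rate would put the same run at roughly $8.4 million.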

V4-Pro, released roughly 16 months after that report, extends this architecture: 1.6 trillion total parameters, 49 billion active per token, and a default one-million-token context window. At launch it scored 80.6% on SWE-bench Verified, the highest result on that coding benchmark of any model at the time of release, according to benchmark tracker BenchLM. DeepSeek labels V4-Pro a preview release, meaning final performance figures may shift before general availability.

The scaling pattern across DeepSeek's model generations reflects a deliberate architectural bet. Rather than chase raw parameter count across all weights, the team progressively increases total-model size while keeping the active-parameter footprint small. V4-Flash, the lighter version released alongside V4-Pro, uses just 13 billion active parameters drawn from a 284-billion-parameter pool, making it cheaper to serve than V3 on a per-token basis despite being a newer model.
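To make the pattern concrete, the sketch below computes the share of parameters active per token for each model, using only the parameter counts quoted in this article. Treating active parameters as a rough proxy for per-token serving compute is a simplifying assumption; real serving cost also depends on memory, context length, and hardware.

```python
# (total, active) parameters in billions, as quoted in this article.
models = {
    "V3": (671, 37),
    "V4-Flash": (284, 13),
    "V4-Pro": (1600, 49),
}

def active_share(total_b: float, active_b: float) -> float:
    """Fraction of total parameters used for each token."""
    return active_b / total_b

for name, (total_b, active_b) in models.items():
    share = active_share(total_b, active_b)
    print(f"{name:9s} total={total_b:>5}B  active={active_b:>3}B  share={share:.1%}")

# Rough proxy only: V4-Flash active parameters relative to V3.
flash_vs_v3 = models["V4-Flash"][1] / models["V3"][1]
print(f"V4-Flash uses about {flash_vs_v3:.0%} of V3's active parameters per token")
```

On these figures the active share falls from about 5.5% for V3 to about 3.1% for V4-Pro, even as total size more than doubles.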

Figure: Total versus active parameters per token across DeepSeek model generations (V3, V4-Flash, V4-Pro). Source: DeepSeek technical reports, BenchLM.

The $10 billion fundraise and a valuation that tripled in six weeks

Until April 2026, DeepSeek had no external investors. High-Flyer Capital Management funded all research spending directly, without venture-capital dilution or a public-markets timeline. That changed when the company opened its first external round, targeting at least $300 million at a valuation above $10 billion, according to reports in April 2026. The China Integrated Circuit Industry Investment Fund, the state-backed vehicle known colloquially as the "Big Fund", is leading the round, with Tencent Holdings and Alibaba Group in separate talks to participate, per Bloomberg.

By late May 2026, reported valuation had reached $45 billion, per TechCrunch, up from a $3.4 billion secondary-market figure in 2025. Liang himself is expected to commit roughly 20 billion yuan (approximately $2.7 billion) of personal capital to the round. Battery giant CATL is reportedly considering a 5 billion yuan contribution. High-Flyer remains the controlling shareholder throughout. Liang controls close to 90% of the company.
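The relative size of these figures is easier to see with a quick calculation. The sketch below uses only numbers quoted in this article; the exchange rate is the one implied by the article's own rounded yuan-to-dollar conversion, not a market rate, so the dollar value for CATL is an approximation.

```python
# Reported figures from this article (billions).
LIANG_YUAN_B = 20.0     # Liang's expected commitment, in yuan
LIANG_USD_B = 2.7       # the article's approximate dollar equivalent
CATL_YUAN_B = 5.0       # CATL's reported possible contribution, in yuan
VALUATION_2025_B = 3.4  # secondary-market figure, in dollars
VALUATION_MAY_B = 45.0  # reported late-May 2026 figure, in dollars

# Exchange rate implied by the article's rounding (assumption, not a market quote).
implied_rate = LIANG_YUAN_B / LIANG_USD_B
catl_usd_b = CATL_YUAN_B / implied_rate

valuation_multiple = VALUATION_MAY_B / VALUATION_2025_B
liang_vs_catl = LIANG_YUAN_B / CATL_YUAN_B

print(f"Implied rate: {implied_rate:.2f} yuan per dollar")
print(f"CATL contribution: about ${catl_usd_b:.3f} billion")
print(f"Valuation multiple since 2025: {valuation_multiple:.1f}x")
print(f"Liang's commitment vs CATL's: {liang_vs_catl:.0f}x")
```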

Management has been explicit about strategy with potential investors: DeepSeek told the round's participants that the startup will prioritise groundbreaking AI research over short-term commercialisation, per Bloomberg's May 22 reporting. The Information reported separately that DeepSeek is simultaneously beginning to plot revenue efforts, suggesting a two-track structure: research-first public messaging, with commercial product work running in parallel. Compare this with how Ilya Sutskever's SSI has framed a similar research-over-revenue posture while raising capital at a reported $32 billion valuation despite being pre-revenue.

Figure: DeepSeek valuation progression from 2025 secondary-market estimates through the ongoing 2026 funding round. Sources: Bloomberg, TechCrunch, Capital Brief.

API pricing as a structural weapon against US incumbents

DeepSeek's commercial strategy since V3 has been aggressive pricing rather than traditional enterprise sales cycles. V4-Flash, the speed-optimised model released April 24, 2026, costs $0.14 per million input tokens on a cache-miss basis, according to DeepSeek's own API documentation — 18 times cheaper than GPT-5.4 at $2.50 per million and 36 times cheaper than Claude Opus 4.7 at $5.00 per million, per pricing aggregator CostGoat. V4-Pro launched with a 75% discount, per The Next Web, and DeepSeek cut cache-hit prices across the entire API suite to one-tenth of original rates on April 26.
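The multiples quoted above follow from the per-million-token prices. The sketch below reproduces them and applies the same prices to a hypothetical workload; the one-billion-token volume is an illustrative assumption, and all prices are cache-miss input rates as cited in this article.

```python
# Cache-miss input prices in USD per million tokens, as quoted in this article.
PRICES = {
    "DeepSeek V4-Flash": 0.14,
    "GPT-5.4": 2.50,
    "Claude Opus 4.7": 5.00,
}

def price_multiple(other: float, base: float) -> float:
    """How many times more expensive `other` is than `base`."""
    return other / base

def input_cost_usd(price_per_million: float, tokens: int) -> float:
    """Input cost for a given token volume at a per-million-token price."""
    return price_per_million * tokens / 1_000_000

base = PRICES["DeepSeek V4-Flash"]
HYPOTHETICAL_TOKENS = 1_000_000_000  # illustrative workload: one billion input tokens

for name, price in PRICES.items():
    multiple = price_multiple(price, base)
    cost = input_cost_usd(price, HYPOTHETICAL_TOKENS)
    print(f"{name:18s} ${price:.2f}/M  {multiple:5.1f}x  ${cost:,.0f} per 1B tokens")
```

The unrounded ratios are about 17.9 and 35.7, which round to the 18x and 36x figures. The comparison covers input tokens only; output pricing and cache-hit discounts would change the totals for a real workload.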

The pricing philosophy maps directly to Liang's stated views on competitive dynamics. In a widely cited interview published by The China Academy, he argued: "In disruptive tech, closed-source moats are fleeting. Even OpenAI's closed-source model can't prevent others from catching up. Therefore, our real moat lies in our team's growth: accumulating know-how, fostering an innovative culture." The implication is that pricing is a tool to establish developer adoption, not a path to margin expansion.

The downstream effect inside China's AI market is already visible. Domestic competitors including Zhipu's GLM 5.1 and Moonshot's Kimi K2.6 have faced direct pricing pressure since DeepSeek's aggressive cuts, per market analysis from CloudZero. The pattern resembles what Sam Altman's OpenAI did to API pricing globally in 2023 and 2024 — except DeepSeek is doing it from a cost base that appears structurally lower, not just subsidised.

Figure: Input pricing per million tokens (cache miss, June 2026) for DeepSeek V4-Flash and V4-Pro versus GPT-5.4 and Claude Opus 4.7. Sources: DeepSeek API Docs, CostGoat.

What it means

DeepSeek's financial structure is unlike that of any other AI lab operating at this scale. There is no venture board setting a return timeline, no IPO process shaping disclosure practices, and the primary backer, High-Flyer, is a quantitative fund that recorded a 57% gain in 2025 per Bloomberg, providing ongoing capital without requiring DeepSeek to show revenue. The first external funding round preserves that structure: Liang's personal commitment dwarfs the institutional cheques, and High-Flyer retains control. Whether the research-over-commercialisation framing survives the addition of state-fund and tech-giant investors is the central question as the round closes.


