DeepSeek's surprisingly inexpensive AI model challenges industry norms. The company reports a pre-training cost of just $6 million for its DeepSeek V3 model, but the reality is more complex: that self-reported figure omits substantial spending on research, refinement, data processing, and infrastructure.
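DeepSeek's V3 model relies on several innovative techniques: Multi-token Prediction (MTP), which predicts several upcoming tokens at once rather than one at a time; Mixture of Experts (MoE), which splits the model into 256 expert sub-networks and activates only a few of them for each token; and Multi-head Latent Attention (MLA), which helps the model focus on the most important parts of a sentence.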
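To make the Mixture of Experts idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration under stated assumptions, not DeepSeek's implementation: the 256 experts come from the article, while the value of `TOP_K`, the softmax weighting, the toy experts, and the function names `route` and `moe_forward` are all hypothetical choices made for this example.

```python
import math
import random

NUM_EXPERTS = 256  # figure quoted in the article
TOP_K = 8          # assumption for illustration; not stated in the article


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and normalise their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_logits, k=TOP_K):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route(router_logits, k))


rng = random.Random(0)
# Toy experts: each one just scales its input by a fixed factor.
scales = [rng.uniform(0.5, 1.5) for _ in range(NUM_EXPERTS)]
experts = [lambda x, s=s: s * x for s in scales]
# Stand-in for the scores a learned router would produce for one token.
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(logits)
output = moe_forward(2.0, experts, logits)
print(f"{len(selected)} of {NUM_EXPERTS} experts were evaluated")
```

The point of the design is that only the selected experts run for a given token, so the compute spent per token is a small fraction of what the full set of experts would require.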
However, a SemiAnalysis report points to a far larger investment. According to the report, DeepSeek operates a massive computational infrastructure of approximately 50,000 Nvidia Hopper GPUs spread across multiple data centers, with a total server investment of roughly $1.6 billion and operating costs near $944 million.
These figures contradict the initial claim of minimal cost. DeepSeek is a subsidiary of High-Flyer, a Chinese hedge fund, and owns its data centers, which gives it tight control over its infrastructure and lets it iterate quickly. Because it is self-funded, it can also move with more agility. The company attracts top talent, with some researchers earning over $1.3 million annually.
DeepSeek's actual investment in AI development exceeds $500 million. Its lean structure does facilitate innovation, but the "revolutionary budget" narrative is misleading. A comparison with competitors still shows a gap: DeepSeek's R1 model reportedly cost $5 million, while ChatGPT 4 cost $100 million. Despite the misleading headline figure, DeepSeek's success underscores the potential of well-funded independent AI companies to compete effectively with established giants.
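The gap between the headline number and the broader spending is easier to see when the figures quoted in this article are put side by side. The short Python sketch below does only that arithmetic; the dictionary keys and variable names are made up for the example, and because the article gives total development spending only as "over $500 million", the computed share is an upper bound.

```python
# All figures are in millions of US dollars, as quoted in this article.
costs_musd = {
    "v3_pretraining_claimed": 6,
    "r1_reported": 5,
    "chatgpt4_reported": 100,
    "ai_development_minimum": 500,  # "exceeds $500 million"
    "server_investment": 1600,
    "operating_costs": 944,
}

# How many times more ChatGPT 4 reportedly cost than R1.
r1_vs_chatgpt4 = costs_musd["chatgpt4_reported"] / costs_musd["r1_reported"]

# Upper bound on the share of development spending that the
# headline pre-training figure represents.
headline_share = (costs_musd["v3_pretraining_claimed"]
                  / costs_musd["ai_development_minimum"])

print(f"ChatGPT 4 vs R1: {r1_vs_chatgpt4:.0f}x")
print(f"Headline figure as share of development spend: at most {headline_share:.1%}")
```

On these numbers the reported cost of ChatGPT 4 is 20 times that of R1, while the $6 million pre-training figure amounts to no more than about 1.2% of DeepSeek's development spending.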
Ultimately, DeepSeek's success stems from substantial investment, technological advances, and a skilled team, not from a miraculously low budget. Even at its true cost, however, it remains significantly cheaper than its competitors.