China's DeepSeek Says Its Hit AI Model Cost Just $294,000 To Train

Chinese AI developer DeepSeek said it spent $294,000 on training its R1 model, much lower than figures reported for US rivals, in a paper that is likely to reignite debate over Beijing's place in the race to develop artificial intelligence.
BEIJING:
Chinese artificial intelligence company DeepSeek has revealed it invested $294,000 in training its R1 model, substantially less than what U.S. competitors reportedly spent, according to a peer-reviewed paper published Wednesday in the journal Nature. The disclosure may intensify debate over China's position in the global AI development race.
This cost estimate marks the first time the Hangzhou-based firm has publicly disclosed R1's training expenses. In January, DeepSeek's announcement of what it described as cost-effective AI systems triggered a selloff in technology stocks as investors feared potential disruption to established AI leaders like Nvidia.
Following this announcement, DeepSeek and its founder Liang Wenfeng have maintained a low profile, with only occasional product updates. The Nature article, co-authored by Liang, specified that the reasoning-focused R1 model utilized 512 Nvidia H800 chips during its training process. This information was absent from a previous version of the paper published in January.
For comparison, OpenAI's CEO Sam Altman stated in 2023 that "foundational model training" at his company exceeded $100 million, though specific figures for individual models remain undisclosed. These training expenses cover the costs of operating powerful chip clusters for extended periods to process enormous volumes of text and code data.
Some aspects of DeepSeek's claims regarding development costs and technology have faced scrutiny from U.S. companies and officials. The H800 chips mentioned were specifically designed by Nvidia for the Chinese market after U.S. export restrictions implemented in October 2022 prohibited the export of more advanced H100 and A100 AI chips to China.
U.S. officials informed Reuters in June that DeepSeek had obtained "large volumes" of H100 chips after these export controls took effect. Nvidia maintained that DeepSeek used legally acquired H800 chips rather than H100s.
In supplementary documentation accompanying the Nature article, DeepSeek acknowledged for the first time possessing A100 chips, which were used during preliminary development stages. "Regarding our research on DeepSeek-R1, we utilized the A100 GPUs to prepare for the experiments with a smaller model," the researchers stated. The final R1 model was subsequently trained for 80 hours using the 512-chip H800 cluster.
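The reported figures allow a simple back-of-envelope check. The sketch below combines the 512-chip cluster and 80-hour run stated above with the $294,000 total; the implied per-GPU-hour rate is a derived figure for illustration, not one the paper reports.

```python
# Back-of-envelope check of DeepSeek's reported R1 training cost.
# Inputs from the article: 512 Nvidia H800 chips, 80 hours, $294,000 total.
num_gpus = 512
hours = 80
total_cost_usd = 294_000

gpu_hours = num_gpus * hours               # total GPU-hours consumed
implied_rate = total_cost_usd / gpu_hours  # derived cost per GPU-hour

print(f"{gpu_hours} GPU-hours at about ${implied_rate:.2f} per GPU-hour")
```

At roughly $7 per GPU-hour, the implied rate is in the broad range of commercial GPU rental pricing, which is consistent with the figure covering compute time for the final training run rather than research, staff, or hardware acquisition.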
Reuters previously reported that DeepSeek's ability to attract top talent in China was partly due to being one of the few domestic companies operating an A100 supercomputing cluster.