OpenAI suspects that China's DeepSeek AI models, which are significantly cheaper than their Western counterparts, were developed using OpenAI's data. The suspicion surfaced after DeepSeek's rapid success triggered a sell-off in major AI stocks, with Nvidia suffering the largest single-day loss of market value in Wall Street history.
DeepSeek's R1 model, built upon the open-source DeepSeek-V3, reportedly needs far less computing power and cost far less to train (an estimated $6 million) than Western models like ChatGPT. While this claim is debated, it has fueled investor concerns about the massive sums American tech giants are investing in AI. The DeepSeek app surged up the U.S. app charts amid the controversy.
OpenAI and Microsoft are investigating whether DeepSeek violated OpenAI's terms of service by using OpenAI's API for "distillation," a technique in which a model is trained on the outputs of a larger, more capable model. OpenAI said it is aware of such attempts by Chinese and other companies and highlighted the measures it takes to protect its intellectual property, including collaboration with the U.S. government.
David Sacks, President Trump's AI advisor, echoed the suspicion that data had been extracted from OpenAI's models and predicted that leading AI companies would take further steps to prevent such practices.
The situation highlights a significant irony: OpenAI, itself accused of using copyrighted internet content to train ChatGPT, is now accusing DeepSeek of similar practices. The apparent hypocrisy has been widely noted, particularly in light of OpenAI's earlier statement to the UK's House of Lords that it is impossible to train leading AI models without copyrighted material. That statement came after lawsuits filed by the New York Times and by 17 authors alleging unlawful use of their work. OpenAI maintains that its training practices constitute "fair use." The debate underscores the complex legal and ethical challenges surrounding the use of copyrighted material in AI model training.