DeepSeek, the Sputnik moment for the US, and India's place in the US-China race to AGI glory
State of the Union
By now, the tech world is abuzz over the DeepSeek-R1 model and its implications for the broader world. DeepSeek dropped a paper on Jan 22, 2025, describing a model that is slightly better than OpenAI’s o1 (the incumbent and the best reasoning model so far). More striking, this was reportedly achieved at a cost of about $6 million (vs ~$100 million), with 200+ employees (vs OpenAI’s 4,500+), and in roughly two months; that combination is what has pushed the industry to a boil. Prior to this, hundreds of millions of dollars in compute were needed to train a model like DeepSeek-R1, and millions more for inference. They also used far fewer GPUs (about 2,000 Nvidia H800s vs 100k) than the leading models did. ARC-AGI benchmarks, too, confirm the results published in the paper.
None of this should be surprising, as DeepSeek had already released a paper on December 26, 2024, launching DeepSeek-V3. This time, however, the reaction was different and quite profound. US large-cap technology companies and the industry at large are in shock, and the markets reflect it in red. Nvidia lost $590 billion in market capitalization in a single day, the largest single-day loss in market history.
Why is this a big deal?
Marc Andreessen calls this AI’s Sputnik moment for the US. But why is this a big deal? The US government introduced export controls back in 2022, restricting China from procuring advanced GPUs from the US. This matters because those GPUs were required for training general-purpose models and advanced reasoning models. DeepSeek, a Chinese company founded by a former hedge fund manager, managed to develop a better reasoning model using less powerful GPUs (compared to those available to leading AI companies in the US), with fewer employees and very few restrictions on its APIs. Its distilled variants can even run on a high-end laptop. Additionally, it was launched with open weights (though not open source, meaning we don’t have access to the training data); still, that is a departure from OpenAI, whose models are closed and proprietary. They achieved all of this at low cost and without access to the latest H100 GPUs that their US competitors could use. That is the most striking aspect of this episode, and it is why many feel DeepSeek-R1 came out of nowhere.
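For the curious, here is a minimal sketch of what "runs on a high-end laptop" looks like in practice, using Hugging Face transformers to load one of the published distilled R1 checkpoints. The model ID is the public repo name; the prompt and generation settings are my own illustrative choices, not a recommended configuration.

```python
# Minimal sketch: local inference with a distilled R1 variant.
# The 1.5B distill fits on a high-end laptop; larger distills want a GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "How many prime numbers are there between 1 and 20?"
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True, return_tensors="pt",
)
# The model emits its chain of thought before the final answer.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```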
More chips. More compute.
OpenAI has been leading the push to create more compute for training and inference. It recently announced Project Stargate, a $500 billion effort to build AI infrastructure in the US.
All companies building models have chosen to invest heavily in GPUs for training and inference, pouring billions of dollars into future computational resources. This has created immense demand for GPUs, as well as for the memory and power required to support such massive infrastructure. DeepSeek, however, has disrupted the industry’s standard operating principles by leapfrogging several practices and innovating new methods that cut training and inference costs dramatically while using far fewer GPUs. The export controls left them with less powerful GPUs, prompting them to maximize efficiency and employ ingenious techniques such as reinforcement learning and distillation (a generic sketch of distillation follows below). This discussion details the strategies DeepSeek adopted. The belief that we simply need more GPUs, more compute, and more power is now under serious challenge.
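To make "distillation" concrete, here is a minimal sketch of the classic logit-matching form of knowledge distillation, where a small student model learns to imitate a large teacher. This is the textbook Hinton-style objective with illustrative shapes and hyperparameters; the R1 paper itself distills by fine-tuning smaller models on R1-generated reasoning traces, but the teacher-supervises-student idea is the same.

```python
# Sketch of knowledge distillation: a small student imitates a large teacher.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions; the classic distillation objective."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Illustrative shapes: batch of 4 positions, vocabulary of 32,000 tokens.
teacher_logits = torch.randn(4, 32000)  # frozen large "teacher" output
student_logits = torch.randn(4, 32000, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
```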
That said, doubts have been raised about whether DeepSeek had access to 50,000 H100s and whether they might have used other, US-origin models for distillation. These claims are unfounded at this point; there is no data backing them up.
Bull and Bear case
This means that all frontier model companies will thoroughly examine the DeepSeek papers and adopt those techniques. Even better, they will conceive more ingenious methods across every layer of the model-building stack. This will lead to more efficient and effective reasoning models, which will ultimately benefit customers the most.
However, models will become commodities (if they are not already). Many open-source models will be available, and the advantage held by closed, proprietary model companies has been challenged. The low footprint now required will let significantly improved use cases and applications be realised.
Meta, which already has an open-source model, will launch models on par with (if not better than) DeepSeek’s. Grok 3 is on the anvil, and I am sure xAI is in the race. OpenAI will have to justify its cost and compute to investors. OpenAI and Microsoft are already growing increasingly distant, and this development will only accelerate their separation, though Sam Altman’s tweet today says otherwise. Satya Nadella of Microsoft and Mark Zuckerberg of Meta were right in saying models would become commodities; they are sitting pretty as it plays out exactly as they predicted.
Two years ago, a few pointed out that AWS had missed the boat. But with AWS Bedrock powering the agentic stack and a slew of announcements at its recently concluded re:Invent (and the one before), it has made amends in a big way. With models becoming a commodity, AWS (and GCP and Azure) will benefit greatly.
There was a leaked internal Google memo (written by a Google employee, not to be taken as Google’s official position) with a line I really like: ‘We have no moat, and neither does OpenAI.’ This has come true much faster than expected. OpenAI will definitely come out swinging, but it will have to invest in creating new moats. Maybe it has projects that are not yet public. As the startup world says, ‘never bet against Sam Altman.’ It will be interesting to see what OpenAI does from here and how Sam takes it forward. His initial reactions indicate he is in for the long game, as it has always been and as it should be.
In this mayhem, Apple stock (along with Meta and Amazon) has gone against the trend. Apple stands to benefit because Apple Intelligence can become far more meaningful and useful to customers, given the low footprint now required to run a very powerful model on-device. And not just Apple devices: future smartphones will be nothing like what we have now. This might usher in an era of AI-first smartphones that look different and do a whole lot of things that are not possible today. This was going to happen anyway; the timeline has just accelerated.
There will likely be additional export bans from the US, along with several restrictions on all AI-related exports. The DeepSeek app might be banned in the US, and possibly in India, given the precedent around concerns that Chinese apps collect data and store it on servers in China (and not in the countries where the users reside). The next few days will tell us how this unfolds.
Nvidia stock will likely recover once the market rebounds from the doomsday scenario. Satya Nadella of Microsoft invoked the Jevons paradox, which may play out over the next few years: because fewer GPUs are now required to create a new reasoning model, more such models will emerge; additional use cases will arise, necessitating more compute and, consequently, more GPUs. There will be a period of recalibration in GPU needs, but demand should increase from this point forward. That is the bull case many anticipate (a toy calculation follows below). The bear case is that GPU demand shrinks if Gen AI use cases do not grow exponentially as we expect; that looks unlikely, though, given the Gen AI-driven advancements of the past few years. Amid all this, an obscure term has entered public discourse: the Jevons paradox. Economist Shruti noted in her tweet that everyone on her X feed is talking about it. To be frank, I heard about it for the first time this week :)
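Here is a toy calculation of how the Jevons paradox could play out, with entirely made-up numbers chosen only to show the mechanism: per-model training efficiency improves 10x, but cheaper training invites many more models and use cases, so aggregate GPU demand still grows.

```python
# Toy Jevons paradox arithmetic with hypothetical numbers.
gpu_hours_per_model_before = 1_000_000   # assumed cost of one frontier run
models_trained_before = 10               # assumed number of such runs per year

efficiency_gain = 10    # DeepSeek-style methods make each run 10x cheaper
demand_multiplier = 50  # cheaper runs invite far more teams and use cases

demand_before = gpu_hours_per_model_before * models_trained_before
demand_after = (gpu_hours_per_model_before / efficiency_gain) * \
               (models_trained_before * demand_multiplier)

print(demand_before)  # 10,000,000 GPU-hours per year
print(demand_after)   # 50,000,000 GPU-hours: 5x more despite 10x efficiency
```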
We might be closer to AGI/ASI, and it is a possibility within our lifetime. The race is on, with the US and China as the two key players. This bipolar competition will drive extreme innovation, and as Marc Andreessen said, it could be AI’s Sputnik moment. During the space race, which the US won, we witnessed innovation at breakneck speed in just a few years. Until now, there was no narrative framing this as a race, but given recent developments, many in the US tech sector are comparing this moment to one (not a sprint, but a marathon). The space race still lives in the public imagination and lends itself easily as an analogy here.
The next five years (if not the next decade) will see breakthroughs and innovation on a scale we haven’t experienced even in the past five years; and I say this fully aware of what those five years produced, a relentless march of innovation. Since governments also view AI supremacy as essential to their global standing and sovereignty, we will see more government push and easing of regulations, clearing the path for innovators to build. It will be interesting to see how China and the broader world react.
What is clear is that for a country to remain relevant in the global order, it must be self-reliant in chips, compute, energy, and foundational models. These four are essential, and governments will increasingly push for investment in them.
Where is the value?
As models become commoditized, the real value shifts to the application layer: building new applications on top of the models to solve real-world problems for customers and enterprises. New ideas abound, such as leveraging Gen AI to fast-track drug development, or an enterprise stack of agents.
Enterprise products have adopted Gen AI over the past two years, solving several use cases intelligently, primarily around improving productivity, enabling efficiency, and powering conversational user experiences. Recently, building agents has gone mainstream, and nearly all companies are investing in developing or adopting agentic frameworks to launch a variety of agents. However, these enterprise and SaaS product companies have been charging a premium for Gen AI features, largely because of the costs incurred for token usage and compute. With DeepSeek making inference affordable, these companies must reconsider their pricing strategies (a back-of-envelope illustration follows below). In a few years, Gen AI features are poised to become mainstream, and these companies will not be able to charge a premium. It will be interesting to see how they recalibrate from here.
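To see why cheaper tokens pressure premium pricing, here is a back-of-envelope sketch. Every number in it is an illustrative assumption, not an actual vendor rate; the point is only the order-of-magnitude gap.

```python
# Back-of-envelope: per-user inference cost under two assumed token prices.
tokens_per_request = 2_000          # assumed prompt + completion size
requests_per_user_per_month = 500   # assumed usage of a Gen AI feature

def monthly_cost_per_user(price_per_million_tokens):
    tokens = tokens_per_request * requests_per_user_per_month
    return tokens / 1_000_000 * price_per_million_tokens

incumbent = monthly_cost_per_user(15.00)  # assumed premium-model rate, $/1M tokens
cheap = monthly_cost_per_user(0.50)       # assumed DeepSeek-class rate

print(f"incumbent: ${incumbent:.2f}/user/month")  # $15.00
print(f"cheap:     ${cheap:.2f}/user/month")      # $0.50
# With per-user inference cost dropping ~30x, a premium Gen AI add-on fee
# becomes hard to justify; the feature trends toward table stakes.
```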
What is in it for India?
India has been concentrating on building applications that leverage Gen AI and on SOTA models specialising in Indic languages. So far, the narrative has been to avoid building foundational LLMs in India because they are very costly and the money could be better used elsewhere. Given the recent developments, we should revisit that approach and go all in on building foundational general-purpose and reasoning models.
India has demonstrated, in the space industry and beyond, that it can innovate and achieve major feats at a fraction of the cost. We should have pursued the goal of building foundational models from India; not doing so was a grave error, but one we can rectify quickly. Right now, India is rebuking Sam Altman over his earlier comment that India would not be able to compete on such models, as the old video has resurfaced. But we should channel our energy elsewhere and get back to building.
We should invest in building chips in India. While there have been many developments and government programs in the past few years, establishing this industry at scale in India will take many companies and several years of consistent effort. We need to fast-track what would take several decades of effort into less than a decade.
We should also invest significantly in the computing power, infrastructure, and energy needed to support these.
While these are the top four investments we need to make to become a player in this AI race, many other possibilities exist. We will discuss these further in subsequent posts.