By James Kong
February 9, 2025
Summary
DeepSeek’s $5 million AI model is 98% cheaper to use than OpenAI’s and matches or outperforms the U.S. giants’ models
Reduced hardware costs by 45x, decoupling AI development from dependence on U.S. hardware and infrastructure
Disrupted AI development economics and global financial and investment landscape
Major geopolitical impact and fallouts with immediate shift in power structure and perception
A seismic shift has occurred in recent weeks across the AI industry and broader financial markets.
If your work involves AI or finance—and you haven’t been living under a rock—you’ve likely heard of DeepSeek and its AI model, which has taken a significant leap toward advancing the theoretical concept of AGI (Artificial General Intelligence). AGI refers to a form of intelligence capable of matching or surpassing human cognitive abilities through self-learning.
You’ve likely heard about the financial impact DeepSeek has had on the market, including a ~$600 billion loss for Nvidia’s stock in a single day on January 27th, the largest single-day loss of market value for any company in the history of public securities trading. As of this writing (February 3rd), that loss has widened to $840 billion, an amount between the GDP of Turkey and Switzerland, the 19th and 20th largest economies. Of course, this extends beyond just one company working on AI. In fact, nearly all major U.S. tech companies, worth almost $20 trillion, are involved in AI and are directly affected by this. I’ll delve further into the immediate and near-term financial consequences later.
Crash
Since the crash of Nvidia’s stock and the publication of DeepSeek’s technical papers, there has been an abundance of often superficial financial reporting on the immediate consequences, alongside academic critiques of DeepSeek’s AI approaches that are too nuanced and technical for most laypeople or journalists to follow, let alone to draw meaningful conclusions from or to convey their significant impacts on science and society. What is notably absent is a well-written, easy-to-understand, non-technical account of DeepSeek’s achievements and their forthcoming financial and societal implications.
Aim
The aim of this writing is to make these complex AI concepts more accessible to everyone—whether laypeople, journalists, investors, finance professionals, or anyone interested in the technology we now use every day, called AI. By using simple language, storytelling, and relevant analogies drawn from other scientific fields, I’ll explain the purpose of these new mathematical approaches in AI and how they relate to neuroscience, particularly a function known as high-level ‘reasoning,’ which was once thought to be unique to humans.
Quick Recap
Until very recently, DeepSeek was a little-known AI startup from China that released an AI model called DeepSeek R1 on January 20th. By most benchmarks, it outperformed all models from U.S. giants like OpenAI, Anthropic, Google, and Meta, regardless of whether they were free, paid, closed, or open-source.
Not only that, they shocked the financial world by revealing in their technical papers that their training cost was just over $5 million—compared to the reported $100 million+ for a comparable model from OpenAI released less than four months ago.
In the media, many people immediately questioned the validity of this low figure. However, it’s worth referencing the two published papers detailing the methods DeepSeek used to achieve these results. Another way to validate the cost difference could be the 95-98% pricing gap between OpenAI’s paid, closed AI model and DeepSeek’s.
Finally, as mentioned earlier, unlike OpenAI with its closed models, DeepSeek fully open-sourced their model, providing two highly detailed technical papers on their advancements and inviting the global community to build upon them.
Internal dialogue
(Before diving into a detailed 10,000+ word explanation of all the complex AI mathematics, I should consider the preferences of the reader and prioritize the most impactful innovations from the technical papers. Instead of simply listing the mathematics in the order DeepSeek presented them—an order aimed at a very different audience with a different baseline understanding of AI and math—I should focus on what matters most. Additionally, I must avoid jargon and ensure that my explanations are clear, using plain and relatable language.)
The first innovation I want to highlight in DeepSeek’s mathematics is the concept of an ‘internal dialogue’ by the AI before it provides an answer or response, as illustrated in the previous paragraph. It represents the pre-response thought process, which often serves as a hidden layer of logical reasoning that occurs before the AI actually formulates an answer to a question. The ‘internal dialogue’ doesn’t provide direct information to the question itself, but it sets up a structured framework for a considered response. A good AI may give you the right answer, but a great AI takes a moment to consider the context of who you are and why you’re asking before providing a response. In the AI world, this is known as CoT, or Chain of Thought. However, I believe the term “Chain of Thought” poorly describes what DeepSeek has achieved. While traditional CoT may offer linear, step-by-step reasoning, an ‘internal dialogue’ requires a parallel processing system—allowing for a genuine back-and-forth on context before a response is given. There’s a significant difference between someone explaining a step-by-step process for a complex multiplication problem versus someone who can simultaneously perform two parallel multiplications to verify the answer. The latter parallel process not only self-verifies but also determines which method may be more efficient in the future—essentially, it learns. As a result, the true intelligence here isn’t about providing the right answer alone, which doesn’t contribute to the AI’s learning, but rather the internal ‘parallel’ dialogue before problem-solving and the method chosen and learned in the process.
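For readers who like to see an idea in code, here is a minimal toy sketch in Python (purely illustrative; this is not how DeepSeek implements its reasoning) contrasting a single chain of steps with a parallel self-check that runs two independent methods, compares their answers, and remembers which method was cheaper:

# Toy illustration of "parallel" self-verification (not DeepSeek's implementation).
def multiply_by_addition(a: int, b: int):
    """Method 1: repeated addition. Returns (result, number of steps)."""
    total = 0
    for _ in range(b):
        total += a
    return total, b

def multiply_by_decomposition(a: int, b: int):
    """Method 2: split b into tens and ones (the distributive law)."""
    tens, ones = divmod(b, 10)
    return a * tens * 10 + a * ones, 2   # two partial products

def solve_with_self_check(a: int, b: int) -> int:
    # Run both methods "in parallel" and cross-check the answers.
    r1, cost1 = multiply_by_addition(a, b)
    r2, cost2 = multiply_by_decomposition(a, b)
    assert r1 == r2, "the two methods disagree; re-examine the reasoning"
    # Remember which method was cheaper, i.e. "learn" a preference for next time.
    preferred = "decomposition" if cost2 < cost1 else "addition"
    print(f"{a} x {b} = {r1} (verified two ways; prefer {preferred} next time)")
    return r1

solve_with_self_check(47, 86)

The answer itself is the least interesting output here; the cross-check and the learned preference are the point.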
Brain Specialization
Naturally, having prioritized the concept of parallel thinking, the second most important innovation I want to discuss from DeepSeek’s mathematics is the idea of several parallel brains. They call this the ‘Mixture of Experts’ (MoE) architecture. I refer to it as specialized brain regions with functional specialization—much like the human brain, where different regions are responsible for tasks such as seeing, hearing, or abstract reasoning. Ironically, the common non-specialized AI neural network architecture is called the ‘Dense Neural Network’ (DNN) architecture. While MoE has a network of parallel brains, each with its own area of expertise, DNN is designed as a single network that does everything, with everything connected to everything else.
One core advantage of MoE over DNN is the larger raw size of its brain, or breadth of specialized knowledge, while still maintaining agility (which is very important; I’ll explain why later) by activating only the regions needed at a given time. The DeepSeek MoE brain contains 671 billion parameters. For simplicity, let’s think of parameters as neurons (yes, I know this is an oversimplification; sorry, nerds!). With MoE, only 37 billion of those neurons are activated at once, while for comparison, another ‘big’ brain from Meta, called Llama, activates all of its 405 billion parameters for every task, naturally consuming more computing resources, time, and energy. Suffice it to say, MoE is much faster and more scalable than Dense, but it is also more complex to manage due to the specialized regions of the brain. Without getting too technical, DeepSeek overcame these challenges with several innovations to better predict, compress data, and allocate memory and resources to speed up processing.
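To make the MoE idea concrete, here is a minimal routing sketch in Python with toy numbers (eight tiny experts instead of hundreds of billions of parameters; it illustrates the routing principle only, not DeepSeek’s actual design): a small gating function scores every expert, and only the top few are activated for a given input while the rest stay idle.

# Minimal sketch of Mixture-of-Experts routing (toy sizes, not DeepSeek's code).
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, dim = 8, 2, 16              # real MoE models use far more experts

experts = [rng.standard_normal((dim, dim)) for _ in range(num_experts)]
gate_weights = rng.standard_normal((dim, num_experts))

def moe_layer(x: np.ndarray) -> np.ndarray:
    scores = x @ gate_weights                   # how relevant is each expert to this input?
    chosen = np.argsort(scores)[-top_k:]        # activate only the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()                    # blend the chosen experts' outputs
    # A dense layer would run all 8 experts; here only 2 of the 8 do any work.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

x = rng.standard_normal(dim)
print(moe_layer(x).shape)                       # (16,): same output size, a fraction of the compute

The same trick, scaled up, is how a 671-billion-parameter brain can respond while only 37 billion of its ‘neurons’ fire at once.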
Smaller signal means faster travel
The third innovation I want to prioritize relates to significantly speeding up thinking by strategically compressing data, particularly when precision is not essential. While fully acknowledging the risk of oversimplifying some very clever mathematics, consider this analogy: every time an AI ‘thinks,’ imagine moving a box containing 256 photographs of data around a large house (representing the brain) versus moving a box containing 4.3 billion photos.
Currently, most large AI models rely on high-precision data types such as FP32, which can represent about 4.3 billion distinct values, a mind-boggling number roughly 800,000 times the count of stars visible in the night sky. DeepSeek, meanwhile, strategically uses FP8, a format that holds just 256 distinct values, wherever such high precision isn’t necessary, while keeping the highly precise formats where they are needed. It’s like not measuring room temperature to the 100th decimal place when our skin can only detect a 0.1-degree difference, or being content with Pi as 3.14 for everyday carpentry while reserving the millionth digit for the rare job, say, building a space station the size of Mars’ orbit, where perfect circularity matters.
Of course, their system is more nuanced: some boxes or data types hold about 65,000 photos, while others still need the full 4.3 billion. These are now managed dynamically, so the largest boxes are moved far less often. The overall result is significantly less energy usage, lower storage needs, and much higher speed of ‘thinking.’ This is not insignificant, considering that a brain with 671 billion parameters has to ‘think’ its way through an enormous amount of experience before it is ‘trained’ into a functional brain.
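Here is a rough sketch in Python of what shrinking the boxes buys you. It mimics the idea with simple 8-bit rounding rather than the real FP8 hardware formats and scaling tricks DeepSeek describes, so treat it purely as an illustration of the memory-versus-precision trade-off:

# Illustrative only: real FP8 training uses hardware number formats and careful
# scaling, not this naive rounding.
import numpy as np

weights = np.random.default_rng(1).standard_normal(1_000_000).astype(np.float32)

# Map each 32-bit value onto one of only 256 levels (the "small box").
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)      # 256 possible values
restored = quantized.astype(np.float32) * scale

print("bytes at full precision:", weights.nbytes)          # ~4,000,000
print("bytes at 8 bits        :", quantized.nbytes)        # ~1,000,000 (4x less to move)
print("typical rounding error :", float(np.abs(weights - restored).mean()))

Four times less data to move for every ‘thought,’ at the cost of a rounding error far smaller than many parts of the network ever need.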
DNA
This is not the first time experts have assumed a complex system required much larger variables, only to discover that nature had designed a more elegant method using smaller ones. Until DNA was identified as the carrier of genetic information roughly 80 years ago, most biochemists believed proteins, built from 20 amino acids, were the most likely candidate for genetic material. After all, given the observed complexity of life, the exponentially greater coding possibilities of 20 amino acids, compared to just 4 nucleotide bases (the chemical letters of DNA), seemed to make sense: 3 bases can encode just 64 possible combinations of information, while 3 amino acids can encode 8,000. The truth is, nature arrived at a much more elegant solution, using a far smaller alphabet to create something far more complex, such as life and, later, our central nervous system.
In conclusion, it is not the size of the neuron or the size of the data type that determines intelligence or reasoning ability, just as a bigger brain doesn’t make someone smarter. It is the structure, the internal connections, the experiences (training) that shape those connections, and the context in which they are applied, that make someone truly intelligent.
The final innovation I want to bring to your attention wasn’t directly programmed into DeepSeek R1 by their mathematicians or engineers, but rather ‘naturally occurred’ during training or was ‘invented’ by DeepSeek’s AI model itself.
The paper flagged a particular moment during training when DeepSeek’s AI abruptly stopped mid-thought while solving a math problem, revisited its earlier approach, allocated more time, and tried a different mathematical path. In its chain-of-thought output it wrote, “Wait, wait. Wait. That’s an aha moment I can flag here,” announcing the pause before continuing, “Let’s reevaluate this step-by-step to identify if the correct sum can be…”
I reread and revisited this math problem several times, and what’s interesting to me is not the solution to the math problem itself (which is the least interesting part), nor that it can think in parallel (interesting, yes, but we know that already). The most intriguing part is that it knows when to STOP. Having solved millions of math problems in schools and countless non-math problems in the real world as a technical professional and entrepreneur, I don’t believe there’s anything trivial about knowing when to stop or pause while solving a difficult problem. It’s also not trivial to know what resources to allocate when faced with the realization that you might be on the wrong path. For an AI to know how to do this without being explicitly programmed to do so demonstrates what I can describe as learned instinct. As for myself, I am still in awe of the mathematics and am desperately learning more about the underlying conditions and the right ‘incentives’ given to DeepSeek’s AI to achieve this level of optimization, which appears to operate like instinct.
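For the technically curious, the R1 paper describes rewarding only simple, checkable outcomes during reinforcement learning: whether the final answer is verifiably correct, and whether the model keeps its reasoning inside the designated thinking tags. The snippet below is my own minimal rendering of that kind of incentive; the function, tag names, and weights are illustrative rather than DeepSeek’s actual code. The point is that pausing, backtracking, and re-checking cost the model nothing, while arriving at a verified answer pays.

import re

# Minimal sketch of a rule-based "incentive": reward the verifiable final answer
# and the thinking format, not each individual reasoning step.
def reward(response: str, correct_answer: str) -> float:
    score = 0.0
    # Format reward: reasoning is expected inside <think>...</think> tags.
    if re.search(r"<think>.*</think>", response, flags=re.DOTALL):
        score += 0.5
    # Accuracy reward: only the final answer is checked, however it was reached.
    match = re.search(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    if match and match.group(1).strip() == correct_answer:
        score += 1.0
    return score

sample = "<think>Wait, wait. Wait. Let me re-check that sum...</think><answer>42</answer>"
print(reward(sample, "42"))   # 1.5: pausing to re-check costs nothing; being right pays

Given enough problems and enough attempts, behaviors like the pause above can emerge simply because they lead to more correct, and therefore more rewarded, answers.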
(There are, in fact, many more innovations, but this piece would become as long as War and Peace if I allowed myself to indulge. The ‘Aha Moment’ above, an incredible moment that occurred during the training of DeepSeek’s R1, felt like the right place to end the technical tour.)
Future Developments: Super AGI
There is no longer any question about the arrival of AGI. Measured against the average person’s intelligence and general knowledge, DeepSeek has already crossed the threshold of general intelligence. The question now is when AI will achieve superintelligence (and how we would even measure it). It’s clear from DeepSeek’s advancements that superb reasoning capabilities have been achieved through internal dialogue, brain specialization, and parallel brain architecture and processes. What comes next, therefore, is not only better reasoning to deduce more facts from existing facts, but also using these parallel structures for imagination: creating something entirely new by forming and testing hypotheses beyond our current scientific understanding. This process has a familiar name: the scientific method. What is less familiar is the important role imagination plays in that process, particularly in the game-changing discoveries throughout history.
Before Einstein published his paper on the Special Theory of Relativity, he used his imagination to “ride along” a beam of light in order to ‘observe’ his surroundings before discovering that time is not universal, but relative to an observer.
Thought experiment: one can now easily imagine an AI with an efficient parallel architecture like DeepSeek’s creating or imagining a world without the constraints of classical physics, where it “rides” another theoretical light beam. It could then hold an internal dialogue among specialized experts under the MoE architecture, arrive at another “aha moment,” and discover something entirely new, such as the unified field theory that eluded Einstein during his lifetime and every physicist, living or dead, since.
Clinical Psychology and Behavioral Psychology
Another very foreseeable development is using ideas from clinical psychology, and human quirks from behavioral psychology, to improve AI’s learning and intelligence. That topic is better suited to future articles, but beyond the hundreds of ideas from psychology, psychiatry, neuroscience, and behavioral economics, I can also see potential in effective models of parallel group communication and project management: ‘Six Thinking Hats,’ which advocates group-level ‘parallel thinking,’ and the Agile framework, which prioritizes a system of interactions and communication over a single-minded march toward a narrow end goal. Yes, this discussion should indeed be left for another time, but I can definitely see adding differences in reasoning preferences and weights of knowledge (perhaps akin to personalities) to an AI model, allowing for more dynamism and creativity during reinforcement learning and problem-solving.
Consciousness is the wrong question
By now, I hope you have begun to grasp the significance of this and feel the same monumental shift in humanity. Hidden in the mathematics and architecture are the future possibilities of consciousness, the subconscious, personality, imagination, emotion, and creativity. These possibilities are truly beautiful and novel. As a result, the question of whether AI is conscious should hardly need asking anymore. But I would argue it is also the wrong question.
Let’s conduct a thought experiment on the definition of consciousness, examining human consciousness both internally and externally. Internally, we know we are conscious because we can hold an internal dialogue that affirms our own existence. We can internally validate our senses (seeing, hearing, even thinking) and our reasoning through the brain’s innate parallel structure. Having a sophisticated parallel thought process makes us aware of the different parts of our own mind. We could probably upgrade Descartes’ famous formulation of what it means to be conscious from “I think, therefore I am” to “I parallel think, therefore I know that I am.” Externally, we determine others’ consciousness only through our senses and their feedback mechanisms, a.k.a. communication. In other words, we infer consciousness through the ability to communicate. There is nothing else. Channeling Descartes once again: “I sense, therefore you are.”
Now, imagine how DeepSeek or a future AGI might answer this question of consciousness; make sure to examine its chain of thought. So, the more important question is: what relevance is there in continuing to ask this question, especially when the answer is so clear? The only way the answer could be “no” is if people explicitly redefine consciousness as high-level parallel reasoning wrapped in a scaffold of collagen and minerals, i.e., skin and bones.
In neuroscience, we know that human consciousness resides somewhere within the architecture, from the structure of a basic neuron to the massive, complex structure called the brain. Similarly, in AI, consciousness would exist somewhere between the mathematics of the artificial neuron and the equally massive but highly specialized architecture known as the MoE brain. While human consciousness is often associated with the prefrontal cortex, I believe DeepSeek’s current form of consciousness rests mostly in the two green boxes of Figure 2 on page 7 of the DeepSeek-V3 Technical Report. I haven’t found the boxes for its personality yet. ;-)
The 1938 Nuclear Fission Paper
Before I touch on the immediate financial, investment, and potential geopolitical impacts, I want to make a final comparison between DeepSeek’s AGI advancements and another event that redefined human history: the papers describing nuclear fission, based on experiments from late 1938. Without delving into the fascinating stories surrounding that discovery, I want to point out that it took less than seven years from those papers to the first successful atomic bomb test in 1945. Since humanity entered the Industrial Revolution nearly 300 years ago, the time between a groundbreaking theoretical breakthrough and its world-changing application has kept collapsing, accelerated by ever faster and better communication technologies and by the threat of mass violence from competing belief systems. AGI, or Super AGI, has a high probability of following an even shorter timeline.
Immediate Financial and Sales Impacts
(To be honest, I feel this piece is already getting too long for a lay audience and may not be appropriate for reading in one sitting, so I’ll wrap up with some clear and succinct financial impacts that have already taken place and leave the deeper financial and geopolitical analysis for future Part 2. I’m also tired and need to rest my neural network.)
The primary financial impact of the launch of DeepSeek AI has already been felt in the marketplace, and it continues to accelerate as I write. Since its launch on January 20th, DeepSeek has either exceeded or matched the most advanced commercial AI models in terms of performance while charging 96-98% less for API calls. A quick explanation: most AI companies you hear about are second-layer “AI” companies that don’t create their own models (you can easily recognize them by having “AI” in their name or using “.ai” in their web domain). Instead, they use DeepSeek, OpenAI, or a few others to sell their repackaged AI products for gaming, entertainment, finance, legal services, and more. Naturally, a better product at a 98% price cut will cause almost everyone to switch, especially when there is such a high margin to be made from reselling. Right now, the question is no longer why to switch, but which “idiot” won’t or can’t in the near future.
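To illustrate just how small the switching cost is for these second-layer companies: DeepSeek exposes an API designed to be compatible with OpenAI’s, so in many products the change amounts to swapping an endpoint and a model name. The snippet below is a hedged sketch that assumes the openai Python SDK and DeepSeek’s documented endpoint; verify the current URL, model names, and pricing before relying on it.

# Sketch of a provider switch for an OpenAI-compatible client (verify current
# endpoints, model names, and pricing with the providers' documentation).
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",           # placeholder key
    base_url="https://api.deepseek.com",       # was: the default OpenAI endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",                 # was: an OpenAI model name
    messages=[{"role": "user", "content": "Explain chain-of-thought reasoning in one paragraph."}],
)
print(response.choices[0].message.content)

A few lines like these, plus testing, are roughly the entire ‘purchasing cycle’ for software; the hardware cycle discussed below is a different story.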
Latest news: as of today, Amazon, the fourth most valuable U.S. company and the leader in cloud computing, appears to have publicly endorsed DeepSeek and made it available through its own cloud AI services. Until other AI companies like OpenAI and Anthropic match DeepSeek’s pricing, it is hard to see how the collapse of their business models won’t accelerate.
Nvidia divided by 45
Nvidia’s hardware business is a different case: its purchasing cycle is far longer than changing a few lines of code to switch AI models and redirecting payment to another provider. Yet the impact is no less significant, and no less immediate. A reported 45-fold improvement in the hardware cost-to-AI-output ratio would mean dividing any future sales projection for AI hardware by 45. If $1 trillion per year was projected for future AI-related hardware, the new figure should now be roughly $22.2 billion for the same AI output, assuming the same exponential growth in demand. With Nvidia owning almost 70% of the market for AI-friendly GPUs and earning a 90% margin on them, it will be the most affected, even if no new competitors emerge.
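The arithmetic behind ‘divide by 45’ fits in a few lines, under the simplifying assumption that demand for AI output stays on its projected trajectory while the hardware needed per unit of output falls 45-fold:

# Back-of-envelope arithmetic for the "divide by 45" scenario (an assumption-laden
# sketch, not a forecast).
projected_spend = 1_000_000_000_000        # $1 trillion per year projected for AI hardware
efficiency_gain = 45                       # reported hardware cost-to-output improvement

new_spend = projected_spend / efficiency_gain
print(f"${new_spend / 1e9:.1f} billion per year")   # ~$22.2 billion

Real purchasing behavior is messier than this, which is exactly where the next thought experiment goes.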
(Wait, wait, wait… Let’s stop here and shift perspectives to examine what actual humans would do. Certainly, there are billions of current and planned orders for AI hardware from worldwide clients. What would all those purchasing managers and their bosses do with their current and future orders given this new information from DeepSeek?)
Alt Solution: Nvidia divided by purchasing managers
One can imagine that a manager responsible for purchasing costly AI hardware, having just heard about DeepSeek’s efficiency gain, might slow or pause new orders while she validates DeepSeek’s claim. Given the open-source nature of the models, those numbers should be independently verifiable within a week, far quicker than the 2-3 month ordering window for the next batch of Nvidia H100 GPUs, and well before the next fiscal quarter’s budget reporting. Simple economics, and simple self-preservation, would compel any purchasing manager or CFO not to risk their job by ignoring a 98% savings.
It’s probably too early to accurately predict the effect on Nvidia’s sales next quarter, but given the new AI pricing structure and the speed at which companies are shifting from other AI models to DeepSeek, we can more confidently say that the financial analysts’ consensus that Nvidia will quadruple its sales or maintain a 90% margin over the next five years now looks like a low-probability outcome. In a few months, there will be more data points for a more thorough analysis of Nvidia’s sales and its equity market valuation.
U.S. AI Supremacy, until now
Of the top 7 most valuable public companies in the world, 6 of them are technology companies, and all 6 are from the U.S., actively working on and betting their future on AI. The total market value of the top 20 tech companies in the world is $20 trillion, more than China’s GDP. Sixteen are U.S. companies with a combined market value of $18 trillion, while 4 are from the rest of the world with a combined market value of $2 trillion. All 20 are involved in AI.
At face value, this data represents almost complete U.S. supremacy in recent AI development and its commercialization. From the influential AI paper in 2017 on language processing to the release of ChatGPT in late 2022 and the hundreds of billions invested into AI since, the U.S. appeared to have all the hardware infrastructure, talent, and capital to maintain its AI dominance and control its trajectory. Until now. The open-source release of DeepSeek and the consequences of its technical and economic advancements have significantly leveled the AI playing field globally. Given the game-changing nature of DeepSeek’s advancements in reducing hardware infrastructure needs and capital investment, I believe AI development will grow the most in places with the greatest talent pool: those with the most mathematicians, engineers, and scientists. Both capital and infrastructure will have to follow. Asian countries appear primed to capitalize on this shift, with Europe also able to take significant advantage. I doubt the large international gap in AI valuation will persist during the march toward Super AGI.
(There is actually much more to discuss on this topic, particularly regarding changes in market valuations and geopolitical effects, but I should really wrap this up and leave those details for part 2.)
Perhaps a fitting final analogy: with DeepSeek, the secrets of nuclear fission are now out, except this time anyone with $5,000 can buy the basic equipment to start a reaction. I suppose that, as a species, we will soon need to make a decision once again: do we use science to build more bombs to kill people, or more power plants and other tools to help them?
Either way, what an exciting moment to be alive.
*****
Upcoming in part 2…
Let’s have fun with some much simpler math questions than AI math:
What is OpenAI worth? (Softbank says $300 billion)
What is DeepSeek worth?
- If it cost Company A $100 million to make a product everyone wants, and Company A is worth $300 billion, what is Company B worth if it can make a better product for $5 million?
What is Nvidia worth?
- When a new machine at half the price can do 45 times the work, a 90x gain in efficiency?
What are China’s ‘Big Techs’ worth in 6 months?
- China’s $1.2 trillion versus the $18 trillion+ total market value of U.S. Big Tech working on AI
If you enjoyed this article and would like to consult with me and the Alp team on AI, machine learning, finance, investing, or any topics within our expertise, please don’t hesitate to reach out at info@alp-technologies.com.
James Kong is an AI and Machine Learning entrepreneur who has worked on Wall Street, specializing in advanced mathematics for the past 25 years. He graduated from Columbia University with a focus on data optimization modeling, probability theory and statistics, linear algebra, and simulation, before taking on senior quantitative roles across North America and Europe. James Kong also holds degrees in biochemistry and economics, with a special interest in cognitive neuroscience and behavioral economics from an early age. He pursued a career as a medical doctor before his work in AI, finance, and engineering contributed to the fields of machine learning, edge computing, and microelectronics in renewable energy.