MIT Calculates Cost of Training Opus 4.7 Model In 1946

The recent success of Anthropic’s Claude Opus wasn’t simply the result of increased investment in algorithms, data, or computing power; it is directly linked to seventy years of steady gains in hardware efficiency. Scientists have pursued computers capable of thought since the mid-20th century, yet goals such as passing the Turing test or reasoning in natural language only became achievable in recent years. Researchers argue that progress was historically limited not by a lack of ideas but by insufficient computing hardware. They contend that the steady march of hardware efficiency, similar to Moore’s law, is what finally allowed rapid investment in hardware quantity to deliver the advances seen over the last decade.

Early AI Ideas Preceded Implementable Hardware

Many of the core concepts driving artificial intelligence breakthroughs were formulated decades before the necessary computing power existed to realize them. While contemporary discussion often centers on algorithmic innovation and the availability of vast datasets, a historical perspective reveals a prolonged period in which progress was fundamentally constrained by hardware limitations. This delay wasn’t simply a matter of lacking the right ideas; many foundational concepts originated in the 1950s and 1960s. Claude Shannon, in a seminal 1948 paper, demonstrated the possibility of statistically predicting the next “token” of English text, establishing the basis for modern language models. About a decade later, Frank Rosenblatt invented the perceptron, the fundamental unit of modern deep learning. Rosenblatt himself noted that researchers were grappling with these concepts long before implementation became feasible. Even backpropagation, essential for training deep networks, was discovered and discussed multiple times before the mid-1980s.
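For readers unfamiliar with it, the sketch below shows the perceptron in its simplest form: a weighted sum of inputs passed through a threshold, with the weights nudged after every mistake. The AND-gate dataset, learning rate, and epoch count are illustrative assumptions chosen for this example, not details from Rosenblatt’s work.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Train a single perceptron: weighted sum + threshold, weights updated on errors."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0
            error = target - pred        # -1, 0, or +1
            w += lr * error * xi         # nudge weights toward the correct answer
            b += lr * error
    return w, b

# Illustrative data: learn the logical AND of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # expected: [0, 0, 0, 1]
```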

The sheer volume of early algorithmic work supports the notion that ideas consistently outpaced available technology. The connection between algorithmic insight and practical application was delayed for decades: it was not until the early 1990s that IBM created Candide, a statistical machine translation system trained on a digitized text corpus. Researchers understood the potential of neural networks and text prediction but lacked the computational resources to fully explore them. Bernard Widrow and Michael Lehr, writing in 1990 on the previous thirty years of progress in neural networks, emphasized that “speed limitations [a result of compute limitations] keep such networks relatively small.” This wasn’t merely a matter of insufficient investment; it was a matter of hardware efficiency. Consider the scale of the challenge: training a model like Claude Opus requires an enormous number of floating-point operations (FLOPs).

Extrapolating back to the mid-1940s and using the Electronic Numerical Integrator and Computer (ENIAC) as a benchmark, achieving the same result would have cost roughly 70 million times current world GDP. Even in 1985, with the TITAN supercomputer, the cost would have been $12 billion. These figures illustrate the dramatic impact of hardware advancement: the same training run now costs around $105 million, a roughly 114-fold reduction from the 1985 figure and an almost incomprehensibly larger one relative to Shannon’s era.
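The counterfactual arithmetic behind such estimates is simple: divide the total FLOPs a training run needs by the effective FLOPs obtainable per dollar in a given era. The sketch below illustrates only the shape of that calculation; the FLOP requirement and the FLOPs-per-dollar figures are placeholder assumptions, not the values used in the analysis described here.

```python
# Back-of-envelope counterfactual: cost ≈ total FLOPs / (FLOPs per dollar).
# All numbers are placeholder assumptions for illustration, not the study's figures.

TRAINING_FLOP = 1e26  # assumed total FLOPs for a frontier-scale training run

FLOP_PER_DOLLAR = {   # assumed effective FLOPs purchasable per dollar in each era
    "1946 (ENIAC-class)": 1e2,
    "1985 (supercomputer)": 1e10,
    "today (GPU cluster)": 1e18,
}

for era, flop_per_dollar in FLOP_PER_DOLLAR.items():
    cost = TRAINING_FLOP / flop_per_dollar
    print(f"{era}: ~${cost:,.0f}")
```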

Shannon’s Entropy and Foundations of Text Prediction

The current state of text prediction is marked by models like Claude Opus demonstrating an ability to generate coherent and contextually relevant text, a feat once relegated to the realm of science fiction. While improved algorithms and larger datasets are undeniably crucial, the capacity to actually implement those advances hinged on the increasing affordability and power of computing hardware. A historical perspective reveals that many foundational ideas in artificial intelligence were conceived in the 1950s and 1960s, yet remained largely unrealized due to resource limitations. Algorithms do not appear to have been the primary constraint. Claude Shannon, widely considered the founder of information theory, established the groundwork for modern text prediction in a seminal paper, demonstrating “that it was possible for statistical measures to predict the next ‘token’ of English language text given the preceding words” and defining entropy as a metric for evaluating predictive performance.
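To make Shannon’s idea concrete, here is a minimal sketch that fits a bigram model to a toy word sequence, predicts the most likely next token, and estimates the entropy (in bits) of the next token given the previous one. The tiny corpus and word-level tokens are illustrative assumptions, not Shannon’s data or method.

```python
import math
from collections import Counter, defaultdict

# Toy corpus standing in for "English text".
text = "the cat sat on the mat and the cat ate the rat".split()

# Count bigram transitions: how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Most likely next token given the preceding word."""
    return follows[word].most_common(1)[0][0]

def conditional_entropy_bits():
    """Average entropy (bits) of the next token given the previous one."""
    total = sum(sum(c.values()) for c in follows.values())
    h = 0.0
    for counts in follows.values():
        n = sum(counts.values())
        for count in counts.values():
            p_pair = count / total  # joint probability of this bigram
            p_next = count / n      # conditional probability of the next token
            h -= p_pair * math.log2(p_next)
    return h

print(predict_next("the"))                   # -> "cat"
print(round(conditional_entropy_bits(), 3))  # lower = more predictable text
```

Lower entropy means the next token is easier to predict; modern language models are, in effect, trained to drive this same quantity down over vastly larger corpora.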

The story isn’t simply about having the ideas, but about possessing the computational power to test and refine them. The reduction in cost from Shannon’s era to the present, while substantial, has been uneven rather than linear, suggesting that further gains in efficiency are paramount to continued progress.

“Today, most artificial neural network research and application is accomplished by simulating networks on serial computers. Speed limitations [a result of compute limitations] keep such networks relatively small […]”

Bernard Widrow and Michael Lehr

Just as algorithmic ideas might have matured sooner had more hardware been available for experiments, the massive digital text datasets we now have are also downstream of progress in hardware.

Compute Efficiency, Not Just Investment, Limited AI

Anthropic’s success with Claude Opus isn’t solely attributable to recent surges in AI investment; rather, it represents the culmination of seventy years of incremental gains in hardware efficiency, a point underscored by a detailed analysis of computational costs across decades. While contemporary AI development benefits from improved algorithms, larger datasets, and increased financial backing, these factors were historically constrained by the sheer capacity of available computing power. The analysis reveals a striking disparity in cost reduction over time. Training a model equivalent to Claude Opus using the technology available in 1946 would have cost an astronomical 70 million times current world GDP, while the cost in 1985, using the TITAN supercomputer, would have been $12 billion. For most of this history, in other words, no plausible level of investment could have covered the cost; only recently has increased investment in hardware quantity been able to yield substantial results.

The current cost to train Claude Opus stands at approximately $105 million, a figure dramatically lower than its historical equivalents but still dependent on the foundation of prior efficiency improvements. Looking forward, sustaining this rapid pace of investment presents a significant challenge. The authors suggest that continued AI advancement hinges on whether further gains can be achieved with “a few more orders of magnitude of investment,” but caution that this level of expenditure is unsustainable without corresponding economic growth. The implication is clear: future AI progress may once again become heavily reliant on the slower, steadier pace of hardware efficiency improvements if dramatic economic expansion doesn’t materialize.

From Shannon to Hinton, we see almost 2,000x more growth in hardware efficiency than investment. But from Hinton to today, we see more than 6,000,000x more growth in investment than hardware efficiency.

Hardware Constraints Shaped Neural Network Scale

The recent successes of large language models like Claude Opus aren’t simply the result of algorithmic breakthroughs or increased data availability; a critical, often overlooked factor has been the 70-year trajectory of steadily improving hardware efficiency. This isn’t to diminish the importance of innovative ideas, but rather to highlight that their implementation was frequently delayed by hardware limitations. The core principles still in use today were established decades ago. Even backpropagation, now essential for training deep networks and popularized in the mid-1980s, had early versions “discovered and discussed numerous times before then,” suggesting algorithms were often ahead of implementability. To illustrate the historical constraints, calculating the cost of training such a model on mid-20th-century hardware reveals a staggering disparity. Even by 1985, when the TITAN supercomputer offered vastly more FLOPs than its predecessors, training Claude would have cost $12 billion. Today, the same task is achievable for around $105 million, a roughly 114-fold reduction from the 1985 figure. This dramatic decrease underscores the pivotal role of hardware efficiency in unlocking modern AI capabilities, and suggests that continued advancement may depend on whether further gains are possible.
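The 114-fold figure follows directly from the two dollar amounts quoted above:

```python
# Ratio of the 1985 counterfactual cost to today's cost, using the figures in the text.
cost_1985 = 12e9     # ~$12 billion with 1985-era hardware
cost_today = 105e6   # ~$105 million today
print(round(cost_1985 / cost_today))  # ≈ 114
```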

FLOPs Demand: Recreating Claude Opus on ENIAC

The notion that artificial intelligence’s recent surge is solely attributable to algorithmic breakthroughs or increased data availability overlooks a fundamental constraint: computational power. While innovative algorithms and vast datasets are undeniably crucial, the ability to actually run those algorithms at scale has historically been the limiting factor, a reality starkly illustrated by considering the computational demands of modern AI models like Claude Opus against the hardware available decades ago. A counterfactual exercise reveals the sheer scale of this constraint. Examining the immense number of FLOPs required to train Claude Opus and comparing it to the capabilities of historical computers demonstrates a dramatic disparity. This isn’t merely a question of lacking investment in hardware quantity, but of hardware efficiency. By 1985, training Claude Opus on hardware like the TITAN supercomputer, which cost $97 million, would have required an investment of roughly $12 billion in present-day dollars.

While still substantial, this represents a significant improvement over the ENIAC estimate, demonstrating the impact of incremental hardware advancements. Today, training Claude Opus costs around $105 million, roughly a 114-fold reduction from the 1985 figure and an even more dramatic one relative to Shannon’s era. This historical perspective suggests that continued AI advancement isn’t guaranteed and will likely depend on sustained improvements in hardware efficiency.
