Templar and the Political Economy of Foundation Models
When the Means of Production Get Distributed, the Power Structure Follows
Within the space of a single month, Jack Clark and Chamath Palihapitiya independently flagged the same thing: a 72-billion-parameter model trained across 160 GPUs by anonymous participants coordinating through a blockchain. Neither is a Bittensor insider. Both recognized what it means when the ability to create foundation models stops being a privilege of five organizations.

Two Signals in One Month
On March 15, Jack Clark published Import AI #449. Clark co-founded Anthropic, served as OpenAI's policy director, and has written one of the most widely read newsletters in AI for nearly a decade. His analysis of Covenant-72B was measured, technically specific, and revealing in what it chose to emphasize. He did not focus on the benchmark scores. He focused on the coordination.
"Distributed training via the blockchain notches up a meaningful win... Challenges the political economy of AI... Could shift development from compute singletons toward federated collectives."
Clark noted the hardware gap honestly. Approximately 160 GPUs coordinated across 20 distributed peers, versus the tens of thousands that frontier labs deploy. The model is not frontier-competitive, and Clark did not pretend otherwise. What he identified was something more specific: that the coordination mechanism worked. Anonymous, self-interested participants contributed compute to a training run that produced a 72-billion-parameter model scoring 67.1 on MMLU, slightly above Llama 2 70B. The result was not remarkable for its performance. It was remarkable for the way it was produced.
Two days later, Chamath Palihapitiya raised the same training run on the All-In Podcast while questioning Jensen Huang about the future of open-source AI. Chamath called it "a pretty crazy technical accomplishment." He got the parameter count wrong, citing 4B instead of 72B, but he got the substance right: decentralized compute had produced a real model. Jensen's answer was diplomatic. Open-source and proprietary models are both necessary, he said. "Models are a technology, not a product."
Chamath Palihapitiya asks Jensen Huang about decentralized training on the All-In Podcast, March 2026
Neither Clark nor Chamath is a Bittensor insider. Clark is one of the most careful analysts in AI policy. Chamath is a venture capitalist who moves markets when he talks. When two people from opposite ends of the tech establishment independently identify the same project in the same month, the signal is not about the project. It is about the category of thing the project represents.
The Means of Production
Every major information technology follows the same structural arc. Creation starts concentrated in the hands of whoever can afford the infrastructure. Then the tools get distributed. Then the political economy restructures around the new distribution.
Publishing was the first. For centuries after Gutenberg, the ability to produce and distribute printed material required capital for presses, typesetting equipment, distribution networks, and the social standing to work within the licensing regimes that controlled who could print what. The concentration was not merely economic. It was political. England's Licensing Act of 1662 explicitly restricted printing to approved publishers, so that the means of production and the means of censorship remained fused. When printing technology became cheap enough for pamphleteers, the political structure of public discourse changed. Not because the pamphlets were better than the books. Because the ability to produce them could no longer be controlled.
Broadcasting followed the same pattern. Early radio required enormous investment in transmission infrastructure, and governments allocated spectrum licenses to a handful of operators. Television concentrated further, with three American networks controlling what tens of millions of people watched. Cable fractured the oligopoly. The internet dissolved it. Each wave of distribution created new producers, new audiences, and new power dynamics that the previous structure could not accommodate.
Cloud computing compressed the cycle again. Before AWS launched in 2006, running internet infrastructure at scale required buying servers, leasing data center space, and hiring operations teams. After AWS, a college student with a credit card could provision the same infrastructure that previously required millions in capital expenditure. The startups that emerged from that redistribution (Airbnb, Stripe, Slack) did not merely compete with incumbents. They operated in categories the incumbents had not imagined, because the incumbents had built their strategies around a world where infrastructure was expensive.
Foundation model creation is the current concentration point. Five, perhaps ten organizations on Earth can train a model from scratch at the frontier. The capital requirements are measured in billions. The compute requirements are measured in tens of thousands of GPUs running for months. The talent requirements are measured in teams of researchers whose salaries alone exceed the total budgets of most AI startups. This concentration is real, and it shapes everything downstream: which models exist, what they are trained on, what guardrails they carry, whose values they reflect, and who profits from their use.
Clark's phrase, "compute singletons," borrows from software architecture. A singleton is a design pattern where a single instance controls access to a shared resource. His question, whether development can shift toward "federated collectives," is a question about whether the structural arc that every previous information technology followed will repeat for this one.
Consuming vs. Creating
There is a distinction that the open-source AI discourse consistently elides. Downloading a model that someone else trained is not the same as training your own.
When Meta releases Llama, when Mistral publishes its weights, when DeepSeek opens its checkpoints, the community gains the ability to use, fine-tune, and deploy those models. This is genuinely valuable. It creates an ecosystem of practitioners who can build on top of existing work without paying licensing fees or accepting restrictive terms. But it is consumption. The act of creation, the training run itself, remains in someone else's hands. The recipients depend on the continued willingness of the provider to release weights. The research directions, the training data choices, the architectural decisions, the safety constraints, all of these remain in the hands of whoever controlled the training run.
Sam Dare, the founder of Covenant AI and the architect of the Templar training protocol, puts the question more directly.
"Mark Zuckerberg stole democracy. Let us never forget that. Google has committed a litany of privacy and human rights violations. Are these the people you want to give Prometheus's fire to?"
— Sam Dare, Hash Rate podcast
Meta's open-weight releases serve Meta's strategic interests. When those interests change, the releases can stop. DeepSeek operates under the jurisdiction of a government that controls its citizens' access to the internet itself. Depending on either entity for the foundational infrastructure of AI is a policy choice, and it is a policy choice that most of the people making it have not consciously examined.
Sam's argument goes further. He contends that the capability which matters is the ability to create base models. Fine-tuning and deployment are downstream. Training from raw data into coherent intelligence is the upstream act, and it is the one that remains concentrated.
The week this essay was published, Cursor shipped a concrete example. The company behind one of the most widely used AI coding tools took Moonshot's open-weight Kimi K2.5 and post-trained it with additional data and reinforcement learning specifically for programming, releasing the result as their latest Composer model. Cursor is a sophisticated operation. They did not need to train a foundation model. They needed a good base to build on, and an open-weight Chinese LLM gave them one. The derivative model may be excellent. But Cursor's ability to make it depends entirely on Moonshot's willingness to release the weights it was built from. If Moonshot changes its licensing, or its government changes Moonshot's priorities, Cursor's supply chain breaks.
If foundation model creation remains the exclusive province of a handful of organizations, then every downstream application, every fine-tune, every deployment, every safety decision, ultimately traces back to choices made by those organizations. The open-weight ecosystem becomes a garden planted in someone else's soil. The soil can be salted at any time.
The Coordination Problem
Clark's newsletter identified the right technical question. The barrier to decentralized training was never raw compute. Bitcoin proved over a decade ago that anonymous, self-interested participants will contribute hardware to a shared network if the incentive structure is correct. At its peak, the Bitcoin network has coordinated more computational power than any single organization on Earth commands. The hardware aggregation problem was solved years ago.
What remained unsolved was the coordination problem. Bitcoin mining is embarrassingly parallel. Each miner works independently on a separate nonce, and the work of one miner does not depend on the work of another. Gradient descent is the opposite. Training a neural network requires that distributed participants exchange gradient updates, synchronize their states, and maintain coherence across a fragile optimization process where a single corrupted update can destabilize the entire run. The communication overhead alone, for a 72-billion-parameter model, would normally require the kind of high-bandwidth interconnects that only exist inside a single data center.
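To make that overhead concrete, here is a rough back-of-envelope calculation. Every number in it is an illustrative assumption (half-precision gradients, a 1 Gbit/s uplink, a single full exchange), not a figure from the Covenant-72B training report.

```python
# Back-of-envelope: what naive gradient exchange costs for a 72B-parameter model.
# Every number here is an illustrative assumption, not a Covenant-72B figure.

params = 72e9            # 72 billion parameters
bytes_per_value = 2      # assume fp16/bf16 gradients
link_bps = 1e9           # assume a 1 Gbit/s internet uplink per peer

grad_bytes = params * bytes_per_value             # ~144 GB per full gradient
seconds_per_exchange = grad_bytes * 8 / link_bps  # time to push one copy upstream

print(f"Uncompressed gradient: {grad_bytes / 1e9:.0f} GB")
print(f"One full exchange at 1 Gbit/s: {seconds_per_exchange / 60:.0f} minutes")

# Anticipating the compression discussed below: cutting traffic by two orders
# of magnitude turns the same exchange into seconds rather than minutes.
print(f"Same exchange at 100x compression: {seconds_per_exchange / 100:.0f} seconds")
```

Even under these generous assumptions, a peer spends roughly twenty minutes moving a single uncompressed gradient, and synchronization recurs throughout a run of many thousands of steps.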
This is why the conventional wisdom held that decentralized training was theoretically interesting and practically impossible. Not because the GPUs did not exist. Because coordinating them on something as fragile as gradient descent across unreliable internet connections, with participants who could join and leave at will, seemed to violate the basic requirements of the optimization process.
Covenant-72B is evidence that the conventional wisdom was wrong about the coordination problem, even if it remains correct about the compute gap.
The technical mechanism, SparseLoCo, achieves over 99% communication compression through top-k sparsification and 2-bit quantization of gradient updates. In practical terms, it reduces the bandwidth required for distributed gradient exchange by two orders of magnitude, making training across commodity internet connections feasible. The Bittensor incentive layer, operating through Subnet 3, handles the economic coordination: rewarding participants who contribute valid compute, penalizing those who defect, and maintaining training continuity as peers join and leave the network.
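As a way to see what those two ingredients look like in code, here is a minimal toy sketch of top-k sparsification followed by coarse quantization of the surviving values. It is not the SparseLoCo or Templar implementation; the 1% keep ratio, the 4-level quantizer, and the function names are illustrative choices.

```python
import torch

def compress_gradient(grad: torch.Tensor, keep_ratio: float = 0.01):
    """Toy sketch: keep only the largest-magnitude ~1% of gradient entries,
    then quantize the survivors to 4 levels (2 bits). Not the real protocol."""
    flat = grad.flatten()
    k = max(1, int(keep_ratio * flat.numel()))

    # Top-k sparsification: indices of the k largest-magnitude entries.
    _, idx = torch.topk(flat.abs(), k)
    values = flat[idx]

    # 2-bit quantization: map survivors onto 4 evenly spaced levels.
    lo, hi = values.min(), values.max()
    scale = (hi - lo) / 3 if hi > lo else torch.ones(())
    codes = torch.clamp(torch.round((values - lo) / scale), 0, 3).to(torch.uint8)

    # What crosses the network: indices, 2-bit codes, and two scalars.
    return idx, codes, lo, scale, grad.shape


def decompress_gradient(idx, codes, lo, scale, shape):
    """Rebuild a dense (mostly zero) gradient from the compressed payload."""
    flat = torch.zeros(shape).reshape(-1)
    flat[idx] = lo + codes.float() * scale
    return flat.reshape(shape)


if __name__ == "__main__":
    grad = torch.randn(1_000_000)                    # stand-in gradient tensor
    payload = compress_gradient(grad)
    approx = decompress_gradient(*payload)
    kept = (approx != 0).sum().item()
    print(f"kept {kept} of {grad.numel()} entries")  # roughly 1% of the tensor
```

For a tensor of a million entries at a 1% keep ratio, the payload is roughly ten thousand indices plus ten thousand 2-bit codes instead of a million full-precision values. A production protocol also has to encode those indices compactly for the headline compression figures to hold, and has to decide what happens to the residual it dropped and how peers stay consistent as they come and go, none of which this toy addresses.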
Twenty distributed peers, each running 8 B200 GPUs. Approximately 160 GPUs total. The model trained on roughly 1.1 trillion tokens across participants who did not know each other, did not trust each other, and were free to leave at any time. The network maintained training continuity regardless. The weights and full training report are publicly available.
The gap between 160 GPUs and the tens of thousands at the frontier is enormous, and anyone who pretends otherwise is selling something. But the gap between zero successful decentralized training runs and one is a different kind of distance. It is the distance between "impossible" and "early." Every information technology that eventually restructured its political economy crossed that threshold. The question was never whether decentralized systems could match centralized ones on day one. The question was whether the coordination problem could be solved at all.
Iterations and Themes
Sam frames the arc in terms that are more philosophical than most technologists are comfortable with. His language borrows from prophecy, from civilizational narrative, from the kind of grand historical arcs that the technology industry usually reserves for its keynote stages and then forgets by the following quarter.
"Decentralized training is, I think, humanity's last stand. Because there's been many iterations that fail, but a theme eventually succeeds. If you look at iterations as the x-axis and the theme as the y-axis, the theme represents what the innovation is trying to build, and the x-axis iterations are humanity's best attempts at achieving it. Take for example the steam engine. There were many iterations where mankind put all his dreams, all the best aspirations, and it failed. Explosions. Then eventually, out of nowhere, we get the steam engine. Then out of nowhere, we get airplanes. Because an iteration fails, iterations can fail, but ultimately the theme survives."
— Sam Dare, Ventura Labs podcast
The framework is useful, even if you strip away the prophecy. The theme, in Sam's telling, is the distribution of power over information technology. The iterations are the specific attempts: the internet, peer-to-peer networks, open-source software, blockchains, and now decentralized training. Each iteration addresses a different layer of the stack. Each can fail independently. The theme survives because the structural incentive for distribution does not depend on any single implementation.
Whether you find this framework inspiring or grandiose depends on your tolerance for civilizational framing. What makes it worth engaging with seriously is not the rhetoric but the structural claim underneath it. The internet distributed publishing. Open source distributed software production. Bitcoin distributed financial verification. Each of these redistributions was dismissed as impractical before it succeeded and then treated as inevitable afterward. The pattern is consistent enough that dismissing the next iteration requires explaining why this particular layer of the stack is exempt from the arc that reshaped every layer before it.
Sam's more pointed claim, that this may be "the last iteration of freedom," carries a weight that is harder to evaluate. His argument is that once the ability to create foundation models is permanently captured by a small number of entities, the asymmetry of capability becomes self-reinforcing. Models generate revenue that funds more compute that trains better models. The flywheel concentrates. And unlike previous enclosures, where the enclosed resource was land or spectrum or server capacity, the enclosed resource this time is intelligence itself, the substrate on which every other decision-making process will increasingly depend.
This is where the essay has to be honest. Covenant-72B does not prove that the theme has succeeded. It proves that this iteration has not failed yet. The distance between a 72-billion-parameter model trained on 160 GPUs and the frontier is not a gap that enthusiasm will close. It will require more compute, better coordination algorithms, sustained participation, and the kind of iterative engineering that takes years. The coordination problem is solved. The scale problem is not.
But Clark, from his position as one of the most credible analysts in AI, identified the coordination problem as the meaningful breakthrough. Not the benchmark scores. Not the parameter count. The fact that the mechanism worked. And Chamath, from his position as one of the most influential voices in technology investment, identified the same thing independently.
What Happens Next
Jensen Huang's answer to Chamath was characteristically careful. "Models are a technology, not a product." The framing is deliberate. If models are a technology, then their proliferation follows the same logic as every other technology: what starts concentrated eventually gets distributed, because the economic incentives for distribution are stronger than the barriers to entry once the enabling innovation exists.
The enabling innovation, in this case, is not a new GPU architecture or a novel training algorithm. It is a coordination mechanism that allows anonymous participants to cooperate on gradient descent without trusting each other, without a central orchestrator, and without the kind of high-bandwidth interconnects that only exist inside a data center. That mechanism now exists. It has produced a real model with published weights and a verifiable training history.
The question is no longer whether decentralized training is possible. It is whether the structural arc will hold. Whether 160 GPUs will become 1,600, and then 16,000. Whether the coordination algorithms will scale with the hardware. Whether the incentive structures will attract the sustained participation that frontier training demands. None of these outcomes is guaranteed. All of them are now plausible in a way they were not six months ago.
Sam would say that this is how it always works. The iterations fail until they don't. The theme persists. The steam engine explodes until it doesn't. The airplane doesn't fly until it does.
Clark and Chamath would frame it differently, in the measured language of policy analysis and investment thesis. But all three are pointing at the same structural observation: the means of production for foundation models may be entering the same distribution arc that reshaped every previous information technology. The political economy of AI may not be permanent.
It never is.
Disclosure: I work with Covenant AI and am directly involved in the organization's communications. For full transparency about my involvement and investments, see my projects page. All opinions expressed are entirely my own.