The Architecture of Abundance — A Manifesto

Human civilization runs on systems shaped by scarcity. Almost every institution we have inherited — money, contracts, queues, prices, borders, calendars, law — exists because two people in two different places, holding two different fragments of information, cannot make a single coherent decision in real time. We built rituals of approximation to substitute for that missing coherence. They worked, in the sense that they got us here. They are also, increasingly, the rate-limiter on what comes next.

The interesting fact about the present century is not that resources are scarce. By any honest accounting, the planet produces more food than it consumes, more energy than it uses at any given second, more housing stock than it sleeps in, more compute than it allocates productively, and more medical capacity than it directs to need. The bottleneck is not extraction. The bottleneck is matching: getting the right tonne of grain to the right port on the right day at the right price under the right contract, moved by the right vessel piloted by the right crew under the right weather forecast. The bottleneck, in other words, is coordination — and coordination, until very recently, has been something only humans and a handful of brittle protocols knew how to do at all.

Apik Systems is a frontier-AI research company, founded on a single architectural bet: that computational coordination is the missing primitive of the next civilization, and that a stack of autonomous intelligence — from cognitive tools for individuals up through planetary-scale economic orchestration — is the substrate on which abundance becomes structural rather than aspirational. This essay is the longest, plainest statement I can write of why we are working on what we are working on, what we think is true, what we think we do not yet know, and what it would mean for that bet to be wrong. It is also a commitment, in writing, to a particular way of doing the work.

The coordination problem

The numbers are not mysterious. The Food and Agriculture Organization estimates that roughly one-third of all food produced for human consumption is lost or wasted, with subsequent UNEP analyses placing the figure at around 1.05 billion tonnes wasted at the consumer and retail stages alone in 2022 — and earlier FAO work pegging total food loss and waste closer to 1.3 billion tonnes per year when production-side losses are included.¹ The carbon footprint of that waste is on the order of 8–10 percent of global anthropogenic greenhouse-gas emissions, which is to say, roughly the same as the emissions footprint of all road transport.² None of this is caused by insufficient harvest. It is caused by the inability to route the harvest.

Logistics tells the same story in a different register. Across recent World Bank and industry estimates, logistics costs amount to something between ten and thirteen percent of global GDP — the gap between what production costs and what consumption pays, absorbed by the friction of getting matter from point A to point B.³ In emerging economies, the figure climbs higher; in countries with weak transport coordination, total logistics overhead can approach a fifth of GDP. This is not an indictment of trucks or ships. It is an indictment of the planning layer above them. A modest improvement in routing — call it a five-percentage-point reduction in global logistics overhead — would be worth more than four trillion dollars a year in recovered surplus. We do not lack diesel. We lack the substrate that decides where the diesel goes.
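The arithmetic behind the four-trillion-dollar figure can be checked in a few lines. This is a minimal sketch: the world GDP value of roughly 105 trillion US dollars is an assumption introduced here for illustration and is not stated in the essay, and the function and variable names are hypothetical.

```python
# Assumption (not from the essay): nominal world GDP of roughly 105 trillion USD.
WORLD_GDP_TRILLION_USD = 105.0

def recovered_surplus(gdp_trillion, reduction_pct_points):
    """Value, in trillions of USD per year, of cutting logistics overhead
    by the given number of percentage points of GDP."""
    return gdp_trillion * reduction_pct_points / 100.0

savings = recovered_surplus(WORLD_GDP_TRILLION_USD, 5.0)
print(f"{savings:.2f} trillion USD per year")
```

Under that GDP assumption the result sits above the four trillion dollars quoted. Measured against a baseline of ten to thirteen percent of GDP, a five-point cut is a reduction of roughly 38 to 50 percent in the overhead itself.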

Energy is the third example, and the most thermodynamic. Wholesale electricity markets clear at five-minute intervals and balance demand against supply with extraordinary precision over those intervals. They fail catastrophically at the seasonal-storage scale, because the institutions that price and trade megawatt-hours have no way to commit, today, to a delivery in eight months under a weather distribution we have only seen in models. The result is curtailment of renewables in summer and combustion of fossil fuels in winter, not because either is preferred, but because the coordination layer required to bridge the two does not yet exist.⁴ Every additional watt of solar capacity makes the coordination problem worse, not better, until the coordination layer catches up.

These are three faces of the same shape. To see why the shape matters, consider three earlier moments in which a coordination layer arrived and the physical world responded as if it had been holding its breath.

The first is electric grid synchronization. Through the late nineteenth and early twentieth centuries, electrification proceeded as a patchwork of incompatible local generators, each running at its own frequency and voltage. No amount of additional copper or coal would have produced a continental grid, because two unsynchronized generators cannot share a wire without destroying each other. The arrival of standardized 50 or 60 Hz alternating current, common phase relationships, and the protective relay was a coordination protocol, not a physical innovation. Once it existed, the physical capacity that had always been latent in the network suddenly composed.

The second is the standardization of the intermodal shipping container in the 1950s and 1960s. The physical capacity to move goods from a Pittsburgh factory to a Singaporean port had existed, in principle, since the Suez Canal opened. What did not exist was a single object that a truck, a crane, a ship, and a railcar could all agree on. The container is a coordination primitive — a shared, dimensionally-typed interface — and the moment it appeared, port throughput rose by orders of magnitude and ocean freight costs per tonne fell by something close to ninety percent, with downstream effects on global GDP that economists are still trying to fully account for.⁵

The third is the Internet’s packet-routing layer. Long-distance circuits had existed for a century before the ARPANET. Computers had existed for two decades. What the IP and TCP protocols added was not bandwidth, not silicon, not transistors — it was the ability for a packet of information to find its way across an arbitrary, partially-failed network without a central operator deciding the route. Coordination, in software, at line rate. Almost everything else we now call “the digital economy” is downstream of that one architectural shift.

The thing to notice about all three is that none of them required new physical capacity. Each was a layer that turned an existing substrate into a coherent one. Each unlocked, within roughly a decade, a step-change in throughput. We are arguing that the planetary economy is presently in a pre-synchronization, pre-container, pre-IP state with respect to its own coordination — that the substrate is there, that the intelligence to use it is now arriving, and that the layer in between is what we are working to build.

Why human institutions can’t close the gap

It is worth being precise about why we cannot simply hire more planners, build better spreadsheets, or reform our existing institutions until coordination becomes adequate. This is not a critique of human seriousness. It is a structural observation about three named limits.

The first is the bandwidth ceiling on individual cognition. George Miller’s 1956 finding that working memory holds approximately seven plus or minus two chunks at a time is among the most-cited numbers in the history of psychology, and although later work has refined the figure downward — Cowan’s revised estimate is closer to four — the order of magnitude is the order of magnitude.⁶ Herbert Simon’s program of bounded rationality made this a foundational claim of organization theory: real decision-makers do not optimize, they satisfice, because the search space dominates them.⁷ An individual human cannot hold the parameters of even a single regional supply chain in mind simultaneously, let alone reason about counterfactual reallocations across them. We compensate with hierarchy, specialization, and forms — but each of those compensations imposes its own losses.

The second is principal-agent friction at scale. Any large coordination problem — a corporation, a state, a multilateral system — is a recursion of principals delegating to agents, each delegation introducing some loss of fidelity, some skew of incentive, some opportunity for what the institutional-economics literature calls rent-seeking. The losses do not simply add; they compound. A directive that begins at the top of a sufficiently large hierarchy and arrives at the bottom is, almost by construction, a different directive. This is not a moral failing; it is what happens when high-bandwidth intent has to be transmitted through a low-bandwidth medium called language and a low-fidelity actuator called another human’s judgment.

The third is the time-horizon mismatch between the people who decide and the people who experience the consequences. Quarterly capital markets, four-year electoral cycles, and human career arcs all operate on horizons that are short relative to the consequences of energy infrastructure, climate response, biosecurity preparedness, or supply-chain redesign. A decision-maker rationally optimizing within a ten-year horizon will systematically underinvest in coordination problems whose payoff is thirty years out. This is not corruption. It is what happens when agents are evaluated on signals that arrive faster than the system they are supposed to be steering.

Friedrich Hayek’s 1945 essay, “The Use of Knowledge in Society,” named the deepest version of this problem decades before computation was ready to address it.⁸ His insight was that the relevant knowledge for economic coordination is not the kind that lives in textbooks or aggregate statistics, but the kind that exists, dispersed and tacit, in the heads of millions of local actors — the dockmaster who knows which crane is unreliable in cold weather, the farmer who knows which field drains poorly. Central planners cannot collect this knowledge fast enough, and Hayek concluded, correctly, that the price system was the only known mechanism that aggregated it at scale. Subsequent decades of mechanism design — Vickrey on truthful auctions, Roughgarden on the price of anarchy, Sandholm on combinatorial allocation — sharpened the picture: there are coordination problems where prices alone cannot reach efficient outcomes, and explicit allocation mechanisms must be designed.⁹¹⁰
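As one concrete instance of a designed allocation mechanism, the sketch below implements a sealed-bid second-price (Vickrey) auction for a single item and spot-checks that bidding one's true value is not beaten by any alternative bid on a grid. It is a toy under stated assumptions: the bidder names, valuations, and tie-breaking rule are hypothetical, and a check against one fixed set of rival bids is an illustration, not a proof of truthfulness.

```python
def second_price_auction(bids):
    """Sealed-bid second-price (Vickrey) auction for a single item.

    `bids` maps bidder name -> bid. Returns (winner, price): the highest
    bidder wins and pays the second-highest bid. Ties are broken by
    bidder name so the result is deterministic.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price

def utility(bidder, true_value, bids):
    """Payoff to `bidder`: value minus price if they win, otherwise zero."""
    winner, price = second_price_auction(bids)
    return true_value - price if winner == bidder else 0.0

# Hypothetical valuations for one port berth slot (illustrative numbers only).
true_value = 70.0
rival_bids = {"rival_a": 55.0, "rival_b": 62.0}

truthful = utility("us", true_value, {**rival_bids, "us": true_value})

# Try every integer bid from 0 to 120 and record the best payoff achievable.
best_deviation = max(
    utility("us", true_value, {**rival_bids, "us": float(b)})
    for b in range(0, 121)
)
print(truthful, best_deviation)
```

The design point is that the winner's payment does not depend on the winner's own bid, which is what removes the incentive to shade it.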

We are not arguing that markets should be replaced. Markets do many things extraordinarily well, and any system that ignored their information-aggregation properties would be intellectually dishonest. We are arguing that there exists an additional substrate — a layer of computational coordination — that operates at scales, speeds, and resolutions human institutions cannot directly touch, and that its arrival does not abolish markets any more than the synchronized grid abolished local generation. It composes with them. It absorbs the coordination problems that price signals alone underspecify, and it leaves the rest to the institutions that already do them well.

The Apik thesis

Apik Systems exists because we think three claims are now defensible. They are not equally certain, and we will be honest about which one we are most likely to be wrong on. But they form the architecture of why we are doing what we are doing.

Claim 1 — Coordination is computable.

Most of the coordination problems described above — resource allocation, supply-chain routing, energy dispatch, manufacturing scheduling, hospital triage, agricultural water budgeting — are mathematically tractable. They have been studied for decades as integer programs, mixed-integer linear programs, combinatorial optimization problems, and Markov decision processes. The reason they have not been solved at planetary scale is not that the math is incomplete; it is that three operational preconditions were missing. First, inference latency: the cost of reasoning about a state with millions of variables, in milliseconds, was prohibitive. Second, sensor coverage: the world was not adequately instrumented to keep the model and the territory in correspondence. Third, verifiable execution: even if a plan existed, there was no machinery to translate it into actions in the physical world without re-introducing the human-mediated friction we were trying to avoid.
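To make the combinatorial framing concrete, here is a minimal sketch of the smallest member of this problem family: assigning shipments to ports one-to-one at minimum total cost. The cost matrix is hypothetical, and exhaustive search is used only because it is easy to verify by hand; production systems would use a mixed-integer solver or a dedicated assignment algorithm instead.

```python
from itertools import permutations

# Hypothetical cost matrix: COST[i][j] is the cost of sending shipment i to port j.
COST = [
    [4, 2, 8],
    [4, 3, 7],
    [3, 1, 6],
]

def best_assignment(cost):
    """Exhaustively search one-to-one assignments of rows to columns.

    Returns (total_cost, assignment), where assignment[i] is the column
    chosen for row i. Runs in O(n!) time, so it only suits tiny instances.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)

total, assignment = best_assignment(COST)
print(total, assignment)
```

The factorial growth of the search space is one way to see why inference cost, rather than missing mathematics, was the binding constraint at scale.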

Frontier models, with their increasingly low marginal cost of long-context reasoning, are closing the inference precondition. Ubiquitous sensing — from satellite constellations with daily revisit rates to industrial telemetry to the LiDAR and millimeter-wave radar that have become standard in autonomous platforms — is closing the coverage precondition. And the agent infrastructure that is now emerging, with structured tool-use, durable memory, and protocols for verifiable action, is closing the execution precondition. None of these is finished. All three are now plausibly within reach. That is what we mean when we say computational coordination has become a tractable engineering target rather than a thought experiment.

Claim 2 — Frontier models plus agent infrastructure are the first plausible coordination substrate.

The shift in capability over the last three years is not, in our reading, primarily about model size. It is about three architectural shifts that change what is operationally possible. The first is reliable long-horizon, multi-step reasoning — the ability of a model to maintain a coherent chain of intermediate goals, revise them in response to feedback, and recover from local failures without losing global coherence. METR’s recent work on the time horizons of frontier AI tasks has begun to quantify this directly: the duration of tasks that frontier systems can complete autonomously has been doubling on roughly a seven-month cadence, a trend with no obvious near-term ceiling.¹¹ The second is robust tool-use, in which a model becomes the planner and a heterogeneous set of external systems — databases, simulators, planners, robots — become its instruments, with the model treating their failure modes and latencies as first-class facts about the world. The third is the maturation of structured oversight protocols: monitors, runtime checks, scaffold-level safety affordances, and capability evaluations that allow systems to be deployed with quantified, not folkloric, claims about what they can and cannot do.
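The doubling cadence cited above implies simple exponential growth, which the sketch below spells out. Only the seven-month doubling period comes from the text; the one-hour starting horizon and the function name are hypothetical, and the output is an extrapolation of a trend, not a forecast.

```python
def projected_horizon(initial_hours, months_elapsed, doubling_months=7.0):
    """Task horizon after `months_elapsed` months if it doubles every
    `doubling_months` months: h(t) = h0 * 2 ** (t / T)."""
    return initial_hours * 2.0 ** (months_elapsed / doubling_months)

# Hypothetical starting point: a 1-hour autonomous task horizon.
for months in (0, 7, 14, 28):
    print(months, projected_horizon(1.0, months))
```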

Read together with the trajectory in embodied learning — from RT-2’s vision-language-action architectures, through DeepMind’s SIMA generalist agents, through the π0 flow-matching models from Physical Intelligence — what we have is, for the first time, a plausible architectural template for an agentic system that can reason about a planetary problem, decompose it, dispatch tasks to specialized policies, manage their execution, and integrate the results.¹²¹³ The pieces are not yet welded together at scale. But each of them, individually, has crossed from research demo into deployable artifact.

Claim 3 — Embodiment closes the loop.

A coordination decision that cannot reach into the physical world is, by definition, a planning exercise. It can recommend that a tonne of wheat be moved; it cannot move it. For most of the history of computation, we accepted this gap because the alternative did not exist. Industrial automation could move things, but only in tightly choreographed environments with hand-tuned policies. The world outside the factory was the domain of humans, and any computed plan had to terminate at a recommendation handed to a person.

Humanoid robotics, dexterous manipulation, and fleet-level autonomy are, for the first time, plausible candidates for closing that loop. We are not going to claim that the problem is solved — see the open problems section for a long list of what is not. But the trajectory of the last five years, from Boston Dynamics’ Atlas through Figure’s iterative humanoid platforms through the open-source manipulation work emerging from academia, suggests that embodied policies that can act competently in unstructured environments are an engineering target with measurable progress, not a science-fiction destination. The significance of this for coordination is straightforward. The latency between “the system has decided what should happen” and “the world is the way the system decided it should be” can, in a fully realized stack, drop by orders of magnitude. That latency reduction is the substantive content of the abundance claim. It is what makes the architecture of coordination cash out as physical effect.

The least certain of these three claims is the third. The first we are confident about; the second we think is now broadly accepted by the technical community; the third we are betting on, in the strong sense that we may be wrong, and we are committing capital and years to finding out.

The Apik Civilization Stack

We organize our work as five layers, each interfacing with the layers above and below it. We call this the Apik Civilization Stack. It is a stack and not a tower, in the precise sense that each layer depends on the layers below it for safety and on the layers above it for direction. Removing the bottom does not collapse the top into a simpler version of itself; it collapses the top entirely. Equally, the bottom layers without the top are powerful but undirected — capacity without intent.

Layer 01 — Human Intelligence (Senwitt)

The first layer is cognitive infrastructure for individuals. We call it Senwitt. Its purpose is to extend the bandwidth, memory, and cross-tool reasoning of a single human being engaged in complex work, without replacing the human’s role as the locus of judgment. This is the layer at which contemporary debates about “AI assistance” mostly happen, and it is where the temptation to overclaim is greatest. We try to be plain about what Senwitt is and is not. It is a substrate for context retention across long-running projects, for cross-document reasoning at human-relevant timescales, for the kind of deliberate, slow integration of evidence that humans do well in principle but rarely have the time to do in practice. It is not a replacement for the human’s intent. The premise is the opposite: as the lower layers of the stack become more capable, the question of what we want them to do — at the level of a project, an organization, a community, a polity — becomes more rather than less important. Human-in-the-loop is not a transitional design; it is the design. Senwitt is the surface through which a human being can productively participate in a stack whose lower layers operate faster and at higher dimensionality than unaided human cognition can directly manage.

Layer 02 — Artificial Intelligence (Brello AI)

The second layer is the generative and reasoning substrate — what we have come to call, internally, the thinking layer. Brello AI is our family of frontier models for design, planning, decomposition, and analysis, operating on the time horizons where careful thought, not reflexive action, is the relevant mode. This is the layer at which a complex problem — a national-scale supply-chain reconfiguration, a research-program design, a manufacturing-line redesign — gets broken into tractable pieces, each with verifiable success criteria, each routable to either a more specialized model below or back to a human above. We invest heavily in the interpretability, calibration, and oversight properties of this layer because almost every failure mode of the layers below it traces back to a misspecified plan at this level. A plan executed perfectly is still a bad outcome if the plan was wrong. A model that knows when to refuse to plan is more valuable, in this layer, than one that always produces a plan.

Layer 03 — Autonomous Agents (Agentic Systems)

The third layer is what we call the doing substrate. Where Brello AI thinks, the agentic layer acts. Long-horizon planning, durable memory, robust tool-use, structured handoffs between specialized agents, fleet-level coordination — all of the operational machinery that turns a plan into a sequence of actions in the world. This layer is where most of the engineering effort goes, because most of the difficulty of agentic systems is not in any single decision but in maintaining coherence across thousands of decisions, recovering from failures without amplifying them, and producing observable, auditable traces of what happened and why. The agentic layer is also where the practical content of safety becomes most concrete: this is where rate limits live, where capability gates live, where the protocols for human escalation live, where the boundaries between what an agent may and may not do without human authorization are operationalized.
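The boundary between what an agent may do alone and what requires human authorization can be sketched as a small gate object. This is a minimal illustration under stated assumptions, not a description of Apik's actual machinery: the class name, action names, and policy of escalating when the rate limit is exhausted are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ActionGate:
    """Decides whether an agent action is allowed, denied, or escalated."""
    allowed: set          # actions the agent may take on its own authority
    needs_human: set      # actions that always require human sign-off
    max_per_window: int   # rate limit on autonomous actions per window
    used: int = 0         # autonomous actions taken in the current window
    log: list = field(default_factory=list)  # auditable trace of decisions

    def decide(self, action):
        if action in self.needs_human:
            verdict = "escalate"
        elif action not in self.allowed:
            verdict = "deny"
        elif self.used >= self.max_per_window:
            verdict = "escalate"  # over budget: hand off instead of proceeding
        else:
            self.used += 1
            verdict = "allow"
        self.log.append((action, verdict))
        return verdict

    def reset_window(self):
        """Intended to be called by a (hypothetical) scheduler each window."""
        self.used = 0

gate = ActionGate(
    allowed={"read_inventory", "reroute_truck"},
    needs_human={"cancel_contract"},
    max_per_window=2,
)
verdicts = [
    gate.decide(a)
    for a in ["read_inventory", "reroute_truck", "reroute_truck",
              "cancel_contract", "delete_ledger"]
]
print(verdicts)
```

Every decision, including denials, is appended to the log, so the trace of what was attempted survives alongside the record of what was done.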

Layer 04 — Physical Intelligence

The fourth layer is the acting substrate. Humanoid robotics, dexterous manipulators, mobile platforms, autonomous manufacturing cells — the machinery that lets a computed decision become a physical effect. We treat physical intelligence as a research program, not a product line, because the underlying scientific problem of generalist embodied policies is unsolved and progress is uneven. We are honest with ourselves about the gap between a controlled demo and a deployed system, and we publish our internal evaluations including the failures. The reason this layer exists in our stack at all, rather than being deferred to specialist robotics companies, is that we do not think the coordination story can be told without it. A coordination layer that can plan but not act is, at planetary scale, indistinguishable from a very expensive recommendation engine.

Layer 05 — Economic Orchestration

The fifth layer is the deciding substrate, and it is the most ambitious and least mature of the five. Its concern is mechanism design at planetary scale: the protocols by which heterogeneous actors — humans, organizations, agents, physical systems — propose, negotiate, allocate, and reconcile commitments under uncertainty. Most of our research effort here is theoretical: we are studying which classes of allocation mechanisms remain incentive-compatible when one party is a learned coordinator, what verifiability primitives are required for a multi-party computation over commitments, what the right interface is between a learned coordinator and an existing legal-economic system. We are explicit that this layer must not concentrate authority in a single operator — including ourselves. The architecture we are working toward is one in which the coordination protocol is open, audited, and federated across multiple instances, none of which is privileged. Whether we can actually achieve this is one of our open problems.

Each layer depends on the layers below it for safety: a misbehavior at a higher layer can be detected and bounded only because the lower layers have been instrumented to make detection possible. Each layer depends on the layers above it for direction: a more capable substrate without intent is, at best, undirected and, at worst, dangerous. This is a stack, not a tower of replacements. We do not believe the bottom layer goes away when the top layer arrives. We believe the opposite: a competent top layer makes the bottom layer matter more.

Safety as a precondition

The deepest reason for caution about what we are working on is not that any single system might fail in any single way. It is that concentrating coordination authority is the highest-leverage failure mode in the history of human institutions. Empires have done it; nation-states have done it; corporations have done it. The pattern of consequences is not subtle. Any project that takes the coordination thesis seriously has to take the centralization risk equally seriously, because they are the same architecture seen from two angles. We do not get to defer this question. We do not get to ship the capability and figure out the governance afterwards. The two have to develop together or the project has not been done responsibly.

Our position is that every layer of the stack must ship with explicit safety affordances built into it from the beginning, and that those affordances must be technically substantive rather than rhetorical. At the model layer, this means investing in mechanistic interpretability research with the specific goal of producing artifacts — circuits, features, behavioral characterizations — that an external auditor can use to verify properties of the system.¹⁴ We follow the line of work from Olah and collaborators on interpretable bases and the broader Anthropic and DeepMind interpretability programs, and we are committing to publish our own interpretability artifacts on a schedule that does not lag deployment.

At the agent layer, this means robust oversight protocols: scalable monitoring of long-horizon traces, runtime tripwires, capability evaluations that test for the specific failure modes that learned optimizers are known to exhibit. The work of Hubinger and colleagues on risks from learned optimization, of Christiano and others on alignment from human preferences, and the recent body of evaluation work from groups like Apollo Research on scheming and deceptive alignment have shaped how we think about what actually has to be measured.¹⁵¹⁶¹⁷

At the deployment layer, we are operationalizing a Responsible Development Policy, modeled on the responsible scaling commitments published by frontier labs over the last two years, with capability thresholds that gate further training and deployment. The policy is a living document; it ships, openly, at /safety/responsible-development-policy, and we revise it as our understanding of what actually predicts risk improves.¹⁸

We are honest that we do not know how to do this fully yet. We do not know how to interpret models with on the order of one trillion parameters under online updating. We do not know how to evaluate a long-horizon agent in a way that catches the failure modes that are quietest. We do not know how to build a coordination protocol that is provably resistant to capture by its operator, including by us. We are committing to working on all of these problems openly, in collaboration with the broader research community, and to treating the failure to make progress on them as a reason to slow down — not a reason to ship and hope.

Open problems

The research agenda below is partial. These are the questions we are currently most actively working on or sponsoring work on, and the ones we think the broader field should treat as bottlenecks. None of them has a known solution. Each is dated to indicate when we opened it and, where noted, when we last revised our thinking about it.

How do we build verifiable swarm protocols that remain stable when an arbitrary subset of participants is adversarial? Existing distributed-consensus literature gives us partial answers in classical settings, but learned coordinators introduce failure modes — collusion under shared training distribution, mode collapse under correlated incentives — that classical Byzantine fault-tolerance does not cover. We do not have a satisfying formalism yet. (Open since 2024; revised April 2026.)

What is the right interface between learned policies and formal-method safety envelopes? Reachability-analysis-based runtime verification works well for narrow control problems and breaks down as soon as the policy operates in a high-dimensional, partially-observed setting. We suspect there is a productive middle ground in which the policy is unconstrained within a verified envelope and conservative outside it, but the precise construction is unsettled. (Open since 2024.)
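The "unconstrained inside, conservative outside" idea can be illustrated with a one-step runtime shield. The sketch is a toy under strong assumptions: a fully observed one-dimensional state, known dynamics, a fallback action that is always safe from a safe state, and hypothetical names throughout. Those assumptions are exactly what fails in the high-dimensional, partially observed setting described above.

```python
def shielded_action(state, proposed, step, is_safe, fallback):
    """Return `proposed` if the state it leads to stays inside the verified
    envelope; otherwise return the conservative fallback action."""
    if is_safe(step(state, proposed)):
        return proposed
    return fallback(state)

# Toy 1-D system (all hypothetical): position x, action is a velocity command.
DT = 1.0
LOW, HIGH = -10.0, 10.0

def step(x, v):
    return x + v * DT

def is_safe(x):
    return LOW <= x <= HIGH

def fallback(x):
    # Conservative choice: stop moving. Safe whenever the current state is safe.
    return 0.0

def learned_policy(x):
    # Stand-in for an arbitrary learned policy: always pushes to the right.
    return 3.0

x = 0.0
trajectory = [x]
for _ in range(6):
    a = shielded_action(x, learned_policy(x), step, is_safe, fallback)
    x = step(x, a)
    trajectory.append(x)
print(trajectory)
```

The policy is never modified; the shield only overrides it at the boundary, which is what keeps the envelope check separable from the learned component.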

How do we scale interpretability artifacts to systems with on the order of 10^12 parameters and online updating? Current circuit-level interpretability is producing real artifacts at 10^9 to 10^10 parameters in offline settings. What this looks like at the next two orders of magnitude, while the model is being continuously updated, is genuinely unknown. (Open since 2025.)

What does a decentralized economic-orchestration layer look like — specifically, one whose protocol does not concentrate authority in a single operator, including its designer? We have sketches drawing on multi-party computation, threshold cryptography, and federated allocation, but no candidate that is both expressive enough to handle interesting allocation problems and resistant enough to capture. (Open since 2024.)

How do we evaluate long-horizon agentic systems in a regime where most failures are quiet? An agent that fails loudly is comparatively easy to handle. An agent that systematically degrades the quality of its outputs in subtle ways — that produces plans which are slightly worse than the optimum, in a direction correlated with its training pressures — is the failure mode we are most worried about and currently least able to detect at scale. (Open since 2025.)

What pricing and allocation mechanisms remain incentive-compatible when one of the participants is a learned coordinator with much greater inference capacity than the others? Standard mechanism design assumes computational symmetry. What happens when that symmetry breaks is partially studied in the algorithmic-game-theory literature, but the practical question of whether a learned coordinator can avoid asymmetric advantages without crippling itself is open. (Open since 2025.)

How do humans retain meaningful authority over decisions they do not have time to inspect? This is the substantive form of the human-in-the-loop question, and the honest answer is that we do not know. Sampling-based audit, escalation thresholds, and abstraction-level oversight all have known weaknesses. The deeper question is whether there is a level of abstraction at which a human can authorize a class of decisions without being deceived by the abstraction. (Open since 2024.)

How do we measure progress on physical intelligence in a way that resists Goodharting? Benchmarks in robotics have a long history of being saturated by methods that do not generalize. We are funding work on diverse, hard-to-game evaluation suites, including tasks that explicitly test out-of-distribution generalization, but we are not satisfied with the current state of the art. (Open since 2025.)

What is the right governance model for an entity that operates at planetary coordination scale? Existing corporate, nonprofit, and public-benefit structures have all been stress-tested at sub-planetary scales and have known failure modes. We are studying historical models of multilateral governance — the Bank for International Settlements, the Internet Engineering Task Force, the Universal Postal Union — for their successes and failures. We do not have a candidate model we are ready to commit to. (Open since 2024.)

How do we design upgrade paths for safety properties as capabilities grow? A safety property that holds at one capability level may not survive a step-change in capability. We need formalisms that let us reason about which properties are scale-invariant and which require re-derivation. (Open since 2025.)

How do we think about the energy budget of a planetary coordination layer? Inference at planetary coordination scale is itself a non-trivial energy footprint. The system we are building must justify itself thermodynamically — its coordination output has to dominate its compute input. Whether and at what scale this is true is an empirical question we have not yet adequately answered. (Open since 2026.)

How do we preserve linguistic, cultural, and institutional pluralism in a stack whose lower layers are increasingly homogeneous? A coordination substrate that operates across the planet must respect the fact that the planet’s institutions are not, and should not become, uniform. The technical question of how heterogeneity is preserved through a coordination protocol — and not silently averaged away — is one of the more philosophically important problems on this list. (Open since 2026.)

We expect to revise this list, in public, on a six-month cadence. Problems will move on and off it. What will not change is the principle that the open problems are part of the public record of what we are working on.

How we work

Our lab is small relative to our ambitions, and deliberately so. We are interdisciplinary by construction: economists, mechanism designers, robotics researchers, formal-methods specialists, interpretability researchers, and product engineers sit in the same standing meetings, because integrating disciplines is the actual unit of work. We hire for long-horizon temperament rather than short-horizon throughput. The problems on the open-problems list are not amenable to sprint-cycle optimization.

Our publishing posture is open by default. We publish research, code, evaluation suites, and interpretability artifacts. We gate publication only when the combination of capability and uplift risk is concrete and specific — that is, when we can articulate, in writing, why a particular release would non-trivially advance a misuse capability that is not already broadly available. The default is to publish. The burden of justification is on the case for not publishing. We treat that burden as substantial but not insurmountable, and we publish summaries of what was withheld and why.

Collaboration is central to how we work. We host visiting researchers on six- and twelve-month appointments, run a fellowship program for early-career researchers working on topics adjacent to the stack, and partner with academic groups on problems where we can contribute compute, data, or engineering capacity. The fellowship and visiting positions are described at /company/careers, and we make a point of advertising them broadly rather than only through closed networks.

Our funding posture is patient capital. We are not optimizing for the next financing round; we are optimizing for the next decade. The investors we work with have committed to time horizons consistent with the actual time scale of the research, and we have been explicit that we will pass on otherwise-attractive opportunities when they would compress the work. This does not make us slow. It makes us deliberate. There is a difference, and the difference is the entire point.

Beyond Earth

The same coordination infrastructure that closes the loop on planetary supply chains is, in principle, the infrastructure that makes a closed-loop off-world settlement viable. A Mars colony is not, fundamentally, a propulsion problem; it is a coordination problem of extraordinary density — a sealed economic and ecological system in which the failure of a single supply line cannot be papered over by an intercontinental container ship. The mass, energy, and material flows of a hundred-person settlement are tractable in a way that the flows of a hundred-million-person economy are not, but they are also unforgiving in a way that planetary economies are not. There is no slack.

We treat this as a thirty-year-horizon framing. We are not promising Mars next year, or the year after, and we will not. What we are saying is that the architectural problems of closed-loop life support, recursive supply chains, and off-world manufacturing are continuous with the problems we are working on now — that the coordination substrate that lets a planet not waste a billion tonnes of food is the same substrate, with different constants, that lets a closed habitat not waste a kilogram of nitrogen.

It is also, to be plain, a reason to take the planetary version seriously. A civilization that cannot solve its coordination problems on Earth is not a candidate for a multi-planetary future. The terrestrial work is not a stepping stone to space; it is the prerequisite. And how we choose to do the terrestrial work (what we centralize, what we federate, what we open-source, what we gate, whom we include) sets the precedent for everything that comes after.

Where to read further

If this is the document that introduced you to what we are working on, three further pointers may be useful. /research is where we publish technical results, evaluation reports, and the artifacts referenced in the open-problems section above. /safety/responsible-development-policy is where we publish the operational document that governs how we deploy capabilities, what gates apply, and what we have committed to disclose. /company/careers is where the fellowship, visiting-researcher, and full-time roles are listed; if you have been thinking about any of the open problems above, we would rather be in touch with you than not.

— Rehan Temkar, Co-founder, Apik Systems · April 2026

United Nations Environment Programme, Food Waste Index Report 2024; see also Food and Agriculture Organization of the United Nations, Global Food Losses and Food Waste – Extent, Causes and Prevention (Gustavsson et al., 2011), the source of the often-cited 1.3-billion-tonne figure.
Food and Agriculture Organization of the United Nations, Food Wastage Footprint: Impacts on Natural Resources (FAO, 2013), which estimates the carbon footprint of food wastage at approximately 3.3 Gt CO2-equivalent per year.
World Bank, Connecting to Compete: Trade Logistics in the Global Economy (Logistics Performance Index series); Armstrong & Associates, Global Logistics Costs and Third-Party Logistics Revenues, which places global logistics costs at roughly 10–12 percent of global GDP.
International Energy Agency, The Future of Energy Storage and related work on seasonal-storage gaps in renewable-heavy grids; see also MIT Energy Initiative, The Future of Energy Storage (2022).
Marc Levinson, The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger (Princeton University Press, 2nd ed., 2016); Bernhofen, El-Sahli, and Kneller, “Estimating the Effects of the Container Revolution on World Trade”, Journal of International Economics (2016).
George A. Miller, “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information”, Psychological Review 63 (1956): 81–97; Nelson Cowan, “The Magical Number 4 in Short-Term Memory: A Reconsideration of Mental Storage Capacity”, Behavioral and Brain Sciences 24 (2001): 87–114.
Herbert A. Simon, Models of Bounded Rationality (MIT Press, 1982); see also Simon’s earlier “A Behavioral Model of Rational Choice”, Quarterly Journal of Economics 69 (1955): 99–118.
F. A. Hayek, “The Use of Knowledge in Society”, American Economic Review 35, no. 4 (1945): 519–530.
William Vickrey, “Counterspeculation, Auctions, and Competitive Sealed Tenders”, Journal of Finance 16 (1961): 8–37; Tim Roughgarden, Twenty Lectures on Algorithmic Game Theory (Cambridge University Press, 2016).
Tuomas Sandholm, “Algorithm for Optimal Winner Determination in Combinatorial Auctions”, Artificial Intelligence 135 (2002): 1–54.
METR, Measuring AI Ability to Complete Long Tasks (2025), reporting that the time horizon at which frontier AI systems complete tasks at fifty-percent reliability has been doubling on roughly a seven-month cadence.
Anthony Brohan et al., “RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control”, arXiv:2307.15818 (2023); Google DeepMind SIMA Team, “Scaling Instructable Agents Across Many Simulated Worlds”, arXiv:2404.10179 (2024); Open X-Embodiment Collaboration, “Open X-Embodiment: Robotic Learning Datasets and RT-X Models”, arXiv:2310.08864 (2023).
Kevin Black et al., “π0: A Vision-Language-Action Flow Model for General Robot Control”, arXiv:2410.24164 (2024), Physical Intelligence.
Chris Olah et al., “Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases”, Transformer Circuits Thread (2022); Anthropic, “Core Views on AI Safety: When, Why, What, and How” (2023); Rishi Bommasani et al., “On the Opportunities and Risks of Foundation Models”, arXiv:2108.07258 (2021).
Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant, “Risks from Learned Optimization in Advanced Machine Learning Systems”, arXiv:1906.01820 (2019).
Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei, “Deep Reinforcement Learning from Human Preferences”, arXiv:1706.03741 (2017).
Apollo Research, “Frontier Models Are Capable of In-Context Scheming” (2024); see also Apollo Research scheming-evaluations work through 2025.
Anthropic, Responsible Scaling Policy (2023, revised 2024); Google DeepMind, Frontier Safety Framework (2024); see also METR, “Common Elements of Frontier AI Safety Policies” (2024).