The Apik Civilization Stack: A Five-Layer Walkthrough

In our launch post we sketched the Apik Civilization Stack at a high level. This post is the engineering walkthrough. We will treat each of the five layers in turn: what it does, what its inputs and outputs are, what depends on it, what it depends on, and the technical problems we consider open. We will close with a discussion of how the layers compose, where the oversight handoffs sit, and how we have organized the system to isolate failures. The stack is not purely a software architecture. It is a research and product organization that happens to look like an architecture, and the layering is how we keep ourselves honest about what depends on what.

Layer 1: Human Intelligence (Senwitt)

The bottom layer is the human one. The product expression is Senwitt, a personal cognition surface that mediates between an individual and everything above. It carries four kinds of state: persistent memory, current attention, active intent, and consent. Memory means the user’s facts, preferences, work artifacts, and the longitudinal traces of how those have evolved. Attention means what the user is currently looking at and what they have asked to be done. Intent means the planning structure that turns goals into tasks. Consent means the policy layer that gates which artifacts and actions any higher layer is permitted to touch.
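As a minimal sketch of the consent idea described above, the policy layer can be pictured as a default-deny check that every higher layer must pass before touching an artifact. This is not Senwitt's actual data model: the names Grant, ConsentPolicy, Action, and the path-like artifact identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    READ = "read"
    WRITE = "write"
    ACT = "act"


@dataclass(frozen=True)
class Grant:
    """One consent rule (hypothetical shape): a layer, an action, an artifact prefix."""
    layer: str
    action: Action
    artifact_prefix: str


@dataclass
class ConsentPolicy:
    grants: list[Grant] = field(default_factory=list)

    def allows(self, layer: str, action: Action, artifact: str) -> bool:
        # Default-deny: a request passes only if an explicit grant matches it.
        return any(
            g.layer == layer
            and g.action == action
            and artifact.startswith(g.artifact_prefix)
            for g in self.grants
        )


# Illustrative policy: the agent layer may read work memory and nothing else.
policy = ConsentPolicy([Grant("agent", Action.READ, "memory/work/")])
can_read = policy.allows("agent", Action.READ, "memory/work/notes.txt")
can_write = policy.allows("agent", Action.WRITE, "memory/work/notes.txt")
```

Keeping the rule shape this small is one way to make a consent vocabulary easy to audit, at the cost of expressiveness; a real vocabulary would presumably need time bounds and revocation as well.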

This layer depends principally on three things: a robust local-first storage substrate, a permissioning model that can express fine-grained intent without exhausting the user, and a model layer above it that can be queried under tight latency budgets without leaking memory upstream. What depends on it is everything. Higher layers cannot function safely without the consent surface that this layer exposes; that surface is, in a strong sense, what makes the upward composition tractable.

Named technical challenges include the design of consent vocabularies that are expressive enough to be useful and small enough to be auditable; the question of how to do retrieval over personal memory in a way that preserves locality guarantees; and the longitudinal evaluation problem of measuring whether a personal cognition surface is improving the user’s reasoning over months, not minutes. Our current research at this layer is documented in research / cognitive computing.

Layer 2: Artificial Intelligence (Brello AI)

The second layer is the frontier-model layer. Brello AI is our public-facing model surface and the engine that the layers above depend on for reasoning, perception, and generation. The work here is the kind that any frontier lab does: pre-training, post-training, alignment, evaluation, and the operational craft of running large models reliably under load. We do not believe that scale alone closes the capability gap, and we do not believe that the gap closes without scale; both bets are wrong, and the right bet is to push capability and safety together.

This layer depends on the underlying compute substrate, on the quality of training data and synthetic data pipelines, and on the evaluation infrastructure that tells us whether the latest model is actually better in the ways we care about. What depends on it is the entire upper stack: agents are bottlenecked on reasoning quality; humanoids are bottlenecked on perceptual reliability; orchestration is bottlenecked on simulation fidelity. Improvements at this layer ripple upward almost mechanically.

Named technical challenges include faithful chain-of-thought, the long tail of reasoning errors under distribution shift, sample efficiency in post-training, and the specific evaluation problem of distinguishing models that are better from models that are merely better fit to our evals. We are also working on inference-time reasoning architectures that we will write up separately. The substantive research surface is at research, and the public model documentation lives at products / Brello.

Layer 3: Autonomous Agents (Agentic Systems)

The third layer is where capability becomes action. Agentic Systems is the program that turns models into durable, planning, tool-using agents that operate over long horizons. The substantive research questions here are not principally about model capability; they are about reliability under composition, memory architectures that support multi-day work, planning with rollback, and the design of tool-use protocols that fail safely when the underlying tool fails noisily.
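One common way to approximate planning with rollback is to pair every tool call with a compensating action and unwind completed steps in reverse when a later step fails. The sketch below illustrates that general pattern, not the Agentic Systems implementation; Step, run_with_rollback, and the example tool names are hypothetical.

```python
from typing import Callable, NamedTuple


class Step(NamedTuple):
    name: str
    do: Callable[[], None]    # the tool call itself
    undo: Callable[[], None]  # compensating action if a later step fails


def run_with_rollback(steps: list[Step]) -> bool:
    """Run steps in order; on any failure, undo completed steps in reverse."""
    done: list[Step] = []
    for step in steps:
        try:
            step.do()
        except Exception:
            # The failed step did not complete, so only earlier steps are undone.
            for finished in reversed(done):
                finished.undo()
            return False
        done.append(step)
    return True


log: list[str] = []


def failing_tool() -> None:
    raise RuntimeError("tool failed noisily")


ok = run_with_rollback([
    Step("reserve", lambda: log.append("reserve"), lambda: log.append("release")),
    Step("charge", failing_tool, lambda: log.append("refund")),
])
```

The pattern only helps where a compensating action exists. Irreversible tool calls have no meaningful undo, so they would need a different safeguard, such as confirmation before execution.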

This layer depends on the model layer below it (for reasoning and tool selection), on a memory and storage substrate (for persistence across sessions), and crucially on a verified-policy substrate that we are building under Project Aegis. What depends on it is the physical-intelligence layer (robots are agents with bodies) and the orchestration layer (orchestration is multi-agent at planetary scale).

Named technical challenges include long-horizon credit assignment, the cost-of-error problem in irreversible tool calls, multi-agent emergent behavior under shared resources, faithful self-reporting of an agent’s own state, and the operational problem of running fleets of agents without their failures correlating in dangerous ways. We track the field’s empirical work on long-horizon evaluations — the METR task-horizon study and similar — and our internal evals are calibrated against external ones where possible. The detailed research thread is at research / autonomous agents.

Layer 4: Physical Intelligence

The fourth layer is bodies. Our Physical Intelligence program covers humanoid platforms, manipulation, navigation, and the sensor-fusion substrate that connects perception to control. We treat physical intelligence as a hard composition problem: it inherits the reasoning of the model layer, the planning of the agent layer, and adds a real-time control substrate that has its own latency and reliability constraints.

This layer depends on the agent layer (for high-level intent), the model layer (for vision-language reasoning), and a hardware platform that is reliable under the duty cycles a useful humanoid will see. It depends, too, on a simulation environment of high enough fidelity that policies trained in it transfer to the world without catastrophic surprise. What depends on it is, narrowly, any task that requires interaction with the physical world, and broadly, the entire question of whether the stack can be useful outside of digital substrates.

Named technical challenges include sim-to-real transfer under partial observability, manipulation of deformable objects, the calibration drift problem in sensor stacks operating outside controlled environments, the energy budget for sustained operation, and the safety envelope problem — formally specifying what the body is and is not permitted to do, and verifying that the policy respects the envelope. The line between this layer and Project Aegis is, in our view, the most consequential interface in the entire stack. Research details at research / physical intelligence.

Layer 5: Economic Orchestration

The fifth layer is the most speculative and the most important. Economic Orchestration is our planetary-coordination program. The thesis, which we develop more fully in Coordination as Computation, is that markets are extraordinary at price-discovery for short-horizon, well-specified, locally-knowable goods, and inadequate at long-horizon, externality-heavy, network-coupled coordination problems. A learned coordination substrate — sensor-fusion across institutions, simulation-rich planning over alternative futures, faster reaction times under crisis — could close some of those gaps. It cannot and should not replace the institutions whose job is to express collective preferences; that distinction matters and is non-negotiable.

This layer depends on every layer below it. It is the upward terminus of the stack. What depends on it is the question of whether the rest of the stack adds up to something useful at civilizational scale rather than merely at individual or enterprise scale.
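The upward dependencies described so far can be written down as a small graph and checked mechanically. The sketch below uses Python's standard graphlib module; the layer labels are shorthand, and the edge set is a simplification that records only "each layer depends on the layers beneath it". The Layer 1 dependency on the model layer is added separately to show that it makes the graph cyclic.

```python
from graphlib import CycleError, TopologicalSorter

# Each key depends on the layers in its set (simplified from the text above).
DEPENDS_ON: dict[str, set[str]] = {
    "model": {"human"},
    "agents": {"human", "model"},
    "physical": {"human", "model", "agents"},
    "orchestration": {"human", "model", "agents", "physical"},
}


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every layer follows the layers it depends on."""
    return list(TopologicalSorter(graph).static_order())


def has_cycle(graph: dict[str, set[str]]) -> bool:
    try:
        build_order(graph)
    except CycleError:
        return True
    return False


tower_order = build_order(DEPENDS_ON)

# Adding Layer 1's reliance on the model layer turns the tower into a cyclic graph.
with_feedback = {**DEPENDS_ON, "human": {"model"}}
feedback_is_cyclic = has_cycle(with_feedback)
```

A strict tower admits a build order; once the stated coupling between Layer 1 and the model layer is included, no such order exists.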

Named technical challenges include the multi-objective optimization problem at planetary scale, the robustness of mechanism design under adversarial reporting, the legitimacy problem (any orchestration substrate that does not have a clear story for why its outputs should be trusted is a non-starter), the simulation fidelity problem (you cannot orchestrate what you cannot simulate), and the human-authority retention problem (we will not build a system that does not preserve, by construction, the right of human institutions to override it). Research at research / economic orchestration.

How the stack composes

The layers do not merely sit on top of each other; they have specified interfaces, and we treat those interfaces as first-class engineering artifacts. Three categories of interface matter most.

First, oversight handoffs. The user-to-agent boundary is where consent flows upward and accountability flows downward. The agent-to-physical boundary is where the formal envelope of Project Aegis sits — a runtime monitor with formally specified invariants that the policy must respect. The physical-to-orchestration boundary is where the local autonomy of an embodied agent is composed with the global coordination decisions of the orchestration layer; here the design constraint is that local agents retain the authority to refuse a global instruction if it violates their local invariants.
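The refusal rule at the physical-to-orchestration boundary can be sketched as a runtime check of named invariants against each incoming command. This is an illustration of the runtime-monitor idea only. It is not the Project Aegis specification language, it performs no formal verification, and the Command fields and numeric limits are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Command:
    joint_speed: float    # rad/s (hypothetical field)
    gripper_force: float  # newtons (hypothetical field)


Invariant = Callable[[Command], bool]

# Illustrative limits only; real envelopes would come from a formal specification.
INVARIANTS: dict[str, Invariant] = {
    "speed_limit": lambda c: abs(c.joint_speed) <= 1.5,
    "force_limit": lambda c: 0.0 <= c.gripper_force <= 40.0,
}


def violations(command: Command) -> list[str]:
    """Return the names of violated invariants; an empty list means admissible."""
    return [name for name, holds in INVARIANTS.items() if not holds(command)]


def accept_global_instruction(command: Command) -> bool:
    # The local agent refuses any global instruction that breaks a local invariant.
    return not violations(command)


safe = Command(joint_speed=1.0, gripper_force=10.0)
unsafe = Command(joint_speed=2.0, gripper_force=50.0)
```

Returning the names of the violated invariants, rather than a bare boolean, gives the orchestration layer something concrete to log when a local agent refuses.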

Second, failure isolation. The layers are designed so that the failure of one layer degrades the layers above it gracefully rather than catastrophically. If the orchestration layer is unavailable, the physical layer reverts to local autonomous operation. If the agent layer fails, the physical layer reverts to a safe default. If the model layer is unavailable, the agent layer surfaces the failure to the user rather than hallucinating a response. This is the architectural rendering of the safety principle that no layer should be a single point of failure leading to civilization-scale harm.
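The three fallback rules above can be sketched as a small decision function plus an explicit error path. The names PhysicalMode, physical_mode, ModelUnavailable, and agent_reply are hypothetical, and the precedence (an agent-layer failure overrides everything else) is an assumption, since the text does not say which rule wins when several layers fail at once.

```python
from enum import Enum
from typing import Callable


class PhysicalMode(Enum):
    COORDINATED = "follow orchestration"
    LOCAL_AUTONOMY = "local autonomous operation"
    SAFE_DEFAULT = "safe default"


def physical_mode(orchestration_up: bool, agent_up: bool) -> PhysicalMode:
    # Assumed precedence: without the agent layer, the body drops to its safe default.
    if not agent_up:
        return PhysicalMode.SAFE_DEFAULT
    if not orchestration_up:
        return PhysicalMode.LOCAL_AUTONOMY
    return PhysicalMode.COORDINATED


class ModelUnavailable(Exception):
    """Raised by a (hypothetical) model client when the model layer is down."""


def agent_reply(query: str, model_call: Callable[[str], str]) -> str:
    try:
        return model_call(query)
    except ModelUnavailable:
        # Surface the failure to the user instead of inventing an answer.
        return "The model layer is unavailable; no answer was generated."
```

Making the degraded modes explicit values, rather than implicit behavior, means each fallback path can be exercised directly in tests.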

Third, research feedback loops. The interfaces are also where most of our research happens. Improvements in interpretability at the model layer feed directly into the agent layer’s ability to construct trustworthy plans. Improvements in formal verification under Project Aegis feed directly into the physical layer’s deployment envelope. Improvements in simulation fidelity at the orchestration layer feed back down into the agent layer’s planning quality. The stack is not a product hierarchy. It is a feedback graph that we have flattened, for communication, into a vertical picture.

There is a fourth class of consideration that does not show up cleanly in the layered picture but that shapes how we organize the work: lateral interfaces between projects. Project Aegis lives notionally between the agent and physical layers, but its specification language is co-developed with the interpretability work that lives at the model layer, because the most useful invariants are the ones that can reference internal model state rather than only external behavior. Project Q-Core sits underneath the model layer architecturally — it is, ultimately, a substrate for compute — but the learned-decoder thread within it borrows methodology from the agent layer’s reinforcement-learning work. Project Synthesis spans the model, agent, and physical layers in a single closed loop; it is the most explicit demonstration we have, internally, that the stack is a graph rather than a tower. We mention these laterals because they explain why our team structure does not map cleanly onto the layer structure, and why the cross-layer collaboration channels are weighted as heavily in our internal coordination as the within-layer ones are.

A note on what the stack is not. It is not a strategy for replacing existing institutions; the orchestration layer’s design constraint is precisely that it is subordinate to existing political authority, a point we develop separately in Coordination as Computation. It is not a market roadmap; the layers are research programs first and product surfaces second, and we have been deliberate about not confusing the two. It is not a claim that we will build all five layers single-handedly; we expect to do specific load-bearing work at each layer and to collaborate widely with other laboratories, vendors, and academic groups on the rest. The framing is meant to clarify the engineering picture, not to assert ownership of it.

We will write deeper engineering notes on each layer over the coming months. The next several posts go into specific projects. If you have read this far and the picture seems internally coherent — that is the design intent. If you have read this far and the picture seems too neat, you should know that the lab spends most of its actual hours on the messy interfaces, not on the layers themselves.

— Rehan Temkar, Co-founder, Apik Systems