Introducing Apik Systems: A Frontier AI Research Lab

Today we are publicly launching Apik Systems. We are a frontier research laboratory working on the autonomous intelligence infrastructure that we believe a post-scarcity civilization will require. The lab has been operating quietly for the better part of a year. We are emerging now because the technical landscape has shifted faster than most of us expected, and because we believe the gap between what is becoming possible and what is currently being built has widened to the point where another laboratory, with a different operating thesis, is genuinely useful.

The thesis is straightforward to state and harder to deliver. We think the next decade is not principally about training larger models. It is about composing learned systems with verified ones, embodying those systems in the physical world, and giving them the long-horizon coordination machinery they need to be useful at planetary scale without being unsafe at planetary scale. That composition problem, not raw capability, is the bottleneck. We exist to attack it directly.

Why now

There is a particular kind of moment in a research field where three or four trend lines that were each plotted independently begin to interfere with each other, and the engineering implications of the interference are larger than any of the individual trends. We think we are in such a moment.

The first trend is the rapid maturation of frontier models. Training compute has increased by roughly 4x per year for several years; pre-training scaling has shown diminishing but positive returns; reasoning-heavy post-training, distillation, and tool-use scaffolds have continued to extract more capability per parameter. We no longer need to argue whether a sufficiently capable model can plan, write code, or reason over multiple steps. The argument is now about reliability, faithfulness, and the cost of errors in the long tail.

The second trend is agent infrastructure. Two years ago an “agent” usually meant a thin loop calling a single model with bolted-on tools. Today the better systems decompose work across heterogeneous models, persist memory across sessions, plan over hours or days, and, increasingly, write and verify their own subroutines. The reliability is still poor relative to what shipping into critical paths would require, but the architectural picture has clarified. We know what an agent is. We know what its failure modes are. And the empirical work on long-horizon evaluations — task lengths in the tens of hours rather than minutes — is starting to give us measurement traction on autonomy in a way the field did not have access to before.

The third trend is embodiment. The cost curves on actuators, the reliability of vision-language-action models, and the rate at which simulation-to-real transfer has improved have together pushed humanoid platforms into a regime where pilot deployments are not science fiction.

The fourth trend, which is less remarked upon but in our view equally important, is the rise of formal methods as a practical engineering discipline. SMT solvers, theorem provers, and formal verification toolchains have been quietly absorbed into mainstream engineering practice: for compilers, distributed systems, cryptographic protocols, and now, slowly, for the safety envelopes around learned policies.

The four trend lines together suggest something specific. The era of standalone capability research is ending. The era of integrating capability into composable, verifiable, embodied, coordinated systems is beginning. That is the era Apik is built for.

What we’re building

Our roadmap is organized as a stack — the Apik Civilization Stack — and we describe it that way because the layers compose, depend on each other, and need to be reasoned about as a whole. There are five layers. We are working on all five concurrently because we believe leaving any of them to chance is what produces the worst outcomes.

The bottom layer is Human Intelligence, and our product expression of it is Senwitt. Senwitt is a personal cognition surface — the substrate through which an individual interfaces with the rest of the stack. Memory, attention, deliberation, and consent live here. We are deliberate about putting humans at the bottom of the stack rather than the top because we want the rest of the stack to be answerable to people, not the other way around.

The second layer is Artificial Intelligence, expressed as Brello AI. This is the frontier-model layer — the reasoning, perception, and generation engine on which the higher layers depend. Brello is where most of the public-facing capability work happens, and where we publish most of our model and evaluation research.

The third layer is Autonomous Agents, our Agentic Systems program. This is the layer that turns capability into action: durable agents that plan, execute, and learn from outcomes over long horizons. Most of our safety work — including Project Aegis — sits at the boundary between this layer and the layer below.

The fourth layer is Physical Intelligence, our robotics and humanoid program. The boundary between policy and physics is where many of the hardest problems live: real-time control, sensor fusion under adversarial conditions, energy budgets, and the question of how to give an agent a body without giving it leverage it should not have.

The top layer is Economic Orchestration, our planetary coordination program. This is the most speculative of the five and also, we think, the most important. The thesis here is that markets are extraordinary coordination devices for a particular class of problem and inadequate for several others, and that a learned coordination substrate (operating beneath, not above, existing institutions) could close gaps that have been open for a long time. We treat this layer as research, not as a product, and we aim to be explicit about what it is and is not designed to do. The full argument lives in our research agenda and is expanded in subsequent posts.

Three internal projects sit at the intersection of these layers and are worth naming directly because they will appear repeatedly in the posts that follow this one. Project Aegis is our formal-verification multi-agent safety program — a runtime envelope that wraps learned policies in formally specified invariants. Project Q-Core is our quantum work, focused specifically on reducing the cryogenic overhead of error-correction stacks through a co-design of topological encoding and learned decoders. Project Synthesis is our closed-loop materials discovery program, in which a planning agent proposes candidate materials, a robotic lab synthesizes and characterizes them, and the resulting data refines the next round of proposals. All three projects are research projects, not products, and each will receive its own deeper writeup over the coming weeks.
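
As a purely illustrative sketch of the runtime-envelope idea described for Project Aegis, and not a description of our actual implementation, the Python below wraps a policy so that every proposed action is checked against a set of invariants before it is released. All names here (`Invariant`, `Envelope`, `max_speed`, the toy policy and fallback) are hypothetical. The invariants are ordinary Python predicates standing in for formally specified ones; a real system would rely on a verifier rather than on hand-written runtime checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration only.
State = Dict[str, float]
Action = Dict[str, float]


@dataclass(frozen=True)
class Invariant:
    """A named safety condition over a (state, proposed action) pair."""
    name: str
    holds: Callable[[State, Action], bool]


class Envelope:
    """Wraps a policy and releases only actions that satisfy every invariant."""

    def __init__(
        self,
        policy: Callable[[State], Action],
        invariants: Sequence[Invariant],
        fallback: Callable[[State], Action],
    ) -> None:
        self.policy = policy
        self.invariants = list(invariants)
        self.fallback = fallback

    def act(self, state: State) -> Tuple[Action, List[str]]:
        """Return the released action and the names of any violated invariants."""
        proposed = self.policy(state)
        violated = [
            inv.name for inv in self.invariants if not inv.holds(state, proposed)
        ]
        if violated:
            # Assumption: the fallback is trusted by construction and is
            # not re-checked against the invariants in this sketch.
            return self.fallback(state), violated
        return proposed, []


# Toy example: a policy that requests whatever speed the state asks for,
# bounded by a single speed-limit invariant.
env = Envelope(
    policy=lambda s: {"speed": s["target"]},
    invariants=[Invariant("max_speed", lambda s, a: a["speed"] <= 1.0)],
    fallback=lambda s: {"speed": 0.0},
)

print(env.act({"target": 0.5}))  # ({'speed': 0.5}, [])
print(env.act({"target": 3.0}))  # ({'speed': 0.0}, ['max_speed'])
```

Returning the violated invariant names alongside the action is one design choice among several; it keeps each intervention by the envelope visible to whoever audits the system.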

How we work

Apik is a research-first laboratory. We have a small number of full-time researchers and engineers, a slightly larger pool of collaborators on specific projects, and a strong bias toward doing fewer things at higher fidelity. Our research agenda is public and we update it quarterly. We publish progress notes (this is one of them) roughly weekly during normal operations and more frequently around major project milestones. When we are uncertain, we say so. When we have negative results, we publish them. When we ship safety-relevant artifacts, we ship the code and the eval suites alongside the writeups.

Our safety posture is documented in our Responsible Development Policy and our Safety Principles. We treat safety as load-bearing engineering work, not as a press function. Every project at Apik has a named safety lead, an externally reviewable threat model, and a deployment gate that the safety lead has the authority to hold closed. We collaborate with other labs and external auditors, and we run a coordinated-disclosure channel for safety-relevant findings about our systems. The pre-deployment review process is described in the Responsible Development Policy and is non-negotiable for any system that touches a non-research environment.

We are hiring. The strongest signal we look for is taste: the kind that produces sharp problem decomposition under uncertainty. The roles and the rough team shape are listed on our careers page. If you would rather collaborate from outside, we run a small number of structured residencies and a much larger volume of project-specific collaborations. We are particularly interested in researchers whose work spans more than one of the layers above; the interfaces between layers are where we believe the highest-leverage work lives, and they are systematically under-staffed across the field.

Where to read further

If you want the engineering substance of how the stack composes, start with The Apik Civilization Stack, which walks through each layer’s interfaces, dependencies, and named technical challenges. If you want to understand the safety thesis underneath everything we do, read the introduction to Project Aegis, which lays out the formal-envelope-around-learned-policy architecture we are committing to. If you want the philosophical frame — the why, before the how — the Manifesto is the cleanest single document we have written.

We are aware that publicly launching a lab with these ambitions invites both scrutiny and skepticism. We welcome both. We expect to be wrong about specific technical bets and we expect to revise them in public. We do not expect to be wrong about the underlying thesis: that the next decade is about composition, embodiment, and coordination, and that the laboratory that gets those three right will have done something materially useful for the civilization that has to live with the result. The work will speak — or fail to speak — for itself.

— Rehan Temkar, Co-founder, Apik Systems