Project Aegis
Provably safe multi-agent coordination
Why this project exists
Multi-agent autonomy in open-world environments breaks the assumptions that make most existing safety methods work. Behavior that is safe in isolation can become catastrophic under concurrency, emergent dependencies, or environment perturbation. We need a framework that gives operators usable guarantees even when the underlying agents are learned, opaque, and evolving.
Aegis is our attempt at that framework. It does not try to make learned policies provably safe in their entirety. Instead, it surrounds them with formally specified envelopes — runtime monitors and structured constraints whose composition behavior we can analyze with formal methods.
How we work on it
Verified envelopes around learned policies
We pair every deployed policy with a formally specified safety envelope written in TLA+ and discharged through SMT solvers. The policy proposes; the envelope filters or vetoes. The envelope is small, auditable, and version-controlled. The policy is large, learned, and updated frequently. The contract between them is explicit.
Composition under environment uncertainty
Most formal-method safety work assumes a closed model of the environment. Aegis specifically targets cases where that model is partial. We use bounded perturbation analysis on the environment specification and conservative refinement when the runtime observation falls outside the verified region. The decoder boundary is itself an artifact we publish.
Multi-agent invariants
When N agents interact, individual envelopes are insufficient — a swarm can violate a global invariant that no individual agent intended to break. We formalize a small set of swarm-level invariants (no-collision, energy budget, latency contract, authority handoff) and derive per-agent local constraints sufficient to enforce them. The proof obligations are checked offline; the runtime monitors enforce.
Where the work stands
- ShippedQ4 2025
Single-agent envelope toolkit
Internal release of TLA+ specifications + SMT discharge harness for ten reference scenarios.
- In progressQ1 2026
Two-agent composition
Verified safety properties for two cooperating agents in a structured workflow domain.
- PlannedQ3 2026
Open-world swarm preview
Initial demonstration of N-agent invariant enforcement in a live operations setting; external red-team evaluation.
What we are still figuring out
- 01
Composability of verified envelopes
When two verified-envelope agents interact, the joint envelope is not always the conjunction. We are studying when composition is sound vs. requires re-verification.
- 02
Decoder-boundary handling at scale
How aggressively can the runtime monitor refine the envelope without sacrificing throughput? The latency budget is tight in real deployments.
- 03
Authority handoffs
Multi-agent systems require explicit handoffs of decision authority. Specifying handoff invariants without freezing the system is non-trivial.
- 04
Adversarial swarm members
Our current invariants assume cooperative agents. Robustness under a single Byzantine member is a much harder problem.
External references
- TLA+ specification languageLeslie LamportThe specification language we use for safety envelopes.
- Risks from Learned Optimization in Advanced Machine Learning SystemsHubinger et al. · 2019Foundational framing for why learned policies can violate intended invariants in subtle ways.
- Stress Testing Deliberative Alignment for Anti-Scheming TrainingApollo Research · 2024–25Behavioral evaluation methodology that motivates our envelope-as-defense-in-depth posture.
- Time Horizons of Frontier AI TasksMETR · 2025Empirical capability tracking used to calibrate envelope strictness.