The energy budget of cognition is the binding constraint of the next deployment phase. Cloud inference is bounded by datacenter power; edge inference is bounded by battery and thermal envelopes. The brain runs on twenty watts; the foundation models we deploy do not. The technical questions split into four. The first is whether neuromorphic and in-memory-compute architectures can close the energy gap between conventional inference and biological cognition at workloads that the deployment context cares about. The second is whether the software stacks for these architectures can be matured to the point where the workload-architecture mapping is engineering rather than research. The third is whether the deployment stack — quantization, distillation, hardware-software co-design — can carry foundation-policy capabilities onto edge hardware without the accuracy degradation that the field’s published benchmarks systematically understate. The fourth is whether the energy accounting itself is honest, since most published architecture comparisons omit data-movement, host-device-transfer, and surrounding-system costs in ways that flatter the architecture under evaluation. We work on the architectures that close some of the energy gap, on the programming models that make them usable, and on the methodological discipline that distinguishes deployable cognitive-computing substrates from research curiosities.
The four questions are different
The dominant inference architecture of 2026 is a parallel array of dense matrix-multiply units, fed by a memory hierarchy that spends a substantial fraction of its energy moving data rather than computing on it. This is fine for cloud workloads where the marginal cost of energy is bounded by industrial-scale procurement. It is not fine for the deployments where intelligence has to live close to the sensor: humanoids, autonomous vehicles, distributed-sensor networks, and any application where round-tripping to a datacenter is precluded by latency, privacy, or connectivity.
The energy gap between conventional inference and biological cognition is several orders of magnitude. The human brain operates at approximately 20 watts of total metabolic power, supports approximately 100 billion neurons with approximately 10^14 synapses, and runs continuously over multi-decade lifespans. A single forward pass of a frontier-class language model on conventional hardware draws on the order of hundreds of watts at the GPU and dissipates a large fraction of that energy in data movement rather than computation. Closing the gap is not a matter of incremental engineering on existing architectures. It requires architectural alternatives.
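As a rough sense of scale, the figures quoted above imply a per-synapse power budget by simple division. The sketch below works through that arithmetic. The function names and the mean event rate parameter are assumptions introduced for illustration; only the 20-watt and 10^14-synapse figures come from the text.

```python
BRAIN_POWER_W = 20.0  # approximate total metabolic power, as quoted in the text
SYNAPSES = 1e14       # approximate synapse count, as quoted in the text


def power_per_synapse_w(total_power_w: float, synapses: float) -> float:
    """Average power per synapse if the whole budget were split evenly.

    This is an upper bound: it attributes all metabolic power to synapses.
    """
    return total_power_w / synapses


def energy_per_event_j(total_power_w: float, synapses: float, mean_rate_hz: float) -> float:
    """Average energy per synaptic event at an ASSUMED mean event rate (Hz)."""
    return total_power_w / (synapses * mean_rate_hz)


per_synapse = power_per_synapse_w(BRAIN_POWER_W, SYNAPSES)
print(f"{per_synapse:.1e} W per synapse")  # 2.0e-13 W, about 0.2 picowatts
```

The per-event figure depends entirely on the assumed event rate, so it should be read as a parameterized estimate rather than a measurement.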
Several alternatives are now mature enough to be evaluated as serious candidates rather than as research curiosities. Intel’s Loihi 2 (Davies and colleagues 2021) is a digital neuromorphic processor with on-chip plasticity and event-driven communication.[1] IBM’s earlier TrueNorth (Merolla and colleagues 2014) established that million-neuron asynchronous chips were buildable.[2] BrainChip’s Akida and Mythic AI’s analog-compute parts have brought neuromorphic and in-memory-compute architectures to commercial availability. Cerebras’s wafer-scale engine and Groq’s deterministic-latency tensor processor represent a different bet: radical reorganization of the conventional architecture rather than departure from it. Memristor research continues to advance the analog-compute frontier; Sebastian and colleagues 2020 is the reference review.[3] Rueckauer and colleagues 2017 established a foundational methodology for obtaining spiking networks by converting trained conventional networks.[4]
The problem is that the software stacks for these architectures are immature. Spiking neural networks, the natural workload for neuromorphic substrates, do not yet train as well as conventional networks on the same data.[4] In-memory compute systems suffer from analog-noise accumulation that limits effective precision. Programming models are vendor-specific and changing. The connection between the architectures and the workloads we actually want to run is not yet clean.
The economics are worth being explicit about. Conventional GPU inference at the cloud margin runs at energy costs per token that are not currently the bottleneck on cloud deployment. The bottleneck shifts decisively at the edge: a humanoid running a foundation policy has, in round numbers, single-digit watts of inference budget, after locomotion and actuation costs are accounted for. Closing the gap between cloud-scale model capability and edge-scale energy envelope is the binding constraint on the most ambitious deployment trajectories, and it is not closeable with conventional architectures alone. The architectures described below represent specific bets about how the gap closes.
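The edge budget described above reduces to two lines of arithmetic: subtract locomotion and actuation from the platform's power, then divide what remains by the energy of one inference. The sketch below is illustrative only; every number in it is a placeholder chosen to land on a single-digit-watt budget, not a measurement of any platform.

```python
def inference_budget_w(platform_power_w: float, locomotion_w: float, actuation_w: float) -> float:
    """Power left for inference after locomotion and actuation are paid for."""
    budget = platform_power_w - locomotion_w - actuation_w
    if budget <= 0.0:
        raise ValueError("no power left for inference")
    return budget


def sustainable_rate_hz(budget_w: float, energy_per_inference_j: float) -> float:
    """Inferences per second the budget can sustain (watts / joules = 1/s)."""
    return budget_w / energy_per_inference_j


# Placeholder figures, chosen only to illustrate the calculation.
budget = inference_budget_w(platform_power_w=100.0, locomotion_w=70.0, actuation_w=25.0)
print(budget, sustainable_rate_hz(budget, energy_per_inference_j=0.5))  # 5.0 10.0
```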
The reason this work matters: physical-intelligence deployment, embedded multi-agent systems, and any application that requires inference at the edge all depend on closing the energy gap. The architectures that do so are at the threshold of seriousness. Getting the substrate right is the leverage point.
What the cognitive-computing program is, technically
We organize this work along four sub-strands.
Neuromorphic architectures
Neuromorphic processors implement spiking neural network dynamics in hardware, exploiting the sparsity of biological computation to achieve substantial energy reductions on the workloads they are well-matched to. Loihi 2 is our primary digital-neuromorphic reference platform.[1] TrueNorth remains the architectural reference for million-neuron asynchronous designs.[2] We work on workload mapping (which inference and learning problems are well-served by neuromorphic substrates and which are not) and on the development of training algorithms for spiking models that close the accuracy gap with conventional networks.
The most promising application surface in our internal work is event-driven sensing, where the input modality is itself sparse and asynchronous; the substrate matches the workload, and the energy advantage compounds. The discipline points include explicit workload-suitability characterization (so that the neuromorphic-substrate decision is a workload-architecture-fit decision rather than an architectural-aesthetic decision), explicit accuracy-energy trade-off characterization (so that the workload’s tolerance for the accuracy-degradation that neuromorphic substrates currently exhibit is quantified rather than assumed), and a preference for hybrid-architecture deployments where neuromorphic substrates handle the workloads they are well-matched to and conventional substrates handle the rest.
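To make the sparsity argument concrete, here is a minimal sketch of a discrete-time leaky integrate-and-fire neuron driven by a sparse event stream. It is a textbook-style toy, not the neuron model of Loihi 2 or any other part named above; the leak, threshold, and weight values are hypothetical. The point it illustrates is that synaptic work is done only when an input event arrives, so the operation count tracks input activity rather than the number of time steps.

```python
from dataclasses import dataclass


@dataclass
class LIFNeuron:
    """Discrete-time leaky integrate-and-fire neuron (illustrative parameters)."""
    leak: float = 0.9       # membrane decay factor per step (hypothetical)
    threshold: float = 1.0  # firing threshold (hypothetical)
    v: float = 0.0          # membrane potential

    def step(self, input_current: float) -> bool:
        """Decay, integrate the input, and report whether the neuron fired."""
        self.v = self.leak * self.v + input_current
        if self.v >= self.threshold:
            self.v = 0.0  # reset after a spike
            return True
        return False


def run_neuron(events, weight=0.6):
    """Drive one neuron with a 0/1 event stream.

    Returns (output_spikes, synaptic_ops). A synaptic operation is counted
    only on steps that carry an input event. This software loop still visits
    every step to apply the leak; event-driven hardware can avoid that too.
    """
    neuron = LIFNeuron()
    spikes = 0
    synaptic_ops = 0
    for event in events:
        current = 0.0
        if event:
            current = weight
            synaptic_ops += 1
        if neuron.step(current):
            spikes += 1
    return spikes, synaptic_ops


print(run_neuron([1, 1, 0, 0, 0, 0, 1, 0, 0, 0]))  # (1, 3)
```

Ten time steps cost three synaptic operations here; a dense layer evaluated on every step would cost ten.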
In-memory compute
In-memory compute substrates perform matrix-vector multiplication in the analog domain, inside the memory array itself, eliminating the data-movement cost that dominates the energy budget of conventional inference. Mythic AI’s analog-compute parts and the broader memristor literature [3] define the current frontier. We work on the precision-noise trade-off (analog compute is energy-efficient but precision-limited, and the question of which workloads tolerate the resulting noise is open) and on the integration of in-memory compute with conventional digital pipelines for the parts of the workload that require full precision.
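The precision-noise trade-off can be explored numerically before any hardware is involved. The sketch below compares an exact matrix-vector product against one in which every stored weight is read with additive Gaussian noise, and reports the resulting signal-to-noise ratio. The noise model, the function names, and the example values are all assumptions made for illustration; real devices show structured, state-dependent noise that this toy does not capture.

```python
import math
import random


def exact_matvec(weights, x):
    """Reference digital matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def analog_matvec(weights, x, noise_std, rng):
    """Matrix-vector product with additive Gaussian noise on every weight read.

    The noise model is an assumption chosen for simplicity.
    """
    return [
        sum((w + rng.gauss(0.0, noise_std)) * xi for w, xi in zip(row, x))
        for row in weights
    ]


def snr_db(reference, noisy):
    """Signal-to-noise ratio of the noisy output against the reference, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - n) ** 2 for r, n in zip(reference, noisy))
    return math.inf if noise == 0.0 else 10.0 * math.log10(signal / noise)


# Illustrative values only.
weights = [[0.5, -0.25, 0.75], [0.1, 0.9, -0.4]]
x = [1.0, 2.0, -1.0]
reference = exact_matvec(weights, x)
for noise_std in (0.0, 0.01, 0.05):
    noisy = analog_matvec(weights, x, noise_std, random.Random(0))
    print(noise_std, snr_db(reference, noisy))
```

Sweeping `noise_std` against a task-level accuracy metric, rather than raw SNR, is what would actually answer the question of which workloads tolerate the noise.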
Calibration drift over device lifetime is an open device-engineering problem; recalibration protocols that operate during deployment without taking the system offline are an open systems problem. The discipline points include explicit drift characterization over realistic-deployment lifetimes (so that the architecture’s reliability profile is measured rather than assumed), explicit noise-budget allocation across the inference pipeline (so that the parts of the workload that can tolerate analog noise are identified and the parts that cannot are isolated), and engagement with the device-physics community on the underlying memristor technology.
Edge inference
Edge inference is a systems problem more than an architecture problem. The workload — a foundation-model policy, possibly with multi-modal input — must run inside a power envelope of single-digit watts and a latency envelope of single-digit milliseconds. Quantization, distillation, sparsity exploitation, and pipeline scheduling each contribute. GroqChip’s deterministic-latency architecture is the reference for predictable inference at the latency end; Cerebras’s wafer-scale engine at the throughput end. BrainChip’s Akida is one of the few commercially available neuromorphic edge inference parts, and Mythic’s analog matrix processor a comparable in-memory-compute part.
We work on the deployment stack that carries policies trained in Physical Intelligence onto edge hardware without the typical accuracy degradation, and on the broader question of how a heterogeneous deployment stack — neuromorphic, analog, conventional — should partition a workload between substrates. The discipline points include explicit accuracy-preservation characterization across the deployment-stack steps (quantization, distillation, sparsity exploitation each have non-trivial accuracy implications), explicit latency-budget-allocation across the deployment pipeline, and a preference for deployment-stack architectures whose accuracy implications are characterizable rather than assumed.
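One way to make accuracy-preservation characterization concrete for a single deployment-stack step is to measure the error that step introduces in isolation. The sketch below does this for symmetric per-tensor int8 quantization, one common scheme among many; it is not presented as the scheme used in our deployment stack, and the helper names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the integer range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integers back to approximate real values."""
    return [q * scale for q in quantized]


def max_abs_error(original, restored):
    """Worst-case elementwise error introduced by the round trip."""
    return max(abs(o - r) for o, r in zip(original, restored))


# Hypothetical weights, for illustration only.
weights = [0.82, -1.27, 0.05, 0.0, 0.431]
q, scale = quantize_int8(weights)
error = max_abs_error(weights, dequantize(q, scale))
print(q, scale, error)  # the error is bounded by about scale / 2
```

The elementwise bound says nothing about task accuracy on its own; the same measure-each-step pattern would need to be repeated with an end-to-end metric after distillation and sparsification as well.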
Energy-efficient training
The energy cost of training has been the larger story of the last several years; deployment energy is the next one, but training energy continues to matter. We work on training algorithms that reduce the total energy of frontier-scale runs (sparse training, low-precision training, gradient-compression schemes), with particular attention to which efficiency gains preserve the empirical scaling-law trajectory and which silently degrade it.
There is a methodological hazard: efficiency gains that look identical on small-scale benchmarks may diverge from full-precision baselines at scale, and the divergence is sometimes not visible until the run is well beyond the scale at which it was budgeted. Honest scaling-law characterization of efficiency techniques is a research investment in its own right. The discipline points include explicit scaling-law evaluation of efficiency techniques (so that the small-scale benchmark agreement is verified at scale rather than assumed), explicit efficiency-accuracy trade-off characterization, and engagement with the broader scaling-law-research community on methodological questions.
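As one example of the gradient-compression family mentioned above, the sketch below implements top-k sparsification with error feedback: only the k largest-magnitude entries are transmitted each step, and whatever was dropped is carried forward and added to the next gradient. This is a generic, well-known scheme shown for illustration, not a description of our training stack; the function name and the toy gradients are hypothetical.

```python
def topk_with_error_feedback(grad, residual, k):
    """Keep the k largest-magnitude entries of (grad + residual).

    Returns (sent, new_residual). Entries that are not sent are stored in
    new_residual so their contribution is delayed rather than discarded.
    """
    corrected = [g + r for g, r in zip(grad, residual)]
    order = sorted(range(len(corrected)), key=lambda i: abs(corrected[i]), reverse=True)
    keep = set(order[:k])
    sent = [c if i in keep else 0.0 for i, c in enumerate(corrected)]
    new_residual = [0.0 if i in keep else c for i, c in enumerate(corrected)]
    return sent, new_residual


# Toy gradients, for illustration only.
residual = [0.0, 0.0, 0.0, 0.0]
for grad in ([0.1, -2.0, 0.5, 0.05], [0.1, 0.0, 0.5, 0.05]):
    sent, residual = topk_with_error_feedback(grad, residual, k=1)
    print(sent, residual)
```

In the second step the accumulated third entry wins the top-k slot even though no single gradient put it there. That delayed-update behavior is exactly the kind of effect that can look harmless on small benchmarks and needs checking against the full-precision baseline at scale.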
Definitional bounds
Before moving to the open problems, four exclusions are worth being explicit about.
Cognitive computing does not mean brain emulation. The program works on architectures inspired by principles of biological computation (sparsity, event-driven processing, in-memory operation), not on direct emulation of biological neural networks at the level of individual neurons or synapses. Popular-science framings of “brain-like AI” are not the program’s research substrate; the program targets architectures that exploit the same computational principles as biological cognition without requiring that they reproduce biological neural networks faithfully.
Cognitive computing does not mean neuromorphic-only. The architectural alternatives include neuromorphic substrates, in-memory-compute substrates, deterministic-latency tensor processors (Groq), wafer-scale engines (Cerebras), and conventional GPU/TPU substrates with aggressive optimization. The program funds work across the alternatives and treats workload-architecture-fit as the right framing rather than architectural-purity.
Cognitive computing does not mean single-architecture deployment. The deployment context — humanoids, autonomous vehicles, distributed-sensor networks — typically benefits from heterogeneous deployment in which different parts of the workload run on different substrates. The program treats heterogeneous deployment as the deployable-architecture default rather than a degenerate case.
Cognitive computing does not mean energy efficiency at any accuracy cost. The accuracy-preservation discipline is load-bearing. Architectures that achieve substantial energy efficiency at substantial accuracy cost are not deployable for most workloads, and the program treats accuracy-preservation as a co-objective with energy efficiency rather than as an afterthought.
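The heterogeneous-deployment default and the accuracy co-objective described above can be combined in a toy partitioner: each stage of a workload is assigned to the cheapest substrate that still meets the stage's precision requirement. This is a deliberately naive greedy sketch. The class names, substrate figures, and stage requirements are all placeholders, and the sketch ignores the cost of moving data across substrate boundaries, which in practice is a large part of the problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substrate:
    name: str
    energy_per_op_pj: float  # hypothetical figure, picojoules per operation
    precision_bits: int      # effective precision the substrate can deliver


@dataclass(frozen=True)
class Stage:
    name: str
    ops: float               # operations per inference for this stage
    min_precision_bits: int  # precision the stage needs to preserve accuracy


def partition(stages, substrates):
    """Assign each stage to the lowest-energy substrate that is precise enough."""
    plan = {}
    for stage in stages:
        feasible = [s for s in substrates if s.precision_bits >= stage.min_precision_bits]
        if not feasible:
            raise ValueError(f"no substrate meets the precision of {stage.name}")
        plan[stage.name] = min(feasible, key=lambda s: s.energy_per_op_pj).name
    return plan


# All numbers below are placeholders, not measurements.
substrates = [
    Substrate("neuromorphic", 0.1, 4),
    Substrate("analog_imc", 0.5, 6),
    Substrate("digital", 5.0, 16),
]
stages = [
    Stage("event_filter", 1e6, 4),
    Stage("feature_backbone", 1e9, 6),
    Stage("policy_head", 1e7, 12),
]
print(partition(stages, substrates))
```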
Open problems
- SNN training to parity. Spiking neural networks trained on the same data as conventional networks underperform them on standard benchmarks. The gap is closing, but it is not closed. Whether parity is achievable in principle, or only on a subset of workloads, is open.[4]
- Memristor reliability. Memristor devices exhibit drift, variability, and limited write endurance. Engineering them into systems that survive multi-year deployment is a real problem.[3]
- Integration with conventional inference stacks. Neuromorphic and in-memory-compute architectures are not drop-in replacements for GPU inference. The integration problem — which parts of a workload run on which substrate, and how the data crosses the boundaries — is open.
- Programming models for neuromorphic. Vendor SDKs are immature, vendor-specific, and changing. The general programming-model problem for event-driven asynchronous computation has been worked on for decades without a dominant winner.
- On-device continual learning. Continual learning at the edge — adaptation to a specific deployment environment without forgetting the pretrained behavior — is one of the most promising applications for neuromorphic substrates, and one of the least mature.
- Energy accounting. Comparing architectures requires an honest energy accounting that includes data movement, host-device transfers, and the energy cost of the surrounding system. Most published comparisons do not.
- Workload partitioning across heterogeneous substrates. The general question of how to partition a workload across neuromorphic, analog, and conventional substrates is open, and currently solved by ad-hoc engineering rather than principled methodology.
- Closed-loop materials discovery. Memristor materials, neuromorphic-suitable substrates, and energy-efficient analog devices benefit from closed-loop discovery pipelines. Project Synthesis is the program’s investment in this surface.
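The energy-accounting item in the list above lends itself to a simple reporting structure: record every term separately, so that a compute-only figure can never be quoted without the terms it leaves out. The sketch below is one hypothetical shape for such a record; the field names and the example values are assumptions, not a published standard or measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyBreakdown:
    """Per-inference energy in joules, split by where it is spent."""
    compute_j: float          # arithmetic on the accelerator itself
    data_movement_j: float    # on-chip and off-chip memory traffic
    host_transfer_j: float    # host-device transfers
    system_overhead_j: float  # surrounding system: host CPU, power conversion, cooling

    @property
    def total_j(self) -> float:
        return (self.compute_j + self.data_movement_j
                + self.host_transfer_j + self.system_overhead_j)

    @property
    def understatement_factor(self) -> float:
        """How far a compute-only figure understates the full-system cost."""
        return self.total_j / self.compute_j


# Placeholder values, for illustration only.
report = EnergyBreakdown(compute_j=1.0, data_movement_j=2.0,
                         host_transfer_j=0.5, system_overhead_j=0.5)
print(report.total_j, report.understatement_factor)  # 4.0 4.0
```

Comparing two architectures on `total_j` at matched accuracy, rather than on `compute_j`, is the discipline the open problem calls for.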
Three scenarios
Scenario A — Architecture-software-stack stagnation
The first failure mode is the architecture-software-stack-stagnation scenario. Neuromorphic and in-memory-compute hardware advances rapidly; the software stacks lag; the workloads that the deployment context cares about cannot be efficiently mapped to the available hardware; the architectural alternative does not deploy at scale. The mitigation is software-stack investment in parallel with the hardware investment, with explicit workload-architecture-fit characterization driving the software-stack priorities.
Scenario B — Cloud-inference dominance
The second failure mode is the cloud-inference-dominance scenario. Cloud inference costs continue to fall, network bandwidth to edge devices continues to improve, and the edge-inference deployment context is incrementally absorbed into a cloud-inference deployment context with thin-client edge devices. The energy-efficiency-at-the-edge concern becomes less urgent than expected. The mitigation is the deployment-context-realism discipline: physical robotics, autonomous vehicles, and privacy-sensitive applications continue to require edge inference for latency, connectivity, and privacy reasons that cloud inference cannot address, and the program prioritizes those applications.
Scenario C — Successful staged deployment
The third scenario, which we treat as the base case if the architectural and software-stack work is competent, is staged deployment: neuromorphic and in-memory-compute substrates are deployed first in workloads where the workload-architecture fit is best (event-driven sensing, sparse-computation workloads, low-precision-tolerant workloads), the deployment envelope is gradually widened as the software stacks mature, and the heterogeneous deployment stack becomes the default for edge deployment.
What technical work bears on this
This pillar connects to Physical Intelligence on the edge-inference side: the deployment of foundation policies on humanoids and autonomous agents is bounded by the power and latency envelope that this work addresses. It connects to Quantum AI on the architectural-alternatives side: both pillars investigate substrates outside the conventional digital-CMOS frontier. Project Synthesis, our closed-loop materials-discovery effort, intersects this work directly: the search for memristor materials, neuromorphic-suitable substrates, and energy-efficient analog devices is one of its target application areas. The connection to ENERA is through on-board compute for orbital arrays: kilometre-scale orbital arrays have on-board compute budgets bounded by the trade-off between harvested power and the share of it allocated to on-board systems, and the cognitive-computing work directly determines what fraction of the harvested power is available for autonomous control.
Where to read further
Physical Intelligence treats the foundation-policy substrate for which edge inference is the deployment surface. Quantum AI treats the complementary architectural-alternatives bet. Project Synthesis treats the closed-loop materials-discovery infrastructure. ENERA treats the orbital-power applications.
Footnotes
1. Mike Davies, Andreas Wild, Garrick Orchard, et al. (Intel), “Advancing Neuromorphic Computing With Loihi: A Survey of Results and Outlook”, Proceedings of the IEEE 109, no. 5 (2021): 911–934.
2. Paul A. Merolla, John V. Arthur, Rodrigo Alvarez-Icaza, et al. (IBM), “A million spiking-neuron integrated circuit with a scalable communication network and interface”, Science 345, no. 6197 (2014): 668–673.
3. Abu Sebastian, Manuel Le Gallo, Riduan Khaddam-Aljameh, and Evangelos Eleftheriou, “Memory devices and applications for in-memory computing”, Nature Nanotechnology 15 (2020): 529–544.
4. Bodo Rueckauer, Iulia-Alexandra Lungu, Yuhuang Hu, Michael Pfeiffer, and Shih-Chii Liu, “Conversion of Continuous-Valued Deep Networks to Efficient Event-Driven Networks for Image Classification”, Frontiers in Neuroscience 11 (2017): 682.