MAC: Multi-Agent Control — Symbolic MIMO Channel Framework
Multi-Agent Control (MAC) framework for auraison. Simulation code: control-plane/backend/mac/
1. Introduction
This document formalizes multi-agent context communication as a symbolic multi-input multi-output (MIMO) channel, using the language of information theory. Agents exchange symbolic messages over this channel, and the messages carry context: a structured representation of knowledge, beliefs, or environment state.
2. Multi-Agent Communication as a Channel
Consider a system of $N$ agents $A_1, \dots, A_N$.
Each agent maintains an internal context state $C_i(t) \in \mathcal{C}$,
where $\mathcal{C}$ is the space of symbolic representations (graphs, tokens, plans, embeddings, etc.).
When agents communicate, they transmit messages $M_i(t) = f_i(C_i(t))$
that are functions of their internal context.
The messages pass through a communication channel and are received by other agents.
3. MIMO Symbolic Channel Formulation
Let
$$\mathbf{M}(t) = \big(M_1(t), \dots, M_N(t)\big)$$
be the vector of transmitted messages.
The channel produces received messages
$$\mathbf{Y}(t) = \big(Y_1(t), \dots, Y_N(t)\big)$$
with conditional probability
$$p\big(\mathbf{Y}(t) \mid \mathbf{M}(t)\big).$$
This is formally a MIMO communication channel: $N$ inputs, $N$ outputs, and a joint transition law that couples them.
Each agent then updates its context:
$$C_i(t+1) = g_i\big(C_i(t), Y_i(t)\big).$$
Thus the full system evolution is
$$\mathbf{C}(t+1) = G\big(\mathbf{C}(t), \mathbf{Y}(t)\big), \qquad \mathbf{Y}(t) \sim p\big(\cdot \mid \mathbf{M}(t)\big), \qquad \mathbf{M}(t) = F\big(\mathbf{C}(t)\big).$$
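A minimal sketch of this encode–channel–update loop, assuming a toy symbolic vocabulary, a simple symbol-dropping channel, and hypothetical `encode`/`channel`/`update` functions (none of these are part of the MAC codebase):

```python
import random

VOCAB = ["find", "grasp", "place", "mug", "table"]  # toy symbolic alphabet

def encode(context):
    """f_i: map an agent's context (a set of symbols) to a message (a list of symbols)."""
    return sorted(context)

def channel(messages, drop_prob=0.1):
    """p(Y|M): each symbol is independently dropped with probability drop_prob."""
    return [[s for s in m if random.random() > drop_prob] for m in messages]

def update(context, received):
    """g_i: merge received symbols into the agent's context."""
    return context | set(received)

# N agents, each starting with a partial view of the task
contexts = [{"find", "mug"}, {"grasp"}, {"place", "table"}]

for t in range(3):
    msgs = [encode(c) for c in contexts]              # M(t) = F(C(t))
    recv = channel(msgs)                              # Y(t) ~ p(. | M(t))
    pooled = [s for m in recv for s in m]             # broadcast: pool all received symbols
    contexts = [update(c, pooled) for c in contexts]  # C(t+1) = G(C(t), Y(t))

print(contexts)
```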
4. Symbolic Nature of the Channel
Unlike classical communication systems where symbols are bits, here the symbols belong to a structured alphabet $\mathcal{X}$.
Examples:
| Symbol type | Meaning |
|---|---|
| Natural language tokens | reasoning traces |
| PDDL operators | plans |
| JSON tool calls | actions |
| embeddings | semantic summaries |
| scene graphs | environment state |
Thus messages are sequences $M_i = (x_1, x_2, \dots, x_L)$ with each $x_k \in \mathcal{X}$.
5. Information-Theoretic Quantities
Channel capacity
The maximum information exchange between agents:
$$C = \max_{p(\mathbf{M})} I(\mathbf{M}; \mathbf{Y})$$
This represents the maximum context transfer rate.
Mutual information between agents
For two agents $i$ and $j$:
$$I(C_i; C_j) = H(C_i) - H(C_i \mid C_j)$$
measures shared knowledge.
Communication increases this mutual information.
Context compression
Agents typically compress context before transmission:
$$M_i = \phi_i(C_i)$$
Information theory interpretation:
$$R_i < H(C_i)$$
where
- $R_i$ = message rate
- $H(C_i)$ = entropy of the context.
Large contexts require summarization or embeddings.
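A minimal sketch of how the shared-knowledge quantity $I(C_i; C_j)$ could be estimated empirically, assuming two agents' contexts are reduced to discrete symbol histograms (the plug-in estimator below is a simplification and is biased for small samples):

```python
import numpy as np

def mutual_information(joint_counts):
    """Plug-in estimate of I(C_i; C_j) from a joint count table over symbol pairs."""
    p = joint_counts / joint_counts.sum()
    pi = p.sum(axis=1, keepdims=True)   # marginal distribution of agent i
    pj = p.sum(axis=0, keepdims=True)   # marginal distribution of agent j
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / (pi @ pj)[mask])))

# joint occurrence counts of 3 context symbols observed at two agents (toy data)
counts = np.array([[30, 5, 1],
                   [4, 25, 3],
                   [2, 6, 24]], dtype=float)
print(f"I(C_i; C_j) ~= {mutual_information(counts):.3f} bits")
```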
6. Noise and Ambiguity
Language communication introduces channel noise: the received message $Y_j$ is, with some probability, a corrupted version of the transmitted message $M_i$.
Examples:
| Noise source | Effect |
|---|---|
| ambiguous language | semantic distortion |
| hallucination | channel corruption |
| lossy summarization | information loss |
| tool failures | message drop |
This makes the channel stochastic.
7. Context Synchronization Problem
Agents aim to minimize context divergence
$$D_{\mathrm{KL}}\big(p(C_i) \,\|\, p(C_j)\big)$$
where $D_{\mathrm{KL}}$ is the Kullback–Leibler divergence.
Communication protocols attempt to enforce
$$D_{\mathrm{KL}}\big(p(C_i) \,\|\, p(C_j)\big) \to 0 \quad \text{as } t \to \infty.$$
This is analogous to distributed consensus in multi-agent systems.
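A minimal sketch of measuring that divergence, assuming each agent's context is summarized as a probability distribution over a shared symbol vocabulary (the smoothing constant is an arbitrary choice to keep the divergence finite):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) in bits, with additive smoothing so zero entries stay finite."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log2(p / q)))

# symbol distributions of two agents before and after a communication round (toy data)
agent_i_before, agent_j_before = [0.7, 0.2, 0.1], [0.2, 0.5, 0.3]
agent_i_after,  agent_j_after  = [0.5, 0.3, 0.2], [0.4, 0.35, 0.25]

print("divergence before:", kl_divergence(agent_i_before, agent_j_before))
print("divergence after: ", kl_divergence(agent_i_after, agent_j_after))
```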
8. Relation to Agent Architectures
In modern AI systems:
| Component | Information-theory role |
|---|---|
| LLM reasoning | encoder |
| tool output | channel observation |
| prompt construction | message encoding |
| memory store | channel state |
| agent update | decoder |
Thus the system becomes a pipeline of the form
$$C_i \;\xrightarrow{\text{encode}}\; M_i \;\xrightarrow{\text{channel}}\; Y_j \;\xrightarrow{\text{decode}}\; C_j'.$$
9. Emergent Communication
If agents learn to communicate, the system optimizes
$$\max_{f_i, g_i} \; \sum_{i \ne j} I(C_i; C_j)$$
subject to bandwidth constraints.
This leads to emergent symbolic protocols, similar to:
- differentiable communication channels in multi-agent RL
- language emergence studies.
10. Multi-Agent Communication Graph
Communication often follows a graph
$$G = (V, E)$$
where
- $V$ = agents
- $E$ = communication channels.
Information then flows only along edges: agent $i$ can transmit to agent $j$ only if $(i, j) \in E$,
which resembles network information theory.
11. Interpretation for Agentic AI Systems
In modern agent frameworks such as Claude Code agents, Ray distributed agents, OpenClaw multi-agent orchestration, and robotics VLA agents, each agent acts as:
Context → Encoder → Message → Channel → Decoder → Updated Context
Context = memory + environment state + reasoning traces.
12. Practical Implications
This formulation explains several practical phenomena.
Context window limits
A bandwidth constraint: the prompt caps the message rate at $R \le B$ tokens.
Summarization agents
Compression operators $\phi: C \mapsto \hat{C}$ with $|\hat{C}| \ll |C|$.
Planning agents
Encoding structured plans instead of raw context.
Vector stores
Externalizing channel memory.
13. Summary of Channel Formulation
Multi-agent context communication can be modeled as:
$$M_i(t) = f_i\big(C_i(t)\big), \qquad \mathbf{Y}(t) \sim p\big(\cdot \mid \mathbf{M}(t)\big), \qquad C_i(t+1) = g_i\big(C_i(t), Y_i(t)\big)$$
where:
- agents are distributed information processors
- communication is a symbolic MIMO channel
- the objective is maximizing mutual context information under bandwidth and noise constraints.
14. Rate-Distortion Interpretation
The rate–distortion perspective provides a precise way to understand why modern agent architectures — LLM context windows, summarization, vector retrieval, and memory stores — appear to work well. In a multi-agent or single-agent reasoning system, the fundamental constraint is limited channel bandwidth, which forces compression of context before reasoning or communication.
14.1 Context Window as a Bandwidth Constraint
Let an agent possess a full internal state $C$,
which includes
- observations
- memory
- tool outputs
- reasoning traces
- environment state.
The entropy of this context is $H(C)$.
However, the LLM can only receive a limited number of tokens.
If the context window allows $B$ tokens, then the transmitted representation $X$ must satisfy
$$|X| \le B.$$
Thus
$$X = \phi(C)$$
is a compressed representation of the context.
This is exactly the rate constraint
$$R \le B$$
in rate–distortion theory.
14.2 Distortion of Context
Compression inevitably loses information.
Define a distortion function
$$d(C, \hat{C}),$$
where $\hat{C}$ is the reconstructed context used by the model.
Examples of distortion:
| Distortion type | Example |
|---|---|
| semantic loss | missing facts |
| temporal loss | missing earlier events |
| reasoning loss | lost intermediate thoughts |
| structural loss | incomplete graph |
14.3 Rate-Distortion Function
The optimal trade-off is defined by
$$R(D) = \min_{p(\hat{C} \mid C)} I(C; \hat{C})$$
subject to
$$\mathbb{E}\big[d(C, \hat{C})\big] \le D.$$
Interpretation:
- $R$ = tokens transmitted
- $D$ = context error
- $I(C; \hat{C})$ = preserved information.
14.4 Why Summaries Work
A summarization agent computes
$$S = \phi(C), \qquad |S| \le B,$$
designed to minimize distortion for a given token budget.
The ideal summarizer approximates
$$\phi^* = \arg\min_{\phi} \; \mathbb{E}\big[d(C, \hat{C})\big]$$
subject to
$$|S| \le B.$$
Thus summarization is lossy compression optimized for reasoning relevance.
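A minimal sketch of how this trade-off could be measured empirically. Here `summarize(text, max_tokens)` and `task_error(summary)` are placeholder names for whatever summarizer and downstream evaluation an experiment actually uses; sweeping the token budget traces an empirical rate–distortion curve:

```python
def rate_distortion_curve(corpus, budgets, summarize, task_error):
    """For each token budget B (rate), measure downstream error (distortion proxy)."""
    curve = []
    for budget in budgets:
        summary = summarize(corpus, max_tokens=budget)   # lossy compression at rate <= B
        distortion = task_error(summary)                 # proxy for E[d(C, C_hat)]
        curve.append((budget, distortion))
    return curve

if __name__ == "__main__":
    # stand-in summarizer and scorer, purely illustrative
    summarize = lambda text, max_tokens: " ".join(text.split()[:max_tokens])
    task_error = lambda summary: 1.0 / (1.0 + len(summary.split()))  # toy distortion
    corpus = "agent observations reasoning traces tool outputs " * 50
    for rate, dist in rate_distortion_curve(corpus, [50, 100, 200, 500], summarize, task_error):
        print(f"B={rate:4d} tokens -> distortion {dist:.4f}")
```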
14.5 Vector Retrieval as Side Information
RAG systems modify the channel model.
Instead of sending the full context, the system transmits a query $q$,
which retrieves external information $K$
from a memory store.
The LLM receives
$$(X, K).$$
This resembles source coding with side information (Wyner–Ziv coding).
The rate–distortion function becomes the conditional one,
$$R(D \mid K) = \min_{p(\hat{C} \mid C, K)} I(C; \hat{C} \mid K) \;\le\; R(D).$$
The external memory reduces the required bandwidth.
14.6 Multi-Agent Memory Sharing
In a multi-agent system:
Agent $i$ transmits compressed context
$$M_i = \phi_i(C_i)$$
to another agent $j$.
The receiving agent reconstructs
$$\hat{C}_i = g_j(M_i)$$
and updates its state
$$C_j \leftarrow u_j(C_j, \hat{C}_i).$$
The efficiency depends on
$$I(C_i; \hat{C}_i),$$
the preserved information between agents.
14.7 Why Planning Helps
Structured plans reduce entropy.
Raw context entropy: $H(C)$.
Plan representation entropy: $H(P)$,
with
$$H(P) \ll H(C).$$
Example:
Instead of transmitting
observations + reasoning + history
the agent transmits
PLAN:
1. find mug
2. grasp mug
3. place mug on table
This acts as semantic compression.
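A minimal sketch of how this entropy gap could be checked, using compressed size (gzip) as a crude proxy for entropy; the example strings are illustrative only:

```python
import gzip

def compressed_bytes(text: str) -> int:
    """Crude entropy proxy: size of the gzip-compressed UTF-8 encoding."""
    return len(gzip.compress(text.encode("utf-8")))

raw_context = (
    "Observation: the mug is on the counter near the sink. "
    "Reasoning: approach the counter, avoid the chair, extend the arm. "
    "History: previous grasp failed due to occlusion; retried after moving left. "
) * 20

plan = "PLAN:\n1. find mug\n2. grasp mug\n3. place mug on table\n"

print("H(C) proxy (raw context):", compressed_bytes(raw_context), "bytes")
print("H(P) proxy (plan):       ", compressed_bytes(plan), "bytes")
```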
14.8 Agent Architectures as Communication Systems
Modern agent systems implicitly implement rate–distortion optimization.
| Component | Role |
|---|---|
| summarizer agent | lossy compression |
| vector DB | side information |
| planner | semantic compression |
| memory store | external entropy reservoir |
| context window | channel capacity |
14.9 Optimal Memory Architecture
From an information theory standpoint, the optimal architecture contains three layers.
- Short-term memory (prompt) — High fidelity, limited bandwidth.
- Semantic memory (vector store) — Medium fidelity retrieval.
- Episodic archive (lakehouse / logs) — High entropy, rarely accessed.
This hierarchy approximates successive refinement coding.
14.10 Interpretation for Robotics and VLA Agents
In physical AI systems (e.g., VLA models controlling robots):
The agent must compress its observations (images, proprioception, language instructions, history)
into a representation usable by the policy.
Vision encoders act as rate-limited encoders:
$$z = E(o),$$
where $z$ is a low-dimensional latent.
The robot policy receives
$$a = \pi(z).$$
This is again a rate–distortion optimized representation.
14.11 Implication for Agent Scaling
Scaling agent systems is largely about optimizing information flow.
Three main strategies appear:
- Increase channel capacity — larger context windows.
- Improve compression — better summarization.
- Add side information — RAG memory.
14.12 Key Insight
The central constraint of agentic AI systems is not compute but information bandwidth.
The architecture that wins is the one that best solves
$$\min_{\phi} \; \mathbb{E}\big[d(C, \hat{C})\big] \quad \text{subject to} \quad R \le B,$$
which is precisely the rate–distortion problem.
15. Soft Symbols and Semantic Distortion
Once symbols are represented as continuous embeddings rather than discrete tokens, the classical Hamming distance becomes inappropriate. Hallucination in this setting corresponds to semantic drift — a sequence of symbols that deviates from the meaning of the correct answer — rather than bit flips. The problem becomes one of semantic distortion in a continuous representation space.
15.1 Soft Symbol Representation
Let a symbolic vocabulary be
$$\mathcal{X} = \{x_1, \dots, x_V\}.$$
Each symbol is mapped to an embedding
$$\phi(x) \in \mathbb{R}^d.$$
Example
"cat" → [0.12, -0.33, ..., 0.87]
"dog" → [0.10, -0.29, ..., 0.91]
A sequence
$$s = (x_1, \dots, x_L)$$
becomes
$$\Phi(s) = \big(\phi(x_1), \dots, \phi(x_L)\big).$$
Thus the channel now transmits vectors instead of discrete tokens.
15.2 Continuous MIMO Channel
Each agent transmits a compressed embedding representation
$$\mathbf{e}_i = \phi(M_i).$$
The channel corrupts the vectors
$$\mathbf{y}_i = \mathbf{e}_i + \mathbf{n}_i,$$
where
$$\mathbf{n}_i \sim \mathcal{N}(0, \sigma^2 I).$$
This models
- reasoning noise
- summarization distortion
- LLM generation variability.
The receiver performs fusion
$$\hat{\mathbf{e}} = \frac{1}{k} \sum_{i=1}^{k} \mathbf{y}_i.$$
15.3 Semantic Distortion Metric
Instead of Hamming distance we measure distortion in embedding space.
A natural choice is cosine distance
$$d(\mathbf{a}, \mathbf{b}) = 1 - \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|}.$$
Sequence distortion:
$$D(s, \hat{s}) = \frac{1}{L} \sum_{i=1}^{L} d\big(\phi(x_i), \phi(\hat{x}_i)\big).$$
15.4 Hallucination as Semantic Deviation
Let
$$s^* = (x_1^*, \dots, x_L^*)$$
be the ground-truth semantic sequence.
A predicted sequence $\hat{s}$
is considered hallucinated if
$$D(s^*, \hat{s}) > \tau$$
for some semantic tolerance threshold $\tau$.
Thus hallucination becomes semantic deviation beyond tolerance.
15.5 Multi-Agent Noise Suppression
Suppose each agent sends
$$\mathbf{y}_i = \mathbf{e} + \mathbf{n}_i, \qquad i = 1, \dots, k.$$
If noise is independent, the optimal estimator is
$$\hat{\mathbf{e}} = \frac{1}{k} \sum_{i=1}^{k} \mathbf{y}_i.$$
Noise variance becomes
$$\mathrm{Var}(\hat{\mathbf{e}} - \mathbf{e}) = \frac{\sigma^2}{k}.$$
Therefore
$$\hat{\mathbf{e}} \to \mathbf{e}$$
as the number of agents increases.
This provides a theoretical justification for multi-agent reasoning reducing hallucination.
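A minimal numerical check of the $\sigma^2/k$ variance reduction under the stated independence assumption, using synthetic Gaussian noise (per-coordinate MSE compared against $\sigma^2/k$):

```python
import numpy as np

rng = np.random.default_rng(2)
d, sigma, trials = 32, 0.5, 2000
e = rng.normal(size=d)                                  # true embedding

for k in (1, 2, 4, 8, 16):
    errors = []
    for _ in range(trials):
        y = e + rng.normal(scale=sigma, size=(k, d))    # k independent noisy observations
        e_hat = y.mean(axis=0)                          # fused estimate
        errors.append(np.mean((e_hat - e) ** 2))
    print(f"k={k:2d}: empirical MSE={np.mean(errors):.4f}  (sigma^2/k={sigma**2 / k:.4f})")
```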
15.6 Semantic Majority Voting
Instead of token voting, we perform embedding consensus.
Algorithm:
```
for each position i:
    gather embeddings from agents
    compute centroid
    choose symbol whose embedding is nearest to centroid
```
This is equivalent to minimum semantic distortion decoding.
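A minimal implementation sketch of this decoder, assuming a small synthetic vocabulary embedding matrix and synthetic noise (both purely illustrative):

```python
import numpy as np

def semantic_majority_vote(agent_embeddings, vocab_embeddings):
    """Minimum-semantic-distortion decoding: for each position, average the agents'
    embeddings and pick the vocabulary symbol nearest to the centroid (cosine)."""
    def unit(v):
        return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)

    vocab_unit = unit(vocab_embeddings)                   # (V, d)
    decoded = []
    for position in zip(*agent_embeddings):               # k embeddings per position
        centroid = unit(np.mean(position, axis=0))        # (d,)
        decoded.append(int(np.argmax(vocab_unit @ centroid)))
    return decoded

# toy example: 4 vocabulary symbols in 8 dimensions, 3 agents, sequence length 5
rng = np.random.default_rng(0)
vocab = rng.normal(size=(4, 8))
truth = rng.integers(0, 4, size=5)
agents = [vocab[truth] + rng.normal(scale=0.6, size=(5, 8)) for _ in range(3)]

print("true symbols:   ", truth.tolist())
print("decoded symbols:", semantic_majority_vote(agents, vocab))
```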
15.7 Compression in Embedding Space
Agents may compress embeddings using a projection
$$\mathbf{z} = W \mathbf{e},$$
where
$$W \in \mathbb{R}^{m \times d}, \qquad m < d.$$
This models
- summarization
- reasoning traces
- compressed plans.
Decoding reconstructs
$$\hat{\mathbf{e}} = W^{+} \mathbf{z},$$
similar to MIMO linear receivers.
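A minimal sketch of that compress/reconstruct step using a random projection and its pseudo-inverse; the dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 64, 16                      # original and compressed embedding dimensions
W = rng.normal(size=(m, d))        # compression (projection) matrix

e = rng.normal(size=d)             # original embedding
z = W @ e                          # compressed representation sent over the channel
e_hat = np.linalg.pinv(W) @ z      # linear reconstruction, as in a MIMO linear receiver

relative_error = np.linalg.norm(e - e_hat) / np.linalg.norm(e)
print(f"relative reconstruction error at {m}/{d} compression: {relative_error:.3f}")
```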
15.8 Python Extension Concept
Replace discrete tokens with embeddings.
Example sketch:
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def semantic_distance(a, b):
    """Cosine distance between two embedding vectors."""
    return 1 - cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]

def mimo_fusion(embeddings):
    """Fuse noisy agent embeddings by averaging (optimal for i.i.d. noise)."""
    return np.mean(embeddings, axis=0)

def detect_hallucination(true_seq, pred_seq, tau):
    """Flag a sequence as hallucinated when mean semantic distortion exceeds tau."""
    d = np.mean([
        semantic_distance(t, p)
        for t, p in zip(true_seq, pred_seq)
    ])
    return d > tau
```
15.9 Experimental Demonstration
The project experiment becomes:
- generate semantic sequence
- embed tokens
- simulate noisy channel
- compare
  - single agent
  - vs multi-agent fusion
- measure
  - semantic distortion
  - hallucination probability.
Plot:
- Agents vs hallucination rate
- Agents vs distortion
- Compression vs distortion
15.10 Vector Gaussian MIMO Interpretation
With embeddings the channel becomes a vector Gaussian MIMO channel
$$\mathbf{Y} = H \mathbf{E} + \mathbf{N}.$$
Decoding attempts to estimate $\mathbf{E}$.
Hallucination corresponds to large reconstruction error in semantic space.
This aligns the LLM hallucination problem with
- rate–distortion theory
- distributed estimation
- semantic communications theory.
16. Turbo Codes and Iterative Belief Propagation
The connection to turbo codes and iterative belief propagation arises because both multi-agent systems and turbo decoders attempt to estimate latent variables from multiple noisy observations through iterative probabilistic refinement. Once symbols are represented as soft embeddings, the multi-agent communication system becomes mathematically close to turbo decoding and belief propagation.
16.1 Soft Symbols and Log-Likelihoods
In classical channel coding, a received symbol is not decoded as a hard bit but as a soft value, typically a log-likelihood ratio (LLR):
$$L(y) = \log \frac{p(y \mid x = 0)}{p(y \mid x = 1)}.$$
These soft values allow iterative algorithms to refine estimates.
In the embedding formulation, each symbol has a vector representation $\phi(x) \in \mathbb{R}^d$.
A noisy observation is
$$\mathbf{y} = \phi(x) + \mathbf{n}.$$
Instead of an LLR over two symbols, we now have a distribution over the vocabulary:
$$p(x \mid \mathbf{y}) \;\propto\; p(x)\, \exp\!\left(-\frac{\|\mathbf{y} - \phi(x)\|^2}{2\sigma^2}\right).$$
This is directly analogous to soft decoding.
16.2 Multi-Agent Observations as Parallel Channels
Suppose multiple agents produce semantic observations
$$\mathbf{y}_1, \dots, \mathbf{y}_k.$$
Each observation is
$$\mathbf{y}_i = \phi(x) + \mathbf{n}_i.$$
This resembles parallel channels in coding theory.
The optimal estimator combines likelihoods:
$$p(x \mid \mathbf{y}_1, \dots, \mathbf{y}_k) \;\propto\; p(x) \prod_{i=1}^{k} p(\mathbf{y}_i \mid x).$$
Taking logs,
$$\log p(x \mid \mathbf{y}_1, \dots, \mathbf{y}_k) = \log p(x) + \sum_{i=1}^{k} \log p(\mathbf{y}_i \mid x) + \text{const}.$$
Thus every agent contributes soft evidence.
This is exactly how soft information is accumulated in turbo decoding.
16.3 Iterative Belief Propagation Interpretation
Consider a factor graph.
Nodes:
- latent symbols
- agent observations
- sequence constraints (grammar, reasoning structure)
Edges represent dependencies.
The joint distribution factorizes over the graph,
$$p(x_{1:L}, \mathbf{y}) \;\propto\; \prod_{a} \psi_a(x_{\partial a}) \prod_{i} p(\mathbf{y}_i \mid x_i).$$
Belief propagation updates messages between variable and factor nodes:
$$\mu_{x \to a}(x) \propto \prod_{b \ne a} \mu_{b \to x}(x), \qquad \mu_{a \to x}(x) \propto \sum_{x_{\partial a} \setminus x} \psi_a(x_{\partial a}) \prod_{x' \ne x} \mu_{x' \to a}(x').$$
These messages iteratively refine the estimate of the symbol sequence.
This is structurally identical to turbo decoding loops.
16.4 Hallucination as Decoding Error
In coding theory:
$$\text{decoding error} \iff \hat{x} \ne x.$$
In the semantic case:
$$\text{hallucination} \iff \hat{x} \not\approx x,$$
where approximation is defined in embedding space.
Thus hallucination corresponds to
$$P\big(D(s^*, \hat{s}) > \tau\big),$$
or equivalently semantic distortion exceeding a threshold.
16.5 Turbo Code Analogy
Turbo codes contain:
- two encoders
- interleaver
- iterative decoder exchanging soft information.
Mapping to multi-agent reasoning:
| Turbo component | Agent system |
|---|---|
| encoder 1 | reasoning agent |
| encoder 2 | retrieval agent |
| interleaver | context transformation |
| noisy channel | LLM generation noise |
| soft decoder | consensus reasoning |
| iterations | debate / refinement loop |
Each iteration improves the posterior
$$p(x \mid \text{all agent evidence}).$$
16.6 Multi-Agent Debate as Iterative Decoding
Consider agents exchanging beliefs
$$b_i(x), \qquad i = 1, \dots, k.$$
Iteration rule:
$$b_i^{(t+1)}(x) \;\propto\; p(\mathbf{y}_i \mid x) \prod_{j \in \mathcal{N}(i)} b_j^{(t)}(x).$$
This resembles loopy belief propagation.
The process converges to a consensus distribution over symbols.
16.7 Semantic Channel Capacity
For embeddings the channel becomes a vector Gaussian channel
$$\mathbf{y} = \mathbf{e} + \mathbf{n}, \qquad \mathbf{n} \sim \mathcal{N}(0, \sigma^2 I).$$
Capacity (per dimension):
$$C = \frac{1}{2} \log_2\!\left(1 + \frac{P}{\sigma^2}\right).$$
Multiple agents increase the effective SNR.
Noise variance becomes
$$\frac{\sigma^2}{k}.$$
Thus hallucination probability decreases exponentially with agent count.
16.8 Practical Algorithm
A turbo-like iterative fusion algorithm can be implemented.
Example sketch:
```python
import numpy as np

def iterative_semantic_fusion(agent_embeddings, vocab_embeddings, iterations=5):
    """Turbo-style fusion: repeatedly fold each agent's soft evidence into a
    posterior belief over the vocabulary, renormalizing after every update."""
    belief = np.ones(len(vocab_embeddings)) / len(vocab_embeddings)
    for _ in range(iterations):
        for y in agent_embeddings:
            # Gaussian-channel likelihood of each vocabulary embedding given observation y
            likelihood = np.exp(-np.linalg.norm(
                vocab_embeddings - y, axis=1
            )**2)
            belief = belief * likelihood
            belief = belief / belief.sum()
    return belief
```
This produces a posterior distribution over symbols.
16.9 Interpretation for LLM Systems
This viewpoint explains why the following techniques reduce hallucination:
- self-consistency decoding
- multi-agent debate
- reflection loops
- tool verification
All of them act as additional parity constraints on the latent semantic sequence.
The system is effectively performing error-correcting decoding of meaning.
16.10 Key Insight
Multi-agent reasoning architectures behave like semantic error-correcting codes.
The latent meaning sequence is encoded through multiple reasoning paths and noisy language generation processes. Iterative belief updates gradually remove inconsistencies, analogous to turbo decoding removing channel noise.
17. Chain-of-Thought as Parity Checks
The analogy between chain-of-thought (CoT) reasoning and parity constraints in error-correcting codes (particularly LDPC codes) becomes precise once reasoning steps are treated as latent variables that impose structural constraints on the final answer. Under this view, hallucination is analogous to decoding error in a noisy channel, and intermediate reasoning provides redundancy that enables error correction.
17.1 Latent Reasoning Graph
Let the correct semantic answer be a latent variable $a$.
A reasoning trace consists of intermediate steps
$$r_1, r_2, \dots, r_T.$$
Each step imposes a constraint relating the answer and other steps,
for example
$$f_t(r_t, r_{t-1}, a) = 0.$$
The full reasoning structure can be represented as a factor graph:
Nodes:
- latent answer
- reasoning steps
- observations (prompt, retrieved facts)
Edges encode dependencies.
17.2 Channel Noise in LLM Generation
LLM generation introduces noise: the generated steps and answer $(\hat{r}_1, \dots, \hat{r}_T, \hat{a})$ are corrupted versions of the intended $(r_1, \dots, r_T, a)$,
where the noise may represent:
- token sampling randomness
- incomplete context
- reasoning mistakes.
Without reasoning steps the model directly predicts
$$\hat{a} \sim p(a \mid \text{prompt})$$
from the prompt, equivalent to single-shot decoding.
This is fragile.
17.3 Chain-of-Thought as Redundant Encoding
When the model produces reasoning steps, the answer is not produced independently.
Instead, the answer is generated jointly with the trace:
$$p(a, r_1, \dots, r_T \mid \text{prompt}) = p(a \mid r_{1:T}, \text{prompt}) \prod_{t=1}^{T} p(r_t \mid r_{<t}, \text{prompt}).$$
Thus the reasoning trace creates redundant constraints.
This resembles a linear code
$$H\mathbf{x} = 0,$$
where
- $\mathbf{x}$ = codeword bits
- $H$ = parity check matrix.
In the reasoning case, the constraints take the form
$$f_t(a, r_{1:T}) = 0, \qquad t = 1, \dots, T.$$
Each reasoning step acts like a parity check equation.
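A minimal numerical illustration of the parity-check idea, using a small binary linear code (the matrix below is an arbitrary toy example, not a specific standard code); the reasoning analogy replaces bit constraints with consistency checks between steps and the answer:

```python
import numpy as np

# Parity-check matrix of a small (7,4) binary linear code: H @ x = 0 (mod 2)
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def parity_violations(x):
    """Number of parity-check equations violated by word x."""
    return int(np.sum((H @ x) % 2))

codeword = np.array([1, 0, 1, 1, 0, 1, 0])   # satisfies all checks
corrupted = codeword.copy()
corrupted[2] ^= 1                            # a single "hallucinated" bit flip

print("violations (clean):    ", parity_violations(codeword))
print("violations (corrupted):", parity_violations(corrupted))
```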
17.4 LDPC-Style Factor Graph
An LDPC code is represented as a bipartite graph.
Variable nodes: the codeword bits $x_1, \dots, x_n$.
Check nodes enforce parity constraints $x_{i_1} \oplus x_{i_2} \oplus \dots \oplus x_{i_m} = 0$ over small subsets of bits.
In the reasoning analogy:
Variable nodes:
- answer token embeddings
- reasoning step embeddings.
Check nodes:
- logical constraints
- arithmetic relationships
- consistency with retrieved facts.
Graph structure:
```
      answer node
      /    |    \
    r1     r2    r3
      \    |    /
      constraints
```
17.5 Belief Propagation Decoding
In LDPC decoding, messages propagate between nodes.
Message from variable node to check node: $\mu_{v \to c}(x)$.
Message from check node to variable node: $\mu_{c \to v}(x)$.
Iterative updates refine probability estimates.
In reasoning systems:
- reasoning steps update beliefs about the answer
- the answer updates beliefs about steps.
The process resembles
- self-reflection
- debate
- verification loops.
17.6 Self-Consistency as Monte Carlo Decoding
Self-consistency sampling generates multiple reasoning paths
$$r^{(1)}, r^{(2)}, \dots, r^{(m)}.$$
Each produces a candidate answer.
The final answer is selected by majority vote or likelihood.
This approximates marginalizing the posterior
$$p(a \mid \text{prompt}) = \sum_{r} p(a \mid r, \text{prompt})\, p(r \mid \text{prompt}),$$
similar to ensemble decoding.
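A minimal sketch of this Monte Carlo view, where `sample_reasoning_path(prompt)` is a toy stand-in for one stochastic chain-of-thought sample that returns a candidate answer:

```python
from collections import Counter
import random

def sample_reasoning_path(prompt):
    """Toy stand-in for one stochastic chain-of-thought sample returning an answer."""
    # correct answer "12" most of the time, occasional hallucinations
    return random.choices(["12", "14", "9"], weights=[0.7, 0.2, 0.1])[0]

def self_consistency(prompt, num_samples=20):
    """Monte Carlo decoding: sample m reasoning paths and take the modal answer,
    approximating argmax_a sum_r p(a | r) p(r | prompt)."""
    answers = Counter(sample_reasoning_path(prompt) for _ in range(num_samples))
    return answers.most_common(1)[0][0], answers

answer, votes = self_consistency("What is 3 * 4?")
print("selected answer:", answer, "| vote counts:", dict(votes))
```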
17.7 Hallucination as Parity Violation
If a reasoning step contradicts another step, i.e. some constraint $f_t(a, r_{1:T}) \ne 0$,
the system detects the inconsistency.
Examples:
Arithmetic reasoning
2+3 = 5
5+4 = 9
Answer = 8
Constraint violation reveals an error.
Thus reasoning steps act as error-detecting redundancy.
17.8 Information-Theoretic Interpretation
Suppose the final answer has entropy $H(a)$.
Adding reasoning steps increases the transmitted information to
$$H(a, r_1, \dots, r_T) \ge H(a),$$
but because the trace is redundant with the answer,
$$H(a \mid r_1, \dots, r_T) \ll H(a).$$
This is exactly the mechanism used in error-correcting codes.
17.9 Implications for Multi-Agent Systems
If multiple agents generate reasoning traces
$$r^{(1)}_{1:T}, \; r^{(2)}_{1:T}, \; \dots,$$
the system forms a large joint constraint graph over the shared latent answer $a$.
More constraints reduce the feasible answer space.
Thus hallucination probability decreases.
17.10 Visualization of the Analogy
Coding theory:
message → encoder → noisy channel → decoder
LLM reasoning:
prompt → reasoning trace → noisy generation → verification / consensus
Redundancy in reasoning plays the role of coding gain.
17.11 Key Insight
Chain-of-thought reasoning functions as a semantic error-correcting code. Intermediate steps introduce redundancy that constrains the answer space, allowing iterative inference mechanisms — similar to turbo decoding or belief propagation — to correct errors introduced by the stochastic generation process.
18. Critical Review and Testable Hypotheses
Added 2026-03-10. The preceding sections present an elegant narrative connecting multi-agent LLM systems to information theory. Several claims are metaphorical rather than mathematical. Below is a critical analysis and 24 testable hypotheses organized by topic.
18.1 Cross-Cutting Weaknesses
| # | Issue | Impact |
|---|---|---|
| W1 | MIMO misnomer — the formulation describes agents independently observing the same source and fusing results. This is SIMO (single-input, multiple-output) / diversity combining, not true MIMO. True MIMO requires cross-channel interference (off-diagonal H matrix entries). | Scaling predictions are wrong: diversity gain ~ log k, not spatial multiplexing gain ~ k. |
| W2 | i.i.d. noise assumption — all theoretical gains (σ²/k, exponential error decay with agent count) require independent agent errors. Agents sharing the same LLM weights, training data, and prompt produce correlated errors. | Actual effective variance is σ²(1+(k-1)ρ)/k. For ρ→1 (identical agents, identical prompts) there is zero noise reduction. |
| W3 | Additive Gaussian noise on embeddings is untested — the most critical modeling assumption. LLM errors are structured, multimodal, and context-dependent, not i.i.d. Gaussian perturbations. | The entire soft-symbol formulation (§15), noise suppression proof, and channel capacity formula depend on this. |
| W4 | Analogies without structural verification — turbo codes, LDPC, Wyner-Ziv are invoked by analogy but the necessary conditions (interleaver design, check node sparsity, side-information independence) are not verified. | Claims about coding gain, belief propagation convergence, and rate reduction are suggestive but unproven. |
18.2 Hypothesis Map
Task 1: MIMO Symbolic Communication (auraison-ncq.1)
- H1.1 — Multi-agent context exchange on independent tasks behaves as SIMO (diversity gain ~ log k), not MIMO (capacity gain ~ k).
  - Test: Measure mutual information I(C_i; C_j) before/after communication for k=2..8 agents on a shared reasoning task; fit to log(k) vs linear k.
- H1.2 — Correlated agent errors (shared LLM backbone, same prompt) reduce the effective diversity gain below the independent-noise bound σ²/k.
  - Test: Compare hallucination rates for k agents using the same vs different LLMs; measure noise correlation coefficient ρ and verify effective variance is σ²(1+(k-1)ρ)/k.
- H1.3 — The communication graph topology G=(V,E) affects convergence rate of context synchronization D(C_i||C_j)→0.
  - Test: Compare star, ring, and fully-connected topologies for N=5 agents; measure rounds to reach D < ε.
Task 2: Rate-Distortion Interpretation (auraison-ncq.2)
- H2.1 — LLM summarization approximates the rate-distortion bound.
  - Test: For a fixed corpus C, generate summaries at budgets B ∈ {50, 100, 200, 500, 1000} tokens. Measure downstream task accuracy (proxy for distortion). Plot R vs D and compare against the Shannon lower bound for a fitted source model.
- H2.2 — RAG reduces the effective rate needed to achieve a given distortion level.
  - Test: Compare task accuracy at fixed context budget B with and without RAG retrieval. Measure the rate savings ΔR = R_no_RAG(D) - R_RAG(D) and verify it equals I(C; K) as Wyner-Ziv predicts.
- H2.3 — Plan representations achieve lower entropy than raw context for equivalent task performance.
  - Test: Measure H(plan) vs H(raw context) using a compression proxy (gzip ratio), verify H(P) << H(C) while downstream task accuracy remains within tolerance D.
- H2.4 — The three-layer memory hierarchy outperforms flat retrieval.
  - Test: Compare task accuracy for (prompt-only) vs (prompt+vector) vs (prompt+vector+archive) at fixed total token budget; verify diminishing returns consistent with successive refinement.
Task 3: Python MIMO Simulator (auraison-ncq.3)
- H3.1 — Multi-agent majority voting reduces hallucination rate proportional to 1/k under synthetic i.i.d. noise.
  - Test: Sweep k ∈ {1,2,3,5,7,10} agents, noise rates p ∈ {0.05, 0.1, 0.2, 0.3}, measure false positive rate; fit to theoretical curve P_err ~ p^k (a minimal simulation sketch follows this list).
- H3.2 — The hallucination reduction saturates or reverses beyond a critical agent count k* when agent noise is correlated.
  - Test: Introduce noise correlation ρ ∈ {0, 0.2, 0.5, 0.8} between agents; identify k* where adding agents no longer helps.
- H3.3 — There exists an optimal compression budget B* that minimizes hallucination for a given number of agents.
  - Test: Sweep B and k jointly; plot the 2D surface of hallucination rate vs (B, k) and identify the Pareto frontier.
- H3.4 — Structured noise (clustered errors, systematic biases) defeats majority voting faster than i.i.d. noise.
  - Test: Compare i.i.d. vs bursty vs systematic noise models at equal average error rate; measure the gap in hallucination rates.
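A minimal simulation sketch in the spirit of H3.1 and H3.2, using synthetic symbol-flip noise with an optional correlation knob; all parameters (vocabulary size, noise rate, trial counts) are illustrative:

```python
import numpy as np

def majority_vote_error(k, p, rho=0.0, vocab=5, trials=10_000, seed=0):
    """Estimate the per-symbol error rate of k-agent majority voting.

    With probability rho the round is fully correlated: all agents share one error
    event and one wrong symbol. Otherwise agents err independently (rho=0 is i.i.d.).
    Ties in the vote break toward the lowest index, i.e. toward the true symbol 0."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(trials):
        true = 0
        if rng.random() < rho:
            err = rng.random() < p
            symbol = int(rng.integers(1, vocab)) if err else true
            votes = [symbol] * k                                   # correlated round
        else:
            votes = [int(rng.integers(1, vocab)) if rng.random() < p else true
                     for _ in range(k)]                            # independent round
        decoded = int(np.bincount(votes, minlength=vocab).argmax())
        errors += decoded != true
    return errors / trials

for rho in (0.0, 0.5):
    rates = [majority_vote_error(k, p=0.3, rho=rho) for k in (1, 3, 5, 7)]
    print(f"rho={rho}: error rate vs k =", [round(r, 3) for r in rates])
```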
Task 4: Soft Symbols and Semantic Distortion (auraison-ncq.4)
- H4.1 ★ — LLM token-level errors are NOT well-modeled by additive Gaussian noise in embedding space.
  - Test: Collect (correct_token, hallucinated_token) pairs from a real LLM on a QA benchmark. Compute the distribution of error vectors e_err = φ(hallucinated) - φ(correct). Test for Gaussianity (Shapiro-Wilk, Q-Q plot). Prediction: the distribution will be heavy-tailed and multimodal.
- H4.2 — Cosine centroid fusion outperforms token-level majority voting for semantically similar error modes.
  - Test: Generate k agent responses where errors are near-synonyms (e.g., 'big'/'large'/'huge'). Compare centroid-nearest-neighbor decoding vs majority vote. Prediction: centroid fusion wins when errors cluster semantically.
- H4.3 — Centroid fusion FAILS when errors are adversarially distributed.
  - Test: Construct scenarios where k-1 agents hallucinate a semantically coherent but wrong answer. Verify that centroid fusion amplifies the error rather than correcting it.
- H4.4 — The effective noise reduction with real LLM agents follows σ²(1+(k-1)ρ)/k, not σ²/k.
  - Test: Run k ∈ {1..5} instances of the same LLM on the same questions at temperature > 0. Measure pairwise error correlation ρ. Verify the actual distortion reduction matches the correlated-noise formula.
Task 5: Turbo Codes, Belief Propagation, CoT as LDPC (auraison-ncq.5)
- H5.1 — Multi-agent debate converges to a consensus distribution, and the number of iterations to convergence scales with graph connectivity.
  - Test: Run k ∈ {2,3,5} agents in iterative belief exchange on factual QA. Measure KL divergence between agent beliefs at each iteration. Track convergence rate and identify cases of oscillation or divergence.
- H5.2 — Longer CoT traces reduce hallucination following a diminishing-returns curve analogous to coding gain.
  - Test: For a fixed task, vary CoT budget T ∈ {0, 1, 2, 4, 8, 16} steps. Measure error rate. Fit to the coding gain curve P_err ~ exp(-αT). Prediction: there exists T* beyond which additional steps add noise rather than redundancy.
- H5.3 ★ — Multi-agent debate with DIFFERENT model families (decorrelated errors) outperforms debate with identical models, analogous to the interleaver effect in turbo codes.
  - Test: Compare hallucination rates for (a) 3× GPT-4, (b) GPT-4 + Claude + Gemini, (c) 3× Claude, on a shared benchmark. Prediction: mixed-model ensemble (b) achieves the lowest error rate.
- H5.4 — Self-consistency sampling does NOT approximate the true posterior for out-of-distribution questions.
  - Test: Compare self-consistency answer distribution vs ground truth distribution on questions where the LLM has known systematic biases. Prediction: self-consistency amplifies the bias rather than correcting it.
- H5.5 — Explicit parity-check verification (tool use, calculator, code execution) provides stronger error correction than implicit reasoning redundancy.
  - Test: Compare error rates on arithmetic/logic tasks for (a) CoT-only, (b) CoT + tool verification, (c) multi-agent debate without tools. Prediction: (b) dominates.
Task 6: Transformers as Soft-Symbol Encoders (auraison-ncq.6)
- H6.1 — Attention head diversity provides a diversity gain analogous to multiple projection matrices.
  - Test: Measure pairwise cosine similarity between attention head outputs. Ablate individual heads and measure hallucination rate increase — heads with more diverse projections should be more critical.
- H6.2 ★ — Token embeddings that are closer in cosine distance are more frequently confused in hallucinations.
  - Test: Collect hallucination pairs (correct, hallucinated) from a real LLM. Measure cosine distance between their embeddings. Compare against random token pairs. Prediction: hallucinated tokens are significantly closer to the correct token than random.
- H6.3 — The softmax temperature acts as a noise parameter in the channel model.
  - Test: Vary temperature T ∈ {0.1, 0.3, 0.5, 0.7, 1.0, 1.5} and measure hallucination rate on a factual QA benchmark. Fit to a channel error rate model P_err = f(T). Prediction: error rate increases monotonically with T, consistent with σ² ∝ T.
- H6.4 — Fine-tuning reshapes the embedding space to increase minimum distance between confusable tokens, analogous to constellation optimization.
  - Test: Compare the embedding-space geometry (minimum pairwise distance among top-k confusable tokens) before and after RLHF/DPO fine-tuning. Prediction: fine-tuning increases separation between frequently confused token clusters.
18.3 Priority Hypotheses
The three starred (★) hypotheses are the most critical to validate first:
- H4.1 — Noise characterization. This is the foundation: if LLM errors aren't Gaussian, the entire soft-symbol framework needs a different noise model.
- H5.3 — Model diversity as interleaver. This is the strongest practical prediction and the easiest to test with existing LLM APIs.
- H6.2 — Embedding proximity predicts hallucination. This would empirically ground the connection between transformers and the channel model.
18.4 What Would Make This Publishable
- Empirical validation of H4.1 (noise distribution characterization)
- Demonstration of H5.3 (mixed-model ensemble > same-model ensemble)
- Rate-distortion curve measurement (H2.1) showing LLM summarization approaches the bound
- A revised formulation that accounts for correlated noise and non-Gaussian error structure