A production-ready, pure-Python asyncio + httpx pipeline that achieves high-quality answers by running a 4-phase agentic debate through a single blind proxy endpoint.
```mermaid
graph LR
    Q["Question"] --> P1
    subgraph "Phase 1 — Blind Generation"
        P1["N concurrent CoT\nrequests to proxy"]
    end
    P1 -->|valid answers| P2
    subgraph "Phase 2 — Cross-Critique"
        P2["Each answer is\naggressively critiqued"]
    end
    P2 -->|answers + critiques| P25
    subgraph "Phase 2.5 — Rebuttal"
        P25["Each solver defends\nor revises its answer"]
    end
    P25 -->|answers + critiques + rebuttals| P3
    subgraph "Phase 3 — The Judge"
        P3["NUM_JUDGES concurrent judges\nmajority vote on final verdict"]
    end
    P3 --> V["Final Verdict + Confidence"]
```
```shell
# 1. Clone & install
git clone <repo-url> && cd Chaos-MoA-Pipeline
pip install -r requirements.txt

# 2. Configure
cp .env.example .env
# Edit .env with your PROXY_URL and API_KEY

# 3. Run
python moa_pipeline.py -q "What is 17 * 23?" -w 3 -v
```

| Flag | Description |
|---|---|
| `-q, --question` | The question or problem to solve (required) |
| `-w, --workers N` | Override `NUM_WORKERS` from `.env` |
| `-j, --judges N` | Override `NUM_JUDGES` from `.env` |
| `--no-rebuttal` | Skip Phase 2.5 (faster, fewer API calls) |
| `-v, --verbose` | Enable DEBUG-level logging |
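The flag table above maps onto a straightforward `argparse` setup. A sketch under the assumption that unset `-w`/`-j` values fall back to the `.env` defaults later in the pipeline (`build_parser` is an illustrative name, not necessarily the script's actual structure):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the CLI flags documented above; None means "use the .env value"
    p = argparse.ArgumentParser(prog="moa_pipeline.py")
    p.add_argument("-q", "--question", required=True,
                   help="The question or problem to solve")
    p.add_argument("-w", "--workers", type=int, default=None,
                   help="Override NUM_WORKERS from .env")
    p.add_argument("-j", "--judges", type=int, default=None,
                   help="Override NUM_JUDGES from .env")
    p.add_argument("--no-rebuttal", action="store_true",
                   help="Skip Phase 2.5 (faster, fewer API calls)")
    p.add_argument("-v", "--verbose", action="store_true",
                   help="Enable DEBUG-level logging")
    return p

# Parse the quick-start invocation from above
args = build_parser().parse_args(["-q", "What is 17 * 23?", "-w", "3", "-v"])
```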
All tunables live in .env — no code changes needed:
| Variable | Default | Description |
|---|---|---|
| `PROXY_URL` | (required) | Cloudflare AI Gateway endpoint |
| `API_KEY` | (required) | Bearer token for the proxy |
| `MODEL_NAME` | default | Model identifier sent in the request body |
| `NUM_WORKERS` | 5 | Concurrent workers in Phase 1 & 2 |
| `MIN_VALID_ANSWERS` | 2 | Min valid Phase 1 answers needed to proceed |
| `MAX_RETRIES` | 4 | Retries per API call |
| `RETRY_BASE_DELAY` | 1.0 | Exponential backoff base (seconds) |
| `RETRY_MAX_DELAY` | 60.0 | Backoff cap (seconds) |
| `REQUEST_TIMEOUT` | 200.0 | Per-request HTTP timeout (seconds) |
| `TEMPERATURE` | 1.0 | Generation temperature for Phase 1 & 2 |
| `MAX_TOKENS` | 8192 | Max tokens per response |
| `JUDGE_TEMPERATURE` | 0.2 | Judge temperature (lower = more deterministic) |
| `JUDGE_MAX_TOKENS` | 8192 | Max tokens for the Judge's synthesis |
| `NUM_JUDGES` | 3 | Concurrent judge instances (majority vote) |
| `MIN_JUDGE_CONFIDENCE` | 60 | Confidence threshold below which the pipeline reruns |
| `ENABLE_REBUTTAL` | true | Run Phase 2.5 rebuttal round |
| `ENABLE_FILE_LOGGING` | true | Save full transcript to a .txt file |
| `LOG_DIR` | logs | Directory for transcript files |
| `LOG_FILENAME_PREFIX` | moa_run | Prefix for transcript filenames |
- **XML tags over JSON** — Even tiny 8B models reliably produce `<tag>…</tag>` pairs, unlike valid JSON.
- **Defensive parsing with fallbacks** — Three-tier extraction (strict XML → markdown headers → paragraph split) recovers partial outputs before triggering a retry.
- **Cache-busting on retry** — Each retry appends a random nonce to the system prompt, forcing a Cloudflare cache miss so retries actually hit a fresh model call.
- **Exponential backoff + jitter** — Prevents thundering herd on rate limits; backs off on timeout, 5xx, 429, or parse failure.
- **Multi-judge consensus** — Phase 3 runs `NUM_JUDGES` instances concurrently and resolves their verdicts by majority vote on the `FINAL ANSWER:` line, using confidence as a tiebreaker.
- **Confidence-triggered rerun** — If the winning judge's confidence is below `MIN_JUDGE_CONFIDENCE`, the entire pipeline reruns once with higher temperature and more workers.
- **Graceful degradation** — If only 2/5 Phase 1 answers succeed, the pipeline continues rather than aborting.
- **Zero frameworks** — Only `asyncio`, `httpx`, `re`, `dataclasses`. No LangChain, no AutoGen.
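The three-tier extraction described above can be sketched as a single function; the tag and header names here are illustrative, not the pipeline's exact ones:

```python
import re

def extract_answer(text: str) -> str | None:
    """Three-tier extraction: strict XML tag, then a markdown
    'Answer' header, then the last paragraph as a fallback."""
    # Tier 1: strict XML-style tags
    m = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL | re.IGNORECASE)
    if m:
        return m.group(1).strip()
    # Tier 2: a markdown header like "## Answer" followed by its body
    m = re.search(r"^#+\s*answer\s*\n(.+?)(?=\n#+\s|\Z)", text,
                  re.DOTALL | re.IGNORECASE | re.MULTILINE)
    if m:
        return m.group(1).strip()
    # Tier 3: last non-empty paragraph, or give up and trigger a retry
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return paragraphs[-1] if paragraphs else None
```

Each tier only runs when the stricter one fails, so well-formed outputs take the cheap path and malformed ones still yield something before the retry machinery kicks in.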
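Backoff-with-jitter and nonce-based cache busting are each only a couple of lines. A sketch using the `RETRY_BASE_DELAY`/`RETRY_MAX_DELAY` defaults from the table; the nonce comment format is an assumption, not the pipeline's exact prompt suffix:

```python
import random

def retry_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter:
    a delay drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def bust_cache(system_prompt: str) -> str:
    """Append a random nonce so a retried request cannot hit a cached response."""
    nonce = f"{random.getrandbits(64):016x}"
    return f"{system_prompt}\n<!-- nonce:{nonce} -->"
```

Full jitter (rather than a fixed multiplier) is what spreads simultaneous retries apart and avoids the thundering-herd effect on a rate-limited proxy.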
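Majority vote with a confidence tiebreaker can be sketched as below; `resolve_verdicts` is an illustrative name, and the real pipeline first parses each judge's `FINAL ANSWER:` line into the `(answer, confidence)` pairs assumed here:

```python
from collections import Counter

def resolve_verdicts(verdicts: list[tuple[str, int]]) -> tuple[str, int]:
    """Pick the answer most judges agree on; break ties by the
    highest confidence, and report the winner's best confidence."""
    counts = Counter(answer for answer, _ in verdicts)
    top = max(counts.values())
    tied = [a for a, c in counts.items() if c == top]
    # Tiebreaker: the most confident verdict among the tied answers
    winner, _ = max((v for v in verdicts if v[0] in tied), key=lambda v: v[1])
    confidence = max(c for a, c in verdicts if a == winner)
    return winner, confidence
```

With `NUM_JUDGES=3`, a clean 2/3 split resolves by count alone; the confidence tiebreaker only matters when each judge answers differently.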
Question:

> The "Chronos Vault" starts with exactly 1000 data tokens. Four agents (A, B, C, and D) extract tokens sequentially... (see full question in `logs/moa_run_20260310_210349.txt`)
Pipeline execution summary (NUM_WORKERS=5, NUM_JUDGES=3, Phase 2.5 enabled):
| Phase | Outcome |
|---|---|
| Phase 1 — Blind Generation | 5/5 valid answers |
| Phase 2 — Cross-Critique | 4/5 critiques succeeded (1 empty response → gracefully degraded) |
| Phase 2.5 — Rebuttal | 5/5 rebuttals succeeded |
| Phase 3 — The Judge | 2/3 judges agreed, confidence: 100/100 |
| Total elapsed | 477s |
Final verdict (step-by-step state tracking log):

```
Start: Vault=1000
Rogue steals: 47 (largest prime < 50) → Rogue=47, Vault=953
Agent A (odd): floor(10% × 953) = 95 → A=95, Vault=858
Agent B: ceil(50% × 95)+15 = 63 → B=63, Vault=795
Agent C: round(20% × 795) = 159 → C=159, Vault=636
Agent D (A>B): extracts C's amount=159 → D=159, Vault=477
Audit (477<700): each agent returns 5 → Vault=497
```
| Entity | Final tokens |
|---|---|
| Rogue subroutine | 47 |
| Agent A | 90 |
| Agent B | 58 |
| Agent C | 154 |
| Agent D | 154 |
| Vault | 497 |
| Total | 1000 |
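As a sanity check, the state transitions in the verdict can be replayed directly from the puzzle rules as summarized in the log (this verifies the arithmetic only; it is not the pipeline's code):

```python
import math

vault = 1000
# Rogue takes the largest prime below 50 (trial-division primality test)
rogue = max(p for p in range(2, 50) if all(p % d for d in range(2, p)))
vault -= rogue                            # 47 stolen → 953 left
a = math.floor(0.10 * vault); vault -= a  # A takes floor(10% of 953) = 95
b = math.ceil(0.50 * a) + 15; vault -= b  # B takes ceil(50% of A) + 15 = 63
c = round(0.20 * vault); vault -= c       # C takes round(20% of 795) = 159
d = c; vault -= d                         # A > B, so D matches C's amount
# Audit: vault (477) < 700, so each agent returns 5 tokens
a, b, c, d, vault = a - 5, b - 5, c - 5, d - 5, vault + 20
```

The resulting balances match the table above, and the six entities still sum to the original 1000 tokens.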
The cache-busting retry mechanism was exercised during Phase 2: crit-1 received empty responses from the proxy. Without the nonce injection, all 7 retries would have returned the same cached empty response. The pipeline correctly degraded to 4/5 critiques and continued.
Full transcript: logs/moa_run_20260310_210349.txt
```shell
python -m pytest test_parsing.py -v
```

MIT