A production-ready, pure-Python asyncio + httpx pipeline that achieves high-quality answers by running a 4-phase agentic debate through a single blind proxy endpoint.
```mermaid
graph LR
    Q["Question"] --> P1
    subgraph "Phase 1 — Blind Generation"
        P1["N concurrent CoT\nrequests to proxy"]
    end
    P1 -->|valid answers| P2
    subgraph "Phase 2 — Cross-Critique"
        P2["Each answer is\naggressively critiqued"]
    end
    P2 -->|answers + critiques| P25
    subgraph "Phase 2.5 — Rebuttal"
        P25["Each solver defends\nor revises its answer"]
    end
    P25 -->|answers + critiques + rebuttals| P3
    subgraph "Phase 3 — The Judge"
        P3["NUM_JUDGES concurrent judges\nmajority vote on final verdict"]
    end
    P3 --> V["Final Verdict + Confidence"]
```
```shell
# 1. Clone & install
git clone <repo-url> && cd Chaos-MoA-Pipeline
pip install -r requirements.txt

# 2. Configure
cp .env.example .env
# Edit .env with your PROXY_URL and API_KEY

# 3. Run
python moa_pipeline.py -q "What is 17 * 23?" -w 3 -v
```

| Flag | Description |
|---|---|
| `-q, --question` | The question or problem to solve (required) |
| `-w, --workers N` | Override `NUM_WORKERS` from `.env` |
| `-j, --judges N` | Override `NUM_JUDGES` from `.env` |
| `--no-rebuttal` | Skip Phase 2.5 (faster, fewer API calls) |
| `-v, --verbose` | Enable DEBUG-level logging |
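The flag table above maps onto a straightforward `argparse` setup. A sketch under the assumption that unset `-w`/`-j` values fall back to the `.env` defaults later in the pipeline (`build_parser` is an illustrative name, not necessarily the script's actual structure):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the CLI flags documented above; None means "use the .env value"
    p = argparse.ArgumentParser(prog="moa_pipeline.py")
    p.add_argument("-q", "--question", required=True,
                   help="The question or problem to solve")
    p.add_argument("-w", "--workers", type=int, default=None,
                   help="Override NUM_WORKERS from .env")
    p.add_argument("-j", "--judges", type=int, default=None,
                   help="Override NUM_JUDGES from .env")
    p.add_argument("--no-rebuttal", action="store_true",
                   help="Skip Phase 2.5 (faster, fewer API calls)")
    p.add_argument("-v", "--verbose", action="store_true",
                   help="Enable DEBUG-level logging")
    return p

# Parse the quick-start invocation from above
args = build_parser().parse_args(["-q", "What is 17 * 23?", "-w", "3", "-v"])
```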
All tunables live in .env — no code changes needed:
| Variable | Default | Description |
|---|---|---|
| `PROXY_URL` | (required) | Cloudflare AI Gateway endpoint |
| `API_KEY` | (required) | Bearer token for the proxy |
| `MODEL_NAME` | default | Model identifier sent in the request body |
| `NUM_WORKERS` | 5 | Concurrent workers in Phase 1 & 2 |
| `MIN_VALID_ANSWERS` | 2 | Min valid Phase 1 answers needed to proceed |
| `MAX_RETRIES` | 4 | Retries per API call |
| `RETRY_BASE_DELAY` | 1.0 | Exponential backoff base (seconds) |
| `RETRY_MAX_DELAY` | 60.0 | Backoff cap (seconds) |
| `REQUEST_TIMEOUT` | 200.0 | Per-request HTTP timeout (seconds) |
| `TEMPERATURE` | 1.0 | Generation temperature for Phase 1 & 2 |
| `MAX_TOKENS` | 8192 | Max tokens per response |
| `JUDGE_TEMPERATURE` | 0.2 | Judge temperature (lower = more deterministic) |
| `JUDGE_MAX_TOKENS` | 8192 | Max tokens for the Judge's synthesis |
| `NUM_JUDGES` | 3 | Concurrent judge instances (majority vote) |
| `MIN_JUDGE_CONFIDENCE` | 60 | Confidence threshold below which the pipeline reruns |
| `ENABLE_REBUTTAL` | true | Run Phase 2.5 rebuttal round |
| `ENABLE_FILE_LOGGING` | true | Save full transcript to a .txt file |
| `LOG_DIR` | logs | Directory for transcript files |
| `LOG_FILENAME_PREFIX` | moa_run | Prefix for transcript filenames |
- **XML tags over JSON** — Even tiny 8B models reliably produce `<tag>…</tag>` pairs, unlike valid JSON.
- **Defensive parsing with fallbacks** — Three-tier extraction (strict XML → markdown headers → paragraph split) recovers partial outputs before triggering a retry.
- **Cache-busting on retry** — Each retry appends a random nonce to the system prompt, forcing a Cloudflare cache miss so retries actually hit a fresh model call.
- **Exponential backoff + jitter** — Prevents thundering herd on rate limits; backs off on timeout, 5xx, 429, or parse failure.
- **Multi-judge consensus** — Phase 3 runs `NUM_JUDGES` instances concurrently and resolves their verdicts by majority vote on the `FINAL ANSWER:` line, using confidence as a tiebreaker.
- **Confidence-triggered rerun** — If the winning judge's confidence is below `MIN_JUDGE_CONFIDENCE`, the entire pipeline reruns once with higher temperature and more workers.
- **Graceful degradation** — If only 2/5 Phase 1 answers succeed, the pipeline continues rather than aborting.
- **Zero frameworks** — Only `asyncio`, `httpx`, `re`, `dataclasses`. No LangChain, no AutoGen.
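The three-tier extraction described above can be sketched as a single function; the tag and header names here are illustrative, not the pipeline's exact ones:

```python
import re

def extract_answer(text: str) -> str | None:
    """Three-tier extraction: strict XML tag, then a markdown
    'Answer' header, then the last paragraph as a fallback."""
    # Tier 1: strict XML-style tags
    m = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL | re.IGNORECASE)
    if m:
        return m.group(1).strip()
    # Tier 2: a markdown header like "## Answer" followed by its body
    m = re.search(r"^#+\s*answer\s*\n(.+?)(?=\n#+\s|\Z)", text,
                  re.DOTALL | re.IGNORECASE | re.MULTILINE)
    if m:
        return m.group(1).strip()
    # Tier 3: last non-empty paragraph, or give up and trigger a retry
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return paragraphs[-1] if paragraphs else None
```

Each tier only runs when the stricter one fails, so well-formed outputs take the cheap path and malformed ones still yield something before the retry machinery kicks in.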
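Backoff-with-jitter and nonce-based cache busting are each only a couple of lines. A sketch using the `RETRY_BASE_DELAY`/`RETRY_MAX_DELAY` defaults from the table; the nonce comment format is an assumption, not the pipeline's exact prompt suffix:

```python
import random

def retry_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter:
    a delay drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def bust_cache(system_prompt: str) -> str:
    """Append a random nonce so a retried request cannot hit a cached response."""
    nonce = f"{random.getrandbits(64):016x}"
    return f"{system_prompt}\n<!-- nonce:{nonce} -->"
```

Full jitter (rather than a fixed multiplier) is what spreads simultaneous retries apart and avoids the thundering-herd effect on a rate-limited proxy.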
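Majority vote with a confidence tiebreaker can be sketched as below; `resolve_verdicts` is an illustrative name, and the real pipeline first parses each judge's `FINAL ANSWER:` line into the `(answer, confidence)` pairs assumed here:

```python
from collections import Counter

def resolve_verdicts(verdicts: list[tuple[str, int]]) -> tuple[str, int]:
    """Pick the answer most judges agree on; break ties by the
    highest confidence, and report the winner's best confidence."""
    counts = Counter(answer for answer, _ in verdicts)
    top = max(counts.values())
    tied = [a for a, c in counts.items() if c == top]
    # Tiebreaker: the most confident verdict among the tied answers
    winner, _ = max((v for v in verdicts if v[0] in tied), key=lambda v: v[1])
    confidence = max(c for a, c in verdicts if a == winner)
    return winner, confidence
```

With `NUM_JUDGES=3`, a clean 2/3 split resolves by count alone; the confidence tiebreaker only matters when each judge answers differently.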
Question:

> The "Chronos Vault" starts with exactly 1000 data tokens. Four agents (A, B, C, and D) extract tokens sequentially... (see full question in `logs/moa_run_20260310_210349.txt`)
Pipeline execution summary (NUM_WORKERS=5, NUM_JUDGES=3, Phase 2.5 enabled):
| Phase | Outcome |
|---|---|
| Phase 1 — Blind Generation | 5/5 valid answers |
| Phase 2 — Cross-Critique | 4/5 critiques succeeded (1 empty response → gracefully degraded) |
| Phase 2.5 — Rebuttal | 5/5 rebuttals succeeded |
| Phase 3 — The Judge | 2/3 judges agreed, confidence: 100/100 |
| Total elapsed | 477s |
Final verdict (step-by-step state tracking log):

```
Start: Vault=1000
Rogue steals: 47 (largest prime < 50) → Rogue=47, Vault=953
Agent A (odd): floor(10% × 953) = 95 → A=95, Vault=858
Agent B: ceil(50% × 95)+15 = 63 → B=63, Vault=795
Agent C: round(20% × 795) = 159 → C=159, Vault=636
Agent D (A>B): extracts C's amount=159 → D=159, Vault=477
Audit (477<700): each agent returns 5 → Vault=497
```
| Entity | Final tokens |
|---|---|
| Rogue subroutine | 47 |
| Agent A | 90 |
| Agent B | 58 |
| Agent C | 154 |
| Agent D | 154 |
| Vault | 497 |
| Total | 1000 |
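As a sanity check, the state transitions in the verdict can be replayed directly from the puzzle rules as summarized in the log (this verifies the arithmetic only; it is not the pipeline's code):

```python
import math

vault = 1000
# Rogue takes the largest prime below 50 (trial-division primality test)
rogue = max(p for p in range(2, 50) if all(p % d for d in range(2, p)))
vault -= rogue                            # 47 stolen → 953 left
a = math.floor(0.10 * vault); vault -= a  # A takes floor(10% of 953) = 95
b = math.ceil(0.50 * a) + 15; vault -= b  # B takes ceil(50% of A) + 15 = 63
c = round(0.20 * vault); vault -= c       # C takes round(20% of 795) = 159
d = c; vault -= d                         # A > B, so D matches C's amount
# Audit: vault (477) < 700, so each agent returns 5 tokens
a, b, c, d, vault = a - 5, b - 5, c - 5, d - 5, vault + 20
```

The resulting balances match the table above, and the six entities still sum to the original 1000 tokens.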
The cache-busting retry mechanism was exercised during Phase 2: crit-1 received empty responses from the proxy. Without the nonce injection, all 7 retries would have returned the same cached empty response. The pipeline correctly degraded to 4/5 critiques and continued.
Full transcript: logs/moa_run_20260310_210349.txt
```shell
python -m pytest test_parsing.py -v
```

MIT