bawfng04/Chaos-MoA-Pipeline


Chaos Mixture-of-Agents (MoA) Pipeline

A production-ready, pure-Python asyncio + httpx pipeline that aims for high-quality answers by running a four-phase agentic debate (generation, cross-critique, rebuttal, judging) through a single blind proxy endpoint.

Architecture

```mermaid
graph LR
    Q["Question"] --> P1
    subgraph "Phase 1 — Blind Generation"
        P1["N concurrent CoT\nrequests to proxy"]
    end
    P1 -->|valid answers| P2
    subgraph "Phase 2 — Cross-Critique"
        P2["Each answer is\naggressively critiqued"]
    end
    P2 -->|answers + critiques| P25
    subgraph "Phase 2.5 — Rebuttal"
        P25["Each solver defends\nor revises its answer"]
    end
    P25 -->|answers + critiques + rebuttals| P3
    subgraph "Phase 3 — The Judge"
        P3["NUM_JUDGES concurrent judges\nmajority vote on final verdict"]
    end
    P3 --> V["Final Verdict + Confidence"]
```
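The data flow in the diagram can be condensed into a single coroutine. This is a minimal sketch, not the project's `moa_pipeline.py`: `call` is a hypothetical injected coroutine standing in for one proxy round-trip (returning `None` on failure), the prompt strings are placeholders, and retries, XML parsing, and logging are deliberately left out.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical signature: one proxy round-trip that returns None on failure.
CallFn = Callable[[str], Awaitable[Optional[str]]]


async def run_pipeline(
    question: str,
    call: CallFn,
    num_workers: int = 5,
    num_judges: int = 3,
    min_valid_answers: int = 2,
    enable_rebuttal: bool = True,
) -> List[str]:
    # Phase 1: blind generation, N concurrent chain-of-thought requests.
    drafts = await asyncio.gather(
        *(call(f"Think step by step, then answer:\n{question}") for _ in range(num_workers))
    )
    answers = [d for d in drafts if d]
    if len(answers) < min_valid_answers:
        raise RuntimeError(f"only {len(answers)} valid answers; need {min_valid_answers}")

    # Phase 2: each surviving answer gets its own critique; a failed critique stays None.
    critiques = await asyncio.gather(
        *(call(f"Question:\n{question}\n\nFind every flaw in this answer:\n{a}") for a in answers)
    )

    # Phase 2.5 (optional): each solver defends or revises against its critique.
    if enable_rebuttal:
        rebuttals = await asyncio.gather(
            *(
                call(f"Your answer:\n{a}\n\nCritique:\n{c or '(none)'}\n\nDefend or revise.")
                for a, c in zip(answers, critiques)
            )
        )
    else:
        rebuttals = [None] * len(answers)

    # Phase 3: concurrent judges all read the same dossier.
    dossier = "\n\n".join(
        f"[Answer {i}]\n{a}\n[Critique]\n{c or '(missing)'}\n[Rebuttal]\n{r or '(none)'}"
        for i, (a, c, r) in enumerate(zip(answers, critiques, rebuttals), start=1)
    )
    verdicts = await asyncio.gather(
        *(call(f"Question:\n{question}\n\n{dossier}\n\nGive a final verdict.") for _ in range(num_judges))
    )
    return [v for v in verdicts if v]
```

Because every phase is a single `asyncio.gather`, a phase takes roughly as long as its slowest request, and a missing critique degrades the dossier instead of aborting the run.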

Quick Start

```bash
# 1. Clone & install
git clone <repo-url> && cd Chaos-MoA-Pipeline
pip install -r requirements.txt

# 2. Configure
cp .env.example .env
# Edit .env with your PROXY_URL and API_KEY

# 3. Run
python moa_pipeline.py -q "What is 17 * 23?" -w 3 -v
```

CLI Options

| Flag | Description |
|---|---|
| `-q`, `--question` | The question or problem to solve (required) |
| `-w`, `--workers N` | Override `NUM_WORKERS` from `.env` |
| `-j`, `--judges N` | Override `NUM_JUDGES` from `.env` |
| `--no-rebuttal` | Skip Phase 2.5 (faster, fewer API calls) |
| `-v`, `--verbose` | Enable DEBUG-level logging |
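One way to declare these flags is with the standard-library `argparse` module. This is a sketch that mirrors the table above and is not necessarily how `moa_pipeline.py` defines them; the `resolve` helper is hypothetical and shows the "CLI overrides `.env`" precedence.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Chaos MoA pipeline (sketch)")
    parser.add_argument("-q", "--question", required=True,
                        help="The question or problem to solve")
    parser.add_argument("-w", "--workers", type=int, metavar="N",
                        help="Override NUM_WORKERS from .env")
    parser.add_argument("-j", "--judges", type=int, metavar="N",
                        help="Override NUM_JUDGES from .env")
    parser.add_argument("--no-rebuttal", action="store_true",
                        help="Skip Phase 2.5 (faster, fewer API calls)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable DEBUG-level logging")
    return parser


def resolve(cli_value, env_name, default, env=None) -> int:
    # Hypothetical helper: a CLI value wins, then the environment, then the documented default.
    env = os.environ if env is None else env
    return cli_value if cli_value is not None else int(env.get(env_name, default))


args = build_parser().parse_args(["-q", "What is 17 * 23?", "-w", "3", "-v"])
num_workers = resolve(args.workers, "NUM_WORKERS", 5)
```

`argparse` turns `--no-rebuttal` into the attribute `args.no_rebuttal`, so a natural combination would be to run Phase 2.5 only when `ENABLE_REBUTTAL` is true and `args.no_rebuttal` is false.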

Configuration

All tunables live in .env, so no code changes are needed:

| Variable | Default | Description |
|---|---|---|
| `PROXY_URL` | *(required)* | Cloudflare AI Gateway endpoint |
| `API_KEY` | *(required)* | Bearer token for the proxy |
| `MODEL_NAME` | `default` | Model identifier sent in the request body |
| `NUM_WORKERS` | `5` | Concurrent workers in Phases 1 and 2 |
| `MIN_VALID_ANSWERS` | `2` | Minimum number of valid Phase 1 answers needed to proceed |
| `MAX_RETRIES` | `4` | Retries per API call |
| `RETRY_BASE_DELAY` | `1.0` | Exponential backoff base (seconds) |
| `RETRY_MAX_DELAY` | `60.0` | Backoff cap (seconds) |
| `REQUEST_TIMEOUT` | `200.0` | Per-request HTTP timeout (seconds) |
| `TEMPERATURE` | `1.0` | Generation temperature for Phases 1 and 2 |
| `MAX_TOKENS` | `8192` | Maximum tokens per response |
| `JUDGE_TEMPERATURE` | `0.2` | Judge temperature (lower is more deterministic) |
| `JUDGE_MAX_TOKENS` | `8192` | Maximum tokens for the judge's synthesis |
| `NUM_JUDGES` | `3` | Concurrent judge instances (majority vote) |
| `MIN_JUDGE_CONFIDENCE` | `60` | Confidence threshold below which the pipeline reruns |
| `ENABLE_REBUTTAL` | `true` | Run the Phase 2.5 rebuttal round |
| `ENABLE_FILE_LOGGING` | `true` | Save the full transcript to a `.txt` file |
| `LOG_DIR` | `logs` | Directory for transcript files |
| `LOG_FILENAME_PREFIX` | `moa_run` | Prefix for transcript filenames |
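A typed settings object keeps these variables in one place. The sketch below is an assumption about how they could be loaded, not the project's code: it reads `os.environ` (or any mapping), so the `.env` file itself would first have to be loaded into the environment, for example with `python-dotenv`'s `load_dotenv()`. The `Settings` class and its field names are hypothetical; the defaults come from the table above.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


def _parse_bool(raw: str) -> bool:
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    proxy_url: str
    api_key: str
    model_name: str = "default"
    num_workers: int = 5
    min_valid_answers: int = 2
    max_retries: int = 4
    retry_base_delay: float = 1.0
    retry_max_delay: float = 60.0
    request_timeout: float = 200.0
    temperature: float = 1.0
    max_tokens: int = 8192
    judge_temperature: float = 0.2
    judge_max_tokens: int = 8192
    num_judges: int = 3
    min_judge_confidence: int = 60
    enable_rebuttal: bool = True
    enable_file_logging: bool = True
    log_dir: str = "logs"
    log_filename_prefix: str = "moa_run"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        env = os.environ if env is None else env
        missing = [name for name in ("PROXY_URL", "API_KEY") if not env.get(name)]
        if missing:
            raise ValueError(f"missing required settings: {', '.join(missing)}")
        values = {}
        for f in fields(cls):
            raw = env.get(f.name.upper())  # field num_workers <- env var NUM_WORKERS
            if raw is None:
                continue  # keep the dataclass default
            # f.type is the real class here (no postponed annotations), so int/float/str convert directly.
            values[f.name] = _parse_bool(raw) if f.type is bool else f.type(raw)
        return cls(**values)
```

Failing fast on a missing `PROXY_URL` or `API_KEY` surfaces a misconfiguration before any concurrent requests are launched.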

Key Design Decisions

  • XML tags over JSON: even small 8B models reliably emit matching <tag>…</tag> pairs, whereas they often fail to produce valid JSON.
  • Defensive parsing with fallbacks: three-tier extraction (strict XML → markdown headers → paragraph split) recovers partial outputs before triggering a retry.
  • Cache-busting on retry: each retry appends a random nonce to the system prompt, forcing a Cloudflare cache miss so that the retry reaches a fresh model call.
  • Exponential backoff with jitter: prevents a thundering herd under rate limits; the pipeline backs off on timeouts, 5xx responses, 429s, and parse failures.
  • Multi-judge consensus: Phase 3 runs NUM_JUDGES instances concurrently and resolves their verdicts by majority vote on the FINAL ANSWER: line, using confidence as a tiebreaker.
  • Confidence-triggered rerun: if the winning judge's confidence is below MIN_JUDGE_CONFIDENCE, the entire pipeline reruns once with a higher temperature and more workers.
  • Graceful degradation: if only 2 of 5 Phase 1 answers succeed (the MIN_VALID_ANSWERS default), the pipeline continues rather than aborting.
  • Zero frameworks: only asyncio, httpx, re, and dataclasses. No LangChain, no AutoGen.
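The three-tier extraction idea can be sketched with `re` alone. This is an illustration under assumptions, not the project's parser: the tag name (`answer` in the usage below) is hypothetical, tier 2 assumes a markdown header whose text is the tag name, and tier 3 assumes the last non-empty paragraph is the most likely answer. A `None` result is the signal that a retry is needed.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the body of `tag`, degrading through three tiers before giving up."""
    t = re.escape(tag)

    # Tier 1: strict XML-style pair, e.g. <answer>...</answer>.
    m = re.search(rf"<{t}>\s*(.*?)\s*</{t}>", text, re.DOTALL | re.IGNORECASE)
    if m and m.group(1):
        return m.group(1)

    # Tier 2: a markdown header named after the tag ("## Answer"), up to the next header.
    m = re.search(
        rf"^[ \t]*#+[ \t]*{t}[ \t]*:?[ \t]*\n(.*?)(?=^[ \t]*#+[ \t]|\Z)",
        text,
        re.DOTALL | re.IGNORECASE | re.MULTILINE,
    )
    if m and m.group(1).strip():
        return m.group(1).strip()

    # Tier 3: split on blank lines and keep the last non-empty paragraph.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return paragraphs[-1] if paragraphs else None


print(extract_tag("noise <answer> 391 </answer> noise", "answer"))  # 391
```

Ordering the tiers from strictest to loosest means a well-formed response is never second-guessed, while a response that forgot its closing tag can still be salvaged without spending another API call.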

Example Run

Question:

The "Chronos Vault" starts with exactly 1000 data tokens. Four agents (A, B, C, and D) extract tokens sequentially... (see full question in logs/moa_run_20260310_210349.txt)

Pipeline execution summary (NUM_WORKERS=5, NUM_JUDGES=3, Phase 2.5 enabled):

| Phase | Outcome |
|---|---|
| Phase 1: Blind Generation | 5/5 valid answers |
| Phase 2: Cross-Critique | 4/5 critiques succeeded (1 empty response, gracefully degraded) |
| Phase 2.5: Rebuttal | 5/5 rebuttals succeeded |
| Phase 3: The Judge | 2/3 judges agreed, confidence 100/100 |
| Total elapsed | 477 s |
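A "2/3 judges agreed" outcome follows from the multi-judge consensus rule described under Key Design Decisions. The sketch below shows one way to implement that rule; it is an assumption, not the project's code. Only the `FINAL ANSWER:` line comes from this README; the `CONFIDENCE: NN` line format, the `Verdict` type, and the answer normalization are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# "FINAL ANSWER:" comes from the README; the CONFIDENCE line format is assumed.
FINAL_RE = re.compile(r"^[ \t]*FINAL ANSWER:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
CONF_RE = re.compile(r"CONFIDENCE:[ \t]*(\d{1,3})", re.IGNORECASE)


@dataclass
class Verdict:
    answer: str
    confidence: int


def parse_verdict(text: str) -> Optional[Verdict]:
    finals = FINAL_RE.findall(text)
    if not finals:
        return None  # unparseable judge output casts no vote
    conf = CONF_RE.search(text)
    return Verdict(finals[-1], min(int(conf.group(1)), 100) if conf else 0)


def _norm(answer: str) -> str:
    # Case- and whitespace-insensitive comparison of answers.
    return " ".join(answer.lower().split())


def majority_vote(verdicts: List[Optional[Verdict]]) -> Optional[Verdict]:
    valid = [v for v in verdicts if v is not None]
    if not valid:
        return None
    votes = Counter(_norm(v.answer) for v in valid)

    def rank(key: str):
        # Most votes first; the highest confidence among that answer's voters breaks ties.
        return votes[key], max(v.confidence for v in valid if _norm(v.answer) == key)

    winner = max(votes, key=rank)
    return max((v for v in valid if _norm(v.answer) == winner), key=lambda v: v.confidence)


def needs_rerun(winner: Optional[Verdict], min_confidence: int = 60) -> bool:
    # Mirrors MIN_JUDGE_CONFIDENCE: no winner or a low-confidence winner triggers a rerun.
    return winner is None or winner.confidence < min_confidence
```

Voting on the normalized final-answer line rather than on the whole judge output lets judges with different reasoning still agree, which is what makes a 2-of-3 majority meaningful.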

Final verdict (step-by-step state tracking log):

```text
Start:         Vault=1000
Rogue steals:  47 (largest prime < 50)  → Rogue=47,  Vault=953
Agent A (odd): floor(10% × 953) = 95   → A=95,      Vault=858
Agent B:       ceil(50% × 95)+15 = 63  → B=63,      Vault=795
Agent C:       round(20% × 795) = 159  → C=159,     Vault=636
Agent D (A>B): extracts C's amount=159 → D=159,     Vault=477
Audit (477<700): each agent returns 5  → Vault=497
```
| Entity | Final tokens |
|---|---|
| Rogue subroutine | 47 |
| Agent A | 90 |
| Agent B | 58 |
| Agent C | 154 |
| Agent D | 154 |
| Vault | 497 |
| Total | 1000 |
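The arithmetic in the verdict log can be replayed to confirm that the final table is self-consistent. This sketch reproduces only the steps shown in the log above; the full puzzle rules live in the transcript and are not modeled here (for example, the "(odd)" condition on Agent A is not encoded, and only the A>B branch for Agent D is implemented). `Fraction` keeps the percentages exact so floor, ceil, and round behave as written.

```python
import math
from fractions import Fraction


def largest_prime_below(n: int) -> int:
    for k in range(n - 1, 1, -1):
        if all(k % d for d in range(2, math.isqrt(k) + 1)):
            return k
    raise ValueError(f"no prime below {n}")


def replay() -> dict:
    vault = 1000
    rogue = largest_prime_below(50)             # 47
    vault -= rogue                              # 953
    a = math.floor(Fraction(10, 100) * vault)   # floor(95.3) = 95
    vault -= a                                  # 858
    b = math.ceil(Fraction(50, 100) * a) + 15   # ceil(47.5) + 15 = 63
    vault -= b                                  # 795
    c = round(Fraction(20, 100) * vault)        # 159
    vault -= c                                  # 636
    if not a > b:
        raise NotImplementedError("only the log's A>B branch is reproduced")
    d = c                                       # D extracts C's amount
    vault -= d                                  # 477
    if vault < 700:                             # audit: each agent returns 5
        a, b, c, d = (x - 5 for x in (a, b, c, d))
        vault += 4 * 5                          # 497
    return {"Rogue": rogue, "A": a, "B": b, "C": c, "D": d, "Vault": vault}


print(replay())
```

Summing the returned values gives 1000, matching the Total row: tokens only move between the vault and the entities, so the total is conserved.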

The cache-busting retry mechanism was exercised during Phase 2: crit-1 received empty responses from the proxy. Without the nonce injection, all 7 retries would have returned the same cached empty response. The pipeline correctly degraded to 4/5 critiques and continued.
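The retry behavior described above can be sketched as a small wrapper. This is an illustration under assumptions, not the project's implementation: `send` is a hypothetical coroutine standing in for the httpx POST to the proxy, `RetryableError` is a hypothetical stand-in for timeouts, 5xx, 429, empty bodies, and parse failures, and "full jitter" (a uniform delay up to the exponential bound) is one common jitter scheme. `MAX_RETRIES` is read here as retries after the first attempt.

```python
import asyncio
import random
import uuid
from typing import Awaitable, Callable

# Hypothetical transport: (system_prompt, user_prompt) -> response text.
SendFn = Callable[[str, str], Awaitable[str]]


class RetryableError(Exception):
    """Stand-in for timeout, 5xx, 429, empty body, or parse failure."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))


async def call_with_retries(
    send: SendFn,
    system_prompt: str,
    user_prompt: str,
    max_retries: int = 4,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    last_error: Exception = RetryableError("no attempts made")
    for attempt in range(max_retries + 1):  # first try + max_retries retries
        prompt = system_prompt
        if attempt > 0:
            # A fresh nonce changes the request body, so a cache keyed on the body misses.
            prompt = f"{system_prompt}\n[nonce:{uuid.uuid4().hex}]"
        try:
            text = await send(prompt, user_prompt)
            if not text.strip():
                raise RetryableError("empty response")
            return text
        except RetryableError as exc:
            last_error = exc
            if attempt < max_retries:
                await sleep(backoff_delay(attempt, base, cap))
    raise last_error
```

Without the nonce, every retry would send an identical body and a response cache keyed on that body would keep replaying the same empty answer, which is the failure mode the crit-1 episode illustrates.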

Full transcript: logs/moa_run_20260310_210349.txt

Testing

```bash
python -m pytest test_parsing.py -v
```

License

MIT

About

A brutally fault-tolerant Mixture-of-Agents (MoA) pipeline built in pure Python. It is designed to orchestrate chaotic, round-robin LLM proxy endpoints through a rigorous four-stage agentic workflow (Generate ➔ Cross-Critique ➔ Rebuttal ➔ Judge), with the goal of reducing hallucinations and improving accuracy on complex, multi-step reasoning tasks.
