
Framework adds 9x overhead to LLM API calls #587

@oyty

Description

Environment

  • Parlant Version: 3.0.2
  • Python: 3.12.4
  • Platform: macOS
  • LLM: Qwen (qwen3-32b)

Problem
Tool calls through Parlant take 10.22 seconds, while the same LLM API call made directly takes only 1.14 seconds: roughly a 9x performance overhead.

Reproduction

Simple weather tool:

@p.tool
async def get_weather(context: p.ToolContext, city: str) -> p.ToolResult:
    return p.ToolResult(f"Weather in {city}: sunny, 22°")

Performance Comparison

Method             Time
Direct Qwen API    1.14s ✅
Through Parlant    10.22s ⚠️
Overhead           +795%

Logs

[ToolCaller] Creating batches finished in 0.0 seconds
[ToolCaller] Processing batches started
[Qwen LLM Request] started
[Qwen LLM Request] finished in 10.219 seconds  ← 9-second overhead
[ToolCaller] Evaluation finished in 10.22 seconds

Analysis

  • Only 1 LLM request is made (no hidden retries)
  • Framework batch creation: ~0ms
  • The entire 9-second overhead occurs during the single LLM call
  • Same model, same API endpoint, vastly different performance
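Since the same endpoint behaves so differently, one way to narrow this down is to capture the request body Parlant actually sends (e.g. via an HTTP-level log or proxy) and compare it with the direct call's. A minimal sketch of the comparison step; the `direct` payload below is a hypothetical stand-in mirroring the direct call, not the framework's real request:

```python
def payload_summary(payload: dict) -> dict:
    """Summarize an OpenAI-style chat payload so the direct request and the
    framework's request can be compared (sizes and parameters, not content)."""
    messages = payload.get("messages", [])
    return {
        "n_messages": len(messages),
        "prompt_chars": sum(len(m.get("content", "")) for m in messages),
        "model": payload.get("model"),
        "extra_keys": sorted(k for k in payload if k not in {"model", "messages"}),
    }

# Hypothetical payload mirroring the direct call in this report:
direct = {
    "model": "qwen3-32b",
    "messages": [{"role": "user", "content": "What is the weather in Beijing?"}],
    "response_format": {"type": "json_object"},
    "enable_thinking": False,
}
print(payload_summary(direct))
```

If the framework's request turns out to have a much larger `prompt_chars`, or is missing flags like `enable_thinking` that the direct call sets, that would point at hypothesis 2.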

Test Code

Direct API (1.14s):

from openai import AsyncClient

client = AsyncClient()  # credentials / base_url assumed to come from the environment
response = await client.chat.completions.create(
    model="qwen3-32b",
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},
    extra_body={"enable_thinking": False},
)
# Takes 1.14 seconds

Through Parlant (10.22s):

async with p.Server(nlp_service=load_qwen_service) as server:
    agent = await server.create_agent(...)
    await agent.create_guideline(
        condition="The user asks about the weather",
        action="Get the current weather and reply with a friendly response and suggestions",
        tools=[get_weather],
    )
# Takes 10.22 seconds for tool call

Impact
Makes Parlant unsuitable for:

  • Real-time conversations
  • Production environments with latency SLAs
  • High-throughput applications

Questions

  1. Where is this 9-second overhead coming from?
  2. Are there extremely large system prompts being generated?
  3. Is there hidden synchronous blocking?
  4. Any performance tuning recommendations?
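For question 1, a simple wall-clock wrapper around each awaited step can localize the overhead (framework setup vs. the LLM round trip). A sketch using only the standard library, with a stand-in coroutine; swap in the real direct or framework call:

```python
import asyncio
import time

async def timed(label: str, coro):
    """Await a coroutine and report wall-clock time, to localize
    where the latency is spent."""
    t0 = time.perf_counter()
    result = await coro
    dt = time.perf_counter() - t0
    print(f"{label}: {dt:.2f}s")
    return result, dt

# Stand-in coroutine simulating a 50 ms call; replace with the real API call.
async def fake_llm_call(delay: float) -> str:
    await asyncio.sleep(delay)
    return "ok"

result, dt = asyncio.run(timed("direct call", fake_llm_call(0.05)))
```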

Additional Testing

Different Qwen models (direct API):

  • qwen3-32b: 1.14s
  • qwen-turbo: 0.48s
  • qwen-turbo (streaming): 0.84s (TTFB: 0.23s)

All direct calls complete in about a second or less, while Parlant consistently takes 10+ seconds.
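The TTFB figure above comes from streaming; measuring it separates queueing/prefill time from token generation time. A sketch of the measurement with a stand-in async stream (with a real call, pass the streaming response iterator, e.g. from `chat.completions.create(..., stream=True)`):

```python
import asyncio
import time

async def time_to_first_chunk(stream) -> float:
    """Time until the first chunk arrives from an async stream (TTFB)."""
    t0 = time.perf_counter()
    async for _chunk in stream:
        return time.perf_counter() - t0
    return time.perf_counter() - t0

# Stand-in stream simulating 20 ms of prefill latency before the first token.
async def demo_stream():
    await asyncio.sleep(0.02)
    yield "first token"
    yield "rest"

ttfb = asyncio.run(time_to_first_chunk(demo_stream()))
print(f"TTFB: {ttfb:.3f}s")
```

A TTFB close to the full request time would suggest the extra latency is in prefill (e.g. a very large prompt or server-side reasoning) rather than in token generation.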

Let me know if you need profiling data or additional information!
