Timeout Configuration#

MassGen provides timeout configuration to control how long coordination and agent operations can run before being terminated. This prevents runaway processes and ensures predictable execution times.

Quick Reference#

Default Timeouts:

Orchestrator: 1800 seconds (30 minutes)
Per-Round: Disabled by default in YAML configs; enabled in --quickstart (10 min initial, 5 min subsequent)
Grace Period: 120 seconds (time after soft timeout before hard block)

CLI Override:

uv run python -m massgen.cli \
  --orchestrator-timeout 600 \
  --config config.yaml \
  "Your question"

Config File:

timeout_settings:
  orchestrator_timeout_seconds: 1800
  initial_round_timeout_seconds: 600      # 10 min for first answer
  subsequent_round_timeout_seconds: 180   # 3 min for voting rounds
  round_timeout_grace_seconds: 120        # Grace period before hard block

Timeout Types#

MassGen has two levels of timeout control:

Orchestrator Timeout: Overall session limit (ends the entire coordination run)
Per-Round Timeout: Limit on each individual round (prompts agents to submit)

Orchestrator Timeout#

Controls the maximum time for multi-agent coordination:

Covers: Entire coordination process (all rounds of voting and consensus)
Default: 1800 seconds (30 minutes)
When it triggers: Coordination exceeds the time limit
What happens: Coordination terminates gracefully, current state is saved

timeout_settings:
  orchestrator_timeout_seconds: 600  # 10 minutes

Per-Round Timeout#

Controls the maximum time for individual agent rounds. This prevents agents from getting stuck in analysis loops (e.g., repeatedly analyzing the same image with inconsistent results).

Covers: Single round of agent work (initial answer or voting)
Default: Disabled unless set in the YAML config; --quickstart enables it with 600s initial, 300s subsequent, and 120s grace
When it triggers: Agent exceeds time limit for current round
What happens: Two-phase timeout (soft warning, then hard block)

Configuration Options:

timeout_settings:
  initial_round_timeout_seconds: 600    # Soft timeout for round 0 (initial answer)
  subsequent_round_timeout_seconds: 180 # Soft timeout for rounds 1+ (voting)
  round_timeout_grace_seconds: 120      # Grace period before hard block

Two-Phase Timeout Behavior:

Soft Timeout: When reached, a friendly warning message is injected telling the agent to wrap up and submit. The agent can still finish final touches to make their work presentable.
Hard Timeout: After the grace period expires (soft timeout + round_timeout_grace_seconds), non-terminal tool calls are blocked. Only vote and new_answer tools are allowed.

Timeline Example (initial round with 600s timeout + 120s grace):

0-600s:   Agent works normally
600s:     Soft timeout - friendly warning message injected
600-720s: Grace period - agent can finish final touches
720s+:    Hard timeout - non-terminal tools blocked, only vote/new_answer allowed
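
The timeline above can be sketched as a small phase classifier. This is an illustrative model only, not MassGen's internal code: the function names `round_phase` and `tool_allowed` are hypothetical, and the boundary handling (soft timeout at exactly 600s, hard timeout at exactly 720s) is an assumption read off the timeline.

```python
# Tools the text above says remain available after the hard timeout.
TERMINAL_TOOLS = {"vote", "new_answer"}


def round_phase(elapsed_s, soft_timeout_s=600, grace_s=120):
    """Return 'normal', 'grace', or 'hard' for a point in time within a round."""
    if elapsed_s < soft_timeout_s:
        return "normal"  # agent works without any warning
    if elapsed_s < soft_timeout_s + grace_s:
        return "grace"   # warning injected, all tools still usable
    return "hard"        # only terminal tools are allowed


def tool_allowed(tool_name, elapsed_s, soft_timeout_s=600, grace_s=120):
    """Hypothetical gate: block non-terminal tools once the hard timeout is reached."""
    if round_phase(elapsed_s, soft_timeout_s, grace_s) != "hard":
        return True
    return tool_name in TERMINAL_TOOLS


for t in (300, 600, 719, 720):
    print(t, round_phase(t))
```

With the defaults, a call to `tool_allowed("vote", 800)` passes while any other tool name is rejected at the same elapsed time.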

Soft Timeout Message (from RoundTimeoutPostHook):

============================================================
⏰ ROUND TIME LIMIT APPROACHING - PLEASE WRAP UP
============================================================

You have exceeded the soft time limit for this initial answer round (605s / 600s).

Please wrap up your current work and submit soon:
1. `new_answer` - Submit your current best answer (can be a work-in-progress)
2. `vote` - Vote for an existing answer if one is satisfactory

You may finish any final touches to make your work presentable, but please
submit within the next 120 seconds. After that, tool calls
will be blocked and you'll need to submit immediately.

The next coordination round will allow further iteration if needed.
============================================================

Why Use Per-Round Timeouts:

Prevent stuck agents: Agents can get caught in loops (e.g., repeatedly calling vision tools on the same image)
Predictable costs: Cap spending on individual rounds
Fairer coordination: Ensure all agents get timely turns
Different phases, different needs: Initial answers need more time than voting rounds

Smart Injection Skipping:

When a new answer arrives from another agent, MassGen normally injects it mid-stream so the current agent can consider it. However, if the agent is close to its soft timeout, injection is skipped and the agent restarts instead. This gives agents enough time to consider new answers properly rather than being forced to submit immediately after seeing them.

The threshold is round_timeout_grace_seconds: if the time remaining before the soft timeout is less than the grace period, injection is skipped.

[Orchestrator] Skipping mid-stream injection for agent_a - only 45s until soft timeout (need 120s to think)
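
The skip rule described above reduces to a single comparison. The sketch below is a hedged illustration: `should_skip_injection` is a hypothetical name, and treating a missing soft timeout as "never skip" is an assumption rather than documented behavior.

```python
def should_skip_injection(elapsed_s, soft_timeout_s, grace_s):
    """Skip mid-stream injection when less than one grace period remains.

    soft_timeout_s may be None when per-round timeouts are not configured;
    in that case this sketch assumes injection always proceeds.
    """
    if soft_timeout_s is None:
        return False
    remaining_s = soft_timeout_s - elapsed_s
    return remaining_s < grace_s


# Matches the log line above: 45s remaining, 120s needed, so injection is skipped.
print(should_skip_injection(elapsed_s=555, soft_timeout_s=600, grace_s=120))
```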

Subagent Round Timeouts#

Subagents can use per-round timeouts too. Configure them under orchestrator.coordination.subagent_round_timeouts. If omitted, subagents inherit the parent timeout_settings values.

orchestrator:
  coordination:
    enable_subagents: true
    subagent_round_timeouts:
      initial_round_timeout_seconds: 300
      subsequent_round_timeout_seconds: 120
      round_timeout_grace_seconds: 60
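
One way to picture the inheritance rule is a lookup that prefers the subagent block and falls back to the parent settings. The text does not say whether inheritance applies per key or only when the whole block is omitted; this sketch assumes per-key fallback, and `resolve_subagent_timeouts` is a hypothetical helper, not a MassGen function.

```python
ROUND_TIMEOUT_KEYS = (
    "initial_round_timeout_seconds",
    "subsequent_round_timeout_seconds",
    "round_timeout_grace_seconds",
)


def resolve_subagent_timeouts(config):
    """Return the round timeouts a subagent would use under per-key fallback."""
    parent = config.get("timeout_settings") or {}
    coordination = (config.get("orchestrator") or {}).get("coordination") or {}
    overrides = coordination.get("subagent_round_timeouts") or {}
    # Prefer the subagent value; otherwise inherit the parent value (may be None).
    return {key: overrides.get(key, parent.get(key)) for key in ROUND_TIMEOUT_KEYS}


config = {
    "timeout_settings": {
        "initial_round_timeout_seconds": 600,
        "subsequent_round_timeout_seconds": 180,
        "round_timeout_grace_seconds": 120,
    },
    "orchestrator": {
        "coordination": {
            "enable_subagents": True,
            "subagent_round_timeouts": {"initial_round_timeout_seconds": 300},
        }
    },
}
print(resolve_subagent_timeouts(config))
```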

Configuration Methods

Method 1: CLI Flag (Highest Priority)#

Override timeout for a single run:

# Short timeout for simple task
uv run python -m massgen.cli \
  --orchestrator-timeout 300 \
  --config config.yaml \
  "What are LLM agents?"

# Longer timeout for complex research
uv run python -m massgen.cli \
  --orchestrator-timeout 3600 \
  --config config.yaml \
  "Conduct comprehensive market analysis with 5 agents"

Method 2: Configuration File#

Set timeout in your YAML configuration:

# Basic configuration with custom timeout
agents:
  - id: "agent1"
    backend:
      type: "gemini"
      model: "gemini-2.5-flash"

timeout_settings:
  orchestrator_timeout_seconds: 900  # 15 minutes

ui:
  display_type: "rich_terminal"

Method 3: Default (No Configuration)#

If not specified, MassGen uses the default 30-minute timeout:

# This configuration will use default 1800s timeout
agents:
  - id: "agent1"
    backend:
      type: "openai"
      model: "gpt-4o"
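
The precedence across the three methods (CLI flag, then config file, then default) can be sketched as follows. This is a minimal illustration with a hypothetical helper name, `effective_orchestrator_timeout`; it assumes the parsed YAML is available as a plain dictionary and does not reflect how MassGen resolves the value internally.

```python
DEFAULT_ORCHESTRATOR_TIMEOUT_S = 1800  # 30-minute default stated above


def effective_orchestrator_timeout(cli_value, config):
    """Pick the timeout: CLI flag first, then config file, then the default."""
    if cli_value is not None:
        return cli_value
    settings = config.get("timeout_settings") or {}
    return settings.get("orchestrator_timeout_seconds", DEFAULT_ORCHESTRATOR_TIMEOUT_S)


config = {"timeout_settings": {"orchestrator_timeout_seconds": 900}}
print(effective_orchestrator_timeout(300, config))   # CLI flag wins
print(effective_orchestrator_timeout(None, config))  # config file value
print(effective_orchestrator_timeout(None, {}))      # built-in default
```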

Timeout Behavior#

What Happens When Timeout Occurs#

When the orchestrator timeout is reached:

Current coordination round completes (not interrupted mid-operation)
Partial results saved (current state is preserved)
Error message displayed indicating timeout
Graceful shutdown (agents cleanup properly)

🔄 Round 5 of coordination...
⏰ Orchestrator timeout reached (1800 seconds)
💾 Saving current state...
❌ Coordination incomplete - timeout exceeded

Important: The system attempts graceful termination. Individual agent operations may still complete if they’re in progress.

Successful Completion Before Timeout#

If coordination completes normally:

✅ Coordination complete!
⏱️  Total time: 245 seconds (well under 1800s limit)

Choosing the Right Timeout#

Simple Tasks (5-10 minutes)#

Recommended: 300-600 seconds

timeout_settings:
  orchestrator_timeout_seconds: 600

Examples:

Quick research questions
Single-agent tasks
Fast models (GPT-4o-mini, Gemini Flash)
Tasks with 2-3 agents

uv run python -m massgen.cli \
  --orchestrator-timeout 600 \
  --model gemini-2.5-flash \
  "What are the key features of Python 3.12?"

Standard Tasks (15-30 minutes)#

Recommended: 900-1800 seconds (default)

timeout_settings:
  orchestrator_timeout_seconds: 1800  # Default

Examples:

Multi-agent coordination (3-5 agents)
Tasks with external API calls (MCP tools)
Code generation with file operations
Research with web search

uv run python -m massgen.cli \
  --config multi_agent_config.yaml \
  "Analyze market trends and create a report"

Complex Tasks (30-60 minutes)#

Recommended: 1800-3600 seconds

timeout_settings:
  orchestrator_timeout_seconds: 3600  # 1 hour

Examples:

Large-scale code refactoring
Comprehensive research with many sources
Tasks involving multiple API calls
Coordination with 5+ agents
Planning mode with extensive discussion

uv run python -m massgen.cli \
  --orchestrator-timeout 3600 \
  --config five_agents_research.yaml \
  "Conduct a complete competitive analysis of the AI market"

Long-Running Tasks (60+ minutes)#

Recommended: 3600+ seconds

timeout_settings:
  orchestrator_timeout_seconds: 7200  # 2 hours

Warning

Very long timeouts can lead to expensive API costs. Consider breaking down the task or using checkpoints.

Examples:

Full codebase analysis
Large-scale data processing
Multi-stage project generation
Complex multi-turn conversations

Examples by Task Type#

Example 1: Quick Analysis#

Task: Simple question, single agent

uv run python -m massgen.cli \
  --orchestrator-timeout 300 \
  --backend openai \
  --model gpt-4o-mini \
  "Explain quantum entanglement in simple terms"

Reasoning: A single agent with a fast model should finish in 1-2 minutes; a 5-minute timeout leaves a buffer.

Example 2: Multi-Agent Research#

Task: Three agents researching and comparing approaches

agents:
  - id: "researcher1"
    backend: {type: "gemini", model: "gemini-2.5-flash"}
  - id: "researcher2"
    backend: {type: "openai", model: "gpt-4o"}
  - id: "researcher3"
    backend: {type: "claude", model: "claude-sonnet-4"}

timeout_settings:
  orchestrator_timeout_seconds: 1200  # 20 minutes

Reasoning: Multiple rounds of coordination are expected and web search is enabled; 20 minutes allows for thorough research and discussion.

Example 3: Code Generation with Files#

Task: Generate project structure with multiple files

agents:
  - id: "architect"
    backend: {type: "claude_code", cwd: "workspace"}
  - id: "reviewer"
    backend: {type: "gemini", model: "gemini-2.5-flash"}

orchestrator:
  coordination:
    enable_planning_mode: true

timeout_settings:
  orchestrator_timeout_seconds: 1800  # 30 minutes

Reasoning: The task combines planning-mode discussion with file creation; the default 30 minutes is appropriate.

Example 4: MCP Tool Integration#

Task: Use multiple MCP tools with planning mode

agents:
  - id: "agent1"
    backend:
      type: "openai"
      model: "gpt-5-nano"
      mcp_servers:
        - {name: "weather", ...}
        - {name: "search", ...}

orchestrator:
  coordination:
    enable_planning_mode: true

timeout_settings:
  orchestrator_timeout_seconds: 2400  # 40 minutes

Reasoning: MCP tools may add API latency and planning mode adds coordination time; 40 minutes provides a safety margin.

Troubleshooting#

Timeouts Occurring Too Frequently#

Symptoms:

Tasks consistently hitting timeout
Coordination incomplete messages
Partial results only

Solutions:

Increase timeout:

timeout_settings:
  orchestrator_timeout_seconds: 3600  # Double the default

Reduce agent count: Fewer agents = faster coordination
Simplify task: Break complex tasks into smaller subtasks
Use faster models: Consider GPT-4o-mini or Gemini Flash instead of larger models

Disable planning mode if not needed:

orchestrator:
  coordination:
    enable_planning_mode: false

Check for stuck agents: Review debug logs for agents not responding

Enable per-round timeouts: Force agents to submit after a time limit:

timeout_settings:
  initial_round_timeout_seconds: 600
  subsequent_round_timeout_seconds: 180

Tasks Completing Too Quickly#

Symptoms:

Coordination ends in seconds
Agents immediately voting without discussion
Concern that a short timeout is limiting deeper analysis

Solutions:

This is generally not a problem - fast completion is good!
If you want more thorough discussion, adjust system messages to encourage analysis

Per-Round Timeout Issues#

Symptoms:

Soft timeout message appears but agent keeps working
Hard timeout blocks tools unexpectedly
Agent submits incomplete work

Solutions:

Increase grace period if agents need more time to finish:

timeout_settings:
  round_timeout_grace_seconds: 180  # 3 minutes instead of 2

Increase initial timeout for complex tasks:

timeout_settings:
  initial_round_timeout_seconds: 900  # 15 minutes

Check log messages for timeout events:

[RoundTimeoutPostHook] Soft timeout reached for agent_b after 605s
[RoundTimeoutPreHook] Blocking mcp__filesystem__write_file for agent_b - hard timeout exceeded

Disable per-round timeouts by omitting the settings (they’re disabled by default)
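
To scan a debug log for the timeout events mentioned in the solutions above, a short script can help. The patterns below are inferred only from the two sample log lines in this section, so treat them as assumptions that may need adjusting to the actual log format; `parse_timeout_events` is a hypothetical helper.

```python
import re

# Patterns derived from the two sample log lines in this section (assumed format).
SOFT_RE = re.compile(
    r"\[RoundTimeoutPostHook\] Soft timeout reached for (?P<agent>\S+) after (?P<elapsed>\d+)s"
)
HARD_RE = re.compile(
    r"\[RoundTimeoutPreHook\] Blocking (?P<tool>\S+) for (?P<agent>\S+) - hard timeout exceeded"
)


def parse_timeout_events(lines):
    """Return ('soft', agent, elapsed_s) and ('hard', agent, tool) tuples in log order."""
    events = []
    for line in lines:
        match = SOFT_RE.search(line)
        if match:
            events.append(("soft", match["agent"], int(match["elapsed"])))
            continue
        match = HARD_RE.search(line)
        if match:
            events.append(("hard", match["agent"], match["tool"]))
    return events


sample = [
    "[RoundTimeoutPostHook] Soft timeout reached for agent_b after 605s",
    "[RoundTimeoutPreHook] Blocking mcp__filesystem__write_file for agent_b - hard timeout exceeded",
]
for event in parse_timeout_events(sample):
    print(event)
```

In practice the lines would come from reading agent_outputs/log_{timestamp}/massgen_debug.log rather than from an inline list.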

Timeout But No Error Message#

Problem: Timeout occurs but no clear indication in output.

Solution: Enable debug logging:

uv run python -m massgen.cli \
  --debug \
  --orchestrator-timeout 600 \
  --config config.yaml \
  "Your question"

Check logs in agent_outputs/log_{timestamp}/massgen_debug.log

Best Practices#

Start with defaults: Use the 30-minute default unless you have specific needs
Adjust based on task complexity:
- Simple: 300-600s
- Standard: 900-1800s
- Complex: 1800-3600s
- Very complex: 3600+s
Consider cost implications: Longer timeouts = potentially higher API costs

Use CLI overrides for testing: Test with shorter timeouts first

# Test with 5-minute timeout
uv run python -m massgen.cli --orchestrator-timeout 300 --config test.yaml "test"

# Then use full timeout for production
uv run python -m massgen.cli --config prod.yaml "real task"

Monitor actual completion times: Check logs to see typical durations for your tasks

Set appropriate timeouts per environment:

# Development config
timeout_settings:
  orchestrator_timeout_seconds: 600  # Fast feedback

# Production config
timeout_settings:
  orchestrator_timeout_seconds: 3600  # Allow full completion

Document timeout choices: Add comments explaining timeout rationale

timeout_settings:
  # 40 minutes: allows for 5 agents, planning mode, and MCP tool latency
  orchestrator_timeout_seconds: 2400

API Cost Considerations#

Longer timeouts can lead to higher costs:

Estimated API Costs by Timeout:

Timeout	Typical Duration	3-Agent Scenario	5-Agent Scenario
5 min	2-3 min	$0.10-0.50	$0.20-0.80
30 min (default)	5-15 min	$0.50-2.00	$1.00-4.00
1 hour	20-40 min	$2.00-5.00	$4.00-10.00
2 hours	40-90 min	$5.00-15.00	$10.00-30.00

Note

These are rough estimates. Actual costs depend on:

Models used (GPT-4 vs GPT-4o-mini, etc.)
Number of coordination rounds
Tool usage (MCP, code execution, web search)
Response lengths

Cost-Saving Tips:

Use shorter timeouts for testing
Choose efficient models (GPT-4o-mini, Gemini Flash)
Limit agent count for simple tasks
Monitor actual usage and adjust timeouts accordingly

Debug and Monitoring#

Viewing Timeout Information#

Enable debug logging to see timeout details:

uv run python -m massgen.cli --debug --config config.yaml "question"

Look for timeout-related messages in agent_outputs/log_{timestamp}/massgen_debug.log:

[INFO] Orchestrator timeout configured: 1800 seconds
[INFO] Starting coordination...
[INFO] Round 1 complete (elapsed: 45s / 1800s)
[INFO] Round 2 complete (elapsed: 128s / 1800s)
...
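
To track how much of the orchestrator budget each round consumes, the round-completion lines can be parsed with a small script. The pattern is taken from the sample log lines above and is an assumption about the real format; `budget_used` is a hypothetical helper.

```python
import re

# Pattern derived from the sample "Round N complete" lines above (assumed format).
ROUND_RE = re.compile(
    r"Round (?P<round>\d+) complete \(elapsed: (?P<elapsed>\d+)s / (?P<limit>\d+)s\)"
)


def budget_used(lines):
    """Return (round, elapsed_s, percent_of_limit) for each round-completion line."""
    rows = []
    for line in lines:
        match = ROUND_RE.search(line)
        if match:
            elapsed = int(match["elapsed"])
            limit = int(match["limit"])
            rows.append((int(match["round"]), elapsed, round(100 * elapsed / limit, 1)))
    return rows


sample = [
    "[INFO] Orchestrator timeout configured: 1800 seconds",
    "[INFO] Round 1 complete (elapsed: 45s / 1800s)",
    "[INFO] Round 2 complete (elapsed: 128s / 1800s)",
]
for row in budget_used(sample):
    print(row)
```

Collecting these percentages over several runs gives a rough basis for raising or lowering orchestrator_timeout_seconds.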

Monitoring Coordination Progress#

In the terminal UI, watch for elapsed time indicators:

┌─ Coordination Progress ─────────────────┐
│ Round: 3/∞                              │
│ Elapsed: 234s / 1800s (13%)             │
│ Status: In progress                     │
└──────────────────────────────────────────┘

Next Steps#

Test your configuration with appropriate timeouts
Monitor actual completion times in your use cases
Adjust timeouts based on observed patterns
Consider cost vs. completion trade-offs
