# MassGen — full documentation dump

> Concatenated source of MassGen's quickstart, user guide, and reference
> documentation. For a curated index see /llms.txt. Generated at
> Sphinx build time from docs/source/{quickstart,user_guide,reference}.


---

## quickstart/configuration.rst

Configuration
=============

MassGen is configured using environment variables for API keys and YAML files for agent definitions and orchestrator settings. This guide shows you how to set up your configuration.

.. tip::

   MassGen offers multiple usage modes: **CLI** with YAML configuration, **Python API** (``massgen.run()``), and **LiteLLM integration** for OpenAI-compatible interfaces. This guide focuses on CLI configuration. For Python integration, see :doc:`../user_guide/integration/python_api`.

Configuration Methods
=====================

MassGen offers three ways to configure your agents:

1. **Interactive Setup Wizard** (Recommended for beginners)
2. **YAML Configuration Files** (For advanced customization)
3. **CLI Flags** (For quick one-off queries)

Interactive Setup Wizard
-------------------------

The easiest way to configure MassGen is through the interactive wizard:

.. code-block:: bash

   # First run automatically triggers the wizard
   uv run massgen

   # Or manually launch it
   uv run massgen --init

**The Config Builder Interface:**

.. code-block:: text

   ╭──────────────────────────────────────────────────────────────────────────────╮
   │                                                                              │
   │       ███╗   ███╗ █████╗ ███████╗███████╗ ██████╗ ███████╗███╗   ██╗         │
   │       ████╗ ████║██╔══██╗██╔════╝██╔════╝██╔════╝ ██╔════╝████╗  ██║         │
   │       ██╔████╔██║███████║███████╗███████╗██║  ███╗█████╗  ██╔██╗ ██║         │
   │       ██║╚██╔╝██║██╔══██║╚════██║╚════██║██║   ██║██╔══╝  ██║╚██╗██║         │
   │       ██║ ╚═╝ ██║██║  ██║███████║███████║╚██████╔╝███████╗██║ ╚████║         │
   │       ╚═╝     ╚═╝╚═╝  ╚═╝╚══════╝╚══════╝ ╚═════╝ ╚══════╝╚═╝  ╚═══╝         │
   │                                                                              │
   │            🤖 🤖 🤖  →  💬 collaborate  →  🎯 winner  →  📢 final            │
   │                                                                              │
   │  Interactive Configuration Builder                                           │
   │  Create custom multi-agent configurations in minutes!                        │
   ╰──────────────────────────────────────────────────────────────────────────────╯

The wizard guides you through 4 simple steps:

1. **Select Your Use Case**: Choose from pre-built templates (Research, Coding, Q&A, etc.)
2. **Configure Agents**: Select providers and models (wizard detects available API keys)
3. **Configure Tools**: Enable web search, code execution, file operations, etc.
4. **Review & Save**: Save to ``~/.config/massgen/config.yaml`` (Windows: ``%USERPROFILE%\.config\massgen\config.yaml``)

After completing the wizard, your configuration is ready to use:

.. code-block:: bash

   uv run massgen "Your question"  # Uses default config automatically

Configuration Directory Structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MassGen uses two directories for configuration:

**User Configuration** (``~/.config/massgen/``):

.. code-block:: text

   ~/.config/massgen/                        # Windows: %USERPROFILE%\.config\massgen\
   ├── config.yaml              # Default configuration (from wizard)
   ├── agents/                  # Your custom named configurations
   │   ├── research-team.yaml
   │   └── coding-agents.yaml
   └── .env                     # API keys (optional)

**Project Workspace** (``.massgen/`` in your project):

MassGen also creates a ``.massgen/`` directory in your project for sessions, workspaces, and snapshots. See :doc:`../user_guide/concepts` for details.

**Creating Named Configurations:**

.. code-block:: bash

   # Run the wizard in named config mode
   uv run massgen --init

   # Choose to save to ~/.config/massgen/agents/ (Windows: %USERPROFILE%\.config\massgen\agents\)
   # Then use it:
   uv run massgen --config research-team "Your question"
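
The directory layout above can be expressed as a small path helper. This is a minimal sketch, not MassGen's own resolution logic: the function names are hypothetical, and the mapping from ``--config research-team`` to ``agents/research-team.yaml`` is inferred from the tree shown above.

.. code-block:: python

   from pathlib import Path


   def massgen_config_dir(home: Path | None = None) -> Path:
       """Return the user configuration directory (hypothetical helper)."""
       # Path.home() resolves to %USERPROFILE% on Windows.
       base = Path.home() if home is None else home
       return base / ".config" / "massgen"


   def default_config_path(home: Path | None = None) -> Path:
       """Location of the default config written by the wizard."""
       return massgen_config_dir(home) / "config.yaml"


   def named_config_path(name: str, home: Path | None = None) -> Path:
       """Assumed location of a named config such as 'research-team'."""
       return massgen_config_dir(home) / "agents" / f"{name}.yaml"

Passing an explicit ``home`` makes the helpers easy to check without touching the real home directory.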

Quickstart Output Filenames
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Quickstart supports explicit config filenames without manually editing paths:

.. code-block:: bash

   # Uses quickstart defaults and saves to .massgen/team-config.yaml
   uv run massgen --quickstart --config team-config

   # Extension is optional; .yaml is added when omitted
   uv run massgen --quickstart --config team-config.yaml

In the TUI/Web quickstart wizard, use the "Save Location" step to:

* choose project (``.massgen/``) or global (``~/.config/massgen/``) save location
* enter a custom filename
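
The filename rule described above (``.yaml`` is added when omitted, and the file is saved under ``.massgen/``) can be sketched as follows. The function name is hypothetical and the code only illustrates the rule; it is not taken from MassGen's source.

.. code-block:: python

   from pathlib import Path


   def quickstart_output_path(name: str, save_dir: Path = Path(".massgen")) -> Path:
       """Append '.yaml' when the extension is omitted, then place the file in save_dir."""
       filename = name if name.endswith(".yaml") else f"{name}.yaml"
       return save_dir / filename

Both ``team-config`` and ``team-config.yaml`` map to the same path under this rule.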

Environment Variables
---------------------

API keys are configured through environment variables or a ``.env`` file. After pip install, the setup wizard can create ``~/.config/massgen/.env`` (Windows: ``%USERPROFILE%\.config\massgen\.env``) for you.

OpenRouter (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~

**Use one API key to access all models** - OpenRouter provides a unified API for OpenAI, Anthropic, Google, xAI, and 200+ other models:

.. code-block:: bash

   # Single key for all models
   export OPENROUTER_API_KEY=sk-or-v1-...

Then use OpenRouter models in your multi-agent configurations.

Get your key: `OpenRouter <https://openrouter.ai/keys>`_

Individual Provider Keys
~~~~~~~~~~~~~~~~~~~~~~~~

Alternatively, use provider-specific keys:

.. code-block:: bash

   # .env file format; prefix each line with "export" to set it in a shell instead

   # OpenAI (for GPT-5, GPT-4, etc.)
   OPENAI_API_KEY=sk-...

   # Anthropic Claude (for claude backend)
   ANTHROPIC_API_KEY=sk-ant-...

   # Claude Code (optional - for claude_code backend only)
   # If set, claude_code uses this instead of ANTHROPIC_API_KEY
   # CLAUDE_CODE_API_KEY=sk-ant-...

   # Google Gemini
   GOOGLE_API_KEY=...

   # xAI Grok
   XAI_API_KEY=...

.. note::

   **Separate API keys for Claude Code:** The ``claude_code`` backend checks
   ``CLAUDE_CODE_API_KEY`` first, then falls back to ``ANTHROPIC_API_KEY``.
   This allows you to use a Claude subscription (no API key) or a separate
   API key for Claude Code agents while using a different API key for standard
   Claude backend agents.
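
The lookup order in the note can be sketched in a few lines. This is an illustration of the documented fallback, assuming plain environment-variable lookups; the function name is hypothetical.

.. code-block:: python

   import os


   def resolve_claude_code_key(env=None):
       """Return CLAUDE_CODE_API_KEY if set, else ANTHROPIC_API_KEY, else None."""
       env = os.environ if env is None else env
       # An empty string counts as "not set" here, so the fallback still applies.
       return env.get("CLAUDE_CODE_API_KEY") or env.get("ANTHROPIC_API_KEY")

A result of ``None`` corresponds to the case where neither key is set and the backend relies on subscription or CLI authentication.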

**Getting API Keys:**

* `OpenRouter <https://openrouter.ai/keys>`_ (recommended - single key for all models)
* `OpenAI <https://platform.openai.com/api-keys>`_
* `Anthropic Claude <https://docs.anthropic.com/en/api/overview>`_
* `Google Gemini <https://ai.google.dev/gemini-api/docs>`_
* `xAI Grok <https://docs.x.ai/docs/overview>`_

YAML Configuration Files
-------------------------

MassGen uses YAML files to define agents, their backends, and orchestrator settings. Example configuration files are stored under ``@examples/`` and can be referenced with the ``--config`` flag.

Basic Configuration Structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A minimal MassGen configuration has these top-level keys:

.. code-block:: yaml

   agents:              # List of agents (required)
     - id: "agent_id"   # Agent definitions
       backend: ...     # Backend configuration
       system_message: ...  # Optional system prompt

   orchestrator:        # Orchestrator settings (optional, required for file ops)
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"
     context_paths: ...

   ui:                  # UI settings (optional)
     display_type: "rich_terminal"
     logging_enabled: true

Single Agent Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~

For a single agent, use the ``agents`` field (plural) with one entry:

.. code-block:: yaml

   # @examples/basic/single/single_gpt5nano
   agents:                # Note: plural 'agents' even for single agent
     - id: "gpt-5-nano"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         enable_web_search: true
         enable_code_interpreter: true

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

.. warning::

   **Common Mistake**: When converting a single-agent config to multi-agent, remember to keep ``agents:`` (plural).

   While ``agent:`` (singular) is supported for single-agent configs, always use ``agents:`` (plural) for consistency - this prevents errors when adding more agents later.
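
To make the ``agent:`` versus ``agents:`` distinction concrete, here is a minimal sketch that normalizes either form into a list. It operates on an already-parsed config dictionary; the function name is hypothetical, and preferring ``agents`` when both keys are present is an assumption of this sketch rather than documented MassGen behavior.

.. code-block:: python

   def normalize_agents(config: dict) -> list[dict]:
       """Return the agent definitions as a list, accepting 'agents' or 'agent'."""
       if "agents" in config:
           agents = config["agents"]
       elif "agent" in config:
           # Singular form: wrap the single definition in a list.
           agents = [config["agent"]]
       else:
           raise ValueError("config must define 'agents' (or 'agent')")
       if not isinstance(agents, list) or not agents:
           raise ValueError("'agents' must be a non-empty list")
       return agents

Writing ``agents:`` from the start means a second entry can be added later without changing the key.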

**Run this configuration:**

.. code-block:: bash

   uv run massgen \
     --config @examples/basic/single/single_gpt5nano \
     "What is machine learning?"

Multi-Agent Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~

For multiple agents, add more entries to the ``agents`` list:

.. code-block:: yaml

   # @examples/basic/multi/three_agents_default
   agents:
     - id: "gemini2.5flash"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         enable_web_search: true

     - id: "gpt5nano"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         enable_web_search: true
         enable_code_interpreter: true

     - id: "grok3mini"
       backend:
         type: "grok"
         model: "grok-3-mini"
         enable_web_search: true

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

**Run this configuration:**

.. code-block:: bash

   uv run massgen \
     --config @examples/basic/multi/three_agents_default \
     "Analyze the pros and cons of renewable energy"

Backend Configuration
---------------------

Each agent requires a ``backend`` configuration that specifies the model provider and settings.

.. important::
   **Choosing the right backend?** Different backends support different features (web search, code execution, file operations, etc.). Check the **Backend Capabilities Matrix** in :doc:`../user_guide/backends` to see which features are available for each backend type.

Supported Providers
~~~~~~~~~~~~~~~~~~~

MassGen supports many LLM providers. Use the **slash format** (``provider/model``) for the Python API and LiteLLM:

.. list-table:: Provider Reference
   :header-rows: 1
   :widths: 15 20 30 20

   * - Provider
     - Backend Type
     - Example Models
     - Slash Format Example
   * - OpenAI
     - ``openai``
     - ``gpt-5``, ``gpt-5-nano``, ``gpt-5.1``
     - ``openai/gpt-5``
   * - Anthropic
     - ``claude``
     - ``claude-sonnet-4-5-20250929``, ``claude-opus-4-5-20251101``
     - ``claude/claude-sonnet-4-5-20250929``
   * - Google
     - ``gemini``
     - ``gemini-2.5-flash``, ``gemini-2.5-pro``, ``gemini-3-pro-preview``
     - ``gemini/gemini-2.5-flash``
   * - xAI
     - ``grok``
     - ``grok-4``, ``grok-4-1-fast-reasoning``, ``grok-3-mini``
     - ``grok/grok-4``
   * - Groq
     - ``groq``
     - ``llama-3.3-70b-versatile``, ``mixtral-8x7b-32768``
     - ``groq/llama-3.3-70b-versatile``
   * - Cerebras
     - ``cerebras``
     - ``llama-3.3-70b``, ``llama-3.1-8b``
     - ``cerebras/llama-3.3-70b``
   * - Together
     - ``together``
     - ``meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo``
     - ``together/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo``
   * - Fireworks
     - ``fireworks``
     - ``accounts/fireworks/models/llama-v3p3-70b-instruct``
     - ``fireworks/accounts/fireworks/models/llama-v3p3-70b-instruct``
   * - OpenRouter
     - ``openrouter``
     - 200+ models (e.g., ``x-ai/grok-4.1-mini``)
     - ``openrouter/x-ai/grok-4.1-mini``
   * - Qwen
     - ``qwen``
     - ``qwen-max``, ``qwen-plus``, ``qwen-turbo``
     - ``qwen/qwen-max``
   * - Moonshot
     - ``moonshot``
     - ``moonshot-v1-128k``, ``moonshot-v1-32k``
     - ``moonshot/moonshot-v1-128k``
   * - Nebius
     - ``nebius``
     - ``Qwen/Qwen3-4B-fast``
     - ``nebius/Qwen/Qwen3-4B-fast``
   * - Claude Code
     - ``claude_code``
     - ``claude-sonnet-4-5-20250929``
     - (YAML only, no API key — subscription or CLI auth)
   * - Codex
     - ``codex``
     - ``gpt-5.4``, ``gpt-5.3-codex``
     - (YAML only, no API key — ``codex login`` OAuth)
   * - Gemini CLI
     - ``gemini_cli``
     - ``gemini-2.5-pro``, ``gemini-2.5-flash``
     - (YAML only, no API key — ``gemini`` CLI login)
   * - GitHub Copilot
     - ``copilot``
     - ``gpt-5-mini``, ``claude-sonnet-4``, ``gemini-2.5-pro``
     - (YAML only, no API key — ``copilot /login``)
   * - Azure OpenAI
     - ``azure_openai``
     - ``gpt-4o`` (deployment name)
     - (YAML only)

.. tip::
   **Nested slashes are supported!** For providers like OpenRouter, Together, and Fireworks where model names contain slashes, the format still works:

   - ``openrouter/x-ai/grok-4.1-mini`` → provider=openrouter, model=x-ai/grok-4.1-mini
   - ``together/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo`` → provider=together, model=meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
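
The nested-slash behavior in the tip amounts to splitting on the first slash only. A minimal sketch, with a hypothetical function name and not taken from MassGen's parser:

.. code-block:: python

   def split_model_spec(spec: str) -> tuple[str, str]:
       """Split 'provider/model' on the first slash; the model may contain more slashes."""
       provider, sep, model = spec.partition("/")
       if not sep or not provider or not model:
           raise ValueError(f"expected 'provider/model', got {spec!r}")
       return provider, model

Because only the first slash separates the provider, model names such as ``x-ai/grok-4.1-mini`` pass through unchanged.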

**Using slash format in Python:**

.. code-block:: python

   import massgen

   # Build config with slash format
   config = massgen.build_config(models=[
       "openai/gpt-5",
       "groq/llama-3.3-70b-versatile",
       "openrouter/x-ai/grok-4.1-mini"
   ])

   # Or with LiteLLM (using OpenRouter)
   from dotenv import load_dotenv
   load_dotenv()  # Load OPENROUTER_API_KEY from .env

   import litellm
   from massgen import register_with_litellm

   register_with_litellm()
   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "Your question"}],
       optional_params={"models": ["openrouter/openai/gpt-5", "openrouter/anthropic/claude-sonnet-4.5"]}
   )
   print(response.choices[0].message.content)

Backend Types (YAML)
~~~~~~~~~~~~~~~~~~~~

For YAML configuration files, use the ``type`` field:

**Agent/CLI backends** (no API key required — use CLI auth):

* ``claude_code`` - Claude Code SDK with dev tools (subscription or ``CLAUDE_CODE_API_KEY``)
* ``codex`` - OpenAI Codex CLI (``codex login`` OAuth or ``OPENAI_API_KEY``)
* ``gemini_cli`` - Google Gemini CLI agent (``gemini`` CLI login, no API key needed)
* ``copilot`` - GitHub Copilot CLI (``copilot /login``, no API key needed)

**API backends:**

* ``openai`` - OpenAI models (GPT-5, GPT-4, etc.)
* ``claude`` - Anthropic Claude models
* ``gemini`` - Google Gemini models
* ``grok`` - xAI Grok models
* ``groq`` - Groq inference (ultra-fast)
* ``cerebras`` - Cerebras AI
* ``together`` - Together AI
* ``fireworks`` - Fireworks AI
* ``openrouter`` - OpenRouter (200+ models)
* ``qwen`` - Alibaba Qwen models
* ``moonshot`` - Kimi/Moonshot AI
* ``nebius`` - Nebius AI Studio
* ``azure_openai`` - Azure OpenAI deployment
* ``zai`` - ZhipuAI GLM models
* ``ag2`` - AG2 framework integration
* ``lmstudio`` - Local models via LM Studio
* ``chatcompletion`` - Generic OpenAI-compatible API

Basic Backend Structure
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   backend:
     type: "openai"           # Backend type (required)
     model: "gpt-5-nano"      # Model name (required)
     api_key: "..."           # Optional - uses env var by default
     temperature: 0.7         # Optional - model parameters
     max_tokens: 4096         # Optional - response length

Backend-Specific Features
~~~~~~~~~~~~~~~~~~~~~~~~~

Different backends support different built-in tools:

.. code-block:: yaml

   # OpenAI with tools
   backend:
     type: "openai"
     model: "gpt-5-nano"
     enable_web_search: true
     enable_code_interpreter: true

   # Gemini with tools
   backend:
     type: "gemini"
     model: "gemini-2.5-flash"
     enable_web_search: true
     enable_code_execution: true

   # Claude Code with workspace
   backend:
     type: "claude_code"
     model: "claude-sonnet-4"
     cwd: "workspace"          # Working directory for file operations

.. note::
   Always use ``cwd: "workspace"`` rather than numbered names like ``workspace1``. MassGen automatically adds a unique suffix per agent at runtime to prevent identity leakage during voting.

See :doc:`../reference/yaml_schema` for complete backend options.

System Messages
---------------

Customize agent behavior with system messages:

.. code-block:: yaml

   agents:
     - id: "research_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
       system_message: |
         You are a research specialist. When answering questions:
         1. Always search for current information
         2. Cite your sources
         3. Provide comprehensive analysis

     - id: "code_agent"
       backend:
         type: "openai"
         model: "gpt-5-nano"
       system_message: |
         You are a coding expert. When solving problems:
         1. Write clean, well-documented code
         2. Use code execution to test solutions
         3. Explain your approach clearly

Orchestrator Configuration
--------------------------

Control workspace sharing and project integration:

.. code-block:: yaml

   orchestrator:
     snapshot_storage: "snapshots"              # Workspace snapshots for sharing
     agent_temporary_workspace: "temp_workspaces"  # Temporary workspaces
     context_paths:                             # Project integration
       - path: "/absolute/path/to/project"
         permission: "read"                     # read or write

Decomposition Mode Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use decomposition mode when agents own subtasks and one presenter synthesizes:

.. code-block:: yaml

   agents:
     - id: "frontend"
       subtask: "Implement UI and client behavior"
       backend:
         type: "openai"
         model: "gpt-5-nano"

     - id: "backend"
       subtask: "Implement API and persistence"
       backend:
         type: "openai"
         model: "gpt-5-nano"

     - id: "integrator"
       subtask: "Integrate and present final output"
       backend:
         type: "openai"
         model: "gpt-5-nano"

   orchestrator:
     coordination_mode: "decomposition"
     presenter_agent: "integrator"
     # Recommended decomposition defaults:
     max_new_answers_per_agent: 2  # Recommended range: 2-3
     max_new_answers_global: 9     # Team-wide cumulative cap across all agents
     answer_novelty_requirement: "balanced"

Sensible defaults guidance:

* Set ``max_new_answers_per_agent`` to 2 or 3 in decomposition mode.
* This is usually lower than the value used in fully parallel voting mode.
* Add ``max_new_answers_global`` for a deterministic total coordination budget.
* Keep other answer-control parameters at defaults unless you need stricter behavior.
* Keep fairness defaults enabled (``fairness_enabled: true``, ``fairness_lead_cap_answers: 2``, ``max_midstream_injections_per_round: 2``) to prevent fast agents from repeatedly lapping slower peers and causing restart churn.
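
How a per-agent cap and a team-wide cap interact can be sketched with a small counter. This is an illustration only: the class and method names are hypothetical, and the orchestrator's actual accounting may differ.

.. code-block:: python

   from collections import Counter


   class AnswerBudget:
       """Track new answers against a per-agent cap and an optional global cap."""

       def __init__(self, per_agent_cap: int, global_cap: int | None = None):
           self.per_agent_cap = per_agent_cap
           self.global_cap = global_cap
           self.counts = Counter()

       def try_record(self, agent_id: str) -> bool:
           """Record one answer for agent_id if both caps allow it."""
           if self.counts[agent_id] >= self.per_agent_cap:
               return False
           if self.global_cap is not None and sum(self.counts.values()) >= self.global_cap:
               return False
           self.counts[agent_id] += 1
           return True

In this sketch, three agents capped at 2 answers each can record at most 6 answers in total, so the per-agent cap is the binding limit whenever the global cap is larger than that product.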

Quickstart note:

* The Quickstart flows (``uv run massgen --quickstart`` and the Web/TUI quickstart wizard) expose decomposition mode, presenter selection, and these defaults directly.
* For GPT-5x models, Quickstart also exposes ``reasoning.effort`` selection.
  OpenAI GPT-5 models support ``low``, ``medium``, and ``high``; Codex GPT-5 models additionally support ``xhigh``.

Example:

.. code-block:: yaml

   agents:
     - id: "agent_a"
       backend:
         type: "codex"
         model: "gpt-5.4"
         reasoning:
           effort: "xhigh"
           summary: "auto"
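
The effort levels listed in the Quickstart note can be checked before a config is written. A minimal sketch, assuming only the two backend types named in that note; the table and function names are hypothetical.

.. code-block:: python

   # Effort levels per backend type, as stated in the Quickstart note.
   EFFORTS_BY_BACKEND = {
       "openai": ("low", "medium", "high"),
       "codex": ("low", "medium", "high", "xhigh"),
   }


   def is_valid_effort(backend_type: str, effort: str) -> bool:
       """Return True if the effort level is listed for the backend type."""
       return effort in EFFORTS_BY_BACKEND.get(backend_type, ())

Backends not in the table return ``False`` here, which only means this sketch has no information about them.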

Advanced Configuration
----------------------

MCP Integration
~~~~~~~~~~~~~~~

Add MCP (Model Context Protocol) servers for external tools:

.. code-block:: yaml

   agents:
     - id: "agent_with_mcp"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         mcp_servers:
           - name: "weather"
             type: "stdio"
             command: "npx"
             args: ["-y", "@fak111/weather-mcp"]

See :doc:`../user_guide/tools/mcp_integration` for details.

File Operations
~~~~~~~~~~~~~~~

Enable file system access for agents:

.. code-block:: yaml

   agents:
     - id: "file_agent"
       backend:
         type: "claude_code"
         model: "claude-sonnet-4"
         cwd: "workspace"       # Agent's working directory

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"

See :doc:`../user_guide/files/file_operations` for details.

Project Integration
~~~~~~~~~~~~~~~~~~~

Share directories with agents (read or write access):

.. code-block:: yaml

   agents:
     - id: "project_agent"
       backend:
         type: "claude_code"
         cwd: "workspace"

   orchestrator:
     context_paths:
       - path: "/absolute/path/to/project/src"
         permission: "read"      # Agents can analyze code
       - path: "/absolute/path/to/project/docs"
         permission: "write"     # Agents can update docs

See :doc:`../user_guide/files/project_integration` for details.

Protected Paths
~~~~~~~~~~~~~~~

Make specific files read-only within writable context paths:

.. code-block:: yaml

   orchestrator:
     context_paths:
       - path: "/project"
         permission: "write"
         protected_paths:
           - "config.json"        # Read-only
           - "template.html"      # Read-only
           # Other files remain writable

**Use Case**: Allow agents to modify most files while protecting critical configurations or templates.

See :doc:`../user_guide/files/protected_paths` for complete documentation.
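
The interaction between ``permission`` and ``protected_paths`` can be sketched as a lookup over the ``context_paths`` entries. This is a simplified illustration with a hypothetical function name. It assumes POSIX-style paths, that protected entries are relative to their context path, and that a protected directory also protects its contents; none of these assumptions are confirmed by this guide.

.. code-block:: python

   from pathlib import PurePosixPath


   def effective_permission(file_path: str, context_paths: list[dict]) -> str | None:
       """Return 'read', 'write', or None if the file is outside every context path."""
       target = PurePosixPath(file_path)
       for entry in context_paths:
           root = PurePosixPath(entry["path"])
           if target != root and root not in target.parents:
               continue  # file is not under this context path
           if entry.get("permission") != "write":
               return "read"
           relative = target.relative_to(root)
           for protected in entry.get("protected_paths", []):
               protected_path = PurePosixPath(protected)
               if relative == protected_path or protected_path in relative.parents:
                   return "read"  # protected inside a writable context path
           return "write"
       return None

With the configuration above, ``/project/config.json`` resolves to ``read`` while other files under ``/project`` resolve to ``write``.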

Planning Mode
~~~~~~~~~~~~~

Prevent irreversible actions during multi-agent coordination:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_planning_mode: true
       planning_mode_instruction: |
         PLANNING MODE: Describe your intended actions without executing.
         Save execution for the final presentation phase.

**Use Case**: File operations, API calls, or any task with irreversible consequences.

See :doc:`../user_guide/advanced/planning_mode` for complete documentation.

Subagents
~~~~~~~~~

Enable agents to spawn parallel child processes for independent tasks:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_subagents: true
       subagent_default_timeout: 300  # 5 minutes per subagent
       subagent_max_concurrent: 3     # Max parallel subagents

**Example usage:**

.. code-block:: bash

   uv run massgen \
     --config @massgen/configs/features/subagent_demo.yaml \
     "Build a website with frontend, backend, and documentation"

The agent can spawn subagents to work on each component simultaneously. Subagents:

* Run in isolated workspaces
* Inherit parent agent configurations by default
* Execute concurrently for parallel task completion
* Return structured results with workspace paths

**Use Case**: Complex tasks with independent, parallelizable components (e.g., multi-part research, website building, documentation generation).

See :doc:`../user_guide/advanced/subagents` for complete documentation.
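
The effect of ``subagent_max_concurrent`` and ``subagent_default_timeout`` can be illustrated with plain ``asyncio``. This is a generic concurrency sketch, not MassGen's subagent implementation: the function name and the result dictionary shape are hypothetical.

.. code-block:: python

   import asyncio


   async def run_subagents(tasks: dict, max_concurrent: int = 3, timeout: float = 300.0) -> list[dict]:
       """Run named coroutine factories with a concurrency limit and a per-task timeout."""
       semaphore = asyncio.Semaphore(max_concurrent)

       async def run_one(name, factory):
           async with semaphore:  # at most max_concurrent tasks run at once
               try:
                   result = await asyncio.wait_for(factory(), timeout)
                   return {"name": name, "status": "completed", "result": result}
               except asyncio.TimeoutError:
                   return {"name": name, "status": "timeout", "result": None}

       # gather() returns results in the same order as the input tasks.
       return await asyncio.gather(*(run_one(n, f) for n, f in tasks.items()))

In this sketch the timeout clock starts once a task acquires a slot, so time spent waiting for a free slot does not count against it.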

Timeout Configuration
~~~~~~~~~~~~~~~~~~~~~

Control maximum coordination time:

.. code-block:: yaml

   timeout_settings:
     orchestrator_timeout_seconds: 1800  # 30 minutes (default)

**CLI Override**:

.. code-block:: bash

   uv run massgen --orchestrator-timeout 600 --config config.yaml

See :doc:`../reference/timeouts` for complete timeout documentation.

For the complete list of CLI parameters, see :doc:`../reference/cli`.

Configuration Best Practices
-----------------------------

1. **Start Simple**: Use single agent configs for testing, then scale to multi-agent
2. **Use Environment Variables**: Never commit API keys to version control
3. **Organize Configs**: Group related configurations in directories
4. **Comment Your YAML**: Add comments to explain agent roles and settings
5. **Test Incrementally**: Verify each agent works before combining them
6. **Version Your Configs**: Track configuration changes in version control

Example Configuration Templates
-------------------------------

All configuration examples are in ``@examples/``:

* ``@examples/basic/single/single_gpt5nano`` - Single agent configuration
* ``@examples/basic/multi/three_agents_default`` - Multi-agent collaboration
* ``@examples/basic/multi/decomposition_quickstart`` - Decomposition mode with recommended defaults
* ``@examples/basic/multi/decomposition_example`` - Decomposition mode with richer role setup
* ``@examples/tools/mcp/*`` - MCP integration examples
* ``@examples/tools/filesystem/*`` - File operation examples
* ``@examples/ag2/*`` - AG2 framework integration

See the `Configuration Guide <https://github.com/Leezekun/MassGen/blob/main/@examples/README.md>`_ for the complete catalog.

Next Steps
----------

**Excellent! You understand configuration basics. Here's your path forward:**

✅ **You are here:** You know how to configure agents in YAML

⬜ **Put it to use:** :doc:`../examples/basic_examples` - Copy ready-made configurations

⬜ **Go deeper:** :doc:`../user_guide/concepts` - Understand how multi-agent coordination works

⬜ **Add capabilities:** :doc:`../user_guide/tools/mcp_integration` - Integrate external tools

**Need a reference?** The complete configuration schema is at :doc:`../reference/yaml_schema`

Troubleshooting
---------------

**Configuration not found:**

Ensure the path is correct relative to the MassGen directory:

.. code-block:: bash

   # Correct - relative to MassGen root
   uv run massgen --config @examples/basic/multi/three_agents_default

   # Incorrect - missing massgen/ prefix
   uv run massgen --config configs/basic/multi/three_agents_default.yaml

**API key not found:**

Check that your ``.env`` file exists and contains the correct key:

.. code-block:: bash

   # Verify .env file exists
   ls -la .env

   # Check for the required key
   grep "OPENAI_API_KEY" .env
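
The same check can be done for several keys at once. The following is a minimal sketch of a ``.env`` reader that handles ``KEY=value`` lines, comments, and an optional ``export`` prefix. It is not a full dotenv parser, and the function names are hypothetical.

.. code-block:: python

   def parse_env(text: str) -> dict[str, str]:
       """Parse simple KEY=value lines, skipping blanks and comments."""
       values: dict[str, str] = {}
       for raw_line in text.splitlines():
           line = raw_line.strip()
           if not line or line.startswith("#"):
               continue
           if line.startswith("export "):
               line = line[len("export "):].lstrip()
           key, sep, value = line.partition("=")
           if not sep:
               continue  # not an assignment
           values[key.strip()] = value.strip().strip("\"'")
       return values


   def missing_keys(text: str, required: list[str]) -> list[str]:
       """Return the required keys that are absent or empty."""
       values = parse_env(text)
       return [key for key in required if not values.get(key)]

For example, ``missing_keys(open(".env").read(), ["OPENAI_API_KEY"])`` returns an empty list when the key is present and non-empty.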

**YAML syntax error:**

Validate your YAML syntax:

.. code-block:: bash

   python -c "import yaml; yaml.safe_load(open('your-config.yaml'))"


---

## quickstart/installation.rst

============
Installation
============

Prerequisites
=============

MassGen requires **Python 3.11 or higher**.

A guide to installing Python is available `here <https://realpython.com/installing-python/>`_.

.. code-block:: bash

   python --version  # Should be 3.11+
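
The same requirement can be checked from inside Python, which is useful when several interpreters are installed. A minimal sketch; the constant and function names are hypothetical.

.. code-block:: python

   import sys

   MIN_VERSION = (3, 11)  # minimum version stated above


   def meets_minimum(version=None) -> bool:
       """Compare a (major, minor, ...) tuple against the minimum; defaults to this interpreter."""
       current = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
       return current >= MIN_VERSION

Calling ``meets_minimum()`` with no argument checks the interpreter that is running the script.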

Quick Install
=============

.. tabs::

   .. tab:: CLI

      .. code-block:: bash

         pip install uv          # skip if uv is already installed (running "uv venv" is a quick check)
         # If pip fails, install uv on macOS/Linux with: curl -LsSf https://astral.sh/uv/install.sh | sh
         # or on Windows with: powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
         uv venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
         uv pip install massgen

   .. tab:: LiteLLM Integration

      .. code-block:: bash

         pip install uv          # skip if uv is already installed (running "uv venv" is a quick check)
         # If pip fails, install uv on macOS/Linux with: curl -LsSf https://astral.sh/uv/install.sh | sh
         # or on Windows with: powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
         uv venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
         uv pip install massgen litellm python-dotenv

      Then in Python:

      .. code-block:: python

         from dotenv import load_dotenv
         load_dotenv()  # Load OPENROUTER_API_KEY from .env

         import litellm
         from massgen import register_with_litellm

         register_with_litellm()

         response = litellm.completion(
             model="massgen/build",
             messages=[{"role": "user", "content": "Hello!"}],
             optional_params={"models": ["openrouter/openai/gpt-5"]}
         )
         print(response.choices[0].message.content)

For programmatic Python usage, see the :doc:`../user_guide/integration/python_api`.

First Run Setup
===============

On first run, MassGen guides you through setup. Choose your preferred interface:

.. tabs::

   .. tab:: WebUI (Recommended)

      .. code-block:: bash

         uv run massgen --web

      The browser-based setup wizard will:

      1. **Configure API keys** - Enter keys for OpenAI, Anthropic, Google, etc.
      2. **Setup Docker** (optional) - For isolated code execution
      3. **Review Skills** - See available agent capabilities
      4. **Create your agent team** - Quickstart wizard for configuration

      This is the easiest way to get started - everything in one visual interface.

   .. tab:: CLI

      .. code-block:: bash

         uv run massgen

      The terminal wizard will:

      1. **Configure API keys** (OpenRouter recommended, or individual providers)
      2. **Create your agent team** (choose from templates or examples)
      3. **Launch interactive TUI mode** immediately

      The Textual TUI (default) provides real-time visualization with:

      - 📊 Timeline view of all agent activities
      - 🎯 Individual agent status cards
      - 🗳️ Vote visualization and consensus tracking
      - 💬 Multi-turn conversation management

By default, setup creates ``~/.config/massgen/config.yaml``.
Quickstart also supports custom filenames:

* In the TUI/Web Quickstart "Save Location" step, choose project (``.massgen/``) or global
  (``~/.config/massgen/``) and enter a filename.
* In CLI quickstart, ``--config`` can be used as the output filename:

.. code-block:: bash

   # Saves to .massgen/team-config.yaml
   uv run massgen --quickstart --config team-config

API Keys
--------
API keys can be configured through the WebUI or CLI setup. They can also be set directly in the terminal:

**OpenRouter (recommended):** Single API key for all models

.. code-block:: bash

   export OPENROUTER_API_KEY=sk-or-v1-...

**Or use individual providers:** OpenAI, Anthropic, Google Gemini, xAI (Grok), Azure OpenAI, Groq, Together AI, Fireworks, Cerebras

**Manual setup** (optional - wizard handles this):

.. code-block:: bash

   # Create .env file
   mkdir -p ~/.config/massgen
   echo "OPENROUTER_API_KEY=sk-or-v1-..." >> ~/.config/massgen/.env

Re-run Setup
------------

Re-configure anytime:

.. code-block:: bash

   uv run massgen --web         # WebUI setup page (visit /setup)
   uv run massgen --init        # CLI full configuration wizard
   uv run massgen --setup       # CLI just API keys
   uv run massgen --quickstart  # CLI quick 3-agent setup

Verify Installation
===================

.. code-block:: bash

   # Check CLI is available
   uv run massgen --help

   # List example configurations
   uv run massgen --list-examples

   # Run multi-agent collaboration
   uv run massgen --config @examples/basic/multi/three_agents_default "What is machine learning?"

Optional: Observability
=======================

For structured logging and tracing with Logfire:

.. code-block:: bash

   # Install with observability support
   pip install "massgen[observability]"

   # Or with uv
   uv pip install "massgen[observability]"

   # Authenticate with Logfire
   uv run logfire auth

   # Run with observability enabled
   uv run massgen --logfire --config your_config.yaml "Your question"

See :doc:`../user_guide/logging` for detailed Logfire configuration.

Optional: Docker & Skills
=========================

For advanced features like isolated code execution:

.. code-block:: bash

   # Install Docker images
   uv run massgen --setup-docker

   # Install skills (semantic search, web scraping)
   uv run massgen --setup-skills

These are optional - basic MassGen works without them.

.. note::

   In ``uv run massgen --quickstart``, when Docker mode is selected the wizard includes
   a Skills step where you can select package(s) and install them immediately with
   on-page status updates (Anthropic/OpenAI/Vercel collections, Agent Browser skill,
   Remotion, and Crawl4AI). Use ``--setup-skills`` to retry or pre-install manually.

Development Installation
========================

For contributors or source code access:

.. code-block:: bash

   git clone https://github.com/Leezekun/MassGen.git
   cd MassGen
   pip install -e .

   # Or with full dev setup (Unix/macOS)
   ./scripts/init.sh

Next Steps
==========

.. grid:: 2
   :gutter: 3

   .. grid-item-card:: ▶️ Run MassGen

      See all usage modes

      :doc:`running-massgen`

   .. grid-item-card:: ⚙️ Configuration

      Create custom agent teams

      :doc:`configuration`

   .. grid-item-card:: 📚 Examples

      Ready-to-use configs

      :doc:`../examples/basic_examples`

   .. grid-item-card:: 🐍 Python API

      Programmatic integration

      :doc:`../user_guide/integration/python_api`

Troubleshooting
===============

**Python version error:**

.. code-block:: bash

   python --version  # Need 3.11+
   pip install --upgrade massgen

**API key not found:**

Check that ``~/.config/massgen/.env`` exists and contains the correct keys, or re-run ``massgen --init``.
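
To see which keys ``~/.config/massgen/.env`` defines without printing their values, a few lines of Python are enough. This is a minimal sketch that assumes the file uses plain ``KEY=value`` lines; ``defined_env_keys`` is an illustrative helper, not part of MassGen.

.. code-block:: python

   from pathlib import Path


   def defined_env_keys(env_path: Path) -> set[str]:
       """Return the variable names that have a non-empty value in a dotenv-style file."""
       keys: set[str] = set()
       if not env_path.is_file():
           return keys
       for raw in env_path.read_text(encoding="utf-8").splitlines():
           line = raw.strip()
           # Skip blanks, comments, and lines that are not assignments.
           if not line or line.startswith("#") or "=" not in line:
               continue
           name, _, value = line.partition("=")
           name = name.strip().removeprefix("export ").strip()
           # Count a key only if something other than quotes follows the "=".
           if name and value.strip().strip("'\""):
               keys.add(name)
       return keys


   if __name__ == "__main__":
       env_file = Path.home() / ".config" / "massgen" / ".env"
       print(sorted(defined_env_keys(env_file)) or f"No keys found in {env_file}")

If the key your backend needs (for example ``OPENROUTER_API_KEY``) is missing from the output, add it to the file or re-run the wizard.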

**Setup wizard not appearing:**

Run ``massgen --init`` to manually trigger it.

For more help: `GitHub Issues <https://github.com/Leezekun/MassGen/issues>`_


---

## quickstart/running-massgen.rst

Running MassGen
===============

This guide shows you how to run MassGen using different modes and configurations.

Choosing Your Mode
------------------

MassGen offers four ways to run multi-agent workflows:

.. list-table::
   :header-rows: 1
   :widths: 15 35 50

   * - Mode
     - Best For
     - Key Features
   * - **CLI**
     - Interactive exploration, quick experiments
     - Rich terminal UI, YAML configs, real-time visualization
   * - **WebUI**
     - Visual monitoring, team demos, workspace browsing
     - Browser-based UI, real-time streaming, file explorer, vote visualization
   * - **LiteLLM**
     - Application integration, LangChain, existing LiteLLM users
     - Standard OpenAI interface, drop-in replacement
   * - **HTTP Server**
     - Integrating via HTTP, OpenAI-compatible clients, proxies/gateways
     - OpenAI-compatible endpoints (``/v1/chat/completions``), tool calling; see :doc:`../reference/cli` for current streaming support

For advanced programmatic control, see the :doc:`../user_guide/integration/python_api` (async-first, headless execution).

Quick Start Examples
--------------------

.. tabs::

   .. tab:: CLI

      .. code-block:: bash

         # Multi-agent collaboration (recommended)
         uv run massgen --config @examples/basic/multi/three_agents_default "Analyze renewable energy"

         # Interactive mode (multi-turn)
         uv run massgen

      Textual TUI (default) with timeline view, agent cards, vote visualization, and multi-turn conversations.

   .. tab:: WebUI

      .. code-block:: bash

         # Start the web interface
         uv run massgen --web

         # Open http://localhost:8000 in your browser

      Browser-based UI with real-time agent streaming, vote visualization, and workspace browsing.

   .. tab:: LiteLLM

      .. code-block:: python

         from dotenv import load_dotenv
         load_dotenv()  # Load OPENROUTER_API_KEY from .env

         import litellm
         from massgen import register_with_litellm

         register_with_litellm()

         # Multi-agent with multiple models (using OpenRouter)
         response = litellm.completion(
             model="massgen/build",
             messages=[{"role": "user", "content": "Analyze renewable energy"}],
             optional_params={"models": ["openrouter/openai/gpt-5", "openrouter/anthropic/claude-sonnet-4.5"]}
         )
         print(response.choices[0].message.content)

      Standard OpenAI-compatible interface for seamless integration with existing applications.

   .. tab:: HTTP Server

      .. code-block:: bash

         # Start an OpenAI-compatible HTTP server (defaults: 0.0.0.0:4000)
         uv run massgen serve

         # With a specific config
         uv run massgen serve --config @examples/basic/multi/three_agents_default

         # Health check
         curl http://localhost:4000/health

         # OpenAI-compatible Chat Completions
         curl http://localhost:4000/v1/chat/completions \
           -H "Content-Type: application/json" \
           -d '{"model":"massgen","messages":[{"role":"user","content":"Analyze renewable energy"}]}'

      OpenAI-compatible HTTP API for integrating MassGen into existing clients and server workflows.

      .. note::

         **Config Selection:** Use the ``model`` parameter to select configs:

         - ``model="massgen"`` - Use the default config (from ``--config`` or auto-discovered)
         - ``model="massgen/basic_multi"`` - Use a built-in example config
         - ``model="massgen/path:/path/to/config.yaml"`` - Use a specific config file

         **Full Parity:** The HTTP server uses ``massgen.run()`` internally, providing identical behavior
         to CLI, WebUI, and LiteLLM modes - including logging to ``.massgen/massgen_logs/``, metrics,
         and session management. The ``massgen_metadata`` field in responses contains the same data
         as ``massgen.run()`` returns.
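
The three ``model`` forms listed in the HTTP Server note can be told apart with simple prefix checks. The sketch below only illustrates the naming scheme from a client's point of view; ``parse_model_field`` is a hypothetical helper and does not reflect how the server itself parses the field.

.. code-block:: python

   from typing import Optional, Tuple


   def parse_model_field(model: str) -> Tuple[str, Optional[str]]:
       """Classify a request's ``model`` value as (kind, target)."""
       if model == "massgen":
           # Default config: from --config or auto-discovered.
           return ("default", None)
       prefix = "massgen/"
       if not model.startswith(prefix):
           raise ValueError(f"not a MassGen model string: {model!r}")
       rest = model[len(prefix):]
       if rest.startswith("path:"):
           # Explicit config file on disk.
           return ("file", rest[len("path:"):])
       # Anything else is treated as a built-in example name.
       return ("example", rest)


   for value in ("massgen", "massgen/basic_multi", "massgen/path:/path/to/config.yaml"):
       print(value, "->", parse_model_field(value))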

CLI Usage
---------

Basic Command Structure
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   uv run massgen [OPTIONS] ["<your question>"]

For the complete list of CLI options, see :doc:`../reference/cli`.

Multi-Agent Collaboration
~~~~~~~~~~~~~~~~~~~~~~~~~

MassGen is designed for multi-agent collaboration - multiple agents working together on complex tasks:

.. code-block:: bash

   # Three agents collaborate
   uv run massgen --config @examples/basic/multi/three_agents_default "Analyze the pros and cons of renewable energy"

The agents work in parallel, share observations, vote for solutions, and converge on the best answer.

Decomposition Mode
~~~~~~~~~~~~~~~~~~

Use decomposition mode when each agent owns a subtask and one presenter combines results:

.. code-block:: bash

   uv run massgen \
     --config @examples/basic/multi/decomposition_quickstart \
     "Build a small full-stack todo app"

Recommended decomposition defaults:

* ``max_new_answers_per_agent: 2-3`` (consecutive cap; resets after unseen external updates are injected)
* ``max_new_answers_global`` set to an overall budget (for example ``9`` with three agents)
* If a decomposition agent hits its cap, it should stop instead of running a wasteful extra round
* With GPT-5x models, quickstart lets you pick ``reasoning.effort`` (Codex GPT-5 models include ``xhigh``)

Unless you need different behavior, keep these defaults.
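
The interaction between the two caps is easier to see in a toy model. The sketch below is not MassGen's implementation: ``AnswerBudget`` and its method names are hypothetical, and it only encodes the behavior described in the list above (a consecutive per-agent cap that resets when unseen external updates are injected, plus a global budget).

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import Dict


   @dataclass
   class AnswerBudget:
       """Toy model of the per-agent and global answer caps (illustrative only)."""

       per_agent_cap: int = 3  # stands in for max_new_answers_per_agent
       global_cap: int = 9     # stands in for max_new_answers_global
       consecutive: Dict[str, int] = field(default_factory=dict)
       total: int = 0

       def can_answer(self, agent_id: str) -> bool:
           within_global = self.total < self.global_cap
           within_agent = self.consecutive.get(agent_id, 0) < self.per_agent_cap
           return within_global and within_agent

       def record_answer(self, agent_id: str) -> None:
           if not self.can_answer(agent_id):
               raise RuntimeError(f"{agent_id} has no answer budget left")
           self.consecutive[agent_id] = self.consecutive.get(agent_id, 0) + 1
           self.total += 1

       def inject_external_update(self, agent_id: str) -> None:
           # Seeing new work from other agents resets the consecutive counter only.
           self.consecutive[agent_id] = 0


   budget = AnswerBudget(per_agent_cap=2, global_cap=3)
   budget.record_answer("agent_a")
   budget.record_answer("agent_a")
   print(budget.can_answer("agent_a"))  # capped until an external update arrives
   budget.inject_external_update("agent_a")
   print(budget.can_answer("agent_a"))

Note that the global counter never resets, so the overall budget still bounds the run even when agents keep receiving updates.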

Interactive Multi-Turn Mode
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Start without a question to enter interactive chat mode:

.. code-block:: bash

   # Interactive with multi-agent team
   uv run massgen --config @examples/basic/multi/three_agents_default

Features:

* Conversation context preserved across turns
* Session history saved in ``.massgen/sessions/``
* Real-time agent coordination visualization
* Optional CWD context shortcut via ``--cwd-context ro|rw``

CWD Context Shortcut
~~~~~~~~~~~~~~~~~~~~

Use ``--cwd-context`` when you want quick access to your current directory without editing YAML:

.. code-block:: bash

   # Read-only current directory context
   uv run massgen --config @examples/basic/multi/three_agents_default --cwd-context ro "Review this repository"

   # Writable current directory context
   uv run massgen --config @examples/basic/multi/three_agents_default --cwd-context rw "Implement the requested changes"

In Textual TUI sessions, this initializes the same state as pressing ``Ctrl+P``.
During Execute mode, ``Ctrl+P`` is blocked so context scope cannot change mid-execution.

See :doc:`../user_guide/sessions/multi_turn_mode` for the complete guide to interactive multi-turn sessions.

.. note::

   For programmatic Python access with async support and full control, see the :doc:`../user_guide/integration/python_api`.

WebUI
-----

The WebUI provides a browser-based interface for visual monitoring of multi-agent coordination.

Starting the WebUI
~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Basic: Start on localhost:8000
   uv run massgen --web

   # With custom host/port
   uv run massgen --web --web-host 0.0.0.0 --web-port 3000

   # With a default config
   uv run massgen --web --config @examples/basic/multi/three_agents_default

Then open http://localhost:8000 (or the host and port you specified) in your browser.

First-Time Setup
~~~~~~~~~~~~~~~~

On first launch, the WebUI automatically guides you through setup:

1. **Setup Page** - Configure API keys, Docker, and skills
2. **Quickstart Wizard** - Create your first agent configuration, including decomposition mode, presenter selection, GPT-5x reasoning selection, and recommended answer-control defaults

This makes ``uv run massgen --web`` the easiest way to get started with MassGen.

Key Features
~~~~~~~~~~~~

* **Real-time Agent Streaming** - Watch agents think, use tools, and generate answers live
* **Vote Visualization** - See voting distribution and consensus-building with animated charts
* **Coordination Timeline** - Visual swimlane diagram showing answer flow and dependencies
* **Answer Browser** - Browse all agent answers with version history
* **Workspace Explorer** - View and examine files created by agents during execution
* **Multi-Turn Conversations** - Continue sessions with follow-up questions
* **Quickstart Wizard** - Guided setup for configuring agents without manual YAML editing, including decomposition controls

See :doc:`../user_guide/webui` for the complete WebUI guide.

LiteLLM Integration
-------------------

MassGen integrates with LiteLLM for a familiar OpenAI-compatible interface. Use OpenRouter to access multiple models with a single API key.

Setup
~~~~~

.. code-block:: python

   from dotenv import load_dotenv
   load_dotenv()  # Load OPENROUTER_API_KEY from .env

   import litellm
   from massgen import register_with_litellm

   # Register once at startup
   register_with_litellm()

Model String Formats
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Dynamic multi-agent with OpenRouter (recommended - single API key)
   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "Your question"}],
       optional_params={"models": ["openrouter/openai/gpt-5", "openrouter/anthropic/claude-sonnet-4.5"]}
   )
   print(response.choices[0].message.content)

   # Use example config
   response = litellm.completion(
       model="massgen/basic_multi",
       messages=[{"role": "user", "content": "Your question"}]
   )
   print(response.choices[0].message.content)

Access MassGen Metadata
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # MassGen-specific metadata
   metadata = response._hidden_params
   print(metadata.get("massgen_vote_results"))
   print(metadata.get("massgen_answers"))

See :doc:`../user_guide/integration/python_api` for complete LiteLLM documentation.

Adding Tools
------------

MCP Integration
~~~~~~~~~~~~~~~

Add external tools via Model Context Protocol:

.. tabs::

   .. tab:: CLI

      .. code-block:: bash

         uv run massgen --config @examples/tools/mcp/gpt5_nano_mcp_example.yaml \
           "What's the weather in New York?"

   .. tab:: YAML Config

      .. code-block:: yaml

         agents:
           - id: "agent_with_tools"
             backend:
               type: "openai"
               model: "openrouter/openai/gpt-5"
             mcp_servers:
               - command: "npx"
                 args: ["-y", "@modelcontextprotocol/server-weather"]

See :doc:`../user_guide/tools/mcp_integration` for details.

File Operations
~~~~~~~~~~~~~~~

Agents can work with files in isolated workspaces:

.. tabs::

   .. tab:: CLI

      .. code-block:: bash

         uv run massgen --config @examples/tools/filesystem/claude_code_single.yaml \
           "Create a Python web scraper and save results to CSV"

   .. tab:: YAML Config

      .. code-block:: yaml

         orchestrator:
           file_system:
             enabled: true
             use_docker: false

See :doc:`../user_guide/files/file_operations` for details.

Configuration Paths
-------------------

MassGen supports multiple ways to specify configurations:

.. code-block:: bash

   # Built-in examples (works from any directory)
   uv run massgen --config @examples/basic/multi/three_agents_default "Question"

   # List all examples
   uv run massgen --list-examples

   # Custom file (relative or absolute path)
   uv run massgen --config ./my-config.yaml "Question"

   # User config directory
   uv run massgen --config my-saved-config "Question"
   # Looks for ~/.config/massgen/agents/my-saved-config.yaml

   # Quickstart with explicit output filename
   uv run massgen --quickstart --config team-config
   # Saves generated config to .massgen/team-config.yaml
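
The forms above differ only in how the ``--config`` value is written. As a rough illustration, the following sketch classifies a value into the three lookup styles. The heuristics (a path separator or a YAML suffix means "file") are assumptions made for this example, and ``classify_config_arg`` is a hypothetical helper rather than MassGen's resolver. It also ignores the ``--quickstart`` case, where the value names an output file.

.. code-block:: python

   from pathlib import Path
   from typing import Tuple, Union


   def classify_config_arg(value: str, home: Path) -> Tuple[str, Union[str, Path]]:
       """Return (kind, target) for a ``--config`` value."""
       example_prefix = "@examples/"
       if value.startswith(example_prefix):
           # Built-in example shipped with the package.
           return ("example", value[len(example_prefix):])
       looks_like_path = (
           "/" in value or "\\" in value or value.endswith((".yaml", ".yml"))
       )
       if looks_like_path:
           # Relative or absolute path to a custom file.
           return ("file", Path(value))
       # Bare name: look in the user config directory.
       return ("user", home / ".config" / "massgen" / "agents" / f"{value}.yaml")


   home = Path.home()
   for arg in ("@examples/basic/multi/three_agents_default", "./my-config.yaml", "my-saved-config"):
       print(arg, "->", classify_config_arg(arg, home))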

Viewing Results
---------------

By default, MassGen shows a rich terminal UI. Control the display:

.. code-block:: bash

   # Disable UI (quiet mode)
   uv run massgen --no-display --config config.yaml "Question"

   # Enable debug logging
   uv run massgen --debug --config config.yaml "Question"

After execution completes, an interactive Agent Selector menu appears, allowing you to:

* View each agent's original output and reasoning
* See the orchestrator's system status and voting process
* Display the coordination table with full agent interaction history
* Browse workspace files created during execution
* Press ``q`` to exit

Analyzing and Sharing Sessions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After running a session, use these commands to analyze and share your results:

.. code-block:: bash

   # View summary of the most recent run
   massgen logs

   # List recent runs with costs and questions
   massgen logs list

   # Share a session (creates shareable URL)
   massgen export

The ``massgen export`` command uploads your session to GitHub Gist and returns a shareable URL that anyone can view without login. This is useful for collaboration, debugging, or showcasing results.

See :doc:`../user_guide/logging` for the complete logging and sharing guide.

Next Steps
----------

.. grid:: 2
   :gutter: 3

   .. grid-item-card:: ⚙️ Configuration

      Create custom agent teams

      :doc:`configuration`

   .. grid-item-card:: 📚 Core Concepts

      Understand multi-agent coordination

      :doc:`../user_guide/concepts`

   .. grid-item-card:: 🐍 Python API

      Full programmatic control

      :doc:`../user_guide/integration/python_api`

   .. grid-item-card:: 🔌 Tools & MCP

      Add capabilities to agents

      :doc:`../user_guide/tools/index`


---

## reference/cli.rst

CLI Reference
=============

MassGen Command Line Interface reference.

Basic Usage
-----------

.. code-block:: bash

   massgen [OPTIONS] ["<your question>"]

**Default Behavior (No Arguments):**

When running ``massgen`` with no arguments, configs are auto-discovered with this priority:

1. ``.massgen/config.yaml`` (project-level config in current directory)
2. ``~/.config/massgen/config.yaml`` (global default config)
3. Launch setup wizard if no config found
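
The priority order above amounts to "first existing file wins". A minimal sketch, assuming the two paths are checked exactly as listed (``discover_config`` is an illustrative name, not a MassGen function):

.. code-block:: python

   from pathlib import Path
   from typing import Optional


   def discover_config(cwd: Path, home: Path) -> Optional[Path]:
       """Return the first config found, or None when the wizard would run."""
       candidates = [
           cwd / ".massgen" / "config.yaml",              # 1. project-level config
           home / ".config" / "massgen" / "config.yaml",  # 2. global default config
       ]
       for candidate in candidates:
           if candidate.is_file():
               return candidate
       return None  # 3. no config found: the setup wizard is launched


   print(discover_config(Path.cwd(), Path.home()))

A project-level file therefore shadows the global default whenever both exist.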

**First-Time Setup:**

The wizard consists of two steps:

1. **API Key Setup** (if no cloud provider keys detected)

   * Prompts for OpenAI, Anthropic, Google, or other cloud provider API keys
   * Saves to ``~/.config/massgen/.env``
   * Skipped if keys already exist

2. **Configuration Setup**

   * Option to browse ready-to-use configs/examples
   * Option to build from templates (Simple Q&A, Research, Code & Files, etc.)
   * Asks "Save as default?" when browsing existing configs
   * Launches directly into interactive mode

**After Setup:**

* **No arguments** → Starts multi-turn conversation mode, provided a default config was saved
* **With question** → Runs single query using default config

CLI Parameters
--------------

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Parameter
     - Description
   * - ``--config PATH``
     - Path to YAML configuration file with agent definitions, model parameters, backend parameters and UI settings.
       With ``--quickstart``, this value is treated as the output filename under ``.massgen/`` (for example,
       ``--quickstart --config team-config`` saves ``.massgen/team-config.yaml``)
   * - ``--select``
     - Interactively select from available configurations (user configs, project configs, current directory, package examples). Uses hierarchical navigation: category → config
   * - ``--backend TYPE``
     - Backend type for quick setup without config file. Options: ``claude``, ``claude_code``, ``gemini``, ``grok``, ``openai``, ``azure_openai``, ``zai``
   * - ``--model NAME``
     - Model name for quick setup (e.g., ``gemini-2.5-flash``, ``gpt-5-nano``). Mutually exclusive with ``--config``
   * - ``--system-message TEXT``
     - System prompt for the agent in quick setup mode. Ignored if ``--config`` is provided
   * - ``--cwd-context {ro,rw,read,write}``
     - Add the current working directory to ``orchestrator.context_paths`` for this run. ``ro`` or ``read`` grants read-only access; ``rw`` or ``write`` grants write permission
   * - ``--plan``
     - Planning-only mode. Agents create a structured task plan without auto-executing it
   * - ``--plan-depth {dynamic,shallow,medium,deep}``
     - Controls plan granularity for ``--plan`` mode
   * - ``--plan-steps N``
     - Optional explicit task-count target for planning output (must be > 0)
   * - ``--plan-chunks N``
     - Optional explicit chunk-count target for planning output (must be > 0)
   * - ``--broadcast {human,agents,false}``
     - Planning collaboration mode: ask user questions, coordinate among agents, or run fully autonomous
   * - ``--plan-and-execute``
     - Full workflow: create a plan, then execute it immediately
   * - ``--execute-plan PLAN_PATH``
     - Execute an existing plan by plan directory, plan ID, or ``latest``
   * - ``--no-display``
     - Disable the real-time streaming coordination display (falls back to simple text output)
   * - ``--no-logs``
     - Disable real-time logging
   * - ``--debug``
     - Enable debug mode with verbose logging. Debug logs saved to ``agent_outputs/log_{time}/massgen_debug.log``
   * - ``--session-id ID``
     - Load memory from a previous session by ID (e.g., ``session_20251028_143000``). Allows continuing conversations with memory context from prior runs. Run ``--list-sessions`` first to find available session IDs
   * - ``--list-sessions``
     - List all available memory sessions with their metadata (session IDs, timestamps, models, status). Sessions are automatically tracked in ``~/.massgen/sessions.json``
   * - ``--web``
     - Start the WebUI server instead of the terminal UI. Opens a browser-based interface with real-time agent streaming, vote visualization, and workspace browsing
   * - ``--web-host HOST``
     - Host address for the WebUI server (default: ``127.0.0.1``). Use ``0.0.0.0`` to allow external connections
   * - ``--web-port PORT``
     - Port for the WebUI server (default: ``8000``)
   * - ``--no-browser``
     - Don't auto-open browser when using ``--web`` with a question. Useful for automation or when running on servers
   * - ``--output-file PATH``
     - Write final answer to specified file path. Works in any mode (automation, interactive, etc.). Useful for capturing agent responses in scripts or pipelines
   * - ``--logfire``
     - Enable Logfire observability for structured tracing of LLM calls, tool executions, and orchestration. Requires a Logfire token (via ``logfire auth`` or the ``LOGFIRE_TOKEN`` env var). See :doc:`../user_guide/logging` for setup details
   * - ``"<your question>"``
     - Optional single-question input. If omitted, MassGen enters interactive chat mode

Mode Settings
~~~~~~~~~~~~~

These flags mirror the TUI mode bar toggles, allowing CLI control of execution modes.

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Parameter
     - Description
   * - ``--single-agent [AGENT_ID]``
     - Single-agent mode. Uses only one agent from the config. Optionally specify an agent ID; defaults to the first agent if omitted
   * - ``--coordination-mode {parallel,decomposition}``
     - Coordination mode: ``parallel`` (voting-based, default) or ``decomposition`` (subtask-based). Overrides ``coordination_mode`` from config
   * - ``--quick``
     - Quick mode: disable refinement. Agents produce one answer with no voting loop. Equivalent to TUI "Refine OFF" toggle
   * - ``--personas {off,perspective,implementation,methodology}``
     - Enable parallel persona generation with specified diversity mode. ``off`` disables persona generation. Requires parallel coordination mode

Examples
--------

Default Configuration Mode
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # First time: Launch setup wizard
   massgen

   # After setup: Start interactive conversation
   massgen

   # Run single query with default config
   massgen "What is machine learning?"

Interactive Config Selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Browse and select from available configurations
   massgen --select

   # After selection, optionally provide a question
   massgen --select "Your question here"

   # The selector shows configs from:
   # - User configs: ~/.config/massgen/agents/
   # - Project configs: .massgen/*.yaml
   # - Current directory: *.yaml
   # - Package examples: Built-in example configs

Quick Single Agent
~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Fastest way to test - no config file
   massgen --model claude-3-5-sonnet-latest "What is machine learning?"
   massgen --model gemini-2.5-flash "Explain quantum computing"

With Specific Backend
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   massgen \
     --backend gemini \
     --model gemini-2.5-flash \
     "What are the latest developments in AI?"

Multi-Agent with Config
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Recommended: Use YAML config for multi-agent
   massgen \
     --config @examples/basic/multi/three_agents_default.yaml \
     "Analyze the pros and cons of renewable energy"

Interactive Mode
~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Omit question to enter interactive chat mode
   massgen --model gemini-2.5-flash

   # Multi-agent interactive
   massgen \
     --config @examples/basic/multi/three_agents_default.yaml

Quick CWD Context Access
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Add current directory as read-only context
   massgen --config my_config.yaml --cwd-context ro "Review this codebase"

   # Add current directory as writable context
   massgen --config my_config.yaml --cwd-context rw "Apply the requested refactor"

.. note::

   In the Textual TUI, ``--cwd-context`` initializes the same CWD context state as the ``Ctrl+P`` toggle.
   In Execute mode, ``Ctrl+P`` is intentionally blocked to avoid changing context scope mid-execution.

Mode Settings
~~~~~~~~~~~~~

.. code-block:: bash

   # Quick mode: one answer per agent, no refinement loop
   massgen --quick --config my_config.yaml "Summarize this paper"

   # Single agent from a multi-agent config
   massgen --single-agent --config my_config.yaml "Quick question"

   # Single agent by ID
   massgen --single-agent agent_b --config my_config.yaml "Quick question"

   # Fastest: single agent + quick mode
   massgen --single-agent --quick --config my_config.yaml "What is 2+2?"

   # Persona-diverse parallel execution
   massgen --personas perspective --config my_config.yaml "Design a logo"

   # Decomposition mode (subtask-based)
   massgen --coordination-mode decomposition --config my_config.yaml "Build a web app"

Debug Mode
~~~~~~~~~~

.. code-block:: bash

   massgen \
     --debug \
     --config @examples/basic/multi/three_agents_default.yaml \
     "Your question here"

Disable UI
~~~~~~~~~~

.. code-block:: bash

   # Simple text output instead of rich terminal UI
   massgen \
     --no-display \
     --config config.yaml \
     "Question"

Session Management
~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # List available memory sessions
   massgen --list-sessions

   # Load session from previous run
   massgen --session-id session_20251028_143000 \
     "What did we discuss about the backend architecture?"

   # Interactive mode with previous session
   massgen --session-id session_20251028_143000 \
     --config my_config.yaml

   # Session can also be specified in YAML config
   # Add to your config.yaml:
   #   session_id: "session_20251028_143000"

WebUI Mode
~~~~~~~~~~

.. code-block:: bash

   # Start WebUI on default localhost:8000
   massgen --web

   # Custom host and port (for external access)
   massgen --web --web-host 0.0.0.0 --web-port 3000

   # With a specific config
   massgen --web --config @examples/basic/multi/three_agents_default

   # Combine with debug mode
   massgen --web --debug --config my_config.yaml

OpenAI-Compatible HTTP Server (``massgen serve``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run MassGen as an OpenAI-compatible HTTP API (FastAPI + Uvicorn).

**Endpoints:**

* ``GET /health`` - Health check endpoint
* ``POST /v1/chat/completions`` - OpenAI-compatible chat completions (non-streaming only)

.. note::

   Streaming (``stream: true``) is not yet supported. The server will return HTTP 501
   if streaming is requested. Use ``stream: false`` for all requests.

.. code-block:: bash

   # Start server (defaults: host 0.0.0.0, port 4000)
   massgen serve

   # Custom bind
   massgen serve --host 127.0.0.1 --port 4000

   # Provide a default config
   massgen serve --config path/to/config.yaml

   # Enable auto-reload for development
   massgen serve --reload

   # Health check
   curl http://localhost:4000/health

   # OpenAI-compatible Chat Completions
   # Note: When running with --config, the "model" parameter is ignored
   # to ensure the server uses the agent team defined in your YAML.
   curl http://localhost:4000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{"model":"massgen","messages":[{"role":"user","content":"hi"}],"stream":false}'
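
The same request can be sent from Python with only the standard library. This is a sketch under stated assumptions: the server is running locally on the default port, the ``600`` second timeout is an arbitrary choice for long multi-agent runs, and ``build_chat_request`` and ``ask`` are illustrative names.

.. code-block:: python

   import json
   import urllib.request


   def build_chat_request(
       question: str, base_url: str = "http://localhost:4000"
   ) -> urllib.request.Request:
       """Build a non-streaming Chat Completions request for ``massgen serve``."""
       payload = {
           "model": "massgen",
           "messages": [{"role": "user", "content": question}],
           "stream": False,  # streaming requests are rejected with HTTP 501
       }
       return urllib.request.Request(
           f"{base_url}/v1/chat/completions",
           data=json.dumps(payload).encode("utf-8"),
           headers={"Content-Type": "application/json"},
           method="POST",
       )


   def ask(question: str, timeout: float = 600.0) -> str:
       """Send the request and return the final answer text."""
       with urllib.request.urlopen(build_chat_request(question), timeout=timeout) as resp:
           body = json.load(resp)
       return body["choices"][0]["message"]["content"]

Call ``ask("hi")`` once the server is up; the function is not invoked here because it needs a live server.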

**Response Format:**

The server returns responses with the final synthesized answer in ``content`` and all agent traces in ``reasoning_content``:

.. code-block:: json

   {
     "choices": [{
       "message": {
         "role": "assistant",
         "content": "The final answer from the agent team.",
         "reasoning_content": "[system] Starting coordination...\n[agent_1] Analyzing...\n[orchestrator] Vote: agent_1"
       },
       "finish_reason": "stop"
     }]
   }
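
A client will often want the final answer and the per-source traces separately. The sketch below assumes that each line of ``reasoning_content`` starts with a ``[source]`` tag, as in the sample above; real traces may contain lines in other shapes, which this helper simply skips. ``split_response`` is an illustrative name.

.. code-block:: python

   import re
   from collections import defaultdict

   # Matches lines such as "[agent_1] Analyzing..."
   TRACE_LINE = re.compile(r"^\[([^\]]+)\]\s*(.*)$")

   # The sample response from above, written as a Python dict.
   SAMPLE = {
       "choices": [{
           "message": {
               "role": "assistant",
               "content": "The final answer from the agent team.",
               "reasoning_content": (
                   "[system] Starting coordination...\n"
                   "[agent_1] Analyzing...\n"
                   "[orchestrator] Vote: agent_1"
               ),
           },
           "finish_reason": "stop",
       }]
   }


   def split_response(body: dict) -> tuple[str, dict[str, list[str]]]:
       """Return (final_answer, traces grouped by source tag)."""
       message = body["choices"][0]["message"]
       traces: dict[str, list[str]] = defaultdict(list)
       for line in (message.get("reasoning_content") or "").splitlines():
           match = TRACE_LINE.match(line)
           if match:
               traces[match.group(1)].append(match.group(2))
       return message["content"], dict(traces)


   answer, traces = split_response(SAMPLE)
   print(answer)
   print(sorted(traces))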

**Environment variables (optional):**

* ``MASSGEN_SERVER_HOST`` (default: ``0.0.0.0``)
* ``MASSGEN_SERVER_PORT`` (default: ``4000``)
* ``MASSGEN_SERVER_DEFAULT_CONFIG`` (default: unset)
* ``MASSGEN_SERVER_DEBUG`` (default: ``false``)
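
For deployment scripts it can help to mirror these defaults in one place. The following sketch reads the four variables with the defaults listed above. The set of strings accepted as "true" is an assumption, and ``ServerSettings`` and ``load_server_settings`` are hypothetical names, not MassGen APIs.

.. code-block:: python

   import os
   from dataclasses import dataclass
   from typing import Mapping, Optional


   @dataclass(frozen=True)
   class ServerSettings:
       host: str
       port: int
       default_config: Optional[str]
       debug: bool


   def load_server_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
       """Read the MASSGEN_SERVER_* variables, applying the documented defaults."""
       debug_raw = env.get("MASSGEN_SERVER_DEBUG", "false")
       return ServerSettings(
           host=env.get("MASSGEN_SERVER_HOST", "0.0.0.0"),
           port=int(env.get("MASSGEN_SERVER_PORT", "4000")),
           # An unset or empty value means "no default config".
           default_config=env.get("MASSGEN_SERVER_DEFAULT_CONFIG") or None,
           # Assumption: these spellings count as enabled.
           debug=debug_raw.strip().lower() in {"1", "true", "yes"},
       )


   print(load_server_settings({}))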

Output to File
~~~~~~~~~~~~~~

.. code-block:: bash

   # Save agent response to a file
   massgen --output-file results.txt "Summarize the key points of machine learning"

   # With config and output file
   massgen --config my_config.yaml --output-file report.md "Generate a project report"
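
In a script or pipeline, ``--output-file`` pairs naturally with ``--no-display``. The sketch below wraps the call with ``subprocess``; it assumes ``massgen`` is on ``PATH``, and ``build_command`` and ``run_and_read`` are illustrative helpers.

.. code-block:: python

   import subprocess
   from pathlib import Path
   from typing import List, Optional


   def build_command(
       question: str, output_file: Path, config: Optional[str] = None
   ) -> List[str]:
       """Assemble a massgen invocation that writes the final answer to a file."""
       cmd = ["massgen", "--no-display", "--output-file", str(output_file)]
       if config:
           cmd += ["--config", config]
       cmd.append(question)
       return cmd


   def run_and_read(
       question: str, output_file: Path, config: Optional[str] = None
   ) -> str:
       """Run MassGen, fail loudly on a non-zero exit code, and return the answer."""
       subprocess.run(build_command(question, output_file, config), check=True)
       return output_file.read_text(encoding="utf-8")


   print(build_command("Generate a project report", Path("report.md"), "my_config.yaml"))

Passing the arguments as a list avoids shell quoting problems when the question contains quotes or spaces.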

Logfire Observability
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Enable structured tracing with Logfire
   massgen --logfire --config your_config.yaml "Your question"

   # Combine with debug mode for maximum observability
   massgen --logfire --debug --config your_config.yaml "Your question"

   # Or enable via environment variable
   export MASSGEN_LOGFIRE_ENABLED=true
   massgen --config your_config.yaml "Your question"

See :doc:`../user_guide/logging` for detailed Logfire setup instructions.

Additional Commands
-------------------

Log Analysis (``massgen logs``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Analyze and browse session logs without manual file navigation.

.. code-block:: bash

   # Summary of most recent run
   massgen logs

   # Full tool breakdown
   massgen logs tools

   # List recent runs
   massgen logs list

   # Open log directory in file manager
   massgen logs open

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Subcommand
     - Description
   * - ``massgen logs`` or ``massgen logs summary``
     - Display run summary with tokens, rounds, and top tools
   * - ``massgen logs tools``
     - Full tool breakdown table sorted by execution time
   * - ``massgen logs tools --sort calls``
     - Sort tools by call count instead of time
   * - ``massgen logs list``
     - List recent runs with timestamps, costs, questions, and analysis status
   * - ``massgen logs list --analyzed``
     - Show only logs with ANALYSIS_REPORT.md
   * - ``massgen logs list --unanalyzed``
     - Show only logs without analysis
   * - ``massgen logs list --limit 20``
     - Show more runs (default: 10)
   * - ``massgen logs analyze``
     - Generate analysis prompt for use in coding CLIs
   * - ``massgen logs analyze --mode self``
     - Run multi-agent self-analysis using a preset MassGen team (customizable via ``--config``)
   * - ``massgen logs open``
     - Open log directory in system file manager

**Options:**

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Option
     - Description
   * - ``--log-dir PATH``
     - Analyze a specific log directory instead of the most recent
   * - ``--json``
     - Output raw JSON for scripting
   * - ``--mode {prompt,self}``
     - For ``analyze``: ``prompt`` (default) outputs a prompt; ``self`` runs multi-agent analysis
   * - ``--ui {automation,rich_terminal,webui}``
     - For ``analyze --mode self``: UI mode (default: ``rich_terminal``)
   * - ``--config PATH``
     - For ``analyze --mode self``: Custom analysis config file

Share Session (``massgen export``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Share a session via GitHub Gist for easy collaboration.

.. code-block:: bash

   # Share the most recent session
   massgen export

   # Share a specific session
   massgen export log_20251218_134125_867383

**Prerequisites:** Requires GitHub CLI (``gh``) to be installed and authenticated.

.. code-block:: bash

   # Install gh (macOS)
   brew install gh

   # Authenticate
   gh auth login

**Output:**

.. code-block:: text

   Sharing session from: .massgen/massgen_logs/log_20251218_134125/turn_1/attempt_1
   Collecting files...
   Uploading 45 files (1,234,567 bytes)...

   Share URL: https://massgen.github.io/MassGen-Viewer/?gist=abc123def456

   Anyone with this link can view the session (no login required).

The URL opens the MassGen Viewer with the session's coordination timeline, answers, votes, and tool usage.

Session Viewer (``massgen viewer``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View any MassGen session in the full Textual TUI (read-only). Useful for observing ``--automation`` runs, replaying completed sessions, and monitoring cloud/CI runs.

.. code-block:: bash

   # View the most recent session
   massgen viewer

   # View a specific log directory
   massgen viewer /path/to/log_dir

   # Interactive session picker
   massgen viewer --pick

   # Replay at real-time speed (default: instant)
   massgen viewer /path/to/log_dir --replay-speed 1

   # View in browser via textual-serve
   massgen viewer /path/to/log_dir --web --port 9000

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Option
     - Description
   * - ``log_dir``
     - Path to log directory (positional, optional). Default: most recent session
   * - ``--turn N``
     - View a specific turn (default: latest)
   * - ``--attempt N``
     - View a specific attempt (default: latest)
   * - ``--replay-speed FLOAT``
     - Playback speed for completed sessions. ``0`` = instant (default), ``1`` = real-time, ``2`` = 2x speed
   * - ``--pick``
     - Interactively pick from recent sessions
   * - ``--web``
     - Serve the viewer TUI in a browser via ``textual-serve``
   * - ``--port PORT``
     - Port for ``--web`` mode (default: 8000)

The viewer shows the same TUI as a normal interactive run (agent panels, tool calls, votes, and final presentation), but in read-only mode with the input area hidden.
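
The ``--replay-speed`` values in the table can be read as a divisor applied to the recorded gap between events. This is a sketch of that interpretation, not the viewer's actual scheduling code; ``replay_delay`` is a hypothetical helper.

.. code-block:: python

   def replay_delay(recorded_gap: float, replay_speed: float) -> float:
       """Seconds to wait before showing the next event during replay."""
       if replay_speed < 0:
           raise ValueError("replay speed must be >= 0")
       if replay_speed == 0:
           # 0 means instant playback: never wait.
           return 0.0
       # 1 keeps the recorded timing; 2 halves every gap.
       return recorded_gap / replay_speed


   for speed in (0, 1, 2):
       print(speed, replay_delay(4.0, speed))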

Manage Shared Sessions (``massgen shares``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

List and manage your shared sessions.

.. code-block:: bash

   # List your shared sessions
   massgen shares list

   # Delete a shared session
   massgen shares delete <gist_id>

See Also
--------

* :doc:`../quickstart/running-massgen` - Detailed usage examples
* :doc:`yaml_schema` - YAML configuration reference
* :doc:`supported_models` - Available models and backends
* :doc:`../user_guide/logging` - Complete logging and debugging guide


---

## reference/comparisons.rst

==================================
MassGen vs Other Multi-Agent Tools
==================================

This page compares MassGen with other multi-agent and multi-LLM tools to help you understand when MassGen is the right choice for your use case.

.. contents:: On This Page
   :local:
   :depth: 2

MassGen vs LLM Council
----------------------

`LLM Council <https://github.com/karpathy/llm-council>`_ is a weekend project by Andrej Karpathy that queries multiple LLMs and synthesizes their responses through peer review.

Overview
^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Aspect
     - MassGen
     - LLM Council
   * - **Primary Goal**
     - Multi-agent coordination with tools, voting, and consensus
     - Multi-model response aggregation with peer review
   * - **Architecture**
     - Agents work in parallel, observe each other, vote on answers
     - 3-stage pipeline: individual responses → peer ranking → chairman synthesis
   * - **Maintenance**
     - Actively maintained with regular releases
     - Self-described "weekend hack", no ongoing support

Feature Comparison
^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 25 20 20 35

   * - Feature
     - MassGen
     - LLM Council
     - Notes
   * - **Web UI**
     - ✅ Side-by-side agent panels
     - ✅ Tabbed responses
     - MassGen shows all agents simultaneously; LLM Council uses tabs
   * - **CLI Interface**
     - ✅ Rich terminal UI
     - ❌
     - MassGen provides an interactive terminal UI
   * - **Python API**
     - ✅ Full async API
     - ❌
     - MassGen exposes an async Python API and also integrates with LiteLLM as a custom provider
   * - **Tool Use (MCP)**
     - ✅ Web search, code execution, file ops
     - ❌
     - MassGen agents can use tools to solve complex tasks
   * - **Voting/Consensus**
     - ✅ Natural voting mechanism
     - ✅ Peer ranking
     - Different approaches: MassGen uses voting; LLM Council uses rankings
   * - **Model Backends**
     - ✅ 10+ backends (OpenRouter, OpenAI, Claude, Gemini, Grok, Azure, LM Studio, etc.)
     - ✅ OpenRouter only
     - MassGen supports direct API calls + local models; LLM Council routes everything through OpenRouter
   * - **Code Execution**
     - ✅ Sandboxed Python/Bash
     - ❌
     - MassGen can run and verify code
   * - **File Operations**
     - ✅ Project integration with permissions
     - ❌
     - MassGen can read/write files in your codebase
   * - **Custom Tools**
     - ✅ YAML or code-based
     - ❌
     - Define your own tools for agents to use
   * - **Real-time Streaming**
     - ✅ Live token streaming
     - ⚠️ Stage-level SSE
     - MassGen streams tokens as generated; LLM Council streams stage completion events

UI Comparison
^^^^^^^^^^^^^

**LLM Council UI:**

- ChatGPT-style interface with conversation sidebar
- Tabbed view to see individual model responses one at a time
- Sequential stages: Stage 1 (responses) → Stage 2 (rankings) → Stage 3 (synthesis)
- Shows "Running Stage 1: Collecting individual responses..." during processing

**MassGen Web UI:**

- Side-by-side panels showing all agents simultaneously
- Real-time status badges (Working, Done) for each agent
- Live streaming of agent responses as they work
- MCP tool connection status visible per agent
- Answer count and vote tracking in the header
- Toast notifications for new answers
- Dark/light theme support
- Coordination progress indicator with cancel option

When to Use Each
^^^^^^^^^^^^^^^^

**Choose MassGen when you need:**

- Agents that can use tools (web search, code execution, file operations)
- Side-by-side visualization of all agents working simultaneously
- Integration with your codebase or external systems
- A CLI interface or Python API
- Active development and support
- Complex multi-step problem solving

**Choose LLM Council when you need:**

- Simple multi-model response comparison
- Quick anonymous peer ranking of responses
- A lightweight "vibe coded" solution you can fork and modify
- Focus on text-only Q&A without tool requirements

Technical Architecture Differences
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**LLM Council's 3-Stage Pipeline:**

1. **Stage 1**: All models receive the query independently
2. **Stage 2**: Each model ranks other responses (anonymized as "Response A, B, C...")
3. **Stage 3**: A "Chairman" model synthesizes the final answer
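
The three stages above can be sketched in plain Python. This is an illustrative toy, not LLM Council's code: the "models" are stub functions, averaging rank positions is an assumed aggregation rule, and the chairman here simply returns the best-ranked response instead of synthesizing a new one.

.. code-block:: python

   from statistics import mean
   from typing import Callable, Dict, List

   Model = Callable[[str], str]                    # query -> response
   Ranker = Callable[[Dict[str, str]], List[str]]  # responses -> labels, best first


   def run_council(query: str, models: Dict[str, Model],
                   rankers: Dict[str, Ranker], chairman) -> str:
       # Stage 1: every model answers the query independently.
       responses = {name: model(query) for name, model in models.items()}

       # Anonymize as "Response A", "Response B", ... before peer review.
       labels = {name: f"Response {chr(ord('A') + i)}"
                 for i, name in enumerate(responses)}
       anonymized = {labels[name]: text for name, text in responses.items()}

       # Stage 2: each model ranks the responses written by the others.
       positions: Dict[str, List[int]] = {label: [] for label in anonymized}
       for name, ranker in rankers.items():
           others = {l: t for l, t in anonymized.items() if l != labels[name]}
           for pos, label in enumerate(ranker(others), start=1):
               positions[label].append(pos)
       avg_rank = {label: mean(p) for label, p in positions.items()}

       # Stage 3: the chairman produces the final answer from responses and ranks.
       return chairman(query, anonymized, avg_rank)


   # Toy setup: three stub models, a ranker that prefers longer text,
   # and a chairman that returns the response with the best average rank.
   models = {
       "m1": lambda q: "short",
       "m2": lambda q: "a longer answer",
       "m3": lambda q: "the longest answer of all",
   }
   by_length = lambda others: sorted(others, key=lambda l: len(others[l]), reverse=True)
   rankers = {name: by_length for name in models}
   chairman = lambda q, answers, avg: answers[min(avg, key=avg.get)]

   final = run_council("Explain X", models, rankers, chairman)
   print(final)

Note that the control flow is fixed: every query passes through the same three stages exactly once, regardless of how good the first-round responses are.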

**MassGen's Parallel Coordination:**

1. All agents receive the query and work in parallel
2. Agents can see recent answers from other agents at each step
3. Agents choose to provide a new answer OR vote for an existing answer
4. When an agent provides an answer, its workspace is shared with the other agents
5. Coordination continues until consensus is reached (all agents have voted)
6. The agent with the most votes presents the final answer
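
The steps above can be sketched as a small simulation. This is a toy model under stated assumptions, not MassGen's orchestrator: agents are plain policy functions, "parallel" work is simulated by giving every agent the same snapshot each round, consensus is taken to mean that every agent voted in the same round, and ties go to the agent whose vote was counted first.

.. code-block:: python

   from collections import Counter
   from typing import Callable, Dict, Tuple

   # A policy sees the latest answers and returns ("answer", text) or ("vote", agent_id).
   Action = Tuple[str, str]
   Policy = Callable[[str, Dict[str, str]], Action]


   def coordinate(policies: Dict[str, Policy], max_rounds: int = 10) -> Tuple[str, str]:
       answers: Dict[str, str] = {}
       for _ in range(max_rounds):
           snapshot = dict(answers)  # every agent observes the same state this round
           actions = {agent: policy(agent, snapshot)
                      for agent, policy in policies.items()}
           # Record new answers so they are visible in the next round.
           for agent, (kind, value) in actions.items():
               if kind == "answer":
                   answers[agent] = value
           votes = [value for kind, value in actions.values() if kind == "vote"]
           if len(votes) == len(policies):  # consensus: every agent voted
               winner, _ = Counter(votes).most_common(1)[0]
               return winner, answers[winner]
       raise RuntimeError("no consensus within max_rounds")


   def draft_then_vote(agent: str, seen: Dict[str, str]) -> Action:
       """Toy policy: answer once, then vote for the longest visible answer."""
       if agent not in seen:
           return "answer", f"draft from {agent}"
       return "vote", max(seen, key=lambda a: len(seen[a]))


   result = coordinate({name: draft_then_vote for name in ("a", "bb", "ccc")})
   print(result)

Unlike a fixed pipeline, the number of rounds here depends on the agents themselves: the loop keeps running as long as at least one agent prefers to submit a new answer over voting.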

The key difference: LLM Council uses a fixed 3-stage pipeline with a designated chairman, while MassGen uses dynamic coordination where agents naturally converge on the best solution through voting.

More Comparisons
----------------

Dedicated comparison pages for the most common "MassGen vs …" questions:

- :doc:`comparisons/crewai` — role-based decomposition with a hosted control plane
- :doc:`comparisons/langgraph` — low-level graph orchestration with the LangChain stack
- :doc:`comparisons/autogen` — multi-agent conversations (Microsoft AutoGen and the community AG2 continuation)

.. toctree::
   :hidden:
   :maxdepth: 1

   comparisons/crewai
   comparisons/langgraph
   comparisons/autogen


---

## reference/comparisons/autogen.rst

==========================
MassGen vs AutoGen / AG2
==========================

`AutoGen <https://github.com/microsoft/autogen>`_ is Microsoft's multi-agent conversation framework (CC-BY-4.0 docs / MIT code, ~58K GitHub stars as of May 2026). It pioneered the "agents that chat with each other and tools" pattern that much of the field now builds on. `AG2 <https://github.com/ag2ai/ag2>`_ is the community-governed continuation of AutoGen (Apache 2.0 with original MIT components, ~4.6K stars, hosted under the new ``ag2ai`` organization). Both descend from the same codebase but have diverged in stewardship and roadmap.

.. note::

   **Maintenance status — read this first.**

   - **Microsoft AutoGen** is in maintenance mode. Microsoft has positioned `microsoft/agent-framework <https://github.com/microsoft/agent-framework>`_ as the enterprise successor, with documented migration paths from both AutoGen and Semantic Kernel, supporting Python + .NET with graph-based orchestration. AutoGen continues to receive bug fixes but no new features are planned.
   - **AG2** is actively developed and serves as the community continuation of the AutoGen lineage. It is the project MassGen's own README cites as a direct predecessor — the "multi-agent conversation" idea in AG2 is part of what MassGen builds on.

   If you are choosing today: AG2 for the AutoGen-style API with active development, Microsoft Agent Framework for the new Microsoft-stack story, and AutoGen itself only for existing codebases pinned to it.

This page compares MassGen with the AutoGen / AG2 lineage. Where AutoGen and AG2 differ, the differences are called out.

.. contents:: On This Page
   :local:
   :depth: 2

Overview
--------

.. list-table::
   :header-rows: 1
   :widths: 18 41 41

   * - Aspect
     - MassGen
     - AutoGen / AG2
   * - **Primary Goal**
     - Parallel coordination of agents on the same task with voting and consensus
     - Multi-agent conversation: agents and tools exchange messages to solve a task
   * - **Architecture**
     - All agents tackle the full task in parallel and converge through voting
     - ``ConversableAgent`` base + group chat / swarm / nested chats / society-of-mind patterns
   * - **Maintenance**
     - Actively developed with regular releases
     - **AutoGen:** maintenance only (successor: Microsoft Agent Framework). **AG2:** actively developed.

Architecture & Coordination Model
---------------------------------

Both **AutoGen** and **AG2** model multi-agent work as a *conversation*. The shared lineage gives them a common shape:

- ``ConversableAgent`` is the base abstraction — agents send and receive messages.
- Group chat coordinates multiple agents through a *speaker selection* policy (round-robin, manager-chosen, etc.).
- Higher-level patterns (swarms, nested chats, society-of-mind) compose conversations into richer flows.
- Tools are registered as Python functions and exposed to agents; MCP servers are supported via extensions.
- Termination is rule-based (max turns, sentinel message, predicate) — there is no native voting / consensus primitive.

AutoGen layers this as Core / AgentChat / Extensions APIs and also ships AutoGen Studio (a no-code GUI). AG2 keeps the same conceptual model but emphasizes open governance ("AgentOS" branding) and is iterating on the API independently of Microsoft.

**MassGen** runs all agents in parallel on the *same* task. Coordination is voting-based: at each step every agent decides between submitting a new answer or voting for an existing one. The orchestrator detects consensus automatically and the winner presents.

In one line: AutoGen / AG2 model multi-agent work as a *conversation* where turn-taking is the control primitive. MassGen models it as *parallel attempts with collective validation* where voting is the control primitive. Both are valid; they optimize for different shapes of problem.
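
To make "turn-taking as the control primitive" concrete, here is a minimal sketch of a round-robin conversation with rule-based termination. It is written in plain Python and deliberately does not use the AutoGen or AG2 APIs; ``group_chat``, the agent functions, and the ``TERMINATE`` sentinel are illustrative assumptions.

.. code-block:: python

   from itertools import cycle
   from typing import Callable, Dict, List, Tuple

   Message = Tuple[str, str]               # (speaker, text)
   Agent = Callable[[List[Message]], str]  # conversation history -> reply


   def group_chat(agents: Dict[str, Agent], task: str, max_turns: int = 6,
                  sentinel: str = "TERMINATE") -> List[Message]:
       history: List[Message] = [("user", task)]
       speakers = cycle(agents)  # round-robin speaker selection
       for _ in range(max_turns):
           name = next(speakers)
           reply = agents[name](history)
           history.append((name, reply))
           if sentinel in reply:  # rule-based termination, no voting involved
               break
       return history


   def writer(history: List[Message]) -> str:
       version = sum(1 for speaker, _ in history if speaker == "writer") + 1
       return f"draft v{version}"


   def critic(history: List[Message]) -> str:
       last_text = history[-1][1]
       return "approved TERMINATE" if last_text.endswith("v2") else "please revise"


   chat = group_chat({"writer": writer, "critic": critic}, "Write a summary")
   for speaker, text in chat:
       print(f"{speaker}: {text}")

Each agent's output becomes the next agent's input, and the conversation stops when a rule fires. No step compares complete candidate answers against each other, which is the role voting plays in MassGen.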

Feature Comparison
------------------

.. list-table::
   :header-rows: 1
   :widths: 22 22 22 34

   * - Feature
     - MassGen
     - AutoGen / AG2
     - Notes
   * - **License**
     - Apache 2.0
     - AutoGen: MIT (code) / CC-BY-4.0 (docs). AG2: Apache 2.0 with original MIT components.
     - Both lineages fully open source for self-hosted use
   * - **Languages**
     - Python
     - AutoGen: Python, .NET / C#. AG2: Python.
     - AutoGen's .NET track is one reason to prefer it on the Microsoft stack
   * - **CLI**
     - ✅ ``massgen``, ``massgen --automation``, ``massgen --web``
     - AutoGen: ``autogenstudio ui``. AG2: Python-first; CLI present but less emphasized.
     - Different focuses
   * - **Python API**
     - ✅ Async API
     - ✅ Core, AgentChat, Extensions (AutoGen); ``ConversableAgent`` + orchestration patterns (AG2)
     - Both layered; pick the level you want
   * - **WebUI / Studio**
     - ✅ Side-by-side agent panels with live streaming and vote/consensus view
     - AutoGen Studio (no-code GUI; docs note it is not production-ready without extra hardening)
     - Different roles
   * - **MCP tools**
     - ✅ First-class on every backend (Claude, Codex, Gemini, OpenAI-compatible, Grok, Claude Code SDK)
     - ✅ MCP server support via extensions in both AutoGen and AG2
     - Both work
   * - **Model providers**
     - 10+ direct backends with per-agent heterogeneity (Claude, Gemini, GPT, Grok, Azure, LM Studio, OpenRouter, Codex, Claude Code SDK)
     - OpenAI primary; other providers via extension clients / generic ``LLMConfig``
     - MassGen's backend matrix is broader and first-class
   * - **Voting / consensus**
     - ✅ Core mechanism; agents vote, winner presents
     - ❌ Not built in (group chat uses speaker selection + termination, not voting)
     - This is the central design difference
   * - **Maintenance**
     - Active development
     - AutoGen: maintenance only. AG2: active.
     - Affects long-term roadmap, not current functionality
   * - **Successor / continuation**
     - n/a
     - AutoGen → `microsoft/agent-framework <https://github.com/microsoft/agent-framework>`_ (Python + .NET, graph-based; migration paths from both AutoGen and Semantic Kernel). AG2 is the community continuation.
     - For new work, evaluate AG2 (Python-first) or Microsoft Agent Framework (Python + .NET)

Voting and Consensus (the MassGen Differentiator)
-------------------------------------------------

AutoGen and AG2 group chats pick the *next speaker*; MassGen's protocol picks the *winner*. The two are not the same:

- Speaker selection is a *turn-taking* mechanism — useful when one agent's output is the input to the next.
- MassGen's voting is a *selection* mechanism — useful when you want N agents to attempt the same thing and the system to identify the strongest answer.

If your task is genuinely conversational (an agent asks another agent to do something, they trade messages, the chat terminates on a condition), AutoGen / AG2 is well-shaped for it. If your task benefits from many parallel attempts converging on the best answer, MassGen is purpose-built for it.

When to Use Each
----------------

**Choose AG2 when you need:**

- An *AutoGen-style API* (``ConversableAgent``, group chats, swarms, nested chats) with active community-led development.
- An open governance model independent of any single corporate steward.
- Compatibility with the broader AutoGen ecosystem of notebooks and patterns.

**Choose Microsoft AutoGen when you need:**

- Compatibility with an existing AutoGen codebase you cannot migrate.
- The .NET / C# code path alongside Python on the Microsoft stack. (Note: for new Microsoft-stack work, Microsoft Agent Framework is the recommended forward path.)

**Choose MassGen when you need:**

- *Parallel attempts + voting* as a first-class control flow with iterative refinement.
- Side-by-side live visualization of every agent's reasoning and answer.
- Heterogeneous backends per agent (Claude + Gemini + GPT + Grok all on the same task).
- An actively developed open-source project with regular releases and a broad backend matrix.

Related
-------

- :doc:`crewai` — role-based decomposition framework
- :doc:`langgraph` — graph-based orchestration substrate
- :doc:`../comparisons` — back to comparisons hub


---

## reference/comparisons/crewai.rst

==================
MassGen vs CrewAI
==================

`CrewAI <https://github.com/crewAIInc/crewAI>`_ is a popular open-source framework (MIT, ~51K GitHub stars as of May 2026) for orchestrating role-playing AI agents. It is independent of LangChain and ships with both a Python SDK and the commercial *CrewAI AMP* (Agent Management Platform) for hosted execution and observability.

This page compares CrewAI with MassGen. The intent is to be even-handed: both projects are healthy, and the right choice depends on what you are trying to build.

.. contents:: On This Page
   :local:
   :depth: 2

Overview
--------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Aspect
     - MassGen
     - CrewAI
   * - **Primary Goal**
     - Parallel multi-agent coordination through voting and consensus on the *same* task
     - Sequential / hierarchical role-based agent teams ("crews") that *decompose* a task across roles
   * - **Architecture**
     - All agents tackle the full task in parallel, observe each other, then vote on a winning answer
     - "Crews" of role-played agents execute task graphs; "Flows" add event-driven control over multiple crews
   * - **Hosted product**
     - Open source only; runs locally, in CI, or in your infra
     - Open source SDK + hosted *Crew Control Plane* / AMP for managed deployment and observability

Architecture & Coordination Model
---------------------------------

**CrewAI** treats a multi-agent task as a *workflow*. The unit of work is a ``Task``, the worker is an ``Agent`` defined by a role, goal, and backstory, and a ``Crew`` is the team plus the process (sequential or hierarchical) that runs the tasks. ``Flow`` adds event-driven orchestration so multiple crews can be triggered and composed deterministically. The mental model is closer to a structured pipeline than a debate: each task is owned by one agent, and the framework's job is to dispatch and chain them.

**MassGen** treats a multi-agent task as a *redundant parallel attempt*. All agents receive the same task and produce candidate answers in parallel. At each step every agent sees other agents' most recent answers and can either submit a new answer or vote for an existing one. Coordination ends when consensus is reached, and the winning answer is the one with the most votes. See :doc:`../../user_guide/concepts` for the full coordination model.

In one line: CrewAI is built for *decomposition* (different roles do different sub-tasks). MassGen is built for *refinement* (many agents attack the same task and converge).
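
The *decomposition* shape can be illustrated with a short sketch of a sequential, role-owned pipeline. It is plain Python and does not use CrewAI's API; ``Role``, ``Step``, and ``run_sequential`` are hypothetical names chosen to avoid confusion with CrewAI's real ``Agent``, ``Task``, and ``Crew`` classes.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Callable, List


   @dataclass
   class Role:
       name: str
       # (task description, upstream output) -> this role's output
       work: Callable[[str, str], str]


   @dataclass
   class Step:
       description: str
       owner: Role  # each sub-task is owned by exactly one role


   def run_sequential(steps: List[Step], initial_input: str = "") -> str:
       """Dispatch each step to its owner and chain the outputs in order."""
       output = initial_input
       for step in steps:
           output = step.owner.work(step.description, output)
       return output


   researcher = Role("researcher", lambda desc, upstream: "facts")
   writer = Role("writer", lambda desc, upstream: f"article based on {upstream}")

   article = run_sequential([
       Step("Collect background material", researcher),
       Step("Write the article", writer),
   ])
   print(article)

Each role sees only its own sub-task plus the upstream output. In the refinement shape, by contrast, every agent would receive the full task and produce a complete candidate answer.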

Feature Comparison
------------------

.. list-table::
   :header-rows: 1
   :widths: 25 20 20 35

   * - Feature
     - MassGen
     - CrewAI
     - Notes
   * - **License**
     - Apache 2.0
     - MIT
     - Both fully open source for self-hosted use
   * - **CLI**
     - ✅ ``massgen``, ``massgen --automation``, ``massgen --web``
     - ✅ ``crewai`` (project scaffolding, run, install)
     - Different focuses: MassGen CLI is the primary interactive entry point; CrewAI CLI is mostly project bootstrap
   * - **Python API**
     - ✅ Async API, LiteLLM custom provider
     - ✅ Synchronous API, role-based abstractions
     - CrewAI's API centers on ``Agent``/``Task``/``Crew``; MassGen's centers on parallel runs and votes
   * - **WebUI**
     - ✅ Side-by-side agent panels, live streaming, vote/consensus view
     - ✅ CrewAI AMP for hosted deployment, traces, and observability
     - Different roles: MassGen's WebUI visualizes the *coordination*; CrewAI AMP is more of a *deployment dashboard*
   * - **MCP tools**
     - ✅ First-class on every backend (Claude, Codex, Gemini, OpenAI-compatible, Grok, Claude Code SDK)
     - ✅ First-class via ``mcps`` field on Agent and ``MCPServerAdapter``
     - Both support stdio, SSE, and streamable HTTP transports
   * - **Code execution / filesystem tools**
     - ✅ Sandboxed Python/Bash, filesystem with permissioned context paths
     - ✅ Tool ecosystem (web search, code, files) via ``crewai-tools``
     - Different defaults: MassGen ships filesystem permissions and workspace snapshots; CrewAI relies on its tool library
   * - **Backend / model providers**
     - 10+ direct backends (Claude, Gemini, OpenAI, Grok, Azure, LM Studio, OpenRouter, …) + Claude Code SDK + Codex
     - OpenAI default; Ollama, Anthropic, Gemini, and others via configuration
     - MassGen's backend abstraction is heterogeneous by design (each agent can use a different provider)
   * - **Voting / consensus**
     - ✅ Core mechanism; agents vote, winner presents
     - ❌ Not built in (the framework is task-decomposition oriented)
     - This is the central design difference
   * - **Live streaming**
     - ✅ Token-level streaming to TUI and WebUI
     - ✅ Event/step streaming
     - Both stream; MassGen also streams per-agent in parallel side by side
   * - **Hosted control plane**
     - ❌
     - ✅ CrewAI AMP (hosted + self-hosted offerings)
     - Use CrewAI if you specifically want a managed deployment surface

Voting and Consensus (the MassGen Differentiator)
-------------------------------------------------

CrewAI does not have a native voting mechanism. A "consensus" pattern in CrewAI is something you build yourself by orchestrating multiple agents and writing a reducer task.

In MassGen voting is *the* coordination protocol, not an optional pattern:

- Every agent sees the most recent answer from every other agent at each step.
- At each step, every agent either submits a new answer or votes for an existing one.
- The orchestrator detects consensus automatically and the winner presents.
- Combined with checklist-gated evaluation criteria (see :doc:`../../user_guide/concepts`), this enforces refinement until quality is genuinely achieved rather than declared.

If your task benefits from diverse parallel attempts with collective validation — e.g. writing, design, math, code synthesis with verifier feedback — voting is what MassGen adds that role-based frameworks don't.
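
One way to picture checklist gating is as a guard on the vote action: an agent may vote only when the candidate answer passes every criterion, and otherwise must keep refining. The sketch below is an assumption about the general idea, not MassGen's implementation; ``next_action`` and the example checks are hypothetical.

.. code-block:: python

   from typing import Callable, Dict, List, Optional, Tuple

   Check = Callable[[str], bool]


   def next_action(candidate: Optional[str],
                   checklist: Dict[str, Check]) -> Tuple[str, List[str]]:
       """Return ("vote", []) only if the candidate passes every check.

       Otherwise return ("new_answer", failed) so the agent knows what to fix.
       """
       if candidate is None:
           return "new_answer", list(checklist)
       failed = [name for name, check in checklist.items() if not check(candidate)]
       return ("vote", []) if not failed else ("new_answer", failed)


   checklist = {
       "has_tests": lambda answer: "def test_" in answer,
       "has_docstring": lambda answer: '"""' in answer,
   }

   print(next_action("def f(): pass", checklist))
   print(next_action('"""Doc."""\ndef test_f(): pass', checklist))

Returning the names of the failed checks, rather than a bare boolean, gives the next refinement round something specific to address.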

When to Use Each
----------------

**Choose CrewAI when you need:**

- A *role-based decomposition* of a task — clear sub-tasks owned by clearly-named agents.
- A managed control plane (CrewAI AMP) for deployment, tracing, and team ergonomics.
- A large existing community / ecosystem of role recipes and tools.

**Choose MassGen when you need:**

- *Parallel refinement* of one task with multiple agents converging on a best answer.
- Side-by-side live visualization of every agent's reasoning and answer.
- Heterogeneous backends per agent (Claude + Gemini + GPT + Grok all on the same task).
- Voting / consensus as a first-class control flow, not a pattern to re-implement.
- A local-first / Apache 2.0 stack with no managed control plane dependency.

Choosing CrewAI does not exclude MassGen and vice versa — they solve adjacent problems. A common pattern is to use MassGen at decision points where multiple strong attempts and voting genuinely add quality, and CrewAI (or similar) where the work cleanly decomposes into roles.

Related
-------

- :doc:`langgraph` — graph-based orchestration (more low-level than CrewAI)
- :doc:`autogen` — multi-agent conversations (in maintenance mode; see successor)
- :doc:`../comparisons` — back to comparisons hub


---

## reference/comparisons/langgraph.rst

=====================
MassGen vs LangGraph
=====================

`LangGraph <https://github.com/langchain-ai/langgraph>`_ is LangChain's low-level orchestration framework for stateful, graph-based agent workflows (MIT, ~32K GitHub stars as of May 2026). It powers production agents built on the LangChain stack and is paired with the commercial *LangSmith Studio* / *LangGraph Platform* for visual prototyping, deployment, and observability.

This page compares LangGraph with MassGen. The two operate at very different levels of abstraction — LangGraph is a graph runtime, MassGen is a coordination protocol. They are often complementary rather than substitutes.

.. contents:: On This Page
   :local:
   :depth: 2

Overview
--------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Aspect
     - MassGen
     - LangGraph
   * - **Primary Goal**
     - Parallel multi-agent coordination through voting and consensus on the same task
     - Low-level orchestration of stateful graphs of nodes (agents, tools, branches, retries)
   * - **Architecture**
     - All agents tackle the full task in parallel and converge through voting
     - Explicit ``StateGraph`` of nodes and edges with durable execution and persistent state
   * - **Hosted product**
     - Open source only
     - Open source SDK + *LangGraph Platform* / *LangSmith Studio* for deployment and visual debugging

Architecture & Coordination Model
---------------------------------

**LangGraph** is a graph runtime. You define a typed state, a set of nodes (functions / agents / tools), and edges (conditional branches, parallel fan-outs, loops). The runtime executes the graph, persists state, supports human-in-the-loop interrupts, and can resume from failures. Coordination patterns — supervisor, swarm, plan-and-execute, debate — are *encodings* in the graph, not first-class primitives.

**MassGen** is a coordination *protocol*. Agents run in parallel on the same task, observe each other's most recent answers, and choose between "answer" and "vote." The protocol guarantees the orchestrator can detect consensus and pick a winner deterministically. Refinement is bounded by the protocol, not by a graph the user has to author.

In one line: LangGraph gives you the substrate to build any agent topology. MassGen gives you one specific topology — parallel attempts plus voting — implemented end-to-end with a TUI, WebUI, and backend matrix.

Feature Comparison
------------------

.. list-table::
   :header-rows: 1
   :widths: 25 20 20 35

   * - Feature
     - MassGen
     - LangGraph
     - Notes
   * - **License**
     - Apache 2.0
     - MIT
     - Both fully open source for self-hosted use
   * - **Abstraction level**
     - High — pre-built coordination protocol
     - Low — author your own graph
     - Different products; LangGraph is closer to a workflow runtime than an agent framework
   * - **CLI**
     - ✅ ``massgen``, ``massgen --automation``, ``massgen --web``
     - ✅ ``langgraph`` CLI for the LangGraph Platform / Studio
     - Different focuses
   * - **Python API**
     - ✅ Async API
     - ✅ Python and JS/TS APIs
     - LangGraph's API is broader by virtue of being multi-language
   * - **WebUI**
     - ✅ Side-by-side agent panels, live streaming, vote/consensus view
     - ✅ LangSmith Studio for graph visualization, traces, debugging
     - Studio focuses on *graph* execution; MassGen WebUI focuses on *parallel agents* + voting
   * - **MCP tools**
     - ✅ First-class on every backend
     - ✅ Via the ``langchain-mcp-adapters`` bridge (converts MCP tools to LangChain ``BaseTool``)
     - Both work; LangGraph's path goes through LangChain's tool abstraction
   * - **Model providers**
     - 10+ direct backends including Claude Code SDK + Codex; per-agent heterogeneity
     - Whatever LangChain integrates (extensive)
     - LangChain's integration surface is the largest in the ecosystem
   * - **Voting / consensus**
     - ✅ Core mechanism
     - ❌ Not built in (you can implement it as a node)
     - This is the central design difference
   * - **Durable execution**
     - Workspace snapshots, status files, checkpoint MCP for save/restore
     - ✅ Durable state, checkpoints, resume-after-failure as first-class features
     - LangGraph is the more general-purpose runtime here
   * - **Hosted platform**
     - ❌
     - ✅ LangGraph Platform / LangSmith Studio
     - Use LangGraph if you want a managed deployment + observability stack

Voting and Consensus (the MassGen Differentiator)
-------------------------------------------------

LangGraph can *express* a voting topology — define N parallel agent nodes, fan out, then a reducer node that picks a winner. It does not *provide* one. That means:

- You decide when to stop iterating (loop condition vs. quality criteria).
- You write the reducer logic (majority? weighted? based on a verifier?).
- You wire the visualization to surface "this is what each agent said and who won" yourself.

MassGen ships all of the above as a single product: streaming side-by-side panels, vote arrows in the WebUI consensus map, checklist-gated criteria, and a TUI consensus visualization. If parallel + voting is the *primary* thing you want, MassGen is purpose-built for it. If voting is one of many topologies your system needs alongside ETL, branching, and tool-heavy flows, LangGraph is the better substrate.
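
The fan-out-then-reduce topology described above has roughly the following shape. This sketch uses only ``asyncio`` from the standard library rather than LangGraph's API, the agent nodes are stubs, and the majority reducer is just one of the possible policies (weighted or verifier-based reducers would slot into the same place).

.. code-block:: python

   import asyncio
   from collections import Counter
   from typing import Awaitable, Callable, List

   AgentNode = Callable[[str], Awaitable[str]]


   async def fan_out(task: str, nodes: List[AgentNode]) -> List[str]:
       """Run every agent node on the same task concurrently."""
       return list(await asyncio.gather(*(node(task) for node in nodes)))


   def majority_reducer(candidates: List[str]) -> str:
       """Pick the most common candidate; ties go to the earliest one."""
       winner, _ = Counter(candidates).most_common(1)[0]
       return winner


   def make_stub_node(answer: str) -> AgentNode:
       async def node(task: str) -> str:
           await asyncio.sleep(0)  # stand-in for a real model call
           return answer
       return node


   nodes = [make_stub_node("42"), make_stub_node("41"), make_stub_node("42")]
   candidates = asyncio.run(fan_out("What is 6 * 7?", nodes))
   decision = majority_reducer(candidates)
   print(decision)

A single pass like this has no iteration. Deciding when another round is warranted, and showing each agent what the others produced, is the part you would still have to design yourself.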

When to Use Each
----------------

**Choose LangGraph when you need:**

- *Arbitrary agent topologies* you author yourself (supervisor, swarm, plan-execute, custom).
- Durable, resumable execution as a first-class concern (long-running flows, human approvals).
- Tight LangChain ecosystem integration (vector stores, retrievers, evaluators, deployment via LangGraph Platform).

**Choose MassGen when you need:**

- A pre-built *parallel + voting* coordination protocol focused on iterative refinement that you don't have to reimplement.
- Heterogeneous backends per agent on the same task (Claude + Gemini + GPT + Grok, etc.).
- A polished TUI / WebUI showing all agents working simultaneously and their consensus path.
- A local-first stack without a managed deployment platform dependency.

LangGraph and MassGen are at different levels and can be combined: MassGen can be invoked as a tool / subgraph from a larger LangGraph workflow when a particular step benefits from parallel attempts and voting.

Related
-------

- :doc:`crewai` — role-based decomposition framework
- :doc:`autogen` — multi-agent conversations (in maintenance mode; see successor)
- :doc:`../comparisons` — back to comparisons hub


---

## reference/configuration_examples.rst

Configuration Examples
======================

This reference provides a comprehensive catalog of MassGen configuration examples organized by use case, backend provider, and feature set.

Directory Structure
-------------------

All configuration files are located in ``@examples/``:

.. code-block:: text

   @examples/
   ├── basic/                 # Simple configs to get started
   │   ├── single/           # Single agent examples
   │   └── multi/            # Multi-agent examples
   ├── tools/                 # Tool-enabled configurations
   │   ├── mcp/              # MCP server integrations
   │   ├── planning/         # Planning mode examples
   │   ├── web-search/       # Web search enabled configs
   │   ├── code-execution/   # Code interpreter/execution
   │   └── filesystem/       # File operations & workspace
   ├── providers/             # Provider-specific examples
   │   ├── openai/           # GPT-5 series configs
   │   ├── claude/           # Claude API configs
   │   ├── gemini/           # Gemini configs
   │   ├── azure/            # Azure OpenAI
   │   ├── local/            # LMStudio, local models
   │   └── others/           # Cerebras, Grok, Qwen, ZAI
   ├── teams/                # Pre-configured specialized teams
   │   ├── creative/         # Creative writing teams
   │   ├── research/         # Research & analysis
   │   └── development/      # Coding teams
   └── ag2/                  # AG2 framework integration

Quick Start Examples
--------------------

Recommended Showcase Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Best starting point for multi-agent collaboration:**

.. code-block:: bash

   # Three powerful agents (Gemini, GPT-5, Grok) with enhanced workspace tools
   massgen \
     --config @examples/basic/multi/three_agents_default.yaml \
     "Your complex task"

This configuration combines:

* **Gemini 2.5 Flash** - Fast, versatile with web search
* **GPT-5 Nano** - Advanced reasoning with code interpreter
* **Grok-3 Mini** - Efficient with real-time web search

Quick Setup Without Config Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Single agent with model name only:**

.. code-block:: bash

   # Quick test with any supported model - no configuration needed
   massgen --model claude-3-5-sonnet-latest "What is machine learning?"
   massgen --model gemini-2.5-flash "Explain quantum computing"
   massgen --model gpt-5-nano "Summarize the latest AI developments"

**Interactive Mode:**

.. code-block:: bash

   # Start interactive chat (no initial question)
   massgen \
     --config @examples/basic/multi/three_agents_default.yaml

   # Debug mode for troubleshooting
   massgen \
     --config @examples/basic/multi/three_agents_default.yaml \
     --debug "Your question"

Tool-Enabled Configurations
----------------------------

MCP (Model Context Protocol) Servers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MCP enables agents to use external tools and services:

.. code-block:: bash

   # Weather queries
   massgen \
     --config @examples/tools/mcp/gemini_mcp_example.yaml \
     "What's the weather in Tokyo?"

   # Discord integration
   massgen \
     --config @examples/tools/mcp/claude_code_discord_mcp_example.yaml \
     "Extract latest messages"

See :doc:`../user_guide/tools/mcp_integration` for complete MCP documentation.

Planning Mode
~~~~~~~~~~~~~

Prevent irreversible actions during coordination:

.. code-block:: bash

   # Five agents with planning mode enabled
   massgen \
     --config @examples/tools/planning/five_agents_filesystem_mcp_planning_mode.yaml \
     "Create a comprehensive project structure"

See :doc:`../user_guide/advanced/planning_mode` for complete planning mode documentation.

Web Search
~~~~~~~~~~

For agents with web search capabilities:

.. code-block:: bash

   massgen \
     --config @examples/tools/web-search/claude_streamable_http_test.yaml \
     "Search for latest news"

Code Execution
~~~~~~~~~~~~~~

For code interpretation and execution:

.. code-block:: bash

   massgen \
     --config @examples/tools/code-execution/multi_agent_playwright_automation.yaml \
     "Browse three issues in https://github.com/Leezekun/MassGen and suggest improvements"

Filesystem Operations
~~~~~~~~~~~~~~~~~~~~~

For file manipulation, :term:`workspace` management, and :term:`context path` integration:

.. code-block:: bash

   # Single agent with enhanced file operations
   massgen \
     --config @examples/tools/filesystem/claude_code_single.yaml \
     "Analyze this codebase"

   # Multi-agent workspace collaboration
   massgen \
     --config @examples/tools/filesystem/claude_code_context_sharing.yaml \
     "Create shared workspace files"

See :doc:`../user_guide/files/file_operations` for complete filesystem documentation.

Provider-Specific Examples
--------------------------

Each provider has unique features and capabilities:

OpenAI (GPT-5 Series)
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   massgen \
     --config @examples/providers/openai/gpt5.yaml \
     "Complex reasoning task"

Claude
~~~~~~

.. code-block:: bash

   massgen \
     --config @examples/tools/mcp/claude_mcp_example.yaml \
     "Creative writing task"

Gemini
~~~~~~

.. code-block:: bash

   massgen \
     --config @examples/tools/mcp/gemini_mcp_example.yaml \
     "Research task"

Local Models
~~~~~~~~~~~~

.. code-block:: bash

   # Requires LM Studio running locally
   massgen \
     --config @examples/providers/local/lmstudio.yaml \
     "Run with local model"

See :doc:`../reference/supported_models` for help choosing a backend.

Pre-Configured Teams
--------------------

Teams are specialized multi-agent setups for specific domains:

Creative Teams
~~~~~~~~~~~~~~

.. code-block:: bash

   massgen \
     --config @examples/teams/creative/creative_team.yaml \
     "Write a story"

Research Teams
~~~~~~~~~~~~~~

.. code-block:: bash

   massgen \
     --config @examples/teams/research/research_team.yaml \
     "Analyze market trends"

Development Teams
~~~~~~~~~~~~~~~~~

.. code-block:: bash

   massgen \
     --config @examples/providers/others/zai_coding_team.yaml \
     "Build a web app"

Configuration File Format
-------------------------

Single Agent
~~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "agent_name"
       backend:
         type: "provider_type"
         model: "model_name"
         # Additional backend settings
       system_message: "Agent instructions"

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

Multi-Agent
~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "agent1"
       backend:
         type: "provider1"
         model: "model1"
       system_message: "Agent 1 role"

     - id: "agent2"
       backend:
         type: "provider2"
         model: "model2"
       system_message: "Agent 2 role"

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

See :doc:`yaml_schema` for complete configuration reference.
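
Before launching a run, it can help to sanity-check a loaded configuration against the shape shown above. The following is a minimal sketch, not MassGen's own validator: ``check_agents`` is a hypothetical helper, and it assumes the YAML has already been parsed into a plain ``dict``. Treating ``id``, ``backend.type``, and ``backend.model`` as required, and agent IDs as unique, is an assumption drawn from the examples here; the schema reference remains the authority.

.. code-block:: python

   def check_agents(config):
       """Return a list of problems found in a parsed config dict (hypothetical helper)."""
       problems = []
       seen_ids = set()
       for index, agent in enumerate(config.get("agents", [])):
           agent_id = agent.get("id")
           if not agent_id:
               problems.append(f"agent #{index} has no id")
           elif agent_id in seen_ids:
               problems.append(f"duplicate agent id: {agent_id}")
           seen_ids.add(agent_id)
           # Assumption: every backend needs at least a type and a model.
           backend = agent.get("backend", {})
           for key in ("type", "model"):
               if key not in backend:
                   problems.append(f"agent {agent_id!r} backend is missing {key!r}")
       if not config.get("agents"):
           problems.append("no agents defined")
       return problems


   config = {
       "agents": [
           {"id": "agent1", "backend": {"type": "provider1", "model": "model1"}},
           {"id": "agent1", "backend": {"type": "provider2"}},
       ]
   }
   for problem in check_agents(config):
       print(problem)

An empty list means the structure matches what this sketch checks; it says nothing about whether the provider or model names are valid.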

MCP Server Configuration
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   backend:
     type: "provider"
     model: "model_name"
     mcp_servers:
       - name: "server_name"
         type: "stdio"
         command: "command"
         args: ["arg1", "arg2"]
         env:
           KEY: "${ENV_VAR}"

See :doc:`../user_guide/tools/mcp_integration` for complete MCP configuration.
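
The ``${ENV_VAR}`` placeholder in ``env`` stands for a value taken from your environment. As a rough illustration of that kind of substitution (a sketch only, not MassGen's actual implementation; ``expand_env`` is a hypothetical helper and the ``secret`` value is a placeholder):

.. code-block:: python

   import os
   import re

   # Matches ${NAME} where NAME is a typical environment variable name.
   _PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


   def expand_env(value, environ=None):
       """Replace each ${NAME} in value with environ[NAME]; fail loudly if unset."""
       environ = os.environ if environ is None else environ

       def _substitute(match):
           name = match.group(1)
           if name not in environ:
               raise KeyError(f"environment variable {name} is not set")
           return environ[name]

       return _PLACEHOLDER.sub(_substitute, value)


   server_env = {"KEY": "${ENV_VAR}"}
   fake_environ = {"ENV_VAR": "secret"}  # stand-in for os.environ
   resolved = {key: expand_env(value, fake_environ) for key, value in server_env.items()}
   print(resolved)

Raising on a missing variable, rather than substituting an empty string, is a design choice in this sketch: it surfaces a forgotten key before the MCP server starts.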

Finding the Right Configuration
--------------------------------

1. **New Users**: Start with ``basic/single/`` or ``basic/multi/``
2. **Need Tools**: Check ``tools/`` subdirectories for specific capabilities
3. **Specific Provider**: Look in ``providers/`` for your provider
4. **Complex Tasks**: Use pre-configured ``teams/``
5. **Planning Mode**: Use ``tools/planning/`` for tasks with irreversible actions

Release History & Examples
---------------------------

v0.0.29 - Latest
~~~~~~~~~~~~~~~~

**New Features:** :doc:`../user_guide/advanced/planning_mode`, File Operation Safety, Enhanced MCP Tool Filtering

**Key Configurations:**

* ``@examples/tools/planning/five_agents_discord_mcp_planning_mode.yaml`` - Five agents with Discord MCP in planning mode
* ``@examples/tools/planning/five_agents_filesystem_mcp_planning_mode.yaml`` - Five agents with filesystem MCP in planning mode
* ``@examples/tools/planning/five_agents_notion_mcp_planning_mode.yaml`` - Five agents with Notion MCP in planning mode
* ``@examples/tools/mcp/five_agents_weather_mcp_test.yaml`` - Five agents testing weather MCP tools

**Try it:**

.. code-block:: bash

   # Planning mode with filesystem operations
   massgen \
     --config @examples/tools/planning/five_agents_filesystem_mcp_planning_mode.yaml \
     "Create a comprehensive project structure with documentation"

   # Multi-agent weather MCP testing
   massgen \
     --config @examples/tools/mcp/five_agents_weather_mcp_test.yaml \
     "Compare weather forecasts for New York, London, and Tokyo"

v0.0.28
~~~~~~~

**New Features:** :doc:`../user_guide/integration/general_interoperability`, External Agent Backend, Code Execution Support

**Key Configurations:**

* ``@examples/ag2/ag2_single_agent.yaml`` - Basic single AG2 agent setup
* ``@examples/ag2/ag2_coder.yaml`` - AG2 agent with code execution capabilities
* ``@examples/ag2/ag2_gemini.yaml`` - AG2-Gemini hybrid configuration

**Try it:**

.. code-block:: bash

   # AG2 single agent with code execution
   massgen \
     --config @examples/ag2/ag2_coder.yaml \
     "Create a factorial function and calculate the factorial of 8"

   # Mixed team: AG2 agent + Gemini agent
   massgen \
     --config @examples/ag2/ag2_gemini.yaml \
     "What is quantum computing?"

v0.0.27
~~~~~~~

**New Features:** Multimodal Support (Image Processing), File Upload and File Search

**Key Configurations:**

* ``@examples/basic/multi/gpt4o_image_generation.yaml`` - Multi-agent image generation
* ``@examples/basic/multi/gpt5nano_image_understanding.yaml`` - Multi-agent image understanding
* ``@examples/basic/single/single_gpt5nano_file_search.yaml`` - File search for document Q&A

**Try it:**

.. code-block:: bash

   # Image generation
   massgen \
     --config @examples/basic/single/single_gpt4o_image_generation.yaml \
     "Generate an image of a gray tabby cat hugging an otter"

   # Image understanding
   massgen \
     --config @examples/basic/multi/gpt5nano_image_understanding.yaml \
     "Please summarize the content in this image"

v0.0.26
~~~~~~~

**New Features:** File Deletion, :doc:`../user_guide/files/protected_paths`, File-Based Context Paths

**Key Configurations:**

* ``@examples/tools/filesystem/gemini_gpt5nano_protected_paths.yaml`` - Protected paths configuration
* ``@examples/tools/filesystem/gemini_gpt5nano_file_context_path.yaml`` - File-based context paths
* ``@examples/tools/filesystem/grok4_gpt5_gemini_filesystem.yaml`` - Multi-agent filesystem collaboration

**Try it:**

.. code-block:: bash

   # Protected paths - keep reference files safe
   massgen \
     --config @examples/tools/filesystem/gemini_gpt5nano_protected_paths.yaml \
     "Review the HTML and CSS files, then improve the styling"

v0.0.25
~~~~~~~

**New Features:** :doc:`../user_guide/sessions/multi_turn_mode` Filesystem Support, SGLang Backend Integration

**Key Configurations:**

* ``@examples/tools/filesystem/multiturn/two_gemini_flash_filesystem_multiturn.yaml`` - Multi-turn with Gemini agents
* ``@examples/tools/filesystem/multiturn/grok4_gpt5_claude_code_filesystem_multiturn.yaml`` - Three-agent multi-turn
* ``@examples/basic/multi/two_qwen_vllm_sglang.yaml`` - Mixed vLLM and SGLang deployment

**Example Multi-Turn Session:**

.. code-block:: text

   # Turn 1 - Initial creation
   massgen \
     --config @examples/tools/filesystem/multiturn/two_gemini_flash_filesystem_multiturn.yaml

   Turn 1: Make a website about Bob Dylan
   # Creates workspace and saves state to .massgen/sessions/

   # Turn 2 - Enhancement based on Turn 1
   Turn 2: Remove the image placeholder and improve the appearance
   # Automatically loads Turn 1's workspace state

v0.0.24 and Earlier
~~~~~~~~~~~~~~~~~~~

See the `GitHub repository <https://github.com/Leezekun/MassGen/blob/main/@examples/README.md>`_ for complete release history including:

* v0.0.24 - vLLM Backend Support
* v0.0.23 - Backend Architecture Refactoring
* v0.0.22 - Workspace Copy Tools via MCP
* v0.0.21 - Advanced Filesystem Permissions
* v0.0.20 - Claude MCP Support
* v0.0.17 - OpenAI MCP Integration
* v0.0.16 - Unified Filesystem Support
* v0.0.15 - Gemini MCP Integration
* v0.0.12-14 - Enhanced Logging
* v0.0.10 - Azure OpenAI Support
* v0.0.7 - Local Model Support
* v0.0.5 - Claude Code Integration

Environment Variables
---------------------

Most configurations use environment variables for API keys. Set up your ``.env`` file based on ``.env.example``:

**Provider-specific keys:**

* ``OPENAI_API_KEY`` - OpenAI models
* ``ANTHROPIC_API_KEY`` - Claude models
* ``GOOGLE_API_KEY`` - Gemini models
* ``XAI_API_KEY`` - Grok models
* ``AZURE_OPENAI_API_KEY`` - Azure OpenAI

**MCP server keys:**

* ``DISCORD_BOT_TOKEN`` - Discord MCP integration
* ``BRAVE_API_KEY`` - Brave Search MCP integration

See :doc:`../quickstart/configuration` for complete environment setup.

Naming Convention
-----------------

MassGen configuration files follow this pattern for clarity:

**Format:** ``{agents}_{features}_{description}.yaml``

**1. Agents** (who's participating):

* ``single-{provider}`` - Single agent (e.g., ``single-claude``, ``single-gemini``)
* ``{provider1}-{provider2}`` - Two agents (e.g., ``claude-gemini``, ``gemini-gpt5``)
* ``three-mixed`` - Three agents from different providers
* ``team-{type}`` - Specialized teams (e.g., ``team-creative``, ``team-research``)

**2. Features** (what tools/capabilities):

* ``basic`` - No special tools, just conversation
* ``mcp`` - MCP server integration
* ``mcp-{service}`` - Specific MCP service (e.g., ``mcp-discord``, ``mcp-weather``)
* ``websearch`` - Web search enabled
* ``codeexec`` - Code execution/interpreter
* ``filesystem`` - File operations and workspace management

**3. Description** (purpose/context - optional):

* ``showcase`` - Demonstration/getting started example
* ``test`` - Testing configuration
* ``research`` - Research and analysis tasks
* ``dev`` - Development and coding tasks
* ``collab`` - Collaboration example

**Note:** Existing configs maintain their current names for compatibility. New configs should follow this convention.
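
The pattern can be sketched as a small helper that assembles a file name from its three parts. ``config_filename`` is a hypothetical function written for illustration. The rule that hyphens are used inside a part while underscores separate parts is an assumption inferred from the examples above (``single-claude``, ``mcp-discord``).

.. code-block:: python

   def config_filename(agents, features, description=None):
       """Build '{agents}_{features}_{description}.yaml'; description is optional."""
       parts = [agents, features]
       if description:
           parts.append(description)
       for part in parts:
           # Assumption: underscores are reserved as the separator between parts.
           if "_" in part or " " in part:
               raise ValueError(f"use hyphens inside a part: {part!r}")
       return "_".join(parts) + ".yaml"


   print(config_filename("claude-gemini", "mcp-discord", "showcase"))
   print(config_filename("single-claude", "basic"))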

Related Documentation
---------------------

* :doc:`../quickstart/configuration` - Configuration guide with step-by-step setup
* :doc:`yaml_schema` - Complete YAML schema reference
* :doc:`supported_models` - All supported models and backends
* :doc:`cli` - Command-line interface reference
* :doc:`../user_guide/tools/mcp_integration` - MCP tool integration guide
* :doc:`../user_guide/advanced/planning_mode` - Planning mode documentation
* :doc:`../user_guide/files/protected_paths` - Protected paths feature


---

## reference/mcp_server_registry.rst

MCP Server Registry
===================

MassGen includes a curated registry of recommended MCP (Model Context Protocol) servers
that are automatically available when auto-discovery is enabled.

Overview
--------

The MCP server registry provides pre-configured, tested MCP servers that extend
agent capabilities. When you enable ``auto_discover_custom_tools: true`` in your
configuration, these servers are automatically included if their API keys are
available (or not required).

**Registry Location:** ``massgen/mcp_tools/server_registry.py``

Available Servers
-----------------

Context7
~~~~~~~~

**Purpose:** Up-to-date code documentation for libraries and frameworks

**Type:** stdio (local) or streamable-http (remote)

**API Key:** Optional (``CONTEXT7_API_KEY``)

**Connection:**

.. code-block:: yaml

   mcp_servers:
     - name: "context7"
       type: "stdio"
       command: "npx"
       args: ["-y", "@upstash/context7-mcp"]

**Tools:**

- ``resolve_context7_library_id`` - Convert library names to Context7 IDs
- ``get_library_docs`` - Fetch version-specific documentation (1K-50K tokens)

**Key Features:**

- No API key required for basic use
- Get API key at https://context7.com/dashboard for higher rate limits
- Reduces the risk of outdated information and hallucinated APIs
- Provides current, version-specific documentation

**Important Notes:**

- Outputs can be very large (5K-50K tokens)
- **Recommended:** Write output to file first, then parse
- Use ``topic`` parameter to narrow results
- Adjust ``tokens`` parameter (default: 5000, max: 50000)
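
The "write to file first, then parse" recommendation can be sketched as follows. This is an illustration under stated assumptions: ``save_then_search`` and the file name ``library_docs.md`` are hypothetical, and ``sample_output`` is made-up stand-in text rather than real Context7 output.

.. code-block:: python

   import tempfile
   from pathlib import Path


   def save_then_search(text, keyword, directory):
       """Write a large tool output to disk, then return only lines mentioning keyword."""
       path = Path(directory) / "library_docs.md"  # hypothetical file name
       path.write_text(text, encoding="utf-8")
       with path.open(encoding="utf-8") as handle:
           matches = [
               line.rstrip("\n")
               for line in handle
               if keyword.lower() in line.lower()
           ]
       return path, matches


   sample_output = "# Routing\nUse the router hook.\n# Caching\nCache responses by tag.\n"
   with tempfile.TemporaryDirectory() as workdir:
       saved_path, hits = save_then_search(sample_output, "cache", workdir)
       print(saved_path.name, hits)

Keeping the full output on disk and passing only the matching lines onward keeps the large document out of the agent's context.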

**Use Cases:**

- Getting latest framework documentation (React, Next.js, Vue, etc.)
- Finding current API references
- Learning about new features in recent versions
- Avoiding outdated or hallucinated information

**Example:**

.. code-block:: yaml

   # See: massgen/configs/tools/mcp/context7_documentation_example.yaml

Brave Search
~~~~~~~~~~~~

**Purpose:** Web search via Brave API

**Type:** stdio

**API Key:** Required (``BRAVE_API_KEY``)

**Connection:**

.. code-block:: yaml

   mcp_servers:
     - name: "brave_search"
       type: "stdio"
       command: "npx"
       args: ["-y", "@brave/brave-search-mcp-server"]
       env:
         BRAVE_API_KEY: "${BRAVE_API_KEY}"

**Tools:**

- ``brave_web_search`` - Perform web searches
- Additional tools for local search, summarization (Pro tier)

**Key Features:**

- Real-time web search results
- Free tier: 2000 queries/month
- Pro tier: Enhanced features (local search, summarization, extra snippets)

**API Key Setup:**

1. Sign up at https://brave.com/search/api/
2. Generate API key from dashboard
3. Add to ``.env``: ``BRAVE_API_KEY="your_key_here"``

**Rate Limit Warning:**

⚠️  **Free tier limited to 2000 queries/month**

- Execute searches sequentially, not in parallel
- Avoid repeated searches
- Combine multiple questions into single queries
- Consider Pro tier for heavy usage
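
One way to follow the first two points is to route searches through a small wrapper that caches results and counts queries against a budget. The sketch below is illustrative only: ``BudgetedSearch`` and ``fake_search`` are hypothetical names, and the wrapper does not call the Brave API or any MassGen code.

.. code-block:: python

   class BudgetedSearch:
       """Wrap a search callable with a result cache and a query budget."""

       def __init__(self, search_fn, monthly_budget=2000):
           self._search_fn = search_fn
           self._budget = monthly_budget
           self._cache = {}
           self.queries_used = 0

       def search(self, query):
           key = " ".join(query.lower().split())  # normalize case and whitespace
           if key in self._cache:
               return self._cache[key]  # repeated search: no new query spent
           if self.queries_used >= self._budget:
               raise RuntimeError("query budget exhausted")
           self.queries_used += 1
           result = self._search_fn(query)
           self._cache[key] = result
           return result


   def fake_search(query):
       # Stand-in for a real search call; returns a canned string.
       return f"results for {query!r}"


   client = BudgetedSearch(fake_search, monthly_budget=2)
   for question in ["MassGen release", "massgen  release", "MCP servers"]:
       print(client.search(question))
   print(client.queries_used)

Because calls go through one object one at a time, searches run sequentially and the second, near-identical query is served from the cache.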

**Use Cases:**

- Current events and recent information
- Latest trends and updates
- Real-time data queries
- Information not in LLM training data
- Fact verification

**Example:**

.. code-block:: yaml

   # See: massgen/configs/tools/mcp/brave_search_example.yaml

Exa Search
~~~~~~~~~~

**Purpose:** AI-powered web search and page fetch via Exa's official MCP server

**Type:** stdio

**API Key:** Required (``EXA_API_KEY``)

**Connection:**

.. code-block:: yaml

   mcp_servers:
     - name: "exa_search"
       type: "stdio"
       command: "npx"
       args: ["-y", "exa-mcp-server"]
       env:
         EXA_API_KEY: "${EXA_API_KEY}"

MassGen currently wires Exa through the official npm package (``npx -y exa-mcp-server``).
Exa's current docs also highlight a hosted MCP endpoint (``https://mcp.exa.ai/mcp``)
for clients that support HTTP MCP directly.

**Tools:**

- ``web_search_exa`` - Search the web and return ready-to-use content
- ``web_fetch_exa`` - Read a webpage's full content as clean markdown from a URL
- ``web_search_advanced_exa`` - Advanced search with category, domain, date, highlight, and summary controls

**Key Features:**

- Search the web with Exa's AI-oriented ranking and content extraction
- Fetch a known page as clean markdown for summarization or downstream analysis
- Advanced search options documented by Exa include category, domain, and date filters
- Official Exa docs currently prefer the hosted MCP endpoint for HTTP-capable clients,
  while MassGen's registry uses the official npm package form above

**API Key Setup:**

1. Sign up at https://exa.ai/
2. Generate API key from dashboard
3. Add to ``.env``: ``EXA_API_KEY="your_key_here"``

**Use Cases:**

- Research queries requiring semantic understanding
- Fetching and summarizing the full contents of a known page
- Category-specific searches (research papers, news, companies)
- Time-bounded or domain-restricted searches when advanced Exa tooling is enabled

**Example:**

.. code-block:: yaml

   # See: massgen/configs/tools/web-search/exa_search_example.yaml

Auto-Discovery
--------------

When ``auto_discover_custom_tools: true`` is set in your backend configuration,
MassGen automatically includes registry servers that are available:

**Always Included:**

- Context7 (no API key required)

**Conditionally Included:**

- Brave Search (only if ``BRAVE_API_KEY`` is set in ``.env``)
- Exa Search (only if ``EXA_API_KEY`` is set in ``.env``)

**Behavior:**

1. Checks which registry servers have required API keys available
2. Merges available servers into ``mcp_servers`` configuration
3. Avoids duplicates if server is already manually configured
4. Logs which servers were added and which were skipped

**Example:**

.. code-block:: yaml

   agents:
     - id: "research_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         auto_discover_custom_tools: true  # Automatically adds registry servers!

**Log Output:**

.. code-block:: text

   [gemini] Auto-discovery enabled: Added MCP servers from registry: context7
   [gemini] Registry servers not added (missing API keys): brave_search (needs BRAVE_API_KEY)
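
The four behavior steps above can be sketched in a few lines. This is a simplified model, not the contents of ``server_registry.py``: the ``REGISTRY`` mapping and ``merge_registry`` function are hypothetical, and real registry entries carry full connection details rather than just a name.

.. code-block:: python

   import os

   # Hypothetical registry: server name -> required env var (None = no key needed).
   REGISTRY = {
       "context7": None,
       "brave_search": "BRAVE_API_KEY",
       "exa_search": "EXA_API_KEY",
   }


   def merge_registry(configured, environ=None):
       """Return (merged servers, names added, names skipped with reason)."""
       environ = os.environ if environ is None else environ
       existing = {server["name"] for server in configured}
       merged = list(configured)
       added, skipped = [], []
       for name, required_key in REGISTRY.items():
           if name in existing:
               continue  # already configured manually: avoid a duplicate
           if required_key is not None and not environ.get(required_key):
               skipped.append(f"{name} (needs {required_key})")
               continue
           merged.append({"name": name})
           added.append(name)
       return merged, added, skipped


   merged, added, skipped = merge_registry([], environ={})
   print("added:", ", ".join(added))
   print("skipped:", ", ".join(skipped))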

Registry Summary Table
----------------------

.. list-table::
   :header-rows: 1
   :widths: 15 10 15 15 45

   * - Server
     - Type
     - API Key
     - Rate Limits
     - Notes
   * - Context7
     - stdio
     - Optional
     - None
     - Large outputs (write to files). Optional API key for higher rate limits.
   * - Brave Search
     - stdio
     - Required
     - 2000/month (free)
     - Avoid parallel queries. Pro tier available for heavy usage.
   * - Exa Search
     - stdio
     - Required
     - Pay-per-use
     - AI-powered neural search. Supports category filtering, content extraction, and deep search.

Manual Configuration
--------------------

You can manually configure any registry server without auto-discovery:

.. code-block:: yaml

   agents:
     - id: "my_agent"
       backend:
         type: "claude"
         model: "claude-sonnet-4"
         mcp_servers:
           - name: "context7"
             type: "stdio"
             command: "npx"
             args: ["-y", "@upstash/context7-mcp"]

This gives you full control over which servers to include and their configuration.

See Also
--------

- :doc:`../user_guide/tools/mcp_integration` - Complete MCP integration guide
- :doc:`yaml_schema` - YAML configuration schema
- ``massgen/configs/tools/mcp/`` - Example configurations


---

## reference/python_api.rst

=========================
Python API & LiteLLM
=========================

MassGen provides two ways to integrate into your Python applications:

1. **LiteLLM Integration** (Recommended) - OpenAI-compatible interface, works with 100+ providers
2. **Direct Python API** - Async-first API with ``massgen.run()`` and ``massgen.build_config()``

.. note::
   **For Contributors:** Looking for internal API documentation? See :doc:`../api/index` for developer API reference of classes and modules.

.. contents:: On This Page
   :local:
   :depth: 2

LiteLLM Integration
===================

LiteLLM integration is the easiest way to use MassGen programmatically, and it works with existing LiteLLM-based code.

Quick Start
-----------

.. code-block:: python

   from dotenv import load_dotenv
   load_dotenv()  # Load API keys from .env

   import litellm
   from massgen import register_with_litellm

   # Register MassGen as a provider (call once at startup)
   register_with_litellm()

   # Multi-agent with different models
   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "What is machine learning?"}],
       optional_params={
           "models": ["openai/gpt-5", "anthropic/claude-sonnet-4-5-20250929"],
       }
   )
   print(response.choices[0].message.content)

Model String Format
-------------------

- ``massgen/build`` - Build config dynamically from ``optional_params`` (most flexible)
- ``massgen/<example-name>`` - Use built-in example config
- ``massgen/model:<model-name>`` - Quick single-agent mode
- ``massgen/path:<config-path>`` - Explicit config file path
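
To make the four forms concrete, here is a sketch of how such strings could be told apart. ``parse_model_string`` is a hypothetical helper, the order of checks is an assumption, and ``basic_multi`` is a placeholder example name; MassGen's real parsing may differ.

.. code-block:: python

   def parse_model_string(model):
       """Classify a 'massgen/...' model string into (kind, value)."""
       prefix = "massgen/"
       if not model.startswith(prefix):
           raise ValueError(f"not a MassGen model string: {model!r}")
       rest = model[len(prefix):]
       if rest == "build":
           return ("build", None)  # config comes from optional_params
       if rest.startswith("model:"):
           return ("model", rest[len("model:"):])
       if rest.startswith("path:"):
           return ("path", rest[len("path:"):])
       return ("example", rest)  # anything else is read as an example name


   for text in [
       "massgen/build",
       "massgen/model:gpt-5",
       "massgen/path:./my-agents.yaml",
       "massgen/basic_multi",  # placeholder example name
   ]:
       print(text, "->", parse_model_string(text))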

Optional Parameters
-------------------

Pass MassGen-specific options via ``optional_params``:

+----------------------+------------------+----------------------------------------------------+
| Parameter            | Type             | Description                                        |
+======================+==================+====================================================+
| ``models``           | list[str]        | List of models for multi-agent                     |
|                      |                  | (e.g., ``["gpt-5", "claude-sonnet-4-5-20250929"]``)|
+----------------------+------------------+----------------------------------------------------+
| ``model``            | str              | Single model name for all agents                   |
+----------------------+------------------+----------------------------------------------------+
| ``num_agents``       | int              | Number of agents when using single model           |
+----------------------+------------------+----------------------------------------------------+
| ``enable_filesystem``| bool             | Enable filesystem/MCP tools (default: True)        |
+----------------------+------------------+----------------------------------------------------+
| ``context_paths``    | list             | Paths with permissions for file operations         |
+----------------------+------------------+----------------------------------------------------+
| ``use_docker``       | bool             | Enable Docker execution mode (default: False)      |
+----------------------+------------------+----------------------------------------------------+
| ``enable_logging``   | bool             | Enable logging and return log directory            |
+----------------------+------------------+----------------------------------------------------+
| ``output_file``      | str              | Write final answer to this file path               |
+----------------------+------------------+----------------------------------------------------+

Examples
--------

.. code-block:: python

   # Multi-agent with filesystem access
   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "Read the config and summarize"}],
       optional_params={
           "model": "gpt-5",
           "context_paths": [
               {"path": "/path/to/project", "permission": "read"},
               {"path": "/path/to/output", "permission": "write"},
           ],
       }
   )

   # Lightweight mode (no filesystem, faster for simple queries)
   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "What is 2+2?"}],
       optional_params={
           "model": "gpt-5-nano",
           "enable_filesystem": False,
       }
   )

   # Access coordination metadata
   metadata = response._hidden_params
   print(f"Winner: {metadata.get('massgen_selected_agent')}")
   print(f"Votes: {metadata.get('massgen_vote_results', {}).get('vote_counts')}")

For complete LiteLLM examples, see :doc:`../user_guide/integration/python_api`.

Direct Python API
=================

For async workflows or more control, use ``massgen.run()`` directly.

.. code-block:: python

   import asyncio
   import massgen

   async def main():
       # Single agent with filesystem support (default)
       result = await massgen.run(
           query="What is machine learning?",
           model="gpt-5"
       )
       print(result['final_answer'])

       # Multi-agent mode
       result = await massgen.run(
           query="Compare approaches",
           models=["gpt-5", "claude-sonnet-4-5-20250929"]
       )
       print(result['final_answer'])

       # Lightweight mode (no filesystem)
       result = await massgen.run(
           query="What is 2+2?",
           model="gpt-5-nano",
           enable_filesystem=False
       )
       print(result['final_answer'])

   asyncio.run(main())

API Reference
=============

massgen.run()
-------------

.. code-block:: python

   async def run(
       query: str,
       config: str = None,
       model: str = None,
       models: list = None,
       num_agents: int = None,
       use_docker: bool = False,
       enable_filesystem: bool = True,
       enable_logging: bool = False,
       output_file: str = None,
       **kwargs
   ) -> dict

**Parameters:**

- ``query`` (str): The question or task for the agent(s)
- ``config`` (str, optional): Config file path or ``@examples/NAME``
- ``model`` (str, optional): Model name for agents
- ``models`` (list, optional): List of models for multi-agent mode
- ``num_agents`` (int, optional): Number of agents when using single model
- ``use_docker`` (bool): Enable Docker execution (default: False)
- ``enable_filesystem`` (bool): Enable filesystem/MCP tools (default: True)
- ``enable_logging`` (bool): Enable logging (default: False)
- ``output_file`` (str, optional): Write final answer to file
- ``context_paths`` (list, optional): Paths with permissions for file operations

**Returns:**

.. code-block:: python

   {
       'final_answer': str,        # The generated answer
       'config_used': str,         # Config path or description
       'session_id': str,          # Session ID
       'selected_agent': str,      # Winner (multi-agent)
       'vote_results': dict,       # Voting details
       'answers': list,            # All agent answers
   }
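
For type-checked code, the returned dictionary can be described with a ``TypedDict``. The sketch below is an assumption-laden convenience, not part of the MassGen API: ``RunResult`` and ``summarize`` are hypothetical names, and ``total=False`` is used because this page does not state which keys are always present (``selected_agent``, for instance, is described as multi-agent only).

.. code-block:: python

   from typing import Any, Dict, List, TypedDict


   class RunResult(TypedDict, total=False):
       final_answer: str
       config_used: str
       session_id: str
       selected_agent: str
       vote_results: Dict[str, Any]
       answers: List[Any]


   def summarize(result: RunResult) -> str:
       """Build a one-line summary that tolerates missing optional keys."""
       winner = result.get("selected_agent") or "n/a"
       count = len(result.get("answers") or [])
       config = result.get("config_used", "unknown")
       return f"winner={winner}, answers={count}, config={config}"


   example: RunResult = {
       "final_answer": "42",
       "config_used": "@examples/basic/multi/three_agents_default",
       "selected_agent": "agent1",
       "answers": ["a", "b"],
   }
   print(summarize(example))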

Usage Patterns
==============

Single Agent Mode
-----------------

For simple queries with a single agent:

.. code-block:: python

   import asyncio
   import massgen

   async def single_agent_query():
       result = await massgen.run(
           query="What are the benefits of renewable energy?",
           model="gpt-5-mini"
       )
       return result['final_answer']

   answer = asyncio.run(single_agent_query())
   print(answer)

**Supported Models:**

- OpenAI: ``gpt-5``, ``gpt-5-mini``, ``gpt-5-nano``, ``gpt-4o``, ``o1``
- Anthropic: ``claude-sonnet-4``, ``claude-opus-4``
- Google: ``gemini-2.5-flash``, ``gemini-2.5-pro``, ``gemini-2.0-flash``
- xAI: ``grok-4``, ``grok-4-fast-reasoning``

See :doc:`supported_models` for the complete list.

Multi-Agent with Configuration
-------------------------------

For complex queries requiring multiple agents:

.. code-block:: python

   import asyncio
   import massgen

   async def multi_agent_research():
       result = await massgen.run(
           query="Compare renewable energy sources with analysis",
           config="@examples/research_team"
       )
       return result

   result = asyncio.run(multi_agent_research())
   print(result['final_answer'])
   print(f"Config: {result['config_used']}")

**Built-in Example Configurations:**

Use the ``@examples/`` prefix to access built-in configurations:

- ``@examples/basic/single/single_gpt5nano`` - Single agent configuration
- ``@examples/basic/multi/three_agents_default`` - Three-agent basic setup
- ``@examples/research_team`` - Research-focused agents with web search
- ``@examples/coding_team`` - Code generation with multiple agents

List all available examples:

.. code-block:: bash

   massgen --list-examples

Default Configuration
---------------------

Use your default configuration (from the setup wizard):

.. code-block:: python

   import asyncio
   import massgen

   async def use_default_config():
       # No config or model specified - uses ~/.config/massgen/config.yaml
       result = await massgen.run(
           query="Analyze the impact of AI on healthcare"
       )
       return result['final_answer']

   answer = asyncio.run(use_default_config())
   print(answer)

Custom Configuration Files
---------------------------

Use your own YAML configuration files:

.. code-block:: python

   import asyncio
   import massgen

   async def custom_config():
       result = await massgen.run(
           query="Your question",
           config="./my-agents.yaml"  # Relative path
       )
       return result

   # Or absolute path
   async def custom_config_abs():
       result = await massgen.run(
           query="Your question",
           config="/path/to/my-agents.yaml"
       )
       return result

Named Configurations
--------------------

Use named configurations from ``~/.config/massgen/agents/``:

.. code-block:: python

   import asyncio
   import massgen

   async def named_config():
       # Looks for ~/.config/massgen/agents/research-team.yaml
       result = await massgen.run(
           query="Research question",
           config="research-team"  # No .yaml extension needed
       )
       return result

   result = asyncio.run(named_config())
   print(result['final_answer'])
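
The ``config`` argument therefore accepts three kinds of value: an ``@examples/`` reference, a file path, or a bare name. The sketch below shows one plausible way to tell them apart. ``describe_config`` is a hypothetical helper, and the order of the checks is an assumption rather than MassGen's documented resolution logic.

.. code-block:: python

   from pathlib import Path


   def describe_config(config):
       """Guess how a config argument would be interpreted (sketch only)."""
       if config.startswith("@examples/"):
           return ("built-in example", config[len("@examples/"):])
       if config.endswith((".yaml", ".yml")) or "/" in config:
           return ("file path", str(Path(config).expanduser()))
       # A bare name maps to a YAML file in the user's agents directory.
       named = Path.home() / ".config" / "massgen" / "agents" / f"{config}.yaml"
       return ("named config", str(named))


   for value in ["@examples/research_team", "./my-agents.yaml", "research-team"]:
       kind, detail = describe_config(value)
       print(f"{value}: {kind} ({detail})")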

Advanced Usage
==============

Async/Await Patterns
--------------------

Since MassGen is async-native, you can integrate it into async applications:

.. code-block:: python

   import asyncio
   import massgen

   async def process_multiple_queries():
       # Run multiple queries concurrently
       queries = [
           "What is AI?",
           "Explain machine learning",
           "Define neural networks"
       ]

       tasks = [
           massgen.run(query=q, model="gpt-5-mini")
           for q in queries
       ]

       results = await asyncio.gather(*tasks)

       for query, result in zip(queries, results):
           print(f"Q: {query}")
           print(f"A: {result['final_answer']}\n")

   asyncio.run(process_multiple_queries())
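
``asyncio.gather`` starts every query at once. For longer lists you may want to cap how many run at the same time. The sketch below does this with ``asyncio.Semaphore``; ``run_limited`` is a hypothetical helper, and ``fake_run`` stands in for ``massgen.run()`` so the example runs without API keys.

.. code-block:: python

   import asyncio


   async def run_limited(worker, queries, limit=2):
       """Run worker(query) for every query with at most `limit` in flight."""
       semaphore = asyncio.Semaphore(limit)

       async def guarded(query):
           async with semaphore:
               return await worker(query)

       # gather preserves input order; return_exceptions=True keeps one
       # failure from discarding the other results.
       return await asyncio.gather(
           *(guarded(query) for query in queries), return_exceptions=True
       )


   async def fake_run(query):
       # Stand-in for `await massgen.run(query=query, model=...)`.
       await asyncio.sleep(0.01)
       return {"final_answer": f"answer to {query!r}"}


   queries = ["What is AI?", "Explain machine learning", "Define neural networks"]
   results = asyncio.run(run_limited(fake_run, queries))
   for item in results:
       print(item["final_answer"])

With ``return_exceptions=True``, a failed query appears in ``results`` as an exception object, so real code should check each item's type before indexing into it.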

Integration with FastAPI
------------------------

Because ``massgen.run()`` is a coroutine, you can await it directly inside a FastAPI route handler:

.. code-block:: python

   from fastapi import FastAPI
   import massgen

   app = FastAPI()

   @app.post("/query")
   async def handle_query(question: str, model: str = "gpt-5-mini"):
       result = await massgen.run(
           query=question,
           model=model
       )
       return {
           "question": question,
           "answer": result['final_answer'],
           "config": result['config_used']
       }

   # Run with: uvicorn myapp:app

Integration with Jupyter Notebooks
-----------------------------------

Jupyter already runs an event loop, so you can ``await massgen.run()`` at the top level of a cell:

.. code-block:: python

   # In a Jupyter cell
   import massgen

   # Jupyter handles the event loop for you
   result = await massgen.run(
       query="Explain photosynthesis",
       model="gemini-2.5-flash"
   )

   print(result['final_answer'])

   # Or wrap the call in a coroutine function and await it
   async def research_query():
       return await massgen.run(
           query="Compare programming paradigms",
           config="@examples/research_team"
       )

   result = await research_query()
   print(result['final_answer'])

Error Handling
--------------

Wrap calls in ``try``/``except`` so that configuration problems are reported separately from unexpected failures:

.. code-block:: python

   import asyncio
   import massgen

   async def safe_query():
       try:
           result = await massgen.run(
               query="Your question",
               model="gpt-5-mini"
           )
           return result['final_answer']

       except ValueError as e:
           # E.g., config not found, no API key
           print(f"Configuration error: {e}")
           return None

       except Exception as e:
           print(f"Unexpected error: {e}")
           return None

   answer = asyncio.run(safe_query())

Common Errors
=============

No Configuration Found
----------------------

.. code-block:: text

   ValueError: No config specified and no default config found.
   Run `massgen --init` to create a default configuration.

**Solution:** Run the setup wizard to create a default config:

.. code-block:: bash

   massgen --init

Or specify a config explicitly:

.. code-block:: python

   result = await massgen.run(query="...", config="@examples/basic/multi/three_agents_default")

API Key Not Found
-----------------

If you see API key errors, ensure your keys are configured:

1. Set environment variables:

   .. code-block:: bash

      export OPENAI_API_KEY="sk-..."
      export ANTHROPIC_API_KEY="sk-ant-..."

2. Or create ``~/.config/massgen/.env``:

   .. code-block:: bash

      OPENAI_API_KEY=sk-...
      ANTHROPIC_API_KEY=sk-ant-...
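
A quick preflight check can report which keys are missing before any agent starts. This is a minimal sketch: ``missing_keys`` is a hypothetical helper, and it only inspects the mapping it is given (by default the process environment), so values in a ``.env`` file count only after they have been loaded.

.. code-block:: python

   import os


   def missing_keys(required, environ=None):
       """Return the names in `required` that are unset or empty."""
       environ = os.environ if environ is None else environ
       return [name for name in required if not environ.get(name)]


   # Simulated environment with only one of the two keys set.
   fake_environ = {"OPENAI_API_KEY": "sk-..."}
   missing = missing_keys(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"], fake_environ)
   print(missing)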

Config Not Found
----------------

.. code-block:: text

   ConfigurationError: Configuration file not found: my-config

**Solution:** Check the config path exists, or use ``@examples/`` for built-in configs.

Best Practices
==============

1. **Use Async/Await Properly**

   .. code-block:: python

      # Good
      result = await massgen.run(query="...")

      # Bad (returns a coroutine object instead of a result)
      result = massgen.run(query="...")  # Missing await

2. **Handle Errors**

   Always wrap API calls in try/except blocks for production code.

3. **Reuse Configurations**

   Create named configurations for common use cases:

   .. code-block:: python

      # Save to ~/.config/massgen/agents/research.yaml
      # Then reuse:
      result = await massgen.run(query="...", config="research")

4. **Use Single-Agent Mode for Simple Queries**

   For straightforward questions, single-agent mode is faster:

   .. code-block:: python

      result = await massgen.run(
          query="Quick question",
          model="gpt-5-mini"  # Fast and cheap
      )

5. **Use Multi-Agent Mode for Complex Analysis**

   For research, comparison, or analysis:

   .. code-block:: python

      result = await massgen.run(
          query="Compare X and Y",
          config="@examples/research_team"
      )

See Also
========

- :doc:`../quickstart/installation` - Installation and setup
- :doc:`../quickstart/configuration` - Configuration file format
- :doc:`cli` - Command-line interface reference
- :doc:`supported_models` - Supported models and backends
- :doc:`yaml_schema` - YAML configuration schema


---

## reference/status_file.rst

=======================
status.json Reference
=======================

The ``status.json`` file provides real-time monitoring of MassGen coordination. It is updated every 2 seconds during execution when using ``--automation`` mode.

.. contents:: Table of Contents
   :local:
   :depth: 2

File Location
=============

.. code-block:: text

   .massgen/massgen_logs/log_YYYYMMDD_HHMMSS_ffffff/status.json

The file appears in the log directory immediately after coordination begins and is continuously updated until completion.

Update Frequency
================

- **Updated every 2 seconds** during coordination
- **Final snapshot** written when coordination completes
- **Atomic writes** (temp file + rename) to prevent partial reads
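
Because writes are atomic, a reader should not see half-written JSON, but the file can still be absent just after launch. The following is a minimal Python sketch of a tolerant reader; ``read_status`` is a hypothetical helper written for this page, not part of MassGen.

.. code-block:: python

   import json
   from pathlib import Path
   from typing import Optional


   def read_status(path: str) -> Optional[dict]:
       """Return the parsed status.json, or None if it is not readable yet."""
       try:
           return json.loads(Path(path).read_text(encoding="utf-8"))
       except (FileNotFoundError, json.JSONDecodeError):
           # Missing file (run not started yet) or unexpected content:
           # the caller can simply try again on the next poll.
           return None

A caller would typically invoke this about every 2 seconds, matching the update interval, and skip any poll that returns ``None``.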

Complete Schema
===============

Root Structure
--------------

.. code-block:: json

   {
     "meta": { ... },
     "coordination": { ... },
     "agents": { ... },
     "results": { ... }
   }

meta Section
------------

Session metadata and timing information.

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Field
     - Type
     - Description
   * - ``last_updated``
     - float
     - Unix timestamp when this file was last updated
   * - ``session_id``
     - string
     - Log directory name (e.g., ``log_20251103_143022_123456``)
   * - ``log_dir``
     - string
     - Full path to log directory
   * - ``question``
     - string
     - The user's original question
   * - ``start_time``
     - float
     - Unix timestamp when coordination started
   * - ``elapsed_seconds``
     - float
     - Total time elapsed since coordination started

**Example:**

.. code-block:: json

   {
     "meta": {
       "last_updated": 1730678901.234,
       "session_id": "log_20251103_143022_123456",
       "log_dir": ".massgen/massgen_logs/log_20251103_143022_123456",
       "question": "Create a website about Bob Dylan",
       "start_time": 1730678800.000,
       "elapsed_seconds": 101.234
     }
   }

coordination Section
--------------------

Current state of the coordination process.

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Field
     - Type
     - Description
   * - ``phase``
     - string
     - Current coordination phase: ``initial_answer``, ``enforcement``, or ``presentation``
   * - ``active_agent``
     - string|null
     - Agent ID currently streaming/working, or ``null`` if none active
   * - ``completion_percentage``
     - integer
     - Estimated completion (0-100). Based on answers submitted and votes cast.
   * - ``is_final_presentation``
     - boolean
     - ``true`` during the final presentation phase, when the winning agent presents its answer

**Coordination Phases:**

1. **initial_answer**: Agents are providing their initial answers
2. **enforcement**: Agents are voting on the best answer
3. **presentation**: Winning agent is presenting the final answer

**Example:**

.. code-block:: json

   {
     "coordination": {
       "phase": "enforcement",
       "active_agent": "agent_b",
       "completion_percentage": 65,
       "is_final_presentation": false
     }
   }

agents Section
--------------

Per-agent detailed status. Each agent has its own entry keyed by agent ID.

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Field
     - Type
     - Description
   * - ``status``
     - string
     - Current agent status (see Status Values below)
   * - ``answer_count``
     - integer
     - Number of answers this agent has provided (usually 0 or 1)
   * - ``latest_answer_label``
     - string|null
     - Label of most recent answer (e.g., ``agent1.1``), or ``null`` if no answer yet
   * - ``vote_cast``
     - object|null
     - Vote information if agent has voted, or ``null`` if not voted yet
   * - ``times_restarted``
     - integer
     - Number of times this agent has been restarted due to new answers from others
   * - ``last_activity``
     - float
     - Unix timestamp of agent's most recent activity
   * - ``error``
     - object|null
     - Error information if agent encountered error, or ``null`` if no error

**Agent Status Values:**

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Status
     - Description
   * - ``waiting``
     - Agent has not started work yet
   * - ``streaming``
     - Agent is actively generating content (thinking, reasoning, using tools)
   * - ``answered``
     - Agent has provided an answer but hasn't voted yet
   * - ``voted``
     - Agent has cast its vote
   * - ``restarting``
     - Agent is restarting due to new answer from another agent
   * - ``error``
     - Agent encountered an error
   * - ``timeout``
     - Agent exceeded timeout limit
   * - ``completed``
     - Agent has finished all work

**vote_cast Structure:**

.. code-block:: json

   {
     "voted_for_agent": "agent_b",
     "voted_for_label": "agent2.1",
     "reason_preview": "First 100 characters of vote reason..."
   }

**error Structure:**

.. code-block:: json

   {
     "type": "timeout",
     "message": "Agent timeout after 180s",
     "timestamp": 1730678900.0
   }

**Example:**

.. code-block:: json

   {
     "agents": {
       "agent_a": {
         "status": "voted",
         "answer_count": 1,
         "latest_answer_label": "agent1.1",
         "vote_cast": {
           "voted_for_agent": "agent_b",
           "voted_for_label": "agent2.1",
           "reason_preview": "More comprehensive solution with better structure..."
         },
         "times_restarted": 1,
         "last_activity": 1730678850.123,
         "error": null
       },
       "agent_b": {
         "status": "streaming",
         "answer_count": 1,
         "latest_answer_label": "agent2.1",
         "vote_cast": null,
         "times_restarted": 0,
         "last_activity": 1730678900.456,
         "error": null
       }
     }
   }
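
For a quick overview of a run, the per-agent ``status`` values can be tallied. This is a sketch that assumes the ``agents`` layout shown above; ``count_by_status`` is a hypothetical name.

.. code-block:: python

   from collections import Counter


   def count_by_status(status: dict) -> dict:
       """Count agents per ``status`` value in a parsed status.json."""
       agents = status.get("agents", {})
       # Agents without a status field are grouped under "unknown".
       return dict(Counter(a.get("status", "unknown") for a in agents.values()))

For the example above, the result would contain one ``voted`` agent and one ``streaming`` agent.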

results Section
---------------

Aggregated coordination results.

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Field
     - Type
     - Description
   * - ``votes``
     - object
     - Vote count by answer label. Keys are answer labels (e.g., ``agent1.1``), values are vote counts.
   * - ``winner``
     - string|null
     - Agent ID of the winning agent, or ``null`` if not yet determined
   * - ``final_answer_preview``
     - string|null
     - First 200 characters of final answer, or ``null`` if not available

**Example:**

.. code-block:: json

   {
     "results": {
       "votes": {
         "agent1.1": 1,
         "agent2.1": 2
       },
       "winner": "agent_b",
       "final_answer_preview": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">..."
     }
   }
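
While ``winner`` is still ``null``, the ``votes`` object can be inspected to see which answer is currently ahead. The sketch below returns every label tied for the highest count. How MassGen itself resolves ties is not described here, so ``winner`` remains the authoritative field; ``leading_labels`` is a hypothetical helper.

.. code-block:: python

   def leading_labels(votes: dict) -> list:
       """Return the answer label(s) with the highest vote count, sorted."""
       if not votes:
           return []
       top = max(votes.values())
       # Keep every label that shares the top count so ties stay visible.
       return sorted(label for label, count in votes.items() if count == top)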

Complete Example
================

Full status.json during enforcement phase:

.. code-block:: json

   {
     "meta": {
       "last_updated": 1730678901.234,
       "session_id": "log_20251103_143022_123456",
       "log_dir": ".massgen/massgen_logs/log_20251103_143022_123456",
       "question": "Create a website about Bob Dylan",
       "start_time": 1730678800.000,
       "elapsed_seconds": 101.234
     },
     "coordination": {
       "phase": "enforcement",
       "active_agent": "agent_b",
       "completion_percentage": 65,
       "is_final_presentation": false
     },
     "agents": {
       "agent_a": {
         "status": "voted",
         "answer_count": 1,
         "latest_answer_label": "agent1.1",
         "vote_cast": {
           "voted_for_agent": "agent_b",
           "voted_for_label": "agent2.1",
           "reason_preview": "More comprehensive solution with better structure and styling..."
         },
         "times_restarted": 1,
         "last_activity": 1730678850.123,
         "error": null
       },
       "agent_b": {
         "status": "streaming",
         "answer_count": 1,
         "latest_answer_label": "agent2.1",
         "vote_cast": null,
         "times_restarted": 0,
         "last_activity": 1730678900.456,
         "error": null
       }
     },
     "results": {
       "votes": {
         "agent1.1": 0,
         "agent2.1": 1
       },
       "winner": null,
       "final_answer_preview": null
     }
   }

Usage Examples
==============

Monitoring Progress
-------------------

**Command line:**

.. code-block:: bash

   # Watch completion percentage
   watch -n 2 'cat .massgen/massgen_logs/log_*/status.json | jq ".coordination.completion_percentage"'

   # Watch which agent is active
   watch -n 2 'cat .massgen/massgen_logs/log_*/status.json | jq ".coordination.active_agent"'

   # Check for errors
   watch -n 2 'cat .massgen/massgen_logs/log_*/status.json | jq ".agents[].error"'

**Reading in scripts:**

.. code-block:: bash

   # Parse with jq
   STATUS_FILE=".massgen/massgen_logs/log_20251103_143022_123456/status.json"

   # Get completion percentage
   jq '.coordination.completion_percentage' "$STATUS_FILE"

   # Get winner (when done)
   jq '.results.winner' "$STATUS_FILE"

   # List agents that report an error
   jq '.agents | to_entries[] | select(.value.error != null)' "$STATUS_FILE"

Detecting Completion
--------------------

Use any of the following checks to detect that coordination has finished or has reached the final presentation phase:

.. code-block:: bash

   # Method 1: Check winner field
   WINNER=$(jq -r '.results.winner' "$STATUS_FILE")
   if [ "$WINNER" != "null" ]; then
       echo "Coordination complete! Winner: $WINNER"
   fi

   # Method 2: Check completion percentage
   COMPLETION=$(jq '.coordination.completion_percentage' "$STATUS_FILE")
   if [ "$COMPLETION" -eq 100 ]; then
       echo "Coordination complete!"
   fi

   # Method 3: Check if final presentation
   IS_FINAL=$(jq '.coordination.is_final_presentation' "$STATUS_FILE")
   if [ "$IS_FINAL" = "true" ]; then
       echo "In final presentation phase"
   fi
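
The same checks can be made from Python on a parsed status file. This sketch combines the winner and percentage checks; ``is_complete`` is a hypothetical helper, and treating either signal as sufficient is an assumption.

.. code-block:: python

   def is_complete(status: dict) -> bool:
       """True once a winner is recorded or completion reaches 100 percent."""
       winner = status.get("results", {}).get("winner")
       percent = status.get("coordination", {}).get("completion_percentage", 0)
       return winner is not None or percent == 100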

Detecting Errors
----------------

.. code-block:: bash

   # Check for any agent errors
   ERRORS=$(jq '[.agents[] | select(.error != null)] | length' "$STATUS_FILE")
   if [ "$ERRORS" -gt 0 ]; then
       echo "Found $ERRORS agent(s) with errors:"
       jq '.agents | to_entries[] | select(.value.error != null) | {agent: .key, error: .value.error}' "$STATUS_FILE"
   fi
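
A Python equivalent of the ``jq`` filter above collects the ``error`` object of every agent that reports one. ``agent_errors`` is a hypothetical helper name.

.. code-block:: python

   def agent_errors(status: dict) -> dict:
       """Map agent ID to its error object for every agent reporting an error."""
       return {
           agent_id: info["error"]
           for agent_id, info in status.get("agents", {}).items()
           if info.get("error") is not None
       }

An empty result means no agent has reported an error so far.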

Reading Final Answer
--------------------

Once ``winner`` is not null:

.. code-block:: bash

   # Get winner agent ID
   WINNER=$(jq -r '.results.winner' "$STATUS_FILE")

   # Read final answer
   LOG_DIR=$(jq -r '.meta.log_dir' "$STATUS_FILE")
   cat "$LOG_DIR/final/$WINNER/answer.txt"
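
In Python, the shell steps above reduce to building one path from ``meta.log_dir`` and ``results.winner``. The ``final/<winner>/answer.txt`` layout is taken from the command above; ``final_answer_path`` is a hypothetical helper.

.. code-block:: python

   from pathlib import Path
   from typing import Optional


   def final_answer_path(status: dict) -> Optional[Path]:
       """Return the path to the winner's answer.txt, or None if no winner yet."""
       winner = status.get("results", {}).get("winner")
       if winner is None:
           return None
       return Path(status["meta"]["log_dir"]) / "final" / winner / "answer.txt"

The returned path can then be read with ``Path.read_text()`` once the file exists.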

Typical Progression
===================

status.json evolves through these states during coordination:

**1. Initial State** (Just started):

.. code-block:: json

   {
     "coordination": {
       "phase": "initial_answer",
       "active_agent": "agent_a",
       "completion_percentage": 0,
       "is_final_presentation": false
     },
     "agents": {
       "agent_a": {"status": "streaming", "answer_count": 0},
       "agent_b": {"status": "waiting", "answer_count": 0}
     },
     "results": {"votes": {}, "winner": null}
   }

**2. First Answer Provided**:

.. code-block:: json

   {
     "coordination": {
       "phase": "initial_answer",
       "active_agent": "agent_b",
       "completion_percentage": 25
     },
     "agents": {
       "agent_a": {"status": "answered", "answer_count": 1},
       "agent_b": {"status": "streaming", "answer_count": 0}
     },
     "results": {"votes": {}, "winner": null}
   }

**3. Voting Phase**:

.. code-block:: json

   {
     "coordination": {
       "phase": "enforcement",
       "active_agent": "agent_a",
       "completion_percentage": 50
     },
     "agents": {
       "agent_a": {"status": "streaming", "answer_count": 1},
       "agent_b": {"status": "voted", "answer_count": 1, "vote_cast": {...}}
     },
     "results": {
       "votes": {"agent2.1": 1},
       "winner": null
     }
   }

**4. Completed**:

.. code-block:: json

   {
     "coordination": {
       "phase": "presentation",
       "active_agent": null,
       "completion_percentage": 100,
       "is_final_presentation": true
     },
     "agents": {
       "agent_a": {"status": "voted", "answer_count": 1},
       "agent_b": {"status": "voted", "answer_count": 1}
     },
     "results": {
       "votes": {"agent1.1": 1, "agent2.1": 1},
       "winner": "agent_a",
       "final_answer_preview": "<!DOCTYPE html>..."
     }
   }

See Also
========

- :doc:`../user_guide/integration/automation` - Complete automation guide
- :doc:`cli` - CLI reference including ``--automation`` flag
- ``AI_USAGE.md`` - Quick reference for LLM agents


---

## reference/supported_models.rst

Supported Models & Backends
============================

MassGen supports a wide range of LLM providers and models. This page provides comprehensive information about backend types, model support, and setup requirements.

Quick Reference: Backend Setup
--------------------------------

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Backend Type
     - Setup Requirements
   * - **Claude API**
     - ``ANTHROPIC_API_KEY``
   * - **Claude Code**
     - Native tools: Read, Write, Edit, Bash, Grep, Glob, TodoWrite. If logged in via Anthropic account, ``ANTHROPIC_API_KEY`` is NOT needed (comment it out in ``.env`` or it will default to using the API key)
   * - **Gemini API**
     - ``GEMINI_API_KEY``
   * - **OpenAI API**
     - ``OPENAI_API_KEY``
   * - **Grok API**
     - ``XAI_API_KEY``
   * - **Azure OpenAI**
     - Azure deployment config: ``AZURE_OPENAI_API_KEY``, ``AZURE_OPENAI_ENDPOINT``, ``AZURE_OPENAI_API_VERSION``
   * - **Z AI**
     - ``ZAI_API_KEY``
   * - **ChatCompletion**
     - ``base_url`` + provider-specific API key (e.g., ``CEREBRAS_API_KEY``, ``TOGETHER_API_KEY``)
   * - **LM Studio**
     - Local LM Studio server running
   * - **vLLM/SGLang**
     - Local inference server on port 8000 (vLLM) or 30000 (SGLang)
   * - **AG2 Framework**
     - AG2 installation + LLM API keys for chosen provider

**For detailed backend capabilities (web search, code execution, MCP support), see:** :doc:`../user_guide/backends`

API-Based Models
----------------

Azure OpenAI
~~~~~~~~~~~~

.. list-table::
   :widths: 40 60

   * - **Models**
     - GPT-4, GPT-4o, GPT-3.5-turbo, GPT-4.1, GPT-5-chat
   * - **Backend Type**
     - ``azure_openai``
   * - **Tools Support**
     - Code interpreter, Azure deployment management
   * - **MCP Support**
     - ❌ Not yet supported

Claude (Anthropic)
~~~~~~~~~~~~~~~~~~

.. list-table::
   :widths: 40 60

   * - **Models**
     - Haiku 3.5, Sonnet 4, Opus 4 series
   * - **Backend Type**
     - ``claude``
   * - **Tools Support**
     - ✅ Web search, code execution, file operations
   * - **MCP Support**
     - ✅ Full integration

Claude Code
~~~~~~~~~~~

.. list-table::
   :widths: 40 60

   * - **Models**
     - Native Claude Code SDK
   * - **Backend Type**
     - ``claude_code``
   * - **Tools Support**
     - ✅ **Native dev tools**: Read, Write, Edit, Bash, Grep, Glob, TodoWrite
   * - **MCP Support**
     - ✅ Full integration

Gemini (Google)
~~~~~~~~~~~~~~~

.. list-table::
   :widths: 40 60

   * - **Models**
     - Gemini 2.5 Flash, Gemini 2.5 Pro series
   * - **Backend Type**
     - ``gemini``
   * - **Tools Support**
     - ✅ Web search, code execution, file operations
   * - **MCP Support**
     - ✅ Full integration with planning mode

Grok (xAI)
~~~~~~~~~~

.. list-table::
   :widths: 40 60

   * - **Models**
     - Grok-4, Grok-3, Grok-3-mini series
   * - **Backend Type**
     - ``grok``
   * - **Tools Support**
     - ✅ Web search, file operations
   * - **MCP Support**
     - ✅ Full integration

OpenAI
~~~~~~

.. list-table::
   :widths: 40 60

   * - **Models**
     - GPT-5.2, GPT-5, GPT-5-mini, GPT-5-nano, GPT-4 series
   * - **Backend Type**
     - ``openai``
   * - **Tools Support**
     - ✅ Web search, code interpreter, file operations
   * - **MCP Support**
     - ✅ Full integration

Z AI
~~~~

.. list-table::
   :widths: 40 60

   * - **Models**
     - GLM-4.5
   * - **Backend Type**
     - ``zai``
   * - **Tools Support**
     - File operations
   * - **MCP Support**
     - ✅ Integration available

ChatCompletion (Generic OpenAI-Compatible)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``chatcompletion`` backend provides a generic way to connect to any OpenAI-compatible API endpoint. This is the most flexible backend type and works with many providers.

.. list-table::
   :widths: 40 60

   * - **Backend Type**
     - ``chatcompletion``
   * - **Compatible Providers**
     - Cerebras AI, Together AI, Fireworks AI, Groq, OpenRouter, POE, and any OpenAI-compatible API
   * - **Required Config**
     - ``base_url`` pointing to the provider's API endpoint
   * - **API Key**
     - Provider-specific (e.g., ``CEREBRAS_API_KEY``, ``TOGETHER_API_KEY``)
   * - **MCP Support**
     - ✅ Full integration
   * - **Tools Support**
     - Depends on provider's function calling support

**Configuration Example:**

.. code-block:: yaml

   backend:
     type: "chatcompletion"
     model: "gpt-oss-120b"              # Model name
     base_url: "https://api.cerebras.ai/v1"  # Provider endpoint
     api_key: "${CEREBRAS_API_KEY}"    # Provider API key
     temperature: 0.7
     max_tokens: 2000
     mcp_servers:                       # Optional MCP tools
       - name: "weather"
         type: "stdio"
         command: "npx"
         args: ["-y", "@modelcontextprotocol/server-weather"]

**Supported Providers:**

.. list-table::
   :header-rows: 1
   :widths: 25 35 40

   * - Provider
     - Base URL
     - Environment Variable
   * - **Cerebras AI**
     - ``https://api.cerebras.ai/v1``
     - ``CEREBRAS_API_KEY``
   * - **Together AI**
     - ``https://api.together.xyz/v1``
     - ``TOGETHER_API_KEY``
   * - **Fireworks AI**
     - ``https://api.fireworks.ai/inference/v1``
     - ``FIREWORKS_API_KEY``
   * - **Groq**
     - ``https://api.groq.com/openai/v1``
     - ``GROQ_API_KEY``
   * - **OpenRouter**
     - ``https://openrouter.ai/api/v1``
     - ``OPENROUTER_API_KEY``
   * - **Kimi/Moonshot**
     - ``https://api.moonshot.cn/v1``
     - ``MOONSHOT_API_KEY``
   * - **Nebius AI Studio**
     - Provider-specific
     - ``NEBIUS_API_KEY``
   * - **POE**
     - Platform-specific
     - Platform credentials
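
When configs are generated programmatically, the table above can be kept as a lookup. The sketch below builds a ``chatcompletion`` backend mapping and uses the same ``${VAR}`` placeholder as the YAML example. The short provider keys and the ``backend_config`` function are hypothetical, and only providers with a documented base URL are included.

.. code-block:: python

   import os

   # Base URLs and key variables copied from the provider table.
   PROVIDERS = {
       "cerebras": ("https://api.cerebras.ai/v1", "CEREBRAS_API_KEY"),
       "together": ("https://api.together.xyz/v1", "TOGETHER_API_KEY"),
       "fireworks": ("https://api.fireworks.ai/inference/v1", "FIREWORKS_API_KEY"),
       "groq": ("https://api.groq.com/openai/v1", "GROQ_API_KEY"),
       "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
       "moonshot": ("https://api.moonshot.cn/v1", "MOONSHOT_API_KEY"),
   }


   def backend_config(provider: str, model: str) -> dict:
       """Build a ``chatcompletion`` backend mapping for a known provider."""
       base_url, key_var = PROVIDERS[provider]
       if key_var not in os.environ:
           # Fail early instead of sending an unauthenticated request later.
           raise RuntimeError(f"Set {key_var} before using the {provider} backend")
       return {
           "type": "chatcompletion",
           "model": model,
           "base_url": base_url,
           "api_key": "${" + key_var + "}",
       }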

**Common Models:**

* **Cerebras**: ``gpt-oss-120b``, ``gpt-oss-70b``
* **Together AI**: ``meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo``, ``mistralai/Mixtral-8x7B-Instruct-v0.1``
* **Fireworks AI**: ``accounts/fireworks/models/llama-v3p1-405b-instruct``
* **Groq**: ``llama-3.1-70b-versatile``, ``mixtral-8x7b-32768``

Tool Enablement Reference
--------------------------

This section shows exactly which configuration parameters work with which backends.

Backend-Level Tool Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20 20

   * - Backend
     - ``enable_web_search``
     - ``enable_code_execution``
     - ``enable_code_interpreter``
     - Notes
   * - **claude**
     - ✅
     - ✅
     - ❌
     - Built-in tools via Anthropic API
   * - **claude_code**
     - N/A
     - N/A
     - N/A
     - Native tools always available: Read, Write, Edit, Bash, Grep, Glob, TodoWrite. Control via ``allowed_tools`` or ``disallowed_tools``
   * - **gemini**
     - ✅
     - ✅
     - ❌
     - Google Search and code execution tools
   * - **openai**
     - ✅
     - ❌
     - ✅
     - Web search via Responses API, code interpreter for calculations
   * - **grok**
     - ✅
     - ❌
     - ❌
     - Built-in Live Search feature
   * - **azure_openai**
     - ❌
     - ❌
     - ❌
     - Limited tool support
   * - **zai**
     - ❌
     - ❌
     - ❌
     - Basic file operations only
   * - **chatcompletion**
     - Varies
     - Varies
     - Varies
     - Depends on provider (Cerebras, Together AI, etc.)
   * - **lmstudio**
     - ❌
     - ❌
     - ❌
     - Local models, tool support varies
   * - **vllm**
     - ❌
     - ❌
     - ❌
     - Local inference server
   * - **sglang**
     - ❌
     - ❌
     - ❌
     - Local inference server
   * - **ag2**
     - N/A
     - N/A
     - N/A
     - Uses AG2 code execution config

MCP Backend Parameters
~~~~~~~~~~~~~~~~~~~~~~

These parameters are available for all backends with MCP support (Claude, Gemini, OpenAI, Grok, ChatCompletion, etc.).

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Parameter
     - Type
     - Description & Usage
   * - ``cwd``
     - string
     - Working directory for MCP filesystem operations. Relative or absolute path. Available for all MCP-enabled backends.
   * - ``allowed_tools``
     - list
     - Whitelist specific tools. Only listed tools will be available. Example: ``["read_file", "write_file", "list_directory"]``
   * - ``disallowed_tools``
     - list
     - Blacklist specific tools. All tools available except those listed. Example: ``["write_file", "create_directory", "move_file"]``
   * - ``exclude_tools``
     - list
     - Exclude specific MCP tools from being available to the agent. Similar to ``disallowed_tools`` for MCP servers.
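
The whitelist and blacklist semantics described above can be illustrated with a small filter. This is only a sketch of the idea, not MassGen's implementation: it matches exact tool names, and applying the whitelist before the blacklist when both are given is an assumption.

.. code-block:: python

   from typing import Iterable, List, Optional


   def filter_tools(
       available: Iterable[str],
       allowed_tools: Optional[List[str]] = None,
       disallowed_tools: Optional[List[str]] = None,
   ) -> List[str]:
       """Apply the whitelist first, then the blacklist, preserving order."""
       tools = list(available)
       if allowed_tools is not None:
           # Whitelist: only listed tools survive.
           tools = [t for t in tools if t in allowed_tools]
       if disallowed_tools:
           # Blacklist: everything survives except listed tools.
           tools = [t for t in tools if t not in disallowed_tools]
       return tools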

Claude Code Additional Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These parameters are specific to the Claude Code backend only.

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Parameter
     - Type
     - Description & Usage
   * - ``max_thinking_tokens``
     - integer
     - Maximum tokens for internal reasoning. Default: 8000. Increase for complex tasks.
   * - ``system_prompt``
     - string
     - Custom system prompt for the agent. Prepended to default instructions.
   * - ``permission_mode``
     - string
     - ``"bypassPermissions"`` to skip confirmation prompts (use in automation)
   * - ``disallowed_tools``
     - list
     - For Claude Code native tools (Read, Write, Edit, Bash, etc.). Default: ``["Bash(rm*)", "Bash(sudo*)", "Bash(su*)", "Bash(chmod*)", "Bash(chown*)"]``. Example to block web access: ``["Bash(rm*)", "WebSearch"]``

**Example MCP Configuration (any backend):**

.. code-block:: yaml

   backend:
     type: "gemini"  # or claude, openai, grok, etc.
     model: "gemini-2.5-flash"
     cwd: "my_project"  # File operations handled via cwd
     disallowed_tools: ["mcp__weather__set_location"]
     mcp_servers:
       - name: "weather"
         type: "stdio"
         command: "npx"
         args: ["-y", "@modelcontextprotocol/server-weather"]

**Example Claude Code Configuration:**

.. code-block:: yaml

   backend:
     type: "claude_code"
     model: "claude-sonnet-4-20250514"
     cwd: "my_project"
     disallowed_tools: ["Bash(rm*)", "Bash(sudo*)", "WebSearch"]
     max_thinking_tokens: 10000
     system_prompt: "You are an expert Python developer"

Local Models
------------

LM Studio
~~~~~~~~~

.. list-table::
   :widths: 40 60

   * - **Models**
     - LLaMA, Mistral, Qwen, and other open-weight models
   * - **Backend Type**
     - ``lmstudio``
   * - **Features**
     - Automatic CLI installation, auto-download, zero-cost usage
   * - **MCP Support**
     - Limited

vLLM & SGLang
~~~~~~~~~~~~~

Unified inference backend supporting both vLLM and SGLang servers.

.. list-table::
   :widths: 40 60

   * - **Port Detection**
     - Auto-detection: vLLM (8000), SGLang (30000)
   * - **Parameters**
     - Supports both vLLM and SGLang-specific params (top_k, repetition_penalty, separate_reasoning)
   * - **Mixed Deployment**
     - Can run both vLLM and SGLang servers simultaneously

External Frameworks
-------------------

AG2
~~~

.. list-table::
   :widths: 40 60

   * - **Agent Types**
     - ConversableAgent, AssistantAgent
   * - **Backend Type**
     - ``ag2``
   * - **Features**
     - Code execution (Local, Docker, Jupyter, Cloud)
   * - **LLM Support**
     - OpenAI, Azure, Anthropic, Google via AG2 config

See Also
--------

* :doc:`../user_guide/backends` - Detailed backend configuration
* :doc:`../user_guide/tools/mcp_integration` - MCP tool setup
* :doc:`../user_guide/integration/general_interoperability` - Framework interoperability (including AG2)
* :doc:`yaml_schema` - YAML configuration reference


---

## reference/timeouts.rst

Timeout Configuration
=====================

MassGen provides timeout configuration to control how long coordination and agent operations can run before being terminated. This prevents runaway processes and ensures predictable execution times.

Quick Reference
---------------

**Default Timeouts**:

* **Orchestrator**: 1800 seconds (30 minutes)
* **Per-Round**: Disabled by default in YAML configs; enabled in ``--quickstart`` (10 min initial, 5 min subsequent)
* **Grace Period**: 120 seconds (time after soft timeout before hard block)

**CLI Override**:

.. code-block:: bash

   uv run python -m massgen.cli \
     --orchestrator-timeout 600 \
     --config config.yaml \
     "Your question"

**Config File**:

.. code-block:: yaml

   timeout_settings:
     orchestrator_timeout_seconds: 1800
     initial_round_timeout_seconds: 600      # 10 min for first answer
     subsequent_round_timeout_seconds: 180   # 3 min for voting rounds
     round_timeout_grace_seconds: 120        # Grace period before hard block

Timeout Types
-------------

MassGen has two levels of timeout control:

1. **Orchestrator Timeout**: Overall session limit (ends the entire coordination run)
2. **Per-Round Timeout**: Individual round limits (prompts agents to submit)

Orchestrator Timeout
~~~~~~~~~~~~~~~~~~~~

Controls the maximum time for multi-agent coordination:

* **Covers**: Entire coordination process (all rounds of voting and consensus)
* **Default**: 1800 seconds (30 minutes)
* **When it triggers**: Coordination exceeds the time limit
* **What happens**: Coordination terminates gracefully, current state is saved

.. code-block:: yaml

   timeout_settings:
     orchestrator_timeout_seconds: 600  # 10 minutes

Per-Round Timeout
~~~~~~~~~~~~~~~~~

Controls the maximum time for individual agent rounds. This prevents agents from getting stuck in analysis loops (e.g., repeatedly analyzing the same image with inconsistent results).

* **Covers**: Single round of agent work (initial answer or voting)
* **Default**: Disabled unless set in the YAML config; ``--quickstart`` enables it with 600s initial, 300s subsequent, and 120s grace
* **When it triggers**: Agent exceeds time limit for current round
* **What happens**: Two-phase timeout (soft warning, then hard block)

**Configuration Options**:

.. code-block:: yaml

   timeout_settings:
     initial_round_timeout_seconds: 600    # Soft timeout for round 0 (initial answer)
     subsequent_round_timeout_seconds: 180 # Soft timeout for rounds 1+ (voting)
     round_timeout_grace_seconds: 120      # Grace period before hard block

**Two-Phase Timeout Behavior**:

1. **Soft Timeout**: When reached, a friendly warning message is injected telling the agent to wrap up and submit. The agent can still finish final touches to make their work presentable.

2. **Hard Timeout**: After the grace period expires (soft timeout + ``round_timeout_grace_seconds``), non-terminal tool calls are blocked. Only ``vote`` and ``new_answer`` tools are allowed.

**Timeline Example** (initial round with 600s timeout + 120s grace):

.. code-block:: text

   0-600s:   Agent works normally
   600s:     Soft timeout - friendly warning message injected
   600-720s: Grace period - agent can finish final touches
   720s+:    Hard timeout - non-terminal tools blocked, only vote/new_answer allowed
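
The timeline above amounts to a three-way classification of elapsed round time. A minimal sketch follows; ``round_phase`` and its return labels are hypothetical names used only for illustration.

.. code-block:: python

   def round_phase(elapsed: float, soft_timeout: float, grace: float) -> str:
       """Classify elapsed round time as 'normal', 'grace', or 'hard'."""
       if elapsed < soft_timeout:
           return "normal"  # agent works without restrictions
       if elapsed < soft_timeout + grace:
           return "grace"   # warning injected, agent may finish up
       return "hard"        # only vote/new_answer remain allowed

With the values from the timeline (600s soft timeout, 120s grace), 599 seconds is ``normal``, 600 seconds starts the grace period, and 720 seconds is ``hard``.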

**Soft Timeout Message** (from ``RoundTimeoutPostHook``):

.. code-block:: text

   ============================================================
   ⏰ ROUND TIME LIMIT APPROACHING - PLEASE WRAP UP
   ============================================================

   You have exceeded the soft time limit for this initial answer round (605s / 600s).

   Please wrap up your current work and submit soon:
   1. `new_answer` - Submit your current best answer (can be a work-in-progress)
   2. `vote` - Vote for an existing answer if one is satisfactory

   You may finish any final touches to make your work presentable, but please
   submit within the next 120 seconds. After that, tool calls
   will be blocked and you'll need to submit immediately.

   The next coordination round will allow further iteration if needed.
   ============================================================

**Why Use Per-Round Timeouts**:

* **Prevent stuck agents**: Agents can get caught in loops (e.g., repeatedly calling vision tools on the same image)
* **Predictable costs**: Cap spending on individual rounds
* **Fairer coordination**: Ensure all agents get timely turns
* **Different phases, different needs**: Initial answers need more time than voting rounds

**Smart Injection Skipping**:

When a new answer arrives from another agent, MassGen normally injects it mid-stream so the current agent can consider it. However, if the agent is close to their soft timeout, injection is skipped and the agent restarts instead. This ensures agents have enough time to properly consider new answers rather than being forced to submit immediately after seeing them.

The threshold is ``round_timeout_grace_seconds``: if the time remaining before the soft timeout is less than the grace period, injection is skipped.

.. code-block:: text

   [Orchestrator] Skipping mid-stream injection for agent_a - only 45s until soft timeout (need 120s to think)
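
The skip rule is a single comparison between the time left before the soft timeout and the grace period. A sketch under that reading; ``should_skip_injection`` is a hypothetical name, not the orchestrator's actual function.

.. code-block:: python

   def should_skip_injection(elapsed: float, soft_timeout: float, grace: float) -> bool:
       """True when too little time remains before the soft timeout."""
       remaining = soft_timeout - elapsed
       return remaining < grace

In the log line above, 45 seconds remain and the grace period is 120 seconds, so injection is skipped.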

Subagent Round Timeouts
~~~~~~~~~~~~~~~~~~~~~~~

Subagents can use per-round timeouts too. Configure them under ``orchestrator.coordination.subagent_round_timeouts``.
If omitted, subagents inherit the parent ``timeout_settings`` values.

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_subagents: true
       subagent_round_timeouts:
         initial_round_timeout_seconds: 300
         subsequent_round_timeout_seconds: 120
         round_timeout_grace_seconds: 60
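
The inheritance rule can be pictured as a fallback lookup. The sketch below falls back key by key; whether MassGen inherits per key or only when the whole ``subagent_round_timeouts`` block is omitted is not stated here, so the per-key behavior is an assumption, and ``resolve_subagent_timeouts`` is a hypothetical helper.

.. code-block:: python

   from typing import Optional

   ROUND_KEYS = (
       "initial_round_timeout_seconds",
       "subsequent_round_timeout_seconds",
       "round_timeout_grace_seconds",
   )


   def resolve_subagent_timeouts(parent: dict, overrides: Optional[dict] = None) -> dict:
       """Use subagent overrides where present, otherwise the parent's values."""
       overrides = overrides or {}
       return {key: overrides.get(key, parent.get(key)) for key in ROUND_KEYS}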

Configuration Methods
---------------------

Method 1: CLI Flag (Highest Priority)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Override timeout for a single run:

.. code-block:: bash

   # Short timeout for simple task
   uv run python -m massgen.cli \
     --orchestrator-timeout 300 \
     --config config.yaml \
     "What are LLM agents?"

   # Longer timeout for complex research
   uv run python -m massgen.cli \
     --orchestrator-timeout 3600 \
     --config config.yaml \
     "Conduct comprehensive market analysis with 5 agents"

Method 2: Configuration File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Set timeout in your YAML configuration:

.. code-block:: yaml

   # Basic configuration with custom timeout
   agents:
     - id: "agent1"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"

   timeout_settings:
     orchestrator_timeout_seconds: 900  # 15 minutes

   ui:
     display_type: "rich_terminal"

Method 3: Default (No Configuration)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If not specified, MassGen uses the default 30-minute timeout:

.. code-block:: yaml

   # This configuration will use default 1800s timeout
   agents:
     - id: "agent1"
       backend:
         type: "openai"
         model: "gpt-4o"
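
The precedence among the three methods (CLI flag, then YAML, then the built-in default) can be sketched as follows. This is an illustration only; ``resolve_orchestrator_timeout`` is a hypothetical helper and ``config`` stands for the parsed YAML file.

.. code-block:: python

   from typing import Optional

   DEFAULT_ORCHESTRATOR_TIMEOUT_SECONDS = 1800  # 30-minute default from this guide


   def resolve_orchestrator_timeout(cli_value: Optional[int], config: dict) -> int:
       """Pick the effective timeout: CLI flag first, then YAML, then the default."""
       if cli_value is not None:
           return cli_value
       yaml_value = (config.get("timeout_settings") or {}).get("orchestrator_timeout_seconds")
       if yaml_value is not None:
           return yaml_value
       return DEFAULT_ORCHESTRATOR_TIMEOUT_SECONDS


   yaml_config = {"timeout_settings": {"orchestrator_timeout_seconds": 900}}
   print(resolve_orchestrator_timeout(300, yaml_config))   # 300
   print(resolve_orchestrator_timeout(None, yaml_config))  # 900
   print(resolve_orchestrator_timeout(None, {}))           # 1800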

Timeout Behavior
----------------

What Happens When Timeout Occurs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the orchestrator timeout is reached:

1. **Current coordination round completes** (not interrupted mid-operation)
2. **Partial results saved** (current state is preserved)
3. **Error message displayed** indicating timeout
4. **Graceful shutdown** (agents cleanup properly)

.. code-block:: text

   🔄 Round 5 of coordination...
   ⏰ Orchestrator timeout reached (1800 seconds)
   💾 Saving current state...
   ❌ Coordination incomplete - timeout exceeded

**Important**: The system attempts graceful termination. Individual agent operations may still complete if they're in progress.
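
One way to picture this behavior is a loop that checks the deadline only between rounds, so a round that has started always finishes and earlier results are kept. The sketch below is a simplified model, not the orchestrator's code; ``run_rounds`` and its ``clock`` parameter are hypothetical.

.. code-block:: python

   import time


   def run_rounds(rounds, timeout_seconds, clock=time.monotonic):
       """Run round callables in order, checking the deadline between rounds.

       Returns (results, completed). A round in progress is never interrupted;
       results gathered before the deadline are returned as partial output.
       """
       start = clock()
       results = []
       for run_round in rounds:
           if clock() - start >= timeout_seconds:
               return results, False  # timeout: keep partial results
           results.append(run_round())
       return results, True


   # Deterministic demo with a fake clock instead of real waiting.
   fake_times = iter([0, 1, 5, 11])
   results, completed = run_rounds(
       [lambda: "round 1", lambda: "round 2", lambda: "round 3"],
       timeout_seconds=10,
       clock=lambda: next(fake_times),
   )
   print(results, completed)  # ['round 1', 'round 2'] False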

Successful Completion Before Timeout
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If coordination completes normally:

.. code-block:: text

   ✅ Coordination complete!
   ⏱️  Total time: 245 seconds (well under 1800s limit)

Choosing the Right Timeout
---------------------------

Simple Tasks (5-10 minutes)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Recommended**: 300-600 seconds

.. code-block:: yaml

   timeout_settings:
     orchestrator_timeout_seconds: 600

**Examples**:

* Quick research questions
* Single-agent tasks
* Fast LLM models (GPT-4o-mini, Gemini Flash)
* Tasks with 2-3 agents

.. code-block:: bash

   uv run python -m massgen.cli \
     --orchestrator-timeout 600 \
     --model gemini-2.5-flash \
     "What are the key features of Python 3.12?"

Standard Tasks (15-30 minutes)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Recommended**: 900-1800 seconds (default)

.. code-block:: yaml

   timeout_settings:
     orchestrator_timeout_seconds: 1800  # Default

**Examples**:

* Multi-agent coordination (3-5 agents)
* Tasks with external API calls (MCP tools)
* Code generation with file operations
* Research with web search

.. code-block:: bash

   uv run python -m massgen.cli \
     --config multi_agent_config.yaml \
     "Analyze market trends and create a report"

Complex Tasks (30-60 minutes)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Recommended**: 1800-3600 seconds

.. code-block:: yaml

   timeout_settings:
     orchestrator_timeout_seconds: 3600  # 1 hour

**Examples**:

* Large-scale code refactoring
* Comprehensive research with many sources
* Tasks involving multiple API calls
* 5+ agents coordination
* Planning mode with extensive discussion

.. code-block:: bash

   uv run python -m massgen.cli \
     --orchestrator-timeout 3600 \
     --config five_agents_research.yaml \
     "Conduct a complete competitive analysis of the AI market"

Long-Running Tasks (60+ minutes)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Recommended**: 3600+ seconds

.. code-block:: yaml

   timeout_settings:
     orchestrator_timeout_seconds: 7200  # 2 hours

.. warning::

   Very long timeouts can lead to high API costs. Consider breaking the task into smaller runs or using checkpoints.

**Examples**:

* Full codebase analysis
* Large-scale data processing
* Multi-stage project generation
* Complex multi-turn conversations
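
The four tiers above can be captured in a small lookup table, for example to sanity-check a configured value. This is a sketch for illustration; the category names and the helper ``is_within_recommendation`` are hypothetical and not part of MassGen.

.. code-block:: python

   # (minimum, maximum) recommended seconds per tier; None means no upper bound.
   RECOMMENDED_TIMEOUTS = {
       "simple": (300, 600),
       "standard": (900, 1800),
       "complex": (1800, 3600),
       "long_running": (3600, None),
   }


   def is_within_recommendation(category: str, timeout_seconds: int) -> bool:
       """Check a timeout against the recommended range for its task tier."""
       low, high = RECOMMENDED_TIMEOUTS[category]
       return timeout_seconds >= low and (high is None or timeout_seconds <= high)


   print(is_within_recommendation("simple", 600))         # True
   print(is_within_recommendation("standard", 600))       # False
   print(is_within_recommendation("long_running", 7200))  # True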

Examples by Task Type
----------------------

Example 1: Quick Analysis
~~~~~~~~~~~~~~~~~~~~~~~~~

**Task**: Simple question, single agent

.. code-block:: bash

   uv run python -m massgen.cli \
     --orchestrator-timeout 300 \
     --backend openai \
     --model gpt-4o-mini \
     "Explain quantum entanglement in simple terms"

**Reasoning**: A single agent with a fast model should finish in 1-2 minutes, so a 5-minute timeout leaves a comfortable buffer.

Example 2: Multi-Agent Research
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Task**: Three agents researching and comparing approaches

.. code-block:: yaml

   agents:
     - id: "researcher1"
       backend: {type: "gemini", model: "gemini-2.5-flash"}
     - id: "researcher2"
       backend: {type: "openai", model: "gpt-4o"}
     - id: "researcher3"
       backend: {type: "claude", model: "claude-sonnet-4"}

   timeout_settings:
     orchestrator_timeout_seconds: 1200  # 20 minutes

**Reasoning**: Multiple rounds of coordination are expected and the agents may use web search, so 20 minutes allows for thorough research and discussion.

Example 3: Code Generation with Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Task**: Generate project structure with multiple files

.. code-block:: yaml

   agents:
     - id: "architect"
       backend: {type: "claude_code", cwd: "workspace"}
     - id: "reviewer"
       backend: {type: "gemini", model: "gemini-2.5-flash"}

   orchestrator:
     coordination:
       enable_planning_mode: true

   timeout_settings:
     orchestrator_timeout_seconds: 1800  # 30 minutes

**Reasoning**: Planning mode discussion plus file creation fits comfortably within the default 30 minutes.

Example 4: MCP Tool Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Task**: Use multiple MCP tools with planning mode

.. code-block:: yaml

   agents:
     - id: "agent1"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         mcp_servers:
           - {name: "weather", ...}
           - {name: "search", ...}

   orchestrator:
     coordination:
       enable_planning_mode: true

   timeout_settings:
     orchestrator_timeout_seconds: 2400  # 40 minutes

**Reasoning**: MCP tools may add API latency and planning mode adds coordination time, so 40 minutes provides a safety margin.

Troubleshooting
---------------

Timeouts Occurring Too Frequently
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**:

* Tasks consistently hitting timeout
* Coordination incomplete messages
* Partial results only

**Solutions**:

1. **Increase timeout**:

   .. code-block:: yaml

      timeout_settings:
        orchestrator_timeout_seconds: 3600  # Double the default

2. **Reduce agent count**: Fewer agents = faster coordination

3. **Simplify task**: Break complex tasks into smaller subtasks

4. **Use faster models**: Consider GPT-4o-mini or Gemini Flash instead of larger models

5. **Disable planning mode** if not needed:

   .. code-block:: yaml

      orchestrator:
        coordination:
          enable_planning_mode: false

6. **Check for stuck agents**: Review debug logs for agents not responding

7. **Enable per-round timeouts**: Force agents to submit after a time limit:

   .. code-block:: yaml

      timeout_settings:
        initial_round_timeout_seconds: 600
        subsequent_round_timeout_seconds: 180

Tasks Completing Too Quickly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**:

* Coordination ends in seconds
* Agents immediately voting without discussion
* Short timeout may be unnecessarily limiting deeper analysis

**Solutions**:

* This is generally not a problem - fast completion is good!
* If you want more thorough discussion, adjust system messages to encourage analysis

Per-Round Timeout Issues
~~~~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**:

* Soft timeout message appears but agent keeps working
* Hard timeout blocks tools unexpectedly
* Agent submits incomplete work

**Solutions**:

1. **Increase grace period** if agents need more time to finish:

   .. code-block:: yaml

      timeout_settings:
        round_timeout_grace_seconds: 180  # 3 minutes instead of 2

2. **Increase initial timeout** for complex tasks:

   .. code-block:: yaml

      timeout_settings:
        initial_round_timeout_seconds: 900  # 15 minutes

3. **Check log messages** for timeout events:

   .. code-block:: text

      [RoundTimeoutPostHook] Soft timeout reached for agent_b after 605s
      [RoundTimeoutPreHook] Blocking mcp__filesystem__write_file for agent_b - hard timeout exceeded

4. **Disable per-round timeouts** by omitting the settings (they're disabled by default)
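
The soft and hard timeout stages can be modeled as three phases of a round. The sketch below assumes that the hard timeout falls one grace period after the soft timeout; that relationship, and the function name ``round_phase``, are assumptions made for illustration rather than a description of MassGen's hooks.

.. code-block:: python

   def round_phase(elapsed_seconds: float, round_timeout_seconds: float,
                   grace_seconds: float) -> str:
       """Classify a round as running, past the soft timeout, or past the hard timeout.

       Assumption: hard timeout = soft timeout + grace period.
       """
       if elapsed_seconds < round_timeout_seconds:
           return "running"
       if elapsed_seconds < round_timeout_seconds + grace_seconds:
           return "soft_timeout"  # agent is asked to wrap up
       return "hard_timeout"      # tool calls would be blocked


   # 605s into a 600s round with a 120s grace period, as in the log above.
   print(round_phase(605, 600, 120))  # soft_timeout
   print(round_phase(725, 600, 120))  # hard_timeout

Under this model, raising ``round_timeout_grace_seconds`` widens the window between the soft-timeout message and the point where tools are blocked.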

Timeout But No Error Message
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem**: Timeout occurs but no clear indication in output.

**Solution**: Enable debug logging:

.. code-block:: bash

   uv run python -m massgen.cli \
     --debug \
     --orchestrator-timeout 600 \
     --config config.yaml \
     "Your question"

Then check the logs in ``agent_outputs/log_{timestamp}/massgen_debug.log``.

Best Practices
--------------

1. **Start with defaults**: Use the 30-minute default unless you have specific needs

2. **Adjust based on task complexity**:

   * Simple: 300-600s
   * Standard: 900-1800s
   * Complex: 1800-3600s
   * Very complex: 3600+s

3. **Consider cost implications**: Longer timeouts = potentially higher API costs

4. **Use CLI overrides for testing**: Test with shorter timeouts first

   .. code-block:: bash

      # Test with 5-minute timeout
      uv run python -m massgen.cli --orchestrator-timeout 300 --config test.yaml "test"

      # Then use full timeout for production
      uv run python -m massgen.cli --config prod.yaml "real task"

5. **Monitor actual completion times**: Check logs to see typical durations for your tasks

6. **Set appropriate timeouts per environment**:

   .. code-block:: yaml

      # Development config
      timeout_settings:
        orchestrator_timeout_seconds: 600  # Fast feedback

   .. code-block:: yaml

      # Production config
      timeout_settings:
        orchestrator_timeout_seconds: 3600  # Allow full completion

7. **Document timeout choices**: Add comments explaining timeout rationale

   .. code-block:: yaml

      timeout_settings:
        # 40 minutes: allows for 5 agents, planning mode, and MCP tool latency
        orchestrator_timeout_seconds: 2400

API Cost Considerations
-----------------------

Longer timeouts can lead to higher costs:

**Estimated API Costs by Timeout**:

.. list-table::
   :header-rows: 1
   :widths: 20 20 30 30

   * - Timeout
     - Typical Duration
     - 3-Agent Scenario
     - 5-Agent Scenario
   * - 5 min
     - 2-3 min
     - $0.10-0.50
     - $0.20-0.80
   * - 30 min (default)
     - 5-15 min
     - $0.50-2.00
     - $1.00-4.00
   * - 1 hour
     - 20-40 min
     - $2.00-5.00
     - $4.00-10.00
   * - 2 hours
     - 40-90 min
     - $5.00-15.00
     - $10.00-30.00

.. note::

   These are rough estimates. Actual costs depend on:

   * Models used (GPT-4 vs GPT-4o-mini, etc.)
   * Number of coordination rounds
   * Tool usage (MCP, code execution, web search)
   * Response lengths

**Cost-Saving Tips**:

1. Use shorter timeouts for testing
2. Choose efficient models (GPT-4o-mini, Gemini Flash)
3. Limit agent count for simple tasks
4. Monitor actual usage and adjust timeouts accordingly

Debug and Monitoring
--------------------

Viewing Timeout Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Enable debug logging to see timeout details:

.. code-block:: bash

   uv run python -m massgen.cli --debug --config config.yaml "question"

Look for timeout-related messages in ``agent_outputs/log_{timestamp}/massgen_debug.log``:

.. code-block:: text

   [INFO] Orchestrator timeout configured: 1800 seconds
   [INFO] Starting coordination...
   [INFO] Round 1 complete (elapsed: 45s / 1800s)
   [INFO] Round 2 complete (elapsed: 128s / 1800s)
   ...
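
To track typical durations, the per-round lines can be extracted with a short script. This sketch assumes the log lines look exactly like the sample above; the real format may differ between versions, and ``parse_round_progress`` is a hypothetical helper.

.. code-block:: python

   import re

   # Matches lines such as "[INFO] Round 2 complete (elapsed: 128s / 1800s)".
   ROUND_LINE = re.compile(r"Round (\d+) complete \(elapsed: (\d+)s / (\d+)s\)")


   def parse_round_progress(log_text: str) -> list:
       """Return (round, elapsed_seconds, percent_of_timeout) tuples from a log."""
       progress = []
       for match in ROUND_LINE.finditer(log_text):
           round_number, elapsed, limit = (int(group) for group in match.groups())
           percent = 100 * elapsed // limit  # whole percent, rounded down
           progress.append((round_number, elapsed, percent))
       return progress


   sample = """\
   [INFO] Orchestrator timeout configured: 1800 seconds
   [INFO] Round 1 complete (elapsed: 45s / 1800s)
   [INFO] Round 2 complete (elapsed: 128s / 1800s)
   """
   print(parse_round_progress(sample))  # [(1, 45, 2), (2, 128, 7)]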

Monitoring Coordination Progress
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the terminal UI, watch for elapsed time indicators:

.. code-block:: text

   ┌─ Coordination Progress ─────────────────┐
   │ Round: 3/∞                              │
   │ Elapsed: 234s / 1800s (13%)             │
   │ Status: In progress                     │
   └──────────────────────────────────────────┘

Related Configuration
---------------------

* :doc:`../user_guide/concepts` - Understanding coordination mechanics
* :doc:`../user_guide/advanced/planning_mode` - Planning mode and coordination time
* :doc:`yaml_schema` - Complete configuration reference
* :doc:`cli` - CLI timeout flags

Next Steps
----------

* Test your configuration with appropriate timeouts
* Monitor actual completion times in your use cases
* Adjust timeouts based on observed patterns
* Consider cost vs. completion trade-offs


---

## reference/yaml_schema.rst

YAML Configuration Reference
============================

Complete YAML configuration schema for MassGen.

.. note::

   For a complete overview of supported models and capabilities, see :doc:`supported_models`.

.. tip::

   **Validate your configs!** MassGen includes a built-in validator that checks for errors before running. Use ``massgen --validate config.yaml`` to verify your configuration. See :doc:`../user_guide/validating_configs` for details.

Configuration Hierarchy
-----------------------

MassGen configurations have a clear hierarchy of settings. Understanding this structure helps you place parameters in the correct location.

**Configuration Levels:**

1. **Top Level** - Global settings

   - ``agents`` or ``agent``: List of agents (or single agent)
   - ``memory``: Memory system configuration (conversation + persistent with Qdrant)
   - ``filesystem_memory``: Filesystem-based memory with auto-compression
   - ``orchestrator``: Coordination and workspace settings
   - ``ui``: Display and logging settings

2. **Agent Level** - Per-agent settings (inside ``agents[]``)

   - ``id``: Unique agent identifier
   - ``backend``: Backend configuration object
   - ``system_message``: Agent-specific instructions

3. **Backend Level** - Model and tool settings (inside ``agent.backend``)

   - Core: ``type``, ``model``, ``api_key``, ``temperature``, ``max_tokens``
   - Tool Enablement: ``enable_web_search``, ``enable_code_execution``, ``enable_code_interpreter``
   - MCP Integration: ``mcp_servers``, ``exclude_tools``, ``enable_mcp_command_line``
   - Backend-Specific: ``cwd``, ``permission_mode``, ``allowed_tools``, etc.

   .. note::
      **Code Execution Options**: ``enable_code_execution``/``enable_code_interpreter`` run in the provider's cloud sandbox (no filesystem access). For local code execution with filesystem access, use ``enable_mcp_command_line: true`` instead. See :doc:`../user_guide/tools/code_execution` for details.

4. **MCP Server Level** - Tool server settings (inside ``backend.mcp_servers[]``)

   - Connection: ``name``, ``type``, ``command``, ``args``, ``url``, ``env``
   - Security: ``security`` object (``level``, ``allow_localhost``, ``allow_private_ips``)
   - Tool Filtering: ``allowed_tools``, ``exclude_tools``

5. **Orchestrator Level** - Multi-agent coordination (top-level ``orchestrator``)

   - Workspace: ``snapshot_storage``, ``agent_temporary_workspace``
   - Project Integration: ``context_paths``
   - Coordination: ``coordination.enable_planning_mode``, ``coordination.planning_mode_instruction``, ``coordination.max_orchestration_restarts``
   - Debug: ``debug_final_answer``
   - Advanced: ``skip_coordination_rounds``, ``timeout``

6. **UI Level** - Display settings (top-level ``ui``)

   - ``display_type``: "rich_terminal" or "simple"
   - ``logging_enabled``: Enable/disable logging

Backend Types Overview
----------------------

MassGen supports multiple backend types with varying capabilities:

**API-Based Backends:**

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Backend Type
     - Description & Key Features
   * - ``claude``
     - Anthropic's Claude API with full tool support and MCP integration
   * - ``claude_code``
     - Claude Code SDK with native dev tools (Read, Write, Edit, Bash, etc.)
   * - ``codex``
     - OpenAI Codex CLI with native shell, file editing, web search, and MCP integration
   * - ``gemini``
     - Google's Gemini API with planning mode and MCP support
   * - ``openai``
     - OpenAI's GPT models with full tool and MCP support
   * - ``grok``
     - xAI's Grok models with web search and MCP integration
   * - ``azure_openai``
     - Azure-deployed OpenAI models (limited tool support)
   * - ``zai``
     - ZhipuAI's GLM models with basic MCP support
   * - ``chatcompletion``
     - **Generic OpenAI-compatible backend** - Works with Cerebras, Together AI, Fireworks, Groq, OpenRouter, etc. Requires ``base_url`` parameter

**Local/Inference Backends:**

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Backend Type
     - Description & Key Features
   * - ``lmstudio``
     - Local LM Studio server for running open-weight models
   * - ``vllm``
     - vLLM inference server (auto-detects port 8000)
   * - ``sglang``
     - SGLang inference server (auto-detects port 30000)

**Framework Backends:**

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Backend Type
     - Description & Key Features
   * - ``ag2``
     - AG2 framework integration with code execution support

Basic Structure
---------------

.. code-block:: yaml

   # Agent definitions (required)
   agents:
     - id: "agent1"
       backend:
         # Backend configuration
       system_message: "..."

   # Orchestrator settings (optional)
   orchestrator:
     # Coordination and workspace settings

   # UI settings (optional)
   ui:
     # Display and logging configuration

Agent Configuration
-------------------

Single Agent
~~~~~~~~~~~~

.. code-block:: yaml

   agent:  # Singular for single agent
     id: "my_agent"
     backend:
       type: "claude"
       model: "claude-sonnet-4"
     system_message: "You are a helpful assistant"

Multiple Agents
~~~~~~~~~~~~~~~

.. code-block:: yaml

   agents:  # Plural for multiple agents
     - id: "agent1"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
       system_message: "You are a researcher"

     - id: "agent2"
       backend:
         type: "openai"
         model: "gpt-5-nano"
       system_message: "You are an analyst"
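
Code that consumes a configuration can treat both forms uniformly by normalizing to a list. The following is a minimal sketch, not MassGen's loader; ``normalize_agents`` is a hypothetical helper operating on the parsed YAML dictionary.

.. code-block:: python

   def normalize_agents(config: dict) -> list:
       """Return the agent definitions as a list for either config form."""
       if "agents" in config:
           return list(config["agents"])
       if "agent" in config:
           return [config["agent"]]  # wrap the single agent in a list
       raise ValueError("configuration must define 'agent' or 'agents'")


   single = {"agent": {"id": "my_agent", "backend": {"type": "claude"}}}
   print([agent["id"] for agent in normalize_agents(single)])  # ['my_agent']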

Backend Configuration
---------------------

Basic Backend
~~~~~~~~~~~~~

.. code-block:: yaml

   backend:
     type: "openai"              # Backend type (required)
     model: "gpt-5-mini"         # Model name (required)
     api_key: "${API_KEY}"       # Optional, uses env var by default
     temperature: 0.7            # Optional
     max_tokens: 2000            # Optional

Claude Code Backend
~~~~~~~~~~~~~~~~~~~

The Claude Code backend uses Anthropic's Claude Agent SDK with native development tools.
By default, MassGen disables most Claude Code tools since it provides native implementations
for file operations, shell execution, and directory listing. Only the ``Task`` tool (for
subagent spawning) is enabled by default.

**Basic Configuration:**

.. code-block:: yaml

   backend:
     type: "claude_code"
     model: "sonnet"
     cwd: "workspace"            # Working directory for file operations
     permission_mode: "bypassPermissions"  # Optional

**With Web Search and Default Prompt:**

.. code-block:: yaml

   backend:
     type: "claude_code"
     model: "sonnet"
     cwd: "workspace"
     use_default_prompt: true    # Use Claude Code's default system prompt
     enable_web_search: true     # Enable WebSearch and WebFetch tools

**Configuration Options:**

- ``use_default_prompt`` (bool, default: false): When true, uses Claude Code's default
  system prompt (with coding style guidelines) plus MassGen's workflow instructions.
  When false, uses only MassGen's workflow prompt for full control.

- ``enable_web_search`` (bool, default: false): When true, enables Claude Code's
  WebSearch and WebFetch tools. Use when MassGen's crawl4ai tools are unavailable
  (crawl4ai requires Docker).

**Default Tool Behavior:**

Only the ``Task`` tool is enabled by default. All other Claude Code tools are disabled
because MassGen provides native equivalents:

- ``Read``, ``Write``, ``Edit`` → MassGen's ``read_file_content``, ``save_file_content``, ``append_file_content``
- ``Bash`` → MassGen's ``run_shell_script`` or ``execute_command`` MCP
- ``LS`` → MassGen's ``list_directory``
- ``Grep``, ``Glob`` → Use ``execute_command`` or future MassGen tools (see GitHub issue #640)

With MCP Servers
~~~~~~~~~~~~~~~~

.. code-block:: yaml

   backend:
     type: "gemini"
     model: "gemini-2.5-flash"
     mcp_servers:
       - name: "weather"
         type: "stdio"
         command: "npx"
         args: ["-y", "@modelcontextprotocol/server-weather"]
       - name: "search"
         type: "stdio"
         command: "npx"
         args: ["-y", "@modelcontextprotocol/server-brave-search"]
         env:
           BRAVE_API_KEY: "${BRAVE_API_KEY}"

Tool Filtering
~~~~~~~~~~~~~~

.. code-block:: yaml

   backend:
     type: "openai"
     model: "gpt-4o-mini"
     exclude_tools:  # Backend-level exclusions
       - mcp__discord__send_webhook
     mcp_servers:
       - name: "discord"
         type: "stdio"
         command: "npx"
         args: ["-y", "@modelcontextprotocol/server-discord"]
         allowed_tools:  # Server-specific whitelist
           - mcp__discord__read_messages
           - mcp__discord__send_message
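
The combined effect of the two filters can be sketched as a whitelist followed by an exclusion. This illustrates one plausible reading (a tool must pass the server's ``allowed_tools`` and must not appear in the backend's ``exclude_tools``); the exact precedence in MassGen is not confirmed here, and ``effective_tools`` and the ``delete_message`` tool name are hypothetical.

.. code-block:: python

   def effective_tools(available, allowed_tools=None, exclude_tools=None):
       """Apply a server-level whitelist, then backend-level exclusions."""
       allowed = set(allowed_tools) if allowed_tools is not None else None
       excluded = set(exclude_tools or [])
       return [
           tool for tool in available
           if (allowed is None or tool in allowed) and tool not in excluded
       ]


   available = [
       "mcp__discord__read_messages",
       "mcp__discord__send_message",
       "mcp__discord__send_webhook",
       "mcp__discord__delete_message",  # hypothetical tool for the demo
   ]
   print(effective_tools(
       available,
       allowed_tools=["mcp__discord__read_messages", "mcp__discord__send_message"],
       exclude_tools=["mcp__discord__send_webhook"],
   ))
   # ['mcp__discord__read_messages', 'mcp__discord__send_message']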

GitHub Copilot Backend
~~~~~~~~~~~~~~~~~~~~~~

The GitHub Copilot backend uses the ``github-copilot-sdk`` with native MCP support.
Requires a GitHub Copilot subscription and the Copilot CLI (``gh copilot``).

**Basic Configuration:**

.. code-block:: yaml

   backend:
     type: "copilot"
     model: "gpt-5-mini"       # Also: gpt-4.1, claude-sonnet-4, gemini-2.5-pro

**With MCP Servers and Custom Tools:**

.. code-block:: yaml

   backend:
     type: "copilot"
     model: "gpt-5-mini"
     mcp_servers:
       - name: "filesystem"
         command: "npx"
         args: ["-y", "@modelcontextprotocol/server-filesystem", "."]
     custom_tools:
       - path: "massgen/tool/_basic"
         function: "two_num_tool"

**Configuration Options:**

- ``copilot_system_message_mode`` (string: "append"|"replace", default: "append"):
  How the system message is applied to the Copilot session.
- ``copilot_permission_policy`` (string: "approve"|"deny", default: "approve"):
  Permission callback policy. "approve" validates paths via PathPermissionManager.
- ``allowed_tools`` / ``exclude_tools``: Backend-level tool filtering.
- ``enable_multimodal_tools`` (bool): Enable read_media/generate_media tools.

**Docker Mode:**

.. code-block:: yaml

   backend:
     type: "copilot"
     model: "gpt-5-mini"
     command_line_execution_mode: "docker"
     command_line_docker_network_mode: "bridge"

AG2 Backend
~~~~~~~~~~~

.. code-block:: yaml

   backend:
     type: ag2
     agent_config:
       type: assistant           # or "conversable"
       name: "AG2_Coder"
       system_message: "You write Python code"
       llm_config:
         api_type: "openai"
         model: "gpt-4o"
       code_execution_config:
         executor:
           type: "LocalCommandLineCodeExecutor"
           timeout: 60
           work_dir: "./workspace"

Orchestrator Configuration
--------------------------

Basic Orchestrator
~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"
     session_storage: "sessions"  # For interactive mode

Context Paths
~~~~~~~~~~~~~

.. code-block:: yaml

   orchestrator:
     context_paths:
       - path: "/absolute/path/to/src"
         permission: "read"       # Read-only access
       - path: "/absolute/path/to/docs"
         permission: "write"      # Write access for final agent

Coordination Config
~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_planning_mode: true
       planning_mode_instruction: |
         PLANNING MODE: Describe intended actions.
         Do not execute during coordination phase.

Skills System Config
~~~~~~~~~~~~~~~~~~~~

Enable the skills system for domain-specific guidance and workflows:

.. code-block:: yaml

   orchestrator:
     coordination:
       # Enable skills system
       use_skills: true

       # Optional: Skills discovery directory (default: .agent/skills)
       skills_directory: ".agent/skills"

       # Optional: Enable specific built-in MassGen skills
       massgen_skills:
         - "file_search"    # Always useful (ripgrep/ast-grep)
         - "serena"         # Symbol-level code understanding (LSP)
         - "semtools"       # Semantic search (embeddings)

**Available Built-in Skills:**

- ``file_search``: Fast text and structural code search (ripgrep/ast-grep)
- ``serena``: Symbol-level code understanding using LSP (optional, requires installation)
- ``semtools``: Semantic search using embeddings (optional, requires installation)

**Notes:**

- Skills require command line execution (``enable_mcp_command_line: true``)
- Default skills (memory, file_search) are always available when ``use_skills: true``
- Optional skills (serena, semtools) must be explicitly listed in ``massgen_skills``
- External skills from ``openskills`` are discovered from ``skills_directory``

See :ref:`user_guide_skills` for complete documentation.
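
The notes above imply a simple rule for which built-in skills are active. The sketch below models that rule only; it ignores external skills discovered from ``skills_directory``, and ``active_skills`` is a hypothetical helper rather than part of MassGen.

.. code-block:: python

   DEFAULT_SKILLS = ("memory", "file_search")  # always on when use_skills is true
   OPTIONAL_SKILLS = ("serena", "semtools")    # must be listed in massgen_skills


   def active_skills(coordination: dict) -> list:
       """Return the built-in skills enabled by a ``coordination`` section."""
       if not coordination.get("use_skills", False):
           return []
       skills = list(DEFAULT_SKILLS)
       for name in coordination.get("massgen_skills") or []:
           if name in OPTIONAL_SKILLS and name not in skills:
               skills.append(name)
       return skills


   print(active_skills({"use_skills": True, "massgen_skills": ["file_search", "serena"]}))
   # ['memory', 'file_search', 'serena']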

UI Configuration
----------------

.. code-block:: yaml

   ui:
     display_type: "rich_terminal"  # or "simple"
     logging_enabled: true

Filesystem Memory Configuration
-------------------------------

Filesystem memory provides automatic context compression and memory persistence for long-running agent conversations. When the context window approaches capacity, the agent is prompted to summarize important information to markdown files before the conversation is truncated.

.. note::

   This is separate from the ``memory`` section, which configures Qdrant-based vector memory. ``filesystem_memory`` uses plain files for simpler, more transparent memory management.

Basic Configuration
~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   filesystem_memory:
     enabled: true
     compression:
       trigger_threshold: 0.75   # Compress at 75% context usage
       target_ratio: 0.20        # Keep 20% after compression

How Auto-Compression Works
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Monitoring**: The system monitors context window usage each turn
2. **Trigger**: When usage reaches ``trigger_threshold`` (default 75%), compression begins
3. **Agent Summary**: A compression request is injected asking the agent to:

   - Write a conversation summary to ``memory/short_term/recent.md``
   - Optionally write important facts to ``memory/long_term/*.md``
   - Call the ``compression_complete`` tool to signal completion

4. **Validation**: System validates that ``recent.md`` was written
5. **Truncation**: Conversation is truncated to ``target_ratio`` (default 20%), keeping recent messages
6. **Fallback**: If the agent fails after 2 attempts, algorithmic compression is used instead (with a warning)
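
The trigger and truncation arithmetic can be sketched as follows. This assumes both ratios are fractions of the model's context window, which the steps above do not state explicitly; ``compression_plan`` is a hypothetical helper, not MassGen's implementation.

.. code-block:: python

   from typing import Optional


   def compression_plan(used_tokens: int, context_window: int,
                        trigger_threshold: float = 0.75,
                        target_ratio: float = 0.20) -> Optional[int]:
       """Return the token budget to keep after compression, or None if not triggered.

       Assumption: usage and the retained budget are both measured against
       the full context window.
       """
       usage = used_tokens / context_window
       if usage < trigger_threshold:
           return None
       return round(context_window * target_ratio)


   print(compression_plan(70_000, 100_000))  # None
   print(compression_plan(80_000, 100_000))  # 20000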

Memory File Structure
~~~~~~~~~~~~~~~~~~~~~

After compression, the agent's workspace contains:

.. code-block:: text

   workspace/
   └── memory/
       ├── short_term/
       │   └── recent.md       # Conversation summary (auto-injected on future turns)
       └── long_term/
           ├── user_prefs.md   # Optional: persistent facts
           └── project_notes.md

Example with Full Options
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   # Enable filesystem memory with custom thresholds
   filesystem_memory:
     enabled: true
     compression:
       trigger_threshold: 0.80   # Less aggressive: compress later, at 80% usage
       target_ratio: 0.30        # Keep more context: 30%

   # Agent with memory-aware configuration
   agents:
     - id: "assistant"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         enable_mcp_command_line: true  # Required for file writing

Complete Example
----------------

Full multi-agent configuration demonstrating all 6 configuration levels:

.. code-block:: yaml

   # ========================================
   # LEVEL 1: TOP LEVEL - Global Settings
   # ========================================
   # Define agents, orchestrator, and UI at the top level

   # ========================================
   # LEVEL 2: AGENT LEVEL - Per-Agent Settings
   # ========================================
   agents:
     # Agent 1: Gemini with web search and tool enablement
     - id: "researcher"
       system_message: "You are a researcher with web search and weather tools"

       # ========================================
       # LEVEL 3: BACKEND LEVEL - Model & Tools
       # ========================================
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         temperature: 0.7
         max_tokens: 2000

         # Tool Enablement Flags (Backend Level)
         enable_web_search: true           # Gemini built-in web search
         enable_code_execution: true       # Gemini code execution

         # Backend-level tool filtering
         exclude_tools:
           - mcp__weather__set_location    # Prevent location changes

         # ========================================
         # LEVEL 4: MCP SERVER LEVEL - Tool Servers
         # ========================================
         mcp_servers:
           - name: "search"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-brave-search"]
             env:
               BRAVE_API_KEY: "${BRAVE_API_KEY}"

             # MCP Server-level security configuration
             security:
               level: "high"                # Strict security
               allow_localhost: true        # Allow local connections
               allow_private_ips: false     # Block private IPs

             # MCP Server-level tool filtering
             allowed_tools:
               - mcp__search__web_search
               - mcp__search__local_search

           - name: "weather"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-weather"]
             security:
               level: "permissive"          # Relaxed for testing

     # Agent 2: Claude Code with native tools
     - id: "coder"
       system_message: "You write and execute code with file operations"
       backend:
         type: "claude_code"
         model: "claude-sonnet-4-20250514"
         cwd: "workspace"                    # Working directory (unique suffix added at runtime)
         permission_mode: "bypassPermissions"

         # Claude Code-specific parameters
         max_thinking_tokens: 10000         # Extended reasoning
         system_prompt: "You are an expert Python developer"
         disallowed_tools:                  # Blacklist dangerous ops
           - "Bash(rm*)"
           - "Bash(sudo*)"
           - "WebSearch"                    # Block web access

         # File operations handled via cwd parameter

     # Agent 3: OpenAI with code interpreter
     - id: "analyst"
       system_message: "You analyze data and generate reports"
       backend:
         type: "openai"
         model: "gpt-5-nano"

         # OpenAI-specific tool enablement
         enable_web_search: true            # OpenAI web search
         enable_code_interpreter: true      # Code interpreter tool

         cwd: "workspace"          # File operations (unique suffix added at runtime)

   # ========================================
   # LEVEL 5: ORCHESTRATOR LEVEL - Coordination
   # ========================================
   orchestrator:
     # Workspace management
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"

     # Project integration
     context_paths:
       - path: "/Users/me/project/src"
         permission: "read"                 # Read-only access
       - path: "/Users/me/project/docs"
         permission: "write"                # Write access for winner

     # Coordination settings
     coordination:
       enable_planning_mode: true           # Enable planning mode
       max_orchestration_restarts: 2        # Allow up to 2 restarts (3 total attempts)
       planning_mode_instruction: |
         PLANNING MODE ACTIVE: You are in coordination phase.
         1. Describe your intended actions
         2. Analyze other agents' proposals
         3. Use only vote/new_answer tools
         4. DO NOT execute MCP commands
         5. Save execution for final presentation

     # Voting and answer control
     voting_sensitivity: "balanced"         # How critical agents are when voting (lenient/balanced/strict)
     max_new_answers_per_agent: 2           # Cap new answers per agent (null=unlimited)
     max_new_answers_global: 8              # Cap total new answers across all agents (null=unlimited)
     answer_novelty_requirement: "balanced" # How different new answers must be (lenient/balanced/strict)
     fairness_enabled: true                 # Keep coordination pacing balanced (default: true)
     fairness_lead_cap_answers: 2           # Max lead in answer revisions vs slowest active peer
     max_midstream_injections_per_round: 2  # Cap injected unseen source updates per round
     defer_peer_updates_until_restart: false  # Queue peer updates for next restart instead of mid-stream injection
     allow_midstream_peer_updates_before_checklist_submit: null  # Optional checklist-mode override before first accepted submit_checklist

     # Advanced settings
     skip_coordination_rounds: false        # Normal coordination
     timeout:
       orchestrator_timeout_seconds: 1800   # 30 minute timeout

   # ========================================
   # LEVEL 6: UI LEVEL - Display Settings
   # ========================================
   ui:
     display_type: "rich_terminal"          # Rich terminal display
     logging_enabled: true                  # Enable logging

Parameter Reference
-------------------

Agents
~~~~~~

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Description
   * - ``id``
     - string
     - Yes
     - Unique agent identifier
   * - ``backend``
     - object
     - Yes
     - Backend configuration
   * - ``system_message``
     - string
     - No
     - System prompt for the agent

Backend
~~~~~~~

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Supported Backends
     - Description
   * - ``type``
     - string
     - Yes
     - All
     - Backend type: ``claude``, ``claude_code``, ``codex``, ``gemini``, ``gemini_cli``, ``openai``, ``grok``, ``azure_openai``, ``zai``, ``chatcompletion``, ``lmstudio``, ``vllm``, ``sglang``, ``ag2``, ``copilot``
   * - ``model``
     - string
     - Yes
     - All
     - Model name (provider-specific)
   * - ``api_key``
     - string
     - No
     - All API backends
     - API key (uses env var by default)
   * - ``base_url``
     - string
     - Yes*
     - ``chatcompletion``, ``lmstudio``, ``vllm``, ``sglang``
     - API endpoint URL (required for chatcompletion)
   * - ``cwd``
     - string
     - No
     - ``claude_code``, ``codex``
     - Working directory for file operations. **Use** ``"workspace"`` **as the value** - MassGen automatically adds a unique suffix per agent at runtime (e.g., ``workspace_f7a3b2c1``). Avoid numbered names like ``workspace1`` as they can leak agent identity during voting.
   * - ``exclude_file_operation_mcps``
     - boolean
     - No
     - All with MCP support
     - Exclude file operation MCP tools (read/write/copy/delete). Agents use command-line tools instead. Keeps command execution, media generation, and planning MCPs. (default: false)
   * - ``enable_image_generation``
     - boolean
     - No
     - All with MCP support
     - Enable image generation tools (default: false)
   * - ``enable_audio_generation``
     - boolean
     - No
     - All with MCP support
     - Enable audio generation tools (default: false)
   * - ``enable_file_generation``
     - boolean
     - No
     - All with MCP support
     - Enable file generation tools (default: false)
   * - ``enable_video_generation``
     - boolean
     - No
     - All with MCP support
     - Enable video generation tools (default: false)
   * - ``enable_code_based_tools``
     - boolean
     - No
     - All with MCP support
     - Enable code-based tools (CodeAct paradigm). MCP tools presented as Python code in workspace (default: false)
   * - ``custom_tools_path``
     - string
     - No
     - All with MCP support
     - Path to custom tools directory to copy into workspace (for code-based tools)
   * - ``auto_discover_custom_tools``
     - boolean
     - No
     - All with MCP support
     - Auto-discover custom tools from massgen/tool/ directory (default: false)
   * - ``exclude_custom_tools``
     - list
     - No
     - All with MCP support
     - List of custom tool directories to exclude (e.g., ["_claude_computer_use"])
   * - ``direct_mcp_servers``
     - list
     - No
     - All with MCP support
     - List of MCP server names to keep as direct protocol tools when ``enable_code_based_tools`` is true. These servers remain callable as native tools in the prompt rather than being filtered to code-only access. Example: ``["logfire", "context7"]``
   * - ``shared_tools_directory``
     - string
     - No
     - All with MCP support
     - Shared directory for code-based tools. Tools generated once and shared across agents (default: per-agent)
   * - ``concurrent_tool_execution``
     - boolean
     - No
     - All with MCP support
     - Execute multiple tool calls in parallel (default: false). When enabled, tools called together run simultaneously. WARNING: Do not call dependent tools together (e.g., mkdir + write to that dir)
   * - ``enable_mcp_command_line``
     - boolean
     - No
     - All with MCP support
     - Enable command-line execution tool (default: false)
   * - ``command_line_execution_mode``
     - string
     - No
     - All with MCP support
     - Execution mode: "local" or "docker" (default: "local")
   * - ``command_line_docker_image``
     - string
     - No
     - All with MCP support
     - Docker image for command execution (default: "massgen:runtime")
   * - ``command_line_docker_memory_limit``
     - string
     - No
     - All with MCP support
     - Docker memory limit (e.g., "2g", default: "4g")
   * - ``command_line_docker_cpu_limit``
     - string
     - No
     - All with MCP support
     - Docker CPU limit (e.g., "2.0", default: "4.0")
   * - ``command_line_docker_network_mode``
     - string
     - Yes*
     - All with MCP support
     - Docker network mode: "bridge", "host", "none". **Required for Codex and Gemini CLI in Docker mode** (use "bridge").
   * - ``model_reasoning_effort``
     - string
     - No
     - ``codex``
     - Codex reasoning effort: "low", "medium", "high", or "xhigh". OpenAI-style ``reasoning.effort`` is also accepted for Codex compatibility.
   * - ``command_line_docker_enable_sudo``
     - boolean
     - No
     - All with MCP support
     - Enable sudo in Docker containers (default: false)
   * - ``command_line_docker_credentials``
     - object
     - No
     - All with MCP support
     - Docker credentials config (env_file, env_vars, env_vars_from_file, pass_all_env)
   * - ``command_line_docker_packages``
     - object
     - No
     - All with MCP support
     - Docker packages to install (apt, pip, npm lists)
   * - ``command_line_allowed_commands``
     - list
     - No
     - All with MCP support
     - Whitelist of allowed command patterns
   * - ``command_line_blocked_commands``
     - list
     - No
     - All with MCP support
     - Blacklist of blocked command patterns
   * - ``mcp_servers``
     - list
     - No
     - All except ``ag2``, ``azure_openai``
     - MCP server configurations
   * - ``exclude_tools``
     - list
     - No
     - All with tool support
     - Tools to exclude from this backend
   * - ``temperature``
     - float
     - No
     - All
     - Sampling temperature (0.0-1.0)
   * - ``max_tokens``
     - integer
     - No
     - All
     - Maximum response tokens
   * - ``permission_mode``
     - string
     - No
     - ``claude_code``
     - Permission handling: ``bypassPermissions`` or default
   * - ``agent_config``
     - object
     - Yes*
     - ``ag2``
     - AG2-specific agent configuration (required for AG2)
   * - ``enable_web_search``
     - boolean
     - No
     - ``claude``, ``claude_code``, ``gemini``, ``openai``, ``grok``, ``chatcompletion``
     - Enable built-in web search capability. For ``claude_code``, enables WebSearch and WebFetch tools (default: false)
   * - ``use_default_prompt``
     - boolean
     - No
     - ``claude_code``
     - When true, uses Claude Code's default system prompt with MassGen instructions appended. When false (default), uses only MassGen's workflow prompt for full control over agent behavior
   * - ``enable_code_execution``
     - boolean
     - No
     - ``claude``, ``gemini``
     - Enable built-in code execution tool
   * - ``enable_code_interpreter``
     - boolean
     - No
     - ``openai``
     - Enable OpenAI code interpreter tool
   * - ``allowed_tools``
     - list
     - No
     - ``claude_code``
     - Whitelist of allowed Claude Code tools (legacy - use disallowed_tools instead)
   * - ``disallowed_tools``
     - list
     - No
     - ``claude_code``
     - Blacklist of dangerous tools to block (e.g., ["Bash(rm*)", "Bash(sudo*)"])
   * - ``max_thinking_tokens``
     - integer
     - No
     - ``claude_code``
     - Maximum tokens for internal thinking (default: 8000)
   * - ``system_prompt``
     - string
     - No
     - ``claude_code``
     - Custom system prompt for Claude Code agent
   * - ``api_version``
     - string
     - Yes*
     - ``azure_openai``
     - Azure OpenAI API version (required, default: "2024-02-15-preview")
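
The conditional requirements in the table can be checked before a run starts. The sketch below is illustrative only and is not MassGen's own validator: ``missing_backend_fields`` is a hypothetical helper, and it encodes just the rules stated above (``type`` and ``model`` always, ``base_url`` for ``chatcompletion``, ``agent_config`` for ``ag2``). ``api_version`` is left out because the table lists a default for it.

.. code-block:: python

   # Hypothetical pre-flight check; rules are copied from the "Required" column.
   ALWAYS_REQUIRED = ("type", "model")

   # Keys the table marks as required only for specific backend types.
   CONDITIONALLY_REQUIRED = {
       "chatcompletion": ("base_url",),
       "ag2": ("agent_config",),
   }


   def missing_backend_fields(backend: dict) -> list:
       """Return the required keys that are absent from one backend mapping."""
       required = list(ALWAYS_REQUIRED)
       required += CONDITIONALLY_REQUIRED.get(backend.get("type"), ())
       return [key for key in required if key not in backend]


   # "local-model" is a placeholder model name used only for this example.
   print(missing_backend_fields({"type": "openai", "model": "gpt-5-nano"}))
   print(missing_backend_fields({"type": "chatcompletion", "model": "local-model"}))

An empty list means the mapping satisfies the rules encoded here; it says nothing about whether the values themselves are valid.
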

MCP Server
~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Description
   * - ``name``
     - string
     - Yes
     - Server name
   * - ``type``
     - string
     - Yes
     - "stdio" or "streamable-http"
   * - ``command``
     - string
     - stdio only
     - Command to launch server
   * - ``args``
     - list
     - stdio only
     - Command arguments
   * - ``url``
     - string
     - http only
     - Server URL
   * - ``env``
     - object
     - No
     - Environment variables
   * - ``allowed_tools``
     - list
     - No
     - Whitelist of allowed tools
   * - ``exclude_tools``
     - list
     - No
     - Tools to exclude
   * - ``security``
     - object
     - No
     - Security configuration for the MCP server
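
The transport-dependent fields can be illustrated with a small check. This is a sketch under stated assumptions, not MassGen code: ``check_mcp_server`` is a hypothetical helper, and it reads "stdio only" and "http only" in the table as "required for that transport", which the table does not state explicitly. The server names, command, and URL are placeholders.

.. code-block:: python

   def check_mcp_server(server: dict) -> list:
       """Return human-readable problems found in one MCP server entry."""
       problems = [f"missing '{key}'" for key in ("name", "type") if key not in server]
       transport = server.get("type")
       if transport == "stdio" and "command" not in server:
           problems.append("stdio servers need 'command'")
       elif transport == "streamable-http" and "url" not in server:
           problems.append("streamable-http servers need 'url'")
       elif transport not in (None, "stdio", "streamable-http"):
           problems.append(f"unknown type '{transport}'")
       return problems


   # Placeholder entries: one complete stdio server, one HTTP server without a URL.
   stdio_server = {"name": "files", "type": "stdio", "command": "npx", "args": ["-y", "example-server"]}
   http_server = {"name": "search", "type": "streamable-http"}

   print(check_mcp_server(stdio_server))
   print(check_mcp_server(http_server))
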

MCP Server Security
~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Description
   * - ``level``
     - string
     - No
     - Security level: ``"high"`` (strict, default) or ``"permissive"`` (relaxed for testing)
   * - ``allow_localhost``
     - boolean
     - No
     - Allow connections to localhost (required for local MCP servers)
   * - ``allow_private_ips``
     - boolean
     - No
     - Allow connections to private IP ranges (for testing environments)

Orchestrator
~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Description
   * - ``snapshot_storage``
     - string
     - No
     - Directory for workspace snapshots
   * - ``agent_temporary_workspace``
     - string
     - No
     - Directory for temporary workspaces
   * - ``context_paths``
     - list
     - No
     - Shared project directories
   * - ``coordination``
     - object
     - No
     - Coordination configuration (planning mode, restarts, and subagent settings; see Coordination Configuration below)
   * - ``skip_coordination_rounds``
     - boolean
     - No
     - Debug/test mode: skip voting rounds and go straight to final presentation (default: false)
   * - ``debug_final_answer``
     - string
     - No
     - Debug mode for restart feature: override final answer on attempt 1 only to test restart flow (default: null). Example: "I only created one file."
   * - ``timeout``
     - object
     - No
     - Timeout configuration

Coordination Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Description
   * - ``enable_planning_mode``
     - boolean
     - No
     - Enable planning mode during coordination (default: false). When enabled, agents plan without executing MCP tools during the coordination phase. Only the winning agent executes actions during final presentation.
   * - ``planning_mode_instruction``
     - string
     - No
     - Custom instruction added to agent prompts when planning mode is enabled. Should explain to agents that they should describe intended actions without executing them.
   * - ``max_orchestration_restarts``
     - integer
     - No
     - Maximum number of orchestration restarts allowed (default: 0). When set > 0, enables post-evaluation where the winning agent reviews the final answer and can request a restart with specific improvement instructions. Recommended values: 1-2.
   * - ``subagent_types``
     - list of strings or null
     - No
     - Which specialized subagent types to expose. Default (null/omitted): ``[evaluator, explorer, researcher]``. Set explicitly to include ``novelty`` or custom project types. Empty list disables all specialized types.
   * - ``enable_subagents``
     - boolean
     - No
     - Enable subagent tools for parallel task execution (default: false)
   * - ``subagent_default_timeout``
     - integer
     - No
     - Default timeout in seconds for subagent execution (default: 300)
   * - ``subagent_min_timeout``
     - integer
     - No
     - Minimum allowed subagent timeout in seconds (default: 60)
   * - ``subagent_max_timeout``
     - integer
     - No
     - Maximum allowed subagent timeout in seconds (default: 600)
   * - ``subagent_max_concurrent``
     - integer
     - No
     - Maximum number of concurrent subagents (default: 3)
   * - ``subagent_round_timeouts``
     - object
     - No
     - Optional per-round timeout settings for subagents. Uses the same keys as ``timeout_settings`` and inherits from parent if omitted.
   * - ``subagent_runtime_mode``
     - string
     - No
     - Subagent runtime boundary mode. ``isolated`` (default) or ``inherited``.
   * - ``subagent_runtime_fallback_mode``
     - string or null
     - No
     - Optional fallback mode when isolated prerequisites are unavailable. ``inherited`` or ``null`` (strict isolation). Codex in Docker mode treats unset fallback as ``inherited`` when ``subagent_runtime_mode`` is ``isolated``.
   * - ``subagent_host_launch_prefix``
     - list or null
     - No
     - Optional command prefix used to bridge isolated launches from containerized parent runtimes.
   * - ``subagent_orchestrator``
     - object
     - No
     - Subagent orchestrator configuration (multi-agent subagents with custom models), including options such as ``parse_at_references`` for literal ``@`` task text.
   * - ``background_subagents``
     - object
     - No
     - Background subagent configuration (``enabled``, ``injection_strategy``)
   * - ``round_evaluator_before_checklist``
     - boolean
     - No
     - Enable the orchestrator-managed round-evaluator stage before round-2+ checklist decisions (default: ``false``). Requires ``orchestrator_managed_round_evaluator: true`` and checklist-gated voting.
   * - ``orchestrator_managed_round_evaluator``
     - boolean
     - No
     - Treat the synthesized round-evaluator task handoff as the normal post-answer self-improvement path (default: ``false``).
   * - ``round_evaluator_refine``
     - boolean
     - No
     - Advanced/non-default option that lets the evaluator child run iterate before producing its packet (default: ``false``).
   * - ``round_evaluator_transformation_pressure``
     - string
     - No
     - Bias on how aggressively the evaluator seeks a larger thesis change. Supported values: ``gentle``, ``balanced``, ``aggressive``. Default: ``balanced``.
   * - ``fast_iteration_mode``
     - boolean
     - No
     - Streamline post-candidate phases so agents submit faster and iterate across rounds instead of over-polishing within a single round (default: ``false``). Only applies to ``checklist_gated`` voting sensitivity. When enabled: Phase 4 (subagent spawning for plateaued criteria) is skipped, the Substantiveness Test is replaced with a Quick Impact Check, and agents are guided to submit with Known Gaps rather than fixing everything internally. Analysis depth (Phases 1-2), verification replay, essential files manifest, and changedoc are all preserved.
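
The three subagent timeout bounds in the table interact as a default plus a permitted range. The following minimal sketch assumes that an out-of-range request is clamped into the range rather than rejected; the table does not say which of the two MassGen does, and ``effective_subagent_timeout`` is a hypothetical name.

.. code-block:: python

   def effective_subagent_timeout(requested=None, *, default=300, minimum=60, maximum=600):
       """Pick the timeout (seconds) a subagent would run with.

       Defaults mirror subagent_default_timeout, subagent_min_timeout and
       subagent_max_timeout. Clamping is an assumption of this sketch.
       """
       value = default if requested is None else requested
       return max(minimum, min(maximum, value))


   print(effective_subagent_timeout())      # no request: falls back to the default
   print(effective_subagent_timeout(10))    # below the minimum
   print(effective_subagent_timeout(3600))  # above the maximum
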

.. note::

   **New in v0.1.3:** Orchestration restart enables automatic quality checks after coordination. The winning agent evaluates its own answer and can trigger a restart if the answer is incomplete or incorrect, with specific instructions for improvement.

.. note::

   **Planning Mode Support:** Planning mode works with all backends that support MCP integration (``claude``, ``claude_code``, ``codex``, ``gemini``, ``openai``, ``grok``, ``chatcompletion``, ``lmstudio``, ``vllm``, ``sglang``). It does NOT work with ``ag2`` or ``azure_openai``.

   **When to Use Planning Mode:**

   - When using MCP tools that perform irreversible actions (file deletion, database modifications, API calls)
   - When coordinating multiple agents that should agree on a plan before execution
   - When you want a "dry run" discussion phase before actual tool execution

   **How It Works:**

   1. **Coordination Phase** (with planning mode): Agents discuss and vote on approaches WITHOUT executing MCP tools
   2. **Final Presentation Phase**: The winning agent EXECUTES the planned actions

.. note::

   **Subagent Round Timeouts:** ``coordination.subagent_round_timeouts`` uses the same keys as ``timeout_settings`` (initial, subsequent, grace). If you omit it, subagents inherit the parent ``timeout_settings`` values.

Voting and Answer Control
~~~~~~~~~~~~~~~~~~~~~~~~~~

These parameters control coordination behavior to balance quality and duration.

Fairness controls are designed to solve a common multi-agent failure mode: fast agents can repeatedly submit revisions while slower peers are still working, which creates uneven effort, restart churn, and noisy coordination loops. With fairness enabled (default), agents stay within a bounded revision lead and wait for peer updates before terminal decisions.
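
As a rough illustration of the bounded revision lead, the sketch below blocks a new answer once an agent's lead over the slowest active peer exceeds the cap. It is an assumption-laden model, not the orchestrator's code: ``new_answer_allowed`` is a hypothetical function, and it assumes the lead is measured before the new answer is submitted.

.. code-block:: python

   def new_answer_allowed(agent_revisions: int, active_peer_revisions: list, lead_cap: int = 2) -> bool:
       """Return True while the agent's current lead stays within the cap."""
       if not active_peer_revisions:
           return True  # no active peers to wait for
       lead = agent_revisions - min(active_peer_revisions)
       return lead <= lead_cap  # blocked only once the cap is exceeded


   print(new_answer_allowed(2, [0, 1]))           # lead of 2, within the default cap
   print(new_answer_allowed(3, [0, 1]))           # lead of 3, exceeds the cap
   print(new_answer_allowed(1, [0], lead_cap=0))  # lockstep: must wait for the peer
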

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Description
   * - ``voting_sensitivity``
     - string
     - No
     - Controls how critical agents are when evaluating answers. **Options:** ``"lenient"`` (default) - agents vote for existing answers more readily, faster convergence; ``"balanced"`` - agents apply detailed criteria (comprehensive, accurate, complete?) before voting, more thorough evaluation; ``"strict"`` - agents apply high standards of excellence (all aspects, edge cases, reference-quality) before voting, maximum quality.
   * - ``max_new_answers_per_agent``
     - integer or null
     - No
     - Maximum number of new answers each agent can provide. In ``coordination_mode: voting``, this is a total per-agent cap. In ``coordination_mode: decomposition``, this is a **consecutive** cap that resets after the agent sees unseen external answer updates. **Options:** ``null`` (default) - unlimited answers; ``1``, ``2``, ``3``, etc.
   * - ``max_new_answers_global``
     - integer or null
     - No
     - Maximum number of new answers across all agents combined. When reached, ``new_answer`` is disabled for everyone. In voting mode, agents must vote; in decomposition mode, agents auto-stop. **Options:** ``null`` (default) - unlimited total answers; positive integer - global cap.
   * - ``answer_novelty_requirement``
     - string
     - No
     - Controls how different new answers must be from existing ones to prevent rephrasing. **Options:** ``"lenient"`` (default) - no similarity checks (fastest); ``"balanced"`` - reject if >70% token overlap, requires meaningful differences; ``"strict"`` - reject if >50% token overlap, requires substantially different solutions.
   * - ``fairness_enabled``
     - boolean
     - No
     - Enable fairness pacing controls across both ``coordination_mode: voting`` and ``coordination_mode: decomposition``. **Default:** ``true``.
   * - ``fairness_lead_cap_answers``
     - integer
     - No
     - Maximum allowed lead in answer revisions over the slowest active peer. When exceeded, ``new_answer`` is blocked until peers catch up. **Default:** ``2`` (set ``0`` for strict lockstep).
   * - ``max_midstream_injections_per_round``
     - integer
     - No
     - Maximum unseen source-agent updates injected mid-stream into a single agent during one round. Helps prevent fast models from receiving runaway update fanout. **Default:** ``2``.
   * - ``defer_peer_updates_until_restart``
     - boolean
     - No
     - When ``true``, peer answer updates are queued until the agent reaches a safe restart point instead of being injected mid-stream. Human/runtime/background payload delivery is unchanged. **Default:** ``false``.
   * - ``allow_midstream_peer_updates_before_checklist_submit``
     - boolean or null
     - No
     - Checklist-gated override for ``defer_peer_updates_until_restart``. When enabled, peer updates may still arrive mid-stream until the agent records its first accepted ``submit_checklist`` for the current answer. ``null`` uses the orchestrator default policy. **Default:** ``null``.
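
To make the ``answer_novelty_requirement`` thresholds concrete, here is a minimal sketch of a token-overlap check. The thresholds (70% and 50%) come from the table; everything else is an assumption, including the tokenization (lowercased whitespace splitting) and the overlap definition (share of the new answer's unique tokens already present in an existing answer). ``token_overlap`` and ``is_novel`` are hypothetical names.

.. code-block:: python

   # Maximum tolerated overlap per setting; None means no similarity check.
   OVERLAP_LIMITS = {"lenient": None, "balanced": 0.70, "strict": 0.50}


   def token_overlap(new_answer: str, existing_answer: str) -> float:
       """Share of the new answer's unique tokens that also occur in the existing answer."""
       new_tokens = set(new_answer.lower().split())
       if not new_tokens:
           return 1.0  # an empty answer adds nothing new
       existing_tokens = set(existing_answer.lower().split())
       return len(new_tokens & existing_tokens) / len(new_tokens)


   def is_novel(new_answer: str, existing_answers: list, requirement: str = "lenient") -> bool:
       """Accept the new answer unless it overlaps too much with any existing one."""
       limit = OVERLAP_LIMITS[requirement]
       if limit is None:
           return True
       return all(token_overlap(new_answer, old) <= limit for old in existing_answers)


   existing = ["use a hash map to count words"]
   rephrased = "use a hash map to count the words"       # 7 of 8 tokens overlap
   different = "sort the list then scan adjacent pairs"  # no shared tokens

   print(is_novel(rephrased, existing, "balanced"))
   print(is_novel(different, existing, "strict"))

Under these assumptions the rephrased answer is rejected at ``balanced`` because its overlap (0.875) is above 0.70, while the different approach passes even at ``strict``.
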

**Example Configurations:**

Fast but thorough (recommended for balanced evaluation):

.. code-block:: yaml

   orchestrator:
     voting_sensitivity: "balanced"       # Critical evaluation
     max_new_answers_per_agent: 2         # But cap at 2 tries
     max_new_answers_global: 8            # Stop global churn in long runs
     answer_novelty_requirement: "balanced"  # Must actually improve
     fairness_enabled: true
     fairness_lead_cap_answers: 2
     max_midstream_injections_per_round: 2
     defer_peer_updates_until_restart: false

Maximum quality with bounded time:

.. code-block:: yaml

   orchestrator:
     voting_sensitivity: "strict"          # Highest quality bar
     max_new_answers_per_agent: 3
     max_new_answers_global: 12
     answer_novelty_requirement: "strict"   # Only accept real improvements

Quick convergence:

.. code-block:: yaml

   orchestrator:
     voting_sensitivity: "lenient"
     max_new_answers_per_agent: 1
     max_new_answers_global: 3
     answer_novelty_requirement: "lenient"

Decomposition mode (recommended defaults):

.. code-block:: yaml

   orchestrator:
     coordination_mode: "decomposition"
     presenter_agent: "integrator"
     # In decomposition mode, use a lower per-agent cap than parallel voting mode.
     # This cap is consecutive and resets when the agent sees new external answers.
     max_new_answers_per_agent: 2  # Recommended range: 2-3
     # Add a global cap for deterministic total coordination budget.
     max_new_answers_global: 9
     answer_novelty_requirement: "balanced"
     fairness_enabled: true
     fairness_lead_cap_answers: 2
     max_midstream_injections_per_round: 2

Ensemble pattern (recommended defaults):

.. code-block:: yaml

   orchestrator:
     # Agents work independently — no peer answer injection
     disable_injection: true
     # Wait for all agents to finish before voting begins
     defer_voting_until_all_answered: true
     # Each agent produces 1 answer (adjustable)
     max_new_answers_per_agent: 1
     # Winner synthesizes from all answers
     final_answer_strategy: "synthesize"

The **ensemble pattern** is a coordination strategy where agents produce answers
independently (no peer visibility), then vote on the best answer, and the winner
synthesizes insights from all others into a refined final answer.

**When to use ensemble mode:**

- You want diverse, independent perspectives without agents anchoring on each
  other's work
- The task benefits from competitive parallel attempts rather than iterative
  refinement (e.g., creative writing, design proposals, solution brainstorming)
- You want faster coordination — single round of production + vote, no
  multi-round iteration

**Subagent default:** Multi-agent subagent runs use ensemble defaults
automatically (``disable_injection: true``, ``defer_voting_until_all_answered:
true``). Override by setting these fields explicitly in
``subagent_orchestrator`` config.

.. list-table:: Ensemble vs Standard Voting vs Decomposition
   :header-rows: 1

   * - Aspect
     - Standard voting
     - Ensemble pattern
     - Decomposition
   * - Peer visibility
     - Agents see each other's answers
     - Agents work in isolation
     - Agents see subtask assignments
   * - Iteration
     - Multiple refinement rounds
     - Single round of production
     - Multiple rounds per subtask
   * - Voting
     - After iterative refinement
     - After all answers produced
     - No voting (presenter assembles)
   * - Final answer
     - Winner presents
     - Winner synthesizes from all
     - Presenter integrates subtasks
   * - Best for
     - Deep quality refinement
     - Diverse perspectives, speed
     - Complex multi-part tasks

Timeout Configuration
~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Description
   * - ``orchestrator_timeout_seconds``
     - integer
     - No
     - Maximum time for orchestrator coordination in seconds (default: 1800 = 30 minutes)
   * - ``initial_round_timeout_seconds``
     - integer or null
     - No
     - Soft timeout for round 0 (initial answer). After this time, a warning is injected telling the agent to wrap up. Set to ``null`` to disable (default: disabled)
   * - ``subsequent_round_timeout_seconds``
     - integer or null
     - No
     - Soft timeout for rounds 1+ (voting/refinement). After this time, a warning is injected telling the agent to wrap up. Set to ``null`` to disable (default: disabled)
   * - ``round_timeout_grace_seconds``
     - integer
     - No
     - Grace period after soft timeout before hard timeout kicks in. After hard timeout, only ``vote`` and ``new_answer`` tools are allowed (default: 120 seconds)
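
The soft timeout, grace period, and hard timeout form three phases within a round. The sketch below models them under two assumptions that the table does not confirm: a disabled soft timeout also disables the hard timeout, and each phase starts exactly at its boundary. ``round_phase`` and the phase names are hypothetical.

.. code-block:: python

   from typing import Optional


   def round_phase(elapsed: float, soft_timeout: Optional[float], grace: float = 120) -> str:
       """Classify elapsed round time (seconds) as 'normal', 'wrap_up', or 'restricted'."""
       if soft_timeout is None:
           return "normal"  # soft timeout disabled (assumed to disable the hard one too)
       if elapsed < soft_timeout:
           return "normal"
       if elapsed < soft_timeout + grace:
           return "wrap_up"  # warning injected, agent asked to finish
       return "restricted"  # hard timeout: only vote and new_answer remain


   # With a 600 s soft timeout and the default 120 s grace period:
   for seconds in (100, 650, 720):
       print(seconds, round_phase(seconds, soft_timeout=600))
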

Context Path
~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Description
   * - ``path``
     - string
     - Yes
     - Absolute path to directory
   * - ``permission``
     - string
     - Yes
     - "read" or "write"

UI
~~

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Description
   * - ``display_type``
     - string
     - No
     - "rich_terminal" or "simple"
   * - ``logging_enabled``
     - boolean
     - No
     - Enable/disable logging

Filesystem Memory
~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Description
   * - ``enabled``
     - boolean
     - No
     - Enable filesystem memory and auto-compression (default: true)
   * - ``compression``
     - object
     - No
     - Compression settings (see below)

Filesystem Memory Compression
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Required
     - Description
   * - ``trigger_threshold``
     - float
     - No
     - Fraction of the context window in use (0.0-1.0) at which compression is triggered (default: 0.75)
   * - ``target_ratio``
     - float
     - No
     - Target fraction of the context window in use (0.0-1.0) after compression (default: 0.20)
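
A short worked example shows how the two compression values relate. This is a sketch only: ``compression_target`` is a hypothetical helper, the 200,000-token window is an example figure, and triggering at exactly the threshold (rather than strictly above it) is an assumption.

.. code-block:: python

   def compression_target(used_tokens: int, context_window: int,
                          trigger_threshold: float = 0.75, target_ratio: float = 0.20):
       """Return the token count to compress down to, or None if below the trigger."""
       if used_tokens / context_window < trigger_threshold:
           return None
       return round(context_window * target_ratio)


   # Example window of 200,000 tokens: the trigger sits at 150,000 tokens used.
   print(compression_target(100_000, 200_000))  # 50% used, no compression
   print(compression_target(160_000, 200_000))  # 80% used, compress to 20% of the window
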

See Also
--------

* :doc:`../quickstart/configuration` - Configuration guide
* :doc:`../user_guide/tools/mcp_integration` - MCP configuration details
* :doc:`../user_guide/files/project_integration` - Context paths setup
* :doc:`cli` - CLI parameters


---

## user_guide/advanced/agent_communication.rst

Agent Communication
===================

MassGen supports collaborative problem-solving through agent-to-agent communication
and optional human participation. This enables agents to coordinate, ask for help,
and work together more effectively during complex tasks.

Overview
--------

The communication system allows agents to:

- Ask other agents questions using the ``ask_others()`` tool
- Request input, suggestions, or help during coordination
- Coordinate on shared resources or dependencies
- Optionally include the human user in discussions

Communication is handled through a **broadcast channel** that:

1. Spawns **shadow agents** in parallel to generate responses (agent mode)
2. Collects responses asynchronously without interrupting working agents
3. Returns responses to the requesting agent
4. Optionally prompts the human user for input (human mode)
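
The parallel collect-and-return flow in the list above can be sketched with ``asyncio``. This is not MassGen's broadcast channel: ``shadow_reply`` and ``ask_others_sketch`` are hypothetical stand-ins, the simulated delay replaces a real model call, and the ``"timeout"`` status value is an assumption (only ``"complete"`` appears in this guide).

.. code-block:: python

   import asyncio


   async def shadow_reply(responder_id: str, question: str, delay: float) -> dict:
       """Stand-in for a shadow agent; a real one would call a model here."""
       await asyncio.sleep(delay)
       return {"responder_id": responder_id, "content": f"reply to: {question}", "is_human": False}


   async def ask_others_sketch(question: str, responders: dict, timeout: float) -> dict:
       """Query all responders in parallel and keep what arrives before the timeout."""
       tasks = [
           asyncio.create_task(shadow_reply(responder_id, question, delay))
           for responder_id, delay in responders.items()
       ]
       done, pending = await asyncio.wait(tasks, timeout=timeout)
       for task in pending:
           task.cancel()  # drop responders that missed the deadline
       responses = [task.result() for task in tasks if task in done]
       return {"status": "timeout" if pending else "complete", "responses": responses}


   # Two simulated responders with short delays, well inside the timeout.
   result = asyncio.run(
       ask_others_sketch("Which auth pattern?", {"agent_b": 0.01, "agent_c": 0.02}, timeout=1.0)
   )
   for response in result["responses"]:
       print(f"{response['responder_id']}: {response['content']}")

The requesting side only awaits the combined result, so responders run concurrently and the total wait is bounded by the timeout rather than by the sum of individual response times.
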

.. note::
   **Backend Limitation**: The ``claude_code`` backend does not currently support
   broadcasting/``ask_others()``. When Claude Code agents attempt to use ``ask_others()``,
   they will see an error message. This is a known limitation tracked in
   `GitHub Issue #648 <https://github.com/massgen/MassGen/issues/648>`_.

   Use other backends (``openai``, ``claude``, ``gemini``, etc.) for agents that need
   to participate in broadcasts.

Communication Modes
-------------------

There are three broadcast modes:

**Disabled (default)**
   Broadcasting is completely disabled. Agents work independently.

   .. code-block:: yaml

      orchestrator:
        coordination:
          broadcast: false

**Agent-to-agent**
   Agents can communicate with each other. Each question is broadcast to all
   other agents, any of which can respond.

   .. code-block:: yaml

      orchestrator:
        coordination:
          broadcast: "agents"

**Human-only**
   Agents can ask questions directly to the human user. Other agents are NOT
   prompted - only the human responds. This is useful when you want human
   guidance without agent cross-talk.

   .. code-block:: yaml

      orchestrator:
        coordination:
          broadcast: "human"

Basic Usage
-----------

The ``ask_others()`` tool blocks until all responses arrive or ``broadcast_timeout`` expires, then returns them:

.. code-block:: python

   # Agent calls ask_others()
   result = ask_others("What authentication patterns are already implemented in the codebase?")

   # Tool blocks and waits for responses
   # Returns: {
   #   "status": "complete",
   #   "responses": [
   #     {"responder_id": "agent_b", "content": "Use OAuth2...", "is_human": False},
   #     {"responder_id": "agent_c", "content": "I agree...", "is_human": False}
   #   ]
   # }

   # Agent can now use responses
   for response in result["responses"]:
       print(f"{response['responder_id']}: {response['content']}")

Configuration Options
---------------------

All broadcast settings are in the orchestrator's coordination config:

.. code-block:: yaml

   orchestrator:
     coordination:
       # Broadcast mode: false (disabled) | "agents" | "human"
       broadcast: "agents"

       # Maximum time to wait for responses (seconds)
       broadcast_timeout: 300

       # Maximum active broadcasts per agent
       max_broadcasts_per_agent: 10

       # How frequently agents use ask_others() (optional)
       # Options: "low" | "medium" | "high"
       broadcast_sensitivity: "medium"

       # Response depth for shadow agents (test-time compute scaling)
       # Controls how thorough/complex suggested solutions should be
       # Options: "low" | "medium" | "high"
       response_depth: "medium"

Complete Configuration Examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Agent-to-Agent Communication**

.. code-block:: yaml

   # massgen/configs/broadcast/test_broadcast_agents.yaml
   agents:
     - id: agent_a
       backend:
         type: openai
         model: gpt-5
         cwd: workspace1
         enable_mcp_command_line: true
         command_line_execution_mode: docker

     - id: agent_b
       backend:
         type: gemini
         model: gemini-3-pro-preview
         cwd: workspace2
         enable_mcp_command_line: true
         command_line_execution_mode: docker

   orchestrator:
     snapshot_storage: snapshots
     agent_temporary_workspace: temp_workspaces

     coordination:
       broadcast: "agents"  # Enable agent-to-agent communication
       broadcast_sensitivity: "high"  # Agents use ask_others() frequently
       response_depth: "medium"  # Balanced solution complexity
       broadcast_timeout: 300
       max_broadcasts_per_agent: 10

**Human-in-the-Loop Communication**

.. code-block:: yaml

   # massgen/configs/broadcast/test_broadcast_human.yaml
   agents:
     - id: agent_a
       backend:
         type: openai
         model: gpt-5
         cwd: workspace1
         enable_mcp_command_line: true
         command_line_execution_mode: docker

     - id: agent_b
       backend:
         type: gemini
         model: gemini-3-pro-preview
         cwd: workspace2
         enable_mcp_command_line: true
         command_line_execution_mode: docker

   orchestrator:
     snapshot_storage: snapshots
     agent_temporary_workspace: temp_workspaces

     coordination:
       broadcast: "human"  # Human will be prompted for responses
       broadcast_sensitivity: "high"
       response_depth: "medium"  # Balanced solution complexity
       broadcast_timeout: 60  # Shorter timeout for interactive sessions
       max_broadcasts_per_agent: 5

Human Participation
-------------------

When ``broadcast: "human"`` is enabled, the human user is the sole responder.
Other agents are NOT prompted - only the human answers questions:

.. code-block:: yaml

   orchestrator:
     coordination:
       broadcast: "human"

**What happens:**

1. Agent calls ``ask_others("Question here")``
2. Human sees notification in terminal (other agents are NOT notified):

   .. code-block:: text

      ======================================================================
      📢 BROADCAST FROM AGENT_A
      ======================================================================

      What authentication patterns are already implemented in the codebase?

      ──────────────────────────────────────────────────────────────────────
      Options:
        • Type your response and press Enter
        • Press Enter alone to skip
        • You have 300 seconds to respond
      ======================================================================
      Your response (or Enter to skip):

3. Human can:

   - Type response and press Enter
   - Press Enter to skip (no response)
   - Wait for timeout (no response)

4. Human's response is returned to the requesting agent
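
The three outcomes in step 3 (answer, skip, timeout) can be sketched with ``asyncio.wait_for``. This is an illustration under stated assumptions, not MassGen's prompt code: ``collect_human_response`` and the ``read_line`` callable are hypothetical names, and the real terminal handling is not shown.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def collect_human_response(
    read_line: Callable[[], Awaitable[str]],
    timeout: float,
) -> Optional[str]:
    """Return the human's answer, or None when they skip or the timeout expires."""
    try:
        text = await asyncio.wait_for(read_line(), timeout=timeout)
    except asyncio.TimeoutError:
        return None  # waiting past the timeout counts as "no response"
    text = text.strip()
    return text or None  # pressing Enter alone counts as a skip
```

Injecting ``read_line`` keeps the sketch independent of how input is actually read, so it can be exercised with a stub coroutine.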

Human Q&A Context Injection
~~~~~~~~~~~~~~~~~~~~~~~~~~~

When multiple agents run in parallel, they may ask the human similar questions.
MassGen prevents redundant prompts through **serialization** and **Q&A history reuse**.

**Serialization:**

In human mode, ``ask_others()`` calls are serialized - only one agent can prompt the
human at a time. If Agent B calls ``ask_others()`` while Agent A is waiting for a
response, Agent B waits until Agent A's request completes.

**Q&A History Reuse:**

Once a human has answered any question, subsequent ``ask_others()`` calls return
the existing Q&A history **without prompting the human again**:

.. code-block:: json

   {
     "status": "deferred",
     "responses": [],
     "human_qa_history": [
       {"question": "What color theme?", "answer": "Dark mode"}
     ],
     "human_qa_note": "The human has already answered questions this session. Review the history above..."
   }

The agent receives the existing Q&A history to check if their question was already
answered. If the existing Q&A answers their question, they can use it directly.
If their question needs different information, they can call ``ask_others()`` again
with a more specific question (which will prompt the human for new input).

**How it works:**

1. Agent A calls ``ask_others("What color theme?")`` → acquires lock → prompts human
2. Agent B calls ``ask_others("What style?")`` → waits for lock...
3. Human answers "Dark mode" → Q&A stored → Agent A gets response → lock released
4. Agent B acquires lock → sees Q&A history exists → returns "deferred" with Q&A (NO prompt!)
5. Agent B uses existing Q&A or asks a different question

**Key points:**

- Human is only prompted **once** - subsequent calls return existing Q&A
- ``ask_others()`` calls are serialized (one at a time) in human mode
- Q&A history persists across all turns within a session
- Agents can call ``ask_others()`` again with a different question if needed
- **Note:** Q&A history reuse only applies to ``broadcast: "human"`` mode, not agent-to-agent communication

Examples
--------

Example 1: Coordinating on Shared Resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Agent A is about to modify shared code
   result = ask_others(
       "I'm about to refactor the authentication module to use OAuth2. "
       "Any concerns or conflicts with your current work?"
   )

   # Check responses
   for response in result["responses"]:
       if "concern" in response["content"].lower():
           # Address concerns before proceeding
           print(f"⚠️  {response['responder_id']} has concerns: {response['content']}")

Example 2: Getting Help When Stuck
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Agent is stuck on a bug
   result = ask_others(
       "I'm seeing a weird authentication error: 'Token signature invalid'. "
       "I've verified the secret key is correct. Any ideas what might cause this?"
   )

   # Review suggestions
   for response in result["responses"]:
       print(f"💡 Suggestion from {response['responder_id']}: {response['content']}")

Example 3: Hitting a Blocker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Agent hits a blocker and needs human input
   result = ask_others(
       "The Google Maps API requires authentication and I don't have credentials. "
       "Should I use a free alternative like OpenStreetMap, or do you have API keys I should use?"
   )

   # Use human's guidance to proceed
   for response in result["responses"]:
       print(f"Guidance: {response['content']}")

Example 4: Multiple Viable Approaches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Agent needs help choosing between approaches
   result = ask_others(
       "I can implement the data visualization using either Chart.js (simpler, lighter) "
       "or D3.js (more powerful, steeper learning curve). "
       "Which would you prefer given your needs?"
   )

   # Implement based on preference
   for response in result["responses"]:
       print(f"Decision: {response['content']}")

Example 5: Human Participation (Design Preferences)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With ``broadcast: "human"`` enabled:

.. code-block:: python

   # Agent asks for human preferences on design decisions
   result = ask_others(
       "For the portfolio website, would you prefer: "
       "(A) a single-page design with smooth scrolling, or "
       "(B) multiple pages with navigation? "
       "Also, should I include a contact form or just list your email?"
   )

   # In human mode, only the human responds
   for response in result["responses"]:
       print(f"👤 Human: {response['content']}")

Example 6: Clarifying Requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Agent needs clarification on vague requirements
   result = ask_others(
       "You mentioned 'professional styling' for the landing page. "
       "Do you want a corporate/minimalist look, or something more colorful/creative? "
       "Any specific colors or brand guidelines to follow?"
   )

   # Use clarified requirements
   for response in result["responses"]:
       print(f"Requirements: {response['content']}")

Example 7: Using Human Q&A History (Deferred Response)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When Q&A history exists, ``ask_others()`` returns immediately with status
"deferred" instead of prompting the human again:

.. code-block:: python

   # Agent B calls ask_others (Agent A already asked the human earlier)
   result = ask_others("What database should we use?")

   # Check if we got a deferred response (Q&A history exists)
   if result["status"] == "deferred":
       print("Human was NOT prompted - using existing Q&A history:")
       for qa in result["human_qa_history"]:
           print(f"  Q: {qa['question']}")
           print(f"  A: {qa['answer']}")

       # Use existing answers or call ask_others with a different question
       # if more specific information is needed

   elif result["status"] == "complete":
       # This was the first ask_others call - human was prompted
       for response in result["responses"]:
           print(f"Human: {response['content']}")

Technical Details
-----------------

Shadow Agent Architecture (Agent Mode)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When an agent calls ``ask_others()`` in **agent mode** (``broadcast: "agents"``),
MassGen uses a **shadow agent architecture** to generate responses:

1. Broadcast created with unique ``request_id``
2. **Shadow agents** are spawned in **parallel** for each target agent
3. Each shadow agent:

   - Shares the parent agent's backend (stateless, safe to share)
   - Copies the parent agent's **full conversation history** (complete context)
   - Includes the parent's **current turn streaming content** (work in progress)
   - Uses a simplified system prompt (preserves identity/persona, removes workflow tools)
   - Generates a **tool-free** text response

4. All shadow agent responses are collected simultaneously via ``asyncio.gather()``
5. Parent agents continue working **uninterrupted** throughout
6. Informational messages are injected into parent agents ("FYI, you were asked X and responded Y")
7. Requesting agent receives all responses when complete
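
The parallel collection in steps 2 to 4 can be sketched as follows. This is a simplified illustration, not the orchestrator's code: ``shadow_respond`` is a hypothetical stand-in for a shadow agent's model call, and only the ``asyncio.gather()`` pattern and the result shape are taken from this page.

```python
import asyncio


async def shadow_respond(agent_id: str, question: str) -> dict:
    """Hypothetical stand-in for one shadow agent producing a tool-free text reply."""
    await asyncio.sleep(0.01)  # placeholder for the model call
    return {
        "responder_id": agent_id,
        "content": f"{agent_id} reply to: {question}",
        "is_human": False,
    }


async def broadcast(question: str, sender: str, agent_ids: list) -> dict:
    """Spawn one shadow responder per target agent and collect all replies."""
    targets = [agent_id for agent_id in agent_ids if agent_id != sender]
    # gather() runs the responders concurrently and returns results in target order.
    responses = await asyncio.gather(
        *(shadow_respond(agent_id, question) for agent_id in targets)
    )
    return {"status": "complete", "responses": list(responses)}
```

Because the shadows are separate coroutines, the parent agents are never awaited here, which mirrors the "uninterrupted" property described in step 5.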

**Why Shadow Agents:**

The shadow agent architecture was chosen for two key reasons:

1. **True Parallelization**: Shadow agents run completely in parallel without blocking
   or interrupting the parent agents. The parent agent continues its work while its
   shadow responds to the broadcast. This maximizes throughput and prevents deadlocks.

2. **System Prompt Control**: Shadow agents use a simplified system prompt that:

   - Preserves the parent's identity and persona (user-configured system message)
   - Removes workflow tools (no ``vote``, ``new_answer``, ``ask_others``)
   - Focuses purely on responding to the question
   - Prevents the shadow from taking unintended actions or getting confused

**Full Context Responses:**

Shadow agents have access to the parent agent's **complete context**, including:

- **Conversation history**: All prior messages and decisions
- **Current turn content**: The parent's in-progress streaming output (work being generated)

This means:

- Responding agents understand their own prior work and current activities
- Responses can reference context from earlier in the conversation
- Responses account for what the parent is actively working on
- Questions can assume the responder has relevant background

**Example:**

.. code-block:: python

   # ✅ Both work well with shadow agents
   ask_others("What do you think about this approach?")  # Shadow has context!

   ask_others(
       "I'm considering using React with TypeScript for the frontend. "
       "Do you see any issues with this choice?"
   )

**Parent Agent Awareness:**

After a shadow agent responds, an informational message is injected into the
parent agent's conversation history:

.. code-block:: text

   [INFO] While you were working, agent_a asked: "Should we use PostgreSQL?"
   Your shadow agent responded: "Yes, PostgreSQL is a good choice because..."
   (This is just for your awareness - you may continue your work.)

Response Depth (Test-Time Compute Scaling)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``response_depth`` parameter controls **test-time compute scaling** for shadow
agent responses. The idea draws on recent AI research suggesting that allocating
more compute at inference time can lead to better results (e.g., OpenAI's o1/o3 models).

In MassGen, shadow agents responding to broadcasts represent a form of test-time compute.
The ``response_depth`` parameter allows you to control this scaling - similar to how a
human might decide how much effort to put into a response based on task importance.

**Options:**

``low``
   Quick, simple responses with minimal solutions. Shadow agents will:

   - Prefer basic technologies (vanilla HTML/CSS/JS, simple libraries)
   - Avoid complex frameworks or architectures
   - Focus on getting the job done with minimal dependencies
   - Keep responses brief and to the point

``medium`` (default)
   Balanced effort with standard solutions. Shadow agents will:

   - Use appropriate technology for the task complexity
   - Include standard best practices without over-engineering
   - Balance simplicity with maintainability
   - Be concise but thorough

``high``
   Thorough, comprehensive responses with sophisticated solutions. Shadow agents will:

   - Recommend modern frameworks and best practices (React, Next.js, TypeScript, etc.)
   - Include architecture considerations (SSR, component libraries, testing, CI/CD)
   - Suggest professional-grade tooling and patterns
   - Provide detailed responses with examples

**Example:**

.. code-block:: yaml

   orchestrator:
     coordination:
       broadcast: "agents"
       response_depth: "high"  # Get sophisticated, comprehensive suggestions

**Use Cases:**

- Use ``low`` for quick prototypes or learning projects
- Use ``medium`` (default) for standard development work
- Use ``high`` for production systems requiring enterprise-grade solutions

Serialized Human Mode
~~~~~~~~~~~~~~~~~~~~~

When an agent calls ``ask_others()`` in **human mode** (``broadcast: "human"``):

1. Agent acquires the ``ask_others`` lock (waits if another agent holds it)
2. If Q&A history exists → returns "deferred" with history (no human prompt)
3. If no Q&A history → prompts human and waits for response
4. Response stored in Q&A history
5. Lock released, next waiting agent proceeds

This ensures:

- Human sees only **one prompt at a time**
- Subsequent agents get existing Q&A without re-prompting
- Q&A history persists across all turns in the session
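
The lock-then-check flow above can be sketched with ``asyncio.Lock``. This is a minimal model under stated assumptions: the ``HumanBroadcastChannel`` class, the ``prompt_human`` callable, and the ``"human"`` responder id are hypothetical, and the sketch does not model how a later, more specific question triggers a fresh prompt.

```python
import asyncio


class HumanBroadcastChannel:
    """Hypothetical model of serialized ask_others() calls in human mode."""

    def __init__(self, prompt_human):
        self._lock = asyncio.Lock()        # only one agent may prompt at a time
        self._qa_history = []              # shared across agents for the session
        self._prompt_human = prompt_human  # coroutine: question -> answer

    async def ask_others(self, question: str) -> dict:
        async with self._lock:
            if self._qa_history:
                # History exists: return it instead of prompting again.
                return {
                    "status": "deferred",
                    "responses": [],
                    "human_qa_history": list(self._qa_history),
                }
            answer = await self._prompt_human(question)
            self._qa_history.append({"question": question, "answer": answer})
            return {
                "status": "complete",
                "responses": [
                    {"responder_id": "human", "content": answer, "is_human": True}
                ],
            }
```

When two agents call ``ask_others()`` concurrently on the same channel, the first caller prompts the human and the second receives the ``deferred`` result once the lock is released.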

Broadcast Tools
~~~~~~~~~~~~~~~

Built-in tools automatically available when broadcasts are enabled:

``ask_others(question: str)``
   Ask a question of the other agents or the human. Waits for and returns all responses.

``respond_to_broadcast(answer: str)``
   **(Deprecated)** This tool is no longer needed. With the shadow agent architecture,
   broadcast responses are handled automatically by shadow agents. If an agent calls
   this tool, it will receive a message indicating that responses are handled
   automatically and it should continue its work.

These are built-in workflow tools, not MCP servers.

Rate Limiting
~~~~~~~~~~~~~

Each agent can have at most ``max_broadcasts_per_agent`` active broadcasts
(default: 10). This prevents agents from spamming broadcasts.
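
One way to picture the limit is a per-agent set of active request ids. The sketch below is illustrative only: ``BroadcastLimiter`` and its methods are hypothetical names, and how MassGen actually tracks active broadcasts is not described here.

```python
from collections import defaultdict


class BroadcastLimiter:
    """Hypothetical tracker for active broadcasts per agent."""

    def __init__(self, max_broadcasts_per_agent: int = 10):
        self.max_active = max_broadcasts_per_agent
        self._active = defaultdict(set)  # agent_id -> set of active request ids

    def try_start(self, agent_id: str, request_id: str) -> bool:
        """Register a broadcast; return False if the agent is at its limit."""
        active = self._active[agent_id]
        if len(active) >= self.max_active:
            return False
        active.add(request_id)
        return True

    def finish(self, agent_id: str, request_id: str) -> None:
        """Mark a broadcast as completed, freeing one slot for the agent."""
        self._active[agent_id].discard(request_id)
```

Counting active rather than total broadcasts means an agent regains capacity as soon as an earlier request completes.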

Troubleshooting
---------------

**Claude Code backend shows "No such tool available: ask_others"**
   - The ``claude_code`` backend does not currently support broadcasting
   - See `GitHub Issue #648 <https://github.com/massgen/MassGen/issues/648>`_ for status
   - Use other backends (``openai``, ``claude``, ``gemini``) for broadcast-enabled agents

**Broadcasts not working**
   - Check that ``broadcast`` is set to ``"agents"`` or ``"human"`` (not ``false``)
   - Verify that all agents are initialized and have an ``_orchestrator`` reference
   - Check logs for MCP tool injection messages

**Human prompts not appearing**
   - Ensure ``broadcast: "human"`` is set (not just ``"agents"``)
   - Check that ``coordination_ui`` is initialized
   - Verify timeout hasn't expired

**Timeouts occurring**
   - Increase ``broadcast_timeout`` if agents need more time to respond
   - Check agent logs to see if they're receiving broadcasts
   - Verify agents aren't stuck or errored

See Also
--------

- :doc:`agent_task_planning` - Task planning system for organizing work
- :doc:`../../reference/yaml_schema` - Complete YAML configuration reference


---

## user_guide/advanced/agent_task_planning.rst

:orphan:

Agent Task Planning
===================

Agent Task Planning enables agents to organize their work into structured task lists with dependencies during coordination.

.. note::

   **New in v0.1.7**: Agents can create and manage task plans with dependency tracking.

Quick Start
-----------

Enable task planning in your config:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_agent_task_planning: true
       max_tasks_per_plan: 10

Or use the example:

.. code-block:: bash

   massgen --config @examples/configs/tools/todo/example_task_todo.yaml \
     "Create a website about Bob Dylan"

Available Tools
---------------

When enabled, agents receive 9 MCP planning tools:

* ``create_task_plan()`` - Create a task plan with optional dependencies
* ``add_task()`` - Add tasks dynamically
* ``update_task_status()`` - Mark tasks as in_progress or completed
* ``get_task_plan()`` - View current plan with statistics
* ``get_ready_tasks()`` - See tasks ready to start
* ``get_blocked_tasks()`` - See tasks waiting on dependencies
* ``edit_task()`` - Update task descriptions
* ``delete_task()`` - Remove tasks
* ``clear_task_plan()`` - Clear the current plan to start fresh (used internally on agent restart)

Example Usage
-------------

Creating a plan with dependencies:

.. code-block:: python

   create_task_plan([
       "Research OAuth providers",
       {
           "description": "Implement OAuth flow",
           "depends_on": [0]  # Depends on task 0
       },
       {
           "description": "Write tests",
           "depends_on": [1]  # Depends on task 1
       }
   ])

Tracking progress:

.. code-block:: python

   update_task_status(task_id="task_0", status="completed")
   get_ready_tasks()  # Returns task_1 (now unblocked)

Configuration
-------------

.. list-table::
   :header-rows: 1
   :widths: 30 20 50

   * - Option
     - Default
     - Description
   * - ``enable_agent_task_planning``
     - false
     - Enable task planning tools
   * - ``max_tasks_per_plan``
     - 10
     - Maximum number of tasks in each agent's plan

Features
--------

* **Dependency management** - Automatic validation and circular dependency detection
* **Per-agent isolation** - Each agent has its own task plan
* **Progress tracking** - Track pending, in_progress, and completed tasks
* **Automatic unblocking** - Tasks become ready when dependencies complete
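
Circular dependency detection and automatic unblocking can be sketched in a few lines. This is an illustration, not the planning tools' implementation: ``has_cycle`` and ``ready_tasks`` are hypothetical helpers, and the plain dictionaries stand in for whatever structure the tools keep internally.

```python
def has_cycle(depends_on: dict) -> bool:
    """Detect a circular dependency with a depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {task: WHITE for task in depends_on}

    def visit(task) -> bool:
        color[task] = GREY
        for dep in depends_on.get(task, []):
            state = color.get(dep, WHITE)
            if state == GREY:  # reached a task already on the current path
                return True
            if state == WHITE and visit(dep):
                return True
        color[task] = BLACK
        return False

    return any(color[task] == WHITE and visit(task) for task in list(depends_on))


def ready_tasks(status: dict, depends_on: dict) -> list:
    """Pending tasks whose dependencies are all completed."""
    return [
        task
        for task, state in status.items()
        if state == "pending"
        and all(status.get(dep) == "completed" for dep in depends_on.get(task, []))
    ]
```

With the three-task plan from Example Usage, marking ``task_0`` as completed makes ``task_1`` the only ready task, matching the behavior described for ``get_ready_tasks()``.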

See Also
--------

* :doc:`planning_mode` - Combine with planning mode for complex workflows
* :doc:`../tools/mcp_integration` - Learn about other MCP tools


---

## user_guide/advanced/change_documents.rst

Change Documents
================

Change Documents (changedocs) are decision journals that agents write alongside their answers during coordination. They capture **why** each decision was made, **what alternatives** were considered, and **where in the code** each decision lives --- creating a traceable record from reasoning to implementation.

.. note::

   Change documents are enabled by default. To disable them, set ``enable_changedoc: false`` in coordination config.

Why Change Documents?
---------------------

When AI agents produce code, the reasoning behind their decisions is usually lost. You see *what* was built but not *why*. Change documents solve this by recording decisions in real-time as agents work:

* **Decision provenance** --- every significant choice is documented with rationale
* **Code traceability** --- each decision points to specific files, functions, and line numbers
* **Multi-agent attribution** --- track which agent introduced each idea, and which ideas were genuinely new
* **Deliberation history** --- see how decisions evolved as agents observed and refined each other's work

Quick Start
-----------

Change documents are enabled by default. Agents automatically write ``tasks/changedoc.md`` in their workspace:

.. code-block:: yaml

   # Default behavior --- changedoc is on
   orchestrator:
     coordination:
       enable_changedoc: true   # This is the default

To disable:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_changedoc: false

How It Works
------------

Agent Workflow
~~~~~~~~~~~~~~

Each agent follows this workflow during coordination:

1. **Create** ``tasks/changedoc.md`` as their first action
2. **Log decisions** in real-time as they make them (not after the fact)
3. **Reference code** with file paths, symbol names, and line numbers
4. **Submit answer** --- the changedoc is already up to date

When agents build on prior answers, they **inherit** the previous agent's changedoc and extend it with their own decisions.

Self-Reference Placeholder
~~~~~~~~~~~~~~~~~~~~~~~~~~

When writing a changedoc, agents use ``[SELF]`` wherever they would reference their own work. The orchestrator automatically replaces ``[SELF]`` with the agent's real answer label (e.g., ``agent1.2``) when the answer is submitted. This means:

* Agents don't need to know their own label in advance
* Other agents always see real labels, never placeholders
* The provenance chain is consistent and machine-readable

.. code-block:: markdown

   # What the agent writes:
   **Origin:** [SELF] --- NEW

   # What the next agent sees:
   **Origin:** agent1.2 --- NEW
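
The substitution itself is simple to picture. The sketch below assumes a plain string replacement and a label check based on the ``agentN.M`` format used on this page; ``resolve_self_placeholder`` is a hypothetical name, not the orchestrator's function.

```python
import re

# Answer labels on this page look like agent1.2 (agent number, answer number).
ANSWER_LABEL = re.compile(r"agent\d+\.\d+")


def resolve_self_placeholder(changedoc_text: str, answer_label: str) -> str:
    """Replace every [SELF] placeholder with the submitted answer's label."""
    if not ANSWER_LABEL.fullmatch(answer_label):
        raise ValueError(f"unexpected answer label: {answer_label!r}")
    return changedoc_text.replace("[SELF]", answer_label)
```

Validating the label first keeps a malformed value from being written into every provenance line at once.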

Observation Flow
~~~~~~~~~~~~~~~~

The orchestrator automatically includes changedoc content when agents observe each other's work:

.. code-block:: text

   Agent A writes changedoc with DEC-001, DEC-002
       |
       v
   Orchestrator reads tasks/changedoc.md from Agent A's workspace
       |
       v
   Agent B sees Agent A's answer + changedoc in <changedoc> tags
       |
       v
   Agent B inherits changedoc, modifies decisions, adds new ones
       |
       v
   Final presenter consolidates into definitive changedoc

Final Consolidation
~~~~~~~~~~~~~~~~~~~

The final presenter (winning agent) produces a consolidated changedoc that:

* Finalizes the decision list (removes superseded decisions)
* Updates all code references to point to the delivered files
* Preserves the deliberation trail showing how decisions evolved
* Marks which ideas were genuinely new contributions

Changedoc Structure
-------------------

A changedoc has four sections:

Header
~~~~~~

.. code-block:: markdown

   # Change Document

   **Based on:** agent1.1

The ``Based on`` field tracks which answer this changedoc inherits from, using MassGen's answer labels (e.g., ``agent1.1`` = agent 1's first answer, ``agent2.3`` = agent 2's third answer).

Decisions
~~~~~~~~~

Each decision records an Origin, a Choice, a rationale (``Why``), the alternatives considered, and an Implementation:

.. code-block:: markdown

   ### DEC-001: Use connection pooling for response time
   **Origin:** agent1.1 --- NEW
   **Choice:** Connection pooling with pgbouncer
   **Why:** Reduces query overhead from ~180ms to ~40ms
   **Alternatives considered:**
   - Caching: Doesn't handle cache misses within 200ms
   - Read replicas: Adds operational complexity
   **Implementation:**
   - `src/db/pool.py:L15-42` -> `ConnectionPool.__init__()` --- configures pool size and timeout
   - `src/db/pool.py:L44-68` -> `ConnectionPool.acquire()` --- checkout with retry logic

Key fields:

* **Origin** --- who first introduced this decision, using answer labels
* **NEW** marker --- flags genuinely novel ideas not present in any prior answer
* **Implementation** --- relative file paths, symbol names, and line numbers

When a decision is modified by a later agent:

.. code-block:: markdown

   ### DEC-002: Authentication approach
   **Origin:** agent1.1, modified by agent2.1
   **Choice:** JWT with refresh tokens
   **Why:** agent1.1 used session cookies, but JWT scales better for API clients
   **Implementation:**
   - `src/auth/jwt.py:L10-35` -> `create_token()` --- signs payload with RS256

Code References
~~~~~~~~~~~~~~~

All code references use relative paths within the workspace with both symbol names and line numbers:

.. code-block:: text

   Format: `relative/path/file.py:L10-25` -> `ClassName.method()` --- brief description

Line numbers are stable references because each agent's code is frozen once they submit their answer. When another agent reads the changedoc, the line numbers point to an immutable snapshot.
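
Because the format is regular, references can be extracted mechanically. The following is a sketch that assumes the exact layout shown above; ``CodeRef`` and ``parse_code_ref`` are hypothetical names and are not part of MassGen.

```python
import re
from typing import NamedTuple, Optional


class CodeRef(NamedTuple):
    path: str
    start: int
    end: int
    symbol: str
    description: str


# `path:Lstart-end` -> `symbol` --- description (backticks around symbol optional)
CODE_REF = re.compile(
    r"`(?P<path>[^`:]+):L(?P<start>\d+)(?:-(?P<end>\d+))?`"
    r"\s*->\s*`?(?P<symbol>[^`]+?)`?"
    r"\s*---\s*(?P<description>.+)"
)


def parse_code_ref(line: str) -> Optional[CodeRef]:
    """Parse one Implementation bullet; return None if it does not match."""
    match = CODE_REF.search(line)
    if match is None:
        return None
    start = int(match["start"])
    end = int(match["end"]) if match["end"] else start  # single-line reference
    return CodeRef(
        match["path"], start, end,
        match["symbol"].strip(), match["description"].strip(),
    )
```

A parsed reference can then be checked against the frozen snapshot, for example by confirming that the file exists and has at least ``end`` lines.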

Deliberation Trail
~~~~~~~~~~~~~~~~~~

The trail records what changed between agents and why:

.. code-block:: markdown

   ## Deliberation Trail

   ### agent2.1 (based on agent1.1):
   - DEC-001: Kept --- connection pooling approach is sound
   - DEC-002: Modified --- switched from session cookies to JWT (see rationale above)
   - DEC-003: NEW --- added rate limiting, not present in agent1.1's answer

   ### agent1.2 (based on agent2.1):
   - DEC-001: Kept
   - DEC-002: Kept agent2.1's JWT approach
   - DEC-003: Kept rate limiting, increased threshold from 100 to 500 req/min

The trail uses answer labels (``agent1.1``, ``agent2.1``) for precise provenance. You can trace any decision back through the chain to see who introduced it, who modified it, and why.

Decision Provenance
-------------------

Every decision tracks its origin through the refinement chain:

.. code-block:: text

   agent1.1 --- NEW                  Original idea, introduced by agent 1
   agent1.1, modified by agent2.1    Introduced by agent 1, later changed by agent 2
   agent2.1 --- NEW                  Genuinely new idea from agent 2

This lets you answer:

* **Where did this idea come from?** Check the Origin field.
* **Who contributed new thinking?** Look for ``NEW`` markers.
* **Did two agents build on the same source?** Compare ``Based on:`` headers --- if both say ``agent1.1``, they forked from the same point.
* **How did a decision evolve?** Read the Deliberation Trail entries for that DEC number.
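
The first two questions can be answered by scanning a changedoc for ``Origin`` lines. The sketch below assumes the decision and origin formats shown on this page; ``summarize_origins`` and ``new_contributions`` are hypothetical helpers, not a MassGen API.

```python
import re

DEC_HEADER = re.compile(r"### (?P<id>DEC-\d+):")
ORIGIN = re.compile(
    r"\*\*Origin:\*\*\s*(?P<origin>agent\d+\.\d+)"
    r"(?:,\s*modified by\s+(?P<modifier>agent\d+\.\d+))?"
    r"(?P<new>\s*---\s*NEW)?"
)


def summarize_origins(changedoc_text: str) -> dict:
    """Map each DEC id to its origin, modifier (if any), and NEW flag."""
    summary = {}
    current = None
    for raw in changedoc_text.splitlines():
        line = raw.strip()
        header = DEC_HEADER.match(line)
        if header:
            current = header["id"]  # following Origin line belongs to this decision
            continue
        origin = ORIGIN.match(line)
        if origin and current is not None:
            summary[current] = {
                "origin": origin["origin"],
                "modified_by": origin["modifier"],
                "is_new": origin["new"] is not None,
            }
    return summary


def new_contributions(summary: dict) -> dict:
    """Group decisions marked NEW by the answer label that introduced them."""
    result = {}
    for dec_id, info in summary.items():
        if info["is_new"]:
            result.setdefault(info["origin"], []).append(dec_id)
    return result
```

Deliberation Trail headings such as ``### agent2.1 (based on agent1.1):`` do not match ``DEC_HEADER``, so trail entries are ignored by this scan.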

Reading Changedocs in Logs
--------------------------

Changedocs are saved in the log directory alongside answers:

.. code-block:: text

   .massgen/massgen_logs/log_YYYYMMDD_HHMMSS/
   └── turn_1/
       └── attempt_1/
           ├── agent_a/
           │   └── YYYYMMDD_HHMMSS_NNNNNN/
           │       ├── answer.txt
           │       ├── changedoc.md            # Changedoc snapshot at this step
           │       └── workspace/
           │           └── tasks/
           │               └── changedoc.md    # Raw file from workspace
           ├── final/
           │   └── agent_a/
           │       └── changedoc.md            # Final consolidated changedoc
           └── ...

Each agent snapshot captures the changedoc at that point in time. The ``final/`` directory contains the presenter's consolidated version.
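
Given that layout, the changedoc files in a log directory can be collected with ``pathlib``. This is a sketch that relies only on the directory names shown in the tree above; ``find_changedocs`` is a hypothetical helper.

```python
from pathlib import Path


def find_changedocs(log_dir):
    """Return (snapshots, finals): changedoc.md paths split by location."""
    root = Path(log_dir)
    snapshots, finals = [], []
    for path in sorted(root.rglob("changedoc.md")):
        parts = path.relative_to(root).parts
        if "workspace" in parts:
            continue  # raw workspace copy; the snapshot beside answer.txt covers it
        if "final" in parts:
            finals.append(path)
        else:
            snapshots.append(path)
    return snapshots, finals
```

Sorting the paths puts the timestamped snapshot directories of each agent in chronological order, since their names start with ``YYYYMMDD_HHMMSS``.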

Configuration
-------------

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_changedoc: true    # Default: true

Change documents work with all coordination modes and all backends. They are independent of planning mode --- you get decision journals whether or not ``enable_planning_mode`` is set.

.. seealso::

   When combined with planning mode, changedocs become even more powerful --- agents document their *intended* approach during coordination, then the winning agent executes and updates code references to the final implementation.

   :doc:`planning_mode` --- Planning mode configuration and workflow

Example Output
--------------

Here is an example changedoc from a two-agent run creating a Python fun-facts terminal application:

.. code-block:: markdown

   # Change Document

   **Based on:** agent1.1

   ## Summary
   Interactive Python script with 35 fun facts across 5 categories,
   using Rich library for terminal formatting with validated input via Prompt.ask().

   ## Decisions

   ### DEC-001: Use Rich library for terminal formatting
   **Origin:** agent1.1 --- NEW
   **Choice:** Use the `rich` library for all terminal output
   **Why:** Professional terminal output with minimal code --- panels, tables, syntax
   highlighting, progress bars. Well-maintained and widely used.
   **Alternatives considered:**
   - ANSI escape codes: Too low-level and harder to maintain
   - Colorama: More low-level, requires more code for similar effects
   **Implementation:**
   - `fun_facts.py:L1-5` -> imports --- `from rich.console import Console`
   - `fun_facts.py:L45-80` -> `display_fact()` --- renders fact in styled Panel

   ### DEC-002: Validated input with Prompt.ask()
   **Origin:** agent1.1 --- NEW
   **Choice:** Use `Prompt.ask(choices=[...])` for all user input
   **Why:** Eliminates invalid input entirely, provides autocomplete UX
   **Alternatives considered:**
   - Basic `input()`: More error-prone (agent2.1 used this, switched away)
   **Implementation:**
   - `fun_facts.py:L82-95` -> `main()` --- menu loop with `Prompt.ask(choices=["1","2","3","4","5"])`

   ### DEC-003: Statistics view
   **Origin:** agent1.2 --- NEW
   **Choice:** Add collection statistics showing facts per category
   **Why:** Helps users understand collection scope, showcases Rich tables
   **Implementation:**
   - `fun_facts.py:L120-145` -> `show_statistics()` --- Rich Table with category counts

   ## Deliberation Trail

   ### agent1.1 (original):
   - Created DEC-001, DEC-002 with 35 facts across 5 categories

   ### agent2.1 (original, parallel):
   - Also chose Rich (DEC-001 convergence), but used basic `input()` and 20 facts

   ### agent1.2 (based on agent2.1):
   - DEC-001: Kept
   - DEC-002: Kept Prompt.ask() --- agent2.1's `input()` approach is more error-prone
   - DEC-003: NEW --- statistics view not in any prior answer

Next Steps
----------

* :doc:`planning_mode` --- Combine changedocs with planning mode for safe execution
* :doc:`../../user_guide/logging` --- Understanding the full log directory structure
* :doc:`agent_communication` --- How agents observe and respond to each other


---

## user_guide/advanced/computer_use.rst

Computer Use Tools
==================

MassGen provides powerful computer use tools that allow AI agents to autonomously control browsers and desktop environments. These tools enable agents to browse websites, interact with applications, execute commands, and complete complex multi-step workflows.

.. note::

   **Currently Available Tools:**

   * ``gemini_computer_use`` - Google Gemini Computer Use (requires ``gemini-2.5-computer-use-preview-10-2025`` model)
   * ``claude_computer_use`` - Anthropic Claude Computer Use (requires ``claude-sonnet-4-5`` or newer)
   * ``browser_automation`` - Simple browser automation (works with ANY model: gpt-4.1, gpt-4o, etc.)
   * ``computer_use`` - OpenAI Computer Use (requires ``computer-use-preview`` model from OpenAI)

     - WARNING: The OpenAI Computer Use tool has not been thoroughly tested because access to the ``computer-use-preview`` model is restricted. Performance is not guaranteed; use it with caution.

   * ``ui_tars_computer_use`` - UI-TARS Computer Use from ByteDance (open source)

**Environments:**

We try to accommodate as many systems as we can. In practice, we observe that computer use models work best when they start in a browser or in a Linux Docker container, so we recommend two environments:

* ``browser`` - Launch computer use agents in a browser, suitable for web tasks.
* ``linux docker`` - Launch computer use agents in a Docker container, suitable for all web and desktop tasks.

**Automatic Docker Setup:** MassGen will automatically create and configure the Docker container on first run when using a Docker-based computer use config. No manual setup required! The container includes Ubuntu 22.04 with Xfce desktop, X11 virtual display, xdotool, Firefox, Chromium, and scrot.

See the `setup guide <https://github.com/massgen/MassGen/blob/main/scripts/computer_use_setup.md>`_ for quick setup instructions for both environments, and the `visualization guide <https://github.com/massgen/MassGen/blob/main/massgen/backend/docs/COMPUTER_USE_VISUALIZATION.md>`_ for visualization instructions.

**Naming:**

Config files follow this naming convention: ``${TOOL_NAME}_computer_use_${ENVIRONMENT}_example.yaml``.

For example, to use Claude in the Linux Docker environment, use the config ``massgen/configs/tools/custom_tools/claude_computer_use_docker_example.yaml``.

If ``${ENVIRONMENT}`` is not specified, we use ``browser`` as default value.
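
The convention can be sketched as a small helper. This is only an illustration: ``config_name`` is a hypothetical function, not part of MassGen, and some shipped configs (for example ``claude_computer_use_browser_example.yaml``) spell out the browser environment explicitly.

.. code-block:: python

   from typing import Optional


   def config_name(tool_name: str, environment: Optional[str] = None) -> str:
       """Build a config filename following the naming convention above."""
       parts = [tool_name, "computer_use"]
       if environment:  # an omitted environment means the browser default
           parts.append(environment)
       parts.append("example.yaml")
       return "_".join(parts)


   print(config_name("claude", "docker"))  # claude_computer_use_docker_example.yaml
   print(config_name("gemini"))            # gemini_computer_use_example.yaml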

We welcome proposals of new tool and environment combinations!


Overview
--------

Computer use tools transform AI agents from text processors into active automation systems capable of:

* **Browser Automation** - Navigate websites, fill forms, extract data, search for information
* **Desktop Control** - Interact with applications, manage files, execute system commands
* **Visual Understanding** - Take screenshots and use visual feedback to guide actions
* **Multi-Step Workflows** - Chain together complex sequences of actions autonomously

Tool Comparison
---------------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20 20

   * - Feature
     - ``computer_use``
     - ``gemini_computer_use``
     - ``claude_computer_use``
     - ``browser_automation``
   * - **Model Support**
     - ``computer-use-preview`` only
     - ``gemini-2.5-computer-use-preview-10-2025`` only
     - ``claude-sonnet-4-5`` or newer
     - Any model
   * - **Provider**
     - OpenAI
     - Google
     - Anthropic
     - Any
   * - **Environments**
     - Browser, Linux/Docker, Mac, Windows
     - Browser, Linux/Docker
     - Browser, Linux/Docker
     - Browser only
   * - **Action Planning**
     - Autonomous multi-step
     - Autonomous multi-step
     - Autonomous multi-step
     - User-directed
   * - **Complexity**
     - High (full agentic)
     - High (full agentic)
     - High (full agentic)
     - Low (simple)
   * - **Safety Checks**
     - Built-in
     - Built-in + confirmations
     - Built-in
     - Manual
   * - **Performance**
     - Fast (~1-2 sec/action)
     - Fast (~1-2 sec/action)
     - Thorough (~2-5 sec/action)
     - Very Fast (~1 sec)
   * - **Best Use Case**
     - Complex workflows (OpenAI)
     - Complex workflows (Google)
     - Precision tasks (Anthropic)
     - Simple automation

Quick Start
-----------

**1. Simple Browser Automation (Works with Any Model)**

.. code-block:: bash

   # Install dependencies
   pip install playwright
   playwright install

   # Run with gpt-4.1 or any other model
   uv run massgen \
     --config massgen/configs/tools/custom_tools/simple_browser_automation_example.yaml \
     "Go to Wikipedia and search for Jimmy Carter"

**2. Gemini Computer Use**

Browser automation:

.. code-block:: bash

   # Set API key
   export GEMINI_API_KEY="your-api-key"

   # Run Gemini browser automation
   uv run massgen \
     --config massgen/configs/tools/custom_tools/gemini_computer_use_example.yaml \
     "Go to cnn.com and get the top headline"

Docker/Linux desktop automation:

.. code-block:: bash

   # Set API key
   export GEMINI_API_KEY="your-api-key"

   # Run Gemini desktop automation
   # Docker container is automatically created on first run!
   uv run massgen \
     --config massgen/configs/tools/custom_tools/gemini_computer_use_docker_example.yaml \
     "Open Firefox and search for Python documentation"

**3. Claude Computer Use (Docker/Linux)**

.. code-block:: bash

   # Set API key
   export ANTHROPIC_API_KEY="your-api-key"

   # Run Claude desktop automation
   # Docker container is automatically created on first run!
   uv run massgen \
     --config massgen/configs/tools/custom_tools/claude_computer_use_docker_example.yaml \
     "Navigate to Wikipedia and search for Artificial Intelligence"

Detailed Tool Guides
--------------------

1. Gemini Computer Use
~~~~~~~~~~~~~~~~~~~~~~

**Description:** Full implementation of Google's Gemini 2.5 Computer Use API with native computer control capabilities and built-in safety checks.

**Model Requirement:**

* **MUST use** ``gemini-2.5-computer-use-preview-10-2025`` model
* Will NOT work with other Gemini models

**Example Configuration (Browser)**

.. code-block:: yaml

   agents:
     - id: "gemini_automation_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-computer-use-preview-10-2025"  # Required!
         custom_tools:
           - name: ["gemini_computer_use"]
             path: "massgen/tool/_gemini_computer_use/gemini_computer_use_tool.py"
             function: ["gemini_computer_use"]
             preset_args:
               environment: "browser"
               display_width: 1440  # Recommended by Gemini
               display_height: 900  # Recommended by Gemini
               environment_config:
                 headless: false  # Set to true for headless
                 browser_type: "chromium"

   ui:
     display_type: "rich_terminal"

**Supported Actions:**

* ``open_web_browser`` - Open browser
* ``click_at`` - Click at coordinates (normalized 0-1000)
* ``hover_at`` - Hover at coordinates
* ``type_text_at`` - Type text at coordinates
* ``key_combination`` - Press key combinations
* ``scroll_document`` - Scroll entire page
* ``scroll_at`` - Scroll specific area
* ``navigate`` - Go to URL
* ``go_back`` / ``go_forward`` - Browser navigation
* ``search`` - Go to search engine
* ``wait_5_seconds`` - Wait for content
* ``drag_and_drop`` - Drag elements
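
Because coordinate-based actions such as ``click_at`` use a normalized 0-1000 grid, they have to be scaled to the actual display size before execution. The sketch below shows that arithmetic; ``denormalize`` is a hypothetical helper written for illustration, not the tool's actual code.

.. code-block:: python

   from typing import Tuple


   def denormalize(x: int, y: int, width: int, height: int) -> Tuple[int, int]:
       """Map normalized 0-1000 coordinates to pixel coordinates."""
       if not (0 <= x <= 1000 and 0 <= y <= 1000):
           raise ValueError("normalized coordinates must be in the range 0-1000")
       return int(x / 1000 * width), int(y / 1000 * height)


   # Centre of the recommended 1440x900 display
   print(denormalize(500, 500, 1440, 900))  # (720, 450)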

**Safety Features:**

* Built-in safety system checks all actions
* ``require_confirmation`` - User must approve risky actions
* Automatically handles safety acknowledgements
* All actions logged for auditing

**Use Cases:**

* Complex multi-step browser workflows
* Research and information gathering
* E-commerce product research
* Form filling with validation
* Web scraping with navigation
* Automated testing

**Supported Environments:**

* **Browser** - Playwright-based web automation (Chromium recommended)
* **Linux/Docker** - Desktop automation in Docker container (xdotool)

**Example Docker Configuration:**

.. code-block:: yaml

   agents:
     - id: "gemini_desktop_agent"
       backend:
         type: "openai"  # Orchestration backend
         model: "gpt-4.1"
         custom_tools:
           - name: ["gemini_computer_use"]
             path: "massgen/tool/_gemini_computer_use/gemini_computer_use_tool.py"
             function: ["gemini_computer_use"]
             preset_args:
               environment: "linux"  # Use Docker
               display_width: 1024
               display_height: 768
               max_iterations: 30
               environment_config:
                 container_name: "cua-container"
                 display: ":99"

**Prerequisites:**

* ``GEMINI_API_KEY`` environment variable
* For browser: ``pip install playwright && playwright install``
* For Docker: Docker installed and running (container auto-created on first run)
* ``pip install google-genai docker`` (included in requirements.txt)

2. Claude Computer Use
~~~~~~~~~~~~~~~~~~~~~~~

**Description:** Full implementation of Anthropic's Claude Computer Use API with enhanced actions and thorough execution capabilities.

**Model Requirement:**

* **Recommended:** ``claude-sonnet-4-5`` (latest with computer use)
* Compatible with Claude models supporting computer use
* Will NOT work with older Claude models

**Example Configuration (Docker/Linux)**

.. code-block:: yaml

   agents:
     - id: "claude_automation_agent"
       backend:
         type: "claude"
         model: "claude-sonnet-4-5"  # Recommended!
         custom_tools:
           - name: ["claude_computer_use"]
             path: "massgen/tool/_claude_computer_use/claude_computer_use_tool.py"
             function: ["claude_computer_use"]
             preset_args:
               environment: "linux"
               display_width: 1024
               display_height: 768
               max_iterations: 25
               environment_config:
                 container_name: "cua-container"
                 display: ":99"

**Example Configuration (Browser)**

.. code-block:: yaml

   agents:
     - id: "claude_browser_agent"
       backend:
         type: "claude"
         model: "claude-sonnet-4-5"
         custom_tools:
           - name: ["claude_computer_use"]
             path: "massgen/tool/_claude_computer_use/claude_computer_use_tool.py"
             function: ["claude_computer_use"]
             preset_args:
               environment: "browser"
               display_width: 1024
               display_height: 768
               max_iterations: 25
               headless: false  # Set to true for headless
               browser_type: "chromium"

**Supported Actions:**

**Standard Actions:**

* ``screenshot`` - Capture current screen
* ``mouse_move`` - Move mouse to coordinates
* ``left_click`` / ``right_click`` / ``middle_click`` / ``double_click`` - Mouse control
* ``left_click_drag`` - Click and drag
* ``type`` - Type text
* ``key`` - Press single key
* ``scroll`` - Scroll up/down

**Enhanced Actions (Claude-specific):**

* ``triple_click`` - Triple-click to select lines
* ``left_mouse_down`` / ``left_mouse_up`` - Precise drag control
* ``hold_key`` - Hold key while performing action
* ``wait`` - Wait for specified duration

**Text Editor Actions:**

* ``str_replace_based_edit_tool`` - File editing with find/replace
* ``bash`` - Execute bash commands (if enabled)

**Supported Environments:**

* **Browser** - Playwright-based web automation (Chromium)
* **Linux** - Docker container with desktop (xdotool, similar to OpenAI implementation)

**Performance Characteristics:**

* **Thorough but slower**: ~2-5 seconds per action (vs 1-2 sec for other tools)
* **High iteration count**: Typically 25-40 iterations for simple web tasks
* **Recommended for**: Complex tasks where thoroughness matters more than speed
* **Not recommended for**: Simple tasks requiring quick execution

Example Performance:

.. code-block:: text

   Task: "Go to cnn.com and get the top headline"
   - Claude Computer Use: 25-40 iterations, ~60-100 seconds
   - Browser Automation: 2-3 actions, ~5-10 seconds

**Choose based on task complexity vs speed requirements.**

**Headless Mode:**

* **Automatically enforced** on Linux servers without DISPLAY environment variable
* **Can be overridden** for systems with X server
* Check logs: "Forcing headless mode on Linux without X server"
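
The rule above amounts to a simple environment check. The following is a minimal sketch of that logic under stated assumptions: ``resolve_headless`` and its parameters are hypothetical, and the tool's real implementation may differ.

.. code-block:: python

   import os
   import sys
   from typing import Mapping, Optional


   def resolve_headless(
       requested: bool,
       platform: str = sys.platform,
       env: Optional[Mapping[str, str]] = None,
   ) -> bool:
       """Return the effective headless setting for the current machine."""
       env = os.environ if env is None else env
       if platform.startswith("linux") and not env.get("DISPLAY"):
           # No X server to draw on, so a visible browser cannot be shown.
           print("Forcing headless mode on Linux without X server")
           return True
       return requested


   # headless: false is honoured only when a display is available
   print(resolve_headless(False, "linux", {"DISPLAY": ":20"}))  # False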

**Use Cases:**

* ✅ Complex research requiring deep navigation
* ✅ Multi-step workflows with verification
* ✅ Tasks requiring precision and thoroughness
* ✅ When using Anthropic's ecosystem
* ❌ Simple/quick automation tasks (use ``browser_automation`` instead)

**Prerequisites:**

* ``ANTHROPIC_API_KEY`` environment variable
* ``pip install playwright && playwright install``
* ``pip install anthropic`` (included in requirements.txt)
* Python 3.8+

3. Browser Automation
~~~~~~~~~~~~~~~~~~~~~

**Description:** Simple, direct browser automation tool using Playwright. User explicitly controls each action. Works with any LLM model.

**Model Support:**

* ✅ **gpt-4.1**
* ✅ **gpt-4o**
* ✅ **Gemini**
* ✅ **Claude** (with appropriate backend)
* ✅ Any other model

**Example Configuration:**

.. code-block:: yaml

   agents:
     - id: "browser_agent"
       backend:
         type: "openai"
         model: "gpt-4.1"  # Can be any model!
         custom_tools:
           - name: ["browser_automation"]
             path: "massgen/tool/_browser_automation/browser_automation_tool.py"
             function: ["browser_automation"]

   ui:
     display_type: "rich_terminal"

**Supported Actions:**

* ``navigate`` - Go to URL
* ``click`` - Click element by CSS selector
* ``type`` - Type text into element
* ``extract`` - Extract text from elements
* ``screenshot`` - Capture page image

**Example Usage:**

.. code-block:: python

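   import asyncio

   # Import path inferred from the tool path in the configuration above.
   from massgen.tool._browser_automation.browser_automation_tool import browser_automation


   async def main() -> None:
       # Navigate to a page
       await browser_automation(
           task="Open Wikipedia",
           url="https://en.wikipedia.org",
           action="navigate",
       )

       # Type in search box
       await browser_automation(
           task="Search for Jimmy Carter",
           action="type",
           selector="input[name='search']",
           text="Jimmy Carter",
       )

       # Click search button
       await browser_automation(
           task="Click search",
           action="click",
           selector="button[type='submit']",
       )

       # Extract results
       await browser_automation(
           task="Get first paragraph",
           action="extract",
           selector="p.first-paragraph",
       )


   # The tool is a coroutine, so it must be awaited inside an event loop.
   asyncio.run(main())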

**Use Cases:**

* Simple page navigation
* Data extraction
* Testing specific actions
* Screenshot capture
* Form interactions
* When you need precise control
* When specialized computer use models are not available

Decision Guide
--------------

When to Use Each Tool
~~~~~~~~~~~~~~~~~~~~~

**Use** ``computer_use`` **when:**

* ✅ You have access to ``computer-use-preview`` model (OpenAI)
* ✅ Task requires multiple autonomous steps
* ✅ Task is complex (e.g., "research topic and create report")
* ✅ You want the model to plan its own actions
* ✅ You need Linux/Docker/OS-level automation
* ✅ You need fast execution (1-2 sec/action)

**Use** ``gemini_computer_use`` **when:**

* ✅ You have access to Gemini 2.5 Computer Use model (Google)
* ✅ You prefer Google's AI models
* ✅ Task requires autonomous browser control
* ✅ You want built-in safety confirmations
* ✅ Task is complex and browser-based
* ✅ You need fast execution (1-2 sec/action)

**Use** ``claude_computer_use`` **when:**

* ✅ You have access to Claude Sonnet 4.5 or newer (Anthropic)
* ✅ You prefer Anthropic's AI models
* ✅ Task requires thorough, careful execution
* ✅ Task is complex and multi-step
* ✅ Quality and precision matter more than speed
* ✅ You need enhanced actions (triple_click, mouse_down/up, hold_key)
* ⚠️ You can accept ~2-5 sec/action and 25-40+ iterations

**Use** ``browser_automation`` **when:**

* ✅ You don't have specialized computer use model access
* ✅ Using gpt-4.1, gpt-4o, or other standard models
* ✅ Task is simple and direct
* ✅ You want explicit control over each action
* ✅ You're testing specific workflows
* ✅ You only need browser automation
* ✅ You need very fast execution (~1 sec/action)
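
The model requirements in this guide can be condensed into a lookup. The sketch below is illustrative only: ``choose_tool`` is a hypothetical helper, and it recognizes ``claude-sonnet-4-5`` by prefix, so newer Claude models that also support computer use would need to be added by hand.

.. code-block:: python

   def choose_tool(model: str) -> str:
       """Suggest a computer use tool for a given model name."""
       if model == "computer-use-preview":
           return "computer_use"
       if model.startswith("gemini-2.5-computer-use-preview"):
           return "gemini_computer_use"
       if model.startswith("claude-sonnet-4-5"):
           return "claude_computer_use"
       # Standard models (gpt-4.1, gpt-4o, ...) fall back to the simple tool.
       return "browser_automation"


   print(choose_tool("gpt-4.1"))  # browser_automation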

Performance Quick Reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 25 20 25 30

   * - Tool
     - Speed/Action
     - Iterations (Simple Task)
     - Best For
   * - ``browser_automation``
     - ~1 sec
     - 2-5
     - Simple tasks, explicit control
   * - ``computer_use``
     - ~1-2 sec
     - 10-20
     - Complex OpenAI workflows
   * - ``gemini_computer_use``
     - ~1-2 sec
     - 10-20
     - Complex Google workflows
   * - ``claude_computer_use``
     - ~2-5 sec
     - 25-40
     - Thorough Anthropic workflows
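
Multiplying the two numeric columns gives rough lower and upper bounds on wall-clock time for a simple task. This is a back-of-the-envelope sketch using only the figures in the table; ``PROFILES`` and ``estimate_seconds`` are hypothetical names, and real runs also include model latency and page load time.

.. code-block:: python

   from typing import Tuple

   # tool -> ((min, max) seconds per action, (min, max) iterations)
   PROFILES = {
       "browser_automation": ((1, 1), (2, 5)),
       "computer_use": ((1, 2), (10, 20)),
       "gemini_computer_use": ((1, 2), (10, 20)),
       "claude_computer_use": ((2, 5), (25, 40)),
   }


   def estimate_seconds(tool: str) -> Tuple[int, int]:
       """Return (best case, worst case) seconds for a simple task."""
       (sec_lo, sec_hi), (iter_lo, iter_hi) = PROFILES[tool]
       return sec_lo * iter_lo, sec_hi * iter_hi


   print(estimate_seconds("claude_computer_use"))  # (50, 200)
   print(estimate_seconds("browser_automation"))   # (2, 5)

The ~60-100 seconds quoted earlier for the Claude example task falls inside the computed 50-200 second range.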

Visualization and Monitoring
-----------------------------

Visualizing computer use agents helps you understand what they're doing in real-time and debug issues.

VNC Viewer (Docker/Linux)
~~~~~~~~~~~~~~~~~~~~~~~~~~

For Claude Computer Use in Docker, you can watch the desktop in real-time using VNC.

**Quick Setup:**

.. code-block:: bash

   # 1. Enable VNC on the Docker container
   ./scripts/enable_vnc_viewer.sh

   # 2. Install a VNC viewer (one-time setup)
   # Ubuntu/Debian:
   sudo apt-get install tigervnc-viewer
   # Or:
   sudo snap install remmina

   # 3. Connect to the container
   # Get container IP:
   docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' cua-container
   # Connect with: <container-ip>:5900

**What You'll See:**

* Real-time desktop with Xfce window manager
* Mouse movements and clicks as the agent executes actions
* Terminal windows opening for bash commands
* Applications launching (Firefox, text editors, etc.)
* File browser operations
* All desktop interactions in real-time

Non-Headless Browser Mode
~~~~~~~~~~~~~~~~~~~~~~~~~~

For Gemini and Claude browser automation, watch the browser by disabling headless mode.

**Update Configuration:**

Use ``preset_args`` (not ``default_params``):

.. code-block:: yaml

   # For Gemini Computer Use
   custom_tools:
     - name: ["gemini_computer_use"]
       path: "massgen/tool/_gemini_computer_use/gemini_computer_use_tool.py"
       function: ["gemini_computer_use"]
       preset_args:
         environment: "browser"
         display_width: 1440
         display_height: 900
         environment_config:
           headless: false  # Set to false for visible browser
           browser_type: "chromium"

   # For Claude Computer Use (browser mode)
   custom_tools:
     - name: ["claude_computer_use"]
       path: "massgen/tool/_claude_computer_use/claude_computer_use_tool.py"
       function: ["claude_computer_use"]
       preset_args:
         environment: "browser"
         headless: false  # Set to false for visible browser

**Running with Visible Browser:**

.. important::

   You must set the ``DISPLAY`` environment variable when running:

.. code-block:: bash

   # Check your available displays
   ls /tmp/.X11-unix/
   # Shows: X0, X20, etc.

   # Run MassGen with DISPLAY variable (example using :20)
   DISPLAY=:20 uv run massgen --config gemini_computer_use_example.yaml

   # For Claude browser
   DISPLAY=:20 uv run massgen --config claude_computer_use_browser_example.yaml

**What You'll See:**

* Actual browser window opens on your desktop
* For Claude: Browser opens with Google homepage loaded
* For Gemini: Browser opens at specified URL or blank page
* Pages loading and navigation
* Form filling and clicking in real-time
* Scrolling and text entry
* Mouse movements and interactions

**Requirements:**

* X11 display server running (check with ``echo $DISPLAY``)
* Desktop environment (GUI) or X server available
* DISPLAY environment variable set (e.g., ``:0``, ``:20``)
* Cannot run on headless servers without X forwarding or Xvfb
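
The ``ls /tmp/.X11-unix/`` check can also be done programmatically. The sketch below maps socket names such as ``X20`` to ``DISPLAY`` values such as ``:20``; both function names are hypothetical and exist only for this example.

.. code-block:: python

   import os
   import re
   from typing import Iterable, List


   def displays_from_sockets(names: Iterable[str]) -> List[str]:
       """Convert X11 socket names (X0, X20, ...) to DISPLAY values (:0, :20, ...)."""
       numbers = []
       for name in names:
           match = re.fullmatch(r"X(\d+)", name)
           if match:
               numbers.append(int(match.group(1)))
       return [f":{number}" for number in sorted(numbers)]


   def available_displays(socket_dir: str = "/tmp/.X11-unix") -> List[str]:
       """List DISPLAY values for the X servers that have a socket on this machine."""
       try:
           return displays_from_sockets(os.listdir(socket_dir))
       except OSError:
           # Directory missing or unreadable: no local X server is reachable.
           return []


   print(displays_from_sockets(["X20", "X0"]))  # [':0', ':20']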

**Using Xvfb (Virtual Display on Headless Servers):**

.. code-block:: bash

   # Install Xvfb
   sudo apt-get install xvfb

   # Start virtual display
   Xvfb :20 -screen 0 1440x900x24 &

   # Run with visible browser on virtual display
   DISPLAY=:20 uv run massgen --config config.yaml

   # To see it, use VNC or x11vnc
   x11vnc -display :20 -forever -shared -rfbport 5900 -nopw &
   vncviewer localhost:5900

Terminal Output Monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Real-time Logs:**

.. code-block:: bash

   # Watch MassGen logs in real-time
   tail -f massgen_logs/log_*/agent_chat.log

   # Watch tool execution
   tail -f massgen_logs/log_*/tool_calls.log

**Verbose Mode:**

.. code-block:: bash

   # Enable debug logging
   export MASSGEN_LOG_LEVEL=DEBUG
   uv run massgen --config config.yaml

Multi-Agent Computer Use
-------------------------

You can combine multiple computer use tools in a single configuration for complex workflows.

**Example: Claude (Desktop) + Gemini (Browser)**

.. code-block:: yaml

   agents:
     # Agent 1: Claude Computer Use with Docker
     - id: "claude_desktop_agent"
       backend:
         type: "claude"
         model: "claude-sonnet-4-5"
         betas: ["computer-use-2025-01-24"]
         custom_tools:
           - name: ["claude_computer_use"]
             path: "massgen/tool/_claude_computer_use/claude_computer_use_tool.py"
             function: ["claude_computer_use"]
             preset_args:
               environment: "linux"
               display_width: 1024
               display_height: 768
               max_iterations: 30

       system_message: |
         You are a Linux desktop automation specialist.
         Your specialty: File operations, bash scripts, system-level tasks.

     # Agent 2: Gemini Computer Use with Browser
     - id: "gemini_browser_agent"
       backend:
         type: "openai"
         model: "gpt-4.1"
         custom_tools:
           - name: ["gemini_computer_use"]
             path: "massgen/tool/_gemini_computer_use/gemini_computer_use_tool.py"
             function: ["gemini_computer_use"]
             preset_args:
               environment: "browser"
               display_width: 1440
               display_height: 900
               environment_config:
                 headless: false
                 browser_type: "chromium"

       system_message: |
         You are a web research and browser automation specialist.
         Your specialty: Web browsing, data extraction, online research.

**Example Use Cases:**

* "Search for the latest Python releases on the web, then create a summary document"
* "Download a file from the web and process it with a bash script"
* "Research information online and save it to a file on the desktop"

Troubleshooting
---------------

Common Configuration Mistake
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Issue:** Browser always runs in headless mode even with ``headless: false``

**Solution:** MassGen's custom tools use ``preset_args``, NOT ``default_params``:

.. code-block:: yaml

   # ❌ WRONG - Will not work
   custom_tools:
     - name: ["gemini_computer_use"]
       default_params:
         environment_config:
           headless: false

   # ✅ CORRECT - Use preset_args
   custom_tools:
     - name: ["gemini_computer_use"]
       preset_args:
         environment: "browser"
         display_width: 1440
         display_height: 900
         environment_config:
           headless: false
           browser_type: "chromium"
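
A quick check for this mistake can be run against a parsed config. The sketch below assumes the YAML has already been loaded into a dictionary (for example with PyYAML's ``yaml.safe_load``) and that tools live under ``agents[].backend.custom_tools`` as in the examples in this guide; ``find_default_params`` is a hypothetical helper.

.. code-block:: python

   from typing import Any, Dict, List


   def find_default_params(config: Dict[str, Any]) -> List[str]:
       """Report custom tools that use default_params instead of preset_args."""
       problems = []
       for agent in config.get("agents", []):
           backend = agent.get("backend", {})
           for tool in backend.get("custom_tools", []):
               if "default_params" in tool:
                   names = ", ".join(tool.get("name", ["<unnamed>"]))
                   agent_id = agent.get("id", "<no id>")
                   problems.append(
                       f"{agent_id}: {names} uses default_params; rename it to preset_args"
                   )
       return problems


   config = {
       "agents": [
           {
               "id": "gemini_automation_agent",
               "backend": {
                   "custom_tools": [
                       {
                           "name": ["gemini_computer_use"],
                           "default_params": {"environment_config": {"headless": False}},
                       }
                   ]
               },
           }
       ]
   }
   for problem in find_default_params(config):
       print(problem)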

VNC Issues
~~~~~~~~~~

.. code-block:: bash

   # Check if VNC is running
   docker exec cua-container ps aux | grep x11vnc

   # Restart VNC
   docker exec cua-container pkill x11vnc
   ./scripts/enable_vnc_viewer.sh

   # Check firewall
   sudo ufw allow 5900/tcp

Browser Not Showing
~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # 1. Check DISPLAY variable is set
   echo $DISPLAY
   # Should show something like: :0 or :20

   # 2. List available displays
   ls /tmp/.X11-unix/
   # Shows: X0, X20, etc.

   # 3. Test with simple X app
   DISPLAY=:20 xeyes  # Should open a window

   # 4. If no DISPLAY, create virtual display
   Xvfb :20 -screen 0 1440x900x24 &
   export DISPLAY=:20

   # 5. Verify config uses preset_args (not default_params)
   grep -A5 "preset_args" your_config.yaml

   # 6. Ensure headless: false is set (under environment_config for Gemini, directly under preset_args for Claude)
   grep "headless" your_config.yaml

Best Practices
--------------

1. **Development:** Use VNC + non-headless browser for debugging
2. **Testing:** Use terminal logs with occasional screenshots
3. **Production:** Use headless mode with comprehensive logging
4. **Demos:** Record sessions with VNC/browser recording
5. **Remote Work:** Use X11 forwarding or VNC over SSH tunnel
6. **Iteration Limits:** Set appropriate ``max_iterations`` based on task complexity
7. **Safety:** Test actions in isolated environments before production use
8. **Error Handling:** Monitor logs for errors and adjust configurations

File Structure
--------------

.. code-block:: text

   massgen/
   ├── tool/
   │   ├── _computer_use/              # OpenAI CUA implementation
   │   │   ├── __init__.py
   │   │   ├── computer_use_tool.py    # Requires computer-use-preview
   │   │   ├── README.md
   │   │   └── QUICKSTART.md
   │   │
   │   ├── _gemini_computer_use/       # Google Gemini implementation
   │   │   ├── __init__.py
   │   │   └── gemini_computer_use_tool.py
   │   │
   │   ├── _claude_computer_use/       # Anthropic Claude implementation
   │   │   ├── __init__.py
   │   │   └── claude_computer_use_tool.py
   │   │
   │   └── _browser_automation/        # Simple browser tool
   │       ├── __init__.py
   │       └── browser_automation_tool.py
   │
   └── configs/tools/custom_tools/
       ├── gemini_computer_use_example.yaml
       ├── gemini_computer_use_docker_example.yaml
       ├── claude_computer_use_docker_example.yaml
       ├── claude_computer_use_browser_example.yaml
       ├── simple_browser_automation_example.yaml
       └── multi_agent_computer_use_example.yaml

Next Steps
----------

* **Related Guides:**

  * :doc:`../tools/custom_tools` - Learn about creating custom tools
  * :doc:`multimodal` - Multimodal capabilities
  * :doc:`../tools/mcp_integration` - External tools via MCP
  * :doc:`../../reference/yaml_schema` - Complete YAML reference

* **Configuration Examples:**

  * `Computer Use Examples <https://github.com/massgen/MassGen/tree/main/massgen/configs/tools/custom_tools>`_
  * ``massgen/backend/docs/COMPUTER_USE_TOOLS_GUIDE.md`` - Comprehensive implementation guide
  * ``massgen/backend/docs/COMPUTER_USE_VISUALIZATION.md`` - Visualization guide

* **Setup Guides:**

  * ``scripts/computer_use_setup.md`` - Docker installation guide
  * ``./scripts/setup_docker_cua.sh`` - Manual Docker setup script (optional - auto-created on first run)
  * ``./scripts/enable_vnc_viewer.sh`` - VNC visualization setup


---

## user_guide/advanced/diversity.rst

===============================
Increasing Diversity in MassGen
===============================

Why Diversity Matters
=====================

In multi-agent systems, diversity drives better outcomes. When agents approach problems from different angles, they explore solution spaces more thoroughly, catch errors, and generate richer insights.

MassGen provides several mechanisms to increase diversity across agent teams:

1. **Answer Novelty Requirements** - Prevent agents from rephrasing existing answers
2. **Question Paraphrasing (DSPy)** - Give each agent a linguistically different question variant
3. **Persona Generation** - Automatically assign different perspectives or solution approaches to agents

.. contents:: Table of Contents
   :local:
   :depth: 2

Answer Novelty Requirements
============================

The ``answer_novelty_requirement`` setting ensures agents produce meaningfully different answers rather than just rephrasing existing solutions.

Configuration
-------------

Set under ``orchestrator`` in your config:

.. code-block:: yaml

   orchestrator:
     answer_novelty_requirement: "balanced"  # lenient|balanced|strict

Options
-------

.. list-table::
   :header-rows: 1
   :widths: 15 20 65

   * - Setting
     - Overlap Threshold
     - Description
   * - ``lenient``
     - No checks
     - No similarity checks (fastest, allows rephrasing)
   * - ``balanced``
     - >70% token overlap
     - Default. Rejects answers that are too similar, requires meaningful differences
   * - ``strict``
     - >50% token overlap
     - Only accepts substantially different solutions, prevents minor variations

How It Works
------------

When an agent provides a new answer, MassGen compares token overlap with existing answers:

* **Passes check**: The answer is novel enough and is accepted
* **Fails check**: The agent receives an error message explaining that its answer is too similar and that it should either take a fundamentally different approach or vote instead
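
The check can be pictured with the following minimal sketch. It is not MassGen's implementation: the whitespace tokenization and the overlap formula (shared tokens divided by the new answer's tokens) are assumptions, and only the thresholds come from the table above.

.. code-block:: python

   from typing import Iterable, Optional

   # Maximum allowed overlap per setting; None disables the check.
   THRESHOLDS = {"lenient": None, "balanced": 0.70, "strict": 0.50}


   def token_overlap(new_answer: str, existing_answer: str) -> float:
       """Fraction of the new answer's distinct tokens that appear in an existing answer."""
       new_tokens = set(new_answer.lower().split())
       old_tokens = set(existing_answer.lower().split())
       if not new_tokens:
           return 0.0
       return len(new_tokens & old_tokens) / len(new_tokens)


   def is_novel(new_answer: str, existing_answers: Iterable[str], requirement: str = "balanced") -> bool:
       """Accept the answer only if it stays at or below the overlap threshold."""
       threshold: Optional[float] = THRESHOLDS[requirement]
       if threshold is None:
           return True
       return all(token_overlap(new_answer, old) <= threshold for old in existing_answers)


   existing = ["use a hash map to count words"]
   print(is_novel("use a hash map to count the words", existing))       # False
   print(is_novel("sort the list then scan adjacent pairs", existing))  # True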

Example
-------

.. code-block:: yaml

   orchestrator:
     voting_sensitivity: "balanced"
     max_new_answers_per_agent: 2
     answer_novelty_requirement: "balanced"  # Enforce meaningful differences

This prevents agents from making cosmetic changes and forces them to explore genuinely different approaches.

Question Paraphrasing with DSPy
================================

DSPy integration provides **intelligent question paraphrasing** - each agent receives a semantically equivalent but differently worded version of your question, encouraging diverse interpretations.

Quick Start
-----------

**1. Install DSPy:**

.. code-block:: bash

   pip install 'dspy>=2.4.0'

**2. Configure in your YAML:**

.. code-block:: yaml

   orchestrator:
     dspy:
       enabled: true
       backend:
         type: "gemini"
         model: "gemini-3-flash-preview"
       num_variants: 3
       strategy: "balanced"

**3. Run MassGen:**

.. code-block:: bash

   massgen --config my_config.yaml "Explain quantum computing"

You'll see: ``✅ DSPy question paraphrasing enabled (strategy=balanced, variants=3)``

Configuration Reference
-----------------------

Main Settings
~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 20 15 15 50

   * - Parameter
     - Type
     - Default
     - Description
   * - ``enabled``
     - boolean
     - ``false``
     - Enable DSPy paraphrasing
   * - ``backend``
     - object
     - \-
     - LLM config for paraphrase generation (required)
   * - ``num_variants``
     - integer
     - ``3``
     - Number of paraphrase variants (1-10 recommended)
   * - ``strategy``
     - string
     - ``balanced``
     - ``balanced`` | ``diverse`` | ``conservative`` | ``adaptive``
   * - ``cache_enabled``
     - boolean
     - ``true``
     - Cache paraphrases for repeated questions
   * - ``semantic_threshold``
     - float
     - ``0.85``
     - Validation strictness (0.0-1.0)
   * - ``validate_semantics``
     - boolean
     - ``true``
     - Verify paraphrases ask for same information
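
The constraints in the table can be checked before a run. The sketch below is a hypothetical validator, not part of MassGen; it assumes ``backend`` is only needed when paraphrasing is enabled and treats ``num_variants`` below 1 as an error.

.. code-block:: python

   from typing import Any, Dict, List

   STRATEGIES = {"balanced", "diverse", "conservative", "adaptive"}


   def validate_dspy_config(dspy: Dict[str, Any]) -> List[str]:
       """Return a list of problems found in an orchestrator.dspy mapping."""
       errors = []
       if dspy.get("enabled", False) and "backend" not in dspy:
           errors.append("backend is required when DSPy paraphrasing is enabled")
       num_variants = dspy.get("num_variants", 3)
       if not isinstance(num_variants, int) or num_variants < 1:
           errors.append("num_variants must be a positive integer (1-10 recommended)")
       strategy = dspy.get("strategy", "balanced")
       if strategy not in STRATEGIES:
           errors.append(f"unknown strategy: {strategy}")
       threshold = dspy.get("semantic_threshold", 0.85)
       if not 0.0 <= threshold <= 1.0:
           errors.append("semantic_threshold must be between 0.0 and 1.0")
       return errors


   print(validate_dspy_config({"enabled": True, "strategy": "wild"}))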

Backend Configuration
~~~~~~~~~~~~~~~~~~~~~

Under ``orchestrator.dspy.backend``:

.. code-block:: yaml

   backend:
     type: "gemini"              # openai|anthropic|gemini|lmstudio|vllm|cerebras
     model: "gemini-3-flash-preview"   # Required
     api_key: "..."              # Optional (uses env var if omitted)
     temperature: 0.7            # Optional (overrides strategy temps)
     max_tokens: 150             # Optional

Paraphrasing Strategies
~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 20 30 50

   * - Strategy
     - Temperature Pattern
     - Best For
   * - ``balanced``
     - [0.5, 0.6, 0.7]
     - General use (default)
   * - ``diverse``
     - [0.3, 0.6, 0.9]
     - Maximum linguistic variation
   * - ``conservative``
     - [0.3, 0.4, 0.5]
     - Technical/scientific accuracy
   * - ``adaptive``
     - [0.3, 0.5, 0.7, 0.9]
     - Mixed question types

How It Works
------------

1. **Generate**: DSPy creates N paraphrased variants of your question
2. **Validate**: Each variant is checked for semantic equivalence and quality
3. **Assign**: Paraphrases are distributed round-robin to agents
4. **Process**: Each agent receives both the original question and its paraphrased version
5. **Fallback**: If generation fails, agents receive the original question (coordination continues)
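
The assignment and fallback steps can be sketched as follows. ``assign_paraphrases`` is a hypothetical function for illustration; it assumes round-robin means cycling through the variants in agent order.

.. code-block:: python

   from typing import Dict, List


   def assign_paraphrases(agent_ids: List[str], original: str, variants: List[str]) -> Dict[str, str]:
       """Distribute paraphrases round-robin, falling back to the original question."""
       if not variants:
           # Generation failed: every agent gets the original question.
           return {agent_id: original for agent_id in agent_ids}
       return {
           agent_id: variants[index % len(variants)]
           for index, agent_id in enumerate(agent_ids)
       }


   variants = [
       "Can you explain what quantum computing is?",
       "What is quantum computing and how does it work?",
   ]
   assignment = assign_paraphrases(["agent1", "agent2", "agent3"], "Explain quantum computing", variants)
   print(assignment["agent3"])  # Can you explain what quantum computing is?

With more agents than variants, the variants are reused in order, so two agents may receive the same wording.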

Example Workflow
~~~~~~~~~~~~~~~~

.. code-block:: text

   Original: "Explain quantum computing"

   Agent 1 receives: "Can you explain what quantum computing is?"
   Agent 2 receives: "What is quantum computing and how does it work?"
   Agent 3 receives: "Please describe quantum computing principles"

Each agent interprets the question slightly differently, leading to more diverse initial answers.

Configuration Examples
----------------------

Cost-Optimized
~~~~~~~~~~~~~~

.. code-block:: yaml

   orchestrator:
     dspy:
       enabled: true
       backend:
         type: "openai"
         model: "gpt-4o-mini"      # Cheaper model
         max_tokens: 100
       num_variants: 2              # Fewer variants
       strategy: "conservative"
       use_chain_of_thought: false
       cache_enabled: true

High-Quality
~~~~~~~~~~~~

.. code-block:: yaml

   orchestrator:
     dspy:
       enabled: true
       backend:
         type: "openai"
         model: "gpt-4o"
       num_variants: 4
       strategy: "diverse"          # Maximum variation
       use_chain_of_thought: true   # Better reasoning (higher cost)
       semantic_threshold: 0.90     # Stricter validation

Local LLM
~~~~~~~~~

.. code-block:: yaml

   orchestrator:
     dspy:
       enabled: true
       backend:
         type: "lmstudio"
         model: "your-local-model"
         base_url: "http://localhost:1234/v1"
       num_variants: 3
       strategy: "balanced"

Troubleshooting
---------------

**Installation Issues**

.. code-block:: bash

   pip install 'dspy>=2.4.0'
   pip show dspy  # Verify version

**API Key Issues**

Set environment variables:

.. code-block:: bash

   export OPENAI_API_KEY="sk-..."
   export ANTHROPIC_API_KEY="sk-ant-..."
   export GOOGLE_API_KEY="..."

**Generation Failures**

If DSPy fails, the system falls back to the original question and coordination continues normally. Check:

1. Backend connectivity and model availability
2. API key validity and credits
3. Logs for detailed error messages

**Low Quality Paraphrases**

Try:

* ``strategy: "diverse"`` for more variation
* ``semantic_threshold: 0.90`` for stricter validation
* ``use_chain_of_thought: true`` for better reasoning
* ``temperature_range: [0.5, 1.0]`` for custom temperature control

.. seealso::
   **Detailed Implementation Guide**: See ``massgen/backend/docs/DSPY_IMPLEMENTATION_GUIDE.md`` for comprehensive technical documentation including temperature scheduling formulas, validation mechanisms, and debugging.

Persona Generation
==================

The persona generator automatically assigns different perspectives or approaches to each agent, encouraging diverse solutions without manual configuration.

Quick Start
-----------

Enable persona generation in your config:

.. code-block:: yaml

   orchestrator:
     coordination:
       persona_generator:
         enabled: true
         diversity_mode: "perspective"  # or "implementation"

Configuration Reference
-----------------------

.. list-table::
   :header-rows: 1
   :widths: 20 15 15 50

   * - Parameter
     - Type
     - Default
     - Description
   * - ``enabled``
     - boolean
     - ``false``
     - Enable automatic persona generation
   * - ``diversity_mode``
     - string
     - ``perspective``
     - Type of diversity to encourage (see below)
   * - ``persist_across_turns``
     - boolean
     - ``false``
     - If true, reuse personas across turns. If false (default), generate fresh personas each turn.
   * - ``backend``
     - object
     - (inherited)
     - Optional LLM config for persona generation

Diversity Modes
~~~~~~~~~~~~~~~

**perspective** (default)
   Agents receive different *values and priorities* for the same problem. Each agent optimizes
   for different qualities (e.g., simplicity vs robustness, user experience vs maintainability).

   Example personas:

   * "Prioritize long-term maintainability and clean architecture over quick solutions"
   * "Optimize for the end user's experience - make it intuitive and delightful"

**implementation**
   Agents receive different *solution types or interpretations*. Each agent explores a
   fundamentally different kind of solution to the problem.

   Example personas:

   * "Explore a minimalist, single-page approach focusing on essential content"
   * "Consider a rich, interactive experience with dynamic elements"

**Future: combined**
   A mixed mode combining both perspective and implementation diversity is planned for future releases.

Phase-Based Adaptation
----------------------

Persona injection adapts based on the coordination phase:

**Exploration Phase** (no answers seen yet)
   Agents receive their full perspective to encourage diverse initial solutions.

**Convergence Phase** (after seeing other answers)
   Perspectives are softened to encourage objective evaluation across all approaches.
   Agents are reminded to evaluate ALL solutions on merit rather than defending their original perspective.

This prevents personas from blocking convergence after restarts: agents naturally shift from
"generate diverse ideas" to "find the best solution across all ideas."

How It Works
------------

1. **Generate**: Before agents start, the system generates complementary perspectives
2. **Assign**: Each agent receives a unique persona as part of their system message
3. **Adapt**: Persona text adjusts based on whether the agent has seen other solutions
4. **Preserve**: User-specified system prompts are preserved; personas are prepended
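
The assign, adapt, and preserve steps can be sketched as a single function. The function name and the exact wording of the persona text are assumptions made for illustration; MassGen's actual prompt text may differ.

.. code-block:: python

   from typing import Optional


   def build_system_message(
       persona: str,
       user_prompt: Optional[str],
       has_seen_other_answers: bool,
   ) -> str:
       """Prepend phase-appropriate persona text to the user's system prompt."""
       if has_seen_other_answers:
           # Convergence: soften the persona and ask for merit-based evaluation.
           persona_text = (
               f"Initial perspective (hold it loosely): {persona}\n"
               "Evaluate ALL solutions on merit rather than defending "
               "your original perspective."
           )
       else:
           # Exploration: present the full persona.
           persona_text = f"Your perspective: {persona}"
       # The user's prompt is kept unchanged and follows the persona.
       return f"{persona_text}\n\n{user_prompt or ''}".strip()

Keeping the user's prompt as an untouched suffix is what makes the persona additive rather than a replacement.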

Example Configuration
---------------------

Full configuration with all options:

.. code-block:: yaml

   orchestrator:
     coordination:
       persona_generator:
         enabled: true
         diversity_mode: "perspective"
         persist_across_turns: false     # Default: fresh personas each turn
         backend:
           type: "gemini"
           model: "gemini-3-flash-preview"

With implementation diversity:

.. code-block:: yaml

   orchestrator:
     coordination:
       persona_generator:
         enabled: true
         diversity_mode: "implementation"

Combining Diversity Methods
============================

For maximum diversity, combine multiple techniques:

.. code-block:: yaml

   orchestrator:
     # Enforce different solutions
     answer_novelty_requirement: "balanced"
     max_new_answers_per_agent: 2

     # Linguistic diversity via DSPy
     dspy:
       enabled: true
       backend:
         type: "gemini"
         model: "gemini-3-flash-preview"
       num_variants: 3
       strategy: "diverse"

     # Conceptual diversity via personas
     coordination:
       persona_generator:
         enabled: true
         diversity_mode: "perspective"

This configuration ensures:

1. Each agent receives a different question phrasing (DSPy)
2. Each agent has a different perspective/priority (persona generator)
3. Agents must provide meaningfully different answers (novelty requirement)
4. Limited attempts encourage quality over iteration (``max_new_answers_per_agent``)

When to Use What
================

**Answer Novelty Requirement**

* ✅ Always recommended for multi-agent setups
* ✅ Prevents wasted cycles on superficial changes
* Use ``balanced`` by default, ``strict`` for critical tasks

**DSPy Question Paraphrasing**

* ✅ Complex queries benefiting from multiple interpretations
* ✅ Multi-agent systems seeking diverse perspectives
* ❌ Skip for single-agent or simple factual queries (adds overhead)

**Persona Generation**

* ✅ Multi-agent systems where conceptual diversity matters
* ✅ Creative tasks benefiting from different approaches or interpretations
* ✅ Use ``perspective`` mode for different values/priorities
* ✅ Use ``implementation`` mode for different solution types
* ❌ Skip for single-agent setups (no benefit)

Summary
=======

MassGen's diversity framework includes:

**Current Features:**

1. **Answer Novelty Requirements** - Prevents rephrasing, enforces meaningful differences
2. **DSPy Question Paraphrasing** - Linguistic diversity through intelligent paraphrasing
3. **Persona Generation** - Conceptual diversity through automatically assigned perspectives

   * ``perspective`` mode: Different values and priorities
   * ``implementation`` mode: Different solution types and interpretations
   * Phase-based adaptation: Strong perspectives for exploration, softened for convergence

**Future Features:**

4. **Combined Diversity Mode** - Mix perspective and implementation diversity in a single run

Use these techniques individually or combined to maximize the quality and breadth of multi-agent coordination.

**Next Steps:**

* :doc:`../../reference/yaml_schema` - Complete configuration reference
* :doc:`../backends` - Backend capabilities matrix
* :doc:`../../examples/basic_examples` - Working examples


---

## user_guide/advanced/hooks.rst

Hook Framework
==============

MassGen provides a hook framework for extending agent behavior at key execution points.
Hooks enable content injection, permission validation, and custom processing during
tool execution.

Overview
--------

Hooks are callbacks that execute at specific points in the agent lifecycle:

- **PreToolUse**: Before a tool is invoked
- **PostToolUse**: After a tool returns its result

Hooks can:

- **Allow/Deny** tool execution (PreToolUse)
- **Inject content** into tool results or as separate messages (PostToolUse)
- **Extract reminders** from tool output for system notifications
- **Modify arguments** before tool execution

Built-in Hooks
--------------

MassGen includes two built-in hooks that power multi-agent coordination:

MidStreamInjectionHook
~~~~~~~~~~~~~~~~~~~~~~

Injects updates from other agents into tool results during coordination.

**Purpose**: When Agent A provides an answer, other agents need to see that update
without losing their work progress. The MidStreamInjectionHook injects these updates
at natural pauses (tool completion) rather than requiring full agent restarts.

**Injection Strategy**: ``tool_result`` - appends to the current tool output

**Example injection**:

.. code-block:: text

   ============================================================
   ⚠️  IMPORTANT: NEW ANSWER RECEIVED - ACTION REQUIRED
   ============================================================

   [UPDATE: agent1 submitted new answer(s)]

     [agent1] (workspace: /path/to/temp_workspaces/agent_b/agent1):
       I've implemented the authentication module...

   ============================================================
   REQUIRED ACTION - You MUST do one of the following:
   ============================================================

   1. **ADD A TASK** to your plan: 'Evaluate agent answer(s) and decide next action'
      - Use update_task_status or create a new task to track this evaluation
      - Read their workspace files (paths above) to understand their solution
      - Compare their approach to yours

   2. **THEN CHOOSE ONE**:
      a) VOTE for their answer if it's complete and correct (use vote tool)
      b) BUILD on their work - improve/extend it and submit YOUR enhanced answer
      c) MERGE approaches - combine the best parts of their work with yours
      d) CONTINUE your own approach if you believe it's better

   DO NOT ignore this update - you must explicitly evaluate and decide!
   ============================================================

**Benefits**:

- Preserves agent's conversation history
- No work is lost when new answers arrive
- Lightweight mid-stream delivery

HighPriorityTaskReminderHook
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Injects reminders when high-priority tasks are completed using the planning tools.

**Purpose**: When an agent completes a high-priority task via ``update_task_status``,
this hook injects a reminder to document learnings. This encourages reflection and
knowledge capture after important work.

**Injection Strategy**: ``user_message`` - creates a separate message after tool result

**Trigger condition**: The hook inspects ``update_task_status`` tool output and fires when:

- ``task.priority == "high"``
- ``task.status == "completed"``
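
A minimal sketch of this trigger check, assuming the tool output is a JSON object with a ``task`` field. The function name, the output shape, and the shortened reminder text are illustrative assumptions, not the built-in hook's code.

.. code-block:: python

   import json
   from typing import Optional

   REMINDER = "High-priority task completed! Document decisions to optimize future work."


   def high_priority_reminder(tool_name: str, tool_output: str) -> Optional[dict]:
       """Return an injection dict when a high-priority task was completed."""
       # Tool names may carry a prefix, so only the suffix is compared (assumption).
       if not tool_name.endswith("update_task_status"):
           return None
       try:
           payload = json.loads(tool_output)
       except json.JSONDecodeError:
           return None  # output is not JSON, nothing to inspect
       task = payload.get("task") if isinstance(payload, dict) else None
       if not isinstance(task, dict):
           return None
       if task.get("priority") == "high" and task.get("status") == "completed":
           return {"content": REMINDER, "strategy": "user_message"}
       return None

Returning ``None`` for anything unexpected keeps the check passive: malformed output never produces a reminder.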

**Formatted injection**:

.. code-block:: text

   ============================================================
   ⚠️  SYSTEM REMINDER
   ============================================================

   ✓ High-priority task completed! Document decisions to optimize future work:
     • Which skills/tools were effective (or not)? → memory/long_term/skill_effectiveness.md
     • What approach worked (or failed) and why? → memory/long_term/approach_patterns.md
     • What would prevent mistakes on similar tasks? → memory/long_term/lessons_learned.md
     • User preferences revealed? → memory/short_term/user_prefs.md

   ============================================================

**Note**: Unlike passive extraction from tool JSON, this hook actively inspects tool
output and makes decisions about when to inject. This pattern is more flexible and
keeps reminder logic separate from tool implementation.

Injection Strategies
--------------------

Hooks can inject content using two strategies:

``tool_result`` Strategy
~~~~~~~~~~~~~~~~~~~~~~~~

Appends content directly to the tool result. Best for:

- Cross-agent updates during coordination
- Additional context that relates to the tool operation
- Minimal message overhead

.. code-block:: python

   HookResult(
       inject={
           "content": "[UPDATE] New information...",
           "strategy": "tool_result"
       }
   )

``user_message`` Strategy
~~~~~~~~~~~~~~~~~~~~~~~~~

Injects as a separate user message after the tool result. Best for:

- System reminders and notifications
- Content that should be clearly distinguished from tool output
- Semantic separation

.. code-block:: python

   HookResult(
       inject={
           "content": "SYSTEM REMINDER: ...",
           "strategy": "user_message"
       }
   )
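
The difference between the two strategies can be sketched on a plain list of message dicts whose last entry is the tool result. ``apply_injection`` is a hypothetical helper written for this illustration and is not part of MassGen's API.

.. code-block:: python

   from typing import Dict, List


   def apply_injection(
       messages: List[Dict[str, str]],
       inject: Dict[str, str],
   ) -> List[Dict[str, str]]:
       """Apply an injection to a conversation ending in a tool result."""
       updated = [dict(m) for m in messages]  # copy so the input is not mutated
       if inject["strategy"] == "tool_result":
           # Append to the existing tool output.
           updated[-1]["content"] += "\n\n" + inject["content"]
       elif inject["strategy"] == "user_message":
           # Add a separate message after the tool result.
           updated.append({"role": "user", "content": inject["content"]})
       else:
           raise ValueError(f"unknown strategy: {inject['strategy']}")
       return updated

With ``tool_result`` the message count stays the same; with ``user_message`` it grows by one.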

API-Specific Handling
---------------------

Different LLM APIs handle injection differently:

Anthropic API (Claude)
~~~~~~~~~~~~~~~~~~~~~~

Uses separate content blocks within a single message for clean semantic separation:

.. code-block:: json

   {
     "role": "user",
     "content": [
       {
         "type": "tool_result",
         "tool_use_id": "call_123",
         "content": "actual tool output"
       },
       {
         "type": "text",
         "text": "<system-reminder>Your TODO list is empty</system-reminder>"
       }
     ]
   }

OpenAI API (GPT-4, o1, etc.)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Two approaches:

**Separate message**:

.. code-block:: json

   [
     {"role": "tool", "tool_call_id": "call_123", "content": "actual output"},
     {"role": "user", "content": "<system-reminder>...</system-reminder>"}
   ]

**Structured output** (with system prompt explaining format):

.. code-block:: json

   {
     "role": "tool",
     "tool_call_id": "call_123",
     "content": "{\"output\": \"actual\", \"system_notes\": [\"reminder\"]}"
   }
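
Because the structured variant nests JSON inside a string field, hand-written escaping is easy to get wrong. A small sketch using ``json.dumps`` avoids that; the helper name is hypothetical.

.. code-block:: python

   import json
   from typing import List


   def structured_tool_message(tool_call_id: str, output: str, notes: List[str]) -> dict:
       """Build a tool message whose content is a JSON string with system notes."""
       # json.dumps handles the quoting that the nested string requires.
       content = json.dumps({"output": output, "system_notes": notes})
       return {"role": "tool", "tool_call_id": tool_call_id, "content": content}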

Visual Separation
~~~~~~~~~~~~~~~~~

When appending to tool results, clear visual separators prevent model confusion:

.. code-block:: text

   actual tool output here

   ═══════════════════════════════════════════════════════
   SYSTEM CONTEXT (not part of tool output):
   - Your TODO list is empty
   - Consider using TodoWrite for multi-step tasks
   ═══════════════════════════════════════════════════════
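
A sketch of a formatter that produces this layout. The function name and the separator width are assumptions made for illustration.

.. code-block:: python

   from typing import List

   SEPARATOR = "═" * 55  # width chosen for illustration


   def append_system_context(tool_output: str, notes: List[str]) -> str:
       """Append clearly separated system notes to a tool result."""
       if not notes:
           return tool_output  # nothing to add, leave the output untouched
       lines = [tool_output, "", SEPARATOR, "SYSTEM CONTEXT (not part of tool output):"]
       lines += [f"- {note}" for note in notes]
       lines.append(SEPARATOR)
       return "\n".join(lines)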

Workspace Handling
------------------

When injecting answers from other agents, workspace paths are normalized so the
receiving agent can access the referenced files.

The Problem
~~~~~~~~~~~

Each agent has an isolated workspace:

- Agent A: ``.massgen/workspaces/agent_a_workspace/``
- Agent B: ``.massgen/workspaces/agent_b_workspace/``

If Agent A's answer references their workspace path, Agent B cannot access it.

Path Normalization
~~~~~~~~~~~~~~~~~~

Workspace paths are automatically normalized during injection:

.. code-block:: text

   Before (Agent A's answer):
   "I created /Users/foo/.massgen/workspaces/agent_a_workspace/output.py"

   After (what Agent B sees):
   "I created /Users/foo/.massgen/temp_workspaces/agent1/output.py"

The receiving agent can now access the files at the normalized path.
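
The rewrite itself amounts to a prefix substitution. A minimal sketch, assuming a mapping from original workspace paths to the shared temp paths is available; the helper and the mapping are hypothetical.

.. code-block:: python

   from typing import Dict


   def normalize_workspace_paths(answer: str, workspace_map: Dict[str, str]) -> str:
       """Replace original workspace paths with the receiver's temp paths."""
       # Longest paths first, so a shorter path cannot clobber a longer one.
       for original in sorted(workspace_map, key=len, reverse=True):
           shared = workspace_map[original]
           answer = answer.replace(original.rstrip("/"), shared.rstrip("/"))
       return answer

Applied to the example above, the ``agent_a_workspace`` prefix becomes ``temp_workspaces/agent1`` while the file name is left unchanged.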

Snapshot Sharing
~~~~~~~~~~~~~~~~

When Agent A provides an answer:

1. Agent A's workspace is snapshotted
2. The snapshot is copied to Agent B's temp workspace
3. Paths in the injected answer point to the temp workspace
4. Agent B can read Agent A's files for verification
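
The copy step can be sketched with the standard library. The directory layout (``temp_workspaces/<receiver>/<anonymous id>``) follows the paths shown in this guide, but the function itself is a hypothetical illustration rather than MassGen's code.

.. code-block:: python

   import shutil
   from pathlib import Path


   def share_snapshot(
       source_workspace: Path,
       temp_root: Path,
       receiver_id: str,
       anon_id: str,
   ) -> Path:
       """Copy one agent's workspace into another agent's temp area."""
       target = temp_root / receiver_id / anon_id
       if target.exists():
           shutil.rmtree(target)  # replace any older snapshot
       shutil.copytree(source_workspace, target)  # creates parent directories
       return target

The returned path is the one an injected answer would reference after normalization.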

Anonymization
~~~~~~~~~~~~~

Agent identities stay anonymized through this process:

- Agent B doesn't know "agent1" is really "Agent A"
- Prevents voting bias based on agent identity
- But Agent B knows the injection is external (not from self)

This trade-off is acceptable because:

- Knowing updates are external prevents self-confirmation loops
- Anonymization between agents still prevents bias
- The alternative (full restart) loses ALL progress

Custom Hooks
------------

You can create custom hooks by implementing the hook interface.

Python Callable Hook
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from massgen.mcp_tools.hooks import (
       PythonCallableHook,
       HookEvent,
       HookResult,
       GeneralHookManager,
       HookType,
   )

   def my_audit_hook(event: HookEvent) -> HookResult:
       """Log all tool calls for auditing."""
       print(f"Tool called: {event.tool_name}")
       print(f"Arguments: {event.tool_input}")
       return HookResult.allow()

   # Register the hook
   manager = GeneralHookManager()
   hook = PythonCallableHook("audit", my_audit_hook, matcher="*")
   manager.register_global_hook(HookType.PRE_TOOL_USE, hook)

Pattern Matching
~~~~~~~~~~~~~~~~

Hooks support glob-style pattern matching, with ``|`` separating alternatives:

- ``*`` - Match all tools
- ``Write`` - Match exactly "Write"
- ``mcp__*`` - Match all MCP tools
- ``Write|Edit`` - Match "Write" or "Edit"
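
These rules can be approximated with Python's ``fnmatch`` module. This is a sketch of the matching semantics listed above, not necessarily how MassGen evaluates matchers internally; ``tool_matches`` is a hypothetical name.

.. code-block:: python

   from fnmatch import fnmatchcase


   def tool_matches(matcher: str, tool_name: str) -> bool:
       """Return True if the tool name matches any '|'-separated glob."""
       return any(
           fnmatchcase(tool_name, alternative.strip())
           for alternative in matcher.split("|")
       )

``fnmatchcase`` is used instead of ``fnmatch`` so that matching stays case-sensitive on every platform.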

Hook Registration
-----------------

Global vs Per-Agent
~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   # Global hooks - apply to ALL agents
   hooks:
     PreToolUse:
       - matcher: "*"
         handler: "massgen.hooks.audit_all_tools"
         type: "python"

   agents:
     - id: "agent1"
       backend:
         # Per-agent hooks - extend global by default
         hooks:
           PreToolUse:
             - matcher: "Write"
               handler: "massgen.hooks.validate_writes"
               type: "python"
               fail_closed: true  # Deny on hook errors
           PostToolUse:
             override: true  # Only use per-agent hooks
             hooks:
               - handler: "massgen.hooks.log_outputs"
                 type: "python"

Error Handling
~~~~~~~~~~~~~~

By default, hooks **fail open** (allow tool execution) on errors to avoid blocking agents.
For security-critical hooks, you can configure **fail closed** behavior:

.. code-block:: yaml

   hooks:
     PreToolUse:
       - matcher: "Write|Delete"
         handler: "massgen.hooks.security_check"
         fail_closed: true  # Deny tool execution if hook fails

**Default behavior (fail_closed: false)**:

- **Timeout**: Allow - don't block agent on slow hooks
- **Runtime errors**: Allow with logging - don't crash agent
- **Import errors**: Always deny - configuration error

**With fail_closed: true**:

- **Timeout**: Deny - block tool if hook can't complete
- **Runtime errors**: Deny - block tool if hook crashes
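
The timeout and runtime-error rows can be sketched as a small wrapper. It assumes a hook is a zero-argument callable returning ``True`` to allow the tool call; the wrapper name and the thread-based timeout are illustrative choices, and import errors are not modeled because they occur when the hook is loaded.

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor
   from typing import Callable


   def run_hook(
       hook: Callable[[], bool],
       fail_closed: bool = False,
       timeout: float = 5.0,
   ) -> bool:
       """Return True to allow the tool call, False to deny it."""
       pool = ThreadPoolExecutor(max_workers=1)
       try:
           return pool.submit(hook).result(timeout=timeout)
       except Exception:
           # Covers both a timeout and a crash inside the hook:
           # allow by default (fail open), deny when fail_closed is set.
           return not fail_closed
       finally:
           pool.shutdown(wait=False)  # do not block on a slow hook

A hook that returns normally is always respected; ``fail_closed`` only changes what happens when no decision could be obtained.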

Timing Considerations
---------------------

Mid-stream injection has specific timing rules:

1. **First update**: Uses traditional full-message injection

   - Prevents premature convergence on first answer

2. **Subsequent updates**: Uses mid-stream hook injection

   - Lighter weight, preserves work in progress
   - Only new answers (not already seen) are injected

3. **Vote-only mode**: Skips mid-stream injection entirely

   - Tool schemas are fixed at stream start
   - Full restart required for new vote options
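
These rules can be summarized as a small decision function. The sketch assumes that "first update" means the agent has not yet seen any answer; the function name and the returned labels are hypothetical.

.. code-block:: python

   from typing import List, Set, Tuple


   def plan_injection(
       answer_ids: List[str],
       seen_ids: Set[str],
       vote_only: bool,
   ) -> Tuple[str, List[str]]:
       """Choose a delivery mode and the answers that still need delivering."""
       new_ids = [a for a in answer_ids if a not in seen_ids]
       if not new_ids:
           return ("none", [])          # nothing new to deliver
       if vote_only:
           return ("restart", new_ids)  # tool schemas are fixed mid-stream
       if not seen_ids:
           return ("full_message", new_ids)  # first update
       return ("mid_stream", new_ids)        # later updates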

Debugging Injection
-------------------

Testing mid-stream injection requires one agent to be slower than others so it
receives updates while working. MassGen provides a debug delay feature for this.

Debug Delay Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~

Add ``debug_delay_seconds`` to a backend config to artificially slow an agent:

.. code-block:: yaml

   agents:
     - id: "agent_a"
       backend:
         type: "gemini"
         model: "gemini-3-flash-preview"

     - id: "agent_b"
       backend:
         type: "gemini"
         model: "gemini-3-flash-preview"
         debug_delay_seconds: 30        # Delay in seconds
         debug_delay_after_n_tools: 2   # Apply after N tool calls

**Parameters**:

- ``debug_delay_seconds``: How long to pause (default: 0, disabled)
- ``debug_delay_after_n_tools``: Apply delay after this many tool calls (default: 3)

**Why delay after N tools?** Delaying at the start would cause immediate restarts.
By waiting until the agent has made progress (created tasks, done some work), the
delay happens at a natural point where injection is meaningful.
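
A sketch of how such a delay could be tracked per agent. It assumes the delay is applied once, on the Nth tool call; the class is hypothetical and only mirrors the two parameters described above.

.. code-block:: python

   import time
   from typing import Callable


   class DebugDelay:
       """Pause once after a configured number of tool calls."""

       def __init__(
           self,
           delay_seconds: float = 0,
           after_n_tools: int = 3,
           sleep: Callable[[float], None] = time.sleep,
       ) -> None:
           self.delay_seconds = delay_seconds
           self.after_n_tools = after_n_tools
           self.sleep = sleep  # injectable so tests do not really wait
           self.tool_calls = 0
           self.applied = False

       def on_tool_call(self) -> bool:
           """Record a tool call; return True if the delay ran on this call."""
           self.tool_calls += 1
           if self.delay_seconds <= 0 or self.applied:
               return False  # disabled, or the delay already happened
           if self.tool_calls >= self.after_n_tools:
               self.sleep(self.delay_seconds)
               self.applied = True
               return True
           return False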

Injection Visibility
~~~~~~~~~~~~~~~~~~~~

When injection occurs, you'll see:

1. **Log messages**:

   .. code-block:: text

      [Orchestrator] Copying snapshots for mid-stream injection to agent_b
      [Orchestrator] Injection workspace .../temp_workspaces/agent_b/agent1 contains: ['tasks', 'poem.txt']
      [Orchestrator] Mid-stream injection for agent_b: 1 new answer(s)
      [PostToolUse] Hook injection for mcp__filesystem__read_text_file: strategy=tool_result, content_len=1641

2. **Stream chunk**: A visible ``📥 [INJECTION]`` chunk in the output

3. **Full content**: The injection content is logged for debugging

Example Debug Config
~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   # massgen/configs/debug/injection_delay_test.yaml
   agents:
     - id: "agent_a"
       backend:
         type: "gemini"
         model: "gemini-3-flash-preview"
         cwd: "workspace1"
         enable_mcp_command_line: true

     - id: "agent_b"
       backend:
         type: "gemini"
         model: "gemini-3-flash-preview"
         cwd: "workspace2"
         enable_mcp_command_line: true
         debug_delay_seconds: 30
         debug_delay_after_n_tools: 2

   orchestrator:
     coordination:
       enable_agent_task_planning: true
       task_planning_filesystem_mode: true

Run with:

.. code-block:: bash

   uv run massgen --config massgen/configs/debug/injection_delay_test.yaml \
       "Create a simple poem and write it into a file"

.. seealso::

   - :doc:`agent_communication` - Multi-agent coordination and broadcasts
   - :doc:`/user_guide/files/file_operations` - Workspace management


---

## user_guide/advanced/index.rst

Advanced Features
=================

This section covers advanced MassGen capabilities for power users, including multi-agent coordination, multimodal processing, and specialized automation features.

Overview
--------

Advanced features in MassGen:

* **Agent diversity** - Configure multiple agents with different models and behaviors
* **Agent communication** - Enable agents to ask each other questions
* **Hook framework** - Extend agent behavior with custom hooks for tool execution
* **Task planning** - Structured task breakdown and execution
* **Subagents** - Spawn parallel child processes for independent tasks
* **Planning mode** - Safe execution with human approval
* **Change documents** - Decision journals for traceability and attribution
* **Multimodal support** - Image, audio, and video understanding
* **Computer use** - Browser and desktop automation
* **Terminal evaluation** - Record and evaluate terminal sessions

Guides in This Section
----------------------

.. grid:: 2
   :gutter: 3

   .. grid-item-card:: 🎭 Agent Diversity

      Configure diverse agent teams

      * Different models per agent
      * Varied system prompts
      * Specialization strategies
      * Voting and consensus

      :doc:`Read the Diversity guide → <diversity>`

   .. grid-item-card:: 💬 Agent Communication

      Enable inter-agent messaging

      * ask_others tool
      * Agent collaboration patterns
      * Information sharing
      * Coordination strategies

      :doc:`Read the Agent Communication guide → <agent_communication>`

   .. grid-item-card:: 🪝 Hook Framework

      Extend agent behavior

      * PreToolUse / PostToolUse hooks
      * Content injection strategies
      * Reminder extraction
      * Custom hook development

      :doc:`Read the Hook Framework guide → <hooks>`

   .. grid-item-card:: 📋 Task Planning

      Structured task execution

      * Task breakdown
      * Planning strategies
      * Execution tracking
      * Complex workflows

      :doc:`Read the Task Planning guide → <agent_task_planning>`

   .. grid-item-card:: 🔀 Subagents

      Parallel child processes

      * Independent workspaces
      * Concurrent execution
      * Context file sharing
      * Result aggregation

      :doc:`Read the Subagents guide → <subagents>`

   .. grid-item-card:: ✅ Planning Mode

      Safe execution with approval

      * Human-in-the-loop
      * Plan review
      * Action confirmation
      * Rollback support

      :doc:`Read the Planning Mode guide → <planning_mode>`

   .. grid-item-card:: Change Documents

      Decision journals for traceability

      * Why each decision was made
      * Code references per decision
      * Multi-agent attribution
      * Feature-level provenance

      :doc:`Read the Change Documents guide → <change_documents>`

   .. grid-item-card:: 🖼️ Multimodal

      Image, audio, video support

      * Image understanding
      * Audio transcription
      * Video analysis
      * Multi-format input

      :doc:`Read the Multimodal guide → <multimodal>`

   .. grid-item-card:: 🖥️ Computer Use

      Browser and desktop automation

      * Gemini Computer Use
      * Claude Computer Use
      * Browser automation
      * Visual feedback

      :doc:`Read the Computer Use guide → <computer_use>`

   .. grid-item-card:: 📺 Terminal Evaluation

      Record and evaluate sessions

      * VHS recording
      * Session playback
      * Evaluation metrics
      * Demonstration creation

      :doc:`Read the Terminal Evaluation guide → <terminal_evaluation>`

Related Documentation
---------------------

* :doc:`../concepts` - Core MassGen concepts
* :doc:`../backends` - Backend capabilities
* :doc:`../tools/index` - Tools and capabilities
* :doc:`../integration/index` - Integration options

.. toctree::
   :maxdepth: 1
   :hidden:

   diversity
   agent_communication
   hooks
   agent_task_planning
   subagents
   planning_mode
   change_documents
   multimodal
   computer_use
   terminal_evaluation


---

## user_guide/advanced/multimodal.rst

Multimodal Capabilities
=======================

MassGen supports comprehensive multimodal AI workflows, enabling agents to both understand and generate images, audio, video, and file content. This includes analyzing existing content and creating new multimodal outputs.

.. note::
   **Multimodal Tools (v0.1.3+):**

   MassGen provides custom tools for both understanding and generating multimodal content:

   **Understanding Tools:**

   * ✅ **understand_audio**: Transcribe audio files to text (uses OpenAI's ``gpt-4o-transcribe`` by default)
   * ✅ **understand_file**: Analyze documents (PDF, DOCX, XLSX, PPTX) and text files
   * ✅ **understand_image**: Describe and analyze images — **routes to the agent's native backend** when supported
   * ✅ **understand_video**: Extract and analyze key frames from videos — routes to the best available backend

   **Native Backend Routing (v0.1.55+):**

   * Image and video understanding now route API calls to the **agent's own backend** when it supports the capability
   * Supported image backends: **OpenAI**, **Claude**, **Gemini**, **Grok**, **Claude Code** (SDK), **Codex** (CLI)
   * If the agent's backend does not support image understanding, the tool falls back to OpenAI ``gpt-5.4``
   * This preserves model diversity and per-agent consistency — a Claude agent analyzes images via Claude, not GPT

   **Backend Requirements:**

   * For native routing, the agent's backend API key must be available (e.g., ``ANTHROPIC_API_KEY`` for Claude)
   * Fallback to OpenAI requires ``OPENAI_API_KEY`` environment variable set in ``.env`` file
   * Claude Code requires the ``claude`` CLI installed and authenticated
   * Codex requires the ``codex`` CLI installed and authenticated

   **Generation Tools:**

   * ✅ **text_to_image_generation**: Generate images from text prompts (GPT-4.1)
   * ✅ **image_to_image_generation**: Create image variations from existing images
   * ✅ **text_to_video_generation**: Generate videos from text descriptions (Sora-2)
   * ✅ **text_to_speech_continue_generation**: Generate expressive speech with emotional tone
   * ✅ **text_to_speech_transcription_generation**: Convert text to speech (TTS)
   * ✅ **text_to_file_generation**: Generate formatted documents (TXT, MD, PDF)

   **File Access:**

   * Files must be accessible via ``context_paths`` configuration or created within agent workspaces
   * Supports both pre-existing files and agent-generated content
   * Provides secure, sandboxed file access to agents

Overview
--------

Multimodal capabilities extend MassGen's multi-agent collaboration across different content types:

**Image Capabilities:**

* **Understanding**: Analyze and describe image content (vision models)
* **Generation**: Create images from text prompts, generate variations from existing images

**Audio Capabilities:**

* **Understanding**: Transcription, audio analysis
* **Generation**: Text-to-speech with emotional expression, direct TTS conversion

**Video Capabilities:**

* **Understanding**: Analyze video content through key frame extraction
* **Generation**: Create videos from text descriptions

**File Operations:**

* **Understanding**: Analyze documents and files (PDF, DOCX, XLSX, PPTX, text files)
* **Generation**: Generate formatted documents from text prompts
* **Custom Tools**: Comprehensive multimodal file handling

Image Understanding
-------------------

Image understanding enables agents to analyze visual content, extract information, and answer questions about images using the ``understand_image`` custom tool.

.. note::
   **Native backend routing (v0.1.55+):** The ``understand_image`` tool now routes to the agent's own backend when it supports ``image_understanding``. For example, a Claude agent will use Claude's vision API, a Gemini agent will use Gemini's multimodal API, etc. If the agent's backend doesn't support image understanding, it falls back to OpenAI ``gpt-5.4``.

   Supported backends: OpenAI, Claude, Gemini, Grok, Claude Code (SDK), Codex (CLI).
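
The routing rule can be sketched as a lookup with a fallback. The backend identifiers in the set below (in particular ``claude_code`` and ``codex``) are assumed spellings for illustration, and the function is not MassGen's implementation.

.. code-block:: python

   from typing import Tuple

   # Assumed backend type identifiers for the supported backends listed above.
   NATIVE_IMAGE_BACKENDS = {"openai", "claude", "gemini", "grok", "claude_code", "codex"}
   FALLBACK = ("openai", "gpt-5.4")


   def route_image_understanding(backend_type: str, model: str) -> Tuple[str, str]:
       """Return the (backend, model) pair that should analyze the image."""
       backend = backend_type.lower()
       if backend in NATIVE_IMAGE_BACKENDS:
           return (backend, model)  # stay on the agent's own backend
       return FALLBACK              # backend lacks image understanding

Under this rule a Claude agent keeps using its own model for images, while an agent on an unsupported backend is routed to the OpenAI fallback.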

Basic Configuration
~~~~~~~~~~~~~~~~~~~

Configure agents with the ``understand_image`` tool:

.. code-block:: yaml

   agents:
     - id: "vision_agent"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         cwd: "workspace1"
         custom_tools:
           - name: ["understand_image"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_image.py"
             function: ["understand_image"]
       system_message: "You are a helpful assistant"

   orchestrator:
     context_paths:
       - path: "@examples/resources/v0.0.27-example/multimodality.jpg"
         permission: "read"

**Example Command:**

.. code-block:: bash

   massgen \
     --config @examples/basic/single/single_gpt5nano_image_understanding.yaml \
     "Please summarize the content in this image."

Multi-Agent Image Analysis
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Multiple agents can provide diverse perspectives on image content:

.. code-block:: yaml

   agents:
     - id: "response_agent1"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         cwd: "workspace1"
         custom_tools:
           - name: ["understand_image"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_image.py"
             function: ["understand_image"]
       system_message: "You are a helpful assistant"

     - id: "response_agent2"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         cwd: "workspace2"
         custom_tools:
           - name: ["understand_image"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_image.py"
             function: ["understand_image"]
       system_message: "You are a helpful assistant"

   orchestrator:
     context_paths:
       - path: "@examples/resources/v0.0.27-example/multimodality.jpg"
         permission: "read"

**Example Command:**

.. code-block:: bash

   massgen \
     --config @examples/basic/multi/gpt5nano_image_understanding.yaml \
     "Analyze this image and identify key elements, mood, and composition."

**Use Cases:**

* Document analysis and OCR
* Visual content description for accessibility
* Image classification and categorization
* Design feedback and critique
* Scene understanding for robotics

Image Generation
----------------

Generate images from text descriptions using AI models. MassGen provides two generation approaches:

Text-to-Image Generation
~~~~~~~~~~~~~~~~~~~~~~~~~

Create new images from text prompts using GPT-4.1:

.. code-block:: yaml

   agents:
     - id: "image_generator"
       backend:
         type: "openai"
         model: "gpt-4o"
         cwd: "workspace1"
         enable_image_generation: true
         custom_tools:
           - name: ["text_to_image_generation"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/text_to_image_generation.py"
             function: ["text_to_image_generation"]
       system_message: "You are an AI assistant with access to text-to-image generation capabilities."

**Example Command:**

.. code-block:: bash

   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_single.yaml \
     "Please generate an image of a cat in space."

**Key Features:**

* Powered by OpenAI's GPT-4.1 model
* Generates high-quality images from text descriptions
* Automatically saves images to agent workspace

Image-to-Image Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~

Create variations or modifications of existing images:

.. code-block:: yaml

   agents:
     - id: "image_editor"
       backend:
         type: "openai"
         model: "gpt-4o"
         cwd: "workspace1"
         enable_image_generation: true
         custom_tools:
           - name: ["image_to_image_generation"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/image_to_image_generation.py"
             function: ["image_to_image_generation"]
           - name: ["understand_image"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_image.py"
             function: ["understand_image"]

   orchestrator:
     context_paths:
       - path: "path/to/source_image.jpg"
         permission: "read"

**Use Cases:**

* Create artistic variations of existing images
* Style transfer and image transformation
* Generate similar images with different characteristics
* Image editing and enhancement workflows

Multi-Agent Image Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Combine understanding and generation capabilities with multiple agents:

.. code-block:: yaml

  agents:
    - id: "text_to_image_generation_tool1"
      backend:
        type: "openai"
        model: "gpt-4o"
        cwd: "workspace1"
        enable_image_generation: true
        custom_tools:
          - name: ["text_to_image_generation"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/text_to_image_generation.py"
            function: ["text_to_image_generation"]
          - name: ["understand_image"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/understand_image.py"
            function: ["understand_image"]
          - name: ["image_to_image_generation"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/image_to_image_generation.py"
            function: ["image_to_image_generation"]
      system_message: |
        You are an AI assistant with access to text-to-image generation capabilities.

    - id: "text_to_image_generation_tool2"
      backend:
        type: "openai"
        model: "gpt-4o"
        cwd: "workspace2"
        enable_image_generation: true
        custom_tools:
          - name: ["text_to_image_generation"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/text_to_image_generation.py"
            function: ["text_to_image_generation"]
          - name: ["understand_image"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/understand_image.py"
            function: ["understand_image"]
      system_message: |
        You are an AI assistant with access to text-to-image generation capabilities.

  orchestrator:
    snapshot_storage: "snapshots"
    agent_temporary_workspace: "temp_workspaces"

**Example Command:**

.. code-block:: bash

   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_multi.yaml \
     "Please generate an image of a cat in space."

Audio Understanding
-------------------

Transcribe and analyze audio files using the ``understand_audio`` custom tool.

.. note::
   The ``understand_audio`` tool uses OpenAI's Transcription API with the ``gpt-4o-transcribe`` model by default. This requires an OpenAI API key regardless of which backend your agent uses.

.. code-block:: yaml

   agents:
     - id: "transcriber"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         cwd: "workspace1"
         custom_tools:
           - name: ["understand_audio"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_audio.py"
             function: ["understand_audio"]

   orchestrator:
     context_paths:
       - path: "path/to/audio.mp3"
         permission: "read"

**Supported Formats:**

* WAV, MP3, M4A, MP4, OGG, FLAC, AAC, WMA, OPUS

**Example Use Cases:**

* Meeting transcription
* Podcast analysis
* Voice memo processing
* Interview transcription
* Audio content summarization

Audio/Speech Generation
-----------------------

Generate speech and audio content from text using OpenAI's audio generation capabilities. MassGen provides two text-to-speech approaches:

Expressive Speech Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generate natural-sounding speech with emotional expression using GPT-4o Audio:

.. code-block:: yaml

   agents:
     - id: "speech_generator"
       backend:
         type: "openai"
         model: "gpt-4o"
         cwd: "workspace1"
         enable_audio_generation: true
         custom_tools:
           - name: ["text_to_speech_continue_generation"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/text_to_speech_continue_generation.py"
             function: ["text_to_speech_continue_generation"]
       system_message: "You are an AI assistant with access to text-to-speech generation capabilities."

**Example Command:**

.. code-block:: bash

   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_speech_generation_single.yaml \
     "I want you to tell me a very short introduction about Sherlock Holmes in one sentence, and I want you to use emotion voice to read it out loud."

**Key Features:**

* Powered by the GPT-4o Audio Preview model
* Supports emotional and expressive speech
* Multiple voice options (alloy, echo, fable, onyx, nova, shimmer)
* Output formats: WAV, MP3
* Natural conversation flow with context awareness

Direct Text-to-Speech (TTS)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Convert text directly to speech using OpenAI's TTS API:

.. code-block:: yaml

   agents:
     - id: "tts_agent"
       backend:
         type: "openai"
         model: "gpt-4o"
         cwd: "workspace1"
         enable_audio_generation: true
         custom_tools:
           - name: ["text_to_speech_transcription_generation"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/text_to_speech_transcription_generation.py"
             function: ["text_to_speech_transcription_generation"]

**Key Features:**

* Uses GPT-4o-mini-TTS for fast, cost-effective generation
* Direct text-to-speech conversion
* Supports multiple voices and output formats
* Optional instructions for voice style customization
* Streaming response for efficient processing

**Supported Voices:**

* ``alloy`` - Neutral, balanced voice
* ``echo`` - Clear, professional voice
* ``fable`` - Warm, storytelling voice
* ``onyx`` - Deep, authoritative voice
* ``nova`` - Energetic, friendly voice
* ``shimmer`` - Soft, gentle voice

**Supported Formats:**

* MP3 (default)
* WAV
* OPUS
* AAC
* FLAC

Multi-Agent Audio/Speech Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Combine understanding and generation capabilities with multiple agents:

.. code-block:: yaml

  agents:
    - id: "text_to_speech_continue_generation_tool1"
      backend:
        type: "openai"
        model: "gpt-4o"
        cwd: "workspace1"
        enable_audio_generation: true
        custom_tools:
          - name: ["text_to_speech_transcription_generation"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/text_to_speech_transcription_generation.py"
            function: ["text_to_speech_transcription_generation"]
          - name: ["understand_audio"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/understand_audio.py"
            function: ["understand_audio"]
          - name: ["text_to_speech_continue_generation"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/text_to_speech_continue_generation.py"
            function: ["text_to_speech_continue_generation"]
      system_message: |
        You are an AI assistant with access to text-to-speech generation capabilities.

    - id: "text_to_speech_continue_generation_tool2"
      backend:
        type: "openai"
        model: "gpt-4o"
        cwd: "workspace2"
        enable_audio_generation: true
        custom_tools:
          - name: ["text_to_speech_transcription_generation"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/text_to_speech_transcription_generation.py"
            function: ["text_to_speech_transcription_generation"]
          - name: ["understand_audio"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/understand_audio.py"
            function: ["understand_audio"]
          - name: ["text_to_speech_continue_generation"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/text_to_speech_continue_generation.py"
            function: ["text_to_speech_continue_generation"]
      system_message: |
        You are an AI assistant with access to text-to-speech generation capabilities.

  orchestrator:
    snapshot_storage: "snapshots"
    agent_temporary_workspace: "temp_workspaces"


**Example Command:**

.. code-block:: bash

   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_speech_generation_multi.yaml \
     "I want you to tell me a very short introduction about Sherlock Holmes in one sentence, and I want you to use emotion voice to read it out loud."

Video Understanding
-------------------

Analyze and extract information from video files using the ``understand_video`` custom tool.

.. note::
   The ``understand_video`` tool now routes to the agent's native backend when it supports ``video_understanding``. If the agent's backend doesn't support video understanding, it falls back to OpenAI ``gpt-5.4``. The OpenAI fallback requires an ``OPENAI_API_KEY``.

.. code-block:: yaml

   agents:
     - id: "video_analyzer"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         cwd: "workspace1"
         custom_tools:
           - name: ["understand_video"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_video.py"
             function: ["understand_video"]

   orchestrator:
     context_paths:
       - path: "path/to/video.mp4"
         permission: "read"

**Supported Formats:**

* MP4, AVI, MOV, MKV, FLV, WMV, WEBM, M4V, MPG, MPEG

**Example Use Cases:**

* Video content analysis
* Scene detection and description
* Action recognition
* Video summarization
* Quality assessment

**Requirements:**

* Requires opencv-python (``pip install opencv-python``)
* Optional: ``pip install massgen[video]`` for scene-based frame extraction

**Configurable Frame Extraction (v0.1.56+):**

By default, video understanding uses scene-based frame extraction (PySceneDetect) to select the most informative frames. You can configure the extraction strategy via ``multimodal_config``:

.. code-block:: yaml

   agents:
     - id: "video_analyzer"
       backend:
         type: "openai"
         model: "gpt-5.4"
         enable_multimodal_tools: true
         multimodal_config:
           video:
             extraction_mode: "scene"   # "scene" (default) | "uniform"
             max_frames: 30             # Hard cap (default: 30, absolute max: 60)
             fps: 1.0                   # Uniform mode: frames per second
             threshold: 0.3             # Scene mode: detection sensitivity
             frames_per_scene: 3        # Scene mode: frames per detected scene

**Extraction modes:**

* **scene** (default): Detects scene boundaries using PySceneDetect's ``ContentDetector``, then samples ``frames_per_scene`` frames within each scene. Falls back to uniform when PySceneDetect is not installed or no scenes are detected.
* **uniform**: Evenly spaced frames based on ``fps`` (default 1.0 frame/sec) or ``num_frames`` (fixed count, overrides fps). Always capped at ``max_frames``.

**Cost guardrails:** The ``max_frames`` setting (default 30) prevents runaway token costs on long videos. The absolute maximum is 60 frames regardless of configuration.
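
The interaction between ``fps``, ``num_frames``, and ``max_frames`` in uniform mode can be sketched in a few lines of Python. This illustrates the rules described above and is not MassGen's actual implementation: ``planned_frame_count`` and ``ABSOLUTE_MAX_FRAMES`` are hypothetical names, and rounding down is an assumption.

.. code-block:: python

   ABSOLUTE_MAX_FRAMES = 60  # absolute cap stated in this guide


   def planned_frame_count(duration_s, fps=1.0, num_frames=None, max_frames=30):
       """Estimate how many frames uniform extraction would sample (hypothetical helper)."""
       # max_frames can never exceed the absolute cap.
       cap = min(max_frames, ABSOLUTE_MAX_FRAMES)
       # A fixed num_frames overrides the fps-based count.
       if num_frames is not None:
           requested = num_frames
       else:
           requested = int(duration_s * fps)  # assumption: round down
       # Sample at least one frame and never more than the cap.
       return max(1, min(requested, cap))


   print(planned_frame_count(20))                   # 20
   print(planned_frame_count(600))                  # 30
   print(planned_frame_count(600, max_frames=100))  # 60
   print(planned_frame_count(600, num_frames=8))    # 8

Under these assumptions, a 10-minute video at the default settings is sampled at 30 frames rather than 600, which is the cost guardrail at work.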

Video Generation
----------------

Generate videos from text descriptions using OpenAI's Sora-2 API:

.. code-block:: yaml

   agents:
     - id: "video_generator"
       backend:
         type: "openai"
         model: "gpt-4o"
         cwd: "workspace1"
         enable_video_generation: true
         custom_tools:
           - name: ["text_to_video_generation"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/text_to_video_generation.py"
             function: ["text_to_video_generation"]
       system_message: "You are an AI assistant with access to text-to-video generation capabilities."

**Example Command:**

.. code-block:: bash

   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_single.yaml \
     "Generate a 4-second video of a neon-lit alley at night, light rain, slow push-in, cinematic."

**Key Features:**

* Powered by OpenAI's Sora-2 model
* Generate high-quality videos from text descriptions
* Customizable video duration (4-20 seconds)
* Automatic video download and storage
* Supports detailed scene descriptions and camera movements

**Use Cases:**

* Marketing and advertising content creation
* Concept visualization and storyboarding
* Educational and training videos
* Social media content generation
* Creative storytelling and animation
* Product demonstration videos

**Best Practices for Video Generation:**

* Provide detailed scene descriptions including:

  * Setting and environment
  * Lighting conditions
  * Camera movements (push-in, pull-out, pan, etc.)
  * Atmosphere and mood
  * Objects and characters

* Use cinematic terminology for better results
* Specify duration based on content complexity
* Combine with ``understand_video`` tool for quality verification

Multi-Agent Video Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Combine video generation with analysis for iterative improvement:

.. code-block:: yaml

  agents:
    - id: "text_to_video_generation_tool1"
      backend:
        type: "openai"
        model: "gpt-4o"
        cwd: "workspace1"
        enable_video_generation: true
        custom_tools:
          - name: ["understand_video"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/understand_video.py"
            function: ["understand_video"]
          - name: ["text_to_video_generation"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/text_to_video_generation.py"
            function: ["text_to_video_generation"]
      system_message: |
        You are an AI assistant with access to text-to-video generation capabilities.

    - id: "text_to_video_generation_tool2"
      backend:
        type: "openai"
        model: "gpt-4o"
        cwd: "workspace2"
        enable_video_generation: true
        custom_tools:
          - name: ["understand_video"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/understand_video.py"
            function: ["understand_video"]
          - name: ["text_to_video_generation"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/text_to_video_generation.py"
            function: ["text_to_video_generation"]
      system_message: |
        You are an AI assistant with access to text-to-video generation capabilities.

  orchestrator:
    snapshot_storage: "snapshots"
    agent_temporary_workspace: "temp_workspaces"


**Example Command:**

.. code-block:: bash

   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_multi.yaml \
     "Generate a 4-second video of a neon-lit alley at night, light rain, slow push-in, cinematic."

File Understanding
------------------

File understanding capabilities enable agents to analyze documents and perform Q&A using the ``understand_file`` custom tool.

Configure agents to analyze files:

.. code-block:: yaml

   agents:
     - id: "document_agent"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         cwd: "workspace1"
         custom_tools:
           - name: ["understand_file"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_file.py"
             function: ["understand_file"]

   orchestrator:
     context_paths:
       - path: "path/to/document.pdf"
         permission: "read"
       - path: "path/to/report.docx"
         permission: "read"

**Supported File Types:**

* **Text Files**: .py, .js, .java, .md, .txt, .log, .csv, .json, .yaml, etc.
* **PDF**: Requires PyPDF2 (``pip install PyPDF2``)
* **Word**: .docx - Requires python-docx (``pip install python-docx``)
* **Excel**: .xlsx - Requires openpyxl (``pip install openpyxl``)
* **PowerPoint**: .pptx - Requires python-pptx (``pip install python-pptx``)

**Example Use Case:**

.. code-block:: bash

   # Document Q&A
   massgen \
     --config @examples/basic/single/single_gpt5nano_file_search.yaml \
     "What are the main conclusions from the research paper?"

File Generation
---------------

Generate formatted documents from text using AI. The ``text_to_file_generation`` tool can create professional documents in various formats:

.. code-block:: yaml

   agents:
     - id: "document_generator"
       backend:
         type: "openai"
         model: "gpt-4o"
         cwd: "workspace1"
         enable_file_generation: true
         custom_tools:
           - name: ["text_to_file_generation"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/text_to_file_generation.py"
             function: ["text_to_file_generation"]
       system_message: "You are an AI assistant with access to text-to-file generation capabilities."

**Example Command:**

.. code-block:: bash

   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_single.yaml \
     "Please generate a comprehensive technical report about the latest developments in Large Language Models (LLMs) and Generative AI. The report should include: 1) Executive Summary, 2) Introduction to LLMs, 3) Recent breakthroughs, 4) Applications in industry, 5) Ethical considerations, 6) Future directions. Save it as a PDF file."

**Supported Output Formats:**

* **TXT** - Plain text files
* **MD** - Markdown formatted documents
* **PDF** - Professional PDF documents with formatting
* **PPTX** - PowerPoint presentations with slide structure

Multi-Agent Document Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Combine generation with review and refinement:

.. code-block:: yaml

  agents:
    - id: "text_to_file_generation_tool1"
      backend:
        type: "openai"
        model: "gpt-4o"
        cwd: "workspace1"
        enable_file_generation: true
        custom_tools:
          - name: ["text_to_file_generation"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/text_to_file_generation.py"
            function: ["text_to_file_generation"]
          - name: ["understand_file"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/understand_file.py"
            function: ["understand_file"]
      system_message: |
        You are an AI assistant with access to text-to-file generation capabilities.

    - id: "text_to_file_generation_tool2"
      backend:
        type: "openai"
        model: "gpt-4o"
        cwd: "workspace2"
        enable_file_generation: true
        custom_tools:
          - name: ["text_to_file_generation"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/text_to_file_generation.py"
            function: ["text_to_file_generation"]
          - name: ["understand_file"]
            category: "multimodal"
            path: "massgen/tool/_multimodal_tools/understand_file.py"
            function: ["understand_file"]
      system_message: |
        You are an AI assistant with access to text-to-file generation capabilities.

  orchestrator:
    snapshot_storage: "snapshots"
    agent_temporary_workspace: "temp_workspaces"

**Example Command:**

.. code-block:: bash

   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_multi.yaml \
     "Please generate a comprehensive technical report about the latest developments in Large Language Models (LLMs) and Generative AI. The report should include: 1) Executive Summary, 2) Introduction to LLMs, 3) Recent breakthroughs, 4) Applications in industry, 5) Ethical considerations, 6) Future directions. Save it as a PDF file."

**Requirements:**

* PDF generation requires ``reportlab`` (``pip install reportlab``)
* PPTX generation requires ``python-pptx`` (``pip install python-pptx``)

Supported Backends
------------------

* **Supported Backends**: OpenAI, Claude, Claude Code, Gemini, Grok, Chat Completions (generic API), LM Studio, Inference (vLLM/SGLang)
* **Not Supported**: Azure OpenAI, AG2 (these backends don't support custom tools)
* **How It Works**: Understanding tools route to the agent's native backend when supported (v0.1.55+). Image understanding supports OpenAI, Claude, Gemini, Grok, Claude Code, and Codex natively. Unsupported backends fall back to OpenAI.
* **Requirements**:

  * Your agent backend must support custom tools
  * The agent's own API key should be available for native routing (e.g., ``ANTHROPIC_API_KEY`` for Claude agents)
  * ``OPENAI_API_KEY`` is needed as a fallback for backends without native image understanding
  * Claude Code requires the ``claude`` CLI; Codex requires the ``codex`` CLI

See :doc:`../tools/custom_tools` for complete details on custom tool support by backend, and :doc:`../backends` for all backend capabilities including web search, code execution, and MCP support.
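
The routing rule for image understanding (use the agent's own backend when it supports the capability, otherwise fall back to OpenAI) can be sketched as a small lookup. This is an illustration only, not MassGen's code: the backend identifier strings, ``resolve_image_backend``, and the two dictionaries are hypothetical, while the environment variable names are the ones mentioned in this guide.

.. code-block:: python

   # Backends this guide lists as supporting image understanding natively.
   # The identifier strings are assumptions made for illustration.
   NATIVE_IMAGE_BACKENDS = {"openai", "claude", "gemini", "grok", "claude_code", "codex"}

   # API-key variables named in this guide. CLI-based backends need a CLI instead.
   REQUIRED_ENV = {
       "openai": "OPENAI_API_KEY",
       "claude": "ANTHROPIC_API_KEY",
       "gemini": "GEMINI_API_KEY",
       "grok": "XAI_API_KEY",
   }


   def resolve_image_backend(agent_backend):
       """Return (backend_used, required_env_var_or_None) under the rules above."""
       if agent_backend in NATIVE_IMAGE_BACKENDS:
           used = agent_backend
       else:
           used = "openai"  # documented fallback
       return used, REQUIRED_ENV.get(used)


   print(resolve_image_backend("claude"))       # ('claude', 'ANTHROPIC_API_KEY')
   print(resolve_image_backend("lmstudio"))     # ('openai', 'OPENAI_API_KEY')
   print(resolve_image_backend("claude_code"))  # ('claude_code', None)

The second call shows why ``OPENAI_API_KEY`` is still worth setting when none of your agents use OpenAI directly.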

Configuration Examples
----------------------

Complete configuration files are available in the MassGen repository:

**Custom Multimodal Understanding Tools (v0.1.3+):**

* ``massgen/configs/tools/custom_tools/multimodal_tools/understand_audio.yaml`` - Audio transcription tool
* ``massgen/configs/tools/custom_tools/multimodal_tools/understand_file.yaml`` - File understanding tool (PDF, DOCX, etc.)
* ``massgen/configs/tools/custom_tools/multimodal_tools/understand_image.yaml`` - Image understanding tool
* ``massgen/configs/tools/custom_tools/multimodal_tools/understand_video.yaml`` - Video understanding tool

**Custom Multimodal Generation Tools (Latest):**

* ``massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_single.yaml`` - Single-agent image generation
* ``massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_multi.yaml`` - Multi-agent image generation
* ``massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_single.yaml`` - Single-agent video generation
* ``massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_multi.yaml`` - Multi-agent video generation
* ``massgen/configs/tools/custom_tools/multimodal_tools/text_to_speech_generation_single.yaml`` - Single-agent speech generation
* ``massgen/configs/tools/custom_tools/multimodal_tools/text_to_speech_generation_multi.yaml`` - Multi-agent speech generation
* ``massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_single.yaml`` - Single-agent document generation
* ``massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_multi.yaml`` - Multi-agent document generation

**Image Understanding:**

* ``@examples/basic/single/single_gpt5nano_image_understanding.yaml`` - Image understanding
* ``@examples/basic/multi/gpt5nano_image_understanding.yaml`` - Multi-agent image analysis

**Audio Understanding:**

* ``@examples/basic/single/single_openrouter_audio_understanding.yaml`` - Audio transcription

**Video Understanding:**

* ``@examples/basic/single/single_qwen_video_understanding.yaml`` - Video analysis with Qwen

**File Operations:**

* ``@examples/basic/single/single_gpt5nano_file_search.yaml`` - Document Q&A with file search

Browse all examples in the `Configuration README <https://github.com/Leezekun/MassGen/blob/main/@examples/README.md>`_.

File Size Limits and Optimization
----------------------------------

MassGen automatically handles file size limits to prevent memory issues and API errors.

Default Size Limits
~~~~~~~~~~~~~~~~~~~

Each multimodal tool has configurable size limits:

* **Images**: 10MB (automatically resized if exceeded)
* **Videos**: 50MB
* **Audio**: 25MB

Automatic Image Resizing
~~~~~~~~~~~~~~~~~~~~~~~~~

When an image exceeds the size limit, MassGen automatically:

1. Detects the oversized file
2. Compresses and resizes the image
3. Saves the optimized version to a temporary location
4. Processes the optimized image

**Supported formats for auto-resizing**: PNG, JPEG, JPG, WebP

**Example log output**:

.. code-block:: text

   Image size (12.5 MB) exceeds limit (10 MB). Attempting to resize...
   Successfully resized image from 12.5 MB to 8.3 MB
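
The decision behind that log output can be sketched as follows. This is a simplified illustration using the default limits and resizable formats listed above, not MassGen's implementation: ``size_action`` is a hypothetical helper, treating 1 MB as 1024 * 1024 bytes is an assumption, and so is rejecting oversized files that cannot be resized.

.. code-block:: python

   from pathlib import Path

   # Default limits quoted in this guide, in megabytes.
   DEFAULT_LIMITS_MB = {"image": 10, "video": 50, "audio": 25}
   # Image formats this guide lists as eligible for automatic resizing.
   RESIZABLE_SUFFIXES = {".png", ".jpeg", ".jpg", ".webp"}

   MB = 1024 * 1024  # assumption: binary megabytes


   def size_action(path, kind, size_bytes, limits_mb=DEFAULT_LIMITS_MB):
       """Return 'process', 'resize', or 'reject' for a file of the given kind."""
       if size_bytes / MB <= limits_mb[kind]:
           return "process"
       if kind == "image" and Path(path).suffix.lower() in RESIZABLE_SUFFIXES:
           return "resize"
       return "reject"  # assumption: oversized files that cannot be resized are refused


   print(size_action("photo.JPG", "image", int(12.5 * MB)))  # resize
   print(size_action("scan.tiff", "image", int(12.5 * MB)))  # reject
   print(size_action("talk.mp3", "audio", 5 * MB))           # process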

Customizing Size Limits
~~~~~~~~~~~~~~~~~~~~~~~~

You can override the size limit for a tool with the ``MAX_FILE_SIZE_MB`` parameter, set here through ``preset_args``:

.. code-block:: yaml

   custom_tools:
     - name: ["understand_image"]
       category: "multimodal"
       path: "massgen/tool/_multimodal_tools/understand_image.py"
       function: ["understand_image"]
       preset_args:
         MAX_FILE_SIZE_MB: 15  # Increase limit to 15MB

**Note**: Increasing limits may cause:

* Higher memory usage
* API errors for very large files
* Increased processing time

Best Practices
--------------

1. **API Keys and Backend Configuration**

   * **Native routing (v0.1.55+)**: Image and video understanding tools now route to the agent's own backend when it supports the capability
   * Ensure your agent's API key is set (e.g., ``ANTHROPIC_API_KEY`` for Claude, ``GEMINI_API_KEY`` for Gemini, ``XAI_API_KEY`` for Grok)
   * Set ``OPENAI_API_KEY`` as a fallback for backends without native image understanding
   * Claude Code requires the ``claude`` CLI installed and authenticated; Codex requires the ``codex`` CLI
   * Audio understanding still uses OpenAI's ``gpt-4o-transcribe`` by default

2. **File Access and Configuration**

   * Use ``context_paths`` to provide secure file access to agents for understanding tasks
   * Ensure files are accessible before running: use absolute paths or paths relative to the execution directory
   * Install required dependencies before use:

     * Audio Understanding: No additional dependencies (uses OpenAI API)
     * Video Understanding: ``pip install opencv-python``
     * File Understanding (PDF): ``pip install PyPDF2``
     * File Understanding (Word): ``pip install python-docx``
     * File Understanding (Excel): ``pip install openpyxl``
     * File Understanding (PowerPoint): ``pip install python-pptx``
     * File Generation (PDF): ``pip install reportlab``
     * File Generation (PPTX): ``pip install python-pptx``

3. **Generation Tool Configuration**

   * Enable generation capabilities with backend flags:

     * ``enable_image_generation: true`` for image generation
     * ``enable_video_generation: true`` for video generation
     * ``enable_audio_generation: true`` for speech generation
     * ``enable_file_generation: true`` for document generation

   * Set an appropriate ``cwd`` for organized output storage
   * Use the ``storage_path`` parameter to customize output locations
   * Verify generated content with the corresponding understanding tools

4. **Performance and Cost Optimization**

   * **Understanding Tools:**

     * Set appropriate ``max_chars`` limits for large documents to control API costs
     * Adjust ``num_frames`` for videos (default: 8) based on content length and detail needed
     * Monitor OpenAI API usage when processing large files or many files

   * **Generation Tools:**

     * Image generation (GPT-4.1) is more expensive than standard API calls
     * Video generation (Sora-2) can be costly - use appropriate duration (4-20 seconds)
     * Speech generation costs vary by model (gpt-4o-audio-preview vs gpt-4o-mini-tts)
     * Use multi-agent collaboration to refine prompts before generation

5. **Quality and Accuracy**

   * **Understanding:**

     * Use high-quality source files (clear images, high-quality audio, well-lit videos)
     * Ask specific, detailed questions to get better responses
     * Use multi-agent collaboration for diverse perspectives on complex content

   * **Generation:**

     * Provide detailed, specific prompts for better generation results
     * For images: Include style, composition, lighting, and mood details
     * For videos: Specify scene, camera movements, duration, and atmosphere
     * For speech: Choose appropriate voice and specify emotional tone
     * For documents: Outline structure, sections, and formatting requirements
     * Combine understanding and generation agents for iterative refinement

6. **Workspace Management**

   * Configure ``cwd`` for organized file storage (both input and output)
   * Use ``snapshot_storage`` for agent collaboration and sharing generated content
   * Review generated content in agent workspaces before distribution
   * Include ``.massgen/`` in ``.gitignore``
   * Clean up old workspaces periodically to manage storage
   * Use descriptive filenames for generated content (automatic timestamp-based naming available)

Troubleshooting
---------------

**Image Issues:**

* **Image file not found:** Ensure the image path is listed in ``context_paths`` and the file exists

  .. code-block:: yaml

     orchestrator:
       context_paths:
         - path: "path/to/image.jpg"
           permission: "read"

**Audio Issues:**

* **Audio file not found:** Ensure the audio path is listed in ``context_paths`` and the file exists
* **Unsupported audio format:** Use one of the supported formats: WAV, MP3, M4A, MP4, OGG, FLAC, AAC, WMA, OPUS
* **API transcription error:** Verify that the OpenAI API key is set in the ``.env`` file

**Video Issues:**

* **opencv-python not installed:** Install with ``pip install opencv-python``
* **Video file not found:** Ensure the video path is listed in ``context_paths`` and the file exists

  .. code-block:: yaml

     orchestrator:
       context_paths:
         - path: "path/to/video.mp4"
           permission: "read"

* **Unsupported video format:** Use supported formats: MP4, AVI, MOV, MKV, FLV, WMV, WEBM, M4V, MPG, MPEG
* **High API costs:** Reduce ``num_frames`` parameter (default: 8) to extract fewer frames

**General File Issues:**

* **File not found:** Ensure the file path is added to ``context_paths`` in the orchestrator configuration

  .. code-block:: yaml

     orchestrator:
       context_paths:
         - path: "path/to/your/file"
           permission: "read"

* **Permission errors:** Verify that files are readable and paths are accessible

* **Missing dependencies:** Install required Python packages for specific file types

  .. code-block:: bash

     pip install PyPDF2 python-docx openpyxl python-pptx opencv-python reportlab
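
To see which of these packages are actually missing before installing anything, a short standard-library check is enough. The sketch below is not part of MassGen: ``missing_packages`` and ``OPTIONAL_PACKAGES`` are hypothetical names. Note that several pip package names differ from the module they install (for example, ``opencv-python`` installs ``cv2``).

.. code-block:: python

   import importlib.util

   # pip package name -> top-level module it installs
   OPTIONAL_PACKAGES = {
       "PyPDF2": "PyPDF2",
       "python-docx": "docx",
       "openpyxl": "openpyxl",
       "python-pptx": "pptx",
       "opencv-python": "cv2",
       "reportlab": "reportlab",
   }


   def missing_packages(packages=OPTIONAL_PACKAGES):
       """Return the pip names whose module cannot be found in this environment."""
       return sorted(
           pip_name
           for pip_name, module in packages.items()
           if importlib.util.find_spec(module) is None
       )


   missing = missing_packages()
   if missing:
       print("pip install " + " ".join(missing))
   else:
       print("All optional multimodal dependencies are installed.")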

**API and Dependency Issues:**

* **Missing OpenAI API key:** Set ``OPENAI_API_KEY`` in ``.env`` file or environment variable
* **Import errors:** Install required dependencies for your file types (see Best Practices section)
* **API costs:** Monitor usage carefully - multimodal understanding can be expensive with large files or many frames
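
Several of the issues above come down to a ``context_paths`` entry that does not exist or cannot be read. A minimal pre-flight sketch, assuming the configuration has already been loaded into a Python dictionary with the same shape as the YAML examples in this guide. ``unreadable_context_paths`` is a hypothetical helper, and it does not resolve MassGen-specific shortcuts such as ``@examples/``.

.. code-block:: python

   import os


   def unreadable_context_paths(config):
       """Return (path, reason) pairs for context_paths entries that cannot be used."""
       entries = config.get("orchestrator", {}).get("context_paths", [])
       problems = []
       for entry in entries:
           path = entry["path"]
           if not os.path.exists(path):
               problems.append((path, "not found"))
           elif not os.access(path, os.R_OK):
               problems.append((path, "not readable"))
       return problems


   # Same structure as the YAML shown in this guide, written as a dict.
   config = {
       "orchestrator": {
           "context_paths": [
               {"path": "path/to/image.jpg", "permission": "read"},
           ]
       }
   }

   for path, reason in unreadable_context_paths(config):
       print(f"{path}: {reason}")

Relative paths are checked against the current working directory, so run the check from the directory where you launch ``massgen``.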

Use Cases
---------

**Content Understanding:**

* **Document Processing:**

  * Analyze PDFs, Word docs, Excel sheets, PowerPoint presentations
  * Extract data from forms, tables, and structured documents
  * Summarize research papers, technical documentation, and reports

* **Media Analysis:**

  * Transcribe meeting recordings, interviews, and podcasts
  * Analyze video content through key frame extraction
  * Extract information from screenshots, charts, and diagrams

* **Code and Visual Analysis:**

  * Code analysis with AI-powered explanations
  * Visual content description for accessibility
  * Scene detection and description in videos

**Content Generation:**

* **Creative Content Creation:**

  * Generate marketing visuals and product images from descriptions
  * Create social media content (images, videos, audio)
  * Produce concept art and design mockups
  * Generate voice-overs and narration for videos

* **Document and Report Generation:**

  * Automatically generate technical reports and white papers
  * Create formatted business documentation (PDF, MD, TXT)
  * Produce meeting summaries and documentation
  * Generate educational materials and training guides

* **Video Production:**

  * Create promotional and marketing videos from text descriptions
  * Generate concept visualization and storyboards
  * Produce educational content and tutorials
  * Create social media video content

* **Audio Content:**

  * Generate audiobooks and narrated content
  * Create podcast intros and outros
  * Produce accessibility audio for visually impaired users
  * Generate multilingual voice content

Next Steps
----------

* :doc:`../backends` - Backend-specific multimodal capabilities
* :doc:`../files/file_operations` - Workspace and file management
* :doc:`../tools/index` - Custom tools configuration and usage
* :doc:`../../examples/advanced_patterns` - Advanced multimodal patterns
* :doc:`../../reference/yaml_schema` - Complete configuration reference


---

## user_guide/advanced/planning_mode.rst

:orphan:

Planning Mode
=============

Planning Mode enables agents to coordinate and plan their approaches **without executing irreversible actions**. Only the winning agent executes the final plan during presentation, preventing conflicts and unintended side effects during multi-agent coordination.

.. note::

   **New in v0.0.29**: Planning mode is especially powerful for MCP tool usage, preventing agents from executing external API calls, file operations, or database modifications during coordination.

Quick Start
-----------

**Five agents planning with filesystem tools:**

.. code-block:: bash

   uv run massgen \
     --config @examples/tools/planning/five_agents_filesystem_mcp_planning_mode.yaml \
     "Create a comprehensive project structure with documentation"

**Example with MCP tools:**

.. code-block:: bash

   uv run massgen \
     --config @examples/tools/mcp/five_agents_weather_mcp_test.yaml \
     "Compare weather forecasts for New York, London, and Tokyo"

What is Planning Mode?
-----------------------

Planning mode separates multi-agent coordination into two distinct phases:

1. **Coordination Phase** - Agents discuss, analyze, and vote on approaches **without executing actions**
2. **Presentation Phase** - Only the winning agent executes the agreed-upon plan

Without Planning Mode
~~~~~~~~~~~~~~~~~~~~~~

**Standard coordination** allows all agents to execute actions immediately:

.. code-block:: text

   ❌ Agent A creates file "output.txt" with content X
   ❌ Agent B creates file "output.txt" with content Y (overwrites!)
   ❌ Agent C creates file "output.txt" with content Z (overwrites again!)
   → Result: Chaos, lost work, conflicting changes

With Planning Mode
~~~~~~~~~~~~~~~~~~

**Planning mode** prevents execution during coordination:

.. code-block:: text

   ✅ Agent A: "I would create output.txt with content X because..."
   ✅ Agent B: "I would create output.txt with content Y because..."
   ✅ Agent C: "I agree with Agent B's approach" [votes for B]
   ✅ Agent A: "Agent B's approach is better" [votes for B]
   → Winner: Agent B
   → Agent B executes: Creates output.txt with content Y (no conflicts!)

When to Use Planning Mode
--------------------------

Use planning mode for tasks involving irreversible or conflicting operations:

File System Operations
~~~~~~~~~~~~~~~~~~~~~~

* ✅ File creation, modification, deletion
* ✅ Directory structure changes
* ✅ Batch file operations

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_planning_mode: true

**Why**: Prevents multiple agents from creating/deleting the same files during coordination.

MCP External Tools
~~~~~~~~~~~~~~~~~~

* ✅ API calls (weather, search, notifications)
* ✅ Database operations
* ✅ External service integrations (Twitter, Discord, Notion)

.. code-block:: bash

   # Weather API example with planning mode
   uv run massgen \
     --config @examples/tools/mcp/five_agents_weather_mcp_test.yaml \
     "Get weather data for multiple cities"

**Why**: Prevents redundant API calls, rate limiting issues, and conflicting external state changes.

State-Changing Operations
~~~~~~~~~~~~~~~~~~~~~~~~~

* ✅ Database writes
* ✅ Sending messages/emails
* ✅ Creating issues/tickets
* ✅ Publishing content

**Why**: These operations can't be easily undone or rolled back.

When NOT to Use Planning Mode
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Planning mode adds coordination overhead. Skip it for:

* ❌ Pure analysis tasks (no side effects)
* ❌ Read-only operations
* ❌ Single-agent tasks
* ❌ Tasks where parallel execution is beneficial

Configuration
-------------

Basic Configuration
~~~~~~~~~~~~~~~~~~~

Enable planning mode in the ``orchestrator`` section:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_planning_mode: true

Agents will automatically plan without executing during coordination.

Custom Planning Instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Customize the planning behavior with instructions:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_planning_mode: true
       planning_mode_instruction: |
         PLANNING MODE ACTIVE: You are in the coordination phase.

         During this phase:
         1. Describe your intended approach and reasoning
         2. Analyze other agents' proposals
         3. Use 'vote' or 'new_answer' tools for coordination
         4. DO NOT execute filesystem operations, API calls, or state changes
         5. Save all execution for the final presentation phase

         Focus on planning, analysis, and coordination rather than execution.

Complete Example
~~~~~~~~~~~~~~~~

Full configuration with planning mode for filesystem operations:

.. code-block:: yaml

   agents:
     - id: "agent_a"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         cwd: "workspace_a"  # File operations handled via cwd

     - id: "agent_b"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         cwd: "workspace_b"  # File operations handled via cwd

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"
     coordination:
       enable_planning_mode: true
       planning_mode_instruction: |
         During coordination, describe what you would do without executing.
         Only the winning agent will implement the plan.

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

Orchestration Restart
---------------------

Planning mode works well with **orchestration restart**, a feature that automatically restarts coordination when answers are incomplete.

.. seealso::
   :doc:`../sessions/orchestration_restart` - Complete guide to automatic quality checks, per-attempt logging, and self-correcting workflows

How Planning Mode Works
------------------------

Coordination Phase
~~~~~~~~~~~~~~~~~~

During coordination with planning mode enabled:

1. **Agents receive planning instructions** automatically
2. **Agents describe approaches** without execution
3. **Coordination tools remain available**: ``vote`` and ``new_answer``
4. **MCP/filesystem tools are NOT blocked**: agents must follow the instruction not to use them
5. **Agents vote** for the best approach

.. note::

   Planning mode relies on agents following instructions. It's not a technical block but a behavioral guideline. Agents with strong instruction-following (Claude, GPT-4, Gemini) respect planning mode well.

Presentation Phase
~~~~~~~~~~~~~~~~~~

After coordination completes:

1. **Winner selected** based on votes
2. **Planning mode disabled** for winner
3. **Winner executes the plan** with full tool access
4. **Results saved** and returned to user
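
The winner-selection step can be pictured as a simple vote tally. The sketch below is illustrative only: ``select_winner`` is a hypothetical name, and the alphabetical tie-break is an assumption of this example, not documented MassGen behavior.

.. code-block:: python

   from collections import Counter


   def select_winner(votes):
       """Pick the agent whose answer received the most votes.

       ``votes`` maps a voter id to the id of the agent it voted for.
       Ties are broken alphabetically by agent id (an assumption of this sketch).
       """
       if not votes:
           raise ValueError("no votes were cast")
       tally = Counter(votes.values())
       best = max(tally.values())
       return min(agent for agent, count in tally.items() if count == best)


   votes = {"agent_a": "agent_b", "agent_c": "agent_b"}
   print(select_winner(votes))  # agent_b

Only the agent returned here would go on to execute its plan with full tool access.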

Example Workflow
~~~~~~~~~~~~~~~~

**Task**: "Create a project structure with src/, tests/, and docs/ directories"

**Coordination Phase** (Planning Mode Active):

.. code-block:: text

   Round 1:
   --------
   Agent A: "I would create three directories: src/ for source code,
            tests/ for test files, and docs/ for documentation.
            Then I would add README files to each." [new_answer]

   Agent B: "I would do the same but also add __init__.py files to
            make src/ and tests/ proper Python packages." [new_answer]

   Agent C: "Agent B's approach is more complete." [votes for B]

   Round 2:
   --------
   Agent A: "Good point about __init__.py" [votes for B]
   Agent B: [already provided answer]
   Agent C: [already voted]

   → Voting complete (Agents A and C voted for B)
   → Winner: Agent B (2 votes)

**Presentation Phase** (Planning Mode Disabled):

.. code-block:: text

   Agent B executes:
   - create_directory("src")
   - write_file("src/__init__.py", "")
   - create_directory("tests")
   - write_file("tests/__init__.py", "")
   - create_directory("docs")
   - write_file("docs/README.md", "# Documentation")

   ✅ Complete! Clean execution without conflicts.

Benefits
--------

Conflict Prevention
~~~~~~~~~~~~~~~~~~~

* ✅ No competing file operations
* ✅ No redundant API calls
* ✅ Single, coherent execution path

Quality Through Discussion
~~~~~~~~~~~~~~~~~~~~~~~~~~~

* ✅ Agents refine ideas through coordination
* ✅ Best approach wins through voting
* ✅ Implementation reflects consensus

Resource Efficiency
~~~~~~~~~~~~~~~~~~~

* ✅ Prevents wasted API calls during coordination
* ✅ Single execution reduces costs
* ✅ Avoids rate limiting issues

Auditability
~~~~~~~~~~~~

* ✅ Clear separation between planning and execution
* ✅ Easy to review proposed approach before execution
* ✅ Detailed logs of coordination decisions

Examples by Use Case
--------------------

Example 1: Project Structure Creation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Config**: ``@examples/tools/planning/five_agents_filesystem_mcp_planning_mode.yaml``

.. code-block:: bash

   uv run massgen \
     --config @examples/tools/planning/five_agents_filesystem_mcp_planning_mode.yaml \
     "Create a Python microservice project with src/, tests/, docker/, and docs/ directories. Add starter files."

**Result**: Agents discuss the ideal structure, vote on the best approach, then the winning agent creates everything cleanly.

Example 2: Weather Data Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Config**: ``@examples/tools/mcp/five_agents_weather_mcp_test.yaml``

.. code-block:: bash

   uv run massgen \
     --config @examples/tools/mcp/five_agents_weather_mcp_test.yaml \
     "Fetch weather data for San Francisco, New York, and London. Compare temperatures."

**Result**: Agents plan the API calls, agree on a data format, then the winning agent makes the actual requests.

Example 3: Social Media Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Config**: ``@examples/tools/planning/five_agents_twitter_mcp_planning_mode.yaml``

.. code-block:: bash

   uv run massgen \
     --config @examples/tools/planning/five_agents_twitter_mcp_planning_mode.yaml \
     "Analyze recent tweets about AI and post a summary"

**Result**: Agents plan search queries and post content without actually posting during coordination.

Backend Compatibility
---------------------

Planning mode works with all backends that support MCP or filesystem tools:

.. list-table::
   :header-rows: 1
   :widths: 25 25 50

   * - Backend
     - Planning Mode
     - Notes
   * - ``gemini``
     - ✅ Full support
     - Excellent instruction following
   * - ``openai``
     - ✅ Full support
     - GPT-4 and GPT-5 follow instructions well
   * - ``claude``
     - ✅ Full support
     - Strong instruction adherence
   * - ``claude_code``
     - ✅ Full support
     - Built-in tool control
   * - ``grok``
     - ✅ Full support
     - Reliable instruction following
   * - ``lmstudio``
     - ⚠️ Varies
     - Depends on local model quality

Troubleshooting
---------------

Agents Executing During Coordination
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem**: Agents are executing actions despite planning mode being enabled.

**Solutions**:

1. **Check your configuration**:

   .. code-block:: yaml

      orchestrator:
        coordination:
          enable_planning_mode: true  # Make sure this is set

2. **Strengthen planning instructions**:

   .. code-block:: yaml

      orchestrator:
        coordination:
          planning_mode_instruction: |
            IMPORTANT: DO NOT execute any operations during coordination.
            You are in PLANNING MODE - describe what you would do.

3. **Use backends with strong instruction following**: Claude, GPT-4/5, Gemini 2.0+

4. **Add explicit instructions to agent system messages**:

   .. code-block:: yaml

      agents:
        - id: "agent_a"
          system_message: |
            During coordination, you must ONLY plan and discuss.
            Do not execute filesystem, API, or state-changing operations.
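
The first two checks can be automated once the YAML file has been parsed into a dictionary. The following is a minimal sketch under that assumption: ``planning_mode_problems`` is a hypothetical helper, and it only inspects the keys shown in the examples above.

.. code-block:: python

   def planning_mode_problems(config):
       """Return human-readable problems found in a parsed MassGen config dict."""
       problems = []
       orchestrator = config.get("orchestrator") or {}
       coordination = orchestrator.get("coordination") or {}

       # Solution 1: the flag must be present and set to true.
       if coordination.get("enable_planning_mode") is not True:
           problems.append(
               "orchestrator.coordination.enable_planning_mode is not true"
           )

       # Solution 2: a custom instruction is optional but often helps.
       instruction = coordination.get("planning_mode_instruction") or ""
       if not instruction.strip():
           problems.append("no custom planning_mode_instruction is set")

       return problems


   config = {
       "orchestrator": {
           "coordination": {
               "enable_planning_mode": True,
               "planning_mode_instruction": "Describe what you would do.",
           }
       }
   }
   print(planning_mode_problems(config))  # []

An empty list means both settings are in place; it says nothing about whether a given model will actually follow the instruction.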

Coordination Takes Too Long
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem**: Agents spend many rounds discussing without converging.

**Solutions**:

1. **Add timeout configuration**:

   .. code-block:: yaml

      timeout_settings:
        orchestrator_timeout_seconds: 600  # 10 minutes

2. **Use fewer agents** for simpler tasks

3. **Provide clearer task descriptions**

4. **Add voting guidance to system messages**

Best Practices
--------------

1. **Enable for irreversible operations**: Always use planning mode for file operations, API calls, or database changes

2. **Custom instructions for complex tasks**: Tailor ``planning_mode_instruction`` to your specific use case

3. **Clear task descriptions**: Help agents understand what needs planning vs immediate action

4. **Monitor coordination rounds**: Check logs to see if planning is effective

5. **Test with smaller agent teams first**: Start with 2-3 agents before scaling to 5+

6. **Set appropriate timeouts**: Some tasks need more coordination time

Next Steps
----------

* :doc:`../tools/mcp_integration` - Learn about MCP tools that benefit from planning mode
* :doc:`../files/file_operations` - Understand filesystem operations in planning mode
* :doc:`../../reference/yaml_schema` - Complete configuration reference
* :doc:`../../examples/advanced_patterns` - Advanced planning mode patterns


---

## user_guide/advanced/subagents.rst

Subagents
=========

Subagents let an agent spawn independent child MassGen orchestrator processes for parallel task execution. Each subagent runs in its own isolated workspace, so complex workflows can be broken into concurrent, independent pieces.

Quick Start
-----------

**Enable subagents and run a parallel task:**

.. code-block:: bash

   massgen \
     --config @massgen/configs/features/subagent_demo.yaml \
     "Build a web app with frontend, backend, and documentation"

Here, the agent can spawn subagents to work on frontend, backend, and docs simultaneously.

.. note::

   Currently, you may want to mention subagents in the prompt to ensure the agent uses them effectively.

What are Subagents?
-------------------

Subagents are independent MassGen processes spawned by a parent agent to handle parallelizable work:

.. code-block:: text

   Parent Agent
   ├── spawn_subagents([
   │     {task: "Build frontend", subagent_id: "frontend", context_paths: ["./frontend"]},
   │     {task: "Build backend", subagent_id: "backend", context_paths: ["./backend"]},
   │     {task: "Write docs", subagent_id: "docs", context_paths: ["./docs"]}
   │   ])
   │
   ├─→ Subagent: frontend (isolated workspace)
   ├─→ Subagent: backend (isolated workspace)
   └─→ Subagent: docs (isolated workspace)

   ← All complete → Parent continues with results

Key characteristics:

* **Process separation**: Each subagent is a separate MassGen subprocess, so it can run a single agent or several, each with MassGen's full capabilities.
* **Workspace isolation**: Each subagent gets its own workspace directory.
* **Explicit runtime mode**: The subagent runtime boundary is controlled by ``subagent_runtime_mode`` (default: ``isolated``). See :ref:`subagents-runtime-and-docker-behavior`.
* **Parallel execution**: All subagents run concurrently.
* **Automatic inheritance**: By default, subagents inherit all parent agent backends; an optional mode inherits only the spawning parent's exact backend/model.
* **Result aggregation**: The parent receives structured results with workspace paths.

When to Use Subagents
---------------------

Use subagents when you have:

* **Independent parallel tasks**: Research topic A while researching topic B
* **Large deliverables**: Break a website into frontend, backend, assets
* **Context isolation**: Keep each task's files separate and organized
* **Time-consuming work**: Run multiple long tasks simultaneously

Do NOT use subagents for:

* **Sequential dependencies**: Task B needs output from Task A
* **Simple tasks**: Overhead isn't worth it for quick operations
* **Shared state requirements**: Tasks that need to coordinate in real-time

Configuration
-------------

Basic Configuration
~~~~~~~~~~~~~~~~~~~

Enable subagents in your YAML config:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_subagents: true
       subagent_default_timeout: 300  # 5 minutes per subagent (default)
       subagent_min_timeout: 60       # Minimum 1 minute (prevents too-short timeouts)
       subagent_max_timeout: 600      # Maximum 10 minutes (prevents runaway subagents)
       subagent_max_concurrent: 3     # Max 3 subagents at once

       # Optional: per-round timeouts for subagents (inherits parent if omitted)
       subagent_round_timeouts:
         initial_round_timeout_seconds: 600
         subsequent_round_timeout_seconds: 300
         round_timeout_grace_seconds: 120

.. note::

   Timeouts are clamped to the ``[subagent_min_timeout, subagent_max_timeout]`` range. This prevents models from accidentally setting unreasonably short or long timeouts.
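
The clamping rule amounts to a bounded ``min``/``max``. Here is a minimal sketch, using the limits from the example configuration above as defaults; ``clamp_timeout`` is a hypothetical name, not a MassGen function.

.. code-block:: python

   def clamp_timeout(requested, min_timeout=60, max_timeout=600):
       """Clamp a requested timeout in seconds into [min_timeout, max_timeout]."""
       return max(min_timeout, min(requested, max_timeout))


   for requested in (10, 300, 5000):
       print(requested, "->", clamp_timeout(requested))
   # 10 -> 60
   # 300 -> 300
   # 5000 -> 600

A request below the minimum is raised to it, and a request above the maximum is lowered to it.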

Full Example
~~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "orchestrator_agent"
       backend:
         type: "openai"
         model: "gpt-4o"
         cwd: "workspace"
         enable_mcp_command_line: true
       system_message: |
         You are a task orchestrator. For complex tasks with independent
         parts, use spawn_subagents to parallelize the work.

   orchestrator:
     coordination:
       enable_subagents: true
       subagent_default_timeout: 300
       subagent_max_concurrent: 3
       enable_agent_task_planning: true  # Recommended for complex orchestration

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

Custom Subagent Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, subagents inherit all parent agent configurations. To customize:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_subagents: true
       subagent_orchestrator:
         enabled: true
         parse_at_references: false  # Optional: treat @tokens in task text as literals
         agents:
           - id: "subagent_worker"
             backend:
               type: "openai"
               model: "gpt-5-mini"  # Use cheaper model for subagents
           - id: "subagent_worker_2"
             backend:
               type: "gemini"
               model: "gemini-3-flash-preview"

To inherit the exact backend/model from the spawning parent agent (single-agent subagent runs):

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_subagents: true
       subagent_orchestrator:
         enabled: true
         inherit_spawning_agent_backend: true

.. note::

   ``subagent_orchestrator.agents`` is typically the shared evaluator pool used by
   ``round_evaluator``. Other subagent types prefer per-parent ``subagent_agents`` or
   the spawning parent's inherited backend, so it is valid to configure both.

To opt specific subagent types into that shared ``subagent_orchestrator.agents`` pool,
set:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_subagents: true
       subagent_orchestrator:
         enabled: true
         shared_child_team_types: [round_evaluator, builder]

To apply the shared pool to every subagent type, use:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_subagents: true
       subagent_orchestrator:
         enabled: true
         shared_child_team_types: ["*"]

How Agents Use Subagents
------------------------

When subagents are enabled, agents have access to the ``spawn_subagents`` tool:

.. code-block:: json

   {
     "tool": "spawn_subagents",
     "arguments": {
       "tasks": [
         {
           "task": "Research Bob Dylan's biography and write to bio.md",
           "subagent_id": "biography",
           "context_paths": ["./"]
         },
         {
           "task": "Create discography table in discography.md",
           "subagent_id": "discography",
           "context_paths": ["./"]
         },
         {
           "task": "List 20 famous songs with years in songs.md",
           "subagent_id": "songs",
           "context_paths": []
         }
       ],
       "refine": true
     }
   }

Critical Rules for Calling Subagent Tool
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Tasks run in PARALLEL**: All tasks start simultaneously. Do NOT create tasks where one depends on another's output.

2. **``context_paths`` is REQUIRED in every task**: Set it explicitly, even when empty.

   * Use ``[]`` for clean research with no extra path context.
   * Use ``["./"]`` to give read access to the parent workspace.
   * Use specific paths (for example ``["./website/index.html"]``) for least-privilege access.

3. **Maximum tasks per call**: Limited by ``subagent_max_concurrent`` (default 3).

4. **No nesting**: Subagents cannot spawn their own subagents.

5. **Read-only supplemental files**: Use optional ``context_files`` to share extra files, but subagents can only read them.

6. **Refine mode**: Use ``refine: false`` to return the first answer without multi-round refinement.
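
Rules 2 and 3 are easy to check before a call is made. The sketch below is a hypothetical pre-flight validator, not MassGen's own validation, and it covers only the task count and the presence of ``task`` and ``context_paths``.

.. code-block:: python

   def validate_spawn_tasks(tasks, max_concurrent=3):
       """Return a list of rule violations for a spawn_subagents task list."""
       errors = []

       # Rule 3: the number of tasks is limited by subagent_max_concurrent.
       if len(tasks) > max_concurrent:
           errors.append(
               f"{len(tasks)} tasks exceed subagent_max_concurrent={max_concurrent}"
           )

       for index, task in enumerate(tasks):
           label = task.get("subagent_id", f"task[{index}]")
           if not task.get("task"):
               errors.append(f"{label}: missing 'task' description")
           # Rule 2: context_paths must be given explicitly, even when empty.
           if not isinstance(task.get("context_paths"), list):
               errors.append(f"{label}: 'context_paths' must be a list (use [] for none)")

       return errors


   tasks = [
       {"task": "Write bio.md", "subagent_id": "biography", "context_paths": ["./"]},
       {"task": "List songs in songs.md", "subagent_id": "songs"},
   ]
   for problem in validate_spawn_tasks(tasks):
       print(problem)
   # songs: 'context_paths' must be a list (use [] for none)

An empty list ``[]`` passes the check because it is an explicit choice to give no path context.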

Subagent Tool Surface
~~~~~~~~~~~~~~~~~~~~~

The subagent MCP server intentionally exposes a small tool surface:

* ``spawn_subagents(tasks, background?, refine?)``: Start one or more subagents
* ``list_subagents()``: Discovery/index view (status, workspace, session, optional in-memory result payload)
* ``continue_subagent(subagent_id, message, timeout_seconds?)``: Continue an existing subagent session

Legacy specialized polling/cost endpoints are no longer part of the subagent MCP surface.
For background lifecycle management, use the standardized background lifecycle tools
(``custom_tool__get_background_tool_status``, ``custom_tool__get_background_tool_result``,
``custom_tool__wait_for_background_tool``, ``custom_tool__cancel_background_tool``,
``custom_tool__list_background_tools``).
Use ``include_all=true`` with ``custom_tool__list_background_tools`` (not ``list_subagents``).

Specialized Subagent Types
~~~~~~~~~~~~~~~~~~~~~~~~~~

You can pass ``subagent_type`` per task to use a specialized profile:

.. code-block:: json

   {
     "tool": "spawn_subagents",
     "arguments": {
       "tasks": [
         {
           "task": "Run procedural UI verification and report findings",
           "subagent_type": "evaluator",
           "subagent_id": "ui_eval",
           "context_paths": ["./"]
         }
       ]
     }
   }

Built-in specialized types:

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Type
     - When to use
     - What the parent brief should include
   * - ``explorer``
     - Codebase/repository discovery, tracing where behavior is implemented, mapping relevant files and call paths
     - Specific questions to answer, likely path roots, and expected artifact (for example: file list + key findings with line references)
   * - ``researcher``
     - External-source research, evidence gathering, and citation-oriented synthesis
     - Scope boundaries, recency/citation requirements, allowed sources, and expected output format (summary, comparison matrix, references)
   * - ``evaluator``
     - Procedural verification and execution-heavy checks (tests, scripts, UI checks, reproducible validation)
     - Environment/setup steps, exact commands to run, pass/fail rubric, and required report format (what passed, what failed, logs/artifacts)
   * - ``novelty``
     - Breaking refinement plateaus by proposing fundamentally different directions when agents are stuck in incremental changes
     - Current work/answer, diagnostic analysis of what's been tried, evaluation findings showing zero transformative changes. The subagent returns 2-3 alternative approaches, not evaluations.

If you want additional roles (for example ``reasoner``), add a custom profile in
``.agent/subagent_types/<type-name>/SUBAGENT.md`` and call it via ``subagent_type``.

Configuring Active Subagent Types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, only the core types (``explorer``, ``researcher``, ``evaluator``) are active.
The ``novelty`` type is opt-in because it changes agent behavior during checklist evaluation.

Use ``subagent_types`` under ``orchestrator.coordination`` to control which types are available:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_subagents: true
       # Default (when omitted or null): [evaluator, explorer, researcher]
       subagent_types: [evaluator, explorer, researcher, novelty]

When ``subagent_types`` is set:

* Only the listed types are exposed to agents via the ``spawn_subagents`` tool and system prompts.
* Unknown type names in the list produce a warning but don't fail.
* An empty list ``[]`` disables all specialized types (agents can still spawn generic subagents).
* ``null`` or omitted uses the default set (excludes ``novelty``).

When ``novelty`` is included in the active types, the checklist evaluation system will
automatically suggest spawning a novelty subagent when it detects zero transformative
changes in an agent's work — helping break through refinement plateaus where agents
are stuck making only incremental improvements.
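
The resolution rules for ``subagent_types`` can be summarized in a few lines. This is an illustrative sketch, not MassGen's implementation: ``resolve_active_types`` is a hypothetical name, and dropping unknown names after warning is an assumption, since the text above only states that they warn without failing.

.. code-block:: python

   import warnings

   DEFAULT_TYPES = ["evaluator", "explorer", "researcher"]
   KNOWN_TYPES = DEFAULT_TYPES + ["novelty"]


   def resolve_active_types(configured, known=KNOWN_TYPES):
       """Map a subagent_types setting to the list of active specialized types."""
       # null or omitted: fall back to the default set, which excludes novelty.
       if configured is None:
           return list(DEFAULT_TYPES)

       active = []
       for name in configured:
           if name in known:
               active.append(name)
           else:
               # Unknown names warn but do not fail (dropping them is an assumption).
               warnings.warn(f"unknown subagent type: {name!r}")
       # An empty list stays empty: no specialized types are exposed.
       return active


   print(resolve_active_types(None))         # ['evaluator', 'explorer', 'researcher']
   print(resolve_active_types(["novelty"]))  # ['novelty']
   print(resolve_active_types([]))           # []

Custom profiles would need to be added to ``known`` for this sketch to accept them.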

Parent Briefing Template (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Specialized subagents typically have less orchestration context than the parent. Give them a concrete brief with explicit execution instructions.

.. code-block:: text

   Objective:
   Scope and paths:
   Environment/setup:
   Exact commands or checks to run:
   Success criteria / pass-fail rubric:
   Output format required:
   Constraints (time, tools, sources):

For ``evaluator`` tasks, include command-level detail (setup + run + verification) so the subagent can execute deterministically.

Custom project profiles can be added in ``.agent/subagent_types/<type-name>/SUBAGENT.md``.
Frontmatter is strict and only supports:

* ``name``
* ``description``
* ``skills`` (optional list of skill names)
* ``expected_input`` (optional list describing the parent-task brief requirements)

Unknown ``subagent_type`` values fail fast with an explicit validation error that includes available type names.
Template scaffolding is available at ``massgen/subagent_types/_template/SUBAGENT_TEMPLATE.md`` and is excluded from discovery.
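
Because the frontmatter is strict, a profile can be checked before it is dropped into ``.agent/subagent_types/``. The sketch below assumes the frontmatter has already been parsed into a dictionary. ``frontmatter_errors`` is a hypothetical helper, and treating ``name`` and ``description`` as required is an inference from the list above, where only the other two keys are marked optional.

.. code-block:: python

   ALLOWED_KEYS = {"name", "description", "skills", "expected_input"}
   REQUIRED_KEYS = {"name", "description"}  # assumption: the non-optional keys


   def frontmatter_errors(frontmatter):
       """Return problems found in a parsed SUBAGENT.md frontmatter dict."""
       keys = set(frontmatter)
       errors = [f"unknown key: {key}" for key in sorted(keys - ALLOWED_KEYS)]
       errors += [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - keys)]
       return errors


   profile = {
       "name": "reasoner",
       "description": "Works through multi-step arguments",
       "model": "some-model",  # not a supported frontmatter key
   }
   print(frontmatter_errors(profile))  # ['unknown key: model']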

Result Structure
~~~~~~~~~~~~~~~~

The ``spawn_subagents`` tool returns:

.. code-block:: json

   {
     "success": true,
     "results": [
       {
         "subagent_id": "biography",
         "status": "completed",
         "success": true,
         "answer": "Created bio.md with comprehensive biography...",
         "workspace": "/path/to/subagents/biography/workspace",
         "execution_time_seconds": 45.2,
         "token_usage": {"input_tokens": 5000, "output_tokens": 2000}
       },
       {
         "subagent_id": "discography",
         "status": "completed",
         "success": true,
         "answer": "Created discography.md with album table...",
         "workspace": "/path/to/subagents/discography/workspace",
         "execution_time_seconds": 38.7
       }
     ],
     "summary": {
       "total": 2,
       "completed": 2,
       "failed": 0,
       "timeout": 0
     }
   }

Refinement Control
~~~~~~~~~~~~~~~~~~

Use ``refine: false`` to disable multi-round refinement for faster, single-pass answers:

.. code-block:: json

   {
     "tool": "spawn_subagents",
     "arguments": {
       "tasks": [
         {"task": "Summarize the repo structure in README.md", "subagent_id": "summary", "context_paths": ["./README.md"]}
       ],
       "refine": false
     }
   }

Status Values
~~~~~~~~~~~~~

Subagents can return several status values, each indicating a different outcome:

.. list-table::
   :header-rows: 1
   :widths: 25 10 65

   * - Status
     - success
     - Description
   * - ``completed``
     - ``true``
     - Normal completion. Use the answer directly.
   * - ``completed_but_timeout``
     - ``true``
     - **Timed out but full answer recovered.** The subagent finished its work before being interrupted. Use the answer normally.
   * - ``partial``
     - ``false``
     - Timed out with partial work. Some work was done but no final answer was selected. Check the ``workspace`` for useful files.
   * - ``timeout``
     - ``false``
     - Timed out with no recoverable work. Check the ``workspace`` anyway for any partial files.
   * - ``error``
     - ``false``
     - An exception occurred. Check the ``error`` field for details.

.. note::

   The ``completed_but_timeout`` status means the subagent finished its task but ran past the configured timeout. The answer is complete and should be used normally; the result carries ``success: true``.

Passing Context to Subagents
----------------------------

There are two context channels:

* **Required per task**: ``context_paths`` (list of paths mounted read-only)
* **Optional supplement**: ``context_files`` (extra files copied as read-only context)

Example:

.. code-block:: json

   {
     "tasks": [
       {
         "task": "Refactor the utils module",
         "subagent_id": "refactor",
         "context_paths": ["./utils.py", "./config.py"],
         "context_files": [
           "/path/to/project/utils.py",
           "/path/to/project/config.py"
         ]
       }
     ]
   }

.. warning::

   Both ``context_paths`` and ``context_files`` are **read-only** for subagents. If you need the parent to use subagent output, copy files from the subagent's workspace after completion.

.. note::

   ``context_paths`` is required even when no context is needed. Use ``[]`` explicitly in that case.
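
Copying subagent output back to the parent can be done with the standard library. The following is a minimal sketch, assuming result entries shaped like those under "Result Structure"; ``collect_outputs`` is a hypothetical helper and not part of MassGen.

.. code-block:: python

   import shutil
   from pathlib import Path


   def collect_outputs(results, destination):
       """Copy each successful subagent workspace into destination/<subagent_id>."""
       copied = []
       for result in results:
           workspace = result.get("workspace")
           # Skip failed subagents and entries without a workspace path.
           if not result.get("success") or not workspace:
               continue
           source = Path(workspace)
           if not source.is_dir():
               continue
           target = Path(destination) / result["subagent_id"]
           # dirs_exist_ok lets a repeated run refresh an earlier copy.
           shutil.copytree(source, target, dirs_exist_ok=True)
           copied.append(target)
       return copied

Keeping one subdirectory per ``subagent_id`` avoids filename collisions when several subagents produce files with the same name.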

Handling Timeouts and Failures
------------------------------

MassGen automatically attempts to recover work from timed-out subagents. When a subagent times out, the system checks the subagent's internal state to recover any completed work, answers, and cost metrics.

Timeout behavior has multiple layers:

1. **Subagent runtime timeout**: Controlled by ``subagent_default_timeout`` and clamped by ``subagent_min_timeout`` / ``subagent_max_timeout``.
2. **MCP client timeout**: ``spawn_subagents`` is timeout-exempt at the generic MCP-client layer.
3. **Codex per-tool timeout**: Subagent MCP config sets ``tool_timeout_sec = subagent_default_timeout + 60`` to avoid provider-side 60-second caps.

Timeout Recovery Behavior
~~~~~~~~~~~~~~~~~~~~~~~~~

When a subagent times out:

1. The system reads the subagent's internal ``status.json`` to check progress
2. If a complete answer was produced (the subagent finished but the timeout fired during cleanup), the answer and costs are recovered
3. The status is set to ``completed_but_timeout`` (success=true) if recovery succeeded
4. If partial work exists but no final answer, status is ``partial``
5. If no recoverable work exists, status is ``timeout``

**Example with recovery:**

.. code-block:: json

   {
     "subagent_id": "research",
     "status": "completed_but_timeout",
     "success": true,
     "answer": "Research completed. Created movies.md with...",
     "completion_percentage": 100,
     "token_usage": {"input_tokens": 50000, "output_tokens": 3000, "estimated_cost": 0.05}
   }

The ``completion_percentage`` field indicates progress (0-100) based on how many agents
have submitted answers and cast votes. With N agents, each answer contributes ~(50/N)%
and each vote contributes ~(50/N)%. Approximate phase milestones:

* **0%**: Just started
* **~50%**: All initial answers submitted, waiting for voting
* **100%**: Task completed (may still timeout during final presentation)
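
As a rough illustration of that formula, the sketch below computes the approximate percentage from answer and vote counts. The function name is hypothetical, and capping each count at one per agent is an assumption; the orchestrator's actual calculation may differ.

.. code-block:: python

   def estimate_completion_percentage(num_agents: int, answers: int, votes: int) -> int:
       """Approximate progress: each answer and each vote is worth about 50/N percent."""
       if num_agents <= 0:
           raise ValueError("num_agents must be positive")
       share = 50.0 / num_agents
       # Assumption: count at most one answer and one vote per agent.
       counted_answers = min(answers, num_agents)
       counted_votes = min(votes, num_agents)
       return round(share * counted_answers + share * counted_votes)


   # Three agents, all answers in, no votes yet: roughly the halfway milestone.
   print(estimate_completion_percentage(3, answers=3, votes=0))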

Handling Different Status Values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**For** ``completed_but_timeout`` **(success=true):**

Use the answer normally—the subagent completed its work successfully.

**For** ``partial`` **(success=false):**

.. code-block:: text

   The subagent did work but didn't reach a final answer.

   1. Check the workspace path for created files
   2. Review completion_percentage to understand progress
   3. Either use partial files or retry with a simpler task

**For** ``timeout`` **(success=false):**

.. code-block:: text

   The subagent made no recoverable progress.

   1. Check the workspace anyway for any files
   2. Consider if the task was too complex
   3. Break into smaller subtasks or increase timeout

**For** ``error`` **(success=false):**

.. code-block:: text

   An exception occurred during execution.

   1. Read the error message for details
   2. Check if it's recoverable (missing file, permission issue)
   3. Fix the issue and retry

Example: Mixed Results Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When ``spawn_subagents`` returns mixed results:

.. code-block:: json

   {
     "success": false,
     "results": [
       {
         "subagent_id": "research",
         "status": "completed",
         "answer": "Research findings...",
         "workspace": "/path/to/subagents/research/workspace",
         "execution_time_seconds": 45.2,
         "token_usage": {"input_tokens": 5000, "output_tokens": 1200}
       },
       {
         "subagent_id": "analysis",
         "status": "completed_but_timeout",
         "answer": "Analysis results...",
         "workspace": "/path/to/subagents/analysis/workspace",
         "execution_time_seconds": 300.0,
         "completion_percentage": 100,
         "token_usage": {"input_tokens": 8000, "output_tokens": 2000}
       },
       {
         "subagent_id": "synthesis",
         "status": "timeout",
         "answer": null,
         "workspace": "/path/to/subagents/synthesis/workspace",
         "execution_time_seconds": 300.0,
         "completion_percentage": 45
       }
     ],
     "summary": {"total": 3, "completed": 2, "timeout": 1}
   }

The parent agent should:

1. Use the answers from ``research`` and ``analysis`` (both have valid answers)
2. Check ``synthesis``'s workspace for any partial files
3. Either complete the synthesis work itself or retry with a longer timeout
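
The triage above can be sketched in a few lines. This is a minimal example, assuming the response shape shown earlier; ``triage_results`` and its bucket names are hypothetical and only group subagent IDs by the recommended next step.

.. code-block:: python

   from typing import Any, Dict, List

   # Statuses whose answer can be used directly.
   USABLE_STATUSES = {"completed", "completed_but_timeout"}


   def triage_results(response: Dict[str, Any]) -> Dict[str, List[str]]:
       """Group subagent IDs by the follow-up action the parent should take."""
       plan: Dict[str, List[str]] = {
           "use_answer": [],
           "inspect_workspace": [],
           "fix_and_retry": [],
       }
       for result in response.get("results", []):
           subagent_id = result["subagent_id"]
           status = result.get("status")
           if status in USABLE_STATUSES and result.get("answer"):
               plan["use_answer"].append(subagent_id)
           elif status in {"partial", "timeout"}:
               # No final answer: look for partial files before retrying.
               plan["inspect_workspace"].append(subagent_id)
           else:
               # "error" or anything unexpected: read the error, fix, retry.
               plan["fix_and_retry"].append(subagent_id)
       return plan


   response = {
       "results": [
           {"subagent_id": "research", "status": "completed", "answer": "Research findings..."},
           {"subagent_id": "analysis", "status": "completed_but_timeout", "answer": "Analysis results..."},
           {"subagent_id": "synthesis", "status": "timeout", "answer": None},
       ]
   }
   print(triage_results(response))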

Logging and Debugging
---------------------

Directory Structure
~~~~~~~~~~~~~~~~~~~

Subagents create two directory structures:

**1. Log Directory (persisted):**

.. code-block:: text

   .massgen/massgen_logs/log_YYYYMMDD_HHMMSS/
   └── turn_1/
       └── attempt_1/
           ├── subagents/
           │   └── biography/
           │       ├── conversation.jsonl       # Subagent conversation history
           │       ├── workspace/               # Copy/symlink of runtime workspace
           │       └── full_logs/
           │           ├── status.json          # ◄ Single source of truth (Orchestrator writes)
           │           ├── biography_agent_1/
           │           │   └── 20260102_103045/
           │           │       ├── answer.txt   # Agent's answer snapshot
           │           │       └── workspace/   # Agent's workspace snapshot
           │           └── biography_agent_2/
           │               └── ...
           ├── metrics_summary.json             # Includes aggregated subagent costs
           └── status.json

**2. Runtime Workspace (may be cleaned up):**

.. code-block:: text

   .massgen/workspaces/workspace1_{hash}/
   └── subagents/
       └── biography/
           └── workspace/
               ├── agent_1_{hash}/     # Agent 1's working directory
               ├── agent_2_{hash}/     # Agent 2's working directory
               ├── snapshots/          # Answer snapshots
               └── temp/               # Temporary files

The ``full_logs/status.json`` is the single source of truth for subagent status. It's written by the subagent's Orchestrator and contains detailed coordination state including costs, votes, and historical workspaces.

Status File
~~~~~~~~~~~

The ``full_logs/status.json`` contains rich information:

.. code-block:: json

   {
     "meta": {
       "elapsed_seconds": 192.5,
       "start_time": 1767419307.4
     },
     "costs": {
       "total_input_tokens": 50000,
       "total_output_tokens": 3000,
       "total_estimated_cost": 0.05
     },
     "coordination": {
       "phase": "presentation",
       "completion_percentage": 100
     },
     "agents": {
       "biography_agent_1": {"status": "answered", "token_usage": {...}},
       "biography_agent_2": {"status": "answered", "token_usage": {...}}
     },
     "results": {
       "winner": "biography_agent_1",
       "votes": {"agent1.1": 2}
     },
     "historical_workspaces": [
       {"agentId": "biography_agent_1", "answerLabel": "agent1.1", "timestamp": "20260102_103045", ...}
     ]
   }

When status is surfaced (for example via ``list_subagents`` discovery entries or
``custom_tool__get_background_tool_status`` for background jobs), it is transformed
into a simplified view:

.. code-block:: json

   {
     "subagent_id": "biography",
     "status": "running",
     "phase": "enforcement",
     "completion_percentage": 75,
     "task": "Write a biography of Bob Dylan...",
     "workspace": "/path/to/subagents/biography/workspace",
     "started_at": "2026-01-02T10:30:45",
     "elapsed_seconds": 145.3,
     "token_usage": {"input_tokens": 50000, "output_tokens": 3000, "estimated_cost": 0.05}
   }
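
To make the relationship between the two shapes concrete, here is a hedged sketch of such a transformation. ``simplify_status`` is a hypothetical function, the field mapping is inferred from the two examples above, and the top-level ``status`` value is omitted because this page does not say how it is derived.

.. code-block:: python

   from datetime import datetime, timezone
   from typing import Any, Dict, Optional


   def simplify_status(
       full: Dict[str, Any], subagent_id: str, task: str, workspace: str
   ) -> Dict[str, Any]:
       """Hypothetical mapping from full_logs/status.json to the simplified view."""
       meta = full.get("meta", {})
       costs = full.get("costs", {})
       coordination = full.get("coordination", {})
       start_time: Optional[float] = meta.get("start_time")
       # Assumption: start_time is a Unix timestamp; rendered here in UTC.
       started_at = (
           datetime.fromtimestamp(start_time, tz=timezone.utc).isoformat(timespec="seconds")
           if start_time is not None
           else None
       )
       return {
           "subagent_id": subagent_id,
           "phase": coordination.get("phase"),
           "completion_percentage": coordination.get("completion_percentage", 0),
           "task": task,
           "workspace": workspace,
           "started_at": started_at,
           "elapsed_seconds": meta.get("elapsed_seconds"),
           "token_usage": {
               "input_tokens": costs.get("total_input_tokens", 0),
               "output_tokens": costs.get("total_output_tokens", 0),
               "estimated_cost": costs.get("total_estimated_cost", 0.0),
           },
       }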

Cost Tracking
-------------

Subagent costs are automatically aggregated in the parent's metrics:

.. code-block:: json

   {
     "totals": {
       "estimated_cost": 0.083,
       "agent_cost": 0.046,
       "subagent_cost": 0.037
     },
     "subagents": {
       "total_subagents": 3,
       "total_input_tokens": 15000,
       "total_output_tokens": 6000,
       "total_estimated_cost": 0.037
     }
   }
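
A quick consistency check on these numbers might look like the sketch below. It assumes, based on the example above, that ``estimated_cost`` equals ``agent_cost`` plus ``subagent_cost`` and that ``subagent_cost`` mirrors ``subagents.total_estimated_cost``; ``check_cost_totals`` is a hypothetical helper.

.. code-block:: python

   import math
   from typing import Any, Dict


   def check_cost_totals(metrics: Dict[str, Any], tolerance: float = 1e-6) -> bool:
       """Return True when the aggregated totals are internally consistent."""
       totals = metrics["totals"]
       subagents = metrics.get("subagents", {})
       # Assumption: the run total is the sum of parent-agent and subagent cost.
       sum_ok = math.isclose(
           totals["estimated_cost"],
           totals["agent_cost"] + totals["subagent_cost"],
           abs_tol=tolerance,
       )
       subagent_ok = math.isclose(
           totals["subagent_cost"],
           subagents.get("total_estimated_cost", 0.0),
           abs_tol=tolerance,
       )
       return sum_ok and subagent_ok


   metrics = {
       "totals": {"estimated_cost": 0.083, "agent_cost": 0.046, "subagent_cost": 0.037},
       "subagents": {"total_subagents": 3, "total_estimated_cost": 0.037},
   }
   print(check_cost_totals(metrics))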

Background Subagent Execution
-----------------------------

By default, ``spawn_subagents`` blocks until all subagents complete. For long-running tasks,
you can use background mode to spawn subagents while the parent agent continues working.

Enabling Background Mode
~~~~~~~~~~~~~~~~~~~~~~~~

Pass ``background=True`` to spawn subagents in the background:

.. code-block:: json

   {
     "tool": "spawn_subagents",
     "arguments": {
       "tasks": [
         {"task": "Research OAuth 2.0 best practices", "subagent_id": "oauth-research", "context_paths": []}
       ],
       "background": true
     }
   }

The tool returns immediately with running status:

.. code-block:: json

   {
     "success": true,
     "mode": "background",
     "subagents": [
       {
         "subagent_id": "oauth-research",
         "status": "running",
         "workspace": "/path/to/subagents/oauth-research/workspace",
         "status_file": "/path/to/logs/oauth-research/full_logs/status.json"
       }
     ],
     "note": "Poll for subagent completion to retrieve results when ready."
   }

Inspecting Logs and Costs
~~~~~~~~~~~~~~~~~~~~~~~~~

For deep debugging and cost inspection:

* Subagent-level status/costs: ``<subagent_log_dir>/full_logs/status.json``
* Run-level aggregated costs: ``<run_log_dir>/metrics_summary.json`` (includes subagent totals)
* Discovery/index metadata: ``list_subagents()`` (IDs, status, workspace/session pointers)

Use ``status.json`` for detailed per-subagent coordination and cost internals; use
``metrics_summary.json`` for top-level totals across the run.


Configuration
~~~~~~~~~~~~~

Configure background subagent behavior in your YAML config:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_subagents: true
       background_subagents:
         enabled: true  # Allow background spawning (default: true)


When to Use Background Mode
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use background mode when:

* **Long-running research tasks**: Spawn research while continuing other work
* **Independent background work**: Tasks that don't block the main workflow
* **Parallel exploration**: Start multiple research directions simultaneously

Do NOT use background mode when:

* **Results needed immediately**: If you need the result before proceeding
* **Sequential dependencies**: If subsequent work depends on the subagent output
* **Critical path tasks**: If the subagent task is on the critical path

Example: Background Research
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Parent Agent Workflow:
   1. Spawn background subagent for OAuth research
   2. Continue working on database schema
   3. (Subagent completes in background)
   4. On next tool call, OAuth research results injected
   5. Use research to inform authentication implementation

Evaluation Delegation Pattern
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A key use case for background subagents is **delegating procedural evaluation work** so the main
agent can focus on implementation. Without this, agents often spend their entire token budget
on evaluation (serving websites, running tests, taking screenshots, writing reports) and run
out of budget before implementing improvements.

**The pattern:** Spawn a background subagent with ``background=True, refine=False`` to handle
procedural evaluation, then continue building while it runs.

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - Subagent handles (procedural)
     - Main agent handles (analytical)
   * - Serve website, take screenshots, run Playwright tests
     - Analyze previous answers and peer approaches
   * - Execute test suites, linters, validation scripts
     - Make quality judgments and prioritize improvements
   * - Run benchmarks, profiling, performance measurements
     - Synthesize insights into a coherent strategy
   * - Check file integrity, link resolution, cross-references
     - Decide what to build or fix next
   * - Compare output against specs with automated tools
     - Weigh tradeoffs and make architectural decisions

The subagent returns a **descriptive report** — what it measured, what passed, what failed,
what it observed. The main agent trusts these observations but makes its own quality judgments,
since it has full context and the subagent may run on a simpler or cheaper model.

.. code-block:: text

   Parent Agent Workflow:
   1. Implement website features
   2. Spawn background subagent: "Serve index.html, take full-page screenshots,
      run Playwright accessibility checks, report all findings"
   3. Continue implementing next feature while subagent evaluates
   4. Subagent results arrive → read descriptive report
   5. Use findings to fix issues, then spawn another eval subagent

.. note::

   For backends without automatic result injection (hook support), agents should use
   the standardized background lifecycle tools:
   ``custom_tool__get_background_tool_status(job_id)``,
   ``custom_tool__wait_for_background_tool(...)``,
   ``custom_tool__get_background_tool_result(job_id)``, and
   ``custom_tool__cancel_background_tool(job_id)``.
   Use ``custom_tool__list_background_tools(include_all=true)`` to inspect job history.
   The wait call may return early with ``interrupted: true`` and ``injected_content``
   when runtime input/context becomes available.
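
The polling pattern behind these tools can be sketched generically. In the example below, ``get_status`` and ``get_result`` are placeholder callables standing in for the status and result tool calls; their return shapes and the set of terminal status names are assumptions, not the documented tool contract.

.. code-block:: python

   import time
   from typing import Any, Callable, Dict

   # Assumption: background jobs report the same status names as subagents.
   TERMINAL_STATUSES = {"completed", "completed_but_timeout", "partial", "timeout", "error"}


   def poll_background_job(
       job_id: str,
       get_status: Callable[[str], Dict[str, Any]],
       get_result: Callable[[str], Dict[str, Any]],
       interval_seconds: float = 5.0,
       max_polls: int = 60,
       sleep: Callable[[float], None] = time.sleep,
   ) -> Dict[str, Any]:
       """Poll until the job reaches a terminal status, then fetch its result."""
       for _ in range(max_polls):
           status = get_status(job_id).get("status")
           if status in TERMINAL_STATUSES:
               return get_result(job_id)
           sleep(interval_seconds)
       raise TimeoutError(f"job {job_id} still running after {max_polls} polls")

Passing ``sleep`` as a parameter keeps the loop easy to exercise in tests without real delays.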

Best Practices
--------------

1. **Design for independence**: Each subagent task should be completable without other subagents' output.

2. **Provide explicit path context**: Use required ``context_paths`` deliberately (``[]``, ``["./"]``, or specific paths).

3. **Use meaningful IDs**: ``subagent_id`` values appear in logs and help with debugging.

4. **Set appropriate timeouts**: Complex tasks may need longer than the default 5 minutes.

5. **Check workspaces on failure**: Subagent workspaces persist even on timeout/error.

6. **Start small**: Test with 2-3 subagents before scaling up.

Example Workflows
-----------------

Website Builder
~~~~~~~~~~~~~~~

.. code-block:: text

   Task: "Build a Bob Dylan tribute website"

   Parent agent spawns:
   ├── "research" subagent → Creates bio.md, timeline.md, quotes.md
   ├── "frontend" subagent → Creates HTML templates, CSS, JS
   └── "assets" subagent → Creates image metadata, placeholders

   Parent then:
   1. Copies files from each workspace
   2. Integrates content into templates
   3. Delivers final website

Documentation Generator
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Task: "Generate API documentation for all modules"

   Parent agent spawns:
   ├── "auth_docs" subagent → Documents auth module
   ├── "db_docs" subagent → Documents database module
   └── "api_docs" subagent → Documents API endpoints

   Parent then:
   1. Collects all generated docs
   2. Creates index and cross-references
   3. Builds final documentation site

.. _subagents-runtime-and-docker-behavior:

Runtime and Docker Behavior
---------------------------

Subagent workspace isolation and subagent runtime isolation are different things:

* **Workspace isolation**: Each subagent has its own workspace directory.
* **Runtime boundary**: Where subagent processes execute (isolated vs inherited).

Runtime mode is configured under ``orchestrator.coordination``:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_subagents: true
       subagent_runtime_mode: isolated            # default
       # subagent_runtime_fallback_mode: inherited  # explicit opt-in fallback
       # subagent_host_launch_prefix: ["host-launch", "--exec"]

Mode semantics:

* ``isolated`` (default): require isolated subagent runtime behavior.
* ``inherited``: run subagents in the parent runtime boundary.
* ``subagent_runtime_fallback_mode: inherited``: explicit downgrade when isolated prerequisites are unavailable.
* **Codex difference (Docker mode)**: when fallback is unset and ``subagent_runtime_mode`` is ``isolated``, the orchestrator treats the fallback as ``inherited`` by default. Other backends stay strict unless a fallback is explicitly configured.

Isolated mode in containerized parent runtimes may require a launch bridge:

* ``subagent_host_launch_prefix`` provides a command prefix used to launch isolated subprocesses across runtime boundaries.
* If isolated mode is requested and prerequisites are unavailable:

  * with no fallback configured: subagent launch fails with actionable diagnostics
  * with explicit fallback configured: launch continues in inherited mode and returns a warning

Execution model:

.. code-block:: text

   Parent agent
   ├── chooses effective runtime mode (isolated/inherited/fallback)
   ├── spawns subagent process
   └── preserves existing MCP/TUI contracts across modes
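
The mode selection described in this section can be summarized as a small decision function. This is a sketch of the documented semantics only: ``resolve_runtime_mode`` and its parameters are hypothetical names, and the real orchestrator logic may check additional conditions.

.. code-block:: python

   from typing import Optional, Tuple


   def resolve_runtime_mode(
       mode: str = "isolated",
       fallback: Optional[str] = None,
       isolated_available: bool = True,
       codex_docker: bool = False,
   ) -> Tuple[str, Optional[str]]:
       """Return (effective_mode, warning) following the semantics listed above."""
       if mode == "inherited":
           return "inherited", None
       if mode != "isolated":
           raise ValueError(f"unknown subagent_runtime_mode: {mode!r}")
       if isolated_available:
           return "isolated", None
       # Codex in Docker mode treats an unset fallback as "inherited".
       if fallback is None and codex_docker:
           fallback = "inherited"
       if fallback == "inherited":
           return "inherited", "isolated prerequisites unavailable; running in inherited mode"
       # Strict behavior: fail fast instead of silently downgrading.
       raise RuntimeError("isolated prerequisites unavailable and no fallback configured")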

Troubleshooting
---------------

Subagent Not Spawning
~~~~~~~~~~~~~~~~~~~~~

**Problem**: ``spawn_subagents`` tool not available.

**Solution**: Ensure ``enable_subagents: true`` is set in orchestrator config.

All Subagents Timing Out
~~~~~~~~~~~~~~~~~~~~~~~~

**Problem**: Subagents consistently hit timeout.

**Solutions**:

1. Increase timeout: ``subagent_default_timeout: 600``
2. Simplify tasks: Break into smaller pieces
3. Check model: Some models are slower than others

Subagent Can't Access Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem**: Subagent reports file not found.

**Solutions**:

1. Ensure each task includes required ``context_paths`` (even if empty)
2. Use ``["./"]`` for parent workspace access or explicit path list for least privilege
3. Verify referenced paths/files exist before spawning
4. Remember: context passed via ``context_paths`` and ``context_files`` is read-only

Unexpected Runtime Sharing (Docker Context)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem**: Subagents appear to run in shared runtime context (for example, local server port conflicts).

**What to check**:

1. Confirm ``subagent_runtime_mode`` is ``isolated`` (or omitted, since isolated is default)
2. If parent runtime is containerized, configure ``subagent_host_launch_prefix`` for isolated launch bridging
3. If you intentionally need shared runtime in constrained environments, set explicit fallback:

.. code-block:: yaml

   orchestrator:
     coordination:
       subagent_runtime_mode: isolated
       subagent_runtime_fallback_mode: inherited

**Important**:

Without explicit fallback, isolated mode failures are expected to fail fast with diagnostics (no silent downgrade).

Parent Can't Find Subagent Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem**: Parent agent can't locate subagent's created files.

**Solutions**:

1. Check the ``workspace`` path in the result
2. Instruct subagents to list created files in their answer
3. Use ``copy_files_batch`` tool to copy from subagent workspace

Related Documentation
---------------------

* :doc:`agent_task_planning` - Plan tasks before spawning subagents
* :doc:`planning_mode` - Coordinate before executing
* :doc:`../tools/index` - Tools available to agents
* :doc:`../../reference/yaml_schema` - Complete configuration reference


---

## user_guide/advanced/terminal_evaluation.rst

Terminal Evaluation
===================

MassGen can evaluate its own terminal display and frontend user experience by recording terminal sessions as videos and analyzing them using AI vision models. This is useful for:

* **Frontend development**: Evaluate UI/UX changes to the terminal display
* **Quality assurance**: Verify that status indicators, coordination displays, and agent outputs are clear
* **Case study creation**: Record demos and automatically generate video content
* **User testing**: Analyze how well the terminal communicates agent progress and results

.. note::

   **Quick Setup Summary:**

   1. Install VHS terminal recorder: ``brew install vhs`` (macOS) or ``go install github.com/charmbracelet/vhs@latest``
   2. Ensure OpenAI API key is configured in ``.env``
   3. Use the ``run_massgen_with_recording`` tool in your config
   4. Agent records, analyzes, and provides UX feedback automatically

Quick Start: Try It Now
------------------------

MassGen includes a working example you can try immediately:

.. code-block:: bash

   # Evaluate the terminal display for a simple task
   massgen \
     --config massgen/configs/tools/custom_tools/terminal_evaluation.yaml \
     "Record and evaluate the terminal display for the todo example config"

The agent will:

1. Record a MassGen session running the todo example
2. Save the recording as an MP4 video in the workspace
3. Extract key frames and analyze them with GPT-4.1
4. Provide detailed feedback on terminal display quality

How It Works
------------

The ``run_massgen_with_recording`` tool follows this workflow:

1. **Create VHS Tape**: Generates a VHS script to record the terminal session
2. **Run MassGen**: Executes MassGen WITHOUT ``--automation`` flag (to capture rich terminal display)
3. **Record Video**: VHS records the terminal session as MP4/GIF/WebM
4. **Extract Frames**: Extracts key frames from the video (default: 12 frames)
5. **Analyze Display**: Uses the ``understand_video`` tool to evaluate UX quality
6. **Return Feedback**: Provides structured evaluation with recommendations

The tool automatically saves videos to the agent workspace for reuse in case studies.
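
For readers unfamiliar with VHS, step 1 amounts to writing a short tape script. The sketch below shows what such a script could look like using standard VHS commands (``Output``, ``Set Width``, ``Set Height``, ``Type``, ``Enter``, ``Sleep``). ``build_vhs_tape`` is a hypothetical helper, and the tape that ``run_massgen_with_recording`` actually generates may differ.

.. code-block:: python

   def build_vhs_tape(
       command: str,
       output_path: str,
       width: int = 1200,
       height: int = 800,
       wait_seconds: int = 300,
   ) -> str:
       """Build a minimal VHS tape that types a command and records the session."""
       # VHS strings may be delimited by double quotes or backticks;
       # pick a delimiter that the command itself does not contain.
       quote = next((q for q in ('"', "`") if q not in command), None)
       if quote is None:
           raise ValueError("command contains both double quotes and backticks")
       lines = [
           f"Output {output_path}",
           f"Set Width {width}",
           f"Set Height {height}",
           f"Type {quote}{command}{quote}",
           "Enter",
           # Assumption: a fixed sleep stands in for waiting on MassGen to finish.
           f"Sleep {wait_seconds}s",
       ]
       return "\n".join(lines) + "\n"


   print(build_vhs_tape("massgen --config my_config.yaml 'Create a todo list app'", "massgen_terminal.mp4"))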

Prerequisites
-------------

**1. Install VHS Terminal Recorder**

VHS (by Charm) is required to record terminal sessions:

.. code-block:: bash

   # macOS
   brew install vhs

   # Linux/Windows (requires Go)
   go install github.com/charmbracelet/vhs@latest

Verify installation:

.. code-block:: bash

   vhs --version

**2. OpenAI API Key**

The tool uses GPT-4.1 for video analysis. Ensure your ``.env`` file contains:

.. code-block:: bash

   OPENAI_API_KEY=sk-...

**3. Dependencies**

The ``understand_video`` tool requires ``opencv-python``:

.. code-block:: bash

   pip install opencv-python

Basic Usage
-----------

**Example 1: Evaluate a Simple Config**

.. code-block:: yaml

   # terminal_eval_basic.yaml
   agents:
     - id: "evaluator"
       backend:
         type: "openai"
         model: "gpt-5-mini"
         cwd: "workspace"
         enable_mcp_command_line: true  # Required for VHS
         custom_tools:
           - name: ["run_massgen_with_recording"]
             category: "terminal_evaluation"
             path: "massgen/tool/_multimodal_tools/run_massgen_with_recording.py"
             function: ["run_massgen_with_recording"]
       system_message: |
         You can record and evaluate MassGen terminal displays.
         Use run_massgen_with_recording to test configs and provide UX feedback.

   orchestrator:
     context_paths:
       - path: "massgen/configs/simple_two_agents.yaml"
         permission: "read"

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

Run with:

.. code-block:: bash

   massgen --config terminal_eval_basic.yaml "Evaluate the simple two agents config"

**Example 2: Custom Evaluation Criteria**

You can customize the evaluation prompt to focus on specific aspects:

.. code-block:: python

   # In the agent's prompt or directly in tool call
   run_massgen_with_recording(
       config_path="my_config.yaml",
       question="Create a todo list app",
       evaluation_prompt="""
       Focus on the coordination display. Evaluate:
       1. How clearly does it show agent collaboration?
       2. Are status transitions (streaming → answered → voted) clear?
       3. Is the winner selection process visible?
       4. What improvements would enhance multi-agent visualization?
       """
   )

Tool Parameters
---------------

.. code-block:: python

   async def run_massgen_with_recording(
       config_path: str,
       question: str,
       evaluation_prompt: str = "Evaluate the terminal display quality...",
       output_format: str = "mp4",
       num_frames: int = 12,
       timeout_seconds: int = 300,
       width: int = 1200,
       height: int = 800,
       allowed_paths: Optional[List[str]] = None,
       agent_cwd: Optional[str] = None,
   ) -> ExecutionResult

**Parameters:**

* ``config_path`` (str, required): Path to MassGen config file (YAML)

  * Relative paths resolved relative to agent workspace
  * Absolute paths must be within allowed directories

* ``question`` (str, required): Question to pass to MassGen

* ``evaluation_prompt`` (str): Prompt for evaluating terminal display

  * Default: Comprehensive UX evaluation (clarity, information density, status indicators, user experience)
  * Customize to focus on specific aspects (coordination, readability, etc.)

* ``output_format`` (str): Video format - ``"mp4"`` (default), ``"gif"``, or ``"webm"``

  * MP4: Best quality, suitable for case studies
  * GIF: Smaller file size, easier to embed in docs
  * WebM: Modern web format with good compression

* ``num_frames`` (int): Number of frames to extract for analysis (default: 12)

  * Higher values (16+) provide more detail but increase API costs
  * Lower values (4-8) are faster and cheaper but may miss details
  * Recommended: 8-16 frames for most evaluations

* ``timeout_seconds`` (int): Maximum time to wait for MassGen completion (default: 300)

  * Adjust based on task complexity
  * Longer tasks need higher timeouts
  * VHS will wait this long before stopping recording

* ``width`` (int): Terminal width in pixels (default: 1200)
* ``height`` (int): Terminal height in pixels (default: 800)

  * Adjust for your preferred terminal dimensions
  * Larger dimensions capture more detail but increase file size
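
To give a feel for what ``num_frames`` controls, the sketch below picks evenly spaced frame indices from a recording. The sampling strategy is an assumption made for illustration; how ``understand_video`` actually selects frames is not specified on this page, and ``evenly_spaced_frame_indices`` is a hypothetical name.

.. code-block:: python

   from typing import List


   def evenly_spaced_frame_indices(total_frames: int, num_frames: int = 12) -> List[int]:
       """Pick num_frames indices spread evenly across a video of total_frames."""
       if total_frames <= 0 or num_frames <= 0:
           return []
       if num_frames >= total_frames:
           # Short video: every frame is a key frame.
           return list(range(total_frames))
       step = total_frames / num_frames
       # Take the midpoint of each of the num_frames equal-length segments.
       return [int(step * i + step / 2) for i in range(num_frames)]


   # A 120-frame recording sampled with the default of 12 frames.
   print(evenly_spaced_frame_indices(120))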

**Returns:**

.. code-block:: json

   {
     "success": true,
     "operation": "run_massgen_with_recording",
     "config_path": "/path/to/config.yaml",
     "question": "Create a todo list",
     "video_path": "/path/to/workspace/massgen_terminal.mp4",
     "video_format": "mp4",
     "video_size_bytes": 2458624,
     "recording_duration_seconds": 45.3,
     "massgen_timeout_seconds": 300,
     "terminal_dimensions": {"width": 1200, "height": 800},
     "evaluation": {
       "success": true,
       "num_frames_extracted": 12,
       "prompt": "Evaluate the terminal display quality...",
       "response": "The terminal display demonstrates excellent clarity..."
     }
   }

Advanced Usage
--------------

Recording as GIF for Documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GIFs are ideal for embedding in documentation and case studies:

.. code-block:: bash

   massgen --config terminal_eval.yaml \
     "Record the todo example as a GIF with focus on agent coordination"

In your agent's system message, guide it to use GIF format:

.. code-block:: text

   When recording for documentation, use output_format="gif" and num_frames=8
   for faster processing and smaller file sizes.

Batch Evaluation of Multiple Configs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can create an agent that systematically evaluates multiple configs:

.. code-block:: yaml

   orchestrator:
     context_paths:
       - path: "massgen/configs/tools/"
         permission: "read"

   agents:
     - id: "batch_evaluator"
       system_message: |
         Evaluate all configs in massgen/configs/tools/ directory.
         For each config:
         1. Record a simple test question
         2. Analyze the terminal display
         3. Compile a comparative report

Integration with Case Studies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tool automatically saves videos to the workspace for case study reuse:

.. code-block:: text

   # Videos are saved as: workspace/massgen_terminal.{format}
   # Reference them in case studies:

   ## Demo Video

   Here's a recording of MassGen solving the task:

   ![Terminal Demo](workspace/massgen_terminal.gif)

   **Evaluation:** The terminal display effectively shows agent collaboration
   with clear status indicators and smooth coordination visualization.

Evaluation Criteria
-------------------

The default evaluation prompt assesses:

1. **Visual Clarity and Readability**

   * Font rendering and contrast
   * Color scheme effectiveness
   * ANSI escape code handling
   * Text layout and spacing

2. **Information Density and Organization**

   * Multi-column layout for parallel agents
   * Content aggregation and streaming display
   * Log message formatting
   * Scroll handling for long outputs

3. **Status Indicator Effectiveness**

   * Agent states (streaming, answered, voted, completed)
   * Progress tracking visibility
   * Coordination phase transitions
   * Winner selection clarity

4. **Overall User Experience**

   * Real-time feedback quality
   * Mental model alignment (does display match user expectations?)
   * Error visibility and handling
   * Cognitive load and information hierarchy

Troubleshooting
---------------

**VHS Not Found Error**

.. code-block:: json

   {
     "success": false,
     "error": "VHS is not installed. Please install it from https://github.com/charmbracelet/vhs"
   }

**Solution:** Install VHS:

.. code-block:: bash

   brew install vhs  # macOS
   go install github.com/charmbracelet/vhs@latest  # Linux/Windows

**Video File Not Created**

If VHS completes but no video file is created:

1. Check VHS stderr output in the error response
2. Verify terminal dimensions are reasonable (width: 800-1920, height: 600-1080)
3. Ensure sufficient disk space for video recording
4. Try a simpler task with a shorter timeout

**Recording Timeout**

.. code-block:: json

   {
     "success": false,
     "error": "VHS recording timed out after 330 seconds"
   }

**Solution:** Increase timeout for complex tasks:

.. code-block:: python

   run_massgen_with_recording(
       config_path="complex_config.yaml",
       question="Complex question",
       timeout_seconds=600  # 10 minutes
   )

**OpenCV Import Error**

.. code-block:: bash

   pip install opencv-python

Best Practices
--------------

1. **Use Appropriate Timeouts**

   * Simple tasks: 60-120 seconds
   * Medium tasks: 120-300 seconds
   * Complex tasks: 300-600 seconds

2. **Optimize Frame Count**

   * Quick evaluation: 4-8 frames
   * Standard evaluation: 8-12 frames
   * Detailed analysis: 12-16 frames

3. **Choose Right Format**

   * Case studies: MP4 (best quality)
   * Documentation: GIF (easy embedding)
   * Web publishing: WebM (modern, efficient)

4. **Customize Evaluation Prompts**

   Focus on specific aspects you're testing:

   * "Evaluate the multi-agent coordination display"
   * "Assess readability for color-blind users"
   * "Analyze information hierarchy and visual flow"

5. **Save Videos for Reference**

   Videos are automatically saved to the workspace. Commit them to git for:

   * Regression testing (compare old vs new displays)
   * Documentation and tutorials
   * Case study demonstrations
   * User research artifacts

Example Workflow: UI Iteration
-------------------------------

**Step 1: Baseline Evaluation**

.. code-block:: bash

   massgen --config terminal_eval.yaml \
     "Record baseline terminal display for simple_two_agents config"

**Step 2: Make Display Changes**

Edit ``massgen/frontend/displays/terminal_display.py`` to improve UX.

**Step 3: Re-evaluate**

.. code-block:: bash

   massgen --config terminal_eval.yaml \
     "Record updated terminal display for simple_two_agents config"

**Step 4: Compare**

The agent can compare both evaluations and highlight improvements/regressions.

See Also
--------

* :doc:`multimodal` - Other multimodal tools (understand_video, understand_image, understand_audio)
* :doc:`../tools/custom_tools` - Creating custom tools for your domain
* :doc:`../integration/automation` - Running MassGen in automation mode for backend testing
* :doc:`../../development/writing_configs` - Best practices for config development
* Case Study Template: ``docs/case_studies/case-study-template.md``

External Resources
------------------

* `VHS - Charm Terminal Recorder <https://github.com/charmbracelet/vhs>`_
* `VHS Documentation <https://github.com/charmbracelet/vhs/tree/main/docs>`_
* `OpenCV Python Documentation <https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html>`_


---

## user_guide/agent_workspaces.rst

Agent Workspaces and Code Isolation
====================================

How agents interact with your project code during MassGen coordination.

write_mode Configuration
-------------------------

The ``write_mode`` option controls how agents interact with your project files::

    orchestrator:
      coordination:
        write_mode: auto   # auto | worktree | isolated | legacy

.. list-table::
   :header-rows: 1

   * - Mode
     - Git repo
     - Non-git directory
   * - ``auto`` (recommended)
     - Git worktree per round
     - Shadow copy with git init
   * - ``worktree``
     - Git worktree per round
     - Error (falls back to shadow)
   * - ``isolated``
     - Shadow copy
     - Shadow copy
   * - ``legacy``
     - Direct writes (no isolation)
     - Direct writes
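
The table can be read as a lookup keyed on the mode and on whether the project is a git repository. The sketch below encodes it that way; ``describe_write_mode`` is a hypothetical helper, the behavior strings are copied from the table, and detecting a repository by a top-level ``.git`` entry is a simplifying assumption.

.. code-block:: python

   from pathlib import Path

   # Behavior per (write_mode, is_git_repo), copied from the table above.
   WRITE_MODE_BEHAVIOR = {
       ("auto", True): "Git worktree per round",
       ("auto", False): "Shadow copy with git init",
       ("worktree", True): "Git worktree per round",
       ("worktree", False): "Error (falls back to shadow)",
       ("isolated", True): "Shadow copy",
       ("isolated", False): "Shadow copy",
       ("legacy", True): "Direct writes (no isolation)",
       ("legacy", False): "Direct writes",
   }


   def describe_write_mode(project_dir: str, write_mode: str = "auto") -> str:
       """Look up the documented behavior for a project directory and mode."""
       # Assumption: a top-level .git entry (directory or file) marks a git repo.
       is_git_repo = (Path(project_dir) / ".git").exists()
       try:
           return WRITE_MODE_BEHAVIOR[(write_mode, is_git_repo)]
       except KeyError:
           raise ValueError(f"unknown write_mode: {write_mode!r}") from None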

Per-Round Worktrees
--------------------

Each coordination round, every agent gets a fresh git checkout of your project.
Agents have full read/write access to experiment with the code. Changes during
coordination rounds are tracked on anonymous git branches but not applied to
your project.

Only the final presentation winner's changes go through a review modal
where you approve which files to apply.

**Branch lifecycle:**

- Each agent has exactly one branch alive at a time
- Old branches are deleted when a new round starts for that agent
- Branch names use random suffixes (no agent IDs or round numbers)
- Branches are visible to other agents via ``git branch`` / ``git diff``
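
The lifecycle above maps onto a handful of standard git commands. The following is a minimal sketch that only builds those command lines; the ``massgen`` branch prefix, the function names, and the exact commands MassGen runs internally are assumptions, not taken from the source.

```python
import secrets


def anonymous_branch_name(prefix: str = "massgen") -> str:
    """Build a branch name from a random suffix only (no agent ID, no round number)."""
    # The prefix is a hypothetical choice; only the random suffix is documented.
    return f"{prefix}/{secrets.token_hex(4)}"


def worktree_add_command(repo: str, dest: str, branch: str) -> list[str]:
    """Command that creates a fresh worktree at dest on a new branch starting from HEAD."""
    return ["git", "-C", repo, "worktree", "add", "-b", branch, dest, "HEAD"]


def cleanup_commands(repo: str, dest: str, branch: str) -> list[list[str]]:
    """Commands that drop an agent's previous worktree and branch before a new round."""
    return [
        ["git", "-C", repo, "worktree", "remove", "--force", dest],
        ["git", "-C", repo, "branch", "-D", branch],
    ]
```

Each list can be passed to ``subprocess.run(..., check=True)``. Keeping command construction separate from execution makes the lifecycle easy to inspect without touching a real repository.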

Scratch Space
--------------

Inside each worktree, ``.massgen_scratch/`` provides a git-excluded directory
for experiments, evaluation scripts, and notes. Scratch files can import from
the project naturally since they live inside the checkout.

Scratch is archived to ``.scratch_archive/`` in the workspace between rounds,
so it persists in workspace snapshots shared with other agents.

**Key properties:**

- Git-excluded: invisible to ``git status`` and review modals
- Archived between rounds: previous scratch available in workspace
- Shared via snapshots: other agents can see your scratch archive

Agent Statelessness
--------------------

Agents are stateless and anonymous across rounds. Each round is a fresh
invocation with no memory of previous rounds. All cross-agent information
is presented anonymously.

This means:

- Agents don't know which agent they are
- System prompts and branch names don't reveal identity
- Cross-agent answers and workspaces are presented anonymously
- Each round starts fresh from HEAD (no accumulated state)

Migrating from use_two_tier_workspace
---------------------------------------

``use_two_tier_workspace`` is deprecated. Replace::

    # Old
    coordination:
      use_two_tier_workspace: true

    # New
    coordination:
      write_mode: auto

The new ``write_mode: auto`` provides:

- Git worktree isolation (safe experimentation)
- In-worktree scratch space (replaces ``scratch/`` directory)
- Branch-based cross-agent visibility
- Review modal for final presentation changes
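
If you maintain many configs, the replacement can be scripted. This is a sketch under stated assumptions: ``migrate_coordination`` is a hypothetical helper that operates on an already-parsed ``coordination`` mapping, and because the guide only documents the ``true`` case, a ``false`` or missing flag is simply dropped without setting ``write_mode``.

```python
def migrate_coordination(coordination: dict) -> dict:
    """Return a copy with the deprecated flag replaced by write_mode."""
    migrated = dict(coordination)
    legacy_flag = migrated.pop("use_two_tier_workspace", None)
    # Only an enabled flag has a documented replacement. An existing
    # write_mode is kept so an explicit choice is never overwritten.
    if legacy_flag is True:
        migrated.setdefault("write_mode", "auto")
    return migrated
```

The function returns a new dictionary and leaves its input untouched, so it can be applied safely while iterating over loaded configs.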


---

## user_guide/backends.rst

Backend Configuration
=====================

Backends connect MassGen agents to AI model providers. Each backend is configured in YAML and provides specific capabilities like web search, code execution, and file operations.

Overview
--------

Each agent in MassGen requires a backend configuration that specifies:

* **Provider**: Which AI service to use (OpenAI, Claude, Gemini, etc.)
* **Model**: Which specific model within that provider
* **Capabilities**: Which built-in tools are enabled
* **Parameters**: Model settings like temperature, max_tokens, etc.

Available Backends
------------------

Backend Types
~~~~~~~~~~~~~

MassGen supports these backend types (configured via ``type`` field in YAML):

.. list-table::
   :header-rows: 1
   :widths: 20 30 50

   * - Backend Type
     - Provider
     - Models
   * - ``openai``
     - OpenAI
     - GPT-5, GPT-5-mini, GPT-5-nano, GPT-4, GPT-4o
   * - ``claude``
     - Anthropic
     - Claude Haiku 3.5, Claude Sonnet 4, Claude Opus 4
   * - ``claude_code``
     - Anthropic (SDK)
     - Claude Sonnet 4, Claude Opus 4 (with dev tools)
   * - ``gemini``
     - Google
     - Gemini 2.5 Flash, Gemini 2.5 Pro
   * - ``gemini_cli``
     - Google (CLI)
     - Gemini 3, Gemini 2.5 Models (via Gemini CLI)
   * - ``grok``
     - xAI
     - Grok-4, Grok-3, Grok-3-mini
   * - ``azure_openai``
     - Microsoft Azure
     - GPT-4, GPT-4o, GPT-5 (Azure deployments)
   * - ``zai``
     - ZhipuAI
     - GLM-4.5
   * - ``ag2``
     - AG2 Framework
     - Any AG2-compatible agent
   * - ``lmstudio``
     - LM Studio
     - Local open-source models
   * - ``codex``
     - OpenAI (CLI)
     - GPT-5.4, GPT-5.3-Codex, GPT-5.2-Codex, GPT-5.1-Codex, GPT-4.1
   * - ``copilot``
     - GitHub Copilot
     - GPT-5-mini, GPT-4.1, Claude Sonnet 4, Gemini 2.5 Pro
   * - ``inference``
     - vLLM / SGLang
     - Any locally served model
   * - ``chatcompletion``
     - Generic
     - Any OpenAI-compatible API

Backend Capabilities
~~~~~~~~~~~~~~~~~~~~

Different backends support different built-in tools:

.. list-table:: Backend Tool Support
   :header-rows: 1
   :widths: 15 10 10 10 10 12 12 12 10 10

   * - Backend
     - Web Search
     - Code Execution
     - Bash/Shell
     - Image
     - Audio
     - Video
     - MCP Support
     - Filesystem
     - Custom Tools
   * - ``openai``
     - ⭐
     - ⭐
     - ✅
     - ⭐ Both
     - ⭐ Both
     - ⭐ Generation
     - ✅
     - ✅
     - ✅
   * - ``claude``
     - ⭐
     - ⭐
     - ✅
     - 🔧
     - 🔧
     - 🔧
     - ✅
     - ✅
     - ✅
   * - ``claude_code``
     - ⭐
     - ❌
     - ⭐
     - 🔧
     - 🔧
     - 🔧
     - ✅
     - ⭐
     - ✅
   * - ``codex``
     - ⭐
     - ❌
     - ⭐
     - 🔧
     - 🔧
     - 🔧
     - ✅
     - ⭐
     - ✅
   * - ``copilot``
     - ⭐
     - ❌
     - ✅
     - 🔧
     - 🔧
     - 🔧
     - ✅
     - ✅
     - ✅
   * - ``gemini``
     - ⭐
     - ⭐
     - ✅
     - 🔧
     - 🔧
     - 🔧
     - ✅
     - ✅
     - ✅
   * - ``gemini_cli``
     - ⭐
     - ⭐
     - ⭐
     - 🔧
     - 🔧
     - 🔧
     - ✅
     - ⭐
     - ✅
   * - ``grok``
     - ⭐
     - ❌
     - ✅
     - 🔧
     - 🔧
     - 🔧
     - ✅
     - ✅
     - ✅
   * - ``azure_openai``
     - ⭐
     - ⭐
     - ✅
     - ⭐ Both
     - ❌
     - ❌
     - ✅
     - ✅
     - ❌
   * - ``chatcompletion``
     - ❌
     - ❌
     - ✅
     - 🔧
     - 🔧
     - 🔧
     - ✅
     - ✅
     - ✅
   * - ``lmstudio``
     - ❌
     - ❌
     - ✅
     - 🔧
     - 🔧
     - 🔧
     - ✅
     - ✅
     - ✅
   * - ``zai``
     - ❌
     - ❌
     - ✅
     - 🔧
     - 🔧
     - 🔧
     - ✅
     - ✅
     - ✅
   * - ``inference``
     - ❌
     - ❌
     - ✅
     - 🔧
     - 🔧
     - 🔧
     - ✅
     - ✅
     - ✅
   * - ``ag2``
     - ❌
     - ⭐
     - ❌
     - ❌
     - ❌
     - ❌
     - ❌
     - ❌
     - ❌

**Notes:**

* **Symbol Legend:**

  * ⭐ **Built-in** - Native backend feature (e.g., Anthropic's web search, OpenAI's native image API, Claude Code's Bash tool)
  * 🔧 **Via Custom Tools** - Available through custom tools (requires ``OPENAI_API_KEY`` for multimodal understanding)
  * ✅ **MCP-based or Available** - Feature available via MCP integration or standard capability
  * ❌ **Not available** - Feature not supported

* **Custom Tools:**

  * Custom tools allow you to give agents access to your own Python functions
  * Most backends support custom tools (OpenAI, Claude, Claude Code, Codex, Copilot, Gemini, Gemini CLI, Grok, Chat Completions, LM Studio, ZAI, Inference)
  * **Azure OpenAI** and **AG2** do not support custom tools as they inherit from the base backend class without the custom tools layer
  * Custom tools are essential for multimodal understanding features (``understand_image``, ``understand_video``, ``understand_audio``, ``understand_file``)
  * See :doc:`tools/custom_tools` for complete documentation on creating and using custom tools

* **Code Execution vs Bash/Shell:**

  .. warning::
     **Common Confusion**: ``enable_code_execution`` and ``enable_code_interpreter`` run code in the **provider's sandbox** (cloud environment) with **NO access to your local filesystem**. If you need agents to read/write files in your project, use MCP-based bash instead.

  * **Code Execution (⭐)**: Backend provider's native code execution tool (runs in provider sandbox - **no access to MassGen workspaces**)

    * ``openai``: OpenAI code interpreter for calculations and data analysis
    * ``claude``: Anthropic's code execution tool
    * ``gemini``: Google's code execution tool
    * ``azure_openai``: Azure OpenAI code interpreter
    * ``ag2``: AG2 framework code executors (Local, Docker, Jupyter, Cloud)
    * **When to use**: Quick calculations, data analysis, isolated code snippets that don't need filesystem access

  * **Bash/Shell**: MassGen-level feature with **direct workspace access**

    * ⭐ (``claude_code``, ``codex``, ``gemini_cli``): Native shell/bash tools built into the backend CLI/SDK
    * ✅ (all MCP-enabled backends): Universal bash/shell via ``enable_mcp_command_line: true``
    * **When to use**: Code that needs to interact with your project files, run tests, execute scripts
    * See :doc:`tools/code_execution` for detailed setup and comparison

  * **Recommendation**: Choose one approach based on your needs. Use **built-in code execution** for isolated computational tasks, and **MCP bash/shell** for operations that need to affect your workspace files.

* **Filesystem:**

  * ⭐ (``claude_code``, ``codex``, ``gemini_cli``): Native filesystem tools via CLI/SDK (Read, Write, Edit, Bash, etc.)
  * ✅ (all backends with ``cwd`` parameter): Filesystem operations handled automatically through workspace configuration
  * See :doc:`files/file_operations` for detailed filesystem configuration

* **Multimodal Capabilities:**

  * **⭐ Native Multimodal Support**: The backend/model API directly handles multimodal content

    * **⭐ Both** (e.g., ``openai``, ``azure_openai``): Native API supports BOTH understanding (analyze) AND generation (create)
    * **⭐ Generation** (e.g., ``openai`` video): Can create videos via Sora-2 API but not analyze them

  * **🔧 Via Custom Tools**: Multimodal understanding through custom tools (``understand_image``, ``understand_video``, ``understand_audio``)

    * Works with any backend that supports custom tools
    * Requires ``OPENAI_API_KEY`` in ``.env`` file (tools use OpenAI's API for processing)
    * Examples: ``claude``, ``claude_code``, ``gemini``, ``grok``, ``chatcompletion``, ``lmstudio``, ``inference``
    * Does NOT work with ``azure_openai`` or ``ag2`` (these backends don't support custom tools)
    * See :doc:`advanced/multimodal` for complete setup instructions

  * **Understanding vs Generation**:

    * **Understanding**: Analyze existing content (images, audio, video)
    * **Generation**: Create new content from text prompts
    * **Both**: Supports both understanding AND generation

See :doc:`../reference/supported_models` for the complete backend capabilities reference.
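
The choice between provider-sandbox code execution and workspace-aware shell access can be summarized in a small decision helper. This is an illustrative sketch, not part of MassGen: the function name is hypothetical, and only the flags that appear in this guide's examples (``enable_code_interpreter`` for ``openai``, ``enable_code_execution`` for ``gemini``, ``enable_mcp_command_line`` for MCP bash) are covered.

```python
# Backends whose CLI/SDK ships a native shell tool (marked with a star in the table).
NATIVE_SHELL_BACKENDS = {"claude_code", "codex", "gemini_cli"}
# Backends the table marks as having no bash/shell support.
NO_SHELL_BACKENDS = {"ag2"}
# Provider-sandbox flags taken from the examples in this guide.
SANDBOX_FLAGS = {
    "openai": "enable_code_interpreter",
    "gemini": "enable_code_execution",
}


def execution_settings(backend_type: str, needs_workspace_files: bool) -> dict:
    """Suggest backend flags depending on whether code must touch workspace files."""
    if needs_workspace_files:
        if backend_type in NO_SHELL_BACKENDS:
            raise ValueError(f"{backend_type} has no bash/shell support")
        if backend_type in NATIVE_SHELL_BACKENDS:
            return {}  # native shell tools already reach the workspace
        return {"enable_mcp_command_line": True}
    flag = SANDBOX_FLAGS.get(backend_type)
    return {flag: True} if flag else {}
```

The returned mapping is meant to be merged into the agent's ``backend`` section.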

Configuring Backends
--------------------

Basic Backend Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Every agent needs a ``backend`` section in the YAML configuration:

.. code-block:: yaml

   agents:
     - id: "my_agent"
       backend:
         type: "openai"          # Backend type (required)
         model: "gpt-5-nano"     # Model name (required)

Backend-Specific Examples
-------------------------

OpenAI Backend
~~~~~~~~~~~~~~

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "gpt_agent"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         enable_web_search: true
         enable_code_interpreter: true

**With Reasoning Parameters:**

.. code-block:: yaml

   agents:
     - id: "reasoning_agent"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         text:
           verbosity: "medium"      # low, medium, high
         reasoning:
           effort: "high"            # low, medium, high
           summary: "auto"           # auto, concise, detailed

**Supported Models:** GPT-5, GPT-5-mini, GPT-5-nano, GPT-4, GPT-4o, GPT-4-turbo, GPT-3.5-turbo

Claude Backend
~~~~~~~~~~~~~~

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "claude_agent"
       backend:
         type: "claude"
         model: "claude-sonnet-4"
         enable_web_search: true
         enable_code_execution: true

**With MCP Integration:**

.. code-block:: yaml

   agents:
     - id: "claude_mcp"
       backend:
         type: "claude"
         model: "claude-sonnet-4"
         mcp_servers:
           - name: "weather"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-weather"]

**Supported Models:** claude-haiku-4-5-20251001, claude-sonnet-4-5-20250929, claude-opus-4-1-20250805, claude-sonnet-4-20250514, claude-3-5-sonnet-latest, claude-3-5-haiku-latest

Claude Code Backend
~~~~~~~~~~~~~~~~~~~

**With Workspace Configuration:**

.. code-block:: yaml

   agents:
     - id: "code_agent"
       backend:
         type: "claude_code"
         model: "claude-sonnet-4"
         cwd: "workspace"           # Working directory for file operations

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"

**Authentication:**

The Claude Code backend supports flexible authentication:

* **API key**: Set ``CLAUDE_CODE_API_KEY`` or ``ANTHROPIC_API_KEY`` environment variable
* **Subscription**: If no API key is set, uses Claude subscription authentication

This allows you to use Claude Code with a subscription while using a separate
API key for standard Claude backend agents.

**Special Features:**

* Native file operations (Read, Write, Edit, Bash, Grep, Glob)
* Workspace isolation
* Snapshot sharing between agents
* Full development tool suite

Codex Backend
~~~~~~~~~~~~~

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "codex_agent"
       backend:
         type: "codex"
         model: "gpt-5.4"
         cwd: "workspace"

**Authentication:**

The Codex backend supports flexible authentication:

* **API key**: Set ``OPENAI_API_KEY`` environment variable
* **ChatGPT subscription**: If no API key is set, uses OAuth via ``codex login``

**Supported Models:** gpt-5.4 (default), gpt-5.3-codex, gpt-5.2-codex, gpt-5.1-codex, gpt-5-codex, gpt-4.1

**Reasoning Effort Configuration:**

.. code-block:: yaml

   agents:
     - id: "codex_reasoning"
       backend:
         type: "codex"
         model: "gpt-5.4"
         model_reasoning_effort: "xhigh"  # low | medium | high | xhigh
         # reasoning:
         #   effort: "xhigh"            # OpenAI-style alias (also supported)

If both ``model_reasoning_effort`` and ``reasoning.effort`` are provided,
``model_reasoning_effort`` takes precedence.
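
The precedence rule can be expressed in a few lines. This is a minimal sketch of the rule as stated above; ``codex_reasoning_effort`` is a hypothetical helper working on a parsed ``backend`` mapping, not a MassGen function.

```python
from typing import Optional


def codex_reasoning_effort(backend: dict) -> Optional[str]:
    """Resolve the effective effort: model_reasoning_effort wins over reasoning.effort."""
    explicit = backend.get("model_reasoning_effort")
    if explicit is not None:
        return explicit
    # Fall back to the OpenAI-style alias; tolerate a missing or empty section.
    return (backend.get("reasoning") or {}).get("effort")
```

A config that sets both keys resolves to the value of ``model_reasoning_effort``. A config that sets neither resolves to ``None``, which leaves the choice to the backend's default.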

**Special Features:**

* Native shell and file operations via Codex CLI
* Web search capability
* Session persistence and resumption
* MCP server support via workspace config

.. warning::

   **Sandbox Limitation**: Codex uses OS-level sandboxing (Seatbelt/Landlock) which
   **only restricts writes, NOT reads**. Codex can read any file on the filesystem.
   For security-sensitive workloads, use Docker mode or consider Claude Code instead.
   See :ref:`Native Tool Backends <native-tool-backends>` for details.

**Recommended: Docker Mode for Security:**

.. code-block:: yaml

   agents:
     - id: "secure_codex"
       backend:
         type: "codex"
         model: "gpt-5.4"
         cwd: "workspace"
         enable_mcp_command_line: true
         command_line_execution_mode: "docker"
         command_line_docker_network_mode: "bridge"  # Required for Codex

Gemini CLI Backend
~~~~~~~~~~~~~~~~~~

The ``gemini_cli`` backend (alias: ``gemini-cli``) wraps Google's Gemini CLI (``@google/gemini-cli``) for local or Docker execution.

**Basic Configuration (Local):**

.. code-block:: yaml

   agents:
     - id: "gemini_cli_agent"
       backend:
         type: "gemini_cli"
         model: "gemini-2.5-pro"
         cwd: "workspace"

**Authentication:**

* **CLI login**: Run ``gemini`` interactively to login with Google (preferred)
* **API key**: Set ``GOOGLE_API_KEY`` or ``GEMINI_API_KEY`` environment variable

**Installation:** ``npm install -g @google/gemini-cli``

**Docker Mode:** Requires ``command_line_docker_network_mode: "bridge"``. Add ``@google/gemini-cli`` to
``command_line_docker_packages.preinstall.npm`` or use an image with Gemini CLI pre-installed.

**Supported Models:** gemini-2.5-pro (default), gemini-2.5-flash, gemini-2.5-flash-lite, gemini-3-flash-preview, gemini-3-pro-preview, gemini-3.1-pro-preview

**Example configs:** ``massgen/configs/providers/gemini/gemini_cli_local.yaml``, ``gemini_cli_docker.yaml``

GitHub Copilot Backend
~~~~~~~~~~~~~~~~~~~~~~

**Prerequisites:**

1. An active `GitHub Copilot subscription <https://github.com/features/copilot/plans>`_
2. Install the Copilot CLI:

   .. code-block:: bash

      # macOS / Linux
      brew install copilot-cli

      # npm (all platforms)
      npm install -g @github/copilot

      # Windows
      winget install GitHub.Copilot

3. Authenticate — run ``copilot`` and use the ``/login`` slash command, or set a
   ``GH_TOKEN`` / ``GITHUB_TOKEN`` environment variable with a
   `fine-grained PAT <https://github.com/settings/personal-access-tokens/new>`_
   that has the **Copilot Requests** permission.

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "copilot-assistant"
       backend:
         type: "copilot"
         model: "gpt-5-mini"

**Supported Models:** gpt-5-mini (default), gpt-4.1, claude-sonnet-4, gemini-2.5-pro

**Special Features:**

* No API key required — authentication is handled through your GitHub subscription
* Web search capability
* MCP server support
* Session persistence and resumption

Gemini Backend
~~~~~~~~~~~~~~

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "gemini_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         enable_web_search: true
         enable_code_execution: true

**With Safety Settings:**

.. code-block:: yaml

   agents:
     - id: "safe_gemini"
       backend:
         type: "gemini"
         model: "gemini-2.5-pro"
         safety_settings:
           HARM_CATEGORY_HARASSMENT: "BLOCK_MEDIUM_AND_ABOVE"
           HARM_CATEGORY_HATE_SPEECH: "BLOCK_MEDIUM_AND_ABOVE"

**Supported Models:** gemini-2.5-flash, gemini-2.5-pro, gemini-2.5-flash-thinking

Grok Backend
~~~~~~~~~~~~

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "grok_agent"
       backend:
         type: "grok"
         model: "grok-3-mini"
         enable_web_search: true

**Supported Models:** grok-4, grok-4-fast, grok-3, grok-3-mini

Azure OpenAI Backend
~~~~~~~~~~~~~~~~~~~~

**Configuration:**

.. code-block:: yaml

   agents:
     - id: "azure_agent"
       backend:
         type: "azure_openai"
         model: "gpt-4"
         deployment_name: "my-gpt4-deployment"
         api_version: "2024-02-15-preview"

**Required Environment Variables:**

.. code-block:: bash

   AZURE_OPENAI_API_KEY=...
   AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
   AZURE_OPENAI_API_VERSION=YOUR-AZURE-OPENAI-API-VERSION

AG2 Backend
~~~~~~~~~~~

**Configuration:**

.. code-block:: yaml

   agents:
     - id: "ag2_agent"
       backend:
         type: "ag2"
         agent_type: "ConversableAgent"
         llm_config:
           config_list:
             - model: "gpt-4"
               api_key: "${OPENAI_API_KEY}"
         code_execution_config:
           executor: "local"
           work_dir: "coding"

See :doc:`integration/general_interoperability` for detailed AG2 configuration.

LM Studio Backend
~~~~~~~~~~~~~~~~~

**For Local Models:**

.. code-block:: yaml

   agents:
     - id: "local_agent"
       backend:
         type: "lmstudio"
         model: "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"
         port: 1234

**Features:**

* Automatic LM Studio CLI installation
* Auto-download and loading of models
* Zero-cost usage
* Full privacy (local inference)

OpenRouter Backend
~~~~~~~~~~~~~~~~~~

OpenRouter provides unified access to multiple AI providers through a single API.
Use the ``chatcompletion`` backend type with OpenRouter's base URL.

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "openrouter_agent"
       backend:
         type: "chatcompletion"
         model: "openai/gpt-5-mini"
         base_url: "https://openrouter.ai/api/v1"

**With Reasoning Tokens:**

OpenRouter normalizes reasoning tokens across providers. Configure reasoning for
models that support it (OpenAI o-series, GPT-5, Claude 3.7+, Gemini 2.5+, DeepSeek R1, Grok):

.. code-block:: yaml

   agents:
     - id: "reasoning_agent"
       backend:
         type: "chatcompletion"
         model: "openai/gpt-5-mini"
         base_url: "https://openrouter.ai/api/v1"
         reasoning:
           effort: "medium"       # xhigh, high, medium, low, minimal, none
           max_tokens: 2000       # Optional: direct token limit (Anthropic-style)
           exclude: false         # Optional: set true to hide reasoning from response

**With Web Search:**

.. code-block:: yaml

   agents:
     - id: "search_agent"
       backend:
         type: "chatcompletion"
         model: "openai/gpt-5-mini"
         base_url: "https://openrouter.ai/api/v1"
         enable_web_search: true
         engine: "exa"            # exa (AI-native) or native (traditional)
         max_results: 10
         search_context_size: "high"  # low, medium, high

**Reasoning Effort Levels:**

* ``xhigh``: ~95% of max_tokens for reasoning
* ``high``: ~80% of max_tokens for reasoning
* ``medium``: ~50% of max_tokens for reasoning (default)
* ``low``: ~20% of max_tokens for reasoning
* ``minimal``: ~10% of max_tokens for reasoning
* ``none``: Disable reasoning entirely
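
To get a feel for what these percentages mean in tokens, the approximate shares listed above can be applied to a given ``max_tokens`` value. The helper below is a rough sketch with a hypothetical name; the percentages are approximations, and actual allocation is decided by OpenRouter and the underlying provider.

```python
# Approximate share of max_tokens spent on reasoning, per the list above.
EFFORT_PERCENT = {
    "xhigh": 95,
    "high": 80,
    "medium": 50,
    "low": 20,
    "minimal": 10,
    "none": 0,
}


def approx_reasoning_budget(max_tokens: int, effort: str = "medium") -> int:
    """Estimate how many tokens an effort level reserves for reasoning."""
    if effort not in EFFORT_PERCENT:
        raise ValueError(f"unknown effort level: {effort!r}")
    # Integer arithmetic avoids floating-point rounding surprises.
    return max_tokens * EFFORT_PERCENT[effort] // 100
```

With ``max_tokens`` of 2000, ``medium`` reserves roughly 1000 tokens for reasoning and leaves about the same amount for the visible answer.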

**Environment Variable:**

.. code-block:: bash

   OPENROUTER_API_KEY=your-openrouter-api-key

.. note::

   Reasoning tokens are output tokens and billed accordingly. Models automatically
   include reasoning in responses when appropriate. Use ``exclude: true`` if you
   want the model to reason internally without returning the reasoning text.

Local Inference Backends (vLLM & SGLang)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Unified Inference Backend** (v0.0.24-v0.0.25)

MassGen supports high-performance local model serving through vLLM and SGLang with automatic server detection:

.. code-block:: yaml

   agents:
     - id: "local_vllm"
       backend:
         type: "chatcompletion"
         model: "meta-llama/Llama-3.1-8B-Instruct"
         base_url: "http://localhost:8000/v1"    # vLLM default port
         api_key: "EMPTY"

     - id: "local_sglang"
       backend:
         type: "chatcompletion"
         model: "meta-llama/Llama-3.1-8B-Instruct"
         base_url: "http://localhost:30000/v1"   # SGLang default port
         api_key: "${SGLANG_API_KEY}"

**Auto-Detection:**

* **vLLM**: Default port 8000
* **SGLang**: Default port 30000
* Automatically detects server type based on configuration
* Unified InferenceBackend class handles both
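
The guide does not spell out how detection works. One plausible heuristic, shown here purely as an assumption, is to compare the port in ``base_url`` against the default ports listed above; ``guess_server`` is a hypothetical name and MassGen's actual logic may differ.

```python
from urllib.parse import urlparse

# Default ports from the list above.
DEFAULT_PORTS = {8000: "vllm", 30000: "sglang"}


def guess_server(base_url: str) -> str:
    """Guess the inference server type from the port in base_url."""
    port = urlparse(base_url).port  # None when the URL carries no explicit port
    return DEFAULT_PORTS.get(port, "unknown")
```

A server started on a non-default port would come back as ``unknown`` under this heuristic, so an explicit setting is safer whenever you change ports.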

**SGLang-Specific Parameters:**

.. code-block:: yaml

   backend:
     type: "chatcompletion"
     model: "meta-llama/Llama-3.1-8B-Instruct"
     base_url: "http://localhost:30000/v1"
     separate_reasoning: true        # SGLang guided generation
     top_k: 50                        # Sampling parameter
     repetition_penalty: 1.1          # Prevent repetition

**Mixed Deployments:**

Run both vLLM and SGLang simultaneously:

.. code-block:: yaml

   agents:
     - id: "vllm_agent"
       backend:
         type: "chatcompletion"
         model: "Qwen/Qwen2.5-7B-Instruct"
         base_url: "http://localhost:8000/v1"
         api_key: "EMPTY"

     - id: "sglang_agent"
       backend:
         type: "chatcompletion"
         model: "Qwen/Qwen2.5-7B-Instruct"
         base_url: "http://localhost:30000/v1"
         api_key: "${SGLANG_API_KEY}"
         separate_reasoning: true

**Benefits of Local Inference:**

* **Cost Savings**: Zero API costs after initial setup
* **Privacy**: No data sent to external services
* **Control**: Full control over model selection and parameters
* **Performance**: Optimized for high-throughput inference
* **Customization**: Fine-tune models for specific use cases

**Setup vLLM Server:**

.. code-block:: bash

   # Install vLLM
   pip install vllm

   # Start vLLM server
   vllm serve meta-llama/Llama-3.1-8B-Instruct \
     --host 0.0.0.0 \
     --port 8000

**Setup SGLang Server:**

.. code-block:: bash

   # Install SGLang
   pip install "sglang[all]"

   # Start SGLang server
   python -m sglang.launch_server \
     --model-path meta-llama/Llama-3.1-8B-Instruct \
     --host 0.0.0.0 \
     --port 30000

**Configuration Example:**

See ``@examples/basic/multi/two_qwen_vllm_sglang.yaml`` for a complete mixed deployment example.

Common Backend Parameters
-------------------------

Model Parameters
~~~~~~~~~~~~~~~~

All backends support these common parameters:

.. code-block:: yaml

   backend:
     type: "openai"
     model: "gpt-5-nano"

     # Generation parameters
     temperature: 0.7           # Randomness (0.0-2.0, default 0.7)
     max_tokens: 4096           # Maximum response length
     top_p: 1.0                 # Nucleus sampling (0.0-1.0)

     # API configuration
     api_key: "${OPENAI_API_KEY}"  # Optional - uses env var by default
     timeout: 60                    # Request timeout in seconds
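
The ``${OPENAI_API_KEY}`` syntax refers to an environment variable. How MassGen expands it is not shown in this guide, so the following is only a sketch of one way such placeholders could be resolved; ``expand_env`` is a hypothetical helper.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace ${NAME} placeholders with values from env (defaults to os.environ)."""
    env = os.environ if env is None else env

    def _lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_lookup, value)
```

Raising on a missing variable surfaces a misconfigured ``.env`` file early instead of sending an empty key to the provider.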

Tool Configuration
~~~~~~~~~~~~~~~~~~

Enable or disable built-in tools:

.. code-block:: yaml

   backend:
     type: "gemini"
     model: "gemini-2.5-flash"

     # Enable tools
     enable_web_search: true
     enable_code_execution: true

     # MCP servers (see MCP Integration guide)
     mcp_servers:
       - name: "server_name"
         type: "stdio"
         command: "npx"
         args: ["..."]

Multi-Backend Configurations
-----------------------------

Using Different Backends
~~~~~~~~~~~~~~~~~~~~~~~~

Each agent can use a different backend:

.. code-block:: yaml

   agents:
     - id: "fast_researcher"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         enable_web_search: true

     - id: "deep_analyst"
       backend:
         type: "openai"
         model: "gpt-5"
         reasoning:
           effort: "high"

     - id: "code_expert"
       backend:
         type: "claude_code"
         model: "claude-sonnet-4"
         cwd: "workspace"

This is the **recommended approach** - use each backend's strengths:

* **Gemini 2.5 Flash**: Fast research with web search
* **GPT-5**: Advanced reasoning and analysis
* **Claude Code**: Development with file operations

Backend Selection Guide
-----------------------

Choosing the Right Backend
~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider these factors when selecting backends:

**For Research Tasks:**

* **Gemini 2.5 Flash**: Fast, cost-effective, excellent web search
* **GPT-5-nano**: Good reasoning with web search
* **Grok**: Real-time information access

**For Coding Tasks:**

* **Claude Code**: Best for file operations, full dev tools
* **GPT-5**: Advanced code generation with reasoning
* **Gemini 2.5 Pro**: Complex code analysis

**For Analysis Tasks:**

* **GPT-5**: Deep reasoning and complex analysis
* **Claude Sonnet 4**: Long context, detailed analysis
* **Gemini 2.5 Pro**: Comprehensive multimodal analysis

**For Cost-Sensitive Tasks:**

* **GPT-5-nano**: Low-cost OpenAI model
* **Grok-3-mini**: Fast and affordable
* **Gemini 2.5 Flash**: Very cost-effective
* **LM Studio**: Free (local inference)

**For Privacy-Sensitive Tasks:**

* **LM Studio**: Fully local, no data sharing
* **Azure OpenAI**: Enterprise security
* **Self-hosted vLLM**: Private cloud deployment

.. _native-tool-backends:

Native Tool Backends (Claude Code, Codex & Gemini CLI)
------------------------------------------------------

MassGen supports three "native tool" agent backends that wrap CLI/SDK tools rather than just API
calls: **Claude Code** (Anthropic's Claude Code SDK), **Codex** (OpenAI's Codex CLI), and
**Gemini CLI** (Google's Gemini CLI). All three are **agent backends**: an API key is optional
because each can authenticate through its own CLI login or subscription flow. They come with
built-in filesystem and shell tools, providing a more integrated development experience but with
different security characteristics than API-only backends.

Architecture Differences
~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table:: Native Tool Backends vs API Backends
   :header-rows: 1
   :widths: 25 35 40

   * - Aspect
     - Agent Backends (Claude Code, Codex, Gemini CLI)
     - API Backends (OpenAI, Claude, Gemini, etc.)
   * - Tool Execution
     - Native tools (Read, Write, Bash) run locally via CLI/SDK
     - Tools run via MassGen's MCP servers
   * - Permission Control
     - Backend's own sandbox + limited MassGen hooks
     - Full MassGen PathPermissionManager control
   * - Filesystem Access
     - Direct local filesystem access
     - Controlled through MCP filesystem tools
   * - State Management
     - Stateful (session persistence, conversation history)
     - Stateless (each call is independent)
   * - Authentication
     - CLI login (no API key required)
     - API key required

Agent Backend Comparison
~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table:: Claude Code vs Codex vs Gemini CLI
   :header-rows: 1
   :widths: 20 27 27 26

   * - Feature
     - Claude Code
     - Codex
     - Gemini CLI
   * - Provider
     - Anthropic (Claude Code SDK)
     - OpenAI (Codex CLI)
     - Google (Gemini CLI)
   * - Authentication
     - Subscription or ``CLAUDE_CODE_API_KEY``; no API key needed
     - ``codex login`` OAuth; no API key needed
     - ``gemini`` CLI login (Google account); no API key needed
   * - Models
     - Claude Sonnet 4, Claude Opus 4
     - GPT-5.4, GPT-5.3-Codex, GPT-5.2-Codex, GPT-5.1-Codex
     - gemini-2.5-pro, gemini-2.5-flash, gemini-3.1-pro-preview
   * - Native Tools
     - Read, Write, Edit, Bash, Grep, Glob, WebSearch, WebFetch
     - shell, apply_patch, web_search, image_view
     - ReadFile, WriteFile, RunShellCommand, WebSearch, WebFetch
   * - MCP Support
     - Yes (SDK-native)
     - Yes (via .codex/config.toml)
     - Yes (via .gemini/settings.json)
   * - Sandbox Type
     - SDK permission hooks
     - OS-level (Seatbelt on macOS, Landlock on Linux)
     - Process-level (workspace isolation)
   * - **Read Restrictions**
     - **Yes** - SDK hooks block reads outside allowed paths
     - **No** - OS sandbox only restricts writes
     - **Yes** - workspace-scoped
   * - Write Restrictions
     - Yes - SDK hooks enforce write permissions
     - Yes - OS sandbox restricts writes to writable_roots
     - Yes - workspace-scoped

.. warning::

   **Codex Sandbox Limitation**: Codex uses OS-level sandboxing (Seatbelt on macOS,
   Landlock on Linux) which **only restricts writes, NOT reads**. This means Codex
   can read any file on the filesystem, including sensitive files outside the workspace
   and context_paths (SSH keys, credentials, environment files, etc.).

   MassGen's permission hooks **cannot intercept** Codex's native tool calls because
   they run directly through the Codex CLI's internal tools.

Security Recommendations
~~~~~~~~~~~~~~~~~~~~~~~~

**For security-sensitive workloads, prefer Docker mode** which provides full filesystem
isolation via container boundaries:

.. code-block:: yaml

   # Recommended: Docker mode for Codex with sensitive data
   agents:
     - id: "secure_codex"
       backend:
         type: "codex"
         model: "gpt-5.4"
         cwd: "workspace"
         enable_mcp_command_line: true
         command_line_execution_mode: "docker"
         command_line_docker_network_mode: "bridge"  # Required for Codex
         command_line_docker_enable_sudo: true

.. important::

   **Codex in Docker mode requires** ``command_line_docker_network_mode: "bridge"``.
   Without this setting, Codex will fail to execute. The validator will check for this.

In Docker mode:

* The container itself is the sandbox - Codex's native tools can only access what's mounted
* Host paths that are not mounted into the container are isolated from the agent
* ``~/.codex/`` is mounted read-only for OAuth token access
* The Codex CLI runs with ``--sandbox danger-full-access`` since the container provides isolation

**When Docker is not available**, consider:

1. **Use Claude Code or Gemini CLI instead** - Both provide read/write restrictions via their own permission model
2. **Limit context_paths** - Only grant access to directories that need agent access
3. **Avoid sensitive data** - Don't run Codex in directories with credentials or secrets
4. **Use API-only backends** - For maximum control, use ``openai`` or ``claude`` backends with MCP tools
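
The Codex-specific points in this section can be turned into a quick pre-flight check. The sketch below is a hypothetical helper, separate from MassGen's own validator, that inspects a parsed ``backend`` mapping and reports the two issues described above.

```python
def codex_sandbox_findings(backend: dict) -> list[str]:
    """Report sandbox-related concerns for a codex backend config."""
    if backend.get("type") != "codex":
        return []
    findings = []
    if backend.get("command_line_execution_mode") != "docker":
        # The OS-level sandbox restricts writes only, so host files stay readable.
        findings.append("codex outside Docker: native tools can read any file on the host")
    elif backend.get("command_line_docker_network_mode") != "bridge":
        findings.append('codex in Docker requires command_line_docker_network_mode: "bridge"')
    return findings
```

An empty list means neither concern applies. It does not certify the configuration as secure.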

Backend Configuration Best Practices
-------------------------------------

1. **Start with defaults**: Test with default parameters before tuning
2. **Use environment variables**: Never hardcode API keys
3. **Match backend to task**: Use each backend's strengths
4. **Enable only needed tools**: Disable unused capabilities
5. **Set appropriate timeouts**: Longer timeouts for complex tasks
6. **Monitor costs**: Track API usage across backends
7. **Test configurations**: Verify settings before production use

Advanced Backend Configuration
-------------------------------

For detailed backend-specific parameters, see:

* `Backend Configuration Guide <https://github.com/Leezekun/MassGen/blob/main/@examples/BACKEND_CONFIGURATION.md>`_
* :doc:`../reference/yaml_schema` - Complete YAML schema

MCP Integration
~~~~~~~~~~~~~~~

See :doc:`tools/mcp_integration` for:

* Adding MCP servers to backends
* Tool filtering (allowed_tools, exclude_tools)
* Planning mode configuration (v0.0.29)
* HTTP-based MCP servers

File Operations
~~~~~~~~~~~~~~~

See :doc:`files/file_operations` for:

* Workspace configuration
* Snapshot storage
* Permission management
* Cross-agent file sharing

Troubleshooting
---------------

**Backend not found:**

Ensure the backend type is correct:

.. code-block:: yaml

   # Correct backend types
   type: "openai"         # ✅
   type: "claude_code"    # ✅
   type: "codex"          # ✅
   type: "copilot"        # ✅
   type: "gemini"         # ✅
   type: "gemini_cli"     # ✅

   # Incorrect (common mistakes)
   type: "gpt"            # ❌ Use "openai"
   type: "google"         # ❌ Use "gemini"

   # Valid, but consider "claude_code" for dev tools
   type: "claude"         # ✅

**API key not found:**

Check your ``.env`` file has the correct variable name:

.. code-block:: text

   # Backend type → Environment variable
   openai       → OPENAI_API_KEY
   claude       → ANTHROPIC_API_KEY
   claude_code  → CLAUDE_CODE_API_KEY (falls back to ANTHROPIC_API_KEY)
   codex        → OPENAI_API_KEY (or use `codex login` for OAuth)
   copilot      → GH_TOKEN or GITHUB_TOKEN (or use /login in Copilot CLI)
   gemini       → GOOGLE_API_KEY
   gemini_cli   → GOOGLE_API_KEY or GEMINI_API_KEY (or use `gemini` login)
   grok         → XAI_API_KEY
   zai          → ZAI_API_KEY
   azure_openai → AZURE_OPENAI_API_KEY

.. note::

   **Separate API keys for Claude Code:** The ``claude_code`` backend checks
   ``CLAUDE_CODE_API_KEY`` first, then falls back to ``ANTHROPIC_API_KEY``.
   This allows you to use a Claude subscription (no API key needed) or a
   separate API key for Claude Code agents while using a different API key
   for standard Claude backend agents.
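
The lookup order described above can be sketched in a few lines of Python. This is an illustration, not MassGen's own resolution code: ``API_KEY_VARS`` and ``find_api_key_var`` are hypothetical names, and the order used for backends that list two variables joined by "or" (``copilot``, ``gemini_cli``) is an assumption.

.. code-block:: python

   import os

   # Candidate variables per backend type, in lookup order.
   # Only the claude_code order is documented; the other
   # two-variable orders are assumptions.
   API_KEY_VARS = {
       "openai": ["OPENAI_API_KEY"],
       "claude": ["ANTHROPIC_API_KEY"],
       "claude_code": ["CLAUDE_CODE_API_KEY", "ANTHROPIC_API_KEY"],
       "codex": ["OPENAI_API_KEY"],
       "copilot": ["GH_TOKEN", "GITHUB_TOKEN"],
       "gemini": ["GOOGLE_API_KEY"],
       "gemini_cli": ["GOOGLE_API_KEY", "GEMINI_API_KEY"],
       "grok": ["XAI_API_KEY"],
       "zai": ["ZAI_API_KEY"],
       "azure_openai": ["AZURE_OPENAI_API_KEY"],
   }


   def find_api_key_var(backend_type, env=None):
       """Return the name of the first variable that is set, or None."""
       env = os.environ if env is None else env
       for name in API_KEY_VARS.get(backend_type, []):
           if env.get(name):
               return name
       return None


   fake_env = {"ANTHROPIC_API_KEY": "sk-example"}
   print(find_api_key_var("claude_code", fake_env))  # ANTHROPIC_API_KEY
   print(find_api_key_var("grok", fake_env))         # None

A ``None`` result is not always an error: backends such as ``codex``, ``copilot``, and ``gemini_cli`` can also authenticate through their own login flows.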

**Model not supported:**

Verify the model name matches the backend's supported models:

.. code-block:: yaml

   # Check supported models in README.md or use --model flag
   backend:
     type: "openai"
     model: "gpt-5-nano"    # ✅ Supported
     # model: "gpt-6"       # ❌ Not yet available

Next Steps
----------

* :doc:`../quickstart/configuration` - Full configuration guide
* :doc:`tools/mcp_integration` - Add external tools via MCP
* :doc:`files/file_operations` - Enable file system operations
* :doc:`../reference/supported_models` - Complete model list
* :doc:`../examples/basic_examples` - See backends in action


---

## user_guide/concepts.rst

Core Concepts
=============

Understanding MassGen's core concepts is essential for using the system effectively.

What is MassGen?
-----------------

MassGen is a **multi-agent coordination system** that assigns tasks to multiple AI agents who work in parallel, share observations, vote for solutions, and converge on the best answer through natural consensus.

Agents observe, critique, and build on each other's work across cycles of refinement and restarts. When they believe they have a strong enough answer, they vote, and the best collectively validated answer wins.

Configuration-Driven Architecture
----------------------------------

MassGen uses **YAML files** to configure everything, not Python code.

.. code-block:: yaml

   agents:
     - id: "researcher"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
       system_message: "You are a researcher"

     - id: "analyst"
       backend:
         type: "openai"
         model: "gpt-5-nano"
       system_message: "You are an analyst"

Run via command line:

.. code-block:: bash

   massgen --config config.yaml "Your question"

This design makes MassGen:

* **Declarative** - Describe what you want, not how to do it
* **Version-controllable** - Config files in Git
* **Shareable** - Easy to share and reproduce setups
* **Language-agnostic** - No Python required for most users

.. seealso::
   :doc:`../quickstart/configuration` - Complete configuration guide with all options and examples

CLI-Based Execution
-------------------

MassGen is typically run from the command line (a Python API, ``massgen.run()``, is also available; see :doc:`integration/python_api`):

**Quick single agent:**

.. code-block:: bash

   massgen --model claude-3-5-sonnet-latest "Question"

**Multi-agent with config:**

.. code-block:: bash

   massgen --config my_agents.yaml "Question"

**Interactive mode:**

.. code-block:: bash

   # Omit question for interactive chat
   massgen --config my_agents.yaml

See :doc:`../reference/cli` for complete CLI reference.

Execution Hierarchy
-------------------

Understanding MassGen's execution hierarchy helps navigate logs and debug issues.

.. code-block:: text

   Session (multi-turn conversation)
   ├── Turn 1 (first user question)
   │   └── Attempt 1 (orchestration execution)
   │       ├── Round 1: Agent A processes → new_answer
   │       ├── Round 2: Agent B processes → new_answer
   │       ├── Round 3: Agent A processes → vote for B
   │       └── Round 4: Agent B processes → vote for self → CONSENSUS
   │
   └── Turn 2 (second user question)
       └── Attempt 1
           ├── Round 1: Agent A → new_answer
           └── Round 2: Agent B → vote → CONSENSUS

**Definitions:**

.. list-table::
   :header-rows: 1
   :widths: 15 50 35

   * - Term
     - Definition
     - Log Path
   * - **Session**
     - Multi-turn conversation. Spans multiple user interactions.
     - ``.massgen/sessions/``
   * - **Turn**
     - Single user question/task. Triggers full coordination.
     - ``log_TIMESTAMP/turn_N/``
   * - **Attempt**
     - One orchestration execution. Restarts create new attempts.
     - ``turn_N/attempt_N/``
   * - **Round**
     - Single agent LLM call cycle: context → streaming → output.
     - Recorded in ``llm_calls/``

**Key Insight:** Coordination ends when all agents vote (no ``new_answer`` calls). Each ``new_answer`` extends coordination; each ``vote`` moves toward consensus.
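
To make the nesting concrete, the hierarchy can be modeled with a few dataclasses. This is a minimal sketch for reading logs, not MassGen's internal data model: the class names mirror the terms in the table, and ``attempt_log_dir`` is a hypothetical helper that follows the Log Path column.

.. code-block:: python

   from dataclasses import dataclass, field
   from pathlib import PurePosixPath


   @dataclass
   class Round:
       agent_id: str
       action: str  # "new_answer" or "vote"


   @dataclass
   class Attempt:
       number: int
       rounds: list = field(default_factory=list)


   @dataclass
   class Turn:
       number: int
       attempts: list = field(default_factory=list)


   def attempt_log_dir(log_root, turn, attempt):
       """Build <log_root>/turn_N/attempt_N."""
       return PurePosixPath(log_root) / f"turn_{turn.number}" / f"attempt_{attempt.number}"


   # Turn 1 from the diagram above: two answers, then two votes.
   attempt = Attempt(1, [
       Round("A", "new_answer"),
       Round("B", "new_answer"),
       Round("A", "vote"),
       Round("B", "vote"),
   ])
   turn = Turn(1, [attempt])
   print(attempt_log_dir("log_TIMESTAMP", turn, attempt))  # log_TIMESTAMP/turn_1/attempt_1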

Multi-Agent Coordination
-------------------------

How Coordination Works
~~~~~~~~~~~~~~~~~~~~~~

MassGen's coordination follows a natural collaborative flow where agents observe each other's work and converge on the best solution:

**At each step, agents can:**

1. **See recent answers** - Agents view the most recent answers from other agents
2. **Decide their action** - Each agent chooses to either:

   * **Provide a new answer** if they have a better approach or refinement
   * **Vote for an existing answer** they believe is best

3. **Share context through workspace snapshots** (if file operations are enabled) - When agents provide answers, their workspace state is captured, allowing other agents to see their work

**Coordination completes when:**

* All agents have voted for solutions
* The agent with the most votes then becomes the final presenter

**Final presentation:**

* The winning agent delivers the coordinated final answer, using read/write permissions (if using filesystem operations and configured with context paths)
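
The completion rule can be illustrated by replaying a list of agent actions. The sketch below is a simplified, synchronous model under stated assumptions, not the orchestrator's code: ``run_coordination`` is a hypothetical function, a new answer is assumed to discard earlier votes (voters had not seen it), and ties are assumed to go to the answer that received a vote first.

.. code-block:: python

   from collections import Counter


   def run_coordination(agent_ids, events):
       """Replay (agent, action, target) events until every agent has voted."""
       answers = set()   # agents that have submitted at least one answer
       votes = {}        # voter -> agent whose answer they voted for
       for agent, action, target in events:
           if action == "new_answer":
               answers.add(agent)
               votes.clear()  # assumption: earlier votes no longer count
           elif action == "vote" and target in answers:
               votes[agent] = target
           if len(votes) == len(agent_ids):
               winner, _ = Counter(votes.values()).most_common(1)[0]
               return winner
       return None  # not every agent has voted yet


   events = [
       ("A", "new_answer", None),
       ("B", "new_answer", None),
       ("A", "vote", "B"),
       ("B", "vote", "B"),
   ]
   print(run_coordination(["A", "B"], events))  # B

Real coordination is asynchronous; the event list here only fixes one possible ordering so the rule is easy to follow.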

Coordination Flow Diagram
~~~~~~~~~~~~~~~~~~~~~~~~~~

Here's how agents asynchronously evaluate and respond during coordination:

.. code-block:: text

   ┌─────────────────────────────────────────────────────────────┐
   │              ASYNCHRONOUS COORDINATION LOOP                  │
   └─────────────────────────────────────────────────────────────┘
                                 │
                    ┌────────────┼────────────┐
                    │            │            │
                ┌───▼──┐     ┌───▼──┐     ┌───▼──┐
                │Agent │     │Agent │     │Agent │
                │  A   │     │  B   │     │  C   │
                └───┬──┘     └───┬──┘     └───┬──┘
                    │            │            │
        ┌───────────▼────────────▼────────────▼───────────┐
        │     View ANONYMIZED Answers (Context)           │
        │     - ORIGINAL MESSAGE                          │
        │     - CURRENT ANSWERs (anonymized)              │
        └───────────┬────────────┬────────────┬───────────┘
                    │            │            │
        ┌───────────▼────────────▼────────────▼───────────┐
        │  "Does the best CURRENT ANSWER address the      │
        │   ORIGINAL MESSAGE well?"                       │
        └───────────┬────────────┬────────────┬───────────┘
                    │            │            │
            ┌───────▼──────┐     │     ┌──────▼───────┐
            │    YES       │     │     │     NO       │
            │              │     │     │              │
        ┌───▼──────────┐   │     │     │    ┌─────────▼──────────┐
        │Use `vote`    │   │     │     │    │Digest existing     │
        │tool          │   │     │     │    │answers, combine    │
        │              │   │     │     │    │strengths, address  │
        │              │   │     │     │    │weaknesses, then use│
        │              │   │     │     │    │`new_answer` tool   │
        └───┬──────────┘   │     │     │    └─────────┬──────────┘
            │              │     │     │              │
            │              │     │     │              │
            └──────────────┴─────┴─────┴──────────────┘
                                 │
                                 ▼
                    ┌──────────────────────────┐
                    │  All agents voted?       │
                    │  (No new_answer calls)   │
                    └────┬──────────────┬──────┘
                         │              │
                     YES │              │ NO
                         │              │
              ┌──────────▼───────┐      │
              │  Select Winner   │      │
              │  (Most votes)    │      │
              └──────────┬───────┘      │
                         │              │
              ┌──────────▼───────┐      │
              │Final Presentation│      │
              │ (Winner delivers)│      │
              └──────────────────┘      │
                                        │
                             ┌──────────▼───────────────┐
                             │ Agent provided new_answer│
                             │ ↓                        │
                             │ INJECT update to others: │
                             │ ALL agents receive update│
                             │ and continue with new    │
                             │ answer in context        │
                             │ (loop back to top)       │
                             └──────────────────────────┘

**Key Insights:**

* **Asynchronous evaluation** - Agents evaluate continuously and independently (no synchronized rounds)
* **Anonymized answers** - Agents don't know who provided which answer, reducing bias
* **Actual agent prompt** - Agents evaluate "Does best CURRENT ANSWER address ORIGINAL MESSAGE well?"
* **Inject-and-continue** - When any agent uses ``new_answer``, other agents receive an update appended to their conversation and continue (preserving their conversation history, though a new API call is made). Agents that haven't produced their first answer yet are protected from interruption (see :ref:`first-answer-protection` below).
* **Natural consensus** - Coordination ends only when all agents vote (no ``new_answer`` calls)
* **Democratic selection** - Winner determined by peer voting

Streaming Architecture
~~~~~~~~~~~~~~~~~~~~~~~

Each agent :term:`round` involves streaming LLM responses with potential tool execution:

.. code-block:: text

   ┌─────────────────────────────────────────────────────────────────┐
   │                    SINGLE AGENT ROUND                           │
   └─────────────────────────────────────────────────────────────────┘

   ┌──────────────┐     ┌──────────────────────────────────────────┐
   │  1. CONTEXT  │────▶│  2. LLM API CALL (streaming)             │
   │              │     │                                          │
   │ • Messages   │     │  ┌────────────────────────────────────┐  │
   │ • Tools      │     │  │ Stream Chunks:                     │  │
   │ • Prev Ans   │     │  │   content → content → content →    │  │
   └──────────────┘     │  │   tool_calls → done                │  │
                        │  └────────────────────────────────────┘  │
                        └──────────────────┬───────────────────────┘
                                           │
                              ┌────────────▼────────────┐
                              │  3. TOOL EXECUTION?     │
                              │     (if tool_calls)     │
                              └────────────┬────────────┘
                                           │
                      ┌────────────────────┴────────────────────┐
                      │                                         │
               ┌──────▼──────┐                          ┌───────▼──────┐
               │  Yes: MCP   │                          │  No: Done    │
               │  or Custom  │                          │              │
               │  Tool Call  │                          │  Output:     │
               └──────┬──────┘                          │  new_answer  │
                      │                                 │  or vote     │
                      ▼                                 └──────────────┘
               ┌─────────────┐
               │ Execute &   │
               │ Add Result  │
               │ to Messages │
               └──────┬──────┘
                      │
                      ▼
               ┌─────────────┐
               │ RECURSE:    │
               │ New LLM     │──────▶ (back to step 2)
               │ Call        │
               └─────────────┘

**Streaming Flow:**

1. **Context Assembly** - Messages, tools, and previous answers prepared
2. **LLM Streaming** - Response chunks arrive: ``content``, ``tool_calls``, ``reasoning``, ``done``
3. **Tool Detection** - If ``tool_calls`` chunk received, execute tools
4. **Recursive Call** - Tool results added to messages, new LLM call made
5. **Completion** - When ``done`` received without tools, round complete
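
The five steps can be sketched as a small loop. The code below is a toy model with assumed chunk shapes (dictionaries with a ``type`` key) and a fake backend; it is written as a loop rather than literal recursion, and none of the names are MassGen APIs.

.. code-block:: python

   def run_round(call_llm, tools, messages):
       """Run one agent round: stream, execute tools, repeat until done."""
       while True:
           text, tool_calls = [], []
           for chunk in call_llm(messages):
               if chunk["type"] == "content":
                   text.append(chunk["text"])
               elif chunk["type"] == "tool_calls":
                   tool_calls.extend(chunk["calls"])
               elif chunk["type"] == "done":
                   break
           messages.append({"role": "assistant", "content": "".join(text)})
           if not tool_calls:
               return messages  # done without tools: the round is complete
           for call in tool_calls:
               result = tools[call["name"]](**call["args"])
               messages.append({"role": "tool", "name": call["name"], "content": result})


   def fake_llm(messages):
       """Stand-in backend: requests a tool once, then answers."""
       if not any(m["role"] == "tool" for m in messages):
           yield {"type": "content", "text": "Checking the file. "}
           yield {"type": "tool_calls",
                  "calls": [{"name": "read_file", "args": {"path": "x"}}]}
       else:
           yield {"type": "content", "text": "The file says hello."}
       yield {"type": "done"}


   history = run_round(
       fake_llm,
       {"read_file": lambda path: "hello"},
       [{"role": "user", "content": "Read x"}],
   )
   print(history[-1]["content"])  # The file says hello.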

Context Management & Injection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each agent maintains an **AgentConversationBuffer** - a unified store that captures all streaming content and supports context injection from other agents.

**Why a Unified Buffer?**

During streaming, agents accumulate content, tool calls, reasoning, and tool results. When one agent provides a ``new_answer``, other agents need to receive that update mid-work. The buffer ensures:

* **Complete context** - All streaming content is captured and persisted
* **Working injection** - Updates from other agents are immediately visible
* **Single source of truth** - No fragmented storage locations

**Data Flow During Streaming:**

.. code-block:: text

   Backend yields chunks:
       │
       ├── content chunk ──────► buffer.add_content("I'll analyze...")
       │                              └── Accumulates in pending content
       │
       ├── reasoning chunk ────► buffer.add_reasoning("Let me think...")
       │                              └── Accumulates in pending reasoning
       │
       ├── tool_call chunk ────► buffer.add_tool_call("read_file", {path: "x"})
       │                              └── Added to pending tool calls
       │
       ├── tool_result chunk ──► buffer.add_tool_result("read_file", "call_1", "...")
       │                              └── Updates matching pending tool call
       │
       └── done chunk ─────────► buffer.flush_turn()
                                     │
                                     ├── Creates permanent entries from pending data
                                     └── Clears all pending accumulators

**Data Flow During Injection:**

When Agent A provides a ``new_answer``, other agents receive an injection:

.. code-block:: text

   Agent A provides new_answer
            │
            ▼
   Orchestrator detects new_answer, sets restart_pending[B] = True
            │
            ▼
   Agent B reaches safe point (between LLM calls)
            │
            ▼
   Orchestrator injects into Agent B's buffer:
       buffer_b.inject_update({"<anonymized>": "Here's my answer..."})
            │
            ├── Formats injection message with <NEW_ANSWERS> tags
            └── Creates INJECTION entry in buffer
            │
            ▼
   Agent B's next LLM call:
       messages = buffer_b.to_messages()  ← Includes injection entry
            │
            ▼
       agent.chat(messages)  ← Agent sees the injected update!

**Key Insight:** The buffer ensures that when Agent B receives an injection, it sees:

1. **Original context** - The initial user question and system messages
2. **Own previous work** - All content, tool calls, and results from its streaming
3. **New injection** - The update from Agent A, formatted as a new user message

This allows agents to seamlessly continue their work with full awareness of what other agents have discovered.
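
The two data flows can be combined in one toy class. ``SketchBuffer`` below is a hypothetical stand-in that borrows the method names from the diagrams; it is not the real ``AgentConversationBuffer``, it omits reasoning chunks and tool call IDs, and the ``agent1`` label is an assumed anonymized name.

.. code-block:: python

   class SketchBuffer:
       """Toy per-agent conversation buffer (illustration only)."""

       def __init__(self, question):
           self.entries = [{"role": "user", "content": question}]
           self.pending_content = []
           self.pending_tool_calls = []

       def add_content(self, text):
           self.pending_content.append(text)

       def add_tool_call(self, name, args):
           self.pending_tool_calls.append({"name": name, "args": args, "result": None})

       def add_tool_result(self, name, result):
           # Attach the result to the first matching call that is still open.
           for call in self.pending_tool_calls:
               if call["name"] == name and call["result"] is None:
                   call["result"] = result
                   return

       def flush_turn(self):
           # Turn pending data into a permanent entry, then clear accumulators.
           self.entries.append({
               "role": "assistant",
               "content": "".join(self.pending_content),
               "tool_calls": list(self.pending_tool_calls),
           })
           self.pending_content.clear()
           self.pending_tool_calls.clear()

       def inject_update(self, answers):
           # Record other agents' answers as a new user message.
           body = "\n".join(f"{label}: {text}" for label, text in answers.items())
           self.entries.append(
               {"role": "user", "content": f"<NEW_ANSWERS>\n{body}\n</NEW_ANSWERS>"}
           )

       def to_messages(self):
           return list(self.entries)


   buffer_b = SketchBuffer("Summarize file x")
   buffer_b.add_content("I'll analyze...")
   buffer_b.add_tool_call("read_file", {"path": "x"})
   buffer_b.add_tool_result("read_file", "file contents")
   buffer_b.flush_turn()
   buffer_b.inject_update({"agent1": "Here's my answer..."})
   print([m["role"] for m in buffer_b.to_messages()])  # ['user', 'assistant', 'user']

The final message list holds the original question, the agent's own flushed work, and the injected update, in that order.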

.. _first-answer-protection:

First-Answer Protection
~~~~~~~~~~~~~~~~~~~~~~~~

When an agent submits a ``new_answer``, MassGen signals all other agents to restart
or receive a mid-stream injection with the new context. However, agents that have not
yet produced their **first answer** are protected from these interruptions.

**Why?** Independent first answers are critical for diversity. If Agent A finishes
quickly and Agent B is immediately restarted before completing its first round,
Agent B never contributes an independent perspective -- it only ever sees Agent A's
work. This undermines the parallel exploration that makes multi-agent coordination
valuable.

**How it works:**

1. When any agent submits ``new_answer``, all agents are flagged with ``restart_pending``
2. Before acting on the flag, MassGen checks whether the target agent has produced
   at least one answer
3. If the agent has **no answer yet**, the flag is cleared and the agent continues
   working undisturbed
4. If the agent **already has an answer**, the restart or injection proceeds normally

This guard applies to all injection paths: full restarts, mid-stream hook injection,
and no-hook enforcement fallbacks. It does **not** apply to fairness gate restarts,
which correctly prevent agents from voting before they have sufficient context.

.. note::
   The ``restart_pending`` flag will be re-set if additional answers arrive while
   the agent is still working on its first round. The protection only defers the
   action -- it does not permanently suppress updates.
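
The guard can be sketched as two small functions. This is a simplified model, not the orchestrator's implementation: the function names and the ``answer_counts`` dictionary are hypothetical, and flagging only the agents other than the author is an assumption.

.. code-block:: python

   def on_new_answer(author, agents, restart_pending):
       """Flag every other agent when ``author`` submits a new answer."""
       for agent_id in agents:
           if agent_id != author:
               restart_pending[agent_id] = True


   def should_interrupt(agent_id, answer_counts, restart_pending):
       """Decide whether a pending restart or injection may proceed."""
       if not restart_pending.get(agent_id):
           return False
       if answer_counts.get(agent_id, 0) == 0:
           # No first answer yet: clear the flag and leave the agent alone.
           restart_pending[agent_id] = False
           return False
       return True


   agents = ["A", "B", "C"]
   answer_counts = {"A": 1, "B": 0, "C": 1}  # B is still on its first round
   restart_pending = {}
   on_new_answer("A", agents, restart_pending)
   print(should_interrupt("B", answer_counts, restart_pending))  # False
   print(should_interrupt("C", answer_counts, restart_pending))  # True

Because ``on_new_answer`` sets the flag again on every later answer, a protected agent is only deferred, not permanently shielded.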

What Agents See
~~~~~~~~~~~~~~~

**Answer Context:**

Each agent sees the most recent answers from other agents **anonymously**. Answers are presented without attribution to reduce bias.

**Key Points:**

* **Anonymized evaluation** - Agents don't know which agent provided which answer
* **Focus on content** - Decisions based on answer quality, not agent identity
* **Bias reduction** - Prevents agents from favoring certain models or deferring to "authority"
* **Original message** - All agents always see the initial user query
* **Best current answer** - Agents evaluate if the best available answer is sufficient

This anonymous evaluation lets agents:

* Compare different approaches objectively
* Build on good insights regardless of source
* Catch potential errors without bias
* Decide whether to vote or provide a better answer based purely on merit

.. note::
   **Workspace Naming**: Always use ``cwd: "workspace"`` in your configs rather than numbered names like ``workspace1`` or ``workspace2``. MassGen automatically adds unique random suffixes per agent at runtime (e.g., ``workspace_f7a3b2c1``). This prevents agents from inferring their identity from workspace paths, which would undermine the anonymous voting design.
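
A rough picture of both ideas, anonymous labels and randomized workspace names, is sketched below. The helpers are hypothetical and simplified: the ``agentN`` labels are an assumed format, labeling in arrival order is a shortcut that a real system may avoid, and the eight hex digits only mimic the example suffix above.

.. code-block:: python

   import secrets


   def anonymize(answers):
       """Map {agent_id: answer} to {"agent1": answer, ...} in arrival order."""
       return {f"agent{i}": text for i, text in enumerate(answers.values(), start=1)}


   def runtime_workspace(cwd="workspace"):
       """Append a random 8-hex-digit suffix, e.g. workspace_f7a3b2c1."""
       return f"{cwd}_{secrets.token_hex(4)}"


   print(anonymize({"researcher": "Draft A", "analyst": "Draft B"}))
   # {'agent1': 'Draft A', 'agent2': 'Draft B'}
   print(runtime_workspace())  # random, shaped like workspace_xxxxxxxx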

**Workspace Snapshots (for file operations):**

When an agent with filesystem capabilities provides an answer:

* Their workspace is saved as a snapshot
* Other agents can see this snapshot in their temporary workspace
* This enables code review, file analysis, and iterative refinement

Example: If Agent A writes code and provides answer "agent_a.1", Agent B can review that code in ``.massgen/temp_workspaces/agent_a/`` before deciding to vote or provide improvements.

Voting Mechanism
~~~~~~~~~~~~~~~~

:term:`Agents<Agent>` participate in democratic decision-making by evaluating solutions and voting for the best answer:

**Voting Process:**

1. Each agent reviews answers from other agents
2. Agent decides: "Is there a better answer than mine?"
3. If YES → Vote for the better answer
4. If NO → Vote for their own answer, or refine it with ``new_answer``

**Natural Consensus:**

The system reaches :term:`consensus` when all agents have voted. No forced agreement - agents vote for what they genuinely believe is best based on their evaluation criteria.

**Example Scenario:**

* **Agent A** (Researcher) - Provides detailed research → Votes for Agent C's synthesis
* **Agent B** (Analyst) - Provides data analysis → Votes for Agent C's synthesis
* **Agent C** (Synthesizer) - Combines insights → Votes for self (believes synthesis is best)

**Result:** Agent C wins with 3 votes (including self-vote) and presents the final answer.

Checklist-Gated Evaluation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, agents decide when to vote based on their own judgment. **Checklist-gated evaluation** adds structured quality gates that agents must pass before they can vote, ensuring answers meet specific standards rather than relying on subjective "good enough" assessments.

This design is inspired by `GEPA (Genetic-Pareto) <https://gepa-ai.github.io/gepa/>`_, a framework for optimizing systems through LLM-based reflection and Pareto-efficient evolutionary search. GEPA's core insight is that structured diagnostic feedback -- what it calls "Actionable Side Information" -- produces far better iterative improvement than numeric scores alone. MassGen adapts this idea into multi-agent coordination: agents evaluate their work against explicit, evaluable criteria and receive structured feedback that guides their next iteration, rather than simply deciding "is this good enough?"

Rather than encoding a workflow-based improvement process in code, MassGen acts as an interactive agent harness in which agents are responsible for both implementing solutions and evaluating them against a checklist. This creates a tight feedback loop where agents diagnose weaknesses, propose specific improvements, implement them, and re-evaluate until they meet the quality bar to vote.

**How It Works:**

When ``voting_sensitivity: checklist_gated`` is enabled, agents follow a structured cycle:

.. code-block:: text

   ┌─────────────────────────────────────────────┐
   │  1. IMPLEMENT                                │
   │     Agent works on the task, produces output │
   └──────────────────────┬──────────────────────┘
                          │
                          ▼
   ┌─────────────────────────────────────────────┐
   │  2. SUBMIT CHECKLIST                         │
   │     Agent scores its work against criteria:  │
   │     E1: Requirements met?          (8/10)    │
   │     E2: No broken functionality?   (9/10)    │
   │     E3: Thorough, no gaps?         (6/10)    │
   │     E4: Shows care beyond correct? (5/10)    │
   └──────────────────────┬──────────────────────┘
                          │
                   ┌──────┴──────┐
                   │             │
               ITERATE        VOTE/STOP
                   │             │
                   ▼             ▼
   ┌────────────────────┐  ┌──────────────┐
   │ 3. PROPOSE         │  │ Terminal:    │
   │    IMPROVEMENTS    │  │ Agent votes  │
   │    What to fix,    │  │ for best     │
   │    what to keep    │  │ answer       │
   └────────┬───────────┘  └──────────────┘
            │
            ▼
   ┌────────────────────┐
   │ 4. IMPLEMENT FIXES │
   │    Submit improved  │
   │    new_answer       │
   │    (back to step 2) │
   └─────────────────────┘

**The Two Tools:**

- ``submit_checklist`` -- The agent scores each criterion for the answers in context. The system returns a verdict: **iterate** (keep improving) or **vote/stop** (quality bar met). The agent cannot vote until the checklist says the work is ready.

- ``draft_approach`` -- After an iterate verdict, the agent specifies what to fix and what to preserve. This creates a structured improvement plan rather than vague "make it better" instructions. The agent then implements the plan and submits a new answer.
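
How scores turn into a verdict is not spelled out in this guide. As one plausible reading, the sketch below treats every criterion as a gate with a fixed passing score; ``checklist_verdict`` and the threshold of 7 are assumptions made for illustration, not MassGen's actual rule.

.. code-block:: python

   def checklist_verdict(scores, pass_score=7):
       """Return ("vote", []) if every criterion passes, else ("iterate", failing)."""
       failing = [cid for cid, score in scores.items() if score < pass_score]
       return ("iterate", failing) if failing else ("vote", [])


   # Scores from the cycle diagram above.
   print(checklist_verdict({"E1": 8, "E2": 9, "E3": 6, "E4": 5}))
   # ('iterate', ['E3', 'E4'])

In this reading, the failing criteria (E3 and E4) are what the agent would address in its ``draft_approach`` plan before submitting a new answer.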

**Default Criteria (E1-E4):**

MassGen ships with four default evaluation criteria that work across any task type:

.. list-table::
   :header-rows: 1
   :widths: 8 62 15

   * - ID
     - Criterion
     - Focus
   * - E1
     - The output directly achieves what was asked for -- requirements are met, not just approximated.
     - Correctness
   * - E2
     - No broken functionality, errors, or obvious defects. Everything that's present works correctly.
     - Functionality
   * - E3
     - The output is thorough -- no significant gaps, thin sections, or placeholder content.
     - Completeness
   * - E4
     - The output shows care beyond correctness -- thoughtful choices, consistent style, attention to edge cases.
     - Craft

**Custom Evaluation Criteria:**

For tasks where the defaults are too generic, MassGen can generate task-specific criteria before coordination begins. Enable this with ``evaluation_criteria_generator: {enabled: true}`` in your config. A pre-coordination consensus run analyzes the task and produces tailored E1-EN criteria that replace the defaults.

You can also provide criteria directly via ``--eval-criteria criteria.json`` or ``--checklist-criteria-preset`` for common task types (persona generation, task decomposition, prompt crafting, log analysis).

**Configuration:**

.. code-block:: yaml

   orchestrator:
     voting_sensitivity: checklist_gated   # Enable checklist gates
     evaluation_criteria_generator:
       enabled: true                       # Generate task-specific criteria
       min_criteria: 4
       max_criteria: 7

**Why Checklists Matter:**

Without structured evaluation, agents tend to converge prematurely -- voting for "good enough" answers that miss subtle quality gaps. The checklist forces agents to explicitly assess each quality dimension before they can vote, catching blind spots that subjective judgment misses. Combined with ``draft_approach``, this creates a tight feedback loop: diagnose weaknesses, plan fixes, implement, re-evaluate.

Benefits of Multi-Agent Approach
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* **Diverse Perspectives** - Different models, different insights
* **Error Correction** - Agents catch each other's mistakes
* **Collaborative Refinement** - Ideas build on each other
* **Quality Convergence** - Natural selection of best solutions
* **Robustness** - System works even if some agents fail

Coordination Termination
~~~~~~~~~~~~~~~~~~~~~~~~~

Coordination ends when one of these conditions is met:

**Normal Completion:**

* ✅ **All agents have voted** - Consensus reached naturally
* ✅ **Winner selected** - Agent with most votes presents final answer

**Timeout:**

* ⏰ **Orchestrator timeout reached** (default: 30 minutes)
* System saves current state and terminates gracefully
* Partial results preserved

**Typical Duration:**

* Simple tasks: 1-5 minutes (2-3 :term:`rounds<Round>` total across all agents)
* Standard tasks: 5-15 minutes (3-5 rounds)
* Complex tasks: 15-30 minutes (5-10 rounds)

**Configuration:**

.. code-block:: yaml

   timeout_settings:
     orchestrator_timeout_seconds: 1800  # 30 minutes (default)

**CLI Override:**

.. code-block:: bash

   massgen --orchestrator-timeout 600 --config config.yaml

See :doc:`../reference/timeouts` for complete timeout documentation.
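
The "terminate gracefully and keep partial results" behavior can be pictured with ``asyncio.wait_for`` from the standard library. This is a sketch under assumptions, not MassGen's timeout code: ``coordinate`` is a placeholder coroutine, and the timeouts are fractions of a second instead of 1800 seconds.

.. code-block:: python

   import asyncio


   async def coordinate(partial):
       """Placeholder for coordination; records progress as it goes."""
       for step in range(3):
           await asyncio.sleep(0.05)
           partial.append(f"step {step}")
       return "final answer"


   async def run_with_timeout(timeout_seconds):
       partial = []  # shared list, so progress survives a timeout
       try:
           result = await asyncio.wait_for(coordinate(partial), timeout=timeout_seconds)
           return "completed", result, partial
       except asyncio.TimeoutError:
           return "timeout", None, partial


   print(asyncio.run(run_with_timeout(5))[0])     # completed
   print(asyncio.run(run_with_timeout(0.01))[0])  # timeout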

Agents & Backends
-----------------

Agent Definition
~~~~~~~~~~~~~~~~

Each :term:`agent` has:

* **ID**: Unique identifier
* **Backend**: :term:`LLM provider<Backend>` (Claude, Gemini, GPT, etc.)
* **Model**: Specific model version
* **System Message**: Role and instructions (:term:`system prompt<System Message>`)
* **Tools**: Optional :term:`MCP servers<MCP Server>` or native capabilities

Example:

.. code-block:: yaml

   agents:
     - id: "code_expert"
       backend:
         type: "claude_code"
         model: "sonnet"
         cwd: "workspace"
       system_message: "You are a coding expert with file operations"

Backend Types
~~~~~~~~~~~~~

MassGen supports multiple :term:`backend` providers:

* **API-based**: Claude, Gemini, GPT, Grok, Azure OpenAI, Z AI
* **CLI-based agents**: Claude Code, Codex, Copilot, Gemini CLI
* **Local**: LM Studio, vLLM, SGLang
* **External Frameworks**: AG2

Each backend type has different capabilities. See :doc:`../reference/supported_models` for details.

Workspace Isolation
-------------------

Each :term:`agent` gets an isolated :term:`workspace` for file operations, preventing interference during :term:`coordination phase`.

**What is a Workspace?**

A workspace is an agent's private directory where it can:

* Read, write, and edit files freely
* Execute code and scripts
* Create directory structures
* Perform file operations without affecting other agents

All workspaces are stored under ``.massgen/workspaces/`` in your project directory.

**Example:**

.. code-block:: yaml

   agents:
     - id: "writer"
       backend:
         type: "claude_code"
         cwd: "workspace"    # Isolated workspace under .massgen/workspaces/ (unique suffix added at runtime)

     - id: "reviewer"
       backend:
         type: "gemini"
         cwd: "workspace"    # Separate workspace with its own unique suffix

**Benefits of Isolation:**

* **No conflicts** - Agents can't accidentally overwrite each other's files
* **Parallel work** - Multiple agents modify files simultaneously
* **Clean state** - Each agent starts with a fresh workspace
* **Workspace sharing** - Agents can review each other's workspaces via :term:`snapshots<Snapshot>`

.. seealso::
   :doc:`files/file_operations` - Complete workspace management guide including directory structure, snapshots, and safety features

MCP Tool Integration
--------------------

MassGen integrates tools via :term:`Model Context Protocol (MCP)<MCP (Model Context Protocol)>`, enabling access to web search, weather, :term:`file operations<File Operation>`, and many other external services.

**Example:**

.. code-block:: yaml

   backend:
     type: "gemini"
     model: "gemini-2.5-flash"
     mcp_servers:
       - name: "search"
         type: "stdio"
         command: "npx"
         args: ["-y", "@modelcontextprotocol/server-brave-search"]

.. seealso::
   :doc:`tools/mcp_integration` - Complete MCP guide including common servers, tool filtering, planning mode, and security considerations

Project Integration
-------------------

Work directly with your existing codebase using :term:`context paths<Context Path>` with granular read/write permissions.

**What is a Context Path?**

A context path is a shared directory that agents can access during collaboration. Unlike isolated :term:`workspaces<Workspace>`, context paths allow agents to:

* **Read** your existing project files for analysis
* **Write** to your project (only the :term:`final agent` during presentation)
* **Reference** code, documentation, or data from your real project

**Key Features:**

* **Permission control** - Specify ``read`` or ``write`` access per path
* **Coordination safety** - All paths are read-only during coordination
* **Final agent writes** - Only the winning agent can write during final presentation
* **Protected paths** - Mark specific files as read-only even within writable paths

**Example:**

.. code-block:: yaml

   orchestrator:
     context_paths:
       - path: "/Users/me/project/src"
         permission: "read"       # All agents can analyze code
       - path: "/Users/me/project/docs"
         permission: "write"      # Final agent can update docs
         protected_paths:
           - "README.md"          # Keep README read-only

All MassGen state is organized under the ``.massgen/`` directory in your project root.

.. seealso::
   * :doc:`files/project_integration` - Complete project integration guide
   * :doc:`files/protected_paths` - Protect specific files within writable paths
   * :doc:`files/file_operations` - File operation safety features

Interactive Multi-Turn Mode
----------------------------

Start MassGen without a question for interactive chat with context preservation across turns.

.. code-block:: bash

   # Single agent interactive
   massgen --model gemini-2.5-flash

   # Multi-agent interactive
   massgen --config my_agents.yaml

**Key Features:**

* **Context preservation** - :term:`Sessions<Session>` are automatically saved and restored
* **Multi-turn coordination** - Full coordination process runs for each turn
* **Workspace persistence** - File operations persist across turns
* **Tool integration** - :term:`MCP tools<MCP Server>` work seamlessly across turns
* **Session management** - Resume previous conversations or start fresh

**Tool Execution in Multi-Turn Mode:**

When using MCP tools or file operations in :term:`multi-turn mode`:

* Tools execute during each turn's coordination
* Workspace state is preserved in ``.massgen/sessions/``
* Subsequent turns can access files and data from previous turns
* Planning mode can be enabled to prevent premature tool execution

**Example Session:**

.. code-block:: text

   Turn 1: "Create a website about Python"
   # Agents coordinate, winner creates files in workspace
   # Workspace saved to .massgen/sessions/session_abc123/

   Turn 2: "Add a dark mode toggle"
   # Agents see previous workspace, coordinate on improvements
   # Winner modifies existing files

.. seealso::
   * :doc:`sessions/multi_turn_mode` - Complete interactive mode guide including commands, session management, and debugging
   * :doc:`tools/mcp_integration` - Using MCP tools in multi-turn sessions
   * :doc:`advanced/planning_mode` - Prevent premature tool execution during coordination

External Framework Integration
-------------------------------

AG2 Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Integrate the :term:`AG2` framework as a custom tool:

.. code-block:: yaml

   agents:
     - id: "ag2_assistant"
       backend:
         type: "openai"
         model: "gpt-4o"
         custom_tools:
           - name: ["ag2_lesson_planner"]
             category: "education"
             path: "massgen/tool/_extraframework_agents/ag2_lesson_planner_tool.py"
             function: ["ag2_lesson_planner"]
       system_message: |
         You have access to an AG2-powered tool that uses
         nested chats and group collaboration.

AG2's multi-agent orchestration patterns are wrapped as tools that MassGen agents can invoke.

See :doc:`integration/general_interoperability` for details.

File Operation Safety
---------------------

Read-Before-Delete Enforcement
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MassGen prevents accidental file deletion:

* Agents must read a file before deleting it
* Exception: files created by the agent itself can be deleted without reading
* Clear error messages are shown when operations are blocked

Directory Validation
~~~~~~~~~~~~~~~~~~~~

* All paths validated at startup
* Context paths must be directories, not files
* Absolute paths required
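
The validation rules above can be sketched as a small check. This is only an illustration of the stated rules, not MassGen's actual validator; the function name ``validate_context_path`` is hypothetical.

.. code-block:: python

   from pathlib import Path


   def validate_context_path(raw_path: str) -> Path:
       """Return the path if it is an absolute, existing directory; raise otherwise.

       Hypothetical helper that mirrors the rules listed above.
       """
       path = Path(raw_path)
       if not path.is_absolute():
           raise ValueError(f"Context path must be absolute: {raw_path}")
       if not path.is_dir():
           raise ValueError(f"Context path must be an existing directory: {raw_path}")
       return path

Running a check like this once at startup means a misconfigured path fails before any agent begins work.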

Permissions
~~~~~~~~~~~

* **During coordination**: All context paths are READ-ONLY
* **Final presentation**: Winning agent gets configured permission (read/write)
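
The two rules above amount to a small decision function. The sketch below is illustrative only: the phase label ``"final_presentation"`` and the function name are assumptions, not identifiers taken from MassGen.

.. code-block:: python

   def effective_permission(configured: str, phase: str, is_final_agent: bool) -> str:
       """Return the permission ("read" or "write") that applies to a context path.

       ``configured`` is the permission from the YAML config. The phase name
       "final_presentation" is a hypothetical label used for this sketch.
       """
       if phase == "final_presentation" and is_final_agent:
           # Only the winning agent receives the configured permission.
           return configured
       # During coordination, and for every non-winning agent, paths are read-only.
       return "read"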

See :doc:`files/file_operations` for safety features.

System Architecture
-------------------

Execution Flow
~~~~~~~~~~~~~~

1. **Load Configuration**

   Parse :term:`YAML configuration`, validate paths, initialize :term:`agents<Agent>`

2. **Coordination**

   * Agents work in parallel, each seeing recent answers from others
   * Each agent decides whether to provide a new answer or vote for an existing one
   * When an agent provides an answer, a :term:`workspace` :term:`snapshot` is captured
   * Other agents see snapshots in their :term:`temporary workspace`
   * Coordination continues until all agents have voted

3. **Winner Selection**

   The agent with the most votes is selected as the :term:`final agent`

4. **Final Presentation**

   * Winning agent delivers the coordinated final answer
   * If using :term:`context paths<Context Path>` with write permission, winning agent can update project files

5. **Output**

   Results displayed, logged, and workspace snapshots saved
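
The winner-selection step can be sketched as a simple vote tally. This is a minimal illustration, assuming each agent casts exactly one vote. The function name is hypothetical, and MassGen's real tie-breaking rule is not described here, so treat the tie behavior as an assumption.

.. code-block:: python

   from collections import Counter


   def select_winner(votes: dict[str, str]) -> str:
       """Pick the agent with the most votes.

       ``votes`` maps each voter's id to the id of the agent it voted for.
       In this sketch a tie goes to whichever candidate was counted first.
       """
       if not votes:
           raise ValueError("No votes were cast")
       tally = Counter(votes.values())
       winner, _count = tally.most_common(1)[0]
       return winner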

Real-Time Visualization
~~~~~~~~~~~~~~~~~~~~~~~

MassGen provides a rich terminal UI showing:

* Agent coordination table
* Voting progress
* Consensus detection
* Streaming responses
* Phase transitions

Disable with ``--no-display`` for simple text output.

State Management & .massgen Directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MassGen organizes all working files under the :term:`.massgen Directory` in your project root. This keeps MassGen state separate from your project files.

**Directory Structure:**

.. code-block:: text

   .massgen/
   ├── workspaces/              # Agent workspaces
   │   ├── agent1_workspace/    # Agent 1's isolated workspace
   │   └── agent2_workspace/    # Agent 2's isolated workspace
   ├── snapshots/               # Workspace snapshots for sharing
   │   └── agent1_snapshot_1/   # Agent 1's snapshot from coordination round 1
   ├── temp_workspaces/         # Temporary workspaces for viewing others' work
   │   └── agent1/              # Agent 2 can see Agent 1's work here
   ├── sessions/                # Multi-turn session history
   │   └── session_abc123/      # Saved session state
   └── logs/                    # Execution logs

**Key Features:**

* **Workspace isolation** - Each agent has a private workspace under ``workspaces/``
* **Snapshot sharing** - Agents share work via ``snapshots/`` during coordination
* **Session persistence** - Multi-turn conversations saved in ``sessions/``
* **Clean separation** - All MassGen files kept separate from your project
* **Git-friendly** - Add ``.massgen/`` to ``.gitignore`` to exclude from version control

**How State Persists:**

1. **During coordination**: Agent workspaces and snapshots are created/updated
2. **Between turns**: Session state saved to ``sessions/`` directory
3. **On restart**: Sessions can be resumed from saved state
4. **Final presentation**: Winner's workspace contains the final output

See :doc:`files/project_integration` for using ``.massgen`` with your existing codebase.

Common Patterns
---------------

Research Tasks
~~~~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "gemini"  # Fast web search
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
     - id: "gpt5"   # Deep analysis
       backend:
         type: "openai"
         model: "gpt-5-nano"

Coding Tasks
~~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "coder"  # Code execution
       backend:
         type: "claude_code"
         cwd: "workspace"
     - id: "reviewer"  # Code review
       backend:
         type: "gemini"

Hybrid Teams
~~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "ag2_executor"  # AG2 framework tool
       backend:
         custom_tools:
           - name: ["ag2_lesson_planner"]
             # ... AG2 tool config
         type: "openai"
         # ... backend config
     - id: "claude_analyst"  # File operations
       backend:
         type: "claude_code"
         # ... MCP config
     - id: "gemini_researcher"  # Web search
       backend:
         type: "gemini"

Best Practices
--------------

1. **Start Simple** - Begin with 2-3 agents, add more as needed
2. **Diverse Models** - Mix different providers for varied perspectives
3. **Clear Roles** - Give each agent specific system messages
4. **Use MCP** - Leverage tools for enhanced capabilities
5. **Enable Planning Mode** - For tasks with irreversible actions
6. **Context Paths** - Work with existing projects safely
7. **Interactive Mode** - For iterative development

Next Steps
----------

* :doc:`../quickstart/running-massgen` - Practical examples
* :doc:`../reference/yaml_schema` - Complete configuration reference
* :doc:`tools/mcp_integration` - Add tools to agents
* :doc:`sessions/multi_turn_mode` - Interactive conversations
* :doc:`files/project_integration` - Work with your codebase
* :doc:`integration/general_interoperability` - External framework integration


---

## user_guide/files/file_operations.rst

File Operations & Workspace Management
=======================================

MassGen provides comprehensive file system support, enabling agents to read, write, and manipulate files in organized, isolated workspaces.

Quick Start
-----------

**Single agent with file operations:**

.. code-block:: bash

   # Run from your project directory
   cd /path/to/your-project
   uv run massgen "Create a Python web scraper and save results to CSV"

**Or with explicit config:**

.. code-block:: bash

   massgen \
     --config @examples/tools/filesystem/claude_code_single.yaml \
     "Create a Python web scraper and save results to CSV"

**Multi-agent file collaboration:**

.. code-block:: bash

   massgen \
     --config @examples/tools/filesystem/claude_code_context_sharing.yaml \
     "Generate a comprehensive project report with charts and analysis"

.. warning::

   **Model Selection for Filesystem Operations**

   We **do not recommend** using weaker models like GPT-4o or GPT-4.1 for filesystem operations. These models do not handle file operations reliably and may produce unexpected behavior.

   **Recommended models:**

   * Claude Sonnet 4/4.5
   * GPT-5
   * Gemini 2.5 Pro
   * Grok 4
   * Other frontier models with strong tool-calling capabilities

   Weaker models may struggle with complex file operations, error handling, and workspace management.

Inline Context Paths with @syntax
---------------------------------

MassGen supports ``@path`` syntax in prompts to include files and directories as context paths dynamically, without modifying your YAML config.

**Basic Syntax:**

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Syntax
     - Effect
   * - ``@path/to/file``
     - Add file as read-only context
   * - ``@path/to/file:w``
     - Add file as write context
   * - ``@path/to/dir/``
     - Add directory as read-only context
   * - ``@path/to/dir/:w``
     - Add directory as write context
   * - ``\@literal``
     - Escaped @ (not parsed as reference)

**Examples:**

.. code-block:: bash

   # Review a specific file (read-only)
   massgen "Review @src/main.py for security issues"

   # Refactor a file (write access)
   massgen "Refactor @src/config.py:w to use dataclasses"

   # Use one file as reference, modify another
   massgen "Use @docs/spec.md as reference to update @src/impl.py:w"

   # Multiple files
   massgen "Compare @src/old.py with @src/new.py and create a migration guide"

   # Directory access
   massgen "Review all files in @src/components/ for consistency"

**Quick CWD Shortcut (CLI flag):**

.. code-block:: bash

   # Equivalent to prepending @<cwd> in read-only mode
   massgen --cwd-context ro "Review this repository"

   # Equivalent to prepending @<cwd>:w in write mode
   massgen --cwd-context rw "Apply the requested changes"

**Features:**

* **Path Validation**: Paths are validated before execution - you'll get a clear error if a path doesn't exist
* **Home Directory Expansion**: Use ``~`` for home directory (e.g., ``@~/projects/myapp``)
* **Relative Paths**: Paths are resolved relative to your current working directory
* **Smart Suggestions**: If you reference 3+ files from the same directory, MassGen suggests using the parent directory instead
* **Permission Merging**: ``@`` paths are merged with any ``context_paths`` in your YAML config
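
To make the syntax concrete, here is a minimal sketch of how ``@`` references could be pulled out of a prompt. It is not MassGen's parser: the regular expression and the name ``parse_at_references`` are assumptions for illustration. It covers only the ``:w`` suffix and the ``\@`` escape from the table above, and skips path validation and ``~`` expansion.

.. code-block:: python

   import re

   # An "@" that is not preceded by a backslash, then a path with no
   # whitespace or colon, then an optional ":w" suffix.
   AT_REFERENCE = re.compile(r"(?<!\\)@([^\s:]+)(:w\b)?")


   def parse_at_references(prompt: str) -> list[tuple[str, str]]:
       """Return (path, permission) pairs for every @reference in the prompt."""
       return [
           (match.group(1), "write" if match.group(2) else "read")
           for match in AT_REFERENCE.finditer(prompt)
       ]

A real parser would also resolve each path against the current working directory and report paths that do not exist, as described in the feature list.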

**Interactive Mode with Tab Completion:**

In interactive mode, MassGen provides inline file path completion when you type ``@``. Press **Tab** to see file suggestions:

.. code-block:: text

   👤 User: Review @src/ma<Tab>
                   ┌──────────────────┐
                   │ src/main.py      │  ← Press Tab to autocomplete
                   │ src/manager.py   │
                   │ src/makefile     │
                   └──────────────────┘

When you submit a prompt with ``@`` paths, MassGen will automatically update agent permissions:

.. code-block:: text

   👤 User: Review @src/utils.py for bugs

   📂 Context paths from prompt:
      📖 /path/to/src/utils.py (read)
      🔄 Updating agents with new context paths...
      ✅ Agents updated with new context paths

   🔄 Processing...

**Path Accumulation Across Turns:**

Context paths from ``@`` syntax accumulate across turns in a session. If you reference ``@src/main.py`` in turn 1, you can still discuss it in turn 5 without re-specifying the path:

.. code-block:: text

   Turn 1: "Review @src/main.py"
   → Agent can access: src/main.py

   Turn 2: "Now check @tests/test_main.py too"
   → Agent can access: src/main.py, tests/test_main.py (both!)

   Turn 3: "Fix the bug we discussed"
   → Agent can still access: src/main.py, tests/test_main.py

**Permission Upgrade:**

If you reference the same path with different permissions, the higher permission (write) takes precedence:

.. code-block:: text

   Turn 1: @src/main.py      → read access
   Turn 2: @src/main.py:w    → upgraded to write access!
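
Accumulation and permission upgrade can be modeled together as a merge over a per-session mapping. The sketch below assumes paths are already normalized to comparable strings. The function name is hypothetical and the data structure is not taken from MassGen.

.. code-block:: python

   def merge_context_paths(
       accumulated: dict[str, str], new_refs: list[tuple[str, str]]
   ) -> dict[str, str]:
       """Merge new (path, permission) pairs into the paths accumulated so far.

       Returns a new mapping; the input mapping is left unchanged.
       """
       merged = dict(accumulated)
       for path, permission in new_refs:
           # "write" outranks "read", so an existing "write" is never downgraded.
           if permission == "write" or path not in merged:
               merged[path] = permission
       return merged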

**Programmatic API:**

For the Python API, ``@`` parsing is opt-in:

.. code-block:: python

   import asyncio

   import massgen


   async def main():
       # Opt-in to @path parsing
       result = await massgen.run(
           query="Review @src/main.py for issues",
           model="claude-sonnet-4",
           parse_at_references=True,  # Enable @path parsing
       )
       return result


   # ``await`` is only valid inside a coroutine, so drive it with asyncio.run()
   asyncio.run(main())

   # Or manually parse and handle paths
   from massgen.path_handling import parse_prompt_for_context

   parsed = parse_prompt_for_context("Review @src/main.py")
   print(parsed.context_paths)  # [{'path': '/abs/path/to/src/main.py', 'permission': 'read'}]
   print(parsed.cleaned_prompt)  # "Review"

Configuration
-------------

Basic Workspace Setup
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "file-agent"
       backend:
         type: "claude_code"        # Backend with file support
         model: "claude-sonnet-4"   # Your model choice
         cwd: "workspace"           # Isolated workspace for file operations

   orchestrator:
     snapshot_storage: "snapshots"                 # Shared snapshots directory
     agent_temporary_workspace: "temp_workspaces"  # Temporary workspace management

Multi-Agent Workspace Isolation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each agent gets its own isolated workspace:

.. code-block:: yaml

   agents:
     - id: "analyzer"
       backend:
         type: "claude_code"
         cwd: "workspace1"      # Agent-specific workspace

     - id: "reviewer"
       backend:
         type: "gemini"
         cwd: "workspace2"      # Separate workspace

This ensures agents don't interfere with each other's files during coordination.

Configuration Parameters
~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Parameter
     - Required
     - Description
   * - ``cwd``
     - Yes
     - Working directory for file operations (agent-specific workspace)
   * - ``snapshot_storage``
     - Yes
     - Directory for workspace snapshots (shared between agents)
   * - ``agent_temporary_workspace``
     - Yes
     - Parent directory for temporary workspaces
   * - ``exclude_file_operation_mcps``
     - No
     - Exclude file operation MCP tools. Agents use command-line tools instead. (default: false)

Minimal MCP Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~

When ``exclude_file_operation_mcps: true``, MassGen excludes redundant file operation MCP tools and relies on command-line tools instead:

.. code-block:: yaml

   agents:
     - id: "efficient_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         cwd: "workspace"
         exclude_file_operation_mcps: true  # Use command-line tools
         enable_mcp_command_line: true      # Enable command execution

**What gets excluded:**

* Filesystem MCP read operations (``read_file``, ``list_directory``, ``grep_search``)
* File operation tools from Workspace Tools MCP (``copy_file``, ``delete_file``, ``compare_files``)

**What is kept:**

* File write operations (``write_file``, ``edit_file``) - Provides clean file creation without shell escaping issues
* Command execution tools (``execute_command``, background shell management)
* Media generation tools (image/audio generation, if enabled)
* Planning tools (task management abstractions)

**Agents use standard command-line tools for excluded operations:**

* ``cat``, ``head``, ``tail`` instead of ``read_file``
* ``ls``, ``find`` instead of ``list_directory``
* ``grep``, ``rg`` instead of ``grep_search``
* ``cp`` instead of ``copy_file``
* ``rm`` instead of ``delete_file``
* ``diff`` instead of ``compare_files``

.. note::

   This configuration reduces MCP tool overhead while maintaining full functionality through command-line tools. Recommended for models that are proficient with shell commands.

Available File Operations
-------------------------

Claude Code Backend
~~~~~~~~~~~~~~~~~~~

Claude Code has built-in file operation tools:

* **Read** - Read file contents
* **Write** - Create or overwrite files
* **Edit** - Make targeted edits to existing files
* **Bash** - Execute shell commands (including file operations)
* **Grep** - Search file contents with regex
* **Glob** - Find files matching patterns

**Additional Claude Code Tools:**

* **Task** - Launch specialized agents for complex tasks
* **ExitPlanMode** - Exit planning mode (when enabled)
* **NotebookEdit** - Edit Jupyter notebook cells
* **WebFetch** - Fetch and process web content
* **TodoWrite** - Manage task lists
* **WebSearch** - Search the web
* **BashOutput** - Retrieve background shell output
* **KillShell** - Terminate background shells
* **SlashCommand** - Execute custom slash commands

.. seealso::
   For complete Claude Code tools documentation and usage examples, see the `Claude Code Documentation <https://docs.claude.com/en/docs/claude-code>`_

**Example:**

.. code-block:: bash

   massgen \
     --backend claude_code \
     --model sonnet \
     "Create a Python project with src/, tests/, and docs/ directories"

MCP Filesystem Server
~~~~~~~~~~~~~~~~~~~~~

All backends can use the MCP Filesystem Server for file operations:

.. code-block:: yaml

   agents:
     - id: "gemini_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         cwd: "workspace"           # File operations are scoped to this workspace

**MCP Filesystem Operations:**

* ``read_file`` - Read file contents
* ``write_file`` - Write or create files
* ``create_directory`` - Create directories
* ``list_directory`` - List directory contents
* ``delete_file`` - Delete files (with safety checks)
* ``move_file`` - Move or rename files

.. seealso::
   For complete MCP Filesystem Server documentation and additional operations, see the `official MCP Filesystem Server <https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem>`_

MassGen Workspace Tools
~~~~~~~~~~~~~~~~~~~~~~~

MassGen provides additional workspace management tools via the Workspace Tools MCP Server:

.. code-block:: yaml

   agents:
     - id: "advanced_agent"
       backend:
         type: "claude"
         model: "claude-sonnet-4"
         mcp_servers:
           - name: "workspace_tools"
             type: "stdio"
             command: "uv"
             args: ["run", "python", "-m", "massgen.filesystem_manager._workspace_tools_server"]

**File Operations:**

* ``copy_file`` - Copy single file/directory from any accessible path to workspace
* ``copy_files_batch`` - Copy multiple files with pattern matching and exclusions
* ``delete_file`` - Delete single file/directory from workspace
* ``delete_files_batch`` - Delete multiple files with pattern matching

**Directory Analysis:**

* ``compare_directories`` - Compare two directories and show differences
* ``compare_files`` - Compare two text files and show unified diff

**Image Generation** (requires OpenAI API key):

* ``generate_and_store_image_with_input_images`` - Create variations of existing images using gpt-4.1
* ``generate_and_store_image_no_input_images`` - Generate new images from text prompts using gpt-4.1

**Example - Workspace cleanup with batch operations:**

.. code-block:: text

   You: Copy all Python files from the previous turn's output
   [Agent uses copy_files_batch with include_patterns: ["*.py"]]

   You: Delete all temporary files
   [Agent uses delete_files_batch with include_patterns: ["*.tmp", "*.temp"]]

   You: Compare my workspace with the reference implementation
   [Agent uses compare_directories to show differences]

Workspace Management
--------------------

Workspace Isolation
~~~~~~~~~~~~~~~~~~~

Each agent's ``cwd`` is fully isolated:

* Agents can freely read/write within their workspace
* No risk of conflicting file operations
* Clean separation of work products

**Directory structure:**

.. code-block:: text

   .massgen/
   └── workspaces/
       ├── workspace1/     # Agent 1's isolated workspace
       │   ├── file1.py
       │   └── output.txt
       └── workspace2/     # Agent 2's isolated workspace
           ├── analysis.md
           └── data.csv

Snapshot Storage
~~~~~~~~~~~~~~~~

Workspace snapshots enable context sharing between agents:

* The winning agent's workspace is saved as a snapshot
* Future coordination rounds can access previous results
* Enables building on past work

**How it works:**

1. Agent completes initial answer → Workspace snapshotted
2. Coordination phase → Agents can reference snapshot
3. Final agent selected → Can build on snapshot content
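
Conceptually, a snapshot is a point-in-time copy of a workspace directory. The sketch below illustrates that idea with the standard library. The function name and the label format are assumptions, and MassGen's own snapshot logic may differ.

.. code-block:: python

   import shutil
   from pathlib import Path


   def snapshot_workspace(workspace: Path, snapshot_root: Path, label: str) -> Path:
       """Copy the workspace tree to ``snapshot_root/label`` and return the copy's path."""
       destination = snapshot_root / label
       # copytree creates missing parent directories and, by default, raises
       # FileExistsError if the destination already exists.
       shutil.copytree(workspace, destination)
       return destination

Because the snapshot is a copy, later edits in the agent's workspace do not change what other agents see in the snapshot.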

Temporary Workspaces
~~~~~~~~~~~~~~~~~~~~

Results from previous turns are available via temporary workspaces:

* Multi-turn sessions preserve context
* Agents can access files from earlier turns
* Organized by turn number

.. code-block:: text

   .massgen/
   └── temp_workspaces/
       ├── turn_1/
       │   └── agent1/
       │       └── previous_output.txt
       └── turn_2/
           └── agent2/
               └── refined_output.txt

File Operation Safety
---------------------

Read-Before-Delete Enforcement
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MassGen prevents accidental file deletion with ``FileOperationTracker``:

**Safety Rules:**

1. Agents **must read a file before deleting it**
2. Exception: Agent-created files can be deleted without reading
3. Directory deletion requires validation
4. Clear error messages are shown when operations are blocked

**Example:**

.. code-block:: text

   # This will FAIL - file not read first
   Agent: Delete config.json
   Error: Cannot delete config.json - file has not been read

   # This will SUCCEED - file read first
   Agent: Read config.json
   Agent: Delete config.json
   Success: File deleted

Created File Exemption
~~~~~~~~~~~~~~~~~~~~~~

Files created by an agent can be freely deleted:

.. code-block:: text

   Agent: Write new_file.txt "content"
   Agent: Delete new_file.txt  # Allowed - agent created it

This allows agents to clean up their own temporary files.
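
The read-before-delete rule and the created-file exemption can be sketched as a small tracker. This is a toy model for illustration. It is not the ``FileOperationTracker`` implementation, and the class and method names below are hypothetical.

.. code-block:: python

   class ReadBeforeDeleteTracker:
       """Toy tracker: a file may be deleted only if it was read or created first."""

       def __init__(self) -> None:
           self.read_files: set[str] = set()
           self.created_files: set[str] = set()

       def record_read(self, path: str) -> None:
           self.read_files.add(path)

       def record_create(self, path: str) -> None:
           self.created_files.add(path)

       def can_delete(self, path: str) -> tuple[bool, str]:
           """Return (allowed, reason) for a requested deletion."""
           if path in self.created_files:
               return True, "created by this agent"
           if path in self.read_files:
               return True, "read before delete"
           return False, f"Cannot delete {path} - file has not been read"

Returning the reason along with the decision makes it straightforward to surface a clear error message when a deletion is blocked.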

PathPermissionManager
~~~~~~~~~~~~~~~~~~~~~

``PathPermissionManager`` integrates operation tracking:

* ``track_read_operation()`` - Records file reads
* ``track_write_operation()`` - Records file writes
* ``track_delete_operation()`` - Validates and records deletions
* Enhanced delete validation for files and batch operations

Protected Paths
~~~~~~~~~~~~~~~

Protected paths allow you to make specific files or directories **read-only** within writable context paths, preventing agents from modifying or deleting critical reference files while allowing them to edit other files.

**Use Case**: You want agents to modify some files in a directory but keep certain reference files, configurations, or templates untouched.

**Configuration Example:**

.. code-block:: yaml

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"

     context_paths:
       - path: "/path/to/project"
         permission: "write"
         protected_paths:
           - "config.json"      # Read-only
           - "template.html"    # Read-only
           - "tests/fixtures/"  # Entire directory read-only

**What Agents Can Do:**

* ✅ **Read** protected files for reference
* ✅ **Write/Edit** non-protected files in the same directory
* ❌ **Modify or Delete** protected files
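
A protected-path check reduces to asking whether a target is a protected entry or sits inside a protected directory. The sketch below assumes that ``protected_paths`` entries are relative to the context path, as in the configuration example. The function name is hypothetical, and POSIX-style paths are used for simplicity.

.. code-block:: python

   from pathlib import PurePosixPath


   def is_protected(target: str, context_root: str, protected_paths: list[str]) -> bool:
       """Return True if ``target`` is protected within ``context_root``.

       Raises ValueError if ``target`` is not located under ``context_root``.
       """
       relative = PurePosixPath(target).relative_to(context_root)
       for entry in protected_paths:
           protected = PurePosixPath(entry)  # a trailing "/" is normalized away
           if relative == protected or protected in relative.parents:
               return True
       return False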

**Common Use Cases:**

1. **Protect Reference Files**: Keep test fixtures unchanged while agents modify code
2. **Protect Configuration**: Allow code changes but prevent config modifications
3. **Protect Templates**: Let agents generate content without modifying templates
4. **Protect Documentation Structure**: Allow content updates but preserve organization

For complete protected paths documentation including examples, troubleshooting, and best practices, see the **Protected Paths** section in :doc:`project_integration`.

Security Considerations
-----------------------

.. warning::

   **Agents can autonomously read, write, modify, and delete files** within their permitted directories.

Before running MassGen with filesystem access:

* ✅ Only grant access to directories you're comfortable with agents modifying
* ✅ Use permission system to restrict write access where needed
* ✅ Use protected paths for critical files within writable directories
* ✅ Test in an isolated directory first
* ✅ Back up important files before granting write access
* ✅ Review ``context_paths`` configuration carefully

The agents will execute file operations **without additional confirmation** once permissions are granted.

File Access Control
~~~~~~~~~~~~~~~~~~~

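File access is scoped by each agent's ``cwd`` and by the ``context_paths`` you configure, so there is no filesystem MCP server to add or restrict manually: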

.. code-block:: yaml

   # Filesystem operations handled via cwd parameter
   # No need to add filesystem MCP server manually

Workspace Organization
----------------------

Clean Project Structure
~~~~~~~~~~~~~~~~~~~~~~~

All MassGen state is organized under ``.massgen/``:

.. code-block:: text

   your-project/
   ├── .massgen/                    # All MassGen state
   │   ├── sessions/                # Multi-turn conversation history
   │   ├── workspaces/              # Agent working directories
   │   ├── snapshots/               # Workspace snapshots
   │   └── temp_workspaces/         # Previous turn results
   ├── src/                         # Your project files
   └── docs/                        # Your documentation

**Benefits:**

* Clean projects - all MassGen files in one place
* Easy ``.gitignore`` - just add ``.massgen/``
* Portable - delete ``.massgen/`` without affecting project
* Multi-turn sessions preserved

Configuration Auto-Organization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MassGen automatically places the configured directories under ``.massgen/``:

.. code-block:: yaml

   orchestrator:
     snapshot_storage: "snapshots"         # → .massgen/snapshots/
     agent_temporary_workspace: "temp"     # → .massgen/temp/

   agents:
     - backend:
         cwd: "workspace1"                 # → .massgen/workspaces/workspace1/
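
The mapping shown in the comments above can be expressed as a short path-resolution sketch. The function name and the returned dictionary layout are assumptions made for illustration; only the resulting directory layout comes from the example.

.. code-block:: python

   from pathlib import PurePosixPath


   def resolve_massgen_paths(
       project_root: str,
       snapshot_storage: str,
       agent_temporary_workspace: str,
       cwds: list[str],
   ) -> dict:
       """Map relative config values to their locations under ``.massgen/``."""
       base = PurePosixPath(project_root) / ".massgen"
       return {
           "snapshot_storage": str(base / snapshot_storage),
           "agent_temporary_workspace": str(base / agent_temporary_workspace),
           # Agent working directories are grouped under workspaces/.
           "workspaces": [str(base / "workspaces" / cwd) for cwd in cwds],
       }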

Example: Multi-Agent Document Processing
-----------------------------------------

.. code-block:: yaml

   agents:
     - id: "analyzer"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         cwd: "analyzer_workspace"

     - id: "writer"
       backend:
         type: "claude_code"
         cwd: "writer_workspace"

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp"

**Usage:**

.. code-block:: bash

   massgen \
     --config document_processing.yaml \
     "Analyze data.csv and create a comprehensive report with visualizations"

**What happens:**

1. **Analyzer** reads data.csv, creates analysis in its workspace
2. **Writer** sees analyzer's snapshot, creates report with charts
3. Final output in winner's workspace, snapshot saved for future reference

Advanced Topics
---------------

The sections above cover basic file operations and workspace management. For advanced features, see:

* :doc:`project_integration` - Work with your existing codebase using context paths with file-level and directory-level access control, plus protected paths for fine-grained permission control
* :doc:`../tools/code_execution` - Execute bash commands and scripts with MCP-based code execution, supporting both local and Docker isolation modes

Next Steps
----------

* :doc:`../tools/code_execution` - Execute commands and scripts with local or Docker isolation
* :doc:`../tools/mcp_integration` - Additional MCP tools beyond filesystem
* :doc:`../sessions/multi_turn_mode` - File operations across multiple conversation turns
* :doc:`../../quickstart/running-massgen` - More examples


---

## user_guide/files/index.rst

File Operations
===============

MassGen provides comprehensive file system capabilities that enable agents to read, write, and manage files safely within defined boundaries. This section covers all aspects of file operations in MassGen.

.. tip::

   For quick file operation examples, see :doc:`../../quickstart/running-massgen`.

Overview
--------

MassGen's file system features include:

* **Workspace isolation** - Agents operate within defined directories
* **Protected paths** - Prevent modification of sensitive files
* **Project integration** - Work with existing codebases safely
* **Memory filesystem** - Combine file ops with session memory
* **Snapshot management** - Track and restore file states

Guides in This Section
----------------------

.. grid:: 2
   :gutter: 3

   .. grid-item-card:: 📁 File Operations

      Core file system capabilities

      * Read, write, edit files
      * Workspace configuration
      * File snapshots and restoration
      * Safety and isolation

      :doc:`Read the File Operations guide → <file_operations>`

   .. grid-item-card:: 📂 Project Integration

      Work with existing projects

      * Context paths configuration
      * Codebase exploration
      * Safe project modification
      * Multi-directory access

      :doc:`Read the Project Integration guide → <project_integration>`

   .. grid-item-card:: 🔒 Protected Paths

      Secure file access

      * Protect sensitive files
      * Read-only paths
      * Path exclusion patterns
      * Security best practices

      :doc:`Read the Protected Paths guide → <protected_paths>`

   .. grid-item-card:: 💾 Memory Filesystem Mode

      Combine files with session memory

      * Persistent file context
      * Memory-based file tracking
      * Multi-session file operations
      * Advanced workflows

      :doc:`Read the Memory Filesystem guide → <memory_filesystem_mode>`

Related Documentation
---------------------

* :doc:`../tools/index` - Tools that work with files
* :doc:`../tools/code_execution` - Execute code that creates files
* :doc:`../sessions/memory` - Session memory management
* :doc:`../../reference/yaml_schema` - YAML configuration reference

.. toctree::
   :maxdepth: 1
   :hidden:

   file_operations
   project_integration
   protected_paths
   memory_filesystem_mode


---

## user_guide/files/memory_filesystem_mode.rst

Memory Filesystem Mode
======================

.. warning::
   **This feature is currently in development and experimental.** Multi-turn persistence and some advanced features may not work as expected.

MassGen's filesystem-based memory mode provides a simple, transparent two-tier memory system for agents. Memories are saved to the filesystem as Markdown files, visible across agents, and managed using standard file operations.

.. note::

   This is different from the :doc:`../sessions/memory` system. Filesystem mode is designed for transparent, file-based memory storage suitable for coordination and cross-agent visibility, while persistent memory uses vector databases for semantic retrieval across sessions.

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
--------

The filesystem memory mode introduces a **two-tier hierarchy** inspired by `Letta's context hierarchy <https://docs.letta.com/guides/agents/context-hierarchy>`_:

**Short-term Memory** (Tier 1)
   Always injected into all agents' system prompts. Use for tactical observations, user preferences, and quick reference information needed frequently. These are small (<100 lines), immediately useful notes. Auto-loaded every turn.

**Long-term Memory** (Tier 2)
   Summary (name + description only) shown in system prompt, full content loaded on-demand by reading the file. Use for detailed analyses, comprehensive reports, and information that's useful but not needed every turn (>100 lines). Manual load when needed.

Key Features
~~~~~~~~~~~~

- **Filesystem Transparency**: All memories stored as Markdown files in agent workspaces
- **Cross-Agent Visibility**: All agents see all memories from all agents
- **Memory Archiving**: Memories automatically preserved when agents restart (``new_answer``)
- **Stateless Agents**: Agents see aggregated context from current state + historical archives
- **Smart Deduplication**: Historical archives show only latest version of each memory
- **Automatic Injection**: Short-term memories always in-context, no action needed
- **Two-Tier Design**: Balance between immediate availability and context window efficiency
- **Standard File Operations**: Create, update, and remove memories using normal file tools

Quick Start
-----------

Enable in Configuration
~~~~~~~~~~~~~~~~~~~~~~~

Add to your YAML config:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_memory_filesystem_mode: true

   agents:
     - id: "agent_a"
       backend:
         cwd: "workspace1"  # Required for filesystem mode

Basic Usage
~~~~~~~~~~~

Agents create memories by writing Markdown files with YAML frontmatter to the memory directories.

**Creating a short-term memory** (always in-context):

.. code-block:: markdown

   # Write to: memory/short_term/user_preferences.md

   ---
   name: user_preferences
   description: User's coding style preferences
   created: 2025-01-12T10:30:00Z
   updated: 2025-01-12T10:30:00Z
   ---

   # Preferences
   - Uses tabs over spaces
   - Prefers functional programming
   - Avoids global state

**Creating a long-term memory** (load on-demand):

.. code-block:: markdown

   # Write to: memory/long_term/project_history.md

   ---
   name: project_history
   description: Project background and architectural decisions
   created: 2025-01-12T10:30:00Z
   updated: 2025-01-12T10:30:00Z
   ---

   # Project History
   ## Initial Setup (2024-01)
   ...

**Reading a long-term memory**:

.. code-block:: bash

   # When you need access to long-term memory content
   cat memory/long_term/project_history.md

Architecture
------------

Directory Structure
~~~~~~~~~~~~~~~~~~~

Memories are stored in multiple locations for different purposes:

**Agent Workspaces** (Current work):

.. code-block:: text

   workspace1/
     memory/
       short_term/
         quick_notes.md
         user_prefs.md
       long_term/
         comprehensive_analysis.md

   workspace2/
     memory/
       short_term/
         task_context.md

**Temp Workspaces** (For cross-agent visibility):

.. code-block:: text

   .massgen/temp_workspaces/
     agent1/
       memory/
         short_term/
           quick_notes.md
     agent2/
       memory/
         short_term/
           task_context.md

**Archived Memories** (Historical persistence):

.. code-block:: text

   .massgen/sessions/session_20251123_210304/
     archived_memories/
       agent_a_answer_0/
         short_term/
           quick_notes.md
         long_term/
           skill_effectiveness.md
       agent_a_answer_1/
         short_term/
           quick_notes.md  # Updated version
       agent_b_answer_0/
         short_term/
           task_context.md
     turn_1/
       workspace/
       answer.txt

File Format
~~~~~~~~~~~

All memories **MUST** use **Markdown with YAML frontmatter**:

.. code-block:: markdown

   ---
   name: quick_notes
   description: Tactical observations from current work
   created: 2025-11-23T20:00:00
   updated: 2025-11-23T20:00:00
   ---

   ## Web Development
   - CSS variables work well for theming
   - create_directory fails on nested paths - create parent first

**Required YAML fields:**

- ``name``: Memory filename (without .md extension)
- ``description``: Brief summary for display in long-term memory lists
- ``created``: ISO timestamp when memory was created
- ``updated``: ISO timestamp of last update

**Optional fields:**

- ``tier``: "short_term" or "long_term" (inferred from directory if missing)
- ``agent_id``: Which agent created it (optional, for tracking origin)
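
The format rules above can be checked with a short script. The following is a minimal sketch, not MassGen's actual parser: ``parse_memory``, ``infer_tier``, and ``REQUIRED_FIELDS`` are hypothetical names, and the frontmatter is read as flat ``key: value`` lines using only the standard library. That assumption holds for the examples on this page but not for arbitrary YAML.

.. code-block:: python

   from pathlib import Path

   # Field names taken from the "Required YAML fields" list above.
   REQUIRED_FIELDS = ("name", "description", "created", "updated")


   def parse_memory(text: str) -> tuple[dict[str, str], str]:
       """Split a memory file into (frontmatter dict, Markdown body).

       Hypothetical helper: handles only flat ``key: value`` frontmatter.
       """
       lines = text.splitlines()
       if not lines or lines[0].strip() != "---":
           raise ValueError("memory file must start with '---'")
       # Find the line that closes the frontmatter block.
       end = next(
           (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
           None,
       )
       if end is None:
           raise ValueError("frontmatter is missing its closing '---'")
       meta: dict[str, str] = {}
       for line in lines[1:end]:
           if not line.strip():
               continue
           # Split on the first colon only, so ISO timestamps stay intact.
           key, sep, value = line.partition(":")
           if not sep:
               raise ValueError(f"not a 'key: value' line: {line!r}")
           meta[key.strip()] = value.strip()
       missing = [field for field in REQUIRED_FIELDS if field not in meta]
       if missing:
           raise ValueError(f"missing required fields: {', '.join(missing)}")
       body = "\n".join(lines[end + 1:]).strip()
       return meta, body


   def infer_tier(path: Path, meta: dict[str, str]) -> str:
       """Use the optional ``tier`` field, else fall back to the directory name."""
       return meta.get("tier") or path.parent.name

Calling ``parse_memory`` on a file that lacks ``created`` or ``updated`` raises ``ValueError`` in this sketch, which is one way to surface malformed memories early.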

System Prompt Injection
~~~~~~~~~~~~~~~~~~~~~~~~

Memory injection happens **automatically on every turn**. The orchestrator reads all memory files from all agents' workspaces and adds them to each agent's system prompt: short-term memories in full, long-term memories as name-and-description summaries.

Injection Flow
^^^^^^^^^^^^^^

**1. Agent Creates Memory:**

When an agent writes a memory file using standard file tools:

.. code-block:: text

   workspace1/memory/short_term/decisions.md

**2. Orchestrator Reads All Memories (Every Turn):**

On every turn, before sending messages to agents, the orchestrator:

- Scans **all agents' workspaces**: ``workspace1/memory/``, ``workspace2/memory/``, etc.
- Reads all ``short_term/*.md`` and ``long_term/*.md`` files
- Parses YAML frontmatter + content from each file
- Groups into two lists: short-term and long-term memories

**3. Formats Into System Message:**

The orchestrator generates a memory section and appends it to each agent's system prompt:

.. code-block:: text

   system_message = base_system_message
                  + planning_guidance
                  + memory_message       ← Injected here
                  + skills_message

**4. All Agents See All Memories:**

Every agent receives the same memory section in their system prompt, containing:

- **Short-term**: Full content from ALL agents (with source labels)
- **Long-term**: Summary table from ALL agents

**Cross-Agent Visibility:**

- When Agent A creates a memory, Agent B sees it in their **next system message**
- Memories are re-read from filesystem on **every turn** (no caching)
- Updates become visible to all agents on their next turn
- Each memory shows which agent created it: ``[Agent: agent_a]``
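
The scan-and-format steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the orchestrator's real code: ``collect_memories`` and ``render_memory_section`` are hypothetical names, the ``workspaces`` mapping from agent ID to workspace path is assumed, and the output layout is simplified compared with the real system prompt.

.. code-block:: python

   from pathlib import Path


   def collect_memories(workspaces: dict[str, Path]) -> dict[str, list[dict[str, str]]]:
       """Re-read every agent's memory directory and group files by tier.

       Hypothetical helper: ``workspaces`` maps an agent ID to its workspace path.
       Nothing is cached, so each call reflects the current filesystem state.
       """
       grouped: dict[str, list[dict[str, str]]] = {"short_term": [], "long_term": []}
       for agent_id, workspace in sorted(workspaces.items()):
           for tier in grouped:
               tier_dir = workspace / "memory" / tier
               if not tier_dir.is_dir():
                   continue
               for path in sorted(tier_dir.glob("*.md")):
                   grouped[tier].append(
                       {
                           "agent": agent_id,
                           "file": path.name,
                           "text": path.read_text(encoding="utf-8"),
                       }
                   )
       return grouped


   def _description(text: str) -> str:
       """Pull the ``description`` value out of flat frontmatter, if present."""
       for line in text.splitlines():
           if line.startswith("description:"):
               return line.split(":", 1)[1].strip()
       return ""


   def render_memory_section(grouped: dict[str, list[dict[str, str]]]) -> str:
       """Short-term entries appear in full; long-term entries as one-line summaries."""
       out = ["Short-term memories:"]
       for memory in grouped["short_term"]:
           out.append(f"[Agent: {memory['agent']}] {memory['file']}")
           out.append(memory["text"].strip())
       out.append("Long-term memories:")
       for memory in grouped["long_term"]:
           summary = _description(memory["text"])
           out.append(f"[Agent: {memory['agent']}] {memory['file']}: {summary}")
       return "\n".join(out)

Because the sketch rebuilds the section from disk on every call, a file written by one agent shows up the next time the section is rendered for any other agent.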

System Prompt Format
^^^^^^^^^^^^^^^^^^^^

Agents see memories from two sources in their system prompts:

**1. Current Agent Memories** (from temp workspaces):

.. code-block:: text

   ### Current Agent Memories (For Comparison)

   **agent1:**
   *short_term:*
   - `quick_notes.md`
     ```
     - CSS variables work well for theming
     - create_directory needs parent to exist first
     ```

   *long_term:*
   - `comprehensive_analysis.md`: Detailed skill effectiveness analysis

   **agent2:**
   *short_term:*
   - `task_context.md`
     ```
     Key findings about current task...
     ```

**2. Archived Memories** (deduplicated historical context):

.. code-block:: text

   ### Archived Memories (Historical - Deduplicated)

   **Short-term (full content):**

   - `quick_notes.md` (from Agent A Answer 1)
     ```
     - CSS variables work well
     - Always test responsiveness
     ```

   **Long-term (summaries only):**
   - `skill_effectiveness.md`: Tracking skills and tools (from Agent A Answer 2)
   - `approach_patterns.md`: Strategy analysis (from Agent B Answer 0)

**Key Differences:**

- **Current**: Shows ALL memories from all agents (no deduplication) for comparison
- **Archived**: Shows deduplicated memories (newest version only) from previous answers
- **Short-term**: Full content always shown
- **Long-term**: Only name + description shown (read file directly to see full content)

**Important Notes:**

- **Stateless agents**: Agents have no persistent identity across ``new_answer`` calls
- **Deduplication**: If same memory name appears multiple times in archives, only newest shown
- **Automatic archiving**: Memories saved to ``.massgen/sessions/{session_id}/archived_memories/`` before workspace clearing
- **Fresh reads**: Memory is always read from filesystem on every turn, never cached

Working with Memory Files
-------------------------

Creating Memories
~~~~~~~~~~~~~~~~~

To create a memory, write a Markdown file with YAML frontmatter to the appropriate directory:

**Short-term memory** (auto-injected every turn):

.. code-block:: bash

   # Write to memory/short_term/<name>.md
   write_file memory/short_term/user_preferences.md "---
   name: user_preferences
   description: User's coding style preferences
   created: 2025-01-12T10:30:00Z
   updated: 2025-01-12T10:30:00Z
   ---

   # Preferences
   - Uses tabs over spaces
   - Prefers functional programming
   "

**Long-term memory** (read on-demand):

.. code-block:: bash

   # Write to memory/long_term/<name>.md
   write_file memory/long_term/project_history.md "---
   name: project_history
   description: Project background and decisions
   created: 2025-01-12T10:30:00Z
   updated: 2025-01-12T10:30:00Z
   ---

   # Project History
   Detailed content here...
   "

Updating Memories
~~~~~~~~~~~~~~~~~

To update a memory, simply overwrite the file with new content:

.. code-block:: bash

   # Update the file, change the 'updated' timestamp
   write_file memory/short_term/user_preferences.md "---
   name: user_preferences
   description: Updated user coding preferences
   created: 2025-01-12T10:30:00Z
   updated: 2025-01-12T11:45:00Z
   ---

   # Updated Preferences
   - Now uses spaces (changed from tabs)
   "

Removing Memories
~~~~~~~~~~~~~~~~~

To remove a memory, delete the file:

.. code-block:: bash

   rm memory/short_term/old_preferences.md

Reading Long-term Memories
~~~~~~~~~~~~~~~~~~~~~~~~~~

Long-term memories are not auto-injected. Read them when needed:

.. code-block:: bash

   cat memory/long_term/project_history.md

Best Practices
--------------

Choosing Between Tiers
~~~~~~~~~~~~~~~~~~~~~~~

**Use Short-term** (PREFERRED for most things) when:

- Small, tactical observations (<100 lines)
- Needed frequently across turns
- Quick reference information
- User preferences and workflow patterns
- Tool tips and gotchas discovered
- Examples: ``quick_notes.md``, ``user_prefs.md``, ``task_context.md``

**Use Long-term** (only for detailed content) when:

- Comprehensive, detailed analysis (>100 lines)
- Multi-task patterns with substantial evidence
- Detailed post-mortems and architectural decisions
- Content that would clutter auto-loaded context
- Examples: ``comprehensive_analysis.md``, ``detailed_post_mortem.md``

**Rule of thumb**: If it's small and useful every turn → short_term. If it's detailed and situationally useful → long_term. Most observations should be short-term.

Memory Organization
~~~~~~~~~~~~~~~~~~~

**Use clear, descriptive names**:

.. code-block:: text

   # Good
   memory/short_term/user_email_preferences.md

   # Bad
   memory/short_term/prefs.md

**Keep short-term memories concise**:

.. code-block:: text

   # Good - focused and brief (memory/short_term/user_style.md)
   ---
   name: user_style
   description: Coding style preferences
   created: 2025-01-12T10:30:00Z
   updated: 2025-01-12T10:30:00Z
   ---
   - Tabs over spaces
   - Functional style
   - No globals

   # Bad - too verbose for short-term (should be in memory/long_term/)
   [100 lines of detailed style guide]

**Use meaningful descriptions** in frontmatter:

.. code-block:: yaml

   # Good
   description: User's Python coding style preferences and conventions

   # Bad
   description: Preferences

Cross-Agent Coordination
~~~~~~~~~~~~~~~~~~~~~~~~

All agents see all memories with source attribution:

.. code-block:: text

   # Agent A creates a memory file
   # workspace1/memory/long_term/research_findings.md
   ---
   name: research_findings
   description: Literature review on neural architectures
   agent_id: agent_a
   created: 2025-01-12T10:30:00Z
   updated: 2025-01-12T10:30:00Z
   ---
   [Research notes]

   # Agent B sees it in their system prompt and can read it directly
   cat workspace1/memory/long_term/research_findings.md

**Tips**:

- Use descriptive names to help other agents find relevant memories
- Include agent context in descriptions when appropriate
- Each agent's memories are in their own workspace directory

Memory Lifecycle Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Clean up outdated memories**:

.. code-block:: bash

   # When a memory is no longer needed
   rm memory/short_term/temporary_analysis.md

**Update instead of creating duplicates**:

.. code-block:: bash

   # Check if memory exists, then update or create
   if [ -f memory/short_term/user_prefs.md ]; then
       # Update existing file (change 'updated' timestamp)
       write_file memory/short_term/user_prefs.md "[new content]"
   else
       # Create new file
       write_file memory/short_term/user_prefs.md "[initial content]"
   fi

Context Window Management
~~~~~~~~~~~~~~~~~~~~~~~~~

**Monitor short-term usage**:

Short-term memories consume context window tokens. Recommended limits:

- **Individual memory size**: <1000 tokens
- **Total short-term memories**: <10 memories
- **Total short-term tokens**: <10,000 tokens
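
These limits are not enforced automatically, so a periodic check can help. The sketch below is only illustrative: ``check_short_term_budget`` is a hypothetical helper, it inspects a single ``short_term`` directory rather than all agents at once, and the four-characters-per-token estimate is a rough assumption rather than a real tokenizer.

.. code-block:: python

   from pathlib import Path

   # Recommended limits from the list above.
   MAX_TOKENS_PER_MEMORY = 1000
   MAX_MEMORIES = 10
   MAX_TOTAL_TOKENS = 10_000


   def estimate_tokens(text: str) -> int:
       """Rough estimate: about four characters per token (an assumption)."""
       return len(text) // 4


   def check_short_term_budget(short_term_dir: Path) -> list[str]:
       """Return human-readable warnings; an empty list means within limits."""
       sizes = {
           path.name: estimate_tokens(path.read_text(encoding="utf-8"))
           for path in sorted(short_term_dir.glob("*.md"))
       }
       warnings = []
       for name, tokens in sizes.items():
           if tokens >= MAX_TOKENS_PER_MEMORY:
               warnings.append(f"{name}: ~{tokens} tokens (limit {MAX_TOKENS_PER_MEMORY})")
       if len(sizes) >= MAX_MEMORIES:
           warnings.append(f"{len(sizes)} short-term memories (limit {MAX_MEMORIES})")
       total = sum(sizes.values())
       if total >= MAX_TOTAL_TOKENS:
           warnings.append(f"~{total} total tokens (limit {MAX_TOTAL_TOKENS})")
       return warnings

Any file named in a warning is a candidate for moving to ``memory/long_term/``.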

**Promote/demote as needed**:

.. code-block:: bash

   # If short-term gets too large, move less critical items to long-term
   mv memory/short_term/detailed_notes.md memory/long_term/detailed_notes.md

Use Cases
---------

Multi-Turn Conversations
~~~~~~~~~~~~~~~~~~~~~~~~

Persist context across conversation turns:

.. code-block:: text

   # Turn 1: Agent A saves findings to memory/long_term/analysis_turn1.md
   ---
   name: analysis_turn1
   description: Initial codebase analysis findings
   created: 2025-01-12T10:30:00Z
   updated: 2025-01-12T10:30:00Z
   ---
   # Findings
   - Found 3 API endpoints
   - Auth uses JWT

   # Turn 2: Agent B references previous work
   cat memory/long_term/analysis_turn1.md
   # Continue from where Agent A left off

User Preferences Tracking
~~~~~~~~~~~~~~~~~~~~~~~~~~

Store and maintain user preferences in ``memory/short_term/user_preferences.md``:

.. code-block:: markdown

   ---
   name: user_preferences
   description: User's project preferences and constraints
   created: 2025-01-12T10:30:00Z
   updated: 2025-01-12T10:30:00Z
   ---

   # User Preferences
   - Language: Python 3.11+
   - Framework: FastAPI
   - Testing: pytest with coverage >80%
   - Documentation: Google-style docstrings

Project Context Management
~~~~~~~~~~~~~~~~~~~~~~~~~~

Maintain project background and decisions in ``memory/long_term/project_context.md``:

.. code-block:: markdown

   ---
   name: project_context
   description: Project overview and architectural decisions
   created: 2025-01-12T10:30:00Z
   updated: 2025-01-12T10:30:00Z
   ---

   # Project Context

   ## Overview
   Building a multi-agent orchestration framework for AI coordination.

   ## Key Decisions
   - Using MCP for tool integration
   - Filesystem-first for transparency
   - Two-tier memory hierarchy

   ## Current Sprint
   - Implementing memory filesystem mode
   - Target: v0.2.0 release

Troubleshooting
---------------

Memories Not Appearing
~~~~~~~~~~~~~~~~~~~~~~

**Check configuration**:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_memory_filesystem_mode: true  # Must be true

   agents:
     - id: "agent_a"
       backend:
         cwd: "workspace1"  # Must have workspace path

**Verify workspace path**:

.. code-block:: bash

   # Check if memory directory exists
   ls workspace1/memory/

   # Should see: short_term/ long_term/

**Check logs**:

.. code-block:: bash

   # Look for memory directory setup logs
   grep "memory" logs/massgen.log
   grep "enable_memory_filesystem_mode" logs/massgen.log

Memory Not Loading
~~~~~~~~~~~~~~~~~~

**Verify memory exists**:

.. code-block:: bash

   # Check filesystem
   ls workspace1/memory/short_term/
   ls workspace1/memory/long_term/

   # Read memory file
   cat workspace1/memory/long_term/project_history.md

**Check frontmatter format**:

Memory files must start with ``---`` and have valid YAML:

.. code-block:: markdown

   ---
   name: my_memory
   description: My memory description
   tier: long_term
   agent_id: agent_a
   created: 2025-01-12T10:30:00Z
   updated: 2025-01-12T10:30:00Z
   ---

   Content here...

Performance Considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Short-term memory adds to every request**:

- Each short-term memory adds roughly 100-1,000 tokens to every agent's prompt
- With 3 agents holding 5 short-term memories each (15 in total), that is roughly 1,500-15,000 extra tokens per request
- Monitor context usage and move memories to long-term if needed

**Long-term memory loads on-demand**:

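- Full content costs tokens only when explicitly loaded; the name and description summary is still included in the system prompt
- No enforced limit on the number of long-term memories, although each one adds a summary line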
- Load time is negligible (filesystem read)

Comparison with Other Memory Systems
-------------------------------------

Filesystem Mode vs. Persistent Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Feature
     - Filesystem Mode
     - Persistent Memory (Qdrant)
   * - Storage
     - Markdown files in workspace
     - Vector database (Qdrant)
   * - Retrieval
     - Manual load or auto-inject
     - Semantic search
   * - Persistence
     - Per-session (workspace)
     - Cross-session (database)
   * - Setup
     - Config flag only
     - Requires Qdrant server
   * - Use Case
     - Transparent, file-based coordination
     - Long-term semantic memory
   * - Cross-Agent
     - Full visibility (same orchestration)
     - Shared collection
   * - Scale
     - Small-medium (<100 memories)
     - Large (unlimited)

**When to use each**:

- **Filesystem Mode**: Current session coordination, transparent memory, file-based workflows
- **Persistent Memory**: Multi-session learning, semantic retrieval, large knowledge bases

Filesystem Mode vs. Skills System
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Feature
     - Filesystem Mode
     - Skills System
   * - Purpose
     - Runtime memory/context
     - Pre-defined knowledge/tools
   * - Modification
     - Dynamic (create/update/delete)
     - Static (loaded at start)
   * - Format
     - Markdown with frontmatter
     - Markdown with frontmatter
   * - Injection
     - Two-tier (auto + manual)
     - Auto-inject or load
   * - Agent Creation
     - Yes (via file operations)
     - No (external files)

**Complementary use**:

- Skills: Pre-existing knowledge (how to use tools, workflows)
- Memory: Runtime discoveries (user prefs, findings, decisions)

Memory Archiving & Persistence
--------------------------------

When agents call ``new_answer`` to restart with a fresh workspace, their memories are automatically archived.

How It Works
~~~~~~~~~~~~

1. **Before clearing workspace**: Memories copied to ``.massgen/sessions/{session_id}/archived_memories/{agent_id}_answer_{n}/``
2. **Workspace cleared**: Agent gets fresh workspace
3. **Memories preserved**: Historical archives persist in the ``sessions/`` directory until you delete them
4. **System prompt updated**: Next agent sees archived memories (deduplicated) + current memories
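
The copy step in the list above can be sketched with the standard library. This is a hypothetical illustration of the directory layout only: ``archive_memories`` is not a MassGen function, and the real implementation may differ in how it numbers answers or handles missing directories.

.. code-block:: python

   import shutil
   from pathlib import Path


   def archive_memories(
       workspace: Path, session_dir: Path, agent_id: str, answer_index: int
   ) -> Path:
       """Copy ``workspace/memory`` into the session archive and return the target.

       The target follows the layout documented above:
       ``archived_memories/{agent_id}_answer_{n}/``.
       """
       source = workspace / "memory"
       target = session_dir / "archived_memories" / f"{agent_id}_answer_{answer_index}"
       if source.is_dir():
           # copytree creates missing parent directories of the target.
           shutil.copytree(source, target, dirs_exist_ok=True)
       return target

In this sketch the workspace itself is left untouched; clearing it would be a separate step performed after the copy succeeds.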

Deduplication Strategy
~~~~~~~~~~~~~~~~~~~~~~

**Current Agent Memories** (from temp workspaces):
   - Shows ALL memories from ALL agents
   - NO deduplication (need all for comparison)
   - Used for evaluating competing answers

**Archived Memories** (from sessions/):
   - Shows deduplicated historical memories
   - For duplicate names: keeps only newest version (by file timestamp)
   - Reduces noise from repeated memory names across answers

**Example**:

.. code-block:: text

   Archives before deduplication:
   - agent_a_answer_0/quick_notes.md (timestamp: 100)
   - agent_a_answer_1/quick_notes.md (timestamp: 200) ← SHOWN
   - agent_b_answer_0/quick_notes.md (timestamp: 150)

   System prompt shows only: agent_a_answer_1/quick_notes.md (most recent)
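
The same rule can be expressed as a short function. This is a minimal sketch under assumptions: ``deduplicate_archives`` is a hypothetical name, and representing each archived file as an ``(archive_dir, file_name, timestamp)`` tuple is a simplification chosen for the example.

.. code-block:: python

   def deduplicate_archives(
       entries: list[tuple[str, str, int]],
   ) -> list[tuple[str, str, int]]:
       """Keep only the newest entry for each memory file name."""
       newest: dict[str, tuple[str, str, int]] = {}
       for archive_dir, file_name, timestamp in entries:
           current = newest.get(file_name)
           if current is None or timestamp > current[2]:
               newest[file_name] = (archive_dir, file_name, timestamp)
       # Sort by file name so the result is stable across runs.
       return sorted(newest.values(), key=lambda entry: entry[1])


   archives = [
       ("agent_a_answer_0", "quick_notes.md", 100),
       ("agent_a_answer_1", "quick_notes.md", 200),
       ("agent_b_answer_0", "quick_notes.md", 150),
   ]
   print(deduplicate_archives(archives))
   # [('agent_a_answer_1', 'quick_notes.md', 200)]

Note that the key is the file name alone, so a newer file from one agent replaces an older file of the same name from another agent.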

Stateless Architecture
~~~~~~~~~~~~~~~~~~~~~~~

Agents are **stateless** - they have no persistent identity across ``new_answer`` calls. Each agent sees:

1. All current memories from all agents' workspaces (for comparing approaches)
2. Deduplicated historical memories from archives (for context)

This design ensures agents see complete context without confusion about "my previous work" vs "other agents' work".

Known Limitations
-----------------

This feature is experimental. Key limitations:

1. **No Semantic Search**: Retrieval by exact name only. No similarity search or automatic relevance ranking.

2. **Token Management**: No automatic enforcement of memory size limits. Keep short-term memories small (<100 lines).

3. **Archive Growth**: Archives accumulate over time. Consider periodic cleanup of old session directories.

4. **No Conflict Resolution**: If multiple agents update same memory name, deduplication keeps newest by timestamp (last write wins).

Related Documentation
---------------------

- :doc:`../sessions/memory` - Persistent memory with Qdrant (different system)
- :doc:`../advanced/agent_task_planning` - Task planning MCP (similar filesystem pattern)
- :doc:`../tools/skills` - Skills system (similar injection pattern)
- `Letta Context Hierarchy <https://docs.letta.com/guides/agents/context-hierarchy>`_ - Inspiration for tier design


---

## user_guide/files/project_integration.rst

Project Integration & Context Paths
====================================

| **NEW in v0.0.21** - Directory-level context paths
| **ENHANCED in v0.0.26** - File-level context paths

Work directly with your existing projects! Context paths allow you to share specific **directories or individual files** with all agents while maintaining granular permission control.

Quick Start
-----------

**For coding projects** (recommended - auto-detects context from current directory):

.. code-block:: bash

   # Run from your project directory - MassGen will offer to add it as context
   cd /path/to/your-project
   uv run massgen "Enhance the website with dark/light theme toggle and interactive features"

**Using explicit config** (for predefined setups):

.. code-block:: bash

   massgen \
     --config @examples/tools/filesystem/gpt5mini_cc_fs_context_path.yaml \
     "Enhance the website with dark/light theme toggle and interactive features"

Configuration
-------------

Context Paths Setup
~~~~~~~~~~~~~~~~~~~

Share directories **or individual files** with all agents using ``context_paths``:

.. code-block:: yaml

   agents:
     - id: "code-reviewer"
       backend:
         type: "claude_code"
         cwd: "workspace"          # Agent's isolated work area

   orchestrator:
     # Required for file operations
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"

     # Context paths - directories OR individual files
     context_paths:
       - path: "/home/user/my-project/src"           # Directory access
         permission: "read"                          # Agents can analyze your code
       - path: "/home/user/my-project/docs"          # Directory access
         permission: "write"                         # Final agent can update docs
       - path: "/home/user/my-project/config.yaml"   # Single file access (v0.0.26+)
         permission: "read"                          # Access only this file

Configuration Parameters
~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Parameter
     - Required
     - Description
   * - ``context_paths``
     - Yes
     - List of shared directories or files for all agents
   * - ``path``
     - Yes
     - **Absolute path to directory OR file** (both supported as of v0.0.26)
   * - ``permission``
     - Yes
     - Access level: ``"read"`` or ``"write"``
   * - ``snapshot_storage``
     - Yes
     - Directory for workspace snapshots (required for file operations)
   * - ``agent_temporary_workspace``
     - Yes
     - Parent directory for temporary workspaces (required for file operations)

.. note::

   **v0.0.26+**: Context paths can now point to **individual files** in addition to directories. This allows you to grant agents access to specific configuration files or reference documents without exposing the entire directory.

   **File-level access**: When a file path is provided, agents can only access that specific file - sibling files in the same directory are blocked for security.
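
The file-versus-directory rule in the note above can be sketched as a path check. This is a hypothetical illustration, not MassGen's permission code: ``is_access_allowed`` is an invented name, and the sketch ignores the read/write distinction entirely.

.. code-block:: python

   from pathlib import Path


   def is_access_allowed(target: Path, context_path: Path) -> bool:
       """Return True if ``target`` is covered by ``context_path``.

       A file context path covers only itself; a directory context path
       covers the directory and everything beneath it.
       """
       target = target.resolve()
       context_path = context_path.resolve()
       if context_path.is_file():
           # Sibling files in the same directory do not match.
           return target == context_path
       return target == context_path or context_path in target.parents

With ``/home/user/my-project/config.yaml`` as the context path, a request for a sibling such as ``/home/user/my-project/secrets.yaml`` returns ``False`` in this sketch, assuming both files exist on disk.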

Permissions Model
-----------------

Context vs Final Agent Permissions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Different permission levels during different phases:

**During Coordination (Context Agents):**
   All context paths are **READ-ONLY**, regardless of configuration. This protects your files during multi-agent discussion.

**Final Presentation (Winning Agent):**
   The winning agent gets the **configured permission** (read or write) for final execution.

**Example:**

.. code-block:: yaml

   orchestrator:
     context_paths:
       - path: "/home/user/project/src"
         permission: "write"

**What happens:**

1. **Coordination phase** → All agents have READ access to ``/home/user/project/src``
2. **Final presentation** → Winning agent has WRITE access to ``/home/user/project/src``
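
The two-phase rule can be written as a small function. This is a sketch for illustration only: ``effective_permission`` and the phase names ``"coordination"`` and ``"final"`` are hypothetical labels, not identifiers from MassGen.

.. code-block:: python

   def effective_permission(configured: str, phase: str) -> str:
       """Return the permission an agent actually gets for a context path.

       ``configured`` is the YAML ``permission`` value ("read" or "write");
       ``phase`` is "coordination" or "final" (names assumed for this sketch).
       """
       if configured not in ("read", "write"):
           raise ValueError(f"unknown permission: {configured!r}")
       if phase == "coordination":
           # Context paths are read-only during coordination, whatever the config says.
           return "read"
       if phase == "final":
           return configured
       raise ValueError(f"unknown phase: {phase!r}")


   print(effective_permission("write", "coordination"))  # read
   print(effective_permission("write", "final"))         # write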

Read Permission
~~~~~~~~~~~~~~~

Agents can:

* Read all files in the directory
* Analyze code structure
* Extract information
* Reference content in responses

Agents **cannot:**

* Create new files
* Modify existing files
* Delete files

**Use cases:**

* Code review and analysis
* Documentation generation from source code
* Data extraction and reporting
* Pattern detection and recommendations

Write Permission
~~~~~~~~~~~~~~~~

Final agent can:

* Read all files
* Create new files
* Modify existing files
* Delete files (with read-before-delete safety)

**Use cases:**

* Code refactoring and updates
* Documentation updates
* Test generation
* Project modernization

Multi-Agent Project Collaboration
----------------------------------

Advanced Example
~~~~~~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "analyzer"
       backend:
         type: "gemini"
         cwd: "analysis_workspace"

     - id: "implementer"
       backend:
         type: "claude_code"
         cwd: "implementation_workspace"

   orchestrator:
     # Required for file operations
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"

     # Context paths - mix of directories and files
     context_paths:
       - path: "/home/user/legacy-app/src"              # Directory access
         permission: "read"                             # Read existing codebase
       - path: "/home/user/legacy-app/.env.example"    # Single file access (v0.0.26+)
         permission: "read"                             # Access only env template
       - path: "/home/user/legacy-app/tests"            # Directory access
         permission: "write"                            # Write new tests
       - path: "/home/user/modernized-app"              # Directory access
         permission: "write"                            # Create modernized version

This configuration:

* All agents can read the legacy codebase directory
* Agents can access the ``.env.example`` template but not other config files
* All agents can discuss modernization approaches
* Winning agent can write tests and create modernized version

Clean Project Organization
---------------------------

The .massgen/ Directory
~~~~~~~~~~~~~~~~~~~~~~~

All MassGen working files are organized under ``.massgen/`` in your project root:

.. code-block:: text

   your-project/
   ├── .massgen/                          # All MassGen state
   │   ├── sessions/                      # Multi-turn conversation history
   │   │   └── session_20250108_143022/
   │   │       ├── turn_1/                # Results from turn 1
   │   │       ├── turn_2/                # Results from turn 2
   │   │       └── SESSION_SUMMARY.txt    # Human-readable summary
   │   ├── workspaces/                    # Agent working directories
   │   │   ├── analysis_workspace/        # Analyzer's isolated workspace
   │   │   └── implementation_workspace/  # Implementer's workspace
   │   ├── snapshots/                     # Workspace snapshots for coordination
   │   └── temp_workspaces/               # Previous turn results for context
   ├── src/                               # Your actual project files
   ├── tests/                             # Your tests
   └── docs/                              # Your documentation

Benefits
~~~~~~~~

✅ **Clean Projects**
   All MassGen files contained in one directory

✅ **Easy .gitignore**
   Just add ``.massgen/`` to your ``.gitignore``

✅ **Portable**
   Move or delete ``.massgen/`` without affecting your project

✅ **Multi-Turn Sessions**
   Conversation history preserved across sessions

Configuration Auto-Organization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You specify simple names, and MassGen places them under ``.massgen/``:

.. code-block:: yaml

   orchestrator:
     snapshot_storage: "snapshots"         # → .massgen/snapshots/ (REQUIRED)
     agent_temporary_workspace: "temp"     # → .massgen/temp/ (REQUIRED)

   agents:
     - backend:
         cwd: "workspace1"                 # → .massgen/workspaces/workspace1/

.. note::

   ``snapshot_storage`` and ``agent_temporary_workspace`` are **required** when using file operations or context paths.
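
The mapping from simple names to locations under ``.massgen/`` can be sketched as follows. ``organize_paths`` is a hypothetical helper written to mirror the comments in the YAML example; how MassGen treats absolute paths or names that already include ``.massgen/`` is not covered here.

.. code-block:: python

   from pathlib import Path


   def organize_paths(
       project_root: Path,
       snapshot_storage: str,
       agent_temporary_workspace: str,
       cwds: list[str],
   ) -> dict[str, Path]:
       """Map configured names to their locations under ``.massgen/``."""
       base = project_root / ".massgen"
       resolved = {
           "snapshot_storage": base / snapshot_storage,
           "agent_temporary_workspace": base / agent_temporary_workspace,
       }
       for cwd in cwds:
           # Agent workspaces are grouped under .massgen/workspaces/.
           resolved[f"cwd:{cwd}"] = base / "workspaces" / cwd
       return resolved


   paths = organize_paths(Path("your-project"), "snapshots", "temp", ["workspace1"])
   print(paths["cwd:workspace1"].as_posix())
   # your-project/.massgen/workspaces/workspace1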

Adding to .gitignore
~~~~~~~~~~~~~~~~~~~~

.. code-block:: gitignore

   # MassGen state and working files
   .massgen/

This excludes all MassGen temporary files, sessions, and workspaces from version control while keeping your project clean.

Use Cases
---------

Code Review
~~~~~~~~~~~

Agents analyze your source code and suggest improvements:

.. code-block:: yaml

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"

     context_paths:
       - path: "/home/user/project/src"
         permission: "read"
       - path: "/home/user/project/review-notes"
         permission: "write"

.. code-block:: bash

   # Run from project directory - recommended for coding
   cd /home/user/project
   uv run massgen "Review the authentication module for security issues and best practices"

   # Or with explicit config
   massgen \
     --config code_review.yaml \
     "Review the authentication module for security issues and best practices"

Documentation Generation
~~~~~~~~~~~~~~~~~~~~~~~~~

Agents read project code to understand context and generate/update documentation:

.. code-block:: yaml

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"

     context_paths:
       - path: "/home/user/project/src"
         permission: "read"
       - path: "/home/user/project/docs"
         permission: "write"

.. code-block:: bash

   # Run from project directory - recommended for coding
   cd /home/user/project
   uv run massgen "Update the API documentation to reflect recent changes in the auth module"

   # Or with explicit config
   massgen \
     --config doc_generator.yaml \
     "Update the API documentation to reflect recent changes in the auth module"

Data Processing
~~~~~~~~~~~~~~~

Agents access shared datasets and generate analysis reports:

.. code-block:: yaml

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"

     context_paths:
       - path: "/home/user/datasets"
         permission: "read"
       - path: "/home/user/reports"
         permission: "write"

.. code-block:: bash

   # Run from project directory - recommended
   cd /home/user
   uv run massgen "Analyze the Q4 sales data and create a comprehensive report with visualizations"

   # Or with explicit config
   massgen \
     --config data_analysis.yaml \
     "Analyze the Q4 sales data and create a comprehensive report with visualizations"

Project Migration
~~~~~~~~~~~~~~~~~

Agents examine existing projects and create modernized versions:

.. code-block:: yaml

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"

     context_paths:
       - path: "/home/user/old-project"
         permission: "read"
       - path: "/home/user/new-project"
         permission: "write"

.. code-block:: bash

   # Run from project directory - recommended for coding
   cd /home/user/old-project
   uv run massgen "Migrate the Flask 1.x application to Flask 3.x with modern best practices"

   # Or with explicit config
   massgen \
     --config migration.yaml \
     "Migrate the Flask 1.x application to Flask 3.x with modern best practices"

Project Instructions (CLAUDE.md / AGENTS.md)
---------------------------------------------

**NEW in v0.1.36** - Automatic discovery of project instruction files

MassGen automatically discovers and includes project instruction files (CLAUDE.md or AGENTS.md) when they exist in your context paths. This follows the `agents.md standard <https://agents.md/>`_ for coding agent instructions.

Quick Example
~~~~~~~~~~~~~

Create a ``CLAUDE.md`` or ``AGENTS.md`` file in your project root:

.. code-block:: markdown

   # MyProject Instructions

   ## Build & Test
   - Run `npm install` before testing
   - Tests use pytest: `pytest tests/`

   ## Code Style
   - Use TypeScript strict mode
   - Follow ESLint configuration in `.eslintrc`

Then run MassGen with your project as a context path:

.. code-block:: bash

   cd /path/to/myproject
   uv run massgen "@. Add dark mode toggle to the settings page"

The contents of the discovered file (``CLAUDE.md`` or ``AGENTS.md``) are automatically included in the agent's system prompt.

Supported Files
~~~~~~~~~~~~~~~

MassGen supports both standard formats:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - File
     - Description
   * - ``CLAUDE.md``
     - Claude Code specific instructions (takes precedence)
   * - ``AGENTS.md``
     - Universal standard for coding agents (`agents.md <https://agents.md/>`_, 60k+ projects)

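**Priority**: If both files exist in the same directory, ``CLAUDE.md`` takes precedence. Across directories, the file closest to the context path wins (see *How Discovery Works* below).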

How Discovery Works
~~~~~~~~~~~~~~~~~~~

MassGen uses **hierarchical discovery** with "closest wins" semantics:

1. Starts at your context path
2. Walks up to workspace root looking for ``CLAUDE.md`` or ``AGENTS.md``
3. Returns the **closest** file found
4. ``CLAUDE.md`` is preferred over ``AGENTS.md`` at the same directory level

**Example structure**:

.. code-block:: text

   /myproject/                   # Root
   ├── AGENTS.md                 # Project-wide instructions
   ├── src/
   │   ├── CLAUDE.md             # Source-specific instructions (closest wins for src/)
   │   └── api/
   │       └── handler.py
   └── docs/
       └── AGENTS.md             # Docs-specific instructions

**If you specify** ``@/myproject/src/api``:

- MassGen finds ``/myproject/src/CLAUDE.md`` (closest to api/)
- Root ``AGENTS.md`` is ignored (src/CLAUDE.md is closer)
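
The lookup described above can be sketched in a few lines of Python. This is an illustration only, not MassGen's implementation: ``find_instruction_file`` and its parameters are hypothetical names, and the sketch assumes the context path lies inside the workspace root.

.. code-block:: python

   from pathlib import Path
   from typing import Optional

   # Checked in this order within each directory, so CLAUDE.md wins ties.
   INSTRUCTION_FILES = ("CLAUDE.md", "AGENTS.md")


   def find_instruction_file(context_path: str, workspace_root: str) -> Optional[Path]:
       """Return the closest instruction file at or above context_path.

       Hypothetical helper: walks from the context path up to (and including)
       the workspace root and never looks outside it.
       """
       root = Path(workspace_root).resolve()
       current = Path(context_path).resolve()
       if current.is_file():
           current = current.parent
       # Refuse paths outside the workspace instead of searching beyond it.
       if current != root and root not in current.parents:
           return None
       while True:
           for name in INSTRUCTION_FILES:
               candidate = current / name
               if candidate.is_file():
                   return candidate
           if current == root:
               return None
           current = current.parent

With the example structure, ``find_instruction_file("/myproject/src/api", "/myproject")`` would return ``/myproject/src/CLAUDE.md``, because the walk stops at the first directory that contains either file.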

Configuration Examples
~~~~~~~~~~~~~~~~~~~~~~

**Option 1: Directory with instruction file**

.. code-block:: yaml

   orchestrator:
     context_paths:
       - path: "/Users/me/myproject"
         permission: "read"

If ``/Users/me/myproject/CLAUDE.md`` or ``AGENTS.md`` exists, it's automatically included.

**Option 2: Explicit file reference**

.. code-block:: bash

   massgen "@CLAUDE.md @src/ Review the authentication module"

**Option 3: Using @path syntax (CLI)**

.. code-block:: bash

   # Discovers CLAUDE.md from project root
   cd /Users/me/myproject
   massgen "@. Add user profile page"

Important Notes
~~~~~~~~~~~~~~~

.. note::

   **Context, not strict instructions**: The contents of CLAUDE.md/AGENTS.md are provided as **reference context** that may or may not be relevant to the current task. Agents use these as helpful guidelines when applicable but are not required to follow every instruction.

   This differs from operational system prompt instructions - think of it like README.md for agents.

**Static loading**:
   Instruction files are read **once** at session start. Changes during execution require restarting the session.

**Workspace boundary**:
   Discovery stops at your workspace root - files outside the workspace are not searched.

**Deduplication**:
   If multiple context paths resolve to the same instruction file, it's only included once.
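
Deduplication amounts to comparing resolved file paths. A minimal sketch, assuming discovery has already produced one instruction file (or nothing) per context path; ``unique_instruction_files`` is a hypothetical name, not a MassGen function:

.. code-block:: python

   from pathlib import Path
   from typing import Iterable, List, Optional


   def unique_instruction_files(found: Iterable[Optional[str]]) -> List[Path]:
       """Drop missing entries and repeats, keeping first-seen order."""
       seen = set()
       unique = []
       for item in found:
           if item is None:
               continue  # this context path had no instruction file
           resolved = Path(item).resolve()
           if resolved not in seen:
               seen.add(resolved)
               unique.append(resolved)
       return unique

Resolving before comparing means two spellings of the same file, such as ``/p/CLAUDE.md`` and ``/p/src/../CLAUDE.md``, count as one.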

Real-World Example
~~~~~~~~~~~~~~~~~~

.. code-block:: markdown

   # CLAUDE.md

   # Acme Web App

   ## Build Process
   ```bash
   npm install
   npm run build
   npm test
   ```

   ## Architecture
   - Frontend: React 18 + TypeScript
   - Backend: FastAPI + PostgreSQL
   - Tests: Jest for frontend, pytest for backend

   ## Code Style
   - Use TypeScript strict mode
   - Follow Airbnb style guide
   - 100% test coverage required for API endpoints

   ## Testing
   - Run `npm test` for frontend tests
   - Run `pytest` for backend tests
   - CI runs both on every PR

**Usage**:

.. code-block:: bash

   cd acme-web-app
   massgen "@. Add pagination to the users list endpoint"

Agents will receive the build instructions, architecture context, and testing requirements automatically.

Best Practices
~~~~~~~~~~~~~~

1. **Keep it concise**: Agents receive this as context, so focus on essential information
2. **Include build steps**: How to set up the development environment
3. **Document conventions**: Code style, naming patterns, testing requirements
4. **Use both if needed**: CLAUDE.md for Claude-specific optimizations, AGENTS.md for universal compatibility
5. **Update regularly**: Keep instructions current as your project evolves

Security Considerations
-----------------------

.. warning::

   **Agents can autonomously read/write files** in context paths with write permission.

Before granting write access:

* ✅ **Back up your code** - Ensure you have version control or backups
* ✅ **Test first** - Try with read-only permission first
* ✅ **Isolated projects** - Consider testing on a copy of your project
* ✅ **Review permissions** - Double-check which paths have write access
* ✅ **Use version control** - Git/VCS allows easy rollback
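
To double-check which paths are writable before a run, you can inspect the parsed configuration. The sketch below assumes the YAML has already been loaded into a plain dictionary with the ``orchestrator.context_paths`` layout used in this guide; ``writable_paths`` is a hypothetical helper, not part of MassGen.

.. code-block:: python

   from typing import Any, Dict, List


   def writable_paths(config: Dict[str, Any]) -> List[str]:
       """List every context path that grants write permission."""
       entries = config.get("orchestrator", {}).get("context_paths", [])
       return [e["path"] for e in entries if e.get("permission") == "write"]


   # Same shape as the documentation-generation example in this guide.
   config = {
       "orchestrator": {
           "context_paths": [
               {"path": "/home/user/project/src", "permission": "read"},
               {"path": "/home/user/project/docs", "permission": "write"},
           ]
       }
   }

   for path in writable_paths(config):
       print(f"write access: {path}")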

Path Validation
~~~~~~~~~~~~~~~

MassGen validates all context paths at startup:

* ✅ Paths must exist
* ✅ Paths must be directories (v0.0.26+ also accepts individual files as context paths)
* ✅ Paths must be absolute (not relative)

**Error messages:**

.. code-block:: text

   Error: Context path '/home/user/project/file.txt' is not a directory
   Error: Context path '/home/user/missing' does not exist
   Error: Context path must be absolute, got 'relative/path'
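
The checks can be mirrored in a small pre-flight script. This is a sketch of the rules listed above, not MassGen's validator: ``validate_context_path`` and its ``allow_files`` flag are hypothetical, and the messages simply copy the format shown. The complete example later in this guide lists a single file as a context path (v0.0.26+), so the directory check is made optional here.

.. code-block:: python

   from pathlib import Path
   from typing import Optional


   def validate_context_path(path: str, allow_files: bool = False) -> Optional[str]:
       """Return an error message for an invalid context path, else None."""
       p = Path(path)
       if not p.is_absolute():
           return f"Error: Context path must be absolute, got '{path}'"
       if not p.exists():
           return f"Error: Context path '{path}' does not exist"
       # Assumption: file context paths are only accepted when opted in.
       if not p.is_dir() and not allow_files:
           return f"Error: Context path '{path}' is not a directory"
       return None

Running this over every ``path`` entry in your config before launching surfaces the same classes of mistakes without starting a session.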

Best Practices
--------------

1. **Start with read-only** - Analyze before modifying
2. **Granular permissions** - Only grant write where needed
3. **Use .gitignore** - Exclude ``.massgen/`` from version control
4. **Review agent work** - Check ``.massgen/workspaces/`` before accepting changes
5. **Back up important projects** - Use Git or other VCS
6. **Test configurations** - Try on sample projects first

Example: Complete Project Setup
--------------------------------

.. code-block:: yaml

   agents:
     - id: "analyzer"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         cwd: "analyzer_workspace"

     - id: "developer"
       backend:
         type: "claude_code"
         model: "claude-sonnet-4"
         cwd: "developer_workspace"

   orchestrator:
     # Required for file operations
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp"

     # Project integration - mix of directories and files
     context_paths:
       - path: "/Users/me/myproject/src"                  # Directory: analyze existing code
         permission: "read"
       - path: "/Users/me/myproject/pytest.ini"           # File: read test config (v0.0.26+)
         permission: "read"
       - path: "/Users/me/myproject/tests"                # Directory: generate tests
         permission: "write"
       - path: "/Users/me/myproject/docs"                 # Directory: update documentation
         permission: "write"

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

**Project structure after running:**

.. code-block:: text

   myproject/
   ├── .massgen/                    # All MassGen state
   │   ├── workspaces/
   │   │   ├── analyzer_workspace/
   │   │   └── developer_workspace/
   │   ├── snapshots/
   │   ├── sessions/
   │   └── temp/
   ├── src/                         # Your source (read access)
   ├── tests/                       # Generated tests (write access)
   ├── docs/                        # Updated docs (write access)
   └── .gitignore                   # Contains .massgen/

Protected Paths
---------------

Protected paths allow you to make specific files or directories **read-only** within writable context paths, preventing agents from modifying or deleting critical reference files while allowing them to edit other files.

.. note::

   **Use Case**: You want agents to modify some files in a directory but keep certain reference files, configurations, or templates untouched.

Basic Configuration
~~~~~~~~~~~~~~~~~~~

Protect specific files within a writable context path:

.. code-block:: yaml

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"

     context_paths:
       - path: "/absolute/path/to/directory"
         permission: "write"
         protected_paths:
           - "important_file.txt"
           - "config.json"

**Result**:

* Agents can read and modify all files **except** ``important_file.txt`` and ``config.json``
* Protected files are readable but not writable

Protected Paths Syntax
~~~~~~~~~~~~~~~~~~~~~~~

Protected paths are **relative to the context path**:

.. code-block:: yaml

   orchestrator:
     context_paths:
       - path: "/Users/me/project"
         permission: "write"
         protected_paths:
           - "src/config.py"          # Protects /Users/me/project/src/config.py
           - "tests/fixtures/"        # Protects /Users/me/project/tests/fixtures/
           - "README.md"              # File protection
           - "docs/"                  # Directory protection

Common Use Cases
~~~~~~~~~~~~~~~~

**1. Protect Reference Files**: Keep test fixtures unchanged while agents modify code

.. code-block:: yaml

   context_paths:
     - path: "/project"
       permission: "write"
       protected_paths:
         - "tests/fixtures/"
         - "tests/expected_outputs/"

**2. Protect Configuration**: Allow code changes but prevent config modifications

.. code-block:: yaml

   context_paths:
     - path: "/app"
       permission: "write"
       protected_paths:
         - "config.yaml"
         - ".env.example"
         - "docker-compose.yml"

**3. Protect Templates**: Generate content without modifying templates

.. code-block:: yaml

   context_paths:
     - path: "/website"
       permission: "write"
       protected_paths:
         - "templates/"
         - "layouts/"

**4. Mixed Permissions**: Different protection levels across context paths

.. code-block:: yaml

   context_paths:
     # Source code - most files writable, some protected
     - path: "/project/src"
       permission: "write"
       protected_paths:
         - "core/constants.py"
         - "version.py"

     # Docs - completely read-only (no protected_paths needed)
     - path: "/project/docs"
       permission: "read"

     # Temp folder - fully writable
     - path: "/project/temp"
       permission: "write"

How Protection Works
~~~~~~~~~~~~~~~~~~~~

Protected paths are enforced by the ``PathPermissionManager``:

1. **Startup validation**: Checks that protected paths exist within their context path
2. **Runtime enforcement**: Blocks write/delete operations on protected paths
3. **Clear error messages**: Agents receive descriptive errors when blocked

.. code-block:: text

   Agent: Edit /project/config.json
   Error: Cannot modify /project/config.json - path is protected

**Read Operations**: Agents can always read protected files for reference:

.. code-block:: text

   Agent: Read config.json        # ✅ Allowed
   Agent: Edit config.json         # ❌ Blocked
   Agent: Delete config.json       # ❌ Blocked

**Directory Protection**: Protecting a directory protects all contents recursively:

.. code-block:: text

   protected_paths: ["tests/fixtures/"]

   ✅ Read tests/fixtures/data.json
   ❌ Write tests/fixtures/data.json
   ❌ Delete tests/fixtures/
   ❌ Create tests/fixtures/new_file.txt
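
The rules above reduce to a path-prefix check. The following sketch is illustrative only: it is not the ``PathPermissionManager`` code, and ``is_protected`` is a hypothetical helper. It resolves each protected entry relative to the context path and treats a target as protected when it equals an entry or lies underneath one.

.. code-block:: python

   from pathlib import Path
   from typing import Iterable


   def is_protected(target: str, context_path: str, protected_paths: Iterable[str]) -> bool:
       """True if target is a protected path or sits inside a protected directory."""
       base = Path(context_path).resolve()
       resolved = Path(target).resolve()
       for entry in protected_paths:
           # Entries are always interpreted relative to the context path,
           # so a leading slash is stripped rather than honoured.
           protected = (base / entry.lstrip("/")).resolve()
           if resolved == protected or protected in resolved.parents:
               return True
       return False

In this model a write, delete, or create request is rejected when ``is_protected`` returns ``True``, while read requests skip the check entirely.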

Best Practices
~~~~~~~~~~~~~~

1. **Be explicit**: List all critical files rather than assuming default protection
2. **Test first**: Run with a test directory to verify protection works
3. **Document**: Add comments explaining why files are protected

   .. code-block:: yaml

      protected_paths:
        - "schema.sql"        # Database schema - don't modify structure
        - "LICENSE"           # Legal file - must not change

4. **Use read-only when appropriate**: If entire directory should be read-only, use ``permission: "read"`` instead of protecting all paths

   .. code-block:: yaml

      # If everything should be read-only:
      - path: "/reference_docs"
        permission: "read"     # Simpler than listing all files

      # If you want selective protection:
      - path: "/working_dir"
        permission: "write"
        protected_paths: [...]  # Mixed permissions

5. **Combine with planning mode**: Use protected paths with planning mode for maximum safety

   .. code-block:: yaml

      orchestrator:
        context_paths:
          - path: "/project"
            permission: "write"
            protected_paths: ["config.json"]
        coordination:
          enable_planning_mode: true  # Prevents modifications during coordination

Troubleshooting
~~~~~~~~~~~~~~~

**Problem**: Agent is modifying a file you marked as protected.

**Check**:

1. **Verify relative path is correct**:

   .. code-block:: yaml

      context_paths:
        - path: "/Users/me/project"
          protected_paths:
            - "config.json"         # ✅ Relative to /Users/me/project
            # NOT: "/Users/me/project/config.json"  # ❌ Would be treated as relative

2. **Check the file exists**: Protected paths must exist when MassGen starts
3. **Verify write permission**: Protection only applies to writable context paths

**Problem**: "Protected path 'file.txt' not found"

**Solution**: Ensure the file exists before starting MassGen:

.. code-block:: bash

   ls /project/file.txt  # Check if file exists

Security Note
~~~~~~~~~~~~~

.. warning::

   Protected paths are a **convenience feature**, not a security boundary. For security-critical files:

   * Use file system permissions (chmod)
   * Run MassGen with limited user accounts
   * Store sensitive data outside agent-accessible directories
   * Review all agent operations before deploying

Next Steps
----------

* :doc:`file_operations` - Learn more about workspace management and file operation safety
* :doc:`../tools/mcp_integration` - Additional tools for project work
* :doc:`../advanced/planning_mode` - Combine with planning mode for safer coordination
* :doc:`../sessions/multi_turn_mode` - Iterative project development across turns
* :doc:`../../quickstart/running-massgen` - More examples


---

## user_guide/files/protected_paths.rst

:orphan:

Protected Paths
===============

Protected paths allow you to make specific files or directories **read-only** within writable context paths, preventing agents from modifying or deleting critical reference files while allowing them to edit other files.

.. note::

   **Use Case**: You want agents to modify some files in a directory but keep certain reference files, configurations, or templates untouched.

Quick Start
-----------

**Protect a single file:**

.. code-block:: yaml

   orchestrator:
     context_paths:
       - path: "/path/to/project"
         permission: "write"
         protected_paths:
           - "config.json"  # Agents can read but not modify

**Example usage:**

.. code-block:: bash

   massgen \
     --config @examples/tools/filesystem/gemini_gpt5nano_protected_paths.yaml \
     "Review the HTML and CSS files, then improve the styling"

What Are Protected Paths?
--------------------------

Protected paths are files or directories within a **writable** context path that are explicitly marked as read-only. Agents can:

* ✅ **Read** protected files for reference
* ✅ **Write/Edit** non-protected files in the same directory
* ❌ **Modify or Delete** protected files

This gives you fine-grained control over what agents can change.

Why Use Protected Paths?
~~~~~~~~~~~~~~~~~~~~~~~~~

**Without protected paths:**

.. code-block:: text

   ❌ Context path: /project (write permission)
      → Agents can modify ALL files including critical configs

**With protected paths:**

.. code-block:: text

   ✅ Context path: /project (write permission)
      ├── config.json (protected - read only)
      ├── template.html (protected - read only)
      └── styles.css (writable)
      → Agents can only modify styles.css

Configuration
-------------

Basic Configuration
~~~~~~~~~~~~~~~~~~~

Protect specific files within a writable context path:

.. code-block:: yaml

   orchestrator:
     context_paths:
       - path: "/absolute/path/to/directory"
         permission: "write"
         protected_paths:
           - "important_file.txt"
           - "config.json"

**Result**:

* Agents can read and modify all files **except** ``important_file.txt`` and ``config.json``
* Protected files are readable but not writable

Multiple Protected Paths
~~~~~~~~~~~~~~~~~~~~~~~~~

Protect multiple files or directories:

.. code-block:: yaml

   orchestrator:
     context_paths:
       - path: "/project"
         permission: "write"
         protected_paths:
           - "README.md"              # File protection
           - "docs/"                  # Directory protection
           - ".github/workflows/"     # Protect CI/CD configs
           - "package.json"           # Protect dependencies

Relative Path Syntax
~~~~~~~~~~~~~~~~~~~~

Protected paths are **relative to the context path**:

.. code-block:: yaml

   orchestrator:
     context_paths:
       - path: "/Users/me/project"
         permission: "write"
         protected_paths:
           - "src/config.py"          # Protects /Users/me/project/src/config.py
           - "tests/fixtures/"        # Protects /Users/me/project/tests/fixtures/

Complete Example
~~~~~~~~~~~~~~~~

Realistic configuration for a web project:

.. code-block:: yaml

   agents:
     - id: "frontend_agent"
       backend:
         type: "claude_code"
         cwd: "workspace"

     - id: "reviewer_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"
     context_paths:
       - path: "/Users/me/website"
         permission: "write"
         protected_paths:
           - "index.html"           # Keep original structure
           - "assets/logo.png"      # Don't modify brand assets
           - ".git/"                # Never touch version control
           # styles.css is NOT protected - agents can modify it

   ui:
     display_type: "rich_terminal"

**Usage**:

.. code-block:: bash

   massgen \
     --config website_config.yaml \
     "Improve the CSS styling while keeping the HTML structure intact"

**Result**:

* ✅ Agents can read ``index.html`` for structure understanding
* ✅ Agents can freely modify ``styles.css``
* ❌ Agents cannot change ``index.html`` or ``assets/logo.png``

Use Cases
---------

Use Case 1: Protect Reference Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Scenario**: Let agents improve code while keeping test fixtures unchanged.

.. code-block:: yaml

   context_paths:
     - path: "/project"
       permission: "write"
       protected_paths:
         - "tests/fixtures/"
         - "tests/expected_outputs/"

**Task**: "Refactor the parser module to improve performance"

**Result**: Agents can modify parser code but test fixtures remain untouched for validation.

Use Case 2: Protect Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Scenario**: Allow code changes but prevent config file modifications.

.. code-block:: yaml

   context_paths:
     - path: "/app"
       permission: "write"
       protected_paths:
         - "config.yaml"
         - ".env.example"
         - "docker-compose.yml"

**Task**: "Add error handling to the API endpoints"

**Result**: Agents improve code without accidentally changing deployment configs.

Use Case 3: Protect Templates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Scenario**: Let agents generate content based on templates without modifying the templates.

.. code-block:: yaml

   context_paths:
     - path: "/website"
       permission: "write"
       protected_paths:
         - "templates/"
         - "layouts/"

**Task**: "Generate blog posts using the templates"

**Result**: Agents create new content files without touching template structure.

Use Case 4: Protect Documentation Structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Scenario**: Allow content updates but preserve documentation organization.

.. code-block:: yaml

   context_paths:
     - path: "/docs"
       permission: "write"
       protected_paths:
         - "index.md"              # Keep main page structure
         - "_sidebar.md"           # Preserve navigation
         - "_config.yml"           # Don't change doc settings

**Task**: "Update the API reference documentation"

**Result**: Agents update specific doc pages without reorganizing the documentation structure.

Use Case 5: Mixed Permissions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Scenario**: Multiple context paths with different protection levels.

.. code-block:: yaml

   context_paths:
     # Source code - most files writable, some protected
     - path: "/project/src"
       permission: "write"
       protected_paths:
         - "core/constants.py"
         - "version.py"

     # Docs - completely read-only (no protected_paths needed, just use "read")
     - path: "/project/docs"
       permission: "read"

     # Temp folder - fully writable (no protected_paths)
     - path: "/project/temp"
       permission: "write"

How It Works
------------

Permission Enforcement
~~~~~~~~~~~~~~~~~~~~~~

Protected paths are enforced by the ``PathPermissionManager``:

1. **Startup validation**: Checks that protected paths exist within their context path
2. **Runtime enforcement**: Blocks write/delete operations on protected paths
3. **Clear error messages**: Agents receive descriptive errors when blocked

.. code-block:: text

   Agent: Edit /project/config.json
   Error: Cannot modify /project/config.json - path is protected

Read Operations
~~~~~~~~~~~~~~~

Agents can always read protected files:

.. code-block:: text

   Agent: Read config.json        # ✅ Allowed
   Agent: Edit config.json         # ❌ Blocked
   Agent: Delete config.json       # ❌ Blocked

This allows agents to use protected files as reference material.

Directory Protection
~~~~~~~~~~~~~~~~~~~~

Protecting a directory protects all contents recursively:

.. code-block:: yaml

   protected_paths:
     - "tests/fixtures/"  # Protects all files inside

.. code-block:: text

   ✅ Read tests/fixtures/data.json
   ❌ Write tests/fixtures/data.json
   ❌ Delete tests/fixtures/
   ❌ Create tests/fixtures/new_file.txt

Interaction with File Operation Safety
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Protected paths work alongside read-before-delete enforcement:

1. **Protected files**: Cannot be deleted even if read first
2. **Non-protected files**: Follow standard read-before-delete rules
3. **Agent-created files**: Can be deleted (not affected by protection)
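
Put together, a delete request passes through three checks. The sketch below only illustrates one possible ordering of the rules above; ``may_delete`` and its boolean parameters are hypothetical and do not correspond to MassGen's internal API. It assumes protection is evaluated first, which is consistent with agents being unable to create files inside a protected directory.

.. code-block:: python

   def may_delete(is_protected: bool, created_by_agent: bool, was_read: bool) -> bool:
       """Decide whether a delete is allowed under the three rules above."""
       if is_protected:
           return False  # rule 1: protection wins, even after a read
       if created_by_agent:
           return True  # rule 3: agent-created files can be deleted
       return was_read  # rule 2: standard read-before-delete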

Interactive Mode
----------------

In interactive mode, you can add protected paths when prompted:

.. code-block:: text

   📂 Context Paths:
      No context paths configured

   ❓ Add current directory as context path?
      /Users/me/project
      [Y]es (default) / [P]rotected / [N]o / [C]ustom path: P

   Enter protected paths (relative to context path), one per line. Empty line to finish:
      → config.json
      → .env
      → tests/fixtures/
      →

   ✓ Added /Users/me/project (write)
     🔒 config.json
     🔒 .env
     🔒 tests/fixtures/

Advanced Patterns
-----------------

Pattern Matching (Future Enhancement)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::

   Currently, protected paths must be explicit file or directory names. Pattern matching (e.g., ``*.json``) is not yet supported but planned for future releases.

Current workaround - list files explicitly:

.. code-block:: yaml

   protected_paths:
     - "config.json"
     - "secrets.json"
     - "settings.json"
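
Until patterns are supported, you can generate the explicit list with a few lines of Python and paste the output into your config. This is a convenience sketch, not a MassGen feature; ``expand_pattern`` and ``to_yaml_list`` are hypothetical helpers.

.. code-block:: python

   from pathlib import Path
   from typing import List


   def expand_pattern(context_path: str, pattern: str) -> List[str]:
       """Return matching files as paths relative to the context path."""
       base = Path(context_path)
       matches = (p for p in base.glob(pattern) if p.is_file())
       return sorted(p.relative_to(base).as_posix() for p in matches)


   def to_yaml_list(entries: List[str]) -> str:
       """Render entries as a protected_paths YAML fragment."""
       lines = ["protected_paths:"] + [f'  - "{e}"' for e in entries]
       return "\n".join(lines)

For example, ``print(to_yaml_list(expand_pattern("/path/to/project", "*.json")))`` prints a fragment for the top-level JSON files; use ``**/*.json`` to include subdirectories. The list is a snapshot, so regenerate it when files are added.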

Nested Protection
~~~~~~~~~~~~~~~~~

You can have multiple levels of protection:

.. code-block:: yaml

   context_paths:
     # Parent directory mostly writable
     - path: "/project"
       permission: "write"
       protected_paths:
         - "src/core/"              # Protect entire core module

     # More specific protection for subdirectory
     - path: "/project/src"
       permission: "write"
       protected_paths:
         - "utils/constants.py"     # Additional specific protection

Troubleshooting
---------------

Protected Path Not Working
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem**: Agent is modifying a file you marked as protected.

**Check**:

1. **Verify relative path is correct**:

   .. code-block:: yaml

      context_paths:
        - path: "/Users/me/project"
          protected_paths:
            - "config.json"         # ✅ Relative to /Users/me/project
            # NOT: "/Users/me/project/config.json"  # ❌ Would be treated as relative

2. **Check the file exists**:

   Protected paths must exist when MassGen starts. Check logs for validation errors.

3. **Verify the context path permission**:

   .. code-block:: yaml

      permission: "write"  # Required - protection only applies to writable paths

Path Not Found Error
~~~~~~~~~~~~~~~~~~~~

**Problem**: "Protected path 'file.txt' not found in context path '/project'"

**Solution**: Ensure the protected path exists before starting MassGen:

.. code-block:: bash

   # Check if file exists
   ls /project/file.txt

   # If missing, either:
   # 1. Create the file first, or
   # 2. Remove it from protected_paths

Agent Still Modifying Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem**: Agent bypasses protection during coordination.

**Check**:

1. **Confirm the context path in use**: Protection applies in all phases, including final presentation, so make sure the agent is working through the context path that declares the protected paths

2. **Check file is within context path**: Protection only works for files within the specified context path

3. **Review logs**: Check ``massgen_debug.log`` for permission checks

Best Practices
--------------

1. **Be explicit about what to protect**: List all critical files rather than assuming default protection

2. **Test first**: Run with a test directory to verify protection works as expected

3. **Document in comments**: Add comments to your config explaining why files are protected

   .. code-block:: yaml

      protected_paths:
        - "schema.sql"        # Database schema - don't let agents modify structure
        - "LICENSE"           # Legal file - must not change

4. **Use read-only permission when appropriate**: If the entire directory should be read-only, use ``permission: "read"`` instead of protecting all paths

   .. code-block:: yaml

      # If you want everything read-only:
      - path: "/reference_docs"
        permission: "read"     # ← Simpler than listing all files as protected

      # If you want selective protection:
      - path: "/working_dir"
        permission: "write"
        protected_paths: [...]  # ← Use this for mixed permissions

5. **Combine with planning mode**: Use protected paths alongside planning mode for maximum safety

   .. code-block:: yaml

      orchestrator:
        context_paths:
          - path: "/project"
            permission: "write"
            protected_paths: ["config.json"]
        coordination:
          enable_planning_mode: true  # Prevents accidental modifications during coordination

Binary File Protection
----------------------

MassGen automatically prevents agents from using text-based read tools on binary files, directing them to use appropriate specialized tools instead.

What's Protected
~~~~~~~~~~~~~~~~

Text-based read tools (``read_file``, ``find_and_read_text``, ``grep``) are automatically blocked from accessing 40+ binary file types, including:

**Images**:
  ``.jpg``, ``.jpeg``, ``.png``, ``.gif``, ``.bmp``, ``.svg``, ``.webp``, ``.tiff``

**Videos**:
  ``.mp4``, ``.avi``, ``.mov``, ``.mkv``, ``.flv``, ``.wmv``, ``.webm``, ``.mpg``

**Audio**:
  ``.mp3``, ``.wav``, ``.ogg``, ``.flac``, ``.aac``, ``.m4a``, ``.wma``

**Archives**:
  ``.zip``, ``.tar``, ``.gz``, ``.7z``, ``.rar``

**Documents**:
  ``.pdf``, ``.docx``, ``.xlsx``, ``.pptx`` (use ``understand_file`` tool)

**Executables**:
  ``.exe``, ``.bin``, ``.dll``, ``.so``, ``.dylib``, ``.pyc``

How It Works
~~~~~~~~~~~~

When an agent attempts to read a binary file with a text tool, it receives a helpful error message:

.. code-block:: text

   Cannot read image file 'screenshot.png' with text-based tool 'read_file'.
   Please use 'understand_image' tool for image files.

.. code-block:: text

   Cannot read video file 'demo.mp4' with text-based tool 'grep'.
   Please use 'understand_video' tool for video files.

The error messages automatically suggest the correct tool for each file type:

* **Images** → ``understand_image``
* **Videos** → ``understand_video``
* **Audio** → ``understand_audio``
* **PDF/Office docs** → ``understand_file``
* **Archives** → Extract first, then read contents
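
The mapping from file type to suggested tool can be pictured as a lookup on the file extension. The sketch below is hypothetical: the table copies only a few of the extensions listed above, and ``check_text_read`` is not a MassGen function. Only the message format is taken from the examples shown.

.. code-block:: python

   from pathlib import Path
   from typing import Optional

   # (kind, suggested tool) per extension; a small subset of the list above.
   BINARY_TYPES = {
       ".png": ("image", "understand_image"),
       ".jpg": ("image", "understand_image"),
       ".mp4": ("video", "understand_video"),
       ".mp3": ("audio", "understand_audio"),
       ".pdf": ("document", "understand_file"),
   }


   def check_text_read(filename: str, tool: str) -> Optional[str]:
       """Return an error message if a text tool targets a binary file."""
       entry = BINARY_TYPES.get(Path(filename).suffix.lower())
       if entry is None:
           return None  # not a known binary type, so the read proceeds
       kind, suggestion = entry
       return (
           f"Cannot read {kind} file '{filename}' with text-based tool '{tool}'.\n"
           f"Please use '{suggestion}' tool for {kind} files."
       )

Matching on the lowercased suffix keeps the check cheap and means ``DEMO.MP4`` is handled the same way as ``demo.mp4``.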

Benefits
~~~~~~~~

1. **Prevents Confusion**: Agents can't accidentally try to read binary data as text
2. **Better Tool Usage**: Guides agents to use appropriate multimodal tools
3. **Clearer Errors**: Actionable error messages instead of garbled binary output
4. **No Configuration Needed**: Works automatically for all agents

Security Considerations
-----------------------

.. warning::

   Protected paths are a **convenience feature**, not a security boundary. They prevent accidental modifications but shouldn't be relied upon for security-critical files.

**For security-sensitive files:**

* Use file system permissions (chmod)
* Run MassGen with limited user accounts
* Store sensitive data outside agent-accessible directories
* Use read-only context paths instead of protected paths
* Review all agent operations before deploying

**Binary file protection** is also a convenience feature that guides agents to use correct tools, not a security boundary.

Related Features
----------------

* :doc:`file_operations` - File operation safety and read-before-delete enforcement
* :doc:`project_integration` - Context paths and permission system
* :doc:`../advanced/planning_mode` - Prevent modifications during coordination
* :doc:`../../reference/yaml_schema` - Complete YAML configuration reference

Next Steps
----------

* :doc:`project_integration` - Learn about context paths and permissions
* :doc:`file_operations` - Understand file operation safety features
* :doc:`../advanced/planning_mode` - Combine with planning mode for extra safety


---

## user_guide/filesystem_first.rst

==========================
Filesystem-First Mode
==========================

.. note::
   **Status**: Experimental (v0.2.0+)

   Filesystem-first mode lets agents discover tools from the filesystem instead of having every tool
   definition injected into context. This reduces the context spent on tool definitions by about **98%**
   (from ~150K to ~3K tokens) and allows attaching **100+ MCP servers** without context pollution.

Overview
========

Traditional tool injection loads all available tools into the agent's context window, which:

- Consumes significant tokens (~150K for comprehensive tool definitions)
- Limits the number of MCP servers you can attach (5-10 before context overflow)
- Requires manually predicting which tools agents will need
- Pollutes context with irrelevant tools

Filesystem-first mode addresses these problems by representing tools as **files in the filesystem** that agents discover using CLI primitives like ripgrep and ast-grep.

Key Benefits
============

Context Efficiency
------------------

- **98% reduction**: From ~150K tokens to ~3K tokens for tool definitions
- Only essential tools (filesystem + code execution) remain in context
- All other tools discovered on-demand from filesystem

Unlimited Scalability
---------------------

- **100+ MCP servers**: Attach unlimited servers with zero context cost
- **Progressive disclosure**: Load only tools needed for each task
- **Universal workspace**: One config works for any task

Code Composition
----------------

- **Write scripts**: Compose tools with native Python code
- **Control flow**: Use loops, conditionals, error handling
- **Reusable workflows**: Save successful patterns as skills

Unified Architecture
--------------------

- **MCP tools** → files in ``servers/``
- **Custom tools** → files in ``tools/``
- **Skills** → reusable workflows in ``skills/``
- **Memory** → persistent storage in ``memory/``
- All discoverable with same search primitives

Quick Start
===========

Prerequisites
-------------

Filesystem-first mode **requires** Docker-based code execution:

1. **Docker installed** (Engine 28.0.0+)

   .. code-block:: bash

      docker --version

2. **Build MassGen Docker image**

   .. code-block:: bash

      cd massgen/docker
      bash build.sh

3. **Verify search tools** (ripgrep + ast-grep)

   .. code-block:: bash

      docker run --rm massgen/mcp-runtime:latest rg --version
      docker run --rm massgen/mcp-runtime:latest ast-grep --version

Basic Configuration
-------------------

Create a config file with ``execution_mode: "filesystem_first"``:

.. code-block:: yaml

   # config.yaml
   massgen:
     execution_mode: "filesystem_first"

     # Minimal in-context tools (only essentials)
     in_context_tools:
       filesystem:
         - read_file
         - write_file
         - list_directory
       command_execution:
         - execute_command  # Code execution MCP server tool

     # Attach as many MCP servers as you want!
     available_mcp_servers:
       - google-drive
       - github
       - slack
       - postgres
       # ... unlimited!

   agents:
     - id: "agent"
       backend:
         type: "openai"
         model: "gpt-4o"
         enable_mcp_command_line: true  # Required
         command_line_execution_mode: "docker"  # Required

Run MassGen:

.. code-block:: bash

   massgen --config config.yaml "Your question here"

How It Works
============

Workspace Structure
-------------------

When filesystem-first mode is enabled, MassGen creates this structure:

.. code-block:: text

   /massgen_workspace/                    # Shared workspace
   ├── servers/                           # MCP tools (SHARED, read-only)
   │   ├── google-drive/
   │   │   ├── getDocument.py
   │   │   ├── searchFiles.py
   │   │   └── __init__.py
   │   ├── github/
   │   │   ├── createIssue.py
   │   │   └── __init__.py
   │   └── ...
   ├── tools/                             # Custom tools (SHARED, read-only)
   │   ├── web/
   │   │   ├── playwright_navigate.py
   │   │   └── screenshot.py
   │   └── multimodal/
   │       └── vision_understanding.py
   ├── skills/                            # Reusable workflows
   │   ├── community/                     # Shared skills
   │   │   └── webapp-testing/
   │   │       └── SKILL.md
   │   └── agent_a/                       # Per-agent skills
   │       └── my-workflow/
   │           └── SKILL.md
   └── agents/                            # Per-agent directories
       ├── agent_a/
       │   ├── workspace/  -> temp_workspaces/agent_a/
       │   ├── memory/                    # Persistent memory
       │   └── tasks/
       └── agent_b/
           └── ...

From the agent's perspective (via symlinks):

.. code-block:: text

   /workspace/                            # Agent's working directory
   ├── servers/ -> /massgen_workspace/servers/
   ├── tools/ -> /massgen_workspace/tools/
   ├── skills/ -> /massgen_workspace/skills/
   ├── memory/ -> /massgen_workspace/agents/agent_a/memory/
   └── ... (your project files)

Tool Discovery Workflow
-----------------------

Here's how an agent discovers and uses tools:

**Example Task**: "Analyze Q4 sales and send summary to #sales"

**1. Discover relevant tools using ripgrep:**

.. code-block:: python

   import subprocess

   # Find sales-related tools
   result = subprocess.run(
       ["rg", "sales|revenue|crm", "servers/", "-i", "-l"],
       capture_output=True, text=True
   )
   # Output:
   # servers/salesforce/query_records.py
   # servers/stripe/list_charges.py

   # Find messaging tools
   result = subprocess.run(
       ["rg", "slack|message", "servers/", "-i", "-l"],
       capture_output=True, text=True
   )
   # Output:
   # servers/slack/post_message.py

**2. Read tool definitions:**

.. code-block:: python

   with open("servers/salesforce/query_records.py") as f:
       print(f.read())  # See full tool documentation

**3. Write code using discovered tools:**

.. code-block:: python

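   import asyncio

   from servers.salesforce import query_records
   from servers.slack import post_message


   async def main():
       # Fetch Q4 sales data (October 1 through December 31)
       data = await query_records(
           query=(
               "SELECT Amount FROM Opportunity "
               "WHERE CloseDate >= 2024-10-01 AND CloseDate <= 2024-12-31"
           )
       )

       # Analyze
       total = sum(record["Amount"] for record in data)
       summary = f"Q4 Revenue: ${total:,.2f}"

       # Send to Slack
       await post_message(channel="#sales", text=summary)


   # ``await`` is only valid inside a coroutine, so run the workflow explicitly
   asyncio.run(main())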

**Result**: Task completed using only 2 tools (out of 200+ available) with ~3K token context!
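
If ``rg`` is not on the path (for example, when experimenting outside the Docker image), the keyword search in step 1 can be approximated in pure Python. The sketch below is not part of MassGen: ``find_tool_files`` is a hypothetical helper that imitates ``rg -i -l``, restricted to ``.py`` files.

.. code-block:: python

   import re
   from pathlib import Path


   def find_tool_files(pattern: str, root: str) -> list[str]:
       """Return sorted paths of .py files under root whose text matches pattern."""
       regex = re.compile(pattern, re.IGNORECASE)  # Same effect as rg's -i flag
       matches = []
       for path in Path(root).rglob("*.py"):
           try:
               text = path.read_text(encoding="utf-8", errors="ignore")
           except OSError:
               continue  # Skip unreadable files
           if regex.search(text):
               matches.append(str(path))  # Like rg -l: report the file, not the line
       return sorted(matches)

Calling ``find_tool_files("slack|message", "servers/")`` would then list the candidate tool files to read in step 2.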

Configuration Reference
=======================

Global Settings
---------------

Add a ``massgen:`` section to your config:

.. code-block:: yaml

   massgen:
     # Enable filesystem-first mode (default: "context_based")
     execution_mode: "filesystem_first"

     # Tools that remain in context (minimal set)
     in_context_tools:
       filesystem: [read_file, write_file, list_directory, create_directory]
       code_execution: [execute_python, execute_bash]

     # All other MCP servers (exposed as files, NOT in context)
     available_mcp_servers:
       - google-drive
       - github
       - slack
       # ... add as many as you want!

     # Custom tools (exposed as files, NOT in context)
     available_custom_tools:
       - web.playwright_navigate
       - multimodal.vision_understanding
       # ... all custom tools

     # Code execution configuration (REQUIRED)
     code_execution:
       enabled: true
       mode: "docker"  # Strongly recommended

     # Search tools (for tool discovery)
     search_tools:
       enable_ripgrep: true
       enable_ast_grep: true
       enable_semtools: false  # Optional (future)

In-Context Tools
----------------

These tools remain in the agent's context for immediate access:

**Essential Filesystem** (required for tool discovery):

- ``read_file`` - Read tool definitions and source code
- ``write_file`` - Save generated code
- ``list_directory`` - Discover available tools
- ``create_directory`` - Organize workspace
- ``move_file`` - Manage files
- ``get_file_info`` - Check file metadata

**Command Execution** (required for filesystem-first):

- ``execute_command`` - Run shell commands (ripgrep, ast-grep, Python, etc.)

**Total context cost**: ~3-4K tokens (vs. ~150K for all tools)
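
The reduction figure follows directly from these approximate token counts. As a quick check, using only the numbers quoted in this guide:

.. code-block:: python

   # Approximate token counts quoted in this guide
   all_tools_tokens = 150_000
   in_context_tokens_low = 3_000
   in_context_tokens_high = 4_000


   def reduction_percent(before: int, after: int) -> float:
       """Percentage of tokens saved when going from before to after."""
       return (before - after) / before * 100


   best_case = reduction_percent(all_tools_tokens, in_context_tokens_low)
   worst_case = reduction_percent(all_tools_tokens, in_context_tokens_high)
   print(f"{worst_case:.1f}% to {best_case:.1f}%")  # prints "97.3% to 98.0%"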

Available MCP Servers
---------------------

List all MCP servers you want to make available. Agents will discover them on-demand:

.. code-block:: yaml

   massgen:
     available_mcp_servers:
       # Productivity
       - google-drive
       - gmail
       - google-calendar
       - notion
       - slack

       # Development
       - github
       - gitlab
       - linear

       # Databases
       - postgres
       - mongodb

       # ... add unlimited servers!

**No context cost** - All servers exposed as files, zero tokens in context.

Agent Configuration
-------------------

Agents must have code execution enabled:

.. code-block:: yaml

   agents:
     - id: "universal_agent"
       backend:
         type: "openai"
         model: "gpt-4o"
         enable_mcp_command_line: true  # Required
         command_line_execution_mode: "docker"  # Required
         command_line_docker_image: "massgen/mcp-runtime:latest"

Tool Discovery
==============

Agents use standard CLI tools to discover relevant tools:

Ripgrep (Fast Text Search)
---------------------------

Find tools by keyword:

.. code-block:: bash

   # Find all tools related to "document"
   rg "document" servers/ tools/ -i -l

   # Find tools with specific capabilities
   rg "screenshot|image|visual" tools/ -i -l

AST-grep (Structural Search)
-----------------------------

Find tools by code structure:

.. code-block:: bash

   # Find all async functions
   ast-grep --pattern 'async def $FUNC($$$)' servers/

   # Find tools that return specific types
   ast-grep --pattern 'def $FUNC($$$) -> Dict' servers/
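
Where ``ast-grep`` is not installed, a similar structural query can be sketched with Python's standard ``ast`` module. This is an illustration only, not how MassGen performs discovery; ``find_async_functions`` is a hypothetical helper.

.. code-block:: python

   import ast
   from pathlib import Path


   def find_async_functions(root: str) -> dict[str, list[str]]:
       """Map each .py file under root to the names of its async functions."""
       found: dict[str, list[str]] = {}
       for path in sorted(Path(root).rglob("*.py")):
           try:
               tree = ast.parse(path.read_text(encoding="utf-8"))
           except (OSError, SyntaxError, ValueError):
               continue  # Skip files that cannot be read or parsed
           names = [
               node.name
               for node in ast.walk(tree)
               if isinstance(node, ast.AsyncFunctionDef)
           ]
           if names:
               found[str(path)] = names
       return found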

Directory Listing
-----------------

Browse available servers and tools:

.. code-block:: python

   import os

   # List all MCP servers
   servers = os.listdir("servers/")
   print(f"Available: {servers}")

   # List tools in a server
   tools = os.listdir("servers/google-drive/")

Skills System
=============

Skills are reusable workflows saved in SKILL.md format (compatible with the Anthropic Skills specification).

Creating Skills
---------------

Agents can save successful workflows:

.. code-block:: python

   from _massgen_runtime import save_skill

   save_skill(
       name="webapp-testing",
       description="Test web applications with Playwright",
       instructions="""
   # Web Application Testing Skill

   ## Instructions
   1. Navigate to URL using playwright_navigate
   2. Take screenshots
   3. Check console for errors

   ## Example
   ```python
   from tools.web import playwright_navigate
   result = await playwright_navigate(url="https://example.com", screenshot=True)
   ```
       """,
       community=True  # Share with all agents
   )

Skills are saved as directories with SKILL.md:

.. code-block:: text

   skills/community/webapp-testing/
   ├── SKILL.md                  # Instructions
   ├── references/               # Optional documentation
   ├── scripts/                  # Optional helper scripts
   └── assets/                   # Optional templates/config
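
To make the layout concrete, the sketch below writes a minimal skill directory by hand. It is not the implementation of ``save_skill``: ``write_skill_dir`` is a hypothetical helper, and the YAML frontmatter with ``name`` and ``description`` is an assumption based on the layout used in Anthropic's skills repository.

.. code-block:: python

   from pathlib import Path


   def write_skill_dir(base: str, name: str, description: str, instructions: str) -> Path:
       """Create <base>/<name>/SKILL.md and return the path to the file."""
       skill_dir = Path(base) / name
       skill_dir.mkdir(parents=True, exist_ok=True)
       content = (
           "---\n"                             # Assumed frontmatter block
           f"name: {name}\n"
           f"description: {description}\n"
           "---\n\n"
           f"{instructions.strip()}\n"         # Markdown body with the instructions
       )
       skill_file = skill_dir / "SKILL.md"
       skill_file.write_text(content, encoding="utf-8")
       return skill_file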

Using Skills
------------

Discover and use skills via filesystem:

.. code-block:: python

   # Discover skills
   import os
   skills = os.listdir("skills/community/")

   # Read a skill
   with open("skills/community/webapp-testing/SKILL.md") as f:
       instructions = f.read()

   # Or use helper
   from _massgen_runtime import read_skill
   skill = read_skill("webapp-testing")
   print(skill["instructions"])

Listing Skills
--------------

.. code-block:: python

   from _massgen_runtime import list_skills

   skills = list_skills()

   print("My skills:")
   for skill in skills["own"]:
       print(f"  - {skill['name']}: {skill['description']}")

   print("\nCommunity skills:")
   for skill in skills["community"]:
       print(f"  - {skill['name']}: {skill['description']}")

Memory System
=============

Each agent has a persistent ``memory/`` directory for storing and retrieving information.

Memory Directory Structure
---------------------------

.. code-block:: text

   memory/                          # Agent's memory (symlinked)
   ├── core_memories.json           # Long-term facts
   ├── task_history.json            # Previous tasks and outcomes
   ├── learned_patterns.md          # Patterns the agent has learned
   └── preferences.yaml             # User preferences

Using Memory
------------

Agents interact with memory using standard file operations:

**Writing Memory:**

.. code-block:: python

   import json

   # Read existing memories
   with open("memory/core_memories.json") as f:
       memories = json.load(f)

   # Add new memory
   memories["user_preferences"] = {"theme": "dark", "language": "python"}

   # Save
   with open("memory/core_memories.json", "w") as f:
       json.dump(memories, f, indent=2)
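
The snippet above assumes ``core_memories.json`` already exists and overwrites it in place. A slightly more defensive variant, sketched here with hypothetical helper names (``load_memories`` and ``save_memories`` are not MassGen functions), tolerates a missing file and replaces the file atomically so that an interrupted write does not leave truncated JSON behind:

.. code-block:: python

   import json
   import os
   import tempfile
   from pathlib import Path


   def load_memories(path: str) -> dict:
       """Return the stored memories, or an empty dict if the file does not exist yet."""
       try:
           return json.loads(Path(path).read_text(encoding="utf-8"))
       except FileNotFoundError:
           return {}


   def save_memories(path: str, memories: dict) -> None:
       """Write to a temporary file first, then atomically replace the target."""
       target = Path(path)
       target.parent.mkdir(parents=True, exist_ok=True)
       fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
       with os.fdopen(fd, "w", encoding="utf-8") as f:
           json.dump(memories, f, indent=2)
       os.replace(tmp_name, target)  # Atomic when both paths are on one filesystem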

**Searching Memory:**

.. code-block:: python

   import subprocess

   # Find all memories about a topic
   result = subprocess.run(
       ["rg", "database optimization", "memory/", "-i"],
       capture_output=True, text=True
   )
   print(result.stdout)

**Accessing Memory Path:**

.. code-block:: python

   from _massgen_runtime import get_memory_path

   memory_dir = get_memory_path()
   print(f"My memory is at: {memory_dir}")

Advanced Features
=================

Custom In-Context Tools
-----------------------

You can customize which tools stay in context:

.. code-block:: yaml

   massgen:
     execution_mode: "filesystem_first"

     # Option A: minimal set (maximum context savings)
     in_context_tools:
       filesystem: [read_file, write_file, list_directory]
       code_execution: [execute_python]

     # Option B: add domain-specific essentials (use instead of Option A;
     # a YAML mapping cannot define in_context_tools twice)
     # in_context_tools:
     #   filesystem: "*"  # All filesystem tools
     #   code_execution: "*"
     #   web: [playwright_navigate]  # If web-focused agent

Resource Limits
---------------

Configure Docker resource limits:

.. code-block:: yaml

   agents:
     - backend:
         enable_mcp_command_line: true
         command_line_execution_mode: "docker"
         command_line_docker_memory_limit: "2g"
         command_line_docker_cpu_limit: 4.0
         command_line_docker_network_mode: "bridge"

Network Access
--------------

Control network access for security:

.. code-block:: yaml

   agents:
     - backend:
         # Choose exactly one value:
         command_line_docker_network_mode: "none"      # No network
         # command_line_docker_network_mode: "bridge"  # Internet access
         # command_line_docker_network_mode: "host"    # Full host network

Examples
========

Example 1: Universal Workspace
-------------------------------

A single config that works for ANY task:

.. code-block:: yaml

   # universal_workspace.yaml
   massgen:
     execution_mode: "filesystem_first"

     in_context_tools:
       filesystem: [read_file, write_file, list_directory]
       code_execution: [execute_python, execute_bash]

     # Attach EVERY MCP you own
     available_mcp_servers:
       # Productivity (12 servers)
       - google-drive
       - gmail
       - notion
       - slack
       # ... more

       # Development (15 servers)
       - github
       - gitlab
       - linear
       # ... more

       # Databases (8 servers)
       - postgres
       - mongodb
       # ... more

       # Total: 50+ servers, zero context cost!

   agents:
     - id: "universal_agent"
       backend:
         type: "openai"
         model: "gpt-4o"
         enable_mcp_command_line: true
         command_line_execution_mode: "docker"

**Usage**: This single config adapts to any task. The agent discovers and uses only the 2-3 tools it needs.

Example 2: Web Development
---------------------------

.. code-block:: yaml

   massgen:
     execution_mode: "filesystem_first"

     available_mcp_servers:
       - github       # Version control
       - vercel       # Deployment
       - postgres     # Database

     available_custom_tools:
       - web.playwright_navigate
       - web.screenshot
       - multimodal.vision_understanding

   agents:
     - id: "web_dev_agent"
       backend:
         type: "openai"
         model: "gpt-4o"
         enable_mcp_command_line: true
         command_line_execution_mode: "docker"

Example 3: Data Analysis
-------------------------

.. code-block:: yaml

   massgen:
     execution_mode: "filesystem_first"

     available_mcp_servers:
       - postgres
       - mongodb
       - salesforce
       - stripe
       - slack        # For reporting

   agents:
     - id: "data_analyst"
       backend:
         type: "openai"
         model: "gpt-4o"
         enable_mcp_command_line: true
         command_line_execution_mode: "docker"

Runtime Functions
=================

Agents have access to these runtime functions:

MCP Tools
---------

.. code-block:: python

   from _massgen_runtime import call_mcp_tool

   result = await call_mcp_tool(
       server="google-drive",
       tool="getDocument",
       arguments={"documentId": "abc123"}
   )

Custom Tools
------------

.. code-block:: python

   from _massgen_runtime import call_custom_tool

   result = await call_custom_tool(
       tool_name="web.playwright_navigate",
       url="https://example.com",
       screenshot=True
   )

Skills
------

.. code-block:: python

   from _massgen_runtime import (
       save_skill,      # Save workflow as skill
       read_skill,      # Read skill instructions
       list_skills,     # List available skills
       delete_skill,    # Remove skill
       get_skill_resource,  # Access bundled resources
   )

Context
-------

.. code-block:: python

   from _massgen_runtime import (
       get_agent_id,     # Get current agent ID
       get_memory_path,  # Get agent's memory directory
   )

Comparison: Context-Based vs Filesystem-First
==============================================

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Aspect
     - Context-Based
     - Filesystem-First
   * - Max MCP servers
     - 5-10 (limited)
     - **100+** (unlimited)
   * - Context cost
     - ~150K tokens
     - **~3K tokens** (98% reduction)
   * - Tool discovery
     - Manual config
     - **Automatic** (ripgrep/ast-grep)
   * - Tool composition
     - Chain tool calls
     - **Write code** (loops, conditionals)
   * - Config management
     - Different per task
     - **One universal config**
   * - Skills
     - Not available
     - **Reusable workflows**
   * - Memory
     - Dedicated API
     - **Filesystem** (unified)
   * - Scalability
     - Limited by context
     - **Unlimited**

Troubleshooting
===============

"execution_mode: 'filesystem_first' requires code execution"
------------------------------------------------------------

**Cause**: Code execution not enabled in agent backend.

**Solution**: Enable Docker-based code execution:

.. code-block:: yaml

   agents:
     - backend:
         enable_mcp_command_line: true
         command_line_execution_mode: "docker"

"Search tool 'ripgrep' is NOT available"
----------------------------------------

**Cause**: Docker image doesn't include ripgrep.

**Solution**: Rebuild Docker image (includes ripgrep by default):

.. code-block:: bash

   cd massgen/docker
   bash build.sh

"Tools not found in filesystem"
--------------------------------

**Cause**: Workspace not initialized or MCP clients not connected.

**Solution**: Check logs for initialization messages. Ensure MCP servers are configured correctly.

"Cannot import from servers/"
------------------------------

**Cause**: Symlinks not created or Python path not set.

**Solution**:

1. Verify symlinks exist:

   .. code-block:: bash

      ls -la temp_workspaces/your_agent_id/

2. Ensure code execution runs from workspace directory
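
The symlink check can also be scripted. The following sketch assumes the four entries shown in the agent's view of the workspace earlier in this guide (``servers``, ``tools``, ``skills``, ``memory``); ``check_workspace_links`` is a hypothetical helper, not a MassGen function.

.. code-block:: python

   import os

   EXPECTED_LINKS = ("servers", "tools", "skills", "memory")


   def check_workspace_links(workspace: str) -> dict[str, str]:
       """Report whether each expected entry is ok, missing, or a broken link."""
       report = {}
       for name in EXPECTED_LINKS:
           path = os.path.join(workspace, name)
           if not os.path.lexists(path):
               report[name] = "missing"      # Nothing at this path at all
           elif not os.path.exists(path):
               report[name] = "broken link"  # Link exists but its target does not
           else:
               report[name] = "ok"
       return report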

Best Practices
==============

1. **Use filesystem-first for complex multi-tool tasks**

   - Ideal when you need many MCP integrations
   - Great for tasks where tool needs are unpredictable

2. **Create one universal workspace config**

   - Attach all MCP servers you own
   - Reuse for different tasks
   - Let agents discover what they need

3. **Save successful workflows as skills**

   - When a complex task succeeds, save it
   - Share useful skills to community
   - Build a library of proven workflows

4. **Use Docker mode for security**

   - Isolated execution environment
   - Resource limits prevent abuse
   - Network controls

5. **Leverage search tools effectively**

   - Use ripgrep for keyword search
   - Use ast-grep for structural patterns
   - Combine both for precise discovery

See Also
========

- :doc:`tools/code_execution` - Code execution setup
- :doc:`tools/mcp_integration` - MCP server configuration
- :doc:`tools/custom_tools` - Custom tool development
- **Design Doc**: ``docs/dev_notes/filesystem_tool_discovery_design.md``
- **Example Configs**: ``massgen/configs/examples/filesystem_first_*.yaml``

References
==========

- `Anthropic: Code Execution with MCP <https://www.anthropic.com/engineering/code-execution-with-mcp>`_
- `Apple: CodeAct <https://machinelearning.apple.com/research/codeact>`_
- `Anthropic Skills <https://github.com/anthropics/skills>`_


---

## user_guide/integration/automation.rst

=============================
LLM Agent Automation Guide
=============================

This guide shows how to automate MassGen coordination using LLM agents and programmatic workflows.

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
========

MassGen provides **automation mode** (introduced in v0.1.8) designed specifically for LLM agents and background execution:

- ✅ **Silent output** (~10 lines instead of 250-3,000+)
- ✅ **Real-time status tracking** via ``status.json`` (updated every 2 seconds)
- ✅ **Meaningful exit codes** (success, timeout, error, interrupted)
- ✅ **Structured result files** (machine-readable JSON and text)
- ✅ **Parallel execution** support (isolated log directories)

.. seealso::
   **Real-World Example:** See the :doc:`../../examples/case_studies/meta-self-analysis-automation-mode` case study demonstrating MassGen agents using automation mode to analyze MassGen itself and propose performance improvements.

Quick Start
===========

Basic Automation Mode
----------------------

.. code-block:: bash

   uv run massgen --automation --config your_config.yaml "Your question here"

**Output** (minimal, parseable):

.. code-block:: text

   LOG_DIR: /path/to/.massgen/massgen_logs/log_20251103_143022
   STATUS: /path/to/.massgen/massgen_logs/log_20251103_143022/status.json
   QUESTION: Your question here
   [Coordination in progress - monitor status.json for real-time updates]

   WINNER: agent_a
   ANSWER_FILE: /path/to/final/agent_a/answer.txt
   DURATION: 45.3s
   ANSWER_PREVIEW: The answer starts here...

   COMPLETED: 2 agents, 45.3s total
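
Because each line has a ``KEY: value`` shape, the output is straightforward to parse. The sketch below infers the format from the sample above rather than from a documented contract, and ``parse_automation_output`` is a hypothetical helper.

.. code-block:: python

   def parse_automation_output(stdout: str) -> dict[str, str]:
       """Collect the KEY: value lines from automation-mode output into a dict."""
       fields = {}
       for line in stdout.splitlines():
           key, sep, value = line.partition(": ")
           # Keys in the sample are upper-case words such as LOG_DIR or WINNER;
           # anything else (blank lines, the progress note) is ignored
           if sep and key.isupper() and key.replace("_", "").isalpha():
               fields[key] = value.strip()
       return fields

For the sample above, ``parse_automation_output(stdout)["ANSWER_FILE"]`` would give the path of the final answer.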

**Exit codes**:

- ``0`` = Success (coordination completed)
- ``1`` = Configuration error
- ``2`` = Execution error (agent failure, API error)
- ``3`` = Timeout
- ``4`` = Interrupted (Ctrl+C)
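
A calling script can give these codes readable names. In the sketch below the numeric values come from the list above, while the enum member names and the retry policy are illustrative assumptions:

.. code-block:: python

   from enum import IntEnum


   class MassGenExit(IntEnum):
       """Exit codes listed in this guide (member names are hypothetical)."""
       SUCCESS = 0
       CONFIG_ERROR = 1
       EXECUTION_ERROR = 2
       TIMEOUT = 3
       INTERRUPTED = 4


   def describe_exit(code: int) -> str:
       """Return a readable label, falling back to 'UNKNOWN' for unlisted codes."""
       try:
           return MassGenExit(code).name
       except ValueError:
           return "UNKNOWN"


   def should_retry(code: int) -> bool:
       """Example policy: retry only execution errors and timeouts."""
       return code in (MassGenExit.EXECUTION_ERROR, MassGenExit.TIMEOUT)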

Using BackgroundShellManager
=============================

MassGen provides ``BackgroundShellManager`` for robust background execution. **Always use this instead of calling** ``subprocess`` **directly.**

.. note::

   ``BackgroundShellManager`` is for running full CLI processes (for example, ``uv run massgen ...``) in the background.
   For non-blocking **tool calls inside an agent run**, use the tool lifecycle documented in :doc:`../tools/background_tools`.

Basic Usage
-----------

.. code-block:: python

   from massgen.filesystem_manager.background_shell import (
       start_shell,
       get_shell_output,
       get_shell_status,
       kill_shell,
   )

   # Start MassGen in background
   shell_id = start_shell(
       "uv run massgen --automation --config config.yaml 'Your question'"
   )

   # Monitor progress
   import time
   while True:
       status = get_shell_status(shell_id)
       if status["status"] != "running":
           break
       time.sleep(2)

   # Get results
   output = get_shell_output(shell_id)
   print(f"Exit code: {output['exit_code']}")
   print(f"Output:\n{output['stdout']}")
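
The polling loop above waits indefinitely. A bounded variant is sketched below; ``wait_for_exit`` is a hypothetical helper that takes any zero-argument callable returning a status dict, so it does not depend on MassGen itself, and the default timeout is an arbitrary choice.

.. code-block:: python

   import time
   from typing import Callable


   def wait_for_exit(
       get_status: Callable[[], dict],
       timeout_s: float = 600.0,
       poll_s: float = 2.0,
   ) -> dict:
       """Poll get_status() until it stops reporting 'running' or the deadline passes."""
       deadline = time.monotonic() + timeout_s
       while True:
           status = get_status()
           if status["status"] != "running":
               return status
           if time.monotonic() >= deadline:
               raise TimeoutError(f"still running after {timeout_s}s")
           time.sleep(poll_s)

With the imports from the example above, a call might look like ``wait_for_exit(lambda: get_shell_status(shell_id))``, followed by ``kill_shell(shell_id)`` if ``TimeoutError`` is raised.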

Parallel Execution
==================

Parallel Execution Safety
--------------------------

**✅ Parallel execution is AUTOMATIC and SAFE in ALL modes!**

MassGen automatically isolates all resources when running multiple instances:

1. **Generates unique instance IDs** - Appends a random 8-character ID to prevent conflicts

   Example: ``workspace1`` → ``workspace1_a1b2c3d4``

2. **Isolates all resources automatically**:

   ✅ **Log directories** - Microsecond-precision timestamps

   ✅ **Workspaces** - Auto-generated unique suffixes

   ✅ **Snapshot storage** - Per-agent subdirectories

   ✅ **Docker containers** - Auto-generated unique container names (includes instance ID suffix)

**No manual configuration needed!** Just use the same config multiple times:

.. code-block:: bash

   # ✅ SAFE - Run the same config 5 times in parallel (with or without --automation)
   for i in {1..5}; do
       uv run massgen --config my_config.yaml "Task $i" &
   done
   wait

**Each instance automatically gets unique workspace paths and Docker containers:**

.. code-block:: text

   Instance 1: workspace1_a1b2c3d4, massgen-agent_a-a1b2c3d4
   Instance 2: workspace1_e5f6a7b8, massgen-agent_a-e5f6a7b8
   Instance 3: workspace1_c9d0e1f2, massgen-agent_a-c9d0e1f2

**Note:** This works in both automation mode (``--automation``) and normal mode. The difference is that automation mode provides silent output and status.json tracking, while normal mode shows the full UI.
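
The naming scheme can be illustrated in a few lines. How MassGen actually generates the ID is not described here, so ``secrets.token_hex`` and the helper ``with_instance_suffix`` are assumptions chosen only to reproduce the shape of the names shown above:

.. code-block:: python

   import secrets


   def with_instance_suffix(name: str, instance_id: str, sep: str = "_") -> str:
       """Append the instance ID to a base name, e.g. workspace1 -> workspace1_a1b2c3d4."""
       return f"{name}{sep}{instance_id}"


   instance_id = secrets.token_hex(4)  # 4 random bytes -> 8 hex characters
   workspace = with_instance_suffix("workspace1", instance_id)
   container = with_instance_suffix("massgen-agent_a", instance_id, sep="-")
   print(workspace, container)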

Running Multiple Experiments Simultaneously
-------------------------------------------

**Programmatic Parallel Execution:**

Use the BackgroundShellManager for robust programmatic parallel execution:

.. code-block:: python

   from massgen.filesystem_manager.background_shell import (
       start_shell,
       get_shell_output,  # Needed when collecting results below
       get_shell_status,
   )
   import time

   def run_experiments_in_parallel(configs_and_questions):
       """
       Run multiple MassGen experiments in parallel.

       Args:
           configs_and_questions: List of (config_path, question) tuples

       Returns:
           list: Results from all experiments
       """
       experiments = []

       # Start all experiments
       for config, question in configs_and_questions:
           shell_id = start_shell(
               f'uv run massgen --automation --config {config} "{question}"'
           )
           experiments.append({
               "shell_id": shell_id,
               "config": config,
               "question": question,
           })
           print(f"Started experiment {shell_id}: {question[:50]}...")

       # Wait for all to complete
       while True:
           all_done = True
           for exp in experiments:
               status = get_shell_status(exp["shell_id"])
               if status["status"] == "running":
                   all_done = False

           if all_done:
               break

           time.sleep(2)

       # Collect results
       results = []
       for exp in experiments:
           status = get_shell_status(exp["shell_id"])
           output = get_shell_output(exp["shell_id"])
           results.append({
               "config": exp["config"],
               "question": exp["question"],
               "exit_code": output["exit_code"],
               "duration": status["duration_seconds"],
               "status": status["status"],
           })

       return results


   # Example: Run the SAME config with different questions (parallel isolation is automatic!)
   experiments = [
       ("my_config.yaml", "Create a webpage about Bob Dylan"),
       ("my_config.yaml", "Write a Python script to analyze data"),
       ("my_config.yaml", "Design a REST API for a blog"),
   ]

   results = run_experiments_in_parallel(experiments)

   for result in results:
       print(f"{result['question']}: {result['status']} in {result['duration']}s")

Status File Overview
====================

The ``status.json`` file is updated every 2 seconds during coordination.

.. note::
   **For complete status.json reference with all fields documented:** See :doc:`../../reference/status_file`

File Location
-------------

.. code-block:: text

   .massgen/massgen_logs/log_YYYYMMDD_HHMMSS_ffffff/status.json

Quick Reference
---------------

.. code-block:: json

   {
     "meta": {
       "last_updated": 1730678901.234,
       "session_id": "log_20251103_143022_123456",
       "log_dir": ".massgen/massgen_logs/log_20251103_143022_123456",
       "question": "Your question here",
       "start_time": 1730678800.000,
       "elapsed_seconds": 101.234
     },
     "coordination": {
       "phase": "enforcement",
       "active_agent": "agent_b",
       "completion_percentage": 65,
       "is_final_presentation": false
     },
     "agents": {
       "agent_a": {
         "status": "voted",
         "answer_count": 1,
         "latest_answer_label": "agent1.1",
         "vote_cast": {
           "voted_for_agent": "agent_a",
           "voted_for_label": "agent1.1",
           "reason_preview": "Strong JSON structure..."
         },
         "times_restarted": 1,
         "last_activity": 1730678850.123,
         "error": null
       },
       "agent_b": {
         "status": "streaming",
         "answer_count": 0,
         "latest_answer_label": null,
         "vote_cast": null,
         "times_restarted": 1,
         "last_activity": 1730678900.456,
         "error": {
           "type": "timeout",
           "message": "Agent timeout after 180s",
           "timestamp": 1730678900.0
         }
       }
     },
     "results": {
       "votes": {
         "agent1.1": 1,
         "agent1.2": 0
       },
       "winner": null,
       "final_answer_preview": null
     }
   }

Agent Status Values
-------------------

- **streaming**: Agent is actively generating content
- **answered**: Agent has provided an answer this round
- **voted**: Agent has cast their vote
- **restarting**: Agent is restarting because a new answer arrived
- **error**: Agent encountered an error
- **timeout**: Agent timed out
- **completed**: Agent finished all work

Coordination Phases
-------------------

- **initial_answer**: Agents providing initial answers
- **enforcement**: Voting phase
- **presentation**: Final answer presentation
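
A monitor rarely needs the whole file. The sketch below reduces an already-parsed ``status.json`` to a few fields, using only keys that appear in the quick reference above; ``summarize_status`` is a hypothetical helper.

.. code-block:: python

   def summarize_status(status: dict) -> dict:
       """Reduce a parsed status.json dict to the fields a monitor usually needs."""
       agents = status.get("agents", {})
       return {
           "phase": status["coordination"]["phase"],
           "percent": status["coordination"]["completion_percentage"],
           # Agents whose status is "voted"
           "voted": sorted(a for a, s in agents.items() if s["status"] == "voted"),
           # Agents with a non-null error object, mapped to the error message
           "errors": {
               a: s["error"]["message"] for a, s in agents.items() if s.get("error")
           },
           "winner": status["results"]["winner"],
       }

Typical use is ``summarize_status(json.load(f))`` on each polling cycle, stopping once ``winner`` is no longer ``None``.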

Reading Results
===============

Log Directory Structure
-----------------------

After coordination completes, find results in the log directory:

.. code-block:: text

   .massgen/massgen_logs/log_YYYYMMDD_HHMMSS_ffffff/
   ├── execution_metadata.yaml       # Session metadata
   ├── coordination_events.json      # Complete event log
   ├── status.json                   # Final status snapshot
   ├── snapshot_mappings.json        # Answer/vote file mappings
   ├── final/
   │   └── {winner_agent}/
   │       ├── answer.txt            # ⭐ Final answer here
   │       ├── context.txt           # Agent's context
   │       └── workspace/            # Agent's workspace snapshot
   ├── agent_outputs/
   │   ├── agent_a.txt              # Full agent log
   │   └── agent_b.txt
   └── massgen.log                   # Detailed debug log
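
When the ``LOG_DIR`` line from the run is not at hand, the most recent log directory can be located by name. This sketch assumes the default ``.massgen/massgen_logs`` location shown above; ``latest_log_dir`` is a hypothetical helper.

.. code-block:: python

   from pathlib import Path
   from typing import Optional


   def latest_log_dir(logs_root: str = ".massgen/massgen_logs") -> Optional[Path]:
       """Return the most recent log_* directory, or None if there is none."""
       candidates = [p for p in Path(logs_root).glob("log_*") if p.is_dir()]
       # Names embed a zero-padded timestamp, so lexicographic order is chronological
       return max(candidates, key=lambda p: p.name, default=None)

With parallel runs, prefer the ``LOG_DIR`` value printed by each instance, since "latest" is ambiguous when several instances start close together.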

Programmatic Access
-------------------

.. code-block:: python

   import json
   from pathlib import Path

   import yaml  # Provided by the PyYAML package


   def read_massgen_results(log_dir: Path):
       """Read MassGen coordination results."""
       # Read final status (the context manager closes the file promptly)
       with open(log_dir / "status.json") as f:
           status = json.load(f)

       # Get winner (null in status.json until a winner has been selected)
       winner = status["results"]["winner"]

       # Read final answer, if a winner exists and the file was written
       answer = None
       if winner is not None:
           answer_file = log_dir / "final" / winner / "answer.txt"
           if answer_file.exists():
               answer = answer_file.read_text()

       # Read execution metadata
       with open(log_dir / "execution_metadata.yaml") as f:
           metadata = yaml.safe_load(f)

       return {
           "winner": winner,
           "answer": answer,
           "duration": status["meta"]["elapsed_seconds"],
           "votes": status["results"]["votes"],
           "config": metadata["config"],
           "question": metadata["question"],
       }

Meta-Coordination: MassGen Running MassGen
===========================================

MassGen can autonomously run and monitor itself, enabling self-improvement and automated experimentation.

.. tip::
   **Case Study:** The v0.1.8 release includes a complete :doc:`../../examples/case_studies/meta-self-analysis-automation-mode` demonstrating meta-coordination in action. Agents successfully ran nested MassGen experiments, analyzed execution logs, and proposed 6 prioritized performance improvements with starter code.

Available Meta Configs
-----------------------

**1. massgen_runs_massgen.yaml** - Run MassGen experiments

.. code-block:: bash

   uv run massgen --config @examples/configs/meta/massgen_runs_massgen.yaml \
       "Run a MassGen experiment to create a webpage about Bob Dylan"

**2. massgen_suggests_to_improve_massgen.yaml** - Run experiments AND suggest improvements

.. code-block:: bash

   uv run massgen --config @examples/configs/meta/massgen_suggests_to_improve_massgen.yaml \
       "Run an experiment with MassGen then read the logs and suggest any improvements to help MassGen perform better along any dimension (quality, speed, cost, creativity, etc.)."

This configuration was used in the v0.1.8 case study where agents analyzed MassGen's architecture, ran controlled experiments, and identified optimization opportunities.

Example Configuration
---------------------

**Config**: ``@examples/configs/meta/massgen_runs_massgen.yaml``

.. code-block:: yaml

   agents:
     - id: "meta_agent"
       backend:
         type: "openai"
         model: "gpt-5-mini"
         cwd: "workspace_meta"
         enable_mcp_command_line: true
         command_line_execution_mode: "local"
       system_message: |
         You have access to MassGen through the command line and can:
         - Run MassGen in automation mode using: uv run massgen --automation --config [config] "[question]"
         - Monitor progress by reading status.json files
         - Read final results from log directories
         - Parse coordination outcomes
         - Always run MassGen in a background process to avoid blocking
   orchestrator:
     snapshot_storage: "snapshots_meta"
     agent_temporary_workspace: "temp_workspaces_meta"

Running the Example
-------------------

.. code-block:: bash

   uv run massgen --config massgen/configs/meta/massgen_runs_massgen.yaml \
       "Run a MassGen experiment to create a webpage about Bob Dylan"

**What happens:**

1. The meta_agent receives your request
2. It executes: ``uv run massgen --automation --config massgen/configs/tools/todo/example_task_todo.yaml "Create a simple HTML page about Bob Dylan"``
3. It monitors the nested MassGen's ``status.json`` file
4. It reads the final results
5. It reports which agent won (agent_a or agent_b) and shows the final HTML page

**Output demonstrates:**

- ✅ MassGen can autonomously run experiments
- ✅ Can monitor progress via status.json
- ✅ Can parse and report coordination outcomes
- ✅ Can read final results from log directories

Current Limitations
-------------------

.. note::
   **Local Execution Only**: The meta-config currently uses ``command_line_execution_mode: "local"``.
   Docker execution for nested MassGen requires:

   - API credential passing to nested instances
   - Automatic dependency installation (e.g., reinstalling MassGen in container)
   - See Issue #436 for planned Docker support

.. warning::
   **Cost Control**: Meta-coordination can result in significant API costs as agents run experiments
   autonomously. Always set strict timeout limits. See Issue #432 for planned cost tracking features.


Error Handling Best Practices
==============================

1. **Always use timeouts**

   .. code-block:: python

      result = run_massgen_automation(config, question, timeout_seconds=300)

2. **Check exit codes**

   .. code-block:: python

      if result["exit_code"] == 0:
          pass  # Success
      elif result["exit_code"] == 3:
          pass  # Timeout - may need a longer timeout or a simpler query
      elif result["exit_code"] == 2:
          pass  # Execution error - check logs

3. **Monitor agent errors in status.json**

   .. code-block:: python

      if status["agents"]["agent_a"]["error"]:
          pass  # Handle agent-specific error

4. **Always clean up on failure**

   .. code-block:: python

      try:
          result = run_massgen_automation(config, question)
      finally:
          # Ensure shell is killed if still running
          if shell_id:
              kill_shell(shell_id)

5. **Validate results exist before reading**

   .. code-block:: python

      if answer_file.exists():
          answer = answer_file.read_text()
      else:
          answer = None  # Handle missing results
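
The exit codes mentioned in this guide (0 for success, 1 for configuration errors, 2 for execution errors, 3 for timeouts) can be gathered into a single lookup. The following is a minimal sketch; ``MassGenExit`` and ``describe_exit_code`` are hypothetical names used for illustration and are not part of MassGen's API.

.. code-block:: python

   from enum import IntEnum


   class MassGenExit(IntEnum):
       """Exit codes as described in this guide (hypothetical enum name)."""

       SUCCESS = 0
       CONFIG_ERROR = 1
       EXECUTION_ERROR = 2
       TIMEOUT = 3


   # Suggested follow-up action for each documented exit code.
   _ADVICE = {
       MassGenExit.SUCCESS: "read the final answer from the log directory",
       MassGenExit.CONFIG_ERROR: "validate the config and check API keys",
       MassGenExit.EXECUTION_ERROR: "inspect massgen.log for the failure",
       MassGenExit.TIMEOUT: "raise the timeout or simplify the query",
   }


   def describe_exit_code(code: int) -> str:
       """Return a short, human-readable hint for an exit code."""
       try:
           exit_code = MassGenExit(code)
       except ValueError:
           # Codes not covered by this guide fall through to a generic hint.
           return f"unknown exit code {code}: check massgen.log"
       return f"{exit_code.name}: {_ADVICE[exit_code]}"

An automation script could log ``describe_exit_code(result["exit_code"])`` before deciding whether to retry.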

Session Viewer
==============

While ``--automation`` mode runs headless, you can observe any session (live or completed) in the full Textual TUI using ``massgen viewer``.

.. code-block:: bash

   # In terminal 1: Run headless
   uv run massgen --automation --config config.yaml "Your question"
   # Outputs: LOG_DIR: .massgen/massgen_logs/log_20260309_120000_123456/turn_1/attempt_1

   # In terminal 2: View live in TUI
   uv run massgen viewer .massgen/massgen_logs/log_20260309_120000_123456/turn_1/attempt_1

The viewer shows the exact same TUI as a normal interactive run — agent panels, tool calls, votes, and final presentation — but in read-only mode.

Quick Reference
---------------

.. code-block:: bash

   # View the most recent session (auto-detected)
   massgen viewer

   # View a specific log directory
   massgen viewer /path/to/log_dir

   # Interactive session picker
   massgen viewer --pick

   # Replay a completed session at real-time speed
   massgen viewer /path/to/log_dir --replay-speed 1

   # View in browser (requires textual-serve)
   massgen viewer /path/to/log_dir --web

**Live vs Replay:**

- If the session is still running (``is_complete: false`` in ``status.json``), the viewer tails ``events.jsonl`` in real time
- If the session is completed, all events are replayed instantly (or at ``--replay-speed`` if specified)

.. tip::
   This is especially useful for cloud runs, CI/CD pipelines, and embedded processes where you need visual monitoring without a terminal attached to the running process.

Performance Tips
================

1. **Use automation mode** - Reduces output overhead significantly
2. **Poll status.json every 2-5 seconds** - Balances responsiveness and overhead
3. **Limit concurrent experiments** - BackgroundShellManager limits to 10 by default
4. **Clean up old logs** - Remove ``.massgen/massgen_logs/log_*`` directories periodically
5. **Use appropriate timeouts** - Simple tasks: 60s, Complex tasks: 300-600s
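
The polling advice above can be sketched as a small helper. This is an illustrative sketch, not MassGen code: ``wait_for_completion`` is a hypothetical name, and it assumes ``is_complete`` is a top-level key in ``status.json``.

.. code-block:: python

   import json
   import time
   from pathlib import Path


   def wait_for_completion(log_dir, poll_interval=2.0, timeout=300.0):
       """Poll status.json until the run reports completion or time runs out."""
       status_path = Path(log_dir) / "status.json"
       deadline = time.monotonic() + timeout
       while True:
           try:
               status = json.loads(status_path.read_text())
           except (FileNotFoundError, json.JSONDecodeError):
               # The file may not exist yet, or may be caught mid-write.
               status = {}
           # Assumption: ``is_complete`` is a top-level key in status.json.
           if status.get("is_complete"):
               return status
           if time.monotonic() >= deadline:
               raise TimeoutError(f"no completion within {timeout} seconds")
           time.sleep(poll_interval)

A ``poll_interval`` between 2 and 5 seconds matches the recommendation in the list above; the ``timeout`` argument keeps a stuck run from blocking the caller indefinitely.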

Troubleshooting
===============

Issue: Can't find log directory
--------------------------------

**Symptom**: LOG_DIR not printed in output

**Solutions**:

- Ensure ``--automation`` flag is used
- Check stderr for startup errors
- Verify config file exists and is valid
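
When a script captures the process output, the log directory can be extracted with a small parser. The sketch below assumes the directory is reported on its own line in the form ``LOG_DIR: <path>``, as in the Session Viewer example; ``find_log_dir`` is a hypothetical helper name.

.. code-block:: python

   import re
   from typing import Optional

   # Matches a line such as "LOG_DIR: .massgen/massgen_logs/log_.../turn_1/attempt_1".
   _LOG_DIR_PATTERN = re.compile(r"^LOG_DIR:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


   def find_log_dir(output: str) -> Optional[str]:
       """Return the path from the first LOG_DIR line, or None if absent."""
       match = _LOG_DIR_PATTERN.search(output)
       # strip() also removes a stray carriage return from Windows line endings.
       return match.group(1).strip() if match else None

A ``None`` result means the marker never appeared, which usually points to one of the causes listed above.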

Issue: status.json not updating
--------------------------------

**Symptom**: status.json file not changing

**Solutions**:

- Ensure logging is enabled (``--automation`` enables it by default)
- Check if coordination is actually running
- Verify file permissions on log directory

Issue: Process hangs
--------------------

**Symptom**: Process runs indefinitely

**Solutions**:

- Set timeout in your automation script
- Monitor status.json for stuck agents
- Use ``kill_shell()`` to terminate gracefully
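
For a standalone automation script that does not use the background shell tools, a hard timeout can be enforced with the standard library. This is a minimal sketch; ``run_with_timeout`` is a hypothetical helper, and the command list is supplied by the caller.

.. code-block:: python

   import subprocess
   import sys


   def run_with_timeout(cmd, timeout_seconds):
       """Run a command, killing it if it exceeds the timeout."""
       try:
           completed = subprocess.run(
               cmd,
               capture_output=True,
               text=True,
               timeout=timeout_seconds,
           )
       except subprocess.TimeoutExpired:
           # subprocess.run() kills the direct child before re-raising.
           return {"timed_out": True, "exit_code": None, "stdout": ""}
       return {
           "timed_out": False,
           "exit_code": completed.returncode,
           "stdout": completed.stdout,
       }


   # Stand-in for a long-running command, used here only for demonstration.
   demo = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)

A real call might pass ``["uv", "run", "massgen", "--automation", "--config", "config.yaml", "Your question"]``. Only the direct child process is killed on timeout, so processes it spawned may need separate cleanup.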

Issue: Exit code always 1
-------------------------

**Symptom**: Getting config errors

**Solutions**:

- Validate config with ``uv run massgen --validate --config your_config.yaml``
- Check that all required API keys are set
- Verify model names are correct

Limitations
===========

Current Constraints
-------------------

**1. Local Code Execution Only (for MassGen-running-MassGen)**

When using MassGen to run MassGen (meta-coordination), currently only local code execution is supported:

.. code-block:: yaml

   # ✅ Supported
   agents:
     - backend:
         enable_mcp_command_line: true
         command_line_execution_mode: "local"

   # ❌ Not yet supported for meta-coordination
   agents:
     - backend:
         command_line_execution_mode: "docker"
         # Issue: Requires credential passing to nested instances

**Why:** Docker execution requires API credentials, which must be passed securely to nested MassGen instances. Docker support for nested runs is tracked in Issue #436.

**2. Cost Control**

.. warning::
   **IMPORTANT:** When using automation mode for autonomous experiments, agents can potentially execute many API calls without human oversight. This can result in unexpected costs.

**Best Practices:**

Configs used for MassGen-running-MassGen should include cost-control measures:

- Set explicit timeout limits in configs to prevent indefinite hangs:

  .. code-block:: yaml

     timeout_settings:
       orchestrator_timeout_seconds: 1800  # 30 minutes max (recommended for meta-coordination)
       agent_timeout_seconds: 600          # 10 minutes per agent

  **Note**: Meta-coordination runs typically take 10-30 minutes; regular tasks typically take 2-10 minutes.

- Limit answers per agent for better progress tracking:

  .. code-block:: yaml

     orchestrator:
       max_new_answers_per_agent: 2  # Helps track progress more accurately

  Setting this helps estimate completion percentage more reliably. Without it, agents can provide unlimited answers, making progress tracking less predictable.

- Monitor costs via your API provider dashboards
- Use less expensive models for automated experimentation:

  .. code-block:: yaml

     agents:
       - backend:
           model: "gpt-4o-mini"  # More economical than gpt-4o

- Set API rate limits at the provider level
- Start with small experiments before scaling

**Future Enhancement:** Built-in cost tracking and limits are planned (see Issue #432).

Next Steps
==========

- **Read** :doc:`../../reference/cli` for all CLI options
- **See** :doc:`../../reference/status_file` for complete status.json documentation
- **See** :doc:`../../reference/yaml_schema` for configuration details
- **Check** :doc:`../../examples/basic_examples` for working examples
- **Review** ``massgen/filesystem_manager/background_shell.py`` source code


---

## user_guide/integration/general_interoperability.rst

General Framework Interoperability
===================================

**NEW in v0.1.6**

MassGen provides comprehensive interoperability with external agent frameworks through its custom tool system. This enables you to leverage specialized multi-agent frameworks as powerful tools within MassGen's coordination ecosystem.

Quick Start
-----------

Try Framework Integrations
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**AG2 (AutoGen) - Nested Chat Patterns:**

.. code-block:: bash

   massgen \
     --config @examples/tools/custom_tools/interop/ag2_lesson_planner_example.yaml \
     "Create a lesson plan for teaching fractions to fourth graders"

**LangGraph - State Graph Workflows:**

.. code-block:: bash

   massgen \
     --config @examples/tools/custom_tools/interop/langgraph_lesson_planner_example.yaml \
     "Design a lesson plan for the water cycle"

**AgentScope - Sequential Pipelines:**

.. code-block:: bash

   massgen \
     --config @examples/tools/custom_tools/interop/agentscope_lesson_planner_example.yaml \
     "Create a lesson plan for photosynthesis"

**OpenAI Chat Completions - Multi-Agent API:**

.. code-block:: bash

   massgen \
     --config @examples/tools/custom_tools/interop/openai_assistant_lesson_planner_example.yaml \
     "Develop a lesson plan for ecosystems"

**SmolAgent - Tool-Using Agents:**

.. code-block:: bash

   massgen \
     --config @examples/tools/custom_tools/interop/smolagent_lesson_planner_example.yaml \
     "Build a lesson plan for fractions"

**Compare Multiple Frameworks:**

.. code-block:: bash

   massgen \
     --config @examples/tools/custom_tools/interop/ag2_and_langgraph_lesson_planner.yaml \
     "Create a lesson plan comparing different approaches"

These examples demonstrate how each framework can be used as a tool within MassGen agents, leveraging their unique orchestration patterns while participating in MassGen's multi-agent coordination.

Installation
------------

Install the required framework dependencies:

**For AG2:**

.. code-block:: bash

   uv pip install -e ".[external]"

**For LangGraph:**

.. code-block:: bash

   pip install langgraph langchain-openai

**For AgentScope:**

.. code-block:: bash

   pip install agentscope

**For SmolAgent:**

.. code-block:: bash

   pip install smolagents

**OpenAI Chat Completions:**

No additional installation is needed; this integration uses the standard OpenAI SDK already included with MassGen.

**For all frameworks:**

.. code-block:: bash

   pip install agentscope langgraph langchain-openai smolagents
   uv pip install -e ".[external]"

What is Framework Interoperability?
------------------------------------

Framework interoperability means using specialized agent frameworks as tools within MassGen. Each framework becomes a powerful capability that MassGen agents can invoke.

**Supported Frameworks:**

* **AG2 (AutoGen)** - Nested chats and group collaboration
* **LangGraph** - State graph-based workflows
* **AgentScope** - Sequential agent pipelines
* **OpenAI Chat Completions** - Multi-agent API patterns
* **SmolAgent** - Tool-using agent framework from HuggingFace

**Key Benefits:**

* **Leverage Framework Strengths**: Use the best framework for each task
* **Preserve Framework Patterns**: Maintain nested chats (AG2) or state graphs (LangGraph)
* **Hybrid Coordination**: Combine framework-specific patterns with MassGen's multi-agent coordination
* **Gradual Adoption**: Integrate existing framework implementations without rewriting

**How It Works:**

External frameworks are wrapped as custom tools that MassGen agents can call. This allows you to:

* Wrap entire multi-agent frameworks as single tools
* Maintain framework-specific orchestration patterns
* Combine multiple frameworks in hybrid agent teams
* Preserve each framework's unique capabilities

Supported Frameworks
--------------------

AG2 Integration
~~~~~~~~~~~~~~~

`AG2 <https://github.com/ag2ai/ag2>`_ (formerly AutoGen) is a multi-agent framework that provides powerful orchestration patterns like nested chats and group chats.

**Key Features:**

* Nested chat patterns for complex workflows
* Group chat collaboration between multiple agents
* Code execution capabilities
* Rich agent conversation management
* **Streaming support** for real-time output

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "ag2_assistant"
       backend:
         type: "openai"
         model: "gpt-4o"
         custom_tools:
           - name: ["ag2_lesson_planner"]
             category: "education"
             path: "massgen/tool/_extraframework_agents/ag2_lesson_planner_tool.py"
             function: ["ag2_lesson_planner"]
       system_message: |
         You have access to an AG2-powered lesson planning tool that uses
         nested chats and group collaboration.

**Usage:**

.. code-block:: bash

   massgen --config @examples/tools/custom_tools/interop/ag2_lesson_planner_example.yaml \
     "Create a lesson plan for fractions"

**How AG2 Integration Works:**

The AG2 tool uses nested chat patterns:

1. **Inner Chat 1**: Curriculum agent determines standards (2 turns)
2. **Group Chat**: Collaborative lesson planning with multiple agents
3. **Inner Chat 2**: Formatter agent creates final output

This demonstrates AG2's powerful orchestration patterns within MassGen's coordination system.

LangGraph Integration
~~~~~~~~~~~~~~~~~~~~~

`LangGraph <https://github.com/langchain-ai/langgraph>`_ provides state graph-based orchestration for complex agent workflows.

**Key Features:**

* State graph architecture
* Conditional routing and branching
* Integration with LangChain ecosystem
* Persistent state management

**Note:** Streaming support is planned for a future release.

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "langgraph_assistant"
       backend:
         type: "openai"
         model: "gpt-4o"
         custom_tools:
           - name: ["langgraph_lesson_planner"]
             category: "education"
             path: "massgen/tool/_extraframework_agents/langgraph_lesson_planner_tool.py"
             function: ["langgraph_lesson_planner"]
       system_message: |
         You have access to a LangGraph-powered lesson planning tool.
         Use it for creating structured lesson plans with state-based workflows.

**Usage:**

.. code-block:: bash

   massgen --config @examples/tools/custom_tools/interop/langgraph_lesson_planner_example.yaml \
     "Design a lesson plan for the water cycle"

**How LangGraph Integration Works:**

The workflow uses a state graph architecture:

.. code-block:: text

   curriculum_node -> planner_node -> reviewer_node -> formatter_node -> END

The graph maintains state throughout execution:

* ``user_prompt``: Original request
* ``standards``: Curriculum standards from first node
* ``lesson_plan``: Draft plan from second node
* ``reviewed_plan``: Reviewed plan from third node
* ``final_plan``: Formatted output from final node
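
To illustrate how this state accumulates along the path, here is a framework-free sketch in plain Python. It does not use LangGraph's API; the node functions are hypothetical stand-ins that return placeholder strings where the real tool would call an LLM. Each node returns a partial update that is merged into the shared state.

.. code-block:: python

   from typing import Dict, TypedDict


   class LessonPlanState(TypedDict, total=False):
       user_prompt: str
       standards: str
       lesson_plan: str
       reviewed_plan: str
       final_plan: str


   def curriculum_node(state: LessonPlanState) -> Dict[str, str]:
       return {"standards": f"standards for: {state['user_prompt']}"}


   def planner_node(state: LessonPlanState) -> Dict[str, str]:
       return {"lesson_plan": f"plan based on [{state['standards']}]"}


   def reviewer_node(state: LessonPlanState) -> Dict[str, str]:
       return {"reviewed_plan": f"reviewed {state['lesson_plan']}"}


   def formatter_node(state: LessonPlanState) -> Dict[str, str]:
       return {"final_plan": f"formatted {state['reviewed_plan']}"}


   # Fixed linear order, matching the diagram above.
   NODES = [curriculum_node, planner_node, reviewer_node, formatter_node]


   def run_pipeline(user_prompt: str) -> LessonPlanState:
       state: LessonPlanState = {"user_prompt": user_prompt}
       for node in NODES:
           # Merge the node's partial update into the shared state.
           state.update(node(state))
       return state

Because every node only adds its own key, later nodes can read everything produced earlier, which is what lets the formatter work from the reviewed plan rather than the raw request.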

AgentScope Integration
~~~~~~~~~~~~~~~~~~~~~~

`AgentScope <https://github.com/modelscope/agentscope>`_ is a multi-agent framework providing flexible agent orchestration patterns.

**Key Features:**

* Sequential agent pipelines
* Memory and message passing
* Multiple LLM backend support
* Flexible conversation management

**Note:** Streaming support is planned for a future release.

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "agentscope_assistant"
       backend:
         type: "openai"
         model: "gpt-4o"
         custom_tools:
           - name: ["agentscope_lesson_planner"]
             category: "education"
             path: "massgen/tool/_extraframework_agents/agentscope_lesson_planner_tool.py"
             function: ["agentscope_lesson_planner"]
       system_message: |
         You have access to an AgentScope-powered lesson planning tool.
         Use it to create comprehensive fourth-grade lesson plans.

**Usage:**

.. code-block:: bash

   massgen --config @examples/tools/custom_tools/interop/agentscope_lesson_planner_example.yaml \
     "Create a lesson plan for photosynthesis"

**How AgentScope Integration Works:**

The tool orchestrates four specialized AgentScope agents in sequence:

1. **Curriculum Standards Expert**: Identifies grade-level standards
2. **Lesson Planning Specialist**: Creates detailed lesson structure
3. **Lesson Plan Reviewer**: Reviews for age-appropriateness
4. **Lesson Plan Formatter**: Formats the final output

Each agent uses AgentScope's ``SimpleDialogAgent`` with OpenAI models, maintaining conversation history through AgentScope's memory system.

OpenAI Chat Completions Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This integration uses OpenAI's Chat Completions API directly, treating a series of role-specific calls as a multi-agent system.

**Key Features:**

* **Native streaming support** for real-time output
* Multiple specialized "agents" via system prompts
* Sequential processing pipeline
* Full control over temperature and parameters

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "openai_assistant"
       backend:
         type: "openai"
         model: "gpt-4o"
         custom_tools:
           - name: ["openai_assistant_lesson_planner"]
             category: "education"
             path: "massgen/tool/_extraframework_agents/openai_assistant_lesson_planner_tool.py"
             function: ["openai_assistant_lesson_planner"]
       system_message: |
         You have access to an OpenAI-powered multi-agent lesson planning tool
         with streaming support.

**Usage:**

.. code-block:: bash

   massgen --config @examples/tools/custom_tools/interop/openai_assistant_lesson_planner_example.yaml \
     "Develop a lesson plan for ecosystems"

**How OpenAI Integration Works:**

Each "agent" is implemented as a separate API call with a specialized system prompt:

1. **Curriculum Agent**: Role-specific prompt for standards
2. **Lesson Planner Agent**: Role-specific prompt for lesson design
3. **Reviewer Agent**: Role-specific prompt for quality review
4. **Formatter Agent**: Role-specific prompt for output formatting
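
The sequential pattern above can be sketched as follows. This is an illustration under stated assumptions: the role prompts are invented placeholders (the real tool's prompts are not shown in this guide), and ``complete`` is a hypothetical callable that the caller supplies to wrap an actual Chat Completions request.

.. code-block:: python

   from typing import Callable, Dict, List

   Message = Dict[str, str]

   # Hypothetical role prompts, one per stage.
   STAGES = [
       ("curriculum", "You identify curriculum standards."),
       ("planner", "You design a lesson plan."),
       ("reviewer", "You review a lesson plan for quality."),
       ("formatter", "You format the final lesson plan."),
   ]


   def run_stages(user_request: str, complete: Callable[[List[Message]], str]) -> str:
       """Feed each stage's output into the next stage as the user message."""
       current = user_request
       for _name, system_prompt in STAGES:
           messages = [
               {"role": "system", "content": system_prompt},
               {"role": "user", "content": current},
           ]
           # One model call per stage; its output becomes the next input.
           current = complete(messages)
       return current

Injecting ``complete`` keeps the pipeline testable without network access: a test can pass a stub that echoes its input.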

SmolAgent Integration
~~~~~~~~~~~~~~~~~~~~~

`SmolAgent <https://github.com/huggingface/smolagents>`_ is HuggingFace's lightweight tool-using agent framework.

**Key Features:**

* Tool-using agents with code execution
* CodeAgent for autonomous tool management
* Integration with HuggingFace models
* Lightweight and efficient

**Note:** Streaming support is planned for a future release.

**Basic Configuration:**

.. code-block:: yaml

   agents:
     - id: "smolagent_assistant"
       backend:
         type: "openai"
         model: "gpt-4o"
         custom_tools:
           - name: ["smolagent_lesson_planner"]
             category: "education"
             path: "massgen/tool/_extraframework_agents/smolagent_lesson_planner_tool.py"
             function: ["smolagent_lesson_planner"]
       system_message: |
         You have access to a SmolAgent-powered lesson planning tool
         that uses tool-calling agents.

**Usage:**

.. code-block:: bash

   massgen --config @examples/tools/custom_tools/interop/smolagent_lesson_planner_example.yaml \
     "Build a lesson plan for fractions"

**How SmolAgent Integration Works:**

The tool uses SmolAgent's ``CodeAgent`` with custom tools:

1. **Tool Definition**: Custom tools for each planning stage
2. **Agent Orchestration**: CodeAgent manages tool execution
3. **Sequential Processing**: Tools called in order by the agent
4. **Result Aggregation**: Final lesson plan assembled from tool outputs

Hybrid Multi-Framework Setups
------------------------------

Combine Multiple Frameworks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can use multiple framework integrations in a single MassGen configuration:

.. code-block:: yaml

   agents:
     # Agent with AG2 tool
     - id: "ag2_specialist"
       backend:
         type: "openai"
         model: "gpt-4o"
         custom_tools:
           - name: ["ag2_lesson_planner"]
             path: "massgen/tool/_extraframework_agents/ag2_lesson_planner_tool.py"
             function: ["ag2_lesson_planner"]
       system_message: "You specialize in nested chat workflows using AG2."

     # Agent with LangGraph tool
     - id: "langgraph_specialist"
       backend:
         type: "openai"
         model: "gpt-4o"
         custom_tools:
           - name: ["langgraph_lesson_planner"]
             path: "massgen/tool/_extraframework_agents/langgraph_lesson_planner_tool.py"
             function: ["langgraph_lesson_planner"]
       system_message: "You specialize in state-based workflows using LangGraph."

     # Native MassGen agent with web search
     - id: "researcher"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
       system_message: "You research educational standards and best practices."

**This setup enables:**

* AG2 specialist uses nested chat patterns
* LangGraph specialist uses state graphs
* Researcher provides web-based context
* All three collaborate through MassGen's coordination

Use Cases
---------

Educational Content Creation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use framework-specific multi-agent patterns for lesson planning:

.. code-block:: bash

   massgen --config ag2_lesson_planner.yaml \
     "Create a comprehensive lesson plan for teaching photosynthesis to fourth graders"

**Why framework integration?**

* AG2's nested chats ensure proper workflow orchestration
* LangGraph's state graphs maintain context across planning stages
* Multiple specialized agents provide comprehensive coverage
* Frameworks handle internal coordination while MassGen coordinates overall strategy

Framework Comparison
~~~~~~~~~~~~~~~~~~~~

Run multiple frameworks on the same task to compare approaches:

.. code-block:: yaml

   agents:
     - id: "ag2_approach"
       backend:
         type: "openai"
         model: "gpt-4o"
         custom_tools:
           - name: ["ag2_lesson_planner"]
             path: "massgen/tool/_extraframework_agents/ag2_lesson_planner_tool.py"
             function: ["ag2_lesson_planner"]

     - id: "langgraph_approach"
       backend:
         type: "openai"
         model: "gpt-4o"
         custom_tools:
           - name: ["langgraph_lesson_planner"]
             path: "massgen/tool/_extraframework_agents/langgraph_lesson_planner_tool.py"
             function: ["langgraph_lesson_planner"]

Each agent uses a different framework, and MassGen's coordination helps identify the best approach.

Creating Custom Framework Integrations
---------------------------------------

Want to integrate a new framework or customize existing ones? This section shows you how.

Architecture Overview
~~~~~~~~~~~~~~~~~~~~~

Each framework integration follows a clean separation pattern:

.. code-block:: python

   # 1. Core framework logic (pure framework implementation)
   async def run_framework_agent(messages, api_key):
       # Pure framework code here
       # Returns: result string
       pass

   # 2. MassGen custom tool wrapper
   @context_params("prompt")
   async def framework_tool(prompt):
       # Environment setup
       # Call core framework function
       # Wrap result in ExecutionResult
       yield ExecutionResult(...)

**This separation ensures:**

* Framework code remains portable and testable
* MassGen integration is clean and minimal
* Easy debugging and maintenance

Wrapper Template
~~~~~~~~~~~~~~~~

To integrate a new framework, follow this template:

.. code-block:: python

   # your_framework_tool.py
   import os
   from typing import Any, AsyncGenerator, Dict, List

   # Import your framework
   from your_framework import YourFrameworkAgent

   from massgen.tool import context_params
   from massgen.tool._result import ExecutionResult, TextContent


   async def run_your_framework_agent(
       messages: List[Dict[str, Any]],
       api_key: str,
   ) -> str:
       """
       Core framework logic - pure framework implementation.

       Args:
           messages: Complete message history from orchestrator
           api_key: API key for LLM

       Returns:
           Result as string
       """
       # 1. Extract user request from messages
       user_prompt = ""
       for msg in messages:
           if isinstance(msg, dict) and msg.get("role") == "user":
               user_prompt = msg.get("content", "")
               break

       # 2. Initialize your framework
       agent = YourFrameworkAgent(api_key=api_key)

       # 3. Run framework-specific logic
       result = await agent.run(user_prompt)

       # 4. Return result as string
       return result


   @context_params("prompt")
   async def your_framework_tool(
       prompt: List[Dict[str, Any]],
   ) -> AsyncGenerator[ExecutionResult, None]:
       """
       MassGen custom tool wrapper.

       Args:
           prompt: Processed message list from orchestrator

       Yields:
           ExecutionResult containing the result or error messages
       """
       # Get API key from environment
       api_key = os.getenv("YOUR_FRAMEWORK_API_KEY")

       if not api_key:
           yield ExecutionResult(
               output_blocks=[
                   TextContent(data="Error: API key not found"),
               ],
           )
           return

       try:
           # Call core framework function
           result = await run_your_framework_agent(
               messages=prompt,
               api_key=api_key,
           )

           # Yield result
           yield ExecutionResult(
               output_blocks=[
                   TextContent(data=f"Your Framework Result:\n\n{result}"),
               ],
           )

       except Exception as e:
           yield ExecutionResult(
               output_blocks=[
                   TextContent(data=f"Error: {str(e)}"),
               ],
           )

Configuration Template
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "your_framework_agent"
       backend:
         type: "openai"  # or any backend
         model: "gpt-4o"
         custom_tools:
           - name: ["your_framework_tool"]
             category: "custom"
             path: "path/to/your_framework_tool.py"
             function: ["your_framework_tool"]
       system_message: |
         You have access to a custom framework tool.
         Use it when appropriate for specialized tasks.

Best Practices
~~~~~~~~~~~~~~

1. **Separation of Concerns**

   Keep framework logic separate from MassGen integration:

   * Core function: Pure framework implementation
   * Wrapper function: MassGen integration only

   This makes testing and maintenance easier.

2. **Error Handling**

   Always wrap framework calls in try-except:

   .. code-block:: python

      try:
          result = await run_framework_agent(...)
          yield ExecutionResult(output_blocks=[TextContent(data=result)])
      except Exception as e:
          yield ExecutionResult(
              output_blocks=[TextContent(data=f"Error: {str(e)}")]
          )

3. **Environment Configuration**

   Use environment variables for API keys and sensitive data:

   .. code-block:: python

      api_key = os.getenv("FRAMEWORK_API_KEY")
      if not api_key:
          yield ExecutionResult(
              output_blocks=[TextContent(data="Error: API key not found")]
          )
          return

4. **Streaming Support**

   For long-running operations, yield intermediate results:

   .. code-block:: python

      yield ExecutionResult(
          output_blocks=[TextContent(data="Step 1 complete\n")],
          is_log=True,  # Mark as log output
      )

   **Note:** Streaming support is currently available for AG2 and OpenAI Chat Completions. Other frameworks will receive streaming support in future releases.

5. **Message Extraction**

   Properly extract user requests from message history:

   .. code-block:: python

      user_prompt = ""
      for msg in messages:
          if isinstance(msg, dict) and msg.get("role") == "user":
              user_prompt = msg.get("content", "")
              break

Troubleshooting
---------------

Framework Not Found
~~~~~~~~~~~~~~~~~~~

**Error:** ``ModuleNotFoundError: No module named 'autogen'`` (the import name used by AG2) or ``No module named 'langgraph'``

**Solution:**

.. code-block:: bash

   # For AG2
   uv pip install -e ".[external]"

   # For LangGraph
   pip install langgraph langchain-openai

API Key Issues
~~~~~~~~~~~~~~

**Error:** ``Error: OPENAI_API_KEY not found``

**Solution:**

Set the required environment variable:

.. code-block:: bash

   export OPENAI_API_KEY="your-key-here"

Tool Not Recognized
~~~~~~~~~~~~~~~~~~~

**Error:** Tool function not found

**Solution:**

* Verify that ``path`` points to the correct Python file
* Ensure the ``function`` name matches the decorated function
* Check that the file is on the Python path, or use an absolute path

Async/Sync Mismatch
~~~~~~~~~~~~~~~~~~~

**Error:** ``coroutine was never awaited``

**Solution:**

Ensure your tool function is async and uses ``AsyncGenerator``:

.. code-block:: python

   @context_params("prompt")
   async def your_tool(prompt) -> AsyncGenerator[ExecutionResult, None]:
       # Use async/await throughout
       result = await framework_function()
       yield ExecutionResult(...)
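
The difference between awaiting a coroutine and iterating an async generator can be shown in isolation. The sketch below is self-contained and does not import MassGen: ``FakeResult`` is a stand-in for ``ExecutionResult``, and ``framework_function`` is a placeholder for real framework work.

.. code-block:: python

   import asyncio
   from dataclasses import dataclass
   from typing import AsyncGenerator, List


   @dataclass
   class FakeResult:
       """Stand-in for MassGen's ExecutionResult, used only in this sketch."""

       text: str


   async def framework_function() -> str:
       await asyncio.sleep(0)  # Placeholder for real async framework work
       return "done"


   async def your_tool(prompt: str) -> AsyncGenerator[FakeResult, None]:
       # Omitting ``await`` here would leave a coroutine object in ``result``.
       result = await framework_function()
       yield FakeResult(text=f"{prompt}: {result}")


   async def collect(prompt: str) -> List[FakeResult]:
       # An async generator is consumed with ``async for``, not ``await``.
       return [item async for item in your_tool(prompt)]


   results = asyncio.run(collect("hello"))

Calling ``framework_function()`` without ``await`` is the usual cause of the warning above, since the coroutine is created but never scheduled.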

Legacy AG2 Backend Approach (Not Recommended)
----------------------------------------------

**Note:** This section documents the older AG2 backend integration approach for backwards compatibility. We recommend using the **Custom Tool Integration** approach described above instead.

What Was the Backend Approach?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In earlier versions (v0.0.28), MassGen supported AG2 as a direct backend type, where AG2 agents participated directly in MassGen's coordination system:

.. code-block:: yaml

   agents:
     - id: "ag2_coder"
       backend:
         type: ag2                  # AG2 as a backend
         agent_config:
           type: assistant
           name: "AG2_Coder"
           system_message: "You write and execute Python code"
           llm_config:
             api_type: "openai"
             model: "gpt-4o"
           code_execution_config:
             executor:
               type: "LocalCommandLineCodeExecutor"

Why Not Use the Backend Approach?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Limitations:**

* AG2 agents participated directly in coordination, which could be inflexible
* Limited ability to combine AG2's internal multi-agent patterns with MassGen coordination
* Less control over when and how AG2 agents were invoked
* Difficult to preserve AG2-specific orchestration patterns (nested chats, group chats)

**The custom tool approach provides:**

* Better separation of concerns
* Ability to wrap complex AG2 multi-agent workflows as single tools
* More flexible hybrid architectures
* Preservation of AG2's unique orchestration capabilities

Backwards Compatibility
~~~~~~~~~~~~~~~~~~~~~~~~

The backend approach is still supported. Existing configurations that use ``type: ag2`` in the backend section continue to work.

However, for new implementations, we recommend:

1. **Use AG2 as a custom tool** (see ``AG2 Integration`` section above)
2. **Wrap AG2 multi-agent patterns** as tools to preserve their orchestration
3. **Leverage hybrid architectures** with custom tool + backend combinations

Migration Example
~~~~~~~~~~~~~~~~~

To migrate from the old backend approach to the new custom tool approach:

**Step 1: Build your custom tool** (see `Creating Custom Framework Integrations`_ section for the template)

Create a Python file with your AG2 logic wrapped as a custom tool following the wrapper pattern.

**Step 2: Update your YAML configuration**

**Old approach (backend):**

.. code-block:: yaml

   agents:
     - id: "ag2_coder"
       backend:
         type: ag2
         agent_config:
           type: assistant
           # ...

**New approach (custom tool):**

.. code-block:: yaml

   agents:
     - id: "assistant_with_ag2_tool"
       backend:
         type: "openai"
         model: "gpt-4o"
         custom_tools:
           - name: ["ag2_lesson_planner"]
             path: "massgen/tool/_extraframework_agents/ag2_lesson_planner_tool.py"
             function: ["ag2_lesson_planner"]
       system_message: |
         You have access to an AG2-powered tool that uses
         nested chats and group collaboration.

The new approach gives you more control and better integration with MassGen's coordination system.

Future Framework Support
-------------------------

MassGen v0.1.6 includes full support for five agent frameworks:

* **AG2 (AutoGen)** - Nested chats and group collaboration
* **LangGraph** - State graph-based workflows
* **AgentScope** - Sequential agent pipelines
* **OpenAI Chat Completions** - Multi-agent API patterns
* **SmolAgent** - Tool-using agents from Hugging Face

All frameworks follow the same custom tool integration pattern. See the examples in ``massgen/tool/_extraframework_agents/`` for implementation details.

**Want to integrate another framework?** We welcome contributions for additional frameworks:

* CrewAI
* Haystack
* Semantic Kernel
* AutoGPT

See :doc:`../../development/contributing` for contribution guidelines.

Next Steps
----------

* :doc:`../tools/custom_tools` - General custom tool development
* :doc:`../tools/mcp_integration` - Model Context Protocol tools
* :doc:`../tools/index` - Complete tool system overview
* :doc:`../../examples/advanced_patterns` - Advanced integration patterns

Examples Repository
-------------------

Find complete working examples in the repository:

* ``massgen/tool/_extraframework_agents/`` - Framework integration implementations
* ``massgen/configs/tools/custom_tools/interop/`` - Example configurations
* Use the ``@examples/tools/custom_tools/interop/`` prefix when running these configs


---

## user_guide/integration/http_server.rst

HTTP Server (OpenAI-Compatible API)
====================================

Run MassGen as an OpenAI-compatible HTTP server for seamless integration with existing tools, proxies, and clients.

Quick Start
-----------

**Step 1: Create a config file** (``config.yaml``)

.. code-block:: yaml

   agents:
     - id: research-agent
       backend:
         type: openai
         model: gpt-4o

     - id: analysis-agent
       backend:
         type: gemini
         model: gemini-2.5-flash

**Step 2: Start the server**

.. code-block:: bash

   massgen serve --config config.yaml

   # Server starts on http://localhost:4000

**Step 3: Connect with any OpenAI client**

.. code-block:: python

   from openai import OpenAI

   client = OpenAI(
       base_url="http://localhost:4000/v1",
       api_key="not-needed"  # Local server doesn't require auth
   )

   response = client.chat.completions.create(
       model="massgen",
       messages=[{"role": "user", "content": "Analyze renewable energy trends"}],
   )

   # Final answer
   print(response.choices[0].message.content)

**cURL alternative:**

.. code-block:: bash

   curl http://localhost:4000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{"model":"massgen","messages":[{"role":"user","content":"Hello!"}]}'

Endpoints
---------

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Endpoint
     - Description
   * - ``GET /health``
     - Health check (returns ``{"status": "ok"}``)
   * - ``POST /v1/chat/completions``
     - Chat completions endpoint
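
The health endpoint is handy as a readiness probe before sending real requests. The following is a minimal sketch using only the standard library; ``check_health`` is a hypothetical helper name, and the base URL assumes the default port shown in the Quick Start.

.. code-block:: python

   import json
   import urllib.request


   def check_health(base_url: str = "http://localhost:4000", timeout: float = 5.0) -> bool:
       """Return True if GET /health answers with {"status": "ok"}."""
       url = base_url.rstrip("/") + "/health"
       with urllib.request.urlopen(url, timeout=timeout) as response:
           payload = json.loads(response.read().decode("utf-8"))
       return payload.get("status") == "ok"

If the server is not running, ``urlopen`` raises ``urllib.error.URLError``, which a caller can catch and retry.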

Response Format
---------------

The server returns OpenAI-compatible responses with MassGen metadata:

.. code-block:: json

   {
     "id": "chatcmpl-req_abc123",
     "object": "chat.completion",
     "created": 1704067200,
     "model": "massgen",
     "choices": [{
       "index": 0,
       "message": {
         "role": "assistant",
         "content": "The final coordinated answer from the agent team."
       },
       "finish_reason": "stop"
     }],
     "usage": {
       "prompt_tokens": 0,
       "completion_tokens": 0,
       "total_tokens": 0
     },
     "massgen_metadata": {
       "session_id": "api_session_20260104_213901",
       "config_used": "/path/to/config.yaml",
       "log_directory": ".massgen/massgen_logs/log_20260104_213901_326713",
       "final_answer_path": ".massgen/massgen_logs/log_20260104_213901_326713/turn_1/final",
       "selected_agent": "agent_a",
       "vote_results": {
         "vote_counts": {"agent_a": 2, "agent_b": 1},
         "winner": "agent_a",
         "is_tie": false
       },
       "answers": [
         {"label": "answer1.1", "agent_id": "agent_a", "content": "..."},
         {"label": "answer2.1", "agent_id": "agent_b", "content": "..."}
       ],
       "agent_mapping": {"agent1": "agent_a", "agent2": "agent_b"}
     }
   }

The ``massgen_metadata`` field contains the same information returned by ``massgen.run()``:

* ``session_id`` - Unique session identifier
* ``config_used`` - Path to the config file used
* ``log_directory`` - Root log directory for this session
* ``final_answer_path`` - Path to the final answer directory
* ``selected_agent`` - ID of the winning agent
* ``vote_results`` - Voting details (counts, winner, tie status)
* ``answers`` - All submitted answers with labels and content
* ``agent_mapping`` - Mapping from anonymous names to agent IDs
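
Clients that work with the raw JSON body can read these fields directly. The sketch below parses a trimmed copy of the example response and extracts a few of them; ``summarize`` is a hypothetical helper, and treating a missing ``massgen_metadata`` key as "no coordination data" is an assumption.

.. code-block:: python

   import json

   # Trimmed copy of the example response shown above.
   raw = """
   {
     "choices": [
       {"index": 0, "message": {"role": "assistant", "content": "Final answer."}}
     ],
     "massgen_metadata": {
       "selected_agent": "agent_a",
       "vote_results": {
         "vote_counts": {"agent_a": 2, "agent_b": 1},
         "winner": "agent_a",
         "is_tie": false
       },
       "answers": [
         {"label": "answer1.1", "agent_id": "agent_a", "content": "..."},
         {"label": "answer2.1", "agent_id": "agent_b", "content": "..."}
       ]
     }
   }
   """


   def summarize(response: dict) -> dict:
       """Pull the final answer and a few coordination fields out of a response."""
       # Assumption: absent metadata means there is nothing to report.
       metadata = response.get("massgen_metadata") or {}
       votes = metadata.get("vote_results") or {}
       return {
           "final_answer": response["choices"][0]["message"]["content"],
           "winner": votes.get("winner"),
           "is_tie": votes.get("is_tie", False),
           "answer_labels": [a["label"] for a in metadata.get("answers", [])],
       }


   summary = summarize(json.loads(raw))
   print(summary["winner"], summary["answer_labels"])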

Config Selection
----------------

Use the ``model`` parameter to select which config to use:

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Model String
     - Description
   * - ``massgen``
     - Use the server's default config (from ``--config`` or auto-discovered)
   * - ``massgen/basic_multi``
     - Use a built-in example config (e.g., ``@examples/basic_multi``)
   * - ``massgen/path:/path/to/config.yaml``
     - Use a specific config file path

CLI Options
-----------

.. code-block:: bash

   massgen serve [OPTIONS]

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Option
     - Description
   * - ``--config PATH``
     - Path to YAML configuration file (supports ``@examples/`` syntax)
   * - ``--host HOST``
     - Bind address (default: ``0.0.0.0``)
   * - ``--port PORT``
     - Port number (default: ``4000``)
   * - ``--reload``
     - Enable auto-reload (development only)

If ``--config`` is not provided, the server looks for a config in this order:

1. ``.massgen/config.yaml`` (project-level)
2. ``~/.config/massgen/config.yaml`` (user-level)
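
To predict which file the server would pick up, the same lookup order can be reproduced in a few lines. This is an illustrative sketch of the documented order, not MassGen's actual implementation; ``discover_config`` is a hypothetical name.

.. code-block:: python

   from pathlib import Path
   from typing import Optional


   def discover_config(project_dir: Path, home_dir: Path) -> Optional[Path]:
       """Return the first existing config file in the documented order."""
       candidates = [
           project_dir / ".massgen" / "config.yaml",          # 1. project-level
           home_dir / ".config" / "massgen" / "config.yaml",  # 2. user-level
       ]
       for candidate in candidates:
           if candidate.is_file():
               return candidate
       return None


   # Prints the path that would be chosen, or None if neither file exists.
   print(discover_config(Path.cwd(), Path.home()))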

Environment Variables
---------------------

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Variable
     - Description
   * - ``MASSGEN_SERVER_HOST``
     - Bind address (default: ``0.0.0.0``)
   * - ``MASSGEN_SERVER_PORT``
     - Port (default: ``4000``)
   * - ``MASSGEN_SERVER_DEFAULT_CONFIG``
     - Default config file path
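
A launcher script may want to report the effective settings before starting the server. The sketch below reads the three variables with the defaults from the table; ``ServerSettings`` and ``settings_from_env`` are hypothetical names, and how the real server combines these variables with CLI flags is not covered here.

.. code-block:: python

   import os
   from typing import Mapping, NamedTuple, Optional


   class ServerSettings(NamedTuple):
       host: str
       port: int
       default_config: Optional[str]


   def settings_from_env(env: Mapping[str, str] = os.environ) -> ServerSettings:
       """Read the documented variables, falling back to the documented defaults."""
       return ServerSettings(
           host=env.get("MASSGEN_SERVER_HOST", "0.0.0.0"),
           port=int(env.get("MASSGEN_SERVER_PORT", "4000")),
           default_config=env.get("MASSGEN_SERVER_DEFAULT_CONFIG"),
       )


   print(settings_from_env())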

Full Feature Parity
-------------------

The HTTP server uses ``massgen.run()`` internally, providing **identical behavior** to CLI, WebUI, and LiteLLM modes:

* **Logging** - Creates logs in ``.massgen/massgen_logs/``
* **Metrics** - Saves ``metrics_summary.json`` and ``execution_metadata.yaml``
* **Session Management** - Full session tracking with coordination history
* **Agent Outputs** - Saves individual agent outputs to ``agent_outputs/``

This means you can use ``massgen logs`` to view server session logs, and all debugging/analysis tools work the same way.

Streaming Support
-----------------

.. note::

   Streaming (``stream: true``) is not yet supported. Set ``stream: false`` in requests.
   Streaming support is planned for a future release.
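
For clients that build requests by hand, the flag can be set explicitly in the JSON body. The following is a minimal standard-library sketch; ``build_chat_request`` is a hypothetical helper, and the URL assumes the default port.

.. code-block:: python

   import json
   import urllib.request


   def build_chat_request(
       prompt: str, base_url: str = "http://localhost:4000"
   ) -> urllib.request.Request:
       """Build a non-streaming chat completion request."""
       body = {
           "model": "massgen",
           "messages": [{"role": "user", "content": prompt}],
           "stream": False,  # streaming is not yet supported
       }
       return urllib.request.Request(
           base_url.rstrip("/") + "/v1/chat/completions",
           data=json.dumps(body).encode("utf-8"),
           headers={"Content-Type": "application/json"},
           method="POST",
       )


   request = build_chat_request("Hello!")
   # To send it against a running server:
   # with urllib.request.urlopen(request) as response:
   #     print(json.loads(response.read())["choices"][0]["message"]["content"])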

Use Cases
---------

The HTTP server is ideal for:

* **API Gateways** - Route MassGen through existing infrastructure
* **Proxies** - Use tools like LiteLLM Proxy or other OpenAI-compatible routers
* **External Applications** - Any app that speaks the OpenAI API format
* **Language-Agnostic Integration** - Use from any language with HTTP support

See Also
--------

* :doc:`/quickstart/running-massgen` - Quick start with all modes
* :doc:`/reference/cli` - Full CLI reference
* :doc:`python_api` - Direct Python API (same return values as HTTP server)


---

## user_guide/integration/index.rst

Integration & Automation
========================

This section covers how to integrate MassGen into your applications and automate workflows. MassGen offers multiple integration paths for different use cases.

Choosing Your Integration
-------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Method
     - Best For
     - Key Features
   * - **HTTP Server**
     - API gateways, proxies, external apps
     - OpenAI-compatible endpoints (streaming not yet supported)
   * - **Python API**
     - Application integration, automation scripts
     - Async-first, full control, direct access
   * - **LiteLLM**
     - Existing LiteLLM users, LangChain integration
     - OpenAI-compatible, drop-in replacement
   * - **Automation Mode**
     - Background execution, CI/CD pipelines
     - Headless, non-interactive, scriptable

Guides in This Section
----------------------

.. grid:: 4
   :gutter: 3

   .. grid-item-card:: 🌐 HTTP Server

      OpenAI-compatible API

      * ``massgen serve`` command
      * ``/v1/chat/completions`` endpoint
      * Streaming via SSE (planned, not yet supported)
      * Config-as-Authority mode

      :doc:`Read the HTTP Server guide → <http_server>`

   .. grid-item-card:: 🐍 Python API

      Direct Python integration

      * ``massgen.run()`` async API
      * ``massgen.build_config()`` programmatic config
      * LiteLLM provider registration
      * Full control over execution

      :doc:`Read the Python API guide → <python_api>`

   .. grid-item-card:: 🤖 Automation

      Headless execution

      * Background execution
      * CI/CD integration
      * Status file monitoring
      * Non-interactive mode

      :doc:`Read the Automation guide → <automation>`

   .. grid-item-card:: 🔗 Framework Interoperability

      External frameworks

      * AG2 framework integration
      * LangChain compatibility
      * Custom backends
      * External tools

      :doc:`Read the Interoperability guide → <general_interoperability>`

Quick Examples
--------------

.. tabs::

   .. tab:: HTTP Server

      .. code-block:: bash

         # Start the server with a config
         massgen serve --config balanced.yaml --port 4000

      .. code-block:: python

         # Any OpenAI-compatible client works
         from openai import OpenAI
         client = OpenAI(base_url="http://localhost:4000/v1", api_key="unused")

         response = client.chat.completions.create(
             model="massgen",  # Uses the server's default config
             messages=[{"role": "user", "content": "Your question"}]
         )
         print(response.choices[0].message.content)  # Final answer
         print(response.choices[0].message.reasoning_content)  # Traces

   .. tab:: Python API

      .. code-block:: python

         import asyncio
         import massgen

         async def main():
             result = await massgen.run(
                 query="Analyze this problem",
                 models=["gpt-5", "claude-sonnet-4-5-20250929"]
             )
             print(result["final_answer"])

         asyncio.run(main())

   .. tab:: LiteLLM

      .. code-block:: python

         import litellm
         from massgen import register_with_litellm

         register_with_litellm()

         response = litellm.completion(
             model="massgen/build",
             messages=[{"role": "user", "content": "Your question"}],
             optional_params={"models": ["openai/gpt-5", "anthropic/claude-sonnet-4-5-20250929"]}
         )
         print(response.choices[0].message.content)

   .. tab:: Automation CLI

      .. code-block:: bash

         # Run in automation mode (headless)
         massgen --automation --model gpt-5 "Your question"

         # Monitor with status file
         massgen --automation --status-file status.json "Your query"

Related Documentation
---------------------

* :doc:`../../quickstart/running-massgen` - Getting started
* :doc:`../../reference/python_api` - API reference
* :doc:`../../reference/cli` - CLI reference
* :doc:`../tools/index` - Available tools

.. toctree::
   :maxdepth: 1
   :hidden:

   http_server
   python_api
   automation
   general_interoperability


---

## user_guide/integration/python_api.rst

=============================
Programmatic API Guide
=============================

This guide shows how to use MassGen programmatically from Python code, including direct Python API usage and LiteLLM integration.

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
========

MassGen provides multiple ways to integrate into your Python applications:

1. **Direct Python API** - Use ``massgen.run()`` for simple programmatic access
2. **LiteLLM Integration** - Use MassGen as a LiteLLM custom provider
3. **CLI with --output-file** - Save results directly to a file for batch processing

Quick Start
===========

LiteLLM Integration (Recommended)
---------------------------------

.. note::
   Token counting and pricing are not yet supported in the LiteLLM integration.
   The ``usage`` field in responses will show zeros. This feature is planned for a future release.

Copy-paste ready example for using MassGen with LiteLLM:

.. code-block:: python

   from dotenv import load_dotenv
   load_dotenv()  # Load API keys from .env file

   import litellm
   from massgen import register_with_litellm

   # Register MassGen as a provider (call once at startup)
   register_with_litellm()

   # Run multi-agent with different models (slash format: backend/model)
   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "What is machine learning?"}],
       optional_params={
           "models": [
               "openrouter/openai/gpt-5.1",
               "openrouter/google/gemini-3-pro-preview",
               "openrouter/x-ai/grok-4.1-fast",
           ],
       }
   )

   # Get the final answer (standard LiteLLM response)
   print("=== FINAL ANSWER ===")
   print(response.choices[0].message.content)

   # Access MassGen metadata
   metadata = response._hidden_params

   # Print all agent answers
   print("\n=== ALL ANSWERS ===")
   for answer in metadata.get("massgen_answers", []):
       print(f"\n[{answer['agent_id']}] ({answer['label']})")
       content = answer["content"] or ""  # Guard against an empty answer body
       print(content[:200] + "..." if len(content) > 200 else content)

   # Print vote results
   print("\n=== VOTING ===")
   vote_results = metadata.get("massgen_vote_results")
   if vote_results:
       print(f"Winner: {vote_results['winner']}")
       print(f"Votes: {vote_results['vote_counts']}")
       for voted_for, voters in vote_results.get("voter_details", {}).items():
           for v in voters:
               print(f"  {v['voter']} -> {voted_for}: {v['reason']}")

   # Log paths for detailed inspection
   print("\n=== LOG PATHS ===")
   print(f"Log directory: {metadata.get('massgen_log_directory')}")
   print(f"Final answer: {metadata.get('massgen_final_answer_path')}")

Direct Python API
-----------------

For async workflows or more control:

.. code-block:: python

   import asyncio
   import massgen

   # Single agent mode
   result = asyncio.run(massgen.run(
       query="What is machine learning?",
       model="gpt-4o-mini"
   ))
   print(result["final_answer"])

   # Multi-agent mode with config
   result = asyncio.run(massgen.run(
       query="Compare renewable energy sources",
       config="@examples/basic_multi"
   ))
   print(result["final_answer"])

   # Access coordination metadata (multi-agent only)
   print(f"Winner: {result.get('selected_agent')}")
   for answer in result.get("answers", []):
       print(f"[{answer['agent_id']}]: {answer['content'][:100]}...")


Python API Reference
====================

massgen.run()
-------------

The main async function for running MassGen programmatically.

.. code-block:: python

   async def run(
       query: str,
       config: str = None,
       model: str = None,
       models: list = None,
       num_agents: int = None,
       use_docker: bool = False,
       enable_filesystem: bool = True,
       enable_logging: bool = False,
       output_file: str = None,
       **kwargs,
   ) -> dict

**Parameters:**

- ``query`` (str): The question or task for the agent(s)
- ``config`` (str, optional): Config file path or ``@examples/NAME``
- ``model`` (str, optional): Model name for agents (e.g., 'gpt-5')
- ``models`` (list, optional): List of models for multi-agent mode
- ``num_agents`` (int, optional): Number of agents when using single model
- ``use_docker`` (bool): Enable Docker execution mode (default: False)
- ``enable_filesystem`` (bool): Enable filesystem/MCP tools (default: True). Set to False for lightweight agents.
- ``enable_logging`` (bool): Enable logging and return ``log_directory`` in result
- ``output_file`` (str, optional): Write final answer to this file path
- ``**kwargs``: Additional options, including ``context_paths`` (list of paths with permissions) and ``config_dict`` (a configuration dict, such as one returned by ``build_config()``)

**Returns:**

A dictionary containing:

.. code-block:: python

   {
       "final_answer": str,        # The generated answer
       "config_used": str,         # Path to config or "single-agent:<model>"
       "session_id": str,          # Session ID for continuation

       # Log directory pointers (multi-agent mode):
       "log_directory": str,       # Root log directory (e.g., .massgen/massgen_logs/log_XXX)
       "final_answer_path": str,   # Path to final/ directory

       # Coordination metadata (multi-agent mode only, uses anonymous agent_a, agent_b names):
       "selected_agent": str,      # Anonymous ID of the winning agent (e.g., "agent_a")
       "vote_results": dict,       # Voting details with anonymous IDs (see below)
       "answers": list,            # List of answers with labels and paths (see below)
       "agent_mapping": dict,      # Maps anonymous IDs to real agent IDs
   }

The ``answers`` list contains entries for each answer submitted (agent IDs are anonymized):

.. code-block:: python

   [
       {
           "label": "agent1.1",       # Answer label in answerX.Y format
           "agent_id": "agent_a",     # Anonymized agent ID
           "answer_path": "/path/to/.../turn_1/attempt_1/agent_a/20251130_XXX/",
           "content": "The answer text...",
       },
       {
           "label": "agent2.1",
           "agent_id": "agent_b",
           "answer_path": "/path/to/.../turn_1/attempt_1/agent_b/20251130_XXX/",
           "content": "Another answer...",
       },
   ]

The ``agent_mapping`` dict maps anonymous names back to real agent IDs:

.. code-block:: python

   {
       "agent_a": "openrouter-fast1",
       "agent_b": "openrouter-fast2",
   }

The ``vote_results`` dict contains (with anonymized agent IDs):

.. code-block:: python

   {
       "vote_counts": {"agent_a": 2, "agent_b": 1},  # Votes per agent
       "voter_details": {                            # Who voted and why
           "agent_a": [
               {"voter": "agent_b", "reason": "More comprehensive answer"},
               {"voter": "agent_c", "reason": "Better structure"}
           ]
       },
       "winner": "agent_a",      # Winning agent (anonymous ID)
       "is_tie": False,          # Whether there was a tie
       "total_votes": 3,         # Total votes cast
       "agents_with_answers": 2, # Agents that submitted answers
       "agents_voted": 3,        # Agents that voted
   }
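
Because the coordination fields use anonymous IDs, reports usually need to translate them back. The sketch below does that for the winner and the vote counts; ``resolve_winner`` is a hypothetical helper, and the sample dict only mirrors the shapes documented above.

.. code-block:: python

   def resolve_winner(result: dict) -> dict:
       """Translate anonymous IDs in a run() result back to configured agent IDs."""
       mapping = result.get("agent_mapping") or {}
       votes = result.get("vote_results") or {}
       winner = result.get("selected_agent")

       def real(anonymous_id: str) -> str:
           # Fall back to the anonymous ID if it is not in the mapping.
           return mapping.get(anonymous_id, anonymous_id)

       return {
           "winner": real(winner) if winner else None,
           "vote_counts": {
               real(agent): count
               for agent, count in votes.get("vote_counts", {}).items()
           },
           "is_tie": votes.get("is_tie", False),
       }


   # Sample data in the shape documented above.
   sample = {
       "selected_agent": "agent_a",
       "agent_mapping": {"agent_a": "openrouter-fast1", "agent_b": "openrouter-fast2"},
       "vote_results": {
           "vote_counts": {"agent_a": 2, "agent_b": 1},
           "winner": "agent_a",
           "is_tie": False,
       },
   }
   print(resolve_winner(sample))

Single-agent results carry no coordination metadata, in which case the helper returns ``None`` for the winner and an empty vote count.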

**Examples:**

.. code-block:: python

   import asyncio
   import massgen

   # Single agent with specific model
   result = asyncio.run(massgen.run(
       query="Explain quantum computing",
       model="claude-sonnet-4-20250514"
   ))

   # Multi-agent with example config
   result = asyncio.run(massgen.run(
       query="Design a REST API",
       config="@examples/basic_multi"
   ))

   # With logging enabled
   result = asyncio.run(massgen.run(
       query="Your question",
       config="@examples/basic_multi",
       enable_logging=True
   ))
   print(f"Logs at: {result['log_directory']}")

   # Save output to file
   result = asyncio.run(massgen.run(
       query="Your question",
       model="gpt-4o-mini",
       output_file="/tmp/answer.txt"
   ))


massgen.build_config()
----------------------

Build a MassGen configuration dict programmatically, similar to ``--quickstart``:

.. code-block:: python

   def build_config(
       num_agents: int = None,
       backend: str = None,
       model: str = None,
       models: list = None,
       backends: list = None,
       use_docker: bool = False,
       context_paths: list = None,
   ) -> dict

**Parameters:**

- ``num_agents`` (int, optional): Number of agents (1-10). Auto-detected from models/backends if not specified.
- ``backend`` (str, optional): Backend provider for all agents - 'openai', 'anthropic', 'gemini', 'grok'
- ``model`` (str, optional): Model name for all agents (e.g., 'gpt-4o-mini')
- ``models`` (list, optional): List of model names, one per agent (e.g., ['gpt-4o', 'claude-sonnet-4-20250514'])
- ``backends`` (list, optional): List of backends, one per agent (e.g., ['openai', 'anthropic'])
- ``use_docker`` (bool): Enable Docker execution mode (default: False)
- ``context_paths`` (list, optional): List of paths with permissions for file operations. Each entry can be:

  - A string path (defaults to "write" permission)
  - A dict: ``{"path": "/path", "permission": "read" or "write"}``
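
The two entry forms can be reduced to a single shape before they are passed on. The following is a minimal sketch of that normalization; ``normalize_context_paths`` is a hypothetical helper, not a MassGen function, and treating a dict without a ``permission`` key as ``"write"`` is an assumption made for symmetry with the string form.

.. code-block:: python

   from typing import Dict, List, Union

   ContextPath = Union[str, Dict[str, str]]


   def normalize_context_paths(entries: List[ContextPath]) -> List[Dict[str, str]]:
       """Convert every entry to {"path": ..., "permission": ...}."""
       normalized = []
       for entry in entries:
           if isinstance(entry, str):
               # A bare string defaults to "write" permission.
               entry = {"path": entry}
           # Assumption: a dict without "permission" also defaults to "write".
           permission = entry.get("permission", "write")
           if permission not in ("read", "write"):
               raise ValueError(f"Unknown permission: {permission!r}")
           normalized.append({"path": entry["path"], "permission": permission})
       return normalized


   print(normalize_context_paths([
       "/path/to/output",
       {"path": "/path/to/project", "permission": "read"},
   ]))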

**Returns:**

A complete configuration dict ready to use with ``run()``.

**Examples:**

.. code-block:: python

   import asyncio
   import massgen

   # Same model for all agents
   config = massgen.build_config(num_agents=3, model="gpt-5")
   result = asyncio.run(massgen.run(query="Your question", config_dict=config))

   # Different models per agent (auto-detects backends)
   config = massgen.build_config(
       models=["gpt-5", "claude-sonnet-4-5-20250929", "gemini-3-pro-preview"]
   )
   result = asyncio.run(massgen.run(query="Compare approaches", config_dict=config))

   # Explicit backends and models with Docker
   config = massgen.build_config(
       backends=["openai", "anthropic"],
       models=["gpt-5", "claude-sonnet-4-5-20250929"],
       use_docker=True
   )

**Generated Config Structure:**

When you call ``build_config()``, it generates a complete YAML-equivalent config. Here's what the default produces:

.. code-block:: yaml

   # build_config() with defaults (2 agents, gpt-5, local mode)
   agents:
     - id: openai-gpt5-1
       backend:
         type: openai
         model: gpt-5
         cwd: workspace1
         exclude_file_operation_mcps: false  # MCP file ops enabled
     - id: openai-gpt5-2
       backend:
         type: openai
         model: gpt-5
         cwd: workspace2
         exclude_file_operation_mcps: false

   orchestrator:
     snapshot_storage: snapshots
     agent_temporary_workspace: temp_workspaces
     max_new_answers_per_agent: 5
     coordination:
       max_orchestration_restarts: 2
       enable_agent_task_planning: true
       task_planning_filesystem_mode: true
       enable_memory_filesystem_mode: true

   timeout_settings:
     orchestrator_timeout_seconds: 1800

With ``use_docker=True``, the config includes Docker execution settings:

.. code-block:: yaml

   # build_config(models=["groq/llama-3.3-70b"], use_docker=True)
   agents:
     - id: groq-70b1
       backend:
         type: groq
         model: llama-3.3-70b
         base_url: https://api.groq.com/openai/v1  # Auto-filled!
         cwd: workspace1
         enable_code_based_tools: true
         command_line_execution_mode: docker
         command_line_docker_image: ghcr.io/massgen/mcp-runtime-sudo:latest
         # ... additional Docker settings

   orchestrator:
     coordination:
       use_skills: true
       skills_directory: .agent/skills
       # ... additional orchestration settings


LiteLLM Integration
===================

MassGen integrates with `LiteLLM <https://docs.litellm.ai/>`_, allowing you to use it alongside 100+ other LLM providers with a unified interface.

Installation
------------

LiteLLM is an optional dependency. Install it with:

.. code-block:: bash

   pip install "massgen[litellm]"  # quotes keep shells such as zsh from expanding the brackets
   # or
   pip install litellm

Registration
------------

Before using MassGen with LiteLLM, register it as a provider:

.. code-block:: python

   from massgen import register_with_litellm

   # Call once at startup
   register_with_litellm()

Model String Format
-------------------

MassGen uses a special model string format:

- ``massgen/<example-name>`` - Use built-in example config
- ``massgen/model:<model-name>`` - Quick single-agent mode
- ``massgen/path:<config-path>`` - Explicit config file path
- ``massgen/build`` - Build config dynamically from ``optional_params``
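
The four forms can be told apart mechanically. The sketch below classifies a model string according to the list above; ``parse_model_string`` is a hypothetical helper written for illustration and is not how MassGen necessarily parses these strings internally.

.. code-block:: python

   def parse_model_string(model: str) -> dict:
       """Classify a ``massgen/...`` model string into one of the four forms."""
       prefix, separator, rest = model.partition("/")
       if prefix != "massgen" or not separator or not rest:
           raise ValueError(f"Not a massgen model string: {model!r}")

       if rest == "build":
           return {"mode": "build"}
       if rest.startswith("model:"):
           return {"mode": "model", "value": rest[len("model:"):]}
       if rest.startswith("path:"):
           return {"mode": "path", "value": rest[len("path:"):]}
       # Anything else is treated as the name of a built-in example config.
       return {"mode": "example", "value": rest}


   for name in (
       "massgen/basic_multi",
       "massgen/model:gpt-4o-mini",
       "massgen/path:/path/to/my_config.yaml",
       "massgen/build",
   ):
       print(name, "->", parse_model_string(name))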

**Examples:**

.. code-block:: python

   import litellm
   from massgen import register_with_litellm

   register_with_litellm()

   # Built-in example config
   response = litellm.completion(
       model="massgen/basic_multi",
       messages=[{"role": "user", "content": "Your question"}]
   )

   # Quick single-agent mode
   response = litellm.completion(
       model="massgen/model:gpt-4o-mini",
       messages=[{"role": "user", "content": "What is 2+2?"}]
   )

   # Explicit config path
   response = litellm.completion(
       model="massgen/path:/path/to/my_config.yaml",
       messages=[{"role": "user", "content": "Your question"}]
   )

Dynamic Config Building
-----------------------

Use ``massgen/build`` to create multi-agent configurations on the fly.

**Slash Format (Recommended)** - Explicitly specify backend and model:

.. code-block:: python

   import litellm
   from massgen import register_with_litellm

   register_with_litellm()

   # Slash format: "backend/model" - explicit and clear
   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "Compare approaches"}],
       optional_params={
           "models": ["openai/gpt-5", "groq/llama-3.3-70b", "cerebras/llama-3.3-70b"],
       }
   )

   # Mixed: auto-detect + explicit
   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "Your question"}],
       optional_params={
           "models": ["gpt-5", "groq/llama-3.3-70b-versatile"],  # gpt-5 auto-detects to openai
       }
   )

   # Same model for multiple agents
   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "Your question"}],
       optional_params={
           "model": "groq/llama-3.3-70b",
           "num_agents": 3,
       }
   )

   # With filesystem access to specific paths
   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "Read the config file and summarize it"}],
       optional_params={
           "model": "gpt-5",
           "context_paths": [
               {"path": "/path/to/project", "permission": "read"},
               {"path": "/path/to/output", "permission": "write"},
           ],
       }
   )

   # Lightweight mode without filesystem (faster for simple queries)
   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "What is 2+2?"}],
       optional_params={
           "model": "gpt-5-nano",
           "enable_filesystem": False,
       }
   )

**Supported Backends:** ``openai``, ``claude``, ``gemini``, ``grok``, ``groq``, ``cerebras``, ``together``, ``fireworks``, ``openrouter``, and more.

.. tip::
   Use slash format for providers like Groq, Cerebras, Together, etc. where model names
   don't clearly indicate the backend.
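
To illustrate how a slash-format entry reads, the sketch below splits it on the first slash, so that ``openrouter/openai/gpt-5.1`` yields the backend ``openrouter`` and the model ``openai/gpt-5.1``. Both the ``split_backend_model`` name and the first-slash rule are assumptions inferred from the examples in this guide, not a description of MassGen's internals.

.. code-block:: python

   from typing import Optional, Tuple


   def split_backend_model(spec: str) -> Tuple[Optional[str], str]:
       """Split "backend/model" on the first slash.

       Returns (None, spec) when there is no slash, meaning the backend
       would have to be auto-detected from the model name.
       """
       backend, separator, model = spec.partition("/")
       if not separator:
           return None, spec
       return backend, model


   for spec in ("openai/gpt-5", "openrouter/openai/gpt-5.1", "gpt-5"):
       print(spec, "->", split_backend_model(spec))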

Async Usage
-----------

LiteLLM also supports async:

.. code-block:: python

   import asyncio
   import litellm
   from massgen import register_with_litellm

   register_with_litellm()

   async def main():
       response = await litellm.acompletion(
           model="massgen/basic_multi",
           messages=[{"role": "user", "content": "Your question"}]
       )
       print(response.choices[0].message.content)

   asyncio.run(main())

Optional Parameters
-------------------

Pass MassGen-specific options via ``optional_params``:

.. code-block:: python

   response = litellm.completion(
       model="massgen/basic_multi",
       messages=[{"role": "user", "content": "Your question"}],
       optional_params={
           "enable_logging": True,
           "output_file": "/tmp/answer.txt"
       }
   )

**Available Parameters:**

+----------------------+------------------+----------------------------------------------------+
| Parameter            | Type             | Description                                        |
+======================+==================+====================================================+
| ``models``           | list[str]        | List of model names for multi-agent mode           |
|                      |                  | (e.g., ``["gpt-4o", "claude-sonnet-4-20250514"]``) |
+----------------------+------------------+----------------------------------------------------+
| ``model``            | str              | Single model name for all agents                   |
+----------------------+------------------+----------------------------------------------------+
| ``num_agents``       | int              | Number of agents when using single model           |
+----------------------+------------------+----------------------------------------------------+
| ``use_docker``       | bool             | Enable Docker execution mode (default: False)      |
+----------------------+------------------+----------------------------------------------------+
| ``enable_filesystem``| bool             | Enable filesystem/MCP tools (default: True)        |
+----------------------+------------------+----------------------------------------------------+
| ``context_paths``    | list             | Paths with permissions for file operations.        |
|                      |                  | Each entry: str or {"path": str, "permission": str}|
+----------------------+------------------+----------------------------------------------------+
| ``enable_logging``   | bool             | Enable logging and return log directory            |
+----------------------+------------------+----------------------------------------------------+
| ``output_file``      | str              | Write final answer to this file path               |
+----------------------+------------------+----------------------------------------------------+

.. note::
   When using ``massgen/build``, either ``models`` (list) or ``model`` (single) should be provided.
   If using ``model``, you can also specify ``num_agents`` to control how many agents use that model.

.. tip::
   For more advanced configurations (custom system prompts, MCP tools, specific orchestration settings, etc.),
   create a YAML config file and use ``massgen/path:/path/to/config.yaml`` instead. See
   :doc:`../../reference/yaml_schema` for the full configuration schema.
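
The rule in the note above can be checked before the request is sent. The following is a minimal sketch: ``build_optional_params`` is a hypothetical helper, not part of MassGen, and it assumes that passing both ``models`` and ``model`` (or neither) is a mistake worth rejecting early.

.. code-block:: python

   from typing import Any, Optional


   def build_optional_params(
       models: Optional[list] = None,
       model: Optional[str] = None,
       num_agents: Optional[int] = None,
       **extra: Any,
   ) -> dict:
       """Assemble an ``optional_params`` dict for ``massgen/build`` (hypothetical helper)."""
       # Assumption: exactly one of ``models`` / ``model`` should be set.
       if bool(models) == bool(model):
           raise ValueError("Provide either 'models' (list) or 'model' (str), not both or neither")
       if num_agents is not None and not model:
           raise ValueError("'num_agents' only applies when 'model' is used")

       params: dict = {"models": list(models)} if models else {"model": model}
       if num_agents is not None:
           params["num_agents"] = num_agents
       # Pass through other documented parameters such as use_docker or output_file.
       params.update(extra)
       return params

The returned dict can be passed as ``optional_params=...`` to ``litellm.completion``, as in the examples above.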

Accessing Coordination Metadata
-------------------------------

MassGen stores coordination metadata in the response's ``_hidden_params`` attribute.
This follows LiteLLM's convention for provider-specific metadata:

.. code-block:: python

   import litellm
   from massgen import register_with_litellm

   register_with_litellm()

   response = litellm.completion(
       model="massgen/build",
       messages=[{"role": "user", "content": "Compare AI approaches"}],
       optional_params={
           "models": ["openai/gpt-5", "anthropic/claude-sonnet-4-5-20250929"],
       }
   )

   # Access the final answer (standard LiteLLM)
   print(response.choices[0].message.content)

   # Access MassGen coordination metadata
   metadata = response._hidden_params

   # Basic metadata
   print(f"Config used: {metadata['massgen_config_used']}")
   print(f"Session ID: {metadata['massgen_session_id']}")

   # Log directory pointers
   print(f"Log directory: {metadata['massgen_log_directory']}")
   print(f"Final answer path: {metadata['massgen_final_answer_path']}")

   # Coordination metadata (multi-agent mode)
   print(f"Selected agent: {metadata['massgen_selected_agent']}")

   # Voting details
   vote_results = metadata['massgen_vote_results']
   if vote_results:
       print(f"Winner: {vote_results['winner']}")
       print(f"Vote counts: {vote_results['vote_counts']}")
       print(f"Was tie: {vote_results['is_tie']}")

       # See why each agent voted
       for agent_id, voters in vote_results['voter_details'].items():
           for vote in voters:
               print(f"  {vote['voter']} voted for {agent_id}: {vote['reason']}")

   # All answers with labels and log paths
   answers = metadata['massgen_answers']
   if answers:
       for answer in answers:
           print(f"[{answer['label']}] {answer['agent_id']}")
           print(f"  Path: {answer['answer_path']}")
           print(f"  Content: {answer['content'][:100]}...")

**Available Metadata Fields:**

+-------------------------------+------------------+--------------------------------------------------+
| Field                         | Type             | Description                                      |
+===============================+==================+==================================================+
| ``massgen_config_used``       | str              | Config path or description                       |
+-------------------------------+------------------+--------------------------------------------------+
| ``massgen_session_id``        | str              | Session ID for the run                           |
+-------------------------------+------------------+--------------------------------------------------+
| ``massgen_log_directory``     | str or None      | Root log directory path                          |
+-------------------------------+------------------+--------------------------------------------------+
| ``massgen_final_answer_path`` | str or None      | Path to final/ directory with winning answer     |
+-------------------------------+------------------+--------------------------------------------------+
| ``massgen_selected_agent``    | str or None      | Anonymous ID of winning agent (e.g., "agent_a")  |
+-------------------------------+------------------+--------------------------------------------------+
| ``massgen_vote_results``      | dict or None     | Voting details with anonymous agent IDs          |
+-------------------------------+------------------+--------------------------------------------------+
| ``massgen_answers``           | list or None     | Answers with label, anonymous agent_id, path     |
+-------------------------------+------------------+--------------------------------------------------+
| ``massgen_agent_mapping``     | dict or None     | Maps anonymous IDs to real agent IDs             |
+-------------------------------+------------------+--------------------------------------------------+

Each entry in ``massgen_answers`` contains:

- ``label``: Answer label in ``agent{N}.{attempt}`` format (e.g., "agent1.1", "agent2.1")
- ``agent_id``: Anonymous agent ID (e.g., "agent_a", "agent_b")
- ``answer_path``: Full filesystem path to the answer snapshot in logs
- ``content``: The answer text

Use ``massgen_agent_mapping`` to look up the real agent ID if needed:

.. code-block:: python

   mapping = metadata['massgen_agent_mapping']
   real_id = mapping['agent_a']  # e.g., "openrouter-fast1"

.. note::
   Coordination metadata fields (``massgen_selected_agent``, ``massgen_vote_results``, etc.) are only populated
   in multi-agent mode. In single-agent mode, these fields are ``None``.
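
Because these fields may be ``None``, lookups are safer when wrapped in a small guard. The sketch below uses a hypothetical helper, ``resolve_winner``, that takes the metadata dict and returns the real agent ID of the winner. Falling back to the anonymous ID when the mapping has no entry is an assumption of this example, not documented MassGen behavior.

.. code-block:: python

   from typing import Optional


   def resolve_winner(metadata: dict) -> Optional[str]:
       """Return the real agent ID of the winner, or None in single-agent mode."""
       anonymous_id = metadata.get("massgen_selected_agent")
       if anonymous_id is None:
           return None
       # The mapping itself may be None, so fall back to an empty dict.
       mapping = metadata.get("massgen_agent_mapping") or {}
       # Assumption: if the mapping lacks the ID, return the anonymous ID unchanged.
       return mapping.get(anonymous_id, anonymous_id)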

Advanced: Accessing Log Files
-----------------------------

Read answer files directly from the log directory:

.. code-block:: python

   from pathlib import Path

   metadata = response._hidden_params
   log_dir = metadata.get("massgen_log_directory")

   # List log contents
   if log_dir:
       for item in Path(log_dir).iterdir():
           print(f"  {item.name}")

   # Read specific answer files
   for answer in metadata.get("massgen_answers") or []:  # field may be None
       if answer.get("answer_path"):
           answer_file = Path(answer["answer_path"]) / "answer.txt"
           if answer_file.exists():
               print(f"{answer['agent_id']}: {answer_file.read_text()[:100]}...")

Advanced: Export to JSON
------------------------

Save structured results for pipelines:

.. code-block:: python

   import json

   metadata = response._hidden_params
   results = {
       "final_answer": response.choices[0].message.content,
       "winner": metadata.get("massgen_selected_agent"),
       # These fields are None in single-agent mode, so guard with "or"
       "vote_counts": (metadata.get("massgen_vote_results") or {}).get("vote_counts"),
       "answers": [
           {"agent": a["agent_id"], "label": a["label"], "content": a["content"]}
           for a in metadata.get("massgen_answers") or []
       ],
   }

   with open("massgen_results.json", "w") as f:
       json.dump(results, f, indent=2)


CLI --output-file Flag
======================

For batch processing and automation, use the ``--output-file`` flag to save the final answer directly to a file:

.. code-block:: bash

   # Save answer to specific file
   massgen --config my_config.yaml --output-file /tmp/answer.txt "Your question"

   # Works with automation mode
   massgen --automation --config my_config.yaml --output-file /tmp/answer.txt "Your question"

   # Output includes OUTPUT_FILE path for easy parsing
   # OUTPUT_FILE: /tmp/answer.txt

This is especially useful for:

- Batch processing multiple questions
- Integration with shell scripts
- LLM agents that need to retrieve answers programmatically

**Example batch script:**

.. code-block:: bash

   #!/bin/bash
   questions=("Question 1" "Question 2" "Question 3")

   for i in "${!questions[@]}"; do
       massgen --automation --config config.yaml \
           --output-file "/tmp/answer_${i}.txt" \
           "${questions[$i]}"
   done

   # Results are in /tmp/answer_0.txt, /tmp/answer_1.txt, etc.
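
When the batch is driven from Python rather than a shell, the ``OUTPUT_FILE`` line can be extracted from captured output. This is a minimal sketch that assumes the line has the form ``OUTPUT_FILE: <path>``, as in the comment shown earlier; ``parse_output_file`` is a hypothetical helper name.

.. code-block:: python

   from typing import Optional


   def parse_output_file(stdout: str) -> Optional[str]:
       """Return the path from the first ``OUTPUT_FILE: <path>`` line, if any."""
       prefix = "OUTPUT_FILE:"
       for line in stdout.splitlines():
           stripped = line.strip()
           if stripped.startswith(prefix):
               return stripped[len(prefix):].strip()
       return None

The input would typically be the ``stdout`` attribute of ``subprocess.run(..., capture_output=True, text=True)``.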


Integration Patterns
====================

Evaluation Workflows
--------------------

Use MassGen in your evaluation pipelines:

.. code-block:: python

   import asyncio
   import massgen

   async def evaluate_questions(questions: list, config: str) -> list:
       """Run MassGen on a list of questions and collect results."""
       results = []
       for q in questions:
           result = await massgen.run(
               query=q,
               config=config,
               enable_logging=True
           )
           results.append({
               "question": q,
               "answer": result["final_answer"],
               "log_dir": result.get("log_directory")
           })
       return results

   # Run evaluation
   questions = [
       "What is machine learning?",
       "Explain neural networks",
       "Compare supervised and unsupervised learning"
   ]

   results = asyncio.run(evaluate_questions(
       questions,
       config="@examples/basic_multi"
   ))

   for r in results:
       print(f"Q: {r['question']}")
       print(f"A: {r['answer'][:200]}...")
       print()

LangChain Integration
---------------------

Use MassGen as a LangChain LLM via LiteLLM:

.. code-block:: python

   from langchain_community.chat_models import ChatLiteLLM
   from massgen import register_with_litellm

   register_with_litellm()

   llm = ChatLiteLLM(model="massgen/basic_multi")
   response = llm.invoke("Compare different AI architectures")
   print(response.content)


Checking LiteLLM Availability
=============================

Check whether LiteLLM is installed before using the integration:

.. code-block:: python

   from massgen import LITELLM_AVAILABLE, register_with_litellm

   if LITELLM_AVAILABLE:
       register_with_litellm()
       # Use LiteLLM integration
   else:
       print("LiteLLM not installed. Install with: pip install massgen[litellm]")
       # Fall back to direct API


Next Steps
==========

- **See** :doc:`automation` for LLM agent automation guide
- **Read** :doc:`../../reference/cli` for all CLI options
- **Check** :doc:`../../reference/yaml_schema` for configuration details
- **Browse** :doc:`../../examples/basic_examples` for working examples


---

## user_guide/logging.rst

Logging & Debugging
===================

MassGen provides comprehensive logging to help you understand agent coordination, debug issues, and review decision-making processes.

Logging Directory Structure
----------------------------

All logs are stored in the ``.massgen/massgen_logs/`` directory with timestamped subdirectories:

.. code-block:: text

   .massgen/
   └── massgen_logs/
       └── log_YYYYMMDD_HHMMSS/           # Timestamped log directory
           ├── agent_a/                    # Agent-specific coordination logs
           │   └── YYYYMMDD_HHMMSS_NNNNNN/ # Timestamped coordination steps
           │       ├── answer.txt          # Agent's answer at this step
           │       ├── changedoc.md        # Decision journal snapshot (if changedoc enabled)
           │       ├── context.txt         # Context available to agent
           │       ├── execution_trace.md  # Full tool calls, results, and reasoning
           │       └── workspace/          # Agent workspace (if filesystem tools used)
           ├── agent_b/                    # Second agent's logs
           │   └── ...
           ├── agent_outputs/              # Consolidated output files
           │   ├── agent_a.txt             # Complete output from agent_a
           │   ├── agent_b.txt             # Complete output from agent_b
           │   ├── final_presentation_agent_X.txt  # Winning agent's final answer
           │   ├── final_presentation_agent_X_latest.txt  # Symlink to latest
           │   └── system_status.txt       # System status and metadata
           ├── final/                      # Final presentation phase
           │   └── agent_X/                # Winning agent's final work
           │       ├── answer.txt          # Final answer
           │       ├── changedoc.md        # Consolidated decision journal (if changedoc enabled)
           │       └── context.txt         # Final context
           ├── coordination_events.json    # Structured coordination events
           ├── coordination_table.txt      # Human-readable coordination table
           ├── vote.json                   # Final vote tallies and consensus data
           ├── massgen.log                 # Complete debug log (or massgen_debug.log in debug mode)
           ├── snapshot_mappings.json      # Workspace snapshot metadata
           └── execution_metadata.yaml     # Query, config, and execution details

.. note::
   When agents use filesystem tools, each coordination step will also contain a ``workspace/`` directory showing the files the agent created or modified during that step.

Per-Attempt Logging (Orchestration Restart)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When **orchestration restart** is enabled, each restart attempt gets its own isolated directory:

.. code-block:: text

   .massgen/massgen_logs/log_YYYYMMDD_HHMMSS/
   ├── attempt_1/          # First attempt (complete log structure)
   ├── attempt_2/          # Second attempt after restart
   ├── attempt_3/          # Third attempt if needed
   └── final/              # Copy of accepted result

For multi-turn sessions, the same layout is nested under each turn: ``turn_1/attempt_1/``, ``turn_1/attempt_2/``, ``turn_1/final/``.

.. seealso::
   :doc:`sessions/orchestration_restart` - Learn about automatic quality checks and restart workflows
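
The per-attempt layout can be walked with a few lines of standard-library code. The sketch below is illustrative only: ``list_attempts`` is a hypothetical helper, and it assumes attempt directories are named exactly ``attempt_<number>`` as in the tree above.

.. code-block:: python

   import re
   from pathlib import Path


   def list_attempts(log_dir: Path) -> list:
       """Return ``attempt_N`` directories under ``log_dir`` in numeric order."""
       attempts = []
       for child in log_dir.iterdir():
           match = re.fullmatch(r"attempt_(\d+)", child.name)
           if child.is_dir() and match:
               attempts.append((int(match.group(1)), child))
       # Sort numerically so attempt_10 comes after attempt_2.
       return [path for _, path in sorted(attempts)]

Sorting on the parsed number rather than the directory name avoids the ``attempt_10`` before ``attempt_2`` ordering that a plain string sort would give.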

Log Files Explained
-------------------

Agent Coordination Logs
~~~~~~~~~~~~~~~~~~~~~~~~

**Location**: ``agent_<id>/YYYYMMDD_HHMMSS_NNNNNN/``

Each coordination step gets a timestamped directory containing:

* ``answer.txt`` - The agent's answer/proposal at this step
* ``context.txt`` - What answers/context the agent could see (recent answers from other agents)
* ``execution_trace.md`` - Complete execution history with full tool calls, results, and reasoning

**Use cases:**

* Review what each agent proposed during coordination
* Understand how agents' thinking evolved as they saw other agents' work
* Debug why specific decisions were made
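
To review an agent's most recent proposal programmatically, the step directories can be sorted by name. This is a minimal sketch under the layout described above; ``latest_answer`` is a hypothetical helper, and ``agent_id`` is assumed to be the directory name (for example ``agent_a``).

.. code-block:: python

   from pathlib import Path
   from typing import Optional


   def latest_answer(log_dir: Path, agent_id: str) -> Optional[str]:
       """Return the text of the newest ``answer.txt`` for one agent, or None."""
       agent_dir = log_dir / agent_id
       if not agent_dir.is_dir():
           return None
       # Step directories are named YYYYMMDD_HHMMSS_NNNNNN, so a plain
       # lexicographic sort is also a chronological sort.
       steps = sorted(p for p in agent_dir.iterdir() if p.is_dir())
       for step in reversed(steps):
           answer_file = step / "answer.txt"
           if answer_file.exists():
               return answer_file.read_text(encoding="utf-8")
       return None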

Execution Traces
~~~~~~~~~~~~~~~~

**Location**: ``agent_<id>/YYYYMMDD_HHMMSS_NNNNNN/execution_trace.md``

Execution traces are the most detailed debug artifacts available. Each trace captures the complete execution history for that answer/vote iteration:

**Contents:**

* **Tool calls** - Complete tool names and arguments (not truncated)
* **Tool results** - Full output from each tool (not truncated)
* **Reasoning blocks** - Model's internal thinking/chain-of-thought (if available)
* **Round markers** - Which coordination round the activity occurred in
* **Timestamps** - When each action occurred

**Example execution trace:**

.. code-block:: markdown

   # Execution Trace: agent_a
   **Model**: gemini-2.5-flash | **Started**: 2025-01-10 13:56:31

   ## Round 1 (Answer 1.1)

   ### Tool Call: mcp__filesystem__read_file
   **Args**:
   ```json
   {"path": "/workspace/main.py"}
   ```

   ### Tool Result: mcp__filesystem__read_file
   ```
   def main():
       print("Hello world")
       # ... full file content
   ```

   ### Reasoning
   I need to understand the existing code structure before making changes.
   The main.py file shows a simple entry point...

   ### Answer Submitted (1.1)
   Created the requested feature with proper error handling...

**Use cases:**

* **Deep debugging** - See exactly what an agent did and why
* **Compression recovery** - Agents can read their own trace to recover lost context
* **Cross-agent analysis** - Understand how other agents approached the problem
* **Tool failure analysis** - Full arguments and error messages for failed tools

**Accessing traces:**

.. code-block:: bash

   # View an agent's execution trace for a specific step
   cat .massgen/massgen_logs/log_20251010_135631/agent_a/20251010_135655_287787/execution_trace.md

   # Search for specific tool calls across all traces
   grep -r "Tool Call:" .massgen/massgen_logs/log_*/agent_*/*/execution_trace.md

   # Find traces with errors
   grep -l "Tool Error:" .massgen/massgen_logs/log_*/agent_*/*/execution_trace.md
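
For aggregate statistics rather than raw matches, the same headings can be counted in Python. The sketch below assumes tool calls appear as ``### Tool Call: <name>`` headings, as in the example trace above; ``count_tool_calls`` is a hypothetical helper.

.. code-block:: python

   import re
   from collections import Counter

   # Matches headings such as "### Tool Call: mcp__filesystem__read_file".
   TOOL_CALL_HEADING = re.compile(r"^\s*###\s+Tool Call:\s*(\S+)", re.MULTILINE)


   def count_tool_calls(trace_text: str) -> Counter:
       """Count tool invocations by name in one execution trace."""
       return Counter(TOOL_CALL_HEADING.findall(trace_text))

Calling ``count_tool_calls(path.read_text())`` on each ``execution_trace.md`` and summing the counters gives a per-run tool usage profile.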

Consolidated Agent Outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Location**: ``agent_outputs/``

Contains merged outputs from all coordination rounds:

* ``agent_<id>.txt`` - Complete output history for each agent
* ``final_presentation_agent_<id>.txt`` - Winning agent's final presentation
* ``final_presentation_agent_<id>_latest.txt`` - Symlink to latest (for automation)
* ``system_status.txt`` - System metadata and status

Final Presentation
~~~~~~~~~~~~~~~~~~

**Location**: ``final/agent_<id>/``

The winning agent's final answer after coordination:

* ``answer.txt`` - Complete final answer
* ``context.txt`` - Final context used for presentation

Coordination Events
~~~~~~~~~~~~~~~~~~~

**Location**: ``coordination_events.json``

Structured JSON log of all coordination events:

.. code-block:: json

   {
     "event_id": "E42",
     "timestamp": "2025-10-08T01:40:29",
     "agent_id": "agent_a",
     "event_type": "vote",
     "data": {
       "vote_for": "agent_b.2",
       "reason": "More comprehensive approach..."
     }
   }

**Event types:**

* ``started_streaming`` - Agent begins thinking
* ``new_answer`` - Agent provides labeled answer
* ``vote`` - Agent votes for an answer
* ``restart`` - Agent requests restart
* ``restart_completed`` - Agent finishes restart
* ``final_answer`` - Winner provides final response
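
The event types above lend themselves to simple tallies. The following sketch assumes ``coordination_events.json`` holds a JSON array of objects shaped like the example (that top-level shape is an assumption); both helper names are hypothetical.

.. code-block:: python

   from collections import Counter


   def count_event_types(events: list) -> Counter:
       """Tally ``event_type`` values across coordination events."""
       return Counter(event.get("event_type", "unknown") for event in events)


   def votes_in_order(events: list) -> list:
       """Return (voter, voted_for) pairs sorted by timestamp."""
       votes = [e for e in events if e.get("event_type") == "vote"]
       # ISO 8601 timestamps without offsets sort correctly as strings.
       votes.sort(key=lambda e: e.get("timestamp", ""))
       return [(e["agent_id"], e.get("data", {}).get("vote_for")) for e in votes]

The ``events`` argument would be the result of ``json.load`` on the events file.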

Vote Summary
~~~~~~~~~~~~

**Location**: ``vote.json``

Final vote tallies and consensus information:

.. code-block:: json

   {
     "votes": {
       "agent_a": {
         "voted_for": "agent_b",
         "reason": "More comprehensive analysis"
       },
       "agent_b": {
         "voted_for": "agent_b",
         "reason": "Best captures key insights"
       }
     },
     "winner": "agent_b",
     "consensus_reached": true
   }

**Use cases:**

* Understand final consensus decision
* Review voting patterns across agents
* Analyze decision-making rationale
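
Voting patterns can be summarized directly from this structure. This is a minimal sketch based only on the ``vote.json`` example above; ``tally_votes`` and ``is_unanimous`` are hypothetical helpers, and any fields not shown in the example are ignored.

.. code-block:: python

   from collections import Counter


   def tally_votes(vote_data: dict) -> Counter:
       """Count how many votes each agent received in a ``vote.json`` payload."""
       votes = vote_data.get("votes") or {}
       return Counter(entry["voted_for"] for entry in votes.values())


   def is_unanimous(vote_data: dict) -> bool:
       """True when at least one vote was cast and every vote went to one agent."""
       return len(tally_votes(vote_data)) == 1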

Main Debug Log
~~~~~~~~~~~~~~

**Location**: ``massgen.log``

Complete debug log with all system operations:

* Backend API calls and responses
* Tool usage and results
* Coordination state transitions
* Error messages and stack traces

Pass the ``--debug`` flag for verbose logging. In debug mode the file is named ``massgen_debug.log``.

Execution Metadata
~~~~~~~~~~~~~~~~~~

**Location**: ``execution_metadata.yaml``

This file captures the complete execution context for reproducibility:

.. code-block:: yaml

   query: "Your original question"
   timestamp: "2025-10-13T14:30:22"
   config_path: "/path/to/config.yaml"
   config:
     agents:
       - id: "agent1"
         backend:
           type: "gemini"
           model: "gemini-2.5-flash"
       # ... full config
   cli_args:
     config: "/path/to/config.yaml"
     question: "Your original question"
     debug: false
     # ... all CLI arguments
   git:
     commit: "a1b2c3d4e5f6..."
     branch: "main"
   python_version: "3.13.0"
   massgen_version: "0.0.33"
   working_directory: "/path/to/project"

**Contents:**

* ``query`` - The user's original query/prompt
* ``timestamp`` - When the execution started (ISO 8601 format)
* ``config_path`` - Path or description of config used
* ``config`` - Complete configuration (full YAML/JSON content)
* ``cli_args`` - All command-line arguments passed to massgen
* ``git`` - Git repository info (commit hash, branch) if in a git repo
* ``python_version`` - Python interpreter version
* ``massgen_version`` - MassGen package version
* ``working_directory`` - Current working directory

**Use cases:**

* **Reproduce the exact same run** - All information needed to recreate execution
* **Debug configuration issues** - Full config and CLI args captured
* **Share execution details** - Send metadata file to team members
* **Create test cases** - Convert real runs into regression tests
* **Track experiments** - Git commit ensures you know which code version was used
* **Environment debugging** - Python version and working directory help diagnose environment issues

**Multi-turn sessions:**

For interactive multi-turn mode, each turn gets its own ``execution_metadata.yaml`` with additional fields:

.. code-block:: yaml

   # ... standard fields above ...
   cli_args:
     mode: "interactive"
     turn: 3
     session_id: "session_20251013_143022"

Coordination Table
------------------

The **coordination table** (``coordination_table.txt``) is a human-readable visualization of the entire multi-agent coordination process.

Structure
~~~~~~~~~

.. code-block:: text

   +-------------------------------------------------------------------+
   |   Event  |           Agent 1           |           Agent 2           |
   |----------+-----------------------------+-----------------------------+
   |   USER   | Original user question                                     |
   |==========+=============================+=============================+
   |     E1   |     📋 Context: []          |      ⏳ (waiting)            |
   |          |  💭 Started streaming       |                             |
   |----------+-----------------------------+-----------------------------+
   |     E2   |     🔄 (streaming)          |   ✨ NEW ANSWER: agent2.1   |
   |          |                             |👁️  Preview: Summary...      |
   |----------+-----------------------------+-----------------------------+

**Key sections:**

1. **Header** - Event symbols, status symbols, and terminology
2. **Event log** - Chronological coordination events
3. **Summary** - Final statistics per agent
4. **Totals** - Overall coordination metrics

Event Symbols
~~~~~~~~~~~~~

**Actions:**

* 💭 Started streaming - Agent begins thinking/processing
* ✨ NEW ANSWER - Agent provides a labeled answer
* 🗳️ VOTE - Agent votes for an answer
* 💭 Reason - Reasoning behind the vote
* 👁️ Preview - Content of the answer
* 🔁 RESTART TRIGGERED - Agent requests to restart
* ✅ RESTART COMPLETED - Agent finishes restart
* 🎯 FINAL ANSWER - Winner provides final response
* 🏆 Winner selected - System announces winner

**Status:**

* 💭 (streaming) - Currently thinking/processing
* ⏳ (waiting) - Idle, waiting for turn
* ✅ (answered) - Has provided an answer
* ✅ (voted) - Has cast a vote
* ✅ (completed) - Task completed
* 🎯 (final answer given) - Winner completed final answer

Answer Labels
~~~~~~~~~~~~~

Each answer gets a unique identifier:

**Format**: ``agent{N}.{attempt}``

* ``N`` = Agent number (1, 2, 3...)
* ``attempt`` = New answer number (1, 2, 3...)

**Examples:**

* ``agent1.1`` = Agent 1's first answer
* ``agent2.1`` = Agent 2's first answer
* ``agent1.2`` = Agent 1's second answer (after restart)
* ``agent1.final`` = Agent 1's final answer (if winner)
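
Labels in this format are easy to split into their parts. The sketch below covers only the ``agent{N}.{attempt}`` and ``agent{N}.final`` forms listed above; ``parse_answer_label`` and ``AnswerLabel`` are hypothetical names introduced for illustration.

.. code-block:: python

   import re
   from typing import NamedTuple, Union

   LABEL_PATTERN = re.compile(r"agent(\d+)\.(\d+|final)")


   class AnswerLabel(NamedTuple):
       agent_number: int
       attempt: Union[int, str]  # an int, or the string "final"


   def parse_answer_label(label: str) -> AnswerLabel:
       """Split a label such as ``agent1.2`` into agent number and attempt."""
       match = LABEL_PATTERN.fullmatch(label)
       if match is None:
           raise ValueError(f"Not a valid answer label: {label!r}")
       agent, attempt = match.groups()
       return AnswerLabel(int(agent), attempt if attempt == "final" else int(attempt))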

Coordination Flow
~~~~~~~~~~~~~~~~~

The table shows how agents coordinate:

1. **Agents see recent answers** - Each agent can view the most recent answers from other agents
2. **Decide next action** - Each agent chooses to either:

   * Provide a new/refined answer
   * Vote for an existing answer they think is best

3. **All agents vote** - Coordination continues until all agents have voted
4. **Final presentation** - The agent with the most votes delivers the final answer

**Example interpretation:**

.. code-block:: text

   E7: Agent 1 provides answer agent1.1
   E13: Agent 1 votes for agent1.1 (self-vote)
   E19: Agent 2 votes for agent1.1 (consensus!)
   E39: Agent 1 selected as winner
   E39: Agent 1 provides final answer

**What agents see:**

During coordination, agents see snapshots of each other's work through workspace snapshots and answer context. This allows agents to build on insights, catch errors, and converge on the best solution.

Summary Statistics
~~~~~~~~~~~~~~~~~~

At the bottom of the coordination table:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Metric
     - Description
   * - **Answers**
     - Number of distinct answers provided
   * - **Votes**
     - Number of votes cast
   * - **Restarts**
     - Number of times the agent restarted (clearing its memory)
   * - **Status**
     - Final completion status

Accessing Logs
--------------

Log Analysis Commands
~~~~~~~~~~~~~~~~~~~~~

MassGen provides the ``massgen logs`` command for quick log analysis without manual file navigation.

**Summary of most recent run:**

.. code-block:: bash

   massgen logs

   # Example output:
   # ╭──────────────────────────── MassGen Run Summary ─────────────────────────────╮
   # │ Create a website about Bob Dylan                                             │
   # │                                                                               │
   # │ Winner: agent_a | Agents: 1 | Duration: 7.2m | Cost: $0.54                   │
   # ╰───────────────────────────────────────────────────────────────────────────────╯
   #
   # Tokens: Input: 6,035,629 | Output: 21,279 | Reasoning: 7,104
   #
   # Rounds (5): answer: 1 | vote: 1 | presentation: 2 | post_evaluation: 1
   #   Errors: 0 | Timeouts: 0
   #
   # ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━┳━━━━━━┓
   # ┃ Tool                                      ┃ Calls ┃  Time ┃  Avg ┃ Fail ┃
   # ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━╇━━━━━━┩
   # │ mcp__command_line__execute_command        │    47 │  4.4s │ 94ms │      │
   # │ mcp__planning__update_task_status         │    13 │ 228ms │ 18ms │      │
   # └───────────────────────────────────────────┴───────┴───────┴──────┴──────┘

**Available subcommands:**

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Command
     - Description
   * - ``massgen logs`` or ``massgen logs summary``
     - Display run summary with tokens, rounds, and top tools
   * - ``massgen logs tools``
     - Full tool breakdown table sorted by execution time
   * - ``massgen logs tools --sort calls``
     - Sort tools by call count instead of time
   * - ``massgen logs list``
     - List recent runs with timestamps, costs, and questions
   * - ``massgen logs list --limit 20``
     - Show more runs (default: 10)
   * - ``massgen logs open``
     - Open log directory in system file manager (Finder/Explorer)

**Filtering by analysis status:**

.. code-block:: bash

   # Show which logs have been analyzed (have ANALYSIS_REPORT.md)
   massgen logs list                    # Shows "Analyzed" column with ✓ for analyzed logs
   massgen logs list --analyzed         # Only logs with ANALYSIS_REPORT.md
   massgen logs list --unanalyzed       # Only logs without analysis

**Common options:**

.. code-block:: bash

   # Analyze a specific log directory
   massgen logs --log-dir .massgen/massgen_logs/log_20251218_134125_867383/turn_1/attempt_1

   # Output raw JSON for scripting
   massgen logs summary --json

**Tool breakdown example:**

.. code-block:: bash

   massgen logs tools

   # ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━┳━━━━━━┓
   # ┃ Tool                                      ┃ Calls ┃  Time ┃  Avg ┃ Fail ┃
   # ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━╇━━━━━━┩
   # │ mcp__command_line__execute_command        │    47 │  4.4s │ 94ms │      │
   # │ mcp__planning__update_task_status         │    13 │ 228ms │ 18ms │      │
   # │ mcp__filesystem__write_file               │     7 │ 181ms │ 26ms │      │
   # │ mcp__planning__create_task_plan           │     2 │  36ms │ 18ms │      │
   # ├───────────────────────────────────────────┼───────┼───────┼──────┼──────┤
   # │ TOTAL                                     │    69 │  4.8s │      │      │
   # └───────────────────────────────────────────┴───────┴───────┴──────┴──────┘

**List recent runs:**

.. code-block:: bash

   massgen logs list --limit 5

   # ┏━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
   # ┃ # ┃ Timestamp        ┃ Duration ┃  Cost ┃ Analyzed ┃ Question                    ┃
   # ┡━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
   # │ 1 │ 2025-12-18 13:41 │     7.2m │ $0.54 │    ✓     │ Create a website about...   │
   # │ 2 │ 2025-12-17 23:01 │    16.2m │ $1.23 │    -     │ Build a REST API...         │
   # │ 3 │ 2025-12-17 22:30 │     3.1m │ $0.12 │    ✓     │ Explain quantum computing...│
   # └───┴──────────────────┴──────────┴───────┴──────────┴─────────────────────────────┘

Analyzing Logs
~~~~~~~~~~~~~~

The ``massgen logs analyze`` command helps you generate analysis reports for log sessions.

**Generate analysis prompt (for coding CLIs):**

.. code-block:: bash

   # Generate a prompt to use in Claude Code, Cursor, etc.
   massgen logs analyze                 # Analyze latest log
   massgen logs analyze --log-dir PATH  # Analyze specific log

This outputs a prompt that references the ``massgen-log-analyzer`` skill, which you can paste into your coding CLI.

**Run multi-agent self-analysis:**

.. code-block:: bash

   # Run a MassGen agent team to analyze the log from different perspectives
   massgen logs analyze --mode self

   # Choose UI mode (default: rich_terminal)
   massgen logs analyze --mode self --ui automation   # Headless mode
   massgen logs analyze --mode self --ui webui        # Web UI mode

   # Use custom analysis config
   massgen logs analyze --mode self --config my_analysis.yaml

Self-analysis mode:

* Runs a 2-agent team using Gemini Flash with Docker execution
* Agents analyze from different perspectives (correctness, efficiency, behavior)
* Produces an ``ANALYSIS_REPORT.md`` in the log directory
* Log directory is mounted read-only to protect existing files

.. note::
   Self-analysis mode currently requires a **Gemini API key** (``GEMINI_API_KEY``). To use other models,
   edit ``massgen/configs/analysis/log_analysis.yaml`` or create a new config, then pass it to the
   ``analyze`` command with ``--config``.

   For Logfire integration, also set ``LOGFIRE_READ_TOKEN`` in your ``.env`` file.
   Without it, agents use local log files only.

During Execution
~~~~~~~~~~~~~~~~

**Press the 'r' key** during execution to view the real-time coordination table in your terminal.

After Execution
~~~~~~~~~~~~~~~

**Find latest log directory:**

.. code-block:: bash

   # Using massgen logs open (recommended)
   massgen logs open

   # Or manually
   ls -t .massgen/massgen_logs/ | head -1

**View coordination table:**

.. code-block:: bash

   cat .massgen/massgen_logs/log_20251008_013641/coordination_table.txt

**View specific agent output:**

.. code-block:: bash

   cat .massgen/massgen_logs/log_20251008_013641/agent_outputs/agent_a.txt

**View final answer:**

.. code-block:: bash

   cat .massgen/massgen_logs/log_20251008_013641/agent_outputs/final_presentation_*_latest.txt

Debug Mode
----------

Enable detailed logging with the ``--debug`` flag:

.. code-block:: bash

   uv run python -m massgen.cli \
     --debug \
     --config your_config.yaml \
     "Your question"

**What debug mode logs:**

* ✅ Full API request/response bodies
* ✅ Tool call arguments and results
* ✅ Coordination state transitions
* ✅ File operation details
* ✅ MCP server communication
* ✅ Error stack traces

**Debug log location**: ``.massgen/massgen_logs/log_YYYYMMDD_HHMMSS/massgen_debug.log``

Common Debugging Scenarios
---------------------------

Agent Not Converging
~~~~~~~~~~~~~~~~~~~~

**Check**: ``coordination_table.txt``

Look for:

* Agents changing votes frequently
* New answers in every round
* No clear vote majority

**Solution**: Review agent answers to understand disagreement points.

Agent Errors
~~~~~~~~~~~~

**Check**: ``massgen.log`` for error messages

**Search for**:

.. code-block:: bash

   grep -i "error" .massgen/massgen_logs/log_*/massgen.log
   grep -i "exception" .massgen/massgen_logs/log_*/massgen.log

Tool Failures
~~~~~~~~~~~~~

**Check**: ``agent_outputs/agent_<id>.txt``

Look for tool call failures and error messages.

**Also check**: ``massgen.log`` for detailed tool execution logs

Understanding Agent Decisions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Review coordination rounds:**

1. Open ``coordination_table.txt``
2. Find the round where decision changed
3. Check ``agent_<id>/YYYYMMDD_HHMMSS_NNNNNN/context.txt`` to see what the agent could see
4. Check ``agent_<id>/YYYYMMDD_HHMMSS_NNNNNN/answer.txt`` for the agent's reasoning
5. Check ``agent_<id>/YYYYMMDD_HHMMSS_NNNNNN/execution_trace.md`` for complete tool usage and thinking
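
Snapshot directories are named with a ``YYYYMMDD_HHMMSS_NNNNNN`` timestamp, so they can be ordered chronologically when tracing how an agent's answer evolved. A small sketch follows, assuming the trailing ``NNNNNN`` field is microseconds (this guide does not define it) and using a hypothetical helper ``sorted_snapshots``:

.. code-block:: python

   from datetime import datetime
   from pathlib import Path

   SNAPSHOT_FORMAT = "%Y%m%d_%H%M%S_%f"  # assumed meaning of YYYYMMDD_HHMMSS_NNNNNN


   def sorted_snapshots(agent_dir):
       """Return (timestamp, path) pairs for snapshot directories, oldest first."""
       snapshots = []
       for child in Path(agent_dir).iterdir():
           if not child.is_dir():
               continue
           try:
               stamp = datetime.strptime(child.name, SNAPSHOT_FORMAT)
           except ValueError:
               continue  # skip directories that are not timestamped snapshots
           snapshots.append((stamp, child))
       return sorted(snapshots)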

Performance Analysis
~~~~~~~~~~~~~~~~~~~~

**Check summary statistics** in ``coordination_table.txt``:

* High restart count = Agents changing approach frequently
* Low vote count = Quick consensus
* Many answers = Iterative refinement

Log Retention
-------------

Logs are stored indefinitely by default.

**Clean old logs manually:**

.. code-block:: bash

   # Remove logs older than 7 days
   find .massgen/massgen_logs/ -type d -name "log_*" -mtime +7 -exec rm -rf {} +

**Disk space check:**

.. code-block:: bash

   du -sh .massgen/massgen_logs/
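
For scripted cleanup, a similar retention policy can be expressed in Python. This is a sketch rather than a MassGen feature: ``old_log_dirs`` is a hypothetical helper, and it only reports directories so you can review them before deleting anything.

.. code-block:: python

   import time
   from pathlib import Path


   def old_log_dirs(root=".massgen/massgen_logs", max_age_days=7, now=None):
       """Return log_* directories last modified more than max_age_days ago."""
       now = time.time() if now is None else now
       cutoff = now - max_age_days * 86400
       return sorted(
           p for p in Path(root).glob("log_*")
           if p.is_dir() and p.stat().st_mtime < cutoff
       )

   # To delete after reviewing the list:
   #   import shutil
   #   for path in old_log_dirs():
   #       shutil.rmtree(path)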

Best Practices
--------------

1. **Review coordination table first** - Best overview of what happened
2. **Use debug mode for troubleshooting** - Full details when needed
3. **Archive important logs** - Move successful runs to a separate directory
4. **Check final presentation** - Verify winning agent's work quality
5. **Monitor log size** - Clean old logs periodically

Integration with CI/CD
----------------------

**Automated log parsing:**

.. code-block:: python

   import glob
   import json

   # Parse coordination events
   with open(".massgen/massgen_logs/log_latest/coordination_events.json") as f:
       events = json.load(f)

   # Extract final answer. open() does not expand wildcards,
   # so resolve the pattern with glob first.
   matches = glob.glob(
       ".massgen/massgen_logs/log_latest/agent_outputs/final_presentation_*_latest.txt"
   )
   with open(matches[0]) as f:
       final_answer = f.read()

**Exit status:**

MassGen exits with status 0 on success, non-zero on failure.

.. code-block:: bash

   uv run python -m massgen.cli --config config.yaml "Question" && echo "Success"
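
In a Python-based pipeline step, the exit status can be checked with ``subprocess``. The sketch below relies only on the documented behavior (0 on success, non-zero on failure); ``run_step`` is an illustrative helper name.

.. code-block:: python

   import subprocess


   def run_step(command):
       """Run a command and return True if it exited with status 0."""
       completed = subprocess.run(command, capture_output=True, text=True)
       if completed.returncode != 0:
           # Surface stderr so the CI log shows why the step failed.
           print(completed.stderr)
       return completed.returncode == 0

   # Example invocation (same command as the shell example above):
   #   ok = run_step(["uv", "run", "python", "-m", "massgen.cli",
   #                  "--config", "config.yaml", "Question"])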

Sharing Sessions
----------------

MassGen allows you to share session logs via GitHub Gist for easy collaboration and review.

Prerequisites
~~~~~~~~~~~~~

Sharing requires the **GitHub CLI (gh)** to be installed and authenticated:

1. **Install GitHub CLI**:

   - macOS: ``brew install gh``
   - Windows: ``winget install --id GitHub.cli``
   - Linux: See https://cli.github.com/

2. **Authenticate with GitHub**:

   .. code-block:: bash

      gh auth login

   Follow the prompts to authenticate. This is required for creating gists.

Sharing a Session
~~~~~~~~~~~~~~~~~

Use the ``massgen export`` command to share a session:

.. code-block:: bash

   # Share the most recent session (all turns)
   massgen export

   # Share a specific session by log directory name
   massgen export log_20251218_134125_867383

   # Share a specific session by full path
   massgen export /path/to/.massgen/massgen_logs/log_20251218_134125_867383

**Multi-Turn Sessions:**

For sessions with multiple turns, all turns are included by default. Use the ``--turns`` option to select specific turns:

.. code-block:: bash

   # Share only the first 3 turns
   massgen export --turns 3

   # Share turns 2 through 5
   massgen export --turns 2-5

   # Share only the latest turn
   massgen export --turns latest

   # Share all turns (default)
   massgen export --turns all
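
To make the ``--turns`` values concrete, here is an illustrative Python sketch of how such a value could map to turn numbers. It is not MassGen's implementation: ``select_turns`` is a hypothetical function, and clamping out-of-range requests to the available turns is an assumption.

.. code-block:: python

   def select_turns(spec, total_turns):
       """Map a --turns style value to a list of 1-based turn numbers."""
       all_turns = list(range(1, total_turns + 1))
       if spec == "all":
           return all_turns
       if spec == "latest":
           return all_turns[-1:]
       if "-" in spec:
           start, end = (int(part) for part in spec.split("-", 1))
           return [t for t in all_turns if start <= t <= end]
       # A bare number N selects the first N turns.
       return all_turns[: int(spec)]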

**Export Options:**

.. code-block:: bash

   # Preview what would be shared without creating a gist
   massgen export --dry-run

   # Show detailed file listing
   massgen export --verbose

   # Output result as JSON (for scripting)
   massgen export --json

   # Skip interactive prompts (use defaults)
   massgen export --yes

   # Exclude workspace artifacts
   massgen export --no-workspace

   # Set workspace size limit per agent (default: 500KB)
   massgen export --workspace-limit 1MB

**Output:**

.. code-block:: text

   Sharing session from: log_20251218_134125_867383

   Session: log_20251218_134125_867383
   Turns: 3

     ✓ Turn 1 - What is the capital of France?
     ✓ Turn 2 - Tell me more about Paris
     ✓ Turn 3 - What are popular attractions?

   Collecting files...
   Uploading 45 files (1,234,567 bytes)...

   Share URL: https://massgen.github.io/MassGen-Viewer/?gist=abc123def456

   Anyone with this link can view the session (no login required).

The share URL opens the **MassGen Viewer**, a web-based session viewer that displays:

- Session summary (question, winner, cost, duration)
- Agent activity and coordination timeline
- Answers and votes with full content
- Tool usage breakdown
- Configuration used
- Turn navigation (for multi-turn sessions)
- Error details (for failed/interrupted sessions)

**What gets uploaded:**

- Session manifest (``_session_manifest.json``) with turn metadata
- Metrics and status files for all turns
- Coordination events and votes
- Agent answers (intermediate and final)
- Execution metadata (with API keys redacted)
- Workspace artifacts (HTML, CSS, JS, images up to size limit)
- Error information for failed/interrupted sessions

**What is excluded:**

- Large files (>10MB or exceeding workspace limit)
- Debug logs (``massgen.log``)
- Binary files and caches
- Sensitive data (API keys are automatically redacted)
- Files matching sensitive patterns (detected with warning)

**Sharing Error Sessions:**

Failed or interrupted sessions can still be shared for debugging:

.. code-block:: text

   Session: log_20251218_134125_867383
   Turns: 2

     ✓ Turn 1 - What is the capital of France?
     ✗ Turn 2 - Tell me more about Paris

   Warning: This session has errors

The viewer will clearly indicate error status and show error details when available.

Managing Shared Sessions
~~~~~~~~~~~~~~~~~~~~~~~~

**List your shared sessions:**

.. code-block:: bash

   massgen shares list

**Delete a shared session:**

.. code-block:: bash

   massgen shares delete <gist_id>

Authentication Errors
~~~~~~~~~~~~~~~~~~~~~

If you see authentication errors when sharing:

.. code-block:: text

   Error: Not authenticated with GitHub.
   Run 'gh auth login' to enable sharing.

**Solution:** Run ``gh auth login`` and complete the authentication flow.

If the GitHub CLI is not installed:

.. code-block:: text

   Error: GitHub CLI (gh) not found.
   Install it from https://cli.github.com/

**Solution:** Install the GitHub CLI for your platform.

Logfire Observability
---------------------

MassGen supports `Logfire <https://logfire.pydantic.dev/docs/>`_ for advanced structured tracing and observability.

.. note::
   Logfire is an **optional dependency**. Install it with:

   .. code-block:: bash

      pip install "massgen[observability]"

      # Or with uv
      uv pip install "massgen[observability]"

When enabled, Logfire provides:

* **Automatic LLM instrumentation** - Traces OpenAI, Anthropic, and Google GenAI (Gemini) API calls with request/response details
* **Tool execution tracing** - Spans for MCP tool calls with timing and success/failure metrics
* **Coordination events** - Structured logs for agent coordination, voting, and winner selection
* **Token usage metrics** - Detailed tracking of input/output/reasoning/cached tokens
* **Integrated with loguru** - All existing log messages flow through Logfire when enabled

Enabling Logfire
~~~~~~~~~~~~~~~~

**Via CLI flag (recommended):**

.. code-block:: bash

   massgen --logfire --config your_config.yaml "Your question"

**Via environment variable:**

.. code-block:: bash

   export MASSGEN_LOGFIRE_ENABLED=true
   massgen --config your_config.yaml "Your question"

Setting Up Logfire
~~~~~~~~~~~~~~~~~~

1. **Install MassGen with observability support:**

   .. code-block:: bash

      pip install "massgen[observability]"

      # Or with uv
      uv pip install "massgen[observability]"

2. **Create a Logfire account** at https://logfire.pydantic.dev/

3. **Authenticate with Logfire:**

   .. code-block:: bash

      # Authenticate (this creates ~/.logfire/credentials.json)
      uv run logfire auth

4. **Alternatively, set the token directly:**

   .. code-block:: bash

      export LOGFIRE_TOKEN=your_token_here

5. **Run MassGen with Logfire enabled:**

   .. code-block:: bash

      massgen --logfire --config your_config.yaml "Your question"

What Gets Traced
~~~~~~~~~~~~~~~~

When Logfire is enabled, MassGen automatically traces:

**LLM API Calls:**

* All requests to OpenAI-compatible APIs (GPT-4, etc.)
* All requests to Anthropic Claude API
* All requests to Google GenAI (Gemini) API
* Request parameters, response content, and timing
* Token usage breakdown

.. note::
   For Gemini tracing, set ``OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true``
   to capture full prompts and completions. Without this, content appears as ``<elided>``.

**Tool Executions:**

* MCP server tool calls with full input/output
* Custom tools (like ``read_media``, ``write_file``, etc.)
* Agent attribution via ``massgen.agent_id`` span attribute
* Execution time in milliseconds
* Success/failure status
* Error messages when tools fail

**Coordination Events:**

* ``coordination_started`` - When agent coordination begins
* ``winner_selected`` - When voting completes and a winner is chosen
* Vote counts and participating agents

**Example Logfire Dashboard View:**

.. code-block:: text

   ┌─ coordination.session (45.2s) ──────────────────────────────┐
   │  task: "Build a REST API for user management"              │
   │  num_agents: 3                                              │
   │  agent_ids: agent_a, agent_b, agent_c                      │
   │                                                             │
   │  ├─ llm.call [claude-3-5-sonnet] (3.1s)                   │
   │  │   input_tokens: 1,234                                   │
   │  │   output_tokens: 567                                    │
   │  │                                                         │
   │  ├─ mcp.filesystem.write_file (0.8s)                      │
   │  │   input_chars: 245                                      │
   │  │   output_chars: 12                                      │
   │  │   success: true                                         │
   │  │                                                         │
   │  ├─ [info] Agent answer: agent1.1                         │
   │  │   agent_id: agent_a, iteration: 1, round: 1            │
   │  │                                                         │
   │  ├─ llm.call [gpt-4] (3.5s)                               │
   │  │   input_tokens: 2,456                                   │
   │  │   output_tokens: 823                                    │
   │  │                                                         │
   │  ├─ [info] Agent answer: agent2.1                         │
   │  │   agent_id: agent_b, iteration: 1, round: 1            │
   │  │                                                         │
   │  ├─ [info] Agent vote: agent_a -> agent2.1                │
   │  │   reason: "More comprehensive solution"                │
   │  │                                                         │
   │  ├─ [info] Agent vote: agent_b -> agent2.1                │
   │  │                                                         │
   │  └─ [info] Winner selected: agent2.1                      │
   │      vote_counts: {agent2.1: 2}                           │
   └────────────────────────────────────────────────────────────┘

**What Gets Logged (Meaningful Events Only):**

To reduce noise, MassGen only logs meaningful coordination events:

1. **Session span** (``coordination.session``) - Top-level span for the entire coordination
2. **LLM API calls** - Automatic instrumentation of OpenAI, Anthropic, and Gemini calls
3. **Tool executions** - MCP tool calls with input/output sizes and timing
4. **Agent answers** - When an agent provides a new answer (with label like ``agent1.1``)
5. **Agent votes** - When an agent casts a vote (with reason)
6. **Winner selection** - When voting completes and winner is determined
7. **Final answer** - When the winning agent presents the final response

Note: Individual coordination iterations are tracked internally but not logged to Logfire to avoid cluttering the trace with less useful information.

Tool Execution Attributes
~~~~~~~~~~~~~~~~~~~~~~~~~

All tool execution events include rich attributes for filtering, grouping, and debugging:

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Attribute
     - Description
   * - ``agent_id``
     - The ID of the agent executing the tool (e.g., ``agent_a``, ``agent_b``)
   * - ``tool_name``
     - The full tool name (e.g., ``mcp__filesystem__write_file``)
   * - ``tool_type``
     - Tool category: ``mcp`` for MCP tools, ``custom`` for built-in tools
   * - ``success``
     - Boolean indicating whether the tool call succeeded
   * - ``execution_time_ms``
     - Execution time in milliseconds
   * - ``input_chars``
     - Number of characters in the tool input/arguments
   * - ``output_chars``
     - Number of characters in the tool output/result
   * - ``error_message``
     - Error message if the tool call failed (null on success)
   * - ``server_name``
     - MCP server name for MCP tools (e.g., ``filesystem``, ``command_line``)
   * - ``arguments_preview``
     - First 200 characters of tool arguments (for pattern analysis)
   * - ``output_preview``
     - First 200 characters of tool output (for debugging)
   * - ``round_number``
     - Which coordination round the tool was called in (0, 1, 2, ...)
   * - ``round_type``
     - Type of round: ``initial_answer``, ``voting``, ``presentation``

LLM API Call Attributes
~~~~~~~~~~~~~~~~~~~~~~~

All LLM API call spans include these attributes for agent attribution:

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Attribute
     - Description
   * - ``massgen.agent_id``
     - The ID of the agent making the call
   * - ``llm.provider``
     - Provider name (``anthropic``, ``openai``, ``gemini``, etc.)
   * - ``llm.model``
     - Model being called (``claude-3-opus``, ``gpt-4o``, etc.)
   * - ``llm.operation``
     - API operation type (typically ``stream``)
   * - ``gen_ai.system``
     - OpenTelemetry semantic convention for provider
   * - ``gen_ai.request.model``
     - OpenTelemetry semantic convention for model

Example Logfire Queries
~~~~~~~~~~~~~~~~~~~~~~~

These attributes enable powerful filtering and analysis in the Logfire dashboard:

**Find slowest tool calls:**

.. code-block:: sql

   SELECT
     attributes->>'tool.name' as tool_name,
     (attributes->>'tool.execution_time_ms')::float as execution_time_ms,
     attributes->>'massgen.agent_id' as agent_id
   FROM records
   WHERE attributes->>'tool.type' = 'mcp'
   ORDER BY (attributes->>'tool.execution_time_ms')::float DESC

**Find failed tools with their arguments:**

.. code-block:: sql

   SELECT
     attributes->>'tool.name' as tool_name,
     attributes->>'tool.arguments_preview' as arguments_preview,
     attributes->>'tool.error_message' as error_message,
     attributes->>'massgen.agent_id' as agent_id
   FROM records
   WHERE attributes->>'tool.success' = 'false'

**Tools with large outputs (potential cost drivers):**

.. code-block:: sql

   SELECT
     attributes->>'mcp.server' as server_name,
     attributes->>'tool.name' as tool_name,
     (attributes->>'tool.output_chars')::int as output_chars,
     attributes->>'massgen.agent_id' as agent_id
   FROM records
   WHERE (attributes->>'tool.output_chars')::int > 10000
   ORDER BY (attributes->>'tool.output_chars')::int DESC

**Pattern analysis - which arguments lead to failures:**

.. code-block:: sql

   SELECT
     attributes->>'tool.arguments_preview' as arguments_preview,
     COUNT(*) as fail_count
   FROM records
   WHERE attributes->>'tool.success' = 'false'
   GROUP BY attributes->>'tool.arguments_preview'
   ORDER BY fail_count DESC

**Tool usage by MCP server:**

.. code-block:: sql

   SELECT
     attributes->>'mcp.server' as server_name,
     COUNT(*) as calls,
     AVG((attributes->>'tool.execution_time_ms')::float) as avg_time_ms
   FROM records
   WHERE attributes->>'tool.type' = 'mcp'
   GROUP BY attributes->>'mcp.server'

**LLM calls by agent:**

.. code-block:: sql

   SELECT
     attributes->>'massgen.agent_id' as agent_id,
     attributes->>'llm.model' as model,
     COUNT(*) as calls
   FROM records
   WHERE span_name LIKE 'llm.%'
   GROUP BY attributes->>'massgen.agent_id', attributes->>'llm.model'

**All activity for a specific agent:**

.. code-block:: sql

   SELECT span_name, start_timestamp, duration
   FROM records
   WHERE attributes->>'massgen.agent_id' = 'agent_a'
   ORDER BY start_timestamp
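
If you export query results, the same aggregations can be reproduced offline. The sketch below mirrors the "Tool usage by MCP server" query. The record layout (a list of dicts with an ``attributes`` mapping keyed as in the queries above) and the sample rows are assumptions made for illustration, not a documented export format.

.. code-block:: python

   from collections import defaultdict


   def usage_by_server(records):
       """Return {server: (call_count, average_ms)} for MCP tool records."""
       times = defaultdict(list)
       for record in records:
           attrs = record.get("attributes", {})
           if attrs.get("tool.type") != "mcp":
               continue
           times[attrs.get("mcp.server")].append(float(attrs["tool.execution_time_ms"]))
       return {server: (len(ms), sum(ms) / len(ms)) for server, ms in times.items()}


   # Made-up sample rows, shaped like the attributes used in the queries above.
   sample = [
       {"attributes": {"tool.type": "mcp", "mcp.server": "filesystem", "tool.execution_time_ms": 800}},
       {"attributes": {"tool.type": "mcp", "mcp.server": "filesystem", "tool.execution_time_ms": 200}},
       {"attributes": {"tool.type": "custom", "tool.execution_time_ms": 50}},
   ]
   print(usage_by_server(sample))  # {'filesystem': (2, 500.0)}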

Environment Variables
~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Variable
     - Description
   * - ``MASSGEN_LOGFIRE_ENABLED``
     - Set to ``true`` to enable Logfire (alternative to ``--logfire`` flag)
   * - ``LOGFIRE_TOKEN``
     - Your Logfire API token (if not using ``logfire auth``)
   * - ``LOGFIRE_SERVICE_NAME``
     - Override the service name (default: ``massgen``). Read by Logfire library.
   * - ``LOGFIRE_ENVIRONMENT``
     - Set environment tag (e.g., ``production``, ``development``). Read by Logfire library.
   * - ``OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT``
     - Set to ``true`` to capture Gemini prompts/completions (otherwise shows ``<elided>``)

Programmatic Usage
~~~~~~~~~~~~~~~~~~

When using MassGen as a library, you can configure Logfire programmatically:

.. code-block:: python

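   import asyncio

   from massgen.orchestrator import Orchestrator
   from massgen.structured_logging import configure_observability

   # Enable observability with custom settings
   configure_observability(
       enabled=True,
       service_name="my-app",
       environment="production",
   )


   async def main(config):
       # Now run your orchestrator; ``config`` is your loaded MassGen configuration
       orchestrator = Orchestrator(config)
       return await orchestrator.run("Your question")

   # ``await`` is only valid inside a coroutine, so drive it with asyncio:
   # result = asyncio.run(main(config))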

Graceful Degradation
~~~~~~~~~~~~~~~~~~~~

Logfire integration is designed to be non-intrusive:

* **Logfire not installed?** - You'll see a helpful message: ``⚠️ Logfire not installed. Install with: pip install massgen[observability]``
* **Not authenticated?** - You'll see: ``Logfire requires authentication. Run 'logfire auth' to authenticate``
* **Logfire disabled?** - All logging falls back to standard loguru
* **Network issues?** - Logfire handles connectivity gracefully

This means you can always enable the ``--logfire`` flag without worrying about breaking your workflow - it will show helpful guidance if Logfire needs to be set up.
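
The same optional-dependency pattern is useful in your own code when you call MassGen as a library. A minimal sketch using only the standard library is shown here; ``is_installed`` is an illustrative helper, not a MassGen function.

.. code-block:: python

   import importlib.util


   def is_installed(module_name):
       """Return True if a top-level module can be found, without importing it."""
       return importlib.util.find_spec(module_name) is not None


   if not is_installed("logfire"):
       print('Logfire not available; install with: pip install "massgen[observability]"')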

See Also
--------

* :doc:`sessions/multi_turn_mode` - Session logging for interactive mode
* :doc:`files/file_operations` - Workspace and file operation logs
* :doc:`../reference/cli` - CLI options for logging control


---

## user_guide/multimodal.rst

Multimodal Tools
================

Overview
--------

MassGen provides unified multimodal tools that enable AI agents to analyze and generate various media types including images, videos, and audio. These tools provide a simple, consistent interface across all supported media formats.

Quick Start
-----------

Enable multimodal tools in your configuration:

.. code-block:: yaml

   agents:
     - id: my_agent
       backend:
         type: openai
         model: gpt-5
         enable_multimodal_tools: true
         image_generation_backend: openai
         video_generation_backend: google
         audio_generation_backend: openai

This automatically registers two unified tools:

- **read_media**: Universal media reading and analysis
- **generate_media**: Universal media generation

Unified Tools
-------------

read_media
^^^^^^^^^^

**Purpose**: Analyze any media file (image, audio, or video) with a single tool.

**Auto-detection**: Automatically detects media type from file extension and routes to the appropriate analysis backend.

**Usage**:

.. code-block:: python

   # Agent can simply use read_media for any media type
   result = read_media("screenshot.png", prompt="What's in this image?")
   result = read_media("podcast.mp3", prompt="Summarize this audio")
   result = read_media("demo.mp4", prompt="What happens in this video?")

**Parameters**:

- ``media_path`` (required): Path to the media file

  - Relative paths resolved from agent workspace
  - Absolute paths must be in allowed directories
  - Auto-detects type from extension (png, jpg, mp3, wav, mp4, mov, etc.)

- ``prompt`` (optional): Question or instruction about the media

  - Default: "Please analyze this {media_type} and describe its contents."

**Returns**:

Text description of the media content via the appropriate understanding tool (``understand_image``, ``understand_audio``, or ``understand_video``).

**Supported Formats**:

- **Images**: png, jpg, jpeg, gif, webp, bmp
- **Audio**: mp3, wav, m4a, ogg, flac, aac
- **Video**: mp4, mov, avi, mkv, webm
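
Extension-based detection of this kind can be pictured as a simple lookup. The sketch below uses the format lists above; ``detect_media_type`` is a hypothetical name, and the real routing logic inside ``read_media`` may differ.

.. code-block:: python

   from pathlib import Path

   MEDIA_EXTENSIONS = {
       "image": {"png", "jpg", "jpeg", "gif", "webp", "bmp"},
       "audio": {"mp3", "wav", "m4a", "ogg", "flac", "aac"},
       "video": {"mp4", "mov", "avi", "mkv", "webm"},
   }


   def detect_media_type(media_path):
       """Return "image", "audio", or "video" based on the file extension."""
       ext = Path(media_path).suffix.lower().lstrip(".")
       for media_type, extensions in MEDIA_EXTENSIONS.items():
           if ext in extensions:
               return media_type
       raise ValueError(f"Unsupported media extension: {ext!r}")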

**Configuration Overrides**:

You can specify different backends/models per media type using simple config variables:

.. code-block:: yaml

   backend:
     enable_multimodal_tools: true
     image_generation_backend: openai
     image_generation_model: gpt-5
     video_generation_backend: google
     audio_generation_backend: openai

generate_media
^^^^^^^^^^^^^^

**Purpose**: Generate images, videos, or audio from text descriptions.

**Smart Backend Selection**: Automatically chooses the best available backend based on API keys and configuration.

**Usage**:

.. code-block:: python

   # Generate an image
   result = generate_media(
       prompt="a cat in space",
       mode="image"
   )

   # Generate a video
   result = generate_media(
       prompt="neon-lit alley at night, light rain",
       mode="video",
       duration=8
   )

   # Generate audio (text-to-speech)
   result = generate_media(
       prompt="Hello, welcome to MassGen!",
       mode="audio",
       voice="nova"
   )

**Core Parameters**:

- ``prompt`` (required): Text description of what to generate. For audio speech, this is the
  **literal text to speak** — do NOT include speaking instructions here.
- ``mode`` (required): Type of media — ``"image"``, ``"video"``, or ``"audio"``
- ``backend_type`` (optional): Preferred backend — ``"auto"``, ``"openai"``, ``"google"``,
  ``"grok"``, ``"openrouter"``, or ``"elevenlabs"``
- ``model`` (optional): Override the default model for the selected backend
- ``storage_path`` (optional): Directory to save generated media (defaults to workspace root)
- ``continue_from`` (optional): Continuation ID from a previous result for multi-turn editing

**Image-specific parameters**:

- ``quality``: ``"low"``, ``"medium"``, ``"high"``, ``"auto"`` (OpenAI)
- ``size``: Image dimensions. OpenAI: ``"1024x1024"``, ``"1024x1536"``, ``"1536x1024"``.
  Gemini: ``"512px"``, ``"1K"``, ``"2K"``, ``"4K"``. Grok: ``"1k"``.
- ``aspect_ratio``: e.g., ``"16:9"``, ``"1:1"``, ``"9:16"`` (Google, Grok, OpenRouter)
- ``input_images``: List of image paths for image-to-image editing (OpenAI, Google Gemini, Grok)
- ``mask_path``: Path to mask PNG for inpainting (OpenAI, Google Imagen)
- ``output_format``: ``"png"``, ``"jpeg"``, ``"webp"`` (OpenAI, Google Imagen)
- ``background``: ``"transparent"``, ``"opaque"``, ``"auto"`` (OpenAI only)
- ``style_image``: Style reference image for Google Imagen style transfer
- ``control_image``: Structural control image for Google Imagen
- ``subject_image``: Subject reference image for Google Imagen consistency
- ``negative_prompt``: What to exclude (Google Imagen)
- ``seed``: Reproducibility seed (Google Imagen, ElevenLabs)
- ``guidance_scale``: Prompt adherence strength (Google Imagen)

**Video-specific parameters**:

- ``duration``: Length in seconds (clamped per backend)
- ``size``: Resolution — Grok: ``"480p"``, ``"720p"``; Veo: ``"720p"``, ``"1080p"``, ``"4k"``
- ``aspect_ratio``: e.g., ``"16:9"``, ``"9:16"``
- ``input_images``: Source image for image-to-video (all 3 backends)
- ``video_reference_images``: Style/content guide images for Veo (up to 3)
- ``negative_prompt``: What to exclude (Google Veo)

**Audio-specific parameters**:

- ``audio_type``: Type of audio operation — ``"speech"`` (default), ``"music"``,
  ``"sound_effect"``, ``"voice_conversion"``, ``"audio_isolation"``, ``"voice_design"``,
  ``"voice_clone"``, ``"dubbing"``
- ``voice``: Voice name or ID (e.g., ``"Rachel"``, ``"alloy"``, ``"nova"``)
- ``instructions``: Speaking style guidance (OpenAI ``gpt-4o-mini-tts`` only)
- ``speed``: Playback speed multiplier, 0.25–4.0 (OpenAI)
- ``audio_format``: Output format (``"mp3"``, ``"wav"``, ``"opus"``)
- ``input_audio``: Path to input audio for voice conversion, isolation, or dubbing
- ``voice_samples``: List of audio file paths for voice cloning
- ``target_language``: Target language code for dubbing (e.g., ``"es"``, ``"fr"``)
- ``source_language``: Source language code for dubbing (optional, auto-detected)
- ``voice_stability``: ElevenLabs voice stability (0.0–1.0)
- ``voice_similarity``: ElevenLabs similarity boost (0.0–1.0)

**Returns**:

JSON with ``success``, ``file_path``, ``file_size``, ``backend``, ``model``, ``continuation_id``,
and ``metadata`` fields.

**Supported Backends**:

.. list-table::
   :header-rows: 1

   * - Mode
     - Backends (priority order)
     - Default Models
   * - image
     - google, openai, grok, openrouter
     - Nano Banana 2 (``gemini-3.1-flash-image-preview``), ``gpt-5.4``, ``grok-imagine-image``, Nano Banana 2 (via OR)
   * - video
     - grok, google, openai
     - ``grok-imagine-video``, Veo 3.1 (``veo-3.1-generate-preview``), ``sora-2``
   * - audio (speech)
     - elevenlabs, openai
     - ``eleven_multilingual_v2``, ``gpt-4o-mini-tts``
   * - audio (music)
     - elevenlabs
     - ``elevenlabs-music``
   * - audio (sfx)
     - elevenlabs
     - ``elevenlabs-sfx``
   * - audio (editing)
     - elevenlabs
     - See ``audio_type`` values above

Backend Configuration
---------------------

Simple Configuration
^^^^^^^^^^^^^^^^^^^^

Just enable multimodal tools:

.. code-block:: yaml

   backend:
     enable_multimodal_tools: true

This uses default backends based on available API keys.
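
As a mental model, priority-based selection can be sketched as follows. The priority order comes from the Supported Backends table and the environment variable names from the Troubleshooting section; ``pick_backend`` itself is hypothetical, and MassGen's actual selection logic may consider more than API keys.

.. code-block:: python

   import os

   # Backend priority per mode, as listed in the Supported Backends table.
   PRIORITY = {
       "image": ["google", "openai", "grok", "openrouter"],
       "video": ["grok", "google", "openai"],
   }

   # Environment variable that provides the API key for each backend.
   API_KEY_ENV = {
       "google": "GEMINI_API_KEY",
       "openai": "OPENAI_API_KEY",
       "grok": "XAI_API_KEY",
       "openrouter": "OPENROUTER_API_KEY",
   }


   def pick_backend(mode, env=None):
       """Return the first backend in priority order whose API key is set, else None."""
       env = os.environ if env is None else env
       for backend in PRIORITY[mode]:
           if env.get(API_KEY_ENV[backend]):
               return backend
       return None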

Advanced Configuration
^^^^^^^^^^^^^^^^^^^^^^

Specify backends and models per media type:

.. code-block:: yaml

   backend:
     enable_multimodal_tools: true
     image_generation_backend: openai
     image_generation_model: gpt-5.4
     video_generation_backend: google
     video_generation_model: veo-3.1-generate-preview
     audio_generation_backend: openai
     audio_generation_model: gpt-4o-mini-tts


Native Backend Routing (v0.1.55+)
----------------------------------

Image and video understanding now route to the **agent's own backend** when it supports the capability, instead of always using OpenAI. This preserves model diversity and per-agent consistency.

**Supported image backends**: OpenAI, Claude, Gemini, Grok, Claude Code (SDK), Codex (CLI).

If the agent's backend doesn't support image understanding, it falls back to OpenAI ``gpt-5.4``.

.. code-block:: yaml

   # A Claude agent will use Claude's vision API for image analysis
   agents:
     - id: claude_vision
       backend:
         type: claude
         model: claude-sonnet-4-5
         enable_multimodal_tools: true

Video Frame Extraction (v0.1.56+)
-----------------------------------

Video understanding (for non-Gemini backends) extracts frames from the video and sends them as images. You can configure the extraction strategy via ``multimodal_config.video``:

.. code-block:: yaml

   backend:
     enable_multimodal_tools: true
     multimodal_config:
       video:
         extraction_mode: "scene"   # "scene" (default) or "uniform"
         max_frames: 30             # Hard cap (default: 30, absolute max: 60)
         fps: 1.0                   # Frames/sec for uniform mode (default: 1.0)
         threshold: 0.3             # Scene detection sensitivity (scene mode)
         frames_per_scene: 3        # Frames per detected scene (scene mode)
         num_frames: 8              # Legacy fixed count (overrides fps if set)

**Extraction Modes:**

- **scene** (default): Uses PySceneDetect to find scene boundaries, then samples frames within each scene. Produces better coverage of meaningful content and avoids wasting tokens on static segments. Falls back to uniform if PySceneDetect is not installed.
- **uniform**: Evenly spaced frames. Uses ``fps`` (default 1.0) to compute frame count based on video duration, or ``num_frames`` for a fixed count.

**Frame Cap Behavior:**

- ``max_frames`` is configurable (default 30), but cannot exceed the absolute maximum of 60
- A 10-second video at 1 FPS produces 10 frames (good coverage)
- A 2-minute video at 1 FPS would yield 120 frames, so it is capped at the default of 30
- A 30-minute video at 1 FPS produces 30 frames (capped, cost-safe)
- Setting ``num_frames: 8`` explicitly gives exactly 8 frames (backward compatible)
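
The cap arithmetic for uniform extraction can be written out as a short sketch. ``planned_frame_count`` is a hypothetical helper; applying the cap to an explicit ``num_frames`` and rounding down are assumptions, not documented behavior.

.. code-block:: python

   ABSOLUTE_MAX_FRAMES = 60


   def planned_frame_count(duration_s, fps=1.0, max_frames=30, num_frames=None):
       """Estimate how many frames uniform extraction would send."""
       # max_frames is configurable but can never exceed the absolute maximum.
       cap = min(max_frames, ABSOLUTE_MAX_FRAMES)
       # A fixed num_frames overrides the fps-based count.
       wanted = num_frames if num_frames is not None else int(duration_s * fps)
       return min(wanted, cap)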

**Installation for Scene Detection:**

.. code-block:: bash

   pip install "massgen[video]"

If PySceneDetect is not installed, scene mode gracefully falls back to uniform extraction.

Legacy Tools
------------

Individual Understanding Tools
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The unified ``read_media`` tool internally delegates to these specialized tools:

- ``understand_image``: Routes to agent's native backend (OpenAI, Claude, Gemini, Grok, Claude Code, Codex)
- ``understand_audio``: OpenAI Whisper transcription + gpt-4o analysis
- ``understand_video``: Routes to best available backend (Gemini native, or frame extraction via OpenAI/Claude/Grok)

These tools are **not automatically registered** when ``enable_multimodal_tools: true``. They are only used internally by ``read_media``.

**When to use them directly**: You can manually register them via ``custom_tools`` if you need:

- Fine control over frame extraction (videos)
- Custom audio transcription settings
- Specific vision model configurations

Individual Generation Tools
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Legacy generation tools have been superseded by ``generate_media``:

- ❌ ``text_to_image_generation`` → Use ``generate_media(mode="image")``
- ❌ ``text_to_video_generation`` → Use ``generate_media(mode="video")``
- ❌ ``text_to_speech_transcription_generation`` → Use ``generate_media(mode="audio")``

These tools are **not automatically registered** when ``enable_multimodal_tools: true``.

**Migration**: Update your configs to use the unified ``generate_media`` tool.

Manual Tool Registration
^^^^^^^^^^^^^^^^^^^^^^^^

If you need specific legacy tools, manually register them:

.. code-block:: yaml

   agents:
     - id: my_agent
       backend:
         custom_tools:
           - name: ["understand_video"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_video.py"
             function: ["understand_video"]
             config:
               num_frames: 16  # More detailed analysis

Examples
--------

Complete Multimodal Workflow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   # config.yaml
   agents:
     - id: multimodal_agent
       backend:
         type: openai
         model: gpt-4o
         enable_multimodal_tools: true
         multimodal_config:
           image:
             backend: openai
             model: gpt-5.4
           video:
             backend: google
             model: veo-3.1-generate-preview

   task: |
     1. Generate an image of a futuristic city
     2. Analyze the generated image
     3. Generate a 4-second video panning across the city

Agent interaction:

.. code-block:: python

   # Agent automatically uses the right tools
   result1 = generate_media("futuristic city with flying cars", mode="image")
   # -> Saves to: workspace/generated_image_20250122_123456.png

   result2 = read_media("generated_image_20250122_123456.png",
                        prompt="Describe this cityscape")
   # -> "The image shows a sprawling metropolis with towering skyscrapers..."

   result3 = generate_media(
       prompt="slow pan across futuristic city with neon lights",
       mode="video",
       duration=4
   )
   # -> Saves to: workspace/generated_video_20250122_123500.mp4

Multi-Agent with Specialized Backends
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   agents:
     - id: image_specialist
       backend:
         type: openai
         model: gpt-4o
         enable_multimodal_tools: true
         multimodal_config:
           image:
             backend: openai
             model: gpt-5.4  # Best for images

     - id: video_specialist
       backend:
         type: gemini
         model: gemini-2.5-pro
         enable_multimodal_tools: true
         multimodal_config:
           video:
             backend: google
             model: veo-3.1-generate-preview  # Best for videos

Troubleshooting
---------------

API Key Issues
^^^^^^^^^^^^^^

Ensure required API keys are set:

.. code-block:: bash

   # For OpenAI (images, video, audio)
   export OPENAI_API_KEY="sk-..."

   # For Google/Gemini (images, video)
   export GEMINI_API_KEY="..."

   # For Grok/xAI (images, video)
   export XAI_API_KEY="..."

   # For ElevenLabs (audio: speech, music, SFX, voice editing)
   export ELEVENLABS_API_KEY="..."

   # For OpenRouter (images)
   export OPENROUTER_API_KEY="..."

No Backend Available
^^^^^^^^^^^^^^^^^^^^

If you see "No backend available for {mode} generation":

1. Check API keys are set
2. Verify backend supports the media type (see Supported Backends above)
3. Check ``multimodal_config`` if using custom backends

Path Access Errors
^^^^^^^^^^^^^^^^^^

If media files can't be read:

1. Use relative paths from workspace (recommended)
2. Or use absolute paths within allowed directories
3. Check file exists and has correct extension

File Size Limits
^^^^^^^^^^^^^^^^

Be aware of backend limits:

- **Input images**: 4MB per image (PNG, JPEG only)
- **Google Video**: Varies by duration and resolution
- **Audio**: Generally generous limits

See Also
--------

- :doc:`/reference/yaml_schema` - Full configuration reference
- :doc:`/reference/supported_models` - Supported models by backend


---

## user_guide/sessions/graceful_cancellation.rst

Graceful Cancellation
=====================

MassGen supports graceful cancellation, allowing you to interrupt a running session (Ctrl+C) while still preserving partial progress. This is useful when agents are going down the wrong path or taking too long.

How It Works
------------

When you press Ctrl+C during coordination:

1. MassGen captures the current state from all agents
2. Any answers that have been generated are saved
3. Agent workspaces are preserved
4. The session can be resumed later with ``--continue``

.. code-block:: bash

   # Running a session
   $ massgen --config my_config.yaml "Complex question..."

   🤖 Multi-Agent
   Agents: agent_a, agent_b
   Question: Complex question...

   [Agent coordination in progress...]

   ^C
   ⚠️  Cancellation requested - saving partial progress...
   ✅ Partial progress saved. Session can be resumed with --continue
   👋 Goodbye!

What Gets Saved
---------------

When you cancel mid-session, MassGen saves:

* **Partial answers** - Any answers that agents have submitted
* **Agent workspaces** - All files created or modified by each agent (separately)
* **Turn metadata** - Phase, timestamp, and task information
* **Voting state** - If voting was in progress

The partial turn is marked as "incomplete" in the session directory.

What the Next Turn Sees
-----------------------

When you resume a session with an incomplete turn, no information is lost:

**Conversation History**: All partial answers are combined into a single assistant message with clear attribution. Agents that were still working (have a workspace but no answer yet) get a placeholder:

.. code-block:: text

   [INCOMPLETE TURN - Session was cancelled before completion]
   [Phase when cancelled: coordinating]

   ## agent_a's answer (voted for: agent_b):
   First agent's partial answer...
   [Workspace available at: /path/to/workspace]

   ## agent_b:
   [No answer submitted - agent was still working]
   [View workspace for current progress: /path/to/workspace]
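
The layout above can be reproduced with a few lines of Python. This is a minimal sketch, not MassGen's implementation: ``format_incomplete_turn`` and the dictionary keys (``id``, ``answer``, ``voted_for``, ``workspace``) are hypothetical names chosen for illustration.

.. code-block:: python

   def format_incomplete_turn(phase, agents):
       """Build the combined assistant message for a cancelled turn.

       ``agents`` is a list of dicts with hypothetical keys:
       id, answer (or None), voted_for (or None), workspace.
       """
       lines = [
           "[INCOMPLETE TURN - Session was cancelled before completion]",
           f"[Phase when cancelled: {phase}]",
       ]
       for agent in agents:
           lines.append("")
           if agent.get("answer"):
               vote = agent.get("voted_for")
               suffix = f" (voted for: {vote})" if vote else ""
               lines.append(f"## {agent['id']}'s answer{suffix}:")
               lines.append(agent["answer"])
               lines.append(f"[Workspace available at: {agent['workspace']}]")
           else:
               # Agent has a workspace but never submitted an answer
               lines.append(f"## {agent['id']}:")
               lines.append("[No answer submitted - agent was still working]")
               lines.append(
                   f"[View workspace for current progress: {agent['workspace']}]"
               )
       return "\n".join(lines)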

**Workspaces**: All agent workspaces from the incomplete turn are provided as read-only context paths, allowing agents to see what each other was working on. This includes:

- Agents that submitted partial answers
- Agents that were still working (created files but no answer yet)

This ensures that no files or progress are lost, even from agents that hadn't finished their response.

Session Directory Structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~

After cancellation, your session directory will contain:

.. code-block:: text

   .massgen/sessions/session_20251205_120000/
   ├── SESSION_SUMMARY.txt           # Updated with incomplete turn info
   ├── turn_1/                       # Complete previous turn (if any)
   │   ├── metadata.json
   │   ├── answer.txt
   │   └── workspace/
   └── turn_2/                       # Incomplete turn
       ├── metadata.json             # status: "incomplete"
       ├── partial_answers.json      # All agent answers
       ├── answer.txt                # Best available answer
       └── workspaces/               # Per-agent workspaces
           ├── agent_a/
           └── agent_b/
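
To locate incomplete turns programmatically, you could scan the session directory for the ``status`` marker. The sketch below assumes ``metadata.json`` stores ``status`` as a top-level key, as the tree above suggests; ``find_incomplete_turns`` is a hypothetical helper, not part of MassGen.

.. code-block:: python

   import json
   from pathlib import Path


   def find_incomplete_turns(session_dir):
       """Return names of turn directories marked as incomplete."""
       incomplete = []
       # Lexicographic order; turn_10 sorts before turn_2
       for turn_dir in sorted(Path(session_dir).glob("turn_*")):
           metadata_file = turn_dir / "metadata.json"
           if not metadata_file.is_file():
               continue
           metadata = json.loads(metadata_file.read_text(encoding="utf-8"))
           # Assumption: "status" is a top-level key in metadata.json
           if metadata.get("status") == "incomplete":
               incomplete.append(turn_dir.name)
       return incomplete

For example, ``find_incomplete_turns(".massgen/sessions/session_20251205_120000")`` would return ``["turn_2"]`` for the layout shown above.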

Resuming After Cancellation
---------------------------

Resume your session with the ``--continue`` flag:

.. code-block:: bash

   $ massgen --continue

   📚 Restored session with 2 previous turn(s)
      Starting turn 3

   ⚠️  Previous turn was incomplete (cancelled during coordinating phase)
      Task: Complex question...
      Partial answers saved from: agent_a, agent_b
      The incomplete turn's partial progress is saved in the session directory.

When resuming:

* MassGen shows you information about the incomplete turn
* You can ask the same question again or move on to a new one
* Previous complete turns provide context for agents
* Partial answers from the cancelled turn are available for review in the session directory

Viewing Partial Results
-----------------------

You can review the partial answers saved during cancellation:

.. code-block:: bash

   # View the partial answers
   cat .massgen/sessions/session_*/turn_*/partial_answers.json

   # View the best partial answer
   cat .massgen/sessions/session_*/turn_*/answer.txt

Force Exit
----------

If you need to exit immediately without saving, press Ctrl+C a second time:

* First Ctrl+C: Graceful cancellation (saves partial progress)
* Second Ctrl+C: Force exit (no saving)
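
The two-stage behavior can be illustrated with a small signal handler. This is a generic sketch of the pattern, not MassGen's actual handler; ``InterruptTracker``, ``install_handler``, and the ``request_graceful_cancel`` callback are hypothetical names.

.. code-block:: python

   import signal
   import sys


   class InterruptTracker:
       """Counts interrupts: the first is graceful, any later one forces exit."""

       def __init__(self):
           self.count = 0

       def on_interrupt(self):
           self.count += 1
           return "graceful" if self.count == 1 else "force"


   def install_handler(tracker, request_graceful_cancel):
       """Register a SIGINT handler (must be called from the main thread)."""

       def handler(signum, frame):
           if tracker.on_interrupt() == "graceful":
               # For example, set a flag that the main loop checks
               request_graceful_cancel()
           else:
               sys.exit(130)  # 128 + SIGINT, the conventional exit status

       signal.signal(signal.SIGINT, handler)

Keeping the handler small and deferring the actual save to the main loop avoids doing heavy work inside a signal handler.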

Multi-Turn Mode Behavior
------------------------

In multi-turn (interactive) sessions, cancellation works slightly differently:

* **First Ctrl+C**: Saves partial progress and returns to the prompt
* **Second Ctrl+C**: Exits the session entirely
* **Queued runtime fallback prompts persist**: If pending runtime-injection text is promoted to a new turn prompt and that turn is cancelled, the prompt is retained in conversation history for subsequent turns

This allows you to cancel a long-running turn without losing your entire session:

.. code-block:: bash

   $ massgen --config my_config.yaml  # Interactive mode

   🤖 Multi-Agent Session

   > What is the meaning of life?

   [Agent coordination in progress...]

   ^C
   ⚠️  Cancellation requested - saving partial progress...
   ✅ Partial progress saved. Session can be resumed with --continue
   ⏸️  Turn cancelled. Partial progress saved.
   Enter your next question or /quit to exit.

   > Let's try a simpler question...

This behavior ensures you can:

- Cancel a turn that's taking too long without losing the session
- Review partial progress and decide how to proceed
- Continue with a different question if needed

Use Cases
---------

Graceful cancellation is helpful when:

1. **Wrong Direction** - Agents are pursuing an incorrect approach
2. **Too Long** - Coordination is taking longer than expected
3. **Debugging** - You want to inspect partial state
4. **Resource Management** - You need to free up API calls or compute

Configuration
-------------

Graceful cancellation is enabled by default. No configuration is required.

.. note::

   Partial progress is only saved if at least one agent has submitted an answer.
   If cancelled before any answers are generated, only the task metadata is saved.

Related Documentation
---------------------

* :doc:`multi_turn_mode` - Interactive multi-turn conversations
* :doc:`orchestration_restart` - How MassGen handles restarts
* :doc:`memory` - Memory and context management


---

## user_guide/sessions/index.rst

Sessions & Memory
=================

MassGen provides robust session management and memory capabilities for interactive, multi-turn conversations with AI agents. This section covers how to maintain context, manage sessions, and work with persistent memory.

Overview
--------

Session features in MassGen:

* **Multi-turn mode** - Interactive conversations with persistent context
* **Memory management** - Long-term context preservation across sessions
* **Session restart** - Resume and continue previous sessions
* **Graceful cancellation** - Save partial progress when interrupting
* **Context windows** - Efficient handling of conversation history

Guides in This Section
----------------------

.. grid:: 3
   :gutter: 3

   .. grid-item-card:: 💬 Multi-Turn Mode

      Interactive conversations

      * Start interactive sessions
      * Conversation management
      * Session commands
      * Real-time agent responses

      :doc:`Read the Multi-Turn Mode guide → <multi_turn_mode>`

   .. grid-item-card:: 🧠 Memory

      Context preservation

      * Session memory
      * Memory archiving
      * Context management
      * Memory configuration

      :doc:`Read the Memory guide → <memory>`

   .. grid-item-card:: 🔄 Session Restart

      Resume previous sessions

      * Restart capabilities
      * Session recovery
      * State restoration
      * Continuation patterns

      :doc:`Read the Session Restart guide → <orchestration_restart>`

   .. grid-item-card:: ⏹️ Graceful Cancellation

      Save progress on interrupt

      * Ctrl+C handling
      * Partial progress saving
      * Resume cancelled sessions
      * Review partial answers

      :doc:`Read the Graceful Cancellation guide → <graceful_cancellation>`

Quick Start
-----------

Start an interactive multi-turn session:

.. tabs::

   .. tab:: CLI

      .. code-block:: bash

         # Start interactive mode
         massgen

         # Or with a specific config
         massgen --config @examples/basic/multi/three_agents_default

   .. tab:: Python API

      .. code-block:: python

         import asyncio
         import massgen

         # Multi-turn requires CLI for now
         # Use single queries for programmatic access
         async def main():
             result = await massgen.run(
                 query="First question...",
                 model="gpt-5"
             )
             print(result)

         asyncio.run(main())

Related Documentation
---------------------

* :doc:`../files/memory_filesystem_mode` - Combine memory with file operations
* :doc:`../integration/automation` - Automated execution modes
* :doc:`../../quickstart/running-massgen` - Getting started with sessions
* :doc:`../../reference/cli` - CLI reference

.. toctree::
   :maxdepth: 1
   :hidden:

   multi_turn_mode
   memory
   orchestration_restart
   graceful_cancellation


---

## user_guide/sessions/memory.rst

Memory and Context Management
==============================

MassGen's memory system enables agents to maintain knowledge across conversations, handle long context windows gracefully, and share insights across multi-turn sessions. The system automatically manages context compression, semantic memory retrieval, and cross-agent knowledge sharing.

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
--------

The memory system consists of two complementary components:

**ConversationMemory (Short-term)**
   Fast in-memory storage for recent messages. Maintains verbatim conversation history for the current context window.

**PersistentMemory (Long-term)**
   Vector database storage (via `mem0 <https://mem0.ai>`_) with semantic search. Extracts and stores key facts that persist across sessions and can be retrieved when relevant.

Key Features
~~~~~~~~~~~~

- **Automatic Context Compression**: When approaching token limits, old messages are removed while remaining accessible via semantic search
- **Semantic Retrieval**: Retrieve relevant facts from past conversations based on current context
- **Cross-Agent Memory Sharing**: Agents access previous winning agents' knowledge from past turns
- **Session Management**: Memories are isolated by session for clean separation of different tasks
- **Turn-Aware Filtering**: Prevents temporal leakage by filtering memories by turn number

Quick Start
-----------

Prerequisites
~~~~~~~~~~~~~

For multi-agent setups, start the Qdrant vector database server:

.. code-block:: bash

   # Start Qdrant (required for persistent memory)
   docker-compose -f docker-compose.qdrant.yml up -d

   # Verify it's running
   curl http://localhost:6333/healthz

   # (Optional) View Qdrant dashboard
   open http://localhost:6333/dashboard

Basic Configuration
~~~~~~~~~~~~~~~~~~~

Add memory configuration to your YAML config:

.. code-block:: yaml

   memory:
     enabled: true

     conversation_memory:
       enabled: true  # Short-term tracking

     persistent_memory:
       enabled: true  # Long-term storage

       # LLM for fact extraction (uses mem0's native providers)
       llm:
         provider: "openai"
         model: "gpt-4.1-nano-2025-04-14"

       # Embeddings for vector search
       embedding:
         provider: "openai"
         model: "text-embedding-3-small"

       # Qdrant configuration
       qdrant:
         mode: "server"  # Use "local" for single-agent only
         host: "localhost"
         port: 6333

     # Context compression settings
     compression:
       trigger_threshold: 0.75  # Compress at 75% usage
       target_ratio: 0.40       # Keep 40% after compression

     # Retrieval settings
     retrieval:
       limit: 5              # Facts to retrieve
       exclude_recent: true  # Only retrieve after compression

     # Recording settings (v0.1.9+)
     recording:
       record_all_tool_calls: false  # Set true to capture ALL MCP tools
       record_reasoning: false       # Set true to capture thinking separately

Run with Memory
~~~~~~~~~~~~~~~

.. code-block:: bash

   # Interactive mode with memory
   massgen --config @examples/memory/gpt5mini_gemini_context_window_management.yaml

   # Single question with memory
   massgen \
     --config @examples/memory/gpt5mini_gemini_context_window_management.yaml \
     "Analyze the MassGen codebase and create an architecture document"

How It Works
------------

Custom Fact Extraction
~~~~~~~~~~~~~~~~~~~~~~~

MassGen uses custom prompts designed to extract high-quality, domain-focused memories. Extracted facts should be:

**Self-Contained and Specific**:
   Facts should be understandable 6 months later without the original conversation

**Focused on Domain Knowledge**:
   - ✅ Concrete data points with context ("OpenAI revenue reached $12B annualized")
   - ✅ Insights with explanations ("Narrative depth valued in creative writing because...")
   - ✅ Capabilities with use cases ("MassGen v0.1.1 supports Python tools via YAML")
   - ✅ Domain expertise with details ("Binet's formula uses golden ratio phi=(1+√5)/2")
   - ✅ Specific recommendations with WHAT, WHEN, WHY

**Tool Usage Patterns** (v0.1.9+):
   - ✅ Tool sequences that work ("For code analysis, directory_tree → read_file → grep provides systematic understanding")
   - ✅ Problem-solving approaches ("Breaking large tasks into focused searches yields better results than broad queries")
   - ✅ What worked/failed with reasoning ("Sequential exploration prevents getting lost in implementation details")

**Excluded for Quality**:
   - ❌ Agent comparisons ("Agent 1's response is better")
   - ❌ Voting details ("The reason for voting...")
   - ❌ Meta-instructions ("Response should start with...")
   - ❌ Generic advice without specifics ("Providing templates improves docs")
   - ❌ Usage statistics without insight ("Used grep 5 times")

**Implementation**: ``massgen/memory/_fact_extraction_prompts.py::MASSGEN_UNIVERSAL_FACT_EXTRACTION_PROMPT``

Memory Flow
~~~~~~~~~~~

**Every Turn**:

1. User message added to conversation_memory (verbatim)
2. Agent responds with reasoning and answer
3. Response recorded to:

   - **ConversationMemory**: Full message for immediate context
   - **PersistentMemory**: mem0's LLM extracts key facts and stores them in the vector DB

4. Context window checked:

   - **Below threshold**: Continue normally
   - **Above threshold**: Compress old messages, enable retrieval

**What Gets Recorded** (Default):

.. code-block:: text

   ✅ User messages
   ✅ Final answer text (accumulated from content chunks)
   ✅ Workflow tools (new_answer, vote) with full arguments

   ❌ System messages (orchestrator prompts - filtered out)
   ❌ MCP tool calls (unless record_all_tool_calls: true)
   ❌ Reasoning chunks (unless record_reasoning: true)

**Configurable Recording** (v0.1.9+):

You can now control what gets recorded to memory via YAML configuration:

.. code-block:: yaml

   memory:
     recording:
       record_all_tool_calls: false  # Set to true to capture ALL MCP tools
       record_reasoning: false       # Set to true to capture thinking separately
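
The default recording rules and the two flags can be summarized as a single predicate. This is a sketch of the documented behavior only; ``should_record`` and the ``kind`` labels are hypothetical and do not correspond to MassGen's internal API.

.. code-block:: python

   # Workflow tools are always recorded, per the list above
   WORKFLOW_TOOLS = {"new_answer", "vote"}


   def should_record(kind, tool_name=None,
                     record_all_tool_calls=False, record_reasoning=False):
       """Decide whether an item is recorded to memory.

       ``kind`` is a hypothetical label: "user_message", "final_answer",
       "system_message", "tool_call", or "reasoning".
       """
       if kind in ("user_message", "final_answer"):
           return True
       if kind == "system_message":
           return False  # orchestrator prompts are filtered out
       if kind == "tool_call":
           return tool_name in WORKFLOW_TOOLS or record_all_tool_calls
       if kind == "reasoning":
           return record_reasoning
       return False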

See :ref:`recording-configuration` below for details.

Context Compression
~~~~~~~~~~~~~~~~~~~

MassGen uses **reactive compression** for context window management. This is due to
a fundamental limitation of most LLM APIs.

**Why Reactive?**

Most LLM providers (OpenAI, Anthropic, Google) only report token usage *after* a
request completes. There is no mid-stream token counting or pre-flight validation API.
This means MassGen cannot proactively prevent context overflow—it can only react when
the provider returns a context length error.

**How It Works**

1. MassGen sends the conversation to the LLM
2. If the context is too long, the provider returns an error
3. MassGen catches the error and generates a summary of the work done so far
4. The summarized conversation is retried automatically (single retry to prevent loops)

**After compression**, the message structure looks like:

.. code-block:: text

   Before Error:
   [system] → [user 1] → [assistant 1] → ... → [user 20] → [assistant 20] ← ERROR

   After Compression:
   [system] → [user request] → [summary as assistant message]
   ↑           ↑                ↑
   System      User's original  Summary of ALL work done so far
   preserved   request          (most recent context - model continues from here)
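
The catch, summarize, and retry-once flow can be sketched as follows. All names here (``ContextLengthError``, ``compress``, ``call_with_reactive_compression``) are hypothetical stand-ins; real providers raise their own error types, and the sketch assumes the first user message is the original request.

.. code-block:: python

   class ContextLengthError(Exception):
       """Stand-in for a provider's context-length error (hypothetical)."""


   def compress(messages, summary):
       """Rebuild the conversation as [system] -> [user request] -> [summary]."""
       system = [m for m in messages if m["role"] == "system"][:1]
       # Assumption: the first user message holds the original request
       first_user = next(m for m in messages if m["role"] == "user")
       return system + [first_user, {"role": "assistant", "content": summary}]


   def call_with_reactive_compression(call_llm, summarize, messages):
       """Call the LLM; on a context-length error, compress and retry once."""
       try:
           return call_llm(messages)
       except ContextLengthError:
           compressed = compress(messages, summarize(messages))
           # Single retry: a second context error propagates to the caller
           return call_llm(compressed)

Because the retry sits outside the ``try`` block's protected call, a second failure is not caught, which mirrors the single-retry rule that prevents loops.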

**Key Design: User → Summary Ordering**

The summary is placed *after* the user message as an assistant message. This ordering
is critical for preventing redundant work:

- The model sees its own summary as the most recent context
- It naturally continues from the summary rather than starting fresh
- File reads, analysis, and other completed work are preserved in the summary

**What Gets Summarized**

The compression system captures everything in the streaming buffer, including:

- Tool calls and their results (file reads, directory listings, etc.)
- Reasoning and analysis performed
- Partial answers and work in progress
- Any content that was streaming when the context limit was hit

This ensures the model doesn't re-read files or redo analysis after compression.

**Configuration**

.. code-block:: yaml

   coordination:
     compression_target_ratio: 0.20  # Preserve 20% of messages, summarize 80%

The ``compression_target_ratio`` controls how aggressively to compress when the
context limit is exceeded:

- **0.20** (default): Preserve ~20% of messages verbatim, summarize the rest
- **0.30**: More conservative, preserve ~30% of messages
- **0.10**: More aggressive, preserve only ~10% of messages
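
As a worked example of the ratio arithmetic, the sketch below splits a message list into a part to summarize and a part to keep verbatim. It assumes the ratio applies to the message count, that the most recent messages are the ones preserved, and that the count is rounded to the nearest integer; ``split_for_compression`` is a hypothetical helper.

.. code-block:: python

   def split_for_compression(messages, target_ratio=0.20):
       """Return (to_summarize, preserved) for a given target ratio."""
       total = len(messages)
       # Assumption: keep at least one message when the list is non-empty
       keep = max(1, round(total * target_ratio)) if total else 0
       cut = total - keep
       return messages[:cut], messages[cut:]


   messages = [f"msg {i}" for i in range(40)]
   to_summarize, preserved = split_for_compression(messages, 0.20)
   print(len(to_summarize), len(preserved))

With 40 messages, a ratio of 0.20 keeps 8 messages and summarizes 32; 0.30 keeps 12 and 0.10 keeps 4.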

.. note::

   Compression is **reactive** - it only triggers when the provider returns a context
   length error. MassGen cannot predict when context will exceed the limit because
   token counts are only available after each LLM call completes.

**Best Practices**

- For very long tasks, consider breaking the work into multiple sessions
- Use ``clear_history=True`` when starting unrelated topics
- Keep critical information in recent messages or in the system prompt
- Lower ``compression_target_ratio`` for more aggressive compression (preserves less)

**Future Improvements**

Some providers may add better token tracking in the future:

- Pre-flight token counting APIs
- Streaming token usage updates
- Local models with tiktoken-based estimation

Memory Retrieval
~~~~~~~~~~~~~~~~

Retrieval happens when:

- ✅ **After compression**: Retrieve facts from compressed messages
- ✅ **On restart/reset**: Restore recent context
- ❌ **Before compression**: Skip (all context already in conversation_memory)

Retrieval process:

1. **Search own agent's memories** (all turns, current session)
2. **Search previous winners' memories** (filtered by turn - see below)
3. **Format and inject** as system message before processing

.. code-block:: text

   Retrieved memories injected as:

   ┌─────────────────────────────────────┐
   │ Relevant memories:                   │
   │ • User asked about backend system    │
   │ • Agent analyzed 5 backend files     │
   │ • [From agent_b Turn 1] Explained    │
   │   stateful vs stateless backends     │
   └─────────────────────────────────────┘
   ↓
   [user msg 15] → [agent response 15] → ...
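
A minimal sketch of the format-and-inject step, assuming each retrieved memory is a dict with a ``text`` key and, for memories from other agents, ``source_agent`` and ``turn`` keys. These key names and both helper functions are hypothetical.

.. code-block:: python

   def build_memory_message(memories, limit=5):
       """Format retrieved facts as one system message, or None if empty."""
       if not memories:
           return None
       lines = ["Relevant memories:"]
       for memory in memories[:limit]:
           source = memory.get("source_agent")
           prefix = f"[From {source} Turn {memory['turn']}] " if source else ""
           lines.append(f"• {prefix}{memory['text']}")
       return {"role": "system", "content": "\n".join(lines)}


   def inject_memories(messages, memories, limit=5):
       """Place the memory message ahead of the remaining conversation."""
       memory_message = build_memory_message(memories, limit)
       if memory_message is None:
           return list(messages)
       return [memory_message] + list(messages)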

Use Cases
---------

Scenario 1: Long Analysis Tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Use case**: Analyzing a large codebase that requires reading 50+ files

**Without memory**:
   Context fills up after ~15 files, agent loses track of earlier analysis

**With memory**:
   - Agent reads files 1-15, context compresses
   - Files 16-30: Agent retrieves relevant facts from 1-15
   - Maintains complete understanding throughout analysis

**Configuration**:

.. code-block:: yaml

   memory:
     enabled: true
     compression:
       trigger_threshold: 0.75  # Compress when 75% full
       target_ratio: 0.40        # Keep 40% of recent context

**Example**:

.. code-block:: bash

   massgen --config @examples/memory/gpt5mini_gemini_context_window_management.yaml \
     "Analyze the entire MassGen codebase and create comprehensive documentation"

Scenario 2: Multi-Turn Sessions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Use case**: Interactive development across multiple turns of a session

**Without memory**:
   Each turn starts fresh, agents forget previous turns' insights

**With memory**:
   - Turn 1: Agent A wins, explains backend architecture
   - Turn 2: Agent B retrieves Agent A's Turn 1 insights
   - Turn 3: Agent A sees both own past work + Agent B's Turn 2 insights

**How winner memory sharing works**:

.. code-block:: text

   Turn 1: agent_a wins → Memories tagged {"agent_id": "agent_a", "turn": 1}
   Turn 2:
     agent_b retrieves:
       ✅ Own memories (all turns)
       ✅ agent_a's Turn 1 memories (previous winner)
       ❌ agent_a's Turn 2 memories (not yet complete)

   Turn 3:
     agent_a retrieves:
       ✅ Own memories (Turns 1, 2)
       ✅ agent_b's Turn 2 memories (previous winner)
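
The turn-aware filter in this example can be expressed as a short function. This is a sketch of the rule as described, not MassGen's retrieval code; ``visible_memories`` and the ``winners`` mapping (turn number to winning agent ID) are hypothetical.

.. code-block:: python

   def visible_memories(memories, agent_id, current_turn, winners):
       """Return memories an agent may retrieve during ``current_turn``.

       ``memories`` are dicts tagged with "agent_id" and "turn";
       ``winners`` maps a completed turn number to its winning agent.
       """
       visible = []
       for memory in memories:
           if memory["agent_id"] == agent_id:
               visible.append(memory)  # own memories, all turns
           elif (memory["turn"] < current_turn
                 and winners.get(memory["turn"]) == memory["agent_id"]):
               visible.append(memory)  # a previous winner's memory
       return visible

Requiring ``turn < current_turn`` is what blocks temporal leakage: memories written by another agent during the turn in progress are never returned.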

**Configuration**:

In interactive mode, a session ID is generated automatically, for example ``session_20251028_143000``.

Memories are isolated per session unless you specify a custom session name.

Scenario 3: Orchestrator Restarts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Use case**: Agent needs to restart due to errors or new answers from other agents

**Without memory**:
   Partial work lost, agent starts from scratch

**With memory**:
   - Before restart: Current conversation recorded to persistent_memory
   - On restart: Relevant facts retrieved to restore context
   - Agent continues seamlessly with knowledge of prior attempts

**Example flow**:

.. code-block:: text

   Agent A working on task...
   📝 Read 5 files, analyzed architecture
   🔄 Other agent submits better answer → Restart triggered
   💾 Recording 10 messages before reset
   🔄 Retrieving memories after reset...
   💭 Retrieved: "Analyzed backend/base.py", "Found adapter pattern", ...
   ✅ Agent continues with restored context

Configuration Reference
-----------------------

Complete Configuration
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   memory:
     # Global enable/disable
     enabled: true

     # Short-term conversation tracking
     conversation_memory:
       enabled: true

     # Long-term knowledge storage
     persistent_memory:
       enabled: true
       on_disk: true  # Persist across restarts

       # Session isolation (optional)
       # session_name: "my_project_analysis"  # Specific session
       # session_name: null                   # Cross-session memory

       # LLM for fact extraction
       llm:
         provider: "openai"
         model: "gpt-4.1-nano-2025-04-14"  # Fast, cheap for memory ops
         # api_key: "sk-..."  # Optional - reads from OPENAI_API_KEY env var

       # Embeddings for vector search
       embedding:
         provider: "openai"
         model: "text-embedding-3-small"
         # api_key: "sk-..."  # Optional - reads from OPENAI_API_KEY env var

       # Vector store (Qdrant)
       qdrant:
         mode: "server"      # "server" or "local"
         host: "localhost"   # Server mode only
         port: 6333          # Server mode only
         # path: ".massgen/qdrant"  # Local mode only

     # Context window compression
     compression:
       trigger_threshold: 0.75  # Compress at 75% context usage
       target_ratio: 0.40       # Target 40% after compression

     # Memory retrieval
     retrieval:
       limit: 5              # Max facts per agent
       exclude_recent: true  # Skip retrieval before compression

     # Memory recording (v0.1.9+)
     recording:
       record_all_tool_calls: false  # Record ALL MCP tools (not just workflow)
       record_reasoning: false       # Record reasoning chunks separately

Configuration Options
~~~~~~~~~~~~~~~~~~~~~

Memory Toggle
^^^^^^^^^^^^^

.. code-block:: yaml

   memory:
     enabled: false  # Disable entire memory system

Conversation Memory
^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   conversation_memory:
     enabled: true  # Almost always true - needed for context management

Persistent Memory
^^^^^^^^^^^^^^^^^

**LLM Configuration** (for fact extraction):

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Provider
     - Configuration
   * - OpenAI
     - ``provider: "openai"``, ``model: "gpt-4.1-nano-2025-04-14"`` or ``"gpt-4o-mini"``
   * - Anthropic
     - ``provider: "anthropic"``, ``model: "claude-haiku-4-5-20251001"``
   * - Groq
     - ``provider: "groq"``, ``model: "llama-3.1-8b-instant"``

**Embedding Configuration** (for vector search):

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Provider
     - Configuration
   * - OpenAI
     - ``provider: "openai"``, ``model: "text-embedding-3-small"`` (1536 dims)
   * - Together
     - ``provider: "together"``, ``model: "togethercomputer/m2-bert-80M-8k-retrieval"``
   * - Azure OpenAI
     - ``provider: "azure_openai"``, ``model: "text-embedding-ada-002"``

**Qdrant Configuration**:

.. code-block:: yaml

   # Server mode (RECOMMENDED for multi-agent)
   qdrant:
     mode: "server"
     host: "localhost"
     port: 6333

   # Local mode (single agent only)
   qdrant:
     mode: "local"
     path: ".massgen/qdrant"

.. warning::
   Local file-based Qdrant does NOT support concurrent access. For multi-agent setups, always use server mode.

Session Management
^^^^^^^^^^^^^^^^^^

**Automatic sessions**:

All sessions are automatically created and tracked in the registry:

- **Interactive mode**: ``session_20251028_143000`` (shared across all turns in that session)
- **Single question**: ``session_20251028_143001`` (each run gets its own tracked session)

**Custom sessions**:

.. code-block:: yaml

   persistent_memory:
     session_name: "my_project_analysis"  # Continue specific session

**Cross-session memory** (search across all sessions):

.. code-block:: yaml

   persistent_memory:
     session_name: null  # or omit the field

Loading Previous Sessions
^^^^^^^^^^^^^^^^^^^^^^^^^^

MassGen automatically tracks all memory sessions in a registry (``~/.massgen/sessions.json``). You can list and load previous sessions to continue conversations with their memory context intact.

**List available sessions**:

.. code-block:: bash

   massgen --list-sessions

Example output:

.. code-block:: text

   Available Memory Sessions:
   ============================================================

   Session ID: session_20251028_143000
     Status:  completed
     Started: 2025-10-28 14:30:00
     Model:   gpt-4o-mini
     Config:  memory_config.yaml

   Session ID: session_20251027_091500
     Status:  completed
     Started: 2025-10-27 09:15:00
     Model:   gpt-4o
     Description: Codebase analysis project
     Config:  research_config.yaml

   ============================================================
   To load a session, use: massgen --session-id <SESSION_ID> "Your question"

**Load session via CLI**:

.. code-block:: bash

   # Continue previous session
   massgen --session-id session_20251028_143000 "What did we discuss about the backend?"

   # Interactive mode with previous session
   massgen --session-id session_20251028_143000 --config my_config.yaml

**Load session via YAML config**:

.. code-block:: yaml

   # Add to your config file
   session_id: "session_20251028_143000"

   memory:
     enabled: true
     persistent_memory:
       enabled: true
       # ... rest of memory config

**Priority order**: CLI argument (``--session-id``) > YAML config (``session_id:``) > Auto-generated
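
The priority order can be sketched as a small resolver. ``resolve_session_id`` is a hypothetical helper, and the timestamp format is inferred from the example IDs in this guide rather than taken from MassGen's source.

.. code-block:: python

   from datetime import datetime


   def resolve_session_id(cli_session_id=None, yaml_config=None, now=None):
       """Pick a session ID: CLI argument > YAML config > auto-generated."""
       if cli_session_id:
           return cli_session_id
       yaml_session_id = (yaml_config or {}).get("session_id")
       if yaml_session_id:
           return yaml_session_id
       # Assumed format, matching IDs such as session_20251028_143000
       now = now or datetime.now()
       return now.strftime("session_%Y%m%d_%H%M%S")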

**Benefits**:

- Continue conversations across multiple CLI runs
- Access memory from previous analysis sessions
- Build on previous agents' knowledge without re-analysis
- Maintain context for long-running research projects

**Note**: All sessions (both interactive and single-question modes) are tracked in the registry and can be continued later.

Compression Settings
^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   compression:
     trigger_threshold: 0.75  # Not reliably enforceable - see note below
     target_ratio: 0.20        # Preserve 20% of messages after compression

.. note::

   **Reactive Compression Limitation**: The ``trigger_threshold`` cannot be proactively
   enforced because token counts are only available after each LLM call completes. MassGen
   uses reactive compression—catching context length errors from the provider and
   summarizing automatically. Only ``target_ratio`` is reliably enforced.

Example configurations:

- **Aggressive compression**: ``target_ratio: 0.10`` (preserve only 10%)
- **Moderate** (default): ``target_ratio: 0.20`` (preserve 20%)
- **Conservative**: ``target_ratio: 0.40`` (preserve 40%)

Retrieval Settings
^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   retrieval:
     limit: 5              # Max facts per agent (default: 5)
     exclude_recent: true  # Smart retrieval (default: true)

- **More context**: Increase ``limit`` to 10-20 (uses more tokens)
- **Always retrieve**: Set ``exclude_recent: false`` (may duplicate recent context)

.. _recording-configuration:

Recording Settings (v0.1.9+)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**New in v0.1.9**: Control what gets recorded to memory for better observability and learning.

.. code-block:: yaml

   memory:
     recording:
       record_all_tool_calls: false  # Record ALL MCP tools (not just workflow)
       record_reasoning: false       # Record reasoning chunks separately

**record_all_tool_calls** (default: ``false``):

:``false``: Only workflow tools (``new_answer``, ``vote``) are recorded
:``true``: ALL MCP tools are captured (``list_directory``, ``read_file``, ``write_file``, etc.)

**When to enable**:

- Learning tool usage patterns across sessions
- Debugging which tools agents use most
- Understanding tool sequences (e.g., "directory_tree → read_file → grep")
- Maximum observability during development

**Example with ALL tools enabled**:

.. code-block:: text

   [Tool Usage]
   [Tool Call: mcp__filesystem__directory_tree]
   Arguments: {"path": "/Users/.../massgen"}
   Result: [directory structure with 50+ files...]

   [Tool Call: mcp__filesystem__read_text_file]
   Arguments: {"path": ".../orchestrator.py"}
   Result: [full file contents...]

   [Tool Call: new_answer]
   Arguments: {"content": "Architecture analysis complete..."}
   Result: Answer submitted
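
The effect of ``record_all_tool_calls`` can be summarised as a simple filter. The sketch below is an illustration under stated assumptions: ``WORKFLOW_TOOLS`` and ``should_record_tool_call`` are hypothetical names, and only the two workflow tools named above are treated as always recorded.

.. code-block:: python

   # Workflow tools named in this guide; the constant itself is hypothetical.
   WORKFLOW_TOOLS = {"new_answer", "vote"}


   def should_record_tool_call(tool_name: str, record_all_tool_calls: bool = False) -> bool:
       """Decide whether a tool call would be written to memory."""
       if tool_name in WORKFLOW_TOOLS:
           return True  # workflow tools are recorded in both modes
       return record_all_tool_calls  # MCP tools only when the flag is set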

**record_reasoning** (default: ``false``):

:``false``: Reasoning mixed with final answer in main response
:``true``: Reasoning chunks saved separately with ``[Reasoning]`` prefix

**When to enable**:

- Debugging agent decision-making
- Learning problem-solving approaches
- Capturing strategic thinking separate from final output

**Example with reasoning enabled**:

.. code-block:: text

   [Reasoning]
   I should analyze the file structure first before diving into specific implementations.
   This will help me build a mental model of the codebase organization.

   [Reasoning Summary]
   Decided to use directory_tree followed by selective file reads for systematic analysis.

   Final answer: The codebase follows a modular architecture...

**Performance Impact**:

- **With both disabled** (default): ~1-2 KB per recording, concise memory
- **With both enabled**: ~10-50 KB per recording, maximum detail
- **mem0 extraction cost**: Same LLM calls regardless (extracts from whatever is sent)

**Recommendation**:

- **Development**: Enable both for debugging
- **Production**: Keep disabled for concise, focused memory

Monitoring and Debugging
-------------------------

Context Window Logs
~~~~~~~~~~~~~~~~~~~

MassGen uses **buffer-based context tracking** to accurately monitor token usage. The conversation buffer captures ALL content including tool calls, tool results, injections, and reasoning—not just turn-level messages.

**Token Tracking Priority**:

1. **Official API counts** (at stream end): Most accurate for cost/pricing
2. **Buffer estimation** (fallback): Captures all content provided by API

Monitor context usage in real-time:

.. code-block:: text

   # Using official API token counts (most accurate)
   📊 Context Window (Turn 5): 45,000 / 128,000 tokens (35%) [API actual]

   # Using buffer estimation (fallback, assuming API provides all content)
   📊 Context Buffer (Turn 5): 45,000 / 128,000 tokens (35%) [buffer]

When compression triggers:

.. code-block:: text

   ⚠️  Context Buffer (Turn 11): 96,000 / 128,000 tokens (75%) [buffer] - Approaching limit!
   🔄 Attempting compression (96,000 → 51,200 tokens)
   📦 Context compressed: Removed 15 messages (44,800 tokens).
      Kept 8 recent messages (51,200 tokens).

**Why Buffer-Based Tracking?**

The conversation buffer is the true source of context sent to agents. Unlike message-based tracking, it includes:

- Tool calls and their arguments
- Tool results (can be very large)
- Injections from other agents
- Pending content not yet flushed
- Reasoning/thinking content (may not be available, depending on the API)

This provides accurate context usage even mid-stream, before official API counts are available.
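
The token tracking priority described above (official API counts first, buffer estimation as fallback) can be sketched as follows. ``context_usage_line`` is a hypothetical helper written for this guide; it only mimics the shape of the log lines shown earlier and is not MassGen's logging code.

.. code-block:: python

   from typing import Optional


   def context_usage_line(
       turn: int,
       limit: int,
       api_tokens: Optional[int] = None,
       buffer_tokens: Optional[int] = None,
   ) -> str:
       """Format a usage line, preferring the official API count."""
       if api_tokens is not None:
           label, source, used = "Context Window", "API actual", api_tokens
       elif buffer_tokens is not None:
           label, source, used = "Context Buffer", "buffer", buffer_tokens
       else:
           raise ValueError("need an API count or a buffer estimate")
       percent = used * 100 // limit  # integer percentage, rounded down
       return f"{label} (Turn {turn}): {used:,} / {limit:,} tokens ({percent}%) [{source}]"


   print(context_usage_line(5, 128_000, api_tokens=45_000))
   print(context_usage_line(11, 128_000, buffer_tokens=96_000))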

Memory Operations
~~~~~~~~~~~~~~~~~

**Recording**:

.. code-block:: text

   🔍 [_mem0_add] Recording to mem0 (agent=agent_a, session=session_123, turn=1)
      messages: 2 message(s)
      assistant: [Reasoning] I analyzed the backend files...
      assistant: The backend system consists of...
   ✅ mem0 extracted 5 fact(s), 2 relation(s)

**Retrieval**:

.. code-block:: text

   🔄 Retrieving memories after reset for agent_a (restoring recent context + 1 winner(s))...
   🔍 [retrieve] Searching memories (agent=agent_a, limit=5, winners=1)
      Previous winners: [{'agent_id': 'agent_b', 'turn': 1}]
      🔎 Searching own memories (agent_a)...
         → Found 3 memory/memories
      🔎 Searching 1 previous winner(s)...
         → Searching agent_b (turn 1)...
            Found 2 memory/memories
   ✅ Total: 5 memories retrieved
      [1] User asked about MassGen architecture
      [2] [From agent_b Turn 1] Explained the adapter pattern

Debug Files (v0.1.9+)
~~~~~~~~~~~~~~~~~~~~~

**New in v0.1.9**: Memory debug mode saves complete message→fact mappings when using the ``--debug`` flag.

**Enable debug mode**:

.. code-block:: bash

   massgen --debug --config your_config.yaml "Your question"

**Debug files saved to**:

.. code-block:: text

   .massgen/massgen_logs/log_{timestamp}/attempt_{N}/memory_debug/
   └── {agent_id}/
       ├── turn_1_20251029_200335.json
       ├── turn_2_20251029_200438.json
       └── turn_3_20251029_200557.json

**File structure**:

.. code-block:: json

   {
     "timestamp": "2025-10-29T20:03:35.123456",
     "agent_id": "test_agent",
     "session_id": "temp_20251029_200122",
     "turn": 1,
     "metadata": {
       "tools_used": ["mcp__filesystem__directory_tree", "read_text_file"],
       "has_tools": true,
       "message_count": 1
     },
     "messages_sent": [
       {
         "role": "assistant",
         "content": "[Tool Usage]\n[Tool Call: directory_tree]\nArguments: {...}\nResult: ..."
       }
     ],
     "facts_extracted": [
       {
         "id": "abc123",
         "memory": "For analyzing Python codebases, directory_tree → read_file sequence...",
         "event": "ADD"
       }
     ],
     "extraction_count": 10
   }

**Use cases**:

- **Verify tool capture**: Check if MCP tools appear in ``messages_sent``
- **Tune prompts**: Compare input vs. extracted facts to improve extraction quality
- **Debug 0 facts**: See what content was sent when extraction fails
- **Monitor quality**: Review if facts are actionable or generic
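
For the "Debug 0 facts" use case, a small script can scan the debug directory for turns where nothing was extracted. This is a minimal sketch that assumes the directory layout and the ``agent_id``, ``turn``, and ``facts_extracted`` fields shown above; ``find_empty_extractions`` is a hypothetical helper, not part of MassGen.

.. code-block:: python

   import json
   from pathlib import Path


   def find_empty_extractions(memory_debug_dir):
       """Return (agent_id, turn, file name) for debug files with no facts."""
       empty = []
       # Layout assumed from this guide: memory_debug/{agent_id}/turn_*.json
       for path in sorted(Path(memory_debug_dir).glob("*/turn_*.json")):
           record = json.loads(path.read_text(encoding="utf-8"))
           if not record.get("facts_extracted"):
               empty.append((record.get("agent_id"), record.get("turn"), path.name))
       return empty

Pass the path of a ``memory_debug/`` directory from one of your own runs; each returned tuple points at a file whose ``messages_sent`` is worth inspecting.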

Testing Memory Setup
~~~~~~~~~~~~~~~~~~~~

Verify your memory configuration:

.. code-block:: bash

   # Run test script
   uv run python scripts/test_memory_setup.py

Expected output:

.. code-block:: text

   🧪 MEMORY SYSTEM TEST SUITE

   ============================================================
   TEST 1: Environment Variables
   ============================================================
   ✅ OPENAI_API_KEY found (starts with: sk-proj...)

   ============================================================
   TEST 2: OpenAI Embedding API
   ============================================================
   ✅ Embedding successful!
      Vector dimensions: 1536

   ============================================================
   TEST 3: mem0 LLM API (gpt-4.1-nano)
   ============================================================
   ✅ LLM call successful!

   ============================================================
   TEST 4: Qdrant Connection
   ============================================================
   ✅ Qdrant server connected!

   ============================================================
   TEST 5: Full Memory Integration
   ============================================================
   ✅ PersistentMemory created!
   ✅ Messages recorded!

Advanced Usage
--------------

Per-Agent Memory Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Override memory settings for specific agents:

.. code-block:: yaml

   memory:
     # Global defaults
     retrieval:
       limit: 5

   agents:
     - id: "researcher"
       memory:
         retrieval:
           limit: 20  # This agent gets more context

     - id: "writer"
       memory:
         retrieval:
           limit: 3   # This agent gets less
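
One way to picture how a per-agent override combines with the global defaults is a recursive dictionary merge. The sketch below assumes key-by-key merging, so unspecified keys fall back to the global value; this guide does not state how MassGen resolves overrides internally, and ``merge_memory_config`` is a hypothetical name.

.. code-block:: python

   from copy import deepcopy


   def merge_memory_config(defaults: dict, overrides: dict) -> dict:
       """Recursively overlay per-agent settings on the global defaults."""
       merged = deepcopy(defaults)
       for key, value in overrides.items():
           if isinstance(value, dict) and isinstance(merged.get(key), dict):
               merged[key] = merge_memory_config(merged[key], value)
           else:
               merged[key] = deepcopy(value)
       return merged


   global_memory = {"retrieval": {"limit": 5, "exclude_recent": True}}
   researcher = merge_memory_config(global_memory, {"retrieval": {"limit": 20}})
   # researcher["retrieval"] keeps exclude_recent and raises limit to 20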

Different Embedding Providers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Using Together AI** (cost-effective):

.. code-block:: yaml

   persistent_memory:
     embedding:
       provider: "together"
       model: "togethercomputer/m2-bert-80M-8k-retrieval"
       # Reads TOGETHER_API_KEY from environment

**Using Azure OpenAI**:

.. code-block:: yaml

   persistent_memory:
     llm:
       provider: "azure_openai"
       model: "gpt-4o-mini"
       api_key: "${AZURE_OPENAI_API_KEY}"
     embedding:
       provider: "azure_openai"
       model: "text-embedding-ada-002"

Session Continuation
~~~~~~~~~~~~~~~~~~~~

**Continue a previous session**:

.. code-block:: yaml

   persistent_memory:
     session_name: "codebase_analysis_oct2025"

All agents will access memories from this session across multiple CLI runs.

**Cross-session knowledge**:

.. code-block:: yaml

   persistent_memory:
     session_name: null  # Search across ALL sessions

Useful for:

- Building knowledge base across projects
- Learning from past conversations
- Avoiding repeating analysis

Troubleshooting
---------------

Common Issues
~~~~~~~~~~~~~

**Qdrant Connection Error**

.. code-block:: text

   ⚠️  Failed to create shared Qdrant client: Storage folder .massgen/qdrant
   is already accessed by another instance

**Solution**:

1. Check whether the Qdrant server is running:

   .. code-block:: bash

      docker-compose -f docker-compose.qdrant.yml ps

2. Remove stale lock files:

   .. code-block:: bash

      ./scripts/cleanup_qdrant_lock.sh
      # Or manually:
      rm .massgen/qdrant/.lock

3. Use server mode for multi-agent setups:

   .. code-block:: yaml

      qdrant:
        mode: "server"

**API Key Not Found**

.. code-block:: text

   ⚠️  OPENAI_API_KEY not found in environment - embedding will fail!

**Solution**:

Create a ``.env`` file in the project root:

.. code-block:: bash

   OPENAI_API_KEY=sk-proj-...
   ANTHROPIC_API_KEY=sk-ant-...  # If using Anthropic

**No Memories Retrieved**

.. code-block:: text

   🔄 Retrieving memories after reset...
   ℹ️  No relevant memories found

**This is normal if**:

- First turn (no memories yet)
- Query doesn't match stored memories semantically
- mem0 hasn't processed messages yet (async extraction)

**Check**:

1. Verify recording succeeded: Look for ``✅ mem0 extracted X fact(s)`` in logs
2. Browse Qdrant collections: http://localhost:6333/dashboard
3. Check debug files: ``.massgen/.../memory_debug/*.json``

**0 Facts Extracted**

.. code-block:: text

   ✅ mem0 extracted 0 fact(s), 0 relation(s)
   ⚠️  mem0 extracted 0 facts (check fact extraction prompt or content quality)

**Common causes**:

1. **Content too short**: Less than 10 chars or empty messages
2. **Weak extraction model**: gpt-4o-mini may fail on complex content
3. **Generic content**: No extractable facts (e.g., voting messages)
4. **JSON parsing error**: Model hit token limit mid-response

**Solutions**:

1. Use a stronger model: Change ``llm.model`` to ``"gpt-4o"``
2. Enable debug mode: ``--debug`` to inspect ``messages_sent``
3. Check content length in logs: ``Combined content length: X chars``
4. Enable ``record_all_tool_calls: true`` to provide more context

**PointStruct Validation Errors**

.. code-block:: text

   Error: 6 validation errors for PointStruct
   vector.list[float] Input should be a valid list [type=list_type, input_value=None]

**Cause**: The embedding API returned ``None`` instead of a valid vector

**Common reasons**:

1. **Empty content**: Message with no text sent to embedding API
2. **API failure**: Rate limit, timeout, or invalid API key
3. **Malformed input**: Special characters or encoding issues

**Solution**: This is now automatically prevented by content validation (messages < 10 chars filtered out). If still occurring, check API key and embedding provider status.
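
The content validation mentioned above can be pictured as a pre-filter applied before anything reaches the embedding API. This is a sketch under stated assumptions: the 10-character minimum comes from this guide, while ``MIN_CONTENT_CHARS``, ``filter_embeddable``, and the whitespace stripping are hypothetical details chosen for the example.

.. code-block:: python

   MIN_CONTENT_CHARS = 10  # threshold quoted in this guide


   def filter_embeddable(messages):
       """Drop messages whose text is missing or too short to embed."""
       kept = []
       for message in messages:
           content = message.get("content")
           # Stripping whitespace before measuring is an assumption.
           if isinstance(content, str) and len(content.strip()) >= MIN_CONTENT_CHARS:
               kept.append(message)
       return kept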

**JSON Parsing Errors from mem0**

.. code-block:: text

   Invalid JSON response: Unterminated string starting at: line 108 column 7

**Cause**: mem0's extraction LLM hit its token limit mid-response and left a JSON string unterminated

**Solution**: Use a stronger extraction model (gpt-4o) or reduce the content length

Cleaning Up
~~~~~~~~~~~

**Stop Qdrant**:

.. code-block:: bash

   docker-compose -f docker-compose.qdrant.yml down

**Clear all memories**:

.. code-block:: bash

   # Remove Qdrant storage (WARNING: deletes all memories!)
   rm -rf .massgen/qdrant_storage

**Clear session data**:

.. code-block:: bash

   # Remove specific session
   rm -rf .massgen/memory_test_sessions/session_20251028_143000

   # Or all sessions
   rm -rf .massgen/memory_test_sessions

.. _design-decisions:

Design Decisions
----------------

.. raw:: html

   <details>
   <summary><strong>Why These Architecture Choices?</strong> (Click to expand)</summary>

Why mem0's Native LLMs/Embedders?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Decision**: Use mem0's built-in providers (OpenAI, Anthropic, etc.) instead of wrapping MassGen backends

**Rationale**:

- **Simpler**: No adapter layer, direct integration
- **No async issues**: mem0's adapters are sync, wrapping async MassGen backends caused event loop conflicts
- **Optimized**: mem0's default (gpt-4.1-nano) is optimized for memory operations
- **Flexible**: Support for many providers without custom code

**Trade-off**: Requires separate API keys (can't reuse agent's backend). But memory operations are cheap (~1-2 cents/session).

Why MCP Tools Are Optional in Memory (v0.1.9+)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Default**: MCP tool calls (read_file, list_directory, etc.) are **not** recorded

**Rationale**:

1. **Implementation details**: HOW the work was done, not WHAT was learned
2. **Redundant**: The final answer usually captures insights from reading those files
3. **Noise**: 50+ file reads can overwhelm mem0's extraction, making it harder to extract semantic facts
4. **Focus on outcomes**: Agent's conclusions more valuable than execution trace
5. **Token efficiency**: Keeps memory concise and focused

**Example (default mode)**:

.. code-block:: text

   Recorded to memory:
   ✅ Final answer: "The backend uses an adapter pattern in base.py that enables provider abstraction"

   Not recorded:
   ❌ [Tool: read_file] path=/foo/base.py
   ❌ [Tool: read_file] path=/foo/openai.py
   ❌ [Tool: read_file] path=/foo/claude.py

**When to Enable** (``record_all_tool_calls: true``):

- **Learning tool patterns**: Understand which tool sequences work best
- **Debugging**: See exactly what agent explored
- **Pattern analysis**: Extract insights like "directory_tree before read_file is more effective"
- **Development**: Maximum observability during testing

**Example (all tools mode)**:

.. code-block:: text

   Recorded to memory:
   ✅ [Tool Call: mcp__filesystem__directory_tree]
      Arguments: {"path": "/massgen"}
      Result: [50+ files and directories...]
   ✅ [Tool Call: mcp__filesystem__read_text_file]
      Arguments: {"path": "/massgen/base.py"}
      Result: [full file contents...]
   ✅ Final answer: "The backend uses an adapter pattern..."

mem0's LLM can then extract: "For analyzing codebases, using directory_tree first followed by reading key files provides systematic understanding"

**If you just need execution history** (not learning patterns): Check orchestrator logs or agent workspace snapshots instead.

Why Record Reasoning?
~~~~~~~~~~~~~~~~~~~~~

**Decision**: Include full reasoning chains and summaries in memory

**Rationale**:

- **Context for decisions**: A final answer is harder to interpret without the reasoning behind it
- **Better fact extraction**: mem0's LLM can extract richer facts from reasoning
- **Debugging**: Understand WHY agent made certain choices
- **Learning**: Future turns benefit from understanding past reasoning

**Example memory facts extracted**:

- Without reasoning: "Agent said backend uses adapters"
- With reasoning: "Agent analyzed base.py first, then compared 5 implementations, concluded adapters enable provider abstraction"

Why Filter System Messages?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Decision**: Exclude ``role: "system"`` messages from memory

**Rationale**:

- **Orchestrator noise**: System messages contain coordination prompts like "You are evaluating answers from multiple agents..."
- **Not conversation content**: System prompts are framework instructions, not user/agent dialogue
- **Bloat**: Can be 5-10KB per message, mostly boilerplate
- **Focus on semantics**: User questions and agent answers are what matter for memory

Why Smart Retrieval (exclude_recent)?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Decision**: Default ``exclude_recent: true`` - only retrieve after compression

**Rationale**:

- **Before compression**: All context already in conversation_memory sent to LLM
- **Retrieval would duplicate**: Waste tokens on information already present
- **After compression**: Old messages removed, retrieval fills the gap
- **On restart**: Always retrieve to restore context

**Token efficiency**:

- Without exclude_recent: ~500 extra tokens per turn (duplicated context)
- With exclude_recent: ~100 tokens only when needed (after compression)
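
The retrieval rule described in this section reduces to a small decision function. The sketch below restates the rationale in code; ``should_retrieve`` and its boolean parameters are hypothetical names, not MassGen's API.

.. code-block:: python

   def should_retrieve(exclude_recent: bool, compressed: bool, restarted: bool) -> bool:
       """Decide whether to query persistent memory before a turn."""
       if restarted:
           return True  # on restart, always retrieve to restore context
       if not exclude_recent:
           return True  # "always retrieve" mode, may duplicate recent context
       return compressed  # otherwise retrieve only after compression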

Context Compression Thresholds
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Decision**: Default 75% trigger, 40% target

**Rationale**:

- **75% trigger**: Provides buffer before hitting limit (avoid truncation)
- **40% target**: Balances context retention vs. token budget
- **Room for retrieval**: Retrieved facts + recent context fit comfortably
- **Headroom for response**: LLM has space to generate long responses

**Alternative configurations**:

- **Long analysis tasks**: Lower threshold (50%) to compress more aggressively
- **Short conversations**: Higher threshold (90%) to compress rarely

Why Qdrant Server for Multi-Agent?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Decision**: Require Qdrant server mode (Docker) for multi-agent setups

**Rationale**:

- **Concurrent access**: File-based Qdrant locks on first access
- **Performance**: Server mode handles parallel searches better
- **Robustness**: No stale lock files from crashed processes
- **Scalability**: Can scale to many agents

**Trade-off**: Requires Docker. But setup is one command: ``docker-compose -f docker-compose.qdrant.yml up -d``

Why Separate Memories Per Agent?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Decision**: Each agent has isolated memories, filtered by ``agent_id``

**Rationale**:

- **Specialization**: Different agents can build different knowledge bases
- **Controlled sharing**: Only share via turn-aware winner mechanism
- **Scalability**: Single Qdrant database, filtered by metadata
- **Privacy**: Agent-specific knowledge stays private until winning

**Alternative considered**: Shared memory pool for all agents. Rejected because:

- Information overload: Agent sees irrelevant memories from other agents
- Loss of specialization: Can't maintain agent-specific expertise
- Temporal issues: Agent sees work-in-progress from concurrent agents

Why Turn-Aware Memory Filtering?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Decision**: Filter previous winners' memories by ``{"turn": 1}`` metadata

**Rationale**:

**Prevents temporal leakage**:

.. code-block:: text

   Turn 2 (concurrent):
   - agent_a working... (incomplete)
   - agent_b working... (incomplete)

   Without filtering:
   - agent_a could see agent_b's Turn 2 work-in-progress ❌
   - Leads to confusion, inconsistent state

   With filtering:
   - agent_a only sees agent_b's Turn 1 (complete, winner) ✅
   - Clean separation of concurrent work

**Implementation**: Memories tagged with ``{"turn": N}`` on recording, filtered on retrieval.
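
The filtering rule can be sketched as follows. The example assumes each memory carries ``agent_id`` and ``turn`` metadata and that previous winners are given in the form shown in the retrieval logs (``[{'agent_id': 'agent_b', 'turn': 1}]``); ``visible_memories`` is a hypothetical function, not MassGen's retrieval code.

.. code-block:: python

   def visible_memories(memories, agent_id, previous_winners):
       """Keep the agent's own memories plus winners' memories from winning turns."""
       winner_turns = {(w["agent_id"], w["turn"]) for w in previous_winners}
       visible = []
       for memory in memories:
           owner, turn = memory["agent_id"], memory["turn"]
           if owner == agent_id or (owner, turn) in winner_turns:
               visible.append(memory)
       return visible


   memories = [
       {"agent_id": "agent_a", "turn": 2, "memory": "own work in progress"},
       {"agent_id": "agent_b", "turn": 1, "memory": "winning answer from turn 1"},
       {"agent_id": "agent_b", "turn": 2, "memory": "concurrent work in progress"},
   ]
   seen = visible_memories(memories, "agent_a", [{"agent_id": "agent_b", "turn": 1}])
   # agent_b's turn-2 memory is filtered out for agent_a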

.. raw:: html

   </details>

API Reference
-------------

For programmatic usage, see the memory module docstrings:

- ``massgen.memory.PersistentMemory`` - Persistent memory API
- ``massgen.memory.ConversationMemory`` - Conversation memory API
- ``massgen.memory._context_monitor`` - Context monitoring utilities

  - ``log_context_usage_from_buffer(buffer, turn_number)`` - Buffer-based tracking (recommended)
  - ``log_context_usage_from_tokens(tokens, turn_number)`` - Official API token counts
  - ``log_context_usage(messages, turn_number)`` - Legacy message-based tracking

- ``massgen.conversation_buffer.AgentConversationBuffer`` - Conversation buffer

  - ``estimate_tokens(calculator)`` - Get total token count including pending content
  - ``get_token_stats(calculator)`` - Get breakdown by entry type (user, assistant, tool_call, etc.)

Examples
--------

See complete examples in:

- ``massgen/configs/memory/gpt5mini_gemini_context_window_management.yaml``
- ``massgen/configs/memory/gpt5mini_high_reasoning_gemini.yaml``

Future Improvements
-------------------

.. note::
   The memory system is production-ready but has several planned enhancements.

Planned Features
~~~~~~~~~~~~~~~~

**1. Proactive Streaming Interruption** *(Partially Implemented)*

**Implemented**: Buffer-based token tracking captures ALL content during streaming:

.. code-block:: text

   [Agent streaming response...]
   → [Buffer tracks: tool calls, results, reasoning, content]
   → [Pre-processing: 📊 Context Buffer: 45K / 128K tokens [buffer]]
   → [Post-processing: 📊 Context Window: 45K / 128K tokens [API actual]]
   → [Compress if needed]

**Remaining**: Proactive interruption when approaching budget

**Planned**: Inject warning to agent mid-stream when approaching limit

.. code-block:: text

   [Agent streaming...]
   → [Buffer counter: 95K / 128K budget]
   → [Agent sees: "⚠️ Approaching token limit, wrap up"]
   → [Agent concludes early]

**2. Memory Analytics Dashboard**

**Planned**: Visualize memory quality and tool usage patterns

.. code-block:: text

   Memory Analytics Dashboard
   ===========================

   Facts Extracted: 245 (last 7 days)
   Tool Patterns Learned: 12

   Top Tool Sequences:
   1. directory_tree → read_file → grep (85% success)
   2. list_directory → read_file (92% success)

   Fact Quality:
   - Actionable: 78%
   - Generic: 15%
   - Redundant: 7%

**3. Smart Tool Result Summarization**

**Planned**: Automatically summarize large MCP tool results before recording

.. code-block:: yaml

   memory:
     recording:
       record_all_tool_calls: true
       summarize_large_results: true  # Auto-summarize results > 5KB
       summary_model: "gpt-4o-mini"   # Model for summarization

**Benefit**: Capture tool usage patterns without overwhelming mem0's extraction LLM with 50KB directory trees

**4. Memory Summarization on Compression** *(Implemented)*

Compression now generates a comprehensive summary of all work done:

.. code-block:: text

   Compression Flow:
   1. Context limit error detected
   2. Generate summary of buffer content (tool calls, results, analysis)
   3. Rebuild context: [system] → [user request] → [summary]
   4. Summary placed LAST so model continues from it (not restart)

The user→summary ordering prevents models from re-reading files or redoing analysis
that was already completed before compression.
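
The rebuilt message order can be sketched as a plain list of chat messages. This is an illustration only: ``rebuild_context`` is a hypothetical helper, and placing the summary in an ``assistant`` message is an assumption, since this guide specifies the ordering but not the role used for the summary.

.. code-block:: python

   def rebuild_context(system_prompt: str, user_request: str, summary: str) -> list:
       """Rebuild the context as [system] -> [user request] -> [summary]."""
       return [
           {"role": "system", "content": system_prompt},
           {"role": "user", "content": user_request},
           # The summary goes last so the model continues from it
           # instead of restarting the task. Role is an assumption.
           {"role": "assistant", "content": summary},
       ]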

Known Limitations
~~~~~~~~~~~~~~~~~

**Token Counting During Streaming** *(Improved in v0.1.25+)*

Buffer-based tracking now provides context estimates during streaming:

- ✅ **Before processing**: Buffer estimation shows current context size
- ✅ **After response**: Official API counts used when available
- ✅ **Accurate tracking**: Includes tool calls, results, injections, reasoning
- ❌ Can't stop mid-response if too large (proactive interruption planned)
- ❌ No real-time budget warnings to agent yet
- ❌ Reasoning content is not exposed by every API, so buffer estimates can be inaccurate

**Workaround**: Set conservative compression thresholds (50-60%) to leave headroom.

**Extraction Quality Depends on Model**

The quality of extracted facts varies significantly by model:

- **gpt-4.1-nano / gpt-4o-mini**: Fast, cheap, but may produce generic facts or JSON parsing errors on complex content
- **gpt-4o / gpt-4-turbo**: Slower, more expensive, but extracts specific, actionable insights

**Recommendation**: Use gpt-4o-mini for development, gpt-4o for production if fact quality matters.

**MCP Tools Recording is Opt-In**

By default, MCP tool calls (read_file, list_directory) are excluded to keep memory concise.

**To enable**: Set ``memory.recording.record_all_tool_calls: true``

**Trade-off**: More data for pattern learning vs. potential information overload for mem0's extraction LLM.

**Session-Level Memory Isolation**

Memories are isolated per session. To access knowledge from previous sessions, either:

- Set ``session_name: null`` (search all sessions)
- Explicitly continue a session with ``session_name: "my_session"``

**Local Qdrant Single-Agent Only**

File-based Qdrant (``mode: "local"``) does NOT support concurrent access.

**For multi-agent**: Always use ``mode: "server"`` with Docker.

Next Steps
----------

- :doc:`multi_turn_mode` - Interactive multi-turn conversations
- :doc:`orchestration_restart` - Graceful restart handling
- :doc:`../logging` - Understanding MassGen's logging system


---

## user_guide/sessions/multi_turn_mode.rst

Interactive Multi-Turn Mode
===========================

MassGen supports interactive mode where you can have ongoing conversations with the system. Agents maintain context across multiple turns and collaborate on each response.

Starting Interactive Mode
--------------------------

Simply omit the question when running MassGen to enter interactive chat mode:

**Single agent:**

.. code-block:: bash

   # Interactive mode with quick model selection
   massgen --model gpt-5-mini

**Multi-agent:**

.. code-block:: bash

   # Multi-agent interactive mode
   massgen \
     --config @examples/basic/multi/three_agents_default.yaml

**The Interactive Interface:**

.. code-block:: text

   ╭──────────────────────────────────────────────────────────────────────────────╮
   │                                                                              │
   │       ███╗   ███╗ █████╗ ███████╗███████╗ ██████╗ ███████╗███╗   ██╗         │
   │       ████╗ ████║██╔══██╗██╔════╝██╔════╝██╔════╝ ██╔════╝████╗  ██║         │
   │       ██╔████╔██║███████║███████╗███████╗██║  ███╗█████╗  ██╔██╗ ██║         │
   │       ██║╚██╔╝██║██╔══██║╚════██║╚════██║██║   ██║██╔══╝  ██║╚██╗██║         │
   │       ██║ ╚═╝ ██║██║  ██║███████║███████║╚██████╔╝███████╗██║ ╚████║         │
   │       ╚═╝     ╚═╝╚═╝  ╚═╝╚══════╝╚══════╝ ╚═════╝ ╚══════╝╚═╝  ╚═══╝         │
   │                                                                              │
   │            🤖 🤖 🤖  →  💬 collaborate  →  🎯 winner  →  📢 final            │
   │                                                                              │
   ╰──────────────────────────────────────────────────────────────────────────────╯

   ╭──────────────────────────────────────────────────────────────────────────────╮
   │    🤝 Mode:                Multi-Agent (3 agents)                            │
   │      ├─ openai_agent_1:    gpt-5 (Response)                                  │
   │      ├─ gemini_agent_2:    gemini-2.5-flash (Gemini)                         │
   │      └─ grok_agent_3:      grok-4-fast-reasoning (Grok)                      │
   ╰──────────────────────────────────────────────────────────────────────────────╯

   ╭──────────────────────────────────────────────────────────────────────────────╮
   │  💬  Type your questions below                                               │
   │  💡  Use: /help, /quit, /reset, /status, /config, /context, /inspect         │
   │  📝  For multi-line input: start with """ or '''                             │
   │  ⌨️   Press Ctrl+C to exit                                                   │
   ╰──────────────────────────────────────────────────────────────────────────────╯

How It Works
------------

In interactive mode:

1. **Context Preservation** - Each response builds on previous conversation history
2. **Multi-Agent Collaboration** - Agents continue to vote and reach consensus on each turn
3. **Session Management** - All conversation state preserved in ``.massgen/sessions/``
4. **Natural Conversation** - Type your questions, press Enter, get collaborative responses
5. **Queued Runtime Input Persistence** - If runtime-injected queue content is promoted to a fallback turn prompt and that turn is cancelled, the prompt is still kept as a user-history entry for future turns
6. **Restart-Safe Runtime Instructions** - Runtime input already delivered to an agent stays in that agent's context for the rest of the turn, including round restarts, and is shown as a ``<RUNTIME USER INSTRUCTIONS>`` block after the original-message section

**Example session:**

.. code-block:: text

   You: What is machine learning?
   [Agents collaborate and provide comprehensive answer]

   You: Give me a practical example of supervised learning
   [Agents use context from previous turn to provide relevant examples]

   You: How can I implement that in Python?
   [Agents build on previous examples with implementation code]

   You: /quit
   Exiting MassGen. Goodbye!

Interactive Features
--------------------

Multi-Turn Conversations
~~~~~~~~~~~~~~~~~~~~~~~~

Multiple agents collaborate to chat with you in an ongoing conversation. Each agent:

* Sees full conversation history
* Builds on previous responses
* Votes and reaches consensus on each turn
* Maintains context about your goals and preferences

Real-Time Coordination Tracking
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Live visualization of agent interactions:

* Agent coordination table showing votes and consensus
* Real-time phase transitions (Initial → Coordination → Presentation)
* Voting progress and decision-making processes
* Streaming agent responses

Interactive Coordination Table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``/inspect`` after a turn completes to view:

* Complete history of agent coordination events
* State transitions for each agent
* Voting patterns and consensus evolution
* Individual agent outputs before voting
* Final coordinated answer

Session Management
------------------

Session Storage
~~~~~~~~~~~~~~~

When using interactive mode, MassGen automatically stores session state in:

.. code-block:: text

   .massgen/
   └── sessions/
       └── session_20250108_143022/
           ├── turn_1/               # Results from first turn
           │   ├── agent_outputs/
           │   └── coordination_log.json
           ├── turn_2/               # Results from second turn
           │   ├── agent_outputs/
           │   └── coordination_log.json
           └── SESSION_SUMMARY.txt   # Human-readable summary

Benefits:

* **Resume sessions** - Continue from where you left off
* **Review history** - Examine past turns and agent decisions
* **Debug conversations** - Understand coordination patterns
* **Track progress** - See how agents evolved their understanding
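
To review history outside the interactive UI, the session directory can be walked with a few lines of Python. This sketch relies only on the ``turn_<N>/`` directory naming shown above; ``list_turns`` is a hypothetical helper, not a MassGen function.

.. code-block:: python

   import re
   from pathlib import Path


   def list_turns(session_dir):
       """Return the turn numbers found in a session directory, in order."""
       turns = []
       for child in Path(session_dir).iterdir():
           match = re.fullmatch(r"turn_(\d+)", child.name)
           if child.is_dir() and match:
               turns.append(int(match.group(1)))
       return sorted(turns)  # numeric sort, so turn_10 follows turn_2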

Configuration
~~~~~~~~~~~~~

Interactive mode uses the same YAML configuration as single-turn mode:

.. code-block:: yaml

   agents:
     - id: "agent1"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
     - id: "agent2"
       backend:
         type: "openai"
         model: "gpt-5-nano"

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

Working with Project Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Multi-turn mode supports full filesystem integration for working with your codebase across multiple turns:

.. code-block:: yaml

   orchestrator:
     # Share read-only source code across all agents
     context_paths:
       - path: "src/"
         permission: "read"
       - path: "tests/"
         permission: "read"
       - path: "docs/"
         permission: "read"

     # Agent workspaces for file modifications
     agent_temporary_workspace: ".massgen/temp_workspaces"
     snapshot_storage: ".massgen/snapshots"

   agents:
     - id: "agent_a"
       backend:
         type: "claude"
         model: "claude-sonnet-4"

         # Agent-specific workspace for modifications
         cwd: "workspace_a"
         # File operations handled automatically via cwd parameter

**Key Features:**

* ``context_paths`` - Grant agents read-only access to your source code
* ``cwd`` - Each agent gets an isolated workspace for file modifications
* ``agent_temporary_workspace`` - Temporary workspaces preserved across turns
* ``snapshot_storage`` - Workspace snapshots saved between turns

**Example workflow:**

.. code-block:: text

   You: Read the authentication module and explain how it works
   [Agents access src/ via context_paths and analyze code]

   You: Create an improved version with better error handling
   [Agents write to their workspace_a/ with modifications]

   You: Add unit tests for the new error handling
   [Agents build on previous turn's work, maintaining full context]

Interactive Commands
--------------------

Special commands available during interactive sessions:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Command
     - Description
   * - ``/help`` or ``/h``
     - Show available commands and help message
   * - ``/status``
     - Show current session status (agents, mode, conversation history, config path)
   * - ``/config``
     - Open configuration file in default editor (macOS, Windows, Linux)
   * - ``/context``
     - Add or modify context paths to give agents access to project files
   * - ``/inspect`` or ``/i``
     - View agent outputs and coordination data from current or previous turns
   * - ``/inspect <N>``
     - View outputs from a specific turn number (e.g., ``/inspect 2``)
   * - ``/inspect all``
     - List all turns in the current session with summary
   * - ``/clear`` or ``/reset``
     - Reset conversation history and start fresh
   * - ``/quit`` or ``/exit`` or ``/q``
     - Exit interactive mode
   * - ``Ctrl+C``
     - Exit interactive mode

Inspecting Turn History
-----------------------

The ``/inspect`` command allows you to review outputs from any turn in your multi-turn session:

**List all turns:**

.. code-block:: text

   👤 User: /inspect all

   ┌─────────────────────────────────────────────────────────────────────┐
   │                    Session: session_20250108_143022                 │
   ├──────┬──────────────────────────────────────────────────┬───────────┤
   │ Turn │ Task                                             │ Winner    │
   ├──────┼──────────────────────────────────────────────────┼───────────┤
   │ 1    │ What is machine learning?                        │ agent_a   │
   │ 2    │ Give me a practical example of supervised lear...│ agent_b   │
   │ 3    │ How can I implement that in Python?              │ agent_a   │
   └──────┴──────────────────────────────────────────────────┴───────────┘

   Use /inspect <turn_number> to view details

**Inspect a specific turn:**

.. code-block:: text

   👤 User: /inspect 2

   === Turn 2 Inspection ===

   ╭────────────────── Turn Metadata ──────────────────╮
   │ Task:    Give me a practical example of supervi...│
   │ Winner:  agent_b                                  │
   │ Time:    2025-01-08T14:35:22.123456               │
   │ Logs:    .massgen/massgen_logs/log_.../turn_2     │
   ╰───────────────────────────────────────────────────╯

   [Turn Inspection Menu]
     1: View agent_a output
     2: View agent_b output
     f: Show final answer
     s: Show system status log
     r: Show coordination table
     w: List workspace files (3 files)
     o: Open workspace in file browser
     q: Quit inspection

   Enter your choice:

**Inspection menu options:**

* **Agent outputs (1, 2, ...)** - View the full output from each agent before voting
* **Final answer (f)** - The coordinated response that was presented to you
* **System status (s)** - Orchestrator logs showing coordination decisions
* **Coordination table (r)** - Full history of voting and consensus
* **Workspace files (w)** - Files created by agents during that turn
* **Open workspace (o)** - Open the workspace folder in your file browser

This is particularly useful for:

* **Reviewing agent reasoning** - See how each agent approached the problem
* **Understanding voting patterns** - Check why a particular agent was selected
* **Debugging issues** - Examine coordination logs when results are unexpected
* **Learning from history** - Reference previous successful approaches

Real-Time Feedback
------------------

The system displays real-time agent and system status:

**Phase Indicators:**

.. code-block:: text

   ┌─ Initial Answer Generation ────────────────┐
   │ Agent1: Generating...                      │
   │ Agent2: Generating...                      │
   │ Agent3: Complete ✓                         │
   └────────────────────────────────────────────┘

**Coordination Table:**

.. code-block:: text

   ┌─ Coordination Round 1 ─────────────────────┐
   │ Agent     │ Status      │ Votes            │
   ├───────────┼─────────────┼──────────────────┤
   │ Agent1    │ Voted       │ Agent3           │
   │ Agent2    │ Voting...   │ -                │
   │ Agent3    │ Converged   │ Self             │
   └────────────────────────────────────────────┘

**Streaming Output:**

Watch agents' reasoning and responses develop in real-time as they think through the problem.

Use Cases for Interactive Mode
-------------------------------

**Iterative Research**
   Explore topics progressively, diving deeper based on previous responses.

**Code Development**
   Build projects step-by-step with agents refining code based on feedback.

**Learning and Tutoring**
   Ask follow-up questions to clarify concepts and build understanding.

**Exploratory Analysis**
   Investigate datasets or documents with agents maintaining analysis context.

**Creative Writing**
   Develop stories or content iteratively with collaborative refinement.

Example: Iterative Code Development
------------------------------------

.. code-block:: bash

   # Start interactive session with file operations
   massgen \
     --config @examples/tools/filesystem/claude_code_single.yaml

Session example:

.. code-block:: text

   You: Create a simple Flask web app
   [Agents create basic Flask structure]

   You: Add user authentication
   [Agents add authentication using context of existing structure]

   You: Add a database for storing user preferences
   [Agents integrate database with existing auth system]

   You: Write tests for the authentication
   [Agents create tests covering the implemented features]

Each turn builds on the work from previous turns, with agents maintaining full context of the evolving project.

Debugging Interactive Sessions
-------------------------------

Enable debug mode for detailed logging:

.. code-block:: bash

   massgen \
     --debug \
     --config @examples/basic/multi/three_agents_default.yaml

Debug logs are saved to ``agent_outputs/log_{timestamp}/massgen_debug.log`` and include:

* Full conversation history
* Agent decision-making processes
* Coordination events and state transitions
* Tool calls and backend operations

Best Practices
--------------

1. **Start Broad** - Begin with general questions, then drill down
2. **Reference Previous Turns** - Use "that", "the previous", "your earlier suggestion"
3. **Clear When Switching Topics** - Use ``/clear`` to reset context
4. **Review Coordination** - Use ``/inspect`` to understand agent decision patterns and compare outputs
5. **Save Important Outputs** - Session storage preserves all turns for later review
6. **Compare Agent Approaches** - Use ``/inspect`` to see how different agents approached the same problem

Next Steps
----------

* :doc:`../files/file_operations` - Learn about file operations in multi-turn sessions
* :doc:`../files/project_integration` - Work with your codebase across multiple turns
* :doc:`../tools/mcp_integration` - Use MCP tools in interactive mode
* :doc:`../../quickstart/running-massgen` - More CLI examples


---

## user_guide/sessions/orchestration_restart.rst

Orchestration Restart
=====================

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
--------

The orchestration restart feature allows the final agent to recognize when the current coordinated answers are insufficient and request a restart of the entire orchestration process with detailed instructions for improvement.

This is particularly useful for:

- Multi-step tasks where early attempts miss key steps
- Complex problems requiring iterative refinement
- Scenarios where irreversible actions must be performed correctly

How It Works
------------

After MassGen completes the voting phase and selects a final agent, instead of immediately presenting the final answer, the system:

1. **Decision Phase**: Asks the final agent to review all answers
2. **Submit or Restart**: The agent chooses to either:

   - Call ``submit`` → Confirms the task is complete, proceeds with final presentation
   - Call ``restart_orchestration`` → Requests a restart with specific instructions

3. **Restart Execution**: If restart is chosen:

   - All agent states are reset
   - Instructions are injected into agent prompts
   - Coordination runs again with improved guidance

4. **Limits**: The maximum number of restarts is configurable with ``max_orchestration_restarts`` (see the Configuration section below) to prevent infinite loops
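
The control flow above can be sketched in a few lines of Python. This is an illustrative model, not MassGen's implementation: ``run_coordination`` and ``final_agent_decision`` are hypothetical callables standing in for a coordination round and for the final agent's ``submit`` / ``restart_orchestration`` choice, and returning the last answer once the limit is reached is an assumption.

.. code-block:: python

   from typing import Callable, Optional, Tuple

   # Hypothetical decision shape: ("submit", None) or ("restart", "<instructions>")
   Decision = Tuple[str, Optional[str]]


   def orchestrate(
       run_coordination: Callable[[Optional[str]], str],
       final_agent_decision: Callable[[str], Decision],
       max_restarts: int,
   ) -> Tuple[str, int]:
       """Repeat coordination until the final agent submits or the limit is hit.

       Returns the last answer and the number of attempts made.
       """
       instructions: Optional[str] = None
       attempts = 0
       while True:
           attempts += 1
           # Each attempt starts fresh; only the restart instructions carry over.
           answer = run_coordination(instructions)
           action, new_instructions = final_agent_decision(answer)
           restarts_used = attempts - 1
           if action == "submit" or restarts_used >= max_restarts:
               return answer, attempts
           instructions = new_instructions


   def demo_coordination(instructions: Optional[str]) -> str:
       # Toy stand-in: agents only "implement" once they receive instructions.
       return "implemented" if instructions else "plan only"


   def demo_decision(answer: str) -> Decision:
       if answer == "plan only":
           return ("restart", "Actually modify the files")
       return ("submit", None)


   result = orchestrate(demo_coordination, demo_decision, max_restarts=2)

In this toy run the first attempt triggers a restart and the second is submitted, so ``result`` holds the improved answer together with an attempt count of 2.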

Final Agent Tools
-----------------

The final agent has access to two special tools:

Submit Tool
~~~~~~~~~~~

Confirms that the coordinated answers are satisfactory:

.. code-block:: json

   {
     "name": "submit",
     "parameters": {
       "confirmed": true
     }
   }

Use this when:

- All answers adequately address the task
- The task is complete
- No further work is needed

Restart Orchestration Tool
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Requests a restart with detailed instructions:

.. code-block:: json

   {
     "name": "restart_orchestration",
     "parameters": {
       "reason": "Agents provided plans but didn't execute actual implementation",
       "instructions": "Please actually implement the solution by modifying the files, not just describing what changes should be made"
     }
   }

Use this when:

- Current answers are incomplete or incorrect
- A different approach is needed
- Key steps were missed
- More specific guidance would help agents

Evaluation Process
------------------

After the winning agent presents their final answer, they evaluate the result:

1. **Presentation**: Final agent delivers complete answer with full tool access
2. **Evaluation**: Agent reviews the actual output quality
3. **Decision**: Agent chooses to submit (complete) or restart with improvements

This approach ensures agents evaluate actual execution, not just plans.

Configuration
-------------

Basic Configuration
~~~~~~~~~~~~~~~~~~~

Set the maximum number of restarts in your configuration:

.. code-block:: yaml

   # config.yaml
   orchestrator:
     coordination:
       max_orchestration_restarts: 2  # 3 total attempts: initial + 2 restarts (default: 0)

Programmatic Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from massgen.agent_config import CoordinationConfig, AgentConfig

   coordination_config = CoordinationConfig(
       max_orchestration_restarts=2  # Allow up to 2 restarts (3 total attempts)
   )

   config = AgentConfig(
       coordination_config=coordination_config
   )

Setting Restart Limits
~~~~~~~~~~~~~~~~~~~~~~

Each restart runs the full coordination process again. More restarts mean more time and higher API costs, but they can improve results on complex tasks.

Recommended values:

- ``max_orchestration_restarts: 0`` - No restarts (previous behavior)
- ``max_orchestration_restarts: 2`` - Standard tasks
- ``max_orchestration_restarts: 3`` - Complex tasks
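
The trade-off is simple arithmetic, sketched below. The helper names are hypothetical, and the cost estimate assumes every attempt costs roughly the same, which will not hold exactly in practice.

.. code-block:: python

   def total_attempts(max_orchestration_restarts: int) -> int:
       """Initial attempt plus the allowed restarts."""
       if max_orchestration_restarts < 0:
           raise ValueError("max_orchestration_restarts must be >= 0")
       return 1 + max_orchestration_restarts


   def worst_case_cost(cost_per_attempt: float, max_orchestration_restarts: int) -> float:
       """Upper bound on spend, assuming each attempt costs about the same."""
       return cost_per_attempt * total_attempts(max_orchestration_restarts)


   attempts_for_standard = total_attempts(2)
   budget_for_complex = worst_case_cost(0.5, 3)

With a limit of 2 the run can make up to three attempts; with a limit of 3 and a hypothetical cost of 0.5 per attempt, the worst case is four attempts at 2.0 in total.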

Use Cases
---------

Example 1: Description to Implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Scenario**: Agents describe changes without executing them.

**First Attempt**:

.. code-block:: text

   Agent 1: "I would modify app.py to add the login function..."
   Agent 2: "I would create a database migration to add the users table..."

**Final Agent Decision**:

.. code-block:: python

   restart_orchestration(
       reason="Agents only planned but didn't execute implementation",
       instructions="Please actually implement the changes by modifying the files and running necessary commands. Make real changes, not just descriptions."
   )

**Second Attempt**:

.. code-block:: text

   Agent 1: *Actually modifies app.py*
   Agent 2: *Creates and runs database migration*
   Result: Task completed successfully!

Example 2: Multi-Step Task
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Scenario**: Clone repository and solve an issue.

**First Attempt**:

.. code-block:: text

   Agents solve the issue but forget to clone the repo first

**Final Agent Decision**:

.. code-block:: python

   restart_orchestration(
       reason="Agents attempted to solve issue without cloning repository first",
       instructions="Step 1: Clone the repository. Step 2: Analyze the issue. Step 3: Implement the fix. Please follow these steps in order."
   )

**Second Attempt**:

.. code-block:: text

   Agents follow the steps correctly
   Repository is cloned, issue is analyzed and fixed
   Result: Success!

Example 3: Incomplete Solution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Scenario**: Web application deployment task.

**First Attempt**:

.. code-block:: text

   Agents set up the server but don't configure the database

**Final Agent Decision**:

.. code-block:: python

   restart_orchestration(
       reason="Server setup complete but database configuration missing",
       instructions="In addition to server setup, please configure the PostgreSQL database, run migrations, and verify the application connects successfully."
   )

**Second Attempt**:

.. code-block:: text

   Complete setup including database
   Result: Fully functional deployment!

Logs and Visibility
-------------------

Evaluation Logs
~~~~~~~~~~~~~~~

.. code-block:: text

   [2025-01-18 17:09:44] Final agent selected: agent_1
   [2025-01-18 17:09:45] 🎤 [agent_1] presenting final answer
   [2025-01-18 17:10:15] 🔍 Evaluating final answer
   [2025-01-18 17:10:20] 🔄 Restart requested by agent_1
      Reason: Final answer describes changes but doesn't execute them
      Instructions: Actually modify the files instead of describing changes
   [2025-01-18 17:10:20] 🔄 Handling orchestration restart (attempt 1 -> 2)

Search logs for ``"restart"`` or ``"RESTART"`` to find restart decisions.
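
A case-insensitive search covers both spellings in one pass. The sketch below runs against an inline sample abbreviated from the log excerpt above (emoji omitted); in real use you would read the text from your own log file, whose path is not assumed here.

.. code-block:: python

   import re
   from typing import List

   # Abbreviated sample based on the excerpt above.
   SAMPLE_LOG = """\
   [2025-01-18 17:09:44] Final agent selected: agent_1
   [2025-01-18 17:10:20] Restart requested by agent_1
      Reason: Final answer describes changes but doesn't execute them
   [2025-01-18 17:10:20] Handling orchestration restart (attempt 1 -> 2)
   """

   RESTART_PATTERN = re.compile(r"restart", re.IGNORECASE)


   def find_restart_lines(log_text: str) -> List[str]:
       """Return every log line that mentions a restart, in order."""
       return [
           line.strip()
           for line in log_text.splitlines()
           if RESTART_PATTERN.search(line)
       ]


   restart_lines = find_restart_lines(SAMPLE_LOG)
   for line in restart_lines:
       print(line)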

What Happens During Restart
----------------------------

Agent Context
~~~~~~~~~~~~~

When orchestration restarts, each agent receives context about previous attempts:

.. code-block:: text

   ## Previous Orchestration Attempts

   This is attempt 2 to solve the task. The final agent from the previous
   attempt was not satisfied and requested a restart.

   **Why the restart was requested:**
   Agents provided plans but didn't execute actual implementation

   **Instructions for improvement:**
   Please actually implement the solution by modifying the files, not just
   describing what changes should be made

   Please take these insights into account as you work on providing a better answer.

This context ensures agents understand:

- Why the previous attempt failed
- What needs improvement
- How to avoid repeating mistakes
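
As a rough illustration of how such a block could be assembled, the sketch below formats the attempt number, reason, and instructions into the layout shown above. ``build_restart_context`` is a hypothetical helper, not a MassGen function.

.. code-block:: python

   def build_restart_context(attempt: int, reason: str, instructions: str) -> str:
       """Format the restart context block that precedes the agents' task prompt."""
       return (
           "## Previous Orchestration Attempts\n\n"
           f"This is attempt {attempt} to solve the task. The final agent from the "
           "previous attempt was not satisfied and requested a restart.\n\n"
           "**Why the restart was requested:**\n"
           f"{reason}\n\n"
           "**Instructions for improvement:**\n"
           f"{instructions}\n\n"
           "Please take these insights into account as you work on providing "
           "a better answer."
       )


   context = build_restart_context(
       attempt=2,
       reason="Agents provided plans but didn't execute actual implementation",
       instructions="Please actually implement the solution by modifying the files",
   )
   print(context)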

State Management
~~~~~~~~~~~~~~~~

During restart:

**Reset**:

- Agent answers
- Agent votes
- Coordination messages
- Selected agent

**Preserved**:

- Timeout flags (agents that timed out stay timed out)
- Session information
- Conversation history
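
The split between reset and preserved state can be modeled with a small dataclass. The class and field names below are hypothetical and chosen only to mirror the two lists above; MassGen's internal state objects may look different.

.. code-block:: python

   from dataclasses import dataclass, field, replace
   from typing import Dict, List, Optional, Set


   @dataclass
   class OrchestrationState:
       # Cleared on restart
       answers: Dict[str, str] = field(default_factory=dict)
       votes: Dict[str, str] = field(default_factory=dict)
       coordination_messages: List[str] = field(default_factory=list)
       selected_agent: Optional[str] = None
       # Preserved across restarts
       timed_out_agents: Set[str] = field(default_factory=set)
       session_id: str = ""
       conversation_history: List[str] = field(default_factory=list)


   def reset_for_restart(state: OrchestrationState) -> OrchestrationState:
       """Return a copy with per-attempt fields cleared and the rest untouched."""
       return replace(
           state,
           answers={},
           votes={},
           coordination_messages=[],
           selected_agent=None,
       )


   before = OrchestrationState(
       answers={"agent_1": "plan only"},
       votes={"agent_2": "agent_1"},
       coordination_messages=["agent_1 answered"],
       selected_agent="agent_1",
       timed_out_agents={"agent_3"},
       session_id="session_20250108_143022",
       conversation_history=["Create a simple Flask web app"],
   )
   after = reset_for_restart(before)

After the reset, ``after`` has empty answers, votes, and messages and no selected agent, while the timeout flags, session ID, and conversation history are carried over unchanged.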

User Visibility
~~~~~~~~~~~~~~~

Users see restart messages in the output:

.. code-block:: text

   🔄 Orchestration restart requested by final agent

   Reason: Agents only planned but didn't execute implementation

   ---

   🔄 Orchestration Restart - Attempt 2/3

   Reason: Agents only planned but didn't execute implementation

   Instructions: Please actually implement the solution...

   ---

   🚀 Starting multi-agent coordination...

Best Practices
--------------

- Set realistic ``max_orchestration_restarts`` based on task complexity (1-3 recommended)
- Provide clear task descriptions to reduce need for restarts
- The final agent should restart when critical steps are missing or implementation wasn't executed
- The final agent should submit when requirements are adequately met

Troubleshooting
---------------

**Max restarts exceeded**: Increase ``max_orchestration_restarts`` or provide more detailed initial instructions

**Agent doesn't restart when it should**: Use a more capable model in your config or provide explicit success criteria

See Also
--------

- :doc:`multi_turn_mode` - Multi-turn conversations
- :doc:`../concepts` - Core MassGen concepts


---

## user_guide/skills.rst

==============================
Skills for AI Coding Agents
==============================

MassGen publishes **skills** that let AI coding agents (Claude Code, OpenAI Codex, GitHub Copilot, Cursor, and others) invoke MassGen directly. When your agent has the MassGen skill installed, it can spin up a multi-agent run, wait for results, and apply them. Learn more about the agent skills standard at `agentskills.io <https://agentskills.io/home>`_.

.. raw:: html

   <div style="text-align: center; margin: 20px 0;">
     <a href="https://github.com/massgen/skills" target="_blank" rel="noopener noreferrer" style="display: inline-block; padding: 12px 24px; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; text-decoration: none; border-radius: 8px; font-weight: bold; font-size: 1.1em; box-shadow: 0 4px 15px rgba(102, 126, 234, 0.4); transition: transform 0.2s, box-shadow 0.2s;">
       &#128736; Get the Skills on GitHub &rarr;
     </a>
   </div>

What Are Skills?
----------------

Skills are portable instruction bundles (a folder with a ``SKILL.md`` file) that teach AI agents how to perform specific workflows. The `SKILL.md format <https://agentskills.io/specification>`_ is an open standard supported by `40+ agent platforms <https://skills.sh>`_.

The **MassGen skill** gives your agent four modes:

.. list-table::
   :header-rows: 1
   :widths: 15 35 50

   * - Mode
     - Purpose
     - Output
   * - **General** (default)
     - Any task --- writing, code, research, design
     - Winner's deliverables + workspace files
   * - **Evaluate**
     - Critique existing work
     - ``critique_packet.md``, ``verdict.json``, ``next_tasks.json``
   * - **Plan**
     - Create a structured project plan
     - ``project_plan.json`` with task DAG
   * - **Spec**
     - Create a requirements specification
     - ``project_spec.json`` with EARS requirements

.. note::

   The skill will walk your agent through setup if needed, but things go smoother if you already have MassGen installed, an AI provider authenticated, and a config file ready. First-time setup requires human input (provider selection, API keys). See :doc:`/quickstart/installation` for setup instructions.

Installation
------------

Quick Install (All Agents)
^^^^^^^^^^^^^^^^^^^^^^^^^^

The fastest way to install across any supported agent:

.. code-block:: bash

   npx skills add massgen/skills

This works with Claude Code, Cursor, Codex, Windsurf, GitHub Copilot, Gemini CLI, Goose, Amp, and `40+ other agents <https://skills.sh>`_. See `Vercel's skills docs <https://vercel.com/docs/agent-resources/skills>`_ for details.

To install to a specific agent:

.. code-block:: bash

   npx skills add massgen/skills -a claude-code
   npx skills add massgen/skills -a codex
   npx skills add massgen/skills -a cursor

To install to all detected agents at once:

.. code-block:: bash

   npx skills add massgen/skills --all

Per-Agent Manual Installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you prefer to clone and copy manually:

**Claude Code:**

.. code-block:: bash

   # Global (all projects)
   git clone https://github.com/massgen/skills.git /tmp/massgen-skills
   mkdir -p ~/.claude/skills   # cp fails if the parent directory is missing
   cp -r /tmp/massgen-skills/massgen ~/.claude/skills/massgen

   # Per-project (committed to your repo)
   mkdir -p .claude/skills
   cp -r /tmp/massgen-skills/massgen .claude/skills/massgen

Then invoke with ``/massgen`` in Claude Code.

**OpenAI Codex:**

.. code-block:: bash

   git clone https://github.com/massgen/skills.git /tmp/massgen-skills
   mkdir -p ~/.codex/skills   # cp fails if the parent directory is missing
   cp -r /tmp/massgen-skills/massgen ~/.codex/skills/massgen

Then invoke with ``$massgen`` in Codex.

**GitHub Copilot (VS Code):**

.. code-block:: bash

   git clone https://github.com/massgen/skills.git /tmp/massgen-skills
   mkdir -p .github/skills   # cp fails if the parent directory is missing
   cp -r /tmp/massgen-skills/massgen .github/skills/massgen

Then use ``/skills`` in Copilot chat.

**Other agents:**

Any agent supporting the ``SKILL.md`` standard can use MassGen skills. Copy the ``massgen/`` directory from `the repo <https://github.com/massgen/skills>`_ into your agent's skill discovery path (typically ``~/.agents/skills/``).

Prerequisites
^^^^^^^^^^^^^

1. MassGen installed (``pip install massgen``)
2. At least one AI provider authenticated (API key or login-based auth like ``claude login``)
3. A MassGen config file (``.massgen/config.yaml``) --- run ``massgen --quickstart`` to create one

How It Works
------------

When your agent invokes the MassGen skill, it follows this workflow:

1. **Scope** --- determine the mode (general, evaluate, plan, spec) and what the run covers
2. **Context** --- write a context file describing the task, constraints, and expectations
3. **Criteria** --- use defaults or write custom evaluation criteria
4. **Prompt** --- fill in the mode-specific prompt template
5. **Run** --- launch MassGen in ``--automation`` mode (background), optionally open the web viewer
6. **Parse** --- read the structured output from the winning agent
7. **Apply** --- ground the results in your task system and execute

The skill includes prompt templates, context file guides, and output parsing instructions for each mode. Your agent reads these reference files and follows them step by step.

Skill Contents
--------------

The skill repo at `github.com/massgen/skills <https://github.com/massgen/skills>`_ contains:

::

   massgen/
   +-- SKILL.md                              # Main skill instructions
   +-- references/
       +-- criteria_guide.md                  # How to write evaluation criteria
       +-- general/
       |   +-- workflow.md                    # General mode guide
       |   +-- prompt_template.md             # General prompt template
       +-- evaluate/
       |   +-- workflow.md                    # Evaluate mode guide
       |   +-- prompt_template.md             # Evaluation prompt template
       +-- plan/
       |   +-- workflow.md                    # Plan mode guide
       |   +-- prompt_template.md             # Planning prompt template
       +-- spec/
           +-- workflow.md                    # Spec mode guide
           +-- prompt_template.md             # Spec prompt template

Keeping Skills Updated
----------------------

The skills repo is automatically synced from the main MassGen repository on every merge to ``main``. To pick up the latest version of an installed skill:

.. code-block:: bash

   # If installed via npx
   npx skills update

   # If installed via git clone
   cd /tmp/massgen-skills && git pull
   # Copy the directory contents so an existing install is overwritten, not nested
   cp -r /tmp/massgen-skills/massgen/. ~/.claude/skills/massgen/   # or your agent's path


---

## user_guide/task_planning.rst

Task Planning Mode
==================

MassGen's task planning mode enables agents to create structured plans before execution,
separating the "what to build" from the "how to build it" phases.

.. contents:: On This Page
   :local:
   :depth: 2

Overview
--------

Task planning mode provides three workflows:

1. **Planning Only** (``--plan``) - Agents create a structured task plan interactively
2. **Plan and Execute** (``--plan-and-execute``) - Full workflow: create plan, then execute it
3. **Execute Plan** (``--execute-plan``) - Execute an existing plan without re-planning

This separation enables:

* Human review of plans before execution
* Iteration on plans without re-running expensive execution
* Reuse of plans across multiple execution attempts
* Clear accountability for what was planned vs what was built

Quick Start
-----------

**Create a plan:**

.. code-block:: bash

   uv run massgen --config my_agents.yaml --plan "Build a portfolio website with dark mode"

**Create and execute a plan:**

.. code-block:: bash

   uv run massgen --config my_agents.yaml --plan-and-execute "Build a portfolio website"

**Execute an existing plan:**

.. code-block:: bash

   # By plan ID
   uv run massgen --config my_agents.yaml --execute-plan 20260115_173113_836955

   # By path
   uv run massgen --config my_agents.yaml --execute-plan .massgen/plans/plan_20260115_173113_836955

   # Most recent plan
   uv run massgen --config my_agents.yaml --execute-plan latest

TUI Plan and Execute Mode
--------------------------

.. versionadded:: 0.1.44
   Interactive Plan and Execute modes in the Textual TUI.

The TUI provides interactive modes for creating and executing plans without command-line flags.

Mode Cycling
^^^^^^^^^^^^

Press ``Shift+Tab`` to cycle through four modes:

1. **Normal Mode** - Standard chat with agents
2. **Planning Mode** - Create new plans interactively
3. **Execute Mode** - Browse and execute existing plans chunk-by-chunk
4. **Analysis Mode** - Analyze prior run logs and improve workflows

.. code-block:: text

   Normal ──[Shift+Tab]──> Planning ──[Shift+Tab]──> Execute ──[Shift+Tab]──> Analysis ──[Shift+Tab]──> Normal

Or click the plan mode button in the mode bar to cycle through modes.
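
The cycle wraps around, which is easy to model with modular arithmetic. This is a sketch of the behavior described above, not the TUI's code; the mode identifiers are hypothetical.

.. code-block:: python

   # Order matches the Shift+Tab cycle described above.
   MODES = ["normal", "planning", "execute", "analysis"]


   def next_mode(current: str) -> str:
       """Return the mode reached by pressing Shift+Tab once."""
       return MODES[(MODES.index(current) + 1) % len(MODES)]


   after_one_press = next_mode("normal")
   after_two_presses = next_mode(after_one_press)
   wrapped = next_mode("analysis")

Starting from Normal, one press lands on Planning and two presses land on Execute; pressing again from Analysis wraps back to Normal.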

Using Planning Mode
^^^^^^^^^^^^^^^^^^^

**Step 1: Enter Planning Mode**

.. code-block:: bash

   uv run massgen --display textual

Press ``Shift+Tab`` once to enter Planning mode.

**Step 2: Configure Plan Options** (Optional)

Click the plan options button to set:

* **Plan Depth**: dynamic (default), shallow (5-10 tasks), medium (20-50 tasks), or deep (100-200+ tasks)
* **Task Count Target**: dynamic (default) or an explicit target
* **Chunk Count Target**: dynamic (default) or an explicit target
* **Broadcast Mode**: agents (agent coordination), human (ask user), or false (autonomous)

**Step 3: Create Plan**

Type your planning request and press ``Enter``:

.. code-block:: text

   "Create a Python web scraper for news articles with error handling"

Agents will collaborate to create a structured task plan without executing any code.

**Step 4: Review Plan**

After planning completes, the review modal opens and the plan is saved immediately to ``.massgen/plans/``.

You can:

* **Continue Planning** (multi-agent refinement turn; requires a non-empty prompt)
* **Quick Edit (Single Agent)** (focused refinement turn; requires a non-empty prompt)
* **Finalize Plan and Execute** (default Enter action)

The modal also includes an inline JSON editor so you can directly edit ``project_plan.json`` before continuing or finalizing.

Using Execute Mode
^^^^^^^^^^^^^^^^^^

**Step 1: Enter Execute Mode**

.. code-block:: bash

   uv run massgen --display textual

Press ``Shift+Tab`` twice to enter Execute mode.

**Step 2: Browse Plans**

A plan selector popover appears showing:

* Up to 10 most recent plans
* Timestamps when plans were created
* Original prompts used to create each plan

**Step 3: View Plan Details** (Optional)

Click "View Full Plan" to see the complete task breakdown in a modal.

**Step 4: Execute**

* Select a plan (or use the default latest plan)
* Press ``Enter`` to execute the selected plan's current chunk
* Optionally type a chunk label (or chunk range like ``C02-C04``) before pressing Enter
* Optionally type additional execution instructions
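
To make the chunk range notation concrete, here is one way a selector such as ``C02-C04`` could be expanded into individual chunk prefixes. This is a guess at the semantics based only on the example above; ``expand_chunk_selector`` is a hypothetical helper, and MassGen's actual parsing may differ.

.. code-block:: python

   import re
   from typing import List

   # Matches ranges like "C02-C04"; anything else is treated as a single label.
   _RANGE = re.compile(r"^C(\d+)-C(\d+)$")


   def expand_chunk_selector(selector: str) -> List[str]:
       """Expand a chunk range into prefixes, or return a single label as-is."""
       text = selector.strip()
       match = _RANGE.match(text)
       if not match:
           return [text]
       start_text, end_text = match.groups()
       start, end = int(start_text), int(end_text)
       if start > end:
           raise ValueError(f"range start exceeds end: {text}")
       width = len(start_text)  # keep the zero padding of the first bound
       return [f"C{number:0{width}d}" for number in range(start, end + 1)]


   expanded = expand_chunk_selector("C02-C04")
   single = expand_chunk_selector("C01_foundation")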

**Step 5: Control Chunk Flow**

In execute options you can choose:

* **Auto-continue next chunk** (default)
* **Pause after each chunk** (manual confirmation between chunks)
* **Execute refinement mode**: inherit / force ON / force OFF

.. note::

   CWD context toggling (``Ctrl+P``) is disabled while Execute mode is active.
   Set CWD context before entering Execute mode, or pass ``--cwd-context ro|rw`` at launch.

Context Path Preservation
^^^^^^^^^^^^^^^^^^^^^^^^^

Context paths from the planning phase are preserved automatically. If you created a plan with ``@/path/to/file`` context injection, those same paths are restored when you execute the plan later, so file access stays consistent between planning and execution.

Execution Workspace Contract
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

During each execute turn:

* ``tasks/plan.json`` contains only the active chunk tasks
* previous chunk snapshots are retained as ``tasks/tasks_cXX.json`` files
* ``planning_docs/full_plan.json`` contains the frozen full plan for read-only reference
* supporting planning docs are available under ``planning_docs/``

.. note::
   **TUI Plan Workflow Benefits:**

   * **Planning Mode**: Create plans interactively with visual feedback
   * **Execute Mode**: Browse saved plans and re-run without re-planning
   * **Flexibility**: Switch modes on-the-fly without restarting MassGen

Planning Phase (CLI)
--------------------

In planning mode, agents create:

* **Task Plan** (``plan.json``) - Structured list of tasks with dependencies (required)
* **Supporting docs** (optional) - Requirements, design decisions, or other markdown documentation

Plan Depth
^^^^^^^^^^

Control plan granularity with ``--plan-depth``:

.. code-block:: bash

   # Quick overview (5-10 tasks)
   uv run massgen --config my_agents.yaml --plan --plan-depth shallow "Build a blog"

   # Balanced detail (20-50 tasks) - default
   uv run massgen --config my_agents.yaml --plan --plan-depth medium "Build a blog"

   # Comprehensive breakdown (100-200+ tasks)
   uv run massgen --config my_agents.yaml --plan --plan-depth deep "Build a blog"

Broadcast Modes
^^^^^^^^^^^^^^^

Control how agents collaborate during planning:

.. code-block:: bash

   # Agents ask user critical questions (default)
   uv run massgen --config my_agents.yaml --plan --broadcast human "Build a blog"

   # Agents coordinate and clarify with each other
   uv run massgen --config my_agents.yaml --plan --broadcast agents "Build a blog"

   # Fully autonomous - no questions
   uv run massgen --config my_agents.yaml --plan --broadcast false "Build a blog"

.. note::
   In automation mode (``--automation``), ``human`` broadcast automatically switches
   to ``false`` since there's no human to respond.

Task Plan Structure
^^^^^^^^^^^^^^^^^^^

Plans are stored as JSON with this structure:

.. code-block:: json

   {
     "tasks": [
       {
         "id": "F001",
         "description": "Initialize Next.js project with Tailwind CSS",
         "chunk": "C01_foundation",
         "status": "pending",
         "depends_on": [],
         "priority": "high",
         "metadata": {
           "verification": "Dev server runs successfully",
           "verification_method": "Start dev server and verify it loads",
           "verification_group": "foundation"
         }
       },
       {
         "id": "F002",
         "description": "Create responsive navigation component",
         "chunk": "C02_ui_shell",
         "status": "pending",
         "depends_on": ["F001"],
         "priority": "high",
         "metadata": {
           "verification": "Navigation renders on mobile and desktop",
           "verification_method": "Check responsive rendering at different viewport sizes",
           "verification_group": "components"
         }
       }
     ]
   }

**Task Fields:**

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Field
     - Required
     - Description
   * - ``id``
     - Yes
     - Unique task identifier (e.g., "F001", "T001")
   * - ``description``
     - Yes
     - What the task accomplishes
   * - ``chunk``
     - Yes
     - Planner-defined chunk label used for chunk-by-chunk execution order
   * - ``status``
     - Yes
     - ``pending``, ``in_progress``, ``completed``, ``verified``, or ``blocked``
   * - ``depends_on``
     - Yes
     - Array of task IDs that must complete first
   * - ``priority``
     - No
     - ``high``, ``medium``, or ``low``
   * - ``completed_at``
     - No
     - Timestamp when task was completed (ISO format)
   * - ``verified_at``
     - No
     - Timestamp when task was verified (ISO format)
   * - ``metadata.verification``
     - No
     - How to verify task completion
   * - ``metadata.verification_method``
     - No
     - Specific command or action to verify
   * - ``metadata.verification_group``
     - No
     - Group name for batch verification (e.g., "foundation", "frontend_ui")
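
A plan can be sanity-checked against this table before execution. The sketch below is a hypothetical validator, not part of MassGen: it checks that required fields are present, that ``status`` is one of the listed values, and that every ``depends_on`` entry refers to a known task ID.

.. code-block:: python

   from typing import Any, Dict, List

   REQUIRED_FIELDS = ("id", "description", "chunk", "status", "depends_on")
   VALID_STATUSES = {"pending", "in_progress", "completed", "verified", "blocked"}


   def validate_plan(plan: Dict[str, Any]) -> List[str]:
       """Return a list of human-readable problems; an empty list means OK."""
       errors: List[str] = []
       tasks = plan.get("tasks", [])
       known_ids = {task.get("id") for task in tasks}
       for task in tasks:
           task_id = task.get("id", "<missing id>")
           for name in REQUIRED_FIELDS:
               if name not in task:
                   errors.append(f"{task_id}: missing required field '{name}'")
           if "status" in task and task["status"] not in VALID_STATUSES:
               errors.append(f"{task_id}: invalid status '{task['status']}'")
           for dependency in task.get("depends_on", []):
               if dependency not in known_ids:
                   errors.append(f"{task_id}: unknown dependency '{dependency}'")
       return errors


   good_plan = {
       "tasks": [
           {"id": "F001", "description": "Initialize project",
            "chunk": "C01_foundation", "status": "pending", "depends_on": []},
           {"id": "F002", "description": "Create navigation component",
            "chunk": "C02_ui_shell", "status": "pending", "depends_on": ["F001"]},
       ]
   }
   bad_plan = {
       "tasks": [
           {"id": "F003", "description": "Orphan task",
            "status": "done", "depends_on": ["F999"]},
       ]
   }

   good_errors = validate_plan(good_plan)
   bad_errors = validate_plan(bad_plan)

Here ``good_errors`` is empty, while ``bad_errors`` reports the missing ``chunk`` field, the unrecognized status, and the dangling dependency.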

Execution Phase
---------------

During execution:

1. The frozen plan (from ``frozen/plan.json``) is loaded
2. ``tasks/plan.json`` is written with only the active chunk tasks for that turn
3. ``planning_docs/full_plan.json`` is copied for read-only full-plan reference
4. Previous chunk snapshots are archived as ``tasks/tasks_cXX.json`` files
5. Agents use MCP planning tools to track progress
6. Agents execute tasks respecting dependencies
7. Agents update task status as they work
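
Step 2 amounts to filtering the frozen plan by its ``chunk`` label. A rough sketch of that selection follows, assuming the plan keeps its entries under a top-level ``tasks`` key; ``tasks_for_chunk`` and the ``C01_foundation`` label are hypothetical names used only for illustration.

```python
def tasks_for_chunk(plan: dict, chunk: str) -> list[dict]:
    """Return only the tasks whose ``chunk`` label matches the active chunk."""
    # Assumption: the plan stores its task entries under a top-level "tasks" key.
    return [t for t in plan.get("tasks", []) if t.get("chunk") == chunk]


plan = {
    "tasks": [
        {"id": "F001", "chunk": "C01_foundation", "status": "completed"},
        {"id": "F002", "chunk": "C02_ui_shell", "status": "pending"},
    ]
}
active = tasks_for_chunk(plan, "C02_ui_shell")
print([t["id"] for t in active])
```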

MCP Planning Tools
^^^^^^^^^^^^^^^^^^

Agents have access to these tools during execution:

.. code-block:: text

   get_task_plan()                              # View full plan with status
   get_ready_tasks()                            # Tasks with satisfied dependencies
   update_task_status(id, status, notes)        # Mark task progress
   add_task(description, depends_on, priority)  # Add new task if needed
   create_task_plan(tasks)                      # Replace entire plan (adoption)

**Example agent workflow:**

.. code-block:: text

   1. get_ready_tasks() → ["F001"]
   2. update_task_status("F001", "in_progress")
   3. ... execute task ...
   4. update_task_status("F001", "completed", "Created Next.js project")
   5. get_ready_tasks() → ["F002", "F003"]  # Dependencies now satisfied
   6. ... complete more foundation tasks ...
   7. # Verify foundation group: run npm run dev → works!
   8. update_task_status("F001", "verified", "Dev server runs successfully")

**Task Status Flow:**

- ``pending`` → ``in_progress`` → ``completed`` → ``verified``
- **completed**: Implementation is done (code written)
- **verified**: Task has been tested and confirmed working

Agents verify tasks in groups using ``verification_group`` labels, not after every task.
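
The idea behind ``get_ready_tasks()`` can be sketched as a simple dependency check. This is an illustration of the concept, not the tool's implementation: ``ready_tasks`` is a hypothetical helper, it assumes that both ``completed`` and ``verified`` satisfy a dependency, and the ``F003`` entry is invented to mirror the workflow above.

```python
DONE = {"completed", "verified"}  # assumption: either status satisfies a dependency


def ready_tasks(tasks: list[dict]) -> list[str]:
    """Return IDs of pending tasks whose dependencies are all done."""
    status = {t["id"]: t["status"] for t in tasks}
    return [
        t["id"]
        for t in tasks
        if t["status"] == "pending"
        and all(status.get(dep) in DONE for dep in t["depends_on"])
    ]


tasks = [
    {"id": "F001", "status": "pending", "depends_on": []},
    {"id": "F002", "status": "pending", "depends_on": ["F001"]},
    {"id": "F003", "status": "pending", "depends_on": ["F001"]},
]
first = ready_tasks(tasks)        # only F001 has no unmet dependencies
tasks[0]["status"] = "completed"  # F001 is done
second = ready_tasks(tasks)       # F002 and F003 are now unblocked
print(first, second)
```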

Plan Storage
------------

Plans are stored in ``.massgen/plans/``:

.. code-block:: text

   .massgen/plans/
   └── plan_20260115_173113_836955/
       ├── plan_metadata.json      # Session info, status
       ├── execution_log.jsonl     # Event log
       ├── plan_diff.json          # Changes from original (after execution)
       ├── frozen/                  # Immutable snapshot from planning
       │   ├── plan.json
       │   ├── requirements.md
       │   └── design_decisions.md
       └── workspace/              # Modified plan after execution

**Plan Metadata:**

.. code-block:: json

   {
     "plan_id": "20260115_173113_836955",
     "created_at": "2026-01-15T17:31:13.837534",
     "planning_session_id": "log_20260115_171953_153808",
     "execution_session_id": "log_20260115_195506_513599",
     "status": "completed"
   }

**Status Values:**

* ``planning`` - Plan creation in progress
* ``ready`` - Planning complete, awaiting execution
* ``executing`` - Execution in progress
* ``completed`` - Execution finished
* ``failed`` - Execution failed
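
A script can inspect plan state by reading ``plan_metadata.json`` directly. The sketch below assumes that the most recent plan is the one whose timestamped directory name sorts last; ``latest_plan_dir`` and ``plan_status`` are hypothetical helpers, not part of MassGen.

```python
import json
from pathlib import Path
from typing import Optional


def latest_plan_dir(plans_root: Path) -> Optional[Path]:
    """Pick the plan directory whose timestamped name sorts last."""
    if not plans_root.is_dir():
        return None
    # Assumption: plan_<timestamp> names sort chronologically.
    candidates = sorted(p for p in plans_root.glob("plan_*") if p.is_dir())
    return candidates[-1] if candidates else None


def plan_status(plan_dir: Path) -> str:
    """Read the ``status`` field from ``plan_metadata.json``."""
    text = (plan_dir / "plan_metadata.json").read_text(encoding="utf-8")
    return json.loads(text)["status"]


if __name__ == "__main__":
    plan_dir = latest_plan_dir(Path(".massgen/plans"))
    if plan_dir is None:
        print("no plans found")
    else:
        print(plan_dir.name, plan_status(plan_dir))
```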

Configuration
-------------

Enable planning tools in your config:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_agent_task_planning: true
       task_planning_filesystem_mode: true

These settings are **automatically enabled** when you use the ``--plan``,
``--plan-and-execute``, or ``--execute-plan`` flags.

Automation Mode
---------------

For CI/CD or programmatic usage:

.. code-block:: bash

   # Plan and execute with automation output
   uv run massgen --automation --config my_agents.yaml \
       --plan-and-execute "Build the feature"

   # Execute existing plan
   uv run massgen --automation --config my_agents.yaml \
       --execute-plan latest

**Automation output includes:**

.. code-block:: text

   LOG_DIR: .massgen/massgen_logs/log_20260115_195506_513599
   PLAN_DIR: .massgen/plans/plan_20260115_173113_836955
   PLAN_ID: 20260115_173113_836955
   STATUS: 0
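
In a CI script, these ``KEY: value`` lines can be turned into a dictionary. A minimal sketch, assuming the output keeps the line format shown above; ``parse_automation_output`` is a hypothetical helper name.

```python
def parse_automation_output(text: str) -> dict[str, str]:
    """Collect ``KEY: value`` lines whose key is written in upper case."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().isupper():
            fields[key.strip()] = value.strip()
    return fields


sample = """\
LOG_DIR: .massgen/massgen_logs/log_20260115_195506_513599
PLAN_DIR: .massgen/plans/plan_20260115_173113_836955
PLAN_ID: 20260115_173113_836955
STATUS: 0
"""
fields = parse_automation_output(sample)
print(fields["PLAN_DIR"])
```

Values are kept as strings, so a caller that needs a numeric ``STATUS`` has to convert it explicitly.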

Best Practices
--------------

**1. Review Plans Before Execution**

.. code-block:: bash

   # Create plan only
   uv run massgen --config my_agents.yaml --plan "Build feature X"

   # Review the plan
   cat .massgen/plans/plan_*/frozen/plan.json

   # Execute when satisfied
   uv run massgen --config my_agents.yaml --execute-plan latest

**2. Use Appropriate Depth**

* ``shallow`` - Quick prototypes, simple features
* ``medium`` - Most projects (default)
* ``deep`` - Complex systems, detailed specifications

**3. Include Verification in Plans**

Good plans include verification methods:

.. code-block:: json

   {
     "id": "F001",
     "description": "Setup project",
     "metadata": {
       "verification": "Project builds without errors",
       "verification_method": "Run build and check for errors"
     }
   }

Agents are instructed to verify at checkpoints:

* After project setup - run dev server, confirm it starts
* After completing a feature group - test the feature works
* Before declaring complete - run full build, fix errors

**4. Iterate on Plans**

If execution reveals issues:

1. Review the plan diff (``--plan-report``)
2. Create a new plan incorporating lessons learned
3. Execute the improved plan

Troubleshooting
---------------

**Plan not found:**

.. code-block:: bash

   # List available plans
   ls .massgen/plans/

   # Use full path if ID doesn't work
   uv run massgen --execute-plan .massgen/plans/plan_20260115_173113_836955

**Agents not following plan:**

Check that planning tools are enabled in your config and that the plan
was properly loaded. Agents should call ``get_task_plan()`` at the start.

**Verification steps not running:**

Agents are instructed to verify at checkpoints (after setup, after feature groups,
before completion), not after every individual task. If verification is still
being skipped, ensure your plan has clear ``verification_method`` fields and
consider adding explicit "verify build" tasks at key milestones.

See Also
--------

* :doc:`concepts` - Core MassGen concepts
* :doc:`logging` - Understanding logs and debugging
* :doc:`../reference/cli` - Complete CLI reference
* :doc:`../examples/advanced_patterns` - Advanced usage patterns


---

## user_guide/tools/background_tools.rst

Background Tool Execution
=========================

MassGen supports non-blocking tool execution for long-running work. This lets agents continue useful foreground tasks while a tool runs in the background.

Use this guide for the generic background lifecycle used by custom tools and MCP targets.

.. note::

   This page covers **tool-level background jobs** (custom tools + MCP tools).
   For running an entire MassGen CLI command in the background, see :doc:`../integration/automation` (BackgroundShellManager).

When to Use Background Tools
----------------------------

Use background mode when a tool call is expected to take noticeable time and you can continue meaningful work without waiting.

Common examples:

* Large test suites and benchmark runs
* Long data processing tasks
* Media generation and heavy file processing
* Slow MCP/API calls

Foreground mode is usually better for short checks where immediate output is needed.

Lifecycle Overview
------------------

MassGen exposes a consistent lifecycle:

1. Start a background job with ``custom_tool__start_background_tool``
2. Check progress with ``custom_tool__get_background_tool_status``
3. Get final output with ``custom_tool__get_background_tool_result``
4. Optionally wait for the next completion with ``custom_tool__wait_for_background_tool``
5. Cancel with ``custom_tool__cancel_background_tool`` if no longer needed
6. Inspect all jobs with ``custom_tool__list_background_tools``

.. important::

   Lifecycle tools use ``job_id`` (background job identifier), not tool-specific IDs such as ``subagent_id``.

You can request background execution in two ways:

* Preferred for normal custom tool calls: include ``background: true`` (or ``mode: background``) on the original tool call
* Explicit management flow: call ``custom_tool__start_background_tool`` with target ``tool_name`` and ``arguments``

How Waiting Works
-----------------

``custom_tool__wait_for_background_tool`` blocks until the **next unseen** background job reaches a terminal state (``completed``, ``error``, or ``cancelled``), or until timeout.

Timeout behavior:

* Default timeout is 30 seconds
* Maximum timeout is 600 seconds
* Timeout returns a success payload with ``ready: false`` and ``timed_out: true``

Wait Interruption by Runtime Input
----------------------------------

``custom_tool__wait_for_background_tool`` can return early when runtime-injection content becomes available.

Interruption payload shape:

.. code-block:: json

   {
     "success": true,
     "ready": false,
     "interrupted": true,
     "interrupt_reason": "runtime_injection_available",
     "injected_content": "...",
     "waited_seconds": 4.231
   }

Notes:

* ``interrupt_reason`` may be ``runtime_injection_available`` (new context ready) or ``turn_cancelled``.
* ``injected_content`` contains the runtime context to incorporate before proceeding.
* Runtime input delivered this way is persisted for that agent within the current turn, so if the agent round restarts, the same instruction context is still present.
* If runtime input is queued just before the wait call starts, MassGen signals an interrupt immediately after the wait activates, so the input is not stranded in the queue.
* After handling injected context, you can continue foreground work or call wait again.
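
A caller can branch on the wait payload using the fields described above. The sketch below is illustrative only: ``classify_wait_result`` is a hypothetical helper, and treating ``ready: true`` as "a job finished" is an assumption inferred from the timeout and interruption payloads.

```python
def classify_wait_result(payload: dict) -> str:
    """Map a wait payload to a short label describing what to do next."""
    if payload.get("interrupted"):
        # Handle injected context (or a cancelled turn) before anything else.
        return f"interrupted:{payload.get('interrupt_reason', 'unknown')}"
    if payload.get("timed_out"):
        return "timed_out"
    if payload.get("ready"):  # assumption: ready=True means a job reached a terminal state
        return "ready"
    return "unknown"


interruption = {
    "success": True,
    "ready": False,
    "interrupted": True,
    "interrupt_reason": "runtime_injection_available",
    "injected_content": "...",
    "waited_seconds": 4.231,
}
timeout = {"success": True, "ready": False, "timed_out": True}
print(classify_wait_result(interruption), classify_wait_result(timeout))
```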

Result Delivery and Polling
---------------------------

In many runs, completed background results are automatically injected back into agent context by the hook framework. When results are not auto-injected (or when deterministic control is needed), poll status and fetch results explicitly.

Recommended pattern:

1. Start job(s)
2. Continue foreground work
3. When blocked, use ``custom_tool__wait_for_background_tool``
4. Fetch final payload with ``custom_tool__get_background_tool_result`` as needed
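
The start, continue, wait, fetch shape of this pattern is the same one used with futures in ordinary Python. The sketch below is an analogy built on the standard ``concurrent.futures`` module, not a use of MassGen's lifecycle tools; ``slow_job`` is a stand-in for a long-running tool call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_job() -> str:
    """Stand-in for a long-running tool call."""
    time.sleep(0.2)
    return "job finished"


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_job)      # 1. start the job
    foreground = sum(range(1000))       # 2. continue foreground work
    result = future.result(timeout=30)  # 3. block only when nothing else is left to do

print(foreground, result)               # 4. use the final payload
```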

Subagents + Background Lifecycle
--------------------------------

For subagent work, keep these roles separate:

* ``spawn_subagents``: starts subagent work
* ``list_subagents``: discovery/index of subagent metadata (status, workspace, session pointers)
* ``custom_tool__*background*`` lifecycle tools: status/result/wait/cancel management for background jobs

When cancelling a background subagent flow, call ``custom_tool__cancel_background_tool(job_id)`` with the
background job ID returned by the lifecycle system.

Backend Notes
-------------

* This lifecycle is available across the primary MassGen tool-capable backends.
* Codex custom-tool sessions include these lifecycle tools via the ``massgen_custom_tools`` MCP wrapper.
* For Codex/Claude Code MCP targets, background-capable MCP server configs are derived from normal ``mcp_servers`` and filtered to avoid recursive/internal servers.

UI Notes (TUI)
--------------

When using the textual UI, background jobs are surfaced in status/ribbon indicators and a background-jobs modal. This makes it easier to monitor asynchronous progress without manual log inspection.

See Also
--------

* :doc:`custom_tools` - Custom tool authoring and registration
* :doc:`code_based_tools` - CodeAct-style MCP wrappers and tool usage
* :doc:`code_execution` - Command execution tools (including background shell commands)
* :doc:`../integration/automation` - BackgroundShellManager for full CLI process automation


---

## user_guide/tools/code_based_tools.rst

Code-Based Tools
================

MassGen supports code-based tool access following the `CodeAct <https://machinelearning.apple.com/research/codeact>`_ paradigm and related write-ups from `Anthropic <https://www.anthropic.com/engineering/code-execution-with-mcp>`_ and `Cloudflare <https://blog.cloudflare.com/code-mode/>`_. Instead of passing tool schemas to the model, MassGen presents MCP tools as Python code in the workspace filesystem. Agents discover tools by reading files, import them like normal Python modules, and execute them from the command line.

This approach provides significant benefits:

* **Context reduction** - Load only needed tools instead of all schemas upfront
* **Transparent tool access** - Agents read source code and docstrings to understand tools
* **Native composition** - Combine multiple tools naturally using standard Python
* **Async-friendly workflows** - Write async scripts for parallel tool execution
* **Smart data filtering** - Process large datasets before returning to LLM

.. note::

   **Quick Setup Summary:**

   1. Enable ``enable_code_based_tools: true`` in your config
   2. Add ``enable_mcp_command_line: true`` for execution
   3. Optionally add ``exclude_file_operation_mcps: true`` to reduce redundancy
   4. Your MCP servers become Python code in ``workspace/servers/``
   5. For long-running tool calls, use the background lifecycle from :doc:`background_tools`

Quick Start: Try It Now
------------------------

MassGen includes a working example you can try immediately:

.. code-block:: bash

   # Explore available tools (demonstrates tool discovery)
   massgen --automation \
     --config massgen/configs/tools/filesystem/code_based/example_code_based_tools.yaml \
     "List all available tools by exploring the workspace filesystem. Show what MCP tools and custom tools are available."

   # Or create a website (demonstrates skills system)
   massgen \
     --config massgen/configs/tools/filesystem/code_based/example_code_based_tools.yaml \
     "Create a website about Bob Dylan, ensuring that it is visually appealing and user friendly"

The agent will:

1. Explore ``workspace/`` to discover available tools (MCP tools in ``servers/``, custom tools in ``custom_tools/``)
2. Read tool documentation from Python files and TOOL.md files
3. Import and use the tools as needed
4. Optionally create workflows in ``workspace/utils/`` for complex operations

How It Works
------------

When ``enable_code_based_tools: true`` is set, MassGen:

1. **Connects to your MCP servers** (weather, GitHub, etc.)
2. **Extracts tool schemas** from each connected server
3. **Generates Python wrapper code** for each tool
4. **Writes code to workspace** in an organized structure:

.. code-block:: text

   workspace/
   ├── servers/              # Auto-generated MCP wrappers
   │   ├── __init__.py      # Package marker (import from here)
   │   ├── weather/
   │   │   ├── __init__.py  # Exports: get_forecast, get_current
   │   │   ├── get_forecast.py
   │   │   └── get_current.py
   │   └── github/
   │       └── create_issue.py
   ├── custom_tools/         # Your custom Python tools (optional)
   ├── utils/               # Agent-created scripts (workflows, async, filtering)
   └── .mcp/                # Hidden MCP client (protocol handler)
       ├── client.py
       └── servers.json

**Directory purposes:**

* ``servers/`` - Auto-generated wrappers for MCP tools (read-only for agents)
* ``custom_tools/`` - Full Python implementations you provide (optional)
* ``utils/`` - Agent workspace for creating workflows and scripts
* ``.mcp/`` - Hidden infrastructure (agents don't see this)

TOOL.md Format & API Keys
--------------------------

Custom Tool Documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~

Custom tools in ``massgen/tool/`` include ``TOOL.md`` files with YAML frontmatter for discoverability:

.. code-block:: yaml

   ---
   name: multimodal-tools
   description: Vision, audio, video, and file processing tools
   category: multimodal
   requires_api_keys: [OPENAI_API_KEY]
   tasks:
     - "Analyze and understand images with vision models"
     - "Understand and transcribe audio files"
     - "Process and understand various file formats (PDF, DOCX, etc.)"
   keywords: [vision, audio, video, multimodal, image-analysis]
   ---

   # Multimodal Tools

   [Detailed documentation follows...]

**YAML Fields:**

* ``name`` - Tool package identifier (matches directory name)
* ``description`` - One-line summary
* ``category`` - Primary category (text-processing, web-scraping, multimodal, automation, etc.)
* ``requires_api_keys`` - List of required API keys, or empty list ``[]`` if none needed
* ``tasks`` - Action-oriented task descriptions (searchable)
* ``keywords`` - Searchable terms
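
These fields can also be read programmatically. The sketch below parses only the small YAML subset used in the example above (scalars, inline lists, and dash-prefixed block lists); ``read_frontmatter`` is a hypothetical helper, and a real YAML parser is the safer choice for anything beyond this shape.

```python
def read_frontmatter(text: str) -> dict:
    """Parse the simple key/value frontmatter between two ``---`` lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter
            break
        if stripped.startswith("- ") and isinstance(fields.get(current_key), list):
            # Item of a block list that belongs to the previous key.
            fields[current_key].append(stripped[2:].strip().strip('"'))
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key, value = key.strip(), value.strip()
            if not value:  # a block list follows on the next lines
                fields[current_key] = []
            elif value.startswith("[") and value.endswith("]"):
                inner = value[1:-1].strip()
                fields[current_key] = [v.strip() for v in inner.split(",")] if inner else []
            else:
                fields[current_key] = value
    return fields


doc = """---
name: multimodal-tools
category: multimodal
requires_api_keys: [OPENAI_API_KEY]
tasks:
  - "Analyze and understand images with vision models"
keywords: [vision, audio]
---

# Multimodal Tools
"""
meta = read_frontmatter(doc)
print(meta["name"], meta["requires_api_keys"])
```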

API Key Management
~~~~~~~~~~~~~~~~~~

MassGen uses ``.env`` files for API key management. Keys must be explicitly configured to be passed to Docker containers.

**1. Create .env file in project root:**

.. code-block:: bash

   # .env file
   OPENAI_API_KEY=sk-...
   ANTHROPIC_API_KEY=sk-ant-...
   GOOGLE_API_KEY=...
   GEMINI_API_KEY=...

See ``.env.example`` in the project root for a template of all supported API keys.

**2. Configure Docker to pass API keys:**

When using ``command_line_execution_mode: docker``, you **must** configure credentials to pass API keys:

.. code-block:: yaml

   backend:
     enable_code_based_tools: true
     enable_mcp_command_line: true
     command_line_execution_mode: "docker"

     # IMPORTANT: Pass API keys to Docker
     command_line_docker_credentials:
       env_file: ".env"  # Load from .env file
       env_vars_from_file:
         - "OPENAI_API_KEY"
         - "ANTHROPIC_API_KEY"
         - "GOOGLE_API_KEY"
         - "GEMINI_API_KEY"

**Without this configuration**, custom tools requiring API keys will fail inside Docker containers.

**Alternative approaches:**

.. code-block:: yaml

   # Option A: Load ALL variables from .env (simpler, less secure)
   command_line_docker_credentials:
     env_file: ".env"

   # Option B: Pass specific vars from host environment
   command_line_docker_credentials:
     env_vars:
       - "OPENAI_API_KEY"
       - "ANTHROPIC_API_KEY"

   # Option C: Dangerous - pass ALL host env vars (NOT RECOMMENDED)
   command_line_docker_credentials:
     pass_all_env: true

**3. Check which tools require which keys:**

.. code-block:: bash

   rg "^requires_api_keys:" massgen/tool/*/TOOL.md

**4. Filter tools by available API keys:**

.. code-block:: bash

   # Check what API keys you have
   env | grep "API_KEY"

   # Find tools that need OPENAI_API_KEY
   rg "^requires_api_keys:.*OPENAI_API_KEY" massgen/tool/*/TOOL.md -l

   # Find tools that need no API keys
   rg "^requires_api_keys: \[\]" massgen/tool/*/TOOL.md -l

.. note::

   **For local execution mode** (``command_line_execution_mode: local``), environment variables from your shell are automatically available. You only need to configure ``command_line_docker_credentials`` when using Docker.

**Automatic Tool Exclusion:**

MassGen automatically excludes tools based on unavailable API keys to prevent runtime failures:

* When you configure ``command_line_docker_credentials``, MassGen reads the ``requires_api_keys`` field from each tool's TOOL.md
* Tools requiring API keys not listed in your credentials configuration are automatically excluded
* Excluded tools won't be copied to your workspace and won't appear in tool listings
* Exclusion is logged during setup: ``"Excluding tool_name: missing API keys: KEY_NAME"``

**Example:**

.. code-block:: yaml

   # Only configure OpenAI key
   command_line_docker_credentials:
     env_file: ".env"
     env_vars_from_file:
       - "OPENAI_API_KEY"  # Only this key

**Result:**

- ``_multimodal_tools`` (requires ``OPENAI_API_KEY``) → ✅ **Available**
- ``_computer_use`` (requires ``OPENAI_API_KEY``) → ✅ **Available**
- ``_claude_computer_use`` (requires ``ANTHROPIC_API_KEY``) → ❌ **Excluded** (missing key)
- ``_gemini_computer_use`` (requires ``GOOGLE_API_KEY``) → ❌ **Excluded** (missing key)
- ``_web_tools`` (requires no API keys: ``[]``) → ✅ **Available**

You can override or supplement automatic exclusion with manual ``exclude_custom_tools`` configuration.
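
The exclusion rule described above reduces to a set-membership check per tool. This sketch reproduces the example result under that reading; ``split_by_available_keys`` is a hypothetical helper, not MassGen's internal function.

```python
def split_by_available_keys(
    tool_requirements: dict[str, list[str]], configured_keys: set[str]
) -> tuple[list[str], dict[str, list[str]]]:
    """Split tools into available ones and excluded ones with their missing keys."""
    available: list[str] = []
    excluded: dict[str, list[str]] = {}
    for tool, required in tool_requirements.items():
        missing = [key for key in required if key not in configured_keys]
        if missing:
            excluded[tool] = missing
        else:
            available.append(tool)
    return available, excluded


requirements = {
    "_multimodal_tools": ["OPENAI_API_KEY"],
    "_computer_use": ["OPENAI_API_KEY"],
    "_claude_computer_use": ["ANTHROPIC_API_KEY"],
    "_gemini_computer_use": ["GOOGLE_API_KEY"],
    "_web_tools": [],
}
available, excluded = split_by_available_keys(requirements, {"OPENAI_API_KEY"})
print(available)
print(excluded)
```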

Configuration
-------------

Basic Setup
~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "my_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         cwd: "workspace"

         # Enable code-based tools
         enable_code_based_tools: true
         enable_mcp_command_line: true      # Required for execution
         exclude_file_operation_mcps: true  # Recommended (use CLI for files)

         # Your MCP servers (will be converted to Python code)
         mcp_servers:
           - name: "weather"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-weather"]

With Custom Tools
~~~~~~~~~~~~~~~~~

If you have existing Python tools you want visible in the workspace:

.. code-block:: yaml

   backend:
     enable_code_based_tools: true
     custom_tools_path: "massgen/tool/_code_based_example"  # Copied to workspace/custom_tools/
     enable_mcp_command_line: true
     command_line_execution_mode: "docker"

     # IMPORTANT: Most custom tools require API keys
     command_line_docker_credentials:
       env_file: ".env"
       env_vars_from_file:
         - "OPENAI_API_KEY"
         - "ANTHROPIC_API_KEY"
         - "GOOGLE_API_KEY"

     mcp_servers:
       - name: "weather"
         # ... MCP config

Your custom tools directory will be copied into ``workspace/custom_tools/`` where agents can read and use them.

.. note::

   **Automatic Tool Filtering**: When using Docker execution mode, MassGen automatically excludes custom tools whose required API keys are not configured in ``command_line_docker_credentials``. For example, if you only configure ``OPENAI_API_KEY``, tools requiring ``ANTHROPIC_API_KEY`` will be automatically excluded and won't appear in your workspace.

   Use ``rg "^requires_api_keys:" massgen/tool/*/TOOL.md`` to check which tools need which API keys.

Direct MCP Servers
------------------

When using code-based tools, all user MCP servers are normally filtered out from direct protocol access and become accessible only via generated Python code. However, you may want certain MCP servers (like debugging or monitoring tools) to remain as direct native tools in the prompt.

Use ``direct_mcp_servers`` to specify which MCP servers should bypass code-based filtering:

.. code-block:: yaml

   backend:
     type: gemini
     model: gemini-3-flash-preview
     enable_code_based_tools: true
     auto_discover_custom_tools: true

     # Keep logfire as a native tool in the prompt
     direct_mcp_servers:
       - logfire

     mcp_servers:
       - name: logfire
         type: stdio
         command: uvx
         args: ["logfire-mcp@latest"]
         env:
           LOGFIRE_READ_TOKEN: ${LOGFIRE_READ_TOKEN}

       - name: weather
         # This MCP will be filtered to code-only access
         type: stdio
         command: uvx
         args: ["weather-mcp@latest"]

In this example:

- ``logfire`` tools appear directly in the prompt as callable functions
- ``weather`` tools are converted to Python code in the workspace

**When to Use Direct MCP Servers:**

- **Debugging/monitoring tools**: Keep tools like Logfire that you want immediate access to
- **Frequently-used MCPs**: Tools called often that benefit from direct invocation
- **Framework-adjacent MCPs**: Tools that feel like core capabilities rather than external services

.. note::

   Subagents automatically inherit ``direct_mcp_servers`` from their parent agent.
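
Conceptually, ``direct_mcp_servers`` partitions the configured servers into two groups. A minimal sketch of that split, using the server names from the example above; ``partition_servers`` is a hypothetical helper for illustration.

```python
def partition_servers(
    mcp_servers: list[dict], direct_mcp_servers: list[str]
) -> tuple[list[str], list[str]]:
    """Return (direct, code_only) server names, preserving config order."""
    direct_names = set(direct_mcp_servers)
    direct = [s["name"] for s in mcp_servers if s["name"] in direct_names]
    code_only = [s["name"] for s in mcp_servers if s["name"] not in direct_names]
    return direct, code_only


servers = [{"name": "logfire"}, {"name": "weather"}]
direct, code_only = partition_servers(servers, ["logfire"])
print(direct, code_only)
```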

Agent Usage Patterns
--------------------

1. Tool Discovery
~~~~~~~~~~~~~~~~~

Agents discover tools through two mechanisms: **TOOL.md files** for custom tool packages and **filesystem exploration** for MCP tools.

**Custom Tool Packages (TOOL.md Discovery)**

Custom tools include TOOL.md files with searchable YAML frontmatter:

.. code-block:: bash

   # List all available custom tool packages
   rg "^name: " */TOOL.md

   # Search by task description
   rg "tasks:" -A 5 */TOOL.md | rg -i "scrape|image|automate"

   # Search by keyword
   rg "^keywords:.*web|vision" */TOOL.md -l

   # Search by category
   rg "^category: automation" */TOOL.md -l

   # Check API key requirements
   rg "^requires_api_keys:" */TOOL.md
   env | grep "API_KEY"  # Check which API keys you have

   # Semantic search (if available)
   search "process videos" . --glob "*/TOOL.md" --top-k 5

   # Read full documentation
   cat custom_tools/TOOL.md

.. important::

   **Docker Users**: If tools show ``requires_api_keys: [OPENAI_API_KEY]`` or similar, you must configure ``command_line_docker_credentials`` in your YAML to pass API keys to Docker containers. See the API Key Management section above for details.

**MCP Tools (Filesystem Discovery)**

MCP tools are discovered by exploring the ``servers/`` directory:

.. code-block:: bash

   # Discover available MCP servers
   ls servers/

   # See tools in a server (each .py file is a tool)
   ls servers/weather/

   # Read tool documentation from docstring
   cat servers/weather/get_forecast.py

   # Search for functionality across all MCP tools
   rg "temperature|forecast" servers/ --type py

   # Semantic search within MCP tools
   search "get weather data" servers/ --type py

2. Direct Tool Usage
~~~~~~~~~~~~~~~~~~~~

Once discovered, agents import and use tools:

.. code-block:: python

   # Import tool
   from servers.weather import get_forecast

   # Call it
   forecast = get_forecast("San Francisco", days=3)
   print(forecast)

3. Creating Workflows (utils/)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Agents can write sophisticated scripts in ``utils/`` to compose multiple tools:

.. code-block:: python

   # utils/daily_weather_report.py
   from servers.weather import get_forecast, get_current

   def generate_report(city: str) -> str:
       """Generate a daily weather report for a city."""
       current = get_current(city)
       forecast = get_forecast(city, days=3)

       report = f"Current: {current['temp']}°F\n"
       report += f"3-day forecast: {forecast['summary']}"
       return report

   if __name__ == "__main__":
       print(generate_report("San Francisco"))

Then execute:

.. code-block:: bash

   python utils/daily_weather_report.py

4. Async Operations
~~~~~~~~~~~~~~~~~~~

For parallel tool calls, agents can use asyncio:

.. code-block:: python

   # utils/parallel_forecasts.py
   import asyncio
   from servers.weather import get_forecast

   async def get_forecasts(cities: list) -> dict:
       """Get forecasts for multiple cities in parallel."""
       # The generated wrapper shown in this guide is a regular (synchronous)
       # function, so run each call in a worker thread to let them overlap.
       tasks = [asyncio.to_thread(get_forecast, city) for city in cities]
       results = await asyncio.gather(*tasks)
       return dict(zip(cities, results))

   cities = ["San Francisco", "New York", "Los Angeles", "Chicago"]
   forecasts = asyncio.run(get_forecasts(cities))
   print(forecasts)

5. Data Filtering
~~~~~~~~~~~~~~~~~

Process large datasets in the execution environment before returning to the LLM:

.. code-block:: python

   # utils/qualified_leads.py
   from servers.salesforce import get_records

   def get_top_leads(limit: int = 50) -> list:
       """Get top qualified leads (filters 10k → 50 records)."""
       # Fetch large dataset
       all_records = get_records(object="Lead", limit=10000)

       # Filter in execution environment (not sent to LLM)
       qualified = [r for r in all_records if r["score"] > 80]

       # Return only top N (massive context reduction)
       return sorted(qualified, key=lambda x: x["score"], reverse=True)[:limit]

   # Agent only sees top 50, not all 10k records
   top_leads = get_top_leads()

Generated Code Example
----------------------

MassGen generates clean, documented Python wrappers. Here's an example:

**``servers/weather/get_forecast.py``:**

.. code-block:: python

   """
   get_forecast - MCP tool wrapper

   Auto-generated wrapper for the 'get_forecast' tool from the 'weather' MCP server.
   This wrapper handles MCP protocol communication transparently.
   """

   from typing import Any, Dict, Optional
   import sys
   import os
   from pathlib import Path

   # Add .mcp to path for MCP client
   _mcp_path = Path(__file__).parent.parent.parent / '.mcp'
   if str(_mcp_path) not in sys.path:
       sys.path.insert(0, str(_mcp_path))

   from client import call_mcp_tool


   def get_forecast(location: str, days: Optional[int] = 5) -> Any:
       """Get weather forecast for a location.

       Args:
           location (str): City name or coordinates
           days (int, optional): Number of days (default: 5, max: 10)

       Returns:
           Any: Tool execution result from MCP server
       """
       return call_mcp_tool(
           server="weather",
           tool="get_forecast",
           arguments={
               "location": location,
               "days": days
           }
       )


   if __name__ == "__main__":
       # CLI usage for testing
       import json

       if len(sys.argv) > 1:
           result = get_forecast(sys.argv[1])
       else:
           print("Usage: python get_forecast.py <location>")
           print(f"\nDocumentation:\n{get_forecast.__doc__}")
           sys.exit(1)

       print(json.dumps(result, indent=2))

Agents can read this file to understand the tool's interface, then import and use it.

Benefits
--------

Context Reduction
~~~~~~~~~~~~~~~~~

**Without code-based tools:**

* All tool schemas loaded upfront into model context
* 10 MCP tools × 200 tokens each = 2,000 tokens before any task
* Wasted context on unused tools

**With code-based tools:**

* Only tool names visible initially (``ls servers/``)
* Agent reads only needed tools

Transparency
~~~~~~~~~~~~

Agents can:

* Read source code to understand tool behavior
* See parameter types and defaults
* Read docstrings and examples
* Understand error handling

Schema-only tool definitions do not expose this level of detail.

Composability
~~~~~~~~~~~~~

Standard Python enables natural tool composition:

.. code-block:: python

   # utils/weather_email.py
   from servers.weather import get_forecast
   from servers.gmail import send_email

   def send_weather_alert(city: str, recipient: str) -> None:
       """Email a heat alert when the forecast exceeds 100°F."""
       # Generated wrappers are regular functions, so no await is needed.
       forecast = get_forecast(city, days=7)

       if forecast['max_temp'] > 100:
           send_email(
               to=recipient,
               subject=f"Heat Alert: {city}",
               body=f"High temperature expected: {forecast['max_temp']}°F"
           )

Async Performance
~~~~~~~~~~~~~~~~~

Native async support enables parallel tool calls:

.. code-block:: python

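   import asyncio
   from servers.weather import get_forecast

   # Sequential: about 3 seconds if each call takes 1 second
   forecast1 = get_forecast("SF")      # 1s
   forecast2 = get_forecast("NYC")     # 1s
   forecast3 = get_forecast("LA")      # 1s

   # Parallel: about 1 second, since the synchronous wrappers run in worker threads
   async def fetch_all():
       return await asyncio.gather(
           asyncio.to_thread(get_forecast, "SF"),
           asyncio.to_thread(get_forecast, "NYC"),
           asyncio.to_thread(get_forecast, "LA"),
       )

   results = asyncio.run(fetch_all())  # All three calls overlap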

Data Filtering Privacy
~~~~~~~~~~~~~~~~~~~~~~

Process sensitive data in the execution environment:

.. code-block:: python

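   from servers.salesforce import get_records

   def summarize_customers() -> dict:
       """Return summary statistics only, never the raw records."""
       # Fetch 10k customer records
       customers = get_records("Customer", limit=10000)

       # Filter to relevant subset (in execution env, not sent to LLM)
       active_customers = [c for c in customers if c["status"] == "active"]

       # Only return summary statistics (guard against an empty result set)
       return {
           "total": len(customers),
           "active": len(active_customers),
           "conversion_rate": len(active_customers) / len(customers) if customers else 0.0
       }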

The LLM never sees the raw customer data.

Important Notes
---------------

Built-in MCPs Stay as MCPs
~~~~~~~~~~~~~~~~~~~~~~~~~~~

When ``enable_code_based_tools: true``:

**User MCP Servers (Converted to Code-Only)**
  * Weather, GitHub, Salesforce, etc. are **removed from MCP protocol**
  * **Only accessible via Python code** in ``servers/``
  * Agents cannot call them as protocol tools (must import and use)

**Framework MCPs (Remain as Protocol)**
  * ``command_line`` - Command execution (bash is implicitly available)
  * ``workspace_tools`` - File operations, media generation
  * ``filesystem`` - Filesystem operations
  * ``planning`` - Task planning MCP
  * ``memory`` - Memory management MCP

These framework MCPs are abstracted at the protocol level and not visible in the filesystem. For example, agents can run bash commands directly without needing an ``execute_command`` function - it's automatically available.

**Important**: User MCP tools are **completely filtered out** of the agent's MCP tool list. This forces agents to use the generated Python wrappers, which is what delivers the context reduction benefit.

Non-blocking Code Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If code generation fails, MCP setup continues normally. The agent falls back to protocol-based tool access.

Command-line Execution Required
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Code-based tools require command-line execution capability:

.. code-block:: yaml

   backend:
     enable_mcp_command_line: true  # Required

Without this, agents cannot execute Python scripts.

Recommended: Exclude File Operation MCPs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since agents have command-line access, file operations can use standard tools (``cat``, ``ls``, etc.):

.. code-block:: yaml

   backend:
     exclude_file_operation_mcps: true  # Use CLI for file operations

This reduces tool overhead and simplifies the environment.

.. important::

   **File Creation Tools Kept:** When ``exclude_file_operation_mcps: true`` is set, ``write_file`` and ``edit_file`` are **still available** as MCP tools. This is intentional:

   * **Avoids shell escaping nightmares** - Creating Python scripts with heredocs/echo leads to complex escaping issues
   * **Clean file creation** - Use ``write_file`` to create ``.py`` files, then execute via command-line
   * **Best of both worlds** - Write files via MCP, read/list via command-line

   Example of the problem this solves:

   .. code-block:: bash

      # Shell approach (FAILS with escaping issues):
      echo 'async def main():\n    result = await func("text with \"quotes\"")\n' > script.py

      # MCP approach (WORKS cleanly):
      mcp__filesystem__write_file(path="script.py", content="async def main():\n    ...")

   All other filesystem tools (read, list, search, etc.) are excluded and should use command-line equivalents.

Complete Example Config
------------------------

.. code-block:: yaml

   agents:
     - id: "research_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         cwd: "research_workspace"

         # Code-based tools configuration
         enable_code_based_tools: true
         enable_mcp_command_line: true
         exclude_file_operation_mcps: true

         # Optional: Your custom tools
         custom_tools_path: "my_tools/"

         # MCP servers (converted to Python code)
         mcp_servers:
           - name: "weather"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-weather"]

           - name: "github"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-github"]
             env:
               GITHUB_TOKEN: "${GITHUB_TOKEN}"

       system_message: |
         You are a research assistant with access to weather and GitHub tools.
         Tools are available as Python modules in the workspace.

         Discover tools:
         - ls servers/
         - cat servers/weather/get_forecast.py

         Use tools:
         - from servers.weather import get_forecast
         - Create workflows in utils/ for complex tasks

   ui:
     display_type: "rich_terminal"

References
----------

This implementation is based on recent research and production systems:

* **CodeAct** (Apple Research) -
  https://machinelearning.apple.com/research/codeact

* **Cloudflare Code Mode** -
  https://blog.cloudflare.com/code-mode/

* **Anthropic MCP Code Execution** -
  https://www.anthropic.com/engineering/code-execution-with-mcp

See Also
--------

* :doc:`custom_tools` - Creating custom Python tools
* :doc:`background_tools` - Background lifecycle for long-running tool calls
* :doc:`mcp_integration` - MCP server setup and configuration
* :doc:`code_execution` - Command-line execution modes
* :doc:`../files/file_operations` - File operation configuration


---

## user_guide/tools/code_execution.rst

Code Execution
===============

MassGen provides command-line execution through MCP (Model Context Protocol), letting agents run bash commands, install packages, and execute scripts, with multiple layers of security around each command.

Quick Start
-----------

**Enable code execution for a single agent:**

.. code-block:: yaml

   agent:
     backend:
       type: "openai"
       model: "gpt-5-mini"
       cwd: "workspace"
       enable_mcp_command_line: true  # Enables code execution

**Run with code execution:**

.. code-block:: bash

   massgen "Write a Python script to analyze data.csv and create a report"

Execution Modes
---------------

MassGen supports two execution modes:

Local Mode (Default)
~~~~~~~~~~~~~~~~~~~~

Commands execute directly on your host system with pattern-based security:

.. code-block:: yaml

   agent:
     backend:
       cwd: "workspace"
       enable_mcp_command_line: true
       command_line_execution_mode: "local"  # Default

**Best for:** Development, trusted code, fast execution

Docker Mode
~~~~~~~~~~~

Commands execute inside isolated Docker containers:

.. code-block:: yaml

   agent:
     backend:
       cwd: "workspace"
       enable_mcp_command_line: true
       command_line_execution_mode: "docker"

**Best for:** Production, untrusted code, high security requirements

See :ref:`docker-mode-setup` for setup instructions.

Docker Credentials & Package Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Docker mode supports comprehensive credential management and package preinstallation through two nested configuration dictionaries: ``command_line_docker_credentials`` and ``command_line_docker_packages``.

Credential Management
"""""""""""""""""""""

**1. Mount Credential Files**

Mount credential files from your host into the container (all mounted read-only):

.. code-block:: yaml

   command_line_docker_credentials:
     mount:
       - "ssh_keys"     # ~/.ssh → /home/massgen/.ssh
       - "git_config"   # ~/.gitconfig → /home/massgen/.gitconfig
       - "gh_config"    # ~/.config/gh → /home/massgen/.config/gh
       - "npm_config"   # ~/.npmrc → /home/massgen/.npmrc
       - "pypi_config"  # ~/.pypirc → /home/massgen/.pypirc
       - "claude_config"  # ~/.claude → /home/massgen/.claude
       - "codex_config"  # ~/.codex → /home/massgen/.codex

**Available mount types:**

- ``ssh_keys`` - Clone private repos via SSH (``git clone git@github.com:org/repo.git``)
- ``git_config`` - Git user name/email for commits
- ``gh_config`` - GitHub CLI authentication (use if you've run ``gh auth login``)
- ``npm_config`` - Private npm package authentication
- ``pypi_config`` - Private PyPI package authentication
- ``claude_config`` - Claude Code CLI session/config files (for Claude auth inheritance in Docker)
- ``codex_config`` - Codex CLI OAuth/session files (for keyless Codex auth inheritance in Docker)

**2. Pass Environment Variables**

Multiple methods to pass environment variables:

.. code-block:: yaml

   # Option 1: From .env file - load ALL variables
   command_line_docker_credentials:
     env_file: ".env"

   # Option 2: From .env file - load ONLY specific variables (recommended)
   command_line_docker_credentials:
     env_file: ".env"
     env_vars_from_file:  # Only pass these from .env
       - "GITHUB_TOKEN"
       - "NPM_TOKEN"
     # Other secrets in .env won't be passed to container

   # Option 3: Specific variables from host environment
   command_line_docker_credentials:
     env_vars:
       - "GITHUB_TOKEN"
       - "NPM_TOKEN"
       - "ANTHROPIC_API_KEY"

   # Option 4: All environment variables (dangerous, use with caution)
   command_line_docker_credentials:
     pass_all_env: true
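
Option 2 amounts to parsing the file and keeping only the listed keys. The sketch below illustrates that idea under simple assumptions (plain ``KEY=VALUE`` lines, ``#`` comments, optional surrounding quotes); ``parse_env_text`` and ``select_container_env`` are hypothetical names, and MassGen's own loader may accept more syntax:

.. code-block:: python

   def parse_env_text(text):
       """Parse KEY=VALUE lines, skipping blank lines and # comments."""
       env = {}
       for raw in text.splitlines():
           line = raw.strip()
           if not line or line.startswith("#") or "=" not in line:
               continue
           key, _, value = line.partition("=")
           env[key.strip()] = value.strip().strip('"').strip("'")
       return env


   def select_container_env(env_text, env_vars_from_file=None):
       """Return the variables that would be passed into the container."""
       parsed = parse_env_text(env_text)
       if env_vars_from_file is None:
           # Option 1: no allow-list, so every variable in the file is passed.
           return parsed
       # Option 2: pass only the allow-listed names that exist in the file.
       return {k: parsed[k] for k in env_vars_from_file if k in parsed}

An allow-list keeps unrelated secrets in the same ``.env`` file (a database password, for example) out of the container even if the agent prints its environment.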

**3. Custom Volume Mounts**

Mount additional files or directories:

.. code-block:: yaml

   command_line_docker_credentials:
     additional_mounts:
       "/path/on/host/.aws":
         bind: "/home/massgen/.aws"
         mode: "ro"

GitHub CLI Authentication
"""""""""""""""""""""""""

GitHub CLI (``gh``) is pre-installed in MassGen Docker images. Two authentication methods:

**Method 1: Use Existing Login** (recommended if you've run ``gh auth login``):

.. code-block:: yaml

   command_line_docker_credentials:
     mount:
       - "gh_config"  # Mounts ~/.config/gh with your credentials

**Method 2: Pass Token**:

.. code-block:: yaml

   command_line_docker_credentials:
     env_vars:
       - "GITHUB_TOKEN"  # Set: export GITHUB_TOKEN=ghp_your_token

**For HTTPS git clones**, also add the token so git can authenticate:

.. code-block:: yaml

   command_line_docker_credentials:
     mount: ["gh_config", "ssh_keys", "git_config"]
     env_vars: ["GITHUB_TOKEN"]  # Enables both gh CLI and HTTPS git

Agents can then use ``gh`` commands:

.. code-block:: bash

   gh auth status
   gh api user
   gh repo clone user/repo
   gh issue list
   gh pr list

Package Preinstall
""""""""""""""""""

Specify base packages to pre-install in every container. These install when the container is created, before agents start working:

.. code-block:: yaml

   command_line_docker_packages:
     preinstall:
       python:
         - "requests>=2.31.0"
         - "numpy>=1.24.0"
         - "pytest>=7.0.0"
       npm:
         - "typescript"
         - "@types/node"
       system:
         - "vim"
         - "htop"

**Installation order**: System packages → Python packages → npm packages (all with sudo if enabled).

**When to use**:

- Consistent base environment across all runs
- Different package sets per configuration
- Quick iteration without rebuilding Docker images

**Requirements**:

- npm/system packages require: ``command_line_docker_enable_sudo: true``
- All packages require: ``command_line_docker_network_mode: "bridge"``
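
The ordering and sudo rules above can be sketched as a small command builder. The exact commands MassGen runs are not documented here, so the ``apt-get``, ``pip``, and ``npm install -g`` invocations below are assumptions, and ``build_preinstall_commands`` is a hypothetical helper:

.. code-block:: python

   import shlex


   def build_preinstall_commands(preinstall, enable_sudo=False):
       """Return shell commands in the documented order: system, Python, npm."""
       system = preinstall.get("system", [])
       python = preinstall.get("python", [])
       npm = preinstall.get("npm", [])

       # Documented requirement: npm and system packages need sudo enabled.
       if (system or npm) and not enable_sudo:
           raise ValueError("npm/system packages require command_line_docker_enable_sudo: true")

       prefix = "sudo " if enable_sudo else ""

       def quoted(packages):
           # Quote each spec so that ">=" is not parsed as a shell redirect.
           return " ".join(shlex.quote(p) for p in packages)

       commands = []
       if system:
           commands.append(f"{prefix}apt-get update && {prefix}apt-get install -y {quoted(system)}")
       if python:
           commands.append(f"{prefix}pip install {quoted(python)}")
       if npm:
           commands.append(f"{prefix}npm install -g {quoted(npm)}")
       return commands

Quoting matters for version specifiers: an unquoted ``requests>=2.31.0`` on a shell command line would be treated as a redirect to a file named ``=2.31.0``.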

Custom Docker Images
""""""""""""""""""""

For stable dependencies or complex environments, create a custom Docker image:

.. code-block:: yaml

   command_line_docker_image: "your-username/custom-image:tag"

**Example custom Dockerfile** (see ``massgen/docker/Dockerfile.custom-example``):

.. code-block:: dockerfile

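   FROM massgen/mcp-runtime:latest
   # System packages need root; the base image runs as the non-root massgen user
   USER root
   RUN apt-get update && apt-get install -y vim htop && rm -rf /var/lib/apt/lists/*
   USER massgen
   RUN pip install --no-cache-dir scikit-learn matplotlib seaborn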

Build the image, then point ``command_line_docker_image`` at the resulting tag:

.. code-block:: bash

   docker build -t my-custom-image:v1 -f Dockerfile.custom .

**Key requirements for custom images:**

1. Must have ``massgen`` user with UID 1000
2. Must create ``/workspace``, ``/context``, ``/temp_workspaces`` directories
3. Must set appropriate permissions
4. CMD should keep container running (``tail -f /dev/null``)

Complete Example Configurations
""""""""""""""""""""""""""""""""

**Minimal GitHub access:**

.. code-block:: yaml

   agent:
     backend:
       enable_mcp_command_line: true
       command_line_execution_mode: "docker"
       command_line_docker_network_mode: "bridge"
       command_line_docker_credentials:
         env_vars: ["GITHUB_TOKEN"]

**Full development setup:**

.. code-block:: yaml

   agent:
     backend:
       enable_mcp_command_line: true
       command_line_execution_mode: "docker"
       command_line_docker_enable_sudo: true
       command_line_docker_network_mode: "bridge"

       command_line_docker_credentials:
         env_file: ".env"
         mount: ["ssh_keys", "git_config"]

       command_line_docker_packages:
         preinstall:
           python: ["pytest", "requests", "numpy"]
           npm: ["typescript"]

**Security best practices:**

- Use ``.env`` files for credentials (add to ``.gitignore``)
- Use ``env_vars_from_file`` to only pass needed secrets from .env (recommended)
- Mount only the credentials you need (nothing is mounted unless you opt in)
- Use ``command_line_docker_network_mode: "none"`` unless network is required
- All credential files are mounted **read-only**
- Use command filtering (``command_line_whitelist_patterns`` / ``command_line_blacklist_patterns``) for additional safety

**Ready-to-run examples:**

1. **GitHub read-only mode** (safe mode with credentials):

   .. code-block:: bash

      # Prerequisites: gh auth login or export GITHUB_TOKEN
      uv run massgen --config @examples/configs/tools/code-execution/docker_github_readonly.yaml "Test to see the most recent issues in the massgen/MassGen repo with the github cli"

2. **Full development setup** (all features combined):

   .. code-block:: bash

      # Prerequisites: Build sudo image, create .env file
      bash massgen/docker/build.sh --sudo
      echo "GITHUB_TOKEN=ghp_your_token" > .env

      uv run massgen --config @examples/configs/tools/code-execution/docker_full_dev_setup.yaml "Demonstrate full dev environment: check gh auth, verify pre-installed massgen, verify typescript installed, create Flask app with requirements.txt, show git config"

3. **Custom Docker image** (bring your own image):

   .. code-block:: bash

      # Prerequisites: Build custom image
      docker build -t massgen-custom-test:v1 -f massgen/docker/Dockerfile.custom-example .

      uv run massgen --config @examples/configs/tools/code-execution/docker_custom_image.yaml "Verify custom packages: sklearn, matplotlib, seaborn, ipython, black, vim, htop, tree"

**More examples:** See ``massgen/configs/tools/code-execution/`` for additional configurations.

Code Execution vs Backend Built-in Tools
-----------------------------------------

MassGen provides **two ways** for agents to execute code:

1. **Backend Built-in Code Execution**
2. **MCP-based Code Execution** (Universal)

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Feature
     - Backend Built-in
     - MCP Code Execution
   * - **Availability**
     - Backend-specific (OpenAI, Claude Code)
     - Universal (all backends)
   * - **Configuration**
     - Automatic with supported backends
     - ``enable_mcp_command_line: true``
   * - **Execution Environment**
     - Backend provider's sandbox
     - Your environment (local/Docker)
   * - **Persistence**
     - Ephemeral (resets between sessions)
     - Persistent (packages stay installed)
   * - **File System Access**
     - Limited to backend's environment
     - Full access to workspace
   * - **Package Installation**
     - Backend-managed
     - You control (pip, npm, etc.)
   * - **Network Access**
     - Provider-controlled
     - Configurable (local: full, Docker: none/bridge/host)
   * - **Use Case**
     - Quick calculations, simple scripts
     - Complex workflows, persistent environments

**You can use both simultaneously!** The agent will choose the most appropriate tool for each task.

Configuration
-------------

Basic Configuration
~~~~~~~~~~~~~~~~~~~

Enable MCP code execution with minimal setup:

.. code-block:: yaml

   agent:
     backend:
       type: "openai"
       model: "gpt-5-mini"
       cwd: "workspace"
       enable_mcp_command_line: true

Advanced Configuration
~~~~~~~~~~~~~~~~~~~~~~

Full configuration with Docker mode and security:

.. code-block:: yaml

   agent:
     backend:
       type: "claude"
       model: "claude-sonnet-4"
       cwd: "workspace"

       # Enable MCP code execution
       enable_mcp_command_line: true
       command_line_execution_mode: "docker"  # or "local"

       # Docker-specific settings (if using docker mode)
       command_line_docker_image: "massgen/mcp-runtime:latest"
       command_line_docker_memory_limit: "2g"
       command_line_docker_cpu_limit: 4.0
       command_line_docker_network_mode: "none"  # "none", "bridge", or "host"

       # Command filtering (optional)
       command_line_whitelist_patterns: ["pip install.*", "python .*"]
       command_line_blacklist_patterns: ["rm -rf /", "sudo .*"]

Configuration Parameters
~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 30 15 55

   * - Parameter
     - Default
     - Description
   * - ``enable_mcp_command_line``
     - ``false``
     - Enable MCP-based code execution
   * - ``command_line_execution_mode``
     - ``"local"``
     - Execution mode: ``"local"`` or ``"docker"``
   * - ``command_line_docker_image``
     - ``"massgen/mcp-runtime:latest"``
     - Docker image for container execution
   * - ``command_line_docker_memory_limit``
     - None
     - Memory limit (e.g., ``"2g"``, ``"512m"``)
   * - ``command_line_docker_cpu_limit``
     - None
     - CPU cores limit (e.g., ``2.0``, ``4.0``)
   * - ``command_line_docker_network_mode``
     - ``"none"``
     - Network mode: ``"none"``, ``"bridge"``, or ``"host"``
   * - ``command_line_docker_enable_sudo``
     - ``false``
     - Enable sudo in containers (⚠️ less secure; see the Sudo Variant section below)
   * - ``command_line_whitelist_patterns``
     - None
     - Regex patterns for allowed commands
   * - ``command_line_blacklist_patterns``
     - None
     - Regex patterns for blocked commands
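
The defaults in the table can be collected into one structure for a quick sanity check before a run. This is a sketch only: the ``CommandLineSettings`` class and its ``validate`` method are hypothetical helpers, not part of MassGen; the field names and defaults are copied from the table above.

.. code-block:: python

   from dataclasses import dataclass
   from typing import List, Optional


   @dataclass
   class CommandLineSettings:
       """Backend options for MCP code execution, with the documented defaults."""

       enable_mcp_command_line: bool = False
       command_line_execution_mode: str = "local"
       command_line_docker_image: str = "massgen/mcp-runtime:latest"
       command_line_docker_memory_limit: Optional[str] = None
       command_line_docker_cpu_limit: Optional[float] = None
       command_line_docker_network_mode: str = "none"
       command_line_docker_enable_sudo: bool = False
       command_line_whitelist_patterns: Optional[List[str]] = None
       command_line_blacklist_patterns: Optional[List[str]] = None

       def validate(self):
           """Return a list of human-readable problems (empty if none)."""
           errors = []
           if self.command_line_execution_mode not in ("local", "docker"):
               errors.append("command_line_execution_mode must be 'local' or 'docker'")
           if self.command_line_docker_network_mode not in ("none", "bridge", "host"):
               errors.append("command_line_docker_network_mode must be 'none', 'bridge', or 'host'")
           cpu = self.command_line_docker_cpu_limit
           if cpu is not None and cpu <= 0:
               errors.append("command_line_docker_cpu_limit must be positive")
           return errors

Loading a YAML ``backend`` mapping into such a structure makes typos in mode names visible before any container is created.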

.. _docker-mode-setup:

Docker Mode Setup
-----------------

Prerequisites
~~~~~~~~~~~~~

1. **Docker installed and running:**

   .. code-block:: bash

      docker --version  # Should show Docker Engine >= 28.0.0
      docker ps         # Should connect without errors

   Recommended: Docker Engine 28.0.0+ (`release notes <https://docs.docker.com/engine/release-notes/28/>`_)

2. **Python docker library:**

   .. code-block:: bash

      # Install via optional dependency group
      uv pip install -e ".[docker]"

      # Or install directly
      pip install "docker>=7.0.0"

Build Docker Image
~~~~~~~~~~~~~~~~~~

From the repository root:

.. code-block:: bash

   bash massgen/docker/build.sh

This builds ``massgen/mcp-runtime:latest`` (~400-500MB).

Enable Docker Mode
~~~~~~~~~~~~~~~~~~

Simple configuration:

.. code-block:: yaml

   agent:
     backend:
       cwd: "workspace"
       enable_mcp_command_line: true
       command_line_execution_mode: "docker"

That's it! The container will be created automatically when orchestration starts.

How It Works
~~~~~~~~~~~~

**Container Lifecycle:**

1. **Orchestration Start** → Creates persistent container ``massgen-{agent_id}``
2. **Agent Turns** → Commands execute via ``docker exec``
3. **Orchestration End** → Container stopped and removed

**Key Features:**

* **Persistent Containers:** One container per agent for entire orchestration
* **State Persistence:** Packages and files persist across turns
* **Path Transparency:** Paths mounted at same locations as host
* **MCP Server on Host:** Server runs on host, creates Docker client to execute commands

**Volume Mounts:**

* **Workspace:** Read-write access to agent's workspace
* **Context Paths:** Read-only or read-write based on configuration
* **Temp Workspace:** Read-only access to other agents' outputs
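
To make the lifecycle concrete, the sketch below builds the command line that is equivalent to running one agent command inside its persistent container. MassGen itself goes through the Docker Python SDK, so this CLI form is only an illustration; ``docker_exec_argv`` is a hypothetical helper, and running the command through ``/bin/sh -c`` is an assumption:

.. code-block:: python

   def container_name(agent_id):
       """Container naming scheme from the lifecycle description."""
       return f"massgen-{agent_id}"


   def docker_exec_argv(agent_id, command, workdir=None):
       """Build a `docker exec` argument list for one agent command."""
       argv = ["docker", "exec"]
       if workdir:
           # --workdir sets the working directory inside the container.
           argv += ["--workdir", workdir]
       # Run through a shell so pipes and && in the command string still work.
       argv += [container_name(agent_id), "/bin/sh", "-c", command]
       return argv

Because the container persists for the whole orchestration, two successive calls for the same agent see the same installed packages and files.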

Security Features
-----------------

Multi-Layer Security
~~~~~~~~~~~~~~~~~~~~

MassGen implements multiple security layers for code execution:

1. **AG2-Inspired Command Sanitization**

   Blocks dangerous patterns:

   * ``rm -rf /``
   * ``sudo`` commands
   * ``chmod 777``
   * And more...

2. **Command Filtering**

   Whitelist/blacklist regex patterns:

   .. code-block:: yaml

      command_line_whitelist_patterns: ["pip install.*", "python .*"]
      command_line_blacklist_patterns: ["rm -rf.*", "sudo.*"]

3. **Docker Container Isolation** (Docker mode only)

   * Filesystem isolation (only mounted volumes accessible)
   * Network isolation (default: no network)
   * Resource limits (memory, CPU)
   * Process isolation (non-root user)

4. **PathPermissionManager Hooks**

   Validates file operations against context path permissions

5. **Timeout Enforcement**

   Commands time out after a configured duration
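
The command filtering layer can be approximated in a few lines. The precedence and matching rules below are assumptions made for the sketch (the blacklist is checked first and matches anywhere in the command; the whitelist, when present, must match from the start of the command); ``is_command_allowed`` is a hypothetical name, and MassGen's actual evaluation may differ:

.. code-block:: python

   import re


   def is_command_allowed(command, whitelist=None, blacklist=None):
       """Apply blacklist patterns first, then require a whitelist match if one is set."""
       if blacklist and any(re.search(pattern, command) for pattern in blacklist):
           return False
       if whitelist:
           return any(re.match(pattern, command) for pattern in whitelist)
       return True


   whitelist = ["pip install.*", "python .*"]
   blacklist = ["rm -rf.*", "sudo.*"]

   is_command_allowed("python script.py", whitelist, blacklist)         # allowed
   is_command_allowed("sudo pip install requests", whitelist, blacklist)  # blocked by blacklist
   is_command_allowed("ls -la", whitelist, blacklist)                   # blocked: not whitelisted

Pattern filters like this are easy to circumvent (for example through shell indirection), which is why the comparison below rates local mode lower than container isolation.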

Local vs Docker Comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 25 35 40

   * - Aspect
     - Local Mode
     - Docker Mode
   * - **Setup**
     - None required
     - Docker + image build
   * - **Performance**
     - Fast (direct execution)
     - Slight overhead (~100-200ms)
   * - **Isolation**
     - Pattern-based (circumventable)
     - Container-based (strong)
   * - **Network**
     - Full host network
     - Configurable (none/bridge/host)
   * - **Resource Limits**
     - OS-level only
     - Docker-enforced
   * - **Security**
     - Medium
     - High
   * - **Best For**
     - Development, trusted code
     - Production, untrusted code

Usage Examples
--------------

Example 1: Python Development
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   agent:
     backend:
       type: "claude"
       model: "claude-sonnet-4"
       cwd: "workspace"
       enable_mcp_command_line: true
       command_line_execution_mode: "docker"

.. code-block:: bash

   massgen "Write and test a sorting algorithm"

**What happens:**

1. Agent writes ``sort.py``
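2. Agent runs ``pip install pytest`` (already present in the base image; installing anything new requires ``command_line_docker_network_mode: "bridge"``)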
3. Agent writes tests in ``test_sort.py``
4. Agent runs ``pytest``
5. All isolated in Docker container!

Example 2: With Resource Constraints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   agent:
     backend:
       cwd: "workspace"
       enable_mcp_command_line: true
       command_line_execution_mode: "docker"
       command_line_docker_memory_limit: "1g"
       command_line_docker_cpu_limit: 1.0
       command_line_docker_network_mode: "none"

Good for untrusted or resource-intensive tasks.

Example 3: With Network Access
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   agent:
     backend:
       cwd: "workspace"
       enable_mcp_command_line: true
       command_line_execution_mode: "docker"
       command_line_docker_network_mode: "bridge"

.. code-block:: bash

   massgen "Fetch data from an API and analyze it"

The agent can make HTTP requests from inside the container.

Example 4: Multi-Agent with Different Modes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "developer"
       backend:
         type: "openai"
         model: "gpt-5-mini"
         cwd: "workspace1"
         enable_mcp_command_line: true
         command_line_execution_mode: "local"  # Fast for development

     - id: "tester"
       backend:
         type: "claude"
         model: "claude-sonnet-4"
         cwd: "workspace2"
         enable_mcp_command_line: true
         command_line_execution_mode: "docker"  # Isolated for testing

Docker Image Details
--------------------

Base Image: massgen/mcp-runtime:latest
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Contents:**

* Base: Python 3.11-slim
* System packages: git, curl, build-essential, Node.js 20.x
* Python packages: pytest, requests, numpy, pandas
* User: non-root (massgen, UID 1000)
* Working directory: /workspace

**Size:** ~400-500MB (compressed)

Custom Images
~~~~~~~~~~~~~

Extend the base image with additional packages:

.. code-block:: dockerfile

   FROM massgen/mcp-runtime:latest

   # Install additional system packages
   USER root
   RUN apt-get update && apt-get install -y --no-install-recommends \
       postgresql-client \
       && rm -rf /var/lib/apt/lists/*

   # Install additional Python packages
   USER massgen
   RUN pip install --no-cache-dir sqlalchemy psycopg2-binary

   WORKDIR /workspace

Build and use:

.. code-block:: bash

   docker build -t my-custom-runtime:latest -f Dockerfile.custom .

.. code-block:: yaml

   command_line_docker_image: "my-custom-runtime:latest"

Sudo Variant (Runtime Package Installation)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The sudo variant allows agents to install system packages at runtime inside their Docker container.

**IMPORTANT: Build the image before first use:**

.. code-block:: bash

   bash massgen/docker/build.sh --sudo

This builds ``massgen/mcp-runtime-sudo:latest`` with sudo access locally. (This image is not available on Docker Hub - you must build it yourself.)

**Enable in config:**

.. code-block:: yaml

   agent:
     backend:
       cwd: "workspace"
       enable_mcp_command_line: true
       command_line_execution_mode: "docker"
       command_line_docker_enable_sudo: true  # Automatically uses sudo image

**What agents can do with sudo:**

.. code-block:: bash

   # Install system packages at runtime
   sudo apt-get update && sudo apt-get install -y ffmpeg

   # Install additional Python packages
   sudo pip install tensorflow

**Is this safe?**

**YES**, because Docker container isolation is the primary security boundary:

**Container is fully isolated from your host:**

- Sudo inside container ≠ sudo on your computer
- Agent can only access mounted volumes (workspace, context paths)
- Cannot access your host filesystem outside mounts
- Cannot affect host processes or system configuration
- Docker namespaces/cgroups provide strong isolation

**What sudo can and cannot do:**

- ✅ Can: Install packages inside the container (apt, pip, npm)
- ✅ Can: Modify container system configuration
- ✅ Can: Read/write mounted workspace (same as without sudo)
- ❌ Cannot: Access your host filesystem outside mounts
- ❌ Cannot: Affect your host system
- ❌ Cannot: Break out of the container (unless Docker vulnerability exists)

**Theoretical risks (extremely rare):**

- Container escape vulnerabilities (CVEs in Docker/kernel) are very rare and quickly patched
- Sudo increases attack surface slightly if escape exists
- Still requires exploit code, not just malicious intent

**When to use sudo variant vs custom images:**

.. list-table::
   :header-rows: 1
   :widths: 20 30 25 25

   * - Approach
     - Use When
     - Performance
     - Security
   * - **Sudo variant**
     - Need flexibility, unknown packages, prototyping
     - Slower (runtime install)
     - Good (container isolated)
   * - **Custom image**
     - Know packages, production use
     - Fast (pre-installed)
     - Best (minimal attack surface)

**Custom image example (recommended for production):**

.. code-block:: dockerfile

   FROM massgen/mcp-runtime:latest
   USER root
   RUN apt-get update && apt-get install -y ffmpeg postgresql-client
   USER massgen

Build: ``docker build -t my-runtime:latest .``

Use: ``command_line_docker_image: "my-runtime:latest"``

**Bottom line:** The sudo variant is safe for most use cases because Docker container isolation is strong. Custom images are preferred for production because they're faster and have a smaller attack surface, but sudo is fine for development and prototyping.

Troubleshooting
---------------

Docker Not Installed
~~~~~~~~~~~~~~~~~~~~

**Symptom:** ``RuntimeError: Docker Python library not available``

**Solution:**

.. code-block:: bash

   pip install "docker>=7.0.0"

Failed to Connect to Docker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Symptom:** ``RuntimeError: Failed to connect to Docker: ...``

**Possible causes:**

1. Docker daemon not running:

   .. code-block:: bash

      docker ps  # Check if Docker is running

2. Permission issues (Linux):

   .. code-block:: bash

      sudo usermod -aG docker $USER
      # Log out and back in

3. Custom Docker socket:

   .. code-block:: bash

      export DOCKER_HOST=unix:///path/to/docker.sock

Image Not Found
~~~~~~~~~~~~~~~

**Symptom:** ``RuntimeError: Failed to pull Docker image ...``

**Solution:**

.. code-block:: bash

   bash massgen/docker/build.sh

Permission Errors in Container
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Symptom:** ``Permission denied`` when writing files

**Solution:** Ensure workspace has correct permissions:

.. code-block:: bash

   chmod -R 755 workspace

Performance Issues
~~~~~~~~~~~~~~~~~~

**Solutions:**

1. Increase resource limits:

   .. code-block:: yaml

      command_line_docker_memory_limit: "4g"
      command_line_docker_cpu_limit: 4.0

2. Use custom image with pre-installed packages

3. Check Docker Desktop resource settings

Debugging
---------

Inspect Running Container
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # List containers
   docker ps | grep massgen

   # View logs in real-time
   docker logs -f massgen-{agent_id}

   # Execute interactive shell
   docker exec -it massgen-{agent_id} /bin/bash

Check Resource Usage
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   docker stats massgen-{agent_id}

Manual Container Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Stop container
   docker stop massgen-{agent_id}

   # Remove container
   docker rm massgen-{agent_id}

   # Clean up all stopped containers
   docker container prune -f

Background Shell Execution
---------------------------

**NEW:** MassGen supports running commands in the background without blocking, enabling parallel execution and long-running processes.

What is Background Execution?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Background execution allows agents to:

* Start long-running processes (training, servers, simulations)
* Run multiple experiments in parallel
* Monitor processes without blocking
* Continue working while tasks execute

**Available Tools:**

When ``enable_mcp_command_line: true`` is set, agents automatically get these tools:

* ``start_background_shell(command, work_dir)`` - Start command in background, returns shell_id
* ``get_background_shell_output(shell_id)`` - Retrieve stdout/stderr from background process
* ``get_background_shell_status(shell_id)`` - Check if running/stopped/failed
* ``kill_background_shell(shell_id)`` - Terminate a background process
* ``list_background_shells()`` - List all active background processes
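
For intuition about what these tools do, here is a local stand-in built on ``subprocess``. It is not MassGen's implementation: the ``demo_*`` functions are hypothetical, they take an argument list rather than a shell string, and the mapping from exit code to the ``running``/``stopped``/``failed`` states is an assumption.

.. code-block:: python

   import subprocess
   import sys
   import uuid

   _shells = {}  # shell_id -> Popen handle


   def demo_start(argv, work_dir=None):
       """Launch a process without waiting for it; return its shell_id."""
       proc = subprocess.Popen(
           argv, cwd=work_dir, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
       )
       shell_id = uuid.uuid4().hex[:8]
       _shells[shell_id] = proc
       return {"shell_id": shell_id}


   def demo_status(shell_id):
       """Report running, stopped (exit code 0), or failed (non-zero exit code)."""
       code = _shells[shell_id].poll()
       if code is None:
           status = "running"
       elif code == 0:
           status = "stopped"
       else:
           status = "failed"
       return {"shell_id": shell_id, "status": status, "exit_code": code}


   def demo_wait_output(shell_id, timeout=30):
       """Block until the process exits and return its combined output."""
       output, _ = _shells[shell_id].communicate(timeout=timeout)
       return output


   job = demo_start([sys.executable, "-c", "print('experiment finished')"])
   print(demo_wait_output(job["shell_id"]).strip())
   print(demo_status(job["shell_id"])["status"])

The important property is that starting a job returns immediately, so several jobs can be launched before any of them is inspected.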

Example: Parallel Experiments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   agent:
     backend:
       type: "openai"
       model: "gpt-5-mini"
       cwd: "workspace"
       enable_mcp_command_line: true
     system_message: |
       You can run multiple experiments in parallel using background shell tools.
       Use start_background_shell() to launch tasks, then monitor with
       list_background_shells() and collect results when complete.

**Agent workflow:**

.. code-block:: python

   import time

   # Start 3 experiments in parallel
   exp1 = start_background_shell("python experiment_a.py")
   exp2 = start_background_shell("python experiment_b.py")
   exp3 = start_background_shell("python experiment_c.py")

   # Monitor until all complete
   while True:
       shells = list_background_shells()
       running = [s for s in shells["shells"] if s["status"] == "running"]
       if not running:
           break
       time.sleep(5)  # Pause between polls instead of spinning

   # Collect results
   result1 = get_background_shell_output(exp1["shell_id"])
   result2 = get_background_shell_output(exp2["shell_id"])
   result3 = get_background_shell_output(exp3["shell_id"])

Example: Server Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Start web server in background
   server = start_background_shell("uvicorn app:main --port 8000")

   # Server runs while agent does other work...

   # Run integration tests
   test_result = execute_command("pytest tests/integration/")

   # Cleanup: stop server
   kill_background_shell(server["shell_id"])

Example: Long-Running Tasks with Monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import time

   # Start training job
   training = start_background_shell("python train.py --epochs 100")

   # Monitor progress periodically
   while True:
       status = get_background_shell_status(training["shell_id"])

       if status["status"] != "running":
           break

       # Check progress from output
       output = get_background_shell_output(training["shell_id"])
       # Look for "Epoch X/100" in output...

       time.sleep(30)  # Wait before the next check instead of polling continuously

   # Training complete
   final_output = get_background_shell_output(training["shell_id"])
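
A monitoring loop like this is usually safer with a pause between polls and an upper bound on the wait. The helper below is a sketch of that pattern; ``wait_until_finished`` is a hypothetical function, and the status callable is passed in so the same loop works with ``get_background_shell_status`` or with a test double:

.. code-block:: python

   import time


   def wait_until_finished(get_status, shell_id, interval=5.0, timeout=3600.0):
       """Poll get_status(shell_id) until it stops reporting 'running'."""
       deadline = time.monotonic() + timeout
       while True:
           status = get_status(shell_id)
           if status["status"] != "running":
               return status
           if time.monotonic() >= deadline:
               raise TimeoutError(f"shell {shell_id} still running after {timeout}s")
           time.sleep(interval)  # Avoid a tight loop between polls

On timeout the caller can decide whether to keep waiting or to call ``kill_background_shell``.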

Key Features
~~~~~~~~~~~~

* **Non-blocking:** Continue work while processes run
* **Parallel execution:** Run multiple tasks simultaneously (default limit: 10 concurrent)
* **Memory-safe:** Ring buffer captures last 10,000 lines (prevents OOM on infinite output)
* **Auto-cleanup:** All background processes killed on MassGen exit
* **Thread-safe:** Safe for concurrent access from multiple agents
* **Same security:** Background shells use same sanitization as foreground ``execute_command``
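
The memory-safety property comes from keeping only the most recent lines of output. A minimal sketch of such a ring buffer, using ``collections.deque`` with ``maxlen``, is shown below; the ``OutputRingBuffer`` class is hypothetical and only illustrates the idea, not MassGen's internal data structure:

.. code-block:: python

   from collections import deque


   class OutputRingBuffer:
       """Keep only the most recent max_lines lines of process output."""

       def __init__(self, max_lines=10_000):
           self._lines = deque(maxlen=max_lines)
           self.dropped = 0  # Number of old lines discarded so far

       def append(self, line):
           if len(self._lines) == self._lines.maxlen:
               self.dropped += 1  # The oldest line is about to be evicted
           self._lines.append(line)

       def text(self):
           return "\n".join(self._lines)


   buf = OutputRingBuffer(max_lines=3)
   for i in range(5):
       buf.append(f"line {i}")
   print(buf.text())   # keeps line 2, line 3, line 4
   print(buf.dropped)  # 2

Memory use stays bounded by ``max_lines`` no matter how much a process prints, at the cost of losing the earliest output.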

Demo Configuration
~~~~~~~~~~~~~~~~~~

See ``massgen/configs/tools/code-execution/background_shell_demo.yaml`` for a complete example showing parallel vs sequential execution strategies.

Best Practices
--------------

1. **Use Docker mode for untrusted or production workloads**
2. **Set resource limits** to prevent abuse
3. **Use** ``command_line_docker_network_mode: "none"`` unless network is required
4. **Build custom images** for frequently used packages (faster)
5. **Monitor container logs** for debugging
6. **Test in local mode first** for faster iteration
7. **Use command filtering** to restrict dangerous operations
8. **Use background shells for parallel tasks** - Run multiple experiments concurrently
9. **Monitor background processes** - Use ``get_background_shell_status()`` to check progress
10. **Cleanup background shells** - Kill when done or let auto-cleanup handle it

Configuration Examples
----------------------

See ``massgen/configs/tools/code-execution/`` for example configurations:

* ``basic_command_execution.yaml`` - Minimal code execution setup
* ``code_execution_use_case_simple.yaml`` - Simple use case example
* ``command_filtering_whitelist.yaml`` - Whitelist filtering example
* ``command_filtering_blacklist.yaml`` - Blacklist filtering example
* ``docker_simple.yaml`` - Minimal Docker setup
* ``docker_with_resource_limits.yaml`` - Memory/CPU limits with network
* ``docker_multi_agent.yaml`` - Multi-agent with Docker isolation
* ``docker_verification.yaml`` - Verify Docker isolation works
* ``background_shell_demo.yaml`` - **NEW:** Parallel execution with background shells

Next Steps
----------

* :doc:`../files/file_operations` - File system operations and workspace management
* :doc:`mcp_integration` - Additional MCP tools beyond code execution
* :doc:`../../reference/supported_models` - Backend capabilities including code execution
* :doc:`../../quickstart/running-massgen` - More usage examples

References
----------

* `Docker Documentation <https://docs.docker.com/>`_
* `Docker Python SDK <https://docker-py.readthedocs.io/>`_
* Design Document: ``docs/dev_notes/CODE_EXECUTION_DESIGN.md``
* **NEW:** Background Execution Design: ``docs/dev_notes/background_shell_execution_design.md``
* Docker README: ``massgen/docker/README.md``
* Build Script: ``massgen/docker/build.sh``


---

## user_guide/tools/custom_tools.rst

Custom Tools
============

MassGen allows you to give agents access to your own custom Python functions as tools. This enables agents to use your domain-specific functionality, business logic, or specialized algorithms alongside built-in tools and MCP servers.

.. note::

   **Quick Setup Summary:**

   1. Write a Python function that returns ``ExecutionResult``
   2. Reference it in your YAML config under ``custom_tools``
   3. Run MassGen - agents can now use your function
   4. For long-running calls, use the background lifecycle in :doc:`background_tools`

Quick Start: Try It Now
-----------------------

MassGen includes working examples you can try immediately:

.. code-block:: bash

   # Single agent with custom tool
   massgen \
     --config massgen/configs/tools/custom_tools/gemini_custom_tool_example.yaml \
     "What's the sum of 123 and 456?"

   # Custom tool + MCP weather integration
   massgen \
     --config massgen/configs/tools/custom_tools/gemini_custom_tool_with_mcp_example.yaml \
     "What's the sum of 123 and 456? And what's the weather in Tokyo?"

The agent will use the custom ``two_num_tool`` to calculate and respond with "The sum of 123 and 456 is 579".

How The Example Works
~~~~~~~~~~~~~~~~~~~~~~

**The Tool** (``massgen/tool/_basic/_two_num_tool.py``):

.. code-block:: python

   from massgen.tool._result import ExecutionResult, TextContent

   async def two_num_tool(x: int, y: int) -> ExecutionResult:
       """Add two numbers together.

       Args:
           x: First number
           y: Second number

       Returns:
           Sum of the two numbers
       """
       result = x + y
       return ExecutionResult(
           output_blocks=[
               TextContent(data=f"The sum of {x} and {y} is {result}"),
           ],
       )

**The Config** (``gemini_custom_tool_example.yaml``):

.. code-block:: yaml

   agents:
     - id: "gemini2.5flash_custom_tool"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         custom_tools:
           - name: ["two_num_tool"]
             category: "math"
             path: "massgen/tool/_basic/_two_num_tool.py"
             function: ["two_num_tool"]
       system_message: |
         You are an AI assistant with access to a custom math calculation tool.
         When users ask about adding two numbers together, use the two_num_tool.

   ui:
     display_type: "rich_terminal"

That's the complete pattern! Now let's see how to create your own tools.

How It Works
------------

Custom tools in MassGen follow a simple pattern:

1. **Function Signature**: Write an async function with type hints
2. **Docstring**: Add a Google-style docstring (used for tool description)
3. **Return Type**: Return ``ExecutionResult`` with your output
4. **YAML Config**: Reference the function in your agent's ``custom_tools``

MassGen automatically:

* Generates JSON schema from your function signature
* Makes the tool available to agents
* Handles execution and result streaming
* Works across most backends (Claude, Gemini, OpenAI, etc.); see the Backend Support section for the exceptions
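
To make the schema-generation step more concrete, here is a rough sketch of how a JSON schema can be derived from a function signature with the standard ``inspect`` module. It is an assumption-laden illustration rather than MassGen's actual generator: ``sketch_schema`` and ``JSON_TYPES`` are hypothetical names, and the output layout is simplified.

.. code-block:: python

   import inspect
   from typing import Any, Callable, Dict, get_type_hints

   # Illustrative mapping only; a real generator would cover more types
   JSON_TYPES = {
       str: "string",
       int: "integer",
       float: "number",
       bool: "boolean",
       list: "array",
       dict: "object",
   }

   def sketch_schema(func: Callable[..., Any]) -> Dict[str, Any]:
       """Build a simplified tool schema from type hints and the docstring."""
       hints = get_type_hints(func)
       properties = {}
       required = []
       for name, param in inspect.signature(func).parameters.items():
           # Fall back to "string" for unannotated or unmapped types
           properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
           if param.default is inspect.Parameter.empty:
               required.append(name)
       doc = inspect.getdoc(func) or ""
       return {
           "name": func.__name__,
           "description": doc.splitlines()[0] if doc else "",
           "parameters": {
               "type": "object",
               "properties": properties,
               "required": required,
           },
       }

   async def calculator(operation: str, x: float, y: float):
       """Perform basic math operations."""

   schema = sketch_schema(calculator)
   print(schema["parameters"]["properties"])

The sketch shows why type hints and docstrings matter: without them there is nothing to turn into parameter types or a description.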

Creating Your Own Custom Tools
-------------------------------

To create your own custom tool, follow the same pattern as ``two_num_tool``.

Step-by-Step: Create a Custom Tool
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**1. Create your tool file** (e.g., ``my_tools/calculator.py``):

.. code-block:: python

   from massgen.tool import ExecutionResult, TextContent

   async def calculator(operation: str, x: float, y: float) -> ExecutionResult:
       """Perform basic math operations.

       Args:
           operation: The operation (add, subtract, multiply, divide)
           x: First number
           y: Second number

       Returns:
           ExecutionResult with calculation result
       """
       operations = {
           "add": x + y,
           "subtract": x - y,
           "multiply": x * y,
           "divide": x / y if y != 0 else None,
       }

       if operation in operations and operations[operation] is not None:
           result = operations[operation]
           return ExecutionResult(
               output_blocks=[TextContent(data=f"{operation}({x}, {y}) = {result}")]
           )
       else:
           return ExecutionResult(
               output_blocks=[TextContent(data="Error: Invalid operation or division by zero")]
           )

**2. Create a config file** (e.g., ``my_calculator_config.yaml``):

.. code-block:: yaml

   agents:
     - id: "calculator_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         custom_tools:
           - name: ["calculator"]
             category: "math"
             path: "my_tools/calculator.py"
             function: ["calculator"]
       system_message: |
         You are an AI assistant with access to a calculator tool.
         Use it when users ask for math operations.

   ui:
     display_type: "simple"

**3. Run it:**

.. code-block:: bash

   massgen --config my_calculator_config.yaml "What's 15 times 27?"

Basic Tool Structure
~~~~~~~~~~~~~~~~~~~~

Every custom tool follows this pattern:

.. code-block:: python

   from massgen.tool import ExecutionResult, TextContent

   async def my_tool_name(param1: str, param2: int) -> ExecutionResult:
       """Brief description of what this tool does.

       Args:
           param1: Description of first parameter
           param2: Description of second parameter

       Returns:
           ExecutionResult with the tool output
       """
       # Your logic here
       output = f"Processed {param1} with {param2}"

       return ExecutionResult(
           output_blocks=[TextContent(data=output)]
       )

**Key Requirements:**

* Use ``async def`` (even if your function doesn't use await)
* Include type hints for all parameters
* Write a Google-style docstring with Args and Returns sections
* Return ``ExecutionResult`` with at least one content block

Understanding ExecutionResult
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``ExecutionResult`` is the container for all tool outputs. It tells MassGen what to return to the agent.

**Basic Usage:**

.. code-block:: python

   from massgen.tool import ExecutionResult, TextContent

   return ExecutionResult(
       output_blocks=[TextContent(data="Your output here")]
   )

**Available Content Types:**

1. **TextContent** - Plain text output (most common)

   .. code-block:: python

      TextContent(data="The result is 42")

2. **ImageContent** - Base64-encoded image data

   .. code-block:: python

      ImageContent(data="base64_encoded_image_string")

3. **AudioContent** - Base64-encoded audio data

   .. code-block:: python

      AudioContent(data="base64_encoded_audio_string")

**ExecutionResult Parameters:**

.. code-block:: python

   ExecutionResult(
       output_blocks=[...],        # Required: List of content blocks
       meta_info={"key": "value"}, # Optional: Metadata (not shown to agent)
       is_streaming=False,         # Optional: Is this a streaming result?
       is_final=True,              # Optional: Is this the final result?
       was_interrupted=False       # Optional: Was execution interrupted?
   )

Multimodal Results
~~~~~~~~~~~~~~~~~~

Tools can return multiple content types:

.. code-block:: python

   from massgen.tool import ExecutionResult, TextContent, ImageContent

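   async def generate_chart(data: list) -> ExecutionResult:
       """Generate a chart from data.

       Args:
           data: Values to plot

       Returns:
           ExecutionResult with a status message and the chart image
       """
       # create_chart_image is a placeholder for your own code; it should
       # return the rendered chart as a base64-encoded string
       chart_base64 = create_chart_image(data)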

       return ExecutionResult(
           output_blocks=[
               TextContent(data="Chart generated successfully"),
               ImageContent(data=chart_base64)
           ],
           meta_info={"chart_type": "bar", "data_points": len(data)}
       )

Streaming Results
~~~~~~~~~~~~~~~~~

For long-running operations, stream progress updates:

.. code-block:: python

   import asyncio
   from typing import AsyncGenerator

   from massgen.tool import ExecutionResult, TextContent

   async def process_large_dataset(file_path: str) -> AsyncGenerator[ExecutionResult, None]:
       """Process a large dataset with progress updates."""

       # Initial status
       yield ExecutionResult(
           output_blocks=[TextContent(data="Starting processing...")],
           is_streaming=True,
           is_final=False
       )

       # Process in chunks
       for i in range(10):
           await asyncio.sleep(1)  # Simulate work
           yield ExecutionResult(
               output_blocks=[TextContent(data=f"Progress: {(i+1)*10}%")],
               is_streaming=True,
               is_final=False
           )

       # Final result
       yield ExecutionResult(
           output_blocks=[TextContent(data="Processing complete!")],
           is_streaming=True,
           is_final=True
       )
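
MassGen consumes streamed results for you, but when developing a streaming tool it can help to drain the generator yourself. The sketch below is one way to do that, under stated assumptions: ``count_to`` and ``collect`` are hypothetical names, and the small stand-in classes are used only when ``massgen`` is not importable.

.. code-block:: python

   import asyncio
   from dataclasses import dataclass
   from typing import AsyncGenerator, List

   try:
       from massgen.tool import ExecutionResult, TextContent
   except ImportError:
       # Stand-ins so the sketch runs without MassGen installed
       @dataclass
       class TextContent:
           data: str

       @dataclass
       class ExecutionResult:
           output_blocks: List[TextContent]
           is_streaming: bool = False
           is_final: bool = True

   async def count_to(n: int) -> AsyncGenerator[ExecutionResult, None]:
       """Yield one progress update per step, then a final result."""
       for i in range(1, n + 1):
           yield ExecutionResult(
               output_blocks=[TextContent(data=f"Step {i}/{n}")],
               is_streaming=True,
               is_final=False,
           )
       yield ExecutionResult(
           output_blocks=[TextContent(data="Done")],
           is_streaming=True,
           is_final=True,
       )

   async def collect(stream) -> List[str]:
       """Drain a streaming tool and return the text of every block."""
       messages = []
       async for result in stream:
           messages.extend(block.data for block in result.output_blocks)
       return messages

   messages = asyncio.run(collect(count_to(3)))
   print(messages)  # ['Step 1/3', 'Step 2/3', 'Step 3/3', 'Done']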

YAML Configuration
------------------

Basic Configuration
~~~~~~~~~~~~~~~~~~~

Reference your tool in the agent's backend config:

.. code-block:: yaml

   agents:
     - id: "agent_id"
       backend:
         type: "claude"
         model: "claude-sonnet-4"
         custom_tools:
           # Reference external file
           - name: "my_function"
             path: "path/to/my_tools.py"
             function: "my_function"
             category: "utilities"

           # Use built-in tool (no path needed)
           - name: "run_python_script"
             function: "run_python_script"

Configuration Options
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   custom_tools:
     - name: "tool_name"              # Unique identifier
       path: "path/to/file.py"        # Path to Python file (optional for built-ins)
       function: "function_name"      # Function name in the file
       category: "category_name"      # Group related tools (optional)
       description: "Tool description"  # Override auto-generated description (optional)

**Multiple Tools Example:**

.. code-block:: yaml

   custom_tools:
     - name: "calculator"
       path: "tools/math.py"
       function: "calculator"
       category: "math"

     - name: "text_analyzer"
       path: "tools/text.py"
       function: "analyze_text"
       category: "text_processing"

     # Use built-in tool
     - name: "run_python_script"
       function: "run_python_script"

Built-in Tool Functions
------------------------

.. important::

   **When to use the standard approach instead:**

   * **File Operations**: Use Claude Code's native tools or :doc:`../files/file_operations` with MCP filesystem servers
   * **Code Execution**: Use backend built-in code execution or :doc:`code_execution` with MCP

   **These built-in functions are primarily for:**

   * Building blocks when creating your own custom tools (import and use them in your code)
   * Backends that don't have native file/code execution support

Available Functions
~~~~~~~~~~~~~~~~~~~

MassGen provides the following built-in functions. You can import them as building blocks in your own custom tools, and they also serve as examples of what custom tools can do:

**Code Execution:**

* ``run_python_script`` - Execute Python code in isolated subprocess
* ``run_shell_script`` - Execute shell commands

**File Operations:**

* ``read_file_content`` - Read files with optional line range
* ``save_file_content`` - Write content to files
* ``append_file_content`` - Append or insert content into files

See :doc:`../../api/tools` for complete API documentation of these functions.

Example Configurations
----------------------

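MassGen includes 58 working config examples in ``massgen/configs/tools/custom_tools/``. The basic examples (Examples 1-3 below) use the ``two_num_tool`` shown above; the later examples use the multimodal, web scraping, computer use, and terminal evaluation tools.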

Example 1: Claude Code with Custom Tool
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   massgen \
     --config massgen/configs/tools/custom_tools/claude_code_custom_tool_example.yaml \
     "What's the sum of 15 and 27?"

**Config:** ``claude_code_custom_tool_example.yaml``

.. code-block:: yaml

   orchestrator:
     snapshot_storage: "claude_code_snapshots"
     agent_temporary_workspace: "claude_code_temp"

   agents:
     - id: "claude_code_custom_tools"
       backend:
         type: "claude_code"
         model: "claude-sonnet-4-20250514"
         cwd: "claude_code_workspace"
         custom_tools:
           - name: ["two_num_tool"]
             category: "math"
             path: "massgen/tool/_basic/_two_num_tool.py"
             function: ["two_num_tool"]
             description: ["Add two numbers together"]
       append_system_prompt: |
         You are an AI assistant with access to custom calculation tools
         in addition to your built-in Claude Code tools.

   ui:
     display_type: "simple"
     logging_enabled: true

Example 2: Gemini with Custom Tool
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   massgen \
     --config massgen/configs/tools/custom_tools/gemini_custom_tool_example.yaml \
     "What's the sum of 123 and 456?"

**Config:** ``gemini_custom_tool_example.yaml``

.. code-block:: yaml

   agents:
     - id: "gemini2.5flash_custom_tool"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         custom_tools:
           - name: ["two_num_tool"]
             category: "math"
             path: "massgen/tool/_basic/_two_num_tool.py"
             function: ["two_num_tool"]
       system_message: |
         You are an AI assistant with access to a custom math calculation tool.
         When users ask about adding two numbers together, use the two_num_tool.

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

Example 3: Custom Tool + MCP Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   massgen \
     --config massgen/configs/tools/custom_tools/gemini_custom_tool_with_mcp_example.yaml \
     "What's the sum of 123 and 456? And what's the weather in Tokyo?"

**Config:** ``gemini_custom_tool_with_mcp_example.yaml``

.. code-block:: yaml

   agents:
     - id: "gemini2.5flash_custom_tool"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"

         # Custom tools
         custom_tools:
           - name: ["two_num_tool"]
             category: "math"
             path: "massgen/tool/_basic/_two_num_tool.py"
             function: ["two_num_tool"]

         # MCP servers
         mcp_servers:
           - name: "weather"
             type: "stdio"
             command: "npx"
             args: ["-y", "@fak111/weather-mcp"]

       system_message: |
         You are an AI assistant with access to a custom math calculation tool
         and a weather information MCP tool.

   ui:
     display_type: "simple"
     logging_enabled: true

Example 4: Multimodal Understanding Tools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**New in v0.1.3+**: MassGen provides custom tools for analyzing multimodal content (images, audio, video, documents) using OpenAI's gpt-4.1 API.

.. code-block:: bash

   # Analyze an image
   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/understand_image.yaml \
     "Describe the content in this image"

   # Transcribe audio
   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/understand_audio.yaml \
     "What is being said in this audio?"

   # Analyze video
   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/understand_video.yaml \
     "What's happening in this video?"

   # Process documents
   massgen \
     --config massgen/configs/tools/custom_tools/multimodal_tools/understand_file.yaml \
     "Summarize this PDF document"

**Config Example:** ``understand_image.yaml``

.. code-block:: yaml

   agents:
     - id: "understand_image_tool"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         cwd: "workspace1"
         custom_tools:
           - name: ["understand_image"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_image.py"
             function: ["understand_image"]
       system_message: |
         You are an AI assistant with access to image understanding capabilities.
         Use the understand_image tool to analyze and understand images using OpenAI's gpt-4.1 API.

   orchestrator:
     context_paths:
       - path: "massgen/configs/resources/v0.1.3-example/multimodality.jpg"
         permission: "read"

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

**Available Multimodal Tools:**

* ``understand_image`` - Analyze images (PNG, JPEG, JPG)
* ``understand_audio`` - Transcribe and analyze audio files
* ``understand_video`` - Extract key frames and analyze videos
* ``understand_file`` - Process documents (PDF, DOCX, XLSX, PPTX)

**Key Features:**

* Works with any backend - uses OpenAI's gpt-4.1 for analysis
* Processes files from agent workspaces
* Structured JSON responses with detailed metadata
* Path validation for security

See :doc:`../advanced/multimodal` for complete multimodal capabilities documentation.

Example 5: Crawl4AI Web Scraping Tools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**New in v0.1.4**: Docker-based web scraping with multiple output formats via crawl4ai custom tools.

.. code-block:: bash

   # Start crawl4ai Docker container (one-time setup)
   docker pull unclecode/crawl4ai:latest
   docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:latest

   # Use crawl4ai tools
   massgen \
     --config massgen/configs/tools/custom_tools/crawl4ai_example.yaml \
     "Please scrape the MassGen docs, take a screenshot, and explain that screenshot"

**Config Example:** ``crawl4ai_example.yaml``

.. code-block:: yaml

   agents:
     - id: "web_scraper_agent"
       backend:
         type: "openai"
         model: "gpt-5-mini"
         cwd: "workspace1"

         # Register crawl4ai custom tools
         custom_tools:
           - name: ["crawl4ai_md", "crawl4ai_html", "crawl4ai_screenshot", "crawl4ai_pdf", "crawl4ai_execute_js", "crawl4ai_crawl"]
             category: "web_scraping"
             path: "massgen/tool/_web_tools/crawl4ai_tool.py"
             function: ["crawl4ai_md", "crawl4ai_html", "crawl4ai_screenshot", "crawl4ai_pdf", "crawl4ai_execute_js", "crawl4ai_crawl"]

           - name: ["understand_image"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_image.py"
             function: ["understand_image"]

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

**Available Crawl4AI Tools:**

* ``crawl4ai_md`` - Extract clean markdown from web content
* ``crawl4ai_html`` - Get preprocessed HTML
* ``crawl4ai_screenshot`` - Capture webpage screenshots
* ``crawl4ai_pdf`` - Generate PDF documents
* ``crawl4ai_execute_js`` - Run JavaScript on web pages
* ``crawl4ai_crawl`` - Perform multi-URL crawling

**Key Features:**

* Docker-based isolation (no Python dependencies needed)
* Multiple output formats (markdown, HTML, screenshots, PDFs)
* JavaScript execution for dynamic content
* Concurrent crawling (up to 5 simultaneous crawls)
* Automatic Docker health checks with clear error messages

**Requirements:**

* Docker installed and running
* crawl4ai container accessible at ``http://localhost:11235``

If the Docker container isn't running, agents receive a helpful error message with setup instructions.
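
If you want to verify the requirement yourself before starting a run, a plain TCP check is enough to tell whether anything is listening on the expected port. This is a minimal sketch: ``crawl4ai_reachable`` is a hypothetical helper, and it only confirms that the port accepts connections, not that the crawl4ai service behind it is healthy.

.. code-block:: python

   import socket

   def crawl4ai_reachable(
       host: str = "localhost", port: int = 11235, timeout: float = 2.0
   ) -> bool:
       """Return True if a TCP connection to host:port succeeds."""
       try:
           with socket.create_connection((host, port), timeout=timeout):
               return True
       except OSError:
           # Covers refused connections, timeouts, and name resolution errors
           return False

   if not crawl4ai_reachable():
       print("Nothing is listening on localhost:11235; start the crawl4ai container first")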

Example 6: Computer Use Tools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**New in v0.1.8**: MassGen provides browser and desktop automation tools for AI agents.

MassGen offers three computer use tools optimized for different providers:

* ``gemini_computer_use`` - Google Gemini Computer Use (autonomous browser/desktop control)
* ``claude_computer_use`` - Anthropic Claude Computer Use (thorough automation with enhanced actions)
* ``browser_automation`` - Simple browser automation (works with ANY model: gpt-4.1, gpt-4o, etc.)

**Quick Example:**

.. code-block:: bash

   # Simple browser automation (any model)
   massgen \
     --config massgen/configs/tools/custom_tools/simple_browser_automation_example.yaml \
     "Go to Wikipedia and search for Jimmy Carter"

   # Gemini Computer Use
   massgen \
     --config massgen/configs/tools/custom_tools/gemini_computer_use_example.yaml \
     "Go to cnn.com and get the top headline"

   # Claude Computer Use
   massgen \
     --config massgen/configs/tools/custom_tools/claude_computer_use_docker_example.yaml \
     "Navigate to Wikipedia and search for Artificial Intelligence"

.. seealso::

   For complete documentation on computer use tools including:

   * Detailed tool comparisons and performance benchmarks
   * Configuration examples for browser and Docker environments
   * Visualization and monitoring with VNC/non-headless mode
   * Multi-agent computer use coordination
   * Troubleshooting and best practices

   See :doc:`../advanced/computer_use` - Complete Computer Use Tools guide

Example 7: Terminal Evaluation Tools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MassGen can evaluate its own terminal display and frontend UX by recording sessions with VHS and analyzing the recordings with AI vision models. The following tools support this workflow:

* ``run_massgen_with_recording`` - Record MassGen terminal sessions as video (MP4/GIF/WebM)
* ``understand_video`` - Analyze video recordings using GPT-4.1 vision
* ``understand_image`` - Analyze screenshots and frames

**Quick Example:**

.. code-block:: bash

   # Record and evaluate a MassGen session
   massgen \
     --config massgen/configs/tools/custom_tools/terminal_evaluation.yaml \
     "Record and evaluate the terminal display for the todo example config"

**Config Example:** ``terminal_evaluation.yaml``

.. code-block:: yaml

   agents:
     - id: "terminal_evaluator"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         cwd: "workspace1"

         # Terminal evaluation tools
         custom_tools:
           - name: ["run_massgen_with_recording"]
             category: "terminal_recording"
             path: "massgen/tool/_multimodal_tools/run_massgen_with_recording.py"
             function: ["run_massgen_with_recording"]

           - name: ["understand_video"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_video.py"
             function: ["understand_video"]

           - name: ["understand_image"]
             category: "multimodal"
             path: "massgen/tool/_multimodal_tools/understand_image.py"
             function: ["understand_image"]

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

**Available Terminal Evaluation Tools:**

* ``run_massgen_with_recording`` - Records MassGen sessions as MP4/GIF/WebM videos using VHS
* ``understand_video`` - Extracts frames and analyzes videos with GPT-4.1
* ``understand_image`` - Analyzes individual frames or screenshots

**Key Features:**

* VHS integration for high-quality terminal recording
* Video frame extraction (configurable frame count)
* AI-powered UX evaluation using GPT-4.1 vision
* Automatic workspace management for recordings
* Support for multiple output formats (MP4, GIF, WebM)

**Prerequisites:**

* VHS terminal recorder: ``brew install vhs`` (macOS) or ``go install github.com/charmbracelet/vhs@latest``
* OpenAI API key configured in ``.env``

**Workflow:**

1. The agent creates a VHS tape script to record the terminal session
2. The MassGen command runs (without ``--automation``, so the rich display is captured)
3. VHS records the session as video
4. Key frames are extracted from the video
5. The frames are analyzed using the GPT-4.1 vision model
6. The agent returns a detailed UX evaluation with recommendations

**Use Cases:**

* Frontend development - Evaluate UI/UX changes to terminal display
* Quality assurance - Verify status indicators and agent outputs
* Case study creation - Record demos and generate video content
* User testing - Analyze how well terminal communicates progress

.. seealso::

   For complete documentation on terminal evaluation including:

   * Detailed recording workflow and VHS configuration
   * Frame extraction and analysis techniques
   * Evaluation criteria and best practices
   * Integration with case study creation
   * Troubleshooting and monitoring

   See :doc:`../advanced/terminal_evaluation` - Complete Terminal Evaluation guide

Available Example Configs
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``massgen/configs/tools/custom_tools/`` directory contains examples for each supported backend:

**Basic Custom Tools:**

* **Claude API**: ``claude_custom_tool_example.yaml``
* **Claude Code**: ``claude_code_custom_tool_example.yaml``
* **Gemini**: ``gemini_custom_tool_example.yaml``
* **OpenAI (GPT)**: ``gpt5_nano_custom_tool_example.yaml``, ``gpt_oss_custom_tool_example.yaml``
* **Grok**: ``grok3_mini_custom_tool_example.yaml``
* **Qwen**: ``qwen_api_custom_tool_example.yaml``, ``qwen_local_custom_tool_example.yaml``
* **With MCP**: ``*_custom_tool_with_mcp_example.yaml`` variants for each backend

**Multimodal Understanding Tools:**

* ``multimodal_tools/understand_image.yaml`` - Image analysis
* ``multimodal_tools/understand_audio.yaml`` - Audio transcription
* ``multimodal_tools/understand_video.yaml`` - Video analysis
* ``multimodal_tools/understand_file.yaml`` - Document processing

**Web Scraping Tools:**

* ``crawl4ai_example.yaml`` - Docker-based web scraping with multiple output formats

**Computer Use Tools:**

* ``gemini_computer_use_example.yaml`` - Google Gemini computer use automation
* ``claude_computer_use_docker_example.yaml`` - Anthropic Claude computer use automation
* ``simple_browser_automation_example.yaml`` - Simple browser automation for any model

**Terminal Evaluation Tools:**

* ``terminal_evaluation.yaml`` - Record and evaluate MassGen terminal sessions with VHS and GPT-4.1

Backend Support
---------------

Custom tools work with **most** MassGen backends:

**✅ Supported Backends:**

* **OpenAI** (``openai``) - OpenAI's GPT models
* **Claude** (``claude``) - Anthropic's Claude API
* **Claude Code** (``claude_code``) - Claude with native file/code tools
* **Gemini** (``gemini``) - Google's Gemini models
* **Grok** (``grok``) - xAI's Grok models
* **Chat Completions** (``chatcompletion``) - Generic OpenAI-compatible APIs
* **LM Studio** (``lmstudio``) - Local model hosting
* **Inference** (``inference``) - vLLM, SGLang, custom inference servers

**❌ Not Supported:**

* **Azure OpenAI** (``azure_openai``) - Does not implement custom tools interface
* **AG2 Framework** (``ag2``) - Does not implement custom tools interface

**Why Some Backends Don't Support Custom Tools:**

Azure OpenAI and AG2 inherit from the base ``LLMBackend`` class directly without the custom tools layer. These backends focus on their native capabilities rather than custom tool integration.

Troubleshooting
---------------

Tool Not Found
~~~~~~~~~~~~~~

**Error:** ``ToolNotFound: No tool named 'my_tool' exists``

**Solutions:**

* Verify the file path is correct relative to where you run the command
* Check function name matches exactly
* Ensure the function is imported/defined in the file
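* If you call the tool programmatically, use the prefixed name: custom tool names are prefixed with ``custom_tool__`` internally (for example, ``custom_tool__calculator``)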
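
The first three checks can be automated with a short standalone script. This is a sketch under stated assumptions, not a MassGen utility: ``diagnose_tool`` is a hypothetical helper, and it executes the tool file in order to inspect it, so only run it on code you trust.

.. code-block:: python

   import importlib.util
   import inspect
   from pathlib import Path
   from typing import List

   def diagnose_tool(path: str, function: str) -> List[str]:
       """Return a list of problems; an empty list means the basics look fine."""
       file_path = Path(path).resolve()
       if not file_path.is_file():
           return [f"File not found: {file_path}"]

       spec = importlib.util.spec_from_file_location(file_path.stem, file_path)
       if spec is None or spec.loader is None:
           return [f"Cannot load {file_path} as a Python module"]

       module = importlib.util.module_from_spec(spec)
       try:
           # Runs the file's top-level code, including its imports
           spec.loader.exec_module(module)
       except Exception as exc:
           return [f"Import failed: {exc!r}"]

       func = getattr(module, function, None)
       if func is None:
           return [f"No function named {function!r} in {file_path.name}"]
       if not (inspect.iscoroutinefunction(func) or inspect.isasyncgenfunction(func)):
           return [f"{function!r} is not defined with 'async def'"]
       return []

   # Paths resolve relative to the current working directory
   print(diagnose_tool("my_tools/calculator.py", "calculator"))

Run it from the same directory where you launch ``massgen`` so that relative paths resolve the same way.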

Function Import Errors
~~~~~~~~~~~~~~~~~~~~~~~

**Error:** ``ModuleNotFoundError`` or ``ImportError``

**Solutions:**

* Check that ``path`` resolves from the directory where you run ``massgen``, or use an absolute path
* Ensure all imports in your tool file are available
* Check that dependencies are installed

Schema Generation Fails
~~~~~~~~~~~~~~~~~~~~~~~~

**Error:** ``TypeError: cannot create schema for function``

**Solutions:**

* Add type hints to all parameters
* Use ``async def`` even for non-async functions
* Return ``ExecutionResult`` (not plain values)
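
A quick way to find the parameter that breaks schema generation is to list the ones without annotations. The helper below is a hypothetical sketch built on the standard ``inspect`` module; it is not part of MassGen.

.. code-block:: python

   import inspect
   from typing import Callable, List

   def missing_type_hints(func: Callable) -> List[str]:
       """Return the names of parameters that have no type annotation."""
       return [
           name
           for name, param in inspect.signature(func).parameters.items()
           if param.annotation is inspect.Parameter.empty
       ]

   async def bad_tool(query, limit: int = 10):
       """Example tool with one unannotated parameter."""

   print(missing_type_hints(bad_tool))  # ['query']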

Tool Execution Errors
~~~~~~~~~~~~~~~~~~~~~~

Check the error in the agent's output. Common issues:

* Missing required parameters
* Wrong parameter types
* Exceptions in your function code

Add error handling to your tools:

.. code-block:: python

   from massgen.tool import ExecutionResult, TextContent

   async def safe_tool(param: str) -> ExecutionResult:
       """A tool with error handling."""
       try:
           # process() is a placeholder for your own logic
           result = process(param)
           return ExecutionResult(
               output_blocks=[TextContent(data=f"Success: {result}")]
           )
       except Exception as e:
           return ExecutionResult(
               output_blocks=[TextContent(data=f"Error: {str(e)}")]
           )

Best Practices
--------------

1. **Clear Function Names**: Use descriptive names that indicate what the tool does
2. **Type Hints Required**: Always include type hints for parameters and return type
3. **Detailed Docstrings**: Agents use these to understand when to use your tool
4. **Error Handling**: Return errors as ``ExecutionResult`` rather than raising exceptions
5. **Test Independently**: Test your function works before adding to MassGen
6. **Keep Functions Focused**: One tool should do one thing well
7. **Use Categories**: Group related tools together
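
For practice 5, a tool can be exercised on its own with ``asyncio.run`` before it is referenced in any config. The following is a minimal sketch: the fallback classes are stand-ins used only when ``massgen`` is not importable, and they model just the fields this example needs.

.. code-block:: python

   import asyncio
   from dataclasses import dataclass
   from typing import List

   try:
       from massgen.tool import ExecutionResult, TextContent
   except ImportError:
       # Stand-ins so the check runs without MassGen installed
       @dataclass
       class TextContent:
           data: str

       @dataclass
       class ExecutionResult:
           output_blocks: List[TextContent]

   async def two_num_tool(x: int, y: int) -> ExecutionResult:
       """Add two numbers together."""
       return ExecutionResult(
           output_blocks=[TextContent(data=f"The sum of {x} and {y} is {x + y}")]
       )

   # Call the coroutine directly, outside of any agent
   result = asyncio.run(two_num_tool(123, 456))
   assert result.output_blocks[0].data == "The sum of 123 and 456 is 579"
   print(result.output_blocks[0].data)

If the assertion passes here, any remaining failures are more likely to come from the YAML configuration than from the function itself.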

Advanced Usage (Developer API)
-------------------------------

.. note::

   **The sections below are for advanced users and developers** who want to programmatically manage tools or understand internal APIs. Most users don't need this.

For most use cases, the YAML configuration above is sufficient. However, if you're building on top of MassGen or need programmatic control, you can use the ``ToolManager`` API.

ToolManager API
~~~~~~~~~~~~~~~

The ``ToolManager`` class provides programmatic control over tools:

.. code-block:: python

   from massgen.tool import ToolManager

   # Create manager
   manager = ToolManager()

   # Add tool from file
   manager.add_tool_function(
       path="my_tools/calculator.py",
       func="calculator",
       category="math"
   )

   # Get available tools
   schemas = manager.fetch_tool_schemas()

   # Execute a tool (``await`` must run inside an async function;
   # note the custom_tool__ prefix on the tool name)
   result = await manager.execute_tool({
       "name": "custom_tool__calculator",
       "input": {"operation": "add", "x": 5, "y": 3}
   })

Tool Categories
~~~~~~~~~~~~~~~

Programmatically manage tool categories:

.. code-block:: python

   # Create category
   manager.setup_category(
       category_name="data_science",
       description="Data analysis tools",
       enabled=True
   )

   # Enable/disable categories
   manager.modify_categories(["data_science"], enabled=False)

   # Delete categories
   manager.delete_categories("old_category")

.. seealso::

   :doc:`../../api/tools` - Complete ToolManager API reference with all methods, parameters, and examples.

Next Steps
----------

* **Related Guides:**

  * :doc:`mcp_integration` - External tools via MCP
  * :doc:`background_tools` - Non-blocking lifecycle for long-running tool calls
  * :doc:`index` - Tools and capabilities overview
  * :doc:`../backends` - Backend capabilities
  * :doc:`../../reference/yaml_schema` - Complete YAML reference

* **Developer API Documentation:**

  For programmatic tool management and internal APIs:

  * :doc:`../../api/tools` - Complete Tool System API reference (ToolManager, ExecutionResult, exceptions, built-in tools)

* **Examples:**

  * `Config Examples <https://github.com/Leezekun/MassGen/tree/main/massgen/configs/tools/custom_tools>`_ - 58 configuration examples
  * `Test Examples <https://github.com/Leezekun/MassGen/blob/main/massgen/tests/custom_tools_example.py>`_ - Python usage examples


---

## user_guide/tools/index.rst

Tools and Capabilities
======================

MassGen provides a comprehensive tools ecosystem that enables AI agents to perform complex tasks through four complementary systems. Tools extend agent capabilities beyond text generation to include code execution, file operations, web search, external API integration, and custom functionality.

.. note::

   This is an overview of MassGen's tools ecosystem. For detailed guides, see:

   * :doc:`mcp_integration` - External tools via Model Context Protocol
   * :doc:`custom_tools` - Custom Python functions as tools
   * :doc:`background_tools` - Background lifecycle for long-running tool calls
   * :doc:`../advanced/computer_use` - Browser and desktop automation tools

What Are Tools?
---------------

Tools in MassGen are capabilities that agents can invoke during task execution. Unlike traditional function calls in your code, these tools are:

* **Discoverable**: Agents automatically learn about available tools through JSON schemas
* **Backend-Agnostic**: The same tool works across Claude, Gemini, OpenAI, and all other backends
* **Safely Isolated**: Tools execute in controlled environments with timeouts and resource limits
* **Multimodal**: Tools can return text, images, audio, or structured data
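
To make the "discoverable through JSON schemas" idea concrete, the following is a minimal sketch of how a schema could be derived from a function signature and docstring. It is not MassGen's implementation: ``build_tool_schema`` and the ``_JSON_TYPES`` mapping are hypothetical names used only for illustration, and the output shape is an assumption.

.. code-block:: python

   import inspect
   from typing import get_type_hints

   # Hypothetical mapping from Python annotations to JSON Schema type names
   _JSON_TYPES = {
       str: "string",
       int: "integer",
       float: "number",
       bool: "boolean",
       list: "array",
       dict: "object",
   }


   def build_tool_schema(func):
       """Derive a JSON-schema-style description from a function signature."""
       hints = get_type_hints(func)
       properties = {}
       required = []
       for name, param in inspect.signature(func).parameters.items():
           annotation = hints.get(name, str)
           # Unknown annotations fall back to "string" in this sketch
           properties[name] = {"type": _JSON_TYPES.get(annotation, "string")}
           # Parameters without a default value are treated as required
           if param.default is inspect.Parameter.empty:
               required.append(name)
       doc = inspect.getdoc(func) or ""
       return {
           "name": func.__name__,
           "description": doc.splitlines()[0] if doc else "",
           "parameters": {
               "type": "object",
               "properties": properties,
               "required": required,
           },
       }


   def add(x: int, y: int = 0) -> int:
       """Add two integers."""
       return x + y


   schema = build_tool_schema(add)
   print(schema["parameters"]["required"])

The first docstring line becomes the description an agent reads, which is why descriptive names, type hints, and docstrings matter for tool functions.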

Tool Systems Overview
---------------------

MassGen provides four ways for agents to access tools:

1. **Backend Built-in Tools**: Web search, code execution, file operations provided by model APIs
2. **MCP Integration**: External tools through the Model Context Protocol
3. **Custom Tools**: Your own Python functions registered via the Tool System
4. **AG2 Framework Tools**: Tools from the AG2 framework (when using AG2 backend)

1. Backend Built-in Tools
-------------------------

Different model providers offer built-in capabilities that agents can enable via YAML configuration.

**Key Capabilities:**

* **Web Search**: Real-time information from the internet (Gemini, Grok, Claude, OpenAI)
* **Code Execution**: Run Python code and scripts (OpenAI, Claude, Gemini, AG2)
* **File Operations**: Read, write, and modify files (Claude Code natively, others via MCP)

.. important::
   **Code Execution: Two Different Options**

   MassGen supports two distinct code execution approaches:

   1. **Backend Built-in** (``enable_code_execution``/``enable_code_interpreter``): Runs in the provider's sandbox (OpenAI, Claude, Gemini). **Does NOT integrate with your local filesystem** - code runs in an isolated cloud environment.

   2. **MCP-based** (``enable_mcp_command_line``): Runs on your local machine or Docker container. **Full filesystem access** - agents can read/write files in your project.

   **Use backend built-in** for quick calculations and isolated code snippets.
   **Use MCP-based** for code that needs to interact with your project files.

   See :doc:`code_execution` for detailed comparison and configuration.

**Quick Example:**

.. code-block:: yaml

   agents:
     - id: "researcher"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         enable_web_search: true         # Built-in web search
         enable_code_execution: true     # Built-in code execution

**Availability:**

See :doc:`../backends` for the complete backend capabilities matrix showing which backends support which built-in tools.

2. MCP (Model Context Protocol) Integration
--------------------------------------------

The Model Context Protocol (MCP) is an open standard that connects AI agents to external tools and data sources. Think of it as USB-C for AI - a universal interface for tools.

**What You Can Do:**

* Connect to external APIs (Weather, Discord, Twitter, Notion)
* Access databases and file systems
* Use browser automation (Playwright)
* Search the web (Brave Search)
* Integrate with custom services

**Quick Example:**

.. code-block:: yaml

   agents:
     - id: "agent_with_mcp"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         mcp_servers:
           - name: "weather"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-weather"]

**Key Features:**

* Standardized protocol for external tool integration
* Works across all MassGen backends (except Azure OpenAI)
* Support for multiple MCP servers per agent
* Tool filtering and safety controls
* Planning mode to prevent premature execution

.. seealso::
   :doc:`mcp_integration` - Complete guide with MCP server configuration, common servers, tool filtering, planning mode, and security best practices

3. Custom Tools System
-----------------------

MassGen's Custom Tools System allows you to register your own Python functions as tools that agents can discover and use. This enables you to extend agent capabilities with domain-specific functionality.

**What You Can Do:**

* Turn your Python functions into agent tools via YAML config
* Generate tool schemas automatically from function signatures and docstrings
* Use the same tools across all MassGen backends (Claude, Gemini, OpenAI, etc.)
* Extend agents without modifying MassGen internals

**Your Tool File** (``my_tools/analyzer.py``):

.. code-block:: python

   from massgen.tool import ExecutionResult, TextContent
   import json

   async def analyze_data(dataset: str, metrics: list) -> ExecutionResult:
       """Analyze dataset and compute metrics.

       Args:
           dataset: Path to a JSON file containing a list of numbers
           metrics: List of metrics to compute (e.g., ["mean", "median", "count"])

       Returns:
           ExecutionResult with analysis results
       """
       # Load and analyze data
       with open(dataset, 'r') as f:
           data = json.load(f)

       results = {}
       if "count" in metrics:
           results["count"] = len(data)
       if "mean" in metrics and data:
           results["mean"] = sum(data) / len(data)
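       if "median" in metrics and data:
           sorted_data = sorted(data)
           mid = len(sorted_data) // 2
           if len(sorted_data) % 2:
               results["median"] = sorted_data[mid]
           else:
               # Even count: average the two middle values
               results["median"] = (sorted_data[mid - 1] + sorted_data[mid]) / 2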

       output = f"Analysis Results:\n{json.dumps(results, indent=2)}"
       return ExecutionResult(
           output_blocks=[TextContent(data=output)]
       )

**Your Config** (``config.yaml``):

.. code-block:: yaml

   agents:
     - id: "analyst"
       backend:
         type: "claude"
         model: "claude-sonnet-4"
         custom_tools:
           - name: "analyze_data"
             path: "my_tools/analyzer.py"
             function: "analyze_data"
             category: "data_science"

**Run:**

.. code-block:: bash

   massgen --config config.yaml "Analyze sales_data.json"

.. seealso::
   :doc:`custom_tools` - Complete guide with working examples, built-in tools, configuration patterns, and troubleshooting

4. AG2 Framework Tools
-----------------------

When using the AG2 backend, agents gain access to the AG2 framework's execution environments and tools.

**Supported Executors:**

* ``local`` - Execute code on local machine
* ``docker`` - Execute in Docker container
* ``jupyter`` - Execute in Jupyter kernel
* ``yepcode`` - Execute in YepCode environment

**Configuration:**

.. code-block:: yaml

   agents:
     - id: "ag2_coder"
       backend:
         type: "ag2"
         agent_type: "ConversableAgent"
         llm_config:
           config_list:
             - model: "gpt-4"
               api_key: "${OPENAI_API_KEY}"
         code_execution_config:
           executor: "docker"
           work_dir: "coding"

See :doc:`../integration/general_interoperability` for detailed AG2 tool configuration and usage.

Combining Tool Systems
----------------------

The real power comes from combining different tool systems to create agents with comprehensive capabilities.

All Three Systems Together
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "full_stack_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"

         # 1. Built-in backend tools
         enable_web_search: true
         enable_code_execution: true

         # 2. External MCP tools
         mcp_servers:
           - name: "weather"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-weather"]

         # 3. Custom tools
         custom_tools:
           - path: "tools/analyzer.py"
             func: "analyze_data"
           - func: "run_python_script"

**Result**: The agent can search the web, execute code, check the weather, and use your custom analysis functions.

Specialized Multi-Agent Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Different agents with different tool combinations:

.. code-block:: yaml

   agents:
     # Research agent: Web search + MCP
     - id: "researcher"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         enable_web_search: true
         mcp_servers:
           - name: "brave_search"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-brave-search"]

     # Development agent: File operations + Custom tools
     - id: "developer"
       backend:
         type: "claude_code"
         model: "claude-sonnet-4"
         cwd: "workspace"
         custom_tools:
           - func: "run_python_script"
           - func: "run_shell_script"

     # Data agent: Code execution + Custom analytics
     - id: "data_analyst"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         enable_code_interpreter: true
         custom_tools:
           - path: "tools/stats.py"
             func: "calculate_statistics"
           - path: "tools/viz.py"
             func: "create_visualization"

Quick Start Examples
--------------------

Built-in Tools
~~~~~~~~~~~~~~

.. code-block:: bash

   # Web search
   massgen --model gemini-2.5-flash \
     "Research the latest AI developments and summarize key trends"

   # Code execution
   massgen --model gpt-5-nano \
     "Calculate the first 100 prime numbers and plot their distribution"

MCP Tools
~~~~~~~~~

.. code-block:: bash

   # Single MCP server (weather)
   massgen \
     --config @examples/tools/mcp/gpt5_nano_mcp_example.yaml \
     "What's the weather forecast for New York this week?"

   # Multiple MCP servers
   massgen \
     --config @examples/tools/mcp/multimcp_gemini.yaml \
     "Find hotels in London and check the weather forecast"

Custom Tools
~~~~~~~~~~~~

.. code-block:: bash

   # Custom Python tools
   massgen \
     --config massgen/configs/tools/custom_tools/claude_code_custom_tool_example.yaml \
     "Calculate the sum of 15 and 27"

   # Custom tools with MCP
   massgen \
     --config massgen/configs/tools/custom_tools/gemini_custom_tool_with_mcp_example.yaml \
     "Test both custom and MCP tools together"

Choosing the Right Tool System
------------------------------

.. list-table::
   :header-rows: 1
   :widths: 25 35 40

   * - Tool System
     - Best For
     - When to Use
   * - **Built-in Tools**
     - Web search, basic code execution, file ops
     - Quick setup, standard capabilities
   * - **MCP Integration**
     - External APIs, third-party services
     - Weather, databases, Discord, Twitter, browser automation
   * - **Custom Tools**
     - Domain-specific functionality
     - Your own business logic, specialized algorithms, internal APIs
   * - **AG2 Framework**
     - Complex multi-agent workflows
     - Research tasks, code generation with execution

Best Practices
--------------

Tool Configuration
~~~~~~~~~~~~~~~~~~

1. **Enable only needed tools**: Reduce API costs and improve agent focus
2. **Use MCP for external integrations**: Standardized, reusable protocol
3. **Create custom tools for domain logic**: Your unique functionality
4. **Test tools independently**: Verify each tool works before multi-agent use
5. **Document tool requirements**: Note required API keys, dependencies, and permissions

Security
~~~~~~~~

.. warning::

   Tools can execute code, access files, and call external APIs. Always:

   * Review third-party MCP servers before use
   * Use tool filtering (``allowed_tools``/``exclude_tools``) to restrict capabilities
   * Enable planning mode for tools with side effects
   * Store API keys in ``.env`` files, never in configs
   * Test in isolated environments first
   * Set timeouts to prevent long-running operations

See :doc:`../files/project_integration` for secure file access configuration.

Performance
~~~~~~~~~~~

1. **Lazy loading**: Don't register unnecessary tools
2. **Category management**: Disable tool categories when not needed
3. **Tool filtering**: Reduce available tools to improve agent decision-making
4. **Caching**: MCP servers support caching for repeated requests
5. **Timeouts**: Set reasonable timeouts for all tools
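
As an illustration of the timeout advice, here is a minimal sketch that wraps an async tool call with ``asyncio.wait_for`` and returns an error message instead of hanging. It is independent of MassGen's own timeout settings: ``run_with_timeout``, ``slow_tool``, and ``fast_tool`` are hypothetical names, and a real tool would return an ``ExecutionResult`` rather than a plain string.

.. code-block:: python

   import asyncio


   async def run_with_timeout(tool_coro, timeout_seconds: float) -> str:
       """Await a tool coroutine; return an error string if it takes too long."""
       try:
           return await asyncio.wait_for(tool_coro, timeout=timeout_seconds)
       except asyncio.TimeoutError:
           return f"Error: tool timed out after {timeout_seconds} seconds"


   async def slow_tool() -> str:
       # Stand-in for a long-running tool call
       await asyncio.sleep(1.0)
       return "done"


   async def fast_tool() -> str:
       # Stand-in for a tool call that finishes immediately
       return "done"


   fast_result = asyncio.run(run_with_timeout(fast_tool(), 0.5))
   slow_result = asyncio.run(run_with_timeout(slow_tool(), 0.05))
   print(fast_result)
   print(slow_result)

Returning the timeout as a message rather than raising keeps the failure visible to the agent, in line with the guidance to report errors as results.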

Common Issues
-------------

**Backend doesn't support tool:**

.. code-block:: yaml

   # ❌ Grok doesn't support code execution
   backend:
     type: "grok"
     enable_code_interpreter: true

   # ✅ Use OpenAI instead
   backend:
     type: "openai"
     enable_code_interpreter: true

See :doc:`../backends` for complete backend capabilities matrix.

**MCP server not found:**

.. code-block:: bash

   # Test MCP server
   npx -y @modelcontextprotocol/server-weather

   # Install globally for faster startup
   npm install -g @modelcontextprotocol/server-weather

**Custom tool not registered:**

* Verify the file path is correct relative to where you run massgen
* Check the function name matches exactly
* Ensure the function is defined in the file
* See :doc:`custom_tools` for detailed troubleshooting
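
The first three checks can be automated before starting MassGen. The sketch below uses only the standard library; ``check_custom_tool`` is a hypothetical helper, not part of MassGen. Note that it imports the tool file, so any top-level code in that file will run.

.. code-block:: python

   import importlib.util
   import inspect
   from pathlib import Path


   def check_custom_tool(path: str, function_name: str) -> list[str]:
       """Return a list of problems; an empty list means the basic checks pass."""
       file_path = Path(path)
       if not file_path.is_file():
           return [f"File not found: {file_path.resolve()}"]

       # Import the file as a throwaway module so its contents can be inspected
       spec = importlib.util.spec_from_file_location("tool_under_test", file_path)
       module = importlib.util.module_from_spec(spec)
       try:
           spec.loader.exec_module(module)
       except Exception as exc:
           return [f"Import failed: {exc!r}"]

       func = getattr(module, function_name, None)
       if func is None:
           return [f"No attribute named {function_name!r} in {file_path.name}"]
       if not inspect.isfunction(func):
           return [f"{function_name!r} exists but is not a function"]
       return []


   # Run from the same directory you launch massgen from
   problems = check_custom_tool("my_tools/analyzer.py", "analyze_data")
   print(problems or "Basic checks passed")

Running this from the directory where you launch ``massgen`` resolves the relative path the same way your config does.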

Detailed Guides
---------------

For in-depth information on each tool system:

.. grid:: 3
   :gutter: 3

   .. grid-item-card:: 🔌 MCP Integration

      External tools via Model Context Protocol

      * MCP server configuration
      * Common servers (weather, search, Discord)
      * Tool filtering and safety
      * Planning mode
      * Multi-server setups

      :doc:`Read the MCP Integration guide → <mcp_integration>`

   .. grid-item-card:: 🛠️ Custom Tools

      Your own Python functions as tools

      * Write Python functions as tools
      * Register via YAML config
      * Built-in tools (code execution, file operations)
      * Works across all backends
      * 58 working examples

      :doc:`Read the Custom Tools guide → <custom_tools>`

   .. grid-item-card:: 🖥️ Computer Use

      Browser and desktop automation tools

      * Gemini Computer Use (Google)
      * Claude Computer Use (Anthropic)
      * Simple browser automation (any model)
      * Visual feedback and screenshots
      * Multi-agent coordination

      :doc:`Read the Computer Use guide → <../advanced/computer_use>`

Related Documentation
---------------------

* :doc:`../backends` - Complete backend capabilities matrix
* :doc:`../files/file_operations` - File system operations and safety
* :doc:`../files/project_integration` - Secure project access with context paths
* :doc:`../integration/general_interoperability` - Framework interoperability (including AG2)
* :doc:`../../examples/basic_examples` - See tools in action
* :doc:`../../reference/yaml_schema` - Complete YAML configuration reference
* :doc:`background_tools` - Background execution lifecycle for tool calls

External Resources
------------------

* `MCP Server Registry <https://github.com/modelcontextprotocol/servers>`_ - Official MCP servers catalog
* `MCP Documentation <https://modelcontextprotocol.io/>`_ - Protocol specification
* `Custom Tools System README <https://github.com/Leezekun/MassGen/blob/main/massgen/tool/README.md>`_ - Complete technical overview
* `Config Examples <https://github.com/Leezekun/MassGen/tree/main/massgen/configs/tools>`_ - 58+ tool configuration examples

.. toctree::
   :maxdepth: 1
   :hidden:

   mcp_integration
   custom_tools
   background_tools
   code_execution
   code_based_tools
   skills
   skills_lifecycle_and_consolidation


---

## user_guide/tools/mcp_integration.rst

MCP Integration
================

MassGen supports the Model Context Protocol (MCP) for standardized tool integration. MCP enables agents to use external tools through a unified interface.

What is MCP?
------------

From the official documentation:

   MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.

MassGen integrates MCP through YAML configuration, allowing agents to access tools like:

* Web search (Brave, Google)
* Weather services
* File operations
* Browser automation (Playwright)
* Discord, Twitter, Notion APIs
* And many more MCP servers

Quick Start
-----------

**Single MCP tool (weather):**

.. code-block:: bash

   massgen \
     --config @examples/tools/mcp/gpt5_nano_mcp_example.yaml \
     "What's the weather forecast for New York this week?"

**Multiple MCP tools:**

.. code-block:: bash

   massgen \
     --config @examples/tools/mcp/multimcp_gemini.yaml \
     "Find the best restaurants in Paris and save the recommendations to a file"

Backend Support
---------------

MCP integration is available for most MassGen backends. For the complete backend capabilities matrix including MCP support status, see :doc:`../backends`.

**Backends with MCP Support:**

* ✅ Claude API - Full MCP integration
* ✅ Claude Code - Native MCP + file tools
* ✅ Gemini API - Full MCP integration with planning mode
* ✅ Grok API - Full MCP integration
* ✅ OpenAI API - Full MCP integration
* ✅ Z AI - MCP integration available
* ❌ Azure OpenAI - Not yet supported

See :doc:`../backends` for detailed backend capabilities and feature comparison.

Configuration
-------------

Basic MCP Setup
~~~~~~~~~~~~~~~

Add MCP servers to your agent's backend configuration:

.. code-block:: yaml

   agents:
     - id: "agent_with_mcp"
       backend:
         type: "openai"              # Your backend choice
         model: "gpt-5-mini"         # Your model choice

         # Add MCP servers here
         mcp_servers:
           - name: "weather"         # Server name (you choose this)
             type: "stdio"           # Communication type
             command: "npx"          # Command to run
             args: ["-y", "@modelcontextprotocol/server-weather"]

That's it! The agent can now check weather.

MCP Transport Types
~~~~~~~~~~~~~~~~~~~

**stdio (Standard Input/Output)**

Most MCP servers use stdio transport:

.. code-block:: yaml

   mcp_servers:
     - name: "weather"
       type: "stdio"                # stdio transport
       command: "npx"               # Command to launch server
       args: ["-y", "@modelcontextprotocol/server-weather"]

**streamable-http (HTTP/SSE)**

Some MCP servers use HTTP with Server-Sent Events:

.. code-block:: yaml

   mcp_servers:
     - name: "custom_api"
       type: "streamable-http"      # HTTP transport
       url: "http://localhost:8080/mcp/sse"

Configuration Parameters
~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Parameter
     - Required
     - Description
   * - ``name``
     - Yes
     - Unique name for the MCP server
   * - ``type``
     - Yes
     - Transport: ``"stdio"`` or ``"streamable-http"``
   * - ``command``
     - stdio only
     - Command to run the MCP server
   * - ``args``
     - stdio only
     - Arguments for the command
   * - ``url``
     - http only
     - Server endpoint URL
   * - ``env``
     - No
     - Environment variables to pass
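
The rules in the table can be expressed as a small validation function. This is a minimal sketch, not MassGen's own validation: ``validate_mcp_server`` is a hypothetical helper, and it assumes that "stdio only" and "http only" mean the field is required for that transport.

.. code-block:: python

   def validate_mcp_server(entry: dict) -> list[str]:
       """Check one mcp_servers entry against the parameter table."""
       errors = []
       for key in ("name", "type"):
           if not entry.get(key):
               errors.append(f"missing required field: {key}")

       transport = entry.get("type")
       if transport == "stdio":
           # Assumption: stdio servers must say which command to launch
           if not entry.get("command"):
               errors.append("stdio servers need a command")
       elif transport == "streamable-http":
           # Assumption: HTTP servers must provide an endpoint URL
           if not entry.get("url"):
               errors.append("streamable-http servers need a url")
       elif transport is not None:
           errors.append(f"unknown transport type: {transport!r}")
       return errors


   weather = {
       "name": "weather",
       "type": "stdio",
       "command": "npx",
       "args": ["-y", "@modelcontextprotocol/server-weather"],
   }
   print(validate_mcp_server(weather))
   print(validate_mcp_server({"name": "custom_api", "type": "streamable-http"}))

A check like this can run over every entry in a loaded YAML config to catch a missing ``command`` or ``url`` before any server is launched.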

Variable Substitution
~~~~~~~~~~~~~~~~~~~~~

MassGen supports variable substitution in MCP configurations:

**Built-in Variables:**

* ``${cwd}`` - Replaced with the agent's working directory (from ``backend.cwd``)
* Works anywhere in the backend config (``args``, ``env``, etc.)

**Environment Variables:**

* Use ``${VARIABLE_NAME}`` syntax (must be UPPERCASE)
* Resolved from your ``.env`` file or system environment
* Work in both ``args`` and ``env`` parameters

.. code-block:: yaml

   mcp_servers:
     - name: "playwright"
       type: "stdio"
       command: "npx"
       args:
         - "@playwright/mcp@latest"
         - "--output-dir=${cwd}"                # Built-in: agent's working directory
         - "--user-data-dir=${cwd}/profile"
       env:
         API_KEY: "${API_KEY}"                  # Environment variable from .env file

**Important:**

* ``${cwd}`` is lowercase and refers to the agent's working directory
* Environment variables must be UPPERCASE (e.g., ``${API_KEY}``, ``${BRAVE_API_KEY}``)
* Both systems work together but are resolved separately
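
The two kinds of placeholder can be pictured with the following sketch. It only illustrates the rules listed above and is not MassGen's resolver: ``substitute`` is a hypothetical function, and leaving unset variables untouched is an assumption made for the example.

.. code-block:: python

   import os
   import re

   # Matches ${cwd} (lowercase built-in) or ${UPPERCASE_NAME} (environment variable)
   _PATTERN = re.compile(r"\$\{(cwd|[A-Z][A-Z0-9_]*)\}")


   def substitute(value: str, cwd: str, env=None) -> str:
       """Replace ${cwd} and ${UPPERCASE} placeholders in a config string."""
       env = os.environ if env is None else env

       def replace(match: re.Match) -> str:
           name = match.group(1)
           if name == "cwd":
               return cwd
           # Assumption: leave the placeholder as-is when the variable is unset
           return env.get(name, match.group(0))

       return _PATTERN.sub(replace, value)


   print(substitute("--user-data-dir=${cwd}/profile", cwd="/work/agent1", env={}))
   print(substitute("${API_KEY}", cwd="/work/agent1", env={"API_KEY": "abc123"}))

Because the pattern only accepts ``cwd`` or an uppercase name, a lowercase placeholder such as ``${api_key}`` is left unchanged in this sketch, mirroring the uppercase requirement.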

Recommended MCP Servers (Registry)
-----------------------------------

MassGen includes a curated registry of recommended MCP servers that are automatically available when auto-discovery is enabled. These servers have been tested and provide essential capabilities for agent workflows.

**Registry Servers:**

* **Context7** - Up-to-date code documentation for libraries and frameworks
* **Brave Search** - Web search via Brave API (requires API key)
* **Exa Search** - AI-powered web search via Exa API (requires API key)

See :doc:`../../reference/mcp_server_registry` for complete documentation of all registry servers, including configuration examples, API key setup, and usage patterns.

Auto-Discovery
~~~~~~~~~~~~~~

Enable automatic inclusion of registry MCP servers:

.. code-block:: yaml

   agents:
     - id: "research_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         auto_discover_custom_tools: true  # Automatically adds registry servers!

**Behavior:**

* **Context7**: Always included (no API key required)
* **Brave Search**: Only included if ``BRAVE_API_KEY`` is set in ``.env``
* **Exa Search**: Only included if ``EXA_API_KEY`` is set in ``.env``

**Log Output Example:**

.. code-block:: text

   [gemini] Auto-discovery enabled: Added MCP servers from registry: context7
   [gemini] Registry servers not added (missing API keys): brave_search (needs BRAVE_API_KEY), exa_search (needs EXA_API_KEY)

**Benefits:**

* No manual configuration needed for recommended servers
* Servers are only included if API keys are available
* Avoids duplicates if you manually configure a registry server
* Easy to get started with powerful tools
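
The selection behavior described above can be sketched as follows. This is an illustrative model only: ``REGISTRY`` and ``select_registry_servers`` are hypothetical names, and the server names are taken from the log output shown earlier.

.. code-block:: python

   import os

   # Registry entries named in this guide, mapped to the API key each one needs.
   # None means no key is required.
   REGISTRY = {
       "context7": None,
       "brave_search": "BRAVE_API_KEY",
       "exa_search": "EXA_API_KEY",
   }


   def select_registry_servers(configured_names, env=None):
       """Split registry servers into (added, skipped) lists."""
       env = os.environ if env is None else env
       added, skipped = [], []
       for name, key in REGISTRY.items():
           if name in configured_names:
               continue  # already configured manually, so avoid a duplicate
           if key is None or env.get(key):
               added.append(name)
           else:
               skipped.append(f"{name} (needs {key})")
       return added, skipped


   added, skipped = select_registry_servers(configured_names=[], env={})
   print("added:", added)
   print("skipped:", skipped)

With no API keys set and nothing configured manually, only the key-free server is added, which matches the log output example.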

**Example Configurations:**

* ``massgen/configs/tools/mcp/auto_discovery_with_registry.yaml`` - Auto-discovery example
* ``massgen/configs/tools/mcp/context7_documentation_example.yaml`` - Context7 usage
* ``massgen/configs/tools/mcp/brave_search_example.yaml`` - Brave Search usage
* ``massgen/configs/tools/web-search/exa_search_example.yaml`` - Exa Search usage

Manual Configuration
~~~~~~~~~~~~~~~~~~~~

You can still manually configure any registry server without auto-discovery:

.. code-block:: yaml

   agents:
     - id: "my_agent"
       backend:
         type: "claude"
         model: "claude-sonnet-4"
         mcp_servers:
           - name: "context7"
             type: "stdio"
             command: "npx"
             args: ["-y", "@upstash/context7-mcp"]

This gives you full control over which servers to include and their configuration.

Common MCP Servers
------------------

Weather
~~~~~~~

.. code-block:: yaml

   mcp_servers:
     - name: "weather"
       type: "stdio"
       command: "npx"
       args: ["-y", "@modelcontextprotocol/server-weather"]

Web Search (Brave)
~~~~~~~~~~~~~~~~~~

Requires ``BRAVE_API_KEY`` in your ``.env`` file:

.. code-block:: yaml

   mcp_servers:
     - name: "search"
       type: "stdio"
       command: "npx"
       args: ["-y", "@modelcontextprotocol/server-brave-search"]
       env:
         BRAVE_API_KEY: "${BRAVE_API_KEY}"

Playwright (Browser Automation)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Enables browser automation with screenshot and PDF capabilities:

.. code-block:: yaml

   mcp_servers:
     - name: "playwright"
       type: "stdio"
       command: "npx"
       args:
         - "@playwright/mcp@latest"
         - "--browser=chrome"              # Browser choice (chrome, firefox, webkit)
         - "--caps=vision,pdf"             # Enable vision and PDF capabilities
         - "--output-dir=${cwd}"           # Save screenshots/PDFs to workspace
         - "--user-data-dir=${cwd}/playwright-profile"  # Persistent browser profile

**Advanced Options:**

* ``--browser`` - Browser to use: ``chrome``, ``firefox``, or ``webkit``
* ``--caps`` - Capabilities: ``vision`` (screenshots), ``pdf`` (PDF generation)
* ``--output-dir`` - Directory for saving screenshots and PDFs
* ``--user-data-dir`` - Persistent browser profile directory
* ``--save-trace`` - Save Playwright traces for debugging (not enabled in the example above)

Discord
~~~~~~~

Requires Discord bot token. See `Discord MCP Setup Guide <https://github.com/Leezekun/MassGen/blob/main/massgen/configs/docs/DISCORD_MCP_SETUP.md>`_:

.. code-block:: yaml

   mcp_servers:
     - name: "discord"
       type: "stdio"
       command: "npx"
       args: ["-y", "@modelcontextprotocol/server-discord"]
       env:
         DISCORD_BOT_TOKEN: "${DISCORD_BOT_TOKEN}"

Twitter
~~~~~~~

Requires Twitter API credentials. See `Twitter MCP Setup Guide <https://github.com/Leezekun/MassGen/blob/main/massgen/configs/docs/TWITTER_MCP_ENESCINAR_SETUP.md>`_:

.. code-block:: yaml

   mcp_servers:
     - name: "twitter"
       type: "stdio"
       command: "npx"
       args: ["-y", "mcp-server-twitter-unofficial"]
       env:
         TWITTER_USERNAME: "${TWITTER_USERNAME}"
         TWITTER_PASSWORD: "${TWITTER_PASSWORD}"

Multiple MCP Servers
--------------------

Agents can use multiple MCP servers simultaneously:

.. code-block:: yaml

   agents:
     - id: "multi_tool_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         mcp_servers:
           # Web search
           - name: "search"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-brave-search"]
             env:
               BRAVE_API_KEY: "${BRAVE_API_KEY}"

           # Weather data
           - name: "weather"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-weather"]

The agent can use all of these tools in a single task, for example: "Search for weather apps and check the weather in Paris."

.. note::
   **File operations** are handled automatically via the ``cwd`` parameter in your backend configuration. You don't need to add a filesystem MCP server manually.

Tool Filtering
--------------

Control which MCP tools are available to agents.

Backend-Level Filtering
~~~~~~~~~~~~~~~~~~~~~~~

Exclude specific tools at the backend level:

.. code-block:: yaml

   backend:
     type: "openai"
     model: "gpt-4o-mini"
     exclude_tools:
       - mcp__discord__discord_send_webhook_message  # Exclude dangerous tools
     mcp_servers:
       - name: "discord"
         type: "stdio"
         command: "npx"
         args: ["-y", "@modelcontextprotocol/server-discord"]

MCP-Server-Specific Filtering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Override with allowed tools per MCP server:

.. code-block:: yaml

   backend:
     type: "openai"
     model: "gpt-4o-mini"
     mcp_servers:
       - name: "discord"
         type: "stdio"
         command: "npx"
         args: ["-y", "@modelcontextprotocol/server-discord"]
         allowed_tools:  # Whitelist specific tools
           - mcp__discord__discord_read_messages
           - mcp__discord__discord_send_message

Merged Exclusions
~~~~~~~~~~~~~~~~~

``exclude_tools`` from both backend and MCP server configs are combined:

.. code-block:: yaml

   backend:
     exclude_tools:
       - mcp__discord__send_webhook  # Backend-level exclusion
     mcp_servers:
       - name: "discord"
         exclude_tools:
           - mcp__discord__delete_channel  # MCP-level exclusion
         # Both tools are excluded
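
One way to picture how these filters interact is the sketch below. It is a simplified model rather than MassGen's code: ``effective_tools`` is a hypothetical function, and applying the allowlist first and then the merged exclusions is an assumption about ordering.

.. code-block:: python

   def effective_tools(available, backend_exclude=(), server_allowed=None, server_exclude=()):
       """Return the tool names an agent would see after filtering."""
       # Start from the allowlist when one is given, otherwise from every tool
       if server_allowed is not None:
           allowed = set(server_allowed)
           candidates = [tool for tool in available if tool in allowed]
       else:
           candidates = list(available)
       # Exclusions from the backend and the MCP server are merged into one set
       excluded = set(backend_exclude) | set(server_exclude)
       return [tool for tool in candidates if tool not in excluded]


   discord_tools = [
       "mcp__discord__discord_read_messages",
       "mcp__discord__discord_send_message",
       "mcp__discord__send_webhook",
       "mcp__discord__delete_channel",
   ]
   visible = effective_tools(
       discord_tools,
       backend_exclude=["mcp__discord__send_webhook"],
       server_exclude=["mcp__discord__delete_channel"],
   )
   print(visible)

In this model an excluded tool stays hidden even if it also appears in an allowlist, which is the more conservative choice for tools with side effects.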

MCP Planning Mode
-----------------

**NEW in v0.1.2**: Intelligent LLM-based tool filtering automatically detects and blocks irreversible operations during coordination.

Planning mode prevents irreversible actions during multi-agent coordination: an LLM analyzes your question and blocks tools with side effects until the final presentation phase.

How It Works
~~~~~~~~~~~~

**Without planning mode:**

1. All agents execute MCP tools during coordination
2. Risk of duplicate or premature actions
3. Example: Multiple agents posting to Discord

**With planning mode (v0.1.2):**

1. **LLM Analysis**: Question is analyzed to detect irreversible operations
2. **Automatic Blocking**: Tools with side effects are automatically blocked during coordination
3. **Coordination**: Agents plan and discuss with read-only tools available
4. **Execution**: Winning agent executes the plan with full tool access

**Example Analysis Output:**

.. code-block:: text

   ╭─ Coordination Mode ────────────────────────────────────────╮
   │ 🧠 Planning Mode: ENABLED                                  │
   │                                                            │
   │ Agents will plan and coordinate without executing         │
   │ irreversible actions. The winning agent will implement    │
   │ the plan during final presentation.                       │
   │                                                            │
   │ 🚫 Blocked Tools:                                          │
   │   1. mcp__discord__discord_send_message                    │
   │                                                            │
   │ 📊 Analysis:                                               │
   │   Post a summary of recent AI discussions to Discord      │
   ╰────────────────────────────────────────────────────────────╯

The LLM identifies which tools have irreversible side effects (like sending messages) and blocks them during coordination, while keeping read-only tools (like reading messages) available.
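
The effect of the blocked-tools list on each phase can be sketched as follows. The LLM analysis itself is not modeled here; ``tools_for_phase`` and the phase names are hypothetical and only illustrate the behavior described above.

.. code-block:: python

   def tools_for_phase(all_tools, blocked_tools, phase):
       """Return the tools available in 'coordination' or 'final' phase."""
       if phase == "coordination":
           # Tools flagged as irreversible are withheld while agents plan
           blocked = set(blocked_tools)
           return [tool for tool in all_tools if tool not in blocked]
       if phase == "final":
           # The winning agent executes the plan with full tool access
           return list(all_tools)
       raise ValueError(f"unknown phase: {phase!r}")


   all_tools = [
       "mcp__discord__discord_read_messages",
       "mcp__discord__discord_send_message",
   ]
   blocked = ["mcp__discord__discord_send_message"]

   print(tools_for_phase(all_tools, blocked, "coordination"))
   print(tools_for_phase(all_tools, blocked, "final"))

During coordination only the read tool remains, so agents can gather context without posting anything; the send tool becomes available again in the final phase.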

Configuration
~~~~~~~~~~~~~

Enable planning mode in orchestrator config - the LLM analysis happens automatically:

.. code-block:: yaml

   orchestrator:
     coordination:
       enable_planning_mode: true
       planning_mode_instruction: |
         PLANNING MODE ACTIVE: You are currently in the coordination phase.
         During this phase:
         1. Describe your intended actions and reasoning
         2. Analyze other agents' proposals
         3. Use only 'vote' or 'new_answer' tools for coordination
         4. Read-only tools are available, but write operations are blocked
         5. Save execution for final presentation phase

When ``enable_planning_mode: true`` is set:

1. **Automatic Analysis**: An LLM analyzes your question before coordination starts
2. **Smart Blocking**: Only tools with irreversible side effects are blocked
3. **Read-Only Access**: Agents can still use read tools (e.g., ``discord_get_messages``)
4. **Visual Feedback**: A UI box shows what's blocked and why

No manual tool filtering is needed: the system determines what to block based on your specific question.
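
The blocking behavior described above can be pictured as a simple partition of the tool list. The following is a minimal sketch, not MassGen's implementation: ``split_tools`` is a hypothetical helper, and the name ``mcp__discord__discord_get_messages`` assumes the read tool follows the same ``mcp__<server>__<tool>`` naming as the blocked tool shown in the analysis box.

.. code-block:: python

   def split_tools(available, blocked):
       """Return (usable, withheld) tool names for the coordination phase."""
       blocked_set = set(blocked)
       usable = [t for t in available if t not in blocked_set]
       withheld = [t for t in available if t in blocked_set]
       return usable, withheld

   available = [
       "mcp__discord__discord_get_messages",  # read-only (assumed name)
       "mcp__discord__discord_send_message",  # irreversible side effect
   ]
   usable, withheld = split_tools(available, ["mcp__discord__discord_send_message"])
   print("usable during coordination:", usable)
   print("withheld until final presentation:", withheld)

In this picture, the withheld tools become available again only to the winning agent during final presentation.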

Example Configuration
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   agents:
     - id: "gemini_discord_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         mcp_servers:
           - name: "discord"
             type: "stdio"
             command: "npx"
             args: ["-y", "mcp-discord"]
             env:
               DISCORD_TOKEN: "${DISCORD_TOKEN}"
             security:
               level: "high"

     - id: "openai_discord_agent"
       backend:
         type: "openai"
         model: "gpt-4o-mini"
         mcp_servers:
           - name: "discord"
             type: "stdio"
             command: "npx"
             args: ["-y", "mcp-discord"]
             env:
               DISCORD_TOKEN: "${DISCORD_TOKEN}"

   orchestrator:
     snapshot_storage: "snapshots"
     agent_temporary_workspace: "temp_workspaces"
     coordination:
       enable_planning_mode: true
       planning_mode_instruction: |
         PLANNING MODE ACTIVE: Coordination phase - plan only.
         Read-only operations are allowed (reading messages, files).
         DO NOT execute write operations - those are blocked.

Usage
~~~~~

.. code-block:: bash

   # Five agents with planning mode (no execution during coordination)
   massgen \
     --config @examples/tools/planning/five_agents_filesystem_mcp_planning_mode.yaml \
     "Create a comprehensive project structure with documentation"

**What happens:**

1. **Coordination phase** → Agents discuss and plan file structure
2. **Voting** → Agents vote for best plan
3. **Final presentation** → Winning agent **executes** the plan

Multi-Backend Support
~~~~~~~~~~~~~~~~~~~~~

Planning mode works across:

* Response API (Claude)
* Chat Completions (OpenAI, Grok, etc.)
* Gemini with session-based tool execution

Complete Example
----------------

Full configuration with multiple MCP servers and planning mode:

.. code-block:: yaml

   agents:
     - id: "research_agent"
       backend:
         type: "gemini"
         model: "gemini-2.5-flash"
         mcp_servers:
           # Web search
           - name: "search"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-brave-search"]
             env:
               BRAVE_API_KEY: "${BRAVE_API_KEY}"
             allowed_tools:
               - mcp__search__brave_web_search

           # Weather
           - name: "weather"
             type: "stdio"
             command: "npx"
             args: ["-y", "@modelcontextprotocol/server-weather"]

     - id: "analyst_agent"
       backend:
         type: "openai"
         model: "gpt-5-nano"
         # File operations handled via cwd parameter

   orchestrator:
     coordination:
       enable_planning_mode: true
       planning_mode_instruction: |
         PLANNING MODE: Describe your intended tool usage.
         Do not execute tools during coordination.

   ui:
     display_type: "rich_terminal"
     logging_enabled: true

Security Considerations
-----------------------

1. **Tool Filtering** - Use ``allowed_tools`` and ``exclude_tools`` to limit capabilities
2. **Planning Mode** - Enable for tasks with irreversible actions
3. **Environment Variables** - Store API keys in ``.env``, never in config files
4. **Path Restrictions** - Limit filesystem server to specific directories
5. **Review Permissions** - Check what each MCP server can do before enabling

Troubleshooting
---------------

**MCP server not found:**

Ensure the MCP server package is installed:

.. code-block:: bash

   npx -y @modelcontextprotocol/server-weather

**Tools not appearing:**

* Check backend MCP support (see table above)
* Verify ``mcp_servers`` configuration
* Check for tool filtering (``allowed_tools``, ``exclude_tools``)

**Environment variables not working:**

.. code-block:: bash

   # Set in .env file
   BRAVE_API_KEY=your_key_here

   # Reference in config
   env:
     BRAVE_API_KEY: "${BRAVE_API_KEY}"
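
To check which variables a config expects before launching, a small script can scan the file for ``${NAME}`` references. This is a sketch that assumes references use the ``${NAME}`` form shown above; ``missing_env_vars`` is a hypothetical helper, not part of MassGen, and it inspects only the environment mapping it is given (it does not read ``.env`` itself).

.. code-block:: python

   import os
   import re

   # Matches ${NAME} references such as ${BRAVE_API_KEY}.
   VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

   def missing_env_vars(config_text, environ=None):
       """Return the sorted names referenced in config_text that are unset or empty."""
       env = os.environ if environ is None else environ
       names = VAR_PATTERN.findall(config_text)
       return sorted({name for name in names if not env.get(name)})

   config_text = 'env:\n  BRAVE_API_KEY: "${BRAVE_API_KEY}"\n'
   print(missing_env_vars(config_text, environ={}))  # ['BRAVE_API_KEY']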

**Planning mode not working:**

* Check ``enable_planning_mode: true`` in orchestrator config
* Look for the UI box showing analysis results at the start of coordination
* If the box says "Planning Mode: DISABLED", the LLM didn't detect irreversible operations
* Review logs to see what tools the LLM identified as blocked

**Planning mode blocking too many/few tools:**

* The LLM automatically analyzes your question to determine what to block
* If too restrictive: Rephrase your question to emphasize read-only operations
* If not restrictive enough: Make your question more explicit about write operations
* The analysis UI box shows exactly what was blocked and why

**Want to see the analysis:**

The UI box appears automatically before coordination starts when planning mode is enabled.

Next Steps
----------

* :doc:`../files/file_operations` - Filesystem MCP integration
* :doc:`../files/project_integration` - Using MCP with context paths
* :doc:`../sessions/multi_turn_mode` - MCP in interactive sessions
* :doc:`../../quickstart/running-massgen` - More examples
* `MCP Server Registry <https://github.com/modelcontextprotocol/servers>`_ - Browse available MCP servers


---

## user_guide/tools/skills.rst

.. _user_guide_skills:

==========================
Skills System
==========================

The Skills System extends agent capabilities with specialized knowledge and workflows using `openskills <https://github.com/numman-ali/openskills>`_. Skills are modular, self-contained packages that provide domain-specific guidance and tools.

Overview
========

Skills turn general-purpose agents into specialized ones by providing:

* **Domain Knowledge**: Specialized expertise (e.g., PDF manipulation, spreadsheet analysis)
* **Workflow Guidance**: Step-by-step procedures for complex tasks
* **Tool Integration**: Pre-configured toolchains for specific domains
* **Filesystem-Based**: Transparent, version-controllable approach

When enabled, agents can invoke skills via bash commands to access domain-specific guidance.

.. note::
   Skills complement MCP tools but work through the filesystem instead of the MCP protocol. This makes them more transparent and allows them to be version-controlled.

.. seealso::
   :doc:`skills_lifecycle_and_consolidation` - Current skills UX/runtime behavior plus proposed consolidation and update lifecycle.

.. important::
   **Model Recommendations**: Skills work best with frontier models (Claude Sonnet/Opus, GPT-5). Smaller models like gpt-5-mini and gpt-5-nano may not reliably recognize when to invoke skills or may skip skill invocation in favor of attempting tasks directly.

Installation
============

Install openskills and Anthropic's skills collection:

.. code-block:: bash

   # Install openskills CLI
   npm install -g openskills

   # Install Anthropic's skills collection
   openskills install anthropics/skills --universal -y

This creates the ``.agent/skills/`` directory, which contains all available skills.

.. note::
   Skills work with both Docker mode (``command_line_execution_mode: "docker"``) and local mode (``command_line_execution_mode: "local"``).

   - **Docker mode**: Skills and dependencies (ripgrep, ast-grep) are pre-installed in the container
   - **Local mode**: You need to install dependencies manually (``brew install ripgrep ast-grep`` on macOS)

Configuration
=============

Basic Configuration
-------------------

Enable skills in your YAML config:

.. code-block:: yaml

   agents:
     Agent1:
       backend_name: "Anthropic"
       backend_params:
         model: "claude-sonnet-4"
         # REQUIRED: Skills need command line access
         enable_mcp_command_line: true
         command_line_execution_mode: "docker"  # or "local"

   orchestrator:
     coordination:
       # Enable skills system
       use_skills: true

       # Optional: Skills directory (default: .agent/skills)
       skills_directory: ".agent/skills"

.. important::
   **Skills require command line execution** (``enable_mcp_command_line: true``) to be enabled for at least one agent.

With Task Planning
------------------

Combine skills with task planning (filesystem mode):

.. code-block:: yaml

   orchestrator:
     coordination:
       use_skills: true
       enable_agent_task_planning: true
       task_planning_filesystem_mode: true  # Save tasks to tasks/ directory

This creates a ``tasks/`` directory in the agent workspace:

.. code-block:: text

   agent_workspace/
     └── tasks/
         └── plan.json        # Task planning state

With Memory System
------------------

Combine skills with filesystem-based memory:

.. code-block:: yaml

   orchestrator:
     coordination:
       use_skills: true
       enable_memory_filesystem_mode: true

This creates a two-tier memory structure:

.. code-block:: text

   agent_workspace/
     └── memory/
         ├── short_term/      # Auto-injected into system prompts
         └── long_term/       # Load on-demand via MCP tools

With Previous Session Skills
-----------------------------

Enable discovery of evolving skills from previous MassGen sessions:

.. code-block:: yaml

   orchestrator:
     coordination:
       use_skills: true
       load_previous_session_skills: true  # Discover skills from past sessions

When enabled, MassGen scans ``.massgen/massgen_logs/`` for evolving skills (``SKILL.md`` files) created by agents in previous sessions. These skills appear in the system prompt with ``<location>previous_session</location>``.

**Path structure scanned:**

.. code-block:: text

   .massgen/massgen_logs/
     └── log_YYYYMMDD_HHMMSS/
         └── turn_N/
             └── attempt_N/
                 └── final/
                     └── agent_X/
                         └── workspace/
                             └── tasks/
                                 └── evolving_skill/
                                     └── SKILL.md  # Discovered as previous_session skill
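
The layout above can be walked with a single glob pattern. This is a minimal sketch that assumes the directory names match the tree exactly; ``find_previous_session_skills`` is a hypothetical helper, and MassGen's own discovery logic may apply additional filtering.

.. code-block:: python

   from pathlib import Path

   # One wildcard per variable level: log, turn, attempt, and agent directory.
   SKILL_GLOB = "log_*/turn_*/attempt_*/final/*/workspace/tasks/evolving_skill/SKILL.md"

   def find_previous_session_skills(logs_root=".massgen/massgen_logs"):
       """Return SKILL.md paths under the documented log layout, sorted by path."""
       root = Path(logs_root)
       if not root.is_dir():
           return []
       return sorted(root.glob(SKILL_GLOB))

   for skill_file in find_previous_session_skills():
       print(skill_file)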

Complete Setup (All Features)
------------------------------

For full coordination capabilities:

.. code-block:: yaml

   orchestrator:
     coordination:
       use_skills: true
       enable_agent_task_planning: true
       task_planning_filesystem_mode: true
       enable_memory_filesystem_mode: true
       load_previous_session_skills: true  # Optional: load skills from previous sessions

This creates:

.. code-block:: text

   agent_workspace/
     ├── memory/
     │   ├── short_term/
     │   └── long_term/
     └── tasks/
         └── plan.json

Built-in Skills
===============

MassGen includes built-in skills bundled in ``massgen/skills/``:

**Code Search & Understanding:**

* ``file-search`` - Fast text and structural code search (ripgrep/ast-grep)
* ``serena`` - Symbol-level code understanding using LSP
* ``semtools`` - Semantic search using embeddings

**Workflow & Skills (Code Mode):**

* ``evolving-skill-creator`` - Central planning mechanism for code-based workflows. Creates structured workflow plans that inventory MCP servers, custom tools, and skills, and that capture learnings for reuse in future sessions

**Meta-Development (MassGen develops MassGen):**

* ``massgen-config-creator`` - Config creation guidance and best practices
* ``massgen-develops-massgen`` - Self-development workflows (automation + visual evaluation)
* ``massgen-release-documenter`` - Release documentation process
* ``model-registry-maintainer`` - Model registry maintenance

All skills are invoked the same way using ``openskills read <skill-name>``.

.. note::
   **Lightweight Guidance**: When command execution is enabled, agents automatically receive lightweight file search guidance (~30 lines) in their system prompt. For comprehensive documentation, invoke: ``openskills read file-search``

File Search
-----------

Fast text and structural code search using ripgrep and ast-grep.

**Lightweight Guidance (Always Available):**

When command execution is enabled, agents automatically see basic usage:

.. code-block:: bash

   # Text search with ripgrep
   rg "pattern" --type py --type js

   # Structural search with ast-grep
   sg --pattern 'function $NAME($$$) { $$$ }' --lang js

**Full Skill Content:**

.. code-block:: bash

   # Load comprehensive 280-line guide with targeting strategies
   openskills read file-search

**Best for:**

* Finding code patterns
* Analyzing codebases
* Refactoring workflows
* Fast keyword searches

Serena
------

Symbol-level code understanding using Language Server Protocol (LSP). Provides IDE-like capabilities for finding symbols, tracking references, and making precise code edits.

**Prerequisites:**

.. code-block:: bash

   # Use uvx to run serena on-demand (no permanent installation)
   uvx --from git+https://github.com/oraios/serena serena --help

   # Works in both Docker mode (uv pre-installed) and local mode
   # For local mode, install uv first: curl -LsSf https://astral.sh/uv/install.sh | sh

**Invocation:**

.. code-block:: bash

   # Load serena skill guidance
   openskills read serena

**Core Capabilities:**

* **find_symbol**: Locate class, function, or variable definitions
* **find_referencing_symbols**: Find all locations where a symbol is used
* **insert_after_symbol**: Make precise code insertions at symbol level

**Usage:**

.. code-block:: bash

   # Read skill guidance
   openskills read serena

   # Find symbol definitions (after reading skill)
   serena find_symbol --name 'UserService' --type class

   # Find all references
   serena find_referencing_symbols --name 'authenticate'

   # Insert code at symbol location
   serena insert_after_symbol --name 'MyClass' --type class --code '...'

**Best for:**

* Understanding symbol relationships and dependencies
* Impact analysis before refactoring
* Precise code insertions at symbol level
* Tracking all usages of functions/classes
* Working with large, complex codebases

**Supported Languages:**

Python, JavaScript, TypeScript, Rust, Go, Java, C/C++, C#, Ruby, PHP, and 20+ more languages through LSP.

Semtools Skill
--------------

Semantic search using embedding-based similarity matching. Find code by meaning, not just keywords.

**Prerequisites:**

.. code-block:: bash

   # Install via npm (recommended)
   npm install -g @llamaindex/semtools

   # Or via cargo
   cargo install semtools

   # Optional: For document parsing (PDF, DOCX, PPTX)
   export LLAMA_CLOUD_API_KEY="your-key"

**Invocation:**

.. code-block:: bash

   # Load semtools skill guidance
   openskills read semtools

**Core Capabilities:**

* **Semantic Search**: Find code by meaning, not exact keywords
* **Workspace Management**: Cache embeddings for fast repeated searches
* **Document Parsing**: Convert PDFs, DOCX, PPTX to searchable text (optional)

**Usage:**

.. code-block:: bash

   # Read skill guidance
   openskills read semtools

   # Semantic search by concept (after reading skill)
   search "authentication logic" src/

   # Search with more results
   search "error handling" --top-k 10 --n-lines 5

   # Create workspace for large codebases
   workspace use my-project
   export SEMTOOLS_WORKSPACE=my-project

   # Parse documents (requires API key)
   parse research_papers/*.pdf

**Best for:**

* Finding code when you know the concept but not the keywords
* Discovering semantically similar implementations
* Searching across different terminology/languages
* Document analysis and research
* Exploratory code discovery

**Note:**

* Semantic search works **locally** without API keys
* Document parsing (PDF/DOCX) requires LlamaIndex Cloud API key
* Embeddings are computed locally using model2vec

Meta-Development Skills
------------------------

MassGen includes four skills that enable agents to develop and improve MassGen itself. These skills provide structured workflows and best practices for common development tasks.

MassGen Config Creator
~~~~~~~~~~~~~~~~~~~~~~

Guides agents in creating properly structured YAML configuration files.

**Invocation:**

.. code-block:: bash

   openskills read massgen-config-creator

**Key Features:**

* Enforces the "read existing configs first, never invent properties" rule
* References authoritative documentation (``docs/source/development/writing_configs.rst``)
* Property placement reference (backend-level vs orchestrator-level)
* File naming and location conventions
* Common pattern templates (single agent, multi-agent, with filesystem)

**Use Cases:**

* Creating example configs for new features
* Writing configs for case studies
* Building reusable multi-agent workflows
* Testing backend/tool integrations

**Example:**

.. code-block:: bash

   # Agent uses skill to create a config
   openskills read massgen-config-creator
   # Follows guidance to create properly structured config

MassGen Develops MassGen
~~~~~~~~~~~~~~~~~~~~~~~~~

Guides agents in using MassGen to develop itself through two distinct workflows.

**Invocation:**

.. code-block:: bash

   openskills read massgen-develops-massgen

**Workflow 1: Automation Mode** (Functional Testing)

* Run MassGen in ``--automation`` mode for clean, parseable output
* Monitor progress via ``status.json`` file
* Parse log directory and results programmatically
* Create background monitoring tasks (token usage, errors, progress, coordination)
* Test backend functionality, coordination logic, agent responses
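
Polling ``status.json`` from a script might look like the sketch below. The file's location and schema are not specified here, so the path is passed in and the result is treated as opaque data; ``read_status`` is a hypothetical helper, and the path in the usage lines is a placeholder.

.. code-block:: python

   import json
   from pathlib import Path

   def read_status(path):
       """Return the parsed status file, or None if it is missing or mid-write."""
       try:
           return json.loads(Path(path).read_text(encoding="utf-8"))
       except (FileNotFoundError, json.JSONDecodeError):
           # A missing or partially written file is expected while a run starts up.
           return None

   status = read_status("status.json")  # placeholder path
   if status is not None:
       print(status)

Returning ``None`` instead of raising lets a monitoring loop simply retry on the next tick.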

**Workflow 2: Visual Evaluation** (UX Testing)

* Pre-test with ``--automation`` to validate config (REQUIRED)
* Record rich terminal display with VHS (without ``--automation``)
* Analyze videos with ``understand_video`` tool
* Evaluate terminal UX: clarity, layout, status indicators, user experience

**Additional Guidance:**

* Model selection guidelines (prefer recent mid-tier models)
* Config generation patterns
* Docker considerations (automatic detection and mode switching)
* Timing expectations and monitoring best practices

**Use Cases:**

* Testing new features programmatically
* Evaluating terminal UI/UX quality
* Creating case study demos with recordings
* Running experiments with MassGen configs

MassGen Release Documenter
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Guides agents through the complete release documentation workflow.

**Invocation:**

.. code-block:: bash

   openskills read massgen-release-documenter

**Documentation Order (CRITICAL):**

1. CHANGELOG.md (START HERE)
2. Sphinx Documentation (docs/source/)
3. Config Documentation (massgen/configs/README.md)
4. Case Studies (docs/source/examples/case_studies/)
5. README.md
6. README_PYPI.md (auto-synced via pre-commit)
7. Roadmap (ROADMAP.md)

**Key Features:**

* Phase-by-phase release checklist
* Commit message templates
* PR creation workflow
* Tag release instructions
* Validation checklist
* Backend update guidance (config_builder.py, capabilities.py)

**Use Cases:**

* Preparing documentation for new releases
* Updating CHANGELOG.md
* Writing case studies
* Following release checklist process

**References:** Primary source of truth is ``docs/dev_notes/release_checklist.md``

Model Registry Maintainer
~~~~~~~~~~~~~~~~~~~~~~~~~~

Guides agents in maintaining MassGen's model and backend registry.

**Invocation:**

.. code-block:: bash

   openskills read model-registry-maintainer

**Two Files to Maintain:**

1. ``massgen/backend/capabilities.py`` - Models, capabilities, release dates
2. ``massgen/token_manager/token_manager.py`` - Pricing, context windows

**Pricing Resolution Order:**

1. LiteLLM database (500+ models, fetched on-demand, cached 1 hour)
2. Hardcoded PROVIDER_PRICING (fallback only)
3. Pattern matching heuristics
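
The resolution order reads as a fallback chain. The sketch below illustrates that chain with plain dictionaries; the function name, the data shapes, and the numbers are placeholders for illustration, not values or APIs taken from ``token_manager.py``.

.. code-block:: python

   def resolve_pricing(model, litellm_db, provider_pricing, pattern_rules):
       """Return (source, price) from the first tier that knows the model."""
       if model in litellm_db:                 # 1. LiteLLM database
           return "litellm", litellm_db[model]
       if model in provider_pricing:           # 2. hardcoded fallback
           return "hardcoded", provider_pricing[model]
       for prefix, price in pattern_rules:     # 3. pattern heuristics
           if model.startswith(prefix):
               return "pattern", price
       return "unknown", None

   # Placeholder numbers only; these are not real prices.
   litellm_db = {"model-a": {"input": 1.0, "output": 2.0}}
   provider_pricing = {"model-b": {"input": 3.0, "output": 4.0}}
   pattern_rules = [("model-", {"input": 5.0, "output": 6.0})]

   for name in ("model-a", "model-b", "model-c", "other"):
       source, _ = resolve_pricing(name, litellm_db, provider_pricing, pattern_rules)
       print(name, "->", source)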

**Information Gathered for New Models:**

* Release date (format: "YYYY-MM")
* Context window and max output tokens
* Pricing (input/output per 1K tokens)
* Capabilities (web search, code execution, vision, etc.)
* Exact API identifier

**Programmatic Updates:**

* LiteLLM pricing database integration
* OpenRouter API for real-time data
* Provider-specific API guidance
* Automation script template

**Current Coverage:**

* OpenAI: GPT-5 family, GPT-4.1 family, o4-mini
* Claude: Sonnet 4.5, Haiku 4.5, Opus 4.1
* Gemini: 3-pro-preview, 2.5-pro, 2.5-flash
* Grok: 4.1 family, 4 family, 3 family

**Use Cases:**

* Adding new models from providers
* Updating model pricing
* Maintaining context window information
* Keeping registry current with provider releases

Choosing Between Search Tools
------------------------------

MassGen provides three complementary search approaches:

+--------------------+----------------------+------------------------+---------------------------+
| Tool               | Search Type          | Best For               | Example                   |
+====================+======================+========================+===========================+
| **file-search**    | Text/Syntax          | Exact keywords,        | Find "LoginService" class |
| (ripgrep/ast-grep) |                      | code patterns          |                           |
+--------------------+----------------------+------------------------+---------------------------+
| **serena**         | Symbols/References   | Finding definitions,   | Track all uses of         |
|                    |                      | tracking usage         | authenticate()            |
+--------------------+----------------------+------------------------+---------------------------+
| **semtools**       | Semantic/Meaning     | Concept discovery,     | Find "rate limiting"      |
|                    |                      | similar code           | implementations           |
+--------------------+----------------------+------------------------+---------------------------+

**Search Strategy:**

1. **Concept Discovery**: Use semtools to find relevant areas
2. **Symbol Tracking**: Use serena to track precise definitions and references
3. **Text Search**: Use file-search (ripgrep) for exact keyword follow-up

**Example Workflow:**

.. code-block:: bash

   # 1. Discover authentication-related code semantically
   search "user authentication" src/

   # 2. Find exact class definition
   uvx --from git+https://github.com/oraios/serena serena find_symbol --name 'AuthService' --type class

   # 3. Track all references
   uvx --from git+https://github.com/oraios/serena serena find_referencing_symbols --name 'AuthService'

   # 4. Search for specific patterns
   rg "AuthService\(" --type py src/

External Skills
===============

Anthropic Skills Collection
----------------------------

When you install ``anthropics/skills``, you get access to:

* **pdf**: PDF manipulation toolkit
* **xlsx**: Spreadsheet creation and analysis
* **pptx**: PowerPoint presentation generation
* **docx**: Word document processing
* **skill-creator**: Guide for creating custom skills
* **And more...**

Using External Skills
---------------------

1. **Discover available skills:**

   Agents see skills listed in their system prompt automatically.

2. **Invoke a skill:**

   .. code-block:: bash

      openskills read pdf

   This loads the PDF skill's guidance and instructions.

3. **Follow skill guidance:**

   The skill content provides step-by-step instructions, examples, and best practices.

Evolving Skills (Code Mode)
---------------------------

Evolving skills are the **central planning mechanism** for code-based workflows. When agents enable ``auto_discover_custom_tools: true`` (code mode), they create evolving skills that:

1. **Structure Task Planning**: Enhanced task planning with explicit workflow documentation
2. **Discover Available Tools**: Structured sections for MCP servers, custom tools, and other skills
3. **Create Reusable Scripts**: Python tools that persist for future use
4. **Capture Learnings**: What worked, what didn't, and tips for iteration

**Why Use Evolving Skills:**

Rather than ad-hoc task execution, evolving skills provide a structured approach where agents:

- **Plan before executing**: Document the workflow upfront
- **Inventory available resources**: Explicitly list MCP servers, custom tools, and skills to use
- **Build reusable tools**: Create Python scripts that become part of the skill
- **Learn and improve**: Capture learnings for the skill to improve over iterations

**Creating Evolving Skills:**

Invoke the built-in ``evolving-skill-creator`` skill:

.. code-block:: bash

   openskills read evolving-skill-creator

**Directory Structure:**

.. code-block:: text

   tasks/evolving_skill/
   ├── SKILL.md              # Workflow plan with structured sections
   └── scripts/              # Python tools created during execution
       ├── scrape_data.py
       └── generate_output.py

**Key Sections in SKILL.md:**

.. code-block:: markdown

   ---
   name: artist-website-builder  # Descriptive, reusable name
   description: Build static biographical websites for artists
   ---

   # Artist Website Builder

   ## Workflow
   Detailed numbered steps...

   ## Tools to Create
   Python scripts to build (documented BEFORE writing):
   - scripts/fetch_data.py: Purpose, inputs, outputs, dependencies

   ## Tools to Use
   Available resources discovered in the workspace:
   - servers/context7: Documentation fetching
   - custom_tools/image_optimizer: Asset compression

   ## Skills
   Other skills that can help:
   - web-scraping-patterns: Crawling approach guidance

   ## Packages
   Dependencies to install:
   - crawl4ai, jinja2

   ## Learnings
   (Updated after execution)

**Important:** The ``SKILL.md`` must have proper YAML frontmatter with ``name`` and ``description`` fields for the skill to be discoverable in future sessions. Use descriptive names like ``artist-website-builder``, not instance-specific names like ``bob-dylan-project``.
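
A quick pre-flight check for the frontmatter requirement could look like the following. This is a deliberately small sketch: ``read_frontmatter`` and ``is_discoverable`` are hypothetical helpers that handle only flat ``key: value`` lines (no nested YAML), which is enough for ``name`` and ``description``. A real YAML parser would be more robust.

.. code-block:: python

   def read_frontmatter(text):
       """Parse flat key/value pairs between the leading '---' fences."""
       lines = text.splitlines()
       if not lines or lines[0].strip() != "---":
           return {}
       fields = {}
       for line in lines[1:]:
           if line.strip() == "---":
               return fields  # closing fence reached
           key, sep, value = line.partition(":")
           if sep:
               # Drop a trailing " # comment", then surrounding whitespace.
               fields[key.strip()] = value.split(" #", 1)[0].strip()
       return {}  # no closing fence: treat the frontmatter as invalid

   def is_discoverable(text):
       """True when both required fields are present and non-empty."""
       meta = read_frontmatter(text)
       return bool(meta.get("name")) and bool(meta.get("description"))

   skill_md = (
       "---\n"
       "name: artist-website-builder  # Descriptive, reusable name\n"
       "description: Build static biographical websites for artists\n"
       "---\n\n# Artist Website Builder\n"
   )
   print(read_frontmatter(skill_md)["name"], is_discoverable(skill_md))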

**Loading in Future Sessions:**

Enable ``load_previous_session_skills: true`` to automatically discover evolving skills from previous sessions. This allows agents to build on previous work and reuse tools/workflows. See :ref:`With Previous Session Skills <user_guide_skills>`.

Creating Custom Skills
----------------------

Follow the ``skill-creator`` skill guidance:

.. code-block:: bash

   openskills read skill-creator

Or create manually:

1. Create skill directory:

   .. code-block:: bash

      mkdir -p .agent/skills/my-skill

2. Create ``SKILL.md`` with YAML frontmatter:

   .. code-block:: markdown

      ---
      name: my-skill
      description: Brief description of what this skill does
      ---

      # My Skill

      Detailed guidance and instructions...

3. The skill is discovered automatically when ``use_skills: true`` is set

How Skills Work
===============

Discovery
---------

When ``use_skills: true``:

1. MassGen scans ``.agent/skills/`` (external) and ``massgen/skills/`` (built-in)
2. Parses ``SKILL.md`` files for metadata
3. Builds skills table in agent system prompt

Skills Table
------------

Agents see available skills in their system prompt:

.. code-block:: xml

   <skills_system priority="1">

   ## Available Skills

   <available_skills>

   <skill>
   <name>pdf</name>
   <description>PDF manipulation toolkit...</description>
   <location>project</location>
   </skill>

   <skill>
   <name>file-search</name>
   <description>Fast text and structural code search...</description>
   <location>builtin</location>
   </skill>

   </available_skills>

   </skills_system>
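
Each entry in the table carries the same three fields, so rendering one is mechanical. The sketch below assumes entries are plain text wrapped in the tags shown above; ``render_skill`` is a hypothetical helper, not MassGen's prompt builder.

.. code-block:: python

   from xml.sax.saxutils import escape

   def render_skill(name, description, location):
       """Render one <skill> entry in the shape shown in the table above."""
       return (
           "<skill>\n"
           f"<name>{escape(name)}</name>\n"
           f"<description>{escape(description)}</description>\n"
           f"<location>{escape(location)}</location>\n"
           "</skill>"
       )

   print(render_skill("pdf", "PDF manipulation toolkit...", "project"))

Escaping the field values keeps a description containing ``&`` or ``<`` from breaking the surrounding markup.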

Invocation
----------

Agents invoke skills using bash:

.. code-block:: bash

   openskills read <skill-name>

This loads the skill's full content and guidance.

Best Practices
==============

When to Use Skills
------------------

**Use skills when:**

* The task requires domain-specific knowledge
* The workflow is complex and benefits from guidance
* You want transparency (filesystem state is easier to inspect than MCP state)
* Multiple agents need to coordinate

**Don't use skills when:**

* The task is simple and one-off
* MCP tools are sufficient
* Command line execution is not available

Skill Selection
---------------

1. **Check available skills** in the system prompt first
2. **Read skill content** before using
3. **Follow skill guidance** - they provide best practices
4. **Don't mix approaches** - if using a skill, follow its patterns

Memory Management
-----------------

1. **Be selective** - only save important information
2. **Use clear names** - descriptive filenames
3. **Structured data** - JSON for data, Markdown for docs
4. **Clean up** - remove outdated memories

File Searching
--------------

1. **Start broad** - simple patterns first
2. **Add filters** - use file type and directory filters
3. **Use context** - ``-C`` flag shows surrounding code
4. **Combine tools** - ripgrep for text, ast-grep for structure

Troubleshooting
===============

Skills Not Found
----------------

**Error:** ``Skills directory is empty or doesn't exist``

**Solution:**

.. code-block:: bash

   # Local users: Install openskills
   npm install -g openskills
   openskills install anthropics/skills --universal -y

   # Docker users: Skills should be pre-installed
   # If missing, rebuild Docker image

Command Execution Required
---------------------------

**Error:** ``Skills require command line execution``

**Solution:**

Add to agent config:

.. code-block:: yaml

   agents:
     Agent1:
       backend_params:
         enable_mcp_command_line: true
         command_line_execution_mode: "docker"  # or "local"

Skill Not Appearing
-------------------

**Problem:** Installed skill not showing in skills table

**Solutions:**

1. Check skills directory path in config
2. Verify ``SKILL.md`` has YAML frontmatter
3. Check file permissions
4. Try ``openskills list`` to see installed skills

Performance Considerations
==========================

Skill Discovery Cost
--------------------

* Skills are scanned once at orchestration startup
* Minimal overhead for small skill collections
* For 50+ skills, consider using a more specific skills directory (``skills_directory``)

System Prompt Size
------------------

* Skills table adds to system prompt length
* Roughly 100 tokens per skill in the table (about 5,000 tokens for 50 skills)
* Full skill content loaded on-demand via ``openskills read``

Integration with Other Features
================================

With Filesystem
---------------

Skills work seamlessly with filesystem features:

* ``memory/`` for skill-specific memory
* ``temp_workspaces/`` for viewing other agents' skill usage
* File tools for creating/reading skill outputs

With MCP Tools
--------------

Skills complement MCP tools:

* Use MCP tools for direct actions
* Use skills for guidance and workflows
* Skills can invoke MCP tools via instructions

With Multi-Turn
---------------

Skills persist across turns:

* Memories saved in ``memory/`` available in next turn
* Skill outputs visible in ``temp_workspaces/``

Example Workflows
=================

Complex Refactoring
-------------------

.. code-block:: yaml

   # Config: Enable skills with task planning and memory
   orchestrator:
     coordination:
       use_skills: true
       enable_agent_task_planning: true
       task_planning_filesystem_mode: true
       enable_memory_filesystem_mode: true

**Agent workflow:**

1. Use ``file-search`` skill to find all usages
2. Store decisions in ``memory/`` for context
3. Execute refactoring in agent workspace

Multi-Agent Collaboration
--------------------------

.. code-block:: yaml

   # Config: Skills with memory for cross-agent sharing
   orchestrator:
     coordination:
       use_skills: true
       enable_memory_filesystem_mode: true

**Agent collaboration:**

1. Agent 1: Research using external skills, save findings to ``memory/short_term/``
2. Agent 2: Read Agent 1's memories from shared reference path (typically ``temp_workspaces/agent1/memory/``)
3. Agent 2: Build upon findings using same skills

.. note::
   The shared reference path is configurable via ``agent_temporary_workspace`` in the orchestrator config. The default is ``temp_workspaces/`` but can be any directory name. Agents see the actual path in their system prompt under "Shared Reference".

See Also
========

* :ref:`user_guide_agent_task_planning` - Task planning without skills
* :ref:`user_guide_custom_tools` - Creating custom MCP tools
* :ref:`user_guide_code_execution` - Command line execution setup
* :ref:`user_guide_file_operations` - Filesystem operations

Examples
========

* ``massgen/configs/skills/skills_basic.yaml`` - Basic skills usage
* ``massgen/configs/skills/skills_semantic_search.yaml`` - Semantic search with serena and semtools
* ``massgen/configs/skills/test_semantic_skills.yaml`` - Test configuration for semantic skills
* ``massgen/configs/skills/skills_with_task_planning.yaml`` - With task planning
* ``massgen/configs/skills/skills_organized_workspace.yaml`` - Organized workspace structure
* ``massgen/configs/skills/skills_with_previous_sessions.yaml`` - Load evolving skills from previous sessions


---

## user_guide/tools/skills_lifecycle_and_consolidation.rst

.. _user_guide_skills_lifecycle_and_consolidation:

Skills Lifecycle and Consolidation
==================================

This document captures:

1. What is already implemented for skills UX/runtime behavior.
2. The proposed lifecycle model for updating and consolidating skills to avoid duplication.
3. Tool exposure policy for the ``skills`` MCP wrapper.

Current Implementation (Shipped in This Branch)
================================================

Global Skills Entry in TUI
--------------------------

Skills management is available from a dedicated ``Skills`` button in the mode bar when skills are discoverable.
The analysis-only "Manage Skills" button was removed from analysis settings.

Skills Modal Organization
-------------------------

The modal now groups skills by source:

* ``builtin`` (MassGen bundled skills)
* ``project`` (``.agent/skills`` in current project)
* ``user`` (``~/.agent/skills``)
* ``previous_session`` (evolving skills discovered from logs)

The modal includes:

* Source/custom/evolving labeling for each skill.
* ``Include evolving skills from previous sessions`` toggle.
* ``Enable All``, ``Disable All``, and ``Enable Custom`` quick actions.

Runtime Behavior
----------------

* ``include_previous_session_skills`` is stored in TUI analysis state and forwarded at runtime.
* Analysis mode now supports ``skill_lifecycle_mode`` with:

  * ``create_or_update`` (default): update best matching existing skill, otherwise create.
  * ``create_new``: always create a new skill directory.
  * ``consolidate``: apply update/create and then merge overlapping project skills.

* Local skills setup now supports including previous-session evolving skills in merged local skills directories.
* Skill scanning now returns richer metadata (for example: ``source_path``, ``is_custom``, ``is_evolving``, ``origin``) and deduplicates by skill name.
* Analysis skill-creation instructions now explicitly request provenance metadata:

  * ``massgen_origin``
  * ``evolving: true``

* Post-analysis harvesting now applies lifecycle-aware upsert logic (create/update/consolidate) rather than copy-only behavior.

Skills MCP Wrapper Tool
-----------------------

A ``skills`` tool is available in the workspace tools server:

* ``skills(action="list")`` returns grouped skill metadata.
* ``skills(action="read", skill_name="...")`` reads a skill by:

  1. Trying ``openskills read`` first.
  2. Falling back to direct ``SKILL.md`` reads when needed.

This enables skill usage in local execution contexts where direct CLI invocation may not be reliable.
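
The read-with-fallback behavior can be sketched roughly as follows. This is not the wrapper's actual code: the function name, the ``skill_dirs`` parameter, and the ``<dir>/<skill_name>/SKILL.md`` layout are assumptions made for illustration, as is passing the skill name as an argument to ``openskills read``.

.. code-block:: python

   import subprocess
   from pathlib import Path
   from typing import Optional, Sequence


   def read_skill(
       skill_name: str,
       skill_dirs: Sequence[Path],
       cli: str = "openskills",
   ) -> Optional[str]:
       """Read a skill via the CLI first, then fall back to SKILL.md on disk."""
       try:
           # Step 1: try `openskills read <skill_name>`.
           result = subprocess.run(
               [cli, "read", skill_name],
               capture_output=True,
               text=True,
               timeout=30,
               check=False,
           )
           if result.returncode == 0 and result.stdout.strip():
               return result.stdout
       except (OSError, subprocess.TimeoutExpired):
           # CLI missing, not executable, or hung: fall through to the file read.
           pass

       # Step 2: look for <dir>/<skill_name>/SKILL.md (assumed layout).
       for base in skill_dirs:
           skill_file = Path(base) / skill_name / "SKILL.md"
           if skill_file.is_file():
               return skill_file.read_text(encoding="utf-8")
       return None

Returning ``None`` instead of raising lets the caller decide how to report an unknown skill.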

Remaining Gaps
==============

Without lifecycle controls, evolving skill creation can cause:

* Duplicate domain skills (for example, multiple poem-writing variants).
* Drift and fragmentation across similar skills.
* Overly large, noisy "all skills" inventories.

Lifecycle Model (Implemented + Tuning)
======================================

Default
-------

``create_or_update`` is the default behavior in analysis mode.

Behavior:

1. Discover existing skills first.
2. Match the new candidate against existing skills using semantic similarity (name + description + content).
3. If match confidence is high, update the existing skill instead of creating a new one.
4. If confidence is low, create a new skill.
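
The four steps above can be sketched as a small decision function. This is only an illustration: ``difflib.SequenceMatcher`` is a plain string-similarity stand-in for the semantic matching the document describes, and the ``Skill`` class, function names, and ``0.75`` threshold are all hypothetical.

.. code-block:: python

   from dataclasses import dataclass
   from difflib import SequenceMatcher
   from typing import Iterable, Optional, Tuple


   @dataclass
   class Skill:
       name: str
       description: str
       content: str

       def text(self) -> str:
           # Name + description + content, as in step 2 above.
           return f"{self.name}\n{self.description}\n{self.content}".lower()


   def best_match(candidate: Skill, existing: Iterable[Skill]) -> Tuple[Optional[Skill], float]:
       """Return the most similar existing skill and its score in [0, 1]."""
       best, best_score = None, 0.0
       for skill in existing:
           # Stand-in metric: character-level similarity, not semantic similarity.
           score = SequenceMatcher(None, candidate.text(), skill.text()).ratio()
           if score > best_score:
               best, best_score = skill, score
       return best, best_score


   def decide(
       candidate: Skill,
       existing: Iterable[Skill],
       threshold: float = 0.75,  # illustrative value, not a MassGen default
   ) -> Tuple[str, Optional[Skill]]:
       """Return ("update", match) on a confident match, else ("create", None)."""
       match, score = best_match(candidate, existing)
       if match is not None and score >= threshold:
           return "update", match
       return "create", None

A higher threshold favors creating new skills, while a lower one favors updating existing ones, which is the tuning trade-off this section refers to.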

Other Modes
-----------

* ``create_new``: Always create a new skill (explicit opt-in).
* ``consolidate``: Merge overlapping skills and keep canonical versions.

Consolidation Guardrails
------------------------

* Keep canonical skill IDs/directories stable where possible.
* Record provenance for merged skills (for example ``merged_from`` list).
* Prefer staged writes + explicit approval over silent direct merges.

Analysis UX
-----------

Analysis options now include a dedicated skill lifecycle selector in the popover:

* ``Create or Update (recommended)``
* ``Create New Only``
* ``Consolidate Similar Skills``

Recommended Write Policy
------------------------

For normal coordination runs, avoid broad write access by all agents to canonical skills.

Preferred policy:

1. Only the final/presenter agent can propose skill changes.
2. Write to staging first.
3. Require an explicit apply/approve step for canonical ``.agent/skills`` updates.
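
A minimal sketch of the stage-then-approve flow is shown below. The function names and the staging directory are hypothetical; only the canonical ``.agent/skills`` location comes from the text, and the ``approved`` flag stands in for whatever approval step is eventually chosen.

.. code-block:: python

   import shutil
   from pathlib import Path


   def stage_skill(name: str, content: str, staging_root: Path) -> Path:
       """Step 2: write the proposed skill to staging, never to the canonical directory."""
       skill_dir = Path(staging_root) / name
       skill_dir.mkdir(parents=True, exist_ok=True)
       (skill_dir / "SKILL.md").write_text(content, encoding="utf-8")
       return skill_dir


   def apply_staged_skill(
       name: str,
       staging_root: Path,
       canonical_root: Path,
       approved: bool,
   ) -> bool:
       """Step 3: copy a staged skill into the canonical directory only when approved."""
       staged = Path(staging_root) / name
       if not approved or not staged.is_dir():
           return False
       # dirs_exist_ok=True lets an approved update overwrite an existing skill.
       shutil.copytree(staged, Path(canonical_root) / name, dirs_exist_ok=True)
       return True

Keeping the approval check inside the apply function means no code path reaches the canonical directory without it.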

Tool Exposure Policy
====================

Requirement for the ``skills`` MCP wrapper tool:

* Expose it only when CLI-native skill workflows are not active.
* Do not expose it when backend type is ``codex``.
* Do not expose it when backend type is ``claude_code``.
* Treat this as a hard gating rule to avoid overlapping skill systems.

Rationale:

* Those backends already have native skill patterns.
* Avoid duplicated/conflicting skill invocation paths.

Implementation note:

* Backend/tool gating is now enforced in two layers:

  * MCP config excludes ``skills`` and sets a disable env flag.
  * Workspace tools server honors the env flag and does not register ``skills`` at all.
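
The two gating layers can be pictured with the sketch below. The environment variable name and both function names are hypothetical, since the source only says that a "disable env flag" is set and honored.

.. code-block:: python

   import os
   from typing import Mapping, Optional

   # Backends named in the policy above as having native skill workflows.
   NATIVE_SKILL_BACKENDS = frozenset({"codex", "claude_code"})

   # Hypothetical variable name; the source only mentions "a disable env flag".
   DISABLE_FLAG = "MASSGEN_DISABLE_SKILLS_TOOL"


   def mcp_env_for_backend(backend_type: str) -> dict:
       """Layer 1: the MCP config sets the disable flag for native-skill backends."""
       return {DISABLE_FLAG: "1"} if backend_type in NATIVE_SKILL_BACKENDS else {}


   def should_register_skills_tool(env: Optional[Mapping[str, str]] = None) -> bool:
       """Layer 2: the tools server skips registration when the flag is set."""
       env = os.environ if env is None else env
       return env.get(DISABLE_FLAG, "") != "1"

Checking in both places means the tool stays hidden even if one layer is misconfigured.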

Candidate Config Surface (Proposed)
===================================

.. code-block:: yaml

   orchestrator:
     coordination:
       use_skills: true

       # Existing:
       load_previous_session_skills: false

       # Proposed:
       skill_lifecycle_mode: "create_or_update"   # create_new | create_or_update | consolidate
       enable_skill_consolidation_in_analysis: false
       skill_write_policy: "final_agent_staging"  # none | final_agent_staging | final_agent_direct
       max_previous_session_skills: 10
       previous_session_skill_sort: "recency"     # recency | relevance

Decision Checklist Before Implementation
========================================

1. Should ``create_or_update`` be the default globally, or only in analysis mode?
2. Should consolidation ever run automatically, or always require explicit toggle?
3. Should superseded skills be archived automatically or only suggested?
4. Should canonical updates require approval in all modes, or only when multiple agents are writing?


---

## user_guide/validating_configs.rst

Validating Configurations
=========================

MassGen includes a built-in configuration validator that helps you catch errors before running your agents. The validator checks for:

- **Schema correctness**: Required fields, valid types, and correct structure
- **Backend compatibility**: Ensures requested features are supported by the backend
- **Best practices**: Warns about potential issues or suboptimal configurations

Validation Methods
------------------

Standalone Validation Command
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Validate a configuration file without running it:

.. code-block:: bash

   # Basic validation
   massgen --validate config.yaml

   # Strict mode (treat warnings as errors)
   massgen --validate config.yaml --strict

   # JSON output for scripts/CI
   massgen --validate config.yaml --json

The validator will exit with code 0 if the config is valid, or 1 if errors are found (or warnings in strict mode).
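
The exit-code rule can be restated as a small function. This is a model of the documented behavior, not MassGen's code, and the function name is hypothetical.

.. code-block:: python

   def validation_exit_code(error_count: int, warning_count: int, strict: bool = False) -> int:
       """Return 0 for a passing config and 1 otherwise, per the rule above."""
       if error_count > 0:
           return 1  # errors always fail
       if strict and warning_count > 0:
           return 1  # warnings fail only in strict mode
       return 0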

Automatic Validation
~~~~~~~~~~~~~~~~~~~~

By default, MassGen automatically validates configurations when you run commands:

.. code-block:: bash

   # Validation happens automatically
   massgen --config config.yaml "What is machine learning?"

**Error behavior**: If errors are found, MassGen will display them and exit before running agents.

**Warning behavior**: Warnings are displayed but don't block execution (unless ``--strict-validation`` is used).

**Disabling validation**: Use ``--skip-validation`` to bypass validation:

.. code-block:: bash

   massgen --config config.yaml --skip-validation "Your question"

Validation Flags
~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Flag
     - Description
   * - ``--validate <file>``
     - Validate a config file without running it
   * - ``--strict``
     - Treat warnings as errors (use with ``--validate``)
   * - ``--json``
     - Output validation results in JSON format
   * - ``--skip-validation``
     - Skip automatic validation when loading configs
   * - ``--strict-validation``
     - Treat warnings as errors during automatic validation

What Gets Validated
-------------------

Required Fields
~~~~~~~~~~~~~~~

The validator ensures all required fields are present:

.. code-block:: yaml

   # ✅ Valid - has all required fields
   agents:
     - id: "agent-1"          # Required
       backend:               # Required
         type: "openai"       # Required
         model: "gpt-4o"      # Required

   # ❌ Invalid - missing required fields
   agents:
     - backend:
         type: "openai"
         # Missing: id, model

Field Types
~~~~~~~~~~~

All fields must have the correct type:

.. code-block:: yaml

   # ❌ Invalid types
   agents:
     - id: 123                    # Should be string
       backend:
         type: "openai"
         model: "gpt-4o"
         enable_web_search: "yes" # Should be boolean (true/false)

Backend Capabilities
~~~~~~~~~~~~~~~~~~~~

The validator checks that backends support requested features:

.. code-block:: yaml

   # ❌ Invalid - lmstudio doesn't support web_search
   agents:
     - id: "agent-1"
       backend:
         type: "lmstudio"
         model: "custom"
         enable_web_search: true  # Error: not supported

   # ✅ Valid - openai supports web_search
   agents:
     - id: "agent-1"
       backend:
         type: "openai"
         model: "gpt-4o"
         enable_web_search: true

Enum Values
~~~~~~~~~~~

Fields with limited valid values are validated:

.. code-block:: yaml

   # ❌ Invalid display_type
   ui:
     display_type: "fancy"  # Must be: rich_terminal, simple

   # ❌ Invalid permission_mode
   agents:
     - id: "agent-1"
       backend:
         type: "claude_code"
         model: "claude-sonnet-4-5-20250929"
         permission_mode: "auto"  # Must be: approve, reject, prompt

Duplicate Agent IDs
~~~~~~~~~~~~~~~~~~~

Each agent must have a unique ID:

.. code-block:: yaml

   # ❌ Invalid - duplicate IDs
   agents:
     - id: "agent-1"
       backend: {...}
     - id: "agent-1"        # Error: duplicate ID
       backend: {...}
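
For illustration, a duplicate-ID check over a parsed config could look like the sketch below. The function name is hypothetical and this is not the validator's actual implementation.

.. code-block:: python

   from collections import Counter
   from typing import Any, Dict, List


   def duplicate_agent_ids(config: Dict[str, Any]) -> List[str]:
       """Return agent IDs that appear more than once, in first-seen order."""
       ids = [
           agent.get("id")
           for agent in config.get("agents", [])
           if isinstance(agent, dict)
       ]
       # Count only string IDs; missing or mistyped IDs are a separate check.
       counts = Counter(i for i in ids if isinstance(i, str))
       return [agent_id for agent_id, n in counts.items() if n > 1]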

V1 Config Detection
~~~~~~~~~~~~~~~~~~~

The validator rejects legacy V1 configs with a helpful migration message:

.. code-block:: yaml

   # ❌ V1 config (no longer supported)
   models: ["gpt-4o", "claude-3-opus"]
   num_agents: 2

   # Error: V1 config format detected.
   # Suggestion: Migrate to V2 config format.

Warnings
--------

The validator generates warnings for potential issues that don't prevent execution:

Multi-Agent Without Orchestrator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   # ⚠️ Warning: Consider adding orchestrator config
   agents:
     - id: "agent-1"
       backend: {...}
     - id: "agent-2"
       backend: {...}
   # Missing: orchestrator section

Conflicting Tool Filters
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: yaml

   # ⚠️ Warning: Using both filters can be confusing
   agents:
     - id: "agent-1"
       backend:
         type: "claude_code"
         model: "claude-sonnet-4-5-20250929"
         allowed_tools: ["Read", "Write"]
         exclude_tools: ["Bash"]  # Prefer one approach

Programmatic Usage
------------------

You can use the validator in Python code:

.. code-block:: python

   from massgen.config_validator import ConfigValidator

   # Validate a config file
   validator = ConfigValidator()
   result = validator.validate_config_file("config.yaml")

   if result.has_errors():
       print(result.format_errors())
       raise SystemExit(1)

   if result.has_warnings():
       print(result.format_warnings())

   # Validate a config dict
   config = {
       "agent": {
           "id": "test",
           "backend": {"type": "openai", "model": "gpt-4o"}
       }
   }
   result = validator.validate_config(config)

   # Get results as dict (for JSON serialization)
   result_dict = result.to_dict()
   # {
   #   "valid": True,
   #   "error_count": 0,
   #   "warning_count": 0,
   #   "errors": [],
   #   "warnings": []
   # }

CI/CD Integration
-----------------

Use the validator in CI pipelines to catch config errors:

.. code-block:: bash

   #!/bin/bash
   # validate_configs.sh

   EXIT_CODE=0

   for config in configs/*.yaml; do
       echo "Validating $config..."
       if ! massgen --validate "$config" --strict --json > "${config}.validation.json"; then
           echo "❌ Validation failed: $config"
           EXIT_CODE=1
       fi
   done

   exit $EXIT_CODE

GitHub Actions Example:

.. code-block:: yaml

   name: Validate Configs
   on: [push, pull_request]

   jobs:
     validate:
       runs-on: ubuntu-latest
       steps:
         - uses: actions/checkout@v3
         - uses: actions/setup-python@v4
           with:
             python-version: '3.11'
         - name: Install MassGen
           run: pip install massgen
         - name: Validate all configs
           run: |
             for config in configs/*.yaml; do
               massgen --validate "$config" --strict
             done

Best Practices
--------------

1. **Validate before committing**: Run ``massgen --validate`` on config files before committing them to version control.

2. **Use strict mode in CI**: Set ``--strict`` in CI/CD to catch warnings early.

3. **Check JSON output**: Parse ``--json`` output in scripts for programmatic error handling.

4. **Don't skip validation**: Avoid ``--skip-validation`` unless debugging validator issues.

5. **Fix warnings**: While non-blocking, warnings often indicate configuration issues worth addressing.
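
As a sketch of practice 3, the snippet below summarizes a JSON report. It assumes the ``--json`` output has the same top-level keys as the ``to_dict()`` result shown under Programmatic Usage, which this guide does not state explicitly. The function name is hypothetical.

.. code-block:: python

   import json
   from typing import List


   def summarize_report(raw: str) -> List[str]:
       """Turn a JSON validation report into one-line summaries."""
       report = json.loads(raw)
       lines = [
           f"valid={report.get('valid')} "
           f"errors={report.get('error_count', 0)} "
           f"warnings={report.get('warning_count', 0)}"
       ]
       # Entries are printed as-is because their exact shape is not documented here.
       for kind in ("errors", "warnings"):
           for entry in report.get(kind, []):
               lines.append(f"{kind[:-1]}: {entry}")
       return lines

A CI script could feed it the contents of a saved ``.validation.json`` file and print the result.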

Error Messages
--------------

The validator provides clear error messages with suggestions:

.. code-block:: text

   🔴 Configuration Errors Found:

   ❌ [agents[0].backend.type] Unknown backend type: 'gpt'
      💡 Suggestion: Use one of: openai, claude, gemini, grok, ...

   ❌ [agents[0]] Agent missing required field 'id'
      💡 Suggestion: Add 'id: "agent-name"'

Each error shows:

- **Location**: Which part of the config has the error (e.g., ``agents[0].backend.type``)
- **Message**: What's wrong
- **Suggestion**: How to fix it

Common Errors and Fixes
-----------------------

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Error
     - Fix
   * - ``Unknown backend type``
     - Use a valid backend: openai, claude, gemini, etc.
   * - ``Agent missing required field 'id'``
     - Add ``id: "agent-name"`` to each agent
   * - ``Backend missing required field 'model'``
     - Add ``model: "model-name"`` to backend
   * - ``Backend does not support web_search``
     - Remove ``enable_web_search`` or use a different backend
   * - ``Duplicate agent ID``
     - Ensure each agent has a unique ID
   * - ``Invalid permission_mode``
     - Use: approve, reject, or prompt
   * - ``V1 config format detected``
     - Migrate to V2 format (see :doc:`../reference/yaml_schema`)

See Also
--------

- :doc:`../reference/yaml_schema` - Complete configuration schema reference
- :doc:`../reference/configuration_examples` - Example configurations
- :doc:`../quickstart/configuration` - Getting started with configs


---

## user_guide/webui.rst

=============
Web UI Guide
=============

The Web UI provides a browser-based interface for real-time visualization
of multi-agent coordination. Watch agents work in parallel, vote, and
converge on solutions through an interactive graphical interface.

Why Use the WebUI?
------------------

The WebUI is ideal for:

* **Visual Monitoring** - See all agents working simultaneously with streaming output
* **Team Demos** - Show stakeholders how multi-agent coordination works
* **Workspace Browsing** - Explore files created by agents with syntax highlighting
* **Vote Analysis** - Understand consensus-building with animated visualizations
* **Timeline Review** - See the full coordination flow as a swimlane diagram

Starting the Web UI
-------------------

Launch the Web UI with the ``--web`` flag:

.. code-block:: bash

   # Basic usage with default config
   uv run massgen --web

   # With a specific configuration
   uv run massgen --web --config @examples/basic/multi/three_agents_default

   # Custom host and port
   uv run massgen --web --web-host 0.0.0.0 --web-port 3000

Open http://localhost:8000 (the default address) in your browser. If you set ``--web-host`` or ``--web-port``, use those values instead.

First-Time Setup
----------------

When you run ``uv run massgen --web`` for the first time, you'll be automatically
directed to the **Setup Page** to configure your environment:

**Step 1: API Keys**
   Enter API keys for the providers you want to use (OpenAI, Anthropic, Google, etc.).
   Keys can be saved globally (``~/.massgen/.env``) or locally (``./.env``).

**Step 2: Docker Setup**
   Check Docker availability and pull MassGen runtime images for isolated code execution.
   Docker is optional - you can skip this step if you prefer local execution mode.

**Step 3: Skills**
   View available skills that extend agent capabilities. Skills are enabled via YAML config.

After completing setup, click **Finish Setup** to proceed to the Quickstart Wizard
where you'll configure your first agent team.

.. note::
   You can return to the Setup Page anytime by clicking the **Setup** button in the header
   or navigating directly to ``http://localhost:8000/setup``.

Interface Overview
------------------

**Header Bar**
   Connection status indicator, config selector dropdown, and session controls.

**Agent Carousel**
   Real-time view of all agents and their responses. For 1-3 agents, cards
   display in a grid. For 4+ agents, cards display in a carousel with navigation arrows.

**Input Area**
   Question input field with Start/Cancel buttons. After completion,
   a follow-up input appears for multi-turn conversations.

**Status Toolbar**
   Quick access buttons for Answer Browser, Vote Browser, Workspace Browser,
   and Timeline View.

Quickstart Wizard
-----------------

If no configuration is set, the WebUI launches a guided setup wizard:

**Step 1: Docker Check**
   Detects if Docker is available for isolated code execution.

**Step 2: Agent Count**
   Choose how many agents to use (1-5). More agents provide more perspectives but take longer to run.

**Step 3: Setup Mode**
   - **Quick Setup** - Select a provider (OpenAI, Anthropic, Google, etc.) and model
   - **Browse Examples** - Choose from pre-built configuration templates
   - **Custom YAML** - Write your own configuration

**Step 4: Configuration**
   Review and customize the generated configuration. Set API keys if needed.

**Step 5: Preview & Save**
   Preview the final YAML and optionally save as your default config.

Starting a Session
------------------

1. Select a configuration from the dropdown (or use the one specified via ``--config``)
2. Enter your question in the input field
3. Click **Start** to begin coordination
4. Watch agents generate responses in real-time

Each agent card shows:

* Agent ID and model name
* Current status (waiting, working, voting, complete)
* Response content with syntax highlighting
* Round selector for viewing previous rounds

Display Modes
~~~~~~~~~~~~~

The UI automatically transitions through display modes:

* **Coordination** - Shows all agents in a carousel during the coordination phase
* **Final Streaming** - Shows only the winning agent streaming the final answer after consensus
* **Final Complete** - Full-screen view of the final answer with tabs for workspace and history

Viewing Results
---------------

**Answer Browser** (``A`` key)
   Browse all answers across coordination rounds. Filter by agent,
   expand individual answers, and see which answer was selected as final.

**Vote Browser** (``V`` key)
   View voting patterns and distribution. See which agents voted for
   which answers and their reasoning.

**Workspace Browser** (``W`` key)
   Browse files created by agents in their isolated workspaces.
   Select different agents and view file contents with rich artifact preview.
   Navigate directories, preview HTML/React/markdown files, and download as needed.

**Timeline View** (``T`` key)
   Visualize the coordination timeline as a swimlane diagram. See when
   answers and votes occurred and how the coordination flow progressed.
   Shows answer dependencies (which answers influenced later responses).

Final Answer View
~~~~~~~~~~~~~~~~~

After coordination completes, the final answer displays in a full-screen view with three tabs:

**Answer Tab**
   The complete final answer with full markdown rendering and syntax highlighting.

**Workspace Tab**
   Browse all files in the winning agent's workspace. View source code,
   configuration files, and any artifacts created during execution.

**Conversation Tab**
   Review the full conversation history including all turns in a multi-turn session.

Artifact Preview
~~~~~~~~~~~~~~~~

The WebUI supports rich artifact previews for files created by agents. When you click
on a file in the Workspace Browser or Final Answer workspace tab, the Artifact Preview
modal opens with intelligent rendering based on file type.

**Supported Artifact Types:**

.. list-table::
   :header-rows: 1
   :widths: 20 30 50

   * - Category
     - File Types
     - Preview Capabilities
   * - Interactive
     - ``.html``, ``.jsx``, ``.tsx``, ``.vue``
     - Live HTML preview, React/Vue components with Sandpack
   * - Diagrams
     - ``.mermaid``, ``.mmd``
     - Rendered flowcharts, sequence diagrams, and more
   * - Documents
     - ``.md``, ``.pdf``
     - Rendered markdown with styling, native PDF viewer
   * - Graphics
     - ``.svg``, ``.png``, ``.jpg``, ``.gif``, ``.webp``
     - SVG rendering, images with zoom and rotate controls
   * - Office
     - ``.docx``, ``.xlsx``, ``.pptx``
     - Word documents, Excel spreadsheets, PowerPoint slides
   * - Code
     - All other files
     - Syntax-highlighted source code (40+ languages)

**Preview vs Source Toggle:**
   For previewable file types, toggle between "Preview" and "Source" views using
   the buttons in the modal header. Preview shows the rendered artifact while
   Source shows syntax-highlighted code.

**Actions:**
   - **Copy** - Copy file contents to clipboard
   - **Download** - Download the file locally

**Accessing Artifact Preview:**

1. **During Execution** - Press ``W`` to open Workspace Browser, then click any file
2. **After Completion** - Go to Workspace tab in Final Answer view, click any file
3. **Answer Browser** - Press ``A``, go to Workspace tab, click any file

**Preview Indicators:**
   Files that support rich preview are highlighted in violet with an eye icon (👁) in the
   file browser. This makes it easy to identify which files can be previewed vs only
   viewed as source code.

.. tip::
   React and Vue components are rendered using Sandpack, providing a full bundled
   preview with live updates. This works best for standalone components.

Multi-Turn Conversations
------------------------

After a coordination run completes:

1. The follow-up input field appears below the final answer
2. Enter your follow-up question
3. Click **Continue** to start a new coordination round
4. View conversation history via the dropdown in the header

Context is preserved across turns, allowing agents to reference
previous answers and build on prior discussion.

Keyboard Shortcuts
------------------

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Key
     - Action
   * - ``A``
     - Open Answer Browser
   * - ``V``
     - Open Vote Browser
   * - ``W``
     - Open Workspace Browser
   * - ``T``
     - Open Timeline View
   * - ``?``
     - Show keyboard shortcuts help
   * - ``1-9``
     - Jump to agent by number
   * - ``Escape``
     - Close current modal

.. tip::
   Shortcuts are disabled when typing in input fields.

Automation Mode
---------------

Combine ``--web`` with ``--automation`` to auto-start a run and watch it in the
full Web UI:

.. code-block:: bash

   uv run massgen --web --automation --config config.yaml "Your question"

When ``--automation`` is combined with ``--web`` and a question:

* The coordination **auto-starts** as soon as the server is ready — no browser
  interaction needed to kick off the run.
* The **full Web UI** is available at the printed URL for monitoring progress,
  browsing answers, viewing the workspace, and inspecting votes.
* Server logs are suppressed to keep stdout clean.
* If no ``--config`` is specified, the default config is auto-resolved
  (same as running without ``--web``).

Use ``--no-browser`` to prevent auto-opening the browser (useful for servers
or when driving MassGen from another process).
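
When driving MassGen from another process, the command line can be assembled programmatically. The sketch below uses only flags documented in this guide; the helper names are hypothetical, and it assumes ``uv`` is on ``PATH``.

.. code-block:: python

   import subprocess
   from typing import List, Optional


   def build_web_automation_cmd(
       question: str,
       config: Optional[str] = None,
       open_browser: bool = False,
       port: Optional[int] = None,
   ) -> List[str]:
       """Build a `massgen --web --automation` command from documented flags."""
       cmd = ["uv", "run", "massgen", "--web", "--automation"]
       if not open_browser:
           cmd.append("--no-browser")
       if port is not None:
           cmd += ["--web-port", str(port)]
       if config is not None:
           cmd += ["--config", config]
       cmd.append(question)
       return cmd


   def launch(question: str, **kwargs) -> subprocess.Popen:
       """Start MassGen in the background; the caller owns the returned process."""
       return subprocess.Popen(build_web_automation_cmd(question, **kwargs))

Passing the command as a list avoids shell quoting problems when the question contains spaces or quotes.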

**CLI flags work with --web:**

Flags like ``--eval-criteria`` and ``--checklist-criteria-preset`` are forwarded
to the WebUI run, so this works as expected:

.. code-block:: bash

   # Custom evaluation criteria with WebUI monitoring
   uv run massgen --web --automation --no-browser \
     --eval-criteria criteria.json \
     --config config.yaml "Your question"

   # Use a built-in criteria preset
   uv run massgen --web --automation \
     --checklist-criteria-preset evaluation \
     --config config.yaml "Your question"

CLI Options
-----------

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Option
     - Description
   * - ``--web``
     - Enable Web UI mode
   * - ``--web-port PORT``
     - Server port (default: 8000)
   * - ``--web-host HOST``
     - Server host (default: 127.0.0.1)
   * - ``--no-browser``
     - Don't auto-open browser
   * - ``--eval-criteria FILE``
     - Path to JSON file with custom evaluation criteria (works with ``--web``)
   * - ``--checklist-criteria-preset NAME``
     - Use a built-in criteria preset (works with ``--web``)
   * - ``--orchestrator-timeout SECONDS``
     - Set orchestrator timeout (works with ``--web``)

**Examples:**

.. code-block:: bash

   # Interactive — opens browser, enter question in UI
   massgen --web --config @examples/basic/multi/three_agents_default

   # Auto-start — run begins immediately, open URL to watch
   massgen --web --automation --config config.yaml "Your question"

   # Headless auto-start — no browser, monitor via URL or status.json
   massgen --web --automation --no-browser --config config.yaml "Your question"

   # Custom port
   massgen --web --web-port 3000 --config config.yaml

   # Network access (listen on all interfaces)
   massgen --web --web-host 0.0.0.0 --config config.yaml

Troubleshooting
---------------

**Connection shows "Disconnected"**

* Verify the server is running (check terminal output for errors)
* Check if the port is already in use: ``lsof -i :8000``
* Try a different port with ``--web-port``

**Config dropdown is empty**

* Run ``massgen --init`` to create a default configuration
* Or specify a config directly with ``--config`` flag

**Browser compatibility**

The Web UI works best in modern browsers (Chrome, Firefox, Safari, Edge).
If you experience issues, try clearing browser cache or using incognito mode.

Architecture Notes
------------------

The WebUI consists of:

**Frontend** (React + TypeScript)
   Single-page application using Zustand for state management, Framer Motion
   for animations, Shiki for syntax highlighting, and Sandpack for live code preview.
   Artifact preview uses Mermaid for diagrams, Marked for markdown, and Mammoth/SheetJS
   for Office document rendering.

**Backend** (FastAPI + WebSockets)
   REST API for configuration and session management, WebSocket for real-time
   event streaming during coordination.

**Communication**
   WebSocket connection at ``ws://localhost:8000/ws/{session_id}`` provides
   bidirectional real-time communication. Events include agent content updates,
   vote casts, consensus reached, and final answer streaming.
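
A client that listens to this WebSocket might look like the sketch below. It relies on the third-party ``websockets`` package and assumes events arrive as JSON text frames, which this guide does not specify; the function names are hypothetical.

.. code-block:: python

   import asyncio
   import json


   def ws_url(session_id: str, host: str = "localhost", port: int = 8000) -> str:
       """Build the session WebSocket URL in the form documented above."""
       return f"ws://{host}:{port}/ws/{session_id}"


   async def print_events(session_id: str) -> None:
       """Print each event for a session until the server closes the socket."""
       import websockets  # third-party: pip install websockets

       async with websockets.connect(ws_url(session_id)) as ws:
           async for message in ws:
               try:
                   event = json.loads(message)  # assumption: JSON-encoded frames
               except (TypeError, ValueError):
                   event = message  # fall back to the raw frame
               print(event)


   # Usage (needs a running server and a real session ID):
   # asyncio.run(print_events("my-session-id"))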

See Also
--------

* :doc:`sessions/multi_turn_mode` - Multi-turn conversations in CLI
* :doc:`../quickstart/running-massgen` - All running modes
* :doc:`../reference/cli` - CLI reference
* :doc:`concepts` - Core multi-agent concepts