Backend Configuration#

Backends connect MassGen agents to AI model providers. Each backend is configured in YAML and provides specific capabilities like web search, code execution, and file operations.

Overview#

Each agent in MassGen requires a backend configuration that specifies:

  • Provider: Which AI service to use (OpenAI, Claude, Gemini, etc.)

  • Model: Which specific model within that provider

  • Capabilities: Which built-in tools are enabled

  • Parameters: Model settings like temperature, max_tokens, etc.

Available Backends#

Backend Types#

MassGen supports these backend types (set via the type field in each agent's backend section):

Backend Type     Provider          Models
openai           OpenAI            GPT-5, GPT-5-mini, GPT-5-nano, GPT-4, GPT-4o
claude           Anthropic         Claude Haiku 3.5, Claude Sonnet 4, Claude Opus 4
claude_code      Anthropic (SDK)   Claude Sonnet 4, Claude Opus 4 (with dev tools)
codex            OpenAI (CLI)      GPT-5.4, GPT-5.3-Codex, GPT-5.2-Codex, GPT-5.1-Codex, GPT-4.1
gemini           Google            Gemini 2.5 Flash, Gemini 2.5 Pro
gemini_cli       Google (CLI)      Gemini 3 and Gemini 2.5 models (via Gemini CLI)
grok             xAI               Grok-4, Grok-3, Grok-3-mini
azure_openai     Microsoft Azure   GPT-4, GPT-4o, GPT-5 (Azure deployments)
zai              ZhipuAI           GLM-4.5
ag2              AG2 Framework     Any AG2-compatible agent
lmstudio         LM Studio         Local open-source models
copilot          GitHub Copilot    GPT-5-mini, GPT-4.1, Claude Sonnet 4, Gemini 2.5 Pro
inference        vLLM / SGLang     Any locally served model
chatcompletion   Generic           Any OpenAI-compatible API
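A mistyped type value is an easy error to make. The sketch below is a hypothetical pre-flight check, not part of MassGen: it compares a value against the backend types listed in the table above and suggests the closest match. The function name check_backend_type and the ALIASES table are assumptions for illustration; the only alias included is gemini-cli, which this page mentions in the Gemini CLI section.

```python
import difflib
from typing import Optional

# Backend types copied from the table above (not read from MassGen itself).
KNOWN_BACKEND_TYPES = [
    "openai", "claude", "claude_code", "codex", "gemini", "gemini_cli",
    "grok", "azure_openai", "zai", "ag2", "lmstudio", "copilot",
    "inference", "chatcompletion",
]

# Aliases mentioned on this page; extend by hand if your version accepts more.
ALIASES = {"gemini-cli": "gemini_cli"}


def check_backend_type(value: str) -> Optional[str]:
    """Return None when the type is recognised, otherwise a hint string."""
    value = ALIASES.get(value, value)
    if value in KNOWN_BACKEND_TYPES:
        return None
    # Suggest the single closest known type, if any is similar enough.
    close = difflib.get_close_matches(value, KNOWN_BACKEND_TYPES, n=1)
    hint = f" Did you mean '{close[0]}'?" if close else ""
    return f"Unknown backend type '{value}'.{hint}"


print(check_backend_type("openai"))
print(check_backend_type("openia"))
```

Fuzzy matching only catches near-misses such as transposed letters; a value like "gpt" has no close neighbour and is reported without a suggestion.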

Backend Capabilities#

Different backends support different built-in tools:

Backend Tool Support#

Capabilities tracked per backend: Web Search, Code Execution, Bash/Shell, Image, Audio, Video, MCP Support, Filesystem, Custom Tools. Multimodal support by backend:

Backend          Image      Audio      Video
openai           ⭐ Both    ⭐ Both    ⭐ Generation
claude           🔧         🔧         🔧
claude_code      🔧         🔧         🔧
codex            🔧         🔧         🔧
copilot          🔧         🔧         🔧
gemini           🔧         🔧         🔧
gemini_cli       🔧         🔧         🔧
grok             🔧         🔧         🔧
azure_openai     ⭐ Both
chatcompletion   🔧         🔧         🔧
lmstudio         🔧         🔧         🔧
zai              🔧         🔧         🔧
inference        🔧         🔧         🔧
ag2

Notes:

  • Symbol Legend:

    • ⭐ Built-in - Native backend feature (e.g., Anthropic’s web search, OpenAI’s native image API, Claude Code’s Bash tool)

    • 🔧 Via Custom Tools - Available through custom tools (requires OPENAI_API_KEY for multimodal understanding)

    • ✅ MCP-based or Available - Feature available via MCP integration or standard capability

    • ❌ Not available - Feature not supported

  • Custom Tools:

    • Custom tools allow you to give agents access to your own Python functions

    • Most backends support custom tools (OpenAI, Claude, Claude Code, Codex, Copilot, Gemini, Grok, Chat Completions, LM Studio, ZAI, Inference)

    • Azure OpenAI and AG2 do not support custom tools as they inherit from the base backend class without the custom tools layer

    • Custom tools are essential for multimodal understanding features (understand_image, understand_video, understand_audio, understand_file)

    • See Custom Tools for complete documentation on creating and using custom tools

  • Code Execution vs Bash/Shell:

    Warning

    Common Confusion: enable_code_execution and enable_code_interpreter run code in the provider’s sandbox (cloud environment) with NO access to your local filesystem. If you need agents to read/write files in your project, use MCP-based bash instead.

    • Code Execution (⭐): Backend provider’s native code execution tool (runs in provider sandbox - no access to MassGen workspaces)

      • openai: OpenAI code interpreter for calculations and data analysis

      • claude: Anthropic’s code execution tool

      • gemini: Google’s code execution tool

      • azure_openai: Azure OpenAI code interpreter

      • ag2: AG2 framework code executors (Local, Docker, Jupyter, Cloud)

      • When to use: Quick calculations, data analysis, isolated code snippets that don’t need filesystem access

    • Bash/Shell: MassGen-level feature with direct workspace access

      • ⭐ (claude_code, codex, gemini_cli): Native shell/bash tools built into the backend CLI/SDK

      • ✅ (all MCP-enabled backends): Universal bash/shell via enable_mcp_command_line: true

      • When to use: Code that needs to interact with your project files, run tests, execute scripts

      • See Code Execution for detailed setup and comparison

    • Recommendation: Choose one approach based on your needs. Use built-in code execution for isolated computational tasks, and MCP bash/shell for operations that need to affect your workspace files.

  • Filesystem:

    • ⭐ (claude_code, codex, gemini_cli): Native filesystem tools via CLI/SDK (Read, Write, Edit, Bash, etc.)

    • ✅ (all backends with cwd parameter): Filesystem operations handled automatically through workspace configuration

    • See File Operations & Workspace Management for detailed filesystem configuration

  • Multimodal Capabilities:

    • ⭐ Native Multimodal Support: The backend/model API directly handles multimodal content

      • ⭐ Both (e.g., openai, azure_openai): Native API supports BOTH understanding (analyze) AND generation (create)

      • ⭐ Generation (e.g., openai video): Can create videos via Sora-2 API but not analyze them

    • 🔧 Via Custom Tools: Multimodal understanding through custom tools (understand_image, understand_video, understand_audio)

      • Works with any backend that supports custom tools

      • Requires OPENAI_API_KEY in .env file (tools use OpenAI’s API for processing)

      • Examples: claude, claude_code, gemini, grok, chatcompletion, lmstudio, inference

      • Does NOT work with azure_openai or ag2 (these backends don’t support custom tools)

      • See Multimodal Capabilities for complete setup instructions

    • Understanding vs Generation:

      • Understanding: Analyze existing content (images, audio, video)

      • Generation: Create new content from text prompts

      • Both: Supports both understanding AND generation

See Supported Models & Backends for the complete backend capabilities reference.

Configuring Backends#

Basic Backend Configuration#

Every agent needs a backend section in the YAML configuration:

agents:
  - id: "my_agent"
    backend:
      type: "openai"          # Backend type (required)
      model: "gpt-5-nano"     # Model name (required)
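The two required fields can be checked before a run. The following is a minimal sketch, assuming the YAML has already been parsed into a Python dict (for example with a YAML library of your choice); the function name missing_backend_fields is hypothetical, and the ag2 exemption is an assumption based on the AG2 example on this page, which has no top-level model key.

```python
from typing import Any, Dict, List

# Assumption: ag2 backends carry their model inside llm_config, as in the
# AG2 example on this page, so they are exempt from the "model" check.
MODEL_OPTIONAL_TYPES = {"ag2"}


def missing_backend_fields(config: Dict[str, Any]) -> List[str]:
    """List agents whose backend section lacks a type or model."""
    problems = []
    for index, agent in enumerate(config.get("agents") or []):
        agent_id = agent.get("id", f"<agent #{index}>")
        backend = agent.get("backend")
        if not isinstance(backend, dict):
            problems.append(f"{agent_id}: missing backend section")
            continue
        backend_type = backend.get("type")
        if not backend_type:
            problems.append(f"{agent_id}: backend.type is required")
        if not backend.get("model") and backend_type not in MODEL_OPTIONAL_TYPES:
            problems.append(f"{agent_id}: backend.model is required")
    return problems


example = {
    "agents": [
        {"id": "my_agent", "backend": {"type": "openai", "model": "gpt-5-nano"}},
        {"id": "broken", "backend": {"type": "openai"}},
    ]
}
print(missing_backend_fields(example))
```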

Backend-Specific Examples#

OpenAI Backend#

Basic Configuration:

agents:
  - id: "gpt_agent"
    backend:
      type: "openai"
      model: "gpt-5-nano"
      enable_web_search: true
      enable_code_interpreter: true

With Reasoning Parameters:

agents:
  - id: "reasoning_agent"
    backend:
      type: "openai"
      model: "gpt-5-nano"
      text:
        verbosity: "medium"      # low, medium, high
      reasoning:
        effort: "high"            # low, medium, high
        summary: "auto"           # auto, concise, detailed

Supported Models: GPT-5, GPT-5-mini, GPT-5-nano, GPT-4, GPT-4o, GPT-4-turbo, GPT-3.5-turbo

Claude Backend#

Basic Configuration:

agents:
  - id: "claude_agent"
    backend:
      type: "claude"
      model: "claude-sonnet-4"
      enable_web_search: true
      enable_code_execution: true

With MCP Integration:

agents:
  - id: "claude_mcp"
    backend:
      type: "claude"
      model: "claude-sonnet-4"
      mcp_servers:
        - name: "weather"
          type: "stdio"
          command: "npx"
          args: ["-y", "@modelcontextprotocol/server-weather"]

Supported Models: claude-haiku-4-5-20251001, claude-sonnet-4-5-20250929, claude-opus-4-1-20250805, claude-sonnet-4-20250514, claude-3-5-sonnet-latest, claude-3-5-haiku-latest

Claude Code Backend#

With Workspace Configuration:

agents:
  - id: "code_agent"
    backend:
      type: "claude_code"
      model: "claude-sonnet-4"
      cwd: "workspace"           # Working directory for file operations

orchestrator:
  snapshot_storage: "snapshots"
  agent_temporary_workspace: "temp_workspaces"

Authentication:

The Claude Code backend supports flexible authentication:

  • API key: Set CLAUDE_CODE_API_KEY or ANTHROPIC_API_KEY environment variable

  • Subscription: If no API key is set, uses Claude subscription authentication

This allows you to use Claude Code with a subscription while using a separate API key for standard Claude backend agents.

Special Features:

  • Native file operations (Read, Write, Edit, Bash, Grep, Glob)

  • Workspace isolation

  • Snapshot sharing between agents

  • Full development tool suite

Codex Backend#

Basic Configuration:

agents:
  - id: "codex_agent"
    backend:
      type: "codex"
      model: "gpt-5.4"
      cwd: "workspace"

Authentication:

The Codex backend supports flexible authentication:

  • API key: Set OPENAI_API_KEY environment variable

  • ChatGPT subscription: If no API key, uses OAuth via codex login

Supported Models: gpt-5.4 (default), gpt-5.3-codex, gpt-5.2-codex, gpt-5.1-codex, gpt-5-codex, gpt-4.1

Reasoning Effort Configuration:

agents:
  - id: "codex_reasoning"
    backend:
      type: "codex"
      model: "gpt-5.4"
      model_reasoning_effort: "xhigh"  # low | medium | high | xhigh
      # reasoning:
      #   effort: "xhigh"            # OpenAI-style alias (also supported)

If both model_reasoning_effort and reasoning.effort are provided, model_reasoning_effort takes precedence.
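The precedence rule can be expressed in a few lines. This is an illustrative sketch of the rule as stated above, not the Codex backend's actual code; the function name effective_reasoning_effort is hypothetical and the input is the backend section as a plain dict.

```python
from typing import Any, Dict, Optional


def effective_reasoning_effort(backend: Dict[str, Any]) -> Optional[str]:
    """Resolve the effort level, preferring model_reasoning_effort."""
    explicit = backend.get("model_reasoning_effort")
    if explicit is not None:
        return explicit
    # Fall back to the OpenAI-style alias: reasoning.effort
    reasoning = backend.get("reasoning") or {}
    return reasoning.get("effort")


both = {"model_reasoning_effort": "xhigh", "reasoning": {"effort": "low"}}
alias_only = {"reasoning": {"effort": "medium"}}
print(effective_reasoning_effort(both))
print(effective_reasoning_effort(alias_only))
```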

Special Features:

  • Native shell and file operations via Codex CLI

  • Web search capability

  • Session persistence and resumption

  • MCP server support via workspace config

Warning

Sandbox Limitation: Codex uses OS-level sandboxing (Seatbelt/Landlock) which only restricts writes, NOT reads. Codex can read any file on the filesystem. For security-sensitive workloads, use Docker mode or consider Claude Code instead. See Native Tool Backends for details.

Recommended: Docker Mode for Security:

agents:
  - id: "secure_codex"
    backend:
      type: "codex"
      model: "gpt-5.4"
      cwd: "workspace"
      enable_mcp_command_line: true
      command_line_execution_mode: "docker"
      command_line_docker_network_mode: "bridge"  # Required for Codex

Gemini CLI Backend#

The gemini_cli backend (alias: gemini-cli) wraps Google’s Gemini CLI (@google/gemini-cli) for local or Docker execution.

Basic Configuration (Local):

agents:
  - id: "gemini_cli_agent"
    backend:
      type: "gemini_cli"
      model: "gemini-2.5-pro"
      cwd: "workspace"

Authentication:

  • CLI login: Run gemini interactively to login with Google (preferred)

  • API key: Set GOOGLE_API_KEY or GEMINI_API_KEY environment variable

Installation: npm install -g @google/gemini-cli

Docker Mode: Requires command_line_docker_network_mode: "bridge". Add @google/gemini-cli to command_line_docker_packages.preinstall.npm or use an image with Gemini CLI pre-installed.

Supported Models: gemini-2.5-pro (default), gemini-2.5-flash, gemini-2.5-flash-lite, gemini-3-flash-preview, gemini-3-pro-preview, gemini-3.1-pro-preview

Example configs: massgen/configs/providers/gemini/gemini_cli_local.yaml, gemini_cli_docker.yaml

GitHub Copilot Backend#

Prerequisites:

  1. An active GitHub Copilot subscription

  2. Install the Copilot CLI:

    # macOS / Linux
    brew install copilot-cli
    
    # npm (all platforms)
    npm install -g @github/copilot
    
    # Windows
    winget install GitHub.Copilot
    
  3. Authenticate — run copilot and use the /login slash command, or set a GH_TOKEN / GITHUB_TOKEN environment variable with a fine-grained PAT that has the Copilot Requests permission.

Basic Configuration:

agents:
  - id: "copilot-assistant"
    backend:
      type: "copilot"
      model: "gpt-5-mini"

Supported Models: gpt-5-mini (default), gpt-4.1, claude-sonnet-4, gemini-2.5-pro

Special Features:

  • No API key required — authentication is handled through your GitHub subscription

  • Web search capability

  • MCP server support

  • Session persistence and resumption

Gemini Backend#

Basic Configuration:

agents:
  - id: "gemini_agent"
    backend:
      type: "gemini"
      model: "gemini-2.5-flash"
      enable_web_search: true
      enable_code_execution: true

With Safety Settings:

agents:
  - id: "safe_gemini"
    backend:
      type: "gemini"
      model: "gemini-2.5-pro"
      safety_settings:
        HARM_CATEGORY_HARASSMENT: "BLOCK_MEDIUM_AND_ABOVE"
        HARM_CATEGORY_HATE_SPEECH: "BLOCK_MEDIUM_AND_ABOVE"

Supported Models: gemini-2.5-flash, gemini-2.5-pro, gemini-2.5-flash-thinking

Grok Backend#

Basic Configuration:

agents:
  - id: "grok_agent"
    backend:
      type: "grok"
      model: "grok-3-mini"
      enable_web_search: true

Supported Models: grok-4, grok-4-fast, grok-3, grok-3-mini

Azure OpenAI Backend#

Configuration:

agents:
  - id: "azure_agent"
    backend:
      type: "azure_openai"
      model: "gpt-4"
      deployment_name: "my-gpt4-deployment"
      api_version: "2024-02-15-preview"

Required Environment Variables:

AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=YOUR-AZURE-OPENAI-API-VERSION

AG2 Backend#

Configuration:

agents:
  - id: "ag2_agent"
    backend:
      type: "ag2"
      agent_type: "ConversableAgent"
      llm_config:
        config_list:
          - model: "gpt-4"
            api_key: "${OPENAI_API_KEY}"
      code_execution_config:
        executor: "local"
        work_dir: "coding"

See General Framework Interoperability for detailed AG2 configuration.

LM Studio Backend#

For Local Models:

agents:
  - id: "local_agent"
    backend:
      type: "lmstudio"
      model: "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"
      port: 1234

Features:

  • Automatic LM Studio CLI installation

  • Auto-download and loading of models

  • Zero-cost usage

  • Full privacy (local inference)

OpenRouter Backend#

OpenRouter provides unified access to multiple AI providers through a single API. Use the chatcompletion backend type with OpenRouter’s base URL.

Basic Configuration:

agents:
  - id: "openrouter_agent"
    backend:
      type: "chatcompletion"
      model: "openai/gpt-5-mini"
      base_url: "https://openrouter.ai/api/v1"

With Reasoning Tokens:

OpenRouter normalizes reasoning tokens across providers. Configure reasoning for models that support it (OpenAI o-series, GPT-5, Claude 3.7+, Gemini 2.5+, DeepSeek R1, Grok):

agents:
  - id: "reasoning_agent"
    backend:
      type: "chatcompletion"
      model: "openai/gpt-5-mini"
      base_url: "https://openrouter.ai/api/v1"
      reasoning:
        effort: "medium"       # xhigh, high, medium, low, minimal, none
        max_tokens: 2000       # Optional: direct token limit (Anthropic-style)
        exclude: false         # Optional: set true to hide reasoning from response

With Web Search:

agents:
  - id: "search_agent"
    backend:
      type: "chatcompletion"
      model: "openai/gpt-5-mini"
      base_url: "https://openrouter.ai/api/v1"
      enable_web_search: true
      engine: "exa"            # exa (AI-native) or native (traditional)
      max_results: 10
      search_context_size: "high"  # low, medium, high

Reasoning Effort Levels:

  • xhigh: ~95% of max_tokens for reasoning

  • high: ~80% of max_tokens for reasoning

  • medium: ~50% of max_tokens for reasoning (default)

  • low: ~20% of max_tokens for reasoning

  • minimal: ~10% of max_tokens for reasoning

  • none: Disable reasoning entirely
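To get a feel for what these levels mean in tokens, the sketch below turns the approximate percentages listed above into a budget estimate. It is a back-of-the-envelope helper, not an OpenRouter or MassGen API; the name approx_reasoning_budget is hypothetical, and the real allocation is decided by the provider.

```python
# Approximate share of max_tokens spent on reasoning, in percent,
# taken from the list above.
EFFORT_PERCENT = {
    "xhigh": 95,
    "high": 80,
    "medium": 50,
    "low": 20,
    "minimal": 10,
    "none": 0,
}


def approx_reasoning_budget(max_tokens: int, effort: str = "medium") -> int:
    """Estimate how many of max_tokens may go to reasoning."""
    if effort not in EFFORT_PERCENT:
        raise ValueError(f"unknown effort level: {effort!r}")
    # Integer arithmetic avoids floating-point rounding surprises.
    return max_tokens * EFFORT_PERCENT[effort] // 100


for level in EFFORT_PERCENT:
    print(level, approx_reasoning_budget(4096, level))
```

Because reasoning tokens count as output tokens, a high effort level with a small max_tokens leaves little room for the visible answer.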

Environment Variable:

OPENROUTER_API_KEY=your-openrouter-api-key

Note

Reasoning tokens are output tokens and billed accordingly. Models automatically include reasoning in responses when appropriate. Use exclude: true if you want the model to reason internally without returning the reasoning text.

Local Inference Backends (vLLM & SGLang)#

Unified Inference Backend (v0.0.24-v0.0.25)

MassGen supports high-performance local model serving through vLLM and SGLang with automatic server detection:

agents:
  - id: "local_vllm"
    backend:
      type: "chatcompletion"
      model: "meta-llama/Llama-3.1-8B-Instruct"
      base_url: "http://localhost:8000/v1"    # vLLM default port
      api_key: "EMPTY"

  - id: "local_sglang"
    backend:
      type: "chatcompletion"
      model: "meta-llama/Llama-3.1-8B-Instruct"
      base_url: "http://localhost:30000/v1"   # SGLang default port
      api_key: "${SGLANG_API_KEY}"

Auto-Detection:

  • vLLM: Default port 8000

  • SGLang: Default port 30000

  • Automatically detects server type based on configuration

  • Unified InferenceBackend class handles both
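The default-port convention above can be illustrated with a small helper. This sketch only guesses the server from the port in base_url; it is an assumption about how such a heuristic could look, not a description of how InferenceBackend detects the server. The name guess_server is hypothetical.

```python
from urllib.parse import urlparse

# Default ports listed above.
DEFAULT_PORTS = {8000: "vllm", 30000: "sglang"}


def guess_server(base_url: str) -> str:
    """Guess the serving engine from the port in base_url."""
    port = urlparse(base_url).port  # None when the URL has no explicit port
    return DEFAULT_PORTS.get(port, "unknown")


print(guess_server("http://localhost:8000/v1"))
print(guess_server("http://localhost:30000/v1"))
print(guess_server("https://openrouter.ai/api/v1"))
```

A server started on a non-default port falls through to "unknown", so a port-based guess is only a convenience and should not replace explicit configuration.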

SGLang-Specific Parameters:

backend:
  type: "chatcompletion"
  model: "meta-llama/Llama-3.1-8B-Instruct"
  base_url: "http://localhost:30000/v1"
  separate_reasoning: true        # SGLang: return reasoning separately from the answer
  top_k: 50                        # Sampling parameter
  repetition_penalty: 1.1          # Prevent repetition

Mixed Deployments:

Run both vLLM and SGLang simultaneously:

agents:
  - id: "vllm_agent"
    backend:
      type: "chatcompletion"
      model: "Qwen/Qwen2.5-7B-Instruct"
      base_url: "http://localhost:8000/v1"
      api_key: "EMPTY"

  - id: "sglang_agent"
    backend:
      type: "chatcompletion"
      model: "Qwen/Qwen2.5-7B-Instruct"
      base_url: "http://localhost:30000/v1"
      api_key: "${SGLANG_API_KEY}"
      separate_reasoning: true

Benefits of Local Inference:

  • Cost Savings: Zero API costs after initial setup

  • Privacy: No data sent to external services

  • Control: Full control over model selection and parameters

  • Performance: Optimized for high-throughput inference

  • Customization: Fine-tune models for specific use cases

Setup vLLM Server:

# Install vLLM
pip install vllm

# Start vLLM server
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --host 0.0.0.0 \
  --port 8000

Setup SGLang Server:

# Install SGLang
pip install "sglang[all]"

# Start SGLang server
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --host 0.0.0.0 \
  --port 30000

Configuration Example:

See @examples/basic/multi/two_qwen_vllm_sglang.yaml for a complete mixed deployment example.

Common Backend Parameters#

Model Parameters#

All backends support these common parameters:

backend:
  type: "openai"
  model: "gpt-5-nano"

  # Generation parameters
  temperature: 0.7           # Randomness (0.0-2.0, default 0.7)
  max_tokens: 4096           # Maximum response length
  top_p: 1.0                 # Nucleus sampling (0.0-1.0)

  # API configuration
  api_key: "${OPENAI_API_KEY}"  # Optional - uses env var by default
  timeout: 60                    # Request timeout in seconds
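A quick range check can catch out-of-bounds values before they reach the provider. The sketch below uses the ranges given in the comments above (temperature 0.0-2.0, top_p 0.0-1.0) and assumes max_tokens and timeout must be positive; individual providers may enforce narrower limits. The function name check_generation_params is hypothetical.

```python
from typing import Any, Dict, List


def check_generation_params(backend: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in common generation parameters."""
    problems = []

    def in_range(name: str, low: float, high: float) -> None:
        value = backend.get(name)
        if value is None:
            return  # parameter not set, the backend default applies
        if not isinstance(value, (int, float)) or not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}, got {value!r}")

    def positive(name: str) -> None:
        value = backend.get(name)
        if value is None:
            return
        if not isinstance(value, (int, float)) or value <= 0:
            problems.append(f"{name} must be positive, got {value!r}")

    in_range("temperature", 0.0, 2.0)
    in_range("top_p", 0.0, 1.0)
    positive("max_tokens")
    positive("timeout")
    return problems


print(check_generation_params({"temperature": 0.7, "max_tokens": 4096, "top_p": 1.0, "timeout": 60}))
print(check_generation_params({"temperature": 2.5, "top_p": 1.2}))
```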

Tool Configuration#

Enable or disable built-in tools:

backend:
  type: "gemini"
  model: "gemini-2.5-flash"

  # Enable tools
  enable_web_search: true
  enable_code_execution: true

  # MCP servers (see MCP Integration guide)
  mcp_servers:
    - name: "server_name"
      type: "stdio"
      command: "npx"
      args: ["..."]

Multi-Backend Configurations#

Using Different Backends#

Each agent can use a different backend:

agents:
  - id: "fast_researcher"
    backend:
      type: "gemini"
      model: "gemini-2.5-flash"
      enable_web_search: true

  - id: "deep_analyst"
    backend:
      type: "openai"
      model: "gpt-5"
      reasoning:
        effort: "high"

  - id: "code_expert"
    backend:
      type: "claude_code"
      model: "claude-sonnet-4"
      cwd: "workspace"

This is the recommended approach - use each backend’s strengths:

  • Gemini 2.5 Flash: Fast research with web search

  • GPT-5: Advanced reasoning and analysis

  • Claude Code: Development with file operations

Backend Selection Guide#

Choosing the Right Backend#

Consider these factors when selecting backends:

For Research Tasks:

  • Gemini 2.5 Flash: Fast, cost-effective, excellent web search

  • GPT-5-nano: Good reasoning with web search

  • Grok: Real-time information access

For Coding Tasks:

  • Claude Code: Best for file operations, full dev tools

  • GPT-5: Advanced code generation with reasoning

  • Gemini 2.5 Pro: Complex code analysis

For Analysis Tasks:

  • GPT-5: Deep reasoning and complex analysis

  • Claude Sonnet 4: Long context, detailed analysis

  • Gemini 2.5 Pro: Comprehensive multimodal analysis

For Cost-Sensitive Tasks:

  • GPT-5-nano: Low-cost OpenAI model

  • Grok-3-mini: Fast and affordable

  • Gemini 2.5 Flash: Very cost-effective

  • LM Studio: Free (local inference)

For Privacy-Sensitive Tasks:

  • LM Studio: Fully local, no data sharing

  • Azure OpenAI: Enterprise security

  • Self-hosted vLLM: Private cloud deployment

Native Tool Backends (Claude Code, Codex & Gemini CLI)#

MassGen supports three “native tool” agent backends that wrap CLI/SDK tools rather than just API calls: Claude Code (Anthropic’s Claude Code SDK), Codex (OpenAI’s Codex CLI), and Gemini CLI (Google’s Gemini CLI). None of the three requires an API key: each can authenticate through its own login flow (Claude subscription, codex login, or gemini login), although API keys are also accepted as described in the backend sections above. They come with built-in filesystem and shell tools, which gives a more integrated development experience but different security characteristics than API-only backends.

Architecture Differences#

Native Tool Backends vs API Backends#

Each aspect below compares agent backends (Claude Code, Codex, Gemini CLI) with API backends (OpenAI, Claude, Gemini, etc.).

Tool Execution

  • Agent backends: Native tools (Read, Write, Bash) run locally via CLI/SDK

  • API backends: Tools run via MassGen’s MCP servers

Permission Control

  • Agent backends: Backend’s own sandbox + limited MassGen hooks

  • API backends: Full MassGen PathPermissionManager control

Filesystem Access

  • Agent backends: Direct local filesystem access

  • API backends: Controlled through MCP filesystem tools

State Management

  • Agent backends: Stateful (session persistence, conversation history)

  • API backends: Stateless (each call is independent)

Authentication

  • Agent backends: CLI login (no API key required)

  • API backends: API key required

Agent Backend Comparison#

Claude Code vs Codex vs Gemini CLI#

Provider

  • Claude Code: Anthropic (Claude Code SDK)

  • Codex: OpenAI (Codex CLI)

  • Gemini CLI: Google (Gemini CLI)

Authentication

  • Claude Code: Subscription or CLAUDE_CODE_API_KEY; no API key needed

  • Codex: codex login OAuth; no API key needed

  • Gemini CLI: gemini CLI login (Google account); no API key needed

Models

  • Claude Code: Claude Sonnet 4, Claude Opus 4

  • Codex: GPT-5.4, GPT-5.3-Codex, GPT-5.2-Codex, GPT-5.1-Codex

  • Gemini CLI: gemini-2.5-pro, gemini-2.5-flash, gemini-3.1-pro-preview

Native Tools

  • Claude Code: Read, Write, Edit, Bash, Grep, Glob, WebSearch, WebFetch

  • Codex: shell, apply_patch, web_search, image_view

  • Gemini CLI: ReadFile, WriteFile, RunShellCommand, WebSearch, WebFetch

MCP Support

  • Claude Code: Yes (SDK-native)

  • Codex: Yes (via .codex/config.toml)

  • Gemini CLI: Yes (via .gemini/settings.json)

Sandbox Type

  • Claude Code: SDK permission hooks

  • Codex: OS-level (Seatbelt on macOS, Landlock on Linux)

  • Gemini CLI: Process-level (workspace isolation)

Read Restrictions

  • Claude Code: Yes - SDK hooks block reads outside allowed paths

  • Codex: No - OS sandbox only restricts writes

  • Gemini CLI: Yes - workspace-scoped

Write Restrictions

  • Claude Code: Yes - SDK hooks enforce write permissions

  • Codex: Yes - OS sandbox restricts writes to writable_roots

  • Gemini CLI: Yes - workspace-scoped

Warning

Codex Sandbox Limitation: Codex uses OS-level sandboxing (Seatbelt on macOS, Landlock on Linux) which only restricts writes, NOT reads. This means Codex can read any file on the filesystem, including sensitive files outside the workspace and context_paths (SSH keys, credentials, environment files, etc.).

MassGen’s permission hooks cannot intercept Codex’s native tool calls because they run directly through the Codex CLI’s internal tools.

Security Recommendations#

For security-sensitive workloads, prefer Docker mode which provides full filesystem isolation via container boundaries:

# Recommended: Docker mode for Codex with sensitive data
agents:
  - id: "secure_codex"
    backend:
      type: "codex"
      model: "gpt-5.4"
      cwd: "workspace"
      enable_mcp_command_line: true
      command_line_execution_mode: "docker"
      command_line_docker_network_mode: "bridge"  # Required for Codex
      command_line_docker_enable_sudo: true

Important

Codex in Docker mode requires command_line_docker_network_mode: "bridge". Without this setting, Codex will fail to execute. The validator will check for this.
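A check along these lines is easy to run yourself before starting a session. The sketch below is hypothetical and is not MassGen's validator; it encodes only what this page states, namely that codex and gemini_cli need the bridge network mode when running in Docker. The function name docker_network_problems is an assumption.

```python
from typing import Any, Dict, List

# Backends that this page says require bridge networking in Docker mode.
BRIDGE_REQUIRED_TYPES = {"codex", "gemini_cli"}


def docker_network_problems(backend: Dict[str, Any]) -> List[str]:
    """Flag Docker-mode backends that are missing the bridge network mode."""
    if backend.get("command_line_execution_mode") != "docker":
        return []  # not in Docker mode, nothing to check
    backend_type = backend.get("type")
    if backend_type not in BRIDGE_REQUIRED_TYPES:
        return []
    mode = backend.get("command_line_docker_network_mode")
    if mode == "bridge":
        return []
    return [
        f"{backend_type} in Docker mode needs "
        f'command_line_docker_network_mode: "bridge" (found {mode!r})'
    ]


secure_codex = {
    "type": "codex",
    "command_line_execution_mode": "docker",
    "command_line_docker_network_mode": "bridge",
}
print(docker_network_problems(secure_codex))
print(docker_network_problems({"type": "codex", "command_line_execution_mode": "docker"}))
```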

In Docker mode:

  • The container itself is the sandbox - Codex’s native tools can only access what’s mounted

  • Host filesystem is fully isolated from the agent

  • ~/.codex/ is mounted read-only for OAuth token access

  • The Codex CLI runs with --sandbox danger-full-access since the container provides isolation

When Docker is not available, consider:

  1. Use Claude Code or Gemini CLI instead - Both provide read/write restrictions via their own permission model

  2. Limit context_paths - Only grant access to directories that need agent access

  3. Avoid sensitive data - Don’t run Codex in directories with credentials or secrets

  4. Use API-only backends - For maximum control, use openai or claude backends with MCP tools

Backend Configuration Best Practices#

  1. Start with defaults: Test with default parameters before tuning

  2. Use environment variables: Never hardcode API keys

  3. Match backend to task: Use each backend’s strengths

  4. Enable only needed tools: Disable unused capabilities

  5. Set appropriate timeouts: Longer timeouts for complex tasks

  6. Monitor costs: Track API usage across backends

  7. Test configurations: Verify settings before production use

Advanced Backend Configuration#

For detailed backend-specific parameters, see the guides referenced below.

MCP Integration#

See MCP Integration for:

  • Adding MCP servers to backends

  • Tool filtering (allowed_tools, exclude_tools)

  • Planning mode configuration (v0.0.29)

  • HTTP-based MCP servers

File Operations#

See File Operations & Workspace Management for:

  • Workspace configuration

  • Snapshot storage

  • Permission management

  • Cross-agent file sharing

Troubleshooting#

Backend not found:

Ensure the backend type is correct:

# Correct backend types
type: "openai"         # ✅
type: "claude"         # ✅ (consider "claude_code" for dev tools)
type: "claude_code"    # ✅
type: "codex"          # ✅
type: "copilot"        # ✅
type: "gemini"         # ✅
type: "gemini_cli"     # ✅

# Incorrect (common mistakes)
type: "gpt"            # ❌ Use "openai"
type: "google"         # ❌ Use "gemini"

API key not found:

Check your .env file has the correct variable name:

# Backend type → Environment variable
openai        OPENAI_API_KEY
claude        ANTHROPIC_API_KEY
claude_code   CLAUDE_CODE_API_KEY (falls back to ANTHROPIC_API_KEY)
codex         OPENAI_API_KEY (or use `codex login` for OAuth)
copilot       GH_TOKEN or GITHUB_TOKEN (or use /login in Copilot CLI)
gemini        GOOGLE_API_KEY
gemini_cli    GOOGLE_API_KEY or GEMINI_API_KEY (or use `gemini` login)
grok          XAI_API_KEY
zai           ZAI_API_KEY
azure_openai  AZURE_OPENAI_API_KEY

Note

Separate API keys for Claude Code: The claude_code backend checks CLAUDE_CODE_API_KEY first, then falls back to ANTHROPIC_API_KEY. This allows you to use a Claude subscription (no API key needed) or a separate API key for Claude Code agents while using a different API key for standard Claude backend agents.
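The lookup order in the table and note above can be sketched as a small helper that reports which variable would be picked up. This is an illustration built from the table on this page, not MassGen's own lookup code; the names ENV_VARS, LOGIN_CAPABLE, and first_available_key are hypothetical.

```python
import os
from typing import Mapping, Optional

# Variables per backend type, in lookup order, copied from the table above.
ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "claude": ["ANTHROPIC_API_KEY"],
    "claude_code": ["CLAUDE_CODE_API_KEY", "ANTHROPIC_API_KEY"],
    "codex": ["OPENAI_API_KEY"],
    "copilot": ["GH_TOKEN", "GITHUB_TOKEN"],
    "gemini": ["GOOGLE_API_KEY"],
    "gemini_cli": ["GOOGLE_API_KEY", "GEMINI_API_KEY"],
    "grok": ["XAI_API_KEY"],
    "zai": ["ZAI_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_API_KEY"],
}

# Backends that can also authenticate through a login or subscription flow,
# so a missing key is not necessarily an error for them.
LOGIN_CAPABLE = {"claude_code", "codex", "copilot", "gemini_cli"}


def first_available_key(backend_type: str,
                        env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the first set variable for this backend, if any."""
    env = os.environ if env is None else env
    for name in ENV_VARS.get(backend_type, []):
        if env.get(name):
            return name
    return None


fake_env = {"ANTHROPIC_API_KEY": "sk-test"}
print(first_available_key("claude_code", fake_env))
print(first_available_key("openai", fake_env))
```

Passing a plain dict as env, as in the example, makes the lookup easy to try out without touching the real environment.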

Model not supported:

Verify the model name matches the backend’s supported models:

# Check supported models in README.md or use --model flag
backend:
  type: "openai"
  model: "gpt-5-nano"    # ✅ Supported
  # model: "gpt-6"       # ❌ Not yet available

Next Steps#