Code Execution#

MassGen provides command-line execution through MCP (Model Context Protocol). Agents can run bash commands, install packages, and execute scripts, all behind multiple layers of security.

Quick Start#

Enable code execution for a single agent:

agent:
  backend:
    type: "openai"
    model: "gpt-5-mini"
    cwd: "workspace"
    enable_mcp_command_line: true  # Enables code execution

Run with code execution:

massgen "Write a Python script to analyze data.csv and create a report"

Execution Modes#

MassGen supports two execution modes:

Local Mode (Default)#

Commands execute directly on your host system with pattern-based security:

agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "local"  # Default

Best for: Development, trusted code, fast execution

Docker Mode#

Commands execute inside isolated Docker containers:

agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"

Best for: Production, untrusted code, high security requirements

See Docker Mode Setup for setup instructions.

Docker Credentials & Package Management#

Docker mode supports comprehensive credential management and package preinstallation through two nested configuration dictionaries: command_line_docker_credentials and command_line_docker_packages.

Credential Management#

1. Mount Credential Files

Mount credential files from your host into the container (all mounted read-only):

command_line_docker_credentials:
  mount:
    - "ssh_keys"     # ~/.ssh → /home/massgen/.ssh
    - "git_config"   # ~/.gitconfig → /home/massgen/.gitconfig
    - "gh_config"    # ~/.config/gh → /home/massgen/.config/gh
    - "npm_config"   # ~/.npmrc → /home/massgen/.npmrc
    - "pypi_config"  # ~/.pypirc → /home/massgen/.pypirc
    - "claude_config"  # ~/.claude → /home/massgen/.claude
    - "codex_config"  # ~/.codex → /home/massgen/.codex

Available mount types:

ssh_keys - Clone private repos via SSH (git clone git@github.com:org/repo.git)
git_config - Git user name/email for commits
gh_config - GitHub CLI authentication (use if you’ve run gh auth login)
npm_config - Private npm package authentication
pypi_config - Private PyPI package authentication
claude_config - Claude Code CLI session/config files (for Claude auth inheritance in Docker)
codex_config - Codex CLI OAuth/session files (for keyless Codex auth inheritance in Docker)
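
As a rough illustration of the mapping above, the following sketch resolves mount names into read-only volume bindings. The helper name resolve_credential_mounts and the returned dictionary shape (borrowed from the additional_mounts format) are assumptions for illustration, not MassGen's actual implementation.

```python
from pathlib import PurePosixPath

# Source path (relative to the host home directory) -> container target,
# copied from the mount list above.
CREDENTIAL_MOUNTS = {
    "ssh_keys": (".ssh", "/home/massgen/.ssh"),
    "git_config": (".gitconfig", "/home/massgen/.gitconfig"),
    "gh_config": (".config/gh", "/home/massgen/.config/gh"),
    "npm_config": (".npmrc", "/home/massgen/.npmrc"),
    "pypi_config": (".pypirc", "/home/massgen/.pypirc"),
    "claude_config": (".claude", "/home/massgen/.claude"),
    "codex_config": (".codex", "/home/massgen/.codex"),
}


def resolve_credential_mounts(names, home):
    """Hypothetical helper: map mount names to read-only volume bindings."""
    volumes = {}
    for name in names:
        if name not in CREDENTIAL_MOUNTS:
            raise ValueError(f"unknown credential mount: {name}")
        source, target = CREDENTIAL_MOUNTS[name]
        host_path = str(PurePosixPath(home) / source)
        # Every credential mount is read-only, as stated above.
        volumes[host_path] = {"bind": target, "mode": "ro"}
    return volumes


print(resolve_credential_mounts(["ssh_keys", "gh_config"], "/home/alice"))
```

Rejecting unknown names keeps a typo in the mount list from silently mounting nothing.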

2. Pass Environment Variables

There are four ways to pass environment variables:

# Option 1: From .env file - load ALL variables
command_line_docker_credentials:
  env_file: ".env"

# Option 2: From .env file - load ONLY specific variables (recommended)
command_line_docker_credentials:
  env_file: ".env"
  env_vars_from_file:  # Only pass these from .env
    - "GITHUB_TOKEN"
    - "NPM_TOKEN"
  # Other secrets in .env won't be passed to container

# Option 3: Specific variables from host environment
command_line_docker_credentials:
  env_vars:
    - "GITHUB_TOKEN"
    - "NPM_TOKEN"
    - "ANTHROPIC_API_KEY"

# Option 4: All environment variables (dangerous, use with caution)
command_line_docker_credentials:
  pass_all_env: true
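
To make the difference between Option 1 and Option 2 concrete, here is a minimal sketch of the filtering idea. The functions parse_env_text and select_env are hypothetical, and the .env parsing is deliberately simplified (no multi-line values or variable expansion); MassGen's real loader may behave differently.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def select_env(file_env, env_vars_from_file=None):
    """Return every variable (Option 1) or only the listed ones (Option 2)."""
    if env_vars_from_file is None:
        return dict(file_env)
    return {k: file_env[k] for k in env_vars_from_file if k in file_env}


dotenv = """
# local secrets
GITHUB_TOKEN=ghp_example
NPM_TOKEN="npm_example"
DATABASE_PASSWORD=hunter2
"""
env = parse_env_text(dotenv)
# Only the two listed names are selected; DATABASE_PASSWORD stays on the host.
print(sorted(select_env(env, ["GITHUB_TOKEN", "NPM_TOKEN"])))
```

The allow-list form is the safer default because adding a new secret to .env does not automatically expose it to the container.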

3. Custom Volume Mounts

Mount additional files or directories:

command_line_docker_credentials:
  additional_mounts:
    "/path/on/host/.aws":
      bind: "/home/massgen/.aws"
      mode: "ro"

GitHub CLI Authentication#

GitHub CLI (gh) is pre-installed in MassGen Docker images. Two authentication methods:

Method 1: Use Existing Login (recommended if you’ve run gh auth login):

command_line_docker_credentials:
  mount:
    - "gh_config"  # Mounts ~/.config/gh with your credentials

Method 2: Pass Token:

command_line_docker_credentials:
  env_vars:
    - "GITHUB_TOKEN"  # Set: export GITHUB_TOKEN=ghp_your_token

For HTTPS git clones, also add the token so git can authenticate:

command_line_docker_credentials:
  mount: ["gh_config", "ssh_keys", "git_config"]
  env_vars: ["GITHUB_TOKEN"]  # Enables both gh CLI and HTTPS git

Agents can then use gh commands:

gh auth status
gh api user
gh repo clone user/repo
gh issue list
gh pr list

Package Preinstall#

Specify base packages to pre-install in every container. These install when the container is created, before agents start working:

command_line_docker_packages:
  preinstall:
    python:
      - "requests>=2.31.0"
      - "numpy>=1.24.0"
      - "pytest>=7.0.0"
    npm:
      - "typescript"
      - "@types/node"
    system:
      - "vim"
      - "htop"

Installation order: System packages → Python packages → npm packages (all with sudo if enabled).

When to use:

Consistent base environment across all runs
Different package sets per configuration
Quick iteration without rebuilding Docker images

Requirements:

npm/system packages require: command_line_docker_enable_sudo: true
All packages require: command_line_docker_network_mode: "bridge"
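
The two requirements above can be checked before a run. The sketch below assumes the backend section has already been loaded into a plain dictionary; check_preinstall_requirements is a hypothetical helper, and the fallback values ("none" and false) are the defaults listed in the Configuration Parameters table.

```python
def check_preinstall_requirements(backend):
    """Return a list of problems with a backend config dict (empty if none)."""
    packages = backend.get("command_line_docker_packages", {})
    preinstall = packages.get("preinstall", {})
    # Package kinds ("python", "npm", "system") that actually list something.
    requested = {kind for kind, pkgs in preinstall.items() if pkgs}
    problems = []
    if not requested:
        return problems
    if backend.get("command_line_docker_network_mode", "none") != "bridge":
        problems.append('preinstall needs command_line_docker_network_mode: "bridge"')
    needs_sudo = requested & {"npm", "system"}
    if needs_sudo and not backend.get("command_line_docker_enable_sudo", False):
        problems.append("npm/system packages need command_line_docker_enable_sudo: true")
    return problems


backend = {
    "command_line_execution_mode": "docker",
    "command_line_docker_packages": {"preinstall": {"system": ["vim"]}},
}
for problem in check_preinstall_requirements(backend):
    print("config problem:", problem)
```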

Custom Docker Images#

For stable dependencies or complex environments, create a custom Docker image:

command_line_docker_image: "your-username/custom-image:tag"

Example custom Dockerfile (see massgen/docker/Dockerfile.custom-example):

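FROM massgen/mcp-runtime:latest
RUN pip install --no-cache-dir scikit-learn matplotlib seaborn
# The base image runs as the non-root massgen user, so switch to root for apt-get
USER root
RUN apt-get update && apt-get install -y vim htop && rm -rf /var/lib/apt/lists/*
USER massgen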

Build the image, then set command_line_docker_image to the tag you built (here, my-custom-image:v1):

docker build -t my-custom-image:v1 -f Dockerfile.custom .

Key requirements for custom images:

Must have massgen user with UID 1000
Must create /workspace, /context, /temp_workspaces directories
Must set appropriate permissions
CMD should keep container running (tail -f /dev/null)

Complete Example Configurations#

Minimal GitHub access:

agent:
  backend:
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
    command_line_docker_network_mode: "bridge"
    command_line_docker_credentials:
      env_vars: ["GITHUB_TOKEN"]

Full development setup:

agent:
  backend:
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
    command_line_docker_enable_sudo: true
    command_line_docker_network_mode: "bridge"

    command_line_docker_credentials:
      env_file: ".env"
      mount: ["ssh_keys", "git_config"]

    command_line_docker_packages:
      preinstall:
        python: ["pytest", "requests", "numpy"]
        npm: ["typescript"]

Security best practices:

Use .env files for credentials (add to .gitignore)
Use env_vars_from_file to only pass needed secrets from .env (recommended)
Mount only needed credentials (opt-in by default)
Use command_line_docker_network_mode: "none" unless network is required
All credential files are mounted read-only
Use command filtering (command_line_blacklist_patterns) for additional safety

Ready-to-run examples:

GitHub read-only mode (safe mode with credentials):

# Prerequisites: gh auth login or export GITHUB_TOKEN
uv run massgen --config @examples/configs/tools/code-execution/docker_github_readonly.yaml "Test to see the most recent issues in the massgen/MassGen repo with the github cli"

Full development setup (all features combined):

# Prerequisites: Build sudo image, create .env file
bash massgen/docker/build.sh --sudo
echo "GITHUB_TOKEN=ghp_your_token" > .env

uv run massgen --config @examples/configs/tools/code-execution/docker_full_dev_setup.yaml "Demonstrate full dev environment: check gh auth, verify pre-installed massgen, verify typescript installed, create Flask app with requirements.txt, show git config"

Custom Docker image (bring your own image):

# Prerequisites: Build custom image
docker build -t massgen-custom-test:v1 -f massgen/docker/Dockerfile.custom-example .

uv run massgen --config @examples/configs/tools/code-execution/docker_custom_image.yaml "Verify custom packages: sklearn, matplotlib, seaborn, ipython, black, vim, htop, tree"

More examples: See massgen/configs/tools/code-execution/ for additional configurations.

Code Execution vs Backend Built-in Tools#

MassGen provides two ways for agents to execute code:

Backend Built-in Code Execution
MCP-based Code Execution (Universal)

Feature	Backend Built-in	MCP Code Execution
Availability	Backend-specific (OpenAI, Claude Code)	Universal (all backends)
Configuration	Automatic with supported backends	`enable_mcp_command_line: true`
Execution Environment	Backend provider’s sandbox	Your environment (local/Docker)
Persistence	Ephemeral (resets between sessions)	Persistent (packages stay installed)
File System Access	Limited to backend’s environment	Full access to workspace
Package Installation	Backend-managed	You control (pip, npm, etc.)
Network Access	Provider-controlled	Configurable (local: full, Docker: none/bridge/host)
Use Case	Quick calculations, simple scripts	Complex workflows, persistent environments

You can use both simultaneously! The agent will choose the most appropriate tool for each task.

Configuration#

Basic Configuration#

Enable MCP code execution with minimal setup:

agent:
  backend:
    type: "openai"
    model: "gpt-5-mini"
    cwd: "workspace"
    enable_mcp_command_line: true

Advanced Configuration#

Full configuration with Docker mode and security:

agent:
  backend:
    type: "claude"
    model: "claude-sonnet-4"
    cwd: "workspace"

    # Enable MCP code execution
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"  # or "local"

    # Docker-specific settings (if using docker mode)
    command_line_docker_image: "massgen/mcp-runtime:latest"
    command_line_docker_memory_limit: "2g"
    command_line_docker_cpu_limit: 4.0
    command_line_docker_network_mode: "none"  # "none", "bridge", or "host"

    # Command filtering (optional)
    command_line_whitelist_patterns: ["pip install.*", "python .*"]
    command_line_blacklist_patterns: ["rm -rf /", "sudo .*"]

Configuration Parameters#

Parameter	Default	Description
`enable_mcp_command_line`	`false`	Enable MCP-based code execution
`command_line_execution_mode`	`"local"`	Execution mode: `"local"` or `"docker"`
`command_line_docker_image`	`"massgen/mcp-runtime:latest"`	Docker image for container execution
`command_line_docker_memory_limit`	None	Memory limit (e.g., `"2g"`, `"512m"`)
`command_line_docker_cpu_limit`	None	CPU cores limit (e.g., `2.0`, `4.0`)
`command_line_docker_network_mode`	`"none"`	Network mode: `"none"`, `"bridge"`, or `"host"`
`command_line_docker_enable_sudo`	`false`	Enable sudo in containers (⚠️ less secure, see docs)
`command_line_whitelist_patterns`	None	Regex patterns for allowed commands
`command_line_blacklist_patterns`	None	Regex patterns for blocked commands

Docker Mode Setup#

Prerequisites#

Docker installed and running:

docker --version  # Should show Docker Engine >= 28.0.0
docker ps         # Should connect without errors

Recommended: Docker Engine 28.0.0+

Python docker library:

# Install via optional dependency group
uv pip install -e ".[docker]"

# Or install directly
pip install "docker>=7.0.0"  # Quote the spec so the shell does not treat >= as a redirect

Build Docker Image#

From the repository root:

bash massgen/docker/build.sh

This builds massgen/mcp-runtime:latest (~400-500MB).

Enable Docker Mode#

Simple configuration:

agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"

That’s it! The container will be created automatically when orchestration starts.

How It Works#

Container Lifecycle:

Orchestration Start → Creates persistent container massgen-{agent_id}
Agent Turns → Commands execute via docker exec
Orchestration End → Container stopped and removed

Key Features:

Persistent Containers: One container per agent for entire orchestration
State Persistence: Packages and files persist across turns
Path Transparency: Paths mounted at same locations as host
MCP Server on Host: Server runs on host, creates Docker client to execute commands

Volume Mounts:

Workspace: Read-write access to agent’s workspace
Context Paths: Read-only or read-write based on configuration
Temp Workspace: Read-only access to other agents’ outputs
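
For readers who think in terms of the docker CLI, the lifecycle above corresponds roughly to the three commands built below. This is only an analogy: MassGen drives Docker through a client created by the MCP server, and the function names here are hypothetical. The sketch builds argument lists without running them.

```python
def docker_run_args(agent_id, image, mounts, memory=None, cpus=None, network="none"):
    """Build the argv for a long-lived container (nothing is executed here)."""
    args = ["docker", "run", "-d", "--name", f"massgen-{agent_id}", "--network", network]
    if memory is not None:
        args += ["--memory", memory]
    if cpus is not None:
        args += ["--cpus", str(cpus)]
    for path, mode in mounts.items():
        # Same path on both sides, mirroring "Path Transparency".
        args += ["-v", f"{path}:{path}:{mode}"]
    # Keep the container alive so later commands can be exec'd into it.
    return args + [image, "tail", "-f", "/dev/null"]


def docker_exec_args(agent_id, command):
    """Build the argv for running one command inside the container."""
    return ["docker", "exec", f"massgen-{agent_id}", "bash", "-c", command]


def docker_remove_args(agent_id):
    """Build the argv that stops and removes the container."""
    return ["docker", "rm", "-f", f"massgen-{agent_id}"]


run = docker_run_args(
    "agent1",
    "massgen/mcp-runtime:latest",
    {"/home/alice/workspace": "rw", "/home/alice/context": "ro"},
    memory="2g",
    cpus=4.0,
)
print(" ".join(run))
print(" ".join(docker_exec_args("agent1", "pytest")))
print(" ".join(docker_remove_args("agent1")))
```

Because the container outlives any single command, packages installed in one turn are still present in the next.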

Security Features#

Multi-Layer Security#

MassGen implements multiple security layers for code execution:

AG2-Inspired Command Sanitization

Blocks dangerous patterns:
- rm -rf /
- sudo commands
- chmod 777
- And more…

Command Filtering

Whitelist/blacklist regex patterns:

command_line_whitelist_patterns: ["pip install.*", "python .*"]
command_line_blacklist_patterns: ["rm -rf.*", "sudo.*"]

Docker Container Isolation (Docker mode only)

- Filesystem isolation (only mounted volumes accessible)
- Network isolation (default: no network)
- Resource limits (memory, CPU)
- Process isolation (non-root user)

PathPermissionManager Hooks

Validates file operations against context path permissions.

Timeout Enforcement

Commands time out after the configured duration.
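
The command filtering layer can be pictured with a short sketch. The precedence shown here (blacklist checked first, then the whitelist if one is configured) and the use of re.match, which anchors at the start of the command, are assumptions; is_command_allowed is a hypothetical function, not MassGen's filter.

```python
import re


def is_command_allowed(command, whitelist=None, blacklist=None):
    """Assumed semantics: the blacklist wins; a whitelist, if set, must match."""
    for pattern in blacklist or []:
        if re.match(pattern, command):
            return False
    if whitelist:
        return any(re.match(pattern, command) for pattern in whitelist)
    return True


whitelist = ["pip install.*", "python .*"]
blacklist = ["rm -rf.*", "sudo.*"]

for cmd in ["python analyze.py", "sudo apt-get install vim", "ls -la"]:
    print(cmd, "->", is_command_allowed(cmd, whitelist, blacklist))
```

Under these assumed semantics a blacklist alone would not catch a command such as echo hi && rm -rf /, because the dangerous part is not at the start of the string. This is one reason pattern-based filtering is weaker than container isolation.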

Local vs Docker Comparison#

Aspect	Local Mode	Docker Mode
Setup	None required	Docker + image build
Performance	Fast (direct execution)	Slight overhead (~100-200ms)
Isolation	Pattern-based (circumventable)	Container-based (strong)
Network	Full host network	Configurable (none/bridge/host)
Resource Limits	OS-level only	Docker-enforced
Security	Medium	High
Best For	Development, trusted code	Production, untrusted code

Usage Examples#

Example 1: Python Development#

agent:
  backend:
    type: "claude"
    model: "claude-sonnet-4"
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"

massgen "Write and test a sorting algorithm"

What happens:

Agent writes sort.py
Agent runs pip install pytest
Agent writes tests in test_sort.py
Agent runs pytest
All isolated in Docker container!

Example 2: With Resource Constraints#

agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
    command_line_docker_memory_limit: "1g"
    command_line_docker_cpu_limit: 1.0
    command_line_docker_network_mode: "none"

Good for untrusted or resource-intensive tasks.

Example 3: With Network Access#

agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
    command_line_docker_network_mode: "bridge"

massgen "Fetch data from an API and analyze it"

Agent can make HTTP requests from inside container.

Example 4: Multi-Agent with Different Modes#

agents:
  - id: "developer"
    backend:
      type: "openai"
      model: "gpt-5-mini"
      cwd: "workspace1"
      enable_mcp_command_line: true
      command_line_execution_mode: "local"  # Fast for development

  - id: "tester"
    backend:
      type: "claude"
      model: "claude-sonnet-4"
      cwd: "workspace2"
      enable_mcp_command_line: true
      command_line_execution_mode: "docker"  # Isolated for testing

Docker Image Details#

Base Image: massgen/mcp-runtime:latest#

Contents:

Base: Python 3.11-slim
System packages: git, curl, build-essential, Node.js 20.x
Python packages: pytest, requests, numpy, pandas
User: non-root (massgen, UID 1000)
Working directory: /workspace

Size: ~400-500MB (compressed)

Custom Images#

Extend the base image with additional packages:

FROM massgen/mcp-runtime:latest

# Install additional system packages
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Install additional Python packages
USER massgen
RUN pip install --no-cache-dir sqlalchemy psycopg2-binary

WORKDIR /workspace

Build and use:

docker build -t my-custom-runtime:latest -f Dockerfile.custom .

command_line_docker_image: "my-custom-runtime:latest"

Sudo Variant (Runtime Package Installation)#

The sudo variant allows agents to install system packages at runtime inside their Docker container.

IMPORTANT: Build the image before first use:

bash massgen/docker/build.sh --sudo

This builds massgen/mcp-runtime-sudo:latest with sudo access locally. (This image is not available on Docker Hub - you must build it yourself.)

Enable in config:

agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
    command_line_docker_enable_sudo: true  # Automatically uses sudo image

What agents can do with sudo:

# Install system packages at runtime
sudo apt-get update && sudo apt-get install -y ffmpeg

# Install additional Python packages
sudo pip install tensorflow

Is this safe?

Generally yes, because Docker container isolation is the primary security boundary:

Container is fully isolated from your host:

Sudo inside container ≠ sudo on your computer
Agent can only access mounted volumes (workspace, context paths)
Cannot access your host filesystem outside mounts
Cannot affect host processes or system configuration
Docker namespaces/cgroups provide strong isolation

What sudo can and cannot do:

✅ Can: Install packages inside the container (apt, pip, npm)
✅ Can: Modify container system configuration
✅ Can: Read/write mounted workspace (same as without sudo)
❌ Cannot: Access your host filesystem outside mounts
❌ Cannot: Affect your host system
❌ Cannot: Break out of the container (unless Docker vulnerability exists)

Theoretical risks (extremely rare):

Container escape vulnerabilities (CVEs in Docker/kernel) are very rare and quickly patched
Sudo increases attack surface slightly if escape exists
Still requires exploit code, not just malicious intent

When to use sudo variant vs custom images:

Approach	Use When	Performance	Security
Sudo variant	Need flexibility, unknown packages, prototyping	Slower (runtime install)	Good (container isolated)
Custom image	Know packages, production use	Fast (pre-installed)	Best (minimal attack surface)

Custom image example (recommended for production):

FROM massgen/mcp-runtime:latest
USER root
RUN apt-get update && apt-get install -y ffmpeg postgresql-client
USER massgen

Build: docker build -t my-runtime:latest .

Use: command_line_docker_image: "my-runtime:latest"

Bottom line: The sudo variant is safe for most use cases because Docker container isolation is strong. Custom images are preferred for production because they’re faster and have a smaller attack surface, but sudo is fine for development and prototyping.

Troubleshooting#

Docker Not Installed#

Symptom: RuntimeError: Docker Python library not available

Solution:

pip install "docker>=7.0.0"  # Quote the spec so the shell does not treat >= as a redirect

Failed to Connect to Docker#

Symptom: RuntimeError: Failed to connect to Docker: ...

Possible causes:

Docker daemon not running:

docker ps  # Check if Docker is running

Permission issues (Linux):

sudo usermod -aG docker $USER
# Log out and back in

Custom Docker socket:

export DOCKER_HOST=unix:///path/to/docker.sock
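
The first two causes can be told apart with a quick pre-flight check. The sketch below shells out to the docker CLI rather than using the Python docker library, so it is only a rough diagnostic; check_docker and its three return values are hypothetical names.

```python
import shutil
import subprocess


def check_docker(timeout=10):
    """Return "missing", "unreachable", or "ok" for the local Docker setup."""
    if shutil.which("docker") is None:
        return "missing"  # The docker CLI is not on PATH.
    try:
        result = subprocess.run(
            ["docker", "ps"], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return "unreachable"
    # A non-zero exit usually means the daemon is down or access is denied.
    return "ok" if result.returncode == 0 else "unreachable"


print(check_docker())
```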

Image Not Found#

Symptom: RuntimeError: Failed to pull Docker image ...

Solution:

bash massgen/docker/build.sh

Permission Errors in Container#

Symptom: Permission denied when writing files

Solution: Ensure workspace has correct permissions:

chmod -R 755 workspace

Performance Issues#

Solutions:

Increase resource limits:

command_line_docker_memory_limit: "4g"
command_line_docker_cpu_limit: 4.0

Use custom image with pre-installed packages
Check Docker Desktop resource settings

Debugging#

Inspect Running Container#

# List containers
docker ps | grep massgen

# View logs in real-time
docker logs -f massgen-{agent_id}

# Execute interactive shell
docker exec -it massgen-{agent_id} /bin/bash

Check Resource Usage#

docker stats massgen-{agent_id}

Manual Container Management#

# Stop container
docker stop massgen-{agent_id}

# Remove container
docker rm massgen-{agent_id}

# Clean up all stopped containers
docker container prune -f

Background Shell Execution#

NEW: MassGen supports running commands in the background without blocking, enabling parallel execution and long-running processes.

What is Background Execution?#

Background execution allows agents to:

Start long-running processes (training, servers, simulations)
Run multiple experiments in parallel
Monitor processes without blocking
Continue working while tasks execute

Available Tools:

When enable_mcp_command_line: true is set, agents automatically get these tools:

start_background_shell(command, work_dir) - Start command in background, returns shell_id
get_background_shell_output(shell_id) - Retrieve stdout/stderr from background process
get_background_shell_status(shell_id) - Check if running/stopped/failed
kill_background_shell(shell_id) - Terminate a background process
list_background_shells() - List all active background processes

Example: Parallel Experiments#

agent:
  backend:
    type: "openai"
    model: "gpt-5-mini"
    cwd: "workspace"
    enable_mcp_command_line: true
  system_message: |
    You can run multiple experiments in parallel using background shell tools.
    Use start_background_shell() to launch tasks, then monitor with
    list_background_shells() and collect results when complete.

Agent workflow:

import time

# Start 3 experiments in parallel
exp1 = start_background_shell("python experiment_a.py")
exp2 = start_background_shell("python experiment_b.py")
exp3 = start_background_shell("python experiment_c.py")

# Monitor until all complete
while True:
    shells = list_background_shells()
    running = [s for s in shells["shells"] if s["status"] == "running"]
    if len(running) == 0:
        break
    time.sleep(5)  # Pause between checks instead of polling continuously

# Collect results
result1 = get_background_shell_output(exp1["shell_id"])
result2 = get_background_shell_output(exp2["shell_id"])
result3 = get_background_shell_output(exp3["shell_id"])

Example: Server Management#

# Start web server in background
server = start_background_shell("uvicorn app:main --port 8000")

# Server runs while agent does other work...

# Run integration tests
test_result = execute_command("pytest tests/integration/")

# Cleanup: stop server
kill_background_shell(server["shell_id"])

Example: Long-Running Tasks with Monitoring#

import time

# Start training job
training = start_background_shell("python train.py --epochs 100")

# Monitor progress periodically
while True:
    status = get_background_shell_status(training["shell_id"])

    if status["status"] != "running":
        break

    # Check progress from output
    output = get_background_shell_output(training["shell_id"])
    # Look for "Epoch X/100" in output...

    time.sleep(30)  # Wait before the next check instead of polling continuously

# Training complete
final_output = get_background_shell_output(training["shell_id"])
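
The monitoring loops in these examples share one pattern: poll a status, pause, and give up after a deadline. A minimal sketch of that pattern follows, with the status source passed in as a callable so it can stand in for get_background_shell_status. The helper name wait_for_exit and its default interval and timeout are assumptions.

```python
import time


def wait_for_exit(get_status, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll get_status() until it stops returning "running" or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status != "running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("process still running after timeout")
        sleep(interval)  # Pause between polls to avoid a busy loop.


# Stand-in status source: reports "running" twice, then "stopped".
fake_statuses = iter(["running", "running", "stopped"])
print(wait_for_exit(lambda: next(fake_statuses), interval=0.01))
```

Passing sleep as a parameter keeps the helper easy to test without real delays.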

Key Features#

Non-blocking: Continue work while processes run
Parallel execution: Run multiple tasks simultaneously (default limit: 10 concurrent)
Memory-safe: Ring buffer captures last 10,000 lines (prevents OOM on infinite output)
Auto-cleanup: All background processes killed on MassGen exit
Thread-safe: Safe for concurrent access from multiple agents
Same security: Background shells use same sanitization as foreground execute_command
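
The non-blocking and memory-safe properties above can be illustrated with standard-library pieces: a worker thread drains the process output into a bounded deque. This is a sketch of the general technique, not MassGen's code; the BackgroundShell class and the mapping from exit code to "stopped" or "failed" are assumptions.

```python
import subprocess
import sys
import threading
from collections import deque


class BackgroundShell:
    """Hypothetical stand-in for a background shell with bounded output."""

    def __init__(self, argv, max_lines=10_000):
        # deque(maxlen=...) silently drops the oldest line once it is full.
        self.lines = deque(maxlen=max_lines)
        self.process = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        self._reader = threading.Thread(target=self._drain, daemon=True)
        self._reader.start()

    def _drain(self):
        # Runs in a worker thread so the caller is never blocked on output.
        for line in self.process.stdout:
            self.lines.append(line.rstrip("\n"))

    def status(self):
        code = self.process.poll()
        if code is None:
            return "running"
        return "stopped" if code == 0 else "failed"

    def wait(self, timeout=None):
        self.process.wait(timeout=timeout)
        self._reader.join(timeout=timeout)
        return self.status()


# Emit 50 lines but keep only the last 10.
shell = BackgroundShell(
    [sys.executable, "-c", "for i in range(50): print('line', i)"], max_lines=10
)
print(shell.wait(timeout=30), len(shell.lines), shell.lines[0], shell.lines[-1])
```

A process that prints forever can therefore never hold more than max_lines lines in memory, at the cost of losing the earliest output.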

Demo Configuration#

See massgen/configs/tools/code-execution/background_shell_demo.yaml for a complete example showing parallel vs sequential execution strategies.

Best Practices#

Use Docker mode for untrusted or production workloads
Set resource limits to prevent abuse
Use command_line_docker_network_mode: "none" unless network is required
Build custom images for frequently used packages (faster)
Monitor container logs for debugging
Test in local mode first for faster iteration
Use command filtering to restrict dangerous operations
Use background shells for parallel tasks - Run multiple experiments concurrently
Monitor background processes - Use get_background_shell_status() to check progress
Cleanup background shells - Kill when done or let auto-cleanup handle it

Configuration Examples#

See massgen/configs/tools/code-execution/ for example configurations:

basic_command_execution.yaml - Minimal code execution setup
code_execution_use_case_simple.yaml - Simple use case example
command_filtering_whitelist.yaml - Whitelist filtering example
command_filtering_blacklist.yaml - Blacklist filtering example
docker_simple.yaml - Minimal Docker setup
docker_with_resource_limits.yaml - Memory/CPU limits with network
docker_multi_agent.yaml - Multi-agent with Docker isolation
docker_verification.yaml - Verify Docker isolation works
background_shell_demo.yaml - NEW: Parallel execution with background shells

Next Steps#

File Operations & Workspace Management - File system operations and workspace management
MCP Integration - Additional MCP tools beyond code execution
Supported Models & Backends - Backend capabilities including code execution
Running MassGen - More usage examples

References#

Docker Documentation
Docker Python SDK
Design Document: docs/dev_notes/CODE_EXECUTION_DESIGN.md
NEW: Background Execution Design: docs/dev_notes/background_shell_execution_design.md
Docker README: massgen/docker/README.md
Build Script: massgen/docker/build.sh