Code Execution#
MassGen provides powerful command-line execution capabilities through MCP (Model Context Protocol), enabling agents to run bash commands, install packages, execute scripts, and more - all with multiple layers of security.
Quick Start#
Enable code execution for a single agent:
agent:
  backend:
    type: "openai"
    model: "gpt-5-mini"
    cwd: "workspace"
    enable_mcp_command_line: true  # Enables code execution
Run with code execution:
massgen "Write a Python script to analyze data.csv and create a report"
Execution Modes#
MassGen supports two execution modes:
Local Mode (Default)#
Commands execute directly on your host system with pattern-based security:
agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "local"  # Default
Best for: Development, trusted code, fast execution
Docker Mode#
Commands execute inside isolated Docker containers:
agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
Best for: Production, untrusted code, high security requirements
See Docker Mode Setup for setup instructions.
Docker Credentials & Package Management#
Docker mode supports comprehensive credential management and package preinstallation through two nested configuration dictionaries: command_line_docker_credentials and command_line_docker_packages.
Credential Management#
1. Mount Credential Files
Mount credential files from your host into the container (all mounted read-only):
command_line_docker_credentials:
  mount:
    - "ssh_keys"       # ~/.ssh → /home/massgen/.ssh
    - "git_config"     # ~/.gitconfig → /home/massgen/.gitconfig
    - "gh_config"      # ~/.config/gh → /home/massgen/.config/gh
    - "npm_config"     # ~/.npmrc → /home/massgen/.npmrc
    - "pypi_config"    # ~/.pypirc → /home/massgen/.pypirc
    - "claude_config"  # ~/.claude → /home/massgen/.claude
    - "codex_config"   # ~/.codex → /home/massgen/.codex
Available mount types:
ssh_keys - Clone private repos via SSH (git clone git@github.com:org/repo.git)
git_config - Git user name/email for commits
gh_config - GitHub CLI authentication (use if you’ve run gh auth login)
npm_config - Private npm package authentication
pypi_config - Private PyPI package authentication
claude_config - Claude Code CLI session/config files (for Claude auth inheritance in Docker)
codex_config - Codex CLI OAuth/session files (for keyless Codex auth inheritance in Docker)
2. Pass Environment Variables
Multiple methods to pass environment variables:
# Option 1: From .env file - load ALL variables
command_line_docker_credentials:
  env_file: ".env"
# Option 2: From .env file - load ONLY specific variables (recommended)
command_line_docker_credentials:
  env_file: ".env"
  env_vars_from_file:  # Only pass these from .env
    - "GITHUB_TOKEN"
    - "NPM_TOKEN"
    # Other secrets in .env won't be passed to container
# Option 3: Specific variables from host environment
command_line_docker_credentials:
  env_vars:
    - "GITHUB_TOKEN"
    - "NPM_TOKEN"
    - "ANTHROPIC_API_KEY"
# Option 4: All environment variables (dangerous, use with caution)
command_line_docker_credentials:
  pass_all_env: true
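The difference between Option 1 and Option 2 can be illustrated with a minimal sketch. This is not MassGen's implementation: `parse_env_text` and `select_env_vars` are hypothetical helpers, and the parser only handles simple `KEY=VALUE` lines.

```python
from typing import Dict, Iterable, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def select_env_vars(
    env: Dict[str, str], allowed: Optional[Iterable[str]] = None
) -> Dict[str, str]:
    """Return every variable (allowed=None) or only the allow-listed ones."""
    if allowed is None:
        return dict(env)  # Comparable to loading the whole .env file
    allowed_set = set(allowed)
    return {k: v for k, v in env.items() if k in allowed_set}


if __name__ == "__main__":
    sample = 'GITHUB_TOKEN=ghp_example\nNPM_TOKEN="npm_example"\nDB_PASSWORD=secret\n'
    env = parse_env_text(sample)
    # Only the two named tokens would reach the container; DB_PASSWORD stays on the host.
    print(sorted(select_env_vars(env, ["GITHUB_TOKEN", "NPM_TOKEN"])))
```

An allow-list limits the damage if an agent prints or uploads its environment, which is why the filtered form is the recommended one.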
3. Custom Volume Mounts
Mount additional files or directories:
command_line_docker_credentials:
  additional_mounts:
    "/path/on/host/.aws":
      bind: "/home/massgen/.aws"
      mode: "ro"
GitHub CLI Authentication#
GitHub CLI (gh) is pre-installed in MassGen Docker images. Two authentication methods:
Method 1: Use Existing Login (recommended if you’ve run gh auth login):
command_line_docker_credentials:
  mount:
    - "gh_config"  # Mounts ~/.config/gh with your credentials
Method 2: Pass Token:
command_line_docker_credentials:
  env_vars:
    - "GITHUB_TOKEN"  # Set: export GITHUB_TOKEN=ghp_your_token
For HTTPS git clones, also add the token so git can authenticate:
command_line_docker_credentials:
  mount: ["gh_config", "ssh_keys", "git_config"]
  env_vars: ["GITHUB_TOKEN"]  # Enables both gh CLI and HTTPS git
Agents can then use gh commands:
gh auth status
gh api user
gh repo clone user/repo
gh issue list
gh pr list
Package Preinstall#
Specify base packages to pre-install in every container. These install when the container is created, before agents start working:
command_line_docker_packages:
  preinstall:
    python:
      - "requests>=2.31.0"
      - "numpy>=1.24.0"
      - "pytest>=7.0.0"
    npm:
      - "typescript"
      - "@types/node"
    system:
      - "vim"
      - "htop"
Installation order: System packages → Python packages → npm packages (all with sudo if enabled).
When to use:
Consistent base environment across all runs
Different package sets per configuration
Quick iteration without rebuilding Docker images
Requirements:
npm/system packages require:
command_line_docker_enable_sudo: true
All packages require:
command_line_docker_network_mode: "bridge"
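The requirements and installation order above can be sketched as a small pre-flight check. This is an illustration under assumptions, not MassGen code: `check_preinstall_requirements` and `plan_install_commands` are hypothetical names, and the exact `apt-get`, `pip`, and `npm` command lines are guesses at what a preinstall step might run.

```python
import shlex
from typing import Dict, List


def check_preinstall_requirements(backend: Dict) -> List[str]:
    """Return human-readable problems for a backend config dict (empty if none)."""
    packages = backend.get("command_line_docker_packages", {}).get("preinstall", {})
    problems: List[str] = []
    if not any(packages.get(kind) for kind in ("system", "python", "npm")):
        return problems  # Nothing to preinstall, nothing to check
    if backend.get("command_line_docker_network_mode") != "bridge":
        problems.append('preinstall needs command_line_docker_network_mode: "bridge"')
    needs_sudo = packages.get("system") or packages.get("npm")
    if needs_sudo and not backend.get("command_line_docker_enable_sudo", False):
        problems.append("system/npm packages need command_line_docker_enable_sudo: true")
    return problems


def plan_install_commands(packages: Dict[str, List[str]], use_sudo: bool) -> List[str]:
    """Build commands in the documented order: system, then Python, then npm."""
    prefix = "sudo " if use_sudo else ""
    steps = [
        ("system", "apt-get install -y"),  # Assumed command for system packages
        ("python", "pip install"),         # Assumed command for Python packages
        ("npm", "npm install -g"),         # Assumed command for npm packages
    ]
    commands: List[str] = []
    for kind, base in steps:
        names = packages.get(kind) or []
        if names:
            # Quote each spec so ">=" is not treated as a shell redirect.
            quoted = " ".join(shlex.quote(name) for name in names)
            commands.append(f"{prefix}{base} {quoted}")
    return commands


if __name__ == "__main__":
    pkgs = {"python": ["requests>=2.31.0"], "npm": ["typescript"], "system": ["vim"]}
    for command in plan_install_commands(pkgs, use_sudo=True):
        print(command)
```

Checking the configuration before the container starts gives a clearer error than a failed install inside the container.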
Custom Docker Images#
For stable dependencies or complex environments, create a custom Docker image:
command_line_docker_image: "your-username/custom-image:tag"
Example custom Dockerfile (see massgen/docker/Dockerfile.custom-example):
FROM massgen/mcp-runtime:latest
RUN pip install --no-cache-dir scikit-learn matplotlib seaborn
# System packages need root; switch back to the massgen user afterwards
USER root
RUN apt-get update && apt-get install -y vim htop && rm -rf /var/lib/apt/lists/*
USER massgen
Build and use:
docker build -t my-custom-image:v1 -f Dockerfile.custom .
Key requirements for custom images:
Must have massgen user with UID 1000
Must create /workspace, /context, /temp_workspaces directories
Must set appropriate permissions
CMD should keep container running (tail -f /dev/null)
Complete Example Configurations#
Minimal GitHub access:
agent:
  backend:
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
    command_line_docker_network_mode: "bridge"
    command_line_docker_credentials:
      env_vars: ["GITHUB_TOKEN"]
Full development setup:
agent:
  backend:
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
    command_line_docker_enable_sudo: true
    command_line_docker_network_mode: "bridge"
    command_line_docker_credentials:
      env_file: ".env"
      mount: ["ssh_keys", "git_config"]
    command_line_docker_packages:
      preinstall:
        python: ["pytest", "requests", "numpy"]
        npm: ["typescript"]
Security best practices:
Use .env files for credentials (add to .gitignore)
Use env_vars_from_file to only pass needed secrets from .env (recommended)
Mount only needed credentials (mounts are opt-in)
Use command_line_docker_network_mode: "none" unless network is required
All credential files are mounted read-only
Use command filtering (blocked_commands) for additional safety
Ready-to-run examples:
GitHub read-only mode (safe mode with credentials):
# Prerequisites: gh auth login or export GITHUB_TOKEN
uv run massgen --config @examples/configs/tools/code-execution/docker_github_readonly.yaml "Test to see the most recent issues in the massgen/MassGen repo with the github cli"
Full development setup (all features combined):
# Prerequisites: Build sudo image, create .env file
bash massgen/docker/build.sh --sudo
echo "GITHUB_TOKEN=ghp_your_token" > .env
uv run massgen --config @examples/configs/tools/code-execution/docker_full_dev_setup.yaml "Demonstrate full dev environment: check gh auth, verify pre-installed massgen, verify typescript installed, create Flask app with requirements.txt, show git config"
Custom Docker image (bring your own image):
# Prerequisites: Build custom image
docker build -t massgen-custom-test:v1 -f massgen/docker/Dockerfile.custom-example .
uv run massgen --config @examples/configs/tools/code-execution/docker_custom_image.yaml "Verify custom packages: sklearn, matplotlib, seaborn, ipython, black, vim, htop, tree"
More examples: See massgen/configs/tools/code-execution/ for additional configurations.
Code Execution vs Backend Built-in Tools#
MassGen provides two ways for agents to execute code:
Backend Built-in Code Execution
MCP-based Code Execution (Universal)
| Feature | Backend Built-in | MCP Code Execution |
|---|---|---|
| Availability | Backend-specific (OpenAI, Claude Code) | Universal (all backends) |
| Configuration | Automatic with supported backends | enable_mcp_command_line: true |
| Execution Environment | Backend provider’s sandbox | Your environment (local/Docker) |
| Persistence | Ephemeral (resets between sessions) | Persistent (packages stay installed) |
| File System Access | Limited to backend’s environment | Full access to workspace |
| Package Installation | Backend-managed | You control (pip, npm, etc.) |
| Network Access | Provider-controlled | Configurable (local: full, Docker: none/bridge/host) |
| Use Case | Quick calculations, simple scripts | Complex workflows, persistent environments |
You can use both simultaneously! The agent will choose the most appropriate tool for each task.
Configuration#
Basic Configuration#
Enable MCP code execution with minimal setup:
agent:
  backend:
    type: "openai"
    model: "gpt-5-mini"
    cwd: "workspace"
    enable_mcp_command_line: true
Advanced Configuration#
Full configuration with Docker mode and security:
agent:
  backend:
    type: "claude"
    model: "claude-sonnet-4"
    cwd: "workspace"
    # Enable MCP code execution
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"  # or "local"
    # Docker-specific settings (if using docker mode)
    command_line_docker_image: "massgen/mcp-runtime:latest"
    command_line_docker_memory_limit: "2g"
    command_line_docker_cpu_limit: 4.0
    command_line_docker_network_mode: "none"  # "none", "bridge", or "host"
    # Command filtering (optional)
    command_line_whitelist_patterns: ["pip install.*", "python .*"]
    command_line_blacklist_patterns: ["rm -rf /", "sudo .*"]
Configuration Parameters#
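| Parameter | Default | Description |
|---|---|---|
| enable_mcp_command_line | | Enable MCP-based code execution |
| command_line_execution_mode | "local" | Execution mode: "local" or "docker" |
| command_line_docker_image | "massgen/mcp-runtime:latest" | Docker image for container execution |
| command_line_docker_memory_limit | None | Memory limit (e.g., "2g") |
| command_line_docker_cpu_limit | None | CPU cores limit (e.g., 4.0) |
| command_line_docker_network_mode | "none" | Network mode: "none", "bridge", or "host" |
| command_line_docker_enable_sudo | | Enable sudo in containers (⚠️ less secure, see docs) |
| command_line_whitelist_patterns | None | Regex patterns for allowed commands |
| command_line_blacklist_patterns | None | Regex patterns for blocked commands |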
Docker Mode Setup#
Prerequisites#
Docker installed and running:
docker --version  # Should show Docker Engine >= 28.0.0
docker ps         # Should connect without errors
Recommended: Docker Engine 28.0.0+ (release notes)
Python docker library:
# Install via optional dependency group
uv pip install -e ".[docker]"
# Or install directly (quoted so the shell does not treat ">=" as a redirect)
pip install "docker>=7.0.0"
Build Docker Image#
From the repository root:
bash massgen/docker/build.sh
This builds massgen/mcp-runtime:latest (~400-500MB).
Enable Docker Mode#
Simple configuration:
agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
That’s it! The container will be created automatically when orchestration starts.
How It Works#
Container Lifecycle:
Orchestration Start → Creates persistent container massgen-{agent_id}
Agent Turns → Commands execute via docker exec
Orchestration End → Container stopped and removed
Key Features:
Persistent Containers: One container per agent for entire orchestration
State Persistence: Packages and files persist across turns
Path Transparency: Paths mounted at same locations as host
MCP Server on Host: Server runs on host, creates Docker client to execute commands
Volume Mounts:
Workspace: Read-write access to agent’s workspace
Context Paths: Read-only or read-write based on configuration
Temp Workspace: Read-only access to other agents’ outputs
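The volume layout and the path-transparency rule above can be sketched as a function that builds a mount mapping in the `{host_path: {"bind": ..., "mode": ...}}` shape accepted by the `volumes` argument of the Docker SDK for Python. `build_volume_mounts` is a hypothetical helper written for illustration; how MassGen assembles its mounts internally is an assumption here.

```python
from typing import Dict, Optional, Sequence, Tuple


def build_volume_mounts(
    workspace: str,
    context_paths: Sequence[Tuple[str, bool]] = (),
    temp_workspace: Optional[str] = None,
) -> Dict[str, Dict[str, str]]:
    """Map each host path to the same path inside the container.

    context_paths is a sequence of (path, writable) pairs.
    """
    # Workspace: read-write, bound at the same location as on the host.
    volumes = {workspace: {"bind": workspace, "mode": "rw"}}
    # Context paths: read-only or read-write depending on configuration.
    for path, writable in context_paths:
        volumes[path] = {"bind": path, "mode": "rw" if writable else "ro"}
    # Temp workspace: other agents' outputs, always read-only.
    if temp_workspace:
        volumes[temp_workspace] = {"bind": temp_workspace, "mode": "ro"}
    return volumes


if __name__ == "__main__":
    mounts = build_volume_mounts(
        "/home/me/project/workspace",
        context_paths=[("/home/me/project/src", False)],
        temp_workspace="/home/me/project/temp_workspaces",
    )
    for host_path, spec in mounts.items():
        print(host_path, "->", spec["bind"], f"({spec['mode']})")
```

Binding each path at its host location means a path an agent sees in a prompt is valid both on the host and inside the container.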
Security Features#
Multi-Layer Security#
MassGen implements multiple security layers for code execution:
AG2-Inspired Command Sanitization
Blocks dangerous patterns:
rm -rf /
sudo commands
chmod 777
And more…
Command Filtering
Whitelist/blacklist regex patterns:
command_line_whitelist_patterns: ["pip install.*", "python .*"]
command_line_blacklist_patterns: ["rm -rf.*", "sudo.*"]
Docker Container Isolation (Docker mode only)
Filesystem isolation (only mounted volumes accessible)
Network isolation (default: no network)
Resource limits (memory, CPU)
Process isolation (non-root user)
PathPermissionManager Hooks
Validates file operations against context path permissions
Timeout Enforcement
Commands timeout after configured duration
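The command filtering layer above can be pictured with a minimal sketch of regex allow/deny matching. The evaluation order (blacklist first, then whitelist) and the matching style (`re.search` for blocked patterns, `re.match` for allowed ones) are assumptions made for this example, and `is_command_allowed` is a hypothetical name; MassGen's actual filter may differ.

```python
import re
from typing import List, Optional, Tuple


def is_command_allowed(
    command: str,
    whitelist: Optional[List[str]] = None,
    blacklist: Optional[List[str]] = None,
) -> Tuple[bool, str]:
    """Return (allowed, reason) for a shell command string."""
    # Blocked patterns win and may match anywhere in the command,
    # so "python x.py && rm -rf /" is still rejected.
    for pattern in blacklist or []:
        if re.search(pattern, command):
            return False, f"blocked by blacklist pattern: {pattern}"
    # With a whitelist, the command must start with an allowed pattern.
    if whitelist and not any(re.match(pattern, command) for pattern in whitelist):
        return False, "no whitelist pattern matched"
    return True, "allowed"


if __name__ == "__main__":
    allow = ["pip install.*", "python .*"]
    deny = ["rm -rf.*", "sudo.*"]
    for cmd in ["python analyze.py", "sudo apt-get install vim", "ls -la"]:
        print(cmd, "->", is_command_allowed(cmd, allow, deny))
```

Pattern filters like this are easy to circumvent (for example through a script that the allowed interpreter runs), which is why they are one layer among several and not a replacement for container isolation.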
Local vs Docker Comparison#
| Aspect | Local Mode | Docker Mode |
|---|---|---|
| Setup | None required | Docker + image build |
| Performance | Fast (direct execution) | Slight overhead (~100-200ms) |
| Isolation | Pattern-based (circumventable) | Container-based (strong) |
| Network | Full host network | Configurable (none/bridge/host) |
| Resource Limits | OS-level only | Docker-enforced |
| Security | Medium | High |
| Best For | Development, trusted code | Production, untrusted code |
Usage Examples#
Example 1: Python Development#
agent:
  backend:
    type: "claude"
    model: "claude-sonnet-4"
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
massgen "Write and test a sorting algorithm"
What happens:
Agent writes sort.py
Agent runs pip install pytest
Agent writes tests in test_sort.py
Agent runs pytest
All isolated in Docker container!
Example 2: With Resource Constraints#
agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
    command_line_docker_memory_limit: "1g"
    command_line_docker_cpu_limit: 1.0
    command_line_docker_network_mode: "none"
Good for untrusted or resource-intensive tasks.
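As a rough illustration of what these three settings mean at the Docker level, the sketch below translates them into keyword arguments that the Docker SDK for Python accepts in `containers.run` (`mem_limit`, `nano_cpus`, `network_mode`). The mapping itself and the helper name `docker_resource_kwargs` are assumptions for this example; MassGen may wire the values differently.

```python
from typing import Any, Dict


def docker_resource_kwargs(backend: Dict[str, Any]) -> Dict[str, Any]:
    """Translate MassGen-style backend settings into Docker SDK keyword arguments."""
    kwargs: Dict[str, Any] = {
        # No network unless the config asks for one.
        "network_mode": backend.get("command_line_docker_network_mode", "none"),
    }
    memory = backend.get("command_line_docker_memory_limit")
    if memory is not None:
        kwargs["mem_limit"] = memory  # e.g. "1g"
    cpus = backend.get("command_line_docker_cpu_limit")
    if cpus is not None:
        # Docker expresses CPU quota in billionths of a CPU.
        kwargs["nano_cpus"] = int(round(float(cpus) * 1_000_000_000))
    return kwargs


if __name__ == "__main__":
    backend = {
        "command_line_docker_memory_limit": "1g",
        "command_line_docker_cpu_limit": 1.0,
        "command_line_docker_network_mode": "none",
    }
    print(docker_resource_kwargs(backend))
```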
Example 3: With Network Access#
agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
    command_line_docker_network_mode: "bridge"
massgen "Fetch data from an API and analyze it"
Agent can make HTTP requests from inside container.
Example 4: Multi-Agent with Different Modes#
agents:
  - id: "developer"
    backend:
      type: "openai"
      model: "gpt-5-mini"
      cwd: "workspace1"
      enable_mcp_command_line: true
      command_line_execution_mode: "local"  # Fast for development
  - id: "tester"
    backend:
      type: "claude"
      model: "claude-sonnet-4"
      cwd: "workspace2"
      enable_mcp_command_line: true
      command_line_execution_mode: "docker"  # Isolated for testing
Docker Image Details#
Base Image: massgen/mcp-runtime:latest#
Contents:
Base: Python 3.11-slim
System packages: git, curl, build-essential, Node.js 20.x
Python packages: pytest, requests, numpy, pandas
User: non-root (massgen, UID 1000)
Working directory: /workspace
Size: ~400-500MB (compressed)
Custom Images#
Extend the base image with additional packages:
FROM massgen/mcp-runtime:latest
# Install additional system packages
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
postgresql-client \
&& rm -rf /var/lib/apt/lists/*
# Install additional Python packages
USER massgen
RUN pip install --no-cache-dir sqlalchemy psycopg2-binary
WORKDIR /workspace
Build and use:
docker build -t my-custom-runtime:latest -f Dockerfile.custom .
command_line_docker_image: "my-custom-runtime:latest"
Sudo Variant (Runtime Package Installation)#
The sudo variant allows agents to install system packages at runtime inside their Docker container.
IMPORTANT: Build the image before first use:
bash massgen/docker/build.sh --sudo
This builds massgen/mcp-runtime-sudo:latest with sudo access locally. (This image is not available on Docker Hub - you must build it yourself.)
Enable in config:
agent:
  backend:
    cwd: "workspace"
    enable_mcp_command_line: true
    command_line_execution_mode: "docker"
    command_line_docker_enable_sudo: true  # Automatically uses sudo image
What agents can do with sudo:
# Install system packages at runtime
sudo apt-get update && sudo apt-get install -y ffmpeg
# Install additional Python packages
sudo pip install tensorflow
Is this safe?
YES, because Docker container isolation is the primary security boundary:
Container is fully isolated from your host:
Sudo inside container ≠ sudo on your computer
Agent can only access mounted volumes (workspace, context paths)
Cannot access your host filesystem outside mounts
Cannot affect host processes or system configuration
Docker namespaces/cgroups provide strong isolation
What sudo can and cannot do:
✅ Can: Install packages inside the container (apt, pip, npm)
✅ Can: Modify container system configuration
✅ Can: Read/write mounted workspace (same as without sudo)
❌ Cannot: Access your host filesystem outside mounts
❌ Cannot: Affect your host system
❌ Cannot: Break out of the container (unless Docker vulnerability exists)
Theoretical risks (rare):
Container escape vulnerabilities (CVEs in Docker or the kernel) are uncommon and are usually patched quickly
Sudo slightly increases the attack surface if an escape vulnerability exists
Still requires exploit code, not just malicious intent
When to use sudo variant vs custom images:
| Approach | Use When | Performance | Security |
|---|---|---|---|
| Sudo variant | Need flexibility, unknown packages, prototyping | Slower (runtime install) | Good (container isolated) |
| Custom image | Know packages, production use | Fast (pre-installed) | Best (minimal attack surface) |
Custom image example (recommended for production):
FROM massgen/mcp-runtime:latest
USER root
RUN apt-get update && apt-get install -y ffmpeg postgresql-client
USER massgen
Build: docker build -t my-runtime:latest .
Use: command_line_docker_image: "my-runtime:latest"
Bottom line: The sudo variant is safe for most use cases because Docker container isolation is strong. Custom images are preferred for production because they’re faster and have a smaller attack surface, but sudo is fine for development and prototyping.
Troubleshooting#
Docker Not Installed#
Symptom: RuntimeError: Docker Python library not available
Solution:
pip install "docker>=7.0.0"
Failed to Connect to Docker#
Symptom: RuntimeError: Failed to connect to Docker: ...
Possible causes:
Docker daemon not running:
docker ps # Check if Docker is running
Permission issues (Linux):
sudo usermod -aG docker $USER # Log out and back in
Custom Docker socket:
export DOCKER_HOST=unix:///path/to/docker.sock
Image Not Found#
Symptom: RuntimeError: Failed to pull Docker image ...
Solution:
bash massgen/docker/build.sh
Permission Errors in Container#
Symptom: Permission denied when writing files
Solution: Ensure workspace has correct permissions:
chmod -R 755 workspace
Performance Issues#
Solutions:
Increase resource limits:
command_line_docker_memory_limit: "4g"
command_line_docker_cpu_limit: 4.0
Use custom image with pre-installed packages
Check Docker Desktop resource settings
Debugging#
Inspect Running Container#
# List containers
docker ps | grep massgen
# View logs in real-time
docker logs -f massgen-{agent_id}
# Execute interactive shell
docker exec -it massgen-{agent_id} /bin/bash
Check Resource Usage#
docker stats massgen-{agent_id}
Manual Container Management#
# Stop container
docker stop massgen-{agent_id}
# Remove container
docker rm massgen-{agent_id}
# Clean up all stopped containers
docker container prune -f
Background Shell Execution#
NEW: MassGen supports running commands in the background without blocking, enabling parallel execution and long-running processes.
What is Background Execution?#
Background execution allows agents to:
Start long-running processes (training, servers, simulations)
Run multiple experiments in parallel
Monitor processes without blocking
Continue working while tasks execute
Available Tools:
When enable_mcp_command_line: true is set, agents automatically get these tools:
start_background_shell(command, work_dir) - Start command in background, returns shell_id
get_background_shell_output(shell_id) - Retrieve stdout/stderr from background process
get_background_shell_status(shell_id) - Check if running/stopped/failed
kill_background_shell(shell_id) - Terminate a background process
list_background_shells() - List all active background processes
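To make the tool semantics concrete, here is a minimal sketch of a background process registry built on the standard `subprocess` module. It is not MassGen's implementation: the `BackgroundShells` class and its method names are hypothetical, it applies no command sanitization, and it writes output to a temporary log file instead of an in-memory buffer.

```python
import subprocess
import tempfile
import uuid
from typing import Dict, Optional


class BackgroundShells:
    """Hypothetical in-process registry of background shell commands."""

    def __init__(self, max_concurrent: int = 10) -> None:
        self.max_concurrent = max_concurrent
        self._shells: Dict[str, dict] = {}

    def start(self, command: str, work_dir: Optional[str] = None) -> str:
        running = [s for s in self._shells.values() if s["proc"].poll() is None]
        if len(running) >= self.max_concurrent:
            raise RuntimeError("too many background shells are running")
        # Send stdout and stderr to a log file so output can be read at any time.
        log = tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False)
        proc = subprocess.Popen(
            command, shell=True, cwd=work_dir, stdout=log, stderr=subprocess.STDOUT
        )
        log.close()  # The child process keeps its own handle to the file
        shell_id = uuid.uuid4().hex[:8]
        self._shells[shell_id] = {"proc": proc, "log_path": log.name}
        return shell_id

    def status(self, shell_id: str) -> str:
        code = self._shells[shell_id]["proc"].poll()
        if code is None:
            return "running"
        return "stopped" if code == 0 else "failed"

    def output(self, shell_id: str) -> str:
        path = self._shells[shell_id]["log_path"]
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.read()

    def wait(self, shell_id: str, timeout: Optional[float] = None) -> int:
        return self._shells[shell_id]["proc"].wait(timeout=timeout)

    def kill(self, shell_id: str) -> None:
        proc = self._shells[shell_id]["proc"]
        if proc.poll() is None:
            proc.terminate()
            proc.wait()


if __name__ == "__main__":
    shells = BackgroundShells()
    shell_id = shells.start("echo hello from the background")
    shells.wait(shell_id, timeout=10)
    print(shells.status(shell_id), shells.output(shell_id).strip())
```

The key property is that `start` returns immediately with an identifier, and every later call takes that identifier instead of blocking on the process.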
Example: Parallel Experiments#
agent:
  backend:
    type: "openai"
    model: "gpt-5-mini"
    cwd: "workspace"
    enable_mcp_command_line: true
  system_message: |
    You can run multiple experiments in parallel using background shell tools.
    Use start_background_shell() to launch tasks, then monitor with
    list_background_shells() and collect results when complete.
Agent workflow:
import time
# Start 3 experiments in parallel
exp1 = start_background_shell("python experiment_a.py")
exp2 = start_background_shell("python experiment_b.py")
exp3 = start_background_shell("python experiment_c.py")
# Monitor until all complete
while True:
    shells = list_background_shells()
    running = [s for s in shells["shells"] if s["status"] == "running"]
    if not running:
        break
    time.sleep(5)  # Pause between polls instead of busy-waiting
# Collect results
result1 = get_background_shell_output(exp1["shell_id"])
result2 = get_background_shell_output(exp2["shell_id"])
result3 = get_background_shell_output(exp3["shell_id"])
Example: Server Management#
# Start web server in background
server = start_background_shell("uvicorn app:main --port 8000")
# Server runs while agent does other work...
# Run integration tests
test_result = execute_command("pytest tests/integration/")
# Cleanup: stop server
kill_background_shell(server["shell_id"])
Example: Long-Running Tasks with Monitoring#
import time
# Start training job
training = start_background_shell("python train.py --epochs 100")
# Monitor progress periodically
while True:
    status = get_background_shell_status(training["shell_id"])
    if status["status"] != "running":
        break
    # Check progress from output
    output = get_background_shell_output(training["shell_id"])
    # Look for "Epoch X/100" in output...
    time.sleep(30)  # Pause between polls instead of busy-waiting
# Training complete
final_output = get_background_shell_output(training["shell_id"])
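A monitoring loop like this one is easier to reuse, and safer against jobs that never finish, when the polling interval and an overall timeout are explicit. The helper below is a sketch under assumptions: `wait_for_shell` is a hypothetical name, and the status function is passed in as a parameter because the real tool is only available to the agent at run time. It relies only on the `{"status": ...}` shape used in the examples above.

```python
import time
from typing import Callable, Dict


def wait_for_shell(
    get_status: Callable[[str], Dict],
    shell_id: str,
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll get_status(shell_id) until the shell is no longer running.

    Raises TimeoutError if the shell is still running after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(shell_id)
        if status["status"] != "running":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"shell {shell_id} still running after {timeout}s")
        sleep(poll_interval)


if __name__ == "__main__":
    # Stand-in for get_background_shell_status: finishes on the third poll.
    answers = iter(["running", "running", "stopped"])
    final = wait_for_shell(lambda sid: {"status": next(answers)}, "demo", poll_interval=0.01)
    print(final["status"])
```

Injecting `sleep` and `clock` keeps the helper testable without real delays.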
Key Features#
Non-blocking: Continue work while processes run
Parallel execution: Run multiple tasks simultaneously (default limit: 10 concurrent)
Memory-safe: Ring buffer captures last 10,000 lines (prevents OOM on infinite output)
Auto-cleanup: All background processes killed on MassGen exit
Thread-safe: Safe for concurrent access from multiple agents
Same security: Background shells use same sanitization as foreground execute_command
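The memory-safe behavior listed above can be sketched with a bounded `collections.deque`, which discards the oldest entry once it is full. `OutputRingBuffer` is a hypothetical class for illustration; the 10,000-line default mirrors the limit stated above, but the actual buffer implementation in MassGen is not shown here.

```python
from collections import deque


class OutputRingBuffer:
    """Keep only the most recent `max_lines` lines of process output."""

    def __init__(self, max_lines: int = 10_000) -> None:
        self._lines = deque(maxlen=max_lines)
        self.dropped = 0  # Number of old lines discarded so far

    def append(self, line: str) -> None:
        if len(self._lines) == self._lines.maxlen:
            self.dropped += 1  # The oldest line is about to be evicted
        self._lines.append(line)

    def text(self) -> str:
        return "\n".join(self._lines)


if __name__ == "__main__":
    buffer = OutputRingBuffer(max_lines=3)
    for i in range(5):
        buffer.append(f"line {i}")
    print(buffer.text())
    print("dropped:", buffer.dropped)
```

Memory use stays bounded no matter how much a process prints, at the cost of losing the earliest output of very chatty commands.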
Demo Configuration#
See massgen/configs/tools/code-execution/background_shell_demo.yaml for a complete example showing parallel vs sequential execution strategies.
Best Practices#
Use Docker mode for untrusted or production workloads
Set resource limits to prevent abuse
Use command_line_docker_network_mode: "none" unless network is required
Build custom images for frequently used packages (faster)
Monitor container logs for debugging
Test in local mode first for faster iteration
Use command filtering to restrict dangerous operations
Use background shells for parallel tasks - Run multiple experiments concurrently
Monitor background processes - Use get_background_shell_status() to check progress
Clean up background shells - Kill when done or let auto-cleanup handle it
Configuration Examples#
See massgen/configs/tools/code-execution/ for example configurations:
basic_command_execution.yaml - Minimal code execution setup
code_execution_use_case_simple.yaml - Simple use case example
command_filtering_whitelist.yaml - Whitelist filtering example
command_filtering_blacklist.yaml - Blacklist filtering example
docker_simple.yaml - Minimal Docker setup
docker_with_resource_limits.yaml - Memory/CPU limits with network
docker_multi_agent.yaml - Multi-agent with Docker isolation
docker_verification.yaml - Verify Docker isolation works
background_shell_demo.yaml - NEW: Parallel execution with background shells
Next Steps#
File Operations & Workspace Management - File system operations and workspace management
MCP Integration - Additional MCP tools beyond code execution
Supported Models & Backends - Backend capabilities including code execution
Running MassGen - More usage examples
References#
Design Document: docs/dev_notes/CODE_EXECUTION_DESIGN.md
NEW: Background Execution Design: docs/dev_notes/background_shell_execution_design.md
Docker README: massgen/docker/README.md
Build Script: massgen/docker/build.sh