LLM Agent Automation Guide

This guide shows how to automate MassGen coordination using LLM agents and programmatic workflows.

Overview #

MassGen provides an automation mode (introduced in v0.1.8) designed for LLM agents and background execution:

✅ Silent output (~10 lines instead of 250-3,000+)
✅ Real-time status tracking via status.json (updated every 2 seconds)
✅ Meaningful exit codes (success, timeout, error, interrupted)
✅ Structured result files (machine-readable JSON and text)
✅ Parallel execution support (isolated log directories)

Quick Start #

Basic Automation Mode #

uv run massgen --automation --config your_config.yaml "Your question here"

Output (minimal, parseable):

LOG_DIR: /path/to/.massgen/massgen_logs/log_20251103_143022_123456
STATUS: /path/to/.massgen/massgen_logs/log_20251103_143022_123456/status.json
QUESTION: Your question here
[Coordination in progress - monitor status.json for real-time updates]

WINNER: agent_a
ANSWER_FILE: /path/to/final/agent_a/answer.txt
DURATION: 45.3s
ANSWER_PREVIEW: The answer starts here...

COMPLETED: 2 agents, 45.3s total
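
Because the output is a series of `KEY: value` lines, a caller can turn it into a dictionary with a few lines of code. The sketch below is one possible parser, not part of MassGen; the function name `parse_automation_output` is hypothetical, and it assumes every field uses an upper-case key followed by a colon and a space, as in the sample above.

```python
def parse_automation_output(stdout: str) -> dict[str, str]:
    """Collect the KEY: value lines printed by --automation into a dict."""
    fields: dict[str, str] = {}
    for line in stdout.splitlines():
        # Split on the first ": " only, so values may contain colons.
        key, sep, value = line.partition(": ")
        # Keys in the sample output are upper-case words such as LOG_DIR.
        if sep and key.isupper() and key.replace("_", "").isalpha():
            fields[key] = value.strip()
    return fields
```

Lines that do not match the pattern, such as the bracketed progress message, are skipped.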

Exit codes:

0 = Success (coordination completed)
1 = Configuration error
2 = Execution error (agent failure, API error)
3 = Timeout
4 = Interrupted (Ctrl+C)
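
A small enum keeps these codes readable in calling code. This is an illustrative sketch: the class name `MassGenExit` and the member names are assumptions chosen for this example, and only the numeric values come from the list above.

```python
from enum import IntEnum


class MassGenExit(IntEnum):
    """Exit codes from the list above; member names are illustrative."""
    SUCCESS = 0
    CONFIG_ERROR = 1
    EXECUTION_ERROR = 2
    TIMEOUT = 3
    INTERRUPTED = 4


def describe_exit(code: int) -> str:
    """Return a readable label, with a fallback for codes the guide does not list."""
    try:
        return MassGenExit(code).name.lower()
    except ValueError:
        return f"unknown({code})"
```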

Using BackgroundShellManager #

MassGen provides BackgroundShellManager for robust background execution. Use it rather than calling subprocess directly.

Note

BackgroundShellManager is for running full CLI processes (for example, uv run massgen ...) in the background. For non-blocking tool calls inside an agent run, use the tool lifecycle documented in Background Tool Execution.

Basic Usage #

from massgen.filesystem_manager.background_shell import (
    start_shell,
    get_shell_output,
    get_shell_status,
    kill_shell,
)

# Start MassGen in background
shell_id = start_shell(
    "uv run massgen --automation --config config.yaml 'Your question'"
)

# Monitor progress
import time
while True:
    status = get_shell_status(shell_id)
    if status["status"] != "running":
        break
    time.sleep(2)

# Get results
output = get_shell_output(shell_id)
print(f"Exit code: {output['exit_code']}")
print(f"Output:\n{output['stdout']}")

Parallel Execution #

Parallel Execution Safety #

✅ Parallel execution is AUTOMATIC and SAFE in ALL modes!

MassGen automatically isolates all resources when running multiple instances:

Generates unique instance IDs - Appends a random 8-character ID to prevent conflicts (for example, workspace1 → workspace1_a1b2c3d4)

Isolates all resources automatically:

✅ Log directories - Microsecond-precision timestamps

✅ Workspaces - Auto-generated unique suffixes

✅ Snapshot storage - Per-agent subdirectories

✅ Docker containers - Auto-generated unique container names (includes instance ID suffix)

No manual configuration needed! Just use the same config multiple times:

# ✅ SAFE - Run the same config 5 times in parallel (with or without --automation)
for i in {1..5}; do
    uv run massgen --config my_config.yaml "Task $i" &
done
wait

Each instance automatically gets unique workspace paths and Docker containers:

Instance 1: workspace1_a1b2c3d4, massgen-agent_a-a1b2c3d4
Instance 2: workspace1_e5f6a7b8, massgen-agent_a-e5f6a7b8
Instance 3: workspace1_c9d0e1f2, massgen-agent_a-c9d0e1f2

Note: This works in both automation mode (--automation) and normal mode. The difference is that automation mode provides silent output and status.json tracking, while normal mode shows the full UI.
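
To make the suffixing scheme concrete, here is a rough illustration of how a random 8-character ID could be appended to a workspace name. It is not MassGen's actual implementation; the helper names and the use of `uuid4` are assumptions made only to show the shape of the resulting names.

```python
import uuid
from typing import Optional


def new_instance_id() -> str:
    """Return a random 8-character hexadecimal ID (illustrative scheme)."""
    return uuid.uuid4().hex[:8]


def with_instance_suffix(name: str, instance_id: Optional[str] = None) -> str:
    """Append an instance ID to a resource name, e.g. workspace1_a1b2c3d4."""
    return f"{name}_{instance_id or new_instance_id()}"
```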

Running Multiple Experiments Simultaneously #

Programmatic Parallel Execution:

Use the BackgroundShellManager for robust programmatic parallel execution:

from massgen.filesystem_manager.background_shell import (
    start_shell,
    get_shell_output,
    get_shell_status,
)
import time

def run_experiments_in_parallel(configs_and_questions):
    """
    Run multiple MassGen experiments in parallel.

    Args:
        configs_and_questions: List of (config_path, question) tuples

    Returns:
        list: Results from all experiments
    """
    experiments = []

    # Start all experiments
    for config, question in configs_and_questions:
        shell_id = start_shell(
            f'uv run massgen --automation --config {config} "{question}"'
        )
        experiments.append({
            "shell_id": shell_id,
            "config": config,
            "question": question,
        })
        print(f"Started experiment {shell_id}: {question[:50]}...")

    # Wait for all to complete
    while True:
        all_done = True
        for exp in experiments:
            status = get_shell_status(exp["shell_id"])
            if status["status"] == "running":
                all_done = False

        if all_done:
            break

        time.sleep(2)

    # Collect results
    results = []
    for exp in experiments:
        status = get_shell_status(exp["shell_id"])
        output = get_shell_output(exp["shell_id"])
        results.append({
            "config": exp["config"],
            "question": exp["question"],
            "exit_code": output["exit_code"],
            "duration": status["duration_seconds"],
            "status": status["status"],
        })

    return results


# Example: Run the SAME config with different questions (parallel isolation is automatic!)
experiments = [
    ("my_config.yaml", "Create a webpage about Bob Dylan"),
    ("my_config.yaml", "Write a Python script to analyze data"),
    ("my_config.yaml", "Design a REST API for a blog"),
]

results = run_experiments_in_parallel(experiments)

for result in results:
    print(f"{result['question']}: {result['status']} in {result['duration']}s")
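
When the experiment list is longer than the number of shells you want running at once, it can be split into batches first. The helper below is a generic sketch (the name `in_batches` is hypothetical) and does not call MassGen itself.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def in_batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch can then be passed to run_experiments_in_parallel in turn, so that only one batch of shells is active at a time.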

Status File Overview #

The status.json file is updated every 2 seconds during coordination.

Note

For a complete reference with every field documented, see the status.json Reference.

File Location #

.massgen/massgen_logs/log_YYYYMMDD_HHMMSS_ffffff/status.json

Quick Reference #

{
  "meta": {
    "last_updated": 1730678901.234,
    "session_id": "log_20251103_143022_123456",
    "log_dir": ".massgen/massgen_logs/log_20251103_143022_123456",
    "question": "Your question here",
    "start_time": 1730678800.000,
    "elapsed_seconds": 101.234
  },
  "coordination": {
    "phase": "enforcement",
    "active_agent": "agent_b",
    "completion_percentage": 65,
    "is_final_presentation": false
  },
  "agents": {
    "agent_a": {
      "status": "voted",
      "answer_count": 1,
      "latest_answer_label": "agent1.1",
      "vote_cast": {
        "voted_for_agent": "agent_a",
        "voted_for_label": "agent1.1",
        "reason_preview": "Strong JSON structure..."
      },
      "times_restarted": 1,
      "last_activity": 1730678850.123,
      "error": null
    },
    "agent_b": {
      "status": "streaming",
      "answer_count": 0,
      "latest_answer_label": null,
      "vote_cast": null,
      "times_restarted": 1,
      "last_activity": 1730678900.456,
      "error": {
        "type": "timeout",
        "message": "Agent timeout after 180s",
        "timestamp": 1730678900.0
      }
    }
  },
  "results": {
    "votes": {
      "agent1.1": 1,
      "agent1.2": 0
    },
    "winner": null,
    "final_answer_preview": null
  }
}
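
A polling script usually needs only a few of these fields. The sketch below loads the file defensively and builds a one-line progress summary from the `coordination` and `agents` sections shown above. The function names are hypothetical, and treating an unreadable file as "try again later" is an assumption about how a reader might cope with catching the file mid-update.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Optional


def load_status(path: Path) -> Optional[dict]:
    """Return the parsed status, or None if the file is missing or incomplete."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def summarize_status(status: dict) -> str:
    """Build a short progress line such as 'enforcement 65% [voted=1]'."""
    coordination = status.get("coordination", {})
    agents = status.get("agents", {})
    counts = Counter(agent.get("status", "unknown") for agent in agents.values())
    breakdown = ", ".join(f"{name}={n}" for name, n in sorted(counts.items()))
    phase = coordination.get("phase", "unknown")
    percent = coordination.get("completion_percentage", 0)
    return f"{phase} {percent}% [{breakdown}]"
```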

Agent Status Values #

streaming: Agent is actively generating content
answered: Agent has provided an answer this round
voted: Agent has cast their vote
restarting: Agent is restarting due to new answer
error: Agent encountered an error
timeout: Agent timed out
completed: Agent finished all work

Coordination Phases #

initial_answer: Agents providing initial answers
enforcement: Voting phase
presentation: Final answer presentation

Reading Results #

Log Directory Structure #

After coordination completes, find results in the log directory:

.massgen/massgen_logs/log_YYYYMMDD_HHMMSS_ffffff/
├── execution_metadata.yaml       # Session metadata
├── coordination_events.json      # Complete event log
├── status.json                   # Final status snapshot
├── snapshot_mappings.json        # Answer/vote file mappings
├── final/
│   └── {winner_agent}/
│       ├── answer.txt            # ⭐ Final answer here
│       ├── context.txt           # Agent's context
│       └── workspace/            # Agent's workspace snapshot
├── agent_outputs/
│   ├── agent_a.txt              # Full agent log
│   └── agent_b.txt
└── massgen.log                   # Detailed debug log
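
If the LOG_DIR line from the run was not captured, the most recent log directory can be located by name. This is a minimal sketch with a hypothetical function name; it assumes the timestamped `log_*` directory names sort chronologically, which holds as long as all names share the same timestamp format.

```python
from pathlib import Path
from typing import Optional


def find_latest_log_dir(root: Path) -> Optional[Path]:
    """Return the newest log_* directory under root, or None if there is none."""
    candidates = [p for p in root.glob("log_*") if p.is_dir()]
    # Timestamped names sort chronologically, so the largest name is the latest.
    return max(candidates, key=lambda p: p.name, default=None)
```

A typical call would be `find_latest_log_dir(Path(".massgen/massgen_logs"))`.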

Programmatic Access #

import json
from pathlib import Path

import yaml  # PyYAML


def read_massgen_results(log_dir: Path) -> dict:
    """Read MassGen coordination results from a completed log directory."""
    # Read the final status snapshot
    with open(log_dir / "status.json") as f:
        status = json.load(f)

    # The winner is null until coordination has finished
    winner = status["results"]["winner"]

    # Read the final answer, if a winner was chosen and the file exists
    answer = None
    if winner is not None:
        answer_file = log_dir / "final" / winner / "answer.txt"
        if answer_file.exists():
            answer = answer_file.read_text()

    # Read execution metadata
    with open(log_dir / "execution_metadata.yaml") as f:
        metadata = yaml.safe_load(f)

    return {
        "winner": winner,
        "answer": answer,
        "duration": status["meta"]["elapsed_seconds"],
        "votes": status["results"]["votes"],
        "config": metadata["config"],
        "question": metadata["question"],
    }

Meta-Coordination: MassGen Running MassGen #

MassGen can autonomously run and monitor itself, enabling self-improvement and automated experimentation.

Tip

Case Study: The v0.1.8 release includes a complete case study, MassGen v0.1.8: Automation Mode Enables Meta Self-Analysis, which demonstrates meta-coordination in action. Agents successfully ran nested MassGen experiments, analyzed execution logs, and proposed 6 prioritized performance improvements with starter code.

Available Meta Configs #

1. massgen_runs_massgen.yaml - Run MassGen experiments

uv run massgen --config @examples/configs/meta/massgen_runs_massgen.yaml \
    "Run a MassGen experiment to create a webpage about Bob Dylan"

2. massgen_suggests_to_improve_massgen.yaml - Run experiments AND suggest improvements

uv run massgen --config @examples/configs/meta/massgen_suggests_to_improve_massgen.yaml \
    "Run an experiment with MassGen then read the logs and suggest any improvements to help MassGen perform better along any dimension (quality, speed, cost, creativity, etc.)."

This configuration was used in the v0.1.8 case study where agents analyzed MassGen’s architecture, ran controlled experiments, and identified optimization opportunities.

Example Configuration #

Config: @examples/configs/meta/massgen_runs_massgen.yaml

agents:
  - id: "meta_agent"
    backend:
      type: "openai"
      model: "gpt-5-mini"
      cwd: "workspace_meta"
      enable_mcp_command_line: true
      command_line_execution_mode: "local"
    system_message: |
      You have access to MassGen through the command line and can:
      - Run MassGen in automation mode using: uv run massgen --automation --config [config] "[question]"
      - Monitor progress by reading status.json files
      - Read final results from log directories
      - Parse coordination outcomes
      - Always run MassGen in a background process to avoid blocking
orchestrator:
  snapshot_storage: "snapshots_meta"
  agent_temporary_workspace: "temp_workspaces_meta"

Running the Example #

uv run massgen --config massgen/configs/meta/massgen_runs_massgen.yaml \
    "Run a MassGen experiment to create a webpage about Bob Dylan"

What happens:

The meta_agent receives your request
It executes: uv run massgen --automation --config massgen/configs/tools/todo/example_task_todo.yaml "Create a simple HTML page about Bob Dylan"
It monitors the nested MassGen’s status.json file
It reads the final results
It reports which agent won (agent_a or agent_b) and shows the final HTML page

Output demonstrates:

✅ MassGen can autonomously run experiments
✅ Can monitor progress via status.json
✅ Can parse and report coordination outcomes
✅ Can read final results from log directories

Current Limitations #

Note

Local Execution Only: The meta-config currently uses command_line_execution_mode: "local". Docker execution for nested MassGen requires:

API credential passing to nested instances
Automatic dependency installation (e.g., reinstalling MassGen in container)
See Issue #436 for planned Docker support

Warning

Cost Control: Meta-coordination can result in significant API costs as agents run experiments autonomously. Always set strict timeout limits. See Issue #432 for planned cost tracking features.

Error Handling Best Practices #

Always use timeouts

result = run_massgen_automation(config, question, timeout_seconds=300)
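
The helper run_massgen_automation is not defined in this guide. One way it could be written is sketched below, assuming the background shell functions behave as in the Basic Usage example (a status dictionary with a "status" key, and an output dictionary with "exit_code" and "stdout"). The `shell` and `poll_seconds` parameters, the returned dictionary shape, and the reuse of exit code 3 for a local deadline are choices made for this sketch, not MassGen behavior.

```python
import shlex
import time


def run_massgen_automation(config, question, timeout_seconds=300,
                           shell=None, poll_seconds=2.0):
    """Hypothetical helper: run one automation-mode job and enforce a deadline."""
    if shell is None:
        # Module path taken from the Basic Usage example in this guide.
        from massgen.filesystem_manager import background_shell as shell

    # Quote both arguments so spaces and quotes in them survive the shell.
    command = (
        f"uv run massgen --automation --config {shlex.quote(config)} "
        f"{shlex.quote(question)}"
    )
    shell_id = shell.start_shell(command)
    deadline = time.monotonic() + timeout_seconds
    try:
        while shell.get_shell_status(shell_id)["status"] == "running":
            if time.monotonic() >= deadline:
                # Sketch choice: report a local deadline with the guide's timeout code.
                return {"exit_code": 3, "stdout": "", "timed_out": True}
            time.sleep(poll_seconds)
        output = shell.get_shell_output(shell_id)
        return {
            "exit_code": output["exit_code"],
            "stdout": output["stdout"],
            "timed_out": False,
        }
    finally:
        # Kill the shell if it is still running (after a timeout or an exception).
        if shell.get_shell_status(shell_id)["status"] == "running":
            shell.kill_shell(shell_id)
```

Passing the shell functions in as an object keeps the helper testable with a stand-in, while the default falls back to the real module.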

Check exit codes

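if result["exit_code"] == 0:
    print("Success")
elif result["exit_code"] == 3:
    print("Timeout - try a longer timeout or a simpler query")
elif result["exit_code"] == 2:
    print("Execution error - check the logs")
elif result["exit_code"] == 1:
    print("Configuration error - validate the config file")
else:
    print(f"Stopped with exit code {result['exit_code']}")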
Monitor agent errors in status.json

error = status["agents"]["agent_a"]["error"]
if error:
    # Handle agent-specific error
    print(f"agent_a failed: {error['type']}: {error['message']}")
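
The same check can be applied to every agent at once. The sketch below walks the `agents` section of status.json and collects each non-null `error` entry; the function name is hypothetical, and the `type` and `message` keys follow the Quick Reference example.

```python
def collect_agent_errors(status: dict) -> dict[str, str]:
    """Map agent id to 'type: message' for every agent whose error field is set."""
    errors: dict[str, str] = {}
    for agent_id, agent in status.get("agents", {}).items():
        error = agent.get("error")
        if error:
            errors[agent_id] = (
                f"{error.get('type', 'unknown')}: {error.get('message', '')}"
            )
    return errors
```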

Always clean up on failure

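shell_id = None
try:
    shell_id = start_shell(
        "uv run massgen --automation --config config.yaml 'Your question'"
    )
    # Poll get_shell_status(shell_id) and read the results here
finally:
    # Ensure the shell is killed if it is still running
    if shell_id is not None and get_shell_status(shell_id)["status"] == "running":
        kill_shell(shell_id)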

Validate results exist before reading

if answer_file.exists():
    answer = answer_file.read_text()
else:
    # Handle missing results
    answer = None

Session Viewer #

While --automation mode runs headless, you can observe any session (live or completed) in the full Textual TUI using massgen viewer.

# In terminal 1: Run headless
uv run massgen --automation --config config.yaml "Your question"
# Outputs: LOG_DIR: .massgen/massgen_logs/log_20260309_120000_123456/turn_1/attempt_1

# In terminal 2: View live in TUI
uv run massgen viewer .massgen/massgen_logs/log_20260309_120000_123456/turn_1/attempt_1

The viewer shows the exact same TUI as a normal interactive run — agent panels, tool calls, votes, and final presentation — but in read-only mode.

Quick Reference #

# View the most recent session (auto-detected)
massgen viewer

# View a specific log directory
massgen viewer /path/to/log_dir

# Interactive session picker
massgen viewer --pick

# Replay a completed session at real-time speed
massgen viewer /path/to/log_dir --replay-speed 1

# View in browser (requires textual-serve)
massgen viewer /path/to/log_dir --web

Live vs Replay:

If the session is still running (is_complete: false in status.json), the viewer tails events.jsonl in real time
If the session is completed, all events are replayed instantly (or at --replay-speed if specified)

Tip

This is especially useful for cloud runs, CI/CD pipelines, and embedded processes where you need visual monitoring without a terminal attached to the running process.

Performance Tips #

Use automation mode - Reduces output overhead significantly
Poll status.json every 2-5 seconds - Balances responsiveness and overhead
Limit concurrent experiments - BackgroundShellManager allows at most 10 concurrent shells by default
Clean up old logs - Remove .massgen/massgen_logs/log_* directories periodically
Use appropriate timeouts - Simple tasks: 60s, Complex tasks: 300-600s
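
Log cleanup can be scripted. The sketch below keeps the newest few `log_*` directories and removes the rest; the function name, the `keep` default, and the dry-run behavior are assumptions for illustration, and it again relies on timestamped names sorting chronologically.

```python
import shutil
from pathlib import Path


def prune_old_logs(root: Path, keep: int = 20, dry_run: bool = True) -> list[Path]:
    """Return the log_* directories older than the newest `keep`; delete them unless dry_run."""
    log_dirs = sorted(
        (p for p in root.glob("log_*") if p.is_dir()),
        key=lambda p: p.name,
    )
    # Everything except the last `keep` entries is considered stale.
    stale = log_dirs[:-keep] if keep > 0 else log_dirs
    if not dry_run:
        for path in stale:
            shutil.rmtree(path)
    return stale
```

Running it with the default `dry_run=True` first shows what would be deleted without touching anything.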

Troubleshooting #

Issue: Can’t find log directory #

Symptom: LOG_DIR not printed in output

Solutions:

Ensure --automation flag is used
Check stderr for startup errors
Verify config file exists and is valid

Issue: status.json not updating #

Symptom: status.json file not changing

Solutions:

Ensure logging is enabled (--automation enables it by default)
Check if coordination is actually running
Verify file permissions on log directory

Issue: Process hangs #

Symptom: Process runs indefinitely

Solutions:

Set timeout in your automation script
Monitor status.json for stuck agents
Use kill_shell() to terminate gracefully
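
One way to spot a stuck agent is to compare each agent's `last_activity` value with the current time. The sketch below assumes `last_activity` is a Unix timestamp in seconds, as the Quick Reference example suggests; the function name, the 120-second threshold, and the choice of which statuses count as active are illustrative assumptions.

```python
import time
from typing import Optional

# Statuses in which an agent is expected to keep producing activity (assumption).
ACTIVE_STATES = {"streaming", "restarting"}


def find_stalled_agents(status: dict, max_idle_seconds: float = 120.0,
                        now: Optional[float] = None) -> list[str]:
    """Return ids of active agents with no activity for longer than max_idle_seconds."""
    now = time.time() if now is None else now
    stalled = []
    for agent_id, agent in status.get("agents", {}).items():
        last = agent.get("last_activity")
        if (
            agent.get("status") in ACTIVE_STATES
            and last is not None
            and now - last > max_idle_seconds
        ):
            stalled.append(agent_id)
    return stalled
```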

Issue: Exit code always 1 #

Symptom: Getting config errors

Solutions:

Validate config with uv run massgen --validate --config your_config.yaml
Check that all required API keys are set
Verify model names are correct

Limitations #

Current Constraints #

1. Local Code Execution Only (for MassGen-running-MassGen)

When using MassGen to run MassGen (meta-coordination), currently only local code execution is supported:

# ✅ Supported
agents:
  - backend:
      enable_mcp_command_line: true
      command_line_execution_mode: "local"

# ❌ Not yet supported for meta-coordination
agents:
  - backend:
      command_line_execution_mode: "docker"
      # Issue: Requires credential passing to nested instances

Why: Docker execution requires API credentials, which need to be securely passed to nested MassGen instances. This will be addressed in a future PR.

2. Cost Control

Warning

IMPORTANT: When using automation mode for autonomous experiments, agents can potentially execute many API calls without human oversight. This can result in unexpected costs.

Best Practices: Configs used for nested MassGen runs should include cost control measures:

Set explicit timeout limits in configs to prevent indefinite hangs:

timeout_settings:
  orchestrator_timeout_seconds: 1800  # 30 minutes max (recommended for meta-coordination)
  agent_timeout_seconds: 600          # 10 minutes per agent

Note: Meta-coordination typically takes 10-30 minutes. Regular tasks: 2-10 minutes.

Limit answers per agent for better progress tracking:

```
orchestrator:
  max_new_answers_per_agent: 2  # Helps track progress more accurately
```

Setting this helps estimate completion percentage more reliably. Without it, agents can provide unlimited answers, making progress tracking less predictable.

Monitor costs via your API provider dashboards

Use less expensive models for automated experimentation:

agents:
  - backend:
      model: "gpt-4o-mini"  # More economical than gpt-4o

Set API rate limits at the provider level
Start with small experiments before scaling

Future Enhancement: Built-in cost tracking and limits (planned).

Next Steps #

Read CLI Reference for all CLI options
See status.json Reference for complete status.json documentation
See YAML Configuration Reference for configuration details
Check Basic Examples for working examples
Review massgen/filesystem_manager/background_shell.py source code
