Architecture#

MassGen’s architecture is designed for scalability, flexibility, and extensibility.

System Overview#

┌─────────────────────────────────────────┐
│           User Application              │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│          Orchestrator Layer             │
│  ┌─────────────┬──────────────────┐    │
│  │  Strategy   │  Consensus       │    │
│  │  Manager    │  Engine           │    │
│  └─────────────┴──────────────────┘    │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│           Agent Layer                   │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│  │Agent1│ │Agent2│ │Agent3│ │AgentN│ │
│  └──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘ │
└─────┼────────┼────────┼────────┼──────┘
      │        │        │        │
┌─────▼────────▼────────▼────────▼──────┐
│         Backend Abstraction            │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│  │OpenAI│ │Claude│ │Gemini│ │ Grok │ │
│  └──────┘ └──────┘ └──────┘ └──────┘ │
└─────────────────────────────────────────┘

Core Components#

Orchestrator#

The orchestrator manages agent coordination:

  • Task distribution

  • Strategy execution

  • Consensus building

  • Result aggregation

Agent#

Agents are autonomous units with:

  • Unique identity and role

  • Backend connection

  • Tool access

  • Memory management

Backend#

Backends provide LLM capabilities:

  • API abstraction

  • Model management

  • Response handling

  • Error recovery

Design Principles#

  1. Modularity: Components are loosely coupled

  2. Extensibility: Easy to add new agents, backends, tools

  3. Scalability: Supports horizontal scaling

  4. Resilience: Fault-tolerant design

  5. Flexibility: Multiple orchestration strategies

Coordination Protocol#

MassGen uses a coordination protocol built around redundancy and iterative refinement, with cycles of restarts and voting-based consensus to validate quality.

Vote-Based Consensus#

The coordination process follows these steps:

  1. Parallel Execution: All agents receive the same query and work simultaneously

  2. Answer Observation: Agents can see recent answers from other agents

  3. Decision Making: Each agent chooses to either:

    • Provide a new/refined answer (new_answer tool)

    • Vote for an existing answer they think is best (vote tool)

  4. Dynamic Updates: When an agent provides new_answer:

    • Other agents receive update injection mid-work

    • Agents continue with preserved context (inject-and-continue)

    • All existing votes are cleared (new answer invalidates votes)

  5. Consensus Detection: Coordination continues until all agents have voted

  6. Winner Selection: The agent with the most votes is selected

  7. Final Presentation: The winning agent delivers the final answer

Key Features:

  • Natural Convergence: No forced consensus, agents naturally agree on best answer

  • Iterative Refinement: Agents can refine their answers after seeing others’ work

  • Workspace Sharing: When agents answer, their workspace is snapshotted for others to review

  • Tie Resolution: Deterministic tie-breaking based on answer order

Inject-and-Continue (Preempt-Not-Restart)#

When an agent provides a new_answer while other agents are working, MassGen uses an inject-and-continue approach instead of restarting agents from scratch.

Traditional Approach (Restart):

Agent A: Working on solution... [deep in analysis]
Agent B: Provides new_answer
         ↓
Agent A: KILL stream → Clear context → Restart from zero
         ❌ Lost all partial work and thinking

MassGen Approach (Inject-and-Continue):

Agent A: Working on solution... [completes response]
Agent B: Provides new_answer
         ↓
Agent A: Receive UPDATE → Append to conversation → New API call
         ✅ Preserved conversation history
         ✅ Can now build on Agent B's answer

How It Works:

When Agent B provides a new_answer, the orchestrator appends an UPDATE message to Agent A’s conversation history containing Agent B’s answer. Agent A then makes a new API call with this extended context. This preserves the conversation history but requires a fresh inference call.

Note

Future Enhancement: True mid-stream injection within tool responses is planned, which would allow updates to be injected while an agent is actively streaming, preserving in-progress thinking. Currently, updates are only processed between API calls.

Benefits:

  1. Conversation Preservation: Agents keep their full conversation history

  2. Collaboration: Agents can synthesize and build on each other’s work

  3. No Full Restart: Agents don’t lose their accumulated context

Update Injection Points:

Updates are injected at safe points during agent execution:

  • Between iteration loops (after completing a response)

  • When agent checks for new context

  • Between API calls (not mid-stream)

Race Condition: If an agent is deep in its first response when a new answer arrives, it won’t see the injection until completing that response. By then, it may already have full context from the orchestrator’s normal flow. This is acceptable - the agent still gets all answers, just via different mechanism (full context on next spawn vs. injection mid-work).

Implementation: massgen/orchestrator.py:_inject_update_and_continue()

Answer Labeling#

Each answer gets a unique identifier: agent{N}.{attempt}

  • agent1.1 = Agent 1’s first answer

  • agent2.1 = Agent 2’s first answer

  • agent1.2 = Agent 1’s second answer (after restart)

  • agent1.final = Agent 1’s final answer (if winner)

This labeling system enables:

  • Clear vote tracking

  • Answer evolution visualization

  • Transparent decision history

Implementation: massgen/orchestrator.py

Workspace Management#

Each agent gets an isolated workspace for safe file operations.

Directory Structure#

.massgen/
├── workspaces/           # Agent working directories
│   ├── agent1/          # Agent 1's isolated workspace
│   └── agent2/          # Agent 2's isolated workspace
├── snapshots/           # Workspace snapshots for coordination
│   ├── agent1_20250113_143022/  # Snapshot of agent1's work
│   └── agent2_20250113_143025/  # Snapshot of agent2's work
├── temp_workspaces/     # Previous turn results for multi-turn
│   ├── agent1_turn_1/   # Agent 1's turn 1 results
│   └── agent2_turn_1/   # Agent 2's turn 1 results
├── sessions/            # Multi-turn conversation history
│   └── session_20250113_143000/
│       ├── turn_1/
│       └── turn_2/
└── massgen_logs/        # All logging output
    └── log_20250113_143000/

Snapshot System#

When an agent provides an answer during coordination:

  1. Capture: Their workspace is copied to snapshots/

  2. Share: Other agents receive read-only access to the snapshot

  3. Review: Agents can examine files, code, and outputs

  4. Build: Agents build on insights from other agents’ work

This enables agents to:

  • See concrete work, not just descriptions

  • Catch errors in code or logic

  • Build incrementally on each other’s contributions

  • Provide informed votes based on actual outputs

Implementation: massgen/filesystem_manager/

Multi-Turn Conversations#

MassGen supports interactive multi-turn conversations with full context preservation.

Session Management#

Each multi-turn session maintains:

  • Session ID: Unique identifier (e.g., session_20250113_143000)

  • Turn History: Numbered turns (turn_1, turn_2, …)

  • Workspace Persistence: Each turn’s workspace is preserved

  • Context Paths: Previous turns become read-only context for next turns

Turn Lifecycle#

  1. Turn Start: Increment turn counter, create turn directory

  2. Context Loading: Previous turn’s workspace becomes read-only context

  3. Execution: Agents work with fresh writeable workspace + previous context

  4. Persistence: Winning agent’s workspace is saved to turn directory

  5. Summary Update: SESSION_SUMMARY.txt is updated with turn details

This allows agents to:

  • Compare “what I changed” vs “what was originally there”

  • Build incrementally across multiple turns

  • Reference previous results explicitly

  • Maintain project continuity

Implementation: massgen/cli.py (multi-turn mode)

MCP Integration#

MassGen integrates Model Context Protocol (MCP) for external tool access.

Architecture#

Backend → MCP Client → MCP Server → External Tools
   ↓
Tools List → Agent → Tool Calls → Tool Results

Supported Backends:

  • Claude: Native MCP support via claude_messages API

  • Gemini: MCP support via function calling

  • Others: Via tool conversion layer

Planning Mode#

Special coordination mode for MCP tools:

  • During Coordination: Agents can plan tool usage without execution

  • After Consensus: Winner executes tools in their final answer

  • Safety: Prevents irreversible actions during collaboration

This is critical for:

  • File operations (create, delete, modify)

  • API calls with side effects

  • Database operations

  • External service integrations

Implementation: massgen/backend/gemini.py, massgen/backend/claude.py

Backend Abstraction#

All LLM interactions go through a unified backend interface.

Backend Interface#

Each backend implements:

class Backend:
    async def chat(messages, stream=True):
        """Stream responses with tool calls"""

    async def get_available_tools():
        """Return tools for this backend"""

    def format_messages(messages):
        """Convert to backend-specific format"""

Supported Backends:

  • API-based: OpenAI, Claude, Gemini, Grok, Azure OpenAI

  • Local: LM Studio, vLLM, SGLang

  • External: AG2 framework agents

  • Native tool backends: Claude Code SDK and Codex CLI with filesystem and shell access

Implementation: massgen/backend/

File Permission System#

MassGen enforces granular file permissions for safe project integration.

Context Paths#

Agents can access specific directories with permissions:

orchestrator:
  context_paths:
    - path: "/path/to/project"
      permission: "write"
      protected_paths:
        - ".git"
        - "node_modules"

Permission Types:

  • read: View files only

  • write: Read, create, modify, delete files (except protected)

Protected Paths:

  • Immune from modification/deletion

  • Relative to context path

  • Supports files and directories

Safety Features:

  • Read-Before-Delete: Agents must read files before deletion

  • Permission Validation: All file operations are checked

  • Audit Trail: All operations logged to massgen.log

Implementation: massgen/filesystem_manager/_path_permission_manager.py

Code Organization#

massgen/
├── orchestrator.py           # Coordination engine
├── chat_agent.py             # Agent implementations
├── cli.py                    # Command-line interface
├── config_builder.py         # Interactive config wizard
├── agent_config.py           # Configuration models
├── backend/                  # LLM backend implementations
│   ├── claude.py            # Anthropic Claude
│   ├── gemini.py            # Google Gemini
│   ├── response.py          # OpenAI
│   ├── grok.py              # xAI Grok
│   ├── claude_code.py       # Claude Code CLI
│   ├── codex.py            # OpenAI Codex CLI
│   ├── external.py          # External frameworks (AG2)
│   └── ...
├── frontend/                 # UI components
│   └── coordination_ui.py   # Terminal UI
├── filesystem_manager/       # File operations & permissions
│   ├── _path_permission_manager.py
│   ├── _workspace_tools_server.py
│   └── ...
├── logger_config.py          # Logging configuration
└── adapters/                 # External framework adapters
    └── ag2/                 # AG2 adapter

Key Modules:

  • orchestrator.py: Vote tracking, consensus detection, workspace snapshots

  • chat_agent.py: Agent lifecycle, message handling, tool execution

  • backend/: LLM-specific implementations with unified interface

  • filesystem_manager/: Permission system, workspace isolation

  • frontend/: Real-time coordination display with Rich

Extension Points#

Adding New Backends#

  1. Subclass Backend base class

  2. Implement chat() and format_messages()

  3. Register in cli.py’s create_backend()

  4. Add to AgentConfig factory methods

Example: massgen/backend/grok.py

Adding MCP Servers#

  1. Configure in YAML:

    backend:
      type: "claude"
      mcp_servers:
        - name: "weather"
          command: "npx"
          args: ["-y", "@modelcontextprotocol/server-weather"]
    
  2. Servers auto-start when backend initializes

  3. Tools automatically discovered and presented to agent

Example: All MCP configs in massgen/configs/tools/mcp/

Adding External Frameworks#

  1. Create adapter in massgen/adapters/{framework}/

  2. Implement ExternalAgentAdapter interface

  3. Register in adapters/__init__.py

  4. Agents work seamlessly with native MassGen agents

Example: massgen/adapters/ag2/

Context Management#

MassGen implements several strategies to manage LLM context windows efficiently.

Reactive Compression#

When the LLM provider returns a context length error, MassGen automatically:

  1. Captures the streaming buffer content (tool calls, reasoning, partial work)

  2. Generates a summary of completed work

  3. Compresses older messages while preserving recent context

  4. Retries the request with the compressed context

See Memory and Context Management for user-facing documentation.

Implementation: massgen/backend/_compression_utils.py

Streaming Buffer#

The StreamingBufferMixin captures streamed content during API calls, enabling compression recovery to preserve partial work when context limits are exceeded.

How it works:

  1. As chunks stream from the API, content is accumulated in _streaming_buffer

  2. If a context length error occurs mid-stream, the buffer contains partial work

  3. The buffer content is passed to compression, which summarizes it

  4. The summary is injected as an assistant message for retry

Buffer content flow:

API Stream → _append_to_streaming_buffer() → _streaming_buffer accumulates
                                           ↓
                           Context error detected
                                           ↓
                           buffer_content passed to compress_messages_for_recovery()
                                           ↓
                           Summarized into: "[Tool execution results]\n{buffer}"
                                           ↓
                           Injected as assistant message in compressed result

Backend Support:

Streaming Buffer Support by Backend#

Backend

Buffer Support

Notes

ChatCompletionsBackend

✅ Yes

Base for OpenAI-compatible APIs

ClaudeBackend

✅ Yes

Anthropic Messages API

GeminiBackend

✅ Yes

Google Gemini SDK

ResponseBackend

✅ Yes

OpenAI Responses API

GrokBackend

✅ Yes

Inherits from ChatCompletionsBackend

LMStudioBackend

✅ Yes

Inherits from ChatCompletionsBackend

InferenceBackend

✅ Yes

Inherits from ChatCompletionsBackend

AzureOpenAIBackend

❌ No

Extends LLMBackend directly

ClaudeCodeBackend

❌ No

Streaming handled internally

ExternalAgentBackend

❌ No

Wrapper for external agents

Implementation:

  • massgen/backend/_streaming_buffer_mixin.py - Mixin class providing buffer methods

  • Buffer methods: _clear_streaming_buffer(), _append_to_streaming_buffer()

  • Buffer respects _compression_retry flag to avoid clearing during retry

Adding buffer support to a backend:

from ._streaming_buffer_mixin import StreamingBufferMixin

class MyBackend(StreamingBufferMixin, CustomToolAndMCPBackend):
    # StreamingBufferMixin MUST come first in MRO
    pass

Note: The mixin must appear before other base classes in the inheritance list to ensure proper method resolution order (MRO).

MCP Tool Result Optimization#

MCP CallToolResult objects contain both structured and text representations. MassGen extracts only the clean text content to minimize context usage:

Raw CallToolResult (sent to context):
❌ "meta=None content=[TextContent(type='text', text='file contents...')]
    structuredContent={'content': 'file contents...'}"  ← Duplicated, bloated

Optimized extraction (sent to context):
✅ "file contents..."  ← Clean, minimal

This optimization typically reduces tool result size by 4-10x, significantly extending how many tool calls can fit within the context window.

Extraction Logic:

  1. Check if result has .content attribute (MCP CallToolResult)

  2. Extract text from TextContent objects in the content list

  3. Fall back to .text attribute or str(result) for other result types

Implementation: _extract_text_from_content() in each backend’s _append_tool_result_message() method.

Large Tool Result Eviction#

Tool results exceeding a token threshold are automatically evicted to files, preventing context window saturation. Inspired by LangChain DeepAgents Harness.

How it works:

  1. After tool execution, the result is checked against the token threshold

  2. If exceeding 20,000 tokens, the result is written to a file in the agent’s workspace

  3. The result is replaced with a reference message containing:

    • Token and character counts

    • File path for retrieval

    • Character position for chunked reading

    • A 2,000 token preview of the content

Example reference message:

[Tool Result Evicted - Too Large for Context]

The result from read_file was 50,000 tokens / 200,000 chars (limit: 20,000 tokens).
Full result saved to: .tool_results/read_file_20241225_143052_a1b2c3d4.txt

To read more: start at char 6,500, read in chunks.

Preview (chars 0-6,500 of 200,000):
{"data": [{"id": 1, "name": "Alice"...

Note: The preview character count varies based on content (approximately 2,000 tokens).

Configuration:

Currently uses hardcoded thresholds defined in constants:

  • TOOL_RESULT_EVICTION_THRESHOLD_TOKENS = 20,000 - Eviction trigger

  • TOOL_RESULT_EVICTION_PREVIEW_TOKENS = 2,000 - Preview size

  • EVICTED_RESULTS_DIR = ".tool_results" - Storage directory

Implementation files:

  • massgen/filesystem_manager/_constants.py - Threshold constants

  • massgen/backend/base_with_custom_tool_and_mcp.py - Eviction logic:

    • _truncate_to_token_limit() - Binary search for token-based truncation

    • _maybe_evict_large_tool_result() - Main eviction logic

    • Integration in _execute_tool_with_logging()

Testing: massgen/tests/test_tool_result_eviction.py

Performance Considerations#

  • Parallel Execution: All agents run concurrently

  • Streaming: All responses stream in real-time

  • Workspace Isolation: Copy-on-write for efficiency

  • Async I/O: All file operations are non-blocking

  • Token Management: Per-backend rate limiting

See Also#

  • Contributing to MassGen - How to contribute code

  • Writing Configuration Files - Configuration authoring guide

  • massgen/orchestrator.py - Core coordination logic

  • massgen/backend/ - Backend implementations

  • massgen/filesystem_manager/ - Permission system