Architecture

Architecture#

MassGen’s architecture is designed for scalability, flexibility, and extensibility.

System Overview#

┌─────────────────────────────────────────┐
│           User Application              │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│          Orchestrator Layer             │
│  ┌─────────────┬──────────────────┐    │
│  │  Strategy   │  Consensus       │    │
│  │  Manager    │  Engine           │    │
│  └─────────────┴──────────────────┘    │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│           Agent Layer                   │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│  │Agent1│ │Agent2│ │Agent3│ │AgentN│ │
│  └──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘ │
└─────┼────────┼────────┼────────┼──────┘
      │        │        │        │
┌─────▼────────▼────────▼────────▼──────┐
│         Backend Abstraction            │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│  │OpenAI│ │Claude│ │Gemini│ │ Grok │ │
│  └──────┘ └──────┘ └──────┘ └──────┘ │
└─────────────────────────────────────────┘

Core Components#

Orchestrator#

The orchestrator manages agent coordination:

Task distribution
Strategy execution
Consensus building
Result aggregation

Agent#

Agents are autonomous units with:

Unique identity and role
Backend connection
Tool access
Memory management

Backend#

Backends provide LLM capabilities:

API abstraction
Model management
Response handling
Error recovery

Design Principles#

Modularity: Components are loosely coupled
Extensibility: Easy to add new agents, backends, tools
Scalability: Supports horizontal scaling
Resilience: Fault-tolerant design
Flexibility: Multiple orchestration strategies

Coordination Protocol#

MassGen uses a coordination protocol built around redundancy and iterative refinement, with cycles of restarts and voting-based consensus to validate quality.

Vote-Based Consensus#

The coordination process follows these steps:

Parallel Execution: All agents receive the same query and work simultaneously
Answer Observation: Agents can see recent answers from other agents
Decision Making: Each agent chooses to either:
- Provide a new/refined answer (new_answer tool)
- Vote for an existing answer they think is best (vote tool)
Dynamic Updates: When an agent provides new_answer:
- Other agents receive update injection mid-work
- Agents continue with preserved context (inject-and-continue)
- All existing votes are cleared (new answer invalidates votes)
Consensus Detection: Coordination continues until all agents have voted
Winner Selection: The agent with the most votes is selected
Final Presentation: The winning agent delivers the final answer

Key Features:

Natural Convergence: No forced consensus, agents naturally agree on best answer
Iterative Refinement: Agents can refine their answers after seeing others’ work
Workspace Sharing: When agents answer, their workspace is snapshotted for others to review
Tie Resolution: Deterministic tie-breaking based on answer order

Inject-and-Continue (Preempt-Not-Restart)#

When an agent provides a new_answer while other agents are working, MassGen uses an inject-and-continue approach instead of restarting agents from scratch.

Traditional Approach (Restart):

Agent A: Working on solution... [deep in analysis]
Agent B: Provides new_answer
         ↓
Agent A: KILL stream → Clear context → Restart from zero
         ❌ Lost all partial work and thinking

MassGen Approach (Inject-and-Continue):

Agent A: Working on solution... [completes response]
Agent B: Provides new_answer
         ↓
Agent A: Receive UPDATE → Append to conversation → New API call
         ✅ Preserved conversation history
         ✅ Can now build on Agent B's answer

How It Works:

When Agent B provides a new_answer, the orchestrator appends an UPDATE message to Agent A’s conversation history containing Agent B’s answer. Agent A then makes a new API call with this extended context. This preserves the conversation history but requires a fresh inference call.

Note

Future Enhancement: True mid-stream injection within tool responses is planned, which would allow updates to be injected while an agent is actively streaming, preserving in-progress thinking. Currently, updates are only processed between API calls.

Benefits:

Conversation Preservation: Agents keep their full conversation history
Collaboration: Agents can synthesize and build on each other’s work
No Full Restart: Agents don’t lose their accumulated context

Update Injection Points:

Updates are injected at safe points during agent execution:

Between iteration loops (after completing a response)
When agent checks for new context
Between API calls (not mid-stream)

Race Condition: If an agent is deep in its first response when a new answer arrives, it won’t see the injection until completing that response. By then, it may already have full context from the orchestrator’s normal flow. This is acceptable - the agent still gets all answers, just via different mechanism (full context on next spawn vs. injection mid-work).

Implementation: massgen/orchestrator.py:_inject_update_and_continue()

Answer Labeling#

Each answer gets a unique identifier: agent{N}.{attempt}

agent1.1 = Agent 1’s first answer
agent2.1 = Agent 2’s first answer
agent1.2 = Agent 1’s second answer (after restart)
agent1.final = Agent 1’s final answer (if winner)

This labeling system enables:

Clear vote tracking
Answer evolution visualization
Transparent decision history

Implementation: massgen/orchestrator.py

Workspace Management#

Each agent gets an isolated workspace for safe file operations.

Directory Structure#

.massgen/
├── workspaces/           # Agent working directories
│   ├── agent1/          # Agent 1's isolated workspace
│   └── agent2/          # Agent 2's isolated workspace
├── snapshots/           # Workspace snapshots for coordination
│   ├── agent1_20250113_143022/  # Snapshot of agent1's work
│   └── agent2_20250113_143025/  # Snapshot of agent2's work
├── temp_workspaces/     # Previous turn results for multi-turn
│   ├── agent1_turn_1/   # Agent 1's turn 1 results
│   └── agent2_turn_1/   # Agent 2's turn 1 results
├── sessions/            # Multi-turn conversation history
│   └── session_20250113_143000/
│       ├── turn_1/
│       └── turn_2/
└── massgen_logs/        # All logging output
    └── log_20250113_143000/

Snapshot System#

When an agent provides an answer during coordination:

Capture: Their workspace is copied to snapshots/
Share: Other agents receive read-only access to the snapshot
Review: Agents can examine files, code, and outputs
Build: Agents build on insights from other agents’ work

This enables agents to:

See concrete work, not just descriptions
Catch errors in code or logic
Build incrementally on each other’s contributions
Provide informed votes based on actual outputs

Implementation: massgen/filesystem_manager/

Multi-Turn Conversations#

MassGen supports interactive multi-turn conversations with full context preservation.

Session Management#

Each multi-turn session maintains:

Session ID: Unique identifier (e.g., session_20250113_143000)
Turn History: Numbered turns (turn_1, turn_2, …)
Workspace Persistence: Each turn’s workspace is preserved
Context Paths: Previous turns become read-only context for next turns

Turn Lifecycle#

Turn Start: Increment turn counter, create turn directory
Context Loading: Previous turn’s workspace becomes read-only context
Execution: Agents work with fresh writeable workspace + previous context
Persistence: Winning agent’s workspace is saved to turn directory
Summary Update: SESSION_SUMMARY.txt is updated with turn details

This allows agents to:

Compare “what I changed” vs “what was originally there”
Build incrementally across multiple turns
Reference previous results explicitly
Maintain project continuity

Implementation: massgen/cli.py (multi-turn mode)

MCP Integration#

MassGen integrates Model Context Protocol (MCP) for external tool access.

Architecture#

Backend → MCP Client → MCP Server → External Tools
   ↓
Tools List → Agent → Tool Calls → Tool Results

Supported Backends:

Claude: Native MCP support via claude_messages API
Gemini: MCP support via function calling
Others: Via tool conversion layer

Planning Mode#

Special coordination mode for MCP tools:

During Coordination: Agents can plan tool usage without execution
After Consensus: Winner executes tools in their final answer
Safety: Prevents irreversible actions during collaboration

This is critical for:

File operations (create, delete, modify)
API calls with side effects
Database operations
External service integrations

Implementation: massgen/backend/gemini.py, massgen/backend/claude.py

Backend Abstraction#

All LLM interactions go through a unified backend interface.

Backend Interface#

Each backend implements:

class Backend:
    async def chat(messages, stream=True):
        """Stream responses with tool calls"""

    async def get_available_tools():
        """Return tools for this backend"""

    def format_messages(messages):
        """Convert to backend-specific format"""

Supported Backends:

API-based: OpenAI, Claude, Gemini, Grok, Azure OpenAI
Local: LM Studio, vLLM, SGLang
External: AG2 framework agents
Native tool backends: Claude Code SDK and Codex CLI with filesystem and shell access

Implementation: massgen/backend/

File Permission System#

MassGen enforces granular file permissions for safe project integration.

Context Paths#

Agents can access specific directories with permissions:

orchestrator:
  context_paths:
    - path: "/path/to/project"
      permission: "write"
      protected_paths:
        - ".git"
        - "node_modules"

Permission Types:

read: View files only
write: Read, create, modify, delete files (except protected)

Protected Paths:

Immune from modification/deletion
Relative to context path
Supports files and directories

Safety Features:

Read-Before-Delete: Agents must read files before deletion
Permission Validation: All file operations are checked
Audit Trail: All operations logged to massgen.log

Implementation: massgen/filesystem_manager/_path_permission_manager.py

Code Organization#

massgen/
├── orchestrator.py           # Coordination engine
├── chat_agent.py             # Agent implementations
├── cli.py                    # Command-line interface
├── config_builder.py         # Interactive config wizard
├── agent_config.py           # Configuration models
├── backend/                  # LLM backend implementations
│   ├── claude.py            # Anthropic Claude
│   ├── gemini.py            # Google Gemini
│   ├── response.py          # OpenAI
│   ├── grok.py              # xAI Grok
│   ├── claude_code.py       # Claude Code CLI
│   ├── codex.py            # OpenAI Codex CLI
│   ├── external.py          # External frameworks (AG2)
│   └── ...
├── frontend/                 # UI components
│   └── coordination_ui.py   # Terminal UI
├── filesystem_manager/       # File operations & permissions
│   ├── _path_permission_manager.py
│   ├── _workspace_tools_server.py
│   └── ...
├── logger_config.py          # Logging configuration
└── adapters/                 # External framework adapters
    └── ag2/                 # AG2 adapter

Key Modules:

orchestrator.py: Vote tracking, consensus detection, workspace snapshots
chat_agent.py: Agent lifecycle, message handling, tool execution
backend/: LLM-specific implementations with unified interface
filesystem_manager/: Permission system, workspace isolation
frontend/: Real-time coordination display with Rich

Extension Points#

Adding New Backends#

Subclass Backend base class
Implement chat() and format_messages()
Register in cli.py’s create_backend()
Add to AgentConfig factory methods

Example: massgen/backend/grok.py

Adding MCP Servers#

Configure in YAML:

backend:
  type: "claude"
  mcp_servers:
    - name: "weather"
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-weather"]

Servers auto-start when backend initializes
Tools automatically discovered and presented to agent

Example: All MCP configs in massgen/configs/tools/mcp/

Adding External Frameworks#

Create adapter in massgen/adapters/{framework}/
Implement ExternalAgentAdapter interface
Register in adapters/__init__.py
Agents work seamlessly with native MassGen agents

Example: massgen/adapters/ag2/

Context Management#

MassGen implements several strategies to manage LLM context windows efficiently.

Reactive Compression#

When the LLM provider returns a context length error, MassGen automatically:

Captures the streaming buffer content (tool calls, reasoning, partial work)
Generates a summary of completed work
Compresses older messages while preserving recent context
Retries the request with the compressed context

See Memory and Context Management for user-facing documentation.

Implementation: massgen/backend/_compression_utils.py

Streaming Buffer#

The StreamingBufferMixin captures streamed content during API calls, enabling compression recovery to preserve partial work when context limits are exceeded.

How it works:

As chunks stream from the API, content is accumulated in _streaming_buffer
If a context length error occurs mid-stream, the buffer contains partial work
The buffer content is passed to compression, which summarizes it
The summary is injected as an assistant message for retry

Buffer content flow:

API Stream → _append_to_streaming_buffer() → _streaming_buffer accumulates
                                           ↓
                           Context error detected
                                           ↓
                           buffer_content passed to compress_messages_for_recovery()
                                           ↓
                           Summarized into: "[Tool execution results]\n{buffer}"
                                           ↓
                           Injected as assistant message in compressed result

Backend Support:

Streaming Buffer Support by Backend#
Backend	Buffer Support	Notes
`ChatCompletionsBackend`	✅ Yes	Base for OpenAI-compatible APIs
`ClaudeBackend`	✅ Yes	Anthropic Messages API
`GeminiBackend`	✅ Yes	Google Gemini SDK
`ResponseBackend`	✅ Yes	OpenAI Responses API
`GrokBackend`	✅ Yes	Inherits from ChatCompletionsBackend
`LMStudioBackend`	✅ Yes	Inherits from ChatCompletionsBackend
`InferenceBackend`	✅ Yes	Inherits from ChatCompletionsBackend
`AzureOpenAIBackend`	❌ No	Extends LLMBackend directly
`ClaudeCodeBackend`	❌ No	Streaming handled internally
`ExternalAgentBackend`	❌ No	Wrapper for external agents

Implementation:

massgen/backend/_streaming_buffer_mixin.py - Mixin class providing buffer methods
Buffer methods: _clear_streaming_buffer(), _append_to_streaming_buffer()
Buffer respects _compression_retry flag to avoid clearing during retry

Adding buffer support to a backend:

from ._streaming_buffer_mixin import StreamingBufferMixin

class MyBackend(StreamingBufferMixin, CustomToolAndMCPBackend):
    # StreamingBufferMixin MUST come first in MRO
    pass

Note: The mixin must appear before other base classes in the inheritance list to ensure proper method resolution order (MRO).

MCP Tool Result Optimization#

MCP CallToolResult objects contain both structured and text representations. MassGen extracts only the clean text content to minimize context usage:

Raw CallToolResult (sent to context):
❌ "meta=None content=[TextContent(type='text', text='file contents...')]
    structuredContent={'content': 'file contents...'}"  ← Duplicated, bloated

Optimized extraction (sent to context):
✅ "file contents..."  ← Clean, minimal

This optimization typically reduces tool result size by 4-10x, significantly extending how many tool calls can fit within the context window.

Extraction Logic:

Check if result has .content attribute (MCP CallToolResult)
Extract text from TextContent objects in the content list
Fall back to .text attribute or str(result) for other result types

Implementation: _extract_text_from_content() in each backend’s _append_tool_result_message() method.

Large Tool Result Eviction#

Tool results exceeding a token threshold are automatically evicted to files, preventing context window saturation. Inspired by LangChain DeepAgents Harness.

How it works:

After tool execution, the result is checked against the token threshold
If exceeding 20,000 tokens, the result is written to a file in the agent’s workspace
The result is replaced with a reference message containing:
- Token and character counts
- File path for retrieval
- Character position for chunked reading
- A 2,000 token preview of the content

Example reference message:

[Tool Result Evicted - Too Large for Context]

The result from read_file was 50,000 tokens / 200,000 chars (limit: 20,000 tokens).
Full result saved to: .tool_results/read_file_20241225_143052_a1b2c3d4.txt

To read more: start at char 6,500, read in chunks.

Preview (chars 0-6,500 of 200,000):
{"data": [{"id": 1, "name": "Alice"...

Note: The preview character count varies based on content (approximately 2,000 tokens).

Configuration:

Currently uses hardcoded thresholds defined in constants:

TOOL_RESULT_EVICTION_THRESHOLD_TOKENS = 20,000 - Eviction trigger
TOOL_RESULT_EVICTION_PREVIEW_TOKENS = 2,000 - Preview size
EVICTED_RESULTS_DIR = ".tool_results" - Storage directory

Implementation files:

massgen/filesystem_manager/_constants.py - Threshold constants
massgen/backend/base_with_custom_tool_and_mcp.py - Eviction logic:
- _truncate_to_token_limit() - Binary search for token-based truncation
- _maybe_evict_large_tool_result() - Main eviction logic
- Integration in _execute_tool_with_logging()

Testing: massgen/tests/test_tool_result_eviction.py

Performance Considerations#

Parallel Execution: All agents run concurrently
Streaming: All responses stream in real-time
Workspace Isolation: Copy-on-write for efficiency
Async I/O: All file operations are non-blocking
Token Management: Per-backend rate limiting

Architecture

Contents

Architecture#

System Overview#

Core Components#

Orchestrator#

Agent#

Backend#

Design Principles#

Coordination Protocol#

Vote-Based Consensus#

Inject-and-Continue (Preempt-Not-Restart)#

Answer Labeling#

Workspace Management#

Directory Structure#

Snapshot System#

Multi-Turn Conversations#

Session Management#

Turn Lifecycle#

MCP Integration#

Architecture#

Planning Mode#

Backend Abstraction#

Backend Interface#

File Permission System#

Context Paths#

Code Organization#

Extension Points#

Adding New Backends#

Adding MCP Servers#

Adding External Frameworks#

Context Management#

Reactive Compression#

Streaming Buffer#

MCP Tool Result Optimization#

Large Tool Result Eviction#

Performance Considerations#

See Also#