MassGen: Multi-Agent Scaling System for GenAI
What is MassGen?
MassGen is a cutting-edge multi-agent system that leverages collaborative AI to solve complex tasks. It assigns a task to multiple AI agents that work in parallel, observe each other’s progress, and refine their approaches until they converge on the best solution, delivering a comprehensive, high-quality result.
How It Works:
Work in Parallel - Multiple agents tackle the problem simultaneously, each bringing unique capabilities
See Recent Answers - At each step, agents view the most recent answers from other agents
Decide Next Action - Each agent chooses to provide a new answer or vote for an existing answer
Share Workspaces - When agents provide answers, their workspace is captured so others can review their work
Natural Consensus - Coordination continues until all agents vote, then the agent with the most votes presents the final answer
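The How It Works steps above can be sketched as a toy coordination loop. This is a minimal illustration under stated assumptions, not MassGen’s orchestrator: `Action`, `ToyAgent`, and `coordinate` are hypothetical names, agents follow scripted behavior instead of calling a model, workspaces are not modeled, a new answer simply clears all pending votes (a simplification of restarts), and ties are broken by the lowest agent ID only so the output is deterministic.

```python
import asyncio
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """One agent decision: a new answer or a vote for another agent's answer."""
    agent_id: str
    answer: Optional[str] = None    # set when the agent provides a new answer
    vote_for: Optional[str] = None  # set when the agent votes (holds an agent ID)


class ToyAgent:
    """Hypothetical scripted agent standing in for an LLM-backed one."""

    def __init__(self, agent_id: str, draft: str, preferred: str) -> None:
        self.agent_id = agent_id
        self.draft = draft
        self.preferred = preferred  # agent ID this toy agent will vote for
        self.answered = False

    async def step(self, recent: dict[str, str]) -> Action:
        await asyncio.sleep(0)  # placeholder for a model call
        if not self.answered:
            self.answered = True
            return Action(self.agent_id, answer=self.draft)
        # Vote only for an answer that is actually visible; otherwise vote for itself.
        target = self.preferred if self.preferred in recent else self.agent_id
        return Action(self.agent_id, vote_for=target)


async def coordinate(agents: list[ToyAgent], max_rounds: int = 10) -> tuple[str, str]:
    answers: dict[str, str] = {}  # most recent answer per agent
    votes: dict[str, str] = {}    # voter ID -> agent ID it voted for
    for _ in range(max_rounds):
        snapshot = dict(answers)  # every agent sees the same recent answers this round
        # Work in parallel: all agents decide concurrently.
        actions = await asyncio.gather(*(a.step(snapshot) for a in agents))
        new_answer = False
        for act in actions:
            if act.answer is not None:
                answers[act.agent_id] = act.answer
                new_answer = True
            elif act.vote_for is not None:
                votes[act.agent_id] = act.vote_for
        if new_answer:
            votes.clear()  # in this sketch, any new answer invalidates earlier votes
            continue
        if len(votes) == len(agents):  # every agent has voted: tally
            tally = Counter(votes.values())
            best = max(tally.values())
            winner = min(aid for aid, n in tally.items() if n == best)
            return winner, answers[winner]
    raise RuntimeError("no consensus within max_rounds")


if __name__ == "__main__":
    team = [ToyAgent("a", "Answer A", "b"), ToyAgent("b", "Answer B", "b"), ToyAgent("c", "Answer C", "a")]
    winner, answer = asyncio.run(coordinate(team))
    print(winner, answer)  # prints: b Answer B
```

Two of the three toy agents vote for `b`, so `b` presents the final answer. The sketch advances in lockstep rounds purely for simplicity.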
Put another way, MassGen coordinates AI agents through redundancy and iterative refinement. Agents tackle the full problem, observe and build on each other’s work across cycles of refinement and restarts, and then vote; the best collectively validated answer wins. This lays the groundwork for principled multi-agent scaling and self-improvement.
See visual comparisons between MassGen and single-agent solutions, highlighting how MassGen unifies different agentic approaches for better outcomes.
Use MassGen from Claude Code, Codex, Copilot, Cursor, and other AI coding agents.
Note
For AI agents and crawlers: This site publishes a curated llms.txt index following the llmstxt.org spec, plus a concatenated llms-full.txt dump of the user guide and reference docs.
How Does MassGen Compare?
MassGen sits in a different design space than typical multi-agent frameworks. The core differentiator across the board is parallel attempts with voting and consensus — agents tackle the same task in parallel, observe each other, and converge on a winner — backed by tools, code execution, filesystem integration, and active development.
MassGen vs LLM Council — dynamic voting / consensus vs a fixed 3-stage pipeline (responses → ranking → chairman synthesis).
MassGen vs CrewAI — parallel refinement on one task vs role-based decomposition into sub-tasks.
MassGen vs LangGraph — a pre-built parallel + voting protocol vs a low-level graph runtime you author yourself.
MassGen vs AutoGen / AG2 — parallel attempts with collective validation vs conversation-based multi-agent message passing.
Quick Start
pip install uv # if needed
uv venv && source .venv/bin/activate
uv pip install massgen
uv run massgen # Setup wizard, then ask your first question
Rich terminal UI with real-time streaming, multi-turn conversations, and YAML configuration.
pip install uv # if needed
uv venv && source .venv/bin/activate
uv pip install massgen
uv run massgen --web # Open http://localhost:8000
Browser-based UI with real-time agent streaming, vote visualization, and workspace browsing.
from dotenv import load_dotenv
load_dotenv() # Load OPENROUTER_API_KEY from .env
import litellm
from massgen import register_with_litellm
register_with_litellm()
response = litellm.completion(
model="massgen/build",
messages=[{"role": "user", "content": "Your question"}],
optional_params={"models": ["openrouter/openai/gpt-5", "openrouter/anthropic/claude-sonnet-4.5"]}
)
print(response.choices[0].message.content)
Standard OpenAI-compatible interface for seamless integration with existing applications.
Video Tutorials
Learn how to install, configure, and run your first multi-agent collaboration with MassGen.
Explore how to build custom agents and tools with MassGen.
Key Features
Use Claude, Gemini, GPT, and Grok together; each agent can use a different model.
Multiple agents work simultaneously with voting and consensus detection.
Model Context Protocol for web search, code execution, file operations, and custom tools.
Full async Python API and LiteLLM integration for seamless application embedding.
Real-time terminal display showing agents’ working processes and coordination.
Interactive conversations with context preservation across turns.
Integrate external frameworks (AG2, LangGraph, AgentScope, OpenAI, SmolAgent) as tools.
Work directly with your codebase using context paths with granular read/write permissions.
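The context-path feature in the list above can be illustrated with a small permission check. This is a hypothetical sketch, not MassGen’s enforcement code: `ContextPath` and `is_allowed` are invented names, and the rules it encodes (write implies read, paths are resolved before matching so `..` cannot escape a root, and anything outside every context path is denied) are assumptions about what granular read/write permissions could mean.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Literal

Permission = Literal["read", "write"]


@dataclass(frozen=True)
class ContextPath:
    """Hypothetical context-path entry: a directory plus the access granted to it."""
    path: Path
    permission: Permission


def is_allowed(target: str, mode: Permission, context_paths: list[ContextPath]) -> bool:
    """Return True if `target` may be accessed in `mode` under some context path.

    In this sketch, "write" permission implies "read", and anything outside
    every context path is denied.
    """
    resolved = Path(target).resolve()  # normalizes ".." before matching
    for cp in context_paths:
        root = cp.path.resolve()
        if resolved.is_relative_to(root):  # Python 3.9+; True for root itself too
            if mode == "read" or cp.permission == "write":
                return True
    return False
```

For example, with `/repo/src` granted write and `/repo/docs` granted read, reading `/repo/docs/guide.md` is allowed while writing it is refused.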
Recent Releases
v0.1.94 (June 5, 2026) - Parallelism Hardening (Engineering Health)
Strengthens the orchestrator’s parallel execution: moves the snapshot copy off the event loop so agents keep streaming concurrently — backed by immutable versioned snapshots that keep the off-loop copy safe — and closes latent concurrency races (lost peer-answer revisions, lost background-subagent results, leaked trace tasks, cancel-without-await teardown). Also unifies the mid-stream injection paths and surfaces worktree-isolation degradation. No per-backend functionality changes.
v0.1.93 (June 3, 2026) - CLI Package Decomposition & Pydantic Config Migration
Splits the monolithic cli.py into a focused massgen/cli/ package, migrates the configuration classes to pydantic dataclasses with Literal-typed modes validated at construction, removes ~8.7k lines of dead legacy code, and hardens the test-signal and type-checking tooling (coverage gate, no-assert guard, uv.lock enforcement, and an incremental mypy ratchet). Internal-quality release with no runtime behavior changes.
v0.1.92 (June 1, 2026) - Orchestrator Collaborator Refactor & Parallel Search MCP
Refactors the monolithic orchestrator into 49 lazy collaborators with stable delegator call sites, splits focused Textual display helpers into sibling modules, adds characterization coverage for extraction seams, and introduces a Parallel Web Search MCP registry entry plus runnable example config.
v0.1.91 (May 27, 2026) - Config Reliability & Hook Safety
Hardens release-critical YAML configuration paths with centralized coordination, timeout, and orchestrator runtime parsing; strict unknown-key validation for typo detection; checklist runtime control wiring; and safer Gemini/Codex native hook path permission precedence.
v0.1.90 (May 25, 2026) - Discriminative Criteria Refinements & Checklist Calibration
Improves checklist-gated refinement quality with discriminative-power pruning, per-criterion feedback carried into the next round, position-bias counterbalancing, deterministic tie-breaking, a unified checklist gate on a single 0-10 scale, shared score parsing utilities, and fast-iteration config updates.
v0.1.89 (May 22, 2026) - Antigravity CLI Full Integration & Hardening
Completes the follow-up Antigravity integration pass with workflow-mode parity, early auth and binary health checks, reliable workspace writes via --add-dir, workspace-root .antigravitycli/ anchoring, standalone hooks.json support with enableJsonHooks, and prompt guardrails that hide subagent affordances when subagents are disabled.
v0.1.88 (May 20, 2026) - Antigravity CLI Backend
New antigravity_cli backend wraps Google’s agy binary as a MassGen backend, with workspace-local .antigravity/ config isolation, Antigravity MCP config translation, native hook adapter support, and runnable configs for single-agent Antigravity and mixed Gemini API + Antigravity fast-iteration runs.
v0.1.87 (May 15, 2026) - Documentation: Framework Comparisons & llms.txt
Three new “MassGen vs …” comparison pages (CrewAI, LangGraph, AutoGen/AG2), a curated llms.txt index plus a full-corpus llms-full.txt dump for AI agents and crawlers (per llmstxt.org spec), and a one-line refine=False fix for the bootstrap_subagent discriminator.
v0.1.86 (May 13, 2026) - bootstrap_subagent Discriminator + Codex MCP Approval Fix
The critic-driven criteria path is now functional: orchestrator.coordination.criteria_mode: bootstrap_subagent runs an in-process LLM discriminator between rounds, merges proposed criteria into the accumulator, and augments the next round’s checklist automatically. Codex MCP tool calls under codex exec now get the non-interactive approval bypasses needed for external workflow tools.
v0.1.85 (May 11, 2026) - Discriminative Criteria Emergence (criteria_mode)
New orchestrator.coordination.criteria_mode option lets evaluation criteria emerge from observed gaps across rounds instead of being pre-authored. The bootstrap_inline variant is fully functional on all backends with checklist tool support — agents emit proposed_criteria alongside submit_checklist, the accumulator dedupes/caps, and the next round’s checklist is augmented automatically.
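The `criteria_mode` flow described in the v0.1.85 and v0.1.86 entries (proposed criteria merged into an accumulator that dedupes and caps them, then used to augment the next round’s checklist) can be sketched in Python. The accumulator’s real dedup rule and cap are not stated here, so this sketch assumes case- and whitespace-insensitive matching and a fixed `max_criteria`; `CriteriaAccumulator`, `next_round_checklist`, and `_normalize` are hypothetical names, not MassGen APIs.

```python
from dataclasses import dataclass, field


def _normalize(text: str) -> str:
    """Dedup key: lowercase with collapsed whitespace (an assumption of this sketch)."""
    return " ".join(text.lower().split())


@dataclass
class CriteriaAccumulator:
    """Hypothetical accumulator that dedupes proposed criteria and caps the total."""
    max_criteria: int = 5
    criteria: list[str] = field(default_factory=list)
    _seen: set[str] = field(default_factory=set, init=False, repr=False)

    def __post_init__(self) -> None:
        # Seed the dedup set with any criteria passed to the constructor.
        self._seen = {_normalize(c) for c in self.criteria}

    def add(self, proposed: list[str]) -> list[str]:
        """Merge one round's proposals in order and return the ones accepted."""
        accepted: list[str] = []
        for item in proposed:
            key = _normalize(item)
            if not key or key in self._seen:
                continue  # skip blanks and case/whitespace duplicates
            if len(self.criteria) >= self.max_criteria:
                break  # cap reached: drop the remaining proposals
            self._seen.add(key)
            self.criteria.append(item.strip())
            accepted.append(item.strip())
        return accepted


def next_round_checklist(base: list[str], acc: CriteriaAccumulator) -> list[str]:
    """Augment the base checklist with accumulated criteria it does not already contain."""
    base_keys = {_normalize(c) for c in base}
    return base + [c for c in acc.criteria if _normalize(c) not in base_keys]
```

A production system might dedupe semantically (for example, with an LLM judge) rather than by normalized text; exact-key matching keeps the sketch deterministic.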
Supported Models
Claude (Anthropic) · Gemini (Google) · GPT (OpenAI) · Grok (xAI) · Azure OpenAI · Groq · Together · LM Studio · and more…