Overview

Performance-first agent framework that makes any model better — and proves it across 9+ benchmarks.
Architecture

Why MiroFlow
Make Any Model Better
Plug in any LLM — GPT-5, Claude, MiroThinker, Kimi K2.5, DeepSeek — and get better agent performance through smart rollback, iterative reasoning, and optimized tool orchestration. Change provider_class and model_name in YAML — everything else stays the same. Learn more
Prove It With Reproducible Benchmarks
State-of-the-art results across 9+ benchmarks (GAIA, HLE, BrowseComp, xBench-DeepSearch, FutureX, and more). Every result is reproducible from a config file and a shell script, with automated multi-run statistical aggregation.
Fair Model Comparison
Same tools, same prompts, same infrastructure. The only variable is the model. See how different LLMs perform head-to-head on the Model Comparison Leaderboard.
What's New in v1.7
Skill System
Define new agent skills with simple SKILL.md files — no code changes needed. Skills are auto-discovered from the filesystem, support sandboxed execution, and can be whitelisted for production safety. See Core Concepts for details.
Agent Graph Orchestration
Compose multi-agent workflows using SequentialAgent and agent graph configs. Agents pass context to each other via AgentContext, enabling modular task decomposition and flexible pipeline design. See Core Concepts for details.
Web Application
Out-of-the-box FastAPI + React web interface with session management, task execution monitoring, and file uploads. Get started with bash scripts/start_web.sh. See Quick Start for setup.
Smart Rollback & Retry
Automatic detection and rollback of LLM output errors — format issues, truncation, refusals, and duplicate tool calls are all handled gracefully, significantly improving agent robustness.
Plugin Architecture
Unified component registry with @register decorator for agents, IO processors, and LLM clients. Extend MiroFlow without touching core code — just register your component and reference it in config.
Zero-Code Prompt Management
YAML + Jinja2 template-based prompt system. Tune agent behavior by editing config files instead of source code — no redeployment needed.
Quick Start
# 1. Clone and setup
git clone https://github.com/MiroMindAI/miroflow && cd miroflow
uv sync
# 2. Configure API keys
cp .env.template .env
# Edit .env and add your API keys (see .env.template for details)
# 3. Run your first task
bash scripts/test_single_task.sh \
--config config/agent_quickstart.yaml \
--task-question "What is the first country listed in the XLSX file that have names starting with Co?" \
--file-path data/FSI-2023-DOWNLOAD.xlsx
Expected output: \boxed{Congo Democratic Republic}
Switch models in one line — same tools, same prompts, different LLM:
# GPT-5
llm:
provider_class: GPT5OpenAIClient
model_name: gpt-5
# Claude 3.7 Sonnet
llm:
provider_class: ClaudeAnthropicClient
model_name: claude-3-7-sonnet-20250219
# MiroThinker (open-source, self-hosted)
llm:
provider_class: MiroThinkerSGLangClient
model_name: mirothinker-v1.5
See the Installation Guide for web app setup, more examples, and configuration options.
Any Model, Better Results
Benchmark results will be updated after comprehensive testing with v1.7. See Evaluation Methodology for detailed methodology and reproduction guides.
Changelog
Feb 2026
- Added new tools:
tool-code-sandbox,tool-jina-scrape,tool-serper-search - Added generic
OpenRouterClientandOpenAIClientfor flexible LLM access - Added FRAMES-Test benchmark evaluation support
- Refactored tool system: separated Jina scraping and Serper search into standalone tools
- Inlined eval prompts into verifiers to fix broken LLM judge
Oct 2025
- Added BrowseComp-ZH, HLE, HLE-Text, BrowseComp-EN, FinSearchComp, xBench-DS benchmarks
- Added DeepSeek V3.1, GPT-5 integration
- Added WebWalkerQA dataset evaluation
- Added MiroAPI integration
- Improved tool logs and per-task log storage