📚 All About Agents
Welcome to our comprehensive resource collection for AI agents. This page curates valuable tools, frameworks, research papers, and learning materials to help you understand and build sophisticated agent systems.
Resource Categories
🔬 Agent Papers
Research Papers & Publications
Latest research in agent systems, methodologies, and theoretical foundations.
P001 - WebThinker: Empowering Large Reasoning Models with Deep Research Capability - Paper · GitHub
P002 - Profile-Aware Maneuvering: A Dynamic Multi-Agent System for Robust GAIA Problem Solving by AWorld - Paper
P003 - AFlow: Automating Agentic Workflow Generation - Paper
P004 - AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs - Paper
P005 - Throttling Web Agents Using Reasoning Gates - Paper
P006 - The Landscape of Agentic Reinforcement Learning for LLMs: A Survey - Paper
P007 - BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair - Paper · GitHub
P008 - Long Term Memory: The Foundation of AI Self-Evolution - Paper
P009 - DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments - Paper · GitHub
P010 - Web-Shepherd: Advancing PRMs for Reinforcing Web Agents - Paper · GitHub
P011 - SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis - Paper · GitHub
P012 - Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution - Paper · GitHub
P013 - MCP-Zero: Active Tool Discovery for Autonomous LLM Agents - Paper · GitHub
P014 - AgentOrchestra: Orchestrating Hierarchical Multi-Agent Intelligence with the Tool-Environment-Agent(TEA) Protocol - Paper · GitHub
P015 - Deep Research Agents: A Systematic Examination And Roadmap - Paper · GitHub
P016 - SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam? - Paper · GitHub
P017 - Deep Researcher with Test-Time Diffusion: Enhancing research capabilities through diffusion-based test-time adaptation - Paper
P018 - Multi-Agent Tool-Integrated Policy Optimization: Enhancing multi-agent systems through integrated tool usage and policy optimization - Paper
P019 - WALT: Web Agents that Learn Tools - Paper
P020 - Learning to Route: A Rule-Driven Agent Framework for Hybrid-Source Retrieval-Augmented Generation - Paper
P021 - SurveyBench: Can LLM(-Agents) Write Academic Surveys that Align with Reader Needs? - Paper
P022 - FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents - Paper
P023 - LLM-Based Data Science Agents: A Survey of Capabilities, Challenges, and Future Directions - Paper
P024 - Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models - Paper
P025 - MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning - Paper
P026 - QuantAgents: Towards Multi-agent Financial System via Simulated Trading - Paper · Project
P027 - Small Language Models for Agentic Systems: A Survey of Architectures, Capabilities, and Deployment Trade-offs - Paper
P028 - Open Agent Specification (Agent Spec): Technical Report - Paper
P029 - AudioToolAgent: An Agentic Framework for Audio-Language Models - Paper
P030 - ThinkBrake: Mitigating Overthinking in Tool Reasoning - Paper
P031 - TOUCAN: Synthesizing 1.5M Tool-Agentic Trajectories from Real Environments - Paper
P032 - ToolTweak: An Attack on Tool Selection in LLM-Based Agents - Paper
P033 - ToolBrain: A Flexible RL Framework for Agentic Tools - Paper
P034 - TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture - Paper
P035 - WebDancer: Towards Autonomous Information Seeking Agency - Paper
P036 - WebSailor: Navigating Super-human Reasoning for Web Agent - Paper
P037 - WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization - Paper
P038 - WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent - Paper
P039 - WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning - Paper
P040 - WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents - Paper
P041 - Scaling Agents via Continual Pre-training: Enhancing agent capabilities through continuous learning approaches - Paper
P042 - Towards General Agentic Intelligence via Environment Scaling: Advancing general AI through scalable environment interactions - Paper
P043 - WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research - Paper
P044 - ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization - Paper
P045 - Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents - Paper
P046 - AgentFlow: In-the-Flow Agentic System Optimization: Effective Planning and Tool Use - Paper · GitHub
P047 - ARM: Discovering Agentic Reasoning Modules for Generalizable Multi-Agent Systems - Paper
P048 - Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping - Paper
P049 - CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards - Paper
P050 - Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window - Paper
P051 - Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks - Paper
P052 - MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning - Paper
P053 - Agent Learning via Early Experience - Paper
P054 - CaRT: Teaching LLM Agents to Know When They Know Enough - Paper
P055 - AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents - Paper
P056 - Opponent Shaping in LLM Agents - Paper
P057 - NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions - Paper
P058 - VoiceAgentBench: Are Voice Assistants ready for agentic tasks? - Paper
P059 - Self-Improving LLM Agents at Test-Time - Paper
P060 - AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework - Paper
P061 - Adaptive Tool Generation with Models as Tools and Reinforcement Learning - Paper
P062 - TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents - Paper
P063 - A Survey on Agentic Security: Applications, Threats and Defenses - Paper
P064 - A Multi-Agent Framework for Stateful Inference-Time Search - Paper
P065 - AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning - Paper
P066 - Democratizing AI Scientists using ToolUniverse - Paper · GitHub
P067 - Dyna-Mind: Learning to Simulate from Experience for Better AI Agents - Paper
P068 - DeepTravel: An End-to-End Agentic Reinforcement Learning Framework for Autonomous Travel Planning Agents - Paper
P069 - DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning - Paper
P070 - MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding - Paper
P071 - MASA: LLM-Driven Multi-Agent Systems for Autoformalization - Paper
P072 - Exploiting Web Search Tools of AI Agents for Data Exfiltration - Paper
P073 - Auto-scaling Continuous Memory for GUI Agent - Paper
P074 - StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models - Paper
P075 - WebRouter: Query-specific Router via Variational Information Bottleneck for Cost-sensitive Web Agent - Paper
P076 - LLM×MapReduce-V3: Enabling Interactive In-Depth Survey Generation through a MCP-Driven Hierarchically Modular Agent System - Paper
P077 - BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions - Paper
P078 - AGENTIQL: An Agent-Inspired Multi-Expert Framework for Text-to-SQL Generation - Paper
P079 - FML-bench: A Benchmark for Automatic ML Research Agents Highlighting the Importance of Exploration Breadth - Paper
P080 - MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems - Paper
P081 - Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains? - Paper
P082 - A Survey on Agentic Multimodal Large Language Models - Paper
P083 - R-WoM: Retrieval-augmented World Model For Computer-use Agents - Paper
P084 - HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities - Paper
P085 - Deep Research Brings Deeper Harm - Paper
P086 - A²FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning - Paper (https://arxiv.org/abs/2510.12838)
P087 - DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping - Paper (https://arxiv.org/abs/2510.12979)
🛠️ Agent Frameworks
Popular Agent Development Frameworks
Comprehensive frameworks for building and deploying AI agents across different domains.
F001 - MiroFlow: Build, manage, and scale your AI agents with ease - GitHub
F002 - Youtu-Agent: A simple yet powerful agent framework that delivers with open-source models - GitHub
F003 - OpenManus: No fortress, purely open ground. OpenManus is Coming - GitHub
F004 - OpenBB Platform: Financial data platform for analysts, quants and AI agents - Project
F005 - TradingAgents: Multi-Agents LLM Financial Trading Framework - Paper · GitHub
F006 - JoyAgent-JDGenie: Technical Report on the GAIA - Paper · GitHub
📊 Evaluation
Benchmarks & Evaluation Frameworks
Comprehensive evaluation tools and benchmarks for measuring agent performance across various tasks.
E001 - LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries - Paper
E002 - BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent - Paper
E003 - HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering - Paper
E004 - GAIA: a benchmark for General AI Assistants - Paper · Leaderboard
E005 - xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations - Paper
E006 - MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers - Paper
E007 - FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction - Paper
E008 - Terminal-Bench: the benchmark for testing AI agents in real terminal environments - GitHub
E009 - Gaia2 and ARE: Empowering the Community to Evaluate Agents - Blog Post
E010 - GPQA: A Graduate-Level Google-Proof Q&A Benchmark - Paper · GitHub
E011 - WebWalkerQA: WebWalker: Benchmarking LLMs in Web Traversal - Paper · GitHub · Leaderboard
E012 - HLE: Humanity's Last Exam - Paper · Website
E013 - BFCL: Berkeley Function Calling Leaderboard - GitHub · Leaderboard
E014 - When2Call: When (not) to Call Tools - Paper · GitHub
E015 - ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities - Paper · GitHub
E016 - ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models - Paper · GitHub
E017 - SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines - Paper · Website
E018 - Terminal-Bench: A benchmark for testing AI agents in terminal environments - Leaderboard · Website
E019 - τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains - Paper · GitHub
E020 - τ2-Bench: Evaluating Conversational Agents in a Dual-Control Environment - Paper · GitHub
E021 - Deep Research Bench: Evaluating AI Web Research Agents - Paper · Website
E022 - Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents - Paper
E023 - TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use - Paper
E024 - ARC-AGI: The General Intelligence Benchmark - Website
E025 - Demystifying Deep Search: A Holistic Evaluation with Hint-Free Multi-Hop Questions and Factorised Metrics - Paper
E026 - BrowseComp-VL: A Comprehensive Benchmark for Vision-Language Web Browsing - Paper
E027 - ACEBench: Who Wins the Match Point in Tool Usage? - Paper
E028 - Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation - Paper · GitHub
E029 - DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation - Paper
E030 - When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents - Paper
E031 - A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System - Paper
E032 - Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation - Paper
🧠 Agent Memory
Memory Systems for Persistent Agent Intelligence
Advanced memory solutions for building agents with long-term context and learning capabilities.
M001 - Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory - GitHub
M002 - memobase: Profile-Based Long-Term Memory for AI Applications - GitHub
M003 - Memento: Fine-tuning LLM Agents without Fine-tuning LLMs - Paper · GitHub
M004 - MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments - Paper
M005 - A-MEM: Agentic Memory for LLM Agents - Paper · GitHub
M006 - MemoryOS: Memory OS of AI Agent - Paper · GitHub
M007 - Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning - Paper
M008 - HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models - Paper · GitHub
M009 - MaxKB: Open-source platform for building enterprise-grade agents - GitHub
M010 - MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent - Paper · Website
M011 - LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation - Paper
M012 - Memp: Exploring Agent Procedural Memory - Paper
M013 - MIRIX: Multi-Agent Memory System for LLM-Based Agents - Paper · Website
M014 - A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory - Paper
M015 - ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory - Paper
M016 - CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension - Paper
M017 - Mem-α: Learning Memory Construction via Reinforcement Learning - Paper
M018 - Preference-Aware Memory Update for Long-Term LLM Agents - Paper
Blogs
Blog Posts & Tutorials
Curated collection of blog posts, tutorials, and articles about AI agents from various sources and languages.
General Blogs
- ChatGPT Agent: Introducing ChatGPT Agent - Blog Post · OpenAI's latest agent capabilities and features
- Tongyi DeepResearch: Deep Research Agent for Complex Tasks - Blog Post · Alibaba's advanced research agent system
Chinese Blogs
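Chinese Blogs and Resources
A curated selection of Chinese-language blog posts, tutorials, and resources on AI agents, to help Chinese-speaking users better understand and apply agent technology.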
Documentation Info
Last Updated: September 2025 · Doc Contributor: Team @ MiroMind AI