📚 All About Agents

Welcome to our comprehensive resource collection for AI agents. This page curates valuable tools, frameworks, research papers, and learning materials to help you understand and build sophisticated agent systems.


🔬 Agent Papers

Research Papers & Publications

Latest research in agent systems, methodologies, and theoretical foundations.

P001 - WebThinker: Empowering Large Reasoning Models with Deep Research Capability - Paper · GitHub

P002 - Profile-Aware Maneuvering: A Dynamic Multi-Agent System for Robust GAIA Problem Solving by AWorld - Paper

P003 - AFlow: Automating Agentic Workflow Generation - Paper

P004 - AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs - Paper

P005 - Throttling Web Agents Using Reasoning Gates - Paper

P006 - The Landscape of Agentic Reinforcement Learning for LLMs: A Survey - Paper

P007 - BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair - Paper · GitHub

P008 - Long Term Memory: The Foundation of AI Self-Evolution - Paper

P009 - DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments - Paper · GitHub

P010 - Web-Shepherd: Advancing PRMs for Reinforcing Web Agents - Paper · GitHub

P011 - SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis - Paper · GitHub

P012 - Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution - Paper · GitHub

P013 - MCP-Zero: Active Tool Discovery for Autonomous LLM Agents - Paper · GitHub

P014 - AgentOrchestra: Orchestrating Hierarchical Multi-Agent Intelligence with the Tool-Environment-Agent(TEA) Protocol - Paper · GitHub

P015 - Deep Research Agents: A Systematic Examination And Roadmap - Paper · GitHub

P016 - SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam? - Paper · GitHub

P017 - Deep Researcher with Test-Time Diffusion: Enhancing research capabilities through diffusion-based test-time adaptation - Paper

P018 - Multi-Agent Tool-Integrated Policy Optimization: Enhancing multi-agent systems through integrated tool usage and policy optimization - Paper

P019 - WALT: Web Agents that Learn Tools - Paper

P020 - Learning to Route: A Rule-Driven Agent Framework for Hybrid-Source Retrieval-Augmented Generation - Paper

P021 - SurveyBench: Can LLM(-Agents) Write Academic Surveys that Align with Reader Needs? - Paper

P022 - FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents - Paper

P023 - LLM-Based Data Science Agents: A Survey of Capabilities, Challenges, and Future Directions - Paper

P024 - Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models - Paper

P025 - MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning - Paper

P026 - QuantAgents: Towards Multi-agent Financial System via Simulated Trading - Paper · Project

P027 - Small Language Models for Agentic Systems: A Survey of Architectures, Capabilities, and Deployment Trade-offs - Paper

P028 - Open Agent Specification (Agent Spec): Technical Report - Paper

P029 - AudioToolAgent: An Agentic Framework for Audio-Language Models - Paper

P030 - ThinkBrake: Mitigating Overthinking in Tool Reasoning - Paper

P031 - TOUCAN: Synthesizing 1.5M Tool-Agentic Trajectories from Real Environments - Paper

P032 - ToolTweak: An Attack on Tool Selection in LLM-Based Agents - Paper

P033 - ToolBrain: A Flexible RL Framework for Agentic Tools - Paper

P034 - TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture - Paper

P035 - WebDancer: Towards Autonomous Information Seeking Agency - Paper

P036 - WebSailor: Navigating Super-human Reasoning for Web Agent - Paper

P037 - WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization - Paper

P038 - WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent - Paper

P039 - WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning - Paper

P040 - WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents - Paper

P041 - Scaling Agents via Continual Pre-training: Enhancing agent capabilities through continuous learning approaches - Paper

P042 - Towards General Agentic Intelligence via Environment Scaling: Advancing general AI through scalable environment interactions - Paper

P043 - WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research - Paper

P044 - ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization - Paper

P045 - Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents - Paper

P046 - AgentFlow: In-the-Flow Agentic System Optimization for Effective Planning and Tool Use - Paper · GitHub

P047 - ARM: Discovering Agentic Reasoning Modules for Generalizable Multi-Agent Systems - Paper

P048 - Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping - Paper

P049 - CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards - Paper

P050 - Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window - Paper

P051 - Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks - Paper

P052 - MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning - Paper

P053 - Agent Learning via Early Experience - Paper

P054 - CaRT: Teaching LLM Agents to Know When They Know Enough - Paper

P055 - AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents - Paper

P056 - Opponent Shaping in LLM Agents - Paper

P057 - NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions - Paper

P058 - VoiceAgentBench: Are Voice Assistants ready for agentic tasks? - Paper

P059 - Self-Improving LLM Agents at Test-Time - Paper

P060 - AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework - Paper

P061 - Adaptive Tool Generation with Models as Tools and Reinforcement Learning - Paper

P062 - TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents - Paper

P063 - A Survey on Agentic Security: Applications, Threats and Defenses - Paper

P064 - A Multi-Agent Framework for Stateful Inference-Time Search - Paper

P065 - AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning - Paper

P066 - Democratizing AI Scientists using ToolUniverse - Paper · GitHub

P067 - Dyna-Mind: Learning to Simulate from Experience for Better AI Agents - Paper

P068 - DeepTravel: An End-to-End Agentic Reinforcement Learning Framework for Autonomous Travel Planning Agents - Paper

P069 - DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning - Paper

P070 - MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding - Paper

P071 - MASA: LLM-Driven Multi-Agent Systems for Autoformalization - Paper

P072 - Exploiting Web Search Tools of AI Agents for Data Exfiltration - Paper

P073 - Auto-scaling Continuous Memory for GUI Agent - Paper

P074 - StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models - Paper

P075 - WebRouter: Query-specific Router via Variational Information Bottleneck for Cost-sensitive Web Agent - Paper

P076 - LLM×MapReduce-V3: Enabling Interactive In-Depth Survey Generation through a MCP-Driven Hierarchically Modular Agent System - Paper

P077 - BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions - Paper

P078 - AGENTIQL: An Agent-Inspired Multi-Expert Framework for Text-to-SQL Generation - Paper

P079 - FML-bench: A Benchmark for Automatic ML Research Agents Highlighting the Importance of Exploration Breadth - Paper

P080 - MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems - Paper

P081 - Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains? - Paper

P082 - A Survey on Agentic Multimodal Large Language Models - Paper

P083 - R-WoM: Retrieval-augmented World Model For Computer-use Agents - Paper

P084 - HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities - Paper

P085 - Deep Research Brings Deeper Harm - Paper

P086 - A²FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning - Paper (https://arxiv.org/abs/2510.12838)

P087 - DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping - Paper (https://arxiv.org/abs/2510.12979)


🛠️ Agent Frameworks

Popular Agent Development Frameworks

Comprehensive frameworks for building and deploying AI agents across different domains.

F001 - MiroFlow: Build, manage, and scale your AI agents with ease - GitHub

F002 - Youtu-Agent: A simple yet powerful agent framework that delivers with open-source models - GitHub

F003 - OpenManus: No fortress, purely open ground. OpenManus is Coming - GitHub

F004 - OpenBB Platform: Financial data platform for analysts, quants and AI agents - Project

F005 - TradingAgents: Multi-Agents LLM Financial Trading Framework - Paper · GitHub

F006 - JoyAgent-JDGenie: Technical Report on the GAIA - Paper · GitHub


📊 Evaluation

Benchmarks & Evaluation Frameworks

Comprehensive evaluation tools and benchmarks for measuring agent performance across various tasks.

E001 - LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries - Paper

E002 - BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent - Paper

E003 - HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering - Paper

E004 - GAIA: a benchmark for General AI Assistants - Paper · Leaderboard

E005 - xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations - Paper

E006 - MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers - Paper

E007 - FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction - Paper

E008 - Terminal-Bench: the benchmark for testing AI agents in real terminal environments - GitHub

E009 - Gaia2 and ARE: Empowering the Community to Evaluate Agents - Blog Post

E010 - GPQA: A Graduate-Level Google-Proof Q&A Benchmark - Paper · GitHub

E011 - WebWalkerQA (WebWalker: Benchmarking LLMs in Web Traversal) - Paper · GitHub · Leaderboard

E012 - HLE: Humanity's Last Exam - Paper · Website

E013 - BFCL: Berkeley Function Calling Leaderboard - GitHub · Leaderboard

E014 - When2Call: When (not) to Call Tools - Paper · GitHub

E015 - ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities - Paper · GitHub

E016 - ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models - Paper · GitHub

E017 - SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines - Paper · Website

E018 - Terminal-Bench: A benchmark for testing AI agents in terminal environments - Leaderboard · Website

E019 - τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains - Paper · GitHub

E020 - τ2-Bench: Evaluating Conversational Agents in a Dual-Control Environment - Paper · GitHub

E021 - Deep Research Bench: Evaluating AI Web Research Agents - Paper · Website

E022 - Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents - Paper

E023 - TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use - Paper

E024 - ARC-AGI: The General Intelligence Benchmark - Website

E025 - Demystifying Deep Search: A Holistic Evaluation with Hint-Free Multi-Hop Questions and Factorised Metrics - Paper

E026 - BrowseComp-VL: A Comprehensive Benchmark for Vision-Language Web Browsing - Paper

E027 - ACEBench: Who Wins the Match Point in Tool Usage? - Paper

E028 - Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation - Paper · GitHub

E029 - DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation - Paper

E030 - When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents - Paper

E031 - A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System - Paper

E032 - Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation - Paper


🧠 Agent Memory

Memory Systems for Persistent Agent Intelligence

Advanced memory solutions for building agents with long-term context and learning capabilities.

M001 - Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory - GitHub

M002 - memobase: Profile-Based Long-Term Memory for AI Applications - GitHub

M003 - Memento: Fine-tuning LLM Agents without Fine-tuning LLMs - Paper · GitHub

M004 - MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments - Paper

M005 - A-MEM: Agentic Memory for LLM Agents - Paper · GitHub

M006 - MemoryOS: Memory OS of AI Agent - Paper · GitHub

M007 - Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning - Paper

M008 - HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models - Paper · GitHub

M009 - MaxKB: Open-source platform for building enterprise-grade agents - GitHub

M010 - MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent - Paper · Website

M011 - LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation - Paper

M012 - Memp: Exploring Agent Procedural Memory - Paper

M013 - MIRIX: Multi-Agent Memory System for LLM-Based Agents - Paper · Website

M014 - A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory - Paper

M015 - ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory - Paper

M016 - CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension - Paper

M017 - Mem-α: Learning Memory Construction via Reinforcement Learning - Paper

M018 - Preference-Aware Memory Update for Long-Term LLM Agents - Paper


Blogs

Blog Posts & Tutorials

Curated collection of blog posts, tutorials, and articles about AI agents from various sources and languages.

General Blogs

  • ChatGPT Agent: Introducing ChatGPT Agent

    • Blog Post · OpenAI's announcement of ChatGPT agent and its capabilities
  • Tongyi DeepResearch: Deep Research Agent for Complex Tasks

    • Blog Post · Alibaba's deep research agent system

Chinese Blogs

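Chinese Blogs & Resources

A curated selection of Chinese-language blog posts, tutorials, and resources on AI agents, to help Chinese-speaking readers better understand and apply agent technology.

  • A Quick Comparison of 17 Mainstream Agent Frameworks

    • Blog Link · Zhihu column article comparing and analyzing mainstream agent frameworks
  • Tongyi DeepResearch

    • Blog Post · An introduction to Alibaba's Tongyi deep research agent system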


Documentation Info

Last Updated: September 2025 · Doc Contributor: Team @ MiroMind AI