GAIA Validation Prerequisites
This document covers the common setup requirements and prerequisites for running GAIA validation benchmarks with MiroFlow, regardless of the specific model configuration used.
About the GAIA Dataset
What is GAIA?
GAIA (General AI Assistants) is a comprehensive benchmark designed to evaluate AI agents' ability to perform complex reasoning tasks that require multiple skills, including web browsing, file manipulation, data analysis, and multi-step problem solving.
More details: GAIA: a benchmark for General AI Assistants
Dataset Preparation
Step 1: Prepare the GAIA Validation Dataset
Choose one of the following methods to obtain the GAIA validation dataset:
Method 1: Direct Download (Recommended)
No Authentication Required
This method does not require HuggingFace tokens or access permissions.
cd data
wget https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/gaia-val.zip
unzip gaia-val.zip
# Unzip passcode: pf4*
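After extracting, you can sanity-check the dataset before running any evaluation. The sketch below assumes the standard GAIA layout with a `metadata.jsonl` file whose records carry a `Level` field; the exact path and field names are assumptions, so adjust them to the extracted layout you actually see:

```python
import collections
import json

def summarize_tasks(metadata_path):
    """Count tasks per GAIA difficulty level in a metadata.jsonl file."""
    counts = collections.Counter()
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            task = json.loads(line)
            counts[str(task["Level"])] += 1
    return counts

# Example (path is illustrative):
# print(summarize_tasks("data/gaia-val/metadata.jsonl"))
```

If the per-level counts look plausible and sum to the expected validation-set size, the extraction succeeded.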
Method 2: Using the prepare-benchmark command
Prerequisites Required
This method requires HuggingFace dataset access and token configuration.
First, you need to request access and configure your environment:
- Request Dataset Access: Visit https://huggingface.co/datasets/gaia-benchmark/GAIA and request access
- Configure Environment: Edit the .env file:
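A minimal .env fragment might look like the following; the variable name HF_TOKEN is an assumption, so check the repository's .env template for the exact key it expects:

```
# .env — HuggingFace credentials for gated dataset access
# (HF_TOKEN is an assumed variable name; verify against the repo's .env template)
HF_TOKEN=your_huggingface_token_here
```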
Getting Your Hugging Face Token
- Go to https://huggingface.co/settings/tokens
- Create a new token with at least "Read" permissions
- Add your token to the .env file
Then download the dataset:
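The invocation might take the following shape; the subcommand name and argument shown here are assumptions, so verify them against the CLI's own help output before running:

```bash
# Hypothetical invocation — confirm the subcommand and arguments
# with `uv run main.py --help` in the MiroFlow repository.
uv run main.py prepare-benchmark gaia-val
```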
Progress Monitoring and Resume
Progress Tracking
You can monitor the evaluation progress in real-time:
Replace $PATH_TO_LOG with your actual output directory path.
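One simple way to track progress is to count completed per-task result files under the output directory. The sketch below assumes each finished task writes one JSON file; that layout is an assumption, not MiroFlow's documented format, so adapt the glob pattern to the files you actually see under $PATH_TO_LOG:

```python
from pathlib import Path

def count_results(output_dir, pattern="*.json"):
    """Count per-task result files in the output directory.

    The one-JSON-file-per-completed-task layout is illustrative;
    adjust `pattern` to match the actual log structure.
    """
    return sum(1 for _ in Path(output_dir).glob(pattern))

# Example: compare against the 165 tasks in the GAIA validation split.
# done = count_results("logs/gaia-validation/20250922_1430")
```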
Resume Capability
If the evaluation is interrupted, you can resume from where it left off by specifying the same output directory:
uv run main.py common-benchmark \
--config_file_name=YOUR_CONFIG_FILE \
output_dir="logs/gaia-validation/20250922_1430"
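Resume works because the run reuses the same output directory and skips tasks that already have results. The skip-if-done idea can be sketched as follows; the per-task file naming is an assumption for illustration, not MiroFlow's actual implementation:

```python
from pathlib import Path

def pending_tasks(task_ids, output_dir):
    """Return task IDs that have no result file yet in output_dir.

    Assumes one '<task_id>.json' file per completed task; this mirrors
    the resume idea rather than MiroFlow's exact file layout.
    """
    out = Path(output_dir)
    return [t for t in task_ids if not (out / f"{t}.json").exists()]
```

On restart, only the tasks returned by such a check need to be re-run, which is why pointing the command at the same output directory continues the evaluation instead of starting over.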
Documentation Info
Last Updated: October 2025 · Doc Contributor: Team @ MiroMind AI