GAIA Validation Prerequisites

This document covers the common setup requirements and prerequisites for running GAIA validation benchmarks with MiroFlow, regardless of the specific model configuration used.

About the GAIA Dataset

What is GAIA?

GAIA (General AI Assistants) is a comprehensive benchmark designed to evaluate AI agents' ability to perform complex reasoning tasks that require multiple skills, including web browsing, file manipulation, data analysis, and multi-step problem solving.

More details: GAIA: a benchmark for General AI Assistants


Dataset Preparation

Step 1: Prepare the GAIA Validation Dataset

Choose one of the following methods to obtain the GAIA validation dataset:

Method 1: Direct Download (Recommended)

No Authentication Required

This method does not require Hugging Face tokens or access permissions.

Manual Dataset Download
cd data
wget https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/gaia-val.zip
unzip gaia-val.zip
# Unzip passcode: pf4*
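
After extraction, verify that the validation files are in place. A minimal check, assuming the archive unpacks into a gaia-val/ directory (the actual folder name may differ):

Verify Extraction
# Run from inside data/; the extracted directory name is an assumption
ls gaia-val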

Method 2: Using the prepare-benchmark command

Prerequisites Required

This method requires Hugging Face dataset access and token configuration.

First, you need to request access and configure your environment:

  1. Request Dataset Access: Visit https://huggingface.co/datasets/gaia-benchmark/GAIA and request access
  2. Configure Environment:
    cp .env.template .env
    
    Edit the .env file:
    HF_TOKEN="your-actual-huggingface-token-here"
    DATA_DIR="data/"
    

Getting Your Hugging Face Token

  1. Go to https://huggingface.co/settings/tokens
  2. Create a new token with at least "Read" permissions
  3. Add your token to the .env file
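
To confirm the token works before downloading, you can query the associated account with the Hugging Face CLI (an optional check; assumes the huggingface_hub CLI is installed and HF_TOKEN is exported in your shell):

Verify Token (Optional)
# Prints the Hugging Face account tied to the exported HF_TOKEN
huggingface-cli whoami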

Then download the dataset:

Download via Script
uv run main.py prepare-benchmark get gaia-val
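
Once the command finishes, the dataset should be available under the directory set by DATA_DIR (data/ in the example above). A quick listing confirms the download landed there; the exact subdirectory layout may differ:

Verify Download
ls data/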

Progress Monitoring and Resume

Progress Tracking

You can monitor evaluation progress in real time:

Check Progress
uv run utils/progress_check/check_gaia_progress.py $PATH_TO_LOG

Replace $PATH_TO_LOG with your actual output directory path.
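
For example, with the illustrative output directory used in the resume example below:

Check Progress (Example)
uv run utils/progress_check/check_gaia_progress.py logs/gaia-validation/20250922_1430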

Resume Capability

If the evaluation is interrupted, you can resume from where it left off by specifying the same output directory:

Resume Interrupted Evaluation
uv run main.py common-benchmark \
  --config_file_name=YOUR_CONFIG_FILE \
  output_dir="logs/gaia-validation/20250922_1430"
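
If you no longer remember the exact timestamped directory, a small shell sketch can locate the most recent run, assuming outputs are stored under logs/gaia-validation/ as in the example above:

Find Latest Output Directory
# List run directories newest-first and keep the first entry
ls -td logs/gaia-validation/*/ | head -n 1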

Documentation Info

Last Updated: October 2025 · Doc Contributor: Team @ MiroMind AI