Audio Tools (tool-audio)
Audio processing capabilities including transcription and audio-based question answering.
Configuration
Environment Variables:
OPENAI_API_KEY: Required. OpenAI API keyOPENAI_BASE_URL: API base URL. Default:https://api.openai.com/v1OPENAI_TRANSCRIPTION_MODEL_NAME: Default:gpt-4o-transcribeOPENAI_AUDIO_MODEL_NAME: Default:gpt-4o-audio-preview
Function Reference
audio_transcription(audio_path_or_url: str)
Transcribe audio file to text using OpenAI's Whisper models.
Parameters:
audio_path_or_url: Local file path or URL- Supported formats: MP3, WAV, M4A, FLAC, OGG, WebM
- Not supported: E2B sandbox paths, YouTube URLs
Returns:
str: Full transcription of the audio file
Example:
# Transcribe local audio
transcription = await audio_transcription("/data/meeting.mp3")
# Transcribe from URL
transcription = await audio_transcription("https://example.com/podcast.wav")
audio_question_answering(audio_path_or_url: str, question: str)
Answer questions based on audio content using GPT-4o Audio.
Parameters:
audio_path_or_url: Local file path or URL (same formats as transcription)question: Question to answer about the audio content
Returns:
str: Answer with audio duration information
Example:
# Ask about content
answer = await audio_question_answering(
"/data/lecture.mp3",
"What are the main topics discussed?"
)
# Get summary
answer = await audio_question_answering(
"https://example.com/interview.wav",
"Summarize the key points."
)
Important Notes:
- Cannot access E2B sandbox files (
/home/user/) - YouTube URLs not supported (use VQA tools instead)
- Includes audio duration in response
Documentation Info
Last Updated: October 2025 ยท Doc Contributor: Team @ MiroMind AI