
Audio Transcription

Convert audio files to text using Gemini's multimodal capabilities. A single function, transcribe(), handles the whole job.

Quick Start

main.py
from connectonion import transcribe

# Simple transcription (uses OpenOnion managed keys)
text = transcribe("meeting.mp3")
print(text)

# With your own Gemini API key
text = transcribe("meeting.mp3", model="gemini-3-flash-preview")
Python REPL
Interactive
>>> text = transcribe("meeting.mp3")
>>> print(text)
All right, so here we are in front of the elephants...

That's it! One function for audio-to-text.

With Context Hints

Improve accuracy for domain-specific terms:

main.py
# Technical meeting with specific names
text = transcribe(
    "standup.mp3",
    prompt="Technical AI discussion. Names: Aaron, Lisa. Terms: ConnectOnion, OpenOnion"
)

# Medical transcription
text = transcribe(
    "consultation.mp3",
    prompt="Medical consultation. Terms: hypertension, metformin, CBC"
)
Python REPL
Interactive
>>> text = transcribe("standup.mp3", prompt="Technical AI discussion...")
>>> print(text)
Aaron mentioned that ConnectOnion's new feature is ready for review...
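
When the same names and terms recur across many recordings, it can help to assemble the hint string programmatically. The sketch below is one possible helper, not part of ConnectOnion; `build_context_prompt` is a hypothetical name, and the "Topic. Names: ... Terms: ..." layout simply mirrors the prompts shown above.

```python
from typing import Sequence


def build_context_prompt(
    topic: str,
    names: Sequence[str] = (),
    terms: Sequence[str] = (),
) -> str:
    """Assemble a context hint in the 'Topic. Names: ... Terms: ...' style.

    Hypothetical helper: the layout copies the example prompts in this
    section and is not a format required by transcribe().
    """
    # Normalize the topic so it always ends with exactly one period.
    parts = [topic.strip().rstrip(".") + "."]
    if names:
        parts.append("Names: " + ", ".join(names) + ".")
    if terms:
        parts.append("Terms: " + ", ".join(terms))
    return " ".join(parts)
```

The result can be passed straight to the `prompt` parameter, for example `transcribe("standup.mp3", prompt=build_context_prompt("Technical AI discussion", ["Aaron", "Lisa"], ["ConnectOnion", "OpenOnion"]))`.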

With Timestamps

main.py
text = transcribe("podcast.mp3", timestamps=True)
print(text)
Python REPL
Interactive
>>> text = transcribe("podcast.mp3", timestamps=True)
>>> print(text)
[00:00] Welcome to the show...
[00:15] Today we're discussing AI agents...
[01:30] Let's dive into the first topic...
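
If you need the timestamps as data rather than text, the output can be split into segments. This is a minimal sketch that assumes every segment starts on its own line with a `[MM:SS]` marker, as in the sample output above; the model's actual formatting may vary, and `parse_timestamped` is a hypothetical helper, not a ConnectOnion function.

```python
import re

# Matches lines such as "[01:30] Let's dive into the first topic..."
_SEGMENT = re.compile(r"^\[(\d+):(\d{2})\]\s*(.*)$")


def parse_timestamped(text: str) -> list[tuple[int, str]]:
    """Return (offset_in_seconds, segment_text) pairs.

    Assumes the '[MM:SS] text' layout shown in the sample output.
    Lines that do not match the pattern are skipped.
    """
    segments = []
    for line in text.splitlines():
        match = _SEGMENT.match(line.strip())
        if match:
            minutes, seconds, content = match.groups()
            segments.append((int(minutes) * 60 + int(seconds), content))
    return segments
```

Each pair holds the offset in seconds and the segment text, which is convenient for building chapter markers or jumping to a point in the recording.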

Real Examples

Meeting Minutes

main.py
def get_meeting_minutes(audio_path: str) -> str:
    """Transcribe and summarize a meeting."""
    from connectonion import transcribe, llm_do

    # Step 1: Transcribe
    transcript = transcribe(audio_path, prompt="Business meeting")

    # Step 2: Summarize
    summary = llm_do(
        transcript,
        system_prompt="Extract action items and key decisions as bullet points."
    )
    return summary
Python REPL
Interactive
>>> summary = get_meeting_minutes("standup.mp3")
>>> print(summary)
**Action Items:**
- Aaron to review PR #123
- Lisa to update documentation
 
**Key Decisions:**
- Launch date set for Friday

Voice Notes Processing

main.py
from pathlib import Path

def process_voice_notes(folder: str) -> list[str]:
    """Transcribe all voice notes in a folder."""
    from connectonion import transcribe

    results = []
    for audio in Path(folder).glob("*.mp3"):
        text = transcribe(str(audio))
        results.append(f"# {audio.stem}\n{text}")
    return results
Python REPL
Interactive
>>> notes = process_voice_notes("voice_notes/")
>>> print(notes[0])
# idea_2024_01
Remember to add the new transcribe feature to the docs...
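
The example above only picks up `.mp3` files. A folder of voice notes often mixes formats, so one option is to collect every file whose extension matches a supported format. The sketch below is an assumption-laden helper: `find_audio_files` and `AUDIO_EXTENSIONS` are hypothetical names, and the extension set is a guess at how the formats listed under Supported Formats map to file suffixes.

```python
from pathlib import Path

# Assumed suffixes for the formats listed under "Supported Formats".
AUDIO_EXTENSIONS = {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac", ".m4a", ".webm"}


def find_audio_files(folder: str) -> list[Path]:
    """Return audio files directly inside `folder`, sorted by path.

    Matching is case-insensitive, so 'NOTE.MP3' is included.
    Subfolders are not searched.
    """
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )
```

In `process_voice_notes`, the loop could then iterate over `find_audio_files(folder)` instead of `Path(folder).glob("*.mp3")`.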

Use as Agent Tool

main.py
from connectonion import Agent, transcribe

def transcribe_audio(file_path: str) -> str:
    """Transcribe an audio file to text."""
    return transcribe(file_path)

agent = Agent("assistant", tools=[transcribe_audio])
result = agent.input("Transcribe the file meeting.mp3 and summarize it")
Python REPL
Interactive
>>> result = agent.input("Transcribe meeting.mp3 and summarize it")
>>> print(result)
I've transcribed the meeting. Here's a summary:
The team discussed the Q4 roadmap and agreed to...

Parameters

Parameter   Type  Default                      Description
audio       str   required                     Path to audio file
prompt      str   None                         Context hints for accuracy
model       str   "co/gemini-3-flash-preview"  Model to use
timestamps  bool  False                        Include timestamps in output

Supported Formats

WAV, MP3, AIFF, AAC, OGG, FLAC, M4A, WebM

Token cost: 32 tokens per second of audio (1 minute = 1,920 tokens)
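
To budget for longer recordings, the rate above can be turned into a quick estimate. This is a rough sketch: `estimate_audio_tokens` is a hypothetical helper, rounding partial seconds up is an assumption, and the figure covers audio input only (prompt and output tokens are extra).

```python
import math

# Rate quoted above: 32 tokens per second of audio.
TOKENS_PER_SECOND = 32


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate input tokens for an audio clip of the given length.

    Rounds up to a whole token; actual accounting may differ.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return math.ceil(duration_seconds * TOKENS_PER_SECOND)
```

For example, `estimate_audio_tokens(60)` gives 1,920 tokens, matching the figure above, and a 30-minute meeting (`estimate_audio_tokens(30 * 60)`) comes to 57,600 tokens.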

Models

main.py
# OpenOnion managed keys (default - no API key needed)
transcribe("audio.mp3", model="co/gemini-3-flash-preview")
transcribe("audio.mp3", model="co/gemini-2.5-flash")

# Your own Gemini API key (set GEMINI_API_KEY)
transcribe("audio.mp3", model="gemini-3-flash-preview")
transcribe("audio.mp3", model="gemini-2.5-flash")
Python REPL
Interactive
>>> transcribe("audio.mp3", model="co/gemini-3-flash-preview")
'This is the transcribed text from your audio file...'
 
>>> transcribe("audio.mp3", model="gemini-2.5-flash")
'This is the transcribed text using your own API key...'
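
Code that runs both with and without a personal key can choose the model name at runtime. The sketch below assumes only what the examples above show: a `co/` prefix selects managed keys, and `GEMINI_API_KEY` is the variable read for your own key. `pick_model` is a hypothetical helper, not part of ConnectOnion.

```python
import os
from typing import Mapping, Optional


def pick_model(
    base: str = "gemini-3-flash-preview",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the bare model name if GEMINI_API_KEY is set, else the co/ variant.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY"):
        return base
    return f"co/{base}"
```

A call would then look like `transcribe("audio.mp3", model=pick_model())`.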

What You Get

Simple API - One function for all transcription needs
Context hints - Improve accuracy with domain terms
Multiple formats - WAV, MP3, FLAC, and more
Timestamps - Optional time markers in output
Managed keys - Works out of the box with co/ models

Comparison with Agent

Feature   transcribe()   Agent()
Purpose   Audio to text  Multi-step workflows
Input     Audio files    Text prompts
Output    Plain text     Agent responses
Best for  Transcription  Complex tasks
main.py
from connectonion import Agent, transcribe

# Use transcribe() for audio-to-text
text = transcribe("meeting.mp3")

# Use Agent for complex workflows with multiple tools
# (search and calculate stand in for your own tool functions)
agent = Agent("assistant", tools=[search, calculate])
result = agent.input("Research and analyze...")
Python REPL
Interactive
>>> text = transcribe("meeting.mp3")
>>> print(text[:50])
All right, so here we are in front of the elephant
 
>>> result = agent.input("Research and analyze...")
>>> print(result)
I'll help you research and analyze...

Error Handling

main.py
from connectonion import transcribe

try:
    text = transcribe("nonexistent.mp3")
except FileNotFoundError:
    print("Audio file not found")
except ValueError as e:
    print(f"API error: {e}")
Python REPL
Interactive
>>> try:
...     text = transcribe("nonexistent.mp3")
... except FileNotFoundError:
...     print("Audio file not found")
...
Audio file not found
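
When processing many files, one bad file should not stop the whole batch. The sketch below wraps the two exceptions shown above so a failure yields `None` instead of propagating. `transcribe_or_none` is a hypothetical helper; it takes the transcription function as an argument so it can be tried without network access, and any exception other than these two is assumed to be unexpected and is left to propagate.

```python
from typing import Callable, Optional


def transcribe_or_none(
    path: str,
    transcribe_fn: Callable[[str], str],
) -> Optional[str]:
    """Call transcribe_fn(path); return None on the errors shown above.

    Only FileNotFoundError and ValueError are handled, mirroring the
    error-handling example. Other exceptions are not caught.
    """
    try:
        return transcribe_fn(path)
    except FileNotFoundError:
        print(f"Skipping {path}: audio file not found")
    except ValueError as exc:
        print(f"Skipping {path}: API error: {exc}")
    return None
```

In real use you would pass ConnectOnion's `transcribe` as `transcribe_fn` and filter out the `None` results afterwards.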

Next Steps