
Audio Transcription

Convert audio files to text using Gemini's multimodal capabilities. A single function, transcribe(), handles the whole job.

Quick Start

main.py

from connectonion import transcribe

# Simple transcription (uses OpenOnion managed keys)
text = transcribe("meeting.mp3")
print(text)

# With your own Gemini API key (set GEMINI_API_KEY and use the model name without the co/ prefix)
text = transcribe("meeting.mp3", model="gemini-3-flash-preview")

Python REPL

Interactive

>>> text = transcribe("meeting.mp3")

>>> print(text)

All right, so here we are in front of the elephants...

That's it! One function for audio-to-text.

With Context Hints

Improve accuracy for domain-specific terms:

main.py

# Technical meeting with specific names
text = transcribe(
    "standup.mp3",
    prompt="Technical AI discussion. Names: Aaron, Lisa. Terms: ConnectOnion, OpenOnion"
)

# Medical transcription
text = transcribe(
    "consultation.mp3",
    prompt="Medical consultation. Terms: hypertension, metformin, CBC"
)

Python REPL

Interactive

>>> text = transcribe("standup.mp3", prompt="Technical AI discussion...")

>>> print(text)

Aaron mentioned that ConnectOnion's new feature is ready for review...
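The hint strings above follow a loose "domain, names, terms" pattern. If you build them often, a small helper can keep them consistent. This is only a sketch: build_hint is a hypothetical name, not part of ConnectOnion, and the exact wording of a hint is an assumption rather than a required format.

```python
def build_hint(domain: str, names=(), terms=()) -> str:
    """Join a domain description, speaker names, and jargon into one hint string."""
    # Normalize the domain so it ends with exactly one period.
    parts = [domain.strip().rstrip(".") + "."]
    if names:
        parts.append("Names: " + ", ".join(names) + ".")
    if terms:
        parts.append("Terms: " + ", ".join(terms) + ".")
    return " ".join(parts)


hint = build_hint(
    "Technical AI discussion",
    names=["Aaron", "Lisa"],
    terms=["ConnectOnion", "OpenOnion"],
)
print(hint)
# Technical AI discussion. Names: Aaron, Lisa. Terms: ConnectOnion, OpenOnion.

# The result can then be passed as the prompt argument, for example:
# text = transcribe("standup.mp3", prompt=hint)
```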

With Timestamps

main.py

text = transcribe("podcast.mp3", timestamps=True)
print(text)

Python REPL

Interactive

>>> text = transcribe("podcast.mp3", timestamps=True)

>>> print(text)

[00:00] Welcome to the show...

[00:15] Today we're discussing AI agents...

[01:30] Let's dive into the first topic...
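If you need the timestamps as data rather than text, the output can be parsed afterwards. The sketch below assumes every segment starts with a [MM:SS] marker, as in the sample output above; the real output may vary, so treat the pattern as an assumption. parse_timestamps is a hypothetical helper, not part of ConnectOnion.

```python
import re

# Matches lines such as "[01:30] Let's dive in..." (assumed MM:SS layout).
_LINE = re.compile(r"^\[(\d+):(\d{2})\]\s*(.*)$")


def parse_timestamps(text: str) -> list[tuple[int, str]]:
    """Return (offset_in_seconds, spoken_text) pairs for each timestamped line."""
    segments = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:  # lines without a marker are skipped
            minutes, seconds, spoken = match.groups()
            segments.append((int(minutes) * 60 + int(seconds), spoken))
    return segments


sample = (
    "[00:00] Welcome to the show...\n"
    "[00:15] Today we're discussing AI agents...\n"
    "[01:30] Let's dive into the first topic..."
)
for offset, spoken in parse_timestamps(sample):
    print(offset, spoken)
```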

Real Examples

Meeting Minutes

main.py

def get_meeting_minutes(audio_path: str) -> str:
    """Transcribe and summarize a meeting."""
    from connectonion import transcribe, llm_do

    # Step 1: Transcribe
    transcript = transcribe(audio_path, prompt="Business meeting")

    # Step 2: Summarize
    summary = llm_do(
        transcript,
        system_prompt="Extract action items and key decisions as bullet points."
    )
    return summary

Python REPL

Interactive

>>> summary = get_meeting_minutes("standup.mp3")

>>> print(summary)

**Action Items:**

- Aaron to review PR #123

- Lisa to update documentation

**Key Decisions:**

- Launch date set for Friday

Voice Notes Processing

main.py

from pathlib import Path

def process_voice_notes(folder: str) -> list[str]:
    """Transcribe all voice notes in a folder."""
    from connectonion import transcribe

    results = []
    for audio in Path(folder).glob("*.mp3"):
        text = transcribe(str(audio))
        results.append(f"# {audio.stem}\n{text}")
    return results

Python REPL

Interactive

>>> notes = process_voice_notes("voice_notes/")

>>> print(notes[0])

# idea_2024_01

Remember to add the new transcribe feature to the docs...
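The function above only picks up .mp3 files. To cover the other formats listed under Supported Formats, one option is to filter by file extension. The extension set below is an assumption (one common extension per listed format), and find_audio_files is a hypothetical helper, not part of ConnectOnion.

```python
from pathlib import Path

# Assumed extensions for the formats listed under Supported Formats.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac", ".m4a", ".webm"}


def find_audio_files(folder: str) -> list[Path]:
    """Return the audio files in a folder, sorted by path, matching extensions case-insensitively."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )


# Each path could then be passed to transcribe(str(path)),
# as in process_voice_notes above.
```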

Use as Agent Tool

main.py

from connectonion import Agent, transcribe

def transcribe_audio(file_path: str) -> str:
    """Transcribe an audio file to text."""
    return transcribe(file_path)

agent = Agent("assistant", tools=[transcribe_audio])
result = agent.input("Transcribe the file meeting.mp3 and summarize it")

Python REPL

Interactive

>>> result = agent.input("Transcribe meeting.mp3 and summarize it")

>>> print(result)

I've transcribed the meeting. Here's a summary:

The team discussed the Q4 roadmap and agreed to...

Parameters

Parameter	Type	Default	Description
`audio`	str	required	Path to audio file
`prompt`	str	None	Context hints for accuracy
`model`	str	"co/gemini-3-flash-preview"	Model to use
`timestamps`	bool	False	Include timestamps in output

Supported Formats

WAV, MP3, AIFF, AAC, OGG, FLAC, M4A, WebM

Token cost: 32 tokens per second of audio (1 minute = 1,920 tokens)
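The rate above makes it easy to estimate the token cost of a recording before sending it. A minimal sketch, assuming you already know the duration in seconds; rounding up partial seconds is an assumption, and estimate_audio_tokens is a hypothetical helper name.

```python
import math

TOKENS_PER_SECOND = 32  # rate quoted in this section


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate the audio token count, rounding up to a whole token."""
    return math.ceil(duration_seconds * TOKENS_PER_SECOND)


print(estimate_audio_tokens(60))       # 1920 (one minute)
print(estimate_audio_tokens(45 * 60))  # 86400 (a 45-minute meeting)
```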

Models

main.py

# OpenOnion managed keys (default - no API key needed)
transcribe("audio.mp3", model="co/gemini-3-flash-preview")
transcribe("audio.mp3", model="co/gemini-2.5-flash")

# Your own Gemini API key (set GEMINI_API_KEY)
transcribe("audio.mp3", model="gemini-3-flash-preview")
transcribe("audio.mp3", model="gemini-2.5-flash")

Python REPL

Interactive

>>> transcribe("audio.mp3", model="co/gemini-3-flash-preview")

'This is the transcribed text from your audio file...'

>>> transcribe("audio.mp3", model="gemini-2.5-flash")

'This is the transcribed text using your own API key...'
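If the same script has to run both with and without a personal key, the model name can be chosen at runtime. The sketch below is one way to do that, based only on the naming convention shown above (a co/ prefix for managed keys, the bare name with GEMINI_API_KEY set). pick_model is a hypothetical helper, not part of ConnectOnion.

```python
import os
from typing import Mapping


def pick_model(
    base: str = "gemini-3-flash-preview",
    env: Mapping[str, str] = os.environ,
) -> str:
    """Use the bare model name when GEMINI_API_KEY is set, else the co/ managed variant."""
    if env.get("GEMINI_API_KEY"):
        return base
    return f"co/{base}"


print(pick_model(env={}))  # co/gemini-3-flash-preview

# The result can be passed straight through, for example:
# text = transcribe("audio.mp3", model=pick_model())
```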

What You Get

Simple API - One function for all transcription needs

Context hints - Improve accuracy with domain terms

Multiple formats - WAV, MP3, FLAC, and more

Timestamps - Optional time markers in output

Managed keys - Works out of the box with co/ models

Comparison with Agent

Feature	transcribe()	Agent()
Purpose	Audio to text	Multi-step workflows
Input	Audio files	Text prompts
Output	Plain text	Agent responses
Best for	Transcription	Complex tasks

main.py

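from connectonion import Agent, transcribe

# Use transcribe() for audio-to-text
text = transcribe("meeting.mp3")

# Use Agent for complex workflows with multiple tools
# (search and calculate stand for tool functions you define yourself)
agent = Agent("assistant", tools=[search, calculate])
result = agent.input("Research and analyze...")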

Python REPL

Interactive

>>> text = transcribe("meeting.mp3")

>>> print(text[:50])

All right, so here we are in front of the elephant

>>> result = agent.input("Research and analyze...")

>>> print(result)

I'll help you research and analyze...

Error Handling

main.py

from connectonion import transcribe

try:
    text = transcribe("nonexistent.mp3")
except FileNotFoundError:
    print("Audio file not found")
except ValueError as e:
    print(f"API error: {e}")

Python REPL

Interactive

>>> try:

...     text = transcribe("nonexistent.mp3")

... except FileNotFoundError:

...     print("Audio file not found")

Audio file not found
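In batch jobs it can be more convenient to get None back than to wrap every call in try/except. The sketch below wraps the two exceptions shown above. The transcription function is passed in as an argument so the wrapper can be tried without network access; safe_transcribe is a hypothetical helper, not part of ConnectOnion.

```python
from typing import Callable, Optional


def safe_transcribe(transcribe_fn: Callable[[str], str], audio_path: str) -> Optional[str]:
    """Return the transcript, or None when the file is missing or the call fails."""
    try:
        return transcribe_fn(audio_path)
    except FileNotFoundError:
        print(f"Audio file not found: {audio_path}")
    except ValueError as error:
        print(f"Transcription failed: {error}")
    return None


# With ConnectOnion installed, usage would look like:
# from connectonion import transcribe
# text = safe_transcribe(transcribe, "meeting.mp3")
```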

Next Steps

Learn about llm_do() - For one-shot LLM calls

Explore Agents - For multi-step workflows

See Tools - For extending agents
