Audio Chunking for Long-Form Transcription: Splitting and Stitching with ffmpeg + TypeScript
APIs that do speech-to-text — Groq Whisper, OpenAI Whisper, and friends — all have one thing in common: a file size limit. Groq's hard cap is 25MB. A typical one-hour interview at decent quality can easily be 80–150MB. If you just try to send that, you'll get a 413 or a rate-limit error before the transcription even starts.
The fix is chunking: split the audio into manageable pieces, transcribe each one, then stitch the results back together — with correct timestamps. That last part is where most implementations go wrong.
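Getting the timestamps right comes down to one rule: shift every segment in a chunk by that chunk's offset into the original file before concatenating. A minimal sketch, assuming Whisper-style segments with `start`/`end` in seconds (as returned by the `verbose_json` response format) and fixed-length chunks:

```typescript
// Shape of one transcription segment, as in Whisper's verbose_json output.
interface Segment {
  start: number; // seconds, relative to the chunk it came from
  end: number;
  text: string;
}

// Re-base each chunk's segments onto the original file's timeline,
// then concatenate. chunkSeconds is the fixed split length.
function stitch(chunks: Segment[][], chunkSeconds: number): Segment[] {
  return chunks.flatMap((segments, i) =>
    segments.map((s) => ({
      start: s.start + i * chunkSeconds,
      end: s.end + i * chunkSeconds,
      text: s.text,
    }))
  );
}
```

With 20-minute chunks, a segment starting at 0:05 in the second chunk correctly lands at 20:05 overall. The implementations that go wrong are the ones that concatenate segments without this offset, so every chunk's timestamps restart at zero.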
Here's the approach I landed on, built around ffmpeg and TypeScript.
The Strategy
if file < 24MB → send directly (fast path)
else → chunk into 20-min segments at 32kbps mono → transcribe each → stitch
The 20-minute / 32kbps combination keeps each chunk well under 5MB, which gives plenty of headroom below the 25MB limit regardless of source format.
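The splitting itself can be handed to ffmpeg's segment muxer, which cuts fixed-length output files in one pass. Here's a sketch of building the argument list (the `buildSplitArgs` helper is mine, not from the original post) for use with Node's `child_process.spawn`:

```typescript
// Build ffmpeg arguments that split an input file into fixed-length
// chunks, re-encoded to 32 kbps mono. outPattern is a printf-style
// template like "chunk-%03d.mp3".
function buildSplitArgs(
  input: string,
  outPattern: string,
  chunkSeconds: number
): string[] {
  return [
    "-i", input,
    "-vn",                // drop any embedded video/artwork stream
    "-ac", "1",           // downmix to mono
    "-b:a", "32k",        // 32 kbps audio bitrate
    "-f", "segment",      // segment muxer: emit fixed-length files
    "-segment_time", String(chunkSeconds),
    "-reset_timestamps", "1", // each chunk starts at t=0
    outPattern,
  ];
}
```

Usage would be something like `spawn("ffmpeg", buildSplitArgs("interview.wav", "chunk-%03d.mp3", 20 * 60))`, waiting for the process to exit before transcribing the chunks in order. Note `-reset_timestamps 1`: each chunk's internal clock starts at zero, which is exactly why the stitching step must add the offset back.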