OpenAI Whisper Bangla Guide: Free Audio Transcription Master 2026
Introduction
OpenAI Whisper is an open-source speech-to-text model that transcribes audio in 99 languages, including Bangla. It is one of the most accurate free options, and it can be installed locally, which brings both privacy and cost benefits.
Whisper model sizes
- tiny (39M parameters): fastest, least accurate
- base (74M): good balance for casual use
- small (244M): better accuracy
- medium (769M): recommended for Bangla
- large-v3 (1.55B): best accuracy, GPU strongly recommended
- large-v3-turbo (809M): a faster variant of large-v3 with a small accuracy trade-off
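As a rough illustration of choosing among these sizes, the sketch below picks the largest model that fits a given amount of GPU memory. The VRAM figures are the approximate values listed in the openai/whisper README, and `pick_model` is a hypothetical helper, not part of the library. It only considers memory fit, not Bangla accuracy.

```python
# Approximate figures from the openai/whisper README; rough guidance only.
MODELS = [
    # (name, parameters in millions, approximate VRAM needed in GB)
    ("tiny", 39, 1),
    ("base", 74, 1),
    ("small", 244, 2),
    ("medium", 769, 5),
    ("large-v3-turbo", 809, 6),
    ("large-v3", 1550, 10),
]


def pick_model(available_vram_gb: float) -> str:
    """Return the largest listed model whose approximate VRAM need fits."""
    fitting = [name for name, _params, vram in MODELS if vram <= available_vram_gb]
    # Fall back to the smallest model when nothing fits (e.g. CPU-only machines).
    return fitting[-1] if fitting else MODELS[0][0]


print(pick_model(5))
```

On a 5 GB card this selects medium, which matches the size recommended for Bangla above.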
4 ways to use Whisper
1. Hugging Face web (easiest)
- huggingface.co/spaces/openai/whisper
- Audio upload → transcribe
- No install, no signup
- File size limited (~25MB)
- Free for casual use
2. OpenAI API (cloud)
- $0.006 per minute of audio
- Fast (runs on cloud GPUs)
- Easy to integrate through the API
- Up to 25MB per file
- No local compute needed
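A minimal sketch of the cloud route, assuming the official `openai` Python package and an `OPENAI_API_KEY` environment variable. The helper names are hypothetical, and the price and size limit are simply the figures quoted above; treating 25MB as 25 × 1024 × 1024 bytes is an assumption.

```python
import os

PRICE_PER_MINUTE_USD = 0.006         # rate quoted in this guide
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB limit quoted in this guide (assumed binary MB)


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the API cost for a recording of the given length."""
    return round(duration_seconds / 60 * PRICE_PER_MINUTE_USD, 4)


def fits_upload_limit(path: str) -> bool:
    """True if the file is small enough to send in a single request."""
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES


def transcribe_with_api(path: str, language: str = "bn") -> str:
    """Send one file to the hosted whisper-1 model and return the transcript text."""
    if not fits_upload_limit(path):
        raise ValueError(f"{path} is over the 25 MB limit; split it first")
    from openai import OpenAI  # imported lazily: requires `pip install openai`

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file, language=language
        )
    return result.text


# A one-hour interview at the quoted rate:
print(estimate_cost_usd(3600))
```

At the quoted rate, a one-hour recording costs well under half a dollar, so the file-size limit is usually the tighter constraint for long audio.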
3. Local install (Python)
- Install: pip install openai-whisper (the ffmpeg command-line tool must also be installed)
- Free and unlimited
- Privacy: audio never leaves your computer
- GPU recommended for speed
- Command:
whisper audio.mp3 --language Bengali --model medium
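The same transcription can also be driven from Python. This is a minimal sketch assuming the `openai-whisper` package and ffmpeg are installed; `transcribe_local` is a hypothetical wrapper and `audio.mp3` is a placeholder file name.

```python
def transcribe_local(audio_path: str, model_name: str = "medium",
                     language: str = "bn") -> dict:
    """Transcribe one file on this machine with the open-source whisper package."""
    import whisper  # imported lazily: requires `pip install openai-whisper` plus ffmpeg

    model = whisper.load_model(model_name)  # downloads the weights on first use
    # "bn" is the language code for Bengali; fixing it skips auto-detection.
    result = model.transcribe(audio_path, language=language)
    return result  # dict with "text", "segments", and "language" keys


# Example (placeholder file name):
# result = transcribe_local("audio.mp3")
# print(result["text"])
```

Loading the model once and reusing it for several files avoids paying the load time for every recording.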
4. Desktop apps (no-code)
- MacWhisper: Mac native, $20
- WhisperBoard: iOS, free
- Whisper Transcription: Windows
- AudioPen: mobile + web
Bangla accuracy benchmark (approximate)
- Standard Bangla clear audio: 95%+
- Regional accent (Chittagong, Sylhet): 80-90%
- Mixed Banglish: 90%+
- Background noise: 70-85%
- Multiple speakers: 70-80% (without diarization)
- Music background: poor
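The guide does not say how these percentages were measured. One common approach, assumed here, is word-level accuracy: one minus the word error rate (WER) against a hand-corrected reference transcript. The sketch below is a plain-Python version; the function names are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


# One wrong word out of four in a short Bangla sentence:
print(word_accuracy("আমি আজ ঢাকা যাব", "আমি আজ ঢাকায় যাব"))
```

Running this on a few minutes of your own audio gives a more reliable figure for your recording conditions than any general benchmark.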
Practical workflow: interview
- Record high-quality audio on a phone (e.g., Voice Memos)
- Convert to MP3 if needed
- Run Whisper locally (medium model)
- 1 hour of audio takes about 5-10 minutes to process (GPU assumed)
- Output: plain text plus SRT timestamps
- Clean up the transcript with ChatGPT
- Total time to a final document: about 30 minutes
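The convert-then-transcribe steps above can be sketched as a small script that shells out to `ffmpeg` and the `whisper` CLI. It assumes both are on the PATH. The function names and the `transcripts` folder are hypothetical, and the returned SRT path assumes a recent openai-whisper release that names outputs after the audio file's base name.

```python
import subprocess
from pathlib import Path


def build_convert_command(source: str, target_mp3: str) -> list[str]:
    """ffmpeg command that re-encodes a recording to MP3 (audio only)."""
    return ["ffmpeg", "-y", "-i", source, "-vn", target_mp3]


def build_whisper_command(audio: str, output_dir: str, model: str = "medium") -> list[str]:
    """whisper CLI command that writes every output format, including TXT and SRT."""
    return ["whisper", audio, "--language", "Bengali", "--model", model,
            "--output_format", "all", "--output_dir", output_dir]


def transcribe_interview(recording: str, output_dir: str = "transcripts") -> Path:
    """Convert if needed, run Whisper, and return the expected SRT path."""
    audio = Path(recording)
    if audio.suffix.lower() != ".mp3":
        mp3 = audio.with_suffix(".mp3")
        subprocess.run(build_convert_command(str(audio), str(mp3)), check=True)
        audio = mp3
    subprocess.run(build_whisper_command(str(audio), output_dir), check=True)
    return Path(output_dir) / f"{audio.stem}.srt"


# Example (placeholder file name):
# srt_path = transcribe_interview("interview.m4a")
```

Whisper decodes most common formats through ffmpeg, so the conversion step is often optional. It is kept here only to mirror the workflow.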
Output formats
- TXT — plain text
- SRT — subtitles with timestamps
- VTT — web video format
- JSON — programmatic
- TSV — spreadsheet-friendly
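To show what the SRT format contains, here is a minimal sketch that renders timed segments as SRT cues. It assumes segments shaped like those in the Python package's result (`start`, `end`, `text`). The CLI already writes SRT itself, so this is only useful when post-processing segments in code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style SRT uses."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments with "start", "end", and "text" keys as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(seg['start'])} --> "
                    f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(cues)  # blank line between consecutive cues


print(segments_to_srt([{"start": 0.0, "end": 2.5, "text": " হ্যালো"}]))
```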
Advanced: improving accuracy
- Audio preprocessing: noise reduction (e.g., in Audacity)
- Split multi-channel recordings into separate channels
- Chunk long files
- Supply an initial prompt with domain terms:
--initial_prompt "BD politics, RAJUK"
- Voice activity detection (Silero VAD)
- Custom vocabulary
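For the chunking step, one possible approach is to plan overlapping time windows before cutting the file. The sketch below only computes the window boundaries. The 10-minute window and 5-second overlap are illustrative assumptions, and `plan_chunks` is a hypothetical helper.

```python
def plan_chunks(total_seconds: float, chunk_seconds: float = 600.0,
                overlap_seconds: float = 5.0) -> list[tuple[float, float]]:
    """Split [0, total_seconds] into windows that overlap slightly at the seams."""
    if overlap_seconds >= chunk_seconds:
        raise ValueError("overlap must be shorter than the chunk length")
    chunks, start = [], 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        if end >= total_seconds:
            break
        start = end - overlap_seconds  # step back so words at the cut are not lost
    return chunks


# A 25-minute recording becomes three windows:
print(plan_chunks(1500))
```

Each window could then be cut out with ffmpeg and transcribed separately. In the Python package the same domain terms can be passed as `initial_prompt="BD politics, RAJUK"` to `model.transcribe`.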
Use cases: top 8
- Journalist interviews
- Student lectures
- Podcast subtitles
- Doctor-patient notes
- Lawyer depositions
- Researcher focus groups
- YouTube video captions
- Voice journals
Alternatives: when not to use Whisper
- Real-time transcription: Whisper works in batch mode; use Deepgram or Otter
- Speaker labels: Whisper alone does not identify speakers; pair it with a diarization tool
- Enterprise: managed services on Azure or GCP
- Very specific accent: train a custom model
Conclusion
Whisper is a crown jewel of open-source AI. Bangla transcription is no longer locked behind paid services. Test one Bangla audio file on Hugging Face today; once you need more, move to a local install. Work that used to take 3-5 hours can be done in 5-10 minutes.