👥 Speaker Diarization

Identify Who Said What In Any Recording

Dokitscript automatically detects and labels each speaker in your audio or video. Upload an interview, podcast, meeting recording or panel discussion, and get a transcript with speaker labels like "Speaker 1:" and "Speaker 2:".

Try speaker detection free →

Interviews · Podcasts · Meetings · Panel discussions · Focus groups · 90+ languages

Sample output, Speaker Diarization

Thanks for joining us today. Can you walk us through your background?

Speaker 1 · 0:00 – 0:05

Of course. I've been working in product design for about eight years now.

Speaker 2 · 0:06 – 0:11

And what drew you to user research specifically?

Speaker 1 · 0:12 – 0:15

Honestly, I realized early on that building the right thing matters more than building things right.

Speaker 2 · 0:16 – 0:22

How it works

Detect speakers automatically in 3 steps

No software to install. Works in your browser.

Upload your recording

Upload an MP4, MP3, WAV or M4A file. Works with any multi-speaker audio or video content.

Enable speaker detection

Toggle "Detect speakers" before transcribing. Dokitscript analyzes voice patterns to identify separate speakers automatically.

Get a labeled transcript

Receive a transcript with each speaker labeled: "Speaker 1:", "Speaker 2:", etc. Edit labels to use real names.
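
If you post-process an exported transcript yourself, the renaming step can be sketched in a few lines of Python. This is a minimal illustration, not Dokitscript's own implementation: the `rename_speakers` helper and the sample lines are hypothetical, and it assumes each turn starts with a label such as "Speaker 1:".

```python
import re


def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Replace generic labels at the start of a line with real names."""
    # Match a label such as "Speaker 1:" only at the start of a line.
    # \d+ captures the whole number, so "Speaker 1" never clobbers "Speaker 10".
    pattern = re.compile(r"^(Speaker \d+):", flags=re.MULTILINE)

    def swap(match: re.Match) -> str:
        label = match.group(1)
        # Keep the original label when no name was supplied for it.
        return f"{names.get(label, label)}:"

    return pattern.sub(swap, transcript)


# Hypothetical transcript text, shaped like the labeled output described above.
transcript = (
    "Speaker 1: Thanks for joining us today.\n"
    "Speaker 2: Of course.\n"
    "Speaker 3: Happy to be here."
)
renamed = rename_speakers(
    transcript, {"Speaker 1": "Interviewer", "Speaker 2": "Guest"}
)
print(renamed)
```

Labels without a supplied name, like "Speaker 3" here, are left unchanged, so a partial mapping is safe to apply.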

Features

Powerful speaker identification

Automatic labels, editable names, full export support.

🎙️

Automatic speaker labels

AI detects distinct voices and assigns consistent labels throughout your transcript. No manual tagging needed.

📝

Name your speakers

After transcription, rename "Speaker 1" to real participant names for professional-quality meeting minutes or interview transcripts.

👥

Multi-speaker support

Works with 2 to 10 speakers. Handles panel discussions, focus groups and meetings with multiple participants.

🌍

90+ languages

Speaker diarization works across all supported languages, not just English. Detect speakers in French, Spanish, German, and more.

✨

AI features

After speaker-labeled transcription, use AI Summary, Key Points or Q&A to analyze the conversation content instantly.

📤

Export with labels

Export your speaker-labeled transcript as TXT or SRT. Speaker labels are included in all export formats.
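
To show what a speaker-labeled SRT file can look like, here is a minimal Python sketch that turns labeled segments into SRT cues. The exact layout of Dokitscript's SRT export is not documented on this page, so the "label: text" cue style, the `to_srt` helper and the segment fields (`speaker`, `start`, `end`, `text`, with times in seconds) are assumptions made for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as SRT cues, prefixing each cue with its speaker label."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Hypothetical segments, taken from the sample output on this page.
segments = [
    {"speaker": "Speaker 1", "start": 0.0, "end": 5.0,
     "text": "Thanks for joining us today. Can you walk us through your background?"},
    {"speaker": "Speaker 2", "start": 6.0, "end": 11.0,
     "text": "Of course. I've been working in product design for about eight years now."},
]
srt = to_srt(segments)
print(srt)
```

Keeping the label inside the cue text means any standard subtitle player shows who is speaking without needing special support.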

Best use cases

Who uses speaker diarization

From journalists to market researchers.

Interview transcription (journalist interviews, podcast interviews, UX research)
Meeting transcription with participant labels
Focus group recordings for market research
Legal depositions and hearings
Academic research recordings
Panel discussion and conference recordings

FAQ

Common questions

How does speaker diarization work?

Dokitscript uses AssemblyAI's speaker diarization technology to analyze voice characteristics (pitch, tone, speaking patterns) and segment the audio by speaker. Each segment is labeled consistently throughout the transcript.
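
The last part of that process, turning per-word speaker tags into labeled segments, can be sketched in Python. This is only an illustration of the grouping step, not the diarization model itself and not AssemblyAI's API: the word-level input shape, the `group_turns` helper and the `clock` helper are hypothetical.

```python
from itertools import groupby


def group_turns(words: list[dict]) -> list[dict]:
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, run in groupby(words, key=lambda w: w["speaker"]):
        run = list(run)
        turns.append({
            "speaker": speaker,
            "start": run[0]["start"],  # turn starts with its first word
            "end": run[-1]["end"],     # and ends with its last word
            "text": " ".join(w["text"] for w in run),
        })
    return turns


def clock(seconds: float) -> str:
    """Format seconds as M:SS, the style used in labels like 0:05."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Hypothetical word-level result: each word has a speaker tag and times in seconds.
words = [
    {"speaker": "Speaker 1", "start": 12.0, "end": 12.4, "text": "And"},
    {"speaker": "Speaker 1", "start": 12.4, "end": 15.0, "text": "why?"},
    {"speaker": "Speaker 2", "start": 16.0, "end": 16.8, "text": "Honestly,"},
    {"speaker": "Speaker 2", "start": 16.8, "end": 22.0, "text": "curiosity."},
]
turns = group_turns(words)
for turn in turns:
    print(f"{turn['speaker']} · {clock(turn['start'])} – {clock(turn['end'])}: {turn['text']}")
```

Because `groupby` only merges adjacent items, a speaker who talks again later in the recording gets a new turn with the same label, which is what keeps labels consistent across the transcript.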

How many speakers can it detect?

Dokitscript can detect and label up to 10 speakers. For best results, use recordings where speakers rarely talk over each other.

Is speaker diarization available on all plans?

Speaker diarization is available on the Business plan ($49.99/month). The Business plan also supports recordings of up to 90 minutes, ideal for long interviews and meetings.

What if two speakers sound similar?

Diarization accuracy depends on voice distinctiveness. Speakers with very similar voices may occasionally be confused. For critical transcriptions, review and correct labels manually.