User Guide

Documentation

Learn how to create live caption sessions, share with your audience, and set up translations with audio dubbing.

11 guides · ~12 min read

Getting Started

Create your first captioning session in minutes. Choose your source language, optionally add translation languages, and start streaming captions to your audience.

1. Create a New Session

From your dashboard, click the + button to create a new session. Give it a title, select your source language, and optionally choose translation languages.

Create new session form
The session creation form with language selection

2. Configure Options

Source Language
The language being spoken
Translation Languages
Add languages for real-time translation
Speaker Identification
Enable multi-speaker detection
Audio Input
Select your microphone or system audio
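
Put together, these options amount to a small session configuration. The sketch below is purely illustrative; the field names and values are assumptions, not Subsume's actual settings schema.

```python
# Illustrative sketch only: field names and values are assumptions,
# not Subsume's actual session settings.
session_config = {
    "title": "Weekly All-Hands",
    "source_language": "en",                   # the language being spoken
    "translation_languages": ["es-MX", "ko"],  # optional real-time translations
    "speaker_identification": True,            # multi-speaker detection
    "audio_input": "default microphone",       # microphone or system audio
}
```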

Running a Session

Once your session is created, you'll enter the studio view where you can start capturing audio and see captions in real-time.

The Studio View

The studio shows your live captions as they're transcribed. The red LIVE indicator lights up while audio is being captured, and the timer shows elapsed session time.

Live studio view with captions
Live captions flowing in the studio view

Session Controls

Session tabs
Switch between parent and translation sessions
Timer
Shows elapsed time and audio level
Copy button
Copy all captions to clipboard
Text settings
Adjust font size and display
Share button
Generate share link for viewers
155+ Variants

Regional Language Variants

Go beyond generic translations with 155+ regional language variants. Subsume automatically adapts vocabulary, expressions, and phrasing to match specific regional norms—so your Mexican Spanish sounds Mexican, not generic.

Region-Specific Vocabulary

Our AI translation pipeline uses LLM refinement to adapt vocabulary for each regional variant. See the difference:

English | Mexico | Argentina | Spain
Car | carro | auto | coche
Bus | camión | colectivo | autobús
Jacket | chamarra | campera | chaqueta
Cell phone | celular | celu | móvil
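
One way to picture the LLM refinement step behind this table: the base translation is rewritten with variant-specific instructions. This is a simplified sketch, not the production prompt or pipeline; build_refinement_prompt is a hypothetical helper.

```python
# Simplified sketch of regional refinement; the real prompt and model call differ.
def build_refinement_prompt(base_translation: str, variant: str) -> str:
    return (
        f"Rewrite this Spanish text so it reads naturally to a {variant} audience. "
        f"Prefer local vocabulary (for example 'carro' in Mexico, 'auto' in Argentina, "
        f"'coche' in Spain) without changing the meaning:\n\n{base_translation}"
    )

print(build_refinement_prompt("Dejé el coche en la calle.", "Mexican Spanish"))
```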

Supported Regional Variants

Select from 155+ regional variants across major world languages. Each variant receives tailored translations that respect local vocabulary and expressions.

ES · Spanish (22 variants): Mexico, Argentina, Colombia, Spain, Chile, Peru, Venezuela, Ecuador, + 14 more
FR · French (24 variants): France, Canada, Belgium, Switzerland, Senegal, Côte d'Ivoire, + 18 more
EN · English (22 variants): US, UK, Australia, India, South Africa, Nigeria, Singapore, + 15 more
AR · Arabic (18 variants): Egypt, Saudi Arabia, Morocco, Algeria, UAE, Jordan, Lebanon, + 11 more
PT · Portuguese (8 variants): Brazil, Portugal, Angola, Mozambique, Cape Verde, + 3 more
ZH · Chinese (6 variants): Simplified, Traditional, Hong Kong, Taiwan, Singapore, Cantonese
DE · German (6 variants): Germany, Austria, Switzerland, Liechtenstein, Luxembourg, Belgium
RU · Russian (6 variants): Russia, Belarus, Kazakhstan, Kyrgyzstan, Ukraine, Moldova
NL · Dutch (5 variants): Netherlands, Belgium (Flemish), Suriname, Aruba, Curaçao

+ Italian, Swedish, Tamil, Swahili, and 40+ more language families with regional variants

How Regional Translation Works

1. Speech Recognition: audio is captured via Deepgram Nova-3 or ElevenLabs Scribe
2. Neural Translation: Azure or Google NMT provides the base translation
3. Regional Refinement: an LLM adapts vocabulary to match regional norms
4. Regional TTS: audio dubbing uses a region-appropriate voice
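
Taken together, the four stages behave like a simple chain. The sketch below uses placeholder functions standing in for the real provider calls (Deepgram/ElevenLabs for ASR, Azure/Google for NMT, an LLM for refinement, and TTS for dubbing); it is a conceptual outline, not Subsume's implementation.

```python
# Placeholder functions standing in for the real provider calls.
def transcribe(audio_chunk): ...       # Deepgram Nova-3 or ElevenLabs Scribe
def translate(text, target): ...       # Azure or Google NMT base translation
def refine(text, variant): ...         # LLM adapts vocabulary to the variant
def synthesize(text, voice): ...       # region-appropriate TTS voice

def caption_pipeline(audio_chunk, variant="es-MX", voice="es-MX-voice"):
    source_text = transcribe(audio_chunk)
    base = translate(source_text, target=variant.split("-")[0])
    regional = refine(base, variant=variant)
    dubbed_audio = synthesize(regional, voice=voice)
    return regional, dubbed_audio
```
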
Real Regional Differences

A viewer in Mexico City expects "carro" while Buenos Aires expects "auto" and Madrid expects "coche". Regional variants ensure your translations feel native to each audience, not like generic machine translation.

Reference

Language Support Reference

Complete reference of translation language support across all providers, covering speech recognition (ASR), translation (NMT), and audio dubbing (TTS). For each capability, a language may be fully supported, served by a fallback provider, or unsupported.

Major Languages

English (en), Spanish (es), French (fr), German (de), Portuguese (pt), Chinese Simplified (zh-Hans), Japanese (ja), Korean (ko), Arabic (ar), Russian (ru), Italian (it), Dutch (nl), Hindi (hi)

South & Southeast Asian Languages

Vietnamese (vi), Thai (th), Indonesian (id), Malay (ms), Filipino/Tagalog (tl), Bengali (bn), Tamil (ta), Telugu (te), Urdu (ur), Burmese (my), Khmer (km)

European Languages

Polish (pl), Turkish (tr), Greek (el), Czech (cs), Romanian (ro), Hungarian (hu), Ukrainian (uk), Swedish (sv), Norwegian (no), Danish (da), Finnish (fi), Welsh (cy), Irish (ga), Macedonian (mk)

African & Middle Eastern Languages

Hebrew (he), Persian/Farsi (fa), Swahili (sw), Afrikaans (af), Hausa (ha), Yoruba (yo), Zulu (zu), Amharic (am), Somali (so), Pashto (ps)
Note: Languages without TTS (Audio Dub) support can still be translated—viewers will see text captions but won't receive synthesized audio. The full list includes 155+ language variants across all categories.
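
The fallback described in the note can be pictured as a simple capability check. The flags below are invented for the example; they are not the real per-language support matrix.

```python
# Illustrative only: these capability flags are invented for the example.
CAPABILITIES = {
    "es": {"asr": True, "translation": True, "tts": True},
    "xx": {"asr": True, "translation": True, "tts": False},  # hypothetical language code
}

def viewer_outputs(lang: str) -> list[str]:
    caps = CAPABILITIES.get(lang, {})
    outputs = []
    if caps.get("translation"):
        outputs.append("text captions")
    if caps.get("tts"):
        outputs.append("dubbed audio")
    return outputs

print(viewer_outputs("xx"))  # ['text captions'] -> translated, but no audio dub
```
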
Pro Feature

Split View

Monitor multiple translations simultaneously with split view. Pin up to 5 sessions side-by-side and watch captions flow across all languages in real-time—a game-changer for multilingual events.

Multi-Language Monitoring

Click the split view button in the top right corner to enable multi-pane mode. Pin sessions by clicking the pin icon on any tab. Each pane operates independently with its own controls.

Split view showing three language sessions side by side
Monitor original English with Japanese and Korean translations simultaneously

Split View Capabilities

Up to 5 Sessions
Pin your original plus up to 4 translations
Independent Controls
Each pane has its own copy, share, and settings
Real-time Sync
All translations update simultaneously as you speak
Quick Comparison
Spot translation issues instantly across languages
Pro Feature

Translation Coach

Improve translation quality in real-time with AI-powered coaching and human guidance. The Translation Coach learns your terminology preferences and ensures consistency across your entire session.

AI + Human Feedback Loop

Click the coach icon on any translation session to open the Translation Coach. Enter custom guidance like terminology preferences, and watch as the AI also suggests improvements based on patterns it detects.

Translation Coach showing AI suggestions and human input
AI-generated suggestions (AUTO) alongside human coaching input for each language

How It Works

Human Coaching
Type guidance like "Use 'AI agents' not 'AI assistants'"
AI Suggestions
Automatic detection of inconsistencies and improvements
Per-Language
Each translation gets its own coaching context
Live Updates
Coaching applies immediately to new translations
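
Conceptually, human coaching notes and AUTO suggestions become extra instructions attached to each language's translation step. The shape below is a sketch with made-up names, not the actual implementation.

```python
# Sketch only: data shapes and names are assumptions, not the real feature.
coaching = {
    "es-MX": {
        "human": ["Use 'agentes de IA', not 'asistentes de IA'"],
        "auto":  ["Keep 'dashboard' untranslated for consistency"],
    },
}

def coaching_block(lang: str) -> str:
    notes = coaching.get(lang, {})
    lines = notes.get("human", []) + notes.get("auto", [])
    return "\n".join(f"- {line}" for line in lines)

# The combined block would be prepended to that language's refinement prompt.
print(coaching_block("es-MX"))
```
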
💡
Pro Tip: Domain-Specific Terminology

For technical or specialized content, add coaching notes with your preferred terminology at the start of your session. The AI will maintain consistency throughout.

AI-Powered

AI Transcription Refinement

Improve transcription and translation accuracy for domain-specific terms. The AI uses your context description, previous captions, and a dynamic glossary to ensure consistent, contextually appropriate results.

Transcription Context

Describe the topic, audience, and tone of your session in natural language. For example: "Tech startup pitch, casual tone, explain jargon simply" or "Academic lecture on economics, formal tone, preserve technical terms".

Example context description
"Sunday worship service at a Christian church. Translate religious terms appropriately: 'God' should use the Christian deity term in each language, not a generic word."

Generate Refinement Guide

Click Generate Refinement Guide to have AI analyze your context description and create optimized per-language prompts. Each translation language receives custom guidance tailored to its linguistic needs.

Domain Context
Industry terminology, subject matter expertise
Tone Guidance
Formal, casual, technical, or conversational
Key Term Translations
How to translate specific important terms
Language-Specific Rules
Honorifics, formality levels, script preferences
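
As an illustration, a generated guide might bundle those four kinds of guidance per language. The structure and values below are assumptions made for the example, not the actual output format.

```python
# Illustrative shape of a per-language refinement guide; not the real output.
refinement_guide = {
    "ko": {
        "domain": "Sunday worship service at a Christian church",
        "tone": "Reverent, formal register",
        "key_terms": {"God": "하나님"},   # Christian deity term, not a generic word
        "language_rules": "Use honorific verb endings when quoting scripture",
    },
}
```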

Use Previous Context

Enable this to include recent captions when refining new text. This helps maintain sentence continuity, resolve pronouns (like "he" or "it"), and keep terminology consistent throughout your session.

Context depth: 1, 2, or 3

Higher depth includes more previous captions for better context, but uses more processing.
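
Roughly speaking, the last few finalized captions are passed along with the new text, and the depth setting controls how many. The helper below is a hypothetical sketch of that idea, not the real prompt format.

```python
# Sketch: include the last `depth` finalized captions as refinement context.
def with_context(new_text: str, history: list[str], depth: int = 2) -> str:
    recent = history[-depth:] if depth > 0 else []
    return "Previous captions:\n" + "\n".join(recent) + f"\n\nRefine this caption:\n{new_text}"

history = ["The CEO introduced the new product.", "She thanked the team."]
print(with_context("It ships next quarter.", history, depth=2))
# "It" can now be resolved against the previous captions.
```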

Dynamic Terminology Glossary

When enabled, the AI automatically identifies and learns domain-specific terms during your session. Once a term is learned, it's translated consistently throughout—and the glossary can be saved to your recurring template for future sessions.

Auto-Detection
AI identifies new domain terms as you speak
Consistent Translation
Learned terms use the same translation every time
Manual Editing
Review and correct glossary entries anytime
Template Persistence
Save glossary to recurring templates for reuse
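
In effect, the glossary is a growing term map consulted for every new translation. The snippet below is a conceptual sketch, not the actual data model.

```python
# Conceptual sketch of a dynamic glossary; names and shapes are assumptions.
glossary = {}  # term -> {language: preferred translation}

def learn(term: str, lang: str, translation: str) -> None:
    glossary.setdefault(term, {})[lang] = translation

def preferred(term: str, lang: str, default: str) -> str:
    return glossary.get(term, {}).get(lang, default)

learn("God", "es", "Dios")                      # learned once during the session...
print(preferred("God", "es", "generic term"))   # ...then used consistently: 'Dios'
```
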
🎯
Example: Religious Content

In a sermon, the word "God" should be translated to the Christian deity term (Dios, 上帝, 하나님) rather than a generic term. Meanwhile, "the god of this world" (referring to Satan) should be translated differently. The refinement system understands this nuance from your context description.

Sharing with Viewers

Share your captions with anyone via a simple link or QR code. Viewers can watch on any device without installing anything.

Generate a Share Link

Click the share icon in the studio to open the share modal. You'll get a unique URL and QR code that viewers can use to access your captions.

Share modal with QR code
Share modal with QR code and viewer link

Sharing Options

Copy Link
Copy the viewer URL to share via chat, email, etc.
Copy QR
Copy the QR code image to clipboard
Download QR
Save as PNG for printing or projection
Language Links
Share direct links to specific languages
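
If you prefer to generate your own QR code from the copied viewer link (for print layouts, slides, and so on), the open-source qrcode package makes it a two-liner. The URL below is only an example; copy your real link from the share modal.

```python
# Requires: pip install qrcode[pil]
import qrcode

viewer_url = "https://subsume.io/view/my-event"  # example only; use your real viewer link

img = qrcode.make(viewer_url)  # build the QR code image
img.save("viewer-qr.png")      # PNG suitable for printing or projection
```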

Audio Dubbing

Enable AI-powered audio dubbing to generate spoken audio for your translations. Viewers can listen to captions in their preferred language.

Viewer Experience

When audio dubbing is enabled, viewers see highlighted captions indicating playback status:

Currently playing
Queued next (click to skip)
Audio dubbing playback with highlights
Blue = currently playing, Yellow = queued next

Audio Dubbing Settings

When creating a session with translations, enable Audio Dubbing to configure:

Dub Languages
Choose which translation languages get audio
TTS Model
Flash Lite, Flash (recommended), or Pro
Voice
Select from 30+ AI voices
Voice Style
Professional, Warm, Calm, Energetic, or custom
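
Taken together, these options amount to a small per-session dubbing configuration, sketched below. The field names and values are illustrative, not the actual settings schema.

```python
# Illustrative only: not the actual audio dubbing settings schema.
dubbing_config = {
    "enabled": True,
    "dub_languages": ["es-MX", "ja"],  # which translation languages get audio
    "tts_model": "flash",              # "flash-lite", "flash" (recommended), or "pro"
    "voice": "voice-warm-01",          # one of 30+ AI voices (name invented here)
    "voice_style": "Warm",             # Professional, Warm, Calm, Energetic, or custom
}
```
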
AI-Powered

Adaptive AI

Subsume's AI doesn't just transcribe—it learns. Our adaptive timing system observes each speaker's natural rhythm and continuously optimizes when to release translations for the smoothest viewer experience.

Continual Learning

Every session makes the system smarter. The AI monitors release timing outcomes and automatically adjusts parameters to match each speaker's patterns—no manual tuning required.

Real-time Adaptation
Adjusts during the session as it learns your speaking patterns
Cross-Session Memory
Remembers preferences and improves with each use
Speaker Profiles
Create profiles for different speakers or contexts

How It Works

The AI agent makes intelligent decisions about when to release translated text, then evaluates each decision against the actual speech flow. Over time, it learns the optimal timing for natural phrase boundaries.

Phrase Boundary Detection
Learns to release at natural pauses, not mid-sentence
Adaptive Confidence
Adjusts how eagerly it releases based on past outcomes
Buffer Optimization
Finds the right balance between speed and completeness
Outcome Learning
Classifies each release and uses feedback to improve
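
A heavily simplified way to picture this loop: hold translated text until a likely phrase boundary, release it, then nudge the thresholds based on how the release turned out. The class below is a conceptual sketch under those assumptions, not the production agent.

```python
# Conceptual sketch of adaptive release timing; not the production algorithm.
class AdaptiveTimer:
    def __init__(self, pause_threshold=0.6, confidence=0.5):
        self.pause_threshold = pause_threshold  # seconds of silence before releasing
        self.confidence = confidence            # how eagerly the agent releases

    def should_release(self, pause_seconds: float, sentence_complete: bool) -> bool:
        # Release at natural boundaries: a finished sentence or a long enough pause.
        return sentence_complete or pause_seconds >= self.pause_threshold

    def record_outcome(self, released_mid_phrase: bool) -> None:
        # Outcome learning: awkward releases make the agent more patient,
        # clean releases let it release slightly sooner next time.
        if released_mid_phrase:
            self.pause_threshold *= 1.10
            self.confidence = max(0.1, self.confidence - 0.05)
        else:
            self.pause_threshold *= 0.98
            self.confidence = min(0.9, self.confidence + 0.02)
```
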
🧠
Agentic AI: Always Improving

Unlike static systems, our timing agent operates autonomously—observing, learning, and optimizing with every session. The more you use it, the better it gets at delivering smooth, natural translations.

Speaker Profiles

Save and switch between timing preferences for different speakers. Each profile captures the learned timing parameters so you can instantly optimize for whoever is speaking.

Why Speaker Profiles?

Different speakers have different rhythms—some pause frequently, others speak in long continuous phrases. The adaptive AI learns these patterns, but when speakers change mid-session, you can switch profiles to instantly apply the right timing.

Fast Speakers
Shorter buffers, quicker releases for rapid delivery
Deliberate Speakers
Longer buffers to capture complete thoughts
Multi-Speaker Events
Switch profiles when presenters change
Cross-Session Memory
Profiles persist and improve over time
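
Under the hood, a profile is essentially a saved set of learned timing parameters that can be swapped in when the speaker changes. The parameter names and values below are illustrative assumptions, not the real stored format.

```python
# Illustrative profiles only; parameter names and values are assumptions.
speaker_profiles = {
    "Pastor John": {"pause_threshold": 0.90, "confidence": 0.40},  # deliberate speaker
    "CEO Keynote": {"pause_threshold": 0.45, "confidence": 0.70},  # fast speaker
}

def apply_profile(timer, name: str) -> None:
    # Copy the saved parameters onto the live timing agent.
    params = speaker_profiles[name]
    timer.pause_threshold = params["pause_threshold"]
    timer.confidence = params["confidence"]
```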

Managing Profiles

Create
Name a new profile or save current settings as a profile
Switch
Select a different profile from the dropdown during a session
Delete
Remove profiles you no longer need
👤
Pro Tip: Name Profiles by Speaker

Create profiles like "Pastor John" or "CEO Keynote" so you can quickly switch when different people take the stage. The system will apply their learned timing preferences instantly.

Recurring Templates

Create templates for recurring events. Get a permanent share link that never changes—perfect for weekly meetings, services, or classes.

1. Create a Template

After running a session, click the Make Recurring button to create a template from it. All your settings will be saved.

Make recurring button on session card
Find the Make Recurring button
Make recurring template modal
Configure your template

2. Template Features

Custom URL
Set a vanity URL like subsume.io/view/my-event
Waiting Message
Show a message when no session is active
Permanent Link
The share URL never changes between sessions
One-Click Start
Start a new session from the template instantly

3. Share Your Template

Templates have permanent share links with language-specific QR codes. Share once, use forever.

Recurring template card
Template card with permanent URL
Template share modal
Share modal with language selector

Ready to get started?

Create your first captioning session in minutes.