User Guide

Documentation

Learn how to create live caption sessions, share with your audience, and set up translations with audio dubbing.

11 guides · ~12 min read

Getting Started

Create your first captioning session in minutes. Choose your source language, optionally add translation languages, and start streaming captions to your audience.

1. Create a New Session

From your dashboard, click the + button to create a new session. Give it a title, select your source language, and optionally choose translation languages.

Create new session form
The session creation form with language selection

2. Configure Options

Source Language
The language being spoken
Translation Languages
Add languages for real-time translation
Speaker Identification
Enable multi-speaker detection
Audio Input
Select your microphone or system audio
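
Put together, these options amount to a small session configuration. The sketch below is purely illustrative; the field names and values are assumptions, not Subsume's actual settings schema.

```python
# Illustrative sketch only: field names and values are assumptions,
# not Subsume's actual session settings.
session_config = {
    "title": "Weekly All-Hands",
    "source_language": "en",                   # the language being spoken
    "translation_languages": ["es-MX", "ko"],  # optional real-time translations
    "speaker_identification": True,            # multi-speaker detection
    "audio_input": "default microphone",       # microphone or system audio
}
```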

Running a Session

Once your session is created, you'll enter the studio view where you can start capturing audio and see captions in real-time.

The Studio View

The studio shows your live captions as they're transcribed. The red LIVE indicator lights up while audio is being captured, and the timer shows elapsed session time.

Live studio view with captions
Live captions flowing in the studio view

Session Controls

Session tabs
Switch between parent and translation sessions
Timer
Shows elapsed time and audio level
Copy button
Copy all captions to clipboard
Text settings
Adjust font size and display
Share button
Generate share link for viewers
155+ Variants

Regional Language Variants

Go beyond generic translations with 155+ regional language variants. Subsume automatically adapts vocabulary, expressions, and phrasing to match specific regional norms—so your Mexican Spanish sounds Mexican, not generic.

Region-Specific Vocabulary

Our AI translation pipeline uses LLM refinement to adapt vocabulary for each regional variant. See the difference:

English | Mexico | Argentina | Spain
Car | carro | auto | coche
Bus | camión | colectivo | autobús
Jacket | chamarra | campera | chaqueta
Cell phone | celular | celu | móvil
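
One way to picture the LLM refinement step behind this table: the base translation is rewritten with variant-specific instructions. This is a simplified sketch, not the production prompt or pipeline; build_refinement_prompt is a hypothetical helper.

```python
# Simplified sketch of regional refinement; the real prompt and model call differ.
def build_refinement_prompt(base_translation: str, variant: str) -> str:
    return (
        f"Rewrite this Spanish text so it reads naturally to a {variant} audience. "
        f"Prefer local vocabulary (for example 'carro' in Mexico, 'auto' in Argentina, "
        f"'coche' in Spain) without changing the meaning:\n\n{base_translation}"
    )

print(build_refinement_prompt("Dejé el coche en la calle.", "Mexican Spanish"))
```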

Supported Regional Variants

Select from 155+ regional variants across major world languages. Each variant receives tailored translations that respect local vocabulary and expressions.

ES · Spanish (22 variants): Mexico, Argentina, Colombia, Spain, Chile, Peru, Venezuela, Ecuador, + 14 more
FR · French (24 variants): France, Canada, Belgium, Switzerland, Senegal, Côte d'Ivoire, + 18 more
EN · English (22 variants): US, UK, Australia, India, South Africa, Nigeria, Singapore, + 15 more
AR · Arabic (18 variants): Egypt, Saudi Arabia, Morocco, Algeria, UAE, Jordan, Lebanon, + 11 more
PT · Portuguese (8 variants): Brazil, Portugal, Angola, Mozambique, Cape Verde, + 3 more
ZH · Chinese (6 variants): Simplified, Traditional, Hong Kong, Taiwan, Singapore, Cantonese
DE · German (6 variants): Germany, Austria, Switzerland, Liechtenstein, Luxembourg, Belgium
RU · Russian (6 variants): Russia, Belarus, Kazakhstan, Kyrgyzstan, Ukraine, Moldova
NL · Dutch (5 variants): Netherlands, Belgium (Flemish), Suriname, Aruba, Curaçao

+ Italian, Swedish, Tamil, Swahili, and 40+ more language families with regional variants

How Regional Translation Works

1. Speech Recognition: audio is captured via Deepgram Nova-3 or ElevenLabs Scribe
2. Neural Translation: Azure or Google NMT provides the base translation
3. Regional Refinement: an LLM adapts vocabulary to match regional norms
4. Regional TTS: audio dubbing uses a region-appropriate voice
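
Taken together, the four stages behave like a simple chain. The sketch below uses placeholder functions standing in for the real provider calls (Deepgram/ElevenLabs for ASR, Azure/Google for NMT, an LLM for refinement, and TTS for dubbing); it is a conceptual outline, not Subsume's implementation.

```python
# Placeholder functions standing in for the real provider calls.
def transcribe(audio_chunk): ...       # Deepgram Nova-3 or ElevenLabs Scribe
def translate(text, target): ...       # Azure or Google NMT base translation
def refine(text, variant): ...         # LLM adapts vocabulary to the variant
def synthesize(text, voice): ...       # region-appropriate TTS voice

def caption_pipeline(audio_chunk, variant="es-MX", voice="es-MX-voice"):
    source_text = transcribe(audio_chunk)
    base = translate(source_text, target=variant.split("-")[0])
    regional = refine(base, variant=variant)
    dubbed_audio = synthesize(regional, voice=voice)
    return regional, dubbed_audio
```
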
Real Regional Differences

A viewer in Mexico City expects "carro" while Buenos Aires expects "auto" and Madrid expects "coche". Regional variants ensure your translations feel native to each audience, not like generic machine translation.

Reference

Language Support Reference

Complete reference of translation language support across all providers, covering speech recognition (ASR), translation (NMT), and audio dubbing (TTS). For each capability, a language may be fully supported, served by a fallback provider, or unsupported.

Major Languages

English (en), Spanish (es), French (fr), German (de), Portuguese (pt), Chinese Simplified (zh-Hans), Japanese (ja), Korean (ko), Arabic (ar), Russian (ru), Italian (it), Dutch (nl), Hindi (hi)

South & Southeast Asian Languages

Vietnamese (vi), Thai (th), Indonesian (id), Malay (ms), Filipino/Tagalog (tl), Bengali (bn), Tamil (ta), Telugu (te), Urdu (ur), Burmese (my), Khmer (km)

European Languages

Polish (pl), Turkish (tr), Greek (el), Czech (cs), Romanian (ro), Hungarian (hu), Ukrainian (uk), Swedish (sv), Norwegian (no), Danish (da), Finnish (fi), Welsh (cy), Irish (ga), Macedonian (mk)

African & Middle Eastern Languages

Hebrew (he), Persian/Farsi (fa), Swahili (sw), Afrikaans (af), Hausa (ha), Yoruba (yo), Zulu (zu), Amharic (am), Somali (so), Pashto (ps)
Note: Languages without TTS (Audio Dub) support can still be translated—viewers will see text captions but won't receive synthesized audio. The full list includes 155+ language variants across all categories.
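
The fallback described in the note can be pictured as a simple capability check. The flags below are invented for the example; they are not the real per-language support matrix.

```python
# Illustrative only: these capability flags are invented for the example.
CAPABILITIES = {
    "es": {"asr": True, "translation": True, "tts": True},
    "xx": {"asr": True, "translation": True, "tts": False},  # hypothetical language code
}

def viewer_outputs(lang: str) -> list[str]:
    caps = CAPABILITIES.get(lang, {})
    outputs = []
    if caps.get("translation"):
        outputs.append("text captions")
    if caps.get("tts"):
        outputs.append("dubbed audio")
    return outputs

print(viewer_outputs("xx"))  # ['text captions'] -> translated, but no audio dub
```
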
Pro Feature

Split View

Monitor multiple translations simultaneously with split view. Pin up to 5 sessions side-by-side and watch captions flow across all languages in real-time—a game-changer for multilingual events.

Multi-Language Monitoring

Click the split view button in the top right corner to enable multi-pane mode. Pin sessions by clicking the pin icon on any tab. Each pane operates independently with its own controls.

Split view showing three language sessions side by side
Monitor original English with Japanese and Korean translations simultaneously

Split View Capabilities

Up to 5 Sessions
Pin your original plus up to 4 translations
Independent Controls
Each pane has its own copy, share, and settings
Real-time Sync
All translations update simultaneously as you speak
Quick Comparison
Spot translation issues instantly across languages
Pro Feature

Translation Coach

Improve translation quality in real-time with AI-powered coaching and human guidance. The Translation Coach learns your terminology preferences and ensures consistency across your entire session.

AI + Human Feedback Loop

Click the coach icon on any translation session to open the Translation Coach. Enter custom guidance like terminology preferences, and watch as the AI also suggests improvements based on patterns it detects.

Translation Coach showing AI suggestions and human input
AI-generated suggestions (AUTO) alongside human coaching input for each language

How It Works

Human Coaching
Type guidance like "Use 'AI agents' not 'AI assistants'"
AI Suggestions
Automatic detection of inconsistencies and improvements
Per-Language
Each translation gets its own coaching context
Live Updates
Coaching applies immediately to new translations
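
Conceptually, human coaching notes and AUTO suggestions become extra instructions attached to each language's translation step. The shape below is a sketch with made-up names, not the actual implementation.

```python
# Sketch only: data shapes and names are assumptions, not the real feature.
coaching = {
    "es-MX": {
        "human": ["Use 'agentes de IA', not 'asistentes de IA'"],
        "auto":  ["Keep 'dashboard' untranslated for consistency"],
    },
}

def coaching_block(lang: str) -> str:
    notes = coaching.get(lang, {})
    lines = notes.get("human", []) + notes.get("auto", [])
    return "\n".join(f"- {line}" for line in lines)

# The combined block would be prepended to that language's refinement prompt.
print(coaching_block("es-MX"))
```
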
💡
Pro Tip: Domain-Specific Terminology

For technical or specialized content, add coaching notes with your preferred terminology at the start of your session. The AI will maintain consistency throughout.

AI-Powered

AI Transcription Refinement

Improve transcription and translation accuracy for domain-specific terms. The AI uses your context description, previous captions, and a dynamic glossary to ensure consistent, contextually appropriate results.

Transcription Context

Describe the topic, audience, and tone of your session in natural language. For example: "Tech startup pitch, casual tone, explain jargon simply" or "Academic lecture on economics, formal tone, preserve technical terms".

Example context description
"Sunday worship service at a Christian church. Translate religious terms appropriately: 'God' should use the Christian deity term in each language, not a generic word."

Generate Refinement Guide

Click Generate Refinement Guide to have AI analyze your context description and create optimized per-language prompts. Each translation language receives custom guidance tailored to its linguistic needs.

Domain Context
Industry terminology, subject matter expertise
Tone Guidance
Formal, casual, technical, or conversational
Key Term Translations
How to translate specific important terms
Language-Specific Rules
Honorifics, formality levels, script preferences
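
As an illustration, a generated guide might bundle those four kinds of guidance per language. The structure and values below are assumptions made for the example, not the actual output format.

```python
# Illustrative shape of a per-language refinement guide; not the real output.
refinement_guide = {
    "ko": {
        "domain": "Sunday worship service at a Christian church",
        "tone": "Reverent, formal register",
        "key_terms": {"God": "하나님"},   # Christian deity term, not a generic word
        "language_rules": "Use honorific verb endings when quoting scripture",
    },
}
```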

Use Previous Context

Enable this to include recent captions when refining new text. This helps maintain sentence continuity, resolve pronouns (like "he" or "it"), and keep terminology consistent throughout your session.

Context depth: 1, 2, or 3

Higher depth includes more previous captions for better context, but uses more processing.
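
Roughly speaking, the last few finalized captions are passed along with the new text, and the depth setting controls how many. The helper below is a hypothetical sketch of that idea, not the real prompt format.

```python
# Sketch: include the last `depth` finalized captions as refinement context.
def with_context(new_text: str, history: list[str], depth: int = 2) -> str:
    recent = history[-depth:] if depth > 0 else []
    return "Previous captions:\n" + "\n".join(recent) + f"\n\nRefine this caption:\n{new_text}"

history = ["The CEO introduced the new product.", "She thanked the team."]
print(with_context("It ships next quarter.", history, depth=2))
# "It" can now be resolved against the previous captions.
```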

Dynamic Terminology Glossary

When enabled, the AI automatically identifies and learns domain-specific terms during your session. Once a term is learned, it's translated consistently throughout—and the glossary can be saved to your recurring template for future sessions.

Auto-Detection
AI identifies new domain terms as you speak
Consistent Translation
Learned terms use the same translation every time
Manual Editing
Review and correct glossary entries anytime
Template Persistence
Save glossary to recurring templates for reuse
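
In effect, the glossary is a growing term map consulted for every new translation. The snippet below is a conceptual sketch, not the actual data model.

```python
# Conceptual sketch of a dynamic glossary; names and shapes are assumptions.
glossary = {}  # term -> {language: preferred translation}

def learn(term: str, lang: str, translation: str) -> None:
    glossary.setdefault(term, {})[lang] = translation

def preferred(term: str, lang: str, default: str) -> str:
    return glossary.get(term, {}).get(lang, default)

learn("God", "es", "Dios")                      # learned once during the session...
print(preferred("God", "es", "generic term"))   # ...then used consistently: 'Dios'
```
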
🎯
Example: Religious Content

In a sermon, the word "God" should be translated to the Christian deity term (Dios, 上帝, 하나님) rather than a generic term. Meanwhile, "the god of this world" (referring to Satan) should be translated differently. The refinement system understands this nuance from your context description.

Sharing with Viewers

Share your captions with anyone via a simple link or QR code. Viewers can watch on any device without installing anything.

Generate a Share Link

Click the share icon in the studio to open the share modal. You'll get a unique URL and QR code that viewers can use to access your captions.

Share modal with QR code
Share modal with QR code and viewer link

Sharing Options

Copy Link
Copy the viewer URL to share via chat, email, etc.
Copy QR
Copy the QR code image to clipboard
Download QR
Save as PNG for printing or projection
Language Links
Share direct links to specific languages
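
If you prefer to generate your own QR code from the copied viewer link (for print layouts, slides, and so on), the open-source qrcode package makes it a two-liner. The URL below is only an example; copy your real link from the share modal.

```python
# Requires: pip install qrcode[pil]
import qrcode

viewer_url = "https://subsume.io/view/my-event"  # example only; use your real viewer link

img = qrcode.make(viewer_url)  # build the QR code image
img.save("viewer-qr.png")      # PNG suitable for printing or projection
```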

Audio Dubbing

Enable AI-powered audio dubbing to generate spoken audio for your translations. Viewers can listen to captions in their preferred language.

Viewer Experience

When audio dubbing is enabled, viewers see highlighted captions indicating playback status:

Currently playing
Queued next (click to skip)
Audio dubbing playback with highlights
Blue = currently playing, Yellow = queued next

Audio Dubbing Settings

When creating a session with translations, enable Audio Dubbing to configure:

Dub Languages
Choose which translation languages get audio
TTS Model
Flash Lite, Flash (recommended), or Pro
Voice
Select from 30+ AI voices
Voice Style
Professional, Warm, Calm, Energetic, or custom
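
Taken together, these options amount to a small per-session dubbing configuration, sketched below. The field names and values are illustrative, not the actual settings schema.

```python
# Illustrative only: not the actual audio dubbing settings schema.
dubbing_config = {
    "enabled": True,
    "dub_languages": ["es-MX", "ja"],  # which translation languages get audio
    "tts_model": "flash",              # "flash-lite", "flash" (recommended), or "pro"
    "voice": "voice-warm-01",          # one of 30+ AI voices (name invented here)
    "voice_style": "Warm",             # Professional, Warm, Calm, Energetic, or custom
}
```
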
AI-Powered

Adaptive AI

Subsume's AI doesn't just transcribe—it learns. Our adaptive timing system observes each speaker's natural rhythm and continuously optimizes when to release translations for the smoothest viewer experience.

Continual Learning

Every session makes the system smarter. The AI monitors release timing outcomes and automatically adjusts parameters to match each speaker's patterns—no manual tuning required.

Real-time Adaptation
Adjusts during the session as it learns your speaking patterns
Cross-Session Memory
Remembers preferences and improves with each use
Speaker Profiles
Create profiles for different speakers or contexts

How It Works

The AI agent makes intelligent decisions about when to release translated text, then evaluates each decision against the actual speech flow. Over time, it learns the optimal timing for natural phrase boundaries.

Phrase Boundary Detection
Learns to release at natural pauses, not mid-sentence
Adaptive Confidence
Adjusts how eagerly it releases based on past outcomes
Buffer Optimization
Finds the right balance between speed and completeness
Outcome Learning
Classifies each release and uses feedback to improve
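
A heavily simplified way to picture this loop: hold translated text until a likely phrase boundary, release it, then nudge the thresholds based on how the release turned out. The class below is a conceptual sketch under those assumptions, not the production agent.

```python
# Conceptual sketch of adaptive release timing; not the production algorithm.
class AdaptiveTimer:
    def __init__(self, pause_threshold=0.6, confidence=0.5):
        self.pause_threshold = pause_threshold  # seconds of silence before releasing
        self.confidence = confidence            # how eagerly the agent releases

    def should_release(self, pause_seconds: float, sentence_complete: bool) -> bool:
        # Release at natural boundaries: a finished sentence or a long enough pause.
        return sentence_complete or pause_seconds >= self.pause_threshold

    def record_outcome(self, released_mid_phrase: bool) -> None:
        # Outcome learning: awkward releases make the agent more patient,
        # clean releases let it release slightly sooner next time.
        if released_mid_phrase:
            self.pause_threshold *= 1.10
            self.confidence = max(0.1, self.confidence - 0.05)
        else:
            self.pause_threshold *= 0.98
            self.confidence = min(0.9, self.confidence + 0.02)
```
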
🧠
Agentic AI: Always Improving

Unlike static systems, our timing agent operates autonomously—observing, learning, and optimizing with every session. The more you use it, the better it gets at delivering smooth, natural translations.

Speaker Profiles

Save and switch between timing preferences for different speakers. Each profile captures the learned timing parameters so you can instantly optimize for whoever is speaking.

Why Speaker Profiles?

Different speakers have different rhythms—some pause frequently, others speak in long continuous phrases. The adaptive AI learns these patterns, but when speakers change mid-session, you can switch profiles to instantly apply the right timing.

Fast Speakers
Shorter buffers, quicker releases for rapid delivery
Deliberate Speakers
Longer buffers to capture complete thoughts
Multi-Speaker Events
Switch profiles when presenters change
Cross-Session Memory
Profiles persist and improve over time
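
Under the hood, a profile is essentially a saved set of learned timing parameters that can be swapped in when the speaker changes. The parameter names and values below are illustrative assumptions, not the real stored format.

```python
# Illustrative profiles only; parameter names and values are assumptions.
speaker_profiles = {
    "Pastor John": {"pause_threshold": 0.90, "confidence": 0.40},  # deliberate speaker
    "CEO Keynote": {"pause_threshold": 0.45, "confidence": 0.70},  # fast speaker
}

def apply_profile(timer, name: str) -> None:
    # Copy the saved parameters onto the live timing agent.
    params = speaker_profiles[name]
    timer.pause_threshold = params["pause_threshold"]
    timer.confidence = params["confidence"]
```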

Managing Profiles

Create
Name a new profile or save current settings as a profile
Switch
Select a different profile from the dropdown during a session
Delete
Remove profiles you no longer need
👤
Pro Tip: Name Profiles by Speaker

Create profiles like "Pastor John" or "CEO Keynote" so you can quickly switch when different people take the stage. The system will apply their learned timing preferences instantly.

Recurring Templates

Create templates for recurring events. Get a permanent share link that never changes—perfect for weekly meetings, services, or classes.

1. Create a Template

After running a session, click the Make Recurring button to create a template from it. All your settings will be saved.

Make recurring button on session card
Find the Make Recurring button
Make recurring template modal
Configure your template

2. Template Features

Custom URL
Set a vanity URL like subsume.io/view/my-event
Waiting Message
Show a message when no session is active
Permanent Link
The share URL never changes between sessions
One-Click Start
Start a new session from the template instantly

3. Share Your Template

Templates have permanent share links with language-specific QR codes. Share once, use forever.

Recurring template card
Template card with permanent URL
Template share modal
Share modal with language selector

Ready to get started?

Create your first captioning session in minutes.