NVIDIA

Active Speaker Detection

flagship
NVIDIA · released 2024-06-01 · text
currently routing · 4.2k rpm

Context: 1K tokens
Input: — / 1M
Output: — / 1M
Speed: — t/s
License: open
/ ABOUT

NVIDIA's Active Speaker Detection model identifies which person is currently speaking in multi-person video or audio streams. Using computer vision and audio analysis, it determines which visible face corresponds to the active audio source, enabling automated speaker attribution in meetings, interviews, and multi-camera productions.

The model processes video frames alongside audio to correlate lip movements and facial expressions with the audio signal. It handles scenarios with overlapping speakers, off-screen speakers, and varying camera angles.
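
The audio-visual correlation described above can be illustrated with a toy sketch. This is not NVIDIA's implementation: it assumes each visible face has already been reduced to a per-frame "mouth motion" score and the audio to a per-frame energy value, and it uses plain Pearson correlation in place of a learned model. The names `pearson`, `pick_active_speaker`, `mouth_motion`, `audio_energy`, and `min_score` are hypothetical.

```python
from math import sqrt
from typing import Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    if len(x) != len(y) or not x:
        raise ValueError("series must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def pick_active_speaker(
    mouth_motion: dict[str, Sequence[float]],
    audio_energy: Sequence[float],
    min_score: float = 0.5,  # assumed threshold, chosen only for illustration
) -> Optional[str]:
    """Return the face id whose mouth motion best tracks the audio, or None.

    None stands for "no visible face matches the audio", for example when
    the speaker is off-screen.
    """
    scores = {
        face: pearson(series, audio_energy)
        for face, series in mouth_motion.items()
    }
    if not scores:
        return None
    best = max(scores, key=lambda face: scores[face])
    return best if scores[best] >= min_score else None
```

Calling `pick_active_speaker` with one motion series per face and a matching audio-energy series returns the best-matching face id, or `None` when no face clears the threshold. The sketch picks at most one face, so it does not cover overlapping speakers; returning every face above the threshold would be one way to extend it.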

Active Speaker Detection is useful for automated meeting transcription, video editing, and media processing pipelines that require accurate speaker identification.

Providers for Active Speaker Detection

1 route · sorted by uptime

ClosedRouter routes requests to the providers best able to handle your prompt size and parameters, with automatic fallbacks to maximize uptime.
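
The routing behavior described above can be sketched as a filter, sort, and fallback loop. This is a guess at the general pattern, not ClosedRouter's actual logic: the `Provider` fields, the `route` function, and the choice to rank by uptime alone are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Provider:
    """Hypothetical provider record; the field names are assumptions."""
    name: str
    max_context: int            # largest prompt accepted, in tokens
    uptime: float               # recent uptime as a fraction in [0, 1]
    send: Callable[[str], str]  # performs the request, raises on failure


def route(
    providers: Sequence[Provider], prompt: str, prompt_tokens: int
) -> tuple[str, str]:
    """Try eligible providers in descending uptime order.

    Returns (provider name, response) from the first provider that succeeds.
    """
    # Keep only providers whose context window fits the prompt.
    eligible = [p for p in providers if p.max_context >= prompt_tokens]
    # Prefer the provider with the best recent uptime.
    eligible.sort(key=lambda p: p.uptime, reverse=True)

    errors = []
    for provider in eligible:
        try:
            return provider.name, provider.send(prompt)
        except Exception as exc:  # fall back to the next provider
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("no provider could serve the request: " + "; ".join(errors))
```

With a single route, as listed here, the loop has nothing to fall back to, so a failure from that provider surfaces directly as the `RuntimeError`.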

Provider | Context | Quant | Uptime · 30d
(not shown) | (not shown) | bf16 | 0.00%