The AI dubbing industry is booming, with tools promising to translate and replicate actor performances across languages in an instant. But how good are they really? Until now, judging the quality of these systems has been a subjective mess. Amsterdam-based AI data firm Toloka aims to fix that with VOX-DUB, the first open, human-evaluated AI dubbing benchmark, designed to bring some much-needed accountability to the sector.
VOX-DUB moves beyond the simple metrics used for text-to-speech, a field that has nearly reached human parity. Dubbing isn’t just about clear pronunciation; it’s about performance. The benchmark uses a pairwise A/B testing methodology, in which native speakers listen to clips and rate them across five crucial dimensions: pronunciation, naturalness, audio quality, emotional accuracy, and voice similarity to the original actor.
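To make the pairwise A/B methodology concrete, here is a minimal sketch of how such judgments could be aggregated into per-dimension win rates. This is an illustration only: the system names, the `win_rates` function, and the tie-handling convention (a tie counts as half a win for each side) are assumptions, not Toloka's actual evaluation pipeline.

```python
from collections import defaultdict

# The five dimensions the article describes.
DIMENSIONS = ["pronunciation", "naturalness", "audio_quality",
              "emotional_accuracy", "voice_similarity"]

def win_rates(judgments):
    """Aggregate pairwise A/B judgments into per-system win rates.

    Each judgment is a tuple (system_a, system_b, dimension, winner),
    where winner is "a", "b", or "tie".
    Returns {dimension: {system: win_rate}}.
    """
    wins = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for sys_a, sys_b, dim, winner in judgments:
        counts[dim][sys_a] += 1
        counts[dim][sys_b] += 1
        if winner == "a":
            wins[dim][sys_a] += 1.0
        elif winner == "b":
            wins[dim][sys_b] += 1.0
        else:  # assumed convention: a tie is half a win for each side
            wins[dim][sys_a] += 0.5
            wins[dim][sys_b] += 0.5
    return {dim: {s: wins[dim][s] / n for s, n in counts[dim].items()}
            for dim in counts}

# Hypothetical ratings for two dubbing systems from native-speaker raters.
judgments = [
    ("dubber_x", "dubber_y", "naturalness", "a"),
    ("dubber_x", "dubber_y", "naturalness", "tie"),
    ("dubber_x", "dubber_y", "voice_similarity", "b"),
]
print(win_rates(judgments))
```

Reporting a separate win rate per dimension, rather than one blended score, is what lets a benchmark like this show that a system can excel at pronunciation while still failing at emotional accuracy or voice similarity.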
