Automatic speech recognition (ASR) is at the core of what we do at Gong. Our Revenue Intelligence platform empowers sales leaders to close more deals and manage their pipeline better: we capture customer interactions, analyze what was said, and deliver data-driven insights. Building on the tremendous advances ASR has made in the past decade, we can now process human conversations with ML and NLP algorithms more easily and effectively than ever before. However, most of these algorithms still rely on huge amounts of annotated data. That’s why we are proud to present Gecko, a new open-source tool we developed at Gong for annotating human conversations.
The Gecko interface, which was designed to be clean yet interactive, integrates media player and editing capabilities. The main view features a waveform display of the audio, on which the segmentation and speaker identification are overlaid and color-coded, along with a video player if a video file was uploaded. If a transcript was uploaded, it is synced with the audio so that the word currently heard in the playback is highlighted. You can zoom in and out of the waveform display and use the auto-center button to keep the waveform centered on the section currently playing. The Segment Labeling box shows the list of labels and lets you add new ones.
How to Use Gecko
To use Gecko, you’ll need an audio or video file and one or more files with annotations of segments. Gecko supports various file formats (such as the ones usually generated by speech-recognition frameworks): RTTM, CTM, JSON, SRT and TSV. You can upload several annotation files simultaneously in order to compare multiple models.
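To give a feel for what such an annotation file contains, here is a minimal Python sketch that reads speaker segments from RTTM text, which stores one whitespace-separated record per line. The Segment class, the parse_rttm helper, and the sample file ID and speaker names are illustrative assumptions; this is not Gecko's own parser.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    file_id: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    speaker: str


def parse_rttm(text: str) -> List[Segment]:
    """Collect the SPEAKER records of an RTTM document as segments."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only SPEAKER records: type, file, channel, onset, duration,
        # <NA>, <NA>, speaker name, then trailing <NA> fields.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        start = float(fields[3])
        duration = float(fields[4])
        segments.append(Segment(fields[1], start, start + duration, fields[7]))
    return segments


# Hypothetical two-turn example, made up for illustration.
example = """\
SPEAKER call_01 1 0.00 2.50 <NA> <NA> rep <NA> <NA>
SPEAKER call_01 1 2.50 4.00 <NA> <NA> customer <NA> <NA>
"""
segments = parse_rttm(example)
for seg in segments:
    print(f"{seg.speaker}: {seg.start:.2f}s to {seg.end:.2f}s")
```

RTTM stores an onset and a duration rather than an end time, so the sketch converts each record to a start/end pair, which is usually easier to compare across models.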
What You Can Do with Gecko
You can use Gecko for a variety of segmentation tasks, including Voice Activity Detection (VAD), diarization, and speaker identification. With Gecko, you can label the speaker in each segment (automatically color-coded) or label a segment as a sound event such as music, cross-talk, or whatever else you’d like. You can also set start and end times for segments and add or delete segments.
To easily compare various models (for example, the ground truth with the output of a diarization system, or the results of multiple diarization algorithms), you can upload multiple annotations to Gecko. You can then edit the input annotations, including speaker segments and words in the transcripts. And you can always use the “undo” feature to go back.
Refine automatic transcripts
Instead of manually transcribing audio from scratch, you can save time with Gecko by starting from the output of an ASR system, refining it, and labeling the dataset. Gecko highlights the word heard in the playback and allows you to edit it to improve the quality of the transcription.
Gecko makes it easy to compare two different transcripts by presenting the differences between them in a table and identifying insertions, deletions, and substitutions, as well as discrepancies, which you can search or hide if irrelevant. With both transcripts in front of you, you can listen to the audio and identify the correct transcript or enter the correct text if neither is right. Gecko can also generate a report showcasing the comparison.
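The idea behind such a comparison table can be sketched in a few lines of Python using the standard library's difflib. This is only an illustration of word-level alignment, not Gecko's actual implementation; the diff_words helper and the sample sentences are assumptions made up for the example.

```python
from difflib import SequenceMatcher

# Map difflib's opcode names to the terms used for transcript comparison.
KINDS = {"replace": "substitution", "delete": "deletion", "insert": "insertion"}


def diff_words(reference: str, hypothesis: str):
    """Return (kind, reference words, hypothesis words) for each difference."""
    ref, hyp = reference.split(), hypothesis.split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical stretches do not appear in the table
        rows.append((KINDS[tag], " ".join(ref[i1:i2]), " ".join(hyp[j1:j2])))
    return rows


rows = diff_words(
    "we closed the deal last week",
    "we close the big deal last week",
)
for kind, ref_words, hyp_words in rows:
    print(f"{kind:<13} {ref_words!r:<10} {hyp_words!r}")
```

Each row names one difference, so a reviewer can jump to that point in the audio and decide which version is right. Note that difflib's matching is a heuristic alignment, so it may group differences differently from a minimum-edit-distance alignment used for word error rate.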
Read video and subtitle files
Gecko can read video files, as well as read and generate SRT files, the standard format for subtitles, which is supported by most video players.
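For reference, an SRT file is a sequence of numbered cues, each with a start and end timestamp and the subtitle text, separated by blank lines. The Python sketch below writes that layout from a list of timed segments; the srt_timestamp and to_srt helpers and the sample cues are illustrative assumptions, not code taken from Gecko.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm, the SRT timestamp layout."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Hypothetical cues, made up for illustration.
srt_text = to_srt([
    (0.0, 2.5, "Hi, thanks for joining."),
    (2.5, 6.5, "Happy to be here."),
])
print(srt_text)
```

Working in whole milliseconds before splitting into hours, minutes and seconds avoids rounding a value such as 59.9996 seconds into an invalid "60" seconds field.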
See Gecko in Action
Check out this video we created to present Gecko’s core capabilities:
We want to share Gecko and help serve an ever-growing community of enthusiastic users. After all, we know that many others need annotated conversations, and one of our company’s leading operating principles is to create raving fans anywhere and everywhere. That is why Gecko is already available at https://gong-io.github.io/gecko, and we are continually updating it and enhancing its capabilities. For example, since its launch in September 2019 at INTERSPEECH 2019 in Graz, Austria, we’ve added support for annotation of videos and subtitle creation. Our goal is to keep making Gecko richer to better serve professionals across industry and academia. Try it out and let us know how you’re using it. We’d love to hear from you!
Originally published on the Gong Tech Blog by software engineer Golan Levy.