This Startup Uses Deep Learning to Make Virtual Call Center Agents Sound Human


For all the innovation in the world today, contact centers are far from a thing of the past. The industry is currently worth $200 billion in the US alone and is expected to reach $400 billion by 2020. Surprisingly, even as new Artificial Intelligence (AI), voicebot, and chatbot products proliferate, call centers are likely to be enhanced rather than replaced and to keep growing, thanks in part to Voca, an Israeli AI startup that has developed an AI agent to supercharge contact centers.

Voca (‘conversation’ in Latin) is an AI startup that develops agents for call centers. It takes the recorded phone conversations of a contact center’s human agents and uses the audio to train a custom deep learning engine that serves as an outbound artificial call agent. The agent is tailored in tone, accent, and wording in order to drive higher revenues, reduce costs, and improve productivity in outbound call scenarios. The startup has just secured a Seed funding round of $2.6 million from venture investors including lool ventures and Flint Capital.

Voca was started in late 2017 by Dr. Alan Bekker, a native Uruguayan who immigrated to Israel at 11 years old. Bekker earned a double BSc in Electrical Engineering and Physics with honors, and a PhD in Deep Learning and Machine Learning from Bar-Ilan University, at the age of only 28. His thesis work was about training deep neural networks with unreliable data labels. During the training stage of AI algorithms, the input data might be improperly labeled or suffer from fuzzy labeling, where a single data point can plausibly carry more than one label, a common occurrence in speech recognition work.

“In real life scenarios, the data is always noisy,” explained Bekker. Facing this constant problem, and knowing that textbook theory is usually taught as if in a vacuum, Bekker decided to develop deep learning algorithms that train well even without proper labeling, hoping they would come in handy in his commercial work after his studies.
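To make the idea of training on unreliable labels concrete, here is a minimal sketch of one generic approach, often called forward loss correction. It is not Bekker's published algorithm. It assumes we know (or can estimate) a noise transition matrix that says how often each true class gets recorded as each label; the function names and the numbers below are hypothetical.

```python
import math

def noisy_label_probs(clean_probs, transition):
    """Map the model's belief over TRUE classes to a belief over OBSERVED labels.

    transition[i][j] is the assumed probability that true class i
    was recorded as label j by the (imperfect) annotators.
    """
    n_labels = len(transition[0])
    return [
        sum(p * row[j] for p, row in zip(clean_probs, transition))
        for j in range(n_labels)
    ]

def forward_corrected_loss(clean_probs, transition, observed_label):
    """Cross-entropy measured against the noisy-label distribution."""
    return -math.log(noisy_label_probs(clean_probs, transition)[observed_label])

def naive_loss(clean_probs, observed_label):
    """Ordinary cross-entropy that trusts the observed label completely."""
    return -math.log(clean_probs[observed_label])

# Hypothetical example: the model is 90% sure the true class is 0,
# but the recorded label is 1. Class 0 is mislabeled as 1 20% of the time.
clean_probs = [0.9, 0.1]
transition = [[0.8, 0.2],
              [0.3, 0.7]]
noisy = noisy_label_probs(clean_probs, transition)
corrected = forward_corrected_loss(clean_probs, transition, observed_label=1)
naive = naive_loss(clean_probs, observed_label=1)
```

In this toy case the corrected loss is smaller than the naive one, so the model is penalized less for disagreeing with a label that is likely to be wrong.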

Prior to Voca, Bekker worked on computer vision development at Orbotech (an Israeli tech conglomerate), did deep learning research at Intel Corporation, co-founded an AI software development startup called DeepSolutions, and taught machine learning and deep learning courses at Holon Institute of Technology. Bekker has published ten research papers on noisy data labeling in the fields of deep learning, text analysis, speech recognition, vision recognition, medical imaging, and bioinformatics. He has also spent time in summer courses at the Université de Montréal under Yoshua Bengio, one of the godfathers of modern AI.

Dr. Bekker started Voca and quickly recruited an industry heavy hitter: serial AI entrepreneur Einav Itamar, the former CTO and co-founder of computer vision startup Corrigon (sold to eBay in 2016 for $30 million), to invest and lend his executive expertise in building and selling an AI startup. Itamar began as an investor and director, and shortly afterwards assumed the role of CEO and co-founder after his full departure from eBay. Itamar earned an MSc in NLP and Machine Learning from the Technion and worked as a Software Developer at Intel Corporation before starting Corrigon. He also served as Chief Architect at Matomy Media Group and VP R&D at K2View. Mr. Itamar is a machine learning expert with vast experience in training AI algorithms, machine translation, and handling massive datasets.

When the two entrepreneurs met, they agreed on one thing: Natural Language Processing (NLP) is shaping up to be the new poster child of AI. Deep learning had already revolutionized computer vision, the current face of AI, notably in autonomous driving systems and facial recognition applications. But deep learning hadn’t penetrated speech recognition and natural language processing to the same extent, and the time was ripe. This shared view gave the two entrepreneurs the inspiration and confidence to embark on Voca, an NLP software startup.

Through several iterations of technological research, Itamar and Bekker learned that the limitations of, and opportunities beyond, the chatbot and transcription solutions on the market today lay in three areas: (1) context, (2) intonation, and (3) a human-sounding voice for outbound scenarios.

“You cannot transcribe something accurately without being aware of the context, no matter what tool you’re using: Watson, Google, or a custom speech recognizer,” explained Bekker. Since the mass-market speech-to-text tools lacked context awareness, Bekker set out to give that ability to Voca. Voca is built on top of a set of speech recognition algorithms that are aware of the current state of the conversation, which enables highly accurate transcription.
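One way to picture context-aware transcription is as rescoring: the recognizer proposes several candidate transcripts, and the current state of the conversation tips the balance between them. The sketch below is an illustration under that assumption, not the startup's actual method; the `rescore` function, the candidate list, and all probabilities are hypothetical.

```python
import math

def rescore(hypotheses, state_prior, floor=1e-6):
    """Pick the transcript that best fits both the audio and the context.

    hypotheses:  list of (text, acoustic_probability) pairs
    state_prior: how likely each text is, given what the agent just asked
    floor:       small prior for texts the context did not anticipate
    """
    def score(hypothesis):
        text, acoustic = hypothesis
        return math.log(acoustic) + math.log(state_prior.get(text, floor))
    return max(hypotheses, key=score)[0]

# Hypothetical case: the agent has just asked "How many installments?"
# The audio alone slightly favors "for" over "four".
hypotheses = [("for", 0.50), ("four", 0.45)]
prior_after_installments_question = {"four": 0.30, "for": 0.01}

without_context = max(hypotheses, key=lambda h: h[1])[0]
with_context = rescore(hypotheses, prior_after_installments_question)
```

Here the audio-only choice is "for", while the context-aware choice is "four", because a number is a far more plausible answer to the question that was just asked.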

The second revelation and driving force behind Voca is intonation understanding, the component of communication that accounts for 70% of the information conveyed, verbally or non-verbally. “Even if a transcription is accurate, it’s a non-starter and not good enough,” stated Bekker. Why? Because in many utterances the meaning is carried by intonation rather than by the words alone. To solve intonation, Bekker built speech-to-intent into Voca: it classifies the user’s intent directly from the speech.

Combining context and intonation, Voca uses context awareness to transcribe the conversation accurately and extract parameters from it, and uses intonation to identify the user’s intent. Voca is based on deep neural networks: recurrent neural networks for the speech-to-intent and named entity recognition tasks, and convolutional neural networks for the text-to-speech engine.

With the technological tools ready to be deployed, Itamar and Bekker decided to focus on outbound call center scenarios, as opposed to a wide-ranging approach targeting all contact centers. But in outbound customer service scenarios, such as debt collection or cross-selling attempts, the voicebot needs to sound human, or else the person answering will immediately suspect a scam. So, lastly, Bekker developed a highly sophisticated human-sounding voice for Voca to use in outbound call scenarios, the company’s main focus.

Outbound call scenarios are not only easier to anticipate; they are also a portion of customer service that AI can actually solve, in a segment that remains largely unsolved to this day. Outbound scenarios, which begin with calling a customer, are solvable because you can anticipate beforehand what the conversation graph will look like, explained Bekker. These scenarios are useful for debt collection, lead generation, lead qualification, and upselling or cross-selling opportunities. “These are instances where there’s a well-defined KPI in the call, thus enabling an end-to-end optimization of the agent’s accent, gender and wording depending on the audience,” explained Bekker. Voca is currently available in two languages (English and Spanish), in the cloud or on-premise for data-secured usage. The company charges a one-time setup fee and uses a licensing model based on the volume of outbound calls the center conducts.

Voca at Finovate Asia 2018 – Best of Show
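The conversation graph Bekker mentions can be pictured as a small state machine: each node is a step in the call, and each edge is an intent the customer might express. The sketch below is a hypothetical debt-collection flow for illustration only; the state names, intent names, and the `run_call` helper are assumptions, not the startup's actual design.

```python
# Each state maps the customer intents we anticipate to the next state.
# States with no outgoing edges end the call.
CALL_GRAPH = {
    "greeting": {"confirm_identity": "state_purpose", "wrong_person": "end_call"},
    "state_purpose": {
        "agree_to_pay": "take_payment",
        "refuse": "offer_plan",
        "ask_later": "schedule_callback",
    },
    "offer_plan": {"agree_to_pay": "take_payment", "refuse": "end_call"},
    "take_payment": {},
    "schedule_callback": {},
    "end_call": {},
}

def run_call(graph, start, intents):
    """Follow the graph for a sequence of recognized intents.

    Stops at a terminal state or at the first intent the graph
    did not anticipate, and returns the list of states visited.
    """
    state = start
    path = [state]
    for intent in intents:
        next_state = graph[state].get(intent)
        if next_state is None:
            break  # terminal state or unanticipated intent
        state = next_state
        path.append(state)
    return path

path = run_call(CALL_GRAPH, "greeting", ["confirm_identity", "refuse", "agree_to_pay"])
```

Because an outbound call starts from a script the caller controls, a graph like this can be drawn up in advance, which is much harder for inbound calls where the customer sets the topic.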

Today, Voca has 10 employees and is hiring two new AI recruits. Its main focus is on financial institutions such as banks and insurance companies, for which every word its agent says is fully compliant. The company participated in the Citi accelerator program in Tel Aviv and recently won “Best of Show” at the Finovate Asia 2018 startup competition. If you get a call in the coming year from a customer contact center to upsell a product you bought, just remember you might be speaking with Voca’s human-sounding AI agent, if you can tell the difference…