The global speech and voice recognition market is estimated to reach $27 billion by 2026, driven in part by the rapid integration of voice control systems in the automotive segment, but current solutions leave much to be desired. Inside a car cabin, speech recognition systems are plagued by inaccuracy, largely caused by multiple passenger voices and transient background noise that directional microphones and noise cancellation fail to suppress. And the technology isn’t new: it has been commercially available since 2004, when Honda partnered with IBM to offer voice control of its navigation system.
However, the introduction of driver monitoring systems (DMS) in commercially available cars has given this complex problem a disruptive catalyst: a video feed. Israeli entrepreneurs Roy Baharav (CEO), Eyal Shapira (CTO) and Zohar Zisapel (Active Chairman) founded Hi Auto earlier this year to enable accurate in-car voice control by capitalizing on the audio and video feeds already integrated into car cabins. Through audio-visual analysis of a passenger’s speech, Hi Auto is developing AI algorithms that enhance the car’s speech recognition capability under background noise interference, performing at least better than prevailing market solutions.
The startup is developing deep neural networks that process audio along with video captured of the speaker’s facial and lip region. They’re designed to run on the edge in near real-time, a major challenge for a car’s infotainment system with limited processing resources. “The area around the lips is the input to our pipeline, and we’re focused on getting you the best enhancement and separation, a whole different expertise in AI algorithms development,” said Shapira. “With the rise of driver monitor systems incorporated into new vehicles, we want to use the features they’re already extracting to add value through enhancement.” Their datasets are privately sourced, which Shapira calls a competitive advantage.
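Hi Auto has not published its architecture, but audio-visual speech enhancement pipelines of this general kind are often described as: extract per-frame features from the noisy audio spectrogram and from a lip-region video embedding, fuse the two modalities, and predict a mask that attenuates non-speech energy in the spectrogram. The toy sketch below illustrates that idea only; all shapes, layer sizes, and the single-layer fusion are assumptions for illustration, not Hi Auto’s design.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_mask(audio_spec, lip_embed, w_a, w_v, w_out):
    """Toy audio-visual enhancement: fuse per-frame audio and lip-region
    features, predict a (0, 1) mask, and apply it to the noisy spectrogram.
    This is an illustrative sketch, not a production model."""
    # Project each modality to a shared hidden size, frame by frame.
    h = np.tanh(audio_spec @ w_a + lip_embed @ w_v)    # (frames, hidden)
    # Sigmoid head: one mask value per frequency bin per frame.
    mask = 1.0 / (1.0 + np.exp(-(h @ w_out)))          # (frames, freq_bins)
    return audio_spec * mask                           # enhanced spectrogram

# Assumed dimensions (hypothetical): 50 frames, 64 frequency bins,
# a 32-dim lip-region embedding, and a 16-unit hidden layer.
frames, freq_bins, lip_dim, hidden = 50, 64, 32, 16
audio_spec = np.abs(rng.normal(size=(frames, freq_bins)))  # noisy magnitudes
lip_embed = rng.normal(size=(frames, lip_dim))             # lip-region features
w_a = rng.normal(scale=0.1, size=(freq_bins, hidden))
w_v = rng.normal(scale=0.1, size=(lip_dim, hidden))
w_out = rng.normal(scale=0.1, size=(hidden, freq_bins))

enhanced = fuse_and_mask(audio_spec, lip_embed, w_a, w_v, w_out)
print(enhanced.shape)  # (50, 64)
```

In a real system the weights would be trained so the mask suppresses interfering voices and noise; here random weights simply show the data flow. Because the mask lies in (0, 1), the enhanced spectrogram never exceeds the noisy input.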
Their solution works for every speaker in the car, not just the one nearest the main microphone, extending voice control to backseat passengers as well.
“Hi Auto’s single-microphone approach, versus a microphone array, will save OEMs on BOM,” explained Baharav, referring to the bill of materials. Some auto OEMs are seeking solutions that avoid deploying multiple microphones or an array in the car, as these are more expensive and complex to deploy. Baharav also shared the thesis behind their foray into automotive: an optimal go-to-market that complements existing speech recognition systems.
Current leading DMS vendors include Faurecia, Tata Elxsi, and Valeo, some of which have existing efforts toward augmenting speech recognition via enhancement from extracted facial features. Their solutions can monitor passengers for alertness and fatigue, as well as attentiveness via eye-tracking algorithms.
Notwithstanding select DMS innovations, the startup’s thesis is predicated on the baseline growth of voice assistant technology and its adoption as the future mode of human-machine interaction. Worldwide smart speaker shipments are expected to reach 92 million units in 2019, and in-car usage is growing in popularity. In the US, an estimated 114 million people have used a voice assistant in their car, of whom 68 percent are monthly users: to play music, check directions, control the infotainment system, order food and services, or set the climate at home before they arrive.
While automotive is the startup’s current target market, speech enhancement technology can penetrate many verticals as a horizontal AI layer. Their algorithms can find value in numerous sectors, such as retail, robotics, and enterprise call-center communication systems. The market leaders in the automatic speech recognition (ASR)-enabled speaker device market, Amazon, Baidu, and Google, shipped almost 16 million units combined in the second quarter of 2019.
Hi Auto completed a $4.5 million seed funding round in October this year, led by Israeli car importer Delek Motors and Hi Auto’s active chairman and co-founder Zohar Zisapel. Other investors include Allied Holdings, the Goldbell Group, and Plug & Play. Their advisory board includes world-renowned experts in speech enhancement and speech recognition, Prof. Israel Cohen and Prof. Dan Povey respectively. They’re currently participating in Intel’s Ignite startup accelerator program.
Their technology will be demonstrated at CES 2020, held January 7-10 in Las Vegas, where they’ll show a prototype of the world’s first commercial solution for driver speech recognition. Over the next quarter, they expect to grow their team to 12 researchers and developers (see open positions).
While in-car voice assistants promise convenience and faster resolution of queries, especially given that drivers’ main focus is driving, auto brands’ current ASR solutions aren’t on par with consumers’ expectations. But research does suggest the inevitability of the interface as the future mode of command; by 2022, 95% of consumers are expected to use a conversational assistant in their car. The brands that win over consumers through their in-car experience will likely make a meaningful impression on consumer loyalty. Sharing this sentiment, Baharav explained how Hi Auto’s solution will “revolutionize the speech recognition experience for consumers and enable the introduction of more complex and sensitive capabilities by OEMs.”