The global speech and voice recognition market is estimated to reach $27 billion by 2026, driven in part by the rapid integration of voice control systems in the automotive segment, but current solutions aren't much to celebrate. Inside a car cabin, speech recognition systems are plagued by inaccuracy, largely caused by multiple passenger voices and transient background noise that directional microphones and noise cancellation fail to suppress. And the technology isn't new: it has been commercially available since 2004, when Honda partnered with IBM to offer voice control of its in-car navigation system.
However, the introduction of driver monitoring systems (DMS) in commercially available cars has given this complex problem a disruptive catalyst: a video feed. Israeli entrepreneurs Roy Baharav (CEO), Eyal Shapira (CTO) and Zohar Zisapel (Active Chairman) founded Hi Auto earlier this year to enable accurate in-car voice control by capitalizing on both the audio and video feeds already integrated inside car cabins. Through audio-visual analysis of a passenger's speech, Hi Auto is developing AI algorithms to enhance the car's speech recognition capability despite background noise interference, or at least to do so better than prevailing market solutions.
The startup is developing deep neural networks that process audio alongside video captured of the speaker's face and lip region. They are designed to run on the edge in near real-time, a major challenge for a car's infotainment system with limited processing resources. "The area around the lips is the input to our pipeline, and we're focused on getting you the best enhancement and separation, a whole different expertise in AI algorithms development," said Shapira. "With the rise of driver monitoring systems incorporated into new vehicles, we want to use the features they're already extracting to add value through enhancement." Their privately sourced datasets are unique and a competitive advantage, according to Shapira.
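To make the general idea concrete, here is a minimal sketch of mask-based audio-visual speech enhancement: per-frame audio features (a noisy spectrogram) are fused with lip-region embeddings, and the fused vector predicts a time-frequency mask that attenuates noise-dominated bins. This is purely illustrative; the shapes, the single linear layer, and the random weights are assumptions for the example and do not reflect Hi Auto's actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def av_enhance(noisy_spec, lip_feats, W, b):
    """Toy audio-visual enhancement step: fuse per-frame audio and
    lip features, predict a time-frequency mask in (0, 1), and
    apply it to the noisy magnitude spectrogram."""
    fused = np.concatenate([noisy_spec, lip_feats], axis=1)  # (T, F + D)
    mask = sigmoid(fused @ W + b)                            # (T, F)
    return mask * noisy_spec                                 # attenuated spectrogram

# Hypothetical dimensions: T frames, F frequency bins, D-dim lip embedding.
T, F, D = 50, 80, 32
rng = np.random.default_rng(0)
noisy = np.abs(rng.standard_normal((T, F)))   # stand-in noisy magnitudes
lips = rng.standard_normal((T, D))            # stand-in lip-region embeddings
W = rng.standard_normal((F + D, F)) * 0.01    # untrained weights, illustration only
b = np.zeros(F)

enhanced = av_enhance(noisy, lips, W, b)
print(enhanced.shape)  # (50, 80)
```

In a real system the mask predictor would be a trained deep network, the visual features would come from the DMS camera's face/lip tracking, and the enhanced spectrogram would be fed to the speech recognizer; the point here is only the fusion-then-mask structure.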
