This Startup’s New AutoML Optimizer Supercharges Deep Learning Models to Production

As the deep learning market matures, developers are moving their trained models into production, which brings a new set of challenges. Among these is the need to deploy models built in the lab on commercial hardware with its many computational limitations. These real-world restrictions have a direct impact on model accuracy.

Anticipating the uphill battle AI developers will face in continuously optimizing both software and hardware at the inference (production) stage, Israeli startup Deci has come out of stealth with AutoML-based deep learning acceleration software that it says can deliver a tenfold acceleration of any model's run-time performance, regardless of the hardware used.

With mounting demand to run deep learning models in the real world, the supply of inference compute has yet to catch up, setting back startups and corporations alike in their pursuit of deployment. Industry-leading deep neural network architectures, such as Google's BERT and OpenAI's GPT, keep growing in size (by number of layers and parameters) and in computational complexity (including FLOPs and latency), and as a result often fail to deliver their full representational power in real-life scenarios.

Promising existing techniques, such as Neural Architecture Search (NAS) and the compression methods of quantization and pruning, largely depend on access to compute resources at the scale of corporate giants, and often translate into lower accuracy in real-world hardware environments. Even as corporations and startups pursue inference compute solutions, including the likes of Nvidia and Intel (which acquired AI inference chip startup Habana Labs for $2 billion), the competition and structure of the industry are still taking shape amid the rise of hardware-independent vendors.

To close the growing gap between deep learning computational supply and demand, Deci has developed an acceleration platform: Automated Neural Architecture Construction (AutoNAC), a technology that automatically creates state-of-the-art deep learning models that run faster and more accurately on any hardware, from FPGAs and ASICs to GPUs and even CPUs.

The Deep Learning Inference Stack. Graphic: Deci.

“Our AutoNAC engine autonomously redesigns your models to squeeze the maximum utilization out of your existing inference hardware and leverage the hidden structures of the data,” said Yonatan Geifman, CEO of Deci, who with his PhD advisor professor, Ran El-Yaniv, and longtime friend, Jonathan Elial, founded the startup after studying the rising cloud expenses for state-of-the-art deep learning models running on hundreds of GPUs. “This [inference] challenge will affect everyone trying to implement AI in production at scale and we need to build faster and more efficient models compatible with any hardware.”

“Our algorithms are born from the AutoML algorithm family,” explained Geifman, and aim to improve models’ latency, throughput and cloud costs. “New architectures manifest every other day, but their underlying logic is man-made. Conversely, our proprietary algorithm engine employs AI to find better architectures, optimized for the hardware on which the model is running.”

The AutoNAC engine redesigns DL models to squeeze the most out of the hardware and leverage the hidden structures of the data. Graphic: Deci.

Their AutoNAC accelerator is a data- and hardware-dependent algorithmic solution that is complementary to other known compression techniques such as pruning and quantization. 
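To make the complementary compression techniques concrete, here is a minimal, illustrative sketch of magnitude pruning and 8-bit quantization in NumPy. This is generic textbook compression, not Deci's implementation; all function names are hypothetical.

```python
import numpy as np

def prune(weights, sparsity=0.5):
    # Magnitude pruning: zero out the smallest-magnitude weights.
    k = int(weights.size * sparsity)
    threshold = np.sort(np.abs(weights), axis=None)[k]
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize(weights, bits=8):
    # Uniform symmetric quantization to signed integers, plus a scale
    # factor for dequantizing back to floats.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(weights).max() / qmax
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
w_pruned = prune(w, sparsity=0.5)          # ~50% of entries become zero
q, scale = quantize(w_pruned)              # 8-bit weights + one scale factor
w_restored = q.astype(np.float32) * scale  # dequantized approximation
```

Pruning shrinks the number of effective operations, while quantization shrinks each operation's cost; because they act on different axes, an architecture-level optimizer like AutoNAC can, per the article, be stacked on top of both.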

The AutoNAC pipeline ingests the user’s trained deep neural network, a dataset, and the target inference hardware. It then redesigns the network to derive an optimized architecture with faster performance and substantially lower latency, all without compromising the model’s accuracy.
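The pipeline described above can be sketched as a hardware-aware architecture search: candidates are drawn from a search space, those exceeding a hardware latency budget are discarded, and the best-scoring feasible design wins. The sketch below uses random search with a FLOP-based cost model and a toy accuracy proxy; Deci's actual engine and objective are proprietary, and every name here is a hypothetical stand-in.

```python
import random

# Hypothetical search space: (depth, width) pairs for a dense network.
SEARCH_SPACE = [(d, w) for d in (2, 4, 8) for w in (128, 256, 512)]

def flops(depth, width, in_dim=784):
    # Rough FLOP count for a stack of dense layers (proxy for latency
    # on the target hardware).
    total, prev = 0, in_dim
    for _ in range(depth):
        total += 2 * prev * width
        prev = width
    return total

def accuracy_proxy(depth, width):
    # Toy stand-in for measured validation accuracy (illustrative only).
    return 1.0 - 1.0 / (depth * width) ** 0.5

def search(latency_budget_flops, trials=50, seed=0):
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        depth, width = rng.choice(SEARCH_SPACE)
        if flops(depth, width) > latency_budget_flops:
            continue  # violates the hardware budget: discard candidate
        score = accuracy_proxy(depth, width)
        if score > best_score:
            best, best_score = (depth, width), score
    return best

best = search(latency_budget_flops=2_000_000)
```

The key idea the article attributes to AutoNAC is that the constraint (the budget) comes from the actual deployment hardware and dataset, so the same trained model can yield different optimized architectures per target device.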

In addition to cloud instances (deployed on-premises or in the cloud), the AutoNAC accelerator serves edge devices to achieve real-time responsiveness and improve energy consumption. With over 750 million edge AI chips produced in 2020, embedded in smartphones, tablets, speakers, wearables and enterprise edge hardware, inference is certainly making the case for moving from core to edge computing environments.

“We’re orthogonal to hardware acceleration too,” explained Geifman. “For deep learning models, you can bring hardware that can multiply matrices, but regardless of its speed, the processing time is still significant. When your model is optimized in concert with Deci, you get boosted performance. We reduce the number and size of the matrices (operations) you need to multiply in order to yield a prediction.”
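Geifman's point about matrix size can be made concrete with back-of-the-envelope arithmetic: a matrix multiply of an (m, k) by a (k, n) matrix costs roughly 2*m*k*n floating-point operations, so shrinking one dimension shrinks the work proportionally, whatever the hardware's raw speed. The layer sizes below are hypothetical.

```python
def matmul_flops(m, k, n):
    # Multiplying an (m, k) matrix by a (k, n) matrix costs ~2*m*k*n FLOPs.
    return 2 * m * k * n

# Hypothetical example: one dense layer on a batch of 32 inputs.
original = matmul_flops(32, 1024, 1024)   # 32x1024 times 1024x1024
redesigned = matmul_flops(32, 1024, 256)  # narrower layer: 1024 -> 256 units
speedup = original / redesigned           # 4x fewer FLOPs
```

This is why an architecture-level redesign compounds with faster chips: the chip lowers the cost per operation, while the redesign lowers the number of operations.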

“Among our current clientele are companies building proprietary hardware products for the fields of autonomous vehicles, mobile-based edge applications, healthcare diagnostics, real-time sports analytics and smart retail stores,” shared Elial. As detailed in the company’s white paper, one client using Deci’s AutoNAC achieved a speed increase of 4.6x.

According to Elial, the platform is already in use by hand-picked customers, and the company plans to release a public beta in the coming months.

With the semiconductor market valued at over $425 billion in 2020, the AI infrastructure segment occupies only a fraction. But its potential hasn’t gone unnoticed, and it offers immense performance benefits for AI models compared to core processing. The winner of the inference race is uncertain, clouded by many offerings, all attempting to close the computational supply and demand gap.

As the number of new architectures grows, along with the groups of developers behind them, so does the demand for versatile inference technology, with mounting pressure for solutions. Until then, the advances of deep learning remain in the lab, where many projects, especially mission-critical ones, will remain just that: projects.
