As the deep learning market matures, developers are moving their trained models into production, ushering in a new set of challenges. Among the difficulties is the need to deploy models built in the lab on commercial hardware, with its many computational limitations. These real-world constraints have a direct impact on model accuracy.
Anticipating the uphill battle AI developers face in continuously optimizing both software and hardware at the inference (production) stage, Israeli startup Deci has come out of stealth with AutoML-based deep learning acceleration software that it says can deliver a tenfold speedup of any model's runtime, regardless of the hardware used.
As demand to run deep learning models in the real world mounts, the supply of inference compute has yet to catch up, setting back startups and corporations alike in their pursuit of deployment. Industry-leading deep neural network architectures, such as Google's BERT and OpenAI's GPT, keep growing both in size (number of layers and parameters) and in computational cost (FLOPs and latency), and as a result often fail to carry their representational power over to real-life scenarios.
Existing techniques are promising, such as neural architecture search and the compression methods of quantization and pruning, but much of their success depends on access to compute resources at the scale of corporate giants, and they often translate into lower accuracy in real-world hardware environments. Even as corporations and startups pursue inference compute solutions, among them Nvidia and Intel (which acquired AI inference chip startup Habana Labs for $2 billion), the competition and structure of the industry are still taking shape, with hardware-independent vendors on the rise.
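To make the compression methods mentioned above concrete, here is a minimal, illustrative sketch in plain Python of magnitude pruning and uniform symmetric 8-bit quantization applied to a flat list of weights. This is not Deci's method or any vendor's API; the function names and parameters are hypothetical, and production implementations operate on tensors with far more sophistication.

```python
def prune_by_magnitude(weights, sparsity=0.5):
    """Magnitude pruning: zero out the `sparsity` fraction of
    weights with the smallest absolute value."""
    k = int(len(weights) * sparsity)  # how many weights to drop
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])  # indices of the k smallest-magnitude weights
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize_int8(weights):
    """Uniform symmetric quantization: map floats to integers in
    [-127, 127] using a single scale factor of max|w| / 127."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [q * scale for q in quantized]
```

In this sketch, pruning trades a controllable fraction of parameters for sparsity (which hardware may or may not exploit), while quantization shrinks storage and bandwidth fourfold at the cost of rounding error bounded by half the scale factor per weight.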
