The field of deep learning has dramatically changed the way we approach data analysis and problem-solving. As a deep learning enthusiast, I’ve had the chance to speak with many companies, researchers, and entrepreneurs who are pushing the boundaries of what’s possible.
In these conversations, one theme recurs: almost everyone wants their model to run faster, in pretty much every scenario.
We identified two key types of speed optimizations: competitive optimization and SLA leap.
Competitive optimization involves reducing the time and cost of a model within the existing application – for a competitive edge over similar products or for more efficient operations.
On the other hand, an SLA leap involves creating a product that can serve new use cases and markets by reaching a higher level of SLA. This leads to not only better performance but also new growth opportunities for the company.
In other words – having a top-notch deep learning model does not guarantee successful deployment. Impressive accuracy alone is not sufficient for real-world applications. Optimizing a model helps bridge that gap by meeting the SLA those applications require.
We see a few key SLA levels that companies aim for, ranging from long-running batch processing to real-time inference with response times measured in milliseconds:
- Long-running batch processing – allows for a model to run over an extended period and be analyzed days later, such as in research and initial deployment.
- Periodic processing – requires models to run at specific intervals, such as every week or night.
- User-facing applications – must generate a response within a time frame that a user is willing to wait, typically measured in seconds.
- Web-based applications – require models to provide a response within a fraction of a second.
- Real-time applications – the model must generate a response with near-instantaneous latency, typically measured in milliseconds.
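The tiers above can be sketched as a simple latency-budget check. This is a minimal illustration: the tier names come from the list, but the specific cutoff values are my own rough assumptions, not fixed industry thresholds:

```python
# Illustrative SLA tiers; the cutoff values (in seconds) are assumptions.
SLA_TIERS = [
    ("real-time", 0.05),           # near-instantaneous, e.g. AR or robotics
    ("web-based", 0.5),            # a fraction of a second
    ("user-facing", 5.0),          # seconds a user is willing to wait
    ("periodic", 12 * 3600.0),     # nightly or weekly jobs
    ("long-running batch", float("inf")),  # results analyzed days later
]

def sla_tier(latency_seconds: float) -> str:
    """Return the tightest SLA tier a measured latency still satisfies."""
    for name, budget in SLA_TIERS:
        if latency_seconds <= budget:
            return name
    return "long-running batch"
```

For example, `sla_tier(2.0)` lands in the user-facing tier, while `sla_tier(0.02)` would qualify for real-time use – making concrete how much optimization separates one SLA level from the next.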
This journey from one SLA level to the next often requires significant changes to the model itself, including reducing its size, improving efficiency, and modifying its architecture.
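One of the size-reduction steps mentioned above can be sketched in a few lines: 8-bit quantization of a model's weights. This is a minimal, library-free illustration of symmetric linear quantization – the general idea, not any specific framework's implementation:

```python
def quantize_int8(weights):
    """Symmetric linear quantization of float weights to int8 range.

    Each 32-bit float becomes one signed byte plus a shared scale,
    cutting storage roughly 4x at the cost of some precision.
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Reconstruct approximate float weights from the int8 values."""
    return [q * scale for q in quantized]
```

A smaller model is often faster to load and run, but as the reconstruction step shows, each weight is only recovered to within half a quantization step – the accuracy/latency trade-off that this kind of optimization navigates.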
Object detection in video streams has been a prime example of the evolution of deep learning models. Initially, deep learning models for object detection were easiest to use in offline applications such as photo management. However, as technology advanced, models were adapted and optimized to run in user-facing environments.
For customer-facing applications, object detection models must provide near real-time results, enabling fast and efficient recognition of objects, for instance in videos. This paved the way for their use in security cameras, drones, and other applications where quick and accurate detection is crucial.
The demands for even lower latency then took object detection to the next level, requiring real-time inference capabilities for use in areas such as augmented reality, robotics, and even automotive applications. This evolution showcases the incredible potential of deep learning models, as well as the importance of optimizing for speed and efficiency to stay ahead of the curve.
Optimizing deep learning models not only opens up new possibilities but also has the potential to drive significant economic benefits, including market penetration and expanded Total Addressable Market (TAM).
I personally spend a lot of time and effort researching deep learning optimizations; if you are as excited about it as I am, reach out in a comment or a PM.
Originally published on Shai Tal’s LinkedIn profile.