cnvrg.io, the AI OS for machine learning, today adds the MLOps Dashboard to its set of capabilities for advanced resource management of ML workloads. The solution addresses a major gap in the industry, one that causes low ROI and under-utilization of compute for ML. The end-to-end data science platform will now offer greater visibility into resource allocation, capacity and utilization, which has the potential to increase GPU/CPU and memory utilization by 70%.
One of the leading factors obstructing ROI for machine learning is the under-utilization of GPUs, CPUs and memory. Companies invest millions of dollars in compute that has the potential to dramatically accelerate AI workloads and improve performance, but end up utilizing only 20% of these powerful resources. The gap between compute allocation and actual utilization is shocking, and can cost companies more than they realize. Machine learning and deep learning are compute-intensive and complex to manage, making this computational debt difficult to reduce. Many infrastructure teams lack visibility into GPU, CPU and memory utilization for ML jobs, and are rarely able to attribute utilization to a specific job. This lack of visibility can also disrupt productivity, because underutilized GPUs stay blocked and cannot be used for another job. cnvrg.io has introduced the MLOps Dashboard to help infrastructure teams visualize allocation and utilization across different jobs and clusters, by user and by container.
With the MLOps Dashboard, IT teams can visualize live metrics such as GPU, CPU and memory utilization to identify workload bottlenecks and avoid wasting expensive resources. Teams using the MLOps Dashboard can go from 20% to 70% utilization by re-allocating resources to more computationally heavy jobs, and by reducing compute for historically underutilized jobs. The MLOps Dashboard helps infrastructure teams to:
- Visualize GPU/CPU utilization across runs
- Compare real-time allocation vs. utilization at any timestamp
- Show active jobs with info about user, project, container, allocation, utilization
- Connect consumption metrics with external data analytics platforms such as Tableau, Data Studio, Excel and more
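To make the allocation vs. utilization comparison concrete, here is a minimal Python sketch of the underlying arithmetic. It is not the cnvrg.io API or the dashboard's implementation: the `JobSample` record, its field names, the sample values and the 20% threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobSample:
    """Hypothetical per-job record; field names are illustrative only."""
    job_id: str
    user: str
    allocated_gpus: float  # GPUs reserved for the job
    used_gpus: float       # average GPUs' worth of compute actually used


def utilization(job: JobSample) -> float:
    """Fraction of a job's allocated GPUs that is actually used."""
    if job.allocated_gpus <= 0:
        return 0.0
    return job.used_gpus / job.allocated_gpus


def cluster_utilization(jobs: list[JobSample]) -> float:
    """Total used GPUs divided by total allocated GPUs across all jobs."""
    total_allocated = sum(j.allocated_gpus for j in jobs)
    if total_allocated == 0:
        return 0.0
    return sum(j.used_gpus for j in jobs) / total_allocated


def underutilized(jobs: list[JobSample], threshold: float = 0.2) -> list[str]:
    """IDs of jobs whose utilization falls below the (assumed) threshold."""
    return [j.job_id for j in jobs if utilization(j) < threshold]


# Made-up sample data, not measurements from any real cluster.
jobs = [
    JobSample("train-a", "alice", allocated_gpus=8, used_gpus=1.2),
    JobSample("train-b", "bob", allocated_gpus=4, used_gpus=3.6),
    JobSample("notebook-c", "carol", allocated_gpus=2, used_gpus=0.1),
]

print(f"cluster utilization: {cluster_utilization(jobs):.0%}")
print("candidates for re-allocation:", underutilized(jobs))
```

In this made-up sample, the jobs flagged by `underutilized` hold GPUs they barely use, which is the kind of capacity an infrastructure team could re-assign to heavier workloads.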
“When it comes to resource management, there are few solutions that maximize visibility of GPU/CPU and memory consumption for infrastructure teams to improve utilization,” said Yochay Ettun, CEO and Co-founder of cnvrg.io. “Across AI organizations, we’ve witnessed the discrepancy of compute allocation vs. utilization vs. capacity, causing wasted resources and low ROI for machine learning projects.”
The MLOps Dashboard is now available to all cnvrg.io Premium and CORE users. To learn more about how your team can reach 80% utilization across ML workloads, you can schedule a demo with a specialist, or get started with the free community version, cnvrg.io CORE.
cnvrg.io is an AI OS, transforming the way enterprises manage, scale and accelerate AI and data science development from research to production. The code-first platform is built by data scientists, for data scientists, and offers unrivaled flexibility to run on premises or in the cloud.