Managing data center capacity for machine learning requires smart DCIM tools
At NVIDIA's recent annual conference, CEO Jensen Huang proclaimed during his keynote that Artificial Intelligence (AI) workloads will flood data centers everywhere as every application moves toward leveraging AI technology. The complexity of managing these workloads is expected to drive adoption of DCIM tools by infrastructure and operations (I&O) leaders.
Machine Learning is growing explosively, and although most Deep Learning technology today resides with large cloud hyperscalers like Google and Facebook, Huang believes it will spread beyond these giants to data centers of all sizes. “Machine Learning is one of the most important computer revolutions ever,” Huang said. “AI is just another kind of computing, and it’s going to hit many, many markets,” added Ian Buck, NVIDIA VP in charge of the company’s Accelerated Computing unit.
As with the current trend in most hosted infrastructures, data centers will implement a hybrid model of on-premises and outsourced cloud services to achieve cost and performance efficiencies. On-premises Machine Learning will require power-hungry servers packed with GPUs, which translates to higher densities than most of the world's data centers have been designed to support, in the neighborhood of 30kW per rack. Deploying smart DCIM tools inside the data center will become critical for operators to manage AI workloads, along with current workloads, to drive energy efficiency without sacrificing too much performance. Having continuous visibility into the three layers of the data center (facility, IT, and virtual) and managing each to the operator's key performance indicators will protect against cost overruns resulting from poorly managed power and compute.
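The kind of power monitoring described above can be sketched in a few lines. The snippet below is an illustrative example only, not the API of any specific DCIM product; the rack IDs, readings, and the 30kW design limit are hypothetical values chosen to match the density figure cited in this article.

```python
# Illustrative sketch: flag racks whose measured power draw approaches a
# per-rack design limit, such as the ~30 kW cited for GPU-dense ML deployments.
# All rack IDs and readings below are hypothetical.

DESIGN_LIMIT_KW = 30.0   # assumed per-rack design capacity
ALERT_THRESHOLD = 0.9    # alert when draw exceeds 90% of capacity

def flag_overloaded_racks(rack_power_kw, limit_kw=DESIGN_LIMIT_KW,
                          threshold=ALERT_THRESHOLD):
    """Return sorted rack IDs whose draw exceeds threshold * limit_kw."""
    return sorted(
        rack_id for rack_id, kw in rack_power_kw.items()
        if kw > threshold * limit_kw
    )

# Hypothetical telemetry: rack ID -> current power draw in kW
readings = {"R01": 12.4, "R02": 28.5, "R03": 31.2, "R04": 8.9}
print(flag_overloaded_racks(readings))  # → ['R02', 'R03']
```

In practice, a DCIM platform would feed this kind of check with live sensor telemetry and correlate it with the IT and virtual layers, but the underlying comparison of measured draw against design capacity is the same.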