John Edwards
Contributing writer

AI tackles data-center workload management

Feature
Jun 22, 2021
10 mins
Data Center, Data Center Management

AI is ready to automate essential data center management tasks. But are data center managers prepared to make the transition from human to machine management?

Data center corridor of servers with abstract overlay of digital connections.
Credit: Sdecoret / Getty Images

As data center workloads spiral upward, a growing number of enterprises are looking to artificial intelligence (AI), hoping that technology will enable them to reduce the management burden on IT teams while boosting efficiency and slashing expenses.

AI promises to automate the movement of workloads to the most efficient infrastructure in real time, both inside the data center and in a hybrid-cloud setting composed of on-prem, cloud, and edge environments. As AI transforms workload management, future data centers may look far different from today’s facilities. One possible scenario is a collection of small, interconnected edge data centers, all managed by a remote administrator.

Due to a variety of factors, including tighter competition, inflation, and pandemic-necessitated budget cuts, many organizations are seeking ways to reduce their data center operating costs, observes Jeff Kavanaugh, head of the Infosys Knowledge Institute, an organization focused on business and technology trends analysis. “AI and automation have proven to be powerful tools in workload management, as it frees employees from time-consuming and mundane tasks and allows them to focus on work that actually requires a human,” he says.

Most data center managers already use various types of conventional, non-AI tools to assist with and optimize workload management. Yet these tools tend to be reactive rather than proactive, says Sean Kenney, director, advisory, at professional services firm KPMG. “They react to the problems in the data center, but they don’t collect data to determine any foresight to reduce the problem behavior,” he notes.

Sanket Shah, a clinical assistant professor of biomedical and health information sciences at the University of Illinois, Chicago, believes that AI is now poised to help data center managers who find themselves with no reliable way to predict or plan for future needs. “With AI, capacity and horsepower can be allocated in a more efficient manner, allowing organizations to scale and become flexible,” he explains. “Automating certain processes and shifting power where necessary will ultimately lower costs for those [managers] that have rapidly evolving data needs.”

The idea of using AI technology for data center management is hardly new. Back in 2014, for instance, Google disclosed that it was using technology acquired through its purchase of UK-based AI specialist DeepMind to enhance data center facilities equipment management at several of its sites. Today, the AI workload management field has expanded considerably to include a number of startups, such as DLabs, Digitate, Redwood Software, and Tidal Software. Larger players, such as Cisco, IBM, and VMware, have also started entering the market.

As with most things AI, workload management technology is advancing rapidly. “There are a ton of choices and a ton of limitations, but there are usually ways to mitigate those limitations,” notes Bill Howe, an associate professor at The Information School of the University of Washington. “I don’t see the problem of choosing the right methods and engineering solutions … to be particularly more or less challenging in workload management than any other complex AI application,” he observes.

Fulfilling a need

A top priority for most data center managers is optimizing operations to meet peak demand. Yet no matter how carefully they plan and prepare, demand peaks and valleys often remain beyond their control. “Where AI can bring unique improvements is that it can understand workload patterns and match those demands with data center capacity,” says Goutham Belliappa, vice president of AI engineering at business advisory and consulting firm Capgemini North America.
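To make the idea of matching workload patterns to capacity concrete, here is a minimal Python sketch. It is an illustration only, not a method described by anyone quoted here: it assumes demand is recorded per time slot for several past days, averages each slot to get a naive forecast, and flags the slots where that forecast would eat into a safety margin. The function names, the `headroom` parameter, and the sample figures are all hypothetical.

```python
def forecast_by_slot(history):
    """Average each time slot across past days (a naive seasonal forecast).

    history: list of days, each a list of demand values, one per time slot.
    """
    slots = len(history[0])
    return [sum(day[s] for day in history) / len(history) for s in range(slots)]


def slots_over_capacity(forecast, capacity, headroom=0.2):
    """Return the slots whose forecast demand exceeds capacity minus headroom."""
    limit = capacity * (1 - headroom)
    return [s for s, demand in enumerate(forecast) if demand > limit]


# Hypothetical demand (in arbitrary units) for two days of four slots each.
history = [
    [40, 60, 90, 50],
    [44, 64, 94, 54],
]
forecast = forecast_by_slot(history)
print(forecast)
print(slots_over_capacity(forecast, capacity=100))
```

A production system would likely use a far richer model, but the shape of the problem is the same: predict demand, compare it with what the facility can deliver, and act before the peak arrives.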

AI management promises to free data center teams from attending to an array of mundane, repetitive tasks, including server management; security settings; compute, memory, and storage optimization; load balancing; and power and cooling distribution. “All of these workloads can be automated or enhanced by AI,” says Lian Jye Su, principal analyst at tech market advisory firm ABI Research.

AI can help analyze the data collected from individual machines and spot anomalies in the parameters that are being monitored, says Ramprakash Ramamoorthy, product director for AI and ML at IT management software developer ManageEngine. “AI can also help predict breakdowns and outages much earlier, and this can help the data center management team to mitigate downtime and to keep the clusters up and running in good health,” he adds. “AI can also enable better temperature and voltage management, thereby directly cutting down on operational costs and helping reduce carbon footprint.”
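As a rough illustration of spotting anomalies in a monitored parameter, the Python sketch below flags readings that sit far from the average of the readings just before them. This is a simple rolling z-score, chosen here for brevity; it is not a description of how ManageEngine or any other vendor does it, and the window size, threshold, and temperature values are assumptions.

```python
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return indexes of readings far outside the preceding window's range."""
    anomalies = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mu = mean(recent)
        sigma = stdev(recent)
        if sigma == 0:
            # No variation in the window, so a z-score is undefined; skip.
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical server inlet temperatures in degrees Celsius.
inlet_temps_c = [22.1, 22.3, 22.0, 22.2, 22.1, 22.4, 22.2, 29.8, 22.3, 22.1]
print(find_anomalies(inlet_temps_c))  # prints [7]
```

The spike to 29.8 is flagged, while the ordinary readings that follow it are not.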

While various AI approaches can be used, a workload management tool should always ensure that model predictions are fully explainable, Ramamoorthy says. “More often than in other domains, a decision taken by an AI system in data center workload management will be acted upon by a team or teams of people working together,” he explains. Therefore, AI model decisions should be interpretable, allowing the IT team to better understand the intent of the model’s decision and to act accordingly. “AI models can, at best, be 80 to 85 percent accurate, so this would also help the human teams correlate sensible decisions by rightly interpreting the AI model’s decision,” he notes. It would also be useful for effective workload management if the AI model could give a confidence score to the decision it presents.
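One way to picture a decision that carries both an explanation and a confidence score is sketched below in Python. The `Recommendation` class, the `route` function, and the 0.9 cutoff are hypothetical, invented for this illustration; the point is only that low-confidence decisions go to people, along with the reasons the model gave.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recommendation:
    action: str                 # what the model proposes
    confidence: float           # model's confidence, from 0.0 to 1.0
    reasons: List[str] = field(default_factory=list)  # human-readable rationale


def route(rec, auto_threshold=0.9):
    """Apply high-confidence actions automatically; send the rest to a person."""
    if not 0.0 <= rec.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return "auto-apply" if rec.confidence >= auto_threshold else "human-review"


rec = Recommendation(
    action="migrate batch jobs from rack 12 to rack 7",
    confidence=0.82,
    reasons=["rack 12 inlet temperature trending up", "rack 7 at 40% utilization"],
)
print(route(rec))  # prints human-review
```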

As AI and ML tools become more widespread, organizations are learning that the best outcomes are achieved when human intelligence collaborates, not competes, with the technologies, says Richard Boyd, co-founder and CEO of artificial intelligence and machine learning developer Tanjo. “Machines simply cannot replace humans in many respects, but there are certainly areas where machines are much better than humans,” he says. “Popular opinion will shift once AI and ML become prevalent and workers adapt to this new partnership.”

Data centers can leverage AI/ML to improve performance as well as to optimize configuration and deployments, says Brons Larson, AI strategy lead at Dell Technologies. “AI/ML enables dynamic orchestration of resources versus workloads to optimize resource utilization to better manage costs,” he states. All AI solutions, regardless of application or vendor, require expertise to properly configure and optimize value, Larson adds. “This starts with properly capturing and evaluating data for training and testing and managing deployed models against drift and bias.”

Additionally, rule-based AI can help automate resource optimization and compliance through both smart policy control and predefined configurations. “Using data gathered from daily operation, machine learning-based AI can further augment other aspects of data center operation that previously required in-depth domain expertise,” Su notes. “For example, data center security can be strengthened through self-learning threat detection and monitoring algorithms,” he says. “Load balancing, power, and cooling distribution functions can be optimized by channeling the required resources in the right direction.”

AI can also streamline data management. “Enterprises are increasingly finding themselves surrounded by immensely high volumes of data pertaining to critical stakeholders,” Kavanaugh says. “Using AI, organizations can ensure that these large quantities of data are efficiently and accurately managed.” With AI’s assistance, teams can perform tasks, such as data quality analysis or extracting data to create predictions, quicker and more accurately than ever before. “This is crucial for organizations, since they need the most accurate data to make informed decisions,” Kavanaugh observes.

AI bundles

What’s now emerging, as AI matures, is a software-driven method for tying disparate elements together with minimal human intervention. For example, in a typical database system, an enormous amount of configuration is needed to make operations run efficiently, such as indexing tables, partitioning data across servers, allocating memory for certain kinds of queries, and tuning the optimizer to “fit” your computing platform and expected workload, Howe notes. “AI can help by learning rules and procedures from enormous amounts of historical data [concerning] which schedules were effective for which tasks, rather than for us trying to figure everything out,” he explains.

With AI in place, human IT leaders and teams are free to focus on business issues rather than worrying about infrastructure minutiae. “From an AI perspective, most of the models we use are self-learning ensemble models, which use a combination of various techniques and are continuously optimized as they learn from the workload patterns that they manage,” Belliappa says.

Planning and deployment

Before AI can begin working its management magic, IT and business leaders will need to get comfortable with the idea of handing key administrative responsibilities over to a piece of software. “Depending on the scale and internal knowledge repository, it can be quite difficult,” Shah admits.

Ultimately, how well an organization will handle the transition from human to AI workload management depends on its technological maturity, its scale of operations, and the data center’s dynamism. “A siloed business that lacks modern infrastructure to effectively utilize its data will struggle,” Kavanaugh says. On the other hand, a rapidly growing number of AI vendors, offering tools targeted at specific types of enterprises, increases the likelihood that organizations of almost any type and size will be able to make a smooth transition. “Ease of configuration and deployment will continue to improve as both the company and its solutions mature,” he predicts.

If AI has an Achilles’ heel, it’s the technology’s reaction to even relatively subtle changes in data center systems and practices. “Most AI techniques are about finding stable patterns, assuming a fixed environment,” Howe explains. “If you change the environment in a way your model can’t see, it will happily tell you the wrong answer.” Careful planning before deploying changes can help mitigate this concern.
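A simple guard against this failure mode is to check whether incoming data still resembles the data the model was trained on. The Python sketch below is one basic way to do that, assuming a single numeric metric: it measures how far the recent average has moved from the training baseline, in units of the baseline's standard deviation. The function name, the `max_shift` cutoff, and the sample values are hypothetical, and real drift monitoring typically looks at many metrics and full distributions.

```python
from statistics import mean, stdev


def has_drifted(baseline, recent, max_shift=2.0):
    """Return True if the recent mean is far from the training baseline mean."""
    base_mu = mean(baseline)
    base_sigma = stdev(baseline)
    if base_sigma == 0:
        # A constant baseline: any change in the mean counts as drift.
        return mean(recent) != base_mu
    return abs(mean(recent) - base_mu) / base_sigma > max_shift


# Hypothetical requests per minute seen during training and after a change.
training_rpm = [100, 110, 95, 105, 90, 100]
print(has_drifted(training_rpm, [98, 104, 101]))    # prints False
print(has_drifted(training_rpm, [150, 160, 155]))   # prints True
```

When a check like this fires, the safer response is to retrain or to fall back to human judgment rather than trust the model's output.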

Coming soon

While AI-powered data center workload management is already routinely used by many large enterprises, particularly hyperscalers such as Google, Amazon, and Microsoft, the technology is only now beginning to trickle down to smaller data center operators. It won’t be long, Belliappa observes, before data center managers face a stark choice: continue to rely on traditional data center management technologies and practices or “significantly invest in AI-driven reinvention to stay viable.”

Over the long term, as the technology improves, costs drop, and adopter confidence grows, AI-driven management is expected to become mainstream. “In the next four to six years, you’ll see AI data center workload management technology as a standard option,” Shah predicts.

“I think this trend is moving fast,” Howe states. “There has long been a lot of automation in data centers, and these [AI] techniques provide a better way to make use of what providers have a lot of—data.” He expects that automated workload management using AI learning methods will “be commonplace soon.”

There’s a growing expectation among industry observers that AI will begin dominating data center management sometime within the next three or four years, although pandemic-driven acceleration may help nudge that timeline forward, Kavanaugh says. “Soon, data centers will be able to automate almost all operations, from cybersecurity to maintenance to monitoring,” he predicts. “However, our workload, and management of it, will continue to evolve as the amount of data increases exponentially and, also, as we find new uses for AI in the enterprise.”