The battle to combat data sprawl: what CIOs need to do now

Increasingly, the landscape of data in today's enterprise is a wide-ranging, disorganized clutter of information, with vast data sets spread across traditional data centers, multiple clouds and various vendors. The question is: What can CIOs and IT organizations do to combat the unprecedented sprawl?

Today's enterprise is a hybrid landscape of data, with vast volumes and types found on-premises, in multiple public and private clouds, in SaaS applications, and at the edge. Organizations want to scale the value of their data with AI and to access it securely and compliantly. Those efforts have become a battle against data sprawl, aimed at transforming data into useful insights that boost decision-making and help surpass the competition.

Much of this explosive shift can be traced to edge computing and IoT: There were an estimated 11.7 billion IoT device connections at the end of 2020, and Gartner predicts that by 2025, 75% of enterprise-generated data will be created and processed at the edge, outside a traditional centralized data center or cloud. As IoT devices become more intelligent and ubiquitous, they add far more complexity to the data landscape, and organizations scramble to integrate the data IoT produces into operational systems or analytics.

The rise of remote work in 2020, due to the Covid-19 pandemic, has also been a significant factor, with big moves toward collaboration apps, platforms and employee-owned (BYOD) devices. All leave trails of documents, interactions and other data assets to deal with, with no specific guidance around document sharing and storage procedures. For example, a recent survey found that 71% of office workers globally, including 68% in the US, admitted to sharing sensitive and business-critical company data using instant messaging and business collaboration tools.

"The rise of IoT and the trends towards edge computing and remote work contribute to an overall shift away from centralization and toward decentralization," explains Chris Bergh, CEO of DataKitchen, a DataOps consultancy and platform provider that manages analytics creation and operations. For IT organizations, this massive heap of complexity means a loss of control over data assets. "Typical enterprises are creating data at a much faster rate than the ability of their data teams to manage and govern it," he says.

The trend towards data expansion, of course, was already well underway before the Covid-19 era began. But experts say the pandemic's arrival has accelerated its growth rate, scope, and impact. More than ever, organizations "need to get ahead of the fast-rising tide by implementing more robust methods of proactively managing and deriving insight from the data," says Andy Neill, chief enterprise architect and senior director of data & analytics at Info-Tech Research Group.

Vulnerabilities and regulatory challenges due to data decentralization

The problem is that the vast majority of this exploding data is ungoverned, leaving organizations vulnerable to breaches and leaks, says Sandhya Balakrishnan, US region analytics lead at Brillio. Data stored and processed at the edge is often left out of the mainstream data strategy, which aims to consolidate and secure data in the cloud, she explains. "That means there isn't a single source of truth for business operations," she says, which adds layers of suspicion about the quality and sanctity of enterprise data.

According to Seth Dobrin, global chief AI officer at IBM, data sprawl also makes complying with global regulations a formidable challenge. "This is increasingly hard to do, because if you have a global data fabric across multiple countries, and data is shared between countries, the regulations from each country need to be complied with," he says.

Ultimately, organizations need a single view of all this data, he adds. While data is decentralized, the users and business processes that work with the data are not—so they need to work with data in the context of a central use case. "There's a bit of a paradox," he explains. "Enterprises will have decentralized data sprawl on the one hand, but they also need a way to understand, govern, and use the data in ways that require the data to be unified."

For CIOs, this means developing a data management strategy that is future-proof against edge expansion, one that allows the business to scale up quickly while maintaining security, cost efficiency, reliability and performance, says Subhankar Pal, AVP of technology and innovation at Capgemini North America's engineering and R&D business unit. "Data management at the edge creates a whole new set of security challenges for CIOs who are used to dealing with just the data center," he says. "With an edge setup, data is processed closer to the source, away from the centralized data center that is more physically secure."

As CIOs and IT organizations battle to combat data sprawl, these are five important steps to take, say experts:

  1. Start with a comprehensive digital and data strategy. This is an important starting point to illustrate the value to the business of controlling data sprawl – as well as the risks of not prioritizing investment. "Once the strategy is in place, organizations are then well positioned to execute on it with methodically planned data practice and platform implementations," says Neill.
  2. Focus on data architecture that supports automation. A strong data architecture strategy should address how the business needs to operate; map the data architecture needed to support the business; and deploy the right automation technology that can support various patterns, says Dobrin. "There isn't one overarching platform that will do it all," he warns: Some technologies are designed to solve "vertical" problems like edge processing or single-cloud data storage. Other technologies are designed to solve "horizontal" problems, such as a data catalog that documents data across the data fabric and then makes the data available to business users.
  3. Involve the rest of the C-suite. The CIO is only one part of a large modern data management and data governance landscape that is owned and operated by the business, says Neill. "The CIO is only accountable for data custodianship, or looking after the systems, solutions and infrastructure that support the data," he explains. It is up to the chief data officer, chief analytics officer, chief operating officer, or other non-IT members of the C-suite to own the data and execute on initiatives to take advantage of data assets. "Knowing how these accountabilities are assigned, and how they relate to each other, can help IT and the organization maintain more consistent control over their data – and more importantly derive more value from it to drive the business forward," he says.
  4. Invest in security to keep up with sprawl. Security at the edge is a growing concern with the increasing number of data users, access points, and software and hardware, says Pal. "Organizations need to invest in the most secure, edge-based security solutions, even if they come at a higher cost, as the cost of a data breach will be prohibitive," he explains. "They can count on encryption and key management to be core to data security on the edge device, and in some cases multifactor authentication will play a role."
  5. Ensure governance that promotes data at scale. As the complexity of data flows rises, organizations risk creating bureaucracy around checklists and signoffs as they attempt to enforce governance. "That is a tax upon productivity," says Bergh. Instead of focusing on how to limit users, governance should be concerned with promoting the safe and controlled use of data at scale, which is more about active enablement than rule enforcement. "This automation, known as DataGovOps, designs data quality management and protection workflows that empowers, rather than limits, data usage," he says.
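
To make the idea of automated governance more concrete, here is a minimal Python sketch of how a data catalog might flag ungoverned assets automatically instead of relying on manual checklists. It is purely illustrative and is not drawn from any of the experts or vendors quoted above: the `DataAsset` fields, the two rules and the sample catalog entries are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataAsset:
    """One entry in a hypothetical data catalog (all fields are assumptions)."""
    name: str
    location: str            # e.g. "edge", "cloud", "on-prem", "saas"
    owner: Optional[str]     # accountable business owner, if one is assigned
    contains_sensitive: bool
    encrypted: bool


def governance_issues(asset: DataAsset) -> List[str]:
    """Return the illustrative policy violations found for a single asset."""
    issues = []
    if asset.owner is None:
        issues.append("no accountable owner")
    if asset.contains_sensitive and not asset.encrypted:
        issues.append("sensitive data is not encrypted")
    return issues


def audit(catalog: List[DataAsset]) -> Dict[str, List[str]]:
    """Map each non-compliant asset name to its list of issues."""
    report = {}
    for asset in catalog:
        issues = governance_issues(asset)
        if issues:  # compliant assets are left out of the report
            report[asset.name] = issues
    return report


# Hypothetical catalog spanning edge, SaaS and cloud locations.
example_catalog = [
    DataAsset("factory_sensor_feed", "edge", None, False, False),
    DataAsset("customer_chat_exports", "saas", "COO office", True, False),
    DataAsset("sales_warehouse", "cloud", "CDO office", True, True),
]

report = audit(example_catalog)
for name, issues in report.items():
    print(f"{name}: {', '.join(issues)}")
```

In this sketch, the first two assets are flagged and the third passes. A check like this could run on a schedule, so that governance surfaces problems without blocking users from the data.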

Opportunities for CIOs if they overcome data sprawl

A mid-2020 IDC study found that 80% of surveyed IT leaders identified data sprawl as one of the most critical problems their organizations must address today.

Enterprises that don't take the appropriate actions regarding data sprawl face real financial consequences, says Dobrin. Not only do they risk being hit with data compliance fines, but they also have to contend with the higher costs of storing and processing data inefficiently.

In addition, if enterprises don't manage data sprawl effectively, it will result in unreliable and overly slow business processes, he adds: "The outcome can be everything from bad business decisions to customer attrition, to higher costs of customer service."

The bottom line is that data is a highly valuable currency that provides more opportunities to organizations seeking a competitive edge. But that only works if there is a clear strategy to standardize storage, security and usage, says Balakrishnan.

"For CIOs, this means building capabilities to understand, discover and govern these disparate data assets holistically—they can no longer be relegated as a secondary investment," she says. "If organizations innovate on data management and strengthen data governance, the opportunities data can unlock will multiply, including capabilities to search data, democratize data with marketplaces and scout for use cases that leverage the data responsibly."