Data management investments often stumble, survey finds

The bulk of investments made in data management platforms thus far has not been money well spent, according to a Data Value Scorecard published today by data lake platform Dremio.

The scorecard finds that only 22% of the data leaders surveyed say they have fully realized a return on investment (ROI) in data management over the past two years.

Potentially more troubling, over half of respondents (56%) admitted that when it comes to data management they have no real way of consistently measuring ROI. The scorecard is based on a survey of 500 data and analytics leaders at enterprise IT organizations in the U.S., U.K., Germany, Denmark, Sweden, Norway, Australia, Hong Kong, and Singapore.

Data access

The survey, conducted in collaboration with Wakefield Research, notes that more than three-quarters of respondents (76%) are currently locked into at least one closed system. Those proprietary platforms also make it difficult for analysts to access all the potentially relevant data they need in a timely manner, Dremio CEO Billy Bosworth noted.

Eighty-four percent of data leaders surveyed said it’s normal for data analysts at their company to work with a partial dataset. Only 16% said they expect the data management platform they employ to make fresh data available in a matter of hours or minutes. More than half (51%) said it takes their organization weeks to update data stored in their current platform. This issue is especially problematic because most digital business processes need to occur in near real time, Bosworth added.

A total of 79% of respondents also noted they have concerns about the level of scale that can be achieved using their current platforms.

Data management issues

Finally, the survey makes it clear that organizations are struggling with data management. Respondents said they make an average of 12 copies of their data to ensure it is available to all users, and 60% reported their company keeps more than 10 copies of such data. A full 82% said their end users have worked from inconsistent versions of the same dataset at the same time, a consequence of the cumbersome extract, transform, and load (ETL) processes required to move data into a data management platform.

Overall, the scorecard suggests only about 20% of organizations are successfully managing their data: just 28% of respondents said it is “very easy” for end users to access data and develop insights, only 20% said timelines for ETL projects are “rarely or never” underestimated, and the same percentage said their company places “little to no” restrictions on data access for governance reasons.

The fact that many organizations manage data poorly is one of the dirty little secrets enterprise IT leaders don’t like to acknowledge. Most data is created within the context of an application used in a line of business. The data created by each of those applications is often conflicting and inconsistent. That issue is now coming to a head because digital business transformation initiatives that rely on analytics and artificial intelligence (AI) need access to reliable data to accurately automate a process.

Cleaning up that mess creates an opportunity for centralized IT teams to become more relevant. No single line-of-business unit can aggregate all the data required to drive a digital process on its own, Bosworth noted. “Most organizations have come to that realization,” he said.

Data storage

Dremio is making a case for replacing data warehouses running on-premises or in the cloud with a data lake that leverages inexpensive cloud storage to make petabytes of data available via SQL queries. Bosworth argues that as more data is stored in the cloud, IT organizations need to manage data independently of both the applications used to create it and the infrastructure used to store it.

Achieving that goal becomes easier when data is stored, for example, in an open cloud storage service that enables IT organizations to take advantage of a centrally managed data lake platform to identify and manage data in a more consistent manner, Bosworth said.

Employing data lakes as an alternative to data warehouses is not a new idea. Many organizations have attempted to build data lakes based on open-source Hadoop platforms. But those efforts have often resulted in the creation of data swamps, simply because organizations lacked the tools and processes to effectively manage terabytes of data. As a result, many organizations today are often reluctant to launch another data lake initiative.

In terms of overall data management maturity, no two enterprise IT organizations are alike. However, it's becoming increasingly apparent that any organization's ability to compete in a world dependent on digital processes will come down to how well it manages the data that drives those processes.

This article was originally published at VentureBeat and is reproduced with permission.