In 2017, it came to light that oil was no longer the world’s most valuable commodity.
Data has now surpassed oil in value, and the tech giants of the world have become the new elite: younger, savvier, and more powerful than ever before.
With growing attention devoted to AI, machine learning, and IoT, what we’ve come to know as big data has become an even broader version of itself. In recent years, big data was seen as an unstoppable force of nature that would either overwhelm enterprises or propel them to new heights.
This next generation of big data, which we’ll call expansive data, pulses through systems in real time, powers processes unseen by human eyes, and adapts and learns as it goes along. It is going to reshape enterprises in ways not yet anticipated.
This requires attention to new types of tools, platforms, and approaches to deliver value to today’s data-hungry businesses. Expansive data will represent ever-growing volumes of information, potentially increasing within enterprises at a rate of up to 36% a year, according to Dresner Advisory Services.
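To put that growth rate in perspective, here is a minimal Python sketch of the compound-growth arithmetic. It treats the 36% figure as a constant upper bound. The 100TB starting volume and the function names are illustrative assumptions, not figures from the Dresner research.

```python
import math


def projected_volume(initial_tb: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth at `annual_growth` (0.36 = 36%)."""
    return initial_tb * (1 + annual_growth) ** years


def years_to_double(annual_growth: float) -> float:
    """Years needed for the volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Hypothetical starting point: 100 TB growing at the 36% upper bound.
    for year in range(6):
        print(year, round(projected_volume(100.0, 0.36, year), 1))
    print("doubling time (years):", round(years_to_double(0.36), 2))
```

At the upper bound, data volumes would double roughly every two and a quarter years, which is why storage and processing capacity planned for today's volumes can fall short quickly.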
Platforms supporting this growth include Amazon Web Services S3, Spark SQL, Hive, and Hadoop. Additional tools popular in enterprises are Apache Spark and TensorFlow. Expansive data places even higher demands on enterprise infrastructures, processes, and the managers and administrators responsible for making it all work.
That’s because organizations are leaning more heavily than ever before on their data assets and analytics capabilities, and initiatives such as AI and machine learning, to help them compete. Edge computing is also a defining factor in expansive data.
There is likely to be greater activity at the edges: expansive data means more processing may be distributed across IoT networks. Data can be ingested, processed, and even stored within edge devices and systems and, if it is deemed critical on an enterprise scale, moved to centralized data centers or clouds.
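The pattern described above (ingest and store at the edge, forward only what matters) can be sketched in a few lines of Python. This is an illustrative sketch only. The `EdgeNode` class, the numeric threshold used to decide what counts as critical, and the in-memory `outbox` standing in for a data center or cloud upload are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeNode:
    """Toy edge device: keeps every reading locally, queues critical ones."""

    critical_threshold: float  # hypothetical rule for "enterprise-critical"
    local_store: list = field(default_factory=list)
    outbox: list = field(default_factory=list)  # stand-in for a central upload

    def ingest(self, reading: float) -> None:
        # Every reading is ingested and stored on the device itself.
        self.local_store.append(reading)
        # Only readings judged critical are queued for the central system.
        if reading >= self.critical_threshold:
            self.outbox.append(reading)

    def summary(self) -> dict:
        # Local processing: a compact summary instead of the raw stream.
        if not self.local_store:
            return {"count": 0, "max": None}
        return {"count": len(self.local_store), "max": max(self.local_store)}


if __name__ == "__main__":
    node = EdgeNode(critical_threshold=90.0)
    for value in [20.0, 95.5, 40.0, 91.0]:
        node.ingest(value)
    print(node.summary())
    print(node.outbox)
```

In a real deployment the criticality rule would be a business decision rather than a single threshold, but the division of labor is the same: the edge absorbs the full stream and the central systems see only a fraction of it.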
Edge computing continues to extend its capabilities. It encompasses a broad assortment of devices and systems that may require real-time interactions and responsiveness, including kiosks, autonomous cars and trucks, and sensors embedded across IoT. With expansive data surging across all points of the enterprise, infrastructures could be quickly overwhelmed with ingestion, processing, and storage demands.
Expansive data could also be of little value without proper preparation. Fortunately, none of this is happening in a vacuum, and other developments may be helping organizations manage the challenge. Thanks to the ubiquity of cloud-based services, from infrastructure to platform to applications, the power and capacity to support even bigger data environments are readily available. A new generation of database platforms and tools, led and enabled by machine-learning initiatives, is supporting the continuous, relentless data growth.
Hadoop is a big data framework that made huge-scale data analytics a reality for every company that could benefit from processing data. However, the software is beginning to show its age. Ten years ago, Hadoop was seen as the single cure-all for big data challenges; today’s expansive data calls for a variety of tools, platforms, and frameworks to help enterprises better manage their data. Nonetheless, the Hadoop Distributed File System can either support or be a part of data lake architectures, opening up a new mission for these environments.
According to a 2018 survey conducted by Unisphere Research, a division of Information Today, Inc., 44% of enterprises had Hadoop in production, which represents a downward shift from 2016, in which 55% reported using the framework (“2018 Next-Generation Data Deployment Strategies Report”). Also, the survey found general satisfaction levels with Hadoop are mixed:
Only 14% consider themselves to be “extremely satisfied” with Hadoop, while 64% are either dissatisfied or lukewarm toward the framework. While Hadoop provided one-of-a-kind functionality in its early days — such as parallel processing and management of a variety of data types — other technologies and solutions also now share these capabilities without the skill levels that Hadoop demands.
Predictably, the growth of expansive data is likely to track closely to that of IoT itself. Accordingly, next-generation data technology initiatives represent new approaches to data management. The Unisphere Research survey found notable growth in the adoption of data lakes, which are places to store diverse datasets without having to build a data model first. Their adoption continues to rise as data management personnel seek ways to quickly capture and store data from a myriad of sources and in various formats.
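The idea of storing data first and modeling it later is often called schema-on-read. The Python sketch below illustrates it with nothing but the standard library. The directory layout, the `.raw` file naming, and the function names are assumptions made for the example and do not reflect any particular data lake product.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, source: str, payload: str) -> Path:
    """Write a payload exactly as received; no schema is checked on write."""
    target = lake_dir / source
    target.mkdir(parents=True, exist_ok=True)
    # Sequential file names keep arrival order (fine for a single-writer demo).
    path = target / f"{len(list(target.iterdir())):06d}.raw"
    path.write_text(payload, encoding="utf-8")
    return path


def read_json_records(lake_dir: Path, source: str, fields: list[str]) -> list[dict]:
    """Apply a schema at read time: keep requested fields, None if missing."""
    rows = []
    for path in sorted((lake_dir / source).glob("*.raw")):
        record = json.loads(path.read_text(encoding="utf-8"))
        rows.append({name: record.get(name) for name in fields})
    return rows


def demo() -> list[dict]:
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        # Two records with different shapes land side by side.
        land_raw(lake, "sensors", '{"device": "kiosk-1", "temp_c": 21.5}')
        land_raw(lake, "sensors", '{"device": "truck-7", "speed_kph": 88, "temp_c": 30.1}')
        return read_json_records(lake, "sensors", ["device", "temp_c"])


if __name__ == "__main__":
    print(demo())
```

The trade-off is that the modeling work does not disappear. It moves to whoever reads the data, which is one reason preparation matters so much for data captured this way.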
Overall, 38% of organizations employ data lakes as part of their data architecture, up from 20% in a survey conducted two years prior. Another 15% said they were considering adoption. Data lakes are growing to impressive levels as well: close to one-third, 32%, support more than 100TB of data, the survey found.
With the relentless rise of IoT, AI, machine learning, and cloud-based services, enterprises are now challenged with accommodating and delivering value from the expansive data that surges through their systems. Data warehouses and Hadoop represented solutions for the pre-IoT, pre-AI enterprises. Today’s opportunities and challenges call for the next generation of platforms and tools to bring it all together.