
The Data Daily

New Technologies in a Big Data World

The big data world is changing in ways never seen before, particularly when it comes to bringing data together and making it actionable for the business. The challenge faced by all enterprises, large and small, is being able to discover, identify, and bring together the data needed to build products, deliver services, and understand customers. Data integration itself has been a practice, and a challenge, for decades. Now, however, new tools and processes are enabling new ways of bringing enterprises to a state in which they can support sophisticated applications such as artificial intelligence, machine learning, and the Internet of Things. The issue is that data-centric cultures are still far off: data continues to reside in silos, across different devices and in different formats, which may be part of the reason why organizations are not ready to embrace its full potential.

Here are some developments to watch in the year ahead.

First, there’s the matter of what enterprises need to do to handle the ever-growing volumes of data flowing in or being generated. “The game changer is the way unstructured data is stored, managed, and searched,” said Andy Thurai, vice president and principal analyst with Constellation Research. “AI needs a lot of unstructured data, which is important, since close to 80% of data collected is unstructured.”

Many companies “possess far more unstructured data than they really know what to do with,” he added. “Unstructured data was dumped into a cheap storage place such as Amazon S3, and no one ever bothered to get insights from it unless there was a pressing need. Video, image, and audio files, and other types of unstructured data can take up a great deal of space. Storage cost becomes a major factor for a lot of these companies, given the massive size of storage requirements; the storage needs to be much cheaper than traditional systems.”

Enter data lakehouses, which were invented to solve these issues, he said. Lakehouses build on data lakes, which store data directly from its original sources without the formatting, cleansing, and transformation that go into more traditional data warehouses. "Data lakehouses also support large-scale machine learning workloads," Thurai noted.

There’s been a rise in tools and platforms that feature “multilingual search into unstructured data, searching for untagged, unclassified images,” said Thurai. “Previously, it was difficult to search for images within scanned documents. Video, image, and audio auto-classification capabilities are another area which is very important. Data scientists need to spend less time wrangling this data and more time producing models.”

Such a capability enables systems to detect similarities, which is “effective in defending copyright to music, imagery, music videos, and more,” Thurai said. “It’s now possible to compare two snippets of unstructured data—such as music or video—to see if one was copied from the other.” In addition, such capabilities are useful for sentiment analysis, he continued. “For example, if someone mentions a company or person in a news segment, AI can automatically analyze the untagged data and preventatively suggest mitigation actions.”

Cloud computing has been part of the business landscape for a number of years now. However, its impact is only starting to be felt in the big data world. “Cloud technologies are well-established now, but off-prem, distributed technologies are still driving the most exciting developments in data management today,” said Sharad Varshney, CEO of OvalEdge. “Perhaps the most significant of these is data mesh architecture. The technology embraces decentralized data management, and instead of transporting data to a lake or warehouse, it’s worked on in domains.”

Many forward-thinking organizations “have begun to incorporate it into their data governance and management strategies,” said Varshney. “Major cloud and data organizations have already adopted data mesh technologies, which is a positive sign that this critical strategy is becoming more accessible.” At the same time, as with any new technology in or out of the data management space, “initiating culture change is one of the most significant roadblocks,” he cautioned. “Data mesh technologies call for a shift in the way users and upper management access and query data. Shifting the responsibility from a centralized authority to the individual will be difficult for some to accept. First and foremost, there is the issue of trust. However, once you overcome these initial roadblocks, then the effectiveness of the technology should iron out any internal concerns.”