AI and machine learning feed on large amounts of data, and CompTIA's AI Council explains why it is important to start with strategy when considering how every facet of data drives your business outcomes.
For almost a decade now, we have heard statements like “Data is the new oil” and “Data is the lifeblood of any business.” “Big Data” was the buzzword not too long ago, followed by “IoT,” and now AI/machine learning. Whatever the terminology, data is the underlying factor that propels these emerging technologies and their business applications forward.
In 2019, the World Economic Forum predicted that the entire digital universe would grow to 44 zettabytes (ZB) by 2020, where 1 ZB equals 1 million petabytes. Rapid digitization, remote working and learning, and new applications that emerged during the pandemic accelerated that growth, with the total approaching 94ZB by the end of 2022. In other words, the volume of data more than doubled in just 24 months.
AI and machine learning rely on large amounts of data to build and train models that deliver the inferences needed to drive meaningful outcomes. As AI adoption continues to grow, the hunger for data only intensifies.
There was a time when data was almost always structured (i.e., tables, columns, forms, transactions, etc.) and usually housed in databases and data warehouses, siloed across functions within an organization or even within one of its branches. As sensors, images, video, voice, and social media exploded over the past decade and a half, unstructured data (raw data captured as is, with no predefined format) has become prevalent in many businesses. Organizations now have to manage multiple data types and tame unstructured data, which is highly unpredictable. Additionally, new regulations and privacy laws (e.g., Europe’s GDPR and the California Consumer Privacy Act) further restrict how the data collected and stored within an organization can be used, not to mention the multiple copies of that data kept for various business applications.
More data also raises strategic, governance, and policy questions about who manages, stores, accesses, and ultimately owns the data and, importantly, for what purposes or analyses it will be used. Poor data governance can also jeopardize compliance with regulatory requirements that may not have been foreseen when the data was first collected. Most mid-sized to large organizations today have data stewards such as CDOs (chief data or digital officers) in place to lead and be accountable for organizational data governance, a far cry from the days when the IT/CIO team owned all of this.
By its very nature, artificial intelligence (AI) is not created in a vacuum. Its algorithms need vast amounts of data to deliver results that are meaningful and grounded in probability rather than crystal-ball guesswork. The foundational data fed into AI models directly shapes the behavior and resulting outcomes of those models. Such models could be deliberately manipulated or inadvertently left unattended when used for credit or loan decisioning, security bypass, benefits eligibility, medical imaging and diagnosis, product quality inspection, fraudulent insurance claims, and more.
The pandemic created an enormous challenge for most supply chains: demand outstripped supply, and transportation bottlenecks caused material shortages that in turn drove price increases and inflation. None of this could have been predicted, no matter how much data organizations had in their repositories.