Traditionally, big data analytics has relied on structured data. Data analytics, however, doesn’t start and stop with the tidy data that’s locked in the rows and columns of your databases. Organizations can garner a lot of value by harnessing the power of “dark” or unstructured data (think nested and threaded emails, image files, outdated file formats, and paper documents) that makes up as much as 90 percent of the data available to a company.
Unstructured documents represent far more content than a company’s databases could ever produce, and harnessing this data adds fuel that feeds an organization’s analytics engines—leading to better outputs and shrewder decision-making.
However, utilizing this colossal corpus of data means first getting a handle on unstructured data analytics: the process by which unstructured data is collected, cleaned, categorized, and enhanced so it can be analyzed by automated analytics tools. Keep reading for the nuts and bolts of how this works, and how unstructured content is being used to fuel big data analysis.
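The collect, clean, categorize, and enhance stages can be illustrated with a minimal Python sketch. Everything here is hypothetical: the sources, the keyword-based categorizer, and the metadata fields are stand-ins for what a production pipeline (with real loaders and trained classifiers) would do.

```python
import re

def collect(sources):
    # Gather raw text from heterogeneous sources (hypothetical loader).
    return [doc for src in sources for doc in src]

def clean(text):
    # Strip non-printable characters and normalize whitespace.
    text = re.sub(r"[^\x20-\x7E\n]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def categorize(text):
    # Naive keyword routing; real systems use trained classifiers.
    if "contract" in text.lower():
        return "legal"
    if "survey" in text.lower():
        return "geology"
    return "general"

def enrich(text, category):
    # Attach metadata so downstream analytics tools can query the record.
    return {"text": text, "category": category, "length": len(text)}

raw = collect([["  A drilling  SURVEY report\x00 "], ["Land purchase contract"]])
records = [enrich(t, categorize(t)) for t in (clean(d) for d in raw)]
```

The output is a list of structured records, which is the point of the exercise: once dark data looks like this, ordinary analytics tooling can consume it.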
The technology now exists to effectively (and automatically) process vast volumes of unstructured data and extract meaningful business value from this information through big data analytics. If you think of your business as a refinery and your data as crude oil, data analytics engines allow you to refine that raw material and turn it into the fuel that drives real-world business improvements. In the energy sector, for example, a company may have spent years purchasing parcels of land for test drilling. Each of those tests likely generated a lot of data, much of it unstructured (think of all the paperwork around land purchases, surveys, and legal documents, and then all the testing procedures and results). All of this data is stored somewhere, but accessing it would require a lot of time, resources, and manual processing. In practice, attempting to access this data would be an operational nightmare.
When advances in drilling and processing technology make previously undesirable sites suitable for work, the organization faces a challenge: determining which of those old parcels would now be profitable drill sites. Manually searching decades-old records to figure that out would be time-consuming, expensive, and, depending on the quality of the company’s record keeping, potentially fruitless. In this scenario, what’s needed is a way to automatically conduct the search and convert the historical content into a format that can be processed by an automated analytics engine.
Consider, for example, the challenges faced by a global reinsurance company that processes half a billion pages of contracts annually. Because it can automatically convert this unstructured content into a format that is usable by its analytics tools, it can feed the contract data into IBM’s Watson and quickly assess risks and trends. By refining and analyzing unstructured contract data, the company was able to discover which areas generate more natural-disaster claims, then combine that insight with the coverage levels of policyholders in those areas, allowing it to optimize coverage around predicted risks.

Once a company implements a sound methodology for processing unstructured data, formerly “dark” data becomes fuel for big data analytics, dramatically increasing the quality of the business intelligence produced.
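The claims-versus-coverage comparison can be sketched with invented data. The records, regions, coverage figures, and risk threshold below are all hypothetical illustrations, not the reinsurer’s actual model:

```python
from collections import Counter

# Hypothetical records extracted from unstructured contracts and claims.
claims = [
    {"region": "gulf_coast", "cause": "hurricane"},
    {"region": "gulf_coast", "cause": "hurricane"},
    {"region": "midwest", "cause": "tornado"},
]
coverage = {"gulf_coast": 2_000_000, "midwest": 5_000_000}

# Count disaster-driven claims per region.
claim_counts = Counter(c["region"] for c in claims)

# Score claim frequency relative to coverage (per million covered);
# the 0.5 threshold is illustrative, not an actuarial model.
exposure = {
    region: claim_counts[region] / (coverage[region] / 1_000_000)
    for region in coverage
}
at_risk = [region for region, score in exposure.items() if score > 0.5]
```

The real value comes from the join: neither the claims history nor the coverage table is interesting alone, but combining them surfaces the regions where coverage is misaligned with observed risk.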
Once unstructured data analysis methods are in place, the dark data can be fed into big data analytics tools to find ways to improve the client experience. For instance, a large Scottish bank holds a huge volume of unstructured information. To make matters worse, that content is housed in different divisions of the bank, each of which manages its data separately. There is no easy way to get a sense of what might be duplicated across business lines.
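One common first step toward spotting cross-division duplication is content fingerprinting: hash a normalized form of each document so identical content matches even when stored in different systems. This is a minimal sketch assuming each division can export its documents as text; the document lists are invented.

```python
import hashlib

def fingerprint(text):
    # Hash normalized content so identical documents match across divisions
    # despite differences in casing and whitespace.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hypothetical document stores held by separate business lines.
retail_docs = ["Mortgage Terms v2", "KYC  policy 2019"]
commercial_docs = ["kyc policy 2019", "Trade finance guide"]

retail_fingerprints = {fingerprint(d) for d in retail_docs}
duplicates = [d for d in commercial_docs if fingerprint(d) in retail_fingerprints]
```

Exact hashing only catches byte-identical content after normalization; near-duplicate detection (e.g., shingling or similarity hashing) would be the natural next step for scanned or reformatted documents.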