In business today, knowledge is power – just as it has always been. The most intelligent and successful businesses collect the right data, which they effectively turn into information and then into knowledge.
We now live in an increasingly data-filled and data-driven society, and businesses which can thrive in this environment – by understanding and harnessing the data that flows within and around their operations, and what it means – are likely to have a bright future.
Data virtualisation is one of the tools (or techniques) which an organisation can employ to help with this task. It’s a term that is being heard more and more frequently, but there is often confusion about precisely what it means – so in this post I want to take a look at what it is, and why it can be a powerful aid when tooling up for data-driven change.
Just as virtual reality denotes a reality that is abstracted from the “actual” reality we all live in, “virtual” data, at its most basic, is simply a dataset which is abstracted, in some way, from the actual, physical electromagnetic data it represents – which usually exists, at its most “real” level, as 1s and 0s encoded magnetically onto a physical hard drive somewhere.
Modern smartphones, computers and tablets all use virtualisation to some extent – to make them work more in line with the way our brains expect them to behave. Files sit within folders and are grouped together according to their type. If you want to look at a picture you’ve taken with your camera, you open the gallery – which is a virtualised dataset based on the actual picture files stored on your phone or memory card. You probably get a little thumbnail of the image and, depending on your settings, information like the image size and details of when and where it was taken. Working with a “virtualised” dataset like this makes it easier to search through and sort to find the information you want.
Facebook is another good example (borrowed from here). When we want to look at a photograph or video, we can access it through the virtualised environment of the Facebook app or website. We don’t need to know the physical location of, or any other information about, the file we want to view; we can access it by looking directly in the place we expect it to be – be that our own photo album or a group dedicated to funny cat pictures.
At enterprise scale, data virtualisation is based on the same principles. Because enterprise data is very often Big Data, it can be messy – if a company is collecting even a sliver of the data available to it, it will (or should) have machine data, transactional data, financial data, customer feedback data, operational data and curated external data at its fingertips. The complex nature of this data and the plethora of emerging ways in which it can be leveraged for insights mean that specialist tools have become available for virtualising it.
Why is data virtualisation useful?
The main benefit is that any operations carried out on virtualised data involve only the curated, “useful” information which has been grabbed from the “actual” dataset.
For example, if the data-driven project you are currently working on involves improving the speed of rocket-powered cars, and all you need for one particular query is 0–100 mph times, working on a virtualised dataset containing just this information would result in quicker, simpler calculations to get the answer you need.
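To make the rocket car example a little more concrete, here is a minimal sketch using a plain SQL view in Python's built-in sqlite3 module. A database view is only one simple form of virtualisation, and the table, column and car names are all invented for illustration.

```python
import sqlite3

# The "actual" dataset: every measurement recorded for each test run.
# Table, column and car names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_runs ("
    "car TEXT, zero_to_100_s REAL, top_speed_mph REAL, fuel_kg REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO test_runs VALUES (?, ?, ?, ?, ?)",
    [
        ("Comet", 2.5, 410.0, 120.0, "calm weather"),
        ("Comet", 3.5, 402.0, 118.0, "crosswind"),
        ("Meteor", 4.0, 395.0, 131.0, "new nozzle"),
    ],
)

# The virtualised dataset: a view exposing only the acceleration times.
# No data is copied; the view is evaluated against test_runs when queried.
conn.execute(
    "CREATE VIEW acceleration AS SELECT car, zero_to_100_s FROM test_runs"
)


def fastest_times(connection):
    """Return the best 0-100 mph time per car, reading only the view."""
    rows = connection.execute(
        "SELECT car, MIN(zero_to_100_s) FROM acceleration "
        "GROUP BY car ORDER BY car"
    ).fetchall()
    return dict(rows)


best = fastest_times(conn)
```

The query never needs to know about top speeds, fuel loads or notes, which is the point: whoever asks the question sees only the slice of data that is relevant to it.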
Virtualisation is widely seen as an aid to productivity because it means data can be accessed in a variety of ways depending on what it is used for, and this transformation can take place in the virtualisation layer without affecting the source data. Large datasets do not need to be loaded entirely into memory for simple, frequently-repeated operations, again improving speed.
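As a rough illustration of a transformation living in the virtualisation layer, the sketch below converts speeds to metric units as records are read, one at a time, while the source records stay exactly as they were. The function and field names are assumptions made up for this example, not part of any particular product.

```python
from typing import Dict, Iterable, Iterator

MPH_TO_KMH = 1.609344  # exact definition of the international mile in km


def as_metric(source: Iterable[Dict[str, object]]) -> Iterator[Dict[str, object]]:
    """Yield a metric version of each record without modifying the source."""
    for record in source:
        # A new dict is built for every record, so the original is untouched,
        # and only one record is processed at a time.
        yield {
            "car": record["car"],
            "top_speed_kmh": round(record["top_speed_mph"] * MPH_TO_KMH, 1),
        }


# Hypothetical source data, stored in miles per hour.
source_rows = [
    {"car": "Comet", "top_speed_mph": 400.0},
    {"car": "Meteor", "top_speed_mph": 395.0},
]

metric_rows = list(as_metric(source_rows))
```

Because the conversion is a generator, a consumer that only wants the first few records never forces the rest of the dataset to be processed.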
Widely used virtualisation solutions (e.g. Denodo, Cisco, Delphix, Informatica) are built to interface directly with data sources, and client applications read purely from the virtualised datasets which are produced. Data virtualisation also has benefits for compliance and governance. It is often used to restrict access to data based on credentials or clearance level. It also provides tools for oversight of how data is used, what types of data are most frequently accessed, and what changes or transformations are being applied to data before it is put to use.
Overall, data virtualisation has enormous potential to help businesses become more agile and more focused as they make the transition to becoming intelligent, data-driven organisations.
It allows disparate and often siloed datasets to be brought together and analysed in the context of everything that can be measured and known about an organisation’s operations. Conversely, it also means valueless “noise” can be filtered out, and stopped from consuming increasingly valuable compute and storage resources.
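The governance idea can be sketched in a few lines. This is not how any of the products named above work internally; it is a toy model, with invented column names and clearance levels, showing access being filtered by clearance and every read being counted for oversight.

```python
from collections import Counter

# Hypothetical source rows and the clearance level needed to see each column.
ROWS = [
    {"customer": "A. Smith", "region": "North", "card_number": "0000-1111"},
    {"customer": "B. Jones", "region": "South", "card_number": "2222-3333"},
]
COLUMN_CLEARANCE = {"customer": 1, "region": 1, "card_number": 3}

# Oversight: how many times each column has been served to a reader.
access_log = Counter()


def read_rows(clearance):
    """Return only the columns the caller is cleared for, and log the access."""
    visible = [col for col, level in COLUMN_CLEARANCE.items() if clearance >= level]
    access_log.update(visible)
    return [{col: row[col] for col in visible} for row in ROWS]


analyst_view = read_rows(clearance=1)  # no card numbers
auditor_view = read_rows(clearance=3)  # every column
```

After these two reads, the log shows that the region column has been served twice and the card numbers only once, which is the kind of usage picture a governance team would want.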
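Here is a small sketch of that idea, assuming two made-up silos (sales and support) that share a customer ID. The combined view is built on demand, and the noisy debug field is simply never exposed.

```python
# Two hypothetical silos held by different departments.
sales_silo = [
    {"customer_id": 1, "order_total": 250.0, "debug_trace": "x9f"},
    {"customer_id": 2, "order_total": 90.0, "debug_trace": "a41"},
]
support_silo = [
    {"customer_id": 1, "open_tickets": 0},
    {"customer_id": 2, "open_tickets": 3},
]


def combined_view():
    """Yield one joined record per sale; the debug_trace noise is left out."""
    tickets = {row["customer_id"]: row["open_tickets"] for row in support_silo}
    for sale in sales_silo:
        yield {
            "customer_id": sale["customer_id"],
            "order_total": sale["order_total"],
            "open_tickets": tickets.get(sale["customer_id"], 0),
        }


# A question that neither silo could answer alone.
at_risk = [row["customer_id"] for row in combined_view() if row["open_tickets"] > 2]
```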
The increase in speed and efficiency means that cutting-edge Big Data projects involving advanced technologies such as predictive analytics and machine learning are within the grasp of an ever-growing number of businesses. This is likely to continue to be a strong driver of innovation for the foreseeable future.