It might seem as if everybody wants to ‘do big data’ these days, for the sake of big data. Every time I hear the term I mark it down on the list I've been keeping since 2010, when the hype was at its peak. You can't imagine how many contexts I've heard it in since. People want to demonstrate that they are following the new trends, afraid of being left out or of not being taken seriously. The problem is, they often don’t really understand what is behind the hype, or why there is such hype.
Recently, I was listening to a TV debate about pensions for unemployed people over 50, when one of the panel members dropped in the ‘big data’ cure for the unemployment problem. There you go, one more mark on my list, and more 'big laughter' from my side! Of course, he had a point. What he really meant was "with big data we can create better insights". For example, by combining behavioural analytics with traffic flows (yes, we have traffic jams in Belgium too), personal skills and interests, we could see how all these different 'parameters' make a particular person the right fit for a certain work environment, location or job type.
Now we come to the point where we can actually talk about big data. To massage and prepare all this data, we need to be able to quickly integrate new information from various structures and sources, digest it and apply different techniques to create those insights.
I grew up with a traditional approach to data management, called business intelligence (BI), which involves the typical pivoting of data, or what we call dimensional modelling. That means dimensions and facts, which comes down to, for example, how much revenue we generated this year compared to last year, by region, product group and sales team. The dimensions are the 'by'; the 'how much' is the fact. Why is this approach still valid? Let’s face it, data for the fun of data - the bits and bytes, the programming, the technology - doesn’t bring much business value. That is why we must start with the business problem and try to answer the right questions. Dimensional modelling is a good exercise for figuring out how to ask the right questions, as well as for 'dimensionalising' business concepts (a customer, a region, a product, a visit, revenue). An important part of getting the right insights is understanding the business (its rules and mechanisms) so that you can understand the data.
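To make the 'by' concrete, here is a minimal sketch in Python with pandas of revenue pivoted by region and year; the table, column names and figures are all hypothetical, purely for illustration.

```python
import pandas as pd

# Each row is a 'fact' (revenue) described by its dimensions
# (year, region, product group). All values are made up.
sales = pd.DataFrame({
    "year":          [2017, 2017, 2018, 2018],
    "region":        ["North", "South", "North", "South"],
    "product_group": ["Bikes", "Bikes", "Bikes", "Bikes"],
    "revenue":       [120_000, 95_000, 140_000, 90_000],
})

# 'Revenue by region and year': the dimensions are the 'by',
# the aggregated revenue is the fact.
report = sales.pivot_table(
    index="region", columns="year", values="revenue", aggfunc="sum"
)
print(report)
```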
Under the traditional BI approach, we have this information more or less at our fingertips. We have the standard ETL (extract-transform-load) jobs running, loading operational data from the OLTP (online transaction processing) systems into the data warehouse, structured as dimensions and facts for the standard business questions. Day after day, week after week, month after month… But what if our business model changes a bit, or the market changes, or we get disrupted by start-ups? Suddenly we can’t rely on this super-regulated, organised approach with its standard operating procedures.
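As a rough illustration of that nightly routine, here is a minimal ETL sketch in Python, using sqlite3 as a stand-in for both the OLTP system and the warehouse; every file, table and column name here is hypothetical.

```python
import sqlite3
import pandas as pd

# Extract: pull yesterday's orders from the operational (OLTP) database.
oltp = sqlite3.connect("oltp.db")          # stand-in for the OLTP system
orders = pd.read_sql_query(
    "SELECT order_id, customer_id, amount, order_date FROM orders "
    "WHERE order_date = date('now', '-1 day')",
    oltp,
)

# Transform: aggregate to the grain of the fact table.
daily_revenue = (
    orders.groupby(["order_date", "customer_id"], as_index=False)["amount"].sum()
)

# Load: append into the warehouse fact table, keyed by its dimensions.
dwh = sqlite3.connect("warehouse.db")      # stand-in for the data warehouse
daily_revenue.to_sql("fact_daily_revenue", dwh, if_exists="append", index=False)
```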
We now need to add new data, insights and concepts to the data warehouse – and it needs to have been done by yesterday! But for this to work we need to remodel our systems, and we have little room to experiment, because in all probability the loading mechanism will break and the monthly/quarterly reports will no longer show sensible results. We could take a copy and experiment on that. But the copy takes too long to create, the back-up might fail, we don’t have enough disk space… and so the problems continue. Nor should we offload the data somewhere else, as this may not be secure. Sure, it would be great if we could integrate all these paper contracts (with no real structure) into our system, or if, for instance, a retailer could use security video footage to analyse how their customers shop…
But that’s where our innovation stops - we don’t have the playground, or a sandbox… So people shrug their shoulders and say, ‘oh yes, but we can’t do much about it, it has always been like this, this is how it is at our company’. These challenges are why your data needs an agile platform… the right one for the given problem. Traditional data management certainly still has a role to play. But the more demanding your data-to-insights requirements become, the more you will need to move towards that flexible platform.
The global pace of change in the data and analytics economy is tremendous, which can be exciting and paralysing at the same time. So if you don’t know how to get started, here’s a list to help you define the right platform for the right ‘data’ problem. Requirements for a modern data analytics platform:
It’s important that your platform can generate and accelerate insights and value from data, in any compute environment, for any challenge. Ultimately, you need to reduce time-to-insights and time-to-value for your customers. I will be finding out more about how to achieve this by attending Analytics Experience in Amsterdam on October 16-18.