Data...What? Data Democratization and the Illusion of Self-Service
I know what you’re thinking. Not another post about self-service! But before I lose your interest, this article will be different -- I will try to explain why we are doing it all wrong (most of the time) and why IT is happily washing its hands of users’ experiences and needs (do I have your attention now?!).
What is data democratization?
The 101 answer to that question is “bringing data to users” -- any data, any user. Going one step further, we can say that democratizing data is a way to enable users to access the data they need without major bottlenecks and to get real value from it by applying the necessary analysis to it. In doing so, we need to put the right controls in place to keep data secure, and provide catalogs that help users find what they are looking for while guaranteeing a certain level of quality. Sounds easy, right? Then why do we see this process fail repeatedly, and why is there an illusion of self-service that never becomes reality?
Failed Experiences Seen with Customers
Let me share some experiences I’ve seen and discussed with our customers.
Wrong paradigm from IT. How many times have we seen IT trying to do something just because? “Self-service is cool, and we need it to be perceived as modern,” although under the covers users are still struggling to get the information they need. Many IT organizations don’t understand that to serve users they need to think in terms of “Do what you need, I care for you,” instead of “Do what you want, I don’t care.” Unfortunately, the latter is what I’ve seen most. This happens when IT is too focused on tools and technology instead of being accountable for data strategy and data architecture. The tools help narrow the gap to accessible data, but they are useless without a serious practice for governing and productizing the data; without one, users reinvent the wheel each time they need to create a new analysis. Separating production from discovery of data is also key to success (check what my colleague Kevin Lewis has to say about this), as it enables a faster way of developing insights, which are operationalized only after their full value is confirmed.
No governance. Let me say it out loud: having a catalog or dictionary tool is different from having a data governance practice in place. What value can you get from the data if you are not sure what it means? Or if you can find the same KPI built from different sources of information? Which data are right, and which are not so good? The key foundation for enabling self-service is having curated data that are known and can be trusted. No user wants information at hand that has no value, no matter how easy it was to access. It’s even worse if you don’t know where to find the data and you struggle for days to reach the repository, only to realize once you get there that you are not sure it has the data you need because there’s no definition (or metadata) to support it. Metadata is key to success, as my colleagues Dwayne and Mark elaborate here.
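To make the difference between "having a catalog" and "governing it" a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `CATALOG` entries, the KPI and source names, and the `governance_issues` helper are assumptions, not the output or API of any real catalog tool. It simply flags the two symptoms described above: the same KPI built from more than one source, and entries with no definition to support them.

```python
from collections import defaultdict

# Hypothetical catalog entries: (kpi_name, source_system, definition).
# These names and values are made up for illustration.
CATALOG = [
    ("net_revenue", "finance_dw", "Invoiced sales minus returns and discounts"),
    ("net_revenue", "sales_mart", ""),
    ("active_customers", "crm", "Customers with a purchase in the last 90 days"),
]


def governance_issues(entries):
    """Return (KPIs built from several sources, entries lacking a definition)."""
    sources = defaultdict(set)
    undefined = set()
    for kpi, source, definition in entries:
        sources[kpi].add(source)
        # An empty or whitespace-only definition counts as missing metadata.
        if not definition.strip():
            undefined.add((kpi, source))
    # Keep only KPIs that appear in more than one source system.
    multi_source = {k: sorted(s) for k, s in sources.items() if len(s) > 1}
    return multi_source, sorted(undefined)


multi_source, undefined = governance_issues(CATALOG)
print("KPIs built from several sources:", multi_source)
print("Entries with no definition:", undefined)
```

A check like this only surfaces the problem. Deciding which source is the trusted one, and writing the missing definition, is still the job of the governance practice, not the tool.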
Lack of efficiency and waste of money. Because there is no governance and no data integration, things keep being redone each time a user needs them. Loading new data, building a data set, copying rows from one place to another, or even having the same data duplicated in multiple repositories -- these issues are commonly seen in many enterprises. Millions of dollars are thrown away each year due to the lack of an integrated solution that can meet the needs of all the different lines of business (LOBs). Having a practice (or tool) that can federate or virtualize data is useful, although not sufficient. You may still want to integrate (most of) your data and put it into action by enabling different views on top of it. Cross-domain integration and cross-platform integration help build a more efficient ecosystem by allowing the reuse of information. Remember, self-service is here to increase value and efficiency, not the other way around.
So… What should we do?
Loading a bunch of tables and buying a catalog tool and a dashboarding tool is clearly not self-service. IT and the LOBs must work together to prioritize which information is key to enabling the analytics the company needs, and then bring that data to a centralized repository (or even better -- an ecosystem) where it can be curated and governed. Users will find it easier to fulfill their analytics needs if they don’t have to rework the same process every time, and if they can experiment freely on the data using the foundation already built. This doesn’t mean we should spend months or years trying to create the perfect data warehouse without tangible value, nor that we should have sparse data marts in a crazy ungoverned environment or a data lake. IT can leverage tools and practices to provide a better user experience for discovery analytics on both centralized and decentralized data, but it all comes back to a good data strategy, as we discussed in the first part of this series. Remember, the key to self-service is not the tools or technologies but the process and the quality of the information, along with how simple it is to access.
Sebastian Barreda is an Ecosystem Architect with 15 years of experience working with data and analytics solutions. He worked in Teradata Consulting in many roles, from ETL and BI development to requirements gathering and logical data modeling, where he gained practical experience in these topics. Later, he advanced his career to a Solution Architecture role, analyzing customers’ business and technical requirements and translating them into products, solutions, and services, understanding the key link between business needs and technology enablers, and leveraging cloud, open source, and the so-called Big Data tools and solutions. He has delivered data strategy advisory in several industries, including Retail, Manufacturing, Communications, Media & Entertainment, and Banking.