The Data Daily

Enterprise Data Driven Strategy using DataOps and MLOps
As Big Data and ML technologies moved up the hype curve, many companies jumped on the bandwagon and implemented Big Data. Everyone seems convinced that corporate strategy should be completely data driven. Sure, data-driven strategy is the new normal. But wait a minute: is it going to be an agile data-driven strategy, or just a data-driven strategy?

As per Gartner, more than 50% of the companies that implemented Big Data failed to meet their positive ROI targets. So what went wrong? The answer to the million-dollar question is that, in most cases, the successful companies implemented an agile data-driven strategy, while the less successful ones implemented just a data-driven strategy. Let us see what the difference between the two approaches is.

As per Gartner, 80% of a data scientist's time is consumed by the cumbersome work of data cleaning, exploration and transformation; barely 20% is spent writing algorithms and deriving meaningful insights. Data scientists face numerous other challenges as well. For instance, given multiple candidate algorithms for a problem, it is not easy to decide which is best, and it is too time consuming to implement each algorithm in turn and compare the results to pick a winner. Even after models are deployed successfully, enterprise data changes rapidly, which poses the further challenge of monitoring, updating and re-testing the models on new data. Tackling these challenges manually means increased time to market (TTM), cost, complexity and risk, and poor efficiency.

On the data engineering front, the challenges are no less daunting. A common scenario is hundreds of MapReduce/Spark ETL jobs written to feed a Data Lake. One can imagine the pain and effort required to write and debug those jobs on a massively parallel distributed platform. On top of that, all the data engineering aspects must be covered: administration, security, lineage, cataloging, audit, data life cycle management, exploration UI, visualization, reporting and so on. Often at least some of these aspects are ignored. When they are not ignored, many companies tackle them with a patchwork of different technology stacks, leading to high integration risk, high cost and increased complexity. Failing to cover even one data engineering aspect can have serious repercussions for security, compliance, efficiency and cost. A typical data engineering pipeline is shown in the figure below:
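To make the ETL stages concrete, here is a minimal, hypothetical sketch of one extract-transform-load step in plain Python, with a simple audit trail for dropped rows. All names and the inline data are illustrative; a production job would run the same three stages as distributed Spark/MapReduce tasks at scale.

```python
import csv
import io
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("etl")

# Illustrative raw input; in a real pipeline this would stream from source
# systems rather than live in a string.
RAW = "id,amount\n1,10.5\n2,not_a_number\n3,7.0\n"

def extract(text):
    """Extract: parse raw CSV text into records."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: type-cast fields, drop bad rows, log them for audit."""
    clean = []
    for row in rows:
        try:
            clean.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except ValueError:
            log.info("audit: dropped bad row %s", row)
    return clean

def load(rows, store):
    """Load: append curated records to the target store (here, a list)."""
    store.extend(rows)
    return store

lake = load(transform(extract(RAW)), [])
print(json.dumps(lake))
```

Note that the audit log is produced as a side effect of the transform itself; bolting it on afterwards, with a separate tool, is exactly the patchwork approach the paragraph above warns against.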

So, what is the solution to all the problems mentioned above? The solution is to take the cumbersome housekeeping work away from data scientists and data engineers by automating the data pipeline as much as possible. It is imperative to create synergy between data engineers and data scientists: make quality, transformed data available to data scientists and business analysts as soon as it enters the pipeline. This requires end-to-end orchestration and automation of the data pipeline, all the way from data ingestion to data discovery.
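One way to picture that end-to-end orchestration is a tiny dependency graph where each stage fires automatically as soon as the stage it depends on completes, with no manual hand-off. The sketch below is an illustrative toy, not any particular tool's API; stage names and data are hypothetical.

```python
# Hypothetical orchestration sketch: stages register which stage they depend
# on, and the pipeline cascades each stage's output downstream automatically.

class Pipeline:
    def __init__(self):
        self.subscribers = {}

    def stage(self, name, after=None):
        """Decorator: register a stage to run after another stage completes."""
        def register(fn):
            self.subscribers.setdefault(after, []).append((name, fn))
            return fn
        return register

    def run(self, stage_output, completed=None):
        # Fire every stage waiting on `completed`, then cascade recursively.
        for name, fn in self.subscribers.get(completed, []):
            self.run(fn(stage_output), completed=name)

pipe = Pipeline()
events = []  # records the order stages actually ran in

@pipe.stage("ingest")
def ingest(raw):
    events.append("ingest")
    return [r.strip() for r in raw]

@pipe.stage("curate", after="ingest")
def curate(rows):
    events.append("curate")
    return [r for r in rows if r]

@pipe.stage("explore", after="curate")
def explore(rows):
    events.append("explore")
    return rows

pipe.run([" a ", "", "b "])
print(events)  # ingestion triggers curation triggers exploration
```

The point of the design is that the exploration stage never waits on a human: the moment curated data exists, every downstream consumer registered against it runs.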

In other words, apply DataOps and MLOps best practices to automate the pipeline and achieve end-to-end orchestration of the data. With this automated orchestration approach, as soon as data is ingested into the pipeline it is curated and immediately available to data scientists and business analysts for exploration, model building, reporting and visualization. ETL and curation happen on the fly (a one-time initial job setup and configuration is required), multiple ML models can be evaluated and built dynamically on the new data, pushed to production seamlessly, and monitored automatically thereafter. This can be achieved with tools such as Pentaho; of course, there are many other tools available in the market. Pentaho is a one-stop solution for Big Data, machine learning/analytics, BI/ETL, IoT, and reporting & visualization needs. Furthermore, Pentaho requires no coding and has a highly intuitive drag-and-drop UI that even a non-technical person can use. The advantages are reduced cost and TTM, up to 10x less development time, increased efficiency, less risk and less management overhead. In conclusion, one needs to take a head-on approach rather than a piecemeal one to establish an Enterprise Big Data & Analytics platform that can effectively drive a data-driven strategy.
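The "monitored automatically" step above can be as simple as a scheduled drift check: compare the incoming feature distribution against the training-time baseline and flag a retrain when it shifts too far. The sketch below is a deliberately simplified illustration (a mean-shift test with a made-up threshold and made-up numbers); real monitoring systems use richer statistical tests.

```python
import statistics

# Hypothetical drift monitor: flag a retrain when the mean of incoming data
# is more than `threshold` baseline standard deviations away from the
# training-time mean. Threshold and data below are illustrative.

def needs_retrain(baseline, incoming, threshold=2.0):
    """Return True when the incoming distribution has drifted too far."""
    mu = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    shift = abs(statistics.fmean(incoming) - mu) / sigma
    return shift > threshold

baseline = [10.0, 11.0, 9.5, 10.5, 10.0, 9.8, 10.2]  # seen at training time
stable = [10.1, 9.9, 10.3]    # new data, same distribution
drifted = [14.0, 15.2, 14.8]  # new data, clearly shifted

print(needs_retrain(baseline, stable))   # False
print(needs_retrain(baseline, drifted))  # True
```

Wired into the orchestrated pipeline, a `True` result would trigger the model-rebuild stage automatically instead of waiting for a human to notice degraded predictions.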

Make the Data work for you, not the other way around.
