The Data Daily

Machine Learning Puts New Lens on #IoT. A Step-by-Step Guide to #Azure #MachineLearning

Healthcare organizations need predictive analytics to deliver quality care and to manage population health. Building predictive models with machine learning algorithms is complex in an infrastructure-as-a-service or platform-as-a-service environment because it involves distributed computing. The emergence of predictive analytics has created an enormous opportunity to predict events in healthcare organizations, as well as in other industries such as aerospace. Predictive analytics is a subfield of data science that draws on several disciplines, including statistical inference, machine learning, clustering, and data visualization, and applies them iteratively throughout the data analytics lifecycle. The stages of that lifecycle are: defining the problem statement for the organization, scoping the data analytics project, collecting the big data, performing exploratory data analysis, preparing the data, and deploying predictive models built with machine learning algorithms.

During the initial phase of a data analytics project, it is essential to understand the business's pain points and requirements before designing a solution architecture for predictive analytics with machine learning. The business requirements should be defined during the data discovery phase and translated into a data analytics problem. For example, a healthcare organization might want to track epidemics and outbreaks in different parts of the world. The problem statement could then be to predict outbreaks by transcribing the calls received by emergency departments, using deep neural networks for speech recognition and location intelligence to identify the affected locations, and forecasting outbreaks from their rate of spread, geographic location, and demographics.
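
To make "rate of spread" concrete, the sketch below shows one hypothetical way to turn the business question into a measurable target: it computes each region's week-over-week growth in outbreak-related emergency calls and flags regions above a threshold. It is a simple rule-based baseline under stated assumptions, not the deep-learning pipeline described above; the region names, counts, and the 1.5 threshold are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical weekly counts of outbreak-related emergency calls per region,
# standing in for the output of the speech-recognition and location step.
calls = [
    ("north", 1, 40), ("north", 2, 44), ("north", 3, 48),
    ("south", 1, 30), ("south", 2, 45), ("south", 3, 90),
]


def weekly_growth(records):
    """Return {region: ratio of the latest week's calls to the previous week's}."""
    by_region = defaultdict(dict)
    for region, week, count in records:
        by_region[region][week] = count
    growth = {}
    for region, weeks in by_region.items():
        ordered = sorted(weeks)
        if len(ordered) < 2:
            continue  # at least two weeks are needed to measure a rate of spread
        previous, latest = weeks[ordered[-2]], weeks[ordered[-1]]
        growth[region] = latest / previous if previous else float("inf")
    return growth


def flag_outbreaks(records, threshold=1.5):
    """Return regions whose call volume grew by at least `threshold` in the latest week."""
    return sorted(r for r, g in weekly_growth(records).items() if g >= threshold)


print(flag_outbreaks(calls))  # -> ['south']
```

In a real project the threshold itself would be something the predictive model learns or the business sets, and demographics would enter as additional features.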

Data can be generated by disparate sources, in both structured and unstructured formats. For healthcare organizations, the data might already be available in data lakes or data warehouses; even so, the data collection phase requires extracting it and loading it from its source format into the target format.
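
The extract-and-load step can be sketched with the Python standard library alone. The example below is a minimal sketch, assuming a hypothetical CSV extract of hospital admissions as the source format and an in-memory SQLite table as a stand-in for the target warehouse; the column names and the `admissions` table are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source extract in CSV format (for example, exported from a data lake).
SOURCE_CSV = """patient_id,admit_date,department,length_of_stay
1001,2016-10-01,emergency,2
1002,2016-10-02,cardiology,5
1003,2016-10-02,emergency,1
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts (every value is a string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert each field to the type the target schema expects."""
    return [
        (int(r["patient_id"]), r["admit_date"], r["department"], int(r["length_of_stay"]))
        for r in rows
    ]


def load(records, conn):
    """Load: write the typed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS admissions ("
        "patient_id INTEGER PRIMARY KEY, admit_date TEXT, "
        "department TEXT, length_of_stay INTEGER)"
    )
    conn.executemany("INSERT INTO admissions VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a data warehouse
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute(
    "SELECT department, COUNT(*) FROM admissions GROUP BY department ORDER BY department"
).fetchall())  # -> [('cardiology', 1), ('emergency', 2)]
```

Keeping extract, transform, and load as separate functions makes it easy to swap the source (another file format) or the target (a real warehouse connection) independently.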

Once data migration, data preparation, and data conversion are complete, the organization can explore the data with statistical inference, clustering, data mining, and machine learning algorithms, and deliver data visualizations. The data may still not be in the exact format needed to build the predictive models. In that case, data wrangling (cleaning and restructuring the raw data) can be performed to make it more accurate and usable.
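
A typical wrangling pass fixes missing values and inconsistent encodings before modeling. The sketch below is illustrative only: the raw rows and field names are hypothetical, and median imputation is just one reasonable choice for filling missing numeric values.

```python
from statistics import median

# Hypothetical raw rows after extraction: a missing age and inconsistent labels.
raw = [
    {"age": "34", "sex": "F", "readmitted": "yes"},
    {"age": "",   "sex": "m", "readmitted": "No"},
    {"age": "61", "sex": "M", "readmitted": "YES"},
    {"age": "47", "sex": "f", "readmitted": "no"},
]


def wrangle(rows):
    """Return cleaned rows: numeric ages (median-imputed), upper-case sex, 0/1 label."""
    known_ages = [float(r["age"]) for r in rows if r["age"].strip()]
    fill = median(known_ages)  # impute missing ages with the median of the known ages
    cleaned = []
    for r in rows:
        age = float(r["age"]) if r["age"].strip() else fill
        cleaned.append({
            "age": age,
            "sex": r["sex"].strip().upper(),
            "readmitted": 1 if r["readmitted"].strip().lower() == "yes" else 0,
        })
    return cleaned


for row in wrangle(raw):
    print(row)
```

Median imputation is less sensitive to outliers than mean imputation, which matters when a few extreme values are present.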

The critical stage is building the prediction model by selecting and applying a particular machine learning algorithm. The dataset is divided into training and testing partitions. The training data is used to train the model; the other partition, which the model has not seen during training, is used to evaluate the model's performance. Training and testing can be repeated over a number of iterations, for example with ensemble machine learning algorithms, to avoid underfitting and overfitting, to eliminate outliers, and to identify the algorithm that best fits the prediction problem.
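
The train/test idea can be shown without any ML library. The sketch below, under stated assumptions, splits hypothetical (risk score, label) pairs into a 70/30 partition, "trains" a one-feature threshold classifier on the training part only, and reports accuracy on both parts. The data generator, the 70/30 ratio, and the threshold model are all illustrative choices, not the method used in Azure Machine Learning.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of `rows` and split it into training and held-out test partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(rows, threshold):
    """Fraction of (feature, label) rows where `feature >= threshold` matches the 0/1 label."""
    return sum((x >= threshold) == (y == 1) for x, y in rows) / len(rows)


def fit_threshold(train):
    """'Train' a one-feature classifier: choose the cut-off with the best training accuracy."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))


# Hypothetical data: (risk_score, label), where label 1 means the event occurred.
rng = random.Random(0)
data = []
for _ in range(200):
    score = rng.uniform(0, 100)
    data.append((score, int(score + rng.gauss(0, 10) > 50)))

train, test = train_test_split(data)
threshold = fit_threshold(train)
# A large gap between training and test accuracy would suggest overfitting.
print(f"threshold={threshold:.1f} "
      f"train_acc={accuracy(train, threshold):.2f} test_acc={accuracy(test, threshold):.2f}")
```

Because the threshold is chosen only from the training rows, the test accuracy is an honest estimate of how the model behaves on unseen data.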

Deployment of the model for IoT

The model can be deployed once the best-fitting prediction model has been selected and its performance evaluation is complete. A prediction model can often be reused across multiple departments of a healthcare organization, or by other organizations such as those in the aerospace industry. Such reuse requires deploying the model through a web service and a database that are accessible across the organization, throughout the nation, or across the globe.
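
The web-service idea can be sketched with Python's standard `wsgiref` module. This is a minimal sketch, not how Azure Machine Learning publishes its web services: `predict` is a hypothetical stand-in for a trained model, and the JSON request shape (`{"risk_score": ...}`) is an assumption. The `call_app` helper exercises the service in-process, and `serve` starts a real HTTP server only when called explicitly.

```python
import io
import json
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def predict(features):
    """Hypothetical stand-in for the trained model: 1 if the risk score is at least 50."""
    return int(float(features.get("risk_score", 0)) >= 50)


def scoring_app(environ, start_response):
    """Minimal WSGI scoring service: POST a JSON object of features, get a JSON prediction."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"POST a JSON object of features\n"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        features = json.loads(environ["wsgi.input"].read(length) or b"{}")
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
        prediction = predict(features)
    except (ValueError, TypeError) as exc:  # JSONDecodeError is a ValueError subclass
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [f"{exc}\n".encode()]
    body = json.dumps({"prediction": prediction}).encode()
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(payload, method="POST"):
    """Exercise the app in-process (no socket) and return (status, decoded body)."""
    environ = {}
    setup_testing_defaults(environ)
    data = json.dumps(payload).encode()
    environ.update({"REQUEST_METHOD": method, "CONTENT_LENGTH": str(len(data)),
                    "wsgi.input": io.BytesIO(data)})
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(scoring_app(environ, start_response))
    return captured["status"], body.decode()


def serve(port=8000):
    """Run the service on `port` until interrupted (not called automatically)."""
    with make_server("", port, scoring_app) as httpd:
        httpd.serve_forever()


print(call_app({"risk_score": 72}))  # -> ('200 OK', '{"prediction": 1}')
```

Exposing the model behind a plain HTTP/JSON interface is what lets any department call it without knowing how it was built; the database side (logging requests, storing predictions) is left out of this sketch.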

Azure Machine Learning is a Microsoft service that runs in a distributed cloud-computing environment and is accessed through a web browser, so organizations do not need to procure any additional hardware or software to use it. It is an integrated environment that provides a visual, drag-and-drop interface for building prediction models and applying machine learning algorithms. Azure Machine Learning can pull large-scale big data from the Hadoop ecosystem through Microsoft HDInsight. Many of its algorithms come from Microsoft Research and serve a range of industries; Microsoft also uses these algorithms to power internal products such as Cortana and Bing.

Machine Learning Studio (ML Studio) is the integrated, browser-based development environment of Azure Machine Learning. It lets users create predictive models visually and iterate over training and testing interactively. More recently, Microsoft introduced Azure Notebooks, based on open-source Jupyter notebooks, which make it easy to share work from one department to another.

Datasets can simply be dragged and dropped into the ML Studio environment to build an experiment; by submitting the experiment with an algorithm, ML Studio builds the predictive model without any code. ML Studio also provides access to query the data. However, the pre-defined modules in the Azure Machine Learning library may not cover every scenario. Only when no drag-and-drop module fits, for example when particular business logic must be combined with a machine learning algorithm, does code need to be written in R or Python to extend the model.

Access to Azure ML Studio requires a Microsoft account subscription. To use the tool, create a machine learning workspace and assign the workspace information and the workspace owner. Once the workspace is created, the Machine Learning Studio page appears, with a number of tabs on the left side such as web services, experiments, datasets, settings, and trained models. In an experiment, statistical analysis or predictive analytics is performed with modules that encapsulate the machine learning library. Once a dataset is uploaded into ML Studio, it can be used as a module in an experiment. A workflow is created by connecting the output port of one module to the input port of another; a single output port can feed one or more input ports.

To create a new experiment, click the New button and select Experiment.
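
When business logic has to be added in Python, the classic Studio's Execute Python Script module calls a function named `azureml_main`, which receives up to two pandas DataFrames from its input ports and returns a sequence of DataFrames. The sketch below keeps the (hypothetical) triage rule in a plain function so it can be tested outside the studio; the `age` and `risk` column names and the 0.7 and 65 cut-offs are assumptions for illustration only.

```python
def priority(age, risk, risk_cutoff=0.7, senior_age=65):
    """Hypothetical business rule applied on top of a model's predicted risk (0 to 1)."""
    if risk >= risk_cutoff and age >= senior_age:
        return "urgent"
    if risk >= risk_cutoff:
        return "high"
    return "routine"


def azureml_main(dataframe1=None, dataframe2=None):
    """Entry point called by the classic Studio's Execute Python Script module.

    dataframe1 is the DataFrame wired to the module's first input port;
    the 'age' and 'risk' column names are assumptions for this sketch.
    """
    dataframe1["priority"] = [
        priority(age, risk) for age, risk in zip(dataframe1["age"], dataframe1["risk"])
    ]
    return dataframe1,  # the module expects a sequence of DataFrames


print(priority(72, 0.85), priority(40, 0.85), priority(72, 0.20))  # -> urgent high routine
```

Separating the rule from the entry point means the same logic can be unit-tested locally and then pasted into the module unchanged.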

Azure Machine Learning can be applied in the aerospace industry as well as in healthcare, for example by creating an experiment that predicts whether a passenger flight will be delayed, which is a binary classification problem. In the first phase, historical data on scheduled flights is collected from the United States Department of Transportation. The next step is data wrangling: preprocessing the data by filtering it to the busiest airports in the United States and by applying filters on a number of other attributes. Once the airport dataset has been wrangled and is ready to be processed, a second dataset containing all the attributes related to weather conditions has to be prepared. The two datasets are then joined in Azure Machine Learning Studio, and the prediction model is built and trained with a Two-Class Boosted Decision Tree. For comparison, a Two-Class Logistic Regression model is also trained, since this is a binary classification task. The results show the ROC (receiver operating characteristic) curve, precision and recall, and the area under the curve (AUC). The results of the boosted decision tree model can then be interpreted for analysis.
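
The evaluation metrics mentioned above can be computed by hand to see what they mean. The sketch below uses made-up delay labels and made-up predicted probabilities for two models; it is not output from the experiment described here, and it is not necessarily how Studio computes these values internally. AUC is computed in its pairwise form: the probability that a randomly chosen delayed flight receives a higher score than a randomly chosen on-time flight, with ties counted as one half.

```python
def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall of the rule 'predict delayed if score >= threshold'."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(labels, scores):
    """AUC as P(random positive outscores random negative); ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out flights (1 = delayed) and each model's predicted delay probability.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
boosted = [0.9, 0.3, 0.8, 0.4, 0.2, 0.6, 0.7, 0.1]
logistic = [0.7, 0.4, 0.6, 0.3, 0.5, 0.6, 0.5, 0.2]

for name, scores in [("boosted tree", boosted), ("logistic regression", logistic)]:
    p, r = precision_recall(labels, scores)
    print(f"{name}: AUC={roc_auc(labels, scores):.4f} precision={p:.2f} recall={r:.2f}")
```

With these invented numbers the boosted tree ranks delayed flights above on-time ones more consistently (AUC 0.9375 versus 0.6875), which is the kind of comparison the ROC and AUC results support; precision and recall additionally depend on the chosen 0.5 threshold.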