Data science is ultimately about presenting insights to end users as simply as possible. You work on a machine learning or deep learning model, from data cleaning to hyperparameter tuning, only to realize that the most important task, presenting it to end users, has not even started. Here I discuss a quick and easy way to deploy ML models using Jupyter Notebook and Tableau.
We will use the Titanic dataset available on Kaggle to build a Random Forest model. The goal of the project is to predict whether a passenger is likely to survive the Titanic disaster. As independent variables we will use demographic features such as Age, Gender, and sibling count, along with the passenger's ticket class.
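A minimal sketch of this step, using a few inline rows in the standard Kaggle `train.csv` layout to stand in for the real file (the values below are illustrative, not from the actual dataset):

```python
import io
import pandas as pd

# A few rows mimicking the Kaggle train.csv layout (illustrative values only)
csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Fare\n"
    "1,0,3,Braund,male,22,1,0,7.25\n"
    "2,1,1,Cumings,female,38,1,0,71.2833\n"
    "3,1,3,Heikkinen,female,26,0,0,7.925\n"
)
train = pd.read_csv(csv)  # in practice: pd.read_csv("train.csv")

# Six independent variables: ticket class plus demographics
features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]
X = train[features].copy()
X["Sex"] = X["Sex"].map({"male": 0, "female": 1})  # encode Gender numerically
y = train["Survived"]
```

The same selection and encoding carries over unchanged when you point `pd.read_csv` at the real `train.csv`.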
The dataset contains 891 rows, with 177 missing values in Age. We replace the missing Ages with random numbers drawn from within one standard deviation of the mean. Similarly, we replace the NaNs in Fare with 0.
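One way to implement this imputation, sketched on a toy frame whose columns match the Titanic dataset but whose values are made up; the uniform draw on [mean − std, mean + std] is my reading of "random numbers within one standard deviation of the mean":

```python
import numpy as np
import pandas as pd

# Toy frame standing in for train.csv (illustrative values only)
train = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Fare": [7.25, 71.28, np.nan, 53.10, 8.05],
})

# Replace missing Ages with random draws within one standard
# deviation of the observed mean
rng = np.random.default_rng(0)
age_mean, age_std = train["Age"].mean(), train["Age"].std()
n_missing = train["Age"].isna().sum()
train.loc[train["Age"].isna(), "Age"] = rng.uniform(
    age_mean - age_std, age_mean + age_std, size=n_missing
)

# Missing Fares are simply set to 0
train["Fare"] = train["Fare"].fillna(0)
```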
We build a Random Forest classifier with the default parameters. It achieves an accuracy of ~94.5% with just 6 variables, which suggests it may have overfitted the training data. However, our focus is on deploying the trained model quickly, so we will not chase the best evaluation metrics such as Precision or Recall; there are plenty of other resources on Kaggle that focus on that. Here we keep a simple model, without cross-validation or parameter tuning.
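The modelling step itself is a one-liner with scikit-learn. The sketch below uses synthetic data in place of the real passenger records (the target here is a toy rule loosely tied to Sex and Pclass, so the accuracy it prints says nothing about the real dataset):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the six predictors (illustrative only)
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.integers(0, 2, n),  # 0 = male, 1 = female
    "Age": rng.uniform(1, 80, n),
    "SibSp": rng.integers(0, 5, n),
    "Parch": rng.integers(0, 5, n),
    "Fare": rng.uniform(0, 500, n),
})
# Toy target loosely mimicking the real survival pattern
y = ((X["Sex"] == 1) | (X["Pclass"] == 1)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier()  # default parameters, no tuning
model.fit(X_train, y_train)
print(f"Holdout accuracy: {model.score(X_test, y_test):.3f}")
```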
We need Python 2.x or Python 3.x installed on our machine to start the deployment, or you can use Anaconda to install Jupyter Notebook along with Python. We also need to install TabPy, which launches a server that can host the deployed model; you can do that by following the steps here. If the installation command seems to hang, close the command prompt and try again in a new one. I had to do this a couple of times to get it right.
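For reference, recent TabPy releases ship on PyPI, so a typical install and server start looks like the following (the package name and the default port 9004 reflect current TabPy releases; older versions used a different package name, so check the TabPy docs for your version):

```shell
# Install TabPy into the current Python environment
pip install tabpy

# Launch the TabPy server; by default it listens on http://localhost:9004
tabpy
```

Leave this server running; the notebook will connect to it when deploying the model.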