
Automated Deployment of TensorFlow Models with TensorFlow Serving and GitHub Actions


September 27, 2022 — Posted by Chansung Park and Sayak Paul (ML-GDEs)
If you are an application developer, or if your organization doesn’t have a dedicated ML engineering team, it is common to deploy a machine learning model without worrying about the end-to-end machine learning pipeline or MLOps. TFX and TensorFlow Serving can help you create the heart of an MLOps infrastructure.
In this post, we will share how we serve a TensorFlow image classification model as RESTful and gRPC-based services with TensorFlow Serving on a Kubernetes (k8s) cluster running on Google Kubernetes Engine (GKE) through a set of GitHub Actions workflows.
Overview
In any GitHub project, you can make releases, with up to 2 GB of assets included in each release when using a free account. This is a good place to manage different versions of machine learning models. You can also replace this with a more private component for managing model versions, such as Google Cloud Storage buckets. For our purposes, the 2 GB of space provided by GitHub Releases is enough.
Figure 1. Three steps to deploy TF Serving on GKE (original).
The basic idea is to:
Automatically detect a newly released version of a TensorFlow-based ML model in GitHub Releases
Build a custom TensorFlow Serving Docker image containing the released ML model
Deploy it on a k8s cluster running on GKE through a set of GitHub Actions.
The entire workflow can be logically divided into three subtasks, so it’s a good idea to write three separate composite GitHub Actions:
The first subtask handles the environment setup (a sketch of this action follows the list below)
GCP Authentication (GCP credentials are injected from the GitHub Action Secret)
Install gcloud CLI toolkit to access the GKE cluster for the third subtask
Authenticate Docker to push images to the Google Container Registry (GCR)
Connect to a designated GKE cluster for further access
The second subtask builds a custom TensorFlow Serving image
Download and extract your latest released SavedModel from your GitHub repository
Run the official or a custom-built TensorFlow Serving Docker image
Copy the extracted SavedModel into the running TensorFlow Serving Docker container
Commit the changes to the running container and give the resulting image a new name that encodes the GCR host, the GCP project ID, and the latest tag
Push the committed image to GCR
The third subtask deploys the custom-built TensorFlow Serving image to the GKE cluster (see the second sketch below)
Download the Kustomize toolkit to handle overlay configurations
Pick one of the scenarios from the various experiments
Apply Deployment, Service, and ConfigMap according to the selected experiment to the currently connected GKE cluster
ConfigMap is used for batching-enabled scenarios to inject batching configurations dynamically into the Deployment.
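Below is a minimal sketch of what the first (environment setup) composite action could look like. The google-github-actions actions named here are real, but the versions, input names, and the exact wiring are assumptions for illustration; consult each action's documentation and the repository's actual action.yml before relying on it.

```yaml
# action.yml: environment setup subtask (illustrative sketch, not the exact action from the repository)
name: "Setup GCP environment"
description: "Authenticate to GCP, install gcloud, connect to GKE, and configure Docker for GCR"

inputs:
  gcp-credentials:
    description: "Service account key JSON, injected from a GitHub Action Secret (assumed input)"
    required: true
  gke-cluster:
    description: "Name of the target GKE cluster (assumed input)"
    required: true
  gke-location:
    description: "Zone or region of the cluster (assumed input)"
    required: true

runs:
  using: "composite"
  steps:
    # GCP authentication with the injected service-account key
    - uses: google-github-actions/auth@v0
      with:
        credentials_json: ${{ inputs.gcp-credentials }}

    # Install the gcloud CLI toolkit
    - uses: google-github-actions/setup-gcloud@v0

    # Connect kubectl to the designated GKE cluster for the deployment subtask
    - uses: google-github-actions/get-gke-credentials@v0
      with:
        cluster_name: ${{ inputs.gke-cluster }}
        location: ${{ inputs.gke-location }}

    # Authenticate Docker so images can be pushed to GCR
    - run: gcloud auth configure-docker --quiet
      shell: bash
```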
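Likewise, here is a compressed sketch of the third (deployment) subtask as plain shell commands. The overlay directory layout (.kube/overlays/&lt;experiment&gt;) and the EXPERIMENT variable are hypothetical placeholders; only the Kustomize install script URL is the documented one.

```bash
# Download the Kustomize toolkit (official install script; places a
# kustomize binary in the current directory)
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash

# Pick one experiment scenario and render its overlay
# (hypothetical layout: .kube/overlays/<experiment>/ holds the
# Deployment, Service, and ConfigMap for that scenario)
./kustomize build ".kube/overlays/${EXPERIMENT}" > manifests.yaml

# Apply the rendered manifests to the currently connected GKE cluster
kubectl apply -f manifests.yaml
```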
There are a number of parameters that you can customize, such as the GCP project ID, the GKE cluster name, the repository where the ML model will be released, and so on. The full list of parameters can be found here. As noted above, the GCP credentials should be set as a GitHub Action Secret beforehand. If the entire workflow completes without any errors, you will see something similar to the output below.
NAME         TYPE            CLUSTER-IP      EXTERNAL-IP     PORT(S)                            AGE
tfs-server   LoadBalancer    xxxxxxxxxx      xxxxxxxxxx       8500:30869/TCP,8501:31469/TCP      23m
The combination of the EXTERNAL-IP and the PORT(S) values gives the endpoints where external users can connect to the TensorFlow Serving pods in the k8s cluster. As you can see, two ports are exposed: 8500 and 8501 are for the gRPC and RESTful services, respectively. One thing to note is that we used LoadBalancer as the service type, but you may want to consider using Ingress controllers, such as GKE Ingress, to secure the k8s clusters with SSL/TLS and to define more flexible routing rules in production. You can check out the complete logs from the past runs.
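For example, once the service is up, the RESTful endpoint can be queried with a simple request. The /v1/models/&lt;name&gt;:predict path is TensorFlow Serving's standard REST API, but the model name (resnet) and the request payload below are hypothetical; the actual values depend on the MODEL_NAME baked into the serving image and on the model's signature.

```bash
# Query TensorFlow Serving's REST API (port 8501) for predictions.
# EXTERNAL_IP is the address from the kubectl output above; "resnet" and
# the instances payload are placeholders for your model and its inputs.
curl -X POST "http://${EXTERNAL_IP}:8501/v1/models/resnet:predict" \
  -H "Content-Type: application/json" \
  -d '{"instances": [{"b64": "<base64-encoded image bytes>"}]}'
```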
Build a Custom TensorFlow Serving Image within a GitHub Action
As described in the overview and the official documentation, a custom TensorFlow Serving Docker image can be built in five steps. We also provide a notebook for local testing of these steps. In this section, we show how to write a composite GitHub Action for this partial subtask of the whole workflow (note that .inputs, .env, and ${{ }} for the environment variables are omitted for brevity).
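A shell-level sketch of those five steps, mirroring the second subtask above and the pattern from the TensorFlow Serving documentation, might look as follows; the model name (my_model), the image name, and the GCP_PROJECT_ID variable are placeholders.

```bash
# 1. Extract the downloaded release asset into a local model directory
mkdir -p my_model && tar -xzf saved_model.tar.gz -C my_model

# 2. Run the official TensorFlow Serving image as a background container
docker run -d --name serving_base tensorflow/serving

# 3. Copy the extracted SavedModel into the running container
docker cp my_model serving_base:/models/my_model

# 4. Commit the container as a new image tagged for GCR, with MODEL_NAME
#    pointing TensorFlow Serving at the copied model
docker commit --change "ENV MODEL_NAME my_model" serving_base \
  "gcr.io/${GCP_PROJECT_ID}/tfserving-my-model:latest"

# 5. Push the committed image to GCR
docker push "gcr.io/${GCP_PROJECT_ID}/tfserving-my-model:latest"
```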
First, the model is downloaded by the external robinraju/release-downloader GitHub Action, given the URL of the GitHub repository and the filename to look for in the list of assets of the latest release. The default filename is saved_model.tar.gz.
Second, the downloaded file should be decompressed to fetch the actual SavedModel that TensorFlow Serving can understand.
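A minimal sketch of the corresponding composite action, covering these first two steps, is shown below. The robinraju/release-downloader action is the one named above, but its version, the input values, and the extraction paths are assumptions for illustration; check the action's README and the repository's actual action.yml for the exact usage.

```yaml
# action.yml: build-image subtask, first two steps only (illustrative sketch)
runs:
  using: "composite"
  steps:
    # Download saved_model.tar.gz from the latest release of the model repository
    - name: Download the latest SavedModel release
      uses: robinraju/release-downloader@v1.3
      with:
        repository: ${{ inputs.model-release-repo }}  # assumed input, e.g. "owner/repo"
        fileName: saved_model.tar.gz                  # default asset name from the text above
        latest: true

    # Decompress the archive to obtain the SavedModel that TF Serving understands
    - name: Extract the SavedModel
      shell: bash
      run: |
        mkdir -p saved_model
        tar -xzf saved_model.tar.gz -C saved_model
```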
