
The Data Daily

A Hands-On Introduction to Transfer Learning

Editor’s note: Tamoghna is a speaker for ODSC West 2022 this November 1st to 3rd. Be sure to check out his talk, “A Hands-on Introduction to Transfer Learning,” there!

To learn a new task, we humans need not always start afresh; we can apply previously learned knowledge. In the same way, “Transfer Learning” (TL) allows a machine learning model to port the knowledge acquired while training on one task to a new task. TL is mostly used to obtain well-performing machine learning models in settings where high-quality labeled data is scarce.

How do we determine whether a model trained in a source domain (with abundant data) can be adapted to a target domain (with scarce data)? This is termed the task-relatedness challenge. Domain knowledge about the source and target domains can help determine task relatedness.

TL is mostly used with deep learning models, and the following are the main types of transfer learning:

Representational transfer: Different layers in a deep neural network capture different sets of features. The intermediate layers of the source-domain model can be used as a feature extractor, and an ML classifier can be trained on the small target-domain dataset using the features extracted by the source model. For example, consider an image classification task of categorizing an image into three classes based on lighting exposure: over-exposed, under-exposed, or normal. Suppose we have very few annotated images. We can take any state-of-the-art CNN pretrained on ImageNet, use it as a feature extractor, and train a simple ML model on these features. Which layers of the network make the best feature extractor? As the target task is to classify images based on brightness/contrast, the lower-level layers of the CNN, which act as color, texture, and brightness filters, may be a good choice. Global pooling along the channel dimension of these filters can also reduce the feature dimension. (A code sketch follows this list.)

Fine-tuning: The features of the source and target domains are drawn from a common distribution, and the difference comes from a sample selection bias. For high task relatedness, we may fine-tune the final layers of the source-domain model, or even fine-tune the entire model. For example, while training an autonomous driving system, we may train the model initially on massive synthetic datasets generated by a high-fidelity simulator, and then further refine it on a small dataset collected in the real world. However, if the simulator generates data modeling the streets of Manhattan and the real-world dataset comes from the streets of Bangalore, the transfer becomes really challenging. Ways to mitigate such challenges include adversarial domain adaptation and divergence-based domain adaptation. (A fine-tuning sketch appears below.)

Multi-task learning: The data from both the source and target domains are used for training. Along with the loss derived from the target labels, a domain-confusion loss is used, which prevents the model from differentiating between the two domains. The source and target tasks can be learned simultaneously (i.e., dual-task learning). (A domain-confusion sketch appears below.)
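Below is a minimal sketch of representational transfer, assuming PyTorch/torchvision and scikit-learn; the exposure-classification data and the variable names (X_small, y_small, X_new) are hypothetical placeholders. An ImageNet-pretrained ResNet serves as a frozen feature extractor, and a simple logistic-regression classifier is trained on the pooled features.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from sklearn.linear_model import LogisticRegression

# Load an ImageNet-pretrained backbone and drop its classification head,
# keeping everything up to (and including) the global average-pooling layer.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()  # frozen: the source model is never updated

# Standard ImageNet preprocessing for the target-domain images.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(images):
    """images: a preprocessed batch of shape (N, 3, 224, 224)."""
    feats = feature_extractor(images)       # (N, 512, 1, 1) after global pooling
    return feats.flatten(1).cpu().numpy()   # (N, 512)

# Hypothetical small labeled set for the exposure task
# (labels: 0 = over-exposed, 1 = under-exposed, 2 = normal):
# clf = LogisticRegression(max_iter=1000).fit(extract_features(X_small), y_small)
# preds = clf.predict(extract_features(X_new))
```

For the brightness/contrast task described above, one could instead truncate the backbone after an earlier block and apply global pooling there, since the lower-level filters carry most of the relevant color and brightness information.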
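A corresponding fine-tuning sketch (same PyTorch assumptions; the data loader and training loop are omitted): the pretrained head is replaced with a new 3-class head, and either only the head or the entire network is updated on the small target-domain set.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Option 1: freeze the pretrained layers and train only a new classification head.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 3)  # new head; trainable by default

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One hypothetical training step on a small target-domain batch."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Option 2 (higher task relatedness): unfreeze all parameters and fine-tune the
# whole network with a much smaller learning rate, e.g.
# for p in model.parameters(): p.requires_grad = True
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```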
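Finally, a compact sketch of the domain-confusion idea using a gradient-reversal layer, in the style of adversarial domain adaptation (DANN); the dimensions and class names are illustrative. The task head is trained on labeled source data while a domain discriminator, fed through reversed gradients, pushes the shared features toward domain invariance.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lam on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DomainConfusionNet(nn.Module):
    def __init__(self, feat_dim=512, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(feat_dim, 256), nn.ReLU())
        self.task_head = nn.Linear(256, num_classes)  # target-task classifier
        self.domain_head = nn.Linear(256, 2)          # source vs. target discriminator

    def forward(self, x, lam=1.0):
        f = self.features(x)
        return self.task_head(f), self.domain_head(GradReverse.apply(f, lam))

# Joint objective per batch (sketch): task loss on labeled source samples plus a
# domain-classification loss on samples from both domains; the reversed gradient
# makes the shared features hard to tell apart across domains.
# class_logits, dom_logits = model(x, lam=0.1)
# loss = ce(class_logits[source_mask], y_source) + ce(dom_logits, domain_labels)
```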

When there is absolutely no data in the target domain, can we still do transfer learning? Zero-shot learning and few-shot learning are extreme variants of TL that address this.
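As a tiny illustration of the few-shot setting (pure NumPy; the support/query arrays are hypothetical), one common baseline is a nearest-class-mean classifier built on top of a frozen feature extractor such as the one sketched earlier, using only a handful of labeled examples per class:

```python
import numpy as np

def build_prototypes(support_features, support_labels):
    """Average the (few) labeled support features per class -> one prototype per class."""
    classes = np.unique(support_labels)
    prototypes = np.stack([support_features[support_labels == c].mean(axis=0) for c in classes])
    return classes, prototypes

def predict(query_features, classes, prototypes):
    """Assign each query to the class of its nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(query_features[:, None, :] - prototypes[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]
```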

We will discuss each of these transfer learning variants in the tutorial session, along with hands-on code, and cover some of the latest applications of TL.

Tamoghna Ghosh is an AI Solution Architect in the Client Computing Group at Intel, building next-generation AI solutions for edge computing. Prior to this role, he worked as a data scientist at Intel on various domains such as supply chain inventory optimization, anomaly detection and failure prediction for IT infrastructure across Intel, and building advanced search tools for bug sightings, to name a few.
