Fast Fine-Tuning of AI Transformers Using RAPIDS Machine Learning

In recent years, transformers have emerged as a powerful deep neural network architecture that has been proven to beat the state of the art in many application domains, such as natural language processing (NLP) and computer vision.

This post uncovers how you can achieve maximum accuracy with the fastest training time possible when fine-tuning transformers. We demonstrate how the cuML support vector machine (SVM) algorithm, from the RAPIDS Machine Learning library, can dramatically accelerate this process. cuML SVM on GPU is up to 500x faster than the CPU-based implementation. This approach uses SVM heads instead of the conventional multi-layer perceptron (MLP) head, making it possible to fine-tune with precision and ease.

A transformer is a deep learning model built from many multi-head self-attention and feedforward fully connected layers. It is mainly used for sequence-to-sequence tasks, including NLP tasks such as machine translation and question answering, and computer vision tasks such as object detection.

Training a transformer from scratch is a compute-intensive process, often taking days or even weeks. In practice, fine-tuning is the most efficient way of applying pretrained transformers to new tasks, thereby reducing training time.

As shown in Figure 1, transformers have two distinct components:

- A backbone, the stack of attention and feedforward layers that extracts feature embeddings from the input
- A head, a lightweight module that maps those embeddings to the task output, whether class labels or regression values

During fine-tuning, the backbone network of the transformer is frozen while only the lightweight head module is trained for the new task. The most common choice for the head module is a multi-layer perceptron (MLP) for both classification and regression tasks.
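As a rough sketch of what this conventional setup looks like in PyTorch (the ResNet backbone, layer sizes, and hyperparameters here are illustrative placeholders, not the models used in the post):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Illustrative pretrained backbone; any feature extractor works the same way.
backbone = resnet50(weights="IMAGENET1K_V2")
backbone.fc = nn.Identity()          # expose the 2048-dim embedding

# Freeze every backbone parameter so only the head is trained.
for param in backbone.parameters():
    param.requires_grad = False
backbone.eval()

# Lightweight MLP head for a single regression target.
head = nn.Sequential(
    nn.Linear(2048, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.MSELoss()

def training_step(images, targets):
    """One optimization step on a batch of (images, targets)."""
    with torch.no_grad():            # the frozen backbone only extracts features
        embeddings = backbone(images)
    preds = head(embeddings).squeeze(1)
    loss = criterion(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```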

As it turns out, implementing and tuning an MLP head can be much harder than it looks. Why is that? The head introduces several interacting hyperparameters, such as the number and width of hidden layers, the learning rate, the regularization, and the number of training epochs, and with only a small labeled dataset it can easily overfit.

Support vector machines (SVMs) are one of the most popular supervised learning methods, and they are most potent when meaningful, predictive features are available. This is especially true with high-dimensional data, thanks to SVM’s robustness against overfitting.

Yet data scientists have sometimes been hesitant to try SVMs, largely because classic CPU-based implementations can be prohibitively slow to train on large datasets.

RAPIDS cuML revives interest in this classic model by providing a speedup of up to 500x on GPUs. With RAPIDS cuML, SVM is gaining popularity again in the data science community.

For example, RAPIDS cuML SVM notebooks have been used frequently in several Kaggle competitions.
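cuML’s SVM estimators closely follow the scikit-learn interface, so moving from CPU to GPU is usually a one-line change. A minimal sketch on synthetic data (the dataset and hyperparameters are placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC as SkSVC
from cuml.svm import SVC as CuSVC   # GPU-accelerated, scikit-learn-like interface

# Synthetic stand-in data; real workloads would use transformer embeddings.
X, y = make_classification(n_samples=20_000, n_features=512, random_state=0)
X = X.astype(np.float32)

cpu_svc = SkSVC(C=1.0, kernel="rbf").fit(X, y)   # CPU baseline
gpu_svc = CuSVC(C=1.0, kernel="rbf").fit(X, y)   # same call pattern, runs on GPU

print(gpu_svc.predict(X[:5]))
```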

As transformers have already learned to extract meaningful representations in the form of long embedding vectors, cuML SVM is an ideal candidate for the head classifier or regressor.

When compared to the MLP head, the cuML SVM head has the following advantages:

- It is faster to train and to run for inference on GPU
- It has far fewer hyperparameters to tune
- It is more robust against overfitting on high-dimensional embeddings, which often translates into better accuracy

This proposed fine-tuning methodology with SVM heads applies to both NLP and computer vision tasks. To demonstrate this, we looked at the PetFinder.my Pawpularity Contest, a Kaggle data science competition whose goal was to predict the popularity of shelter pets based on their photos.

The dataset used for this project consists of 10,000 hand-labeled images, each with a target pawpularity score that we aimed to predict. With pawpularity values ranging from 0 to 100, we treated this as a regression problem.

As there are only 10,000 labeled images, it is impractical to train a deep neural network from scratch to high accuracy. Instead, we used a pretrained Swin transformer backbone and fine-tuned it with the labeled pet images.

As shown in Figure 2, our approach requires three steps:

1. Fine-tune the whole network, backbone included, on the labeled pet images.
2. Freeze the backbone, extract an embedding for each image, and train the head on those embeddings (sketched below).
3. Ensemble the trained heads to produce the final predictions.
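A rough sketch of step 2, assuming a timm Swin backbone for feature extraction and cuml.svm.SVR as the regression head; the exact model, preprocessing, data loaders, and hyperparameters in the competition notebook may differ:

```python
import numpy as np
import timm
import torch
from cuml.svm import SVR

# Pretrained Swin transformer backbone; num_classes=0 makes it return pooled embeddings.
backbone = timm.create_model("swin_base_patch4_window7_224", pretrained=True, num_classes=0)
backbone.eval().cuda()

@torch.no_grad()
def extract_embeddings(loader):
    """Run the frozen backbone over a DataLoader of (images, pawpularity) batches."""
    feats, targets = [], []
    for images, pawpularity in loader:
        feats.append(backbone(images.cuda()).cpu().numpy())
        targets.append(pawpularity.numpy())
    return (np.concatenate(feats).astype(np.float32),
            np.concatenate(targets).astype(np.float32))

# train_loader / valid_loader are assumed to yield normalized 224x224 image batches.
X_train, y_train = extract_embeddings(train_loader)
X_valid, y_valid = extract_embeddings(valid_loader)

# Step 2: train the SVM regression head on the frozen-backbone embeddings.
svr = SVR(C=10.0, kernel="rbf")
svr.fit(X_train, y_train)

preds = svr.predict(X_valid)
rmse = float(np.sqrt(np.mean((preds - y_valid) ** 2)))
print(f"validation RMSE: {rmse:.3f}")
```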

It is worth noting that steps 1 and 3 are optional and were implemented here to optimize the model’s score for this competition. Step 2 alone is the most common scenario for fine-tuning. For this reason, we measured the run time at step 2 and compared three options: cuML SVM (GPU), sklearn SVM (CPU), and PyTorch MLP (GPU). The results are shown in Figure 3.

The runtimes are normalized to sklearn SVM: cuML SVM achieved a 15x speedup for training and a 28.18x speedup for inference. Notably, cuML SVM is even faster than the PyTorch MLP thanks to its high GPU utilization. The notebook can be found on Kaggle.
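One way to run this kind of comparison yourself is sketched below; the synthetic embeddings, sizes, and hyperparameters are placeholders, and absolute timings will vary with hardware and data:

```python
import time
import numpy as np
from sklearn.svm import SVR as SkSVR
from cuml.svm import SVR as CuSVR

# Synthetic stand-ins for transformer embeddings and pawpularity targets.
rng = np.random.default_rng(0)
X = rng.standard_normal((8_000, 1024), dtype=np.float32)
y = rng.uniform(0, 100, size=8_000).astype(np.float32)

def timed(label, fn):
    """Run fn(), print its wall-clock time, and return its result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f} s")
    return result

cpu_svr = timed("sklearn SVR fit", lambda: SkSVR(C=10.0).fit(X, y))
gpu_svr = timed("cuML SVR fit", lambda: CuSVR(C=10.0).fit(X, y))

timed("sklearn SVR predict", lambda: cpu_svr.predict(X))
timed("cuML SVR predict", lambda: gpu_svr.predict(X))
```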

Transformers are revolutionary deep learning models, but training them is time-consuming. Fast fine-tuning of transformers on a GPU can benefit many applications by providing significant speedup. RAPIDS cuML SVM can also be used as a drop-in replacement for the classic MLP head, as it is both faster and more accurate.

GPU acceleration infuses new energy into classic ML models like SVM. With RAPIDS, it is possible to combine the best of both worlds: classic machine learning (ML) models and cutting-edge deep learning (DL) models. In RAPIDS cuML, you will find more lightning-fast and easy-to-use models.

By the time this post was written and edited, the PetFinder.my Pawpularity Contest had concluded. NVIDIA KGMON Gilberto Titericz won first place using RAPIDS SVM. His winning solution was to concatenate embeddings from transformers and other deep CNNs and use RAPIDS SVM as the regression head. For more information, see his winning solution write-up.
