Prompt learning is an emerging method for adapting AI foundation models to perform specific downstream tasks. It promises to make it easier to tailor AI to highly specific applications that have few available training examples.
Prompt learning, also referred to as "prompt-based learning," is an emerging strategy that allows pre-trained AI models, also known as "foundation models," to be re-purposed for additional tasks without re-training the entire model.
Foundation models are initially trained with massive amounts of unstructured data and then fine-tuned with labeled data for specific tasks. However, this approach requires introducing new parameters into the model. For example, fine-tuning a large BERT language model to perform binary classification would require adding a new classification head with 1,024 x 2 parameters.
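A minimal sketch of where those extra parameters come from, assuming PyTorch and the Hugging Face transformers library (the "bert-large-uncased" checkpoint is an illustrative choice): a BERT-large encoder has a hidden size of 1,024, so a binary classification head introduces a new 1,024 x 2 weight matrix (plus biases).

```python
# Fine-tuning for binary classification adds a new, randomly initialized head on top
# of the pre-trained encoder. The checkpoint name below is an illustrative assumption.
import torch
from transformers import AutoModel

encoder = AutoModel.from_pretrained("bert-large-uncased")    # pre-trained, hidden size 1,024
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)  # new 1,024 x 2 weights (+ 2 biases)

print(sum(p.numel() for p in classifier.parameters()))       # 2,050 newly introduced parameters
```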
In contrast, prompt-based learning allows engineers to achieve the same ends without introducing new parameters. Instead, natural language text cues, called "prompts," are injected into the model's inputs so that a downstream task takes the same form as the task the model learned during pre-training. Their purpose is to provide context for a variety of potential downstream tasks. (Also read: Foundation Models: AI's Next Frontier.)
A prompt is contextual, natural language text relevant to a specific task. For example, if engineers want a large language model to judge whether a movie is worth watching, they might append the template "It is [blank]." to a movie review and ask the model to fill in the blank.
If engineers supply enough contextual prompts like this, the model can be re-used, without any new parameters, to predict whether the blank should contain the word "recommended" or the words "not recommended."
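The following sketch shows the fill-in-the-blank idea in code, assuming the Hugging Face transformers library and the "bert-base-uncased" checkpoint (both illustrative choices). It scores single-token answer words at the masked position; a multi-word answer such as "not recommended" would need multi-token scoring, which is omitted here.

```python
# Discrete-prompt inference: append the template "It is [MASK]." to a review and compare
# the masked language model's scores for candidate answer words. No new parameters are added.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

review = "The plot dragged and the acting was flat."   # illustrative input
prompt = f"{review} It is [MASK]."                      # the template "It is [blank]."

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

# Compare scores for illustrative single-token answer words.
for word in ["recommended", "terrible"]:
    token_id = tokenizer.convert_tokens_to_ids(word)
    print(word, logits[token_id].item())
```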
The above example, in which a large language model (LLM) is asked to categorize a movie as worth watching using the template "It is [blank]," is a "discrete prompt." Discrete prompts can be designed either manually, using prompt engineering, or automatically, using methods like AutoPrompt. When tuning with discrete prompts, the prompts are kept fixed and the pre-trained model is tuned.
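One common way to tune a model with a fixed discrete prompt is sketched below, assuming Hugging Face transformers; the checkpoint, learning rate, example review and answer word are illustrative assumptions. The template text never changes, while the pre-trained model's weights are updated so it learns to fill the blank with the correct answer word.

```python
# Tuning with a fixed discrete prompt: the template stays fixed, the model is fine-tuned.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)   # the whole model is tuned

review, answer_word = "A moving, beautifully shot film.", "recommended"  # toy labeled example
inputs = tokenizer(f"{review} It is [MASK].", return_tensors="pt")

labels = inputs.input_ids.clone()
labels[labels != tokenizer.mask_token_id] = -100             # compute the loss only at the mask
labels[labels == tokenizer.mask_token_id] = tokenizer.convert_tokens_to_ids(answer_word)

loss = model(**inputs, labels=labels).loss   # teach the model to fill the blank with "recommended"
loss.backward()
optimizer.step()
```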
In contrast, "soft prompts" are essentially random vectors injected in the input sequence. When tuning soft prompts, the pre-trained model is kept fixed and prompts are fine-tuned.
Prompt-based learning bridges the gap between a model's pre-training phase and its use for multiple downstream tasks. But despite these advantages, it presents a few challenges.
In prompt-based learning, it can be difficult to design good prompts, extend the approach to new domains and find the right combination of template and answer.

Though researchers have proposed both manual and automated methods for creating prompts, both approaches require considerable expertise and experimentation.
Prompt-based learning has only been explored for a limited set of application domains, such as text classification, question answering and common-sense reasoning. Other domains, such as text analysis, information extraction and analytical reasoning, would require more sophisticated prompt design methods. (Also read: Data-Centric vs. Model-Centric AI: The Key to Improved Algorithms.)
Prompt-based learning is also highly dependent on both the prompt template (e.g., "It is [blank]") and the candidate answers (e.g., "recommended"). As a result, searching for an optimal combination of template and answers remains challenging and requires a lot of trial and error.
In spite of these challenges, prompt learning is rapidly emerging as the next evolution in training foundation models. To explain why, we need to zoom out a bit.
The first machine learning models were trained with supervised learning. Supervised learning uses labeled data sets and correct output samples to teach a learning algorithm how to classify data or predict an outcome. However, it can be difficult to find enough labeled data to use this method consistently.
As a result, feature engineering became a crucial component of the machine learning pipeline. Feature engineering extracts the most important features from raw data and uses them to guide the model during training. Traditionally, researchers and engineers have used their domain knowledge to decide what counts as the "most important" features. In recent years, however, the advent of deep learning has replaced traditional "hands-on" feature engineering with automatic feature learning. (Also read: Why is feature selection so important in machine learning?)
But that brought us back to square one: large labeled data sets for training machine learning models are still too scarce.
Self-supervised learning (SSL) is one possible solution to this dilemma. In this type of unsupervised learning, the model generates its own supervisory signals from unlabeled data and uses the learned representations for downstream tasks. The advent of SSL has enabled researchers to train AI models at scale, particularly for natural language processing (NLP). It has also given rise to foundation models: pre-trained deep learning models that can be adapted to complete a wide variety of tasks.
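To make the "self-defined signals" concrete, the sketch below shows masked language modeling, one common SSL objective in NLP, assuming the Hugging Face transformers library; the masking rate of 15% follows common practice. The labels are simply the original tokens, so no human annotation is needed.

```python
# Self-supervision via masked language modeling: random tokens are hidden and the
# original tokens become the training targets, derived entirely from unlabeled text.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

batch = collator([tokenizer("Unlabeled text becomes its own training signal.")])
print(batch["input_ids"])   # some tokens randomly replaced by [MASK]
print(batch["labels"])      # original tokens at masked positions, -100 elsewhere
```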
The field of AI research is going through a paradigm shift in which, rather than training task-specific models, large foundation models are pre-trained on data sets at scale and then adapted to many downstream tasks.
By bridging the gap between pre-training and downstream tasks, prompt-based learning has made it more convenient to deploy pre-trained models for those tasks. This is especially useful where fine-tuning a pre-trained model is difficult because large labeled data sets are scarce. (Also read: The Top 6 Ways AI Is Improving Business Productivity.)