Large Language Models

Pre-training and Fine-tuning


Learning Objectives

  • You know the term generative pre-trained transformer.
  • You understand the distinction between pre-training and fine-tuning.

The introduction of the transformer architecture led to generative pre-trained transformers (GPTs), the models most often associated with the term large language model. The transformer architecture avoided some of the scalability problems present in earlier (e.g. RNN-based) models: the attention mechanism allowed the models to better capture the context of words, and the architecture made the training process as a whole easier to parallelize.
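
As a rough illustration of why attention parallelizes well, the following minimal NumPy sketch computes scaled dot-product self-attention for a toy sequence in a single pair of matrix multiplications. The array shapes and random data are purely illustrative and not part of any particular model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output row mixes information from every position in the sequence,
    weighted by attention. All positions are processed at once through matrix
    multiplications, which is easy to parallelize on modern hardware."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarities between positions
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V                              # context-aware representations

# Toy "sequence" of 4 token vectors, each with 8 dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # self-attention: (4, 8)
```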

The creation of GPT models involves two key phases: pre-training and fine-tuning. In the pre-training phase, the model is trained on large amounts of text data to generate text. Pre-training helps the model learn the syntax and grammar of the text, and it also builds a factual base that the model can draw on later.

Pre-training is a form of unsupervised machine learning: the training data does not need manually created labels, because the text itself provides them. In the context of GPTs, pre-training focuses on learning to predict the next word in a sequence (or to fill in a gap in the text) based on the surrounding words.
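
The following minimal sketch shows how such training examples can be derived from raw text (the example sentence is made up): the "label" for each position is simply the next word, so no manual annotation is needed.

```python
# Raw, unlabeled text is all that is needed for pre-training examples:
# the target for each position is simply the next word in the text itself.
text = "the transformer architecture made the training process easier to parallelize"
words = text.split()

# Each pre-training example pairs the context seen so far with the next word.
examples = [(words[:i], words[i]) for i in range(1, len(words))]

for context, target in examples[:3]:
    print(f"context={' '.join(context)!r} -> target={target!r}")
```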

After the pre-training phase, the pre-trained model is further trained to handle specific tasks; this is the fine-tuning phase. Fine-tuning improves the performance of the model on specific tasks such as question answering, translation, and summarization.

Fine-tuning is a form of supervised machine learning: the model is trained on labeled examples of the desired input and output for a task. In the context of GPTs, fine-tuning focuses on training the models to perform well on specific tasks.
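
As a sketch of what this can look like in practice, the snippet below fine-tunes a small pre-trained model on a couple of labeled question-answering examples. The use of the Hugging Face transformers library, the gpt2 checkpoint, and the toy examples are assumptions made for illustration; the course does not prescribe any particular tooling.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a pre-trained model and its tokenizer (gpt2 is just an example checkpoint).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Labeled examples for a question-answering task (illustrative data).
examples = [
    "Question: What is the capital of Finland? Answer: Helsinki",
    "Question: What is the capital of Sweden? Answer: Stockholm",
]

model.train()
for text in examples:
    batch = tokenizer(text, return_tensors="pt")
    # With labels set to the input ids, the model computes the cross-entropy
    # loss for predicting each next token in the labeled example.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In a realistic setting the labeled examples would number in the thousands and come from a curated, task-specific dataset rather than a hand-written list.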

The first GPT models were introduced in 2018 and achieved state-of-the-art performance in a range of natural language processing tasks. As examples, see the articles Improving Language Understanding by Generative Pre-Training and BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

On the term GPT

It is worth noting that although GPT has become strongly associated with OpenAI, generative pre-trained transformer is a general term that refers to a class of models pre-trained on large amounts of text data and then fine-tuned for specific tasks.