Abundance of Large Language Models
Learning Objectives
- You know some of the key milestones in the development of large language models.
- You know some of the points in time when large language models gained wider awareness among the general public.
After the first large language models built on the transformer architecture, an understanding of the potential of such models grew. In 2019, more models were introduced, including GPT-2, T5, and an improved BERT variant called RoBERTa. At this point, large language models were most often developed by large companies, such as OpenAI, Google, and Microsoft, and the models were not widely available to the general public.
As the performance of the models increased, concerns about their misuse started to emerge. As an example, OpenAI gradually released larger and more capable versions of GPT-2 over time, initially withholding the largest model due to misuse concerns. The decision was also widely cited in the media, in part because it made for nice headlines.
See e.g. The AI Text Generator That’s Too Dangerous to Make Public.
In 2020, new models including GPT-3 and Turing-NLG were released. These models were again larger and more powerful than their predecessors. Public awareness of the models also continued to increase, as highlighted e.g. by the opinion article How Do You Know a Human Wrote This? in the New York Times.
In 2021 and especially in 2022, the number of available LLMs and their use increased. Plenty of effort was invested in making the outputs of the models more controllable and in reducing the amount of bias and hallucination in the generated content (discussed in the last part of this course).
The key moment in large language models hitting the mainstream was the release of ChatGPT in late 2022, which provided an easy interface for interacting with a large language model that followed instructions.
Click this Google Trends search link to see the search evolution for the terms “chatgpt” and “gpt” (and “ice cream”).
Since then, in 2023 and beyond, new large language models have been released almost every week (or at least it feels that way for a researcher trying to keep up with the developments). Only a few of the new releases gain a wider audience, however.
New models are also being trained for more specialized purposes, including working with specific languages, working in specific domains, and handling specific tasks. There is also an ongoing push to make the models more powerful. To highlight the growth in sheer parameter count, the first GPT model released in 2018 had 117 million parameters (weights and biases), GPT-3 had 175 billion parameters, and the current largest models have more than one trillion parameters (rumored to be the case for e.g. GPT-4).
As the parameters correspond to the weights and biases of neural networks, the increase in the number of parameters also means an increase in the computational resources required to train such models. It is no surprise that the value of companies producing the hardware used to train large language models has increased alongside the popularity of large language models.
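To give a rough sense of what these parameter counts mean in practice, below is a minimal back-of-envelope sketch in Python. It assumes each parameter is stored as a 16-bit number (2 bytes), a common choice; the parameter counts are the ones mentioned above, and actually training a model requires several times more memory for gradients, optimizer states, and activations.

```python
# Back-of-envelope estimate: memory needed just to store model parameters.
# Assumption: 2 bytes per parameter (16-bit floating point).
BYTES_PER_PARAM = 2

models = {
    "GPT (2018)": 117e6,                     # 117 million parameters
    "GPT-3 (2020)": 175e9,                   # 175 billion parameters
    "A one-trillion-parameter model": 1e12,  # rumored scale of the largest models
}

for name, n_params in models.items():
    gigabytes = n_params * BYTES_PER_PARAM / 1e9
    print(f"{name}: {n_params:,.0f} parameters ≈ {gigabytes:,.1f} GB for the weights alone")
```

Running the sketch shows that the 2018 GPT model fits in roughly a quarter of a gigabyte, while GPT-3 would need about 350 GB and a trillion-parameter model about 2,000 GB just to hold the weights, far beyond the memory of a single consumer GPU. This is one reason why training and serving such models relies on large clusters of specialized hardware.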
As an example, the stock price of NVIDIA has increased considerably since the beginning of 2023. If you wonder what happened to the price in June 2024, there was a 10-for-1 stock split, where all shareholders received nine new shares for each share they owned.