Neural Networks for Language
Overview
In this part, we outline the neural network-related work leading to contemporary language models. We start by introducing neural networks and machine-understandable representations of data and text, and then discuss the problems that arise when using neural networks with sequential data. Finally, the part introduces self-attention and transformers, which form the basis for many of the current large language models.
The chapters of this part are as follows.
- Basics of Neural Networks introduces the basic idea of neural networks.
- Training Neural Networks discusses how neural networks are trained.
- Embeddings and Word Representations introduces embeddings, in particular word embeddings, which are used to represent words in a machine-understandable way.
- Sequential Models (RNNs, LSTMs) discusses neural networks for processing sequential data and the challenges this involves.
- Attention Mechanisms describes the core idea of self-attention, which allows words to be processed in context.
- Transformers discusses the transformer architecture, which forms the basis for many of the current large language models.
- Summary summarizes the key takeaways from this part.