Training a Neural Network
Learning Objectives
- You can explain, at a conceptual level, how training with backpropagation adjusts the network.
A neural network with randomly initialized weights is essentially useless — it produces random outputs bearing no relationship to desired behavior. The network becomes useful only after training, which means adjusting the weights and biases so that the network’s predictions match desired outputs on training examples.
Training happens through a two-phase process that repeats many times over the training data:
Forward propagation
In the forward pass:
- Input data flows through the network layer by layer.
- Each node computes its weighted sum of inputs, adds its bias, and applies its activation function.
- Eventually, an output is produced at the final layer.
Example: the input value 5 flows through the network and produces the output 0.15.
This forward pass generates a prediction that can be compared to the correct answer.
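As a minimal sketch of this pass, the toy network below has one hidden node and one output node, each with a sigmoid activation. The weights and biases are made up, chosen only so that an input of 5 yields an output near 0.15, echoing the example above.

```python
import math

# Made-up weights and biases for a tiny network: one hidden node, one output node.
W_HIDDEN, B_HIDDEN = 0.4, -1.0
W_OUT, B_OUT = -2.0, -0.3

def sigmoid(z):
    # Activation function: squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    # Each node takes a weighted sum of its inputs, adds its bias,
    # and applies the activation function.
    hidden = sigmoid(W_HIDDEN * x + B_HIDDEN)
    output = sigmoid(W_OUT * hidden + B_OUT)
    return output

print(forward(5))   # roughly 0.15, matching the example above
```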
Backpropagation
After the forward pass, the network evaluates how accurate its prediction was:
The network compares its prediction (the output) to the correct answer (called the label or target). A loss function quantifies the difference as a single number, the error or loss.
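Here is a minimal sketch of that comparison, assuming a squared-error loss and assuming the correct answer in the running example was 1.0:

```python
def squared_error(prediction, target):
    # One common loss function: the squared difference between
    # the network's output and the correct answer (the label).
    return (prediction - target) ** 2

# Running example: the network predicted 0.15, but the label was 1.0.
print(squared_error(0.15, 1.0))   # roughly 0.72, a large error
```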
Using the mathematical technique called backpropagation, the network calculates how much each individual weight contributed to this error. This involves computing gradients: partial derivatives that measure how the error changes as each weight changes, and therefore in which direction, and roughly by how much, each weight should move to reduce the error.
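To make the idea of a gradient concrete, the sketch below approximates one numerically for a hypothetical one-weight model with made-up numbers: nudge the weight a tiny amount and watch how the loss responds. Backpropagation computes the same quantity exactly and far more efficiently.

```python
def loss_for_weight(w):
    # Hypothetical one-weight "network": prediction = w * x, squared-error loss.
    x, target = 5.0, 1.0
    prediction = w * x
    return (prediction - target) ** 2

# Approximate the gradient at w = 0.1 by nudging the weight up and down.
w, eps = 0.1, 1e-6
gradient = (loss_for_weight(w + eps) - loss_for_weight(w - eps)) / (2 * eps)
print(gradient)   # about -5.0: the loss falls as w increases, so w should move upward
```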
Each weight and bias is then adjusted slightly in the direction that reduces the error. This adjustment uses an optimization algorithm called gradient descent, or one of its variants.
The size of each adjustment is controlled by a learning rate, a small number (typically between 0.0001 and 0.1) that keeps individual updates small; overly large updates can make learning unstable or overshoot good solutions.
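A single update step then looks like the sketch below, reusing the made-up gradient from the sketch above; the learning rate scales how far the weight moves.

```python
def update_weight(weight, gradient, learning_rate=0.005):
    # Step the weight a small amount against its gradient,
    # i.e. in the direction that reduces the error.
    return weight - learning_rate * gradient

# Made-up values: a gradient of -5.0 nudges the weight upward, but only slightly.
print(update_weight(weight=0.1, gradient=-5.0))   # roughly 0.125
```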
This update process repeats for many training examples — often millions or billions of times across multiple passes through the training data (called epochs). Gradually, through this iterative adjustment, the network may learn weight values that allow it to perform its task effectively.
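Putting the pieces together, here is a tiny, self-contained training loop under the same assumptions (a one-weight model and a squared-error loss), with data made up so that the ideal weight is 0.2:

```python
# (input, label) pairs consistent with label = 0.2 * input.
data = [(5.0, 1.0), (2.0, 0.4), (-1.0, -0.2)]
w, learning_rate = 0.0, 0.01

for epoch in range(50):                           # each epoch is one pass over the data
    for x, label in data:
        prediction = w * x                        # forward pass
        gradient = 2 * (prediction - label) * x   # d(loss)/dw for squared error
        w -= learning_rate * gradient             # gradient descent update
print(w)                                          # converges toward 0.2
```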
An intuition for backpropagation
Consider teaching someone to throw darts at a target:
- The student throws and misses the bullseye by some distance and direction.
- You measure the error (how far off they were and in which direction).
- You provide specific feedback: “throw a bit higher,” “use less force,” “adjust your angle slightly right.”
- Each piece of feedback addresses a different aspect of their technique (analogous to different weights in different parts of the network).
- After many rounds of throws and feedback, the student’s aim improves gradually.
Backpropagation is the feedback mechanism: it tells each weight in the network how to adjust to improve the overall result. The “error” propagates backward through the network, informing every weight about its contribution to the mistake.
The mathematical elegance of backpropagation is that it can efficiently compute these gradients for millions or even billions of weights in a single backward pass, using the chain rule from calculus. Without backpropagation, training large neural networks would be computationally infeasible.
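As a minimal sketch of the chain rule at work, the code below backpropagates by hand through a stripped-down two-weight version of the earlier toy network (biases omitted, squared-error loss assumed). Real frameworks automate exactly this bookkeeping for every weight.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A stripped-down two-weight network: input -> hidden -> output (no biases).
x, label = 5.0, 1.0
w1, w2 = 0.4, -2.0                       # made-up weights

# Forward pass, keeping the intermediate values needed for the backward pass.
hidden = sigmoid(w1 * x)
output = sigmoid(w2 * hidden)
loss = (output - label) ** 2

# Backward pass: the chain rule multiplies the "local" derivatives along the
# path from the loss back to each weight.  (sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)))
d_loss_d_output = 2 * (output - label)
d_output_d_w2 = output * (1 - output) * hidden
d_output_d_hidden = output * (1 - output) * w2
d_hidden_d_w1 = hidden * (1 - hidden) * x

grad_w2 = d_loss_d_output * d_output_d_w2                      # dLoss/dw2
grad_w1 = d_loss_d_output * d_output_d_hidden * d_hidden_d_w1  # dLoss/dw1
print(grad_w1, grad_w2)   # the gradients a framework would compute automatically
```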