Probabilities and Language

Overview
In this part, we introduce the concept of language models. We start by discussing the basics of probabilities and predictions. We then explore the foundational work of Andrey Markov and Claude Shannon, two pioneers whose research on stochastic processes and information theory laid the groundwork for modern language modeling. After that, we introduce Markov chains and word n-gram models, focusing on how they work, and finally discuss their limitations. This sets the stage for the more advanced models covered in the next part.

The chapters in this part are as follows:

  • Probabilities and Predictions introduces the concept of probability and shows how predictions can be made from data.
  • Towards Language Models defines the term language model and presents some of the foundational work by Markov and Shannon.
  • Markov Chains and n-gram Models introduces Markov chains and the word n-gram language model, with examples of how they work.
  • Limitations of Early Models discusses the shortcomings of early probabilistic language models, including data sparsity, fixed context, and lack of meaning.
  • Summary summarizes the key takeaways from this part.
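As a small preview of the n-gram models introduced in this part, the sketch below estimates bigram probabilities, i.e. P(next word | current word), by counting word pairs in a toy corpus. The corpus and function name are illustrative assumptions, not examples from the chapters themselves.

```python
from collections import defaultdict

# A tiny toy corpus (hypothetical, for illustration only).
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigram transitions: counts[w1][w2] = times w2 follows w1.
counts = defaultdict(lambda: defaultdict(int))
for w1, w2 in zip(corpus, corpus[1:]):
    counts[w1][w2] += 1

def next_word_prob(w1, w2):
    """Estimate P(w2 | w1) by relative frequency of the bigram (w1, w2)."""
    total = sum(counts[w1].values())
    return counts[w1][w2] / total if total else 0.0

# "the" occurs 4 times with a following word; "cat" follows it twice.
print(next_word_prob("the", "cat"))  # → 0.5
```

Even this tiny example hints at the limitations discussed later: a bigram never seen in the corpus gets probability zero (data sparsity), and the model looks at only one preceding word (fixed context).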