Programming Language Design 2

Variables and Substitution

Learning Objectives

You understand the language of lambda calculus.
You know what free and bound variables are.
You know of substitution, and what problems free variables may pose.

Untyped Lambda Calculus

In the chapter From BASIC to Lambda Calculus in Part 4, we quickly glanced over untyped lambda calculus, so make sure you have read that chapter first.

In brief, untyped lambda calculus is a formal system for expressing computation based on function abstraction and application. A term in the untyped lambda calculus is simply a variable, an abstraction over a term (which represents a function), or an application of two terms. We can express the terms of untyped lambda calculus using the following algebraic data type:

For brevity, let’s (loosely) define the following symbolic notation for writing down a :

A variable can be expressed using a mathematical literal, such as . This would represent the term . Other variable names could be.

The details of what symbols are allowed is not important, as this is a loose definition based on intuition.
An abstraction can be expressed as follows: . This would represent the term . A variable name right after the is used for the variable the abstraction abstracts over. Then a dot followed by a term for the body is required.
Application is expressed with , where are terms. Parentheses can be added to visually separate the left hand side from the right hand side. For example represents , whereas would represent .

Loading Exercise...

Free and Bound Variables

Before we define what free and bound variables are (in lambda calculus), let’s look at a motivating example from Gleam.

{
    let x = 5               // (1)
    let y = x               // (2)
    let z = f               // (3)
    z
}

First of all, let’s analyze the syntax. The code consists of a block ({}), which is an expression in Gleam. Blocks may consist of let statements followed by a final expression, which becomes the final value of the block expression. A let statement (which in reality is an expression) in Gleam binds a value to a variable, such that it is available on future lines until it’s shadowed or goes out of scope.

The example above, shows the variable x being bound on line (1), such that its value 5 is available on the following lines (notably (2)).

The let expression is called a variable-binding operator/construction, as it introduces a new variable which is available inside its body. The body of the let expression is simply the expression that follows it. Another variable-binding construction is a function — which is just like a let expression, except it doesn’t bind the value immediately. The following example shows how a function in Gleam binds a variable x.

fn func(x) {            // (1')
    let y = x           // (2')
    let z = f           // (3')
    z
}

We will return back to this discussion about how functions and let statements differ later.

Other Variable-Binding Operators from Math

In the following examples, every variable is bound by some operator/construction.

The first equation shows how a new function is defined as it takes an argument and returns the term . The sum () binds the variable and evaluates the term ten times. The limit binds a variable and uses the defined earlier. The quantifies, i.e. binds, the variable to express a logical proposition which may or may not be true¹.

In the following examples, the variables and are bound just like earlier, but the variables occur free.

Next, let’s analyze the semantics of the previous examples.

In the code block example, after line (1), the variable x is bound to the value meaning that any later occurrence of x evaluates to . The first occurrence of the variable x is on line (2), where its value is assigned to the variable y. After line (2) both x and y are bound to the same value of .

In the function example, the behavior is the same except we don’t know the value of x yet, as it is known only once the function is called.

However, on line (3), the variable z is bound to the value of f, which there is no mention of — at least we are not aware of it.

The instance of f on line (3) is called a free occurrence. Free occurrence means that the variable is not bound by any variable-binding construction.

The code, when compiled, leads to an error:

error: Unknown variable
  ┌─ /app/main/src/main.gleam:4:11
  │
4 │   let z = f               // (3)
  │           ^ Did you mean `x`?

The name `f` is not in scope here.

This error is a manifestation of our program not being complete (defined later). The same can be said about the func example where line (3') refers to an undefined variable.

We can already formulate an intuitive definition for when a variable in an expression/term is free. A variable occurs free in the term , if any of the following holds:

represents a variable of the same name: .
consists of subterms ,
- e.g. , and occurs free in any of the subterms.
This is the most important condition!

is a binding construction, which binds a variable in the body , and occurs free in ².

Let’s analyze the third condition more closely for a variable :

. Because binds the same variable name , the variable is not free.
. Now, because , and is not free in the term , is not free.
. Here, the operator binds the variable , but the body contains a free occurrence of , so is free.

The most straight-forward definition for when a variable is bound is that it is bound if and only if it is not free. This however has the unintuitive consequence that variables that don’t appear at all are considered bound.

Now, let’s define when a variable in lambda calculus is free.

Loading Exercise...

Closed and Complete Terms

Unlike in untyped programming languages, in typed ones, it is usually required that all variables are bound somewhere in the program. Otherwise, type checking could not determine the types of the free variables (or whether they match their expected type), and during evaluation there would be no way for the variables to evaluate to anything sensible. So it makes sense to require that no variable in a program occurs free, i.e. the program is a complete term. In logic, the word closed refers to the same thing as complete in programming language theory.

Some programming languages may allow variables to be accessed from environment variables, but the compiler and type checker would have no way of checking the types of those variables (or if they are defined at all).

Think about how programming languages might express accessing environment variables.³

Alpha Equivalence

Consider these two definitions for the function .

Clearly, by mathematical intuition, they are the same definition, because the names of bound variable occurrences don’t matter as long as they match the free occurrences in the body of the variable-binding construction.

This equivalence of the terms is called -equivalence.

Therefore the following pairs represent equal things:

However, renaming a free variable should not be allowed, because a surrounding context should be abstracting over it. For example

is not the same as

because the surrounding context most likely has a universal quantifier, which says that the equation holds for all .

Here, the first formula has no free variables, so it’s closed. However after incorrectly renaming to in the body of the second formula, the formula is no longer closed. Therefore, to keep the meaning of the formula/term the same, only bound variables can be renamed.

Loading Exercise...

Substitution

Substitution plays a key role in functional programming as it enable us to “call functions” (or apply abstractions in lambda calculus lingo) without using a state or a call stack to store the variables. Intuitively, calling a function in a functional language is simply substituting the function call with the function’s body and the parameters with the call arguments. Substitution typically happens in PL dynamics, but it’s also often used for more powerful type systems (such as those of Haskell’s and Rust’s) which are capable of doing type-level computation at compile-time.

For example, the following Gleam code:

fn add2(a) {
    a + 2
}

pub fn main() {
    let x = add2(3)
}

should be equivalent with

fn add2(a) {
    a + 2
}

pub fn main() {
    let x = {
        3 + 2
    }
}

To make this more concrete, let’s define a notation for a substitution. We define the following notation: means that the every free occurrence of the variable in is substituted with the term .

The variable name denotes a term, which is considered to be a value. Values in lambda calculus are defined in the next chapter.

Now we can turn let x = add2(3) into let x = [a ↦ 3]{ a + 2 } by replacing the function with its body and substituting the variable with the value . The term [a ↦ 3]{ a + 2 } equals { 3 + 2 } so let x = add2(3) has effectively turned into let x = { 3 + 2 }.

Here are some examples of substitutions when working with lambda calculus:

The term is an open term with a free occurrence of the variable . Therefore the is substituted by .
As is a closed term, it remains unchanged under substitution.
In , is a free occurence, which is substituted by .
The expression contains two free occurrences of which get substituted by .
Here we are substituting with an incomplete term which has a free occurence of . This is one of the main reasons why we try to always work with complete terms, as substituting by a complete term cannot accidentally bind free variables.

This corner case seems to work incorrectly, as the variable name in could be -converted to any other variable name.

With a good understanding of how substitution should work, we can specify the substitution algorithm for lambda calculus.

Pay attention to how is recursively substituted as this is a very common structure-preserving (i.e. mapping) operation.

Substituting into a variable does the substitution only if the variable names match. Likewise substitution recursively continues into an abstraction only if the name of the bound variable is not the same as what we are substituting.

The last case is the most important as it blocks substitution from substituting bound variables.

Substitution Problem

So far we have been led to believe that in functional programming we can replace equals by equals. However, this is not the case in general. In particular, we have to be careful when substituting a term introduces new free variables

To demonstrate why free variables pose a problem, let’s analyze the following piece of Gleam code:

Run the program to see the output

In this case replacing f() with x is not allowed. The reason is that we are only allowed to call complete functions (or run complete programs) to avoid free variables getting bound.

Sometimes there is a need to do substitution even though the terms are not complete. One case is when we normalize lambda terms. There we might need to do capture-avoiding substitutions when normalizing non-complete terms.

Loading Exercise...

Summary

In this chapter we covered the basics of lambda calculus and learned a bit more about free variables and substitution. We defined a new notation for substituting a value into a term.

Footnotes

If the variable is an ordinal, then the proposition doesn’t hold. ↩
Some variable binding constructions, such as case expressions may bind multiple variables in multiple patterns. For example, consider the following Gleam code for adding together the elements of a list:
```
pub fn sum(l: List(Int)) {
    case l {
        [] -> 0
        [x, ..xs] -> x + sum(xs)
    }
}
```
The pattern [x, ..xs] -> x + sum(xs) binds both x and xs, so neither occurs free in the “body” x + sum(xs). ↩
See how environment variables are modelled in Haskell and in Rust, and in Bash. ↩

← Parsing with Nom

Reduction and Evaluation →