Programming Language Design 2

The Simply Typed Lambda Calculus


Learning Objectives

  • You know what base types are.
  • You know of the hypothetical typing judgment.
  • You understand the typing rules of simply typed lambda calculus.
  • You know how the simply typed lambda calculus behaves with only one base type: Boolean.

Base Types

So far, we haven’t seen any base types, such as integers or booleans in the lambda calculus. That is on purpose, the theory of lambda calculus is mostly independent of base types, and even if we had integers or booleans, those could always be translated to abstractions with Church encodings. One way to say this is that integers and booleans are admissible, i.e. all properties of lambda calculus still hold even if we add more base types.

Not having integers or booleans is not a property…

The Simply Typed Lambda Calculus

The simply typed lambda calculus (STLC) is an extension of the untyped lambda calculus where each valid term has a type1 and abstractions must specify the type of their quantified variable.

TermVarString|App(Term,Term)|Abs(String,Type,Term)TypeArrow(Type,Type)

The Type highlighted in red is the only difference between simply typed and untyped lambda calculus.

However, in order for the simply typed lambda calculus to be useful it needs to have at least one base type, such as integers or booleans.

Booleans

Next, we will introduce booleans as part of the untyped lambda calculus. Then we can begin to analyze and define behavior for booleans from an untyped, and later from the typed point of view.

TermVar(String)|Abs(var:String,body:Term)|App(Term,Term)|Let(var:String,val_t:Term,body:Term)|True|False|Ite(cond:Term,if_true:Term,if_false:Term)

The syntax for writing an Ite(cond:𝑡𝑐,if_true:𝑡1,if_false:𝑡2) is if𝑡𝑐then𝑡1else𝑡2.

Because the set of terms have changed, we have to extend the 𝛽-reduction, otherwise all the if-then-else term is stuck, i.e. a non-reducible non-value. One of the benefits of expressing relations using inference rules is that we can just extend the rules ad-hoc. So let’s add the following evaluation rules to the new boolean lambda calculus.

IfTrue(ifTruethen𝑡1else𝑡2)𝑡1IfFalse(ifFalsethen𝑡1else𝑡2)𝑡2𝑡𝑐𝑡𝑐If(if𝑡𝑐then𝑡1else𝑡2)(if𝑡𝑐then𝑡1else𝑡2)

Note that 𝑡1 and 𝑡2 can be non-value terms. Now, the complete set of evaluation rules for the untyped lambda calculus with let expressions and booleans is App1,App2,AppAbs,Let1,Let,IfTrue,IfFalse,If.

Make sure you know the details of how evaluation of if-then-else works by asking these questions from yourself:

  1. What happens if the condition of an if-then-else is not a value?
  2. If the condition evaluates to false, is the true branch reduced?
  3. If the condition is a value but not a boolean, what happens?
  4. What happens if the branches are the same term?
Loading Exercise...
Loading Exercise...
Loading Exercise...
Loading Exercise...

Church Booleans

For example, the Church encoding of booleans is done by representing them with abstractions of abstractions:

  • True is 𝜆𝑥.𝜆𝑦.𝑥
  • False with 𝜆𝑥.𝜆𝑦.𝑦

Then, we can represent the if-then-else construction for three church booleans 𝑡𝑐,𝑡if,𝑡else with

𝑡𝑐 𝑡if 𝑡else

To analyze how this works, let’s assume 𝑡𝑐 is a Church encoded boolean and analyze both values:

  1. 𝑡𝑐=𝜆𝑥.𝜆𝑦.𝑥. Now, we can see that the term 𝑡𝑐 𝑡if 𝑡else reduces to 𝑡if:

    𝑡𝑐 𝑡if 𝑡else=(𝜆𝑥.𝜆𝑦.𝑥) 𝑡if 𝑡else𝑡if
  2. 𝑡𝑐=𝜆𝑥.𝜆𝑦.𝑦. Now, we can see that the term 𝑡𝑐 𝑡if 𝑡else reduces to 𝑡else:

    𝑡𝑐 𝑡if 𝑡else=(𝜆𝑥.𝜆𝑦.𝑦) 𝑡if 𝑡else𝑡else

Notice that the 𝑡if and 𝑡else don’t even need to be booleans.

It would be easy to define the following type for a lambda calculus with booleans

#[derive(Debug, Clone)]
pub enum TermSugar {
    // Booleans
    True,
    False,
    Ite(Box<TermSugar>, Box<TermSugar>, Box<TermSugar>),

    Var(String),
    Abs { var: String, body: Box<TermSugar> },
    App(Box<TermSugar>, Box<TermSugar>),
}

Then we could simply translate the sugared terms back to plain lambda terms using the Church encoding.

With understanding of booleans in the untyped calculus, we are ready to add types on top. The reason we added a base type is that plain STLC has no finite types1, so it would be impossible to write a well-typed term with a finite type.

Infinite terms and types are outside the scope of this course.

𝜆𝑥:Boolean.𝑥

This lambda term has type BooleanBoolean (or formally Arrow(Boolean,Boolean)). The arrow () type is used to denote the type of abstractions.

The following function would also be the identity function, but for BooleanBoolean functions:

𝜆𝑥:BooleanBoolean.𝑥

The type of this identity function is (BooleanBoolean)BooleanBoolean

Definition 6.5.1: Terms and types of STLC+𝟚The set of terms STLC+𝟚, which is referred to as Term during the rest ofthis chapter, is defined as follows:TermVarString|App(Term,Term)|Abs(String,Type,Term)|Let(var:String,val_t:Term,body:Term)|True|False|Ite(Term,Term,Term)The set of types of STLC+𝟚, referred to as Type, is defined as follows:TypeArrow(Type,Type)|BooleanIn addition, we use the notation 𝜆𝑥:𝑇.𝑡 to denote the term Abs("x",𝑇,𝑡)and 𝟚 to denote the type Boolean. Other notation for Terms (let-expressions and booleans) is inherited from the untyped lambda calculus.

We also refer to free and bound variables in STLC by considering the definitions 6.3.1-3 after erasing types from STLC terms.

Type Erasure (6.5.2-3)

Types play no significance during evaluation of statically typed programming languages, like those based on STLC, so we can lower e.g. STLC+𝟚 to its untyped variant. This step can be done to ensure that types are not used during evaluation.

Definition 6.5.2: 𝜆+𝟚The terms of 𝜆+𝟚 are defined as follows.TermVarString|App(Term,Term)|Abs|Let(var:String,val_t:Term,body:Term)|True|False|Ite(Term,Term,Term)Notation is defined in the usual style.Definition 6.5.3: Lowering STLC+𝟚 to 𝜆+𝟚The type erasure of a term 𝑡:STLC+𝟚, denoted 𝑡:𝜆+𝟚, is defined asfollows.𝑥=𝑥,𝑥is a variable𝑡1 𝑡2=𝑡1 𝑡2𝜆𝑥:𝑇.𝑡=𝜆𝑥.𝑡,Note: forgets/erases𝑇let𝑥=𝑣in𝑡=let𝑥=𝑣in𝑡True=TrueFalse=Falseif𝑡𝑐then𝑡1else𝑡2=if𝑡𝑐then𝑡1else𝑡2

One way to enforce the use of the type checker before, and only before evaluating a term would be to have the following architecture:

type_check:STLC+𝟚Result(𝜆+𝟚,TypeError)multistep:𝜆+𝟚Valueinterpreter:STLC+𝟚Result(Value,TypeError)interpreter=mapmultisteptype_check

Where the type_check is defined as

type_check(𝑡)=caseinfer_type(𝑡,)of|Ok(_)Ok(𝑡)|Err(err)Err(err)

And the type 𝜆+𝟚 is used as an existential/opaque type, which is not visible to the user.

This way evaluation can not externally be used before type checking and reduction doesn’t have access to type information.

Here are some example terms of STLC+𝟚 (STLC with booleans):

  1. 𝜆𝑓:𝟚𝟚.𝑓

    The identity function for boolean endofunctions (endo meaning the domain and codomain have the same type.)

    The type of this term is (𝟚𝟚)(𝟚𝟚).

  2. 𝜆𝑥:𝟚.if𝑥thenFalseelseTrue

    The negation function mapping each boolean to its negation. The type of this term is 𝟚𝟚.

    Let’s refer to it as not.

  3. (𝜆𝑓:𝟚𝟚.𝑓)not

    The identity function for boolean endofunctions applied to the not endofunction.

    The type of this term is 𝟚𝟚.

  4. (𝜆𝑓:𝟚𝟚.𝑓)notTrue

    The identity function for boolean endofunctions applied to the not endofunction which is then applied to the boolean term True.

    The type of this term is 𝟚.

  5. 𝜆𝑓:𝟚𝟚.𝑓True

    This function applies its argument to the True value.

    The type of this term is (𝟚𝟚)𝟚2.

Loading Exercise...
Loading Exercise...
Loading Exercise...

Typing

In part 4 chapter 7 type checking was simply a process of determining the type of subterms until they hit integer or boolean values. However, in STLC, type checking is not as simple because we have to deal with variables.

The term 𝜆𝑥:𝟚,𝜆𝑦:𝟚,𝑥 has type 𝟚𝟚𝟚 as applying the abstraction to some boolean argument 𝑡::𝟚 reduces to the term 𝜆𝑦:𝟚,𝑡 which has type 𝟚𝟚.

What we have just stated certainly makes sense, but how would we turn that into an algorithm? Especially the “some boolean 𝑡” part…

The logic for type checking terms in STLC is the most interesting property of the simply typed lambda calculus. The whole idea is that abstractions know the type of their parameter, so type checking the body of the abstraction should use that knowledge of the variable’s type.

The typing rules should therefore look something like this:

𝑡𝑦𝑝𝑒 𝑐ℎ𝑒𝑐𝑘 𝑡1::𝟚𝑘𝑛𝑜𝑤𝑖𝑛𝑔𝑥::𝟚𝜆𝑥:𝟚. 𝑡1::𝟚

This “knowledge” is stored in a context.

Context

Typing context, typing environment, or just context, refers to the data structure which associates variable names to their types. Symbols Γ,Σ are used to denote contexts.

Here are a couple examples of contexts:

Γ𝑥::IntegerΓ,𝑥::IntegerΓ,Σ

The properties we need from a context, are that we can look up types by variable name, and extend them with new variables.

Hypothetical Typing Judgment

Unfortunately the typing relation (::), which has gotten us this far, is no longer suitable for type checking, as it’s not aware of the context. We need to define a new relation, the hypothetical typing judgment which also has the context in it.

Definition 6.5.4: Hypothetical Typing JudgmentGiven a context Γ, a term 𝑡 and a type 𝑇, the hypothetical typing judgmentis the smallest ternary relation closed under the typing rules (to be defined).The relation, denoted byΓ𝑡::𝑇reads “in the context Γ, 𝑡 has type 𝑇”.Example 6.5.5: Hypothetical Typing Judgment𝑥::𝟚𝑥::𝟚(𝜆𝑥:𝟚. 𝑥)::𝟚𝟚

Note: When reading about lambda calculus and type theory, try to glance over the context and focus on the terms and types on the first reading.

Let’s start spelling out the typing rules for STLC+𝟚. First, for simplicity, let’s consider just the boolean fragment of the language.

Reminder, here are the boolean typing rules from part 4 of the course:

False-TFalse::𝟚True-TTrue::𝟚𝑡𝑐::𝟚𝑡2::𝑇𝑡1::𝑇Ite-TIte(if:𝑡𝑐,then:𝑡1,else:𝑡2)::𝑇

The rules True,False,Ite for type checking terms named respectively, are as follows.

TrueΓTrue::𝟚FalseΓFalse::𝟚Γ,𝑡𝑐,𝑡1,𝑡2,𝑇,Γ𝑡𝑐::𝟚Γ𝑡1::𝑇Γ𝑡2::𝑇IteΓIte(𝑡𝑐,𝑡1,𝑡2)::𝑇

Note: the context Γ is completely arbitrary, as are the terms 𝑡𝑐,𝑡1,𝑡2 and type 𝑇. Notably, this rule correctly type checks function types: 𝑇=𝑇1𝑇2.

Now, let’s write down the rules for type checking the core or STLC.

Variables

Type checking a variable 𝑥 happens with the following logic:

  • If the typing association 𝑥::𝑇 appears in the context, then we know that 𝑥::𝑇.

This can be written in a style without hypotheses as follows:

Γ,𝑥,𝑇,VarΓ,𝑥::𝑇𝑥::𝑇

The rule is taken to mean that if 𝑥::𝑇 appears in the context3, then 𝑥::𝑇.

Here’s an example of a typing derivation involving a variable inside an if-then-else:

VarΓ,𝑥::𝟚𝑥::𝟚FalseΓ,𝑥::𝟚False::𝟚TrueΓ,𝑥::𝟚True::𝟚Γ,𝑥::𝟚if𝑥thenFalseelseTrue::𝟚

Abstractions

The rule for type checking abstractions is the part where the context is extended with a new variable before recursively continuing. Type checking an abstraction 𝜆𝑥:𝑇1,𝑡1 happens with the following logic:

  • If 𝑡1::𝑇2 under the additional hypothesis that 𝑥::𝑇1, then 𝜆𝑥:𝑇1,𝑡1::𝑇1𝑇2.

Extending the context into the hypothesis is crucial for the potentially free parameter inside the abstractions to type check.

Γ,𝑥,𝑡1,𝑇1,𝑇2,Γ,𝑥::𝑇1𝑡1::𝑇2AbsΓ𝜆𝑥:𝑇1. 𝑡1::𝑇1𝑇2

Example typing derivation:

VarΓ,𝑥::𝟚𝑥::𝟚AbsΓ𝜆𝑥:𝟚. 𝑥::𝟚𝟚

Let’s name this derivation Δ.

Applications

Type checking applications simply involves ensuring that the argument is in the domain of the arrow type.

Γ,𝑡1,𝑡2,𝑇1,𝑇2,Γ𝑡1::𝑇1𝑇2Γ𝑡2::𝑇1AppΓ𝑡1 𝑡2::𝑇2

Example typing derivation:

ΔTrueΓTrue::𝟚AppΓ(𝜆𝑥:𝟚,𝑥)True::𝟚

Here, the typing derivation Δ is substituted in from the previous example as it precisely matches the expected format: 𝑡1=𝜆𝑥:𝟚,𝑥 and 𝑇1=𝑇2=𝟚.

Let-expressions

The rule for type let-expressions is very similar to abstractions:

  • If 𝑡2::𝑇2 under the additional hypothesis that 𝑥::𝑇1, and 𝑡1::𝑇1 then let𝑥=𝑡1in𝑡2::𝑇2.

Even though the let expression knows the 𝑥, we avoid using substitution in the rules. Substitution is an implementation detail of evaluation, and evaluation details should not be mixed with type checking, as type checking is a prerequisite for correctness of evaluation.

If we used substitution directly, we could write both of the assumptions as [𝑥𝑡1]𝑡2::𝑇2, but this makes reasoning about the rule much more difficult.

Γ,𝑥,𝑡1,𝑡2,𝑇1,𝑇2,Γ,𝑥::𝑇1𝑡2::𝑇2Γ𝑥::𝑇1LetΓlet𝑥=𝑡1in𝑡2::𝑇2
Loading Exercise...
Loading Exercise...
Loading Exercise...
Loading Exercise...

Evaluation

The evaluation rules are inherited from the untyped lambda calculus, just with the added type in abstractions.

𝑡1𝑡1App1𝑡1 𝑡2𝑡1 𝑡2𝑡2𝑡2is_value(𝑣1)App2𝑣1 𝑡2𝑣1 𝑡2is_value(𝑣2)AppAbs(𝜆𝑥:𝑇.𝑡1) 𝑣2[𝑥𝑣2]𝑡1IfTrue(ifTruethen𝑡1else𝑡2)𝑡1IfFalse(ifFalsethen𝑡1else𝑡2)𝑡2𝑡𝑐𝑡𝑐If(if𝑡𝑐then𝑡1else𝑡2)(if𝑡𝑐then𝑡1else𝑡2)

Implementation

Before diving into the exercise, here are some notes (and hints) for the implementation:

  • We define the context as a HashMap of variable names to types.

    pub type Context = std::collections::HashMap<String, Ty>;
  • infer_type results in a Result<Ty, TypeError> where TypeError encodes the following set of possible type errors.

    pub enum TypeError {
        UndefinedVariable(String),
        WrongAppTypeRight(Ty),
        WrongAppTypeLeft(Ty),
        Fail,
    }
  • It is highly recommended to write unit tests for the implementations, although they are not graded.


Loading Exercise...

Properties of the Simply Typed Lambda Calculus

Theorem 6.5.6: ProgressLet 𝑡 be a well-typed term, i.e. 𝑡::𝑇 for some type 𝑇. Now, eitheris_value𝑡, or there exists 𝑡, such that 𝑡𝑡.The proof is considered too complex for the scope of this course. Theorem 6.5.7: PreservationLet 𝑡 be a well-typed term, i.e. 𝑡::𝑇 for some type 𝑇. Let 𝑡 be a termsuch that 𝑡𝑡. Then 𝑡::𝑇 for some type 𝑇.The proof is considered too complex for the scope of this course.

For proofs of these theorems, refer to the optional chapter 10 which covers formalization of programming language theory.

Footnotes

Footnotes

  1. Well, the lack of base types makes the system less expressive and only interesting from a theory point of view. Without base types, the simply typed lambda calculus is “degenerate”. That is, there are no (finite) well-typed terms. This is why we need base types to make the system useful. However, the typing rules and other properties are still valid in this degenerate system. 2

  2. When writing arrow types, it is useful to associate them to the right so that we don’t need parentheses to indicate that the return type is a function. This means that the type of “multi-argument” functions is just a chain of arrows without parentheses (unless the parameters have types with arrows).

    𝟚𝟚𝟚𝟚(𝟚𝟚)

    which is not the same type as (𝟚𝟚)𝟚.

  3. Depending on the formal definition of the context, we might use a function get:ContextStringOption(Type) and write get(Γ,𝑥)=Some(𝑇) in the assumptions and Γ𝑥::𝑇 in the conclusion. This is closer to the Rust implementation where a HashMap is used as the context.