Programming Language Design

Type Systems and Type Checking


Learning Objectives

  • You know what type systems are.
  • You know what type coercion is, its benefits and its drawbacks.
  • You know how types are used to ensure functions are called with values of expected type.
  • You understand how type inference works in a simple language.
  • You know why type inference rules are functional.

Introduction

In the previous chapter, we made arithmetic and boolean operations succeed by first coercing the required arguments to a common base type, such as an integer or a boolean. Some (weakly typed) programming languages take this approach to the extreme1. However, we will take a step back and consider that addition only makes sense for expressions which eventually result in an integer. Before evaluating an expression, our language should check whether that expression even makes sense in the first place. This leads us to the concept of a type, which each well-formed and semantically valid expression should have.

Types and Type Systems

In our language, we have so far run only into integers and booleans. Let’s define a new data type Type which encodes the possible types our expressions can be:

TypeInteger|Boolean

Now we can say that Num(5) has internal type Integer using the following notation:

Num(5)::Integer

As we know, the type of Num(5) is Term — this is its external type, and we use the single colon for denoting this fact: Num(5):Term. Similarly, the integer 5 is externally typed 5: (or Int when talking explicitly about Gleam).

We use the double colon (::) to indicate that the type is internal to the language — the type Integer is just “data”.

Now that we have a type for booleans, let’s add them to the language as well. The AST that Term refers for the rest of the chapter is defined as

TermNum(Int)|Add(Term,Term)|Sub(Term,Term)|Mul(Term,Term)|True|False|Ite(if:Term,then:Term,else:Term)

Type Checking

Type checking refers to the process of recursively checking that a term consists of subterms of the expected type. The data constructor Add accepts two terms, but to make sure we can add them together during evaluation, we need to make sure that the summands are integers.

For example, to type check the following term

Add(Num(1),Num(2))

we must check that Num(1) is an integer and Num(2) is an integer. If and only if that is the case can we say that Add(Num(1),Num(2)) is an integer.

Clearly both Num(1) and Num(2) are integers, so type checking succeeds in this case.

How about checking the type of the following term?

Mul(True,Num(2))

Again, we check the types of the subexpressions ensuring that they are integers. However, True is probably not an integer, as it’s a boolean, and a common feature of functional type systems is that types are unique, i.e. if a term has a type, it has only one type. We shall take the strict approach and say that the term doesn’t type check — the other alternative is to coerce the type during evaluation.

As we have seen, type checking takes in a term and returns an optional type, but we will call this function infer_type as this is closer to the process of type inference which we will cover next. The type signature of infer_type is:

infer_type:TermOption(Type)

As our language doesn’t have functions yet, the top-level type checking simply ensures that the whole term has some type.

type_check:TermBool

And with infer_type defined (in the following exercise), we can express the typing relation mathematically.

Definition 4.7.1: Typing relation (::)Let 𝑡:Term and ty:Type. The typing relation (::) is a binary relation,such that 𝑡::ty if and only if type_check(𝑡)=Some(ty).
Loading Exercise...

Type Inference

Type inference is the process of determining the type of a term from the surrounding context. It follows basic type logic, like “adding two numbers results in a number” and “negation of a boolean is a boolean”.

However, our simple language doesn’t have any surrounding context, so type inference happens only using the types of subexpressions. For example, in the following Gleam code

let x = None                // (1)
let x2 = case x {
    Some(n) -> n * 2        // (2)
    None -> 0               // (3)
}

Gleam correctly infers that the type of x must be Int, using the following logic:

  1. On line (1), x is set to None, which has type Option(a) for some type a. Gleam’s type system tracks this type a as a typed hole, a.k.a. metavariable.
  2. Lines (2) and (3) indicate that the type of the of the case expression must be an Int (because multiplication results in an integer).
  3. By analyzing the type of the multiplication on line (2), Gleam infers that the type of n must be also Int, which is then fed back to the pattern Some(n). Gleam knows that Some is a function from a -> Option(a), so by matching the type of the argument n: Int with a it infers that a = Int.

This process of filling in typed holes is know as type unification, and it makes statically typed programming languages a lot more ergonomic to use — as you have likely noticed when writing Gleam.

Typing Rules

The typing relation (::) of our language, which we have defined using a function in the previous exercise, can be encoded into inference/typing rules as follows.

First let’s consider the booleans. Clearly False::Boolean and likewise True::Boolean. Logically we write this as an inference rule where the assumptions above the inference line are empty.

False-TFalse::BooleanTrue-TTrue::Boolean

Here are the type inference rules for arithmetic expressions:

Num-TNum(𝑛)::Integer𝑒1::Integer𝑒2::IntegerAdd-TAdd(𝑒1,𝑒2)::Integer𝑒1::Integer𝑒2::IntegerMul-TMul(𝑒1,𝑒2)::Integer𝑒1::Integer𝑒2::IntegerSub-TSub(𝑒1,𝑒2)::Integer

You can replace the 𝑛 by any integer, and the Num-T rule still holds. Similarly, in the “recursive” rules, you can replace 𝑒1 and 𝑒2 by any (well-formed) Term.

How about the typing rule for Ite?

It’s a bit more tricky, but let’s start by thinking about it intuitively. Consider how Ite(True,𝑡1,𝑡2) should infer its type. In this case clearly the type should be the type of 𝑡1 because the condition is true. However, we don’t always know what the condition evaluates to — evaluating happens only after type checking, and if type checking depended on evaluation, we would have a dynamic type system. Therefore we should require that 𝑡1 and 𝑡2 have the same type 𝑇.

To sum up, inferring the type of Ite(𝑡𝑐,𝑡1,𝑡2), should first ensure 𝑡𝑐::Boolean, then 𝑡1::𝑇 and 𝑡2::𝑇, and finally result in 𝑇.

𝑡𝑐::Boolean𝑡2::𝑇𝑡1::𝑇Ite-TIte(if:𝑡𝑐,then:𝑡1,else:𝑡2)::𝑇

The process is recursive, because to show that 𝑡1::𝑇, we need to infer its type possibly using the same inference rule.

Typing Trees

Typing trees, or typing derivations, are trees that prove that a typing relation holds. They can be used to understand how the type checking mechanism works in all of its detail.

Let’s first look at a proof tree that infers that the type of Add(Num(1),Num(2)) is Integer.

Num-TNum(1)::IntegerNum-TNum(2)::IntegerAdd-TAdd(Num(1),Num(2))::Integer

This tree was constructed from bottom up, by following the typing rule for Add (which was annotated with (2)) and then Num(1). Notice that the proof is complete because each the topmost rules of inference have no assumptions.

How would you convince/prove that Integer is the only type that can be inferred?

One answer is to prove that the typing relation is functional, i.e. there is a unique type inference rule for every term.

The statement Ite(Ite(True,False,True),Num(1),Num(2))::Integer is derivable, and has the following proof tree.

Num-TFalse::BooleanNum-TTrue::BooleanIte-TIte(True,False,True)::BooleanNum-TNum(1)::IntegerNum-TNum(2)::IntegerIte-TIte(Ite(True,False,True),Num(1),Num(2))::Integer
Loading Exercise...
Loading Exercise...

Evaluation

Now that our language has a type system, we can define evaluation by presupposing that the code type checks. First we need a data type for the result of the evaluation, let’s call it Value:

ValueIntVal()|BoolVal({True,False})

Now, the evaluation relation is defined over Terms and Values.

In the next exercise, your task is to implement the closure of the following rules as a partial function. The function will be partial because the rules don’t apply to every term, so for some untyped terms, the evaluation may get stuck.

NumNum(𝑛)IntVal(𝑛)TrueTrueBoolVal(True)FalseFalseBoolVal(False)𝑒1IntVal(𝑛1)𝑒2IntVal(𝑛2)AddAdd(𝑒1,𝑒2)IntVal(𝑛1+𝑛2)𝑒1IntVal(𝑛1)𝑒2IntVal(𝑛2)MulMul(𝑒1,𝑒2)IntVal(𝑛1𝑛2)𝑒1IntVal(𝑛1)𝑒2IntVal(𝑛2)SubSub(𝑒1,𝑒2)IntVal(𝑛1𝑛2)𝑡𝑐BoolVal(True)𝑡1𝑣1Ite-TrueIte(if:𝑡𝑐,then:𝑡1,else:𝑡2)𝑣1𝑡𝑐BoolVal(False)𝑡2𝑣2Ite-FalseIte(if:𝑡𝑐,then:𝑡1,else:𝑡2)𝑣2
Totality and Type Safe Evaluation

The set of terms which type check {𝑡:Term|type_check𝑡} should always evaluate to a value, so if we restrict the input to those, we should have a total function — as long as the restricted function terminates for all inputs (which it does).

Therefore we can define a function safe_eval:TermOption(Value), which does type checking (returning None if it fails) before evaluating the value.


Loading Exercise...

To read more about type safety, refer to chapter 6 in Practical Foundations for Programming Languages.

Footnotes

Footnotes

  1. See this demonstration of how JavaScript’s coercions can be used for comedic effect.