Programming Language Design

Type Systems and Type Checking

Learning Objectives

You know what type systems are.
You know what type coercion is, its benefits and its drawbacks.
You know how types are used to ensure functions are called with values of expected type.
You understand how type inference works in a simple language.
You know why type inference rules are functional.

Introduction

In the previous chapter, we made arithmetic and boolean operations succeed by first coercing the required arguments to a common base type, such as an integer or a boolean. Some (weakly typed) programming languages take this approach to the extreme¹. However, we will take a step back and consider that addition only makes sense for expressions which eventually result in an integer. Before evaluating an expression, our language should check whether that expression even makes sense in the first place. This leads us to the concept of a type, which each well-formed and semantically valid expression should have.

Types and Type Systems

In our language, we have so far run only into integers and booleans. Let’s define a new data type which encodes the possible types our expressions can be:

Now we can say that has internal type using the following notation:

As we know, the type of is — this is its external type, and we use the single colon for denoting this fact: . Similarly, the integer is externally typed (or Int when talking explicitly about Gleam).

We use the double colon (::) to indicate that the type is internal to the language — the type is just “data”.

Now that we have a type for booleans, let’s add them to the language as well. The AST that refers for the rest of the chapter is defined as

Type Checking

Type checking refers to the process of recursively checking that a term consists of subterms of the expected type. The data constructor accepts two terms, but to make sure we can add them together during evaluation, we need to make sure that the summands are integers.

For example, to type check the following term

we must check that is an integer and is an integer. If and only if that is the case can we say that is an integer.

Clearly both and are integers, so type checking succeeds in this case.

How about checking the type of the following term?

Again, we check the types of the subexpressions ensuring that they are integers. However, is probably not an integer, as it’s a boolean, and a common feature of functional type systems is that types are unique, i.e. if a term has a type, it has only one type. We shall take the strict approach and say that the term doesn’t type check — the other alternative is to coerce the type during evaluation.

As we have seen, type checking takes in a term and returns an optional type, but we will call this function infer_type as this is closer to the process of type inference which we will cover next. The type signature of is:

As our language doesn’t have functions yet, the top-level type checking simply ensures that the whole term has some type.

And with defined (in the following exercise), we can express the typing relation mathematically.

Loading Exercise...

Type Inference

Type inference is the process of determining the type of a term from the surrounding context. It follows basic type logic, like “adding two numbers results in a number” and “negation of a boolean is a boolean”.

However, our simple language doesn’t have any surrounding context, so type inference happens only using the types of subexpressions. For example, in the following Gleam code

let x = None                // (1)
let x2 = case x {
    Some(n) -> n * 2        // (2)
    None -> 0               // (3)
}

Gleam correctly infers that the type of x must be Int, using the following logic:

On line (1), x is set to None, which has type Option(a) for some type a. Gleam’s type system tracks this type a as a typed hole, a.k.a. metavariable.
Lines (2) and (3) indicate that the type of the of the case expression must be an Int (because multiplication results in an integer).
By analyzing the type of the multiplication on line (2), Gleam infers that the type of n must be also Int, which is then fed back to the pattern Some(n). Gleam knows that Some is a function from a -> Option(a), so by matching the type of the argument n: Int with a it infers that a = Int.

This process of filling in typed holes is know as type unification, and it makes statically typed programming languages a lot more ergonomic to use — as you have likely noticed when writing Gleam.

Typing Rules

The typing relation of our language, which we have defined using a function in the previous exercise, can be encoded into inference/typing rules as follows.

First let’s consider the booleans. Clearly and likewise . Logically we write this as an inference rule where the assumptions above the inference line are empty.

Here are the type inference rules for arithmetic expressions:

You can replace the by any integer, and the rule still holds. Similarly, in the “recursive” rules, you can replace and by any (well-formed) .

How about the typing rule for ?

It’s a bit more tricky, but let’s start by thinking about it intuitively. Consider how should infer its type. In this case clearly the type should be the type of because the condition is true. However, we don’t always know what the condition evaluates to — evaluating happens only after type checking, and if type checking depended on evaluation, we would have a dynamic type system. Therefore we should require that and have the same type .

To sum up, inferring the type of , should first ensure , then and , and finally result in .

The process is recursive, because to show that , we need to infer its type possibly using the same inference rule.

Typing Trees

Typing trees, or typing derivations, are trees that prove that a typing relation holds. They can be used to understand how the type checking mechanism works in all of its detail.

Let’s first look at a proof tree that infers that the type of is .

This tree was constructed from bottom up, by following the typing rule for (which was annotated with ) and then . Notice that the proof is complete because each the topmost rules of inference have no assumptions.

How would you convince/prove that is the only type that can be inferred?

One answer is to prove that the typing relation is functional, i.e. there is a unique type inference rule for every term.

The statement is derivable, and has the following proof tree.

Loading Exercise...

Evaluation

Now that our language has a type system, we can define evaluation by presupposing that the code type checks. First we need a data type for the result of the evaluation, let’s call it :

Now, the evaluation relation is defined over s and s.

In the next exercise, your task is to implement the closure of the following rules as a partial function. The function will be partial because the rules don’t apply to every term, so for some untyped terms, the evaluation may get stuck.

Totality and Type Safe Evaluation

The set of terms which type check should always evaluate to a value, so if we restrict the input to those, we should have a total function — as long as the restricted function terminates for all inputs (which it does).

Therefore we can define a function , which does type checking (returning if it fails) before evaluating the value.

Loading Exercise...

To read more about type safety, refer to chapter 6 in Practical Foundations for Programming Languages.

Footnotes

See this demonstration of how JavaScript’s coercions can be used for comedic effect. ↩

← Boolean Expression Language

Parser Combinators and Lisp →