Programming Language Design 2

A Taste of Formalization


Learning Objectives

  • You know of the Lean proof assistant
  • You know how basic theorems about lambda calculus can be formalized using a proof assistant

This is an extra chapter about formalization of programming language theory. We are going to prove and formalize type safety for the simply typed lambda calculus from chapter 5.

Firstly, what is formalization?

Formalization is the process of converting something informal to something formal. Mathematics can be formalized by e.g. using programming languages. Some programming languages such as Lean are built with mathematics in mind and a significant amount of mathematics has already been formalized in Lean.

All of the code presented here is found in lean playground1 (click the footnote to open it).

While reading the material, move your cursor along in the editor to see how Lean’s InfoView changes.

Type Safety

Type safety in programming languages consists of two parts:

  1. Preservation: Types are preserved during evaluation. If π‘‘βŸΆπ‘‘β€² and 𝑑::𝑇, then 𝑑′::𝑇. This can be written more formally as βˆ€(𝑑,𝑑′:Term) (𝑇:Type),π‘‘βŸΆπ‘‘β€²βˆ§βˆ…βŠ’π‘‘::π‘‡β†’βˆ…βŠ’π‘‘β€²::𝑇
  2. Progress: Well-typed terms are values or can be reduced. If 𝑑::𝑇, then either 𝑑 is a value, or there exists 𝑑′ This can be written more formally as βˆ€(𝑑,𝑑′:Term) (𝑇:Type),βˆ…βŠ’π‘‘::𝑇→is_valueπ‘‘βˆ¨βˆƒπ‘‘β€²,π‘‘βŸΆπ‘‘β€²

The rules we have given actually guarantee a stronger property than progress: unicity. This means that there is a unique term to reduce to:

βˆ…βŠ’π‘‘::𝑇→is_valueπ‘‘βˆ¨βˆƒ!𝑑′,π‘‘βŸΆπ‘‘β€²

Read more in chapter 6 of Practical Foundations for Programming Languages.

Definitions

First, the definitions of terms and types are the following two inductive types

/-
# Definition 6.5.1: Terms and Types of STLC + 𝟚
-/
inductive Ty where
  | Boolean
  | Arrow (dom cod : Ty)
deriving Repr

inductive Term where
  -- Core STLC
  | Abs (name : String) (ty : Ty) (body : Term)
  | App (t₁ : Term) (tβ‚‚ : Term)
  | Var (name : String)
  -- Booleans
  | True
  | False
  | Ite (cond : Term) (if_true : Term) (if_false : Term)
deriving Repr

Next, we define what it means to be a value. This is an inductive proposition that has one non-fixed argument for the Term.

inductive Value : Term β†’ Prop where
  | Abs : βˆ€ {x} {T} {t}, Value (.Abs x T t)
  | False : Value False
  | True : Value True
Inductive Propositions

You can read more about the subject in the Theorem Proving in Lean 4 book.

An inductive proposition is an inductive data type, but in the universe of propositions. Like other algebraic data types, inductive propositions can be constructed with its constructors and deconstructed by case matching.

inductive Value : Term β†’ Prop where
  | Abs : βˆ€ {x} {T} {t}, Value (.Abs x T t)
  | False : Value False
  | True : Value True

example : Value False := Value.False

example : Value (Abs "x" .Boolean (Var "x")) :=
  @Value.Abs "x" .Boolean (Var "x")

Because the arguments to Abs are implicit, Lean can infer them in some situations. Therefore we can shorten the proof for abstraction example to:

example : Value (Abs "x" .Boolean (Var "x")) := Value.Abs

Finally to case analyze some Value t, we can use the cases tactic. For example, the following proof shows that a value is either False, True or there exist x, T, t₁ such that the value is Abs x T t₁.

example (h : Value t) : t = False ∨ t = True ∨ βˆƒ x T t₁, t = Abs x T t₁ := by
  cases h with
  | True =>
    -- ⊒ True = False ∨ True = True ∨ βˆƒ x T t₁, True = Abs x T t₁
    -- Which holds as the middle disjunct is true
    right
    left
    rfl
  | False =>
    -- ⊒ False = False ∨ False = True ∨ βˆƒ x T t₁, False = Abs x T t₁
    left
    rfl
  | @Abs y U t =>
    /-
    y : String
    U : Ty
    t : Term
    ⊒ Abs y U t = False ∨ Abs y U t = True ∨ βˆƒ x T t₁, Abs y U t = Abs x T t₁
    -/
    -- This is true because we can use `x = y` `T = U` and `t₁ = t` to prove the βˆƒ
    right
    right
    exact ⟨y, U, t, rfl⟩

β€” The last case is most interesting, as there we get access to the variables that β€œwent into” constructing the Abs Value.

Next, we define substitution recursively:

def subst (x : String) (s : Term) (t : Term)  :=
  match t with
  | Var y => if x = y then s else t
  | Abs y T t₁ => if x = y then t else Abs y T (subst x s t₁)
  | App t₁ tβ‚‚ => App (subst x s t₁) (subst x s tβ‚‚)

  | Ite t₁ tβ‚‚ t₃ => Ite (subst x s t₁) (subst x s tβ‚‚) (subst x s t₃)

  -- Values etc.
  | t => t

We can even give a notation for substitution using Lean’s metaprogramming tools:

notation:60 "[" x "↦" s "]" t:61 => subst x s t

#eval ["x" ↦ (Var "y")] (Var "x") -- Var "y"

Evaluation

Next, we define a relation Step : Term β†’ Term β†’ Prop for 𝛽-reduction.

Reminder: 𝛽-reduction rules form chapter 5:

𝑑1βŸΆπ‘‘β€²1App1𝑑1 𝑑2βŸΆπ‘‘β€²1 𝑑2𝑑2βŸΆπ‘‘β€²2is_value(𝑣1)App2𝑣1 𝑑2βŸΆπ‘£1 𝑑′2is_value(𝑣2)AppAbs(πœ†π‘₯:𝑇.𝑑1) 𝑣2⟢[π‘₯↦𝑣2]𝑑1

Compare the following rules from Lean with these.

inductive Step : Term β†’ Term β†’ Prop where
  | AppAbs : βˆ€ x Tβ‚‚ t₁ vβ‚‚,
    Value vβ‚‚ β†’
    -----------------------------------------
    Step (App (Abs x Tβ‚‚ t₁) vβ‚‚) ([x ↦ vβ‚‚] t₁)
  | App1 : βˆ€ t₁ t₁' tβ‚‚,
    Step t₁ t₁' β†’
    -----------------------------
    Step (App t₁ tβ‚‚) (App t₁' tβ‚‚)
  | App2 : βˆ€ v₁ tβ‚‚ tβ‚‚',
    Value v₁ β†’
    Step tβ‚‚ tβ‚‚' β†’
    -----------------------------
    Step (App v₁ tβ‚‚) (App v₁ tβ‚‚')

  -- Booleans
  | IteTrue : βˆ€ t₁ tβ‚‚,
    ------------------------
    Step (Ite True t₁ tβ‚‚) t₁
  | IteFalse : βˆ€ t₁ tβ‚‚,
    -------------------------
    Step (Ite False t₁ tβ‚‚) tβ‚‚
  | Ite : βˆ€ t₁ t₁' tβ‚‚ t₃,
    Step t₁ t₁' β†’
    -----------------------------------
    Step (Ite t₁ tβ‚‚ t₃) (Ite t₁' tβ‚‚ t₃)

I have left β€œinference line” comments to separate the hypotheses from the conclusion.

And here’s the notation for Step:

notation:50 t " ⟢ " t' => Step t t'

Context

Next we define the context as a partial function from Strings to Types.

-- A partial map from variable names to types
abbrev Context := String β†’. Ty

Next, we define what it means to update the context with a new variable-type association.

def update (ctx : Context) (v : String) (ty : Ty) : Context :=
  fun s => if s = v then pure ty else ctx s

The definition simply defines a new function, which if passed the same string as v, returns the type ty, and otherwise refers to the old context.

The empty context is a partial function which is not defined for any input:

def empty : Context := fun _ => Part.none

instance instEmptyCollection : EmptyCollection Context where
  emptyCollection := empty

#check (βˆ… : Context)

@[simp]
lemma empty_apply {y} : (βˆ… : Context) y = .none := rfl

The last (simp) lemma states that βˆ… y = Part.none, which will be useful later.

We also define a subset relation between contexts called IncludedIn:

-- The first context is "included" in the second context
def IncludedIn (Ξ“ Ξ“' : Context) : Prop :=
  βˆ€ x v, v ∈ Ξ“ x β†’ v ∈ Ξ“' x

instance : HasSubset Context := ⟨IncludedIn⟩

#check βˆ… βŠ† update βˆ… "x" .Boolean

example : βˆ… βŠ† update βˆ… "x" .Boolean := by
  intro x T h
  simp only [empty_apply, Part.not_mem_none] at h

Typing

Now that we have defined the Context, let’s move on to typing next.

First, a quick reminder β€” the typing rules for the core of STLC:

βˆ€Ξ“,π‘₯,𝑇,VarΞ“,π‘₯::π‘‡βŠ’π‘₯::π‘‡βˆ€Ξ“,π‘₯,𝑑1,𝑇1,𝑇2,Ξ“,π‘₯::𝑇1βŠ’π‘‘1::𝑇2AbsΞ“βŠ’πœ†π‘₯:𝑇1. 𝑑1::𝑇1→𝑇2βˆ€Ξ“,𝑑1,𝑑2,𝑇1,𝑇2,Ξ“βŠ’π‘‘1::𝑇1→𝑇2Ξ“βŠ’π‘‘2::𝑇1AppΞ“βŠ’π‘‘1 𝑑2::𝑇2

The hypothetical typing judgement (Definition 6.5.4) is the following inductive trinary relation:

/--
# Definition 6.5.4: Hypothetical Typing Judgment
-/
inductive HasType : Context β†’ Term β†’ Ty β†’ Prop where
  | Var : βˆ€ Ξ“ x T,
    T ∈ Ξ“ x β†’
    -------------------
    HasType Ξ“ (Var x) T
  | Abs : βˆ€ Ξ“ x T₁ Tβ‚‚ t₁,
    HasType (Ξ“.update x T₁) t₁ Tβ‚‚ β†’
    --------------------------------------
    HasType Ξ“ (Abs x T₁ t₁) (.Arrow T₁ Tβ‚‚)
  | App : βˆ€ Ξ“ t₁ tβ‚‚ T Tβ‚‚,
    HasType Ξ“ t₁ (.Arrow Tβ‚‚ T) β†’
    HasType Ξ“ tβ‚‚ Tβ‚‚ β†’
    -----------------------
    HasType Ξ“ (App t₁ tβ‚‚) T

  -- Booleans
  | True : HasType Ξ“ True .Boolean
  | False : HasType Ξ“ False .Boolean
  | Ite :
    HasType Ξ“ t₁ .Boolean β†’
    HasType Ξ“ tβ‚‚ T β†’
    HasType Ξ“ t₃ T β†’
    -------------------------
    HasType Ξ“ (Ite t₁ tβ‚‚ t₃) T

Plus notation:

notation:50 Ξ“:51 " ⊒ " t:52 " ∷ " T => HasType Ξ“ t T

Examples from Example 6.5.5:

/-
# Example 6.5.5
-/
example : "x" ∷ .Boolean ⊒ (Var "x") ∷ .Boolean := by
  apply HasType.Var
  exact Context.mem_update

example : βˆ… ⊒ (Abs "x" .Boolean (Var "x")) ∷ .Arrow .Boolean .Boolean := by
  apply HasType.Abs
  apply HasType.Var
  exact Context.mem_update

Next, let’s actually prove some properties about STLC+𝟚!

Canonical terms

The canonical terms lemmas state that a value of Arrow type must be an abstraction, and a value of Boolean type must be either True or False The lemmas are useful when proving the progress theorem.

lemma canonical_abs {t T₁ Tβ‚‚} (h : βˆ… ⊒ t ∷ .Arrow T₁ Tβ‚‚) (ht : Value t) : βˆƒ x t₁, t = Abs x T₁ t₁ := by
  cases ht with
  | @Abs x T t₁ =>
    cases h
    use x
    use t₁
  | True | False =>
    nomatch h

lemma canonical_bool {t} (h : βˆ… ⊒ t ∷ .Boolean) (ht : Value t) : t = True ∨ t = False := by
  cases ht with
  | @Abs x T t₁ =>
    cases h
  | True | False =>
    tauto

Progress Theorem

/-
# Theorem: Progress
-/
theorem progress {t} {T} (h : βˆ… ⊒ t ∷ T) : Value t ∨ βˆƒ t', t ⟢ t'

The progress theorem states that all well-typed terms (in the empty context) are either values, or can be reduced further. This means that it’s not possible for a term to be β€œstuck” so that it’s neither. This is one of the two requirements for type safety.

See theorem progress for the proof details in the Lean environment1.

Preservation Theorem

/-
# Theorem: Preservation
-/
theorem preservation {t t'} {T} (h : βˆ… ⊒ t ∷ T) (ht : t ⟢ t') : βˆ… ⊒ t' ∷ T

The preservation theorem states that a well-typed term’s type is preserved during 𝛽-reduction. This theorem is easy to prove with the help of a few lemmas called weakening and substitution:

/-
# Lemma: Weakening
-/
lemma weakening {Ξ“ Ξ“'} {t} {T} (hg : Ξ“ βŠ† Ξ“') (h : Ξ“ ⊒ t ∷ T) : Ξ“' ⊒ t ∷ T := by

/-
# Lemma: Substitution
-/
lemma substitution {Ξ“} {x} {v t} {U T} (ht : Ξ“.update x U ⊒ t ∷ T) (h : βˆ… ⊒ v ∷ U)
  : Ξ“ ⊒ [x ↦ v] t ∷ T

See theorem preservation for the proof details in the Lean environment1.

With both preservation and progress, we know that STLC+𝟚 is type safe.

STLC Extensions

It’s not too difficult to extend type safety (i.e. progress and preservation theorems) to STLC with extensions. Here’s the definition of a language based on STLC with unit types, booleans, integers, lists, pairs, sums, fixed point combinator and let-expressions.

inductive Ty where
  | Unit
  | Boolean
  | Integer
  | List (elem : Ty)

  -- dom β†’ cod
  | Arrow (dom cod : Ty)

  -- left Γ— right
  | Prod (left right : Ty)

  -- left + right
  | Sum (left right : Ty)
deriving Repr, DecidableEq

inductive Term where
  -- STLC core
  | Var (name : String)
  | Abs (name : String) (T : Ty) (body : Term)
  | App (t₁ tβ‚‚ : Term)

   -- Unit
  | Trivial

  -- Boolean
  | True
  | False
  | Ite (cond if_true if_false : Term)
  | Eq (left righ : Term)

  -- Integer
  | Int (n : β„€)

  -- List
  | Nil (T : Ty)
  | Cons (head tail : Term)
  | LCase (t : Term)
      (nil_t : Term)
      (head_var : String) (tail_var : String) (cons_t : Term)

  -- Prod
  | Pair (left right : Term)
  | Fst (t : Term)
  | Snd (t : Term)

  -- Sum
  | Inl (t : Term) (Tβ‚‚ : Ty)
  | Inr (t : Term) (T₁ : Ty)
  | Case (t : Term)
      (inl_var : String) (inl_t : Term)
      (inr_var : String) (inr_t : Term)

  -- Sugar
  | Let (name : String) (value_expr body : Term)

  -- Recursion
  | Fix (t : Term)
deriving Repr

/- Omitted other definitions about Context, Value, subst etc. -/

theorem progress {t} {T} (h : βˆ… ⊒ t ∷ T) : Value t ∨ βˆƒ t', t ⟢ t' := sorry

theorem preservation {t t'} {T} (h : βˆ… ⊒ t ∷ T) (ht : t ⟢ t') : βˆ… ⊒ t' ∷ T := sorry

As a final exercise to the reader, formally prove the preservation and progress theorems for this language. However, no points are awarded for completing this exercise.

Footnotes

Footnotes

  1. Open the Lean playground environment. ↩ ↩2 ↩3