A Taste of Formalization
Learning Objectives
- You know of the Lean proof assistant
- You know how basic theorems about lambda calculus can be formalized using a proof assistant
This is an extra chapter about formalization of programming language theory. We are going to prove and formalize type safety for the simply typed lambda calculus from chapter 5.
Firstly, what is formalization?
Formalization is the process of converting something informal to something formal. Mathematics can be formalized by e.g. using programming languages. Some programming languages such as Lean are built with mathematics in mind and a significant amount of mathematics has already been formalized in Lean.
All of the code presented here is found in lean playground1 (click the footnote to open it).
While reading the material, move your cursor along in the editor to see how Leanβs InfoView changes.
Type Safety
Type safety in programming languages consists of two parts:
- Preservation: Types are preserved during evaluation. If and , then . This can be written more formally as
- Progress: Well-typed terms are values or can be reduced. If , then either is a value, or there exists This can be written more formally as
The rules we have given actually guarantee a stronger property than progress: unicity. This means that there is a unique term to reduce to:
Read more in chapter 6 of Practical Foundations for Programming Languages.
Definitions
First, the definitions of terms and types are the following two inductive types
/-
# Definition 6.5.1: Terms and Types of STLC + π
-/
inductive Ty where
| Boolean
| Arrow (dom cod : Ty)
deriving Repr
inductive Term where
-- Core STLC
| Abs (name : String) (ty : Ty) (body : Term)
| App (tβ : Term) (tβ : Term)
| Var (name : String)
-- Booleans
| True
| False
| Ite (cond : Term) (if_true : Term) (if_false : Term)
deriving Repr
Next, we define what it means to be a value. This is an inductive proposition that has one non-fixed argument for the Term.
inductive Value : Term β Prop where
| Abs : β {x} {T} {t}, Value (.Abs x T t)
| False : Value False
| True : Value True
You can read more about the subject in the Theorem Proving in Lean 4 book.
An inductive proposition is an inductive data type, but in the universe of propositions. Like other algebraic data types, inductive propositions can be constructed with its constructors and deconstructed by case matching.
inductive Value : Term β Prop where
| Abs : β {x} {T} {t}, Value (.Abs x T t)
| False : Value False
| True : Value True
example : Value False := Value.False
example : Value (Abs "x" .Boolean (Var "x")) :=
@Value.Abs "x" .Boolean (Var "x")
Because the arguments to
Abs
are implicit, Lean can infer them in some situations. Therefore we can shorten the proof for abstraction example to:example : Value (Abs "x" .Boolean (Var "x")) := Value.Abs
Finally to case analyze some Value t
, we can use the cases
tactic.
For example, the following proof shows that a value is either False
, True
or there exist x, T, tβ
such that the value is Abs x T tβ
.
example (h : Value t) : t = False β¨ t = True β¨ β x T tβ, t = Abs x T tβ := by
cases h with
| True =>
-- β’ True = False β¨ True = True β¨ β x T tβ, True = Abs x T tβ
-- Which holds as the middle disjunct is true
right
left
rfl
| False =>
-- β’ False = False β¨ False = True β¨ β x T tβ, False = Abs x T tβ
left
rfl
| @Abs y U t =>
/-
y : String
U : Ty
t : Term
β’ Abs y U t = False β¨ Abs y U t = True β¨ β x T tβ, Abs y U t = Abs x T tβ
-/
-- This is true because we can use `x = y` `T = U` and `tβ = t` to prove the β
right
right
exact β¨y, U, t, rflβ©
β The last case is most interesting, as there we get access to the variables that βwent intoβ constructing the Abs Value.
Next, we define substitution recursively:
def subst (x : String) (s : Term) (t : Term) :=
match t with
| Var y => if x = y then s else t
| Abs y T tβ => if x = y then t else Abs y T (subst x s tβ)
| App tβ tβ => App (subst x s tβ) (subst x s tβ)
| Ite tβ tβ tβ => Ite (subst x s tβ) (subst x s tβ) (subst x s tβ)
-- Values etc.
| t => t
We can even give a notation
for substitution using Leanβs metaprogramming tools:
notation:60 "[" x "β¦" s "]" t:61 => subst x s t
#eval ["x" β¦ (Var "y")] (Var "x") -- Var "y"
Evaluation
Next, we define a relation Step : Term β Term β Prop
for -reduction.
Reminder: -reduction rules form chapter 5:
Compare the following rules from Lean with these.
inductive Step : Term β Term β Prop where
| AppAbs : β x Tβ tβ vβ,
Value vβ β
-----------------------------------------
Step (App (Abs x Tβ tβ) vβ) ([x β¦ vβ] tβ)
| App1 : β tβ tβ' tβ,
Step tβ tβ' β
-----------------------------
Step (App tβ tβ) (App tβ' tβ)
| App2 : β vβ tβ tβ',
Value vβ β
Step tβ tβ' β
-----------------------------
Step (App vβ tβ) (App vβ tβ')
-- Booleans
| IteTrue : β tβ tβ,
------------------------
Step (Ite True tβ tβ) tβ
| IteFalse : β tβ tβ,
-------------------------
Step (Ite False tβ tβ) tβ
| Ite : β tβ tβ' tβ tβ,
Step tβ tβ' β
-----------------------------------
Step (Ite tβ tβ tβ) (Ite tβ' tβ tβ)
I have left βinference lineβ comments to separate the hypotheses from the conclusion.
And hereβs the notation for Step
:
notation:50 t " βΆ " t' => Step t t'
Context
Next we define the context as a partial function from String
s to Ty
pes.
-- A partial map from variable names to types
abbrev Context := String β. Ty
Next, we define what it means to update the context with a new variable-type association.
def update (ctx : Context) (v : String) (ty : Ty) : Context :=
fun s => if s = v then pure ty else ctx s
The definition simply defines a new function, which if passed the same string as v
, returns the type ty
, and otherwise refers to the old context.
The empty context is a partial function which is not defined for any input:
def empty : Context := fun _ => Part.none
instance instEmptyCollection : EmptyCollection Context where
emptyCollection := empty
#check (β
: Context)
@[simp]
lemma empty_apply {y} : (β
: Context) y = .none := rfl
The last (simp) lemma states that
β y = Part.none
, which will be useful later.
We also define a subset relation between contexts called IncludedIn
:
-- The first context is "included" in the second context
def IncludedIn (Ξ Ξ' : Context) : Prop :=
β x v, v β Ξ x β v β Ξ' x
instance : HasSubset Context := β¨IncludedInβ©
#check β
β update β
"x" .Boolean
example : β
β update β
"x" .Boolean := by
intro x T h
simp only [empty_apply, Part.not_mem_none] at h
Typing
Now that we have defined the Context
, letβs move on to typing next.
First, a quick reminder β the typing rules for the core of STLC:
The hypothetical typing judgement (Definition 6.5.4) is the following inductive trinary relation:
/--
# Definition 6.5.4: Hypothetical Typing Judgment
-/
inductive HasType : Context β Term β Ty β Prop where
| Var : β Ξ x T,
T β Ξ x β
-------------------
HasType Ξ (Var x) T
| Abs : β Ξ x Tβ Tβ tβ,
HasType (Ξ.update x Tβ) tβ Tβ β
--------------------------------------
HasType Ξ (Abs x Tβ tβ) (.Arrow Tβ Tβ)
| App : β Ξ tβ tβ T Tβ,
HasType Ξ tβ (.Arrow Tβ T) β
HasType Ξ tβ Tβ β
-----------------------
HasType Ξ (App tβ tβ) T
-- Booleans
| True : HasType Ξ True .Boolean
| False : HasType Ξ False .Boolean
| Ite :
HasType Ξ tβ .Boolean β
HasType Ξ tβ T β
HasType Ξ tβ T β
-------------------------
HasType Ξ (Ite tβ tβ tβ) T
Plus notation:
notation:50 Ξ:51 " β’ " t:52 " β· " T => HasType Ξ t T
Examples from Example 6.5.5:
/-
# Example 6.5.5
-/
example : "x" β· .Boolean β’ (Var "x") β· .Boolean := by
apply HasType.Var
exact Context.mem_update
example : β
β’ (Abs "x" .Boolean (Var "x")) β· .Arrow .Boolean .Boolean := by
apply HasType.Abs
apply HasType.Var
exact Context.mem_update
Next, letβs actually prove some properties about !
Canonical terms
The canonical terms lemmas state that a value of Arrow
type must be an abstraction, and a value of Boolean
type must be either True
or False
The lemmas are useful when proving the progress theorem.
lemma canonical_abs {t Tβ Tβ} (h : β
β’ t β· .Arrow Tβ Tβ) (ht : Value t) : β x tβ, t = Abs x Tβ tβ := by
cases ht with
| @Abs x T tβ =>
cases h
use x
use tβ
| True | False =>
nomatch h
lemma canonical_bool {t} (h : β
β’ t β· .Boolean) (ht : Value t) : t = True β¨ t = False := by
cases ht with
| @Abs x T tβ =>
cases h
| True | False =>
tauto
Progress Theorem
/-
# Theorem: Progress
-/
theorem progress {t} {T} (h : β
β’ t β· T) : Value t β¨ β t', t βΆ t'
The progress theorem states that all well-typed terms (in the empty context) are either values, or can be reduced further. This means that itβs not possible for a term to be βstuckβ so that itβs neither. This is one of the two requirements for type safety.
See theorem progress
for the proof details in the Lean environment1.
Preservation Theorem
/-
# Theorem: Preservation
-/
theorem preservation {t t'} {T} (h : β
β’ t β· T) (ht : t βΆ t') : β
β’ t' β· T
The preservation theorem states that a well-typed termβs type is preserved during -reduction.
This theorem is easy to prove with the help of a few lemmas called weakening
and substitution
:
/-
# Lemma: Weakening
-/
lemma weakening {Ξ Ξ'} {t} {T} (hg : Ξ β Ξ') (h : Ξ β’ t β· T) : Ξ' β’ t β· T := by
/-
# Lemma: Substitution
-/
lemma substitution {Ξ} {x} {v t} {U T} (ht : Ξ.update x U β’ t β· T) (h : β
β’ v β· U)
: Ξ β’ [x β¦ v] t β· T
See theorem preservation
for the proof details in the Lean environment1.
With both preservation and progress, we know that is type safe.
STLC Extensions
Itβs not too difficult to extend type safety (i.e. progress and preservation theorems) to STLC with extensions. Hereβs the definition of a language based on STLC with unit types, booleans, integers, lists, pairs, sums, fixed point combinator and let-expressions.
inductive Ty where
| Unit
| Boolean
| Integer
| List (elem : Ty)
-- dom β cod
| Arrow (dom cod : Ty)
-- left Γ right
| Prod (left right : Ty)
-- left + right
| Sum (left right : Ty)
deriving Repr, DecidableEq
inductive Term where
-- STLC core
| Var (name : String)
| Abs (name : String) (T : Ty) (body : Term)
| App (tβ tβ : Term)
-- Unit
| Trivial
-- Boolean
| True
| False
| Ite (cond if_true if_false : Term)
| Eq (left righ : Term)
-- Integer
| Int (n : β€)
-- List
| Nil (T : Ty)
| Cons (head tail : Term)
| LCase (t : Term)
(nil_t : Term)
(head_var : String) (tail_var : String) (cons_t : Term)
-- Prod
| Pair (left right : Term)
| Fst (t : Term)
| Snd (t : Term)
-- Sum
| Inl (t : Term) (Tβ : Ty)
| Inr (t : Term) (Tβ : Ty)
| Case (t : Term)
(inl_var : String) (inl_t : Term)
(inr_var : String) (inr_t : Term)
-- Sugar
| Let (name : String) (value_expr body : Term)
-- Recursion
| Fix (t : Term)
deriving Repr
/- Omitted other definitions about Context, Value, subst etc. -/
theorem progress {t} {T} (h : β
β’ t β· T) : Value t β¨ β t', t βΆ t' := sorry
theorem preservation {t t'} {T} (h : β
β’ t β· T) (ht : t βΆ t') : β
β’ t' β· T := sorry
As a final exercise to the reader, formally prove the preservation and progress theorems for this language. However, no points are awarded for completing this exercise.
Footnotes
Footnotes
-
Open the Lean playground environment. β© β©2 β©3