Type inference algorithms play an important role in real-world programming languages: they allow the user to selectively omit type annotations where the type is obvious from the context. One of the most famous applications is the ML family, where the programmer is permitted to omit almost all type annotations. The algorithm is based on unification and resembles solving equations: each unknown type is denoted by a placeholder, and our task is to solve the equations to obtain a set of proper substitutions that satisfies them, i.e., the solution of the equations.
To get started with the algorithm, some preliminaries are required. I'll state the definitions intuitively rather than formally in order to help readers grasp them, so do not take them as formal definitions; if you are looking for rigorous definitions from a mathematical perspective, Unification - Wikipedia is what you need.
Let's start with the first question that arises: what exactly is a type equation? Intuitively, it means that we want to determine the shape of the exact types and then find the substitutions (the solution) that satisfy our assumptions. For example, consider the term \lambda x.\ x\ (x\ 0): what type should x have?
Let's split the body of the abstraction into the inner application x\ 0 and the outer application x\ y where y=x\ 0. Consider the inner application: x is applied to 0, which reveals two facts: 1. if x is well-typed, it must be of type \tau\to\sigma for some \tau and \sigma; 2. since it is applied to 0, \tau must equal \tt int. So, from the inner application, supposing x has type \rho, we can generate two type equations for x: 1. \rho=\tau\to\sigma for some \tau and \sigma; 2. \tau=\tt int. Now consider the outer application. We already know that the inner application has type \sigma (we concluded x:\tau\to\sigma from the inner application, and it is applied to 0, so by the typing rule for application it is immediate that x\ 0:\sigma), hence the outer x must take an argument of type \sigma and return some \sigma'. Since the inner x and the outer x must have the same type, a new equation emerges: \tau\to\sigma=\sigma\to\sigma'. Combined with the aforementioned equations, we obtain the equation set \lbrace \rho=\tau\to\sigma,\ \tau=\mathtt{int},\ \tau\to\sigma=\sigma\to\sigma'\rbrace
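Solving this equation set by hand takes only a few substitution steps (a quick derivation, using nothing beyond the three equations above):

```latex
\begin{aligned}
\tau    &= \mathtt{int}                 && \text{(second equation)}\\
\sigma  &= \mathtt{int}                 && \text{(domains of } \tau\to\sigma=\sigma\to\sigma'\text{: } \sigma=\tau\text{)}\\
\sigma' &= \mathtt{int}                 && \text{(codomains: } \sigma'=\sigma\text{)}\\
\rho    &= \mathtt{int}\to\mathtt{int}  && \text{(first equation)}
\end{aligned}
```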
Be aware that during this calculation, several derivations occur that would not be permitted in a normal type-checking algorithm. For example, we assume x:\tau\to\sigma, but then apply it to 0:\tt int; how could this happen, when \tau is not yet known to equal \tt int? The reason is that \tau, \rho, \sigma, and \sigma' are type placeholders: when they meet other types during the calculation, i.e., a placeholder \tau appears where an \tt int is expected, instead of failing the algorithm immediately we add a new equation \tau=\tt int indicating that \tau is supposed to be of type \tt int. This is precisely how the algorithm generates the equations.
We've been using the phrase "type equation" for such equations; in computer science and logic they have a proper name, Constraint, and a set of constraints is called a Constraint Set. The procedure for solving constraints is called Unification, and the solution of a constraint is called its Unifier; a unifier unifies a constraint set if it unifies every constraint in it.
From the definitions above it is not hard to see that constraint sets, unification, and unifiers are just aliases for the corresponding concepts for equations. In the example above, the unifier is [\tau\mapsto \mathtt{int}, \sigma\mapsto \mathtt{int}, \sigma'\mapsto \mathtt{int}, \rho\mapsto \mathtt{int}\to\mathtt{int}]. Note that we use the symbol \mapsto instead of =, because in this context the unifier stands for substitutions: applying a unifier to a particular type replaces every variable of the type that is in the unifier's domain with the corresponding type in its codomain. For example, applying the above unifier to the type \rho\to\sigma yields (\mathtt{int}\to\mathtt{int})\to \mathtt{int}, where \rho is substituted by \mathtt{int}\to\mathtt{int} and \sigma is substituted by \tt int.
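Applying a unifier can be sketched in a few lines of Scala. This is a deliberately minimal sketch, not the implementation developed later in this post: the `Ty` representation and names here are illustrative, and the substitution map is assumed idempotent (no placeholder in its domain occurs in its codomain).

```scala
// A minimal sketch: placeholders are string-named variables, and a unifier
// is a Map from placeholder names to types, assumed idempotent.
enum Ty:
  case Base(n: String)         // concrete types such as int
  case Var(n: String)          // placeholders such as ρ, σ
  case Arrow(from: Ty, to: Ty) // function types

def applyUnifier(subst: Map[String, Ty], ty: Ty): Ty = ty match
  case Ty.Var(n)          => subst.getOrElse(n, ty)
  case Ty.Arrow(from, to) => Ty.Arrow(applyUnifier(subst, from), applyUnifier(subst, to))
  case base               => base

// The unifier from the example: [τ ↦ int, σ ↦ int, σ' ↦ int, ρ ↦ int → int]
val unifier = Map(
  "τ"  -> Ty.Base("int"),
  "σ"  -> Ty.Base("int"),
  "σ'" -> Ty.Base("int"),
  "ρ"  -> Ty.Arrow(Ty.Base("int"), Ty.Base("int"))
)

// Applying it to ρ → σ yields (int → int) → int.
val result = applyUnifier(unifier, Ty.Arrow(Ty.Var("ρ"), Ty.Var("σ")))
```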
Now we can give the algorithm for calculating the constraint set:
Again, we are aiming for "easy to understand" rather than rigor, so I'll explain the enigmatic mathematical formulas in a pragmatic way. When you see something like \cfrac{A_1,A_2,...}{B}, the A_i are premises and B is a conclusion; it means "if all of A_1,A_2,... are satisfied, then we can conclude B", or, in a way that is more intuitive for programmers, "if A_1=\mathtt{true}\ \land\ A_2=\mathtt{true}\ \land\ ..., then B=\tt true". The formula \Gamma\vdash x:\mathtt{T_1} means x has type \tt T_1 under the context \Gamma; in practice, \Gamma is usually the symbol table, a list that records the defined variables and their types, and we write \Gamma,x:\tt T to append x:\tt T to the end of \Gamma. P\vdash Q reads "with the information in P, I can conclude that Q is true"; P can be omitted when it is empty, meaning Q holds without any context. The figure above uses two more context symbols besides \Gamma, namely \mathcal{X} and \mathcal{C}: \mathcal{X} is the set of type placeholders generated during the algorithm (see the explanation of "type placeholder" above), and \mathcal{C} is the already-calculated set of constraints. Specifically, \Gamma\vdash x:\mathtt{T_1}\ |_{\mathcal{X}}\ \mathcal{C} means x has type \tt T_1 under symbol table \Gamma, the already-calculated constraint set is \mathcal{C}, and the type placeholders generated so far are stored in \mathcal{X}. If we think of the algorithm as a function, the function takes \Gamma and x as input and produces \mathcal{X} and \mathcal{C} as output. The type \tt Nat stands for "natural numbers" (we use it interchangeably with \tt int, although they stand for very different ranges).
Now we can give the function to calculate the constraint set, but first, let's introduce some required definitions:
```scala
enum Type:
  case Base(n: String) extends Type
  case Constructor(from: Type, to: Type) extends Type
  case Scheme(quantifiers: Set[Type], ty: Type) extends Type
  case Forall(n: String) extends Type

  override def toString: String =
    import Type.*
    this match
      case Base(name)            => name
      case Constructor(from, to) => s"($from->$to)"
      case Scheme(quantifiers, ty) =>
        s"∀${quantifiers.foldLeft("")((acc, base) => acc + base.name)}.($ty)"
      case Forall(name)          => s"∀$name"
  end toString

  def name: String =
    import Type.*
    this match
      case Base(name)   => name
      case Forall(name) => name
      case _            => throw IllegalStateException()
  end name
end Type

object TypePrimitives:
  val Nat: Type = Type.Base("Nat")
  val Bool: Type = Type.Base("Bool")
end TypePrimitives
```
We use Type.Base to represent monotypes (i.e., base types that can be represented by a single symbol), Type.Constructor to represent function types, and Type.Forall to represent type placeholders; Type.Scheme will be explained soon. TypePrimitives provides built-in types (like primitive types in Java/C#). The abstract syntax tree of our language can be described by:
```scala
enum SyntaxNode:
  case Var(name: String)
  case Abs(binder: String, binderType: Option[Type], body: SyntaxNode)
  case App(function: SyntaxNode, applicant: SyntaxNode)
  case Successor(nat: SyntaxNode)
  case Predecessor(nat: SyntaxNode)
  case IsZero(nat: SyntaxNode)
  case If(condition: SyntaxNode, trueClause: SyntaxNode, falseClause: SyntaxNode)
  case Zero, True, False
  case Let(left: String, right: SyntaxNode, body: SyntaxNode)
  case Seq(first: SyntaxNode, second: SyntaxNode)
end SyntaxNode
```
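As a quick sanity check, the running example term can be built from these nodes. The snippet below is self-contained, so it copies only the cases it needs under the hypothetical name `Node` (and simplifies the binder-type field to a plain `Option[String]`); in the real code you would use the `SyntaxNode` enum above directly.

```scala
// Self-contained excerpt: only the cases used by the example.
enum Node:
  case Var(name: String)
  case Abs(binder: String, binderType: Option[String], body: Node)
  case App(function: Node, applicant: Node)
  case Zero

import Node.*

// λx. x (x 0): the running example, with the binder type left as None
// so that inference has to solve for it.
val example: Node = Abs("x", None, App(Var("x"), App(Var("x"), Zero)))
```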
To avoid repeating tedious type expressions, we introduce some type aliases:
```scala
import scala.collection.mutable

opaque type Constraint = (Type, Type)
opaque type ConstraintSet = (Type, Set[String], Set[Constraint])
opaque type Bindings = mutable.Map[String, Type]
```
The ConstraintSet has three members: the first is the calculated type (which may contain type placeholders), the second is the set of generated type placeholders, and the third is the already-calculated constraint set; Bindings is the symbol table.
```scala
object FreshNameProvider:
  private var currentChar: Char = 'A'
  private var currentIndex: Int = 0

  def freshName(): String =
    val str = s"$currentChar${if currentIndex != 0 then currentIndex else ""}"
    currentChar = currentChar match
      case x if ('A' until 'Z').contains(x) => (currentChar + 1).toChar
      case _                                => currentIndex += 1; 'A'
    str

  def freshName(excludes: Set[String]): String =
    val name = freshName()
    if !excludes.contains(name) then name else freshName(excludes)
end FreshNameProvider
```
The FreshNameProvider provides a unique name at every call to freshName() or freshName(Set[String]); the latter generates a unique name different from every element of the given set.
```scala
// Get all the base type variables that ever occur in the given constraint set
def constraintVariables(constraintSet: Set[(Type, Type)]): Set[Type] =
  constraintSet.flatMap { case (left, right) =>
    typeVariables(left) | typeVariables(right)
  }

// Get all the base type variables in a type declaration
def typeVariables(decl: Type): Set[Type] =
  decl match
    case base: Type.Base            => Set(base)
    case Type.Constructor(from, to) => typeVariables(from) | typeVariables(to)
    case Type.Scheme(_, ty)         => typeVariables(ty)
    case forall: Type.Forall        => Set(forall)
```
```scala
// Retrieves the free occurrences of variables in a term
def freeVariables(root: SyntaxNode): Set[String] =
  import SyntaxNode.*
  val symbols = mutable.Stack[String]()
  def helper(r: SyntaxNode): Set[String] =
    r match
      case Var(name) => if !symbols.contains(name) then Set(name) else Set.empty
      case Abs(binder, _, body) =>
        symbols.push(binder)
        val fvs = helper(body)
        symbols.pop()
        fvs
      case App(function, applicant) => helper(function) | helper(applicant)
      case If(condition, ifTrue, ifFalse) =>
        helper(condition) | helper(ifTrue) | helper(ifFalse)
      case Successor(n)   => helper(n)
      case Predecessor(n) => helper(n)
      case IsZero(n)      => helper(n)
      case Seq(a, b)      => helper(a) | helper(b)
      case Let(left, right, body) =>
        val rightFvs = helper(right)
        symbols.push(left) // `left` is bound in the body
        val bodyFvs = helper(body)
        symbols.pop()
        rightFvs | bodyFvs
      case _ => Set.empty
  end helper
  helper(root)
end freeVariables
```
These helper functions calculate the free variables in an expression, a type, and a constraint set, respectively.
I previously discussed this with friends in a group chat: if you don't plan to do pure theoretical research, what is the point of studying theory at all? Take programming languages as the example. Most of today's languages seem to work out of the box, without requiring you to understand the underlying principles, such as how code is understood by the machine, or how the bad code you write gets automatically optimized. With compilers as smart as they are today, what is the use of studying compiler design or programming language theory?
First, it tells you what code you should write and what code you should not. For example, once you know how a regular expression engine works, you know which regular expressions can cause trouble (related topic: catastrophic backtracking), and you won't be lost when such a problem strikes. The same goes for common memory and performance issues: if you know how the VM and the compiler handle interfaces, you know when using an interface costs performance and when it does not (related optimization: devirtualization); if you know how the compiler treats async objects and closures, you know why async/await tends to leave a lot of fragments in memory (related topic: CPS transformation). Going further, if you know the difference between closures and plain anonymous functions and how the compiler treats each, you can write higher-performance code (for example, the Messenger in CommunityToolkit.HighPerformance avoids creating closures, and thereby improves performance, by adding a parameter to the anonymous function); and you know when a struct is allocated on the stack and when it escapes to the heap (related topic: boxing and unboxing; Java does not have this particular problem, since Java currently has no struct and instead uses escape analysis to automatically stack-allocate objects that provably do not escape). Once you understand these things, you can avoid writing code that is hostile to memory or execution speed. Before learning them, they may look like compiler magic that can only be memorized by rote; but once you connect the theory and the practice, they become perfectly natural, and both the conclusions and their causes can be derived from the knowledge you have mastered. It is the difference between memorizing formulas and deriving them, or between memorizing words and learning roots and word formation; in other words, between being given a fish and learning to fish. The former may be more efficient while the amount of material is small, but as you broaden and deepen your knowledge it is bound to become a pain point; the latter starts slower, but once mastered it lets you meet endless change with a single principle and stand firm, with half the effort, in the flood of knowledge.
Second, it saves you from wasted effort, for example by telling you which optimizations are unnecessary. Someone once sent me an article about optimizing the performance of C#'s foreach loop over arrays. The author found it absurd that foreach would call GetEnumerator() and a pile of other machinery when simply taking indices with the [] operator would do; the article marched out MSIL dumps and WinDbg sessions, one treasured tool after another, and finally concluded that replacing foreach with for is all it takes. After reading it I was rather puzzled: that foreach is syntactic sugar for GetEnumerator(), Current, and MoveNext() should be common knowledge. After thinking it over carefully, I summarized the article's flaws: 1. foreach in the general case compiles to IL that uses the three methods above, and arrays can be used with foreach because their corresponding class in the CLR itself provides a GetEnumerator() method, so they naturally fall into this pattern; 2. the compiler's foreach optimization automatically turns foreach over an array into a for loop (Java has the RandomAccess interface supporting the same kind of treatment), so there is no need whatsoever to optimize foreach by hand; 3. foreach is based on duck typing, which in many scenarios even avoids the overhead of virtual method calls to some extent. The reason the article spent its time on an unnecessary micro-optimization is a lack of command of this knowledge, which is covered in plenty of compiler-design literature; even MSDN and dotnet/runtime, a developer-facing documentation site and repository, offer introductions that are not very deep but are more than enough to make the points above clear.
Of course, this is not to say that valuing practice is bad; after all, theory is nothing but experience and regularity that people discovered in practice, summarized, and formalized. Theory without practice inevitably ends in armchair strategy, while practice without theory easily heads in exactly the wrong direction. Only by combining the two can you ride the wind of your predecessors across the world while keeping your own curiosity and insight. Many things that strain the mind in theoretical texts dissolve once you implement them yourself, and many intricate structures that make no sense from their implementation alone become clear once you learn a little of the theory behind them. Managing the relationship between the two, and choosing the proportion that fits your own situation, is the right way to learn in engineering-related fields. (Just my personal opinion.)
∎
The intuition comes from their definitions: the greatest fixed point is the union of all \mathcal{F}-consistent sets, and the least fixed point is the intersection of all \mathcal{F}-closed sets.
A tree type is a partial function T:\lbrace 1,2\rbrace^\ast\rightharpoonup\lbrace\to, \times, \top \rbrace such that T(\epsilon)\downarrow; T(\pi\rho)\downarrow implies T(\pi)\downarrow; and T(\pi)=\to or T(\pi)=\times implies T(\pi1)\downarrow and T(\pi2)\downarrow. For example, the one-node tree is given by T(\epsilon)=\top. Tree types can also be expressed by the following BNF:
The set of all finite tree types, denoted \mathcal{T_f}, is the least fixed point of the generating function induced by this grammar, and the set of all tree types (both finite and infinite), denoted \mathcal{T}, is its greatest fixed point∎
The tree type is in fact a function that takes a string consisting only of 1 and 2, interpreted as a path from the root of a binary tree to one of its nodes, where 1 means the left child and 2 means the right child.
The subtyping relation of a type system that tolerates recursive types can thus be defined via the subtyping relations on finite tree types and on infinite tree types.
For any \tau,\rho\in\mathcal{T_f}, \tau<:\rho if (\tau, \rho)\in\mu \mathcal{S_f}, where the monotone function \mathcal{S_f}:\mathcal{P}(\mathcal{T_f}\times\mathcal{T_f})\to\mathcal{P}(\mathcal{T_f}\times\mathcal{T_f}) is defined by:
∎
For any \tau,\rho\in\mathcal{T}, \tau<:\rho if (\tau, \rho)\in\nu \mathcal{S}, where the monotone function \mathcal{S}:\mathcal{P}(\mathcal{T}\times\mathcal{T})\to\mathcal{P}(\mathcal{T}\times\mathcal{T}) is defined by:
∎
Notice that the generating functions for finite and infinite types are the same; the two definitions differ only in the choice of fixed point.
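A one-line illustration of the difference: the infinite tree type satisfying \tau=\top\times\tau, i.e.

```latex
\tau \;=\; \top\times(\top\times(\top\times\cdots)),
\qquad \tau\in\mathcal{T}\setminus\mathcal{T_f}
```

is an element of the greatest fixed point (the set \lbrace\tau\rbrace is consistent with the grammar) but of no finite stage, so it lies in \mathcal{T} but not in \mathcal{T_f}; every finite truncation of it, however, is in \mathcal{T_f}.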
A generating function \mathcal{F} is said to be invertible if for all x\in\mathcal{U}, the family of sets G_x=\lbrace X\sube\mathcal{U}\ |\ x\in\mathcal{F}(X) \rbrace is strictly ordered and well-founded under set inclusion. If \mathcal{F} is invertible, we can define a partial function sup_{_\mathcal{F}}:\mathcal{U}\rightharpoonup\mathcal{P(U)} as:
sup_{\tiny\mathcal{F}} is basically the function that gives the smallest set that can generate, or provide, its argument: if S is the minimal set such that x\in \mathcal{F}(S), then sup_{_\mathcal{F}}(x)=S
We can generalize sup_{_\mathcal{F}} to sets:
∎
We can check whether a set X is a subset of the greatest fixed point of a function \mathcal{F} by leveraging sup_{\tiny\mathcal{F}}; define gfp_{\tiny\mathcal{F}}:\mathcal{P(U)}\to\lbrace true, false\rbrace as follows:
and let gfp_{\tiny\mathcal{F}}(x)\equiv gfp_{\tiny\mathcal{F}}(\lbrace x\rbrace)
∎
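The gfp procedure above can be sketched directly in Scala. This is a toy sketch: the support function is passed in as a parameter returning `Option` (with `None` standing for "sup_F(X) is undefined"), and the generating function used in the demo, a cycle on \lbrace 0,\dots,4\rbrace, is purely illustrative.

```scala
// gfp_F(X): false if sup_F(X) is undefined; true if sup_F(X) ⊆ X
// (X is F-consistent); otherwise recurse on X ∪ sup_F(X).
def gfp[A](sup: Set[A] => Option[Set[A]])(x0: Set[A]): Boolean =
  def loop(x: Set[A]): Boolean = sup(x) match
    case None                     => false // some element cannot be generated
    case Some(s) if s.subsetOf(x) => true  // X ⊆ F(X): inside the gfp
    case Some(s)                  => loop(x ++ s)
  loop(x0)

// Toy generating function on Z5: F(X) = { n | (n+1) mod 5 ∈ X },
// so sup_F(n) = {(n+1) mod 5}, undefined outside {0..4}.
def supZ5(x: Set[Int]): Option[Set[Int]] =
  if x.forall(n => 0 <= n && n < 5) then Some(x.map(n => (n + 1) % 5))
  else None
```

Here `gfp(supZ5)(Set(0))` terminates after the accumulated set has grown to all of \lbrace 0,\dots,4\rbrace, exactly the bound given by the reachable set in the termination proof below.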
Next we prove the correctness of the function gfp_{\tiny\mathcal{F}}, making sure it really returns the correct result for arbitrary X.
X\sube\mathcal{F}(Y) iff sup_{\tiny\mathcal{F}}(X)\downarrow and sup_{\tiny\mathcal{F}}(X)\sube Y
Proof:
From the definition of sup_{\tiny\mathcal{F}} it suffices to prove that for all x\in\mathcal{F}(Y), sup_{\tiny\mathcal{F}}(x)\downarrow and sup_{\tiny\mathcal{F}}(x)\sube Y: since x\in\mathcal{F}(Y), clearly G_x\neq \emptyset, which by definition means sup_{\tiny\mathcal{F}}(x)\downarrow and sup_{\tiny\mathcal{F}}(x)\sube Y (since sup_{\tiny\mathcal{F}} returns the smallest element of G_x). Conversely, from the monotonicity of \mathcal{F} we get \mathcal{F}(sup_{\tiny\mathcal{F}}(x))\sube \mathcal{F}(Y); but x\in\mathcal{F}(sup_{\tiny\mathcal{F}}(x)) by definition, thus x\in\mathcal{F}(Y)∎
This lemma essentially inverts the definition of sup_{\tiny\mathcal{F}}
Suppose P is a fixed point of \mathcal{F}, then X\sube P iff sup_{\tiny\mathcal{F}}(X)\downarrow and sup_{\tiny\mathcal{F}}(X)\sube P
Proof:
By the definition of fixed point, X\sube P\iff X\sube \mathcal{F}(P); the result is then immediate from Lemma 1.∎
Now we can prove the partial correctness of gfp_{\tiny\mathcal{F}}, where "partial" means that the termination proof of gfp_{\tiny\mathcal{F}} requires further constraints, which will be investigated later.
Proof:
By induction on the recursion tree of gfp_{\tiny\mathcal{F}}(X)=true
∎
Now we are going to investigate the termination condition for gfp_{\tiny\mathcal{F}}.
Define the set rch_{\tiny\mathcal{F}}(X) to be the union of pred^n(X) (the n-fold composition of pred) for all n\geqslant 0, where pred(X)=\bigcup_{x\in X}pred(x) and pred(x) is defined by:
and let rch_{\tiny\mathcal{F}}(x)\equiv rch_{\tiny\mathcal{F}}(\lbrace x\rbrace)∎
Note that the set rch_{\tiny\mathcal{F}}(X) contains all the elements that are directly or indirectly required to generate the elements of X
A generating function \mathcal{F} is said to be finite state if rch_{\tiny\mathcal{F}}(x) is finite for all x\in\mathcal{U}∎
gfp_{\tiny\mathcal{F}}(X)\downarrow only if rch_{\tiny\mathcal{F}}(X) is finite; i.e., if \mathcal{F} is finite state, then gfp_{\tiny\mathcal{F}} is guaranteed to terminate for any finite X\sube\mathcal{U}
Proof:
Notice that for each recursive call gfp_{\tiny\mathcal{F}}(Y) arising from the original call gfp_{\tiny\mathcal{F}}(X), we have Y\sube rch_{\tiny\mathcal{F}}(X) by definition; since the size of Y is strictly increasing and rch_{\tiny\mathcal{F}}(X) is finite, the function must terminate at the latest when |Y|=|rch_{\tiny\mathcal{F}}(X)|.∎
We have now defined the generating function \mathcal{S} for the subtyping relation on infinite types and found a function to check membership; what remains is to implement the sup function for \mathcal{S} and, to prove the correctness of the resulting algorithm, to identify the class of types on which it terminates.
A tree type \rho is a subtree of a tree type \tau if \rho=\lambda\sigma.\ \tau(\pi\sigma) for some path \pi, i.e., \rho can be obtained from \tau by prepending a fixed prefix \pi to the argument; the prefix \pi is the path from the root of \tau to the root of \rho. We use subtrees(\tau) to denote the set of all subtrees of \tau.∎
Since a tree type is a function that takes a path represented as a string and returns the node at that path, prepending a fixed prefix means we always start from a particular node.
A tree type \tau\in\mathcal{T} is regular if subtrees(\tau) is finite, i.e., \tau has finitely many distinct subtrees. The set of all regular trees is written \mathcal{T}_r.∎
Note that if we restrict the domain of \mathcal{S} to \mathcal{T_r} (the restriction is denoted \mathcal{S_r}), then it is finite state. To prove this, observe that rch_{\tiny\mathcal{S_r}}(\tau, \rho)\sube subtrees(\tau)\times subtrees(\rho) (obvious from the three clauses of \mathcal{S}); since both subtrees(\tau) and subtrees(\rho) are finite, so is rch_{\tiny\mathcal{S_r}}(\tau, \rho).
Let \lbrace X_1, X_2, ...\rbrace be a countable set of type variables, and let \mathcal{T_m^{raw}} be the set of raw \mu-types defined by the following BNF:
where \mu X.\tau is a recursive type that can be unfolded once by [X\mapsto \mu X.\tau]\tau; moreover, all the unfoldings of a raw \mu-type are definitionally equal (that is the so-called "equi-recursive" treatment: the unfolding of a recursive type is equal to its folding). Intuitively, \mu X.\mathtt{Int}\to X is definitionally equal to the following Scala type declaration (pseudocode only; real Scala does not permit recursive occurrences of X on both sides of =):

```scala
type X = Int -> X
```
We use FV(\tau) to denote the set of free type variables of a \mu-type \tau
To establish the theorems for raw \mu-types, we first need to prove that raw \mu-types are in some sense isomorphic to tree types (since we defined both finite and infinite types and their generating functions \mathcal{S_f} and \mathcal{S} on the tree representation), i.e., that we can transform a raw \mu-type into a tree type and vice versa. Intuitively, the tree type is obtained as the infinite unfolding of a \mu-type; however, this only holds under a specific condition, namely when the raw \mu-type is contractive:
A raw \mu-type \tau is contractive if for every subexpression \tau' of \tau of the form \mu X.\mu X_1...\mu X_n.S, the body S is not equal to any of the binders X, X_1, ..., X_n. A contractive raw \mu-type is simply called a \mu-type, and the set of all contractive raw \mu-types is denoted \mathcal{T_m}
To see why contractivity is required, consider the \mu-type \mu T.T: the \mu-binder T equals its body T, so unfolding it gives back exactly the same type, and no tree type can be constructed.
Two \mu-types \tau and \rho are in the subtyping relation if (\tau, \rho)\in\nu\mathcal{S_m}, where \mathcal{S_m}:\mathcal{P(T_m\times T_m)\to P(T_m\times T_m)} is defined by:
The additional conditions in the last clause are there to make the function invertible; otherwise the generating sets G_x would not be strictly ordered.
Now we can define sup_{\tiny\mathcal{S_m}} (which serves as the method for checking membership in the gfp) as follows:
It can be proven that, for \mu-types, \tau<:\rho (i.e., (\tau, \rho)\in\nu\mathcal{S_m}) iff for the corresponding tree types \tau'<:\rho' (i.e., (\tau', \rho')\in\nu\mathcal{S}); see p. 301 to p. 304 of the original book.
By now the algorithm for checking the subtyping relation between two \mu-types is obvious: instantiate gfp with sup_{\tiny\mathcal{S_m}} (we do not bother with sup_{\tiny\mathcal{S}} because we have proved the isomorphism between \mu-types and tree types, so they can be treated as identical). However, since gfp(X) is defined only when the reachable set of X is finite, we have to prove that rch_{\tiny\mathcal{S_m}}(\tau,\rho) is finite for every pair of \mu-types (\tau,\rho) to finish the correctness proof of the gfp-based algorithm. The proof sketch: the reachable set of a pair of tree types \tau' and \rho' consists essentially of their respective subtrees (as shown above), which correspond to subexpressions of the \mu-types \tau and \rho, so we basically need to prove that a \mu-type has finitely many subexpressions. To do this, we develop two different notions of subexpression, one called top-down and the other bottom-up: we first prove that the reachable set of a \mu-type pair is a subset of the former, then that the latter is finite, and finally that the former is a subset of the latter; by inclusion, the reachable set must then be finite. The detailed proof can be found at p. 305 to p. 311 of the original book.
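The resulting procedure can be sketched as follows. This is an illustrative sketch, not the book's exact formulation: the set `a` plays the role of the growing argument of gfp, accumulating pairs already assumed to hold; the representation and names are mine, and binders are assumed to have distinct names (no capture-avoiding substitution).

```scala
// Equi-recursive subtype check: assume (s, t), unfold μ-binders on demand.
enum MuType:
  case Top
  case Base(n: String)
  case Arrow(from: MuType, to: MuType)
  case Var(n: String)
  case Mu(v: String, body: MuType)

import MuType.*

// One-step unfolding: μX.T ↦ [X ↦ μX.T]T (no capture handling; binders
// are assumed distinct).
def unfold(v: String, body: MuType): MuType =
  def subst(t: MuType): MuType = t match
    case Var(n) if n == v   => Mu(v, body)
    case Arrow(a, b)        => Arrow(subst(a), subst(b))
    case Mu(n, b) if n != v => Mu(n, subst(b))
    case other              => other // shadowed binder or unrelated leaf
  subst(body)

def subtype(a: Set[(MuType, MuType)], s: MuType, t: MuType): Boolean =
  if a((s, t)) then true // already assumed: the coinductive step
  else
    val a1 = a + ((s, t))
    (s, t) match
      case (_, Top)                       => true
      case (Base(x), Base(y))             => x == y
      case (Arrow(s1, s2), Arrow(t1, t2)) =>
        subtype(a1, t1, s1) && subtype(a1, s2, t2) // contra- / covariant
      case (Mu(v, b), _)                  => subtype(a1, unfold(v, b), t)
      case (_, Mu(v, b))                  => subtype(a1, s, unfold(v, b))
      case _                              => false
```

For example, with S = \mu X.\mathtt{Nat}\to X, the check `subtype(Set.empty, S, Nat → S)` succeeds: unfolding makes the two sides meet, and the assumption set cuts off the otherwise infinite regress, which is exactly why termination holds only for contractive regular types.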
A function \mathcal{F}:\mathcal{P(U)}\rightarrow\mathcal{P(U)} over some universal set \mathcal{U} is monotone if X\subseteq Y\implies \mathcal{F}(X)\subseteq\mathcal{F}(Y)
Let X be the set of all \mathcal{F}-closed sets, i.e., X=\lbrace C\ |\ \mathcal{F}(C)\subseteq C \rbrace, and let P be the intersection of X, i.e., P=\bigcap X. We need to prove: 1. \mathcal{F}(P)\subseteq P; 2. P\subseteq\mathcal{F}(P).
For the first, notice that P\subseteq C for all C\in X; from the monotonicity of \mathcal{F} we get \mathcal{F}(P)\subseteq\mathcal{F}(C) for all C\in X, and from the definition of X we can further conclude that \mathcal{F}(P)\subseteq C for all C\in X. Since P is the intersection of all the C, any element contained in every C lies in P; as \mathcal{F}(P)\subseteq C for every C\in X, every x\in\mathcal{F}(P) lies in every C and hence in P, i.e., \mathcal{F}(P)\sube P.
For the second, since we've proved \mathcal{F}(P)\sube P, the monotonicity of \mathcal{F} gives \mathcal{F}(\mathcal{F}(P))\sube \mathcal{F}(P), which, by the definition of X, means \mathcal{F}(P)\in X. Moreover, P is the intersection of X, i.e., P\sube C for all C\in X; since \mathcal{F}(P)\in X, it is clear that P\sube \mathcal{F}(P) must hold.
Let X be the set of all \mathcal{F}-consistent sets, i.e., X=\lbrace C\ |\ C\sube \mathcal{F}(C) \rbrace, and let P be the union of X, i.e., P=\bigcup X. We need to prove: 1. P\sube\mathcal{F}(P); 2. \mathcal{F}(P)\sube P.
For the first, notice that C\sube P for all C\in X; from the monotonicity of \mathcal{F} and the definition of C we get C\sube\mathcal{F}(C)\sube \mathcal{F}(P) for all C\in X, and since P=\bigcup X, we have P\sube \mathcal{F}(P).
For the second, since we've proved P\sube\mathcal{F}(P), the monotonicity of \mathcal{F} gives \mathcal{F}(P)\sube \mathcal{F}(\mathcal{F}(P)), i.e., \mathcal{F}(P) is \mathcal{F}-consistent; from the definition of X we then know \mathcal{F}(P)\in X, and since P is the union of X, \mathcal{F}(P)\sube P must hold. ∎
Judging from the plot of Owarimonogatari, Araragi Koyomi has a deeply twisted personality, a mixture of self-loathing, exclusion of others, and hypocrisy. How, and since when, did these traits come to concentrate in him?
The answer is family. Koyomi's police-officer parents instilled in him an extremely strong sense of justice, but at the same time their strict demands left no small trauma on his childhood. To cope with both the trauma and his sense of justice, Koyomi invented "selective ignorance" as a self-protection mechanism: when he encounters injustice, he steps forward; but if it is something he does not want to do, he pretends to know nothing about it and forgets it entirely afterwards. In the summer five years ago, Oikura Sodachi used math tutoring as bait to lure Koyomi into spending the whole vacation at her home. She demanded that he keep the tutoring secret and never ask about her name or her life. Out of fear of the domestic violence at home, Oikura hoped that these unreasonable demands would make Koyomi notice that something was wrong and tell his police parents. However, Koyomi was then in a cold war with his own parents, which made him inwardly resist doing so. Yet, unlike most people, although he had by then understood everything, Koyomi made himself believe in his heart that he knew nothing about any of it; in this way he spared himself the condemnation of his own sense of justice. When he met Oikura again in his first year of high school, he had completely forgotten her, and forgotten how her math tutoring had once let him shine in the subject he was worst at. Then the study-group incident taught him that truth and justice lie in the hands of the majority: the teacher, supposedly the spokesman of justice, committed the utterly despicable act of framing Oikura. This dealt a huge blow to his sense of justice and hardened his conviction that he needs no friends. From then on he began to hate himself and to hate human nature, a coldness that explains why crowds, an element found everywhere, are always missing from his own narration.
Then, in the spring break two years ago, Hanekawa Tsubasa, with behavior that looked almost shamelessly persistent, forced open the wall named "privacy" that Koyomi had built to protect himself. Though his long estrangement from people left him still clinging to his own convictions, he could not help but be overjoyed at having made a friend.
Right after that, on his way to a bookstore to buy a pornographic magazine, Koyomi came across Kiss-Shot, who had been cut down to a limbless torso. Suppressing his terror, he tried to help her, only to be told that the price would be his own life.
Here Koyomi's hypocrisy, or rather his twisted altruism, shows itself. Hearing those words, he was seized by fear and ran for his life; yet when Kiss-Shot in turn began to beg, he made a decision no ordinary person would make: to give up his own life to save this vampire. Later in Kizumonogatari, Kiss-Shot herself points this out: Araragi Koyomi only helps those weaker than himself, hoping to become the savior in their eyes; once Kiss-Shot regained her full power, his interest in her vanished and turned into pure hatred.
After Koyomi received from his master Kiss-Shot the task of recovering her limbs, the experience of nearly being killed by three vampire hunters made him understand how dangerous the affair was. He therefore tried to drive Hanekawa Tsubasa away by displaying malice, to keep her from being dragged into these events and hurt. For Koyomi, who longed for friends, that decision undoubtedly came with pain and regret.
The first vampire hunter Koyomi fought was Dramaturgy. Interestingly, "dramaturgy" means the craft of drama; together with the fact that he hunts vampires for a living, this hints that the hunter embodies Koyomi's own will: although Koyomi knows in his heart that the savage vampire nature is nothing good, he is himself slightly addicted to the feeling. The easy first victory inflated Koyomi's confidence, but the fight with Episode taught him it was no such easy matter: Hanekawa, trying to help him, had her flank pierced by Episode and suffered a fatal wound. Episode, a half-human half-vampire, represents Koyomi's loathing of his own vampire nature, and the tragedy that befell Hanekawa shows how much damage trying to suppress that nature can do to the people around him. After that battle, Koyomi trusted his own power more and resolved to settle his own affairs himself, letting no collateral damage reach Hanekawa again; yet no sooner had he made this decision than Hanekawa was kidnapped by the third hunter, Guillotine Cutter. "Guillotine cutter" means the beheading blade, and he represents Koyomi's attempt to suppress his savage nature by his own will (to me somewhat similar in character to the Nutcracker witch corresponding to Akemi Homura in Madoka Magica): although he was the only pure human among the three hunters, Koyomi used the most grotesque method to defeat him, which made Koyomi realize that he had already become a monster.
At the same time, Guillotine Cutter was also the only hunter who died, the price of helping Kiss-Shot recover her full power. This plunged Koyomi into deep despair and self-blame, and it was Hanekawa who pulled him back from the brink of collapse. Afterwards, Koyomi stood on the school field and declared war on his own master. But the selfish Koyomi could not accept Kiss-Shot's true intention; he called in Oshino Meme and settled for the ending in which "no one is happy": Koyomi "almost" returns to being human yet still retains his vampire nature, while Kiss-Shot "loses almost all her vampiric power" yet is still not human. This turned Kiss-Shot into an eight-year-old girl, and Koyomi himself into a half-vampire, a metaphor for the savage nature still living inside him.
(To be continued)
Diagnostic; inspecting the log, the part involving _GenerateAppxPackageRecipe comes from the NuGet cache folder C:\Users\{user}\.nuget\packages\microsoft.windowsappsdk\1.0.0; clearing the whole NuGet cache in VS resolves the problem.

We use \tau <: \sigma to denote that \tau is a subtype of \sigma: wherever a \sigma is required, an element of \tau can be supplied. Generally speaking, a \tau can be viewed as a \sigma, and any term of \tau can be used safely in a context that requires a \sigma, i.e., \lbrace s\ |\ s:\tau \rbrace \subseteq \lbrace t\ |\ t:\sigma\rbrace
An important case of subtyping is the function type: \tau\rightarrow \sigma is a subtype of \tau'\rightarrow \sigma' iff \tau'<:\tau and \sigma<:\sigma'. The intuition is that a function f: \tau \rightarrow \sigma can be replaced by any function that accepts at least every value of \tau (its domain may be a supertype of \tau) and returns a value usable as a \sigma (its codomain may be a subtype of \sigma). The \tau here is said to be in contravariant position, and \sigma in covariant position. Combining these facts we have:
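Scala's standard library encodes exactly this rule in the variance annotations of `Function1[-T1, +R]`: contravariant argument, covariant result. A small illustration (the `Animal`/`Dog` classes are just placeholders):

```scala
class Animal
class Dog extends Animal

// A function that handles any Animal and returns a Dog…
val f: Animal => Dog = _ => new Dog

// …can safely stand in where a Dog => Animal is expected:
// Animal => Dog <: Dog => Animal, since Dog <: Animal on both sides.
val g: Dog => Animal = f
```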
A record type r can be considered a subtype of record type r' iff for every label k:\tau'\in r' there is a corresponding k:\tau\in r with \tau<:\tau' (in particular, r may contain extra labels that r' lacks). For example, \lbrace x:\mathtt{Nat},\ y:\mathtt{Bool}\rbrace<:\lbrace x:\mathtt{Nat}\rbrace.
Since the order of the labels in a record doesn't matter, we have three supplementary rules for records in \lambda_{\rightarrow}
The Top type is the supertype of all types:
The Bottom type is the subtype of all types:
The Bottom type has no canonical form: there is no value of type \bot, since if there were, we could derive terms like v:\top\rightarrow\top from v:\bot and the subsumption rule, but the canonical forms lemma tells us that a value of type \tau_1\rightarrow\tau_2 has to be an abstraction, which leads to a contradiction.
This special characteristic of the bottom type makes it suitable for expressing particular operations that are not supposed to return, such as throwing an exception or invoking a continuation (by the way, invoking a continuation never returns because it suspends itself and resumes the captured control flow; recall that call/cc itself is a control-flow operator and a continuation is first-class control flow, so to speak). We can put the normal logic in the success branch of an if expression and the error-raising code in the other branch while keeping the term well-typed, since the type of raising an exception is \bot and can thus be promoted to any desired type by the subsumption rule.
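In Scala this bottom type is `Nothing`: a `throw` expression has type `Nothing`, which subsumption promotes to whatever type the other branch demands, so both branches below agree on `Int`:

```scala
def safeDiv(a: Int, b: Int): Int =
  if b != 0 then a / b
  else throw new ArithmeticException("division by zero") // : Nothing <: Int
```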
Another case that requires attention is the reference cell. Generally speaking, type constructors are in many cases either covariant or contravariant; e.g., the List constructor is covariant:

This is because List is considered immutable, which makes the element type appear only in covariant position. The reference cell, in contrast, is mutable, so its element type appears in both covariant and contravariant positions; in such cases it must be invariant, i.e., neither covariant nor contravariant, which makes the derivation rule look like:
Type \tau_1\cap\tau_2 is the intersection type of \tau_1 and \tau_2; it is the meet of the two, meaning that if x:\tau_1\cap\tau_2, then x can be viewed as both a \tau_1 and a \tau_2
Type \tau_1\cup\tau_2 is the union type of \tau_1 and \tau_2, meaning that if x:\tau_1\cup\tau_2, then x can be viewed as either a \tau_1 or a \tau_2
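Returning to the reference-cell discussion above, the invariance requirement can be seen directly in Scala (the `Ref` class here is a hypothetical sketch, not a library type):

```scala
// A mutable cell forces invariance: its type parameter occurs in both a
// covariant (result) position and a contravariant (argument) position.
class Ref[A](private var value: A): // invariant A: this compiles
  def get: A = value                // A in covariant position
  def set(a: A): Unit = value = a   // A in contravariant position

// `class Ref[+A]` or `class Ref[-A]` would be rejected by the compiler,
// for the same reason the derivation rule demands invariance.
val r = Ref(1)
r.set(2)
```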
The Coercion Semantics for a language with subtyping is a set of functions that translates the language with subtyping into a language without subtyping while preserving its semantics (somewhat like desugaring); the coercion function is denoted by double brackets \llbracket-\rrbracket. Consider the coercion semantics of \lambda_{\rightarrow}^{<:} with the unit type \tt unit and record types; we have the following basic transformations:
Since subtyping involves the subsumption rule, if we want to design a coercion function that translates a value of \tau to \sigma, we need to know the exact derivation that promotes \tau to \sigma. We use a calligraphic letter to denote a subderivation tree, where \mathcal{C}::\tau<:\sigma means "a subderivation \mathcal{C} whose conclusion is \tau<:\sigma"; we thus have the following inductive definition on subtyping derivations:
We use \llbracket \tau_i\rrbracket in the last rule because it implicitly means "find the label k_j that has the same type as l_i".
Rule 4 may be a bit confusing. If we want to write a coercion function \llbracket\mathcal{C}\rrbracket for the subtyping rule on arrow types, we know that \llbracket\mathcal{C}\rrbracket must both accept and return a function. We first use \llbracket\mathcal{C_1}\rrbracket to coerce the argument x to \tau_1 (contravariance), then, after preserving the operational semantics by applying f to that argument, use \llbracket\mathcal{C_2}\rrbracket to coerce the result to \sigma_2 (covariance).
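Rule 4 translates almost verbatim into executable Scala (the names `coerceArrow`, `c1`, `c2` are mine): given a coercion c1 : \sigma_1\to\tau_1 for the argument side and c2 : \tau_2\to\sigma_2 for the result side, lift a function f : \tau_1\to\tau_2 into a function \sigma_1\to\sigma_2.

```scala
def coerceArrow[S1, T1, T2, S2](c1: S1 => T1, c2: T2 => S2)(f: T1 => T2): S1 => S2 =
  (x: S1) => c2(f(c1(x))) // coerce argument in, apply f, coerce result out

// Example: widen the argument (Int to Double) and stringify the result.
val lifted: Int => String =
  coerceArrow((n: Int) => n.toDouble, (d: Double) => d.toString)((x: Double) => x * 2)
```

Here `lifted(21)` runs the doubling function through both coercions, producing the string form of 42.0.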
We also have such rules for typing derivations; the only difference is that \llbracket-\rrbracket on a typing derivation is no longer viewed as a function but as a term of the desugared language: if \mathcal{D} derives \Gamma\vdash t:\tau, then \mathcal{\llbracket D\rrbracket}:\llbracket\tau\rrbracket:
Note that although a single symbol \llbracket-\rrbracket is used, it behaves differently on typing derivations and subtyping derivations: if \llbracket\mathcal{C}\rrbracket::\tau<:\sigma for some \tau and \sigma, it follows the subtyping-derivation semantics; if \llbracket\mathcal{C}\rrbracket::\Gamma\vdash t:\tau for some t and \tau, it follows the typing-derivation semantics.
A translation function \llbracket-\rrbracket is said to be coherent, if for every derivation \mathcal{C} and \mathcal{D} with the same conclusion, \llbracket\mathcal{C}\rrbracket and \llbracket\mathcal{D}\rrbracket are behaviorally equivalent, i.e., they produce the same result in evaluation.
The subtyping rules proposed above are not suitable for direct implementation in programming languages, because the subsumption and transitivity rules involve an arbitrary type: for example, we can't decide whether to promote a type \tau to \top (which, by the definition of \top and the subsumption rule, is always possible) or just leave it as is.
To fix this, we introduce a trimmed subtyping relation called algorithmic subtyping, denoted by \Vdash^{1}, where \Vdash\tau<:\sigma means "\tau is algorithmically a subtype of \sigma". The subsumption, reflexivity, and transitivity rules are removed, and the three rules for records are combined into a single rule:
We need to prove that this modified subtyping relation is equivalent to the original relation.
\tau<:\tau can be derived for every type \tau without using the reflexivity rule.
Straightforward induction on the structure of \tau. ∎
If \tau<:\sigma is derivable, it can be derived without using the transitivity rule.
By induction on the size of the derivation of \tau<:\sigma. First notice that if the last rule used is anything other than transitivity, the result is immediate: each subderivation is strictly smaller than the original derivation, so by the induction hypothesis it can be made transitivity-free, and since the last rule is not transitivity either, the whole derivation tree is transitivity-free. If the last rule used is transitivity, we proceed by case analysis on the combination of the two subderivations of the transitivity rule:
\tau<:\sigma iff \Vdash \tau<:\sigma
Straightforward induction on the derivation of \tau<:\sigma, using one of the two lemmas above for the reflexivity and transitivity rules. ∎
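To make the algorithmic relation concrete, here is a small OCaml sketch over a toy type grammar of my own (with \top, \tt Bool, arrows, and records; the names `ty` and `subtype` are ad hoc). Because every case is syntax-directed, the function needs no reflexivity, transitivity, or subsumption cases:

```ocaml
type ty =
  | Top
  | TyBool
  | Arrow of ty * ty
  | Record of (string * ty) list

(* Algorithmic subtyping: purely syntax-directed. *)
let rec subtype tau sigma =
  match (tau, sigma) with
  | _, Top -> true
  | TyBool, TyBool -> true
  | Arrow (t1, t2), Arrow (s1, s2) ->
      (* contravariant in the argument, covariant in the result *)
      subtype s1 t1 && subtype t2 s2
  | Record fields_t, Record fields_s ->
      (* width, depth and permutation combined into one check:
         every label required by sigma must occur in tau at a subtype *)
      List.for_all
        (fun (label, s) ->
          match List.assoc_opt label fields_t with
          | Some t -> subtype t s
          | None -> false)
        fields_s
  | _ -> false
```

Reflexivity and transitivity are not rules here but provable properties of `subtype`, which is exactly what the two lemmas above establish for the formal relation.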
Similar to the subtyping relation, there is one non-syntax-directed rule in typing relation: the subsumption rule \cfrac{\Gamma\vdash t:\tau\ \ \ \ \ \tau<:\sigma}{\Gamma\vdash t:\sigma}.
This typing relation allows a term to be promoted to \top at any time; however, the subsumption rule cannot simply be deleted, as it plays an essential role in subtyping. If a subsumption rule is used in an immediate subderivation, it can be moved down to the root to become the last rule used in the whole derivation tree, except in the application case (the example for this part is too long to reproduce here). In the application case, if the subsumption rule appears in either the left or the right premise, it can only be moved to the other side rather than down to the root. This is because it is necessary to bridge the difference between the function's argument type and the actual parameter's type, either by promoting the parameter's type or by demoting the argument's type; there is no way to do this without the subsumption rule.
In those cases where the subsumption rule can be postponed, it is safe to delete it, which results in a smaller (more refined) type, e.g., in the following derivation:
It is perfectly safe to delete the last subsumption rule, leaving s_1\ s_2 with type S_{12}; the type is smaller but more refined. It is sufficient to conclude that all subsumption rules can be eliminated except for those that appear in the premises of an application rule. To deal with the latter case, we introduce a more complex application rule:
which completes the algorithmic typing relation(the \Vdash will also be used to denote the algorithmic typing relation):
A type \tau is called a join of types \sigma and \rho, denoted by \sigma\lor\rho=\tau, if \sigma<:\tau and \rho<:\tau, and for all \upsilon, \sigma<:\upsilon and \rho<:\upsilon imply \tau<:\upsilon, i.e., the join is the least upper bound of \sigma and \rho.
A type \tau is called a meet of types \sigma and \rho, denoted by \sigma\land\rho=\tau, if \tau<:\sigma and \tau<:\rho, and for all \upsilon, \upsilon<:\sigma and \upsilon<:\rho imply \upsilon<:\tau, i.e., the meet is the greatest lower bound of \sigma and \rho.
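Under a toy grammar with only \top, \tt Bool and arrows (a hedged sketch; a real implementation would also handle records), join and meet can be computed structurally. Note how the argument position flips between the two, mirroring the contravariance of arrows, and how meet may fail to exist:

```ocaml
type ty = Top | TyBool | Arrow of ty * ty

(* join = least upper bound; meet = greatest lower bound.
   A meet does not always exist, hence the option. *)
let rec join tau sigma =
  match (tau, sigma) with
  | TyBool, TyBool -> TyBool
  | Arrow (t1, t2), Arrow (s1, s2) -> (
      (* arguments are contravariant: the join of two arrows
         takes the meet of the argument types *)
      match meet t1 s1 with
      | Some m -> Arrow (m, join t2 s2)
      | None -> Top)
  | _ -> Top

and meet tau sigma =
  match (tau, sigma) with
  | Top, t | t, Top -> Some t
  | TyBool, TyBool -> Some TyBool
  | Arrow (t1, t2), Arrow (s1, s2) -> (
      match meet t2 s2 with
      | Some m -> Some (Arrow (join t1 s1, m))
      | None -> None)
  | _ -> None
```

The algorithmic application rule above relies on join to type the branches of a conditional, and on meet when comparing argument types.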
From the (pure) \lambda_{\rightarrow}'s perspective (and only from its perspective, since this theorem does not extend to real-world programming languages: most of them support recursive functions and recursive types, which makes encoding non-terminating programs possible), we want to prove that every typable term is normalizable. More precisely, we prove strong normalization, where "strong" means that every reduction sequence halts after finitely many steps; weak normalization, in contrast, means there exists some reduction sequence that halts in finitely many steps. In other words, any evaluation of a typable term is guaranteed to halt (reach a normal form) in finitely many steps.
Definition(Termination):
where v stands for normal form(value)
Definition(Normalization):
We need to prove both (1): \forall t.(R_{\tau}(t)\implies t\Darr) and (2): \vdash t:\tau\implies R_{\tau}(t), i.e., every term in R_{\tau} is normalizable and every typable t:\tau is in R_{\tau}.
The proof of (1) is immediate from the definition of R. The following content focuses on proving (2).
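For reference, the relation R used in (1) and (2) is the standard logical relation, defined by induction on types; stated here for a calculus whose only base type is \tt Bool:

```latex
\begin{aligned}
R_{\mathtt{Bool}}(t)\ &\stackrel{\text{def}}{\iff}\ t\Darr\\
R_{\tau\rightarrow\sigma}(t)\ &\stackrel{\text{def}}{\iff}\ t\Darr\ \land\ \forall s.\,(R_{\tau}(s)\implies R_{\sigma}(t\ s))
\end{aligned}
```

The arrow case is what makes the induction go through: membership in R_{\tau\rightarrow\sigma} demands not just termination but that the term maps related arguments to related results.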
If t\rightarrow t', then t halts iff t' halts.
Subproof. The result is immediate. ∎
If t\rightarrow t', then R_{\tau}(t)\iff R_{\tau}(t')
Subproof. By induction on the structure of \tau
If \bigcup\limits_{i=1}^{n}x_i:\tau_i\vdash t:\tau with a set of values \{v_i:\tau_i\vert 1\leqslant i\leqslant n \} where \forall i\leqslant n.R_{\tau_i}(v_i), then R_{\tau}([x_1\mapsto v_1],...,[x_n\mapsto v_n]\ t)
Subproof. By induction on the typing derivation of t.
Definition. Let [x_i \mapsto_{i=1}^n v_i]\ t \stackrel{\text{def}}{=} [x_1\mapsto v_1],...,[x_n\mapsto v_n]\ t
Suppose that \tau=\tau'\rightarrow\tau'', t=\lambda x:\tau'.\ k, and from the Inversion Lemma we know that \bigcup\limits_{i=1}^{n}x_i:\tau_i,x:\tau'\vdash k:\tau''; what is left is to prove (i) t\Darr and (ii) R_{\tau'}(s)\implies R_{\tau''}(([x_i \mapsto_{i=1}^n v_i]\ t)\ s).
however, since:
by Preservation Lemma we get:
which is equal to
now, combined with the fact that R_{\tau'}(s), by the definition of R we get:
Theorem (Strong Normalization): Every typable term is normalizable.
Proof: By case analysis on the last typing rule used for the term t.
1. Case \cfrac{x:\tau\in\Gamma}{\Gamma\vdash x:\tau}: Since t itself is already a normal form, this case is trivial.
2. Case \cfrac{\Gamma,x:\tau_1\vdash t:\tau_2}{\vdash \lambda x:\tau_1.t:\tau_1\rightarrow \tau_2}: Since t itself is already a normal form, this case is trivial.
3. Case \cfrac{\Gamma\vdash t_1:\tau_1\rightarrow \tau_2,t_2:\tau_1}{\Gamma\vdash t_1\ t_2:\tau_2}: By applying the Substitution Lemma until a normal form is reached; since every normal form is already normalizing, t is normalizable.
The proof is hereby completed. ∎
(Sob... this is the result of a whole all-nighter, six hours of work. I feel so slow.)
A term t belonging to type T means we can check this statically, instead of depending on some runtime semantics, i.e., \mathtt{if\ true\ then\ true\ else\ 0} is ill-typed, although it yields a well-typed term at runtime.
The typing relation, formally written \mathtt{t:T} and read "\mathtt t is of type \mathtt T", can be defined by a set of inference rules, e.g.:
The typing relation is the smallest binary relation between terms and types satisfying all instances of the inference rules; a term \mathtt t is said to be typable, or well-typed, if \exists \mathtt{T}.\ \mathtt{t:T}
The inversion lemma reads the typing rules backwards: given the form of a term, it tells us which type any typing of that term must assign. For example, since the rule for \mathtt{true} states \mathtt{true:Bool}, we know that if \mathtt{true} has any type at all, it must be \mathtt{Bool}.
Under the context of "simply typed", each typable term has exactly one type, and that type has exactly one derivation tree
The safety of a typing system is built upon two theorems:
These two theorems can be proven by induction on the derivation of \mathtt t. Note, however, that the converse of the second theorem does not hold: stepping to a well-typed term does not make a term typable. \mathtt 0 is a well-typed normal form, but a term that evaluates to it need not be typable, e.g., \mathtt{if\ true\ then\ 0\ else\ false}.
The lambda calculus here will involve only the \tt Bool type for brevity
Consider that an abstraction in lambda calculus may return another abstraction; in order to build a sound type system, we need to keep track of the types of both arguments and return values. We use \tt T\rightarrow R to denote a function (an abstraction) that takes an argument of type \tt T and returns a value of type \tt R. The types of the new lambda calculus can thus be defined by the following BNF (our type system contains only one base type, \tt Bool):
The \rightarrow is called a type constructor since it constructs a new type from existing ones. It is right-associative, which means \tt T_1\rightarrow T_2\rightarrow T_3=T_1\rightarrow(T_2\rightarrow T_3)
There are two ways of knowing what type of argument an abstraction would expect:
We use \lambda x:\mathtt{T}.t_1 to show that x is expected to be of type \tt T.
The typing context, denoted by \Gamma, is a function from variables to types. It can be represented by a sequence of typing relations separated by commas, and extended by adding a comma and a new binding on the right (e.g., \Gamma,x:\tt T_1 adds a new relation (x:\tt T_1) to the typing context). We use \Gamma\vdash t:\tt T to say that the term t has type \tt T under the set of assumptions \Gamma; \Gamma can be omitted if it is empty (\varnothing). The variables in \Gamma must be distinct, which means you cannot add a variable that already exists, so variable renaming is required when that happens.
We can now state the general rules for the typed lambda calculus:
along with the base type \tt Bool
Specifically, we can consider \Gamma as containing the type information of the free variables in t; you can interpret \Gamma\vdash x:\tt T as "the free variable x is assumed to have type \tt T under context \Gamma"
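The typing rules above translate almost line-by-line into a checker. Here is a hedged OCaml sketch (the type and function names such as `typeof` are my own, not from any established library), with the context represented as an association list exactly as \Gamma is described above:

```ocaml
type ty = TyBool | Arrow of ty * ty

type term =
  | Var of string
  | Abs of string * ty * term (* \x:T. t *)
  | App of term * term
  | True
  | False
  | If of term * term * term

(* The typing context Gamma: a list of (variable, type) bindings *)
type context = (string * ty) list

exception Type_error of string

(* Read the typing rules bottom-up: to type a term, type its
   subterms, extending the context when going under a binder. *)
let rec typeof (ctx : context) = function
  | True | False -> TyBool
  | Var x -> (
      match List.assoc_opt x ctx with
      | Some ty -> ty
      | None -> raise (Type_error ("unbound variable " ^ x)))
  | Abs (x, ty, body) -> Arrow (ty, typeof ((x, ty) :: ctx) body)
  | App (t1, t2) -> (
      match typeof ctx t1 with
      | Arrow (a, b) when typeof ctx t2 = a -> b
      | _ -> raise (Type_error "ill-typed application"))
  | If (c, t, e) ->
      if typeof ctx c = TyBool && typeof ctx t = typeof ctx e then typeof ctx t
      else raise (Type_error "ill-typed conditional")
```

For example, `typeof [] (Abs ("x", TyBool, Var "x"))` yields `Arrow (TyBool, TyBool)`, matching the abstraction rule.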
\lambda_{\rightarrow} can be used to represent the simply typed lambda calculus
It is clear that \lambda_{\rightarrow} degenerates to the untyped calculus without a base type (in this case, \tt Bool), because no concrete type could actually be assigned to any term.
The inversion lemma for \lambda_{\rightarrow} is essentially the reverse of its definitions:
with the cases for base type, in this case, \tt Bool
Given a typing context \Gamma, a term t can have at most one type and at most one derivation, i.e., both the type and the typing derivation are unique.
The canonical forms lemma states the shapes of the values (normal forms) of \lambda_{\rightarrow}
The Progress Theorem is omitted since it's nothing more than a straightforward induction on derivations; here we only show the proof of Preservation
If \Gamma\vdash t:\tt T and \Delta is a permutation of \Gamma, then \Delta\vdash t:\tt T, the latter derivation would have the same depth as the former
If \Gamma\vdash t:\mathtt{T}\land x\notin dom(\Gamma), then \Gamma,x:\mathtt{S}\vdash t:\tt T. The Weakening Lemma basically says that if a judgment holds under a set of hypotheses, it still holds under the augmented hypotheses
If \Gamma,x:\mathtt{S}\vdash t:\mathtt{T}\land\Gamma\vdash s:\tt S, then \Gamma\vdash[x\mapsto s]t:\tt T
The Substitution Lemma basically says that if you replace a free variable in a term with a term of the same type, the type of the whole term is preserved; the two may stand for different meanings, but we are focusing only on types here.
The proof of this lemma is written out here due to its complexity
Proof:
The proof is by induction on the derivation of \Gamma,x:\mathtt{S}\vdash t:\tt T, which means we assume that the lemma holds for all subderivations of \Gamma,x:\mathtt{S}\vdash t:\tt T, and proceed by cases:
If the last inference rule used in the derivation is \text{T-Variable}, then by the substitution rule [x\mapsto s]t results in either s or t, depending on whether t=x. If it results in s, the desired result is immediate: t=x\land\Gamma\vdash t:\mathtt{T}\land\Gamma\vdash s:\mathtt{S}\implies \tt T=S, since the variables in the typing context are unique. If it results in t, the result is also immediate, since the original term t stays intact
If the last inference rule used in the derivation is \text{T-Abstraction}, the proof can be witnessed by the following inference tree:
If the last inference rule used is \text{T-Application}, then t has the form t_1\ t_2. By the induction hypothesis we know \Gamma\vdash [x\mapsto s]t_1:\tt T_1\rightarrow T and \Gamma\vdash [x\mapsto s]t_2:\tt T_1; thus by \text{T-Application} we have \Gamma\vdash ([x\mapsto s]t_1)\ ([x\mapsto s]t_2):\tt T, therefore \Gamma\vdash [x\mapsto s](t_1\ t_2):\tt T
\Gamma\vdash[x\mapsto s]\tt true:T is trivial and immediate
\Gamma\vdash[x\mapsto s]\tt false:T is trivial and immediate
If the last inference rule used in the derivation is \text{T-If}, then t has the form \tt if\ \mathnormal{t_1}\ then\ \mathnormal{t_2}\ else\ \mathnormal{t_3}; by the induction hypothesis we have:
therefore by \text{T-If} we have:
which is
The proof is thereby completed.
From the lemmas above, we can prove preservation for \lambda_{\rightarrow} by induction on the last derivation rule used for a term t. Since terms typed by \text{T-Variable} and \text{T-Abstraction} are already normal forms, the only case that needs consideration is \text{T-Application}. Following the evaluation rules stated in Untyped Lambda Calculus, we prove each of them individually; specifically, when proving the last rule (\beta-reduction), we use the Substitution Lemma proved above.
In the typed lambda calculus, and more specifically for the type constructor \rightarrow, there are two kinds:
The Curry-Howard Correspondence states the correlation between type theory and constructive logic; generally speaking, there is a one-to-one correspondence between logical propositions and types, summarized in the table below^{1}:
^{2}
In \lambda_{\rightarrow}, types play no role in evaluation, as witnessed by the fact that the evaluation rules of \lambda_{\rightarrow} are the same as the untyped ones; thus a term of \lambda_{\rightarrow} can be degenerated into a term of the untyped lambda calculus, called its type erasure, defined inductively:
And since types don't matter during evaluation, there is a simple correspondence between evaluating typed terms directly and evaluating terms after their types have been erased:
The second rule here basically says "it doesn't matter whether you evaluate before or after the erasure"
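The erasure function is easy to write down directly. A minimal OCaml sketch (with my own constructor names, one datatype for annotated terms and one for bare terms):

```ocaml
type ty = TyBool | Arrow of ty * ty

(* Typed terms carry annotations on their binders... *)
type typed_term =
  | TVar of string
  | TAbs of string * ty * typed_term
  | TApp of typed_term * typed_term

(* ...untyped ones do not. *)
type untyped_term =
  | UVar of string
  | UAbs of string * untyped_term
  | UApp of untyped_term * untyped_term

(* erase simply drops the type annotation on every abstraction *)
let rec erase = function
  | TVar x -> UVar x
  | TAbs (x, _, t) -> UAbs (x, erase t)
  | TApp (t1, t2) -> UApp (erase t1, erase t2)
```

Since `erase` touches only annotations and never the term structure, it is immediate that it commutes with evaluation, which is exactly what the correspondence above claims.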
A term m in the untyped lambda calculus is said to be typable in \lambda_{\rightarrow} if there exist some typed term t, type \tt T, and context \Gamma such that erase(t)=m\land\Gamma\vdash t:\tt T^{3}
Church style and Curry style are two ways to define a language. Until now we have been using the Curry style: define the syntax first, then the semantics (evaluation rules), and finally the type system that rejects the ill-formed terms.
Church style, on the other hand, takes a different approach: it first defines all the terms, then gives a type system to identify the typable ones, and gives semantics only to those.
From a historical perspective, Church style is mostly used in explicitly typed systems, while Curry style is preferred in implicitly typed systems.
where \lambda x.x is called an abstraction and term\ term is called an application
Application in the lambda calculus is left-associative, and an abstraction body extends as far to the right as possible, which means the following equations hold:
A variable x is said to be bound by an abstraction if it appears in the body part of \lambda x.t, or said to be free if it is not bound by any enclosing abstraction. A lambda term with no free variables is said to be closed; a closed term is also called a combinator
A term is said to be a reducible expression, or redex, if it has the form (\lambda x.\ y)\ t, because it can be reduced in one step. We use \beta-reduction to reduce a redex; it can be depicted graphically by:
which means "replace all the free occurrences of x in y by t"
We use the term "free occurrence" because every x is free if you consider y alone, since the \lambda x is really not part of it. Be aware that, because of left associativity, all the reductions at the same precedence level are evaluated from left to right
There are several order/form of \beta-reductions:
The parentheses are required, because otherwise the term would be parsed differently; remember that an abstraction extends as far right as possible. For example, the last occurrence of \lambda x would be interpreted as \lambda x.x\ z rather than \lambda x.x, and the second occurrence of \lambda x would become part of the abstraction of the first \lambda x, which would contradict the statement we're about to make
The third occurrence of \lambda x.x cannot be reduced to z, because it is inside the body of the abstraction \lambda z.\lambda x.x\ z, but the first two occurrences can be reduced unquestionably because they are not part of that abstraction. This reduction strategy is also used in languages like Haskell; it is somewhat lazily evaluated, because if you consider the innermost redex as an expression that serves as a parameter of a function, it won't be computed until its value is required
Call by value: only the outermost redex can be reduced, and only after its right-hand part (the parameter) has already been reduced to a value, which means the following redex:
will first be reduced to (\lambda x.x)\ (\lambda z.(\lambda x.x)\ z), then take a step to \lambda z.(\lambda x.x)\ z, and then stop. Notice that \lambda z.(\lambda x.x)\ z is not of the form (\lambda x.\ y)\ z; it is an abstraction, hence it won't be reduced.
Theoretically, any data structure and its operations can be encoded solely by lambda terms; such an encoding is called Church encoding
The boolean values true
and false
can be encoded by:
which work similarly to a ternary operator: true\ x\ y returns x and false\ x\ y returns y, so you can write a test function that takes an encoded boolean value and two arguments (in lambda calculus everything is a function, so these arguments are actually functions, too):
If you evaluate test\ true\ x\ y it returns x, and test\ false\ x\ y returns y.
And some operations on booleans:
Consider the and function: if c_1 is true, then the application c_1\ c_2\ false yields c_2, which means and returns the value of c_2 whenever c_1 is true. This implies that if both c_1 and c_2 are true, then and effectively yields true; it yields false if either c_1 is false, or c_1 is true but c_2 is false. The or function can be interpreted in a similar way.
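Because OCaml functions are first-class, the Church booleans can be transcribed directly. This is only a sketch; `tru`, `fls`, `b_and`, and `b_or` are ad-hoc names chosen to avoid clashing with OCaml's built-in booleans and keywords:

```ocaml
(* A Church boolean is a two-argument selector. *)
let tru x _ = x (* always returns its first argument  *)
let fls _ y = y (* always returns its second argument *)
let test b x y = b x y

(* and: if c1 is true the result is c2, otherwise false *)
let b_and c1 c2 = c1 c2 fls

(* or: if c1 is true the result is true, otherwise c2 *)
let b_or c1 c2 = c1 tru c2
```

For instance, `test (b_and tru fls) "yes" "no"` selects `"no"`, since `tru fls fls` reduces to `fls`.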
A pair is simply comprised of two values, first and second, we use church boolean we've seen in the last section to select either of them:
The only subtlety here is the order of the arguments to pair: the selector argument c comes last, because when we construct a pair we write pair\ a\ b, which lacks the last parameter c and thus effectively yields a function waiting for a selector. Applying first or second to it amounts to supplying that selector to get the actual first or second value inside the pair.
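The pair encoding also transcribes directly into OCaml (again a sketch with ad-hoc names). Note how `pair a b` is exactly the partially-applied function described above, waiting for its selector `c`:

```ocaml
(* Church booleans as selectors *)
let tru x _ = x
let fls _ y = y

(* pair a b yields a function waiting for a selector c;
   first and second supply a Church boolean as that selector. *)
let pair a b c = c a b
let first p = p tru
let second p = p fls
```

So `first (pair 1 2)` unfolds to `pair 1 2 tru`, which is `tru 1 2`, i.e., `1`.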
A set can be considered an encoding of the natural numbers if there is a one-to-one correspondence between that set and \mathbb{N}. Church numerals are a way of representing natural numbers by lambda terms; we have:
where s^n stands for "apply s for n times".
Then we can define arithmetic operators:
In the operators above, plus basically means "increment n, m times"; times means "apply plus\ n to c_0, m times", which is exactly what "multiplication" means; and exp means "multiply c_1 by n, m times".
The exp function can also be defined as exp=\lambda m.\lambda n.\ n\ m; if you expand it you will get:
\begin{aligned}
exp\ x\ y&=(\lambda m.\lambda n.\ n\ m)\ x\ y\\
&=y\ x\\
&=(\lambda s_1.\lambda z_1.\ s_1^y\ z_1)\ (\lambda s_2.\lambda z_2.\ s_2^x\ z_2)\\
&=\lambda z_1.\ (\lambda s_2.\lambda z_2.\ s_2^x\ z_2)^y\ z_1
\end{aligned}
Now observe that applying \lambda s_2.\lambda z_2.\ s_2^x\ z_2 once to a term w yields \lambda z_2.\ w^x\ z_2, i.e., it iterates w exactly x times. Applying it to z_1 once therefore gives \lambda z_2.\ z_1^x\ z_2; applying it once more iterates that term x times, giving \lambda z_2.\ z_1^{x\cdot x}\ z_2; keep unwinding, and after all y applications you end up with \lambda z_1.\lambda z_2.\ z_1^{x^y}\ z_2, the Church numeral for x^y, which fits the definition of exponentiation perfectly
and we have pred:
The pred function applies ss to zz, m times. If m is c_0, applying ss to zz zero times yields zz itself and thus c_0; otherwise you get pair\ (m-1)\ m, and the first function extracts the first component of the pair, which is m-1.
Since we have pred, we can now write subtract:
which basically means "apply pred to m, n times"
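Church numerals, plus, and the pair trick behind pred can likewise be sketched in OCaml. This is only an illustration: OCaml's native pairs stand in for the Church-encoded ones, and `church_to_int` is an ad-hoc helper for inspecting results (iterating pred further runs into the limits of OCaml's rank-1 polymorphism, so subtraction is omitted):

```ocaml
(* A Church numeral applies s to z exactly n times. *)
let zero _ z = z
let two s z = s (s z)
let succ n s z = s (n s z)
let plus m n s z = m s (n s z)

(* Ad-hoc helper: convert a numeral back to an int for inspection. *)
let church_to_int n = n (fun x -> x + 1) 0

(* pred via the pair trick: iterate (a, b) -> (b, succ b) starting
   from (zero, zero); after applying it m times, the first
   component holds m - 1 (or zero when m = 0). *)
let pred m = fst (m (fun (_, b) -> (b, succ b)) (zero, zero))
```

For example, `pred two` steps through (zero, zero), (zero, one), (one, two), and `fst` then yields one.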
A combinator is said to be a fixed-point combinator if fix\ g evaluates to g\ (fix\ g), i.e., it computes a fixed point of its argument. The key ingredient is self-application: for example, the term (\lambda x.\ x\ x)\ (\lambda x.\ x\ x) reduces in one step to exactly itself again, and this self-replicating behavior is what allows us to build combinators that can be used for recursion, for example, the following fixed-point combinator:
can be used to calculate a function's fixed point, thus effectively supporting recursion.
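OCaml evaluates call-by-value, so directly transcribing the untyped fixed-point combinator above would diverge (and the self-application does not even typecheck without -rectypes). The standard workaround, sketched here, is to define fix with let rec and eta-expand it so that f (fix f) is not evaluated eagerly:

```ocaml
(* The extra parameter x delays the evaluation of f (fix f),
   which would otherwise loop forever under call-by-value. *)
let rec fix f x = f (fix f) x

(* Recursion recovered without writing 'let rec factorial':
   the function receives itself as its first argument. *)
let factorial = fix (fun self n -> if n = 0 then 1 else n * self (n - 1))
```

This mirrors the move from the Y combinator to the call-by-value Z combinator in the lambda calculus itself.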
The set of all free variables in a lambda term can be defined inductively:
The substitution of a term s for a variable x in a lambda term, denoted by [x\mapsto s], can be defined inductively as:
The requirement in the third rule is necessary: if y\in FV(s), then after the substitution y would be captured by the abstraction, and such a substitution would undoubtedly change the meaning of the term. This phenomenon, where a free variable becomes bound after a substitution, is called variable capture
Following the "call by value" strategy, we can add a new class of terms, called values, to the syntax:
This is because, in the call-by-value strategy, a lone abstraction is not reducible, which means abstractions can effectively be considered normal forms.
We have inference rules of evaluation for lambda terms:
Notice that the use of value and term controls the order of inference to fit the call-by-value strategy: when we are about to evaluate term_1\ term_2, we must first use rule 1 to reduce term_1 to value_1, then use rule 2 to reduce term_2 to value_2, and finally use rule 3 to perform the reduction on value_1\ value_2.
\beta-reduction involves some subtleties when brought into the real world: variable capture turns out to be harmful because it changes the meaning of the whole term. To avoid it, one approach is to assign unique identifiers to the variables; the de Bruijn index is one such approach.
The de Bruijn index says that, instead of naming a variable with a string, we use a number k to stand for "the variable bound by the k-th enclosing binder (the \lambda)", where "enclosing" means the counting goes inside-out; for example, the de Bruijn form of
is
To define the de Bruijn index formally, consider a k-free variable as "a free variable that requires at least k binders to become bound". We can then define a family \mathcal{T} of sets indexed by natural numbers, where \mathcal{T}_k stands for "all the terms whose free variables can be bound by k binders", such that:
To solve the problem that we sometimes don't know the actual index of a free variable v when substituting, we introduce a function \varGamma called the naming context: suppose x_0..x_n are variables from \mathcal{V}; then \varGamma=x_n..x_0 assigns each x_i the index i, i.e., x_n\mapsto n,\,..\,,x_0\mapsto 0, so that we can resolve the index of a particular variable from the context.
When performing substitution on de Bruijn forms^{1}, the substitution may go under an abstraction, which means, according to the definition of the de Bruijn index, variables need to be renumbered to avoid variable capture. For example, in [1\mapsto s](\lambda.\ 2), all the terms in s now have one more enclosing binder, which means the free variables of s should be incremented by one; this is called variable shifting. The bound variables should stay intact: e.g., if s=2\ (\lambda.\ 0), only 2 should be lifted to 3, not 0, because 0 is a bound variable. The key idea is to keep track of how many binders (denoted c, say) have been passed, so that all variables less than c stay intact, because they are bound.
Define d-place shift of a term t above cutoff c, denoted by \uparrow^{d}_c(t):
Specifically, we use \uparrow^{d}(k) and \uparrow^{d}_0(k) interchangeably.
And now we can define the substitution on de Bruijn forms:
And the \beta-reduction rule for de Bruijn forms:
The reason for \uparrow^{-1} and \uparrow^1(v) is that after the reduction the original \lambda vanishes, so there is one less binder in the whole term; we therefore need to shift the variables affected by that \lambda (those inside t) down by one. As a countermeasure to the downshift, the variables inside v must be shifted up by one (because they become part of t, which is shifted down after the substitution), so that we don't end up in a weird situation such as a negative variable.
Here is an implementation of the untyped lambda calculus in OCaml:
exception Inappropriate_term
exception Bad_index
type term =
| TmVar of string * int * int (* The second 'int' contains the total length of the context, for debugging purpose *)
| TmAbs of string * term (* the first 'string' stands for the name hint of the bound variable, since the internal representation will be de Bruijn index *)
| TmApp of term * term
type binding = NameBind (* For now it carries no extra info *)
type context = (string * binding) list
let ( >>= ) opt f = match opt with Some v -> f v | None -> None
let rec index_of f = function
| [] -> raise Not_found
| head :: tail -> if f head then 0 else 1 + index_of f tail
let pick_fresh_name ctx hint =
match List.exists (fun (name, _) -> name = hint) ctx with
| true ->
let new_name = hint ^ "_" in
((new_name, NameBind) :: ctx, new_name)
| false -> ((hint, NameBind) :: ctx, hint)
(* append to front -- innermost has smaller index *)
let ctx_length ctx = List.length ctx
let resolve_opt ctx name = List.find_opt (fun (n, _) -> n = name) ctx
let unwind_opt ctx index = List.nth_opt ctx index >>= fun (name, _) -> Some name
let unwind_var ctx var_name =
let index = index_of (fun (name, _) -> name = var_name) ctx in
TmVar (var_name, index, ctx_length ctx)
let ( * ) ctx var_name = unwind_var ctx var_name
let rec term_to_string ctx = function
| TmAbs (name, t') ->
let ctx', name' = pick_fresh_name ctx name in
"(λ" ^ name' ^ ". " ^ term_to_string ctx' t' ^ ")"
| TmApp (lhs, rhs) ->
"(" ^ term_to_string ctx lhs ^ " " ^ term_to_string ctx rhs ^ ")"
| TmVar (_, index, env_len) ->
if ctx_length ctx = env_len then (
let opt = unwind_opt ctx index in
match opt with Some name -> name | None -> raise Not_found)
else raise Bad_index
let variable_shift place term =
let rec walk_shifting cutoff = function
| TmVar (var_name, index, env_len) ->
if index >= cutoff then
(* add 'place' to 'env_len' because by shifting variable we effectively extended the context *)
TmVar (var_name, index + place, env_len + place)
else TmVar (var_name, index, env_len + place)
| TmAbs (name, t') ->
TmAbs (name, walk_shifting (cutoff + 1) t')
(* Be careful about the definition, a new 'TmAbs' is required here *)
| TmApp (lhs, rhs) ->
TmApp (walk_shifting cutoff lhs, walk_shifting cutoff rhs)
in
walk_shifting 0 term
let rec substitute before after = function
| TmVar (_, index, _) as var -> if index = before then after else var
| TmAbs (name, t') ->
TmAbs (name, substitute (before + 1) (variable_shift 1 after) t')
| TmApp (lhs, rhs) ->
TmApp (substitute before after lhs, substitute before after rhs)
let beta_reduction = function
| TmApp (TmAbs (_, t), v) ->
variable_shift (-1) (substitute 0 (variable_shift 1 v) t)
| _ -> raise Inappropriate_term
(* Under the call-by-value strategy, the normal form is abstraction *)
let is_normal_form = function TmAbs (_, _) -> true | _ -> false
let rec eval_one_step_call_by_value = function
| TmApp (TmAbs (_, _), v) as app when is_normal_form v -> beta_reduction app
| TmApp (lhs, rhs) when is_normal_form lhs ->
let rhs' = eval_one_step_call_by_value rhs in
TmApp (lhs, rhs')
| TmApp (lhs, rhs) ->
let lhs' = eval_one_step_call_by_value lhs in
TmApp (lhs', rhs)
| _ -> raise Inappropriate_term
let rec eval_one_step_call_by_name = function
| TmApp (TmAbs (_, _), _) as app -> beta_reduction app
| TmApp (lhs, rhs) ->
let lhs' = eval_one_step_call_by_name lhs in
TmApp (lhs', rhs)
| _ -> raise Inappropriate_term
let rec eval_one_step_full_beta_reduction = function
| TmAbs (name, term) ->
let term', reducible = eval_one_step_full_beta_reduction term in
(TmAbs (name, term'), reducible)
| TmApp (TmAbs (_, _), _) as app -> (beta_reduction app, true)
| TmApp (lhs, rhs) ->
let lhs', lhs_reducible = eval_one_step_full_beta_reduction lhs in
let rhs', rhs_reducible = eval_one_step_full_beta_reduction rhs in
(TmApp (lhs', rhs'), lhs_reducible || rhs_reducible)
| TmVar (_, _, _) as var -> (var, false)
let rec eval_call_by_value term =
try
let t' = eval_one_step_call_by_value term in
eval_call_by_value t'
with Inappropriate_term -> term
let rec eval_call_by_name term =
try
let t' = eval_one_step_call_by_name term in
eval_call_by_name t'
with Inappropriate_term -> term
let rec eval_full_beta_reduction term =
let t', reducible = eval_one_step_full_beta_reduction term in
if reducible then eval_full_beta_reduction t' else t'
let ctx = [ ("z", NameBind); ("s", NameBind); ("n", NameBind); ("m", NameBind) ]
let plus =
TmAbs
( "m",
TmAbs
( "n",
TmAbs
( "s",
TmAbs
( "z",
TmApp
( TmApp (ctx * "m", ctx * "s"),
TmApp (TmApp (ctx * "n", ctx * "s"), TmVar ("z", 0, 4)) )
) ) ) )
let c' =
TmAbs
( "s",
TmAbs
( "z",
TmApp (TmVar ("s", 1, 2), TmApp (TmVar ("s", 1, 2), TmVar ("z", 0, 2)))
) )
let c'' =
TmAbs
( "s",
TmAbs
( "z",
TmApp (TmVar ("s", 1, 2), TmApp (TmVar ("s", 1, 2), TmVar ("z", 0, 2)))
) )
(* λm. λn. λs. λz. m s (n s z) applied to (λs. λz. s (s z)) and (λs. λz. s (s z)) *)
let plus_2_and_2 = TmApp (TmApp (plus, c'), c'')
let call_by_name = eval_call_by_name plus_2_and_2
let call_by_value = eval_call_by_value plus_2_and_2
let full_reduction = eval_full_beta_reduction plus_2_and_2
let _ = print_endline ("Call by name: " ^ term_to_string [] call_by_name)
let _ = print_endline ("Call by value: " ^ term_to_string [] call_by_value)
let _ = print_endline ("Full β reduction: " ^ term_to_string [] full_reduction)