The type inference algorithm always plays an important role in the real-world programming languages, they allows the user to selectively omit type annotations when it is obvious from the context. One of the most famous application is the ML-Family, where the programmer is permitted to omit almost all type annotations. The algorithm is based on unification: it resembles the equations, the unknown type will be denoted by a placeholder, and our task is to solve the equation to get a set of proper subtitutions that satisfies it, i.e., the solution of the equation.

To get start with the algorithm, some preliminaries may be required, I'll state the definitions more *intuitively* rather than *formally* in order to help readers realize it, so do not consider them as formal definitions, if you are looking for a rigorous definition from mathematical perspective, Unification - Wikipedia is what you need.

Let's think about the first question arises in our mind: What exactly is a type equation? Intuitively, it means that we need to know the shape of the *exact* types, then find the subtitutions (the solution) that accommodates the assumption. For example, consider the term \lambda x.\ x\ x\ 0, what type should x have?

Let's split the body of the abstraction into *inner* application x\ 0 and the *outer* application x\ y where y=x\ 0, consider the inner application, we can see that x *applies* to 0, this reveal two facts: 1. if x is well-typed, then it must be of type \tau\to\sigma for *some* \tau and \sigma. 2. since it applies to 0, the \tau must equal to \tt int, so, from the inner application, suppose that x has type \rho, we can generate two type equations for x: 1. \rho=\tau\to\sigma for some \tau and \sigma. 2: \tau=\tt int. Now, consider the outer application, since we've already know that the inner application has type \sigma (we conclude x:\tau\to\sigma from the inner application, and applies to a 0, from the typing rule for application, it is trivial that x\ 0:\sigma), so the outer application must takes an argument with type \sigma, and returns some \sigma', now, we know that inner x and outer x must have the same type, therefore a new equation emerged: \tau\to\sigma=\sigma\to\sigma', combined with the aforementioned equations, we can obtain a equation set: \lbrace \rho=\tau\to\sigma, \tau=\tt int, \tau\to\sigma=\sigma\to\sigma'\rbrace

Be aware that during the calculation, there are actually several derivations that are not permitted in a normal type checking algorithm, for example, we assume x:\tau\to\rho, but then it applies to 0:\tt int, how could this happen? the \tau is not known to be the same as \tt int by that time. The reason is that, \tau, \rho, \sigma, and \sigma' are

type placeholders, when then are engaged with other types during the calculation, i.e., a placeholder \tau is present when an \tt int is expect, instead of fail the algorithm immediately, we add a new equation \tau=\tt int indicate that type \tauis supposed tobe of type \tt int, and this is precisely how the algorithm calculates the equations.

We've been using phrase *type equation* to indicate such equations, in computer science or logic, it has a scientific name called **Constraint**, and the set of constraints are called **Constraint Set**, the procedure to solve a constraint is called **Unification**; the solution of a constraint is called its **Unifier**, a unifier **unifies** a constraint set if it **unifies** every constraint in it.

From above definition it's not hard to find that both constraint set, unification, unifiers are just alias for the similar concepts in equations. In the above example, the unifier is [\tau\mapsto \tt int, \sigma\mapsto int, \sigma'\mapsto int, \rho\mapsto int\to int], note that we use symbol \mapsto instead of =, because in such context the unifier stands for *substitutions*, when we apply a unifier to a particular type, it actually replaces all the variables in the type that in its domain to the corresponding type in its codomain, for example, apply the above unifier to type \rho\to\sigma will yields \tt int\to int\to int, where the \rho is substituted by \tt int\to int, and \sigma is substituted by \tt int.

Now we can give the algorithm for calculating the constraint set:

Again, we are aiming "easy-to-understand" instead of "rigorousness", so I'll explain the enigmatic mathematical formulas in a pragmatic way: when you see something like \cfrac{A_1,A_2,...}{B}, you know that A_i are premises, and B is a conclusion, it means, "if all of A_1,A_2,... can be satisfied, then we can conclude B", or, in a way that is more intuitive for programmers, "if A_1=\mathtt{true}\ \land\ A_2=\mathtt{true}\ \land\ ..., then B=\tt true"; formula \Gamma\vdash x:\mathtt{T_1} means x has type \tt T_1 under the context \Gamma, in practice, the \Gamma is usually the symbol table, a list that records the defined variables and their types, we use \Gamma,x:\tt T to append x:\tt T to the end of \Gamma, P\vdash Q read as "with information in P, I can conclude that Q is true", P can be omitted when it's empty, means that I can conclude that Q is true without any context. The above figure uses two more context symbols than a single \Gamma, that is \mathcal{X} and \mathcal{C}, specifically, \mathcal{X} is the set of generated type placeholders during the algorithm (see above explaination about "type placeholder"), and \mathcal{C} is the already-calculated set of constraints. Specifically, \Gamma\vdash x:\mathtt{T_1}\ |_{\mathcal{X}}\ \mathcal{C} means x has type \tt T_1 under symbol table \Gamma, while the already-calculated constraint set is \mathcal{C}, and, until now, we have generated additional type placeholders stored in \mathcal{X}. If we think the algorithm as a function, then the function will take \Gamma and x as input, and produce \mathcal{X} and \mathcal{C} as output. The type \tt Nat stands for "natural numbers" (we use it interchangeably with \tt int, although they stand for a very different range).

Now we can give the function to calculate the constraint set, but first, let's introduce some required definitions:

```
enum Type:
case Base(n: String) extends Type
case Constructor(from: Type, to: Type) extends Type
case Scheme(quantifiers: Set[Type], ty: Type) extends Type
case Forall(n: String) extends Type
override def toString: String =
import Type.*
this match
case Base(name) => name
case Constructor(from, to) => s"(from->to)"
case Scheme(quantifiers, ty) => s"∀{quantifiers.foldLeft("")((acc, base) => acc + base.name)}.(ty)"
case Forall(name) => s"∀$name"
end toString
def name: String =
import Type.*
this match
case Base(name) => name
case Forall(name) => name
case _ => throw IllegalStateException()
end name
end Type
object TypePrimitives:
val Nat: Type = Type.Base("Nat")
val Bool: Type = Type.Base("Bool")
end TypePrimitives
```

We use `Type.Base`

to represent monotypes (i.e. base types that can be represented by a single symbol), `Type.Constructor`

to represent function types, and `Type.Forall`

to represent type placeholders, `Type.Scheme`

will be exlained soon. `TypePrimitives`

provide built-in types (like primitive types in `Java`

/`C#`

). And the abstract syntax tree of our language can be described by:

```
enum SyntaxNode:
case Var(name: String)
case Abs(binder: String, binderType: Option[Type], body: SyntaxNode)
case App(function: SyntaxNode, applicant: SyntaxNode)
case Successor(nat: SyntaxNode)
case Predecessor(nat: SyntaxNode)
case IsZero(nat: SyntaxNode)
case If(condition: SyntaxNode, trueClause: SyntaxNode, falseClause: SyntaxNode)
case Zero, True, False
case Let(left: String, right: SyntaxNode, body: SyntaxNode)
case Seq(first: SyntaxNode, second: SyntaxNode)
end SyntaxNode
```

To prevent repeated tedious types, we introduce some type alias:

```
opaque type Constraint = (Type, Type)
opaque type ConstraintSet = (Type, Set[String], Set[Constraint])
opaque type Bindings = mutable.Map[String, Type]
```

The `Constraint Set`

has three member, the first is the calculated type (may contain type placeholders), the second is the generated type placeholders, and the third is already-calculated constraint set, `Bindings`

are symbol table.

```
object FreshNameProvider:
private var currentChar: Char = 'A'
private var currentIndex: Int = 0
def freshName(): String =
val str = s"currentChar{if (currentIndex != 0) currentIndex else ""}"
currentChar = currentChar match
case x if 'A' until 'Z' contains x => (currentChar + 1).toChar
case _ => currentIndex += 1; 'A'
return str
def freshName(excludes: Set[String]): String =
val name = freshName()
return if !excludes.contains(name) then name else freshName(excludes)
end FreshNameProvider
```

The `FreshNameProvider`

is to provide unique names at every call to `freshName()`

or `freshName(Set[String])`

, the later one generates a unique name different from every element in `Set[String]`

.

```
// Get all the base type variables that are ever occurred in the given constraint set
def constraintVariables(constraintSet: Set[(Type, Type)]): Set[Type] =
return constraintSet.flatMap {
case (left, right) => typeVariables(left) | typeVariables(right)
}
end constraintVariables
// Get all the base type variables in a type declaration
def typeVariables(decl: Type): Set[Type] =
return decl match
case base: Type.Base => Set(base)
case Type.Constructor(from, to) => typeVariables(from) | typeVariables(to)
case Type.Scheme(_, ty) => typeVariables(ty)
case forall: Type.Forall => Set(forall)
end typeVariables
// Retrieves the free occurrences of variables in a term
def freeVariables(root: SyntaxNode): Set[String] =
import SyntaxNode.*
val symbols = mutable.Stack[String]()
def helper(r: SyntaxNode): Set[String] =
return r match
case Var(name) => if (!symbols.contains(name)) Set(name) else Set.empty
case Abs(binder, _, body) =>
symbols.push(binder)
val fvs = helper(body)
symbols.pop()
return fvs
case App(function, applicant) => helper(function) | helper(applicant)
case If(condition, ifTrue, ifFalse) => helper(condition) | helper(ifTrue) | helper(ifFalse)
case _ => Set.empty
end helper
return helper(root)
end freeVariables
```

These helper functions calculate free variables in expression, type, and contraint set.

## Comments | NOTHING