The type inference algorithm always plays an important role in the real-world programming languages, they allows the user to selectively omit type annotations when it is obvious from the context. One of the most famous application is the ML-Family, where the programmer is permitted to omit almost all type annotations. The algorithm is based on unification: it resembles the equations, the unknown type will be denoted by a placeholder, and our task is to solve the equation to get a set of proper subtitutions that satisfies it, i.e., the solution of the equation.
  To get start with the algorithm, some preliminaries may be required, I'll state the definitions more intuitively rather than formally in order to help readers realize it, so do not consider them as formal definitions, if you are looking for a rigorous definition from mathematical perspective, Unification - Wikipedia is what you need.
  Let's think about the first question arises in our mind: What exactly is a type equation? Intuitively, it means that we need to know the shape of the exact types, then find the subtitutions (the solution) that accommodates the assumption. For example, consider the term \lambda x.\ x\ x\ 0, what type should x have?
  Let's split the body of the abstraction into inner application x\ 0 and the outer application x\ y where y=x\ 0, consider the inner application, we can see that x applies to 0, this reveal two facts: 1. if x is well-typed, then it must be of type \tau\to\sigma for some \tau and \sigma. 2. since it applies to 0, the \tau must equal to \tt int, so, from the inner application, suppose that x has type \rho, we can generate two type equations for x: 1. \rho=\tau\to\sigma for some \tau and \sigma. 2: \tau=\tt int. Now, consider the outer application, since we've already know that the inner application has type \sigma (we conclude x:\tau\to\sigma from the inner application, and applies to a 0, from the typing rule for application, it is trivial that x\ 0:\sigma), so the outer application must takes an argument with type \sigma, and returns some \sigma', now, we know that inner x and outer x must have the same type, therefore a new equation emerged: \tau\to\sigma=\sigma\to\sigma', combined with the aforementioned equations, we can obtain a equation set: \lbrace \rho=\tau\to\sigma, \tau=\tt int, \tau\to\sigma=\sigma\to\sigma'\rbrace

Be aware that during the calculation, there are actually several derivations that are not permitted in a normal type checking algorithm, for example, we assume x:\tau\to\rho, but then it applies to 0:\tt int, how could this happen? the \tau is not known to be the same as \tt int by that time. The reason is that, \tau, \rho, \sigma, and \sigma' are type placeholders, when then are engaged with other types during the calculation, i.e., a placeholder \tau is present when an \tt int is expect, instead of fail the algorithm immediately, we add a new equation \tau=\tt int indicate that type \tau is supposed to be of type \tt int, and this is precisely how the algorithm calculates the equations.

  We've been using phrase type equation to indicate such equations, in computer science or logic, it has a scientific name called Constraint, and the set of constraints are called Constraint Set, the procedure to solve a constraint is called Unification; the solution of a constraint is called its Unifier, a unifier unifies a constraint set if it unifies every constraint in it.
  From above definition it's not hard to find that both constraint set, unification, unifiers are just alias for the similar concepts in equations. In the above example, the unifier is [\tau\mapsto \tt int, \sigma\mapsto int, \sigma'\mapsto int, \rho\mapsto int\to int], note that we use symbol \mapsto instead of =, because in such context the unifier stands for substitutions, when we apply a unifier to a particular type, it actually replaces all the variables in the type that in its domain to the corresponding type in its codomain, for example, apply the above unifier to type \rho\to\sigma will yields \tt int\to int\to int, where the \rho is substituted by \tt int\to int, and \sigma is substituted by \tt int.
  Now we can give the algorithm for calculating the constraint set:

Again, we are aiming "easy-to-understand" instead of "rigorousness", so I'll explain the enigmatic mathematical formulas in a pragmatic way: when you see something like \cfrac{A_1,A_2,...}{B}, you know that A_i are premises, and B is a conclusion, it means, "if all of A_1,A_2,... can be satisfied, then we can conclude B", or, in a way that is more intuitive for programmers, "if A_1=\mathtt{true}\ \land\ A_2=\mathtt{true}\ \land\ ..., then B=\tt true"; formula \Gamma\vdash x:\mathtt{T_1} means x has type \tt T_1 under the context \Gamma, in practice, the \Gamma is usually the symbol table, a list that records the defined variables and their types, we use \Gamma,x:\tt T to append x:\tt T to the end of \Gamma, P\vdash Q read as "with information in P, I can conclude that Q is true", P can be omitted when it's empty, means that I can conclude that Q is true without any context. The above figure uses two more context symbols than a single \Gamma, that is \mathcal{X} and \mathcal{C}, specifically, \mathcal{X} is the set of generated type placeholders during the algorithm (see above explaination about "type placeholder"), and \mathcal{C} is the already-calculated set of constraints. Specifically, \Gamma\vdash x:\mathtt{T_1}\ |_{\mathcal{X}}\ \mathcal{C} means x has type \tt T_1 under symbol table \Gamma, while the already-calculated constraint set is \mathcal{C}, and, until now, we have generated additional type placeholders stored in \mathcal{X}. If we think the algorithm as a function, then the function will take \Gamma and x as input, and produce \mathcal{X} and \mathcal{C} as output. The type \tt Nat stands for "natural numbers" (we use it interchangeably with \tt int, although they stand for a very different range).
  Now we can give the function to calculate the constraint set, but first, let's introduce some required definitions:

enum Type:
  case Base(n: String) extends Type
  case Constructor(from: Type, to: Type) extends Type
  case Scheme(quantifiers: Set[Type], ty: Type) extends Type
  case Forall(n: String) extends Type

  override def toString: String =
    import Type.*
    this match
      case Base(name) => name
      case Constructor(from, to) => s"(from->to)"
      case Scheme(quantifiers, ty) => s"∀{quantifiers.foldLeft("")((acc, base) => acc + base.name)}.(ty)"
      case Forall(name) => s"∀$name"
  end toString

  def name: String =
    import Type.*
    this match
      case Base(name) => name
      case Forall(name) => name
      case _ => throw IllegalStateException()
  end name
end Type
object TypePrimitives:
  val Nat: Type = Type.Base("Nat")
  val Bool: Type = Type.Base("Bool")
end TypePrimitives

We use Type.Base to represent monotypes (i.e. base types that can be represented by a single symbol), Type.Constructor to represent function types, and Type.Forall to represent type placeholders, Type.Scheme will be exlained soon. TypePrimitives provide built-in types (like primitive types in Java/C#). And the abstract syntax tree of our language can be described by:

enum SyntaxNode:
  case Var(name: String)
  case Abs(binder: String, binderType: Option[Type], body: SyntaxNode)
  case App(function: SyntaxNode, applicant: SyntaxNode)
  case Successor(nat: SyntaxNode)
  case Predecessor(nat: SyntaxNode)
  case IsZero(nat: SyntaxNode)
  case If(condition: SyntaxNode, trueClause: SyntaxNode, falseClause: SyntaxNode)
  case Zero, True, False
  case Let(left: String, right: SyntaxNode, body: SyntaxNode)
  case Seq(first: SyntaxNode, second: SyntaxNode)
end SyntaxNode

To prevent repeated tedious types, we introduce some type alias:

opaque type Constraint = (Type, Type)
opaque type ConstraintSet = (Type, Set[String], Set[Constraint])
opaque type Bindings = mutable.Map[String, Type]

The Constraint Set has three member, the first is the calculated type (may contain type placeholders), the second is the generated type placeholders, and the third is already-calculated constraint set, Bindings are symbol table.

object FreshNameProvider:
  private var currentChar: Char = 'A'
  private var currentIndex: Int = 0

  def freshName(): String =
    val str = s"currentChar{if (currentIndex != 0) currentIndex else ""}"
    currentChar = currentChar match
      case x if 'A' until 'Z' contains x => (currentChar + 1).toChar
      case _ => currentIndex += 1; 'A'
    return str

  def freshName(excludes: Set[String]): String =
    val name = freshName()
    return if !excludes.contains(name) then name else freshName(excludes)
end FreshNameProvider

The FreshNameProvider is to provide unique names at every call to freshName() or freshName(Set[String]), the later one generates a unique name different from every element in Set[String].

// Get all the base type variables that are ever occurred in the given constraint set
def constraintVariables(constraintSet: Set[(Type, Type)]): Set[Type] =
  return constraintSet.flatMap {
    case (left, right) => typeVariables(left) | typeVariables(right)
  }
end constraintVariables

// Get all the base type variables in a type declaration
def typeVariables(decl: Type): Set[Type] =
  return decl match
    case base: Type.Base => Set(base)
    case Type.Constructor(from, to) => typeVariables(from) | typeVariables(to)
    case Type.Scheme(_, ty) => typeVariables(ty)
    case forall: Type.Forall => Set(forall)
end typeVariables

// Retrieves the free occurrences of variables in a term
def freeVariables(root: SyntaxNode): Set[String] =
  import SyntaxNode.*
  val symbols = mutable.Stack[String]()
  def helper(r: SyntaxNode): Set[String] =
    return r match
      case Var(name) => if (!symbols.contains(name)) Set(name) else Set.empty
      case Abs(binder, _, body) =>
        symbols.push(binder)
        val fvs = helper(body)
        symbols.pop()
        return fvs
      case App(function, applicant) => helper(function) | helper(applicant)
      case If(condition, ifTrue, ifFalse) => helper(condition) | helper(ifTrue) | helper(ifFalse)
      case _ => Set.empty
  end helper
  return helper(root)
end freeVariables

These helper functions calculate free variables in expression, type, and contraint set.

卧槽用零开始讲好他妈的累啊,我最开始写的时候还在想这才多少东西写一会就写完了然后写了几千字想了一下怎么还尼玛有这么多要讲的,我今天写的累疯了所以摆烂了剩下的改天再写,反正这文章铁定没人看就算有人看也一定是可爱的小聪明光看代码就能领悟


We Choose to Go to the Moon