The Paradox

The Naive Set Theory is considered inconsistent for a long time ever since the Russell's Paradox had been explored. This famous paradox can be state briefly as follow:

R=\lbrace x\vert x\notin x\rbrace

This paradox is so simple yet so magical and powerful that it completely rocks the foundation of mathematics back to the time when Hilbert thought that the math was almost completed. It states: let R be a set of all sets that are not a member of themselves, does R belongs to R itself? The contradiction followed directly by the definition of R that if R is not a member of itself then by its definition it should be, however that again contradicts because also by definition it shouldn't be. i.e.:

R=\lbrace x\vert x\notin x\rbrace\vdash (R\in R\implies R\notin R)

This paradox originated from a vital flaw in Naive Set Theory: The set of all sets is also a set, this can be described in an identical term: The comprehension (Construction) of sets are unrestricted.
  To prevent this, the Axiomatic Set Theory was developed, the central idea is to exclude those sets who are paradoxical by only allowing the sets to be derived from a limited set of axioms. One of these is the most widely known axiomatic set theory, the Zermelo-Fraenkel Set Theory, or Zermelo-Fraenkel Axioms, often abbreviated as ZF Set Theory when the Axiom of Choice is not included and ZFC Set Theory otherwise with C stands for Choice.

The Zermelo-Frankel Set Theory

  ZFC Set Theory states: All of the sets can only be derived from the following axioms

We will take the existence of empty set \empty as an extra axiom, note that this is only for convenience, it is not necessarily to be an axiom because it can actually be derived from the following axioms.
Similarly, some other axioms that are not included in the original definition are included to simplify things, as before, these newly included axioms can be derived from the originals.

Extensionality

The sets are identified by there elements, i.e., two sets are equal if the contains identical members.

\forall x.\forall y.(\forall z.z\in x \land z\in y\implies x=y)

Comprehension

If X is a set and P(x) is a property of element in X, then

\lbrace x\in X|P(x)\rbrace

is also a set.

Powerset

If X is a set, then there is a set consisting of all subset of X, which is called the powerset of X, denoted by \mathcal{P}:

\mathcal{P}(X)=\lbrace X'|X'\in X\rbrace

Unordered Pairs

If x and y are mathematical objects (either sets or not sets), then

\lbrace x, y\rbrace

is a set

Indexed Sets.

If I is a set and for all i\in I there corresponds a unique object x_i, then

\lbrace x_i|i\in I\rbrace

is a set, such a set is said to be indexed by I.

Union

If X and Y are two sets, then

X\cup Y=\lbrace a|a\in X\lor a\in Y\rbrace

is a set, called the union of X and Y.
  We can generalize the definition of union to not just two, but a set of sets, called Big Union, formally, if X is a set of sets, then:

\bigcup_X =\lbrace a|\exists x\in X. a\in x\rbrace

is a set.

Intersection

If X and Y are sets, then:

X\cap Y=\lbrace a|a\in X\land a\in Y\rbrace

is a set, called the intersection of X and Y.
  We can generalize this like we did for unions, formally, if X is a set of sets, then:

\bigcap_X=\lbrace a|\forall x\in X. a\in x\rbrace

is a set.

Product

If X and Y are sets, then:

X \times Y=\lbrace (a, b)|a\in X\land b\in Y\rbrace

is a set, called the product of X and Y.
  We can generalize the definition of product to not two, but a series of sets, formally, if X_1, X_2, ..., X_n are sets, then X_1\times X_2\times ...\times X_n is a set of n-tuples (x_1, x_2, ..., x_n) where x_1 \in X_1, x_2\in X_2, ..., x_n\in X_n.
  If the two operands of a product is the same set X, we can write X^2, similarly we have X^3, X^4, ..., X^n. By convention, X_0 is a unique set with only one element, the empty tuple ().

Disjoint Union/Tagged Union

If X_1, X_2, ..., X_n are sets, then:

X_1\uplus X_2\uplus...\uplus X_n = \\
(\lbrace 1\rbrace\times X_1)\cup (\lbrace 2\rbrace\times X_2)\cup...\cup (\lbrace n\rbrace\times X_n)

is a set.
  This allows the sets with some common elements to be distinguished by "tagging" the elements with an extra element indicating where they from, e.g., if a is in both X_1 and X_2, after the operation, it becomes (\lbrace 1\rbrace, a) and (\lbrace 2 \rbrace, a), which successfully gets distinguished(The programmer of languages with disjoint union/discriminated union/tagged union/sum types should be familiar with this concept, such types in such languages can be somehow considered as a direct adoption of the same concept in set theory).

Set Difference

If X and Y are sets, then:

X \setminus Y=\lbrace a|a\in X\land a\notin Y\rbrace

is a set.

Axiom of Infinity

Above rules tell us how to construct a set, but all of them are "starting with some sets and ending in a new set", We need an axiom to state that there exists at least one set to show how can we apply the above rules at first. This is the axiom of infinity.
  Let \mathcal{S}(w) denoting w\cup \lbrace w\rbrace (the set \lbrace w\rbrace can be constructed through the unordered pairs, just let x and y be the same object w), then there is a set R such that:

  1. The empty set \empty is the element of R
  2. If a set x is a member of R, then \mathcal{S}(x) is also a member of R

At the first glance this seems strange, but it actually defines a set with infinitely many members. We can try to expand it, start from \empty:

  1. Empty set \empty is in R, R=\lbrace \empty\rbrace
  2. From definition, we have \mathcal{S}(\empty) in R, R=\lbrace \empty, \lbrace \empty\rbrace\rbrace
  3. From definition, we have \mathcal{S}(\lbrace\empty\rbrace) in R, R=\lbrace \empty, \lbrace\empty\rbrace, \lbrace\lbrace\empty\rbrace\rbrace\rbrace.
  4. ...

And finally we will get n-nested \lbrace\empty\rbrace, where the nested level is the same as the number of the members in R, anyone with a sharp eye may notice that the set R has the same cardinality as the set of natural numbers \N, i.e., the set R is countably infinite. You may also find that the construction of such a set is quite similar to that of Peano Arithmetic. And yes, they do have a lot of common points.

Axiom of Foundation

  Since every set is build upon other sets, and other sets are also build upon the others which eventually starts from the basic set(the infinite axiom), then if b_1 is a member of b_0, it is either a basic element, such as a natural number(if we adopt the set of natural numbers to compliant with the infinite axiom), or a set, if it's a set, then it must be constructed from another set that itself is constructed by the same procedure. Thus, we can conclude a chain of memberships:

...b_n\in...\in b_1\in b_0

and will eventually end up with an empty set or an element from basic set. The axiom of foundation states that, such a chain of membership, must be finitely long

Notice that this axiom itself rejects to recognize "the collection of all sets" as a set, which is the culprit of the Russell's Paradox, since if "the collection of all sets" itself is a set, then it must include itself which will produce an infinitely long membership chain:

... X\in ...\in X\in X

which contradicts with the axiom of foundation

Axiom of Choice

  The axiom of choice is a generalization of product, noticed that we have already generalize the product into n-tuples, now we are going to generalize it to not just n-tuples, but to index it by a set I, let X_i be a nonempty set for each element i in a set I(that means X_i to X_n is indexed by I), we have a General Product:

\prod_{i\in I}X_i=\lbrace f:I\to\bigcup_{i\in I}X_i |\forall i\in I. f(i)\in X_i\rbrace

The definition of general product forms a set of functions, each function can pick an arbitrary member of i-th set X_i, so, informaly, the general product can choose one element in all X_i indexed by i\in I, thus we can produce a tuple, the i-th element of the tuple is an arbitrarily picked element of X_i by f.
  The axiom of choice states that, the set \prod_{i\in I}X_i is nonempty. This is pretty reasonable since you can choose a function f:\prod_{i\in I}X_i by letting f(i)\in X_i for all i\in I. One simple example is to let f(i) be the literally first element of X_i.


We Choose to Go to the Moon