In this article, P_n denotes P[1..n], P[n] denotes the n-th character of P, P_i\sqsupset P_q means that P_i is a suffix of P_q, and P_i\sqsubset P_q means that P_i is a prefix of P_q; all indices start from 1
Having not looked at it for so long, I had completely forgotten how the KMP algorithm is actually implemented; while reading the Dragon Book I found related material printed right there in the exercises and could not understand a word of it, so I opened CLRS and wrestled with it for over an hour to revisit this string-matching algorithm, whose complexity is excellent in every respect
KMP is an optimized form of the finite-automaton matching algorithm. Unlike a finite automaton, which needs O(m|\Sigma|) time (where m is the length of the pattern) to build its state-transition table, KMP maintains a smaller table that lets it skip alignments that cannot possibly match, improving the preparation stage to O(m); in the scanning stage, KMP has the same linear time complexity as the finite-automaton algorithm
The core of KMP rests on the following observation. Suppose the text is T=bacbababaabcbab and the pattern is P=ababaca; when the match has reached a certain position:
Let s be the shift of P relative to T; in this example s=4 (indices start from 1 in this article, following CLRS's convention), and the matched length is q=5 (ababa). We have successfully matched 5 characters, but at the 6th position a and c do not match. The key idea of KMP is that once we know the matched length q and the matched string P_q, we immediately know that certain shifts must be invalid; for instance, in this example s+1 must be invalid, because with shift s+1 we would have
b and a clearly do not match; but if we advance s by 2, it becomes
Now s'=s+2, and P and T again agree on a match of length k=3; that is, by advancing s by 2 we skipped the redundant comparisons and can continue directly from P[4]. Values like k are computed by a special function and stored in a table \pi; in the situation above we obtain k directly as \pi[q]=\pi[5]=3 and advance s by q-k=2 steps. Letting s'=s+(q-\pi[q]), we obtain the starting shift of the next attempt immediately after a failed match.
Clearly, our goal is to skip over as long a "useless" stretch as possible, that is, shifts that cannot possibly lead to a successful match, and this raises the following problem:
Suppose P[1..q] matches T[s+1..s+q]; find the smallest shift s'\gt s such that P[1..k]=T[s'+1..s'+k] for some k\lt q with s'+k=s+q
(This formulation is a bit of a mouthful, but it can be understood with the help of the first three figures)
In other words, for matched strings ending at the same position—like the two matches in the figure above, one with q=5 and one with k=3, both ending at T[9]—the end position in T does not change. We already know P_q\sqsupset T_{s+q} (self-evidently), so we look for a k\lt q with P_k\sqsupset T_{s+q} (here s+q can be replaced by s'+k: the shift s grows to s' while the matched length shrinks from q to k; in the example above, s=4 grows to s'=6 while q=5 shrinks to k=3, and the two still sum to 9). Since s'+k=s+q, a smaller k means a larger s'; but to be certain we never jump over an occurrence, we must take the smallest valid s'\gt s, which corresponds to the largest k\lt q with P_k\sqsupset T_{s+q}. The best case is therefore k=0, i.e., no k with 0\lt k\lt q satisfies P_k\sqsupset T_{s+q} at all, and we may advance s the farthest
Best-case demonstration

Before: \def\a{\boldsymbol{a}}
\def\b{\boldsymbol{b}}
\def\c{\boldsymbol{c}}
\begin{array}{ccccccccccccccc}
b&a&c&b&a&b&\a&\b&\b&a&b&c&b&a&b& \\
&&&&&&\a&\b&\b&b&a&c&b
\end{array}

After:
\def\a{\boldsymbol{a}}
\def\b{\boldsymbol{b}}
\def\c{\boldsymbol{c}}
\begin{array}{ccccccccccccccc}
b&a&c&b&a&b&a&b&b&\a&\b&c&b&a&b& \\
&&&&&&&&&\a&\b&b&b&a&c&b
\end{array}

Here s jumps directly from 6 by q=3 to s'=9. In this case k=0, and its precondition is that no k\lt q with P_k\sqsupset T_{s+q} exists at all: in the example above, only abb itself is a suffix of abb; sliding P to the right, neither ab nor the single a is a suffix of T_{s+q} (or of T_{s'+k} with k=0)
At this point we make another observation: clearly P_q\sqsupset T_{s+q}, and also P_k\sqsupset T_{s+q} with |P_k|\lt |P_q|, which implies P_k\sqsupset P_q; and from the figures above we can see that at the same time P_k\sqsubset P_q. Our goal is therefore to find the largest k such that P_k is both a suffix and a prefix of P_q. With this established, we can fill in the table \pi
Let \pi be a function from \lbrace1,2,3,\dots,m\rbrace to \lbrace0,1,2,\dots,m-1\rbrace with \pi[q]=\max\lbrace k:k\lt q\text{ and }P_k\sqsupset P_q\rbrace; in other words, P_k is the longest string that is both a prefix of P_q (k\lt q already makes P_k\sqsubset P_q, since P_k is a prefix of P) and a proper suffix of P_q, with P_k\neq P_q—that is, the longest border (common prefix-suffix) of P_q
We first present the function that computes \pi, called the prefix function, and then analyze its correctness
The core idea of this function rests on the following observation. Consider a string such as abab?, and suppose we already know that its current longest border is ab, of length 2. For which ? does the string acquire a border of length 3? The answer is clearly a: the first three letters aba are fixed, so for the border to have length 3, both the prefix and the suffix of length 3 must be aba, and we only need to test whether a (the character right after the current border) equals ?. Now let P=ababaa; we want to compute the longest border of every prefix of P. Set up two indices k and q, with k initially 0 and q initially 1, meaning that the first prefix has no border. Before the first iteration, k=0 and q=1, and we set \pi[1]=k=0, meaning the first character by itself has no border (we require the suffix to be a proper suffix, and the only suffix of a single character is itself, which is not proper); we have
The arrows indicate the current positions of k and q. We now enter the first iteration, where k=0 and q=2, and compare P[k+1] with P[2]; clearly a\neq b, so we set \pi[q]=\pi[2]=k=0, meaning P_2 has no border. We then enter the second iteration, before which we have
Comparing again, we find P[k+1]=P[q], so we increment k by 1 and set \pi[q]=\pi[3]=k=1, meaning P_q=P_3 has a longest border of length 1. We then enter the third iteration, before which we have
P[k+1]=P[q] still holds, so k is incremented again and \pi[q]=\pi[4]=k=2; continuing, we have
Again P[k+1]=P[q]; we increment k and set \pi[q]=\pi[5]=k=3
Now we run into a problem: P[k+1]\neq P[q], but k=3. How should we fill in \pi[6]? Simply resetting k to zero will not do, because the figure above makes it clear that \pi[6] should be 1. The solution is as follows: starting from the current position, we walk k backwards, finding every earlier k' such that P_{k'}\sqsupset P_{q-1}, i.e., the set of all borders of P_{q-1} (why P_{q-1} and not P_q? Because the border of P_q is exactly what we are computing right now), and for each such k' we check whether P[k'+1] equals P[q]. If such a k' exists, then the longest border of P_q is P_{k'+1}, of length k'+1.
Why does P[k'+1]=P[q] imply that P_{k'+1} is the longest border of P_q? Because we first found a border P_{k'} of P_{q-1}, the string obtained from P_q by removing its last character
The figure shows the case q=|P|; clearly, if k' is a border of P_{q-1} and P[k'+1]=P[q], then P_{k'+1} must be a border of P. This holds for any q\leqslant |P|
We keep walking k' backwards toward 0; if no suitable k' is found on the way, the longest border has length 0. Computing each previous k' uses a special trick, namely k'=\pi[k] (whose correctness is proved below): we simply keep setting k'=\pi[k], k=k' and testing each k until k=0. This is what lines 6-7 of the function do: they keep decreasing k and test whether the decreased k satisfies P[k+1]=P[q]. Once a suitable k is found, or the iteration drives k down to 0, we leave the loop and test whether P[k+1]=P[q]; as mentioned above, if they are equal we set k=k+1 (from the figure it is easy to see that the longest border of P_q then has length k'+1). At the end of the loop body, k is stored in \pi[q], meaning "the longest border of P_q has length k"
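The walkthrough above can be condensed into a short piece of code. The following Python sketch is hypothetical (the article's own pseudocode and its line numbers are not reproduced here) and uses 0-based indices, so pi[q] below corresponds to \pi[q+1] in the text:

```python
def compute_prefix_function(P: str) -> list[int]:
    # pi[q] = length of the longest proper border (common prefix-suffix)
    # of P[0..q]; a single character has no proper border, so pi[0] = 0.
    m = len(P)
    pi = [0] * m
    k = 0                     # length of the current longest border
    for q in range(1, m):
        # walk back through shorter borders; k = pi[k-1] plays the role
        # of k' = pi[k] in the 1-based discussion above
        while k > 0 and P[k] != P[q]:
            k = pi[k - 1]
        if P[k] == P[q]:      # the border can be extended by one character
            k += 1
        pi[q] = k
    return pi
```

For the article's example P=ababaa this yields [0, 0, 1, 2, 3, 1], matching the values \pi[1..6] derived step by step above.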
For strings x, y and z, if x\sqsupset y and y\sqsupset z, then x\sqsupset z; that is, the suffix relation is transitive
For strings x, y and z, if x\sqsubset y and y\sqsubset z, then x\sqsubset z; that is, the prefix relation is transitive
For strings x, y and z, if x\sqsupset z and y\sqsupset z, then: if |x|\leqslant |y|, then x\sqsupset y; if |x|\geqslant |y|, then y\sqsupset x; and if |x|=|y|, then x=y
Let P be a pattern of length m and \pi its prefix function, i.e. \pi[q]=\max\lbrace k:k\lt q\text{ and }P_k\sqsupset P_q\rbrace. For q=1,2,\dots,m, define the iterated prefix function
\pi^\ast[q]=\lbrace\pi[q],\pi^{(2)}[q],\dots,\pi^{(t)}[q]\rbrace
where \pi^{(1)}[q]=\pi[q], \pi^{(i)}[q]=\pi[\pi^{(i-1)}[q]] for i\geqslant 2, and t is the first index with \pi^{(t)}[q]=0. Then, writing \mathcal{S}=\lbrace k:k\lt q\text{ and }P_k\sqsupset P_q\rbrace, we have
\pi^\ast[q]=\mathcal{S}\tag{1}
This item is exactly the correctness proof of the step k'=\pi[k], k=k' used above: we must show that every k' obtained this way satisfies P_{k'}\sqsupset P_q, i.e. that the k generated by this iteration form precisely the set of lengths of all borders of P_q. First we show \pi^\ast[q]\sube\mathcal{S}. Let u=1 and k_1=\pi^{(u)}[q]=\pi[q] (by definition):
By the transitivity of \lt and the transitivity of \sqsupset given above, an induction shows that every k\in\pi^\ast[q] satisfies k\lt q and P_k\sqsupset P_q. This means k\in\pi^\ast[q]\implies k\in\mathcal{S}, i.e.
\pi^\ast[q]\sube\mathcal{S}\tag{2}
Next we show \mathcal{S}\sube\pi^\ast[q], by contradiction: we show that the difference of the two sets is empty. Suppose \mathcal{S}'=\mathcal{S}-\pi^\ast[q]\neq\varnothing, and let j=\max(\mathcal{S'}). We know \max(\mathcal{S})=\pi[q] (again directly from the definition of \pi), so necessarily j\lt \pi[q]. Now let j'\leqslant \pi[q] be the smallest value in \pi^\ast[q] that is greater than j (such a j' exists: \max(\mathcal{S})=\pi[q] itself belongs to \mathcal{S}, and \pi[q]\in\pi^\ast[q], so \pi[q] is necessarily removed when forming \mathcal{S}-\pi^\ast[q] and cannot lie in \mathcal{S}'; hence \pi[q] is a candidate for j'). Since j and j' both belong to \mathcal{S}, we have P_j\sqsupset P_q and P_{j'}\sqsupset P_q, and since j\lt j', the overlapping-suffix property above gives P_j\sqsupset P_{j'}. Moreover, j is the largest such value smaller than j', so we must have \pi[j']=j (j fits the definition of \pi[j'] exactly). But j'\in\pi^\ast[q], and from the definition of \pi^\ast[q] we know \pi[j']\in\pi^\ast[q], i.e., j\in\pi^\ast[q].
Why is \pi[j']\in\pi^\ast[q]? By definition, every \pi^{(i)}[q] in \pi^\ast[q] equals \pi[\pi^{(i-1)}[q]]. Since j'\in\pi^\ast[q], we may write j'=\pi^{(i-1)}[q], so \pi^{(i)}[q]=\pi[j'], and since \pi^{(i)}[q]\in\pi^\ast[q], we get \pi[j']\in\pi^\ast[q]
But the assumption insisted that j\in\mathcal{S'} with \mathcal{S'}=\mathcal{S}-\pi^\ast[q], i.e., \forall_{s\in\mathcal{S'}}(s\notin\pi^\ast[q]): a contradiction. Hence \mathcal{S}' must be empty, which means
\mathcal{S}\sube\pi^\ast[q]\tag{3}
(2) and (3) together complete the proof of this lemma
Let P be a pattern of length m and \pi its prefix function; then for q=1,2,\dots,m, if \pi[q]>0, we have \pi[q]-1\in\pi^\ast[q-1]
Let r=\pi[q]>0; from the definition of \pi we know r\lt q and P_r\sqsupset P_q, hence r-1\lt q-1 and P_{r-1}\sqsupset P_{q-1} (see the shaded figure above: if x is a suffix of y, then removing the last character from both still leaves the shortened x a suffix of the shortened y). Applying Lemma 1, we get r-1\in\pi^\ast[q-1], i.e., \pi[q]-1\in\pi^\ast[q-1], which completes the proof
We can define a subset E_{q-1} of \pi^\ast[q-1] containing every k in \pi^\ast[q-1] that can be extended to P_q, i.e.
E_{q-1}=\lbrace k\in\pi^\ast[q-1]:P[k+1]=P[q]\rbrace=\lbrace k:k\lt q-1\text{ and }P_k\sqsupset P_{q-1}\text{ and }P[k+1]=P[q]\rbrace\tag{5}
=\lbrace k:k\lt q-1\text{ and }P_{k+1}\sqsupset P_q\rbrace\tag{6}
Here (5) simply expands the definition of \pi^\ast[q-1] (via Lemma 1), while (6) merges the last two conditions of (5) (why can they be merged? Because if P_k is a suffix of P_{q-1} and P[k+1]=P[q], then the characters following them also agree, so with that character included, P_{k+1} is still a suffix of P_q). For every element of this set, incrementing it by 1—turning P_k into P_{k+1}—yields a border of P_q. The definition of E_{q-1} will help us prove the correctness of lines 6-8 of the algorithm; it captures exactly the method mentioned before the correctness proof began: "find a smaller k, then add 1 to it"
Let P be a pattern of length m and \pi its prefix function; then for q=1,2,\dots,m:
\pi[q]=\begin{cases}0&\text{if }E_{q-1}=\varnothing\\1+\max(E_{q-1})&\text{otherwise}\end{cases}\tag{7}
First, since E_{q-1} is by definition "the set of all k for which adding 1 gives P_{k+1}\sqsupset P_q", if E_{q-1} is empty then no k makes P_{k+1}\sqsupset P_q, and by definition \pi[q] must then be 0.
If E_{q-1}\neq\varnothing, the proof takes two steps. First we show \pi[q]\leqslant 1+\max(E_{q-1}). Let r+1=\pi[q]; then P_{r+1}\sqsupset P_q. Since r+1=\pi[q], we have r=\pi[q]-1, which implies two things: 1. r is one less than \pi[q], and \pi[q] is always less than q, so r\lt q-1. 2. By Lemma 2 together with P_{r+1}\sqsupset P_q, we get r\in \pi^\ast[q-1]. Combining these two points with P[r+1]=P[q] and the definition of E_{q-1} gives r\in E_{q-1}, so r can be no greater than the maximum of E_{q-1}: r\leqslant \max(E_{q-1}). Since r=\pi[q]-1, this is \pi[q]-1\leqslant \max(E_{q-1}), i.e.:
\pi[q]\leqslant 1+\max(E_{q-1})\tag{8}
Next we show \pi[q]\geqslant 1+\max(E_{q-1}). By the definition of E_{q-1}, every k\in E_{q-1} satisfies P_{k+1}\sqsupset P_q and k\lt q-1\implies k+1\lt q. Since \pi[q] is the largest k' with P_{k'}\sqsupset P_q, k+1 can be no greater than \pi[q]; and as this holds for every k\in E_{q-1}, including the largest, adding 1 to any element of E_{q-1} yields at most \pi[q], i.e.:
\pi[q]\geqslant 1+\max(E_{q-1})\tag{9}
(8) and (9) together complete the proof of this corollary, which justifies the incrementing of k on line 10 of the algorithm
Now we can prove the correctness of the prefix function from these three results. Before each iteration begins, k equals \pi[q-1], the value computed in the previous iteration: before the first iteration, lines 3-4 guarantee this, and line 12 sets \pi[q] to k in every iteration before q is incremented, which guarantees k=\pi[q-1] before the next iteration begins. Lines 6-8 walk through \pi^\ast[q-1] (since k has not yet been updated in this iteration, it still holds the value \pi[q-1]) and find the largest k with P[k+1]=P[q], i.e., the maximum of E_{q-1}; this is justified by Lemma 2. Then, in lines 9-11, by (7) we can be confident that if such a k was found, it should be incremented by 1; if not, k must now be 0, and we test whether P[k+1], i.e., P[1], equals P[q]: if so, we also increment k, and if not, we leave k at 0, meaning that P_q has no border. Note that if a k was found, P[k+1] and P[q] are necessarily equal, because that is the exit condition of the while loop on line 6. Line 12 stores the result in \pi[q], and line 14 returns the completed table \pi
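With the prefix table justified, the whole matcher follows. This is a hypothetical, self-contained Python sketch in the spirit of CLRS's KMP-MATCHER, again with 0-based indices:

```python
def kmp_match(T: str, P: str) -> list[int]:
    # Returns every 0-based shift at which P occurs in T.
    m = len(P)
    pi = [0] * m                   # prefix table, as discussed above
    k = 0
    for q in range(1, m):
        while k > 0 and P[k] != P[q]:
            k = pi[k - 1]
        if P[k] == P[q]:
            k += 1
        pi[q] = k
    shifts = []
    q = 0                          # number of pattern characters matched
    for i, c in enumerate(T):
        while q > 0 and P[q] != c:
            q = pi[q - 1]          # advance the shift by q - pi[q] at once
        if P[q] == c:
            q += 1
        if q == m:                 # a full occurrence ends at position i
            shifts.append(i - m + 1)
            q = pi[q - 1]          # keep looking for overlapping occurrences
    return shifts
```

On the article's text T=bacbababaabcbab it finds ababa at the 0-based shift 4, i.e. the shift s=4 of the running example.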
A two-pushdown-stack machine, abbr. 2PDA, is a PDA with two pushdown stacks; the \tt PUSH and \tt POP operations must specify which stack they operate on, and a 2PDA is deterministic.
2PDAs can accept some non-context-free languages; for example, the following 2PDA accepts our old friend a^nb^na^n, which can be achieved by putting the three clusters of a's and b's into the two stacks alternately
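The strategy can be sketched as a small simulation. The following Python function is a hypothetical illustration of the two-stack idea (it mimics the strategy, not the literal state diagram, and accepts n=0 as well):

```python
def accepts_anbnan(w: str) -> bool:
    # Deterministic two-stack recognition of { a^n b^n a^n : n >= 0 }:
    # cluster 1 of a's goes onto stack1, each b consumes an a from stack1
    # while going onto stack2, and each trailing a consumes a b from stack2.
    stack1, stack2 = [], []
    i, n = 0, len(w)
    while i < n and w[i] == 'a':      # phase 1: store the first cluster
        stack1.append('a')
        i += 1
    while i < n and w[i] == 'b':      # phase 2: one a per b
        if not stack1:
            return False              # more b's than leading a's
        stack1.pop()
        stack2.append('b')
        i += 1
    while i < n and w[i] == 'a':      # phase 3: one b per trailing a
        if not stack2:
            return False              # more trailing a's than b's
        stack2.pop()
        i += 1
    # accept iff the whole input was consumed and both stacks drained
    return i == n and not stack1 and not stack2
```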
2PDA=TM
Let P be a 2PDA. The core idea of this proof is to flatten the three storages of P into a single storage: the tape of the TM. First, we put all the contents of P's input tape onto the tape of the TM T, then insert a \Delta at the very beginning of T's tape; every time P reads a letter, we replace the corresponding letter on T's tape with \Delta to simulate its consumption. We put a \# at the end of the content on T's tape, marking the end of the input string; it serves as a base pointer: no matter how the tape head moves, it must return to the \# after finitely many steps before beginning its next operation.
For a \tt READ state of P with three outgoing edges leading to states X, Y and Z, labeled a, b and \Delta respectively, we perform the following operations on T to simulate it:
The two stacks' contents appear from left to right after the \#, separated by \text{\textdollar}. For a \tt POP_1 state with three edges leading to X, Y and Z, labeled a, b and \Delta respectively, we do the following to simulate it:
To simulate a \tt PUSH_1\text{ x} state, just call \tt INSERT\text{ x} and then reset the tape head
\tt PUSH_2 and \tt POP_2 are very similar: move the tape head to \text{\textdollar} first, then do exactly the same thing as for \tt PUSH_1 and \tt POP_1
When P reaches an \tt ACCEPT state, we \tt HALT the TM T
For the second part—proving that every language accepted by a TM can be accepted by some 2PDA—we make a simple exchange: we prove that every language accepted by a PM can be accepted by some 2PDA; since we have proved that PM=TM, this also proves 2PDA=TM. First, we transfer all the contents of the input tape of P to its \tt STACK_1 while maintaining the same order; we can do this by putting them into \tt STACK_2 first, which reverses the order of the input string, and then popping them from \tt STACK_2 and pushing them back onto \tt STACK_1. Now we need to simulate the two operations of the PM, \tt ADD and \tt READ. The \tt READ operation is the same as the \tt POP_1 operation of P, since both remove the character at the very beginning of the input string and then branch accordingly. The \tt ADD\text{ x} state, on the other hand, is a bit tricky: we first empty \tt STACK_1 into \tt STACK_2, then push x onto \tt STACK_1, and then pop all the contents of \tt STACK_2 and push them back onto \tt STACK_1. This makes x the bottommost element—just like the rightmost element on the \tt STORE of the PM—while keeping the order of the other characters in \tt STACK_1. Since PMs consist of only these two operations, we have successfully simulated a PM on a 2PDA. From the first and second parts of this proof, we have 2PDA=TM
A PDA with n stacks is equivalent to a 2PDA; in other words, nPDA=TM
No more powerful machine can be constructed by simply adding more stacks. The proof is very simple: for a PDA with n stacks, we separate the tape of the TM into n+2 sections, where the first section holds the contents of the nPDA's input tape, the following n sections represent the n stacks of the nPDA, and the last section is filled with blanks; it is obvious that we can build such a machine by repeating the algorithm in the previous proof.
Compilers do not treat the parts of a program as mere sets of characters; instead, each such group of characters is treated as a unit, called a token. In count+1 there are three tokens; specifically, the count is treated as a token of type id instead of a set of characters.
The intermediate code generation will be based on the AST:
This feature of syntax trees makes them particularly suitable for intermediate code generation: consider the two leaves of a node as operands and the node itself as an operator, and it is easy to construct a three-address code like x=y\ op\ z, where y and z are the operands, op is the operator, and x is the location where the result of the computation is stored. A three-address instruction carries out at most one operation, such as a comparison, computation or branch.
See this article (context-free grammars)
The analogy here is that the names of the tokens become the terminals in the corresponding CFG of the programming language; the terms token and terminal are synonymous in this context. Another note is that the empty string is denoted by \epsilon, not \Lambda
If an operand with operators of the same precedence on both of its sides belongs to the left one, the operator is said to be left-associative; for example, the plus sign is left-associative because in an arithmetic expression fragment +x+, x is taken by the left + unless a pair of parentheses is involved. The opposite of left-associative is right-associative: the assignment operator = and the exponentiation operator are often right-associative. The two kinds have different parse trees: the former's parse tree grows towards the bottom-left, and the latter's towards the bottom-right
If an operator a has the privilege of taking operands before another operator b, we say that a has higher precedence than b; for example, \ast has higher precedence than +
If we want to create a CFG for a language with built-in precedence:
- Build the atomic part of the expression. In arithmetic expressions, for example, a single digit and an expression surrounded by a pair of parentheses are atomic; they cannot be torn apart in any way, which means they have the highest precedence in the system. A digit by itself is certainly atomic, and an expression, no matter what it is, that is surrounded by parentheses can never be "seized" by any operator outside of them. So both digits and parenthesized expressions can be considered atomic expressions with the highest precedence, and we add a production for them: factor\rightarrow\ digit\ |\ (?); the question mark "?" is there because we do not yet know what can be put inside the parentheses.
- If we want to build a grammar with n levels of precedence, we need to make an observation:
- An expression with lower precedence cannot "seize" operands that are bound by operators of higher precedence. For example, in 3+4\ast5^6 the 5^6 is treated as an atomic unit by the multiplication sign, which cannot seize the 5 from the exponentiation, and 4\ast5^6 is treated as an atomic unit by the plus sign, which cannot seize the 4 from the multiplication. Now let a=5^6 and b=4\ast a; the original expression can be written as 3+b, where a has the highest precedence and b takes second place. As for the plus sign, since addition and subtraction have the lowest precedence in the arithmetic system, 3+b cannot be considered a unit, because any left-associative operator can "seize" its operands: if we put another + at the left of the whole expression, +3+b, the 3 immediately becomes an operand of the left +
so the basic idea is: for every group of operators at the same level of precedence, we write a production similar to the one we made for factor, where the right side of the production becomes \text{current-level-of-precedence } op \text{ next-higher-level-of-precedence}; the order of the two operands depends, of course, on the associativity.
- From the last step we know we need to build productions for the operators one precedence level below factor, which in arithmetic expressions are multiplication and division, so we add:
term\rightarrow\ term\ast factor\ |\ term/factor\ |\ factor
The last production means that an expression of higher precedence can be derived from one of lower precedence, but not vice versa, because of the "units" we mentioned: a lower-precedence expression can derive higher-precedence ones, since it may consist of them, but higher-precedence expressions are treated as units, as discussed in step 2, so a higher-precedence expression can never consist of lower-precedence ones
- Now we get to the lowest precedence in the hierarchy, addition and subtraction, and do as described in step 2:
expr\rightarrow\ expr+term\ |\ expr-term\ |\ term
- Finally, we need to consider what should fill the question mark we left in the first step. Since any valid arithmetic expression can be placed in a pair of parentheses, the content of the question mark must be able to derive everything in this language, which is, apparently, expr: it has the lowest precedence, and only the lowest precedence can generate expressions of higher or equal precedence, so expr can derive all the other productions and itself, while the other productions cannot derive expr. So we replace the question mark with expr:
factor\rightarrow\ digit\ |\ (expr)
So far we have finished inventing a CFG for a language with built-in precedence. The whole structure of this CFG resembles an Ouroboros (the expr is clearly a self-embedding nonterminal), and it is easy to see that the algorithm is iterative: for any number of precedence levels we just repeat its steps
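Assuming the grammar just built, a short recursive-descent sketch shows that the layering really does encode precedence and associativity. The code below is a hypothetical illustration: the functions expr/term/factor mirror the nonterminals, and the left-recursive productions are realized as loops, which preserves left associativity (division is integer division here for simplicity):

```python
def evaluate(s: str) -> int:
    # Evaluates single-digit arithmetic following the layered grammar:
    #   expr   -> expr + term | expr - term | term
    #   term   -> term * factor | term / factor | factor
    #   factor -> digit | ( expr )
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def factor() -> int:
        nonlocal pos
        if peek() == '(':                    # factor -> ( expr )
            pos += 1
            v = expr()
            assert peek() == ')', "missing closing parenthesis"
            pos += 1
            return v
        c = peek()                           # factor -> digit
        assert c is not None and c.isdigit(), "digit expected"
        pos += 1
        return int(c)

    def term() -> int:
        nonlocal pos
        v = factor()
        while peek() in ('*', '/'):          # left recursion as a loop
            op = peek(); pos += 1
            v = v * factor() if op == '*' else v // factor()
        return v

    def expr() -> int:
        nonlocal pos
        v = term()
        while peek() in ('+', '-'):
            op = peek(); pos += 1
            v = v + term() if op == '+' else v - term()
        return v

    result = expr()
    assert pos == len(s), "trailing input"
    return result
```

Because term calls factor and never the other way around (except through parentheses), \ast binds tighter than +, exactly as the production hierarchy dictates.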
Keywords allow us to recognize statements, because most statements start with a keyword or a special character (not all of them, of course)
"Attribute" in this sense means any quantity associated with a programming construct (i.e., a node in the parse tree), including but not limited to:
A syntax-directed translation scheme is a notation for attaching program fragments to the productions of a grammar, so that they are executed in the order in which the program text is parsed and produce a corresponding result that serves as the translation of the original program
A syntax-directed definition includes
We use x.y to denote the attribute y of nonterminal x; a parse tree with attribute values on each node is called an annotated parse tree
An attribute is said to be synthesized if its value at a node N of the parse tree depends on the attribute values of N and of N's children. One property of synthesized attributes is that they can be evaluated during a single bottom-up traversal of the tree
In the figure above, t is a string-valued attribute bound to the nonterminals, representing the postfix form of the subtree rooted at the corresponding nonterminal; we can use a table-based syntax-directed definition to represent such structures:
where || stands for string concatenation; for every production there is a semantic rule that tells how the attribute t is evaluated
An SDD (abbreviation of syntax-directed definition) in which the expressions on the right side of each semantic rule appear in the same order as in its production, with possibly some additional strings, is called a simple SDD
The evaluation order of the subexpressions in the semantic rules of an SDD does not matter, as long as the root expression is evaluated only after all of its subexpressions have been evaluated
The semantic rules of an SDD are not limited to producing strings; you can embed program fragments that will be executed when the node is encountered, in which case the semantic rules are called semantic actions. A semantic action is treated as an extra child of the nonterminal, placed according to its position in the production rule and executed when it is traversed
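The postfix example above can be realized directly. The following Python sketch (hypothetical code; only + and - with single digits, as in the classic example) implements the simple SDD as a recursive-descent translator whose return value plays the role of the synthesized attribute t, with Python's + standing in for ||:

```python
def to_postfix(s: str) -> str:
    # expr -> expr + term  { expr.t = expr1.t || term.t || '+' }
    # expr -> expr - term  { expr.t = expr1.t || term.t || '-' }
    # expr -> term         { expr.t = term.t }
    # term -> digit        { term.t = digit }
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def term() -> str:
        nonlocal pos
        c = peek()
        assert c is not None and c.isdigit(), "digit expected"
        pos += 1
        return c

    def expr() -> str:
        nonlocal pos
        t = term()
        while peek() in ('+', '-'):
            op = peek(); pos += 1
            t = t + term() + op       # append the operator after its operands
        return t

    out = expr()
    assert pos == len(s)
    return out
```

For instance, 9-5+2 translates to 95-2+.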
For every CFG there exists a parsing algorithm that parses it in O(n^3) time (the CYK algorithm); it is rarely employed, since it is too slow
There are generally two classes of parsers, bottom-up and top-down, named after the order in which the nodes of the parse tree are generated. The current terminal being scanned by the parser is called the lookahead. During parsing, if the terminal corresponds to the input, we advance both the tree node and the input from left to right; if we find that some production is unsuitable, we go back and try another production, which is called backtracking.
Recursive-descent parsing is a top-down approach to processing the input. There is a simple form of RDP called predictive parsing: a procedure that "predicts" which productions are about to be used and lists all the nonterminals and terminals of those productions in advance, implicitly generating a parse tree. Predictive parsing does not require backtracking, because it has listed all the expected tokens before touching the input string, so we do not need to try the productions one by one; for the production stmt\rightarrow\boldsymbol{for(}\ optexpr\ ;\ optexpr\ ;\ optexpr\ \boldsymbol{)}\ stmt, we can use the predictive parser
Note that the match procedure is only for terminals
Let \alpha be the deriving string; we use \text{FIRST}(\alpha) to denote the set of terminals that appear in the first place of one or more substitution results of \alpha. If \alpha is nullable, then \epsilon\in\text{FIRST}(\alpha). In the example above, \text{FIRST}(\alpha)=\lbrace \boldsymbol{for}\rbrace. \text{FIRST} must be considered when there is more than one live production for a nonterminal A, for instance A\rightarrow \alpha and A\rightarrow \beta: \text{FIRST}(\alpha) and \text{FIRST}(\beta) are required to be disjoint, otherwise the predictive parser cannot decide which production to use when encountering their common terminals
The application of \epsilon-productions is a bit tricky. Suppose the nonterminal optexpr has the productions optexpr\rightarrow \epsilon\ |\ \boldsymbol{expr}, and we hit a situation where no terminal is present for optexpr; for example, in the input string \boldsymbol{for(;expr;expr)}\ other there should be an optexpr after the first (, but in fact nothing is there: the input skips it and proceeds to the ;. In such circumstances, we check whether the lookahead symbol matches any of the non-\epsilon-productions of optexpr; in this example, we check whether lookahead=\boldsymbol{expr}. They are clearly not equal, since lookahead=\boldsymbol{;}, so we just return from optexpr() and do nothing: we have skipped the optexpr nonterminal, but the lookahead stays intact. If the lookahead does match some production, we use that production and advance the lookahead. Doing nothing means that an \epsilon-production was applied
The things a predictive parser should undertake for a particular nonterminal A can be listed as follows:
Call the procedure corresponding to the \text{FIRST}(\alpha) we have just detected; in that procedure all the terminals and nonterminals will be listed from left to right to simulate a derivation, and if any symbol (terminal or nonterminal) mismatches the lookahead, the procedure should crash or report a syntax error
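The steps above can be sketched concretely. The following Python fragment is a hypothetical predictive parser for the productions stmt\rightarrow\boldsymbol{for(}\ optexpr\ ;\ optexpr\ ;\ optexpr\ \boldsymbol{)}\ stmt\ |\ \boldsymbol{other} and optexpr\rightarrow\boldsymbol{expr}\ |\ \epsilon, operating on a list of token names:

```python
def parse_stmt(tokens: list[str]) -> None:
    # Raises SyntaxError on a mismatch; returns silently on success.
    pos = 0

    def lookahead():
        return tokens[pos] if pos < len(tokens) else None

    def match(t: str):                  # match is only for terminals
        nonlocal pos
        if lookahead() != t:
            raise SyntaxError(f"expected {t!r}, got {lookahead()!r}")
        pos += 1

    def optexpr():
        if lookahead() == 'expr':       # optexpr -> expr
            match('expr')
        # otherwise apply optexpr -> epsilon: return without moving lookahead

    def stmt():
        if lookahead() == 'for':        # FIRST of this production is {for}
            match('for'); match('(')
            optexpr(); match(';')
            optexpr(); match(';')
            optexpr(); match(')')
            stmt()
        else:
            match('other')

    stmt()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
```

Feeding it the token sequence of \boldsymbol{for(;expr;expr)}\ other succeeds, with optexpr applying its \epsilon-production at the empty slot, exactly as described above.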
Since SDTs (abbr. for syntax-directed translators) are built upon SDDs, we can form an SDT from a predictive parser: first we build a predictive parser for the productions, omitting the semantic actions; then we add the semantic actions at the right places in the procedures of the predictive parser (where "right place" is self-explanatory, since the procedures mirror the productions almost exactly).
A production is said to be left-recursive if its right side starts with the same nonterminal as its left side; such a production causes an infinitely tall parse tree growing down and to the left, and the lookahead never moves during the recursion (because the lookahead advances only in match, and match is only for terminals). Right recursion is the opposite of left recursion: a production is right-recursive if its right side ends with the same nonterminal as its left side, and it produces a parse tree that grows down and to the right
An abstract syntax tree is similar to a parse tree, but where the nodes of a parse tree are nonterminals, the nodes of an AST are program constructs: a node of the AST represents an operator, and its children represent operands. The operator and operands need not be widely-known binary operators; they just need to be "treated" like an operator with operands
it is clear that the syntax tree does not carry anything like expr or expr.t on it; they are replaced by real constructs
When we convert a grammar into a form that facilitates parsing, the semantic actions inside are treated as terminals
A token consists of a name and attributes. It is not the same thing as a lexeme, since the latter is only a string while a token is an object; there is, however, a relationship between them: a particular lexeme can be recognized by the lexer, which then creates a specific token for it, and the lexeme, of course, can also be an attribute of the token
Symbol-table entries are used by lexical, syntax and semantic analysis. Compared with the string table in lexical analysis, it is more often the case that only the parser can decide whether to use an old entry or create a new one in the symbol table; the lexer can only return tokens with lexemes
Symbol tables can take the form of a stack, where the topmost element is the symbol table of the current block, the second that of the enclosing block, and so on. They can also be implemented as a tree (since every nesting level may contain more than one block, there may be two symbol tables at the same nesting level, which forms a tree), or as anything that can hold the chain of symbol tables from the nested block out to the outermost block
A new symbol is added to a symbol table by semantic actions on the declaration nodes: when a node that declares a variable is analyzed, the semantic action puts its information into the symbol table, and the information is queried from the symbol table when a production such as factor\rightarrow \boldsymbol{id}, which denotes a use of that variable, is encountered
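The chained-scope idea can be sketched as a tiny class (a hypothetical illustration in the spirit of the Dragon Book's Env class; the names here are made up):

```python
class SymbolTable:
    # One table per block; `outer` points to the table of the enclosing
    # block, so the tables form the chain described above.
    def __init__(self, outer=None):
        self.entries = {}
        self.outer = outer

    def put(self, name, info):
        # called by a semantic action on a declaration node
        self.entries[name] = info

    def get(self, name):
        # a use of a name searches outward through the nesting
        table = self
        while table is not None:
            if name in table.entries:
                return table.entries[name]
            table = table.outer
        return None
```

An inner declaration then shadows an outer one, while undeclared names fall through every level and come back unresolved.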
It is not necessary for a compiler to build a solid syntax tree in order to yield IR (the AST is itself a kind of higher-level IR, by the way); it can generate the IR while merely pretending to build the tree, i.e., a sort of simulation: the nodes and their attributes are computed along with the data structure during parsing, but disappear after the parsing. The approach is to find a way to generate the syntax tree first, and then modify that method so that it generates the IR instead.
The compiler's front end finishes the static checking, including syntactic and semantic analysis, before intermediate code generation
A particular "universal computing machine" was intended to run any "algorithm". People thought that if such a machine existed, we could precisely define the concept of "algorithm" and, even more, find the "algorithm" for a particular problem automatically, so that no mathematical computation would need to be done by humans; they desired a machine that could understand any language that can be precisely defined, including natural languages such as English, making it more powerful than ever, able to recognize every possible well-defined language.
A Post machine, denoted PM, is a collection of:
- The alphabet \Sigma\cup\lbrace \# \rbrace
- A linear storage called the \tt STORE or \tt QUEUE, which initially contains the input string; new characters can be added on the right, and the leftmost character can be removed
- The store alphabet \Gamma
- The \tt READ states, which remove the leftmost character and branch the machine to another state; read operations can read every possible character in \Sigma\cup\Gamma, and \Lambda-branching is allowed, meaning that an empty store was read. PMs are deterministic, so there are no nondeterministic edges
- The \tt ADD states, which concatenate any character from \Sigma\cup\Gamma onto the right of the \tt STORE
- The set of states S, where q_0\in S is the non-reenterable start state, and some halt states (accept or reject) H\sub S
If we are in a read state and read a character that none of the outgoing edges is labeled with, the machine crashes, which is equivalent to taking a labeled edge into a \tt REJECT state, so we can draw PMs without \tt REJECT states. The store does not have to be empty for a string to be accepted.
Post machines look like PDAs, but a Post machine can accept non-context-free languages such as a^nb^na^n
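To make this concrete, here is a hypothetical Python simulation of a queue-only (i.e. Post-machine-style) recognizer for a^nb^na^n with n\geqslant 0: each cyclic pass over the store, delimited by a marker \#, crosses off one character from each of the three clusters, while the finite control only remembers which cluster it is scanning. This is an illustration of the queue discipline, not a literal PM state diagram, and it assumes \# does not occur in the input:

```python
from collections import deque

def pm_accepts_anbnan(w: str) -> bool:
    store = deque(w)
    store.append('#')                   # end-of-store marker
    while True:
        if store[0] == '#':
            return True                 # nothing left but the marker
        deleted = 0                     # clusters crossed off this pass
        phase = 0                       # 0: first a-run, 1: b-run, 2: last a-run
        expected = ('a', 'b', 'a')
        while True:                     # one full cyclic pass
            c = store.popleft()
            if c == '#':
                store.append('#')
                break
            if c != expected[phase]:
                if phase < 2 and c == expected[phase + 1]:
                    phase += 1          # moved on to the next cluster
                else:
                    return False        # shape is not a* b* a*
            if deleted == phase:        # cross off the first char of a cluster
                deleted += 1
            else:
                store.append(c)         # rotate the rest to the back
        if deleted != 3:
            return False                # the three clusters are uneven
```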
We cannot deduce that "PMs are more powerful than PDAs" from what we know so far, because this example only shows that there exists at least one PM accepting one non-context-free language.
Any language that can be accepted by a PM can be accepted by some TM
Let P be a PM with input string s. We put s on the Turing machine's \tt TAPE and insert a \Delta at the leftmost position (the insertion can be done by a TM; the algorithm is omitted intentionally to save space, but it is simple: move all the characters one cell to the right, then overwrite the cell to be inserted), so that we know when we hit the start of the input string. Then, for every \tt ADD state that adds character y, create
And for every \tt READ state with an outgoing edge labeled x
If there are multiple outgoing edges, add them as outgoing edges of q_3 with the corresponding characters (in this case x); \Delta is still a valid label: if what is left on P's store is the empty string, we need to draw a \Delta outgoing edge in the TM
and leave the start and halt states intact.
If \lbrace a,b,c,d,e\rbrace\subset\Gamma of a TM T, we use (a,b,c,d,e;=,L) to denote the instruction sequence:
(a,a,L)(b,b,L)\dots(e,e,L)
and the same for (a,b,c,d,e;=,R)
For the next step, we provide two subprograms for the PM: the first adds a character at the front of the store, and the second reads a character from the back—the opposite ends from the ones the \tt ADD and \tt READ states work on. After creating these two subprograms, we will be able to add or remove a character at either end of the store. The implementation uses a marker and a cyclic shift. The first subprogram adds a marker \text{\textdollar} and then the desired character to the end of the store's content, then repeatedly reads'n'removes the first letter of the string and adds it to the end, until the \text{\textdollar} has been removed; now the character appended last, the one we want in front, has become the front-most one. For the second subprogram we do something broadly similar: add a \text{\textdollar} to the end of the string, then remove'n'read; each time we read a character we move to a specific state representing that character, and when we encounter the \text{\textdollar} we take the action corresponding to the state we are in, because that state has effectively memorized the character that was read'n'removed just before; we then add that character to the front and read it (this time with a normal read):
The first subprogram, named \texttt{ADD FRONT}
For the second subprogram, we need to introduce a subprogram named \texttt{SHIFT-RIGHT CYCLICALLY} that does the first half of the algorithm: add the marker, shift the content, and bring the last character to the front:
then we read that character, originally the last one, which we have just brought to the front:
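In a simulation, the two subprograms are easy to express with a double-ended queue standing in for the store (a hypothetical Python sketch; the code only ever reads the front and adds at the back, exactly as a PM does; read_back collapses \texttt{SHIFT-RIGHT CYCLICALLY} and the final read into one function, and it assumes a nonempty store that does not contain '$'):

```python
from collections import deque

def add_front(store: deque, x: str) -> None:
    # ADD FRONT: append the marker and x, then rotate everything that was
    # in front of the marker to the back; x ends up front-most.
    store.append('$')
    store.append(x)
    while True:
        c = store.popleft()
        if c == '$':
            break
        store.append(c)

def read_back(store: deque) -> str:
    # Rotate once while staying one character behind the read head; the
    # character remembered when '$' reappears is the original last one.
    store.append('$')
    remembered = None
    while True:
        c = store.popleft()
        if c == '$':
            break
        if remembered is not None:
            store.append(remembered)
        remembered = c                  # the "state" memorizing the character
    return remembered                   # never re-added: the back is removed
```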
Any langauge accepted by a TM can be accepted by some PM
Before the construction steps, there is one thing needs to be concern: in the constructed PM, the character \Delta will be permitted to put on the store, just think that this symbol stands for blank in TMs but it is a meaningful character in PMs.
The algorithm is TRULY elegant, beautiful and so genius, first, the TM can alter the character at any cell, while the PM can only alter character at the either end(where by "alter", we mean a simplification of "read front and add front" or "read end and add end"), which means we need to find a way to move the cell that the TM's tape head is currently at to the one end of the store of PM, we do this by using a marker \# to simulate the TM's tape head on PM, we put what was originally at the right of the tape head(including the cell that the tape currently at) to the left of \#, and the left of to the right, so now the cell that tape head currently at will becomes the first cell in PM's store, for example, if the turing machine's tape is X_1X_2X_3\underline{X_4}X_5X_6X_7X_8, now the PM's store becomes X_4X_5X_6X_7X_8\#X_1X_2X_3, if we alter the X_4 to Y which will makes the tape becomes X_1X_2X_3Y\underline{X_5}X_6X_7X_8 and move the tape head right on the TM, it will be equivalent to \tt READ\Rightarrow ADD\text{ Y} on the PM, the PM will result in X_5X_6X_7X_8\#X_1X_2Y where the first character X_5 is still the character that tape head points to, and the left-right consistency is still maintained.
If we want to alter X_4 to Y and move the tape head left, we instruct the PM to perform \tt READ\Rightarrow\texttt{ADD FRONT}\Rightarrow\texttt{SHIFT-RIGHT CYCLICALLY}; the TM's tape will be X_1X_2\underline{X_3}YX_5X_6X_7X_8 after the operation, and the PM's store will be X_3YX_5X_6X_7X_8\#X_1X_2. However, there are two other considerations. First, if the tape head is pointing at the first cell, a move-left operation should cause a crash, but on the PM it would just end up with something like \#X_1X_2... on the store, meaning the tape head is reading nothing yet the machine does not crash; to fix this, we add an extra check right after the operation above finishes: read a letter from the front, and if it is \#, go to a reject state, otherwise add it back to the front. Second, if the tape head is pointing at the last non-blank cell of the TM and is instructed to move right, the string in the store will again look like \#X_1X_2; in this circumstance we need to insert a \Delta right before the \# on the store, to stay consistent with the TM
Before we start the PM, we need to follow the convention of the TM, which is to let the tape head point at the first cell; it is then obvious that we need to put the \# at the end of the string on the PM's store, which we can do with an \tt ADD\ \# state placed just after the PM's start state.
The last thing is the start state's reentrance in the TM: if we re-entered the corresponding part of the PM the same way, we would end up with two \#s, so the first \tt READ state of the PM must play the role of the TM's start state
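One simulated TM step on the PM store can be sketched as follows (a hypothetical encoding: the deque operations stand in for the \tt READ/\tt ADD states and the subprograms built earlier, \texttt{\#} marks the tape head, and \texttt{D} stands for \Delta):

```python
from collections import deque

HEAD, BLANK = "#", "D"   # "#" marks the simulated tape head; "D" stands for Δ

def move_right(store: deque, write: str) -> None:
    """TM step: write `write` under the head, then move the head right."""
    store.popleft()              # READ the cell under the head
    store.append(write)          # ADD the written symbol at the back
    if store[0] == HEAD:         # head moved past the last non-blank cell
        store.appendleft(BLANK)  # insert Δ before '#' (the ADD FRONT subprogram)

def move_left(store: deque, write: str) -> None:
    """TM step: write `write` under the head, then move the head left."""
    store.popleft()                # READ the cell under the head
    store.appendleft(write)        # ADD FRONT the written symbol
    store.appendleft(store.pop())  # SHIFT-RIGHT CYCLICALLY: last char to front
    if store[0] == HEAD:           # the head was on cell 1: moving left crashes
        raise RuntimeError("REJECT: head moved left of cell 1")
```

Running `move_left` on the store `45678#123` with write symbol `Y` yields `3Y5678#12`, matching the example above.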
Since:
Any language that can be accepted by a PM can be accepted by some TMs
and
Any language that can be accepted by a TM can be accepted by some PMs
We have:
Turing Machine=Post Machine
]]>A compiler is a program that reads a program in a source language and translates it into an equivalent program in a target language. One of its important roles is to report any errors detected during compilation
Compared to interpreters, a compiler must first compile the program into an executable file before the user can run it, which means it compiles first and accepts inputs afterwards. An interpreter, on the other hand, accepts inputs while reading the source program, evaluates the program statement by statement, and produces output as it goes. Compilers often generate better target code that runs significantly faster than interpretation, but interpreters usually have better error-diagnostic capabilities, precisely because they execute the program statement by statement.
There are also hybrid compilers that unite the interpreter and the traditional compiler: they use a translator to translate the program into a low-level intermediate language, then evaluate that IL in an interpreted way
Lexical analysis is the first phase of a compiler: it reads the stream of characters and groups them into meaningful sequences called lexemes. For every lexeme the lexer produces a token of the form \langle\text{token-name},\text{attr-value}\rangle, where token-name is an abstract symbol that will be used during syntax analysis (the next phase), and the second component points to an entry in the symbol table, which is needed for semantic analysis and codegen
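As an illustration (the token classes and symbol-table layout below are invented, not from the book), a toy lexer that groups characters into lexemes and emits such pairs:

```python
import re

# invented token specification: name -> regular expression
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(src: str):
    """Group characters into lexemes and emit (token-name, attribute) pairs."""
    tokens, symtab = [], {}          # symtab maps identifier lexemes to entries
    for m in MASTER.finditer(src):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                 # whitespace separates lexemes
        if kind == "ID":             # identifiers point into the symbol table
            attr = symtab.setdefault(lexeme, len(symtab))
        else:
            attr = lexeme
        tokens.append((kind, attr))
    return tokens, symtab
```

For `pos = init + 10` this produces the familiar stream ⟨ID, 1⟩ ⟨OP, =⟩ ⟨ID, 2⟩ ⟨OP, +⟩ ⟨NUMBER, 10⟩ (with table entries numbered from zero here).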
This is the second phase of a compiler, also named parsing: the first components of the tokens generated by the previous phase are used to build a tree-like intermediate representation that shows the grammatical structure of the token stream. It is often a syntax tree
The nodes of the syntax tree represent operations, and the children of a node represent the arguments of the operation that the node represents
Semantic analyzer uses syntax tree and the symbol table to check the semantic consistency, gather type information and saves it in the syntax tree or symbol table for intermediate codegen.
An important part of semantic analysis is type checking: the compiler checks that each operator has matching operands, e.g. a string cannot be added to an array of integers
Coercion is an implicit type conversion. For example, the add operator may be applied to either a pair of integers or a pair of floats; if one side is an integer and the other a float, the compiler may coerce the integer into a float, and the actual underlying operation behaves as if an explicit type conversion had been added to the integer-side operand.
During compilation there might be one or more intermediate representations of the source program; the syntax tree is one of them, at a relatively high level. After syntax and semantic analysis, the compiler often generates a low-level IR that can be thought of as a program for an abstract machine (e.g. a stack machine or a register machine). The two important properties of an IL are:
The code optimization here is machine-independent: it works on the IR, trying to improve its quality so that better target code can be generated. Machine-independent optimization often involves measures that depend heavily on control flow and data flow, like constant folding and copy propagation
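A minimal sketch of these two optimizations on an invented three-address-style IR (real optimizers work on control-flow graphs; this straight-line version only illustrates the idea):

```python
# Each instruction is (dest, op, a, b): "dest = a op b", or a plain copy
# "dest = a" when op is None. Constant folding evaluates operations whose
# operands are already known constants; copy propagation replaces uses of
# x after "x = y" with y itself.
def optimize(instrs):
    consts, copies, out = {}, {}, []
    for dest, op, a, b in instrs:
        a = copies.get(a, a)            # copy propagation on the operands
        b = copies.get(b, b)
        a = consts.get(a, a)            # substitute known constant values
        b = consts.get(b, b)
        if op is None:                  # plain copy: dest = a
            (consts if isinstance(a, int) else copies)[dest] = a
            out.append((dest, None, a, None))
        elif isinstance(a, int) and isinstance(b, int):
            val = a + b if op == "+" else a * b   # fold the constant expression
            consts[dest] = val
            out.append((dest, None, val, None))
        else:
            out.append((dest, op, a, b))
    return out
```

With the input `x = 4; y = x; z = y + 6`, the last instruction folds to a plain constant `z = 10`.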
The codegen phase takes the IR as input and produces the target language as output. If the target language is machine code, registers and memory locations are selected for each variable used by the program, and then the IR is translated into equivalent real machine code; the symbol table supplies the information needed to map variables to registers and memory addresses
The symbol table records the names of variables and functions and collects information about each of them: it may record the allocated storage, type and scope for a variable, and the number and types of arguments, the calling convention (by reference or by value) and the return type for a function
The symbol table is an associative container that uses the names of the variables as keys and the attributes above as values; it should be designed so that the record for each name can be found quickly
The front-end of a compiler contains Lexical Analysis, Syntax Analysis, Semantic Analysis and Intermediate Codegen
The back-end of a compiler contains Code Optimization(optional) and Codegen
A well-designed frontend and backend should be decoupled: the frontend can be retargeted to different backends, and a backend can serve different frontends
The programming languages that have been invented so far can be classified in various ways
C/C++/C#/Java/Fortran
SQL
Prolog
C/C++/Java/C#, these languages always involve statements and notions for changing state
ML/Haskell/Prolog
CORRECT, MEANINGFUL, REASONABLE, MANAGEABLE
There are two major optimizations aimed at computer architecture: parallelism and the memory hierarchy. Parallelism has several levels, processor-level and instruction-level: the machine can issue instructions in parallel when possible, operate on a vector of data at the same time (called vectorization), and run multiple operations in parallel. Some techniques have been found to automatically turn a sequential procedure into multiprocessor code.
The memory hierarchy consists of several storage levels arranged like a pyramid, where the topmost level has the fastest speed but the smallest capacity; speed decreases and capacity grows all the way down to the bottom of the pyramid. Some compiler-design techniques target this, such as register allocation: allocating variables to registers in a reasonable way is one of the most important problems in optimizing programs. Memory and physical devices are managed by the system and the hardware, yet sometimes not efficiently enough; it is possible to improve performance by rearranging the layout of data and the order of the instructions accessing that data
An environment is a mapping from names to variables(a location in the store)
A state is a mapping from variables to their values
The binding from names to locations are mostly dynamic, the same names may bind to the different variables under different scopes
The binding from variables to values is also dynamic because we cannot know the value of a particular location until we run the program; constants are exceptions, as the binding between constant variables and their values is known at compile time
Name mostly refers to the compile-time name, and variable mostly refers to the runtime location denoted by a name
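The two-level mapping can be pictured with a tiny Python sketch (the location numbers and values here are made up for illustration):

```python
# environment: name -> location; state: location -> value
environment = {"x": 0x1000}
state = {0x1000: 42}

def lookup(name: str):
    """Resolve a name to its value: name -> location -> value."""
    return state[environment[name]]

# entering an inner scope rebinds the NAME "x" to a fresh location,
# so the same name denotes a different variable there
inner_environment = {**environment, "x": 0x2000}
state[0x2000] = 7

def lookup_inner(name: str):
    return state[inner_environment[name]]
```

The outer `x` still holds 42 while the inner `x` holds 7: the name binding changed, not the old variable.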
An identifier is a string of characters refers to an entity, like an object, a procedure, a class, or a type.
All identifiers are names, but not all names are identifiers: x.y is a name but not an identifier; x is both an identifier and a name, and so is y, but x.y is a qualified name, not an identifier
A variable refers to a particular location of the store (e.g. in memory). Variables may be declared more than once, and such declarations result in different store locations, so the same identifier may not always refer to the same variable, for example in a recursive procedure.
All of the callable subprograms are procedures; procedures have no return value, while a function is a procedure with a return value. Specifically, we can treat a void-returning function as a procedure. Methods are heavily used in object-oriented languages; they refer to functions or procedures that are associated with a class
Declaration tells us the type and the existence of things, while definition tells us about their value and behaviours
We say a scope resolution is dynamic if it is based on factors that can be known only at runtime: a dynamic scope of a name x refers to the variable declared in the most recently called, not-yet-terminated procedure. Dynamic dispatch and macro expansion are the widely known usages of the dynamic policy
A scope resolution is lexical, if it's based on the lexical structure of the source program, almost all languages' variable scope resolution are lexical
Omitted
If two variables refer to the same location, so that changes made to one of them affect the other, then each of them is the other's alias
]]> We're about to introduce a new kind of machine that will be the basis of, and the simplest yet most powerful model of, current computers; this machine is called the Turing Machine. It is structurally very similar to the machines we've talked about before.
The Turing Machine must be capable of output since it is the most powerful computational model; output is the only way it can communicate what it knows to anything else, like humans.
A Turing Machine, denoted TM, is a collection of:
- Alphabet \Gamma denoting those characters that can be printed on the \tt TAPE by tape \tt HEAD
- Alphabet \Sigma denoting input letters, it may be a subset of \Gamma
- The blank symbol \Delta\in\Gamma, meaning a cell with no character in it; it is different from the null string \Lambda
- An infinitely long \tt TAPE, marked out into square cells, each of which contains an input letter from \Sigma or \Delta
- A finite set of states Q
- Start state q_0\in Q
- A set of halt states F\sube Q
- A tape \tt HEAD that can do following things in one step:
- Read a letter from \tt TAPE
- Alter it to any character c\in\Gamma; if the character is \Delta, this is considered an erasure
- Move itself one cell left or right
the \tt HEAD is at the first cell of the \tt TAPE before the machine starts; it can never move to the left of the first cell, and if it is instructed to do so, the machine will crash
- A program, which is a function that instructs the machine to change state, alter the character on the tape, and move the tape head according to the current state. The program is depicted as a directed graph like the FAs, where the directed edges are the flow of the states, the nodes are the states themselves, and each edge is labeled with (letter, character, direction), where the letter is the letter to be read from the tape, the character is the character used to overwrite it, and the direction, which can be L or R, tells the tape head to move left or right
TMs do not require that every state have outgoing edges for all the possible characters it can read from the tape; if the letter read at the current state has no appropriate outgoing edge, the machine crashes. An input is said to be accepted by a TM if and only if the TM halts at a halt state. Turing machines are deterministic: for every input there is a unique trace, and it is not allowed to have two outgoing edges from one state labeled with the same letter.
We use a specific notation to trace the path on a TM: if a TM is currently at state q_3 and the input string is abcd where the letter about to be read is c, we write q_3(ab\underline{c}d); if it moves to state q_4 and the letter about to be read becomes d, we write q_4(abc\underline{d}). All of these form a path, which we call a trace, from the start state q_0 to a halt state q_n
For every regular language L there exists a Turing Machine that accepts it
The proof is trivial: take an FA M that accepts L, then for every edge in the FA change the label a to (a,a,R) and b to (b,b,R) correspondingly; for every final state of the FA, turn it into a normal state and add an extra edge labeled (\Delta,\Delta,R) to a newly added state \tt HALT, which will be the halt state of the TM.
This algorithm works because the trace is exactly the same as it was in M: if we got to a final state in M originally, we get to the \tt HALT state in the TM; and if the string falls into an infinite loop in M (we used to use an infinite loop to denote a string that must be rejected), the corresponding TM will crash, because the string will eventually be exhausted and a \Delta will be encountered, yet no edge is labeled with \Delta except those connected to the \tt HALT state
Every Turing Machine T over the \Sigma divides the set of input strings into three classes:
- The set of all strings leading to a \tt HALT state \text{ACCEPT}(T), this set is also the language accepted by T
- The set of all strings which lead to undefined behavior, a.k.a. a crash: \text{REJECT}(T)
- The set of all other strings, \text{LOOP}(T), i.e. strings that loop forever while running on T
These sets could be empty, and there is a simple example for the definition above, consider the regular language (a+b)^\ast aa(a+b)^\ast, we build a TM for it:
if the input contains a double a, the machine halts successfully; if the input does not contain a double a and ends with a, the machine will crash, since there is no \Delta outgoing edge from q_1; but if the input does not contain a double a and ends with b, it will loop forever on q_0; in this case:
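A small simulator makes the three classes concrete (the encoding is invented: the program maps (state, letter) to (next state, write, direction); the machine below is one possible rendering of the graph described above, including the \Delta self-loop on q_0):

```python
BLANK = "D"  # stands for the blank symbol Δ

def run_tm(program, halt_states, tape, start="q0", max_steps=10_000):
    """Run a deterministic TM; LOOP(T) strings exceed the step budget."""
    tape = list(tape) if tape else [BLANK]
    state, head = start, 0
    for _ in range(max_steps):
        if state in halt_states:
            return "ACCEPT"
        if head == len(tape):
            tape.append(BLANK)            # the TAPE is infinite to the right
        letter = tape[head]
        if (state, letter) not in program:
            return "REJECT"               # no outgoing edge: the machine crashes
        state, write, direction = program[(state, letter)]
        tape[head] = write
        head += 1 if direction == "R" else -1
        if head < 0:
            return "REJECT"               # moved left of the first cell: crash
    return "LOOP"                         # did not halt within the step budget

# one rendering of the (a+b)*aa(a+b)* machine sketched above
program = {
    ("q0", "b"): ("q0", "b", "R"),
    ("q0", "a"): ("q1", "a", "R"),
    ("q0", BLANK): ("q0", BLANK, "R"),    # no aa, ends with b: loop forever
    ("q1", "b"): ("q0", "b", "R"),
    ("q1", "a"): ("q2", "a", "R"),
    ("q2", "a"): ("q2", "a", "R"),
    ("q2", "b"): ("q2", "b", "R"),
    ("q2", BLANK): ("HALT", BLANK, "R"),
}
```

`baab` is accepted, `a` crashes (it is in REJECT), and `ab` never halts (it is in LOOP, reported here once the step budget runs out).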
A fun fact is that a TM can accept not only regular and context-free languages but also some non-context-free languages. We can encode a TM for the language a^nb^na^n, which has been proved to be non-context-free. The idea is to encode the TM as a circuit: on each iteration we erase an a from the first a^n, a b from the b^n, and an a from the second a^n; if the tape finally becomes all blank, the string must be in a^nb^na^n, but if it ends up with, say, one a and one b, or a single a, or a single b, or anything else that leaves non-blank characters on the tape, then the string must not be balanced across the first as, the bs and the second clump of as. The encoded TM is depicted as follows:
You can try to simulate aaabbbaaa on this machine, and see if it will be accepted.
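The same erase-one-from-each-clump circuit can be mimicked in ordinary code (a plain re-implementation of the idea, not the TM itself):

```python
import re

def in_anbnan(s: str) -> bool:
    """True iff s is in {a^n b^n a^n}: erase one letter per clump per circuit."""
    m = re.fullmatch(r"(a*)(b*)(a*)", s)
    if not m:
        return False                  # wrong shape: some letter out of place
    first, middle, last = (list(g) for g in m.groups())
    while first or middle or last:    # some non-blank cell remains
        if not (first and middle and last):
            return False              # one clump ran dry before the others
        first.pop(); middle.pop(); last.pop()   # one circuit of the machine
    return True                       # the whole tape was blanked out
```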
]]>Some of these problems are decidable and some others are undecidable for CFGs; the following seven problems are all undecidable, because it has been proved that no algorithm can decide them.
Given any CFG G, there exists an algorithm to decide whether it generates any words
It is obvious that if \Lambda\in L(G), then this language generates at least one word, and if so, G must satisfy S\stackrel{\ast}{\Rightarrow}\Lambda, which means S is a nullable nonterminal; we've already proposed an algorithm to decide whether a nonterminal is nullable, in the chapter Grammatical Format.
Now, since we've shown how to decide \Lambda's membership, we can turn the CFG into CNF. If there is a production S\rightarrow t where t is any terminal, then this CFG generates at least one word; if there's no such production, we have a way to try to build one
This algorithm works because it simulates a backward tracing of the derivation. If S yields any word, that word must be generated by other nonterminals (maybe zero of them, in which case S\rightarrow t is satisfied), which are in turn generated from S. Since every nonterminal eventually gets replaced by terminals, if we trace the derivation steps of a word all the way up to S, then S can be eliminated because it yields that word, and so can the nonterminals in every step. If S cannot be eliminated, it means that no word's derivation can be retraced up to S.
The finiteness of this algorithm is obvious
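The backward tracing above is a fixed-point computation, which can be sketched directly (the grammar encoding is invented: productions are (left, right-side-symbol-list) pairs, nonterminals uppercase, terminals lowercase):

```python
def generates_any_word(productions, start="S"):
    """L(G) is non-empty iff `start` is 'generating': it has a production whose
    right side consists only of terminals and already-generating nonterminals."""
    generating = set()
    changed = True
    while changed:                      # repeat until no nonterminal is added
        changed = False
        for left, right in productions:
            if left not in generating and all(
                sym.islower() or sym in generating for sym in right
            ):
                generating.add(left)    # "eliminate" this nonterminal
                changed = True
    return start in generating
```

Each pass either adds a nonterminal or stops, so the loop runs at most as many times as there are nonterminals, which is the finiteness argument in another guise.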
Given a CFG G, there exists an algorithm to decide whether a nonterminal X is ever used in the derivation of words
This proof will be constructive
This is similar to the propagation of a virus: if a production with left-side nonterminal, say, K, involves X in its right side, then any production that involves K in its right side can also yield X. We repeat this procedure bottom-up until we meet S; if that happens, then S must be able to yield X, which answers our question perfectly.
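The virus propagation can be sketched as follows (same invented grammar encoding as before; as the text presupposes, the grammar is assumed already cleaned, so every nonterminal generates terminal strings):

```python
def is_used(productions, x, start="S"):
    """Propagate upward from x: a nonterminal whose right side contains an
    already 'infected' symbol becomes infected; x is ever used in deriving
    words iff the infection reaches the start symbol."""
    infected = {x}
    changed = True
    while changed:
        changed = False
        for left, right in productions:
            if left not in infected and any(sym in infected for sym in right):
                infected.add(left)      # left can yield something containing x
                changed = True
    return start in infected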
Given a CFG G, there exists an algorithm to decide the finiteness of the language it generates
This proof will be constructive
This proof hides a subtle detail: we need to ensure there exists a production with X as its left side. This is actually unnecessary, because if there were no such production, then X would already have been eliminated by rule 1, since it cannot generate any word. The essential idea is to find a self-embedded nonterminal and make sure it is reachable from the start symbol S
Given a CFG G and a string x\in\Sigma^\ast, there exists an algorithm to decide whether x can be generated by the CFG
Let n be the length of x and k be the number of nonterminals, let the letters of x be x_1...x_n and the nonterminals be N_1...N_k
If we know that N_i\stackrel{\ast}{\Rightarrow}x_p...x_q and N_o\stackrel{\ast}{\Rightarrow}x_{q+1}...x_j, where p\leqslant q\lt j, then if there exists a production N_u\rightarrow N_iN_o, we know that N_u\stackrel{\ast}{\Rightarrow} x_p...x_j.
From the letters in x, one by one, we construct such an algorithm:
apparently we know which nonterminals generate x_1 to x_n, respectively; then we can obtain the combinations of nonterminals that could generate x_1x_2, x_2x_3, ..., x_{n-1}x_{n}. List all these combinations and find whether there exist nonterminals that can yield them: for example, if N_1\rightarrow x_1, N_2\rightarrow x_2 and N_3\rightarrow N_1N_2, we know that N_3\stackrel{\ast}{\Rightarrow}x_1x_2. We do this repeatedly to find all the nonterminals that generate substrings of length 2. When we've exhausted all the possibilities, we increase i; now i=3 and we do the same: if N_3\stackrel{\ast}{\Rightarrow}x_1x_2, N_4\rightarrow x_3 and N_4\rightarrow N_3N_4, then N_4\stackrel{\ast}{\Rightarrow}x_1x_2x_3, which means N_4 is a nonterminal that generates one of the substrings of length 3. Apply this repeatedly until we've exhausted all the possibilities, from substrings of length 2 up to the substring of length n, which is x_1...x_n itself.
This algorithm is called the CYK algorithm, and it is actually bottom-up: first we find all the nonterminals that generate the substrings of length 1, then all the nonterminals that can generate the nonterminals found in the previous step in a way that forms a substring of length 2, then length 3, length 4, ..., until length n. When we reach length n and find that S is one of the nonterminals, it implies that S is capable of yielding x
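A compact sketch of the table-filling just described, for a grammar in CNF (the encoding is invented: unit rules N \rightarrow t as (N, t), binary rules N \rightarrow AB as (N, A, B)):

```python
def cyk(word, terminal_rules, binary_rules, start="S"):
    """table[i][l] = set of nonterminals deriving word[i : i + l]."""
    n = len(word)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):            # substrings of length 1
        for left, t in terminal_rules:
            if t == ch:
                table[i][1].add(left)
    for length in range(2, n + 1):           # then length 2, 3, ..., n
        for i in range(n - length + 1):      # substring start position
            for split in range(1, length):   # split into two shorter pieces
                for left, a, b in binary_rules:
                    if a in table[i][split] and b in table[i + split][length - split]:
                        table[i][length].add(left)
    return (start in table[0][n]) if n else False
```

For instance, with the CNF grammar S \rightarrow AB\mid AC, C \rightarrow SB, A \rightarrow a, B \rightarrow b (which generates \lbrace a^nb^n\rbrace), the table for `aabb` ends with S in the length-4 cell.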
There is another problem that needs to be solved: given a CFG G and a word w, we want to know not only whether w is in L(G), but also how it gets generated
Given a word w generated by a grammar G, the procedure of finding its derivation is called parsing
The reason why we need parsing is that if we can deduce the derivation tree of a word, then we can analyze its meaning.
Starting with G and w, the task is to find a sequence of productions of G that generates w. We accomplish this by checking all of the leftmost derivations; during this process we generate a syntax tree and modify it many times (which might involve another algorithm, Depth-First Search, to traverse the tree: DFS tries to simulate a leftmost derivation all the way down until it finds that it's impossible, then turns to the next leftmost derivation from the current symbol), which means we simulate every possibility of the leftmost derivation until we find a suitable one or find that a path is clearly impossible. By "impossible" we mean that the current derivation can never lead to w, for example because a terminal that w doesn't have appears during the process, or a terminal appears at a wrong position compared to w. This step is called disambiguation^{1}: it prunes the syntax tree, cutting off the impossible subtrees and reducing the size of the whole tree to improve the time complexity.
Sometimes there will be a tough choice: for example, if there is a production rule T\rightarrow T+E|F and the currently derived string is i+i+T, both rules can be applied. The method we use to solve this particular problem is called backtracking: we "memorize" the current derivation string and try each of the productions; if we find all of them are impossible, we "restore" the derivation string back to the one we memorized, then start to try the other productions.
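This memorize-and-restore search can be sketched by brute force (the grammar encoding is invented: uppercase letters are nonterminals, lowercase are terminals; a crude depth bound guards against grammars that loop):

```python
def parse(productions, w, start="S", depth=0, form=None):
    """Try every production for the leftmost nonterminal; prune sentential
    forms whose terminal prefix already disagrees with w; backtrack on failure.
    Returns the list of (left, right) derivation steps, or None."""
    form = [start] if form is None else form
    i = 0
    while i < len(form) and form[i].islower():
        i += 1                                # split off the terminal prefix
    prefix, rest = form[:i], form[i:]
    if prefix != list(w[: len(prefix)]):
        return None                           # prune: can never lead to w
    if not rest:
        return [] if len(prefix) == len(w) else None   # fully derived
    if depth > 2 * len(w) + 10:
        return None                           # crude bound against looping
    lead = rest[0]                            # leftmost nonterminal
    for left, right in productions:           # "memorize" = keep `form` intact
        if left == lead:
            steps = parse(productions, w, start, depth + 1,
                          form[:i] + list(right) + rest[1:])
            if steps is not None:
                return [(left, right)] + steps
    return None                               # backtrack: restore and give up
```

With the grammar S \rightarrow AB\mid AC, C \rightarrow SB, A \rightarrow a, B \rightarrow b, the call `parse(prods, "aabb")` recovers a full leftmost derivation, while `"aab"` is correctly rejected.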
There're some rules that can help us to decide which subtree needs to be pruned
(These are not all the rules, the total rules are nondeterministic, it highly depends on the context)
The bottom-up parser is the inverse of the top-down parser: it tries to resolve the leftmost derivation from the word itself, not from the start symbol. The aforementioned CYK algorithm is one of the bottom-up parsers; it traverses all the possible derivations and tries to deduce the nonterminals all the way up to the start symbol.
There is actually a third parser which works upon the postfix notation.
Context-Free Languages are closed under union
Let G_1 and G_2 be CFGs with start symbols S_1 and S_2. Introduce a new CFG G_3 with start symbol S and a production rule S\rightarrow S_1|S_2; L(G_3) will be the same as L(G_1)\cup L(G_2), since S can only generate S_1 or S_2, which generate L(G_1) and L(G_2) respectively.
The other way is to construct a union machine for L(G_1) and L(G_2), like what we did for regular languages: use a new start state and link it to the original machines' start states.
Context-Free Languages are closed under product (concatenation)
Let G_1 and G_2 be CFGs with start symbols S_1 and S_2. Introduce a new CFG G_3 with start symbol S and a production rule S\rightarrow S_1S_2; L(G_3) will be the same as L(G_1)L(G_2), because the first part of every word of L(G_3) will be in L(G_1) and the second part will come from L(G_2)
If L(G) is a CFL, then so is L(G)^\ast
Let G be the CFG of L(G) with start symbol S_1; introduce a new CFG G' incorporating G plus a production S\rightarrow S_1S|\Lambda.
It is clear that S\stackrel{\ast}{\Rightarrow}S_1^n
The intersection of two CFLs may not be CFL
Since all regular languages are context-free languages, and we've given a sufficient proof that the intersection of regular languages is still a regular language: if two languages are regular, then they're also context-free, hence their intersection is regular and context-free
If a CFL is not regular, this statement may not hold. For example, the language L_1=\lbrace a^nb^na^m\rbrace is context-free (the proof is omitted), and so is L_2=\lbrace a^nb^ma^m\rbrace, but their intersection, which is L_3=\lbrace a^nb^na^n\rbrace, is apparently not context-free, since we can disprove it using the pumping lemma for context-free languages mentioned in the last chapter.
The complement of a CFL may not be a CFL
As we've stated in the last theorem's proof, some complements of CFLs are context-free because the languages involved are also regular, but the others may not be
Suppose, toward a contradiction, that the complement of every CFL were context-free. Let L_1 and L_2 be context-free languages; then their complements \overline{L_1} and \overline{L_2} are context-free, so \overline{L_1}\cup\overline{L_2} must be context-free according to the union property, and \overline{\overline{L_1}\cup\overline{L_2}} must also be context-free. But by De Morgan's law, \overline{\overline{L_1}\cup\overline{L_2}}=L_1\cap L_2, which we've just shown may not be context-free, so the theorem is proved by contradiction.
The intersection of a context-free language and a regular language is always context-free
We can build a PDA P_M from the PDA P of the CFL and the FA M of the regular language; here are the steps:
This algorithm must be finite because it can be described in another way: let the states in S(P) be \lbrace y_0, y_1, y_2,...,y_n\rbrace and the states in S(M) be \lbrace x_0, x_1,x_2,...,x_m\rbrace, then let S(P_M)=S(P)\times S(M); for every y and x find a corresponding element in S(P_M) which satisfies the above rules (it will be a set of ordered pairs). This procedure is obviously finite since both S(M) and S(P) are finite sets.
The last step of the algorithm is to make sure that a running string terminates if and only if it reaches a state whose y part is an \tt ACCEPT state in P and whose x part is a halt state in M; now we've built a PDA that accepts exactly the strings in both the context-free language and the regular language. Q.E.D.
Let G be a CFG in CNF, we call the productions like
live productions, and
dead productions.
If each production can be used only once during the derivation, we can only generate finitely many words.
For every CFG in CNF, applying a live production increases the count of nonterminals in the working string by one, and applying a dead production decreases it by one. Since the derivation starts with a single nonterminal (the start symbol) and ends up with zero nonterminals, we need exactly one more dead production than live productions, because we must also eliminate that initial single nonterminal. This holds for every word that any CFG in CNF generates. So if a CFG has p live productions and q dead productions, and each of them can be used only once during a derivation, then at most p live productions and at most p+1 dead productions can be applied; since each dead production contributes exactly one letter, the words so generated have at most p+1 letters, which means there can be only finitely many of them.
Let G be a CFG in CNF with p live productions and q dead productions. If w\in L(G)\land |w|\gt 2^p, then for every syntax tree of w there exist two nodes m and n such that they are the same nonterminal and n is a descendant of m
Let's make a simple clarification before the proof. Assume the syntax tree of the word w is a perfect binary tree, where by "perfect" we mean every node has two descendants except in the last row: the last row is where the dead productions take place, and a dead production has only one descendant, the terminal itself. If w has exactly s letters, such a syntax tree has exactly \log_{2}s+2 rows, because the last two rows have the same number of nodes; so if w has more than s letters, its syntax tree must have more than \log_{2}s+2 rows. Since we've restricted the length of w to be greater than 2^p, the syntax tree derived from w must have more than p+2 rows, which means at least p+3 rows. A terminal in the last row therefore has at least p+2 ancestors, all of them nonterminals. The lowest of those ancestors applies a dead production, because once a dead production occurs the path ends and the terminal of the last row is generated; all the remaining ancestors, at least p+1 of them, apply live productions. Now the pigeonhole principle applies: there are only p live productions, so some live production must be used more than once on this path, and multiple occurrences of the same live production mean the same nonterminal, which proves the theorem.
A nonterminal in a derivation of a word in a CFG is said to be self-embedded if it ever occurs as a tree descendant of itself; according to Theorem 2, any sufficiently long word must have a leftmost derivation that includes a self-embedded nonterminal
Let the notation \stackrel{\ast}{\Rightarrow} denote "yields": we say x yields y if y can be derived from x after certain productions, i.e. if
we say
There is actually a funny fact: if a nonterminal N is self-embedded, which means N\stackrel{\ast}{\Rightarrow}vNy (where v and y are strings of terminals yielded along the way), then this procedure can be repeated indefinitely; we can apply N\stackrel{\ast}{\Rightarrow}vNy as many times as we want and eventually end up with an infinite language. The conclusion is: if a CFG contains at least one self-embedded nonterminal, its language must be infinite, for if N\stackrel{\ast}{\Rightarrow}vNy, then also N\stackrel{\ast}{\Rightarrow}v^nNy^n
The discovery we just made is analogous to the essential idea of the pumping lemma for regular languages: we've shown that if a CFG has at least one self-embedded nonterminal, it can yield words in which some parts occur repeatedly while the rest of the word stays the same
Let G be a CFG in CNF with p live productions, and let w\in L(G)\land |w|\gt 2^p; then w can be broken into
where x\neq\Lambda, v and y cannot both be empty, and |vxy|\leqslant 2^p
such that
Let P\rightarrow QR be a live production of G, and let G start with S. Suppose S\stackrel{\ast}{\Rightarrow}uPz. Since we've proved that in the derivation of a sufficiently long w some nonterminal must be self-embedded, let P be such a nonterminal, so P\stackrel{\ast}{\Rightarrow}vPy. Presume that the derivation string just before the second application of P\rightarrow QR is uvPyz, where v or y can be \Lambda but not both. The reason is that v and y come from P\stackrel{\ast}{\Rightarrow}vPy, and P is first substituted into QR: the only two edges descending from P are Q and R, and Q is always the left cousin of R. If Q yields the next P, then v, the part to the left of that P, may be empty; if the next P is yielded from R, the situation is the opposite and y may be empty. However, whichever of v and y is empty, the other side must derive a subtree: if Q yields P (so v may be empty), then R derives a subtree yielding y. So either v or y can be empty, but not both of them. At this point things become clearer: the current derivation string is uvPyz, where vPy was yielded from the first application of P\rightarrow QR. Now we can do it again: apply P\rightarrow QR to the new P and perform exactly the same steps as the first time until we meet the third P, generating uvvPyyz, where P is again surrounded by v and y because we used the same procedure as in the first application. Keeping the applications going, we find that every uv^nPy^nz with n\in \mathbb{N^+} can be derived. Since u and z exist only if S yields them before the first P, they may well be null; it is totally fine for S to yield P alone, without u and z.
Now, after some substitutions, we can finally stop the repetitions and yield a string of terminals from P, say x; the derivation string then looks like uv^nxy^nz. This means there are n+1 occurrences of P in total, where the first n are applications of P\stackrel{\ast}{\Rightarrow}vPy and the last one is P\stackrel{\ast}{\Rightarrow}x. Now let's make another observation. Suppose that instead of choosing P arbitrarily, we choose the repeated nonterminal from the bottom, the region that generates the terminals: we look up the tree from the bottom for p+1 rows of live productions. Since we've proved that such a stretch must contain two identical nonterminals, denote them by N' and N'' with N' the upper one; the subtree rooted at N' then generates a string of length at most 2^p. The reason we insist on this is to obtain an upper bound on the pumping length: we pick the closest repeated nonterminals because the lower they sit, the shorter the string they generate, which turns out to be at most 2^p. Note that this string is exactly vxy: the upper N' generates v and y, and the lower N'' generates x, like a pyramid. So we've proved that |vxy|\leqslant 2^p
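As a concrete illustration (the grammar here is chosen for demonstration, not taken from the book), take the CNF grammar S\rightarrow AB\mid AC,\ C\rightarrow SB,\ A\rightarrow a,\ B\rightarrow b, which generates \lbrace a^nb^n\rbrace and has p=3 live productions. S is self-embedded:

```latex
S \Rightarrow AC \Rightarrow aC \Rightarrow aSB \Rightarrow aSb
\qquad\text{i.e.}\qquad
S \stackrel{\ast}{\Rightarrow} vSy \quad (v = a,\ y = b)
```

and S\stackrel{\ast}{\Rightarrow}x with x=ab via S\Rightarrow AB\Rightarrow aB\Rightarrow ab. With u=z=\Lambda, pumping gives uv^nxy^nz=a^n(ab)b^n=a^{n+1}b^{n+1}, all of which indeed lie in the language, and |vxy|=|aab|=3\leqslant 2^p=8.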
There is a PDA that accepts the same language as any given CFG, and a CFG that generates the same language as any given PDA
This part of the proof starts with an example: we propose a CFG in CNF, raise a corresponding PDA, run some strings on this PDA, and observe what happens at each stage: the characters left on the \tt STACK, the letters left on the \tt TAPE, the current state of the PDA, etc. Since we have already shown how to translate a CFG into a CFG in CNF, there is no problem assuming that every CFG we propose is in CNF
For CFG:
we propose PDA:
A \tt PUSH operation on S indicates that we are about to start running the string. Then we simulate two things: 1. the substitution of a nonterminal by nonterminals; 2. the substitution of a nonterminal by a terminal. When we pop a nonterminal out, we decide which two nonterminals to push in according to the production rules. E.g., there are two choices of nonterminal pairs if we pop an S off the \tt STACK: SB or AB, because these are all the capabilities of S according to the productions. If we choose to push AB, we can pop A and push CC in the next step to simulate another substitution, A\rightarrow CC. What is now left in the \tt STACK is three nonterminals, CCB, all of which have only terminal productions; the second kind of simulation takes place at this step: pop a C and read an a from the tape to simulate the production C\rightarrow a.
It is not hard to see that we are simulating a leftmost derivation of the CFG on the PDA: each time, we use the stack to perform a substitution conforming to some production rule on its topmost element, which acts as a nonterminal of the CFG, and the topmost element is always the leftmost nonterminal of the derivation. We can track the path to \tt ACCEPT by constructing a state transition table; this table shows not only the state sequence leading to the accept state, but also the relationships among the \tt STACK, the \tt TAPE, and the CFG's leftmost derivation:
At each stage, the leftmost derivation yields the same content as the characters in the stack. This is the essential idea of this PDA: we simulate the whole leftmost derivation using a stack, and at every stage the derived string has the form (\text{Letters read from the tape})(\text{Nonterminals in the }\tt STACK).
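The stack simulation described above can be sketched in a few lines. The grammar here is reconstructed from the fragments quoted in the text (S\rightarrow SB\mid AB, A\rightarrow CC, C\rightarrow a); the production B\rightarrow b is an assumption added only to make the sketch self-contained, so the exact language it recognizes is illustrative.

```python
# Sketch of the PDA's stack simulation of a leftmost derivation. The grammar
# is partially reconstructed from the text; B -> b is an assumed production
# added to make the example complete.
RULES = {
    "S": [["S", "B"], ["A", "B"]],
    "A": [["C", "C"]],
    "B": [["b"]],
    "C": [["a"]],
}

def accepts(word: str) -> bool:
    """The stack always holds the yet-unmatched tail of the leftmost
    derivation; popping a nonterminal pushes one of its right-hand sides."""
    def run(stack, i):
        # prune: in CNF every stack symbol yields at least one letter
        if i + len(stack) > len(word):
            return False
        if not stack:
            return i == len(word)            # tape exhausted, stack empty
        top, rest = stack[0], stack[1:]
        if top.islower():                    # terminal: match it on the tape
            return i < len(word) and word[i] == top and run(rest, i + 1)
        # nonterminal: branch nondeterministically over its productions
        return any(run(rhs + rest, i) for rhs in RULES[top])
    return run(["S"], 0)
```

For example, `accepts("aab")` succeeds via S \Rightarrow AB \Rightarrow CCB \Rightarrow aaB \Rightarrow aab, mirroring exactly the pop/push/read sequence discussed above.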
Consider a CFG, assumed to be in CNF, with nonterminals X_1, X_2, ..., terminals \lbrace a,b\rbrace, and S=X_1 denoting the start symbol:
Let \tt HERE be a new kind of state in the PDA's definition. We introduce this state because we need a way to trace the edges we are currently traveling and those already traveled; a \tt HERE state stands in between two other states, and we refer to a certain \tt HERE state whenever we want to show that we are traveling through an edge. It neither consumes any letter from the tape nor pops or pushes any character on the stack; it acts like a marker. The \tt HERE states are nondeterministic: multiple outgoing edges are allowed.
A PDA is said to be in conversion form, if it satisfies:
Note that the \tt HERE state can be replaced by a \tt READ state.
The input string must be exhausted before the machine accepts any word.
A PDA can be turned into conversion form with its recognizing capability untouched. This will be proved for each of the rules.
we can convert it into
The overall nondeterminism stays as it was; we merely advanced its timing. The branching used to occur at the \tt POP state; now it occurs at \tt READ_1 (or \tt HERE), and we add a new edge and a new \tt POP state to preserve the nondeterminism. Be aware of the modification we made in rule 3: we added an extra \tt POP and several \tt PUSHes whenever a \tt READ or \tt HERE was not followed by exactly one \tt POP. That modification also needs attention and must be adjusted once again by the algorithm we have just proposed
There is an extra point that needs careful treatment: when performing the algorithm for rule 3, the possibility that the stack contains only \text{\textdollar} must be accounted for; we need to branch an additional \tt POP, which pops and then pushes \text{\textdollar}, from the \tt READ state whenever such a circumstance can occur. For example, the PDA of the CFG for a^{2n}b^n^{1}:
needs to be converted into
The reason why we need these eight conditions is that they form a formal description of path segments on the machine: every PDA can be considered a set of quintuples, each of which represents a segment, a part of the machine, and each tuple semantically forms a description like:
The states \tt START, \tt READ, \tt HERE, and \tt ACCEPT are called joints; between any two consecutive joints there is exactly one character popped and any number of characters pushed. Once a machine is in conversion form, we can describe it by describing these path segments and then summarizing them in a summary table, analogous to the transition tables we talked about when introducing finite automata. If these segments are presented pictorially, they form a graph whose nodes fall into two classes: the joints and the non-joints. The joint nodes separate the whole graph into several areas, each containing one or more non-joint nodes, and a joint node along with its non-joint nodes makes up a path segment. The PDA we just mentioned looks like:
The classification is obvious: the part between \tt START and \tt READ_1 is a path segment, as are \tt READ_1 and \tt HERE, \tt READ_1 and \tt READ_1 itself, \tt HERE and \tt READ_2, and \tt READ_2 and \tt ACCEPT. Its corresponding summary table is:
Note that a cell of the \text{PUSH} column pushes its characters from right to left onto the stack: a\text{\textdollar} means push \text{\textdollar} first and a second, so the leftmost symbol represents the topmost character of the stack.
Instead of focusing on every state and every edge, we can now consider every successful path through the PDA as made up of several path segments, which are rows in the summary table. If we want to accept aaaabb, we must follow the rows (R for \text{Row}):
There is no doubt that the summary table represents, in a tabular way, a PDA with the same recognizing capability as the one above; they are indeed the same machine despite seemingly prodigious differences, which means they accept the same language.
Since every successful path through the PDA corresponds to a sequence of rows, we must find a way to distinguish which sequences of rows can make up a path. Firstly, a row must be followed by another row whose FROM equals the former's TO; you cannot end at \tt READ_1 and then start from \tt HERE. Secondly, connectivity between two rows' FROM and TO does not guarantee that they can form part of a path, because there are other context-sensitive constraints such as the \tt STACK: for instance, R_1 cannot be followed by R_3 in spite of the connectivity of their FROM and TO, because R_3 requires popping an a while only a \text{\textdollar} is left in the \tt STACK after R_1. Another concern is that we must balance the pop and push operations: you cannot let four R_3 be followed by five R_5\Rightarrow R_6 combinations, because every R_5\Rightarrow R_6 consumes two a's while every R_3 contributes only one a; the count of a's in the stack would become unbalanced, which would then crash the machine.
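Both failure modes just described, disconnected joints and an unbalanced stack, can be checked mechanically. Below is a minimal sketch; the sample rows are hypothetical stand-ins, not the exact rows of the summary table in the figure.

```python
# Sketch: each row is (frm, to, pop, push); a sequence forms a valid path iff
# consecutive joints connect and the simulated stack never mismatches a pop
# and ends empty. The rows below are hypothetical, for illustration only.
def valid_path(rows):
    stack = ["$"]                             # conversion form starts with $
    for i, (frm, to, pop, push) in enumerate(rows):
        if i > 0 and frm != rows[i - 1][1]:
            return False                      # joint-consistency violated
        if not stack or stack[-1] != pop:
            return False                      # stack-consistency violated
        stack.pop()
        stack.extend(reversed(push))          # leftmost pushed char ends on top
    return not stack                          # accept only with an empty stack

r1 = ("START", "READ1", "$", "a$")
r2 = ("READ1", "READ1", "a", "aa")
r3 = ("READ1", "READ2", "a", "")
r4 = ("READ2", "READ2", "a", "")
r5 = ("READ2", "ACCEPT", "$", "")
```

With these toy rows, `[r1, r2, r3, r4, r5]` is a valid path, `[r1, r5]` fails the joint check, and `[r1, r3, r4, r5]` fails the stack check, mirroring the R_1/R_3 example above.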
To form a valid path, the path segments (the rows) must satisfy two consistencies:
We shall define a special language, called the "row language" of a PDA, whose alphabet comes from the PDA's summary table (where R stands for \text{Row}):
and whose words are all the ordered combinations of rows that can make up a path from \tt START to \tt ACCEPT; every one of its words is both stack-consistent and joint-consistent.
From now on, we are about to determine the CFG for the row language and then translate it into a CFG for the original PDA. Generally, the proof contains the following steps:
The essential idea of this proof is to maintain a top-down consistency on both the stack and the joints. We have shown that each row of the table corresponds to a path segment on the PDA, and every word accepted by the PDA follows a path consisting of some of those path segments, each of which consumes at most one letter from the \tt TAPE. Let us observe how these rows maintain stack-consistency. The whole problem can be reduced to a simple question: given two arbitrary joints p and q in the PDA, can we find a path, which we will call a "route", from p to q at a total cost of popping the topmost element k off the stack? By "total cost" we mean that during the path the stack is never popped below k; it may see multiple other pushes and pops, but on arriving at q the stack should look as if only the topmost element had been popped. Since we require every row to satisfy this requirement, stack-consistency is ensured. We can then connect routes whose start and end joints match: for instance, if one route starts at \tt READ_2 and ends at \tt HERE_2, and another starts at \tt HERE_2 and ends at \tt READ_3, we can let the former be followed by the latter, because the former's end and the latter's start match. By keeping this up all the way, joint-consistency is also ensured.
Now focus on the PDA itself. We know the PDA starts with only a \text{\textdollar} on the stack and ends with an empty stack; this means the path from \tt START to \tt ACCEPT constitutes one big route at a total cost of \text{\textdollar}. This big route can clearly be broken into smaller routes, like one from \tt START to \tt READ_2 at a total cost of a, and these smaller routes can again be broken into even smaller ones; iterating, the "small routes" eventually bottom out in combinations of solid rows. This shows, conversely, that rows form routes, routes form bigger routes, and everything ends up in the biggest route: the one from \tt START to \tt ACCEPT. By now it is not hard to see that these new concepts are analogous to a CFG, where the rows are terminals, the routes are nonterminals, and the biggest route is the start symbol. If a route can be broken into smaller routes, we introduce a production rule that substitutes the original route by those smaller routes; repeat this process top-down until all the nonterminals have been broken into combinations of terminals, which are the rows. We have mentioned that every row consumes at most a single letter from the \tt TAPE; we can treat this as a terminal production, with the row itself on the left side and the letter it consumes on the right side acting as a terminal. If the row reads nothing, or a \Delta, we let \Lambda be the right side of the production. We then contribute these new productions to the CFG above to finish steps 5 and 6 of the proof, and this CFG will be exactly equivalent to the language of the PDA
Since we are going to build a CFG for the row language, it is necessary to introduce its nonterminals, which are the aforementioned "routes". We will use them to maintain stack- and joint-consistency: the contents of the \tt STACK, and the beginning and end positions of each row. The \tt TAPE is an exception that can be ignored, because every row itself corresponds to a part of the input string on the \tt TAPE; if a word can be accepted by the PDA, it can be broken into blocks each of which corresponds to a row, so once we know the rows, we know the words they can construct.
The nonterminals of a row language follow this definition:
where X and Y are joints and Z is any character from \Gamma. This function-like definition stands for:
There exists a path from X to Y, passing through zero or more other joints, whose net effect on the \tt STACK is removing Z. By "net effect" we mean that the stack has Z as its topmost element when passing X and retains all its other elements when arriving at Y, except for Z, which has been popped somewhere during the travel. There may be multiple pops and pushes along the way, but in the end the stack looks as if a single Z had been popped with all other elements preserved (of course the second topmost element becomes the new topmost element of the stack), and the elements below Z are never popped. For example, if a stack initially contains babab, we push and pop an a and a b, and then finally pop a b, the stack now contains abab; we say that this operation sequence has the net effect of removing b, because although the stack did push some b's and a's, they were all popped out eventually, and after the last pop the current stack abab looks exactly like the initial babab without its topmost b. It is analogous to "work" in physics: no matter how many ups and downs you have taken, the total work counts only the height deviation. And note that no character below Z may be popped during the process, even if it would be pushed back again: a single \tt POP\ \rm Z has the net effect, but \tt POP\ \rm Z\Rightarrow \tt POP\ \rm a\Rightarrow\tt PUSH\ \rm a does not, because although the stack is balanced at the end and looks as if only Z had been popped, it presumes an a below Z and pops it during the process, which is never allowed to happen: NO ELEMENT BELOW Z CAN BE POPPED AT ANY TIME
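The "net effect" condition can be stated as a small predicate. Here is a minimal sketch, exercised on the two operation sequences from the paragraph above:

```python
# Sketch: does a sequence of stack operations have the net effect of
# removing the topmost element Z, never popping below it?
def has_net_effect(ops, stack):
    """ops: list of ("PUSH", c) or ("POP", c); stack: topmost element first."""
    below = list(stack[1:])           # everything underneath the topmost Z
    s = list(stack)
    for kind, c in ops:
        if kind == "PUSH":
            s.insert(0, c)
        else:
            if not s or s[0] != c:
                return False          # popped a character that is not on top
            if len(s) <= len(below):
                return False          # would pop an element below Z: forbidden
            s.pop(0)
    return s == below                 # exactly Z removed, everything else kept

# babab -> abab via a push/pop of a and b, then a final pop of b: net effect
ok = [("PUSH", "a"), ("POP", "a"), ("PUSH", "b"), ("POP", "b"), ("POP", "b")]
# POP Z, POP a, PUSH a: balanced at the end, but it pops below Z: no net effect
bad = [("POP", "Z"), ("POP", "a"), ("PUSH", "a")]
```

`has_net_effect(ok, "babab")` holds, while `has_net_effect(bad, "Za")` is rejected at the second pop, exactly as the rule in capital letters demands.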
Row_4 of the summary table above is a good example of this; it corresponds to the nonterminal:
A row in an arbitrary PDA like
has no net effect, because it pushes more than it pops: after this row is executed, the stack holds two more characters than before, while one of the net effect's requirements is that the stack be exactly one character shorter afterwards. However, although R_{11} cannot act as a net-effective row alone, it can play a part together with other net effects. For example, if there are three other valid nonterminals
These three nonterminals form a valid path because their FROMs and TOs are connected; we can then put R_{11} to use:
This would be a path segment: we start at \tt READ_9, pop a b and push abb; the abb is then consumed while traveling from \tt READ_3 to \tt READ_7, \tt READ_7 to \tt READ_1, and \tt READ_1 to \tt READ_8. On arriving at \tt READ_8 the stack holds nothing more than the initial characters minus one b, while it is ensured that no character below the first b ever gets popped. We can represent this rule using a production^{2}:
This production, in turn, semantically says that we can reach \tt R_8 from \tt R_9 at the cost of a b, and that it can be substituted by the combination of \frak{R_{\text{11}}} and several other \frak{N}s; this will be one of the productions of the CFG of the row language. In the example above, \frak{R_{\text{11}}} is the terminal and the \frak{N}s are the nonterminals. There are three rules that produce all these productions:
which means a complete path from \tt START to \tt ACCEPT at the cost of popping only the single character \text{\textdollar}, never popping anything below the \text{\textdollar}; this production is universal to all PDAs.
we have:
indicates that we can arrive at Y from X at the cost of popping Z. This production involves an implicit hypothesis that the PDA contains at least one row of this kind, because such a row is the only kind that decreases the size of the stack when executed; if none of the rows had this form, the stack could never shrink down to empty, which implies that the machine would accept nothing.
For every row in the table with valid pushes:
let \Theta be the set of all sequences of \tt R, \tt H, or \tt A states (each element may be any one of these three; no \tt START state) with \forall_{\theta\in\Theta}(|\theta|=n); we introduce a set of productions
One important thing here is that none of the \thetas can be \tt S, because the start state permits no incoming edges in conversion form, and the \thetas before \theta_n also cannot be \tt A, because there is only one accept state and it must be the last. This rule generates a gigantic number of productions; some of them will be useful in generating words and some will not, meaning a considerable part of the set is useless: in fact, only those nonterminals that can eventually be substituted into solid terminals are useful. The set contains all the useful productions but also preserves all the useless ones, and so far we cannot distinguish them effectively.
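Rule 3's generation scheme, including the restrictions on \tt S and premature \tt A just mentioned, can be sketched as follows; the joint set and the row data here are hypothetical placeholders, not taken from the figures.

```python
from itertools import product

# Sketch of rule 3: for a row FROM=frm, TO=to, POP=pop that pushes the
# characters `pushes` (topmost first), emit a production
#   N(frm, t_n, pop) -> row N(to, t_1, push_1) N(t_1, t_2, push_2) ...
# for every choice of intermediate joints t_1..t_n. All names are hypothetical.
def rule3_productions(row, frm, to, pop, pushes, joints):
    prods = []
    for thetas in product(joints, repeat=len(pushes)):
        if "ACCEPT" in thetas[:-1]:
            continue                  # only the last joint may be ACCEPT
        lhs = ("N", frm, thetas[-1], pop)
        rhs, prev = [row], to
        for theta, ch in zip(thetas, pushes):
            rhs.append(("N", prev, theta, ch))
            prev = theta
        prods.append((lhs, rhs))
    return prods

joints = ["READ1", "READ2", "HERE", "ACCEPT"]   # no START: no incoming edges
prods = rule3_productions("R2", "READ1", "READ1", "a", ["a", "a"], joints)
# 3 choices for t_1 (ACCEPT excluded) x 4 for t_2 = 12 productions, one row only
```

Even this tiny four-joint machine yields 12 productions for a single row, which illustrates the "gigantic amount" the text warns about.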
We are going to build a whole CFG, conforming to the rules we have just mentioned, for the PDA that accepts the language a^{2n}b^n from figure.1; its summary table is listed above in figure.2.^{3}
Starting from rule 1, it gives us the production:
Rule 2 suits \frak{R_{\text{4}}}, \frak{R_{\text{5}}}, \frak{R_{\text{6}}}, \frak{R_{\text{7}}}, each of which creates a terminal production:
Rule 3 can be applied to \frak{R_\text{1}}, \frak{R_\text{2}}, \frak{R_\text{3}}:
Rule 3 can be applied to \frak{R_{\text{2}}}; before the actual generation, we write it in a "to-be-substituted" form:
(As we stated before, X cannot be \tt S, and Y can be neither \tt S nor \tt A.) Substituting X and Y with every possible value, we get the following productions:
Similarly, we have productions generated by applying rule 3 on \frak{R_{\text{3}}}:
The full production list contains 33 productions, but some of them are useless. For instance, P_{23} contains \frak{N}(\tt R_1, R_2, a) as part of its right side, but no other production has \frak{N}(\tt R_1, R_2, a) on its left side, which means that part of P_{23} can never be substituted; thus P_{23} will never generate a valid word of the language. The same holds for P_{24}. As for P_{22}, the more times we apply it to other productions, the more \frak{N}(\tt R_1, R_1, \rm a) are generated; applying P_{22} can never end in a terminal, so P_{22} is a useless production.
If we keep digging like this, we will soon find a set of useless productions and can remove them from the complete production set; we are free to do so, but this step is unnecessary because the redundant productions do not affect word generation: they are useless but harmless, just like an unreachable node in a DFA. Our purpose is to prove the theorem at the very beginning of this article, so we need not spend a lot of time pruning the syntax tree to eliminate every possible redundant production; what we are going to do next is less complicated.
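For completeness, the pruning alluded to above is the standard "remove useless productions" fixpoint: a nonterminal is useful only if it can eventually rewrite into pure terminals. A minimal sketch on a hypothetical grammar fragment (not the 33 productions of the example):

```python
# Sketch of the pruning step: find which nonterminals can ever derive a
# string of terminals, by a fixpoint over the production set.
def generating(prods, terminals):
    """prods: list of (lhs, rhs) pairs, rhs a list of symbols."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            if lhs not in gen and all(s in terminals or s in gen for s in rhs):
                gen.add(lhs)          # lhs rewrites into terminals: useful
                changed = True
    return gen

prods = [
    ("N1", ["R4"]),             # terminal production: N1 is generating
    ("N2", ["R2", "N1"]),       # reaches terminals through N1: generating
    ("N3", ["R2", "N3"]),       # self-recursive with no way out, like P_22
    ("N4", ["R1", "N5"]),       # N5 has no production at all, like P_23
]
gen = generating(prods, {"R1", "R2", "R4"})
```

Here `gen` comes out as `{"N1", "N2"}`: N3 and N4 are exactly the useless-but-harmless cases discussed above.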
We have shown how to build a CFG for a row language; now we need to show its equivalence, that is, each word generated by the row language corresponds to a path of the PDA. Firstly, we need to prove that this CFG generates all the words of the row language. This part is quite simple because the row language requires joint- and stack-consistency, and by the definition of \frak{N} it is not hard to see that our CFG fits both rules: 1. all of the \frak{N}s in a valid production are connected through each other's FROM and TO; 2. the CFG maintains stack-consistency through the net effect of the path segments: it never allows popping non-topmost characters, each path segment has a net effect on exactly one character, the stack size decreases one at a time along with the substitution of the nonterminals, and finally the resulting terminals (\frak{R}s) simulate a path on the PDA. Secondly, the PDA accepts exactly the words of the row language, because every word of the row language has its own stack actions (its \tt PUSH and \tt POP sequences), and each of them can be broken into smaller stack actions; for example, the journey from the initial stack holding \text{\textdollar} to the final empty stack can be split into some pushes of a's and b's and then some pops of a's and b's. This can be simulated by substituting an \frak{N} by several other \frak{N}s or \frak{R}s, each \frak{N} representing a sequence of stack actions; they keep decomposing until there is nothing left to decompose, i.e., the terminals \frak{R}, and this sequence of terminals is the path through the PDA.
Now we come to the last step: turning the row language into the particular CFG that the PDA accepts. This is not a hard job: we just contribute a new rule to the three rules defined above, namely, for every row \frak{R_{\text{i}}} in the language, create a production:
For instance, for \frak{R_{\text{3}}} in figure.2, we add the production:
even if the letter read is \Lambda or \Delta. This makes the \frak{R}s, formerly terminals, become nonterminals, and introduces new terminals which are exactly the elements of the alphabet of the PDA. If we derive an \frak{R} sequence from the aforementioned not-yet-completed CFG:
we can then derive a real string accepted by the PDA in figure.1 by applying those newly added productions:
Treating the \Delta like \Lambda, we get the final word aab, and this word can be accepted by the PDA by following the path segments
There might be other paths that also lead aab to acceptance, but now we know there exists at least one such path for every word, including aab. The reason we trust that this works is that we already know every row reads at most one letter, by its definition; the conversion form limits its greediness, so we can use this rule to create the new productions that derive the real letters of the language.
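This last translation, from a row sequence to a word, is just a lookup: every row reads at most one letter, and a \Delta read (a \Lambda production) contributes nothing. A sketch with a hypothetical row-to-letter table, chosen so that the sample sequence spells aab:

```python
# Sketch: map each row to the single letter it reads (None for a Delta /
# Lambda read). The table is hypothetical, not copied from figure.2.
ROW_LETTER = {"R1": None, "R2": "a", "R3": "a", "R4": "b", "R5": None, "R6": None}

def word_of(rows):
    """Concatenate the letters read along a row sequence, skipping blanks."""
    return "".join(ROW_LETTER[r] for r in rows if ROW_LETTER[r] is not None)
```

For example, `word_of(["R1", "R2", "R3", "R4", "R6"])` yields `"aab"`, the word traced through the PDA above.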
Now we have shown the complete algorithm
The proof is therefore ended. Q.E.D.
Rule 8 of the conversion form requires that the input string be exhausted from the \tt TAPE before any word is accepted. Since a PDA can otherwise accept a string without actually reading all the letters on the \tt TAPE, this rule ensures that everything the CFG generates can be accepted by the PDA: the essential idea of the proof is to use an algebraic device, i.e., a CFG, to simulate a path from the \tt START state to the \tt ACCEPT state on the PDA, and if a string could be accepted without leaving an empty \tt TAPE, the CFG could only generate the part that travels through the PDA; the letters still left on the \tt TAPE would be dropped, which would damage the CFG's completeness.