There is a PDA that accepts the same language for every CFG, and a CFG generates the same language for every PDA

PDA for a certain CFG

  This part of the proof will start with an example, we propose a CFG in CNF, and raise a corresponding PDA, run some strings on this PDA, observe what happens on each stage: the characters left on the \tt STACK, the letters left on the \tt TAPE, the current state of the PDA, etc. Since we've shown how to translates a CFG into a CFG in CNF, there would be no problem assuming all the CFGs we're proposing is in CNF
  For CFG:

S&\rightarrow SB\\
S&\rightarrow AB\\
A&\rightarrow CC\\
B&\rightarrow b\\
C&\rightarrow a

we propose PDA:

instructs a \tt PUSH operation on S indicates that we're to start running the string, then we're going to simulate several things: 1. the substitution from nonterminal to nonterminals. 2: the substitution from nonterminal to terminal. When we pop a nonterminal out, we can decide to push another two nonterminals in according to the production rules, e.g., There're two choices of the combination of nonterminals if we popped a S out of the \tt STACK: SB or AB, because these are all the capabilities of S according to the productions, if we choose to push AB, we can pop out A and push in CC in the next step to simulate another substitution: A\rightarrow CC, now what's left in the \tt STACK is actually three nonterminals: CCB where all of them are actually nonterminals with only terminal productions, the second option takes place in this step: pop out a C and read an a from the tape to simulates the production C\rightarrow a.
  It is not hard to obvious that we're simulating a leftmost derivation on the PDA of the CFG, each time we use the stack to perform a substitution comforming to a certain production rule on it's topmost element which acts as a nonterminal in the CFG, and the topmost element always being the leftmost nonterminal in a production. We can track the path to \tt ACCPET by constructing a state transition table, this table shows not only the state sequence to the accept state, but also the relationships between the \tt STACK, \tt TAPE and the CFG's leftmost derivation:

At each stage, the leftmost derivation yields the same content as the characters in stack, this is the essential idea of this PDA, we simulate the whole leftmost derivation by using a stack, the derived string always satisfies such a form like (\text{Letters read from the tape})(\text{Nonterminals from the }\tt STACK) at every stage.

Propose an algorithm

  Consider a CFG and assume it is in CNF with nonterminals X_1, X_2, ... and terminals \lbrace a,b\rbrace and S=X_1 denoting start symbol:

X_1&\rightarrow X_2X_3\\
X_1&\rightarrow X_1X_4 \\
X_2&\rightarrow X_5X_6 \\
X_4&\rightarrow a\\
X_5&\rightarrow b\\
X_6&\rightarrow b
  1. For the start symbol, add a start state followed by a \tt PUSH and a \tt POP state where the push state will pushes the start symbol onto the stack:
  2. For every nonterminal production X_q=X_mX_n, draw an outgoing edge from the \tt POP state which will lead the way back to \tt POP state itself before going through to two successive push states where the first one pushes X_n onto the stack and the later does the same thing for X_m:
  3. For every terminal production X_i=b(or X_i=a), draw an outgoing edge connects the \tt POP state and a \tt READ state in which has an outgoing edge labeled with a letter identical to the terminal at the right side of the prodcution:
  4. Draw a path to the \tt ACCEPT state by popping and reading a \Delta successively:
  5. For languages with \Lambda, draw an edge from the \tt POP state and back to it self if it pops the start symbol(The only possibility of \Lambda\in L(G) is that the start symbol yields a \Lambda, because otherwise we've been proved that the \Lambda-Productions can be eliminated), by which means that we simulates an \Lambda production.

CFG for a certain PDA

  Let \tt HERE becomes a new kind of state in the PDA's definition, we introduce this state because we need a way to trace the edges we're traveling now, and those who have been traveled through before, the \tt HERE states are going to be a middle state stands in between two other states, we referring to a certain \tt HERE state each time we want to show that we're traveling through an edge, neither it consumes any letter from the tape nor pop or pushes any character to the stack, it acts like a marker. The \tt HERE states are nonterministic, it allows multiple outgoing edges.

Conversion form of the PDA

A PDA is said to be in conversion form, if it satisfies:

  1. There is only one \tt ACCEPT state
  2. There're no \tt REJECT states
  3. Every \tt READ or \tt HERE is immediately followed by a \tt POP
  4. Every two \tt POPs must be separated by \tt READ or \tt HERE states
  5. All branching only occurs at \tt READ or \tt HERE states, every edge has only one label
  6. The \tt STACK contains a \text{\textdollar} denoting "bottommost element" by default, by which means that element is inside the \tt STACK before the PDA starts execution, if it gets popped out during the processing, it must be immediately replaced by another element, this element is considered an effective limitation, no element beneath it will be popped out. This character will be left out at the last stage of the execution(which means it will be popped out at last).
  7. The PDA must start starts with:

    Note that the \tt HERE state can be replaced by a \tt READ state.

  8. The input string must be exhausted before the machine accepts any word.


  A PDA can be turned into a conversion form with its recognizing capability untouched, This gonna be proved for each of the rules.

  1. It can be easily achieved by leaving only one \tt ACCEPT state and remove all others, then connect the last \tt ACCEPT to all the states that used to be connected to former \tt ACCEPT states
  2. Remove all the \tt REJECT state and their relevant edges, the \tt REJECT states are not required in a PDA because without these states, the PDAs will crash on illegal inputs. It won't accept any formerly non-acceptable languages, and does not lose any currently acceptable languages.
  3. Leave the \tt READ or \tt HERE alone if it has already been followed by a \tt POP state, otherwise we add a \tt POP state just behind that state, then push whatever the \tt POP operation pops out back into the \tt STACK, now we're doing something like put a character back in where it belongs immediately after we grab it out, this won't interfere the original acceptable language set
  4. Add a \tt HERE state between any two successive \tt POP states.
  5. Since all \tt POPs are separated by \tt READ and \tt HERE, it is sufficient to conclude that there must be a \tt READ or \tt HERE state right before the targeting \tt POP state, if we want to de-branching that \tt POP state, we need to transfer its nondeterminism to the \tt READ or \tt HERE state before, if we have a such transition:

    we can convert it into

    the overall nondeterminism stays as it was, we just advanced its timing, the branching used to occur at \tt POP state, now it occurs on \tt READ_1(or \tt HERE), and we add a new edge and a new \tt POP state to preserve the nondeterminism, be aware that the modification we've made in rule 3, we added an extra \tt POP and several \tt PUSHs if the \tt READ or \tt HERE is not followed by exactly one \tt POP, it also needs to be attentioned and being modified once again by the algorithm we've just proposed

  6. We're going to presume that the \tt STACK contains at least one element \text{\textdollar} before the PDA starts.
  7. None of the PDAs' recognizing capability will be affected just by installing this part if those PDAs are already meet rule 6.
  8. We've proved that every effective PDA has a correspondence with the same recognizing capability but empty both the \tt STACK and \tt TAPE before the string is being formally accepted, read the previous chapter if you desire more details about the proof

  There's an extra point that needs to be treated carefully, when we performing the algorithm for rule 3, the possibility of which the stack might only contain \text{\textdollar} must be accounted, we need to branching an additional \tt POP which pop and then pushes \text{\textdollar} from the \tt READ state once such circumstance occurs. For example, the PDA of CFG a^{2n}b^n1:

needs to by converted into


The necessity of conversion form

  The reason why we need these eight conditions is that it forms a formal description of a path segment on the machine, every PDA, can be considered as a set of a quintuple each of which represents a segment, a part of the machine, and every tuple semantically forms a description like:

  1. START from one of the three states \tt START/READ/HERE
  2. GOTO one of the three states \tt READ/HERE/ACCEPT
  3. READ any amount(including zero) of letters from the \tt TAPE
  4. POP out the topmost character on the \tt STACK
  5. PUSH something(maybe nothing) onto the \tt STACK

the states \tt START, \tt READ, \tt HERE, and \tt ACCEPT are called joints, there will be exactly one character being popped and any number of characters being pushed between any to consecutive joints. And once a machine is in conversion form, we can describe it by describing these path segments and then summary them in a summary table, it's analogous to the transition tables we've talked about before when introducing finite automata; if these segments are being presented pictorially, they will form a graph where the nodes can be classified into two classes, the first classes are the joints, and others are non-joints, where those joint nodes separated the whole graph into several different areas each of which contains multiple(or only one) non-joint nodes, the joint node along with its non-joint nodes makes up a path segment, the PDA we've just mentioned above can be like:

the classification is obvious, the part between \tt START and \tt READ_1 is a path segment, so do \tt READ_1 and \tt HERE, \tt READ_1 and \tt READ_1 itself, \tt HERE and \tt READ_2, \tt READ_2 and \tt ACCEPT. its corresponding summary table is:

figure 2

be noticed that the cells of \text{PUSH} column means push the characters from right to left onto the stack, a\text{\textdollar} means push \text{\textdollar} first then push a secondly, the leftmost symbol represents the topmost character of the stack.
  Instead of focusing on every states and every edges, now we can consider that every successful path through the PDA is made up by several path segments, which are those rows in the summary table, if we want to accept aaaabb, we must follow the rows(R for \text{Row}):

R_1\Rightarrow R_2\Rightarrow R_3\Rightarrow R_3\Rightarrow R_3\Rightarrow R_4\Rightarrow R_5\Rightarrow R_6\Rightarrow R_5\Rightarrow R_7

It is no doubt that the summary table represents a PDA with the same recognizing capability as the one we've mentioned above in a pictorial way, they are indeed the same machine despite seemingly prodigious differences, which means they accept the same set of languages.
  Since every successful path through the PDA corresponds to a set of rows, we must find a way to distinguish which permutations of those rows can be make up a path, firstly, a row must be followed by another row with the same FROM as the former's TO, you cannot end with \tt READ_1 and then start with \tt HERE; secondly, the connectivity between two rows' FROM and TO does not guarantee that these two rows can form a part of the path, because there're some other contextually sensitive variables, such as \tt STACK, for instance, the R_1 cannot be followed by an R_3 in spite of the connectivity between their FROM and TO, because R_3 requires to pop an a while there is only a \text{\textdollar} left in the \tt STACK after R_1. Another thing needs to be concern is we must balance the pop and push operations, you cannot let four R_3 followed by five R_5\Rightarrow R_6 combination because of every R_5\Rightarrow R_6 consumes two a and every R_3 contributes only one a, the count of a in the stack would be unbalanced which will then causing a crash on the machine.
  To form a valid path, the path segments(the rows) must meet two consistencies:

  1. Joint-consistency, which requires that in a combination of two rows, the later one's FROM must be the same as the former one's TO
  2. Stack-consistency, the \tt STACK must have a topmost element identical to the one that \tt POP instructs, by which means when popping a character, there must be such character at the top of the \tt STACK

We shall define a special language called "row language" of a PDA, its alphabet comes from the PDA's summary table(where R for \text{Row}):

\Sigma=\lbrace R_1, R_2, R_3, ... \rbrace

and its words are all the ordered combinations of several rows that can make up a path from \tt START to \tt ACCEPT, all of its words are both stack-consistent and joint-consistent.

The essential idea of this proof

  From now on, we're about to determine the CFG for the row language, and then translate it into a CFG for the original PDA, generally, this proof contains the following steps:

  1. Given a particular PDA
  2. Convert it into a conversion form
  3. Summarize its summary table
  4. Constructs the row langauge corresponding to the summary table
  5. Build a CFG for the row language, which means a CFG that generates exactly the words in the row language.
  6. Convert the CFG we built in the last step to a CFG corresponding to that particular PDA at the very beginning

  The essential idea of this proof is maintain a top-down consistency on both stack and joints, we've showed that each row of the table corresponds to a path on the summary table which represents a path segments on the PDA, and every word accepted by the PDA will follows a path that consist of some of those path segments, and on each path segments we consume at most one letter from the \tt TAPE, let's make an observation about how does these rows maintain the stack-consistency: the whole problem can be reduced into a simple question, consider two arbitrary joints p and q in the PDA, can we find a path which we will call it "route" from p to q, while at a total cost of popping out the topmost element k on the stack? where by "total cost", means that during the path, the stack is never be popped out below k, and it could have multiple other pushes and pops, but when arriving q, the stack should look like it just popping out the topmost element, since we require every row satisfies this requirement, the stack-consistency is ensured. We can connect each of these routes that have matching start joint and end joint, for instance, if a route starts in \tt READ_2 and ends in \tt HERE_2, another route starts in \tt HERE_2 and \tt READ_3, then we can let the former route followed by the later route, because the former one's end and later ont's start are matching, by keeping this all the way, the joint-consistency is also ensured. Now focusing on the PDA itself, we know that the PDA is start with only a \text{\textdollar} on the stack, and end with an empty stack, this means that the path from \tt START to \tt ACCEPT constructs a big route at a total cost of \text{\textdollar}, and this big route, clearly, can be broken into other small routes, like from \tt START to \tt READ_2 at a total cost of a, and these smaller routes can be once again broken into even smaller routes, along with this iteration, the "small routes" will end into a combination of solid rows, this on the other hand, shows that rows can form routes, and routes can form bigger routes, eventually end up in the biggest routes: the one from \tt START to \tt ACCEPT. Until now, it's not hard to find out that these new concepts are analogous to a CFG, where the rows are terminals, the routes are nonterminals, and the biggest route, is the start symbol, if a route can be broken into smaller routes, we involve a production rule that substitutes the original route into those smaller routes, repeat this process topdown until we finally reached a circumstance that all the nonterminals has been broken into a combination of terminals, which are those rows. We've mentioned that every row consumes at most a single letter from the \tt TAPE, we can consider this as a terminal production, where the row itself is at the left side of the production, and the letter it consumes at the right side acts as a terminal, if the row read nothing, or a \Delta, we then let the \Lambda becomes the right side of the production, then we contribute these new productions to the CFG above to finish the step 5 and step 6 of the proof, and this CFG will exactly equivalent to the langauges of the PDA

Since we're going to build a CFG for the row language, it is necessary to introduce its nonterminals, which is the aforementioned "routes"; we're going to use them to maintain the stack and joint consistency, the contents of the \tt STACK, and the beginning and end positions of each row, the \tt TAPE is an exception that can be ignored because every row itself corresponds to a part of the input string on the \tt TAPE, if a word can be accepted by the PDA, it must can be broken into blocks each of which corresponds to a row, we know the rows, we know the words they can construct.
  The nonterminals of a row language follows the following definition:


where X and Y are joint, Z is any character from \Gamma, this function-like definition stand for:

There exists a path from X to Y passing through zero or more other joints, which has the net effect on the \tt STACK of removing Z, and by "net effect" means stack has Z as its topmost element when passing X, and retains all its other elements when arriving Y except for Z which has been popped out somewhere during the travel, there might be multiple pop and push operations along the way, but in the end, the stack would be like popping out a Z with all other elements preserved(of course the second topmost element will be new topmost element of the stack) and the elements below Z should never be popped out, for example, if a stack contains babab initially, then push-and-pop the a and b two times, respectively, then pop out a b at last so that the stack would contains abab now, we say that that operation has the net effect on the stack of removing b, because although the stack indeed pushes two b and two a, they are all popped out eventually, then by popping out a b again we say that the net effect appears because the current stack abab looks exactly the initial one babab without the topmost b. It's analogous to the "work" in the physics, no matter how many ups and downs you've taken, the total work still only count on the height deviation. And be noticed that no characters below Z can be popped out during the process, even if it will be pushed in again, for example, a single \tt POP\ \rm Z has the net effect, but \tt POP\ \rm Z\Rightarrow \tt POP\ \rm a\Rightarrow\tt PUSH\ \rm a does not have the net effect because although the stack is balanced at last that looks like only Z is popped out, it still presumes that the a is below Z and pops it out during the process which is never allow to happen, NO ELEMENTS BELOW Z CAN BE POPPED OUT AT ANYTIME

The Row_4 of the summary table above is a good example on this, it shows a nonterminal:

\text{Net(}\tt READ_1, HERE, \rm a\text{)}

a row in an arbitrary PDA like

has no net effect, because it pushes more than pops, the things left on the stack after this row being executed will be two characters more than before, and one of the net effect's requirements is that the stack should be one character less than before after the process, however, although R_{11} cannot acts as a net effective row alone, it can play a part with other net effects together, for example, if there are other three valid nonterminals

\text{Net(}\tt READ_3, READ_7, \rm a\text{)}
\text{Net(}\tt READ_7, READ_1, \rm b\text{)}
\text{Net(}\tt READ_1, READ_8, \rm b\text{)}

these three nonterminals form a valid path, because their FROMs and TOs are connected, we can then put the R_{11} in use:

\text{Net(}\tt READ_9, READ_8, \rm b\text{)}

this would be a path segment because we can start with \tt READ_9, pops out an b and pushes in abb, then the abb will be consumed when traveling through \tt READ_3 to \tt READ_7, \tt READ_7 to \tt READ_1 and \tt READ_1 to \tt READ_8, at last there would be nothing more than the initial characters without a b in the stack when arriving \tt READ_8, while ensures that no characters below the first b is ever get popped out. We can represent this rule by using a production2:

\frak{N}(\mathtt{R_9}, \mathtt{R_8}, b)
\rightarrow \frak{R}_{11}\frak{N}(\mathtt{R_3}, \mathtt{R_7}, a)\frak{N}(\mathtt{R_7}, \mathtt{R_1}, b)\frak{N}(\mathtt{R_1}, \mathtt{R_8}, b)

this production, on the other hand, semantically says that we can reach \tt R_8 from \tt R_9 at the cost of a b, and it can be substituted by the combination of \frak{R_{\text{11}}} and several other \frak{N}s, and this will be one of the productions of the CFG of the row langauge, in the example above, the \frak{R_{\text{11}}} is the terminal, and \frak{N} are the nonterminals. There're three rules to produce all these productions:

  1. Let S denotes the start symbol of the whole CFG, we have
    S\rightarrow\frak{N}(\tt S, A, \text{\textdollar})

    which means a complete path from \tt START to \tt ACCEPT, as a cost of popping out only a single character \text{\textdollar}, and never pop out any thing below the \text{\textdollar}, this production is universal to all PDAs.

  2. For every row in the summary table that pushes nothing (where by "nothing" means the push cell is filled with nil):

    we have:

    \frak{N}(X,Y,Z)\rightarrow \frak{R}_i

    indicates that we can arrive Y from X at a cost of popping Z, this production involves a hypothetic that the PDA must contains at least one row like this, because this row holds a truth that it decreases the size of the stack when being executed, if none of the rows have a such form then the stack will never decrease its size down to empty which implies that the machine can accepts nothing.

  3. For every row in the table with valid pushes:

    let \Theta be all sets of \tt R, \tt H, or \tt A(which means every element might be one of these three)(no \tt START state) where \forall_{\theta\in\Theta}(|\theta|=n), we introduce a set of productions

    \mathcal{P}=\lbrace \theta\in\Theta|\frak{N}(X,\theta_n,Z)\rightarrow\frak{R}_i\frak{N}(Y,\theta_i,m_i)...\frak{N}(\theta_{n-1},\theta_n,m_n) \rbrace

    One thing important here is that all the \thetas cannot be \tt S because the start state does not permit incoming edges in the conversion form, and those \thetas before \theta_n also cannot be \tt A because there is only one accept state and that one is the last one. This will generate a gigantic amount of productions, some of them will be useful in generating words and some of them won't, which means there will be a considerable part of rules in the set are useless, in fact, only those nonterminals that can be substituted into solid terminals are useful. This set contain all the useful productions, but also preserve all the useless productions, and so far we cannot distinguish them effectively.

Pruning of the syntax tree of row language

  We're going to build a whole CFG conforming to the rules we've just mentioned with the PDA that accepts language language a^{2n}b^n as we've mentioned in figure.1, its summary table is also listed above in figure.2.3
  Start from the rule 1, it gives us the production:

P_1\ \ \ \ S\rightarrow \frak{N}(\tt S, A, \text{\textdollar})

  The rule 2 suits \frak{R_{\text{4}}}, \frak{R_{\text{5}}}, \frak{R_{\text{6}}}, \frak{R_{\text{7}}}, each of which creates a terminal production:

&P_2\ \ \ \ \frak{N}(\tt R_1, H, \rm a)&&\rightarrow&&\frak{R}_4 \\
&P_3\ \ \ \ \frak{N}(\tt H, R_2, \rm a)&&\rightarrow&&\frak{R}_5 \\
&P_4\ \ \ \ \frak{N}(\tt R_2, H, \rm b)&&\rightarrow&&\frak{R}_6 \\
&P_5\ \ \ \ \frak{N}(\tt R_2, A, \text{\textdollar})&&\rightarrow&&\frak{R}_7

  Rule 3 can be applied to \frak{R_\text{1}}, \frak{R_\text{2}}, \frak{R_\text{3}}:

&P_6\ \ \ \ \frak{N}(\tt S, R_1, \text{\textdollar})&&\rightarrow&&\frak{R}_1\frak{N}(\tt R_1, R_1, \text{\textdollar}) \\
&P_7\ \ \ \ \frak{N}(\tt S, R_2, \text{\textdollar})&&\rightarrow&&\frak{R}_1\frak{N}(\tt R_1, R_2, \text{\textdollar}) \\
&P_8\ \ \ \ \frak{N}(\tt S, H, \text{\textdollar})&&\rightarrow&&\frak{R}_1\frak{N}(\tt R_1, H, \text{\textdollar}) \\
&P_9\ \ \ \ \frak{N}(\tt S, A, \text{\textdollar})&&\rightarrow&&\frak{R}_1\frak{N}(\tt R_1, A, \text{\textdollar}) \\

  Rule 3 can be applied to \frak{R_{\text{2}}}, before the actually generation, we write it into a "to-be-substituted" form:


(As we've stated before, the X cannot be \tt S, Y cannot be \tt S and \tt A), substitute the X and Y with every possible value, we can get the following productions:

&P_{10}\ \ \ \ \frak{N}(\tt R_1, R_1, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, R_1, \rm a)
\frak{N}(\tt R_1, R_1, \text{\textdollar})& \\
&P_{11}\ \ \ \ \frak{N}(\tt R_1, R_1, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, R_2, \rm a)
\frak{N}(\tt R_2, R_1, \text{\textdollar})& \\
&P_{12}\ \ \ \ \frak{N}(\tt R_1, R_1, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, H, \rm a)
\frak{N}(\tt H, R_1, \text{\textdollar})& \\
&P_{13}\ \ \ \ \frak{N}(\tt R_1, R_2, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, R_1, \rm a)
\frak{N}(\tt R_1, R_2, \text{\textdollar})& \\
&P_{14}\ \ \ \ \frak{N}(\tt R_1, R_2, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, R_2, \rm a)
\frak{N}(\tt R_2, R_2, \text{\textdollar})& \\
&P_{15}\ \ \ \ \frak{N}(\tt R_1, R_2, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, H, \rm a)
\frak{N}(\tt H, R_2, \text{\textdollar})& \\
&P_{16}\ \ \ \ \frak{N}(\tt R_1, H, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, R_1, \rm a)
\frak{N}(\tt R_1, H, \text{\textdollar})& \\
&P_{17}\ \ \ \ \frak{N}(\tt R_1, H, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, R_2, \rm a)
\frak{N}(\tt R_2, H, \text{\textdollar})& \\
&P_{18}\ \ \ \ \frak{N}(\tt R_1, H, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, H, \rm a)
\frak{N}(\tt H, H, \text{\textdollar})& \\
&P_{19}\ \ \ \ \frak{N}(\tt R_1, A, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, R_1, \rm a)
\frak{N}(\tt R_1, A, \text{\textdollar})& \\
&P_{21}\ \ \ \ \frak{N}(\tt R_1, A, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, R_2, \rm a)
\frak{N}(\tt R_2, A, \text{\textdollar})& \\
&P_{21}\ \ \ \ \frak{N}(\tt R_1, A, \text{\textdollar})&&\rightarrow&&\frak{R}_2
\frak{N}(\tt R_1, H, \rm a)
\frak{N}(\tt H, A, \text{\textdollar})& \\

  Similarly, we have productions generated by applying rule 3 on \frak{R_{\text{3}}}:

&P_{22}\ \ \ \ \frak{N}(\tt R_1, R_1, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, R_1, a)
\frak{N}(\tt R_1, R_1, a)\\
&P_{23}\ \ \ \ \frak{N}(\tt R_1, R_1, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, R_2, a)
\frak{N}(\tt R_2, R_1, a)\\
&P_{24}\ \ \ \ \frak{N}(\tt R_1, R_1, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, H, a)
\frak{N}(\tt H, R_1, a)\\
&P_{25}\ \ \ \ \frak{N}(\tt R_1, R_2, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, R_1, a)
\frak{N}(\tt R_1, R_2, a)\\
&P_{26}\ \ \ \ \frak{N}(\tt R_1, R_2, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, R_2, a)
\frak{N}(\tt R_2, R_2, a)\\
&P_{27}\ \ \ \ \frak{N}(\tt R_1, R_2, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, H, a)
\frak{N}(\tt H, R_2, a)\\
&P_{28}\ \ \ \ \frak{N}(\tt R_1, H, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, R_1, a)
\frak{N}(\tt R_1, H, a)\\
&P_{29}\ \ \ \ \frak{N}(\tt R_1, H, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, R_2, a)
\frak{N}(\tt R_2, H, a)\\
&P_{30}\ \ \ \ \frak{N}(\tt R_1, H, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, H, a)
\frak{N}(\tt H, H, a)\\
&P_{31}\ \ \ \ \frak{N}(\tt R_1, A, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, R_1, a)
\frak{N}(\tt R_1, A, a)\\
&P_{32}\ \ \ \ \frak{N}(\tt R_1, A, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, R_2, a)
\frak{N}(\tt R_2, A, a)\\
&P_{33}\ \ \ \ \frak{N}(\tt R_1, A, \rm a)&&\rightarrow&&\frak{R}_3
\frak{N}(\tt R_1, H, a)
\frak{N}(\tt H, A, a)\\

The full production list contains 33 productions, but some of them are useless, for instance, P_{23} contains \frak{N}(\tt R_1, R_2, a) as part of its right side, but there're no other productions that starts with \frak{N}(\tt R_1, R_2, a) at its left side, which means that part of the P_{23} can never be substituted, thus P_{23} will never generates a valid word of the language. And this also holds for P_{24}. As for P_{22}, the more times we apply P_{22} to other productions, the more \frak{N}(\tt R_1, R_1, \rm a) will be generated, applying P_{22} can never ends up in a terminal, so P_{22} is a useless production.
  If we keep digging up like this, we would soon found a set of useless productions and we're able to remove them from the complete production set, and it is free to do so, but this step is unnecessary because those redundant productions won't affect the words' generation, they're useless, but also harmless, just like an unreachable node in a DFA, our purpose is to prove the theorem at the very beginning of this article, so we don't need to spend a lot of time on how to pruning the syntax tree to eliminate every possible redundant production, what we going to do next is less complicated.

Final proof of the theorem

  We've showed how to build a CFG for a row language, now we need to show its equivalency, which is, each word generated by this row language corresponds to a path of the PDA. Firstly, we need to prove that this CFG generates all the words in row language, this part is quite simple because row language requires joint and stack consistency, and by the definition of the \frak{N}, it's not hard to find out that our CFG perfectly fit these two rules: 1. all of the \frak{N} in a valid production must be connected to each others' FROM and TO; 2. This CFG maintain its stack-consistency by the net effect of the path segments, it never allows to pop out the non-topmost characters, and each path segment can have net effect on only one character, the stack size decreases one at a time along with the substitution of the nonterminals, and finally, all the final terminals(\frak{R}s) simulate a path on the PDA. Secondly, the PDA accepts exactly those words in the row language, because every word in the row language have its own stack actions(which means \tt PUSH and \tt POP sequences), and every of them can be broken into a smaller stack actions, for example the initial stack with \text{\textdollar} and the end of the stack with \Delta can be splitted into some push of as, some push of bs, and then some pop of a and some pop of b, this can be simulated by substituting \frak{N} with several other \frak{N} or \frak{R}, each of \frak{N} represents a sequence of stack actions, and they will be keep decomposing until there's nothing left to decompose, which is the terminals \frak{R}, and this sequence of terminals is the path through the PDA.

Full algorithm

  Now we have the last step: turn a row language into a parcular CFG that the PDA accepts, this is not a hard job: we just need to contribute a new rule to those three rules we defined above, that is for every row \frak{R_{\text{i}}} in the language, create a production:

\frak{R}_i\rightarrow\text{the letter read from }\frak{R}_i

for instance, for \frak{R_{\text{3}}} in the figure.2, we add a production:

\frak{R}_3\rightarrow a

even the letter read is \Lambda or \Delta. This makes those \frak{R}, which are former terminals, become nonterminals, and introduce some new terminals which are exactly the elements in the alphabet of the PDA, if we derives a \frak{R} sequence from the aforementioned non-completed CFG:


we can then derives a real string accepted by the PDA in figure.1 by apply those newly added productions:

\Lambda aab\Lambda\Delta

treat the \Delta like \Lambda, we can get the final word aab, and this word can be accepted by the PDA by following the path segments


there might be some different paths that can also lead aab to acceptance, but now we know there exists at least one such path for every word, including aab. The reason why we trust this works is that we've already known that every row can read at most one letter by its definition, the conversion form limited its greediness so that we can use this rule to create new productions who will derive real letters of the language.
  Now we've showed the complete algorithm

  1. Converts the PDA into conversion form
  2. Constructs a summary table
  3. Finds the row language of the summary table
  4. Finds the CFG corresponds to the row language
  5. Converts the CFG of the row language into the actual CFG of the PDA

The proof is therefore ended. Q.E.D.

The necessity of Rule.8 of conversion form

  The rule 8 of the conversion form, which requires that the input string must be exhausted from the \tt TAPE before any words acceptance, since the PDA can accept a string without actually read all the letters on the \tt TAPE, this rule ensures that all the CFG generates can be accepted by PDA, because the essential idea of the proof is to using an algebraic way, i.e., a CFG, to simulates a path from the \tt START state to the \tt ACCEPT state on the PDA, if the string can be accepted without leaving an empty \tt TAPE, then the CFG can onlt generates the part that travels through the PDA, those letters still left on the \tt TAPE will be dropped, which will cause a lacking in the CFG's completeness.

  1. Using TikZ to draw flowchart is as good as torturing me, so I decide to use those figures from the book ↩︎
  2. For simplification, we're going to use the teletypewriter font \tt R stands for \tt READ, \tt H stands for \tt HERE, \tt S stands for \tt START, and \tt A stands for \tt ACCEPT, fraktur font \mathfrak{N} stands for \text{Net}, \frak{R} stands for R which is the abbreviation of \text{Row} ↩︎
  3. This is an example directly from the book, I think it's very enlightening ↩︎