Lecture 13

  1. Bottom-Up Parsing

1.0 - Bottom-Up Parsing

1.1 - Top-Down vs Bottom-Up Parsers

1.1.1 - Top-Down (Recursive Descent Parser) Example

Figure 1: Top-Down Parser Example Steps

1.1.2 - Bottom-Up Parsing (Shift/Reduce Parsing) Example

Figure 2: Bottom-Up Parser Example Steps

1.2 - Bottom-Up Parsing: Shift-Reduce Parsing

1.2.1 - Shift-Reduce Parsing Example

| Parsing Stack | Input | Parsing Action | Notes / Description |
|---|---|---|---|
| `$` | `((a))$` | shift | The shift action takes the first item off the input and pushes it onto the stack. |
| `$(` | `(a))$` | shift | |
| `$((` | `a))$` | shift | |
| `$((a` | `))$` | reduce $A \rightarrow a$ | We use the production $A \rightarrow a$ to reduce $a$ to $A$. Note that a reduction is the inverse of a production (i.e. a production applied in the opposite direction). [1] |
| `$((A` | `))$` | shift | [2] |
| `$((A)` | `)$` | reduce $A \rightarrow (A)$ | Use the production $A \rightarrow (A)$ to reduce $(A)$ to $A$. |
| `$(A` | `)$` | shift | |
| `$(A)` | `$` | reduce $A \rightarrow (A)$ | |
| `$A` | `$` | accept | Since we’re at the end of our input, and the only symbol on the stack is our original start symbol $A$, we can perform the accept action (rather than reduce $S \rightarrow A$). |

[1] At this stage we could instead perform a shift, but then no subsequent sequence of actions would lead to a successful parse.

[2] At this point it may look as though we can perform an accept action, since the symbol on top of the stack, $A$, matches the right-hand side of our newly introduced production $S \rightarrow A$. However, we can’t, as we’re not at the end of our input.
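The table above can be reproduced with a few lines of Python. This is only a minimal sketch: the names `PRODUCTIONS` and `shift_reduce` are made up for illustration, and the rule "reduce whenever a right-hand side is on top of the stack, otherwise shift" is an assumption that happens to work for this small grammar. It is not a general scheme for deciding when to shift or reduce.

```python
# Productions of the example grammar as (left-hand side, right-hand side) pairs.
# The introduced production S -> A is covered by the accept check below.
PRODUCTIONS = [("A", "a"), ("A", "(A)")]


def shift_reduce(text):
    """Return a list of (stack, remaining input, action) rows; raise ValueError on failure."""
    tokens = text + "$"
    stack, pos, rows = "$", 0, []
    while True:
        rest = tokens[pos:]
        # Accept: only the original start symbol is on the stack and the input is used up.
        if stack == "$A" and rest == "$":
            rows.append((stack, rest, "accept"))
            return rows
        # Reduce: simplifying assumption, reduce as soon as a right-hand side is on top.
        for lhs, rhs in PRODUCTIONS:
            if stack.endswith(rhs):
                rows.append((stack, rest, f"reduce {lhs} -> {rhs}"))
                stack = stack[: -len(rhs)] + lhs
                break
        else:
            # Shift: move the next input symbol onto the stack.
            if rest == "$":
                raise ValueError("shift at end of input")
            rows.append((stack, rest, "shift"))
            stack += tokens[pos]
            pos += 1


for row in shift_reduce("((a))"):
    print(*row, sep="\t")
```

Each printed row corresponds to one row of the table: the stack, the remaining input, and the action chosen in that configuration.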

2.0 - Shift/Reduce Parsing Schemes

Shift-Reduce Parsing Schemes define when to shift or reduce.

2.1 - LR(0) Parsing Scheme

2.1.1 - LR(0) Parsing Items

2.1.2 - LR(0) Parsing Automaton

Automaton States - Derived LR(0) Parsing Items

Automaton Transitions - Goto States

2.1.3 - LR(0) Parsing Automaton Example

Let’s generate the LR(0) parsing automaton for the grammar $S \rightarrow A$, $A \rightarrow (A)$, $A \rightarrow a$.

Step 1: Start at the Start Symbol of the Grammar

Step 2: Transitions out of Initial State (#0)

Step 3: Second State (#1)

Step 4: Third State (#2)

Step 5: Fourth State (#3)

Step 6: Transitions out of the Third State (#2)

Step 7: Fifth State (#4)

Step 8: ??Sixth State??

Step 9: ??Sixth State, Part II?

Step 10: Sixth State, actually
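The construction walked through in these steps can be sketched in Python. This is an illustrative sketch rather than the lecture's own algorithm: the names `GRAMMAR`, `closure`, `goto` and `build_automaton` are assumptions, and an item is represented as a pair (production index, position of the indicator).

```python
# Productions as (left-hand side, right-hand side); production 0 is the introduced S -> A.
GRAMMAR = [("S", ("A",)), ("A", ("(", "A", ")")), ("A", ("a",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add derived items: for every item with the indicator before a non-terminal N,
    add N -> . rhs for each production of N."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)


def goto(state, symbol):
    """Move the indicator over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def build_automaton():
    states = [closure({(0, 0)})]   # state 0: S -> . A plus its derived items
    transitions = {}
    index = 0
    while index < len(states):
        state = states[index]
        # Symbols that appear immediately after the indicator, in a fixed order.
        symbols = []
        for prod, dot in sorted(state):
            rhs = GRAMMAR[prod][1]
            if dot < len(rhs) and rhs[dot] not in symbols:
                symbols.append(rhs[dot])
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            transitions[(index, symbol)] = states.index(target)
        index += 1
    return states, transitions


states, transitions = build_automaton()
print(len(states), transitions)
```

With this ordering of symbols the sketch numbers the six states the same way as the notes: the state reached on $A$ from the initial state is #1, the state reached on an opening parenthesis is #2, and so on.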

2.1.4 - LR(0) Parsing Actions

Which action do we associate with each state in the parsing automaton?

An LR(0) item of the form

  1. $N \rightarrow \alpha \bullet a \beta$, where $a$ is a terminal symbol, indicates that the state containing the item has a SHIFT parsing action.
  2. $S' \rightarrow S \bullet$, where $S'$ is the (introduced) start symbol for the grammar, indicates that the state containing the item has an ACCEPT action. (In the example grammar above the introduced start symbol is written $S$, so the corresponding item is $S \rightarrow A \bullet$.)
  3. $N \rightarrow \alpha \bullet$ (i.e. there is nothing further to match on its right side), where $N$ is not the (introduced) start symbol for the grammar, indicates that the state containing the item has the parsing action REDUCE $N \rightarrow \alpha$. Note that the production $N \rightarrow \alpha$ is a necessary part of the action.

A shift action at end-of-file is an error, as is an accept action when the input is not at end-of-file.

For State #0

For State #1

For State #2

For State #3

For State #4

For State #5
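The three rules above can be applied mechanically to each state. The sketch below assumes the item sets that the LR(0) construction yields for the grammar $S \rightarrow A$, $A \rightarrow (A)$, $A \rightarrow a$, written out by hand as `(lhs, rhs, indicator position)` triples. The names `STATES`, `START`, `TERMINALS` and `actions` are illustrative only.

```python
START = "S"                    # the introduced start symbol in this grammar
TERMINALS = {"(", ")", "a"}

# Item sets of states #0 to #5, as (lhs, rhs, position of the indicator).
STATES = {
    0: [("S", "A", 0), ("A", "(A)", 0), ("A", "a", 0)],
    1: [("S", "A", 1)],
    2: [("A", "(A)", 1), ("A", "(A)", 0), ("A", "a", 0)],
    3: [("A", "a", 1)],
    4: [("A", "(A)", 2)],
    5: [("A", "(A)", 3)],
}


def actions(items):
    """Return the set of parsing actions implied by the items of one state."""
    found = set()
    for lhs, rhs, dot in items:
        if dot == len(rhs):
            # Complete item: ACCEPT for the introduced start symbol, else REDUCE.
            found.add("accept" if lhs == START else f"reduce {lhs} -> {rhs}")
        elif rhs[dot] in TERMINALS:
            # Indicator immediately before a terminal: SHIFT.
            found.add("shift")
    return found


for number, items in STATES.items():
    print(number, actions(items))
```

Every state of this grammar ends up with exactly one action: SHIFT for #0, #2 and #4, ACCEPT for #1, REDUCE $A \rightarrow a$ for #3 and REDUCE $A \rightarrow (A)$ for #5.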

2.2 - Shift/Reduce LR(0) Parsing

We now combine the shift/reduce parsing from before with the automaton defined in the section above, again parsing the input `((a))`:

  1. Note that in the previous iteration of parsing with a stack, we started off with the stack initially empty. In this version, we also push the state number along with each symbol, so the stack starts as $\$0$. We start in the initial state, State 0, which has a parsing action of SHIFT.
  2. We perform the SHIFT action from the previous step, pushing the first opening parenthesis. This puts us in State 2 (which we also push onto the stack). In State 2, the parsing action is SHIFT.
  3. We perform the SHIFT action from the previous step, pushing the second opening parenthesis. The transition on an opening parenthesis from State 2 leads back to State 2, so we remain in the same state (which is again pushed onto the stack). The parsing action in State 2 is SHIFT.
  4. From State 2, we SHIFT the symbol $a$ onto the stack, which puts us in State 3. In this state, the parsing action is REDUCE $A \rightarrow a$.
  5. We perform the action from the previous step, REDUCE $A \rightarrow a$. This removes $a$ (and its state) from the stack, which exposes State 2 on top of the stack. Pushing $A$ from State 2 then takes us to State 4, which has a parsing action of SHIFT.
  6. We perform the SHIFT action from the previous step, which pushes the closing parenthesis onto the stack (along with the new state number, 5). The parsing action of State 5 is REDUCE $A \rightarrow (A)$.
  7. We perform the REDUCE $A \rightarrow (A)$ action from the previous step, in which we replace $(A)$ with $A$. Removing $(A)$ changes the stack from $\$0(2(2A4)5$ to $\$0(2$, which leaves State 2 on top. We then push $A$ onto the parsing stack from State 2, which takes us to State 4 (we push the new state onto the stack as well), giving $\$0(2A4$. This new state has a parsing action of SHIFT.
  8. We perform the SHIFT action from the previous step, pushing a closing parenthesis from State 4, so the parsing stack changes to $\$0(2A4)5$. State 5 has the parsing action REDUCE $A \rightarrow (A)$.
  9. We perform the REDUCE $A \rightarrow (A)$ action from the previous step, in which we remove $(A)$ from the stack. This leaves $\$0$, which puts us in State 0. From State 0 we push $A$ onto the stack, which puts us in State 1, giving $\$0A1$. State 1 has the parsing action ACCEPT and we are at the end of the input, so the parse finishes successfully.
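The nine steps above can be replayed with a small table-driven loop. This is a sketch under stated assumptions: the `ACTION` and `GOTO` tables are written out by hand from the automaton for $S \rightarrow A$, $A \rightarrow (A)$, $A \rightarrow a$, and the names `ACTION`, `GOTO` and `parse` are illustrative, not taken from the lecture.

```python
# ACTION: state -> "shift", "accept", or ("reduce", lhs, number of symbols to pop).
ACTION = {0: "shift", 1: "accept", 2: "shift",
          3: ("reduce", "A", 1), 4: "shift", 5: ("reduce", "A", 3)}
# GOTO: (state, symbol) -> next state, for terminals and non-terminals alike.
GOTO = {(0, "A"): 1, (0, "("): 2, (0, "a"): 3,
        (2, "("): 2, (2, "A"): 4, (2, "a"): 3, (4, ")"): 5}


def parse(text):
    """Return the stack contents before each action; raise ValueError on bad input."""
    tokens = list(text) + ["$"]
    stack = [("$", 0)]              # pairs of (symbol, state)
    pos, trace = 0, []
    while True:
        state = stack[-1][1]
        trace.append("".join(f"{sym}{st}" for sym, st in stack))
        action = ACTION[state]
        if action == "accept":
            if tokens[pos] != "$":
                raise ValueError("accept before end of input")
            return trace
        if action == "shift":
            token = tokens[pos]
            if (state, token) not in GOTO:      # also covers a shift at end of input
                raise ValueError(f"cannot shift {token!r} in state {state}")
            stack.append((token, GOTO[(state, token)]))
            pos += 1
        else:
            _, lhs, size = action
            del stack[-size:]                   # pop the right-hand side
            exposed = stack[-1][1]              # state now on top of the stack
            stack.append((lhs, GOTO[(exposed, lhs)]))


for line in parse("((a))"):
    print(line)
```

The printed stacks run from `$0` through `$0(2(2A4)5` and `$0(2A4)5` to `$0A1`, one per step of the walk-through.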

2.2.1 - LR(0) Parsing Action Conflicts

A grammar is LR(0) if none of the states in its LR(0) parsing automaton contains a parsing action conflict.

2.2.2 - Parsing Automaton for a Grammar that is not LR(0)

  1. The initial state (State 0) has a kernel item formed from the newly introduced production $S \rightarrow E$, in which we place the position indicator $\bullet$ at the start, indicating that nothing has been matched yet:

    $$S \rightarrow \bullet E$$

    Since the position indicator is to the immediate left of the non-terminal symbol $E$, we need to include the two productions associated with $E$. Therefore, we add the following two derived items:

    $$E \rightarrow \bullet E + n \\ E \rightarrow \bullet n$$

    Based on these three items, we know that our initial state has two transitions, on the symbols $E$ and $n$.

    • We transition on $E$ for the items $S \rightarrow \bullet E$ and $E \rightarrow \bullet E + n$ (to State 1)
    • We transition on $n$ for the item $E \rightarrow \bullet n$ (to State 2)
  2. State 1 has two kernel items, which come from the two items in State 0 that have $E$ as the next symbol to match.

    • To form these kernel items, we move the position indicator over the symbol $E$. Therefore, our updated kernel items are:

      $$S \rightarrow E \bullet$$

      $$E \rightarrow E \bullet + n$$

    • There are no derived items for this state, as the position indicator is not to the immediate left of any non-terminal symbol.

    • In the second kernel item, the position indicator is to the immediate left of the symbol $+$. Therefore, we have a single transition, on the “+” symbol (to State 3).

  3. State 2 has a single kernel item, as identified earlier.

    • We update the kernel item by moving the position indicator over the matched token $n$:

      $$E \rightarrow n \bullet$$

    • This state doesn’t have any transitions, as the position indicator is at the end of the sequence to match.

  4. State 3 has a single kernel item, which is an updated version of the second kernel item from State 1.

    • We update the kernel item by moving the position indicator over the matched token $+$:

      $$E \rightarrow E + \bullet n$$

    • We have one transition out of this state, on $n$ (to State 4).

  5. State 4 has a single kernel item, as identified in State 3.

    • We update the kernel item by moving the position indicator over the matched token $n$:

      $$E \rightarrow E + n \bullet$$

  6. We now go back through the states, and assign an action to each state in the parsing automaton.

    1. In State 0, we have the parsing items:

      $$S \rightarrow \bullet E \\ E \rightarrow \bullet E + n \\ E \rightarrow \bullet n$$
      • Since the position indicator $\bullet$ is in front of the terminal symbol $n$ in the third item, and none of the items is complete, we assign this state a SHIFT action.
    2. In State 1, we have the parsing items:

      $$S \rightarrow E \bullet \\ E \rightarrow E \bullet + n$$
      • Based on the first parsing item, and the position of its position indicator, we should perform an ACCEPT action
      • Based on the second parsing item, and the position of its position indicator, we should perform a SHIFT action
      • Therefore, we have a conflict of actions in this state, and so the grammar is not LR(0)
        • If we could use a single-symbol lookahead we could solve this problem: if the next symbol is EOF then accept, otherwise shift and continue parsing.
    3. In State 2, we have the parsing item:

      $$E \rightarrow n \bullet$$
      • Since the position indicator $\bullet$ is at the end of the item, we perform a REDUCE action using its production, i.e. REDUCE $E \rightarrow n$
      • Note that since $E$ is not our introduced start symbol, we don’t perform an accept action here.
    4. In State 3, we have the parsing item:

      $$E \rightarrow E + \bullet n$$
      • Since the position indicator $\bullet$ is before the terminal symbol $n$, we perform a SHIFT action.
    5. In State 4, we have the parsing item:

      $$E \rightarrow E + n \bullet$$
      • Since we’re at the end of this parsing item (and its left-hand side isn’t the introduced start symbol), we perform a REDUCE $E \rightarrow E + n$ action
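The action assignment above, and the conflict in State 1, can be checked mechanically. The sketch below takes the five item sets derived in this section as given, written as `(lhs, rhs, indicator position)` triples. The names `STATES`, `actions`, `conflicts` and `choose` are illustrative, and `choose` encodes only the single-symbol lookahead idea mentioned for State 1, not a full lookahead-based parsing scheme.

```python
START = "S"                 # the introduced start symbol
TERMINALS = {"+", "n"}

# Item sets of States 0 to 4 for S -> E, E -> E + n, E -> n.
STATES = {
    0: [("S", ("E",), 0), ("E", ("E", "+", "n"), 0), ("E", ("n",), 0)],
    1: [("S", ("E",), 1), ("E", ("E", "+", "n"), 1)],
    2: [("E", ("n",), 1)],
    3: [("E", ("E", "+", "n"), 2)],
    4: [("E", ("E", "+", "n"), 3)],
}


def actions(items):
    """Set of parsing actions implied by the items of one state."""
    found = set()
    for lhs, rhs, dot in items:
        if dot == len(rhs):
            found.add("accept" if lhs == START else f"reduce {lhs} -> {' '.join(rhs)}")
        elif rhs[dot] in TERMINALS:
            found.add("shift")
    return found


def conflicts(states):
    """States with more than one action; the grammar is LR(0) only if this is empty."""
    result = {}
    for number, items in states.items():
        acts = actions(items)
        if len(acts) > 1:
            result[number] = acts
    return result


def choose(acts, lookahead):
    """Resolve an accept/shift conflict with one symbol of lookahead ('$' marks EOF)."""
    if acts == {"accept", "shift"}:
        return "accept" if lookahead == "$" else "shift"
    (only,) = acts          # any other state here has exactly one action
    return only


print(conflicts(STATES))
print(choose(actions(STATES[1]), "$"), choose(actions(STATES[1]), "+"))
```

Only State 1 is reported as conflicting, with the actions ACCEPT and SHIFT. With the lookahead rule, State 1 accepts at end of input and shifts when the next symbol is `+`.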