Lecture 14

  1. LR(1) Parsing Scheme
  2. LALR(1) Parsing Scheme

1.0 - LR(1) Parsing Scheme

1.1 - LR(1) Parsing Item

1.2 - LR(1) Parsing Automaton

1.2.1 - LR(1) Parsing Automaton - Initial State

1.2.2 - Automaton States - Derived LR(1) Items

1.2.3 - Automaton Transitions - Goto States

1.2.4 - LR(1) Parsing Actions

LR(1) Parsing Actions differ from LR(0) parsing actions in that the action chosen depends on the next terminal symbol in the input, $x$.
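As a minimal sketch of this difference, an LR(1) item can be modelled as an LR(0) item plus a look-ahead set, with a reduce permitted only when the next terminal is in that set. The `Item` class and `can_reduce` helper are hypothetical names chosen for illustration, and the sample item comes from the grammar $E \rightarrow E + n \mid n$ used later in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical LR(1) item: production lhs -> rhs, dot position, look-ahead set."""
    lhs: str
    rhs: tuple
    dot: int
    lookahead: frozenset

    def is_complete(self):
        # The position indicator is at the right-hand end of the production.
        return self.dot == len(self.rhs)


def can_reduce(item, next_terminal):
    # An LR(0) parser would reduce a complete item regardless of the input;
    # an LR(1) parser additionally checks the next terminal symbol.
    return item.is_complete() and next_terminal in item.lookahead


# The item [E -> n . , {$, +}]
item = Item("E", ("n",), 1, frozenset({"$", "+"}))
print(can_reduce(item, "+"))  # True
print(can_reduce(item, "n"))  # False
```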

An LR(0) item of the form:

1.3 - Constructing an LR(1) Parsing Automaton for a Grammar

  1. Construct the Initial State

    • We start off the construction of the LR(1) Parsing Automaton with the starting state.

    • Our introduced starting symbol in this case is $S$, with the introduced production being $S \rightarrow E$.

    • Therefore, the first parsing item is given as follows. Note that for this first parsing item no other symbols come after $E$, and therefore the look-ahead set is just $\text{\textdollar}$.

      $[S \rightarrow \bullet E,\ \text{\textdollar}]$
    • Our position indicator, $\bullet$, is to the left of the non-terminal symbol $E$, which has two productions.

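    • Therefore, we have two additional derived parsing items:

      $$\begin{aligned} &[E \rightarrow \bullet E + n, && \text{\textdollar}, +] \\ &[E \rightarrow \bullet n, && \text{\textdollar}, +] \end{aligned}$$
      • Note here that the look-ahead sets of these two derived parsing items contain the End-of-File symbol (inherited from the original parsing item from which these were derived).
      • The look-ahead sets also contain the $+$ symbol. In $E \rightarrow \bullet E + n$ the non-terminal $E$ after the position indicator is followed by $+$, so any item derived from that $E$ can be followed by $+$.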
    • In the three parsing items above, we have our position indicator to the left of two symbols: $E$ and $n$.

      • Therefore, we have transitions out of this state on those symbols, as they are the symbols to be parsed next.
      • Our transitions are to State 1 (on the symbol $E$) and State 2 (on the symbol $n$).
  2. Construct the First State

    • This state has two parsing items, both updated versions of parsing items from the previous state (State 0).
    • We essentially just move the position indicator past $E$. The look-ahead sets remain the same.
    $$\begin{aligned} &[S \rightarrow E \bullet, && \text{\textdollar}] \\ &[E \rightarrow E \bullet + n, && \text{\textdollar}, +] \end{aligned}$$
  3. Construct the Second State

    • We use the parsing item from the previous state (State 0), but update the position indicator to designate that we have matched the terminal symbol $n$.

      $[E \rightarrow n \bullet,\ \text{\textdollar}, +]$
    • There are no derived items in this state, as the position indicator is not to the left of any non-terminal symbol.

  4. Construct the Third State

    • We use the parsing item from the previous state (State 1), but update the position indicator to designate that we have matched the $+$ symbol.

      $[E \rightarrow E + \bullet n,\ \text{\textdollar}, +]$
    • Since the position indicator is to the left of the terminal symbol $n$, we transition on it to our next state.

  5. Construct the Fourth State

    • As before, we use the parsing item from the previous state (State 3), but update the position indicator.

      $[E \rightarrow E + n \bullet,\ \text{\textdollar}, +]$
    • There are no more GOTO states (as we are at the end of the matching) and no derived items.

  6. Define the Parsing Action for the Initial State.

    • Before, we determined that there are the following parsing items in the initial state:

      $$\begin{aligned} &[S \rightarrow \bullet E, && \text{\textdollar}] \\ &[E \rightarrow \bullet E + n, && \text{\textdollar}, +] \\ &[E \rightarrow \bullet n, && \text{\textdollar}, +] \end{aligned}$$
    • We know that if the position indicator is directly to the left of a terminal symbol, then we perform the SHIFT parsing action on that symbol. The only such symbol here is $n$, so we SHIFT on $n$.

  7. Define the Parsing Action for the First State (State 1)

    • Before, we determined that the two parsing items for the first state are:

      $$\begin{aligned} &[S \rightarrow E \bullet, && \text{\textdollar}] \\ &[E \rightarrow E \bullet + n, && \text{\textdollar}, +] \end{aligned}$$
    • If the next symbol is $+$, the position indicator in the second item is directly to the left of that terminal symbol, and therefore we perform a SHIFT action.

    • If the next symbol is $\text{\textdollar}$ (End-of-File), we perform an ACCEPT action.

    • Therefore, there’s no conflict in this state (remember that in the LR(0) Parsing Automaton we had a conflict in this state).

  8. Define the Parsing Action for the Second State (State 2)

    • Before, we determined that the parsing item for the second state is:

      $[E \rightarrow n \bullet,\ \text{\textdollar}, +]$
    • Therefore, we REDUCE $E \rightarrow n$ if the next symbol is either $\text{\textdollar}$ or $+$.

    • Otherwise, an ERROR action is performed.

  9. Define the Parsing Action for the Third State (State 3)

    • Before, we determined that the parsing item for the third state is:

      $[E \rightarrow E + \bullet n,\ \text{\textdollar}, +]$
    • As before, since the position indicator is to the left of the terminal symbol $n$, we perform the SHIFT action on $n$.
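
The construction above can be sketched in Python as a closure/GOTO computation. This is an illustrative sketch rather than the lecture's own algorithm listing: the names `closure`, `goto`, `build_states` and `grouped` are assumptions, each item carries a single look-ahead symbol (items are grouped into look-ahead sets only for display), and the code assumes that no production is empty, which holds for $S \rightarrow E$, $E \rightarrow E + n \mid n$.

```python
# Grammar from the walkthrough: S -> E, E -> E + n | n
GRAMMAR = {"S": [("E",)], "E": [("E", "+", "n"), ("n",)]}


def first_sets():
    # FIRST sets by fixed-point iteration (assumes no empty productions).
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for lhs, bodies in GRAMMAR.items():
            for body in bodies:
                head = body[0]
                new = first[head] if head in GRAMMAR else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


FIRST = first_sets()


def closure(items):
    # Add derived items for every non-terminal right of the position indicator.
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            rest = rhs[dot + 1:]
            if rest:
                las = FIRST[rest[0]] if rest[0] in GRAMMAR else {rest[0]}
            else:
                las = {la}  # nothing follows, so inherit the look-ahead
            for body in GRAMMAR[rhs[dot]]:
                for a in las:
                    new = (rhs[dot], body, 0, a)
                    if new not in items:
                        items.add(new)
                        work.append(new)
    return frozenset(items)


def goto(state, symbol):
    # Move the position indicator over `symbol`, then take the closure.
    moved = {(l, r, d + 1, a) for (l, r, d, a) in state
             if d < len(r) and r[d] == symbol}
    return closure(moved)


def build_states():
    states = [closure({("S", ("E",), 0, "$")})]
    transitions = {}
    for state in states:  # the list grows while we iterate over it
        symbols = {r[d] for (_, r, d, _) in state if d < len(r)}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), sym)] = states.index(target)
    return states, transitions


def grouped(state):
    # Collect look-ahead symbols per LR(0) item, as written in the notes.
    out = {}
    for lhs, rhs, dot, la in state:
        out.setdefault((lhs, rhs, dot), set()).add(la)
    return out


states, transitions = build_states()
print(len(states))   # 5
print(transitions)
```

With this state numbering, the transitions match the walkthrough: State 0 goes to State 1 on $E$ and State 2 on $n$, State 1 goes to State 3 on $+$, and State 3 goes to State 4 on $n$.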

1.4 - Another LR(1) Parsing Automaton

  1. The initial state, State 0 has:
    • Initial kernel item

      $[S' \rightarrow \bullet S,\ \text{\textdollar}]$

    • Derived items:

      • We know this because the initial state uses the production $S' \rightarrow S$ for its kernel item, and the non-terminal symbol $S$ has two productions:

        $S \rightarrow \text{id}$

        $S \rightarrow V = E$

      • Therefore, we have two derived items:

        $$\begin{aligned} &[S \rightarrow \bullet \text{id}, && \text{\textdollar}] \\ &[S \rightarrow \bullet V = E, && \text{\textdollar}] \end{aligned}$$
      • We know that the End-of-File symbol ($\text{\textdollar}$) follows $S$, because $S' \rightarrow S$ and $\text{\textdollar}$ follows $S'$. Therefore the look-ahead set for these two items is $\{\text{\textdollar}\}$.

    • We now search for items derivable from these new items. Notice that in $S \rightarrow \bullet V = E$ the position indicator is to the left of the non-terminal symbol $V$, which has the production $V \rightarrow \text{id}$:

      • Therefore, we add a further derived item to our state. Its look-ahead set is just the symbol $=$, because in the production from which this item is derived, $=$ follows $V$.

        $[V \rightarrow \bullet \text{id},\ =]$

    • From this, we have GOTO states on:

      • $S$ (from the initial kernel item)
      • $\text{id}$ (from the first and third derived items)
      • $V$ (from the second derived item)
  2. The first state, State 1 has:
    • A single kernel item - this is the same as the kernel item from the initial state, except we update the position of the indicator symbol

      $[S' \rightarrow S \bullet,\ \text{\textdollar}]$
    • This state doesn’t have any derived items or GOTO states.

  3. The second state, State 2 has:
    • Two kernel items - both of these are updated versions of their respective kernel items from the initial state.

      $$\begin{aligned} &[S \rightarrow \text{id} \bullet, && \text{\textdollar}] \\ &[V \rightarrow \text{id} \bullet, && =] \end{aligned}$$
    • Neither of these kernel items has any derived items or GOTO states.

  4. The third state, State 3 has:
    • A single kernel item - an updated version of the kernel item from the initial state

      $[S \rightarrow V \bullet = E,\ \text{\textdollar}]$
    • This kernel item doesn’t have any derived items.

    • Since the position indicator is to the left of the terminal symbol $=$, we have a GOTO on $=$.

  5. The fourth state, State 4 has:
    • A single kernel item from the previous state, State 3

      $[S \rightarrow V = \bullet E,\ \text{\textdollar}]$
    • Since the position indicator is to the left of the non-terminal symbol $E$, which has the productions $E \rightarrow n$ and $E \rightarrow V$, we have two derived items:

      $$\begin{aligned} &[E \rightarrow \bullet n, && \text{\textdollar}] \\ &[E \rightarrow \bullet V, && \text{\textdollar}] \end{aligned}$$
    • In the second of these derived items, notice that the position indicator is to the left of the non-terminal symbol $V$, which has the production $V \rightarrow \text{id}$, implying that we have one further derived item.

      $[V \rightarrow \bullet \text{id},\ \text{\textdollar}]$
      • Nothing follows $V$ in $E \rightarrow \bullet V$, so the look-ahead set for this item is whatever can follow $E$. That is the look-ahead set of the item it is derived from, which in turn comes from the previous state’s kernel item: $\{\text{\textdollar}\}$.
    • From these four items, we have four GOTO states:

      1. GOTO State 5 on $E$
      2. GOTO State 6 on $n$
      3. GOTO State 7 on $V$
      4. GOTO State 8 on $\text{id}$
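
The rule used above for choosing look-ahead sets of derived items can be sketched as follows. For an item $[A \rightarrow \alpha \bullet B \beta,\ \ell]$, items derived from $B$ take whatever can start $\beta$, or inherit $\ell$ when $\beta$ is empty. The function name `derived_lookahead` and the hand-computed `FIRST` table are assumptions made for illustration, and the sketch assumes no symbol can derive the empty string, which holds for this grammar.

```python
# FIRST sets worked out by hand from the productions in this example:
#   S -> id | V = E,   E -> n | V,   V -> id
FIRST = {"S": {"id"}, "E": {"n", "id"}, "V": {"id"}}


def derived_lookahead(rest, lookahead):
    """Look-ahead set for items derived from [A -> alpha . B rest, lookahead].

    `rest` is the tuple of symbols after B. Assumes no symbol derives the
    empty string, so only the first symbol of `rest` matters.
    """
    if not rest:
        # Nothing follows B, so the derived items inherit the look-ahead.
        return set(lookahead)
    head = rest[0]
    # A terminal starts with itself; a non-terminal with its FIRST set.
    return set(FIRST.get(head, {head}))


# [S -> . V = E, $]: "=" follows V, so V's items get look-ahead {=}
print(derived_lookahead(("=", "E"), {"$"}))  # {'='}

# [E -> . V, $]: nothing follows V, so V's items inherit {$}
print(derived_lookahead((), {"$"}))          # {'$'}
```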

...

1.4.1 - Proving that a Grammar is LR(1)

1.5 - LR(1) Parsing Action Conflicts

1.6 - Non-LR(1) Grammars

Creating the LR(1) Parsing Automaton

  1. Begin by introducing our new start symbol to the grammar, and its production.

    • Here, we introduce the new start symbol $S$ and the production $S \rightarrow L$.
  2. The initial state is based on the production $S \rightarrow L$.

    • Therefore, we have the following kernel item:

      $[S \rightarrow \bullet L,\ \text{\textdollar}]$
    • Note that the look-ahead set is $\text{\textdollar}$, as after we have finished parsing the original start symbol there should be an EOF.

    • Since the position indicator $\bullet$ is to the left of the non-terminal symbol $L$ in this kernel item, and $L$ has productions, we have derived items. More specifically, we have 3 derived items from $L$’s productions $L \rightarrow \epsilon$, $L \rightarrow E$ and $L \rightarrow L\ E$. Their look-ahead sets are still to be determined:

      $$\begin{aligned} &{\color{gray}[S \rightarrow \bullet L,} && {\color{gray}\text{\textdollar}]} \\ &[L \rightarrow \bullet, && \ ] \\ &[L \rightarrow \bullet E, && \ ] \\ &[L \rightarrow \bullet L\ E, && \ ] \end{aligned}$$
    • Note that in each of these, the position indicator is at the start of matching the sequences.

    • We now derive the look-ahead sets. The symbols that can follow $L$ are:

      • Anything that can come after $S$, that is, the EOF symbol $\text{\textdollar}$
      $$\begin{aligned} &{\color{gray}[S \rightarrow \bullet L,} && {\color{gray}\text{\textdollar}]} \\ &[L \rightarrow \bullet, && \text{\textdollar}] \\ &[L \rightarrow \bullet E, && \text{\textdollar}] \\ &[L \rightarrow \bullet L\ E, && \text{\textdollar}] \end{aligned}$$
    • However, the look-ahead set comprises anything that can follow $L$, and $E$ follows $L$ in the last item ($L \rightarrow \bullet L\ E$), so we also need to include anything that $E$ can start with.

      • $E$ can start with either $n$ (from $E \rightarrow n$) or an open parenthesis (from $E \rightarrow (\ L\ )$)
      $$\begin{aligned} &{\color{gray}[S \rightarrow \bullet L,} && {\color{gray}\text{\textdollar}]} \\ &[L \rightarrow \bullet, && \text{\textdollar}, n, (] \\ &[L \rightarrow \bullet E, && \text{\textdollar}, n, (] \\ &[L \rightarrow \bullet L\ E, && \text{\textdollar}, n, (] \end{aligned}$$
    • Additionally, in the third item ($L \rightarrow \bullet E$), the position indicator is to the left of the non-terminal symbol $E$, meaning that we have more derived items. We have 2 additional derived items, from the productions $E \rightarrow n$ and $E \rightarrow (\ L\ )$:

      $$\begin{aligned} &{\color{gray}[S \rightarrow \bullet L,} && {\color{gray}\text{\textdollar}]} \\ &[L \rightarrow \bullet, && \text{\textdollar}, n, (] \\ &[L \rightarrow \bullet E, && \text{\textdollar}, n, (] \\ &[L \rightarrow \bullet L\ E, && \text{\textdollar}, n, (] \\ &[E \rightarrow \bullet n, && \ ] \\ &[E \rightarrow \bullet (\ L\ ), && \ ] \end{aligned}$$
      • The look-ahead sets for $E$’s items are derived from the third item, in which $E$ is the last symbol, so they inherit that item’s look-ahead set. We propagate these changes.

        $$\begin{aligned} &{\color{gray}[S \rightarrow \bullet L,} && {\color{gray}\text{\textdollar}]} \\ &[L \rightarrow \bullet, && \text{\textdollar}, n, (] \\ &[L \rightarrow \bullet E, && \text{\textdollar}, n, (] \\ &[L \rightarrow \bullet L\ E, && \text{\textdollar}, n, (] \\ &[E \rightarrow \bullet n, && \text{\textdollar}, n, (] \\ &[E \rightarrow \bullet (\ L\ ), && \text{\textdollar}, n, (] \end{aligned}$$
  3. GoTo States

    • From this, we deduce that we have GOTO states on:
      1. $L$ (from the first and fourth items)
      2. $E$ (from the third item)
      3. $n$ (from the fifth item)
      4. $($ (from the sixth item)
  4. Parsing Actions

    • $[L \rightarrow \bullet,\ \text{\textdollar}, n, (]$
    • $[E \rightarrow \bullet n,\ \text{\textdollar}, n, (]$
    • Therefore, we have a conflict:
      • If our next symbol is $n$, we don’t know whether to Shift or Reduce $L \rightarrow \epsilon$
      • If our next symbol is $($, we don’t know whether to Shift (from the item $[E \rightarrow \bullet (\ L\ ),\ \text{\textdollar}, n, (]$) or Reduce $L \rightarrow \epsilon$

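Because of these conflicts, this grammar is not LR(1), so we can’t parse it using an LR(1) Parsing Automaton. The grammar is in fact ambiguous: for example, $n$ can be derived from $L$ either via $L \rightarrow E$ or via $L \rightarrow L\ E$ with the inner $L \rightarrow \epsilon$.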
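
The conflict check for the initial state can be sketched in Python. The item list is transcribed from the state built above, while the names `STATE_0`, `actions` and `conflicts` are hypothetical and chosen only for illustration.

```python
TERMINALS = {"n", "(", ")", "$"}

# Items of the initial state: (lhs, rhs, dot position, look-ahead set)
STATE_0 = [
    ("S", ("L",), 0, {"$"}),
    ("L", (), 0, {"$", "n", "("}),          # L -> epsilon
    ("L", ("E",), 0, {"$", "n", "("}),
    ("L", ("L", "E"), 0, {"$", "n", "("}),
    ("E", ("n",), 0, {"$", "n", "("}),
    ("E", ("(", "L", ")"), 0, {"$", "n", "("}),
]


def actions(state):
    """Map each terminal to the set of actions the items in `state` call for."""
    table = {}
    for lhs, rhs, dot, lookahead in state:
        if dot < len(rhs):
            symbol = rhs[dot]
            if symbol in TERMINALS:
                # Position indicator directly left of a terminal: shift on it.
                table.setdefault(symbol, set()).add("shift")
        else:
            # Complete item: reduce on every symbol in its look-ahead set.
            body = " ".join(rhs) or "epsilon"
            for terminal in lookahead:
                table.setdefault(terminal, set()).add(f"reduce {lhs} -> {body}")
    return table


def conflicts(state):
    # A conflict is any terminal with more than one possible action.
    return {t: acts for t, acts in actions(state).items() if len(acts) > 1}


for terminal, acts in sorted(conflicts(STATE_0).items()):
    print(terminal, sorted(acts))
# ( ['reduce L -> epsilon', 'shift']
# n ['reduce L -> epsilon', 'shift']
```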

1.6.1 - Modifying a Non-LR(1) Ambiguous Grammar to a LR(1) Grammar

2.0 - LALR(1) Parsing Scheme

2.1 - Generating a LALR(1) Parsing Automaton

We just need to figure out which states to merge.

State 0 - There are no states with identical LR(0) Parsing items

State 1 - There are no states with identical LR(0) Parsing items

State 2 - State 7 has the same LR(0) Parsing items. We combine the two states together, taking the union of their look-ahead sets.

$[L \rightarrow L\ E\ \bullet,\ \text{\textdollar}, n, (, )]$

State 3 - State 8 has the same LR(0) Parsing items. We combine the two states together, taking the union of their look-ahead sets.

$[E \rightarrow n\ \bullet,\ \text{\textdollar}, n, (, )]$

State 4 - State 9 has the same LR(0) Parsing items. We combine the two states together, taking the union of their look-ahead sets.

$$\begin{aligned} &[E \rightarrow (\bullet L\ ), && \text{\textdollar}, n, (, )] \\ &[L \rightarrow \bullet, && (, n, )] \\ &[L \rightarrow \bullet L\ E, && (, n, )] \end{aligned}$$

State 5 - State 10 has the same LR(0) parsing items. We combine the two states together, taking the union of their look-ahead sets.

$$\begin{aligned} &[E \rightarrow (L\ \bullet), && \text{\textdollar}, n, (, )] \\ &[L \rightarrow L\ \bullet\ E, && ), n, (] \\ &[E \rightarrow \bullet n, && ), n, (] \\ &[E \rightarrow \bullet (L), && ), n, (] \end{aligned}$$

State 6 - State 11 has the same LR(0) parsing items. We combine the two states together, taking the union of their look-ahead sets.

$[E \rightarrow (L)\ \bullet,\ \text{\textdollar}, n, (, )]$
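
The merging step can be sketched as grouping LR(1) states by their LR(0) items and taking the union of the look-ahead sets. The function name `merge_states` and the string encoding of items are illustrative choices. The individual look-ahead sets given here for States 2, 3, 7 and 8 are an assumption, chosen so that their unions match the merged sets shown above, since the unmerged states are not listed in these notes.

```python
def merge_states(states):
    """Merge LR(1) states that share the same LR(0) items.

    `states` maps a state number to {LR(0) item: look-ahead set}.
    """
    merged = {}
    for number, items in states.items():
        key = frozenset(items)  # the LR(0) items only, ignoring look-aheads
        entry = merged.setdefault(key, {"from": [], "items": {}})
        entry["from"].append(number)
        for lr0_item, lookahead in items.items():
            # Union the look-ahead sets item by item.
            entry["items"].setdefault(lr0_item, set()).update(lookahead)
    return list(merged.values())


# Assumed look-ahead sets for the unmerged states (see the note above).
LR1_STATES = {
    2: {"L -> L E .": {"$", "n", "("}},
    3: {"E -> n .": {"$", "n", "("}},
    7: {"L -> L E .": {")", "n", "("}},
    8: {"E -> n .": {")", "n", "("}},
}

for entry in merge_states(LR1_STATES):
    print(entry["from"], {item: sorted(la) for item, la in entry["items"].items()})
```

Keying on the set of LR(0) items means two states merge only when all of their items agree once look-aheads are ignored, which is the condition used in the state-by-state comparison above.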

Transitions are carried across to the new states. The new Parsing Automaton is given below.

2.2.1 - Another Example

Our condensed LALR(1) Parsing Automaton is given as follows

We then check what the parsing actions are, to ensure that the new Parsing Automaton doesn’t introduce any conflicts.