Week 5.2

  1. Follow Sets
  2. LL(1) Grammars
  3. Java-CUP Parser Generator
  4. JFlex Lexical Analyser Generator
  5. Using JFlex and Java-CUP Together
  6. Extending the Calculator Example

1.0 - Follow Sets

1.1 - Formal Definition

1.2 - Rules for Calculating Follow Sets

1.3.1 - Example of Calculating Follow Sets Algorithmically


  1. Initialise the follow sets.

    S {$}
    A { }
    B { }
  2. In the first pass, we start with the first production:

    S → x A B

    Here, the non-terminal symbol A can be followed by anything that B can start with. Therefore, we add y and z to Follow(A).

    Since B is nullable, A can also be followed by anything that follows S, so Follow(S) ⊆ Follow(A) and we add $ to Follow(A).

    The non-terminal B is the last symbol on the right-hand side, so it can be followed by anything that can follow S. At this point, we know that $ can follow S, so we add $ to Follow(B).

        Initial   Updated
    S   {$}       {$}
    A   { }       {y, z, $}
    B   { }       {$}

    We then look at the second production:

    A → y | z B

    This production has two alternatives:

    • The left alternative contains no non-terminal symbols, so we don’t need to consider it.
    • The right alternative contains the non-terminal symbol B.
      • Since B is the last symbol of this alternative, anything that can follow A can also follow B, so Follow(A) ⊆ Follow(B). We add y, z and $ to Follow(B).

        Initial   Updated
    S   {$}       {$}
    A   { }       {y, z, $}
    B   { }       {$, y, z}

    We then look at the third production:

    B → ϵ | A x

    From this production, we see that x can follow A, so we add x to Follow(A). Since x is a terminal symbol, and terminal symbols are never nullable, there is nothing further to add.

        Initial   Updated
    S   {$}       {$}
    A   { }       {y, z, $, x}
    B   { }       {$, y, z}
  3. We pass through the grammar again to see whether any of the follow sets change.

    We look at the first production again:

    S → x A B

    • Follow(A) must include everything that B can start with, namely y and z. ✅ It already does.
    • B is nullable, so Follow(A) must also include Follow(S). ✅ It already includes $.
    • B is the last symbol, so Follow(B) must include Follow(S). ✅ It already includes $.

    We then revisit the second production:

    A → y | z B

    • Since B is the last symbol of the right alternative, Follow(B) must include Follow(A). It doesn’t yet include x, so we add x to Follow(B).

    Revisiting the third production, B → ϵ | A x, changes nothing, because x is already in Follow(A).

        Initial   Updated
    S   {$}       {$}
    A   { }       {y, z, $, x}
    B   { }       {$, y, z, x}
  4. We pass through the grammar once more and find that none of the follow sets change, so we are done.

1.3.2 - Algorithm for Calculating Follow Sets

updateFollow(Production p)
	// Start from the last symbol on the RHS of the production
	i := length(p.rhs) - 1;
	followCurrent := Follow(p.lhs); // Can follow the whole RHS

	while 0 <= i // Process the RHS in reverse order
		if p.rhs(i) ∈ g.nonterminals
			Follow(p.rhs(i)) := Follow(p.rhs(i)) ∪ followCurrent;
			// Update followCurrent ready for the next iteration
			if ϵ ∈ first(p.rhs(i))
				// non-terminal symbol p.rhs(i) is nullable
				// Augment followCurrent with all of the terminal symbols that the
				// current symbol (p.rhs(i)) can start with.
				followCurrent := followCurrent ∪ (first(p.rhs(i)) - {ϵ})
			else
				// non-terminal symbol p.rhs(i) is not nullable
				// Set followCurrent to the set of symbols that the 
				// current symbol (p.rhs(i)) can start with.
				followCurrent := first(p.rhs(i))
		else // p.rhs(i) ∈ g.terminals
			// Don't need to augment the follow set, but need to update followCurrent.
			// Set followCurrent to the current symbol that we're looking at 
			// (this MUST be in the follow set of the symbol that we look at next, as we're 
			// moving from right to left).
			followCurrent := {p.rhs(i)}; // Update it for the next iteration
		i := i - 1
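
The pseudocode above updates the follow sets for a single production; the worked example repeats that update over every production until nothing changes. A minimal Java sketch of both pieces is given below, applied to the example grammar S → x A B, A → y | z B, B → ϵ | A x. The names `FollowSetsSketch`, `Production`, `follow` and `example` are illustrative assumptions, and the first sets are assumed to have been computed by hand beforehand rather than derived by the code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FollowSetsSketch {
    static final String EPS = "ϵ";

    /** One alternative of a production, e.g. A -> z B (hypothetical helper type). */
    static final class Production {
        final String lhs;
        final List<String> rhs;

        Production(String lhs, String... rhs) {
            this.lhs = lhs;
            this.rhs = List.of(rhs);
        }
    }

    /** Applies the updateFollow step to every production until no follow set changes. */
    static Map<String, Set<String>> follow(List<Production> grammar,
                                           Map<String, Set<String>> first,
                                           String start) {
        Map<String, Set<String>> follow = new HashMap<>();
        for (String nonTerminal : first.keySet()) {
            follow.put(nonTerminal, new TreeSet<>());
        }
        follow.get(start).add("$"); // end of input can follow the start symbol

        boolean changed = true;
        while (changed) {
            changed = false;
            for (Production p : grammar) {
                // Anything that follows the LHS can follow the whole RHS.
                Set<String> followCurrent = new HashSet<>(follow.get(p.lhs));
                // Process the RHS from right to left.
                for (int i = p.rhs.size() - 1; i >= 0; i--) {
                    String symbol = p.rhs.get(i);
                    if (first.containsKey(symbol)) { // non-terminal
                        changed |= follow.get(symbol).addAll(followCurrent);
                        Set<String> starts = new HashSet<>(first.get(symbol));
                        boolean nullable = starts.remove(EPS);
                        if (!nullable) {
                            followCurrent = new HashSet<>(); // nothing to the right shows through
                        }
                        followCurrent.addAll(starts);
                    } else { // terminal
                        followCurrent = new HashSet<>(Set.of(symbol));
                    }
                }
            }
        }
        return follow;
    }

    /** The grammar of the worked example, with hand-computed first sets (an assumption). */
    static Map<String, Set<String>> example() {
        List<Production> grammar = List.of(
            new Production("S", "x", "A", "B"),
            new Production("A", "y"),
            new Production("A", "z", "B"),
            new Production("B"), // B -> ϵ (empty right-hand side)
            new Production("B", "A", "x"));
        Map<String, Set<String>> first = Map.of(
            "S", Set.of("x"),
            "A", Set.of("y", "z"),
            "B", Set.of(EPS, "y", "z"));
        return follow(grammar, first, "S");
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

For this grammar the loop settles on the same sets as the hand calculation: Follow(S) contains only $, while Follow(A) and Follow(B) both contain x, y, z and $.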

2.0 - LL(1) Grammars

2.1 - Formal Definition of LL(1) Grammars

2.1.1 - LL(1) Grammars and Recursive Descent Parsing

2.1.2 - EBNF Grammars and LL(1) Grammars

3.0 - Java-CUP Parser Generator

// Expression definitions
E ::= E:e PLUS T:t
      {: RESULT = e + t; :} 
   ;
E ::= E:e MINUS T:t
      {: RESULT = e - t; :} 
   ;
E ::= T:t
      {: RESULT = t; :}
   ;
// Term definitions
T ::= T:t TIMES F:f
      {: RESULT = t * f; :}
   ;
T ::= F:f
      {: RESULT = f; :}
   ;
// Factor definitions
F ::= LPAREN E:e RPAREN
      {: RESULT = e; :}
   ;
F ::= NUMBER:n
      {: RESULT = n; :}
   ;
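
To make the values computed by the semantic actions above concrete, the following hand-written evaluator mirrors the three non-terminals E, T and F. It is only a sketch under stated assumptions: Java-CUP generates a table-driven parser rather than code like this, the left-recursive rules are replaced by loops so that results stay left-associative, and the class name `CalcSketch`, the `int` result type, the whitespace skipping and the restriction to non-negative integer literals are all illustrative choices rather than part of the CUP specification.

```java
final class CalcSketch {
    private final String in;
    private int pos = 0;

    CalcSketch(String in) {
        this.in = in;
    }

    /** Evaluates a complete expression such as "1 + 2 * 3". */
    static int eval(String text) {
        CalcSketch c = new CalcSketch(text);
        int value = c.expr();
        if (c.peek() != '\0') {
            throw new IllegalArgumentException("Unexpected input at " + c.pos);
        }
        return value;
    }

    /** Skips whitespace and returns the next character, or '\0' at end of input. */
    private char peek() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) {
            pos++;
        }
        return pos < in.length() ? in.charAt(pos) : '\0';
    }

    // E ::= E PLUS T | E MINUS T | T
    private int expr() {
        int e = term();
        while (peek() == '+' || peek() == '-') {
            char op = in.charAt(pos++);
            int t = term();
            e = (op == '+') ? e + t : e - t; // RESULT = e + t or RESULT = e - t
        }
        return e;
    }

    // T ::= T TIMES F | F
    private int term() {
        int t = factor();
        while (peek() == '*') {
            pos++;
            t = t * factor(); // RESULT = t * f
        }
        return t;
    }

    // F ::= LPAREN E RPAREN | NUMBER
    private int factor() {
        char c = peek();
        if (c == '(') {
            pos++;
            int e = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ) at " + pos);
            }
            pos++;
            return e; // RESULT = e
        }
        if (!Character.isDigit(c)) {
            throw new IllegalArgumentException("Expected number at " + pos);
        }
        int start = pos;
        while (pos < in.length() && Character.isDigit(in.charAt(pos))) {
            pos++;
        }
        return Integer.parseInt(in.substring(start, pos)); // RESULT = n
    }
}
```

Because T sits below E in the grammar, `CalcSketch.eval("1 + 2 * 3")` multiplies before it adds, and the loops make `10 - 4 - 3` group as `(10 - 4) - 3`.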

4.0 - JFlex Lexical Analyser Generator

4.1 - What Does the Lexical Analyser Do?

4.2 - Other Definitions

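/* User code section: imports needed by the declarations below. */
import java_cup.runtime.Symbol;
import java_cup.runtime.ComplexSymbolFactory;

%%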
/* -----------------Options and Declarations Section----------------- */
/* The name of the class JFlex will create will be lexer.
 * Will write the code to the file lexer.java.
 */
%class lexer
%unicode

/* Make the resulting class public */
%public

/* Will switch to a CUP compatibility mode to interface with a CUP
 * generated parser.
 * The terminal symbols defined by CUP are placed in the class sym.
 */
%cup
/* The value returned at end of file.
 */
%eofval{
    return makeToken( sym.EOF );
%eofval}
/* The current line number can be accessed with the variable yyline
 * and the current column number with the variable yycolumn.
 */
%line
%column
/* Declarations
 * Code between %{ and %}, both of which must be at the beginning of a
 * line, will be copied verbatim into the lexer class source.
 * Here one declares member variables and functions that are used inside
 * scanner actions.
 */
%{
    ComplexSymbolFactory sf;
    public lexer(java.io.Reader in, ComplexSymbolFactory sf){
        this(in);
        this.sf = sf;
    }

    /** To create a new java_cup.runtime.Symbol.
     * @param kind is an integer code representing the token.
     * Note that CUP and JFlex use integers to represent token kinds.
     */
    private Symbol makeToken( int kind ) {
        /* Symbol takes the token kind, and the locations of the
         * leftmost and rightmost characters of the substring of the
         * input file that matched the token.
         */
        // System.err.println( "Token " + yytext() + " " + kind );
        return sf.newSymbol( sym.terminalNames[kind], kind,
            new ComplexSymbolFactory.Location(yyline, yycolumn),
            new ComplexSymbolFactory.Location(yyline, yycolumn + yylength()) );
    }
    /** Also creates a new java_cup.runtime.Symbol with information
     * about the current token, but this object has a value. 
     * @param kind is an integer code representing the token.
     * @param value is an arbitrary Java Object.
     * Below when tokens such as a NUMBER or IDENTIFIER are 
     * recognised they pass values which are respectively
     * of type Integer and String. The types of these values *must*
     * match their type as declared in the Terminals sections
     * of the CUP specification.
     */
    private Symbol makeToken( int kind, Object value ) {
        // System.err.println( "Token " + yytext() + " " + kind );
        return sf.newSymbol( sym.terminalNames[kind], kind,
            new ComplexSymbolFactory.Location(yyline, yycolumn),
            new ComplexSymbolFactory.Location(yyline, yycolumn + yylength()),
            value );
    }
%}

4.3 - Running Java-CUP on Calc.CUP

4.4 - Run Configurations

4.4.1 - Java-CUP Run Configuration [Calc_CUP]

4.4.2 - JFlex Lexical Analyser Run Configuration [Calc_JFlex]

4.4.3 - Calc Configuration

4.5 - CalcCUP Java Class

import java.io.IOException;
import java.io.InputStreamReader;
import java_cup.runtime.ComplexSymbolFactory;

public class CalcCUP {
    public static void main(String[] args) throws java.lang.Exception {
        ComplexSymbolFactory csf = new ComplexSymbolFactory();
        lexer calcLexer = new lexer( new InputStreamReader( System.in ), csf );
        parser calcParser = new parser( calcLexer, csf );
        try {
            // calcParser.debug_parse();
            calcParser.parse();
        } catch( IOException e ) {
            System.out.println( "Got IOException: " + e + "... Aborting" );
            System.exit(1);
        }
    }
}

5.0 - Using JFlex and Java-CUP Together

6.0 - Extending the Calculator Example

6.1 - Implementing the LET Expression

6.1.1 - Modifying the Lexical Tokens

6.1.2 - Lexical Analyser - Adding Identifiers

{Letter}({Letter}|{Digit})*
              { return makeToken( sym.IDENTIFIER, yytext() ); }
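
To check which strings the identifier rule above accepts, the same pattern can be tried out with `java.util.regex`. This is a sketch, assuming the macros are defined as Letter = [a-zA-Z] and Digit = [0-9] (their definitions are not shown in these notes); the class name `IdentifierSketch` and the method `isIdentifier` are hypothetical.

```java
import java.util.regex.Pattern;

final class IdentifierSketch {
    // Assumed macro definitions: Letter = [a-zA-Z], Digit = [0-9].
    private static final Pattern IDENTIFIER =
        Pattern.compile("[a-zA-Z]([a-zA-Z]|[0-9])*");

    /** Returns true if the whole string has the shape Letter (Letter | Digit)*. */
    static boolean isIdentifier(String text) {
        return IDENTIFIER.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"x", "total2", "2x", ""}) {
            System.out.println("\"" + s + "\" -> " + isIdentifier(s));
        }
    }
}
```

Under these assumptions `x` and `total2` are accepted, while `2x` and the empty string are not. Note that a keyword such as `let` also has this shape. When two JFlex rules match the same longest text, the rule listed first in the specification is chosen, so a keyword rule would typically need to appear before the identifier rule.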