Tutorial 3

For this tutorial, you will need to study the Recursive Descent Parsing notes and you will need a copy of the PL0 Concrete Syntax Document. The first few questions explore adding a repeat statement to PL0 by writing a parsing, including in later questions syntax recovery and building an AST

Question 1

Write a parse method for a Pascal-line repeat statement with the following syntax: $\text{RepeatStatement}\rightarrow\text{KW\_REPEAT StatementList KW\_UNTIL Condition}$
where $\text{KW\_REPEAT}$ and $\text{KW\_UNTIL}$ are lexical tokens, and $\text{StatementList}$ and $\text{Condition}$ are non-terminals. You may assume that there are parse methods parseStatementList() and parseCondition() for the non-terminals $\text{StatementList}$ and $\text{Condition}$ respectively. The stream of tokens is represented by the variable $\text{tokens}$ which mas a method void match(Token) to match a single token. Syntax recovery is handled in Question 2 and building an AST in Question 3.

/**
 * Rule: RepeatStatement -> KW_REPEAT StatementList KW_UNTIL Condition
 */
private void parseRepeatStatement() {
  Tokens.match(Token.KW_REPEAT);
  parseStatementList(); // Parse method for non-terminal StatementList
  Tokens.match(Token.KW_UNTIL);
  parseCondition();     // Parse method for non-terminal Condition
}

Question 2

Extend your parse method from the previous question to handle syntax error recovery. The token stream $\text{tokens}$ has methods parse() and match() described in detail in the Recursive Descent Parsing notes.

// Parse and Match statement inputs:
private void parse(String rule,TokenSet expected,TokenSet recoverSet,ParseVoid parser);
private void parse(String rule,TokenSet expected,TokenSet recoverSet,ParseVoid parser);
void match(Token expected, TokenSet nextSymbols); // With recovery
void match(Token expected);                       // Without recovery
void parseStatementList(TokenSet recoverSet);
void parseCondition(TokenSet recoverSet);
const CONDITION_START_SET // Set of lexcial tokens that can start a Condition

/**
 * Rule: RepeatStatement -> KW_REPEAT StatementList KW_UNTIL Condition
 * @param recoverSet is a set of terminal symbols used for error recovery. At the
 * completion of the method, the current token will normally be the first symbol
 * after the repeat statement, if not, it will be the first symbol after the end of
 * the repeat stateemnt that is in the recoverSet.
 */
private void parseRepeatStatement(TokenSet recoverSet) {
  parse("Repeat Statement", Token.KW_REPEAT, recoverset, 
              () -> {
                tokens.match(Token.KW_REPEAT);
                parseStatementList(recoverSet);
                tokens.match(Token.KW_UNTIL, CONDITION_START_SET);
                parseCondition(recoverSet
              });
}

Question 3

Next, extend your solution from the previous question to return an abstract syntax tree for a repeat statement in the form of a RepeatNode which is a subclass of StatementNode. You may assume the following methods:

StatementNode parseStatementList(TokenSet recoverSet);
ExpNode parseCondition(TokenSet recoverSet);
RepeatNode(Location loc, StatementNode body, ExpNode condition);

// The parse method for a statement, stmt.parse has the following types and hence 
// the (anonymous) fucntion (containing the syntax recogniser) passed as the fourth
// parameter parser to stmt.parse must return a StatementNode.
private StatementNode parse(String rule, TokenSet expected, TokenSet recoverSet, 
    parseNonterminal<StatementNode> parser);
private StatementNode parse(String rule, Token expected, TokenSet recoverSet, 
    parseNonterminal<StatementNode> parser);

/**
 * Rule: RepeatStatement -> KW_REPEAT StatementList KW_UNTIL Condition
 * @param recoverSet is a set of terminal symbols used for error recovery. At the
 * completion of the method, the current token will normally be the first symbol
 * after the repeat statement, if not, it will be the first symbol after the end of
 * the repeat stateemnt that is in the recoverSet.
 * @return an abstract syntax tree for a repeat statement
 */
private StatementNode parseRepeatStatement(TokenSet recoverSet) {
  return stmt.parse("Repeat Statement", Token.KW_REPEAT, recoverSet, 
          () -> {
            // The location of KW_REPEAT, used to generate error message
            Location loc = token.getLocation(); 
            // Cannot fail as we have performed synchronisation 
            tokens.match(Token.KW_REPEAT); 
            StatementNode statements = 
                                parseStatementList(recoverSet.union(Tokens.KW_UNTIL);
            tokens.match(Token.KW_UNTIL, CONDITION_START_SET);
            // RecoverSet doesn't have union as this is the last thing to parse.
            ExpNode cond = parseCondition(recoverSet);
            // Construct and return AST node.
            return new StatementNode.RepeatNode(loc, statements, cond);
          });
}

Question 4

Write a grammar for Boolean expressions that includes the constants true and false, the operators and, or and not, and parentheses. Give or a lower precedence than and, and and a lower precedence than not, and allow repeated nots, as in the Boolean expression not not true. Also be sure your grammar is not ambiguous.

Precedence:
1. OR
2. AND
3. NOT

E\rightarrow E\ \bold{or}\ T\ |\ T\\ T \rightarrow T\ \bold{and}\ F\ | F\\ F\rightarrow \bold{not}\ F\ |\ (E)\ |\ \bold{true}\ |\ \bold{false}

Question 5

Extend the parser for conditions to include the binary logical operators LOG_AND and LOG_OR (represented by ‘&&’ and ‘||’), and the unary logical operator LOG_NOT (represented by ‘!’). The binary operators are left associative and (unlike previous question) have the same precedence than each other but are at a lower precedence than relational operators. The unary operator LOG_NOT has higher precedence than the binary operators LOG_AND and LOG_OR, but lower precedence than relational operators. The non-terminal Condition is defined as follows, where RelCondition corresponds to the version in the RDP notes.
$\footnotesize\text{Condition}\rightarrow\text{LogTerm\{(LOG\_AND | LOG\_OR) LOG\_TERM\}}\\\text{LogTerm}\rightarrow \text{ LOG\_NOT LogTerm | RelCondition}\\\text{RelCondition}\rightarrow\text{Exp [RelOp Exp]}$
You may assume that the scanner supplies tokens LOG_AND, LOG_OR and LOG_NOT for the operators. Your Abstract Syntax Tree structure can make use of the existing BinaryNode and UnaryNode nodes which are sub-classes of ExpNode. Assume operators AND_OP, OR_OP and NOT_OP are in Operator. The parse method for an expression, exp.parse(...), has the following types, and hence the (anonymous) function (which contains the syntax recogniser) passed as the fourth parameter parser to exp.parse must return an ExpNode.

private ExpNode parse(String rule, TokenSet expected, TokenSet recoverSet,
    ParseNonterminal<ExpNode> parser);
private ExpNode parse(String rule, Token expected, TokenSet recoverSet,
    ParseNonterminal<ExpNode> parser);

/**
  * We add a set of the logical binary operator tokens and defined symbols that 
  * can start a logical termn and hence a condition. 
  */
private final static TokenSet CONDITION_OPS_SET = 
        new TokenSet(Token.LOG_AND, Token.LOG_OR);
private final static TokenSet LOG_TERM_START_SET = 
    REL_CONDITION_START_SET.union(Token.LOG_NOT);
private final static TokenSet CONDITION_START_SET = LOG_TERM_START_SET;

/**
 * Rule: Condition -> LogTerm {(LOG_AND | LOG_OR) LogTerm}
 * Accepts a Condition from the input and proesses the corresponding 
 * Abstract Syntax Tree.
 * @param recoverSet is a set of terminal symbols used for error recovery.
 *@ return an abstract syntaxtree for a condition.
 *
 */
private ExpNode parseCondition(TokenSet recoverSet) {
  return exp.parse("Condition", CONDITION_START_SET, recoverSet, 
                    () -> {
            // The current token is in the CONDITION_START_SET
            // Extend the syntax recovery sert with logical operators.
                        ExpNode cond = parseLogTerm(recoverSet.union(CONDITION_OPS_SET);
            while (tokens.isIn(CONDITION_OPS_SET)) {
                          Operator binaryOp = Operator.INVALID_OP; // Initially set sentinel
                            Location opLocation = tokens.getLocation();
              if (tokens.isMatch(Token.LOG_AND)) {
                                binaryOp = Operator.LOG_AND;
                tokens.match(Tokens.LOG_AND); // Cannot fail.
              } else if (tokens.isMatch(Token.LOG_OR)) {
                binaryOp = Operator.LOG_OR;
                tokens.match(Token.LOG_OR);
              } else {
                fatal("Unreachable branch in parseCondition");
              }

              /**
                * The recovery set needs to include teh CONDITION_OPS_SET
                * because there nay be another logical operator 
                * (LOG_AND or LOG_OR) following which will be matched on the 
                * next iteration of the while loop.
                */
              ExpNode right = parseLogTerm(recoverSet.union(CONDITION_OPS_SET));
              /**
                * cond becomes a new binary operator node, with the old value 
                * of cond as its first argument, and right as its 
                * second argument.*/
              cond = new ExpNode.BinaryNode(opLocation, binaryOp, cond, right);
            }
            return cond;
                    });
}

/**
  * Rule: LogTerm -> LOG_NOT LogTerm | RelCondition
  */
private ExpNode parseLogTerm(TokenSet recoverSet) {
  return exp.parse("LogTerm", LOG_TERM_START_SET, recoverSet, 
          () -> {
            ExpNode result = null;
            if (tokens.isMatch(Token.LOG_NOT)) {
              Location opLocation = tokens.getLocation(); // Location of LOG_NOT
              tokens.match(Token.LOG_NOT); // Cannot fail
              result = parseLogTerm(recoverSEt);
              result = new ExpNode.UnaryNode(opLocation, Operator.NOT_OP, result);
            } else if (tokens.isIn(REL_CONDITION_START_SET)) {
              result = parseRelCondition();
            } else {
              fatal("Unreachable breanch in parseLogTerm");
            }
            return result;
          });
}