Tutorial 3
For this tutorial, you will need to study the Recursive Descent Parsing notes and you will need a copy of the PL0 Concrete Syntax Document. The first few questions explore adding a repeat statement to PL0 by writing a parsing, including in later questions syntax recovery and building an AST
Question 1
Write a parse method for a Pascal-line repeat statement with the following syntax:
whereand are lexical tokens, and and are non-terminals. You may assume that there are parse methods parseStatementList()
andparseCondition()
for the non-terminalsand respectively. The stream of tokens is represented by the variable which mas a method void match(Token) to match a single token. Syntax recovery is handled in Question 2 and building an AST in Question 3.
/**
* Rule: RepeatStatement -> KW_REPEAT StatementList KW_UNTIL Condition
*/
private void parseRepeatStatement() {
Tokens.match(Token.KW_REPEAT);
parseStatementList(); // Parse method for non-terminal StatementList
Tokens.match(Token.KW_UNTIL);
parseCondition(); // Parse method for non-terminal Condition
}
Question 2
Extend your parse method from the previous question to handle syntax error recovery. The token stream
has methods parse()
andmatch()
described in detail in the Recursive Descent Parsing notes.
// Parse and Match statement inputs:
private void parse(String rule,TokenSet expected,TokenSet recoverSet,ParseVoid parser);
private void parse(String rule,TokenSet expected,TokenSet recoverSet,ParseVoid parser);
void match(Token expected, TokenSet nextSymbols); // With recovery
void match(Token expected); // Without recovery
void parseStatementList(TokenSet recoverSet);
void parseCondition(TokenSet recoverSet);
const CONDITION_START_SET // Set of lexcial tokens that can start a Condition
/**
* Rule: RepeatStatement -> KW_REPEAT StatementList KW_UNTIL Condition
* @param recoverSet is a set of terminal symbols used for error recovery. At the
* completion of the method, the current token will normally be the first symbol
* after the repeat statement, if not, it will be the first symbol after the end of
* the repeat stateemnt that is in the recoverSet.
*/
private void parseRepeatStatement(TokenSet recoverSet) {
parse("Repeat Statement", Token.KW_REPEAT, recoverset,
() -> {
tokens.match(Token.KW_REPEAT);
parseStatementList(recoverSet);
tokens.match(Token.KW_UNTIL, CONDITION_START_SET);
parseCondition(recoverSet
});
}
Question 3
Next, extend your solution from the previous question to return an abstract syntax tree for a repeat statement in the form of a
RepeatNode
which is a subclass ofStatementNode
. You may assume the following methods:
StatementNode parseStatementList(TokenSet recoverSet);
ExpNode parseCondition(TokenSet recoverSet);
RepeatNode(Location loc, StatementNode body, ExpNode condition);
// The parse method for a statement, stmt.parse has the following types and hence
// the (anonymous) fucntion (containing the syntax recogniser) passed as the fourth
// parameter parser to stmt.parse must return a StatementNode.
private StatementNode parse(String rule, TokenSet expected, TokenSet recoverSet,
parseNonterminal<StatementNode> parser);
private StatementNode parse(String rule, Token expected, TokenSet recoverSet,
parseNonterminal<StatementNode> parser);
/**
* Rule: RepeatStatement -> KW_REPEAT StatementList KW_UNTIL Condition
* @param recoverSet is a set of terminal symbols used for error recovery. At the
* completion of the method, the current token will normally be the first symbol
* after the repeat statement, if not, it will be the first symbol after the end of
* the repeat stateemnt that is in the recoverSet.
* @return an abstract syntax tree for a repeat statement
*/
private StatementNode parseRepeatStatement(TokenSet recoverSet) {
return stmt.parse("Repeat Statement", Token.KW_REPEAT, recoverSet,
() -> {
// The location of KW_REPEAT, used to generate error message
Location loc = token.getLocation();
// Cannot fail as we have performed synchronisation
tokens.match(Token.KW_REPEAT);
StatementNode statements =
parseStatementList(recoverSet.union(Tokens.KW_UNTIL);
tokens.match(Token.KW_UNTIL, CONDITION_START_SET);
// RecoverSet doesn't have union as this is the last thing to parse.
ExpNode cond = parseCondition(recoverSet);
// Construct and return AST node.
return new StatementNode.RepeatNode(loc, statements, cond);
});
}
Question 4
Write a grammar for Boolean expressions that includes the constants true and false, the operators and, or and not, and parentheses. Give or a lower precedence than and, and and a lower precedence than not, and allow repeated nots, as in the Boolean expression not not true. Also be sure your grammar is not ambiguous.
- Precedence:
- OR
- AND
- NOT
Question 5
Extend the parser for conditions to include the binary logical operators LOG_AND and LOG_OR (represented by ‘&&’ and ‘||’), and the unary logical operator LOG_NOT (represented by ‘!’). The binary operators are left associative and (unlike previous question) have the same precedence than each other but are at a lower precedence than relational operators. The unary operator LOG_NOT has higher precedence than the binary operators LOG_AND and LOG_OR, but lower precedence than relational operators. The non-terminal Condition is defined as follows, where
RelCondition
corresponds to the version in the RDP notes.
You may assume that the scanner supplies tokens LOG_AND, LOG_OR and LOG_NOT for the operators. Your Abstract Syntax Tree structure can make use of the existingBinaryNode
andUnaryNode
nodes which are sub-classes ofExpNode
. Assume operators AND_OP, OR_OP and NOT_OP are in Operator. The parse method for an expression,exp.parse(...)
, has the following types, and hence the (anonymous) function (which contains the syntax recogniser) passed as the fourth parameter parser toexp.parse
must return anExpNode
.
private ExpNode parse(String rule, TokenSet expected, TokenSet recoverSet,
ParseNonterminal<ExpNode> parser);
private ExpNode parse(String rule, Token expected, TokenSet recoverSet,
ParseNonterminal<ExpNode> parser);
/**
* We add a set of the logical binary operator tokens and defined symbols that
* can start a logical termn and hence a condition.
*/
private final static TokenSet CONDITION_OPS_SET =
new TokenSet(Token.LOG_AND, Token.LOG_OR);
private final static TokenSet LOG_TERM_START_SET =
REL_CONDITION_START_SET.union(Token.LOG_NOT);
private final static TokenSet CONDITION_START_SET = LOG_TERM_START_SET;
/**
* Rule: Condition -> LogTerm {(LOG_AND | LOG_OR) LogTerm}
* Accepts a Condition from the input and proesses the corresponding
* Abstract Syntax Tree.
* @param recoverSet is a set of terminal symbols used for error recovery.
*@ return an abstract syntaxtree for a condition.
*
*/
private ExpNode parseCondition(TokenSet recoverSet) {
return exp.parse("Condition", CONDITION_START_SET, recoverSet,
() -> {
// The current token is in the CONDITION_START_SET
// Extend the syntax recovery sert with logical operators.
ExpNode cond = parseLogTerm(recoverSet.union(CONDITION_OPS_SET);
while (tokens.isIn(CONDITION_OPS_SET)) {
Operator binaryOp = Operator.INVALID_OP; // Initially set sentinel
Location opLocation = tokens.getLocation();
if (tokens.isMatch(Token.LOG_AND)) {
binaryOp = Operator.LOG_AND;
tokens.match(Tokens.LOG_AND); // Cannot fail.
} else if (tokens.isMatch(Token.LOG_OR)) {
binaryOp = Operator.LOG_OR;
tokens.match(Token.LOG_OR);
} else {
fatal("Unreachable branch in parseCondition");
}
/**
* The recovery set needs to include teh CONDITION_OPS_SET
* because there nay be another logical operator
* (LOG_AND or LOG_OR) following which will be matched on the
* next iteration of the while loop.
*/
ExpNode right = parseLogTerm(recoverSet.union(CONDITION_OPS_SET));
/**
* cond becomes a new binary operator node, with the old value
* of cond as its first argument, and right as its
* second argument.*/
cond = new ExpNode.BinaryNode(opLocation, binaryOp, cond, right);
}
return cond;
});
}
/**
* Rule: LogTerm -> LOG_NOT LogTerm | RelCondition
*/
private ExpNode parseLogTerm(TokenSet recoverSet) {
return exp.parse("LogTerm", LOG_TERM_START_SET, recoverSet,
() -> {
ExpNode result = null;
if (tokens.isMatch(Token.LOG_NOT)) {
Location opLocation = tokens.getLocation(); // Location of LOG_NOT
tokens.match(Token.LOG_NOT); // Cannot fail
result = parseLogTerm(recoverSEt);
result = new ExpNode.UnaryNode(opLocation, Operator.NOT_OP, result);
} else if (tokens.isIn(REL_CONDITION_START_SET)) {
result = parseRelCondition();
} else {
fatal("Unreachable breanch in parseLogTerm");
}
return result;
});
}