Lecture 5
1.0 - Syntax Error Recovery
- Programmers make errors - how do we recover from these errors?
- There is no such thing as perfect error recovery
- We focus on error recovery that handles simple common errors.
1.1 - Recovery Strategy for Recursive-Descent Parsing
- Perform local error recovery on matching a single token
- Synchronise the input stream at the start and end of each parse method (a parse method is a method that tries to parse one non-terminal symbol)
1.1.1 - Recovery Strategy for Matching a Single Token (Terminal Symbol)
- What if the current token isn’t KW_DO?
void parseWhileStatement() {
    tokens.match(Token.KW_WHILE);
    parseCondition();
    tokens.match(Token.KW_DO); // An error is encountered at this step.
    parseStatement();
}
- The expected token might be missing:
  - In this case, we notice that the next token is something that starts a statement (and a statement is the next thing to be parsed).
  - We can assume that the keyword do is missing, and continue parsing as normal.
- A token may have been erroneously inserted before the expected token.
- A single erroneous token may have replaced the expected token.
- When the current token is neither what we expected nor something that can start the next construct, we look at the next token along:
  - If, after skipping a token, the next token is what we expected, we know that a token was erroneously inserted.
  - If, after skipping a token, the next token still isn’t what we expected, we assume that the skipped token was a replacement for the token we were looking for, and try to parse the following tokens.
- We can replace the original tokens.match method with one that takes two arguments: tokens.match(token_to_match, proceeding_set)
  - token_to_match: the token that we’re trying to match
  - proceeding_set: the set of all the tokens that could start the following section / statement
void parseWhileStatement() {
    tokens.match(Token.KW_WHILE);
    parseCondition();
    tokens.match(Token.KW_DO, STATEMENT_START_SET);
    parseStatement();
}
- If the current token is KW_DO, match it;
- Else if the current token is in STATEMENT_START_SET, assume KW_DO was missing and take no further action;
- Else skip the current token and:
  - If the new current token is KW_DO, assume that the skipped token was erroneously inserted and match the new token;
  - Else assume that the skipped token replaced the expected token, and take no further action.
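The decision steps above can be sketched as a small, self-contained class. This is only an illustration under stated assumptions: the `Tok` enum and the `RecoveringTokens` class are hypothetical stand-ins, not the token stream used in the lecture code, and error reporting is reduced to collecting strings.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical token kinds, just enough for a while statement.
enum Tok { KW_WHILE, KW_DO, IDENTIFIER, SEMICOLON, EOF }

// Hypothetical stand-in for a token stream with a recovering match.
class RecoveringTokens {
    private final List<Tok> input;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    RecoveringTokens(List<Tok> input) { this.input = input; }

    // The current token, or EOF once the input is exhausted.
    Tok current() { return pos < input.size() ? input.get(pos) : Tok.EOF; }

    // Match `expected`; `follows` holds the tokens that may start what comes next.
    void match(Tok expected, Set<Tok> follows) {
        if (current() == expected) {
            pos++;                        // normal case: consume the token
            return;
        }
        errors.add("expected " + expected + " but found " + current());
        if (follows.contains(current())) {
            return;                       // expected token is missing: consume nothing
        }
        if (current() != Tok.EOF) {
            pos++;                        // skip the offending token
        }
        if (current() == expected) {
            pos++;                        // skipped token was inserted: match the real one
        }
        // Otherwise the skipped token replaced the expected one: nothing more to do.
    }

    public static void main(String[] args) {
        // "; do x" where "do x" was intended: the semicolon is an inserted token.
        RecoveringTokens t = new RecoveringTokens(
                List.of(Tok.SEMICOLON, Tok.KW_DO, Tok.IDENTIFIER));
        t.match(Tok.KW_DO, EnumSet.of(Tok.IDENTIFIER));
        System.out.println(t.current() + " " + t.errors);
    }
}
```

In all three error cases the sketch records one error and leaves the stream positioned at the token that should start the following statement, which is what lets the caller carry on parsing.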
1.1.2 - Recovery Strategy for Parse Method (Non-Terminal Symbol)
- We perform synchronisation at the start and end of the parse methods
void parseN(TokenSet recoverSet) {
    // Skip tokens until the current token is in the expected set or recoverSet
    // if the new current token is not in expected
    //     Return as though N had been recognised
    //     Returning an error node / value if required
    // else, recognise N
    recog(N);
    // Skip tokens until the current token is in recoverSet
}
  - expected is the set of terminal symbols that are valid at the start of parsing N
  - recoverSet contains the terminal symbols that can follow N in the context in which N is being parsed
- Essentially, this removes the garbage tokens at the start and end of N.
1.1.3 - Implementation using Method Parse
We don’t want to hard-code the behaviour that we proposed before for every non-terminal symbol. Let’s create a pattern that does this more succinctly.
void parseN(TokenSet recoverSet) {
    parse("N", N_START_SET, recoverSet, () -> {
        recog(N);
    });
}
- The parameters of the method parse are:
  - The name of the rule, for use in error messages
  - The set of tokens expected at the start of the rule
  - The set of tokens to recover at on a syntax error
  - A function that parses the non-terminal.
- The parse method performs the synchronisation at the start and end of the parseN method.
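One way such a parse helper might be written is sketched below. Everything here is an assumption made for illustration: the `Kind` enum, the `SyncParser` class, and the use of `Runnable` for the parsing function are hypothetical, and the real helper may differ in how it reports errors and what it returns.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical token kinds for the demo.
enum Kind { NUMBER, SEMICOLON, JUNK, EOF }

// Hypothetical parser holding the token cursor and the generic parse helper.
class SyncParser {
    private final List<Kind> input;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    SyncParser(List<Kind> input) { this.input = input; }

    Kind current() { return pos < input.size() ? input.get(pos) : Kind.EOF; }

    void advance() { if (pos < input.size()) pos++; }

    // Skip tokens until the current one is in `stop` or the input runs out.
    private void skipUntil(Set<Kind> stop) {
        while (current() != Kind.EOF && !stop.contains(current())) advance();
    }

    // Synchronise, run the rule body, then synchronise again.
    void parse(String rule, Set<Kind> startSet, Set<Kind> recoverSet, Runnable body) {
        if (!startSet.contains(current())) {
            errors.add(rule + ": unexpected " + current() + " at start");
            Set<Kind> stop = EnumSet.noneOf(Kind.class);
            stop.addAll(startSet);
            stop.addAll(recoverSet);
            skipUntil(stop);
        }
        if (!startSet.contains(current())) {
            return;                      // give up on this rule, as though it was recognised
        }
        body.run();                      // recognise the non-terminal
        if (!recoverSet.contains(current())) {
            errors.add(rule + ": unexpected " + current() + " at end");
            skipUntil(recoverSet);
        }
    }

    // N -> NUMBER, written with the helper instead of hand-coded synchronisation.
    void parseN(Set<Kind> recoverSet) {
        parse("N", EnumSet.of(Kind.NUMBER), recoverSet, this::advance);
    }

    public static void main(String[] args) {
        SyncParser p = new SyncParser(
                List.of(Kind.JUNK, Kind.NUMBER, Kind.JUNK, Kind.SEMICOLON));
        p.parseN(EnumSet.of(Kind.SEMICOLON));
        System.out.println(p.current() + " " + p.errors);
    }
}
```

Passing the rule body as a function keeps the synchronisation logic in one place, so each parseN method only has to state its name, start set, and body.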
- With the parse helper, we can modify our parseWhileStatement method from:
void parseWhileStatement() {
    tokens.match(Token.KW_WHILE);
    parseCondition();
    tokens.match(Token.KW_DO, STATEMENT_START_SET);
    parseStatement();
}
- To the following:
void parseWhileStatement(TokenSet recoverSet) {
    // Name for debugging, rule start set, recover set, non-terminal parsing function
    parse("While Statement", Token.KW_WHILE, recoverSet, () -> {
        // Since synchronisation has occurred, the current token is KW_WHILE.
        // This match cannot fail, so we don't need the recoverable
        // version of tokens.match.
        tokens.match(Token.KW_WHILE);
        parseCondition(recoverSet.union(Token.KW_DO));   // [1]
        tokens.match(Token.KW_DO, STATEMENT_START_SET);  // [2]
        parseStatement(recoverSet);                      // [3]
    });
}
- [1] Note that the recoverSet passed to parseCondition is the union of the existing recoverSet (that is, the set of tokens on which the WhileStatement can recover) and KW_DO (as KW_DO is what follows the condition attached to the while).
  - Also note that recoverSet.union(...) doesn’t modify the original recoverSet variable; it returns a new instance.
- [2] The arguments to tokens.match(...) with recovery are:
  - The token to try to match
  - The set of tokens that can follow
- [3] Since no further token of the while statement can follow the statement (for the purpose of syntax recovery), there is nothing to add to the recover set, so we pass recoverSet through unchanged.
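The non-mutating behaviour of union noted in [1] matters because the same recoverSet is reused for every call in the method. A minimal sketch of a set with that behaviour follows, assuming an EnumSet-backed implementation; `DemoToken` and `TokenSetSketch` are hypothetical names and are not the course's TokenSet class.

```java
import java.util.EnumSet;

// Hypothetical token kinds for the demo.
enum DemoToken { KW_WHILE, KW_DO, SEMICOLON, EOF }

// Hypothetical immutable token set: union returns a new set and leaves this one alone.
final class TokenSetSketch {
    private final EnumSet<DemoToken> members;

    TokenSetSketch(DemoToken first, DemoToken... rest) {
        this.members = EnumSet.of(first, rest);
    }

    private TokenSetSketch(EnumSet<DemoToken> members) {
        this.members = members;
    }

    // Build the union on a copy so the receiver is never modified.
    TokenSetSketch union(DemoToken... extra) {
        EnumSet<DemoToken> copy = EnumSet.copyOf(members);
        for (DemoToken t : extra) {
            copy.add(t);
        }
        return new TokenSetSketch(copy);
    }

    boolean contains(DemoToken t) { return members.contains(t); }

    public static void main(String[] args) {
        TokenSetSketch recoverSet = new TokenSetSketch(DemoToken.SEMICOLON, DemoToken.EOF);
        TokenSetSketch forCondition = recoverSet.union(DemoToken.KW_DO);
        System.out.println(forCondition.contains(DemoToken.KW_DO)); // true
        System.out.println(recoverSet.contains(DemoToken.KW_DO));   // false
    }
}
```

If union mutated the receiver instead, KW_DO would leak into the set later passed to parseStatement, and recovery there could stop on a token that cannot legally follow the statement.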
1.1.4 - Recovery Strategy Example for RelCondition
Suppose we wanted to parse a RelCondition, which is defined by the following production:
RelCondition → Exp [ RelOp Exp ]
void parseRelCondition(TokenSet recoverSet) {
    parse("RelCondition", REL_COND_START_SET, recoverSet,
        // Anonymous parse method for parsing a RelCondition
        () -> {
            // The current token is in REL_COND_START_SET
            parseExp(recoverSet.union(REL_OPS_SET));          // [1]
            // Parse the optional [RelOp Exp] portion of the production
            if (tokens.isIn(REL_OPS_SET)) {
                parseRelOp(recoverSet.union(EXP_START_SET));  // [2]
                parseExp(recoverSet);
            }
        });
}
- [1] The recoverSet for parsing the first expression is the union of:
  - Anything that RelCondition could recover on (this is in recoverSet already)
  - Any relational operator (REL_OPS_SET), since a RelOp can follow the first expression
- [2] When parsing the RelOp, the recoverSet is comprised of:
  - The recoverSet for the RelCondition
  - Anything that could start an expression (EXP_START_SET)
1.2 - Parsing: The Big Picture
- During the syntax analysis phase, the compiler needs to
- Recognise the grammatical structure of a sequence of lexical tokens;
- Perform error recovery where possible, recovering as best we can so that as many errors as possible can be reported to the user; and
- Build an Abstract Syntax Tree (AST) and symbol table
1.2.1 - Parsing and Building an AST
- Consider parsing the following production with error recovery:
Factor → LPAREN Condition RPAREN | NUMBER | LValue
ExpNode parseFactor(TokenSet recoverSet) {
    // Use exp.parse as it returns an expression node;
    // it is otherwise identical in function to parse.
    return exp.parse("Factor", FACTOR_START_SET, recoverSet, () -> {
        // Assume synchronisation has been performed:
        // the current token is in FACTOR_START_SET
        ExpNode result = null;
        // There are three symbols that could start a Factor production:
        // LPAREN, NUMBER or LVALUE. Parse each of them.
        if (tokens.isMatch(Token.LPAREN)) {
            // Cannot fail so don't use recovery version
            tokens.match(Token.LPAREN);
            result = parseCondition(recoverSet.union(Token.RPAREN));
            // Note that we use the recoverSet as the set of tokens that can
            // follow the Token.RPAREN - we don't actually know what comes after,
            // but we can give it ALL of the possible tokens that could come
            // after + some more (could reduce this set based on context).
            tokens.match(Token.RPAREN, recoverSet); // Can fail
        } else if (tokens.isMatch(Token.NUMBER)) {
            result = new ExpNode.ConstNode(tokens.getLocation(),
                    Predefined.INTEGER_TYPE, tokens.getIntValue());
            tokens.match(Token.NUMBER); // Cannot fail, move pointer past token
        } else if (tokens.isMatch(Token.IDENTIFIER)) {
            result = parseLValue(recoverSet);
        } else {
            // Defensive programming in the case FACTOR_START_SET
            // has an extra unintended token.
            fatal("parseFactor");
        }
        return result;
    });
}
- This code integrates three key things:
- Recognising the tokens
- Building the abstract syntax tree
- Performing (simple) error recovery.
1.2.2 - Construction of Expression Parse Method
Notice that in the code above, we have used exp.parse. How do we construct this?
- The code to construct the expression parser exp:
ParseMethod<ExpNode> exp = new ParseMethod<>( ... (Location loc) -> new ExpNode.ErrorNode(loc));
  - Notice here that the ParseMethod is configured to return an ExpNode.
  - The fourth parameter of exp.parse(...) (the anonymous function) must also return an ExpNode.
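To make the role of the error-node function concrete, here is a rough sketch of a generic parse method that returns a value. It is an assumption-laden illustration, not the course's ParseMethod class: `ParseMethodSketch` is a hypothetical name, tokens are plain strings, the cursor is a one-element int array, and a location is just the cursor position.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical generic parse method: T is the type of node the rule produces.
class ParseMethodSketch<T> {
    // Builds the value to return when the rule cannot be parsed at all.
    private final Function<Integer, T> errorNode;

    ParseMethodSketch(Function<Integer, T> errorNode) {
        this.errorNode = errorNode;
    }

    // `pos` is a one-element array acting as a shared, mutable cursor into `tokens`.
    T parse(List<String> tokens, int[] pos, Set<String> startSet,
            Set<String> recoverSet, Supplier<T> body) {
        // Synchronise at the start: skip until a start or recovery token.
        while (pos[0] < tokens.size()
                && !startSet.contains(tokens.get(pos[0]))
                && !recoverSet.contains(tokens.get(pos[0]))) {
            pos[0]++;
        }
        if (pos[0] >= tokens.size() || !startSet.contains(tokens.get(pos[0]))) {
            return errorNode.apply(pos[0]);   // same type T as a successful parse
        }
        T result = body.get();                // the rule body must also return a T
        // Synchronise at the end: skip until a recovery token.
        while (pos[0] < tokens.size() && !recoverSet.contains(tokens.get(pos[0]))) {
            pos[0]++;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("junk", "42", ";");
        int[] pos = {0};
        ParseMethodSketch<String> exp = new ParseMethodSketch<>(loc -> "error@" + loc);
        String node = exp.parse(tokens, pos, Set.of("42"), Set.of(";"),
                () -> "const " + tokens.get(pos[0]++));
        System.out.println(node); // const 42
    }
}
```

Because both the error function and the rule body produce a T, callers such as parseFactor always receive a node of the right type, even when recovery abandons the rule.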