Lecture 5
1.0 - Syntax Error Recovery
- Programmers make errors - how do we recover from these errors?
- There is no such thing as perfect error recovery
- We focus on error recovery that handles simple common errors.
1.1 - Recovery Strategy for Recursive-Descent Parsing
- Perform local error recovery on matching a single token
- Synchronise the input stream at the start and end of each parse method (parse method → trying to parse a non-terminal symbol)
1.1.1 - Recovery Strategy for Matching a Single Token (Terminal Symbol)
-
What if the current token isn’t KW_DO?
void parseWhileStatement() { tokens.match(Token.KW_WHILE); parseCondition(); tokens.match(Token.KW_DO); // An error is encountered at this step. parseStatement(); } -
It might be missing:
- In this case, we notice that the next token is
is something that starts a statement (and a statement is the next thing to be parsed) - We can assume that the keyword do is missing, and continue parsing as normal.
- In this case, we notice that the next token is
-
A token may be erroneously inserted.
-
A single erroneous token replacing the expected token
-
When the next token is not what we expected it to be, we look at the next token along:
- If after skipping a token, the next token is what we expected it to be, we know that a token was erroneously inserted
- If after skipping a token, the next token isn’t what we expected it to be, we assume that it was a replacement for the token that we were looking for and try to parse the following tokens.
-
We can replace the original
tokens.matchmethod with one that has two arguments:tokens.match(token_to_match, proceeding_set)token_to_matchthe token that we’re trying to match toproceeding seta set of all the tokens that could start the following section / statement
void parseWhileStatement() { tokens.match(Token.KW_WHILE); parseCondition(); tokens.match(Token.KW_DO, STATEMENT_START_SET); parseStatement(); }- If the current token is KW_DO, match it;
- Else if the current token is in STATEMENT_START_SET, assume KW_DO was missing and take no further action
- Else skip the current token and:
- if the current token is KW_DO, assume that the previous token was erroneously inserted and match the new token;
- Else assume that the expected token was replaced by the previous token; take no further action
1.1.2 - Recovery Strategy for Parse Method (Non-Terminal Symbol)
-
We perform synchronisation at the start and end of the parse methods
void parseN(TokenSet recoverSet) { // Skip tokens until the current token is in the expected set or recoverSet // if the new current token is not in expected // Return as though N had been recognised // Returning an error node / value if required // else, recognise the token N recog(N) // Skip tokens until the current token is in recoverSet }expectedis in the set of terminal symbols that are valid at the start of parsing NrecoverSetcontains the terminal symbols that can follow N in the context in which N is being parsed
-
Essentially removing the garbage at the start and end.
1.1.3 - Implementation using Method Parse
We don’t want to hard-code the behaviour that we proposed before for every non-terminal symbol. Let’s create a pattern that does this more succinctly.
void parseN(TokenSet recoverSet) {
parse("N", N_START_SET, recoverSet, () -> {
recog(N)
});
}
-
The parameters of the method
parseare:- The name of the rule for use in error messages
- The set of tokens expected at start of rule
- Set of tokens to recover at on a syntax error
- A function that parses the non-terminal.
-
The parse method performs the synchronisation at the start and end of the
parseNmethod -
So from this, we can modify our
parseWhileStatementmethod from:void parseWhileStatement() { tokens.match(Token.KW_WHILE); parseCondition(); tokens.match(Token.KW_DO, STATEMENT_START_SET); parseStatement(); } -
To the following:
void parseWhileStatement(TokenSet recoverSet) { // Name for debugging, rule start set, recover set, non-terminal parsing function parse("While Statement", Token.KW_WHILE, recoverSet, () -> { // Assume that synchronisation has occurred, and // The current token is KW_WHILE // Since synchronisation has occurred, the current token is KW_WHILE. // Cannot fail as a result of this, so don't need to use the recoverable // method version of tokens.match. tokens.match(Token.KW_WHILE); parseCondition(recoverSet.union(Token.KW_DO)); // [1] tokens.match(Token.KW_DO, STATEMENT_START_SET); // [2] parseStatement(recoverSet); // [3] }); }- [1] Note that our
recoverSetis the union of the existingrecoverSet(that is, the set of tokens on which theWhileStatementcan recover on and KW_DO (as we are up to parsing the condition attached to the while).- Also note that
recoverSet.union(...)doesn’t modify the originalrecoverSetvariable, it returns a new instance
- Also note that
- [2] The arguments to
token.match(...)with recovery are:- The token to try to match
- The set of tokens that can follow
- [3] Since no token can follow (for the purpose of syntax recovery), we just use the non-recover version of
token.match(...)
1.1.4 - Recovery Strategy Example for
RelConditionSuppose we wanted to parse a
RelCondition, which is defined by the following productionvoid parseRelStatement(TokenSet recoverSet) { parse("RelCondition", REL_COND_START_SET, recoverSet, // Anonymous parse method for parsing a RelCondition () -> { // The current token is in REL_COND_START_SET parseExp(recoverSet.union(REL_OPS_SET)); // [1] // Parse the optional [RelOp Exp] portion of the production [2] if (tokens.isIn(REL_OPS_SET)) { parseRelOp(recoverSet.union(EXP_START_SET); parseExp(recoverSet); } }); }- [1] The
recoverSetfor parsing the expression is the union of:- Anything that
RelConditioncould recover on (this is inrecoverSetalready) - Anything that an Expression could recover on
- Anything that
- [2] When parsing the
RelOpset, therecoverSetwould be comprised of- The
recoverSetfor theRelCondition - Anything that could start an expression
- The
- [1] Note that our
1.2 - Parsing: The Big Picture
- During the syntax analysis phase, the compiler needs to
- Recognise the grammatical structure of a sequence of lexical tokens;
- While performing error recovery where possible, and; recover as best as we can so we can report as much of the error to the user as possible
- Build an Abstract Syntax Tree (AST) and symbol table
1.2.1 - Parsing and Building an AST
-
Consider parsing the following production with error recovery.
- Key -
Building AST|Error Recovery| Parsing Code
ExpNode parseFactor(TokenSet recoverSet) { // Use exp.parse at it returns an expression node // Identical function to return exp.parse("Factor", FACTOR_START_SET, recoverSet, () -> { // Assume synchronisation has been performed, // the current token is in FACTOR_START_SET ExpNode result = null; // There are three symbols that could start a Factor production: // LPAREN, NUMBER or LVALUE. Parse each of them. if (token.isMatch(Token.LPAREN)) { // Cannot fail so don't use recovery version tokens.match(Token.LPAREN); result = parseCondition(recoverSet.union(Token.RPAREN)); // Note that we use the recoverset as the set of tokens that can // follow the Token.RPAREN - we don't actually know what comes after // but we can give it ALL of the possible tokens that could come // after + some more (could reduce this set based on context). tokens.match(Token.RPAREN, recoverSet); // Can fail } else if (tokens.isMatch(Token.NUMBER)) { result = new ExpNode.ConstNode(tokens.getLocation(), Predefined.INTEGER_TYPE, tokens.getIntValue()); tokens.match(Token.NUMBER); // Cannot fail, move pointer past token } else if (tokens.isMatch(Token.IDENTIFIER)) { result = parseLValue(recoverSet); } else { // Defensive programming in the case FACTOR_START_SET // has an extra unintended token. fatal("parseFactor"); } return result; }); } - Key -
-
This code here integrates three key things:
- Recognising the tokens
- Building the abstract syntax tree
- Performing (simple) error recovery.
1.2.2 - Construction of Expression Parse Method
Notice that in the code above, we have used exp.parse. How do we construct this?
-
The code to construct the expression parser
expParseMethod<ExpNode> exp = new ParseMethod<>( ... (Location loc) -> new ExpNode.ErrorNode(loc));- Notice here that the
ParseMethodis configured to return anExpNode. - The fourth parameter of this function (the anonymous function) must also return an ExpNode.
- Notice here that the