Lecture 15
1.0 - LALR(1) Parsing in Java-CUP
- The Java-CUP Parser Generator generates a LALR(1) Parser and its complementary parsing automaton.
1.1 - Running the Java-CUP Parser Generator
-
We defined this simple grammar before, in the Bottom-Up Parsing Lecture, in which we have the non-terminal symbols
. -
The grammar is defined by two productions
-
We can define this in Java-CUP using the following syntax
import java_cup.runtime.*; terminal a; terminal LPAREN, RPAREN; nonterminal A; A ::= a | LPAREN A RPAREN;
-
When running the Java-CUP Parser Generator, we can use the -dump option to get it to display the LALR(1) Parsing Automaton it generates.
1.1.1 - Grammar Summary
-
The first thing in Java-CUP’s output (with the -dump option) is a summary of the grammar
===== Terminals ===== [0]EOF [1]error [2]a [3]LPAREN [4]RPAREN ===== Non terminals ===== [0] A ===== Productions ===== [0] A ::= a [1] $START ::= A EOF [2] A ::= LPAREN A RPAREN
- Terminals
- JavaCUP inserts two new terminal symbols - EOF symbol and error symbol.
- Non-Terminals
- Productions
- Our two productions from our original grammar are [0] and [2].
- JavaCUP inserts the production
$START :== A EOF
just as we did in the theory before - it introduces a new start symbol,START
, and a new production where the new start symbol produces the original start symbol. - This production has a small technical difference to our lecture example - here JavaCUP inserts the EOF symbol at the end of the production.
1.1.2 - Automaton Summary - Viable Prefix Recogniser
- For each state, we define the LR(1) parsing items - which are pairs of LR(0) parsing items, and lookahead sets.
===== Viable Prefix Recogniser =====
START lalr_state [0]: {
[A ::= (*) LPAREN A RPAREN, {EOF }]
[$START ::= (*) A EOF, {EOF }]
[A ::= (*) a, {EOF }]
}
transition on LPAREN to state [3]
transition on a to state [2]
transition on A to state [1]
1.1.3 - Action Table
-
The Action Table is a summary of parsing actions
From state #0 [term 2:SHIFT(to state 2)] [term 3:SHIFT(to state 3)]
-
In this case term x is given by the terminal symbol summary above:
===== Terminals ===== [0]EOF [1]error [2]a [3]LPAREN [4]RPAREN
- Where term 2 = a, term 3 = LPAREN
1.1.4 - Reduce Table
-
The Reduce Table is a summary of transitions out of each state on non-terminal symbols
-
Once the parser generates the LALR(1) parsing automaton, it really only needs to execute the Action Table and Reduce Table.
----------------------------- -------- REDUCE_TABLE-------- From state #0 [non term 0->state 1] From state #1
1.2 - Giving JavaCUP a Non-LALR(1) Grammar
-
What happens when we give JavaCUP a non-LALR(1) grammar?
-
This grammar is not LALR(1) as it is ambiguous
-
It’s defined in JavaCUP syntax as follows:
import jav_cup.runtime.*; terminal n; terminal LPAREN, RPAREN; nonterminal L, E; L ::= /* empty */ | E | L E; E ::= n | LPAREN L RPAREN;
-
When we try to run this however, we get a series of error messages saying that JavaCUP has run into a whole host of action conflicts.
-
Removing the ambiguity (i.e. removing the
production) gives us a LALR(1) grammar, and therefore, JavaCUP is able to generate a parsing automaton.
1.3 - Giving JavaCUP an Ambiguous Grammar
-
Suppose we introduce the dangling-else problem
Statement ::= KW_WHILE Condition:c KW_DO Statement:s {: RESULT = new StatementNode.WhileNode(cxleft, c, s); :} | ... | KW_IF Condition:c KW_THEN Statement:s1 ElsePart:s2 {: RESULT = new StatementNode.IfNode(cxleft, c, s1, s2); :} ElsePart ::= KW_ELSE Statement:s2 {: RESULT = s2; :} | /* empty */ {: RESULT = new StatementNode.SkipNode(ErrorHandler.NO_LOCATION); :} %prec KW_IF ;
-
When we attempt to create the parsing automaton for this ambiguous grammar, we get conflicts:
Error : *** More conflicts encountered than expected - parser generation aborted ------ Cup v0.11b 20160615 (GIT 4ac7450) Parser Generation Summary ------- 1 error and 3 warnings 43 terminals, 33 non-terminals and 79 productions declared, producing 130 unique parsing states ...
-
To solve this, we can either:
- Come up with a better grammar that resolves the ambiguity
- Specify precedence in PL0.cup which tells JavaCUP how to resolve ambiguities
- We can tell JavaCUP to match the else to the closest if using precedence operators.
-
To specify precedence, we can write the following, which specifies that matching KW_ELSE has a higher precedence than matching KW_IF
precedence nonassoc KW_IF; precedence nonassoc KW_ELSE;
-
Note that in the code block above, we use the syntax
%prec
KW_IF which essentially means that we give the null alternative the same precedence as KW_IF (meaning that matching KW_ELSE has a higher precedence).- Therefore, JavaCUP will always try to match the KW_ELSE if possible, and default to matching the empty alternative otherwise.
-
Re-running the JavaCUP Parser Generator, we receive no errors, and the parsing automaton is generated properly.
1.4 - JavaCUP Syntax Error Recovery
- JavaCUP attempts to perform Syntax Error Recovery.
- When JavaCUP detects a syntax error, it tries to replace some of the input with an error terminal symbol and continues.
- We can use this symbol to try to recover.
1.4.1 - JavaCUP Syntax Error Recovery in Practice - LValue
-
Here, if we can’t match an LValue, we return an ExpNode.ErrorNode to try to handle errors that arise..
LValue ::= IDENTIFIER:id {: /* At this stage the identifier could be either a constant identifier or * a variable identifier but which cannot be determined until static * checking when the IdentifierNode will be transformed into either * ConstNode or a VariableNode or detected as invalid. */ RESULT = new ExpNode.IdentifierNode(idxleft, id); :} | error:e {: RESULT = new ExpNode.ErrorNode(exleft); :}
-
We use LValues in Factors:
Factor ::= PLUS Factor:e {: RESULT = e; :} | UnaryOperator:op Factor:e {: RESULT = new ExpNode.UnaryNode(opxleft, op, e); :} | LPAREN Condition:c RPAREN {: RESULT = c; :} | NUMBER:n {: RESULT = new ExpNode.ConstNode(nxleft, Predefined.INTEGER_TYPE, n); :} | LValue:lval {: RESULT = lval; :} ;
-
Suppose we wanted to introduce an error alternative into Factor, if we couldn’t match a Factor:
| error:e {: RESULT = new ExpNode.ErrorNode(exleft); :}
- However, this is the wrong thing to do, as an LValue already has an error alternative; this would introduce an ambiguity as the compiler doesn’t know whether to match the error alternative of LValue or Factor.
-
Notice that since
, and none of these need an error alternative. -
In general, our approach is to put these error alternatives as deeply nested as possible in our syntax tree.
-
In addition to
, a also has an error alternative.
2.0 - Symbol Table for LALR(1) Compilers.
This example is talking specifically about the Assignment 2 LALR(1) Compiler.
-
The parser generates both an Abstract Syntax Tree and a Symbol Table.
-
The type checker type checks both the AST and Symbol Table.
-
The symbol tracks information about all of the identifiers defined in our program
-
Generate symbol table for PL0, which is a procedural programming language
- This means that our program is hierarchically structured into procedures
- Therefore, our symbol table is hierarchically structured into scopes.
-
Consider the following PL0 program.
-
The main method has a scope, which keeps track of the following:
- Declaration of variables
x
,y
&res
- Declaration of
procedure C
- The main method has a parent scope - this special predefined scope is the scope that contains all pre-defined constants and types in our program.
- Declaration of variables
-
The scope of
procedure C
keeps track of the following:- Declaration of variables
n
&f
- Declaration of
procedure fact
- This scope’s parent scope is the main procedure’s scope.
- Declaration of variables
-
The scope of procedure fact has:
- (no declarations)
- Parent scope is the scope of procedure C
-
The static level describes how deep a scope is
Static Lvl Scope 0 Predefined Scope 1 Main Method 2 procedure C (& any procedure defined in main) 3 procedure fact ... ...
var x: int; // first argument to C
y: int; // second argument to C
res: int; // result from C
procedure C() =
var n: int; // argument to fact
f: int; // result from fact
procedure fact() =
begin
// assume 0 <= n
if n = 0 then
f := 1
// f = n!
else // 0 < n
begin
n := n - 1;
call fact();
// f = n!
n:= n + 1;
f := f * n
// f = n!
end
end; // fact
begin // C
n := x;
call fact();
// f = x!
res := f;
n := y;
call fact();
// f = y!
res := res / f;
// res = x! / y!
n := x - y;
call fact();
// f = (x-y)!
res := res / f
// res = x! / (y! * (x-y!))
end; // C
begin // main program
x := 6;
y := 2;
call C();
write res
end
-
We can observe that the Scope method implements these features:
public class Scope { /** * Offset of start of local variables from frame pointer */ final static int LOCALS_BASE = 3; /** * Parent Scope */ private final Scope parent; /** * Static level of this scope */ private final int level; /** * Symbol table entry for the procedure (main program) that owns this scope */ private final SymEntry.ProcedureEntry ownerEntry; }
-
Note that the ownerEntry is an interesting feature:
- The main procedure “owns” the variable declarations of
x
,y
&res
- The procedure C “owns” the variable declarations of
n
&f
- The predefined procedure is a special case, as it doesn’t have a procedure that owns it.
var x: int; // first argument to C y: int; // second argument to C res: int; // result from C procedure C() = var n: int; // argument to fact f: int; // result from fact
- The main procedure “owns” the variable declarations of
-
Our Scope class also defines a SortedMap which contains all of our symbol table entries
- These symbol table entries are defined in src/syms/SymEntry.java
/** * Symbol table entries. A SortedMap is used to avoid issues with * hashing functions working differently on different implementations. * This only affects minor things like the order of allocating variables * and dumping symbol tables in trace backs. A Map/HashMap would still give a * valid implementation. */ private final SortedMap<String, SymEntry> entries;
-
Additionally our Scope class has an integer variable that keeps track of the amount of space allocated for variables in this current scope.
/** * space allocated for local variables within this scope */ private int variableSpace;
-
Additionally, scope extensions have been defined (for implementing “for” statements)
/** * true if this is an extension of its parent scope */ private boolean extension;
2.1 - Identifier Lookup in Scope
- To look up an identifier in a symbol table, we first check whether it’s within the current scope.
- If it is not, we try looking it up in the parent scope (which recursively checks through parent scopes until we either find it, or reach the predefined scope).
/**
* Lookup id starting in the current scope and
* thence in the parent scope and so on.
*
* @param id identifier to search for.
* @return symbol table entry for the id, or null if not found.
*/
public SymEntry lookup(String id) {
/* Lookup the entry in the current scope */
SymEntry entry = entries.get(id);
if (entry == null && parent != null) {
/* If the entry is not in the current scope
* look it up in the parent scope, if there is one.
*/
return parent.lookup(id);
}
return entry;
}
2.1 - Setup of Predefined Scope
public class SymbolTable {
private final Scope predefinedScope;
/**
* Construct a symbol table and build the predefined scope
* as its initial scope.
*/
public SymbolTable() {
super();
SymEntry.ProcedureEntry predefined =
new SymEntry.ProcedureEntry("<predefined>",
ErrorHandler.NO_LOCATION, null);
predefinedScope = new Scope(null, 0, predefined);
predefined.setLocalScope(predefinedScope);
Predefined.addPredefinedEntries(predefinedScope);
}
}
- The Symbol Table class sets up the predefined scope for us - that is, when we construct a new
SymbolTable
class instance, it configures thepredefinedScope
.- We create a new
SymTable.ProcedureEntry
instance as a dummy procedure (predefined
), and pass it to the Scope constructor,
- We create a new
- Since a procedure needs to know its scope, and a scope its procedure, we use the
SymTable.ProcedureEntry.setLocalScope
method to set the local scope of the procedure entry. - We additionally want to populate the predefined scope with everything that should be in it - that’s what
Predefined.addPredefinedEntries(Scope predefined)
does.- This method adds in a series of types, operators
predefined.addType("int", ErrorHandler.NO_LOCATION, INTEGER_TYPE);
predefined.addType("boolean", ErrorHandler.NO_LOCATION, BOOLEAN_TYPE);
predefined.addOperator(Operator.NEG_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_UNARY);
predefined.addOperator(Operator.ADD_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.SUB_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.MUL_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.DIV_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.EQUALS_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.NEQUALS_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GREATER_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LESS_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GEQUALS_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LEQUALS_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
2.2 - PL0 Parser - Setup of Predefined Scope
- In PL0.cup, we need to set up both the predefined scope and the scope for the main method
2.2.1 - Parsing a Program
- In setting up a new scope for the main method, we need to do a few things:
-
Create a new procedure entry for the main method
SymEntry.ProcedureEntry procMain = currentScope.addProcedure("<main>", ErrorHandler.NO_LOCATION);
-
We then test that everything has gone well, and that nothing weird has happened.
if(procMain == null) { errors.fatal("Could not add main program to symbol table", ErrorHandler.NO_LOCATION); }
-
We finally take our current scope, and add a new scope to it, for the main method
- In doing this, we also update the current scope to be the scope of the main method.
- It’s important that we do that before moving on and parsing other things (the next thing is a Block) as we want to parse these other things with respect to the main method’s scope.
currentScope = currentScope.newScope(procMain);
-
Program ::= /* empty */
{:
/* This action occurs before the whole program is recognised.
* It constructs the initial symbol table with current scope the
* predefined scope. */
currentScope = new SymbolTable().getPredefinedScope();
/* Set up a dummy symbol table entry for the main program */
SymEntry.ProcedureEntry procMain =
currentScope.addProcedure("<main>", ErrorHandler.NO_LOCATION);
if(procMain == null) {
errors.fatal("Could not add main program to symbol table",
ErrorHandler.NO_LOCATION);
}
/* Enter the scope for the main program and save the new local
* scope in main's symbol table entry */
currentScope = currentScope.newScope(procMain);
:}
2.2.2 - Parsing a Block
Block ::= DeclarationList:dl CompoundStatement:b
{:
RESULT = new StatementNode.BlockNode(bxleft, dl, b, currentScope);
:}
;
- A block is a DeclarationList followed by a CompoundStatement.
2.2.3 - Parsing a DeclarationList
DeclarationList ::= DeclarationList:dl ProcedureDef:p
{:
/* Add a procedure declaration to the list of declarations */
dl.addDeclaration(p);
RESULT = dl;
:}
| DeclarationList:dl Declaration
{:
/* A non-procedure declaration is not added to the list
* but added to the symbol table during its parsing. */
RESULT = dl;
:}
| /* empty */
{:
RESULT = new DeclNode.DeclListNode();
:}
;
-
As we parse each declaration, we append it to the declaration list.
Declaration ::= KW_CONST ConstDefSeq | KW_TYPE TypeDefSeq | KW_VAR VarDeclSeq ;
-
These declarations can either be:
- ProcedureDef (Procedure Definitions / Procedure Declarations) - these are produced by the first alternative in the DeclarationList production.
- Constant Declarations
- Type Declarations
- Variable Declarations
2.2.4 - Adding Declarations to the Symbol Table
-
Consider the following declarations
const N = 10; M = -N; // Constant Declaration var x : T; // Variable Declaration type T = [1..N]; // Type Declaration
-
And their corresponding symbol table entries
-
Where
abbreviates in the compiler. -
During the parsing phase, these declarations will be added to our symbol table
-
When we add these declarations to the symbol table, there is type information that hasn’t been resolved yet - this needs to get done during Type Checking.
Parsing a Constant
- When parsing a constant, we add a new
ConstEntry(type, value)
to the symbol table.- However, at this stage, we don’t know what type it is (that’s what the first parameter is, and is currently left blank as it is still unknown).
- Additionally, we haven’t yet resolved its value (here, denoted as a
). - We represent the constant as a small Abstract Syntax Tree (with the root being
) - This AST will be later evaluated by the type checker
- It is the role of the type checker to go through all of these declarations to evaluate the value.
- We represent the constant as a small Abstract Syntax Tree (with the root being
- A benefit of doing it this way is that some of our definitions depend on declarations after it - in this example, look at the declaration of the variable
, which depends on the type . - Therefore, we collect all of our information in the parsing phase, and then resolve the values later.
Performing Type Checking
-
After performing type checking, we know both the type and value of our constant declarations.
2.2.5 - Parsing Constant Definitions in JavaCUP
-
A sequence of constant definitions is just a series of Constant definitions.
ConstDefSeq ::= ConstDef | ConstDefSeq ConstDef ;
-
A constant definition is of the form IDENTIFIER = {constant};
- When we parse a constant definition, we attempt to add it to the current scope (i.e. add to the symbol table.
- the
addConstant
method returns null if there is already an identifier with that name - in that case we throw an error.
ConstDef ::= IDENTIFIER:id EQUALS Constant:c SEMICOLON {: /* The attribute idxleft represents the location of the first * character of the IDENTIFIER token in the input stream. * addConstant returns null if id is already defined * in the current scope */ if(currentScope.addConstant(id, idxleft, c) == null) { errors.error(id + " already declared in this scope", idxleft); } :} | error ;
-
When we parse these constants, we return a small abstract syntax tree (as described before) that will be resolved in the type checking phase.
/* The rules for Constant construct a (mini) abstract syntax tree * for constant expressions (not to be confused with ExpNodes). */ Constant ::= NUMBER:n {: RESULT = new ConstExp.NumberNode(nxleft, Predefined.INTEGER_TYPE, n); :} | MINUS:op Constant:c {: RESULT = new ConstExp.NegateNode(opxleft, c); :} | IDENTIFIER:id {: RESULT = new ConstExp.ConstIdNode(idxleft, id, currentScope); :} | error:err {: RESULT = new ConstExp.ErrorNode(errxleft); :} ;
2.2.6 - Parsing Variables in JavaCUP
-
When we declare a variable, we know what the name of the variable is, and its type.
var x : T
-
At this point, we represent the type
as - This essentially just denotes that we know that
is some identifier, but we don’t know what type it is yet - As it is just a placeholder until the static checking phase, it will be removed (and replaced) by the appropriate type during static checking.
- This essentially just denotes that we know that
-
After the static checking phase, the type is resolved, and the symbol table entry for the type becomes:
Type ::= TypeIdentifier:type
{:
RESULT = type;
:}
| LBRACKET:subr Constant:lo RANGE Constant:hi RBRACKET
{:
RESULT = new Type.SubrangeType(subrxleft, lo, hi);
:}
| LCURLY:e EnumerationList:elist RBRACKET
{:
RESULT = new Type.EnumerationType(exleft, elist, currentScope);
:}
;
TypeIdentifier ::= IDENTIFIER:id
{: /* As the type identifier may not be defined at this point.
* IdRefType records the id, as well as the symbol table scope
* to look it up during type resolution in the static checker.
*/
RESULT = new Type.IdRefType(idxleft, id, currentScope);
:}
| error:err
{:
RESULT = Type.ERROR_TYPE;
:}
;
- Note that in the code above that when we parse a TypeIdentifier, we return a new
Type.IdRefType
just as we have discussed above.
-
In resolving the type above, we don’t yet know what the base type of the subrange is.
- However, at this point we do know its low value
and high value
LBRACKET:subr Constant:lo RANGE Constant:hi RBRACKET {: RESULT = new Type.SubrangeType(subrxleft, lo, hi); :}
- However, at this point we do know its low value
-
This is once again reflected in our code - When we parse a subrange type, we pass in the location, low value and high value of our subrange type
-
The type of the subrange is dependent on what types the low and high values resolve to
- Since they both resolve to integers, the subrange has base type of integer.
-
Once we’ve resolved the type entry
, we can then resolve the AST entry and therefore resolve the type of the variable .
3.0 - Static Checking
3.1 - Static Checking a Program
-
To perform static checking on a program, we perform static checking on the main procedure:
public void visitProgramNode(DeclNode.ProcedureNode node) { beginCheck("Program"); // The main program is a special case of a procedure visitProcedureNode(node); endCheck("Program"); }
3.2 - Static Checking a Procedure
-
The main procedure node is static checked as follows:
/** * Procedure, function or main program node */ public void visitProcedureNode(DeclNode.ProcedureNode node) { beginCheck("Procedure"); SymEntry.ProcedureEntry procEntry = node.getProcEntry(); // Save the block's abstract syntax tree in the procedure entry procEntry.setBlock(node.getBlock()); // The local scope is that for the procedure. Scope localScope = procEntry.getLocalScope(); /* Resolve all references to identifiers within the declarations. */ localScope.resolveScope(); // Enter the local scope of the procedure currentScope = localScope; // Check the block of the procedure. visitBlockNode(node.getBlock()); // Restore the symbol table to the parent scope currentScope = currentScope.getParent(); endCheck("Procedure"); }
-
We first resolve the (local) scope of the procedure.
- The resolveScope() method resolves the type of each entry inside the symbol table (where beforehand, it was a mini AST).
-
We also statically check the block of code inside that procedure
-
However, we need to statically check it with respect to the scope of the procedure, so we update the currentScope before static checking the block.
currentScope = localScope; visitBlockNode(node.getBlock());
-
After we finish checking the block, we return to the parent scope
currentScope = currentScope.getParent();
-
3.2.1 - Detecting Cyclic Definitions
-
Consider the following declaration in which we cyclic definitions of both constants and types.
const N = M; M = -N; type S = T; T = S;
-
If we detect a cyclic definition in the type resolution, the compiler should stop and warn the programmer.
-
In the case where we don’t have cyclic definitions, but definitions that depend on each other, we need to be careful about the order in which we resolve the types.
-
Consider the following case in which the variable x depends on the type T
const N = 10; M = -N; var x : T; type T = [1..N];
4.0 - Resolving Symbol Table Entries
-
In our symbol table entry class, SymEntry we have the method resolve() intended to resolve the type of any symbol table entry.
-
The default implementation is listed below, in which it tries to resolve the entry if it hasn’t yet been resolved.
/** * Resolve any references to type identifiers in supplied scope */ public void resolve() { if (!resolved) { type = type.resolveType(); resolved = true; } }
-
For the ConstEntry and VarEntry, this resolve method is re-defined.
4.1 - Resolving a Variable Entry
-
In resolving a variable entry, we:
- Call the base resolve method
- Determine the type of the variable
- Allocate space for the variable, based on its type.
public static class VarEntry extends SymEntry { /** * resolving a variable requires allocating space for it */ @Override public void resolve() { if (!resolved) { // resolve the type of the variable super.resolve(); /* Space is allocated for the variable and the address of that * location placed in the entry for the variable. * The space allocated depends on the size of its type. */ Type baseType = ((Type.ReferenceType) type).getBaseType(); offset = scope.allocVariableSpace(baseType.getSpace()); } } }
-
To resolve a variable entry, we need to resolve the type - how do we do this?
4.1.1 - Parsing a Variable
VarDecl ::= IDENTIFIER:id COLON TypeIdentifier:type SEMICOLON
{:
// Variables are always of ReferenceType.
if(currentScope.addVariable(id, idxleft, type) == null) {
errors.error(id + " already declared in this scope", idxleft);
}
:}
| error
;
-
When we parse a variable, we use the addVariable method of the current scope.
/** * Add a VARIABLE entry to the current scope. * * @return a reference to the new entry unless an entry with the same name * already exists in the current scope, in which case return null. */ public SymEntry.VarEntry addVariable(String name, Location loc, Type type) { SymEntry.VarEntry entry = new SymEntry.VarEntry(name, loc, type); return (SymEntry.VarEntry) addEntry(entry); }
- The addVariable method creates a new symbol table entry instance, with the provided location and type.
- When we construct a new instance of SymEntry.VarEntry, it sets the type as ReferenceType(t)
/** * Symbol table entry for a variable identifier */ public static class VarEntry extends SymEntry { /** * offset of variable starting from 0 */ int offset; public VarEntry(String id, Location loc, Type t) { super(id, loc, new ReferenceType(t), false); readOnly = false; } }
- Therefore, when we specify our Java CUP syntax, we don’t want to pass it a
ReferenceType(t)
, but rather t itself (as the construction of a reference type is handled by theSymEntry.VarEntry
.
-
If we were parsing the following variable definition, the type at this point would be set to
-
Additionally, observe that the Java CUP definition for a TypeIdentifier returns a new Type.IdRefType.
TypeIdentifier ::= IDENTIFIER:id {: /* As the type identifier may not be defined at this point. * IdRefType records the id, as well as the symbol table scope * to look it up during type resolution in the static checker. */ RESULT = new Type.IdRefType(idxleft, id, currentScope); :} | error:err {: RESULT = Type.ERROR_TYPE; :} ;
4.1.2 - Resolving a Variable Type.
-
In resolving a SymEntry that is a variable type, SymEntry.resolve() is called.
/** * Resolve any references to type identifiers in supplied scope */ public void resolve() { if (!resolved) { type = type.resolveType(); resolved = true; } }
-
If the symbol table entry is a variable, and it hasn’t been resolved then the type will be an
. -
Therefore, type.resolveType() calls Type.IdRefType.resolveType().
-
This is another case where we must be careful of cyclic definitions.
- If our type is unresolved, we try to look up the type in the symbol table
- If the type exists in the symbol table, we resolve the symbol table entry.
- Otherwise, we throw an error, as the type has not yet been defined.
/** * Resolve the type identifier and return the real type. */ @Override public Type resolveType() { // System.out.println("Resolving type id " + id); switch (status) { case Unresolved: status = Status.Resolving; realType = ERROR_TYPE; SymEntry entry = scope.lookupType(id); if (entry != null) { /* resolve identifiers in the referenced type */ entry.resolve(); /* if status of this entry has resolved then there was a * circular reference and we leave the realType as * ERROR_TYPE to avoid other parts of the compiler getting * into infinite loops chasing types. */ if (status == Status.Resolving) { realType = entry.getType(); } assert realType != null; } else { errors.error("undefined type: " + id, loc); } status = Status.Resolved; break; case Resolving: errors.error(id + " is circularly defined", loc); /* Will resolve to ERROR_TYPE */ status = Status.Resolved; break; case Resolved: /* Already resolved */ break; } // System.out.println("Resolved type id " + name + " to " + realType); return realType; }
- If our type is unresolved, we try to look up the type in the symbol table
- We now need to resolve a
SymEntry.TypeEntry
(that is the object returned by thescope.lookupType(id)
method called inIdRefType.resolve()
above.
4.1.3 - Resolving a Subrange Type
- Suppose we have the following type that we want to resolve - how do we do this?
4.2 - Resolving a Constant Entry
-
The resolution of a constant entry can have three possible states:
Unresolved
The constant entry hasn’t yet been resolved, but we are now resolving it.Resolving
We are in the process of resolving the constant entry - we get into the “Resolving” state when we are currently in the process of resolving another constant entry - which requires the current constant entry’s type to be resolved.- This only occurs when there is a circular reference, so we report the error.
Resolved
There is nothing to do, as the entry is resolved.
public static class ConstantEntry extends SymEntry { /** * Resolve references to constant identifiers and evaluate expression */ @Override public void resolve() { switch (status) { case Unresolved: status = Status.Resolving; value = tree.getValue(); type = tree.getType(); status = Status.Resolved; resolved = true; break; case Resolving: error("circular reference in constant expression", loc); status = Status.Resolved; resolved = true; break; case Resolved: break; } } }
-
If the current status is “Unresolved”, how do we resolve a constant entry
-
We evaluate the value of the variable “tree” which is of type ConstExp
-
The ConstExp type is essentially a small abstract syntax tree representing the value and type of the constant node.
-
The ConstExp.getValue method is defined as follows:
public int getValue() { if (status == Status.Unresolved) { evaluate(); } return value; }
-
The evaluate() method doesn’t have a default implementation; the specific method that gets called depends on what type of
ConstExp
has been defined
-
4.2.1 - Resolving a ConstEntry(NumberNode(...))
-
Before, we looked at the example where a constant was simply a
NumberNode
, defined as follows: -
NumberNode
types are trivial to resolve, as we already know their type and value. -
Therefore, the
NumberNode
type doesn’t have an implementation of the resolve method - it is already known. -
In resolving it, we just set the type and value of the
ConstEntry
to the value of thatNumberNode
-
This lookup is done by the
ConstantEntry.resolve()
method listed above:value = tree.getValue(); type = tree.getType();
4.2.2 - Resolving a ConstEntry(NegateConst(...))
-
Before, we looked at an example where a constant was a negation node, defined as follows:
-
To resolve the type of a negation node, we first need to resolve the type of it’s sub-expression
/** * A constant expression consisting of a negated constant expression */ public static class NegateNode extends ConstExp { private final ConstExp subExp; public NegateNode(Location loc, ConstExp subExp) { super(loc, Status.Unresolved); this.subExp = subExp; } @Override protected void evaluate() { type = subExp.getType(); if (type != Predefined.INTEGER_TYPE) { errors.error("can only negate an integer", loc); } else { value = -subExp.getValue(); } status = Status.Resolved; } }
-
In this particular instance, our sub-expression is an instance of
. -
Note that in our evaluation method for a
NegateNode
we callsubExp.getType()
- this method essentially calls -
Following our constant identifier entry instance
which we need to resolve to get the type. -
In this specific case, our identifier entry type has already been resolved
-
We then set the type of the
to that of the -
If the type isn’t an integer we throw an error.
- Otherwise, we set it to the negative of the sub-expression and set the status of the node to resolved.
4.2.3 - Resolving a ConstIdNode
A
ConstIdNode
is a constant expression consisting of a reference to an identifier.
- In resolving a Constant identifier node, we first attempt to look up the identifier in the symbol table.
- We then check if it’s a constant entry. Else, throw an error.
- We then evaluate (and set) its type and value
- We finish by setting its status to resolved.
protected void evaluate() {
// System.out.println( " ConstExp Resolving " + id + " " + status );
switch (status) {
case Unresolved:
status = Status.Resolving;
SymEntry entry = scope.lookup(id);
if (entry instanceof SymEntry.ConstantEntry) {
SymEntry.ConstantEntry constEntry =
(SymEntry.ConstantEntry) entry;
type = constEntry.getType();
value = constEntry.getValue();
status = Status.Resolved;
} else {
errors.error("Constant identifier expected", loc);
}
break;
case Resolving:
errors.error(id + " is circularly defined", loc);
/* Will resolve to error type and silly value.
* Set to Resolved to avoid repeated attempts to
* resolve the unresolvable, and hence avoid
* unnecessary repeated error messages. */
status = Status.Resolved;
break;
case Resolved:
/* Already resolved */
break;
}
}
4.2 - Example Code and Scope.
var x: int; // first argument to C
y: int; // second argument to C
res: int; // result from C
procedure C() =
var n: int; // argument to fact
f: int; // result from fact
procedure fact() =
begin
// assume 0 <= n
if n = 0 then
f := 1
// f = n!
else // 0 < n
begin
n := n - 1;
call fact();
// f = n!
n:= n + 1;
f := f * n
// f = n!
end
end; // fact
begin // C
n := x;
call fact();
// f = x!
res := f;
n := y;
call fact();
// f = y!
res := res / f;
// res = x! / y!
n := x - y;
call fact();
// f = (x-y)!
res := res / f
// res = x! / (y! * (x-y!))
end; // C
begin // main program
x := 6;
y := 2;
call C();
write res
end
-
Given the code shown above, we have four scopes
0 : Predefined "int" ↦ TypeEntry(INT_TYPE) "boolean" ↦ TypeEntry(BOOL_TYPE) "false" ↦ ConstEntry(BOOL_TYPE, 0) "true" ↦ OperatorEntry(PLUS_OP, ARITHMETIC_BINARY) "_+_" ↦ OperatorEntry(PLUS_OP, ARITHMETIC_BINARY) 1: Main "x" ↦ VarEntry(ReferenceType(INT_TYPE)) "y" ↦ VarEntry(ReferenceType(INT_TYPE)) "res" ↦ VarEntry(ReferenceType(INT_TYPE)) "C" ↦ ProcedureEntry() 2: C "n" ↦ VarEntry(ReferenceType(INT_TYPE)) "f" ↦ VarEntry(ReferenceType(INT_TYPE)) "fact" ↦ ProcedureEntry() 3: fact