Lecture 6
- Recursive Descent Parsing
- Static Semantics of PL0
- Symbol Table
- Implementation of Static Semantic Checking
1.0 - Recursive Descent Parsing
1.1 - Tutorial 4 Recursive Descent Parser Code
So we’ve got some Java code that can parse PL0 code. How do we get it to actually parse the code?
- The src/parse/Parser.java file contains the definition for the Parser class.
- When opened in IntelliJ, the project has a run configuration that runs on the file that is currently open.
- We are essentially running the PL0 Recursive Descent Parser, pl0.PL0_RD, on the file that’s currently open (denoted by $FilePath$) in the current directory ($MODULE_WORKING_DIRS$).
- Suppose we open the test-base0-abs.pl0 file in the test-pgm directory of the project root and compile it:

Compiling test-base0-abs.pl0
Parsing complete
Static semantic analysis complete
Running ...
100
var x: int;
y: int;
begin
x := -100;
if x < 0 then y := -x else y := x;
write y
end
- Parsing complete indicates that the program has no syntax errors.
- Static semantic analysis complete indicates that there are no type checking errors.
The PL0 Recursive Descent Parser implemented here is using the error recovery strategy described in the previous lecture. What would the output be if we intentionally mess up the code?
- Suppose in the source code we replace the assignment operator := with =
var x: int;
y: int;
begin
x = -100; // Replaced := with =
if x < 0 then y := -x else y := x;
write y
end
Compiling test-base0-abs.pl0
4 x = -100;
***** ^ Error: Parse error, expecting ':=' in Assignment
Parsing complete
Static semantic analysis complete
1 error detected.
- What if we had two errors in our source code:
  - The replacement from before
  - The extra then keyword
var x: int;
y: int;
begin
x = -100; // Replaced := with =
if x < 0 then then y := -x else y := x; // Extra then keyword.
write y
end
Compiling test-base0-abs.pl0
4 x = -100;
***** ^ Error: Parse error, expecting ':=' in Assignment
5 if x < 0 then then y := -x else y := x;
***** ^ Error: 'then' cannot start a statement.
Parsing complete
Static semantic analysis complete
2 errors detected.
1.2 - Tutorial 4 PL0 Parser Implementation
Specifically looking at the implementation of the PL0 parser for Expressions and Statements.
- This parser is based on the error recovery scheme that was discussed in the previous lecture.
- To implement this, we have a series of parse methods that perform the synchronisation for us at the start and end of each parse method for Expressions, Statements etc.
private final ParseMethod<ExpNode> exp = new ParseMethod<>(
(Location loc) -> new ExpNode.ErrorNode(loc));
- Configure exp.parse to return an ExpNode.
- If it encounters an error during the synchronisation phase, return an ExpNode.ErrorNode to indicate that there was a problem.
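The synchronisation that such a parse method performs can be sketched roughly as follows. This is only an illustration of the idea, not the course's ParseMethod class: the Tok enum, the Tokens stream, the error messages and the use of an integer position in place of a Location are all assumptions made so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

public class SyncSketch {
    // Hypothetical token kinds; the real compiler has its own Token enum.
    enum Tok { IDENTIFIER, NUMBER, SEMICOLON, JUNK, EOF }

    // Minimal token stream: a list of tokens, a cursor and an error log.
    static class Tokens {
        private final List<Tok> toks;
        private int pos = 0;
        final List<String> errors = new ArrayList<>();

        Tokens(Tok... toks) { this.toks = Arrays.asList(toks); }

        Tok kind() { return pos < toks.size() ? toks.get(pos) : Tok.EOF; }
        int position() { return pos; }
        void advance() { if (pos < toks.size()) pos++; }
        boolean isIn(EnumSet<Tok> set) { return set.contains(kind()); }

        // Skip tokens until the current one is in the given set, or input ends.
        void skipTo(EnumSet<Tok> set) {
            while (!isIn(set) && kind() != Tok.EOF) {
                advance();
            }
        }
    }

    // Wraps a parsing function with synchronisation at its start and end.
    static class ParseMethod<R> {
        private final Function<Integer, R> errorResult;

        ParseMethod(Function<Integer, R> errorResult) { this.errorResult = errorResult; }

        R parse(Tokens tokens, String rule, EnumSet<Tok> startSet,
                EnumSet<Tok> recoverSet, Supplier<R> body) {
            // Start: report an error and skip to a start or recovery token.
            if (!tokens.isIn(startSet)) {
                tokens.errors.add("token cannot start " + rule);
                EnumSet<Tok> stopSet = EnumSet.copyOf(startSet);
                stopSet.addAll(recoverSet);
                tokens.skipTo(stopSet);
            }
            // Run the real parsing code only if we found a start token.
            R result = tokens.isIn(startSet)
                    ? body.get()
                    : errorResult.apply(tokens.position());
            // End: make sure we stop on a token the caller can continue from.
            if (!tokens.isIn(recoverSet)) {
                tokens.errors.add("unexpected token after " + rule);
                tokens.skipTo(recoverSet);
            }
            return result;
        }
    }

    // A toy rule: a Factor is a single IDENTIFIER or NUMBER token.
    static String parseFactor(Tokens tokens, EnumSet<Tok> recoverSet) {
        ParseMethod<String> factor = new ParseMethod<>(pos -> "error@" + pos);
        return factor.parse(tokens, "Factor",
                EnumSet.of(Tok.IDENTIFIER, Tok.NUMBER), recoverSet,
                () -> {
                    String node = tokens.kind().name();
                    tokens.advance();
                    return node;
                });
    }

    public static void main(String[] args) {
        Tokens tokens = new Tokens(Tok.JUNK, Tok.NUMBER, Tok.SEMICOLON);
        String node = parseFactor(tokens, EnumSet.of(Tok.SEMICOLON, Tok.EOF));
        System.out.println(node + " " + tokens.errors);
    }
}
```

In this sketch the stray JUNK token is reported once and skipped, and the factor is still parsed. If no start token is found before a recovery token, the error result is returned instead, which plays the role of ExpNode.ErrorNode.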
1.2.1 - Constant Declarations
Create constants that represent the start sets for our non-terminal symbols.
/**
* Set of tokens that may start an LValue.
*/
private final static TokenSet LVALUE_START_SET =
new TokenSet(Token.IDENTIFIER);
private final static TokenSet FACTOR_START_SET =
LVALUE_START_SET.union(Token.NUMBER, Token.LPAREN);
private final static TokenSet TERM_START_SET =
FACTOR_START_SET;
private final static TokenSet EXP_START_SET =
TERM_START_SET.union(Token.PLUS, Token.MINUS);
private final static TokenSet REL_CONDITION_START_SET =
EXP_START_SET;
private final static TokenSet CONDITION_START_SET =
REL_CONDITION_START_SET;
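The way these start sets build on one another can be illustrated with Java's EnumSet. This is a sketch only: the Tok enum and the union helper are hypothetical stand-ins for the compiler's Token and TokenSet classes, whose implementation is not shown in these notes.

```java
import java.util.EnumSet;

public class StartSets {
    // Hypothetical token kinds, standing in for the compiler's Token enum.
    enum Tok { IDENTIFIER, NUMBER, LPAREN, PLUS, MINUS, KW_IF }

    // Returns a new set containing everything in base plus the extra tokens.
    static EnumSet<Tok> union(EnumSet<Tok> base, Tok... extra) {
        EnumSet<Tok> result = EnumSet.copyOf(base);
        for (Tok t : extra) {
            result.add(t);
        }
        return result;
    }

    // Each start set is defined in terms of the one below it in the grammar.
    static final EnumSet<Tok> LVALUE_START = EnumSet.of(Tok.IDENTIFIER);
    static final EnumSet<Tok> FACTOR_START = union(LVALUE_START, Tok.NUMBER, Tok.LPAREN);
    static final EnumSet<Tok> TERM_START = FACTOR_START;
    static final EnumSet<Tok> EXP_START = union(TERM_START, Tok.PLUS, Tok.MINUS);

    public static void main(String[] args) {
        System.out.println("Exp may start with MINUS:    " + EXP_START.contains(Tok.MINUS));
        System.out.println("Factor may start with MINUS: " + FACTOR_START.contains(Tok.MINUS));
        System.out.println("Exp may start with KW_IF:    " + EXP_START.contains(Tok.KW_IF));
    }
}
```

Because union returns a new set, extending the expression start set does not change the factor start set it was built from.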
1.2.2 - ParseRelCondition
private ExpNode parseRelCondition(TokenSet recoverSet) {
return exp.parse("RelCondition", REL_CONDITION_START_SET, recoverSet,
() -> {
/* The current token is in REL_CONDITION_START_SET */
ExpNode cond = parseExp(recoverSet.union(REL_OPS_SET));
if (tokens.isIn(REL_OPS_SET)) {
Location loc = tokens.getLocation();
Operator operatorCode =
parseRelOp(recoverSet.union(EXP_START_SET));
ExpNode right = parseExp(recoverSet);
cond = new ExpNode.BinaryNode(loc, operatorCode, cond, right);
}
return cond;
});
}
- Note that TokenSet recoverSet is passed as a parameter. This is required as we’re doing error recovery.
- Use exp.parse(...) to perform the synchronisation at the start and end of the parsing for us. The parameters to the method are:
  - Name: used in the debugger to print out more meaningful messages about what failed.
  - StartSet: the set of tokens that a RelCondition can start with.
  - RecoverSet: the set of tokens that the parser can recover on, i.e. the tokens that may follow a RelCondition in this context.
  - Anonymous Function: a function that does the parsing for the RelCondition.
- The Anonymous Function is used to define the behaviour of the parser for a RelCondition (in this case).
- Since we want to build up the AST representation of the code that we’re parsing, we want to return an ExpNode.
  - So rather than using parse(...) we use return exp.parse(...)
- A RelCondition is defined by the following production:
  RelCondition → Exp [ RelOp Exp ]
- If the RelCondition is just an Expression, we can just use the parse method and return it:
ExpNode cond = parseExp(recoverSet.union(REL_OPS_SET));
...
return cond;
- Otherwise, we have to parse the optional [ RelOp Exp ] component of the production:
ExpNode cond = parseExp(recoverSet.union(REL_OPS_SET));
if (tokens.isIn(REL_OPS_SET)) { // I.e. are we at the start of a RelOp
    Location loc = tokens.getLocation(); // Required for BinaryNode constructor
    Operator operatorCode = parseRelOp(recoverSet.union(EXP_START_SET));
    ExpNode right = parseExp(recoverSet);
    cond = new ExpNode.BinaryNode(loc, operatorCode, cond, right);
}
return cond;
- The parseRelOp operation is defined in the next section.
- Note that the recover set passed to each sub-parse is the set of tokens that can follow what we’re parsing at that point:
  - When we’re parsing the first Expression in the production, we use recoverSet extended with the RelOp start set (REL_OPS_SET).
  - When we’re parsing the RelOp, we use recoverSet extended with the Expression start set (EXP_START_SET).
  - When we’re parsing the final Expression, we use just recoverSet, the set of tokens that can follow the RelCondition (as nothing else in the production follows it).
- After we finish parsing the tokens in the production, we can create the BinaryNode and return it.
1.2.3 - ParseRelOp
Note that the parseRelOp method doesn’t return an ExpNode - it returns an instance of Operator.
- The parseRelOp method must parse the following production (written here using the token names):
  RelOp → EQUALS | NEQUALS | LESS | GREATER | LEQUALS | GEQUALS
- To do this, we initialise a parse method that returns an Operator:
private final ParseMethod<Operator> op = new ParseMethod<>(
    (Location loc) -> Operator.INVALID_OP);
  - Note here that if there’s an error, we return an Operator.INVALID_OP instead of an ExpNode.ErrorNode.
  - Also note that the tree package contains all implementations of the nodes used in an Abstract Syntax Tree (AST), such as ExpNode and StatementNode.
- We then define the parseRelOp method itself:
private Operator parseRelOp(TokenSet recoverSet) {
    return op.parse("RelOp", REL_OPS_SET, recoverSet,
        () -> {
            Operator operatorCode = Operator.INVALID_OP;
            switch (tokens.getKind()) {
            case EQUALS:
                operatorCode = Operator.EQUALS_OP;
                tokens.match(Token.EQUALS); /* cannot fail */
                break;
            case NEQUALS:
                operatorCode = Operator.NEQUALS_OP;
                tokens.match(Token.NEQUALS); /* cannot fail */
                break;
            case LESS:
                operatorCode = Operator.LESS_OP;
                tokens.match(Token.LESS); /* cannot fail */
                break;
            case GREATER:
                operatorCode = Operator.GREATER_OP;
                tokens.match(Token.GREATER); /* cannot fail */
                break;
            case LEQUALS:
                operatorCode = Operator.LEQUALS_OP;
                tokens.match(Token.LEQUALS); /* cannot fail */
                break;
            case GEQUALS:
                operatorCode = Operator.GEQUALS_OP;
                tokens.match(Token.GEQUALS); /* cannot fail */
                break;
            default:
                // If we get to here, there has been a fatal implementation
                // error in the compiler.
                fatal("parseRelOp");
            }
            return operatorCode;
        });
}
- Note here that we could use a chain of conditional statements to implement the choice in this production.
  - However, here we use a switch statement as all of the branches have a start set that contains just one token.
- If we get to the default: case, there is a fatal error, as the cases cover all of the possible terminal symbols for the RelOp non-terminal symbol.
1.3 - StatementNodes
Parsing Statements
- As before, we define a ParseMethod that returns a StatementNode:
private final ParseMethod<StatementNode> stmt = new ParseMethod<>(
    (Location loc) -> new StatementNode.ErrorNode(loc));
  - Returns a StatementNode.ErrorNode if an error is encountered when parsing.
1.3.1 - Parsing a Conditional Statement / If-Statement
private StatementNode parseIfStatement(TokenSet recoverSet) {
return stmt.parse("If Statement", Token.KW_IF, recoverSet,
() -> {
/* The current token is KW_IF */
tokens.match(Token.KW_IF); /* cannot fail */
Location loc = tokens.getLocation();
ExpNode cond = parseCondition(recoverSet.union(Token.KW_THEN));
tokens.match(Token.KW_THEN, STATEMENT_START_SET);
StatementNode thenClause =
parseStatement(recoverSet.union(Token.KW_ELSE));
tokens.match(Token.KW_ELSE, STATEMENT_START_SET);
StatementNode elseClause = parseStatement(recoverSet);
return new StatementNode.IfNode(loc, cond, thenClause, elseClause);
});
}
- We use the stmt.parse(...) method to perform synchronisation at both the start and end of the parsing.
- Supply as arguments:
  - Name of rule
  - Start set
  - Recover set (provided as parameter)
  - Anonymous parsing function
- In the anonymous parsing function, we want to parse the production:
  IfStatement → KW_IF Condition KW_THEN Statement KW_ELSE Statement
- Matching KW_IF cannot fail, as the anonymous function is only called if the current token is Token.KW_IF.
  - Therefore, we don’t need to use the error recovery version of match.
tokens.match(Token.KW_IF); // Match and consume
- We then store the current location for later.
  - This is done as we essentially want to treat the location of the if statement as the location of the Condition itself (instead of the location of Token.KW_IF).
Location loc = tokens.getLocation();
- Following this, we parse the condition itself.
  - The recover set in this case is anything that can follow the if statement (recoverSet) together with the next token in the production (KW_THEN, since we’re parsing Condition).
ExpNode cond = parseCondition(recoverSet.union(Token.KW_THEN));
- We then match the KW_THEN token. We need to use the version with error recovery, as we can’t guarantee that the next token is KW_THEN.
  - The set of tokens that can immediately follow KW_THEN is contained in the TokenSet STATEMENT_START_SET.
tokens.match(Token.KW_THEN, STATEMENT_START_SET);
- After we match the KW_THEN token, we parse the statement immediately following it (the then clause).
  - We store this so that we can use it to build up our AST.
StatementNode thenClause = parseStatement(recoverSet.union(Token.KW_ELSE));
- Next, we’re expecting the KW_ELSE token but can’t be sure that it is there, so we use the error recovery version of tokens.match.
tokens.match(Token.KW_ELSE, STATEMENT_START_SET);
- After we match the KW_ELSE token, we parse the statement immediately following it (the else clause).
  - Here, the recover set is just the recoverSet parameter passed in, as we are at the end of the production.
StatementNode elseClause = parseStatement(recoverSet);
- We then construct the StatementNode.IfNode object and return that as part of our AST.
return new StatementNode.IfNode(loc, cond, thenClause, elseClause);
1.3.2 - Parsing a While Statement
- From the parseStatement function, we have:
private StatementNode parseStatement(TokenSet recoverSet) {
    return stmt.parse("Statement", STATEMENT_START_SET, recoverSet,
        () -> {
            /* The current token is in STATEMENT_START_SET.
             * Instead of using a cascaded if-then-else, as indicated in
             * the recursive descent parsing notes, a simpler approach
             * of using a switch statement can be used because the
             * start set of every alternative contains just one token.
             */
            switch (tokens.getKind()) {
            case IDENTIFIER:
                return parseAssignment(recoverSet);
            case KW_WHILE:
                return parseWhileStatement(recoverSet);
            ...
            default:
                fatal("parseStatement");
                // To keep the Java compiler happy - can't reach here
                return new StatementNode.ErrorNode(tokens.getLocation());
            }
        });
}
- So when we call parseWhileStatement we know that the current token is Token.KW_WHILE.
- We still perform synchronisation so that (a) the format is consistent and (b) synchronisation is performed at the end.
- The production for a While Statement is given as:
  WhileStatement → KW_WHILE Condition KW_DO Statement
private StatementNode parseWhileStatement(TokenSet recoverSet) {
    return stmt.parse("While Statement", Token.KW_WHILE, recoverSet,
        () -> {
            /* The current token is KW_WHILE */
            tokens.match(Token.KW_WHILE); /* cannot fail */
            Location loc = tokens.getLocation();
            ExpNode cond = parseCondition(recoverSet.union(Token.KW_DO));
            tokens.match(Token.KW_DO, STATEMENT_START_SET);
            StatementNode statement = parseStatement(recoverSet);
            return new StatementNode.WhileNode(loc, cond, statement);
        });
}
1.4 - Types of AST Nodes
1.4.1 - Expression Nodes
- The ExpNode class in the tree package implements subclasses for each of the subtypes of ExpNode:
  - ErrorNode, ConstNode, IdentifierNode, VariableNode, BinaryNode, UnaryNode, DereferenceNode, NarrowSubrangeNode, WidenSubrangeNode
- Each of these nodes has a location and type as well as its own individual parameters.
  - For example, a BinaryNode has the constructor:
public BinaryNode(Location loc, Operator op, ExpNode left, ExpNode right);
- We define more constant declarations that will help us with parsing statements:
/**
 * Set of tokens that may start a Statement.
 */
private final static TokenSet STATEMENT_START_SET =
    LVALUE_START_SET.union(Token.KW_WHILE, Token.KW_IF,
        Token.KW_READ, Token.KW_WRITE,
        Token.KW_CALL, Token.KW_BEGIN);
2.0 - Static Semantics of PL0
If implemented correctly, the static semantic checker of our compiler should pick up on all of the type errors.
- Consider the following code - it is syntactically correct but it is not semantically (type) correct.
const C = 42;
type S = [-C..C];
var b : boolean;
    y : S;
begin // main
    y := b + 42; // Addition between boolean and integer
    C := 27; // Assignment to a constant
    if y then y := 0 else y := 1 // y as condition (subrange type) is not boolean
end
2.1 - PL0 Concrete vs Abstract Syntax
Our static semantics rules determine what we allow (and conversely, what we don’t allow) in our programming language grammar
- The parsing phase of the compiler parses the concrete syntax of the program and generates an abstract syntax tree
- The (concrete) syntax of a programming language describes the form of the language when it is treated as a sequence of terminal symbols (lexical tokens)
- The abstract syntax of a language represents the same constructs as a data structure, an abstract syntax tree
- All static semantics rules are expressed using the abstract syntax of the language
2.1.1 - Abstract Syntax of PL0
- A program is a block.
- A block has both declarations (ds) and statements (s).
- A declaration is a mapping from identifiers to individual declarations.
- Each individual declaration is either:
  - A constant
  - A type
  - A variable
  - A procedure
- A constant has a parameter c, which is either a number, an identifier, or a unary operator applied to a number.
- A type is either an identifier or a subrange from one constant to another.
- A left value is an identifier.
- $e ::= n \mid lv \mid \text{op}(\text{unary}, e) \mid \text{op}(\text{binary}, (e, e))$
  Expressions are either numbers, left values, unary operators or binary operators.
- Our starting unary operator is the symbol -_ (the underscore indicates where the argument goes).
- The binary operators of PL0 are written in the same style (for example _<_), where the underscores indicate where the arguments go.
2.1.2 - Abstract Syntax Example
Consider the following code, and its abstract syntax.
const C = 42;
type S = [-C..C];
var b : boolean;
y : S;
-
From this, let’s construct its abstract syntax
-
Let’s first map out the declarations,
-
Next, let’s map out the statements:
- What does this actually mean?
- We assign the value of
to the variable - In the statement
, we are applying the binary operator to two arguments - and . - The statement
applies the unary operator to the constant
2.2 - PL0 Types - Scalar and Reference Types
- The scalar types are:
  - Integers (type int)
  - Booleans (type boolean)
  - Subranges, where subrange(T, lower, upper) is the subrange of values in base type T from lower to upper, where lower ≤ upper
  - These are our primitive, predefined types.
- There are also reference (or location) types, ref(T), that describe variables.
  - To fully describe a variable, we not only need to know its type, but also where it’s stored.
  - Hence, variables are reference types.
const C = 42;       // C has type int
type S = [-C..C];   // S is the type subrange(int, -42, 42)
var b : boolean;    // b has type ref(boolean)
    y : S;          // y has type ref(subrange(int, -42, 42))
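- There are product types, where $T_1 \times T_2$ is the type of pairs whose first component has type $T_1$ and whose second component has type $T_2$.
- There are function types, where $T_1 \to T_2$ is the type of functions with arguments of type $T_1$ and results of type $T_2$.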
We can combine product types and function types to create more sophisticated operators:
-
Note here that the definition for equality is overloaded - for both
and -
The following summarises the semantic types shown above:
- Where T for a subrange must be
or - Where
stands for the integers
- Where T for a subrange must be
3.0 - Symbol Table
In the Parsing phase of a compiler, an AST and Symbol Table are created.
-
Entries in the symbol table are mappings from identifiers (
) to symbol table entries . -
We have different types for Constants, Types, Variables and Procedures in our symbol table
-
From before, we have the following code:
const C = 42;       // C has type int
type S = [-C..C];   // S is the type subrange(int, -42, 42)
var b : boolean;    // b has type ref(boolean)
    y : S;          // y has type ref(subrange(int, -42, 42))
-
And we’ve generated its declarations
-
We then generate the symbol table entries for the declarations
3.1 - Well-Typed Expressions
- Expressions need to be well-typed and we need to be able to determine the type of an expression.
- We have type inferences which can be denoted as:
  $\text{syms}\vdash e : T$
  - This essentially states that in the context of the symbol table syms, the expression e is well-typed and has type T.
3.2 - Rules of Static Semantics
3.2.1 - Integer Value
- This rule has a conclusion without any premises.
- Any expression n, where n is an integer, is well typed and has type int.
Example: Type Inference for 27
3.2.2 - Symbolic Constant
- This is essentially saying that an identifier
of type is well typed iff: - If the identifier
is in the domain of the symbol table (i.e. ) and; - If the entry associated with that identifier
is of type
Example: Type Inference for Constant C
-
From before, we have the following entry in our symbol table
-
So we know that
and additionally, from the symbol table entry we know that . -
Therefore,
3.2.3 - Variable Identifier
- This is essentially saying that an identifier
of type is well typed iff: - If the identifier
is in the domain of the symbol table (i.e. ) and; - If the entry associated with that identifier
is of type
Example: Type Inference for b
-
From before, we have the following entry in our symbol table
3.2.4 - Unary Negation
- This is essentially saying that for the unary operator
applied to an expression is well typed in the context of the symbol table and is of type iff: - The expression
is well typed in the context of the same symbol table and is of type integer
Example: Type Inference for -C
- Therefore, we only need to show that
to prove that is well typed. - From before, we proved that the symbol with the declaration
is type correct and therefore is type correct.
3.2.5 - Binary Operator
- Note that the symbol
denotes any binary operator. - We have the expression
which essentially denotes the application of an arbitrary binary operator being applied to expressions is type correct, and has type iff is well typed in the context of the symbol table and has type is well typed in the context of the symbol table and has type - There exists a binary operator
that takes in type and produces type
Example: Type Inference for -C + 27
-
We want to know whether
is well typed. -
In this interpretation,
is replaced with -
To know whether
is type correct, we essentially want to know if the following holds true -
To do this, we can use the rule described above, and prove the following three things:
Is the first argument or type correct, and is it an integer? -
In the context of a symbol table with the following declaration:
-
We can conclude that the statement is type correct by Unary Negation, and is in fact an integer
-
Is the argument type correct, and is it an integer? - By Integer Value we can conclude that this statement is type correct and is in fact an integer
In the context of our symbol table , is addition well typed, takes input and produce a value of type - This holds true by definition of the addition binary operator.
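The rules used so far (Integer Value, Symbolic Constant, Variable Identifier, Unary Negation, Binary Operator and Dereference) can be sketched as a small recursive function over expressions. This is an illustrative simplification rather than the compiler's static checker: types are plain strings, only addition is included as a binary operator, subranges are left out, and the class names and the Entry representation are assumptions. It also assumes that a variable declared with type T is given the type ref(T).

```java
import java.util.HashMap;
import java.util.Map;

public class TypeInferenceSketch {
    // Abstract syntax for a few expression forms: n | id | op(-_, e) | op(_+_, (e, e))
    interface Exp {}
    static final class Num implements Exp {
        final int value;
        Num(int value) { this.value = value; }
    }
    static final class Id implements Exp {
        final String name;
        Id(String name) { this.name = name; }
    }
    static final class Neg implements Exp {
        final Exp arg;
        Neg(Exp arg) { this.arg = arg; }
    }
    static final class Add implements Exp {
        final Exp left, right;
        Add(Exp left, Exp right) { this.left = left; this.right = right; }
    }

    // Simplified symbol table entry: a constant or a variable, plus its declared type.
    static final class Entry {
        final boolean isVariable;
        final String type;
        Entry(boolean isVariable, String type) {
            this.isVariable = isVariable;
            this.type = type;
        }
    }

    // Returns the type of e in the context of syms, or throws if no rule applies.
    static String typeOf(Map<String, Entry> syms, Exp e) {
        if (e instanceof Num) {
            return "int";                                   // Integer Value
        }
        if (e instanceof Id) {
            Id id = (Id) e;
            Entry entry = syms.get(id.name);                // id must be in dom(syms)
            if (entry == null) {
                throw new IllegalArgumentException("undeclared identifier " + id.name);
            }
            // Variable Identifier gives ref(T); Symbolic Constant gives T.
            return entry.isVariable ? "ref(" + entry.type + ")" : entry.type;
        }
        if (e instanceof Neg) {                             // Unary Negation
            requireInt(syms, ((Neg) e).arg);
            return "int";
        }
        if (e instanceof Add) {                             // Binary Operator: int x int -> int
            Add add = (Add) e;
            requireInt(syms, add.left);
            requireInt(syms, add.right);
            return "int";
        }
        throw new IllegalArgumentException("unknown expression form");
    }

    // Dereference: an expression of type ref(T) may be used as a T.
    static String rvalue(String type) {
        if (type.startsWith("ref(") && type.endsWith(")")) {
            return type.substring(4, type.length() - 1);
        }
        return type;
    }

    static void requireInt(Map<String, Entry> syms, Exp e) {
        String type = rvalue(typeOf(syms, e));
        if (!type.equals("int")) {
            throw new IllegalArgumentException("expected int but found " + type);
        }
    }

    static boolean isWellTyped(Map<String, Entry> syms, Exp e) {
        try {
            typeOf(syms, e);
            return true;
        } catch (IllegalArgumentException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Entry> syms = new HashMap<>();
        syms.put("C", new Entry(false, "int"));
        syms.put("b", new Entry(true, "boolean"));
        Exp ok = new Add(new Neg(new Id("C")), new Num(27));   // -C + 27
        Exp bad = new Add(new Id("b"), new Num(42));           // b + 42
        System.out.println(typeOf(syms, ok));
        System.out.println(isWellTyped(syms, bad));
    }
}
```

Each branch of typeOf corresponds to one rule: the recursive calls are the premises and the returned type is the conclusion.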
3.2.6 - Dereference
The dereference rule allows us to treat an expression of type
as an expression of type itself.
- In the context of a symbol table
, an expression is well typed (& can be treated as if it was of type ) iff the expression is well typed, and of type in the context of the symbol table
Example: Type Inference for y
-
Recall that we have an entry in our symbol table:
-
And therefore, since
is well typed and of type so we can reason that is well typed, and can be treated as type as a result of the premise holding. -
We know that this premise holds in the context of the symbol table
by the Variable Identifier rule.
3.2.7 - Widen Subrange
- Suppose we had a subrange of type
from to . We can treat this as a variable of type
Proof for Type Inference for y - Widen Subrange
Step 1: Use Variable Identifier Rule
- Since
, we can modify the expression from to
$\text{id}\in\text{dom(syms)}\ \text{\underline{syms(id)=VarEntry(T)}}\ \text{syms}\vdash\text{id : T}$
Step 2: Use Dereference Rule
- Since we have a statement of the form
we can use the Dereference rule to modify the expression from to
$\underline{\text{syms}\vdash\text{e : ref(T)}}\ \text{syms}\vdash\text{e : T}$
Step 3: Use Widen Subrange Rule
- Since we have a statement of the form
we can modify the expression from to
$\underline{\text{syms} \vdash e :\text{subrange(T,i,j)}}\ \text{syms}\vdash e : T$
3.2.8 - Narrow Subrange
- Given that
is well typed and of type , where and , then we can say that
Example: Type Inference for 27
- Suppose we were trying to argue that the integer 50 was in the subrange of -42 to 42
- This wouldn’t cause a static semantics error - we treat it as a runtime error.
- We treat it as a runtime error as the integer that we’re trying to coerce into the subrange may not be a constant value and we therefore cannot guarantee the value.
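The kind of runtime check that narrowing implies can be sketched as below. This only illustrates the idea: the method name, the exception type and the message are assumptions, and the code that the PL0 compiler actually generates for a narrowing is not shown in these notes.

```java
public class NarrowSketch {
    // Returns value unchanged if it lies in [lower, upper], otherwise fails at runtime.
    static int narrow(int value, int lower, int upper) {
        if (value < lower || value > upper) {
            throw new ArithmeticException(
                value + " is outside the subrange [" + lower + ".." + upper + "]");
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(narrow(27, -42, 42));    // inside the subrange
        try {
            narrow(50, -42, 42);                    // outside the subrange
        } catch (ArithmeticException ex) {
            System.out.println("Runtime error: " + ex.getMessage());
        }
    }
}
```

The static checker accepts both calls, since each argument is an int. Only the second one fails, and only when it is executed.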
3.3 - Well-Formed Statements
- Statements need to be well-formed.
- Our type inferences will be of the form:
  $\text{syms}\vdash\text{WFStatement}(s)$
  - This inference states that in the context of the symbol table syms, the statement s is well-formed.
3.3.1 - Assignment
How do we know whether an assignment is well formed?
- That is, how do we prove that assign(lv, e) is well formed?
- In the context of the symbol table syms, we require:
  - The left value lv to be well typed, and to be a reference to some type T, i.e. of type ref(T)
  - The expression e to be well typed in the context of the symbol table, and have type T
Example - Type Inference for y := 4
- In this example, we instantiate
to be the subrange from - For the statement to be well formed, we also require that
is also of the same type
We can reason that the inference
- We can use the Narrowing Subrange rule to justify the inference:
- Since
and is an we can reason that is well typed and is contained within the subrange type
We can reason that the inference
- We can use the Variable Identifier rule to justify the inference:
- Since
and has a VarEntry of the subrange type:
3.3.2 - Procedure Call
- A procedure call will be a well formed statement in the context of the symbol table as long as:
  - We call an identifier that is in the domain of the symbol table
  - The identifier is a procedure entry.
3.3.3 - Read
In other programming languages, read and write wouldn’t be primitives, but we do this in PL0 for simplicity.
- A read statement is well formed in the context of the symbol table iff:
  - The left value is a reference type (i.e. something that we can assign to)
  - The value read has a type of either an integer or subrange of integers.
3.3.4 - Write
- A write statement will be well formed in the context of the symbol table iff the expression being written is well typed and is of type int.
3.3.5 - Conditional Rule
- For a conditional (denoted if(e, s1, s2)) to be well formed, we require:
  - The condition e to be well-typed, and have type boolean in the context of the symbol table
  - The statements s1 and s2 to be well formed statements in the context of the symbol table.
Example - Well Formed Conditional
Show that
if x < 0 then y := -x else y := x
is a well formed conditional statement.
-
From this statement, we can generate the following:
-
From our symbol table, we know that:
Firstly, we prove that the statement
That is:
- From the Variable Identifier rule, we can infer that:
$\footnotesize\text{x}\in\text{dom(syms)}\ \text{syms(x)=VarEntry(int)}\ \overline{\text{syms}\vdash\text{ x : ref(int)}}$
$\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\ \text{\underline{syms(id)=VarEntry(T)}}\ \text{syms}\vdash\text{id : T}$
-
We know that
is in the domain of our symbol table as it is inherently in the symbol table:
- From the Dereference rule, we can infer that:
$\footnotesize\color{gray}\underline{\text{syms}\vdash\text{e : ref(T)}}\ \text{syms}\vdash\text{e : T}$
- From the Integer Value rule, we can inherently infer that:
- From the Binary Operator rule, we can inherently infer that:
Secondly, we prove that the statement
That is:
- From the Variable Identifier rule, we can infer that:
$\footnotesize\text{y}\in\text{dom(syms)}\ \text{syms(y)=VarEntry(int)}\ \overline{\text{syms}\vdash\text{ y : ref(int)}}$
$\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\ \text{\underline{syms(id)=VarEntry(T)}}\ \text{syms}\vdash\text{id : T}$
- This is inherently true as y is in the domain of the symbol table BECAUSE it is in the symbol table.
- We can also do the same for the variable
$\footnotesize\text{x}\in\text{dom(syms)}\ \text{syms(x)=VarEntry(int)}\ \overline{\text{syms}\vdash\text{ x : ref(int)}}$
$\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\ \text{\underline{syms(id)=VarEntry(T)}}\ \text{syms}\vdash\text{id : T}$
- From the Dereference rule, we can infer that:
$\footnotesize\color{gray}\underline{\text{syms}\vdash\text{e : ref(T)}}\ \text{syms}\vdash\text{e : T}$
- From the Unary Negation rule, we can infer that:
$\footnotesize\text{syms}\vdash\text{x : int}\ \overline{\text{syms}\vdash\text{op(-_,x) : int}}$
$\footnotesize\color{gray}\text{syms}\vdash\text{e : int}\ \overline{\text{syms}\vdash\text{op(-_,e) : int}}$
- And then combining Step 1 and Step 4 using the Statement Assignment rule.
$\footnotesize \text{syms}\vdash\text{y : ref(int)}\ \text{syms}\vdash\text{op(-_,x) : int}\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,op(-_, x)))}}$
Thirdly, we prove the statement
That is:
- From the Variable Identifier rule, we can infer that:
$\footnotesize\text{y}\in\text{dom(syms)}\ \text{syms(y)=VarEntry(int)}\ \overline{\text{syms}\vdash\text{ y : ref(int)}}$
$\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\ \text{\underline{syms(id)=VarEntry(T)}}\ \text{syms}\vdash\text{id : T}$
- This is inherently true as y is in the domain of the symbol table BECAUSE it is in the symbol table.
- We can also do the same for the variable
$\footnotesize\text{x}\in\text{dom(syms)}\ \text{syms(x)=VarEntry(int)}\ \overline{\text{syms}\vdash\text{ x : ref(int)}}$
$\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\ \text{\underline{syms(id)=VarEntry(T)}}\ \text{syms}\vdash\text{id : T}$
- From the Dereference rule, we can infer that:
$\footnotesize\color{gray}\underline{\text{syms}\vdash\text{e : ref(T)}}\ \text{syms}\vdash\text{e : T}$
- And then combining Step 1 and Step 3 using the Statement Assignment rule.
$\footnotesize \text{syms}\vdash\text{y : ref(int)}\ \text{syms}\vdash\text{x : int}\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,x))}}$
Finally, putting it all together:
$\footnotesize \text{syms}\vdash\text{y : ref(int)}\ \text{syms}\vdash\text{op(-_,x) : int}\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,op(-_, x)))}}$
$\footnotesize \text{syms}\vdash\text{y : ref(int)}\ \text{syms}\vdash\text{x : int}\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,x))}}$
- Combining all of these conclusions using the Conditional Statement rule, we get:
$\footnotesize\text{syms}\vdash\text{(op(_<_,(x,0)) : boolean)}\ \text{syms}\vdash\text{WFStatement(assign(y,op(-_, x)))}\ \text{syms}\vdash\text{WFStatement(assign(y,x))}\
\overline{\text{syms}\vdash\text{WFStatement(if(op(_<_,(x,0)), assign(y,op(-_,x)), assign(y,x)))}}$
3.3.6 - Iteration (While Rule)
For a while loop, if the condition expression is well typed in the context of the symbol table and has type boolean, AND the body statement is well formed, then the while statement will be well formed.
3.3.7 - Statement List
For a statement list, if all statements are well formed in the context of the symbol table, then the statement list will be well formed.
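The statement rules above can be sketched as a recursive check over a small statement tree. To keep the example short, it assumes that the types of the expressions have already been inferred and are stored directly in the statement nodes as strings. The class names are hypothetical and the representation is far simpler than the compiler's StatementNode classes.

```java
import java.util.Arrays;
import java.util.List;

public class WellFormedSketch {
    interface Stmt {}

    // assign(lv, e), reduced to the types of its two sides.
    static final class Assign implements Stmt {
        final String lvalueType, expType;
        Assign(String lvalueType, String expType) {
            this.lvalueType = lvalueType;
            this.expType = expType;
        }
    }
    // if(e, s1, s2), reduced to the type of the condition and the two branches.
    static final class If implements Stmt {
        final String condType;
        final Stmt thenPart, elsePart;
        If(String condType, Stmt thenPart, Stmt elsePart) {
            this.condType = condType;
            this.thenPart = thenPart;
            this.elsePart = elsePart;
        }
    }
    // while(e, s)
    static final class While implements Stmt {
        final String condType;
        final Stmt body;
        While(String condType, Stmt body) {
            this.condType = condType;
            this.body = body;
        }
    }
    // A list of statements.
    static final class StmtList implements Stmt {
        final List<Stmt> stmts;
        StmtList(Stmt... stmts) { this.stmts = Arrays.asList(stmts); }
    }

    static boolean wellFormed(Stmt s) {
        if (s instanceof Assign) {
            // Assignment: lv : ref(T) and e : T for the same T.
            Assign a = (Assign) s;
            return a.lvalueType.equals("ref(" + a.expType + ")");
        }
        if (s instanceof If) {
            // Conditional: boolean condition and both branches well formed.
            If i = (If) s;
            return i.condType.equals("boolean")
                && wellFormed(i.thenPart) && wellFormed(i.elsePart);
        }
        if (s instanceof While) {
            // Iteration: boolean condition and a well formed body.
            While w = (While) s;
            return w.condType.equals("boolean") && wellFormed(w.body);
        }
        if (s instanceof StmtList) {
            // Statement list: every statement must be well formed.
            for (Stmt each : ((StmtList) s).stmts) {
                if (!wellFormed(each)) {
                    return false;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // if x < 0 then y := -x else y := x, with x and y declared as int variables
        Stmt conditional = new If("boolean",
            new Assign("ref(int)", "int"), new Assign("ref(int)", "int"));
        System.out.println(wellFormed(conditional));
        // if y then ... where the condition has type int
        System.out.println(wellFormed(new If("int",
            new Assign("ref(int)", "int"), new Assign("ref(int)", "int"))));
    }
}
```

As with expressions, each branch mirrors one rule: the recursive calls check the premises, and returning true corresponds to the conclusion.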
4.0 - Implementation of Static Semantic Checking
The PL0 Static Checker is implemented in the tree package.
4.1 - Abstract Syntax Tree Classes
- The tree package contains all of the abstract syntax tree classes.
  - The StatementNode class implements all of the statement classes.
  - The ExpNode class implements all of the expression classes.
4.2 - Symbol Table Implementations
- The syms package contains all of the data type implementations for the symbol table.
  - Type is an abstract class which other classes extend to implement their own desired behaviour.
    - E.g. ScalarType, ProductType, ReferenceType, FunctionType
  - In the Predefined class, we configure all of the predefined types and symbols in our symbol table.
    - E.g. Scalars - Integers and Booleans
    - Predefined operators (the actual symbols are defined in tree/Operator.java as an enum)
predefined.addOperator(Operator.EQUALS_OP, ErrorHandler.NO_LOCATION, LOGICAL_BINARY);
predefined.addOperator(Operator.NEQUALS_OP, ErrorHandler.NO_LOCATION, LOGICAL_BINARY);
predefined.addOperator(Operator.NEG_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_UNARY);
predefined.addOperator(Operator.ADD_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.SUB_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.MUL_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.DIV_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.EQUALS_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.NEQUALS_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GREATER_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LESS_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GEQUALS_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LEQUALS_OP, ErrorHandler.NO_LOCATION,
INT_RELATIONAL_TYPE);
- Notice that our addOperator function takes three inputs:
SymEntry.OperatorEntry addOperator(Operator op, Location loc, Type.FunctionType type);
  - Note that when we define these, we use ErrorHandler.NO_LOCATION as the location - they’re predefined and thus are not defined in the source code.
- For our arithmetic operators, we use a FunctionType such as ARITHMETIC_BINARY, which is defined as:
FunctionType ARITHMETIC_BINARY = new FunctionType(PAIR_INTEGER_TYPE, INTEGER_TYPE);
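Since EQUALS_OP is added twice above with different function types, an operator can have more than one entry. One way to picture how such an overloaded operator might be resolved is sketched below. This is a hypothetical model, not the syms package: operators and types are plain strings, "int*int" stands for the product type of two integers, and the lookup method is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OperatorTableSketch {
    // A function type from an argument type to a result type.
    static final class FunctionType {
        final String argType;
        final String resultType;
        FunctionType(String argType, String resultType) {
            this.argType = argType;
            this.resultType = resultType;
        }
    }

    // Each operator symbol maps to every function type registered for it.
    private final Map<String, List<FunctionType>> operators = new HashMap<>();

    void addOperator(String op, FunctionType type) {
        operators.computeIfAbsent(op, k -> new ArrayList<>()).add(type);
    }

    // Returns the result type of applying op to an argument of type argType,
    // or null if no registered function type accepts that argument.
    String resultType(String op, String argType) {
        List<FunctionType> candidates =
            operators.getOrDefault(op, Collections.<FunctionType>emptyList());
        for (FunctionType f : candidates) {
            if (f.argType.equals(argType)) {
                return f.resultType;
            }
        }
        return null;
    }

    static OperatorTableSketch predefined() {
        OperatorTableSketch table = new OperatorTableSketch();
        // Equality is overloaded: one entry for booleans and one for integers.
        table.addOperator("=", new FunctionType("boolean*boolean", "boolean"));
        table.addOperator("=", new FunctionType("int*int", "boolean"));
        table.addOperator("+", new FunctionType("int*int", "int"));
        return table;
    }

    public static void main(String[] args) {
        OperatorTableSketch table = predefined();
        System.out.println(table.resultType("=", "int*int"));
        System.out.println(table.resultType("+", "boolean*boolean"));
    }
}
```

Looking up "+" with a pair of booleans finds no matching function type, which is the situation the static checker would report as a type error.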
4.3 - Static Checker Implementation
- The static checker traverses the abstract syntax tree and type checks the tree as it traverses through.
  - As we traverse the AST, we may be updating it with type information.
  - The tree traversal is done using the Visitor pattern.
- Consider the following code, which is essentially an abstraction and simplification of an AST:
public abstract class Tree {
    public abstract void accept(Visitor v);

    public static class ATree extends Tree { // Tree subclass with two children
        Tree t1, t2;
        public ATree(Tree t1, Tree t2) {
            super();
            this.t1 = t1;
            this.t2 = t2;
        }
        public void accept(Visitor v) {
            v.visitATree(this);
        }
    }
    public static class BTree extends Tree { // A leaf node of the tree (i.e. has no subtrees)
        int n;
        public BTree(int n) {
            super();
            this.n = n;
        }
        public void accept(Visitor v) {
            v.visitBTree(this);
        }
    }
    public static class CTree extends Tree { // Tree subclass with three children
        Tree t1, t2, t3;
        public CTree(Tree t1, Tree t2, Tree t3) {
            super();
            this.t1 = t1;
            this.t2 = t2;
            this.t3 = t3;
        }
        public void accept(Visitor v) {
            v.visitCTree(this);
        }
    }
}
4.3.1 - Visitor Pattern
- We would like to create the visit methods shown here to implement tree traversal using the visitor pattern.
- Create individual methods for visiting ATree, BTree and CTree
- Ideally, we’d just like to do something like this:
public class TraversalIdeal {
public void visit(Tree.ATree t) {
visit(t.t1);
visit(t.t2);
}
public void visit(Tree.BTree t) {
System.out.println(t.n);
}
public void visit(Tree.CTree t) {
visit(t.t1);
visit(t.t2);
visit(t.t3);
}
}
- However, we don’t actually know the (dynamic) type of t.tn - we know that it’s a Tree (static type), but we don’t know what implementation of Tree it is.
  - The compiler is looking for a method of the form:
public void visit(Tree t) { ... }
- We can solve this by using the visitor pattern - we create an interface that has a traversal method for each tree subtype.
public interface Visitor {
    public void visitATree(Tree.ATree t);
    public void visitBTree(Tree.BTree t);
    public void visitCTree(Tree.CTree t);
}
- We then add an accept method in each of our tree implementations:
public abstract class Tree {
    public abstract void accept(Visitor v);

    public static class ATree extends Tree {
        Tree t1, t2;
        public ATree(Tree t1, Tree t2) {
            super();
            this.t1 = t1;
            this.t2 = t2;
        }
        public void accept(Visitor v) {
            v.visitATree(this);
        }
    }
    // BTree and CTree override accept in the same way.
}
- And then our Traversal class implements these methods:
public class Traversal implements Visitor {
    @Override
    public void visitATree(Tree.ATree t) {
        // Traverse using the accept() method instead of a visit() method.
        // This dynamically dispatches the call to the right visitXTree method.
        t.t1.accept(this);
        t.t2.accept(this);
    }
    @Override
    public void visitBTree(Tree.BTree t) {
        System.out.println(t.n);
    }
    @Override
    public void visitCTree(Tree.CTree t) {
        t.t1.accept(this);
        t.t2.accept(this);
        t.t3.accept(this);
    }
}
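Putting the pieces together, the following is a complete, runnable version of the pattern in a single file. The tree classes follow the simplified Tree example from these notes, while the LeafSum visitor and the sumLeaves helper are additions made purely for illustration.

```java
public class VisitorSketch {
    interface Visitor {
        void visitATree(ATree t);
        void visitBTree(BTree t);
        void visitCTree(CTree t);
    }

    static abstract class Tree {
        abstract void accept(Visitor v);
    }

    // Node with two children.
    static class ATree extends Tree {
        final Tree t1, t2;
        ATree(Tree t1, Tree t2) { this.t1 = t1; this.t2 = t2; }
        void accept(Visitor v) { v.visitATree(this); }
    }

    // Leaf node holding an integer.
    static class BTree extends Tree {
        final int n;
        BTree(int n) { this.n = n; }
        void accept(Visitor v) { v.visitBTree(this); }
    }

    // Node with three children.
    static class CTree extends Tree {
        final Tree t1, t2, t3;
        CTree(Tree t1, Tree t2, Tree t3) { this.t1 = t1; this.t2 = t2; this.t3 = t3; }
        void accept(Visitor v) { v.visitCTree(this); }
    }

    // Hypothetical visitor that adds up the values stored in the leaves.
    static class LeafSum implements Visitor {
        int total = 0;

        public void visitATree(ATree t) {
            // accept() dispatches on the dynamic type of each child.
            t.t1.accept(this);
            t.t2.accept(this);
        }
        public void visitBTree(BTree t) {
            total += t.n;
        }
        public void visitCTree(CTree t) {
            t.t1.accept(this);
            t.t2.accept(this);
            t.t3.accept(this);
        }
    }

    static int sumLeaves(Tree tree) {
        LeafSum visitor = new LeafSum();
        tree.accept(visitor);
        return visitor.total;
    }

    public static void main(String[] args) {
        Tree tree = new ATree(new BTree(1),
            new CTree(new BTree(2), new BTree(3), new BTree(4)));
        System.out.println(sumLeaves(tree));
    }
}
```

The visitor never needs to inspect the dynamic type of a child itself: calling accept on the child selects the matching visit method. A static checker could be written as another implementation of the same interface, without changing the tree classes.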