Lecture 2

  1. PL0’s Lexical Tokens
  2. Concrete Syntax Trees
  3. Abstract Syntax Trees
  4. Syntax Checking and Type Checking

1.0 - PL0’s Lexical Tokens

1.1 - An Aside on Regular Expressions

1.2 - List of PL0 Lexical Tokens

2.0 - Concrete Syntax Trees

This section is based on the PL0 Compiler Example set of PowerPoint slides

KW_IF, IDENTIFIER("x"), LESS, NUMBER(0), KW_THEN, IDENTIFIER("z"), ASSIGN,
MINUS, IDENTIFIER("x"), KW_ELSE, IDENTIFIER("z"), ASSIGN, IDENTIFIER("x")
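
A token stream like the one above could be produced by a small regular-expression lexer. The following is only a minimal sketch: the token names mirror the list above, but the patterns, the `tokenize` function, and the choice to represent tokens as `(kind, value)` pairs are assumptions made for illustration, not the PL0 compiler's actual lexer.

```python
import re

# Hypothetical token patterns; keywords are listed before IDENTIFIER so that
# "if", "then" and "else" are not lexed as identifiers.
TOKEN_SPEC = [
    ("KW_IF", r"if\b"),
    ("KW_THEN", r"then\b"),
    ("KW_ELSE", r"else\b"),
    ("ASSIGN", r":="),
    ("LESS", r"<"),
    ("MINUS", r"-"),
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source text into (kind, value) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "NUMBER":
            tokens.append((kind, int(text)))
        elif kind == "IDENTIFIER":
            tokens.append((kind, text))
        else:
            tokens.append((kind, None))
    return tokens


tokens = tokenize("if x < 0 then z := -x else z := x")
for token in tokens:
    print(token)
```

Only the token kinds needed for this one statement are covered; a full lexer would need the complete list of PL0 tokens.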

2.1 - Parsing the Concrete Syntax Tree

2.2 - Lessons from the Concrete Syntax Tree

3.0 - Abstract Syntax Trees

We can use Abstract Syntax Trees to represent the same information as in a Concrete Syntax Tree in a denser format.

PL0 Compiler Data Structures

4.0 - Syntax Checking and Type Checking

4.1 - Syntax Checking

if x < 0 then z := -x else z := x

4.2 - Type Checking the Abstract Syntax Tree (AST)

4.2.1 - Coercions

  1. We first evaluate the expression x < 0 - we need to determine whether the expression has a Boolean value.

    • We’re essentially checking the BinaryNode and its children nodes (as indicated by the orange box)
    • We notice that the BinaryNode specifies the use of LESS_OP which is the less than operator.
      • The less than operator takes two integers and returns a Boolean value, i.e. $\text{LESS}\_\text{OP}:(\text{int}\times\text{int})\rightarrow\text{bool}$
    • We need to check that the Identifier “x” and the Constant 0 are both integers, or coerce them into integers if they are not.
      • We resolve the type of the identifier “x” by looking it up in our symbol table.
      • By looking up the type of the identifier “x” we may see that its type is ref(int) (that is, a reference to an integer)
      • We need to dereference the identifier “x”, so we’re actually performing the LESS_OP operation on a dereferenced reference to an integer and a constant integer.
    • Therefore, the statement is type correct for its usage within this specific conditional statement.
    • After type checking the expression x < 0 our symbol table looks like this:
  2. We then evaluate the expression z := -x to check whether it’s well formed.

    • The AssignmentNode has a left value of IdentifierNode("z") and we need to resolve the identifier node’s reference just as we did for IdentifierNode("x") in the step above.
    • By looking up the type of IdentifierNode("z") in our symbol table, we may discover that it has type ref(int)
    • We now want to check the right child of the AssignmentNode to see whether it is an expression that evaluates to be the same type - the expression is the minus operator (MINUS_OP) being applied to the identifier “x”
      • The MINUS_OP takes an integer and returns an integer $(\text{MINUS}\_\text{OP}:\text{int}\rightarrow\text{int})$
      • Therefore, this expression should be well typed as long as the input value we’re passing it is an integer.
      • We’ve already looked up IDENTIFIER("x") before and we know it’s an integer variable, that is, it has type ref(int).
      • It’s not exactly an integer, so we have to coerce it into an integer (this is done by dereferencing the reference).
  3. Likewise, we check the other assignment.
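
The three steps above can be sketched as a small type checker that inserts a dereference wherever a ref(int) is found but an int is expected. This is a minimal illustration under stated assumptions: the node classes, `SYMBOLS`, `OPERATORS`, `coerce`, `check_exp` and `check_stmt` are hypothetical names chosen to echo the notes, and the symbol table is hard-coded so that “x” and “z” both have type ref(int).

```python
from dataclasses import dataclass

INT, BOOL = "int", "bool"


def ref(base):
    """Type of a reference to a value of type `base`, e.g. ref(int)."""
    return ("ref", base)


@dataclass
class ConstNode:
    value: int

@dataclass
class IdentifierNode:
    name: str

@dataclass
class DereferenceNode:  # coercion node inserted by the checker
    expr: object

@dataclass
class BinaryNode:
    op: str
    left: object
    right: object

@dataclass
class UnaryNode:
    op: str
    arg: object

@dataclass
class AssignmentNode:
    target: object
    value: object

@dataclass
class IfNode:
    cond: object
    then_part: object
    else_part: object


# Assumed symbol table and operator signatures, for this example only.
SYMBOLS = {"x": ref(INT), "z": ref(INT)}
OPERATORS = {"LESS_OP": ((INT, INT), BOOL), "MINUS_OP": ((INT,), INT)}


def check_exp(node):
    """Return (possibly rewritten expression, its type)."""
    if isinstance(node, ConstNode):
        return node, INT
    if isinstance(node, IdentifierNode):
        return node, SYMBOLS[node.name]  # symbol table lookup
    if isinstance(node, BinaryNode):
        (left_t, right_t), result_t = OPERATORS[node.op]
        left, right = coerce(node.left, left_t), coerce(node.right, right_t)
        return BinaryNode(node.op, left, right), result_t
    if isinstance(node, UnaryNode):
        (arg_t,), result_t = OPERATORS[node.op]
        return UnaryNode(node.op, coerce(node.arg, arg_t)), result_t
    raise TypeError(f"not an expression: {node!r}")


def coerce(node, expected):
    """Type check `node`, dereferencing it if that yields `expected`."""
    typed, actual = check_exp(node)
    if actual == expected:
        return typed
    if actual == ref(expected):
        return DereferenceNode(typed)
    raise TypeError(f"expected {expected}, found {actual}")


def check_stmt(node):
    """Return the statement with coercions made explicit."""
    if isinstance(node, AssignmentNode):
        target, target_t = check_exp(node.target)
        if not (isinstance(target_t, tuple) and target_t[0] == "ref"):
            raise TypeError("left side of := must be a variable")
        return AssignmentNode(target, coerce(node.value, target_t[1]))
    if isinstance(node, IfNode):
        return IfNode(coerce(node.cond, BOOL),
                      check_stmt(node.then_part),
                      check_stmt(node.else_part))
    raise TypeError(f"not a statement: {node!r}")


# if x < 0 then z := -x else z := x
tree = IfNode(BinaryNode("LESS_OP", IdentifierNode("x"), ConstNode(0)),
              AssignmentNode(IdentifierNode("z"), UnaryNode("MINUS_OP", IdentifierNode("x"))),
              AssignmentNode(IdentifierNode("z"), IdentifierNode("x")))
typed = check_stmt(tree)
print(typed.cond)
```

In the checked tree every use of “x” as a value is wrapped in a `DereferenceNode`, while the assignment targets stay as plain identifiers because an assignment needs the reference itself.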

Example Summary

5.0 - Context Free Grammars

5.1 - Writing Context-Free Grammars

5.2 - Definition of Context-Free Grammars

$E \rightarrow E\ Op\ E\ |\ "("\ E\ ")"\ |\ \text{number}$
$Op \rightarrow "+"\ |\ "-"\ |\ "*"$

This is a leftmost derivation, as at every step we choose to expand the leftmost non-terminal symbol

5.3 - Directly Derives (Formal Definition)

5.4 - Sequences of Derivations

We say that $\alpha$ derives $\beta$ (i.e. $\alpha \overset{*}{\Rightarrow} \beta$) if there is a sequence of direct derivations from $\alpha$ to $\beta$ (this sequence being $\{\gamma_0, \gamma_1, \cdots, \gamma_n\}$ where $\alpha = \gamma_0$, $\gamma_n = \beta$, and $\gamma_i \Rightarrow \gamma_{i+1}$ for each $0 \le i < n$).

5.5 - Nullable


5.6 - Language Corresponding to a Grammar

The (formal) language $\mathcal{L}(G)$ corresponding to a grammar, $G$, is the set of all finite sequences of terminal symbols that can be derived from the start symbol of the grammar using its productions:

$$\mathcal{L}(G) = \{\, t \in \mathbf{seq}\ \Sigma \mid S \overset{*}{\Rightarrow} t \,\}$$

where $S$ is the start symbol of $G$ and $\Sigma$ is its set of terminal symbols.

5.7 - Sentences and Sentential Forms


$$
\begin{aligned}
E &\Rightarrow E\ Op\ E \\
&\Rightarrow E\ Op\ E\ Op\ E \\
&\Rightarrow 3\ Op\ E\ Op\ E \\
&\Rightarrow 3 - E\ Op\ E \\
&\Rightarrow 3 - 4\ Op\ E \\
&\Rightarrow 3 - 4 - E \\
&\Rightarrow 3 - 4 - 2
\end{aligned}
$$
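
The derivation above can be replayed mechanically: at each step, find the leftmost non-terminal in the current sentential form and replace it with the right-hand side of a chosen production. The sketch below assumes the expression grammar from section 5.2; `GRAMMAR`, `is_production` and `leftmost_step` are hypothetical names, and treating any digit string as an instance of "number" is an assumption made so that steps like E ⇒ 3 are allowed.

```python
# E -> E Op E | "(" E ")" | number ;  Op -> "+" | "-" | "*"
GRAMMAR = {
    "E": [["E", "Op", "E"], ["(", "E", ")"], ["number"]],
    "Op": [["+"], ["-"], ["*"]],
}


def is_production(nonterminal, rhs):
    # Assumption: "number" stands for any digit string, so E -> 3 is allowed.
    if nonterminal == "E" and len(rhs) == 1 and rhs[0].isdigit():
        return True
    return rhs in GRAMMAR[nonterminal]


def leftmost_step(form, rhs):
    """Replace the leftmost non-terminal in `form` with `rhs`."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            if not is_production(symbol, rhs):
                raise ValueError(f"{symbol} -> {' '.join(rhs)} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to expand")


# The right-hand side chosen at each of the seven steps.
choices = [["E", "Op", "E"], ["E", "Op", "E"], ["3"], ["-"], ["4"], ["-"], ["2"]]

forms = [["E"]]
for rhs in choices:
    forms.append(leftmost_step(forms[-1], rhs))

for form in forms:
    print(" ".join(form))
```

Every form except the last still contains a non-terminal, so each is a sentential form; only the final line is a sentence.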

6.0 - Language Examples

$$
E \rightarrow "("\ E\ ")"\ |\ a \\
\color{gray}\mathcal{L}(E) = \{a, (a), ((a)), (((a))), ((((a)))), \cdots\}
$$
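
The first few sentences of this language can be enumerated directly from the definition of the language of a grammar: start from the start symbol, repeatedly expand the leftmost non-terminal, and keep every form that contains only terminals. The function below is a rough sketch with a hypothetical name (`language`); it bounds the search by the length of the sentential form, which is only safe under the assumption that no production has an empty right-hand side.

```python
from collections import deque


def language(grammar, start, max_len):
    """Sentences of at most `max_len` symbols derivable from `start`.

    Assumes no production is empty, so sentential forms never shrink and
    any form longer than `max_len` can be discarded.
    """
    seen = {(start,)}
    queue = deque([(start,)])
    found = set()
    while queue:
        form = queue.popleft()
        # Index of the leftmost non-terminal, or None if the form is a sentence.
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            found.add("".join(form))
            continue
        for rhs in grammar[form[index]]:
            new_form = form[:index] + tuple(rhs) + form[index + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(found, key=len)


# E -> "(" E ")" | a
PAREN_GRAMMAR = {"E": [("(", "E", ")"), ("a",)]}
sentences = language(PAREN_GRAMMAR, "E", 7)
print(sentences)
```

Raising `max_len` yields more deeply nested sentences; the full language is infinite, so some bound is always needed.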

6.1 - Parse Trees

Parse Tree Example

This is the parse tree for the previous example, where we try to parse $3-4-2$

$E \rightarrow E\ Op\ E$

$Op \rightarrow "+"\ |\ "-"\ |\ "*"$

6.2 - Ambiguous Grammars

In general, we don’t want our grammars to be ambiguous
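
One way to see the problem is to enumerate every parse tree that a grammar of the form E → E Op E | number allows for 3 - 4 - 2, and evaluate each one. The sketch below is an illustration only: `parses` and `evaluate` are hypothetical helpers, parse trees are represented as nested tuples, and parentheses are left out of the grammar for brevity.

```python
def parses(tokens):
    """All parse trees for `tokens` under E -> E Op E | number."""
    if len(tokens) == 1:
        return [tokens[0]] if isinstance(tokens[0], int) else []
    trees = []
    # Try each operator as the root of the tree.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in ("+", "-", "*"):
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Evaluate a parse tree built by `parses`."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return {"+": a + b, "-": a - b, "*": a * b}[op]


trees = parses([3, "-", 4, "-", 2])
for tree in trees:
    print(tree, "=", evaluate(tree))
```

The same sentence has two parse trees, grouping as 3 - (4 - 2) or as (3 - 4) - 2, and they evaluate to different values (1 and -3), so the grammar alone does not determine the meaning.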

The key idea is to add a new non-terminal symbol to break the symmetry of the production.

Parse Tree Using Left-Associative Unambiguous Grammar

Creating a parse tree for the $3-4-2$ example using the left-associative unambiguous grammar

$$
E \rightarrow E\ "-"\ T \\
E \rightarrow T \\
T \rightarrow N
$$
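
A parser for this grammar might look like the sketch below. Because E → E "-" T is left-recursive, the sketch does not recurse on E; instead it parses one T and then loops over each following "-" T, which builds the same left-leaning tree. The function names are hypothetical, the tree is a nested tuple, and N is assumed to be a single integer token.

```python
def parse_expression(tokens):
    """Parse tokens using E -> E "-" T | T and T -> N (N is an integer token)."""
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens) or not isinstance(tokens[pos], int):
            raise SyntaxError(f"expected a number at position {pos}")
        value = tokens[pos]
        pos += 1
        return value

    tree = parse_term()
    # Each further '"-" T' wraps the tree built so far as its LEFT child.
    while pos < len(tokens) and tokens[pos] == "-":
        pos += 1
        tree = (tree, "-", parse_term())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


def evaluate(tree):
    """Evaluate a tree built by `parse_expression`."""
    if isinstance(tree, int):
        return tree
    left, _, right = tree
    return evaluate(left) - evaluate(right)


tree = parse_expression([3, "-", 4, "-", 2])
print(tree, "=", evaluate(tree))
```

There is now exactly one tree for the input, grouped as (3 - 4) - 2, which matches the usual left-associative reading of subtraction.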