Week 5.1
1.0 - First-Sets and Follow Sets
- In predictive RDP, we choose between alternatives based on the current token.
- To do this, we need to know what tokens each alternative can begin with, i.e. its First Set.
- If a construct is nullable, then to choose between recognising the empty string and a non-empty string, we need to know what symbols can follow the construct:
- Do we match the empty sequence of symbols, or was there a syntax error?
- To know this, we need to know the Follow Set of the non-terminal symbol.
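A minimal Python sketch of the choice just described, for the production B → b ∣ ϵ used in 1.3.1. The Follow set value is an assumption made up for illustration (as is "$" as an end-of-input marker); only the decision logic is the point.

```python
EPS = "ϵ"

FIRST_B_NONEMPTY = {"b"}   # First(b), for the alternative b of B → b ∣ ϵ
FOLLOW_B = {"c", "$"}      # ASSUMED Follow(B); "$" is a hypothetical end-of-input marker

def choose_B(lookahead: str) -> str:
    """Decide which alternative of B → b ∣ ϵ to use from the current token alone."""
    if lookahead in FIRST_B_NONEMPTY:
        return "b"          # the token can start the non-empty alternative
    if lookahead in FOLLOW_B:
        return EPS          # match ϵ; the token belongs to whatever follows B
    raise SyntaxError(f"unexpected token {lookahead!r} while parsing B")
```

A token that is in neither set is exactly the "was there a syntax error?" case.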
1.1 - First Set
- The first set of a construct α, written First(α), records:
- The set of terminal symbols α can start with, and
- If α is nullable, the empty string ϵ to indicate that.
- It should be emphasised that the ϵ in a First Set does not record that the construct can begin with the empty string, but rather that the whole construct can match the empty string (i.e. is nullable).
- Including ϵ in the first set is a bit confusing.
- It is there purely to indicate that the construct is nullable, and hence the first set encodes two pieces of information:
- The terminal symbols that can start the construct
- Whether or not it is nullable.
1.2 - First Set - Formal Definition
- For these definitions, α, β and γ are assumed to be possibly empty sequences of terminal and non-terminal symbols.
- If α is not nullable, its first set is the set of terminal symbols that can start α
- i.e. those terminal symbols "a" such that α can derive a sequence beginning with "a":
$$\mathrm{First}(\alpha) = \{\, a : \mathit{Terminal} \mid \exists \beta \bullet \alpha \Rightarrow^{*} a\beta \,\}$$
- If α is nullable, i.e. if α ⇒* ϵ, then First(α) also includes ϵ:
$$\mathrm{First}(\alpha) = \{\, a : \mathit{Terminal} \mid \exists \beta \bullet \alpha \Rightarrow^{*} a\beta \,\} \cup \{\epsilon\}$$
- Note that in this context, ϵ is not a terminal symbol; its presence in a first set merely indicates that α is nullable.
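To make the definition concrete, the sketch below computes First(α) directly from it: it explores leftmost derivations of α breadth-first and records the first terminal of each sentential form, or ϵ when α derives the empty string. The grammar encoding (a dict from non-terminal to a list of alternatives, with [] for ϵ) and the `max_steps` cut-off are assumptions for illustration; this only checks the definition on small grammars and is not how first sets are calculated in practice (see 1.4).

```python
from collections import deque

EPS = "ϵ"  # recorded when alpha derives the empty string

def first_by_derivation(grammar, alpha, max_steps=10_000):
    """Collect First(alpha) straight from the definition.

    grammar maps each non-terminal to a list of alternatives ([] = ϵ);
    any other symbol is a terminal. max_steps is an arbitrary cut-off, so
    for grammars whose sentential forms keep growing the answer may be
    incomplete.
    """
    result, seen = set(), set()
    queue = deque([tuple(alpha)])
    steps = 0
    while queue and steps < max_steps:
        form = queue.popleft()
        steps += 1
        if form in seen:
            continue
        seen.add(form)
        if not form:
            result.add(EPS)                 # alpha =>* ϵ, so alpha is nullable
        elif form[0] not in grammar:
            result.add(form[0])             # alpha =>* a beta with a terminal a
        else:
            for rhs in grammar[form[0]]:    # rewrite the leftmost non-terminal
                queue.append(tuple(rhs) + form[1:])
    return result

grammar = {"A": [["a", "d"], []]}           # A → a d ∣ ϵ
print(first_by_derivation(grammar, ["A"]))
print(first_by_derivation(grammar, ["A", "b"]))
```

Once a sentential form starts with a terminal, every further derivation from it keeps that terminal in front, so the search can stop expanding that form.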
1.3 - Calculating First Sets
Let:

| Symbol | Meaning |
|---|---|
| a | a terminal symbol |
| α1, α2, ⋯, αn | strings of (terminal and non-terminal) symbols |
| A | a non-terminal symbol defined by a single production A → α1 ∣ α2 ∣ ⋯ ∣ αn |
Then:
First(ϵ)={ϵ}
First(a)={a}
First(α1∣α2∣⋯∣αn)=First(α1)∪First(α2)∪⋯∪First(αn)
First(A)=First(α1∣α2∣⋯∣αn)
- First(ϵ) = {ϵ}: the empty sequence is nullable and cannot derive a string that starts with any terminal symbol.
- If any of the alternatives α1, α2, ⋯, αn is nullable, its first set contains ϵ, and hence First(α1 ∣ α2 ∣ ⋯ ∣ αn) contains ϵ as well.
1.3.1 - Examples of Calculating First Sets (alternatives)
$$\mathrm{First}(a \mid c) = \mathrm{First}(a) \cup \mathrm{First}(c) = \{a\} \cup \{c\} = \{a, c\}$$
Given that the production for A is A → a ∣ c, then:
$$\mathrm{First}(A) = \mathrm{First}(a \mid c) = \{a, c\}$$
That is, First(A) is just the first set of the RHS of its production (which is what we calculated above).
Given the only production for B is B → b ∣ ϵ, then:
$$\mathrm{First}(B) = \mathrm{First}(b \mid \epsilon) = \mathrm{First}(b) \cup \mathrm{First}(\epsilon) = \{b, \epsilon\}$$
Again, First(B) is the first set of the RHS of its production. Note that First(ϵ) = {ϵ} as ϵ is nullable.
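The rules of 1.3 translate almost directly into recursive functions. This Python sketch assumes a non-recursive grammar (otherwise the recursion would not terminate) and, since sequences of symbols are only handled in 1.3.2, it only accepts alternatives that are ϵ (an empty list) or a single symbol. The function names and the dict encoding of the grammar are hypothetical.

```python
EPS = "ϵ"  # marks nullability; not a terminal

def first_of_symbol(grammar, sym):
    if sym not in grammar:              # terminal: First(a) = {a}
        return {sym}
    # non-terminal: First(A) = First(α1 | α2 | ... | αn)
    return first_of_alternatives(grammar, grammar[sym])

def first_of_alternatives(grammar, alternatives):
    """First(α1 | ... | αn) = First(α1) ∪ ... ∪ First(αn)."""
    result = set()
    for alt in alternatives:
        if len(alt) == 0:
            result.add(EPS)             # First(ϵ) = {ϵ}
        elif len(alt) == 1:
            result |= first_of_symbol(grammar, alt[0])
        else:
            raise ValueError("sequences of symbols are covered in 1.3.2")
    return result

grammar = {"A": [["a"], ["c"]], "B": [["b"], []]}   # A → a ∣ c,  B → b ∣ ϵ
print(first_of_symbol(grammar, "A"))
print(first_of_symbol(grammar, "B"))
```

The two calls reproduce the examples above: {a, c} for A and {b, ϵ} for B.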
1.3.2 - Calculating First Sets of Sequences of Symbols
Let S1, S2, ⋯, Sn be (terminal or non-terminal) symbols. Then:
- Since S1 is the first symbol of the sequence whose first set we are computing, the first set must contain the terminal symbols that can start S1.
- If S1 is nullable, then it will also contain all of the terminal symbols that can start S2 (and this pattern can repeat until the end of the sequence, Sn).
- We also need to know whether a sequence of symbols is nullable: the sequence is nullable only if all of the symbols within it are nullable.
$$
\begin{aligned}
\mathrm{First}(S_1\, S_2 \cdots S_j \cdots S_n) = {} & (\mathrm{First}(S_1) - \{\epsilon\}) \\
\cup\ & (\mathrm{First}(S_2) - \{\epsilon\}) && \text{if } S_1 \text{ is nullable} \\
\cup\ & \cdots \\
\cup\ & (\mathrm{First}(S_j) - \{\epsilon\}) && \text{if } S_1\, S_2 \cdots S_{j-1} \text{ is nullable} \\
\cup\ & \cdots \\
\cup\ & (\mathrm{First}(S_n) - \{\epsilon\}) && \text{if } S_1\, S_2 \cdots S_{n-1} \text{ is nullable} \\
\cup\ & \{\epsilon\} && \text{if } S_1\, S_2 \cdots S_n \text{ is nullable}
\end{aligned}
$$
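A sketch of the sequence rule in Python. It assumes the first sets of the individual symbols are already known and passed in as a dict (terminals map to singleton sets). It walks S1, S2, ⋯ left to right, stops at the first symbol that is not nullable, and adds ϵ only if it never stops. The name `first_seq` is hypothetical.

```python
EPS = "ϵ"  # marks nullability; not a terminal

def first_seq(seq, first):
    """First(S1 S2 ... Sn) from the first sets of the individual symbols.

    `first` maps every symbol in seq to its first set (terminal t -> {t}).
    """
    result = set()
    for sym in seq:
        result |= first[sym] - {EPS}   # reached only while S1..S(i-1) are nullable
        if EPS not in first[sym]:
            return result              # sym is not nullable, so neither is seq
    result.add(EPS)                    # every symbol nullable (or seq is empty)
    return result

# First sets taken from the examples that follow
ex1 = {"A": {"a", EPS}, "b": {"b"}}
ex2 = {"A": {"a", EPS}, "B": {"b", EPS}, "C": {"c", EPS}}
print(first_seq(["A", "b"], ex1))
print(first_seq(["A", "B", "C"], ex2))
```

The first call gives {a, b} and the second {a, b, c, ϵ}, as in the two worked examples.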
Example of Calculating First-Sets of Sequences
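Let the only production for A be A → a d ∣ ϵ; then the first set is:
$$
\begin{aligned}
\mathrm{First}(A) &= \mathrm{First}(a\, d \mid \epsilon) && \text{replace } A \text{ with the RHS of its production} \\
&= \mathrm{First}(a\, d) \cup \mathrm{First}(\epsilon) && \text{separate out the alternatives} \\
&= \mathrm{First}(a) \cup \mathrm{First}(\epsilon) && \mathrm{First}(a\, d) = \mathrm{First}(a) \text{ as } a \text{ is not nullable} \\
&= \mathrm{First}(a) \cup \{\epsilon\} && \mathrm{First}(\epsilon) = \{\epsilon\} \\
&= \{a, \epsilon\}
\end{aligned}
$$
However, as A is nullable:
$$\mathrm{First}(A\, b) = (\mathrm{First}(A) - \{\epsilon\}) \cup \mathrm{First}(b) = (\{a, \epsilon\} - \{\epsilon\}) \cup \{b\} = \{a, b\}$$
Since A is nullable, First(A b) also contains all of the terminal symbols that b can start with; since b is not nullable, ϵ is not in First(A b).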
Example II of Calculating First-Sets of Sequences
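Let the following be the only productions for A, B and C:
A → a ∣ ϵ
B → b ∣ ϵ
C → c ∣ ϵ
Then each of A, B, C is nullable, and hence
$$
\begin{aligned}
\mathrm{First}(A\, B\, C) &= (\mathrm{First}(A) - \{\epsilon\}) \cup (\mathrm{First}(B) - \{\epsilon\}) \cup (\mathrm{First}(C) - \{\epsilon\}) \cup \{\epsilon\} \\
&= (\{a, \epsilon\} - \{\epsilon\}) \cup (\{b, \epsilon\} - \{\epsilon\}) \cup (\{c, \epsilon\} - \{\epsilon\}) \cup \{\epsilon\} \\
&= \{a\} \cup \{b\} \cup \{c\} \cup \{\epsilon\} \\
&= \{a, b, c, \epsilon\}
\end{aligned}
$$
1.3.3 - First Sets for Optionals, Repetitions and Groups (EBNF)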
1.4 - Calculating First-Sets Algorithmically
- To calculate the first set for a syntactic construct, we need to know the first sets for the non-terminals occurring in it
- Hence, we start by showing how to calculate the first sets of all the non-terminals in a grammar
- We start with the first sets for all non-terminals being the empty set, and note that the first set for every terminal symbol "a" is the singleton set {a}
- We then make a pass over all productions in the grammar, considering every alternative and processing it as follows.
- In this process, we assume that our grammar is written in plain BNF form.
- If there is a production of the form N→ϵ, we add ϵ to the first set for N to indicate that it is nullable.
- If there is a production of the form N → S1 S2 ⋯ Sn, then for each i ∈ 1⋯n, if Sj is nullable for all j ∈ 1⋯i−1, we add the current first set of Si, minus ϵ, to the first set of N
- If every construct S1,⋯,Sn is nullable, we add ϵ to the first set of N
- After making a complete pass in which we process every alternative right side for all productions, we repeat the process of making a pass, but start with the first sets computed so far, rather than the initial first sets.
- This pass may or may not extend some first sets
- If no first sets are modified in the pass, we are finished.
- Otherwise, we repeat the process.
- Because every first set is a subset of the finite set of terminal symbols plus ϵ, first sets only ever grow, and each time we decide to repeat the process at least one first set must have been extended by at least one symbol, the whole process must terminate.
- The above iterative algorithm is a common approach to calculating inductively defined sets.
1.4.1 - Calculating First Sets Algorithmically
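- Consider the grammar with the following productions:
A → B x ∣ C
B → C y ∣ D
C → D z ∣ ϵ
D → A w
- Initially, set all first sets to be the empty set, {}

| Non-terminal | Initial |
|---|---|
| A | {} |
| B | {} |
| C | {} |
| D | {} |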
- First Pass
- Consider the production A → B x ∣ C: from this we know that First(B) − {ϵ} ⊆ First(A) and First(C) − {ϵ} ⊆ First(A)
- Consider the first alternative, B x
- We can't update or change anything since First(B) is still empty, so we skip it for this iteration
- Consider the second alternative, C: we also can't update or change anything since First(C) is still empty, so we skip it for this iteration
- At this stage, we don't even know that C is nullable, and therefore can't deduce that ϵ ∈ First(A)
- Consider the production B → C y ∣ D: from this we know that First(C) − {ϵ} ⊆ First(B) and First(D) − {ϵ} ⊆ First(B)
- As with A, First(C) and First(D) are still empty, so we can't update First(B)
- Consider the third production, C → D z ∣ ϵ: from this we know that ϵ ∈ First(C) and First(D) − {ϵ} ⊆ First(C)
- Since the alternative ϵ tells us that C is nullable, we add ϵ to First(C)
- Consider the fourth production, D → A w: from this we know that First(A) − {ϵ} ⊆ First(D)
- As before, First(A) is still empty, so we can't update First(D)
| Non-terminal | Pass 1 |
|---|---|
| A | {} |
| B | {} |
| C | {ϵ} |
| D | {} |
- Second Pass
- There is a "next round" as a first set changed in the last iteration: First(C)
- Consider the first production, A → B x ∣ C
- We look at the first alternative in the production, B x: we can't do anything as First(B) is still empty
- We look at the second alternative in the production, C: we know that ϵ ∈ First(C), and therefore C is nullable
- Therefore, we can add ϵ to First(A)
- Consider the second production, B → C y ∣ D
- We look at the first alternative in the production, C y
- We don't know anything about First(C) besides the fact that C is nullable.
- Based on this, we know that y can start B, so we add y to First(B)
- We look at the second alternative in the production, D
- Since First(D) is still empty, we can't add anything to First(B)
- Consider the third production, C → D z ∣ ϵ
- Consider the first alternative in the production, D z
- Since First(D) is still empty, we can't add anything to First(C) (we already know that C is nullable and have already added ϵ to First(C))
- Consider the second alternative in the production: ϵ has already been added to First(C) to signify that it is nullable.
- Consider the fourth production, D → A w
- We know that First(A) − {ϵ} ⊆ First(D), and ϵ is now in First(A), so A is nullable
- Hence A can derive ϵ, in which case D derives a string starting with w
- Therefore, we can add w to First(D)
| Non-terminal | Pass 1 | Pass 2 |
|---|---|---|
| A | {} | {ϵ} |
| B | {} | {y} |
| C | {ϵ} | {ϵ} |
| D | {} | {w} |
- Third Pass
- There is a "next round" as we updated the first sets of A, B and D in the last iteration
- Consider the first production, A → B x ∣ C
- Consider the first alternative, B x: we now know that y ∈ First(B), so we add y to First(A)
- Since B is not nullable, we don't include x
- Consider the second alternative, C: we've already added ϵ to signify that C is nullable.
- Consider the second production, B → C y ∣ D
- Consider the first alternative, C y: First(B) gets everything C can start with, which is still nothing; since C is nullable, it also gets y, which is already there
- Consider the second alternative, D: First(D) − {ϵ} ⊆ First(B), which means that we add w to First(B)
- Consider the third production, C → D z ∣ ϵ
- Consider the first alternative, D z: we now know that First(D) contains w, so we add w to First(C)
- We don't add z, as D is not nullable.
- Consider the fourth production, D → A w
- We know that First(A) − {ϵ} now contains y, and since the production starts with A, we can add y to First(D) (w is already there, since A is nullable)
| Non-terminal | Pass 1 | Pass 2 | Pass 3 |
|---|---|---|---|
| A | {} | {ϵ} | {ϵ, y} |
| B | {} | {y} | {y, w} |
| C | {ϵ} | {ϵ} | {ϵ, w} |
| D | {} | {w} | {w, y} |
- Fourth Pass
- There is a "next round" as we updated the first sets of A, B, C and D in the last iteration
- Consider the first production, A → B x ∣ C
- From this production, First(A) is (First(B) − {ϵ}) ∪ First(C), since B is not nullable and the alternative C contributes its whole first set
- We now see that First(B) contains w, so we add w to First(A)
- Since C is nullable, ϵ stays in First(A)
- Consider the second production, B → C y ∣ D
- Consider the first alternative, C y: it contributes First(C) − {ϵ} = {w} and, since C is nullable, y; both are already in First(B)
- Consider the second alternative, D: First(D) − {ϵ} = {w, y}, which is already contained in First(B)
- Consider the third production, C → D z ∣ ϵ
- Consider the first alternative, D z: First(D) now contains y, so we add y to First(C)
- Consider the fourth production, D → A w
- First(A) − {ϵ} now contains y and w, both of which are already in First(D)
| Non-terminal | Pass 1 | Pass 2 | Pass 3 | Pass 4 |
|---|---|---|---|---|
| A | {} | {ϵ} | {ϵ, y} | {ϵ, y, w} |
| B | {} | {y} | {y, w} | {y, w} |
| C | {ϵ} | {ϵ} | {ϵ, w} | {ϵ, w, y} |
| D | {} | {w} | {w, y} | {w, y} |
- Fifth Pass
- There is a "next pass" as the first sets of A and C were updated in the previous iteration
- We perform this pass and find that no first set changes, so we terminate at the end of it; the final first sets are those computed in the fourth pass.
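The passes above can be replayed with a short Python sketch. Like the walkthrough, it updates each first set in place (so a production sees changes made earlier in the same pass) and visits the productions in the order A, B, C, D, recording a snapshot after every pass. The list-of-productions encoding and the name `trace_first_sets` are assumptions for illustration.

```python
EPS = "ϵ"  # marks a nullable non-terminal

# The worked example, in the order the passes visit the productions;
# [] encodes an ϵ alternative.
GRAMMAR = [
    ("A", [["B", "x"], ["C"]]),
    ("B", [["C", "y"], ["D"]]),
    ("C", [["D", "z"], []]),
    ("D", [["A", "w"]]),
]

def trace_first_sets(grammar):
    """Run passes until nothing changes; return a snapshot after every pass."""
    first = {lhs: set() for lhs, _ in grammar}
    history = []
    while True:
        before = {n: set(s) for n, s in first.items()}
        for lhs, alternatives in grammar:
            for rhs in alternatives:
                for sym in rhs:
                    f = first[sym] if sym in first else {sym}  # terminal: {sym}
                    first[lhs] |= f - {EPS}
                    if EPS not in f:
                        break              # sym not nullable: stop this alternative
                else:
                    first[lhs].add(EPS)    # every symbol nullable (or rhs is ϵ)
        history.append({n: set(s) for n, s in first.items()})
        if first == before:
            return history

history = trace_first_sets(GRAMMAR)
for number, snapshot in enumerate(history, start=1):
    print(f"Pass {number}:", {n: sorted(s) for n, s in snapshot.items()})
```

Running it gives five snapshots; the fifth is identical to the fourth, which is why the process stops there.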
1.4.2 - Calculating First Sets - Formal Algorithmic Definition
- We allow ϵ as a symbol for calculating first sets:

    Map⟨Symbol, Set⟨Symbol⟩⟩ first;

    calculateFirst()
        for (Symbol N : g.nonterminals)
            first(N) := {};
        do
            saveFirst := copy(first);
            for (p : g.productions)
                first(p.lhs) := first(p.lhs) ∪ firstSeq(p.rhs);
        while (first ≠ saveFirst)
- This process computes the least fixed point of the constraints firstSeq(p.rhs) ⊆ first(p.lhs), simultaneously for all productions p in the grammar.
- Note that the firstSeq method returns the first set of the whole sequence of symbols on the right side of a production, using the current first sets.
- The right side of N → ϵ is represented by a sequence of length 0.
    Set⟨Symbol⟩ firstSeq(List⟨Symbol⟩ s)
        i := 0; nullable := true;
        f := {};
        while i < length(s) && nullable
            if s(i) ∈ g.terminals
                f := f ∪ {s(i)};
                nullable := false;
            else
                f := f ∪ (first(s(i)) − {ϵ});
                nullable := (ϵ ∈ first(s(i)));
            i := i + 1;
        if nullable
            f := f ∪ {ϵ};
        return f
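A direct Python translation of calculateFirst and firstSeq, offered as a sketch. The `Grammar` class stands in for the notes' g and is an assumption, as is the use of the string "ϵ" for ϵ. Note that saveFirst must be an actual copy of first: if it were just another reference to the same map, the comparison would always succeed and the loop would stop after one pass.

```python
from dataclasses import dataclass

EPS = "ϵ"  # allowed in first sets only, to mark a nullable construct

@dataclass
class Grammar:                      # hypothetical stand-in for g
    terminals: set
    nonterminals: set
    productions: list               # (lhs, rhs) pairs; rhs == [] encodes N -> ϵ

def first_seq(g, first, s):
    """firstSeq: first set of the symbol sequence s, given the current first sets."""
    i, nullable, f = 0, True, set()
    while i < len(s) and nullable:
        if s[i] in g.terminals:
            f.add(s[i])
            nullable = False
        else:
            f |= first[s[i]] - {EPS}
            nullable = EPS in first[s[i]]
        i += 1
    if nullable:
        f.add(EPS)
    return f

def calculate_first(g):
    """calculateFirst: repeat passes over all productions until nothing changes."""
    first = {n: set() for n in g.nonterminals}
    while True:
        save_first = {n: set(f) for n, f in first.items()}  # a copy, not an alias
        for lhs, rhs in g.productions:
            first[lhs] |= first_seq(g, first, rhs)
        if first == save_first:
            return first

# Usage: the worked example grammar from 1.4.1
g = Grammar(
    terminals={"x", "y", "z", "w"},
    nonterminals={"A", "B", "C", "D"},
    productions=[
        ("A", ["B", "x"]), ("A", ["C"]),
        ("B", ["C", "y"]), ("B", ["D"]),
        ("C", ["D", "z"]), ("C", []),
        ("D", ["A", "w"]),
    ],
)
result = calculate_first(g)
print(result)
```

Each alternative is written as a separate (lhs, rhs) production here, which matches the plain BNF assumption of 1.4.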