A production rule in a PSG r → w
has a LHS, r, that may contain
more than one nonterminal. It is
required to contain at least one
nonterminal.
For example, if A = {0, 1} and <s> is the start
symbol in a PSG, 0<s>0<s>0 → 00010 would be an
acceptable production rule, but NOT in a CFG.
Noam Chomsky
Regular Languages
Introduced by Stephen Kleene
in 1951. Sometimes known as a
finite-state language, as a
language is regular iff it can be
recognised by an FSA.
The regular languages over A constitute the smallest collection, C, of the subsets of A*, satisfying that:
1. All finite subsets of A* belong to C.
2. C is closed under the Kleene star operation
(if M⊆A* is inside C, i.e. if M ∈ C, then M* ∈ C)
3. C is closed under concatenation and union
Let A be a finite set, and let A* be the set of words over the alphabet A.
A subset, L, of A*, is called a regular language over
the alphabet A, if L = Lm for some finite sequence
L1, L2, ..., Lm of subsets of A* with the property
that ∀i, 1 ≤ i ≤ m, Li satisfies one of the following:
Example
Let A = {0, 1}. Let L = {0^m0^n | m, n ∈ N, m ≥ 0, n ≥ 0}.
L is a regular language by way of these statements
0^m1^n stands for m 0's
followed by n 1's i.e. 0^m ◦ 1^n
So, let us examine:
L' = {0^m | m ∈ N, m ≥ 0}
L'' = {1^n | n ∈ N, n ≥ 0}
L, however, is NOT a regular language if the power of 0 and 1 were the same. This
language consists of strings of 0's, followed by an equal number of strings of 1's.
For a machine to decide that the string is inside the language, it must store the number of 1's, as
it examines the number of 0's or vice versa
The number of strings of the type 0m1n is not finite, however, so a finite-state machine cannot recognise this language.
Finite State Acceptors
Components
Set of states
Initial state
The subset F ⊆ S of finishing states
Transition mapping
[s' = t(s, a)]
FSA = (S, F, i, t, A)
Alphabet
Recognition and Acceptance
A word, w, over alphabet A is
recognised or accepted
by the FSA if each letter,
in order, is found in the
FSA states
A language, L, over the
alphabet A, is said to be
recognised or accepted
by the FSA if L is the set
consisting of all words
recognised by the FSA
Myhill-Nerode theorem
Let L be a language over the
alphabet A. If the set L/N of
equivalence classes in L is infinite,
then L is NOT a regular language.
Two Types:
Deterministic
every states has exactly
one transition for each
possible input
(function)
Non-deterministric
an input can lead to one,
more than one, or no
transition for a given
state (not a function)
Automata Theory
An automaton is a
mathematical model of
a computing device.
Regular Grammars
A context-free grammar is given by (V, A, <s>, P)
is called a regular grammar if every production
rule in P is of one of the three forms:
1. <A> → b<B>
2. <A> → b
3. <A> → ε
Normal Form
Lemma: Any language
generated by a regular
grammar may be
generated by a regular
grammar in normal form,
if you replace rules of type
2 (<A> → b) with two rules,
<A> → b<F> and <F> → ε.
Left-Regular Grammar
Right-Regular Grammar
Rule: <A> → <B>b
Example
Regular Expressions
∅, ε, and all elements of A are regular expressions
ε = language of a single string (the empty string)
∅ = language that doesn't contain any strings
A language is regular ⇐⇒ some regular expression describes it
The Pumping Lemma
If L is a regular language, then there is a number p (the pumping
length) where if w is any word in L of length at least p, then w =
xuy for words x, y, and u satisfying:
1. u != ε
2.|xu| ≤ p
3. x u^n y ∈ L, ∀n ≥ 0
Basics
Let A be a finite set
A = alphabet
string in form a1, a2, ..., an = word
A^n = set of all words of length n over the alphabet A
A^+ = set of all words of positive length
|w| = length of word
w1◦w2 = concatenation
Associative
only concatenative if A has only one letter
A^0 = empty word {ε}
A* (Kleene star) = A^+ ∪ A^0
Elements of A = letters
L = language over A (subset of A*)
L = formal language if ∃ a finite set
of rules or algorithm that
generates exactly L
Formal Grammars
A set of production rules
for strings in a language.
To generate such a
language, we use:
1. the set A (alphabet)
2. a start symbol <s>
3. a set of production rules
Example
A = {0, 1}
<s>
<s> → 0<s>1
<s> → 01
all strings of form 0^m1^m
terminals (strings of just elements of A i.e. 01)
nonterminals (strings that contain
elements which aren't part of A i.e.
00<s>11)
Context-Free Grammar (V, A, <s>, P)
V: set of terminals and nonterminals,
A: set of terminals
P ⊂ (V\A) x V*
Ambiguous: if same string can be
generated in more than one way
A language generated by a CFG is
known as a context-free
language.