Zusammenfassung der Ressource
Formal Languages
- Phrase Structure Grammars
- (V, A, <s>, P)
- Finite set V
- A subset A of V
- An element <s> of V\A
- Finite subset P ⊂ (V*\A*) x V*
- In comparison, for CFG, P ⊂ (V\A) x V*
- A production rule in a PSG r → w
has a LHS, r, that may contain
more than one nonterminal. It is
required to contain at least one
nonterminal.
- For example, if A = {0, 1} and <s> is the start
symbol in a PSG, 0<s>0<s>0 → 00010 would be an
acceptable production rule, but NOT in a CFG.
- Noam Chomsky
- Regular Languages
- Introduced by Stephen Kleene
in 1951. Sometimes known as a
finite-state language, as a
language is regular iff it can be
recognised by an FSA.
- The regular languages over A constitute the smallest collection, C, of the subsets of A*, satisfying that:
- 1. All finite subsets of A* belong to C.
- 2. C is closed under the Kleene star operation
(if M⊆A* is inside C, i.e. if M ∈ C, then M* ∈ C)
- 3. C is closed under concatenation and union
- Let A be a finite set, and let A* be the set of words over the alphabet A.
- A subset, L, of A*, is called a regular language over
the alphabet A, if L = Lm for some finite sequence
L1, L2, ..., Lm of subsets of A* with the property
that ∀i, 1 ≤ i ≤ m, Li satisfies one of the following:
- Example
- Let A = {0, 1}. Let L = {0^m0^n | m, n ∈ N, m ≥ 0, n ≥ 0}.
- L is a regular language by way of these statements
- 0^m1^n stands for m 0's
followed by n 1's i.e. 0^m ◦ 1^n
- So, let us examine:
- L' = {0^m | m ∈ N, m ≥ 0}
- L'' = {1^n | n ∈ N, n ≥ 0}
- L, however, is NOT a regular language if the power of 0 and 1 were the same. This
language consists of strings of 0's, followed by an equal number of strings of 1's.
- For a machine to decide that the string is inside the language, it must store the number of 1's, as
it examines the number of 0's or vice versa
- The number of strings of the type 0m1n is not finite, however, so a finite-state machine cannot recognise this language.
- Finite State Acceptors
- Components
- Set of states
- Initial state
- The subset F ⊆ S of finishing states
- Transition mapping
[s' = t(s, a)]
- FSA = (S, F, i, t, A)
- Alphabet
- Recognition and Acceptance
- A word, w, over alphabet A is
recognised or accepted
by the FSA if each letter,
in order, is found in the
FSA states
- A language, L, over the
alphabet A, is said to be
recognised or accepted
by the FSA if L is the set
consisting of all words
recognised by the FSA
- Myhill-Nerode theorem
- Let L be a language over the
alphabet A. If the set L/N of
equivalence classes in L is infinite,
then L is NOT a regular language.
- Two Types:
- Deterministic
- every states has exactly
one transition for each
possible input
(function)
- Non-deterministric
- an input can lead to one,
more than one, or no
transition for a given
state (not a function)
- Automata Theory
- An automaton is a
mathematical model of
a computing device.
- Regular Grammars
- A context-free grammar is given by (V, A, <s>, P)
is called a regular grammar if every production
rule in P is of one of the three forms:
- 1. <A> → b<B>
- 2. <A> → b
- 3. <A> → ε
- Normal Form
- Lemma: Any language
generated by a regular
grammar may be
generated by a regular
grammar in normal form,
if you replace rules of type
2 (<A> → b) with two rules,
<A> → b<F> and <F> → ε.
- Left-Regular Grammar
- Right-Regular Grammar
- Rule: <A> → <B>b
- Example
- Regular Expressions
- ∅, ε, and all elements of A are regular expressions
- ε = language of a single string (the empty string)
- ∅ = language that doesn't contain any strings
- A language is regular ⇐⇒ some regular expression describes it
- The Pumping Lemma
- If L is a regular language, then there is a number p (the pumping
length) where if w is any word in L of length at least p, then w =
xuy for words x, y, and u satisfying:
- 1. u != ε
- 2.|xu| ≤ p
- 3. x u^n y ∈ L, ∀n ≥ 0
- Basics
- Let A be a finite set
- A = alphabet
- string in form a1, a2, ..., an = word
- A^n = set of all words of length n over the alphabet A
- A^+ = set of all words of positive length
- |w| = length of word
- w1◦w2 = concatenation
- Associative
- only concatenative if A has only one letter
- A^0 = empty word {ε}
- A* (Kleene star) = A^+ ∪ A^0
- Elements of A = letters
- L = language over A (subset of A*)
- L = formal language if ∃ a finite set
of rules or algorithm that
generates exactly L
- Formal Grammars
- A set of production rules
for strings in a language.
To generate such a
language, we use:
- 1. the set A (alphabet)
- 2. a start symbol <s>
- 3. a set of production rules
- Example
- A = {0, 1}
- <s>
- <s> → 0<s>1
<s> → 01
- all strings of form 0^m1^m
- terminals (strings of just elements of A i.e. 01)
- nonterminals (strings that contain
elements which aren't part of A i.e.
00<s>11)
- Context-Free Grammar (V, A, <s>, P)
- V: set of terminals and nonterminals,
A: set of terminals
- P ⊂ (V\A) x V*
- Ambiguous: if same string can be
generated in more than one way
- A language generated by a CFG is
known as a context-free
language.
- Context-Sensitive Grammars