Probabilistic parsing

Description

Note on Probabilistic parsing, created by Doria Šarić on 06/06/2018.

Resource summary


Statistical Parsing
====================================================================================
-- uses statistical models to resolve AMBIGUITY (e.g. PP-attachment), guide parsing and pick the most likely parse
-- the grammar is extracted from corpora
-- parsing of free, non-restricted texts with >90% accuracy and efficiently
-- requires a POS-tagged corpus and analyzed corpora (treebanks) -- PTB, AnCora
-- lexical approaches -- context-free (unigrams)
                      -- context-dependent (n-grams, HMM)
-- syntactic approaches -- SCFG = Stochastic Context-Free Grammar (Inside, Outside and Viterbi algorithms, learning models)
-- hybrid approaches -- stochastic lexicalized TAGs
-- computing the most probable parse -- Viterbi algorithm
-- parameter learning
   -- supervised -- tagged corpora annotated by linguists, i.e. treebanks (see the sketch after this list)
   -- unsupervised -- Baum-Welch (Forward-Backward) for HMM
                   -- Inside-Outside for SCFG
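
A minimal sketch of the supervised case: rule probabilities are learned by relative-frequency (MLE) counting over a tiny hand-written treebank. The toy trees and rules below are invented for illustration, not taken from PTB or AnCora.

from collections import defaultdict

# Toy "treebank": each tree is (label, child, ...) where a child is either a
# subtree (tuple) or a terminal word (string). Purely illustrative data.
toy_treebank = [
    ("S", ("NP", "she"), ("VP", ("V", "eats"), ("NP", "fish"))),
    ("S", ("NP", "fish"), ("VP", ("V", "swim"))),
]

def productions(tree):
    """Yield every (lhs, rhs) production used in one parse tree."""
    label, *children = tree
    yield label, tuple(c[0] if isinstance(c, tuple) else c for c in children)
    for c in children:
        if isinstance(c, tuple):
            yield from productions(c)

# Supervised MLE: P(A -> alpha) = count(A -> alpha) / count(A -> anything)
rule_counts = defaultdict(int)
lhs_counts = defaultdict(int)
for tree in toy_treebank:
    for lhs, rhs in productions(tree):
        rule_counts[(lhs, rhs)] += 1
        lhs_counts[lhs] += 1

for (lhs, rhs), c in sorted(rule_counts.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{c / lhs_counts[lhs]:.2f}]")

Each rule probability is the count of the rule divided by the count of all rules with the same left-hand side, which is exactly the MLE formula given in the SCFG notes below.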

SCFG (Stochastic Context-Free Grammar)
====================================================================================
-- a probability p is associated with each rule and each lexical entry
-- restriction on the context-free grammar = CHOMSKY NORMAL FORM (CNF):
   binary rules: Ap -> Aq Ar   (matrix Bpqr)
   unary rules:  Ap -> bm      (matrix Upm)
-- a probability is assigned to each left-most derivation (parse tree) allowed by the underlying CFG; argmax p(t) is the most likely parse tree (see the CKY sketch after this list)
-- how to obtain a PCFG from a treebank?
-- probability of a sentence: sum of p(t) over all parse trees t of the sentence; the best parse is the tree with maximum p(t)
PROS: gives some idea of the probability of a parse (though not a very good one)
      can be learned without negative examples
      provides a language model for the language
CONS: provides a worse language model than a 3-gram
-- robust; an SCFG can be combined with 3-grams
-- assigns a lot of probability to short sentences: small trees are more probable
-- parameter estimation (probabilities) -- problem of sparseness and data volume
-- the probability is associated with the rule, so information about the point of application of the rule in the derivation tree is lost
-- low-frequency constructions are penalized
-- probability of a derivation: contextual independence is assumed (by the CF grammar, but also by the probability assignment)
-- conditional independence can be relaxed -> sensitivity to structure, lexicalization; node expansion then depends on its position in the tree
-- 2 models
   Conditional/Discriminative model: the probability of a parse tree is estimated directly
      probabilities are conditioned on a concrete sentence
      no probability distribution over sentences is assumed
      the probabilities sum to 1
   Generative/Joint model: assigns probabilities to all the trees generated by the grammar
      the probabilities sum to 1
-- the probability of a sentence is the sum of the probabilities of all its valid parse trees
-- the probability of a subtree is independent of its position in the derivation tree -- positional invariance -- context-free, independence from ancestors
-- MLE (Maximum Likelihood Estimation), e.g. treebank grammars:
   P(A -> α) = count(A -> α) / count of all rules with left-hand side A
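
A minimal sketch of computing argmax p(t) with a Viterbi-style probabilistic CKY over a grammar in CNF; the grammar, lexicon and probabilities below are invented for illustration:

# Viterbi (probabilistic CKY) parsing for a toy SCFG in Chomsky Normal Form.
# Binary rules A -> B C and lexical rules A -> w; all probabilities invented.
binary = {            # (B, C) -> list of (A, P(A -> B C))
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 0.7)],
}
lexical = {           # word -> list of (A, P(A -> word))
    "she": [("NP", 0.3)],
    "eats": [("V", 1.0), ("VP", 0.3)],
    "fish": [("NP", 0.7)],
}

def viterbi_parse(words):
    n = len(words)
    best = {}   # (i, j, A) -> probability of the best A-tree over words[i:j]
    back = {}   # backpointers for rebuilding that tree
    for i, w in enumerate(words):
        for A, p in lexical.get(w, []):
            best[(i, i + 1, A)] = p
            back[(i, i + 1, A)] = w
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # split point
                for (B, C), rules in binary.items():
                    pb = best.get((i, k, B), 0.0)
                    pc = best.get((k, j, C), 0.0)
                    if pb and pc:
                        for A, p in rules:
                            cand = p * pb * pc
                            if cand > best.get((i, j, A), 0.0):
                                best[(i, j, A)] = cand
                                back[(i, j, A)] = (k, B, C)
    p_best = best.get((0, n, "S"), 0.0)
    return (p_best, rebuild(back, 0, n, "S")) if p_best else (0.0, None)

def rebuild(back, i, j, A):
    bp = back[(i, j, A)]
    if isinstance(bp, str):                      # lexical rule A -> word
        return (A, bp)
    k, B, C = bp
    return (A, rebuild(back, i, k, B), rebuild(back, k, j, C))

print(viterbi_parse("she eats fish".split()))
# (0.147, ('S', ('NP', 'she'), ('VP', ('V', 'eats'), ('NP', 'fish'))))

Replacing the max by a sum in the same chart yields the inside probability, i.e. the probability of the sentence under the grammar; that is the PCFG counterpart of the HMM forward pass compared below.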

HMM vs. PCFG
====================================================================================
HMM                                               | PCFG
probability distribution over strings of a        | probability distribution over the set of strings in the
certain length                                    | language L
Forward/Backward                                  | Inside/Outside
Forward:  αi(t) = P(w1...w(t-1), Xt = i)          | Outside: αj(p,q) = P(w1...w(p-1), N^j_pq, w(q+1)...wm | G)
Backward: βi(t) = P(wt...wT | Xt = i)             | Inside:  βj(p,q) = P(wp...wq | N^j_pq, G)
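
To make the Forward vs. Inside analogy concrete, here is a sketch of the inside probabilities βj(p,q) for the same invented CNF grammar used above: like the HMM forward pass it sums over all ways of generating the observed span, instead of maximizing as the Viterbi CKY does.

# Inside probabilities for the toy CNF SCFG: beta[(p, q, A)] = P(w_p..w_q | A, G).
# Same chart as the Viterbi CKY above, but summing over split points and rules
# instead of taking the maximum -- analogous to Forward vs. Viterbi for HMMs.
binary = {("NP", "VP"): [("S", 1.0)], ("V", "NP"): [("VP", 0.7)]}
lexical = {"she": [("NP", 0.3)], "eats": [("V", 1.0), ("VP", 0.3)],
           "fish": [("NP", 0.7)]}

def inside(words):
    n = len(words)
    beta = {}                        # (i, j, A) -> P(words[i:j] | A, G)
    for i, w in enumerate(words):
        for A, p in lexical.get(w, []):
            beta[(i, i + 1, A)] = beta.get((i, i + 1, A), 0.0) + p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (B, C), rules in binary.items():
                    pb = beta.get((i, k, B), 0.0)
                    pc = beta.get((k, j, C), 0.0)
                    if pb and pc:
                        for A, p in rules:
                            beta[(i, j, A)] = beta.get((i, j, A), 0.0) + p * pb * pc
    return beta

words = "she eats fish".split()
# P(sentence | G) is the inside probability of the start symbol over the whole span.
print(inside(words).get((0, len(words), "S"), 0.0))   # 0.147 for this tiny grammar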

