Resource summary
Corpus
- Collection of texts
- Large
- Computer readable
- Designed for linguistic analysis
- Applications
- Depend on
- Design of corpora
- Observational methods of analysis
- Interpretation of analysis
- Translation studies
- Stylistics
- Forensic linguistics
- Cultural representation & key words
- Psycholinguistics
- Theoretical linguistics
- Modern corpora & software
- Principles
- Observer must not influence what is observed
- Repeated events are significant
- Available corpora
- 1960s --> 1990 (first generation)
- Small but carefully designed
- Carefully designed REFERENCE corpora
- Corpus design
- Must be balanced
- Must include considerable data
- Running words
- Size of the audience for the texts in the corpus
- Must combine
- Large general corpora
- Small corpora for specific knowledge
- Opportunistic text collections
- Some types of corpora (according to process)
- Raw
- Lemmatized
- Annotated
- Empirical linguistics
- Computer technology is essential
- Requires observation
- No single method
- Uses concordances
- Concordance lines
- Concordance data
- New findings & descriptions
- Word frequency
- Varies according to text type
- May have different senses
- Requires interpretation by materials designers & teachers
- AWL (Academic Word List) can be used as a guide
- Phrase frequency
- Determines word frequency
- Phrase-like units
- Basic units of meaning
- Phrases
- Collocations
- 1, 2, or 3 words co-occurring frequently
- Recurrent phrases
- Frequent multi-word strings
- Identified using computer programs
- Identify patterns
- Semantic preference, discourse prosody, and extended lexical units
- Collocation
- Colligation
- Semantic preference
- Discourse prosody
- Strength & attraction between nodes & collocates
- Position of nodes & collocates
- Distribution
- Grammar, co-text, and text-types
- Corpus can reveal characteristics
- Type-token ratio
- Lexical density
- % of everyday & academic vocabulary
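
The two measures named above, type-token ratio and lexical density, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the tokenizer is a simple regular expression, and `FUNCTION_WORDS` is a small hypothetical word list standing in for a proper part-of-speech tagger, so the lexical density figure is a rough proxy rather than a standard implementation.

```python
import re

# Hypothetical, deliberately tiny function-word list (an assumption for this
# sketch); a real study would use a POS tagger or a fuller list.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "is", "are", "was", "were", "it", "this", "that", "by", "for", "with",
}


def tokenize(text):
    """Lowercase the text and return its running words (tokens)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def lexical_density(tokens):
    """Share of tokens not found in FUNCTION_WORDS (proxy for content words)."""
    if not tokens:
        return 0.0
    content_words = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content_words) / len(tokens)


sample = "The cat sat on the mat and the dog sat on the rug"
tokens = tokenize(sample)
print("tokens:", len(tokens))
print("types:", len(set(tokens)))
print("type-token ratio:", round(type_token_ratio(tokens), 3))
print("lexical density:", round(lexical_density(tokens), 3))
```

The sample sentence has 13 running words and 8 distinct types. Type-token ratio is sensitive to text length, so comparisons are usually only meaningful between samples of similar size.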