Question | Answer |
Data mining is... | The core step of the knowledge discovery process |
Data Mining Process | |
Typical Data Mining System | |
What is OLTP? What does it do? | On-Line Transaction Processing (operational DBMS). The major task of a traditional relational DBMS. Covers day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc. |
What is OLAP and what does it do? | On-Line Analytical Processing (data warehouse). The major task of a data warehouse system. Covers data analysis and decision making. |
Distinct Features of OLTP vs. OLAP: User and System Orientation | Customer-oriented (OLTP) vs. market-oriented (OLAP) |
Distinct Features of OLTP vs. OLAP: Data Contents | Current, detailed (OLTP) vs. historical, consolidated (OLAP) |
Distinct Features of OLTP vs. OLAP: Database Design | ER + application (OLTP) vs. star + subject (OLAP) |
Distinct Features of OLTP vs. OLAP: View | Current, local (OLTP) vs. evolutionary, integrated (OLAP) |
Distinct Features of OLTP vs. OLAP: Access Patterns | Update (OLTP) vs. read-only but complex queries (OLAP) |
OLTP breakdown | USERS: clerk, IT professional; FUNCTION: day-to-day operations; DB DESIGN: application-oriented; DATA: current, up-to-date, detailed, flat relational, isolated; USAGE: repetitive; ACCESS: read/write, index/hash on primary key; UNIT OF WORK: short, simple transaction; # RECORDS ACCESSED: tens; # USERS: thousands; DB SIZE: 100 MB-GB; METRIC: transaction throughput |
OLAP breakdown | USERS: knowledge worker; FUNCTION: decision support; DB DESIGN: subject-oriented; DATA: historical, summarized, multidimensional, integrated, consolidated; USAGE: ad hoc; ACCESS: lots of scans; UNIT OF WORK: complex query; # RECORDS ACCESSED: millions; # USERS: hundreds; DB SIZE: 10 GB-TB; METRIC: query throughput, response time |
CUBE: A Lattice of Cuboids | |
Example of Fact Constellation | |
Generating Association Rules from Frequent Itemsets | Once the frequent itemsets have been found, generating strong association rules from them is straightforward. An association rule A -> B is STRONG if it satisfies both min_sup (minimum support) and min_conf (minimum confidence). |
Generating Association Rules from Frequent Itemsets: METHOD | 1. For each frequent itemset I, generate all non-empty subsets of I. 2. For every non-empty proper subset s of I, output the rule s -> (I - s) if conf(s -> (I - s)) >= min_conf, where conf(s -> (I - s)) = support(I) / support(s). |
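LIFT is | A measure of dependent/correlated events: lift(A, B) = P(A and B) / (P(A) * P(B)). A value of 1 means A and B are independent, above 1 positively correlated, below 1 negatively correlated. |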
Process (1): Model Construction | |
Process (2): Using the Model in Prediction | |
Attribute Selection Measure: Information Gain (ID3) | |
Attribute Selection: Info Gain | |
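
The rule-generation method and the lift measure from the cards above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `support` and `generate_rules`, the toy transactions, and the 0.6 confidence threshold are all assumptions made for the example. It also assumes the frequent itemsets have already been found (for instance by Apriori) and simply recounts support from the raw transactions.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def generate_rules(frequent_itemsets, transactions, min_conf):
    """Return (antecedent, consequent, confidence, lift) for each strong rule.

    For every frequent itemset I and every non-empty proper subset s of I,
    the rule s -> (I - s) is kept if its confidence reaches min_conf.
    """
    rules = []
    for itemset in frequent_itemsets:
        sup_i = support(itemset, transactions)
        # Sizes 1 .. len(I) - 1: s = I is skipped, since I - s would be empty.
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                s = frozenset(combo)
                consequent = itemset - s
                conf = sup_i / support(s, transactions)
                if conf >= min_conf:
                    # lift = P(A and B) / (P(A) * P(B)) = conf / P(B)
                    lift = conf / support(consequent, transactions)
                    rules.append((s, consequent, conf, lift))
    return rules


# Hypothetical toy data, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
frequent = [frozenset({"bread", "milk"})]  # assumed already mined

rules = generate_rules(frequent, transactions, min_conf=0.6)
for antecedent, consequent, conf, lift in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"conf={conf:.2f} lift={lift:.2f}")
```

In this toy data both rules pass the confidence threshold, yet their lift is below 1, which suggests that a rule can be "strong" by support and confidence while the two sides are in fact slightly negatively correlated.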