Data Mining Part 1

Question	Answer
Data Mining is...	The core of the knowledge discovery process
Data Mining Process	Image: a5412abb-d62b-4af9-9f81-c5b6c6406f88 (image/png)
Typical Data Mining System	Image: 5256d71b-9911-4358-8475-d805c1a2046a (image/png)
What is OLTP? What does it do?	On-Line Transaction Processing (operational DBMS). The major task of a traditional relational DBMS: day-to-day operations such as purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
What is OLAP and what does it do?	On-Line Analytical Processing (data warehouse). The major task of a data warehouse system: data analysis and decision making.
Distinct Features of OLTP vs. OLAP: User and System Orientation	Customer vs. market
Distinct Features of OLTP vs. OLAP: Data Contents	Current, detailed vs. historical, consolidated
Distinct Features of OLTP vs. OLAP: Database Design	ER + application vs. star + subject
Distinct Features of OLTP vs. OLAP: View	Current, local vs. evolutionary, integrated
Distinct Features of OLTP vs. OLAP: Access Patterns	Update vs. read-only but complex queries
OLTP breakdown	USERS: clerk, IT professional; FUNCTION: day-to-day operations; DB DESIGN: application-oriented; DATA: current, up-to-date, detailed, flat relational, isolated; USAGE: repetitive; ACCESS: read/write, index/hash on primary key; UNIT OF WORK: short, simple transaction; # RECORDS ACCESSED: tens; # USERS: thousands; DB SIZE: 100MB-GB; METRIC: transaction throughput
OLAP breakdown	USERS: knowledge worker; FUNCTION: decision support; DB DESIGN: subject-oriented; DATA: historical, summarized, multidimensional, integrated, consolidated; USAGE: ad hoc; ACCESS: lots of scans; UNIT OF WORK: complex query; # RECORDS ACCESSED: millions; # USERS: hundreds; DB SIZE: 10GB-TB; METRIC: query throughput, response
CUBE: A Lattice of Cuboids	Image: dd21b486-361d-49b5-8b91-16321a9bbf03 (image/png)
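As a rough illustration of the lattice idea, the sketch below enumerates every cuboid (every subset of the dimensions) for a small cube. It assumes no concept hierarchies, so n dimensions give 2^n cuboids; the dimension names and the `cuboid_lattice` helper are hypothetical and not taken from the card.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """List every cuboid as a tuple of dimensions, from apex (0-D) to base (n-D)."""
    dims = list(dimensions)
    return [combo
            for k in range(len(dims) + 1)        # 0-D, 1-D, ..., n-D cuboids
            for combo in combinations(dims, k)]

# Hypothetical dimensions, for illustration only.
lattice = cuboid_lattice(["time", "item", "location"])
for cuboid in lattice:
    print(len(cuboid), cuboid)
```

The first entry is the apex cuboid (no dimensions, the fully summarized total) and the last is the base cuboid (all dimensions).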
Example of Fact Constellation	Image: b3b38b83-7173-4d25-8223-ec08127c8ad4 (image/png)
Generating Association Rules from Frequent Itemsets	Once the frequent itemsets have been found, generating strong association rules from them is straightforward. An association rule A -> B is STRONG if it satisfies both min support and min confidence.
Generating Association Rules from Frequent Itemsets: Method	1. For each frequent itemset I, generate all non-empty subsets of I. 2. For every non-empty subset s of I, output the rule s -> (I - s) if conf(s -> (I - s)) >= min_conf.
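The two steps above can be sketched in Python as follows. This is a minimal illustration, not a full Apriori implementation: it assumes the support counts of the frequent itemset and all its subsets are already known, it computes confidence as support_count(I) / support_count(s), and it only considers proper subsets so the consequent is never empty. The function name, item names, and counts are hypothetical.

```python
from itertools import combinations

def generate_rules(itemset, support_count, min_conf):
    """Return (antecedent, consequent, confidence) for each rule meeting min_conf."""
    items = frozenset(itemset)
    rules = []
    # Proper non-empty subsets only (assumption: s != I, so I - s is non-empty).
    for size in range(1, len(items)):
        for subset in combinations(sorted(items), size):
            s = frozenset(subset)
            conf = support_count[items] / support_count[s]
            if conf >= min_conf:
                rules.append((s, items - s, conf))
    return rules

# Hypothetical support counts, for illustration only.
support_count = {
    frozenset({"milk"}): 6,
    frozenset({"bread"}): 7,
    frozenset({"milk", "bread"}): 4,
}
rules = generate_rules({"milk", "bread"}, support_count, min_conf=0.6)
for antecedent, consequent, conf in rules:
    print(set(antecedent), "->", set(consequent), round(conf, 3))
```

Because the input itemset is already frequent, every rule produced here satisfies min support automatically; only the confidence check remains.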
LIFT is	A measure of dependent/correlated events Image: d9f1af56-3359-4469-a027-e7936758c57c (image/png)
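The formula on this card is only available as an image, so the sketch below assumes the standard definition lift(A, B) = P(A and B) / (P(A) * P(B)). A value of 1 suggests A and B are independent, above 1 a positive correlation, and below 1 a negative one. The function name and the transaction counts are hypothetical.

```python
def lift(n_total, n_a, n_b, n_ab):
    """Lift of A and B from transaction counts (assumed standard definition)."""
    p_a = n_a / n_total     # P(A)
    p_b = n_b / n_total     # P(B)
    p_ab = n_ab / n_total   # P(A and B)
    return p_ab / (p_a * p_b)

# Hypothetical counts: 10 transactions, 6 with A, 7 with B, 4 with both.
value = lift(n_total=10, n_a=6, n_b=7, n_ab=4)
print(round(value, 3))
```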
Process (1): Model Construction	Image: cfb17bab-a68b-4cf9-87d1-45b687e4fd2c (image/png)
Process (2): Using the Model in Prediction	Image: aa9795c4-929c-4f66-90a6-5db7ddc2b6c8 (image/png)
Attribute Selection Measure: Information Gain (ID3)	Image: 142ad831-4ef8-4110-886e-3e195e33f6b7 (image/png)
Attribute Selection: Info Gain	Image: b5f38fb5-e86d-47fc-92c1-d2f065777713 (image/png)
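Since both information gain cards are images, here is a small sketch assuming the usual ID3 definitions: Info(D) = -sum(p_i * log2(p_i)) over the class proportions, and Gain(A) = Info(D) minus the weighted entropy of the partitions induced by attribute A. The helper names, attribute names, and toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Info(D): expected bits needed to classify a tuple in D."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(attr) = Info(D) - weighted entropy after splitting on attr."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# Hypothetical toy data: "student" separates the classes, "region" does not.
rows = [
    {"student": "yes", "region": "north"},
    {"student": "yes", "region": "north"},
    {"student": "no", "region": "north"},
    {"student": "no", "region": "north"},
]
labels = ["buys", "buys", "skips", "skips"]
gain_student = information_gain(rows, labels, "student")
gain_region = information_gain(rows, labels, "region")
print(gain_student, gain_region)
```

ID3 would pick the attribute with the highest gain as the split at the current node, which in this toy data is "student".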

	Created by Kim Graff over 9 years ago