Introduction to Analytics Modeling

Frage	Antworten
Descriptive Question	Passed tense The question asked for explanation of what happened. e.g. 1. Which sets of customers are most likely in their buying patterns? 2. What factors are most important in determining customer similarity?
Predictive Question	Future tense The question asked for what is going to happen. e.g. 1. How much worldwide demand will there be for crude oil next year? 2. What will worldpay's stock price be in five years?
Prescriptive Question	What action would be best? e.g. what strategy can an airline use to get passengers to their destinations quickly before, during and after a big snow storm?
Classifier	The classifier that is father from making mistakes (a.k.a misclassification)
Soft Classifier	Used when there is no way avoid making mistakes. One that gives as good a separation as possible.
Cost of Misclassification	The more costly one type of bad decision is, the more we want to move the line away from it.
Vertical	From top to bottom Because there is no left to right line in V
Horizontal	From left to right People always say what a beautiful horizon, which is very harmony.
Structured Data	The data that can be stored structurally. Quantitative Data Categorical Data
Unstructured Data	No structure. e.g. written text
Data Point	all the information about one observation.
Time series data	The same thing is measured at same-period time intervals
Support Vector Machines (SVM)	For Classification Model
Support Vector Machines (SVM) Soft Classifier	The bigger the lambda is, the more important a large margin outweighs avoiding mistakes and misclassification. Image: Screen Shot 2019 03 03 At 11.31.43 Am (binary/octet-stream)
Support Vector Machines Error Measurement	Measure classification error Also for the Hard Separator (Make no Mistake) Image: Screen Shot 2019 03 03 At 11.41.45 Am (binary/octet-stream)
Support Vector Machines Why the name?	Image: Screen Shot 2019 03 03 At 11.45.32 Am (binary/octet-stream)
Add Cost to the SVM	Image: Screen Shot 2019 03 03 At 11.55.11 Am (binary/octet-stream)
Why scale data in SVM	As the coefficient values of different variables might be different by multiple magnitude. e.f. # of people and Annual Income$
Eliminate classifiers in SVM	Near-Zero coefficients are probably not relevant for classification.
Support Vector Machine	SVM does not have to straight line. One model can use multiple SVM with different multipliers. The bigger the multiplier is, the more important to avoid classification error.
Scale Data	Common scaling: data between 0 and 1 Scale linearly 0 and 1 can be a and b
Standardization	Image: Screen Shot 2019 03 03 At 12.14.06 Pm (binary/octet-stream)
When to use Scale	Scale to a fixed range When the bounded range is important. Neural networks Batting average RGB color intensities SAT Scores
When to use standardization	Principal Component Analysis (PCA) Clustering
K-nearest Neighbor Classification KNN	For situations that there are more than two classes Pick the k closest points (like neighbors) Allow to weight attributes differently

Nächster

Introduction to Analytics Modeling

Beschreibung

Zusammenfassung der Ressource

ähnlicher Inhalt

	Erstellt von Xuehua Bu vor fast 6 Jahre