Question	Answer
How can you check Multicollinearity?	Compute the variance inflation factor (VIF) of each IV; rule of thumb: VIF < 10 means multicollinearity is not a problem
How can the presence of Multicollinearity be solved?	- get more data (add more observations) - eliminate one or more variables (guided by theory) - transform the variables into fewer, independent factors (e.g. with Factor Analysis)
Explain Ward's Method	Ward's method is a variance method: at each step it merges the two clusters whose merger leads to the smallest increase in within-cluster variance
Explain Mediation	The effect of X on Y passes (partially or fully) through a mediator M. The nature of the relationship does not change; the mediator absorbs some part of the effect of X on Y, so the direct influence becomes smaller (or even insignificant)
Explain Moderation	A variable Z moderates the relationship between X and Y if the effect of X on Y depends on the level of Z (interaction effect). It changes the relationship between the independent variable X and Y
Conditions for Mediation	3 regressions for 3 requirements: 1. Relationship between X and Y (Beta must be sig) 2. Relationship between X and M (Beta must be sig) 3. All variables in one regression (Y on X and M): a. Insig between X and Y -> full mediation b. Still sig between X and Y -> partial mediation
What is a marketing model?	A simplified representation of reality that we use to solve problems and gather insights to make decisions
What is a market response model?	Model trying to establish a function between a group of independent variables and a dependent variable of interest (e.g. Sales)
Explain the intuition behind the least squares model	Aims at estimating the parameters of a linear regression model with the lowest possible error (highest variance explained), by minimising the sum of squared residuals (squared differences between predicted and actual Y)
What are residuals and why are they squared in linear regression?	Residuals are the differences between the model-predicted values of Y and the actual values of Y. They are squared so that positive and negative differences do not cancel each other out.
Simple explanation of linear regression	Fit the line that minimises the total squared (vertical) distance to all data points
Why does linear regression use a measurement error term (Epsilon)?	You cannot predict the DV perfectly, so you need to introduce a statistical error term to be able to infer things from the model
What is the difference between Measurement Error (Epsilon) and e?	Epsilon is the true (unobservable) error of the underlying model; e is the residual, i.e. the error computed from the estimated model
How do you interpret Beta?	A 1 unit change in X (IV) leads to a change of b1 units in Y (DV), holding the other IVs constant. E.g. a 1 unit change in promotion leads to a 0.56 unit change in Sales
What does the standard deviation of error term indicate?	smaller standard deviation of error term indicates more accurate measurement because data points are less dispersed around regression line
How do you compare relative effects of different IVs on DV if their units are different?	Standardisation: the raw data is transformed into new variables that have a mean of 0 and a variance of 1. Standardised regression coefficients = Beta. They allow direct comparison of the relative effect of each IV on the DV
What is the null hypothesis to test overall model significance?	H0: all coefficients are equal to 0. This means that you just have a point (the mean), no regression line; there is no pattern in the data -> you want to reject that (overall model should be significant)
How can coefficient significance be tested?	H0: bk = 0. Use a t-test and check the p-value (should be significant); if it is not, the IV has no influence on the DV and its coefficient should be set to 0
How do you calculate a 95% confidence interval?	[Y - 2sigma, Y + 2sigma] (approximately; the exact multiplier under normality is 1.96)
What is R-square?	R-square is the percentage of variation in the DV explained by the model -> a higher value means the model captures more variation and predicts the DV more accurately
Why is adjusted R-square important?	Adj. R-square is adjusted for the number of IVs and the sample size. This is important because you could easily get a higher R-square just by adding more variables. It also allows you to compare models with different numbers of IVs
What are the three error assumptions?	1. Normal distribution with mean zero 2. Equal variance 3. Independence
How can you test the normality assumption?	Kolmogorov-Smirnov (KS) test on the residuals [Image]
You do not want to reject the null of the KS test. (T/F?)	True. Null: residuals are normally distributed; you do not want to reject that
What is the equal variance assumption?	The residuals have the same variation above the mean as beneath the mean. Test by plotting Y-hat against the residuals (Xi switches axis) [Image]
What does a violated equal variance assumption look like?	The spread of the residuals changes with Y-hat, e.g. a funnel shape [Image]
When is the independence assumption violated?	When the residuals show a systematic pattern, e.g. when residuals increase for higher values of Y [Image]
How do you interpret a dummy coefficient (Beta)?	Always interpret a dummy coefficient relative to the baseline category. SPSS automatically excludes the baseline category from the regression, because including it would cause perfect collinearity
In Conjoint analysis: why are consumers asked to evaluate products by considering different attributes jointly?	If you ask about each attribute separately (disjoint approach), the answers are biased and do not provide valuable insight, e.g. consumers want all of the most desirable features at the lowest possible cost
What is the most preferred product?	The one with the highest total utility, i.e. the combination of attribute levels with the highest part-worth coefficients
How do you calculate the importance of an attribute?	importance = Max - Min (part worth values) Importance of attribute is defined as the range of part-worth across the levels
How do you calculate the relative importance of attribute?	relative importance = importance of attribute / sum of all attribute importances
How do you calculate utilities and market share of hypothetical products?	Utility of a product = sum of the part-worth coefficients of its attribute levels. Market share = utility of the product divided by the sum of the utilities of all products [Image]
Standardised coefficients become importance of attributes. (T/F?)	False. Unstandardised coefficients become part-worth coefficients. Insignificant coefficients should be set to 0
What are the two goals of cluster analysis?	1. Homogeneity within a cluster 2. Heterogeneity between clusters
How do you measure homogeneity/heterogeneity?	Distance between two consumers
When do you use Euclidean metric of distance?	continuous variables (interval/ratio), dummy variables
What do you have to examine in data before doing cluster analysis?	All variables should be independent (check multicollinearity -> correlation, VIF). If variables are highly correlated: standardise, combine, or use factor analysis
What is the premise of agglomerative hierarchical clustering?	agglomerative process: from each person in separate cluster to all people in one cluster
What are two agglomerative processes used in hierarchical clustering?	1. Linkage method (merge the clusters with the smallest distance) 2. Variance method (typically for continuous variables; merges the clusters for which within-cluster variance increases the least) -> more stable than the linkage method
Name three ways of deciding on the number of clusters	1. Heuristics 2. Elbow method (look for the "kink") 3. Dendrogram (draw a line at every number of clusters, check where a high difference becomes a low difference)
What is non-hierarchical clustering?	Assign consumers into K (prespecified by researcher) non-empty clusters
What are advantages/disadvantages of non-hierarchical clustering?	Disadvantages: the number of clusters is unknown and must be chosen in advance; the solution the algorithm converges to depends on the initial values. Advantage: can deal with a large number of consumers
How do you interpret and profile clusters?	Run a one-way ANOVA with the selected number of clusters and compare the means of all variables across the clusters; highlight the maximum and the (relative) minimum means in the table
Explain the direct method of MDS.	- Rate pairs of brands on similarity/dissimilarity (1-7 Likert scale) - Problem: many pairs to compare - Can uncover unknown or unobserved attributes - Difficult to interpret dimensions
Explain derived method of MDS	- Rate brands on different (prespecified) attributes on a Likert or semantic scale - You first need to define the attributes (needs qualitative research) - You might miss out on unknown or unobserved attributes - Easy to interpret dimensions
When do you stop with MDS algorithm?	When STRESS (standardised residual sum of squares) is small. The smaller the STRESS, the better the fit to the data; smaller than 0.05 is considered a good fit
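What does STRESS indicate?	Indicates the discrepancy between the observed distances and the distances computed on the map. Trade-off: STRESS decreases as more dimensions are added, at the cost of a more complex map
Why is STRESS value similar to R-square?	Both measure fit based on unexplained variation: STRESS uses the difference between observed and computed distances, R-square the difference between actual and predicted Y. Note the direction: lower STRESS is better, higher R-square is better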
How do you determine ideal number of dimensions on perceptual map?	elbow criterion on a plot of STRESS vs. Dimensionality (Scree plot)
How do you label dimensions of perceptual maps?	subjectively or by collecting more data or performing more analysis (cluster analysis)
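
The conjoint cards above (attribute importance, relative importance, utilities and market share) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names, levels and part-worth values are made up, and the market share uses the simple share-of-utility rule described on the card, which only makes sense when all product utilities are non-negative.

```python
# Hypothetical part-worth coefficients (unstandardised regression
# coefficients, baseline level = 0). Names and numbers are made up.
part_worths = {
    "price": {"low": 2.0, "medium": 1.0, "high": 0.0},
    "brand": {"A": 0.0, "B": 0.5},
    "size": {"small": 0.0, "large": 1.5},
}


def attribute_importance(levels):
    """Importance = range (max - min) of the part-worths of one attribute."""
    values = list(levels.values())
    return max(values) - min(values)


def relative_importances(part_worths):
    """Importance of each attribute divided by the sum of all importances."""
    importances = {
        attr: attribute_importance(levels)
        for attr, levels in part_worths.items()
    }
    total = sum(importances.values())
    return {attr: imp / total for attr, imp in importances.items()}


def utility(product, part_worths):
    """Utility of a product = sum of the part-worths of its levels."""
    return sum(part_worths[attr][level] for attr, level in product.items())


def market_shares(products, part_worths):
    """Share-of-utility rule: product utility / sum of all utilities."""
    utilities = {
        name: utility(product, part_worths)
        for name, product in products.items()
    }
    total = sum(utilities.values())
    return {name: u / total for name, u in utilities.items()}


# Three hypothetical products described by one level per attribute.
products = {
    "P1": {"price": "low", "brand": "A", "size": "small"},
    "P2": {"price": "medium", "brand": "B", "size": "large"},
    "P3": {"price": "high", "brand": "B", "size": "large"},
}

print(relative_importances(part_worths))
print(market_shares(products, part_worths))
```

In this made-up example price has the widest part-worth range, so it comes out as the most important attribute, and P2 has the highest utility and therefore the largest share.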

MSR Flash Cards

	Created by Paul Mandaiker almost 9 years ago