Created by Paul Mandaiker
almost 9 years ago
Question | Answer |
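How can you check Multicollinearity? | Check the variance inflation factor (VIF) of each IV; VIF < 10 is the usual rule of thumb for acceptable multicollinearity |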
How can the presence of Multicollinearity be solved? | - get more data (add more observations) - eliminate one or more variables (guided by theory) - transform the correlated variables into a smaller number of independent factors (e.g. with Factor Analysis) |
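Explain Ward's Method | Ward's method is a variance method: at each step it merges the two clusters whose merger gives the smallest increase in within-cluster variance (i.e. it attempts to minimize the variance within clusters) |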
Explain Mediation | The effect of X on Y passes through a mediator M, either partially or fully. The nature of the relationship does not change; the mediator absorbs some part of the effect of X on Y. Once M is included, the direct influence of X on Y becomes smaller (or even insignificant) |
Explain Moderation | A variable Z moderates the relationship between X and Y if the effect of X on Y depends on the level of Z (interaction effect). The moderator changes the relationship between the independent variable X and Y |
Conditions for Mediation | 3 regressions for 3 requirements: 1. Regress Y on X: relationship between X and Y (Beta must be sig). 2. Regress M on X: relationship between X and M (Beta must be sig). 3. Regress Y on X and M together (M must be sig): a. X becomes insig -> full mediation; b. X is still sig (but smaller) -> partial mediation |
What is a marketing model? | Simplified representation of reality that we use to solve problems and gather insights to make decisions |
What is a market response model? | Model trying to establish a function between a group of independent variables and a dependent variable of interest (e.g. Sales) |
Explain the intuition behind the least squares model | Aims at estimating the parameters of a linear regression model with the lowest possible error (highest variance explained), by minimizing the sum of the squared residuals (squared differences between predicted and actual Y) |
What are residuals and why are they squared in linear regression? | Residuals are differences between the model predicted values of Y and the actual values of Y. They are squared so that positive and negative distances do not cancel each other out. |
Simple explanation of linear regression | Fit a line which minimises the total (squared) vertical distance to all data points |
Why does linear regression use a measurement error term (Epsilon)? | You cannot predict the DV perfectly, so you need to introduce a statistical measurement error term to be able to infer things from the model |
What is the difference between Measurement Error (Epsilon) and e? | Epsilon is the true (unobservable) error of the underlying model; e is the residual, i.e. the error computed from the estimated model |
How do you interpret Beta? | A 1 unit change in X (IV) leads to a change of b1 units in Y (DV), holding the other IVs constant. E.g. a 1 unit change in promotion leads to a 0.56 unit change in Sales |
What does the standard deviation of error term indicate? | A smaller standard deviation of the error term indicates a more accurate model, because the data points are less dispersed around the regression line |
How do you compare relative effects of different IVs on DV if their units are different? | Standardisation: raw data is transformed into new variables that have a mean of 0 and variance of 1. The standardised regression coefficients (= Beta) allow direct comparison of the relative effect of each IV on the DV |
What is the null hypothesis to test overall model significance? | H0: all coefficients are equal to 0. This means that you have no regression line and there is no pattern in the data -> you want to reject that (overall model should be significant) |
How can coefficient significance be tested? | H0: bk = 0. Use a t-test and check the p-value (should be significant). If it is not, the IV has no influence on the DV and its coefficient should be set to 0 |
How do you calculate a 95% confidence interval? | From Y - 2*sigma to Y + 2*sigma (the factor 2 approximates the normal critical value 1.96) |
What is R-square? | R-square is the percentage of variation in the DV explained by the model -> a higher value means the model captures more variation and predicts the DV more accurately |
Why is adjusted R-square important? | Adj. R-square is adjusted for the number of IVs and the sample size. This is important because you could easily get a higher R-square just by adding more variables. It also allows you to compare models with different IVs |
What are the three error assumptions? | 1. Normal distribution with mean zero 2. Equal variance 3. Independence |
How can you test the normality assumption? | |
You do not want to reject the null of the KS test. (T/F?) | True. Null: residuals are normally distributed; you do not want to reject that |
What is the equal variance assumption? | The residuals show the same variation above the mean as beneath the mean, at every level of the predicted values. Test by plotting Y-hat against the residuals (Xi switches axis) |
How does a violated equal variance assumption look like? | |
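When is the independence assumption violated? | When the residuals are correlated with one another (e.g. autocorrelation in time-ordered data). Residuals that increase for higher values of Y indicate a violated equal variance assumption instead |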
How do you interpret a dummy coefficient (Beta)? | Always interpret a dummy coefficient relative to the baseline. SPSS automatically excludes the baseline level from the regression, because including it would cause perfect collinearity |
In Conjoint analysis: why are consumers asked to evaluate products by considering different attributes jointly? | If you ask through a disjoint approach (one attribute at a time), you get biased answers that do not provide valuable insight, e.g. consumers want all of the most desirable features at the lowest possible cost |
What is the most preferred product? | The one that combines, for each attribute, the level with the highest part-worth coefficient (i.e. the highest total utility) |
How do you calculate the importance of an attribute? | importance = Max - Min (of the part-worth values). The importance of an attribute is defined as the range of part-worths across its levels |
How do you calculate the relative importance of an attribute? | relative importance = importance of attribute / sum of all attribute importances |
How do you calculate utilities and market share of hypothetical products? | Utility: sum up the part-worth coefficients of the attribute levels of each product. Market share: divide the utility of a product by the sum of the utilities of all products |
Standardised coefficients become importance of attributes. (T/F?) | False. Unstandardised coefficients become part-worth coefficients. Insignificant coefficients should be set to 0 |
What are the two goals of cluster analysis? | 1. Homogeneity within a cluster 2. Heterogeneity between clusters |
How do you measure homogeneity/heterogeneity? | Distance between two consumers |
When do you use Euclidean metric of distance? | continuous variables (interval/ratio), dummy variables |
What do you have to examine in data before doing cluster analysis? | All variables should be independent (check multicollinearity -> correlation, VIF) If variables are highly correlated: Standardise, combine, factor analysis |
What is the premise of agglomerative hierarchical clustering? | agglomerative process: from each person in separate cluster to all people in one cluster |
What are two agglomerative processes used in hierarchical clustering? | 1. Linkage method (merge the clusters with the smallest distance between them) 2. Variance method (typically for continuous variables; combines the clusters whose merger keeps within-cluster variance smallest) -> more stable than linkage method |
Name three ways of deciding on the number of clusters | 1. Heuristics 2. Elbow method (look for the "kink") 3. Dendrogram (draw a line at every number of clusters, check where a high difference becomes a low difference) |
What is non-hierarchical clustering? | Assign consumers into K (prespecified by researcher) non-empty clusters |
What are advantages/disadvantages of non-hierarchical clustering? | Disadvantage: the right number of clusters is unknown in advance, and the solution it converges to depends on the initial values. Advantage: can deal with a large number of consumers |
How do you interpret and profile clusters? | Run a one-way ANOVA for the selected number of clusters and compare the means of each variable across the clusters; colour the maximum and the (relative) minimum means in the table |
Explain the direct method of MDS. | - Rate pairs of brands on similarity/dissimilarity (1-7 Likert Scale) - Problem: Many pairs to compare - Allows you to cover unknown or unobserved attributes - Difficult to interpret dimensions |
Explain derived method of MDS | - Rate brands on different (prespecified) attributes on a Likert or semantic scale - You first need to define attributes (need quali research) - You might miss out on unknown or unobserved attributes - Easy to interpret dimensions |
When do you stop with MDS algorithm? | When STRESS (standardised residual sum of squares) is small. The smaller STRESS, the better the fit to the data; by Kruskal's rule of thumb, a value smaller than 0.05 is considered a good fit |
What does STRESS indicate? | Indicates the discrepancy between the observed distances and the distances computed on the map. There is a trade-off: the STRESS value decreases at the cost of adding more dimensions |
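Why is STRESS value similar to R-square? | Both measure fit through unexplained variation: STRESS is based on the distance between observed and computed distances, just as R-square is based on the residuals. Note that low STRESS, but high R-square, means a good fit |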
How do you determine ideal number of dimensions on perceptual map? | elbow criterion on a plot of STRESS vs. Dimensionality (Scree plot) |
How do you label dimensions of perceptual maps? | subjectively or by collecting more data or performing more analysis (cluster analysis) |
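
Worked example for the conjoint cards above (attribute importance, relative importance, utilities and market share). This is only a minimal Python sketch: the part-worth values, the attribute names and the two products are made up for illustration, the function names are hypothetical, and the market-share step assumes a simple share-of-utility rule that only makes sense when all utilities are positive.

```python
# Hypothetical part-worths (unstandardised regression coefficients).
# The baseline level of each attribute has a part-worth of 0.
part_worths = {
    "price": {"low": 0.0, "medium": -1.0, "high": -3.0},
    "brand": {"A": 0.0, "B": 1.0},
    "size": {"small": 0.0, "large": 2.0},
}


def attribute_importance(levels):
    """Importance = max - min of the part-worths across the levels."""
    values = list(levels.values())
    return max(values) - min(values)


def relative_importances(part_worths):
    """Importance of each attribute divided by the sum of all importances."""
    importances = {
        attr: attribute_importance(levels)
        for attr, levels in part_worths.items()
    }
    total = sum(importances.values())
    return {attr: imp / total for attr, imp in importances.items()}


def utility(product, part_worths):
    """Utility = sum of the part-worths of the product's attribute levels."""
    return sum(part_worths[attr][level] for attr, level in product.items())


def share_of_utility(products, part_worths):
    """Assumed rule: utility of a product divided by the sum of all utilities."""
    utilities = {
        name: utility(product, part_worths)
        for name, product in products.items()
    }
    total = sum(utilities.values())
    return {name: u / total for name, u in utilities.items()}


# Two hypothetical products defined by one level per attribute.
products = {
    "X": {"price": "low", "brand": "B", "size": "large"},
    "Y": {"price": "medium", "brand": "A", "size": "large"},
}

print(relative_importances(part_worths))
print(share_of_utility(products, part_worths))
```

With these made-up numbers the importances are 3 (price), 1 (brand) and 2 (size), so price has a relative importance of 3/6 = 50%. Product X has utility 3 and product Y has utility 1, which gives shares of 75% and 25% under the assumed rule.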