Created by Alicia de la Pena
over 11 years ago
Regression Analysis

Regression analysis is a statistical procedure for analyzing associative relationships between a metric dependent variable and one or more independent variables (Galton, 1885; Johnson and Wichern, 2007; Malhotra, 2007; Stigler, 1986; Wooldridge, 2009). It can be used in the following ways:

1. Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists.
2. Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship.
3. Determine the structure of the relationship: the mathematical equation relating the independent and dependent variables.
4. Predict the values of the dependent variable.
5. Control for other independent variables when evaluating the contributions of a specific variable or set of variables.

Regression can be simple (bivariate) or multiple.

Simple Regression

Simple (bivariate) regression is a procedure for deriving a mathematical relationship, in the form of an equation, between a single metric dependent variable and a single metric independent variable. The following equation defines the simple linear regression model:

y = β0 + β1x + u

Terminology for Simple Regression:

y                      x
Dependent variable     Independent variable
Explained variable     Explanatory variable
Response variable      Control variable
Predicted variable     Predictor variable
Regressand             Regressor

Statistics Associated with Simple Regression Analysis:

Simple regression model: y = β0 + β1x + u, where y = dependent variable; x = independent variable; β0 = intercept of the line; β1 = slope of the line; and u = error term associated with the observation.

Coefficient of determination r²: measures the strength of association. It varies between 0 and 1 and signifies the proportion of the total variation in y that is accounted for by the variation in x.
Estimated or predicted value: ŷi = a + bxi, where ŷi is the predicted value of yi, and a and b are estimators of β0 and β1 respectively.

Regression coefficient: the estimated parameter b is usually referred to as the non-standardized regression coefficient.

Scattergram: a scatter diagram is a plot of the values of two variables for all the cases or observations.

Standard error of estimate: this statistic, SEE, is the standard deviation of the actual y values from the predicted ŷ values.

Standard error: the standard deviation of b, SEb.

Standardized regression coefficient: also termed the beta coefficient or beta weight; it is the slope obtained by the regression of y on x when the data are standardized.

Sum of squared errors: the distances of all the points from the regression line are squared and added together to arrive at the sum of squared errors, Σe², which is a measure of total error.

t statistic: a t statistic with n - 2 degrees of freedom can be used to test the null hypothesis that no linear relationship exists between x and y, or H0: β1 = 0, where t = b / SEb.

Least squares procedure: a technique used for fitting a straight line to a scattergram by minimizing the sum of the squared vertical distances of all the points from the line; this also maximizes the correlation between the actual values of y and the predicted values ŷ.

Multiple Regression

Multiple regression: a statistical technique that simultaneously develops a mathematical relationship between two or more independent variables and an interval-scaled dependent variable. For example: are consumers' perceptions of quality determined by their perceptions of prices, brand image, and brand attributes?

Multiple regression model: y = β0 + β1x1 + β2x2 + β3x3 + u

Statistics associated with multiple regression:

Adjusted R²: the coefficient of multiple determination, adjusted for the number of independent variables and the sample size to account for diminishing returns.
After the first few variables, the additional independent variables do not make much contribution.

Coefficient of multiple determination R²: measures the strength of association.

F test: used to test the null hypothesis that the coefficient of multiple determination in the population, R², is zero. This is equivalent to testing H0: β1 = β2 = β3 = ... = βk = 0. The test statistic has an F distribution with k and (n - k - 1) degrees of freedom.

Partial F test: the significance of a partial regression coefficient βi of xi may be tested using an incremental F statistic. The incremental F statistic is based on the increment in the explained sum of squares resulting from the addition of the independent variable xi to the regression equation after all the other independent variables have been included.

Partial regression coefficient: the partial regression coefficient b1 denotes the change in the predicted value ŷ per unit change in x1 when the other independent variables, x2 to xk, are held constant.
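The statistics defined in these notes can be computed by hand on a small data set. The following is a minimal Python sketch, not a reference implementation: the function names (simple_regression, solve, multiple_regression) and the sample data are made up for illustration, and the adjusted R² and F formulas used are the standard textbook ones, 1 - (1 - R²)(n - 1)/(n - k - 1) and (R²/k) / ((1 - R²)/(n - k - 1)), which the notes describe but do not write out.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = a + b*x, plus r2, SEE, SEb and t = b / SEb."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)      # total variation in y
    b = sxy / sxx                                  # estimator of the slope
    a = mean_y - b * mean_x                        # estimator of the intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / syy                             # coefficient of determination
    see = math.sqrt(sse / (n - 2))                 # standard error of estimate
    se_b = see / math.sqrt(sxx)                    # standard error of b
    return {"a": a, "b": b, "r2": r2, "sse": sse,
            "see": see, "se_b": se_b, "t": b / se_b}


def solve(A, v):
    """Solve A z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def multiple_regression(X, y):
    """Least-squares fit with k predictors; X is a list of rows of length k."""
    n, k = len(X), len(X[0])
    D = [[1.0] + list(row) for row in X]           # add the intercept column
    p = k + 1
    XtX = [[sum(D[i][r] * D[i][c] for i in range(n)) for c in range(p)]
           for r in range(p)]
    Xty = [sum(D[i][r] * y[i] for i in range(n)) for r in range(p)]
    coef = solve(XtX, Xty)                         # normal equations
    pred = [sum(cj * dj for cj, dj in zip(coef, row)) for row in D]
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    r2 = 1 - sse / sst                             # coefficient of multiple determination
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra predictors
    f = (r2 / k) / ((1 - r2) / (n - k - 1))        # F with k and n-k-1 df
    return {"coef": coef, "r2": r2, "adj_r2": adj_r2, "f": f}


# Made-up illustrative data (not taken from the notes).
simple = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(simple)

multi = multiple_regression(
    [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]],
    [3, 4, 8, 8, 13, 12],
)
print(multi)
```

With a single predictor, the F statistic from multiple_regression equals the square of the t statistic from simple_regression, which is a quick way to check that the two tests agree. In practice a statistics library would normally be used; solving the normal equations directly, as here, is only suitable for small, well-conditioned examples.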