Hi, I'm trying to get to grips with the program Minitab, mainly multiple regression. My question is: could someone explain each area of the results table shown below, especially what it means? This is just an example, so I just need to know how each value works, i.e. what p-values are and what their effect is, etc. I need this in the next 4 hours. Thanks very much.

Regression Analysis: Total£ versus GFarea, Bedrooms

The regression equation is
Total£ = 36280 + 84.2 GFarea + 20629 Bedrooms

79 cases used, 2 cases contain missing values

Predictor      Coef   SE Coef     T      P
Constant      36280     20143  1.80  0.076
GFarea       84.198     9.779  8.61  0.000
Bedrooms      20629      4903  4.21  0.000

S = 49455   R-Sq = 66.5%   R-Sq(adj) = 65.7%

Analysis of Variance

Source          DF           SS           MS      F      P
Regression       2  3.69762E+11  1.84881E+11  75.59  0.000
Residual Error  76  1.85878E+11   2445767700
Total           78  5.55641E+11

Source    DF       Seq SS
GFarea     1  3.26460E+11
Bedrooms   1  43302466403

Unusual Observations
Obs  GFarea  Total£     Fit  SE Fit  Residual  St Resid
  8    2504  381984  257070    9787    124914     2.58R
 18    1996  100300  214297    6362   -113997    -2.32R
 49    3076  274958  284602   17274     -9644    -0.21 X
 53    1980  374256  295467   19406     78789     1.73 X
 54    3501  453744  382274   17160     71470     1.54 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.
Hello,

In answering this question, I have quoted the relevant parts of your output, preceded by >'s.

> The regression equation is
> Total£ = 36280 + 84.2 GFarea + 20629 Bedrooms

In regression, we are trying to find the equation which best predicts the outcome. The outcome in this case is Total£. If we know GFarea and Bedrooms, our best guess at Total£ is given by this equation.

> 79 cases used, 2 cases contain missing values

You had 81 rows in your data; however, two of these rows contained missing data for at least one of the variables, and so were not used in the analysis.

> Predictor      Coef   SE Coef     T      P
> Constant      36280     20143  1.80  0.076
> GFarea       84.198     9.779  8.61  0.000
> Bedrooms      20629      4903  4.21  0.000

This next part gives some of the details of the equation. Each of the estimates (the coefficients, labelled Coef) has a standard error (SE Coef), which is a measure of how much the estimate is likely to vary from sample to sample. To obtain an approximate 95% confidence interval for a coefficient, we multiply the standard error by 1.96, then add this to and subtract it from the coefficient. (Strictly, the multiplier comes from the t distribution with the residual degrees of freedom, here 76, which gives about 1.99; with this many cases the difference is small.)

So our best guess at the GFarea coefficient is 84. However, this estimate has a standard error of approximately 10. So the confidence limits are 84 + 1.96 x 10 = 104 and 84 - 1.96 x 10 = 64 (rounded). The range from 64 to 104 is the 95% confidence interval: if we repeated the study many times and calculated an interval this way each time, about 95% (19 in 20) of those intervals would contain the true (population) value. That is, one more unit of GFarea adds somewhere between about 64 and 104 units of Total£, holding Bedrooms constant.

The next part of the output is T. T is given by Coef / SE Coef. So, for the constant: 36280 / 20143 = 1.80. T isn't very useful on its own, but it gives us P: the probability of getting a T at least this far from zero if the real value of the coefficient in the population is zero. The fact that GFarea and Bedrooms both have very low probabilities (shown as 0.000, meaning less than 0.0005) means that it is very unlikely you would have found this result if in fact they had no effect.

The constant is a special term: it is the estimated value of Total£ when all of the predictors are zero.
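The coefficient table can be reproduced from the Coef and SE Coef columns alone. The sketch below is only an illustration, assuming the residual degrees of freedom (76) from the ANOVA table; the helper names `t_pdf`, `two_sided_p` and `summarise` are hypothetical. Because only the Python standard library is used, the two-sided P value is obtained by integrating the t density numerically with Simpson's rule, a swapped-in technique rather than how Minitab itself computes it, and the interval uses the 1.96 multiplier from the explanation above.

```python
import math

# Coefficient table copied from the Minitab output in the question.
coefs = {
    "Constant": (36280.0, 20143.0),
    "GFarea": (84.198, 9.779),
    "Bedrooms": (20629.0, 4903.0),
}
DF_RESIDUAL = 76  # residual degrees of freedom from the ANOVA table


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for a t variable, via Simpson's rule on [0, |t|]."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3  # probability between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def summarise(coef, se, df):
    """Return T, two-sided P and an approximate 95% CI for one coefficient."""
    t = coef / se
    half_width = 1.96 * se  # approximate 95% multiplier used in the answer
    return t, two_sided_p(t, df), (coef - half_width, coef + half_width)


for name, (coef, se) in coefs.items():
    t, p, (lo, hi) = summarise(coef, se, DF_RESIDUAL)
    print(f"{name:9s} T = {t:5.2f}  P = {p:.3f}  95% CI ~ ({lo:.1f}, {hi:.1f})")
```

With the unrounded standard error, the GFarea interval comes out at roughly 65 to 103, a little narrower than the hand calculation, which rounds SE Coef to 10. The interval for the constant runs from below zero to well above it, which fits with its P of 0.076 not being significant at the 5% level.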
The constant often makes no sense, as is the case here (I am guessing): saying that a house with 0 bedrooms and a ground floor area of 0 is worth 36280 is obviously silly.

> S = 49455   R-Sq = 66.5%   R-Sq(adj) = 65.7%

OK, so this equation tells us the best guess at Total£, but the next question is: how good is that guess? This is given by R-Sq, or R-squared. R-squared is the proportion of the variance in Total£ which is explained by the predictors; in this case it is 66.5%, which is quite a high proportion. If you take this as 0.665 and find the square root, it is about 0.82. This is the correlation between the predicted score (given by the equation) and the actual score.

The next question to ask is whether this is a good prediction, i.e. whether the prediction is better than chance.

> Source          DF           SS           MS      F      P
> Regression       2  3.69762E+11  1.84881E+11  75.59  0.000
> Residual Error  76  1.85878E+11   2445767700
> Total           78  5.55641E+11
>
> Source    DF       Seq SS
> GFarea     1  3.26460E+11
> Bedrooms   1  43302466403

This is answered by the next section. It is usually not reported in depth, so I am not going to cover it in detail here, but request clarification if you need it. The P value again tells us whether we can make a significant prediction, or whether we are better off guessing. Because this p-value is very low, the prediction is highly significant, and better than guessing. The most common thing to report here is the p-value (again, it's <0.0005, not 0.000). You might also want to report the F and its degrees of freedom, in which case it's F = 75.6, df = 2, 76.

> Obs  GFarea  Total£     Fit  SE Fit  Residual  St Resid
>   8    2504  381984  257070    9787    124914     2.58R
>  18    1996  100300  214297    6362   -113997    -2.32R
>  49    3076  274958  284602   17274     -9644    -0.21 X
>  53    1980  374256  295467   19406     78789     1.73 X
>  54    3501  453744  382274   17160     71470     1.54 X
> R denotes an observation with a large standardized residual.
> X denotes an observation whose X value gives it large influence.

The final part of the output is some diagnostics, to help you to interpret the equation.
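Before turning to the diagnostics, note that the S, R-Sq and F figures quoted earlier all follow from the sums of squares (SS) and degrees of freedom (DF) in the analysis of variance table. The sketch below recomputes them using the standard textbook definitions; it assumes nothing beyond the numbers in the output. S is the square root of the residual mean square (the typical size of a prediction error, in £), and R-Sq(adj) divides each sum of squares by its degrees of freedom, which penalises adding predictors.

```python
import math

# Sums of squares and degrees of freedom from the Analysis of Variance table.
SS_REG, DF_REG = 3.69762e11, 2
SS_RES, DF_RES = 1.85878e11, 76
SS_TOT, DF_TOT = SS_REG + SS_RES, DF_REG + DF_RES

MS_REG = SS_REG / DF_REG  # mean square = SS / DF
MS_RES = SS_RES / DF_RES

r_sq = SS_REG / SS_TOT                     # proportion of variance explained
r_sq_adj = 1 - MS_RES / (SS_TOT / DF_TOT)  # adjusted for number of predictors
s = math.sqrt(MS_RES)                      # typical residual size, in £
multiple_r = math.sqrt(r_sq)               # correlation of fitted with actual
f_stat = MS_REG / MS_RES                   # F statistic for the whole model

print(f"S = {s:.0f}  R-Sq = {100 * r_sq:.1f}%  R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"R = {multiple_r:.2f}  F = {f_stat:.2f} on ({DF_REG}, {DF_RES}) df")
```

The residual mean square computed this way differs slightly from Minitab's 2445767700 only because the SS values in the printed table are rounded.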
Minitab has selected some cases it believes you might want to look at. It bases this on the residuals and the influence.

First, the residuals. The residual is the difference between what we actually have and the value we would expect given GFarea and Bedrooms (actual minus fitted). Observations with a large standardized residual are marked with an R. Case 8 has a very large residual: its value for Total£ is 124914 higher than would be expected. Similarly, case 18 has a much lower value than would be expected (113997 lower). It is worth looking at these to see if there has been an error entering the data, or if there is something unusual about them. Maybe they are in a very different area to the others, maybe they are paved with gold, or come with a free farm (I am guessing, because I have no idea what the data are about; I am sure you can think of something more sensible).

Second, the influential cases, marked with an X. An influential case is more important than the others in determining the values of the coefficients. It isn't necessarily anything to worry about, but again is worth checking. (I am not going to go into detail on this, because of your time limit, but if you would like, just request clarification.)

I have written this fairly swiftly, because of your time limit, so if you think that I have missed anything out, or would like clarification on anything, please ask before rating the question. I will recheck the page fairly regularly for the rest of the day, to see if a clarification request appears.

Here's a useful site:
http://www.fw.umn.edu/biochr/assoc/dho/107/notes/minitab/REGRESS1.HTM

Here's a useful book. :)
http://www.amazon.co.uk/exec/obidos/ASIN/0761962301/qid%3D1006938682/sr%3D11/ref%3Dsr%5Fsp%5Fre/02699555708940457

jeremymiles-ga
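To see where the R and X flags in the unusual observations table come from, the St Resid column can be rebuilt from the Total£, Fit and SE Fit columns plus S. This sketch assumes that SE Fit = S x sqrt(h), where h is the leverage of the case, and that Minitab flags R when the standardized residual exceeds 2 in absolute value and X when the leverage exceeds 3p/n (p coefficients, n cases). These are Minitab's usual rules as I understand them, so check the Minitab help for your version. The helper `diagnose` is hypothetical.

```python
import math

S = 49455.0   # S from the regression output
N, P = 79, 3  # cases used, coefficients (including the constant)

# (Obs, Total£, Fit, SE Fit) rows from the Unusual Observations table.
rows = [
    (8, 381984, 257070, 9787),
    (18, 100300, 214297, 6362),
    (49, 274958, 284602, 17274),
    (53, 374256, 295467, 19406),
    (54, 453744, 382274, 17160),
]


def diagnose(actual, fit, se_fit, s=S, n=N, p=P):
    """Residual, standardized residual, leverage and assumed R/X flags."""
    residual = actual - fit                  # actual minus fitted
    leverage = (se_fit / s) ** 2             # h, assuming SE Fit = S * sqrt(h)
    st_resid = residual / (s * math.sqrt(1 - leverage))
    flags = ("R" if abs(st_resid) > 2 else "") + \
            ("X" if leverage > 3 * p / n else "")  # assumed thresholds
    return residual, st_resid, leverage, flags


for obs, actual, fit, se_fit in rows:
    residual, st_resid, h, flags = diagnose(actual, fit, se_fit)
    print(f"{obs:3d} residual {residual:8d} St Resid {st_resid:6.2f} h {h:.3f} {flags}")
```

Under these assumptions the sketch reproduces the printed St Resid values and flags. The residuals for cases 18 and 49 come out negative, meaning those houses are worth less than the equation predicts.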