|Outline||Additional instructions for the use of the Minitab statistics package are provided. In particular, the commands required to carry out paired and unpaired t-tests, calculate binomial and Poisson probabilities, calculate correlation coefficients and undertake a simple bivariate regression analysis are summarised.|
|Suggested study time||180 minutes|
|Pre-requisites||An introduction to significance testing; the Binomial distribution; the Poisson distribution|
Comparing means using Minitab
These analyses use the plasma.mtw file. Four analyses are shown.
- Unpaired t-test (samples in same column)
- Unpaired t-test (samples in different columns)
- Paired t-test using Minitab command
- Paired t-test using column arithmetic
Minitab has the capacity to calculate probabilities from a large number of theoretical distributions. The command is available from the ‘Calc>Probability Distributions’ menu. Select the Binomial option.
At first glance this is a rather confusing window. It is split into 3 sections. The top section has 3 choices. Select ‘Probability’ for the first example.
The second section asks for information about the Binomial Distribution’s 2 parameters: the number of trials and the probability of a success.
The third section is the most confusing.
Minitab will only produce probabilities for
1. A column of outcomes (input column) or
2. A single constant value.
We also have the option of specifying a column for the storage of the calculated probabilities.
In this example we are calculating probabilities for the outcomes specified in c16 resulting from 10 trials where each trial has a probability of success of 0.5.
Since we are undertaking 10 trials we need the probabilities of 0, 1, 2, …, 9, 10 successes. Enter the numbers 0 to 10 into c16 (the input column). There are two ways of doing this.
1. Type them in yourself. While this is fine for this example it would be tedious if you have 100 trials.
2. Automate it using a Minitab command. Minitab calls the sequence 0 – 10 patterned data.
|The command is available from the Calc menu (Calc>Make Patterned Data). We require the ‘Simple set of numbers’ option.||You will be presented with this screen. You must enter something into each input box.|
C16 is the storage column.
The first value is 0, the last value is 10 and the step is 1.
We only require 1 copy of each number and only 1 copy of the entire sequence.
As a consequence of the above commands c16 will contain the numbers 0 to 10.
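For readers who want to see what the patterned data settings produce, the following Python sketch mimics them. It is only an illustration: the function name `simple_set_of_numbers` and its parameter names are assumptions made for this example and are not Minitab commands.

```python
def simple_set_of_numbers(first, last, step=1, each=1, whole=1):
    """Return values from first to last in steps of `step`.

    Each value is listed `each` times and the entire sequence is
    repeated `whole` times (hypothetical names, for illustration only).
    """
    values = []
    v = first
    while v <= last:
        values.extend([v] * each)  # list this value `each` times
        v += step
    return values * whole          # repeat the whole sequence

# One copy of each number and one copy of the sequence, as in the example
c16 = simple_set_of_numbers(0, 10)
print(c16)
```

Calling the function with `each=2` or `whole=2` shows the effect of the two "number of copies" settings.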
We can now complete the input windows for the Binomial probability calculation.
The result of this command is shown below.
x P( X = x)
0.00 0.0010
Thus, P(0 successes) is 0.0010.
The above probabilities appear in the session window. However, we can also specify a column that should be used for storing these probabilities.
Column 17 will be used for storage. The subsequent contents of columns 16 & 17 are shown opposite. Note the slight differences due to the rounding of the P values in the session window.
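As a cross-check on the stored probabilities, the binomial formula can be evaluated directly. The sketch below uses only Python's standard library; the names `binomial_pmf`, `c16` and `c17` are chosen here to mirror the worksheet columns and are not part of Minitab.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials, success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
c16 = list(range(n + 1))                     # outcomes 0 to 10 (input column)
c17 = [binomial_pmf(x, n, p) for x in c16]   # probabilities (storage column)

for x, prob in zip(c16, c17):
    print(f"{x:2d}  {prob:.6f}")
```

Printing six decimal places makes the rounding difference between the session window (four decimals) and the stored column easy to see.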
The second option in the Binomial Distribution screen is ‘Cumulative probability’. If this option is selected we are presented with a table of probabilities for up to and including x successes.
In this example we will store these in c18.
When cumulative probabilities are calculated we are presented with a running sum of the probabilities. Thus, the first probability is P(0), but the second is P(0) + P(1), the probability of 0 or 1 successes.
Note that the table of P values presented at the end of the Binomial notes is for x or more successes. Thus, in that table P(2) is the probability of 2 or 3 or 4 or 5 … successes.
The contents of columns 16 (the input column), 17 (from the previous command) and 18 (the cumulative probabilities from this command) are shown below.
0.01074 is P(0) + P(1), i.e. 0.000977 + 0.009766.
0.05469 is 0.01074 + 0.043945, i.e. P(0) + P(1) + P(2).
If you wished to find, for example, P(4 or more) from the above you need a simple subtraction. P(4 or more) is found from 1 – P(<=3), i.e. 1 – 0.17188 = 0.82812.
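The running sum and the subtraction for P(4 or more) can be reproduced outside Minitab. This is a minimal sketch using the same 10 trials and success probability of 0.5; the variable names are illustrative.

```python
from itertools import accumulate
from math import comb

n, p = 10, 0.5
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
cdf = list(accumulate(pmf))   # cdf[x] is P(X <= x), the running sum

p_4_or_more = 1 - cdf[3]      # P(4 or more) = 1 - P(<= 3)

print(f"P(X <= 1) = {cdf[1]:.5f}")
print(f"P(X <= 2) = {cdf[2]:.5f}")
print(f"P(X <= 3) = {cdf[3]:.5f}")
print(f"P(X >= 4) = {p_4_or_more:.5f}")
```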
Poisson probabilities are found in a similar way to the binomial probabilities, except that we now specify only a mean (since Poisson probabilities are determined solely by the mean number of events in a random process).
In this example a mean of 2.3 has been specified. Again we need a column of input values (or a single constant value). We will use the same 0 – 10 values that were used for the binomial calculations.
Probability Density Function
Poisson with mu = 2.30000
These P values do not quite sum to 1.00. P values are only given for the outcomes listed in the input column and, because a Poisson variable has no upper limit, the listed values will not always sum to one. You may need to make allowances for this. For example, if you sum the P values for 0 – 9 events and subtract this from 1, the remaining figure is the probability of 10 or more events. The probabilities for 0 – 9 events and for 10 or more events will then sum to one.
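The shortfall from 1.00 is easy to confirm by evaluating the Poisson formula directly. The sketch below assumes the mean of 2.3 and the input values 0 to 10 used in the example; the function and variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """Probability of exactly x events when the mean number of events is `mean`."""
    return exp(-mean) * mean**x / factorial(x)

mean = 2.3
listed = [poisson_pmf(x, mean) for x in range(11)]   # outcomes 0 to 10

total_listed = sum(listed)            # slightly less than 1
p_10_or_more = 1 - sum(listed[:10])   # 1 minus the sum for 0 to 9 events

print(f"Sum of P values for 0 to 10 events: {total_listed:.6f}")
print(f"P(10 or more events):               {p_10_or_more:.6f}")
```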
As with the binomial probabilities you can specify cumulative probabilities. For example, again with a mean of 2.3 and c16 as the input column, the output to the session window would be:
Cumulative Distribution Function
Poisson with mu = 2.30000
x P( X <= x)
0.00 0.1003 The probability of 0 events
1.00 0.3309 The probability of 0 or 1 events
2.00 0.5960 The probability of 0, 1 or 2 events
3.00 0.7993 The probability of 0, 1, 2 or 3 events
4.00 0.9162 etc
Note that calculated P values can be stored in a column if one is specified in the optional storage command (see the Binomial example for details).
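The first few cumulative Poisson probabilities in the session output can be checked with a running sum. This is a sketch under the same assumptions (mean 2.3, outcomes 0 to 10), not a description of how Minitab computes them.

```python
from itertools import accumulate
from math import exp, factorial

mean = 2.3
pmf = [exp(-mean) * mean**x / factorial(x) for x in range(11)]
cdf = list(accumulate(pmf))   # cdf[x] is P(X <= x)

# Print the first five rows in the same layout as the session window
for x, cum in enumerate(cdf[:5]):
    print(f"{x:.2f}  {cum:.4f}")
```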
Calculating a correlation coefficient is one of the simplest Minitab commands.
1. The correlation command is accessed via the Stat>Basic Statistics menu option.
2. You need to select two or more suitable variables.
3. In this example the variables ht to age have been selected (i.e. ht, wt, bmi, age).
The Minitab output
ht wt bmi
Cell Contents: Correlation
4. Thus, the correlation between ht and wt is 0.534, with a P value of 0.000 (i.e. P < 0.0005). The correlation between age and wt is 0.100 with a P value of 0.361, etc.
Note that all the P values are two-tailed.
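To show what the correlation command calculates, here is a minimal sketch of the Pearson correlation coefficient. The data values are invented for illustration and are not taken from the worksheet used above; the sketch does not compute the P value.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg), invented for this example
ht = [150, 160, 165, 172, 180]
wt = [52, 58, 63, 70, 77]

r = pearson_r(ht, wt)
print(f"r = {r:.3f}")
```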
The following example uses the B_Cancer.mtw file. These data relate mortality from breast cancer to average temperature. Mortality is the response (dependent) variable.
1. Regression analysis is obtained through the Stat>Regression menu option. The simple regression command is the first of these.
2. You now need to select a
· Response (y) variable and a
· Predictor (x) variable (select only one for simple regression)
3. The Results option allows you to control how much output is provided by Minitab. The third option is the recommended default.
4. If you click on the Graphs option in the Regression window you are provided with a range of options. I suggest the following:
· make sure that Regular residuals are selected
· choose histogram of the residuals
· plot the residuals against the x variable (temperature in this example).
5. The settings in the Options window are best left unchanged. These options are not explained here.
6. The Storage option again gives you access to many choices. I would recommend that you select Fits (the predicted values) and Residuals.
Output from this command
The regression equation is
Mortality = – 21.8 + 2.36 Temperature
Predictor Coef StDev T P
Constant -21.79 15.67 -1.39 0.186
Temperat 2.3577 0.3489 6.76 0.000
The ‘Coef’ values are a and b: the Constant is a (the intercept) and Temperat is b (the slope). Significance tests (t and P values) are provided for both coefficients. In this example the intercept (a) is not significantly different from 0 (P = 0.186). However, as shown above, the slope is significantly different from 0 (P = 0.000, i.e. P < 0.0005).
S = 7.545 R-Sq = 76.5% R-Sq(adj) = 74.9%
The adjusted R² value is the original R² value adjusted to take account of the degrees of freedom. The adjusted value is always smaller but is thought to have less bias.
S is the standard deviation of the observations about the regression line. As with all standard deviations, it gives us some idea about the reliability of the sample statistic as an estimate of the population parameter.
Minitab next provides a redundant significance test. This test is an example of an analysis of variance. It is only redundant in the simple case of a bivariate regression (i.e. one predictor variable). If a multiple regression analysis is undertaken (more than one predictor variable) the analysis of variance provides important information. The analysis is redundant because the test statistic, F, is the square of the t statistic calculated previously for the slope, i.e. 6.758^2 = 45.67.
Analysis of Variance
Source DF SS MS F P
Regression 1 2599.5 2599.5 45.67 0.000
Error 14 796.9 56.9
Total 15 3396.4
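The summary statistics quoted earlier can all be recovered from this table. The following sketch uses only the figures printed in the output above, so small discrepancies are rounding effects; the variable names are illustrative.

```python
from math import sqrt

# Figures copied from the analysis of variance table
ss_regression, ss_error, ss_total = 2599.5, 796.9, 3396.4
df_regression, df_error, df_total = 1, 14, 15

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

f_stat = ms_regression / ms_error          # F = MS regression / MS error
s = sqrt(ms_error)                         # S, the residual standard deviation
r_sq = ss_regression / ss_total            # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)   # R-Sq(adj)

# t for the slope, from the printed coefficient and its standard deviation
t_slope = 2.3577 / 0.3489

print(f"F = {f_stat:.2f}, t^2 = {t_slope**2:.2f}")
print(f"S = {s:.3f}")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```

The agreement between F and the squared t value illustrates why the analysis of variance adds nothing new when there is only one predictor.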
The final piece of information provided relates to unusual observations. Two types of unusual observations are flagged by a final letter.
- R indicates a predicted value that is a long way from the actual value.
- X indicates a predictor value that is a long way from the majority of points. This could give it undue influence on the slope of the line.
In this data set observation 15, which has the lowest temperature (31.8°), is flagged as an unusual observation on both counts.
Obs Temperat Mortalit Fit StDev Fit Residual St Resid
15 31.8 67.30 53.18 4.85 14.12 2.44RX
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
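The Fit, Residual and St Resid entries for observation 15 can be reproduced from the printed output. The sketch below assumes the usual definition of a standardized residual (the residual divided by its estimated standard deviation, where the variance of the residual is S² minus the squared StDev Fit); the source does not state which formula Minitab uses, so treat that step as an assumption.

```python
from math import sqrt

# Values copied from the regression output
intercept, slope = -21.79, 2.3577
s = 7.545                 # S from the regression output
temperature = 31.8        # observation 15
mortality = 67.30
stdev_fit = 4.85

fit = intercept + slope * temperature     # predicted mortality
residual = mortality - fit                # observed minus predicted

# Assumed formula: residual / sqrt(S^2 - StDevFit^2)
st_resid = residual / sqrt(s**2 - stdev_fit**2)

print(f"Fit = {fit:.2f}, Residual = {residual:.2f}, St Resid = {st_resid:.2f}")
```

Because the coefficients are rounded in the printed output, the results agree with Minitab's figures only to about two decimal places.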
|Normality of the residuals is an important assumption in a regression analysis. Although no formal test of normality has been applied (you could use the describe option to do this), a simple inspection of the histogram suggests that the distribution is not too far from normal. It is important to understand that small samples rarely give a perfect match to a normal distribution even if the parent population is normal.||This plot of the residuals against the predictor (independent) variable does not show any obvious pattern. Again this suggests that the data are not violating the regression analysis assumptions.|
There are many other diagnostic plots that could be used. A discussion of these is not given in these notes.
Plotting the line of best fit
Minitab now offers an option to draw the line of best fit (Fitted Line Plot, from the Stat>Regression menu). In fact it also carries out a regression analysis.
You must select the response and predictor variables.
You are given the option of three regression models. The second and third are non-linear (curves).
The options command gives you access to two sets of options.
The first allows you to carry out a logarithmic transformation of the x and/or y variables. This is not needed for these data.
The second gives you the option to include confidence bands (default 95%) on the plot. You can select one or both of the regression line confidence bands (which provide limits for the possible position of the actual regression line) and the prediction bands (which provide limits for individual y values at a given x value; the predicted value of y for a certain x value is the mean of a normal distribution).
|This is the default plot. It includes the data and the regression line. You are also given the regression equation and the R² value.||The same plot but with 95% confidence limits for the regression line. Note how the bands get wider away from the center. One way of thinking about this is to imagine a line that is pinned to a surface at the position marked by the mean of x & y. This line is free to rotate, within the limits imposed by the 95% bands. This gives the curved confidence bands shown below.|