BMS11: Business Maths & Statistics
a) The statistical summary table from Part A returned the following results from the salary observations in the Environmental Policy survey:
Salary Summary | Male | Female |
Mean | $53,695 | $58,643 |
Median | $50,050 | $58,100 |
Standard deviation | $10,201 | $13,731 |
Minimum | $37,700 | $31,000 |
Maximum | $78,000 | $81,400 |
Range | $40,300 | $50,400 |
1st Quartile | $45,400 | $49,600 |
3rd Quartile | $62,100 | $62,000 |
Interquartile range | $16,700 | $12,400 |
...
Both data sets display approximately bell-shaped distributions, so we can apply the empirical rule (about 95% of the sample observations fall within two standard deviations of the mean) to check that the calculated values are reasonable. If 95% of the observations lie within a span of roughly four standard deviations, the range divided by 4 should be approximately equal to the calculated standard deviation, and this holds for both data sets:
Male: Range/4 = $10,075, Std Dev = $10,201
Female: Range/4 = $12,600, Std Dev = $13,731
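The range/4 check can be reproduced with a short sketch. The minimum, maximum and standard deviation values are copied from the summary table; the dictionary layout and the function name `range_over_four` are illustrative choices only.

```python
# Values copied from the salary summary table.
summary = {
    "Male": {"minimum": 37_700, "maximum": 78_000, "std_dev": 10_201},
    "Female": {"minimum": 31_000, "maximum": 81_400, "std_dev": 13_731},
}


def range_over_four(minimum, maximum):
    """Rough estimate of the standard deviation: the range divided by 4."""
    return (maximum - minimum) / 4


# Estimate for each group, then compare it with the calculated standard deviation.
estimates = {
    group: range_over_four(stats["minimum"], stats["maximum"])
    for group, stats in summary.items()
}

for group, stats in summary.items():
    ratio = estimates[group] / stats["std_dev"]
    print(
        f"{group}: range/4 = ${estimates[group]:,.0f}, "
        f"std dev = ${stats['std_dev']:,}, ratio = {ratio:.2f}"
    )
```

A ratio close to 1 suggests the calculated standard deviation is plausible for a roughly bell-shaped sample; it is a sanity check rather than a formal test.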
Applying the empirical rule further, we can observe that the male salary measurements are more tightly clustered around the mean: about 68% of male observations are expected to lie between $43,494 and $63,896, against $44,912 and $72,374 for females. The corresponding 95% and 99.7% intervals reflect the observed spread of measurements for both sets. This clustering in the male observations is consistent with the observed distribution, which shows a clear modal class.
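The empirical-rule intervals quoted above follow directly from the mean and standard deviation of each group. A minimal sketch, assuming the usual 68% / 95% / 99.7% coverage for one, two and three standard deviations (the function name `empirical_intervals` is illustrative):

```python
def empirical_intervals(mean, std_dev):
    """Return the mean +/- 1, 2 and 3 standard deviation intervals,
    keyed by the approximate coverage the empirical rule assigns to each."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {
        coverage[k]: (mean - k * std_dev, mean + k * std_dev)
        for k in (1, 2, 3)
    }


# Means and standard deviations copied from the salary summary table.
male = empirical_intervals(53_695, 10_201)
female = empirical_intervals(58_643, 13_731)

for label, intervals in (("Male", male), ("Female", female)):
    for cover, (low, high) in intervals.items():
        print(f"{label} {cover}: ${low:,} to ${high:,}")
```

These intervals only describe the sample well to the extent that the bell-shaped assumption holds.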
To judge whether a standard deviation is large or small, the coefficient of variation is also calculated, because a standard deviation expressed in dollars gives no indication of its size relative to the rest of the sample. The coefficient of variation expresses the standard deviation as a percentage of the mean. For both data sets it is approximately 20%: about 19% for males ($10,201 / $53,695) and about 23% for females ($13,731 / $58,643).
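The coefficient of variation can be checked with a few lines, using the means and standard deviations from the summary table (the function name is an illustrative choice):

```python
def coefficient_of_variation(mean, std_dev):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


# Means and standard deviations copied from the salary summary table.
cv_male = coefficient_of_variation(53_695, 10_201)
cv_female = coefficient_of_variation(58_643, 13_731)

print(f"Male CV:   {cv_male:.1f}%")
print(f"Female CV: {cv_female:.1f}%")
```

Because the result is a percentage of each group's own mean, it allows the variability of the two groups to be compared even though their means differ.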
The interquartile range (IQR) of the data sets presents an interesting observation. The IQR measures the spread of the middle 50% of observations; it is particularly useful for ranked data such as these salary observations because it is not sensitive to extreme or outlier values. The male data set returns a larger IQR than the female set ($16,700 against $12,400), meaning that the male first and third quartiles are further apart. This indicates greater variability in the middle half of the male salaries, even though the female salaries have the larger range and standard deviation.
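As a sketch of the IQR comparison, the quartiles and ranges below are copied from the summary table. The IQR-to-range ratio is an additional illustration, not a measure used in the analysis itself; it simply shows how much of the total spread the middle 50% of observations occupies.

```python
# Quartiles and ranges copied from the salary summary table.
quartiles = {"Male": (45_400, 62_100), "Female": (49_600, 62_000)}
ranges = {"Male": 40_300, "Female": 50_400}

# IQR = third quartile minus first quartile.
iqr = {group: q3 - q1 for group, (q1, q3) in quartiles.items()}

# Illustrative extra: fraction of the full range covered by the middle 50%.
iqr_share = {group: iqr[group] / ranges[group] for group in iqr}

for group in iqr:
    print(f"{group}: IQR = ${iqr[group]:,} ({iqr_share[group]:.0%} of the range)")
```

On these figures the middle half of the male salaries spans a larger share of the male range than the female middle half does of the female range, which fits the wider female extremes.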
Histograms of the two data sets, showing the relative frequency distribution across equal class widths, reveal the following information:
(The red line approximates the median; the green line approximates the mean.)
* The centre of the histogram for female salaries is higher than that for males – as evidenced by higher values for both median and mean measures.
* The spread of salaries for females is greater than that for males, as evidenced by the range of the female observations having both a lower minimum value and a higher maximum value.
* The female salary histogram appears generally symmetric and bell-shaped; however, because there are no observations in the $75,000 class, a frequency polygon drawn from it would not have symmetric or bell-shaped characteristics. In fact the chart is somewhat negatively skewed, with the larger proportion of the observations occurring at the higher end of the salary scale.
The male histogram is also unimodal but, in contrast, shows a more dominant modal class (returning a relative frequency of 45). The shape of the histogram is positively skewed, with the number of observations trailing off in the higher classes.
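A relative frequency distribution across equal class widths can be tabulated as sketched below. The raw survey observations are not reproduced in this report, so the `sample` values, the starting point of $30,000 and the class width of $10,000 are hypothetical and chosen only to show the method; they are not the survey data.

```python
def relative_frequencies(values, start, width, n_classes):
    """Relative frequency of each equal-width class [start + i*width, start + (i+1)*width).

    Values outside the classes are not counted in any class, but the
    denominator is still the total number of observations.
    """
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return [count / len(values) for count in counts]


# HYPOTHETICAL salaries for illustration only (not the survey observations).
sample = [38_000, 42_000, 46_000, 47_000, 48_000,
          49_000, 52_000, 58_000, 66_000, 77_000]

freqs = relative_frequencies(sample, start=30_000, width=10_000, n_classes=6)
modal_class = freqs.index(max(freqs))  # index of the class with the highest frequency

for i, freq in enumerate(freqs):
    low = 30_000 + i * 10_000
    print(f"${low:,} to under ${low + 10_000:,}: {freq:.0%}")
```

In this made-up sample the modal class is the second one, with the frequencies trailing off in the higher classes, which is the pattern described as positive skew.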
b) The median salary...