Task 6 Statistics II
February 4th 2013
Excel is used to generate a random number with the RANDBETWEEN(1,4) function. Please see part A below.
To generate this sample from the whole population of job applicants, I used Excel's RANDBETWEEN(1,4) function, which returned the number 2. RANDBETWEEN uses an algorithm to generate a random number: it produces a large number from a seed based on the computer's clock time, then reduces that number to 1, 2, 3, or 4 by taking it mod 4 and adding 1. Using the random list that was provided, I then used systematic sampling, selecting every second data value, to generate the sample. To ...view middle of the document...
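The sampling step can be sketched in Python as a rough illustration. The list `applicant_ages` and the helper `systematic_sample` are hypothetical names, and the starting position is an assumption, since the report states only that every second value was taken.

```python
import random


def systematic_sample(population, interval, start):
    """Return every `interval`-th item, beginning at 1-based position `start`."""
    return population[start - 1::interval]


# Hypothetical stand-in for the randomized applicant list (not the real data).
applicant_ages = list(range(20, 60))

# Python analogue of Excel's RANDBETWEEN(1,4): an integer from 1 to 4 inclusive.
draw = random.randint(1, 4)

# The report drew 2 and took every second value. Assumption: the draw of 2
# is used here as both the starting position and the sampling interval.
sample = systematic_sample(applicant_ages, interval=2, start=2)
```

With an interval of 2, the sample contains half of the list, so a list of 100 applicants would yield the sample size of 50 used in this task.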
This method uses a formula (shown below) and plugs in every age to calculate the standard deviation.
N=size of sample=50
x=mean age of unsuccessful applicants=43.68
The standard deviation for unsuccessful applicants is 7.78.
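The formula itself does not appear in this copy of the document. A common form is given here as a sketch, assuming the sample standard deviation, which divides by N - 1; the source does not say whether N or N - 1 was used. The symbol x_i for each individual age is introduced for illustration, and x̄ is the mean.

```latex
s = \sqrt{\frac{\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2}{N - 1}}
```

Each age has the mean subtracted from it, the differences are squared and summed, the sum is divided by N - 1, and the square root of the result is the standard deviation.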
To calculate the range, the minimum is subtracted from the maximum.
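The summary measures used in this task can be computed with a short Python sketch. The ages below are hypothetical and are not the applicant data, and the use of `statistics.stdev` (the sample standard deviation) is an assumption about which form was intended.

```python
import statistics


def summarize(ages):
    """Return the summary measures reported in this task for a list of ages."""
    return {
        "mean": statistics.mean(ages),
        "median": statistics.median(ages),
        # multimode returns every most-common value, so ties are not hidden.
        "modes": statistics.multimode(ages),
        "range": max(ages) - min(ages),
        # Sample standard deviation (divides by N - 1); an assumption.
        "std_dev": statistics.stdev(ages),
    }


ages = [25, 37, 44, 44, 48, 57]  # hypothetical values for illustration only
summary = summarize(ages)
for name, value in summary.items():
    print(name, value)
```

Calling `summarize` once per group (unsuccessful and successful applicants) would give the two columns of a results table.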
Example Results Table
Statistic | Age of Unsuccessful Applicants | Age of Successful Applicants |
Mean | 43.68 | 39.74 |
Median | 44 | 39 |
Mode (if applicable) | 44 | 39 |
Range | 32 | 18 |
Standard Deviation | 7.78 | 4.87 |
Looking over the preliminary data, you can see that the mean age of unsuccessful applicants is higher than that of successful applicants (43.68 versus 39.74), as are the median and the mode (44 versus 39 for both). The range and standard deviation of the unsuccessful applicants are also higher (32 versus 18 and 7.78 versus 4.87), so their ages are more spread out. Taken alone, this evidence may not make a case for age discrimination. More in-depth statistical analysis is needed to see whether there really is a case for age discrimination or not.
1. The histograms and box plots were generated using SPSS statistical software and labeled using the drawing tools in Microsoft Word.
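[Box plots of applicant ages; labels recovered from the figures]
Unsuccessful applicants: Minimum = 25, First Q = 37, Median = 44, Third Q = 48, Maximum = 57, Interquartile range = 11
Successful applicants: First Q = 37, Third Q = 44, Interquartile range = 7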
Histograms and box plots give a visual representation of how data is distributed and can show whether a data set is approximately normal. In a histogram, the data is put in numerical order and grouped into intervals like 1-3, 4-6, 7-9, etc. The number of data values in each interval sets the height of that interval's bar. In a histogram of a normal distribution, most of the data is in the middle of the graph and the rest trails off toward each end. (See example below)
Example of a histogram with a normal distribution (Summary statistics, 2012)
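The grouping step behind a histogram can be sketched in Python. The data values and the helper `bin_counts` are hypothetical, and the sketch assumes whole-number data with equal-width intervals.

```python
from collections import Counter


def bin_counts(values, start, width):
    """Count the values in each interval of `width` consecutive integers.

    Intervals begin at `start`; intervals with no values are omitted.
    """
    counts = Counter((v - start) // width for v in values)
    return {
        (start + k * width, start + (k + 1) * width - 1): counts[k]
        for k in sorted(counts)
    }


data = [1, 2, 2, 4, 5, 5, 5, 6, 7, 9]  # hypothetical values
histogram = bin_counts(data, start=1, width=3)

# Print a simple text histogram, one row per interval.
for (low, high), count in histogram.items():
    print(f"{low}-{high}: {'*' * count}")
```

In this made-up data the middle interval holds the most values and the counts fall off on either side, which is the shape described for a normal distribution.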
Box plots are another way to organize data to judge how close to normal the data is. In this type of plot, the data is put into numerical order and the median is found. Next, the middle data point between the minimum and the median is found; this is called the first quartile. The same process is used to find the middle number between the median and the maximum; this is called the third quartile. The box in the middle of the graph is drawn between the first and third quartiles and represents half of all the data points. The other half of the data is spread out over the ends of the graph. Box plots are often used to show whether a data set is normal. In a normal data set, the box plot will have a small interquartile range (the difference between the third and first quartiles) compared to the whole range, and the outer regions of the plot, close to the minimum and maximum, will be more spread out. (See example below)
Example of a normal distribution box plot.
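The quartile procedure described for box plots can be sketched in Python. The function `five_number_summary` and its data are hypothetical, and the sketch assumes the "median of each half" convention; statistical packages may use other quartile conventions, so their results can differ slightly.

```python
import statistics


def five_number_summary(values):
    """Return the box plot values using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # values above the median position
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return {
        "minimum": ordered[0],
        "first_quartile": q1,
        "median": statistics.median(ordered),
        "third_quartile": q3,
        "maximum": ordered[-1],
        "interquartile_range": q3 - q1,
    }


ages = [25, 30, 37, 40, 44, 46, 48, 50, 57]  # hypothetical values
box = five_number_summary(ages)
for name, value in box.items():
    print(name, value)
```

The box of the plot spans the first to the third quartile, and the whiskers run out to the minimum and maximum.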