What are the most important descriptive measures in statistics?
2 points
Median and Standard Deviation
Mean and Standard Deviation
Mean and Variance
Answer:
2. Mean and standard deviation
The median is known as a measure of location; that is, it tells us where the data are. As stated earlier, we do not need to know all the exact values to calculate the median; if we made the smallest value even smaller or the largest value even larger, it would not change the value of the median. Thus the median does not use all the information in the data, and so it can be shown to be less efficient than the mean or average, which does use all values of the data. To calculate the mean we add up the observed values and divide by the number of them. The total of the values obtained in Table 1.1 was 22.5, which was divided by their number, 15, to give a mean of 1.5. This familiar process is conveniently expressed by the following symbols:
$$\bar{x} = \frac{\sum x}{n}$$
$\bar{x}$ (pronounced "x bar") signifies the mean; $x$ is each of the values of urinary lead; $n$ is the number of these values; and $\Sigma$, the Greek capital sigma (our "S"), denotes "sum of". A major disadvantage of the mean is that it is sensitive to outlying points. For example, replacing 2.2 by 22 in Table 1.1 increases the mean to 2.82, whereas the median will be unchanged.
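The arithmetic above can be checked with a short sketch. The total (22.5), the count (15), and the replacement of 2.2 by 22 come from the text; the seven-value `sample` list is a hypothetical illustration, not the Table 1.1 data, and is included only to show the median staying put when the largest value is inflated.

```python
from statistics import median

# Figures quoted in the text: 15 values with a total of 22.5.
total, n = 22.5, 15
mean_original = total / n

# Replacing the value 2.2 by 22 changes the total, and therefore the mean.
mean_with_outlier = (total - 2.2 + 22) / n

# Hypothetical sample (NOT the Table 1.1 data): inflate the largest value
# and compare the medians before and after.
sample = [0.4, 0.8, 1.2, 1.5, 1.9, 2.0, 2.2]
sample_with_outlier = [0.4, 0.8, 1.2, 1.5, 1.9, 2.0, 22]

median_before = median(sample)
median_after = median(sample_with_outlier)

print(f"mean: {mean_original:.2f} -> {mean_with_outlier:.2f}")
print(f"median: {median_before} -> {median_after}")
```

The mean moves because every value enters the sum, whereas the median depends only on the middle of the ordered values.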
As well as measures of location we need measures of how variable the data are. We met two of these measures, the range and interquartile range, in Chapter 1.
The range is an important measurement, for figures at the top and bottom of it denote the findings furthest removed from the generality. However, they do not give much indication of the spread of observations about the mean. This is where the standard deviation (SD) comes in.
The theoretical basis of the standard deviation is complex and need not trouble the ordinary user. We will discuss sampling and populations in Chapter 3. A practical point to note here is that, when the population from which the data arise has a distribution that is approximately "Normal" (or Gaussian), then the standard deviation provides a useful basis for interpreting the data in terms of probability.
The Normal distribution is represented by a family of curves defined uniquely by two parameters, which are the mean and the standard deviation of the population. The curves are always symmetrically bell shaped, but the extent to which the bell is compressed or flattened out depends on the standard deviation of the population. However, the mere fact that a curve is bell shaped does not mean that it represents a Normal distribution, because other distributions may have a similar sort of shape.
Many biological characteristics conform to a Normal distribution closely enough for it to be commonly used - for example, heights of adult men and women, blood pressures in a healthy population, random errors in many types of laboratory measurements and biochemical data. Figure 2.1 shows a Normal curve calculated from the diastolic blood pressures of 500 men, mean 82 mmHg, standard deviation 10 mmHg. The ranges representing ±1 SD, ±2 SD, and ±3 SD about the mean are marked. A more extensive set of values is given in Table A of the print edition.
The reason why the standard deviation is such a useful measure of the scatter of the observations is this: if the observations follow a Normal distribution, a range covered by one standard deviation above the mean and one standard deviation below it ($\bar{x} \pm 1\,\mathrm{SD}$) includes about 68% of the observations; a range of two standard deviations above and two below ($\bar{x} \pm 2\,\mathrm{SD}$) about 95% of the observations; and of three standard deviations above and three below ($\bar{x} \pm 3\,\mathrm{SD}$) about 99.7% of the observations. Consequently, if we know the mean and standard deviation of a set of observations, we can obtain some useful information by simple arithmetic. By putting one, two, or three standard deviations above and below the mean we can estimate the ranges that would be expected to include about 68%, 95%, and 99.7% of the observations.
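As a minimal sketch of that "simple arithmetic", the following uses the blood pressure figures quoted for Figure 2.1 (mean 82 mmHg, SD 10 mmHg) and assumes the observations are exactly Normal. The coverage fractions are computed from the standard Normal curve with Python's `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Figures quoted in the text for Figure 2.1 (diastolic blood pressure, mmHg).
mean_bp, sd_bp = 82.0, 10.0

# Standard Normal distribution (mean 0, SD 1), used to get the coverage
# of a range of k standard deviations either side of the mean.
standard_normal = NormalDist()

ranges = {}
coverage = {}
for k in (1, 2, 3):
    ranges[k] = (mean_bp - k * sd_bp, mean_bp + k * sd_bp)
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    low, high = ranges[k]
    print(f"mean ± {k} SD: {low:.0f} to {high:.0f} mmHg, "
          f"about {coverage[k]:.1%} of observations")
```

Under these assumptions, about 95% of the men would be expected to have a diastolic pressure between 62 and 102 mmHg.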
Standard deviation from ungrouped data
The standard deviation is a summary measure of the differences of each observation from the mean. If the differences themselves were added up, the positive would exactly balance the negative and so their sum would be zero. Consequently the squares of the differences are added. The sum of the squares is then divided by the number of observations minus one to give the mean of the squares, and the square root is taken to bring the measurements back to the units we started with. (The sum of squares is divided by the number of observations minus one, rather than by the number of observations itself, because "degrees of freedom" must be used to obtain the mean square. In these circumstances they are one less than the total. The theoretical justification for this need not trouble the user in practice.)
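The steps just described can be followed one at a time in a short sketch. The `data` list is a hypothetical set of observations chosen for easy arithmetic, not taken from the text; the final comparison with `statistics.stdev` is only a cross-check, since that function also divides by n - 1.

```python
import math
from statistics import stdev

# Hypothetical observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Step 1: the mean.
mean_value = sum(data) / n

# Step 2: the difference of each observation from the mean.
# These sum to zero, which is why they cannot simply be added up.
deviations = [x - mean_value for x in data]

# Step 3: square the differences and add them.
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 4: divide by n - 1 (the degrees of freedom) to get the mean square.
mean_square = sum_of_squares / (n - 1)

# Step 5: take the square root to return to the original units.
sd = math.sqrt(mean_square)

print(f"sum of deviations = {sum(deviations)}")
print(f"sum of squares    = {sum_of_squares}")
print(f"SD                = {sd:.3f}")
print(f"statistics.stdev  = {stdev(data):.3f}")
```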
To gain an intuitive feel for degrees of freedom, consider choosing a chocolate from a box of n chocolates. Every time we come to choose a chocolate we have a choice, until we come to the last one (normally one with a nut in it!), and then we have no choice. Thus we have n-1 choices, or "degrees of freedom".