Tuesday, October 22, 2019

How to Estimate Standard Deviations (SD)

The standard deviation and range are both measures of the spread of a data set. Each number tells us in its own way how spread out the data are. Although there is no exact relationship between the range and the standard deviation, there is a rule of thumb that relates these two statistics. This relationship is sometimes referred to as the range rule for standard deviation.

The range rule tells us that the standard deviation of a sample is approximately equal to one-fourth of the range of the data. In other words, s ≈ (Maximum – Minimum)/4. This is a very straightforward formula to use, but it should only be used as a very rough estimate of the standard deviation.

An Example

To see how the range rule works, consider the data values 12, 12, 14, 15, 16, 18, 18, 20, 20, 25. These values have a mean of 17 and a sample standard deviation of about 4.1. If instead we first calculate the range of our data as 25 – 12 = 13 and then divide this number by four, our estimate of the standard deviation is 13/4 = 3.25. This number is about 20% below the true standard deviation, which is close enough for a rough estimate.

Why Does It Work?

It may seem like the range rule is a bit strange. Why does it work? Doesn’t it seem completely arbitrary to just divide the range by four? Why wouldn’t we divide by a different number? There is actually some mathematical justification behind the scenes. Recall the properties of the bell curve and the probabilities from a standard normal distribution.
One feature has to do with the amount of data that falls within a certain number of standard deviations:

Approximately 68% of the data is within one standard deviation (higher or lower) from the mean.
Approximately 95% of the data is within two standard deviations (higher or lower) from the mean.
Approximately 99.7% of the data is within three standard deviations (higher or lower) from the mean.

The number that we will use has to do with 95%. From two standard deviations below the mean to two standard deviations above the mean, we have about 95% of our data. Thus nearly all of our normal distribution stretches out over a line segment that is a total of four standard deviations long.

Not all data is normally distributed and bell curve shaped. But most data is well-behaved enough that going two standard deviations away from the mean captures nearly all of the data. We estimate and say that four standard deviations are approximately the size of the range, and so the range divided by four is a rough approximation of the standard deviation.

Uses for the Range Rule

The range rule is helpful in a number of settings. First, it is a very quick estimate of the standard deviation. The standard deviation requires us to first find the mean, then subtract this mean from each data point, square the differences, add these, divide by one less than the number of data points, then (finally) take the square root. On the other hand, the range rule only requires one subtraction and one division.

The range rule is also helpful when we have incomplete information. Formulas such as the one used to determine sample size require three pieces of information: the desired margin of error, the level of confidence and the standard deviation of the population we are investigating. Many times it is impossible to know what the population standard deviation is. With the range rule, we can estimate this value from the largest and smallest values we expect to see, and then work out how large we should make our sample.
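
As a rough illustration, the two calculations compared above can be sketched in Python using the data from the example. The function names here are made up for this sketch. The sample size step assumes the common formula for estimating a mean, n = (z × SD / E)², with z = 1.96 for 95% confidence and a margin of error of 1.0; both of those numbers are illustrative choices, not values from the discussion above.

```python
import math

def sample_sd(values):
    """Sample standard deviation, computed step by step."""
    n = len(values)
    mean = sum(values) / n                               # find the mean
    squared_diffs = [(x - mean) ** 2 for x in values]    # subtract the mean, then square
    return math.sqrt(sum(squared_diffs) / (n - 1))       # add, divide by n - 1, square root

def range_rule_sd(values):
    """Range rule estimate: one subtraction and one division."""
    return (max(values) - min(values)) / 4

def sample_size_for_mean(sd, margin_of_error, z=1.96):
    """Assumed formula n = (z * sd / E)^2, rounded up to a whole number."""
    return math.ceil((z * sd / margin_of_error) ** 2)

data = [12, 12, 14, 15, 16, 18, 18, 20, 20, 25]

exact = sample_sd(data)
rough = range_rule_sd(data)
print(f"sample SD:  {exact:.2f}")   # 4.06
print(f"range rule: {rough:.2f}")   # 3.25

# Illustrative only: margin of error 1.0 and z = 1.96 are assumptions.
n_needed = sample_size_for_mean(rough, margin_of_error=1.0)
print(f"sample size from the range rule estimate: {n_needed}")   # 41
```

Because the range rule gave a smaller value than the true standard deviation here, the sample size it suggests is also on the small side, which is one more reason to treat the result as a starting point only.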
