For almost all of these random variables, the observed values correspond closely to the measures of central tendency.
However, for B5 it should be noted that the measures of central tendency are far from each other. In fact, the variable has two modes. Also, since the mean and the median are calculated values that do not appear in the sample, the chance of picking an observation at random that equals either of these two measures is zero.
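The situation described for B5 can be illustrated with a small sketch. The sample below is hypothetical (it is not the actual B5 data) and is chosen only so that it has two modes and so that the mean and median are computed values that never occur in the sample. It assumes Python 3.8 or later for `statistics.multimode`.

```python
import statistics

# Hypothetical bimodal sample; these are NOT the actual B5 values.
sample = [1, 1, 1, 2, 4, 5, 5, 5]

modes = statistics.multimode(sample)      # every value tied for most frequent
mean_value = statistics.mean(sample)
median_value = statistics.median(sample)  # average of the two middle values

# Check whether either computed measure is an observed value.
mean_in_sample = mean_value in sample
median_in_sample = median_value in sample

print("modes:", modes)
print("mean:", mean_value, "observed in sample:", mean_in_sample)
print("median:", median_value, "observed in sample:", median_in_sample)
```

In this sketch the two modes sit at opposite ends of the sample while the mean and median fall in the gap between them, which is why no observation drawn from the sample can equal either one.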
If the data is significantly skewed, the mean becomes an inappropriate measure of central tendency. It should be noted that the mean is pulled toward the side of the data set where the skew (the long tail) lies. For example, a positively skewed data set that ranges from 7 to 40 can have a mean of 15 only because most of the data fall between 7 and 18, for instance. The presence of outliers, which are extremely low or high values, can also reduce the effectiveness of the mean as a measure of central tendency.
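The effect can be checked with a short sketch. The values below are hypothetical, picked only to match the example of a positively skewed sample ranging from 7 to 40 with a mean of 15. The helper name `center` is an assumption made for illustration.

```python
import statistics

# Hypothetical positively skewed sample ranging from 7 to 40;
# most values lie between 7 and 18.
skewed = [7, 8, 9, 10, 10, 11, 12, 13, 14, 15, 15, 16, 30, 40]

def center(values):
    """Return (mean, median) of a sample."""
    return statistics.mean(values), statistics.median(values)

mean_skewed, median_skewed = center(skewed)
print("skewed sample: mean =", mean_skewed, "median =", median_skewed)

# Replace the largest value with an extreme outlier to see how far
# a single observation can drag the mean.
with_outlier = skewed[:-1] + [400]
mean_out, median_out = center(with_outlier)
print("with outlier:  mean =", mean_out, "median =", median_out)
```

The mean of the first sample is 15 while its median is 12.5, so the mean already sits above the middle of the data. With the outlier, the mean rises above 40, even though only one observation changed and the median stays at 12.5.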
If the data is significantly skewed, the mode becomes the best approximation of the data's center. The mean cannot be relied upon because of the presence of outliers, while the median can also be misleading. Thus, the mode, which represents the most frequently occurring value, best represents the data's center in this situation.
a. Determine the range, sample standard deviation, and IQR for each of these random variables: D1, D5, D6, D7, SBC1, SBC8, B5.
b. How would the range, sample standard deviation, and IQR be affected when data is significantly skewed?
If the data is significantly skewed, the range itself will not be affected. Because the range is only the difference between the largest and smallest values in the data set, it does not indicate what the data looks like. However, it should also be noted that the presence of outliers in the data set will make the range very large.
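The sample standard deviation is affected by skew. Because it is computed from the squared deviations from the mean, the values in the long tail of a skewed data set inflate it. Symmetrical data is expected to have a skewness of 0, but this does not mean that its standard deviation is zero. A standard deviation of zero only indicates that every observation has the same value; it does not imply that the data is normally distributed.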
The interquartile range is affected by the dispersion of the middle of the data set. Because it looks only at the middle 50% of the data, the gap between the 3rd and the 1st quartile cannot fully indicate the dispersion, since it ignores the observations in the lowest and highest quartiles. However, it is effective in removing the possibility of being misled by the presence of outliers.
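A minimal sketch of how the three measures respond to an extreme value is given here. The sample is hypothetical, and the helper name `dispersion` is an assumption made for illustration. It assumes Python 3.8 or later for `statistics.quantiles`, whose default quartile method may differ slightly from the one used by a spreadsheet or textbook.

```python
import statistics

def dispersion(values):
    """Return (range, sample standard deviation, IQR) for a sample."""
    data_range = max(values) - min(values)
    sample_sd = statistics.stdev(values)           # uses the n - 1 denominator
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return data_range, sample_sd, q3 - q1

# Hypothetical positively skewed sample ranging from 7 to 40.
skewed = [7, 8, 9, 10, 10, 11, 12, 13, 14, 15, 15, 16, 30, 40]

# Same sample with the largest value replaced by an extreme outlier.
with_outlier = skewed[:-1] + [400]

base = dispersion(skewed)
extreme = dispersion(with_outlier)

print("range, sample SD, IQR (skewed):      ", base)
print("range, sample SD, IQR (with outlier):", extreme)
```

In this sketch the range grows from 33 to 393 and the sample standard deviation also increases, while the IQR stays the same because the outlier lies outside the middle 50% of the observations.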
c. If data is significantly skewed, what measure would be the best approximation of the data's dispersion?
If the data is significantly skewed, the interquartile range serves as the best measure of dispersion. The interquartile range measures the dispersion by looking at the middle 50% of the observations. It removes the possibility of having a misleading measure of dispersion by capturing the middle of the data and leaving the outliers behind. This is in contrast with the range, which is