The Means of Two Different Data Sets

Distinguish between comparing percentages, comparing means, and correlating scores.

A percentage expresses a quantity as a number out of 100. When we compare two percentages, we are therefore comparing two proportions on the same scale, and our conclusion rests on each number's weight relative to 100. When we compare the means of two different data sets, we are looking for the difference between the average values of the two sets; from that difference an observer can draw inferences about how the sets differ. Correlation is a method of measuring the association between two variables. When we correlate the scores of two variables, we are trying to determine whether the variables are related to each other. The purpose of correlation is to allow us to make a prediction about one variable based on what we know about another.

What is a frequency distribution?

A frequency distribution is a tabulation of raw data obtained by dividing the data into classes of some size and counting the number of data elements (or their fraction of the total) that fall within each pair of class boundaries. A frequency distribution can be displayed as a histogram or as a pie chart (Frequency Distribution).

Distinguish between a pie chart, bar graph and frequency polygon. Construct one of each.

A pie chart shows how a whole is divided among two or more categories. It is a circular graph that represents a total of 100%, with each category shown as a slice, usually in its own color, whose size is proportional to that category's percentage of the whole.
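The tabulation behind a frequency distribution, and the percentages that a pie chart of it would display, can be sketched in a few lines of Python. This is only an illustration: the scores are invented, and the function names `frequency_distribution` and `relative_percentages` are hypothetical helpers, not part of any library.

```python
from collections import Counter

def frequency_distribution(values, class_width):
    """Count how many values fall in each class [lower, lower + class_width)."""
    counts = Counter((v // class_width) * class_width for v in values)
    return dict(sorted(counts.items()))

def relative_percentages(counts):
    """Convert class counts to percentages of the total (the pie-chart slices)."""
    total = sum(counts.values())
    return {lower: 100 * n / total for lower, n in counts.items()}

# Hypothetical exam scores, invented for illustration only.
scores = [52, 55, 58, 61, 64, 66, 67, 68, 71, 73,
          74, 75, 77, 78, 79, 82, 85, 88, 91, 94]

class_width = 10
counts = frequency_distribution(scores, class_width)
shares = relative_percentages(counts)

for lower, n in counts.items():
    upper = lower + class_width - 1
    print(f"{lower}-{upper}: {n} scores ({shares[lower]:.0f}% of the pie)")
```

The class counts are what a histogram or frequency polygon would plot, while the percentages are the slice sizes of the corresponding pie chart.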
Figure 1: Pie chart

A bar graph shows raw data and is designed to display the values of two or more subjects. Instead of the slices of a pie, it uses horizontal or vertical bars, each representing a different value. Numbers along the axis beside the bars indicate the value of the variable, and the axis labels show which variable is being measured.

Figure 2: Bar graph

The difference between the pie chart and the bar graph is that a bar graph can show change over time, whereas a single pie chart can only represent the given percentages at one fixed point in time.

A frequency polygon is a graphical display of a frequency table. The intervals are shown on the X-axis, and the number of scores in each interval is represented by the height of a point located above the middle of the interval. The points are then connected to each other and to the X-axis so that they form a polygon.

What is a measure of central tendency? Distinguish between the mean, median and mode.

Central tendency refers to the middle value of the data, or to some other typical value of the data set. The measures of central tendency are the mean, the median and the mode. The most common form of the mean is the arithmetic mean, which is calculated by adding up all the values in the data set and dividing the sum by the total number of terms (Walpole). The median, on the other hand, is found by arranging the values in order, and how it is obtained depends on whether the number of terms is odd or even. If the number of terms is odd, the median is the value of the term in the middle. If the number of terms is even, the median is the average of the two terms in the middle. The mean and the median of a data set need not be the same.
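A short Python sketch can make the calculations concrete and show a case where the mean and the median differ. The data are invented for illustration, and the helper functions are hypothetical names written for this example (Python's standard `statistics` module offers equivalents).

```python
from collections import Counter

def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data; average of the two middle values if even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    """Most frequent value (the first one encountered if several tie)."""
    return Counter(values).most_common(1)[0][0]

# Hypothetical scores, invented for illustration; 40 is a deliberate outlier.
scores = [2, 3, 3, 5, 7, 10, 40]

print("mean:", mean(scores))
print("median:", median(scores))
print("mode:", mode(scores))
print("median of an even-sized set:", median([2, 3, 5, 7]))
```

In this made-up data set the single large value pulls the mean well above the median, which is one reason the two measures are reported separately.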
The mean tells us the average of the data set, while the median tells us the value that lies in the middle of the data set. The mode is the most frequent value in a data set. For example, if a data set holds the scores of a cricket team over a year, the mode is the score the team made most often that year.

What is a measure of variability? Distinguish between the standard deviation and the range.

Variability refers to how spread out the values in a data set are. There are numerous ways to measure it; the most common are the standard deviation and the range. The standard deviation tells us how closely the data are packed around the mean of the data set. In a normal distribution about 68% of the values lie within one standard deviation on either side of the mean, and about 95% lie within two standard deviations of the mean. The range is the difference between the maximum and the minimum value in the set. While the standard deviation measures variation around the mean, the range only specifies the overall extent of the values.

What is a correlation coefficient? What do the size and sign of the correlation coefficient tell us about the relationship between variables?

Correlation is the degree of linear relationship between two variables. It is usually expressed as a coefficient, ranging from -1 to +1, which measures the strength of the linear relationship between the variables. The sign distinguishes two types of correlation, negative and positive. Negative correlation means that when the values of one variable increase, the values of the other decrease, and vice versa. Positive correlation means that the values of both variables increase and decrease together (Weiss). The size (absolute value) of the correlation coefficient tells us the strength of the relationship, i.e. whether it is weak or strong. The general ranges with their respective strengths are listed below:

0.0 - 0.2  Very weak to negligible correlation
0.2 - 0.4  Weak, low correlation (not very significant)
0.4 - 0.7  Moderate correlation
0.7 - 0.9  Strong, high correlation
0.9 - 1.0  Very strong correlation

What is a scatter plot?

A scatter plot is a diagram that shows the relationship between two variables using the Cartesian coordinate system. The explanatory variable is plotted on the x-axis and the response variable on the y-axis. The information a scatter plot conveys about the relationship includes its strength, shape (linear or curvilinear), direction, and the presence of outliers.

What happens when a scatter plot shows the relationship to be curvilinear?

When the scatter plot shows the relationship to be curvilinear, the relationship between the two variables is not linear. The equation describing it may then be quadratic or some other polynomial or nonlinear form, so an increase in one variable does not correspond to a constant increase or decrease in the other.

What is a regression equation? How might an employer use a regression equation?

A regression equation is an equation used to predict the behavior of a dependent variable from an independent variable. It usually takes the form Y = A + BX + e, where Y is the dependent variable, X is the independent variable, A is the Y-intercept of the line, B is its slope, and e is the regression residual. The values of A and B are chosen so as to minimize the sum of the squares of the regression residuals. An employer can use regression analysis to predict the amount of sales for a given year and then plan other business decisions, such as hiring staff or spending on advertising, with the revenues from the predicted sales in mind (Regression Analysis).

How does multiple correlation increase the accuracy of prediction?
Multiple correlation increases the accuracy of prediction because in the real world an outcome rarely depends on just one independent variable. In situations where we need to find relationships, there are often two or more independent variables affecting one dependent variable. If we modeled the same data using simple correlation, we would ignore some of those influences and obtain less accurate answers. For complex problems, therefore, multiple correlation improves the accuracy of prediction (Correlation).

What is the purpose of partial correlation?

The purpose of partial correlation analysis is to find the correlation between two variables after removing the effects of other variables.

When a path diagram is shown, what information is conveyed by the arrows leading from one variable to another?

In a path diagram a single-headed arrow can carry two different meanings, a strong one and a weak one. In the strong meaning the arrowhead signifies causality; in the weak meaning it signifies predictability. A double-headed arrow shows the correlation between two exogenous variables, with no commitment about causality or predictability.

Chapter 12

Distinguish between the null hypothesis and the research hypothesis. When does the researcher decide to reject the null hypothesis?

A research hypothesis is a statement in which the researcher proposes a possible relationship between two variables. The null hypothesis is the opposite statement: that there is no relationship between the variables and that any relationship observed in the sample is due to random chance. To support the research hypothesis, the researcher has to reject the null hypothesis. The null hypothesis is rejected when the probability of obtaining the observed results, assuming the null hypothesis is true, is less than the chosen significance level, conventionally 5%.

What is meant by statistical significance?

Statistical significance means that the obtained result would be very unlikely to arise by chance alone if the null hypothesis were true. It does not mean that chance has been eliminated from the outcome, only that chance is an implausible explanation at the chosen significance level.
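The decision rule described above, rejecting the null hypothesis when the probability of the result under it falls below 5%, can be illustrated with a small exact permutation test on the difference between two group means. This is only one of several possible tests, chosen here because it needs nothing beyond the Python standard library; the essay does not prescribe it. The data, the function name `permutation_p_value`, and the constant `ALPHA` are assumptions made for the example.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in means.

    Under the null hypothesis the group labels are arbitrary, so every way
    of splitting the pooled data into groups of the same sizes is equally
    likely. The p-value is the share of splits whose mean difference is at
    least as large as the one actually observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        splits += 1
    return extreme / splits

ALPHA = 0.05  # significance level chosen before looking at the data

# Hypothetical measurements, invented for illustration only.
treatment = [11, 12, 13, 14]
control = [1, 2, 3, 4]

p = permutation_p_value(treatment, control)
reject_null = p < ALPHA
print(f"p = {p:.4f}; reject the null hypothesis: {reject_null}")
```

With groups this small the test enumerates every possible split, so the p-value is exact; for larger samples a t-test or a randomly sampled permutation test would be the more usual choice.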
What factors are most important in determining whether obtained results will be significant?

The sample size, the error rate the researcher sets, the type of sampling procedure adopted, and the method of significance testing employed (one-tailed or two-tailed) all affect the significance of the results.

Distinguish between a Type I and a Type II error. Why is your significance level the probability of making a Type I error?

A Type I error is the rejection of a null hypothesis that is actually true, while a Type II error is the failure to reject a null hypothesis that is actually false. The probability of making a Type I error is denoted by alpha (α) and is called the Type I error rate. Because the null hypothesis is rejected whenever the obtained probability falls below the significance level, a true null hypothesis will be rejected with exactly that probability, so the significance level is the Type I error rate.

What factors are involved in choosing a significance level?

The significance level is chosen by the researcher before the analysis, by weighing the consequences of a Type I error against those of a Type II error. A test such as the t-test then determines whether the results reach that level; its outcome depends on the average values, the variance and the number of observations.

What influences the probability of a Type II error?

Beta denotes the probability of a Type II error occurring, that is, the proportion of false negatives among all actual positive instances. It is influenced by the significance level, the sample size and the size of the effect being studied.

What is the difference between statistical significance and practical significance?

Statistical significance shows that the null hypothesis can be rejected at a certain level of probability. This in itself does not prove that a finding matters, since probability values are strongly influenced by a number of factors. For example, with a large enough sample even a very small difference can be statistically significant, although in practice it may be too small to be of any importance. Practical significance concerns whether the effect is large enough to matter in real settings.

Discuss the reasons that a researcher might obtain non-significant results.

The researcher might obtain non-significant results if the error rate is set inappropriately, if the obtained probability falls outside the significance criterion, or if there is an experimental error.

Chapter 14

Why should a researcher be concerned about generalizing to other subject populations?
What are some of the subject population generalization problems that a researcher might confront?

A researcher is concerned about generalizing to other subject populations because tests are always performed on samples, and the results are later inferred to the entire population. To be confident that the results are accurate, researchers need to check whether they hold for different populations. The problems a researcher might face while generalizing are differences in behavior and attributes between the sample and other populations.

What is the source of the problem of generalizing to other experiments? How can the problem be solved?

The basic problem of generalizing to other experiments is that an inference is only valid to the extent that the sample is representative of the population. We cannot infer to a population if the selected sample has different attributes from that population. The problem can be addressed by stating the assumptions being made and by ensuring that the sample is a true reflection of the entire population.

Why is it important to pretest a problem for generalization? Discuss the reasons why including a pretest may affect the ability to generalize results.

It is important to pretest a problem for generalization in order to establish that generalized results are possible. Generalization is not possible in all cases; there are circumstances where it cannot be done. If we pretest a problem, there is a higher chance of successfully generalizing the results, provided generalization is possible at all. Including a pretest may nevertheless affect the ability to generalize, because once a pretest has been taken the researcher may become biased toward generalizing the results the pretest hinted at, which defeats the purpose of generalization.

Distinguish between an exact replication and a conceptual replication. What is the value of a conceptual replication?

Exact replications are copies of the original study that was undertaken.
They are useful for establishing that the findings of the original study are reliable. A conceptual replication, by contrast, is a study based on the original study that uses different methods to better assess the true relationships found in the original. It is the most sophisticated kind of replication, and one can use different manipulations and measures while performing it.

What is a meta-analysis?

A meta-analysis takes the results of two or more studies of the same research question and combines them into a single analysis. It is used to obtain greater accuracy and statistical power by taking advantage of the large sample size that results from accumulating results over multiple studies. Meta-analysis can be used to answer many questions, as long as primary research exists to support the answers. Data about selected parts of the studies are entered into a database and then analyzed to yield a description and to draw inferences about a particular hypothesis.

Bibliography

Correlation. (n.d.). Retrieved February 13, 2010, from NVCC: http://www.nvcc.edu/home/elanthier/methods/correlation.htm
Frequency Distribution. (n.d.). Retrieved February 13, 2010, from Mathworld: http://mathworld.wolfram.com/FrequencyDistribution.html
Regression Analysis. (n.d.). Retrieved February 13, 2010, from http://www.valuebasedmanagement.net/methods_regression_analysis.html
Walpole, R. E. Introduction to Statistics.
Weiss, N. A. Introductory Statistics.
