## Scatter plot

A scatter diagram is a graph that plots paired values of two variables on a Cartesian plane. It is used to assess whether a linear relationship exists between the two variables, which can be judged from the overall trend of the plotted points. The independent variable is plotted on the x-axis, while the dependent variable is plotted on the y-axis. A positive relationship appears in a scatter plot when the points tend to slope upwards, and it is easier to see when a straight line is drawn through the points. The data below show the untreated cases of tooth decay in the US between 1988 and 2008 for the 6-19 age group across the three ethnicities.

*(Source of graphs – Centers for Disease Control and Prevention, 2016).*

A scatter plot of ethnicity/race against untreated cases of tooth decay is drawn. The dependent variable is the percentage of untreated cases of tooth decay, while the independent variable is ethnicity. The diagram is presented below.

The general rule is that correlation does not imply causality. One way of observing the correlation between variables is by using a scatter diagram. For instance, the scatter diagram above shows a positive relationship between the two variables. This implies that as the values of one variable increase, the values of the other variable also increase. However, the diagram cannot prove that one variable is causing the other. In the example above, it cannot be stated with certainty that ethnicity is causing the reported percentage of untreated tooth decay. Thus, a scatter plot only gives information on the strength and the direction of the relationship between the two variables. From the diagram above, it is not possible to know whether a causal relationship exists between ethnicity and the percentage of untreated tooth decay.
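The strength and direction that a scatter plot shows visually can also be summarized numerically. The sketch below computes the Pearson correlation coefficient, where the sign gives the direction and the magnitude gives the strength of the linear relationship. The function name `pearson_r` and the sample values are illustrative assumptions; the numbers are made up and are not the CDC figures discussed above.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations (NOT taken from the CDC tables).
x_values = [1, 2, 3, 4, 5]
y_values = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x_values, y_values)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.3f}, direction: {direction}")
```

A coefficient close to +1 or -1 indicates a strong linear relationship, but, as with the scatter plot itself, it says nothing about which variable, if either, causes the other.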

A causal relationship exists if a change in one variable brings about a change in the other. If a causal relationship exists between two variables, then they will generally also be correlated. It is important to test whether a causal relationship exists between two variables. The idea of causality is quite philosophical and, in most cases, it is believed that it cannot be tested. The only accurate way of determining whether one variable causes the other is by carrying out a controlled experiment. From a statistical point of view, a number of tests have been used to measure the ability of one variable to predict future values of another variable. One common test is the Granger causality test, which makes use of a number of t-tests and F-tests. Based on this test, a time series of one variable is considered to cause another variable if it provides statistically significant information on the future values of the second variable. However, this method cannot reliably determine causality between two variables (Rosner, 2010).
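To make the idea behind the Granger test concrete, the following is a minimal single-lag sketch, not a full implementation. It compares a restricted regression (the series predicted from its own past) with an unrestricted one (adding the past of the other series) through an F statistic. All function names are hypothetical, and the two series are simulated so that `x` really does help predict `y`; the tooth decay data are not used because they are not paired time series of this kind.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_sum_of_squares(rows, targets):
    """Fit ordinary least squares via the normal equations; return the RSS."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((t - f) ** 2 for t, f in zip(targets, fitted))


def granger_f_statistic(cause, effect):
    """F statistic for 'lagged cause helps predict effect', using one lag."""
    targets = effect[1:]
    steps = range(1, len(effect))
    restricted = [[1.0, effect[t - 1]] for t in steps]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in steps]
    rss_r = residual_sum_of_squares(restricted, targets)
    rss_u = residual_sum_of_squares(unrestricted, targets)
    n = len(targets)
    # One restriction is tested; the unrestricted model has three parameters.
    return (rss_r - rss_u) / (rss_u / (n - 3))


# Simulated series (an assumption for illustration): y depends on lagged x.
rng = random.Random(0)
n_points = 200
x = [rng.gauss(0, 1) for _ in range(n_points)]
y = [0.0]
for t in range(1, n_points):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 1))

f_x_to_y = granger_f_statistic(x, y)
f_y_to_x = granger_f_statistic(y, x)
print(f"F (x -> y) = {f_x_to_y:.2f}")
print(f"F (y -> x) = {f_y_to_x:.2f}")
```

Each statistic would be compared against an F distribution with 1 and n - 3 degrees of freedom. A large value only indicates that one series improves forecasts of the other, which is a weaker claim than causation.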

The positive relationship observed between ethnicity and untreated cases of tooth decay can be used to formulate and test hypotheses. In this case, the null and alternative hypotheses are stated below.

Null hypothesis: Ethnicity does not have an impact on the percentage of untreated tooth decay for age group 6-19 years.

Alternative hypothesis: Ethnicity has an impact on the percentage of untreated tooth decay for age group 6-19 years.
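One possible way to test these hypotheses, since ethnicity is a categorical variable, is a chi-square test of independence on a table of counts. This is a sketch under stated assumptions and not necessarily the test the analysis would finally use: the group labels and counts below are hypothetical placeholders rather than CDC data, and `chi_square_statistic` is an illustrative helper.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are three groups, columns are
# (children with untreated decay, children without untreated decay).
observed_counts = [
    [30, 170],  # Group A (placeholder label)
    [45, 155],  # Group B (placeholder label)
    [50, 150],  # Group C (placeholder label)
]

statistic, dof = chi_square_statistic(observed_counts)

# Chi-square critical value for 2 degrees of freedom at the 5% level.
CRITICAL_VALUE_DF2_ALPHA_05 = 5.991
reject_null = dof == 2 and statistic > CRITICAL_VALUE_DF2_ALPHA_05
print(f"chi-square = {statistic:.3f}, dof = {dof}, reject null: {reject_null}")
```

Rejecting the null hypothesis here would indicate an association between group and untreated decay in the sample; it would still not show that ethnicity causes the difference.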

## References

Centers for Disease Control and Prevention. (2016). *Interactive tables and databases*. Web.

Rosner, B. (2010). *Fundamentals of biostatistics*. Boston: Cengage Learning.