Chi Square (\(\chi^2\))
All of the analyses we have covered this semester involve categorical independent variables and a continuous dependent variable. But what should you do when your outcome data are also categorical? Luckily, there is a set of categorical data analytic techniques you can use when your data are categorical. The one we will focus on this week is the \(\chi^2\) goodness-of-fit test.
The \(\chi^2\) Goodness-of-Fit Test
In this example, 200 people are asked to “draw” cards from an imaginary deck of cards, and their responses are recorded. We are interested in determining whether the cards people selected are really random. Load the lsr package if you haven’t already done so previously, and then load the dataset (randomness.Rdata, available here: Download Randomness Dataset). Save the file to your working directory for easy loading.
load("randomness.Rdata")
For this example, we will focus on the first choice that people made (“choice_1” in the dataset). Use the table function to see the distribution of responses across the four types of playing cards. Save the table to an object called “observed”.
observed <- table(cards$choice_1)
observed
##
## clubs diamonds hearts spades
## 35 51 64 50
We want to test whether this distribution of scores is random. Our null hypothesis is that all four suits of cards are chosen with equal probability; in other words, each suit should have a 25% chance of being drawn. We will test whether the distribution we observed is the same as or different from what we expected.
We can save the probabilities to an object, and then multiply those probabilities by our sample size (N = 200) to get our expected values.
probabilities <- c(.25, .25, .25, .25)
N <- 200
expected <- (N*probabilities)
expected
## [1] 50 50 50 50
The formula for the \(\chi^2\) goodness-of-fit statistic is:
\[\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}\]
We need to subtract the expected scores from our observed scores and then square the differences. Then we divide each of those values by the corresponding expected value. Finally, we sum all of those numbers together to get the value of \(\chi^2\).
sum((observed - expected)^2 / expected)
## [1] 8.44
The last step is to determine whether our \(\chi^2\) value is statistically significant. We can use the pchisq function to find the p-value associated with our \(\chi^2\) statistic, with df = 3 (the number of categories minus one). We set lower.tail = FALSE to get the p-value for a \(\chi^2\) score equal to or greater than 8.44.
pchisq(8.44, df=3, lower.tail=FALSE)
## [1] 0.03774185
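Equivalently, you could compare the observed statistic to a critical value from the \(\chi^2\) distribution. Here is a quick sketch (not part of the original walk-through) using the qchisq function:
# critical value of chi-square for alpha = .05 with 3 degrees of freedom
qchisq(.95, df = 3)
The critical value is roughly 7.81; because our statistic of 8.44 exceeds it, we reach the same decision as with the p-value.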
Using the lsr package to run the goodness-of-fit test
Luckily, the lsr package has a function that will run the \(\chi^2\) goodness-of-fit test for us: goodnessOfFitTest.
goodnessOfFitTest(cards$choice_1)
##
## Chi-square test against specified probabilities
##
## Data variable: cards$choice_1
##
## Hypotheses:
## null: true probabilities are as specified
## alternative: true probabilities differ from those specified
##
## Descriptives:
## observed freq. expected freq. specified prob.
## clubs 35 50 0.25
## diamonds 51 50 0.25
## hearts 64 50 0.25
## spades 50 50 0.25
##
## Test results:
## X-squared statistic: 8.44
## degrees of freedom: 3
## p-value: 0.038
Goodness-of-fit tests can also be used when you know what the distribution of scores across categories is in a population and the probabilities aren’t necessarily equal to one another. For example, pretend that we knew that people in general tend to select red cards 60% of the time and black cards 40% of the time. We can specify the probabilities for the population and then test our observed data against those probabilities.
redpref <- c(clubs=.2, diamonds =.3, hearts= .3, spades = .2)
goodnessOfFitTest(cards$choice_1, p=redpref)
##
## Chi-square test against specified probabilities
##
## Data variable: cards$choice_1
##
## Hypotheses:
## null: true probabilities are as specified
## alternative: true probabilities differ from those specified
##
## Descriptives:
## observed freq. expected freq. specified prob.
## clubs 35 40 0.2
## diamonds 51 60 0.3
## hearts 64 60 0.3
## spades 50 40 0.2
##
## Test results:
## X-squared statistic: 4.742
## degrees of freedom: 3
## p-value: 0.192
The goodnessOfFitTest function in the lsr package is very user-friendly, as it was designed for introductory statistics students. As you know, most functions in R do not give you as much helpful output, so you have to know what you are doing! There is a chisq.test function in base R that will calculate the chi-square statistic for you (but it won’t give as much output as the lsr package).
options(digits = 3)
#equal probability
chisq.test(observed)
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 8, df = 3, p-value = 0.04
#red more likely example
chisq.test(observed, p=c(.20,.30,.30,.20))
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 5, df = 3, p-value = 0.2
The Test of Independence
The goodness-of-fit test is for one categorical variable where you want to test the observed proportions against a known population distribution (or against equal probability, i.e., chance). The \(\chi^2\) test of independence is used when you have two categorical variables and you want to know whether they are associated with (i.e., not independent of) one another.
Creating a Contingency Table
We’ll use the cats dataset available from the Andy Field DSUR companion website for Discovering Statistics Using R. Or, you can access a csv version here:
cats <- read.csv("cats.csv", fileEncoding = "UTF-8-BOM")
Note the fileEncoding = "UTF-8-BOM" argument. This bit of code will prevent a common issue that sometimes occurs when reading in csv files, where the variable name for the first column comes in with stray symbols. If you don’t include that piece of code and find that the first variable of your dataframe is not what you expect, add this argument to your read-in and try again.
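As a quick sanity check that the file read in cleanly (a small sketch using the cats data frame created above):
# confirm the column names and peek at the first few rows
names(cats)
head(cats)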
Let’s take a look at a contingency table of the two variables, Dance and Training. The addmargins function will add a column and a row to the table containing the corresponding sums.
#creating a contingency table
cats_frequencies <- addmargins(table(cats$Dance, cats$Training))
cats_frequencies
##
## Affection as Reward Food as Reward Sum
## No 114 10 124
## Yes 48 28 76
## Sum 162 38 200
Remember the formula for the \(\chi^2\) statistic:
\[\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}\]
But now we’ll need to calculate expected values for 4 conditions (Food/No, Food/Yes, Affection/No, and Affection/Yes), replacing the single expected value per category with an expected frequency \(E_{ij}\) for each cell of the contingency table.
To calculate the expected values, we use the formula:
\[E_{ij} = \frac{R_i \times C_j}{N}\]
We take the row total for row i, multiply it by the column total for column j, and then divide by N. For example, to calculate the expected value for Food/Yes, we would do the following:
\[E_{\text{Food/Yes}} = \frac{76 \times 38}{200} = 14.44\]
If we were going to continue doing this by hand, we would then calculate the expected values for the other three conditions. Then we would need to plug the expected and observed values for each of the four conditions back into the formula for \(\chi^2\).
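Here is a minimal sketch of that by-hand calculation in R, assuming the cats data frame from above is loaded; the object names (obs, expected, chi_sq) are just for illustration:
# observed frequencies (without the margin sums)
obs <- table(cats$Dance, cats$Training)
# expected frequency for each cell: (row total x column total) / N
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
# uncorrected chi-square statistic, summed over all four cells
chi_sq <- sum((obs - expected)^2 / expected)
chi_sq
# p-value with df = (rows - 1) x (columns - 1) = 1
pchisq(chi_sq, df = 1, lower.tail = FALSE)
This reproduces the uncorrected statistic (the 25.35 that Field reports); the functions below apply Yates’ continuity correction by default, which is why their value is slightly smaller.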
Of course, R can do the dirty work for us. Navarro has created another user-friendly function in the lsr package to run the \(\chi^2\) test of independence: associationTest.
library(lsr)
associationTest(~ Dance+Training, data=cats)
##
## Chi-square test of categorical association
##
## Variables: Dance, Training
##
## Hypotheses:
## null: variables are independent of one another
## alternative: some contingency exists between variables
##
## Observed contingency table:
## Training
## Dance Affection as Reward Food as Reward
## No 114 10
## Yes 48 28
##
## Expected contingency table under the null hypothesis:
## Training
## Dance Affection as Reward Food as Reward
## No 100.4 23.6
## Yes 61.6 14.4
##
## Test results:
## X-squared statistic: 23.5
## degrees of freedom: 1
## p-value: <.001
##
## Other information:
## estimated effect size (Cramer's v): 0.343
## Yates' continuity correction has been applied
We can also use the chisq.test function to calculate the \(\chi^2\) test of independence:
chisq.test(cats$Training, cats$Dance)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: cats$Training and cats$Dance
## X-squared = 24, df = 1, p-value = 1e-06
Yates’ Continuity Correction
Note that for both of the above analyses, the answer is the same: 23.5, p < .001. However, Field gets a slightly different answer (25.35). Why the difference? Both associationTest and chisq.test apply Yates’ continuity correction to the \(\chi^2\) statistic by default when you have a 2 x 2 contingency table. With a 2 x 2 table, the \(\chi^2\) statistic tends to come out a little too large relative to its continuous sampling distribution, making the test slightly too liberal. To adjust for this, the Yates correction is applied by subtracting 0.5 from the absolute deviation scores (|observed - expected|) before squaring. There is some debate about whether this correction is overly conservative, or whether it is necessary at all. Just know that most statistical software (and, in this case, these functions in R) applies the correction automatically when you run a 2 x 2 \(\chi^2\) test.
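If you want to see the uncorrected value for yourself, chisq.test has a correct argument that turns the Yates correction off (a quick sketch, not part of the original walk-through):
# suppress Yates' continuity correction to get the uncorrected chi-square
chisq.test(cats$Training, cats$Dance, correct = FALSE)
This should reproduce the 25.35 that Field reports, which also appears as the uncorrected Pearson test in the CrossTable output below.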
Field uses the CrossTable function in the gmodels package to run the \(\chi^2\) test of independence:
#install.packages("gmodels")
library(gmodels)
CrossTable(cats$Training, cats$Dance, fisher = TRUE, chisq = TRUE, expected = TRUE, format = "SPSS")
##
## Cell Contents
## |-------------------------|
## | Count |
## | Expected Values |
## | Chi-square contribution |
## | Row Percent |
## | Column Percent |
## | Total Percent |
## |-------------------------|
##
## Total Observations in Table: 200
##
## | cats$Dance
## cats$Training | No | Yes | Row Total |
## --------------------|-----------|-----------|-----------|
## Affection as Reward | 114 | 48 | 162 |
## | 100.440 | 61.560 | |
## | 1.831 | 2.987 | |
## | 70.370% | 29.630% | 81.000% |
## | 91.935% | 63.158% | |
## | 57.000% | 24.000% | |
## --------------------|-----------|-----------|-----------|
## Food as Reward | 10 | 28 | 38 |
## | 23.560 | 14.440 | |
## | 7.804 | 12.734 | |
## | 26.316% | 73.684% | 19.000% |
## | 8.065% | 36.842% | |
## | 5.000% | 14.000% | |
## --------------------|-----------|-----------|-----------|
## Column Total | 124 | 76 | 200 |
## | 62.000% | 38.000% | |
## --------------------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 25.4 d.f. = 1 p = 4.77e-07
##
## Pearson's Chi-squared test with Yates' continuity correction
## ------------------------------------------------------------
## Chi^2 = 23.5 d.f. = 1 p = 1.24e-06
##
##
## Fisher's Exact Test for Count Data
## ------------------------------------------------------------
## Sample estimate odds ratio: 6.58
##
## Alternative hypothesis: true odds ratio is not equal to 1
## p = 1.31e-06
## 95% confidence interval: 2.84 16.4
##
## Alternative hypothesis: true odds ratio is less than 1
## p = 1
## 95% confidence interval: 0 14.3
##
## Alternative hypothesis: true odds ratio is greater than 1
## p = 7.71e-07
## 95% confidence interval: 3.19 Inf
##
##
##
## Minimum expected frequency: 14.4
Assumptions of \(\chi^2\)
There are two assumptions you need to be concerned with when doing a \(\chi^2\) test:
1. Independence of observations. Each person, item, or entity can only contribute to one cell of the contingency table.
2. Expected frequencies per cell are sufficiently large. The expected frequency of each cell of a contingency table should be greater than or equal to 5. In larger tables, you can get away with 80% of the cells having an expected frequency greater than 5 and none below 1 (one way to check the expected counts is sketched after the Fisher test output below). If expected cell counts are small (and the sample size is likely to be small), you can use Fisher’s exact test instead of \(\chi^2\):
fisher.test(cats$Training, cats$Dance)
##
## Fisher's Exact Test for Count Data
##
## data: cats$Training and cats$Dance
## p-value = 1e-06
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 2.84 16.43
## sample estimates:
## odds ratio
## 6.58
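As promised above, one way to check the expected-frequency assumption (a quick sketch, not part of the original walk-through) is to pull the expected counts out of a chisq.test result:
# expected cell counts; each should be at least 5 for the chi-square test to be appropriate
chisq.test(cats$Training, cats$Dance)$expected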
Effect size
For a 2 x 2 contingency table, there are a few different effect sizes you can calculate and report: the odds ratio, Cramer’s V, and Phi (\(\phi\)).
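The sample odds ratio can be computed directly from the contingency table. Here is a quick sketch (not part of the original walk-through); note that it differs slightly from the conditional estimate that fisher.test reports:
# odds of dancing for food-trained vs. affection-trained cats
odds_food <- 28 / 10        # dancers / non-dancers, food as reward
odds_affection <- 48 / 114  # dancers / non-dancers, affection as reward
odds_food / odds_affection  # about 6.65 (fisher.test's conditional estimate was 6.58)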
Phi
Phi (\(\phi\)) measures the strength of association between two categorical variables in a 2 x 2 contingency table: \(\phi = \sqrt{\chi^2 / N}\), where N is the total sample size. For a 2 x 2 table, Phi and Cramer’s V are equivalent.
Cramer’s V
Cramer’s V measures the strength of association between two nominal/categorical variables. It is included in the output of associationTest, and you can also get it using the cramersV function in the lsr package. In the formula below, N is the total sample size and k is the smaller of the number of rows and the number of columns.
The formula for Cramer’s V is:
\[V = \sqrt{\frac{\chi^2}{N(k-1)}}\]
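As a quick check (a sketch using values from the output above, not part of the original walk-through), you can plug the Yates-corrected \(\chi^2\) statistic into the formula by hand:
# by-hand Cramer's V using the corrected chi-square reported above
chi_sq_yates <- 23.5  # X-squared statistic from associationTest / chisq.test
N <- 200              # total sample size
k <- 2                # smaller of the number of rows and number of columns
sqrt(chi_sq_yates / (N * (k - 1)))  # about 0.343, matching the lsr output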
To calculate Cramer’s V directly, you can use the cramersV function:
cramersV(cats$Training, cats$Dance)
## [1] 0.343