
Testing with matched pairs: the Wilcoxon signed ranks test

26 January, 2016 - 11:31

During your career, you will often be interested in finding out if the same population is different in different situations. Do the same workers perform better after a training session? Do customers who used one of your products prefer the "new improved" version? Are the same characteristics important to different groups? When you are comparing the same group in two different situations, you have "matched pairs". For each member of the population or sample you have what happened under two different sets of conditions.

There is a non-parametric test using matched pairs that allows you to see if the location of the population is different in the different situations. This test is the Wilcoxon Signed Ranks Test. To understand the basis of this test, think about a group of subjects who are tested under two sets of conditions, A and B. Subtract the test score under B from the test score under A for each subject. Rank the subjects by the absolute size of that difference, and look to see if those who scored better under A are mostly lumped together at one end of your ranking. If most of the biggest absolute differences belong to subjects who scored higher under one of the sets of conditions, then the subjects probably perform differently under A than under B.

The details of how to perform this test were published by Frank Wilcoxon in 1945 [1]. Wilcoxon devised a way to determine whether the subjects who scored better under one set of conditions are lumped together in the ranking, and he derived the sampling distribution needed to test hypotheses based on the rankings. To use Wilcoxon's test, collect a sample of matched pairs. For each subject, find the difference in the outcome between the two sets of conditions, and then rank the subjects according to the absolute value of the differences. Next, add together the ranks of those with negative differences, and add together the ranks of those with positive differences. If these rank sums are about the same, then the subjects who did better under one set of conditions are mixed together with those who did better under the other, and the sample gives no evidence of a difference. If the rank sums are far apart, then there is evidence of a difference between the two sets of conditions.
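The ranking and summing steps can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `signed_rank_sums` and the training scores are hypothetical, and the code assumes there are no zero differences and no tied absolute differences, two cases the description above does not cover.

```python
def signed_rank_sums(scores_a, scores_b):
    """Return (positive_rank_sum, negative_rank_sum) for matched pairs.

    Assumes no zero differences and no tied absolute differences.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # Order the pair indices by absolute difference, smallest first.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    positive_sum = 0
    negative_sum = 0
    for rank, i in enumerate(order, start=1):
        if diffs[i] > 0:
            positive_sum += rank
        else:
            negative_sum += rank
    return positive_sum, negative_sum


# Hypothetical scores for five workers before (A) and after (B) training.
before = [62, 70, 55, 81, 66]
after = [69, 68, 64, 85, 71]
pos, neg = signed_rank_sums(before, after)
print(pos, neg)  # 1 14
```

Here only one worker scored higher before training, and that worker has the smallest absolute difference, so the positive rank sum is 1 and the negative rank sum is 14. The two sums are far apart, which is the pattern the test looks for.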

Because the two rank sums always add up to N(N+1)/2, the sum of the ranks 1 through N, if you know the rank sum for either the positives or the negatives, you know it for the other. This means that you do not really have to compare the rank sums. You can simply look at the smaller one and see if it is very small, which tells you whether the positive and negative differences are separated or mixed together. Wilcoxon published the sampling distribution of the smaller rank sum when the populations the samples come from are the same. A portion of a table showing this sampling distribution is in Table 7.3, below.
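The total of the two rank sums is fixed because the N differences receive the ranks 1 through N exactly once each, so every rank lands in one sum or the other. Writing T+ and T- for the positive and negative rank sums (labels introduced here only for illustration), a short worked version is:

```latex
T^{+} + T^{-} = 1 + 2 + \cdots + N = \frac{N(N+1)}{2},
\qquad \text{for example, } N = 6:\quad \frac{6 \cdot 7}{2} = 21 .
```

With six pairs, for instance, a positive rank sum of 10 forces the negative rank sum to be 21 - 10 = 11.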

Table 7.3 Sampling distribution

one-tail significance    0.05    0.025   0.01
two-tail significance    0.1     0.05    0.02
number of pairs, N
5                        0       -       -
6                        2       0       -
7                        3       2       0
8                        5       3       1
9                        8       5       3
10                       10      8       5

Wendy Woodruff is the President of the Student Accounting Society at the University of North Carolina at Burlington (UNC-B). Wendy recently came across a study by Baker and McGregor [2] in which both accounting firm partners and students were asked to score the importance of student characteristics in the hiring process. A summary of their findings is in Table 7.4.

Table 7.4 Data on importance of student attributes. From Baker and McGregor.

ATTRIBUTE                    Mean: student rating    Mean: big firm rating
High Accounting GPA          2.06                    2.56
High Overall GPA             0.08                    -0.08
Communication Skills         4.15                    4.25
Personal Integrity           4.27                    7.5
Energy, drive, enthusiasm    4.82                    3.15
Appearance                   2.68                    2.31

Wendy is wondering if the two groups think the same things are important. If they think that different things are important, Wendy will need to devote some society meetings to discussing the differences. Wendy has read over the article, and while she is not exactly sure how Baker and McGregor's scheme for rating the importance of student attributes works, she feels that the scores are probably not distributed normally. Because the scores may not be normally distributed and the samples are small, her test of whether the groups rate the attributes differently will have to be non-parametric. Wendy uses the Wilcoxon Signed Ranks Test.

Her hypotheses are:

H0: There is no true difference between what students and Big 6 partners think is important.
Ha: There is a difference.

She decides to use a level of significance of .05. Wendy's test is a two-tail test because she wants to see if the scores are different, not if the Big 6 partners value these things more highly. Looking at Table 7.3 in the row for six pairs, she finds that, for a two-tail test at the .05 level, the smaller of the two sums of ranks must be less than or equal to 0 to accept Ha.
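Reading a critical value from Table 7.3 can be sketched as a small lookup. The dictionary below is transcribed from the portion of the table shown above; the names `CRITICAL_VALUES`, `critical_value`, and `accept_alternative` are hypothetical, chosen only for this illustration.

```python
# Critical values transcribed from Table 7.3, keyed by number of pairs N and
# then by two-tail significance level. A missing entry means the table lists
# no critical value for that N at that level.
CRITICAL_VALUES = {
    5: {0.10: 0},
    6: {0.10: 2, 0.05: 0},
    7: {0.10: 3, 0.05: 2, 0.02: 0},
    8: {0.10: 5, 0.05: 3, 0.02: 1},
    9: {0.10: 8, 0.05: 5, 0.02: 3},
    10: {0.10: 10, 0.05: 8, 0.02: 5},
}


def critical_value(n_pairs, two_tail_alpha):
    """Return the tabled critical value, or None if the table has none."""
    return CRITICAL_VALUES.get(n_pairs, {}).get(two_tail_alpha)


def accept_alternative(smaller_rank_sum, n_pairs, two_tail_alpha):
    """Accept Ha only if the smaller rank sum is at or below the critical value."""
    cutoff = critical_value(n_pairs, two_tail_alpha)
    if cutoff is None:
        return False  # no rank sum is small enough at this level
    return smaller_rank_sum <= cutoff


print(critical_value(6, 0.05))  # 0
```

For a one-tail test at level alpha, the same column applies as for a two-tail test at 2 × alpha, as the two header rows of the table show.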

Wendy finds the differences between student and Big 6 scores, and ranks the absolute differences, keeping track of which are negative and which are positive. She then sums the positive ranks and sums the negative ranks. Her work is shown below:

Table 7.5 The worksheet for the Wilcoxon Signed Ranks Test

ATTRIBUTE                    Mean: student rating    Mean: big firm rating    Difference    Rank
High Accounting GPA          2.06                    2.56                     -0.5          -4
High Overall GPA             0.08                    -0.08                    0.16          2
Communication Skills         4.15                    4.25                     -0.1          -1
Personal Integrity           4.27                    7.5                      -2.75         -6
Energy, drive, enthusiasm    4.82                    3.15                     1.67          5
Appearance                   2.68                    2.31                     0.37          3

sum of positive ranks = 2+5+3 = 10
sum of negative ranks = 4+1+6 = 11
number of pairs = 6

Her sample statistic, T, is the smaller of the two sums of ranks, so T=10. Her decision rule is to accept Ha only if T is at or below the critical value from Table 7.3, and 10 is well above it, so she decides that the data support H0: there is no difference in what students and Big 6 firms think is important to look for when hiring students. This makes sense, because the attributes that students score as more important (those with positive differences) and those that the Big 6 score as more important (those with negative differences) are mixed together when the absolute values of the differences are ranked. Notice that using the rankings of the differences rather than their sizes reduces the weight of the large gap between the importance that students and Big 6 partners place on Personal Integrity. This is one of the costs of using non-parametric statistics. The Student Accounting Society at UNC-B does not need to have a major program on what accounting firms look for in hiring. However, Wendy thinks that the discrepancy in the importance placed on Personal Integrity by Big 6 firms and by students means that she needs to schedule a speaker on that subject. Wendy wisely tempers her statistical finding with some common sense.
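Wendy's worksheet arithmetic can be checked with a short Python sketch. It takes the Difference column of Table 7.5 as given and relies on the fact that no two absolute differences are tied; the variable names are illustrative only.

```python
# Differences (student mean minus big firm mean) as listed in Table 7.5.
differences = {
    "High Accounting GPA": -0.5,
    "High Overall GPA": 0.16,
    "Communication Skills": -0.1,
    "Personal Integrity": -2.75,
    "Energy, drive, enthusiasm": 1.67,
    "Appearance": 0.37,
}

# Rank attributes by absolute difference (1 = smallest); no ties occur here.
by_size = sorted(differences, key=lambda name: abs(differences[name]))
signed_ranks = {
    name: rank if differences[name] > 0 else -rank
    for rank, name in enumerate(by_size, start=1)
}

positive_sum = sum(r for r in signed_ranks.values() if r > 0)
negative_sum = -sum(r for r in signed_ranks.values() if r < 0)
t_statistic = min(positive_sum, negative_sum)

# The two rank sums must together use every rank from 1 to N.
n = len(differences)
assert positive_sum + negative_sum == n * (n + 1) // 2

print(positive_sum, negative_sum, t_statistic)  # 10 11 10
```

The signed ranks match the Rank column of the worksheet, and the two sums, 10 and 11, are nearly equal, which is why T is far too large to accept Ha.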