What are Non-Parametric Tests?
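Parametric tests (z-test, t-test) require assumptions about the population distribution, usually normality. Non-parametric tests do not assume a particular distributional form. They work on ranks or signs rather than the raw data values, which makes them robust to outliers and skewed distributions. They still have weaker assumptions of their own: independent observations, and for the Wilcoxon signed-rank test, a distribution symmetric about its median.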
When to Use Non-Parametric Tests
- The population is not normally distributed (or normality cannot be assumed).
- The sample size is small and you cannot invoke the CLT.
- Data are ordinal (ranked categories) rather than continuous measurements.
- There are extreme outliers that would invalidate a t-test.
- The question explicitly says "use a non-parametric test" or "do not assume normality."
The Three Tests in 9231 P4
- Sign test
- Wilcoxon signed-rank test
- Mann-Whitney U-test (also called the Wilcoxon rank-sum test)
Ranking — The Common Foundation
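Both the Wilcoxon and Mann-Whitney tests require ranking data values. The ranking rules are critical:
- Rank from smallest (rank 1) to largest.
- Tied values each receive the average of the ranks they would otherwise occupy. For example, two values tied for positions 3 and 4 both get rank 3.5.
- In the Wilcoxon signed-rank test, rank the absolute differences |dᵢ| after discarding any dᵢ = 0.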
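The average-rank rule can be expressed in a few lines. This is a minimal Python sketch that assumes a plain list of numbers. The function name `average_ranks` is hypothetical and does not come from the syllabus or any library.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest.

    Tied values all receive the mean of the positions they occupy in the
    sorted list, e.g. two values tied at positions 3 and 4 both get 3.5.
    """
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # first 1-based position of v
        last = first + ordered.count(v) - 1    # last 1-based position of v
        ranks.append((first + last) / 2)       # average rank for the tie block
    return ranks


# |d| values from the blood-pressure worked example: four 3s tie at positions 2 to 5.
print(average_ranks([2, 3, 3, 3, 3, 6, 9]))  # [1.0, 3.5, 3.5, 3.5, 3.5, 6.0, 7.0]
```

Each rank is returned in the original data order, so it can be matched back to its sign or sample label.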
The Sign Test
The sign test is the simplest non-parametric test. For each observation xᵢ, record whether xᵢ − m₀ is positive (+) or negative (−). Observations equal to m₀ are discarded, and n is reduced accordingly.
H₀: m = m₀
H₁: m > m₀ (right-tailed) or m < m₀ (left-tailed) or m ≠ m₀ (two-tailed)
Let S⁺ = number of positive signs (xᵢ > m₀).
Let S⁻ = number of negative signs (xᵢ < m₀).
Test statistic = S⁺ (right-tailed) or S⁻ (left-tailed) or min(S⁺, S⁻) (two-tailed)
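Right-tailed: p = P(X ≥ s⁺), where X ~ B(n, 0.5) and s⁺ is the observed value of S⁺
Left-tailed: p = P(X ≥ s⁻)
Two-tailed: p = 2 × P(X ≤ min(s⁺, s⁻)), capped at 1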
Reject H₀ if p-value < α.
Under H₀, each observation is equally likely to be above or below m₀, so S⁺ ~ B(n, ½). Cambridge provides binomial cumulative tables in MF19. For a right-tailed test at 5% with n = 10, P(S⁺ ≥ 8) = 56/1024 ≈ 0.0547 but P(S⁺ ≥ 9) = 11/1024 ≈ 0.0107. H₀ is therefore rejected only if S⁺ ≥ 9.
Test at 5% whether the median reaction time exceeds 220 ms.
H₀: m = 220, H₁: m > 220 (right-tailed)
Signs: +, −, +, +, −, +, +, +, −, +
S⁺ = 7, S⁻ = 3, n = 10 (no ties)
P(S⁺ ≥ 7) = P(X≥7) where X~B(10,0.5)
= P(X=7)+P(X=8)+P(X=9)+P(X=10)
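= 0.1172+0.0439+0.0098+0.0010 = 0.1719
p = 0.1719 > 0.05 → Do not reject H₀. There is insufficient evidence at the 5% level that the median reaction time exceeds 220 ms.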
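The binomial tail sums above can be checked with a short sketch. The reaction times themselves are not listed, only their signs. The usage line therefore uses +1/−1 stand-ins with m₀ = 0, which preserves exactly the information the sign test uses. The function names are hypothetical.

```python
from math import comb


def upper_tail(n, s):
    """P(X >= s) for X ~ B(n, 0.5)."""
    return sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n


def sign_test_p_value(data, m0, tail):
    """Sign test p-value for H0: m = m0.

    tail = "right" (H1: m > m0), "left" (H1: m < m0) or "two" (H1: m != m0).
    Observations equal to m0 are discarded, reducing n.
    """
    s_plus = sum(1 for x in data if x > m0)
    s_minus = sum(1 for x in data if x < m0)
    n = s_plus + s_minus
    if tail == "right":
        return upper_tail(n, s_plus)
    if tail == "left":
        return upper_tail(n, s_minus)
    # Two-tailed: double the smaller tail. By symmetry of B(n, 0.5),
    # P(X <= k) = P(X >= n - k). Cap at 1 (e.g. when S+ = S- = n/2).
    k = min(s_plus, s_minus)
    return min(1.0, 2 * upper_tail(n, n - k))


def right_tail_critical(n, alpha):
    """Smallest s with P(S+ >= s) < alpha, or None if no such s exists."""
    for s in range(n + 1):
        if upper_tail(n, s) < alpha:
            return s
    return None


# Reaction-time example: only the signs are given, so +1/-1 stand in for the data.
signs = [1, -1, 1, 1, -1, 1, 1, 1, -1, 1]
print(round(sign_test_p_value(signs, 0, "right"), 4))   # 0.1719
print(right_tail_critical(10, 0.05))                    # 9
```

The two-tailed branch caps the doubled tail at 1. This covers the perfectly balanced case S⁺ = S⁻, where doubling P(X ≤ n/2) would otherwise exceed 1.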
The sign test ignores the magnitude of the differences and uses only their direction, so differences of +1 and +100 are treated identically. The Wilcoxon signed-rank test is preferred when the data are continuous and the distribution can be assumed symmetric about its median. It uses more information and is generally more powerful.
Wilcoxon Signed-Rank Test
The Wilcoxon signed-rank test uses both the sign and the rank of each difference from the hypothesised median. It is generally more powerful than the sign test, but it assumes that the distribution is symmetric about its median.
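Calculate dᵢ = xᵢ − m₀ for each observation. Discard any dᵢ = 0 and reduce n accordingly.
Rank the |dᵢ| from smallest (rank 1) to largest. For tied |dᵢ|, assign average ranks.
W⁺ = sum of ranks where dᵢ > 0
W⁻ = sum of ranks where dᵢ < 0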
Check: W⁺ + W⁻ = n(n+1)/2
For right-tailed (H₁: m > m₀): use T = W⁻
For left-tailed (H₁: m < m₀): use T = W⁺
For two-tailed (H₁: m ≠ m₀): use T = min(W⁺, W⁻)
Reject H₀ if T ≤ w_α (note: ≤, not ≥ — smaller T means more evidence against H₀).
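The procedure above can be sketched in Python as follows. The names `wilcoxon_signed_rank` and `average_ranks` are hypothetical, and the tail labels "right", "left" and "two" are this sketch's own convention. The function returns the statistic only; the comparison with the critical value is left to the table.

```python
def average_ranks(values):
    """Rank 1 = smallest; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [(2 * ordered.index(v) + ordered.count(v) + 1) / 2 for v in values]


def wilcoxon_signed_rank(data, m0, tail):
    """Return (W+, W-, T) for H0: m = m0.

    tail = "right" (H1: m > m0) uses T = W-, "left" uses T = W+,
    "two" uses T = min(W+, W-). Reject H0 if T <= the tabulated value.
    """
    d = [x - m0 for x in data if x != m0]          # discard zero differences
    ranks = average_ranks([abs(v) for v in d])     # rank |d| with average ranks
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    n = len(d)
    assert w_plus + w_minus == n * (n + 1) / 2     # the W+ + W- check
    if tail == "right":
        t = w_minus
    elif tail == "left":
        t = w_plus
    else:
        t = min(w_plus, w_minus)
    return w_plus, w_minus, t


print(wilcoxon_signed_rank([45, 54, 58, 63, 69, 72, 77, 81], 65, "two"))
# (18.5, 17.5, 17.5)
```

For paired data, pass the differences (for example before − after) with m₀ = 0.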
Selected Critical Values — Wilcoxon Signed-Rank (Two-Tailed)
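| n | α = 0.10 | α = 0.05 | α = 0.02 | α = 0.01 |
|---|---|---|---|---|
| 5 | 0 | — | — | — |
| 6 | 2 | 0 | — | — |
| 7 | 3 | 2 | 0 | — |
| 8 | 5 | 3 | 1 | 0 |
| 9 | 8 | 5 | 3 | 1 |
| 10 | 10 | 8 | 5 | 3 |
| 12 | 17 | 13 | 9 | 7 |
| 15 | 30 | 25 | 19 | 15 |
| 20 | 60 | 52 | 43 | 37 |
For a one-tailed test, halve α: the α = 0.10 column gives the one-tailed 5% values. An entry of "—" means H₀ cannot be rejected at that level for that n.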
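The critical values can be generated from the exact null distribution, which is a useful way to check a table. This sketch assumes there are no ties, so each of the 2ⁿ sign patterns is equally likely under H₀. It takes the critical value to be the largest t with P(T ≤ t) ≤ α/2. The function names are hypothetical.

```python
def signed_rank_counts(n):
    """counts[t] = number of the 2**n equally likely sign patterns (no ties)
    for which the sum of the positive ranks equals t."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        # Rank r either joins the positive-rank sum or it does not.
        for t in range(max_sum, r - 1, -1):
            counts[t] += counts[t - r]
    return counts


def signed_rank_critical(n, alpha_two_tailed):
    """Largest t with P(T <= t) <= alpha/2, or None if even t = 0 is too likely."""
    counts = signed_rank_counts(n)
    cumulative, critical = 0, None
    for t, c in enumerate(counts):
        cumulative += c
        if cumulative / 2 ** n > alpha_two_tailed / 2:
            break
        critical = t
    return critical


for n in (5, 6, 7, 8, 9, 10, 12):
    print(n, [signed_rank_critical(n, a) for a in (0.10, 0.05, 0.02, 0.01)])
```

For n = 5 to 10 this enumeration reproduces the rows of the table above. `None` corresponds to a "—" entry.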
Test at 5% (two-tailed) whether the median score is 65.
H₀: m = 65, H₁: m ≠ 65
Data (n = 8): 45, 54, 58, 63, 69, 72, 77, 81
Differences dᵢ = xᵢ − 65: −20, −11, −7, −2, +4, +7, +12, +16 (no zero differences)
The value |dᵢ| = 7 occurs twice, at positions 3 and 4 when the |dᵢ| are ordered, so both receive the average rank (3 + 4)/2 = 3.5. Ordered by |dᵢ|:
| xᵢ | dᵢ | \|dᵢ\| | Rank | Signed Rank |
|---|---|---|---|---|
| 63 | −2 | 2 | 1 | −1 |
| 69 | +4 | 4 | 2 | +2 |
| 58 | −7 | 7 | 3.5 | −3.5 |
| 72 | +7 | 7 | 3.5 | +3.5 |
| 54 | −11 | 11 | 5 | −5 |
| 77 | +12 | 12 | 6 | +6 |
| 81 | +16 | 16 | 7 | +7 |
| 45 | −20 | 20 | 8 | −8 |
W⁺ = 2 + 3.5 + 6 + 7 = 18.5
W⁻ = 1 + 3.5 + 5 + 8 = 17.5
Check: 18.5 + 17.5 = 36 = 8×9/2 ✓
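T = min(18.5, 17.5) = 17.5
For n = 8, the two-tailed 5% critical value is 3. Since 17.5 > 3, do not reject H₀. There is no evidence that the median score differs from 65.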
Mann-Whitney U-Test
The Mann-Whitney test compares two independent samples. It tests whether the two populations have the same median, under the assumption that the two distributions have the same shape and can differ only in location. It is also called the Wilcoxon rank-sum test.
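H₀: m₁ = m₂
H₁: m₁ ≠ m₂ or m₁ > m₂ or m₁ < m₂
Rank all values from smallest (rank 1) to largest, applying average ranks for ties.
Keep track of which sample each rank belongs to.
W₁ = sum of ranks for sample 1
W₂ = sum of ranks for sample 2
Check: W₁ + W₂ = N(N+1)/2 where N = n₁ + n₂
U₁ = W₁ − n₁(n₁+1)/2
U₂ = W₂ − n₂(n₂+1)/2
Check: U₁ + U₂ = n₁ × n₂
For two-tailed: U = min(U₁, U₂).
For one-tailed: U₁ counts the pairs in which the sample 1 value exceeds the sample 2 value. For H₁: m₁ > m₂, use U = U₂; for H₁: m₁ < m₂, use U = U₁.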
Reject H₀ if U ≤ u_α (same direction as Wilcoxon — smaller = more evidence against H₀).
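The rank sums and U statistics can be sketched as follows. The names `mann_whitney` and `average_ranks` are hypothetical. The two asserts inside the function mirror the two check sums in the procedure.

```python
def average_ranks(values):
    """Rank 1 = smallest; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [(2 * ordered.index(v) + ordered.count(v) + 1) / 2 for v in values]


def mann_whitney(sample1, sample2):
    """Return (W1, W2, U1, U2) from the pooled average ranks."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    w1, w2 = sum(ranks[:n1]), sum(ranks[n1:])
    big_n = n1 + n2
    assert w1 + w2 == big_n * (big_n + 1) / 2       # rank-sum check
    u1 = w1 - n1 * (n1 + 1) / 2
    u2 = w2 - n2 * (n2 + 1) / 2
    assert u1 + u2 == n1 * n2                       # U check
    return w1, w2, u1, u2


a = [9, 12, 15, 18, 22]
b = [25, 14, 20, 17, 28, 11]
print(mann_whitney(a, b))                    # (25.0, 41.0, 10.0, 20.0)
# With no ties, U1 also equals the number of pairs with the A value larger:
print(sum(1 for x in a for y in b if x > y))  # 10
```

The pair-counting line illustrates why a small U is evidence against H₀. It shows that one sample's values rarely exceed the other's.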
Sample A (n₁=5): 9, 12, 15, 18, 22
Sample B (n₂=6): 25, 14, 20, 17, 28, 11
Test at 5% (two-tailed) whether the populations have equal medians.
| Value | Sample | Rank |
|---|---|---|
| 9 | A | 1 |
| 11 | B | 2 |
| 12 | A | 3 |
| 14 | B | 4 |
| 15 | A | 5 |
| 17 | B | 6 |
| 18 | A | 7 |
| 20 | B | 8 |
| 22 | A | 9 |
| 25 | B | 10 |
| 28 | B | 11 |
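W₁ (Sample A) = 1+3+5+7+9 = 25
W₂ (Sample B) = 2+4+6+8+10+11 = 41
Check: 25+41 = 66 = 11×12/2 ✓
U₁ = 25 − 5×6/2 = 25 − 15 = 10
U₂ = 41 − 6×7/2 = 41 − 21 = 20
Check: U₁+U₂ = 30 = 5×6 ✓
U = min(10,20) = 10
For n₁ = 5, n₂ = 6, the two-tailed 5% critical value is 3. Since 10 > 3, do not reject H₀. There is no evidence that the population medians differ.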
Normal Approximation for Large Samples
When both n₁ and n₂ are large (Cambridge will specify), the Mann-Whitney U statistic is approximately normally distributed under H₀:
U ≈ N( n₁n₂/2 , n₁n₂(n₁+n₂+1)/12 ), so z = (U − n₁n₂/2) / √( n₁n₂(n₁+n₂+1)/12 )
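The approximation can be applied with a short helper. This sketch uses the mean and variance stated above and applies no continuity correction; some texts subtract 0.5 from |U − n₁n₂/2|. The function name and the n₁ = n₂ = 10, U = 23 figures are hypothetical, chosen only for illustration.

```python
from math import sqrt


def mann_whitney_z(u, n1, n2):
    """z-score for U under H0 using U ~ N(n1*n2/2, n1*n2*(n1+n2+1)/12).

    No continuity correction is applied.
    """
    mean = n1 * n2 / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    return (u - mean) / sqrt(variance)


# Hypothetical illustration: n1 = n2 = 10 and an observed U of 23.
print(round(mann_whitney_z(23, 10, 10), 2))   # -2.04
```

The resulting z is compared with the usual normal critical value, for example |z| > 1.96 for a two-tailed test at 5%.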
Worked Examples
Before: 148, 152, 145, 160, 155, 143, 158
After: 142, 150, 148, 151, 152, 140, 155
Use Wilcoxon signed-rank at 5% (one-tailed) to test whether treatment reduces blood pressure. Do not assume normality.
d = Before − After: +6, +2, −3, +9, +3, +3, +3. There are no zero differences, so n = 7.
Ordered |d|: 2, 3, 3, 3, 3, 6, 9. The four 3s occupy positions 2, 3, 4 and 5, so each receives the average rank (2+3+4+5)/4 = 3.5.
Ranks: d=2→1, d=−3→3.5, d=+3→3.5, d=+3→3.5, d=+3→3.5, d=6→6, d=9→7
| Before | After | d | \|d\| | Rank | Signed Rank |
|---|---|---|---|---|---|
| 152 | 150 | +2 | 2 | 1 | +1 |
| 145 | 148 | −3 | 3 | 3.5 | −3.5 |
| 155 | 152 | +3 | 3 | 3.5 | +3.5 |
| 143 | 140 | +3 | 3 | 3.5 | +3.5 |
| 158 | 155 | +3 | 3 | 3.5 | +3.5 |
| 148 | 142 | +6 | 6 | 6 | +6 |
| 160 | 151 | +9 | 9 | 7 | +7 |
W⁺ = 1+3.5+3.5+3.5+6+7 = 24.5
W⁻ = 3.5
Check: 24.5+3.5 = 28 = 7×8/2 ✓
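H₀: m_d = 0. H₁: treatment reduces BP, i.e. m_d > 0 for d = before − after (right-tailed). A positive shift makes W⁺ large and W⁻ small.
For right-tail H₁ (m_d > 0): T = W⁻ = 3.5
Critical value for n = 7, one-tailed 5% (the two-tailed α = 0.10 column): 3
T = 3.5 > 3 → Do not reject H₀. At the 5% level there is insufficient evidence that the treatment reduces blood pressure, although the result is close to the boundary.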
(a) n=6 observations, test whether median = 50. Normality not assumed.
(b) Two independent samples of size 8 and 10. No normality assumption.
(c) Paired data, n=12. The differences are approximately normal.
(d) n=8 observations, median test. Only direction of differences known (not magnitude).
Practice Questions
Use a sign test at 10% to test whether the median differs from 2000 (two-tailed).
1850(−), 2100(+), 1980(−), 2250(+), 1760(−), 2050(+), 1920(−), 2180(+), 1890(−), 2300(+), 2020(+), 1970(−)
S⁺ = 6, S⁻ = 6, n = 12 (no ties)
H₀: m=2000, H₁: m≠2000 (two-tailed, α=0.10)
Under H₀, S⁺ ~ B(12, 0.5). T = min(6, 6) = 6
P(X ≤ 6) for X ~ B(12, 0.5) = 0.6128, which is not a small tail probability.
Two-tailed p-value = min(1, 2 × P(X ≤ 6)) = min(1, 1.2256) = 1. Since S⁺ = S⁻ = n/2, this is the most balanced outcome possible.
p = 1 > 0.10 → Fail to reject H₀. There is no evidence that the median differs from 2000.
No zeros. n=9.
|d| ordered: 2(1), 3(2), 12(3), 15(4), 18(5), 21(6), 33(7), 43(8), 56(9)
Signs: −18→rank5(−), +15→rank4(+), −2→rank1(−), +43→rank8(+), −33→rank7(−), +21→rank6(+), −12→rank3(−), +56→rank9(+), +3→rank2(+)
W⁺ = 4+8+6+9+2 = 29
W⁻ = 5+1+7+3 = 16
Check: 29+16 = 45 = 9×10/2 ✓
H₁: m>500 (right-tailed) → T = W⁻ = 16
n=9, one-tailed 5%: w_{0.05} = 8 (from table)
T=16 > 8 → Fail to reject H₀
Group Y (n₂=7): 52, 29, 44, 38, 61, 33, 47
Test at 5% (two-tailed) whether the populations have equal medians.
W_X = 1+4+5+7+9 = 26
W_Y = 2+3+6+8+10+11+12 = 52
Check: 26+52 = 78 = 12×13/2 ✓
U_X = 26 − 5×6/2 = 26−15 = 11
U_Y = 52 − 7×8/2 = 52−28 = 24
Check: 11+24 = 35 = 5×7 ✓
U = min(11,24) = 11
n₁=5, n₂=7, two-tailed 5%: u_{0.05} = 5 from tables (6 is the one-tailed 5% value)
11 > 5 → Fail to reject H₀
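Exact U critical values come from counting the equally likely orderings of the pooled data under H₀. This sketch assumes there are no ties. It uses the same convention as the signed-rank table: the largest u with P(U ≤ u) ≤ α/2. The function names are hypothetical. For n₁ = 5, n₂ = 7 the enumeration gives 5 at two-tailed 5%, and 6 only at one-tailed 5%.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def arrangements(n1, n2, u):
    """Number of orderings of n1 sample-1 and n2 sample-2 values
    (no ties) for which U1 = u."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # Condition on the largest value: from sample 1 it beats all n2
    # sample-2 values (adds n2 to U1); from sample 2 it adds nothing.
    return arrangements(n1 - 1, n2, u - n2) + arrangements(n1, n2 - 1, u)


def u_critical(n1, n2, alpha_two_tailed):
    """Largest u with P(U <= u) <= alpha/2, or None if none exists."""
    total = comb(n1 + n2, n1)
    cumulative, critical = 0, None
    for u in range(n1 * n2 + 1):
        cumulative += arrangements(n1, n2, u)
        if cumulative / total > alpha_two_tailed / 2:
            break
        critical = u
    return critical


print(u_critical(5, 7, 0.05))   # 5  (two-tailed 5%)
print(u_critical(5, 7, 0.10))   # 6  (equivalent to one-tailed 5%)
print(u_critical(5, 6, 0.05))   # 3
```

U₁ and U₂ have the same null distribution, so the same critical value serves whichever of them is used as the test statistic.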
Justification:
• Two independent samples → rules out Wilcoxon signed-rank and sign test
• Skewed distribution with outlier → normality assumption of t-test is violated, especially with small n
• Mann-Whitney uses ranks, so it is robust to the outlier and skewness
Assumptions required:
• The two samples are independent random samples
• The observations are from continuous distributions
• The two distributions have the same shape, differing at most in location (median); under H₀ they are identical
Formula Sheet — Non-Parametric Tests
- Always show the full ranking table — values, differences, |d|, rank, signed rank. Cambridge awards method marks for correct ranking even if later arithmetic is wrong.
- Check sums explicitly: State W⁺ + W⁻ = n(n+1)/2 and U₁ + U₂ = n₁n₂. These checks catch errors before the decision step.
- Tied ranks: If any |dᵢ| values are equal, average their ranks. State "tied ranks" explicitly — Cambridge rewards this acknowledgement.
- Zero differences (Wilcoxon): State "discard d=0, reduce n to [new value]." This earns a mark and affects the critical value lookup.
- Decision direction: Reject H₀ if T ≤ critical value — the opposite direction to z and t tests. Students often err here.
- Justification questions: When asked why a non-parametric test is appropriate, mention: (1) small sample size, (2) normality not satisfied, (3) data are ordinal. When it is not appropriate: large n with normal data → parametric test is more powerful.