
Non-Parametric Tests

Distribution-free tests using ranks and signs — no normality assumption required. Wilcoxon signed-rank, Mann-Whitney, and the sign test.

Topics: Wilcoxon Signed-Rank · Mann-Whitney · Sign Test · Ranking Techniques · Cambridge 9231 · Lesson 4.4 of 5

What are Non-Parametric Tests?

Parametric tests (z-test, t-test) require assumptions about the population distribution — usually normality. Non-parametric tests make no such assumptions. They work on ranks or signs rather than the raw data values, making them robust to outliers and skewed distributions.

Parametric Tests
Assume normality of population
Use actual data values
More powerful when assumptions hold
t-test, z-test, F-test
Sensitive to outliers
Test mean μ
Non-Parametric Tests
No distributional assumptions
Use ranks or signs only
Less powerful but more robust
Wilcoxon, Mann-Whitney, Sign test
Robust to outliers
Test median m

When to Use Non-Parametric Tests

📌 Cambridge 9231 — When Non-Parametric is Required
  • The population is not normally distributed (or normality cannot be assumed).
  • The sample size is small and you cannot invoke the CLT.
  • Data are ordinal (ranked categories) rather than continuous measurements.
  • There are extreme outliers that would invalidate a t-test.
  • The question explicitly says "use a non-parametric test" or "do not assume normality."

The Three Tests Tested in 9231 P4

One-sample / Paired
Sign Test
Tests whether the population median equals a specified value m₀. Uses only the sign (+/−) of each difference from m₀. Simplest non-parametric test.
Parametric equivalent: one-sample t-test
One-sample / Paired (stronger)
Wilcoxon Signed-Rank
Tests whether the population median equals m₀. Uses both the sign and magnitude of differences. More powerful than the sign test — uses more information.
Parametric equivalent: one-sample or paired t-test
Two independent samples
Mann-Whitney (Wilcoxon Rank-Sum)
Tests whether two independent populations have the same median. Ranks all observations together and compares rank sums between groups.
Parametric equivalent: two-sample t-test

Ranking — The Common Foundation

Both the Wilcoxon and Mann-Whitney tests require ranking data values. The ranking rules are critical:

Ranking Rules
1. Order
Arrange the values in ascending order: by absolute difference |dᵢ| for Wilcoxon, or by actual value for Mann-Whitney.
2. Assign ranks
Assign rank 1 to the smallest, rank 2 to the next, and so on.
3. Tied ranks
If two or more values are equal (tied), give each the average of the ranks they would have occupied. For example, if the values at ranks 3 and 4 are tied, both get rank 3.5.
4. Zero differences
In the Wilcoxon test, discard any difference equal to 0 (an observation equal to m₀) and reduce n by 1 for each one discarded.
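
The tied-ranks rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the syllabus; the function name `average_ranks` and the sample values are assumptions chosen for the example.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the average rank."""
    # Indices of the values, sorted by the value they point at.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end (0-based) would occupy ranks pos+1..end+1.
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Illustrative values: the two 12s would occupy ranks 3 and 4, so both get 3.5.
print(average_ranks([5, 8, 12, 12, 15]))  # [1.0, 2.0, 3.5, 3.5, 5.0]
```

For the Wilcoxon test the same function would be applied to the absolute differences after any zeros have been discarded.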

The Sign Test

The sign test is the simplest non-parametric test. For each observation xᵢ, record whether xᵢ − m₀ is positive (+) or negative (−). Ties (xᵢ = m₀) are discarded.

Sign Test Procedure
1. Hypotheses
H₀: Population median m = m₀
H₁: m > m₀ (right-tailed)   or   m < m₀ (left-tailed)   or   m ≠ m₀ (two-tailed)
2. Count signs
Let n = number of non-tied observations.
Let S⁺ = number of positive signs (xᵢ > m₀).
Let S⁻ = number of negative signs (xᵢ < m₀).
3. Test statistic
Under H₀, S⁺ ~ B(n, 0.5).
Test statistic = S⁺ (right-tailed) or S⁻ (left-tailed) or min(S⁺, S⁻) (two-tailed)
4. p-value
Use Binomial(n, 0.5) tables (MF19) to find the p-value, with X ~ B(n, 0.5).
For right-tailed: p = P(X ≥ observed S⁺)
For left-tailed: p = P(X ≥ observed S⁻)
For two-tailed: p = 2 × P(X ≤ min(S⁺, S⁻)), capped at 1
Reject H₀ if p-value < α.
5. Conclusion
State, in context, whether there is sufficient evidence for H₁ (that the median exceeds, is below, or differs from m₀).
💡 Sign Test — Using Binomial Tables

Under H₀, each observation is equally likely to be above or below m₀, so S⁺ ~ B(n, ½). Cambridge provides binomial cumulative tables in MF19. For a right-tailed test at 5% with n=10, reject H₀ if P(S⁺ ≥ s) < 0.05, i.e. if s is large enough that the upper tail probability is below 0.05.

Example: Sign Test, One-Tailed
Ten patients' reaction times (ms) after a drug: 245, 198, 312, 287, 176, 263, 301, 225, 189, 271.
Test at 5% whether the median reaction time exceeds 220 ms.
H₀, H₁
H₀: m = 220    H₁: m > 220   (right-tailed)
Signs
Differences from 220: +25, −22, +92, +67, −44, +43, +81, +5, −31, +51
Signs: +, −, +, +, −, +, +, +, −, +
S⁺ = 7,   S⁻ = 3,   n = 10 (no ties)
p-value
Observed S⁺ = 7. Under H₀, S⁺ ~ B(10, 0.5).
P(S⁺ ≥ 7) = P(X≥7) where X~B(10,0.5)
= P(X=7)+P(X=8)+P(X=9)+P(X=10)
= 0.1172+0.0439+0.0098+0.0010 = 0.1719
Decision
0.1719 > 0.05 → Fail to reject H₀
Insufficient evidence at 5% that median reaction time exceeds 220 ms.
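
The binomial tail sum in this example can be checked with a short Python sketch. The function name `sign_test` and its `alternative` argument are illustrative choices, not Cambridge notation; the two-tailed branch doubles the smaller tail and caps the result at 1.

```python
from math import comb


def sign_test(data, m0, alternative="greater"):
    """Return (S+, S-, p-value) for a sign test of the median against m0."""
    diffs = [x - m0 for x in data if x != m0]  # discard ties with m0
    n = len(diffs)
    s_plus = sum(d > 0 for d in diffs)
    s_minus = n - s_plus

    def upper_tail(k):
        # P(X >= k) for X ~ B(n, 0.5)
        return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

    if alternative == "greater":
        p = upper_tail(s_plus)
    elif alternative == "less":
        p = upper_tail(s_minus)
    else:
        # Two-tailed: by symmetry P(X >= max) = P(X <= min); double it, cap at 1.
        p = min(1.0, 2 * upper_tail(max(s_plus, s_minus)))
    return s_plus, s_minus, p


times = [245, 198, 312, 287, 176, 263, 301, 225, 189, 271]
s_plus, s_minus, p = sign_test(times, 220, alternative="greater")
print(s_plus, s_minus, round(p, 4))  # 7 3 0.1719
```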
⛔ Weakness of the Sign Test

The sign test ignores the magnitude of the differences and uses only their direction: differences of +1 and +100 are treated identically. When the data are continuous and the differences can be ranked, the Wilcoxon signed-rank test is generally preferred, as it uses more information and is more powerful.

Wilcoxon Signed-Rank Test

The Wilcoxon signed-rank test uses both the sign and the rank of each difference from the hypothesised median. It is more powerful than the sign test.

Wilcoxon Signed-Rank Procedure
1. Hypotheses
H₀: Population median m = m₀   vs   H₁: m ≠ m₀ (or >, <)
2. Compute |dᵢ|
Calculate dᵢ = xᵢ − m₀ for each observation.
Discard any dᵢ = 0 and reduce n accordingly.
3. Rank |dᵢ|
Rank the absolute differences |dᵢ| from smallest (rank 1) to largest.
For tied |dᵢ|, assign average ranks.
4. W⁺ and W⁻
W⁺ = sum of ranks where dᵢ > 0
W⁻ = sum of ranks where dᵢ < 0
Check: W⁺ + W⁻ = n(n+1)/2
5. Test statistic T
T = min(W⁺, W⁻) for a two-tailed test.
For right-tailed (H₁: m > m₀): use T = W⁻
For left-tailed (H₁: m < m₀): use T = W⁺
6. Critical value
Look up the critical value w_α in the Wilcoxon tables (MF19, Table 9) at sample size n and significance level α.
Reject H₀ if T ≤ w_α (note: ≤, not ≥; a smaller T means more evidence against H₀).

Decision Rule: Wilcoxon Signed-Rank
Reject H₀ if T ≤ critical value w_α from tables.
Large T → little evidence against H₀ (W⁺ and W⁻ roughly balanced)
Small T → strong evidence against H₀ (one side dominates)
This is opposite to most tests. The smaller the test statistic, the more evidence to reject H₀.

Selected Critical Values — Wilcoxon Signed-Rank (Two-Tailed)

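n     α = 0.10   α = 0.05   α = 0.02   α = 0.01
5     0          -          -          -
6     2          0          -          -
7     3          2          0          -
8     5          3          1          0
9     8          5          3          1
10    10         8          5          3
12    17         13         9          7
15    30         25         19         15
20    60         52         43         37
A dash means no value of T is small enough to reject H₀ at that level. For a one-tailed test at level α, use the column headed 2α (for example, one-tailed 5% uses the α = 0.10 column).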
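
Critical values of this kind come from the exact null distribution of the signed-rank sum: under H₀ each of the 2ⁿ ways of attaching signs to the ranks 1..n is equally likely. The sketch below counts those assignments and returns the largest w with P(T ≤ w) ≤ α in one tail. It assumes no ties and no zero differences, and the function names are illustrative only.

```python
def signed_rank_counts(n):
    """counts[t] = number of sign assignments of ranks 1..n whose positive ranks sum to t."""
    total = n * (n + 1) // 2
    counts = [1] + [0] * total
    for r in range(1, n + 1):
        # Going downwards ensures each rank r is used at most once.
        for t in range(total, r - 1, -1):
            counts[t] += counts[t - r]
    return counts


def critical_value(n, one_tail_alpha):
    """Largest w with P(T <= w) <= one_tail_alpha, or None if no such w exists."""
    counts = signed_rank_counts(n)
    cumulative, crit = 0, None
    for t, c in enumerate(counts):
        cumulative += c
        if cumulative / 2 ** n > one_tail_alpha:
            break
        crit = t
    return crit


# A two-tailed test at level alpha uses alpha / 2 in each tail.
for n in (6, 7, 8, 9, 10):
    print(n, critical_value(n, 0.05), critical_value(n, 0.025))
```

In an exam the tables in MF19 are used directly; the enumeration is only a way of seeing where the tabulated numbers come from.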
Example: Wilcoxon Signed-Rank, Full Worked Example
Eight students' exam scores: 58, 72, 45, 81, 63, 54, 77, 69.
Test at 5% (two-tailed) whether the median score is 65.
H₀, H₁
H₀: m = 65    H₁: m ≠ 65   (two-tailed, α = 0.05)
Differences and ranks
xᵢ    dᵢ = xᵢ − 65   |dᵢ|   Rank   Signed rank
63    −2             2      1      −1
69    +4             4      2      +2
58    −7             7      3.5    −3.5
72    +7             7      3.5    +3.5
54    −11            11     5      −5
77    +12            12     6      +6
81    +16            16     7      +7
45    −20            20     8      −8
Note: |dᵢ| = 7 appears twice (xᵢ = 58 and xᵢ = 72). These two values would occupy ranks 3 and 4, so both get the average rank (3 + 4)/2 = 3.5.
W⁺, W⁻, T
W⁺ = 2 + 3.5 + 6 + 7 = 18.5
W⁻ = 1 + 3.5 + 5 + 8 = 17.5
Check: 18.5 + 17.5 = 36 = 8×9/2 ✓
T = min(18.5, 17.5) = 17.5
Critical value
n=8, two-tailed α=0.05: w_{0.05} = 3 (from MF19 tables)
Decision
T = 17.5 > 3 → Fail to reject H₀
No significant evidence at 5% that the median score differs from 65.
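
The rank sums in this example can be reproduced with a short sketch. The helper name `wilcoxon_signed_rank` is an assumption made for illustration; it discards zero differences, gives tied |d| values their average rank, and returns W⁺, W⁻ and the reduced n.

```python
def wilcoxon_signed_rank(data, m0):
    """Return (W+, W-, n) for differences from m0, using average ranks for ties."""
    diffs = [x - m0 for x in data if x != m0]  # discard zero differences
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(v):
        first = abs_sorted.index(v) + 1          # lowest rank the value occupies
        last = first + abs_sorted.count(v) - 1   # highest rank the value occupies
        return (first + last) / 2

    w_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    return w_plus, w_minus, len(diffs)


scores = [58, 72, 45, 81, 63, 54, 77, 69]
w_plus, w_minus, n = wilcoxon_signed_rank(scores, 65)
print(w_plus, w_minus, n)                   # 18.5 17.5 8
print(w_plus + w_minus == n * (n + 1) / 2)  # True
```

For paired data, the same function can be applied to the list of differences with m0 = 0.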

Mann-Whitney U-Test

The Mann-Whitney test compares two independent samples. It tests whether the two populations have the same distribution; assuming the two distributions have the same shape, this amounts to testing whether they have the same median. It is also called the Wilcoxon rank-sum test.

Mann-Whitney Procedure
1. Hypotheses
H₀: The two populations have equal medians (m₁ = m₂)
H₁: m₁ ≠ m₂   or   m₁ > m₂   or   m₁ < m₂
2. Combine and rank
Pool both samples together (n₁ + n₂ values).
Rank all values from smallest (rank 1) to largest, applying average ranks for ties.
Keep track of which sample each rank belongs to.
3. Rank sums
W₁ = sum of ranks for sample 1
W₂ = sum of ranks for sample 2
Check: W₁ + W₂ = N(N+1)/2 where N = n₁ + n₂
4. U statistics
U₁ = W₁ − n₁(n₁+1)/2
U₂ = W₂ − n₂(n₂+1)/2
Check: U₁ + U₂ = n₁ × n₂
5. Test statistic
U = min(U₁, U₂) for a two-tailed test.
For a one-tailed test, use the statistic that H₁ predicts will be small: U = U₂ if H₁ is m₁ > m₂ (sample 1 then tends to take the high ranks), or U = U₁ if H₁ is m₁ < m₂.
6. Decision
Look up critical value u_α from Mann-Whitney tables (MF19, Table 10) at n₁, n₂ and α.
Reject H₀ if U ≤ u_α (same direction as Wilcoxon: a smaller U means more evidence against H₀).

Decision Rule: Mann-Whitney
Reject H₀ if U = min(U₁, U₂) ≤ critical value u_α.
U₁ + U₂ = n₁ × n₂ (always; use to check arithmetic)
W₁ + W₂ = N(N+1)/2 where N = n₁ + n₂ (use to check ranking)
Cambridge provides Mann-Whitney tables for small samples. For large samples (n₁, n₂ > 20), a normal approximation is used — Cambridge will specify this.
Example: Mann-Whitney, Two Independent Samples
Sample A (n₁=5): 12, 18, 15, 22, 9
Sample B (n₂=6): 25, 14, 20, 17, 28, 11
Test at 5% (two-tailed) whether the populations have equal medians.
H₀, H₁
H₀: m_A = m_B    H₁: m_A ≠ m_B (two-tailed, α = 0.05)
Combined ranking (N=11)
Value   Sample   Rank
9       A        1
11      B        2
12      A        3
14      B        4
15      A        5
17      B        6
18      A        7
20      B        8
22      A        9
25      B        10
28      B        11
Rank sums
W₁ (Sample A) = 1+3+5+7+9 = 25
W₂ (Sample B) = 2+4+6+8+10+11 = 41
Check: 25+41 = 66 = 11×12/2 ✓
U statistics
U₁ = 25 − 5×6/2 = 25 − 15 = 10
U₂ = 41 − 6×7/2 = 41 − 21 = 20
Check: U₁+U₂ = 30 = 5×6 ✓
U = min(10,20) = 10
Critical value
n₁=5, n₂=6, two-tailed 5%: u_{0.05} = 3 (from MF19)
Decision
U = 10 > 3 → Fail to reject H₀
No significant evidence at 5% that the populations have different medians.
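
The arithmetic above can be checked with a short sketch. The function names are illustrative. The second function uses a standard alternative view of the statistic, offered here as a cross-check rather than as exam method: U₁ equals the number of (sample 1, sample 2) pairs in which the sample 1 value is larger, with ties counting one half.

```python
def mann_whitney_u(sample1, sample2):
    """Return (W1, W2, U1, U2) from pooled ranks, using average ranks for ties."""
    pooled = sorted(sample1 + sample2)

    def avg_rank(v):
        first = pooled.index(v) + 1
        last = first + pooled.count(v) - 1
        return (first + last) / 2

    n1, n2 = len(sample1), len(sample2)
    w1 = sum(avg_rank(v) for v in sample1)
    w2 = sum(avg_rank(v) for v in sample2)
    u1 = w1 - n1 * (n1 + 1) / 2
    u2 = w2 - n2 * (n2 + 1) / 2
    return w1, w2, u1, u2


def count_pairs(sample1, sample2):
    """Number of pairs (x, y) with x > y, counting ties as one half."""
    return sum((x > y) + 0.5 * (x == y) for x in sample1 for y in sample2)


a = [12, 18, 15, 22, 9]
b = [25, 14, 20, 17, 28, 11]
w1, w2, u1, u2 = mann_whitney_u(a, b)
print(w1, w2, u1, u2)     # 25.0 41.0 10.0 20.0
print(count_pairs(a, b))  # 10.0, equal to U1
```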

Normal Approximation for Large Samples

When both n₁ and n₂ are large (Cambridge will specify), the Mann-Whitney U statistic follows an approximate normal distribution:

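Large-sample approximation
E(U) = n₁n₂/2    Var(U) = n₁n₂(n₁+n₂+1)/12
z = (U − E(U)) / √Var(U) ~ N(0,1) approximately
Compare z with the standard normal critical value for the chosen tail (one- or two-tailed). If a continuity correction is required, move U by 0.5 towards E(U): use U + 0.5 when U < E(U) and U − 0.5 when U > E(U).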
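
As a rough numerical illustration of the approximation, the sketch below computes z and a two-tailed p-value. The sample sizes and the value U = 200 are hypothetical, chosen only to show the arithmetic, and the function names are illustrative. The normal CDF is built from `math.erf` so no external library is needed.

```python
from math import erf, sqrt


def mann_whitney_z(u, n1, n2, continuity=True):
    """Normal approximation to U, optionally moving U by 0.5 towards its mean."""
    mean = n1 * n2 / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    if continuity and u != mean:
        u = u + 0.5 if u < mean else u - 0.5
    return (u - mean) / sqrt(var)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Hypothetical data: n1 = n2 = 25 and an observed U of 200.
z = mann_whitney_z(200, 25, 25)
p_two_tailed = 2 * normal_cdf(-abs(z))
print(round(z, 3), round(p_two_tailed, 4))
```

Here E(U) = 312.5 and Var(U) = 2656.25, so the corrected statistic is (200.5 − 312.5)/√2656.25.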

Worked Examples

Example 1: Paired Wilcoxon, Before and After
Blood pressure (mmHg) of 7 patients before and after treatment:
Before: 148, 152, 145, 160, 155, 143, 158
After: 142, 150, 148, 151, 152, 140, 155
Use Wilcoxon signed-rank at 5% (one-tailed) to test whether treatment reduces blood pressure. Do not assume normality.
Differences d = Before − After
6, 2, −3, 9, 3, 3, 3
Ranking |d|
|d|: 6, 2, 3, 9, 3, 3, 3
No zeros. In order: 2, 3, 3, 3, 3, 6, 9.
The four 3s occupy positions 2, 3, 4 and 5, so each gets the average rank (2+3+4+5)/4 = 3.5.
Ranks: d=+2→1, d=−3→3.5, d=+3→3.5 (three times), d=+6→6, d=+9→7
Table
Before   After   d     |d|   Rank   Signed rank
152      150     +2    2     1      +1
145      148     −3    3     3.5    −3.5
155      152     +3    3     3.5    +3.5
143      140     +3    3     3.5    +3.5
158      155     +3    3     3.5    +3.5
148      142     +6    6     6      +6
160      151     +9    9     7      +7
W⁺, W⁻, T
W⁺ = 1+3.5+3.5+3.5+6+7 = 24.5
W⁻ = 3.5
Check: 24.5+3.5 = 28 = 7×8/2 ✓
H₁: treatment reduces BP (before > after → positive d → W⁺ large, W⁻ small)
For right-tailed H₁ (m_d > 0): T = W⁻ = 3.5
Decision
n=7, one-tailed 5%: w_{0.05} = 3 (from MF19 tables)
T = 3.5 > 3 → Fail to reject H₀
Insufficient evidence at 5% that the treatment reduces blood pressure. (The result is borderline: T only just exceeds the critical value.)
Example 2: Choosing the Right Test
For each scenario, state which test to use and why:
(a) n=6 observations, test whether median = 50. Normality not assumed.
(b) Two independent samples of size 8 and 10. No normality assumption.
(c) Paired data, n=12. The differences are approximately normal.
(d) n=8 observations, median test. Only direction of differences known (not magnitude).
(a)
Wilcoxon signed-rank test — one-sample, no normality, continuous data so magnitudes are meaningful. (Sign test also valid but less powerful.)
(b)
Mann-Whitney U-test — two independent samples, no normality assumption.
(c)
Paired t-test — differences are approximately normal, so the parametric test is preferred (more powerful).
(d)
Sign test — only direction of difference is known, not magnitude. Cannot rank, so Wilcoxon is not appropriate.

Practice Questions

Question 1 — Sign Test
[5 marks]
A dietitian claims the median daily calorie intake of students is 2000 kcal. A sample of 12 students has intakes: 1850, 2100, 1980, 2250, 1760, 2050, 1920, 2180, 1890, 2300, 2020, 1970.
Use a sign test at 10% to test whether the median differs from 2000 (two-tailed).
Count values above 2000 (S⁺) and below 2000 (S⁻). Discard any equal to 2000. Under H₀, S⁺ ~ B(n, 0.5). For two-tailed 10%, compare 2×P(X ≤ min(S⁺,S⁻)), where X ~ B(n, 0.5), with 0.10.
✓ Solution
Signs (relative to 2000):
1850(−), 2100(+), 1980(−), 2250(+), 1760(−), 2050(+), 1920(−), 2180(+), 1890(−), 2300(+), 2020(+), 1970(−)
S⁺ = 6, S⁻ = 6, n = 12 (no ties)

H₀: m=2000, H₁: m≠2000 (two-tailed, α=0.10)
Test statistic: min(S⁺, S⁻) = min(6, 6) = 6. Under H₀, X ~ B(12, 0.5).
P(X ≤ 6) = 0.6128, which is not a small tail probability.
Doubling gives 2 × 0.6128 = 1.2256, which exceeds 1 because P(X = 6) is counted in both tails, so the two-tailed p-value is taken as 1.
Since 1 > 0.10, fail to reject H₀.
No evidence at 10% that the median differs from 2000 kcal. (The signs are split evenly, the most balanced outcome possible.)
Question 2 — Wilcoxon Signed-Rank (One-tailed)
[8 marks]
A manufacturer claims components last a median of 500 hours. A sample of 9 components lasts: 482, 515, 498, 543, 467, 521, 488, 556, 503. Test at 5% (one-tailed) whether the median lifetime exceeds 500 hours.
Compute d = x−500 for each. Discard any d=0. Rank |d|, then split the ranks into W⁺ (positive d) and W⁻ (negative d). For H₁: m>500, use T = W⁻. Check for zeros carefully: x=503 gives d=+3, not 0, so nothing is discarded and n remains 9. Reject H₀ if T ≤ critical value at n=9.
✓ Solution
d = x−500: −18, +15, −2, +43, −33, +21, −12, +56, +3
No zeros. n=9.

|d| ordered: 2(1), 3(2), 12(3), 15(4), 18(5), 21(6), 33(7), 43(8), 56(9)

Signs: −18→rank5(−), +15→rank4(+), −2→rank1(−), +43→rank8(+), −33→rank7(−), +21→rank6(+), −12→rank3(−), +56→rank9(+), +3→rank2(+)

W⁺ = 4+8+6+9+2 = 29
W⁻ = 5+1+7+3 = 16
Check: 29+16 = 45 = 9×10/2 ✓

H₁: m>500 (right-tailed) → T = W⁻ = 16
n=9, one-tailed 5%: w_{0.05} = 8 (from table)
T=16 > 8 → Fail to reject H₀
Insufficient evidence at 5% that median lifetime exceeds 500 hours.
Question 3 — Mann-Whitney
[9 marks]
Group X (n₁=5): 34, 41, 28, 37, 45
Group Y (n₂=7): 52, 29, 44, 38, 61, 33, 47
Test at 5% (two-tailed) whether the populations have equal medians.
Combine and rank all 12 values. Compute W_X and W_Y. Then U_X = W_X − n₁(n₁+1)/2 and U_Y = W_Y − n₂(n₂+1)/2. U = min(U_X, U_Y). Compare with critical value for n₁=5, n₂=7 at 5%.
✓ Solution
Combined sort: 28(X,1), 29(Y,2), 33(Y,3), 34(X,4), 37(X,5), 38(Y,6), 41(X,7), 44(Y,8), 45(X,9), 47(Y,10), 52(Y,11), 61(Y,12)

W_X = 1+4+5+7+9 = 26
W_Y = 2+3+6+8+10+11+12 = 52
Check: 26+52 = 78 = 12×13/2 ✓

U_X = 26 − 5×6/2 = 26−15 = 11
U_Y = 52 − 7×8/2 = 52−28 = 24
Check: 11+24 = 35 = 5×7 ✓

U = min(11,24) = 11
n₁=5, n₂=7, two-tailed 5%: u_{0.05} = 5 (from MF19)
11 > 5 → Fail to reject H₀
No significant evidence at 5% that the population medians differ.
Question 4 — Choosing and Justifying the Test
[4 marks]
A psychologist wishes to compare memory scores for two groups of 7 and 9 participants respectively. The scores in group 1 are highly skewed due to one extreme value. State which test should be used and justify your answer. State clearly what assumptions the test requires.
Two independent samples → Mann-Whitney or two-sample t-test. But data is skewed/has outlier → normality is violated → non-parametric is preferred.
✓ Solution
Use the Mann-Whitney U-test (Wilcoxon rank-sum test).

Justification:
• Two independent samples → rules out Wilcoxon signed-rank and sign test
• Skewed distribution with outlier → normality assumption of t-test is violated, especially with small n
• Mann-Whitney uses ranks, so it is robust to the outlier and skewness

Assumptions required:
• The two samples are independent random samples
• The observations are from continuous distributions
• The two distributions have the same shape and spread, so that any difference between them is a shift in location (median)

Formula Sheet — Non-Parametric Tests

Sign Test
H₀: m = m₀ (median test)
S⁺ ~ B(n, 0.5) under H₀
Discard ties (x = m₀) and reduce n
T = min(S⁺, S⁻) (two-tailed)
Use binomial tables (MF19)

Wilcoxon Signed-Rank
Rank |dᵢ| = |xᵢ − m₀|
W⁺ = Σ(positive ranks)
W⁻ = Σ(negative ranks)
Check: W⁺ + W⁻ = n(n+1)/2
T = min(W⁺, W⁻) (two-tailed)
Reject if T ≤ w_α (from tables)

Mann-Whitney
Rank all N = n₁ + n₂ values
Check: W₁ + W₂ = N(N+1)/2
U₁ = W₁ − n₁(n₁+1)/2
U₂ = W₂ − n₂(n₂+1)/2
Check: U₁ + U₂ = n₁n₂
Reject if U ≤ u_α (from tables)

Tied Ranks Rule
Tied values: average of their ranks
e.g. 3 values tied at ranks 4, 5, 6: all get rank 5
Zero differences: discard and reduce n

Mann-Whitney (Large n)
E(U) = n₁n₂/2
Var(U) = n₁n₂(N+1)/12
z = (U − E(U))/√Var(U) ~ N(0,1)

Which Test?
1-sample, direction only: Sign test
1-sample, magnitude known: Wilcoxon signed-rank
2 independent samples: Mann-Whitney
Paired data, no normality: Wilcoxon signed-rank on the differences
📋 Cambridge Exam Strategy — Non-Parametric Tests
  • Always show the full ranking table — values, differences, |d|, rank, signed rank. Cambridge awards method marks for correct ranking even if later arithmetic is wrong.
  • Check sums explicitly: State W⁺ + W⁻ = n(n+1)/2 and U₁ + U₂ = n₁n₂. These checks catch errors before the decision step.
  • Tied ranks: If any |dᵢ| values are equal, average their ranks. State "tied ranks" explicitly — Cambridge rewards this acknowledgement.
  • Zero differences (Wilcoxon): State "discard d=0, reduce n to [new value]." This earns a mark and affects the critical value lookup.
  • Decision direction: Reject H₀ if T ≤ critical value — the opposite direction to z and t tests. Students often err here.
  • Justification questions: When asked why a non-parametric test is appropriate, mention: (1) small sample size, (2) normality not satisfied, (3) data is ordinal. When it is not appropriate: large n with normal data → parametric test is more powerful.