Non-Parametric Tests | 9231 Further Statistics P4

What are Non-Parametric Tests?

Parametric tests (z-test, t-test) require assumptions about the population distribution, usually normality. Non-parametric tests do not assume a particular distributional form, although some (such as the Wilcoxon signed-rank test) still assume the distribution is symmetric about its median. They work on ranks or signs rather than the raw data values, making them robust to outliers and skewed distributions.

Parametric Tests

Assume normality of population

Use actual data values

More powerful when assumptions hold

t-test, z-test, F-test

Sensitive to outliers

Test mean μ

Non-Parametric Tests

No distributional assumptions

Use ranks or signs only

Less powerful but more robust

Wilcoxon, Mann-Whitney, Sign test

Robust to outliers

Test median m

When to Use Non-Parametric Tests

📌 Cambridge 9231 — When Non-Parametric is Required

The population is not normally distributed (or normality cannot be assumed).
The sample size is small and you cannot invoke the CLT.
Data are ordinal (ranked categories) rather than continuous measurements.
There are extreme outliers that would invalidate a t-test.
The question explicitly says "use a non-parametric test" or "do not assume normality."

The Three Tests Tested in 9231 P4

One-sample / Paired

Sign Test

Tests whether the population median equals a specified value m₀. Uses only the sign (+/−) of each difference from m₀. Simplest non-parametric test.

Parametric equivalent: one-sample t-test

One-sample / Paired (stronger)

Wilcoxon Signed-Rank

Tests whether the population median equals m₀. Uses both the sign and magnitude of differences. More powerful than the sign test — uses more information.

Parametric equivalent: one-sample or paired t-test

Two independent samples

Mann-Whitney (Wilcoxon Rank-Sum)

Tests whether two independent populations have the same median. Ranks all observations together and compares rank sums between groups.

Parametric equivalent: two-sample t-test

Ranking — The Common Foundation

Both the Wilcoxon and Mann-Whitney tests require ranking data values. The ranking rules are critical:

Ranking Rules

1. Order

Arrange all values in ascending order of absolute value (for Wilcoxon) or actual value (for Mann-Whitney).

2. Assign ranks

Assign rank 1 to the smallest, rank 2 to the next, and so on.

3. Tied ranks

If two or more values are equal (tied), give each the average of the ranks they would have occupied. For example, if values at ranks 3 and 4 are tied, both get rank 3.5.

4. Zero differences

In the Wilcoxon test, discard any zero difference (an observation equal to m₀) and reduce n by 1 for each one discarded.
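The tied-ranks rule is the step most often done wrong by hand, so a minimal Python sketch may help. The function name `average_ranks` is an illustrative choice, not part of any syllabus or library.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Positions i+1 .. j+1 are shared, so each gets their average.
        mean_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

For the Wilcoxon signed-rank test, pass the absolute differences (after discarding zeros); for Mann-Whitney, pass the pooled raw values.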

The Sign Test

The sign test is the simplest non-parametric test. For each observation xᵢ, record whether xᵢ − m₀ is positive (+) or negative (−). Ties (xᵢ = m₀) are discarded.

Sign Test Procedure

1. Hypotheses

H₀: Population median m = m₀
H₁: m > m₀ (right-tailed) or m < m₀ (left-tailed) or m ≠ m₀ (two-tailed)

2. Count signs

Let n = number of non-tied observations.
Let S⁺ = number of positive signs (xᵢ > m₀).
Let S⁻ = number of negative signs (xᵢ < m₀).

3. Test statistic

Under H₀, S⁺ ~ B(n, 0.5).
Test statistic = S⁺ (right-tailed) or S⁻ (left-tailed) or min(S⁺, S⁻) (two-tailed)

4. p-value

Use Binomial(n, 0.5) tables (MF19) to find the p-value, with X ~ B(n, 0.5).
Right-tailed: p = P(X ≥ observed S⁺)
Left-tailed: p = P(X ≥ observed S⁻)
Two-tailed: p = 2 × P(X ≤ min(S⁺, S⁻)), capped at 1
Reject H₀ if p-value < α.

5. Conclusion

State, in context, whether there is sufficient evidence that the median differs from m₀ in the direction given by H₁.

💡 Sign Test — Using Binomial Tables

Under H₀, each observation is equally likely to be above or below m₀, so S⁺ ~ B(n, ½). Cambridge provides binomial cumulative tables in MF19. For a right-tailed test at 5% with n=10, reject H₀ if P(S⁺ ≥ s) < 0.05, i.e. if s is large enough that the upper tail probability is below 0.05.

Example: Sign Test, One-Tailed

Ten patients' reaction times (ms) after a drug: 245, 198, 312, 287, 176, 263, 301, 225, 189, 271.
Test at 5% whether the median reaction time exceeds 220 ms.

H₀, H₁

H₀: m = 220 H₁: m > 220 (right-tailed)

Signs

Differences from 220: +25, −22, +92, +67, −44, +43, +81, +5, −31, +51
Signs: +, −, +, +, −, +, +, +, −, +
S⁺ = 7, S⁻ = 3, n = 10 (no ties)


p-value

Observed S⁺ = 7. Under H₀, S⁺ ~ B(10, 0.5).
P(S⁺ ≥ 7) = P(X ≥ 7) where X ~ B(10, 0.5)
= P(X=7)+P(X=8)+P(X=9)+P(X=10)
= 0.1172+0.0439+0.0098+0.0010 = 0.1719

Decision

0.1719 > 0.05 → Fail to reject H₀

Insufficient evidence at 5% that median reaction time exceeds 220 ms.
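The binomial tail above can be checked with a short Python sketch. The helper name `sign_test_upper_p` is hypothetical; it uses only the standard library and handles the right-tailed case only.

```python
from math import comb


def sign_test_upper_p(data, m0):
    """Right-tailed sign test: return (S+, n, P(X >= S+)) with X ~ B(n, 0.5)."""
    # Observations equal to m0 are ties and are discarded.
    diffs = [x - m0 for x in data if x != m0]
    n = len(diffs)
    s_plus = sum(1 for d in diffs if d > 0)
    # Upper tail of B(n, 0.5): sum of C(n, k) / 2^n for k = S+ .. n.
    p_value = sum(comb(n, k) for k in range(s_plus, n + 1)) / 2 ** n
    return s_plus, n, p_value


times = [245, 198, 312, 287, 176, 263, 301, 225, 189, 271]
s_plus, n, p_value = sign_test_upper_p(times, 220)
print(s_plus, n, round(p_value, 4))  # 7 10 0.1719
```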

⛔ Weakness of the Sign Test

The sign test ignores the magnitude of the differences and uses only their direction, so differences of +1 and +100 are treated identically. When the data are continuous and the distribution can be taken as symmetric about its median, the Wilcoxon signed-rank test is generally preferred, as it uses more information and is more powerful.

Wilcoxon Signed-Rank Test

The Wilcoxon signed-rank test uses both the sign and the rank of each difference from the hypothesised median. It is more powerful than the sign test.

Wilcoxon Signed-Rank Procedure

1. Hypotheses

H₀: Population median m = m₀ vs H₁: m ≠ m₀ (or >, <)

2. Compute |dᵢ|

Calculate dᵢ = xᵢ − m₀ for each observation.
Discard any dᵢ = 0 and reduce n accordingly.

3. Rank |dᵢ|

Rank the absolute differences |dᵢ| from smallest (rank 1) to largest.
For tied |dᵢ|, assign average ranks.

4. W⁺ and W⁻

W⁺ = sum of ranks where dᵢ > 0
W⁻ = sum of ranks where dᵢ < 0
Check: W⁺ + W⁻ = n(n+1)/2

5. Test statistic T

T = min(W⁺, W⁻) for a two-tailed test.
For right-tailed (H₁: m > m₀): use T = W⁻
For left-tailed (H₁: m < m₀): use T = W⁺

6. Critical value

Look up the critical value w_α in the Wilcoxon tables (MF19, Table 9) at sample size n and significance level α.
Reject H₀ if T ≤ w_α (note: ≤, not ≥, because a smaller T means more evidence against H₀).

Decision Rule: Wilcoxon Signed-Rank
Reject H₀ if T ≤ critical value w_α from tables
Large T → little evidence against H₀ (W⁺ ≈ W⁻, balanced)
Small T → strong evidence against H₀ (one side dominates)

This is opposite to most tests. The smaller the test statistic, the more evidence to reject H₀.

Selected Critical Values — Wilcoxon Signed-Rank (Two-Tailed)

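n	α = 0.10	α = 0.05	α = 0.02	α = 0.01
5	0	—	—	—
6	2	0	—	—
7	3	2	0	—
8	5	3	1	0
9	8	5	3	1
10	10	8	5	3
12	17	13	9	7
15	30	25	19	15
20	60	52	43	37

For a one-tailed test at level α, use the two-tailed column for 2α (for example, one-tailed 5% uses the α = 0.10 column). A dash means no value of T leads to rejection at that level.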
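Critical values of this kind come from the exact null distribution of T: under H₀ each rank 1..n is equally likely to carry a + or − sign, so all 2ⁿ sign patterns are equally likely. The sketch below counts patterns by rank sum and returns the largest t with P(T ≤ t) ≤ α for one tail. The function name is an illustrative assumption, and the calculation assumes no tied ranks.

```python
def signed_rank_critical_value(n, one_tail_alpha):
    """Largest t with P(T <= t) <= one_tail_alpha under H0, or None if none exists."""
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks {1..n} whose sum is s.
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    total = 2 ** n
    cumulative = 0
    critical = None
    for t in range(max_sum + 1):
        cumulative += counts[t]
        if cumulative <= one_tail_alpha * total:
            critical = t
        else:
            break
    return critical


# One-tailed levels 0.05, 0.025, 0.01, 0.005 match two-tailed 0.10, 0.05, 0.02, 0.01.
for n in (6, 7, 8, 9, 10):
    row = [signed_rank_critical_value(n, a) for a in (0.05, 0.025, 0.01, 0.005)]
    print(n, row)
# 6 [2, 0, None, None]
# 7 [3, 2, 0, None]
# 8 [5, 3, 1, 0]
# 9 [8, 5, 3, 1]
# 10 [10, 8, 5, 3]
```

In an exam the values are read from the tables provided; the enumeration is only a way to see where they come from.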

Example: Wilcoxon Signed-Rank, Full Worked Example

Eight students' exam scores: 58, 72, 45, 81, 63, 54, 77, 69.
Test at 5% (two-tailed) whether the median score is 65.

H₀, H₁

H₀: m = 65 H₁: m ≠ 65 (two-tailed, α = 0.05)

Differences and ranks

|dᵢ| = 7 appears twice (xᵢ = 58 and xᵢ = 72). These values occupy positions 3 and 4 in the ordering, so both get the average rank (3+4)/2 = 3.5.
Ordered |dᵢ| with ranks: 2(1), 4(2), 7(3.5), 7(3.5), 11(5), 12(6), 16(7), 20(8)

xᵢ	dᵢ = xᵢ−65	|dᵢ|	Rank	Signed Rank
63	−2	2	1	−1
69	+4	4	2	+2
58	−7	7	3.5	−3.5 (tied)
72	+7	7	3.5	+3.5 (tied)
54	−11	11	5	−5
77	+12	12	6	+6
81	+16	16	7	+7
45	−20	20	8	−8

W⁺, W⁻, T

W⁺ = 2 + 3.5 + 6 + 7 = 18.5
W⁻ = 1 + 3.5 + 5 + 8 = 17.5
Check: 18.5 + 17.5 = 36 = 8×9/2 ✓
T = min(18.5, 17.5) = 17.5

Critical value

n=8, two-tailed α=0.05: w_{0.05} = 3 (from tables)

Decision

T = 17.5 > 3 → Fail to reject H₀

No significant evidence at 5% that the median score differs from 65.
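As a check on the arithmetic, here is a minimal Python sketch that reproduces W⁺, W⁻ and T for this example, including the tied pair. `signed_rank_sums` is a hypothetical helper name.

```python
def signed_rank_sums(data, m0):
    """Return (W+, W-, n) for a one-sample Wilcoxon signed-rank test about m0."""
    # Zero differences are discarded, which also reduces n.
    diffs = [x - m0 for x in data if x != m0]
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank_of(a):
        # First position held by this value, plus half the extra tied positions.
        first = abs_sorted.index(a) + 1
        return first + (abs_sorted.count(a) - 1) / 2

    w_plus = sum(rank_of(abs(d)) for d in diffs if d > 0)
    w_minus = sum(rank_of(abs(d)) for d in diffs if d < 0)
    return w_plus, w_minus, len(diffs)


scores = [58, 72, 45, 81, 63, 54, 77, 69]
w_plus, w_minus, n = signed_rank_sums(scores, 65)
print(w_plus, w_minus, min(w_plus, w_minus))  # 18.5 17.5 17.5
```

The same function handles paired data if the list of differences is passed with `m0 = 0`.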

Mann-Whitney U-Test

The Mann-Whitney test compares two independent samples. Assuming the two population distributions have the same shape, it tests whether they have the same median (under H₀ the two distributions are then identical). It is also called the Wilcoxon Rank-Sum test.

Mann-Whitney Procedure

1. Hypotheses

H₀: The two populations have equal medians (m₁ = m₂)
H₁: m₁ ≠ m₂ or m₁ > m₂ or m₁ < m₂

2. Combine and rank

Pool both samples together (n₁ + n₂ values).
Rank all values from smallest (rank 1) to largest, applying average ranks for ties.
Keep track of which sample each rank belongs to.

3. Rank sums

W₁ = sum of ranks for sample 1
W₂ = sum of ranks for sample 2
Check: W₁ + W₂ = N(N+1)/2 where N = n₁ + n₂

4. U statistics

U₁ = W₁ − n₁(n₁+1)/2
U₂ = W₂ − n₂(n₂+1)/2
Check: U₁ + U₂ = n₁ × n₂

5. Test statistic

U = min(U₁, U₂) for a two-tailed test.
For one-tailed: use U = U₂ if H₁ is m₁ > m₂ (sample 1 should then hold the higher ranks, making U₂ small), or U = U₁ if H₁ is m₁ < m₂.

6. Decision

Look up critical value u_α from Mann-Whitney tables (MF19, Table 10) at n₁, n₂ and α.
Reject H₀ if U ≤ u_α (same direction as Wilcoxon: smaller means more evidence against H₀).

Mann-Whitney: Decision Rule
Reject H₀ if U = min(U₁, U₂) ≤ critical value u_α
U₁ + U₂ = n₁ × n₂ (always; use to check arithmetic)
W₁ + W₂ = N(N+1)/2 where N = n₁ + n₂ (use to check ranking)

Cambridge provides Mann-Whitney tables for small samples. For large samples (n₁, n₂ > 20), a normal approximation is used — Cambridge will specify this.

Example: Mann-Whitney, Two Independent Samples

Sample A (n₁=5): 12, 18, 15, 22, 9
Sample B (n₂=6): 25, 14, 20, 17, 28, 11
Test at 5% (two-tailed) whether the populations have equal medians.

H₀, H₁

H₀: m_A = m_B H₁: m_A ≠ m_B (two-tailed, α = 0.05)

Combined ranking (N=11)

Value	Sample	Rank
9	A	1
11	B	2
12	A	3
14	B	4
15	A	5
17	B	6
18	A	7
20	B	8
22	A	9
25	B	10
28	B	11

Rank sums

W₁ (Sample A) = 1+3+5+7+9 = 25
W₂ (Sample B) = 2+4+6+8+10+11 = 41
Check: 25+41 = 66 = 11×12/2 ✓

U statistics

U₁ = 25 − 5×6/2 = 25 − 15 = 10
U₂ = 41 − 6×7/2 = 41 − 21 = 20
Check: U₁+U₂ = 30 = 5×6 ✓
U = min(10,20) = 10

Critical value

n₁=5, n₂=6, two-tailed 5%: u_{0.05} = 3 (from tables)

Decision

U = 10 > 3 → Fail to reject H₀

No significant evidence at 5% that the populations have different medians.
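The rank sums and U statistics for this example can be reproduced with the following Python sketch. The function name `mann_whitney_u` is an illustrative choice, and the decision step (comparison with a tabulated critical value) is left to the reader.

```python
def mann_whitney_u(sample1, sample2):
    """Return (W1, W2, U1, U2) from pooled ranks, using average ranks for ties."""
    pooled = sorted(sample1 + sample2)

    def rank_of(v):
        # First pooled position of v, plus half the extra tied positions.
        return pooled.index(v) + 1 + (pooled.count(v) - 1) / 2

    w1 = sum(rank_of(v) for v in sample1)
    w2 = sum(rank_of(v) for v in sample2)
    n1, n2 = len(sample1), len(sample2)
    u1 = w1 - n1 * (n1 + 1) / 2
    u2 = w2 - n2 * (n2 + 1) / 2
    return w1, w2, u1, u2


sample_a = [12, 18, 15, 22, 9]
sample_b = [25, 14, 20, 17, 28, 11]
w1, w2, u1, u2 = mann_whitney_u(sample_a, sample_b)
print(w1, w2, u1, u2)  # 25.0 41.0 10.0 20.0
print(u1 + u2 == len(sample_a) * len(sample_b))  # True
```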

Normal Approximation for Large Samples

When both n₁ and n₂ are large (Cambridge will specify), the Mann-Whitney U statistic follows an approximate normal distribution:

Large-sample approximation
E(U) = n₁n₂/2
Var(U) = n₁n₂(n₁+n₂+1)/12
z = (U − E(U)) / √Var(U) ~ N(0,1) approximately
Compare z with the standard normal critical value for the chosen tail(s). Apply a continuity correction if required: replace U with U ± 0.5, moving U towards E(U).
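A minimal Python sketch of the approximation follows. The sample sizes and the value U = 200 are made-up numbers for illustration only, the function name is hypothetical, and the continuity correction is applied by moving U half a unit towards its mean.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_normal_approx(u, n1, n2, continuity=True):
    """Return (z, two-tailed p-value) from the normal approximation to U."""
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 * (n1 + n2 + 1) / 12
    if continuity and u != mean_u:
        # Shift U by 0.5 towards E(U).
        u = u + 0.5 if u < mean_u else u - 0.5
    z = (u - mean_u) / sqrt(var_u)
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_tailed


# Hypothetical data: n1 = n2 = 25 and observed U = 200.
z, p = mann_whitney_normal_approx(u=200, n1=25, n2=25)
print(round(z, 2))  # -2.17
print(p < 0.05)     # True
```

For a one-tailed test, halve the two-tailed p-value, provided z falls on the side predicted by H₁.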

Worked Examples

Worked Example 1: Paired Wilcoxon, Before and After

Blood pressure (mmHg) of 7 patients before and after treatment:
Before: 148, 152, 145, 160, 155, 143, 158
After: 142, 150, 148, 151, 152, 140, 155
Use Wilcoxon signed-rank at 5% (one-tailed) to test whether treatment reduces blood pressure. Do not assume normality.

Differences d = Before − After

6, 2, −3, 9, 3, 3, 3

Ranking |d|

|d|: 6, 2, 3, 9, 3, 3, 3
No zeros. The four 3s occupy positions 2, 3, 4 and 5, so each gets the average rank (2+3+4+5)/4 = 3.5.
Order: 2(1), 3(3.5), 3(3.5), 3(3.5), 3(3.5), 6(6), 9(7)
Ranks: d=2→1, d=−3→3.5, d=+3→3.5, d=+3→3.5, d=+3→3.5, d=6→6, d=9→7

Table

Before	After	d	|d|	Rank	Signed Rank
152	150	+2	2	1	+1
145	148	−3	3	3.5	−3.5
155	152	+3	3	3.5	+3.5
143	140	+3	3	3.5	+3.5
158	155	+3	3	3.5	+3.5
148	142	+6	6	6	+6
160	151	+9	9	7	+7

W⁺, W⁻, T

W⁺ = 1+3.5+3.5+3.5+6+7 = 24.5
W⁻ = 3.5
Check: 24.5+3.5 = 28 = 7×8/2 ✓
H₁: treatment reduces BP (before > after → positive d → want W⁺ large)
For right-tail H₁ (m_d > 0): T = W⁻ = 3.5

Decision

n=7, one-tailed 5%: w_{0.05} = 3 (from tables)
T = 3.5 > 3 → Fail to reject H₀

Insufficient evidence at 5% that the treatment reduces blood pressure.

Worked Example 2: Choosing the Right Test

For each scenario, state which test to use and why:
(a) n=6 observations, test whether median = 50. Normality not assumed.
(b) Two independent samples of size 8 and 10. No normality assumption.
(c) Paired data, n=12. The differences are approximately normal.
(d) n=8 observations, median test. Only direction of differences known (not magnitude).

(a)

Wilcoxon signed-rank test — one-sample, no normality, continuous data so magnitudes are meaningful. (Sign test also valid but less powerful.)

(b)

Mann-Whitney U-test — two independent samples, no normality assumption.

(c)

Paired t-test — differences are approximately normal, so the parametric test is preferred (more powerful).

(d)

Sign test — only direction of difference is known, not magnitude. Cannot rank, so Wilcoxon is not appropriate.

Practice Questions

Question 1 — Sign Test

[5 marks]

A dietitian claims the median daily calorie intake of students is 2000 kcal. A sample of 12 students has intakes: 1850, 2100, 1980, 2250, 1760, 2050, 1920, 2180, 1890, 2300, 2020, 1970.
Use a sign test at 10% to test whether the median differs from 2000 (two-tailed).

Count values above 2000 (S⁺) and below 2000 (S⁻). Discard any = 2000. Under H₀, S⁺ ~ B(n, 0.5). For two-tailed 10%, compare 2×P(X ≤ min(S⁺,S⁻)), where X ~ B(n, 0.5), with 0.10.

✓ Solution

Signs (relative to 2000):
1850(−), 2100(+), 1980(−), 2250(+), 1760(−), 2050(+), 1920(−), 2180(+), 1890(−), 2300(+), 2020(+), 1970(−)
S⁺ = 6, S⁻ = 6, n = 12 (no ties)

H₀: m=2000, H₁: m≠2000 (two-tailed, α=0.10)
T = min(6,6) = 6; under H₀, X ~ B(12, 0.5)
P(X ≤ 6) = 0.6128
Two-tailed p-value = 2×0.6128 = 1.2256, which exceeds 1, so take p = 1. (S⁺ = S⁻ = n/2 is the most balanced outcome possible.)
p = 1 > 0.10 → Fail to reject H₀

No evidence at 10% that median differs from 2000 kcal. (Data perfectly balanced.)
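The doubling-and-capping step for a two-tailed sign test can be sketched in Python as below. The helper name `sign_test_two_tailed_p` is an assumption for illustration.

```python
from math import comb


def sign_test_two_tailed_p(data, m0):
    """Return (S+, S-, p) where p = min(1, 2 * P(X <= min(S+, S-))), X ~ B(n, 0.5)."""
    s_plus = sum(1 for x in data if x > m0)
    s_minus = sum(1 for x in data if x < m0)
    n = s_plus + s_minus  # observations equal to m0 are discarded
    k = min(s_plus, s_minus)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Doubling a tail that already exceeds 0.5 would give p > 1, so cap at 1.
    return s_plus, s_minus, min(1.0, 2 * lower_tail)


intakes = [1850, 2100, 1980, 2250, 1760, 2050, 1920, 2180, 1890, 2300, 2020, 1970]
print(sign_test_two_tailed_p(intakes, 2000))  # (6, 6, 1.0)
```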

Question 2 — Wilcoxon Signed-Rank (One-tailed)

[8 marks]

A manufacturer claims components last a median of 500 hours. A sample of 9 components lasts: 482, 515, 498, 543, 467, 521, 488, 556, 503. Test at 5% (one-tailed) whether the median lifetime exceeds 500 hours.

Compute d = x−500 for each. Discard any d=0 (here x=503 gives d=3≠0, so nothing is discarded and n remains 9). Rank |d|, split into W⁺ (positive d) and W⁻ (negative d). For H₁: m>500, use T = W⁻. Reject if T ≤ critical value at n=9.

✓ Solution

d = x−500: −18, +15, −2, +43, −33, +21, −12, +56, +3
No zeros. n=9.

|d| ordered: 2(1), 3(2), 12(3), 15(4), 18(5), 21(6), 33(7), 43(8), 56(9)

Signs: −18→rank5(−), +15→rank4(+), −2→rank1(−), +43→rank8(+), −33→rank7(−), +21→rank6(+), −12→rank3(−), +56→rank9(+), +3→rank2(+)

W⁺ = 4+8+6+9+2 = 29
W⁻ = 5+1+7+3 = 16
Check: 29+16 = 45 = 9×10/2 ✓

H₁: m>500 (right-tailed) → T = W⁻ = 16
n=9, one-tailed 5%: w_{0.05} = 8 (from table)
T=16 > 8 → Fail to reject H₀

Insufficient evidence at 5% that median lifetime exceeds 500 hours.

Question 3 — Mann-Whitney

[9 marks]

Group X (n₁=5): 34, 41, 28, 37, 45
Group Y (n₂=7): 52, 29, 44, 38, 61, 33, 47
Test at 5% (two-tailed) whether the populations have equal medians.

Combine and rank all 12 values. Compute W_X and W_Y. Then U_X = W_X − n₁(n₁+1)/2 and U_Y = W_Y − n₂(n₂+1)/2. U = min(U_X, U_Y). Compare with critical value for n₁=5, n₂=7 at 5%.

✓ Solution

Combined sort: 28(X,1), 29(Y,2), 33(Y,3), 34(X,4), 37(X,5), 38(Y,6), 41(X,7), 44(Y,8), 45(X,9), 47(Y,10), 52(Y,11), 61(Y,12)

W_X = 1+4+5+7+9 = 26
W_Y = 2+3+6+8+10+11+12 = 52
Check: 26+52 = 78 = 12×13/2 ✓

U_X = 26 − 5×6/2 = 26−15 = 11
U_Y = 52 − 7×8/2 = 52−28 = 24
Check: 11+24 = 35 = 5×7 ✓

U = min(11,24) = 11
n₁=5, n₂=7, two-tailed 5%: u_{0.05} = 5 (from tables)
11 > 5 → Fail to reject H₀

No significant evidence at 5% that the population medians differ.

Question 4 — Choosing and Justifying the Test

[4 marks]

A psychologist wishes to compare memory scores for two groups of 7 and 9 participants respectively. The scores in group 1 are highly skewed due to one extreme value. State which test should be used and justify your answer. State clearly what assumptions the test requires.

Two independent samples → Mann-Whitney or two-sample t-test. But data is skewed/has outlier → normality is violated → non-parametric is preferred.

✓ Solution

Use the Mann-Whitney U-test (Wilcoxon rank-sum test).

Justification:
• Two independent samples → rules out Wilcoxon signed-rank and sign test
• Skewed distribution with outlier → normality assumption of t-test is violated, especially with small n
• Mann-Whitney uses ranks, so it is robust to the outlier and skewness

Assumptions required:
• The two samples are independent random samples
• The observations are from continuous distributions
• The two distributions have the same shape and spread, so that any difference between them is a shift in location (median)


Formula Sheet — Non-Parametric Tests

Sign Test

H₀: m = m₀	Median test
S⁺ ~ B(n, 0.5)	Under H₀
Discard ties (x=m₀)	Reduce n
T = min(S⁺,S⁻)	Two-tailed
Use binomial tables	MF19

Wilcoxon Signed-Rank

Rank |dᵢ| = |xᵢ−m₀|
W⁺ = Σ(positive ranks)
W⁻ = Σ(negative ranks)
W⁺+W⁻ = n(n+1)/2	Check
T = min(W⁺,W⁻)	Two-tailed
Reject if T ≤ w_α	From tables

Mann-Whitney

Rank all N=n₁+n₂ values
W₁+W₂ = N(N+1)/2	Check
U₁ = W₁−n₁(n₁+1)/2
U₂ = W₂−n₂(n₂+1)/2
U₁+U₂ = n₁n₂	Check
Reject if U ≤ u_α	From tables

Tied Ranks Rule

Tied values	Average of their ranks
e.g. 3 tied at ranks 4,5,6	All get rank 5
Zero differences	Discard, reduce n

Mann-Whitney (Large n)

E(U) = n₁n₂/2
Var(U) = n₁n₂(N+1)/12, where N = n₁+n₂
z = (U−E(U))/√Var(U) ~ N(0,1)

Which Test?

1-sample, direction only	Sign test
1-sample, magnitude known	Wilcoxon S-R
2 independent samples	Mann-Whitney
Paired data, no normality	Wilcoxon S-R on diff

📋 Cambridge Exam Strategy — Non-Parametric Tests

Always show the full ranking table — values, differences, |d|, rank, signed rank. Cambridge awards method marks for correct ranking even if later arithmetic is wrong.
Check sums explicitly: State W⁺ + W⁻ = n(n+1)/2 and U₁ + U₂ = n₁n₂. These checks catch errors before the decision step.
Tied ranks: If any |dᵢ| values are equal, average their ranks. State "tied ranks" explicitly — Cambridge rewards this acknowledgement.
Zero differences (Wilcoxon): State "discard d=0, reduce n to [new value]." This earns a mark and affects the critical value lookup.
Decision direction: Reject H₀ if T ≤ critical value — the opposite direction to z and t tests. Students often err here.
Justification questions: When asked why a non-parametric test is appropriate, mention: (1) small sample size, (2) normality not satisfied, (3) data is ordinal. When it is not appropriate: large n with normal data → parametric test is more powerful.
