1. Types of Data and Data Collection
Qualitative (Categorical) Data
Data described in words — categories with no numerical value.
Examples: colours, gender, favourite subject, brand names.
Displayed using: bar charts, pie charts, pictograms.
Quantitative (Numerical) Data
Data with numerical values. Divided into two types:
Discrete: Takes only specific values (usually whole numbers), e.g. number of children, shoe sizes, goals scored.
Continuous: Can take any value in a range; it is measured, not counted, e.g. height, mass, time, temperature.
Frequency Tables and Grouped Data
For the class "20 ≤ h < 30": lower boundary = 20, upper boundary = 30, midpoint = 25.
For the class "20–29" (discrete data): lower boundary = 19.5, upper boundary = 29.5, midpoint = 24.5.
2. Averages — Mean, Median, and Mode
| Measure | Definition | When to Use | Affected by Extremes? |
|---|---|---|---|
| Mean | Sum of all values ÷ number of values | When data is fairly symmetrical, no extreme outliers | Yes — outliers pull the mean |
| Median | Middle value when data is arranged in order. For n values: position = (n+1)/2 | When data has outliers or is skewed | No — resistant to extremes |
| Mode | Value that appears most frequently. A dataset can have no mode, one mode, or multiple modes. | Qualitative data; most popular category | No |
| Range | Largest value − Smallest value. Measures spread. | Simple measure of spread | Yes — very sensitive to extremes |
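To see why the median resists extreme values while the mean and range do not, here is a minimal Python sketch using the standard `statistics` module. The data set `scores` and the helper `summary` are made up for illustration and are not from these notes.

```python
from statistics import mean, median, multimode

# Hypothetical data set (an assumption for illustration only).
scores = [4, 5, 5, 6, 7]
with_outlier = scores + [40]  # same data plus one extreme value

def summary(data):
    """Return (mean, median, list of modes, range) for a list of numbers."""
    return mean(data), median(data), multimode(data), max(data) - min(data)

print(summary(scores))        # mean 5.4, median 5, mode 5, range 3
print(summary(with_outlier))  # mean and range jump; median only moves to 5.5
```

Adding the single value 40 roughly doubles the mean and raises the range from 3 to 36, while the median moves only from 5 to 5.5 and the mode stays at 5.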
Mean from a Frequency Table
Mean = Σfx ÷ Σf
where f = frequency, x = value (or midpoint for grouped data)
📐 Worked Example 1 — Mean from Grouped Frequency Table
Find the estimated mean from this grouped frequency table:
| Height h (cm) | Frequency f | Midpoint x | fx |
|---|---|---|---|
| 140 ≤ h < 150 | 4 | 145 | 580 |
| 150 ≤ h < 160 | 11 | 155 | 1705 |
| 160 ≤ h < 170 | 9 | 165 | 1485 |
| 170 ≤ h < 180 | 6 | 175 | 1050 |
| Total | 30 | | 4820 |

Estimated mean = Σfx ÷ Σf = 4820 ÷ 30 ≈ 160.7 cm
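The calculation in Worked Example 1 can be checked with a short Python sketch. The function name `estimated_mean` and the tuple layout are illustrative choices, not part of the syllabus.

```python
# Each class is (lower bound, upper bound, frequency), taken from the table above.
classes = [(140, 150, 4), (150, 160, 11), (160, 170, 9), (170, 180, 6)]

def estimated_mean(classes):
    """Estimate the mean as sum(f * midpoint) / sum(f)."""
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    total_f = sum(f for _, _, f in classes)
    return total_fx / total_f

print(round(estimated_mean(classes), 1))  # 160.7
```

The result is only an estimate because each height is replaced by the midpoint of its class.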
Median and Quartiles for Ungrouped Data
📐 Worked Example 2 — Median, Quartiles, IQR
Data: 3, 7, 8, 12, 15, 18, 21, 24, 26. Find the median, lower quartile (Q1), upper quartile (Q3), and interquartile range (IQR).
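Median (Q2) = 5th value = 15
Q1 = median of the lower half (3, 7, 8, 12) = (7 + 8) ÷ 2 = 7.5
Q3 = median of the upper half (18, 21, 24, 26) = (21 + 24) ÷ 2 = 22.5
IQR = Q3 − Q1 = 22.5 − 7.5 = 15
Range = 26 − 3 = 23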
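As a check on Worked Example 2, the sketch below finds the quartiles by taking the median of each half of the ordered data. This "median of halves" convention is an assumption; other textbooks and software packages use slightly different quartile rules and may give different values for some data sets.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the 'median of each half' method.

    For an odd number of values the middle value is left out of both halves.
    """
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]
    upper = values[(n + 1) // 2 :]
    return median(lower), median(values), median(upper)

data = [3, 7, 8, 12, 15, 18, 21, 24, 26]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, q3 - q1)  # 7.5 15 22.5 15.0
```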
3. Statistical Diagrams
Bar Charts and Pie Charts
Bar Chart
Bars of equal width. Height represents frequency. Bars are separated by gaps for discrete or categorical data, and may be drawn in groups to compare data sets. The frequency axis must start at zero and be evenly scaled.
Pie Chart
Circle divided into sectors. Angle for each sector = (frequency ÷ total) × 360°. Total of all angles = 360°. Used for showing proportions/percentages of a whole.
📐 Worked Example 3 — Pie Chart Angles
80 students chose their favourite subject: Maths 24, English 18, Science 20, Other 18. Calculate the angle for each sector.
Maths: (24/80) × 360 = 108°
English: (18/80) × 360 = 81°
Science: (20/80) × 360 = 90°
Other: (18/80) × 360 = 81°
Check: 108 + 81 + 90 + 81 = 360° ✓
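The angle rule from Worked Example 3 can be sketched in Python as follows. The function name `sector_angles` is illustrative.

```python
def sector_angles(frequencies):
    """Map each category to its sector angle in degrees."""
    total = sum(frequencies.values())
    # Multiply before dividing so whole-number angles stay exact.
    return {name: f * 360 / total for name, f in frequencies.items()}

subjects = {"Maths": 24, "English": 18, "Science": 20, "Other": 18}
angles = sector_angles(subjects)
print(angles)                # Maths 108.0, English 81.0, Science 90.0, Other 81.0
print(sum(angles.values()))  # 360.0
```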
Histograms
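Frequency Density
Frequency density = frequency ÷ class width
In a histogram the area of each bar (frequency density × class width) represents the frequency.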
📐 Worked Example 4 — Histogram with Unequal Class Widths
Calculate frequency densities for the following:
| Time t (min) | Frequency | Class Width | Freq. Density |
|---|---|---|---|
| 0 ≤ t < 10 | 12 | 10 | 1.2 |
| 10 ≤ t < 20 | 25 | 10 | 2.5 |
| 20 ≤ t < 30 | 30 | 10 | 3.0 |
| 30 ≤ t < 50 | 20 | 20 | 1.0 |
| 50 ≤ t < 80 | 9 | 30 | 0.3 |
For the bar 50 ≤ t < 80: frequency = 0.3 × 30 = 9 ✓
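The table in Worked Example 4 can be reproduced with a few lines of Python. This is a minimal sketch; the tuple layout and function name are illustrative.

```python
# (lower bound, upper bound, frequency) for each class in the table above.
classes = [(0, 10, 12), (10, 20, 25), (20, 30, 30), (30, 50, 20), (50, 80, 9)]

def frequency_densities(classes):
    """Frequency density = frequency / class width for each class."""
    return [f / (hi - lo) for lo, hi, f in classes]

densities = frequency_densities(classes)
print(densities)  # [1.2, 2.5, 3.0, 1.0, 0.3]
```

Multiplying each density by its class width recovers the original frequency, which is the check used for the last bar above.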
Scatter Diagrams and Correlation
Types of Correlation
- Positive correlation — as one variable increases, the other increases. Points slope upward left to right.
- Negative correlation — as one increases, the other decreases. Points slope downward.
- No correlation — no relationship between variables. Points scattered randomly.
- Strong/Weak — how closely points follow the trend line.
Line of Best Fit
A straight line drawn through the scatter diagram that best represents the trend, with approximately equal numbers of points on each side.
It must pass through the mean point (x̄, ȳ).
It is used to interpolate (estimate within the data range) or extrapolate (estimate outside the range, which is less reliable).
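In an exam the line of best fit is drawn by eye. As an illustration only, the sketch below swaps in the least-squares regression line, a calculated line that always passes through the mean point (x̄, ȳ). The data (hours revised and test score) are hypothetical and not from these notes.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: hours revised (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [42, 48, 55, 59, 66]
slope, intercept = least_squares_line(hours, scores)

def predict(x):
    return slope * x + intercept

print(round(predict(mean(hours)), 2))  # equals the mean score: line passes through (x̄, ȳ)
print(round(predict(3.5), 2))          # interpolation: inside the data range
print(round(predict(8), 2))            # extrapolation: outside the range, less reliable
```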
4. Cumulative Frequency Diagrams
📐 Worked Example 5 — Cumulative Frequency Table and Reading Values
60 students took a test scored out of 80. Build the cumulative frequency table and find the median, Q1, Q3, and IQR.
| Score s | Frequency | Cumulative Frequency | Plot Point |
|---|---|---|---|
| 0 ≤ s < 20 | 5 | 5 | (20, 5) |
| 20 ≤ s < 40 | 12 | 17 | (40, 17) |
| 40 ≤ s < 50 | 18 | 35 | (50, 35) |
| 50 ≤ s < 60 | 15 | 50 | (60, 50) |
| 60 ≤ s < 80 | 10 | 60 | (80, 60) |
Median: read across from cf = 60 ÷ 2 = 30 on the curve → Median ≈ 47
Q1: read from cf = 60 ÷ 4 = 15 → Q1 ≈ 38
Q3: read from cf = 3 × 60 ÷ 4 = 45 → Q3 ≈ 57
IQR = Q3 − Q1 ≈ 57 − 38 = 19
Percentile: The 80th percentile = value at cf = 0.8 × 60 = 48th value → read from curve.
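The readings in Worked Example 5 can be approximated in Python. The sketch below joins the plot points with straight lines, starting from (0, 0). That is an assumption: a hand-drawn smooth curve gives slightly different readings, so expect small differences from the values quoted above.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the table above.
upper_bounds = [20, 40, 50, 60, 80]
frequencies = [5, 12, 18, 15, 10]
cumulative = list(accumulate(frequencies))  # [5, 17, 35, 50, 60]

# Plot points, starting from (0, 0) at the lowest class boundary.
points = list(zip([0] + upper_bounds, [0] + cumulative))

def value_at(cf_target):
    """Estimate the score at a given cumulative frequency by linear interpolation."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= cf_target <= c1:
            return x0 + (cf_target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("cumulative frequency outside the table")

n = cumulative[-1]
for label, cf in [("Q1", n / 4), ("Median", n / 2), ("Q3", 3 * n / 4), ("P80", 0.8 * n)]:
    print(label, round(value_at(cf), 1))
```

With straight-line segments this gives roughly 36.7, 47.2, 56.7 and 58.7, close to the curve readings of 38, 47 and 57.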
Box-and-Whisker Plot
Five values are marked, in order: minimum, Q1, median (Q2), Q3, maximum.
The box spans from Q1 to Q3 (the IQR). Whiskers extend to the minimum and maximum values.
5. Probability
P(event) = Number of favourable outcomes / Total number of equally likely outcomes
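Basic Probability Rules
• 0 ≤ P(event) ≤ 1: an impossible event has probability 0 and a certain event has probability 1.
• The probabilities of all possible outcomes sum to 1.
• P(not A) = 1 − P(A)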
Mutually Exclusive and Independent Events
Mutually Exclusive Events
Events that cannot both occur at the same time. If A and B are mutually exclusive:
P(A and B) = 0
P(A or B) = P(A) + P(B)
Example: Rolling a 3 and rolling a 5 on a single roll of one die.
Independent Events
Events where the outcome of one does not affect the probability of the other:
P(A and B) = P(A) × P(B)
Example: Tossing a coin and rolling a die — each outcome is unaffected by the other.
📐 Worked Example 6 — Basic Probability
A bag contains 5 red, 3 blue, and 2 green balls. A ball is drawn at random. Find: (a) P(red) (b) P(not blue) (c) P(red or green).
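(a) P(red) = 5/10 = 1/2
(b) P(not blue) = 1 − P(blue) = 1 − 3/10 = 7/10
(c) P(red or green) = P(red) + P(green) = 5/10 + 2/10 = 7/10 (the events are mutually exclusive)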
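Worked Example 6 can be checked with exact fractions in Python. This is a minimal sketch; the helper `p` is an illustrative name.

```python
from fractions import Fraction

bag = {"red": 5, "blue": 3, "green": 2}
total = sum(bag.values())

def p(*colours):
    """Probability that one ball drawn at random has any of the given colours."""
    return Fraction(sum(bag[c] for c in colours), total)

p_red = p("red")                    # (a)
p_not_blue = 1 - p("blue")          # (b) complement rule
p_red_or_green = p("red", "green")  # (c) mutually exclusive, so the counts add
print(p_red, p_not_blue, p_red_or_green)  # 1/2 7/10 7/10
```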
6. Combined Events — Tree Diagrams and Tables
Possibility Space (Sample Space) Diagrams
📐 Worked Example 7 — Two Dice Sample Space
Two fair dice are rolled. Find: (a) P(sum = 7) (b) P(sum ≥ 10) (c) P(both show same number).
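There are 6 × 6 = 36 equally likely outcomes.
(a) P(sum = 7) = 6/36 = 1/6, from (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
(b) P(sum ≥ 10) = 6/36 = 1/6, from (4,6), (5,5), (6,4), (5,6), (6,5), (6,6)
(c) P(both same) = 6/36 = 1/6, from (1,1), (2,2), (3,3), (4,4), (5,5), (6,6)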
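The sample space for two dice is small enough to list in full, so the answers to Worked Example 7 can be confirmed by counting. A minimal sketch, with the helper name `prob` chosen for illustration:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = favourable outcomes / total outcomes."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

p_sum_7 = prob(lambda o: o[0] + o[1] == 7)
p_sum_at_least_10 = prob(lambda o: o[0] + o[1] >= 10)
p_same = prob(lambda o: o[0] == o[1])
print(p_sum_7, p_sum_at_least_10, p_same)  # 1/6 1/6 1/6
```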
Tree Diagrams
• Each branch shows a possible outcome and its probability.
• Probabilities on branches from the same point must sum to 1.
• Multiply along branches to find the probability of a combined outcome.
• Add across branches to find the probability of alternative outcomes.
📐 Worked Example 8 — Tree Diagram (Without Replacement)
A bag has 4 red and 3 blue balls. Two balls are drawn without replacement. Find: (a) P(both red) (b) P(one of each colour) (c) P(at least one blue).
First draw: P(R) = 4/7, P(B) = 3/7
Second draw (if first was R): P(R) = 3/6 = 1/2, P(B) = 3/6 = 1/2
Second draw (if first was B): P(R) = 4/6 = 2/3, P(B) = 2/6 = 1/3
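(a) P(both red) = 4/7 × 3/6 = 12/42 = 2/7
(b) P(one of each colour) = P(RB) + P(BR) = (4/7 × 3/6) + (3/7 × 4/6) = 12/42 + 12/42 = 24/42 = 4/7
(c) P(at least one blue) = 1 − P(both red) = 1 − 2/7 = 5/7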
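The tree diagram for Worked Example 8 can be sketched as a dictionary of paths, multiplying along the branches and adding across them. The function `two_draw_tree` is an illustrative helper, not a standard library routine.

```python
from fractions import Fraction

def two_draw_tree(counts):
    """Return {(first, second): probability} for two draws without replacement."""
    total = sum(counts.values())
    tree = {}
    for first, n_first in counts.items():
        p_first = Fraction(n_first, total)
        for second, n_second in counts.items():
            # One ball of the first colour has already been removed.
            remaining = n_second - (1 if second == first else 0)
            tree[(first, second)] = p_first * Fraction(remaining, total - 1)
    return tree

tree = two_draw_tree({"R": 4, "B": 3})
p_both_red = tree[("R", "R")]                         # multiply along one path
p_one_of_each = tree[("R", "B")] + tree[("B", "R")]   # add across two paths
p_at_least_one_blue = 1 - p_both_red                  # complement of "both red"
print(p_both_red, p_one_of_each, p_at_least_one_blue)  # 2/7 4/7 5/7
```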
Conditional Probability
P(A|B) = P(A and B) / P(B)
In tree diagrams: conditional probabilities appear on the second set of branches — the probabilities change based on what happened first (especially in without-replacement problems).
📐 Worked Example 9 — Conditional Probability
From the tree diagram above, find P(first ball was red | second ball is blue).
P(R first | B second) = P(R first AND B second) / P(B second)
P(R first AND B second) = P(RB) = 4/7 × 3/6 = 2/7
P(B second) = P(RB) + P(BB) = 2/7 + 1/7 = 3/7
P(R first | B second) = (2/7) ÷ (3/7) = 2/3
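The conditional probability in Worked Example 9 follows directly from P(A|B) = P(A and B) / P(B). A minimal sketch with exact fractions, using the branch probabilities from Worked Example 8:

```python
from fractions import Fraction

# Path probabilities for a bag of 4 red and 3 blue balls, drawn without replacement.
p_rb = Fraction(4, 7) * Fraction(3, 6)  # red first, then blue
p_bb = Fraction(3, 7) * Fraction(2, 6)  # blue first, then blue

p_b_second = p_rb + p_bb                      # every path that ends in blue
p_r_first_given_b_second = p_rb / p_b_second  # P(A and B) / P(B)
print(p_b_second, p_r_first_given_b_second)   # 3/7 2/3
```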
7. Comparing Statistical Distributions
Cambridge frequently asks you to compare two distributions using averages and measures of spread. You must comment on BOTH the average AND the spread to earn full marks.
1. Compare averages (mean or median) — state which group scored higher/lower on average and by how much.
2. Compare spread (range or IQR) — state which group is more consistent (smaller IQR = more consistent results).
Example answer: "Group A has a higher median (58) than Group B (52), suggesting Group A performed better overall. However, Group A has a larger IQR (22 vs 14), suggesting Group B's results were more consistent."
| Measure | What it tells you | Comparison phrase |
|---|---|---|
| Mean / Median | Typical/average value — the "centre" of the data | "On average, Group A scored higher than Group B" |
| Range | Overall spread — difference between extremes | "Group B has a smaller range, so results were less spread out" |
| IQR | Spread of the middle 50% of data — not affected by outliers | "Group A has a larger IQR, so the middle 50% were less consistent" |
📝 Exam Practice Questions
Q1 [3 marks] — The ages of 7 people are: 14, 18, 22, 16, 14, 20, 31. Find the mean, median, mode, and range.
Mean: (14+14+16+18+20+22+31)/7 = 135/7 ≈ 19.3 (1 d.p.)
Median: 4th value = 18
Mode: 14 (appears twice)
Range: 31−14 = 17
Q2 [4 marks] — The table shows the heights of 40 plants. Calculate an estimate of the mean height. State why it is an estimate.
| Height h (cm) | Frequency |
|---|---|
| 0 ≤ h < 5 | 6 |
| 5 ≤ h < 10 | 14 |
| 10 ≤ h < 20 | 12 |
| 20 ≤ h < 30 | 8 |
fx: 6×2.5=15, 14×7.5=105, 12×15=180, 8×25=200
Σfx = 15+105+180+200 = 500
Estimated mean = 500÷40 = 12.5 cm
It is an estimate because we assume all values within each class are at the midpoint, which is not necessarily true.
Q3 [3 marks] — A frequency density histogram has bars at: 0–4 (FD=3), 4–8 (FD=5), 8–16 (FD=2), 16–20 (FD=1). Find the total frequency and the modal class.
0–4: 3×4=12 4–8: 5×4=20 8–16: 2×8=16 16–20: 1×4=4
Total frequency = 12+20+16+4 = 52
Modal class = class with highest frequency density = 4–8 (FD=5)
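The Q3 answer reverses the frequency density formula: frequency = frequency density × class width. A short Python sketch of that step, with illustrative variable names:

```python
# (lower bound, upper bound, frequency density) for each bar in Q3.
bars = [(0, 4, 3), (4, 8, 5), (8, 16, 2), (16, 20, 1)]

# The area of each bar gives its frequency.
frequencies = [(hi - lo) * fd for lo, hi, fd in bars]
total_frequency = sum(frequencies)

# The modal class is the bar with the greatest frequency density.
lo, hi, _ = max(bars, key=lambda bar: bar[2])
modal_class = (lo, hi)

print(frequencies, total_frequency, modal_class)  # [12, 20, 16, 4] 52 (4, 8)
```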
Q4 [4 marks] — A spinner has sections numbered 1–5, each equally likely. It is spun twice. Find: (a) P(both show 4) (b) P(sum = 6) (c) P(at least one 5) (d) P(second shows 3 | first showed an odd number).
(a) P(4,4) = (1/5)×(1/5) = 1/25
(b) Pairs summing to 6: (1,5),(2,4),(3,3),(4,2),(5,1) = 5 pairs.
P(sum=6) = 5/25 = 1/5
(c) P(at least one 5) = 1 − P(no 5) = 1 − (4/5)×(4/5) = 1 − 16/25 = 9/25
(d) P(2nd=3 | 1st=odd). Odd numbers: 1,3,5 → P(1st=odd)=3/5.
P(1st=odd AND 2nd=3) = (3/5)×(1/5) = 3/25.
P(2nd=3|1st=odd) = (3/25)÷(3/5) = 1/5
Q5 [5 marks] — A box has 6 white and 4 black counters. Two counters are taken without replacement. Draw a tree diagram. Find: (a) P(both white) (b) P(both same colour) (c) P(at least one black).
1st W (6/10): 2nd W=5/9, 2nd B=4/9
1st B (4/10): 2nd W=6/9, 2nd B=3/9
(a) P(WW) = (6/10)×(5/9) = 30/90 = 1/3
(b) P(same) = P(WW)+P(BB) = 30/90 + (4/10×3/9) = 30/90 + 12/90 = 42/90 = 7/15
(c) P(at least one black) = 1−P(both white) = 1−1/3 = 2/3
Q6 [3 marks] — Two groups of students sit the same test. Group A: median = 64, IQR = 18. Group B: median = 58, IQR = 9. Write two statistical comparisons between the groups.
Comparison 1 (average): Group A has a higher median (64) than Group B (58), so on average Group A scored higher.
Comparison 2 (spread): Group A has a larger IQR (18) than Group B (9), meaning the middle 50% of Group A's results were more spread out. Group B's results were more consistent: students in Group B performed more similarly to each other.