# Practice Problems

Test your understanding with calculations.
## Confusion Matrix

### Problem 1
A model predicts disease outcomes with the following confusion matrix:
| | Predicted + | Predicted - |
|---|---|---|
| Actual + | 45 | 5 |
| Actual - | 15 | 135 |
Calculate: (a) Accuracy (b) Precision (c) Recall (d) F1 Score
**Solution**
From the matrix: TP=45, FN=5, FP=15, TN=135, Total=200
(a) Accuracy = (TP+TN)/(Total) = (45+135)/200 = 180/200 = 0.90 = 90%
(b) Precision = TP/(TP+FP) = 45/(45+15) = 45/60 = 0.75 = 75%
(c) Recall = TP/(TP+FN) = 45/(45+5) = 45/50 = 0.90 = 90%
(d) F1 = 2×(P×R)/(P+R) = 2×(0.75×0.90)/(0.75+0.90) = 2×0.675/1.65 = 0.818 = 81.8%
**Key Insight:** High recall (90%) means the model catches most diseases; the lower precision (75%) means some false alarms. For disease detection, high recall is usually preferred.
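The solution above can be checked in a few lines of Python. The formulas below are computed directly from the four counts; they match what `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`) would return on the same data.

```python
# Counts from the confusion matrix in Problem 1
TP, FN, FP, TN = 45, 5, 15, 135
total = TP + FN + FP + TN  # 200

accuracy = (TP + TN) / total          # fraction of all predictions that are correct
precision = TP / (TP + FP)            # of predicted positives, how many are real
recall = TP / (TP + FN)               # of actual positives, how many we caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R

print(accuracy, precision, recall, round(f1, 3))  # 0.9 0.75 0.9 0.818
```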
## Statistics

### Problem 2
Dataset: 12, 15, 18, 22, 25, 28, 30
Calculate: (a) Mean (b) Median (c) Range
**Solution**
(a) Mean = Sum/Count = (12+15+18+22+25+28+30)/7 = 150/7 ≈ 21.43
(b) Median = Middle value (already sorted)
7 values → middle is 4th value = 22
(c) Range = Max - Min = 30 - 12 = 18
**Key Insight:** The mean (21.43) and median (22) are close, suggesting a roughly symmetric distribution.
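These three statistics are easy to verify with Python's standard library:

```python
import statistics

data = [12, 15, 18, 22, 25, 28, 30]

mean = sum(data) / len(data)          # 150 / 7 ≈ 21.43
median = statistics.median(data)      # 4th of 7 sorted values -> 22
value_range = max(data) - min(data)   # 30 - 12 = 18

print(round(mean, 2), median, value_range)  # 21.43 22 18
```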
## Scaling

### Problem 3
Feature values: [100, 200, 300, 400, 500]
Scale the value x=300 using:
(a) MinMaxScaler
(b) StandardScaler (mean=300, std=141.4)
**Solution**
(a) MinMaxScaler:
min=100, max=500
x_scaled = (300-100)/(500-100) = 200/400 = 0.5
(b) StandardScaler:
x_scaled = (300-300)/141.4 = 0/141.4 = 0.0
**Key Insight:** After StandardScaler, x=300 (the mean) becomes 0. After MinMaxScaler, x=300 (the middle value) becomes 0.5.
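The two transformations can be reproduced with plain formulas that mirror what sklearn's `MinMaxScaler` and `StandardScaler` compute (standard deviation here is the population std, which is what `StandardScaler` uses and where the 141.4 in the problem comes from):

```python
values = [100, 200, 300, 400, 500]
x = 300

# Min-max scaling: (x - min) / (max - min)
minmax = (x - min(values)) / (max(values) - min(values))

# Standardization: (x - mean) / std
mean = sum(values) / len(values)                                    # 300
std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5   # ~141.42
standard = (x - mean) / std

print(minmax, standard)  # 0.5 0.0
```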
## Class Imbalance

### Problem 4
Training set has 950 normal samples and 50 fraud samples.
(a) What is the class imbalance ratio?
(b) If model predicts ALL as "normal", what is its accuracy?
(c) Why is accuracy misleading here?
**Solution**
(a) Imbalance ratio = 950:50 = 19:1 (or 95% vs 5%)
(b) If predict all as normal:
- Correct: 950 normal predictions
- Wrong: 50 fraud missed
- Accuracy = 950/1000 = 95%
(c) 95% accuracy sounds great, but:
- Precision for fraud = 0%
- Recall for fraud = 0%
- We catch ZERO frauds!
**Key Insight:** With imbalanced data, use precision, recall, and F1 instead of accuracy.
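A tiny sketch of the trap in Problem 4: a "model" that predicts `normal` for everything scores 95% accuracy while catching zero frauds.

```python
n_normal, n_fraud = 950, 50
total = n_normal + n_fraud

# Predict "normal" for every sample:
accuracy = n_normal / total   # all 950 normals are "correct" -> 0.95
fraud_recall = 0 / n_fraud    # 0 of 50 frauds caught -> 0.0

print(accuracy, fraud_recall)  # 0.95 0.0
```

The 95% accuracy comes entirely from the majority class, which is exactly why recall on the minority class is the number to watch here.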
## Model Comparison

### Problem 5
Two models tested:
- Model A: Accuracy=92%, Precision=85%, Recall=75%
- Model B: Accuracy=88%, Precision=70%, Recall=95%
Which model is better for disease detection?
**Solution**
For disease detection, missing a disease (False Negative) is dangerous.
Model A: Recall=75% → misses 25% of diseases
Model B: Recall=95% → misses only 5% of diseases
Model B is better because:
- It has higher recall, so it catches more diseases.
- Although its precision is lower (more false alarms), investigating a false alarm is far cheaper than missing a disease.
F1 scores (nearly tied, so the recall requirement decides):
- A: 2×(0.85×0.75)/(0.85+0.75) ≈ 0.80
- B: 2×(0.70×0.95)/(0.70+0.95) ≈ 0.81
**Key Insight:** Choose the metric based on the business need: recall for disease or fraud detection, precision for spam filters.
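The F1 comparison from the solution can be confirmed with a small helper:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1_a = f1(0.85, 0.75)  # Model A
f1_b = f1(0.70, 0.95)  # Model B

print(round(f1_a, 2), round(f1_b, 2))  # 0.8 0.81
```

Since the F1 scores are nearly identical, the decision comes down to which error matters more, and for disease detection that is the false negative, favoring Model B.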