Practice Problems

Test your understanding with calculations

Confusion Matrix

Problem 1


A model predicts disease outcomes with the following confusion matrix:

| | Predicted + | Predicted - |
|---|---|---|
| Actual + | 45 | 5 |
| Actual - | 15 | 135 |

Calculate: (a) Accuracy (b) Precision (c) Recall (d) F1 Score
Solution
From the matrix: TP=45, FN=5, FP=15, TN=135, Total=200
(a) Accuracy = (TP+TN)/Total = (45+135)/200 = 180/200 = 0.90 = 90%
(b) Precision = TP/(TP+FP) = 45/(45+15) = 45/60 = 0.75 = 75%
(c) Recall = TP/(TP+FN) = 45/(45+5) = 45/50 = 0.90 = 90%
(d) F1 = 2×(P×R)/(P+R) = 2×(0.75×0.90)/(0.75+0.90) = 1.35/1.65 = 0.818 = 81.8%
Key Insight

High recall (90%) means we catch most diseases. Lower precision (75%) means some false alarms. For disease detection, high recall is usually preferred.
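
If you want to verify these numbers yourself, they can be recomputed directly from the four cell counts. A minimal Python sketch (the variable names are just the counts from the table above, not part of any library):

```python
# Recompute Problem 1's metrics from the confusion-matrix counts.
TP, FN, FP, TN = 45, 5, 15, 135
total = TP + FN + FP + TN  # 200

accuracy = (TP + TN) / total                        # 0.90
precision = TP / (TP + FP)                          # 0.75
recall = TP / (TP + FN)                             # 0.90
f1 = 2 * precision * recall / (precision + recall)  # ~0.818

print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} "
      f"Recall={recall:.2f} F1={f1:.3f}")
```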

Statistics

Problem 2


Dataset: 12, 15, 18, 22, 25, 28, 30
Calculate: (a) Mean (b) Median (c) Range
Solution
(a) Mean = Sum/Count = (12+15+18+22+25+28+30)/7 = 150/7 = 21.43
(b) Median = middle value (already sorted); 7 values → middle is the 4th value = 22
(c) Range = Max - Min = 30 - 12 = 18
Key Insight

The mean (21.43) and median (22) are close, suggesting a roughly symmetric distribution.
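
For a quick check, Python's standard library reproduces these values; a minimal sketch:

```python
# Recompute Problem 2's summary statistics with the standard library.
import statistics

data = [12, 15, 18, 22, 25, 28, 30]

mean = statistics.mean(data)         # 150 / 7 ≈ 21.43
median = statistics.median(data)     # 4th of 7 sorted values = 22
value_range = max(data) - min(data)  # 30 - 12 = 18

print(mean, median, value_range)
```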

Scaling

Problem 3


Feature values: [100, 200, 300, 400, 500]
Scale the value x=300 using: (a) MinMaxScaler (b) StandardScaler (mean=300, std=141.4)
Solution
(a) MinMaxScaler: min=100, max=500
x_scaled = (300-100)/(500-100) = 200/400 = 0.5
(b) StandardScaler:
x_scaled = (300-300)/141.4 = 0/141.4 = 0.0
Key Insight

After StandardScaler, x=300 (the mean) becomes 0. After MinMaxScaler, x=300 (middle value) becomes 0.5.
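
The same result can be reproduced by writing out the formulas the two scalers apply (StandardScaler divides by the population standard deviation, which is where 141.4 comes from). A minimal sketch in plain Python:

```python
# Scale x = 300 using the Min-Max and standardization formulas.
values = [100, 200, 300, 400, 500]
x = 300

# Min-Max scaling: (x - min) / (max - min)
minmax_scaled = (x - min(values)) / (max(values) - min(values))   # 0.5

# Standardization: (x - mean) / population std
mean = sum(values) / len(values)                                   # 300
std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5  # ≈ 141.42
standard_scaled = (x - mean) / std                                 # 0.0

print(minmax_scaled, standard_scaled)
```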

Class Imbalance

Problem 4


A training set has 950 normal samples and 50 fraud samples.
(a) What is the class imbalance ratio?
(b) If the model predicts ALL samples as "normal", what is its accuracy?
(c) Why is accuracy misleading here?
Solution
(a) Imbalance ratio = 950:50 = 19:1 (or 95% vs 5%)
(b) If we predict everything as normal:
- Correct: 950 normal predictions
- Wrong: 50 frauds missed
- Accuracy = 950/1000 = 95%
(c) 95% accuracy sounds great, but:
- Precision for fraud = 0% (no fraud is ever predicted)
- Recall for fraud = 0%
- We catch ZERO frauds!
Key Insight

With imbalanced data, use precision, recall, and F1 instead of accuracy.
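
A tiny sketch of the arithmetic behind this problem (the "model" here is just the trivial predict-everything-normal rule):

```python
# Why 95% accuracy is misleading when the model predicts "normal" for everything.
n_normal, n_fraud = 950, 50
total = n_normal + n_fraud

accuracy = n_normal / total             # 0.95: every normal sample is "correct"
frauds_caught = 0                       # the all-normal model never flags fraud
fraud_recall = frauds_caught / n_fraud  # 0.0: zero frauds detected

print(f"accuracy={accuracy:.0%}, fraud recall={fraud_recall:.0%}")
```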

Model Comparison

Problem 5


Two models tested:
- Model A: Accuracy=92%, Precision=85%, Recall=75%
- Model B: Accuracy=88%, Precision=70%, Recall=95%

Which model is better for disease detection?
Solution
For disease detection, missing a disease (False Negative) is dangerous.
Model A: Recall=75% → misses 25% of diseases
Model B: Recall=95% → misses only 5% of diseases
Model B is better because:
- Higher recall (catches more diseases)
- Even though precision is lower (more false alarms), it's better to investigate a false alarm than miss a disease
F1 scores:
- A: 2×(0.85×0.75)/(0.85+0.75) = 0.80
- B: 2×(0.70×0.95)/(0.70+0.95) = 0.81
Key Insight

Choose the metric based on the business need: recall for disease/fraud detection, precision for spam filters.
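
The F1 comparison is easy to reproduce; a minimal sketch (the helper function name is my own, not a library call):

```python
# Compare the two models' F1 scores from their reported precision and recall.
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

model_a = f1_score(precision=0.85, recall=0.75)  # ≈ 0.80
model_b = f1_score(precision=0.70, recall=0.95)  # ≈ 0.81

print(f"Model A F1={model_a:.2f}, Model B F1={model_b:.2f}")
```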
