Practice Problems

Test your understanding with calculations

Confusion Matrix

Problem 1


A model predicts disease outcomes with the following confusion matrix:

| | Predicted + | Predicted - |
|---|---|---|
| Actual + | 45 | 5 |
| Actual - | 15 | 135 |

Calculate: (a) Accuracy (b) Precision (c) Recall (d) F1 Score
Solution
From the matrix: TP=45, FN=5, FP=15, TN=135, Total=200
(a) Accuracy = (TP+TN)/Total = (45+135)/200 = 180/200 = 0.90 = 90%
(b) Precision = TP/(TP+FP) = 45/(45+15) = 45/60 = 0.75 = 75%
(c) Recall = TP/(TP+FN) = 45/(45+5) = 45/50 = 0.90 = 90%
(d) F1 = 2×(P×R)/(P+R) = 2×(0.75×0.90)/(0.75+0.90) = 1.35/1.65 = 0.818 = 81.8%
Key Insight

High recall (90%) means we catch most diseases. Lower precision (75%) means some false alarms. For disease detection, high recall is usually preferred.
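
If you want to verify these numbers yourself, they can be recomputed directly from the four cell counts. A minimal Python sketch (the variable names are just the counts from the table above, not part of any library):

```python
# Recompute Problem 1's metrics from the confusion-matrix counts.
TP, FN, FP, TN = 45, 5, 15, 135
total = TP + FN + FP + TN  # 200

accuracy = (TP + TN) / total                        # 0.90
precision = TP / (TP + FP)                          # 0.75
recall = TP / (TP + FN)                             # 0.90
f1 = 2 * precision * recall / (precision + recall)  # ~0.818

print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} "
      f"Recall={recall:.2f} F1={f1:.3f}")
```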

Statistics

Problem 2


Dataset: 12, 15, 18, 22, 25, 28, 30
Calculate: (a) Mean (b) Median (c) Range
Solution
(a) Mean = Sum/Count = (12+15+18+22+25+28+30)/7 = 150/7 = 21.43
(b) Median = middle value (already sorted); 7 values → middle is the 4th value = 22
(c) Range = Max - Min = 30 - 12 = 18
Key Insight

The mean (21.43) and median (22) are close, suggesting a roughly symmetric distribution.
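
For a quick check, Python's standard library reproduces these values; a minimal sketch:

```python
# Recompute Problem 2's summary statistics with the standard library.
import statistics

data = [12, 15, 18, 22, 25, 28, 30]

mean = statistics.mean(data)         # 150 / 7 ≈ 21.43
median = statistics.median(data)     # 4th of 7 sorted values = 22
value_range = max(data) - min(data)  # 30 - 12 = 18

print(mean, median, value_range)
```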

Scaling

Problem 3


Feature values: [100, 200, 300, 400, 500]
Scale the value x=300 using: (a) MinMaxScaler (b) StandardScaler (mean=300, std=141.4)
Solution
(a) MinMaxScaler: min=100, max=500
x_scaled = (300-100)/(500-100) = 200/400 = 0.5
(b) StandardScaler:
x_scaled = (300-300)/141.4 = 0/141.4 = 0.0
Key Insight

After StandardScaler, x=300 (the mean) becomes 0. After MinMaxScaler, x=300 (middle value) becomes 0.5.
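
The same result can be reproduced by writing out the formulas the two scalers apply (StandardScaler divides by the population standard deviation, which is where 141.4 comes from). A minimal sketch in plain Python:

```python
# Scale x = 300 using the Min-Max and standardization formulas.
values = [100, 200, 300, 400, 500]
x = 300

# Min-Max scaling: (x - min) / (max - min)
minmax_scaled = (x - min(values)) / (max(values) - min(values))   # 0.5

# Standardization: (x - mean) / population std
mean = sum(values) / len(values)                                   # 300
std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5  # ≈ 141.42
standard_scaled = (x - mean) / std                                 # 0.0

print(minmax_scaled, standard_scaled)
```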

Class Imbalance

Problem 4


A training set has 950 normal samples and 50 fraud samples.
(a) What is the class imbalance ratio?
(b) If the model predicts ALL samples as "normal", what is its accuracy?
(c) Why is accuracy misleading here?
Solution
(a) Imbalance ratio = 950:50 = 19:1 (or 95% vs 5%)
(b) If we predict everything as normal:
- Correct: 950 normal predictions
- Wrong: 50 frauds missed
- Accuracy = 950/1000 = 95%
(c) 95% accuracy sounds great, but:
- Precision for fraud = 0% (no fraud is ever predicted)
- Recall for fraud = 0%
- We catch ZERO frauds!
Key Insight

With imbalanced data, use precision, recall, and F1 instead of accuracy.
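
A tiny sketch of the arithmetic behind this problem (the "model" here is just the trivial predict-everything-normal rule):

```python
# Why 95% accuracy is misleading when the model predicts "normal" for everything.
n_normal, n_fraud = 950, 50
total = n_normal + n_fraud

accuracy = n_normal / total             # 0.95: every normal sample is "correct"
frauds_caught = 0                       # the all-normal model never flags fraud
fraud_recall = frauds_caught / n_fraud  # 0.0: zero frauds detected

print(f"accuracy={accuracy:.0%}, fraud recall={fraud_recall:.0%}")
```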

Model Comparison

Problem 5


Two models tested:
- Model A: Accuracy=92%, Precision=85%, Recall=75%
- Model B: Accuracy=88%, Precision=70%, Recall=95%

Which model is better for disease detection?
Solution
For disease detection, missing a disease (False Negative) is dangerous.
Model A: Recall=75% → misses 25% of diseases
Model B: Recall=95% → misses only 5% of diseases
Model B is better because:
- Higher recall (catches more diseases)
- Even though precision is lower (more false alarms), it's better to investigate a false alarm than miss a disease
F1 scores:
- A: 2×(0.85×0.75)/(0.85+0.75) = 0.80
- B: 2×(0.70×0.95)/(0.70+0.95) = 0.81
Key Insight

Choose the metric based on the business need: recall for disease/fraud detection, precision for spam filters.
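
The F1 comparison is easy to reproduce; a minimal sketch (the helper function name is my own, not a library call):

```python
# Compare the two models' F1 scores from their reported precision and recall.
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

model_a = f1_score(precision=0.85, recall=0.75)  # ≈ 0.80
model_b = f1_score(precision=0.70, recall=0.95)  # ≈ 0.81

print(f"Model A F1={model_a:.2f}, Model B F1={model_b:.2f}")
```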
