Complete Audio Review

Rapid-fire review of all key concepts, formulas, and tips

Essential Formulas

Mean (Average)
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Variance
\[ \sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n} \]
Standard Deviation
\[ \sigma = \sqrt{\sigma^2} \]
Accuracy
\[ Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \]
Precision
\[ Precision = \frac{TP}{TP + FP} \]
Recall (Sensitivity)
\[ Recall = \frac{TP}{TP + FN} \]
F1 Score
\[ F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall} \]
Specificity
\[ Specificity = \frac{TN}{TN + FP} \]
StandardScaler
\[ x_{scaled} = \frac{x - \mu}{\sigma} \]
MinMaxScaler
\[ x_{scaled} = \frac{x - x_{min}}{x_{max} - x_{min}} \]
Pearson Correlation
\[ \rho = \frac{cov(X,Y)}{\sigma_X \cdot \sigma_Y} \]
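As a quick sanity check, the statistics formulas above can be computed directly with NumPy. This is a sketch on made-up data; the arrays are illustrative only:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = x.sum() / len(x)                       # x̄ = Σx_i / n        -> 5.0
variance = ((x - mean) ** 2).sum() / len(x)   # population σ²        -> 4.0
std = np.sqrt(variance)                       # σ = √σ²              -> 2.0

# Pearson correlation: ρ = cov(X, Y) / (σ_X · σ_Y)
y = 2 * x + 1                                 # perfectly linear, so ρ ≈ 1
rho = np.cov(x, y, bias=True)[0, 1] / (x.std() * y.std())
```

Note `bias=True` and the default `ddof=0` in `np.std` match the population formulas used above (dividing by n, not n-1).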

Confusion Matrix

                Predicted +           Predicted -
Actual +        TP (True Positive)    FN (Type II Error)
Actual -        FP (Type I Error)     TN (True Negative)
Precision
TP/(TP+FP) - Of predicted +, how many correct?
Recall
TP/(TP+FN) - Of actual +, how many found?
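The four cells and the derived metrics can be counted from raw predictions in a few lines of plain Python. A sketch on toy labels (1 = positive, 0 = negative):

```python
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 2
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 4

accuracy = (tp + tn) / (tp + tn + fp + fn)           # 7/10 = 0.7
precision = tp / (tp + fp)                           # 3/5  = 0.6
recall = tp / (tp + fn)                              # 3/4  = 0.75
f1 = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)                         # 4/6  ≈ 0.667
```

In practice `sklearn.metrics.confusion_matrix` does the counting, but the arithmetic is exactly this.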

ML Basics

Machine Learning
A field of AI that enables computers to learn from data without being explicitly programmed.
Supervised Learning
Learning from labeled data where both input (X) and output (Y) are provided. The model learns the function Y = f(X).
Unsupervised Learning
Learning from unlabeled data to find hidden patterns or structures.
Feature
An input variable (column) used to make predictions. Also called predictor, attribute, or independent variable.

Statistics

Mean
The average value. Sum all values and divide by count.
Median
The middle value when data is sorted. Less affected by outliers than mean.
Mode
The most frequently occurring value in a dataset.
Variance
Average of squared differences from the mean. Measures spread of data.

Data Preprocessing

EDA
Exploratory Data Analysis - examining data to summarize characteristics and find patterns.
One-Hot Encoding
Converting categorical variables to binary columns (0 or 1).
Label Encoding
Converting categories to numbers. May create false ordinal relationships.
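The difference between the two encodings is easiest to see side by side. A plain-Python sketch on a made-up categorical column:

```python
colors = ["red", "green", "blue", "green"]

# Label encoding: each category becomes an integer.
# Note the implied (false) ordering: blue < green < red.
categories = sorted(set(colors))                  # ['blue', 'green', 'red']
label_encoded = [categories.index(c) for c in colors]
# -> [2, 1, 0, 1]

# One-hot encoding: one binary column per category, no ordering implied.
one_hot = [[int(c == cat) for cat in categories] for c in colors]
# red   -> [0, 0, 1]
# green -> [0, 1, 0]
# blue  -> [1, 0, 0]
```

In practice `pandas.get_dummies` or scikit-learn's `OneHotEncoder` / `LabelEncoder` handle this.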
StandardScaler
Transforms data to have mean=0 and std=1.
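Both scalers from the formula section can be written out by hand. A NumPy sketch on a made-up feature column:

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# StandardScaler: (x - μ) / σ  -> mean 0, std 1
mu, sigma = x.mean(), x.std()         # population std, matching the formula
x_std = (x - mu) / sigma

# MinMaxScaler: (x - min) / (max - min)  -> range [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())
# -> [0.0, 0.25, 0.5, 0.75, 1.0]
```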

Model Evaluation

True Positive (TP)
Model correctly predicted positive class.
True Negative (TN)
Model correctly predicted negative class.
False Positive (FP)
Model incorrectly predicted positive. Type I Error.
False Negative (FN)
Model incorrectly predicted negative. Type II Error.

Algorithms

Linear Regression
Predicts continuous output as weighted sum of inputs: y = a + bx
Logistic Regression
Predicts probability of binary outcome using sigmoid function.
Decision Tree
Makes decisions by splitting data based on feature values. Easy to interpret.
Random Forest
Ensemble of decision trees, each trained on a random bootstrap sample of the data with a random subset of features considered at each split. A bagging method.
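For simple linear regression (y = a + bx), the coefficients have a closed-form least-squares solution. A minimal NumPy sketch on made-up points roughly following y = 1 + 2x:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

# Least squares: b = cov(x, y) / var(x),  a = ȳ - b·x̄
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
# a ≈ 1.05, b ≈ 1.99 — close to the generating line y = 1 + 2x
```

Libraries like scikit-learn's `LinearRegression` generalize this to many features.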

Optimization

Hyperparameter
Settings configured before training (not learned from data).
Parameter
Values learned during training.
Grid Search
Tests all combinations of hyperparameter values.
Overfitting
Model learns training data too well, fails on new data. High variance.
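Grid search is just a loop over every combination in the grid. A sketch using `itertools.product`; the hyperparameter names and the scoring function are illustrative stand-ins, not a real training run:

```python
from itertools import product

# Hypothetical hyperparameter grid (names are illustrative)
grid = {"max_depth": [2, 4, 8], "min_samples": [1, 5]}

def score(params):
    # Stand-in for "train a model, return validation score";
    # peaks at max_depth=4, min_samples=5 by construction.
    return -abs(params["max_depth"] - 4) - abs(params["min_samples"] - 5)

best_params, best_score = None, float("-inf")
for values in product(*grid.values()):      # all 3 × 2 = 6 combinations
    params = dict(zip(grid.keys(), values))
    s = score(params)
    if s > best_score:
        best_params, best_score = params, s
```

scikit-learn's `GridSearchCV` wraps this same loop around cross-validated model training.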

Algorithms - Scaling Required?

Algorithm                     Scaling?   Why
Linear/Logistic Regression    Yes        Gradient descent
SVM                           Yes        Distance-based
KNN                           Yes        Distance-based
Neural Network                Yes        Gradient descent
Decision Tree                 No         Split-based
Random Forest                 No         Tree-based
Naive Bayes                   No         Probability-based

Key Tips

Data Leakage Prevention

Always: Split → Fit on train → Transform both
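The Split → Fit on train → Transform both rule looks like this in a NumPy sketch (the data and split point are made up):

```python
import numpy as np

data = np.arange(10, dtype=float)      # toy feature column
train, test = data[:8], data[8:]       # split FIRST

# Fit the scaling statistics on the training set only...
mu, sigma = train.mean(), train.std()

# ...then apply them to both sets. The test set never influences
# mu or sigma, so no information leaks from test into training.
train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma
```

Fitting the scaler on all of `data` before splitting would leak the test set's distribution into training: the classic data-leakage mistake.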

Metric Selection

Recall for disease/fraud detection (don't miss positives)
Precision for spam filters (don't bother users with false alarms)

ML vs Deep Learning

ML: Less data, faster, manual features
DL: More data, GPU needed, auto features
