Gaussian Discriminant Analysis (GDA) — Multiclass
Table of Contents
- Introduction
- Generative Assumptions
- Model Derivation
- Linear Discriminant Analysis (LDA)
- Quadratic Discriminant Analysis (QDA)
- Comparison: LDA vs QDA
- Conclusion
Introduction
GDA is a generative classifier: it models the joint distribution \(P(x, y)\) and uses Bayes' rule to derive the posterior \(P(y \mid x)\) for classification.
GDA assumes that the feature distribution conditioned on the class is Gaussian:
$$ P(x \mid y = k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right) $$
This generalizes easily to the multiclass setting, where:
- \(x \in \mathbb{R}^d\) is the feature vector
- \(y \in \{0, 1, ..., K-1\}\) is the class label
- \(\phi_k = P(y = k)\) is the prior for class \(k\)
- \(\mu_k \in \mathbb{R}^d\) and \(\Sigma_k \in \mathbb{R}^{d \times d}\) are the mean and covariance of class \(k\)
Generative Assumptions
Let the training data be:
$$ \{(x_i, y_i)\}_{i=1}^{n}, \quad x_i \in \mathbb{R}^d, \quad y_i \in \{0, ..., K-1\} $$
The generative process is:
- \(y \sim \text{Categorical}(\phi_0, ..., \phi_{K-1})\)
- \(x \mid y = k \sim \mathcal{N}(\mu_k, \Sigma_k)\) for each class \(k \in \{0, ..., K-1\}\)
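To make the generative story concrete, here is a minimal sampling sketch in NumPy; the priors, means, and covariances are illustrative values chosen for this example, not quantities from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for K = 3 classes in d = 2 dimensions.
phi = np.array([0.5, 0.3, 0.2])                              # class priors, sum to 1
mus = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])        # class means
Sigmas = np.stack([np.eye(2), 0.5 * np.eye(2), np.eye(2)])   # class covariances

def sample(n):
    """Draw n points via y ~ Categorical(phi), then x | y = k ~ N(mu_k, Sigma_k)."""
    ys = rng.choice(len(phi), size=n, p=phi)
    xs = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ys])
    return xs, ys

X, y = sample(500)
```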
Model Derivation
Posterior using Bayes' Theorem:
$$ P(y = k \mid x) = \frac{P(x \mid y = k)\, P(y = k)}{\sum_{j=0}^{K-1} P(x \mid y = j)\, P(y = j)} $$
We predict the class with the highest posterior; since the denominator does not depend on \(k\), it can be dropped:
$$ \hat{y} = \arg\max_{k} P(y = k \mid x) = \arg\max_{k}\, P(x \mid y = k)\, \phi_k $$
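A small sketch of this posterior computation using SciPy's multivariate normal density; the function names `posterior` and `predict` are my own, purely illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def posterior(x, phi, mus, Sigmas):
    """P(y = k | x) for each class k, via Bayes' rule."""
    # Unnormalized posterior: P(x | y = k) * P(y = k)
    joint = np.array([
        multivariate_normal.pdf(x, mean=mus[k], cov=Sigmas[k]) * phi[k]
        for k in range(len(phi))
    ])
    return joint / joint.sum()   # normalize by the evidence P(x)

def predict(x, phi, mus, Sigmas):
    """Pick the class with the highest posterior probability."""
    return np.argmax(posterior(x, phi, mus, Sigmas))
```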
Linear Discriminant Analysis (LDA)
Assumptions
- All class-conditional distributions share the same covariance matrix: \(\Sigma_k = \Sigma\)
- Only the class means \(\mu_k\) differ.
Decision Function
Each class has a discriminant function. Because \(\Sigma\) is shared, the term \(x^\top \Sigma^{-1} x\) in the log posterior is the same for every class and cancels, leaving a function linear in \(x\):
$$ \delta_k(x) = x^\top \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^\top \Sigma^{-1} \mu_k + \log \phi_k $$
Predict:
$$ \hat{y} = \arg\max_{k} \delta_k(x) $$
Parameter Estimation (MLE)
From data, estimate:
- Class priors:
$$ \phi_k = \frac{n_k}{n}, \quad \text{where } n_k = \sum_{i=1}^n \mathbb{1}\{y_i = k\} $$
- Class means:
$$ \mu_k = \frac{1}{n_k} \sum_{i: y_i = k} x_i $$
- Shared covariance matrix:
$$ \Sigma = \frac{1}{n} \sum_{k=0}^{K-1} \sum_{i: y_i = k} (x_i - \mu_k)(x_i - \mu_k)^\top $$
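These estimators translate directly into a few lines of NumPy. Below is a minimal sketch of LDA fitting and prediction under the formulas above; the function names are hypothetical, not a library API:

```python
import numpy as np

def fit_lda(X, y, K):
    """MLE for LDA: per-class priors and means, one pooled covariance."""
    n, d = X.shape
    phi = np.array([np.mean(y == k) for k in range(K)])          # phi_k = n_k / n
    mus = np.array([X[y == k].mean(axis=0) for k in range(K)])   # mu_k
    # Pooled covariance: sum outer products of within-class deviations, divide by n.
    Sigma = np.zeros((d, d))
    for k in range(K):
        Z = X[y == k] - mus[k]
        Sigma += Z.T @ Z
    Sigma /= n
    return phi, mus, Sigma

def predict_lda(X, phi, mus, Sigma):
    """Argmax over the linear discriminants delta_k(x)."""
    Sinv = np.linalg.inv(Sigma)
    # delta_k(x) = x^T Sinv mu_k - 0.5 mu_k^T Sinv mu_k + log phi_k
    deltas = X @ Sinv @ mus.T - 0.5 * np.sum(mus @ Sinv * mus, axis=1) + np.log(phi)
    return np.argmax(deltas, axis=1)
```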
Quadratic Discriminant Analysis (QDA)
Assumptions
- Each class has its own covariance matrix: \(\Sigma_k\)
- This yields quadratic decision boundaries.
Decision Function
Each class has a quadratic discriminant. Taking the log posterior and dropping only class-independent constants leaves a function quadratic in \(x\):
$$ \delta_k(x) = -\frac{1}{2} \log |\Sigma_k| - \frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) + \log \phi_k $$
Predict:
$$ \hat{y} = \arg\max_{k} \delta_k(x) $$
Parameter Estimation (MLE)
For each class \(k\):
- Class prior:
$$ \phi_k = \frac{n_k}{n} $$
- Class mean:
$$ \mu_k = \frac{1}{n_k} \sum_{i: y_i = k} x_i $$
- Class-specific covariance:
$$ \Sigma_k = \frac{1}{n_k} \sum_{i: y_i = k} (x_i - \mu_k)(x_i - \mu_k)^\top $$
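A matching QDA sketch, assuming the same conventions as the LDA code above; the only changes are the per-class covariances and the extra \(\log|\Sigma_k|\) term in the discriminant:

```python
import numpy as np

def fit_qda(X, y, K):
    """MLE for QDA: priors, means, and one covariance matrix per class."""
    phi = np.array([np.mean(y == k) for k in range(K)])
    mus = np.array([X[y == k].mean(axis=0) for k in range(K)])
    # bias=True gives the 1/n_k (MLE) normalization rather than 1/(n_k - 1).
    Sigmas = np.array([np.cov(X[y == k].T, bias=True) for k in range(K)])
    return phi, mus, Sigmas

def predict_qda(X, phi, mus, Sigmas):
    """Argmax over the quadratic discriminants delta_k(x)."""
    K = len(phi)
    deltas = np.empty((X.shape[0], K))
    for k in range(K):
        Sinv = np.linalg.inv(Sigmas[k])
        Z = X - mus[k]
        # delta_k(x) = -0.5 log|Sigma_k| - 0.5 (x - mu_k)^T Sinv (x - mu_k) + log phi_k
        _, logdet = np.linalg.slogdet(Sigmas[k])
        deltas[:, k] = -0.5 * logdet - 0.5 * np.sum(Z @ Sinv * Z, axis=1) + np.log(phi[k])
    return np.argmax(deltas, axis=1)
```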
Comparison: LDA vs QDA
| Feature | LDA | QDA |
|---|---|---|
| Covariance Assumption | Same for all classes | Unique per class |
| Decision Boundary | Linear | Quadratic |
| Parameters Estimated | \(\mu_k, \phi_k, \Sigma\) | \(\mu_k, \phi_k, \Sigma_k\) |
| Complexity | Simpler, lower variance | More flexible, but can overfit on small data |
| Ideal Use Case | Classes share similar spread (covariance) | Classes have distinct covariance structures |
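In practice, both classifiers are available off the shelf, for example in scikit-learn. A quick comparison on synthetic data (the blob dataset below is just for illustration):

```python
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import train_test_split

# Three Gaussian blobs in 2D; QDA can exploit differing spreads, LDA cannot.
X, y = make_blobs(n_samples=600, centers=3, n_features=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "accuracy:", model.score(X_te, y_te))
```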
Conclusion
Multiclass GDA extends the binary case by modeling each class \(k \in \{0, ..., K-1\}\) with its own mean and covariance (for QDA) or a shared covariance (for LDA). The model remains clear and interpretable:
- LDA gives a linear classifier, computationally efficient and robust.
- QDA is more expressive, better when covariances differ.
Both have closed-form MLE solutions, making them fast to fit. Keep in mind that QDA's per-class covariances require enough samples per class to estimate reliably, so LDA is often the safer choice when \(d\) is large relative to \(n\).