COMP4702 Lecture 11

Chapter 10: Generative Models

Gaussian Mixture Models and Discriminant Analysis

Mixture Densities

$$p({\bf x})=\sum_{i=1}^{k} p({\bf x}|G_i)\, P(G_i)$$

$$p({\bf x} | G_i) = \mathcal{N}({\bf x} | {\bf \mu}_i, {\bf \Sigma}_i)$$

$$p({\bf x}|\theta)=p({\bf x} | \lbrace{\bf \mu}_i, {\bf \Sigma}_i, P(G_i)\rbrace_{i=1}^{k})$$
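As a concrete illustration of the mixture density above, here is a minimal numpy sketch (the function names and the 3-component example parameters are my own, chosen purely for illustration):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate normal density N(x; mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

def gmm_density(x, weights, means, covs):
    """p(x) = sum_i P(G_i) * N(x; mu_i, Sigma_i) -- the mixture density."""
    return sum(w * gaussian_pdf(x, m, S)
               for w, m, S in zip(weights, means, covs))

# Illustrative 2-D mixture with 3 components (made-up parameters)
weights = [0.5, 0.3, 0.2]                       # P(G_i), must sum to 1
means = [np.array([0.0, 0.0]),
         np.array([3.0, 3.0]),
         np.array([-3.0, 2.0])]
covs = [np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 2.0])]

print(gmm_density(np.array([0.0, 0.0]), weights, means, covs))
```

Note that the mixture density at a point is dominated by the nearest component unless the components overlap heavily.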
Figure 1 - A Gaussian mixture model with 3 Gaussian distributions in a two-dimensional space.

Figure 2 - A Gaussian mixture model with 3 Gaussian distributions in a two-dimensional space.

Predicting Output Labels for New Inputs: Discriminant Analysis

Figure 3 - Difference in decision boundaries generated by LDA and QDA.
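The LDA/QDA difference in Figure 3 comes down to the covariance assumption: both model each class as a Gaussian, but LDA shares one pooled covariance across classes (giving a linear boundary) while QDA keeps a covariance per class (giving a quadratic boundary). A minimal numpy sketch under that view; the function names are my own:

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """Estimate per-class prior, mean and covariance (QDA-style fit)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    covs = np.array([np.cov(X[y == c].T, bias=True) for c in classes])
    return classes, priors, means, covs

def log_gauss(x, mu, Sigma):
    """Log-density of N(x; mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (diff @ np.linalg.solve(Sigma, diff)
                   + logdet + d * np.log(2 * np.pi))

def predict(x, classes, priors, means, covs, pool=False):
    """Pick the class with the largest log posterior.
    pool=False: per-class covariances (QDA, quadratic boundary).
    pool=True: one pooled covariance (LDA, linear boundary)."""
    if pool:
        pooled = np.average(covs, axis=0, weights=priors)
        covs = np.array([pooled] * len(classes))
    scores = [np.log(p) + log_gauss(x, m, S)
              for p, m, S in zip(priors, means, covs)]
    return classes[int(np.argmax(scores))]
```

With well-separated classes the two rules agree; they diverge when class covariances differ, which is exactly where the boundaries in Figure 3 bend apart.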

Semi-Supervised Learning of the Gaussian Mixture Model

$$
\begin{align*}
w_i(m) &=
\begin{cases}
p(y=m|{\bf x}_i, \hat\theta) & \text{if } y_i \text{ is missing}\\
1 & \text{if } y_i = m\\
0 & \text{otherwise}
\end{cases} \tag{10.10a}\\
\hat\pi_m &= \frac{1}{n} \sum_{i=1}^{n} w_i(m) \tag{10.10b}\\
\hat{\bf\mu}_m &= \frac{1}{\sum_{i=1}^{n} w_i(m)} \sum_{i=1}^{n} w_i(m)\, {\bf x}_i \tag{10.10c}\\
\hat{\bf\Sigma}_m &= \frac{1}{\sum_{i=1}^{n} w_i(m)} \sum_{i=1}^{n} w_i(m)\, ({\bf x}_i - \hat{\bf\mu}_m)({\bf x}_i - \hat{\bf\mu}_m)^T \tag{10.10d}
\end{align*}
$$
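Equations 10.10 translate almost directly into numpy. A minimal, illustrative implementation (the function names are my own, and missing labels are encoded as -1 here as a convention of this sketch):

```python
import numpy as np

def gauss_pdf(X, mu, Sigma):
    """N(x; mu, Sigma) evaluated row-wise on X, shape (n,)."""
    d = mu.shape[0]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    quad = np.einsum('nd,de,ne->n', diff, inv, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def responsibilities(X, y, pis, mus, Sigmas):
    """Eq. (10.10a): w_i(m) is the posterior p(y=m | x_i, theta-hat) for
    unlabelled points (y_i = -1 here) and a one-hot indicator otherwise."""
    n, M = X.shape[0], len(pis)
    lik = np.column_stack([pis[m] * gauss_pdf(X, mus[m], Sigmas[m])
                           for m in range(M)])
    post = lik / lik.sum(axis=1, keepdims=True)
    W = np.zeros((n, M))
    for i in range(n):
        if y[i] < 0:
            W[i] = post[i]       # missing label: soft assignment
        else:
            W[i, y[i]] = 1.0     # observed label: one-hot
    return W

def m_step(X, W):
    """Eqs. (10.10b-d): weighted prior, mean and covariance updates."""
    n, d = X.shape
    pis = W.sum(axis=0) / n
    mus, Sigmas = [], []
    for m in range(W.shape[1]):
        w = W[:, m]
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        Sigma = (w[:, None] * diff).T @ diff / w.sum()
        mus.append(mu)
        Sigmas.append(Sigma)
    return pis, np.array(mus), np.array(Sigmas)
```

Labelled points contribute with weight exactly 0 or 1, so with a fully labelled dataset these updates reduce to the ordinary per-class maximum-likelihood estimates.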

Cluster Analysis


**Data:** Unlabelled training data $\mathcal{T}=\lbrace{\bf x}_i\rbrace_{i=1}^{n}$, number of clusters $M$. **Result:** Gaussian Mixture Model.

  1. Initialise $\hat{\bf\theta}=\lbrace\hat\pi_m, \hat{\bf\mu}_m, \hat{\bf\Sigma}_m\rbrace_{m=1}^M$
  2. repeat
  3.  |   For each ${\bf x}_i$ in $\mathcal{T}=\lbrace{\bf x}_i\rbrace_{i=1}^{n}$, compute the prediction $p(y|{\bf x}_i, \hat{\bf\theta})$ from Equation 10.5 with the current parameter estimate $\hat{\bf\theta}$.
  4.  |   Update the parameter estimates $\hat{\bf\theta}\leftarrow\lbrace\hat\pi_m, \hat{\bf\mu}_m, \hat{\bf\Sigma}_m\rbrace_{m=1}^M$ using Equations 10.10.
  5. until convergence
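The loop above can be sketched as a compact numpy EM routine for the fully unlabelled case (all $w_i(m)$ come from the posterior, Eq. 10.10a). This is my own illustrative implementation, not code from the lecture; the `init_means` parameter and the small diagonal regulariser are choices I added for numerical robustness:

```python
import numpy as np

def em_gmm(X, M, iters=50, seed=0, init_means=None):
    """EM for an unsupervised GMM: E-step computes p(y=m|x_i) for every
    point (all labels missing), M-step applies the weighted updates of
    Equations 10.10b-d."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialise means at random data points unless given explicitly
    mus = (init_means.copy() if init_means is not None
           else X[rng.choice(n, M, replace=False)].astype(float))
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(M)])
    pis = np.full(M, 1.0 / M)
    for _ in range(iters):
        # E-step: responsibilities p(y=m | x_i, theta-hat)
        lik = np.empty((n, M))
        for m in range(M):
            diff = X - mus[m]
            inv = np.linalg.inv(Sigmas[m])
            quad = np.einsum('nd,de,ne->n', diff, inv, diff)
            lik[:, m] = (pis[m] * np.exp(-0.5 * quad)
                         / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigmas[m])))
        W = lik / lik.sum(axis=1, keepdims=True)
        # M-step: Equations 10.10b-d with all labels missing
        Nm = W.sum(axis=0)
        pis = Nm / n
        mus = (W.T @ X) / Nm[:, None]
        for m in range(M):
            diff = X - mus[m]
            Sigmas[m] = (W[:, m, None] * diff).T @ diff / Nm[m] + 1e-6 * np.eye(d)
    return pis, mus, Sigmas
```

Each iteration is guaranteed not to decrease the likelihood, but EM only finds a local optimum, so the result depends on the initialisation.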

k-Means Clustering
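For reference alongside the GMM approach, a minimal sketch of the standard k-means (Lloyd's) algorithm, which can be seen as a hard-assignment limit of the EM loop above. This is my own illustrative code; the optional `init` argument is an addition for reproducibility, not part of the lecture material:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0, init=None):
    """Alternate two steps until assignments stabilise:
    1) assign each point to its nearest centroid (hard assignment),
    2) recompute each centroid as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = (init.astype(float).copy() if init is not None
                 else X[rng.choice(len(X), k, replace=False)].astype(float))
    for _ in range(iters):
        # Distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep a centroid in place if its cluster became empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

Unlike the GMM, k-means has no covariances or mixing proportions: every cluster is implicitly spherical with equal weight, which is why it can fail where a full GMM succeeds.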

Choosing the Number of Clusters