COMP4702 Lecture 2

General Concept

$$\mathcal{T}=\{\mathbf{x}_i, y_i\}_{i=1}^{n}$$

$$\mathbf{x} = \begin{bmatrix}x_1 & x_2 & \cdots & x_p\end{bmatrix}^T$$
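
As a concrete illustration of this notation, the training set can be stored as an $n \times p$ matrix of inputs together with a length-$n$ vector of outputs. A minimal NumPy sketch (all numbers are made up):

```python
import numpy as np

# Toy training set T = {(x_i, y_i)}_{i=1}^n with n = 4 examples,
# each described by p = 2 input features.
X = np.array([[3.5, 0.8],   # row i is the input vector x_i^T
              [4.1, 0.3],
              [2.9, 0.9],
              [5.0, 0.5]])
y = np.array([1, 0, 1, 0])  # outputs y_i (here: class labels)

n, p = X.shape              # n = 4 training points, p = 2 features
```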

Example - Classifying Songs

Figure 1 - Length vs Perceived Energy of Songs


Example - Car Stopping Distances

Figure 2 - Car Stopping Distances Example

$k$-Nearest Neighbours


$k$-Nearest Neighbours for Classification

Figure 3  - Table of Values for KNN Classifier Example

Figure 4 - KNN Classifier for k=1 and k=3

$k$-Nearest Neighbours Pseudocode

Data: Training data, denoted $\{\mathbf{x}_i, y_i\}_{i=1}^{n}$, with test input denoted $\mathbf{x}_\star$

Result: Predicted test output, $\hat{y}(\mathbf{x}_\star)$

  1. Compute the distances $\|\mathbf{x}_i-\mathbf{x}_\star\|_2$ for all training points $i=1, \cdots, n$
  2. Let $\mathcal{N}_\star=\{i: \mathbf{x}_i \text{ is one of the } k \text{ data points closest to } \mathbf{x}_\star\}$
  3. Compute the prediction $\hat{y}(\mathbf{x}_\star)$ using the following formula:

$$\hat{y}(\mathbf{x}_\star)=\begin{cases}\text{Average}\{y_j : j\in\mathcal{N}_\star\} & \text{(Regression problems)}\\ \text{MajorityVote}\{y_j : j\in\mathcal{N}_\star\} & \text{(Classification problems)}\end{cases}\tag{7}$$
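
A minimal NumPy implementation of this pseudocode is sketched below (the function name and the `task` argument are my own); it covers both cases of Eq. (7).

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, x_star, k, task="classification"):
    """k-NN prediction for a single test input x_star, given training data (X, y)."""
    # 1. Compute the Euclidean distances ||x_i - x_star||_2 for all training points.
    dists = np.linalg.norm(X - x_star, axis=1)

    # 2. N_star: indices of the k training points closest to x_star.
    neighbours = np.argsort(dists)[:k]

    # 3. Combine the neighbours' outputs according to Eq. (7).
    if task == "regression":
        return np.mean(y[neighbours])                    # Average
    return Counter(y[neighbours]).most_common(1)[0][0]   # MajorityVote
```

For a classification problem, `knn_predict(X, y, x_star, k=3)` returns the most common label among the three nearest training points; passing `task="regression"` returns their average output instead.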

Figure 5 - KNN Decision Boundary for varying k-values

Figure 6 - KNN Decision Boundary for varying k-values

$k$-Nearest Neighbours for Regression

Figure 7 - KNN Regression Problem

Decision Trees

Learning a Regression Tree

$$\hat{y}(\mathbf{x}_\star)=\sum_{\ell=1}^{L} \hat{y}_\ell \,\mathbb{I}\{\mathbf{x}_\star \in R_\ell\}$$
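
In words, the sum picks out the single region $R_\ell$ that contains $\mathbf{x}_\star$ and returns that region's constant prediction $\hat{y}_\ell$. A tiny sketch with a hypothetical one-dimensional partition into $L = 3$ regions:

```python
import numpy as np

# Hypothetical partition of a 1-D input space into L = 3 regions,
# each with its own constant prediction y_hat_l.
regions = [(-np.inf, 2.0), (2.0, 5.0), (5.0, np.inf)]   # R_1, R_2, R_3
y_hat   = [1.3, 0.4, 2.1]                               # corresponding predictions

def predict(x_star):
    # y_hat(x_star) = sum_l y_hat_l * I{x_star in R_l}; exactly one indicator is 1.
    return sum(y_l * (lo <= x_star < hi)
               for (lo, hi), y_l in zip(regions, y_hat))

print(predict(3.7))   # x_star falls in R_2, so the prediction is 0.4
```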

Recursive Binary Splitting Algorithm

$$R_1(j,s)=\{\mathbf{x} \mid x_j < s\} \quad\text{and}\quad R_2(j,s)=\{\mathbf{x} \mid x_j \ge s\}\tag{2.4}$$

$$\sum_{i:\, \mathbf{x}_i\in R_1(j,s)} \left(y_i-\hat{y}_1(j,s)\right)^2 + \sum_{i:\, \mathbf{x}_i\in R_2(j,s)} \left(y_i-\hat{y}_2(j,s)\right)^2\tag{2.5}$$

Figure 8 - Possible splits for decision tree problem
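
The search implied by Eqs. (2.4) and (2.5) can be carried out exhaustively: try every input variable $j$ and every candidate threshold $s$, and keep the pair with the smallest loss. A small worked sketch on made-up one-dimensional data (the helper name and the choice of candidate thresholds are my own):

```python
import numpy as np

# Made-up 1-D regression data (p = 1 input variable, n = 5 points).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 0.9, 1.0, 3.1, 2.9])

def split_loss(s):
    """Squared-error criterion of Eq. (2.5) for the split x_1 < s."""
    left, right = y[x < s], y[x >= s]   # data falling in R_1(1, s) and R_2(1, s), Eq. (2.4)
    return (np.sum((left - left.mean()) ** 2) +
            np.sum((right - right.mean()) ** 2))

# Try every midpoint between consecutive x-values as the threshold s.
for s in (x[:-1] + x[1:]) / 2:
    print(f"s = {s:.1f}  loss = {split_loss(s):.3f}")
# The loss is smallest at s = 3.5, which separates the low y-values from the high ones.
```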

Regression Tree Algorithm

Goal: Learn a decision tree using recursive binary splitting:
Data: Training data $\mathcal{T}=\{\mathbf{x}_i, y_i\}_{i=1}^n$
Result: Decision tree with regions $R_1, \cdots, R_L$ and corresponding predictions $\hat{y}_1, \cdots, \hat{y}_L$

  1. Let $R$ denote the whole input space
  2. Compute the regions $(R_1, \cdots, R_L)=\text{Split}(R, \mathcal{T})$
  3. Compute the predictions $\hat{y}_\ell$ for $\ell\in\{1, \cdots, L\}$ as:

$$\hat{y}_\ell=\begin{cases} \text{Average}\{y_i : \mathbf{x}_i \in R_\ell\} & \text{(Regression Problems)}\\ \text{MajorityVote}\{y_i : \mathbf{x}_i\in R_\ell\} & \text{(Classification Problems)}\end{cases}$$
Function Split(R, 𝒯):
    if (stopping criterion fulfilled)
        return R
    else
        Go through all possible splits xⱼ < s for all input variables j = 1, ..., p
        Pick the pair (j, s) that minimises the loss function for the regression/classification problem (Eq. 2.5 in the regression case)
        Split the region R into R₁ and R₂ according to Eq. 2.4
        Split the data 𝒯 into 𝒯₁ and 𝒯₂ accordingly
        return Split(R₁, 𝒯₁), Split(R₂, 𝒯₂)
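
A minimal runnable version of this algorithm is sketched below, assuming a simple stopping criterion (a maximum depth and a minimum number of points per region, which the notes leave unspecified). It builds the regression tree as nested dictionaries and includes a matching prediction routine.

```python
import numpy as np

def split(X, y, depth=0, max_depth=3, min_points=2):
    """Recursive binary splitting for a regression tree (returns a nested dict)."""
    # Stopping criterion (assumed here): depth limit reached or too few points.
    if depth >= max_depth or len(y) < min_points:
        return {"prediction": y.mean()}            # leaf: Average{y_i in this region}

    # Go through all possible splits x_j < s and pick the pair (j, s)
    # minimising the squared-error loss of Eq. (2.5).
    best_j, best_s, best_loss = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left = X[:, j] < s
            if not left.any() or left.all():       # skip splits that leave a region empty
                continue
            loss = (np.sum((y[left] - y[left].mean()) ** 2) +
                    np.sum((y[~left] - y[~left].mean()) ** 2))
            if loss < best_loss:
                best_j, best_s, best_loss = j, s, loss

    if best_j is None:                             # no valid split exists
        return {"prediction": y.mean()}

    # Split the region and the data according to Eq. (2.4), then recurse.
    left = X[:, best_j] < best_s
    return {"j": best_j, "s": best_s,
            "left":  split(X[left],  y[left],  depth + 1, max_depth, min_points),
            "right": split(X[~left], y[~left], depth + 1, max_depth, min_points)}

def predict(tree, x):
    """Route x down the tree to its leaf region and return that region's prediction."""
    while "prediction" not in tree:
        tree = tree["left"] if x[tree["j"]] < tree["s"] else tree["right"]
    return tree["prediction"]
```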

Classification Tree Algorithm

The same recursive binary splitting procedure is used to learn a classification tree: when choosing each split $(j, s)$, the squared-error criterion of Eq. 2.5 is replaced by a loss function suited to classification, and each leaf region predicts by majority vote, as in the prediction formula above.

Figure 9 - Effect of varying the depth of the decision tree. Right: decision tree with unbounded depth, which over-fits the data.

Figure 10 - Effect of varying the depth of the decision tree. Right: decision tree with unbounded depth, which over-fits the data.
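
The behaviour in Figures 9 and 10 can be reproduced with an off-the-shelf implementation. The scikit-learn sketch below (scikit-learn, the toy dataset, and the chosen depths are my own, not part of the lecture) compares a depth-limited classification tree with an unbounded one on noisy labels:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))               # toy 2-D inputs
y = (X[:, 0] * X[:, 1] > 0).astype(int)     # simple ground-truth labels
y[rng.random(200) < 0.1] ^= 1               # flip 10% of the labels as noise

shallow = DecisionTreeClassifier(max_depth=2).fit(X, y)
deep = DecisionTreeClassifier(max_depth=None).fit(X, y)   # unbounded depth

# The unbounded tree fits the training data (noise included) almost perfectly,
# carving out a region for every stray point -- a symptom of over-fitting.
print("training accuracy:", shallow.score(X, y), deep.score(X, y))
```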