COMP4702 Week 1
The lecture content for this course follows Lindholm22, together with summary content and additional commentary created by Dr Marcus Gallagher. This lecture covers Chapter 1.
1.0 - Introduction
Machine Learning
- Machine learning is a form of data analysis in which (part of) the analysis is automated. This results in a computer program that is created (at least partly) by learning from data.
- At the heart of this computer program is a mathematical model
- This means that generic computer programs / mathematical models can be used in specific applications just by changing the training data.
- The mathematical model describes the relationship between the various variables / inputs to the mathematical model that are combined to yield the output.
- The mathematical model is a compact representation of the data, that captures the key properties of the phenomenon we are studying.
- The machine learning model could be a mathematical equation with coefficients.
- We use the data to figure out the best coefficients to use - this is the "Machine Learning" component.
- The type of model to use is dependent on the type of available data, and the type of problem.
- The learning process of a mathematical model requires a learning algorithm which is capable of automatically adjusting the parameters of the model to agree with the data.
- For machine learning, require three key components:
- The data
- The mathematical model
- The learning algorithm
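The three components above can be made concrete with a minimal sketch (the data and the true relationship here are invented for illustration): a tiny dataset, a linear model y ≈ w·x + b with unknown coefficients, and least squares as the learning algorithm that adjusts those coefficients to agree with the data.

```python
import numpy as np

# Component 1 - the data: invented (x, y) pairs that roughly follow y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Component 2 - the model: y_hat = w * x + b, with unknown coefficients w and b.
# Component 3 - the learning algorithm: ordinary least squares.
X = np.column_stack([x, np.ones_like(x)])   # design matrix [x, 1]
w, b = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"learned w={w:.2f}, b={b:.2f}")  # close to the underlying w=2, b=1
```

Changing only the training data would make the same generic program fit a different relationship, which is exactly the point made above.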
1.1 - Example: Automatically Diagnosing Heart Abnormalities
- Given ECG recordings of the electrical activity of a person's heart.
- This is a more complicated example, in that we assume the data-point at time t depends on the data-points at previous times (the ECG signal is a time series).
- A cardiologist gains insight about the condition of the heart (which can be used to diagnose the patient's condition).
- We can attempt to automate this analysis process - can we construct a computer program which reads ECG signals and returns a prediction regarding the normality (or abnormality) of the heart?
- Creating a traditional computer program to do this is quite complicated - it is not obvious which mathematical operations / computations are required to return the diagnosis.
- Instead, we can teach the computer to perform this analysis by example, through the construction of a machine learning program.
- Instead of requiring a set of rules for how to classify an ECG signal as normal or abnormal, we can ask cardiologists to label a large number of ECG signals with labels corresponding to the underlying heart condition.
- This is a much easier (although tedious) way for cardiologists to communicate their experience, and encode it in a way that is interpretable by a computer.
- Then, the learning algorithm is responsible for adapting the computer program so that its predictions agree with the cardiologists' labels.
- The hope is that if the mathematical model succeeds on the training data (where we already know the answer), then it should be possible to trust the predictions made by the program on previously unseen data (where we don't already know the answer).
- We want the model to be able to generalise (in this case, be able to label ECG data that it hasn't seen before).
- This is an example of supervised learning, in which the model was trained using training data where experts had labelled examples of each class of ECG signal.
- The model's learning is supervised by the domain expert through their labelling
- This is also an example of a classification problem, in which the output of the model is one of a fixed number of categories (represented by output values).
- This is in contrast to a regression problem, in which the output of the model is a numerical value (demonstrated in the next example)

Left: Values for the unknown parameters of the model are set by the learning algorithm such that the model best describes the available training data.
Right: The learned model is used on new, previously unseen data, in which we hope to obtain a correct classification. It is essential that the model is able to generalise to new data (that is not present in the training data).
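The two phases described in the caption can be sketched with a deliberately simple nearest-centroid classifier (all data and labels here are invented for illustration; a real ECG model would be far more complex). The learning phase sets the model's parameters (one centroid per class) from labelled training data; the prediction phase labels previously unseen points.

```python
import numpy as np

# Labelled training data: two invented 2-D classes (0 = "normal", 1 = "abnormal").
train_x = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])

# Learning phase: the model's parameters are one centroid (mean) per class.
centroids = np.array([train_x[train_y == c].mean(axis=0) for c in (0, 1)])

def predict(point):
    """Prediction phase: label an unseen point by its nearest centroid."""
    dists = np.linalg.norm(centroids - point, axis=1)
    return int(np.argmin(dists))

print(predict(np.array([0.1, 0.0])))  # lands near the class-0 centroid
print(predict(np.array([1.1, 0.9])))  # lands near the class-1 centroid
```

Whether the classifier generalises depends on whether new points follow the same pattern as the training data.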
1.2 - Example: Formation Energy of Crystals
- Want to discover new materials and their properties. One of these key properties for crystalline structures is the formation energy (the energy required to form the crystalline structure from individual component elements).
- There is a classical way to compute the formation energy (using density functional theory, DFT).
- Based on quantum mechanical modelling, it is computationally expensive.
- If we can train machine learning models to predict the energy of new crystals, many more materials could be investigated.
- As we are predicting a numerical value, this is an example of a regression problem.
- The correct outputs for the training set come from an (expensive) simulation rather than a human expert.
- One difference between Example 1.1 and Example 1.2 is that the ECG model is asked to predict a certain class, whereas the materials discovery model is asked to predict a numerical value.
- Study two key kinds of prediction problems, referred to as classification and regression problems.
- While conceptually similar, often use slight variations of the underlying mathematical model depending on the problem type.
- However, these are both types of supervised learning problems - we train a predictive model to mimic the predictions made by some form of supervisor.
- Interesting to note that the predictions made are not necessarily done by a human domain expert.
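The idea of a simulation acting as the (expensive) supervisor for a cheap regression model can be sketched as follows. The "simulator" and the training inputs here are invented stand-ins; the regression model is a simple polynomial fit rather than anything DFT-specific.

```python
import numpy as np

# Stand-in for an expensive simulator (invented: the true relation is quadratic).
def expensive_simulation(x):
    return 0.5 * x**2 - x + 3.0

# Training set: run the simulator on a few inputs (the "supervisor" labels them).
train_x = np.linspace(0.0, 4.0, 9)
train_y = expensive_simulation(train_x)

# Regression model: fit a quadratic; the coefficients are learned from the data.
coeffs = np.polyfit(train_x, train_y, deg=2)

# Predict at a new input instead of re-running the simulator.
x_new = 2.5
y_pred = np.polyval(coeffs, x_new)
print(f"predicted {y_pred:.3f}, simulated {expensive_simulation(x_new):.3f}")
```

Once trained, the model answers queries at a fraction of the simulator's cost, which is what makes screening many candidate materials feasible.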
1.3 - Uncertainty in Machine Learning: Probabilistic Modelling
- Will make use of statistics and probability to describe the models used for making predictions
- Using probabilistic models allows us to systematically represent, and cope with, the uncertainty in the predictions made.
- Even in situations where there is a correct answer, ML models rely on various assumptions, and are trained from data using computational learning algorithms.
- With probabilistic models, we are able to represent the uncertainty in the model's predictions irrespective of whether it originates from the data, modelling assumptions or computations.
- In many applications of ML, the output is itself uncertain, and there is no such thing as a definitive answer. Consider the example of predicting the probability of scoring a goal in a game of soccer.

- We can create a simple probabilistic model to predict whether a shot is a goal.
- The prediction is based only on the player's position on the field when taking a shot.
- Specifically, the input is given by the distance from the goal, and the angle between two lines from the player's position to each of the goal posts.
- These are just two features that we can use to predict the outcome; there are many others that we could use.
- We don't expect these features alone to be very good predictors of goal scoring, but adding further features would likely increase the accuracy of our model.
- The output of the model corresponds to whether or not the shot results in a goal, meaning that this is a binary classification problem (1 if predicting a goal, else 0).
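A probabilistic model for this binary classification problem can be sketched with logistic regression on the two features above (distance and angle). The training data here is synthetically generated, and the coefficient values are invented for illustration; the point is that the model outputs a probability of a goal, not a hard 0/1 answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented training data: shot distance (m) and shooting angle (rad),
# with goals made more likely by short distance and wide angle.
n = 500
dist = rng.uniform(5, 30, n)
angle = rng.uniform(0.1, 1.0, n)
logit_true = -0.2 * dist + 3.0 * angle + 1.0
goal = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit_true))).astype(float)

# Logistic regression: P(goal) = sigmoid(w . x + b), fit by gradient descent.
X = np.column_stack([dist, angle, np.ones(n)])
w = np.zeros(3)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ w))         # predicted goal probabilities
    w -= 0.001 * X.T @ (p - goal) / n    # gradient of the average log loss

def p_goal(d, a):
    """Predicted probability that a shot from distance d and angle a scores."""
    return float(1 / (1 + np.exp(-(w[0] * d + w[1] * a + w[2]))))
```

The output is always between 0 and 1, and close shots with wide angles receive higher predicted probabilities than distant, narrow-angle shots.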
1.4 - Pixel-Wise Class Prediction
This is also called image segmentation
- This example is not like a traditional classification problem, in that we want to produce a label for every pixel in the image simultaneously.
- That is, the model produces one output (a predicted class) for every input pixel.
- The data required to train this model would be a mapping of image pixels to their respective class.
- The supervised machine learning problem is to then use this data to find a mapping that is capable of taking a (new, unseen) image and producing a corresponding output in the form of a predicted class for each pixel.

In the example above, we want to classify the pixels on the screen into one of four classes:
- Car (Blue)
- Traffic Sign (Yellow)
- Pavement (Purple)
- Tree (Green)
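The shape of the pixel-wise prediction problem (one label per pixel, same spatial layout as the input) can be sketched with a toy classifier that labels each pixel by its nearest prototype colour. The prototype colours and the tiny "image" are invented; a real segmentation model learns the pixel-to-class mapping from labelled data rather than comparing raw colours.

```python
import numpy as np

# Hypothetical class prototype colours (RGB), one per class.
prototypes = np.array([
    [0, 0, 255],     # 0: car (blue)
    [255, 255, 0],   # 1: traffic sign (yellow)
    [128, 0, 128],   # 2: pavement (purple)
    [0, 128, 0],     # 3: tree (green)
], dtype=float)

# A tiny invented 2x2 "image" of RGB pixels.
image = np.array([
    [[10, 5, 240], [250, 250, 10]],
    [[120, 10, 120], [5, 120, 5]],
], dtype=float)

# One label per pixel: distance from each pixel to each prototype colour,
# then take the closest class. The label map has the image's spatial shape.
dists = np.linalg.norm(image[:, :, None, :] - prototypes[None, None, :, :], axis=-1)
labels = dists.argmin(axis=-1)

print(labels)
```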
1.5 - Estimating Pollution Levels in London
- There must be some sort of clustering/trends in the data for a model to exploit.
- Obtained data from an array of ground sensors that measure pollution levels, as well as satellite data.
- Want to develop a supervised machine learning model that can deliver forecasts of the air pollution level across time and space.
- Since the output (in this case, pollution level) is a continuous numerical value, this is a type of regression problem.
- Particularly challenging in that the measurements are reported at different spatial resolutions, and on varying timescales.
- Must figure out how to merge the different pieces of information together.
- This is an example of a multi-sensor, multi-resolution problem
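One simple way to sketch the merging step is to resample sources reported on different timescales onto a common time grid and combine them. The data here is invented, and the actual application would use a far more sophisticated (probabilistic) fusion than plain interpolation and averaging.

```python
import numpy as np

# Invented measurements of the same quantity at two different timescales:
# an hourly ground sensor and a satellite reading every 6 hours.
hourly_t = np.arange(0, 24, 1.0)
hourly_v = 40 + 5 * np.sin(hourly_t / 4)
sat_t = np.arange(0, 24, 6.0)
sat_v = 42 + 5 * np.sin(sat_t / 4)

# Merge by interpolating both sources onto one common time grid,
# then averaging where both are available.
grid_t = np.arange(0, 24, 0.5)
hourly_on_grid = np.interp(grid_t, hourly_t, hourly_v)
sat_on_grid = np.interp(grid_t, sat_t, sat_v)
merged = (hourly_on_grid + sat_on_grid) / 2

print(merged[:4])
```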