Revision: Neural networks
A neural network contains nodes (neurons) that sum their inputs through weights. The simplest transfer function, the step function, outputs 1 when the weighted sum exceeds the threshold, and 0 otherwise.
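A minimal sketch of such a step-function neuron in Python; the weights, bias, and AND example are illustrative assumptions, not values from these notes:

```python
import numpy as np

def step_neuron(x, w, b):
    """Sum the inputs through the weights, then apply the step function."""
    s = np.dot(w, x) + b       # weighted sum of the inputs
    return 1 if s > 0 else 0   # output 1 only if the threshold is exceeded

# Illustrative weights chosen so the neuron behaves like logical AND
w = np.array([1.0, 1.0])
b = -1.5
print(step_neuron(np.array([1, 1]), w, b))  # 1
print(step_neuron(np.array([0, 1]), w, b))  # 0
```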
- The problem is how to decide the weights
- We want to find a set of weights that minimises an error function E
- dE/dw = 0 does not mean we are at the lowest point; it could be a local minimum
Learning Laws
Unsupervised learning
The network is given unlabelled data and learns by itself, discovering the data's own structure.
- Hebbian learning: Neural networks that strengthen or weaken connections between neurons based on the data
- Adversarial learning: Two neural networks that compete with each other based on sample data
- Self organising maps: Partition the vector space based on sample data ("clustering")
SOM (Self organising maps)
- Given a sample vector vk, the neuron that best matches vk according to some distance metric is the best matching unit (BMU).
We adjust its weight vector to look like the input
- Adjust the neighbours too, except that we adjust them to a lesser degree
- This partitions the vector space into what is called a "tessellated Voronoi surface"
- Run through every neuron
- Find the one with the smallest distance to the sample (the BMU)
- Update it using the neighbourhood function
KOHONEN SOM ALGO
h is known as the neighbourhood function
- The BMU has the maximum value of the neighbourhood function, so its change will be the greatest.
- As the curve goes further to the left or the right of the BMU, the change gets smaller
- As time passes, the neighbourhood radius becomes narrower.
- As time passes, the "hat" of the neighbourhood function also becomes narrower
- The learning rate is similarly decayed
- The end result:
- The neurons and their nearest neighbours are adapted to look like the sample data closest to them
- New sample data coming in will automatically be classified by the centroid it is closest to
The Kohonen SOM automatically infers structure from the data presented to it, as the sketch below illustrates.
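A minimal sketch of the Kohonen update, assuming a 1-D grid of neurons, a Gaussian neighbourhood function h, and exponential decay of both the learning rate and the neighbourhood radius (all of these choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, dim = 10, 2
W = rng.random((n_neurons, dim))           # one weight vector per neuron

def som_step(v, t, lr0=0.5, sigma0=3.0, tau=100.0):
    """One Kohonen update for a single sample vector v at time step t."""
    lr = lr0 * np.exp(-t / tau)            # learning rate decays over time
    sigma = sigma0 * np.exp(-t / tau)      # the "hat" narrows over time
    # 1. Run through every neuron: distance from each weight vector to v
    dists = np.linalg.norm(W - v, axis=1)
    # 2. Find the one with the smallest distance (the BMU)
    bmu = np.argmin(dists)
    # 3. Update using the neighbourhood function: maximal at the BMU,
    #    smaller further away on the grid
    grid = np.arange(n_neurons)
    h = np.exp(-((grid - bmu) ** 2) / (2 * sigma**2))
    W[:] += lr * h[:, None] * (v - W)      # move weights towards the sample

for t, v in enumerate(rng.random((200, dim))):
    som_step(v, t)
```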
Supervised learning
Some process takes the data and generates the labels; the neural network will try to mimic these generated labels.
- Data is presented in the form (x, y), where x is the sample input and y is some kind of target or class that the neural network needs to learn
Algo:
- Linear perceptron: Simple, good for classifying linearly separable data
- Multilayer perceptron: Complex, good for classifying non-linearly separable data
Perceptrons
A perceptron consists of:
- A set of inputs x
- A set of weights w
- A unit that computes the dot product w^T x
- An activation function that outputs 1 when w^T x + b > 0 and -1 otherwise.
Learning algo:
- Present (xi, yi) to the perceptron, where i is a particular sample
- Do a feed-forward pass to get the output yout
- Take the error (yi - yout) times the learning rate α and update the weights: w ← w + α (yi - yout) xi (sketched below)
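A minimal sketch of this learning rule, assuming a ±1 sign activation and an illustrative learning rate:

```python
import numpy as np

def train_perceptron(X, y, alpha=0.1, epochs=20):
    """X: one sample per row; y: targets in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_out = 1 if np.dot(w, xi) + b > 0 else -1   # feed forward
            w += alpha * (yi - y_out) * xi               # error * learning rate
            b += alpha * (yi - y_out)
    return w, b

# Learns AND (linearly separable); the same loop never settles on XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y_and)
```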
However, perceptrons cannot learn the XOR problem
- No matter how we draw the line, we cannot separate the 1s and the 0s
We want α to bring us to the minimum of the error function
Multilayer perceptrons
Consist of:
- A set of inputs x with n+1 elements (because of the bias)
- A set of m perceptron nodes called hidden nodes h, with m+1 terms because of the bias
- A set of weights wh connecting x to h
- A set of q perceptron nodes p for the output
- A set of weights wp connecting h to p
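A minimal sketch of the forward pass through such a network, assuming sigmoid activations and small illustrative sizes for n, m, and q:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, m, q = 2, 3, 1                     # illustrative layer sizes
rng = np.random.default_rng(0)
Wh = rng.normal(size=(m, n + 1))      # weights connecting x to h (bias column included)
Wp = rng.normal(size=(q, m + 1))      # weights connecting h to p (bias column included)

def forward(x):
    x = np.append(x, 1.0)             # n+1 elements because of the bias
    h = sigmoid(Wh @ x)               # hidden nodes
    h = np.append(h, 1.0)             # m+1 terms because of the bias
    return sigmoid(Wp @ h)            # q output nodes p

print(forward(np.array([0.5, -0.2])))
```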
Learning steps:
Gradient descent:
- The 1/2 in E = 1/2 Σ (yi - yout)^2 is used to cancel out the 2 that appears when differentiating
- We set the gradient to 0 to find the minima
- This means that the activation functions f and g must be differentiable
This could be used to solve our XOR problem, as the sketch below shows
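A minimal sketch of gradient descent on a 2-3-1 multilayer perceptron learning XOR, assuming sigmoid activations for f and g and the squared error E = 1/2 Σ (yi - yout)^2 (the architecture and hyperparameters are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

rng = np.random.default_rng(0)
Wh = rng.normal(size=(2, 3))    # input -> hidden weights
bh = np.zeros(3)
Wp = rng.normal(size=(3, 1))    # hidden -> output weights
bp = np.zeros(1)

alpha = 0.5
for _ in range(20000):
    # forward pass
    h = sigmoid(X @ Wh + bh)
    out = sigmoid(h @ Wp + bp)
    # backward pass: chain rule through the differentiable activations
    d_out = (out - y) * out * (1 - out)     # dE/d(pre-activation of p)
    d_h = (d_out @ Wp.T) * h * (1 - h)      # dE/d(pre-activation of h)
    Wp -= alpha * h.T @ d_out
    bp -= alpha * d_out.sum(axis=0)
    Wh -= alpha * X.T @ d_h
    bh -= alpha * d_h.sum(axis=0)

print(out.round())   # typically approaches [[0], [1], [1], [0]]
```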
Non-linearity
A neural network can do:
- Classification: Classify the input into some label
- Regression: Predict xn given x0 to xn-1
Unlike XOR, which we can solve by drawing another line, we cannot do so for every dataset.
Neural network hyperparameters
Weights = parameters, so a NN = a parameterised model
Neural network hyperparameter factors:
- Number of input nodes
- Architecture
- Size of each hidden layer
- Number of hidden layer
- Loss function
- Transfer function
- Optimisation functions and their parameters
- Dropout and regularisers
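As an illustration, here is how these hyperparameters might appear in a hypothetical tf.keras model definition; the layer sizes, activations, and optimiser settings are assumptions for the example, not values from these notes:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),               # number of input nodes
    tf.keras.layers.Dense(16, activation="relu"),    # hidden layer size, transfer function
    tf.keras.layers.Dropout(0.2),                    # dropout regulariser
    tf.keras.layers.Dense(3, activation="softmax"),  # output layer
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # optimiser and its parameters
    loss="sparse_categorical_crossentropy",                  # loss function
)
```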
Activation function
An activation function transforms the dot product of the weights and inputs.
Uses of activation functions:
- Scaling
- Destroying the linearity of the output
- Shaping it to a statistical distribution
Examples:
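The notes don't reproduce the examples here; these are the standard ones (a minimal sketch):

```python
import numpy as np

def step(z):        # the step function from the introduction
    return np.where(z > 0, 1, 0)

def sigmoid(z):     # scales output into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):        # scales output into (-1, 1)
    return np.tanh(z)

def relu(z):        # destroys linearity while staying cheap to compute
    return np.maximum(0.0, z)
```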
Loss function
A loss function measures the error between the function learnt by the NN and the actual target value yi
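A minimal sketch of two common loss functions; the notes don't name specific ones, so mean squared error and binary cross-entropy are assumptions:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of (yi - y_pred_i)^2."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy for targets in {0, 1}; predictions clipped for stability."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```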
Optimiser
- In a NN, optimisers are algorithms that derive the best set of parameter values to minimise the loss function
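A minimal sketch of the simplest optimiser, plain gradient descent with an optional momentum variant (learning rate and momentum values are illustrative):

```python
def sgd_step(params, grads, lr=0.01):
    """Plain gradient descent: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

def sgd_momentum_step(params, grads, velocity, lr=0.01, beta=0.9):
    """Momentum variant: follow a moving average of past gradients."""
    velocity = [beta * v + g for v, g in zip(velocity, grads)]
    params = [p - lr * v for p, v in zip(params, velocity)]
    return params, velocity
```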
Overfitting
When the model reacts well only to the training data but not to the test data.
- Data consists of two things:
- Some sort of generating process that produces the input
- Noise
Noise is what causes overfitting, since noise is very specific to the data being fitted.
We can prevent this by having a simpler model with fewer parameters or by having more training data
(Figure: blue = fit to the training data, green = the actual data)
A good way to fix this is to partition the data (sketched below).
Random noise introduced to the data makes the data look like other data. (See noise layers)
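A minimal sketch of partitioning the data to detect overfitting; the 80/20 split and the helper name are illustrative assumptions:

```python
import numpy as np

def train_test_split(X, y, test_frac=0.2, seed=0):
    """Shuffle the samples, then hold out a fraction as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]

# A model that scores well on the training split but badly on the test
# split has overfitted: it has learnt the noise, not the generating process.
```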