Motivation: To figure out which of the option is the best and pick that wihtout making all recursive calls

Storing files on a tape (Minimisation)

Consider n files we want to store on a magnetic tape
They have lengths given by L1.. Ln,
Make a simple assumption that all lengths are distint
On magnetic tape, reading the kth file requries going through all previous files
If the files are arranged from 1 to n in the given order then
- cost(k) = L1 + l2 + l3 … Lk

Problem: Reaarange the files to minimize the avg cost

Theory

We want to permute the order of the files such that the avg cost of accessing the files are as small as possible.

We want to try to permute

Back trcking is to costly, there are n! (n^n/2) permutation
Dynamic programming does not help because we cannot find a small number of subproblems leading to the given problem.
Guess the solution with initution that Lp(n) > Lp(1)

Our inituition is to pick the smallest

Algorithm

If lets say our guess is correct, then just

Sort the records in increasing length

We can prove by using local swap/exchange algorithm

Prooving (Local swap for minimisation problem)

Suppose we are given a problem where we are required to find an optimum solution O where f(O) is minimum
Suppose greedy algo produce solution G
We want to prove that for any optimum solution O That minimises f, f(G) = f(O)
Approach: Slowly change any optimum solution O to eventually get the greedy solution G without increasing f in any step

Steps:

Define distance measure (Distance from the greedy solution) dist from any possible solution to a positive number that indicated the distance from greedy solution
- For any solution S, dist(S) = 0 if and only if S = G: This means that S is the greedy solution
- dist takes finitely many possible values for any valid solution

Intuitively far from greedy => Far from optimum

Given any optimum solution O, different from G, find another solution O* such that
- Either f(O*) < f(O)
- or f(O) = f(O) and dist(O) < dist(O)

Our claim is that if our file is increasing order of length then or F(q) is in the minimal order.. so we have to do is to swap some value of p[2] and p[3] such that the value of P[3] is greater than p[2].. note that this would make the order not in increasing anymore

We can see that the difference is smaller than 0 after we do the swap..

Let greedy algo produce a permutation p such that

Observed that dist(q) = 0 if and only if p = q… why?

Because it is the greedy solution

Collary explaination:

Proving that G is optimal: We want to prove that the greedy is optimal.. if G is not optimal, then there must be another solution that is optimal. Picking O as the optimal solution by minimal distance where O is not G and dist(O) > 0

O is not optimal (Since O is not G and there exist a O* that is a more optimal solution)
O does not have the min distance among optimal solution

Therefore the condition where O is the optimal solution by minimal distance cannot hold

Remember f is the function that we want to minimise.

In this case, our length equation is our f

In this image, let say G is our current greedy solution and lets say O is our optimal greedy solution by minimal distance..

We want to try to get as close to the greedy solution as possible by changing the edges slowly

The goal of distance function is to show that G is optimal:

if G was the only optimal solution then for any O that is not G, we will find O* such that f(O*) < f(O)
If there are many optimal solution, then we move closer to the greedy solution G to eventually show that G is optimal.

In this case, we do not need the dist function since LOCAL SWAP gave a solution better than optimal.

Why does this proof work?

Let O be optimum solution such that dist(O) is as small as possible. Since O is different from G, dist(O) > 0

if F(O*) < f(O), then we have found a solution better than optimum which contradicts
if f(O) = f(O) and dist(O) < dist(O) then this contradicts that dist(O) is the smallest among all possible optimal solution

Generalisation

Assuming the frequency of accessing file ith is Fi

So if we reorder the files using the permutation q then the cost of the function now becomes:

So intuitvely, we need Li to be in increasing order and Fi to be in decreasing order

To help figure out the correct greedy choice, we can try a similiar local swap as the prev algo to help us
Let q* be the permutation obtain by swapping q(i) and q(i+1)

This consistent with what we expect
We again make the simplifying assumption that Li/Fi are distinct
The proof of correctness of this greedy algorithm is almost identical to the prev one. The LOCAL SWAP is already there on the previous slide.

Local swap for maximiation problem

We are given a problem where we are required to find an optimum solution O such that f(O) is max

Suppose greedy algo produce solution G
We want to prove that for any optimum solution O That minimises f, f(G) = f(O)
Approach: Slowly change any optimum solution O to eventually get the greedy solution G without decreasing f in any step

Steps:

Define distance measure dist from any possible solution to a positive number that indicated the distance from greedy solution
- For any solution S, dist(S) = 0 if and only if S = G
- dist takes finitely many possible values for any valid solution
Given any optimum solution O, different from G, find another solution O* such that
- Either f(O*) > f(O)
- or f(O) = f(O) and dist(O) < dist(O)

For exam

Define approporaite dist function

Convince urself that dist(S) = 0 if and only if S = G and that dist takes only finitely many values (There is no need to prove this)

Given O, prove that there exists an O* that satisfies the desired property

Scheduling Classes

Suppose you are given two ararys S and F of size n listing the start and finishing times of each class
Choose the largest possible subset X of 1..n so that for any pair i,j in X.. either S[i] >= F[j] or S[j] >= F[i]

The goal is to schedule as much classes as possible

Naive: Just schedule everything and try everything

Theory

Which is the first class we should pick?

Think of how to make the first greedy choice
DO we want to include the class that is the shortest
Do we want to include class that begin first
Do we want to include the class that ends first

For each of the strategies above, think of a counter example for which it will not a part of the solution or try to reason why it should be a valid strategy

If we pick the shortest one, might overlap with another class with the starting time in between the S and F
IF we pick class that start first, it might go on forever and conflict with every other class
If pick the class that ends first, seems like it has minimal conflict and scheduling it might be good idea

Pick the class that finish first and recurse over all classes that start after this one finishes

Pick the class
Remove those that has the conflict
pick the next optimal one that has no conflict
repeat

PsueudoCode

Sort the finishing time of all class from first to last
Add Class 1 to schedule, Curr = F[1]
For i =2 to n
- if S[i] >= Curr, then - Add class i to the schedule
  - Curr = F[i]

Time complexity: O(nlogn) for sort and O(n) for others

Proof

Suppose optimal schedule O has class {o1..om}, f(O) = m
Suppose greedy schedule G has classes {g1..gk}, f{G} = k
We assume that the classes in both are sorted according to finish times
We want to swap the first class where o differes from G
Define dist(O) to be m-i where i+1 is the smallest index such that O’i+1’ != g’i+1’

Correctness:

As long as we are able to keep the same number of classes, then its non conflicting

Since there are multiple greedy solution, we pick the class with the smallest greedy time that does not conflict. We want to show that it is non conflicting:

Prove the number class is the same for my optimal solution and the optimal solution G

1) You can see the Os are an optimal solution

2) Lets swap O1 with G1

3) When we do the swap we realised that the dist become smaller each swap

swap 1

Swap finish all

Each swap is dist(O) - 1 since our optimal solution would be G where dist(G) = 0

So we can see we will try to decrease dist to be 0

Notice that the number of classes in the origianl optimal is always the same.. it will not increase anymore becasuse we know that the first optimal solution has alr the max amount of classes needed.. thus there is no way the number of classes would increase during the swapping sequence (If it is then the original optimal solution would have already added that class in)

Conclusion

Try to prove by contradiction
Prove that the lennma for either local swap (min or max) is correct for the algo we have
Ensure that the dist is small because if the dist is large it means it is not a optimal solution

Graphs

Proof:

Every tree has a vertex of degree 1.
Each node has at least 2 or 1 neighbours
We can loop back
We realised that we if we connect back to the previous node, it becoes a cucle

THis contradicts

Therefore, every Tree has at least 2 vertices of degree 1

Minimum spanning tree

Onegraph:

n = number of vertices
m = number of edges

Weight of the tree is considered to be the sum of the weight of the edges of the smallest weight

Generic greedy algo

It is going to add edges and maintain a forest.

Start from an empty edge set
Add one edge at a time without creating any cycle

Consider greedy algo for finding an MST of G = (V,E) that finds a spanning tree as follows

Initilised F = (V,E’) where E’ is empty
For i = 1 to n-1
- Let C1 .. Ck be the connected components of F
- Arbituarily chooe j in {1..k}
- pick an edge e in E with minimum weight among edges with exactly endpoint in Cj
- Add e to E’

The main idea is to pick the edge from each component (min) until all is taken

Graph F is a forest

Theory

We assume we have some minimum spanning tree with the same number of edges {o1…on-1}. We can use local

Pick smallest index i +1 such that Oi+1 is not ei+1
Look at the connected components form by the first i edges
add ei+1 to our O (Optimum solution)
In O, where O is the minimum spanning tree, there must be a path from u to v.
Look at the first edge e’ that leaves c1
Look at the first edge where the vertice is not in c1

The claim is that the swap will either give less weight or be the same but at the same time, have more components then the previous solution.

We removed e’ and added e

Note that we might have disconnected the nodes, but there is another path from the other side
Total weight is W(O) + W(e) - W(e’) But notice that

We just need to get it such that for the optimum solution O*:

W(O*) < = W(O)
dist(O*) < dist(O)

Check it if the path e is e’

Further explaination

Lets say there is 3 component C1, C2 and C3. There is an edge u to v

We are just replacing e’ with another edge e

Choose one random vertice
Check out going edges and select the one with least cost
Form many components
Find the outgoing edges one component and see if it can connect to another component
Choose the smallest edge and merge the two compoennts

Algo has to choose the edge
Proof is given the edge

No matter how you pick the edge, the answer is the same

Prims algo

There is only one non trivial component C which is a tree. Every other component contains one vertex.

Choose one random vertice
Check out going edges and select the one with least cost
Check that the connected component is not added
Add it to our list of components
Repeat
See 2040s

Time complexity

Keep all the edges adjacent to C in a pq Q implemented using a min heap
At each step, extract the min weight edge (u,v) in Q
- if both vertices are in tree do nothing
- If only one, say u, is in the component C then we insert v into C and all edges from v into Q
Each edge (u,v) can be inserted into Q at most twice.
- Adding edges from u -　Adding edges from v
The size of Q is at most M hence insertion/deletion is logm time
Time complexity O(mLogn) because n^2 is larger than m

Fib heap: O(nlgn + m) time

Kruskal algorithm

Scan all edges in increasing order of weight and add the edge if both endpoints are in different components

Start with the smallest edge
Add the edge into the set
Check the next smallest edge
Check there is no cycle
If no cycle, add the edge
Repeat with 3

Label: Labelling the component (Label is given as the name of the largest component)
- Every time we merge, we are changing the label of the smaller component to be that of the larger component

Code

Analysis

After init sort which takes time O(mlogm), the complexity is dominated by teh total time taten UNION operation
For loop vertex: O(n)
For loop edge: O(M)
Each time the component label a vertex changes, the component of F containing the vertex frows by at least the factor of 2, thus each vertex label changes at most O(lgn) times
It follows the total time spent updating vertex labels is only O(nlgn)

Union is lgn:

Everytime we merge, we are changing the smaller components with the lkabel of the larger
The size of this component will be at x2 of the previous
The next time this component is being merge (And the smaller component value changed to the alrgest) again would be when the component is to be at least x2
Therefore it would change at most lgn because each time the size of the component will x2

CHanging one component’s label is O(1) but changing all the components in the set is at least Olgn since the larger component would be at most x2 of the previous

2040s:

Find and union are both logn

CS3230 - Lecture 5 : Greedy Algorithms

Storing files on a tape (Minimisation)

Theory

Algorithm

Prooving (Local swap for minimisation problem)

Collary explaination:

Why does this proof work?

Generalisation

Local swap for maximiation problem

Scheduling Classes

Theory

PsueudoCode

Proof

Prove the number class is the same for my optimal solution and the optimal solution G

Conclusion

Graphs

Minimum spanning tree

Generic greedy algo

Theory

Further explaination

Prims algo

Time complexity

Kruskal algorithm

Code

Analysis

SLide