Imagine picking a student from a crowd.

There may be two characteristics that we are interested in:
- CAP
- Modules taken

This leads to two-dimensional random variables.
Because there are 2 dimensions, we talk about marginal and conditional probability distributions.
Here x is a particular value of X and y is a particular value of Y.

P(X = x) is the marginal probability
P(X = x | Y = y) is the conditional probability
The conditional asks: given the value of one variable, how does it affect the other?

Marginal and conditional will not be the same unless the variables are independent. (Chap 2)
Chap 2 covers one variable; chapter 3 covers 2 variables.

2 Dimensional random variables

Definition
Let E be an experiment and S its sample space.
Let X and Y be two functions, each assigning a real number to each outcome s in S.
As with function notation on a graph, the values are denoted X(s) and Y(s).
(X, Y) is then called a two-dimensional random variable or random vector.

X and Y each take certain values and are defined on the same sample space.


Range space
Rx,y = {(x, y) | x = X(s), y = Y(s), s in S}

A random variable takes certain values at random, and those values are associated with a probability function.
As a function, it maps each outcome s in S to a real number.

Definition 2
Let X1, X2 ... Xn be n functions each assigning a real number to every outcome.
We call (X1...Xn) an n dimensional random variable

Definition 3
1. (X, Y) is a two-dimensional discrete random variable if the possible values of (X(s), Y(s)) are finite or countably infinite.

This means that the values can be written as (xi, yj) where i = 1,2,3... and j = 1,2,3...
We use two different subscripts because X and Y are indexed separately.

2. (X, Y) is a two-dimensional continuous random variable if the possible values of (X(s), Y(s)) can assume all values in some region of the Euclidean plane R^2.


e.g 1
A TV set is to be serviced.
X represents the age of the set to the nearest year, and Y the number of defective components in the set.

For any set s, there are two function values:
X(s) = 0,1,2,3...
Y(s) = 0,1,2,...,n

The range for x is 0 to infinity and
the range for y is 0 to n,
where n is the total number of components in the TV set.


e.g 2
A fast food restaurant operates a drive-up window and a walk-up window.
X is the proportion of time the drive-up window is in use (at least one customer is being served or waiting)
Y is the proportion of time the walk-up window is in use

Rx: 0 <= X <= 1
Ry: 0 <= Y <= 1

Each range must be between 0 and 1 because it is a proportion of time.

This is continuous because (X, Y) can take all values within the unit square.

Joint probability density function

Joint probability functions for discrete random variables

Definition
Let (X, Y) be a 2-dimensional discrete random variable defined on the sample space of an experiment.
With each possible value (xi, yj) we associate a number fxy(xi, yj) representing P(X = xi, Y = yj),
and it satisfies the following conditions:
1. fxy(xi, yj) >= 0 for all (xi, yj)
2. The sum of fxy(xi, yj) over all i and j = 1


Joint probability function

The function is defined for all pairs of values

e.g 1
Find value of k so that the function 
fx,y = kxy for x = 1,2,3 and y = 1,2,3
can serve as a joint probability function

There are 3 possible values for each variable, so there are 9 possible combinations (3*3).

Summing the probabilities over all 9 combinations must give one:
1k + 2k + 3k + 2k + 4k + 6k + 3k + 6k + 9k = 36k = 1
k = 1/36
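
A quick check of the value of k, as a minimal sketch. The names `values` and `f` are made up for this illustration; exact fractions are used so nothing is lost to rounding.

```python
from fractions import Fraction

# Support of the joint probability function: x and y each take 1, 2, 3
values = [1, 2, 3]

# Sum of x*y over all 9 pairs; k is its reciprocal so the total is 1
total = sum(x * y for x in values for y in values)
k = Fraction(1, total)

def f(x, y):
    """Joint probability function k*x*y; 0 outside the listed values."""
    if x in values and y in values:
        return k * x * y
    return Fraction(0)

print(k)                                             # 1/36
print(sum(f(x, y) for x in values for y in values))  # 1
```

Calling `f(4, 1)` returns 0, since x = 4 is outside the range of the function.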

When is f(x,y) equal to 0?
When the values are outside the range of the function (i.e., x or y is not one of the listed values)

e.g 2
A company has two production lines, A and B, which produce 5 and 3 machines respectively.
Assume the number of machines produced by each line on a given day is a random variable:
X = number produced by line A
Y = number produced by line B
What is the probability that more machines are produced by line A than by line B on a given day?
We are finding:
P(X > Y)
We can find the answer by summing the joint probabilities of all pairs (x, y) where x is greater than y.
The particular value of Y does not matter; we only care that X is greater than Y.

The marginal probability of Y = y is the sum of the joint probabilities over all x with y held fixed,
i.e. when Y is 1, sum f(x1, 1), f(x2, 1), f(x3, 1)...

What is the conditional probability of X given Y = 0?
fX|Y(x | 0), where x has to be a specific value

e.g 3
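There are 9 executives: 4 are married, 3 have never married and 2 are divorced.
3 of the executives are to be randomly selected for promotion.

X is the number of married executives selected
Y is the number of never-married executives selected

Find the joint probability function of X and Y
X = 0,1,2,3
Y = 0,1,2,3
We cannot have X = 0 and Y = 0 at the same time, because there are only 2 divorced executives to fill the 3 places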

The number of ways to select 3 out of 9 is 9C3
The number of ways to select x executives from the 4 married, y from the 3 never married and the rest from the divorced is
4Cx * 3Cy * 2C(3-x-y)
such that 1 <= x+y <= 3
//this means that x and y together must contribute at least 1
//x + y also cannot be bigger than 3 because at most 3 people are selected

f(x,y) = P(X = x, Y = y)
= 4Cx * 3Cy * 2C(3-x-y) / 9C3
for 1 <= x+y <= 3, and 0 otherwise.
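
The joint probability function for the executives can be tabulated as below. This is only a sketch; the function name `f` and the dictionary `table` are made up for the illustration.

```python
from fractions import Fraction
from math import comb

def f(x, y):
    """P(X = x, Y = y): x married, y never married, the rest divorced."""
    z = 3 - x - y  # number of divorced executives selected
    if x < 0 or y < 0 or z < 0:
        return Fraction(0)
    # comb(2, 3) is 0, so the case x = y = 0 gets probability 0 automatically
    return Fraction(comb(4, x) * comb(3, y) * comb(2, z), comb(9, 3))

table = {(x, y): f(x, y) for x in range(4) for y in range(4)}

# Print only the pairs with positive probability
for (x, y), p in table.items():
    if p > 0:
        print(x, y, p)

print(sum(table.values()))  # 1
```

Every pair with positive probability satisfies 1 <= x+y <= 3, which matches the condition stated for the formula.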

What if I add a variable z to represent the divorced?
-> that is already 3 dimensions; the two-variable formula above no longer applies as is and it gets very complicated
x + y + z must be equal to 3

Joint probability function for continuous random variables


This must always hold:
1. fxy(x,y) is greater than or equal to 0
2. The double integral of fxy(x,y) dx dy over the whole range = 1

When integrating with respect to x, treat y as a constant (and the other way round when integrating with respect to y).
Does it matter if we integrate x or y first?
-> There might be some cases.. but it doesn't matter for this module (LOL)🤣
But picking the right order may make the integration easier. If one order is hard, try the other one.

e.g 1
P(X + Y >=1)

<Insert 3-23>

Look at the boundary of the pdf
Drawing it out, we can mark the region of the graph that we are looking for within the range of the pdf

Looking at the probability, we realise that we want the region that is above the line
x + y = 1
However, there is no ready-made formula for this region; we have to integrate the pdf over it.

Integrating directly:
Integrate x^2 + (xy)/3 dy first, with y going from 1 - x to 2
(the lower limit is 1 - x because the line is y = 1 - x; the upper limit 2 is taken from the pdf)
Then integrate the result dx over the range of x in the pdf


But the easier method is to integrate the pdf over the triangle (under the line x + y = 1)
and take the complement of that
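
A rough numerical check that the direct method and the complement method agree. This is a sketch under an assumption: the pdf is taken to be x^2 + xy/3 on 0 <= x <= 1, 0 <= y <= 2 (the x range is assumed, since the slide is not reproduced here). `midpoint` is a helper made up for this sketch, using the midpoint rule.

```python
def midpoint(g, lo, hi, n=400):
    """Midpoint-rule approximation of the integral of g from lo to hi."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def pdf(x, y):
    # ASSUMED range: 0 <= x <= 1, 0 <= y <= 2
    return x * x + x * y / 3

# Sanity check: the pdf should integrate to 1 over its whole range
total = midpoint(lambda x: midpoint(lambda y: pdf(x, y), 0, 2), 0, 1)

# Direct method: y from 1 - x to 2 (inner), then x from 0 to 1 (outer)
p_direct = midpoint(lambda x: midpoint(lambda y: pdf(x, y), 1 - x, 2), 0, 1)

# Complement method: integrate over the triangle under x + y = 1
p_triangle = midpoint(lambda x: midpoint(lambda y: pdf(x, y), 0, 1 - x), 0, 1)
p_complement = 1 - p_triangle

print(round(total, 4), round(p_direct, 4), round(p_complement, 4))
```

Under the assumed range, both methods give about 0.9028 (exactly 65/72), and the triangle has simpler limits, which is why the complement is the easier route by hand.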

e.g2
Given a pdf
f(x,y) = 12/13 x(x + y)   for 0 <= x <= 1, 1 <= y <= 2
       = 0                otherwise

Define A = {(x,y) | 0 < x < 1/2, 1 < y < 2}

By integrating x first and y second,
x -> limits from 0 to 1/2
Ensure there is no more x left after this step
y -> limits from 1 to 2

But what if we integrate y first??
y -> limits from 1 to 2
x -> limits from 0 to 1/2
In this case, because the region is a nice shape (a rectangle), the limits do not need to change
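
To see that the order of integration does not change the answer on a rectangle, here is a small numerical sketch for P(A). `midpoint` is a made-up helper (midpoint rule), not something from the module.

```python
def midpoint(g, lo, hi, n=400):
    """Midpoint-rule approximation of the integral of g from lo to hi."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def pdf(x, y):
    # Joint pdf from the example, valid for 0 <= x <= 1, 1 <= y <= 2
    return 12 / 13 * x * (x + y)

# x first (inner integral over x), then y
p_x_first = midpoint(lambda y: midpoint(lambda x: pdf(x, y), 0, 0.5), 1, 2)

# y first (inner integral over y), then x
p_y_first = midpoint(lambda x: midpoint(lambda y: pdf(x, y), 1, 2), 0, 0.5)

print(round(p_x_first, 4), round(p_y_first, 4))
```

Both orders give about 0.2115, which agrees with the exact value 11/52 obtained by hand.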


Marginal and Conditional Probability Distributions

What if I want the distribution of X only or Y only?
I also want to know the distribution of X | Y when Y is given some value

Definition

Discrete
fx(x) = sum of fxy(x,y) over all y, with x fixed
fy(y) = sum of fxy(x,y) over all x, with y fixed

Continuous
fx(x) = integral of fxy(x,y) with respect to y
fy(y) = integral of fxy(x,y) with respect to x

If the value x or y falls outside the range, fx(x) = 0 or fy(y) = 0 respectively

Conditional Distribution

Definition
Defined as the ratio of the joint distribution to the marginal distribution of the given variable
That marginal must be greater than 0.
fY|X(y|x) = fxy(x,y) / fx(x), provided fx(x) > 0
In this case, only Y is a random variable
We have already fixed X at a certain value x, so it is not random anymore
This expression gives the distribution of y for a given x

1. The conditional p.f. (or pdf) satisfies
a) For fixed y,
 fX|Y(x|y) >= 0
and for fixed x,
fY|X(y|x) >= 0

b) The sum (or integral, if continuous) of fX|Y(x|y) over all x = 1
The sum (or integral, if continuous) of fY|X(y|x) over all y = 1

fxy(x,y) = fY|X(y|x) * fx(x)

e.g1
Suppose a fair coin is tossed 3 times
Let X be the number of heads on the last flip and Y the total number of heads over the 3 tosses
(HHH HHT HTH .... are the sample points)

_ _ _, there are 2 outcomes for each _
Therefore, there are 2 * 2 * 2 = 8 sample points

Find the conditional distribution of Y given X = 1
Constructing the table will help us see better.
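
The table can also be built by listing the 8 sample points. A minimal sketch; the names `joint`, `f_x1` and `cond` are made up for this illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))  # the 8 equally likely sample points

# joint[(x, y)] = P(X = x, Y = y)
joint = defaultdict(Fraction)
for s in outcomes:
    x = 1 if s[-1] == "H" else 0  # number of heads on the last flip
    y = s.count("H")              # total number of heads
    joint[(x, y)] += Fraction(1, len(outcomes))

# Marginal P(X = 1): sum the joint over all y with x fixed at 1
f_x1 = sum(p for (x, _), p in joint.items() if x == 1)

# Conditional distribution of Y given X = 1: joint / marginal
cond = {y: p / f_x1 for (x, y), p in joint.items() if x == 1}

for y in sorted(cond):
    print(y, cond[y])
# 1 1/4
# 2 1/2
# 3 1/4
```

The conditional probabilities sum to 1, as condition b) requires.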
<Insert 47>

Independent Random variable

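X and Y are independent when the distribution of one does not depend on the value of the other.

If the pdf is of the form x^y -> dependent, as it cannot be split into (a function of x only) * (a function of y only)
But if the pdf is of the form x*y -> it splits into x times y, so X and Y are independent (as long as the range is a product space)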

Proving independence

We can prove independence by showing fxy(x,y) = fx(x) * fy(y) for all x and y

To get fx(x) and fy(y):
Continuous: integrate the joint pdf over the range of the other variable, treating the variable we keep as a constant
ie)
fx(x) -> integrate over y's range only and treat x as a constant

Discrete: sum over the range of values of the other variable

OR

To disprove independence, we can sub in values and show that there exists a pair (x,y) with
fxy(x,y) != fx(x) * fy(y)

For independence the range must be a product space. If within the product space there exists a point where fxy(x,y) = 0 while fx(x) and fy(y) are both positive, X and Y are not independent
-> If the range is a triangle (not a rectangle), we can use this shortcut: pick a point outside the triangle, where the joint is 0 but the product of the marginals is not

e.g 2
fxy(x,y) = xy/36
for x = 1,2,3
and
y = 1,2,3
Are X and Y independent?

For a joint p.f., first check that it is greater than or equal to 0

If we sub in x and y, we see that it is greater than 0.
We must then check that f(1,1) + f(1,2) + ... + f(3,3) = 1
This is a double summation.

Solution:
fx(x)
= x/36 * (summation of y from 1 to 3)
= x/36 * (1+2+3) = x/6 for x = 1,2,3

fy(y)
= y/36 * (summation of x from 1 to 3)
= y/36 * (1+2+3) = y/6 for y = 1,2,3

fxy(x,y) = xy/36 = x/6 * y/6 = fx(x) * fy(y)
for all x, y = 1,2,3

Therefore, X and Y are independent
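
The same check can be done by brute force over all 9 pairs. A minimal sketch; `f_x`, `f_y` and `independent` are names made up for this illustration.

```python
from fractions import Fraction

values = [1, 2, 3]

def f(x, y):
    """Joint probability function xy/36 on x, y in {1, 2, 3}."""
    return Fraction(x * y, 36)

# Marginals: sum the joint over the other variable
f_x = {x: sum(f(x, y) for y in values) for x in values}
f_y = {y: sum(f(x, y) for x in values) for y in values}

# Independent only if the joint equals the product of marginals at every pair
independent = all(f(x, y) == f_x[x] * f_y[y] for x in values for y in values)

print(f_x[1], f_x[2], f_x[3])  # 1/6 1/3 1/2
print(independent)             # True
```

Note that one failing pair would be enough to conclude that X and Y are not independent.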

e.g 3
X and Y are 2 independent r.v. with
fx(x) = e^-x for x >= 0 and
fy(y) = e^-y for y >= 0

What is fxy(x,y)?
(How to get the joint pdf)
Given only these 2 marginals, it is in general not possible; we would need fx(x) * fy|x(y|x) or fy(y) * fx|y(x|y)
but because they are independent... we can just multiply

Solution
fx,y(x,y) = fx(x) * fy(y) = { e^-x * e^-y   for x >= 0 and y >= 0
                            { 0             otherwise


Expectation

The expectation of g(X,Y) is defined as:
Discrete: double summation over x and y of g(x,y) fxy(x,y)
Cont.: double integration over x and y of g(x,y) fxy(x,y)

Special case:
Let g(X,Y) = (X - μx)(Y - μy), where μx = E(X) and μy = E(Y)
This leads to the definition of covariance between two random variables.

Let (X,Y) be a bivariate random vector with joint p.f. (or pdf) fx,y(x,y), then the covariance of (X,Y) is defined as cov(X,Y) = E[(X - μx)(Y - μy)]

A positive covariance means that the two variables at hand are positively related, and they move in the same direction. 

A negative covariance means that the variables are inversely related, or that they move in opposite directions.

If they go in the same direction, the cov will be positive, else it will be negative
Cov(X,Y) = 0 does not mean that X and Y are independent
But if X and Y are independent, i.e. fxy(x,y) = fx(x) * fy(y), then Cov is 0

This is similar to finding V(X), but for 2 variables instead of one

Recap: V(X) = E(X^2) - (E(X))^2

Expectation does not pass through multiplication in general
We cannot take the expectations first and then multiply, but for sums this is ok

ie E(X+Y) = E(X) + E(Y) is ok
ie E(XY) = E(X) * E(Y) is wrong in general, but holds when they are independent
Cov(X,Y) = E[(X - μx)(Y - μy)] = E(XY) - E(X) * E(Y)
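
As an illustration, the covariance for the executives example (4 married, 3 never married, 2 divorced, 3 selected) can be computed both from the definition and from the shortcut E(XY) - E(X)E(Y). This is a sketch only; `expect` is a helper made up here for the discrete double summation.

```python
from fractions import Fraction
from math import comb

def f(x, y):
    """Joint p.f. of X (married) and Y (never married) among 3 selected."""
    z = 3 - x - y  # divorced executives selected
    if z < 0:
        return Fraction(0)
    return Fraction(comb(4, x) * comb(3, y) * comb(2, z), comb(9, 3))

pairs = [(x, y) for x in range(4) for y in range(4)]

def expect(g):
    """E[g(X, Y)] as a double summation of g(x, y) * f(x, y)."""
    return sum(g(x, y) * f(x, y) for x, y in pairs)

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

cov_definition = expect(lambda x, y: (x - mu_x) * (y - mu_y))
cov_shortcut = expect(lambda x, y: x * y) - mu_x * mu_y

print(mu_x, mu_y)                    # 4/3 1
print(cov_definition, cov_shortcut)  # -1/3 -1/3
```

The covariance is negative, which makes sense here: selecting more married executives leaves fewer places for never-married ones.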
 <Insert slide 111>

Correlation Coefficient

ρxy = cov(X,Y) / [sqrt(V(X)) * sqrt(V(Y))]

Covariance is a degree of association that also depends on the magnitude of X and Y; dividing by the standard deviations removes that dependence.
The answer will be between -1 and 1
This is a measure of the degree of linear relationship between X and Y
If X and Y are independent, ρxy = 0
But ρxy = 0 does not mean independence
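
A small example of zero correlation without independence. The distribution below is hypothetical (not from the module): X takes -1, 0, 1 with probability 1/3 each and Y = X^2. The helper `expect` is also made up for this sketch.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint p.f.: X uniform on {-1, 0, 1}, Y = X^2
joint = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

def expect(g):
    """E[g(X, Y)] as a sum over the pairs with positive probability."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - mu_x * mu_y
var_x = expect(lambda x, y: x * x) - mu_x ** 2
var_y = expect(lambda x, y: y * y) - mu_y ** 2

rho = float(cov) / (sqrt(var_x) * sqrt(var_y))

# Marginals at the point (x, y) = (0, 1), which has joint probability 0
f_x0 = sum(p for (x, _), p in joint.items() if x == 0)
f_y1 = sum(p for (_, y), p in joint.items() if y == 1)

print(rho)                                   # 0.0
print(joint.get((0, 1), 0) == f_x0 * f_y1)   # False, so not independent
```

Y is completely determined by X here, but the relationship is not linear, so the correlation coefficient misses it.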