Population and sample

How does the average of each sample behave? That is described by the sampling distribution

The totality of all possible outcomes or observations of a survey or experiment is called a population
A sample is any subset of a population
Every outcome or observation can be recorded as a numerical or categorical value. Thus each member of a population is a value of a random variable.

Finite population

Finite number of elements

e.g

All citizens of Singapore
All books in the science library

A population is a collection of values, and the same value can occur in it more than once.

Infinite Population

An infinite population is one that consists of an infinitely large (countable or uncountable) number of elements.

e.g

The results of all possible rolls of a pair of dice
Random digits taken with replacement from the sample space of 10 digits {0,1,2,3,4,5,6,7,8,9}

Although there are only 10 distinct digits, drawing with replacement can be repeated indefinitely, thus the population is infinite

The depths at all conceivable positions of a lake

There are infinitely many points

Remark

Some finite populations are so large that in theory we assume them to be infinite, such as the population of lifetimes of a certain type of storage battery.

Random Sampling

Simple random sample

A set of n members taken from a given population is called a sample of size n
A simple random sample of n members is a sample chosen in such a way that every subset of n observations of the population has the same probability of being selected

We want to select a sample of 3 from 1,2,3,4,5,6,7,8,9,10,11,12

There are 12C3 = 220 possible samples, e.g. {1,8,9}

Each of these is equally likely to be selected
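
As a rough illustration of the example above, the following Python sketch counts the possible samples of size 3 and draws one simple random sample. The seed and the variable names are illustrative assumptions, not part of the notes.

```python
import math
import random
from itertools import combinations

population = list(range(1, 13))          # 1, 2, ..., 12
n = 3

# Every subset of size 3 is a possible sample; there are 12C3 of them.
all_samples = list(combinations(population, n))
num_samples = math.comb(len(population), n)

# random.sample draws without replacement, each subset being equally likely.
rng = random.Random(0)                   # seed chosen only for repeatability
one_sample = sorted(rng.sample(population, n))

print(num_samples, len(all_samples))     # 220 220
print(one_sample, 1 / num_samples)
```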

Sampling from a finite population

Sampling without replacement
- In general, there are NCn samples of size n that can be drawn from a finite population of size N without replacement

Each sample has an equal chance of being selected

P(a given sample is selected) = 1/NCn

Sampling with replacement
- Using the population {A,B,C,D}, there are 4^2 = 16 samples of size 2

There are N^n samples of size n that can be drawn from a finite population of size N with replacement
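
The two counting rules can be checked by brute force on the population {A,B,C,D}. This is only a sketch: it treats samples without replacement as unordered subsets and samples with replacement as ordered pairs, which is the convention the counts NCn and N^n imply.

```python
import math
from itertools import combinations, product

population = ["A", "B", "C", "D"]
n = 2

# Without replacement (order ignored): NCn samples.
without_repl = list(combinations(population, n))

# With replacement (order kept): N^n samples.
with_repl = list(product(population, repeat=n))

print(len(without_repl), math.comb(4, 2))   # 6 6
print(len(with_repl), 4 ** 2)               # 16 16
```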

Sampling from an infinite population (with or without replacement)

Taking one element out does not affect the rest, because the population is infinite.

E.g 1

Take 15 tosses of a coin, with the hypothetical infinite population of all possible tosses as the population
If the probability of getting heads is the same on every toss
and the 15 tosses are independent

-> then the sample is random

The values are independent and identically distributed, so the joint distribution is equal to the product of the marginal distributions of the individual values.

1) same distribution for every observation 2) independent

E.g 3

Consider the population of the sums of all possible rolls of a pair of dice
The population is considered to be infinite
We choose a random sample of size n from a random variable X having probability function given by

fx(x) = (6 - |x - 7|)/36 for x = 2, 3, ..., 12 (assuming fair dice)

It is infinite because we can roll the dice as many times as we like

To obtain a random sample of size 100, we simply roll the pair of dice 100 times independently under the same conditions
If xi represents the result of the ith roll, we then obtain x1, x2, ..., x100
All of these are random variables with the same distribution as the population variable X

No matter which roll it is, the outcome follows the same distribution
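
A minimal sketch of drawing such a sample by simulation, assuming fair six-sided dice. The helper name `roll_pair` and the seed are hypothetical choices made for this illustration.

```python
import random

rng = random.Random(42)   # seed is an arbitrary choice for repeatability

def roll_pair():
    """Sum of one roll of a pair of fair six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

# 100 independent rolls under the same conditions: x1, x2, ..., x100
sample = [roll_pair() for _ in range(100)]

print(len(sample), min(sample), max(sample))
```

Every element of `sample` is produced by the same function, which mirrors the statement that each roll follows the same distribution as X.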

Definition

Let X be a random variable with probability distribution fx(x)
Let X1, X2, ..., Xn be n independent random variables each having the same distribution as X
Then X1, X2, ..., Xn is called a random sample of size n from a population with distribution fx(x)

Sampling distribution of sample mean

We select a random sample to elicit information about unknown population parameters

e.g. we toss a coin once and record the outcome as x = 0 or x = 1

fx(x) = p^x * (1-p)^(1-x) for x = 0, 1

We draw inference from a subset of a population

We want to know the proportion of people in Singapore who prefer a certain brand of coffee
A large random sample is then selected from the population and the proportion of this sample favouring the brand of coffee in question is calculated

The value obtained may or may not equal the true proportion. Different random samples give different values, and each of them is a legitimate answer because the sample is selected at random

Statistic and sampling distribution

A function of a random sample is called a statistic

A function maps each input value to an output value

A statistic is itself a random variable
The probability distribution of a statistic is called a sampling distribution

Sample mean

The sample mean is the statistic Xspa = (X1 + X2 + ... + Xn)/n. Its realisation is the value obtained when the values in the random sample are observed

E.g 1

Consider a discrete uniform population
3,5,7,9,11
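Population size N = 5

Hence, fx(x) = 1/5 for x = 3, 5, 7, 9, 11

The population mean is u = E(X) = sum of x * fx(x) = (3 + 5 + 7 + 9 + 11) * (1/5) = 7

Note: the population mean is the expected value E(X), not the average of an observed sample, so we must use the formula

V(Xspa) = V(X) / n

E(Xspa) = E(X)

The variation of Xspa is smaller than the variation of X, because the variance of Xspa is V(X) / n. If n is a big number, then the variance of Xspa is close to 0 compared to V(X)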
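
The two identities can be verified exactly for the population 3,5,7,9,11 by listing every sample. The sketch below assumes samples of size n = 2 drawn with replacement (an illustrative choice) and uses exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [3, 5, 7, 9, 11]
n = 2  # sample size (an illustrative choice)

def mean_and_var(values):
    """Mean and variance of equally likely values, as exact fractions."""
    m = Fraction(sum(values), len(values))
    v = sum((Fraction(x) - m) ** 2 for x in values) / len(values)
    return m, v

pop_mean, pop_var = mean_and_var(population)

# All N^n equally likely samples drawn with replacement, and their means.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
xbar_mean, xbar_var = mean_and_var(sample_means)

print(pop_mean, pop_var)     # 7 8
print(xbar_mean, xbar_var)   # 7 4
```

The sample mean has the same expected value as X, and its variance is the population variance divided by n.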

Theorem

For random samples of size n taken from an infinite population, or from a finite population with replacement, having population mean u and population standard deviation sigma, the sampling distribution of Xspa has mean u and variance sigma^2/n

Law of large numbers

Let n be very large

-> P(|Xspa - u| > e) tends to 0 as n becomes big, for any fixed e > 0

Remark:

As the sample size increases, the probability that the sample mean differs from the population mean by more than any fixed amount tends to 0
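
A small simulation can make the remark concrete. This is a sketch under assumed settings: the population is a fair die (mean 3.5), and the tolerance `eps`, the number of trials and the seed are arbitrary illustrative values.

```python
import random

rng = random.Random(1)               # arbitrary seed
mu, eps, trials = 3.5, 0.25, 500     # fair-die mean; eps and trials are illustrative

def prob_far(n):
    """Estimate P(|Xspa - mu| > eps) for the mean of n fair-die rolls."""
    far = 0
    for _ in range(trials):
        xbar = sum(rng.randint(1, 6) for _ in range(n)) / n
        far += abs(xbar - mu) > eps
    return far / trials

estimates = {n: prob_far(n) for n in (10, 100, 1000)}
print(estimates)
```

The estimated probabilities should shrink towards 0 as n grows from 10 to 1000.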

Central limit theorem and its application

Let X1, X2, ..., Xn be a random sample of size n from a population having any distribution with mean u and finite population variance sigma^2

The sampling distribution of the sample mean Xspa is approximately normal with mean u and variance sigma^2/n if n is sufficiently large

Taking a random sample of a million does not mean the individual observations follow a normal distribution -> rather, the sample mean follows an approximately normal distribution

e.g. Suppose I select one million people from a population and ask them a yes or no question. Each individual answer is either yes or no, so how can it follow a normal distribution? It cannot, but the sample mean Xspa (the proportion answering yes) is approximately normally distributed

A big sample size means that the sample mean approximately follows a normal distribution
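
For a yes or no question, the claim can be checked without simulation, because the number of yes answers is binomial. The sketch below assumes n = 400 respondents and P(yes) = 0.3 (both hypothetical values) and compares the exact probability that Xspa lies within one standard deviation of p with a continuity-corrected normal approximation.

```python
import math
from statistics import NormalDist

n, p = 400, 0.3   # sample size and P(yes); both are illustrative assumptions
sd = math.sqrt(p * (1 - p) / n)          # SD of the sample proportion Xspa

def binom_pmf(k):
    """Exact P(k 'yes' answers out of n)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Counts k for which Xspa = k/n lies within one SD of p.
ks = [k for k in range(n + 1) if abs(k / n - p) <= sd]
exact = sum(binom_pmf(k) for k in ks)

# Normal approximation for the count, with continuity correction.
count_dist = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
approx = count_dist.cdf(ks[-1] + 0.5) - count_dist.cdf(ks[0] - 0.5)

print(round(exact, 3), round(approx, 3))
```

Each individual answer is 0 or 1, yet the two printed probabilities should nearly agree, which is what the central limit theorem predicts for the sample mean.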

Theorem

If X1, X2, ..., Xn are independent and each follows N(u, sigma^2), then Xspa is N(u, sigma^2/n) regardless of the sample size n

The sum a1X1 + ... + anXn is a linear combination, and Xspa is the special case a1 = ... = an = 1/n

The whole expression is a random variable. What is its distribution? A linear combination of independent normal random variables is again normal.

The variance is sigma^2/n, and this result does not require n to be large. If the population is only approximately normal, then Xspa is approximately N(u, sigma^2/n) as well, regardless of the sample size n

E.g 1

Light bulbs have a length of life that is approximately normally distributed
Mean equal to 800 hours
SD 40 hours
Find the probability that a random sample of 16 light bulbs will have an average life of less than 775 hours

Xspa approx ~ N(800, 40^2/16)

Therefore,

P(Xspa < 775) = P(Z < (775 - 800)/(40/4)) = P(Z < -2.5) = 0.0062

The standard normal is symmetric about 0, therefore P(Z < -2.5) = P(Z > 2.5). We do not need a graphic calculator
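
The probability can also be checked numerically instead of with a table. This is a minimal sketch using the standard library's `NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, n = 800, 40, 16
xbar_dist = NormalDist(mu, sigma / n ** 0.5)   # N(800, 10^2)

p = xbar_dist.cdf(775)                         # P(Xspa < 775)
z = (775 - mu) / (sigma / n ** 0.5)            # standardised value

print(z, round(p, 4))   # -2.5 0.0062
```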

E.g 2

Let Xspa denote the mean of a random sample of size 75
from a distribution with pdf:

fx(x) = 1 for 0<x<1

What is the distribution of Xspa?

Xspa approx ~ N(E(X), V(X)/n)

Find P(0.45 < Xspa < 0.55)
It is known that E(X) = 1/2 and V(X) = 1/12

E(Xspa) = E(X) = 1/2

V(Xspa) = V(X)/n = (1/12)/75 = 1/900
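
With the mean and variance of Xspa in hand, the requested probability follows from the normal approximation. A short sketch of the calculation, with illustrative variable names:

```python
import math
from statistics import NormalDist

n = 75
mean, var = 1 / 2, 1 / 12                 # E(X) and V(X) for the pdf above
xbar_dist = NormalDist(mean, math.sqrt(var / n))

# P(0.45 < Xspa < 0.55), i.e. P(-1.5 < Z < 1.5)
p = xbar_dist.cdf(0.55) - xbar_dist.cdf(0.45)
print(round(p, 4))   # 0.8664
```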

E.g 3

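Random sample of size 50 from a Poisson distribution with parameter lambda = 0.03
What is the probability that the sum of the sample will be at least 3, i.e. P(X1 + ... + X50 >= 3) = P(Xspa >= 3/50)?

By the theorem, Xspa approx ~ N(E(X), V(X)/n)

E(X) = 0.03 = V(X) since the population is Poisson

Therefore, the sum 50 * Xspa is approximately N(1.5, 1.5), and

P(sum >= 3) is approximately P(Z > (2.5 - 1.5)/sqrt(1.5)) = P(Z > 0.82), which is about 0.21

Note: Continuity correction is used, which is why 2.5 appears instead of 3.

If the sample size is very big, the continuity correction does not really matter.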
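
To see how good the approximation is here, the sketch below compares the continuity-corrected normal value with the exact probability. The exact value relies on the standard fact that a sum of independent Poisson variables is Poisson with the summed parameter; that fact is brought in for the comparison and is not stated in the notes.

```python
import math
from statistics import NormalDist

n, lam = 50, 0.03
total_mean = total_var = n * lam          # the sum has mean and variance 1.5

# Normal approximation with continuity correction: P(sum >= 3) ~ P(Y > 2.5)
approx = 1 - NormalDist(total_mean, math.sqrt(total_var)).cdf(2.5)

# Exact value, treating the sum as Poisson(n * lam)
exact = 1 - sum(math.exp(-total_mean) * total_mean**k / math.factorial(k)
                for k in range(3))

print(round(approx, 3), round(exact, 3))
```

The two values are close but not identical, which is expected when the mean of the sum is as small as 1.5.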

E.g 4

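Nicotine content per cigarette has mean 0.8 mg
SD 0.1 mg
Someone smokes 5 packs (20 cigarettes per pack) per week

-> 5 packs consist of 100 cigarettes -> Let Xi denote the nicotine content of the ith of the 100 cigarettes

Applying the CLT, Xspa approx ~ N(0.8, 0.1^2/100)

Asking whether the total nicotine is at least 82 mg is the same as asking whether Xspa is greater than or equal to 0.82, and P(Xspa >= 0.82) = P(Z >= 2) = 0.0228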

Sampling distributions of difference of two sample means

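What if we have two sample means, X1spa and X2spa, from independent samples of sizes n1 and n2?

Since X1spa and X2spa are approximately normally distributed, their difference is also approximately normally distributed, with mean u1 - u2 and variance sigma1^2/n1 + sigma2^2/n2.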

E.g 1

TV picture tubes of manufacturer A have a

mean lifetime of 6.5 years
SD of 0.9 years

Tubes of manufacturer B have a

mean lifetime of 6 years
SD of 0.8 years

What is the probability that a random sample of 36 tubes from A has a mean lifetime at least 1 year more than the mean of a sample of 49 tubes from B?

P(Xspa - Yspa >= 1)

Mean of Xspa - Yspa is 6.5 - 6 = 0.5
SD = sqrt((0.81/36) + (0.64/49)) = 0.189

P(Xspa - Yspa >= 1) = P(Z >= (1 - 0.5)/0.189) = P(Z >= 2.65) = 0.0040

We are not told the distribution of the lifetimes, only the means and standard deviations. That is why we approximate the distribution of the sample means by the normal distribution
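
The calculation for the two manufacturers can be reproduced in a few lines. This sketch assumes the two samples are independent, which is what allows the variances of the two sample means to be added.

```python
import math
from statistics import NormalDist

mean_diff = 6.5 - 6.0
sd_diff = math.sqrt(0.9**2 / 36 + 0.8**2 / 49)

# P(Xspa - Yspa >= 1) under the normal approximation
p = 1 - NormalDist(mean_diff, sd_diff).cdf(1.0)
print(round(sd_diff, 4), round(p, 4))
```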

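Chi-square distribution

Now we are talking about how the sample variance behaves

For a random sample of size n from a normal population with variance sigma^2, (n-1)S^2/sigma^2 follows the X^2(n-1) distribution.

This is the PDF for Y:

fy(y) = y^(n/2 - 1) * e^(-y/2) / (2^(n/2) * Gamma(n/2)) for y > 0, and 0 otherwise

Y is defined to have a chi-square distribution with n degrees of freedom, denoted by X^2(n), where n is a positive integer and Gamma(.) is the gamma function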

![2334_5_15.PNG](/NUSCSMODS/img/2334_5_15.PNG)

Properties

If Y ~ X^2(n), then E(Y) = n and V(Y) = 2n
For large n, X^2(n) approx ~ N(n, 2n)
If Y1, ..., Yk are independent chi-square random variables with n1, ..., nk degrees of freedom respectively, then the sum Y1 + ... + Yk has a chi-square distribution with n1 + ... + nk degrees of freedom
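
The first property can be checked by simulation. The sketch below builds X^2(n) draws as sums of n squared independent N(0,1) values, which combines the additivity property with the fact that the square of a standard normal is X^2(1). The degrees of freedom, number of repetitions and seed are illustrative assumptions.

```python
import random

rng = random.Random(7)     # arbitrary seed
n, reps = 5, 20000         # degrees of freedom and repetitions (illustrative)

# Each draw is a sum of n squared independent N(0, 1) values, i.e. X^2(n).
draws = [sum(rng.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(reps)]

sim_mean = sum(draws) / reps
sim_var = sum((d - sim_mean) ** 2 for d in draws) / reps

# These should be close to n = 5 and 2n = 10 respectively.
print(round(sim_mean, 2), round(sim_var, 2))
```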

Theorem

If X ~ N(0,1), then X^2 ~ X^2(1). In particular, this applies to X = (Y - u)/sigma where Y ~ N(u, sigma^2)

Sampling distribution of (n-1)S^2/sigma^2

t-distribution

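Z = (Xspa - u) / (sigma/sqrt(n)) ~ N(0,1)

Replacing sigma by the sample standard deviation S gives T = (Xspa - u) / (S/sqrt(n)), which follows the t distribution with n-1 degrees of freedom when the population is normal

The t distribution can be considered as the ratio of 2 independent random variables: T = Z / sqrt(U/k), where Z ~ N(0,1) and U ~ X^2(k)

Properties of t-distribution

The graph resembles that of the standard normal distribution
As the degrees of freedom tend to infinity, it approaches the standard normal distribution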

E.g 1

A manufacturer claims that its light bulbs burn on average 500 hours
25 bulbs are tested each month
The claim is accepted if the computed t value falls between -t(24;0.05) and t(24;0.05)
What conclusion should be drawn from a sample that has a mean Xspa = 518 hours and SD s = 40?

From the table, t(24;0.05) = 1.711, so if u = 500 the probability that T ~ t(24) falls between -1.711 and 1.711 is 0.90.

Assume that the distribution of burning times in hours is approximately normal

If u = 500 then t = (518 - u)/(40/sqrt(25)) = (518 - 500)/8 = 2.25 > 1.711

In this case, we do not believe the manufacturer's claim, since the data suggest that u > 500.
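
The decision rule amounts to a one-line computation. In this sketch the critical value 1.711 is copied from the t table rather than computed, and the variable names are illustrative.

```python
import math

mu0, xbar, s, n = 500, 518, 40, 25
t_crit = 1.711                    # t(24; 0.05), taken from the t table

t = (xbar - mu0) / (s / math.sqrt(n))
within = -t_crit < t < t_crit     # True would support the claim u = 500

print(t, within)   # 2.25 False
```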

F-distribution

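Let U and V be independent random variables having X^2(n1) and X^2(n2) distributions respectively.

Definition

F = (U/n1) / (V/n2) is said to have an F-distribution with (n1, n2) degrees of freedom, written F ~ F(n1, n2)

E.g 1

Suppose random samples of size n1 and n2 are selected from 2 normal populations with variances sigma1^2 and sigma2^2 respectively

We are interested in finding out whether the 2 populations have the same variance. Thus we will use (S1^2/sigma1^2) / (S2^2/sigma2^2) ~ F(n1 - 1, n2 - 1)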

Theorem

If F ~ F(n,m) then 1/F ~ F(m,n)

We are just taking the reciprocal of a ratio, so we need to swap the degrees of freedom
The table gives the values of F(n1,n2;a) such that P(F > F(n1,n2;a)) = a

E.g

F(5,4;0.05) = 6.26 means P(F>6.26) = 0.05 where F ~ F(5,4)
F(4,5;0.025) = 7.39 means P(F>7.39) = 0.025 where F ~ F(4,5)

Theorem

For a positive random variable W and a > 0:
P(W > a) = 1 - alpha
P(1/W < 1/a) = 1 - alpha
P(1/W > 1/a) = alpha

Hence F(n1,n2;1-alpha) = 1/F(n2,n1;alpha)

E.g 2

Let S1^2 and S2^2 be the sample variances of independent samples of size n1 = 25 and n2 = 31
taken from normal populations with variances 10 and 15 respectively
Find P(S1^2 / S2^2 > 1.26)
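
One way to approach this is to rescale to an F statistic: P(S1^2/S2^2 > 1.26) = P((S1^2/10)/(S2^2/15) > 1.26 * 15/10) = P(F > 1.89) with F ~ F(24, 30). The sketch below estimates the probability by simulation instead of a table. It assumes that (n-1)S^2/sigma^2 ~ X^2(n-1) for normal samples and draws chi-square values as gamma variates with shape k/2 and scale 2. The seed and repetition count are arbitrary.

```python
import random

rng = random.Random(3)            # arbitrary seed
n1, n2 = 25, 31
var1, var2 = 10, 15
reps = 20000                      # illustrative number of repetitions

def sample_variance(n, sigma2):
    """Draw S^2 using (n-1)S^2/sigma^2 ~ X^2(n-1); X^2(k) is Gamma(k/2, scale 2)."""
    chi2 = rng.gammavariate((n - 1) / 2, 2)
    return sigma2 * chi2 / (n - 1)

hits = sum(sample_variance(n1, var1) / sample_variance(n2, var2) > 1.26
           for _ in range(reps))
p_sim = hits / reps

# Equivalent F statement: P(F > 1.26 * var2 / var1) with F ~ F(24, 30)
f_cutoff = 1.26 * var2 / var1
print(round(f_cutoff, 2), p_sim)
```

If the F table lists F(24,30;0.05) as roughly 1.89, the simulated probability should come out near 0.05.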