Point estimation

Mean and variance

Assume that some characteristics of elements can be represented by random variable X whose pdf is fx(x,titer)

Choose the varables for our parameters, this will give us different normal distribution

The form of pdf is assumed know except that it contains unknown parameter titer

Parameter is just some constant for our PDF

Estimator: Based as a fomula on the random sample -> Can use to estimate the unknown parameter

Estimation

Point estimation

This will be always wrong, but it is good for a large sample size (Law of large number)

Interval estimation

Choose an interval where the unknown can land in.

e.g Avg temperature in singapore: 28 - 30 degrees

Parameter and statistic

A statistic is a function of the random sample which does not depend on any unknown parameters

We can only vary the random set but we cannot involve anything that is not known to us, it cannot be use as an estimator

Point estimate of mean

Suppose u is the population mean

The stats that one uses to obtain a pount estimate is called estimator

e.g Xspa is an estimator of u

The value of X, denoted by xspa is an estimate of u

E.g 1

Sample mean of random sample taken from population with mean u is 5
Point estimate for the population mean u is 5
lowercase: estimate
Uppercase: Estimator

Different random sample give different point estimates of u

Interval estimation

Define two stats like:

This is a random interval per 2 end points where each end point is a random variable.

E.g 2

Suppose a^2 is known

Let

Random variable has more than one variable with a probability associated with it.

Unknown parameters are unknown constant that will not change

Here we use xspa to determine the estimate. If we want to find the chance where

p(titleL < u < titleU) ?

Unbiased estimator

A statistic is said to be an unbaised of the parameter titer if

E(x) = u and E(xspa) = u

This is a consistent estimator where n is very huge

To compare two different estimator for same parameter, we look at the unbiasness.

Estimator is good if:

Biase is small (Not too far from para)
Variation is not very big

This is called mean square error

Interval estimation

An interval estimate of population parameter is an interval of the form

We try to get the width of confidence interval small.

Since different sample will yeild different titer hat
Therefore, these end points of the interval are values of corresponding random variables

I take a sample and calculate many intervals. Titerlhat can be consider as a random variable.

If we repeat it many times, about 95 percent of the time, it will cover

Confidence interval for the mean

Known variance case

Confidence interval for mean with

Known variance case

Sample size for estimating mean

How big the sample must i get in order to get inference? -> Depends

This is to get the sample size

If e is small, then n will be big.
if n is big, e will be big

This is the same as saying that if the variance is big, we need a larger sample size in order to achieve a higher level of confidence

E.g 1

Mean of CAP of random sample of 36 seniors is 3.5
muel is 0.3 i) find the confidence interval for the mean of the entire seniors

We can just sub in the fomula

ii) How large a sample is required if we want to be 95 percent confidence that our estimate of u is off by less than 0.05?

Unknown Variance case

We are thinking of small sample size (n<30)

Point estimate
Construct an interval: some multiple of the sd

For large n,

E.g 1

E.g 2

The population is the customers that uses credit card
u, the avf amount spent on their first visit to the chain’s new store in the mall

We are interested in certain characteristic of the individual which is the amount spent

-> We assume that they follow a normal distribution

Since n is large, we use z value instead of t-value

A 90 percent confidence interval for the mean is given by:

Confidence intervals for the difference between two means

We normally like to look at the difference. It does not matter which mean is one or two, we can swap accordingly

Assume have 2 population with u1 and u2 and varaince a1 and a2

Then

 Xspa1 - Xspa2

is the point estimator of u1 - u2

Known variances

Large Sample C.L for unknwon variance

We use s1^2 because we trying to use the best known

E.g 2

Unknown but equal variances

We can still apply the t distribution

We can use chi - square (they are independent)

Unknown but equal variance for large sample

Just check these conditions:

1) Approximately normal 2) Equal variance

This will give us a range for u1 - u2

C.L for the difference between two means for paired data (Dependent data)

We want to find out if there are difference in the means but what we are looking for if there are difference in the two variables.

How to find the confidence interval for this difference

We are trying to find the avg of these two differenes.

1) Consider before and after

2) Both u is taken from the same individual (pair)

Consider these questions:

Normally distributed? => yes
Do we know the variance (population)? => no

we are looking at d + k sqrt(
Sample size is known => yes

V(xspa) = a^2/n

the xi and yi is related

Big sample does not mean that the sample follow the normal distribution!!!!!! It is the average (xspa) which follow normal distribution if our sample size is big enough, central limit theorem only apply to xspa

Small sample and approximate normal population

For large sample (n>30)

The numbers are close to 1.96, therefore we can just replace it.

E.g 4

10 pairs of 20 students
Each member of pair is approx the same IQ
One of each pair was selected at random and is assign to a math section using program materials only
The other member of each pair is assign to professor lecture
Given same exam there result recorded