MTH 135 / STA 104: Probability                              Week 6
Read: Pitman, Section 3.3
------------------------------------------------------------
Useful Facts about Expectations:

Definition:    E[ g(X) ] = SUM g(x) * P[X=x]

               mu      = E[ X ]
               sigma^2 = E[ (X-mu)^2 ] = E[ X^2 ] - mu^2

Linearity:     E[ a + b*X ] = SUM { (a + b*x) * P[X=x] }
                            = SUM a * P[X=x] + SUM b * x * P[X=x]
                            = a + b*mu

             VAR[ a + b*X ] = E[ (a + b*X - a - b*mu)^2 ]
                            = E[ ( b * (X-mu) )^2 ]
                            = b^2 * sigma^2

               E[ X + b*Y ] = mu_X + b * mu_Y

             VAR[ X + b*Y ] = E[ ( X + b*Y - mu_X - b*mu_Y )^2 ]
                            = E{ [ (X-mu_X) + b*(Y-mu_Y) ]^2 }
                            = E (X-mu_X)^2 + 2*b * E (X-mu_X)(Y-mu_Y)
                              + b^2 * E (Y-mu_Y)^2
                            = sigma_X^2 + 2*b*Cov(X,Y) + b^2 * sigma_Y^2

Markov's Inequality:  Let phi(x) be EVEN and INCREASING on [0,oo)
(like |x| or x^2).  Then for any number a>0 and any random variable X,

    P[ |X| > a ] <= P[ phi(X) >= phi(a) ] <= E[ phi(X) ] / phi(a)

Examples:   P[ |X| > a ]           <= E|X| / a
            P[ |X| > a ]           <= E X^2 / a^2
            P[ |X-mu| > a ]        <= sigma^2 / a^2      (Chebyshev)
            P[ |X-mu| > k*sigma ]  <= 1/k^2              (set a = k*sigma)
------------------------------------------------------------
Let's explore what happens for averages

    Xbar = ( X1 + X2 + ... + Xn ) / n

of a sequence of independent random variables, all with the same
distribution with

    MEAN      E[ X ]        = mu    and
    VARIANCE  E[ (X-mu)^2 ] = sigma^2.

Let Sn = X1 + X2 + ... + Xn be the "partial sum"; then

    E[ Sn ]   = E[ X1 + X2 + ... + Xn ]
              = (E X1) + (E X2) + ... + (E Xn)
              = mu + mu + ... + mu = n * mu    and

    VAR[ Sn ] = E[ { (X1-mu) + (X2-mu) + ... + (Xn-mu) }^2 ]
              = n * E[ (Xi-mu)^2 ] + n*(n-1) * E[ (Xi-mu)(Xj-mu) ]
              = n * sigma^2

(the n*(n-1) cross terms vanish, since independence gives
E[ (Xi-mu)(Xj-mu) ] = 0 for i != j), so

    E[ Xbar ]   = (1/n) * ( n * mu )        = mu
    VAR[ Xbar ] = (1/n^2) * ( n * sigma^2 ) = sigma^2 / n
------------
SO, for samples of size n, for any number eps > 0,

    P[ | Xbar - mu | > eps ] =  P[ | Sn - n*mu | > n*eps ]
                             <= E[ (Sn - n*mu)^2 ] / (n*eps)^2
                             =  n * sigma^2 / ( n^2 * eps^2 )
                             =  sigma^2 / ( n * eps^2 )  ->  0,

so the probability that Xbar isn't very close to the mean mu goes to
zero as n -> oo.  With more work we can show that P[ Xbar -> mu ] = 1.
-------------
Sketch if there's time:  Let {An} be events, and let A be the event

    A = { infinitely-many of the An occur }
      = (intersection over m>0) { at least one An occurs for n>=m }

If Sum{ P[An] : n=1,2,... } < oo, then fix any epsilon>0 and find M
such that Sum{ P[An] : n=M,M+1,... } < epsilon.  Then

    P[ A ] <= P[ at least one An occurs for n>=M ]
           <= Sum{ P[An] : n>=M }
           <  epsilon,

so P[ A ] = 0.  This is called the "Borel-Cantelli Lemma", or "B-C":
if the probabilities of some sequence of events are summable, then at
most finitely-many of them can occur.  The An do NOT have to be
independent.

Okay --- now let Xn be independent random variables with means mu and
variances sigma^2 < oo.  We saw that, for any eps>0,

    P[ | Xbar_n - mu | > eps ] <= sigma^2 / ( eps^2 * n )

The statement "Xbar_n -> mu" is the same as saying "for every eps>0,
only finitely-many An occur" for the events

    An = { | Xbar_n - mu | > eps },   Xbar_n = Sn / n.

Unfortunately, Sum[ 1/n ] = oo, so we can't apply B-C directly.  But
Sum P[ A_{n^2} ] <= Sum sigma^2 / ( eps^2 * n^2 ) < oo, so B-C DOES
tell us that only finitely-many of A1, A4, A9, A16, A25, ...,
A_{n^2}, ... occur; hence Xbar_k -> mu along the *subsequence*
k = n^2.  With a little more work we can fill in the gaps between the
squares to get the STRONG LAW OF LARGE NUMBERS:
-------------
Thus,
               Sn - n*mu
    LLN:       ---------  =  ( Xbar - mu )  ->  0
                   n

If we divide by something that grows slower.... namely, sqrt(n)....
then:
               Sn - n*mu
    CLT:       ---------  =  sqrt(n) * [ Xbar - mu ]  ==>  No( 0, sigma^2 )
                sqrt(n)

We'll see more of this later; two small simulation sketches follow.
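First, a minimal Python sketch (not from Pitman) of the Chebyshev
bound and the weak law in action.  It assumes Exponential(1) draws, so
mu = sigma^2 = 1; the names eps, n_reps, etc. are ours, chosen only
for illustration.

import random

# Sketch: compare the empirical P[ |Xbar - mu| > eps ] with Chebyshev's
# bound sigma^2 / (n * eps^2), using Exponential(1) draws so that
# mu = sigma^2 = 1.  All parameter choices here are illustrative.

random.seed(6)
mu, sigma2 = 1.0, 1.0      # mean and variance of Exponential(1)
eps        = 0.1
n_reps     = 5000          # Monte Carlo replications per sample size

for n in [10, 100, 1000]:
    misses = 0
    for _ in range(n_reps):
        xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            misses += 1
    bound = sigma2 / (n * eps * eps)
    print(f"n={n:5d}  P[|Xbar-mu|>eps] ~ {misses/n_reps:.4f}"
          f"   Chebyshev bound = {min(bound, 1.0):.4f}")

The bound should hold for every n but is typically quite conservative;
the point is that both columns head to zero as n grows.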
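Second, a sketch of the CLT statement above: standardize Sn as
(Sn - n*mu) / (sigma * sqrt(n)) and compare its empirical CDF with the
No(0,1) CDF.  Again the summands are assumed Exponential(1), and the
grid of z values is arbitrary.

import random
import statistics

# Sketch: the standardized partial sum (Sn - n*mu)/(sigma*sqrt(n))
# should be approximately No(0,1) for large n.  Exponential(1)
# summands give mu = sigma = 1.  Parameter choices are illustrative.

random.seed(6)
n, n_reps = 500, 4000
mu, sigma = 1.0, 1.0
phi = statistics.NormalDist()          # standard normal No(0,1)

zs = [(sum(random.expovariate(1.0) for _ in range(n)) - n * mu)
      / (sigma * n ** 0.5)
      for _ in range(n_reps)]

for z in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    emp = sum(v <= z for v in zs) / n_reps
    print(f"z={z:+.1f}   empirical P[Z<=z] = {emp:.3f}"
          f"   No(0,1) CDF = {phi.cdf(z):.3f}")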
But we've already seen the DeMoivre-Laplace theorem, where
X1, X2, ..., Xn are INDICATOR RVs with

    P[ Xj = 1 ] = p        P[ Xj = 0 ] = 1-p = q
    E[ Xj ]     = p        VAR[ Xj ]   = p*q

so Sn has a Binomial Bi(n, p) distribution, and DeM-Lap and CLT both
say:
           Sn - np
        --------------  ~~  No(0, 1)
        sqrt( n*p*q )
----------------------------------------------------------------------
Extremes:

Particularly in the aftermath of events like the 2008 housing market
crash, the 2010 Gulf oil spill, global warming trends, and such, we
are more and more interested not only in the *AVERAGE* behaviour of
random variables but also in their *EXTREMES*.  The LLNs and the CLT
talk only about averages.

If X1, X2, ..., Xn are independent random variables, set
X*n = max(X1, ..., Xn).  (We could do the minimum too.)

  - Does X*n have a limiting probability distribution?  (Probably not...)
  - If not, can we find constants a.n, b.n such that
    Zn = (X*n - a.n) / b.n has a limiting distribution?  (YES, usually)
  - What will that distribution be?

Examples:  (1) Xn uniform random variables
           (2) Xn exponential random variables
           (3) Xn Pareto random variables

It turns out the three limits below are the only possibilities!  They
can all be put into one family, the Generalized Extreme Value ("GEV")
family.  Ask me if you're interested in learning more.

Uniform(0,1):   P[ X*n < x ] = x^n,   0 < x < 1

    P[ Zn < z ] = P[ X*n < a.n + z*b.n ] = (a.n + z*b.n)^n

    Set a.n = 1, b.n = 1/n, and consider -n < z < 0:

                = (1 + z/n)^n  ->  exp(z),       -oo < z < 0
                                                 ("Reversed Weibull")

Exponential w/rate lam:   P[ X*n < x ] = [ 1 - exp(-lam*x) ]^n

    P[ Zn < z ] = P[ X*n < a.n + z*b.n ]
                = [ 1 - exp( -lam * (a.n + z*b.n) ) ]^n

    Set a.n = log(n)/lam and b.n = 1:

                = [ 1 - (1/n) * exp(-lam*z) ]^n
                ->  exp( -exp(-lam*z) ),         -oo < z < oo
                                                 ("Gumbel")

Pareto:   P[ Xn > x ] = (eps/x)^alp,  x > eps    (alp > 0, eps > 0)

    P[ X*n < x ] = [ 1 - (eps/x)^alp ]^n

    P[ Zn < z ] = P[ X*n < a.n + z*b.n ]
                = { 1 - [ eps/(a.n + z*b.n) ]^alp }^n

    Set a.n = 0 and b.n = eps * n^(1/alp); then

                = { 1 - [ eps/(z * eps * n^(1/alp)) ]^alp }^n
                = { 1 - z^(-alp)/n }^n
                ->  exp( -z^(-alp) ),            0 < z < oo
                                                 ("Frechet")
----------------------------------------------------------------
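A minimal simulation sketch (not from Pitman) of the Gumbel case
above: for Exponential(lam) draws, center the maximum by
a.n = log(n)/lam (with b.n = 1) and compare the empirical CDF of Zn
with exp(-exp(-lam*z)).  The choices lam = 1, n, n_reps, and the z
grid are ours, for illustration only.

import math
import random

# Sketch: the centered maximum Zn = X*n - log(n)/lam of n iid
# Exponential(lam) draws should be approximately Gumbel:
# P[ Zn <= z ] ~ exp(-exp(-lam*z)).  Parameter choices are illustrative.

random.seed(6)
lam, n, n_reps = 1.0, 1000, 5000
a_n = math.log(n) / lam

zn = [max(random.expovariate(lam) for _ in range(n)) - a_n
      for _ in range(n_reps)]

for z in [-1.0, 0.0, 1.0, 2.0, 3.0]:
    emp = sum(v <= z for v in zn) / n_reps
    gumbel = math.exp(-math.exp(-lam * z))
    print(f"z={z:+.1f}   empirical P[Zn<=z] = {emp:.3f}"
          f"   Gumbel limit = {gumbel:.3f}")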