SUMS OF INDEPENDENT RVs
=======================
In general it is hard to find the pdf or pmf of a sum Sn = X1 + ... + Xn of random variables. Soon we'll see that, IF the rv's are
 * (nearly) independent and have
 * (nearly) the same distribution, and if
 * n is very large,
then the distribution of Sn will be approximately normal.

First some preliminaries. If X is a random variable with (nonnegative) integer values, define the "generating function"

    G(z) = E[z^X] = sum over k>=0 of z^k * P[X=k].

We can recover the pmf p(k) = P[X=k] from the power-series expansion of G(z), for example as

    p(k) = G^(k)(0) / k!  =  (k'th derivative of G, evaluated at 0) / k!

This is helpful for sums of independent rv's: if Sn = X1 + ... + Xn, then its generating function is

    G(z) = E[z^Sn] = E[z^X1] * ... * E[z^Xn],

so we can recover the pmf of Sn from the power series for G. For example, the sum S2 of X1 ~ Bi(n=10, p=.2) and X2 ~ Bi(n=6, p=.5) has generating function

    G(z) = (.2 z + .8)^10 * (.5 z + .5)^6,

available from Mathematica as

    0.0016777216 + 0.0142606336 z + 0.05505024 z^2 + 0.127926272 z^3
    + 0.2000814080 z^4 + 0.22320906 z^5 + 0.183744102 z^6 + 0.1139384320 z^7
    + 0.05385946 z^8 + 0.019508032 z^9 + 0.0054071824 z^10 + 0.00113730 z^11
    + 0.000178328 z^12 + 0.0000201920 z^13 + 1.56*10^-6 z^14 + 7.36*10^-8 z^15
    + 1.60*10^-9 z^16,

from which we can read off the probabilities of the different possible sums.


LIMIT THEOREMS
==============

0. MARKOV & CHEBYCHEV INEQUALITIES:
Simple & useful tools used to prove these limit theorems are the inequalities

    P[ Y > a ] <= E[ Y ] / a                  for nonnegative random variables Y
    P[ |X - mu| > k sigma ] <= 1/k^2          (set Y = |X-mu|^2/sigma^2, a = k^2)

The cool thing about these is that they are NOT "approximations", they are guarantees; and they don't use anything about the distributions of X or Y beyond the mean and variance. Also they're easy to prove....

Two hallmarks of Probability Theory are the:

1. LAW of LARGE NUMBERS:
If X1, ..., Xn, ... are all independent with the same probability distribution with (finite) mean mu, then

    Xbar_n = (X1 + X2 + ... + Xn)/n  -->  mu,

i.e. Ybar_n = (Xbar_n - mu)/sigma satisfies Ybar_n --> 0 as n -> oo. The Ybar_n are *random variables*, not just numbers, so we have to be a little more careful about what this means:

    1. WEAK:    P[ |Ybar_n| > eps ] --> 0
    2. STRONG:  P[ Ybar_n -> 0 ] = 1

One can show that the WEAK formulation says the events En = { |Ybar_n| > eps } satisfy P[ En ] -> 0 as n -> oo, while the STRONG formulation makes the stronger statement P[ U {En : n >= m} ] -> 0 as m -> oo.

There is a range of possible variations, weakening "same distribution" or "independent"; but SOME sort of condition is needed to rule out examples like this: Toss a fair coin. If Heads, set X1 = X2 = X3 = ... = 1; if Tails, set X1 = X2 = X3 = ... = 0. Then E[Xi] = mu = 1/2 and all the Xi have the same distribution, but Xbar_n = X1 does not converge to mu as n -> oo in any sense.

For indicator random variables (Xi = 1 if event A occurs on the i'th trial, Xi = 0 otherwise), the Law of Large Numbers says

    Xbar_n = (# of occurrences of A in the first n trials)/n  -->  Prob(A),

consistent with the Frequency Interpretation of probability.

2. CENTRAL LIMIT THEOREM:
If X1, ..., Xn, ... are all independent with the same probability distribution with (finite) mean mu and finite variance sigma^2, then

    sqrt(n) * [ Xbar_n - mu ] / sigma  ==>  N(0,1)

or, for the normalized versions Yi = (Xi - mu)/sigma,

    sqrt(n) * Ybar_n = (Y1 + ... + Yn)/sqrt(n)  ==>  N(0,1),

i.e., the probability distribution of Xbar_n gets closer and closer to the normal distribution with mean mu and variance sigma^2/n.
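As a quick numerical illustration of the theorem, here is a minimal simulation sketch, assuming Python with NumPy; the choices of Exponential(1), n = 50, and 100,000 replications are arbitrary. It standardizes the sample mean of n i.i.d. draws and compares an empirical probability with Phi(1) = 0.8413:

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 50, 100_000
    mu, sigma = 1.0, 1.0        # Exponential(1) has mean 1 and variance 1

    # reps independent samples of size n; standardize each sample mean
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    z = np.sqrt(n) * (xbar - mu) / sigma

    # empirical P[ Z <= 1 ] should be close to Phi(1) = 0.8413
    print((z <= 1.0).mean())

Exponential(1) is convenient only because mu = sigma = 1; any other distribution with finite variance shows the same limiting behavior.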
The Central Limit Theorem lets us compute approximate probabilities for means & sums of any set of independent random variables. Poisson and Gamma random variables are *approximately* normal for large lambda or alpha, respectively; for example, let X be Poisson with mean 100; then

    P[ X <= 110 ] = Sum [ 100^k/k! * exp(-100) : k = 0...110 ]  =  0.8528627
                  = P[ (X - 100)/10 <= (110.5 - 100)/10 ]
           APPROX = Phi( (110.5 - 100)/10 = 1.05 )              =  0.8531409

Note that this is a better approximation than Phi((110-100)/10) = .84; the difference (the continuity correction) is more striking at smaller values of lambda.

--------
Simple Proof of the Weak Law of Large Numbers:
Set Yi = (Xi - mu)/sigma, with average Ybar_n = (Xbar_n - mu)/sigma; in this notation the three results say:

    WLLN:  P[ |Ybar_n| > eps ] -> 0
    SLLN:  P[ Ybar_n -> 0 ] = 1
    CLT:   P[ Ybar_n < a/sqrt(n) ] -> Phi(a)

For the WLLN: the *mean* of Ybar_n is zero and the *variance* of Ybar_n is 1/n, so by Chebychev's inequality

    P[ |Ybar_n| > eps ] <= (1/n) / eps^2 = 1/(n*eps^2)  -->  0

as n -> oo, for any eps > 0.

CLT: Any probability distribution (at least, any whose MGF is finite for t near 0) is characterized by its "moment generating function" (MGF, related to Fourier & Laplace transforms):

    M(t) = E[e^(tX)] = integral from -oo to oo of e^(tx) f(x) dx     (for continuous X with pdf f)

Interesting properties (and the genesis of the name) include

    M(0) = 1,   M'(0) = mu,   M''(0) = E[X^2]   =>   sigma^2 = M''(0) - {M'(0)}^2

or, a bit more elegantly,

    log M(0) = 0;   [log M]'(0) = mu;   [log M]''(0) = sigma^2,

so a Taylor series expansion gives

    log M(t) = 0 + mu*t + sigma^2 * t^2/2 + R(t),   or
    M(t) = exp( mu*t + sigma^2 * t^2/2 ) * exp( R(t) ),

with remainder R(t) small compared to t^2 as t -> 0.

Some common distributions' MGFs are:

    Binomial:     M(t) = [p e^t + q]^n                log M = n * log{1 + p(e^t - 1)}
    Poisson:      M(t) = exp(lambda*(e^t - 1))        log M = lambda * (e^t - 1)
    Geometric:    M(t) = p/(1 - q e^t)                log M = -log{1 + (q/p)(1 - e^t)}
    Exponential:  M(t) = (1 - t/lambda)^-1            log M = -log{1 - t/lambda}
    Gamma:        M(t) = (1 - t/lambda)^-alpha        log M = -alpha * log{1 - t/lambda}
    Normal:       M(t) = exp(mu*t + sigma^2*t^2/2)    log M = mu*t + sigma^2*t^2/2

In particular, the standard normal has log M(t) = t^2/2.

For independent rv's, the sum Sn has MGF equal to the product of the MGFs:

    M_Sn(t) = E[e^(t*(X1+...+Xn))] = E[e^(t*X1)] * E[e^(t*X2)] * ... * E[e^(t*Xn)] = M(t)^n.

After subtracting mu and dividing by sigma we might as well start with Xi's that have zero mean and unit variance; for mu = 0, sigma = 1,

    Zn = (Sn - n*mu)/(sigma * sqrt(n)) = Sn/sqrt(n)

satisfies

    log M_Zn(t) = n * log M(t/sqrt(n))
                = n * [ (t/sqrt(n))^2 / 2 + R(t/sqrt(n)) ]        (Taylor's Theorem)
                = t^2/2 + n * {something small compared to t^2/n}
                --> t^2/2

===> for large n, Zn has *approximately* the distribution with MGF exp(t^2/2), i.e. the standard normal distribution. (Using more advanced mathematics, including the "Fourier inversion formula", we could show that the CDF of Zn, and the expectation E[g(Zn)] for any bounded continuous function g(z), converge to those of a standard normal random variable.) Note that the PDF need not converge, and in fact will not even exist unless at least one of the Xj has a continuous distribution.

SO, any sum or average of a large number of independent random variables is APPROXIMATELY normally distributed.
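The limit n * log M(t/sqrt(n)) --> t^2/2 can be checked numerically; a minimal sketch, assuming Python with NumPy and using the centered Exponential(1), Y = X - 1 (so mu = 0, sigma = 1, and M(t) = e^(-t)/(1 - t) for t < 1):

    import numpy as np

    # log-MGF of Y = X - 1 with X ~ Exponential(1):  log M(t) = -t - log(1 - t),  t < 1
    def log_mgf(t):
        return -t - np.log1p(-t)

    t = 1.0
    for n in [10, 100, 10_000, 1_000_000]:
        print(n, n * log_mgf(t / np.sqrt(n)))   # approaches t**2/2 = 0.5

The printed values shrink toward 0.5 roughly like 1/sqrt(n); that gap is exactly the "something small compared to t^2/n" term, multiplied by n.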
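The Poisson example near the top of this section can be reproduced the same way; a minimal sketch, assuming Python with SciPy:

    from scipy.stats import norm, poisson

    exact  = poisson.cdf(110, 100)           # P[ X <= 110 ] for X ~ Poisson(mean 100): about 0.8529
    approx = norm.cdf((110.5 - 100) / 10)    # continuity-corrected normal approximation: about 0.8531
    crude  = norm.cdf((110.0 - 100) / 10)    # without the correction: about 0.8413
    print(exact, approx, crude)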
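Likewise, the generating-function expansion for S2 = Bi(n=10, p=.2) + Bi(n=6, p=.5) given at the start of these notes is just the convolution of the two pmfs; a minimal sketch, again assuming NumPy and SciPy:

    import numpy as np
    from scipy.stats import binom

    p1 = binom.pmf(np.arange(11), 10, 0.2)   # pmf of Bi(n=10, p=.2) on 0..10
    p2 = binom.pmf(np.arange(7), 6, 0.5)     # pmf of Bi(n=6,  p=.5) on 0..6

    # pmf of the sum = convolution of the pmfs = the coefficients of z^0 ... z^16 in G(z)
    print(np.convolve(p1, p2))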