STA 230 / MTH 230  Probability                                    Week 4
Read: Pitman, sections 2.4-2.5

MEAN
---------------

If we observe many replications of an experiment and so collect data
X1 X2 X3 ... Xn, what will happen to the LONG TERM AVERAGE
(X1 + ... + Xn)/n for huge n?

If X has finitely many values--- say, if P[ X = a_k ] = p_k for
k=1,2,...,K with \sum p_k = 1 and each p_k > 0, then for huge n,
S_n := (X_1 + ... + X_n) should be about

    S_n ~ n*p_1*a_1 + n*p_2*a_2 + ... + n*p_K*a_K     (why?)

and so, dividing by n, the average should be about

    S_n/n ~ p_1*a_1 + p_2*a_2 + ... + p_K*a_K = \sum p_k a_k.

This number is the MEAN (or EXPECTED VALUE) E[X]; conventional letter:
Greek mu.  For example: the average die roll should be about E[X] = 3.5.

The same formula works even if X takes countably many values, *except*
that we have to worry about whether the sum is finite and makes sense.
Later on we'll see something like this works for random variables with
uncountably many values, but we need to replace the sums with integrals.

(Consider St. Petersburg paradox here)

------------------------------------------------------------------------
Poisson & Hypergeometric Distributions

* Poisson Approximation to the Binomial
-----------

Shortcomings of DeMoivre/Laplace: if p is close to 0 or 1, then
sigma = \sqrt{n p q} is small.  The normal is symmetric, for example,
while the binomial for small p is packed up against 0.

Text looks at the binomial with n large and mu = n*p = 1, i.e., p = 1/n;
present fishing example here, and sneak in the exponential distribution:

    lam = average number of fish caught per hour
    X_t = number of fish caught by time t
        ~ # of successes in N independent tries (each t/N hours long),
          with prob. p_N of success
        ~ Bi(N, p_N)   =>   t*lam = N*p_N   =>   p_N = t*lam/N

    P[ X_t = k ] ~ (N:k) (t lam/N)^k (1 - t lam/N)^(N-k)
                -> (lam t)^k/k! exp(-lam t)
                 = mu^k/k! exp(-mu),   k=0,1,2,...   (with mu = lam*t)

(here "(n:k)" denotes the binomial coefficient "n choose k")

    T_k = time needed to catch the k'th fish

    P[ T_1 > t ] = P[ X_t = 0 ] = exp(-lam t),  t > 0

    => P[ a < T_1 <= b ] = exp(-lam a) - exp(-lam b)
                         = \int_a^b lam exp(-lam t) dt

Recall:  exponential sum,    \sum_{k=0}^oo a^k/k! = exp(a)
         exponential limit,  (1 + x/N)^N -> exp(x)

---------------

What's the Poisson mode?  [ Mention: f'=0 is NOT the way to find it ]
Is there ever a double mode?

                            mu^(k-1)     mu^k
    P(k-1) < P(k)   <==>    --------  <  ----
                             (k-1)!       k!

                    <==>    1  <  mu/k

                    <==>    k  <  mu

SO, the mode is the largest integer <= mu; if mu is an integer, then
(mu-1, mu) is a double mode.

----------------

What happens when mu gets big?  Looks like the normal, with mean mu
and variance sigma^2 = mu.

=====================================================================

* Random Sampling: WITH and WITHOUT Replacement

WITH replacement: Binomial with p = G/N, where the population has
G "good" and B "bad" members and N = G+B, so

    P[g good, b bad in n=b+g tries] = (n:g) G^g B^b / N^n
                                    = (n:g) p^g (1-p)^(n-g)

WITHOUT replacement: Two approaches.

    one-at-a-time:

          G   G-1       G+1-g      B    B-1        B+1-b
    (n:g) -   ---  ...  -----  x  ---  -----  ...  -----
          N   N-1       N+1-g     N-g  N-g-1       N+1-n

          n!     G!       B!     (N-n)!
       = ----  ------  ------  x ------
         g!b!  (G-g)!  (B-b)!      N!

    and the "number of subsets" approach:

       (G:g) (B:b)        G!         B!        (N-n)! n!
       -----------  =  --------  --------  x  ---------
          (N:n)        g!(G-g)!  b!(B-b)!         N!

=========================================================================
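(Aside, not from the notes: a quick numerical check of the last section.
This minimal Python sketch--- all function names here are ours, invented
for the example--- evaluates the "one-at-a-time" and "number of subsets"
formulas, which should agree exactly, and compares them with the
with-replacement binomial answer, which should get close once N >> n.)

    import math

    def hyper_one_at_a_time(N, G, n, g):
        """(n:g) * G(G-1)...(G+1-g) * B(B-1)...(B+1-b) / N(N-1)...(N+1-n)."""
        B, b = N - G, n - g
        num = math.comb(n, g)
        for i in range(g):              # the g "good" draws
            num *= G - i
        for i in range(b):              # the b "bad" draws
            num *= B - i
        den = 1
        for i in range(n):              # n draws from the population
            den *= N - i
        return num / den

    def hyper_subsets(N, G, n, g):
        """(G:g)(B:b) / (N:n)."""
        return math.comb(G, g) * math.comb(N - G, n - g) / math.comb(N, n)

    def binom_pmf(n, p, g):
        """(n:g) p^g (1-p)^(n-g), the with-replacement answer."""
        return math.comb(n, g) * p**g * (1 - p)**(n - g)

    n, g = 5, 2
    for N in (20, 200, 2000):           # population grows, G/N fixed at 0.3
        G = int(0.3 * N)
        print(N, round(hyper_one_at_a_time(N, G, n, g), 6),
                 round(hyper_subsets(N, G, n, g), 6),
                 round(binom_pmf(n, G / N, g), 6))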
GEOMETRIC DIST'N: Count the failures X before the first success, in
independent trials with the same probability p of success.  Find the
probability of exactly k failures before the first success; also find
the average number of failures before the first success.

To have k failures before the 1st succ, the first k+1 trials must be
EXACTLY:
                F F F ... F S

whose probability is  f(k) = p (1-p)^k,  k = 0,1,2,...

Average ("mean") value of this is  sum(k=0:oo) k p q^k = ???

One Trick:
    = p * q * sum(k=0:oo) k q^(k-1)
    = p * q * (d/dq) sum(k=0:oo) q^k
    = p * q * (d/dq) [ 1/(1-q) ]
    = p * q * [ 1/(1-q)^2 ]
    = p * q / p^2  =  q/p

Example:  E[ Tails before 1st Head ] = 1
          E[ Non-aces before 1st Ace ] = 5

Another Trick:  Think of what happens on the first trial.  Either
    p: you succeed    (=> NO failures)
    q: or not         (=> ONE failure already, plus whatever else happens)
so mu = E[X] satisfies  mu = p * 0 + q * (1 + mu).
Solution:  mu(1-q) = q,  or  mu = q/(1-q) = q/p.

==================================================================

NEGATIVE BINOMIAL DIST'N: Count the failures Y before the r'th success,
otherwise as above.  For k failures before the r'th succ, the first
k+r-1 trials must contain r-1 S's and k F's in some order, and trial
k+r must be an S:

        F S S F F S ... F    S      <-- r'th success on trial k+r
        \_ r-1 S's, k F's _/

whose prob is  (k+r-1 : k) p^r q^k

May view Y as a sum of r indep Geometrics--- failures before 1st succ
+ add'l failures before 2nd succ + ... + failures before r'th succ.
SO, E[ Y ] = r*(q/p).

Note the dist'n makes sense for ALL real r > 0, not only integers:

           r(r+1)(r+2)...(r+k-1)                 Gamma(r+k)
    f(k) = --------------------- p^r (1-p)^k  =  ---------- p^r q^k
                    k!                           Gamma(r) k!

where Gamma(z) := int{ t^(z-1) exp(-t), t=0:infinity } = (z-1)! for
positive integers z.  Mean will be mu = r q/p.

Trick 2 works here too:  mu(r) = p * mu(r-1) + q * (1 + mu(r)),
so mu(r) = mu(r-1) + q/p, and by induction mu(r) = r q/p.

========================================================================

Indicators:

    Binomial:  X = I1 + I2 + ... + In
               Each Ij = 1 w/prob p, 0 w/prob q ==> mean p; sum has mean n*p.

    Hypergeo:  X = I1 + I2 + ... + In
               Each Ij = 1 w/prob p = G/N, 0 w/prob q (marginally, even
               without replacement) ==> mean p; sum has mean n*p.

    How are they different?

VARIANCE:

    sig^2 = E| X - mu |^2 = E( I1 + ... + In - n*p )^2

          = E[ (I1-p)^2 + (I2-p)^2 + ... + (In-p)^2 ]
            + E[ (I1-p)(I2-p) + ... ]            (n(n-1) cross terms)

          = n * p * (1-p)  +  n*(n-1) * [ E Ii Ij - p^2 ]

What IS  E[ Ii Ij ] - p^2  ???

    Binomial:  p^2 - p^2 = 0, so var is n p q.

    HyperGeo:  (G/N)((G-1)/(N-1)) - (G/N)^2 = (G/N) [ (G-1)/(N-1) - G/N ]
                                            = - (G/N) [ (N-G) / N(N-1) ]
                                            = - (G/N)(B/N) / (N-1)

    so, with P = G/N and Q = B/N,

        var = n(G/N)(B/N) - n(n-1)(G/N)(B/N)/(N-1)
            = n P Q [ 1 - (n-1)/(N-1) ]
            = n P Q [ (N-n)/(N-1) ]

    which is pretty much the same as the Binomial IF N >> n.

=========================================================================

Poi Mean:  sum_{k=0:oo} k (lam^k)/k! exp(-lam)
         = sum_{k=1:oo} k (lam^k)/k! exp(-lam)
         = sum_{k=1:oo} lam * (lam^(k-1))/(k-1)! exp(-lam)
         = lam * sum_{j=0:oo} (lam^j)/j! exp(-lam)
         = lam * exp(lam) * exp(-lam)  =  lam.

Poi Var:  E[ (X-lam)^2 ] is hard; so is E[ X^2 ].  BUT, easy is:

    E[ X*(X-1) ] = sum_{k=2:oo} k(k-1) (lam^k)/k! exp(-lam) = lam^2,

    and E[ X*(X-1) ] = E[ X^2 ] - E[ X ], so

    Var(X) = E[ X^2 ] - (E X)^2 = (lam^2 + lam) - lam^2 = lam,

    the same as the mean.

======================================================================

St. Petersburg:  Start with $1 on the table and a fair coin.  At each
step: toss the coin; if it shows Heads, take the money.  If it shows
Tails, I'll double whatever money is on the table.  Let X be the amount
you win.

    What is the pmf for X?
    What is E[ X ] ???
    What does that MEAN in terms of averages?
    How much would you pay me to play this game?
    Can your opponent really pay off the bet?
    Note lg($68.7 B) = 36...   (lg = log base 2)

--------
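(Aside, not from the notes: a minimal simulation sketch of the game,
assuming a fair coin via Python's random.random; the seed and sample
sizes are arbitrary choices.  Since P[ X = 2^k ] = 2^-(k+1) for
k=0,1,2,..., E[X] = 1/2 + 1/2 + ... = oo, so the running average never
settles down--- it keeps drifting upward as the number of plays grows.)

    import random

    def play():
        """One St. Petersburg game: $1 on the table, doubled on each
        Tail, collected on the first Head."""
        pot = 1
        while random.random() < 0.5:    # Tails with probability 1/2
            pot *= 2                    # the money on the table doubles
        return pot

    random.seed(1)                      # reproducible runs
    for n in (10**2, 10**4, 10**6):
        avg = sum(play() for _ in range(n)) / n
        print(n, avg)                   # averages grow (slowly) with n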