Random Variables
================

Roll a fair die until an ace (1) appears; how many non-aces do you see first?
This is an example of a *RANDOM VARIABLE*, a number that depends on chance.

a) What *is* a random variable?

   One answer:  A function from the sample space to the real numbers |R
   Another:     A number that depends on chance
   Secret:      Usually upper-case letters from the end of the alphabet are
                used... so if you see X, Y, or Z, it's probably an RV.

   Let's call the number of non-aces X.

b) What questions can we ask & answer about random variables?

   One:         P[ X < 3 ] = 1 - P[ X >= 3 ]    (X >= 3 iff the first 3 rolls are non-aces)
                           = 1 - (5/6)^3 = 1 - 125/216 = 91/216 = .4213
   Another:     P[ X = 2 ] = P[ X >= 2 ] - P[ X >= 3 ]
                           = (5/6)^2 - (5/6)^3 = 25/36 - 125/216 = 25/216 = .1157
                        OR = P[~A ~A A] = (5/6)(5/6)(1/6) = 25/216 = .1157
   Yet Another: What would X be, on average, in lots of repeated trials?

   Variation:   Instead of P[Ace]=1/6, count # of failures before 1st success
                if successes have probability p, 0 ...
... >=17: = 1 - (16/20)*(15/19)*(14/18) = 29/57 = .5088

   X = max number selected; what are the possible values of X and their
   probabilities?

      P[X=20]  = 3/20                                              = .1500
      P[X=19]  = 3 * (18/20) * (17/19) * (1/18) = 51/380           = .1342
      P[X=18]  = 3 * (17/20) * (16/19) * (1/18) = 34/285           = .1193
      P[X=17]  = 3 * (16/20) * (15/19) * (1/18) =  2/19            = .1053
      P[X>=17]                                                     = .5088

      P[X=x]   = 3 * (x-1)*(x-2)/(20*19*18) = (x-1)(x-2)/2280,   x = 3,4,...,20
      Book:      P[X=x] = C(x-1,2) / C(20,3)   (also correct)

c) Tools for answering RV questions:

   i.   Probability Mass Function:  p(x) = P[ X = x ]

   ii.  Distribution Function:  F(b) = P[ X <= b ]
                                     = Sum{ p(x) : x <= b }   (for discrete X)

        Properties:
          a) a < b ==> F(a) <= F(b)      (non-decreasing)
          b) F(b) -> 1 as b -> +oo
          c) F(b) -> 0 as b -> -oo
          d) F(b) = lim (x \ b) F(x)     (RIGHT continuous; x decreases to b)

        Note:
          * P[ X > a ] = 1 - P[ X <= a ] = 1 - F(a)
          * P[ a < X <= b ] = F(b) - F(a)
          * If X can only take on integer values then:  p(b) = F(b) - F(b-1)
          * SOME RV's will be able to take on non-integer values....
            e.g. `uniform' (spinner):

                     / 0     -oo < b <= 0
              F(b) = { b       0 < b <= 1
                     \ 1       1 < b <  oo

   iii. Discrete Random Variables:  Everything depends on the Probability Mass
        Function:  p(x_i) >= 0,  Sum p(x_i) = 1

        Examples:

        # of failures before 1st success, P[Succ] = p, P[Fail] = q = 1-p:
            P[X = n]  = q^n p,   n = 0, 1, 2, ...
            P[X >= 0] = Sum(p q^n: n=0..oo) = (pq^0 - 0) / (1-q) = 1
            P[X <= x] = Sum(p q^n: n=0..x) = (pq^0 - pq^{x+1}) / (1-q) = 1 - q^{x+1}
                      = 1 - P[ X > x ] = 1 - P[ X >= x+1 ] = 1 - q^{x+1}

        # of fish caught in t hours, decays in t seconds, failures in t months:
            P[X = n]  = c (lambda^n)/n!,   n = 0, 1, 2, ....
            P[X >= 0] = c Sum(lambda^n/n!) = c e^lambda = 1  ==>  c = e^{-lambda}
            P[X = 0]  = e^{-lambda}
            P[X > 2]  = 1 - P[X=0 or X=1 or X=2] = 1 - (1 + l + l^2/2)*e^{-l}   (l = lambda)
            P[X <= x] = (1 + lambda + ... + lambda^x/x!) * e^{-lambda}   (no closed form)

   iv.  Expected Value: Long-term average value.  In a huge number N of tries,
        each value x_i turns up about N p(x_i) times, so

            X1 + ... + XN      Sum (x_i) N p(x_i)
            -------------  ~  ------------------  =  Sum (x_i) p(x_i)
                  N                    N

        Examples:   // REVIEW GEOMETRIC SERIES: (1st in - 1st out)/(1-r) //

        # of failures:   Sum (n pq^n : n>=0) = pq Sum (nq^{n-1}: n>0)
                         = pq (d/dq) Sum(q^n: n>=0) = pq (d/dq) (1/(1-q))
                         = pq (1-q)^-2 = q/p   (since 1-q = p)
                         = 5 for fair die
        # of fish:       Sum (n l^n/n! e^-l) = l e^-l (d/dl) Sum l^n/n!
                         = l e^-l (d/dl) e^l = l e^-l e^l = lambda
        # on a fair die: Sum(n/6: n=1..6) = 21/6 = 3.5     (*EXPECTED* ???)

========================================================================

   v.   Functions of RV's:  Expectation of g(X):  Sum { g(x_i) * p(x_i) }

        Examples:
        X^2, for X = number on fair die:
            E[X^2] = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6 = 15.167
        2^X, for X = number on fair die:
            E[2^X] = (2 + 4 + 8 + 16 + 32 + 64)/6 = 126/6 = 21
        2^X, for X = # of Tails before 1st Head:
            Sum{2^i * ( .5 * .5^i )} = Sum{.5} = .5 + .5 + .5 + .5 + ... = oo

   vi.  Variance:  mu = E[X],   sigma^2 = E[(X-mu)^2] = E[X^2] - mu^2

        Note:  L(a)  = E[ (X-a)^2 ] = E[X^2 - 2*a*X + a^2] = E[X^2] - 2*a*mu + a^2
               L'(a) = -2*mu + 2*a = 0  ==>  a = mu,  and L''(a) = 2 > 0,
               SO, L(a) >= L(mu) = Variance = MINIMUM MEAN SQUARED ERROR
        Sigma = sqrt(Variance) = ROOT MEAN SQUARED ERROR (RMSE), the standard
                deviation: a measure of how "variable" an RV is.

   vii. Bernoulli & Binomial Random Variables:
        Bernoulli: # of successes in ONE trial (Binomial with n = 1)
                       p(1) = p,  p(0) = q
        Binomial:  # of successes in a FIXED NUMBER n of INDEPENDENT trials
                       p(k) = C(n:k) p^k q^(n-k),  k=0,1,...,n

   viii. Poisson Random Variables:
        Poisson:   # of fish caught in T hours             p(k) = e^(-lambda) lambda^k/k!
               OR  # Deaths by horse-kick in Prussian army,
                   # misprints on a page (or 10 pages) of a book,
                   # customers entering a store on a given day,
                   # trees within a (100m)^2 ("hectare") of Duke Forest, etc.

   ix.  Other Discrete:
        Geometric: # of failures before 1^st success
                       p(k) = p q^k,  k=0,1,...
        Neg Binom: # of failures before r^th success
                       p(k) = C(k+r-1:k) p^r q^k,  k=0,1,2,...
        Hypergeo:  # of successes in a FIXED NUMBER n of DEPENDENT trials
                   (no replacement; drawing from A successes and B failures)
                       p(k) = C(A:k) C(B:n-k) / C(A+B:n),  k=0...n   (C(a:b) = 0 if b > a)
        Zeta (Pareto, Zipf): ummmmh....
                       p(k) = c/k^alpha,  k=1,2,3,...   (alpha > 1, c = 1/Sum(1/k^alpha: k>=1))
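The hand computations above can be double-checked by machine. Below is a minimal Python sketch (an assumption: the notes themselves contain no code) that uses exact fractions where it can and truncated sums where the notes have infinite series. The helper names geom_pmf, max_pmf, max_pmf_formula, poisson_pmf, expect, sq_loss and st_petersburg_partial, and the Poisson rate LAM = 2.0, are hypothetical choices made for illustration only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, exp, factorial

# --- Geometric: # of failures before 1st success, P[X = n] = q^n p ---------
def geom_pmf(n, p):
    """P[X = n] = q^n * p with q = 1 - p (n = 0, 1, 2, ...)."""
    return (1 - p) ** n * p

p = Fraction(1, 6)                                   # fair die, success = ace
p_lt3 = sum(geom_pmf(n, p) for n in range(3))        # P[X < 3]
p_eq2 = geom_pmf(2, p)                               # P[X = 2]
p_le_x = {x: 1 - (1 - p) ** (x + 1) for x in range(10)}  # closed form 1 - q^(x+1)

# Mean q/p, approximated by truncating the infinite sum (float arithmetic).
geom_mean = sum(n * geom_pmf(n, 1 / 6) for n in range(2000))

# --- Max of 3 numbers drawn without replacement from 1..20 ---------------
draws = list(combinations(range(1, 21), 3))          # all C(20,3) = 1140 draws

def max_pmf(x):
    """P[max = x], counting equally likely unordered draws by brute force."""
    return Fraction(sum(1 for d in draws if max(d) == x), len(draws))

def max_pmf_formula(x):
    """The notes' (x-1)(x-2)/2280 and the book's C(x-1,2)/C(20,3)."""
    return Fraction((x - 1) * (x - 2), 2280), Fraction(comb(x - 1, 2), comb(20, 3))

p_max_ge17 = sum(max_pmf(x) for x in range(17, 21))

# --- Poisson: p(k) = e^(-lambda) lambda^k / k! -----------------------------
LAM = 2.0  # hypothetical rate, chosen only for illustration

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

p_gt2 = 1 - sum(poisson_pmf(k, LAM) for k in range(3))
p_gt2_formula = 1 - (1 + LAM + LAM ** 2 / 2) * exp(-LAM)
poisson_mean = sum(k * poisson_pmf(k, LAM) for k in range(100))  # truncated sum

# --- Expectations of functions of a fair die ------------------------------
die = [(x, Fraction(1, 6)) for x in range(1, 7)]

def expect(g, pmf):
    """E[g(X)] = Sum g(x_i) p(x_i) over a finite list of (x_i, p(x_i))."""
    return sum(g(x) * px for x, px in pmf)

mu = expect(lambda x: x, die)
e_x2 = expect(lambda x: x ** 2, die)
e_2x = expect(lambda x: 2 ** x, die)
var = e_x2 - mu ** 2

# L(a) = E[(X - a)^2]; it should be smallest at a = mu, where it equals the variance.
def sq_loss(a):
    return expect(lambda x: (x - a) ** 2, die)

# E[2^X] for X = # Tails before 1st Head: every term is 1/2, so n terms sum to n/2.
def st_petersburg_partial(n):
    return sum(2 ** i * Fraction(1, 2) ** (i + 1) for i in range(n))

if __name__ == "__main__":
    print(p_lt3, p_eq2)                # 91/216 25/216
    print(p_max_ge17)                  # 29/57
    print(mu, e_x2, e_2x, var)         # 7/2 91/6 21 35/12
    print(round(geom_mean, 6), round(poisson_mean, 6))
    print(st_petersburg_partial(10))   # 5
```

Brute-force enumeration of all 1140 draws is cheap here and gives a check on the counting argument that does not rely on it; for much larger problems the closed forms are the practical route.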