MTH 135 / STA 104: Probability                                        Week 5
Read: Pitman, sections 3.1-3.3

                         Discrete Random Variables
         * Introduction to Joint Distributions of Random Variables *

Roll a fair die until an ace (1) appears; how many non-aces do you see
first?  This is an example of a *RANDOM VARIABLE*, a number that depends
on chance.

a) What *is* a random variable?

   One answer:  A function from the sample space to the real numbers |R
   Another:     A number that depends on chance
   Secret:      Usually upper-case letters from the end of the alphabet
                are used... so if you see X, Y, or Z, it's probably an RV

   Let's call the number of non-aces X.

b) What questions can we ask & answer about random variables?

   One:         P[ X < 3 ] = 1 - (5/6)^3 = 1 - 125/216 = 91/216 = .4213

   Another:     P[ X = 2 ] = P[ X >= 2 ] - P[ X >= 3 ]
                           = (5/6)^2 - (5/6)^3
                           = 25/36 - 125/216 = 25/216 = .1157
                OR
                           = P[ ~A ~A A ] = (5/6)(5/6)(1/6) = 25/216 = .1157

   Yet Another: What would X be, on average, in lots of repeated trials?

   Variation:   Instead of P[Ace] = 1/6, count the # of failures before the
                1st success if successes have probability p, 0 < p <= 1.

Another example: choose three different numbers at random, without
replacement, from {1, 2, ..., 20}.  What is the chance that at least one
of the chosen numbers is 17 or more?

     P[ at least one >= 17 ] = 1 - P[ all three <= 16 ]
                             = 1 - (16/20)*(15/19)*(14/18) = 29/57 = .5088

X = max number selected; what are the possible values of X and their
probabilities?

     P[X=20] = 3/20                            =          .1500
     P[X=19] = 3 * (18/20) * (17/19) * (1/18)  = 51/380 = .1342
     P[X=18] = 3 * (17/20) * (16/19) * (1/18)  = 34/285 = .1193
     P[X=17] = 3 * (16/20) * (15/19) * (1/18)  = 2/19   = .1053
                                     P[X>=17]  =          .5088

     P[X=x] = 3 * (x-1)*(x-2)/(20*19*18) = (x-1)(x-2)/2280,   x = 3,4,...,20

     Another way:  P[X=x] = (x-1:2) / (20:3)   (also correct)

DEF: A (real-valued) RANDOM VARIABLE is a (real-valued) function on the
     sample space Omega.

Example: if Omega is the usual 36-point space for two rolls of a fair die,
say, { (r,g) : 1 <= r,g <= 6 }, all equally likely, then

     X(r,g) = r          Y(r,g) = |r-g|          Z(r,g) = r+g

are all random variables.  What is the probability that Y=1?  What is that
EVENT?

DEF: The *RANGE* of a random variable is just the set of its possible
     values.

     The *DISTRIBUTION* of a random variable is any specification of
     P[ X in A ] for every set A...  if X has only finitely-many (or
     countably-many) values, the DISTRIBUTION can be specified by giving
     the probability of each outcome in the range,

          f(x) = P[ X = x ]

     and then  P[ X in A ] = sum { f(x) : x in A }  is specified for
     every A.  For other random variables, like "uniform" and "normal"
     among others, we'll have to do something else--- we start that just
     after Fall Break.  It's always good enough to specify
     F(x) = P[ X <= x ] for every x; then we can work out the probability
     that X is in any interval, any union of intervals, etc; more later.

If X is any random variable and g is any function, then Y = g(X) is
another random variable:

              X              g
     Omega -----> |R ----------> |R

Actually, X could be a function from Omega to any set at all (say, "E")
and g could be a function from E to the real numbers, and we'd still be
okay.  If X is discrete with pmf f(x) = P[X=x], what is the DISTRIBUTION
of Y = g(X) ???

     P[ Y = y ] = P[ g(X) = y ] = SUM { P[ X=x ] : g(x)=y }
                = P[ X in g^{-1}(y) ]

-----------------------------------------------------------------------------
               Random Vectors and Joint Distributions

Draw two socks at random, without replacement, from a drawer full of
twelve colored socks:

     6 black     4 white     2 purple

Let B be the number of Black socks, W the number of White socks drawn.
The *DISTRIBUTIONS* of B and W are easy to write down; each has only 3
values in its range, with the probability tables below (why?).
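One way to double-check those tables is brute force: list all (12:2) = 66
equally likely pairs of socks and tally B and W for each pair.  Here is a
minimal Python sketch of that count (the color strings and variable names
are just for illustration); it also tallies the joint probabilities
P[B=b, W=w] that show up a little further below.

     from itertools import combinations
     from collections import Counter

     # 12 socks in the drawer: 6 black, 4 white, 2 purple
     socks = ["black"] * 6 + ["white"] * 4 + ["purple"] * 2

     joint = Counter()                            # counts of (b, w) over all draws
     pairs = list(combinations(range(12), 2))     # all (12:2) = 66 equally likely pairs
     for i, j in pairs:
         b = (socks[i] == "black") + (socks[j] == "black")   # black socks drawn
         w = (socks[i] == "white") + (socks[j] == "white")   # white socks drawn
         joint[(b, w)] += 1

     n = len(pairs)                               # 66
     for b in range(3):                           # marginal pmf of B
         print(f"P[B={b}] = {sum(c for (bb, _), c in joint.items() if bb == b)}/{n}")
     for w in range(3):                           # marginal pmf of W
         print(f"P[W={w}] = {sum(c for (_, ww), c in joint.items() if ww == w)}/{n}")
     for (b, w), c in sorted(joint.items()):      # joint pmf, used in the table below
         print(f"P[B={b}, W={w}] = {c}/{n}")

Running it gives the same 15/66, 36/66, ... values, already over the common
denominator 66.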
To make it easier to compare & add numbers, I'll put everything over the
same denominator instead of our usual convention of "lowest terms":

         b, w:        0        1        2
     ---------------------------------------------------------------------
     P[ B = b ]:    15/66    36/66    15/66   = (6:b)(6:2-b)/(12:2)   (**)
     P[ W = w ]:    28/66    32/66     6/66   = (4:w)(8:2-w)/(12:2)

This table doesn't let us know everything--- for example, what is the
probability that we draw a matching pair?  What's the probability that we
have one each of black and white socks?  We don't have enough to tell
(e.g., we can't tell the probability of a purple pair).

The *JOINT* distribution of B and W tells us the probability of every
possible PAIR (b,w) of numbers... we can present it in a formula

     P(b,w) = (6:b)(4:w)(2:2-b-w)/(12:2)

or in a table:

                              W
                     0        1        2
          +----------------------------------++---------
        0 |        1/66     8/66     6/66    ||  15/66
          |                                  ||
    B   1 |       12/66    24/66      0      ||  36/66
          |                                  ||
        2 |       15/66      0        0      ||  15/66
          +==================================++=========
                  28/66    32/66     6/66        66/66

Note that the MARGINAL SUMS are the same numbers we had before in (**);
they are called the *MARGINAL DISTRIBUTIONS* of B and W.  Now we can see
the probability of a matching pair:

     Black     White     Purple
     15/66  +   6/66  +   1/66   =  22/66  =  1/3,

or the probability of a black-and-white pair, 24/66 = 4/11.

-----------------------------------------------------------------------------
* EXPECTATIONS *

We can use the JOINT distribution P[ X=x, Y=y ] to find expectations of
functions of any two discrete random variables X and Y :

     E[ g(X,Y) ] = SUM { g(x,y) * P[ X=x, Y=y ] }

For example, above, the expectation of the PRODUCT g(B,W) = B * W of the
numbers of Black and White socks is

     E[ B * W ] = 0*0* 1/66 + 0*1* 8/66 + 0*2* 6/66
                + 1*0*12/66 + 1*1*24/66 + 1*2* 0
                + 2*0*15/66 + 2*1* 0    + 2*2* 0
                = 24/66 = 4/11.

Why was that obvious already????

Note this cannot be calculated from the *marginal* distributions of B and
W--- and (in particular) it is NOT THE SAME as

     E[ B ] * E[ W ] = { (36+30)/66 = 66/66 } * { (32+12)/66 = 44/66 }
                     = { 1 } * { 2/3 } = 2/3

DEFINITION: The *COVARIANCE* of two RVs is:

     Cov(X, Y) = E[ (X-mu_X) * (Y-mu_Y) ] = E[ X*Y ] - mu_X * mu_Y

so, here,

     Cov(B, W) = 4/11 - 2/3 = (12-22)/33 = -10/33 = -0.30303

Let Z = a*X + b*Y + c; what are the MEAN and VARIANCE of Z ?

     E[ Z ]   = a*mu_X + b*mu_Y + c

     VAR[ Z ] = E{ [ a*(X-mu_X) + b*(Y-mu_Y) ]^2 }
              = a^2 E[ (X-mu_X)^2 ] + 2*a*b E[ (X-mu_X)(Y-mu_Y) ]
                                    + b^2 E[ (Y-mu_Y)^2 ]
              = a^2 sigma^2_X + b^2 sigma^2_Y + 2 a b Cov(X,Y)

For example:

     VAR[ X + Y ] = sig_X^2 + sig_Y^2 + 2 Cov(X,Y)
     VAR[ X - Y ] = sig_X^2 + sig_Y^2 - 2 Cov(X,Y)

======================================================================
* Conditional Distributions *

The marginal probability of w white socks in the draw is:

                     28/66 = 14/33   for w=0;
     P[ W = w ]  =   32/66 = 16/33   for w=1;
                      6/66 =  3/33   for w=2.

But what if we KNOW that we drew ZERO BLACK socks?  *Then* the
*conditional distribution* of W would be:

                              1/15   for w=0;
     P[ W = w | B = 0 ]  =    8/15   for w=1;
                              6/15   for w=2.

(note it's much more likely now for W=2).
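As a numerical check on the covariance and the conditional distribution
above, here is a short Python sketch; it just types the joint table in by
hand as exact fractions (the dictionary layout is one convenient choice,
not anything from Pitman).

     from fractions import Fraction as F

     # joint pmf of (B, W) from the table above, everything over 66
     joint = {(0, 0): F(1, 66),  (0, 1): F(8, 66),  (0, 2): F(6, 66),
              (1, 0): F(12, 66), (1, 1): F(24, 66), (1, 2): F(0, 66),
              (2, 0): F(15, 66), (2, 1): F(0, 66),  (2, 2): F(0, 66)}

     EB  = sum(b * p for (b, w), p in joint.items())         # E[B]   = 1
     EW  = sum(w * p for (b, w), p in joint.items())         # E[W]   = 2/3
     EBW = sum(b * w * p for (b, w), p in joint.items())     # E[B*W] = 4/11
     cov = EBW - EB * EW                                     # Cov(B,W) = -10/33
     print(EB, EW, EBW, cov)

     # conditional distribution of W given B = 0
     PB0 = sum(p for (b, w), p in joint.items() if b == 0)   # P[B=0] = 15/66
     for w in range(3):
         print(f"P[W={w} | B=0] =", joint[(0, w)] / PB0)     # 1/15, 8/15, 6/15

The same few lines would work for any discrete joint pmf stored as a
dictionary of probabilities.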
More generally, for any two discrete random variables X and Y, the
*CONDITIONAL DISTRIBUTION* is

                            P[ X=x, Y=y ]       joint pmf
     P[ X = x | Y = y ]  =  --------------  =  --------------
                               P[ Y=y ]         marginal pmf

These can be used just like any other distribution to calculate, for
example, the *conditional* mean and variance (see below):

     E[ X | Y=y ] = SUM { x * P[ X=x | Y=y ] }

For example,

     E[ W   | B=0 ] = 0*(1/15) + 1*(8/15) + 2*(6/15) = 20/15 = 4/3

     E[ W^2 | B=0 ] = 0*(1/15) + 1*(8/15) + 4*(6/15) = 32/15

                                            96 - 80
     Var[ W | B=0 ] = 32/15 - (4/3)^2  =   ---------  =  16/45 = 0.3555556
                                               45

-----------------------------------------------------------------------------
* INDEPENDENCE *

Two random variables X and Y are *INDEPENDENT* if their joint pmf factors:

     P[ X=x , Y=y ] = P[ X=x ] * P[ Y=y ]

as the product of the marginal pmfs.  ( IF it factors at all as ANY
product f(x) * g(y), THEN it factors as the product of marginals.  Why? )

For independent X,Y, the covariance vanishes:

     Cov[ X,Y ] = E[ (X-mu_X) (Y-mu_Y) ]
                = SUM { (x-mu_X) * (y-mu_Y) * P[ X=x, Y=y ] }
                = SUM { (x-mu_X) * (y-mu_Y) * P[ X=x ] * P[ Y=y ] }
                = SUM { (x-mu_X) * P[ X=x ] } * SUM { (y-mu_Y) * P[ Y=y ] }
                = (mu_X - mu_X) * (mu_Y - mu_Y) = 0 * 0 = 0

and so the variance is simply

     Var[ a*X + b*Y ] = a^2 Var[X] + b^2 Var[Y]

For a=1 and b=1 or b=-1,

     Var[ X + Y ] = Var[X] + Var[Y]  ***AND***  Var[ X - Y ] = Var[X] + Var[Y]

-----------------------------------------------------------------------------
* Expectations *

If we draw some RV X repeatedly & independently, what will be its AVERAGE
VALUE?  For example, if we roll a fair die 600 times, what will the
average be?  If we denote the outcome on the i'th roll by X_i this looks
like:

             X_1 + X_2 + X_3 + ... + X_600
     Avg  =  -----------------------------
                         600

and it's a little hard to tell.  BUT--- if instead we think of how many
1's we will find, and how many 2's, and how many 3's and 4's and so forth,
we see the sum should be exactly

     X_1 + X_2 + X_3 + ... + X_600  =  1 * (# of 1's in 600 rolls)
                                     + 2 * (# of 2's in 600 rolls)
                                     + 3 * (# of 3's in 600 rolls)
                                     + 4 * (# of 4's in 600 rolls)
                                     + 5 * (# of 5's in 600 rolls)
                                     + 6 * (# of 6's in 600 rolls)

which should be about (why?)

     ~~  1 * 100 + 2 * 100 + 3 * 100 + 4 * 100 + 5 * 100 + 6 * 100  =  2100

so the average should be about

              2100
     Avg  ~  ------  =  3.5
               600

More generally, if we have any function g() and want to know the average
value of g(X) for a random variable X that takes each value x with
probability f(x), then in a large number N of tries the average will be
about

                      Sum [ g(x) * N * f(x) ]
     Avg[ g(X) ]  ~~  ------------------------  =  Sum g(x) * f(x)
                                 N

(note the N cancels top-and-bottom, so we can take the limit N->oo
easily).  This is a *weighted average of g(x)*, weighted by the
PROBABILITY that X=x; the fair-die example had

     g(x) = x          f(x) = 1/6   for   x = 1,2,3,4,5,6

----------------
DEFINITION:  The MEAN of X is            E[X]    = Sum { x * f(x) }
             (usually denoted "mu")

             The EXPECTATION of g(X) is  E[g(X)] = Sum { g(x) * f(x) }

Nobody will get upset if you mix up the words MEAN and EXPECTATION.

Note that mu has the same UNITS as X does--- if X is measured in feet,
meters, seconds, or fortnights then so is mu.

-----------------
On average, any RV X will be equal to its mean "mu"... but how far from
mu will X be?

* Can't measure this by the average of ( X - mu ) (that average is always
  zero); the points where X > mu are balanced out by the points where
  X < mu.

* Instead we measure spread by the average of ( X - mu )^2, which counts
  deviations both when X > mu AND when X < mu: the VARIANCE,
  Var[X] = sigma^2 = E[ (X - mu)^2 ].
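To see the long-run-average idea numerically, here is a small Python
sketch (the seed is arbitrary, just so the run is repeatable): it
simulates 600 rolls of a fair die, compares the sample average with the
weighted average Sum x*f(x) = 3.5, and computes the variance
E[ (X-mu)^2 ] = 35/12 by the same weighted-average formula.

     import random
     from fractions import Fraction as F

     random.seed(1)                                # arbitrary seed, for repeatability
     rolls = [random.randint(1, 6) for _ in range(600)]
     print(sum(rolls) / len(rolls))                # sample average; should be near 3.5

     f = {x: F(1, 6) for x in range(1, 7)}         # pmf of one roll of a fair die
     mu = sum(x * p for x, p in f.items())         # E[X] = Sum x*f(x)      = 7/2
     var = sum((x - mu)**2 * p for x, p in f.items())   # E[(X-mu)^2]       = 35/12
     print(mu, var)

Re-running it with a different seed changes the sample average a little,
but it stays close to 3.5; the exact values mu = 7/2 and Var[X] = 35/12
do not change.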