Pitman                    MTH 135 / STA 104                    Probability

Week 2: Cond'l Prob. & Independence

 - Asymp freq => definition  P[A|B] = P[A B]/P[B]
 - P[ A B ] = P[A] P[B|A]
 - (Wgtd) Average of Condit Probs:
        P[A] = P[A|B] P[B] + P[A|B^c] P[B^c]
             = \sum P[A|B_i] P[B_i]        for partition {B_i}

Independence:   P[A B] = P[A] P[B]   <==>   P[A] = P[A|B] = P[A|B^c]
                [note P[B]=0 makes P[A|B] undefined]

                           B
Venn Diagram:    +------------+------+
                 |            |      |
                 |            |      |
                 +------------+------+
                 |            |      |
              A  |            |      |  A
                 |            |      |
                 +------------+------+
                           B

Reliability:

   Series:     ---------[ p1 ]---------[ p2 ]--------      p1 p2

                   ---[ p1 ]---
                  /            \
   Parallel:   --<              >--      1-(1-p1)(1-p2) = p1+p2-p1 p2
                  \            /
                   ---[ p2 ]---

Bayes Rule

  False positives:
       P[+|D] = 0.95    P[-|H] = 0.98, so P[+|H] = 0.02   (H = D^c)
       P[D]   = 0.01

                         P[D] P[+|D]                  .01 * .95
  => P[D | +] = ----------------------------- = ---------------------
                P[D] P[+|D] + P[D^c] P[+|D^c]   .01 * .95 + .99 * .02

                   95         95
              = -------- = ----- = 32%
                95 + 198    293

  In General:
                     P[A B_i]       P[A | B_i] P[B_i]
    P[ B_i | A ]  =  --------- = -----------------------
                       P[A]      \sum P[A | B_j] P[B_j]

  Note: "reversing" the conditioning. Is P[ B | A ] = P[A | B] ???
        /Usually not/

INFERENCE:

 * I have two slot machines; one pays off with p=1/2, one pays off with
   p=2/3, but I don't know which is which. Call them the "bad" and "good"
   machines. I pick one of the machines at random (each equally likely)
   and play it twice. If I win both times, NOW what is the probability
   it's the good machine?

   Let W2 be the event "win twice in a row", G the event "good machine".
   (Similarly, W1 = "win exactly once" and W0 = "win neither time".)

                                          0.5 * (4/9)
   P[ G | W2 ] = P[G W2] / P[ W2 ] = ---------------------
                                     0.5*(1/4) + 0.5*(4/9)

               = (2/9)  / (1/8 + 2/9)   = 16/25 = 64.00%

   P[ G | W1 ] = (2/9)  / (1/4 + 2/9)   =  8/17 = 47.06%
   P[ G | W0 ] = (1/18) / (1/8 + 1/18)  =  4/13 = 30.77%

 + I have an experimental drug that might work better than the old one
   and might not. I find two matched pairs of subjects; within each
   matched pair I assign the old drug and the new one randomly and
   blindly. I identify which subject is doing better within each pair.
   As it happens, the new drug did better within each pair.
   The probability it is better than the old one is now 64%.

More than Two Events

   P[ A B C ] = P[ A B ] P[ C | A B ]
              = P[ A ] P[ B | A ] P[ C | A B ]

   P[ \cap A_i ] = P[ A1 ] P[ A2 | A1 ] P[ A3 | A1 A2 ] ... P[ An | A1...A_{n-1} ]

------------------------------------------------------------------------------

Distributions

A 'Random Variable' is a number that depends on chance; that is, a function
from the Sample Space to (for example) the Real Numbers or maybe the
Integers. Usually Random Variables are named with upper-case letters near
the end of the alphabet.

The "Distribution" of a random variable X is some rule to let us compute the
probabilities

      Pr[ X in A ]                                                   (*)

for every set A. If X takes only finitely-many or countably-many values,
then we can list them and just report f(x) = Pr[ X = x ] for each one of
those values; this is called the 'Probability Mass Function' (abbreviated
"pmf") by some authors and the "Probability Function" (just "pf") by others.
With it, probabilities like (*) can be computed by addition:

      Pr[ X in A ] = Sum [ f(x) : x in A ]

The pmf can be specified in many ways: tables, histograms, formulas, etc.

For example: Let X be the number of aces in two draws *without replacement*
from a standard 52-card deck. Each of the 52*51=2652 ordered pairs is
equally likely, so

      f(2) = (4*3        =   12) / 2652 = 0.004524887   have two aces,
      f(1) = (48*4+4*48  =  384) / 2652 = 0.1447964     have one ace, and
      f(0) = (48*47      = 2256) / 2652 = 0.8506787     have zero aces; note
                           -----          -----------
                            2652 / 2652 = 1.000000000   have 0, 1, or 2 aces.

Could represent this as a:

   Table:        x:      0         1         2
               f(x):   85.07%    14.48%    0.4525%

   Pie chart:    (figure not reproduced)

   Histogram:    (figure not reproduced)

                (4:x) (48:2-x)          2.246863e+59
   Formula:    ---------------- = --------------------------
                    (52:2)        x! (4-x)! (46+x)! (2-x)!

               where (n:k) denotes the binomial coefficient "n choose k".

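As a quick check on the aces example above, here is a minimal Python sketch of the "Formula" form of the pmf. The function name aces_pmf and its parameters are illustrative choices, not part of the notes; exact fractions are used so the values can be compared directly with 12/2652, 384/2652 and 2256/2652.

```python
from fractions import Fraction
from math import comb


def aces_pmf(x, draws=2, aces=4, deck=52):
    """P[X = x], X = number of aces among `draws` cards dealt without replacement.

    Implements (aces:x) * (deck-aces : draws-x) / (deck:draws).
    """
    return Fraction(comb(aces, x) * comb(deck - aces, draws - x),
                    comb(deck, draws))


# Tabulate the pmf for x = 0, 1, 2 and confirm the probabilities add to one.
pmf = {x: aces_pmf(x) for x in range(3)}
total = sum(pmf.values())

for x, p in pmf.items():
    print(f"f({x}) = {p} = {float(p):.7f}")
print("sum =", total)
```

Counting unordered hands here (52 choose 2 = 1326) rather than the 2652 ordered pairs changes numerator and denominator by the same factor of 2, so the probabilities agree.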
Some random variables (we'll see many examples later) can take on more than
countably-many values, so some other way of specifying the probabilities (*)
is needed; sometimes we'll have a "probability density function" f(x) that
satisfies

      Pr[ X in A ] = Integral  f(x) dx
                        A

but more of that later. We've already seen one example: the "spinner" that
gives a random value "U" between zero and one, with the property that

      Pr[ a < U <= b ] = (b-a)        for 0 <= a <= b <= 1

This "distribution" is called the "Uniform" distribution on [0,1].

------------------------------------------------------------------------------

Birthday Problem:

Prob of NO shared birthday among 47 students in this class; let "A(i)" be
the EVENT that the i'th person does NOT duplicate any earlier birthday, and
let

      p(i) = P[ A(i) | A(1),...,A(i-1) ] = (366-i)/365          [WHY?]

be the CONDITIONAL PROBABILITY of the event A(i), GIVEN the i-1 preceding
A(j)'s. Then the event of no shared birthday among 47 students has
probability:

   P[ A(1) A(2) ... A(47) ]
        = P[A(1)] P[A(2) | A(1)] P[A(3) | A(1) A(2)] ... P[A(47)|...]
        = p(1) * p(2) * p(3) * ... * p(47)
        = (365/365) * (364/365) * (363/365) * ... * (319/365)   [366-47 = 319]
        = prod(319:365)/365^47
        = 0.0452256,  a bit less than one in twenty.

SO, prob of shared b'day is at least 95.48% (why is the exact answer really
HIGHER?)

Math guys: e^x is about 1+x for small x; thus (1-i/365) is about
e^(-i/365), and

      prod {(1-i/365) : i = 0..(n-1)}
          is about  exp( - (1/365) * {0+1+2+...+(n-1)} )
                 =  exp( - (n-1)*n/730 )

For n=47, this would be exp(-46*47/730) = exp(-2.961644) = 0.05173381

More math:

      n! = n * (n-1) * (n-2) * ... * 1
         = sqrt(2*pi) * n^(n+1/2) * exp(-n + theta/(12n))
                                              for some 0 < theta < 1
SO,
               365!              sqrt(2*pi) * 365^365.5 * exp(-365)
      P = --------------- ~ ----------------------------------------------
           318! * 365^47     sqrt(2*pi) * 318^318.5 * exp(-318) * 365^47

        = (365/318)^318.5 * exp(-47)

        = 0.04522712,  much closer to the exact 0.0452256.

------------------------------------------------------------------------------

Probability of a FLUSH:

      4 * (13/52) * (12/51) * (11/50) * (10/49) * (9/48)
        = 0.001980792 = 1/504.85, about two in a thousand.

                               4 * (13:5)       4 * 1287      5148
      Another (harder?) way:  ------------ = ------------ = ---------
                                 (52:5)         2598960      2598960

Prob of a STRAIGHT (Ace is high or low but no wrap-around):

      10 * 5! * (4/52) * (4/51) * (4/50) * (4/49) * (4/48)
        = 0.003940038 = 1/253.8047, about four in a thousand

Probability of FOUR OF A KIND:

      13 * 5 * (4/52) * (3/51) * (2/50) * (1/49) * (48/48)
        = 0.000240096 = 1/4165.001, about two in TEN thousand.

Prob of a STRAIGHT FLUSH is

      10 * 5! * (4/52) * (1/51) * (1/50) * (1/49) * (1/48)
        = 1.539077e-05 = 1/64974.01

Prob of a FULL HOUSE is

      13 * 12 * (5:3) * (4/52)*(3/51)*(2/50) * (4/49)*(3/48)
        = 0.001440576 = 1/694.1667

Prob of THREE OF A KIND is

      13 * (5:3) * (4/52)*(3/51)*(2/50) * (48/49) * (44/48)
        = 0.02112845 = 1/47.32955, about one in fifty

Prob of TWO PAIRS is

      13 * 12 * (5:2)*(3:2)*(1/2) * (4/52)*(3/51) * (4/50)*(3/49) * (44/48)
        = 0.04753902 = 1/21.03535, about one in twenty

Prob of ONE PAIR is

      13 * (5:2) * (4/52)*(3/51) * (48/50) * (44/49) * (40/48)
        = 0.422569 = 1/2.366477, about two in five

Prob of NOTHING is

      (52/52) * (48/51) * (44/50) * (40/49) * (36/48)
          - P[Flush] - P[Straight] + P[Straight Flush]
        = 0.5070828 - 0.001980792 - 0.003940038 + 1.539077e-05
        = 0.5011774, about one in two

Prob of JACKS OR BETTER is

      1 - P[NOTHING] - (9/13)*P[ONE PAIR]
        = 0.2062748, about one in five

      (9 of the 13 ranks, 2 through 10, give a pair lower than jacks.)
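The poker probabilities above can be cross-checked by counting unordered five-card hands instead of multiplying conditional probabilities. The following Python sketch is one way to do that; the dictionary keys and variable names are illustrative, and, as in the notes, the "flush" and "straight" counts include straight flushes.

```python
from fractions import Fraction
from math import comb

TOTAL = comb(52, 5)  # number of unordered five-card hands

# Counts of unordered hands of each type.
counts = {
    "straight flush": 10 * 4,                              # 10 top ranks, 4 suits
    "four of a kind": 13 * 48,                             # rank of the four, odd card
    "full house": 13 * comb(4, 3) * 12 * comb(4, 2),       # triple, then pair
    "flush": 4 * comb(13, 5),                              # includes straight flushes
    "straight": 10 * 4**5,                                 # includes straight flushes
    "three of a kind": 13 * comb(4, 3) * comb(12, 2) * 4**2,
    "two pairs": comb(13, 2) * comb(4, 2)**2 * 44,
    "one pair": 13 * comb(4, 2) * comb(12, 3) * 4**3,
}

# Five distinct ranks, then remove flushes and straights (inclusion-exclusion).
five_distinct_ranks = comb(13, 5) * 4**5
counts["nothing"] = (five_distinct_ranks - counts["flush"]
                     - counts["straight"] + counts["straight flush"])

probs = {hand: Fraction(n, TOTAL) for hand, n in counts.items()}

# Jacks or better: everything except "nothing" and pairs of 2 through 10.
p_jacks_or_better = 1 - probs["nothing"] - Fraction(9, 13) * probs["one pair"]

for hand, p in probs.items():
    print(f"{hand:16s} {float(p):.9f}")
print(f"{'jacks or better':16s} {float(p_jacks_or_better):.9f}")
```

Working with counts over (52:5) avoids the ordering factors such as 5! and (5:3) that the sequential-draw formulas need, which makes it a useful independent check on them.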