Resnick        STA 205 Probability & Measure        Week 1: Intro

* Housekeeping Details
  - Lec: Tue/Thu 11:45-1:00
  - OH:  Mon 1:00pm (tent.)
  - Web: http://stat.duke.edu/courses/Fall12/sta711/
         (midterms will probably change; confs & trip)
  - HW:  Approx 6 probs/wk; expect to spend 5-6 hrs a week on hw.
         Due each Thu starting 9 days from today; returned next Tue.
  - Txt: Comments welcome. Others opt'l (see class web page).
  - Sty: Read the book, do the problems, ask questions. My goal is not to
         spoon-feed the book, but rather to add perspective, illustrate and
         illuminate ideas, offer examples, and help show how the ideas and
         tools are useful in the theory and application of (especially
         Bayesian) statistics.

1. Sets and Events

Motivation: Although it's not a prerequisite for this course, most students
will have taken an undergraduate calculus-based course in probability theory
(like Duke's MTH 230 = STA 230). Such a course teaches about discrete and
continuous random variables and their distributions, joint distributions of
two or three RVs, and a little about conditional probabilities and
distributions. Most things are done twice: once for discrete RVs (binomial,
geometric, Poisson) and once for continuous ones (uniform, normal,
exponential).

This course builds a single coherent (beautiful) structure for one, two, or
even infinitely many random variables that are discrete or continuous or
neither. We will be especially concerned with limits of random variables (we
will see there are many sorts of limits to consider) and with conditional
distributions, given the values of many (even infinitely many) other random
variables. A recurring theme is application within Bayesian statistics ---
which we may view as simply probability theory on a grand scale, building a
joint probability model for all the things we don't know.
These might include both parameters (like the probability p of success in a
clinical trial of an experimental drug) and observable quantities that we
haven't yet observed (for example, the number X of successes in a trial of N
subjects). The object is usually to make deductions about the CONDITIONAL
DISTRIBUTION of the things we care about, given the things we have
observed... like P[ p > 0.75 | X=8, N=10 ].

Notation and Basic Mathematical Set-Up:

 \Omega:    Set of possible outcomes of some "experiment".
 \omega:    One of the outcomes in \Omega.
            [Idea: nature or fate chooses an \omega from \Omega; alas, she
            doesn't tell us which one. We just get hints from observing
            X(\omega), Y(\omega), ...]
 A, B, C:   Subsets of \Omega; A is "true" if nature's \omega \in A.
            Usually upper-case letters from the first half of the alphabet.
 Y^X:       All functions from a set X to a set Y; special cases:
            2^\Omega:  All subsets of \Omega (the "power set", sometimes
                       denoted with a spiky script P(\Omega))  ( \Omega -> {0,1} )
            \Omega^2:  All ordered pairs (\omega_1, \omega_2)  ( {1,2} -> \Omega )
 P[ ]:      Probability assignment of numbers P[A] >= 0 to SOME (maybe not
            all) subsets A of \Omega... the need to limit P[ ] to just SOME
            "events" and not the ENTIRE power set is an important distinction
            of graduate-level or "measure-theoretic" probability.
 \cal{A}:   Certain collections ("classes") of sets (typically script letters
            from the first half of the alphabet).
 X, Y, Z:   Random variables, i.e., functions X: \Omega -> |E, usually to a
            vector space |E (often R or R^n). Mostly letters from the second
            half of the alphabet.
 E[X]:      Expectation of SOME (not all!!!) random variables X (why not?)
 \emptyset: The "slashed oh" is the *empty set*; it is not \phi or \varphi.

Four Big Ideas in Probability:

LLN: If { X_i } are independent, identically-distributed RVs with common
     mean \mu = E[ X_i ], and partial sums S_n := \sum_{i<=n} X_i, then
          (1/n) S_n -> \mu
     [ What does it MEAN for a sequence of RANDOM VARIABLES like
       Y_n = (1/n) S_n to "converge" to a constant \mu or to a random
       variable Y???
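Before making "converge" precise, it may help to see the LLN numerically; a
minimal sketch in Python (the Bernoulli(1/2) distribution, the seed, and the
sample sizes are illustrative choices, not from the notes):

```python
import random

random.seed(711)  # fixed seed so the illustration is reproducible

def running_mean_deviation(n, mu=0.5):
    """Simulate n iid Bernoulli(1/2) draws X_i and return |S_n/n - mu|,
    the deviation of the sample mean Y_n = S_n/n from the true mean."""
    s = sum(random.random() < 0.5 for _ in range(n))
    return abs(s / n - mu)

# The deviation |Y_n - mu| tends to shrink as n grows -- the LLN in action:
for n in (10, 1000, 100000):
    print(n, running_mean_deviation(n))
```

Making precise the sense in which these deviations vanish (almost surely? in
probability?) is exactly the kind of question this course answers.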
     ]

CLT: If { X_i } are IID with common mean \mu and finite variance
     \sigma^2 = E[ (X_i - \mu)^2 ], and partial sums S_n := \sum_{i<=n} X_i,
     then
          Z_n = \sqrt{n} (Y_n - \mu) / \sigma ==> No(0,1)
     [ What does it MEAN for a sequence of DISTRIBUTIONS to converge?? ]

LIL: If { X_i } are IID with common mean \mu and finite variance
     \sigma^2 = E[ (X_i - \mu)^2 ], and partial sums S_n := \sum_{i<=n} X_i,
     then
          \limsup_n (S_n - n\mu) / \sqrt{2 n \sigma^2 \log\log n} = 1  (a.s.)
     [ What is the "lim sup" of a sequence of random variables? ]

     [ Put LLN, CLT, LIL in perspective: (S_n - n\mu) / g(n) -> ??? ]

MCT: If X_n is "conditionally constant" in the sense that
          X_n = E[ X_{n+k} | X_1, ..., X_n ]  for every k >= 0 and n,
     then UNDER SOME CONDITIONS (what conditions? why?), X_n -> X for some
     random variable X ("->" in what way?) and, for SOME random times T
     (which ones? why?),
          E[ X_T | info up to time n ] = X_n
     [ What does it MEAN to find an expectation "given" some "info"? ]

----------------------

Operations:
  Complement:   A^c = "not A" = { w: w \notin A }
  Union over an ARBITRARY index set:
                \cup_\alpha A_\alpha = { w: w \in A_\alpha for at least one \alpha }
                A \cup B = "A or B (or perhaps both)"
                [ Later we'll see it sometimes matters whether the index set
                  has finitely-many, countably-many, or uncountably-many
                  elements; this definition works for all those cases ]
  Intersection over an arbitrary index set:
                \cap_\alpha A_\alpha = { w: w \in A_\alpha for all \alpha }
                A \cap B = A B = "both A and B"
  Set difference:        A \ B = A \cap B^c
  Symmetric difference:  A \Delta B = (A \cap B^c) \cup (A^c \cap B)
                                    = (A\B) \cup (B\A)
                                    = "exactly one of A, B occurs"

Relations:
  Containment: A \subset B : "A implies B"  (A \cap B = A)
  Disjoint:    A \cap B = \emptyset
  Equality:    A = B : "A if-and-only-if B"

De Morgan's Laws:  (\cup_\alpha A_\alpha)^c = \cap_\alpha (A_\alpha^c)
                   (\cap_\alpha A_\alpha)^c = \cup_\alpha (A_\alpha^c)

countable != infinite: not every infinite set is countable
  ( Cantor diagonal argument if time allows; note c = 2^{\aleph_0} )
Define injection, cardinality; STATE (Schroeder-Bernstein):
  #A <= #B  &&  #B <= #A  =>  #A = #B
#A <= #B if there exists a 1:1 \phi: A \into B (not nec.
surj'n)

Convention: "i, j, n" (Latin) subscripts -> *countable* union/intersection/sum/...
            "\alpha" (Greek) subscripts -> *arbitrary* (possibly uncountable)

----- END OF FIRST PART ---------

Lim Inf: "All but finitely-many" = union of intersections = \cup_n \cap_{k>=n} A_k
Lim Sup: "Infinitely-many"       = intersection of unions = \cap_n \cup_{k>=n} A_k
Note (Lim Inf) \subset (Lim Sup); sometimes (but not always) they coincide.

Some examples, with \Omega = |N:
  A_n = {n, n+1, ...}:              LimSup = LimInf = \emptyset
  A_n = {1, 2, ..., n}:             LimSup = LimInf = |N
  A_{2n} = Evens, A_{2n+1} = Odds:  LimSup = |N, LimInf = \emptyset

Important, Motivating Example: the "epsilon-delta" notions of convergence
and continuity. Convergence of a sequence in a metric space:
  a_n -> a  <==>  (\forall eps > 0) (\exists M) (\forall n > M)  |a_n - a| < eps
Cauchy convergence:
  a_n -> ?  <==>  (\forall eps > 0) (\exists N) (\forall m, n > N)  |a_n - a_m| < eps

The *event* that a sequence X_n of functions on \Omega converges:
  { w: X_n(w) converges }
    = { w: LimSup X_n(w) - LimInf X_n(w) = 0 }
    = \cap_k \cup_N \cap_{m,n>N} { w: |X_n(w) - X_m(w)| < 1/k }
  [ or any other sequence decreasing to zero in place of 1/k ]

*********************

Not every subset A of \Omega will be an "event" whose probability can be
computed, but we will need to show that some are. Here are 3 rules:

FIELD:
  i)   \Omega \in \cal{A}
  ii)  A \in \cal{A}  =>  A^c \in \cal{A}
  iii) A, B \in \cal{A}  =>  A \cup B \in \cal{A}   [ ==> FINITE unions & int's ]

\sigma-FIELD (= \sigma-ALGEBRA): replace iii) with
  iii') A_i \in \cal{A}  =>  \cup_i A_i \in \cal{A}   (*countable* unions only)

Example of a Field that is not a \sigma-Field: the finite & co-finite
subsets of an infinite \Omega.

(\sigma-)Field generated by a class C: the intersection of (\sigma-)Fields
is a (\sigma-)Field, so the smallest one containing C exists.

Borel Sets
--------------

1. Basic Definitions
   - Probability Space (\Omega, F, P) ~ properties of P:
       1) P(A) >= 0;  2) P is \sigma-additive;  3) P(\Omega) = 1.
     [ Finite positive measure: replace 3) with P(\Omega) < oo;
       \sigma-finite positive measure: replace 3) with
       \Omega = \cup_i A_i, A_i \in F, P(A_i) < oo ]
   - df: lim-inf a_n = lim_{m->oo} inf_{n>=m} a_n
            <=  lim-sup a_n = lim_{m->oo} sup_{n>=m} a_n
     (draw picture of (1 + 1/x) sin(x))
   - Fatou's Lemma: E[ liminf X_n ] <= liminf E[ X_n ]
     -> (with X_n = 1_{A_i}):  P(LimInf A_i) [ "A_i eventually always" ]
          <=  lim-inf P(A_i)  <=  lim-sup P(A_i)  <=  P(LimSup A_i)
   - df (CDF): F(x) = F(x+) (right-continuous);
               x <= y  =>  F(x) <= F(y) (non-decreasing);
               F(-oo) = 0, F(+oo) = 1;
               P (a, b] := F(b) - F(a)

2. Dynkin's "Pi/Lambda" Theorem [replaces the older "Monotone Class Theorem"]

   * Lambda System L *                    * Pi System P *
   l1: \Omega \in L                       p1: A, B \in P => A B \in P
   l2: A \in L => A^c \in L
   l3: disjoint ctble unions stay in L

   Thm (E. Dynkin): a) P a \pi-system, L a \lambda-system, P \subset L
                       =>  \sigma(P) \subset L
                    b) P a \pi-system  =>  \sigma(P) = L(P)

3. Two Constructions
   i)   Discrete: countable \Omega, \sum p_i = 1  =>  B = 2^\Omega okay
   ii)  Continuous: probability density
   iii) General 1-d: CDF

4. Constructions of Probability Spaces
   - Infinite Bernoulli sequence
   - Cantor distribution?
   - Big Thm 2.4.3: "Semi-Algebra" C:
       - \Omega, \emptyset \in C
       - C is a \pi-system
       - A \in C => A^c = \cup_{i<=n} B_i = a *finite* union of *disjoint*
         elements of C
   - "Combo extension thm": P \sigma-additive on a field or semi-algebra C
     => there is a unique extension of P to all of \lambda(C) = \sigma(C)
     [ sketch proof, and prove uniqueness using Dynkin ]

5. Measure Constructions
   - Lebesgue measure on (0,1]  (\lambda(dx))
     (inner & outer measure; circle, triangle examples)

6. Counter-examples?
   E.g. Uniform on the integers; finitely-additive measures.

------------

Thu: Introduce:
  1) { w: X_n(w) -> X(w) } = \cap_k \cup_m \cap_{n>=m} { w: |X_n(w) - X(w)| < 1/k }
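The LimSup/LimInf set calculus above (and the evens/odds example from the
notes) can be checked mechanically; a sketch in Python, where truncating the
infinite unions and intersections at finite bounds is an illustrative
shortcut that happens to be exact for this periodic sequence:

```python
# Check LimSup ("infinitely often") and LimInf ("all but finitely many")
# for the alternating example: A_n = Evens for even n, A_n = Odds for odd n,
# restricted to a finite window Omega = {1, ..., 20}.

OMEGA = set(range(1, 21))

def A(n):
    """A_n = even elements of Omega if n is even, odd elements if n is odd."""
    return {w for w in OMEGA if w % 2 == n % 2}

def tail_union(n, n_max):
    # \cup_{k>=n} A_k (truncated at n_max)
    return set.union(*(A(k) for k in range(n, n_max)))

def tail_intersection(n, n_max):
    # \cap_{k>=n} A_k (truncated at n_max)
    return set.intersection(*(A(k) for k in range(n, n_max)))

# LimSup = \cap_n \cup_{k>=n} A_k ;  LimInf = \cup_n \cap_{k>=n} A_k.
# Outer index n runs to 50, inner tails to 100 -- exact here by periodicity.
lim_sup = set.intersection(*(tail_union(n, 100) for n in range(1, 50)))
lim_inf = set.union(*(tail_intersection(n, 100) for n in range(1, 50)))

print(lim_sup == OMEGA)   # True: every w lies in infinitely many A_n
print(lim_inf == set())   # True: no w lies in all but finitely many A_n
```

This matches the claim in the notes: for the alternating Evens/Odds
sequence, LimSup is all of the space while LimInf is empty.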