class: center, middle, inverse, title-slide

.title[
# Adding a variable
]
.author[
### Yue Jiang
]
.date[
### STA 490/690
]

---

### Roadmap for the last week

STA 690 has ended, so to avoid overlapping with any potential final projects (like...oops, I lectured on frailty models, so now students can't use those), this week will cover a pretty important topic in linear models in general, to partially address a very important question in applied modeling: "what happens when I add a variable?" (...in a very broad sense).

Since the last week of class is technically STA 490 only, I figured we'd go through a perhaps surprising but fundamental theorem that holds across a broad class of models. But since everyone's coming from a different set of previous stats courses, there are a few basics to go through first. So bear with me through today - we'll need these concepts for Thursday's lecture (which I actually think will be a really cool one to end on (sorry, Taylor expansions again, can't avoid those in statistics!)).

Essentially, we're going to be whirlwind-tour-ing through a bunch of topics that are covered in much greater detail in STA 432.

---

### Convergence concepts

A sequence of real numbers `\(X_1, X_2, X_3, \cdots\)` .vocab[converges] to a limit `\(X\)` if we can find, for every `\(\epsilon > 0\)`, some natural number `\(N\)` such that for every `\(n \ge N\)`, `\(|X_n - X| < \epsilon\)`.

.question[
What about a sequence of *random variables*? What might it mean for a sequence of random variables "to converge"?
]

---

### An aside/review

.vocab[Chebyshev's Inequality] bounds the probability that a random variable falls far from its mean, as long as the distribution has finite mean `\(\mu\)` and variance `\(\sigma^2\)`:

`\begin{align*}
P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2}
\end{align*}`

As an example, consider the sample mean `\(\bar{X}\)` of `\(n\)` i.i.d. random variables, where each `\(X_i\)` has mean `\(\mu\)` and variance `\(\sigma^2\)`.

.question[
What are `\(E(\bar{X})\)` and `\(Var(\bar{X})\)`? What is a bound for `\(P(|\bar{X} - E(\bar{X}) | \ge \epsilon)\)`? What happens as `\(n \to \infty\)`?
]

---

### An aside/review

The .vocab[weak law of large numbers] says that for i.i.d. random variables `\(X_1, \cdots, X_n\)` with finite expectation `\(\mu\)`, and for every `\(\epsilon > 0\)`,

`\begin{align*}
\lim_{n \to \infty} P(|\bar{X} - E(\bar{X}) | \ge \epsilon) = 0.
\end{align*}`

.small[(Chebyshev's Inequality provides an immediate proof when `\(\sigma^2 < \infty\)`, but the WLLN holds regardless of whether the variance is finite (the proof is just worse)).]

.question[
What does this mean intuitively?
]

---

### Convergence in probability

A sequence of *random variables* `\(X_1, X_2, X_3, \cdots\)` is said to .vocab[converge in probability] to a *random variable* `\(X\)` if for all `\(\epsilon > 0\)`,

`\begin{align*}
\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0.
\end{align*}`

A .vocab[consistent] estimator is one that converges in probability to the parameter of interest. For instance, by the WLLN, the sample mean is a consistent estimator for the population mean.

---

### `\(O_p\)` and `\(o_p\)`

You might've seen big/little o notation before (i.e., a function `\(f\)` is Big-O of `\(h\)` if `\(f(h)/h\)` is bounded as `\(h \to 0\)`, and little-o of `\(h\)` if `\(\lim_{h \to 0}f(h)/h = 0\)`). Similar notation is used for notions of convergence in probability.

--

Let `\(\{X_n\}\)` be a sequence of random variables and `\(\{a_n\}\)` a sequence of real numbers.
`\(\{X_n/a_n\}\)` is said to be .vocab[bounded in probability] if for every `\(\epsilon > 0\)`, there are constants `\(M_\epsilon\)` and `\(N_\epsilon\)` such that `\(P(|X_n/a_n| > M_\epsilon) < \epsilon\)` for all `\(n > N_\epsilon\)`.

--

In this case, we say that `\(\{X_n\}\)` is Big-`\(O_p\)` of `\(a_n\)`. That is, `\(X_n = O_p(a_n)\)` as `\(n \to \infty\)`. We often take `\(\{a_n\}\)` to simply be the sequence `\(n\)` itself: `\(1, 2, 3, \cdots\)` (what would `\(\{X_n\}\)` being "Big-`\(O_p\)` of `\(n\)`" mean?).

---

### `\(O_p\)` and `\(o_p\)`

Similarly, a sequence of random variables is little-`\(o_p\)` of `\(a_n\)` if `\(\{X_n/a_n\} \to_p 0\)`. That is, `\(X_n = o_p(a_n)\)` as `\(n \to \infty\)`.

.question[
What is the difference between being bounded in probability (being `\(O_p\)`) and converging in probability to zero (being `\(o_p\)`)?
]

---

### A different type of convergence

We just explored a notion of stochastic convergence where random variables are said to "converge" if there is a low probability of them being "very far" from each other. What about some notion of "far-ness" based on whether their *distribution functions* are "close" to each other? Can we derive some notion of convergence in this sense?

--

A sequence of random variables `\(X_1, X_2, X_3, \cdots\)` is said to .vocab[converge in distribution] to a .vocab[limiting distribution] `\(X\)` if

`\begin{align*}
\lim_{n \to \infty}F_{X_n}(x) = F_X(x)
\end{align*}`

at all continuity points of `\(F_X\)`.

---

### A classic example

Consider the sequence of variables `\(X_1, X_2, X_3, \cdots \sim Unif(0, 1)\)` (i.i.d.). Denote by `\(X^{(n)}\)` the maximum of `\(X_1, \cdots, X_n\)`. Then for any `\(\epsilon \in (0, 1)\)`,

`\begin{align*}
P(X^{(n)} \le 1 - \epsilon) &= P(X_1 \le 1 - \epsilon, X_2 \le 1 - \epsilon, \cdots, X_n \le 1 - \epsilon)\\
&= (1 - \epsilon)^n.
\end{align*}`

.question[
Set `\(\epsilon = u/n\)` (since `\(\epsilon > 0\)` and `\(n > 0\)`, `\(u\)` also has to be `\(> 0\)`). What is the limiting distribution of the random variable `\(n(1 - X^{(n)})\)`?
]

---

### Another classic example

For a sequence of i.i.d. random variables with finite mean `\(\mu\)` and variance `\(\sigma^2\)`,

`\begin{align*}
\sqrt{n}(\bar{X}_n - \mu) \to_d N(0, \sigma^2)
\end{align*}`

(this is, of course, the .vocab[Central Limit Theorem]).

---

### Two incredibly useful theorems

Let `\(X_n\)` and `\(Y_n\)` represent sequences of random variables. If `\(X_n \to_d X\)` and `\(Y_n \to_p c\)`, where `\(c\)` is a constant, then .vocab[Slutsky's Theorem] says that:

`\begin{align*}
X_n + Y_n &\to_d X + c\\
X_nY_n &\to_d Xc
\end{align*}`

The .vocab[Continuous Mapping Theorem] says* that for any continuous** function `\(g\)`:

`\begin{align*}
X_n \to_p X &\implies g(X_n) \to_p g(X)\\
X_n \to_d X &\implies g(X_n) \to_d g(X)
\end{align*}`

.small[*also true for almost sure convergence, which we're not going to talk about.]

.small[**more precisely, for any `\(g\)` whose set of discontinuity points is hit by `\(X\)` with probability zero.]
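
---

### Aside: seeing the WLLN and CLT numerically

Here's a minimal simulation sketch of two of the ideas above - the WLLN and the CLT - using i.i.d. Exponential(1) draws. The distribution, seed, and sample sizes are arbitrary choices; any distribution with finite mean and variance would do.

```r
set.seed(490)  # arbitrary seed, just for reproducibility

mu     <- 1    # an Exponential(1) random variable has mean 1...
sigma2 <- 1    # ...and variance 1

# WLLN: the sample mean should settle down near mu as n grows
sapply(c(10, 100, 1000, 100000),
       function(n) mean(rexp(n, rate = 1)))

# CLT: sqrt(n) * (xbar - mu) should look approximately N(0, sigma2)
n <- 1000
z <- replicate(5000, sqrt(n) * (mean(rexp(n, rate = 1)) - mu))
c(mean = mean(z), var = var(z))  # should be close to 0 and sigma2 = 1
```

The first call prints one sample mean per `\(n\)` (drifting toward `\(\mu = 1\)`); the second summarizes 5,000 draws of the CLT-scaled quantity, whose mean and variance should be close to 0 and `\(\sigma^2 = 1\)`.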
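
---

### Aside: seeing Slutsky's theorem numerically

A companion sketch for Slutsky's theorem, in the same arbitrary Exponential(1) setup: the sample standard deviation converges in probability to `\(\sigma\)`, so dividing the CLT-scaled mean by it shouldn't change the `\(N(0, 1)\)` limit.

```r
set.seed(690)  # arbitrary seed

mu <- 1        # Exponential(1) mean
n  <- 1000

# Slutsky (+ CMT): sqrt(n) * (xbar - mu) / s, with s the sample SD,
# has the same N(0, 1) limit as sqrt(n) * (xbar - mu) / sigma
t_like <- replicate(5000, {
  x <- rexp(n, rate = 1)
  sqrt(n) * (mean(x) - mu) / sd(x)
})
c(mean = mean(t_like), var = var(t_like))  # should be close to 0 and 1
```

Here `\(\sigma / s_n \to_p 1\)` (the CMT takes us from `\(s_n^2 \to_p \sigma^2\)` to `\(\sigma / s_n \to_p 1\)`), so multiplying the CLT-scaled mean by it leaves the limiting distribution alone - exactly the product form of Slutsky's theorem.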