class: center, middle, inverse, title-slide

.title[
# Adding a variable
]
.author[
### Yue Jiang
]
.date[
### STA 490/690
]

---

### Roadmap for the last week

STA 690 has ended, so to avoid overlapping with any potential final projects (like...oops, I lectured on frailty models, so now students can't use those), this week will cover a pretty important topic in linear models in general, to partially address a very important question in applied modeling: "what happens when I add a variable?" (...in a very broad sense).

Since the last week of class is technically STA 490 only, I figured we'd go through a perhaps surprising but fundamental theorem that holds across a broad class of models. But since everyone's coming from a different set of previous stats courses, there are a few basics to go through first. So bear with me through today - we'll need these concepts for Thursday's lecture (which I actually think will be a really cool one to end on (sorry, Taylor expansions again, can't avoid those in statistics!)).

Essentially, we're going to be whirlwind-tour-ing through a bunch of topics that are covered in much greater detail in STA 432.

---

### Convergence concepts

A sequence of real numbers `\(X_1, X_2, X_3, \cdots\)` .vocab[converges] to a limit `\(X\)` if we can find, for every `\(\epsilon > 0\)`, some natural number `\(N\)` such that for every `\(n \ge N\)`, `\(|X_n - X| < \epsilon\)`.

.question[
What about a sequence of *random variables*? What might it mean for a sequence of random variables "to converge"?
]

---

### An aside/review

.vocab[Chebyshev's Inequality] bounds the probability that a random variable falls far from its mean, as long as the distribution has finite mean `\(\mu\)` and variance `\(\sigma^2\)`:

`\begin{align*}
P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2}
\end{align*}`

As an example, consider the sample mean `\(\bar{X}\)` of `\(n\)` i.i.d. random variables, where each `\(X_i\)` has mean `\(\mu\)` and variance `\(\sigma^2\)`.

.question[
What are `\(E(\bar{X})\)` and `\(Var(\bar{X})\)`? What is a bound for `\(P(|\bar{X} - E(\bar{X}) | \ge \epsilon)\)`? What happens as `\(n \to \infty\)`?
]

---

### An aside/review

The .vocab[weak law of large numbers] says that for i.i.d. random variables `\(X_1, \cdots, X_n\)` with finite expectation `\(\mu\)`, and for every `\(\epsilon > 0\)`,

`\begin{align*}
\lim_{n \to \infty} P(|\bar{X} - E(\bar{X}) | \ge \epsilon) = 0.
\end{align*}`

.small[(Chebyshev's Inequality provides an immediate proof when `\(\sigma^2 < \infty\)`, but the WLLN holds regardless of whether the variance is finite (the proof is just worse)).]

.question[
What does this mean intuitively?
]

---

### Convergence in probability

A sequence of *random variables* `\(X_1, X_2, X_3, \cdots\)` is said to .vocab[converge in probability] to a *random variable* `\(X\)` if for all `\(\epsilon > 0\)`,

`\begin{align*}
\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0.
\end{align*}`

A .vocab[consistent] estimator is one that converges in probability to the parameter of interest. For instance, by the WLLN, the sample mean is a consistent estimator for the population mean.

---

### `\(O_p\)` and `\(o_p\)`

You might've seen big/little o notation before (i.e., a function `\(f\)` is Big-O of `\(h\)` if `\(f(h)/h\)` is bounded as `\(h \to 0\)`, and little-o of `\(h\)` if `\(\lim_{h \to 0}f(h)/h = 0\)`). Similar notation is used for notions of convergence in probability.

--

Let `\(\{X_n\}\)` be a sequence of random variables and `\(\{a_n\}\)` a sequence of real numbers.
`\(\{X_n/a_n\}\)` is said to be .vocab[bounded in probability] if for every `\(\epsilon > 0\)`, there are constants `\(M_\epsilon\)` and `\(N_\epsilon\)` such that `\(P(|X_n/a_n| > M_\epsilon) < \epsilon\)` for all `\(n > N_\epsilon\)`.

--

In this case, we say that `\(\{X_n\}\)` is Big-`\(O_p\)` of `\(a_n\)`. That is, `\(X_n = O_p(a_n)\)` as `\(n \to \infty\)`. We often take `\(\{a_n\}\)` to simply be the sequence `\(n\)` itself: `\(1, 2, 3, \cdots\)` (what would `\(\{X_n\}\)` being "Big-`\(O_p\)` of `\(n\)`" mean?).

---

### `\(O_p\)` and `\(o_p\)`

Similarly, a sequence of random variables is little-`\(o_p\)` of `\(a_n\)` if `\(\{X_n/a_n\} \to_p 0\)`. That is, `\(X_n = o_p(a_n)\)` as `\(n \to \infty\)`.

.question[
What is the difference between being bounded in probability (being `\(O_p\)`) and converging in probability to zero (being `\(o_p\)`)?
]

---

### A different type of convergence

We just explored a notion of stochastic convergence where random variables are said to "converge" if there is a low probability of them being "very far" from each other. What about some notion of "far-ness" based on whether their *distribution functions* are "close" to each other? Can we derive some notion of convergence in this sense?

--

A sequence of random variables `\(X_1, X_2, X_3, \cdots\)` is said to .vocab[converge in distribution] to a .vocab[limiting distribution] `\(X\)` if

`\begin{align*}
\lim_{n \to \infty}F_{X_n}(x) = F_X(x)
\end{align*}`

at all continuity points of `\(F_X\)`.

---

### A classic example

Consider the sequence of variables `\(X_1, X_2, X_3, \cdots \sim Unif(0, 1)\)` (i.i.d.). Denote by `\(X^{(n)}\)` the maximum of `\(X_1, \cdots, X_n\)`. Then for any `\(\epsilon \in (0, 1)\)`,

`\begin{align*}
P(X^{(n)} \le 1 - \epsilon) &= P(X_1 \le 1 - \epsilon, X_2 \le 1 - \epsilon, \cdots, X_n \le 1 - \epsilon)\\
&= (1 - \epsilon)^n.
\end{align*}`

.question[
Set `\(\epsilon = u/n\)` (since `\(\epsilon > 0\)` and `\(n > 0\)`, `\(u\)` also has to be `\(> 0\)`). What is the limiting distribution of the random variable `\(n(1 - X^{(n)})\)`?
]

---

### Another classic example

For a sequence of i.i.d. random variables with finite mean `\(\mu\)` and variance `\(\sigma^2\)`,

`\begin{align*}
\sqrt{n}(\bar{X}_n - \mu) \to_d N(0, \sigma^2)
\end{align*}`

(this is, of course, the .vocab[Central Limit Theorem]).

---

### Two incredibly useful theorems

Let `\(X_n\)` and `\(Y_n\)` represent sequences of random variables. If `\(X_n \to_d X\)` and `\(Y_n \to_p c\)`, where `\(c\)` is a constant, then .vocab[Slutsky's Theorem] says that:

`\begin{align*}
X_n + Y_n &\to_d X + c\\
X_nY_n &\to_d Xc
\end{align*}`

The .vocab[Continuous Mapping Theorem] says* that for any continuous** function `\(g\)`:

`\begin{align*}
X_n \to_p X &\implies g(X_n) \to_p g(X)\\
X_n \to_d X &\implies g(X_n) \to_d g(X)
\end{align*}`

.small[*also true for almost sure convergence, which we're not going to talk about.]

.small[**more precisely, for any `\(g\)` whose set of discontinuity points is hit by `\(X\)` with probability zero.]
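
---

### Aside: seeing the WLLN and CLT numerically

Here's a minimal simulation sketch of two of the ideas above - the WLLN and the CLT - using i.i.d. Exponential(1) draws. The distribution, seed, and sample sizes are arbitrary choices; any distribution with finite mean and variance would do.

```r
set.seed(490)  # arbitrary seed, just for reproducibility

mu     <- 1    # an Exponential(1) random variable has mean 1...
sigma2 <- 1    # ...and variance 1

# WLLN: the sample mean should settle down near mu as n grows
sapply(c(10, 100, 1000, 100000),
       function(n) mean(rexp(n, rate = 1)))

# CLT: sqrt(n) * (xbar - mu) should look approximately N(0, sigma2)
n <- 1000
z <- replicate(5000, sqrt(n) * (mean(rexp(n, rate = 1)) - mu))
c(mean = mean(z), var = var(z))  # should be close to 0 and sigma2 = 1
```

The first call prints one sample mean per `\(n\)` (drifting toward `\(\mu = 1\)`); the second summarizes 5,000 draws of the CLT-scaled quantity, whose mean and variance should be close to 0 and `\(\sigma^2 = 1\)`.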
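
---

### Aside: seeing Slutsky's theorem numerically

A companion sketch for Slutsky's theorem, in the same arbitrary Exponential(1) setup: the sample standard deviation converges in probability to `\(\sigma\)`, so dividing the CLT-scaled mean by it shouldn't change the `\(N(0, 1)\)` limit.

```r
set.seed(690)  # arbitrary seed

mu <- 1        # Exponential(1) mean
n  <- 1000

# Slutsky (+ CMT): sqrt(n) * (xbar - mu) / s, with s the sample SD,
# has the same N(0, 1) limit as sqrt(n) * (xbar - mu) / sigma
t_like <- replicate(5000, {
  x <- rexp(n, rate = 1)
  sqrt(n) * (mean(x) - mu) / sd(x)
})
c(mean = mean(t_like), var = var(t_like))  # should be close to 0 and 1
```

Here `\(\sigma / s_n \to_p 1\)` (the CMT takes us from `\(s_n^2 \to_p \sigma^2\)` to `\(\sigma / s_n \to_p 1\)`), so multiplying the CLT-scaled mean by it leaves the limiting distribution alone - exactly the product form of Slutsky's theorem.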