More on Estimation, esp. in Exponential Families
------------------------------------------------

If

    f(x | th) = exp[ eta(th) T(x) - A(th) ] * h(x)

for each X_i among n independent, identically distributed RVs X1 ... Xn,
then the JOINT pdf is also of Exponential Family form

    f(x | th) = exp[ eta(th) \sum T(x.j) - n A(th) ] * \prod h(x.j)

SO, it's enough to consider a single (maybe vector-valued) observation "x".

Claim: MLE is consistent in Expo Fams.

Build-up:

(a) MLE in Expo Fam

Assuming the LH achieves its maximum at an interior point of Theta
where the derivative exists and vanishes, the MLE "th.hat" will be the
solution "th" to:

    T(x) = A'(th) / eta'(th)                                   (*)

For samples of size n, this becomes simply

    T.bar(x) = A'(th) / eta'(th)

where T.bar(x) = (1/n) \sum T(x.j) is the sample average of the T(x.j).

In the case of a multi-dimensional parameter th in R^p, by the chain
rule we have

    0 = (d/dth.i) { sum_j eta.j(th) T.j(x) - A(th) }
      = sum_j [ (d eta.j/d th.i) T.j(x) ] - (d/dth.i) A(th)

so "th.hat" is the solution th to the matrix/vector equation

    H T(x) = A'

where H is the matrix with entries H.ij = (d eta.j/d th.i) and where A'
is the gradient vector with entries (d/dth.i) A(th).

--------

(b) Mean of Sufficient Statistic

Since f(x | th) is a density, it integrates to one, and so:

    1 = int_X exp[ eta(th) T(x) - A(th) ] * h(x) dx

Upon taking a derivative w.r.t. th (differentiating under the integral
sign), we have

    0 = int_X { eta'(th) T(x) - A'(th) } * exp[ eta(th) T(x) - A(th) ] * h(x) dx
      = eta'(th) * E[ T(X) | th ] - A'(th),

so for fixed th and X ~ f(x | th) the expectation of T(X) is

    E[ T(X) | th ] = A'(th) / eta'(th)                         (**)

Note the similarity (and differences) between (*) and (**).  Again, in
p>1 dimensions a matrix/vector version holds.

--------

By the Law of Large Numbers, T.bar(x) -> E[ T | th ] as n->oo; if A(th)
and eta(th) have continuous derivatives (always the case for us) and
A'(th)/eta'(th) is one-to-one in th, then the solution th.hat of (*)
must converge to the solution th of (**), i.e., the MLE must converge
to the true value of th.
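
A minimal numerical sketch of the consistency argument, under an
assumption that is not from the notes: the Exponential(rate th) family,
written with eta(th) = -th, T(x) = x, A(th) = -log th, so that
A'(th)/eta'(th) = 1/th.  The helper names (mean_T, solve_mle, simulate)
are hypothetical.  It solves T.bar = A'(th)/eta'(th) by bisection for
growing n; th.hat should settle near the true th.

```python
import random


def mean_T(theta):
    """E[T(X) | theta] = A'(theta) / eta'(theta) for the assumed example.

    Exponential(rate=theta): eta(theta) = -theta, T(x) = x,
    A(theta) = -log(theta), so A'/eta' = (-1/theta) / (-1) = 1/theta.
    """
    return 1.0 / theta


def solve_mle(t_bar, lo=1e-6, hi=1e6, iters=200):
    """Solve mean_T(theta) = t_bar by bisection.

    Relies on mean_T being decreasing in theta on [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_T(mid) > t_bar:
            lo = mid  # model mean too large, so theta is too small
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(theta_true=2.0, sizes=(10, 1000, 100000), seed=0):
    """Return {n: th.hat} for samples of increasing size n."""
    rng = random.Random(seed)
    out = {}
    for n in sizes:
        xs = [rng.expovariate(theta_true) for _ in range(n)]
        t_bar = sum(xs) / n          # sample average of T(x.j) = x.j
        out[n] = solve_mle(t_bar)    # solution of (*) for this sample
    return out


if __name__ == "__main__":
    for n, th_hat in simulate().items():
        print(n, th_hat)
```

In this example (*) has the closed form th.hat = 1 / x.bar; the
bisection is used only to mirror the general recipe "solve
T.bar = A'(th)/eta'(th) for th".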

WHAT IT MEANS:

If we observe repeated independent observations X.j ~ f(x | th), all
from the same distribution (i.e. for some fixed value of th), then in
the limit as n->oo we will learn th perfectly from the data:

    th = lim { th.hat.n }
        n->oo

In fact more is true: not only does the estimation error [ th.hat - th ]
go to zero as n->oo, it becomes approximately normally-distributed,
with mean zero and variance 1 / [ n * A''(th) ] in the natural
parametrization eta(th) = th, so

    sqrt{ n A''(th) } * [ th.hat - th ]

has a standard normal limiting distribution as n grows.  The term
"A''(th)" can be replaced with its estimate "A''(th.hat)" if we like,
leading to asymptotic interval estimates for th, e.g.,

    0.95 ~= Pr[ th.hat - 1.96/sqrt{n A''(th.hat)} < th < th.hat + ... ]

for *any* exponential family in its natural parametrization!  (For a
general eta(th), A''(th) is replaced by the Fisher information
I(th) = A''(th) - eta''(th) A'(th)/eta'(th).)  Compare this to the
formulas we get for Binomial data (both natural and conventional
parametrizations), and to the exact result for normal distributions
with known variance.

-----------------------------------------------------------------

EXAMPLES: Work out details for n Bernoulli variables, in both
conventional and natural parametrizations.

=================================================================

(c) Conjugate Prior Distributions

If

    f(x | th) = exp[ eta(th) T(x) - A(th) ] * h(x)

and the prior density for th happens to be of the form

    pi( th ) = c * exp[ alp eta(th) - bet A(th) ]              (#)

then the posterior density must be

    pi( th | X.n ) = c* * exp[ {alp+sum T(x.j)} eta(th) - {bet+n} A(th) ]

so it's again of form (#) but with new parameters,

    alp* = alp + sum T(x.j)
    bet* = bet + n

One can interpret "bet" as a "prior sample size" and "alp/bet" as the
"prior average T(x.j)".

Of course, we need for

    1 / c*(alp,bet,x) = int { exp[ {alp+sum T(x.j)} eta(th) - {bet+n} A(th) ] } dth

to be positive and finite; otherwise the posterior is "improper" and
not suitable for inference.  It's okay if the PRIOR is improper but not
the POSTERIOR.
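
The update alp* = alp + sum T(x.j), bet* = bet + n can be sketched in a
few lines.  As an illustration (an assumption, not from the notes), take
the Exponential(rate th) family with eta(th) = -th, T(x) = x,
A(th) = -log th: then form (#) is th^bet * exp(-alp th), i.e. a Gamma
density with shape bet+1 and rate alp, whose mean is (bet+1)/alp.  The
function names and the data are hypothetical.

```python
def conjugate_update(alp, bet, ts):
    """Return (alp*, bet*) = (alp + sum T(x.j), bet + n).

    `ts` holds the observed values T(x.j), one per observation.
    """
    return alp + sum(ts), bet + len(ts)


def gamma_posterior_mean(alp, bet):
    """Mean of the density proportional to th^bet * exp(-alp * th).

    Assumed example only: this is Gamma(shape=bet+1, rate=alp), which is
    proper (finite normalizing integral) iff alp > 0 and bet > -1.
    """
    if alp <= 0 or bet <= -1:
        raise ValueError("improper density: need alp > 0 and bet > -1")
    return (bet + 1) / alp


if __name__ == "__main__":
    xs = [0.5, 1.5]  # made-up observations; here T(x) = x
    alp_star, bet_star = conjugate_update(1.0, 2.0, xs)
    print(alp_star, bet_star, gamma_posterior_mean(alp_star, bet_star))
```

The ValueError branch is the finiteness check on the normalizing
integral, specialized to this one family.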

In the limit as alp->0 and bet->0 this commonly approaches the
"Jeffreys Prior", with density proportional to sqrt{ A''(th) }:

    pi( th ) ~ sqrt { A''(th) }

which has the nice property that it's invariant under changes of
variables [explain what that means].

---------------------------------------------------------------------
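
One way to check the invariance numerically, as a sketch under
assumptions not in the notes: take a single Bernoulli observation, with
natural parameter eta = log(p/(1-p)), A(eta) = log(1 + e^eta) and
A''(eta) = p(1-p).  Changing variables from eta to p in the density
sqrt{ A''(eta) } should give the same function as computing
sqrt{ Fisher information } directly in p, namely 1/sqrt{ p(1-p) }.  The
function names are hypothetical.

```python
import math


def jeffreys_eta(eta):
    """Unnormalized Jeffreys density in eta: sqrt(A''(eta)) = sqrt(p(1-p))."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return math.sqrt(p * (1.0 - p))


def jeffreys_p(p):
    """Unnormalized Jeffreys density computed directly in p.

    Uses the Bernoulli Fisher information I(p) = 1 / (p (1 - p)).
    """
    return math.sqrt(1.0 / (p * (1.0 - p)))


def transformed_from_eta(p):
    """Density in p obtained from jeffreys_eta by change of variables.

    Multiplies by the Jacobian |d eta / d p| = 1 / (p (1 - p)).
    """
    eta = math.log(p / (1.0 - p))
    return jeffreys_eta(eta) / (p * (1.0 - p))


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.7):
        print(p, transformed_from_eta(p), jeffreys_p(p))
```

The two printed columns should agree up to rounding: building the prior
in eta and then transforming to p gives the same density as building it
in p from the start.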