Chapter 6: Large Sample Results for Likelihood-Based Methods
6.2
Consider the density \[f(y;\sigma) = \frac{2y}{\sigma^2}\exp\left(-\frac {y^2}{\sigma^2}\right) I(y>0,\sigma>0)\] Verify that the regularity conditions 1. to 5. of Theorem 6.6 (p. 284) hold for the asymptotic normality of the maximum likelihood estimator of \(\sigma\).
Solution:
Condition 1: Identifiability \(\theta_1 \neq \theta_2 \implies \exists y ~s.t.~ F(y;\theta_1) \neq F(y;\theta_2)\).
\[\begin{aligned} F(y;\sigma) &= \int_{-\infty}^y\frac{2t}{\sigma^2}\exp\left(-\frac {t^2}{\sigma^2}\right) I(t>0)\,dt & \substack{let~ u = \frac {t^2}{\sigma^2} \\ du = \frac {2t}{\sigma^2}dt}\\ &= \int_{0}^{y^2/\sigma^2} e^{-u} \,du \\ &= -e^{-y^2/\sigma^2}+e^0 \\ &= 1-e^{-y^2/\sigma^2} \qquad \text{for } y>0, \text{ and } F(y;\sigma)=0 \text{ for } y \le 0 \\ \end{aligned}\]
Thus, if \(\sigma_1 \neq \sigma_2\), then
\[\begin{aligned} F(y; \sigma_1) &= 1-e^{-y^2/\sigma_1^2}, \quad y>0 \\ F(y; \sigma_2) &= 1-e^{-y^2/\sigma_2^2}, \quad y>0 \\ F(y; \sigma_1) -F(y; \sigma_2)&= e^{-y^2/\sigma_2^2}-e^{-y^2/\sigma_1^2} \\ &\neq 0 \quad \text{for every } y>0, \text{ since } \sigma_1 \neq \sigma_2 \text{ with } \sigma_1,\sigma_2>0 \text{ gives } y^2/\sigma_1^2 \neq y^2/\sigma_2^2 \\ \implies F(y; \sigma_1) &\neq F(y; \sigma_2) \end{aligned} \]
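As a quick sanity check on the closed form above (not part of the required verification), the following sketch, assuming numpy and scipy are available, compares \(F(y;\sigma) = 1 - e^{-y^2/\sigma^2}\) with numerical integration of the density and confirms that two distinct values of \(\sigma\) give different CDF values at the same \(y\):

```python
# Compare the closed-form CDF against numerical integration of the density,
# then confirm identifiability at a single point y.
import numpy as np
from scipy.integrate import quad

def density(y, sigma):
    return 2 * y / sigma**2 * np.exp(-y**2 / sigma**2) if y > 0 else 0.0

def cdf_closed_form(y, sigma):
    return 1 - np.exp(-y**2 / sigma**2) if y > 0 else 0.0

y = 1.3
for sigma in (0.5, 2.0):
    numeric, _ = quad(density, 0, y, args=(sigma,))
    print(sigma, numeric, cdf_closed_form(y, sigma))  # the two columns agree

# Distinct sigmas give distinct CDF values at the same y (identifiability).
assert cdf_closed_form(y, 0.5) != cdf_closed_form(y, 2.0)
```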
Condition 2: the distributions \(F(y;\theta)\), \(\theta \in \Theta\), have common support that does not depend on \(\theta\).
This holds: for every \(\sigma > 0\) the support of \(f(y;\sigma)\) is \(\{y : y > 0\} = \mathbb R^+\), which does not depend on \(\sigma\).
Condition 3: \(\forall \theta \in \Theta\), the first three partial derivatives of \(\log f(y;\theta)\) with respect to \(\theta\) exist for all \(y\) in the support of \(F(y;\theta)\).
\[\begin{aligned} f(y;\sigma) &= \frac{2y}{\sigma^2}\exp\left(-\frac {y^2}{\sigma^2}\right) \\ \log f(y;\sigma) &= \log (2y)- 2\log(\sigma) -y^2\sigma^{-2} \\ \frac{\partial \log f(y;\sigma)}{\partial \sigma} &= - 2 \sigma^{-1} +2y^2\sigma^{-3} \\ \frac{\partial^2 \log f(y;\sigma)}{\partial \sigma^2} &= 2 \sigma^{-2} -6y^2\sigma^{-4} \\ \frac{\partial^3 \log f(y;\sigma)}{\partial \sigma^3} &= -4 \sigma^{-3} +24y^2\sigma^{-5} \\ \end{aligned} \]
The first three partial derivatives exist for all \(\sigma > 0\) and all \(y > 0\), i.e., for every \(y\) in the support of \(F(y;\sigma)\).
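The derivatives above can be double-checked symbolically; a minimal sketch, assuming sympy is available:

```python
# Symbolic check that the three partial derivatives of log f(y; sigma)
# match the expressions given above.
import sympy as sp

y, sigma = sp.symbols('y sigma', positive=True)
logf = sp.log(2 * y) - 2 * sp.log(sigma) - y**2 / sigma**2

d1 = sp.diff(logf, sigma)
d2 = sp.diff(logf, sigma, 2)
d3 = sp.diff(logf, sigma, 3)

assert sp.simplify(d1 - (-2 / sigma + 2 * y**2 / sigma**3)) == 0
assert sp.simplify(d2 - (2 / sigma**2 - 6 * y**2 / sigma**4)) == 0
assert sp.simplify(d3 - (-4 / sigma**3 + 24 * y**2 / sigma**5)) == 0
```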
Condition 4: \(\forall \theta_0 \in \Theta\), there exists a function \(g(y)\) (possibly depending on \(\theta_0\)) such that for all \(\theta\) in a neighborhood of \(\theta_0\), \(|\partial^3 \log f(y;\theta)/\partial \theta^3| \leq g(y)\) for all \(y\), where \(\int g(y)\,dF(y;\theta_0) < \infty\).
Fix \(\sigma_0 > 0\) and pick \(\delta \in (0,\sigma_0)\); take the neighborhood to be \((\sigma_0-\delta, \sigma_0+\delta)\) and let \(g(y)= 4 (\sigma_0-\delta)^{-3} +24y^2(\sigma_0-\delta)^{-5}\).
Then, for every \(\sigma \in (\sigma_0-\delta, \sigma_0+\delta)\) and every \(y>0\),
\[\begin{aligned} \left|\frac{\partial^3 \log f(y;\sigma)}{\partial \sigma^3}\right| = |-4 \sigma^{-3} +24y^2\sigma^{-5}| &\leq 4 \sigma^{-3} +24y^2\sigma^{-5} & \text{by Triangle Ineq.} \\ &\leq 4 (\sigma_0-\delta)^{-3} +24y^2(\sigma_0-\delta)^{-5} & \text{since } \sigma > \sigma_0-\delta \\ &= g(y), \end{aligned}\]
and, since \(Y^2 \sim\) Exponential with mean \(\sigma_0^2\) under \(F(y;\sigma_0)\) (see Condition 5 below), \(E(Y^2)=\sigma_0^2\) and
\[\begin{aligned} \int g(y)\,dF(y;\sigma_0) &= \int [4 (\sigma_0-\delta)^{-3} +24y^2(\sigma_0-\delta)^{-5}]f(y;\sigma_0)\,dy \\ &= 4 (\sigma_0-\delta)^{-3} + 24(\sigma_0-\delta)^{-5}E(Y^2)\\ &= 4 (\sigma_0-\delta)^{-3} + 24(\sigma_0-\delta)^{-5}\sigma_0^2 \\ &< \infty \end{aligned}\]
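A quick numerical cross-check of this bound (assuming scipy is available; the values of \(\sigma_0\) and \(\delta\) below are arbitrary illustrative choices):

```python
# Integrate the dominating function g(y) against f(y; sigma_0) and compare
# with the closed form 4(sigma_0 - delta)^{-3} + 24 sigma_0^2 (sigma_0 - delta)^{-5}.
import numpy as np
from scipy.integrate import quad

sigma0, delta = 1.7, 0.3
lower = sigma0 - delta  # sigma is bounded below by sigma0 - delta on the neighborhood

def integrand(y):
    g = 4 / lower**3 + 24 * y**2 / lower**5            # dominating function g(y)
    f = 2 * y / sigma0**2 * np.exp(-y**2 / sigma0**2)  # density at sigma_0
    return g * f

numeric, _ = quad(integrand, 0, np.inf)
closed_form = 4 / lower**3 + 24 * sigma0**2 / lower**5  # uses E(Y^2) = sigma_0^2
print(numeric, closed_form)  # should agree to quadrature accuracy
```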
Condition 5: \(\forall \theta \in \Theta\), \(E[\partial \log f(Y_1;\theta)/\partial \theta] = 0\) and \(I(\theta)=E([\partial \log f(Y_1;\theta)/\partial \theta]^2) = E[-\partial^2 \log f(Y_1;\theta)/\partial \theta^2]\).
\[\begin{aligned} E[\partial \log f(Y;\sigma)/\partial \sigma] &= E(- 2 \sigma^{-1} +2Y^2\sigma^{-3})\\ &= - 2 \sigma^{-1} +2E(Y^2)\sigma^{-3} &\substack{X= Y^2 \sim Exponential \text{ with mean } \sigma^2\\ \implies E(Y^2)= E(X) = \sigma^2}\\ &= - 2 \sigma^{-1} +2\sigma^{-1}\\ &= 0 \end{aligned}\]
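A small quadrature check of this mean-zero property (assuming scipy; the value of \(\sigma\) is arbitrary):

```python
# Integrate the score against the density; the result should be near zero.
import numpy as np
from scipy.integrate import quad

sigma = 0.8
score = lambda y: -2 / sigma + 2 * y**2 / sigma**3
density = lambda y: 2 * y / sigma**2 * np.exp(-y**2 / sigma**2)

mean_score, _ = quad(lambda y: score(y) * density(y), 0, np.inf)
print(mean_score)  # approximately 0
```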
Next, the Fisher information:
\[\begin{aligned} I(\sigma) &= E([\partial \log f(Y;\sigma)/\partial \sigma]^2) \\ &= E([- 2 \sigma^{-1} +2Y^2\sigma^{-3}]^2)\\ &= E(4\sigma^{-2} -8Y^2\sigma^{-4} +4Y^4\sigma^{-6}) \\ &= 4\sigma^{-2} -8\sigma^{2}\sigma^{-4} +4E(Y^4)\sigma^{-6} & \substack{X= Y^2 \sim Exponential \text{ with mean } \sigma^2\\ \implies E(Y^4)= E(X^2) = Var(X) + [E(X)]^2 = \sigma^4+\sigma^4=2\sigma^4}\\ &= -4\sigma^{-2} +4(2\sigma^{4})\sigma^{-6} \\ &= -4\sigma^{-2} +8\sigma^{-2}\\ &= 4\sigma^{-2}. \end{aligned}\]
For the second equality in Condition 5,
\[\begin{aligned} E[-\partial^2 \log f(Y;\sigma)/\partial \sigma^2] &= E[-(2\sigma^{-2}-6Y^2\sigma^{-4})]\\ &= -2\sigma^{-2}+6E(Y^2)\sigma^{-4}\\ &= -2\sigma^{-2}+6\sigma^{-2}\\ &= 4\sigma^{-2} = I(\sigma). \end{aligned}\]
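The two expressions for the information can also be checked by simulation; a minimal Monte Carlo sketch, assuming numpy is available:

```python
# Simulate Y (via Y^2 ~ Exponential with mean sigma^2) and verify that both
# E[score^2] and E[-d^2 log f / d sigma^2] are close to 4 / sigma^2.
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 1.5, 1_000_000
y = sigma * np.sqrt(rng.exponential(1.0, size=n))     # Y^2 ~ Exponential(mean sigma^2)

score = -2 / sigma + 2 * y**2 / sigma**3              # d log f / d sigma
neg_hess = -(2 / sigma**2 - 6 * y**2 / sigma**4)      # -d^2 log f / d sigma^2

print(np.mean(score**2))   # approx 4 / sigma^2
print(np.mean(neg_hess))   # approx 4 / sigma^2
print(4 / sigma**2)        # exact value from the derivation above
```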
6.3
In Condition 5 of Theorem 6.6 (p. 284) we have the assumption that \(E[\partial \log f(Y_1;\theta)/\partial \theta] = 0\). For continuous distributions this mean-zero assumption follows if \[\int\left[\frac \partial {\partial \theta} f(y;\theta)\right] dy=\frac \partial {\partial \theta}\int f(y;\theta)\,dy \] because this latter integral is one by the definition of a density function. The typical proof that this interchange of differentiation and integration is allowed assumes that for each \(\theta_0 \in \Theta\), there is a bounding function \(g_1(y)\) (possibly depending on \(\theta_0\)) and a neighborhood of \(\theta_0\) such that for all \(y\) and for all \(\theta\) in the neighborhood, \(|\partial f(y;\theta)/\partial \theta| \leq g_1(y)\) and \(\int g_1(y)dy < \infty\). Use the dominated convergence theorem to show that this condition allows the above interchange.
Solution:
I begin by assuming what was stated in “the typical proof”: that for each \(\theta_0 \in \Theta\), there is a bounding function \(g_1(y)\) (possibly depending on \(\theta_0\)) and a neighborhood of \(\theta_0\) such that for all \(y\) and for all \(\theta\) in the neighborhood, \(|\partial f(y;\theta)/\partial \theta| \leq g_1(y)\), with \(\int g_1(y)dy < \infty\).
Let \(h_n\) be any sequence converging to 0, with \(n\) large enough that \(\theta + h_n\) stays in the neighborhood on which the bound holds. By the mean value theorem, for each \(y\) and \(n\) there is a \(\theta_*\) between \(\theta\) and \(\theta + h_n\) such that \[\left|\frac{f(y;\theta+h_n) - f(y;\theta)}{h_n}\right| = \left|\frac{\partial f(y;\theta)}{\partial \theta}\bigg|_{\theta = \theta_*}\right| \leq g_1(y) \qquad \text{since } |\partial f(y;\theta)/\partial \theta| \leq g_1(y) \text{ on the neighborhood.}\]
Thus the difference quotients converge pointwise to \(\partial f(y;\theta)/\partial \theta\) as \(n \to \infty\) and are dominated by the integrable function \(g_1(y)\), so the dominated convergence theorem applies:
\[\begin{aligned} \lim_{n \to \infty} \frac{\int f(y;\theta+h_n)\,dy - \int f(y;\theta)\,dy}{h_n} &= \lim_{n \to \infty} \int \frac{f(y;\theta+h_n) - f(y;\theta)}{h_n}\, dy \\ &=\int\lim_{n \to \infty} \frac{f(y;\theta+h_n) - f(y;\theta)}{h_n}\, dy &\text{dominated convergence theorem} \\ &=\int\left[\frac \partial {\partial \theta} f(y;\theta)\right] dy \end{aligned}\]
Since the sequence \(h_n \to 0\) was arbitrary, the difference quotient of \(\int f(y;\theta)\,dy\) converges to the same limit along every such sequence, and therefore \[\frac \partial {\partial \theta}\int f(y;\theta)\,dy = \int\left[\frac \partial {\partial \theta} f(y;\theta)\right] dy.\]
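For the density of Exercise 6.2, both sides of the interchange are zero because \(\int f(y;\sigma)\,dy \equiv 1\); a small numerical illustration of this (assuming numpy and scipy are available):

```python
# Differentiate the (constant) total mass numerically and compare with the
# integral of the derivative of the density from Exercise 6.2.
import numpy as np
from scipy.integrate import quad

sigma, h = 1.3, 1e-5

def total_mass(s):
    val, _ = quad(lambda y: 2 * y / s**2 * np.exp(-y**2 / s**2), 0, np.inf)
    return val

# d/d sigma of the total mass, by a central difference
lhs = (total_mass(sigma + h) - total_mass(sigma - h)) / (2 * h)

# integral of df/d sigma over the support
df_dsigma = lambda y: (-4 * y / sigma**3 + 4 * y**3 / sigma**5) * np.exp(-y**2 / sigma**2)
rhs, _ = quad(df_dsigma, 0, np.inf)

print(lhs, rhs)  # both approximately 0
```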
6.6
The proof of the asymptotic normality of the maximum likelihood estimator does not use an approximation by averages. Show, however, that one can extend the proof to obtain an approximation by averages result for the maximum likelihood estimator. Hint: add and subtract the numerator of (6.10, p. 285) divided by the probability limit of the denominator of (6.10, p. 285).
Solution:
From (6.10, p. 285), where \(\hat \theta\) is the MLE and \(\theta\) denotes the true value (written \(\theta_0\) in the text),
\[\begin{aligned} \sqrt n (\hat \theta -\theta) &= \frac{-S(\theta)/\sqrt n}{\frac 1 n S'(\theta) + \frac 1 {2n} S''(\hat \theta^*)( \hat \theta - \theta)} \\ &= \frac{-S(\theta)/\sqrt n}{\frac 1 n S'(\theta) + \frac 1 {2n} S''(\hat \theta^*)( \hat \theta - \theta)} +\frac{S(\theta)/\sqrt n-S(\theta)/\sqrt n}{-I(\theta)} \\ \implies \hat \theta - \theta&= \frac{-S(\theta)/ n}{\frac 1 n S'(\theta) + \frac 1 {2n} S''(\hat \theta^*)( \hat \theta - \theta)} +\frac{S(\theta)/n-S(\theta)/ n}{-I(\theta)} \\ &= \frac{-S(\theta)/ n}{-I(\theta)} + R_n ~~~~ where~ R_n =\frac{-S(\theta)/ n}{\frac 1 n S'(\theta) + \frac 1 {2n} S''(\hat \theta^*)( \hat \theta - \theta)}+\frac{S(\theta)/n}{-I(\theta)}\\ &= \frac 1 n \sum_{i=1}^n \frac{-\partial \log f(Y_i; \theta)/\partial \theta}{-I(\theta)} + R_n \\ &= \frac 1 n \sum_{i=1}^n \frac{\partial \log f(Y_i; \theta)/\partial \theta}{I(\theta)} + R_n \end{aligned}\]
Note that \(\sqrt n R_n \overset p \to 0\): writing \(D_n = \frac 1 n S'(\theta) + \frac 1 {2n} S''(\hat \theta^*)( \hat \theta - \theta)\) for the denominator, we have \(\sqrt n R_n = -\frac{S(\theta)}{\sqrt n}\left[D_n^{-1} + I(\theta)^{-1}\right]\). Here \(S(\theta)/\sqrt n\) is bounded in probability (it converges in distribution), and \(D_n \overset p \to -I(\theta)\) (see the textbook's proof of Theorem 6.6), so the bracketed term converges in probability to 0.
Also note \(E\left[\frac{\partial \log f(Y_i; \theta)/\partial \theta}{I(\theta)}\right] = 0\), since the expected value of the score is zero (Condition 5 of Theorem 6.6).
Lastly, \[\begin{aligned} Var\left[\frac{\partial \log f(Y_i; \theta)/\partial \theta}{I(\theta)}\right] &=\frac{E[(\partial \log f(Y_i; \theta)/\partial \theta)^2]}{I^2(\theta)}\\ &= \frac{I(\theta)}{I^2(\theta)} \\ &= I^{-1}(\theta) \end{aligned} \]
Thus, the approximation by averages of \(\hat \theta_{MLE}\) is: \[\hat \theta -\theta =\frac 1 n \sum_{i=1}^n \frac{\partial \log f(Y_i; \theta)/\partial \theta}{I(\theta)} + R_n\] And \(\sqrt n (\hat \theta -\theta) \overset d \to N(0, I^{-1}(\theta))\) by Theorem 5.23 (p.242).
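For the density of Exercise 6.2 the MLE is available in closed form (\(\hat\sigma = \sqrt{\overline{Y^2}}\), obtained by setting the score to zero), so the approximation-by-averages result can be illustrated by simulation. A sketch, assuming numpy is available; the sample size and replication count below are arbitrary:

```python
# Compare the MLE error sigma_hat - sigma with the average-of-scores
# approximation (1/n) sum score_i / I(sigma); both should have variance
# near 1/(n I(sigma)) and be nearly perfectly correlated.
import numpy as np

rng = np.random.default_rng(1)
sigma, n, reps = 2.0, 500, 20_000
info = 4 / sigma**2

mle_err, avg_approx = [], []
for _ in range(reps):
    y = sigma * np.sqrt(rng.exponential(1.0, size=n))   # Y^2 ~ Exponential(mean sigma^2)
    sigma_hat = np.sqrt(np.mean(y**2))                   # maximum likelihood estimator
    score = -2 / sigma + 2 * y**2 / sigma**3             # score at the true sigma
    mle_err.append(sigma_hat - sigma)
    avg_approx.append(np.mean(score) / info)             # (1/n) sum score_i / I(sigma)

mle_err, avg_approx = np.array(mle_err), np.array(avg_approx)
print(n * mle_err.var())        # approx 1/I(sigma) = sigma^2/4 = 1.0
print(n * avg_approx.var())     # approx 1.0 as well
print(np.corrcoef(mle_err, avg_approx)[0, 1])  # close to 1, so the remainder is negligible
```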