Chapter 7: M-Estimation (Estimating Equations)

Author

Michael Throolin

7.3

Suppose that \(Y_1, \dots , Y_n\) are iid from a gamma\((\alpha, \beta)\) distribution.

a.

One version of the method of moments is to set \(\bar Y\) equal to \(E(Y_1) = \alpha \beta\) and \(n^{-1} \sum_{i=1}^n Y_i^2\) equal to \(E(Y_1^2) = \alpha \beta^2 + (\alpha \beta)^2\) and solve for the estimators. Use Maple (at least it’s much easier if you do) to find \(V= A^{-1} B \{A^{-1}\}^T\). Here it helps to know that \(E(Y_1^3) = \alpha(1+\alpha )(2+\alpha ) \beta^3\) and \(E(Y_1^4) = \alpha(1+\alpha)(2+\alpha)(3+\alpha)\beta^4\). Show your derivation of A and B and attach Maple output.

Solution:

\[\begin{aligned} \theta &= (\alpha, \beta)^T ~~~~~ \psi(Y_i, \theta) &= \begin{pmatrix} Y_i - \alpha \beta \\ Y_i^2 - \alpha \beta^2 - (\alpha \beta)^2 \end{pmatrix} ~~~~~~~~~ \frac{\partial \psi(Y_i, \theta)}{\partial \theta} &= \begin{bmatrix} - \beta & - \alpha \\ - \beta^2 - 2\alpha \beta^2 & - 2\alpha \beta - 2\alpha^2 \beta \end{bmatrix} \end{aligned} \] \[ \begin{aligned} A &= E[- \psi'(Y_i, \theta)] = \begin{bmatrix} \beta & \alpha \\ \beta^2 + 2\alpha \beta^2 & 2\alpha \beta + 2\alpha^2 \beta \end{bmatrix} \\ B &= E[\psi (Y_i, \theta)\psi (Y_i, \theta)^T] = E\left[\begin{pmatrix} Y_i - \alpha \beta \\ Y_i^2 - \alpha \beta^2 - (\alpha \beta)^2 \end{pmatrix} \begin{pmatrix} Y_i - \alpha \beta & Y_i^2 - \alpha \beta^2 - (\alpha \beta)^2 \end{pmatrix} \right]\\ &= E\begin{bmatrix} (Y_i -\alpha \beta)^2 & (Y_i -\alpha \beta)(Y_i^2 -\alpha \beta^2 -\alpha^2\beta^2) \\ (Y_i -\alpha \beta)(Y_i^2 -\alpha \beta^2 -\alpha^2\beta^2) & (Y_i^2 -\alpha \beta^2 -\alpha^2\beta^2)^2\end{bmatrix} \\ &= E\begin{bmatrix} Y_i^2 - 2 \alpha \beta Y_i +\alpha^2 \beta^2 & Y_i^3-\alpha \beta Y_i^2 - \alpha^2 \beta^2 Y_i -\alpha \beta^2 Y_i + \alpha^3 \beta^3 + \alpha^2\beta^3\\ Y_i^3-\alpha \beta Y_i^2 - \alpha^2 \beta^2 Y_i -\alpha \beta^2 Y_i + \alpha^3 \beta^3 + \alpha^2\beta^3 & Y_i^4 -2 \alpha^2 \beta^2 Y_i^2 - 2 \alpha \beta^2 Y_i^2 + \alpha^4 \beta^4 +\alpha^2 \beta^4 + 2 \alpha^3 \beta^4\end{bmatrix} \\ &= \begin{bmatrix} (\alpha \beta^2 + \alpha^2 \beta^2) - 2 \alpha \beta \alpha \beta +\alpha^2 \beta^2 & \alpha(1+\alpha)(2+\alpha)\beta^3-\alpha \beta (\alpha \beta^2 + \alpha^2 \beta^2) - \alpha^2 \beta^2 \alpha \beta -\alpha \beta^2 \alpha \beta + \alpha^3 \beta^3 + \alpha^2\beta^3\\ \text{(symmetric)} & E(Y_i^4) -2 \alpha^2 \beta^2 (\alpha \beta^2 + \alpha^2 \beta^2) - 2 \alpha \beta^2 (\alpha \beta^2 + \alpha^2 \beta^2) + \alpha^4 \beta^4 +\alpha^2 \beta^4 + 2 \alpha^3 \beta^4\end{bmatrix} \\ &= \begin{bmatrix} \alpha \beta^2 & \alpha(1+\alpha)(2+\alpha)\beta^3-\alpha^2 \beta^3 - \alpha^3 \beta^3 \\ \alpha(1+\alpha)(2+\alpha)\beta^3-\alpha^2 \beta^3 - \alpha^3 \beta^3 & \alpha(1+\alpha)(2+\alpha)(3+\alpha)\beta^4 -2 \alpha^3 \beta^4 - \alpha^4 \beta^4 - \alpha^2 \beta^4 \end{bmatrix} \\ &= \begin{bmatrix} \alpha \beta^2 & (\alpha^3+3\alpha^2+2\alpha)\beta^3-\alpha^2 \beta^3 - \alpha^3 \beta^3 \\ (\alpha^3+3\alpha^2+2\alpha)\beta^3-\alpha^2 \beta^3 - \alpha^3 \beta^3 & (\alpha^4 + 6 \alpha^3+ 11 \alpha^2 + 6\alpha)\beta^4 -2 \alpha^3 \beta^4 - \alpha^4 \beta^4 - \alpha^2 \beta^4 \end{bmatrix} \\ &= \begin{bmatrix} \alpha \beta^2 & 2(\alpha^2+\alpha)\beta^3 \\ 2(\alpha^2+\alpha)\beta^3 & (4 \alpha^3+ 10 \alpha^2 + 6\alpha)\beta^4 \end{bmatrix} \\ V &= A^{-1} B \{ A^{-1} \}^T \qquad \text{(using Microsoft Mathematics)} \\ &= \begin{bmatrix} 2(\alpha^2 +\alpha) & -2(1+ \alpha) \beta \\ -2(1+ \alpha) \beta & \left(2 + \frac 3 \alpha \right)\beta^2 \end{bmatrix} \end{aligned}\]
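For readers without Maple or Microsoft Mathematics, the following SymPy sketch (an addition to the original write-up, not part of the assignment) reproduces the matrix algebra; the A and B entered are exactly the matrices derived above.

```python
# Added sketch: reproduce V = A^{-1} B {A^{-1}}^T with SymPy, using the A and B
# derived above (in place of the Maple / Microsoft Mathematics step).
import sympy as sp

a, b = sp.symbols('alpha beta', positive=True)

A = sp.Matrix([[b,                  a],
               [b**2 + 2*a*b**2,    2*a*b + 2*a**2*b]])
B = sp.Matrix([[a*b**2,             2*(a**2 + a)*b**3],
               [2*(a**2 + a)*b**3,  (4*a**3 + 10*a**2 + 6*a)*b**4]])

V = sp.simplify(A.inv() * B * A.inv().T)
sp.pprint(V)
# V simplifies to [[2(alpha^2 + alpha),   -2(1 + alpha)*beta],
#                  [-2(1 + alpha)*beta,   (2 + 3/alpha)*beta^2]]
```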

b.

The second version of the method of moments (and perhaps the easier method) is to set \(\bar Y\) equal to \(\alpha \beta\) and \(s^2\) equal to \(var(Y_1) = \alpha \beta^2\) and solve for the estimators. You could use either the “\(n-1\)” or “\(n\)” version of \(s^2\), but here we want to use the “\(n\)” version in order to fit into the M-estimator theory. Compute \(V\) as in a) except that the second component of the \(\psi\) function is different from a) (but \(V\) should be the same). Here it helps to know that \(\mu_3= 2\alpha\beta^3\) and \(\mu_4 = 3[\alpha^2+2\alpha]\beta^4\).

Solution:

\[\begin{aligned} \psi(Y_i, \theta) &= \begin{pmatrix} Y_i - \alpha \beta \\ (Y_i - \alpha \beta)^2 - \alpha \beta^2\end{pmatrix} ~~~~~\frac{\partial \psi(Y_i, \theta)}{\partial \theta} = \begin{bmatrix} - \beta & - \alpha\\ -2 \beta (Y_i - \alpha \beta) - \beta^2 & -2\alpha(Y_i - \alpha \beta) -2\alpha \beta\end{bmatrix} \end{aligned}\]

\[ \begin{aligned} A &= E[- \psi'(Y_i, \theta)] = E \begin{bmatrix} \beta & \alpha\\ 2 \beta (Y_i - \alpha \beta) + \beta^2 & 2\alpha(Y_i - \alpha \beta) +2\alpha \beta\end{bmatrix} \\ &=\begin{bmatrix} \beta & \alpha\\ \beta^2 & 2\alpha \beta\end{bmatrix} \\ B &= E[\psi (Y_i, \theta)\psi (Y_i, \theta)^T] = E\left[\begin{pmatrix} Y_i - \alpha \beta \\ (Y_i - \alpha \beta)^2 - \alpha \beta^2 \end{pmatrix} \begin{pmatrix} Y_i - \alpha \beta & (Y_i - \alpha \beta)^2 - \alpha \beta^2 \end{pmatrix} \right]\\ &= E \begin{bmatrix} (Y_i - \alpha \beta)^2 & (Y_i - \alpha \beta)[(Y_i-\alpha \beta)^2 - \alpha \beta^2] \\ (Y_i - \alpha \beta)[(Y_i-\alpha \beta)^2 - \alpha \beta^2] & [(Y_i-\alpha \beta)^2 - \alpha \beta^2]^2\end{bmatrix}\\ &= \begin{bmatrix} Var(Y_i) & \mu_3 \\ \mu_3 & \mu_4 -2 \alpha \beta^2 Var(Y_i) + \alpha^2\beta^4\end{bmatrix} = \begin{bmatrix} \alpha \beta^2 & 2 \alpha \beta^3 \\ 2 \alpha \beta^3 & 3(\alpha^2+2\alpha)\beta^4-2 \alpha \beta^2(\alpha \beta^2) + \alpha^2\beta^4\end{bmatrix}\\ &= \begin{bmatrix} \alpha \beta^2 & 2 \alpha \beta^3 \\ 2 \alpha \beta^3 & (2\alpha^2+6\alpha)\beta^4 \end{bmatrix}\\ V &= A^{-1} B \{ A^{-1} \}^T \qquad \text{(using Microsoft Mathematics)} \\ &= \begin{bmatrix} 2(\alpha^2 +\alpha) & -2(1+ \alpha) \beta \\ -2(1+ \alpha) \beta & \left(2 + \frac 3 \alpha \right)\beta^2 \end{bmatrix} \end{aligned}\]
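The same kind of SymPy check (again an added sketch, not the original Microsoft Mathematics computation) confirms that the A and B of this second moment system give the identical V.

```python
# Added sketch: the second moment system gives the same V as part a.
import sympy as sp

a, b = sp.symbols('alpha beta', positive=True)

A = sp.Matrix([[b,        a],
               [b**2,     2*a*b]])
B = sp.Matrix([[a*b**2,   2*a*b**3],
               [2*a*b**3, (2*a**2 + 6*a)*b**4]])

V = sp.simplify(A.inv() * B * A.inv().T)
sp.pprint(V)   # identical to the V obtained in part a
```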

c.

The asymptotic variances of the MLEs for \(\alpha\) and \(\beta\) are \(Avar(\hat \alpha_{MLE})= 1.55/n\) for \(\alpha = 1.0\) and \(Avar(\hat \alpha_{MLE})= 6.90/n\) for \(\alpha = 2.0\). Similarly, \(Avar(\hat \beta_{MLE})= 2.55\beta^2/n\) for \(\alpha = 1.0\) and \(Avar(\hat \beta_{MLE})= 3.45\beta^2/n\) for \(\alpha = 2.0\). Now calculate the asymptotic relative efficiencies of the MLEs to the method of moments estimators for \(\alpha = 1.0\) and \(\alpha = 2.0\) using the results from a.

Solution:

\[\begin{aligned} Avar(\hat \alpha_{MOM}) |_{\alpha = 1.0} &= 2(\alpha^2 +\alpha)/n \big|_{\alpha = 1.0} = 4/n \\ Avar(\hat \alpha_{MOM}) |_{\alpha = 2.0} &= 2(\alpha^2 +\alpha)/n \big|_{\alpha = 2.0} = 12/n \\ Avar(\hat \beta_{MOM}) |_{\alpha = 1.0} &=\left(2 + \frac 3 \alpha \right)\beta^2/n \big|_{\alpha = 1.0} = 5 \beta^2 /n \\ Avar(\hat \beta_{MOM}) |_{\alpha = 2.0} &= \left(2 + \frac 3 \alpha \right)\beta^2/n \big|_{\alpha = 2.0} = 7 \beta^2/(2n) \\ \\ ARE(\hat \alpha_{MLE},\hat \alpha_{MOM})|_{\alpha = 1.0} &= 1.55/4 = 0.3875 \\ ARE(\hat \alpha_{MLE},\hat \alpha_{MOM})|_{\alpha = 2.0} &= 6.9/12 = 0.575 \\ ARE(\hat \beta_{MLE},\hat \beta_{MOM})|_{\alpha = 1.0} &= 2.55/(5) = 0.51 \\ ARE(\hat \beta_{MLE},\hat \beta_{MOM})|_{\alpha = 2.0} &= 3.45/(7/2) \approx 0.986 \\ \end{aligned}\]

Each ratio above is \(Avar_{MLE}/Avar_{MOM}\) and is less than one, so, as expected, the MLE is more efficient in all cases.
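As an added sanity check on the \(Avar\) values used above (not part of the assignment), a short NumPy Monte Carlo compares \(n \cdot Var\) of the moment estimators \(\hat \alpha = \bar Y^2/s_n^2\) and \(\hat \beta = s_n^2/\bar Y\) (the estimators produced by both moment systems above) with the asymptotic values \(2(\alpha^2+\alpha)\) and \((2+3/\alpha)\beta^2\); the parameter values, sample size, and seed are arbitrary choices.

```python
# Added sketch: Monte Carlo check of Avar(alpha_hat_MOM) = 2(alpha^2 + alpha)/n
# and Avar(beta_hat_MOM) = (2 + 3/alpha) * beta^2 / n for gamma(alpha, beta) data.
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, n, reps = 2.0, 1.5, 2000, 5000   # arbitrary settings

a_hat = np.empty(reps)
b_hat = np.empty(reps)
for r in range(reps):
    y = rng.gamma(shape=alpha, scale=beta, size=n)
    ybar, s2 = y.mean(), y.var()     # "n" version of s^2, as in part b
    a_hat[r] = ybar**2 / s2          # method-of-moments alpha
    b_hat[r] = s2 / ybar             # method-of-moments beta

print(n * a_hat.var(), 2 * (alpha**2 + alpha))       # both roughly 12 for alpha = 2
print(n * b_hat.var(), (2 + 3 / alpha) * beta**2)    # both roughly 7.875 here
```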

7.4

Suppose that \(Y_1, \dots, Y_n\) are iid and \(\boldsymbol{\hat \theta}\) satisfies \(\sum_{i=1}^n \boldsymbol \psi (Y_i, \boldsymbol{\hat \theta}) = \mathbf c_n\) where we assume:

  1. \(\boldsymbol{\hat \theta} \overset p \to \boldsymbol \theta_0\)

  2. \(\mathbf c_n / \sqrt n \overset p \to 0\)

  3. The remainder term \(R_n\) from the expansion \[G_n (\boldsymbol{\hat \theta}) = n^{-1} \sum_{i=1}^n \boldsymbol \psi (Y_i, \boldsymbol{\hat \theta}) = G_n(\boldsymbol \theta_0) +G_n'(\boldsymbol \theta_0)(\boldsymbol{\hat \theta} -\boldsymbol \theta_0) + \mathbf R_n\] satisfies \(\sqrt n \mathbf R_n \overset p \to 0\).

Show that \(\boldsymbol{\hat \theta}\) is \(AN(\boldsymbol \theta_0, V(\boldsymbol \theta_0)/n)\), i.e., the same result as for the usual case when \(\mathbf c_n = 0\).

Solution:

By the defining equation, \(G_n (\boldsymbol{\hat \theta}) = n^{-1} \sum_{i=1}^n \boldsymbol \psi (Y_i, \boldsymbol{\hat \theta}) = n^{-1} \mathbf c_n\).

Expanding using Taylor’s Theorem about \(\boldsymbol \theta_0\), \[\begin{aligned} n^{-1} \mathbf c_n &= G_n (\boldsymbol{\hat \theta}) =G_n(\boldsymbol \theta_0) +G_n'(\boldsymbol \theta_0)(\boldsymbol{\hat \theta} -\boldsymbol \theta_0) + \mathbf R_n \\ \implies \frac {\mathbf c_n} {\sqrt{n}} &= \sqrt n G_n(\boldsymbol \theta_0) + \sqrt n G_n'(\boldsymbol \theta_0)(\boldsymbol{\hat \theta} -\boldsymbol \theta_0) + \sqrt n \mathbf R_n \end{aligned}\] \[\begin{aligned} \implies \sqrt n G_n(\boldsymbol \theta_0) + \sqrt n G_n'(\boldsymbol \theta_0)(\boldsymbol{\hat \theta} -\boldsymbol \theta_0) &\overset p \to 0 & \text{by ii and iii}\\ \implies \sqrt n (\boldsymbol{\hat \theta} -\boldsymbol \theta_0) &= -[G_n'(\boldsymbol \theta_0)]^{-1}\sqrt n G_n(\boldsymbol \theta_0) + o_p(1), \end{aligned}\] provided \(G_n'(\boldsymbol \theta_0)\) is invertible with probability tending to one.

Note that \(E[\boldsymbol \psi (Y_1, \boldsymbol \theta_0)] = 0\) (the usual condition defining \(\boldsymbol \theta_0\)), so by the CLT applied to the iid terms \(\boldsymbol \psi (Y_i, \boldsymbol \theta_0)\), \[\sqrt n G_n(\boldsymbol \theta_0) = \frac 1 {\sqrt n} \sum_{i=1}^n \boldsymbol \psi (Y_i, \boldsymbol \theta_0) \overset d \to N\left(0, E\left[ \boldsymbol \psi (Y_1, \boldsymbol \theta_0)\boldsymbol \psi^T (Y_1, \boldsymbol \theta_0)\right] \right) = N\left(0, \mathbf B(\boldsymbol \theta_0) \right)\]

Also, by the WLLN, \[\begin{aligned} G_n'(\boldsymbol \theta_0) &= n^{-1} \sum_{i=1}^n \boldsymbol \psi' (Y_i, \boldsymbol \theta_0) \overset p \to E[\boldsymbol \psi' (Y_1, \boldsymbol \theta_0)] = - \mathbf A(\boldsymbol \theta_0) \end{aligned}\]

Thus, by Slutsky’s Theorem,

\[\begin{aligned} \sqrt n (\boldsymbol{\hat \theta} -\boldsymbol \theta_0) &= -[G_n'(\boldsymbol \theta_0)]^{-1}\sqrt n G_n(\boldsymbol \theta_0) + o_p(1) \\ &\overset d \to N\left(0, \mathbf A^{-1}(\boldsymbol \theta_0) \mathbf B(\boldsymbol \theta_0) \left\{\mathbf A^{-1}(\boldsymbol \theta_0) \right\}^T \right) = N\left(0, \mathbf V(\boldsymbol \theta_0) \right), \end{aligned}\] so \(\boldsymbol{\hat \theta}\) is \(AN\left( \boldsymbol \theta_0, \dfrac{\mathbf V(\boldsymbol \theta_0)}{n} \right)\), the same result as when \(\mathbf c_n = 0\).
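As an added illustration (not part of the exercise), consider the \(p\)th sample quantile: with \(\psi(y, \theta) = p - 1\{y \le \theta\}\), the sum \(\sum_{i=1}^n \psi(Y_i, \hat \theta)\) is bounded (so \(\mathbf c_n/\sqrt n \to 0\)) yet generally nonzero, and the result above still gives \(AN(\xi_p, V/n)\) with \(V = p(1-p)/f(\xi_p)^2\). A quick simulation for standard normal data, with arbitrary settings (note `np.quantile` interpolates slightly, which is asymptotically negligible):

```python
# Added sketch: the p-th sample quantile as an M-estimator with bounded,
# generally nonzero c_n; compare n*Var with V = p(1-p)/f(xi_p)^2.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p, n, reps = 0.75, 4000, 4000     # arbitrary settings

q_hat = np.array([np.quantile(rng.standard_normal(n), p) for _ in range(reps)])

xi_p = norm.ppf(p)                       # theta_0 for standard normal data
V = p * (1 - p) / norm.pdf(xi_p) ** 2    # A^{-1} B A^{-1} for psi = p - 1{y <= theta}
print(n * q_hat.var(), V)                # both approximately 1.86
```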

7.6

(Delta Theorem via M-estimation). Suppose that \(\boldsymbol{\hat \theta}\) is a b-dimensional M-estimator with defining function \(\boldsymbol \psi(y, \boldsymbol \theta)\) such that the usual quantities \(\mathbf A\) and \(\mathbf B\) exist. Here we want to essentially reproduce Theorem 5.19 (p. 238) for \(g(\boldsymbol{\hat \theta})\), where \(g\) satisfies the assumptions of Theorem 5.19 and \(b_n^2 \boldsymbol \Sigma = n^{-1} \mathbf V(\boldsymbol \theta)\), with \(\mathbf V(\boldsymbol \theta )= \mathbf A(\boldsymbol \theta)^{-1} \mathbf B(\boldsymbol \theta)\{\mathbf A(\boldsymbol \theta)^{-1}\}^T\). So add the \(\boldsymbol \psi\) function \(g(\boldsymbol \theta) - \theta_{b+1}\) to \(\boldsymbol \psi(y, \boldsymbol \theta)\), compute the relevant matrices, say \(\mathbf {A}^* , \mathbf{B}^*,\) and \(\mathbf{V}^*\), and show that the last diagonal element of \(\mathbf{V}^*\) is \(g'(\boldsymbol \theta) \mathbf V(\boldsymbol \theta)g'(\boldsymbol \theta)^T\).

Solution:

Define \(\theta^*\) as the appended \((b+1)\)st parameter, whose true value is \(g(\boldsymbol \theta_0)\), and let \(\boldsymbol \beta = (\boldsymbol \theta^T, \theta^*)^T\). {% raw %} \[\begin{aligned} \psi^* (Y_i, \boldsymbol \beta) &= \begin{pmatrix} \psi(Y, \boldsymbol \theta) \\ g(\boldsymbol \theta) - \theta^* \end{pmatrix} \\ \mathbf A^* &= - E \begin{bmatrix} \frac{\partial \psi^* (Y_i, \boldsymbol \beta)}{\partial \boldsymbol \beta} \end{bmatrix} = - E \begin{bmatrix} \frac{\partial \psi(Y, \boldsymbol \theta)}{\partial \boldsymbol \theta} & \frac{\partial \psi(Y, \boldsymbol \theta)}{\partial \theta^*} \\ \frac{\partial}{\partial \boldsymbol \theta} \left\{ g(\boldsymbol \theta) - \theta^*\right\} & \frac{\partial}{\partial \theta^*} \left\{ g(\boldsymbol \theta) - \theta^*\right\} \end{bmatrix} = \begin{bmatrix} \mathbf A & 0 \\ -g'(\boldsymbol \theta) & 1\end{bmatrix} \\ \mathbf B^*&= E\begin{bmatrix} \psi^* {\psi^*}^T \end{bmatrix} = E\begin{bmatrix} \begin{pmatrix} \psi(Y, \boldsymbol \theta) \\ g(\boldsymbol \theta) - \theta^* \end{pmatrix} \begin{pmatrix} \psi^T(Y, \boldsymbol \theta) & g(\boldsymbol \theta) - \theta^* \end{pmatrix} \end{bmatrix} \\ &=E\begin{bmatrix} \psi(Y, \boldsymbol \theta) \psi^T(Y, \boldsymbol \theta) & \psi(Y, \boldsymbol \theta)[g(\boldsymbol \theta) - \theta^*] \\ [g(\boldsymbol \theta) - \theta^*]\psi^T(Y, \boldsymbol \theta) & [g(\boldsymbol \theta) - \theta^*]^2 \end{bmatrix} = \begin{bmatrix} \mathbf B & 0 \\ 0 & 0 \end{bmatrix} \qquad \text{since } g(\boldsymbol \theta_0) - \theta^* = 0 \text{ at the true values} \\ \mathbf V^* &= {\mathbf A^*}^{-1}\mathbf B^* \left\{{\mathbf A^*}^{-1}\right\}^T = \begin{bmatrix} \mathbf A & 0 \\ -g'(\boldsymbol \theta) & 1\end{bmatrix}^{-1} \begin{bmatrix} \mathbf B & 0 \\ 0 & 0 \end{bmatrix} \left\{\begin{bmatrix} \mathbf A & 0 \\ -g'(\boldsymbol \theta) & 1\end{bmatrix}^{-1}\right\}^T \\ &= \begin{bmatrix} \mathbf A^{-1} & 0\\ g'(\boldsymbol \theta)\mathbf A^{-1}& 1\end{bmatrix} \begin{bmatrix} \mathbf B & 0 \\ 0 & 0 \end{bmatrix} \left\{\begin{bmatrix} \mathbf A^{-1} & 0\\ g'(\boldsymbol \theta)\mathbf A^{-1}& 1\end{bmatrix}\right\}^T \qquad \text{(block lower-triangular inverse)} \\ &= \begin{bmatrix} \mathbf A^{-1}\mathbf B\{\mathbf A^{-1}\}^T & \mathbf A^{-1}\mathbf B\{\mathbf A^{-1}\}^T g'(\boldsymbol \theta)^T \\ g'(\boldsymbol \theta)\mathbf A^{-1}\mathbf B\{\mathbf A^{-1}\}^T & g'(\boldsymbol \theta) \mathbf A^{-1}\mathbf B\{\mathbf A^{-1}\}^T g'(\boldsymbol \theta)^T\end{bmatrix} ~~~~\text{using Microsoft Mathematics for matrix multiplication} \end{aligned}\] {% endraw %}

The last diagonal element of \(\mathbf V^*\) is therefore \(g'(\boldsymbol \theta)\mathbf V(\boldsymbol \theta) g'(\boldsymbol \theta)^T\) with \(\mathbf V(\boldsymbol \theta) = \mathbf A^{-1}\mathbf B\{\mathbf A^{-1}\}^T\), so \(g(\hat {\boldsymbol \theta})\) is \(AN \left( g(\boldsymbol \theta_0), \frac{g'(\boldsymbol \theta)\mathbf V(\boldsymbol \theta) g'(\boldsymbol \theta)^T } n \right)\), as in Theorem 5.19.
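A quick numerical check of the block algebra (added here; the matrices below are arbitrary stand-ins, not derived from any particular model) confirms that the last diagonal element of \(\mathbf V^*\) equals \(g'(\boldsymbol \theta)\mathbf V g'(\boldsymbol \theta)^T\).

```python
# Added sketch: numerically verify that the last diagonal element of
# V* = {A*}^{-1} B* {{A*}^{-1}}^T equals g'(theta) V g'(theta)^T.
import numpy as np

rng = np.random.default_rng(42)
b = 3                                              # dimension of theta (arbitrary)
A = rng.standard_normal((b, b)) + b * np.eye(b)    # stand-in invertible A
M = rng.standard_normal((b, b))
B = M @ M.T                                        # stand-in symmetric PSD B
gprime = rng.standard_normal((1, b))               # stand-in gradient g'(theta), 1 x b

A_star = np.block([[A,        np.zeros((b, 1))],
                   [-gprime,  np.ones((1, 1))]])
B_star = np.block([[B,                np.zeros((b, 1))],
                   [np.zeros((1, b)), np.zeros((1, 1))]])

V_star = np.linalg.inv(A_star) @ B_star @ np.linalg.inv(A_star).T
V = np.linalg.inv(A) @ B @ np.linalg.inv(A).T
print(V_star[-1, -1], (gprime @ V @ gprime.T).item())   # agree up to rounding
```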

7.7

The generalized method of moments (GMM) is an important estimation method found mainly in the econometrics literature and closely related to M-estimation. Suppose that we have iid random variables \(Y_1, \dots, Y_n\) and a \(p\)-dimensional unknown parameter \(\boldsymbol \theta\). The key idea is that there is a set of \(q \geq p\) possible estimating equations \[\frac 1 n \sum_{i=1}^n \psi_j (Y_i; \boldsymbol \theta) = 0, ~~~~~~~j=1, \dots, q\] motivated by the fact that \(E\psi_j(Y_1; \boldsymbol \theta_0) = 0\), where \(\boldsymbol \theta_0\) is the true value. These motivating zero expectations come from the theory in the subject area being studied. But notice that if \(q > p\), then we have too many equations. The GMM approach is to minimize the objective function

\[T= \left[\frac 1 n \sum_{i=1}^n \boldsymbol \psi (Y_i; \boldsymbol \theta)\right]^T \mathbf W \left[\frac 1 n \sum_{i=1}^n \boldsymbol \psi (Y_i; \boldsymbol \theta)\right]\]

where \(\boldsymbol \psi = (\psi_1, \dots, \psi_q)^T\) and \(\mathbf W\) is a matrix of weights. Now let’s simplify the problem by letting \(q=2, p=1\) so that \(\theta\) is real-valued, and \(\mathbf W = \text{diag}(w_1,w_2)\). Then T reduces to

\[T= w_1\left[\frac 1 n \sum_{i=1}^n \psi_1 (Y_i; \theta)\right]^2 +w_2\left[\frac 1 n \sum_{i=1}^n \psi_2 (Y_i; \theta)\right]^2\]

To find \(\hat \theta\) we just take the partial derivative of T with respect to \(\theta\) and set it equal to 0:

\[\begin{aligned} S(\mathbf Y; \theta) &= 2w_1\left[\frac 1 n \sum_{i=1}^n \psi_1 (Y_i; \theta)\right]\left[\frac 1 n \sum_{i=1}^n \psi_1 '(Y_i; \theta)\right] +2w_2\left[\frac 1 n \sum_{i=1}^n \psi_2 (Y_i; \theta)\right]\left[\frac 1 n \sum_{i=1}^n \psi_2 '(Y_i; \theta)\right] \\ &= 0 \end{aligned}\]

a.

Prove that \(S(\mathbf Y ; \theta_0) \overset p \to 0\) making any moment assumptions that you need. (This should suggest to you that the solution of the equation \(S(\mathbf Y; \theta)=0\) is consistent.)

Solution:

\[\begin{aligned} \frac 1 n \sum_{i=1}^n \psi_j(Y_i; \theta_0) &\overset p \to E\left[\psi_j(Y_1; \theta_0) \right] = 0 & \text{ by the WLLN} \\ \frac 1 n \sum_{i=1}^n \psi_j'(Y_i; \theta_0) &\overset p \to E\left[\psi_j'(Y_1; \theta_0) \right] & \text{ by the WLLN, assuming } E\left|\psi_j'(Y_1; \theta_0) \right| < \infty \\ \implies w_j \left[\frac 1 n \sum_{i=1}^n \psi_j (Y_i; \theta_0)\right]&\left[\frac 1 n \sum_{i=1}^n \psi_j '(Y_i; \theta_0)\right] \overset p \to 0 & \text{by Slutsky's Theorem} \end{aligned}\] \[\begin{aligned} \implies S(\mathbf Y; \theta_0) &= 2w_1\left[\frac 1 n \sum_{i=1}^n \psi_1 (Y_i; \theta_0)\right]\left[\frac 1 n \sum_{i=1}^n \psi_1 '(Y_i; \theta_0)\right] +2w_2\left[\frac 1 n \sum_{i=1}^n \psi_2 (Y_i; \theta_0)\right]\left[\frac 1 n \sum_{i=1}^n \psi_2 '(Y_i; \theta_0)\right] \\ &\overset p \to 0 \end{aligned}\]
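To make this concrete, here is an added sketch using a hypothetical Poisson(\(\theta\)) example with \(\psi_1 = Y - \theta\) and \(\psi_2 = (Y-\theta)^2 - \theta\) (both have mean zero at \(\theta_0\)); the weights, true value, and sample sizes are arbitrary choices. The printed values of \(S(\mathbf Y; \theta_0)\) shrink roughly like \(1/\sqrt n\):

```python
# Added sketch: S(Y; theta_0) -> 0 in probability for a Poisson(theta) example
# with psi_1 = Y - theta and psi_2 = (Y - theta)^2 - theta.
import numpy as np

rng = np.random.default_rng(7)
theta0, w1, w2 = 3.0, 1.0, 0.5   # arbitrary true value and weights

def S(y, theta):
    psi1 = y - theta
    psi2 = (y - theta) ** 2 - theta
    dpsi1 = -np.ones_like(y)                 # d/dtheta of psi_1
    dpsi2 = -2.0 * (y - theta) - 1.0         # d/dtheta of psi_2
    return (2 * w1 * psi1.mean() * dpsi1.mean()
            + 2 * w2 * psi2.mean() * dpsi2.mean())

for n in (10**2, 10**4, 10**6):
    y = rng.poisson(theta0, size=n).astype(float)
    print(n, S(y, theta0))    # magnitudes shrink roughly like 1/sqrt(n)
```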

b.

To get asymptotic normality for \(\hat \theta\), a direct approach is to expand \(S(\mathbf Y; \hat \theta)\) around \(\theta_0\) and solve for \(\hat \theta - \theta_0\). \[\hat \theta - \theta_0 = \left[-\frac{\partial S(\mathbf Y; \theta_0)}{\partial \theta^T} \right]^{-1}S(\mathbf Y; \theta_0)+ \left[-\frac{\partial S(\mathbf Y; \theta_0)}{\partial \theta^T} \right]^{-1}R_n\] Then one ignores the remainder term and uses Slutsky’s Theorem along with asymptotic normality of \(S(\mathbf Y; \theta_0)\). But how to get the asymptotic normality of \(S(\mathbf Y; \theta_0)\)? Find \(h(Y_i; \theta_0)\) such that \[S(\mathbf Y; \theta_0) = \frac 1 n \sum_{i=1}^n h(Y_i; \theta_0) + R_n^*\] No proofs are required.

Solution:

\[\begin{aligned} S(\mathbf Y; \theta_0) &= 2w_1\left[\frac 1 n \sum_{i=1}^n \psi_1 (Y_i; \theta_0)\right]\left[\frac 1 n \sum_{i=1}^n \psi_1 '(Y_i; \theta_0)\right] +2w_2\left[\frac 1 n \sum_{i=1}^n \psi_2 (Y_i; \theta_0)\right]\left[\frac 1 n \sum_{i=1}^n \psi_2 '(Y_i; \theta_0)\right] \\ &= \sum_{j=1}^2 2w_j\left[\frac 1 n \sum_{i=1}^n \psi_j (Y_i; \theta_0)\right]\left[\frac 1 n \sum_{i=1}^n \psi_j '(Y_i; \theta_0)\right] \\ &= \sum_{j=1}^2 2w_j\left[\frac 1 n \sum_{i=1}^n \psi_j (Y_i; \theta_0)\right]\left[\frac 1 n \sum_{i=1}^n \psi_j '(Y_i; \theta_0) + E\left(\psi_j '(Y_1; \theta_0) \right)- E\left(\psi_j '(Y_1; \theta_0) \right)\right] \\ &=\sum_{j=1}^2 2w_jE\left(\psi_j '(Y_1; \theta_0) \right)\left[\frac 1 n \sum_{i=1}^n \psi_j (Y_i; \theta_0)\right]+ \sum_{j=1}^2 2w_j\left[\frac 1 n \sum_{i=1}^n \psi_j (Y_i; \theta_0)\right]\left[\frac 1 n \sum_{i=1}^n \psi_j '(Y_i; \theta_0)- E\left(\psi_j '(Y_1; \theta_0) \right)\right] \\ \end{aligned}\]

\[\begin{aligned} \text{Let }R_n^* &= \sum_{j=1}^2 2w_j\left[\frac 1 n \sum_{i=1}^n \psi_j (Y_i; \theta_0)\right]\left[\frac 1 n \sum_{i=1}^n \psi_j '(Y_i; \theta_0)- E\left(\psi_j '(Y_1; \theta_0) \right)\right] \\ \implies \sqrt n R_n^* &= \sum_{j=1}^2 2w_j\left[\frac 1 n \sum_{i=1}^n \psi_j (Y_i; \theta_0)\right]\sqrt n\left[\frac 1 n \sum_{i=1}^n \psi_j '(Y_i; \theta_0)- E\left(\psi_j '(Y_1; \theta_0) \right)\right] \\ &\overset p \to 0 ~~~ \text{by the WLLN, the Central Limit Theorem, and Slutsky's Theorem} \end{aligned}\] \[\begin{aligned} \implies S(\mathbf Y; \theta_0) &= \sum_{j=1}^2 2w_jE\left(\psi_j '(Y_1; \theta_0) \right)\left[\frac 1 n \sum_{i=1}^n \psi_j (Y_i; \theta_0)\right]+ R_n^* \\ &= \frac 1 n \sum_{i=1}^n \sum_{j=1}^2 2w_j\psi_j (Y_i; \theta_0)E\left(\psi_j '(Y_1; \theta_0) \right) + R_n^* \\ \implies h(Y_i; \theta_0) &=\sum_{j=1}^2 2w_j\psi_j (Y_i; \theta_0)E\left(\psi_j '(Y_1; \theta_0) \right) \\ &=2w_1\psi_1 (Y_i; \theta_0)E\left(\psi_1 '(Y_1; \theta_0) \right)+2w_2\psi_2 (Y_i; \theta_0)E\left(\psi_2 '(Y_1; \theta_0) \right) \end{aligned}\]
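Continuing the same hypothetical Poisson example from part a's sketch (again an added check with arbitrary settings), the average of \(h(Y_i; \theta_0)\) tracks \(S(\mathbf Y; \theta_0)\), and their difference is the small remainder \(R_n^*\):

```python
# Added sketch: compare S(Y; theta_0) with its leading term n^{-1} sum_i h(Y_i; theta_0);
# the difference is the remainder R_n^* from above.
import numpy as np

rng = np.random.default_rng(3)
theta0, w1, w2 = 3.0, 1.0, 0.5
y = rng.poisson(theta0, size=10**5).astype(float)

psi1, psi2 = y - theta0, (y - theta0) ** 2 - theta0
dpsi1, dpsi2 = -np.ones_like(y), -2.0 * (y - theta0) - 1.0
E_dpsi1, E_dpsi2 = -1.0, -1.0                 # exact expected derivatives at theta_0

S = 2 * w1 * psi1.mean() * dpsi1.mean() + 2 * w2 * psi2.mean() * dpsi2.mean()
h = 2 * w1 * psi1 * E_dpsi1 + 2 * w2 * psi2 * E_dpsi2
print(S, h.mean(), S - h.mean())              # the difference (R_n^*) is O_p(1/n)
```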

c. 

The equation \(S(\mathbf Y; \theta) = 0\) is not in the form for using M-estimation results (because the product of sums is not a simple sum). Show how to get it in M-estimation form by adding two new parameters, \(\theta_2\) and \(\theta_3\), and two new equations so that the result is a system of three equations with three \(\psi\) functions; call them \(\psi_1^*, \psi_2^*\), and \(\psi_3^*\) because \(\psi_1^*\) is actually a function of the original \(\psi_1\) and \(\psi_2\).

Solution:

\[\begin{aligned} S(\mathbf Y; \theta) &= 2w_1\left[\frac 1 n \sum_{i=1}^n \psi_1 (Y_i; \theta)\right]\left[\frac 1 n \sum_{i=1}^n \psi_1 '(Y_i; \theta)\right] +2w_2\left[\frac 1 n \sum_{i=1}^n \psi_2 (Y_i; \theta)\right]\left[\frac 1 n \sum_{i=1}^n \psi_2 '(Y_i; \theta)\right] \\ \psi_1^* &= 2w_1 \psi_1 (Y_i; \theta) \theta_2 +2w_2\psi_2 (Y_i; \theta)\theta_3\\ \psi_2^* &= \psi_1'(Y_i; \theta) - \theta_2 \\ \psi_3^* &= \psi_2'(Y_i; \theta) - \theta_3 \end{aligned}\]

Almost trivially, for fixed \(\theta\), the equations \(\sum_{i=1}^n \psi_2^* = 0\) and \(\sum_{i=1}^n \psi_3^* = 0\) are solved by \(\hat \theta_2 = \frac 1 n \sum_{i=1}^n \psi_1 '(Y_i; \theta)\) and \(\hat \theta_3 = \frac 1 n \sum_{i=1}^n \psi_2 '(Y_i; \theta)\), respectively.

Thus, \[\begin{aligned} \sum_{i=1}^n \psi_1^* = 0,~~\sum_{i=1}^n \psi_2^* = 0,~~\sum_{i=1}^n \psi_3^* &= 0\\ \implies 0 &= \sum_{i=1}^n \left(2w_1 \psi_1 (Y_i; \theta) \left[\frac 1 n \sum_{k=1}^n \psi_1 '(Y_k; \theta)\right] +2w_2\psi_2 (Y_i; \theta)\left[\frac 1 n \sum_{k=1}^n \psi_2 '(Y_k; \theta)\right] \right) \\ \implies 0 &= \frac 1 n\sum_{i=1}^n \left(2w_1 \psi_1 (Y_i; \theta) \left[\frac 1 n \sum_{k=1}^n \psi_1 '(Y_k; \theta)\right] +2w_2\psi_2 (Y_i; \theta)\left[\frac 1 n \sum_{k=1}^n \psi_2 '(Y_k; \theta)\right] \right) \\ &= 2w_1\left[\frac 1 n \sum_{i=1}^n \psi_1 (Y_i; \theta)\right]\left[\frac 1 n \sum_{i=1}^n \psi_1 '(Y_i; \theta)\right] +2w_2\left[\frac 1 n \sum_{i=1}^n \psi_2 (Y_i; \theta)\right]\left[\frac 1 n \sum_{i=1}^n \psi_2 '(Y_i; \theta)\right] \\ &= S(\mathbf Y; \theta), \end{aligned}\] so the three-equation system reproduces \(S(\mathbf Y; \theta) = 0\).
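As a final added sketch (same hypothetical Poisson example and weights as above; `scipy.optimize.fsolve` and the starting values are my own illustrative choices), the stacked system \((\psi_1^*, \psi_2^*, \psi_3^*)\) can be solved as an ordinary M-estimation problem in \((\theta, \theta_2, \theta_3)\), and its \(\theta\)-component solves \(S(\mathbf Y; \theta) = 0\).

```python
# Added sketch: solve the stacked (psi_1^*, psi_2^*, psi_3^*) system for the
# hypothetical Poisson example; the theta-component also solves S(Y; theta) = 0.
import numpy as np
from scipy.optimize import fsolve

rng = np.random.default_rng(11)
theta0, w1, w2 = 3.0, 1.0, 0.5
y = rng.poisson(theta0, size=5000).astype(float)

def stacked(params):
    theta, t2, t3 = params
    psi1 = y - theta
    psi2 = (y - theta) ** 2 - theta
    dpsi1 = -np.ones_like(y)
    dpsi2 = -2.0 * (y - theta) - 1.0
    return [np.mean(2 * w1 * psi1 * t2 + 2 * w2 * psi2 * t3),  # psi_1^*
            np.mean(dpsi1) - t2,                               # psi_2^*
            np.mean(dpsi2) - t3]                               # psi_3^*

theta_hat, t2_hat, t3_hat = fsolve(stacked, x0=[y.mean(), -1.0, -1.0])
print(theta_hat, t2_hat, t3_hat)   # theta_hat near theta0; t2_hat = -1 and
                                   # t3_hat is the averaged psi_2' at theta_hat
```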