Chapter 1: Roles of Modeling in Statistical Inference
Author: Michael Throolin
1.12
Suppose that \(Y_1, \cdots, Y_n\) are iid \(Poisson(\lambda)\) and that \(\hat{\lambda} = \bar{Y}\). Define \(\sigma^2_{\hat{\lambda}} = var(\hat{\lambda})\). Consider the following two estimators of \(\sigma^2_{\hat{\lambda}}\): \[\hat{\sigma}^2_{\hat{\lambda}} = \frac{\hat{\lambda}}{n}~~,~~\tilde{\sigma}^2_{\hat{\lambda}} = \frac{s^2}{n}\]
a.
When the Poisson model holds, are both estimators consistent?
Solution:
\[
\begin{aligned}
n\hat{\sigma}^2_{\hat{\lambda}} &= n\left(\frac{\hat{\lambda}}{n} \right) = \bar{Y} \overset{P}{\to} \lambda & \text{by the weak law of large numbers,} \\
n\sigma^2_{\hat{\lambda}} &= n \, var(\hat{\lambda}) = n\,var(\bar{Y}) = n\frac{\lambda}{n} = \lambda & \text{since } Y_i \sim Poisson(\lambda)
\end{aligned}
\]
Thus \(n\hat{\sigma}^2_{\hat{\lambda}}\) is a consistent estimator of \(n\sigma^2_{\hat{\lambda}}\).
For the second estimator, note that \(E[s^2] = \lambda\) and that \(Var(s^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right) \to 0\) under the Poisson model, since the fourth central moment \(\mu_4 = \lambda + 3\lambda^2\) is finite and does not depend on \(n\). By Chebyshev's inequality, \(\lim_{n \to \infty}Pr(|n\tilde{\sigma}^2_{\hat{\lambda}} - n\sigma^2_{\hat{\lambda}}| \geq \epsilon) = 0\); thus, by definition, \(n\tilde{\sigma}^2_{\hat{\lambda}} = s^2\) is a consistent estimator of \(n\sigma^2_{\hat{\lambda}}\).
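As a quick numerical check (a sketch not part of the original exercise; \(\lambda = 3\) and the sample sizes are arbitrary choices), both \(\bar{Y}\) and \(s^2\) settle near \(\lambda\) as \(n\) grows:

```r
# Minimal sketch: under Poisson(lambda = 3), both n * sigma-hat^2 = Ybar
# and n * sigma-tilde^2 = s^2 should approach lambda as n grows.
set.seed(1)
lambda <- 3
for (n in c(50, 500, 5000)) {
  y <- rpois(n, lambda)
  cat(sprintf("n = %4d: Ybar = %.3f, s^2 = %.3f (lambda = %g)\n",
              n, mean(y), var(y), lambda))
}
```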
b.
When the Poisson model holds, give the asymptotic relative efficiency of the two estimators. (Actually here you can compare exact variances since the estimators are so simple.)
Solution:
The exact variance of the first estimator is \(var(\hat{\sigma}^2_{\hat{\lambda}}) = var(\bar{Y})/n^2 = \lambda/n^3\). For the second, \(n\,var(s^2) \to \mu_4 - \sigma^4 = (\lambda + 3\lambda^2) - \lambda^2 = \lambda + 2\lambda^2\), so \(var(\tilde{\sigma}^2_{\hat{\lambda}}) \approx (\lambda + 2\lambda^2)/n^3\). Hence \[ARE(\tilde{\sigma}^2_{\hat{\lambda}}, \hat{\sigma}^2_{\hat{\lambda}}) = \frac{\lambda + 2\lambda^2}{\lambda} = 1 + 2\lambda.\] As \(\lambda > 0\), this ratio exceeds one, which indicates \(\hat{\sigma}^2_{\hat{\lambda}}\) is more efficient.
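A Monte Carlo sketch of this ratio (the settings \(\lambda = 3\), \(n = 200\), and 10000 replicates are arbitrary, not from the text): the simulated variance ratio should land near \(1 + 2\lambda = 7\).

```r
# Sketch: Monte Carlo variance ratio of the two estimators of var(lambda-hat)
# under Poisson(lambda = 3); theory predicts about 1 + 2*lambda = 7.
set.seed(2)
lambda <- 3; n <- 200; B <- 10000
est_hat <- est_tilde <- numeric(B)
for (b in 1:B) {
  y <- rpois(n, lambda)
  est_hat[b]   <- mean(y) / n  # lambda-hat / n
  est_tilde[b] <- var(y)  / n  # s^2 / n
}
var(est_tilde) / var(est_hat)  # close to 1 + 2*lambda
```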
c.
When the Poisson model does not hold but assuming second moments exist, are both estimators consistent?
Solution:
\(n\tilde{\sigma}^2_{\hat{\lambda}} = s^2\) remains consistent: writing \(s^2 = \frac{n}{n-1}\left(\frac{1}{n}\sum_{i=1}^n Y_i^2 - \bar{Y}^2\right)\), the weak law of large numbers gives \(s^2 \overset{P}{\to} E[Y^2] - \mu^2 = \sigma^2 = n\sigma^2_{\hat{\lambda}}\) whenever second moments exist.
\(n\hat{\sigma}^2_{\hat{\lambda}}\), however, is no longer necessarily a consistent estimator of \(n\sigma^2_{\hat{\lambda}}\). This is because
\[\begin{aligned}
n\hat{\sigma}^2_{\hat{\lambda}} &= n\left(\frac{\hat{\lambda}}{n} \right) = \bar{Y} \overset{P}{\to} \mu & \text{by the weak law of large numbers,} \\
n\sigma^2_{\hat{\lambda}} &= n \, var(\hat{\lambda}) = n\,var(\bar{Y}) = n\frac{\sigma^2}{n} = \sigma^2
\end{aligned}\]
Therefore, for any distribution whose mean does not equal its variance (such as the standard normal, where \(\mu = 0\) and \(\sigma^2 = 1\)), \(n\hat{\sigma}^2_{\hat{\lambda}}\) is not a consistent estimator of \(n\sigma^2_{\hat{\lambda}}\).
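A small sketch of this failure mode (the Exponential(rate = 1/2) example mirrors the simulation used in 1.13 below; here \(\mu = 2\) while \(\sigma^2 = 4\)):

```r
# Sketch: Y ~ Exponential(rate = 1/2) has mu = 2 and sigma^2 = 4, so
# n * sigma-hat^2 = Ybar targets 2 while the true n * var(lambda-hat) is 4;
# s^2 still targets the correct quantity.
set.seed(3)
y <- rexp(1e5, rate = 0.5)
c(Ybar = mean(y), s2 = var(y))  # approximately 2 versus approximately 4
```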
1.13
Define the full model as \(N(\mu, c^2\mu^2)\), where \(c\) is known. Define \(\hat{\mu}_{WLS}\) as the value that minimizes \[\rho(\mu) = \sum_{i=1}^n \frac{(Y_i- \mu)^2}{c^2\mu^2}.\]
a)
Show that \(\hat{\mu}_{WLS}\) converges in probability, but that its limit is not \(\mu\).
Solution:
Setting \(\rho'(\mu) = -\frac{2}{c^2\mu^3}\sum_{i=1}^n Y_i(Y_i - \mu)\) equal to zero gives \(\hat{\mu}_{WLS} = \sum_{i=1}^n Y_i^2 \big/ \sum_{i=1}^n Y_i\). By the weak law of large numbers, \(n^{-1}\sum Y_i^2 \overset{P}{\to} \mu^2 + c^2\mu^2\) and \(n^{-1}\sum Y_i \overset{P}{\to} \mu\), so \(\hat{\mu}_{WLS} \overset{P}{\to} \mu(1 + c^2)\), which is not \(\mu\) for any \(c > 0\).
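A brief numerical sketch of this limit (the settings \(c = 0.5\), \(\mu = 2\), and \(n = 5000\) are arbitrary choices): minimizing \(\rho\) with `optimize` agrees with the closed form \(\sum Y_i^2 / \sum Y_i\), and both land near \(\mu(1 + c^2) = 2.5\) rather than \(\mu = 2\).

```r
# Sketch: numerically minimize rho(mu) and compare with the closed form;
# both should sit near mu * (1 + c^2), not mu.
set.seed(4)
c_known <- 0.5; mu_true <- 2
y <- rnorm(5000, mean = mu_true, sd = c_known * mu_true)
rho <- function(mu) sum((y - mu)^2 / (c_known^2 * mu^2))
optimize(rho, interval = c(0.1, 10))$minimum  # numerical WLS minimizer
sum(y^2) / sum(y)                             # closed-form minimizer
```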
Is \(\hat{\mu}^*_{WLS}\) a consistent estimator of \(\mu\)? If so, find its asymptotic variance and compare to \(\hat{\mu}_{MLE}\) and \(\hat{\mu}_{MOM}\).
Thus \(\hat{\mu}^*_{WLS}\) is a consistent estimator of \(\mu\).
Note that \(\hat{\mu}^*_{WLS} = \hat{\mu}_{MLE}\). Thus, \(Var(\hat{\mu}^*_{WLS}) = Var(\hat{\mu}_{MLE}) = \frac{c^2\mu^2}{n(1+2c^2)}\), and the relative efficiency between the two is 1. That is, \(ARE(\hat{\mu}^*_{WLS}, \hat{\mu}_{MLE}) = 1\). And, according to (1.17, p. 18) in the book, \(ARE(\hat{\mu}_{MOM}, \hat{\mu}^*_{WLS}) = 1 + 2c^2\).
Thus \(\hat{\mu}_{MLE}\) and \(\hat{\mu}^*_{WLS}\) are equally efficient, and each is more efficient than \(\hat{\mu}_{MOM}\) by the factor \(1 + 2c^2\).
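A hedged simulation check of this factor (the settings \(\mu = 2\), \(c = 1\), \(n = 200\), and 5000 replicates are arbitrary; the MLE closed form is the one from (1.16, p. 17) used in the simulation code below):

```r
# Sketch: under N(mu, c^2 mu^2) with mu = 2, c = 1, compare Monte Carlo
# variances of mu-hat_MOM = Ybar and mu-hat_MLE from (1.16);
# the ratio should be near 1 + 2c^2 = 3.
set.seed(5)
mu <- 2; c_known <- 1; n <- 200; B <- 5000
mom <- mle <- numeric(B)
for (b in 1:B) {
  y  <- rnorm(n, mean = mu, sd = c_known * mu)
  m1 <- mean(y)
  m2 <- mean(y^2)
  mom[b] <- m1
  mle[b] <- (sqrt(m1^2 + 4 * c_known^2 * m2) - m1) / (2 * c_known^2)
}
var(mom) / var(mle)  # approximately 1 + 2c^2
```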
e)
Find \(E[\rho_*'(\mu)]\) and show that it does not equal zero.
To check on the asymptotic variance expression in (1.20, p. 18), generate 1000 samples of size \(n=20\) from the exponential density \((1/2)\exp(-y/2)\), which has mean \(\mu=2\) and variance \(4\), so that in terms of the model \(\sigma^2=1\). For each sample, calculate \(\hat{\mu}_{MLE}\) in (1.16, p. 17). Then compute (1.20, p. 18) for this exponential density and compare it to \(n=20\) times the sample variance of the 1000 values of \(\hat{\mu}_{MLE}\). Repeat for \(n=50\).
Solution:
```r
library(kableExtra)
set.seed(123)

# Set constants
mu <- 2
pop_var <- 4
c <- 1
skew <- 2
kurt <- 9

# For sample size 20
sample_size <- 20

# Compute asymptotic variance as defined in book (1.21)
a_var_20 <- (c^2 * mu^2 / (sample_size * (1 + 2 * c^2)^2)) *
  (1 + 2 * c^2 + 2 * c * skew + c^2 * (kurt - 3))

# Simulate mu_mle
sample <- list()
mu_mle <- rep(NA, 1000)
for (i in 1:1000) {
  sample[[i]] <- rexp(n = sample_size, rate = 0.5)
  first_moment  <- mean(sample[[i]])
  second_moment <- sum(sample[[i]]^2) / sample_size
  mu_mle[i] <- (sqrt(first_moment^2 + 4 * c^2 * second_moment) - first_moment) /
    (2 * c^2)
}
sample_var_20 <- var(mu_mle)

# Repeat for sample size 50
sample_size <- 50

# Compute asymptotic variance as defined in book (1.21)
a_var_50 <- (c^2 * mu^2 / (sample_size * (1 + 2 * c^2)^2)) *
  (1 + 2 * c^2 + 2 * c * skew + c^2 * (kurt - 3))

# Simulate mu_mle
sample <- list()
mu_mle <- rep(NA, 1000)
for (i in 1:1000) {
  sample[[i]] <- rexp(n = sample_size, rate = 0.5)
  first_moment  <- mean(sample[[i]])
  second_moment <- sum(sample[[i]]^2) / sample_size
  mu_mle[i] <- (sqrt(first_moment^2 + 4 * c^2 * second_moment) - first_moment) /
    (2 * c^2)
}
sample_var_50 <- var(mu_mle)

sample_size_20 <- c(sample_var_20, a_var_20)
sample_size_50 <- c(sample_var_50, a_var_50)
kable(rbind(sample_size_20, sample_size_50),
      col.names = c("Sample Variance", "Asymptotic Variance"))
```
|          | Sample Variance | Asymptotic Variance |
|----------|-----------------|---------------------|
| \(n=20\) | 0.2465302       | 0.2888889           |
| \(n=50\) | 0.1007942       | 0.1155556           |
The sample variance approaches the asymptotic variance as sample size increases.