Chapter 1: Roles of Modeling in Statistical Inference
Author: Michael Throolin
1.12
Suppose that \(Y_1, \cdots, Y_n\) are iid \(Poisson(\lambda)\) and that \(\hat{\lambda} = \bar{Y}\). Define \(\sigma^2_{\hat{\lambda}} = var(\hat{\lambda})\). Consider the following two estimators of \(\sigma^2_{\hat{\lambda}}\): \[\hat{\sigma}^2_{\hat{\lambda}} = \frac{\hat{\lambda}}{n}~~,~~\tilde{\sigma}^2_{\hat{\lambda}} = \frac{s^2}{n}\]
a.
When the Poisson model holds, are both estimators consistent?
Solution:
\[
\begin{aligned}
n\hat{\sigma}^2_{\hat{\lambda}} &= n\left(\frac{\hat{\lambda}}{n} \right) = \bar{Y} \overset{P}{\to} \lambda & \text{by the weak law of large numbers,} \\
n\sigma^2_{\hat{\lambda}} &= n \, var(\hat{\lambda}) = n\,var(\bar{Y}) = n\frac{\lambda}{n} = \lambda & \text{since } Y_i \sim Poisson(\lambda)
\end{aligned}
\]
Thus \(n\hat{\sigma}^2_{\hat{\lambda}}\) is a consistent estimator of \(n\sigma^2_{\hat{\lambda}}\).
For the second estimator, note that \(E[s^2] = \lambda\) and that \(Var(s^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right) \to 0\) under the Poisson model, since the fourth central moment \(\mu_4 = \lambda + 3\lambda^2\) is finite and does not depend on \(n\). By Chebyshev's inequality, \(\lim_{n \to \infty}Pr(|n\tilde{\sigma}^2_{\hat{\lambda}} - n\sigma^2_{\hat{\lambda}}| \geq \epsilon) = 0\); thus, by definition, \(n\tilde{\sigma}^2_{\hat{\lambda}} = s^2\) is a consistent estimator of \(n\sigma^2_{\hat{\lambda}}\).
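As a quick numerical check (a sketch not part of the original exercise; \(\lambda = 3\) and the sample sizes are arbitrary choices), both \(\bar{Y}\) and \(s^2\) settle near \(\lambda\) as \(n\) grows:

```r
# Minimal sketch: under Poisson(lambda = 3), both n * sigma-hat^2 = Ybar
# and n * sigma-tilde^2 = s^2 should approach lambda as n grows.
set.seed(1)
lambda <- 3
for (n in c(50, 500, 5000)) {
  y <- rpois(n, lambda)
  cat(sprintf("n = %4d: Ybar = %.3f, s^2 = %.3f (lambda = %g)\n",
              n, mean(y), var(y), lambda))
}
```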
b.
When the Poisson model holds, give the asymptotic relative efficiency of the two estimators. (Actually here you can compare exact variances since the estimators are so simple.)
Solution:
The exact variance of the first estimator is \(var(\hat{\sigma}^2_{\hat{\lambda}}) = var(\bar{Y})/n^2 = \lambda/n^3\). For the second, \(n\,var(s^2) \to \mu_4 - \sigma^4 = (\lambda + 3\lambda^2) - \lambda^2 = \lambda + 2\lambda^2\), so \(var(\tilde{\sigma}^2_{\hat{\lambda}}) \approx (\lambda + 2\lambda^2)/n^3\). Hence \[ARE(\tilde{\sigma}^2_{\hat{\lambda}}, \hat{\sigma}^2_{\hat{\lambda}}) = \frac{\lambda + 2\lambda^2}{\lambda} = 1 + 2\lambda.\] As \(\lambda > 0\), this ratio exceeds one, which indicates \(\hat{\sigma}^2_{\hat{\lambda}}\) is more efficient.
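A Monte Carlo sketch of this ratio (the settings \(\lambda = 3\), \(n = 200\), and 10000 replicates are arbitrary, not from the text): the simulated variance ratio should land near \(1 + 2\lambda = 7\).

```r
# Sketch: Monte Carlo variance ratio of the two estimators of var(lambda-hat)
# under Poisson(lambda = 3); theory predicts about 1 + 2*lambda = 7.
set.seed(2)
lambda <- 3; n <- 200; B <- 10000
est_hat <- est_tilde <- numeric(B)
for (b in 1:B) {
  y <- rpois(n, lambda)
  est_hat[b]   <- mean(y) / n  # lambda-hat / n
  est_tilde[b] <- var(y)  / n  # s^2 / n
}
var(est_tilde) / var(est_hat)  # close to 1 + 2*lambda
```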
c.
When the Poisson model does not hold but assuming second moments exist, are both estimators consistent?
Solution:
\(n\tilde{\sigma}^2_{\hat{\lambda}} = s^2\) remains consistent: writing \(s^2 = \frac{n}{n-1}\left(\frac{1}{n}\sum_{i=1}^n Y_i^2 - \bar{Y}^2\right)\), the weak law of large numbers gives \(s^2 \overset{P}{\to} E[Y^2] - \mu^2 = \sigma^2 = n\sigma^2_{\hat{\lambda}}\) whenever second moments exist.
\(n\hat{\sigma}^2_{\hat{\lambda}}\), however, is no longer necessarily a consistent estimator of \(n\sigma^2_{\hat{\lambda}}\). This is because
\[\begin{aligned}
n\hat{\sigma}^2_{\hat{\lambda}} &= n\left(\frac{\hat{\lambda}}{n} \right) = \bar{Y} \overset{P}{\to} \mu & \text{by the weak law of large numbers,} \\
n\sigma^2_{\hat{\lambda}} &= n \, var(\hat{\lambda}) = n\,var(\bar{Y}) = n\frac{\sigma^2}{n} = \sigma^2
\end{aligned}\]
Therefore, for any distribution whose mean does not equal its variance (such as the standard normal, where \(\mu = 0\) and \(\sigma^2 = 1\)), \(n\hat{\sigma}^2_{\hat{\lambda}}\) is not a consistent estimator of \(n\sigma^2_{\hat{\lambda}}\).
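A small sketch of this failure mode (the Exponential(rate = 1/2) example mirrors the simulation used in 1.13 below; here \(\mu = 2\) while \(\sigma^2 = 4\)):

```r
# Sketch: Y ~ Exponential(rate = 1/2) has mu = 2 and sigma^2 = 4, so
# n * sigma-hat^2 = Ybar targets 2 while the true n * var(lambda-hat) is 4;
# s^2 still targets the correct quantity.
set.seed(3)
y <- rexp(1e5, rate = 0.5)
c(Ybar = mean(y), s2 = var(y))  # approximately 2 versus approximately 4
```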
1.13
Define the full model as \(N(\mu, c^2\mu^2)\), where \(c\) is known. Define \(\hat{\mu}_{WLS}\) as the value that minimizes \[\rho(\mu) = \sum_{i=1}^n \frac{(Y_i- \mu)^2}{c^2\mu^2}.\]
a)
Show that \(\hat{\mu}_{WLS}\) converges in probability, but that its limit is not \(\mu\).
Solution:
Setting \(\rho'(\mu) = -\frac{2}{c^2\mu^3}\sum_{i=1}^n Y_i(Y_i - \mu)\) equal to zero gives \(\hat{\mu}_{WLS} = \sum_{i=1}^n Y_i^2 \big/ \sum_{i=1}^n Y_i\). By the weak law of large numbers, \(n^{-1}\sum Y_i^2 \overset{P}{\to} \mu^2 + c^2\mu^2\) and \(n^{-1}\sum Y_i \overset{P}{\to} \mu\), so \(\hat{\mu}_{WLS} \overset{P}{\to} \mu(1 + c^2)\), which is not \(\mu\) for any \(c > 0\).
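A brief numerical sketch of this limit (the settings \(c = 0.5\), \(\mu = 2\), and \(n = 5000\) are arbitrary choices): minimizing \(\rho\) with `optimize` agrees with the closed form \(\sum Y_i^2 / \sum Y_i\), and both land near \(\mu(1 + c^2) = 2.5\) rather than \(\mu = 2\).

```r
# Sketch: numerically minimize rho(mu) and compare with the closed form;
# both should sit near mu * (1 + c^2), not mu.
set.seed(4)
c_known <- 0.5; mu_true <- 2
y <- rnorm(5000, mean = mu_true, sd = c_known * mu_true)
rho <- function(mu) sum((y - mu)^2 / (c_known^2 * mu^2))
optimize(rho, interval = c(0.1, 10))$minimum  # numerical WLS minimizer
sum(y^2) / sum(y)                             # closed-form minimizer
```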
Is \(\hat{\mu}^*_{WLS}\) a consistent estimator of \(\mu\)? If so, find its asymptotic variance and compare to \(\hat{\mu}_{MLE}\) and \(\hat{\mu}_{MOM}\).
Thus \(\hat{\mu}^*_{WLS}\) is a consistent estimator of \(\mu\).
Note that \(\hat{\mu}^*_{WLS} = \hat{\mu}_{MLE}\). Thus, \(Var(\hat{\mu}^*_{WLS}) = Var(\hat{\mu}_{MLE}) = \frac{c^2\mu^2}{n(1+2c^2)}\), and the relative efficiency between the two is 1. That is, \(ARE(\hat{\mu}^*_{WLS}, \hat{\mu}_{MLE}) = 1\). And, according to (1.17, p. 18) in the book, \(ARE(\hat{\mu}_{MOM}, \hat{\mu}^*_{WLS}) = 1 + 2c^2\).
Thus \(\hat{\mu}_{MLE}\) and \(\hat{\mu}^*_{WLS}\) are equally efficient, and each is more efficient than \(\hat{\mu}_{MOM}\) by the factor \(1 + 2c^2\).
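A hedged simulation check of this factor (the settings \(\mu = 2\), \(c = 1\), \(n = 200\), and 5000 replicates are arbitrary; the MLE closed form is the one from (1.16, p. 17) used in the simulation code below):

```r
# Sketch: under N(mu, c^2 mu^2) with mu = 2, c = 1, compare Monte Carlo
# variances of mu-hat_MOM = Ybar and mu-hat_MLE from (1.16);
# the ratio should be near 1 + 2c^2 = 3.
set.seed(5)
mu <- 2; c_known <- 1; n <- 200; B <- 5000
mom <- mle <- numeric(B)
for (b in 1:B) {
  y  <- rnorm(n, mean = mu, sd = c_known * mu)
  m1 <- mean(y)
  m2 <- mean(y^2)
  mom[b] <- m1
  mle[b] <- (sqrt(m1^2 + 4 * c_known^2 * m2) - m1) / (2 * c_known^2)
}
var(mom) / var(mle)  # approximately 1 + 2c^2
```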
e)
Find \(E[\rho_*'(\mu)]\) and show that it does not equal zero.
To check on the asymptotic variance expression in (1.20, p. 18), generate 1000 samples of size \(n=20\) from the exponential density \((1/2)\exp(-y/2)\), which has mean \(\mu=2\) and variance \(4\), so that in terms of the model \(\sigma^2=1\). For each sample, calculate \(\hat{\mu}_{MLE}\) in (1.16, p. 17). Then compute (1.20, p. 18) for this exponential density and compare it to \(n=20\) times the sample variance of the 1000 values of \(\hat{\mu}_{MLE}\). Repeat for \(n=50\).
Solution:
```r
library(kableExtra)
set.seed(123)

# Set constants
mu <- 2
pop_var <- 4
c <- 1
skew <- 2
kurt <- 9

# For sample size 20
sample_size <- 20

# Compute asymptotic variance as defined in book (1.21)
a_var_20 <- (c^2 * mu^2 / (sample_size * (1 + 2 * c^2)^2)) *
  (1 + 2 * c^2 + 2 * c * skew + c^2 * (kurt - 3))

# Simulate mu_mle
sample <- list()
mu_mle <- rep(NA, 1000)
for (i in 1:1000) {
  sample[[i]] <- rexp(n = sample_size, rate = 0.5)
  first_moment  <- mean(sample[[i]])
  second_moment <- sum(sample[[i]]^2) / sample_size
  mu_mle[i] <- (sqrt(first_moment^2 + 4 * c^2 * second_moment) - first_moment) /
    (2 * c^2)
}
sample_var_20 <- var(mu_mle)

# Repeat for sample size 50
sample_size <- 50

# Compute asymptotic variance as defined in book (1.21)
a_var_50 <- (c^2 * mu^2 / (sample_size * (1 + 2 * c^2)^2)) *
  (1 + 2 * c^2 + 2 * c * skew + c^2 * (kurt - 3))

# Simulate mu_mle
sample <- list()
mu_mle <- rep(NA, 1000)
for (i in 1:1000) {
  sample[[i]] <- rexp(n = sample_size, rate = 0.5)
  first_moment  <- mean(sample[[i]])
  second_moment <- sum(sample[[i]]^2) / sample_size
  mu_mle[i] <- (sqrt(first_moment^2 + 4 * c^2 * second_moment) - first_moment) /
    (2 * c^2)
}
sample_var_50 <- var(mu_mle)

sample_size_20 <- c(sample_var_20, a_var_20)
sample_size_50 <- c(sample_var_50, a_var_50)
kable(rbind(sample_size_20, sample_size_50),
      col.names = c("Sample Variance", "Asymptotic Variance"))
```
|          | Sample Variance | Asymptotic Variance |
|----------|-----------------|---------------------|
| \(n=20\) | 0.2465302       | 0.2888889           |
| \(n=50\) | 0.1007942       | 0.1155556           |
The sample variance approaches the asymptotic variance as sample size increases.