Exercise 4.1

We have that the estimate of the natural parameter is given by \(\hat \theta = (b')^{-1}(\bar y)\) (a numerical check of the three resulting estimates is sketched after the list), hence

  1. For the binomial, \(b(\theta)=m\, \log(1+e^\theta)\) and so \(b'(\theta)= m\, e^\theta/(1+e^\theta)\). The estimate, \(\hat \theta\), is given by the solution of \[ b'(\hat\theta)= m \frac{e^{\hat\theta}}{1+e^{\hat\theta}} = \bar y \] and so \(\theta\) can be estimated as \[\hat \theta = \log \left\{ \frac{\bar y /m}{1-\bar y/m} \right\} = \log \left\{ \frac{\bar y }{m-\bar y} \right\}. \] We know that the estimator, \(\hat \theta = \log \left\{ {\bar Y }/{(m-\bar Y)} \right\}\), is at least asymptotically unbiased. Its variance is given by \[ \mbox{Var}(\hat\theta) = \frac{\phi}{n\, b''(\theta_0)} \] where \(\theta_0\) is the true value. Here \(\phi=1\) and \[ b''(\theta)= m \frac{e^{\theta}}{(1+e^{\theta})^2}, \] and hence \[ \mbox{Var}(\hat\theta) = \frac{1}{nm}\frac{(1+e^{\theta_0})^2}{e^{\theta_0}}. \]

    Notice that this depends on the unknown true value \(\theta_0\).

  2. For the geometric, \(b(\theta)= \theta-\log(1-e^\theta)\) and so \(b'(\theta)= 1+ e^\theta/(1-e^\theta) = 1/(1-e^\theta)\). The estimate, \(\hat \theta\), is given by the solution of \[ b'(\hat\theta)= \frac{1}{1-e^{\hat\theta}} = \bar y \] and so \[ \hat \theta = \log \left\{ 1-{1}\big/{\bar y} \right\}. \]

    We know that the estimator, \(\hat \theta = \log \left\{1-1/\bar Y \right\}\), is at least asymptotically unbiased. To find the variance, note first that \[ b''(\theta)= \frac{e^{\theta}}{(1-e^{\theta})^2} \] and hence, with \(\phi=1\), \[ \mbox{Var}(\hat\theta) = \frac{1}{n}\frac{(1-e^{\theta_0})^2}{e^{\theta_0}}. \]

  3. For the exponential, \(\theta=-\lambda\) and \(b(\theta)=-\log (-\theta)\) (these are not in our notes, so we first need to write the exponential distribution in exponential family form). Hence, \(b'(\theta)= -{1}/{\theta}\). The estimate, \(\hat \theta\), is given by the solution of \[ b'(\hat\theta)= -\frac{1}{\hat \theta} = \bar y \] and so \[\hat \theta = -{1}\big/{\bar y}. \] For the variance, \[ b''(\theta)= \frac{1}{\theta^2} \] and hence, with \(\phi=1\), \[ \mbox{Var}(\hat\theta) = \frac{\theta_0^2}{n}. \]
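
The sketch below is not part of the original solution; it simply checks the three closed-form estimates numerically on simulated data, with arbitrary choices of seed, sample size and parameter values.

# Illustrative check of the three estimates on simulated data
set.seed(1)
n = 5000

# Binomial(m, p): true theta = log(p/(1-p)), estimate log(ybar/(m-ybar))
m = 10; p = 0.3
y = rbinom(n, size=m, prob=p)
c(estimate = log(mean(y)/(m-mean(y))), true = log(p/(1-p)))

# Geometric on {1,2,...}: true theta = log(1-p), estimate log(1-1/ybar)
p = 0.4
y = rgeom(n, prob=p) + 1          # rgeom() counts failures, so add 1
c(estimate = log(1 - 1/mean(y)), true = log(1-p))

# Exponential(lambda): true theta = -lambda, estimate -1/ybar
lambda = 2
y = rexp(n, rate=lambda)
c(estimate = -1/mean(y), true = -lambda)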


Exercise 4.2

For the exponential family form of the normal distribution we have \(\theta=\mu\), \(\phi=\sigma^2\), \(b(\theta)=\theta^2/2\), hence \(\hat\theta=\hat\mu\). Then, @eq-ehattheta gives \[ \mbox{E}[\hat\mu]=\mbox{E}[\hat\theta]\approx \theta_0 = \mu. \] Further, since \(b''(\theta)=1\), \[ \mbox{Var}[\hat\theta]= \frac{\sigma^2}{n}. \] Hence, the approach based on the exponential family leads to the familiar results, as we would expect.
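
As a quick numerical check of these familiar results (not part of the original exercise), the sketch below uses arbitrary choices of \(\mu\), \(\sigma\) and \(n\) and compares the empirical mean and variance of \(\hat\mu=\bar Y\) over repeated samples with \(\mu\) and \(\sigma^2/n\).

# Illustrative check: repeated normal samples with arbitrary mu, sigma and n
set.seed(2)
mu = 5; sigma = 2; n = 25
mu.hat = replicate(10000, mean(rnorm(n, mean=mu, sd=sigma)))
c(mean = mean(mu.hat), target = mu)
c(variance = var(mu.hat), target = sigma^2/n)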


Exercise 4.3

Following the principle of maximum likelihood estimation we maximize the log-likelihood function, that is \[ \hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}} l (\boldsymbol{\beta}; \mathbf{y}, \phi), \] where \(l (\boldsymbol{\beta}; \mathbf{y}, \phi)\) is the log-likelihood of the parameters \(\boldsymbol{\beta}\) given data \(\mathbf{y}\), assuming that the nuisance scale parameter is fixed.

For the normal linear regression model the log-likelihood is given by \[ l = -\frac{n}{2}\log (2\pi \sigma^2) -\frac{1}{2\sigma^2}(\mathbf{y}-X\boldsymbol{\beta})^T(\mathbf{y}-X\boldsymbol{\beta}). \] Differentiating with respect to the vector \(\boldsymbol{\beta}\) gives \[ \frac{d\, l}{d\, \boldsymbol{\beta}} = \frac{1}{\sigma^2} X^T(\mathbf{y}-X\boldsymbol{\beta}), \] which represents a system of \(p\) equations in \(p\) unknowns. Setting this to zero gives the required result of @eq-linearregressionmle, \[ \hat{\boldsymbol{\beta}} = \left(X^T X\right)^{-1} X^T \mathbf{y}. \]
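
The result is easy to verify numerically. The sketch below (with an arbitrary simulated design and coefficients) compares the direct matrix calculation \(\left(X^T X\right)^{-1} X^T \mathbf{y}\) with the coefficients returned by lm().

# Illustrative check of the least-squares formula on simulated data
set.seed(3)
n = 50
x = runif(n, 0, 10)
X = cbind(1, x)                       # design matrix with an intercept
y = 2 + 0.5*x + rnorm(n)              # arbitrary true coefficients 2 and 0.5
beta.hat = solve(t(X) %*% X) %*% t(X) %*% y
cbind(direct = drop(beta.hat), lm = coef(lm(y ~ x)))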


Exercise 4.4


Exercise 4.5

In the notes, we analysed the birthweight data using \(F\)-tests based on the residual sum of squares \(R\) and residual degrees of freedom \(r'\) (note the change of notation: \(r\) was originally used for the degrees of freedom, but we later used \(r\) for the number of parameters). In other places, we described tests using the deviance \(D\) and the number of parameters \(r\).

Since \(D = R\) for a normal GLM and \(r' = n - r\), where \(n\) is the number of observations, the \(F\)-statistic comparing Model 0 and Model 1 can be written \[\begin{equation*} F_{01} = \frac{(R_0 - R_1) / (r_0' - r_1')} {R_1 / r_1'} = \frac{(D_0 - D_1) / (r_1 - r_0)} {D_1 / (n-r_1)}, \end{equation*}\] which we compared to an \(F\)-distribution on \(r_1 - r_0\) and \(n - r_1\) degrees of freedom. This is almost equivalent to the test statistic: \[\begin{equation*} \frac{(D_1 - D_2) / (r_2 - r_1)}{D_3/(n-r_3)}, \end{equation*}\] which we compared to an \(F_{r_2 - r_1, n - r_3}\) distribution.

The only differences are (i) the arbitrary indexing of the models, (0, 1) or (1, 2), and (ii) that in the second form we use a "large" Model 3 to estimate \(\sigma^2\) by \(\hat{\phi} = D_3/(n-r_3)\). In practice the differences will usually be small, and the two approaches to variable selection are essentially equivalent.
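
As an illustration (not part of the original solution), the sketch below fits two nested normal linear models to arbitrary simulated data and confirms that the \(F\)-statistic built directly from the residual sums of squares matches the one reported by anova().

# Illustrative check of the F-statistic on arbitrary simulated data
set.seed(4)
n  = 40
x1 = rnorm(n); x2 = rnorm(n)
y  = 1 + 0.8*x1 + 0.3*x2 + rnorm(n)

m0 = lm(y ~ x1)                  # Model 0, with r0 parameters
m1 = lm(y ~ x1 + x2)             # Model 1, with r1 parameters

R0 = sum(resid(m0)^2);  R1 = sum(resid(m1)^2)
r0 = length(coef(m0));  r1 = length(coef(m1))
F01 = ((R0 - R1)/(r1 - r0)) / (R1/(n - r1))
c(by.hand = F01, anova = anova(m0, m1)$F[2])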


Exercise 4.6

Start this question by creating an R script file and entering appropriately chosen R commands, such as those below.

# Define variables, your choice of names, for the data
dose = seq(20,100,by=20)
m    = c(23,18,14,23,22)     # This is the Group size
y    = c(20,14,5,6,1)        # This is the number re-treated

# Now plot the data as a scatter plot
par(mfrow=c(1,2))
plot(dose,y/m, pch=16, ylim=c(0,1), xlim=c(0,120),
     ylab="Proportion re-treated", xlab="Dose, mg/kg")
abline(h=c(0,1), lty=2)

# Create a data frame to hold the data for the lm
data = data.frame(dose=dose, prop=y/m)

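# Fit the linear model and add the fitted straight line to the plot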
my.lm = lm(prop~dose, data=data)
abline(my.lm)

# Now get ready for the glm
plot(dose,y/m, pch=16, ylim=c(0,1), xlim=c(0,120),
     ylab="Proportion re-treated", xlab="Dose, mg/kg")
abline(h=c(0,1), lty=2)

# Now re-define the response as "successes" and "failures"
response = cbind(y,m-y)
my.glm = glm(response~dose, family = "binomial")

# For the fitted curve evaluate the fitted function at many doses
newval = data.frame(dose=seq(0,125,length.out=100))
predicted = predict(my.glm, newdata = newval, type="response")

lines(newval$dose, predicted)

Both models fit the data well, in the sense that the data points are never far from the fitted functions. However, knowing that proportions can never lie outside the interval zero to one, the binomial logistic model is preferable.

To emphasize the similarity of the fitted models within the range of the data, consider the model residuals as shown in the following graphs.

par(mfrow=c(1,2))

plot(my.lm$fitted, my.lm$residuals, pch=16, ylim=0.2*c(-1,1),
     xlab="Fitted values", ylab="Residuals")
abline(h=0, lty=2)

glm.resids = residuals(my.glm,  type="response")
plot(my.glm$fitted, glm.resids, pch=16, ylim=0.2*c(-1,1),
     xlab="Fitted values", ylab="Residuals")
abline(h=0, lty=2)

Although these are not identical, neither set of residuals is clearly better. Further, the residual sum of squares (RSS) for the linear model is 0.022, whereas for the logistic model it is 0.0161. From the data alone it is impossible to choose between the models. Knowing the background, however, the better model is clearly the logistic.
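
For reference, the residual sums of squares quoted above can be computed from the fitted objects created earlier; the following lines should reproduce the values up to rounding.

# Residual sums of squares on the proportion scale
sum(my.lm$residuals^2)      # linear model
sum(glm.resids^2)           # logistic model, response-scale residuals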


End of Solutions to Chapter 4 Exercises