Here is a video [3 mins] to introduce the chapter.
7.1 Overview
In this chapter we consider two extensions to the log-linear models studied in the previous chapter. Recall that the log-linear model assumes that each observation follows a Poisson distribution, but in many cases these will not be all independent. Instead, row totals or the grand total will be fixed. This structure means that the observed counts will no longer follow the Poisson distribution and hence modification to the theory and calculations are needed.
7.2 Contingency tables with fixed marginals
7.2.1 Introduction
In the melanoma example, see Table 6.2, the overall total number of observations was exactly \(400\). If the data were truly Poisson, this would be a suspiciously round number. In reality, this total was fixed by design, so we should take into account the fact that \(y_{++} = 400\) and fit a more appropriate model. Also, recall the flu vaccine example of Table 6.5. Suppose that some of the marginal totals are fixed by design. For example, that the number of patients in each arm of the trial (38 in the Placebo group, 35 in the Vaccine group) was fixed before data collection started. In this case, we should also take into account the fact that \(y_{1+}=38\) and \(y_{2+}=35\) and fit a more suitable model.
Before we can re-consider these data sets, we first need to establish some theoretical results to deal with such conditional distribution cases.
7.2.2 Modelling
Consider, again, a general two-way contingency table such as that in Table 7.1. The fixed marginals are the column sums: \(y_{+j},\) the row sums: \(y_{i+},\) and the overall sum: \(y_{++}.\)
Table 7.1: A two-way contingency table with \(k_1\) rows and \(k_2\) columns, where each entry \(y_{ij}\) is a count.
1
2
\(\dots\)
\(j\)
\(\dots\)
\(k_2\)
Total
1
\(y_{11}\)
\(y_{12}\)
\(\dots\)
\(y_{1j}\)
\(\dots\)
\(y_{1k_2}\)
\(y_{1+}\)
\(\dots\)
\(\dots\)
\(\dots\)
\(\dots\)
\(\dots\)
\(\dots\)
\(\dots\)
\(\dots\)
\(i\)
\(y_{i1}\)
\(y_{12}\)
\(\dots\)
\(y_{1j}\)
\(\dots\)
\(y_{ik_2}\)
\(y_{i+}\)
\(\dots\)
\(\dots\)
\(\dots\)
\(\dots\)
\(\dots\)
\(\dots\)
\(\dots\)
\(\dots\)
\(k_1\)
\(y_{k_11}\)
\(y_{k_12}\)
\(\dots\)
\(y_{k_1j}\)
\(\dots\)
\(y_{k_1k_2}\)
\(y_{k_1+}\)
Total
\(y_{+1}\)
\(y_{+2}\)
\(\dots\)
\(y_{+j}\)
\(\dots\)
\(y_{+k_2}\)
\(y_{++}\)
Suppose that we have fixed row totals \(y_{i+},\) or a fixed overall total \(y_{++}\), how would such constraints affect the distribution of the random variables \(Y_{ij}\)?
To answer this question, we will need the following results.
Proposition 7.1 Let \(Y_{ij}\), for \(i=1,\dots,k_1\) and \(j=1,\dots, k_2\), be the responses in a two-way contingency table, with \(k_1\) rows and \(k_2\) columns. Further, assume that the counts follow independent Poisson distributions with corresponding means \(\lambda_{ij}\), that is \(Y_{ij}\sim \mbox{Po}(\lambda_{ij})\). Then, the distributions of the row sums are, \[
Y_{i+} \sim \mbox{Po}(\lambda_{i+}),
\] where \(\displaystyle Y_{i+} = \sum_{j=1}^{k_2} Y_{ij}\) and \(\displaystyle \lambda_{i+}=\sum_{j=1}^{k_2} \lambda_{ij}\).
Notes:
Here we are not yet assuming that the marginal totals are fixed.
Similar results hold when considering column sums and the overall sum.
Proof: See MATH2715, Chapter 7 on MGF Properties of Linear Functions.
Proposition 7.2 Let \(Y_{ij}\sim \mbox{Po}(\lambda_{ij})\) for \(i=1,\dots,k_1\) and \(j=1,\dots, k_2\) then the conditional distribution of each count in a row given that the row sum is fixed is given by \[
\left( Y_{ij} \, |\, Y_{i+} = m \right)
\sim
\mbox{Bin}\left(m,
\frac{\lambda_{ij}}{\lambda_{i+}}
\right).
\] That is, conditioning on the sum of independent Poisson random variables induces a Binomial distribution.
Proof: First define \(Y_{i-j}= Y_{i+}-Y_{ij}\) with correspondingly \(y_{i-j}= m-y_{ij}\) and also \(\lambda_{i-j}= \lambda_{i+}-\lambda_{ij}\).
Consider the conditional probability mass function \[\begin{align*}
p(Y_{ij} = y_{ij} | Y_{i+} = m)
& = \frac{p(Y_{ij} = y_{ij}, Y_{i+} = m)} {p(Y_{i+} = m)}
\quad \text{by conditional probability definition}\\
& = \frac{p(Y_{ij} = y_{ij}, Y_{i-j} = m-y_{ij})} {p(Y_{i+} = m)} \quad \text{rearranging} \\
& = \frac{p(Y_{ij} = y_{ij})p(Y_{i-j} = m-y_{ij})} {p(Y_{i+} = m)} \quad \text{by independence}.
\end{align*}\] Then, each of these follows a Poisson distribution: \(Y_{ij} \sim \text{Po}(\lambda_{ij})\), \(Y_{i-j}\sim \text{Po}(\lambda_{i-j})\), and \(Y_{i+} \sim \text{Po}(\lambda_{i+})\). Therefore, \[\begin{align*}
p(Y_{ij} = y_{ij} | Y_{i+} = m)
& =
\frac{\lambda_{ij}^{y_{ij}}e^{-\lambda_{ij}}}{y_{ij}!}
\frac{\lambda_{i-j}^{m-y_{ij}}e^{-\lambda_{i-j}}}{m-y_{ij}!}
\Bigg/
\frac{\lambda_{i+}^{m} e^{-\lambda_{i+}}}{m!} \\
& =
{m \choose y_{ij}}
\left(\frac{\lambda_{ij}}{\lambda_{i+}} \right) ^{y_{ij}}
\left(\frac{\lambda_{i-j}}{\lambda_{i+}} \right) ^{m-y_{ij}}\\
& =
{m \choose y_{ij}}
\pi ^{y_{ij}}
\left(1-\pi\right) ^{m-y_{ij}}
\end{align*}\]
which is the probability mass function of a \(\text{Bin}(m, \pi)\) random variable, where \(\pi = \lambda_{ij} / \lambda_{i+}\).
Proposition 7.3 Let \(Y_{ij}\sim \mbox{Po}(\lambda_{ij})\) for \(i=1,\dots,k_1\) and \(j=1,\dots, k_2\) then the joint conditional distribution of the counts in a row given that the row sum is fixed is given by \[
p(Y_{i1},\dots, Y_{i k_2} \, | \, Y_{i+}=m)
=
\frac{m!}{y_{i1}!\cdots y_{i k_2}!}
\pi_{i1}^{y_{i1}} \cdots\pi_{ik_2}^{y_{ik_2}}
\] with \[
\pi_{ij}=\lambda_{ij}/\lambda_{i+}
\quad \mbox{ and } \quad
\sum_{j=1}^{k_2} \pi_{ij} =1.
\] This distribution is called the Multinomial distribution, with index \(m\) and parameters \(\pi_{ij}, j=1,\dots k_2\), which can be denoted \[
(Y_{i1},\dots, Y_{i k_2} \, | \, Y_{i+}=m)
\sim
\text{Mult}(m, \pi_{i1}, \dots \pi_{ik_2}
).
\]
Proof: Omitted.
Proposition 7.4 Let \(Y_{ij}\sim \mbox{Po}(\lambda_{ij})\) for \(i=1,\dots,k_1\) and \(j=1,\dots, k_2\) then the conditional distribution of the counts given that the overall sum is fixed is \[
(Y_{11},\dots, Y_{k_1k_2} \, | \, Y_{++}=m)
\sim
\text{Mult}(m, \pi_{11},\dots, \pi_{k_1k_2}).
\] with probability mass function given by \[
p(Y_{11},\dots, Y_{k_1k_2} \, | \, Y_{++}=m)
=
\frac{m!}{y_{11}!\cdots y_{k_1k_2}!}
\pi_{11}^{y_{11}} \cdots\pi_{k_1k_2}^{y_{k_1k_2}}
\] where \(\pi_{ij} = \lambda_{ij}/\lambda_{++}\).
Proof: Omitted.
7.3 Product-multinomial models
The Poisson contingency table model assumes that each cell count (and therefore row and column total) is random – specifically, Poisson.
However, in many examples, some of the marginal totals are fixed by design.
For example, as discussed in the Flu-vaccine example of Section 6.2.2, the numbers in the placebo (38) and vaccine (35) groups are not random, but fixed by the experimenter. Further, we are not interested in modelling these row totals but instead the response to treatment (placebo or vaccine). Thus, in the Flu vaccine example, we should condition the Poisson model on the row totals.
From Equation 6.1 the saturated model for a two-factor table is \[
\lambda_{ij} =
\exp\left\{\mu + \alpha_i + \beta_j + (\alpha \beta)_{ij}\right\}.
\] Then, using the result in Proposition 7.3, we see that \[
\pi_{ij} = \frac{\lambda_{ij}}{\lambda_{i+}}
=
\frac{\exp\left\{\mu + \alpha_i + \beta_j + (\alpha \beta)_{ij}\right\}}{\sum_j \exp\{\mu + \alpha_i + \beta_j + (\alpha \beta)_{ij}\}}
\] which simplifies to give \[
\pi_{ij}
=
\frac{\exp\left\{\beta_j + (\alpha \beta)_{ij}\right\}}{\sum_j \exp\left\{\beta_j + (\alpha \beta)_{ij}\right\} }.
\tag{7.1}\] Thus, for each row, that is each \(i\), this multinomial distribution depends on \(\beta_j\) and \((\alpha\beta)_{ij}\) for \(j=1,\dots, k_2\), but not on \(\mu\) and \(\alpha_i\). This means that the data in each row of the table are independent and hence, overall, the model likelihood is given by \[
L(\boldsymbol{\beta}; \mathbf{y}) =
\prod_{i=1}^{k_1}
p(Y_{i1},\dots, Y_{i k_2} \, | \, Y_{i+}=y_{i+}),
\] with \(\boldsymbol{\beta}=\{\mu, \alpha_1,\dots, \alpha_{k_1}, \beta_1,\dots, \beta_{k_2}, (\alpha\beta)_{11},\dots, (\alpha\beta)_{k_1k_2}\}\) and where \[
p(Y_{i1},\dots, Y_{i k_2} \, | \, Y_{i+}=y_{i+})
=
\frac{y_{i+}!}{y_{i1}!\cdots y_{i k_2}!}
\pi_{i1}^{y_{i1}} \cdots\pi_{ik_2}^{y_{ik_2}} .
\] Further, using Equation 7.1 the log-likelihood is given by \[
l(\boldsymbol{\beta}; \mathbf{y}) =
\text{const} +
\sum_{i=1}^{k_1} \sum_{j=1}^{k_2}
y_{ij}
\left\{
\beta_j + (\alpha \beta)_{ij}
- \log \sum_j \exp\left\{\beta_j + (\alpha \beta)_{ij}\right\}\right\}.
\] Recall that the saturated model will have the fitted values equal to the data values in all cases. Therefore, we concentrate on the product multinomial models with interactions set to zero, that is \((\alpha\beta)_{ij}=0\), then \[
l(\boldsymbol{\beta}; \mathbf{y}) =
\text{const} +
\sum_{i=1}^{k_1} \sum_{j=1}^{k_2}
y_{ij} \left\{ \beta_j
- \log \sum_j \exp\left\{\beta_j \right\}\right\}
\] which simplifies to \[
l(\boldsymbol{\beta}; \mathbf{y}) =
\text{const} +
\sum_{j=1}^{k_2}
y_{+j} \left\{ \beta_j
- y_{++}\log \sum_j \exp\left\{\beta_j \right\}\right\}.
\] Differentiating this with respect to single parameter \(\beta_k\) and setting to zero gives \[
\frac{y_{+k}}{y_{++}} =
{ e^{\hat\beta_k}}\Big/ {\sum_j e^{\hat\beta_j}}
\tag{7.2}\] which is identical to the parameter estimate for the previous independence Poisson model. Thus, in this case, if we want to fit the product multinomial model (which is hard) we can instead fit the Poisson model (which is easy). Note, however, that these estimates only coincide when \(\alpha_1, \dots, \alpha_{k_1}\) are included in the Poisson independence model, even though we see that these are eliminated from the multinomial model. In fact, these parameter estimates reflect the fixed marginal totals in the multinomial situation.
Although this derivation has only considered the cases of the independence Poisson model: \(\log \lambda_{ij} = \mu+\alpha_i+\beta_j\) and the product multinomial model with \((\alpha\beta)_{ij}=0\), the following result gives general guidance.
Theorem 7.1 Tables with fixed margin sums can be analyzed using a multinomial or product-multinomial model as though they were independent Poisson models, with log-linear predictor, provided terms corresponding to the fixed margins are included in the model. Then the MLEs for the two models yield the same values for the parameters of interest and have the same deviances.
Proof: Omitted. For details see Birch, 1963, JRSS(B), 220–233. The proof is not especially difficult but is a bit messy. It depends on the property of the Poisson model with log link function that observed totals equals fitted totals for the effects present in the Poisson model.
7.4 Model fitting in R
Consider again he flu vaccine data described in Section 6.2.2 with data repeated below. Recall that 38 people were recruited to the Placebo group and 35 to the Vaccine group and that we assume these were fixed before data collection started.
Antibody responses to flu vaccine from a randomized controlled trial.
Group
Low
Moderate
High
Total
Placebo
25
8
5
38
Vaccine
6
18
11
35
Total
31
26
16
73
The fixed row sums mean that a product-multinomial model is appropriate but this can be fitted using the usual same command as the Independence Poisson model as long as we include the term for the row effect, that is for Group.
Call:
glm(formula = count ~ antibodyF + groupF, family = "poisson")
Deviance Residuals:
1 2 3 4 5 6
2.040 -1.630 -1.247 -2.615 1.469 1.128
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.78111 0.21184 13.129 <2e-16 ***
antibodyF2 -0.17589 0.26593 -0.661 0.5083
antibodyF3 -0.66140 0.30783 -2.149 0.0317 *
groupF2 -0.08224 0.23428 -0.351 0.7256
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 23.807 on 5 degrees of freedom
Residual deviance: 18.643 on 2 degrees of freedom
AIC: 51.771
Number of Fisher Scoring iterations: 5
Hence, our proposed model has a residual deviance of \(18.643\) on \(2\) degrees of freedom. The chi-square test gives a \(p\)-value of \[p = \mbox{Pr}(\chi^2_2 > 18.643) = 8.95\times 10^{-5},\] so we reject the independence model in favour of the saturated model.
For the independence model \[
\hat\pi_{ij} =
{ e^{\hat\beta_j}}\Big/ {\sum_j e^{\hat\beta_j}}
\] whose right-hand side does not depend on \(i\) and hence is the same for all rows. The corresponding fitted values are then given by \(\hat y_{ij} = y_{i+} \hat \pi_{ij}\).
7.5 Exercises
7.1 Consider a \(2 \times m\) contingency table with entries \(y_{ij}, \ i=1,2,\ j=1,\ldots,m\). The rows label STATUS represents alive (STATUS = 1) or dead (STATUS=2) and the columns label AGE represents \(m\) age groups \((\mbox{AGE} = 1,\ldots,m\)) and not the exact age.
Suppose that a Poisson model \(P(\lambda_{ij})\) with \[
\log(\lambda_{ij}) = \delta + \alpha_i+ \beta_j + \gamma_i \, AGE_j
\] is considered appropriate. Note that \(AGE_j=j\) is treated as a quantitative variable in the last term. What additional constraints on the parameters should be included to make estimation unique – that is to remove the aliasing problem.
Suppose that the column totals, \(Y_{+j}\), are fixed. Explain why a product binomial model \(B(y_{+j},\pi_j), \ j=1,\ldots,m\), is suitable in this case and find the form of \(\pi_j.\) Which parameters can be estimated under this model?
7.2 In the product-multinomial model for a table with fixed row totals, model parameter estimates are given as the solution of Equation 7.2 and it was claimed that this had the same form as for the previous independence Poisson model (without fixed row totals). Use Equation 6.4 and Equation 6.6 to show that indeed the estimation equations are identical and hence that the parameter estimates will be the same.