For the problem of inference about a real parameter $\mu$ on the basis of $n$ independent observations $x_1, \cdots, x_n$ (or $\mathbf{x}$), each distributed as $N(\mu, \sigma^2)$ with $\sigma^2$ "unknown", it is commonly asserted, for example in [2] p. 465, that the Bayesian method is close to other forms of inference (significance tests, confidence and fiducial intervals), since it too may be based on $s_{n-1}(t)$, the probability density function (pdf) of Student's $t$ with $n - 1$ degrees of freedom. The Bayesian role of $s_{n-1}(t)$ is that of the posterior pdf of $t = [n(n-1)/S]^{\frac{1}{2}}(\bar x - \mu)$, where $\bar x = n^{-1}\sum x_i$ and $S = \sum (x_i - \bar x)^2$ are the sufficient statistics for $\mu$ and $\sigma^2$. It results from formal use in Bayes's Theorem of the improper prior pdf for $\mu$ and $\sigma^2$ described by "independence of $\mu$ and $\log \sigma$ and their uniform distributions on $R^1$".

More convincing support for $s_{n-1}(t)$ as a posterior pdf could be obtained by a detailed examination of the product space of proper (integrable) prior pdfs and $(\bar x, S)$, and the determination of the essential features of the region where replacement of the posterior pdf of $\mu$ by that derived from $s_{n-1}(t)$ does not seriously affect inference about $\mu$. In this note, attention will be confined to prior pdfs in the following class. Let $\omega$ denote the Fisher information $\sigma^{-2}$ and let $I\{\cdot\}$ denote the 0-1 indicator function of a set. Consider prior pdfs for $\mu$ and $\omega$ drawn from the sequence
\begin{equation*}\tag{1.1}p_\alpha(\mu, \omega) \propto \omega^{-1} I\{\mu, \omega \mid \mu_{1\alpha} < \mu < \mu_{2\alpha},\ \omega_{1\alpha} < \omega < \omega_{2\alpha}\}, \quad \alpha = 1, 2, \cdots.\end{equation*}
For each member of this sequence, $\mu$ and $\omega$ are independent, while $\mu$ and $\log \omega$ (or $\log \sigma$) have rectangular distributions (from which it is clear that the choice of (1.1) is motivated by the improper prior pdf for $\mu$ and $\sigma^2$ above). The posterior pdf of $\mu$ obtained by combining $p_\alpha(\mu, \omega)$ with the likelihood function $p(\mathbf{x} \mid \mu, \omega) \propto \omega^{\frac{1}{2}n} \exp[-\frac{1}{2}n\omega(\bar x - \mu)^2 - \frac{1}{2}\omega S]$ is proportional to
\begin{equation*}\int_{\omega_{1\alpha}}^{\omega_{2\alpha}} \omega^{\frac{1}{2}n-1} \exp[-\tfrac{1}{2}n\omega(\bar x - \mu)^2 - \tfrac{1}{2}\omega S]\, d\omega \cdot I\{\mu \mid \mu_{1\alpha} < \mu < \mu_{2\alpha}\},\end{equation*}
giving, with the change of variable $u = \omega[1 + t^2/(n-1)]S$,
\begin{equation*}\tag{1.2}p_\alpha(t \mid \mathbf{x}) \propto s_{n-1}(t) \int_{[1+t^2/(n-1)]S\omega_{1\alpha}}^{[1+t^2/(n-1)]S\omega_{2\alpha}} u^{\frac{1}{2}(n-2)} e^{-\frac{1}{2}u}\, du \cdot I\{t \mid [n(n-1)/S]^{\frac{1}{2}}(\bar x - \mu_{2\alpha}) < t < [n(n-1)/S]^{\frac{1}{2}}(\bar x - \mu_{1\alpha})\}.\end{equation*}
To obtain $s_{n-1}(t)$, Jeffreys (p. 68 of [1]) uses a convergence argument which, in our specialisation, would involve letting
\begin{equation*}\tag{1.3}\mu_{1\alpha} \rightarrow -\infty, \quad \mu_{2\alpha} \rightarrow \infty, \quad \omega_{1\alpha} \rightarrow 0, \quad \omega_{2\alpha} \rightarrow \infty \quad \text{as } \alpha \rightarrow \infty\end{equation*}
as necessary and sufficient conditions for
\begin{equation*}\tag{1.4}\lim p_\alpha(t \mid \mathbf{x}) \equiv s_{n-1}(t)\end{equation*}
for all values of $\mathbf{x}$. In (1.4), $\mathbf{x}$ is kept fixed. However, in changing $\alpha$, we are changing the prior distribution used, so that keeping $\mathbf{x}$ fixed has no obvious relevance.
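Since the passage from the $\omega$-integral to (1.2) is stated without intermediate steps, the following display records the computation; it uses only the definitions of $t$ and $u$ already given. From $t = [n(n-1)/S]^{\frac{1}{2}}(\bar x - \mu)$ we have $n(\bar x - \mu)^2 = St^2/(n-1)$, so that
\begin{align*}
n\omega(\bar x - \mu)^2 + \omega S &= \omega S[1 + t^2/(n-1)] = u,\\
\int_{\omega_{1\alpha}}^{\omega_{2\alpha}} \omega^{\frac{1}{2}n-1} e^{-\frac{1}{2}u}\, d\omega &= \big(S[1 + t^2/(n-1)]\big)^{-\frac{1}{2}n} \int_{[1+t^2/(n-1)]S\omega_{1\alpha}}^{[1+t^2/(n-1)]S\omega_{2\alpha}} u^{\frac{1}{2}(n-2)} e^{-\frac{1}{2}u}\, du,
\end{align*}
since $d\omega = du/(S[1 + t^2/(n-1)])$. The factor $[1 + t^2/(n-1)]^{-\frac{1}{2}n}$ is proportional to $s_{n-1}(t)$, which is how $s_{n-1}(t)$ comes to stand in front of the incomplete gamma integral in (1.2); the powers of $S$ are absorbed into the constant of proportionality, $S$ being fixed for given $\mathbf{x}$.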
To emphasize that a different $\mathbf{x}$ would normally be associated with a different prior pdf, we will, except in the proofs of Section 2, write $\mathbf{x}_\alpha, \bar x_\alpha, S_\alpha, t_\alpha$ for the $\mathbf{x}, \bar x, S, t$ associated with $p_\alpha(\mu, \omega)$. A radically different justification of $s_{n-1}(t)$ is provided as follows. Let us suppose that the person who is to make the inference about $\mu$ has the prior pdf $p_s(\mu, \omega)$ for $s$ some positive integer, that is, a pdf that happens to be a member of the sequence (1.1). Examination of (1.2) shows that he can take $s_{n-1}(t_s)$ as a good approximation to his posterior pdf provided
\begin{equation*}\tag{1.5}S_s\omega_{1s} \ll 1, \quad S_s\omega_{2s} \gg 1, \quad S_s^{-\frac{1}{2}}(\mu_{2s} - \bar x_s) \gg 1, \quad S_s^{-\frac{1}{2}}(\bar x_s - \mu_{1s}) \gg 1.\end{equation*}
Now a person holding the prior pdf $p_s(\mu, \omega)$ would expect to obtain $\mathbf{x}_s$'s according to the marginal pdf $p_s(\mathbf{x}_s) = \int\int p(\mathbf{x}_s \mid \mu, \omega)\, p_s(\mu, \omega)\, d\mu\, d\omega$. The probability of (1.5) under $p_s(\mathbf{x}_s)$ is therefore the person's prior probability of being able to use $s_{n-1}(t_s)$ as a basis for inference about $\mu$. In the light of this, if, for the sequence (1.1), we were to have
\begin{equation*}\tag{1.6}\operatorname{plim} S_\alpha\omega_{1\alpha} = 0, \quad \operatorname{plim} S_\alpha\omega_{2\alpha} = \infty, \quad \operatorname{plim} S_\alpha^{-\frac{1}{2}}(\mu_{2\alpha} - \bar x_\alpha) = \infty, \quad \operatorname{plim} S_\alpha^{-\frac{1}{2}}(\bar x_\alpha - \mu_{1\alpha}) = \infty\end{equation*}
with the $\operatorname{plim}$ evaluated with respect to the sequence of marginal distributions $p_\alpha(\mathbf{x}_\alpha)$, we would, by proceeding down the sequence, be able to invest $s_{n-1}(t)$ with an asymptotic justification. (By $\operatorname{plim} z = \infty$, we mean that $\lim \operatorname{Prob}(z < K) = 0$ for all $K$.) Lemma 1 shows that, with $\rho_{i\alpha} = \omega_{i\alpha}^{\frac{1}{2}}(\mu_{2\alpha} - \mu_{1\alpha})$, $i = 1, 2$, (1.6) is equivalent to
\begin{equation*}\tag{1.7}\mathrm{(a)}\ \rho_{2\alpha}/\rho_{1\alpha} \rightarrow \infty, \qquad \mathrm{(b)}\ \rho_{2\alpha} \rightarrow \infty, \qquad \mathrm{(c)}\ \liminf\, [\log \rho_{1\alpha}/\log \rho_{2\alpha}] \geqq 0 \quad \text{as } \alpha \rightarrow \infty.\end{equation*}
Lemma 2 then shows that (1.6) is equivalent to
\begin{equation*}\tag{1.8}\operatorname{plim} p_\alpha(t \mid \mathbf{x}_\alpha) \equiv s_{n-1}(t)\end{equation*}
where the $\operatorname{plim}$ is again evaluated with respect to the sequence $p_\alpha(\mathbf{x}_\alpha)$, $\alpha \rightarrow \infty$. Hence (1.7) is necessary and sufficient for (1.8), which, since it allows direct comparison with the Jeffreys approach in (1.3) and (1.4), we state as the principal theorem. The interpretation of the conditions (1.3) is superficially straightforward: it is that the prior pdfs for $\mu$ and $\omega$ should (separately) approach conditions representing "complete ignorance". (1.7) is apparently more complex. In the requirement $\rho_{2\alpha}/\rho_{1\alpha} \rightarrow \infty$, it agrees with (1.3); its principal divergence from (1.3) lies in the existence of the joint conditions, (b) and (c), on the developments of the prior pdfs of $\mu$ and $\omega$. $\rho_{1\alpha}$ and $\rho_{2\alpha}$ may be regarded as measures of the information about $\mu$ in the least and most informative conditional distributions $p(\mathbf{x} \mid \mu, \omega)$ allowed by $p_\alpha(\mu, \omega)$, relative to the prior information about $\mu$ measured by the quantity $(\mu_{2\alpha} - \mu_{1\alpha})^{-1}$. (1.7)(c) requires that, although there is no necessity for $\rho_{1\alpha}$ to approach zero at all, if it does so, it should not do so too rapidly; that is, loosely speaking, the least informative conditional distribution should not be too uninformative. For the case $\mu_{1\alpha} = -\alpha$, $\mu_{2\alpha} = \alpha$, $\omega_{1\alpha} = \alpha^\lambda$, $\omega_{2\alpha} = \alpha$, (1.3) requires $-\infty < \lambda < 0$, while (1.7) requires $-2 \leqq \lambda < 1$.
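As a check on this closing example, and on the reading of $\rho_{i\alpha}$ as $\omega_{i\alpha}^{\frac{1}{2}}(\mu_{2\alpha} - \mu_{1\alpha})$ adopted above, the arithmetic for the case $\mu_{1\alpha} = -\alpha$, $\mu_{2\alpha} = \alpha$, $\omega_{1\alpha} = \alpha^\lambda$, $\omega_{2\alpha} = \alpha$ runs as follows:
\begin{align*}
% rho_{i alpha} = omega_{i alpha}^{1/2} (mu_{2 alpha} - mu_{1 alpha}), as in the text above
\rho_{1\alpha} &= \alpha^{\frac{1}{2}\lambda}(2\alpha) = 2\alpha^{1+\frac{1}{2}\lambda}, \qquad \rho_{2\alpha} = \alpha^{\frac{1}{2}}(2\alpha) = 2\alpha^{\frac{3}{2}},\\
\mathrm{(a)}\quad &\rho_{2\alpha}/\rho_{1\alpha} = \alpha^{\frac{1}{2}(1-\lambda)} \rightarrow \infty \quad\text{if and only if}\quad \lambda < 1,\\
\mathrm{(b)}\quad &\rho_{2\alpha} = 2\alpha^{\frac{3}{2}} \rightarrow \infty \quad\text{for every } \lambda,\\
\mathrm{(c)}\quad &\log \rho_{1\alpha}/\log \rho_{2\alpha} \rightarrow (2+\lambda)/3 \geqq 0 \quad\text{if and only if}\quad \lambda \geqq -2,
\end{align*}
so that (1.7) holds exactly for $-2 \leqq \lambda < 1$; (1.3), on the other hand, requires $\omega_{1\alpha} = \alpha^\lambda \rightarrow 0$, that is $\lambda < 0$, the remaining limits in (1.3) holding automatically here.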
The case $\mu_{1\alpha} = -1$, $\mu_{2\alpha} = 1$, $\omega_{1\alpha} = 1$, $\omega_{2\alpha} = \alpha$ satisfies (1.7) but not (1.3). The comparison of (1.3) and (1.7) is assisted by noting that $t_\alpha$ is invariant with respect to the simultaneous transformations of $\mathbf{x}$ and $\mu$, $\mathbf{x} \rightarrow a_\alpha \mathbf{x} + b_\alpha$, $\mu \rightarrow a_\alpha \mu + b_\alpha$. We would therefore expect that any reasonable condition on the sequence (1.1) for the asymptotic relevance of $s_{n-1}(t)$ would be unaffected by these transformations, when coupled with $\omega \rightarrow a_\alpha^{-2}\omega$. (1.7) agrees with this expectation, while (1.3) does not.
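The invariance assertion may be verified directly from the definitions (a routine computation, taking $a_\alpha > 0$ and with $\rho_{i\alpha}$ as read above). Under $\mathbf{x} \rightarrow a_\alpha\mathbf{x} + b_\alpha$, $\mu \rightarrow a_\alpha\mu + b_\alpha$, $\omega \rightarrow a_\alpha^{-2}\omega$,
\begin{align*}
% effect on the sufficient statistics and the prior range of mu
\bar x \rightarrow a_\alpha \bar x + b_\alpha, \qquad S \rightarrow a_\alpha^2 S, \qquad &\mu_{2\alpha} - \mu_{1\alpha} \rightarrow a_\alpha(\mu_{2\alpha} - \mu_{1\alpha}),\\
t_\alpha = [n(n-1)/S]^{\frac{1}{2}}(\bar x - \mu) \rightarrow [n(n-1)/(a_\alpha^2 S)]^{\frac{1}{2}}\, &a_\alpha(\bar x - \mu) = t_\alpha,\\
\rho_{i\alpha} = \omega_{i\alpha}^{\frac{1}{2}}(\mu_{2\alpha} - \mu_{1\alpha}) \rightarrow (a_\alpha^{-2}\omega_{i\alpha})^{\frac{1}{2}}\, &a_\alpha(\mu_{2\alpha} - \mu_{1\alpha}) = \rho_{i\alpha},
\end{align*}
so each $\rho_{i\alpha}$, and with it each part of (1.7), is exactly invariant. The separate limits in (1.3) are not: with $a_\alpha = \alpha^{-2}$, for instance, the transformed $\omega_{1\alpha}$ is $\alpha^4\omega_{1\alpha}$, which need not tend to $0$ even when $\omega_{1\alpha}$ does.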