Exercise 10.3 Solution Example - Hoff, A First Course in Bayesian Statistical Methods


a)

Answer

import Pkg; Pkg.add(url="https://github.com/KaoruBB/BayesPlots.jl")
using BayesPlots
using DelimitedFiles, DataFrames, GLM
data = readdlm("Exercises/tplant.dat")
20×3 Matrix{Float64}:
 11.8   0.0  6.39
 15.34  1.0  6.39
  9.31  0.0  5.58
 13.7   1.0  5.58
 11.2   0.0  4.26
 14.75  1.0  4.26
 10.33  0.0  5.87
 14.38  1.0  5.87
  9.79  0.0  3.91
 13.96  1.0  3.91
  8.39  0.0  3.91
 12.84  1.0  3.91
 10.61  0.0  6.17
 14.87  1.0  6.17
  8.54  0.0  3.36
 12.77  1.0  3.36
  9.25  0.0  3.52
 13.14  1.0  3.52
  8.86  0.0  2.02
 12.24  1.0  2.02
# first column: plant height
# second column: measurement period
# third column: pH level
y = data[:, 1]
df = DataFrame(
    height = y,
    time = data[:, 2],
    pH = data[:, 3]
)
lmfit = lm(@formula(height ~ time + pH), df)
"StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

height ~ 1 + time + pH

Coefficients:
────────────────────────────────────────────────────────────────────────
                Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)  7.20869     0.589054  12.24    <1e-09   5.9659     8.45149
time         3.991       0.328003  12.17    <1e-09   3.29897    4.68303
pH           0.577752    0.120354   4.80    0.0002   0.323828   0.831676
────────────────────────────────────────────────────────────────────────

This result shows that the height of the tomato plants increases by 3.99 cm on average between the two measurement periods, and by 0.58 cm for each one-unit increase in pH. Both effects are statistically significant even at the 0.01 significance level.


b)

Answer

ex10-3b.png


The scatter plot of the residuals is shown above. The two measurements from the same tomato plant are drawn in the same color, and the residuals for the same plant are clearly correlated. This suggests that the data do not satisfy the ordinary least squares assumption that the errors are independently and identically distributed (random sampling). This violation primarily invalidates the reported standard errors, and hence the confidence intervals and significance tests, even though the coefficient estimates themselves remain unbiased.
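A minimal sketch of how such a residual plot could be produced (assuming Plots.jl; the grouping of consecutive rows into plants is inferred from the data layout shown in part a)):

using Plots

# Residuals of the OLS fit from part a), with the two measurements of each
# plant drawn in the same color (the plant index is an assumption based on
# the row layout of tplant.dat: consecutive row pairs belong to one plant).
res = residuals(lmfit)
plant = repeat(1:10, inner=2)
scatter(plant, res;
    group=plant,                # one color per plant
    xlabel="plant", ylabel="residual", legend=false)
hline!([0], linestyle=:dash, color=:gray)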

c)

Answer

\[ \boldsymbol{Y} = \begin{pmatrix} Y_{1,1} \\ Y_{1,2} \\ Y_{2,1} \\ Y_{2,2} \\ \vdots \\ Y_{n,1} \\ Y_{n,2} \end{pmatrix} \sim \text{multivariate normal} \left( \mathbf{X} \boldsymbol{\beta}, \Sigma \right) \] where \[ \Sigma = \sigma^2 \mathbf{C}_{\rho} = \sigma^2 \begin{pmatrix} 1 & \rho & 0 & 0 & \cdots & 0 \\ \rho & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \rho & \cdots & 0 \\ 0 & 0 & \rho & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \end{pmatrix} \] and \(\rho\) is the correlation between the two measurements of the same plant.
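Incidentally, this block-diagonal \(\mathbf{C}_{\rho}\) is just a Kronecker product of an identity matrix with a 2×2 exchangeable correlation block, one block per plant; a minimal sketch (equivalent to the explicit loop in _create_cor_matrix defined later):

using LinearAlgebra

ρ = 0.5                          # illustrative value
Cᵨ = kron(I(10), [1.0 ρ; ρ 1.0]) # 20×20 block-diagonal correlation matrix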

Setting the prior distributions of \(\boldsymbol{\beta}, \sigma^2, \rho\) as follows,

  • \(\boldsymbol{\beta}\): \[ \boldsymbol{\beta} \sim \text{multivariate normal}(\boldsymbol{\beta}_0, \Sigma_0), \]
  • \(\sigma^2\): \[ \sigma^2 \sim \text{inverse-gamma}\left( \frac{\nu_0}{2}, \frac{\nu_0 \sigma_0^2}{2} \right), \]
  • \(\rho\): \[ \rho \sim \text{uniform}( 0, 1 ), \]

the full conditional distributions of \(\boldsymbol{\beta}\) and \(\sigma^2\) are given by

\begin{align*} \{\boldsymbol{\beta} | \boldsymbol{y}, \mathbf{X}, \sigma^2, \rho \} &\sim \text{multivariate normal}(\boldsymbol{\beta}_n, \Sigma_n), \text{ where} \\ \Sigma_n &= \left( \mathbf{X}^{\top} \mathbf{C}_{\rho}^{-1} \mathbf{X}/\sigma^2 + \Sigma_0^{-1} \right)^{-1} \\ \boldsymbol{\beta}_n &= \Sigma_n \left( \mathbf{X}^{\top} \mathbf{C}_{\rho}^{-1} \boldsymbol{y}/\sigma^2 + \Sigma_0^{-1} \boldsymbol{\beta}_0 \right), \text{ and} \\ \{\sigma^2 | \boldsymbol{y}, \mathbf{X}, \boldsymbol{\beta}, \rho \} &\sim \text{inverse-gamma} \left( \frac{ \nu_0 + n }{2}, \frac{ \nu_0 \sigma_0^2 + \mathrm{SSR}_{\rho} }{2} \right), \text{ where} \\ \mathrm{SSR}_{\rho} &= \left( \boldsymbol{y} - \mathbf{X} \boldsymbol{\beta} \right)^{\top} \mathbf{C}_{\rho}^{-1} \left( \boldsymbol{y} - \mathbf{X} \boldsymbol{\beta} \right). \end{align*}

The posterior distribution of \(\rho\) is not analytically tractable, so we use the Metropolis algorithm to sample from the posterior distribution.

Given \(\{\boldsymbol{\beta}^{(s)}, \sigma^{2(s)}, \rho^{(s)}\}\), we update the parameters as follows:

  1. Update \(\boldsymbol{\beta}\): Sample \(\boldsymbol{\beta}^{(s+1)} \sim \text{multivariate normal}(\boldsymbol{\beta}_n, \Sigma_n)\), where \(\boldsymbol{\beta}_n \) and \(\Sigma_n\) depend on \(\sigma^{2(s)}\) and \(\rho^{(s)}\).
  2. Update \(\sigma^2\): Sample \(\sigma^{2(s+1)} \sim \text{inverse-gamma}([\nu_0 + n]/2, [\nu_0 \sigma_0^2 + \mathrm{SSR}_{\rho}] / 2)\), where \(\mathrm{SSR}_{\rho}\) depends on \(\boldsymbol{\beta}^{(s+1)}\) and \(\rho^{(s)}\).
  3. Update \(\rho\):
    1. Propose \(\rho^{\ast} \sim \text{uniform}(\rho^{(s)} - \delta, \rho^{(s)} + \delta)\). If \(\rho^{\ast} \lt 0\), reassign it to be \(|\rho^{\ast}|\); if \(\rho^{\ast} > 1\), reassign it to be \(2 - \rho^{\ast}\).
    2. Compute the acceptance ratio \[ r = \frac{ p(\boldsymbol{y} | \mathbf{X}, \boldsymbol{\beta}^{(s+1)}, \sigma^{2(s+1)}, \rho^{\ast} ) p(\rho^{\ast})}{ p(\boldsymbol{y} | \mathbf{X}, \boldsymbol{\beta}^{(s+1)}, \sigma^{2(s+1)}, \rho^{(s)} ) p(\rho^{(s)})} \] and sample \(u \sim \text{uniform}(0,1)\). If \(u \lt r\), set \(\rho^{(s+1)} = \rho^{\ast}\), otherwise set \(\rho^{(s+1)} = \rho^{(s)}\).
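Since the prior on \(\rho\) is uniform, the prior terms in \(r\) cancel, and on the log scale (the form the code below actually computes) the acceptance ratio reduces to \[ \log r = -\frac{1}{2} \left( \log |\mathbf{C}_{\rho^{\ast}}| - \log |\mathbf{C}_{\rho^{(s)}}| + \frac{ \mathrm{SSR}_{\rho^{\ast}} - \mathrm{SSR}_{\rho^{(s)}} }{ \sigma^{2(s+1)} } \right). \]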
using Distributions, LinearAlgebra, Random, StatsBase

function fit_ar1_linear_model_for_repeated_individuals(y, X, β₀, Σ₀, ν₀, σ₀²; S=5000, δ=1, seed=42, thin=1)

    n, p = size(X)

    # starting values
    lmfit = lm(X, y)
    β = coef(lmfit)
    res = residuals(lmfit)
    σ² = sum(abs2, res) / (n - p)
    ρ = autocor(res, [1])[1]

    Random.seed!(seed)
    OUT = Matrix{Float64}(undef, 0, p+2)
    ac = 0 # acceptance count

    # block-diagonal Cᵨ: rows 2k-1 and 2k (the two measurements of plant k)
    # get correlation ρ; all other off-diagonal entries are zero
    function _create_cor_matrix(n::Int, ρ::Number)
        Cᵨ = zeros(n, n)
        for i in 1:n
            for j in 1:n
                if i == j
                    Cᵨ[i, j] = 1
                elseif (i - j == 1 && i % 2 == 0) || (j - i == 1 && j % 2 == 0)
                    Cᵨ[i, j] = ρ
                end
            end
        end
        return Cᵨ
    end

    # full conditional: Σₙ = (XᵀCᵨ⁻¹X/σ² + Σ₀⁻¹)⁻¹, βₙ = Σₙ(XᵀCᵨ⁻¹y/σ² + Σ₀⁻¹β₀)
    function _update_β(y, X, σ², iCor, Σ₀, β₀)
        V_β = inv(X' * iCor * X/σ² + inv(Σ₀))
        E_β = V_β * (X' * iCor * y/σ² + inv(Σ₀) * β₀)
        return rand(MvNormal(E_β, Symmetric(V_β)))
    end

    function _update_σ²(y, X, β, iCor, ν₀, σ₀²)
        νₙ = ν₀ + length(y)
        SSR = (y - X*β)' * iCor * (y - X*β)
        νₙσ²ₙ = ν₀ * σ₀² + SSR
        return rand(InverseGamma(νₙ / 2, νₙσ²ₙ/2))
    end

    function _update_ρ(ρ, y, X, β, σ², δ)
        # reflecting random-walk proposal keeps ρₚ in (0, 1)
        ρₚ = rand(Uniform(ρ-δ, ρ+δ)) |> abs
        ρₚ = min(ρₚ, 2-ρₚ)
        Cor = _create_cor_matrix(n, ρ)
        Corₚ = _create_cor_matrix(n, ρₚ)
        # log acceptance ratio (the uniform prior terms cancel)
        e = y - X*β
        lr = -0.5*(
            logdet(Corₚ) - logdet(Cor) +
            (e' * (inv(Corₚ) - inv(Cor)) * e)/σ²
        )
        if log(rand()) < lr
            ac += 1
            return ρₚ
        else
            return ρ
        end
    end

    # MCMC algorithm
    for s in 1:S
        Cor = _create_cor_matrix(n, ρ)
        iCor = inv(Cor)

        β = _update_β(y, X, σ², iCor, Σ₀, β₀)
        σ² = _update_σ²(y, X, β, iCor, ν₀, σ₀²)
        ρ = _update_ρ(ρ, y, X, β, σ², δ)

        # store
        if s % thin == 0
            # progress log: scan, running acceptance rate of the ρ proposals, current state
            println(join([s, ac/s, β, σ², ρ], ", "))
            params = vcat(β, σ², ρ)'
            OUT = vcat(OUT, params)
        end
    end

    return OUT
end
# set prior parameters
ν₀ = 1
σ₀² = 1
Σ₀ = diagm(repeat([1000.0], 3)) # diffuse prior: variance 1000 for each coefficient
β₀ = [0, 0, 0]
y = df[:, 1];
X = hcat(ones(length(y)), df[:,2], df[:,3])

20×3 Matrix{Float64}:
 1.0  0.0  6.39
 1.0  1.0  6.39
 1.0  0.0  5.58
 1.0  1.0  5.58
 1.0  0.0  4.26
 1.0  1.0  4.26
 1.0  0.0  5.87
 1.0  1.0  5.87
 1.0  0.0  3.91
 1.0  1.0  3.91
 1.0  0.0  3.91
 1.0  1.0  3.91
 1.0  0.0  6.17
 1.0  1.0  6.17
 1.0  0.0  3.36
 1.0  1.0  3.36
 1.0  0.0  3.52
 1.0  1.0  3.52
 1.0  0.0  2.02
 1.0  1.0  2.02
# MCMC
S=25000
δ=0.1
thin=25
fitted_sample = fit_ar1_linear_model_for_repeated_individuals(y, X, β₀, Σ₀, ν₀, σ₀², S=S, δ=δ, thin=thin);
using Plots, Plots.PlotMeasures # PlotMeasures provides the `mm` margin unit used below

plot(
    map(
        1:size(fitted_sample, 2), ["intercept", "β₁", "β₂", "σ²", "ρ"]
    ) do i, label
        plot(
            fitted_sample[:, i],
            xlabel="scan/25",
            ylabel=label,
            legend=nothing
        )
    end...,
    layout=(3, 2), size=(900, 900)
)
savefig("../../fig/ch10/ex10_3_scan.png")
"../../fig/ch10/ex10_3_scan.png"

ex10_3_scan.png

Figure 1: Sample paths of the parameters

plot(
    map(
        1:size(fitted_sample, 2), ["intercept", "β₁", "β₂", "σ²", "ρ"]
    ) do i, label
        bar(
            0:30,
            autocor(fitted_sample[:, i], 0:30),
            xlabel="lag/25 of $label",
            ylabel="ACF",
            legend=false
        )
    end...,
    layout=(2, 3), size=(900, 500), margin=5mm
)
savefig("./../../fig/ch10/ex10_3_acf.png")
"./../../fig/ch10/ex10_3_acf.png"

ex10_3_acf.png

Figure 2: Autocorrelation plots of the parameters


Figures 1 and 2 show the sample paths and autocorrelation plots of each parameter, keeping every 25th of the 25,000 scans. The traces look stationary and the autocorrelations of the thinned draws die out quickly, indicating that the MCMC sampler has mixed well and effectively converged.
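As a numerical complement to the visual check, a rough effective sample size can be computed from the stored draws; a minimal sketch using StatsBase's autocor with a simple truncation rule (ess_approx is a hypothetical helper, not part of the original code):

using StatsBase

# Rough effective sample size: n / (1 + 2 Σₖ ρ̂ₖ), truncating the sum at the
# first non-positive sample autocorrelation.
function ess_approx(x; maxlag=50)
    acfs = autocor(x, 1:maxlag)
    k = findfirst(<=(0), acfs)
    K = k === nothing ? maxlag : k - 1
    return length(x) / (1 + 2 * sum(acfs[1:K]))
end

[ess_approx(fitted_sample[:, i]) for i in 1:size(fitted_sample, 2)]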

d)

Answer

figs = []
for i in 1:3
    fig = plot_posterior_density(
        fitted_sample[:, i], bandwidth=0.1, xlabel="β_$i"
    )
    fig = vspan!(
        fig,
        confint(lmfit)[i, :],
        color = :blue,
        alpha = 0.1,
        label = "95% CI of OLS",
        legend = i == 3 ? :topright : nothing,
    )
    fig = vline!(
        fig,
        [coef(lmfit)[i]],
        color = :blue,
        linestyle = :dash,
        label = "OLS estimate",
    )
    push!(figs, fig)
end

plot(figs..., layout=(1, 3), size=(1000, 400), margin=5mm)
savefig("../../fig/ch10/ex10_3d.png")
"../../fig/ch10/ex10_3d.png"

ex10_3d.png

Figure 3: Posterior of the coefficients


Figure 3 shows the posterior density, posterior mean, and 95% credible interval of each coefficient obtained by MCMC sampling, together with the OLS point estimates and 95% confidence intervals. For the intercept (\(\beta_1\)), the time coefficient (\(\beta_2\)), and the pH coefficient (\(\beta_3\)), the posterior means almost coincide with the OLS estimates, but the 95% credible intervals differ from the OLS confidence intervals. If only point estimates are of interest, the two approaches lead to similar conclusions; if the uncertainty of the estimates matters, however, the conclusions of this Bayesian model and the OLS model differ substantially.
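For reference, the numbers behind Figure 3 can be summarized directly from the stored draws; a minimal sketch using only Statistics from the standard library:

using Statistics

# Posterior means and equal-tailed 95% credible intervals of all parameters.
for (i, name) in enumerate(["β₁ (intercept)", "β₂ (time)", "β₃ (pH)", "σ²", "ρ"])
    draws = fitted_sample[:, i]
    lo, hi = quantile(draws, [0.025, 0.975])
    println(name, ": mean = ", round(mean(draws), digits=3),
            ", 95% CI = (", round(lo, digits=3), ", ", round(hi, digits=3), ")")
end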

Author: Kaoru Babasaki

Email: [email protected]

Last Updated: 2025-05-02 (Fri) 16:29
