Exercise 7.6 Solution Example - Hoff, A First Course in Bayesian Statistical Methods
Japanese edition (標準ベイズ統計学): Exercise 7.6, example solution


a)

Answer

preprocess & Gibbs sampler
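For reference, the sampler in the code below alternates between the two full conditional distributions of the semiconjugate multivariate normal model from Chapter 7. The priors are θ ~ MVN(μ₀, Λ₀) and Σ ~ inverse-Wishart(ν₀, S₀), and the exercise fixes μ₀ = ȳ, Λ₀ = S₀ = Σ̂ and ν₀ = p + 2, as set in getPosterior. The inverse-Wishart here takes its scale matrix as the second argument, which matches Distributions.jl's InverseWishart(ν, Ψ) with mean Ψ/(ν − p − 1). The book writes the same distribution with the inverse of that matrix as the second argument.

```latex
\begin{aligned}
\theta \mid \mathbf{y}_1,\dots,\mathbf{y}_n, \Sigma &\sim \mathrm{MVN}(\boldsymbol{\mu}_n, \Lambda_n), \\
\Lambda_n &= \left(\Lambda_0^{-1} + n\Sigma^{-1}\right)^{-1}, \\
\boldsymbol{\mu}_n &= \Lambda_n\left(\Lambda_0^{-1}\boldsymbol{\mu}_0 + n\Sigma^{-1}\bar{\mathbf{y}}\right), \\
\Sigma \mid \mathbf{y}_1,\dots,\mathbf{y}_n, \theta &\sim \text{inverse-Wishart}\left(\nu_0 + n,\; S_0 + S_\theta\right), \\
S_\theta &= \sum_{i=1}^{n} (\mathbf{y}_i - \theta)(\mathbf{y}_i - \theta)^\top .
\end{aligned}
```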

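using DelimitedFiles, DataFrames, Statistics, LinearAlgebra, Distributions

# read the whitespace-delimited data; with header=true, readdlm returns (data, header)
Y_src = readdlm("../../Exercises/azdiabetes.dat", header=true)
Y = Y_src[1]
header = Y_src[2] |> vec
df = DataFrame(Y, header)

# the data mix numbers and strings, so readdlm returns an untyped matrix;
# convert each column to a concrete type
col_int = [:npreg, :glu, :bp, :skin, :age]
col_float = [:bmi, :ped]
col_str = :diabetes
df[!,col_int] = Int.(df[!,col_int])
df[!,col_float] = Float32.(df[!,col_float])
df[!,col_str] = String.(df[!,col_str])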

df_d = filter(:diabetes => ==("Yes"), df)
df_n = filter(:diabetes => ==("No"), df)

function getPosterior(df, S)
    # select columns except for diabetes
    Y = select(df, Not(:diabetes)) |> Matrix
    n, p = size(Y)
    ȳ = mean(Y, dims=1) |> vec
    Σ̂ = cov(Y)

    # set prior
    μ₀ = ȳ
    Λ₀ = Σ̂
    S₀ = Σ̂
    ν₀ = p + 2

    function multivariateGibbs(S, Σ_init, μ₀, Λ₀, S₀, ν₀, Y)
        n, p = size(Y)
        THETA = Matrix{Float64}(undef, S, p)
        SIGMA = Matrix{Float64}(undef, S, p^2)
        Σ = Σ_init
        ȳ = mean(Y, dims=1) |> vec
        for s in 1:S
            # update θ
            Λn = inv( inv(Λ₀) + n * inv(Σ) )
            μn = Λn * ( inv(Λ₀) * μ₀ + n * inv(Σ) * ȳ ) |> vec
            dist = MvNormal(μn, Symmetric(Λn))
            θ = rand(dist)

            # update Σ
            Sn = S₀ + (Y' .- θ) * (Y' .- θ)'
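            # Sn can be slightly asymmetric because of floating-point error, which can make
            # the Cholesky check inside InverseWishart fail; symmetrize it explicitly
            Σ = rand(InverseWishart(n + ν₀, Matrix(Symmetric(Sn))))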

            # save results
            THETA[s, :] = vec(θ)
            SIGMA[s, :] = vec(Σ)
        end
        return THETA, SIGMA
    end

    THETA, SIGMA = multivariateGibbs(S, Σ̂, μ₀, Λ₀, S₀, ν₀, Y)
    return THETA, SIGMA
end

S = 10000
THETA_d, SIGMA_d = getPosterior(df_d, S)
THETA_n, SIGMA_n = getPosterior(df_n, S)
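The two updates inside multivariateGibbs come down to two small computations: the parameters (μn, Λn) of the θ full conditional, and the residual scatter matrix S_θ added to S₀. The sketch below isolates both so they can be checked on made-up numbers, not the azdiabetes data. The names theta_conditional, scatter_about and Ytoy are hypothetical helpers for illustration only. With a very diffuse prior, μn should approach ȳ and Λn should approach Σ/n. The broadcast form of S_θ should equal the explicit sum of outer products.

```julia
using LinearAlgebra

# Hypothetical helper: parameters of the full conditional of θ given Σ
#   Λn = (Λ₀⁻¹ + nΣ⁻¹)⁻¹,   μn = Λn (Λ₀⁻¹μ₀ + nΣ⁻¹ȳ)
function theta_conditional(μ₀, Λ₀, Σ, ȳ, n)
    Λn = inv(inv(Λ₀) + n * inv(Σ))
    μn = Λn * (inv(Λ₀) * μ₀ + n * inv(Σ) * ȳ)
    return μn, Symmetric(Λn)
end

# Hypothetical helper: S_θ = Σᵢ (yᵢ - θ)(yᵢ - θ)', observations in the rows of Y
scatter_about(Y, θ) = (Y' .- θ) * (Y' .- θ)'

# Toy data (made up): n = 3 observations of p = 2 variables
Ytoy = [1.0 2.0; 3.0 1.0; 2.0 4.0]
n = size(Ytoy, 1)
ȳ = vec(sum(Ytoy, dims=1)) ./ n
Σ = [1.0 0.2; 0.2 2.0]

# Very diffuse prior (huge Λ₀): μn should be close to ȳ and Λn close to Σ / n
μn, Λn = theta_conditional(zeros(2), 1e8 * Matrix(1.0I, 2, 2), Σ, ȳ, n)

# Explicit sum of outer products, for comparison with the broadcast form
θ = [2.0, 2.0]
S_loop = sum((Ytoy[i, :] .- θ) * (Ytoy[i, :] .- θ)' for i in 1:n)
```

The same check also shows why ȳ must appear in the μn update. Without the data term, μn would not depend on the observations at all.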

marginal posterior distribution

[Figure: marginal posterior distributions (exercise7-6a.png)]

p = size(THETA_d, 2)
for i in 1:p
    theta_d = THETA_d[:, i]
    theta_n = THETA_n[:, i]
    println("Pr(θd,$i > θn,$i) = ", mean(theta_d .> theta_n))
end
Pr(θd,1 > θn,1) = 1.0
Pr(θd,2 > θn,2) = 1.0
Pr(θd,3 > θn,3) = 1.0
Pr(θd,4 > θn,4) = 1.0
Pr(θd,5 > θn,5) = 1.0
Pr(θd,6 > θn,6) = 1.0
Pr(θd,7 > θn,7) = 1.0
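  • For every variable, θd,j exceeded θn,j in all 10,000 posterior draws, so the posterior probability that the diabetic group has the higher mean is essentially 1. There is strong evidence that the mean of each variable differs between diabetic and non-diabetic patients, and that it is higher in the diabetic group.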
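The loop above estimates each Pr(θd,j > θn,j) as the fraction of posterior draws in which the diabetic-group value is larger. The same computation can be written for all columns at once. The sketch below assumes that the draws are stored one row per iteration, as in THETA_d and THETA_n. The name prob_greater and the toy matrices are hypothetical.

```julia
using Statistics

# Hypothetical helper: Monte Carlo estimate of Pr(A[:, j] > B[:, j]) for each column j,
# i.e. the fraction of draws s with A[s, j] > B[s, j]
prob_greater(A::AbstractMatrix, B::AbstractMatrix) = vec(mean(A .> B, dims=1))

# Toy draws (made up): S = 4 draws of p = 2 parameters per group
A = [1.0 0.5; 2.0 0.1; 3.0 0.9; 4.0 0.2]
B = [0.0 0.6; 1.0 0.2; 2.0 0.3; 3.0 0.4]
probs = prob_greater(A, B)   # [1.0, 0.25]
```

With the sampler output, `prob_greater(THETA_d, THETA_n)` should reproduce the seven printed values, and `names(select(df_d, Not(:diabetes)))` supplies matching variable labels. Pairing draws by index is valid because the two groups are sampled independently. An estimate of exactly 1.0 only means that none of the S = 10000 draws had θd,j ≤ θn,j. The posterior probability is therefore very close to 1, not necessarily equal to it.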

b)

Answer

standard deviation

[Figure: posterior standard deviations (exercise7-6b_sd.png)]

correlation

[Figure: posterior correlations (exercise7-6b_cor.png)]

discussion

  • The standard deviations are slightly larger in the diabetic group.
  • For variables other than diabetes pedigree (ped), the correlation coefficients are larger in the non-diabetic group.
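The standard deviations and correlations in part b are functions of each Σ draw. Each row of SIGMA stores vec(Σ) in column-major order, so reshape recovers the p×p matrix. The sketch below converts rows into standard deviations and correlations and then puts a posterior probability on a comparison such as "slightly larger". The names sd_and_cor, sd_draws, SIGMA_a and SIGMA_b are hypothetical, and the toy covariance matrices are made up.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper: one SIGMA row (vec of a p×p covariance) -> (sds, correlation matrix)
function sd_and_cor(row::AbstractVector, p::Integer)
    Σ = reshape(row, p, p)
    sd = sqrt.(diag(Σ))          # standard deviations σ_j
    C = Σ ./ (sd * sd')          # correlations Σ_jk / (σ_j σ_k)
    return sd, C
end

# Hypothetical helper: posterior draws of the standard deviations, one row per iteration
function sd_draws(SIGMA::AbstractMatrix, p::Integer)
    S = size(SIGMA, 1)
    out = Matrix{Float64}(undef, S, p)
    for s in 1:S
        out[s, :] = sd_and_cor(SIGMA[s, :], p)[1]
    end
    return out
end

# Toy check (made up, p = 2): sds [2, 3] and off-diagonal correlation 1/6
sd, C = sd_and_cor(vec([4.0 1.0; 1.0 9.0]), 2)

# Two toy "groups" with two draws each, stored one vec(Σ) per row like SIGMA_d / SIGMA_n
SIGMA_a = [vec([4.0 1.0; 1.0 9.0])'; vec([1.0 0.0; 0.0 4.0])']
SIGMA_b = [vec([1.0 0.0; 0.0 16.0])'; vec([1.0 0.0; 0.0 1.0])']

# Fraction of draws in which group a has the larger standard deviation, per variable
pr_sd = vec(mean(sd_draws(SIGMA_a, 2) .> sd_draws(SIGMA_b, 2), dims=1))
```

With the real output, `sd_draws(SIGMA_d, 7)` and `sd_draws(SIGMA_n, 7)` give draws that can be compared in this way. Collecting `C[j, k]` inside the same kind of loop gives correlation draws for any pair of variables.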

Author: Kaoru Babasaki


Last Updated: 2025-05-02 Fri 16:29
