Exercise 9.3 Solution Example - Hoff, A First Course in Bayesian Statistical Methods


a)

Answer

The results of Bayesian regression using the g-prior are shown below:

using DelimitedFiles, DataFrames, DataFramesMeta
using GLM, Distributions, LinearAlgebra, Statistics

crime_src = readdlm("../../Exercises/crime.dat", header=true)
crime = crime_src[1]           # data matrix
header = crime_src[2] |> vec   # column names

df = DataFrame(crime, header)

# unbiased estimate of the residual variance SSR/(n - p), used as the default σ₀²
function ResidualSE(y, X)
    fit = lm(X, y)
    return sum(abs2, residuals(fit)) / (size(X, 1) - size(X, 2))
end

function lmGprior(y, X; g=length(y), ν₀=1, σ₀²=ResidualSE(y,X), S=1000)
    n, p = size(X)

    # SSRg = y'(I - Hg)y, with Hg the shrunken hat matrix
    Hg = (g/(g+1)) * X * inv(X'X) * X'
    SSRg = y' * (I(n) - Hg) * y

    # σ² | y ~ Inverse-Gamma((ν₀+n)/2, (ν₀σ₀² + SSRg)/2)
    σ² = 1 ./ rand(Gamma((ν₀+n)/2, 2/(ν₀*σ₀²+SSRg)), S)

    # β | σ², y ~ N(Eb, σ²Vb): draw one β per sampled σ²
    Vb = g * inv(X'X) / (g+1)
    Eb = Vb * X' * y

    β = Matrix{Float64}(undef, S, p)
    for i in 1:S
        β[i, :] = rand(MvNormal(Eb, Symmetric(σ²[i] * Vb)))
    end
    return β, σ²
end
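
For reference, lmGprior draws directly from the joint posterior, which is available in closed form under the g-prior (Hoff, ch. 9):

\[
\sigma^2 \mid y \sim \text{Inv-Gamma}\!\left(\frac{\nu_0 + n}{2},\ \frac{\nu_0\sigma_0^2 + \mathrm{SSR}_g}{2}\right), \qquad
\beta \mid \sigma^2, y \sim N\!\left(\frac{g}{g+1}(X^\top X)^{-1}X^\top y,\ \frac{g}{g+1}\,\sigma^2 (X^\top X)^{-1}\right),
\]

where \(\mathrm{SSR}_g = y^\top\bigl(I - \frac{g}{g+1}X(X^\top X)^{-1}X^\top\bigr)y\). The code above is a direct transcription of these two distributions.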

# set prior parameters (g = n is the unit-information choice; ν₀ = 2, σ₀² = 1 as in Hoff)
n = size(df, 1)
g = n
ν₀ = 2
σ₀² = 1

y = df[:, :y]
X = select(df, Not(:y)) |> Matrix
X = [ones(size(X, 1)) X] # add intercept
p = size(X, 2)
S = 5000
BETA, SIGMA² = lmGprior(y, X; g=g, ν₀=ν₀, σ₀²=σ₀², S=S)

# posterior mean for β
β̂_gprior = mean(BETA, dims=1) |> vec

# 95% posterior credible intervals for β
ci_gprior = map(x -> quantile(x, [0.025, 0.975]), eachcol(BETA))

# store results
coef_name = ["intercept"; names(df, Not(:y))]
result_gprior = DataFrame(
    param = coef_name,
    β̂ = β̂_gprior,
    ci_lower = map(x -> x[1], ci_gprior),
    ci_upper = map(x -> x[2], ci_gprior)
)
@transform!(result_gprior, :ci_length = :ci_upper - :ci_lower)

:RESULTS:

16×5 DataFrame
 Row │ param      β̂             ci_lower    ci_upper    ci_length
     │ String     Float64       Float64     Float64     Float64
─────┼────────────────────────────────────────────────────────────
   1 │ intercept  -0.00138411   -0.145485    0.141785    0.28727
   2 │ M           0.280085      0.0326016   0.522226    0.489625
   3 │ So          0.000124904  -0.335994    0.326097    0.662091
   4 │ Ed          0.531345      0.213679    0.848809    0.63513
   5 │ Po1         1.4448        0.0578835   2.9098      2.85191
   6 │ Po2        -0.769673     -2.28756     0.708661    2.99622
   7 │ LF         -0.0649535    -0.345494    0.201018    0.546512
   8 │ M.F         0.128508     -0.147925    0.402133    0.550058
   9 │ Pop        -0.0704276    -0.301267    0.153849    0.455116
  10 │ NW          0.105647     -0.200629    0.416185    0.616814
  11 │ U1         -0.265548     -0.631514    0.103081    0.734594
  12 │ U2          0.359167      0.0283685   0.688258    0.659889
  13 │ GDP         0.239074     -0.213783    0.692641    0.906425
  14 │ Ineq        0.715697      0.299354    1.13272     0.83337
  15 │ Prob       -0.27993      -0.527327   -0.0344898   0.492837
  16 │ Time       -0.0602993    -0.288748    0.181236    0.469984

The results of least squares estimation are shown below:

# least squares estimates
olsfit = lm(X, y)

tstat = coef(olsfit) ./ stderror(olsfit)
# two-sided p-values via a normal approximation to the t statistics
pval = 2 * (1 .- cdf.(Normal(), abs.(tstat)))

result_ols = DataFrame(
    param = coef_name,
    β̂ = coef(olsfit),
    pvalue = pval,
    ci_lower = confint(olsfit)[:, 1],
    ci_upper = confint(olsfit)[:, 2]
)
@transform!(result_ols, :ci_length = :ci_upper - :ci_lower)

:RESULTS:

16×6 DataFrame
 Row │ param      β̂             pvalue      ci_lower     ci_upper    ci_length
     │ String     Float64       Float64     Float64      Float64     Float64
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ intercept  -0.000458109  0.995369    -0.161444     0.160527    0.321971
   2 │ M           0.286518     0.0348684    0.00955607   0.56348     0.553924
   3 │ So         -0.000114046  0.999506    -0.375493     0.375265    0.750757
   4 │ Ed          0.544514     0.00243231   0.178196     0.910832    0.732636
   5 │ Po1         1.47162      0.0715397   -0.193934     3.13718     3.33111
   6 │ Po2        -0.78178      0.358608    -2.51862      0.955056    3.47367
   7 │ LF         -0.0659646    0.667281    -0.378923     0.246994    0.625917
   8 │ M.F         0.131298     0.397494    -0.185192     0.447788    0.63298
   9 │ Pop        -0.0702919    0.579692    -0.329144     0.18856     0.517704
  10 │ NW          0.109057     0.525852    -0.241574     0.459687    0.701261
  11 │ U1         -0.270536     0.168865    -0.671568     0.130495    0.802063
  12 │ U2          0.36873      0.0403534    0.00190637   0.735554    0.733648
  13 │ GDP         0.238059     0.357931    -0.290079     0.766198    1.05628
  14 │ Ineq        0.726292     0.00192334   0.24874      1.20384     0.955105
  15 │ Prob       -0.285226     0.033017    -0.558095    -0.0123574   0.545738
  16 │ Time       -0.0615769    0.638379    -0.328802     0.205648    0.534451

[Figure: ex_9-3a_coef.png — coefficient point estimates with 95% intervals]

The point estimates and 95% confidence/credible intervals are very similar between Bayesian regression and least squares estimation, though the intervals are consistently a little shorter under the Bayesian fit. In the Bayesian results, the 95% credible intervals for the coefficients of M, Ed, Po1, U2, Ineq and Prob exclude 0, so these variables appear strongly predictive of crime rates. In the least squares results, the p-values of M, Ed, U2, Ineq and Prob are below 0.05, which points to the same conclusion for those five variables. By these criteria the only disagreement is Po1, which the Bayesian credible interval flags as predictive while the least squares p-value (≈ 0.072) does not.
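
As a quick programmatic check, the variables whose 95% credible interval excludes zero can be pulled out of the result_gprior DataFrame built above:

# rows whose credible interval lies entirely above or below zero
strong = filter(r -> r.ci_lower > 0 || r.ci_upper < 0, result_gprior)
strong.param   # ["M", "Ed", "Po1", "U2", "Ineq", "Prob"]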

b)

Answer

using Random, StatsBase

# split data into training and test sets
Random.seed!(1234)
i_te = sample(1:n, div(n, 2), replace=false)   # half of the rows form the test set
i_tr = setdiff(1:n, i_te)

y_tr = y[i_tr]
y_te = y[i_te]
X_tr = X[i_tr, :]
X_te = X[i_te, :]
# OLS
olsfit = lm(X_tr, y_tr)
β̂_ols = coef(olsfit)
ŷ_ols = X_te * β̂_ols
error_ols = mean((y_te - ŷ_ols).^2)

:RESULTS:

0.6155918321208247

# Bayesian
BETA, SIGMA² = lmGprior(y_tr, X_tr; g=g, ν₀=ν₀, σ₀²=σ₀², S=S)
β̂_bayes = mean(BETA, dims=1) |> vec
ŷ_bayes = X_te * β̂_bayes
error_bayes = mean((y_te - ŷ_bayes).^2)

:RESULTS:

0.6012597112455574
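
As a sanity check, the g-prior posterior mean is an exact shrinkage of the training-set OLS estimate, E[β | y] = (g/(g+1)) β̂_ols (this is the Eb = Vb * X' * y line in lmGprior), so β̂_bayes should match the shrunken OLS coefficients up to Monte Carlo error:

β_shrunk = (g / (g + 1)) * β̂_ols
maximum(abs.(β̂_bayes .- β_shrunk))   # small; the gap is pure Monte Carlo noise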

[Figure: ex_9-3b.png]

From the above results, least squares and Bayesian regression yield very similar predictions for the test data. On this particular split the Bayesian regression predicts slightly better than least squares, but the gap is very small.

c)

Answer

# random half/half split, reconstructing the logic used in part b)
function splitData(y, X)
    n = length(y)
    i_te = sample(1:n, div(n, 2), replace=false)
    i_tr = setdiff(1:n, i_te)
    return y[i_tr], y[i_te], X[i_tr, :], X[i_te, :]
end

function computeMses(y, X; g=g, ν₀=ν₀, σ₀²=σ₀², S=S)
    # split data into training and test sets
    y_tr, y_te, X_tr, X_te = splitData(y, X)

    # OLS
    olsfit = lm(X_tr, y_tr)
    β̂_ols = coef(olsfit)
    ŷ_ols = X_te * β̂_ols
    error_ols = mean((y_te - ŷ_ols).^2)

    # Bayesian
    BETA, SIGMA² = lmGprior(y_tr, X_tr; g=g, ν₀=ν₀, σ₀²=σ₀², S=S)
    β̂_bayes = mean(BETA, dims=1) |> vec
    ŷ_bayes = X_te * β̂_bayes
    error_bayes = mean((y_te - ŷ_bayes).^2)

    return error_ols, error_bayes
end

# compute MSEs
n_sim = 1000
error_ols = zeros(n_sim)
error_bayes = zeros(n_sim)
for i in 1:n_sim
    error_ols[i], error_bayes[i] = computeMses(y, X)
end

# average MSEs
mean(error_ols), mean(error_bayes)

:RESULTS:

(1.12446194924989, 1.092206057122776)
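
Since both errors are computed on the same random splits, a paired summary is more informative than the two averages alone (a short sketch using the error vectors from above):

mean(error_ols .- error_bayes)   # average per-split improvement of the Bayes estimator
mean(error_ols .> error_bayes)   # fraction of splits on which Bayes predicts better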

[Figure: ex_9-3c.png]

Averaged over the 1000 random train/test splits, the Bayesian g-prior estimator attains a slightly smaller prediction MSE than least squares (1.092 vs. 1.124), consistent with the mild shrinkage toward zero that the prior applies to the coefficients.

Author: Kaoru Babasaki

Email: [email protected]

Last Updated: 2025-05-02 Fri 16:29
