EE5111 Mini project 4 Solved


The aim of this exercise is to study the importance of conjugate priors while performing Bayesian estimation.

Consider the estimation of the covariance of a bivariate Gaussian distribution. We have access to n observations yi ∼ N(0, Σ) for i = 1, …, n. Here yi is a 2×1 vector and Σ is a 2×2 matrix. We denote by ỹ the set of all observations yi. Perform the following experiments using as the underlying covariance for n = 10, 100, 1000.

  1. Estimate the covariance using Maximum Likelihood
  2. Perform Bayesian estimation and provide a point estimate for the covariance. The conjugate prior distribution is the inverse Wishart distribution. The d-dimensional distribution is given by

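$$p(\Sigma) = \frac{|\Delta_0|^{\nu_0/2}}{2^{\nu_0 d/2}\,\Gamma_d(\nu_0/2)}\,|\Sigma|^{-(\nu_0+d+1)/2}\exp\!\left(-\tfrac{1}{2}\operatorname{tr}\!\left(\Delta_0\Sigma^{-1}\right)\right) \tag{1}$$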

Consider the following hyperparameters for the prior:  and ν0 = 5; here d = 2. (Refer to Section 3.6 in Gelman.)

The posterior density belongs to the same family (inverse Wishart), with parameters

$$\nu_n = \nu_0 + n \tag{2}$$

$$\Delta_n = \Delta_0 + \sum_{i=1}^{n} y_i y_i^T \tag{3}$$
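The first two experiments can be sketched as follows. This is a minimal illustration, not the reference solution: `Sigma_true` and `Delta0` are hypothetical placeholder values, and the Bayesian point estimate is taken to be the posterior mean of the inverse Wishart, Δn / (νn − d − 1), which exists only when νn > d + 1.

```python
import numpy as np

def ml_covariance(Y):
    """ML estimate for zero-mean Gaussian data: (1/n) * sum_i y_i y_i^T."""
    n = Y.shape[0]
    return Y.T @ Y / n

def conjugate_posterior(Y, nu0, Delta0):
    """Inverse Wishart posterior parameters: nu_n = nu0 + n, Delta_n = Delta0 + sum y y^T."""
    n = Y.shape[0]
    return nu0 + n, Delta0 + Y.T @ Y

def inv_wishart_mean(nu, Delta):
    """Mean of an inverse Wishart with nu degrees of freedom; requires nu > d + 1."""
    d = Delta.shape[0]
    return Delta / (nu - d - 1)

rng = np.random.default_rng(0)
Sigma_true = np.array([[1.0, 0.5], [0.5, 2.0]])  # hypothetical, not the assignment's matrix
Delta0 = np.eye(2)                               # hypothetical prior scale matrix
nu0 = 5

for n in (10, 100, 1000):
    Y = rng.multivariate_normal(np.zeros(2), Sigma_true, size=n)  # rows are y_i^T
    nu_n, Delta_n = conjugate_posterior(Y, nu0, Delta0)
    print(f"n = {n}")
    print("  ML estimate:\n", ml_covariance(Y))
    print("  Posterior mean:\n", inv_wishart_mean(nu_n, Delta_n))
```

As n grows, the prior contribution Δ0 is swamped by the data term, so the two estimates should approach each other.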

  3. Use of non-informative priors:

As an alternative to the conjugate prior, use the non-informative Jeffreys prior given by

$$p(\Sigma) \propto |\Sigma|^{-2}, \tag{4}$$

and the independence-Jeffreys prior given by

$$p(\Sigma) \propto |\Sigma|^{-3/2}. \tag{5}$$

What are the differences in the inferences using non-informative priors as compared to the conjugate prior?
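One way to compare these priors is to note that a prior of the form |Σ|^(−a) combined with the zero-mean Gaussian likelihood gives a posterior proportional to |Σ|^(−(a + n/2)) exp(−tr(SΣ⁻¹)/2), where S is the sum of the yi yiᵀ. Assuming the usual inverse Wishart parameterization, in which the density is proportional to |Σ|^(−(ν+d+1)/2) exp(−tr(ΔΣ⁻¹)/2), this is an inverse Wishart with ν = n + 2a − d − 1 and scale S. The sketch below encodes that bookkeeping; the function name is hypothetical.

```python
import numpy as np

def power_prior_posterior(Y, a):
    """Posterior under p(Sigma) ∝ |Sigma|^(-a) for zero-mean Gaussian rows of Y.

    Returns (nu, S, mean), where the posterior is inverse Wishart with
    nu = n + 2a - d - 1 degrees of freedom and scale S = sum_i y_i y_i^T.
    The mean is None when nu <= d + 1, since it does not exist there.
    """
    n, d = Y.shape
    S = Y.T @ Y
    nu = n + 2 * a - d - 1
    mean = S / (nu - d - 1) if nu > d + 1 else None
    return nu, S, mean

rng = np.random.default_rng(1)
Y = rng.standard_normal((10, 2))  # hypothetical data with identity covariance

for name, a in (("Jeffreys", 2.0), ("independence-Jeffreys", 1.5)):
    nu, S, mean = power_prior_posterior(Y, a)
    print(name, "nu =", nu)
    print(mean)
```

For d = 2 the posterior means work out to S/(n − 2) and S/(n − 3) respectively, so both differ from the ML estimate S/n only by a factor that vanishes as n grows.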

  4. Monte Carlo Bayesian estimation:

This method is useful when the posterior is not available in closed form. Note that we require the mean of the posterior distribution.

(6)

(7)

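Note that the likelihood is

$$p(\tilde{y} \mid \Sigma) = (2\pi)^{-nd/2}\,|\Sigma|^{-n/2}\exp\!\left(-\tfrac{1}{2}\sum_{i=1}^{n} y_i^T \Sigma^{-1} y_i\right). \tag{8}$$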

Instead of using the closed-form expression for the posterior update, estimate the posterior mean by Monte Carlo integration using the following equation

(9)

where each Σj ∼ p(Σ) is a sample drawn from the prior distribution. Report the values of A for n = 10, 100, 1000 and for m = 10^3, 10^4, 10^5 with p(Σ) = InvWishartν0(∆0) for the following parameters:

.

Which prior performs better? Why do you think it happens? Can you justify why modeling the prior is important? Note that you can now model your prior distribution as any non-conjugate distribution as well.
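One common way to carry out a prior-sampling Monte Carlo estimate of the posterior mean is a self-normalised weighted average: draw Σj from the prior, weight each draw by its likelihood, and divide by the sum of the weights. The sketch below assumes that form, which may differ in detail from the estimator intended by the assignment. It works in log space because the likelihood underflows quickly for large n. The prior parameters used in the demo are hypothetical.

```python
import numpy as np
from scipy.stats import invwishart
from scipy.special import logsumexp

def log_likelihood(Sigmas, S, n):
    """log p(y | Sigma_j) for a stack of covariances of shape (m, d, d)."""
    d = S.shape[0]
    _, logdet = np.linalg.slogdet(Sigmas)            # log|Sigma_j|, shape (m,)
    inv = np.linalg.inv(Sigmas)                      # Sigma_j^{-1}, shape (m, d, d)
    trace = np.einsum("jab,ba->j", inv, S)           # tr(Sigma_j^{-1} S)
    return -0.5 * n * d * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * trace

def mc_posterior_mean(Y, nu0, Delta0, m, seed=0):
    """Self-normalised Monte Carlo estimate of E[Sigma | y] using m prior draws."""
    n = Y.shape[0]
    S = Y.T @ Y
    Sigmas = invwishart.rvs(df=nu0, scale=Delta0, size=m, random_state=seed)
    logw = log_likelihood(Sigmas, S, n)
    w = np.exp(logw - logsumexp(logw))               # weights that sum to one
    return np.einsum("j,jab->ab", w, Sigmas)

rng = np.random.default_rng(0)
Y = rng.standard_normal((10, 2))                     # hypothetical data
estimate = mc_posterior_mean(Y, nu0=5, Delta0=np.eye(2), m=10_000)
closed_form = (np.eye(2) + Y.T @ Y) / (5 + 10 - 2 - 1)
print(estimate)
print(closed_form)
```

When the prior is far from the data, only a handful of draws carry appreciable weight, so the estimate becomes noisy as n grows; comparing against the closed-form conjugate result is a useful sanity check.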

  5. Hierarchical Bayes estimation and Gibbs sampling:

Consider the following formulation of the prior for covariance.

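$$\Sigma \mid a_1, a_2 \sim \text{InvWishart}\!\left(\nu + d - 1,\; 2\nu\,\text{Diag}\!\left(\tfrac{1}{a_1}, \tfrac{1}{a_2}\right)\right) \tag{10}$$

$$a_k \sim \text{InvGamma}\!\left(\tfrac{1}{2}, \tfrac{1}{A_k^2}\right), \quad k = 1, 2 \tag{11}$$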

For performing Gibbs sampling, use the following equations to draw samples iteratively from one distribution and use the drawn samples in the next equation:

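$$\Sigma \mid a_1, a_2, \tilde{y} \sim \text{InvWishart}\!\left(\nu + d + n - 1,\; 2\nu\,\text{Diag}\!\left(\tfrac{1}{a_1}, \tfrac{1}{a_2}\right) + \sum_{i=1}^{n} y_i y_i^T\right) \tag{12}$$

$$a_k \mid \Sigma \sim \text{InvGamma}\!\left(\tfrac{\nu + d}{2},\; \nu\left(\Sigma^{-1}\right)_{kk} + \tfrac{1}{A_k^2}\right), \quad k = 1, 2 \tag{13}$$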

Use A1 = 0.05 and A2 = 0.05. Report the covariance estimate after 10^3 iterations of Gibbs sampling.
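A Gibbs sampler for this kind of hierarchy can be sketched as below. The sketch assumes the half-t style formulation of Huang and Wand, in which Σ given a is inverse Wishart with ν + d + n − 1 degrees of freedom and scale 2ν Diag(1/a1, 1/a2) + sum of yi yiᵀ, and each ak given Σ is inverse gamma with shape (ν + d)/2 and scale ν(Σ⁻¹)kk + 1/Ak². The value ν = 2, the burn-in length, and the use of the sample average as the point estimate are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import invwishart, invgamma

def gibbs_hierarchical(Y, A, nu=2.0, n_iter=1000, burn_in=100, seed=0):
    """Alternate between Sigma | a, y and a | Sigma; return the mean of kept Sigma draws."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    S = Y.T @ Y
    a = np.ones(d)                                   # arbitrary starting value
    kept = []
    for it in range(n_iter):
        # Draw Sigma from its inverse Wishart conditional.
        scale = 2.0 * nu * np.diag(1.0 / a) + S
        Sigma = invwishart.rvs(df=nu + d + n - 1, scale=scale, random_state=rng)
        # Draw each a_k from its inverse gamma conditional.
        prec_diag = np.diag(np.linalg.inv(Sigma))
        a = invgamma.rvs((nu + d) / 2.0, scale=nu * prec_diag + 1.0 / A**2,
                         size=d, random_state=rng)
        if it >= burn_in:
            kept.append(Sigma)
    return np.mean(kept, axis=0)

rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 2))                    # hypothetical data
print(gibbs_hierarchical(Y, A=np.array([0.05, 0.05])))
```

Each sweep uses the most recent draw of one block when sampling the other, which is the iterative scheme the exercise describes.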

  6. Empirical Bayes:

For empirical Bayes, we consider an inverse Wishart prior

$$p(\Sigma \mid \nu, \Delta) = \text{InvWishart}(\nu, \Delta). \tag{14}$$

However, instead of placing a distribution over the parameters of the inverse Wishart prior, the marginal likelihood is computed as follows:

$$p(\tilde{y} \mid \nu, \Delta) = \int p(\tilde{y} \mid \Sigma)\, p(\Sigma \mid \nu, \Delta)\, d\Sigma \tag{15}$$

The log of the marginal likelihood p(ỹ | ν, ∆) is then maximized with respect to ν and ∆ to obtain νopt and ∆opt:

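$$\Delta_{opt} = \frac{\nu_{opt}}{n} \sum_{i=1}^{n} y_i y_i^T \tag{16}$$

$$\nu_{opt} = \arg\max_{\nu} \left[ \frac{\nu d}{2}\log\frac{\nu}{n} - \frac{(\nu + n)d}{2}\log\frac{\nu + n}{n} + \log\Gamma_d\!\left(\frac{\nu + n}{2}\right) - \log\Gamma_d\!\left(\frac{\nu}{2}\right) \right] \tag{17}$$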

where Γd(a) is the multivariate gamma function,

$$\Gamma_d(a) = \pi^{d(d-1)/4} \prod_{j=1}^{d} \Gamma\!\left(a + \frac{1 - j}{2}\right). \tag{18}$$

You may employ an iterative optimization algorithm to solve (17). The posterior is given by

$$p(\Sigma \mid \tilde{y}, \nu_{opt}, \Delta_{opt}) = \text{InvWishart}\!\left(\nu_{opt} + n,\; \Delta_{opt} + \sum_{i=1}^{n} y_i y_i^T\right) \tag{19}$$

Consider the following hyperparameters for the prior:  and ν0 = 5. You can refer to https://emtiyaz.github.io/Writings/wishart.pdf for more details. Compare the estimate with the one obtained using the conjugate prior, non-informative prior and the hierarchical Bayes method.
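The marginal likelihood for an inverse Wishart prior has a closed form, which makes the empirical Bayes step easy to check numerically. Integrating the zero-mean Gaussian likelihood against the prior gives log p(ỹ | ν, ∆) = −(nd/2) log π + (ν/2) log|∆| − ((ν + n)/2) log|∆ + S| + log Γd((ν + n)/2) − log Γd(ν/2), with S the sum of the yi yiᵀ. Setting the gradient with respect to ∆ to zero for fixed ν gives ∆ = (ν/n) S. The sketch below evaluates both; the function names and the data are hypothetical.

```python
import numpy as np
from scipy.special import multigammaln

def log_marginal(nu, Delta, S, n):
    """log p(y | nu, Delta) for zero-mean Gaussian data with an inverse Wishart prior."""
    d = S.shape[0]
    _, logdet_prior = np.linalg.slogdet(Delta)
    _, logdet_post = np.linalg.slogdet(Delta + S)
    return (-0.5 * n * d * np.log(np.pi)
            + 0.5 * nu * logdet_prior
            - 0.5 * (nu + n) * logdet_post
            + multigammaln(0.5 * (nu + n), d)      # log multivariate gamma
            - multigammaln(0.5 * nu, d))

def delta_opt(nu, S, n):
    """Maximiser of the marginal likelihood over Delta for a fixed nu."""
    return (nu / n) * S

rng = np.random.default_rng(0)
n = 100
Y = rng.standard_normal((n, 2))                    # hypothetical data
S = Y.T @ Y

for nu in (3.0, 5.0, 20.0):
    D = delta_opt(nu, S, n)
    print(nu, log_marginal(nu, D, S, n))
```

The remaining search over ν is one-dimensional, so any bounded scalar optimizer can be applied to `log_marginal(nu, delta_opt(nu, S, n), S, n)`; note that ν must stay above d − 1 for the prior to be proper.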

Questions:

  • Which of the six methods listed above would you advocate for this problem and why?
  • Here, we deal with a dimension of d = 2. For a problem of a higher dimension, which method would you recommend? Justify.