GU4206-GR5206 Lab 6: Bayesian Estimation and MCMC


Make sure that you upload the PDF (or HTML) output after you have knitted the file. The file you upload to the Canvas page should include the commands you wrote to answer each of the questions below. You can edit this file directly to produce your final solutions.

Goals

This lab has two goals. The first goal is to use the accept-reject algorithm to simulate from a mixture of two normals. The second goal is to utilize Bayesian methods and the famous Markov chain Monte Carlo (MCMC) algorithm to estimate the mixture parameter δ.

Background: (Mixture)

A mixture distribution is the probability distribution of a random variable that is derived from a collection of other random variables (Wikipedia). In our case we consider a mixture of two normal distributions. Here we assume that our random variable is governed by the probability density f(x), defined by

f(x) = f(x; µ1, σ1, µ2, σ2, δ) = δ f1(x; µ1, σ1) + (1 − δ) f2(x; µ2, σ2),

where −∞ < x < ∞ and the parameter space is defined by −∞ < µ1, µ2 < ∞, σ1, σ2 > 0, and 0 ≤ δ ≤ 1.

The mixture parameter δ governs how much mass gets placed on the first distribution f1(x; µ1, σ1), and the complement 1 − δ governs how much mass gets placed on the other distribution f2(x; µ2, σ2).

To further motivate this setting, consider simulating n = 10,000 heights from the population of both males and females. Assume that males are distributed normal with mean µ1 = 70[in] and standard deviation σ1 = 3[in] and females are distributed normal with mean µ2 = 64[in] and standard deviation σ2 = 2.5[in]. Also assume that each distribution contributes equal mass, i.e., set the mixture parameter to δ = .5. The distribution of males is governed by

f1(x; µ1 = 70, σ1 = 3) = (1/(3√(2π))) exp(−(x − 70)²/(2 · 3²)),

and the distribution of females is governed by

f2(x; µ2 = 64, σ2 = 2.5) = (1/(2.5√(2π))) exp(−(x − 64)²/(2 · 2.5²)).

The plot below shows the pdfs of f1(x; µ1, σ1), f2(x; µ2, σ2), and the mixture f(x), all on the same plot.

x <- seq(45, 90, by = .05)
n.x <- length(x)
f_1 <- dnorm(x, mean = 70, sd = 3)
f_2 <- dnorm(x, mean = 64, sd = 2.5)
f <- function(x) {
  return(.5 * dnorm(x, mean = 70, sd = 3) + .5 * dnorm(x, mean = 64, sd = 2.5))
}

plot_df <- data.frame(x = c(x, x, x),
                      Density = c(f_1, f_2, f(x)),
                      Distribution = c(rep("Male", n.x), rep("Female", n.x),
                                       rep("Mixture", n.x)))

library(ggplot2)
ggplot(data = plot_df) +
  geom_line(mapping = aes(x = x, y = Density, color = Distribution)) +
  labs(title = "Mixture of Normals")

(Figure: "Mixture of Normals" plot produced by the code above.)

Part I: Simulating a Mixture of Normals

The first goal is to simulate from the mixture distribution

δ f1(x; µ1, σ1) + (1 − δ) f2(x; µ2, σ2),

where µ1 = 70, σ1 = 3, µ2 = 64, σ2 = 2.5, and δ = .5. We use the accept-reject algorithm to accomplish this task.

First we must choose the “easy to simulate” distribution g(x). For this problem choose g(x) to be a Cauchy distribution centered at 66 with scale parameter 7.

g <- function(x) {
  s <- 7
  l <- 66
  return(1 / (pi * s * (1 + ((x - l) / s)^2)))
}

Perform the following tasks

1) Identify a suitable value of alpha such that your envelope function e(x) satisfies

f(x) ≤ e(x) = g(x)/α, where 0 < α < 1.

Note that you must choose α so that e(x) is close to f(x). There is not one unique solution to this problem. The plot below shows how α = .20 creates an envelope function that is too large. Validate your choice of α with a graphic similar to the one below.

# Choose alpha
alpha <- .20

# Define envelope e(x)
e <- function(x) {
  return(g(x) / alpha)
}

# Plot
x.vec <- seq(30, 100, by = .1)
ggplot() +
  geom_line(mapping = aes(x = x.vec, y = f(x.vec)), col = "purple") +
  geom_line(mapping = aes(x = x.vec, y = e(x.vec)), col = "green")


# Is e(x) > f(x) everywhere on the grid?
all(e(x.vec) > f(x.vec))

 

## [1] TRUE

Solution

## Solution goes here -------
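One possible approach (a sketch, not the unique answer): the envelope condition f(x) ≤ g(x)/α is equivalent to α ≤ g(x)/f(x) for all x, so the largest feasible α is approximated by the minimum of g(x)/f(x) over a fine grid. The grid x.vec is reused from the chunk above.

alpha.max <- min(g(x.vec) / f(x.vec))  # largest alpha keeping e(x) >= f(x) on the grid
alpha.max
# Pick an alpha slightly below alpha.max and re-draw the plot above to validate.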

2) Write a function named r.norm.mix() that simulates n.samps from the normal-mixture f(x). To accomplish this task you will wrap a function around the accept-reject algorithm from the lecture notes. Also include the acceptance rate, i.e., how many times did the algorithm accept a draw compared to the total number of trials performed. Your function should return a list of two elements: (i) the simulated vector mixture and (ii) the proportion of accepted cases. Run your function r.norm.mix() to simulate 10,000 cases and display the first 20 values. What’s the proportion of accepted cases? Compare this number to your chosen α and comment on the result. The code below should help you get started.

Solution

r.norm.mix <- function(n.samps) {
  #n <- 0                    # counter for number of samples accepted
  #m <- 0                    # counter for number of trials
  #samps <- numeric(n.samps) # initialize the vector of output
  #while (n < n.samps) {
  #  # Fill in body -------
  #}
  #return(list(x = samps, alpha.hat = n.samps / m))
}

#r.norm.mix(n.samps=10000)$alpha.hat

#alpha
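For reference, a minimal sketch of one way to fill in the body, assuming f(), e(), and alpha are defined as above; the name r.norm.mix.sketch is a placeholder so it does not clobber your own function:

r.norm.mix.sketch <- function(n.samps) {
  n <- 0                      # number of samples accepted so far
  m <- 0                      # total number of proposals drawn
  samps <- numeric(n.samps)   # initialize the vector of output
  while (n < n.samps) {
    y <- rcauchy(1, location = 66, scale = 7)  # propose from g(x)
    m <- m + 1
    if (runif(1) <= f(y) / e(y)) {             # accept with probability f(y)/e(y)
      n <- n + 1
      samps[n] <- y
    }
  }
  return(list(x = samps, alpha.hat = n.samps / m))
}

The long-run acceptance rate n.samps/m should be close to your chosen α, since e(x) integrates to 1/α.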

3) Using ggplot or base R, construct a histogram of the simulated mixture distribution with the true mixture pdf f(x) overlaid on the plot.

Solution

## Solution goes here -------
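A base-R sketch, assuming sims holds the 10,000 draws from exercise (2):

sims <- r.norm.mix.sketch(n.samps = 10000)$x  # or your own r.norm.mix()
hist(sims, breaks = 50, freq = FALSE, xlab = "X", main = "Simulated mixture")
curve(f, from = 45, to = 90, add = TRUE, col = "blue", lwd = 2)  # true pdf overlay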

Part II: Bayesian Statistics and MCMC

Suppose that the experimenter collected 100 cases from the true mixture-normal distribution f(x). To solve problems (4) through (8) we analyze one realized sample from our function r.norm.mix(). In practice this dataset would be collected and not simulated. Uncomment the code below to simulate our dataset x. If you were unable to solve Part I, then read in the csv file mixture_data.csv posted on Canvas.

Solution

# Simulate data
#set.seed(1983)
#x <- r.norm.mix(n.samps = 100)$x
#head(x)
#hist(x, breaks = 20, xlab = "X", main = "")

# Or read data
x <- read.csv("mixture_data.csv")$x
head(x)

## [1] 71.66666 63.91096 67.06554 65.49516 70.34363 65.69982

hist(x, breaks = 20, xlab = "X", main = "")


Further, suppose that we know the true heights and standard deviations of the two normal distributions but the mixture parameter δ is unknown. In this case, we know µ1 = 70, σ1 = 3, µ2 = 64, σ2 = 2.5. The goal of this exercise is to utilize maximum likelihood and MCMC Bayesian techniques to estimate mixture parameter δ.

Maximum Likelihood Estimator of the Mixture Parameter

4) Set up the likelihood function L(δ | x1,…,x100) and define it as mix.like(). The function should have two inputs: the parameter delta and the data vector x. Evaluate the likelihood at the parameter values delta=.2, delta=.4, and delta=.6. Note that all three evaluations will be very small numbers. Which delta (δ = .2, .4, .6) is the most likely to have generated the dataset x?

Solution

## Solution goes here -------
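A sketch under the stated parameter values (µ1 = 70, σ1 = 3, µ2 = 64, σ2 = 2.5), with delta the only free input:

mix.like <- function(delta, x) {
  # Likelihood = product of the mixture density over the independent observations
  prod(delta * dnorm(x, mean = 70, sd = 3) +
         (1 - delta) * dnorm(x, mean = 64, sd = 2.5))
}
sapply(c(.2, .4, .6), mix.like, x = x)  # all three values are tiny; compare them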

5) Compute the maximum likelihood estimator of mixture parameter δ. To accomplish this task, apply your likelihood function mix.like() across the vector seq(.1,.99,by=.001). The solution to this exercise is given below.

# delta <- seq(.1, .99, by = .001)
# MLE.values <- sapply(delta, mix.like, x = x)
# delta.MLE <- delta[which.max(MLE.values)]
# plot(delta, MLE.values, ylab = "Likelihood", type = "l")
# abline(v = delta.MLE, col = "blue")
# text(x = delta.MLE + .08, y = mix.like(delta = .45, x = x), paste(delta.MLE), col = "blue")

MCMC

6) Run the Metropolis-Hastings algorithm to estimate the mixture parameter δ. In this exercise you will assume a Beta(α = 10, β = 10) prior distribution on the mixture parameter δ. Some notes follow:

  • Run 20,000 iterations, i.e., simulate 20,000 draws of δ(t).
  • Use the proposal distribution Beta(α = 10, β = 10).
  • Use an independence chain with Metropolis-Hastings ratio

R(δ(t), δ∗) = L(δ∗ | x1,…,x100) / L(δ(t) | x1,…,x100).

Display the first 20 simulated cases of δ(t).

Solution

## Solution goes here -------
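A minimal sketch of the independence chain, assuming mix.like() from exercise (4) and the data vector x. Because the prior and the proposal are both Beta(10, 10), they cancel in the acceptance ratio, leaving exactly the likelihood ratio displayed above:

set.seed(1)
n.iter <- 20000
delta_t_vec <- numeric(n.iter)
delta_t <- rbeta(1, shape1 = 10, shape2 = 10)       # initialize from the prior
for (t in 1:n.iter) {
  delta_star <- rbeta(1, shape1 = 10, shape2 = 10)  # independence proposal
  R <- mix.like(delta_star, x = x) / mix.like(delta_t, x = x)  # MH ratio
  if (runif(1) <= min(R, 1)) {
    delta_t <- delta_star   # accept the proposal
  }
  delta_t_vec[t] <- delta_t # otherwise keep the current value
}
delta_t_vec[1:20]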

7) Construct a lineplot of the simulated Markov chain from exercise (6). The vertical axis is the simulated chain δ(t) and the horizontal axis is the number of iterations.

Solution

## Solution goes here -------
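A base-R sketch, assuming delta_t_vec holds the chain from exercise (6):

plot(delta_t_vec, type = "l", xlab = "Iteration",
     ylab = expression(delta^{(t)}), main = "Trace plot: Prior Beta(10,10)")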

8) Plot the empirical autocorrelation function of your simulated chain δ(t), i.e., run the function acf(). A quick decay of the chain's autocorrelations indicates good mixing properties.

Solution

#acf(delta_t_vec, main = "ACF: Prior Beta(10,10)")

9) Compute the empirical Bayes estimate δ̂B of the simulated posterior distribution π(δ|x1,…,xn). To solve this problem, simply compute the sample mean of your simulated chain δ(t) after discarding a 20% burn-in.

Solution

## Solution goes here -------
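A sketch, again assuming the chain is stored in delta_t_vec:

burn.in <- 1:floor(0.2 * length(delta_t_vec))  # indices of the first 20% of draws
delta.B <- mean(delta_t_vec[-burn.in])         # posterior mean after burn-in
delta.B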

10) Construct a histogram of the simulated posterior π(δ|x1,…,xn) after discarding a 20% burn-in.

Solution

## Solution goes here -------
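A sketch, reusing burn.in from the previous exercise:

hist(delta_t_vec[-burn.in], breaks = 30, xlab = expression(delta),
     main = "Posterior: Prior Beta(10,10)")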

11) Run the Metropolis-Hastings algorithm to estimate the mixture parameter δ using a Beta(α = 15, β = 2) prior distribution on the mixture parameter δ. Repeat exercises 6 through 10 using the updated prior.

Solution

## Solution goes here -------
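The loop from exercise (6) carries over with only the rbeta() calls changed to the new prior/proposal (which again cancel in the ratio); a sketch, with delta_t_vec2 a placeholder name:

delta_t_vec2 <- numeric(n.iter)
delta_t <- rbeta(1, shape1 = 15, shape2 = 2)       # initialize from the Beta(15,2) prior
for (t in 1:n.iter) {
  delta_star <- rbeta(1, shape1 = 15, shape2 = 2)  # Beta(15,2) independence proposal
  R <- mix.like(delta_star, x = x) / mix.like(delta_t, x = x)
  if (runif(1) <= min(R, 1)) delta_t <- delta_star
  delta_t_vec2[t] <- delta_t
}

The lineplot, acf(), burn-in mean, and histogram steps below then apply to delta_t_vec2.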

Lineplot:

Construct a lineplot of the simulated Markov chain from exercise (11). The vertical axis is the simulated chain δ(t) and the horizontal axis is the number of iterations.

Solution

## Solution goes here -------

ACF:

Plot the empirical autocorrelation function of your simulated chain δ(t), i.e., run the function acf(). A slow decay of the chain's autocorrelations indicates poor mixing properties.

Solution

## Solution goes here -------

Bayes estimate:

Compute the empirical Bayes estimate δ̂B of the simulated posterior distribution π(δ|x1,…,xn). To solve this problem, simply compute the sample mean of your simulated chain δ(t) after discarding a 20% burn-in. Your answer should be close to the MLE.

Solution

## Solution goes here -------

Posterior: Construct a histogram of the simulated posterior π(δ|x1,…,xn) after discarding a 20% burn-in.

Solution

## Solution goes here -------
