IE613 Assignment 1


Question 1 Consider the problem of prediction with expert advice with d = 10 experts. Assume that the losses assigned to each expert are generated according to independent Bernoulli distributions. The adversary/environment generates the loss of experts 1 to 8 according to Ber(0.5) in each round. For the 9th expert, the loss is generated according to Ber(0.5 − ∆) in each round. The losses of the 10th expert are generated according to a different Bernoulli random variable in each round: for the first T/2 rounds they are generated according to Ber(0.5 + ∆), and for the remaining T/2 rounds according to Ber(0.5 − 2∆). Take ∆ = 0.1 and T = 10^5. Generate (pseudo) regret values for different learning rates (η) for the Weighted Majority algorithm. The averages should be taken over at least 20 sample paths (more is better). Display 95% confidence intervals for each plot. Vary c in steps of size 0.2 over the given interval to get different learning rates. Implement the Weighted Majority algorithm with

    η = c·√(2 log(d)/T).
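For reference, below is a minimal NumPy sketch of one way to simulate this setup. The environment follows the description above; the function names (expert_losses, weighted_majority), the sampled-expert reading of Weighted Majority (i.e. the exponentially weighted forecaster), and the learning rate η = c√(2 log(d)/T) are my own choices and assumptions, so adjust them to whatever your course notes specify.

```python
import numpy as np

def expert_losses(T, d=10, delta=0.1, rng=None):
    """Bernoulli losses as described above: experts 1-8 ~ Ber(0.5),
    expert 9 ~ Ber(0.5 - delta), expert 10 ~ Ber(0.5 + delta) for the
    first T/2 rounds and Ber(0.5 - 2*delta) for the rest."""
    rng = np.random.default_rng() if rng is None else rng
    means = np.full((T, d), 0.5)
    means[:, 8] = 0.5 - delta
    means[: T // 2, 9] = 0.5 + delta
    means[T // 2 :, 9] = 0.5 - 2 * delta
    losses = (rng.random((T, d)) < means).astype(float)
    return losses, means

def weighted_majority(losses, eta, rng=None):
    """Exponentially weighted forecaster: sample an expert from the current
    weights, then observe the full loss vector (full information)."""
    rng = np.random.default_rng() if rng is None else rng
    T, d = losses.shape
    log_w = np.zeros(d)                  # log-weights, for numerical stability
    picks = np.empty(T, dtype=int)
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        picks[t] = rng.choice(d, p=p)    # expert followed in round t
        log_w -= eta * losses[t]         # multiplicative-weights update
    return picks
```

The pseudo regret of one such sample path can then be computed from the mean matrix and the sequence of picks exactly as in the note at the end of this assignment.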

Question 2 Consider the problem of multi-armed bandits with K = 10 arms. Assume that the losses are generated as in Question 1. For each of the following algorithms, generate the (pseudo) regret for different learning rates (η); a minimal code sketch of the three update rules is given after the list. The averages should be taken over at least 50 sample paths (more is better). Display 95% confidence intervals for each plot. Vary c in the interval [0.1, 2.1] in steps of size 0.2 to get different learning rates.

  • EXP3. Set η = c√(2 log(K)/(KT)).
  • EXP3.P. Set η = c√(2 log(K)/(KT)), β = η, γ = Kη.
  • EXP3-IX. Set η = c√(2 log(K)/(KT)), γ = η/2.
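The sketch below shows one way to cover all three update rules with a single routine. The parameter names match the bullets above, but the function name run_exp3_family and the exact loss-based forms of the EXP3.P bonus and EXP3-IX estimator are my reading of the standard textbook versions, so verify them against your course notes before relying on them.

```python
import numpy as np

def run_exp3_family(losses, eta, beta=0.0, gamma=0.0, ix=0.0, rng=None):
    """One sample path on a (T, K) loss matrix.
    beta = gamma = ix = 0      -> EXP3
    beta = eta, gamma = K*eta  -> EXP3.P-style variant
    ix = eta / 2               -> EXP3-IX
    Returns the sequence of pulled arms."""
    rng = np.random.default_rng() if rng is None else rng
    T, K = losses.shape
    log_w = np.zeros(K)
    arms = np.empty(T, dtype=int)
    for t in range(T):
        q = np.exp(log_w - log_w.max())
        q /= q.sum()
        p = (1.0 - gamma) * q + gamma / K               # uniform exploration (EXP3.P)
        arm = rng.choice(K, p=p)
        arms[t] = arm
        loss_hat = np.zeros(K)
        loss_hat[arm] = losses[t, arm] / (p[arm] + ix)  # importance weighting; ix biases for stability
        loss_hat -= beta / p                            # optimistic bonus (EXP3.P, loss form)
        log_w -= eta * loss_hat
    return arms
```

With η = c√(2 log(K)/(KT)), the three settings in the list then correspond to run_exp3_family(losses, eta), run_exp3_family(losses, eta, beta=eta, gamma=K*eta), and run_exp3_family(losses, eta, ix=eta/2).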

Question 3  In Question 2, which one of EXP3, EXP3.P and EXP3-IX performs better and why?

Question 4 Show that for any deterministic policy π there exists an environment ν such that R_T(π, ν) ≥ T(1 − 1/K) for T rounds and K arms.

Question 5 Suppose we had defined the regret by

$$\hat R_T = \max_{i \in [K]} \sum_{t=1}^{T} x_{ti} - \sum_{t=1}^{T} x_{t I_t},$$

where I_t is the arm chosen by the policy π in round t and x_{t I_t} is the reward observed in round t. At first sight this definition seems like the right thing because it measures what you actually care about.


Unfortunately, however, it gives the adversary too much power. Show that for any policy π (randomized or not) there exists a ν ∈ [0,1]^{K×T} such that

Question 6 Let p ∈ P_k be a probability vector and suppose X̂ : [k] × ℝ → ℝ is a function such that for all x ∈ ℝ^k, if A ∼ p, then

$$\mathbb{E}[\hat X(A, x_A)] = \sum_{i=1}^{k} p_i \hat X(i, x_i) = x_1.$$

Show there exists an a ∈ ℝ^k such that … and … .

Question 7 Suppose we have a two-armed stochastic Bernoulli bandit with µ1 = 0.5 and µ2 = 0.55. Test your implementation of EXP3 from Question 2. What happens when T = 10^5 and the sequence of rewards is x_{t1} = I{t ≤ T/4} and x_{t2} = I{t > T/4}?
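One hedged way to set up the deterministic-reward part of this test, reusing the run_exp3_family sketch from Question 2 (which expects losses, so rewards are converted via loss = 1 − reward; the variable names are again my own):

```python
import numpy as np

T = 10**5
t = np.arange(1, T + 1)
rewards = np.column_stack([(t <= T // 4).astype(float),   # arm 1 pays only in the first quarter
                           (t > T // 4).astype(float)])    # arm 2 pays afterwards
losses = 1.0 - rewards                                      # convert rewards to losses
eta = np.sqrt(2 * np.log(2) / (2 * T))                      # c = 1 with K = 2
# arms = run_exp3_family(losses, eta)   # then inspect how often arm 2 is pulled in the later rounds
```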

Submission Format and Evaluation: You should submit a report along with your code. Please zip all your files and upload via Moodle. The zipped folder should be named as YourRegistrationNo.zip, e.g. ‘154290002.zip’. The report should contain two figures: the first figure should have one plot corresponding to the algorithm in Q.1, and the other should have 3 plots, one corresponding to each algorithm in Q.2. For each figure, write a brief summary of your observations. We may also call you to a face-to-face session to explain your code.

Note: Please calculate (pseudo) regret for each algorithm in Q.2 for a given set of parameters as follows:

Let µ_{it} denote the mean of arm i in round t. Suppose an adversary generates a sequence of loss vectors and an algorithm generates a sequence of pulls I_1, …, I_T; the (pseudo) regret for this sample path is

$$\sum_{t=1}^{T} \mu_{I_t t} \;-\; \min_{i} \sum_{t=1}^{T} \mu_{i t}. \qquad (1.2)$$

Note that in this calculation we only considered the mean values of the losses, not the actual losses suffered. It is okay if this value turns out to be negative. There is no expectation over the random choices of I_t here. Now generate 20 such sample paths and take their average.
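For concreteness, a minimal sketch of this calculation in the same style as the snippets above (here `means` is assumed to be the (T, K) matrix of loss means and `arms` the sequence of pulls, as produced by the earlier sketches; the 1.96 factor is the usual normal-approximation 95% interval and only one reasonable choice):

```python
import numpy as np

def pseudo_regret(means, arms):
    """Equation (1.2): cumulative mean loss of the pulled arms minus the
    cumulative mean loss of the best fixed arm.  Uses only the means, never
    the realised losses, and may be negative on a given sample path."""
    T = means.shape[0]
    return means[np.arange(T), arms].sum() - means.sum(axis=0).min()

def average_with_ci(values, z=1.96):
    """Average over sample paths plus a normal-approximation 95% half-width."""
    values = np.asarray(values, dtype=float)
    half = z * values.std(ddof=1) / np.sqrt(len(values))
    return values.mean(), half
```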
