CS 208 HW 2: Differential Privacy Foundations Solved

Description

Rate this product

Mechanisms: Consider the following mechanisms M that takes a dataset x ∈ [0,1]ⁿand returns an estimate of the mean ¯ .
- , for Z ∼ Lap(2/n), where for real numbers y and a ≤ b, [y]^b_adenotes the

“clamping” function:





[y]_a= y

b

if y < a if a ≤ y ≤ b .

if y > b

, for Z ∼ Lap(2/n). iii

(

1 w.p. x

M(x) =.

0 w.p. 1 − x.

iv M(x) = Y where Y has probability density function f_Ygiven as follows:

if y ∈ [0,1].

0 if y /∈ [0,1].

(This is an instantiation of a continuous version of the exponential mechanism.)

Which of the above mechanisms meet the definition of (,0)-differential privacy for a finite value of , and what is the smallest value of (possibly as a function of n) for which they do?
For those that do not meet the definition, calculate the smallest value of δ (again possibly as a function of n) for which they satisfy (,δ) differential privacy for a finite value of .
Describe how you would modify the algorithms to have tunable privacy parameters (and δ in case of mechanisms that require it) and tunable data domain [a,b] (rather than [0,1]).
Which of these algorithms do you consider to be “best” for releasing a mean and why?

(There is not a single “right” answer for this problem.)

Evaluating DP Algorithms with Synthetic Data: Consider a dataset x ∈ Nⁿdrawn from a Poisson process, which has probability distribution Pr[x_i= k] = 10^ke^−[1]0/k! for natural numbers k (where we consider k = 0 to be a natural number and define 0! = 1).
- Write a data generating process (DGP) function that generates a dataset x ∈ Nⁿaccording the above Poisson process.
- Pick one of your differentially private mechanisms from question 1 (generalized to allow for arbitrary choices of and data range [a,b] as parameters) that releases an estimate of ¯x. Implement this mechanism as a function which is given a vector of values x ∈ [a,b]ⁿand an and makes a differentially private release. You can assume the sample size n is public knowledge. To apply your mechanism to unbounded data x ∈ Rⁿ, you will have to clamp x to a chosen range [a,b]. For simplicity, we will fix a = 0 and only consider the effect of varying b.
- Recall the discussion on clamping from class; if the range is large, the sensitivity increases, so noise increases and utility drops. However, if you clip the values too aggressively the answer will be biased, and again utility will drop. For n = 200 and 5, plot the root mean squared error as a function of the upper bound b. Identify the approximate optimal value b^∗of b for this data distribution.
- Suppose we have an actual (not synthetic) dataset x ∈ Nⁿfor which we want to release a differentially private mean, and we don’t know the underlying distribution of x. Again, we need to select the parameter b and want to do so in a way that minimizes the error. A natural idea is to use a (nonparametric) bootstrap¹to generate many datasets that are “similar” to x in place of the data-generating process above, and optimize the choice of b as above. Once we find an optimal value b^∗, we then do our differentially private release on the dataset x

Explain why this approach is not safe in general and may violate differential privacy.

Propose some alternative methods for determining a good upper bound b for a given sensitive dataset x, while continuing to satisfying the definition of differential privacy.

Regression: Consider a dataset where each of its n rows is a pair of real numbers (x_i,y_i), where x_iis drawn from a Poisson distribution as in Problem 2, and y_iis then a noisy linear function of x_i, specifically:

y_i= βx_i+ α + ν_i; ν_i∼ N(0,σ^[2]) (1)

for unknown parameters α,β,σ.

One simple way to estimate the parameters α and β without privacy is by a simple linear regression, using the following estimators:

(2)

αˆ = ¯y − β^ˆx¯; (3)

In class and section, we saw an algorithm and implementation to produce a differentially private version of β^ˆ.²Augment this implementation to also produce a differentially private version of ˆα, so that the overall method for computing both ˆα and -DP, for an input parameter . If your algorithms use several basic differentially private algorithms as subroutines, divide the overall privacy budget of among them evenly. You can clamp both the x_i’s and y_i’s using reasonable values of your choosing (perhaps following your choice from Problem 2 for x). Describe the reasoning behind your choice in your write up.
Evaluate the performance of your algorithm using a Monte Carlo simulation with synthetic data as in Problem 2 Parts (a)–(c), for the parameters = 1 and n = 1000.

Measure utility by the mean-squared residuals:

(The non-private method for simple linear regression described above is chosen to minimize this quantity, hence the term “least-squares regression”.) Plot and compare the distributions of mean-squared residuals you get with your differentially private simple linear regression and with a non-private simple linear regression.

Now, run experiments to see if there is a different partition of your privacy budget in your algorithm that yields better utility. Use a grid search^[3] to explore different partitions of and see if you find one that is convincingly better (in terms of mean-squared residuals) than an equal partition under the given data distribution. Show and explain your results.

DP vs. Reconstruction Attacks: Suppose M : {0,1}ⁿ→ Y is an (,δ)-DP mechanism and A : Y → {0,1}ⁿis an adversary that is trying to reconstruct the sensitive bits in the dataset x ∈ {0,1}ⁿfrom the output M(x). Suppose the dataset is a random variable X = (X₁,…,X_n) consisting of n iid draws from a Bernoulli(p) distribution, for a known value of p. Prove that the expected fraction of bits that the adversary successfully reconstructs is not much larger than the trivial bound of max{p,1 − p} (which can be achieved by guessing the all-zeroes or all-ones dataset). Specifically:

E[#

(Hint: write the quantity inside the expectation as an average of indicator random variables, and for each i, consider running M on the dataset X⁽ⁱ⁾where we replace the i’th row of X with the fixed value 0.)

CS 208 HW 2: Differential Privacy Foundations Solved

If Helpful Share:

Description

Related products

CS 208 HW 4A: The Local Model Solved

CS 208 HW 4B: Stochastic Gradient Descent and Lipschitz Extensions

CS 208 HW 1: Reidentification, Reconstruction and Membership Attacks Solved