Description
Problem 1
Prove that the derivative of θ^T X^T X θ with respect to θ is 2 X^T X θ.
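Before writing the proof, it can help to confirm the identity numerically. The sketch below is only a sanity check, not a proof. It compares the claimed gradient 2 X^T X θ against a central finite-difference estimate of f(θ) = θ^T X^T X θ. The function names and the small example matrix are hypothetical and chosen purely for illustration.

```python
def quad_form(X, theta):
    """f(theta) = theta^T X^T X theta, computed as the squared norm of X @ theta."""
    return sum(sum(x * t for x, t in zip(row, theta)) ** 2 for row in X)


def analytic_grad(X, theta):
    """Claimed gradient 2 X^T X theta, computed as 2 X^T (X theta)."""
    r = [sum(x * t for x, t in zip(row, theta)) for row in X]  # r = X theta
    return [2 * sum(X[i][j] * r[i] for i in range(len(X)))
            for j in range(len(theta))]


def numeric_grad(X, theta, eps=1e-6):
    """Central finite-difference estimate of the gradient of quad_form."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[j] += eps
        dn[j] -= eps
        grad.append((quad_form(X, up) - quad_form(X, dn)) / (2 * eps))
    return grad


# Illustrative values only (not part of the assignment).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
theta = [0.5, -1.0]
print(analytic_grad(X, theta))
print(numeric_grad(X, theta))
```

The two printed vectors should agree up to floating-point rounding. The written proof still has to show why, for example by expanding the quadratic form component by component.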
Problem 2
Age | Home Owner | Car Owner | Having Kids | Salary |
40 | Yes | Yes | Yes | 10000 |
20 | No | No | No | 500 |
50 | Yes | No | Yes | 8000 |
30 | Yes | No | No | 5000 |
Tasks
- Run GBM on paper for two iterations (i.e., stopping at F2 and PR2). Use no more than 4 leaves per tree. Use learning rate γ = 0. Features can be re-used within a decision tree.
- Run XGBoost on paper for two iterations (i.e., stopping at F2 and PR2). Use no more than 4 leaves per tree. Use regularizer λ = 1, pruning γ = 0, and learning rate µ = 0.1.
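The hand calculations for both tasks rely on a few small formulas, which the sketch below collects so that paper results can be checked. It assumes squared-error loss, so the pseudo-residual is y - F(x) and each sample has Hessian 1. It also assumes the initial prediction F0 is the mean salary, and it uses the split gain from the XGBoost paper, which includes a factor of 1/2 (some course notes drop it). All function names are hypothetical, and the Home Owner split is just one candidate chosen for illustration.

```python
salaries = [10000, 500, 8000, 5000]  # Salary column from the table above


def pseudo_residuals(y, pred):
    """Negative gradient of squared-error loss: y_i - F(x_i)."""
    return [yi - pi for yi, pi in zip(y, pred)]


def xgb_leaf_weight(residuals, lam):
    """Optimal leaf output under squared error: sum(residuals) / (n + lambda)."""
    return sum(residuals) / (len(residuals) + lam)


def xgb_similarity(residuals, lam):
    """Leaf score G^2 / (H + lambda); G is the residual sum and H = n here."""
    return sum(residuals) ** 2 / (len(residuals) + lam)


def xgb_gain(left, right, lam, gamma):
    """Split gain: 0.5 * (left + right - parent scores) - gamma."""
    parent = left + right
    return 0.5 * (xgb_similarity(left, lam) + xgb_similarity(right, lam)
                  - xgb_similarity(parent, lam)) - gamma


# Assumption: F0 is the mean of the targets.
f0 = sum(salaries) / len(salaries)
pr1 = pseudo_residuals(salaries, [f0] * len(salaries))

# Candidate split on Home Owner: rows 1, 3, 4 are "Yes"; row 2 is "No".
yes = [pr1[0], pr1[2], pr1[3]]
no = [pr1[1]]
print(f0, pr1)
print(xgb_leaf_weight(yes, lam=1), xgb_leaf_weight(no, lam=1))
print(xgb_gain(yes, no, lam=1, gamma=0))
```

The same helpers can be reused for the second iteration by recomputing the residuals against the updated predictions, which are the previous prediction plus the learning rate times the leaf output.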
Problem 3 (Open-Ended)
Dataset
California housing price data from 1990-2000. The first nine items below are the features and the tenth is the target.
- longitude: A measure of how far west a house is; a more negative value is farther west
- latitude: A measure of how far north a house is; a higher value is farther north
- housingMedianAge: Median age of a house within a block; a lower number is a newer building
- totalRooms: Total number of rooms within a block
- totalBedrooms: Total number of bedrooms within a block
- population: Total number of people residing within a block
- households: Total number of households (groups of people residing within a home unit) within a block
- medianIncome: Median income for households within a block of houses (measured in tens of thousands of US Dollars)
- oceanProximity: Location of the house relative to the ocean or sea
- medianHouseValue: Median house value for households within a block (measured in US Dollars)
Tasks
- Build a Linear Regression model using an 80% training / 20% testing split. Interpret your results as much as you can.
- Build a GBM using an 80% training / 20% testing split. Interpret your results as much as you can.
- Build an XGBoost model using an 80% training / 20% testing split. Interpret your results as much as you can.
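One possible way to organize the three tasks is sketched below: a single 80/20 split shared by all models, so their test scores are directly comparable. This is a minimal sketch under several assumptions. The data is already loaded into a pandas DataFrame whose columns use the names listed above, rows with missing values are simply dropped, oceanProximity is one-hot encoded, and scikit-learn is available. The xgboost package is treated as optional. The helper names and hyperparameter values are hypothetical starting points, not recommended settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

TARGET = "medianHouseValue"        # assumed column name, from the list above
CATEGORICAL = ["oceanProximity"]   # the only non-numeric feature in the list


def prepare(df):
    """Drop incomplete rows, one-hot encode the categorical column, split X / y."""
    df = df.dropna()  # assumption: missing values are dropped, not imputed
    X = pd.get_dummies(df.drop(columns=[TARGET]), columns=CATEGORICAL, dtype=float)
    return X, df[TARGET]


def default_models():
    """Linear regression and GBM, plus XGBoost when the package is installed."""
    models = {
        "linear": LinearRegression(),
        "gbm": GradientBoostingRegressor(random_state=0),
    }
    try:
        from xgboost import XGBRegressor  # optional dependency
        models["xgboost"] = XGBRegressor(
            n_estimators=200, learning_rate=0.1, random_state=0)
    except ImportError:
        pass
    return models


def compare_models(df, models, seed=0):
    """Fit every model on the same 80% split and score it on the held-out 20%."""
    X, y = prepare(df)
    # Plain arrays avoid any restrictions a library places on column names.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X.to_numpy(dtype=float), y.to_numpy(dtype=float),
        test_size=0.2, random_state=seed)
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        scores[name] = {
            "rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
            "r2": float(r2_score(y_te, pred)),
        }
    return scores
```

Calling `compare_models(df, default_models())` returns a test RMSE and R² per model. For the interpretation part, the linear model's `coef_` and the boosted models' `feature_importances_` are natural starting points, paired with the column order returned by `prepare`.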