Machine-Learning Homework 2: Banking Insurance Product Phase 2

Background

The Commercial Banking Corporation (hereafter the “Bank”), acting by and through its department of Customer Services and New Products, is seeking proposals for banking services. The Bank ultimately wants to predict which customers will buy a variable rate annuity product. Previously, the Bank sought consulting work on the same project with an additional focus on understanding the factors involved. In this phase, the focus is more on predictive power.

A variable annuity is a contract between you and an insurance company / bank, under which the insurer agrees to make periodic payments to you, beginning either immediately or at some future date. You purchase a variable annuity contract by making either a single purchase payment or a series of purchase payments.

A variable annuity offers a range of investment options. The value of your investment as a variable annuity owner will vary depending on the performance of the investment options you choose. The investment options for a variable annuity are typically mutual funds that invest in stocks, bonds, money market instruments, or some combination of the three. If you are interested in more information, see: http://www.sec.gov/investor/pubs/varannty.htm

The project will be broken down into 3 phases:

  • Phase 1 – MARS and GAMs
  • Phase 2 – Tree-Based Models
  • Phase 3 – Model Interpretation

Objective – Phase 2

The scope of services in this phase includes the following:

  • For this phase use only the insurance_t data set.
  • Previous analysis has identified potential predictor variables related to the purchase of the insurance product, so no initial variable selection before model building is necessary.

  • The data has missing values that need to be imputed.

    o Typically, the Bank has used median and mode imputation for continuous and categorical variables, respectively, but is open to other techniques if they are justified in the report.

  • The Bank is interested in the value of random forest models.

    o Build a random forest model.

    § (HINT: You CANNOT just copy and paste the code from class. In class we built a model to predict a continuous variable. Make sure your target variable is a factor for the random forest.)

    o Tune the model parameters and recommend a final random forest model.

    § You are welcome to consider variable selection as well for building your final model. Describe your process for arriving at your final model.

    o Report the variable importance for each of the variables in the model.

    § Pick one metric to rank things by – no need to report multiple metrics for each variable.

    o Report the area under the ROC curve as well as a plot of the ROC curve.

    § (HINT: Use the same approaches you used back in the logistic regression class.)

  • The Bank is also interested in the value of an XGBoost model.

    o Build an XGBoost model.

    § (HINT: You CANNOT just copy and paste the code from class. In class we built a model to predict a continuous variable. You will need to look up the documentation for the ‘objective = “binary:logistic”’ option.)

    § Use the area under the ROC curve (AUC) as your evaluation metric instead of the default in XGBoost.

    o Tune the model parameters and recommend a final XGBoost model.

    § You are welcome to consider variable selection as well for building your final model. Describe your process for arriving at your final model.

    o Report the variable importance for each of the variables in the model.

    o Report the area under the ROC curve as well as a plot of the ROC curve.

    § (HINT: Use the same approaches you used back in the logistic regression class.)
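
Both model sections above ask for the area under the ROC curve and a plot of the curve. As a rough illustration of what that metric measures, the sketch below computes ROC points and a trapezoidal AUC in plain Python, independent of whichever modelling library is used for the assignment. The function names `roc_curve_points` and `auc_trapezoid` and the toy labels and scores are hypothetical and not part of the assignment; in practice the labels would be the INS values and the scores would be each model's predicted probabilities.

```python
from typing import List, Sequence, Tuple


def roc_curve_points(
    y_true: Sequence[int], scores: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per distinct score."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative label")

    # Sort by predicted score, highest first, so each step lowers the threshold.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every observation tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Integrate the ROC curve with the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): 1 = bought, 0 = did not buy.
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]

points = roc_curve_points(labels, predicted)
auc = auc_trapezoid(points)
print(points)
print(auc)  # 0.75
```

The list returned by `roc_curve_points` can be passed to any plotting tool to draw the curve, with the false positive rate on the horizontal axis and the true positive rate on the vertical axis. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.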

Data Provided

The following two sets of data are provided for the proposal:

  • The training data set insurance_t contains 8,495 observations and selected variables.

    o All of these customers have been offered the product. The outcome is recorded in the data set under the variable INS, which takes a value of 1 if they bought and 0 if they did not buy.

    o There are selected variables describing the customer’s attributes before they were offered the new insurance product.

  • The validation data set insurance_v contains 2,124 observations and selected variables.
  • The table below describes the Roles and Description of the variables found in both data sets.

    o Except for Branch of Bank, consider anything with more than 10 distinct values as continuous.

Name      Model Role   Description
ACCTAGE   Input        Age of oldest account
DDA       Input        Indicator for checking account
DDABAL    Input        Checking account balance
DEP       Input        Checking deposits
DEPAMT    Input        Total amount deposited
CHECKS    Input        Number of checks written
DIRDEP    Input        Indicator for direct deposit
NSF       Input        Number of insufficient fund issues
NSFAMT    Input        Amount of NSF
PHONE     Input        Number of telephone banking interactions
TELLER    Input        Number of teller visit interactions
SAV       Input        Indicator for savings account
SAVBAL    Input        Savings account balance
ATM       Input        Indicator for ATM interaction
ATMAMT    Input        Total ATM withdrawal amount
POS       Input        Number of point of sale interactions
POSAMT    Input        Total amount for point of sale interactions
CD        Input        Indicator for certificate of deposit account
CDBAL     Input        CD balance
IRA       Input        Indicator for retirement account
IRABAL    Input        IRA balance
INV       Input        Indicator for investment account
INVBAL    Input        INV balance
MM        Input        Indicator for money market account
MMBAL     Input        MM balance
MMCRED    Input        Number of money market credits
CC        Input        Indicator for credit card
CCBAL     Input        CC balance
CCPURC    Input        Number of credit card purchases
SDB       Input        Indicator for safety deposit box
INCOME    Input        Income
LORES     Input        Length of residence in years
HMVAL     Input        Value of home
AGE       Input        Age
CRSCORE   Input        Credit score
INAREA    Input        Indicator for local address
INS       Target       Indicator for purchase of insurance product
BRANCH    Input        Branch of bank
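
The typing rule stated above (more than 10 distinct values means continuous, except for Branch of Bank) decides which imputation the Bank's usual approach would apply to each variable: median for continuous, mode for categorical. The sketch below shows one way to combine the two steps in plain Python. The helper names `is_continuous` and `impute`, the dictionary-of-columns layout, and the toy values are all assumptions made for illustration; they are not taken from the insurance_t data.

```python
from statistics import median, mode
from typing import Any, Dict, List

Columns = Dict[str, List[Any]]  # column name -> values, with None marking a missing entry


def is_continuous(name: str, values: List[Any], max_levels: int = 10) -> bool:
    """More than `max_levels` distinct observed values means continuous, except BRANCH."""
    if name == "BRANCH":
        return False
    distinct = {v for v in values if v is not None}
    return len(distinct) > max_levels


def impute(columns: Columns) -> Columns:
    """Fill missing entries with the median (continuous) or the mode (categorical)."""
    filled: Columns = {}
    for name, values in columns.items():
        # Assumes every column has at least one observed value.
        observed = [v for v in values if v is not None]
        if is_continuous(name, values):
            fill = median(observed)
        else:
            fill = mode(observed)
        filled[name] = [fill if v is None else v for v in values]
    return filled


# Toy columns (hypothetical values) using three names from the table above.
toy: Columns = {
    "ACCTAGE": [0.3, 0.7, 1.2, 2.5, 3.1, 4.0, 5.5, 6.1, 7.4, 8.8, 9.9, 12.0, None],
    "DDA": [1, 1, 0, 1, None, 1, 0, 1, 1, 0, 1, 1, 1],
    "BRANCH": ["B1", "B2", "B2", None, "B3", "B2", "B1", "B2", "B4", "B2", "B5", "B2", "B2"],
}

filled = impute(toy)
print(filled["ACCTAGE"][-1])  # 4.75, the median of the 12 observed values
print(filled["DDA"][4])       # 1, the most frequent value
print(filled["BRANCH"][3])    # B2, the most frequent branch
```

If the fitted models are later scored on insurance_v, a common choice is to compute the fill values on the training data only and reuse them on the validation data, so that no information from the validation set leaks into the model.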
