CENG499 Introduction to Machine Learning Homework 3: Support Vector Machines and Linear Regression

1           Introduction

In this assignment, you will have the chance to get hands-on experience with support vector machines (SVM) and linear regression. Python is the programming language of choice for this homework.

2           Support Vector Machines (40 pts)

In this task, you will improve your comprehension of the SVM by experimenting with various kernel functions and the hyperparameter C. You will employ the SVM implementation (called SVC, link) of scikit-learn, a popular machine learning library, for both subtasks. In each subtask, you will read the corresponding dataset with the NumPy library and plot the decision boundaries with Matplotlib, a popular plotting library. No other external library is allowed.

2.1         Kernel Functions

The dataset file for this subtask is task1_A.npz; you will read it using the code snippet below. X is the feature matrix, of shape (100, 2), and y contains the corresponding labels, of shape (100,). Notice that the file is placed under the directory task1 (do not change the location).

import numpy as np

data = np.load("task1\\task1_A.npz")
X, y = data["X"], data["y"]

Write your code in task1_A.py. When grading, your implementation will be called as follows.

python task1_A.py

Using the dataset, your program will train 4 different SVMs, each with a different kernel parameter. The values you will try for the kernel parameter are 'linear', 'sigmoid', 'poly', and 'rbf'. Stick with the default values for the other parameters of the SVC. As output, your program will show 4 decision boundary plots on the screen. We have provided an example decision boundary plot in the handout. Label the axes and state the kernel name and the training accuracy in each plot.
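For reference, here is a minimal sketch of what task1_A.py could look like. It is only a sketch: the meshgrid-based way of drawing the decision boundary, the grid resolution, and the axis labels x1/x2 are assumptions, not requirements stated in the handout.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

data = np.load("task1\\task1_A.npz")
X, y = data["X"], data["y"]

for kernel in ("linear", "sigmoid", "poly", "rbf"):
    clf = SVC(kernel=kernel)  # all other SVC parameters left at their defaults
    clf.fit(X, y)
    acc = clf.score(X, y)  # training accuracy

    # Predict on a dense grid covering the data to visualize the decision boundary.
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    plt.contourf(xx, yy, zz, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.title("kernel=%s, training accuracy=%.2f" % (kernel, acc))
    plt.show()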

2.2         The hyperparameter C

The dataset file for this subtask is task1_B.npz; you will read it using the code snippet below. X is the feature matrix, of shape (100, 2), and y contains the corresponding labels, of shape (100,). Notice that the file is assumed to be under the directory task1 (do not change the location).

import numpy as np

data = np.load("task1\\task1_B.npz")
X, y = data["X"], data["y"]

Write your code in task1_B.py. When grading, your implementation will be called as follows.

python task1_B.py

Using the dataset, your program will train 4 different SVMs, each with a different value of the C parameter. The values you will try for C are 0.01, 0.1, 1, and 10. Use the polynomial kernel. Stick with the default values for the other parameters of the SVC. As output, your program will show 4 decision boundary plots on the screen. We have provided an example decision boundary plot in the handout. Label the axes and state the C value and the training accuracy in each plot.
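task1_B.py can reuse the same plotting scaffold; only the training loop changes. Again a sketch, with the same assumed plotting details as above:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

data = np.load("task1\\task1_B.npz")
X, y = data["X"], data["y"]

for C in (0.01, 0.1, 1, 10):
    clf = SVC(kernel="poly", C=C).fit(X, y)  # other parameters at their defaults
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, zz, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.title("C=%g, training accuracy=%.2f" % (C, clf.score(X, y)))
    plt.show()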

3           Linear Regression (60 pts)

In this section, you are going to perform regression on a dataset using multivariate linear regression. Specifically, you will parse the input dataset, perform normalization on it, train the model with gradient descent, and compute the performance metric on the test set. You will write your code in the function linear_regression in task2.py. When grading, we will call the function with different parameters. The dataset for this task is taken from here (link). A sketch of one possible implementation is given after the list below.

  • No external library is allowed for this task.
  • As we stated above, we will use different datasets when grading. Although those datasets have the same structure as the provided dataset, they have a different number of features. In short, do not assume a fixed value for the number of features.
  • To normalize both sets, use the equation below, where $X$ is a feature, $X_{\min}$ is the minimum value for the feature $X$, and $X_{\max}$ is the maximum value for the feature $X$. Also, do not apply normalization to the target value (the last column).

    $$X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
  • The performance metric for this task is RMSE (Root Mean Square Error), where $m$ is the number of instances in the set, $y_i$ is the target value of the $i$-th instance, and $\hat{y}_i$ is the predicted value of the $i$-th instance:

    $$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}$$
  • For the provided dataset, the RMSE value you should achieve is 4.76 (when num_epochs is 1000 and the learning rate is 0.001).
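Below is a minimal sketch of one possible linear_regression, written in plain Python since no external library is allowed for this task. The exact interface is not specified in this text, so the signature, the argument names (train_data, test_data, num_epochs, learning_rate), the row layout (features first, target in the last column), and the choice to take each feature's min/max from the set being normalized are all assumptions; defer to the provided task2.py for the real interface.

import math

def linear_regression(train_data, test_data, num_epochs=1000, learning_rate=0.001):
    # Assumed layout: each row is [x_1, ..., x_n, y], with the target last.
    n = len(train_data[0]) - 1

    def normalize(rows):
        # Min-max normalize every feature column; the target column is untouched.
        # Taking the min/max from the set being normalized is one reading of the
        # handout; the graders' intended convention may differ.
        out = [list(row) for row in rows]
        for j in range(n):
            col = [row[j] for row in rows]
            lo, hi = min(col), max(col)
            for row in out:
                row[j] = (row[j] - lo) / (hi - lo) if hi > lo else 0.0
        return out

    train, test = normalize(train_data), normalize(test_data)

    # One weight per feature plus a bias term, initialized to zero.
    w = [0.0] * n
    b = 0.0
    m = len(train)

    for _ in range(num_epochs):
        # Batch gradient descent on the mean squared error. The factor of 2
        # comes from differentiating the squared error; some courses fold it
        # into the learning rate instead.
        grad_w = [0.0] * n
        grad_b = 0.0
        for row in train:
            err = b + sum(w[j] * row[j] for j in range(n)) - row[-1]
            for j in range(n):
                grad_w[j] += err * row[j]
            grad_b += err
        for j in range(n):
            w[j] -= learning_rate * (2.0 / m) * grad_w[j]
        b -= learning_rate * (2.0 / m) * grad_b

    # RMSE on the normalized test set.
    sq_err = sum((b + sum(w[j] * row[j] for j in range(n)) - row[-1]) ** 2
                 for row in test)
    return math.sqrt(sq_err / len(test))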

4           Specifications

  • There is a time limit for each (sub)task. If a (sub)task takes more than 5 minutes to produce its result, you will receive zero points for that (sub)task.
  • Falsifying results and changing the composition of the training and test data are strictly forbidden, and you will receive a 0 if you do so. Your programs will be examined to verify that you actually obtained the reported results and that they work correctly.
  • Using any piece of code that is not your own is strictly forbidden and constitutes cheating. This includes code from friends, previous homeworks, or the internet. Violators will be punished according to the department regulations.
  • Follow the course page on ODTUClass for any updates and clarifications. Please ask your questions on the discussion forum of ODTUClass instead of e-mailing.
  • You have a total of 3 late days for all your homeworks. For each day you submit late, you will lose 10 points. Homeworks submitted late after your total of 3 late days has been exhausted will not be graded.

5           Submission

Submission will be done via ODTUClass. You will submit a zip file called hw3.zip that contains task1_A.py, task1_B.py, and task2.py.
