
Environment
- OS: Windows, Mac OS, or Linux
- Languages: C++, Java, or Python (any version is OK)

Goal: Perform clustering on a given data set by using DBSCAN.
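As a rough illustration of the goal, the following is a minimal DBSCAN sketch in Python. It is not a reference solution: the function names (`region_query`, `dbscan`), the `NOISE` label of -1, the brute-force neighbor search, and the choice to count a point as a member of its own Eps-neighborhood are all assumptions made for this example.

```python
from collections import deque
from math import hypot

NOISE = -1        # label given to outliers (assumed convention)
UNVISITED = None  # label for points that have not been processed yet


def region_query(points, i, eps):
    """Return the indices of all points within distance eps of points[i].

    The point itself is included. This is a brute-force O(N) scan per query.
    """
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Label every point with a cluster index (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels
```

The brute-force search keeps the sketch short; for larger inputs a spatial index (for example a grid or k-d tree) would likely be needed.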

3. Requirements

The program must meet the following requirements:

- Execution file name: exe
- Execute the program with four arguments: input data file name, n, Eps, and MinPts
  - Three input data files will be provided: ‘input1.txt’, ‘input2.txt’, ‘input3.txt’
  - n: number of clusters for the corresponding input data
  - Eps: maximum radius of the neighborhood
  - MinPts: minimum number of points in an Eps-neighborhood of a given point
  - We suggest that you use the following parameters (n, Eps, MinPts) for each input data:
    - For ‘input1.txt’, n=8, Eps=15, MinPts=22
    - For ‘input2.txt’, n=5, Eps=2, MinPts=7
    - For ‘input3.txt’, n=4, Eps=5, MinPts=5
- Example: Input data file name = ‘input1.txt’, n = 8, Eps = 15, MinPts = 22
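One possible way to read the four arguments in Python is sketched below. The helper name `parse_args` is hypothetical, and treating Eps as a float (while n and MinPts are integers) is an assumption; the assignment only lists the argument order.

```python
def parse_args(argv):
    """Parse [input file, n, Eps, MinPts] from a list of argument strings."""
    if len(argv) != 4:
        raise ValueError("expected 4 arguments: <input file> <n> <Eps> <MinPts>")
    input_file = argv[0]
    n = int(argv[1])        # number of clusters to output
    eps = float(argv[2])    # assumed float so non-integer radii also work
    min_pts = int(argv[3])  # minimum points in an Eps-neighborhood
    return input_file, n, eps, min_pts


# Example with the suggested parameters for 'input1.txt'
print(parse_args(["input1.txt", "8", "15", "22"]))
# prints ('input1.txt', 8, 15.0, 22)
```

In an actual program, the list would typically come from `sys.argv[1:]`.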

File format for the input data

[object_id_1]\t[x_coordinate]\t[y_coordinate]\n
[object_id_2]\t[x_coordinate]\t[y_coordinate]\n
[object_id_3]\t[x_coordinate]\t[y_coordinate]\n
[object_id_4]\t[x_coordinate]\t[y_coordinate]\n

- Row: information of an object
  - [object_id_i]: identifier of the i-th object
  - [x_coordinate], [y_coordinate]: the location of the corresponding object in the 2-dimensional space
- Example:

Figure 1. An example of an input data.
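The tab-separated format above could be parsed along the following lines. This is only a sketch: `parse_points` is a hypothetical helper, object ids are kept as strings so they can be written back unchanged, and blank lines are skipped as a precaution.

```python
def parse_points(lines):
    """Parse rows of 'object_id<TAB>x<TAB>y' into (ids, points).

    `lines` is any iterable of strings, such as an open file.
    Returns a list of ids and a parallel list of (x, y) float tuples.
    """
    ids, points = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank or trailing empty lines
        object_id, x, y = line.split("\t")
        ids.append(object_id)
        points.append((float(x), float(y)))
    return ids, points
```

With a real file, this might be called as `with open("input1.txt") as f: ids, points = parse_points(f)`.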

Output files
- You must print n output files for each input data
  - (Optional) If your algorithm finds m clusters for an input data and m is greater than n (n = the number of clusters given), you can remove (m–n) clusters based on the number of objects within each cluster. For example, you can remove the (m–n) smallest clusters.
  - You can remove outliers. In other words, you don’t need to include an outlier in any cluster.
- File format for the output of ‘input#.txt’

‘input#_cluster_0.txt’

[object_id]\n

‘input#_cluster_1.txt’

[object_id]\n

‘input#_cluster_n-1.txt’

[object_id]\n

- ‘input#_cluster_i.txt’ should contain all the ids belonging to cluster i that were obtained by using your algorithm
- You are supposed to follow the naming scheme for the output files shown above
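The optional pruning step and the output naming scheme might be implemented roughly as follows. All helper names (`group_by_cluster`, `keep_largest`, `write_clusters`) are hypothetical, as is the `out_dir` parameter and the convention that outliers carry the label -1. Cluster files are numbered in descending order of size here, which is an arbitrary choice.

```python
import os
from collections import defaultdict


def group_by_cluster(ids, labels, noise=-1):
    """Group object ids by cluster label, leaving outliers out."""
    clusters = defaultdict(list)
    for object_id, label in zip(ids, labels):
        if label != noise:
            clusters[label].append(object_id)
    return list(clusters.values())


def keep_largest(clusters, n):
    """Keep the n largest clusters, i.e. drop the (m - n) smallest ones."""
    return sorted(clusters, key=len, reverse=True)[:n]


def write_clusters(input_path, clusters, out_dir="."):
    """Write cluster i to '<input name>_cluster_<i>.txt', one id per line."""
    stem = os.path.splitext(os.path.basename(input_path))[0]
    paths = []
    for i, members in enumerate(clusters):
        path = os.path.join(out_dir, f"{stem}_cluster_{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for object_id in members:
                f.write(f"{object_id}\n")
        paths.append(path)
    return paths
```

Note that if the algorithm finds fewer than n clusters, this sketch writes fewer than n files; how to handle that case is left open here.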

Rubric
- The following figure shows the clustering result for each input data

- Test method
  - For testing, we will use a measure similar to the Kendall’s tau measure. Please refer to the following Wikipedia page: (http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient)

Example
- Correct answer: [object_id_1] and [object_id_2] are contained in different clusters
- Your answer
  - [object_id_1] and [object_id_2] are contained in the same cluster → INCORRECT
  - [object_id_1] and [object_id_2] are contained in different clusters → CORRECT

The final score will be computed as follows:

$$\frac{\text{The number of correct pairs}}{\text{The number of all possible pairs}}$$
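To make the pairwise measure concrete, here is a small sketch that counts a pair as correct when both clusterings agree on whether the two objects share a cluster. The function name `pair_score` and the dictionary inputs are assumptions. How the actual testing program treats outliers or scales the result is not stated in this document, so this version simply requires every object to have a label in both inputs.

```python
from itertools import combinations


def pair_score(predicted, ideal):
    """Fraction of object pairs on which two clusterings agree.

    `predicted` and `ideal` map object id -> cluster label and must
    cover the same ids. A pair is correct when both clusterings say
    "same cluster" or both say "different clusters".
    """
    ids = sorted(ideal)
    correct = 0
    total = 0
    for a, b in combinations(ids, 2):
        same_predicted = predicted[a] == predicted[b]
        same_ideal = ideal[a] == ideal[b]
        if same_predicted == same_ideal:
            correct += 1
        total += 1
    return correct / total if total else 1.0


ideal = {1: 0, 2: 0, 3: 1, 4: 1}
predicted = {1: 5, 2: 5, 3: 5, 4: 7}
print(pair_score(predicted, ideal))  # prints 0.5 (3 of 6 pairs agree)
```

The loop visits every pair, so the cost grows quadratically with the number of objects; that is acceptable for a sketch but may be slow on large inputs.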

Submission
- Please submit the program files and the report to GitLab
- Report
  - File format must be *.pdf.
  - Guideline
    - Summary of your algorithm
    - Detailed description of your codes (for each function)
    - Instructions for compiling your source codes at TA’s computer (e.g. screenshot) (Important!!)
    - Any other specification of your implementation and testing
- Program and code
  - An executable file
    - If you are in one of the following two cases, please submit alternative files (e.g., .py file, makefile)
      - You cannot meet the requirements (.exe file) of the programming assignment due to your computing environment (e.g. Mac OS or Linux)
      - You are using Python for implementing your program
    - You MUST SUBMIT instructions for compiling your source codes. If TAs read your instructions but cannot compile your program, you will get a penalty. Please write the instructions carefully.
  - All source files

6. Testing program

Please put the following files in the same directory: the testing program, your output files, the given input files, and the attached answer files (~ideal.txt)

Execute the testing program with one argument (input file name)

Check your score for the input file
- If you implement your DBSCAN algorithm successfully and use the given parameters mentioned above, you should get scores similar to the following for each input data
  - For ‘input1.txt’, Score=99
  - For ‘input2.txt’, Score=95
  - For ‘input3.txt’, Score=99
- The test program was built with Mono, so even if you are using Mac or Linux instead of Windows, you can run dt_test.exe using Mono.


ITE4005 Assignment 3-clustering Solved
