CSC1002 Computational Laboratory: Handwritten Digit Recognition

OVERVIEW

In this assignment, you will apply machine learning skills. You are asked to design and develop a popular classification algorithm called kNN (k-Nearest Neighbors) to programmatically predict handwritten digits in digitized format. kNN is a supervised algorithm that classifies a sample by majority vote of its nearest neighbors. It is supervised in the sense that it relies on a sufficiently large, carefully selected sample dataset with known target values. This sample dataset is split into two groups. One group is used to formulate the kNN model used for prediction; this step is called Training. The other group is used to verify the accuracy of the established model; this step is called Testing. The Training-Testing cycle may take several iterations of tuning and adjustment until a satisfactory model is achieved, one with a high rate of correct predictions (the higher the better). Once such a model has been implemented, it is ready to make predictions. To make a prediction, feed an unknown value of the same format to the program and perform the required computation in accordance with the established kNN model.
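
As a minimal sketch of the idea described above (not the required implementation), the following Python shows a majority-vote kNN prediction on made-up 4-element vectors. The names euclidean_distance and knn_predict and the sample data are assumptions for illustration only; the assignment itself requires the vector functions from vector.py, whose interface is not shown here.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(training, unknown, k):
    """Predict a label by majority vote among the k nearest training samples.

    training is a list of (label, vector) pairs with known labels.
    """
    # Sort the known samples by distance to the unknown one and keep the k closest.
    nearest = sorted(
        training, key=lambda pair: euclidean_distance(pair[1], unknown)
    )[:k]
    # Count the labels of those neighbors and return the most frequent one.
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: label 0 clusters near all-zeros, label 1 near all-ones.
training = [
    (0, [0, 0, 0, 1]),
    (0, [0, 0, 1, 1]),
    (0, [0, 0, 0, 0]),
    (1, [1, 1, 1, 0]),
    (1, [1, 1, 0, 0]),
    (1, [1, 1, 1, 1]),
]

print(knn_predict(training, [0, 0, 1, 0], k=3))
print(knn_predict(training, [1, 1, 0, 1], k=3))
```

An odd k, as used here, avoids a tied vote when only two labels compete; with ten digit classes ties are still possible, so the tie-breaking rule is part of the model design.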

SCOPE

Note: refer to the sample output for reference.

Construct a kNN model based on two files, digit-training.txt and digit-testing.txt, for training and testing respectively (cross-validation is not required). Both files are text based.
Refer to the lab materials for more information on kNN implementation, file format and ideas.
Use ONLY the vector-based functions developed in the previous lab class to determine the nearest neighbors
1. Download the latest version of vector.py from Moodle
Implement a majority-vote algorithm to make the best-guess prediction; experiment with different values of “k” (3, 5, 7 or 9) and apply your majority rule to make the prediction. Compare the accuracy of each outcome and pick the value with the highest accuracy rate.
IN THE DESIGN DOC: explain (a) how the closest neighbors are chosen and (b) the rule(s) used in making the prediction
Based on your “FINAL” kNN model (value of k and rule(s)):
1. show the training and testing info (see Output sections)
2. show the prediction outcome using the file digit-predict.txt (see Output sections).
Files to download from Moodle:
1. digit-training.txt, digit-testing.txt, digit-predict.txt & vector.py
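
The experiment with different values of k can be sketched as follows. This is an illustrative outline under stated assumptions: the function names (squared_distance, predict, accuracy, best_k) and the in-memory toy data are hypothetical, and a real solution would load its samples from digit-training.txt and digit-testing.txt and use the functions from vector.py instead.

```python
from collections import Counter


def squared_distance(a, b):
    """Sum of squared differences; ranks neighbors the same way as Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(training, unknown, k):
    """Majority vote among the k training samples closest to `unknown`."""
    nearest = sorted(training, key=lambda pair: squared_distance(pair[1], unknown))[:k]
    return Counter(label for label, _ in nearest).most_common(1)[0][0]


def accuracy(training, testing, k):
    """Fraction of testing samples whose predicted label matches the known label."""
    correct = sum(1 for label, vector in testing if predict(training, vector, k) == label)
    return correct / len(testing)


def best_k(training, testing, candidates=(3, 5, 7, 9)):
    """Return the candidate k with the highest accuracy, plus all scores."""
    scores = {k: accuracy(training, testing, k) for k in candidates}
    # max() keeps the first best key, so a tie goes to the earliest candidate.
    return max(scores, key=scores.get), scores


# Hypothetical toy data: five samples of label 0 and only two of label 1.
training = [
    (0, [0, 0, 0, 0]),
    (0, [0, 0, 0, 1]),
    (0, [0, 0, 1, 0]),
    (0, [0, 1, 0, 0]),
    (0, [1, 0, 0, 0]),
    (1, [1, 1, 1, 1]),
    (1, [1, 1, 1, 0]),
]
testing = [(1, [1, 1, 1, 1]), (0, [0, 0, 0, 0])]

k, scores = best_k(training, testing)
print(k, scores)
```

In this toy data, larger values of k let the more numerous label 0 outvote the two label 1 samples, which is why comparing accuracy across several k values matters.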

SKILLS

In this assignment, you will be trained in the use of the following:

Machine Learning life cycle – dataset, data mining, kNN construction, training & testing
Python objects & modules (file IO, string, string formatting, sorting, dictionary, list, list comprehensions)
Controls – if, while, for to control program flow
Variable Scope
Functions to break down the logic

DELIVERABLES

Design documentation (A3_School_StudentID_Design.doc/pdf)
Program source code (A3_School_StudentID_Source.py)
Output (A3_School_StudentID_Output.doc/pdf)

Zip all the files above into a single file (A3_School_StudentID.zip) and submit the zip file by the due date to the corresponding assignment folder under “Assignment (submission)”.

For instance, an SME student with student ID “119010001”:

A3_SME_119010001.zip:

o A3_SME_119010001_Design.doc/pdf
o A3_SME_119010001_Source.py
o A3_SME_119010001_Output.doc/pdf

5% will be deducted if any files are incorrectly named!!!

OUTPUT

Training Info (see sample output)
Testing Info (see sample output)
Prediction Outcome (see sample output)

DESIGN DOCUMENTATION

For the design document, provide a write-up of the following information:

Design:
1. Describe the general structure of the program (functions, variables and program flow).
2. Describe the kNN model you implemented:
  1. your choice of k value
  2. how the closest neighbors are determined
  3. the rule(s) used in making the prediction

3. Propose one strategy for reducing the kNN computation time (finding the neighbors) specific to this assignment; “random” has already been suggested in class.
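
For reference, the “random” idea mentioned above might look like the sketch below, assuming it means comparing an unknown digit against a random subset of the training vectors for each digit rather than all of them. The function name sample_per_digit, the limit parameter and the toy data are hypothetical and are not taken from the lab materials.

```python
import random


def sample_per_digit(digit_vectors, limit, seed=None):
    """Keep at most `limit` randomly chosen training vectors for each digit."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return {
        digit: rng.sample(vectors, min(limit, len(vectors)))
        for digit, vectors in digit_vectors.items()
    }


# Hypothetical toy data: digit -> list of training vectors.
digit_vectors = {
    0: [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]],
    1: [[1, 1, 1, 1], [1, 1, 1, 0]],
}

reduced = sample_per_digit(digit_vectors, limit=2, seed=1)
for digit, vectors in sorted(reduced.items()):
    print(digit, len(vectors))
```

Fewer stored vectors means fewer distance calculations per prediction, but accuracy may drop, so any such reduction should be re-checked against the testing file.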

Test Plan: (Not Required)

TIPS & HINTS

Use a dictionary to keep the list of digit vectors (during training) and to track the accuracy rate (during testing)
Use Counter() and most_common() from module “collections” to find the most common digit among the closest neighbors
Use zip + list comprehension for vector operations such as sum, subtract, or, average and so on
Use String Formatting for training and testing info
Use reduce() from module “functools” to combine multiple vectors into a single OR-vector or AND-vector, if needed
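
The hints above could be combined roughly as in the following sketch. All function names, the toy vectors and the column layout of the printed summary are assumptions for illustration; they are not the contents of vector.py or the required output format.

```python
from collections import Counter
from functools import reduce


def vector_subtract(a, b):
    """Element-wise a - b using zip and a list comprehension."""
    return [x - y for x, y in zip(a, b)]


def vector_or(a, b):
    """Element-wise OR of two 0/1 vectors."""
    return [x | y for x, y in zip(a, b)]


def vector_and(a, b):
    """Element-wise AND of two 0/1 vectors."""
    return [x & y for x, y in zip(a, b)]


def vector_average(vectors):
    """Column-wise mean of several equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Dictionary mapping each digit to the list of its training vectors.
digit_vectors = {
    0: [[0, 1, 1, 0], [0, 1, 0, 0]],
    1: [[1, 0, 0, 1]],
}

# reduce() folds a list of vectors into a single OR-vector or AND-vector.
or_vector = reduce(vector_or, digit_vectors[0])
and_vector = reduce(vector_and, digit_vectors[0])

# Counter.most_common(1) gives the winning label and its vote count.
neighbor_labels = [1, 0, 0]
winner, votes = Counter(neighbor_labels).most_common(1)[0]

# String formatting for a simple per-digit summary (layout is made up).
for digit, vectors in sorted(digit_vectors.items()):
    print(f"{digit:<6}{len(vectors):>6}")
```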

SAMPLE OUTPUT – TRAINING

SAMPLE OUTPUT – TESTING

SAMPLE OUTPUT – PREDICTION

Simply output the predicted value, one number per line, such as:

MARKING CRITERIA

Coding Styles – layout, comments, white spaces, naming convention, variables, indentation.
Documentation – Design + Test Plan
Program Correctness – logic, program structure, functions with appropriate parameters
User Interaction – how informative and accurate information is exchanged between game player and host.
Readability counts – programs that are well structured and easy to follow, using functions to break down complex problems into smaller, cleaner, generalized functions, are preferred over a single function embracing complex logic with nested conditions and sub-functions! In other words, for the course objectives, a clean and highly readable design is preferred over efficiency.
KISS approach – Keep It Simple and Straightforward.
Balanced approach – you are not required to come up with a highly optimized solution. However, strike a balance between readability and efficiency with good use of program constructs.

CHALLENGES

Determine other means of reducing kNN computation time yet keeping accuracy rate relatively high.
