CSC4760 Assignment 3

Problem 1.  (Setting up Spark and running the WordCount example)

This assignment teaches you how to set up Spark on your KVM. After installing Spark, you need to run the WordCount example (Python version) on your KVM.

Please follow the instructions provided in the slides “14 Setup Spark on Ubuntu.pptx”. If you have any questions, please talk with the instructor or the TA. We will help you.

Source Code and Datasets:

The Python source code is given in the file “WordCount.py”. You need to run it on two datasets:
1) test.txt (display the top-5 most frequent words)
2) peterpan.txt (display the top-30 most frequent words)
The example commands are as follows.

$ spark-submit WordCount.py /home/rob/Assignment3/test.txt 5
$ spark-submit WordCount.py /home/rob/Assignment3/peterpan.txt 30

Report:

Please write a report explaining the key steps. Take screenshots of the terminal output for “test.txt” and “peterpan.txt” respectively, put them in the report, and briefly explain the outputs. You may include the following key steps.

1) Set up Spark on your KVM by yourself.
2) Download the “WordCount.py” file and two input data files from iCollege.

3) Open a terminal and run “WordCount.py” on “test.txt” and “peterpan.txt” respectively. Explain the commands and the outputs.
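To help explain the outputs, here is a minimal plain-Python sketch (no Spark required) of the steps a typical PySpark word count performs: split lines into words (what flatMap does), emit (word, 1) pairs (map), sum the counts per word (reduceByKey), and keep the k most frequent words (as takeOrdered would). It is not the provided WordCount.py, whose code may differ. The function name word_count_top_k, whitespace-only tokenization, and the alphabetical tie-break are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def word_count_top_k(lines: Iterable[str], k: int) -> List[Tuple[str, int]]:
    """Hypothetical helper: return the k most frequent words as (word, count) pairs."""
    # flatMap step: split every line on whitespace (assumed tokenization;
    # punctuation stays attached, so "Peter" and "Peter," are different words)
    words = [w for line in lines for w in line.split()]

    # map step: emit a (word, 1) pair for every occurrence
    pairs = [(w, 1) for w in words]

    # reduceByKey step: sum the 1s for each distinct word
    counts: Dict[str, int] = {}
    for word, one in pairs:
        counts[word] = counts.get(word, 0) + one

    # top-k step: sort by descending count, breaking ties alphabetically
    # (the alphabetical tie-break is an assumption; Spark's ordering of ties may differ)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


if __name__ == "__main__":
    sample = ["the cat sat on the mat", "the dog sat"]
    for word, count in word_count_top_k(sample, 3):
        print(word, count)
```

Running this sample prints "the 3", "sat 2", and "cat 1". If the real script also splits only on whitespace, tokens with trailing punctuation would appear as separate entries in the peterpan.txt output, which is worth mentioning when explaining the results.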