CSC4760 Assignment 5


1. Counting Tweets

Input Datasets:

Tweets (tweets.json):

| user    | geo           | tweet                            |
|---------|---------------|----------------------------------|
| Bob     | Atlanta       | It is a sunny day!               |
| Susan   | Athens        | We have a football game today 🙂 |
| David   | Atlanta       | Today is cold.                   |
| Lisa    | Auburn        | I love Auburn University         |
| Ben     | Birmingham    | I will go to Atlanta today!      |
| Paul    | San Francisco | We watch a movie today!          |
| Smith   | San Diego     | It is hot today. Summer comes.   |
| Ethan   | Log Angeles   | Oscar ceremony is wonderful!     |
| Emma    | Log Angeles   | I love Oscar ceremony!           |
| Rolando | Orlando       | I will go to the beach!          |
| Mia     | Miami         | Sunny Day!                       |

City and State lookup table (cityStateMap.json):

| city          | state      |
|---------------|------------|
| Atlanta       | Georgia    |
| Athens        | Georgia    |
| Miami         | Florida    |
| Orlando       | Florida    |
| Birmingham    | Alabama    |
| Auburn        | Alabama    |
| Log Angeles   | California |
| San Francisco | California |
| San Diego     | California |
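The problem statement does not say how the two JSON files are laid out on disk. Spark's `spark.read.json` reads JSON Lines (one JSON object per line) by default, so if you want local copies of the sample inputs, a minimal sketch like the one below can write both tables in that layout. It assumes the field names match the table headers above (user, geo, tweet; city, state); compare with the provided files before relying on it. The helper `write_json_lines` and the `sample_data` directory are hypothetical.

```python
import json
from pathlib import Path

# Sample rows copied from the tables above ("Log Angeles" is spelled as in the data).
TWEETS = [
    ("Bob", "Atlanta", "It is a sunny day!"),
    ("Susan", "Athens", "We have a football game today 🙂"),
    ("David", "Atlanta", "Today is cold."),
    ("Lisa", "Auburn", "I love Auburn University"),
    ("Ben", "Birmingham", "I will go to Atlanta today!"),
    ("Paul", "San Francisco", "We watch a movie today!"),
    ("Smith", "San Diego", "It is hot today. Summer comes."),
    ("Ethan", "Log Angeles", "Oscar ceremony is wonderful!"),
    ("Emma", "Log Angeles", "I love Oscar ceremony!"),
    ("Rolando", "Orlando", "I will go to the beach!"),
    ("Mia", "Miami", "Sunny Day!"),
]

CITY_STATE = [
    ("Atlanta", "Georgia"), ("Athens", "Georgia"),
    ("Miami", "Florida"), ("Orlando", "Florida"),
    ("Birmingham", "Alabama"), ("Auburn", "Alabama"),
    ("Log Angeles", "California"), ("San Francisco", "California"),
    ("San Diego", "California"),
]


def write_json_lines(path, records):
    """Write each record as one JSON object per line (JSON Lines)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps the emoji as-is instead of \u escapes
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Hypothetical output directory, chosen so the provided files are not overwritten.
    out = Path("sample_data")
    write_json_lines(out / "tweets.json",
                     [{"user": u, "geo": g, "tweet": t} for u, g, t in TWEETS])
    write_json_lines(out / "cityStateMap.json",
                     [{"city": c, "state": s} for c, s in CITY_STATE])
```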

Problem and Output Data:

We want to count the number of tweets published in each state, where a tweet's state is found by looking up its geo city in the city-to-state lookup table. The following table shows the desired results.

| state      | count |
|------------|-------|
| Georgia    | 3     |
| Florida    | 2     |
| Alabama    | 2     |
| California | 4     |
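The expected counts come from a lookup-then-count: map each tweet's geo value to a state through the lookup table, then count tweets per state (for example, Georgia gets Atlanta, Athens and Atlanta, giving 3; California gets San Francisco, San Diego and two Log Angeles tweets, giving 4). The plain-Python sketch below does not use Spark; it only reproduces the table above so you have a cross-check. `count_by_state` is a hypothetical name.

```python
from collections import Counter

# geo value of each sample tweet, in table order ("Log Angeles" as spelled in the data)
TWEET_CITIES = ["Atlanta", "Athens", "Atlanta", "Auburn", "Birmingham",
                "San Francisco", "San Diego", "Log Angeles", "Log Angeles",
                "Orlando", "Miami"]

# city -> state, from cityStateMap.json
CITY_TO_STATE = {
    "Atlanta": "Georgia", "Athens": "Georgia",
    "Miami": "Florida", "Orlando": "Florida",
    "Birmingham": "Alabama", "Auburn": "Alabama",
    "Log Angeles": "California", "San Francisco": "California",
    "San Diego": "California",
}


def count_by_state(tweet_cities, city_to_state):
    """Look up each tweet's state, then count tweets per state.

    Cities missing from the lookup table are skipped, mirroring an inner join.
    """
    return Counter(city_to_state[c] for c in tweet_cities if c in city_to_state)


if __name__ == "__main__":
    for state, count in count_by_state(TWEET_CITIES, CITY_TO_STATE).items():
        print(state, count)
```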

Implementation:

Design and implement a PySpark program to solve the problem. No template Python file is provided this time, so you will need to create one from scratch.

You are required to use Spark DataFrames to implement this program.

Report:

Please write a report describing your experiments. Explain your basic approach to counting tweets in each state. You may add comments to the source code so that graders can read and understand it.

In the report, you should include the answers to the following questions:

1) Explanation of the source code
2) Experimental Results

2.1) Screenshots of the output. Because the program uses Spark DataFrames, you can call “DF.show()” to display the result table in the terminal. Please do so and take a screenshot of the terminal output. A screenshot of the output from my VM, “output.PNG”, is provided; you can use it to verify your results.

2.2) Explain your results. Does your implementation give the right answer?
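For question 2.2, one way to check correctness without eyeballing the screenshot is to compare the collected rows with the expected table. The sketch below is plain Python and assumes the result rows are (state, count) pairs; PySpark `Row` objects are tuples, so the output of `groupBy("state").count().collect()` should unpack that way. `check_counts` and `EXPECTED` are hypothetical names.

```python
# Expected per-state counts, copied from the desired-results table.
EXPECTED = {"Georgia": 3, "Florida": 2, "Alabama": 2, "California": 4}


def check_counts(rows, expected=EXPECTED):
    """Compare (state, count) pairs against the expected table.

    Returns {state: (actual, expected)} for every mismatch; an empty dict
    means the result is correct. Row order is ignored, since Spark does not
    guarantee ordering after a groupBy.
    """
    actual = {state: count for state, count in rows}
    states = set(actual) | set(expected)
    return {s: (actual.get(s), expected.get(s))
            for s in states if actual.get(s) != expected.get(s)}
```

A missing state shows up with `None` as its actual count, and an unexpected extra state shows up with `None` as its expected count.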
