COMP 598 Homework 1

The goal of this assignment is to work through the data-handling phases of a mini data science project to put into practice the ideas we’ve discussed in the Unit 1 lecture. You are welcome to complete the exercises in this homework using whatever tools and programming languages you deem fit. In order to make ANY points, your assignment MUST pass the Homework 1 grading tests. Please watch the orientation video under Lecture Recordings in MyCourses for more information.

In this assignment, you will conduct an analysis of tweets produced by Russian trolls during the 2016 US election. These tweets were published by 538. You can read about them here.

In this mini-project, we’ll be assessing the frequency with which troll tweets mention “Trump” by name.

1. Data Collection

a. Download the raw tweet data. You will ONLY be using the data from the first file (IRAhandle_tweets_1.csv).
b. Looking at only the first 10,000 tweets in the file, keep those that (1) are in English and (2) don’t contain a question. This will be our dataset. To select the right tweets, take a look at the columns.
i. There are specific columns that call out language. You can trust these.
ii. Assume that a tweet which contains a question contains a “?” character.
c. Create a new file (I would suggest TSV – tab-separated-value – format) containing these tweets.
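The collection steps above can be sketched in Python with only the standard library. This is a minimal sketch, not a reference solution: it assumes the CSV has a `language` column whose value is `"English"` for English tweets and a `content` column holding the tweet text, so check the header of IRAhandle_tweets_1.csv before relying on those names. `collect_tweets` and `format_tsv` are hypothetical helper names.

```python
import csv
import io
import itertools
from typing import Iterable, Iterator

# Assumed column names and language value; check them against the CSV header.
LANGUAGE_COLUMN = "language"
ENGLISH_VALUE = "English"
CONTENT_COLUMN = "content"


def collect_tweets(rows: Iterable[dict], limit: int = 10_000) -> Iterator[dict]:
    """Yield English, question-free tweets from the first `limit` rows only."""
    for row in itertools.islice(rows, limit):
        if row[LANGUAGE_COLUMN] != ENGLISH_VALUE:
            continue  # the language column is trusted as-is
        if "?" in row[CONTENT_COLUMN]:
            continue  # any "?" character marks the tweet as a question
        yield row


def format_tsv(rows: Iterable[dict], columns: list[str]) -> str:
    """Render the chosen columns of `rows` as tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

To use it, open the file with `open("IRAhandle_tweets_1.csv", encoding="utf-8", newline="")` and pass `csv.DictReader(f)` to `collect_tweets`; `newline=""` lets the csv module read tweets that contain line breaks. Note that `islice` applies the 10,000 limit before filtering, which matches "looking at only the first 10,000 tweets".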

2. Data Annotation

To do our analysis, we need to add one new feature: whether or not the tweet mentioned Trump. This feature, “trump_mention”, is Boolean (“T”/“F”). A tweet mentions Trump if and only if it contains “Trump” (case-sensitive) as a word. This means that it is separated from other alphanumeric characters by either whitespace OR non-alphanumeric characters (e.g., “anti-Trump protesters” contains “Trump”, but “I got trumped” does not).
Create a new version of your dataset that contains this additional feature.
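The word rule above maps naturally onto a regular expression with lookaround assertions. This minimal sketch assumes "alphanumeric" means the ASCII characters A-Z, a-z and 0-9; Python's `\b` would behave slightly differently, because it also treats `_` and non-ASCII letters as word characters. `mentions_trump` and `annotate` are hypothetical names.

```python
import re

# "Trump" (case-sensitive) with no ASCII letter or digit directly before or after.
# Assumption: "alphanumeric" means [A-Za-z0-9]; \b would also count "_" as a word char.
TRUMP_PATTERN = re.compile(r"(?<![A-Za-z0-9])Trump(?![A-Za-z0-9])")


def mentions_trump(content: str) -> str:
    """Return "T" if the tweet mentions Trump as a word, else "F"."""
    return "T" if TRUMP_PATTERN.search(content) else "F"


def annotate(row: dict) -> dict:
    """Return a copy of `row` with the trump_mention feature added."""
    return {**row, "trump_mention": mentions_trump(row["content"])}
```

Under this reading, "#Trump" and "Trump's" count as mentions, while "#Trump2016" and "Trumped" do not, since a digit or letter touches the word.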

3. Analysis

Using your newly annotated dataset, compute the statistic: % of tweets that mention Trump.
It turns out that our approach isn’t counting tweets properly, meaning that some tweets are getting counted more than once. Go through and look at your annotated data. Identify where the counting problem is coming from.
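One way to compute the statistic, and to start looking for over-counting, is sketched below. It assumes each annotated row is a dict whose `trump_mention` value is "T" or "F", and it returns a fraction truncated (not rounded) to three decimal places, matching the results.tsv instructions, so 2 of 3 gives 0.666. The hypothetical helper `repeated_values` only lists values of a column you choose that occur more than once; which column to inspect, and whether repeats explain the counting problem, is for you to determine from the data.

```python
from collections import Counter
from typing import Iterable


def frac_trump_mentions(rows: Iterable[dict]) -> float:
    """Share of rows whose trump_mention is "T", truncated to three decimals."""
    total = mentions = 0
    for row in rows:
        total += 1
        if row["trump_mention"] == "T":
            mentions += 1
    if total == 0:
        raise ValueError("the dataset is empty")
    # Integer floor division truncates exactly before the one conversion to float.
    return (mentions * 1000 // total) / 1000


def repeated_values(rows: Iterable[dict], column: str) -> dict[str, int]:
    """Map each value of `column` that appears more than once to its count."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}
```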

Submission Instructions

Download the template code from https://github.com/druths/comp598-2021 .
Your submission should pass the unit tests and contain – at minimum – the following:

– README.md (5 pts)
o In 3 sentences or less, explain where the counting problem is coming from.
– dataset.tsv (20 pts)
o This should be the output of your Data Annotation phase.
o Format is tab-separated value, utf-8 (as long as you don’t do anything fancy, it will be in utf-8) (5 pts)
o The first line should be a header line (3 pts)
o The file should contain the following columns, in this order: tweet_id, publish_date, content, and trump_mention. Tweets should appear in the same order they appeared in the original file from 538. (12 pts)
– results.tsv
o Format is tab-separated value
o The first line should be a header line, with headers “result” and “value”.
o The second line should contain the result for “frac-trump-mentions”. If necessary, truncate your answer to three decimal places.

COMP 598, Fall 2021

For partial credit purposes, you may also include the code that you used to do this work. It must be readable in a standard text editor. Remember that code readability and partial credit are correlated.
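If you do submit code, a small self-check of the two output files against the submission rules can catch header or column-order mistakes before the grading tests do. This is a hedged sketch: `results_tsv` and `check_dataset_tsv` are hypothetical names, and writing the value with exactly three digits (e.g. "0.500") is an assumption about what the graders accept, since the instructions only require truncation "if necessary".

```python
import csv
import io

# Column order required by the submission instructions.
DATASET_COLUMNS = ["tweet_id", "publish_date", "content", "trump_mention"]


def results_tsv(mentions: int, total: int) -> str:
    """Text for results.tsv, with the fraction truncated (not rounded) to 3 places."""
    thousandths = mentions * 1000 // total  # exact integer truncation
    value = f"{thousandths // 1000}.{thousandths % 1000:03d}"
    return f"result\tvalue\nfrac-trump-mentions\t{value}\n"


def check_dataset_tsv(text: str) -> list[str]:
    """List format problems found in the text of dataset.tsv (empty if none)."""
    records = list(csv.reader(io.StringIO(text), delimiter="\t"))
    problems = []
    if not records or records[0] != DATASET_COLUMNS:
        problems.append(f"header should be {DATASET_COLUMNS}")
    for number, record in enumerate(records[1:], start=2):
        if len(record) != len(DATASET_COLUMNS):
            problems.append(f"record {number}: expected 4 fields, got {len(record)}")
        elif record[3] not in ("T", "F"):
            problems.append(f"record {number}: trump_mention must be T or F")
    return problems
```

To check a real file, read it with `open("dataset.tsv", encoding="utf-8", newline="")` and pass the text to `check_dataset_tsv`; an empty list means no problems were found by these particular checks.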

Related products

COMP598 Final Project – Data Science Project

COMP 598 Homework 2 – Unix server and command-line exercises

COMP598 Homework 8 –Using TF-IDF

Related in this category

More in this category

COMP598 Homework 3-Unix and python analysis

COMP 598 Homework 8 – Using TF-IDF

COMP598 Homework 4- Bokeh Dashboard

COMP598 Homework 5 -Data Collection & Cleaning

COMP 598 Homework 7 – Data Annotation
