Name: CS4395 Assignment 8-Author Attribution Solved

Description

Author Attribution

Alexander Hamilton 49
James Madison 15
John Jay 5

There are several documents for which authorship is in dispute by historians:

Hamilton or Madison 11
Hamilton and Madison 3

Caution: All course work is run through plagiarism detection software comparing students’ work as well as work from previous semesters and other sources.

Instructions:

1. Read in the csv file using pandas. Convert the author column to categorical data. Display the first few rows. Display the counts by author.
2. Divide the data into train and test sets, with 80% in train. Use random state 1234. Display the shape of train and test.
3. Process the text by removing stop words and performing tf-idf vectorization. Fit the vectorizer on the training data only, then apply it to both train and test. Output the training set shape and the test set shape.
4. Try a Bernoulli Naïve Bayes model. What is your accuracy on the test set?
5. The results from step 4 will be disappointing. The classifier just guessed the predominant class, Hamilton, every time. Looking at the train data shape above, there are 7876 unique words in the vocabulary. This may be too many, and many of those words may not be helpful. Redo the vectorization with the max_features option set to use only the 1000 most frequent words. In addition to the words, add bigrams as a feature. Try Naïve Bayes again on the new train/test vectors and compare your results.
6. Try logistic regression. Adjust at least one parameter in the LogisticRegression() model to see if you can improve results over the default settings. What are your results?
7. Try a neural network. Try different topologies until you get good results. What is your final accuracy?
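
The steps above can be sketched roughly as follows with pandas and scikit-learn. This is a minimal outline under stated assumptions, not a reference solution: the column names `author` and `text` are assumed (the instructions only mention an author column), the helper names are made up for illustration, and the logistic regression and neural network settings are arbitrary starting points to tune from.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.neural_network import MLPClassifier


def load_data(path):
    """Read the CSV, make the author column categorical, show a summary."""
    df = pd.read_csv(path)
    # Assumed column name: "author".
    df["author"] = df["author"].astype("category")
    print(df.head())
    print(df["author"].value_counts())
    return df


def split_data(df):
    """80/20 train/test split using the random state from the instructions."""
    # Assumed column name for the document body: "text".
    return train_test_split(
        df["text"], df["author"], train_size=0.8, random_state=1234
    )


def vectorize(X_train, X_test, **kwargs):
    """Fit tf-idf on the training text only, then transform both sets."""
    vec = TfidfVectorizer(stop_words="english", **kwargs)
    Xtr = vec.fit_transform(X_train)  # vocabulary learned from train only
    Xte = vec.transform(X_test)       # test reuses the train vocabulary
    print("train:", Xtr.shape, "test:", Xte.shape)
    return Xtr, Xte


def evaluate(model, Xtr, ytr, Xte, yte):
    """Fit a model and return its accuracy on the test set."""
    model.fit(Xtr, ytr)
    return accuracy_score(yte, model.predict(Xte))


def run_all(df):
    """Run the four experiments and return their test accuracies."""
    X_train, X_test, y_train, y_test = split_data(df)
    results = {}

    # Bernoulli Naive Bayes on the full vocabulary.
    Xtr, Xte = vectorize(X_train, X_test)
    results["nb_full_vocab"] = evaluate(BernoulliNB(), Xtr, y_train, Xte, y_test)

    # Same model with 1000 features drawn from unigrams and bigrams.
    Xtr, Xte = vectorize(X_train, X_test, max_features=1000, ngram_range=(1, 2))
    results["nb_1000_bigrams"] = evaluate(BernoulliNB(), Xtr, y_train, Xte, y_test)

    # Logistic regression; class_weight is one parameter worth adjusting
    # because the author classes are imbalanced (illustrative choice).
    logreg = LogisticRegression(class_weight="balanced", max_iter=1000)
    results["logreg"] = evaluate(logreg, Xtr, y_train, Xte, y_test)

    # Small feed-forward network; the topology here is only a starting point.
    mlp = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=1234)
    results["mlp"] = evaluate(mlp, Xtr, y_train, Xte, y_test)
    return results
```

With the course data, a call such as `run_all(load_data(path))`, where `path` points at the assignment's CSV file, would return a dictionary of accuracies to compare across the four models. Reusing the reduced 1000-feature vectors for logistic regression and the neural network is a design choice in this sketch; the instructions do not say which vectorization those steps should use.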
