CSCI804 Object and Generic Programming in C++ Assignment 4


TASK ONE (5 Marks): Document Retrieval

The field of information retrieval is concerned with finding relevant electronic documents based upon a query. For example, given a group of keywords (the query), a search engine retrieves Web pages (documents) and displays them sorted by relevance to the query. This technology requires a way to compare each document with the query to see which is most relevant.

A simple way to make this comparison is to compute the binary cosine coefficient. The coefficient is a value between 0 and 1, where 1 indicates that the query is very similar to the document and 0 indicates that the query has no keywords in common with the document. This approach treats each document as a set of words. For example, given the following sample document:

“Chocolate ice cream, chocolate milk, and chocolate bars are delicious.”

This document would be parsed into a set of keywords, where case is ignored, punctuation is discarded, and each word is kept only once:

{chocolate, ice, cream, milk, and, bars, are, delicious}. An identical process is performed on the query to turn it into a set of keywords.
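The parsing step above can be sketched as follows. This is a minimal illustration, not a required design: the helper name `toKeywordSet` is hypothetical, and it assumes C++11 or later and that every character other than an ASCII letter or digit counts as a separator (so a hyphenated word such as "top-rating" would become two keywords).

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper: turn raw text into a set of lowercase keywords.
// Assumption: anything that is not an ASCII letter or digit separates words.
std::set<std::string> toKeywordSet(const std::string& text) {
    std::set<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            // Fold case so "Chocolate" and "chocolate" are the same keyword.
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            words.insert(current);  // std::set silently drops duplicates
            current.clear();
        }
    }
    if (!current.empty()) words.insert(current);  // flush the last word
    return words;
}
```

Applied to the sample sentence, this yields the eight keywords listed above. The same function can be applied to the command-line keywords to build the query set.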

Once we have a query Q represented as a set of words and a document D represented as a set of words, the similarity (relevance) between Q and D is computed by:

\[ \text{relevance} = \frac{|Q \cap D|}{\sqrt{|Q|}\,\sqrt{|D|}} \]

where |Q| and |D| are the numbers of words in Q and D respectively, and |Q ∩ D| is the number of words that appear in both Q and D (the intersection of Q and D).
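A possible implementation of the relevance computation is sketched below. It assumes the usual definition of the binary cosine coefficient, |Q ∩ D| / (sqrt(|Q|) · sqrt(|D|)). The function name `relevance` and the choice to return 0 for an empty set are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <set>
#include <string>

// Sketch of the binary cosine coefficient between two keyword sets.
// Assumption: an empty query or document has relevance 0 (avoids dividing by 0).
double relevance(const std::set<std::string>& query,
                 const std::set<std::string>& doc) {
    if (query.empty() || doc.empty()) return 0.0;
    std::size_t common = 0;
    for (const std::string& word : query) {
        if (doc.count(word) != 0) ++common;  // word is in both sets
    }
    return static_cast<double>(common) /
           (std::sqrt(static_cast<double>(query.size())) *
            std::sqrt(static_cast<double>(doc.size())));
}
```

For instance, a 4-word query that shares 2 words with a 4-word document scores 2 / (2 · 2) = 0.5. Multiplying the result by 100 gives a percentage in the style of the sample output.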

Select appropriate STL containers and write a program that takes a set of keywords (any number of words) that represent a query. The program should then compare the query to all the document files (whose names end with the extension .txt) listed in the file called listofdocs.txt and output each document with its relevance, in descending order of relevance. If a document contains more than 10 words, output only the first 10 words of the document, followed by the symbol “…”.
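The ranking and preview requirements could be handled along these lines. This is only a sketch: the struct `RankedDoc` and the functions `firstWords` and `rankDocs` are hypothetical names, words are assumed to be whitespace-separated, and reading listofdocs.txt and the documents themselves is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one scored document.
struct RankedDoc {
    std::string name;
    double relevance;
    std::string preview;
};

// Return at most `limit` whitespace-separated words of `text`,
// followed by " …" only when the text has more words than that.
std::string firstWords(const std::string& text, std::size_t limit = 10) {
    std::istringstream in(text);
    std::string word, out;
    std::size_t count = 0;
    while (in >> word) {
        if (count == limit) return out + " …";  // one word too many: truncate
        if (count > 0) out += ' ';
        out += word;
        ++count;
    }
    return out;
}

// Order documents from most to least relevant; ties keep their input order.
void rankDocs(std::vector<RankedDoc>& docs) {
    std::stable_sort(docs.begin(), docs.end(),
                     [](const RankedDoc& a, const RankedDoc& b) {
                         return a.relevance > b.relevance;
                     });
}
```

A `std::vector` sorted once at the end is one reasonable container choice here; a `std::multimap` keyed on relevance would be another.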

For this task you should submit DocRetrieval.cpp. Your code must compile on Banshee with the instruction

$ g++ DocRetrieval.cpp -o DocRetrieval

and should run as

$ ./DocRetrieval keywords1 keywords2 keywords3 …

For example, if the listofdocs.txt lists four documents that are in the same directory as the program:





Note: check the text files for their contents.

Run the program as follows:

$ ./DocRetrieval kyle radio 2Day girl

The output would look like:

(Kyle04.txt – 32.44%) THE radio network Austereo has pulled the top-rating 2Day FM …

(Kyle03.txt – 23.15%) THE top-rating radio station 2Day FM and its owner, Austereo …

(Kyle01.txt – 8.98%) The Ten Network has dumped embattled host Kyle Sandilands as …

(Kyle02.txt – 0.00%) Word around the traps yesterday was that Monday night’s televisual …


TASK TWO (5 marks): Movie Ratings

You have collected files of movie ratings where each movie is rated from 1 (bad) to 5 (excellent).  The first line of each file is a number that identifies how many ratings are in the file.  Each rating then consists of two lines:  the name of the movie followed by the numeric rating from 1 to 5. Here is a sample rating file with four unique movies and seven ratings:


File: ratings.txt



Harry Potter and the Order of the Phoenix


Harry Potter and the Order of the Phoenix


The Bourne Ultimatum


Harry Potter and the Order of the Phoenix


The Bourne Ultimatum








Choose a proper STL container and write a program that reads multiple files in this format, calculates the average rating for each movie, and outputs the average along with the number of reviews. Here is the desired output for the sample data:


Glitter: 1 review, average of 1/5

Harry Potter and the Order of the Phoenix: 3 reviews, average of 4.3/5

The Bourne Ultimatum: 2 reviews, average of 3.5/5

Wall-E: 1 review, average of 4/5
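The output lines above mix "review" and "reviews" and print averages such as 4.3, 3.5 and 4 without padding zeros. A sketch of one way to produce that format follows; the function name `formatLine` is hypothetical, and it assumes each movie has at least one review. It relies on the default (non-fixed) stream format, where `std::setprecision(2)` means two significant digits and trailing zeros are dropped.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical formatter for one result line.
// Assumption: reviews >= 1, so the division is well defined.
std::string formatLine(const std::string& title, int reviews, int total) {
    std::ostringstream out;
    double average = static_cast<double>(total) / reviews;
    out << title << ": " << reviews
        << (reviews == 1 ? " review" : " reviews")
        << ", average of "
        << std::setprecision(2) << average  // 13/3 prints as 4.3, 4/1 as 4
        << "/5";
    return out.str();
}
```

Three reviews totalling 13 give "3 reviews, average of 4.3/5", while a single review of 1 gives "1 review, average of 1/5".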


For this task you should submit Movies.cpp. Your code must compile on Banshee with the instruction

$ g++ Movies.cpp -o Movies

and should run as

$ ./Movies ratings1.txt ratings2.txt ratings3.txt