CS300 Homework #3-Hash Table Solved  

35.00 $

Category:
Click Category Button to View Your Next Assignment | Homework

You'll get a download link with a: zip solution files instantly, after Payment

Securely Powered by: Secure Checkout

Description

Rate this product

In this homework we, as the manager of a grocery store, we ask you to develop a tool which will help us to improve our marketing strategy. We will provide you the purchased transactions of our customers as an input file and your tool needs to extract all the sales relations between the products. For this purpose, you are required to use a templated hashtable with separate chaining.
Overview
When we go shopping, we generally have a standard list of things to purchase. This list is prepared
based on one’s needs and preferences for each customer. A housewife might purchase fresh food like
vegetables or milk for the kitchen whereas someone alone might purchase chips and coke.
Understanding these buying interests can offer assistance to extend deals in a few ways.
For instance, if X and Y are frequently bought together:
● X and Y can be both placed on the same shelf, so that buyers of one item would be prompted to
buy the other.
● Promotional discounts could be applied to just one out of the two items (X or Y) since buying
one would encourage the buyer to buy the other one, too.
● Advertisements on X could be targeted at buyers who purchase Y.
The Goal
The goal of this homework, for a given list of transactions, is to extract all possible implications. For
instance;
chips -> coke
This implies that, for the customers who bought chips most likely bought coke, too. Or, as another
example, consider the following implication;
sugar, milk -> flour
Implies that, the customers who bought sugar and milk most likely bought flour, too.Input File Format
Your program needs to ask two numbers (ranging from 0 and 1) for given support and confidence
threshold values (see their definitions in the document for details below) from the user. Then, your
program needs to ask the name of the input file. You can decide on the appropriate display messages for
them.
The input file contains the list of purchased transactions of our customers. In each line, there are several
products (items) that are purchased for that time of shopping. For a transaction, there is no repeated
product and the number of products are not fixed. An example of input file is given as below:
inputFile.txt
milk bread bananas cake
bread apple onion
coke chips sandwich
bananas yoghurt chocolate grapes orange
milk orange bread
In the input file, there are no empty input lines and all of the words are given as in lower cases. The
words in the same line are separated with an empty space. An item will be only one word, so, you will
not see an item containing multiple words such as “orange juice”.
Output File Format
The output file format needs to be exactly the same as described here. You need to write your outputs
to a file named as results.txt. In each line there will be only one rule. For each rule, you need to put
commas between the products of the same side of the implication and you need to put a semicolon
instead of the implication symbol (->). So, basically, what we expect from you is to generate the file as
given in the second column of the following table if the extracted implications are as given in the first
column of the table. There will be no space character in the output file. Moreover, you need to give the
confidence values of rules in the output file. Round your confidence values having 2 decimals, i.e.,
having 2 digits after the dot as below. Continue reading this document for more details on confidence
values.
Extracted Rules with Confidences
A -> B conf = 0.30123
{B, C} -> D conf = 0.12139
{B, C} -> {A, D} conf = 0.32125
Output file format
A;B;0.30
B,C;D;0.12
B,C;A,D;0.32Background Information
Before going into detail with the algorithm, we need to define two concepts: support and confidence.
Support describes how an item (or set of items) are popular among the whole item set. Hence, the
support value of an item (or set of items) is the ratio of the number of times the item appeared among
all the transactions. For instance, if there are 10 transactions in the list and milk is purchased 4 times,
the support value of milk is 0.4. On the other hand, if you need to calculate the support value of several
items together, such as milk and bread, you need to find the number of transactions which contain both
milk and bread together to find the support value of them.
Confidence value describes how likely item Y is purchased when item X is purchased, expressed as X->Y.
This is calculated as:
confidence(X -> Y) = support(X, Y) / support(X)
For an implication having multiple items, such as {X, Y} -> {Z, W}, confidence is calculated as:
confidence({X, Y} -> {Z, W}) = support(X, Y, Z, W) / support(X, Y)
For instance; for the given support values;
support(bread) = 0.5
support(milk) = 0.4
support(bread, milk) = 0.2
The confidence of {bread -> milk} is calculated as:
confidence(bread -> milk) = support(bread, milk) / support(bread)
= 0.2 / 0.5 = 0.4
Whereas the confidence of {milk -> bread} becomes,
confidence(milk->bread) = support(milk, bread) / support(milk)
= 0.2 / 0.4 = 0.5
The Flow of the Program
The program has 2 main parts: in the first part, you need to find the support value of several item sets
such as support({bread}) and support({bread, milk}). Then, by using those support values,
you need to calculate the confidence values and extract the rules.Calculating Support Values
● Find all support values of items in the transaction such as support({bread}) and
support({milk}).
● Store all the items whose support value is smaller than the given support threshold value.
● Find all possible item pairs of the remaining items (selected after the previous step).
● Find all support values of item pairs in the transaction such as support({bread, milk}),
support({apple,orange}).
● Store all the items pairs whose support value is smaller than the given support threshold value.
Rule Extraction Process
Up to this point, you have discovered all items and item pairs whose support values are greater than a
given support threshold value. Assume that all of these items and item pairs are stored in the same hash
table named as lookupTable.
Using lookupTable (a hash table) you need to enumerate all possible 2 way permutations of all the
lookupTable elements. For example, assume that in the hash table, we have the following elements:
{A}, {B}, {C}, {A, B} and {B, C}
You need to find all possible 2 length permutations of these elements and construct the rules as below:
A -> B
A -> C
A -> {B, C}
B -> A
B -> C
C -> A
C -> B
C -> {A, B}
{A, B} -> C
However note that, there can not be the same item on both sides of the implication. For example, the
following rules are not valid.
A -> {A, B}
{A, B} -> {B, C}
{A, B} -> A
After enumerating all possible rules, you need to calculate the confidence values of them. The output of
your program will be the rules whose support values are greater the given confidence threshold value.Sample Runs
in.txt
milk bread cake
bread apple onion
coke chips sandwich
bananas yoghurt chocolate grapes orange
milk orange bread
yoghurt bananas juice
apple cucumber garlic
tea egg melon
bread milk juice bananas orange
milk tea bread
bread milk chocolate
milk coffee
broccoli bananas orange bread milk
Sample Run 1
For the given input file above, the output will be:
Please enter the transaction file name: in.txt
Please enter support and confidence values between 0 and 1: 0.20 0.60
14 rules are written to results.txt
results.txt
bananas;orange;0.75
orange;bananas;0.75
bread,orange;bananas;0.66
orange,milk;bananas;0.66
orange;bread;0.75
bread;milk;0.85
milk;bread;0.85
bananas,orange;bread;0.66
orange,milk;bread;1.00
orange;milk;0.75
orange;bread,milk;0.75
bananas,orange;milk;0.66
bread,orange;milk;1.00
bananas,orange;bread,milk;0.66Sample Run 2
For the given same input file above, the output will be:
Please enter the transaction file name: in.txt
Please enter support and confidence values between 0 and 1: 0.60 0.45
There is no rule for the given support and confidence values.

  • HW3-Hash-Table-iyvhl7.zip