CS2223 Homework 3 Solved

35.00 $

Category:

Description

5/5 - (1 vote)

Please copy these classes into your Homework area

  • hw3.AVL à USERID.hw3.AVL
  • hw3.BST à USERID.hw3.BST
  • hw3.Heap à USERID.hw3.Heap
  • hw3.TaleOfTwoCitiesExtractor à USERID.hw3.TaleOfTwoCitiesExtractor
  • hw3.Question1 à USERID.hw3.Question1
  • hw3.Question2 à USERID.hw3.Question2
  • hw3.Question3 à USERID.hw3.Question3

You will refer to the existing InstrumentedSeparateChainingHashST class. Note you could also copy your hw2 TaleOfTwoCitiesExtractor which will work as is for HW3

 

Q1. HeapSort Empirical Evaluation

Algorithm 2.7 in Sedgewick, Heapsort, shows how to use a heap to sort a comparable array. The code is provided for you in algs.hw3.Heap. The first step is to construct a heap from a Comparable[] array. This takes place in the constructHeap(a) method.

// construct heap from the raw array of which we know nothing. int n = a.length;

for (int k = n/2; k >= 1; k–) {   sink(a, k, n);

}

If you look at this code with an eye towards its performance, it sure looks like the for loop will execute n/2 times (which means its performance is linearly dependent on the size of the array). You also know that the sink method behavior (in the worst case) is directly proportional to the log of the number of elements in the heap. Thus, at first glance, it looks like this behavior will be proportional to ~ (N*log N)/2.

It turns out that you can mathematically prove that the performance constructHeap is in direct proportion to N alone.  Your task is to instead count the number of comparisons and exchanges, and validate the proposition (page 323) that it will, in fact, require fewer than 2N compares and fewer than N exchanges to construct a heap from N items.

Copy algs.hw3.Heap into USERID.hw3 and modify it to record empirical results. For the domain of data, use uniformly computed random numbers from 0 to 1 (such as you can generate from

StdRandom.uniform()), and generate a table of results (showing N, # of exchanges, and #comparisons) for N=16, 32, 64, … 512. For each size N, run T=100 trials, and record the maximum number of exchanges and comparisons you witnessed solely during the constructHeap construction.

Do your empirical results support the proposition? Explain why or why not.

Copy the Question1 class to USERID.hw3 and modify it so its output should look something like this:

N     MaxComp      MaxExch

16    21          11          NOTE YOUR NUMBERS WILL BE DIFFERENT THAN THESE

32    xxx         yyy

64    xxx  yyy 128 xxx  yyy

256 xxx            yyy

512 xxx            yyy

 

Q2. Empirical Evaluation of Symbol Table structures

This question explores the different structures that result from using binary search trees and separate chaining hash table to support the Symbol Table API, which allows you to associate a (key, value) pair for future retrieval.

Once again, using the Tale Of Two Cities data set, modify this program to produce the following table. This question is based on the number of comparisons needed to locate each key in the symbol table.

  • For Binary Search Trees, the depth of a node reflects the distance from the root, and this depth is one value smaller than the total number of comparisons needed to locate that node in the tree
  • The same is true of AVL trees
  • For Separate Chaining Hash Symbol Tables, each of the buckets stores its size, and each of these will contribute to the overall values.

For example, If there were five (key, value) pairs stored in the symbol table implemented as above as follows (note that only the keys are shown).

 

In the above structures, the number of comparisons to find each of the five keys (note that hashing doesn’t count as a comparison) is:

key BST AVL HT
A 2 3 2
B 1 2 1
C 2 1 3
D 3 3 1
E 4 2 1
Avg. 2.4 2.2 1.6

 

Your task is to complete this experiment using the Tale Of Two Cities data set to record the count of each word in a symbol table. You will have to use put (key, value) properly. That is, for the first occurrence of a given word, w, you would put(w, 1). Use the existing API to determine when a word already exists in the symbol. For each subsequent occurrence of a given word, just increment the value associated with w by calling put() with a count that is one greater.

To be clear, the depth of a node in a binary search tree is the number of edges it takes to get to that node from the root. Thus the root of the binary search tree has a depth of zero. With regards to the number of comparisons which you need to output below, this is a reflection of how many comparisons it takes to determine whether a key value exists within the AVL, BST or Hash Symbol Table. The number of comparisons for a key value in the BST and AVL tree is 1 greater than the depth of the node. Why? Because you need to compare just to see if you have even found the key you are looking for, even when looking for the key associated with the root of a tree.

For the Hashtable, you don’t count the cost of hashing a key to its individual bucket. You only count the number of comparisons to find these keys. Note this part of the question is the trickiest part of this homework. Come to office hours if you have questions.

When you are done, your output should look like this:

There are 10650 unique words.

The Height of the BST is 29

The Height of the AVL is 15

 

N      1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,  14,  15,  16,  17,  18,  19  …

#BST   1,   2,   4,   8,  15,  30,  57,  98, 159, 252, 383, 549, 740, 886, 997,1085,1053,1024, 955, …

#AVL   1,   2,   4,   8,  16,  32,  64, 128, 256, 512,1020,1944,2904,2583,1066, 110 #HT 2035,1977,1820,1549,1204, 867, 557, 331, 166,  92,  40,  11,   1

 

AVG. BST NumberOfComparisons:16.509671361502347

AVG. AVL NumberOfComparisons:12.716338028169014 AVG. HT NumberOfComparisons:3.6068544600938965 Tasks:

  • Complete implementation of collect() in AVL
  • Complete implementation of height() and height(Node) in BST
  • Complete implementation of collect() in BST
  • Complete implementation of Question2

Consider inserting the keys { “it”, “was”, “the”, “best”, “of”, “times”, “it”, “was”,”the”, “worst”, “of”, “times” } into a BST, an AVL and the Hast Table. The results appear on the next page. Note that the structures represent a symbol table, so the BST, AVL and Hash Table all associate the respective frequencies of each word in the structures (not shown below). The following words appear twice (“it”, “was”, “the”, “of”, “times”) and the words “best” and “worst” appear just once. So these keys would have either 2 or 1 associated with them.

 

 

BST TREE AVL TREE HASH TABLE (M=4)
   

 

With these structures, here are the counts of comparisons for locating each key.

key BST AVL HT
Best 2 3 1
It 1 2 3
Of 4 3 2
The 3 1 1
Times 4 3 1
Was 2 2 2
Worst 3 3 1
Avg. 2.714286 2.428571 1.571429

 

Output would look like this:

There are 7 unique words.

The Height of the BST is 3

The Height of the AVL is 2

N      1,   2,   3,   4,   5,   …

#BST   1,   2,   2,   2,

#AVL   1,   2,   4,

#HT    4,   2,   1,

 

AVG. BST NumberOfComparisons:2.7142857142857144

AVG. AVL NumberOfComparisons:2.4285714285714284

AVG. ST NumberOfComparisons:1.5714285714285714

 

As a further hint, please review the code example “CountingObject” I’ve made this available in day 18 and the “outputDepthInfo” method in BST as found in day 17. You will need to pull the latest version of the Algorithms D2020 repository to get this code.

 

 

Q3. Binary Search Trees

Using Binary Search Trees to build a Symbol Table that counts the number of occurrences of unique words in The Tale Of Two Cities by Charles Dickens. Copy the TaleOfTwoCitiesExtractor that you had used for HW2 into your USERID.hw3 package. Q2  Copy algs.hw3.Question3 into your USERID.hw3 package and modify it for this problem.

Q3.1 (10 points) Complete the method that returns the Key whose Value is highest.

public String mostFrequent() { }

 

You can add any number of helper methods as part of this assignment.

Q3.2 (15 points) Complete the method that prints in ascending order the number of words that appear only once in the Binary Search Tree (i.e., their count is 1) and returns the total number of keys sthat were printed.

public int printUnique() { }

 

You can add any number of helper methods as part of this assignment. Note that the output will likely exceed your console history, so you will only see the last

Q3.3 (5 points) Question4 should construct a BST from the Tale of Two Cities and output the 10 most common words (together with their corresponding appearance counts). It does this by repeatedly deleting the key whose count frequency is highest in the BST until all values are printed. The output should look like:

Top ten most frequent words

 

the 7983
and 4935
of 3999
to 3460
a 2908
in 2579
his 2005
it 2003
i 1913

that 1889

Number of words that appear once aa aaabusiness aamatter aback abandon … youths youties

youunder

5158 unique words.

Change Log

  1. ANY changes will appear here 2. Binary Search Tree has a height of 29.
  2. Clarified difference between Depth and #of Comparisons

 

  • HW3-fzlrs0.zip