Description
Assignment – High Frequency Words
- Choose a corpus of interest.
- How many unique words are in the corpus? (Feel free to define unique words in any interesting,
defensible way.)
- Taking the most common words first, how many unique words account for half of all word occurrences in the corpus?
- Identify the 200 highest frequency words in this corpus.
- Create a graph that shows the relative frequency of these 200 words.
- Does the observed relative frequency of these words follow Zipf’s law? Explain.
- In what ways do you think the frequency of the words in this corpus differs from that of “all words in all corpora”?
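
One possible way to approach the counting steps above is sketched below in Python, using only the standard library. It is a minimal sketch under stated assumptions, not a required method: the tiny SAMPLE_TEXT stands in for a real corpus, the tokenizer (lowercased runs of letters and apostrophes) is just one defensible definition of a word, and the helper names tokenize, half_coverage, relative_frequencies, and zipf_slope are illustrative. Zipf's law says frequency is roughly proportional to 1/rank, so on a log-log scale the top words should fall near a straight line with slope close to -1; the last helper estimates that slope.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in corpus; replace with the text of the chosen corpus.
SAMPLE_TEXT = (
    "the cat sat on the mat and the dog sat on the log "
    "the cat saw the dog and the dog saw the cat"
)


def tokenize(text):
    """Lowercase the text and split it into tokens (one possible word definition)."""
    return re.findall(r"[a-z']+", text.lower())


def half_coverage(counts):
    """Number of most-frequent words needed to cover half of all tokens."""
    total = sum(counts.values())
    running = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if 2 * running >= total:
            return n
    return 0  # empty corpus


def relative_frequencies(counts, top_n=200):
    """(word, share of all tokens) for the top_n most frequent words."""
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(top_n)]


def zipf_slope(counts, top_n=200):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = [count for _, count in counts.most_common(top_n)]
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    counts = Counter(tokenize(SAMPLE_TEXT))
    print("unique words:", len(counts))
    print("words covering half the corpus:", half_coverage(counts))
    print("top words:", relative_frequencies(counts, top_n=5))
    print("log-log slope:", zipf_slope(counts))
```

The pairs returned by relative_frequencies can be passed to any plotting tool for the graph; plotting rank against relative frequency on log-log axes makes the comparison with Zipf's law easiest to see. A fitted slope far from -1, or a visibly curved line, would suggest the corpus departs from the law.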