What is a fixed line representing Zipf's law
Recall that Zipf’s law postulates the following relationship:
To evaluate that hypothesis we could plot a fixed line with slope of -1 in log space alongside our data. Presumably that line will have a similar slope as the actual data. Think about how you could use the actual data to approximate meaningful “y values”. Zipf’s law implies that the second most common word is half as common as the most common word and the third most common is a third as common as the most common word. What does that imply about the counts for the second and third most common words (and the nth) most common word)?