CS465 - Assignment eight

Due: 2016-04-29 01:45p

Objectives

Part 0: Get your data

There are no milestones to hand in, but you should have your data in hand by the end of the week. You may already have this, in which case, congratulations. I’ll be getting back to you shortly about your proposals, so continue to think about them this week.

Part 1: Demonstrate how to build a Huffman Tree

Rather than just having you draw me a tree, I would like you to demonstrate the steps required to build a Huffman tree. If you are not already familiar with Huffman coding, I suggest you read up on it; it is an interesting technique that turns up all over the place. We are not going to actually generate the codes or do any compression or decompression. We are just going to construct the tree that is the basis for coding and decoding.

The algorithm is very simple. I will assume we are working with text data.

Here is the set of steps to create the tree for “to be or not to be” (I’ve modified it to show the frequency of the tree).

Hw8 Example

User interface

Here is a stub file. It contains the basic elements you will need. There is a text box for entering a string, a start button which starts the process, and a step button to advance by a step. A “step” for us will be one iteration of the loop (remove two elements, build a tree, reinsert it).

So, the user will type something in, click the start button, and then click the step button repeatedly until the process is complete. Note that the step button is currently disabled. The button should be enabled when the process has been started and disabled when it is complete.

Stepping through

To simplify the trees, lowercase all strings when you receive them. When the string is first loaded, start by calculating the frequencies and creating an array of trees, one for each character. I suggest creating an object with properties: text, frequency, and children, where children is an array of these same objects. When you begin, you will just have single letters. None of these will ever have children, so they don’t need the children property.
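As a rough illustration of this setup step, here is a minimal sketch. The function name buildLeaves is hypothetical (it is not part of the stub file), and the sketch assumes that every character, spaces included, gets its own leaf.

```javascript
// Hypothetical helper (not in the stub file): build one leaf tree per
// distinct character of the lowercased input.
// Assumption: spaces and punctuation count as characters like any other.
function buildLeaves(input) {
  const counts = new Map();
  for (const ch of input.toLowerCase()) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  // Leaves carry only text and frequency; they never have children,
  // so the children property is simply left off.
  return Array.from(counts, ([text, frequency]) => ({ text, frequency }));
}

const leaves = buildLeaves("To be or not to be");
console.log(leaves);
```

The array comes back in first-seen order, so it still needs to be sorted before the first step.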

I suggest writing a step function that does one pass over the array. If the array is sorted, a simple slice (or two) is all that is needed to extract the lowest frequency trees. Create a new tree object with these, making them the children. For the frequency of the new node, add the frequencies of the two children together. Combine the text the same way, by concatenating the text of the two children. This isn’t required for the algorithm, but it does make things easier to find in the tree, and it makes our trees more interesting looking. So, for example, if I had two nodes “n” and “r”, both with frequency 1, I would combine them into a new node “nr” with frequency 2.
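One possible shape for such a step function is sketched below. The names step and byFrequency are illustrative, and the sketch assumes the array is already sorted when step is called. The comparator is passed in as a parameter, so the frequency-only ordering used here is just a stand-in.

```javascript
// Hypothetical step function: one iteration of the loop
// (remove two elements, build a tree, reinsert it).
// Assumes `trees` is already sorted, lowest frequency first.
function step(trees, compare) {
  if (trees.length < 2) return trees; // nothing left to merge

  // The two lowest-frequency trees sit at the front of the sorted array.
  const [first, second] = trees.slice(0, 2);

  const merged = {
    text: first.text + second.text,                // "n" + "r" -> "nr"
    frequency: first.frequency + second.frequency, // 1 + 1 -> 2
    children: [first, second],
  };

  // Reinsert the merged tree and restore sorted order.
  return [...trees.slice(2), merged].sort(compare);
}

// Placeholder ordering for this sketch only: frequency and nothing else.
const byFrequency = (a, b) => a.frequency - b.frequency;

const trees = [
  { text: "n", frequency: 1 },
  { text: "r", frequency: 1 },
  { text: "b", frequency: 2 },
];
const next = step(trees, byFrequency);
console.log(next.map((t) => t.text + ":" + t.frequency));
```

Returning a new array rather than mutating the old one is a design choice for this sketch; an in-place version with splice would work just as well.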

Note about sorting

One potential issue with the algorithm I’ve presented is that if we just go by the frequency for sorting, there are potentially many valid trees. So, I would like you to use a slightly more complicated sort to make sure we all end up with the same trees. If two nodes have different frequencies, sort by the frequency. If they have the same frequency, check the height of the tree. Shorter trees should come before taller trees – this will help keep the final tree shallow. Note that this is easy if we are merging the text in the parent node; we don’t have to actually look at the depth, we just compare string lengths. Finally, if the frequencies are the same and the lengths of the texts are the same, fall back to alphabetical order.
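The three-level ordering can be written as a single comparator for Array.prototype.sort. This is only a sketch: the name compareTrees is made up here, and it assumes that "alphabetical order" means plain string comparison with < and >, which is adequate for lowercased text.

```javascript
// Hypothetical comparator implementing the three sorting rules:
// 1. lower frequency first,
// 2. then shorter text first (standing in for tree height),
// 3. then alphabetical order of the text.
// Assumption: "alphabetical" is plain code-unit string comparison.
function compareTrees(a, b) {
  if (a.frequency !== b.frequency) return a.frequency - b.frequency;
  if (a.text.length !== b.text.length) return a.text.length - b.text.length;
  if (a.text < b.text) return -1;
  if (a.text > b.text) return 1;
  return 0;
}

const sorted = [
  { text: "nr", frequency: 2 },
  { text: "o", frequency: 4 },
  { text: "e", frequency: 2 },
  { text: "b", frequency: 2 },
].sort(compareTrees);

console.log(sorted.map((t) => t.text).join(","));
```

Here “b” and “e” come before “nr” because their text is shorter, even though all three have frequency 2.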

Visualizing

I will assume that at this point, you can read through the documentation to figure out the mechanics of how the tree layout works. Note that there are separate steps for producing the nodes and the edges.

I would like you to produce a grid of the trees in your array. Here is an example part of the way through constructing the tree for the string “doctor who”.

Hw8 1

This is just an example; yours doesn’t need to look like this (for example, there is no need to include the frequencies). Note that the first version I posted here had boxes around the elements. This is useful for debugging, but not a necessary part of the drawing.

At each step, the number of trees will shrink until you are left with just one tree. This means that you will need to be very mindful of your entering, exiting, and update sets. Note that once a tree is drawn, it may move, but it doesn’t change. If you wanted to get fancy, you might try animating the merge, but that is probably more trouble than it is worth.

Styling is up to you. The grid lines in mine were more for debugging than anything else; you can add them or not. You could do something fancy like adding a highlighted background to show which tree just changed. You could put the text in circles or label the edges with 0s and 1s. You could label the trees with the frequency of the root node. Anything you want to do to make this look nicer or more informative would be great.


Logistics

We will return to working in pairs for this assignment. Again, you should be working pair programming style, not splitting up the work. Let me know if you are having trouble with your partner assignment.

Please call your HTML file hw8.html and submit it on Moodle.