CS 101 Laboratory #10
Spam, Spam, Spam
Objective:
To gain more experience with Strings.
As in previous weeks, you should bring a completed design document to lab.
Due date: Friday, December 2, at the beginning of your lecture
section.
Background
What is spam? Before the mid-90s, Spam was a canned "meat"
product, which we believe actually still exists under that name.
Others are fondly reminded of a
Monty Python skit involving a
restaurant that served spam in all of its dishes. Of course, these
days, the term "spam" means just one thing - unwanted email!
There is lots more information available online about the
history of the term
"spam". Hormel
(the makers of Spam) have an amusing site to give spam back its
"good" reputation, including the slogan (Spam. So good...it's gone.)
The Problem
This week we will build a program that should give you insights into
how you might add a spam filter capability to a mail program. Our
spam filter is rather simple. The user provides a list of words.
The program then searches the "from" and "subject" headers of all
your mail messages and divides your mail into two lists. One list
contains the messages whose headers include a filter word; the other
contains the rest.
The SpamFilter program is shown below:
In this program, the user selects the machine to read mail from, then
enters his/her username and password. The program connects to the mail
server and downloads the headers of all the messages. Initially, all
the mail is considered good mail (not spam).
The user can then add
filter words. The filter words appear in the text area in the top
right after they are entered. When the user adds a filter word, it
appears in the list, but also the mail is scanned for occurrences of
the filter word. Messages with that word in their headers are
removed from the good mail list and moved to the spam list.
The JComboBox at the bottom of the screen allows the user to switch
between viewing the good mail or the spam.
You can use this program to connect to your actual mailbox and filter
your own mail by entering your mail server name, username and
password into the user interface.
We provide the code to actually communicate with a mail server. You
do not need to worry about errors in your program causing problems
with your real mailbox, though, as there is nothing in your program
that modifies your mailbox in any way.
If you don't want to use your actual mailbox, we have set up
temporary mailboxes for you to use on the mail server
basin.cs.middlebury.edu. You would connect with the same username
and password that you use to connect to benjerry. Of course, these
mailboxes are currently empty, so you will need to send yourself a
few short pieces of mail to test your program. Send them to
username@cs.middlebury.edu, using your login id in place of "username".
To start, click on this link to download
the SpamFilter starter.
Design
This week we will again require that you prepare
a written "design" for your program before lab. At the beginning of
the lab, the instructor will briefly examine each of your designs to
make sure you are on the right track. Submit your final design with
your code submission.
The only class you are responsible for designing is the
SpamFilter class. For this class, you should provide:
- a list of the non-final instance variables you expect to
include,
- the headers of the methods you expect to define in the class (including all
parameters), with a brief description of what each method does
(similar to the comment you would include to describe the
method in the final program), and
- a sketch of the code used in the body of each method,
especially control structures like while, for, and if
constructs.
We provide a MailConnection class that manages the
communication with a mail server. We also provide a simple
Mail class that the SpamFilter class uses to store
and retrieve a "from" header and a "subject" header. It provides the
following:
- public Mail (String from, String subject)
- Constructs a mail object with just a "from" header and a "subject" header.
- public String getFrom()
- Returns the from header that
was passed in on construction.
- public String getSubject()
- Returns the subject
header that was passed in on construction.
- public String toString()
- Returns the "from" and
"subject" headers as a single string with a newline between them
and a newline at the end.
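To make the description above concrete, here is a minimal sketch of a class with the same four members. It is a hypothetical stand-in named MailSketch, not the Mail class supplied with the starter, and it assumes the stored strings include their "From:" and "Subject:" labels.

```java
// Hypothetical stand-in for the provided Mail class; the real one may differ.
class MailSketch {
    private final String from;
    private final String subject;

    public MailSketch(String from, String subject) {
        this.from = from;
        this.subject = subject;
    }

    public String getFrom() {
        return from;
    }

    public String getSubject() {
        return subject;
    }

    // Both headers in one string: a newline between them and one at the end.
    @Override
    public String toString() {
        return from + "\n" + subject + "\n";
    }

    public static void main(String[] args) {
        MailSketch m = new MailSketch("From: David Guertin",
                                      "Subject: Re: two needs for CS 101");
        // print, not println, because toString already ends with a newline
        System.out.print(m);
    }
}
```

Because toString already ends with a newline, the strings for several messages can be appended one after another to build the text for a text area.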
The MailConnection class is a bit more interesting.
Here are descriptions of the
constructor and methods the SpamFilter class needs:
- public MailConnection (String host, String userName,
char[] password)
- To talk to a mail server, we first
construct a MailConnection object. We pass in the
name of our mail server, something like
"mail.middlebury.edu", a login id, and a password.
Notice that the password parameter is a character array rather than a
String. This is not really a problem, because the GUI component that
the user types their password into (a JPasswordField) returns
a character array, rather than a String, so the program just passes the
contents of the password field on to the MailConnection
constructor.
When you construct a
MailConnection it attempts to connect to the mail server.
There are a number of reasons why it might fail. For example, the
user might mistype his/her login id or password or the mail server
might not be running for some reason. If any of these failures
occur, a dialog box will pop up to inform the user. The program must
also be aware that the connection failed because it will not be
possible to look at the mail if there is no connection. For that
reason, we provide the following method.
- public boolean isConnected()
- Returns true if the
program currently has a connection to the mail server.
- public void disconnect()
- Closes the connection to
the mail server. This does nothing if there is no active connection.
- public int getNumMsgs()
- Returns the number of
messages in the mailbox you are connected to. This returns 0 if
there is no active connection.
- public String header (int msg)
- Returns the headers
of the mail message identified by the number passed in. Unlike Java
arrays, which are indexed from 0, mailboxes number messages beginning
with 1 and going up to the number of messages contained in the mailbox.
The mail headers are returned in one long string, such as:
Received: from arcticcat.middlebury.edu ([140.233.2.8]) by bearcat.middlebury.edu with Microsoft SMTPSVC(5.0.2195.6713);
Fri, 18 Nov 2005 15:36:19 -0500
Received: from anthrax.middlebury.edu ([140.233.2.72]) by arcticcat.middlebury.edu with Microsoft SMTPSVC(5.0.2195.6713);
Fri, 18 Nov 2005 15:36:19 -0500
Received: from 140.233.20.15 [140.233.20.15]
by anthrax.middlebury.edu
with XWall v3.35 ;
Fri, 18 Nov 2005 15:36:18 -0500
Message-ID: <437E3B2F.1080301@middlebury.edu>
Date: Fri, 18 Nov 2005 15:35:59 -0500
From: David Guertin
MIME-Version: 1.0
To: "Briggs, Amy"
Subject: Re: two needs for CS 101
In-Reply-To: <437DE9F4.2000107@middlebury.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Return-Path: guertin@middlebury.edu
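Since messages are numbered from 1, a loop over the mailbox runs from 1 through getNumMsgs() while the array that receives the results is still indexed from 0. The sketch below illustrates that offset. HeaderSource and FakeMailbox are hypothetical stand-ins that only borrow the method names described above, so the loop can run without a mail server; the starter code's own download logic may be organized differently.

```java
class DownloadSketch {
    // Hypothetical interface with the same method names as MailConnection.
    interface HeaderSource {
        boolean isConnected();
        int getNumMsgs();
        String header(int msg);   // msg is 1-based
        void disconnect();
    }

    // Hypothetical in-memory mailbox used only for this illustration.
    static class FakeMailbox implements HeaderSource {
        private final String[] stored;
        private boolean connected = true;

        FakeMailbox(String... stored) {
            this.stored = stored;
        }

        public boolean isConnected() { return connected; }
        public int getNumMsgs() { return connected ? stored.length : 0; }
        public String header(int msg) { return stored[msg - 1]; }
        public void disconnect() { connected = false; }
    }

    // Collects the raw header string of every message, then disconnects.
    static String[] downloadAll(HeaderSource conn) {
        if (!conn.isConnected()) {
            return new String[0];          // nothing to read without a connection
        }
        int count = conn.getNumMsgs();
        String[] all = new String[count];
        for (int msg = 1; msg <= count; msg++) {
            all[msg - 1] = conn.header(msg);   // message 1 goes in slot 0
        }
        conn.disconnect();
        return all;
    }

    public static void main(String[] args) {
        HeaderSource conn = new FakeMailbox("From: a\nSubject: one",
                                            "From: b\nSubject: two");
        String[] all = downloadAll(conn);
        System.out.println(all.length + " messages downloaded");
    }
}
```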
Your spam filter will look at only the "From" and "Subject"
headers, which the SpamFilter class puts into an array for you.
Your job is to complete the definition of the SpamFilter class to search
the "From" and "Subject" headers for keywords indicating the message
is spam. The SpamFilter class extends
FrameController rather than FrameWindowController.
We do this because this program makes no use of the canvas. The
begin method constructs the GUI. If you need to initialize
some instance variables, you may want to add code to the
begin method to do that. The GUI components that we
construct and that your program will need to interact with are:
- JTextArea mailArea
- This is the main text area where the
user's good mail headers or spam headers are displayed.
- JComboBox goodOrSpam
- This is the menu at the bottom of the
screen that allows the user to control whether mailArea
displays the headers from good mail or from spam. When the user
makes a selection from this menu, what is displayed should change if
the user's mail has already been downloaded. If no mail has been
downloaded, nothing should happen.
- JTextField filterField
- This is the text field where the user
enters a filter to be added to or removed from the list of filters.
- JTextArea filterArea
- This is the text area showing all the
terms actively used as filters. Each filter the user adds should be
displayed on a separate line within this text area.
- JButton addFilterButton
- This is the button labelled "Add to
filter". When the user clicks this button, the entry in
filterField should be added to filterArea. The
filterField should be cleared.
If the
user's mail has already been downloaded, it should be refiltered and
mailArea should be updated to show the results of the
filtering.
- JButton removeFilterButton
- This is the button labelled
"Remove from
filter". When the user clicks this button, the entry in
filterField should be removed from filterArea. The
filterField should be cleared. If the
user's mail has already been downloaded, it should be refiltered and
mailArea should be updated to show the results of the
filtering. If the entry in the text field does not exist in the
filters, nothing should happen.
- JComboBox servers
- This is the menu where the user selects
the mail server to connect to. Note that the user can type in the
name of the server to connect to a server not in the original menu.
- JTextField user
- This is the text field where the user types
in his/her username to connect to a mailbox.
- JPasswordField pass
- This is where the user types in his/her
password. A JPasswordField is very similar to a
JTextField except that what the user types is not displayed
on the screen. Also, the program uses the getPassword() method
to find out what the user typed in this field, rather than
getText(). Recall that getPassword returns a
char[] rather than a String, but we just
pass this value on to the MailConnection constructor.
- JButton connectButton
- This is the button labeled "Get
mail". When the user clicks this button, the program creates a new
mail connection using the information in servers,
user, and pass. It then downloads headers
from all of the user's mail and saves just the "From" and
"Subject" headers. It then closes the connection to the mail server.
Your job is then to filter the saved mail headers into good mail or spam using
the current filters. Then, display good mail or spam depending on
the setting of the goodOrSpam menu.
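The components above all report user actions through ActionListeners. As a rough illustration, the sketch below registers one listener on two buttons and a combo box and uses getSource() to tell them apart. The class name ListenerSketch, the lastAction field, and the menu entries "Good mail" and "Spam" are assumptions made for this example; the starter code's actual structure and labels may differ.

```java
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.JButton;
import javax.swing.JComboBox;

class ListenerSketch implements ActionListener {
    final JButton addFilterButton = new JButton("Add to filter");
    final JButton removeFilterButton = new JButton("Remove from filter");
    // The menu entries here are placeholders, not the starter's actual labels.
    final JComboBox<String> goodOrSpam =
            new JComboBox<>(new String[] {"Good mail", "Spam"});

    // Records which branch ran, so the sketch can be checked without a window.
    String lastAction = "none";

    ListenerSketch() {
        addFilterButton.addActionListener(this);
        removeFilterButton.addActionListener(this);
        goodOrSpam.addActionListener(this);
    }

    // One method handles all three components; getSource() says which one fired.
    @Override
    public void actionPerformed(ActionEvent event) {
        Object source = event.getSource();
        if (source == addFilterButton) {
            lastAction = "add";
        } else if (source == removeFilterButton) {
            lastAction = "remove";
        } else if (source == goodOrSpam) {
            lastAction = "show " + goodOrSpam.getSelectedItem();
        }
    }

    public static void main(String[] args) {
        ListenerSketch sketch = new ListenerSketch();
        sketch.addFilterButton.doClick();          // simulates a button press
        System.out.println(sketch.lastAction);
        sketch.goodOrSpam.setSelectedItem("Spam"); // simulates a menu selection
        System.out.println(sketch.lastAction);
    }
}
```

In the real program each branch would call a method that updates the filter list or redisplays the mail, rather than setting a string.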
You must provide:
- The listening method to react to the user clicking on the
"Add to filter" and "Remove from filter" buttons (in addition
to the "Get mail" button).
In addition, you must have a listener for the good mail vs. spam
combo box to control what is displayed in the main text area. Both
JButtons and JComboBoxes use
ActionListeners to receive user input.
- To avoid repeatedly going back to the mail server (a slow
operation), the program saves all the "From" and "Subject" headers
in an array that contains all the user's headers (a Mail[]
array) and that can be
repeatedly walked to do the filtering.
Besides the array to hold all mail, you will also need two
other arrays. One will hold only the good mail headers, the other will hold
only the spam headers.
- You will need another array to hold the list of filter terms.
Remember that when you remove a filter, you will need to update this
array by removing the matching element and sliding everything beyond
it one slot closer to the beginning of the array (like the way the
nibbles snake shrinks, except you will generally be removing from the
middle of the array, not the beginning). It is ok to store and
display the list of filter items in all lowercase to simplify the
case-insensitive comparisons you will do when you filter. (It's not
ok to do that to the mail headers, though.)
- You will need to be able to determine if a header is
spam. You should use String methods to search the header for the
presence of any String in the list of filters. You should use a
case-insensitive comparison for your spam comparisons.
- You may find it useful to introduce other private methods to
keep your code simple and to prevent repeating code in several
places. (Avoid reuse by copy-and-paste!)
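The filter array described above can be sketched as a partially filled array with a count of the slots in use. The example below is one possible arrangement, not the required design: the class name FilterListSketch, the fixed capacity, and the choice to ignore blank or duplicate terms are all assumptions made for illustration.

```java
class FilterListSketch {
    private final String[] filters;   // only slots 0..count-1 are in use
    private int count = 0;

    FilterListSketch(int capacity) {
        filters = new String[capacity];
    }

    // Stores the term in lowercase. Returns false (and changes nothing) for
    // a blank term, a duplicate, or a full array; these rules are assumptions.
    boolean add(String term) {
        String word = term.trim().toLowerCase();
        if (word.isEmpty() || indexOf(word) >= 0 || count == filters.length) {
            return false;
        }
        filters[count] = word;
        count++;
        return true;
    }

    // Removes the term if present by sliding later entries one slot down.
    boolean remove(String term) {
        int pos = indexOf(term.trim().toLowerCase());
        if (pos < 0) {
            return false;             // not in the list: nothing happens
        }
        for (int i = pos; i < count - 1; i++) {
            filters[i] = filters[i + 1];
        }
        count--;
        filters[count] = null;        // clear the slot that is no longer used
        return true;
    }

    private int indexOf(String word) {
        for (int i = 0; i < count; i++) {
            if (filters[i].equals(word)) {
                return i;
            }
        }
        return -1;
    }

    int size() {
        return count;
    }

    // One term per line, in the form a text area could display.
    String asText() {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < count; i++) {
            text.append(filters[i]).append("\n");
        }
        return text.toString();
    }

    public static void main(String[] args) {
        FilterListSketch list = new FilterListSketch(10);
        list.add("Lottery");
        list.add("free");
        list.remove("LOTTERY");
        System.out.print(list.asText());
    }
}
```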
Note that the code you are given extracts just the "From" and
"Subject" headers from the long string that header
returns. As shown earlier, the String that header returns
actually contains multiple headers with a newline between each. To
find just one header, the program finds a string that begins with
a newline character (\n) followed by "From:" or "Subject:" and
ending at the next newline character. Note how it handles the special
case where the header it is looking for is the last header and does
not end with a newline. When looking for these
strings, we use a case-sensitive comparison.
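The search just described can be sketched with indexOf and substring. This is an illustration written for this handout, not a copy of the starter code: the method name extract, the decision to keep the "From:" label in the result, and the extra check for a header on the very first line are assumptions.

```java
class HeaderExtractSketch {
    // Returns the whole line that starts with label (for example "From:"),
    // or an empty string if no line starts with it. Case-sensitive.
    static String extract(String headers, String label) {
        int start;
        if (headers.startsWith(label)) {
            start = 0;                              // header is on the first line
        } else {
            int pos = headers.indexOf("\n" + label);
            if (pos < 0) {
                return "";                          // header not present
            }
            start = pos + 1;                        // skip the leading newline
        }
        int end = headers.indexOf('\n', start);
        if (end < 0) {
            end = headers.length();                 // last header, no newline after it
        }
        return headers.substring(start, end);
    }

    public static void main(String[] args) {
        String headers = "Received: from 140.233.20.15 [140.233.20.15]\n"
                + "Date: Fri, 18 Nov 2005 15:35:59 -0500\n"
                + "From: David Guertin\n"
                + "Subject: Re: two needs for CS 101";
        System.out.println(extract(headers, "From:"));
        System.out.println(extract(headers, "Subject:"));
    }
}
```

Requiring the newline before the label and matching case exactly keeps the search from stopping at text such as "Received: from", which contains the word "from" but is not the "From:" header.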
Implementation Suggestions
We suggest that you approach this problem in the following order:
- First, read and understand the code you are given that
establishes a connection with a mail server, downloads the headers,
extracts the "From" and "Subject" headers into an array, and displays
the contents of this array in mailArea.
- Now, work on adding filters to the list of filters and
removing filters from the list. At first, just be sure that the
filter list gets updated correctly. Don't bother with actually
filtering the mail.
- Finally, apply the filters to identify good mail and spam.
Make sure you can view both.
- Test your program by modifying your filter list and seeing
that what gets displayed in the mail area is updated appropriately.
Remember that it should not be necessary to download your mail again
when you change your filters.
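The filtering step in the list above can be sketched as a single pass over the saved headers that copies each message into one of two arrays. The example is a rough outline under stated assumptions: PartitionSketch and its nested Msg class are hypothetical (Msg stands in for the provided Mail class), and the good and spam arrays are simply made as large as the full mail array.

```java
class PartitionSketch {
    // Hypothetical stand-in for the lab's Mail class.
    static class Msg {
        final String from;
        final String subject;

        Msg(String from, String subject) {
            this.from = from;
            this.subject = subject;
        }
    }

    final Msg[] good;
    final Msg[] spam;
    int goodCount = 0;
    int spamCount = 0;

    // Walks the saved mail once, placing each message in good or spam.
    PartitionSketch(Msg[] all, String[] filters) {
        good = new Msg[all.length];
        spam = new Msg[all.length];
        for (Msg m : all) {
            if (isSpam(m, filters)) {
                spam[spamCount++] = m;
            } else {
                good[goodCount++] = m;
            }
        }
    }

    // Case-insensitive test: compares lowercase copies, so the stored
    // headers themselves are never changed.
    static boolean isSpam(Msg m, String[] filters) {
        String from = m.from.toLowerCase();
        String subject = m.subject.toLowerCase();
        for (String filter : filters) {
            String word = filter.toLowerCase();
            if (from.contains(word) || subject.contains(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Msg[] all = {
            new Msg("From: Prize Team", "Subject: You WON the Lottery"),
            new Msg("From: David Guertin", "Subject: Re: two needs for CS 101")
        };
        PartitionSketch result = new PartitionSketch(all, new String[] {"lottery"});
        System.out.println(result.goodCount + " good, " + result.spamCount + " spam");
    }
}
```

Rebuilding both arrays from the full mail array each time the filters change avoids having to move individual messages back from spam to good when a filter is removed. Note that a substring test also matches inside longer words, so a filter of "free" would flag a subject containing "freedom".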
Submitting Your Work
We will grade your assignment both by running your submitted BlueJ
project and by reading a printout of your Java source code. You should
submit a printout for SpamFilter.java only. Homework is due by
the beginning of your lecture section.
Before submitting your work, make sure that your .java file
includes a comment containing your name and lab section. Also, before
turning in your work, be sure to double check both its logical
organization and your style of presentation. Make your code as clear
as possible and include appropriate comments describing major sections
of code and declarations. Make sure your indentation is all
consistent. Refer to the lab
style sheet for more information about style.
Print out a copy of your Java source file, your design document, and the Homework 10 Cover Page. This page provides
the guidelines for how your homework will be graded. Turn in one stapled
hardcopy of all your work, with this cover page on top. You should
include at the bottom any assigned exercises from the textbook.
Electronic submission
Because of Java security mechanisms, it will not be possible to run
this program in a web browser or with an appletviewer. For this reason
we will want you to submit your entire BlueJ project.
Open a terminal window and type
cd public_html/cs101/Spam
This should bring you to your files for this assignment. Then, type
~cs101/submit
This will copy all files in your current directory and store them in our account.
You can resubmit as many times as you want -- only the newest submission will be kept.
The final deadline for both paper and electronic submission is Friday, December 2
at the beginning of your lecture section.
Good luck and have fun!