Programming Assignment 8: Weather Report
Initial Due Date: 2024-11-14 8:00AM
Final Due Date: 2024-12-05 4:15PM
Note: On this assignment you will again be able to work in pairs if you want to do so. If you do work with a teammate, you must both be present whenever you’re working on the lab. Only one of you should submit the assignment, but make sure both your names are in the comment at the top of the file and the submitter adds their partner to the Gradescope submission.
An important component of many scientific applications is data collection and data analysis. For this assignment, we’ll be looking at an example data collection application that collects weather data from the web and aggregates it into a data file. In addition, we’ll also make a useful program that takes a zip code as a command-line parameter and will give you the current temperature for that zipcode.
An important disclaimer:
When writing a program like this that contacts an external server, you need to be thoughtful about how you use that external resource. Many commercial services will have request limits. If someone is offering a service as a courtesy, we want to be respectful of that resource.
For testing purposes, I have put up a version of the web page you will be extracting the temperature from on our department web server. You should use this test web page until you have your program working. Even when you have your program working and change over to the external web address, please avoid making too many repeated calls.
Part 1: Getting the weather
For the first part of this assignment, write a program that reads the current weather from the web for a zip code entered by the user. I’ve broken the description of this program into two parts: the specification of what is required, and my suggestion about how to proceed on the implementation. Make sure to read both sections before starting!
Specifications
Write a program called pa8_weather.py that has the following characteristics:
Importing the module only defines functions and variables, i.e. on import your program should not query the weather (or invoke any functions or print anything to the shell).
Your program should be able to be run from the command-line and take a single argument, which is the zip code:
If your program is run with too few or too many arguments, it should print out the usage, e.g., within Thonny:
>>> %Run pa8_weather.py
usage: python3 pa8_weather.py <zip_code>
If your program is run with the correct number of arguments (one), you should treat it as a zip code (you can assume it is a valid zip code) and the program should print out the current temperature at that zip code, e.g., within Thonny:
>>> %Run pa8_weather.py 05753
39.71
Your module must contain a function named get_temperature that takes a zip code as a string parameter (think about why a zip code might be better represented as a string than an integer) and returns the temperature at that zip code as a float.
Guide
We will use an API to obtain weather data. API stands for “Application Programming Interface” and it means that a service (such as a weather data server on the web) provides a protocol specifically designed to be used by programs, rather than by humans.
In particular, for this lab, we will use the API by OpenWeatherMap. If you follow the link for “current weather data” and then scroll down to “by ZIP code” you will see that you can use a URL like
to get the weather conditions for a given zip code. Notice zip=05753 in the URL specifying the zip code. To obtain the weather for a different zip code you would change that portion of the URL, e.g., to zip=20015 to get the weather for 20015 (Washington D.C.). Note that the whole URL is required, including the APPID portion (which is my API key); see below for more explanation of API keys and how to get your own. Here is a sample page I retrieved for Middlebury via the API:
For now, your program should only use this sample page (on the CS department server). If you follow this link you’ll see it’s a text encoding for the weather for Middlebury, with a current temperature, indicated by the “temp” key, of 49.25. Your job is to write a Python module that extracts just the temperature from this data.
Note: You may get the following error, or something similar, when using the test URL
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)>
If that happens, add the following statements to the top of your program:
# Python is rejecting the certificate used for the CS dept. server so we bypass some of those checks
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
For context, this data is provided by the API as JSON. JSON (which stands for JavaScript Object Notation) is one of the most common data interchange formats, that is, specifications for communicating precisely formatted data between different programming languages (or computer systems). In our example, the weather website provides a JSON representation of the weather that can be sent as a string, and then parsed (or understood) by many different programming languages as dictionaries, lists, numbers, etc.
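To make the idea of parsing concrete, here is a minimal sketch using Python's built-in json module. The sample string is made up and heavily abbreviated, and placing the temperature under a "main" key is an assumption for illustration, so check the sample page for the actual layout.

```python
import json

# Abbreviated, made-up response text; the real page has many more fields.
# Assumption: the temperature sits under a "main" key.
sample = '{"name": "Middlebury", "main": {"temp": 49.25, "humidity": 66}}'

data = json.loads(sample)            # the string becomes nested dictionaries
temperature = data["main"]["temp"]   # follow the nesting down to the number

print(type(temperature), temperature)
```

Notice that the number arrives already converted to a float, so no string slicing is needed.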
You may implement this module however you like as long as it meets the specifications above. Here is one suggested approach:
Write code to open and read the web page above (on the CS department server). Check out this example program for extracting information from a webpage. In that program we use the urllib.request module built into Python to request data from a website, much like opening a local file for reading. Start with import urllib.request, then you can request and read the file as shown below.
with urllib.request.urlopen(url) as webpage:
    # Iterate through each line of webpage, just like a file
    for line in webpage:
        line = line.decode('utf-8', 'ignore')  # Obtain a string from the raw bytes
Note that all the information is contained on a single line, but it’s still fine to use our standard approach of iterating over the lines; there will just be only one iteration. You can also read the entire file at once using contents = webpage.read() instead of for line in webpage: (the result is still raw bytes that need to be decoded).
Once you have this working, you need to extract the temperature. There are (at least) two approaches. One is to use the string method find, like in the example program linked above. You can, for instance, search for the string '"temp":' (including the double quotes) to find the start location of the temperature, and then find the location of the next comma to get the end location. Another approach is to utilize the structure of the data returned by the API. As described above, this data is formatted according to the JSON standard, and as you might expect, there is a Python module for parsing this representation (specifically check out the json.loads function). The latter is how you would typically do this “for real”.
Either approach is valid and permitted, but be sure that you can extract the temperature and store it in a variable.
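The find-based approach can be sketched as follows. The helper name extract_after and the sample string are hypothetical, chosen only for illustration; this is one possible shape, not a required design.

```python
def extract_after(text, key, end_marker=","):
    """Return the text between key and the next end_marker, or None if key is absent."""
    start = text.find(key)
    if start == -1:            # find returns -1 when the key is not present
        return None
    start += len(key)          # skip past the key itself
    end = text.find(end_marker, start)
    if end == -1:              # no marker after the key: take the rest
        end = len(text)
    return text[start:end]

# Made-up, abbreviated text standing in for the page contents
sample = '{"main":{"temp":49.25,"pressure":1016}}'
temperature = float(extract_after(sample, '"temp":'))
print(temperature)
```

This relies on a comma following the value. If the value were the last item in its group it would be followed by a closing brace instead, which is one reason the JSON approach is more robust.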
Once you can obtain the temperature, put this all together to write the get_temperature function. Recall that it will take a zip code as a parameter. That parameter will need to be inserted into a properly formatted URL (you might find the string format method helpful here). Note that the sample page will always return the same data (even if you change the zip code), but you still want to generate a properly formatted URL so that you can obtain the correct data in the future. Keep in mind that you control the URL string and so can structure that string to make it easier (and more concise) to incorporate the zip code into the URL.
Finally, write the part of the program that checks to see if this program is being run versus imported, checks the number of command line arguments and prints the usage if an incorrect number of arguments is provided (exactly as shown above). Finish up your program so that when you run it with the zip code command line argument it prints out the temperature. You should now be able to run your program from the command line with a zip code and it will give you a temperature (pretty cool!). With the test URL, it should always give you 49.25; however, it will just be a small change to have it do the real thing. We’ll get to that soon…
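The run-versus-import check and the argument count check might be organized roughly like this. The main function and its placeholder return value are hypothetical; in your program the placeholder would be replaced by a call to get_temperature.

```python
import sys

def main(argv):
    """Return the text to print for an argument list (argv[0] is the script name)."""
    if len(argv) != 2:                      # script name plus exactly one argument
        return "usage: python3 pa8_weather.py <zip_code>"
    zip_code = argv[1]                      # stays a string, e.g. "05753"
    return "would look up " + zip_code      # placeholder for the real lookup

# True only when the file is run directly, not when it is imported
if __name__ == "__main__":
    print(main(sys.argv))
```

Passing the argument list into a function, rather than reading sys.argv inside it, makes the logic easy to try out from the shell with lists of different lengths.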
Part 2: Aggregating the weather
We now have a program that we can run to get the temperature, and a module that we can import in order to call the get_temperature function to get the current temperature for a zip code. For the second part of this assignment, we’re going to write another program (i.e., in a different “.py” file) that can be run regularly to build up a file of temperature data over time.
Your program will be run with two command-line arguments, the name of a file and a zip code. The file will contain multiple entries collected over time. Each line in the file will consist of a date, an hour of the day (in 24 hour time) and the temperature at that hour separated by commas (termed a CSV file). For example, here is a short snippet of an example file:
11-1-2018,13,49.23
11-1-2018,14,52.12
11-1-2018,15,52.45
11-1-2018,16,50.71
11-1-2018,17,50.85
11-1-2018,18,51.04
Each time you run the program it will add at most one line to this file. So the file above would have been generated with at least six calls to the program (over 6 different hours). We’re setting the problem up this way since it is generally straightforward to get a program to run at some fixed interval. You won’t be doing that for this assignment, but I’m happy to talk to you offline about how that would work.
As with the first part, I’ve broken the description of this program into two parts, the specification and the guide.
Specifications
Write a program called pa8_aggregator.py that has the following characteristics:
Importing the module only defines functions and variables (no functions are invoked, nothing is printed in the shell).
Your program should be able to be run from the command-line and take two arguments, the first a filename and the second a zip code. For example, within Thonny:
>>> %Run pa8_aggregator.py temps_05753.txt 05753
>>>
If the program is run with an incorrect number of command-line arguments it should print out the usage. For example, within Thonny:
>>> %Run pa8_aggregator.py
usage: python3 pa8_aggregator.py <file> <zip_code>
If the program is run with the correct arguments:
- Your program should work if the file doesn’t yet exist. In that case, there can’t possibly be an entry for the current date and time and so your program should create the file and write an entry with the correct information and formatting (i.e., comma-separated date, hour, and temperature). The date and hour should be formatted as described below.
- If the file exists, the program should first check to make sure that there isn’t already an entry in the file for the current date and hour. If there is, the program should do nothing. This means that running the program repeatedly within the same hour will not alter the file after the first time, when the current temperature is added for the current hour. Furthermore, if an entry already exists in the file for the current date and hour, the get_temperature function will not get called (doing so would be computationally inefficient).
- If the file exists, but there is not an entry in the file for the current date and hour, the program should use the pa8_weather module to get the current temperature for the zip code specified as a command line argument and add an entry to the file at the end with the appropriate formatting. Your program should only invoke get_temperature if it is going to write an entry to the file (to avoid slowing your program down with unneeded queries to the API). It is possible for the file to exist, but be empty, in which case there can’t be an entry for the current date and hour and your program should write an entry.
Make sure that you do not “hard code” any filenames or directories in your program, that is, write a particular file name or directory on your computer into the code. The Gradescope tests will fail if your program ignores (or changes) the filenames they provide as arguments.
The date should be formatted as “M-D-YYYY” and the hour in “24 hour” time, e.g., 4 PM is 16. Depending on the month, day and hour, each might be one or two digits. You can generate correctly formatted time and date strings using the datetime module and the functions below (you are welcome and encouraged to copy this code into your program).
import datetime

def get_hour():
    """Return the current hour as a string in 24 hour time"""
    now = datetime.datetime.now()
    return str(now.hour)

def get_date():
    """Return the current date as a string with [M]M-[D]D-YYYY"""
    now = datetime.datetime.now()
    return str(now.month) + "-" + str(now.day) + "-" + str(now.year)
Guide
Here is one approach to implementing this program:
- Write the part of the program that checks to see if this program is being run vs. imported, checks the number of program parameters and prints the usage accordingly.
- Write a function to check whether the file exists and has an entry for the current date and hour. To check whether a file exists you can use the exists function within the os.path module (which returns True if the file specified by the string argument exists). For testing purposes, it may be useful to create a version of the aggregate file manually. You can do so with Thonny.
- Finally, use that function to check to see if an entry should be written to the file and if so use your pa8_weather module to get the temperature and append it to the end of the file. When writing this file, you can either rewrite the entire file from scratch each time (in which case you’d open the file with “w”) or instead just append the one new entry (in which case you’d open the file with “a”). In either case you will use the write method on the file object to write a string to the file. Opening a file in append mode (with the “a” argument) will create the file if it doesn’t exist.
- Add any finishing touches to the program to make sure it runs appropriately. Note that when you run your program you won’t see any output, but the data file you provided as a command line argument may have been updated.
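To see append mode in action, here is a small sketch that writes two entries to a file in a temporary directory, so no real data file is touched. The function name append_entry and the file name temps_demo.txt are hypothetical.

```python
import os
import tempfile

def append_entry(filename, date, hour, temperature):
    """Append one comma-separated entry; "a" mode creates the file if needed."""
    with open(filename, "a") as data_file:
        data_file.write(date + "," + hour + "," + str(temperature) + "\n")

# Demonstration in a temporary directory that is deleted afterwards
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "temps_demo.txt")
    existed_before = os.path.exists(path)          # nothing has been written yet
    append_entry(path, "11-1-2018", "13", 49.23)   # creates the file
    append_entry(path, "11-1-2018", "14", 52.12)   # adds to the end
    with open(path) as data_file:
        lines = data_file.readlines()

print(existed_before, lines)
```

The first call creates the file and the second adds a line after the existing one, which is the behavior the aggregator relies on.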
The Real Deal
So far, all of your testing should have been done with the departmental web server using the URL above, always giving you the same temperature. When you’re confident that you have everything working you can go back and change your pa8_weather module to use the real web page. For a given zip code, the URL should look as follows:
http://api.openweathermap.org/data/2.5/weather?zip=05753,us&APPID=9838b264525602b46f0b2ef8c191eef8&units=imperial
Note that the URL has several “query parameters” separated with ampersands. For instance, we specify the zip code via zip=05753,us. At the end we request imperial units, i.e., Fahrenheit, since by default we get Kelvin which is not as useful. What about the APPID variable? This API asks you to create an account, which controls the number of requests you are allowed to make. A free account gives you up to 60 requests per minute. I encourage you to create an account, which will give you your own unique APPID to use in the URL. The current value is Prof. Linderman’s key.
In any case, to use the actual API you need to use a URL like the one above, but with the correct zip code substituted.
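As a minimal sketch of that substitution, the string format method can fill the zip code (and a key) into a template. The names URL_TEMPLATE and build_url are hypothetical, and "YOUR_APPID" is a placeholder rather than a working key. Nothing here contacts the server.

```python
# Template with {} placeholders where the zip code and key belong
URL_TEMPLATE = ("http://api.openweathermap.org/data/2.5/weather"
                "?zip={},us&APPID={}&units=imperial")

def build_url(zip_code, app_id):
    """Return the request URL for a zip code given as a string."""
    return URL_TEMPLATE.format(zip_code, app_id)

url = build_url("05753", "YOUR_APPID")   # placeholder key, not a real one
print(url)
```

Keeping the template in a constant means switching between the test page and the real API is a one-line change.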
Change your get_temperature function in the pa8_weather module to generate an appropriate URL based on the zip code passed in and then use this URL to get the temperature. You should now be able to query the current weather based on the zip code entered:
>>> %Run pa8_weather.py 05753
36.56
>>> %Run pa8_weather.py 80424
13.65
>>> %Run pa8_weather.py 33111
78.32
Again, please try not to run this program too many times (unless you created your own API account), but do play with it some. You should be able to run your pa8_weather.py program with a zip code and it will give you the current temperature and your pa8_aggregator.py should now aggregate the real values.
Creativity Points
Here are some possible creativity additions, although you are encouraged to include your own ideas. Make sure to document your additions in the docstring comment at the top of the file.
[0.5 points] Check to make sure that the user enters a valid zip code (i.e., 5 digits).
[1 point] Also include the zip code in the aggregated file and only add data to the file if an entry for that date, hour and zip code does not already exist in the file. If you add additional information to each line in the file, make sure the beginning of the line remains as specified above, i.e. date, then hour, then temperature (as that is what Gradescope will be checking).
[1 point] Extend your pa8_weather module with functions to extract other information from the API.
If you extract other data from the API, do so by creating new functions similar to get_temperature. Even better style is to create a general function, e.g. get_field, that takes a zip code and field name as parameters and returns that field. For example, get_temperature could then call that general function with '"temp":' as the field name. However you go about it, make sure get_temperature continues to satisfy the specification above, that is, it will execute successfully with a single argument, the zip code, and return the temperature as a float.
When you’re done
Make sure that your program is properly documented:
- You should have a docstring at the very beginning of the file briefly describing your program and stating your name, section and creativity additions.
- Each function should have an appropriate docstring (including arguments and return value if applicable).
- Other miscellaneous inline/block comments if the code might otherwise be unclear.
Remember that modules need docstrings too! Make sure you have a docstring at the top of your file that starts with a meaningful one-sentence description of the functionality in that module. That is, the top of your file should now look like:
"""
A brief description of my module...
CS146 Programming Assignment 8
Name: Michael Linderman
Section:
Creativity:
"""
In addition, make sure that you’ve used good code design and style (including helper functions where useful, meaningful variable names, constants where relevant, vertical white space, removing “dead code” that doesn’t do anything, removing testing code, etc.).
Submit your programs via Gradescope. Your files must be named pa8_weather.py and pa8_aggregator.py, and you must submit both files at the same time. You can submit multiple times, with only the most recent submission (before the due date) graded. Note that the tests performed by Gradescope are limited. Passing all of the visible tests does not guarantee that your submission correctly satisfies all of the requirements of the assignment.
Grading
Assessment | Requirements |
---|---|
Revision needed | Some but not all tests are passing. |
Meets Expectations | All tests pass, the required functions are implemented correctly and your implementation uses satisfactory style. |
Exemplary | All requirements for Meets Expectations, 2 creativity points, and your implementation is clear, concise, readily understood, and maintainable. |
FAQ
Ensure you use the correct file format
The assignment specifies (and Gradescope tests for) a specific file format. Per the specification, the date, hour and temperature should be separated by commas (not spaces or other characters). Note, we use commas to make it easy for other tools or libraries, like datascience, to read in our data file. A compliant write would look like file.write("10-1-2018,12,26.2\n").
Note the newline ("\n") at the end. Unlike print, write doesn’t automatically include a newline. We want to put the newline at the end instead of the beginning of the line. Including a newline at the beginning of the line will create a blank line at the beginning of the file.
With the use of optional arguments, we can also use print to write to a file. If we check out the print documentation we see it takes an optional file argument. We can provide the object returned by open to that argument to print to a file (including the newline at the end) just as we would have printed it to the screen.
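Here is a brief sketch of print with the file argument. To keep the example self-contained it writes into an io.StringIO object, which stands in for the object returned by open; the variable names are only illustrative.

```python
import io

buffer = io.StringIO()   # stands in for the object returned by open(...)

# print adds the trailing newline for us
print("10-1-2018,12,26.2", file=buffer)

# sep="," joins separate values with commas instead of spaces
print("10-1-2018", 13, 27.0, sep=",", file=buffer)

written = buffer.getvalue()
print(repr(written))
```

The sep argument is worth noticing: without it, print would separate the values with spaces, which would not match the required format.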
How can I accurately check for the date and hour?
In the past, many of the errors in the assignment originated in the code to check the file for existing entries. To check if there is an entry with the current date and time, we need to check if any line in the file contains both the current date and the current time. However, just using the in operator to check for the presence of the date and hour in the string can fail some of the time. Instead we want to match the entire date and hour string at one time.
Other past errors include incorporating the temperature as part of the matching criteria. Since the temperature changes it can’t be part of matching the line. The matching and the writing need to be in sync; i.e., if you are matching the zip code in the fourth field make sure your program is actually writing it to the fourth field. Approaches that reference explicit indices in a string are very fragile. What if the day is a single digit instead of two?
For example, if we have the following line in our data file as the variable line:
11-7-2018,9,33.4
the expression "11-7-2018" in line and "11" in line would evaluate to True even though the hour, “11”, doesn’t match. Because in scans the entire line, the “11” matches the month within the date. Instead what we want to do is match the entire date and hour string “11-7-2018,11”, i.e. "11-7-2018,11" in line, or even better, line.startswith("11-7-2018,11,") (the trailing comma keeps an hour of “1” from matching an entry for hour “11”). Alternately consider splitting the line into fields and comparing individual fields, e.g.,
fields = line.split(",")
if fields[0] == "11-7-2018" and fields[1] == "11":
    ...
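The difference between these checks is easy to try out. This sketch evaluates each one against the sample line; the variable names are only illustrative.

```python
line = "11-7-2018,9,33.4"
date, hour = "11-7-2018", "11"

# Separate "in" checks: the "11" is found in the month, so this matches wrongly
loose = date in line and hour in line

# Matching date and hour together, anchored at the start of the line
prefix = line.startswith(date + "," + hour + ",")

# Comparing whole fields after splitting on commas
fields = line.strip().split(",")
by_field = fields[0] == date and fields[1] == hour

# Why the trailing comma matters: hour "1" versus an entry for hour "11"
other = "11-7-2018,11,40.0"
without_comma = other.startswith("11-7-2018,1")
with_comma = other.startswith("11-7-2018,1,")

print(loose, prefix, by_field, without_comma, with_comma)
```

Only the loose check and the prefix without a trailing comma report a match that isn't really there.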
How to decide when to write to the file? Using truth tables for program logic.
We have primarily used truth tables in the context of our boolean operators (e.g., and), but they are also a useful tool when figuring out our program logic.
Consider the “aggregator” part of the assignment. How do we know when we want to write to the file? We can use a truth table to help us figure that out, and then implement the necessary logic. There are two “inputs”:
- Does the file exist?
- Is there an existing entry in the file for the current date and time?
Based on the lab specification, we can define the following truth table for this functionality:
File exists | Existing entry | Write new entry |
---|---|---|
False | N/A | True |
True | True | False |
True | False | True |
That truth table leads to the following pseudo code for this operation:
not file_exists(filename) or not entry_in_file(filename, date, hour)
As noted in the assignment guide, the exists function within the os.path module will return True if the file specified by its string argument exists (i.e., the “file_exists” operation above).
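One way the pseudo code could translate into Python is sketched below, with each row of the truth table tried against a file in a temporary directory. The body of entry_in_file and the name should_write are hypothetical illustrations, not the required implementation.

```python
import os
import tempfile

def entry_in_file(filename, date, hour):
    """Return True if some line starts with the given date and hour."""
    prefix = date + "," + hour + ","
    with open(filename) as data_file:
        for line in data_file:
            if line.startswith(prefix):
                return True
    return False

def should_write(filename, date, hour):
    """Apply the truth table: write unless the file already has the entry."""
    # "or" short-circuits, so a missing file is never opened for reading
    return not os.path.exists(filename) or not entry_in_file(filename, date, hour)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "temps_demo.txt")
    missing = should_write(path, "11-1-2018", "13")     # file does not exist
    with open(path, "w") as data_file:
        data_file.write("11-1-2018,13,49.23\n")
    same_hour = should_write(path, "11-1-2018", "13")   # entry already present
    next_hour = should_write(path, "11-1-2018", "14")   # no entry for this hour

print(missing, same_hour, next_hour)
```

The order of the two operands matters: because the existence check comes first, the “N/A” row of the table never tries to read a file that isn't there.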