An important component of many scientific applications is data collection and data analysis. For this lab, we’ll be looking at an example data collection application that collects weather data from the web and aggregates it into a data file. In addition, we’ll also make a useful program that takes a zip code as a command-line parameter and gives you the current temperature for that zip code.
As described in the prelab, you may work with a partner on this lab. If you do, you must both be there whenever you’re working on the lab. Only one of you should submit the assignment to Gradescope, but make sure both your names are in the comment at the top of the file and you add your partner to your Gradescope submission (as described at the end of the assignment).
An important disclaimer:
When writing a program like this that contacts an external server, you need to be thoughtful about how you use that external resource. Many commercial services will have request limits. If someone is offering a service as a courtesy, we want to be respectful of that resource.
For testing purposes, I have put up a version of the web page you will be extracting the temperature from on our department web server. You should use this test web page until you have your program working. Even when you have your program working and change over to the external web address, please avoid making too many repeated calls.
For the first part of this lab, write a program that reads the current weather from the web for a zip code entered by the user. I’ve broken the description of this program into two parts: the specification of what is required, and my suggestion about how to proceed on the implementation. Make sure to read both sections before starting!
Write a program called lab7_weather.py that has the following characteristics:
Your program should be able to be run from the command-line and take a single argument, which is the zip code:
If your program is run with too few or too many arguments, it should print out the usage:
>>> %Run lab7_weather.py
usage: python3 lab7_weather.py <zip_code>
If your program is run with the correct number of arguments (one), you should treat the argument as a zip code (you can assume it is a valid zip code), and the program should print out the current temperature at that zip code:
>>> %Run lab7_weather.py 05753
39.71
It should also contain a function called get_temperature that takes a zip code as a string parameter (think about why a zip code might be better represented as a string than an integer) and returns the temperature at that zip code as a float.
We will use an API to obtain weather data. API stands for “Application Programming Interface” and it means that a service (such as a weather data server on the web) provides a protocol specifically designed to be used by programs, rather than by humans.
In particular, for this lab, we will use the API by OpenWeatherMap. If you follow the link for “current weather data” and then scroll down to “by ZIP code” you will see that you can use a URL like
to get the weather conditions for a given zip code. Notice zip=05753 in the URL specifying the zip code. To obtain the weather for a different zip code you would change that portion of the URL, e.g., to zip=20015 to get the weather for 20015 (Washington D.C.). Note that the whole URL is required, including the APPID portion (which is my API key); see below for more explanation of API keys and how to get your own. Here is a sample page I retrieved for Middlebury via the API:
For now, your program should only use this sample page (on CS department servers). If you follow this link you’ll see it’s a text encoding for the weather for Middlebury, with a current temperature, indicated by the “temp” key, of 49.25. Your job is to write a Python module that extracts just the temperature from this data.
Note: You may get the following error, or something similar, when using the test URL
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)>
If that happens, add the following statements to the top of your program:
# Python is rejecting the certificate used for the CS dept. server so we bypass some of those checks
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
For context, this data is provided by the API as JSON. JSON (which stands for JavaScript Object Notation) is one of the most common data interchange formats, that is, specifications for communicating precisely formatted data between different programming languages (or computer systems). In our example, the weather website provides a JSON representation of the weather that can be sent as a string, and then parsed (or understood) by many different programming languages as dictionaries, lists, numbers, etc.
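As a small illustration of that parsing step, the standard json module turns JSON text into ordinary Python objects. The string below is a hypothetical, heavily trimmed stand-in for the sample page (with the temperature nested under a "main" object, as in OpenWeatherMap's current-weather responses); check the real sample page for its exact layout.

```python
import json

# Hypothetical, trimmed JSON text; the real response has many more fields
text = '{"name": "Middlebury", "main": {"temp": 49.25}, "weather": [{"main": "Clouds"}]}'

data = json.loads(text)  # JSON text -> nested Python objects

print(type(data))             # <class 'dict'>: a JSON object becomes a dict
print(type(data["weather"]))  # <class 'list'>: a JSON array becomes a list
print(data["main"]["temp"])   # 49.25: already a float, no conversion needed
```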
You may implement this module however you like as long as it meets the specifications above; however, here is one suggested approach to implementing it:
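Read the whole page into a single value with contents = webpage.read() instead of looping with for line in webpage:. Keep in mind that read() returns bytes, so call decode() on the result to get a string.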
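A minimal sketch of reading a page in one call, assuming webpage was opened with urlopen from the standard urllib.request module (the lab text above does not show how it was opened). fetch_page is a hypothetical helper name, not part of the required interface.

```python
import urllib.request

def fetch_page(url):
    """Return the full contents of the page at url as a string."""
    # The with statement closes the connection even if an error occurs
    with urllib.request.urlopen(url) as webpage:
        contents = webpage.read()  # the entire body, as bytes, in one call
    return contents.decode("utf-8")  # bytes -> str
```

Calling fetch_page with the department test-page address should then give you the sample JSON as one string, ready for find or json.loads.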
Once you have this working, you need to extract the temperature. There are (at least) two approaches. One is to use the string method find, like the example program for extracting email addresses we discussed in class. You can, for instance, search for the string '"temp":' (including the double quotes) to find the start location of the temperature, and then find the location of the next comma to get the end location. Another approach is to utilize the structure of the data returned by the API. As described above, this data is formatted according to the JSON standard, and as you might expect, there is a Python module for parsing this representation (specifically, check out the json.loads function). The latter is how you would typically do this “for real”.
Either approach is valid and permitted, but be sure that you can extract the temperature and store it in a variable.
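Both approaches are sketched below on a hypothetical, trimmed string shaped like the sample page. The function names are illustrative only, and the json version assumes the temperature is nested under a "main" object, as in OpenWeatherMap's current-weather responses.

```python
import json

def temperature_with_find(contents):
    """Extract the temperature by searching the text (the find approach)."""
    key = '"temp":'
    index = contents.find(key)
    if index == -1:  # find returns -1 when the text is missing
        raise ValueError("no temperature in page")
    start = index + len(key)         # first character of the number
    end = contents.find(",", start)  # the next comma ends the value
    return float(contents[start:end])

def temperature_with_json(contents):
    """Extract the temperature by parsing the JSON (the json.loads approach)."""
    data = json.loads(contents)
    # Assumes the temperature lives at data["main"]["temp"]
    return float(data["main"]["temp"])

sample = '{"name":"Middlebury","main":{"temp":49.25,"feels_like":46.4}}'
print(temperature_with_find(sample))  # 49.25
print(temperature_with_json(sample))  # 49.25
```

The find version depends on the exact text (for example, on a comma following the value), while the json version depends only on the structure of the data, which is one reason the latter is the usual choice in practice.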
Once you can obtain the temperature, put this all together to write the get_temperature function. Recall that it will take a zip code as a parameter. That parameter will need to be inserted into a properly formatted URL (you might find the string format method helpful here). Note that the sample page will always return the same data (even if you change the zip code), but you still want to generate a properly formatted URL so that you can obtain the correct data in the future. Keep in mind that you control the URL string and so can structure that string to make it easier (and more concise) to incorporate the zip code into the URL.
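Putting the pieces together, here is one possible shape for lab7_weather.py, sketched under assumptions: it uses the json.loads approach, build_url and main are hypothetical helper names, and URL_TEMPLATE is the real API URL given later in this lab with {} in place of the zip code. Until your program works, point URL_TEMPLATE at the department test page instead, to avoid unnecessary calls to the API.

```python
import json
import sys
import urllib.request

# Assumption: the real API URL from later in this lab, with {} for the zip code.
# While testing, replace this with the department test-page address.
URL_TEMPLATE = ("http://api.openweathermap.org/data/2.5/weather"
                "?zip={},us&APPID=9838b264525602b46f0b2ef8c191eef8&units=imperial")

USAGE = "usage: python3 lab7_weather.py <zip_code>"

def build_url(zip_code):
    """Return the API URL for zip_code (a string such as "05753")."""
    return URL_TEMPLATE.format(zip_code)

def get_temperature(zip_code):
    """Return the current temperature at zip_code as a float."""
    with urllib.request.urlopen(build_url(zip_code)) as webpage:
        contents = webpage.read().decode("utf-8")
    return float(json.loads(contents)["main"]["temp"])

def main(argv):
    """Print the temperature for the one zip code in argv, or the usage."""
    if len(argv) != 2:  # argv[0] is the script name itself
        print(USAGE)
    else:
        print(get_temperature(argv[1]))

# Runs only when executed as a program, not when imported by another module
if __name__ == "__main__":
    main(sys.argv)
```

The if __name__ == "__main__": check is what lets the same file be run as a program (printing the usage or a temperature) and also be imported by lab7_aggregator.py without running main.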
We now have a program that we can run to get the temperature, and a module that we could import to call the get_temperature function and get the current temperature for a zip code. For the second part of this lab, we’re going to write another program (i.e., in a different “.py” file) that can be run regularly to build up a file of aggregated temperature data over time.
Your program will be run with two command-line arguments, the name of a file and a zip code. The file will contain multiple entries collected over time. Each line in the file will consist of a date, an hour of the day (in 24 hour time) and the temperature at that hour separated by commas (termed a CSV file). For example, here is a short snippet of an example file:
11-1-2018,13,49.23
11-1-2018,14,52.12
11-1-2018,15,52.45
11-1-2018,16,50.71
11-1-2018,17,50.85
11-1-2018,18,51.04
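One way to produce a line in exactly this format is sketched below, assuming the date and hour come from a datetime object such as datetime.now(). The example file writes the day without a leading zero (11-1-2018), so the sketch builds the date from the numeric fields rather than using strftime("%m-%d-%Y"), which would give 11-01-2018; that single-digit months are also written without padding is an assumption. format_entry is a hypothetical helper name.

```python
from datetime import datetime

def format_entry(when, temperature):
    """Return one CSV line: date (month-day-year), 24-hour hour, temperature."""
    date = "{}-{}-{}".format(when.month, when.day, when.year)  # e.g. 11-1-2018
    return "{},{},{}\n".format(date, when.hour, temperature)   # hour is 0-23

print(format_entry(datetime(2018, 11, 1, 13, 42), 49.23), end="")  # 11-1-2018,13,49.23
```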
Each time you run the program it will add at most one line to this file. So the file above would have been generated with at least six calls to the program (over 6 different hours). We’re setting the problem up this way since it is generally straightforward to get a program to run at some fixed interval. You won’t be doing that for this lab, but I’m happy to talk to you offline about how that would work.
As with the first part, I’ve broken the description of this program into two parts, the specification and the guide.
Write a program called lab7_aggregator.py that has the following characteristics:
Your program should be able to be run from the command-line and take two arguments, the first a filename and the second a zip code. For example, within Thonny:
>>> %Run lab7_aggregator.py temps_05753.txt 05753
>>>
If the program is run with an incorrect number of command-line arguments it should print out the usage:
>>> %Run lab7_aggregator.py
usage: python3 lab7_aggregator.py <file> <zip_code>
If the file already contains an entry for the current date and hour, the program should not add another entry, and the get_temperature function will not get called (doing so would be computationally inefficient).
Otherwise, it should use the get_temperature function from your lab7_weather module to get the current temperature for the zip code specified as a command-line argument and add an entry to the end of the file with the appropriate formatting. Your program should only invoke get_temperature if it is going to write an entry to the file (to avoid slowing your program down with unneeded queries to the API). It is possible for the file to exist but be empty, in which case there can’t be an entry for the current date and hour and your program should write an entry.
Make sure that you do not “hard code” any filenames or directories in your program, that is, specify a particular file name or directory on your computer. The Gradescope tests will fail if your program ignores (or changes) the filenames it provides as arguments.
Here is one approach to implementing this program:
Check whether the file exists using the exists function within the os.path module (which returns True if the file specified by the string argument exists). For testing purposes, it may be useful to create a version of the aggregate file manually. You can do so with Thonny.
If the file does not already contain an entry for the current date and hour, use your lab7_weather module to get the temperature and append it to the end of the file. When writing this file, you can either rewrite the entire file from scratch each time (in which case you’d open the file with “w”) or instead just append the one new entry (in which case you’d open the file with “a”). In either case you will use the write method on the file object to write a string to the file.
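The steps above can be sketched as follows. This is one possible structure, not the required one: has_entry, record_temperature and main are hypothetical names, and get_temperature is passed in as a parameter so the sketch runs on its own without contacting the API. In your actual lab7_aggregator.py you would import the real function from lab7_weather instead.

```python
import os.path
from datetime import datetime

USAGE = "usage: python3 lab7_aggregator.py <file> <zip_code>"

def has_entry(filename, date, hour):
    """Return True if filename already has a line for this date and hour."""
    if not os.path.exists(filename):
        return False  # no file yet, so no entries yet
    prefix = "{},{},".format(date, hour)  # match date AND hour together
    with open(filename) as file:
        for line in file:
            if line.startswith(prefix):
                return True
    return False  # also covers a file that exists but is empty

def record_temperature(filename, zip_code, when, get_temperature):
    """Append an entry for the date and hour of when, unless one exists.

    Returns True if a line was written. get_temperature is called only
    when a new line is actually needed.
    """
    date = "{}-{}-{}".format(when.month, when.day, when.year)
    if has_entry(filename, date, when.hour):
        return False
    temperature = get_temperature(zip_code)
    with open(filename, "a") as file:  # "a" appends, creating the file if missing
        file.write("{},{},{}\n".format(date, when.hour, temperature))
    return True

def main(argv, get_temperature):
    """Check the arguments, then record the temperature for the current hour."""
    if len(argv) != 3:  # script name, file name, zip code
        print(USAGE)
    else:
        record_temperature(argv[1], argv[2], datetime.now(), get_temperature)

# In lab7_aggregator.py itself, import the real function at the top
# (from lab7_weather import get_temperature) and end the file with:
#     if __name__ == "__main__":
#         main(sys.argv, get_temperature)
```

Because the file name and zip code come straight from argv, nothing is hard coded, and the early return in record_temperature keeps the API from being queried when the hour is already recorded.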
Opening a file in append mode (with the “a” argument) will create the file if it doesn’t exist.
So far, all of your testing should have been done with the departmental web server using the URL above, always giving you the same temperature. When you’re confident that you have everything working, you can go back and change your lab7_weather module to use the real web page. For a given zip code, the URL should look as follows:
http://api.openweathermap.org/data/2.5/weather?zip=05753,us&APPID=9838b264525602b46f0b2ef8c191eef8&units=imperial
Note that the URL has several “query parameters” separated with ampersands. For instance, we specify the zip code via zip=05753,us. At the end we request imperial units, i.e., Fahrenheit, since by default we get Kelvin, which is not as useful. What about the APPID variable? This API asks you to create an account, which controls the number of requests you are allowed to make. A free account gives you up to 60 requests per minute. I encourage you to create an account, which will give you your own unique APPID to use in the URL. The current value is Prof. Linderman’s key.
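To see the individual query parameters in the URL above, the standard urllib.parse module can split it apart. This is only an illustration of the URL's structure; you do not need it for the lab.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://api.openweathermap.org/data/2.5/weather"
       "?zip=05753,us&APPID=9838b264525602b46f0b2ef8c191eef8&units=imperial")

query = urlsplit(url).query  # everything after the "?"
params = parse_qs(query)     # split on "&" and "=" into a dict of lists

for name, values in params.items():
    print(name, "=", values[0])
# zip = 05753,us
# APPID = 9838b264525602b46f0b2ef8c191eef8
# units = imperial
```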
In any case, to use the actual API you need to use a URL like the one above, but with the correct zip code substituted.
Change your get_temperature function in the lab7_weather module to generate an appropriate URL based on the zip code passed in and then use this URL to get the temperature. You should now be able to query the current weather based on the zip code entered:
>>> %Run lab7_weather.py 05753
36.56
>>> %Run lab7_weather.py 80424
13.65
>>> %Run lab7_weather.py 33111
78.32
Again, please try not to run this program too many times (unless you created your own API account), but do play with it some. You should be able to run your lab7_weather.py program with a zip code and it will give you the current temperature, and your lab7_aggregator.py should now aggregate the real values.
You may earn up to 2 creativity points on this assignment. Below are some ideas, but you may incorporate your own if you’d like. Make sure to document your additions in the comment at the top of the files.
[1 point] Extend your lab7_weather module with functions to extract other information from the API.
If you extract other data from the API, do so by creating new functions similar to get_temperature. Even better style is to create a general function, e.g., get_field, that takes a zip code and field name as parameters and returns that field. For example, get_temperature could then call that general function with '"temp":' as the field name. However you go about it, make sure get_temperature continues to satisfy the specification above, that is, it executes successfully with a single argument, the zip code, and returns the temperature as a float.
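One way the general function could look is sketched below, using the find approach. get_field follows the name suggested above; extract_field and URL_TEMPLATE are hypothetical names, and the template is assumed to be the real API URL given earlier in this lab. Splitting the text processing out of get_field lets you test it on a saved copy of the page without contacting the API.

```python
import urllib.request

# Assumption: the real API URL from earlier in this lab, with {} for the zip code
URL_TEMPLATE = ("http://api.openweathermap.org/data/2.5/weather"
                "?zip={},us&APPID=9838b264525602b46f0b2ef8c191eef8&units=imperial")

def extract_field(contents, field_name):
    """Return the raw text of the value that follows field_name in contents."""
    start = contents.find(field_name)
    if start == -1:
        raise ValueError("field not found: " + field_name)
    start += len(field_name)
    # A value ends at the next comma, or at "}" if it is last in its object
    end = start
    while end < len(contents) and contents[end] not in ",}":
        end += 1
    return contents[start:end].strip()

def get_field(zip_code, field_name):
    """Fetch the weather for zip_code and return field_name's value as text."""
    with urllib.request.urlopen(URL_TEMPLATE.format(zip_code)) as webpage:
        contents = webpage.read().decode("utf-8")
    return extract_field(contents, field_name)

def get_temperature(zip_code):
    """Same specification as before: one argument, returns a float."""
    return float(get_field(zip_code, '"temp":'))
```

Because this scans text rather than parsing JSON, string values come back with their quotes (for example, '"name":' yields "Middlebury" including the quote characters) and a comma inside a string value would cut it short; a json.loads-based version avoids both.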
Make sure that your program is properly commented:
Remember that modules need docstrings too! Make sure you have a docstring at the top of your file that starts with a meaningful one-sentence description of the functionality in that module. That is, the top of your file should now look like:
"""
A brief description of my module...
CS150 Lab 7
Name: Michael Linderman
Section:
Creativity:
"""
In addition, make sure that you’ve used good coding style (including meaningful variable names, constants where relevant, vertical white space, etc.).
Submit your programs via Gradescope. Your files must be named lab7_weather.py and lab7_aggregator.py, and you must submit both files at the same time. You can submit multiple times, with only the most recent submission (before the due date) graded. Note that the tests performed by Gradescope are limited. Passing all of the visible tests does not guarantee that your submission correctly satisfies all of the requirements of the assignment.
If you worked with a partner, only one person needs to submit to Gradescope, but that person does need to add their partner’s name as shown in Gradescope documentation. Make sure both names are included in the comment at the top of the files.
Feature | Points |
---|---|
lab7_weather.py | |
run vs. import | 1 |
prints usage with incorrect number of arguments | 2 |
runs correctly with zip code entered | 2 |
get temperature | 5 |
lab7_aggregator.py | |
run vs. import | 1 |
prints usage with incorrect number of arguments | 2 |
date and hour formatted correctly in file | 1 |
appends temp to end of file | 3 |
doesn’t add repeated data | 3 |
Code design and style | 5 |
Creativity points | 2 |
Total | 27 |
In the past, many of the errors in the lab originated in the code that checks the file for existing entries. To check whether there is an entry for the current date and hour, we need to check whether any line in the file contains both the current date and the current hour. However, using the in operator to check separately for the date and the hour in a line can fail some of the time: for example, hour 1 appears to be present in the line 11-1-2018,13,49.23 simply because that line contains the character 1. Instead, we want to match the entire date and hour string at one time.
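The check can be sketched in code using a line from the example file; the variable names are illustrative. Matching "date,hour," as one string at the start of the line avoids both kinds of false match shown here.

```python
line = "11-1-2018,13,49.23"  # an entry for 1 PM on 11-1-2018
date, hour = "11-1-2018", 1  # we are checking for 1 AM on the same day

# Checking the date and hour separately with "in" gives a false match,
# because "1" occurs inside "11-1-2018" (and inside "13")
print(date in line and str(hour) in line)  # True

# Dates have the same problem: "1-1-2018" occurs inside "11-1-2018"
print("1-1-2018" in line)  # True

# Matching the date and hour together at the start of the line
print(line.startswith("{},{},".format(date, hour)))  # False
print(line.startswith("{},{},".format(date, 13)))    # True
```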
We have primarily used truth tables in the context of our boolean operators (e.g., and), but they are also a useful tool when figuring out our program logic.
The lab specifies (and Gradescope tests for) a specific file format. Per the specification, the date, hour, and temperature should be separated by commas (not spaces or other characters). We use commas to make it easy for other tools or libraries, like datascience, to read in our data file. A compliant write would look like file.write("10-1-2018,12,26.2\n").