## Lab 7: Weather Report

Due: 08:00:00AM on 2022-11-11


An important component of many scientific applications is data collection and data analysis. For this lab, we’ll be looking at an example data collection application that collects weather data from the web and aggregates it into a data file. We’ll also write a useful program that takes a zip code as a command-line parameter and reports the current temperature for that zip code.

As described in the prelab, you may work with a partner on this lab. If you do, you must both be there whenever you’re working on the lab. Only one of you should submit the assignment to Gradescope, but make sure both your names are in the comment at the top of the file and you add your partner to your Gradescope submission (as described at the end of the assignment).

An important disclaimer:

When writing a program like this that contacts an external server, you need to be thoughtful about how you use that external resource. Many commercial services will have request limits. If someone is offering a service as a courtesy, we want to be respectful of that resource.

For testing purposes, I have put up a version of the web page you will be extracting the temperature from on our department web server. You should use this test web page until you have your program working. Even when you have your program working and change over to the external web address, please avoid making too many repeated calls.

### Part 1: Getting the weather

For the first part of this lab, write a program that reads the current weather from the web for a zip code entered by the user. I’ve broken the description of this program into two parts: the specification of what is required, and my suggestion about how to proceed on the implementation. Make sure to read both sections before starting!

#### Specifications

Write a program called lab7_weather.py that has the following characteristics:

1. Importing the module only defines functions and variables, i.e. on import your program should not query the weather (or invoke any functions or print anything to the shell).
2. Your program should be able to be run from the command-line and take a single argument, which is the zip code:

1. If your program is run with too few or too many arguments, it should print out the usage:

>>> %Run lab7_weather.py
usage: python3 lab7_weather.py <zip_code>

2. If your program is run with the correct number of arguments (one), you should treat it as a zip code (you can assume it is a valid zip code) and the program should print out the current temperature at that zip code:

>>> %Run lab7_weather.py 05753
39.71

3. Your module must contain a function named get_temperature that takes a zip code as a string parameter (think about why a zip code might be better represented as a string than an integer) and returns the temperature at that zip code as a float.

#### Guide

We will use an API to obtain weather data. API stands for “Application Programming Interface”, which means that a service (such as a weather data server on the web) provides a protocol specifically designed to be used by programs rather than by humans.

In particular, for this lab, we will use the API by OpenWeatherMap. If you follow the link for “current weather data” and then scroll down to “by ZIP code” you will see that you can use a URL like

http://api.openweathermap.org/data/2.5/weather?zip=05753,us&APPID=9838b264525602b46f0b2ef8c191eef8&units=imperial

to get the weather conditions for a given zip code. Notice zip=05753 in the URL specifying the zip code. To obtain the weather for a different zip code you would change that portion of the URL, e.g., to zip=20015 to get the weather for 20015 (Washington D.C.). Note that the whole URL is required, including the APPID portion (which is my API key); see below for more explanation of API keys and how to get your own. Here is a sample page I retrieved for Middlebury via the API:

http://www.cs.middlebury.edu/~mlinderman/courses/cs150/f22/labs/lab7-test-data.json?zip=05753,us&APPID=9838b264525602b46f0b2ef8c191eef8&units=imperial

For now, your program should only use this sample page (on CS department servers). If you follow this link you’ll see it’s a text encoding for the weather for Middlebury, with a current temperature, indicated by the “temp” key, of 49.25. Your job is to write a Python module that extracts just the temperature from this data.

Note: You may get the following error, or something similar, when using the test URL

urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)>


If that happens, add the following statements to the top of your program:

# Python is rejecting the certificate used for the CS dept. server so we bypass some of those checks
import ssl
ssl._create_default_https_context = ssl._create_unverified_context


For context, this data is provided by the API as JSON. JSON (which stands for JavaScript Object Notation) is one of the most common data interchange formats, that is, specifications for communicating precisely formatted data between different programming languages (or computer systems). In our example, the weather website provides a JSON representation of the weather that can be sent as a string and then parsed (or understood) by many different programming languages as dictionaries, lists, numbers, etc.
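As a small illustration of what parsing JSON looks like, the sketch below runs json.loads on a shortened, hand-written sample string. The sample is an assumption modeled on the test page, where the temperature appears under a "temp" key. The real page has many more fields, so look at the actual data to confirm its structure before relying on this layout.

```python
import json

# Shortened, hand-written sample (an assumption for illustration only);
# the real page contains many more fields.
sample = '{"name": "Middlebury", "main": {"temp": 49.25, "humidity": 71}}'

data = json.loads(sample)            # the JSON object becomes a Python dictionary
print(type(data))                    # <class 'dict'>
print(data["main"]["temp"])          # 49.25
print(type(data["main"]["temp"]))    # <class 'float'>
```

Notice that the nested JSON object became a nested dictionary, and the number was converted to a float without any extra work.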

You may implement this module however you like as long as it meets the specifications above. Here is one suggested approach to implementing it:

1. Write some code that opens and reads the web page above (on the CS department server). Note that all the information is contained on a single line. It’s still fine to use our standard approach of iterating over the lines; the loop will just have a single iteration. You can also read the entire file at once using contents = webpage.read() instead of for line in webpage:.
2. Once you have this working, you need to extract the temperature. There are (at least) two approaches. One is to use the string method find, like the example program for extracting email addresses we discussed in class. You can, for instance, search for the string '"temp":' (including the double quotes) to find the start location of the temperature, and then find the location of the next comma to get the end location. Another approach is to utilize the structure of the data returned by the API. As described above, this data is formatted according to the JSON standard, and as you might expect, there is a Python module for parsing this representation (specifically check out the json.loads function). The latter is how you would typically do this “for real”.

Either approach is valid and permitted, but be sure that you can extract the temperature and store it in a variable.

3. Once you can obtain the temperature, put this all together to write the get_temperature function. Recall that it will take a zip code as a parameter. That parameter will need to be inserted into a properly formatted URL (you might find the string format method helpful here). Note that the sample page will always return the same data (even if you change the zip code), but you still want to generate a properly formatted URL so that you can obtain the correct data in the future. Keep in mind that you control the URL string and so can structure that string to make it easier (and more concise) to incorporate the zip code into the URL.

4. Finally, write the part of the program that checks to see if this program is being run versus imported, checks the number of command line arguments and prints the usage if an incorrect number of arguments is provided (exactly as shown above). Finish up your program so that when you run it with the zip code command line argument it prints out the temperature. You should now be able to run your program from the command line with a zip code and it will give you a temperature (pretty cool!). With the test URL, it should always give you 49.25, but it will take only a small change to have it do the real thing. We’ll get to that soon…
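To make steps 2 and 4 more concrete, here is a rough sketch of the find-based extraction together with the run-versus-import check. It is only one possible shape, not the required design. The names extract_temperature, report and SAMPLE_CONTENTS are made up for this sketch, and SAMPLE_CONTENTS stands in for the text you would actually read from the web page, so the get_temperature shown here is a placeholder that never contacts a server.

```python
import sys

# Stand-in for the text read from the web page (an assumption for this
# sketch; the real program would download it instead).
SAMPLE_CONTENTS = '{"main":{"temp":49.25,"feels_like":47.1,"humidity":71}}'


def extract_temperature(contents):
    """Return the number that follows '"temp":' in contents as a float."""
    key = '"temp":'
    start = contents.find(key)
    if start == -1:
        raise ValueError("no temp field found")
    start += len(key)                  # skip past the key itself
    end = contents.find(",", start)    # assumes a comma follows the number
    return float(contents[start:end])


def get_temperature(zip_code):
    """Placeholder: ignores zip_code and parses the stand-in text."""
    return extract_temperature(SAMPLE_CONTENTS)


def report(arguments):
    """Return the line to print for the given command-line arguments."""
    if len(arguments) != 2:            # program name plus exactly one zip code
        return "usage: python3 lab7_weather.py <zip_code>"
    return str(get_temperature(arguments[1]))


if __name__ == "__main__":             # true when run, false when imported
    print(report(sys.argv))
```

Because everything except the last two lines is a definition, importing this file prints nothing and queries nothing, which is what the specification asks for.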

### Part 2: Aggregating the weather

We now have a program that we can run to get the temperature, and a module that we can import in order to call the get_temperature function for a zip code. For the second part of this lab, we’re going to write another program (i.e., in a different “.py” file) that can be run regularly to build up a file of aggregated temperature data over time.

Your program will be run with two command-line arguments, the name of a file and a zip code. The file will contain multiple entries collected over time. Each line in the file will consist of a date, an hour of the day (in 24 hour time) and the temperature at that hour separated by commas (termed a CSV file). For example, here is a short snippet of an example file:

11-1-2018,13,49.23
11-1-2018,14,52.12
11-1-2018,15,52.45
11-1-2018,16,50.71
11-1-2018,17,50.85
11-1-2018,18,51.04


Each time you run the program it will add at most one line to this file. So the file above would have been generated with at least six calls to the program (over 6 different hours). We’re setting the problem up this way since it is generally straightforward to get a program to run at some fixed interval. You won’t be doing that for this lab, but I’m happy to talk to you offline about how that would work.
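To illustrate the line format shown in the example file, here is a minimal sketch that builds one entry. The prelab functions are not reproduced here, so this sketch uses the datetime module directly as a stand-in and assumes month-day-year order without zero padding, as the example file suggests. The name format_entry is made up for illustration; in your program, use the prelab functions for the date and hour.

```python
from datetime import datetime


def format_entry(moment, temperature):
    """Return 'date,hour,temperature' for the given datetime and temperature."""
    # Assumed month-day-year order with no zero padding, e.g. 11-1-2018
    date = "{}-{}-{}".format(moment.month, moment.day, moment.year)
    return "{},{},{}".format(date, moment.hour, temperature)


# 1 PM on November 1, 2018
print(format_entry(datetime(2018, 11, 1, 13), 49.23))   # 11-1-2018,13,49.23
```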

As with the first part, I’ve broken the description of this program into two parts, the specification and the guide.

#### Specifications

Write a program called lab7_aggregator.py that has the following characteristics:

1. Importing the module only defines functions and variables (no functions are invoked, nothing is printed in the shell).
2. Your program should be able to be run from the command-line and take two arguments, the first a filename and the second a zip code. For example, within Thonny:

>>> %Run lab7_aggregator.py temps_05753.txt 05753
>>>

3. If the program is run with an incorrect number of command-line arguments it should print out the usage:

>>> %Run lab7_aggregator.py
usage: python3 lab7_aggregator.py <file> <zip_code>

4. If the program is run with the correct arguments:
1. Your program should work if the file doesn’t yet exist. In that case, there can’t possibly be an entry for the current date and time and so your program should create the file and write an entry with the correct information and formatting (i.e., comma-separated date, hour, and temperature). The date and hour should be formatted using (or matching) the functions from the prelab.
2. If the file exists, the program should first check to make sure that there isn’t already an entry in the file for the current date and hour. If there is, the program should do nothing. This means that running the program repeatedly within the same hour will not alter the file after the first time when the current temperature is added for the current hour. Furthermore, if an entry already exists in the file for the current date and hour, the get_temperature function will not get called (doing so would be computationally inefficient).
3. If the file exists, but there is not an entry in the file for the current date and hour, the program should use the lab7_weather module to get the current temperature for the zip code specified as a command line argument and add an entry to the file at the end with the appropriate formatting. Your program should only invoke get_temperature if it is going to write an entry to the file (to avoid slowing your program down with unneeded queries to the API). It is possible for the file to exist, but be empty, in which case there can’t be an entry for the current date and hour and your program should write an entry.

Make sure that you do not “hard code” any filenames or directories in your program, that is specify a particular file name or directory on your computer. The Gradescope tests will fail if your program ignores (or changes) the filenames it provides as arguments.

#### Guide

Here is one approach to implementing this program:

1. Write the part of the program that checks to see if this program is being run vs. imported, checks the number of program parameters and prints the usage accordingly.
2. Write some code to check whether the file exists and has an entry for the current date and hour. To check whether a file exists you can use the exists function within the os.path module (which returns True if the file specified by the string argument exists). For testing purposes, it may be useful to create a version of the aggregate file manually. You can do so with Thonny.
3. Finally, put the above code together so that you check to see if an entry should be written to the file and if so use your lab7_weather module to get the temperature and append it to the end of the file. When writing this file, you can either rewrite the entire file from scratch each time (in which case you’d open the file with “w”) or instead just append the one new entry (in which case you’d open the file with “a”). In either case you will use the write method on the file object to write a string to the file. Opening a file in append mode (with the “a” argument) will create the file if it doesn’t exist.
4. Add any finishing touches to the program to make sure it runs appropriately. Note that when you run your program you won’t see any output, but the data file you provided as a command line argument may have been changed.
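The steps above might fit together roughly as in the sketch below. It is one possible arrangement under stated assumptions, not the required design: add_entry_if_missing and fetch_temperature are hypothetical names, and the temperature lookup is passed in as a function so that the demonstration can run without contacting any server. The demonstration at the bottom uses a temporary directory and a fake temperature function.

```python
import os.path
import tempfile


def add_entry_if_missing(filename, date, hour, fetch_temperature):
    """Append 'date,hour,temperature' unless that date and hour is present.

    fetch_temperature is a function of no arguments (hypothetical; in the lab
    it would wrap a call to get_temperature). Returns True if a line was written.
    """
    prefix = "{},{},".format(date, hour)
    if os.path.exists(filename):
        with open(filename, "r") as file:
            for line in file:
                if line.startswith(prefix):
                    return False       # already recorded; no temperature lookup
    with open(filename, "a") as file:  # "a" creates the file if it is missing
        file.write("{}{}\n".format(prefix, fetch_temperature()))
    return True


# Demonstration in a temporary directory so no real data file is touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "temps.txt")
    calls = []

    def fake_temperature():
        calls.append(1)                # record each lookup
        return 49.25

    first = add_entry_if_missing(path, "11-1-2018", 13, fake_temperature)
    second = add_entry_if_missing(path, "11-1-2018", 13, fake_temperature)
    with open(path) as file:
        contents = file.read()

print(first, second, len(calls))       # True False 1
print(repr(contents))                  # '11-1-2018,13,49.25\n'
```

The second call finds the existing entry and returns before the temperature function is ever called, which mirrors the requirement that get_temperature only be invoked when an entry will be written.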

### The Real Deal

So far, all of your testing should have been done with the departmental web server using the URL above, always giving you the same temperature. When you’re confident that you have everything working you can go back and change your lab7_weather module to use the real web page. For a given zip code, the URL should look as follows:

http://api.openweathermap.org/data/2.5/weather?zip=05753,us&APPID=9838b264525602b46f0b2ef8c191eef8&units=imperial


Note that the URL has several “query parameters” separated with ampersands. For instance, we specify the zip code via zip=05753,us. At the end we request imperial units, i.e., Fahrenheit, since by default we get Kelvin which is not as useful. What about the APPID variable? This API asks you to create an account, which controls the number of requests you are allowed to make. A free account gives you up to 60 requests per minute. I encourage you to create an account, which will give you your own unique APPID to use in the URL. The current value is Prof. Linderman’s key.

In any case, to use the actual API you need to use a URL like the one above, but with the correct zip code substituted.
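As an illustration of substituting the zip code, here is a minimal sketch using the string format method. The names URL_TEMPLATE, API_KEY and build_url are made up for this sketch, and "YOUR_API_KEY" is a placeholder rather than a working APPID.

```python
# Hypothetical constant names; the key below is a placeholder, not a real APPID.
URL_TEMPLATE = ("http://api.openweathermap.org/data/2.5/weather"
                "?zip={},us&APPID={}&units=imperial")
API_KEY = "YOUR_API_KEY"


def build_url(zip_code):
    """Return the API URL for the given zip code (a string)."""
    return URL_TEMPLATE.format(zip_code, API_KEY)


print(build_url("05753"))
# http://api.openweathermap.org/data/2.5/weather?zip=05753,us&APPID=YOUR_API_KEY&units=imperial
```

Keeping the fixed parts of the URL in one constant means that switching between the test page and the real API only requires changing that constant.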

Change your get_temperature function in the lab7_weather module to generate an appropriate URL based on the zip code passed in and then use this URL to get the temperature. You should now be able to query the current weather based on the zip code entered:

>>> %Run lab7_weather.py 05753
36.56
>>> %Run lab7_weather.py 80424
13.65
>>> %Run lab7_weather.py 33111
78.32


Again, please try not to run this program too many times (unless you created your own API account), but do play with it some. You should be able to run your lab7_weather.py program with a zip code and it will give you the current temperature and your lab7_aggregator.py should now aggregate the real values.

### Creativity Points

You may earn up to 2 creativity points on this assignment. Below are some ideas, but you may incorporate your own if you’d like. Make sure to document your additions in the comment at the top of the files.

• [0.5 points] Check to make sure that the user enters a valid zip code (i.e., 5 digits).
• [1 point] Also include the zip code in the aggregated file and add data to the file based on whether an entry for that date, time and zip code do not exist in the file. If you add additional information to each line in the file, make sure the beginning of the line remains as specified above, i.e. date, then hour, then temperature (as that is what Gradescope will be checking).
• [1 point] Extend your lab7_weather module with functions to extract other information from the API.
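For the first idea in the list above, a validity check might look something like the sketch below. The function name is_valid_zip is made up, and "valid" here only means exactly five digits, as the bullet describes; it does not check that the zip code actually exists.

```python
def is_valid_zip(zip_code):
    """Return True if zip_code is a string of exactly five ASCII digits."""
    return len(zip_code) == 5 and zip_code.isascii() and zip_code.isdigit()


print(is_valid_zip("05753"))    # True
print(is_valid_zip("5753"))     # False (too short)
print(is_valid_zip("0575a"))    # False (contains a letter)
```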

If you extract other data from the API, do so by creating new functions similar to get_temperature. Even better style is to create a general function, e.g. get_field, that takes a zip code and field name as parameters and returns that field. For example, get_temperature could then call that general function with '"temp":' as the field name. However you go about it, make sure get_temperature continues to satisfy the specification above, that is, it executes successfully with a single argument, the zip code, and returns the temperature as a float.
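Here is a rough sketch of the general-function idea. Unlike the find-based example in the paragraph above, this version uses the json module, so the field name is just "temp". The names download_weather and SAMPLE_CONTENTS are placeholders invented for the sketch, and the sample text assumes the fields of interest sit together under a "main" key; check the real data before relying on that.

```python
import json

# Stand-in for the downloaded text (an assumption for this sketch).
SAMPLE_CONTENTS = '{"main": {"temp": 49.25, "humidity": 71, "pressure": 1016}}'


def download_weather(zip_code):
    """Placeholder for reading the page for zip_code; returns stand-in text."""
    return SAMPLE_CONTENTS


def get_field(zip_code, field_name):
    """Return the named field from the "main" section of the weather data."""
    data = json.loads(download_weather(zip_code))
    return data["main"][field_name]


def get_temperature(zip_code):
    """Return the temperature at zip_code as a float."""
    return float(get_field(zip_code, "temp"))


print(get_temperature("05753"))          # 49.25
print(get_field("05753", "humidity"))    # 71
```

Note that get_temperature still takes a single argument and returns a float, so it continues to meet the Part 1 specification.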

### When you’re done

Make sure that your program is properly commented:

• You should have a docstring at the very beginning of the file stating your name(s), course (including section number), and assignment number.
• Each function should have an appropriate docstring (including arguments and return value if applicable).
• Other miscellaneous comments to make things clear

Remember that modules need docstrings too! Make sure you have a docstring at the top of your file that starts with a meaningful one-sentence description of the functionality in that module. That is, the top of your file should now look like:

"""
A brief description of my module...

CS150 Lab 7

Name: Michael Linderman
Section:

Creativity:
"""


In addition, make sure that you’ve used good coding style (including meaningful variable names, constants where relevant, vertical white space, etc.).

Submit your programs via Gradescope. Your files must be named lab7_weather.py and lab7_aggregator.py, and you must submit both files at the same time. You can submit multiple times, with only the most recent submission (before the due date) graded. Note that the tests performed by Gradescope are limited. Passing all of the visible tests does not guarantee that your submission correctly satisfies all of the requirements of the assignment.

If you worked with a partner, only one person needs to submit to Gradescope, but that person does need to add their partner’s name as shown in Gradescope documentation. Make sure both names are included in the comment at the top of the files.

| Feature | Points |
| --- | --- |
| lab7_weather.py | |
| run vs. import | 1 |
| prints usage with incorrect number of arguments | 2 |
| runs correctly with zip code entered | 2 |
| get temperature | 5 |
| lab7_aggregator.py | |
| run vs. import | 1 |
| prints usage with incorrect number of arguments | 2 |
| date and hour formatted correctly in file | 1 |
| appends temp to end of file | 3 |
| Code design and style | 5 |
| Creativity points | 2 |
| Total | 27 |

In the past, many of the errors in the lab originated in the code to check the file for existing entries. To check if there is an entry with the current date and time, we need to check if any line in the file contains both the current date and the current time. However, just using the in operator to check for the presence of the date and hour in the string can fail some of the time. Instead we want to match the entire date and hour string at one time.
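To see how the in operator can go wrong, consider the short sketch below. It looks for an entry at hour 1 on 1-1-2018 in a line that was actually recorded at hour 13 on 11-1-2018. Comparing against the start of the line, including the trailing comma, is one possible way to match the entire date and hour at once; it is not the only valid approach.

```python
line = "11-1-2018,13,49.23"

# We are looking for hour 1 on 1-1-2018; this line is NOT such an entry.
date = "1-1-2018"
hour = "1"

# Checking each piece separately gives a false match.
print(date in line and hour in line)               # True (wrong)

# Checking the combined text anywhere in the line still gives a false match,
# because "1-1-2018,1" appears inside "11-1-2018,13".
print((date + "," + hour) in line)                 # True (wrong)

# Matching the whole date and hour at the start of the line works.
print(line.startswith(date + "," + hour + ","))    # False (correct)
print(line.startswith("11-1-2018,13,"))            # True (correct)
```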
We have primarily used truth tables in the context of our boolean operators (e.g., and), but they are also a useful tool when figuring out our program logic.
The lab specifies (and Gradescope tests for) a specific file format. Per the specification, the date, hour, and temperature should be separated by commas (not spaces or other characters). Note that we use commas to make it easy for other tools or libraries, like datascience, to read in our data file. A compliant write would look like file.write("10-1-2018,12,26.2\n").