Practice Problems 8 Solution

  1. Assume that a=np.array([1, 2, 3]) and b=np.array([4, 5, 6]) (after import numpy as np). Evaluate each of the following expressions. Make it clear whether the result is a scalar (a single value) or a vector (an array of values).
    1. array([ 4, 10, 18])
    2. -9
    3. array([0.16666667, 0.33333333, 0.5 ])
    4. array([ 7, 9, 11])
    5. array([-1., 0., 1.])
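    Whether a result is a scalar or a vector can be checked directly with np.ndim. The expressions being evaluated are not reproduced in this solution, so the two below are illustrative choices that happen to produce the first two answers; they are an assumption, not necessarily the expressions that were asked.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Illustrative expressions only; the problem's own expressions are not shown above.
elementwise = a * b        # element-wise product, one value per position
reduced = np.sum(a - b)    # reduction of the element-wise difference to one value

print(elementwise, np.ndim(elementwise))  # ndim 1 means a vector
print(reduced, np.ndim(reduced))          # ndim 0 means a scalar
```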
  2. Rewrite the following code into “plain” Python that does not use NumPy, assuming a is a list. Built-in functions like sum, etc. are considered “plain” Python.

     def mystery(a):
         return np.max(a) - np.min(a)
    
     def mystery(a):
         return max(a) - min(a)
    

    NumPy functions can typically be used with both built-in Python lists and NumPy arrays. Thus, in many instances we don’t need to convert a built-in list to a NumPy array. The one place where we often do need that conversion is when we use arithmetic operators for element-wise computations. Those operators are only overloaded element-wise for NumPy arrays, e.g. [1, 2, 3] / 4 will raise an error while np.array([1, 2, 3]) / 4 will perform element-wise division.
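    The difference described above can be seen in a short sketch. The variable names (values, quarters, list_division_failed) are only for illustration.

```python
import numpy as np

values = [1, 2, 3]

# NumPy functions accept a plain list directly.
value_range = np.max(values) - np.min(values)
print(value_range)

# The / operator is not defined between a list and a number.
try:
    values / 4
    list_division_failed = False
except TypeError:
    list_division_failed = True
print(list_division_failed)

# Converting to an array enables element-wise division.
quarters = np.array(values) / 4
print(quarters)
```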

  3. Rewrite the following code into “plain” Python that does not use NumPy, assuming a and b are lists (of the same length). Built-in functions like sum, etc. are considered “plain” Python.

     def mystery(a, b):
         return np.sum((np.array(a)-np.mean(a)) * (np.array(b)-np.mean(b)))
    
     def mystery(a, b):
         a_mean = sum(a) / len(a)
         b_mean = sum(b) / len(b)
         total = 0
         for i in range(len(a)):
             total += (a[i] - a_mean) * (b[i] - b_mean)
         return total
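    One way to gain confidence in a rewrite like this is to run both versions on the same input. The sketch below uses the hypothetical names mystery_numpy and mystery_plain so the two can coexist, and the plain version uses zip as an alternative to indexing with range(len(a)).

```python
import math
import numpy as np

def mystery_numpy(a, b):
    return np.sum((np.array(a) - np.mean(a)) * (np.array(b) - np.mean(b)))

def mystery_plain(a, b):
    a_mean = sum(a) / len(a)
    b_mean = sum(b) / len(b)
    # zip pairs up corresponding elements of a and b
    return sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))

a = [1, 2, 3]
b = [4, 5, 6]
print(mystery_plain(a, b), mystery_numpy(a, b))
print(math.isclose(mystery_plain(a, b), mystery_numpy(a, b)))
```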
    
  4. Rewrite the following Python function using NumPy to not have any explicit loops:

     def length_normalize(items):
         """
         Normalize all the values in the list by the sum
             
         Args:
             items: A list of numbers
            
         Returns: List of normalized numbers
         """
         total = 0
         for item in items:
             total += item
            
         new_items = []
         for item in items:
             new_items.append(item / total)
         return new_items
    
     def length_normalize(items):
         return np.array(items) / np.sum(items)
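    A quick check of the NumPy version, as a sketch: the normalized values should sum to 1. Note that this version returns a NumPy array rather than a list; if a list is required, .tolist() converts it back.

```python
import numpy as np

def length_normalize(items):
    return np.array(items) / np.sum(items)

normalized = length_normalize([1, 2, 3, 4])
print(normalized)
print(np.sum(normalized))

# Convert back to a built-in list if the caller expects one.
as_list = normalized.tolist()
print(type(as_list))
```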
    
  5. Consider the following Table assigned to the tips variable, a subset of which is shown below (you can download the file here and read it into Python via tips = ds.Table().read_table("tips.csv")).

     >>> tips
     total_bill | tip  | sex    | smoker | day  | time   | size
     16.99      | 1.01 | Female | No     | Sun  | Dinner | 2
     10.34      | 1.66 | Male   | No     | Sun  | Dinner | 3
     21.01      | 3.5  | Male   | No     | Sun  | Dinner | 3
     23.68      | 3.31 | Male   | No     | Sun  | Dinner | 2
     24.59      | 3.61 | Female | No     | Sun  | Dinner | 4
     25.29      | 4.71 | Male   | No     | Sun  | Dinner | 4
     8.77       | 2    | Male   | No     | Sun  | Dinner | 2
     26.88      | 3.12 | Male   | No     | Sun  | Dinner | 4
     15.04      | 1.96 | Male   | No     | Sun  | Dinner | 2
     14.78      | 3.23 | Male   | No     | Sun  | Dinner | 2
     ... (234 rows omitted) 
    

    Briefly describe the plot generated by the following code. & is the element-wise and operation.

     d = tips.where((tips["sex"] == "Female") & (tips["time"] == "Lunch"))
     plt.plot(d["total_bill"], d["tip"], "ro")
     d = tips.where((tips["sex"] == "Male") & (tips["time"] == "Lunch"))
     plt.plot(d["total_bill"], d["tip"], "bo")
     d = tips.where((tips["sex"] == "Female") & (tips["time"] == "Dinner"))
     plt.plot(d["total_bill"], d["tip"], "rx")
     d = tips.where((tips["sex"] == "Male") & (tips["time"] == "Dinner"))
     plt.plot(d["total_bill"], d["tip"], "bx")
     plt.show()
    

    This code produces a scatter plot of “tip” (y-axis) vs. “total_bill” (x-axis) with the color of the point indicating the gender (red for female, blue for male) and the shape of the point indicating the meal time (circle for lunch, “x” for dinner).
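    The element-wise & used to select each group can be sketched with plain NumPy arrays. The values below are copied from the first five rows of the table shown above; the array names are only for illustration. The parentheses around each comparison are needed because & binds more tightly than ==.

```python
import numpy as np

# sex and time values from the first five rows shown above
sex = np.array(["Female", "Male", "Male", "Male", "Female"])
time = np.array(["Dinner", "Dinner", "Dinner", "Dinner", "Dinner"])

# Each comparison yields a boolean array; & combines them position by position.
female_dinner = (sex == "Female") & (time == "Dinner")
female_lunch = (sex == "Female") & (time == "Lunch")

print(female_dinner)
print(female_lunch)
```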

  6. For the dataset above, write datascience code to subset the data to just those rows where the tip is greater than 15% of the total bill.

     tips.where((tips["tip"] / tips["total_bill"]) > 0.15)
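    The condition passed to where is an array of booleans, one per row. A minimal sketch of how it is built, using NumPy arrays with the total_bill and tip values from the first three rows shown above (the names pct and keep are only for illustration):

```python
import numpy as np

# total_bill and tip from the first three rows of the table
total_bill = np.array([16.99, 10.34, 21.01])
tip = np.array([1.01, 1.66, 3.5])

pct = tip / total_bill   # element-wise tip fraction per row
keep = pct > 0.15        # True where the tip exceeds 15% of the bill

print(keep)
print(total_bill[keep])  # boolean indexing keeps only the True positions
```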
    
  7. For the dataset above, write code using the datascience group method to concisely and efficiently compute the average tip percentage for all combinations of diner gender and meal time (“Lunch” vs. “Dinner”). As a suggestion, the NumPy np.mean function can be used as the function applied to each group.

     tips["pct"] = tips["tip"] / tips["total_bill"]
     tips.group(["sex", "time"], np.mean)
    
  8. [Bonus] Write code to perform the same computation, computing the mean tip percentage for all combinations of diner gender and meal time (“Lunch” vs. “Dinner”), using just Python built-in functions and data structures. There are many ways to go about this, but as a hint, tuples, e.g. (sex, time), can be used as dictionary keys. You can easily iterate through the rows of a Table with the rows attribute and access the fields of each row as attributes, e.g.

     for row in tips.rows:
         print(row.tip / row.total_bill)
    

    Here I create a composite key from the sex and time variables for use in a dictionary. The value in that dictionary is a list of all of the tip percentages for that combination.

     groups = {}
     for row in tips.rows:
         # Create tuple with combination of sex and time to use as dictionary key
         combo = (row.sex, row.time)
         # Append tip pct. to the list for this key to compute the average later
         groups.setdefault(combo, []).append(row.tip / row.total_bill)
        
     for key, value in groups.items():
         print(key[0], key[1], sum(value) / len(value))
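    As a variation that can be run without the datascience package, the sketch below keeps a running sum and count per key instead of a list. The Row namedtuple is a hypothetical stand-in for the rows of the Table, filled with values from four of the rows shown earlier.

```python
from collections import namedtuple

# Hypothetical stand-in for Table rows, with only the fields used here
Row = namedtuple("Row", ["total_bill", "tip", "sex", "time"])
rows = [
    Row(16.99, 1.01, "Female", "Dinner"),
    Row(10.34, 1.66, "Male", "Dinner"),
    Row(21.01, 3.5, "Male", "Dinner"),
    Row(24.59, 3.61, "Female", "Dinner"),
]

totals = {}  # (sex, time) -> [sum of tip pct., number of rows]
for row in rows:
    entry = totals.setdefault((row.sex, row.time), [0.0, 0])
    entry[0] += row.tip / row.total_bill
    entry[1] += 1

# Divide each running sum by its count to get the mean per combination
means = {key: total / count for key, (total, count) in totals.items()}
for (sex, time), mean in means.items():
    print(sex, time, mean)
```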