Complete the following problems. Try to solve each one on paper before using Thonny to confirm your answers.
Assume a = np.array([1, 2, 3]) and b = np.array([4, 5, 6]) (after import numpy as np). Evaluate each of the following expressions. Make it clear whether the result is a scalar (a single value) or a vector (an array of values).
a * b
np.sum(a - b)
a / np.sum(a)
(a + b) + 2
b - np.mean(b)
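As a reminder of the distinction this problem asks about, here is a minimal sketch using two hypothetical arrays, x and y (deliberately not the a and b above). Arithmetic between arrays, or between an array and a single number, is applied element by element and yields a vector, while a reduction such as np.sum collapses a vector to a scalar.

```python
import numpy as np

# Hypothetical example arrays, chosen to differ from the problem's a and b
x = np.array([2, 4, 6])
y = np.array([1, 1, 2])

elementwise = x + y   # vector: one result per pair of elements
broadcast = x * 10    # vector: the scalar 10 is applied to every element
reduced = np.sum(x)   # scalar: all elements combined into a single value

print(elementwise, broadcast, reduced)
```

Checking np.ndim on a result (0 for a scalar, 1 for a vector) is one way to confirm an answer after working it out on paper.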
Rewrite the following code into “plain” Python that does not use NumPy, assuming a is a list. Built-in functions like sum, etc. are considered “plain” Python.
def mystery(a):
    return np.max(a) - np.min(a)
Rewrite the following code into “plain” Python that does not use NumPy, assuming a and b are lists (of the same length). Built-in functions like sum, etc. are considered “plain” Python.
def mystery(a, b):
    return np.sum((np.array(a)-np.mean(a)) * (np.array(b)-np.mean(b)))
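The general translation pattern may be easier to see on a different, simpler computation. The sketch below is not a solution to either mystery function; the names dot_numpy, dot_plain, and mean_plain are hypothetical and introduced only for illustration. It shows how an element-wise array expression followed by a reduction can be written with zip and the built-in sum.

```python
import numpy as np

def dot_numpy(a, b):
    # Element-wise product of two arrays, then a sum reduction
    return np.sum(np.array(a) * np.array(b))

def dot_plain(a, b):
    # zip pairs up corresponding elements of the two lists;
    # the generator expression plays the role of the element-wise product
    return sum(x * y for x, y in zip(a, b))

def mean_plain(a):
    # Plain-Python counterpart of np.mean for a non-empty list
    return sum(a) / len(a)

print(dot_numpy([1, 2], [3, 4]), dot_plain([1, 2], [3, 4]), mean_plain([1, 2, 3, 4]))
```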
Rewrite the following Python function using NumPy to not have any explicit loops:
def length_normalize(items):
    """
    Normalize all the values in the list by the sum

    Args:
        items: A list of numbers

    Returns: List of normalized numbers
    """
    total = 0
    for item in items:
        total += item
    new_items = []
    for item in items:
        new_items.append(item / total)
    return new_items
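For comparison, here is a sketch of the same kind of rewrite applied to a different, hypothetical function (shift_loop and shift_numpy are illustrative names, not part of the problem). The first loop becomes a reduction and the second becomes a single element-wise expression.

```python
import numpy as np

def shift_loop(items):
    # Find the smallest value with an explicit loop
    smallest = items[0]
    for item in items:
        if item < smallest:
            smallest = item
    # Subtract it from every element with a second loop
    shifted = []
    for item in items:
        shifted.append(item - smallest)
    return shifted

def shift_numpy(items):
    # The reduction np.min replaces the first loop;
    # array-minus-scalar subtraction replaces the second
    arr = np.array(items)
    return arr - np.min(arr)

print(shift_loop([3, 5, 4]), shift_numpy([3, 5, 4]))
```

Note that the NumPy version returns an array rather than a list; calling .tolist() on the result would convert it back if a list is required.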
Consider the following Table assigned to the tips variable, a subset of which is shown below (you can download the file here and read it into Python via tips = ds.Table().read_table("tips.csv")).
>>> tips
total_bill | tip | sex | smoker | day | time | size
16.99 | 1.01 | Female | No | Sun | Dinner | 2
10.34 | 1.66 | Male | No | Sun | Dinner | 3
21.01 | 3.5 | Male | No | Sun | Dinner | 3
23.68 | 3.31 | Male | No | Sun | Dinner | 2
24.59 | 3.61 | Female | No | Sun | Dinner | 4
25.29 | 4.71 | Male | No | Sun | Dinner | 4
8.77 | 2 | Male | No | Sun | Dinner | 2
26.88 | 3.12 | Male | No | Sun | Dinner | 4
15.04 | 1.96 | Male | No | Sun | Dinner | 2
14.78 | 3.23 | Male | No | Sun | Dinner | 2
... (234 rows omitted)
Briefly describe the plot generated by the following code. Note that & is the element-wise “and” operation.
d = tips.where((tips["sex"] == "Female") & (tips["time"] == "Lunch"))
plt.plot(d["total_bill"], d["tip"], "ro")
d = tips.where((tips["sex"] == "Male") & (tips["time"] == "Lunch"))
plt.plot(d["total_bill"], d["tip"], "bo")
d = tips.where((tips["sex"] == "Female") & (tips["time"] == "Dinner"))
plt.plot(d["total_bill"], d["tip"], "rx")
d = tips.where((tips["sex"] == "Male") & (tips["time"] == "Dinner"))
plt.plot(d["total_bill"], d["tip"], "bx")
plt.show()
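To make the element-wise “and” concrete, here is a small sketch on plain NumPy arrays. The color and weight arrays are hypothetical and are not taken from tips.csv. Each comparison produces an array of booleans, and & combines those arrays position by position.

```python
import numpy as np

# Hypothetical columns, not taken from tips.csv
color = np.array(["red", "blue", "red", "blue"])
weight = np.array([5, 12, 20, 3])

is_red = color == "red"    # True where the color is "red"
is_heavy = weight > 10     # True where the weight exceeds 10
both = is_red & is_heavy   # True only where both conditions hold

print(both)
print(weight[both])        # keeps only the elements where both is True
```

When the comparisons are written inline, as in the code above, each one needs its own parentheses because & binds more tightly than == in Python.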
For the dataset above, write datascience code to subset the data to just those rows where the tip is greater than 15% of the total bill.
For the dataset above, write code using the datascience group method to concisely and efficiently compute the average tip percentage for all combinations of diner gender and meal time (“Lunch” vs. “Dinner”). As a suggestion, the NumPy np.mean function can be used as the function applied to each group.
[Bonus] Write code to perform the same computation, computing the mean tip percentage for all combinations of diner gender and meal time (“Lunch” vs. “Dinner”), using just Python built-in functions and data structures. There are many ways to go about this, but as a hint, tuples, e.g. (sex, time), can be used as dictionary keys. You can easily iterate through the rows of a Table with the rows attribute and access the fields as attributes of each row, e.g.
for row in tips.rows:
    print(row.tip / row.total_bill)
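As an illustration of the tuple-as-key hint, here is a minimal sketch on hypothetical records. The records list and its color, size, and price fields are made up for this example and are unrelated to the tips table. Each distinct pair of values gets its own dictionary entry holding a running sum and a count.

```python
# Hypothetical records as (color, size, price) tuples, not rows of tips
records = [
    ("red", "small", 2.0),
    ("red", "large", 5.0),
    ("blue", "small", 3.0),
    ("red", "small", 4.0),
]

totals = {}  # maps (color, size) -> [running sum, count]
for color, size, price in records:
    # setdefault creates the entry the first time a key is seen
    entry = totals.setdefault((color, size), [0.0, 0])
    entry[0] += price
    entry[1] += 1

# Divide each running sum by its count to get the per-group mean
means = {key: total / count for key, (total, count) in totals.items()}
print(means)
```

This is only one of several workable designs; a dictionary mapping each key to a list of values, averaged at the end, would serve equally well.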