a=np.array([1, 2, 3]) and b=np.array([4, 5, 6]) (after import numpy as np).
Evaluate each of the following expressions. Make it clear whether the result
is a scalar (a single value) or a vector (an array of values).
array([ 4, 10, 18])
-9
array([0.16666667, 0.33333333, 0.5 ])
array([ 7, 9, 11])
array([-1., 0., 1.])
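The expressions behind these answers are not reproduced here, so the snippet below uses two illustrative expressions of my own choosing (an assumption, not the exercise's originals). It is a small sketch of one way to check whether a NumPy result is a scalar or a vector, using np.ndim.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Illustrative expressions (assumed, not taken from the exercise itself).
elementwise = a * b        # arithmetic operators work element by element
reduced = np.sum(a - b)    # a reduction collapses the array to one value

for name, value in [("a * b", elementwise), ("np.sum(a - b)", reduced)]:
    # np.ndim is 0 for a single value and 1 for a one-dimensional array
    kind = "scalar" if np.ndim(value) == 0 else "vector"
    print(name, "->", kind)
```

Element-wise operators keep the shape of their inputs, while reductions such as np.sum, np.mean, np.max and np.min return a single value.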
Rewrite the following code into “plain” Python that does not use NumPy,
assuming a is a list. Built-in functions like sum, etc. are considered
“plain” Python.
def mystery(a):
    return np.max(a) - np.min(a)

def mystery(a):
    return max(a) - min(a)
NumPy functions can typically be used with both built-in Python lists and
NumPy arrays. Thus in many instances we don’t need to convert a built-in
list to a NumPy array. The one place where we often do need that conversion
is when we use arithmetic operators to implement element-wise computations.
Those operators are only overloaded for NumPy arrays, i.e. [1, 2, 3] / 4
will raise a TypeError while np.array([1, 2, 3]) / 4 will perform
element-wise division.
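The difference described above can be tried directly. The following is a minimal sketch; the variable names (values, spread, quarters, list_division_failed) are mine, chosen for illustration.

```python
import numpy as np

values = [1, 2, 3]

# NumPy functions accept a plain list directly, no conversion needed.
spread = np.max(values) - np.min(values)

# The / operator is not defined between a list and an int.
try:
    values / 4
    list_division_failed = False
except TypeError:
    list_division_failed = True

# Converting to an array first enables element-wise arithmetic.
quarters = np.array(values) / 4

print(spread, list_division_failed, quarters)
```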
Rewrite the following code into “plain” Python that does not use NumPy,
assuming a and b are lists (of the same length). Built-in functions like
sum, etc. are considered “plain” Python.
def mystery(a, b):
    return np.sum((np.array(a)-np.mean(a)) * (np.array(b)-np.mean(b)))

def mystery(a, b):
    a_mean = sum(a) / len(a)
    b_mean = sum(b) / len(b)
    total = 0
    for i in range(len(a)):
        total += (a[i] - a_mean) * (b[i] - b_mean)
    return total
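As a quick sanity check, the NumPy version and a plain-Python version can be compared on a small input. The sketch below also shows an alternative to the indexed loop that pairs elements with zip. The function names mystery_numpy and mystery_zip are hypothetical, used only to keep the two versions apart.

```python
import numpy as np

def mystery_numpy(a, b):
    return np.sum((np.array(a) - np.mean(a)) * (np.array(b) - np.mean(b)))

def mystery_zip(a, b):
    a_mean = sum(a) / len(a)
    b_mean = sum(b) / len(b)
    # zip pairs up corresponding elements, so no index variable is needed
    return sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))

a = [1, 2, 3]
b = [4, 5, 6]
print(mystery_numpy(a, b), mystery_zip(a, b))
```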
Rewrite the following Python function using NumPy to not have any explicit loops:
def length_normalize(items):
    """
    Normalize all the values in the list by the sum

    Args:
        items: A list of numbers

    Returns: List of normalized numbers
    """
    total = 0
    for item in items:
        total += item
    new_items = []
    for item in items:
        new_items.append(item / total)
    return new_items

def length_normalize(items):
    return np.array(items) / np.sum(items)
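A short usage sketch for the NumPy version, with an input list I chose for illustration. After normalization the values should sum to 1 (up to floating-point rounding).

```python
import numpy as np

def length_normalize(items):
    return np.array(items) / np.sum(items)

normalized = length_normalize([1, 2, 3, 4])
print(normalized)
print(np.sum(normalized))
```

Note that this version returns a NumPy array rather than a list. If a list is required, normalized.tolist() converts it.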
Consider the following Table assigned to the tips variable, a subset of
which is shown below (you can download the file here and read it
into Python via tips = ds.Table().read_table("tips.csv")).
>>> tips
total_bill | tip | sex | smoker | day | time | size
16.99 | 1.01 | Female | No | Sun | Dinner | 2
10.34 | 1.66 | Male | No | Sun | Dinner | 3
21.01 | 3.5 | Male | No | Sun | Dinner | 3
23.68 | 3.31 | Male | No | Sun | Dinner | 2
24.59 | 3.61 | Female | No | Sun | Dinner | 4
25.29 | 4.71 | Male | No | Sun | Dinner | 4
8.77 | 2 | Male | No | Sun | Dinner | 2
26.88 | 3.12 | Male | No | Sun | Dinner | 4
15.04 | 1.96 | Male | No | Sun | Dinner | 2
14.78 | 3.23 | Male | No | Sun | Dinner | 2
... (234 rows omitted)
Briefly describe the plot generated by the following code. & is the
element-wise and operation.
d = tips.where((tips["sex"] == "Female") & (tips["time"] == "Lunch"))
plt.plot(d["total_bill"], d["tip"], "ro")
d = tips.where((tips["sex"] == "Male") & (tips["time"] == "Lunch"))
plt.plot(d["total_bill"], d["tip"], "bo")
d = tips.where((tips["sex"] == "Female") & (tips["time"] == "Dinner"))
plt.plot(d["total_bill"], d["tip"], "rx")
d = tips.where((tips["sex"] == "Male") & (tips["time"] == "Dinner"))
plt.plot(d["total_bill"], d["tip"], "bx")
plt.show()
This code produces a scatter plot of “tip” (y-axis) vs. “total_bill” (x-axis) with the color of the point indicating the gender (red for female, blue for male) and the shape of the point indicating the meal time (circle for lunch, “x” for dinner).
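To see what the combined conditions in the plotting code select, here is a small NumPy-only sketch of the element-wise & on boolean arrays. The four values are made up for illustration and are not rows from tips.csv.

```python
import numpy as np

# Made-up values for illustration; not rows from tips.csv
sex = np.array(["Female", "Male", "Female", "Male"])
time = np.array(["Lunch", "Lunch", "Dinner", "Dinner"])

is_female = sex == "Female"   # one boolean per element
is_lunch = time == "Lunch"

# True only where both conditions hold for the same position
both = is_female & is_lunch
print(both)
```

The parentheses around each comparison in the plotting code matter: & binds more tightly than ==, so without them the expression would be grouped differently.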
For the dataset above, write datascience code to subset the data to just those rows where the tip is greater than 15% of the total bill.
tips.where((tips["tip"] / tips["total_bill"]) > 0.15)
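The condition passed to where is just an array of booleans. As a sketch using plain NumPy arrays that hold the first five displayed rows of the table (rather than the full Table), the mask can be inspected on its own:

```python
import numpy as np

# First five rows of total_bill and tip as displayed in the table above
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59])
tip = np.array([1.01, 1.66, 3.5, 3.31, 3.61])

# Element-wise division, then element-wise comparison
mask = (tip / total_bill) > 0.15
print(mask)

# Boolean indexing keeps only the positions where the mask is True
print(total_bill[mask], tip[mask])
```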
For the dataset above, write code using the datascience group method to
concisely and efficiently compute the average tip percentage for all
combinations of diner gender and meal time (“Lunch” vs. “Dinner”). As a
suggestion, the NumPy np.mean function can be used as the function applied
to each group.
tips["pct"] = tips["tip"] / tips["total_bill"]
tips.group(["sex", "time"], np.mean)
[Bonus] Write code to perform the same computation, computing the mean tip
percentage for all combinations of diner gender and meal time (“Lunch” vs.
“Dinner”), using just Python built-in functions and data structures. There
are many ways to go about this, but as a hint, tuples, e.g. (sex, time),
can be used as dictionary keys. You can easily iterate through the rows of a
Table with the rows attribute and access the fields as attributes of each
row, e.g.
for row in tips.rows:
    print(row.tip / row.total_bill)
There are many ways to go about this. Here I create a composite key from the
sex and time variables for use in a dictionary. Each value in that
dictionary is a list of all of the tip percentages for that combination.
groups = {}
for row in tips.rows:
    # Create tuple with combination of sex and time to use as dictionary key
    combo = (row.sex, row.time)
    # Append tip pct. to list to compute average later
    groups[combo] = groups.get(combo, []) + [row.tip / row.total_bill]
for key, value in groups.items():
    print(key[0], key[1], sum(value) / len(value))
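The same idea can be tried without the datascience package. The sketch below is one possible variant, not the solution above: it stands in plain tuples for the Table rows (using the first five displayed rows, so only “Dinner” appears) and keeps a running total and count per key instead of a list.

```python
from collections import defaultdict

# (sex, time, total_bill, tip) for the first five rows displayed above
rows = [
    ("Female", "Dinner", 16.99, 1.01),
    ("Male", "Dinner", 10.34, 1.66),
    ("Male", "Dinner", 21.01, 3.5),
    ("Male", "Dinner", 23.68, 3.31),
    ("Female", "Dinner", 24.59, 3.61),
]

totals = defaultdict(float)  # running sum of tip percentages per key
counts = defaultdict(int)    # number of rows seen per key
for sex, time, total_bill, tip in rows:
    key = (sex, time)        # tuples are hashable, so they work as keys
    totals[key] += tip / total_bill
    counts[key] += 1

means = {key: totals[key] / counts[key] for key in totals}
for (sex, time), mean_pct in means.items():
    print(sex, time, mean_pct)
```

Keeping a running sum and count avoids building a new list on every iteration, which may matter for larger tables.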