Class 14

Object-oriented Programming (OOP)

Objectives for today

Explain the relationship between types, classes and objects in Python
Implement a class that overrides operators
Use inheritance to create a specialized type while reusing code

Why Object-Oriented Programming (OOP)

How many of you have used Object-Oriented Programming, often called “OOP” (sometimes rhymes with “goop”)? A trick question: everyone has. Recall that everything in Python is an object, e.g., integers, strings, etc., and so at some level all Python programming is OOP. But that does not explain why that is the case, or what the advantages of OOP might be.

Python has a rich library of “built-in” types (its tag line is “batteries included”), but that can’t possibly include types for every kind of data you might want to represent in your program. Instead Python provides a mechanism, Object-Oriented Programming, for us to create new types as needed, and use those types in the same way we use integers, lists, etc., including performing arithmetic or other operations on those values.

In OOP, we create a Class to define a new type. A Class is like a blueprint for creating objects, or specific instances of that type (much in the way a single construction blueprint can be used to create one or more specific instances of a house). The Class specifies the instance variables (data) contained within an instance and the methods, the computations that can be performed on those instance variables.

As we will see, this approach can offer two key benefits:

We can “encapsulate” all of the complexity of a specific data type, including any data and associated operations on that data. Using objects we can define a “higher level” interface that “abstracts” the specifics of that entity
We can facilitate code reuse through shared interfaces and inheritance

To see those benefits in action, let’s revisit floating point numbers and some of their peculiarities…

Creating our own data types

We can observe unexpected behavior when comparing floating point numbers due to insufficient precision. For example,

0.1 + 0.2 <= 0.3

False

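What is going on there? Values like 0.1, 0.2 and 0.3 cannot be represented exactly in binary floating point, so 0.1 + 0.2 rounds to a number slightly larger than the float closest to 0.3. We can check that by using the decimal module, which is designed for base-10 arithmetic and can show the exact value stored in a float.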

import decimal
decimal.Decimal(0.1 + 0.2)

Decimal('0.3000000000000000444089209850062616169452667236328125')

decimal.Decimal(0.3)

Decimal('0.299999999999999988897769753748434595763683319091796875')

We could avoid the inherent imprecision by representing floating point numbers as rational numbers, i.e., the ratio of two integers. For example, 0.1 would be $\frac{1}{10}$. We could then express all of the above operations as operations on ratios, e.g., $\frac{a}{b} + \frac{c}{d}$ is $\frac{ad+cb}{bd}$. Since all the values are integers, that computation produces an exact result!
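As a worked instance of that formula, using the values from the motivating example (the final reduction to lowest terms is shown only for illustration):

$$\frac{1}{10} + \frac{2}{10} = \frac{1 \cdot 10 + 2 \cdot 10}{10 \cdot 10} = \frac{30}{100} = \frac{3}{10}$$

Comparisons can be kept in integers the same way. Assuming the denominators $b$ and $d$ are positive, $\frac{a}{b} \le \frac{c}{d}$ exactly when $ad \le cb$. Here that check is $30 \cdot 10 \le 3 \cdot 100$, i.e., $300 \le 300$, which holds.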

Using the tools already in our toolbox, we could implement our rational number as a tuple of the numerator and denominator, i.e.,

r1 = (1, 10)
r2 = (2, 10)

and addition as a function, e.g.

def add(left, right):
    return (left[0]*right[1] + right[0]*left[1], left[1]*right[1])

add(r1, r2)

(30, 100)

Doing so is absolutely workable, but it requires anyone using that “rational” tuple to know that the first element is the numerator and the second is the denominator. We also can’t use any of Python’s built-in operations in the way we might want; for example, + performs tuple concatenation. That is, we don’t have a very effective “abstraction” for rational numbers.

r1 + r2

(1, 10, 2, 10)

Instead let’s create a Rational class, that is, a Rational data type that encapsulates the numerator and denominator and any associated operations. See the linked Python file for the complete implementation with docstrings, etc.

# Define a class named Rational (we capitalize class names to distinguish them from functions, etc.)
# Note that classes need docstrings too (see linked file)!
class Rational:
    # Define an initializer that sets the numerator and denominator instance variables. It too needs a
    # docstring, but note we don't include a return value since it does not return. We also
    # don't document the self parameter since it already has a defined role in the Python
    # language specification.
    def __init__(self, numerator, denominator):
        # Initialize the object instance variables
        self.numerator = numerator
        self.denominator = denominator

Above we have defined the Rational class. Within that class, any function definitions are the “methods” of that class. We have started with the “initializer”, __init__, a special method used to set up new instances (objects) of a class. When we invoke the function with the same name as the class (type), we are invoking the __init__ method.

r1 = Rational(1, 10)

We can create new instances of Rational just as we created new integers, etc. (that is, when we invoked int, list, etc. we were creating new instances of those classes). Behind the scenes, Python is creating a new empty object and invoking the initializer, __init__. The self parameter is a special parameter that is always first in the parameter list and is automatically set by Python to be a reference to the newly created (still empty at this point) object. The initializer takes self, the reference to that newly created object, and defines and sets the numerator and denominator “instance variables” or “attributes” of that object. When it is done, we have a fully initialized Rational object with its own unique numerator and denominator instance variables. Check this code out in Python Tutor.

We can access those instance variables using the “dot” syntax.

r1 = Rational(1, 10)
r1.numerator
r1.denominator

r2 = Rational(2, 10)
r2.numerator
r2.denominator
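To make the idea of "its own unique instance variables" concrete, here is a small sketch. It uses the built-in vars function, which is not otherwise used in these notes, to show the attributes stored in each object.

```python
class Rational:
    def __init__(self, numerator, denominator):
        # self refers to the new object being initialized
        self.numerator = numerator
        self.denominator = denominator

r1 = Rational(1, 10)
r2 = Rational(2, 10)

# Each object stores its own instance variables
print(vars(r1))  # {'numerator': 1, 'denominator': 10}
print(vars(r2))  # {'numerator': 2, 'denominator': 10}

# Changing one object does not affect the other
r1.numerator = 7
print(r1.numerator, r2.numerator)  # 7 2

# Both objects are instances of the same class (type)
print(type(r1) is Rational, isinstance(r2, Rational))  # True True
```

The class is the single blueprint; r1 and r2 are two separate objects built from it.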

To implement our motivating example above, we need to support addition and comparison. Let’s start with the former. We implement an add method that will add two Rational numbers, via r1.add(r2). Notice the special self parameter we also used in the __init__ method. It is always the first parameter for methods, and is automatically set to the object on which the method is invoked, i.e., r1 in the example above. The other parameter will be a reference to r2. We perform the addition and return a new Rational object with the appropriate numerator and denominator.

class Rational:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def add(self, other):
        """Return a new Rational that is the sum of this and other"""
        numerator = self.numerator * other.denominator + other.numerator * self.denominator
        denominator = self.denominator * other.denominator
        return Rational(numerator, denominator)

r1 = Rational(1, 10)
r2 = Rational(2, 10)
r3 = r1.add(r2)
r3

<__main__.Rational at 0x7f81c86a1300>

r3.numerator
r3.denominator

Overriding operators

If we attempt any comparisons of these Rational objects, however, we will get unexpected results:

r4 = Rational(3, 10)
r3 == r4

False

r5 = Rational(30, 100)
r3 == r5

False

Why is that? Don’t r3 and r4, and especially r3 and r5, represent the same value? They do, but at the moment Python doesn’t know that. It doesn’t know how Rational objects should be compared and so it compares them based on their location in memory, not the value they represent. Since those are distinct objects (at distinct locations in memory) they compare as different, i.e., False. To implement the appropriate comparison we need to “overload” equality to be specific to Rational. Python actually lets us overload quite a few operators. In our case, we will overload ==, <= and + (so we can implement our motivating example), and override __str__ to customize how Rational objects are displayed. We overload these operators by implementing the special “dunder” methods __eq__, __le__, and __add__ in our class. As we noted in previous classes, Python calls these methods automatically when performing ==, etc.

class Rational:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def __eq__(self, other):
        return self.numerator * other.denominator == other.numerator * self.denominator 

    def __le__(self, other):
        return self.numerator * other.denominator <= other.numerator * self.denominator

    def __add__(self, other):
        numerator = self.numerator * other.denominator + other.numerator * self.denominator
        denominator = self.denominator * other.denominator
        return Rational(numerator, denominator)

We can now correctly perform the computation in our motivating example! By implementing a class and overriding those operators we have “abstracted away” the details of operations on rational numbers. We can implement our code just as if we were using floats, but we get the correct results!

Rational(1, 10) + Rational(2, 10) <= Rational(3, 10)

True
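The following sketch spells out what Python does, in effect, when it evaluates those operators. The comments on the cross-multiplication are an added explanation, and the <= comparison as written assumes positive denominators.

```python
class Rational:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def __eq__(self, other):
        # a/b == c/d exactly when a*d == c*b
        return self.numerator * other.denominator == other.numerator * self.denominator

    def __le__(self, other):
        # a/b <= c/d exactly when a*d <= c*b, assuming b and d are positive
        return self.numerator * other.denominator <= other.numerator * self.denominator

    def __add__(self, other):
        numerator = self.numerator * other.denominator + other.numerator * self.denominator
        denominator = self.denominator * other.denominator
        return Rational(numerator, denominator)

r1 = Rational(1, 10)
r2 = Rational(2, 10)

total = r1 + r2             # Python calls r1.__add__(r2) for us
explicit = r1.__add__(r2)   # the same call written out by hand
print(total.numerator, total.denominator)  # 30 100
print(total == explicit)          # True, same value
print(total is explicit)          # False, two distinct objects
print(total == Rational(3, 10))   # True, 30*10 == 3*100
print(total <= Rational(3, 10))   # True
```

Note that == now compares values, while is still compares identity (location in memory).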

Much as Python didn’t know how to compare Rational objects by default, it doesn’t know how to print them either, so it defaults to printing the class name and the object’s location in memory:

print(r1)

<__main__.Rational object at 0x7f81c86a20e0>

By overriding the __str__ method in the Rational class, as below, we get a more useful output:

def __str__(self):
    return str(self.numerator) + "/" + str(self.denominator)

r1 = Rational(1, 10)
print(r1)

1/10
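For reference, here is a sketch of where that method sits. It is indented inside the class body alongside __init__, and both print and str make use of it. The other methods are omitted here only for brevity.

```python
class Rational:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def __str__(self):
        # Called by print() and str() to obtain the display string
        return str(self.numerator) + "/" + str(self.denominator)

r1 = Rational(1, 10)
print(r1)                   # 1/10
print("r1 is " + str(r1))   # r1 is 1/10
```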

Reuse through inheritance

A common use case for precise arithmetic is dealing with currency (that is one of the motivations for the decimal module we saw above). We would like to avoid rounding errors when there is money at stake! US dollars could be very naturally represented as a Rational where the numerator is the number of cents and the denominator is 100. But instead of displaying currency as “3/10” we would like to display it as something like “$0.30”. That is we want a slightly customized version of Rational.

We could certainly copy all of the code for Rational into a new class Dollar, customizing the methods we want, but that seems like poor design and style (all that copying should make us nervous). Instead we recognize that a Dollar is a Rational number, and thus the Dollar class can derive (or inherit) from the Rational class. By doing so, it “inherits” all the instance variables and methods from Rational and then overrides just those that need to be different.

# The Dollar class is derived from Rational, that is Rational is the "base" class (sometimes called 
# "parent" class) and Dollar the "derived" (or "child") class. As a result, Dollar inherits all of 
# Rational's instance variables and methods.
class Dollar(Rational):
    def __init__(self, cents):
        # Use `super()` to call `Rational`'s initializer, passing through the cents as the numerator
        super().__init__(cents, 100)
    
    def __str__(self):
        # Use format method to specify 0 padding up to two digits for cents
        return "${}.{:02d}".format(self.numerator // 100, self.numerator % 100)

Via inheritance we get to reuse all the capabilities of Rational, but with the customized initializer that has an assumed denominator (of 100 cents) and the customized string generation.

d1 = Dollar(10)
d1

<__main__.Dollar at 0x7f81c870b8e0>

d2 = Dollar(10) + Dollar(20)
d2 == Dollar(30)
print(d2)
type(d2)

True

3000/10000

__main__.Rational

That mostly looks like what we wanted, but we also notice that when we printed d2, we got the Rational version. Why is that? d2 is actually a Rational object not a Dollar. The root cause is the __add__ method, which returns a Rational. We would like to return a Rational when self is a Rational object and Dollar when self is a Dollar object. To do so we modify that method to return an instance of the same type as self, e.g.,

def __add__(self, other):
    numerator = self.numerator * other.denominator + other.numerator * self.denominator
    denominator = self.denominator * other.denominator
    return self.__class__(numerator, denominator)

However, when we do so we get an error because the Dollar initializer only takes one argument, but __add__ provides two arguments: the numerator and denominator. We could fix that by using an optional argument that defaults to 100. We will talk more about optional arguments (arguments with default values) later in the semester, but as the syntax suggests, if we don’t provide a value for denominator it defaults to 100.

def __init__(self, cents, denominator=100):
    # Use `super()` to call `Rational`'s initializer
    super().__init__(cents, denominator)

Now we can print d2, the sum of 10 and 20 cents:

d2 = Dollar(10) + Dollar(20)
print(d2)

$30.00

Except we don’t get the right value! We expected $0.30, not $30.00. Why? Our __str__ method in Dollar assumes the denominator is 100, but addition produces a larger denominator (here 3000/10000). How could we fix that? For Dollar the denominator produced by addition will always be a multiple of 100, so we could use the initializer to re-normalize the denominator every time we create a new object, e.g.

def __init__(self, cents, denominator=100):
    # Reduce numerator to have 100 as the denominator, so the values stay "normalized"
    normalize = (denominator // 100)
    super().__init__(cents // normalize, 100)

Now we get the correct result and can take advantage of all the capabilities of Rational without needing to re-implement those methods. The result is much more concise and maintainable!
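Putting the pieces of this section together, a minimal sketch of the two classes might look like the following. It includes only the methods needed for the example, so the complete linked implementation may differ.

```python
class Rational:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def __eq__(self, other):
        return self.numerator * other.denominator == other.numerator * self.denominator

    def __add__(self, other):
        numerator = self.numerator * other.denominator + other.numerator * self.denominator
        denominator = self.denominator * other.denominator
        # Build the result with the same class as self (Rational or Dollar)
        return self.__class__(numerator, denominator)

    def __str__(self):
        return str(self.numerator) + "/" + str(self.denominator)


class Dollar(Rational):
    def __init__(self, cents, denominator=100):
        # Rescale so that the stored denominator is always 100
        normalize = denominator // 100
        super().__init__(cents // normalize, 100)

    def __str__(self):
        return "${}.{:02d}".format(self.numerator // 100, self.numerator % 100)


d2 = Dollar(10) + Dollar(20)
print(d2)                                 # $0.30
print(type(d2).__name__)                  # Dollar
print(d2 == Dollar(30))                   # True
print(Dollar(250) + Dollar(175))          # $4.25
print(Rational(1, 10) + Rational(2, 10))  # 30/100
```

The last line shows that plain Rational objects are unaffected: __add__ still returns a Rational when self is a Rational.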

Make sure to check out the complete implementation which includes docstrings and additional comments, explanation, etc.

Python inheritance vocabulary

In the above example, Dollar derives or inherits from Rational. We would describe Rational as the base or parent class, and Dollar as the derived or child class. The derived class inherits all the methods and instance variables of its base class(es) (a class can have “grandparent” classes, “great grandparent” classes, etc., forming a chain). When you invoke a method on an object or access an instance variable, Python first searches that class for the relevant method or instance variable, then its base class, continuing recursively “up the chain” until the method/instance variable is found or there are no more base classes. Thus methods in a derived class override methods of the same name in parent classes. That is, invoking the __str__ method on a Dollar object executes the __str__ method defined in Dollar, while invoking __add__ executes the method defined in Rational because there is no implementation of __add__ in Dollar (and thus Python finds it in Rational).
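We can inspect that search order directly. The sketch below uses the __mro__ attribute and the built-in vars function, neither of which appears elsewhere in these notes, with the first version of Dollar from above.

```python
class Rational:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def __add__(self, other):
        numerator = self.numerator * other.denominator + other.numerator * self.denominator
        denominator = self.denominator * other.denominator
        return Rational(numerator, denominator)

    def __str__(self):
        return str(self.numerator) + "/" + str(self.denominator)


class Dollar(Rational):
    def __init__(self, cents):
        super().__init__(cents, 100)

    def __str__(self):
        return "${}.{:02d}".format(self.numerator // 100, self.numerator % 100)


# The order in which Python searches classes for a method name
print([cls.__name__ for cls in Dollar.__mro__])  # ['Dollar', 'Rational', 'object']

# __str__ is defined in Dollar itself, so that version is found first
print("__str__" in vars(Dollar))  # True

# __add__ is not defined in Dollar, so the search continues to Rational
print("__add__" in vars(Dollar))           # False
print(Dollar.__add__ is Rational.__add__)  # True

# A Dollar object is also a Rational
print(isinstance(Dollar(30), Rational))  # True
```

The final object entry is the class at the top of every chain in Python, which is where the default comparison and printing behavior we saw earlier comes from.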

Note that there are many different ways to implement OOP and inheritance. Other languages, such as Java (which you will learn in CS201), have slightly different approaches, but the core ideas are similar.

So when is OOP useful?

That is a hard question to answer in the abstract… but I would say that anytime you have a single logical entity, e.g., a rational number, that requires multiple pieces of data to represent, e.g., a numerator and denominator, OOP could be a useful tool. That is, we could benefit from creating a new data type for that logical entity. Another example would be calendar dates. A date is a single logical entity, but encoding a date requires multiple pieces of information, e.g., some combination of year, day and/or month. And we would like to perform computations on dates, e.g., time elapsed between two dates, without needing to know the underlying encoding. Implementing a Date class, with methods for those operations, is a very natural approach.
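As a rough illustration only, a Date class along those lines might look like the sketch below. The class, its year/month/day attributes, and the _ordinal helper are all hypothetical names chosen for this example. It leans on the standard library's datetime.date for the calendar arithmetic rather than implementing it by hand.

```python
import datetime

class Date:
    """Hypothetical calendar date stored as a year, month and day."""

    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day

    def _ordinal(self):
        # Delegate calendar arithmetic to the standard library
        return datetime.date(self.year, self.month, self.day).toordinal()

    def __eq__(self, other):
        return self._ordinal() == other._ordinal()

    def __le__(self, other):
        return self._ordinal() <= other._ordinal()

    def __sub__(self, other):
        """Return the number of days elapsed from other to this date."""
        return self._ordinal() - other._ordinal()

    def __str__(self):
        return "{:04d}-{:02d}-{:02d}".format(self.year, self.month, self.day)

start = Date(2024, 2, 28)
end = Date(2024, 3, 1)
print(end - start)   # 2 (2024 is a leap year)
print(start <= end)  # True
print(start)         # 2024-02-28
```

Code that uses Date can subtract and compare dates without knowing how they are encoded, which is the same kind of abstraction Rational gave us for fractions.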

Adapted from Alvarado et al. (2019).

Alvarado, Christine, Ran Libeskind-Hadas, Geoffrey H. Kuenning, and Zachary Dodds. 2019. CS for All: An Introduction to Computer Science Using Python. Franklin, Beedle & Associates Incorporated.