This series of posts is for those who have used a programming language but are not familiar with Python. The first six posts introduced the language. The last post discussed installing and using Python.
In this post we’ll start exploring how to create and use our own data types.
Many programming languages provide for user-defined data types. In some languages, these data types have capabilities (aka methods) for dealing with their data. Other languages only allow user-defined data structures; code that operates on those structures is separate.
Python is the former type of language. We can define new data types along with methods that act on their data. We’ve seen some examples in passing (in part 3 especially). Now we’ll dig deeper.
Let’s imagine we’re writing software that generates 3D graphs. We’ll be dealing with lots of [x,y,z] points in these graphs, so it would be handy if we had a single “3D point” data type that embodied those three (float) values.
One way to implement this uses a list or tuple (depending on whether we want our point objects to be mutable or immutable):
002|
003| # Using a list…
004| pt = [2.1, 4.2, 6.3]
005|
006| print(f'point {pt}')
007| print(f'. x = {pt[0]}')
008| print(f'. y = {pt[1]}')
009| print(f'. z = {pt[2]}')
010| print()
011|
012| # Change the values…
013| pt[0] = 2.7 # x
014| pt[1] = 3.1 # y
015| pt[2] = 5.5 # z
016|
017| print(f'point {pt}')
018| print(f'. x = {pt[0]}')
019| print(f'. y = {pt[1]}')
020| print(f'. z = {pt[2]}')
021| print()
022|
023| # Using a tuple…
024| pt = (2.1, 4.2, 6.3)
025|
026| print(f'point {pt}')
027| print(f'. x = {pt[0]}')
028| print(f'. y = {pt[1]}')
029| print(f'. z = {pt[2]}')
030| print()
031|
Access to the elements by index is the same for both, but we can only alter items in a list.
When run, this prints:
point [2.1, 4.2, 6.3]
. x = 2.1
. y = 4.2
. z = 6.3
point [2.7, 3.1, 5.5]
. x = 2.7
. y = 3.1
. z = 5.5
point (2.1, 4.2, 6.3)
. x = 2.1
. y = 4.2
. z = 6.3
This works, but accessing x, y, or z via the indexes [0], [1], and [2] is messy and error prone.
We can improve on it by using a dictionary:
002|
003| # Using the dict constructor…
004| pt = dict(x=2.1, y=4.2, z=6.3)
005|
006| # Using a dictionary literal…
007| pt = {"x":2.1, "y":4.2, "z":6.3}
008|
009| print(f'point {pt}')
010| print(f'. x = {pt["x"]}')
011| print(f'. y = {pt["y"]}')
012| print(f'. z = {pt["z"]}')
013| print()
014|
015| # Change the values…
016| pt["x"] = 2.7 # x
017| pt["y"] = 3.1 # y
018| pt["z"] = 5.5 # z
019|
020| print(f'point {pt}')
021| print(f'. x = {pt["x"]}')
022| print(f'. y = {pt["y"]}')
023| print(f'. z = {pt["z"]}')
024| print()
025|
Lines #4 and #7 are two ways of creating the same dictionary object, either by using the dict constructor, or by using a dictionary literal. Other than the syntax, they are identical in effect.
When run, this prints:
point {'x': 2.1, 'y': 4.2, 'z': 6.3}
. x = 2.1
. y = 4.2
. z = 6.3
point {'x': 2.7, 'y': 3.1, 'z': 5.5}
. x = 2.7
. y = 3.1
. z = 5.5
This works better — it’s clearer which element we mean — but the ["x"] syntax isn’t ideal. Something easier to type and read would be nice.
In Python, the class keyword begins the definition of a new data type (aka class). Using a class, we can bundle our point elements into a single package:
002|
003| class Point:
004|     '''A 3D XYZ Point data structure.'''
005|
006|     def __init__ (self):
007|         '''New instance.'''
008|         self.x = 0.0
009|         self.y = 0.0
010|         self.z = 0.0
011|
012|
013| # Create a Point object…
014| pt = Point()
015| pt.x = 2.1
016| pt.y = 4.2
017| pt.z = 6.3
018|
019| # Print point values…
020| print(f'point {pt}')
021| print(f'. x={pt.x}')
022| print(f'. y={pt.y}')
023| print(f'. z={pt.z}')
024| print()
025|
026| # Change the values…
027| pt.x = 2.7
028| pt.y = 3.1
029| pt.z = 5.5
030|
031| print(f'point {pt}')
032| print(f'. x={pt.x}')
033| print(f'. y={pt.y}')
034| print(f'. z={pt.z}')
035| print()
036|
Lines #3 to #10 define a new class named Point. Line #3 begins the definition; everything indented after is part of the definition. Line #4 provides a documentation string describing the new class. Triple-quoted strings can span multiple lines, so this doc string is a good place to describe the class in detail.
Lines #6 to #10 define a function belonging to the class. Such functions are called methods (but they are functions). A method called on an instance always receives an initial parameter, conventionally named self. This is a reference to the instance object itself. Line #7 is a doc string describing the function.
We name the function __init__ (“dunder init”) because that name is special. Methods whose names begin and end with double underbars are special method names defined by Python. If a class defines one, Python calls it in certain contexts.
For example, Python calls the __init__ method when creating a new instance object of a class. It’s the “initialize” method that defines the new object’s properties and otherwise prepares the object for use. If we don’t define this method, Python calls a “do nothing” default.
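To see the automatic call in action, here is a minimal sketch. The Tracker and Bare classes and the calls list are made-up names used only for this illustration; they are not part of the Point example.

```python
calls = []  # hypothetical log, used only for this illustration

class Tracker:
    '''Records each time Python runs dunder init.'''

    def __init__(self):
        calls.append('init ran')
        self.ready = True

class Bare:
    '''Defines no dunder init, so the inherited default is used.'''

t = Tracker()    # Python calls Tracker.__init__(t) for us
b = Bare()       # works fine; the default adds no attributes

print(calls)     # ['init ran']
print(t.ready)   # True
print(vars(b))   # {}
```

Note that we never call dunder init ourselves; putting parentheses after the class name is enough.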
(We can define our own methods with names that have leading and/or trailing underbars, but there are some caveats, so it’s an advanced topic for later. For now, any “non-special” methods we create should have ordinary names.)
Lines #8 to #10 each create a new attribute on the new instance object that self refers to. These attributes are named x, y, and z, respectively, and their values are all set to 0.0. When dunder init finishes, the new object has those three attributes.
Line #14 creates a new Point object using the Point class as a constructor (by putting parentheses after the name). This automatically invokes dunder init and creates the three attributes. Lines #15 to #17 set their respective values (as do lines #27 to #29).
When run, this prints:
point <__main__.Point object at 0x000002380FFE2900>
. x=2.1
. y=4.2
. z=6.3
point <__main__.Point object at 0x000002380FFE2900>
. x=2.7
. y=3.1
. z=5.5
It would be nice if we didn’t have to set the point object element by element after creating it (as in lines #14 to #17 above).
We can do this by defining input parameters for dunder init:
002|
003| class Point:
004|     '''A 3D XYZ Point data structure.'''
005|
006|     def __init__ (self, x=0.0, y=0.0, z=0.0):
007|         '''New instance.'''
008|         self.x = x
009|         self.y = y
010|         self.z = z
011|
012|
013| # Create a Point object…
014| pt = Point(2.1, 4.2, 6.3)
015|
016| # Print point values…
017| print(f'point {pt}')
018| print(f'. x={pt.x}')
019| print(f'. y={pt.y}')
020| print(f'. z={pt.z}')
021| print()
022|
023| # Change the values…
024| pt.x = 2.7
025| pt.y = 3.1
026| pt.z = 5.5
027|
028| print(f'point {pt}')
029| print(f'. x={pt.x}')
030| print(f'. y={pt.y}')
031| print(f'. z={pt.z}')
032| print()
033|
With these default values for the parameters (line #6), new Point objects still default to [0.0, 0.0, 0.0], but we can override any of the defaults, either positionally (in x, y, z order) or as keyword arguments such as z=6.3 (in any order). This makes initializing new point objects simpler.
When run, this prints the same thing as the code above.
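The different ways of supplying values to dunder init can be sketched as follows. The variable names origin, only_x, and named are just for illustration.

```python
class Point:
    '''A 3D XYZ Point data structure (same dunder init as above).'''

    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x = x
        self.y = y
        self.z = z

origin = Point()               # all defaults
only_x = Point(2.1)            # positional: fills x first, then y, then z
named = Point(z=6.3, x=2.1)    # keywords: any order; y keeps its default

print(origin.x, origin.y, origin.z)   # 0.0 0.0 0.0
print(only_x.x, only_x.y, only_x.z)   # 2.1 0.0 0.0
print(named.x, named.y, named.z)      # 2.1 0.0 6.3
```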
Note how the Point object was printed:
point <__main__.Point object at 0x000001A63D822900>
This is Python’s default way to print an object if the object doesn’t define its own print methods. Python uses the full class name (which includes the module name — in this case __main__) and the object’s memory address.
However, we can define either or both of two dunder methods that control how Python prints an object. Firstly, the dunder str method (“dunder string”), which should return a “nice” string — the “informal” way to display the object.
Secondly, the dunder repr method (“dunder rep-per”) returns a “representation” or “debug” string version of the object (Python’s default way of printing objects uses a default dunder repr method). In some cases, this string allows reconstruction of the object.
002|
003| class Point:
004|     '''A 3D XYZ Point class.'''
005|
006|     def __init__ (self, x=0.0, y=0.0, z=0.0):
007|         '''New instance.'''
008|         self.x = x
009|         self.y = y
010|         self.z = z
011|
012|     def __repr__ (self):
013|         '''Return a debug string.'''
014|         return f'<{type(self).__name__} @{id(self):012x}>'
015|
016|     def __str__ (self):
017|         '''Return a string version.'''
018|         return f'[{self.x:.3f}, {self.y:.3f}, {self.z:.3f}]'
019|
020| if __name__ == '__main__':
021|     print()
022|
023|     # Create a Point object…
024|     pt = Point(2.1, 4.2, 6.3)
025|     print(pt)
026|     print()
027|
028|     # Print point values…
029|     print(f'point {str(pt)}')
030|     print(f'point {repr(pt)}')
031|     print()
032|     print(f'point {pt!s} {pt!r}')
033|     print(f'. x={pt.x}')
034|     print(f'. y={pt.y}')
035|     print(f'. z={pt.z}')
036|     print()
037|
038|     # Change the values…
039|     pt.x = 2.7
040|     pt.y = 3.1
041|     pt.z = 5.5
042|
043|     print(f'point {pt}')
044|     print(f'point {pt=}')
045|     print()
046|     print(f'point {pt!r}')
047|     print(f'point {pt=!s}')
048|     print()
049|
Lines #12 to #14 define the dunder repr method; lines #16 to #18 define the dunder str method. Python calls these when it wants a string version of the object to print. Which one it calls depends on the context. In a print statement, Python calls dunder str if the class defines it. Otherwise, it calls dunder repr (using the default if the class doesn’t define its own version).
The built-in str function (line #29) and the built-in repr function (line #30) access these respective strings. We can also access them with !s and !r codes in f-strings (line #32). Generally, the syntax {pt} (line #43) returns the informal print string whereas the syntax {pt=} (line #44) returns the debug string. Alternately, {pt!r} (line #46) returns the debug string and {pt=!s} (line #47) returns the informal string.
Note that we put our test code under the if statement on line #20 so we can import the Point class from another module without the test code executing. (See part 7 for details.)
When run, this prints:
[2.100, 4.200, 6.300]
point [2.100, 4.200, 6.300]
point <Point @017a2ad92900>
point [2.100, 4.200, 6.300] <Point @017a2ad92900>
. x=2.1
. y=4.2
. z=6.3
point [2.700, 3.100, 5.500]
point pt=<Point @017a2ad92900>
point <Point @017a2ad92900>
point pt=[2.700, 3.100, 5.500]
Note how lines #46 and #47 reverse the output of lines #43 and #44.
Canonically, the str (“string”) version is meant to be the “nice” way to print the object while the repr (“rep-per”) version is meant for persistence or debugging. That said, the content of the strings returned by dunder str or dunder repr is up to the class designer.
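Two related behaviors are worth seeing in a small sketch: containers such as lists use repr for their items even when the list itself is converted with str, and str falls back to dunder repr when a class defines no dunder str. The ReprOnly class is a hypothetical helper for this illustration, and the repr format here is just one possible choice.

```python
class Point:
    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z

    def __repr__(self):
        return f'Point({self.x}, {self.y}, {self.z})'

    def __str__(self):
        return f'[{self.x:.3f}, {self.y:.3f}, {self.z:.3f}]'

class ReprOnly:
    '''Hypothetical class that defines only dunder repr.'''
    def __repr__(self):
        return '<ReprOnly>'

pt = Point(2.1, 4.2, 6.3)
print(str(pt))           # [2.100, 4.200, 6.300]
print(repr(pt))          # Point(2.1, 4.2, 6.3)
print(str([pt]))         # [Point(2.1, 4.2, 6.3)]  (list items use repr)
print(str(ReprOnly()))   # <ReprOnly>  (str falls back to repr)
```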
Let’s take a closer look at these methods:
002|
003| class Point:
004|     '''A 3D XYZ Point class.'''
005|
006|     def __init__ (self, x=0.0, y=0.0, z=0.0):
007|         '''New instance.'''
008|         self.x = x
009|         self.y = y
010|         self.z = z
011|
012|     def __str__ (self):
013|         '''Return a string version.'''
014|         x = self.x
015|         y = self.y
016|         z = self.z
017|         s = f'[{x:.3f}, {y:.3f}, {z:.3f}]'
018|         return s
019|
020|     def __repr__ (self):
021|         '''Return a debug string.'''
022|         classname = type(self).__name__
023|         object_id = id(self)
024|         s = f'<{classname} @{object_id:012x}>'
025|         return s
026|
I reversed the order — putting dunder str first — to point out that methods can be defined within a class in any order. Many programmers have a preferred organizational order, but that has no effect on how the class works.
I broke dunder str (lines #12 to #18) into separate lines. Lines #14 to #16 copy the self.x, self.y, and self.z values into corresponding variables local to the method. Line #17 uses an f-string (“formatted string”) to create the desired output string and line #18 returns it. In the f-string, the :.3f codes specify that the number should be printed with three digits after the decimal point (the “f” stands for “fixed-point”).
This produces output like:
[0.000, 0.000, 0.000]
Likewise, I broke dunder repr (lines #20 to #25) into separate lines. Line #22 uses the built-in type function to get the object’s class, which (like all classes) has a __name__ attribute (in this case “Point”). This lets the output string contain the name without coding it explicitly into the string. Line #23 uses the built-in id function to get the object’s “identity” — an integer that is usually the object’s address in memory. Line #24 uses another f-string to create the return string, and line #25 returns it. The :012x code for the id specifies a hexadecimal representation 12 digits wide with leading zeroes if necessary.
This produces output like:
<Point @025761342900>
Which is as (un)informative as Python’s default version but a bit more concise.
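The two format codes used above are easy to try on their own. This short sketch uses arbitrary sample values rather than a Point object:

```python
value = 3.14159

fixed = f'{value:.3f}'     # three digits after the decimal point (rounded)
padded = f'{2.5:.3f}'      # trailing zeroes are added as needed
hex12 = f'{255:012x}'      # hexadecimal, 12 wide, padded with leading zeroes
hex_plain = f'{255:x}'     # hexadecimal with no width or padding

print(fixed)       # 3.142
print(padded)      # 2.500
print(hex12)       # 0000000000ff
print(hex_plain)   # ff
```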
Some designers use dunder repr to return strings that can later be used to recreate the object:
002|
003| class Point:
004|     '''A 3D XYZ Point class.'''
005|
006|     def __init__ (self, x=0.0, y=0.0, z=0.0):
007|         '''New instance.'''
008|         self.x = x
009|         self.y = y
010|         self.z = z
011|
012|     def __str__ (self):
013|         '''Return a string version.'''
014|         return f'[{self.x:.3f}, {self.y:.3f}, {self.z:.3f}]'
015|
016|     def __repr__ (self):
017|         '''Return a debug string.'''
018|         classname = type(self).__name__
019|         return f'{classname}(x={self.x}, y={self.y}, z={self.z})'
020|
021|
022| if __name__ == '__main__':
023|     print()
024|
025|     # Create new Point object…
026|     pt = Point(2.12, 4.21, 6.35)
027|     print(pt)
028|     print(id(pt))
029|     print()
030|
031|     # Get string representation…
032|     pt_string = repr(pt)
033|     print(pt_string)
034|     print()
035|
036|     # Create duplicate Point object…
037|     pt2 = eval(pt_string)
038|     print(pt2)
039|     print(id(pt2))
040|     print()
041|
We use dunder repr to create a string that’s the same as when we use the class name as a constructor to create a new point:
Point(x=2.12, y=4.21, z=6.35)
On line #32 we save the repr string into the pt_string variable. (We can imagine that we save it to a file and at some later time reload it into the variable.) Regardless, the string looks like a line of Python source code: one that creates a new Point object with specific values.
That means, on line #37, we can use Python’s built-in eval function, which takes a string containing a Python expression, evaluates it, and returns the value. In this case we evaluate code that creates a new Point instance, so eval returns that new Point object.
When run, this prints:
[2.120, 4.210, 6.350]
2017356491344
Point(x=2.12, y=4.21, z=6.35)
[2.120, 4.210, 6.350]
2017356430736
Note how the duplicated Point object, pt2, has a different id value than the pt object. That shows that these are different objects.
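A compact sketch of the same round trip, checking the result with the is operator instead of comparing id values by eye. One caution: eval runs whatever code the string contains, so this is only reasonable for strings we produced ourselves, never for untrusted input.

```python
class Point:
    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z

    def __repr__(self):
        classname = type(self).__name__
        return f'{classname}(x={self.x}, y={self.y}, z={self.z})'

pt = Point(2.12, 4.21, 6.35)
pt2 = eval(repr(pt))    # only safe because we built the string ourselves

same_object = pt2 is pt
same_values = (pt2.x, pt2.y, pt2.z) == (pt.x, pt.y, pt.z)

print(same_object)   # False
print(same_values)   # True
```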
Alternately, we might use dunder repr to create JSON-like strings:
002|
003| class Point:
004|     '''A 3D XYZ Point class.'''
005|
006|     def __init__ (self, x=0.0, y=0.0, z=0.0):
007|         '''New instance.'''
008|         self.x = x
009|         self.y = y
010|         self.z = z
011|
012|     def __str__ (self):
013|         '''Return a string version.'''
014|         return f'[{self.x:.3f}, {self.y:.3f}, {self.z:.3f}]'
015|
016|     def __repr__ (self):
017|         '''Return a debug string.'''
018|         classname = type(self).__name__
019|         attrs = self.__dict__
020|         values = [f'"{name}":{attrs[name]}' for name in attrs]
021|         return f'{{{classname}:{{ {", ".join(values)}}}}}'
022|
023|
024| if __name__ == '__main__':
025|     print()
026|
027|     pt = Point(2.12, 4.21, 6.35)
028|     print(pt)
029|     print(repr(pt))
030|     print()
031|
When run, this prints:
[2.120, 4.210, 6.350]
{Point:{"x":2.12, "y":4.21, "z":6.35}}
Line #19 makes attrs a short alias for self.__dict__, the Python-created dictionary holding the instance’s attributes (which we created in lines #8 to #10). Line #20 builds a list of the attribute names and their values. Line #21 joins those with comma-separators to create the JSON-like string. (To be strictly valid JSON, the class name would need double quotes as well.)
This example jumps ahead on a number of points, but I wanted to demonstrate the flexibility of the dunder str and dunder repr methods.
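The hand-built string is JSON-like rather than strict JSON because the class name is unquoted. As an alternative sketch (not the approach used above), the standard library's json module can build the string for us, and json.loads can read it back:

```python
import json

class Point:
    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z

    def __repr__(self):
        '''Return a strict JSON string built by the json module.'''
        classname = type(self).__name__
        return json.dumps({classname: self.__dict__})

pt = Point(2.12, 4.21, 6.35)
text = repr(pt)
data = json.loads(text)    # parse the JSON back into dictionaries

print(text)            # {"Point": {"x": 2.12, "y": 4.21, "z": 6.35}}
print(data['Point'])   # {'x': 2.12, 'y': 4.21, 'z': 6.35}
```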
There is a symmetry between the dunder str and dunder repr methods and the built-in str and repr functions that invoke them. This isn’t the only place such symmetry exists:
002|
003| class Point:
004|     '''A 3D XYZ Point class.'''
005|
006|     def __init__ (self, x=0.0, y=0.0, z=0.0):
007|         '''New instance.'''
008|         self.x = x
009|         self.y = y
010|         self.z = z
011|
012|     def __repr__ (self):
013|         '''Return a representation string.'''
014|         return f'[{self.x}, {self.y}, {self.z}]'
015|
016|     def __bool__ (self):
017|         '''Return True or False.'''
018|         return any(self)
019|
020|     def __iter__ (self):
021|         '''Return an iterator over the elements.'''
022|         return iter([self.x, self.y, self.z])
023|
024|
025| if __name__ == '__main__':
026|     print()
027|
028|     pt0 = Point()
029|     pt1 = Point(2.12, 4.21, 6.35)
030|     print(pt0)
031|     print(pt1)
032|     print()
033|
034|     print(f'pt0 is {bool(pt0)}')
035|     print(f'pt1 is {bool(pt1)}')
036|     print()
037|
038|     if pt0: print('True! (0)')
039|     if pt1: print('True! (1)')
040|     print()
041|
042|     for element in pt1:
043|         print(element)
044|     print()
045|
046|     print(f'pt0 values: {list(pt0)}')
047|     print(f'pt1 values: {list(pt1)}')
048|     print()
049|
050|     print(f'pt1 min: {min(pt1)}')
051|     print(f'pt1 max: {max(pt1)}')
052|     print(f'pt1 sum: {sum(pt1)}')
053|     print()
054|
We define dunder repr (lines #12 to #14) to ensure our Point objects are printable.
Lines #16 to #18 define the dunder bool method. Python calls this (if defined) when the object is in a Boolean context (for instance, as the conditional in an if statement or while loop). The method must return either True or False. The built-in bool function explicitly returns an object’s Boolean value.
Note that, if a class defines neither dunder bool nor dunder len, Python considers all its instance objects True by default.
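A minimal sketch of that default, next to a class that does define dunder bool. Both Plain and Flag are hypothetical classes invented for this illustration.

```python
class Plain:
    '''Hypothetical class with no dunder bool (and no dunder len).'''

class Flag:
    '''Hypothetical class whose truth value follows its "on" attribute.'''

    def __init__(self, on):
        self.on = on

    def __bool__(self):
        return self.on

plain_value = bool(Plain())
off_value = bool(Flag(False))
on_value = bool(Flag(True))

print(plain_value)   # True
print(off_value)     # False
print(on_value)      # True
```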
In our dunder bool method, we use the built-in any function (line #18), which takes a list-like object and returns True if any list element is True, and False if all list elements are False.
That requires that Point objects be list-like. One way to do that is to define the dunder iter method (lines #20 to #22), which must return an iterator object. Python calls dunder iter (if defined) in list contexts, such as lines #42, #46 and #47, and also lines #50 to #52, which use the built-in functions min, max, and sum, all of which expect a list-like object. (See part 2 and part 3 for more on list-like objects.)
Line #22 makes a list of the three Point elements and passes that to the built-in iter function, which takes a list-like object and returns an iterator over that object.
To get a sense of iterators, try the following at an interactive Python prompt (see part 7 for details):
>>> stuff = [1, 2, 3]
>>>
>>> items = iter(stuff)
>>> next(items)
1
>>> next(items)
2
>>> next(items)
3
>>> next(items)
Traceback (most recent call last):
File "<pyshell#8>", line 1, in <module>
next(items)
StopIteration
>>>
Iterators cycle through items once. When they reach the end of the list, further attempts raise an Exception.
The process above is what goes on “under the hood” in Python for loops and other list contexts. Python gets an iterator and uses its internal version of the built-in next function to cycle through its elements until it gets a StopIteration Exception (which indicates the end of the list).
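As a rough sketch of that process, here is a for loop written out by hand with iter and next. Python's real implementation is internal, so treat this as an approximation of the idea rather than the actual machinery; the seen list is just for illustration.

```python
stuff = [1, 2, 3]

# Roughly what "for item in stuff: seen.append(item)" does behind the scenes.
seen = []
items = iter(stuff)           # ask the object for an iterator
while True:
    try:
        item = next(items)    # fetch the next element
    except StopIteration:     # the iterator is exhausted
        break
    seen.append(item)

print(seen)    # [1, 2, 3]
```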
To make our Point objects more useful we can take advantage of a set of special methods intended for designing numeric objects. These methods allow user-defined data types to handle a variety of math operations (such as add or multiply).
For example, we can implement member-wise addition of Point objects. Given two 3D points, P₁ = [x₁, y₁, z₁] and P₂ = [x₂, y₂, z₂], we want:
P₁ + P₂ = [x₁ + x₂, y₁ + y₂, z₁ + z₂]
Python allows us to handle three distinct situations:
P₃ = P₁ + P₂
P₁ += P₂
P₃ = X + P₂
The first line adds two Point objects and creates a new Point object with the result. The second line uses the += operator to add P₂ to P₁ and leave the result in P₁. The third line involves a non-Point object (for example, an int) that doesn’t know how to add with a Point object. In such cases, Python checks to see if Point objects know how to add themselves to whatever X is.
There are three dunder add methods for the three situations just described:
002|
003| class Point:
004|     '''3D X-Y-Z point.'''
005|
006|     def __init__ (self, x=0.0, y=0.0, z=0.0):
007|         '''New Point instance.'''
008|         self.x = x
009|         self.y = y
010|         self.z = z
011|
012|     def __repr__ (self):
013|         '''Return a representation string.'''
014|         return f'[{self.x:.2f}, {self.y:.2f}, {self.z:.2f}]'
015|
016|     def __add__ (self, other):
017|         '''Add two points, return sum in new Point.'''
018|         x = self.x + other.x
019|         y = self.y + other.y
020|         z = self.z + other.z
021|         return Point(x, y, z)
022|
023|     def __iadd__ (self, other):
024|         '''Add another point to self; return self.'''
025|         self.x += other.x
026|         self.y += other.y
027|         self.z += other.z
028|         return self
029|
030|     def __radd__ (self, other):
031|         '''Add scalar number to self; return new Point.'''
032|         if isinstance(other,int) or isinstance(other,float):
033|             x = self.x + other
034|             y = self.y + other
035|             z = self.z + other
036|             return Point(x, y, z)
037|         raise TypeError(f"Can't add a {type(other).__name__}!")
038|
039|
040| if __name__ == '__main__':
041|     print()
042|
043|     pt1 = Point()
044|     pt2 = Point(1.3, 0.5, 2.1)
045|     pt3 = Point(2.1, 4.2, 3.6)
046|     print(f'{pt1 = }')
047|     print(f'{pt2 = }')
048|     print(f'{pt3 = }')
049|     print()
050|
051|     print(f'{pt2 + pt3 = }') # invokes __add__
052|     print()
053|
054|     print(f'{3.333 + pt1 = }') # invokes __radd__
055|     print()
056|
057|     pt1 += pt2 # invokes __iadd__
058|     pt1 += pt2
059|     print(f'{pt1 = }')
060|     print()
061|
062|     print(f'{pt1+pt2+pt3 = }')
063|     print()
064|
We define dunder repr (lines #12 to #14) to ensure our Point objects are printable.
The dunder add method (lines #16 to #21) handles the first addition case, adding another object (accessed via the other parameter) to a Point object (accessed via the self parameter). We add the respective elements and use the sums to create and return a new Point object. For simplicity, we blindly assume other really is a Point object. If it isn’t, attempting to access the x, y, or z attributes will likely raise an Exception.
However, note that because Python uses “duck typing”, if — regardless of its actual data type — the object referenced by other does support x, y, and z attribute access, and assuming that access produces appropriate numeric data, the method works as expected.
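A small sketch of duck typing at work. It uses SimpleNamespace from the standard library's types module as a stand-in for "some other object that happens to have x, y, and z"; the offset name is made up for the example.

```python
from types import SimpleNamespace

class Point:
    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z

    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y, self.z + other.z)

pt = Point(1.0, 2.0, 3.0)
offset = SimpleNamespace(x=0.5, y=0.5, z=0.5)   # not a Point, but has x, y, z

result = pt + offset
print(result.x, result.y, result.z)    # 1.5 2.5 3.5

failed = False
try:
    pt + 5                 # an int has no x attribute
except AttributeError as err:
    failed = True
    print(f'failed: {err}')
```

The first addition works because dunder add only cares that other has the right attributes. The second fails inside dunder add because an int has none of them.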
Line #51 invokes dunder add by adding two Point objects. Line #62 invokes it twice, first to add pt1 and pt2 and then to add that result to pt3. The final result is printed.
The dunder iadd method (“in-place add”; lines #23 to #28) handles the second case, adding the elements of other to the elements of self (using the in-place add operator +=) and returns self.
Lines #57 and #58 invoke dunder iadd by adding pt2 to pt1 twice.
The dunder radd method (“right add”; lines #30 to #37) handles the third case where the left-hand object isn’t a Point object and doesn’t know how to add one. Python then tries the right-hand object to see if it knows what to do. Here, we check to see if other is an int or float, and if so, add it to each element and return the sums in a new Point instance. If other fails our check, we fall through and raise a TypeError Exception.
Line #54 invokes dunder radd by adding a float object (with value 3.333) to a Point object (pt1).
When run, this prints:
pt1 = [0.00, 0.00, 0.00]
pt2 = [1.30, 0.50, 2.10]
pt3 = [2.10, 4.20, 3.60]
pt2 + pt3 = [3.40, 4.70, 5.70]
3.333 + pt1 = [3.33, 3.33, 3.33]
pt1 = [2.60, 1.00, 4.20]
pt1+pt2+pt3 = [6.00, 5.70, 9.90]
There are corresponding dunder methods for subtraction, multiplication, division, as well as a variety of other mathematical operations. See the Special Method Names documentation for the list of special dunder methods.
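As a sketch of two of those other methods, here are dunder sub and dunder mul on a trimmed-down Point. Treating multiplication as scaling by a number is an assumption made for this example; a class designer could just as well define it member-wise between two points.

```python
class Point:
    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z

    def __repr__(self):
        return f'[{self.x:.2f}, {self.y:.2f}, {self.z:.2f}]'

    def __sub__(self, other):
        '''Member-wise subtraction; return a new Point.'''
        return Point(self.x - other.x, self.y - other.y, self.z - other.z)

    def __mul__(self, factor):
        '''Scale by a number (an assumed meaning for "*"); return a new Point.'''
        return Point(self.x * factor, self.y * factor, self.z * factor)

a = Point(3.0, 5.0, 7.0)
b = Point(1.0, 2.0, 3.0)

print(a - b)    # [2.00, 3.00, 4.00]
print(b * 2)    # [2.00, 4.00, 6.00]
```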
That’s plenty for this time. We’ll pick up here next week.
Link: Zip file containing all code fragments used in this post.
This post is: This is Python! (part 8)