Last time, Simple Tricks looked at some built-in Python functions. This time, we look at three built-in Python container classes, tuple, list, and dict, with a focus on designing useful sub-classes based on them.
We’ll explore the built-in __new__ and __init__ methods in detail along with some of the other built-in methods that help you to create rich new types. [The reader is assumed to be familiar with the basics of object-oriented programming.]
As a starting point, just about everything in Python, and certainly any data object, is an instance of a class that has object as its base class. (In Python, functions and other program elements also descend from object.)
While you can create instances of the object class:
>>> obj = object()
>>> print(obj)
<object object at 0x0000020065F84070>
>>> hex(id(obj))
'0x20065f84070'
>>>
They aren’t very useful. You cannot add attributes, for instance:
>>> obj = object()
>>> obj.x = 42
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    obj.x = 42
AttributeError: 'object' object has no attribute 'x'
>>>
But any class you create automatically has object as its base class. Note that in older versions of Python, these two class declarations were different:
002| …
003|
004| class my_class_2 (object):
005| …
006|
Now they’re the same. It’s no longer necessary to explicitly subclass from object.
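As a quick check of that claim, here is a minimal sketch. The class names implicit_base and explicit_base are hypothetical, made up for this illustration. Both declarations should report object as their only base class:

```python
class implicit_base:
    '''Hypothetical class with no explicit base.'''
    pass

class explicit_base(object):
    '''Hypothetical class that names object explicitly.'''
    pass

# Both classes end up with object as their only base...
print(implicit_base.__bases__)
print(explicit_base.__bases__)
print(issubclass(implicit_base, object))
```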
Let’s start with a minimal user-defined class:
001| class a_class:
002|     '''A basic user-defined class template.'''
003|
004|     def __new__ (cls):
005|         '''Create a new instance object.'''
006|         return super().__new__(cls)
007|
008|     def __init__ (self):
009|         '''Initialize new instance object.'''
010|         super().__init__()
011|
012|     def __str__ (self):
013|         '''Return a string version of the object.'''
014|         return "An object."
015|
016| obj = a_class()
017| print()
018| print(f'{obj}')
019| print(f'{repr(obj)}')
020| print(f'{type(obj).__name__}')
021| print(f'id={hex(id(obj))}')
022| print()
023|
024| obj.x = 42
025| obj.y = 21
026| print(f'x={obj.x}, y={obj.y}')
027| print()
028|
Objects of this class don’t do anything useful. They’re essentially instances of object with a do-nothing a_class wrapper around them. The intent of this template is to highlight the __new__ and __init__ built-in methods.
The first, __new__ (lines #4 to #6), isn’t a method you’ll need to implement often. It’s a special static method on the class that normally returns a new instance of the class. Note that the single required input parameter is the class (canonically named cls) rather than the object instance (canonically named self) that regular instance methods receive. Except in special cases, __new__ should return a new instance of cls.
Usually, the new instance is ultimately provided by object.__new__, so when you do implement __new__, you almost always need to call the parent __new__ method (line #6) to get that instance. Any parent classes between your class and object that implement __new__ need to do the same.
The second, __init__ (lines #8 to #10) is the method you’ll usually use to initialize your new object. In most cases, you’ll call the parent __init__ method and then add and initialize your own instance attributes. We’ll come back to this important method below. For now, just note it does not return anything. The single required input parameter is the newly created object.
Note that, if __new__ does not return an instance object of the class that was passed in, Python does not call the __init__ method. One use case for this is the singleton (anti-) pattern where you only want one shared instance of a class. I may revisit this in a future post.
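Here is a small sketch of that behavior. The class skips_init and its init_calls counter are hypothetical names invented for this example. Its __new__ deliberately returns a string instead of an instance, so __init__ should never run:

```python
class skips_init:
    '''Hypothetical class whose __new__ returns a non-instance.'''
    init_calls = 0

    def __new__(cls):
        # Return a plain string rather than an instance of cls...
        return 'not an instance'

    def __init__(self):
        # Not reached: __new__ did not return a skips_init object.
        skips_init.init_calls += 1

result = skips_init()
print(type(result).__name__)    # str
print(skips_init.init_calls)    # 0
```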
Python invokes these two functions whenever you create a new object:
obj = object()
By the time obj gets a reference to the created object instance, Python has called both __new__ to actually create the instance and __init__ to initialize it.
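To make the two steps visible, here is a rough sketch that performs them by hand. The class thing is a hypothetical stand-in, and this is only an approximation of what Python does internally when you call a class:

```python
class thing:
    '''Hypothetical class with one attribute.'''
    def __init__(self, x=0):
        self.x = x

# Step 1: create the bare instance (inherited object.__new__)...
obj = thing.__new__(thing)

# Step 2: initialize it, but only if we got an instance of the class...
if isinstance(obj, thing):
    obj.__init__(x=5)

# The normal one-step call gives an equivalent object...
same = thing(x=5)
print(obj.x, same.x)    # 5 5
```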
The template above also implements the built-in __str__ method (lines #12 to #14) because you should always implement either __str__ or __repr__. Ideally both, if the class is at all important. [See Always Implement toString for details.] In most cases, with new classes, you’ll start by defining __init__ and __str__. (Actually, you’ll start by first writing a doc string for your class. Rule #4: Always comment as you go.)
When run, the code above prints:
An object.
<__main__.a_class object at 0x00000244C737F250>
a_class
id=0x244c737f250

x=42, y=21
Note that subclassing object allows us to assign new attributes to instances of a_class. The code above adds the x and y attributes to the new instance of a_class “by hand” — that is, using code to alter the instance after it has been created and initialized. As we’ll see shortly, there’s a better way to assign attributes to objects when those attributes are common to all instances of the class.
Let’s instrument our simple class so we can see how this works. We’ll also add some input parameters:
001| class my_class:
002|     '''A basic user-defined class.'''
003|
004|     def __new__ (cls, **kwargs):
005|         '''Create a new instance object.'''
006|         print(f'new({kwargs})')
007|
008|         # object.__new__ takes a class to instantiate…
009|         self = super().__new__(cls)
010|         # Object instance now exists…
011|
012|         # Return the new object…
013|         print(f'new: {self!r}')
014|         return self
015|
016|     def __init__ (self, x=0.0, y=0.0):
017|         '''Initialize new instance object.'''
018|         print(f'init({x} {y})')
019|
020|         # object.__init__ takes no arguments…
021|         super().__init__()
022|         # Set object properties…
023|         self.x = x
024|         self.y = y
025|
026|         # And doesn't return anything…
027|         print(f'init: {type(self).__name__}:{self}')
028|         print()
029|
030|     def __repr__ (self):
031|         '''Representative string.'''
032|         return f'<{self.__class__.__name__} @{hex(id(self))}>'
033|
034|     def __str__ (self):
035|         '''Pretty-print string.'''
036|         return f'[{self.x:.3f}, {self.y:.3f}]'
037|
038|
039| o1 = my_class()
040| o2 = my_class(x=2.7, y=3.1)
041|
042| print(f'object1: {o1} @{hex(id(o1))}')
043| print(f'object2: {o2} @{hex(id(o2))}')
044| print()
045|
This class implements both __new__ and __init__ as well as both __str__ and __repr__.
The __new__ method (lines #4 to #14) isn’t necessary for a class like this (note how it ignores the kwargs parameter and just returns the new instance). It’s included here as part of instrumenting the class with print statements. Python passes the same arguments to both __new__ and __init__, so because __init__ takes parameters (see below), __new__ must accept them too. We don’t care about the parameters here, so we bundle them into one kwargs parameter (which means this instrumented version accepts keyword arguments only).
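The sketch below illustrates why the two signatures have to agree. The classes mismatched and matched are hypothetical, written only for this demonstration. The first has a __new__ that accepts no extra arguments, so passing x fails before __init__ is ever reached. The second accepts anything and ignores it:

```python
class mismatched:
    '''Hypothetical class: __new__ cannot accept what __init__ expects.'''
    def __new__(cls):
        return super().__new__(cls)

    def __init__(self, x=0.0):
        self.x = x

class matched:
    '''Hypothetical class: __new__ accepts and ignores all arguments.'''
    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)

    def __init__(self, x=0.0):
        self.x = x

error = None
try:
    mismatched(x=1.5)
except TypeError as err:
    # Raised by the call to __new__, before __init__ runs...
    error = err
print(f'TypeError: {error}')

print(matched(x=1.5).x)    # 1.5
```

Using both *args and **kwargs in __new__ also allows positional arguments, which the kwargs-only version above does not.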
We call the parent class __new__ method (line #9) to get a new object instance and assign it to self (the canonical name for the object instance — this is the same self other class methods take as their first parameter). Note that in printing the object (line #13) we have to be careful. At this point, the x and y attributes haven’t been set, so the str representation is off-limits. The repr version is safe to print, so we use the !r option in the print statement.
The __init__ method illustrates the usual method of calling the parent class (line #21) to initialize any base attributes and, upon return, setting the attributes for this class (lines #23 and #24). In this case, we’re creating x and y attributes and assigning the input values to them. Note, in line #27, one way to print the class name. The __repr__ method below illustrates another.
The __repr__ method (lines #30 to #32) returns a “representative” string version of the object. Exactly what to print here is up to you, but it’s generally a more “industrial” take on the object. There are multiple views on what’s best. One is to return a string that, at least in theory, could reconstruct the object. A more common view, implemented here, is to return some form of the class name along with the object id (which is usually its address in memory).
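For comparison, here is a minimal sketch of the first view, where the repr string could rebuild the object. The class xy_pair is a hypothetical example, not the class used in this post:

```python
class xy_pair:
    '''Hypothetical class with a reconstructable repr.'''
    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y

    def __repr__(self):
        # Looks like the call that would create an equal object...
        return f'{type(self).__name__}(x={self.x!r}, y={self.y!r})'

p = xy_pair(2.5, -1.0)
text = repr(p)
print(text)    # xy_pair(x=2.5, y=-1.0)

# At least in theory, the string can reconstruct the object...
q = eval(text, {'xy_pair': xy_pair})
print(q.x, q.y)    # 2.5 -1.0
```

Whether this is worth doing depends on the class. It works well for small value-like objects and poorly for objects with a lot of internal state.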
The __str__ method (lines #34 to #36) should return a “nice” string version of the object. Here it’s the x and y attributes, normalized to three decimal places, and shown in square brackets. Remember, for all but the most trivial classes, you should implement one of these two string methods.
Lines #39 to #44 create two new instances and print their string versions and locations in memory. When run, this prints:
new({})
new: <my_class @0x24111fcf2b0>
init(0.0 0.0)
init: my_class:[0.000, 0.000]
new({'x': 2.7, 'y': 3.1})
new: <my_class @0x24111fcf280>
init(2.7 3.1)
init: my_class:[2.700, 3.100]
object1: [0.000, 0.000] @0x24111fcf2b0
object2: [2.700, 3.100] @0x24111fcf280
You can see how __new__ is called first and then __init__. Depending on your class, you may not need either of these, but Python still calls them on the parent class (which in cases like this is the object class).
Here’s the same basic class again without the instrumentation and extra comments:
001| from math import e, pi
002|
003| class xy_point:
004|     '''A simple XY point class.'''
005|
006|     def __init__ (self, x=0.0, y=0.0):
007|         '''Initialize new point object.'''
008|         self.x = x
009|         self.y = y
010|
011|     def __str__ (self):
012|         '''Pretty-print string.'''
013|         return f'[{self.x:.6f}, {self.y:.6f}]'
014|
015| p1 = xy_point()
016| p2 = xy_point(x=e, y=pi)
017|
018| print(f'p1: {p1}')
019| print(f'p2: {p2}')
020|
When run, this prints:
p1: [0.000000, 0.000000]
p2: [2.718282, 3.141593]
I’ll leave a detailed exploration of user-defined classes for another time. In this post I want to look at making (fairly simple) subclasses of three of Python’s more useful built-in classes, tuple, list, and dict. As we’ll see, a tuple being immutable has an impact on any subclass we make from it.
Here’s an instrumented class that subclasses the list class:
001| class my_list (list):
002|     def __new__ (cls, listobject=[]):
003|         '''Create a new my_list object.'''
004|         print(f'new({listobject})')
005|
006|         # Expects a single iterable arg; ignores keyword args…
007|         obj = super().__new__(cls, listobject)
008|         print(f'new: {obj}:{type(obj).__name__}')
009|         # List is empty at this point!
010|
011|         # Return the new object…
012|         return obj
013|
014|     def __init__ (self, listobject=[]):
015|         '''Initialize new my_list object.'''
016|         print(f'init({listobject})')
017|
018|         # Init takes same arguments…
019|         super().__init__(listobject)
020|         # Now list is populated!
021|
022|         print(f'init: {self}:{type(self).__name__}')
023|         print()
024|         # Init doesn't return anything…
025|
026|
027| l0 = my_list()
028| l1 = my_list([1,2,3,4,5])
029| l2 = my_list('Hello!')
030| l3 = my_list(range(12))
031| print()
032|
033| print(l0)
034| print(l1)
035| print(l2)
036| print(l3)
037| print()
038|
We once again implement __new__ just so we can have some print statements to show the flow. The list class constructor takes a single optional argument, an iterable object to make into a list, so we must write the __new__ and __init__ methods to expect that parameter. To make it optional, we provide a default value, an empty list.
We don’t do anything special in the __new__ and __init__ methods, so this subclass isn’t very useful. It’s just a list object wrapped in our my_list class. The only purpose here is to demonstrate subclassing Python objects and to illustrate the flow one more time. When run, this prints:
new([])
new: []:my_list
init([])
init: []:my_list

new([1, 2, 3, 4, 5])
new: []:my_list
init([1, 2, 3, 4, 5])
init: [1, 2, 3, 4, 5]:my_list

new(Hello!)
new: []:my_list
init(Hello!)
init: ['H', 'e', 'l', 'l', 'o', '!']:my_list

new(range(0, 12))
new: []:my_list
init(range(0, 12))
init: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]:my_list

[]
[1, 2, 3, 4, 5]
['H', 'e', 'l', 'l', 'o', '!']
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
We didn’t have to implement __str__ because list already has an acceptable version. We do that only if we want a different version.
Why might we want to subclass list? Here’s a use case where we want lists of numbers and a set of operations we can perform on a list of numbers:
001| class numbers (list):
002|     '''Basic list of numbers.'''
003|
004|     def __init__ (self, *args):
005|         '''Initialize. Takes a single iterable or a list of arguments.'''
006|         # We want a single list of objects…
007|         arglst = list(args[0]) if len(args) == 1 else args
008|
009|         # Each list object must be numeric…
010|         for ix,a in enumerate(arglst, start=1):
011|             if isinstance(a,int): continue
012|             if isinstance(a,float): continue
013|             if isinstance(a,complex): continue
014|             if isinstance(a,bytes): continue
015|             raise ValueError(f'Illegal list item: #{ix}={a}')
016|
017|         # Let list.__init__ build the list…
018|         super().__init__(arglst)
019|
020|     def __str__ (self):
021|         '''Our string version.'''
022|         sn = ','.join([str(n) for n in self])
023|         return f'[{sn}]'
024|
025|     @property
026|     def total (self):
027|         '''Return the total.'''
028|         return sum(self)
029|
030|     @property
031|     def mean (self):
032|         '''Return the mean (average).'''
033|         return sum(self)/len(self)
034|
035|     @property
036|     def low (self):
037|         '''Return the smallest number.'''
038|         return min(self)
039|
040|     @property
041|     def high (self):
042|         '''Return the largest number.'''
043|         return max(self)
044|
045|     @property
046|     def sumsquares (self):
047|         '''Return the sum of the number squares.'''
048|         return sum(self.squares)
049|
050|     @property
051|     def squares (self):
052|         '''Return the squares of the numbers.'''
053|         return numbers(pow(a,2) for a in self)
054|
055|     @property
056|     def cubes (self):
057|         '''Return the cubes of the numbers.'''
058|         return numbers(pow(a,3) for a in self)
059|
060| from random import randint
061|
062| ns = numbers(randint(10,1000) for _ in range(8))
063| print(ns)
064| print(f'sum = {ns.total}')
065| print(f'avg = {ns.mean:.3f} (min: {ns.low}, max: {ns.high})')
066| print(f'squares: {ns.squares}')
067| print()
068|
We subclass list (line #1) and implement the __init__ and __str__ methods as well as seven new methods implemented as properties.
The __init__ method (lines #4 to #18) first checks whether it got a single input argument or more than one (line #7). It assumes a single argument is an iterable and creates a list from it; otherwise, it uses args as is. The for loop (lines #10 to #15) checks that each item is an acceptable numeric type (raising an exception if not). Only after checking do we invoke the parent initialize method with the vetted list of numbers.
Our version of the __str__ method (lines #20 to #23) produces output that is nearly the same as from list, but in our version, for compactness, we remove the spaces.
The new methods, which return various values, are implemented as properties (thus removing the need for parentheses when getting the value and also emphasizing their read-only nature).
When run, this prints:
[314,851,839,281,364,958,487,964]
sum = 5058
avg = 632.250 (min: 281, max: 964)
squares: [98596,724201,703921,78961,132496,917764,237169,929296]
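As a small illustration of the read-only nature of properties mentioned above, here is a sketch using a hypothetical cut-down class named sample (not the full numbers class). The property is recomputed on each access, and assigning to it raises an exception because no setter is defined:

```python
class sample(list):
    '''Hypothetical cut-down list subclass with one property.'''
    @property
    def total(self):
        '''Return the total.'''
        return sum(self)

s = sample([1, 2, 3])
print(s.total)    # 6

# The value is computed on each access, so it tracks the list...
s.append(4)
print(s.total)    # 10

# There is no setter, so assignment fails...
error = None
try:
    s.total = 99
except AttributeError as err:
    error = err
print(f'AttributeError: {error}')
```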
Depending on your application, there are other properties you could add, such as specialized mathematical operations (a dot product, for instance) or variations on the methods implemented here (for example, returning the index of the lowest and highest members rather than the members themselves).
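Here is one way such additions might look. This is only a sketch: the class indexed_numbers and the names low_index, high_index, and dot are hypothetical, and the numeric validation from the numbers class is left out to keep it short:

```python
class indexed_numbers(list):
    '''Hypothetical list subclass with index-based properties.'''

    @property
    def low_index(self):
        '''Return the index of the smallest number.'''
        return self.index(min(self))

    @property
    def high_index(self):
        '''Return the index of the largest number.'''
        return self.index(max(self))

    def dot(self, other):
        '''Return the dot product with another sequence of numbers.'''
        if len(self) != len(other):
            raise ValueError('Lengths must match.')
        return sum(a*b for a, b in zip(self, other))

ns = indexed_numbers([314, 851, 839, 281])
print(ns.low_index, ns.high_index)    # 3 1
print(ns.dot([1, 0, 0, 2]))           # 876
```

The dot product is a regular method rather than a property because it needs a second operand.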
Now that we’ve seen how to subclass list (a general pattern for mutable objects), let’s see how to subclass tuple (which creates immutable objects). Here’s a subclass of tuple that implements a simple multidimensional point:
001| class point (tuple):
002|     '''Base point class.'''
003|
004|     def __new__ (cls, *args, **kwargs):
005|         '''Create a new point object.'''
006|         arglst = list(args[0]) if len(args)==1 else args
007|         return super().__new__(cls, arglst)
008|
009|     def __init__ (self, *args, precision=3, sep=', '):
010|         '''Initialize point object.'''
011|         super().__init__()
012|         self.precision = precision
013|         self.sep = sep
014|
015|     def __str__ (self):
016|         '''Pretty print string.'''
017|         xs = [f'{self[ix]:+.{self.precision}f}' for ix in range(len(self))]
018|         return self.sep.join(xs)
019|
020|     def dot_product (self, other):
021|         '''Return the dot product between this and another point.'''
022|         if not isinstance(other,point): raise ValueError('Invalid object')
023|         if len(self)!=len(other): raise ValueError('Lengths must match.')
024|         xs = [a*b for a,b in zip(self,other)]
025|         return sum(xs)
026|
027| p1 = point(range(31,42), precision=0, sep=' ')
028| p2 = point(+2.1, -4.2, +6.3, -8.4, precision=2)
029| p3 = point(2.71814, 3.14159, 0.41468, 1.91027, precision=6)
030|
031| print(p1)
032| print(p2)
033| print(p3)
034| print()
035| print(f'p2 dot p2: {p2.dot_product(p2)}')
036| print(f'p2 dot p3: {p2.dot_product(p3)}')
037| print()
038|
A significant difference here is that a tuple is both created and populated in the __new__ method, not the __init__ method. We could accept the default method, which accepts only a single iterable as input, but we want to use the same trick we did with the numbers class above. We want to be able to create a new object with a list of input arguments (such as in lines #28 and #29). But this time we do it in the __new__ method.
We must also implement the __init__ method to handle the keyword parameters. (We could do this in __new__ but doing it in __init__ is the preferred approach.) We only need to store the argument values for later reference. The precision parameter controls how many decimal digits we’ll print in the __str__ function. The sep parameter lets us change the separator string.
When run, this prints:
+31 +32 +33 +34 +35 +36 +37 +38 +39 +40 +41
+2.10, -4.20, +6.30, -8.40
+2.718140, +3.141590, +0.414680, +1.910270

p2 dot p2: 132.3
p2 dot p3: -20.920368000000003
Adding a keyword parameter to control the leading plus sign is left as an exercise for the reader.
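To underline why the tuple's contents have to be supplied in __new__, here is a stripped-down sketch. The class pair and its label attribute are hypothetical. By the time __init__ runs, the tuple is already populated and cannot be changed, although ordinary attributes can still be attached to the instance:

```python
class pair(tuple):
    '''Hypothetical two-item tuple subclass.'''

    def __new__(cls, a, b):
        # The contents must be supplied here; a tuple cannot change later...
        return super().__new__(cls, (a, b))

    def __init__(self, a, b):
        # The tuple is already populated when __init__ runs...
        self.label = f'pair of {len(self)}'

p = pair(3, 4)
print(p, p.label)    # (3, 4) pair of 2

# Item assignment fails because tuples are immutable...
error = None
try:
    p[0] = 99
except TypeError as err:
    error = err
print(f'TypeError: {error}')
```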
Lastly, we’ll subclass the built-in dict class. Our use case is a configuration object for displaying pages of text. We want to support multiple named configurations that we’ll put in a JSON file that looks like this:
{
"version": 1,
"configs": {
"8x11": {
"margin":{"left":0.4, "right":0.6, "top":0.4, "bottom":0.6},
"line-height": 1.1,
"page-numbers": "yes",
"mode": "portrait"
},
"8x14": {
"margin":{"left":0.25, "right":0.25, "top":0.25, "bottom":0.25},
"line-height": 1.2,
"page-numbers": "no",
"mode": "landscape"
},
"<default>": {
"margin":{"left":0.5, "right":0.5, "top":0.5, "bottom":0.5},
"line-height": 1.0,
"page-numbers": "no",
"mode": "portrait"
}
}
}
Here’s a subclass of dict that loads this file and puts the requested configuration into the dictionary:
001| from json import load as jsonload
002|
003| class config (dict):
004|     '''Basic configuration class.'''
005|     filename = r"C:\demo\hcc\python\configs.json"
006|
007|     def __init__ (self, config_name='<default>'):
008|         '''New config instance.'''
009|         super().__init__()
010|         self['name'] = config_name
011|
012|         # Auto load data…
013|         with open(self.filename, mode='rb') as fp:
014|             # Read JSON file…
015|             self.data = jsonload(fp)
016|             print('read: %s' % self.filename)
017|
018|         # Get named configuration…
019|         self.configs = self.data['configs']
020|
021|         # Load requested configuration…
022|         self.config = self.configs[config_name]
023|
024|         # Put config values in the dictionary…
025|         self['lmargin'] = self.config['margin']['left']
026|         self['rmargin'] = self.config['margin']['right']
027|         self['tmargin'] = self.config['margin']['top']
028|         self['bmargin'] = self.config['margin']['bottom']
029|         self['lheight'] = self.config['line-height']
030|         self['pagnums'] = self.config['page-numbers']
031|         self['mode'] = self.config['mode']
032|
033|     def __str__ (self):
034|         '''Pretty print.'''
035|         return f'config[{self["name"]}]'
036|
037|     def margin (self):
038|         '''Return margin parameters as a tuple.'''
039|         lmar = self['lmargin']
040|         rmar = self['rmargin']
041|         tmar = self['tmargin']
042|         bmar = self['bmargin']
043|         return (lmar, rmar, tmar, bmar)
044|
045| cfg1 = config()
046| cfg2 = config('8x11')
047| print()
048|
049| print(cfg1)
050| print(f'mode: {cfg1["mode"]}')
051| print('left=%.2f, right=%.2f, top=%.2f, bot=%.2f' % cfg1.margin())
052| print()
053| print(cfg2)
054| print(f'mode: {cfg2["mode"]}')
055| print('left=%.2f, right=%.2f, top=%.2f, bot=%.2f' % cfg2.margin())
056| print()
057|
We don’t need to implement the __new__ method, but we do need to implement the __init__ method because it uses a hard-coded filename to load a JSON file and extract the requested configuration into the dictionary (lines #7 to #31).
The margin convenience method returns the four margin values as a tuple. This is intended for routines that set the margins (but don’t know about the other parameters). As the print functions on line #51 and #55 show, it’s also handy for printing the margin values.
When run, this prints:
read: C:\demo\hcc\python\configs.json
read: C:\demo\hcc\python\configs.json

config[<default>]
mode: portrait
left=0.50, right=0.50, top=0.50, bot=0.50

config[8x11]
mode: portrait
left=0.40, right=0.60, top=0.40, bot=0.60
There are many improvements that could be made, but this serves to illustrate how you can create dictionary objects that self-populate when you create them. This use case assumes other functions expect a dictionary. Normally, if designing this from the bottom up, I’d use a configuration object with attributes for the values rather than dictionary items (in fact, under the hood, there isn’t much difference).
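For contrast, here is a rough sketch of the attribute-based alternative mentioned above. The class page_config and the SAMPLE string are hypothetical. To keep the example self-contained, it parses a JSON string shaped like one entry of the file shown earlier instead of reading a file:

```python
import json

# Hypothetical stand-in for one named entry of the configs file...
SAMPLE = '''{"margin": {"left": 0.4, "right": 0.6, "top": 0.4, "bottom": 0.6},
             "line-height": 1.1, "page-numbers": "yes", "mode": "portrait"}'''

class page_config:
    '''Hypothetical configuration object using attributes, not items.'''

    def __init__(self, name, settings):
        self.name = name
        self.lmargin = settings['margin']['left']
        self.rmargin = settings['margin']['right']
        self.tmargin = settings['margin']['top']
        self.bmargin = settings['margin']['bottom']
        self.lheight = settings['line-height']
        self.pagnums = settings['page-numbers']
        self.mode = settings['mode']

    def __str__(self):
        return f'config[{self.name}]'

    def margin(self):
        '''Return margin parameters as a tuple.'''
        return (self.lmargin, self.rmargin, self.tmargin, self.bmargin)

cfg = page_config('8x11', json.loads(SAMPLE))
print(cfg, cfg.mode, cfg.margin())

# Under the hood, the attributes live in a dictionary anyway...
print(vars(cfg)['mode'])    # portrait
```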
Next time, I’ll explore many other built-in methods you can use in your user-defined classes to create useful objects that are “Python aware”.
Link: Zip file containing all code fragments used in this post.
This post is: Simple Python Tricks #10