Simple Python Tricks #16

Tags

Python classes, Python code, Simple Tricks

Simple Tricks #10 was about Python classes with a focus on the __new__ and __init__ built-in methods plus how to use them when extending Python’s built-in list, tuple, and dict classes.

In this edition of Simple Tricks, we’ll look at a few potentially useful subclasses of Python’s dict class: a “ticket” class, a “list of files” class, and an INI file class.

But we’ll start with a “do nothing” subclass of dict that instruments the built-in functions most central to dict:

 class test_bag (dict):

     def __new__ (cls, **kwargs):
         """Create a new test_bag instance."""
         print(f'new({kwargs.keys()})')

         obj = super().__new__(cls, **kwargs)
         print(f'self={obj}')
         return obj

     def __init__ (self, **kwargs):
         """Initialize a new test_bag instance."""
         print(f'init({kwargs.keys()})')
         super().__init__()

         # Populate self "by hand"...
         for kx,k in enumerate(kwargs):
             a = kwargs[k]
             print(f'arg[{kx}] {k}="{a}"')
             self[k] = a

     def __missing__ (self, keyname):
         """No entry in collection for keyname."""
         print(f'missing({keyname})')

         return None

     def __getitem__ (self, keyname):
         """Get Item. val = obj[key]"""
         print(f'getitem({keyname})')

         return super().__getitem__(keyname)

     def __setitem__ (self, keyname, value):
         """Set Item. {obj[key]=val}"""
         print(f'setitem({keyname},{value})')

         if keyname in self:
             raise ValueError('Values are read-only.')
         super().__setitem__(keyname, value)

     def visit (self, callback_function):
         """Visit each item in the bag."""

         # For each key in the dictionary...
         for ix,key in enumerate(self.keys(), start=1):
             print(f'visit[{ix}]: "{key}" = "{self[key]}"')
             # Pass key and value to function...
             retval = callback_function(ix, key, self[key])
             # If return value other than None,...
             if retval is not None:
                 # Update entry with return value...
                 super().__setitem__(key, retval)

         return self


 if __name__ == '__main__':
     """Exercise the test_bag."""

     def print_function (ix, key, value):
         print(f'{ix}: {key}="{value}" ({type(value).__name__})')

     bag = test_bag(x=1.0, y=0.0, z=0.0, a=True, b=False)
     print()

     print('Visit...')
     bag.visit(print_function)
     print()



Line #1 begins the definition of a new class, test_bag, that subclasses dict. (I often refer to essentially unordered associative collections as “bags”. I think of them as opaque bags of any sorts of things accessed by identifying keys.)

Lines #3 to #9 implement the built-in __new__ method for the test_bag class. Because the dict isn’t populated until the __init__ method, we don’t need to intercept object creation, and nothing here is strictly necessary. The only reason we implement it is to instrument the creation step with the two print statements (lines #5 and #8). Note that we must be sure to call the superclass (line #7) and return the object instance it creates (line #9).

Lines #11 to #20 implement the built-in __init__ method for the test_bag class. Note that when we call the superclass (line #14), we do not pass the kwargs parameter. This prevents dict from populating the instance with any passed arguments. It’s in lines #16 to #20 that we populate the instance “by hand” — just so we can print each passed argument.

(See Simple Tricks #10 for more details on __new__ and __init__.)

Lines #22 to #26 implement the built-in __missing__ method. Python calls this method from dict’s __getitem__ whenever a key isn’t found in the collection. The plain dict class doesn’t define __missing__, so there a failed lookup raises a KeyError exception. Here we simply return None, the idea being that returning None is better than raising an exception. Obviously, a client using this class would need to check for it as a returned value.
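Here is a minimal sketch of when Python actually consults __missing__. The NoneBag class is a hypothetical stand-in for test_bag with the instrumentation removed. Only the subscript form obj[key] goes through __missing__; the get method and the in operator do not, and a plain dict raises KeyError.

```python
class NoneBag(dict):
    """Hypothetical dict subclass whose missing keys read as None."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        return None


bag = NoneBag(x=1)
plain = {'x': 1}

results = {
    'subscript': bag['nope'],             # Goes through __missing__.
    'get': bag.get('nope', 'default'),    # get() never calls __missing__.
    'contains': 'nope' in bag,            # Membership test is unaffected.
    'stored': len(bag),                   # __missing__ did not add the key.
}

try:
    plain['nope']
    plain_error = None
except KeyError as err:
    plain_error = type(err).__name__

print(results)
print(plain_error)
```

Because get() bypasses __missing__, client code that mixes bag[key] and bag.get(key) can see different defaults for the same missing key.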

Lines #28 to #32 implement the built-in __getitem__ method that, given a key, returns an item from the collection. (Or invokes __missing__ if not found.) Again, we’re not adding any functionality here other than the print statement for instrumentation.

Lines #34 to #40 implement the built-in __setitem__ method that, given a key and a new value, either updates (if the key exists) or adds the value (if the key does not). In our override of the method, we disallow updating of existing key=value pairs but allow new ones to be added. (This is entirely for fun/demonstration.)
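One caveat worth sketching: overriding __setitem__ only guards the obj[key] = value syntax. In CPython, inherited methods such as dict.update write to the underlying storage directly and never call the override. The WriteOnce class below is a hypothetical stand-in that keeps only the guard from test_bag.

```python
class WriteOnce(dict):
    """Hypothetical stand-in for test_bag, keeping only the __setitem__ guard."""

    def __setitem__(self, keyname, value):
        if keyname in self:
            raise ValueError('Values are read-only.')
        super().__setitem__(keyname, value)


bag = WriteOnce()
bag['x'] = 1.0              # New key: allowed.

try:
    bag['x'] = 2.0          # Existing key: blocked by our override.
    blocked = False
except ValueError:
    blocked = True

# dict.update() never calls our __setitem__, so the guard is bypassed.
bag.update(x=3.0)

print(blocked, bag['x'])
```

A class that really needs the restriction would have to override update (and setdefault) as well.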

Lastly, lines #42 to #55 implement a visit method that enumerates through the existing keys and calls the given function with the index, key, and value. That function can return None to signify no action or any other value to update the dict value. (Note that this method dodges the prohibition on updating values! So, we don’t entirely disallow updating values. We just make it harder.)
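As a rough illustration of a callback that does update values, here is a sketch using a cut-down, hypothetical Bag class (just the visit logic, no print statements) and a made-up double_floats callback.

```python
class Bag(dict):
    """Hypothetical cut-down test_bag: just the visit method, no prints."""

    def visit(self, callback_function):
        for ix, key in enumerate(self.keys(), start=1):
            retval = callback_function(ix, key, self[key])
            if retval is not None:
                # Only existing keys are rewritten, so the dictionary's
                # size never changes while we iterate over it.
                super().__setitem__(key, retval)
        return self


def double_floats(ix, key, value):
    """Return a replacement for float values; None means 'leave as is'."""
    return value * 2 if isinstance(value, float) else None


bag = Bag(x=1.0, y=0.5, a=True)
bag.visit(double_floats)
print(bag)
```

Since visit returns self, calls can be chained, for example bag.visit(double_floats).visit(some_other_callback).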

Lines #58 to #69 exercise the test_bag class. Note how we use the visit function to print the content of the bag.

When run (as is), this prints:

new(dict_keys(['x', 'y', 'z', 'a', 'b']))
self={}
init(dict_keys(['x', 'y', 'z', 'a', 'b']))
arg[0] x="1.0"
setitem(x,1.0)
arg[1] y="0.0"
setitem(y,0.0)
arg[2] z="0.0"
setitem(z,0.0)
arg[3] a="True"
setitem(a,True)
arg[4] b="False"
setitem(b,False)

Visit...
getitem(x)
visit[1]: "x" = "1.0"
getitem(x)
1: x="1.0" (float)
getitem(y)
visit[2]: "y" = "0.0"
getitem(y)
2: y="0.0" (float)
getitem(z)
visit[3]: "z" = "0.0"
getitem(z)
3: z="0.0" (float)
getitem(a)
visit[4]: "a" = "True"
getitem(a)
4: a="True" (bool)
getitem(b)
visit[5]: "b" = "False"
getitem(b)
5: b="False" (bool)

Note how when we print the new instance in the __new__ method, despite having passed kwargs to the superclass, the instance is not yet populated. (This output also shows that kwargs.keys() is a specific type, a dict_keys object.)
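Both observations can be checked in isolation. This small sketch (variable names are mine) shows that dict.__new__ ignores its keyword arguments, and that a dict_keys object is a live view that also supports set operations.

```python
# dict.__new__ accepts keyword arguments but does not use them.
empty = dict.__new__(dict, x=1.0)

kwargs = {'x': 1.0, 'y': 0.0}
keys = kwargs.keys()
type_name = type(keys).__name__

# The view is live: it reflects later changes to the dictionary.
kwargs['z'] = 0.0
live = list(keys)

# Key views also support set operations.
common = keys & {'x', 'q'}

print(empty, type_name, live, common)
```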

Note also that when we populate the instance, we invoke the __setitem__ method for each key=value we set. Later, when we visit the instance to print it, we invoke the __getitem__ method for each key=value pair we read.

If we comment out (or otherwise disable) the instrumentation, it just prints:

Visit...
1: x="1.0" (float)
2: y="0.0" (float)
3: z="0.0" (float)
4: a="True" (bool)
5: b="False" (bool)

It is in all regards a Python dictionary object, but it has a few special features. You can use it to experiment, or as a base class to provide instrumentation for subclasses of your own.

We’ll implement a simple “Ticket” class (or more reasonably a base class that could be extended with additional properties). We’re going to require that each ticket issued have a unique ID number and a timestamp of when it was issued. First, we need some helper classes:

 from time import time
 from datetime import datetime, timedelta, tzinfo
 from random import random

 UID = lambda: (int(1000*time()) * int((2**22)*random()))

 class TimeZone (tzinfo):
     """A TimeZone class (extending tzinfo)."""

     # Dates for Daylight Saving Time (DST)...
     # (These are the 2018 US dates reused for every year, so the
     # switch-over days are only approximate for other years.)
     DST0 = datetime(2018, 3, 11, 2, 0, 0)
     DST1 = datetime(2018, 11, 4, 2, 0, 0)

     # One hour and zero...
     HOUR  = timedelta(hours=1)
     ZERO  = timedelta(0)

     def utcoffset (self, dt):
         """This time zone's UTC offset."""
         return self.offset + self.dst(dt)

     def dst (self, dt):
         """Return difference for this time zone's DST."""
         return self.HOUR if self._dst(dt) else self.ZERO

     def _dst (self, dt):
         """Figure out if we're in DST now."""
         dst0 = self.DST0.replace(year=dt.year)
         dst1 = self.DST1.replace(year=dt.year)
         dtx =  dt.replace(tzinfo=None)
         return dst0 <= dtx < dst1

 class UTC (TimeZone):
     """aka GMT aka 'Zulu Time'"""
     def tzname (self, dt):
         return 'UTC'

     def utcoffset (self, dt):
         return self.ZERO

     def dst (self, dt):
         return self.ZERO

 class EST (TimeZone):
     """Eastern Standard Time."""
     offset = timedelta(hours=-5)

     def tzname (self, dt):
         return 'EDT' if self._dst(dt) else 'EST'

 class CST (TimeZone):
     """Central Standard Time."""
     offset = timedelta(hours=-6)

     def tzname (self, dt):
         return 'CDT' if self._dst(dt) else 'CST'

 class MST (TimeZone):
     """Mountain Standard Time."""
     offset = timedelta(hours=-7)

     def tzname (self, dt):
         return 'MDT' if self._dst(dt) else 'MST'

 class PST (TimeZone):
     """Pacific Standard Time."""
     offset = timedelta(hours=-8)

     def tzname (self, dt):
         return 'PDT' if self._dst(dt) else 'PST'

 class HST (TimeZone):
     """Hawaii Standard Time (Hawaii does not observe DST)."""
     offset = timedelta(hours=-10)

     def tzname (self, dt):
         return 'HST'

     def dst (self, dt):
         return self.ZERO



For brevity, I’m not going to go over this in any detail. The UID function just multiplies a millisecond timestamp by a random number to generate an ID number that is very likely, though not guaranteed, to be unique. The rest just implements a set of TimeZone classes for use with Python’s datetime class.

(Including TimeZone classes for most of the USA time zones is overkill for this project, since we only need the UTC class, but I included them in case you find them useful.)
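If stronger uniqueness matters, the standard library offers alternatives. The sketch below is not the UID function from this post; uid_random_bits and uid_uuid are hypothetical replacements built on the secrets and uuid modules. (One motivation: in the multiplication above, a very small random() result makes the second factor zero, and so the whole ID zero.)

```python
import secrets
import uuid


def uid_random_bits(nbits=62):
    """Hypothetical alternative to UID: nbits of OS-level randomness."""
    return secrets.randbits(nbits)


def uid_uuid():
    """Hypothetical alternative to UID: a random 128-bit UUID as an int."""
    return uuid.uuid4().int


# Generate a batch and confirm there are no duplicates in it.
ids = {uid_uuid() for _ in range(1000)}
print(len(ids))
```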

((Even so, my apologies for not including Alaska time.))
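For the UTC-only case, the standard library already ships a ready-made tzinfo. A minimal sketch, assuming fixed offsets are good enough (the timezone class does not handle DST; the fixed_est name is mine):

```python
from datetime import datetime, timedelta, timezone

# Built-in UTC tzinfo instance.
now_utc = datetime.now(tz=timezone.utc)

# A fixed-offset zone: always UTC-5, with no DST adjustment.
fixed_est = timezone(timedelta(hours=-5), 'EST')
now_est = now_utc.astimezone(fixed_est)

print(now_utc.tzname(), now_est.tzname())
```

Both objects represent the same instant; only the displayed wall-clock time differs.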

Now we can implement our ticket class:

 from datetime import datetime
 from examples import UID, UTC

 class ticket (dict):
     """Implementing a Ticket by subclassing dict."""

     def __init__ (self, firstname, lastname, **kwargs):
         """New tickets require a first and last name."""
         super().__init__(**kwargs)
         self['tid'] = UID()
         self['ts'] = datetime.now(tz=UTC())
         self['firstname'] = firstname
         self['lastname'] = lastname

     @property
     def tid (self): return self['tid']

     @property
     def name (self): return f'{self["firstname"]} {self["lastname"]}'

     @property
     def timestamp (self): return self['ts'].strftime('%Y-%m-%d %H:%M:%S')

     def __missing__ (self, key): return '<null>'

     def __eq__ (self, other): return (self['ts'] == other['ts'])
     def __ne__ (self, other): return (self['ts'] != other['ts'])
     def __le__ (self, other): return (self['ts'] <= other['ts'])
     def __lt__ (self, other): return (self['ts'] <  other['ts'])
     def __gt__ (self, other): return (self['ts'] >  other['ts'])
     def __ge__ (self, other): return (self['ts'] >= other['ts'])

     def __hash__ (self): return hash(self['tid'])

     def __setitem__ (self, keyname, value):
         """Set Item."""
         if keyname in self:
             raise ValueError('Entries are read-only.')
         super().__setitem__(keyname, value)


     def __delitem__ (self, keyname):
         """Delete Item."""
         raise ValueError('Entries may not be deleted.')

     def __str__ (self):
         return f'[{self.timestamp}] {self.name} ({self.tid})'

     def __repr__ (self):
         t = (self.tid, self.timestamp, self['firstname'], self['lastname'])
         s = '{ticket:{tid:%s, ts:"%s", firstname:"%s", lastname:"%s"}}'
         return s%t


 if __name__ == '__main__':
     """Exercise the ticket class."""
     print()

     tick = ticket('Wyrd','Smythe', job='123456-78', due='2025-08-14')
     print(f'{tick!s}')
     print()



This is fairly straightforward, but there are a few wrinkles, some of which should be familiar from the previous example. For instance, we again implement the __missing__ method (line #24), so clients need to watch for the '<null>' string value rather than an exception.

Ticket entries are intended to be immutable, so we implement __setitem__ (lines #35 to #39) and __delitem__ (lines #42 to #44) to prevent updating and removing them. As before, we do allow adding new entries.

This time there’s no visitor method that allows easily getting around our restrictions, so we’ll treat instances of this class as immutable. We implement the __hash__ method (line #33) to allow their use as keys.
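A quick sketch of what the __hash__ method buys us. The Tagged class here is a hypothetical mini-ticket, and for simplicity it compares equality on the same 'tid' entry it hashes (an assumption of this sketch, not what the ticket class does).

```python
class Tagged(dict):
    """Hypothetical mini-ticket: hashable (and equal) by its 'tid' entry."""

    def __hash__(self):
        return hash(self['tid'])

    def __eq__(self, other):
        return self['tid'] == other['tid']


a = Tagged(tid=101, name='Wyrd')
b = Tagged(tid=102, name='Fred')

# Instances can now serve as dictionary keys (or set members).
assigned = {a: 'desk 1', b: 'desk 2'}

# A plain dict cannot.
try:
    hash({'tid': 101})
    plain_hashable = True
except TypeError:
    plain_hashable = False

print(assigned[a], plain_hashable)
```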

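We also implement the relational operators (lines #26 to #31) to make these objects sortable. Note that equality compares timestamps while the hash uses the ID, so two tickets issued at the same instant would compare equal yet hash differently; a serious class would want __eq__ and __hash__ to agree.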
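One thing to watch when equality and hashing use different entries: Python’s data model expects objects that compare equal to hash equal. The sketch below uses a hypothetical MiniTicket class with made-up integer timestamps to show what happens when they disagree.

```python
class MiniTicket(dict):
    """Hypothetical cut-down ticket: equality by 'ts', hash by 'tid'."""

    def __eq__(self, other):
        return self['ts'] == other['ts']

    def __hash__(self):
        return hash(self['tid'])


# Two tickets with the same timestamp but different IDs.
a = MiniTicket(tid=1, ts=100)
b = MiniTicket(tid=2, ts=100)

equal = (a == b)
same_hash = (hash(a) == hash(b))
distinct_in_set = len({a, b})

print(equal, same_hash, distinct_in_set)
```

The two tickets compare equal, yet a set keeps both because their hashes differ. With real timestamps a collision is unlikely, but basing __eq__ on the ID as well would remove the mismatch.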

Most importantly, the __init__ method (lines #7 to #13) takes explicit first and last name arguments in addition to a set of keyword arguments (which will be added to the dictionary). The names, along with a generated ID and timestamp, are added to the dictionary with specific names.

We define various properties (lines #15 to #22) for convenience (note their use in the __str__ and __repr__ methods).

When run, this just prints:

[2025-07-20 18:07:25] Wyrd Smythe (4547523151084949552)

(The ID number is different each time run because of the random function. The timestamp, of course, also varies each time run.)

We can test the immutability with the following:

 from demos import ticket

 if __name__ == '__main__':
     """Exercise the ticket class."""

     # Create a new ticket...
     tick = ticket('Wyrd','Smythe', job='123456-78', status='new')
     print(tick)
     print()

     # Try to update an entry...
     print(f'status1 = "{tick["status"]}"')
     try:
         tick['status'] = 'accepted'
     except Exception as e:
         print(e)
         print()

     # Try to delete an entry...
     try:
         del tick['status']
     except Exception as e:
         print(e)
         print()

     # Add a new entry...
     try:
         tick['status2'] = 'open'
         print(f'status2 = "{tick["status2"]}"')
         print()
     except Exception as e:
         print(e)
         print()

     # Try to update the new entry...
     try:
         tick['status2'] = 'closed'
     except Exception as e:
         print(e)
         print()

     # Use the dict base class to modify entries...
     super(type(tick), tick).__setitem__('status', 'accepted')
     super(type(tick), tick).__setitem__('status2', 'closed')
     print(f'status1 = "{tick["status"]}"')
     print(f'status2 = "{tick["status2"]}"')
     print()



When run, this prints:

[2025-07-20 18:07:39] Wyrd Smythe (4827626601178243512)

status1 = "new"
Entries are read-only.

Entries may not be deleted.

status2 = "open"

Entries are read-only.

status1 = "accepted"
status2 = "closed"

Instances don’t allow their entries to be modified or deleted. At least not easily. But we can always appeal to the dict base class as we did in lines #43 and #44.

So, it’s not impossible to modify values, but since only the ID is used to generate a hash, it’s the only entry that absolutely must be immutable. If this were a serious class rather than an illustration, we’d want to take steps to protect that entry.
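One possible way to protect the ID, sketched under my own assumptions rather than taken from the ticket class: keep the ID in a name-mangled attribute instead of in the dictionary, so writes through the dict base class can’t touch it. SafeTicket and the counter-based ID source are hypothetical.

```python
import itertools

# Hypothetical ID source, for illustration only.
_next_id = itertools.count(1)


class SafeTicket(dict):
    """Hypothetical ticket that keeps its ID outside the dictionary."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Stored as an attribute, so dict.__setitem__ cannot reach it.
        self.__tid = next(_next_id)

    @property
    def tid(self):
        # Read-only property: there is no setter.
        return self.__tid

    def __hash__(self):
        return hash(self.__tid)

    def __eq__(self, other):
        return isinstance(other, SafeTicket) and self.__tid == other.__tid


tick = SafeTicket(status='new')
before = hash(tick)
dict.__setitem__(tick, 'tid', 999)   # Tamper via the base class.
after = hash(tick)

print(before == after, tick.tid, tick['tid'])
```

This is still not bulletproof (the mangled attribute can be assigned by a determined client), but it takes the ID out of reach of ordinary dictionary operations.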

We can test sorting with this bit of code:

 from time import sleep
 from demos import ticket

 if __name__ == '__main__':
     """Exercise the ticket class."""

     t1 = ticket('Wyrd','Smythe', job='123456-78')
     print(t1)
     sleep(3)

     t2 = ticket('Fred', 'Flintstone', job='123789-34')
     print(t2)
     sleep(3)

     t3 = ticket('Barney', 'Rubble', job='567258-42')
     print(t3)
     print()

     print(f'{t1 < t2 = }')
     print(f'{t2 < t3 = }')
     print()

     for ix,tick in enumerate(sorted([t3,t1,t2]), start=1):
         print(f'{ix}: {tick.name}')
     print()



Which, when run, prints:

[2025-07-20 18:50:35] Wyrd Smythe (7214965656540271146)
[2025-07-20 18:50:38] Fred Flintstone (6708916351415752640)
[2025-07-20 18:50:41] Barney Rubble (4272416954490912370)

t1 < t2 = True
t2 < t3 = True

1: Wyrd Smythe
2: Fred Flintstone
3: Barney Rubble

The ordering is determined by the timestamp, so I used the sleep function to provide some time separation between tickets.
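For comparison, sorting doesn’t strictly require relational operators on the objects; a key function does the same job. A small sketch with plain dictionaries and hand-made timestamps (so no sleeping is needed):

```python
from datetime import datetime, timedelta, timezone
from operator import itemgetter

# Hand-made timestamps three seconds apart (illustrative values).
base = datetime(2025, 7, 20, 18, 50, 35, tzinfo=timezone.utc)
tickets = [
    {'name': 'Barney Rubble', 'ts': base + timedelta(seconds=6)},
    {'name': 'Wyrd Smythe', 'ts': base},
    {'name': 'Fred Flintstone', 'ts': base + timedelta(seconds=3)},
]

# Sort on the 'ts' entry without defining any comparison methods.
ordered = sorted(tickets, key=itemgetter('ts'))
names = [t['name'] for t in ordered]
print(names)
```

The operator approach keeps the ordering rule inside the class; the key approach lets each caller choose its own ordering.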

Here’s a subclass of dict that implements a list of files for a given subdirectory:

 from os import path, listdir

 class filelist (dict):
     """Load a dictionary with a list of files from a directory."""

     def __init__ (self, filepath):
         """New filelist. Loads data dynamically."""
         super().__init__()
         self.fpath = filepath

         # Add files...
         for name in listdir(filepath):
             fn = path.join(self.fpath, name)

             # It's a file...
             if path.isfile(fn):
                 self[name] = {
                     '@':1,
                     'name':fn,
                     'size':path.getsize(fn),
                     'dlm':path.getmtime(fn),
                     'dcr':path.getctime(fn),
                 }
                 continue

             # It's a subdirectory...
             if path.isdir(fn):
                 self[name] = {
                     '@':2,
                     'name':fn
                 }
                 continue

             # Not a file or a subdirectory...
             raise RuntimeError(f'Unknown directory type: {name}')

     def __repr__ (self):
         return f'{self.fpath} ({len(self)})'


 if __name__ == '__main__':
     """Exercise the filelist class."""

     # New filelist instance...
     flist = filelist(r'C:\demo\hcc\python')
     print(flist)

     # Sort key...
     sort_by_type = lambda k:flist[k]['@']

     # List the files...
     for name in sorted(flist, key=sort_by_type, reverse=True):
         rcd = flist[name]
         match rcd['@']:

             # Files...
             case 1:
                 size = f'{rcd["size"]:,}'
                 print(f'{name:<24s}{size:>8s} bytes')

             # Subdirectories...
             case 2:
                 print(f'{name:24s}(subdirectory)')

     print()



Here we only need to implement the __init__ method (lines #6 to #35) and __repr__ method (lines #37 and #38). (The latter just because we always implement toString.)

An instance requires a directory name which it uses to generate a list of files and subdirectories in that directory (lines #12 to #35). The routine is not recursive and does not scan any subdirectories. I’ll leave that as a user exercise.
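For anyone taking up that exercise, here is one possible starting point, a sketch built on os.walk rather than on the filelist class itself. The list_tree function is hypothetical, and the demo builds a throwaway directory so it can run anywhere.

```python
import os
import tempfile


def list_tree(root):
    """Hypothetical recursive variant: map relative path -> size in bytes."""
    found = {}
    # os.walk visits root and every subdirectory beneath it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found[os.path.relpath(full, root)] = os.path.getsize(full)
    return found


# Build a small temporary tree to scan.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'config'))
    with open(os.path.join(root, 'a.txt'), 'w') as fp:
        fp.write('hello')
    with open(os.path.join(root, 'config', 'b.ini'), 'w') as fp:
        fp.write('x=1')
    tree = list_tree(root)

print(sorted(tree))
```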

The dictionary keys are the file and subdirectory names (with no path info). The dictionary values are dictionaries with two guaranteed entries, “@” and “name”. The first is either a 1 (this entry is a file) or 2 (this entry is a subdirectory). The “name” entry is the entire path and file name.

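If the entry is a file (“@” = 1), then there are three additional entries: “size” (the size of the file in bytes), “dcr” (the time the file was created), and “dlm” (the time the file was last modified). Both times are floating-point timestamps in seconds since the epoch. Note that getctime reports the creation time on Windows; on Unix-like systems it reports the time of the last metadata change instead.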
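The values returned by getmtime and getctime are raw numbers, so displaying them takes one more step. A minimal sketch using a temporary file (file name and format string are my own choices):

```python
import os
import tempfile
from datetime import datetime

with tempfile.TemporaryDirectory() as root:
    fn = os.path.join(root, 'demo.txt')
    with open(fn, 'w') as fp:
        fp.write('demo')

    raw = os.path.getmtime(fn)               # Seconds since the epoch.
    modified = datetime.fromtimestamp(raw)   # Naive datetime, local time.

stamp = modified.strftime('%Y-%m-%d %H:%M:%S')
print(stamp)
```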

When run, this prints:

C:\demo\hcc\python (13)
config                  (subdirectory)
images                  (subdirectory)
inputs                  (subdirectory)
outputs                 (subdirectory)
2b-or-not2b.txt              450 bytes
chriscarol.txt             5,596 bytes
colors.png                 1,992 bytes
configs.json                 675 bytes
configuration.ini          1,099 bytes
example.json               1,040 bytes
example.xml                1,373 bytes
randnums.dat              19,164 bytes
vim.txt                      334 bytes

Lastly, here’s a subclass of dict that reads a Windows™ INI file and generates a dictionary of the sections and values found:

 class inifile (dict):

     def __init__ (self, filename, encoding='utf8'):
         """New inifile. Loads data dynamically."""
         super().__init__()
         self.fname = filename

         # Open, read, and parse the INI file...
         fp = open(filename, mode='r', encoding=encoding)
         try:
             print(f'reading: {filename}...')
             print()
             sect_name = 'main'
             self[sect_name] = {}

             for ix,line in enumerate(fp, start=1):
                 line = line.strip()
                 #print(f'{ix:2d}: {line}')

                 # Ignore blank lines...
                 if len(line) == 0:
                     continue

                 # Ignore comments...
                 if line.startswith('#') or line.startswith(';'):
                     continue

                 # Process section names...
                 if line.startswith('['):
                     # New section name...
                     if line[-1] != ']':
                         raise SyntaxError(f'Invalid section header: "{line}"')
                     sect_name = line[1:-1].strip()
                     self[sect_name] = {}
                     continue

                 # Name and value pair for current section...
                 parts = line.split('=', maxsplit=1)
                 name = parts[0].strip()
                 valu = parts[1].strip() if 1 < len(parts) else ''
                 self[sect_name][name] = valu

             print()
         except:
             raise
         finally:
             fp.close()

     def __repr__ (self):
         return f'{self.fname} ({len(self)} sections)'


 if __name__ == '__main__':
     """Exercise the inifile class."""

     # New INI file dictionary...
     ini = inifile(r"C:\demo\hcc\python\configuration.ini")
     print(f'INI File: {ini.fname}')

     # Iterate through the sections and list name=value pairs...
     for sect in ini:
         print(f'Section: "{sect}"')
         for ix,name in enumerate(ini[sect], start=1):
             print(f'{ix:2d}: {name}={ini[sect][name]}')
         print()
     print()



Again, we only need to implement the __init__ and __repr__ methods. In the former, we open, read, and parse the INI file text (lines #9 to #41).

The INI file for the test looks like this:

# Configuration file
#
Python=C:\Python310\python.exe
BasePath=C:\demo\hcc\python

index = modules.list
version = 1.0.0.0

[colors ]
red   = #ff0000
green = #007f00
blue  = #0000ff

[ paths ]
"C:/demo/hcc/python/baseball"
"C:/demo/hcc/python/baseball/etc"
"C:/demo/hcc/python/baseball/ftp"
"C:/demo/hcc/python/baseball/xslt"

 [modules]
gameday
gameday_url
gameday_msb
gameday_boxscore
gameday_linescore
gameday_innings
gameday_game
gameday_players


[]
/eof

Note that the section names are designed to stress-test the parser. We expect the names to be “colors” and “paths” with no spaces. And we ignore spaces before or after the section name brackets. Likewise, leading and trailing spaces, as well as spaces on either side of the equals sign, are ignored in the name=value pairs.
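The stripping rules can be tried out in isolation. This sketch pulls the two string operations into small helper functions (parse_header and parse_pair are hypothetical names; the inifile class does the same work inline).

```python
def parse_header(line):
    """Section name from a header line such as ' [ paths ]'."""
    line = line.strip()
    # Drop the brackets, then any spaces just inside them.
    return line[1:-1].strip()


def parse_pair(line):
    """Split 'name = value' on the first '='; the value may be absent."""
    parts = line.strip().split('=', maxsplit=1)
    name = parts[0].strip()
    valu = parts[1].strip() if 1 < len(parts) else ''
    return name, valu


print(parse_header('[colors ]'), parse_header(' [modules]'))
print(parse_pair('red   = #ff0000'), parse_pair('gameday'))
```

Because the split uses maxsplit=1, only the first equals sign separates name from value; any later ones stay in the value.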

When we run the above code, it prints:

reading: C:\demo\hcc\python\configuration.ini...

INI File: C:\demo\hcc\python\configuration.ini
Section: "main"
 1: Python=C:\Python310\python.exe
 2: BasePath=C:\demo\hcc\python
 3: index=modules.list
 4: version=1.0.0.0

Section: "colors"
 1: red=#ff0000
 2: green=#007f00
 3: blue=#0000ff

Section: "paths"
 1: "C:/demo/hcc/python/baseball"=
 2: "C:/demo/hcc/python/baseball/etc"=
 3: "C:/demo/hcc/python/baseball/ftp"=
 4: "C:/demo/hcc/python/baseball/xslt"=

Section: "modules"
 1: gameday=
 2: gameday_url=
 3: gameday_msb=
 4: gameday_boxscore=
 5: gameday_linescore=
 6: gameday_innings=
 7: gameday_game=
 8: gameday_players=

Section: ""
 1: /eof=

You can extend this (or the classes above) to give them more features. For instance, you could make the INI file reader type-aware to make the values be integers or strings or dates or whatever.
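As a rough idea of what type-awareness could look like, here is a sketch of a converter that guesses a Python type from the value string. The convert function and its rules (which words count as booleans, which types to try and in what order) are my assumptions, not part of the inifile class.

```python
from datetime import date


def convert(text):
    """Hypothetical converter: guess a Python type for an INI value string."""
    lowered = text.lower()
    if lowered in ('true', 'yes', 'on'):
        return True
    if lowered in ('false', 'no', 'off'):
        return False
    # Try progressively looser types; each raises ValueError on a mismatch.
    for caster in (int, float, date.fromisoformat):
        try:
            return caster(text)
        except ValueError:
            pass
    # Nothing matched, so keep the original string.
    return text


for sample in ('42', '1.5', 'yes', '2025-08-14', '1.0.0.0', '#ff0000'):
    print(f'{sample!r} -> {convert(sample)!r}')
```

It could be applied at the point where the parser stores each value, for example by storing convert(valu) instead of valu.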

I hope these examples show some of the power of subclassing the built-in types like dict or list. Using full-featured, well-tested classes as a starting point gives your extended classes a lot of power. All the subclasses above inherit the keys, values, and items methods of dict, for example.

Link: Zip file containing all code fragments used in this post.

∅

