This is Python! (part 6)

Tags

software design, Python code, Unicode, Python 100

This series of posts is for those who have used a programming language before but are not familiar with Python. This post concludes the introduction to the language.

This last post of the introductory tour is a grab bag of features I skipped in previous posts. From now on we’ll be digging deeper into specific topics.

In part 1 we met the str class, Python’s data type for text. In part 2 we saw that text (aka string) objects are list-like (iterable) — they are ordered lists of characters (in part 3 we learned more about list-like objects).

A string’s iterable nature is revealed in list contexts:

 # Strings are lists…

 chars = list("Hello!")
 print(f'{chars = }')
 print()

 for char in "Goodbye!":
     print(f'{char = }')
 print()

 text = "Hey, Moon!"

 for ix,char in enumerate(text):
     print(f'{ix}: {char}')
 print()

 chars = list(text)
 print(f'{chars = }')
 print()



When run, this prints:

chars = ['H', 'e', 'l', 'l', 'o', '!']

char = 'G'
char = 'o'
char = 'o'
char = 'd'
char = 'b'
char = 'y'
char = 'e'
char = '!'

0: H
1: e
2: y
3: ,
4:  
5: M
6: o
7: o
8: n
9: !

chars = ['H', 'e', 'y', ',', ' ', 'M', 'o', 'o', 'n', '!']

So, it’s easy to iterate through text character-by-character (or to turn text into a list of individual characters).

Python strings are Unicode and may contain any valid Unicode character, including emojis. [See this LCC post for a general overview or this HCC post for a more technical look.]

Python source code can be written in plain old ASCII, but Python 3 reads source files as UTF-8 by default, so string literals can contain Unicode characters directly (assuming the text editor handles it; most do):

 # Python unicode strings…

 unicode_string = "֍ Ϣўȓⅆ ֆ៣ᥡᚾℏℯ ֎"

 print(unicode_string)
 print()

 unicode_chars = list(unicode_string)
 print(unicode_chars)
 print()

 for ix,char in enumerate(unicode_string, start=1):
     print(f'{ix:2d}: {char} ({ord(char)})')
 print()

 ascii_editors = "\u058d \u03e2\u045e\u0213\u2146 \u058e"
 print(ascii_editors)
 print()



As in line #16, the \u#### escape sequence can represent any 16-bit character using plain ASCII. The \x## escape sequence can represent any 8-bit character, and the \U######## escape sequence can represent any 32-bit character (e.g. many emoji characters — the teddy bear emoji below is \U0001f9f8). In all cases, the # is a hex digit.

When run, this prints:

֍ Ϣўȓⅆ ֆ៣ᥡᚾℏℯ ֎

['֍',' ','Ϣ','ў','ȓ','ⅆ',' ','ֆ','៣','ᥡ','ᚾ','ℏ','ℯ',' ','֎']

 1: ֍ (1421)
 2:   (32)
 3: Ϣ (994)
 4: ў (1118)
 5: ȓ (531)
 6: ⅆ (8518)
 7:   (32)
 8: ֆ (1414)
 9: ៣ (6115)
10: ᥡ (6497)
11: ᚾ (5822)
12: ℏ (8463)
13: ℯ (8495)
14:   (32)
15: ֎ (1422)

֍ Ϣўȓⅆ ֎
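
As a quick check of the three escape forms described above, here is a small sketch (not one of the original fragments; the variable names are my own) that compares each escape with the character it stands for:

```python
# Escape sequences are just another way to spell a character.
# (Illustrative names; not from the original code fragments.)

latin_a = "\x41"          # \x## : two hex digits (an 8-bit value)
eternity = "\u058d"       # \u#### : four hex digits (a 16-bit value)
teddy = "\U0001f9f8"      # \U######## : eight hex digits (a 32-bit value)

print(latin_a == "A")     # True
print(eternity == "֍")    # True
print(teddy == "🧸")      # True
print(len(teddy))         # 1 (one character, however it was spelled)
```

However the character is written in the source, the resulting string holds the same single character.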

The built-in ord (“ordinal”) function (line #13) returns a character’s numeric value — its Unicode index (aka ordinal number aka code point). For ASCII characters, this is the usual ASCII value. (ASCII maps directly to Unicode.) For Unicode characters, as seen above, the ordinal numbers are larger than ASCII values.

Note: characters are the “atoms” in strings — from the lowly “A” (ordinal value 65) to emojis like the teddy bear (“🧸” — ordinal value 129528 — 1f9f8 in hex).

But 8-bit (aka byte aka octet) contexts — such as memory, file storage, or network communications — use bytes as “atoms”. In such a context, Unicode characters must be encoded with multiple bytes [see the technical look post for more]. For example, the teddy bear emoji has a UTF-8 encoding comprised of four bytes: 240, 159, 167, 184 (or F0, 9F, A7, B8 in hexadecimal). (Note this is not the ordinal value.)
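
To make the difference between the ordinal value and the encoded bytes concrete, here is a minimal sketch (my own illustration, using the bytes constructor that the post introduces next):

```python
# One character: its code point versus its UTF-8 bytes.
teddy = "\U0001f9f8"

code_point = ord(teddy)
utf8_bytes = bytes(teddy, encoding="utf-8")

print(code_point, hex(code_point))   # 129528 0x1f9f8
print(list(utf8_bytes))              # [240, 159, 167, 184]
print(chr(code_point) == teddy)      # True: chr() is the inverse of ord()
```

The string has one "atom" (the character), while its UTF-8 form has four (the bytes).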

Python has the built-in bytes data type for 8-bit strings:

 ### Byte Strings…

 bs = bytes()
 bs = b''
 print(bs)
 print()

 bs = b'ABCDEF\x00'
 print(bs)
 print()

 bs = b'\x00\x01\x02\x80\xfd\xfe\xff'
 bs = bytes([0x00, 0x01, 0x02, 0x80, 0xfd, 0xfe, 0xff])
 print(bs)
 print()

 ns = [0, 1, 2, 128, 253, 254, 255]
 bs = bytes(ns)
 print(bs)
 print()

 bs:bytes = b'ABCDEF\x00'
 print(bs)
 print()



This is similar to the string example in part 1. Line #3 uses the bytes class as a constructor to create a new bytes instance and assign it to the variable named bs (for “byte string”). We provide no arguments to the constructor, so the new instance has the default value: a zero-length byte string.

Line #4 does the same thing using a bytes literal — note the letter b prepending the string. (Prepending the letter b to make a bytes string is similar to prepending the letter f to make an f-string.)

Line #8 assigns a seven-byte literal value to bs — the \x## escape sequence allows inserting arbitrary 8-bit values (in hex). Here, it adds a trailing null (zero) to some ASCII characters (some languages terminate strings with a null — Python does not; it stores the string length).

Lines #12 and #13 both create identical byte strings with 8-bit binary values, line #12 with a byte string literal and line #13 with a list of integers given to the bytes constructor. Note how Python allows literal hexadecimal integer values (which can be as large as needed).

Lines #17 and #18 combine to create a byte string identical to those in lines #12 and #13, but using decimal literal values rather than hex.

Lastly, line #22 matches line #8 but annotates bs with a type-hint.

When run, this prints:

b''

b'ABCDEF\x00'

b'\x00\x01\x02\x80\xfd\xfe\xff'

b'\x00\x01\x02\x80\xfd\xfe\xff'

b'ABCDEF\x00'

Python displays bytes values with the leading letter b. It displays printable characters as-is and non-printable characters using \x## sequences.

We have str for strings of regular text (including Unicode characters) and bytes for strings of 8-bit values. Both are iterable. Both are immutable — once created they cannot be altered. Both have methods in common but also distinct capabilities.
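
A short sketch of those similarities and differences (my own example, not one of the original fragments). It assumes nothing beyond the built-in types; the `rejected` list is only there to record what happened:

```python
text = "ABC"
data = b"ABC"

# Both are iterable, but the "atoms" differ: characters versus integers.
print(list(text))     # ['A', 'B', 'C']
print(list(data))     # [65, 66, 67]
print(data[0])        # 65

# Many methods exist on both types (with matching argument types).
print(text.startswith("A"), data.startswith(b"A"))   # True True

# Both are immutable: item assignment raises TypeError.
rejected = []
for label, value, new_item in (("str", text, "a"), ("bytes", data, 97)):
    try:
        value[0] = new_item
    except TypeError as err:
        rejected.append(label)
        print(f"{label}: {type(err).__name__}")
```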

Python makes it easy to convert (Unicode) strings to any of the Unicode binary encoding forms. The most common of these is UTF-8 for 8-bit contexts, but there is also UTF-16 and UTF-32 for 16- and 32-bit contexts. Because they use multiple bytes, the 16- and 32-bit sizes have an order: most-significant byte first (“big endian”, 3-2-1-0) or least-significant byte first (“little endian”, 0-1-2-3).

 # Strings and byte strings…

 unicode_string = "\U0001f9f8"
 print("Unicode:")
 print(unicode_string)
 print()

 unicode_bytes = bytes(unicode_string, encoding='utf-8')
 print("UTF-8:")
 print(unicode_bytes)
 print()

 unicode_bytes = bytes(unicode_string, encoding='utf-16')
 print("UTF-16:")
 print(unicode_bytes)
 print()

 unicode_bytes = bytes(unicode_string, encoding='utf-16le')
 print("UTF-16 Little Endian:")
 print(unicode_bytes)
 print()

 unicode_bytes = bytes(unicode_string, encoding='utf-16be')
 print("UTF-16 Big Endian:")
 print(unicode_bytes)
 print()

 unicode_bytes = bytes(unicode_string, encoding='utf-32')
 print("UTF-32:")
 print(unicode_bytes)
 print()

 unicode_bytes = bytes(unicode_string, encoding='utf-32le')
 print("UTF-32 Little Endian:")
 print(unicode_bytes)
 print()

 unicode_bytes = bytes(unicode_string, encoding='utf-32be')
 print("UTF-32 Big Endian:")
 print(unicode_bytes)
 print()



The bytes constructor takes an object to convert to a byte string. When that object is a str, as here, the encoding parameter is required and specifies the encoding form. When it is an iterable of integers (as in the earlier byte-string example), the values must all be in the correct range (0–255) or Python raises an Exception.

When run, this prints:

Unicode:
🧸

UTF-8:
b'\xf0\x9f\xa7\xb8'

UTF-16:
b'\xff\xfe>\xd8\xf8\xdd'

UTF-16 Little Endian:
b'>\xd8\xf8\xdd'

UTF-16 Big Endian:
b'\xd8>\xdd\xf8'

UTF-32:
b'\xff\xfe\x00\x00\xf8\xf9\x01\x00'

UTF-32 Little Endian:
b'\xf8\xf9\x01\x00'

UTF-32 Big Endian:
b'\x00\x01\xf9\xf8'
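
The constructor rules mentioned above the output (a required encoding for text, and a 0–255 range for integers) can be checked with a small sketch. The helper name try_bytes is hypothetical, invented here for illustration:

```python
def try_bytes(*args, **kwargs):
    """Hypothetical helper: return the bytes object, or the name of the
    exception the bytes constructor raised."""
    try:
        return bytes(*args, **kwargs)
    except (TypeError, ValueError) as err:
        return type(err).__name__

print(try_bytes([0, 128, 255]))               # b'\x00\x80\xff'
print(try_bytes([0, 128, 256]))               # ValueError (256 is out of range)
print(try_bytes("Hello"))                     # TypeError (no encoding given)
print(try_bytes("Hello", encoding="utf-8"))   # b'Hello'
```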

Note to the Unicode-savvy: the UTF-16 and UTF-32 encodings with no “endianness” specified (lines #13 and #28) use the machine’s native byte order (little endian on the machine that produced this output) and prepend a BOM (byte order mark).
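
For the curious, here is a hedged sketch of what byte order and the BOM amount to. It is my own illustration, and it uses the standard codecs and sys modules, which the post has not covered:

```python
import codecs
import sys

teddy = "\U0001f9f8"
code_point = ord(teddy)

# Byte order is just the order in which an integer's bytes are written out.
big = code_point.to_bytes(4, "big")
little = code_point.to_bytes(4, "little")
print(big == bytes(teddy, encoding="utf-32be"))      # True
print(little == bytes(teddy, encoding="utf-32le"))   # True

# With no endianness given, the result is a BOM followed by the
# encoding in this machine's native byte order.
if sys.byteorder == "little":
    bom, explicit = codecs.BOM_UTF16_LE, "utf-16le"
else:
    bom, explicit = codecs.BOM_UTF16_BE, "utf-16be"

with_bom = bytes(teddy, encoding="utf-16")
print(with_bom == bom + bytes(teddy, encoding=explicit))   # True
```

Naming the byte order explicitly (as in utf-16le) gives the same bytes on every machine, which is usually what you want for files and network data.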

We can go the other way using the str constructor:

 # Bytes to strings…

 utf8_bytes = b'\xd6\x8d\x20\xd6\x8e'
 print(utf8_bytes)

 unicode_string = str(utf8_bytes, encoding='utf-8')
 print(unicode_string)
 print()


 utf16_bytes = b'\x8d\x05\x20\x00\x8e\x05'
 print(utf16_bytes)

 unicode_string = str(utf16_bytes, encoding='utf-16')
 print(unicode_string)
 print()


 utf32_bytes = b'\x00\x00\x05\x8d\x00\x00\x00\x20\x00\x00\x05\x8e'
 print(utf32_bytes)

 unicode_string = str(utf32_bytes, encoding='utf-32be')
 print(unicode_string)
 print()



As with converting a string to bytes, converting bytes to string requires an encoding argument. Note the space character is here coded as a hex escape sequence (\x20 for UTF-8, \x20,\x00 for UTF-16LE, and \x00,\x00,\x00,\x20 for UTF-32BE).

When run, this prints:

b'\xd6\x8d \xd6\x8e'
֍ ֎

b'\x8d\x05 \x00\x8e\x05'
֍ ֎

b'\x00\x00\x05\x8d\x00\x00\x00 \x00\x00\x05\x8e'
֍ ֎
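
The same conversions are also available as methods: str has encode and bytes has decode. The following round-trip sketch is my own addition, not one of the original fragments:

```python
original = "֍ ֎"
lengths = {}

for encoding in ("utf-8", "utf-16le", "utf-32be"):
    encoded = original.encode(encoding)    # str -> bytes
    decoded = encoded.decode(encoding)     # bytes -> str
    lengths[encoding] = len(encoded)
    print(f"{encoding:8s} {len(encoded):2d} bytes, round trip ok: {decoded == original}")
```

The three characters take 5, 6, and 12 bytes respectively, matching the byte strings shown above.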

Bottom line: Python makes it easy to work with all types of text.

We went far down the rabbit hole here. If you’ve worked with Unicode text before, you know there’s a lot to it. This introductory tour is meant only as an overview of Python’s text handling.

For mutable byte arrays, Python has the built-in bytearray data type:

 ### Bytearrays…

 ba = bytearray()
 print(ba)
 print()

 ba = bytearray([1,2,3,4,5])
 print(ba)
 print()

 ba = bytearray(b"Hello\x00")
 print(ba)
 print()

 ba = bytearray("Hello!", encoding='utf8')
 print(ba)
 print()

 ba = bytearray(6)
 print(ba)
 print()
 ba[0] = ord('A')
 ba[1] = ord('B')
 ba[2] = ord('C')
 ba[-3] = ord('X')
 ba[-2] = ord('Y')
 ba[-1] = ord('Z')
 print(ba)
 print()



Line #3, as always, uses the class constructor to create a default object — here a bytearray object — and assign it to the variable name ba. Also as always, the default is a zero-length empty object. Unlike previous data types, there is no bytearray literal.

Similar to other iterable classes, the bytearray constructor takes an iterable object to convert to a bytearray. Line #7 gives it a list of integers (all values must be in the range 0–255). Line #11 gives it a bytes object (which by definition holds only byte values). Line #15 gives it a str object, which requires that we provide an encoding.

Line #19 is a special case with an integer argument — this is taken as a length and creates a bytearray of given length (six in this case). The array values are initialized to zero.

Lines #22 to #27 modify the array — recall that negative index values index from the end of a list. Line #27 indexes the last item, line #26 the penultimate item.

When run, this prints:

bytearray(b'')

bytearray(b'\x01\x02\x03\x04\x05')

bytearray(b'Hello\x00')

bytearray(b'Hello!')

bytearray(b'\x00\x00\x00\x00\x00\x00')

bytearray(b'ABCXYZ')
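
Because a bytearray is mutable, it can also grow and shrink in place, much like a list. A brief sketch (my own example) of a few of its list-like operations:

```python
ba = bytearray(b"ABC")

ba.append(ord("D"))     # add one byte at the end
ba.extend(b"EF")        # add several bytes at once
ba[0:2] = b"xy"         # replace a slice in place
print(ba)               # bytearray(b'xyCDEF')

frozen = bytes(ba)      # an immutable copy of the current contents
print(frozen)           # b'xyCDEF'
```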

In part 4 we looked at the if-else statement as well as while and for loops. And we’ve seen that Python raises an Exception when it encounters an error.

Consider this code:

 # Python Exceptions (introduction)…

 def divider (a, b):
     '''Naive divider function.'''
     return a / b

 divider(42, 0)



The values we pass to divider cause line #5 to divide by zero — which is mathematically undefined. If we run this, Python raises ZeroDivisionError (one of the many built-in Exception types):

ZeroDivisionError: division by zero

When an error occurs, Python halts code execution and backs out of nested code blocks until either it finds an error handler or it backs all the way out to the Python interpreter — which terminates the program and displays the error.

In fact, it displays more than just the error: it displays a stack trace. Let’s give Python something to display by nesting some functions:

 # Python Exceptions (stack trace)…

 def divider (a, b):
     '''Naive divider function.'''
     return a / b

 def nested_function (x, y):
     '''Some random function.'''
     ...
     value = divider(x,y)
     ...
     return value

 def oops_function (n=0, m=0):
     '''A problem-causing function.'''
     ...
     nested_function(n, m)
     ...


 # We're gonna regret this…
 oops_function(42)



On line #22 we call the oops_function; on line #17 it calls the nested_function; on line #10 that function calls divider. Because we don’t provide argument m in line #22, it defaults to zero and causes an Exception on line #5.

When we run this, we get:

Traceback (most recent call last):

  File "C:\...\fragment.py", line 22, in <module>
    oops_function(42)

  File "C:\...\fragment.py", line 17, in oops_function
    nested_function(n, m)

  File "C:\...\fragment.py", line 10, in nested_function
    value = divider(x,y)

  File "C:\...\fragment.py", line 5, in divider
    return a / b

ZeroDivisionError: division by zero

(I added blank lines to make the output clearer.) Skipping the first line, the pair of lines at top is the highest level. Each pair below is one level down in the call stack. The first line of each pair has the filename, line number, and function name. The second line is the source code on that line.

As you see, we started at the module level calling oops_function (line #22). Next, in that function, we called nested_function (line #17). And in that function, we called divider (line #10). Lastly, in that function, we generate an error (line #5).

The line at the bottom is the actual error and its content. When we raise exceptions in code, we specify their type and content (if any).

The try-except statement lets us trap errors:

 # Python Try-Except…

 def divider (a, b):
     '''Smart divider function.'''
     print(f'divider({a}, {b})', end='')

     # Attempt a division…
     try:
         retval = a / b
         print(f' = {retval}\n')
         return retval

     # Catch any error…
     except:
         print(' *** Oops! ***\n')

     return None


 # Try some dividing…
 divider(42,21)
 divider(42, 0)
 divider( 1, 3)



The print statement in line #5 prints the function name and incoming arguments. The optional end keyword argument overrides the default newline character(s) — in this case, with nothing so we can print more on the same line.

Here, rather than use extra print statements to create blank lines, we embed \n in strings. These become newline (aka linefeed) characters. (We can also use \r to insert a carriage-return or \t to insert a tab. See Escape sequences for others.)

When run, this prints:

divider(42, 21) = 2.0

divider(42, 0) *** Oops! ***

divider(1, 3) = 0.3333333333333333

If the code in the try block — or in any function called from the block — causes an Exception, Python jumps to the except block (which clears the error condition).

Note that, if called code has its own try-except block, it becomes the new error-handler until the code exits it.
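
A minimal sketch of that nesting, under my own assumptions: the function names inner and outer and the events list are invented for illustration, and the handlers use except Exception rather than a bare except:

```python
events = []   # records which handler ran, in order

def inner(a, b):
    try:
        return a / b              # an error here is handled by inner
    except Exception:
        events.append("inner")
        print("inner handler")
        return None

def outer(a, b):
    try:
        result = inner(a, b)      # inner's handler is active during this call
        return result + 1         # an error here is handled by outer
    except Exception:
        events.append("outer")
        print("outer handler")
        return None

print(outer(4, 2))   # 3.0
print(outer(1, 0))   # prints "inner handler", then "outer handler", then None
```

In the second call, the division error never reaches outer because inner traps it first. The outer handler runs only because adding 1 to None fails after inner has returned.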

Above, we noted an error occurred but let the code continue. Because Python has many Exception types, it might be nice to know which one caused the error:

 # Python Try-Except…

 def divider (a, b):
     '''Smarter divider function.'''
     print(f'divider({a}, {b})', end='')

     # Attempt a division…
     try:
         retval = a / b
         print(f' = {retval}\n')
         return retval

     # Catch any error…
     except Exception as e:
         print(f'\nErr Type: {type(e).__name__}')
         print(f'Err Text: "{e}"\n')

     return None


 # Try some dividing…
 divider( 1, 3)
 divider(42,23)
 divider(42, 0)
 divider('a',5)



The except statement can specify the exceptions it catches. Here (line #14) we use the Exception class, which is a base class for all the usual Python exceptions. We want to refer to the exception object, so we give it a name: e (a common choice).

We use the built-in type function to get the exception’s actual type (class). All classes have a __name__ property (“dunder name”) we can access.

When run, this prints:

divider(1, 3) = 0.3333333333333333

divider(42, 23) = 1.826086956521739

divider(42, 0)
Err Type: ZeroDivisionError
Err Text: "division by zero"

divider(a, 5)
Err Type: TypeError
Err Text: "unsupported operand type(s) for /: 'str' and 'int'"

Here’s one last wrinkle before we move on:

 # Python Try-Except (alternate)…

 def divider (a, b):
     '''Smarter divider function.'''
     print(f'divider({a}, {b})', end='')

     # Attempt a division…
     try:
         retval = a / b
         print(f' = {retval}\n')
         return retval

     # Catch divide-by-zero errors…
     except ZeroDivisionError:
         print(' *** Opps! Division by zero!\n')

     # Catch data type errors…
     except TypeError:
         print(f' *** Invalid operands: {a}÷{b}\n')

     return None


 # Try some dividing…
 divider( 1, 3)
 divider(42,23)
 divider(42, 0)
 divider('a',5)



We can specify specific errors to trap. Line #14 catches only division-by-zero errors, and line #18 catches only data type errors. Note that we don’t need to name the exceptions because we don’t refer to them in code.

When run, this prints:

divider(1, 3) = 0.3333333333333333

divider(42, 23) = 1.826086956521739

divider(42, 0) *** Opps! Division by zero!

divider(a, 5) *** Invalid operands: a÷5
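
Earlier we noted that when we raise exceptions in code, we specify their type and content. Here is a small sketch of what that might look like. The checked_divider function is hypothetical, and choosing ValueError is my own assumption rather than anything from the fragments above:

```python
def checked_divider(a, b):
    '''Hypothetical divider that rejects a zero divisor itself.'''
    if b == 0:
        # The type is ValueError; the content is the message string.
        raise ValueError(f'cannot divide {a} by zero')
    return a / b


try:
    checked_divider(42, 0)
except ValueError as e:
    caught = e
    print(f'Err Type: {type(e).__name__}')   # Err Type: ValueError
    print(f'Err Text: "{e}"')                # Err Text: "cannot divide 42 by zero"
```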

Yet another topic to explore in more detail down the road.

In part 5 we saw how to define functions:

 # Basic function definition…

 # Define a new function…
 def my_function (x:int, y:int, a:str='', b:str=''):
     '''This function does stuff!'''
     ...
     # … stuff …
     ...


 # Call the function…
 my_function(2.1, 4.2, a='save')



Despite the assertion in the function’s docstring (line #5), the function does nothing. It’s just a reminder of what a Python function looks like.

We’ve also seen many times a common Python syntax pattern like this:

keyword some-stuff :
    code-body

Besides function definitions, it’s used in if-elif-else statements and in for and while loops. We’ll find it in class definitions when we get to those. It’s a common pattern.

We’ve seen that one-line code constructs are possible when the code-body is just one statement (or two or three very short statements joined with semi-colons). Very simple function definitions can also be “one-liners”:

 # One-line function definition…

 def ultimate (): return 42


 print(f'The Ultimate Answer is: {ultimate()}')
 print()



But this is fairly rare except in class methods (we’ll explore classes soon).

Instead, Python has lambda expressions — single expressions that are functions:

 ### Lambda function…

 # Define a lambda function…
 ultimate = lambda: 42


 # Use the function…
 print(f'The Ultimate Answer is: {ultimate()}')
 print()



This code is functionally identical to the code above. Both print:

The Ultimate Answer is: 42

The first example defines a function object named ultimate (line #3). The empty parentheses indicate it takes no arguments. The single line of code returns the integer value 42.

The second example defines a lambda object (line #4). The colon directly after the lambda keyword indicates it takes no arguments. The code after it gives the function a hard-coded value of 42. This lambda object is bound to the variable named ultimate.

The print statements in line #6 (first example) and line #8 (second example) work the same: they use an f-string with embedded code to call the ultimate function.

Note there is no return statement — a lambda definition has a single expression as its value, and this is automatically returned when the function is called.

As with regular functions, lambda functions can take arguments:

 ### Lambda Functions…

 # Define a lambda function…
 x_to_the_pi = lambda x: pow(x, 3.14159)


 # Use the function…
 print(f'{x_to_the_pi(2) = }')
 print(f'{x_to_the_pi(3) = }')
 print(f'{x_to_the_pi(4) = }')
 print()



Line #4 defines a lambda function with one parameter (named x). The function’s value uses the built-in pow (power) function to raise x to the power of pi (approximately).

When run, this prints:

x_to_the_pi(2) = 8.824961595059897
x_to_the_pi(3) = 31.544188740351338
x_to_the_pi(4) = 77.8799471542821

A lambda can have multiple arguments, including ones with defaults:

 ### Lambda Functions…

 # Define a lambda function…
 round_up = lambda x,step=10: step * (int(x/step) + 1)


 # Use the function…
 print(f'{round_up(42)      = }')
 print(f'{round_up(42, 5)   = }')
 print(f'{round_up(42, 50)  = }')
 print(f'{round_up(42, 100) = }')
 print()
 print(f'{round_up(421)       = }')
 print(f'{round_up(421, 50)   = }')
 print(f'{round_up(421, 100)  = }')
 print(f'{round_up(421, 500)  = }')
 print(f'{round_up(421, 1000) = }')
 print()



Line #4 defines a lambda with two parameters, one required (x) and one optional with a default (step). The value is the calculation seen after the colon.

When run, this prints:

round_up(42)      = 50
round_up(42, 5)   = 45
round_up(42, 50)  = 50
round_up(42, 100) = 100

round_up(421)       = 430
round_up(421, 50)   = 450
round_up(421, 100)  = 500
round_up(421, 500)  = 500
round_up(421, 1000) = 1000
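
One detail of this formula is worth a quick look: a value that is already an exact multiple of step gets bumped to the next multiple. Whether that is intended is not stated here, so treat the following as a sketch of an alternative (round_up_ceil is a hypothetical name, and math.ceil is my own substitution):

```python
import math

# The lambda from the example above.
round_up = lambda x,step=10: step * (int(x/step) + 1)

# A hypothetical variant that leaves exact multiples alone.
round_up_ceil = lambda x,step=10: step * math.ceil(x/step)

print(round_up(50))             # 60
print(round_up_ceil(50))        # 50
print(round_up_ceil(42))        # 50
print(round_up_ceil(421, 50))   # 450
```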

Note how the examples so far involve a lambda object assigned to a variable name. They can also be written inline and passed to functions:

 ### Inline Lambda Functions…

 # 13 tuples with random integers…
 nums = [
     ("h",82), ("d",20), ("e",99), ("m",99), ("l",31),
     ("i",84), ("g", 4), ("a",77), ("f",27), ("j",98),
     ("b",72), ("c",17), ("k",27),
 ]

 # List with default tuple sort…
 for ix,num in enumerate(sorted(nums)):
     print(f'{ix:2d}: {num[0]} = {num[1]:2d}')
 print()

 # List with defined sort function…
 for ix,num in enumerate(sorted(nums, key=lambda t:t[1])):
     print(f'{ix:2d}: {num[1]:2d} = {num[0]}')
 print()



Lines #4 to #8 define a list of 13 tuple objects, each containing a string and an integer. The strings are the single lowercase letters a through m, in scrambled order. The integers are random values from 1 to 99.

The built-in sorted function (lines #11 and #16) sorts whatever iterable it’s given, in this case a list of tuple objects. By default, when sorting objects such as tuples, sorted compares the first item in each, moving on to later items only on a match (if all items match, the two tuples are equal). Here, the first item is the string, so lines #11 to #13 sort alphabetically.

In line #16 we use the optional key parameter to extract a sorting value, and we use an inline lambda object to provide it. The key argument must be a one-parameter function expecting an item from the list and returning a sortable key. In this case, the t parameter is a tuple from the list, and we return the second item — the integer — as the sort-key.

When run, this prints:

 0: a = 77
 1: b = 72
 2: c = 17
 3: d = 20
 4: e = 99
 5: f = 27
 6: g =  4
 7: h = 82
 8: i = 84
 9: j = 98
10: k = 27
11: l = 31
12: m = 99

 0:  4 = g
 1: 17 = c
 2: 20 = d
 3: 27 = f
 4: 27 = k
 5: 31 = l
 6: 72 = b
 7: 77 = a
 8: 82 = h
 9: 84 = i
10: 98 = j
11: 99 = e
12: 99 = m
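
Notice the ties in the second listing (27 and 99). Python’s sort is stable, so items with equal keys keep their original relative order. If we want ties broken a particular way, the key function can return a tuple. A small sketch using a cut-down, reordered list of my own:

```python
# A shortened list (my own ordering) with two pairs of tied numbers.
nums = [("e",99), ("m",99), ("k",27), ("f",27), ("g", 4)]

# Sort by number only: ties keep their original order (k before f).
by_number = sorted(nums, key=lambda t: t[1])
print([t[0] for t in by_number])               # ['g', 'k', 'f', 'e', 'm']

# Sort by number, then by letter: the key is a tuple.
by_number_then_letter = sorted(nums, key=lambda t: (t[1], t[0]))
print([t[0] for t in by_number_then_letter])   # ['g', 'f', 'k', 'e', 'm']

# Largest number first; ties still keep their original order.
descending = sorted(nums, key=lambda t: t[1], reverse=True)
print([t[0] for t in descending])              # ['e', 'm', 'k', 'f', 'g']
```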

And on that note, the introductory tour concludes. Any questions?

The ZIP file linked below contains all the Python code fragments from part 1, part 2, part 3, part 4, part 5, and this part 6. Next time we’ll look at downloading, installing, and running Python.

Link: Zip file containing all code fragments used in this six-post tour.

∅


The Hard-Core Coder

~ I can't stop writing code!
