This series of posts is for those who have used a programming language before but are not familiar with Python. This post concludes the introduction to the language.
This last post of the introductory tour is a grab bag of features I skipped in previous posts. From now on we’ll be digging deeper into specific topics.
In part 1 we met the str class, Python’s data type for text. In part 2 we saw that text (aka string) objects are list-like (iterable) — they are ordered lists of characters (in part 3 we learned more about list-like objects).
A string’s iterable nature is revealed in list contexts:
002|
003| chars = list("Hello!")
004| print(f'{chars = }')
005| print()
006|
007| for char in "Goodbye!":
008|     print(f'{char = }')
009| print()
010|
011| text = "Hey, Moon!"
012|
013| for ix,char in enumerate(text):
014|     print(f'{ix}: {char}')
015| print()
016|
017| chars = list(text)
018| print(f'{chars = }')
019| print()
020|
When run, this prints:
chars = ['H', 'e', 'l', 'l', 'o', '!']

char = 'G'
char = 'o'
char = 'o'
char = 'd'
char = 'b'
char = 'y'
char = 'e'
char = '!'

0: H
1: e
2: y
3: ,
4:  
5: M
6: o
7: o
8: n
9: !

chars = ['H', 'e', 'y', ',', ' ', 'M', 'o', 'o', 'n', '!']
So, it’s easy to iterate through text character-by-character (or to turn text into a list of individual characters).
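Going the other way is just as easy. As a minimal sketch (the variable names are only for illustration), the str.join method glues a list of characters back into a single string:

```python
# Split a string into characters, then join them back together.
text = "Hey, Moon!"
chars = list(text)

# The string we call join() on is the separator; an empty
# string means "no separator".
rejoined = "".join(chars)
print(rejoined)

# Any separator works, for example a dash between characters.
dashed = "-".join("Hello")
print(dashed)
```

The first print shows the original text again; the second shows the letters of Hello separated by dashes.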
Python strings are Unicode and may contain any valid Unicode character including emojis. [See this LCC post for a general overview or this HCC post for a more technical look.]
Python 3 source files are UTF-8 by default, so Unicode characters can appear directly in Python strings (assuming the text editor handles it; most do):
002|
003| unicode_string = "֍ Ϣўȓⅆ ֆ៣ᥡᚾℏℯ ֎"
004|
005| print(unicode_string)
006| print()
007|
008| unicode_chars = list(unicode_string)
009| print(unicode_chars)
010| print()
011|
012| for ix,char in enumerate(unicode_string, start=1):
013|     print(f'{ix:2d}: {char} ({ord(char)})')
014| print()
015|
016| ascii_editors = "\u058d \u03e2\u045e\u0213\u2146 \u058e"
017| print(ascii_editors)
018| print()
019|
As in line #16, the \u#### escape sequence can represent any 16-bit character using plain ASCII. The \x## escape sequence can represent any 8-bit character, and the \U######## escape sequence can represent any 32-bit character (e.g. many emoji characters — the teddy bear emoji below is \U0001f9f8). In all cases, the # is a hex digit.
When run, this prints:
֍ Ϣўȓⅆ ֆ៣ᥡᚾℏℯ ֎

['֍', ' ', 'Ϣ', 'ў', 'ȓ', 'ⅆ', ' ', 'ֆ', '៣', 'ᥡ', 'ᚾ', 'ℏ', 'ℯ', ' ', '֎']

 1: ֍ (1421)
 2:   (32)
 3: Ϣ (994)
 4: ў (1118)
 5: ȓ (531)
 6: ⅆ (8518)
 7:   (32)
 8: ֆ (1414)
 9: ៣ (6115)
10: ᥡ (6497)
11: ᚾ (5822)
12: ℏ (8463)
13: ℯ (8495)
14:   (32)
15: ֎ (1422)

֍ Ϣўȓⅆ ֎
The built-in ord (“ordinal”) function (line #13) returns a character’s numeric value — its Unicode index (aka ordinal number aka code point). For ASCII characters, this is the usual ASCII value. (ASCII maps directly to Unicode.) For Unicode characters, as seen above, the ordinal numbers are larger than ASCII values.
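The built-in chr function is the inverse of ord: give it a code point and it returns the one-character string. A small sketch of the round trip (variable names are just for illustration):

```python
# ord() maps a character to its code point; chr() maps it back.
code_point = ord("A")
print(code_point)          # 65
print(chr(code_point))     # A

# The same round trip works beyond ASCII.
bear = "\U0001f9f8"
bear_code = ord(bear)
print(bear_code, hex(bear_code))
print(chr(bear_code) == bear)
```

The built-in hex function shows the code point in the same hexadecimal form used by the \U escape sequence.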
Note: characters are the “atoms” in strings — from the lowly “A” (ordinal value 65) to emojis like the teddy bear (“🧸” — ordinal value 129528 — 1f9f8 in hex).
But 8-bit (aka byte aka octet) contexts — such as memory, file storage, or network communications — use bytes as “atoms”. In such a context, Unicode characters must be encoded with multiple bytes [see the technical look post for more]. For example, the teddy bear emoji has a UTF-8 encoding comprised of four bytes: 240, 159, 167, 184 (or F0, 9F, A7, B8 in hexadecimal). (Note this is not the ordinal value.)
Python has the built-in bytes data type for 8-bit strings:
002|
003| bs = bytes()
004| bs = b''
005| print(bs)
006| print()
007|
008| bs = b'ABCDEF\x00'
009| print(bs)
010| print()
011|
012| bs = b'\x00\x01\x02\x80\xfd\xfe\xff'
013| bs = bytes([0x00, 0x01, 0x02, 0x80, 0xfd, 0xfe, 0xff])
014| print(bs)
015| print()
016|
017| ns = [0, 1, 2, 128, 253, 254, 255]
018| bs = bytes(ns)
019| print(bs)
020| print()
021|
022| bs:bytes = b'ABCDEF\x00'
023| print(bs)
024| print()
025|
This is similar to the string example in part 1. Line #3 uses the bytes class as a constructor to create a new bytes instance and assign it to the variable named bs (for “byte string”). We provide no arguments to the constructor, so the new instance has the default value: a zero-length byte string.
Line #4 does the same thing using a bytes literal — note the letter b prepending the string. (Prepending the letter b to make a bytes string is similar to prepending the letter f to make an f-string.)
Line #8 assigns a seven-byte literal value to bs — the \x## escape sequence allows inserting arbitrary 8-bit values (in hex). Here, it adds a trailing null (zero) to some ASCII characters (some languages terminate strings with a null — Python does not; it stores the string length).
Lines #12 and #13 both create identical byte strings with 8-bit binary values, line #12 with a byte string literal and line #13 with a list of integers given to the bytes constructor. Note how Python allows literal hexadecimal integer values (which can be as large as needed).
Lines #17 and #18 combine to create a byte string identical to those in lines #12 and #13 but using decimal literal values rather than hex.
Lastly, line #22 matches line #8 but annotates bs with a type-hint.
When run, this prints:
b''

b'ABCDEF\x00'

b'\x00\x01\x02\x80\xfd\xfe\xff'

b'\x00\x01\x02\x80\xfd\xfe\xff'

b'ABCDEF\x00'
Python displays bytes values with the leading letter b. It displays printable characters as-is and non-printable characters using \x## sequences.
We have str for strings of regular text (including Unicode characters) and bytes for strings of 8-bit values. Both are iterable. Both are immutable — once created they cannot be altered. Both have methods in common but also distinct capabilities.
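Here is a short sketch of those shared traits. It borrows the try-except statement covered further down, and the errors list is only there to record what happened:

```python
bs = b"ABC"
errors = []   # names of the exceptions we trigger

# Iterating a bytes object yields integers, not characters.
values = list(bs)
print(values)              # [65, 66, 67]

# bytes objects are immutable: item assignment raises TypeError.
try:
    bs[0] = 90
except TypeError as e:
    errors.append(type(e).__name__)

# str objects are immutable too.
text = "ABC"
try:
    text[0] = "Z"
except TypeError as e:
    errors.append(type(e).__name__)

# Every value given to bytes() must fit in one byte (0-255).
try:
    bytes([256])
except ValueError as e:
    errors.append(type(e).__name__)

print(errors)
```

Both immutable types refuse item assignment the same way, and the bytes constructor rejects any integer that does not fit in a byte.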
Python makes it easy to convert (Unicode) strings to any of the Unicode binary encoding forms. The most common of these is UTF-8 for 8-bit contexts, but there are also UTF-16 and UTF-32 for 16- and 32-bit contexts. Because they use multiple bytes, the 16- and 32-bit sizes have an order: most-significant byte first (“big endian”, 3-2-1-0) or least-significant byte first (“little endian”, 0-1-2-3).
002|
003| unicode_string = "\U0001f9f8"
004| print("Unicode:")
005| print(unicode_string)
006| print()
007|
008| unicode_bytes = bytes(unicode_string, encoding='utf-8')
009| print("UTF-8:")
010| print(unicode_bytes)
011| print()
012|
013| unicode_bytes = bytes(unicode_string, encoding='utf-16')
014| print("UTF-16:")
015| print(unicode_bytes)
016| print()
017|
018| unicode_bytes = bytes(unicode_string, encoding='utf-16le')
019| print("UTF-16 Little Endian:")
020| print(unicode_bytes)
021| print()
022|
023| unicode_bytes = bytes(unicode_string, encoding='utf-16be')
024| print("UTF-16 Big Endian:")
025| print(unicode_bytes)
026| print()
027|
028| unicode_bytes = bytes(unicode_string, encoding='utf-32')
029| print("UTF-32:")
030| print(unicode_bytes)
031| print()
032|
033| unicode_bytes = bytes(unicode_string, encoding='utf-32le')
034| print("UTF-32 Little Endian:")
035| print(unicode_bytes)
036| print()
037|
038| unicode_bytes = bytes(unicode_string, encoding='utf-32be')
039| print("UTF-32 Big Endian:")
040| print(unicode_bytes)
041| print()
042|
The bytes constructor takes an iterable object to convert to a bytes string. The encoding parameter (in this case required) specifies the encoding form. Values provided in the iterable must all be in the correct range (0–255) or Python raises an Exception.
When run, this prints:
Unicode:
🧸

UTF-8:
b'\xf0\x9f\xa7\xb8'

UTF-16:
b'\xff\xfe>\xd8\xf8\xdd'

UTF-16 Little Endian:
b'>\xd8\xf8\xdd'

UTF-16 Big Endian:
b'\xd8>\xdd\xf8'

UTF-32:
b'\xff\xfe\x00\x00\xf8\xf9\x01\x00'

UTF-32 Little Endian:
b'\xf8\xf9\x01\x00'

UTF-32 Big Endian:
b'\x00\x01\xf9\xf8'
Note to the Unicode-savvy: the UTF-16 and UTF-32 encodings, with no “endianness” specified (lines #13 and #28), use the machine’s native byte order (little endian here, as on most systems) and prepend a BOM (byte order mark).
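For the curious, here is a small sketch that checks the BOM for UTF-16. It uses the standard codecs and sys modules; the variable names are just for illustration:

```python
import codecs
import sys

bear = "\U0001f9f8"

# The machine's native byte order: 'little' or 'big'.
print(sys.byteorder)

# Plain 'utf-16' prepends a BOM; the explicit-endian forms do not.
with_bom = bytes(bear, encoding="utf-16")
little = bytes(bear, encoding="utf-16le")
big = bytes(bear, encoding="utf-16be")

# The first two bytes are one of the two UTF-16 byte order marks.
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(has_bom)

# Stripping the BOM leaves one of the explicit-endian encodings.
print(with_bom[2:] in (little, big))
```

Which of the two explicit-endian results matches depends on the machine running the code, which is why the sketch checks against both.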
We can go the other way using the str constructor:
002|
003| utf8_bytes = b'\xd6\x8d\x20\xd6\x8e'
004| print(utf8_bytes)
005|
006| unicode_string = str(utf8_bytes, encoding='utf-8')
007| print(unicode_string)
008| print()
009|
010|
011| utf16_bytes = b'\x8d\x05\x20\x00\x8e\x05'
012| print(utf16_bytes)
013|
014| unicode_string = str(utf16_bytes, encoding='utf-16')
015| print(unicode_string)
016| print()
017|
018|
019| utf32_bytes = b'\x00\x00\x05\x8d\x00\x00\x00\x20\x00\x00\x05\x8e'
020| print(utf32_bytes)
021|
022| unicode_string = str(utf32_bytes, encoding='utf-32be')
023| print(unicode_string)
024| print()
025|
As with converting a string to bytes, converting bytes to string requires an encoding argument. Note the space character is here coded as a hex escape sequence (\x20 for UTF-8, \x20,\x00 for UTF-16LE, and \x00,\x00,\x00,\x20 for UTF-32BE).
When run, this prints:
b'\xd6\x8d \xd6\x8e'
֍ ֎

b'\x8d\x05 \x00\x8e\x05'
֍ ֎

b'\x00\x00\x05\x8d\x00\x00\x00 \x00\x00\x05\x8e'
֍ ֎
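The constructors are not the only route. Strings have an encode method and byte strings have a decode method that do the same conversions. A minimal sketch of the round trip (variable names are just for illustration):

```python
text = "\u058d \u058e"

# str.encode() is the method form of bytes(text, encoding=...).
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
print(utf8_bytes == bytes(text, encoding="utf-8"))

# bytes.decode() is the method form of str(data, encoding=...).
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)
```

Which form to use is mostly a matter of taste; the method form tends to read more naturally in a chain of operations.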
Bottom line: Python makes it easy to work with all types of text.
We went far down the rabbit hole here. If you’ve worked with Unicode text before, you know there’s a lot to it. This introductory tour is meant only as an overview of Python’s text handling.
For mutable byte arrays, Python has the built-in bytearray data type:
002|
003| ba = bytearray()
004| print(ba)
005| print()
006|
007| ba = bytearray([1,2,3,4,5])
008| print(ba)
009| print()
010|
011| ba = bytearray(b"Hello\x00")
012| print(ba)
013| print()
014|
015| ba = bytearray("Hello!", encoding='utf8')
016| print(ba)
017| print()
018|
019| ba = bytearray(6)
020| print(ba)
021| print()
022| ba[0] = ord('A')
023| ba[1] = ord('B')
024| ba[2] = ord('C')
025| ba[-3] = ord('X')
026| ba[-2] = ord('Y')
027| ba[-1] = ord('Z')
028| print(ba)
029| print()
030|
Line #3, as always, uses the class constructor to create a default object — here a bytearray object — and assign it to the variable name ba. Also as always, the default is a zero-length empty object. Unlike previous data types, there is no bytearray literal.
Similar to other iterable classes, the bytearray constructor takes an iterable object to convert to a bytearray. Line #7 gives it a list of integers (all values must be less than 256). Line #11 gives it a bytes object (which must be only byte values). Line #15 gives it a str object — this requires we provide an encoding.
Line #19 is a special case with an integer argument — this is taken as a length and creates a bytearray of given length (six in this case). The array values are initialized to zero.
Lines #22 to #27 modify the array — recall that negative index values index from the end of a list. Line #27 indexes the last item, line #26 the penultimate item.
When run, this prints:
bytearray(b'')

bytearray(b'\x01\x02\x03\x04\x05')

bytearray(b'Hello\x00')

bytearray(b'Hello!')

bytearray(b'\x00\x00\x00\x00\x00\x00')

bytearray(b'ABCXYZ')
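Being mutable, a bytearray can also grow and change in place, much like a list. A brief sketch of a few of its methods (the variable names are just for illustration):

```python
ba = bytearray(b"ABC")

# Grow the array in place.
ba.append(ord("D"))        # add one byte
ba.extend(b"EF")           # add several bytes
print(ba)                  # bytearray(b'ABCDEF')

# Replace a slice in place.
ba[0:3] = b"xyz"
print(ba)                  # bytearray(b'xyzDEF')

# Convert to an immutable bytes object when done.
frozen = bytes(ba)
print(frozen)              # b'xyzDEF'
```

A common pattern is to build up data in a bytearray and then hand off an immutable bytes copy.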
In part 4 we looked at the if-else statement as well as while and for loops. And we’ve seen that Python raises an Exception when it encounters an error.
Consider this code:
002|
003| def divider (a, b):
004|     '''Naive divider function.'''
005|     return a / b
006|
007| divider(42, 0)
008|
The values we pass to divider cause line #5 to divide by zero — which is mathematically undefined. If we run this, Python raises ZeroDivisionError (one of the many built-in Exception types):
ZeroDivisionError: division by zero
When an error occurs, Python halts code execution and backs out of nested code blocks until either it finds an error handler or it backs all the way out to the Python interpreter — which terminates the program and displays the error.
In fact, it displays more than just the error: it displays a stack trace. Let’s give Python something to display by nesting some functions:
002|
003| def divider (a, b):
004|     '''Naive divider function.'''
005|     return a / b
006|
007| def nested_function (x, y):
008|     '''Some random function.'''
009|     ...
010|     value = divider(x,y)
011|     ...
012|     return value
013|
014| def oops_function (n=0, m=0):
015|     '''A problem-causing function.'''
016|     ...
017|     nested_function(n, m)
018|     ...
019|
020|
021| # We're gonna regret this...
022| oops_function(42)
023|
On line #22 we call the oops_function; on line #17 it calls the nested_function; on line #10 that function calls divider. Because we don’t provide argument m in line #22, it defaults to zero and causes an Exception on line #5.
When we run this, we get:
Traceback (most recent call last):

  File "C:\...\fragment.py", line 22, in <module>
    oops_function(42)

  File "C:\...\fragment.py", line 17, in oops_function
    nested_function(n, m)

  File "C:\...\fragment.py", line 10, in nested_function
    value = divider(x,y)

  File "C:\...\fragment.py", line 5, in divider
    return a / b

ZeroDivisionError: division by zero
(I added blank lines to make the output clearer.) Skipping the first line, the pair of lines at top is the highest level. Each pair below is one level down in the call stack. The first line of each pair has the filename, line number, and function name. The second line is the source code on that line.
As you see, we started at the module level calling oops_function (line #22). Next, in that function, we called nested_function (line #17). And in that function, we called divider (line #10). Lastly, in that function, we generate an error (line #5).
The line at the bottom is the actual error and its content. When we raise exceptions in code, we specify their type and content (if any).
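To make that concrete, here is a minimal sketch of raising an exception ourselves with the raise statement. The checked_divider function and its message text are hypothetical, made up for this illustration:

```python
def checked_divider(a, b):
    '''Hypothetical divider that rejects a zero divisor up front.'''
    if b == 0:
        # The exception type, plus the text that becomes its content.
        raise ValueError("divisor must not be zero")
    return a / b

print(checked_divider(42, 21))

# Calling checked_divider(42, 0) with no error handler would end
# the program with a traceback whose last line reads:
# ValueError: divisor must not be zero
```

The type we pick (ValueError here) and the string we pass are exactly what appear on that bottom line.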
The try-except statement lets us trap errors:
002|
003| def divider (a, b):
004|     '''Smart divider function.'''
005|     print(f'divider({a}, {b})', end='')
006|
007|     # Attempt a division…
008|     try:
009|         retval = a / b
010|         print(f' = {retval}\n')
011|         return retval
012|
013|     # Catch any error…
014|     except:
015|         print(' *** Oops! ***\n')
016|
017|     return None
018|
019|
020| # Try some dividing…
021| divider(42,21)
022| divider(42, 0)
023| divider( 1, 3)
024|
The print statement in line #5 prints the function name and incoming arguments. The optional end keyword argument overrides the default newline character(s) — in this case, with nothing so we can print more on the same line.
Here, rather than using extra print statements to create blank lines, we embed \n in strings. These become newline (aka linefeed) characters. (We can also use \r to insert a carriage-return or \t to insert a tab. See Escape sequences for others.)
When run, this prints:
divider(42, 21) = 2.0

divider(42, 0) *** Oops! ***

divider(1, 3) = 0.3333333333333333
If the code in the try block — or in any function called from the block — causes an Exception, Python jumps to the except block (which clears the error condition).
Note that, if called code has its own try-except block, it becomes the new error-handler until the code exits it.
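A small sketch may make the nesting clearer. The function names and the handled_by list are hypothetical, and the handlers name a specific exception type, which is explained further down:

```python
handled_by = []   # records which handler ran, in order

def inner(a, b):
    '''Hypothetical function with its own error handler.'''
    try:
        return a / b
    except ZeroDivisionError:
        handled_by.append("inner")
        return None

def no_handler(a, b):
    '''Hypothetical function with no handler of its own.'''
    return a / b

try:
    inner(1, 0)        # caught inside inner(); nothing reaches us
    no_handler(1, 0)   # nothing catches it there, so it lands here
except ZeroDivisionError:
    handled_by.append("outer")

print(handled_by)
```

The first error never leaves inner because its own handler deals with it; the second backs out of no_handler until it reaches the outer handler.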
Above, we noted an error occurred but let the code continue. Because Python has many Exception types, it might be nice to know which one caused the error:
002|
003| def divider (a, b):
004|     '''Smarter divider function.'''
005|     print(f'divider({a}, {b})', end='')
006|
007|     # Attempt a division…
008|     try:
009|         retval = a / b
010|         print(f' = {retval}\n')
011|         return retval
012|
013|     # Catch any error…
014|     except Exception as e:
015|         print(f'\nErr Type: {type(e).__name__}')
016|         print(f'Err Text: "{e}"\n')
017|
018|     return None
019|
020|
021| # Try some dividing…
022| divider( 1, 3)
023| divider(42,23)
024| divider(42, 0)
025| divider('a',5)
026|
The except statement can specify the exceptions it catches. Here (line #14) we use the Exception class, which is a base class for all the usual Python exceptions. We want to refer to the caught exception, so we give it a name: e (a common choice).
We use the built-in type function to get the exception’s actual type (class). All classes have a __name__ property (“dunder name”) we can access.
When run, this prints:
divider(1, 3) = 0.3333333333333333

divider(42, 23) = 1.826086956521739

divider(42, 0)
Err Type: ZeroDivisionError
Err Text: "division by zero"

divider(a, 5)
Err Type: TypeError
Err Text: "unsupported operand type(s) for /: 'str' and 'int'"
Here’s one last wrinkle before we move on:
002|
003| def divider (a, b):
004|     '''Smarter divider function.'''
005|     print(f'divider({a}, {b})', end='')
006|
007|     # Attempt a division…
008|     try:
009|         retval = a / b
010|         print(f' = {retval}\n')
011|         return retval
012|
013|     # Catch divide-by-zero errors…
014|     except ZeroDivisionError:
015|         print(' *** Opps! Division by zero!\n')
016|
017|     # Catch data type errors…
018|     except TypeError:
019|         print(f' *** Invalid operands: {a}÷{b}\n')
020|
021|     return None
022|
023|
024| # Try some dividing…
025| divider( 1, 3)
026| divider(42,23)
027| divider(42, 0)
028| divider('a',5)
029|
We can specify specific errors to trap. Line #14 catches only division-by-zero errors, and line #18 catches only data type errors. Note that we don’t need to name the exceptions because we don’t refer to them in code.
When run, this prints:
divider(1, 3) = 0.3333333333333333

divider(42, 23) = 1.826086956521739

divider(42, 0) *** Opps! Division by zero!

divider(a, 5) *** Invalid operands: a÷5
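When several error types deserve the same treatment, one except clause can list them in a tuple, and we can still name the caught exception. A minimal sketch (this variant of divider is hypothetical, not one of the fragments in the ZIP file):

```python
def divider(a, b):
    '''Hypothetical variant that handles both errors in one clause.'''
    try:
        return a / b
    # A tuple of exception types shares one handler; 'as e' names
    # whichever exception was actually raised.
    except (ZeroDivisionError, TypeError) as e:
        print(f"{type(e).__name__}: {e}")
        return None

divider(42, 0)
divider('a', 5)
result = divider(1, 4)
print(result)
```

Any exception not listed in the tuple still propagates to whatever handler, if any, sits further up the call stack.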
Yet another topic to explore in more detail down the road.
In part 5 we saw how to define functions:
002|
003| # Define a new function...
004| def my_function (x:int, y:int, a:str='', b:str=''):
005|     '''This function does stuff!'''
006|     ...
007|     # ... stuff ...
008|     ...
009|
010|
011| # Call the function...
012| my_function(2.1, 4.2, a='save')
013|
Despite the assertion in the function’s docstring (line #5), the function does nothing. It’s just a reminder of what a Python function looks like.
We’ve also seen many times a common Python syntax pattern like this:
keyword some-stuff :
    code-body
Besides function definitions, it’s used in if-elif-else statements and in for and while loops. We’ll find it in class definitions when we get to those. It’s a common pattern.
We’ve seen that one-line code constructs are possible when the code-body is just one statement (or two or three very short statements joined with semi-colons). Very simple function definitions can also be “one-liners”:
002|
003| def ultimate (): return 42
004|
005|
006| print(f'The Ultimate Answer is: {ultimate()}')
007| print()
008|
But this is fairly rare except in class methods (we’ll explore classes soon).
Instead, Python has lambda expressions — single expressions that are functions:
002|
003| # Define a lambda function…
004| ultimate = lambda: 42
005|
006|
007| # Use the function…
008| print(f'The Ultimate Answer is: {ultimate()}')
009| print()
010|
This code is functionally identical to the code above. Both print:
The Ultimate Answer is: 42
The first example defines a function object named ultimate (line #3). The empty parentheses indicate it takes no arguments. The single line of code returns the integer value 42.
The second example defines a lambda object (line #4). The colon directly after the lambda keyword indicates it takes no arguments. The code after it gives the function a hard-coded value of 42. This lambda object is bound to the variable named ultimate.
The print statements in line #6 (first example) and line #8 (second example) work the same: they use an f-string with embedded code to call the ultimate function.
Note there is no return statement — a lambda definition has a single expression as its value, and this is automatically returned when the function is called.
As with regular functions, lambda functions can take arguments:
002|
003| # Define a lambda function…
004| x_to_the_pi = lambda x: pow(x, 3.14159)
005|
006|
007| # Use the function…
008| print(f'{x_to_the_pi(2) = }')
009| print(f'{x_to_the_pi(3) = }')
010| print(f'{x_to_the_pi(4) = }')
011| print()
012|
Line #4 defines a lambda function with one parameter (named x). The function’s value uses the built-in pow (power) function to raise x to the power of pi (approximately).
When run, this prints:
x_to_the_pi(2) = 8.824961595059897
x_to_the_pi(3) = 31.544188740351338
x_to_the_pi(4) = 77.8799471542821
A lambda can have multiple arguments, including ones with defaults:
002|
003| # Define a lambda function…
004| round_up = lambda x,step=10: step * (int(x/step) + 1)
005|
006|
007| # Use the function…
008| print(f'{round_up(42) = }')
009| print(f'{round_up(42, 5) = }')
010| print(f'{round_up(42, 50) = }')
011| print(f'{round_up(42, 100) = }')
012| print()
013| print(f'{round_up(421) = }')
014| print(f'{round_up(421, 50) = }')
015| print(f'{round_up(421, 100) = }')
016| print(f'{round_up(421, 500) = }')
017| print(f'{round_up(421, 1000) = }')
018| print()
019|
Line #4 defines a lambda with two parameters, one required (x) and one optional with a default (step). The value is the calculation seen after the colon.
When run, this prints:
round_up(42) = 50
round_up(42, 5) = 45
round_up(42, 50) = 50
round_up(42, 100) = 100

round_up(421) = 430
round_up(421, 50) = 450
round_up(421, 100) = 500
round_up(421, 500) = 500
round_up(421, 1000) = 1000
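One quirk worth knowing: because this lambda always adds one step, a value that is already an exact multiple moves up to the next step. If that is not what you want, a possible alternative (round_up_ceil is a hypothetical name, and this is only a sketch) uses math.ceil from the standard library:

```python
import math

# The lambda from above: always moves to the *next* step.
round_up = lambda x, step=10: step * (int(x/step) + 1)

# Hypothetical alternative: leaves exact multiples unchanged.
round_up_ceil = lambda x, step=10: step * math.ceil(x / step)

print(round_up(40), round_up_ceil(40))
print(round_up(42), round_up_ceil(42))
```

The two agree for values between steps and differ only when the value already sits on a step.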
Note how the examples so far involve a lambda object assigned to a variable name. They can also be written inline and passed to functions:
002|
003| # 13 tuples with random integers…
004| nums = [
005|     ("h",82), ("d",20), ("e",99), ("m",99), ("l",31),
006|     ("i",84), ("g", 4), ("a",77), ("f",27), ("j",98),
007|     ("b",72), ("c",17), ("k",27),
008| ]
009|
010| # List with default tuple sort…
011| for ix,num in enumerate(sorted(nums)):
012|     print(f'{ix:2d}: {num[0]} = {num[1]:2d}')
013| print()
014|
015| # List with defined sort function…
016| for ix,num in enumerate(sorted(nums, key=lambda t:t[1])):
017|     print(f'{ix:2d}: {num[1]:2d} = {num[0]}')
018| print()
019|
Lines #4 to #8 define a list of 13 tuple objects, each containing a string and an integer. The strings are single lowercase characters (a through m) in scrambled order. The integers are random values from 1 to 99.
The built-in sorted function (lines #11 and #16) sorts whatever iterable it’s given. In this case a list of tuple objects. By default, when sorting objects such as tuples, sorted considers the first item in each (moving on to later items on a match — all items matching means the two match). Here, the first item is the string, so lines #11 to #13 sort alphabetically.
In line #16 we use the optional key parameter to extract a sorting value, and we use an inline lambda object to provide it. The key argument must be a one-parameter function expecting an item from the list and returning a sortable key. In this case, the t parameter is a tuple from the list, and we return the second item — the integer — as the sort-key.
When run, this prints:
 0: a = 77
 1: b = 72
 2: c = 17
 3: d = 20
 4: e = 99
 5: f = 27
 6: g =  4
 7: h = 82
 8: i = 84
 9: j = 98
10: k = 27
11: l = 31
12: m = 99

 0:  4 = g
 1: 17 = c
 2: 20 = d
 3: 27 = f
 4: 27 = k
 5: 31 = l
 6: 72 = b
 7: 77 = a
 8: 82 = h
 9: 84 = i
10: 98 = j
11: 99 = e
12: 99 = m
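Notice that items with equal keys (the two 27s and the two 99s) keep their original relative order. A key function can also return a tuple to break such ties explicitly, and sorted accepts a reverse flag. A brief sketch using a shortened, hypothetical list:

```python
nums = [("h",82), ("k",27), ("e",99), ("m",99), ("f",27), ("d",20)]

# Number only: ties keep their original order (k before f).
by_num = sorted(nums, key=lambda t: t[1])
print(by_num)

# Number first, then letter: ties are broken alphabetically.
by_num_then_letter = sorted(nums, key=lambda t: (t[1], t[0]))
print(by_num_then_letter)

# Largest number first.
by_num_desc = sorted(nums, key=lambda t: t[1], reverse=True)
print(by_num_desc)
```

Returning a tuple from the key function reuses the default tuple comparison described above: first item first, later items only on a match.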
And on that note, the introductory tour concludes. Any questions?
The ZIP file linked below contains all the Python code fragments from part 1, part 2, part 3, part 4, part 5, and this part 6. Next time we’ll look at downloading, installing, and running Python.
Link: Zip file containing all code fragments used in this six-post tour.
∅
This post is: This is Python! (part 6)