While I love coding, it turns out I don’t enjoy blogging about it quite as much as I thought I would. But I’m not ready to toss in the towel yet, and to help develop more of a habit of posting here, I thought I’d try posting some really simple things.

These are simple tricks intended to help beginners dig deeper into coding with Python.

The task here is breaking a given string of bytes into groups of a user-specified width. This is to support a hex dump application that can display bytes (8 bits), words (16 bits), double-words (32 bits), or byte groups of any user-specified size. Here’s an example of a hex dump displaying bytes (16 per row):

0000 | 23 09 54 69 74 6c 65 09 49 53 42 4e 09 44 61 74 | #.Title.ISBN.Dat |
0010 | 65 09 56 6f 69 63 65 09 4f 72 64 65 72 52 65 61 | e.Voice.OrderRea |
0020 | 64 0d 0a 31 09 4b 69 6c 6c 69 6e 67 20 46 6c 6f | d..1.Killing.Flo |
0030 | 6f 72 09 30 2d 35 31 35 2d 31 32 33 34 34 2d 37 | or.0-515-12344-7 |
0040 | 09 4d 61 72 63 68 20 31 39 39 37 09 31 73 74 09 | .March.1997.1st. |
0050 | 31 37 0d 0a 32 09 44 69 65 20 54 72 79 69 6e 67 | 17..2.Die.Trying |
0060 | 09 30 2d 33 39 39 2d 31 34 33 37 39 2d 33 09 4a | .0-399-14379-3.J |
0070 | 75 6c 79 20 31 39 39 38 09 33 72 64 09 31 36 0d | uly.1998.3rd.16. |

A second requirement is that the order of the bytes within the group (if the group is larger than one) should be selectable between “big endian” and “little endian”.

A third requirement asks for the groups in rows of a user-specified length. This allows the application to build both the row of hex bytes and the row of characters that accompany it on the right side.

Not everything we look at will meet all three, but they are useful recipes for cases that don’t involve all three requirements.

§ §

Simply breaking a sequence into chunks of a given size is very easy in Python:

001| def chunker (seq, width):
002|     '''Return seq elements in chunks.'''
003|     for ix in range(0, len(seq), width):
004|         yield seq[ix:ix+width]
005| 
006| example_text = '''\
006| While I love coding, it turns out I don't enjoy blogging \
006| about it quite as much as I thought I would. I'm not \
006| ready to toss in the towel yet, and to help develop more \
006| of a habit of posting here, I thought I'd try posting \
006| some really simple things.'''
007| 
008| for c in chunker(example_text, 24):
009|     print(c)
010| print()
011| 

The chunker function (line #1) only needs two lines of code to break the sequence you pass it into chunks of the specified size.

When run, it prints:

While I love coding, it
turns out I don't enjoy
blogging about it quite
as much as I thought I w
ould. I'm not ready to t
oss in the towel yet, an
d to help develop more o
f a habit of posting her
e, I thought I'd try pos
ting some really simple
things.
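
The chunker relies on len() and slicing, so it works on any sequence (strings, lists, tuples, bytes) but not on a plain iterator such as another generator. As a rough sketch of how the same idea might be extended, the function below (chunk_iter is a hypothetical name, not part of the code above) uses itertools.islice instead of slicing:

```python
from itertools import islice

def chunk_iter(iterable, width):
    '''Yield tuples of up to width items from any iterable.'''
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, width))
        if not chunk:
            # Nothing left to read, so stop the generator.
            return
        yield chunk

# Works on a generator, which has no len() and can't be sliced...
squares = (n * n for n in range(7))
for chunk in chunk_iter(squares, 3):
    print(chunk)
```

The trade-off is that the chunks come back as tuples rather than as slices of the original type, so for strings and bytes the simple slicing version is usually the more convenient one.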

With a little juggling we can use it to add commas to numbers (which necessarily makes them strings):

001| def commas (number):
002|     '''Insert commas every three digits.'''
003|     s0 = str(number)
004|     s1 = chunker(list(reversed(s0)),3)
005|     s2 = [''.join(ch) for ch in s1]
006|     s3 = ','.join(s2)
007|     s4 = reversed(s3)
008|     return ''.join(s4)
009| 
010| print(commas(1))
011| print(commas(12))
012| print(commas(123))
013| print(commas(1230))
014| print(commas(12300))
015| print(commas(123000))
016| print(commas(1230000))
017| print(commas(12300000))
018| print(commas(123000000))
019| 

The function expects a number, so first we turn it into a string (line #3). We reverse that string (because we want to chunk from the tail) and pass it to the chunker (line #4), which returns a generator object (s1). In line #5 we join each triplet of digits into a three-digit string, and in line #6 we join those with commas between them. Line #7 reverses the reversed string, and line #8 rejoins the characters and returns the result.
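
As a sanity check on that walk-through, the result can be compared against Python’s built-in format(), where a comma as the format spec asks for a thousands separator. This is only a sketch; it repeats chunker and a condensed commas so that it runs on its own:

```python
def chunker(seq, width):
    '''Return seq elements in chunks.'''
    for ix in range(0, len(seq), width):
        yield seq[ix:ix+width]

def commas(number):
    '''Insert commas every three digits (condensed chunker-based version).'''
    digits = list(reversed(str(number)))
    groups = [''.join(ch) for ch in chunker(digits, 3)]
    return ''.join(reversed(','.join(groups)))

for n in (1, 123, 1230, 1230000, 123000000):
    # format(n, ',') is the built-in way to get a thousands separator.
    assert commas(n) == format(n, ',')
    print(commas(n))
```

Note that the chunker-based version only handles non-negative integers; a minus sign or decimal point would be chunked along with the digits.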

There are easier ways to add commas to a number, but this shows how the chunker can be used. Here’s another example a little closer to our requirements:

001| seq = bytes(range(256))
002| 
003| for group in chunker(seq, 16):
004|     s = ['%02x' % c for c in group]
005|     print(' '.join(s))
006| print()
007| 

When run, this prints:

00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f
60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f
80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f
90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f
a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af
b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf
d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df
e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef
f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff
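
For plain byte rows like these, Python 3.8 and later can also do the hex formatting directly, since bytes.hex() accepts an optional separator. A minimal sketch, reusing chunker from above:

```python
def chunker(seq, width):
    '''Return seq elements in chunks.'''
    for ix in range(0, len(seq), width):
        yield seq[ix:ix+width]

seq = bytes(range(256))

for group in chunker(seq, 16):
    # bytes.hex(sep) puts sep between each byte's two hex digits.
    print(group.hex(' '))
```

This should produce the same sixteen rows as the list-comprehension version.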

§

But chunker doesn’t meet the second and third requirements. Taking things one step at a time, here’s a version that meets the third (slightly easier) requirement:

001| seq = bytes(range(256))
002| 
003| def blocks (seq, rowsize, width):
004|     for rx in range(0, len(seq), rowsize):
005|         row = seq[rx:rx+rowsize]
006|         bs = [row[cx:cx+width] for cx in range(0, len(row), width)]
007|         yield bs
008| 
009| def demo_blocks (rowsize, width):
010|     for row in blocks(seq, rowsize, width):
011|         line = []
012|         for datum in row:
013|             s =  ['%02x' % b for b in datum]
014|             line.append(''.join(s))
015|         print(' '.join(line))
016|     print()
017| 
018| demo_blocks(16, 1)
019| demo_blocks(16, 2)
020| demo_blocks(16, 4)
021| 

The blocks function returns rows of groups of bytes, thus meeting the third requirement. When run, this prints:

00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f
60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f
80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f
90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f
a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af
b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf
d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df
e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef
f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff

0001 0203 0405 0607 0809 0a0b 0c0d 0e0f
1011 1213 1415 1617 1819 1a1b 1c1d 1e1f
2021 2223 2425 2627 2829 2a2b 2c2d 2e2f
3031 3233 3435 3637 3839 3a3b 3c3d 3e3f
4041 4243 4445 4647 4849 4a4b 4c4d 4e4f
5051 5253 5455 5657 5859 5a5b 5c5d 5e5f
6061 6263 6465 6667 6869 6a6b 6c6d 6e6f
7071 7273 7475 7677 7879 7a7b 7c7d 7e7f
8081 8283 8485 8687 8889 8a8b 8c8d 8e8f
9091 9293 9495 9697 9899 9a9b 9c9d 9e9f
a0a1 a2a3 a4a5 a6a7 a8a9 aaab acad aeaf
b0b1 b2b3 b4b5 b6b7 b8b9 babb bcbd bebf
c0c1 c2c3 c4c5 c6c7 c8c9 cacb cccd cecf
d0d1 d2d3 d4d5 d6d7 d8d9 dadb dcdd dedf
e0e1 e2e3 e4e5 e6e7 e8e9 eaeb eced eeef
f0f1 f2f3 f4f5 f6f7 f8f9 fafb fcfd feff

00010203 04050607 08090a0b 0c0d0e0f
10111213 14151617 18191a1b 1c1d1e1f
20212223 24252627 28292a2b 2c2d2e2f
30313233 34353637 38393a3b 3c3d3e3f
40414243 44454647 48494a4b 4c4d4e4f
50515253 54555657 58595a5b 5c5d5e5f
60616263 64656667 68696a6b 6c6d6e6f
70717273 74757677 78797a7b 7c7d7e7f
80818283 84858687 88898a8b 8c8d8e8f
90919293 94959697 98999a9b 9c9d9e9f
a0a1a2a3 a4a5a6a7 a8a9aaab acadaeaf
b0b1b2b3 b4b5b6b7 b8b9babb bcbdbebf
c0c1c2c3 c4c5c6c7 c8c9cacb cccdcecf
d0d1d2d3 d4d5d6d7 d8d9dadb dcdddedf
e0e1e2e3 e4e5e6e7 e8e9eaeb ecedeeef
f0f1f2f3 f4f5f6f7 f8f9fafb fcfdfeff
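
Another way to look at blocks is as chunker applied twice: once to cut the sequence into rows, and again to cut each row into groups. The sketch below (blocks2 is a hypothetical name) shows that reading and checks it against the original for a few sizes:

```python
def chunker(seq, width):
    '''Return seq elements in chunks.'''
    for ix in range(0, len(seq), width):
        yield seq[ix:ix+width]

def blocks(seq, rowsize, width):
    '''The version from the listing above.'''
    for rx in range(0, len(seq), rowsize):
        row = seq[rx:rx+rowsize]
        yield [row[cx:cx+width] for cx in range(0, len(row), width)]

def blocks2(seq, rowsize, width):
    '''Same rows of groups, built from two passes of chunker.'''
    for row in chunker(seq, rowsize):
        yield list(chunker(row, width))

seq = bytes(range(256))
for rowsize, width in [(16, 1), (16, 2), (16, 4), (16, 5), (24, 3)]:
    # Both versions should yield identical rows of byte groups.
    assert list(blocks(seq, rowsize, width)) == list(blocks2(seq, rowsize, width))
```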

So, we’re getting closer to our goal. Note that blocks is not restricted to any specific group or row size. When the row size isn’t a multiple of the group size, the final group won’t be full:

0001020304 0506070809 0a0b0c0d0e 0f
1011121314 1516171819 1a1b1c1d1e 1f
2021222324 2526272829 2a2b2c2d2e 2f
3031323334 3536373839 3a3b3c3d3e 3f
4041424344 4546474849 4a4b4c4d4e 4f
5051525354 5556575859 5a5b5c5d5e 5f
6061626364 6566676869 6a6b6c6d6e 6f
7071727374 7576777879 7a7b7c7d7e 7f
8081828384 8586878889 8a8b8c8d8e 8f
9091929394 9596979899 9a9b9c9d9e 9f
a0a1a2a3a4 a5a6a7a8a9 aaabacadae af
b0b1b2b3b4 b5b6b7b8b9 babbbcbdbe bf
c0c1c2c3c4 c5c6c7c8c9 cacbcccdce cf
d0d1d2d3d4 d5d6d7d8d9 dadbdcddde df
e0e1e2e3e4 e5e6e7e8e9 eaebecedee ef
f0f1f2f3f4 f5f6f7f8f9 fafbfcfdfe ff

That’s for a group size of five and a row size of sixteen. Since the first three groups use fifteen bytes, the last group only has one. This is a consequence of specifying row size. The advantage is that the application only deals in fixed-size rows, which makes addressing and formatting simpler (although situations like the above can throw off the formatting).

The final row also can be incomplete if the row size doesn’t divide the sequence of bytes evenly. For example:

000102 030405 060708 090a0b 0c0d0e 0f1011 121314 151617
18191a 1b1c1d 1e1f20 212223 242526 272829 2a2b2c 2d2e2f
303132 333435 363738 393a3b 3c3d3e 3f4041 424344 454647
48494a 4b4c4d 4e4f50 515253 545556 575859 5a5b5c 5d5e5f
606162 636465 666768 696a6b 6c6d6e 6f7071 727374 757677
78797a 7b7c7d 7e7f80 818283 848586 878889 8a8b8c 8d8e8f
909192 939495 969798 999a9b 9c9d9e 9fa0a1 a2a3a4 a5a6a7
a8a9aa abacad aeafb0 b1b2b3 b4b5b6 b7b8b9 babbbc bdbebf
c0c1c2 c3c4c5 c6c7c8 c9cacb cccdce cfd0d1 d2d3d4 d5d6d7
d8d9da dbdcdd dedfe0 e1e2e3 e4e5e6 e7e8e9 eaebec edeeef
f0f1f2 f3f4f5 f6f7f8 f9fafb fcfdfe ff

That’s with a group size of three and a row size of twenty-four. The row size is a multiple of the group size, so the groups of each row all contain their three bytes (except at the very end). But the length of the byte sequence (256) isn’t a multiple of the row size, so the last row is missing groups (and the last group is missing bytes).

The application can allow the dangling groups or raise an error. There’s no reason for a hex dump to expect the last row to be full, though, so a dangling last row must be taken as a normal occurrence. This can have some formatting consequences.
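
One way to handle those formatting consequences is to pad every row of hex text to the width a full row would have, so that anything printed to its right stays aligned. The following is a rough sketch; format_row and its padding rule are assumptions for illustration, not part of the original application:

```python
def blocks(seq, rowsize, width):
    '''Rows of byte groups, as in the listing above.'''
    for rx in range(0, len(seq), rowsize):
        row = seq[rx:rx+rowsize]
        yield [row[cx:cx+width] for cx in range(0, len(row), width)]

def format_row(row, rowsize, width):
    '''Join groups as hex and pad to the length of a full row.'''
    text = ' '.join(group.hex() for group in row)
    ngroups = -(-rowsize // width)          # ceiling division
    full = 2 * rowsize + (ngroups - 1)      # hex digits plus separators
    return text.ljust(full)

seq = bytes(range(256))
for row in blocks(seq, 24, 3):
    print('| %s |' % format_row(row, 24, 3))
```

With the padding in place, the closing bar of the short last row lines up with the rows above it.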

§

Finally, here’s one way to implement the second requirement:

001| seq = bytes(range(256))
002| 
003| def get_row (seq, rowsize=16, width=1, endian='BE'):
004|     '''Given a sequence of bytes, return BYTES, WORDS or DWORDS.'''
005|     _endian = endian.upper()
006|     assert _endian in ['LE','BE'], ValueError('Endian must be LE or BE')
007| 
008|     # Generate the byte indexes
009|     bixs = list(reversed(range(width)) if _endian=='BE' else range(width))
010| 
011|     # For each row of data…
012|     for rx in range(0, len(seq), rowsize):
013|         row = []
014| 
015|         # For each group of bytes…
016|         for gx in range(0, rowsize, width):
017|             if len(seq) <= (rx+gx):
018|                 break
019|             accum = 0
020| 
021|             # For each byte in the group…
022|             for ix,bix in enumerate(bixs):
023|                 if (rx+gx+bix) < len(seq):
024|                     # Get byte (or zero if beyond row)…
025|                     value = seq[rx+gx+bix] if (gx+bix)<rowsize else 0
026|                     # Shift value into the accumulator…
027|                     accum += (value << (8 * ix))
028| 
029|             # Add the value to the row…
030|             row.append(accum)
031| 
032|         # Yield the row…
033|         yield (rx, row)
034| 
035| 
036| def demo_get_row (rowsize=16, width=1, endian='le'):
037|     for rx,row in get_row(seq, rowsize=rowsize, width=width, endian=endian):
038|         buf = ['%0*x' % (2*width,g) for g in row]
039|         print('%04x: %s' % (rx,' '.join(buf)))
040|     print()
041| 
042| demo_get_row(rowsize=16, width=1)
043| 
044| demo_get_row(rowsize=16, width=2, endian='le')
045| demo_get_row(rowsize=16, width=2, endian='be')
046| 
047| demo_get_row(rowsize=16, width=4, endian='le')
048| demo_get_row(rowsize=16, width=4, endian='be')
049| 

The code starts by processing the endian parameter (lines #5 and #6) to ensure it’s in a valid state. Line #9 uses this value to determine how to generate the byte order indexes (bixs), which range [0,1,2…width-1] or [width-1, … 2, 1, 0] depending. Note that if width=1, then these are identically just [0].

Generating the rows of groups of bytes involves three nested loops. The first (starting on line #12) iterates over the rows, stepping by rowsize. The second loop (starting on line #16) iterates over the groups, stepping by width. The third (starting on line #22) iterates over the bytes in the group. The byte indexes determine the order in which the bytes are selected, but the iteration index (ix) determines how they’re shifted into the accumulator.

So, each byte of a group is shifted appropriately and added to the accumulator (accum). Each group, then, is an integer added to row, which the function yields when full (or there are no more bytes). The function also yields the current row index (rx), which the calling function can use as the row address.
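
To see the shift-and-accumulate step in isolation, here is a small sketch that packs a single group both ways and compares the result with Python’s built-in int.from_bytes. The helper name pack_group is hypothetical, and it simplifies get_row by treating any byte missing from a short group as zero:

```python
def pack_group(group, width, endian='BE'):
    '''Combine the bytes of one group into an integer, in the style of get_row.'''
    bixs = list(reversed(range(width))) if endian == 'BE' else list(range(width))
    accum = 0
    for ix, bix in enumerate(bixs):
        # Bytes missing from a short group count as zero.
        value = group[bix] if bix < len(group) else 0
        accum += value << (8 * ix)
    return accum

group = bytes([0x00, 0x01, 0x02, 0x03])
print('%08x' % pack_group(group, 4, 'LE'))   # 03020100
print('%08x' % pack_group(group, 4, 'BE'))   # 00010203

# Cross-check against the built-in conversion, padding short groups with zeros.
for g in (group, bytes([0x0f, 0x10])):
    padded = g.ljust(4, b'\0')
    assert pack_group(g, 4, 'LE') == int.from_bytes(padded, 'little')
    assert pack_group(g, 4, 'BE') == int.from_bytes(padded, 'big')
```

For full groups, int.from_bytes alone would do the job; the explicit loop mainly earns its keep when groups can dangle.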

When run, this prints:

0000: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
0010: 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
0020: 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
0030: 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
0040: 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
0050: 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f
0060: 60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
0070: 70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f
0080: 80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f
0090: 90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f
00a0: a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af
00b0: b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
00c0: c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf
00d0: d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df
00e0: e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef
00f0: f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff

0000: 0100 0302 0504 0706 0908 0b0a 0d0c 0f0e
0010: 1110 1312 1514 1716 1918 1b1a 1d1c 1f1e
0020: 2120 2322 2524 2726 2928 2b2a 2d2c 2f2e
0030: 3130 3332 3534 3736 3938 3b3a 3d3c 3f3e
0040: 4140 4342 4544 4746 4948 4b4a 4d4c 4f4e
0050: 5150 5352 5554 5756 5958 5b5a 5d5c 5f5e
0060: 6160 6362 6564 6766 6968 6b6a 6d6c 6f6e
0070: 7170 7372 7574 7776 7978 7b7a 7d7c 7f7e
0080: 8180 8382 8584 8786 8988 8b8a 8d8c 8f8e
0090: 9190 9392 9594 9796 9998 9b9a 9d9c 9f9e
00a0: a1a0 a3a2 a5a4 a7a6 a9a8 abaa adac afae
00b0: b1b0 b3b2 b5b4 b7b6 b9b8 bbba bdbc bfbe
00c0: c1c0 c3c2 c5c4 c7c6 c9c8 cbca cdcc cfce
00d0: d1d0 d3d2 d5d4 d7d6 d9d8 dbda dddc dfde
00e0: e1e0 e3e2 e5e4 e7e6 e9e8 ebea edec efee
00f0: f1f0 f3f2 f5f4 f7f6 f9f8 fbfa fdfc fffe

0000: 0001 0203 0405 0607 0809 0a0b 0c0d 0e0f
0010: 1011 1213 1415 1617 1819 1a1b 1c1d 1e1f
0020: 2021 2223 2425 2627 2829 2a2b 2c2d 2e2f
0030: 3031 3233 3435 3637 3839 3a3b 3c3d 3e3f
0040: 4041 4243 4445 4647 4849 4a4b 4c4d 4e4f
0050: 5051 5253 5455 5657 5859 5a5b 5c5d 5e5f
0060: 6061 6263 6465 6667 6869 6a6b 6c6d 6e6f
0070: 7071 7273 7475 7677 7879 7a7b 7c7d 7e7f
0080: 8081 8283 8485 8687 8889 8a8b 8c8d 8e8f
0090: 9091 9293 9495 9697 9899 9a9b 9c9d 9e9f
00a0: a0a1 a2a3 a4a5 a6a7 a8a9 aaab acad aeaf
00b0: b0b1 b2b3 b4b5 b6b7 b8b9 babb bcbd bebf
00c0: c0c1 c2c3 c4c5 c6c7 c8c9 cacb cccd cecf
00d0: d0d1 d2d3 d4d5 d6d7 d8d9 dadb dcdd dedf
00e0: e0e1 e2e3 e4e5 e6e7 e8e9 eaeb eced eeef
00f0: f0f1 f2f3 f4f5 f6f7 f8f9 fafb fcfd feff

0000: 03020100 07060504 0b0a0908 0f0e0d0c
0010: 13121110 17161514 1b1a1918 1f1e1d1c
0020: 23222120 27262524 2b2a2928 2f2e2d2c
0030: 33323130 37363534 3b3a3938 3f3e3d3c
0040: 43424140 47464544 4b4a4948 4f4e4d4c
0050: 53525150 57565554 5b5a5958 5f5e5d5c
0060: 63626160 67666564 6b6a6968 6f6e6d6c
0070: 73727170 77767574 7b7a7978 7f7e7d7c
0080: 83828180 87868584 8b8a8988 8f8e8d8c
0090: 93929190 97969594 9b9a9998 9f9e9d9c
00a0: a3a2a1a0 a7a6a5a4 abaaa9a8 afaeadac
00b0: b3b2b1b0 b7b6b5b4 bbbab9b8 bfbebdbc
00c0: c3c2c1c0 c7c6c5c4 cbcac9c8 cfcecdcc
00d0: d3d2d1d0 d7d6d5d4 dbdad9d8 dfdedddc
00e0: e3e2e1e0 e7e6e5e4 ebeae9e8 efeeedec
00f0: f3f2f1f0 f7f6f5f4 fbfaf9f8 fffefdfc

0000: 00010203 04050607 08090a0b 0c0d0e0f
0010: 10111213 14151617 18191a1b 1c1d1e1f
0020: 20212223 24252627 28292a2b 2c2d2e2f
0030: 30313233 34353637 38393a3b 3c3d3e3f
0040: 40414243 44454647 48494a4b 4c4d4e4f
0050: 50515253 54555657 58595a5b 5c5d5e5f
0060: 60616263 64656667 68696a6b 6c6d6e6f
0070: 70717273 74757677 78797a7b 7c7d7e7f
0080: 80818283 84858687 88898a8b 8c8d8e8f
0090: 90919293 94959697 98999a9b 9c9d9e9f
00a0: a0a1a2a3 a4a5a6a7 a8a9aaab acadaeaf
00b0: b0b1b2b3 b4b5b6b7 b8b9babb bcbdbebf
00c0: c0c1c2c3 c4c5c6c7 c8c9cacb cccdcecf
00d0: d0d1d2d3 d4d5d6d7 d8d9dadb dcdddedf
00e0: e0e1e2e3 e4e5e6e7 e8e9eaeb ecedeeef
00f0: f0f1f2f3 f4f5f6f7 f8f9fafb fcfdfeff

Which pretty much gets us to our goal. This function is well behaved when either groups or the last row dangles, so it can be used with any combination of rowsize and width. For instance:

0000: 0001020304 0506070809 0a0b0c0d0e 0f10000000
0011: 1112131415 161718191a 1b1c1d1e1f 2021000000
0022: 2223242526 2728292a2b 2c2d2e2f30 3132000000
0033: 3334353637 38393a3b3c 3d3e3f4041 4243000000
0044: 4445464748 494a4b4c4d 4e4f505152 5354000000
0055: 5556575859 5a5b5c5d5e 5f60616263 6465000000
0066: 666768696a 6b6c6d6e6f 7071727374 7576000000
0077: 7778797a7b 7c7d7e7f80 8182838485 8687000000
0088: 88898a8b8c 8d8e8f9091 9293949596 9798000000
0099: 999a9b9c9d 9e9fa0a1a2 a3a4a5a6a7 a8a9000000
00aa: aaabacadae afb0b1b2b3 b4b5b6b7b8 b9ba000000
00bb: bbbcbdbebf c0c1c2c3c4 c5c6c7c8c9 cacb000000
00cc: cccdcecfd0 d1d2d3d4d5 d6d7d8d9da dbdc000000
00dd: dddedfe0e1 e2e3e4e5e6 e7e8e9eaeb eced000000
00ee: eeeff0f1f2 f3f4f5f6f7 f8f9fafbfc fdfe000000
00ff: ff00000000

Which is the output for rowsize=17 and width=5 (and endian=’BE’). Note the dangling groups at the end of each row as well as the dangling last row. A strange way to view the data, perhaps, but the function delivers what was asked for.

§

The last part is the character equivalents usually shown to the right of the byte dump. That’s generally straightforward with the exception that some byte values don’t result in printable characters, which has to be detected and the character converted to something printable. (Often, they are just replaced with periods.)

There is also the consideration of what, if anything, to show for WORD and DWORD sizes (let alone stranger sizes). One choice is to treat the values as Unicode code points. A sophisticated system might even use UTF-16 when width=2 and use the endian value to select UTF-16LE versus UTF-16BE. Likewise, when width=4, UTF-32LE and UTF-32BE. Note that this assumes the data being examined actually is Unicode. If it isn’t, not only will the character equivalents mean nothing, but any width that allows values beyond Unicode’s 21-bit code point range (0x110000 and above) is likely to contain lots of non-Unicode values.
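
As a rough illustration of the UTF-16 idea (not something the hex_dump function in this post does), Python’s codecs can decode the same bytes under either byte order. The sample text is made up for the example:

```python
data = 'Hi!'.encode('utf-16-le')     # b'H\x00i\x00!\x00'

# Decoded with the matching byte order, the text comes back...
print(data.decode('utf-16-le'))      # Hi!

# ...with the wrong byte order, each pair of bytes becomes a different code point.
wrong = data.decode('utf-16-be', errors='replace')
print([hex(ord(ch)) for ch in wrong])   # ['0x4800', '0x6900', '0x2100']
```

The wrong-order result is still valid Unicode, just not the intended text, which is why the endian setting matters for the character column as much as for the hex column.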

Here’s the hex_dump function, which uses the get_row function to generate a configurable hex dump from an input byte sequence:

001| def hex_dump (seq, rowsize=16, width=1, endian='le'):
002|     '''Configurable Hex Dump function.'''
003| 
004|     def char (cx):
005|         '''Convert character index to a character.'''
006|         if 0x110000 <= cx:
007|             # Return non-unicode code point…
008|             return '?'
009|         # Convert index to a character…
010|         ch = chr(cx)
011|         if ch.isprintable():
012|             # Return printable character…
013|             return ch
014|         # Return non-printable character…
015|         return '.'
016| 
017|     # Determine the number of number groups per row…
018|     rowcount,rem = divmod(rowsize, width)
019|     if rem:
020|         rowcount += 1
021| 
022|     # Iterate over the rows of data…
023|     for rx,row in get_row(seq, rowsize=rowsize, width=width, endian=endian):
024| 
025|         # Generate hex values from row data…
026|         xvals = ['%0*x' % (2*width,g) for g in row]
027|         # Check for dangling (last) row…
028|         while len(xvals) < rowcount:
029|             # Fill out the row…
030|             xvals.append('%*s' % (2*width,''))
031| 
032|         # Generate character values from row data…
033|         svals = [char(g) for g in row]
034| 
035|         # Print the row…
036|         print('%04x| %s | %s' % (rx,' '.join(xvals), ''.join(svals)))
037| 
038|     # End with a nice blank line…
039|     print()
040| 

The inner char function (lines #4 thru #15) is similar to the built-in chr function but checks for out-of-range values and unprintable characters, which it returns as ‘?’ and ‘.’, respectively.
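
Pulled out on its own, the behavior of char is easy to check. This sketch copies the logic from the listing (as a top-level function rather than a nested one, and slightly condensed) and tries a few values:

```python
def char(cx):
    '''Convert a code point to a printable character.'''
    if 0x110000 <= cx:
        return '?'          # beyond the Unicode range
    ch = chr(cx)
    return ch if ch.isprintable() else '.'

print(char(0x41))        # A
print(char(0x09))        # .  (tab is not printable)
print(char(0x110000))    # ?
```

The range check has to come first, because chr raises a ValueError for anything at or above 0x110000.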

Lines #18 thru #20 figure out the count of hex values in a row, which is essentially rowsize÷width except when rowsize isn’t a multiple of width. In such cases, we want (rowsize÷width)+1. The rowcount value comes into play if the final row is dangling. It specifies how long the row should be.
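
The arithmetic is small enough to try by hand. The sketch below wraps it in a hypothetical helper, groups_per_row, and also shows how a '%*s' format with an empty string produces the blanks needed to fill out a short row:

```python
def groups_per_row(rowsize, width):
    '''Number of groups in a full row, counting a partial final group.'''
    count, rem = divmod(rowsize, width)
    return count + 1 if rem else count

print(groups_per_row(16, 4))    # 4
print(groups_per_row(16, 5))    # 4  (three full groups plus one partial)
print(groups_per_row(17, 5))    # 4

# '%*s' takes the field width from the argument list, so an empty
# string comes out as that many spaces.
blank = '%*s' % (2 * 4, '')
print(len(blank))               # 8
```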

Line #23 begins a loop that iterates over data rows. Line #26 generates the hex strings from row data. If the row isn’t full, lines #28 thru #30 fill it out. Line #33 generates the character values with the help of the char function.

Line #36 prints the hex output.

§

Note that the code examples above have all been Python generators. [See Python Generators, Part 1, Part 2, and Part 3 for more information.] That’s good design sense, because it doesn’t force these routines to deal with the entire byte sequence — which could be quite long — all at once.
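
A quick sketch of what that buys you: because chunker is a generator, asking for the first few chunks of a large sequence does only that much work. The sequence size here is arbitrary:

```python
from itertools import islice

def chunker(seq, width):
    '''Return seq elements in chunks.'''
    for ix in range(0, len(seq), width):
        yield seq[ix:ix+width]

big = bytes(1_000_000)           # a million zero bytes

# Only the first two 16-byte chunks are ever sliced out.
for chunk in islice(chunker(big, 16), 2):
    print(chunk.hex(' '))
```

The sequence itself still sits in memory in this example; what the generator avoids is building the full list of chunks up front.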

But it’s certainly not the only option. Here’s a simple example using a class and designed as an iterator:

001| seq = bytes(range(256))
002| 
003| class feeder (object):
004|     '''Given a sequence of bytes, return them in groups.'''
005| 
006|     def __init__ (self, seq, width=1, endian='LE'):
007|         '''New feeder instance.'''
008|         self.seq = seq
009|         self.wid = width
010|         self.end = endian.upper()
011|         self.bixs = list(range(width))
012|         assert self.end in ['LE','BE'], ValueError('Endian not LE or BE.')
013|         # Generate the byte indexes
014|         if self.end=='BE':
015|             self.bixs = list(reversed(range(width)))
016|         # Set data index to beginning of sequence…
017|         self.dix = 0
018| 
019|     def __next__ (self):
020|         '''Next data value.'''
021|         if len(self.seq) <= self.dix:
022|             raise StopIteration()
023| 
024|         # Generate next data value…
025|         value = 0
026|         for ix,bix in enumerate(self.bixs):
027|             curr = 0
028|             if (self.dix+bix) < len(self.seq):
029|                 curr = self.seq[self.dix + bix]
030|             value += (curr << (8 * ix))
031| 
032|         # Advance the data index…
033|         self.dix += self.wid
034|         # Return the value…
035|         return value
036| 
037|     def __iter__ (self):
038|         '''Get a new iterator.'''
039|         return feeder(self.seq, width=self.wid, endian=self.end)
040| 
041| 
042| def demo_feeder (width=1, endian='LE'):
043|     # Iterate over data values…
044|     for ix,datum in enumerate(feeder(seq, width=width, endian=endian)):
045|         print('%04x: %0*x' % (ix*width, 2*width, datum))
046|     print()
047| 
048| demo_feeder(12)
049| demo_feeder(16, 'be')
050| 

This only provides a stream of numbers taken from the byte sequence with configurable width and byte-order. As its name implies, it’s just a feeder to some other function that consumes the restructured byte stream. In particular, it doesn’t generate rows or return the current index.

Those are left as reader exercises!