This series of posts is for those who have used a programming language before but are not familiar with Python. In previous posts we’ve introduced the language, installed it, and begun learning how to create our own data types (aka classes).

In this post we’ll dig into a very important topic: reading and writing files.

Let’s jump right in:

001| ### Reading a text file…
002| 
003| filename = r"C:\Demo\HCC\Python\Python Zen.txt"
004| 
005| # Open the file…
006| file_in = open(filename)
007| 
008| # Read the entire file contents…
009| file_content = file_in.read()
010| 
011| # Close the file…
012| file_in.close()
013| 
014| # Print the file contents…
015| print(file_content)
016| 

Line #3 assigns a full path and filename to filename. Note the letter r prepending the quoted string literal. We’ve been using f-strings (“format strings”), which have the letter f prepended. A prepended r makes a string “raw” — it disables the usual escape sequences [see Filename Notes below].

The built-in open function (line #6) — as its name says — opens a file. In this case, we’re reading an ordinary text file, so we can accept the default parameters for the function, and the only required parameter is the filename.

Assuming the open function succeeds, it returns a file object. As with any Python object, it has methods for interacting with it. On line #9 we call its read method to read the (entire!) file text into a variable named file_content (as one long string).

On line #12, we call the close method to close the file. (This isn’t strictly necessary when reading a file — Python will close the file when the script ends — but it’s good practice.)

Finally, on line #15, we print the text string.

When run, this prints:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Which is the official ethic of Python.

This is a Python “Easter egg”. In a Python interactive prompt, type import this:

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
...
...

For another Easter Egg, type import antigravity. 😎

Filename Notes:

Escape sequences are two-character codes that begin with a backslash (e.g. \n for a new-line character). When Python sees a backslash in a normal string, it expects the next character to tell it what special character to insert. If the pair is a recognized escape sequence, Python replaces both characters with that one special character. If it isn’t, Python leaves both the backslash and the following character in the string unchanged.

Windows filenames have backslashes, which Python normally sees as the start of escape sequences. A folder name that happens to begin with, say, n or t silently turns the preceding backslash into a new-line or tab character and corrupts the filename. Other folder names survive only because their backslash pairs aren’t recognized escape sequences. A raw string disables escape sequence processing and always preserves the filename intact.

Recent versions of Python are stricter about this. Unrecognized escape sequences are still left as-is, but Python now prints a warning (a SyntaxWarning as of Python 3.12):

001| # Using a regular string…
002| filename = "C:\Demo\HCC\Python\Python Zen.txt"
003| print(filename)
004| print()
005| 
006| # Using a regular string with double backslashes…
007| filename = "C:\\Demo\\HCC\\Python\\Python Zen.txt"
008| print(filename)
009| print()
010| 
011| # Using a raw string…
012| filename = r"C:\Demo\HCC\Python\Python Zen.txt"
013| print(filename)
014| print()
015| 
016| # Using a raw+format string…
017| which_file = "Python Zen"
018| 
019| filename = rf"C:\Demo\HCC\Python\{which_file}.txt"
020| print(filename)
021| print()
022| 

When run, this prints:

Warning (from warnings module):
  File "C:\CJS\prj\Python\blog\hcc\source\fragment.py", line 2
    filename = "C:\Demo\HCC\Python\Python Zen.txt"
SyntaxWarning: invalid escape sequence '\D'
>>> 
======= RESTART: fragment.py =======
C:\Demo\HCC\Python\Python Zen.txt

C:\Demo\HCC\Python\Python Zen.txt

C:\Demo\HCC\Python\Python Zen.txt

C:\Demo\HCC\Python\Python Zen.txt

This is due to line #2, which has a Windows filename in a regular string. Line #7 shows how to avoid the warning when using a regular string — “escape” the backslashes by doubling them.

Line #12 uses a raw string, which treats the backslashes like any other character. Raw strings are useful with filenames and regular expressions (which often also have backslash characters).

Line #19 shows that raw and format strings can be combined. The prepended r and f characters can be in any order.
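
As a quick check of the points above, here is a small sketch (the variable names are only for illustration) comparing three spellings of the same Windows path. It also shows what goes wrong when a folder name starts with a letter that forms a real escape sequence.

```python
# Three spellings of the same Windows path (illustrative names).
doubled = "C:\\Demo\\HCC\\Python"       # regular string, doubled backslashes
raw = r"C:\Demo\HCC\Python"             # raw string
folder = "Python"
raw_fmt = rf"C:\Demo\HCC\{folder}"      # raw + format string

print(doubled == raw == raw_fmt)        # True

# A recognized escape sequence silently changes the string:
# "\t" is a single tab character, but r"\t" is a backslash plus the letter t.
tab_path = "C:\temp"
raw_path = r"C:\temp"
print(len(tab_path), len(raw_path))     # 6 7
```

The second part is the real reason to prefer raw strings for Windows paths: the regular string is one character shorter because the backslash and the t collapsed into a tab.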


The first example read the entire file into a single string variable. This is fine for some kinds of processing, but sometimes it’s nice to break a text file into a list of individual lines. Python makes that easy, too:

001| ### Reading a text file…
002| 
003| filename = r"C:\Demo\HCC\Python\Python Zen.txt"
004| 
005| # Open the file…
006| file_in = open(filename)
007| 
008| # Read the entire file contents…
009| file_content = file_in.readlines()
010| 
011| # Close the file…
012| file_in.close()
013| 
014| print(f'file_content is a {type(file_content)}.')
015| print(f'file_content has {len(file_content)} lines.')
016| print()
017| 

The readlines method (line #9) returns the file text as a list object where each list item — a text string — is a line of the file.
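
One detail worth seeing for yourself is that readlines keeps the new-line character at the end of each item. The sketch below assumes a throwaway three-line file written to a temporary folder (a stand-in for the Python Zen file) and compares readlines with reading the whole text and splitting it.

```python
import os
import tempfile

# Write a small three-line sample file to a temporary folder.
tmp_dir = tempfile.mkdtemp()
sample = os.path.join(tmp_dir, "sample.txt")
file_out = open(sample, mode="w")
file_out.write("first\nsecond\nthird\n")
file_out.close()

# readlines keeps the new-line character at the end of each item...
file_in = open(sample)
with_newlines = file_in.readlines()
file_in.close()

# ...while read followed by splitlines drops it.
file_in = open(sample)
without_newlines = file_in.read().splitlines()
file_in.close()

print(with_newlines)      # ['first\n', 'second\n', 'third\n']
print(without_newlines)   # ['first', 'second', 'third']
```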

When we open a text file, the returned file object is iterable, so the list constructor can collect its lines:

001| ### Reading a text file…
002| 
003| filename = r"C:\Demo\HCC\Python\Python Zen.txt"
004| 
005| # Open the file…
006| file_in = open(filename)
007| 
008| # Read the entire file contents…
009| file_content = list(file_in)
010| 
011| # Close the file…
012| file_in.close()
013| 
014| print(f'file_content is a {type(file_content)}.')
015| print(f'file_content has {len(file_content)} lines.')
016| print()
017| 

When we pass a (text) file object to the list constructor, it returns a list of the lines of the file. When run, both these code fragments print:

file_content is a <class 'list'>.
file_content has 21 lines.

Because a file object is iterable, we can also loop over it directly:

001| ### Reading a text file…
002| 
003| filename = r"C:\Demo\HCC\Python\Python Zen.txt"
004| 
005| # Open the file…
006| file_in = open(filename)
007| 
008| # Print the file contents…
009| for line in file_in:
010|     print(line)
011| print()
012| 
013| # Close the file…
014| file_in.close()
015| 

When run, this prints the Python Zen text but with a minor snag — it’s double-spaced; there’s an unexpected blank line between each line of text.

This crops up in many languages. It happens because the print function normally adds its own new-line character at the end. But each line read from the file also ends with a new-line character, so each line is printed with two of them.

One remedy is to tell print to print nothing at the end:

001| ### Reading a text file…
002| 
003| filename = r"C:\Demo\HCC\Python\Python Zen.txt"
004| 
005| # Open the file…
006| file_in = open(filename)
007| 
008| # Print the file contents…
009| for line in file_in:
010|     print(line, end='')
011| print()
012| 
013| # Close the file…
014| file_in.close()
015| 

The end keyword parameter of the print function lets us control what it appends after printing whatever string(s) we send it. Here (line #10) we give it the empty string so it appends nothing (the default is the newline character, '\n').

Another is to use the string rstrip method to remove all whitespace from the right side (i.e. the end) of the string:

001| ### Reading a text file…
002| 
003| filename = r"C:\Demo\HCC\Python\Python Zen.txt"
004| 
005| # Open the file…
006| file_in = open(filename)
007| 
008| # Print the file contents…
009| for line in file_in:
010|     line = line.rstrip()
011|     print(line)
012| print()
013| 
014| # Close the file…
015| file_in.close()
016| 

Both print the Python Zen text correctly.


At this point, one might reasonably point out that the two examples above seem to violate one of the tenets of Python Zen:

There should be one — and preferably only one — obvious way to do it.

Fair enough, but the two techniques are slightly different. The first prints what’s truly in the file. It just steps out of the way as far as adding anything to that output. The second alters the file text; it removes all whitespace from the right-hand side of the line. This includes spaces and tabs, so it can potentially remove significant characters (if we care about spaces or tabs).

Note also that we used the rstrip method rather than the strip method because we wanted to remove whitespace only from the end of the string. (There is also an lstrip method that removes whitespace only from the front of the string.)
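
If trailing spaces or tabs matter, there is a middle road: rstrip accepts an argument naming the characters to remove. The sketch below (the sample strings are made up) compares the variants side by side.

```python
line = "value with trailing spaces   \n"

stripped_all = line.rstrip()          # removes spaces, tabs and the new-line
stripped_newline = line.rstrip("\n")  # removes only the new-line

padded = "  padded  "
both_ends = padded.strip()            # whitespace removed from both ends
front_only = padded.lstrip()          # whitespace removed from the front only

print(repr(stripped_all))       # 'value with trailing spaces'
print(repr(stripped_newline))   # 'value with trailing spaces   '
print(repr(both_ends))          # 'padded'
print(repr(front_only))         # 'padded  '
```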

Bottom line, which is the one obvious way depends on context. When in doubt, focus on the first seven lines of the Python Zen, especially the seventh (it’s rule #1 in my book).


We’ve been casual about opening and reading the file. Python raises an Exception if it can’t open the file or if it can’t read data from it.

That’s fine for quick-n-dirty scripts that only read a file. If Python does raise an Exception during opening or reading the file, the script terminates. That means the script never closes the file, but as mentioned above, Python closes any open files it finds when it terminates.

But when we’re writing to a file, we might want to be more certain. If our code raises an Exception for anything besides a file write error, we almost certainly want to be sure to close the file properly.

Wrapping file access in a try block is one way to ensure this:

001| ### Writing a text file…
002| 
003| filename = r"C:\Demo\HCC\Python\output.txt"
004| 
005| # Open the file…
006| file_out = open(filename, mode="w")
007| print(f'opened: {filename}')
008| print()
009| 
010| try:
011|     # Write the output data…
012|     for datum in big_list_of_data:
013| 
014|         # Process the data…
015|         temp = first_process(datum)
016|         line = second_process(temp)
017| 
018|         # Write data to file…
019|         file_out.write(line)
020| 
021| except Exception as e:
022|     print(f'ErrType: {type(e).__name__}')
023|     print(f'ErrText: {e}')
024|     print()
025| 
026| else:
027|     print('Wrote all data successfully!')
028| 
029| finally:
030|     # Close the file…
031|     file_out.close()
032|     print(f'closed: {filename}')
033|     print()
034| 

We’re trying to write the big_list_of_data (line #12) to an output file, and each line of data needs to be processed by two functions (lines #15 and #16) before being written (line #19).

Note on line #6 that when we open a file for writing we set the mode keyword parameter to "w". Its default value is "r", which is why it’s not required for reading text files.

This code has a problem because we haven’t defined big_list_of_data or the two processing functions. Those are inside the try block, so when we run this, it prints:

opened: C:\Demo\HCC\Python\output.txt

ErrType: NameError
ErrText: name 'big_list_of_data' is not defined

closed: C:\Demo\HCC\Python\output.txt

In this case, Python can’t find the name big_list_of_data, so line #12 raises a NameError and Python jumps down to the except block (lines #21 to #24).

The else block (lines #26 and #27) executes only if the try block completes successfully. (This is similar to using else in a for or while loop — see part 4).

The finally block (lines #29 to #33) always executes after either the try block completes successfully (and then the else block if present), or after an Exception triggers the except block. This is a good place to put the close method (line #31).
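
To see the order of the four blocks without involving a file, here is a minimal sketch. The run function and its steps list are hypothetical helpers that exist only to record which blocks execute.

```python
def run(data):
    """Return the names of the blocks in the order they execute."""
    steps = []
    try:
        steps.append("try")
        sum(data)                  # raises TypeError if data holds a string
    except Exception as e:
        steps.append(f"except ({type(e).__name__})")
    else:
        steps.append("else")
    finally:
        steps.append("finally")
    return steps

print(run([1, 2, 3]))       # ['try', 'else', 'finally']
print(run([1, "two", 3]))   # ['try', 'except (TypeError)', 'finally']
```

Either way, the finally block runs last, which is what makes it a safe home for the close call.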


With file objects, Python offers an even cleaner way to ensure proper file closure:

001| ### Reading a text file using with…
002| 
003| filename = r"C:\Demo\HCC\Python\Python Zen.txt"
004| 
005| # Open the file with context manager…
006| with open(filename) as file_in:
007| 
008|     # Print the file contents…
009|     for line in file_in:
010|         print(line, end='')
011|     print()
012| 

The with keyword invokes a context manager, which automatically cleans up when its block ends, whether the block finishes normally or Python raises an Exception inside it. The context manager for files ensures file closure, so we no longer need to do it explicitly (or rely on Python to do it later).
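
File objects have a closed attribute, which makes this easy to check. The sketch below assumes a throwaway file in a temporary folder and a deliberately raised RuntimeError standing in for a real failure.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
sample = os.path.join(tmp_dir, "sample.txt")

# Normal case: the file is closed as soon as the block ends.
with open(sample, mode="w") as file_out:
    file_out.write("hello\n")
print(file_out.closed)          # True

# Error case: the file is still closed, and the Exception still propagates.
caught = None
try:
    with open(sample) as file_in:
        raise RuntimeError("simulated failure")
except RuntimeError as e:
    caught = str(e)
print(caught)                   # simulated failure
print(file_in.closed)           # True
```

Note that the context manager closes the file but does not swallow the Exception; reporting it is still up to us.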

Using the file object context manager, the file write example above becomes:

001| ### Writing a text file using with…
002| 
003| filename = r"C:\Demo\HCC\Python\output.txt"
004| 
005| # Open the file…
006| with open(filename, mode="w") as file_out:
007| 
008|     for line in big_list_of_data:
009| 
010|         # Process the data…
011|         line = first_process(line)
012|         line = second_process(line)
013| 
014|         # Write data to file…
015|         file_out.write(line)
016| 

Which, of course, fails for the same reason as the one above but without the nice error-handling. To get that, we’d need to wrap the with block in a try block (which, in my mind, rather defeats the point of the with block, so I tend not to use with for files).

That said, here’s a more interesting example:

001| ### Convert Python Zen to Rotated Zen…
002| 
003| basepath = r"C:\Demo\HCC\Python"
004| filename_in = rf"{basepath}\Python Zen.txt"
005| filename_out = rf"{basepath}\Rotated Zen.txt"
006| 
007| 
008| # Read input file with context manager…
009| with open(filename_in) as file_in:
010| 
011|     # Get the file contents…
012|     file_text = file_in.read()
013| 
014| print(f'read: {filename_in} ({len(file_text)} chars)')
015| 
016| 
017| # Write output file with context manager…
018| with open(filename_out, mode="w") as file_out:
019| 
020|     # Write ROT13 version to output…
021|     for char in file_text:
022| 
023|         # Get the character index…
024|         cx = ord(char)
025| 
026|         # If it’s an uppercase char…
027|         if ord('A') <= cx <= ord('Z'):
028|             # Separate A-M from N-Z…
029|             if cx < ord('N'):
030|                 file_out.write(chr(cx+13))
031|             else:
032|                 file_out.write(chr(cx-13))
033| 
034|             # Next char…
035|             continue
036| 
037|         # If it’s a lowercase char…
038|         if ord('a') <= cx <= ord('z'):
039|             # Separate a-m from n-z…
040|             if cx < ord('n'):
041|                 file_out.write(chr(cx+13))
042|             else:
043|                 file_out.write(chr(cx-13))
044| 
045|             # Next char…
046|             continue
047| 
048|         # Not an alpha-char, so just write it…
049|         file_out.write(char)
050| 
051| print(f'wrote: {filename_out}')
052| 

Lines #3 to #5 create the two filenames. Both files are in the same directory, so we use a common pathname for both. All three strings are raw to allow for the backslashes.

Lines #9 to #12 open the input file and read its contents into the file_text string. Reaching the print statement on line #14 means the read worked.

Lines #18 to #49 write the output file, a ROT13 encoded version of the input text. Reaching the print statement on line #51 means the write worked.

We do the ROT13 conversion by converting each character to its index (line #24) and testing that index to see if the character is uppercase (lines #27 to #35) or lowercase (lines #38 to #46). Within each of those we test to see if the character is A-M versus N-Z (or a-m versus n-z). If the character is in the lower half of the alphabet, we add 13 to its index; if it’s in the upper half, we subtract 13.

If the character isn’t alphabetic, we just write it (line #49).

Note that we use the built-in ord function to get a character’s index (aka ordinal number) and the built-in chr function to convert an index back to its character.
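
The character-by-character loop shows the mechanics, but it isn’t the only way. As a sketch of an alternative (not the approach used above), the same rotation can be done with a translation table. The rot13 function and ROT13_TABLE names are made up for this example; the rot_13 codec is part of the standard library.

```python
import codecs
import string

# Build a table mapping each ASCII letter to the one 13 places further on.
lower = string.ascii_lowercase
upper = string.ascii_uppercase
ROT13_TABLE = str.maketrans(
    lower + upper,
    lower[13:] + lower[:13] + upper[13:] + upper[:13],
)

def rot13(text):
    """Return text with every ASCII letter rotated 13 places."""
    return text.translate(ROT13_TABLE)

sample = "Beautiful is better than ugly."
rotated = rot13(sample)
print(rotated)
print(rot13(rotated) == sample)                     # True: twice restores the text
print(codecs.encode(sample, "rot_13") == rotated)   # True: the built-in codec agrees
```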


In the previous example we had a common pathname and used f-strings to insert it into both filenames. Python has a much better way to do this:

001| ### The os.path module…
002| from os import path
003| 
004| BasePath = r"C:\Demo\HCC\Python"
005| 
006| filename_in = path.join(BasePath, "Python Zen.txt")
007| filename_out = path.join(BasePath, "Rotated Zen.txt")
008| 
009| print(f'input: {filename_in}')
010| print(f'output: {filename_out}')
011| print()
012| 

We import the os.path module, which is loaded with useful filename functions, a key one being the join function for building filenames.

When run, this prints:

input: C:\Demo\HCC\Python\Python Zen.txt
output: C:\Demo\HCC\Python\Rotated Zen.txt

Here’s a brief demo of some of the other path functions:

001| ### The os.path module…
002| from os import path
003| 
004| BasePath = r"C:\Demo"
005| DemoPath = path.join(BasePath, "HCC", "Python")
006| FileName = path.join(DemoPath, "Python Zen.txt")
007| 
008| print(f'BasePath: {BasePath}')
009| print(f'DemoPath: {DemoPath}')
010| print()
011| print(f'FileName: {FileName}')
012| print(f'basename: {path.basename(FileName)}')
013| print(f'dirname:  {path.dirname(FileName)}')
014| print()
015| 
016| print(f'Exists: {path.exists(FileName)}')
017| print(f'IsFile: {path.isfile(FileName)}')
018| print(f'IsDir:  {path.isdir(FileName)}')
019| print()
020| 
021| print(f'FileSize: {path.getsize(FileName)} bytes')
022| print(f'DateCreated: {path.getctime(FileName)}')
023| print(f'DateUpdated: {path.getmtime(FileName)}')
024| print(f'LastAccess:  {path.getatime(FileName)}')
025| print()
026| 
027| print('Split:')
028| for part in path.split(FileName):
029|     print(part)
030| print()
031| 
032| print('SplitDrive:')
033| for part in path.splitdrive(FileName):
034|     print(part)
035| print()
036| 
037| print('SplitRoot:')
038| for part in path.splitroot(FileName):
039|     print(part)
040| print()
041| 
042| print('SplitExt:')
043| for part in path.splitext(FileName):
044|     print(part)
045| print()
046| 

When run, this prints:

BasePath: C:\Demo
DemoPath: C:\Demo\HCC\Python

FileName: C:\Demo\HCC\Python\Python Zen.txt
basename: Python Zen.txt
dirname:  C:\Demo\HCC\Python

Exists: True
IsFile: True
IsDir:  False

FileSize: 878 bytes
DateCreated: 1774017627.2220998
DateUpdated: 1774017693.7498984
LastAccess:  1774032851.6244624

Split:
C:\Demo\HCC\Python
Python Zen.txt

SplitDrive:
C:
\Demo\HCC\Python\Python Zen.txt

SplitRoot:
C:
\
Demo\HCC\Python\Python Zen.txt

SplitExt:
C:\Demo\HCC\Python\Python Zen
.txt

Most should be obvious based on their outputs. The path.getctime, path.getmtime, and path.getatime functions (lines #22 to #24) each return a floating-point value that is the number of seconds since the epoch (midnight UTC, January 1, 1970).
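
Those raw numbers aren’t very readable. As a small sketch, the datetime module can turn a timestamp into a date. The value below is the DateUpdated number copied from the output above, and UTC is used so the result doesn’t depend on the local time zone.

```python
from datetime import datetime, timezone

# Timestamp zero is the epoch itself.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.isoformat())    # 1970-01-01T00:00:00+00:00

# The DateUpdated value from the output above, shown in UTC.
updated = datetime.fromtimestamp(1774017693.7498984, tz=timezone.utc)
print(updated.strftime("%Y-%m-%d %H:%M:%S"))
```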

The four split functions (lines #27 to #45) return a tuple made of filename parts. The path.split function (line #28) breaks a filename into its path and name (same as the dirname and basename functions above do individually). The path.splitdrive function breaks a name into drive and path+name parts. The path.splitroot function (added in Python 3.12) returns three parts: the drive, the root (if present), and the path+name. Lastly, the path.splitext function divides the file extension (including the period) from the rest of the name.

Suppose we want a function that receives a filename and returns it with an output name (same path and name but with an “out” extension), and a logfile name (different path and extension but same base filename):

given:
C:\Demo\HCC\Python\data.json

function returns:
C:\Demo\HCC\Python\data.json
C:\Demo\HCC\Python\data.out
C:\Demo\logs\hcc\data.log

Easily done with the os.path module:

001| ### Generate related filenames…
002| from os import path
003| 
004| LogsPath = r"C:\Demo\logs\hcc"
005| 
006| def get_related_filenames(filename):
007|     """Generate output and logfile names."""
008| 
009|     # Divide filename into fpath and fname…
010|     fpath, fname = path.split(filename)
011| 
012|     # Divide fname into name and ext…
013|     name, ext = path.splitext(fname)
014| 
015|     print(f'{fpath = }')
016|     print(f'{fname = }')
017|     print(f'{name = }')
018|     print(f'{ext = }')
019|     print()
020| 
021|     # Create output name…
022|     outname = path.join(fpath, f'{name}.out')
023| 
024|     # Create logfile name…
025|     logname = path.join(LogsPath, f'{name}.log')
026| 
027|     # Return all three filenames…
028|     return (filename, outname, logname)
029| 
030| 
031| zen_file = r"C:\Demo\HCC\Python\data.json"
032| 
033| names = get_related_filenames(zen_file)
034| 
035| print(f'Filename: {names[0]}')
036| print(f'Out-Name: {names[1]}')
037| print(f'Log-Name: {names[2]}')
038| print()
039| 

When run, this prints:

fpath = 'C:\\Demo\\HCC\\Python'
fname = 'data.json'
name = 'data'
ext = '.json'

Filename: C:\Demo\HCC\Python\data.json
Out-Name: C:\Demo\HCC\Python\data.out
Log-Name: C:\Demo\logs\hcc\data.log

Exactly as requested.

Note that Python also has the pathlib module which has even more powerful tools for dealing with filenames.
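
For a taste of pathlib, here is a rough sketch of the same related-filenames task. The get_related_paths and LOGS_PATH names are made up for this example, and PureWindowsPath is used only so the sketch handles Windows-style paths the same way on any operating system; on Windows itself, the plain Path class would be the usual choice.

```python
from pathlib import PureWindowsPath

LOGS_PATH = PureWindowsPath(r"C:\Demo\logs\hcc")

def get_related_paths(filename):
    """Return (input, output, logfile) paths for filename."""
    source = PureWindowsPath(filename)
    outname = source.with_suffix(".out")          # same folder, new extension
    logname = LOGS_PATH / f"{source.stem}.log"    # the / operator joins path parts
    return (source, outname, logname)

related = get_related_paths(r"C:\Demo\HCC\Python\data.json")
for p in related:
    print(p)
```

The stem attribute is the name without its extension, and with_suffix swaps the extension, so no splitting and re-joining is needed.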


Our examples so far have involved text files — files with lines of printable text. We can also open binary files — files containing arbitrary byte values. Suppose we have a PNG image file we’d like to read:

001| from os import path
002| 
003| # Folder where our files are…
004| BasePath = r'C:\Demo\HCC\Python'
005| 
006| # Binary file we want to read…
007| DataFile = 'bitmap-pi.png'
008| 
009| # Create the full path/filename…
010| filename = path.join(BasePath, DataFile)
011| 
012| # Let’s see how big it is…
013| filesize = path.getsize(filename)
014| print(f'{DataFile} has {filesize} bytes.')
015| print()
016| 
017| # Open and read binary file…
018| with open(filename, mode='rb') as file_in:
019|     data = file_in.read()
020| 
021| # Display some results…
022| print(f'read {len(data)} bytes.')
023| print(f'data type is {type(data)}.')
024| print(f'datum type is {type(data[0])}.')
025| print()
026| 
027| print('First 10 bytes:')
028| for ix,datum in enumerate(data[:10]):
029|     char = chr(datum)
030|     char = char if char.isprintable() else '-'
031|     print(f'{ix}: {datum:02x} ({char})')
032| print()
033| 

To read a binary file, we set the mode argument to 'rb' (line #18). As above, using the file object’s read method (line #19) returns the entire file contents (as bytes). The order of the two characters in the mode argument doesn’t matter ('br' works fine), but we must supply the 'r' here (because just 'b' overrides the default 'r', so the open function doesn’t know whether to read or write the file).

Note that the readlines method and the for loop are rarely useful here. Both still work on a binary file, but they split the data wherever a new-line byte (0x0A) happens to occur, and in binary data those positions have nothing to do with lines of text.

When run, this prints:

bitmap-pi.png has 10697 bytes.

read 10697 bytes.
data type is <class 'bytes'>.
datum type is <class 'int'>.

First 10 bytes:
0: 89 (-)
1: 50 (P)
2: 4e (N)
3: 47 (G)
4: 0d (-)
5: 0a (-)
6: 1a (-)
7: 0a (-)
8: 00 (-)
9: 00 (-)

Note that a bytes object is a sequence of integer values (0 to 255), not characters. This is why we use the built-in chr function (line #29) to get the character for each byte value and test that for printability (line #30). We replace it with a '-' if it isn’t printable.
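
The first eight bytes in the output above are the standard PNG signature, which suggests a simple use for binary reads: checking what kind of file we have. This is only a sketch; the looks_like_png name is made up, and the header bytes are typed in by hand from the output above rather than read from a file.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # the eight bytes that begin every PNG file

def looks_like_png(data):
    """Return True if data (a bytes object) starts with the PNG signature."""
    return data[:8] == PNG_SIGNATURE

# The ten bytes shown in the output above.
header = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00, 0x00])

# Indexing a bytes object gives an int; slicing gives another bytes object.
print(header[1], header[1:4])      # 80 b'PNG'
print(looks_like_png(header))      # True
print(looks_like_png(b"GIF89a"))   # False
```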


Sometimes, to open a file, we need to scan a folder (or folders) for their contents and proceed depending on what we find (and what we’re trying to do).

Python makes this easy using the listdir function from the os module (the “operating system” module with many useful low-level O/S functions):

001| from os import listdir
002| 
003| # Folder where our files are…
004| BasePath = r'C:\Demo\HCC\Python'
005| 
006| # Get a list of files there…
007| filelist = listdir(BasePath)
008| 
009| # List the files…
010| for ix,name in enumerate(filelist, start=1):
011|     print(f'{ix:2d}: {name}')
012| print()
013| 

When run (on my machine), this prints:

 1: bitmap-pi.png
 2: Christmas Carol.txt
 3: data.json
 4: demo.py
 5: dualnums-1.png
 6: dualnums-2.png
 7: dualnums-test.png
 8: first.py
 9: IDLE demo.lnk
10: IDLE first.lnk
11: multipart-1.txt
12: output.txt
13: Python Zen.txt
14: Rotated Zen.txt
15: second.py
16: Tk window 1.png
17: Tk window 2.png
18: Tk window 3.png
19: Tk window 4.png
20: Tk window 5a.png
21: Tk window 5b.png

The actual output, of course, depends on the actual folder contents.

The listdir function returns only the names, not their paths. The names here happen to come back in (case-insensitive) alphabetical order, but Python makes no promise about the order, so sort the list if the order matters. If we want full names, we need to use the path.join function:

001| from os import listdir, path
002| 
003| # Folder where our files are…
004| BasePath = r'C:\Demo\HCC\Python'
005| 
006| # Get a list of files there…
007| filelist = listdir(BasePath)
008| 
009| # List the files…
010| for ix,name in enumerate(filelist, start=1):
011|     filename = path.join(BasePath, name)
012|     print(f'{ix:2d}: {filename}')
013| print()
014| 
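
One caution about ordering: the Python documentation says listdir returns names in arbitrary order, so a listing that needs to be alphabetical should be sorted explicitly. The sketch below assumes a temporary folder holding a few empty files with arbitrary names and sorts them ignoring case.

```python
import os
import tempfile

# Create a few empty files in a temporary folder (names are arbitrary).
tmp_dir = tempfile.mkdtemp()
for name in ["second.py", "Python Zen.txt", "demo.py", "Rotated Zen.txt"]:
    open(os.path.join(tmp_dir, name), mode="w").close()

# Sort the names ourselves, ignoring upper/lower case.
names = sorted(os.listdir(tmp_dir), key=str.lower)

for ix, name in enumerate(names, start=1):
    print(f"{ix:2d}: {name}")
```

Without the key argument, sorted would place every uppercase name ahead of every lowercase one.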

And given a full path and name, we can list the size and dates:

001| from os import listdir, path
002| from time import ctime
003| 
004| # Folder where our files are…
005| BasePath = r'C:\Demo\HCC\Python'
006| 
007| # Get a list of files there…
008| filelist = listdir(BasePath)
009| 
010| # List the files…
011| print(f'Files in {BasePath}:')
012| for ix,name in enumerate(filelist, start=1):
013|     filename = path.join(BasePath, name)
014| 
015|     filesize = f'{path.getsize(filename):,}'
016| 
017|     filedate = ctime(path.getmtime(filename))
018| 
019|     print(f'{ix:2d}: {name:19s} | {filesize:>6} | {filedate}')
020| print()
021| 

We’d like commas in the file sizes, so we use an f-string in line #15 to convert the file size (an integer) to a string with commas.

As mentioned above, the getxtime functions return the number of seconds since the epoch, so on line #17 we use the ctime function from the time module to convert that to a standard date string in local time.

When run, this prints:

Files in C:\Demo\HCC\Python:
 1: bitmap-pi.png       | 10,697 | Wed May 13 22:07:42 2020
 2: Christmas Carol.txt |  4,073 | Fri Mar 20 09:40:10 2026
 3: data.json           |    560 | Thu Feb 12 13:07:12 2026
 4: demo.py             |    817 | Thu Feb 12 12:51:23 2026
 5: dualnums-1.png      | 41,531 | Mon Nov 17 21:38:02 2025
 6: dualnums-2.png      | 48,358 | Mon Nov 17 21:26:44 2025
 7: dualnums-test.png   | 48,887 | Mon Nov 17 21:38:01 2025
 8: first.py            |     89 | Thu Feb 12 13:39:03 2026
 9: IDLE demo.lnk       |  3,154 | Thu Feb 12 13:12:49 2026
10: IDLE first.lnk      |  1,170 | Thu Feb 12 16:32:04 2026
11: multipart-1.txt     |  3,509 | Fri Sep  5 17:00:42 2025
12: output.txt          |      0 | Fri Mar 20 13:32:37 2026
13: Python Zen.txt      |    878 | Fri Mar 20 09:41:33 2026
14: Rotated Zen.txt     |    878 | Fri Mar 20 13:54:11 2026
15: second.py           |      9 | Thu Feb 12 15:28:13 2026
16: Tk window 1.png     |  2,027 | Thu Nov 20 21:02:36 2025
17: Tk window 2.png     |  3,194 | Thu Nov 20 21:18:04 2025
18: Tk window 3.png     |  3,924 | Fri Nov 21 11:11:10 2025
19: Tk window 4.png     |  3,197 | Thu Nov 20 21:47:15 2025
20: Tk window 5a.png    |  5,159 | Thu Nov 20 22:00:06 2025
21: Tk window 5b.png    |  5,218 | Thu Nov 20 22:07:35 2025

The listdir function returns both file names and folder names, which we usually want to differentiate (folder names don’t have sizes, for instance). The path.isfile function and the path.isdir function allow us to separate the two:

001| from os import listdir, path
002| from time import ctime
003| 
004| # Folder where our files are…
005| BasePath = r'C:\Demo\HCC\Python'
006| 
007| # Get a list of files (and folders) there…
008| names = listdir(BasePath)
009| 
010| # Generate a list of fullnames…
011| filenames = [path.join(BasePath, n) for n in names]
012| 
013| # Get the directory names…
014| dirs = list(filter(path.isdir, filenames))
015| 
016| # Get the filenames…
017| files = list(filter(path.isfile, filenames))
018| 
019| # List directories…
020| print(f'Subfolders found in {BasePath}:')
021| for name in dirs:
022|     print(name)
023| print()
024| 
025| # List files…
026| print(f'Files found in {BasePath}:')
027| for name in files:
028|     print(name)
029| print()
030| 

Line #11 uses a list comprehension to create a list of full path/file names. This is the equivalent of:

001| 
002| 
003| filenames = []
004| for n in names:
005|     # Create the full path/name…
006|     fullname = path.join(BasePath, n)
007| 
008|     # Append it to the end of the list…
009|     filenames.append(fullname)
010| 
011| 
012| 

Which does the same thing but takes several lines of code.

Lines #14 and #17 use the built-in filter function to extract the directory names (line #14) and file names (line #17). Because the filter function is one of those functions that returns an iterator rather than a list, we use the list constructor to get an actual list object.

Because we use both results in for loops, we could have just done this:

001| 
002| 
003| # List directories…
004| print(f'Subfolders found in {BasePath}:')
005| for name in filter(path.isdir, filenames):
006|     print(name)
007| print()
008| 
009| # List files…
010| print(f'Files found in {BasePath}:')
011| for name in filter(path.isfile, filenames):
012|     print(name)
013| print()
014| 

But in most cases, we’d want the lists for further processing.
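
The practical difference is that an iterator can be walked only once. Here is a minimal sketch using plain numbers instead of filenames (the is_even function is made up for the example):

```python
numbers = [1, 2, 3, 4, 5, 6]

def is_even(n):
    """Return True for even numbers."""
    return n % 2 == 0

# An iterator from filter is used up after one pass...
evens_iter = filter(is_even, numbers)
first_pass = list(evens_iter)
second_pass = list(evens_iter)
print(first_pass, second_pass)           # [2, 4, 6] []

# ...while a list can be reused, measured and indexed.
evens_list = list(filter(is_even, numbers))
print(len(evens_list), evens_list[0])    # 3 2
```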


As you might imagine, there’s a great deal more we can do with listdir and related functions. As an exercise, try your hand at creating a recursive file scanner that descends into the sub-folders and lists their files as well.
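
As one possible starting point (not the only approach), the os module’s walk function does the descending for us. The sketch below assumes a tiny temporary folder tree with arbitrary names; the scan function is hypothetical. Writing the recursion by hand with listdir, path.isdir and path.join is still a worthwhile exercise.

```python
import os
import tempfile

def scan(folder):
    """Return the full names of all files in folder and its sub-folders."""
    found = []
    for dirpath, dirnames, filenames in os.walk(folder):
        dirnames.sort()                  # visit sub-folders in a stable order
        for name in sorted(filenames):
            found.append(os.path.join(dirpath, name))
    return found

# Build a tiny folder tree to scan (names are arbitrary).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub", "deeper"))
for parts in [("a.txt",), ("sub", "b.txt"), ("sub", "deeper", "c.txt")]:
    open(os.path.join(root, *parts), mode="w").close()

# Show each file relative to the starting folder.
relative = [os.path.relpath(f, root) for f in scan(root)]
for name in relative:
    print(name)
```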


Link: Zip file containing all code fragments used in this post.