100 Days of Learning: Day 23 – More exploring of the Python importing process

Photo by Maxwell Nelson on Unsplash

Here is my Log book

Code for today.

The Definitive Guide to Python import Statements is a really good article to read.

Quick recap on importing

  • When a module is imported the interpreter executes all of the code from the module file.
  • When a package is imported the interpreter executes all of the code from the package’s __init__.py file.
  • __name__ specifies the current module’s name. This can be changed by the loading system (e.g. __main__ )
  • __package__ specifies the package the current module belongs too.
  • A built-in module is compiled into the Python interpreter and is normally written in C.

How does importing work?

According to the docs when a module is imported the interpreter will first search for a built-in module with that name. If not found it will then search for a file in the list of directories specified by sys.path. According to the guide the interpreter will also look for a package matching the name.

sys.path is initialized with the directory that contains the script being run, followed by the environment variable PYTHONPATH (same syntax as PATH) then followed by Python standard libraries.

sys.path

Let’s explore sys.path. Modify the example.py file and run it.

# example.py
import sys
...
if __name__ == '__main__':
    ...
    print(f'sys.path: {sys.path}')
$ python example.py
...
sys.path: ['/Users/andre/.../project', '/Users/andre/.pyenv/versions/3.9.1/lib/python39.zip', '/Users/andre/.pyenv/versions/3.9.1/lib/python3.9', '/Users/andre/.pyenv/versions/3.9.1/lib/python3.9/lib-dynload', '/Users/andre/.pyenv/versions/3.9.1/lib/python3.9/site-packages']
# Ok that checks out, the first directory is the one containing the script being run.

# What happens when we add paths to PYTHONPATH
export PYTHONPATH=${PYTHONPATH}:${HOME}/temp
$ python example.py
sys.path: ['/Users/andre/.../project', '/Users/andre/temp', ...]

# Ok that also checks out, script directory followed by PYTHONPATH followed by the rest.

What does sys.path look like when we are technically not running a script? Running the Python REPL means we are not running a script.

# Start a new terminal session to be sure that PYTHONPATH is not going to interfere
$ echo $PYTHONPATH
# nothing

$ python
>>> import sys
>>> print(sys.path)
['', '/Users/andre/.pyenv/versions/3.9.1/lib/python39.zip', '/Users/andre/.pyenv/versions/3.9.1/lib/python3.9', '/Users/andre/.pyenv/versions/3.9.1/lib/python3.9/lib-dynload', '/Users/andre/.pyenv/versions/3.9.1/lib/python3.9/site-packages']

# Ok so we indeed do not get a script directory or the current working directory

Key points:

  • sys.path does not contain the current working directory when running a script. This should not be confused when running the REPL, here the sys.path[0] == ” actually means the current working directory.
  • sys.path can be modified.
  • sys.path is shared across all the imported modules.

Let’s explore some of these key points. Create a new file named hello.py inside package1

├── package1
      ├── __init__.py
      └── hello.py
# hello.py
import sys
print(f'{__name__} sys.path: {sys.path}')

Modify example.py and run it.

# example.py
...
import package1.hello
...
$ python example.py
...
package1.hello sys.path: ['/Users/andre/.../project', '/Users/andre/.pyenv/versions/3.9.1/lib/python39.zip', '/Users/andre/.pyenv/versions/3.9.1/lib/python3.9', '/Users/andre/.pyenv/versions/3.9.1/lib/python3.9/lib-dynload', '/Users/andre/.pyenv/versions/3.9.1/lib/python3.9/site-packages']

Here we can see that the sys.path did indeed carry across to the hello module inside of package1 as is. So in theory it means that hello.py could import modules from the same directory that the script (example.py) lives in.

# Add the following function to module1.py
def say_something():
    print(f'{__name__} says something')

# Then modify hello.py
# hello.py
...
print('hello.py will import module1')
import module1
module1.say_something()
$ python example.py
...
hello.py will import module1
module1 says something

Take note, the code at global script is only executed the first time it is imported. We had to add a function to module1.py since all the print statements in that module would have already been executed.

Let’s modify sys.path. Edit example.py

# example.py
...
sys.path.append('/Users/andre/temp')
import tempmodule

# Create a file named tempmodule in /Users/andre/temp
# tempmodule.py
print('This module lives no where near the project')
$ python example.py
...
This module lives no where near the project
...
sys.path: [ ..., '/Users/andre/temp']

Thus sys.path can be modified and thus alter the importing process.

Built-in modules takes precedence

Based on the docs, it would mean that if we had a module name that matches a name of a built-in module, then our module will never be imported.

Let’s explore that. First lets get the names of built-in modules

$ python
>>> import sys
>>> print(sys.builtin_module_names)
('_abc', '_ast', '_codecs', '_collections', '_functools', '_imp', '_io', '_locale', '_operator', '_peg_parser', '_signal', '_sre', '_stat', '_string', '_symtable', '_thread', '_tracemalloc', '_warnings', '_weakref', 'atexit', 'builtins', 'errno', 'faulthandler', 'gc', 'itertools', 'marshal', 'posix', 'pwd', 'sys', 'time', 'xxsubtype')

We can see from sys.builtin_module_names that there is a module named time. Create a file named time.py.

project
  ├── example.py
  ├── module1.py
  ├── package1
  │   ├── __init__.py
  └── time.py
#time.py
print('THIS SHOULD NOT BE IMPORTED')

Modify example.py

...
# This should import the built-in and not the time.py module
import time
$ python example.py
...
# Output as would be expected. Did not get the print from time.py

Let’s make sure. Rename time.py to mytime.py and adjust the import to be import mytime.

$ python example.py
...
THIS SHOULD NOT BE IMPORTED
# Well in this case it should have been imported as we expected.

Wrap up for tonight

Again we have barely scratched the surface on the module loading and importing process. To be honest it seems quite complex and I am pretty sure it evolved that way for good reasons over the years.

The key take away for me on this is that I need to stop thinking in terms of where my files are located but rather start thinking about how the loading system works.