Friday, July 17, 2009

A module file can be loaded twice as two different modules (in Python 2.x)

I'll show you a case where one file (mymodule.py) gets loaded twice as two different modules. This is trivial to some people; if you already know how it happens, you don't have to read the rest of this entry. I'll write it up anyway in the hope that it helps somebody, because it can cause huge confusion if you don't know why it happens (huge confusion, in my experience).

Modules and packages that sit directly in a directory listed in sys.path are called toplevel. Those that are not directly in a sys.path directory but live inside a package are still accessible, but they are not toplevel.

Say you have a package named mypackage in a directory listed in PYTHONPATH, and it contains a module mymodule. mypackage is toplevel and mymodule is not.

$ pwd
/home/kotamura/mytest
$ echo $PYTHONPATH
/home/kotamura/mytest/mytoplevel
$ tree mytoplevel/
mytoplevel/
`-- mypackage
    |-- __init__.py
    `-- mymodule.py
Let's run Python interactively.
$ python
Python 2.6 (r26:66714, Jun  8 2009, 16:07:26)
[GCC 4.4.0 20090506 (Red Hat 4.4.0-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mypackage.mymodule as mm
>>> mm.__name__
'mypackage.mymodule'
You can see from its name that the module is not toplevel (it's under mypackage).
Now go down into the mypackage directory and do it again.
$ python
Python 2.6 (r26:66714, Jun  8 2009, 16:07:26)
[GCC 4.4.0 20090506 (Red Hat 4.4.0-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mypackage.mymodule as mm
>>> mm.__name__
'mypackage.mymodule'

That's not surprising at all. But you can also import mymodule directly:
>>> import mymodule as m
>>> m.__name__
'mymodule'

Now mymodule is toplevel! This is because, when you run Python interactively, Python adds the current directory to sys.path. What is important is that mm and m are different objects even though they are created from the same file, mymodule.py:
>>> mm.a = 1
>>> m.a = 2
>>> mm.a
1
>>> mm is m
False
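
The session above can also be reproduced as a script. The following is a minimal sketch, assuming Python 2.7 or 3.x (it uses importlib.import_module, which Python 2.6 lacks). The function name load_same_file_twice and the temporary directory are illustrative and not part of the original setup.

```python
import importlib
import os
import shutil
import sys
import tempfile


def load_same_file_twice():
    """Build a throwaway copy of the layout above and import mymodule.py twice."""
    root = tempfile.mkdtemp()
    toplevel = os.path.join(root, "mytoplevel")
    package = os.path.join(toplevel, "mypackage")
    os.makedirs(package)
    open(os.path.join(package, "__init__.py"), "w").close()
    with open(os.path.join(package, "mymodule.py"), "w") as f:
        f.write("a = 0\n")

    # Put both the toplevel directory and the package directory on sys.path,
    # mimicking PYTHONPATH plus the current directory in the session above.
    sys.path[:0] = [toplevel, package]
    try:
        mm = importlib.import_module("mypackage.mymodule")
        m = importlib.import_module("mymodule")
    finally:
        sys.path.remove(toplevel)
        sys.path.remove(package)
    return root, mm, m


root, mm, m = load_same_file_twice()
mm.a = 1
m.a = 2
print(mm.a)     # 1
print(mm is m)  # False
shutil.rmtree(root)  # clean up the temporary files
```

Both names end up as separate entries in sys.modules, which is why assigning to m.a leaves mm.a untouched.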

You don't see this only in the interactive console. The same thing happens when one module imports another module in the same directory without spelling out the path from the toplevel.
$ cat weirdImporter.py
import mypackage.mymodule as mm
import mymodule as m
print mm.__name__
print m.__name__

You don't have to be in that directory to run weirdImporter.py. You can run it from anywhere:
$ python /home/kotamura/mytest/mytoplevel/mypackage/weirdImporter.py
mypackage.mymodule
mymodule

Again, mymodule.py got imported twice.
import mymodule as m

Python lets weirdImporter import mymodule because the two files are in the same directory, and since weirdImporter runs as the __main__ module (not as mypackage.weirdImporter, I mean), it imports mymodule as a toplevel. While
import mypackage.mymodule as mm

imports mymodule.py as mypackage.mymodule. So they are different.

A solution many people recommend is to always import a package or module by specifying its path from the toplevel. That avoids the problem shown here. You can also use explicit relative imports, first introduced in Python 2.5, but they bring their own confusion until you get used to them.
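
To illustrate the two recommended styles, here is a minimal sketch, assuming Python 2.7 or 3.x. goodImporter.py is a hypothetical module placed next to mymodule.py inside mypackage, and import_consistently is an illustrative helper name. The sketch checks that the full path from the toplevel and the explicit relative import resolve to one and the same module object.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Source of the hypothetical goodImporter.py, a sibling of mymodule.py.
GOOD_IMPORTER = (
    "import mypackage.mymodule as by_full_path   # path from the toplevel\n"
    "from . import mymodule as by_relative_path  # explicit relative import\n"
)


def import_consistently():
    """Create a temporary mypackage and import goodImporter through it."""
    root = tempfile.mkdtemp()
    package = os.path.join(root, "mypackage")
    os.makedirs(package)
    files = [
        ("__init__.py", ""),
        ("mymodule.py", "a = 0\n"),
        ("goodImporter.py", GOOD_IMPORTER),
    ]
    for filename, source in files:
        with open(os.path.join(package, filename), "w") as f:
            f.write(source)

    # Forget any previously imported mypackage so this run starts clean.
    for name in list(sys.modules):
        if name == "mypackage" or name.startswith("mypackage."):
            del sys.modules[name]

    sys.path.insert(0, root)  # only the toplevel directory goes on sys.path
    try:
        importer = importlib.import_module("mypackage.goodImporter")
    finally:
        sys.path.remove(root)
        shutil.rmtree(root)
    return importer


importer = import_consistently()
print(importer.by_full_path is importer.by_relative_path)  # True
print(importer.by_relative_path.__name__)                  # mypackage.mymodule
```

Note that this only holds when goodImporter is imported as mypackage.goodImporter. Run directly as a script, it would be the __main__ module and the relative import would fail.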

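Finally, Python 3.x removes implicit relative imports, so a module that is itself imported as part of a package (mypackage.weirdImporter, say) can no longer import mymodule.py just because it's in the same directory. But you still need to be careful not to have the same file imported more than once through multiple toplevels. Don't forget that executing 'python /path/to/foo.py' adds the /path/to directory to sys.path, so a script run directly from inside a package still sees its neighbours as toplevel modules.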
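
If you suspect this has happened in a running program, you can look for it in sys.modules. The helper below is a rough sketch, not a standard tool: find_double_loaded_files is a made-up name, and it assumes that comparing the __file__ attributes (with .pyc/.pyo mapped back to .py) is good enough to recognise the same source file.

```python
import os
import sys
from collections import defaultdict


def find_double_loaded_files(modules=None):
    """Return {source file: [module names]} for files loaded as more than one module object."""
    if modules is None:
        modules = sys.modules
    # file path -> {id of module object: [names it is registered under]}
    objects_by_file = defaultdict(dict)
    for name, module in list(modules.items()):
        path = getattr(module, "__file__", None)
        if not path:
            continue  # built-in modules and placeholders have no file
        base, ext = os.path.splitext(os.path.realpath(path))
        if ext in (".pyc", ".pyo"):
            ext = ".py"  # compare compiled files with their source
        objects_by_file[base + ext].setdefault(id(module), []).append(name)

    report = {}
    for path, by_object in objects_by_file.items():
        # Aliases of one object (os.path and posixpath, for example) are fine;
        # only distinct module objects built from the same file are reported.
        if len(by_object) > 1:
            report[path] = sorted(n for names in by_object.values() for n in names)
    return report


for path, names in find_double_loaded_files().items():
    print("%s loaded as %s" % (path, ", ".join(names)))
```

Calling it after the imports in weirdImporter.py should list mymodule.py under both mymodule and mypackage.mymodule.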
