Monday, November 24, 2008

PyPy

PyPy is a project to implement Python (CPython compatible) in Python (RPython). What interests a PyPy end user like me is that it is good for converting Python code to various languages. According to the PyPy documentation, it currently translates a Python script into C (full featured), LLVM (full), .NET intermediate language (full), Java (not full but getting close), and JavaScript. Because Python is so flexible, not every Python script can be translated. They introduced restrictions to the CPython specification and defined RPython, a restricted subset of Python (well, they don't say they have clearly *defined* an RPython spec, since the restrictions are still evolving). In making RPython, they removed every Python feature that makes scripts difficult to compile statically. You can read details about RPython here; for example, all module global variables are constant.
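To make the kind of restriction concrete, here is a small sketch as I understand it from the coding guide. The function names and the SCALE constant are made up for illustration, and nothing here is checked by the PyPy translator; it only runs as ordinary Python.

```python
# Module-level globals are treated as constants by the translator,
# so SCALE is only ever read, never rebound, after import time.
SCALE = 3


def scaled_sum(values):
    """Written in an RPython-friendly style: 'total' is an int on every path."""
    total = 0
    for v in values:
        total += v * SCALE
    return total


def mixed_types(flag):
    """Fine in CPython, but the style RPython is meant to rule out:
    'x' is an int on one path and a str on the other (my assumption
    about how the restriction applies to this case)."""
    if flag:
        x = 1
    else:
        x = "one"
    return x


result = scaled_sum([1, 2, 3])
```

Both functions run under CPython; the point is only that the first one keeps each variable at a single type, which is what makes static compilation feasible.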
PyPy has lots of documents, but some of them are not easy to read because they deal with PyPy's internal architecture as well as PyPy usage. Some comments apply only to implementing Python in Python, while others are more generic and apply to every RPython script. There are also many levels of things to keep apart: CPython vs. RPython, application-level objects vs. interpreter-level objects, and so on. So I will suggest an order in which to read the PyPy documents.
The order I recommend is: first read getting-started briefly, ignoring everything you cannot understand, then architecture, then coding-guide, then translation. Do not start with the documents on new features such as What PyPy can do for your objects, even though it is the first thing you see in the PyPy documentation list.


Here are some more comments on various PyPy topics, in no particular order.

One good thing about outputting LLVM code is that it can take advantage of LLVM's powerful optimizations, both at the LLVM bitcode level and at the processor-dependent level. From the LLVM point of view, PyPy is an LLVM Python front end (more precisely, an RPython front end). I haven't tested it much, but it's definitely a cool topic.

PyPy translates RPython into other languages from a Python object, not from the script directly. This means that what you pass to the translator is your function object, not a string or a file path containing the RPython script.
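To show why a function object is enough to work from, here is a minimal sketch in plain Python. The describe helper and the is_even function are hypothetical names of mine, not part of PyPy; the sketch only shows that a live function object already carries its name, its argument names, and its compiled bytecode.

```python
def is_even(n):
    return n % 2 == 0


def describe(func):
    """Hypothetical helper: collect what a tool can read off a function object."""
    code = func.__code__
    return {
        "name": func.__name__,
        # The first co_argcount entries of co_varnames are the argument names.
        "args": code.co_varnames[:code.co_argcount],
        # The compiled bytecode is attached to the object; no source file is needed.
        "bytecode_size": len(code.co_code),
    }


info = describe(is_even)
```

Note that describe(is_even) receives the object itself, not the string "is_even" or a path to a file.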

They have added some features to PyPy that don't exist in CPython, such as optional stackless features. Lazily computed objects are another.

Talking about the PyPy implementation, the "object space" is interesting. PyPy uses a "standard object space" to execute Python scripts. When it generates a control flow graph (which is used when it translates RPython into another language), it just replaces the standard object space with a "flow object space", so to the rest of PyPy it looks as if the script is being executed, but the result is a control flow graph rather than script execution. I'm not interested in the details, but understanding the concept made it easier to read the documents.
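The idea of swapping the object space can be sketched with a toy example. Everything below (StdSpace, FlowSpace, interpret) is a simplification I made up to illustrate the concept; the real PyPy classes are far richer and record a graph with branches, not a flat list.

```python
class StdSpace(object):
    """Toy 'standard' space: operations really compute values."""

    def wrap(self, value):
        return value

    def add(self, a, b):
        return a + b

    def mul(self, a, b):
        return a * b


class FlowSpace(object):
    """Toy 'flow' space: operations are recorded instead of computed."""

    def __init__(self):
        self.operations = []
        self.counter = 0

    def wrap(self, value):
        return "const(%r)" % (value,)

    def _record(self, opname, a, b):
        # Invent a fresh variable name to stand for the result.
        result = "v%d" % self.counter
        self.counter += 1
        self.operations.append((opname, a, b, result))
        return result

    def add(self, a, b):
        return self._record("add", a, b)

    def mul(self, a, b):
        return self._record("mul", a, b)


def interpret(space, x):
    """The 'interpreter': evaluates x * 2 + 1 using only space operations."""
    doubled = space.mul(x, space.wrap(2))
    return space.add(doubled, space.wrap(1))


std_result = interpret(StdSpace(), 5)   # really runs the arithmetic
flow = FlowSpace()
flow_result = interpret(flow, "arg0")   # records the operations instead
```

The interpret function is identical in both calls. With the toy standard space it returns a number; with the toy flow space it returns a placeholder name and leaves a list of recorded operations in flow.operations.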

I just found a tiny bug.


def f():
    for a in [1, 2, 3]:
        print a,

>>> f()
1 2 3

f() prints nothing after it is compiled to C. It works as expected if I remove the comma after "print a".

2 comments:

Anonymous said...

Re: "Java(not full but getting close)"

Will PyPy eliminate the need for Jython?

John

hohehohe2 [at] gmail.com said...

John,
If you use PyPy (i.e. the Python front end) together with the Java front end (currently in development according to the LLVM web page), you can mix Python and Java code into one executable where each side calls functions defined in the other, so that may be a reason. But I don't know how much of what Jython does LLVM can do, because I'm not a Jython user.