Tuesday, March 10, 2009

LLVM and Python

LLVM (Low Level Virtual Machine) takes LLVM IR (intermediate representation), a kind of intermediate language, optimizes it, and builds it. The LLVM compile flow consists of roughly two parts: the frontend, which generates IR, and the backend, which generates the build products. By choosing a frontend you can compile source code written in your favorite language (C++, Java, Python, ...), and by choosing a backend you decide what to build: a native executable, Java bytecode, Flash SWF, ...

As far as I know, there are two approaches to a Python frontend, which differ in concept and in maturity: RPython in the PyPy project, and py2llvm. The objective of the PyPy project is to make a Python compiler that can be used to write any programming language in Python. py2llvm focuses on optimized native builds (it uses single-precision floating point for the Python float so that it can make use of SIMD optimization).

Because LLVM IR is statically typed, both RPython and py2llvm are statically typed. Both are designed to be as similar to CPython as possible, by introducing type inference and so on, but by their nature they are still different from CPython (RPython stands for Restricted Python), so in many cases you cannot use the Python standard libraries(*) or third-party tools without modifying them.

(*) They have ported most of the CPython standard libraries to RPython.
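
To make the static-typing point concrete, here is a minimal sketch in plain Python. It is my own illustration, not code from RPython or py2llvm, and the function names are made up. It shows the kind of code full CPython accepts but a statically typed subset would, as far as I understand, have to reject or restrict: the meaning of "+" depends on the runtime types, and one variable holds an int on one path and a str on another.

```python
# Illustration only: `add` and `describe` are hypothetical names.

def add(a, b):
    # CPython picks the meaning of "+" at runtime from the operand types,
    # so this one function serves ints, floats, strings and lists.
    return a + b


def describe(value):
    # `result` is an int on one branch and a str on the other, so no
    # single static type can be assigned to it.
    if value > 0:
        result = value * 2
    else:
        result = "non-positive"
    return result


results = [add(1, 2), add(1.5, 2.5), add("py", "2llvm"), add([1], [2])]
```

A compiler that wants to emit statically typed IR for add has to either fix one signature in advance or keep the dynamic dispatch around at runtime.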

I wonder if I can make a complete LLVM Python frontend. If I can decompose the bytecode execution part of the CPython interpreter (along with initialization, thread context switching, exceptions, the garbage collector, and so on) and implement it in LLVM IR, I think I may be able to make it. It can depend on the CPython binary (Python2x/3.so/dll). It can be as slow as CPython. I would be lucky if it were a little bit faster, but that's not the point. If I can make it, I may be able to merge it seamlessly with the existing Python frontends I mentioned above. You may ask why not just use ctypes etc., but the objective is a full-featured Python compiler, and it needs to be as user friendly as possible. I want to let the user write an RPython or py2llvm function that calls a function which uses a standard module, which in turn could call an RPython or py2llvm function again, which may raise an exception, all without the user noticing the difference (or in a way the user can use naturally and easily). It would take some (or quite a lot of) work on the boundary between the two implementations, such as type mapping, but depending on the implementation details of RPython and/or py2llvm it may not be impossible, at least technically.
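
By "the bytecode execution part" I mean the fetch-and-dispatch loop at the heart of the interpreter. The following is only a toy sketch of that shape, written in Python for readability. The opcode names and the run function are hypothetical and much simpler than CPython's real instruction set, and the real thing would be expressed in LLVM IR rather than Python.

```python
# Toy stack machine. Opcode names are hypothetical, not CPython's.

def run(code, env):
    """Execute a list of (opcode, argument) pairs against a name table."""
    stack = []
    pc = 0
    while pc < len(code):
        op, arg = code[pc]          # fetch
        pc += 1
        if op == "LOAD_CONST":      # dispatch
            stack.append(arg)
        elif op == "LOAD_NAME":
            stack.append(env[arg])
        elif op == "STORE_NAME":
            env[arg] = stack.pop()
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "CALL":
            # Pop `arg` arguments (restoring their order), then the callee.
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(func(*args))
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return None


# Roughly "return len('llvm') + 1", with `len` supplied by the host.
program = [
    ("LOAD_NAME", "len"),
    ("LOAD_CONST", "llvm"),
    ("CALL", 1),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("RETURN", None),
]
result = run(program, {"len": len})
```

The CALL branch is where the boundary work would live: the callee might be an ordinary CPython function or a separately compiled one, and the loop would have to map types and propagate exceptions in both directions.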

It's just an obscure cloud of thought, not at all concrete, but I'm having fun with this cloud of ideas. RPython is a big one, so first I'll take a close look at py2llvm.

6 comments:

greatRGB said...

Is this similar to .NET and their CLR? IronPython taps into that, but what you are discussing is a little over my head. I just picked up on a few key words such as Intermediate, multiple language etc :)

hohehohe2 [at] gmail.com said...

greatRGB,

There are similarities, but they differ in at least the following ways (if my understanding is correct).
- .NET always depends on its runtime for execution, whereas with LLVM you can choose the backend
- hmmm, I can't find anything else. Follow up please (to LLVM masters)

Maciej Fijalkowski said...

Honestly, regarding PyPy it's all wrong.

LLVM is used in PyPy to compile RPython, which does not come with libraries, that's true. But the point is that PyPy implements a Python interpreter written in RPython, so the net effect is that you have a Python interpreter compiled to LLVM. Regarding "the same could be used in CPython with type inference", no. There is a reason why RPython is restricted, and that precise reason is that you cannot type-infer Python; it's just too dynamic. Of course you can compile a dynamic language to a static platform (Jython would be an example of such a compiler), but you still have dynamic dispatch at runtime, unless you employ some JIT.

Cheers,
fijal

hohehohe2 [at] gmail.com said...

Thanks for the comment.

> "the same could be used in CPython with type inference"

It means you cannot convert a full CPython script into LLVM IR, doesn't it? Yeah, I know, so we'll need to generate a LLVM IR that implements CPython virtual machine, with the additional feature of calling functions written in RPython and compiled with LLVM, and the virtual machine should bridge CPython and RPython objects (or data types) at runtime. I don't know if it's possible, because CPython and RPython are quite different in this sense.

hohehohe2 [at] gmail.com said...

> a LLVM IR that implements CPython virtual machine

which means a CPython virtual machine written in LLVM IR. It will contain most of ceval.c, plus code to call functions in the CPython shared library (python25.so etc.).

> written in RPython and compiled with LLVM

Oops, once an RPython script is compiled with LLVM separately, a CPython script will have no way to catch exceptions raised by the RPython script. So the CPython VM and the RPython script should be compiled together (the idea is still not clear...)