Friday, May 8, 2009

Passing more than 65536 arguments to a function

First, let's create a simple function with 4 arguments and check that it works.


>>> cmd1 = 'def f('
>>> cmd2 = 'print '
>>> for i in range(0, 4):
...     cmd1 += 'p' + str(i) + ', '
...     cmd2 += 'p' + str(i) + ' + '
...
>>> cmd = cmd1[:-2] + "):" + cmd2[:-3]
>>> cmd
'def f(p0, p1, p2, p3):print p0 + p1 + p2 + p3'
>>> exec cmd
>>> f(1,1,1,1)
4

Let's disassemble it.

>>> import dis
>>> dis.dis(f)
  1           0 LOAD_FAST                0 (p0)
              3 LOAD_FAST                1 (p1)
              6 BINARY_ADD
              7 LOAD_FAST                2 (p2)
             10 BINARY_ADD
             11 LOAD_FAST                3 (p3)
             14 BINARY_ADD
             15 PRINT_ITEM
             16 PRINT_NEWLINE
             17 LOAD_CONST               0 (None)
             20 RETURN_VALUE

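The number in front of each instruction name is the address (byte offset) of that instruction (see this); the leading 1 on the first row is the source line number. You'll see that LOAD_FAST takes three bytes: one byte for the instruction itself and two bytes for its argument, the index of the local variable. Two bytes can only hold indices from 0 to 65535, so Python should be confused if a function takes more than 65536 arguments. Let's create 100000 arguments and confuse it :p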
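Before trying that, here is a quick check of the two-byte limit itself. This is only a sketch using the standard struct module, written for Python 3; it has nothing to do with how the interpreter stores bytecode. The helper name fits_in_two_bytes is made up for this example.

```python
import struct

def fits_in_two_bytes(index):
    """Return True if index fits in an unsigned 16-bit field (hypothetical helper)."""
    try:
        # '<H' is a little-endian unsigned short, i.e. exactly two bytes.
        struct.pack('<H', index)
        return True
    except struct.error:
        return False

print(fits_in_two_bytes(65535))  # True: the largest index two bytes can hold
print(fits_in_two_bytes(65536))  # False: one more bit is needed
```

So the argument with index 65536 is the first one that a plain three-byte LOAD_FAST cannot address. Now for the 100000-argument function: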

>>> cmd1 = 'def f('
>>> cmd2 = 'print '
>>> for i in range(0, 100000):
...     cmd1 += 'p' + str(i) + ', '
...     cmd2 += 'p' + str(i) + ' + '
...
>>> cmd = cmd1[:-2] + "):" + cmd2[:-3]
>>> exec cmd
>>> f(*(1,)*100000)
100000

Why did it work? It's me who is confused...

>>> dis.dis(f)
  1           0 LOAD_FAST                0 (p0)
              3 LOAD_FAST                1 (p1)
              6 BINARY_ADD
              7 LOAD_FAST                2 (p2)
             10 BINARY_ADD
             11 LOAD_FAST                3 (p3)
...
         262135 LOAD_FAST            65534 (p65534)
         262138 BINARY_ADD
         262139 LOAD_FAST            65535 (p65535)
         262142 BINARY_ADD
         262143 EXTENDED_ARG             1
         262146 LOAD_FAST            65536L (p65536)
         262149 BINARY_ADD
         262150 EXTENDED_ARG             1
         262153 LOAD_FAST            65537L (p65537)
         262156 BINARY_ADD
         262157 EXTENDED_ARG             1
...

Whoa! Python is prepared for the mean test. So clever!

Added May 11:
EXTENDED_ARG n is an instruction that tells the interpreter to add n*65536 to the argument of the next instruction (here, LOAD_FAST).
Related code snippet from ceval.c:

case EXTENDED_ARG:
    opcode = NEXTOP();
    oparg = oparg<<16 | NEXTARG();
    goto dispatch_opcode;
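To see what that shift-and-or does, here is a rough Python 3 simulation of the same idea. It is only a sketch: the decode function and the list-of-pairs format are invented for this example and are not how the interpreter represents bytecode. It assumes, as the snippet above suggests, that the instruction after EXTENDED_ARG carries just the low 16 bits of the index.

```python
def decode(instructions):
    """Fold EXTENDED_ARG prefixes into the instruction that follows them.

    instructions: list of (opname, 16-bit argument) pairs (made-up format).
    Returns (opname, full argument) pairs with the prefixes removed.
    """
    decoded = []
    oparg = 0  # high bits carried over from a preceding EXTENDED_ARG
    for opname, arg in instructions:
        # Same expression as in ceval.c: shift the old bits up, or in the new ones.
        oparg = oparg << 16 | arg
        if opname == 'EXTENDED_ARG':
            continue  # keep the high bits for the next instruction
        decoded.append((opname, oparg))
        oparg = 0
    return decoded

stream = [
    ('LOAD_FAST', 65535),
    ('EXTENDED_ARG', 1), ('LOAD_FAST', 0),
    ('EXTENDED_ARG', 1), ('LOAD_FAST', 1),
]
for opname, arg in decode(stream):
    print(opname, arg)
# LOAD_FAST 65535
# LOAD_FAST 65536
# LOAD_FAST 65537
```

The decoded indices match what dis printed for p65535, p65536 and p65537.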
