Friday, May 8, 2009

Passing more than 65536 arguments to a function

Creating a simple function with 4 arguments works as expected.

>>> cmd1 = 'def f('
>>> cmd2 = 'print '
>>> for i in range(0, 4):
...     cmd1 += 'p' + str(i) + ', '
...     cmd2 += 'p' + str(i) + ' + '
...
>>> cmd = cmd1[:-2] + "):" + cmd2[:-3]
>>> cmd
'def f(p0, p1, p2, p3):print p0 + p1 + p2 + p3'
>>> exec cmd
>>> f(1,1,1,1)
4
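
The session above is Python 2 (`exec` and `print` are statements there). For anyone following along on Python 3, a rough equivalent might look like the sketch below. `make_adder` is a made-up helper name, and the generated function returns the sum instead of printing it, so this is an adaptation and not the exact code from this post.

```python
def make_adder(n):
    """Build and return f(p0, ..., p<n-1>) whose body is p0 + p1 + ... (n >= 1)."""
    names = ['p%d' % i for i in range(n)]
    source = 'def f(%s): return %s' % (', '.join(names), ' + '.join(names))
    namespace = {}
    # exec is a function in Python 3; the generated def lands in `namespace`
    exec(source, namespace)
    return namespace['f']

f = make_adder(4)
print(f(1, 1, 1, 1))  # prints 4
```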

Let's disassemble it.
>>> import dis
>>> dis.dis(f)
 1           0 LOAD_FAST                0 (p0)
             3 LOAD_FAST                1 (p1)
             6 BINARY_ADD         
             7 LOAD_FAST                2 (p2)
            10 BINARY_ADD         
            11 LOAD_FAST                3 (p3)
            14 BINARY_ADD         
            15 PRINT_ITEM         
            16 PRINT_NEWLINE      
            17 LOAD_CONST               0 (None)
            20 RETURN_VALUE       

The first number on each line is the address (byte offset) of the instruction (see this). You can see that LOAD_FAST takes three bytes, which means the argument index is stored in two bytes (the LOAD_FAST opcode itself is one byte). So Python should get confused if we pass more than 65536 arguments. Let's create 100000 arguments and confuse it :p
>>> cmd1 = 'def f('
>>> cmd2 = 'print '
>>> for i in range(0, 100000):
...     cmd1 += 'p' + str(i) + ', '
...     cmd2 += 'p' + str(i) + ' + '
...
>>> cmd = cmd1[:-2] + "):" + cmd2[:-3]
>>> exec cmd
>>> f(*(1,)*100000)
100000
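
The reasoning that said this should have failed can be written out as a quick sketch. It only restates the layout read off the first disassembly (one opcode byte plus a two-byte argument for LOAD_FAST, one byte for BINARY_ADD). The helper names are invented for illustration, and it's in Python 3 syntax.

```python
ARG_BYTES = 2  # bytes that follow the LOAD_FAST opcode, per the listing

# How many distinct indexes a two-byte argument can hold: 0 .. 65535
MAX_INDEXES = 2 ** (8 * ARG_BYTES)

def fits_in_plain_arg(index):
    """True if index can be stored in a two-byte instruction argument."""
    return 0 <= index < MAX_INDEXES

def load_fast_offset(k):
    """Byte offset of the LOAD_FAST for p<k> in 'p0 + p1 + ...',
    assuming each LOAD_FAST is 3 bytes and each BINARY_ADD is 1 byte."""
    return 0 if k == 0 else 4 * k - 1

print(MAX_INDEXES)                                         # 65536
print(fits_in_plain_arg(65535), fits_in_plain_arg(65536))  # True False
print([load_fast_offset(k) for k in range(4)])             # [0, 3, 7, 11]
```

The offsets 0, 3, 7, 11 match the addresses in the 4-argument listing, and index 65536 is the first one that does not fit.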

Why did it work? It's me who is confused...
>>> dis.dis(f)
 1           0 LOAD_FAST                0 (p0)
             3 LOAD_FAST                1 (p1)
             6 BINARY_ADD         
             7 LOAD_FAST                2 (p2)
            10 BINARY_ADD         
            11 LOAD_FAST                3 (p3)
          ...
          262135 LOAD_FAST            65534 (p65534)
          262138 BINARY_ADD         
          262139 LOAD_FAST            65535 (p65535)
          262142 BINARY_ADD         
          262143 EXTENDED_ARG             1
          262146 LOAD_FAST            65536L (p65536)
          262149 BINARY_ADD         
          262150 EXTENDED_ARG             1
          262153 LOAD_FAST            65537L (p65537)
          262156 BINARY_ADD         
          262157 EXTENDED_ARG             1
          ...

Whoa! Python is prepared for the mean test. So clever!

May 11 added:
EXTENDED_ARG n tells the interpreter to add n*65536 to the argument of the instruction that follows it (here, the next LOAD_FAST).
Related code snippet from ceval.c:
  case EXTENDED_ARG:
   opcode = NEXTOP();
   oparg = oparg<<16 | NEXTARG();
   goto dispatch_opcode;
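
To see what that case does, here is a small decoder sketch in Python 3 that mimics it on a hand-built byte string. The layout (one opcode byte, then a little-endian two-byte argument) follows the disassembly above. The opcode numbers are assumptions for illustration and differ between Python versions, and `decode` is a hypothetical helper, not part of the `dis` module.

```python
HAVE_ARGUMENT = 90   # assumed: opcodes >= this carry a two-byte argument
BINARY_ADD = 23      # assumed opcode numbers, for illustration only
LOAD_FAST = 124
EXTENDED_ARG = 143

def decode(code):
    """Yield (offset, opcode, oparg) from bytes laid out as described above.
    An EXTENDED_ARG prefix is folded into the instruction that follows it,
    and the offset reported is that of the prefix."""
    i = 0
    while i < len(code):
        offset = i
        op = code[i]
        i += 1
        oparg = None
        if op >= HAVE_ARGUMENT:
            oparg = code[i] | (code[i + 1] << 8)
            i += 2
        if op == EXTENDED_ARG:
            # Same idea as the ceval.c case: take the next opcode and
            # combine the arguments as oparg << 16 | next_arg
            op = code[i]
            i += 1
            oparg = (oparg << 16) | (code[i] | (code[i + 1] << 8))
            i += 2
        yield offset, op, oparg

sample = bytes([
    LOAD_FAST, 0xFF, 0xFF,                   # LOAD_FAST 65535
    BINARY_ADD,
    EXTENDED_ARG, 1, 0, LOAD_FAST, 0, 0,     # LOAD_FAST 65536
    BINARY_ADD,
    EXTENDED_ARG, 1, 0, LOAD_FAST, 1, 0,     # LOAD_FAST 65537
])

for offset, op, oparg in decode(sample):
    print(offset, op, oparg)
```

The two prefixed LOAD_FAST instructions decode to 65536 and 65537, the same indexes dis shows for p65536 and p65537.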
