Showing posts with label Mac. Show all posts

Saturday, November 8, 2008

Hooking library calls on Mac using DYLD_INSERT_LIBRARIES

Mac offers a way to override functions in a shared library with the DYLD_INSERT_LIBRARIES environment variable (similar to LD_PRELOAD on Linux). If you write a twin of a function that is defined in an existing shared library, put it in your own shared library, and register that library's name in DYLD_INSERT_LIBRARIES, your function is used instead of the original one. Here is my simple test, in which I replaced f() in mysharedlib.dylib with f() in openhook.dylib.


$ cat mysharedlib.h
void f();
$ cat mysharedlib.c
#include <stdio.h>
#include "mysharedlib.h"

void f()
{
    printf("hello");
}
$ cat main.c
#include <stdio.h>
#include "mysharedlib.h"

int main()
{
    f();
    return 0;
}
$ cat openhook.c
#include <stdio.h>
#include <dlfcn.h>
#include <unistd.h>
#include "mysharedlib.h"

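typedef void (*fType)();
static fType real_f = NULL;

void f()
{
    if ( ! real_f)
    {
        /* Look up the original f() in the library being hooked. */
        void* handle = dlopen("mysharedlib.dylib", RTLD_NOW);
        if (handle) real_f = (fType)dlsym(handle, "f");
        if ( ! real_f)
        {
            /* Bail out instead of calling through a NULL pointer. */
            printf("NG");
            return;
        }
    }
    printf("--------zzz--------");
    real_f();
}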
$ cat bat
#!/bin/bash
gcc -flat_namespace -dynamiclib -o openhook.dylib openhook.c
gcc -dynamiclib -o mysharedlib.dylib mysharedlib.c
gcc mysharedlib.dylib main.c
export DYLD_FORCE_FLAT_NAMESPACE=
export DYLD_INSERT_LIBRARIES=openhook.dylib
./a.out
$ ./bat
--------zzz--------hello

You also need to define DYLD_FORCE_FLAT_NAMESPACE (it doesn't matter what value it has). In general a flat namespace can make the command (in this case a.out) less stable because it increases the chance of symbol name conflicts, though in my opinion that is not a big concern if we use it just for debugging.
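
To make the "doesn't matter what value it has" point concrete: `export DYLD_FORCE_FLAT_NAMESPACE=` leaves the variable defined but empty, which is not the same as leaving it undefined. Here is a minimal sketch of that distinction in plain C with getenv(). The helper names env_is_defined and env_is_nonempty are made up for illustration, and this is not dyld's actual code, only the kind of check that treats an empty value as "set".

```c
#include <stdlib.h>

/* Returns 1 if the variable exists in the environment, even with an empty value. */
int env_is_defined(const char *name)
{
    return getenv(name) != NULL;
}

/* Returns 1 only if the variable exists and holds at least one character. */
int env_is_nonempty(const char *name)
{
    const char *value = getenv(name);
    return value != NULL && value[0] != '\0';
}
```

After `export SOMEVAR=`, the first helper reports 1 and the second reports 0 for SOMEVAR, so a check written like the first one is satisfied by the empty export used in the script above.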

You can use the same technique to override a method in a C++ class. Say there's a method named "fff" in a class AAA, like

class AAA
{
public:
    int m;
    AAA(){m = 1234;}
    void fff(int a);
};

To override it, you first need to know the mangled symbol name of the method.

$ nm somelibrary.dylib | grep "T "
00000ed6 T __ZN3AAA3fffEi

Then what you need to define is _ZN3AAA3fffEi. Don't forget to remove the first '_'. If you see multiple symbols in the shared library and are not sure which one to override, you can check by demangling a symbol like

$ c++filt __ZN3AAA3fffEi
AAA::fff(int)
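
The symbol above can also be read by hand: _Z starts a mangled name, N...E wraps a nested name, each component is written as its length followed by its text (3AAA, 3fff), and the trailing i is the int parameter. As a rough sketch, the C helpers below rebuild that string and drop the extra '_' that nm shows. Both function names are hypothetical, and the first one covers only this simplest case (a non-const, non-template method taking a single int), not the full mangling rules.

```c
#include <stdio.h>
#include <string.h>

/*
 * Builds the mangled name for Class::method(int) in the simplest case only.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int mangle_method_int(const char *cls, const char *method,
                      char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "_ZN%zu%s%zu%sEi",
                     strlen(cls), cls, strlen(method), method);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/*
 * nm on Mac prints one extra leading '_'; the name used in source code
 * and passed to dlsym() is the symbol without it.
 */
const char *strip_nm_underscore(const char *nm_symbol)
{
    return (nm_symbol[0] == '_') ? nm_symbol + 1 : nm_symbol;
}
```

For anything beyond this simple shape, checking with nm and c++filt as shown above is the safer route.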

Now you can override it like this.

$ cat mysharedlib.h
class AAA
{
public:
    int m;
    AAA(){m = 1234;}
    void fff(int a);
};
$ cat mysharedlib.cpp
#include <stdio.h>
#include "mysharedlib.h"

void AAA::fff(int a)
{
    printf("--ORIGINAL:%d--", a);
}
$ cat main.cpp
#include <stdio.h>
#include "mysharedlib.h"

int main()
{
    AAA a;
    printf("--------main1--------");
    a.fff(50);
    printf("--------main2--------");
    return 0;
}
$ cat openhook.cpp
#include <stdio.h>
#include <dlfcn.h>
#include <unistd.h>
#include "mysharedlib.h"

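typedef void (*AAAfffType)(AAA*, int);
static AAAfffType real_AAAfff = NULL;

extern "C"
{

void _ZN3AAA3fffEi(AAA* a, int b)
{
    printf("--------AAA::fff--------");
    printf("%d, %d", b, a->m);
    if ( ! real_AAAfff)
    {
        /* Resolve the original only once; dlsym() takes the name without the extra '_'. */
        void* handle = dlopen("mysharedlib.dylib", RTLD_NOW);
        if (handle) real_AAAfff = (AAAfffType)dlsym(handle, "_ZN3AAA3fffEi");
        /* Bail out instead of calling through a NULL pointer. */
        if ( ! real_AAAfff) return;
        printf("OK");
    }
    real_AAAfff(a, b);
}

}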
$ cat bat
#!/bin/bash

gcc -flat_namespace -dynamiclib -lstdc++ -o openhook.dylib openhook.cpp
gcc -dynamiclib -lstdc++ -o mysharedlib.dylib mysharedlib.cpp
gcc -lstdc++ mysharedlib.dylib main.cpp
export DYLD_FORCE_FLAT_NAMESPACE=
export DYLD_INSERT_LIBRARIES=openhook.dylib
./a.out
$ ./bat
--------main1----------------AAA::fff--------50, 1234OK--ORIGINAL:50----------main2--------

Note that the first argument of the function call is the this pointer, just like Python passes self to a bound method; C++ just does it implicitly. I believe this is specific to the compiler implementation (gcc in this case), and there may be cases where it does not hold. Please use this technique at your own risk.
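
The same idea written out in plain C, as a sketch: the object is passed explicitly as the first parameter, which is the role this plays in the hook above. struct AAA, AAA_init and AAA_fff are C stand-ins invented for illustration, and AAA_fff returns a value (unlike the void method in the post) only so that its effect is easy to check.

```c
/* C stand-in for the C++ class AAA: only the data member matters here. */
struct AAA {
    int m;
};

/* What the constructor AAA(){m = 1234;} does, written as a plain function. */
void AAA_init(struct AAA *self)
{
    self->m = 1234;
}

/*
 * Stand-in for AAA::fff(int). The object arrives as an explicit first
 * parameter instead of an implicit this pointer.
 */
int AAA_fff(struct AAA *self, int a)
{
    return self->m + a;
}
```

Calling AAA_fff(&obj, 50) here corresponds to writing obj.fff(50) in C++, with the compiler supplying &obj behind the scenes.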

Here I assumed you have access to the header file that declares the function, so that you know (the size of) the argument data being passed to it. If the sizes of all the arguments are known (int, float, pointer, ...), you can wrap the function even without the header; if not, you'll need to write assembler that modifies the stack to pass the arguments on to the original function.
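
For the "no header" case, a hook only needs a function pointer type whose parameters have the right sizes; the object itself can stay an opaque void*. The sketch below shows that forwarding pattern in C. All names in it (method_fn, real_method, fake_original, hooked_method) are hypothetical, and fake_original merely stands in for the function that a real hook would obtain with dlsym().

```c
#include <stddef.h>

/* Without the header, the object is just an opaque pointer to us. */
typedef void (*method_fn)(void *self, int a);

method_fn real_method = NULL;  /* a real hook would fill this in with dlsym() */
int last_forwarded_arg = 0;    /* lets the stand-in below record what it received */

/* Stand-in for the library's original method, for illustration only. */
void fake_original(void *self, int a)
{
    (void)self;                /* never dereferenced: the layout is unknown */
    last_forwarded_arg = a;
}

/*
 * Hook body: forwards both arguments untouched.
 * Returns 0 if the call was forwarded, -1 if no original is registered.
 */
int hooked_method(void *self, int a)
{
    if (real_method == NULL)
        return -1;
    real_method(self, a);
    return 0;
}
```

Because the hook never looks inside the object, it keeps working as long as the argument list it declares matches what the caller actually passes.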

Sunday, August 3, 2008

-undefined dynamic_lookup (Mac)

I was a little bit confused about MacPython's extension dependency.

Extensions that are part of the standard library are in
/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload (On my machine)
Basically, those extensions seemed to depend only on libgcc_s.1.dylib and libSystem.B.dylib, and not on Python at all.


$ otool -L array.so
array.so:
/usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 111.0.0)

I was wondering why it did not depend on Python even though the extension uses Python C/API functions.
Python C/API functions are in
/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/config/libpython2.5.a (again on my machine)
which is actually a shared library (dylib) although its file extension is .a, and is a symbolic link to ../../../Python (with a capital P).

$ cd /System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/config
$ nm libpython2.5.a | grep "T "
...(lots of API funcs)...

Shouldn't extensions depend on it?
If a symbol is not bound to a filename, how can we avoid name conflicts?
I tested further and found a way to see which file a function depends on.

$ cat a.cpp
#include <stdio.h>
void f()
{
    printf("a\n");
}
$ cat b.cpp
#include <stdio.h>
void f()
{
    printf("b\n");
}
$ cat main.cpp
int main()
{
    void f();
    f();
    return 0;
}


$ gcc -dynamiclib -arch i386 -o a.dylib a.cpp
$ gcc -dynamiclib -arch i386 -o b.dylib b.cpp
$ gcc main.cpp -dylib a.dylib -dylib b.dylib -lstdc++
$ nm -mg a.out
0000200c (__DATA,__data) external _NXArgc
00002008 (__DATA,__data) external _NXArgv
(undefined [lazy bound]) external __Z1fv (from a)
...

So each function does have a dependency on a certain file, of course.
I then ran the same command on array.so:

$ cd /System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload
$ nm -mg array.so
(undefined [lazy bound]) external _PyErr_BadArgument (dynamically looked up)
(undefined [lazy bound]) external _PyErr_Clear (dynamically looked up)

What's (dynamically looked up)?
The answer was the linker option -undefined dynamic_lookup, first supported in OS X 10.3.
http://developer.apple.com/documentation/DeveloperTools/Conceptual/MachORuntime/Reference/reference.html
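
The two annotations seen so far in the nm -mg output, "(from a)" and "(dynamically looked up)", can be told apart mechanically. Below is a small C sketch that classifies one output line by its trailing annotation. The enum and function names are made up for illustration, and the line format is assumed to be exactly what the transcripts in this post show.

```c
#include <string.h>

enum binding { BOUND_TO_LIBRARY, DYNAMIC_LOOKUP, UNKNOWN_BINDING };

/*
 * Classifies one line of `nm -mg` output by its trailing annotation.
 * For "(from NAME)" the library name is copied into lib, truncated to fit.
 */
enum binding classify_nm_line(const char *line, char *lib, size_t lib_size)
{
    const char *from;
    const char *end;
    size_t len;

    if (lib_size > 0)
        lib[0] = '\0';
    if (strstr(line, "(dynamically looked up)") != NULL)
        return DYNAMIC_LOOKUP;

    from = strstr(line, "(from ");
    if (from == NULL)
        return UNKNOWN_BINDING;
    from += strlen("(from ");
    end = strchr(from, ')');
    if (end == NULL)
        return UNKNOWN_BINDING;

    len = (size_t)(end - from);
    if (lib_size == 0)
        return BOUND_TO_LIBRARY;
    if (len >= lib_size)
        len = lib_size - 1;
    memcpy(lib, from, len);
    lib[len] = '\0';
    return BOUND_TO_LIBRARY;
}
```

Fed the __Z1fv line from the earlier a.out listing it reports a binding to "a", while the _PyErr_Clear line from array.so comes back as a dynamic lookup with no library name.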

$ cat fakeb.cpp
$ gcc -dynamiclib -arch i386 -o a.dylib fakeb.cpp
$ gcc -dynamiclib -arch i386 -o b.dylib fakeb.cpp
$ gcc main.cpp -dylib a.dylib -dylib b.dylib -lstdc++
Undefined symbols:
"f()", referenced from:
_main in cca9JLe0.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
$ gcc -dynamiclib -arch i386 -o a.dylib fakeb.cpp
$ gcc -dynamiclib -arch i386 -o b.dylib fakeb.cpp
$ gcc main.cpp -dylib a.dylib -dylib b.dylib -lstdc++ -undefined dynamic_lookup
$ nm -mg a.out
0000200c (__DATA,__data) external _NXArgc
00002008 (__DATA,__data) external _NXArgv
(undefined [lazy bound]) external __Z1fv (dynamically looked up)
...
$ gcc -dynamiclib -arch i386 -o a.dylib a.cpp
$ gcc -dynamiclib -arch i386 -o b.dylib b.cpp
$ ./a.out
a
$ gcc -dynamiclib -arch i386 -o a.dylib fakeb.cpp
$ ./a.out
b

Phew...