It's an ancient trick, but I decided to blog about it while I still remember it.
Unix (or ELF, to be more precise) shared libraries suck. Well, they have many useful features that are not found on other platforms (like LD_PRELOAD), but their performance is worse than on other systems, sometimes dramatically so.
The problem is that shared libraries are usually compiled as position-independent code (PIC). PIC is noticeably slower than normal code, especially on older ISAs such as i386 that don't have PC-relative addressing. The other reason shared libraries are slow is the indirection on almost every function call. This indirection buys you flexibility: ELF shared libraries let you replace almost any function. Sometimes that is very convenient, but it is slow.
Now imagine modern C or C++ code, with small functions that frequently call each other. The extra indirection and the PIC cost in function prologues become quite high. GCC's -fvisibility switch lets you give up some of this flexibility in exchange for speed, but few libraries use it yet.
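Here is a minimal sketch of what using symbol visibility looks like in source code. The function names (clamp_to_byte, brighten) are made up for illustration. The usual approach, as far as I know, is to compile the library with -fvisibility=hidden and mark only the public entry points as "default"; the sketch annotates both functions explicitly so it reads the same with or without that flag.

```c
/* Hypothetical one-file library illustrating GCC symbol visibility. */

/* Hidden: not exported from the shared object, so the library's own calls
   to it can be bound directly instead of going through the PLT. */
__attribute__((visibility("hidden")))
int clamp_to_byte(int value)
{
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}

/* Default visibility: part of the public interface, and still replaceable
   from outside (for example via LD_PRELOAD). */
__attribute__((visibility("default")))
int brighten(int pixel, int amount)
{
    return clamp_to_byte(pixel + amount);
}
```

The trade-off is exactly the one described above: clamp_to_byte can no longer be replaced from outside the library, and in return calls to it skip the extra indirection.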
You can run a simple experiment (as I did a few years ago). Write a small program that does malloc/free in a tight loop. Link it normally (i.e. with shared libc) and statically. Then compare their speed. I remember a gain as high as 50% with statically linked libc, which doesn't pay the performance price of ELF shared library flexibility.
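A rough sketch of such an experiment follows. This is not the original program I used; the helper name time_malloc_free and its parameters are chosen just for illustration, and it measures CPU time with the standard clock() function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: runs `iterations` malloc/free pairs of `size` bytes
   (size must be at least 1) and returns the CPU time spent, in seconds. */
double time_malloc_free(size_t iterations, size_t size)
{
    clock_t start = clock();
    for (size_t i = 0; i < iterations; i++) {
        void *p = malloc(size);
        assert(p != NULL);
        /* Volatile store so the compiler cannot drop the malloc/free pair. */
        *(volatile char *)p = (char)i;
        free(p);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call it from a main() that prints the result, build the program once normally and once with GCC's -static flag, and compare the two timings. The numbers will depend heavily on the libc version, the CPU and the allocation size, so treat any single result with care.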
There's a 'sort-of-hack' that trades memory efficiency for speed. It is possible to link normal (i.e. non-PIC) code as a shared library (at least on i386). This way you get minimal function call indirection and no variable access indirection at all. You don't pay the PIC price either. The only downside of this method is that the dynamic linker has to patch TEXT section pages with relocated addresses, so these pages can't be shared between different processes. The performance gain may well be worth it, though. NVIDIA folks, for example, build their libGL this way. And I'm sure they know what they're doing.
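To make the hack concrete, here is a tiny made-up library (counter.c, next_id) together with the kind of build commands I have in mind. The commands in the comment are an assumption about a typical GCC/binutils setup on i386, not something taken from any particular package; check them against your own toolchain.

```c
/* counter.c: hypothetical one-file library.
 *
 * Assumed build commands (32-bit x86, GCC):
 *   PIC build:      gcc -m32 -O2 -fPIC    -shared -o libcounter_pic.so   counter.c
 *   non-PIC build:  gcc -m32 -O2 -fno-pic -shared -o libcounter_nopic.so counter.c
 *
 * The non-PIC build should show a TEXTREL entry in `readelf -d` output,
 * meaning the dynamic linker will patch the text pages at load time.
 */

/* In the non-PIC build this variable is accessed by absolute address,
   with no GOT lookup. */
static int counter;

/* Returns 1, 2, 3, ... on successive calls. */
int next_id(void)
{
    return ++counter;
}
```

Some linkers warn about text relocations or refuse them depending on configuration, and on x86-64 this approach generally does not work for ordinary code, which is why the paragraph above says "at least on i386".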
Why am I mentioning ELF shared libraries? Because Debian (and Ubuntu too) build the ruby interpreter as a shared library, and according to my measurements this costs around 10% in performance. To regain part of this loss, I simply edit ruby's 'configure' script and remove all mentions of -fPIC. I then run 'dpkg-buildpackage' and install the resulting .debs as usual. This brings the performance of the ruby interpreter back to its normal level (i.e. when it's built statically). Another optimization I use is passing better GCC optimization flags. The '-march=' flag is quite important on i386.
P.S. Read the excellent http://people.redhat.com/drepper/dsohowto.pdf by GNU libc guru Ulrich Drepper if you want to know more about ELF shared libraries.