Tuesday, May 3, 2011

Setting up Distel and erlang remote shell for membase

I decided to give myself a small present. I've just written some Emacs Lisp code that grabs the Erlang OTP node name and cookie via the REST API and connects my Emacs to that node. I've made integration with both Distel and the Erlang remote shell. The latter was the most problematic because of the quite weird TTY handling in the Erlang shell. But in the end it seems to work great! Grab the code here: https://gist.github.com/952958 The usage is M-x alk-membase-shell and M-x alk-membase-setup-distel.


Sunday, April 24, 2011

Should we avoid exceptions in JS? (Or the problem with jsPerf micro-benchmarks)

In general my take on benchmarking is: it's too easy to screw something up. The biggest problem is that with normal programming/engineering it's easy to see whether you're doing it right. With benchmarking you just have some (plausible-looking or otherwise) numbers. I actually think that most of the benchmarks we see are flawed in some way. So it's very easy to jump to conclusions, only to be found guilty of screwing up some important aspect. So first of all I'd like to make the following statement:
I understand that benchmarks suck. Please be careful and don't blindly believe any benchmarking numbers. Yes. Even mine.
Now back to the main story. Today my pull request for underscore was rejected. I had proposed a nice call/cc-like helper to do non-local returns from JS functions. It should be especially helpful for breaking out of forEach/map loops. Ideally, the ES5 designers would have added some ability to terminate loops early, e.g. via returning some special value. But apparently they're too conservative. Maybe I'll post some rants about that some other day.
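Roughly, the helper I had in mind works like this (a minimal sketch; the names here are mine, not the actual patch). The trick is to throw a unique sentinel that only the wrapper recognizes:

```javascript
// Non-local return via a thrown sentinel. Each call gets a fresh
// token object, so nested uses can't interfere with each other.
function breakLoop(body) {
  var token = {};                     // unique per call: can't be forged
  try {
    return body(function brk(value) {
      token.value = value;
      throw token;
    });
  } catch (e) {
    if (e === token) return e.value;  // our own non-local return
    throw e;                          // someone else's exception: rethrow
  }
}

// Usage: break out of forEach as soon as a match is found.
var firstBig = breakLoop(function (brk) {
  [1, 2, 3, 4].forEach(function (x) {
    if (x > 2) brk(x);
  });
  return null;                        // reached only if nothing matched
});
// firstBig === 3
```

Note the identity check `e === token`: it's what keeps unrelated exceptions propagating normally.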

Anyway, my patch was rejected because underscore's author does not want to use exceptions for control flow. His point is that exceptions are very slow.

But is that true? I'm not sure. If we want all the performance we can get, then we probably shouldn't use forEach & friends in the first place. And note that ES3's break statement can actually break out of several loop nestings (via break to a label). But the standard forbids breaking across function calls; otherwise it would be a nice (and potentially faster) alternative to exceptions.
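For reference, here's what break-to-label looks like in a plain nested loop (this works fine; it's only breaking across a function boundary that the standard forbids):

```javascript
// break-to-label exits both loops at once; no exception needed.
var found = null;
outer:
for (var i = 0; i < 3; i++) {
  for (var j = 0; j < 3; j++) {
    if (i * j === 2) {
      found = [i, j];
      break outer;     // jumps past BOTH loops
    }
  }
}
// found is [1, 2]
```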

Part of the problem is that, apparently, V8 forces heap allocation of the frames of functions with a try/catch block.

The fact is, any code that creates closures has the potential to force heap allocation of stack frames (or parts of them), which, if done often enough, will trigger (relatively) expensive GC pauses. That's because if a closure 'leaks' out of its dynamic scope, it effectively outlives the function that created it. So any variables needed by the closure cannot live on the normal stack.
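A tiny illustration of such a 'leaked' closure: the returned function outlives the call that created it, so the captured variable cannot live on the stack.

```javascript
function makeCounter() {
  var n = 0;                  // captured below, so it must survive...
  return function () {        // ...this closure escaping makeCounter
    return ++n;
  };
}

var counter = makeCounter();  // makeCounter's frame is long gone...
counter();                    // ...but n is still alive: returns 1
counter();                    // returns 2
```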

Lots of tricks in modern JS engines allow them to avoid heap allocation in many important cases. But sometimes you still have to pay that price.

My point is: there is no way to do a non-local return in JS other than by throwing an exception. And it doesn't seem to be too slow. I also expect JS-engine vendors to gradually optimize that case.

So even though I understand micro-benchmarking's drawbacks, here's my (potentially flawed) take on the for-loop-with-break vs. forEach-with-throw problem.

First, here's a jsPerf page on plain for vs. forEach performance. We see that on FF3.6 forEach is much slower than the loop, but on V8 it's about the same.

Second, here's a jsPerf page on for-with-break vs. forEach-with-throw performance. We see that throwing out of a loop is not too bad. It actually seems to be faster on Firefox and only 1.5-to-2 times slower on Chrome's V8.

Also note that somehow manually inlining the add function produces much larger scores, which might indicate a major flaw in this microbenchmark. In general, of course, we'd like the benchmark code to be compiled/optimized once and then run multiple times. But the lack of automatic inlining in this very trivial case hints that we might be causing the JIT to re-optimize it each time it's run. So feel free to play with those benchmarks more and find some flaw!

Note that there's some plausible evidence that throwing non-Errors is much faster. I think that's because when you throw an Error instance, it causes the runtime to collect a backtrace and fill in the 'stack' attribute, which seems to be much costlier than just throwing. In fact, if you google for Java exception-throwing performance, you'll discover that optimal throwing performance is reached by throwing a 'prepared' exception instance, for exactly the same reason.
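Here's a hedged sketch of that 'prepared exception' trick in JS (the backtrace-at-construction behavior described is V8's; other engines may differ):

```javascript
// An Error captures its backtrace when constructed...
var err = new Error("stop");
typeof err.stack;             // "string" in V8: backtrace already collected

// ...while a plain object carries no such baggage.
var cheap = { message: "stop" };
"stack" in cheap;             // false

// Prepared-exception trick: pay the construction cost once, rethrow forever.
var STOP = new Error("early exit");
function contains(arr, wanted) {
  try {
    arr.forEach(function (x) { if (x === wanted) throw STOP; });
    return false;
  } catch (e) {
    if (e === STOP) return true;
    throw e;
  }
}
```

The downside of a reused instance is that its stack trace describes where it was constructed, not where it was thrown, which is fine when it's pure control flow.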

But returning to the original question, I still feel that in some cases exceptions are a perfectly good way to do non-local returns. There are so many creative ways to use all of a language's features, and blindly rejecting any of them just because it seems to be slow is not smart, IMHO. Sure, exceptions have their cost, but sometimes avoiding them is costlier. All the trade-offs should be carefully analyzed and understood.

Monday, February 21, 2011

Caching autotools outputs for fun and profit

GNU make's parallel building feature is cool. And ccache, which speeds up rebuilds of the same files, is cool too. But even with these great tools, rebuilding a project is still not as fast as it could be. Why? Because preparing the GNU autotools files (./configure & friends) takes quite a long time, and there is no way to parallelize it.

But we can cache autotools products just as ccache caches compiler products. And now there is a tool capable of doing that! You can grab it here. I've adapted fabricate.py for this job.

Just put fabricate.py into your PATH and prepend it to your command. Like:

fabricate.py ./bootstrap

or

fabricate.py sh -c './bootstrap && ./configure'

And it'll copy the command's outputs from the cache if all dependencies are the same. Nearly instantaneous!

Tuesday, February 8, 2011

Welcome to the 21st century, Aliaksey!

I'm used to slow, expensive & unreliable Internet. Because of that I tended to keep as much as possible on my local hard drive. I had tons of specs, articles, and e-books, and I even kept a local mirror of the entire Debian unstable repository! That's because at home, and until recently at the Altoros office, I had quite limited and expensive Internet access. I tried (successfully) to reduce my dependency on the Internet as much as possible.

Around 2 weeks ago I accidentally deleted most of my 'stuff'. And now I'm forced to use online resources. Quite surprisingly, that's not a bad experience at all! The Internet here in the Bay Area is fast enough. And googling for information and reading it online sometimes seems to be even faster than locating and opening it from the local disk.

This is an unusual and new experience. When I, for example, need some information on how something is done in Python, I just ask Google. And within mere seconds I have precisely what I need! I've even started reading PDF articles online via Chrome's plugin without downloading them first.

I'm slowly getting used to the 21st-century Internet. I'm still not quite 100% comfortable with depending on Internet availability, and some lookups are definitely much faster against local data. For example, I don't think anything can be faster than a local supermegadoc documentation lookup of some function. But I'm going to stop downloading interesting stuff and will instead bookmark it. Let's see how it goes. I'm especially curious how it will work when I return home. Hopefully, we'll have a speedy ADSL connection by then :)

Monday, February 7, 2011

How to use two monitors in IntelliJ IDEA

I'm working on an educational Scala project, and IDEA is the natural choice for that. Normally I use Emacs for my work, but for anything Java-related it's not as good as IDEA. And I cannot express how powerful and polished this nearly free software IDE is.

However, I was missing one very useful feature of Emacs: the ability to open several frames to effectively use two monitors. Until a few minutes ago.

When you want to open the same "buffer" on two monitors, you can first split it horizontally (C-x 2 if you use Emacs key bindings) and then simply drag one of the windows out of its frame to the second monitor! If you want different files, you can just drag tabs out of the main frame.

Saturday, January 29, 2011

How to guarantee... children's death

Unix processes often spawn children. Contrary to popular belief, children do NOT automatically die when their parent exits.

Unix job control has a concept of sessions and session leaders. The session leader is the process that 'owns' the terminal, and when this guy dies, every process in the session gets SIGHUP. This usually leads to death. Every time you open a new tab in your terminal emulator, you're creating a new session, and bash (or another shell) is the leader of that session. So your children will likely die when the terminal emulator tab is closed, but they will easily outlive your process. And they will live forever when your main process is spawned from cron or your favorite continuous-integration tool.

Of course, the whole topic of Unix job control is slightly more complex. You can read more in the GNU libc manual (or aptitude install glibc-doc && info libc). The POSIX man pages (aptitude install manpages-posix{,-dev}) contain lots of details, much more than the usual Linux man pages.

So one of the ways to ensure your children die is by using sessions. This is usually implemented by creating a pseudo-terminal pair (PTY) and giving the slave side of that pair to the child. The session leader (i.e. the child) gets SIGHUP when the master side of the PTY pair is closed. The master side is closed when your main process exits (if you don't pass its fd to the child by mistake). When the session leader dies, SIGHUP is delivered to the whole session. {spawn,fork}pty-like functions are used for that.

And there's another, even less widely known way. There is a concept of an orphaned process group that can be used to ensure SIGHUP delivery without relying on PTYs. I'll refer you to the POSIX man page for details, but basically you fork and deliver SIGSTOP to the child. When your process group becomes orphaned (e.g. when the main process dies), each member will be sent SIGCONT & SIGHUP. An added benefit of this approach is that if the parent of your main process dies, your process and its children will die too. This relies on the main process or its parent being the process group leader, but that usually holds.

I'm using this to control a bunch of Erlang VMs when running Membase in development mode. You can grab the Python code for doing that here. The Ruby code for doing the same was removed, but you can dig up its grave.

UPDATE: unfortunately this trick doesn't always work on OSX/Darwin due to a different definition of orphaned process group in OSX: http://developer.apple.com/library/mac/#documentation/darwin/reference/manpages/man2/intro.2.html

Sunday, December 26, 2010

Two envelopes sophism puzzle

I'm currently on vacation, and it allows me to return to some small topics that I postponed earlier. One of them is the Two Envelopes problem. This is quite an interesting and difficult sophism to solve and 'get'. This morning I'm not in a 100% work-ready mood, so I decided to return to my list of 'Later'. Surprisingly, this time it took quite a small amount of time to solve this puzzle. Actually, I'm a bit puzzled why I postponed it earlier. Maybe I should take vacations more seriously?

I'm copying the definition from Wikipedia:

The basic setup: You are presented with two indistinguishable envelopes containing some money. You are further informed that one of the envelopes contains twice as much money as the other. You may select any one of the envelopes and you will receive the money in the selected envelope. When you have selected one of the envelopes at random but not yet opened it, you get the opportunity to take the other envelope instead. Should you switch to the other envelope?

The switching argument: One line of reasoning proceeds as follows:
  1. I denote by A the amount in my selected envelope.
  2. The probability that A is the smaller amount is 1/2, and that it is the larger amount is also 1/2.
  3. The other envelope may contain either 2A or A/2.
  4. If A is the smaller amount the other envelope contains 2A.
  5. If A is the larger amount the other envelope contains A/2.
  6. Thus the other envelope contains 2A with probability 1/2 and A/2 with probability 1/2.
  7. So the expected value of the money in the other envelope is

    (1/2)·2A + (1/2)·(A/2) = (5/4)A
  8. This is greater than A, so I gain on average by switching.
  9. After the switch, I can denote that content by B and reason in exactly the same manner as above.
  10. I will conclude that the most rational thing to do is to swap back again.
  11. To be rational, I will thus end up swapping envelopes indefinitely.
  12. As it seems more rational to open just any envelope than to swap indefinitely, we have a contradiction.
The puzzle: The puzzle is to find the flaw in the very compelling line of reasoning above.

IMO the most easily understandable flaw in that reasoning is that step 7 adds values of A taken from two different possible realities, and those values obviously differ. If the smaller amount is x, then in the branch where A is the smaller amount A = x, while in the branch where it's the larger amount A = 2x, so A doesn't denote a single fixed quantity. Computed consistently, the other envelope's expected value is (1/2)·2x + (1/2)·x = (3/2)x, exactly the same as your own envelope's, so switching does not make sense. The lesson is: be careful with your variables.
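To double-check that conclusion, here's a quick Monte Carlo sketch: fix the pair (x, 2x), pick an envelope at random, and compare the average payoff of keeping vs. always switching.

```javascript
// Simulate the two-envelope game with a fixed pair (x, 2x).
function simulate(trials) {
  var x = 100;                          // the smaller amount; pair is (x, 2x)
  var keepTotal = 0, switchTotal = 0;
  for (var i = 0; i < trials; i++) {
    var pickedSmaller = Math.random() < 0.5;
    keepTotal   += pickedSmaller ? x : 2 * x;     // payoff if we keep
    switchTotal += pickedSmaller ? 2 * x : x;     // payoff if we switch
  }
  return { keep: keepTotal / trials, sw: switchTotal / trials };
}

var r = simulate(200000);
// Both averages converge to (3/2)x = 150: switching is neutral.
```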