Sunday, December 26, 2010

Two envelopes sophism puzzle

I'm currently on vacation, which allows me to return to some small topics that I postponed earlier. One of them is the Two Envelopes problem. It is a quite interesting and difficult sophism to solve and 'get'. This morning I wasn't in a 100% work-ready mood, so I decided to return to my 'later' list. Surprisingly, this time it took quite a small amount of time to solve this puzzle. Actually, I'm a bit puzzled why I postponed it earlier. Maybe I should take vacations more seriously?

I'm copying the definition from Wikipedia:

The basic setup: You are presented with two indistinguishable envelopes containing some money. You are further informed that one of the envelopes contains twice as much money as the other. You may select any one of the envelopes and you will receive the money in the selected envelope. When you have selected one of the envelopes at random but not yet opened it, you get the opportunity to take the other envelope instead. Should you switch to the other envelope?

The switching argument: One line of reasoning proceeds as follows:
  1. I denote by A the amount in my selected envelope.
  2. The probability that A is the smaller amount is 1/2, and that it is the larger amount is also 1/2.
  3. The other envelope may contain either 2A or A/2.
  4. If A is the smaller amount the other envelope contains 2A.
  5. If A is the larger amount the other envelope contains A/2.
  6. Thus the other envelope contains 2A with probability 1/2 and A/2 with probability 1/2.
  7. So the expected value of the money in the other envelope is

    (1/2) · 2A + (1/2) · (A/2) = (5/4) A
  8. This is greater than A, so I gain on average by switching.
  9. After the switch, I can denote that content by B and reason in exactly the same manner as above.
  10. I will conclude that the most rational thing to do is to swap back again.
  11. To be rational, I will thus end up swapping envelopes indefinitely.
  12. As it seems more rational to open just any envelope than to swap indefinitely, we have a contradiction.
The puzzle: The puzzle is to find the flaw in the very compelling line of reasoning above.

IMO the most easily understandable flaw in that reasoning is that step 7 adds values of A taken from two different possible worlds. In the world where the other envelope contains 2A, A denotes the smaller amount; in the world where it contains A/2, A denotes the larger amount. These two A's stand for different quantities, so summing them in a single expected-value formula is meaningless. If you analyze this problem with consistently defined variables, it's obvious that switching envelopes does not make sense. So the lesson is: be careful with your variables.
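The flaw is also easy to see empirically. Here's a quick Monte Carlo sketch (the amounts and round counts are arbitrary choices of mine) showing that "always switch" earns exactly the same on average as "always keep":

```javascript
// Monte Carlo sketch of the two-envelopes setup (amounts are arbitrary).
// Per round, the smaller amount x is fixed; the envelopes hold x and 2x.
// We compare the strategies "always keep" and "always switch".
function simulate(rounds) {
  var keepTotal = 0, switchTotal = 0;
  for (var i = 0; i < rounds; i++) {
    var x = 1 + Math.floor(Math.random() * 100); // smaller amount
    var envelopes = [x, 2 * x];
    var pick = Math.floor(Math.random() * 2);    // choose one at random
    keepTotal += envelopes[pick];                // strategy 1: keep it
    switchTotal += envelopes[1 - pick];          // strategy 2: switch
  }
  return { keep: keepTotal / rounds, sw: switchTotal / rounds };
}
```

Both averages converge to 1.5 times the expected smaller amount; the ratio between them approaches 1, so switching gains nothing.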

Saturday, December 25, 2010

Reflections on Ruby vs. Python meetup



This is cross-posted from Altoros' developers blog. I want to thank the Altoros PR department for editing this text.
I never understood developers and customers who willingly choose Python over Ruby. Ruby feels like a native language to me, while everything else doesn't. No, this doesn't mean that I don't like working with other technologies. Over the last year and a half, I was really inspired by my positive experience of using Erlang and JavaScript while working on the Membase project (where Python is widely utilized). I really enjoyed doing my work, so it's obviously possible to have fun with languages that are less well designed than Ruby.
However, I always wanted to understand why this happens. Why do people so often choose languages other than Ruby? That is why I was excited to accept the invitation to the “Ruby vs. Python” meetup sponsored by our company. I chose the topic “Why Ruby Is a Brilliantly Designed Language” and wanted to summarize what makes Ruby different.
The Holy War
The event was very dynamic and intriguing. The speakers and attendees were split into two groups: one defended Python, while the other fought for Ruby. Each group prepared presentations describing the history, community, main concepts, development features, and prospects of their language of choice.
I was a part of the group that presented Ruby. It was my first public speech in many years. My previous presentation—my diploma defense—was a big failure. I'm glad that this time I had learned some lessons and wasn't so bad. :)
So, why is Ruby a brilliantly designed language?
With hindsight, it is obvious that I picked a very hard topic. Squeezing my experience into a 15-minute speech was really tricky. While preparing for the session, I reinforced my belief that Ruby is the best designed programming language I've ever used. However, when there was just one day left before the meetup, I realized that I could bring in only the most essential arguments to prove that.
    1. My first pro-Ruby argument was that, from the moment the language was created, there have been no missing features to add and no design bugs to fix. Over more than 10 years of its history, Ruby has remained mostly stable, with only minor syntactic features added or tweaked.
    2. Second, Ruby provides all the powerful object-oriented programming (OOP) features it adopted from Smalltalk. It is a “pure” OOP language, where everything is an object. The pureness, open classes, method_missing, and other OOP features provide huge semantic power. Surprisingly, nothing game-changing has been introduced into programming language design over the last 30 years.
    3. After that, I talked about the powerful features Ruby adds on top of the traditional form of OOP. For instance, mix-ins, a powerful alternative to multiple inheritance, are a great invention. During the presentation, I demonstrated how they make functional programming even more natural than in LISP. Another remarkable Ruby feature worth mentioning is the ability to extend individual objects. It makes defining “static” methods natural and can be useful in some other cases.
    4. I concluded by touching upon Ruby's concise syntax. As part of this argument, I demonstrated Ruby's syntactic killer feature: blocks.
Revealed pros and cons
There were some other nice discussions during and in between the speeches. For example, the Python group argued that its language is more widely used. However, I noted that one resorts to such an argument only for lack of other worthy pro-language arguments.
Another controversial point was that Python is good for education and has been adopted by the Massachusetts Institute of Technology as the primary programming language for students. I objected that Python is not the best language to start with. I think it is wiser to begin developing your programming skills with a pure OOP language, such as Ruby, because it builds a more solid foundation for object-oriented design and an object-oriented way of thinking.
The Ruby group was also represented by Aliaksandr Rahalevich, an Altoros alumnus, who spoke about how Ruby allows him to design great APIs, and Hleb Stiblo, who presented an “Introduction to Ruby.” At the end, my university classmate Pavel Gabriel called upon pro-Python and pro-Ruby folks to summarize their arguments and language trivia.

The discussion ended with confessions from both sides about what sucks in their language of choice. Python 2.6/3 incompatibility was the biggest issue that Pythonistas admitted. Ruby developers noted the lack of speed, the lack of support for real concurrency, and the cumbersome multiple-encodings support in Ruby 1.9. The first two negative points were well understood by Pythonistas, since they suffer from them as well. Ruby's 1.9 vs. 1.8 incompatibility was also mentioned, although nobody pointed out that the difference between these versions is far smaller than the difference between Python 2.6 and Python 3.
I really enjoyed attending this event and giving a presentation there. However, it seems like everyone—including me—forgot to mention that it is the hard work of human beings and their attention to detail that contributes most to the end result. I think that as long as the technology doesn't actively interfere with the programmer's job, it affects the end result only slightly. The language is merely the tool, while the engineer is the master.

Friday, October 22, 2010

How to use time travel for testing javascript libraries

The project that I'm working on right now is Membase. It is, and has been, an awesome experience in many ways. One cool part of that project is that I'm allowed to do quite advanced stuff for the Membase UI. Part of that is the Cells library that is growing as part of the Membase UI effort. But this post is not about Cells. I promise to write about Cells later. This post is about mocking the JS clock for testing Cells.

The code is here. You're interested in the files named mockTheClock.js and wrapped-date.js. It is inspired by equivalent code in JSUnit, but it's more full-featured and more efficient. In particular, I'm using a binary heap to order pending timers.
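To give an idea of the approach, here's a minimal sketch of my own (not the actual mockTheClock.js code; for brevity it keeps pending timers in a sorted array where the real library uses a binary heap): you replace the timer functions with ones driven by a virtual clock, and advance time explicitly in tests.

```javascript
// A minimal mocked clock: setTimeout-like scheduling plus explicit
// advancement of virtual time. Timers fire deterministically, in order.
function MockClock() {
  this.now = 0;
  this.timers = []; // kept sorted by firing time
  this.nextId = 1;
}
MockClock.prototype.setTimeout = function (fn, delay) {
  var timer = { at: this.now + delay, fn: fn, id: this.nextId++ };
  // insert keeping the array ordered by firing time
  var i = 0;
  while (i < this.timers.length && this.timers[i].at <= timer.at) i++;
  this.timers.splice(i, 0, timer);
  return timer.id;
};
MockClock.prototype.clearTimeout = function (id) {
  this.timers = this.timers.filter(function (t) { return t.id !== id; });
};
// Advance virtual time by ms, firing every timer that becomes due.
MockClock.prototype.advance = function (ms) {
  var target = this.now + ms;
  while (this.timers.length && this.timers[0].at <= target) {
    var t = this.timers.shift();
    this.now = t.at; // callbacks observe the time they were due at
    t.fn();
  }
  this.now = target;
};
```

With such a clock a test can schedule callbacks and then step time forward in precise increments, instead of sleeping and hoping real timers fire in the right order.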

The code is Apache-licensed, so feel free to grab and re-use it for your own projects.

Thursday, September 2, 2010

News from the past — gpicker 2 is out!

I released gpicker 2.0 more than half a year ago and have been planning to write about it since then. Lack of time and laziness delayed that more and more. But some people outside of my immediate 'circle of influence' are now using it for some very creative things. And I cannot wait anymore. It's time to realize my plan at last! :)

So gpicker 2.0 is out and it has some nice improvements. All new features are detailed in the README file. But I'd like to describe the most important things here.

When I announced gpicker, the project didn't even have a home. Its official home is now here: http://savannah.nongnu.org/projects/gpicker.

gpicker became even faster. Loading and filtration were moved into separate threads, so the UI is always ultra-responsive. With the speed improvements and separated filtration (which nicely aborts/restarts when the user changes the filter) it's now very responsive even on my whole filesystem, which has more than 10^6 files.

supermegadoc and erlang docs
Perhaps the most significant improvement is the ability to select from an arbitrary list. gpicker is very good at selecting files, but you can also feed it a list of arbitrary strings via standard input. And I have quite an interesting use of that feature – supermegadoc. This thing indexes man pages, ri, devhelp & Erlang documentation and allows you to quickly choose and open the documentation you need. But it's also quite useful for exploring documentation: you can try typing various words and see if there's anything like that in the documentation. I found it especially helpful for getting up to speed with the Erlang standard library. I also wrote some very basic imenu & tags completers for Emacs based on this feature.

Another area of improvement is gpicker integration. VIM integration (contributed by Sergey Avseyev) is included in the gpicker distribution. And we also have plug-ins for NetBeans (by Sergey Avseyev) and for gedit (by Yury Tolstik).

We also have some support for building/running on OS X. But unfortunately it's not very easy (given the limited Internet & OS X access I have) to get a proper installation of Gtk+ on OS X. The canonical way seems to be building from source via jhbuild. But the gpicker-on-OSX work was done against an ancient distribution of Gtk+ which has long been withdrawn from the web. So any success/failure reports with a proper Gtk+ installation, as well as contributions in this area, are very much welcome. I expect that only minor configure.ac fixes should be required at most. Contributing pre-built OS X binaries is very much appreciated too. This can be done with Bundler.

Yet another improvement is the ability to build gpicker without Gtk+ and glib. This flavor of gpicker obviously doesn't have any GUI, but it's able to filter standard input against a filter specified on the command line, which should be enough for some useful things. Some people, for example, would like to have gpicker-like functionality in text-mode Emacs. gpicker-simple can help here. In fact, I did some very preliminary (and buggy) integration of gpicker-simple filtration and the IDO user interface. If anyone is interested in making this work, please contribute. You can find it in gpicker.el from the gpicker distribution.

The creative work by other people that I mentioned before is a very interesting idea: switching windows by auto-completing their names (and other info) in gpicker. See Switch Window Using Fuzz Matching :: DoIT for more info.

I'm very glad that I can say thank you to the people who helped with code (Sergey & Yura) and/or bug reports and suggestions (folks from the Ruby department at Altoros). I'm happy that I have some users outside of Altoros. In particular, Marcelo De Moraes Serpa has been very helpful with suggestions and with spreading the word. Thank you very much, guys!

I thought that gpicker would take over the world (it's awesome, after all), but my (not evil at all) plan utterly failed. This is a bit unfortunate :) But I'm very glad that gpicker is slowly getting some adoption. The future is bright!

Saturday, July 31, 2010

My life with Debian

Tribute

I'm a long-time Debian user. And I'm a happy Debian user. My current Debian installation is very (and I mean very, very) old. I don't remember exactly how old it is, but it's more than 6 years old, and it may be as much as 9. Since its early months it has been running the unstable branch of Debian. It is updated regularly, as often as my (at times quite poor) network connectivity allows.

When I get a new machine, I simply copy my existing installation onto it. My particular OS has been reincarnated from one machine to another countless times.

I'm not trying to set any records here. I'm just lazy enough to avoid re-installing my OS and setting it up. But I seriously doubt that any other OS or distro could handle this. It's the unique combination of Debian's approach to distro development, its package upgrade-ability policies, and its attention to software quality that makes it possible.

So I owe a big thank you to the Debian folks for their outstanding job during all these years. Thanks a lot!

Of course, sometimes I had issues during package upgrades. That is inevitable when you run unstable and do, probably, thousands of package upgrades per year. But I don't remember anything really major. I was able to resolve all the issues that I had.

Tips

During these years of running and constantly upgrading my system I've learned a couple of tricks to keep it in its best shape. And I'd like to share them, though none of this is really new.

Sometimes, for various reasons, old versions of some libraries stay installed on your system. It's useful to run deborphan (http://www.debian-administration.org/articles/134) periodically. With the newer (well, a number of years old already) features of apt & aptitude this should happen less often, but some cruft can still accumulate.

Another trick that I discovered quite recently is package database de-fragmentation. It's described here (http://ubuntuforums.org/showthread.php?t=1004376).

Probably the most widely known tip is to purge packages instead of just uninstalling them when removing packages. After a normal uninstallation, Debian keeps the config files; purging a package removes those too. The Synaptic package management tool can be quite useful for purging any packages that were uninstalled in the default mode.

I found another cleanup opportunity in the package database. It turned out that dpkg keeps information about uninstalled packages in the /var/lib/dpkg/available file. And over all these years this file had grown quite substantially (15 megs). I wrote a simple script that cleans this up, and I used it a couple of months ago. So far I don't see any bad effects, so I can recommend it to anyone with a long-lived and often-upgraded Debian-based distro. Here it is.


amd64 versus i386

Another piece of knowledge that I can share is that Debian is probably the only distro that more or less supports running a 64-bit kernel with a 32-bit user-space. I think that's the best combination. You get all 4 gigs of address space for 32-bit apps, you can run 64-bit apps if you want, and you don't waste memory on twice-as-large pointers.

It would be great to have the advantages of amd64: first of all, the larger register file and the modern instruction set (i386 Debian still targets i386). But in my opinion, for a typical desktop & developer machine the larger memory consumption of 64-bit programs outweighs the benefits. In particular, most Java programs consume very close to twice as much memory on 64 bit. And don't forget that, AFAIK, there's still no lightweight 'client' JIT for amd64. Larger memory consumption causes fewer cache hits and more memory bandwidth usage, so amd64 is often slower. I also did some benchmarks with Ruby & Rails and found that the i386 version is faster.

So I've decided to stick with a 64-bit kernel and a 32-bit user-space (and avoid a re-install yet again). I've been running this combo for around half a year. One thing that was a bit annoying after the switch is that uname reported amd64, which caused issues with most (all?) configure scripts. My simple trick, which relies on Debian's default init, is to put the following file in /etc/initscript:


It makes sure that everything spawned by init has the i386 personality. This fixed the issues with configure scripts.

For some rare programs that require a kernel component and don't support mixed user- & kernel-space (VirtualBox), I have the amd64 version installed in a chroot. This also helps with development.

That's all. Keep your systems clean and efficient.

Thursday, July 1, 2010

How to cheaply turn single machine into a cluster

For development of Membase, which is a distributed storage system, I often need to run it on a cluster of machines. Luckily I have two machines both at home and at work, so with trivial use of rsync & ssh, running a cluster of two nodes was easy. But I sometimes need to run more than two nodes, so I decided to find something cheap that would allow me to run multiple true nodes even on a single machine. This is more important now, because I'll be on a business trip for the next month, so I'll have only a single machine at my disposal.

One approach is to use virtualization. I tried it around a year ago for some project. Starting a complete OS just to run a single application is a bit too slow, but that can be alleviated by the use of snapshots. In practice this is too painful. Even with snapshots it's slow. And free-software virtualization products either get it wrong (VirtualBox) or are buggy (KVM & QEMU). Bridged networking is relatively slow to come up. And I remember having some networking issues when restoring from a snapshot.

Yesterday I finally tamed 'virtualization' that's cheap and works. My approach is to use LXC, which is Linux's built-in containers implementation. It supports network virtualization as a core feature and it doesn't require a separate root filesystem for its instances. So with my current solution I can create a large number of 'servers', all sharing the filesystem but having different hostnames and network stacks. It's reliable, starts quickly, and is easy to kill.

One of the problems was that I needed to create a virtual host-only network connecting the host and all containers. The problem is that the current implementation of the macvlan link type doesn't support networking with the host. I worked around that by creating a virtual ethernet (veth) pair and linking the macvlans to one side of it, while using the other side as the host's end. So far it works beautifully!

The main script is at the following gist: http://gist.github.com/459693. It takes care of IP address & hostname allocation and simply runs the provided command inside a container. It also takes care to kill everything inside the container when the main command exits. This is useful for killing any daemons (e.g. the Erlang port mapper) that might still be running inside.

Here's what I added to the project's makefile to launch multiple instances of Membase:

'make lxc-run' starts one instance of Membase inside a container. 'make lxc-cluster' starts three instances in tabs of a new terminal window.

Thursday, June 3, 2010

Multiple return values in Javascript

While writing some JavaScript a few days ago, I encountered a case where it would have been really nice to be able to return multiple values from a function. Usually you pack your return values into an array, return that array, and unpack it at the call site. While creating an array with values is easy in JavaScript, unpacking those values back is neither elegant nor short, because JavaScript (the portable subset, at least) has no destructuring. E.g. the code normally looks like this:

  function findRoutingFor(method, path) {
    var foundRoute, routeArgs;
    // some computation here

    if (foundRoute)
        return [foundRoute, routeArgs];
  }

  // and at call site
  var rv = findRoutingFor(method, path);
  if (!rv)
     throw new Error(/* */);
  var foundRoute = rv[0];
  var routeArgs = rv[1];

I believe I came up with a more elegant way to do that. Continuation-passing style saves us here. Here's how:

  // notice extra parameter
  function findRoutingFor(method, path, body) {
    var foundRoute, routeArgs;
    // same computation here
    // and then we invoke continuation
    return body(foundRoute, routeArgs);
  }

  // then at call site...
  return findRoutingFor(method, path, function (foundRoute, routeArgs) {
    if (!foundRoute)
      throw new Error(/* */);
    // rest of call site is here
  });

So continuation-passing style allows us to avoid the packing & unpacking of multiple return values. There are some costs, like the extra indentation of the rest of the call site, but IMO in many cases it's better than the usual alternative.
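To make the pattern concrete, here's a tiny self-contained variant (the route table and route format are invented for illustration, and the method parameter is dropped for brevity):

```javascript
// Hypothetical routing table, just to make the example runnable.
var routes = [
  { pattern: /^\/users\/(\d+)$/, name: "showUser" },
  { pattern: /^\/posts\/(\d+)$/, name: "showPost" }
];

// Returns two values -- the matched route and its captured args -- by
// passing them to the continuation instead of packing them into an array.
function findRoutingFor(path, body) {
  for (var i = 0; i < routes.length; i++) {
    var match = routes[i].pattern.exec(path);
    if (match)
      return body(routes[i], match.slice(1));
  }
  return body(null, null);
}

findRoutingFor("/users/42", function (route, args) {
  if (!route)
    throw new Error("no route");
  console.log(route.name, args[0]); // -> showUser 42
});
```

The call site receives both values as plain named parameters, with no rv[0]/rv[1] unpacking.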

Sunday, January 24, 2010

Emacs and broken syntax highlighting

I've been struggling with syntax highlighting bugs in ruby-mode and javascript-mode for quite some time. Today I tried espresso-mode, which claims to be a fixed JavaScript mode, and it fails too. I'm not sure why, but this seems to be quite typical for Emacs and not typical for other advanced editors (vim & TextMate, at least). Maybe something is wrong with the syntax highlighting facility of Emacs, or maybe major-mode authors just cannot use it correctly.

One of the potential problems is the regular expression facility of Emacs Lisp. First of all, it uses a quite inconvenient dialect of regexps: many constructs need to be escaped with '\' (alternation (|) and grouping, for example). Second, ELisp doesn't have special syntax for regexps. You have to use strings, where the backslash has to be escaped again. So even the simple regexp a|b turns into "a\\|b".

Two years ago I wrote a small regexp DSL for Emacs Lisp. Using this DSL you can express complex regular expressions readably.

So today I found this regexp literal expression in espresso.el -- "[=(,:]\\(?:\\s-\\|\n\\)*\\(/\\)\\(?:\\\\/\\|[^/*]\\)\\(?:\\\\/\\|[^/]\\)*\\(/\\)" -- translated it to my DSL and got:

(redsl-to-regexp `(concat (char-set "=(,:")
                          (* (or (whitespace) "\n"))
                          (cap-group /)
                          (or "\\/" (neg-char-set "/*"))
                          (* (or "\\/" (neg-char-set "/")))
                          (cap-group /)))

which is simply wrong. It obviously fails for a simple regexp -- /\\/

I transformed it into:

(redsl-to-regexp `(concat (char-set "=(,:")
                          (* (or (whitespace) "\n"))
                          (cap-group /)
                          (+ (or (neg-char-set "\\/")
                                 (concat "\\" (anychar))))
                          (cap-group /)))

which is logical and works. And hooray, hooray, I finally have working syntax highlighting in javascript-mode!

I've uploaded regex-dsl to http://github.com/alk/elisp-regex-dsl and will send an espresso.el fix soon.

Monday, January 11, 2010

Speeding up ruby interpreter in Debian & Ubuntu

It's an ancient trick, but I decided to blog about it while I still remember it.

Unix (or ELF, to be more precise) shared libraries suck. Well, there are many useful features that are not found on other platforms (like LD_PRELOAD), but the performance is worse than on other systems. And sometimes dramatically so.

The problem is that shared libraries are usually compiled as position-independent code (PIC). This is quite a bit slower (especially on older ISAs, like i386, that don't have PC-relative addressing) than normal code. Another reason for shared library slowness is the indirection for almost all function calls. This indirection gives you extra flexibility: ELF shared libraries allow you to replace almost every function. Sometimes this is very convenient, but it is slow.

Now imagine modern C or C++ code, with small functions that frequently call each other. The cost of the extra indirection and PIC in function prologues becomes quite high. GCC's -fvisibility switch allows you to trade away some of this flexibility for gains in speed, but few libraries use it yet.

You can run a simple experiment (as I did a few years ago). Write a small program that does malloc/free in a tight loop. Link it normally (i.e. with shared libc) and statically. Then compare their speed. I remember as much as a 50% gain with statically linked libc, which doesn't pay the performance price for ELF shared library flexibility.

There's a 'sort-of-hack' that trades memory efficiency for speed. It is possible to link normal (i.e. non-PIC) code as a shared library (at least on i386). This way you'll have minimal function call indirection and no variable access indirection at all. You won't pay the PIC price either. The only downside of this method is that the dynamic linker will have to patch the text-section pages with relocated addresses, so it won't be possible to share those pages between different processes. The performance gain may well be worth it, though. NVIDIA folks, for example, build their libGL this way. And I'm sure they know what they're doing.

Why am I mentioning ELF shared libraries? Because Debian (and Ubuntu too) builds the ruby interpreter as a shared library, and according to my measurements, this incurs around a 10% performance penalty. To regain part of this loss, I simply edit ruby's 'configure' script and remove all mentions of -fPIC. I then run 'dpkg-buildpackage' and install the resulting .deb-s as usual. This brings the performance of the ruby interpreter back to its normal level (i.e. as when it's built statically). Another optimization that I use is passing better GCC optimization flags; the '-march=' flag is quite important on i386.

P.S. Read the excellent http://people.redhat.com/drepper/dsohowto.pdf by GNU libc guru Ulrich Drepper if you want to know more details about ELF shared libraries.

Monday, January 4, 2010

Power of modern javascript interpreters

Today my colleague Vlad Zarakovsky, who is interested in porting gpicker scoring to JS or finding something close to that, showed me some filtration code and its performance results. He demonstrated that some (quite non-trivial, it seems, from a quick glimpse at the source) filtration implementation takes 33 milliseconds on 22k filenames, which is not bad. This result was obtained on Safari, which sports a state-of-the-art JavaScript interpreter.

Another thing we noted is that sorting takes an additional 14 milliseconds, which is quite a lot compared to the much more sophisticated scoring. I thought it should be due to the cost of transitions from the native qsort code to the JS code that implements the comparator function. And out of curiosity I proposed trying a pure JS sort instead.

The results were interesting. At first it was much slower than the built-in sort, but Vlad then discovered that it's actually faster on random data (I haven't asked by how much, but that's not very important). All this led me to the following conclusions:

  • A naive qsort on real-world, non-random data may be 100 times slower than a proper qsort implementation. This is understandable, but it's the first time I've met such a case personally.
  • I was right about a pure JS qsort being able to beat the C qsort, which has to use a slow and barely optimizable comparator.
  • In general, state-of-the-art JavaScript interpreters are indeed approaching the speed of optimized C. If only we had a Ruby interpreter of the same quality :)

So next time your JS code depends on fast sorting, consider picking a pure JS sort implementation, but pick wisely. Also don't forget that there are still some very slow JS interpreters around (yes, I'm talking about 'the-cursed-browser').
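As an illustration of what such a pure JS sort might look like (a straightforward merge sort of my own, not the code from the experiment described above):

```javascript
// A simple pure-JS merge sort: stable, O(n log n) even on already-sorted
// or reverse-sorted input (unlike a naive qsort), and every comparator
// call is a plain JS-to-JS call that the engine can optimize and inline.
function mergeSort(array, cmp) {
  if (array.length <= 1)
    return array.slice();
  var mid = array.length >> 1;
  var left = mergeSort(array.slice(0, mid), cmp);
  var right = mergeSort(array.slice(mid), cmp);
  var result = [], i = 0, j = 0;
  while (i < left.length && j < right.length) {
    // <= keeps equal elements in original order, i.e. the sort is stable
    if (cmp(left[i], right[j]) <= 0)
      result.push(left[i++]);
    else
      result.push(right[j++]);
  }
  while (i < left.length) result.push(left[i++]);
  while (j < right.length) result.push(right[j++]);
  return result;
}
```

It returns a new sorted array and leaves the input untouched, which also makes it easy to benchmark against the built-in sort on the same data.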

BTW, I'm sure the same applies to C's standard qsort. In cases where the comparator is quite simple, an implementation that is able to inline the comparator should be much faster. For gpicker I've stolen the source of glibc's qsort implementation and made sure that it's able to inline the comparator.