Saturday, July 31, 2010

My life with Debian

Tribute

I'm a long-time Debian user. And I'm a happy Debian user. My current Debian installation is very (and I mean very, very) old. I don't remember exactly how old it is, but it's more than 6 years old, and probably as much as 9. Since its early months it has tracked the unstable branch of Debian. It is updated regularly, as often as my (quite poor at times) network connectivity allows.

When I get a new machine, I simply copy my existing installation onto it. My particular OS has already been reincarnated from one machine to another countless times.

I'm not trying to set any records here. I'm just lazy enough to avoid re-installing my OS and setting it up again. But I seriously doubt that any other OS or distro could handle it. It's the unique combination of Debian's approach to distro development, its package upgradability policies and its attention to software quality that makes this possible.

So I owe a big thank you to the Debian guys for their outstanding job during all these years. Thanks a lot!

Of course, sometimes I had issues during package upgrades. That is inevitable when you run unstable and do probably thousands of package upgrades per year. But I don't remember anything really major, and I was able to resolve every issue I had.

Tips

During these years of running and constantly upgrading my system I've learned a couple of tricks for keeping it in the best shape. I'd like to share them, though none of this is really new.

Sometimes, for various reasons, old versions of some libraries stay installed on your system. It's useful to run deborphan (http://www.debian-administration.org/articles/134) periodically. With the newer features of apt & aptitude (well, they have been around for a number of years already) this should happen less often, but some cruft can still accumulate.

Another trick that I discovered quite recently is package database de-fragmentation. It's described here (http://ubuntuforums.org/showthread.php?t=1004376).

Probably the most widely known tip is to purge instead of just uninstalling when removing packages. After a normal uninstallation, Debian keeps the package's config files. Purging a package removes those too. The Synaptic package management tool can be quite useful for purging any packages that were uninstalled in the default mode.
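For those who prefer a script to Synaptic, here is a minimal sketch of how such leftovers could be found. It assumes the usual `dpkg -l` listing, where a removed package that still has config files is marked with the state `rc`; the function names are made up for this example.

```python
import shutil
import subprocess


def residual_config_packages(dpkg_list_output):
    """Return names of packages whose state column is 'rc'
    (removed, but config files are still present)."""
    names = []
    for line in dpkg_list_output.splitlines():
        fields = line.split()
        # In 'dpkg -l' output the first column is the state, the second the name.
        if len(fields) >= 2 and fields[0] == "rc":
            names.append(fields[1])
    return names


def list_residual():
    """Run 'dpkg -l' and return the packages that are candidates for purging."""
    result = subprocess.run(["dpkg", "-l"], capture_output=True, text=True, check=True)
    return residual_config_packages(result.stdout)


if __name__ == "__main__":
    # Only query dpkg where it actually exists; this script changes nothing.
    if shutil.which("dpkg"):
        for name in list_residual():
            print(name)
```

The script only prints names. After reviewing them, they can be passed to `dpkg --purge`.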

I found another cleanup opportunity in the package database. It turned out that dpkg kept information about uninstalled packages in its /var/lib/dpkg/available file. Over all these years this file had grown quite substantially (15 megs). I wrote a simple script that cleans this up, and I ran it a couple of months ago. So far I haven't seen any bad effects. So I can recommend it to anyone with a long-lived and often-upgraded Debian-based distro. Here it is.
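The idea behind such a cleanup can be sketched as follows. This is not the original script, only an illustration under assumptions: the available file is treated as blank-line-separated stanzas, each with a `Package:` field, and the function name and the `keep` set are hypothetical.

```python
def filter_available(text, keep):
    """Keep only the stanzas whose 'Package:' field names a package in `keep`.

    `text` is the content of a dpkg-style control file: stanzas separated
    by blank lines. Nothing is read from or written to disk here.
    """
    kept = []
    for stanza in text.split("\n\n"):
        if not stanza.strip():
            continue
        name = None
        for line in stanza.splitlines():
            if line.startswith("Package:"):
                name = line.split(":", 1)[1].strip()
                break
        if name in keep:
            kept.append(stanza.strip("\n"))
    # Every stanza is followed by one blank line, as in the input format.
    return "".join(stanza + "\n\n" for stanza in kept)


if __name__ == "__main__":
    sample = "Package: foo\nVersion: 1.0\n\nPackage: bar\nVersion: 2.0\n\n"
    print(filter_available(sample, {"foo"}), end="")
```

The set of names to keep would have to come from the installed package list (for example from `dpkg-query -W`). Anyone trying this on a real system should back up /var/lib/dpkg/available first. dpkg also has a `--clear-avail` option that erases the available information altogether.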


amd64 versus i386

Another piece of knowledge that I can share is that Debian is probably the only distro that more or less supports running a 64-bit kernel with a 32-bit user-space. I think that's the best combination. You get all 4 gigs of address space for 32-bit apps, you can run 64-bit apps if you want, and you don't waste memory on pointers that are twice as large.

It would be great to have the advantages of amd64. Those are, first of all, the larger register file and the modern instruction set (i386 Debian still targets i386). But in my opinion, for a typical desktop & developer machine the larger memory consumption of 64-bit programs outweighs the benefits. In particular, most Java programs really consume very close to twice as much memory on 64-bit. And don't forget that, AFAIK, there's still no lightweight 'client' JIT for amd64. Larger memory consumption causes fewer cache hits and more memory bandwidth usage, so amd64 is often slower. I also did some benchmarks with Ruby & Rails and found that the i386 version is faster.
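To make the pointer-size argument concrete, here is a small back-of-the-envelope sketch. The node layout (two pointers plus one 32-bit integer) is a made-up example, and the sizes are computed without alignment padding, so a real 64-bit compiler would typically make the larger node even bigger.

```python
import struct

# Hypothetical linked-structure node: two pointers and one 32-bit integer.
# "<" selects standard sizes with no padding: I = 4 bytes, Q = 8 bytes, i = 4 bytes.
NODE_32 = struct.calcsize("<IIi")  # pointers are 4 bytes each
NODE_64 = struct.calcsize("<QQi")  # pointers are 8 bytes each


def growth(size_32, size_64):
    """Return how many times larger the 64-bit layout is."""
    return size_64 / size_32


if __name__ == "__main__":
    print("32-bit node:", NODE_32, "bytes")
    print("64-bit node:", NODE_64, "bytes")
    print("growth factor: %.2f" % growth(NODE_32, NODE_64))
```

The more pointer-heavy a data structure is, the closer the factor gets to two, which matches what I see with Java programs.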

So I've decided to stick with a 64-bit kernel and a 32-bit user-space (and avoid a re-install yet again). I've been running this combo for around half a year. One thing that was a bit annoying after the switch is that uname reported a 64-bit machine (x86_64), which caused issues with most (all?) configure scripts. My simple trick, which relies on Debian's default init, is to put the following file in /etc/initscript:


It makes sure that everything that's spawned by init has i386 personality. This fixed issues with configure scripts.
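The effect of the i386 personality can be checked for a single command with the `setarch` tool from util-linux. The sketch below is not the /etc/initscript file itself (that one is a shell script run by init); it only wraps one command, and the helper name is made up for this example.

```python
import shutil
import subprocess


def with_i386_personality(argv):
    """Return argv prefixed so that it runs under the i386 personality."""
    return ["setarch", "i386"] + list(argv)


if __name__ == "__main__":
    # Compare what uname reports with and without the personality change.
    if shutil.which("setarch") and shutil.which("uname"):
        for cmd in (["uname", "-m"], with_i386_personality(["uname", "-m"])):
            result = subprocess.run(cmd, capture_output=True, text=True)
            print(" ".join(cmd), "->", result.stdout.strip() or result.stderr.strip())
```

On a 64-bit x86 kernel the second command should report a 32-bit machine type, which is what configure scripts then pick up.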

For the rare programs that require a kernel component and don't support mixed user- & kernel-space (virtualbox), I have an amd64 version installed in a chroot. This also helps with development.

That's all. Keep your systems clean and efficient.

Thursday, July 1, 2010

How to cheaply turn single machine into a cluster

For development of membase, which is a distributed storage system, I often need to run it on a cluster of machines. Luckily I have two machines at home and at work, so with trivial use of rsync & ssh running a cluster of two nodes was easy. But I sometimes need to run more than two nodes, so I decided to find something cheap that allows me to run multiple true nodes even when I have a single machine. This is more important now, because I'll be on a business trip for the next month, so I'll have only a single machine at my disposal.

One approach is to use virtualization. I tried it around a year ago for some project. Starting a complete OS just to run a single application is a bit too slow, but that can be alleviated by using snapshots. In practice this is too painful: even with snapshots it's slow. And free software virtualization products either get it wrong (virtualbox) or are buggy (kvm & qemu). Bridged networking is relatively slow to come up. And I remember having some networking issues when restoring from a snapshot.

Yesterday I finally tamed a 'virtualization' that's cheap and works. My approach is to use LXC, which is Linux's built-in containers implementation. It supports network virtualization as its core feature and it doesn't require a separate root for its instances. So with my current solution I can create a large number of 'servers', all sharing one filesystem but having different hostnames and network stacks. It's reliable, starts quickly and is easy to kill.

One of the problems was that I needed to create a virtual host-only network that connects the host and all containers. The catch is that the current implementation of the macvlan link type doesn't support networking with the host. I worked around that by creating a virtual ethernet pair and linking the macvlans to one side of it, while using the other side as the host's end. So far it works beautifully!
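The veth-plus-macvlan layout can be sketched as a list of `ip` commands. This is a rough illustration, not what the gist does: the interface names, the 10.0.3.1/24 address and the macvlan `bridge` mode are assumptions for the example, and the commands are only printed, since running them needs root.

```python
def host_only_network_commands(host_if="veth-host", peer_if="veth-lxc",
                               host_cidr="10.0.3.1/24", n_macvlans=2):
    """Build the 'ip' commands for a veth pair with macvlans on the peer side.

    All names and addresses are hypothetical defaults for illustration.
    """
    cmds = [
        # The veth pair: host_if is the host's end, peer_if carries the macvlans.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", peer_if],
        ["ip", "addr", "add", host_cidr, "dev", host_if],
        ["ip", "link", "set", host_if, "up"],
        ["ip", "link", "set", peer_if, "up"],
    ]
    for i in range(n_macvlans):
        # One macvlan per container, all attached to the peer side of the pair.
        cmds.append(["ip", "link", "add", "link", peer_if, "name", "mv%d" % i,
                     "type", "macvlan", "mode", "bridge"])
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in host_only_network_commands():
        print(" ".join(cmd))
```

Each macvlan then still has to be handed over to its container, which is not shown here.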

The main script is at the following gist: http://gist.github.com/459693. It takes care of allocating an IP address & hostname and simply runs the provided command inside a container. It also takes care to kill everything inside the container when the main command exits. This is useful for killing any daemons (e.g. the erlang port mapper) that might still be running inside.
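Address and hostname allocation of this kind could look roughly like the sketch below. It does not reproduce the gist: the 10.0.3.0/24 network, the reserved host address and the `lxc-N` hostname pattern are all assumptions made up for the example.

```python
import ipaddress


def allocate(used_ips, network="10.0.3.0/24", reserved=("10.0.3.1",)):
    """Return (ip, hostname) for the first address in `network` that is free.

    `used_ips` lists addresses already given to containers; `reserved`
    holds addresses that must never be handed out (e.g. the host's end).
    """
    taken = {ipaddress.ip_address(a) for a in list(used_ips) + list(reserved)}
    for host in ipaddress.ip_network(network).hosts():
        if host not in taken:
            # Hypothetical naming scheme: hostname derived from the last octet.
            last_octet = str(host).rsplit(".", 1)[1]
            return str(host), "lxc-" + last_octet
    raise RuntimeError("no free addresses left in " + network)


if __name__ == "__main__":
    used = []
    for _ in range(3):
        ip, hostname = allocate(used)
        used.append(ip)
        print(hostname, ip)
```

A real script also needs some way to remember which addresses are in use between runs, which this sketch leaves to the caller.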

Here's what I added to the project's makefile to launch multiple instances of membase:

'make lxc-run' starts one instance of membase inside a container. 'make lxc-cluster' starts three instances in tabs of a new terminal window.