Tales about Aliaksey's life

Sunday, February 15, 2015

Visualizing perf profiles using pprof

Perf is a pretty powerful profiling tool (in addition to its other features). But it is not as good as pprof at visualizing profiles.
Now you can capture profiles using perf (including profiles of already running processes) and then deal with them using pprof, with the help of the perf2pprof tool. It is also available via rubygems: simply do gem install perf2pprof and you're ready to rock.

Friday, December 12, 2014

Better productivity writing golang code with help of supermegadoc

At work I've been writing more and more golang code recently. It is a nice little language. But as a language noob I have yet to internalize which features are available in the standard library and how to use them. So I'm spending a lot of time staring at pages under golang.org/pkg looking for the types, functions or methods I need.
For my Erlang work I'm quite a regular user of supermegadoc's Erlang integration (a video demo with my nice/odd accent is here). And I was seriously lacking something similar for go. Today I finally spent a few hours on it and I have something that looks like what I need.

Observe:

I can quickly find things that are visible to godoc. Just like with other supermegadoc integrations, I can see function signatures and constant and variable values. And I can see whether a type is a struct, a "typedef" or an interface. From experience working with Erlang's supermegadoc, I know this means that I often don't even need to open the corresponding doc entry. It is often enough to see that it's there and (in the case of functions or methods) what its signature is.

I expect my golang productivity to increase.

Have a nice day folks!

Monday, December 9, 2013

Massive power of (liblzma based) XZ archiver

I've recently restarted gathering bitcoin market data. I'm grabbing samples of market depth every 3 seconds and I'm collecting trade events.

Market depth samples can be quite large. Every mtgox sample appears to be about 90 kilobytes big. So 1 hour of samples is about 100 megs of data. And a day is about 3 gigs. Which is a bit too much.
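The hourly figure follows directly from the sample size and the sampling interval. Here is a small back-of-the-envelope sketch, assuming the approximate numbers above (90 KB per sample, one sample every 3 seconds) and decimal megabytes; the variable names are just for illustration.

```shell
#!/bin/sh
# Rough data volume estimate; inputs are the approximate figures from the text.
sample_kb=90      # approximate size of one market depth sample, in kilobytes
interval_s=3      # one sample every 3 seconds

samples_per_hour=$(( 3600 / interval_s ))
hourly_mb=$(( sample_kb * samples_per_hour / 1000 ))
daily_mb=$(( hourly_mb * 24 ))

echo "samples per hour: $samples_per_hour"
echo "per hour: ${hourly_mb} MB"
echo "per day:  ${daily_mb} MB"
```

With these inputs that is 1200 samples and roughly 108 MB per hour, or about 2.6 GB per day of raw data.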

gzip is able to compress that about 5x. But that's still a bit too large.

I've found xz to really shine on that kind of data. More than 1 gig of data gets squeezed down to less than a meg! And what's extra cool is that xz is very quick to decompress. For static data like a btc market archive that's very useful.

So quality compression does matter. And I just wanted to express my ultimate respect to the authors of that extremely useful software.
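To get a feel for the difference, here is a small experiment on made-up data. It is only a sketch: the "samples" are synthetic (a hex dump of random bytes repeated twenty times), not real market data, and the file names are arbitrary. One plausible reason for the gap is that each fake sample is much bigger than DEFLATE's 32 KB window, so gzip cannot refer back to the previous near-identical sample, while xz's default 8 MiB dictionary can.

```shell
#!/bin/sh
# Compare gzip and xz on synthetic, highly repetitive data.
# The data set is invented for illustration; it is not real market data.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# One fake "sample" of about 200 KB: a hex dump of 64 KB of random bytes.
head -c 65536 /dev/urandom | od -An -tx1 > "$workdir/sample.txt"

# Twenty near-identical samples back to back, like consecutive depth snapshots.
i=0
while [ "$i" -lt 20 ]; do
    cat "$workdir/sample.txt"
    echo "snapshot $i"
    i=$(( i + 1 ))
done > "$workdir/data.txt"

gzip -9 -c "$workdir/data.txt" > "$workdir/data.txt.gz"
xz -c "$workdir/data.txt" > "$workdir/data.txt.xz"

# Decompress again and check that the round trip is lossless.
xz -dc "$workdir/data.txt.xz" | cmp -s - "$workdir/data.txt"

raw_size=$(wc -c < "$workdir/data.txt")
gz_size=$(wc -c < "$workdir/data.txt.gz")
xz_size=$(wc -c < "$workdir/data.txt.xz")
echo "raw: $raw_size bytes, gzip: $gz_size bytes, xz: $xz_size bytes"
```

The exact ratios depend on the data, so treat the numbers this prints as an illustration of the effect rather than as a benchmark.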

Have a nice day and happy hacking!

Sunday, September 22, 2013

Playing with Intel TSX

I've recently got access to a box that has an Intel Haswell CPU inside. And I was quite looking forward to playing with one of its most interesting features: hardware transactional memory. My particular interest is to see how cheap it is.

My use case is per-processor data structures (e.g. malloc caches). Without explicit binding of threads to processors, there's only an optimistic way of doing it, which requires some synchronization to defend against the pessimistic case of a thread being rescheduled to a different cpu. That would look like taking the cpu id, locking its corresponding lock (which in most cases would be in cache and uncontended and thus reasonably quick), and then doing something with the per-cpu data. So in this approach we always pay some performance price even if the majority of actual runs hit the fast path. The lack of really cheap optimistic locking makes that price significant, which makes the approach less attractive.

So let's return to Intel's implementation of transactional memory (aka TSX). The Wikipedia article describes that thing pretty well. My understanding is that it's expected to be most useful for somewhat coarse locks where multiple threads would normally contend for the lock yet touch different memory locations. E.g. imagine different threads touching different buckets of a hash table or different branches of a binary search tree. It can also be used as a compare-and-exchange operation that allows you to process multiple memory locations at once. There's already glibc support for it that optimises pthread mutex operations, described in a nice (as usual) lwn.net article.
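Before experimenting, it helps to confirm that the box actually exposes TSX. Here is a minimal sketch, assuming a Linux system where the kernel lists the rtm (restricted transactional memory) and hle (hardware lock elision) flags in /proc/cpuinfo; the has_cpu_flag helper is a made-up name for this example.

```shell
#!/bin/sh
# has_cpu_flag FLAG [CPUINFO_FILE]
# Succeeds if FLAG appears on a "flags" line of the given cpuinfo file
# (default: /proc/cpuinfo). Helper name and interface are hypothetical.
has_cpu_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    grep -E '^flags[[:space:]]*:' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"
}

for f in rtm hle; do
    if has_cpu_flag "$f"; then
        echo "$f: supported"
    else
        echo "$f: not reported"
    fi
done
```

The optional second argument only exists so the helper can be pointed at a saved copy of cpuinfo from another machine.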

My hope was that this feature would end up being even faster than atomic operations in the fastest path (everything is in L1 cache), given its optimistic nature. And that it might be useful for the quick optimistic locking I'd like to have.

You can see my test case here. It simulates the fast path of a "lock" that guards a counter. There is no locking itself, just a check that the "lock" is free, which is what the glibc lock elision code is doing. And you can see how TSX allows us to avoid actual locking. "On the other side of the ring" is code that changes the counter via a traditional compare-exchange atomic operation (no locking either, to give me purer numbers).

On the box I have access to (with an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz processor) I'm getting about 67 cycles per loop iteration for the TSX case and about 27 cycles for the atomic CAS code (and the same for the more traditional locking instruction "xchg mem, register"). Note that it's very likely that larger transactions will have bigger overhead. Also note that a usual synchronised region is two atomic ops (with the unlocking atomic operation being potentially significantly cheaper than the locking one), so in this limited case TSX appears to be somewhat competitive with traditional locking, but not faster.

So TSX is not faster than a single atomic operation. Which breaks my hope of using it for a quick optimistic lock. It is somewhat sad that on today's hardware there's seemingly no way to make the fast path of locks lightning fast without playing crazy tricks (e.g. the slow path stopping the lock owner thread via a signal or ptrace, like jvm "biased locking" appears to be doing).

Anyways, being roughly 2.5x slower than a simple atomic operation (67 vs. 27 cycles) is pretty good news IMHO for the use cases for which TSX is designed. And its "multi-word cas" application appears to be very interesting and useful too. So I'm looking forward to using it somewhere.

And finally I have to note that, especially at the beginning, debugging transactional memory code can be quite tricky and very weird. That's because a transaction is fully isolated while it runs, so there's no way to printf something to see why it fails, or to set a breakpoint inside it and inspect things. This hit me initially because my simplistic code wasn't at all prepared to handle transaction failures, i.e. my code is only supposed to test the fast path without any real-world synchronization contention. After a few minutes of struggling with it I realized that even otherwise conflict-less code will abort from time to time. For example, any interrupt (e.g. a timer tick) will abort an in-flight transaction, as in fact will any user- to kernel-space transition.

So lesson number one is that debugging hardware transactional memory code should be done very carefully, especially if the code path is significantly different between the successful and abort-ful cases. I.e. imagine some real transaction that might span several layers of code and consider that a debugger or printf will never be able to see or expose the "guts" of aborted transactions. And lesson number two is that aborts always have to be handled, even in toy code.

Have a nice day and happy hacking.

Sunday, April 8, 2012

gpicker 2.2 is out!

Hello there! I've just made the long overdue release of gpicker 2.2. Some notable changes are:

new project type -- script, which I'm using to handle a multi-repository project (i.e. couchbase)
implemented poor man's isearch on steroids -- gpicker-isearch
big improvements for gpicker-imenu
more optimizations

The Savannah project page has a link to the download area with source .tar.{gz,bz2,xz} archives and binary .deb packages (built on lenny) for i386 and amd64. If you haven't heard of gpicker before, also check out supermegadoc, which is a very convenient gpicker-using tool.

Thursday, December 15, 2011

Me and Gnome3

Hi. Quite a bit of time has passed since my last post. It was a busy time with continued hard work on the (still forthcoming) Couchbase Server 2.0 release and, most importantly, I've found a beautiful girl and got married!

Anyway, I just got reminded that I should not forget about writing something from time to time. And today's "hot" topic is Gnome 3.

About a month ago (or was it 2? Time flies so weirdly with so much happening around me now) Debian Sid got Gnome 3. Even earlier it got some components of Gnome 3. The most noticeable was the upgrade of gnome-terminal to the Gnome 3 version. And I almost immediately reverted that back to gnome-terminal 2 from the last Debian stable. The reason is very simple. The default theme of gtk3 (which is, naturally, used by all gnome 3 apps) is ugly. Like very very ugly. And, surprisingly, there's only one non-default theme engine for gtk3: the one that's heavily using CSS3. I don't like its look either, but the most worrisome aspect of it is its quite noticeable slowness. There are ways to adjust the look with CSS3 hackery, after all. I've found that some work on porting old gtk engines has been started. But the quick and minimalistic Mist engine I'm used to is not yet ported.

That's basically my whole Gnome 3 story. I cannot tolerate Gnome 3 not because of its experimental UI, but because I need a usable gtk3 theme first. I cannot even say what I think about gnome's UI, because I haven't even tried using it on a daily basis.

Whoever makes Mist work on gtk3 will become my hero. Meanwhile, I was forced to find refuge in XFCE land, which is missing a few things I had on my gnome 2 desktop.

Monday, May 16, 2011

Unbreaking LXC on latest Debian unstable

With the recent switch to the /run directory in Debian, I was getting an error from lxc: it tried to mount /dev/shm in the container and failed because /dev/shm is now a symlink pointing inside /run. The simplest fix I found is replacing the symlink with a bind mount. Here's what I've added to /etc/rc.local:

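# /dev/shm is now a symlink into /run; replace it with a bind mount for lxc
if [ -L /dev/shm ]
then
    # keep the original symlink around under a different name
    mv /dev/shm /dev/shm~
    mkdir /dev/shm
    # bind-mount the symlink's real target onto the new directory
    mount --bind "$(readlink -f /dev/shm~)" /dev/shm
fi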