Tales about Aliaksey's life

Sunday, January 24, 2010

Emacs and broken syntax highlighting

I've been struggling with syntax highlighting bugs in ruby-mode and javascript-mode for quite some time. Today I've tried espresso-mode that claims to be fixed javascript mode, and it fails too. I'm not sure why but it seems to be quite typical for Emacs and not typical for other advanced editors (vim & textmate at least). Maybe something is wrong with syntax highlighting facility of Emacs, or maybe major mode authors just cannot use it correctly.

One of the potential problems is regular expressions facility of Emacs Lisp. First of all it uses quite inconvenient dialect of regexps. Many constructs need to be escaped with '\' (alternative (|) and grouping, for example). Second, ELisp doesn't have special syntax for regexps. You have to use strings, where slash has to be escaped again. So even simple regexp a|b turns to "a\\|b".

Two years ago I've written small regexp DSL for Emacs Lisp. Using this DSL you can express complex regular expressions readably.

So today I found regexp literal expression in espresso.el -- "[=(,:]\\(?:\\s-\\|\n\\)*\\(/\\)\\(?:\\\\/\\|[^/*]\\)\\(?:\\\\/\\|[^/]\\)*\\(/\\)", translated it to my DSL and got:

(redsl-to-regexp `(concat (char-set "=(,:")
                          (* (or (whitespace) "\n"))
                          (cap-group /)
                          (or "\\/" (neg-char-set "/*"))
                          (* (or "\\/" (neg-char-set "/")))
                          (cap-group /)))

which is simply wrong. It obviously fails for simple regexp -- /\\/

I tranformed it into:

(redsl-to-regexp `(concat (char-set "=(,:")
                          (* (or (whitespace) "\n"))
                          (cap-group /)
                          (+ (or (neg-char-set "\\/")
                                 (concat "\\" (anychar))))
                          (cap-group /)))

which is logical and works. And ура ура I finally have working syntax highlighting in javascript mode!

I've uploaded regex-dsl to http://github.com/alk/elisp-regex-dsl and will send espresso.el fix soon.

Monday, January 11, 2010

Speeding up ruby interpreter in Debian & Ubuntu

It's an ancient trick, but I decided to blog about it while I still remember to blog about it.

Unix (or ELF, to be more precise) shared libraries suck. Well, there are many useful features that are not found on other platforms (like LD_PRELOAD), but the performance is worse than in other systems. And sometimes, dramatically.

The problem is that shared libraries are usually compiled as position independent code (PIC). This is quite slower (especially, on older ISAs, like i386, that don't have have PC-relative addressing) than normal code. Other reason for shared libraries slowness is indirection for almost all function calls. This indirection gives you extra flexibility. ELF shared libraries allow you to replace almost every function, due to this extra indirection. And sometimes this is very convenient, but this is slow.

Now imagine modern C or C++ code, with small functions that frequently call each other. Extra indirection and PIC cost in function prologues becomes quite high. GCC's -fvisibity switch allows you to get rid of some of this flexibility for gains in speed, but few libraries use it, yet.

You can run simple experiment (as I did few years ago). Write a small program that does malloc/free in a tight loop. Link it normally (i.e. with shared libc) and statically. Then compare their speed. I remember as high as 50% gain with statically sinked libc, that doesn't pay performance price for ELF shared libraries flexibility.

There's 'sort-of-hack' that trades memory efficiency for speed. It is possible to link normal (i.e. non-PIC code) as shared library (at least on i386). This way you'll have minimal function call indirection and no variable access indirection at all. You won't pay PIC price too. The only downside of this method is that dynamic linker will have to patch TEXT section pages with relocated addresses, so it won't be possible to share this pages between different processes. The performance gain may well worth it though. NVIDIA folks, for example, build their libGL in this way. And I'm sure they know what they do.

Why I'm mentioning ELF shared libraries? Because Debian (and Ubuntu too) build ruby interpreter as shared library, and according to my measurements, this gives around 10% performance penalty. To regain part of this loss, I simply edit ruby's 'configure' script and remove all mentions of -fPIC. I than 'dpkg-buildpackage' and install resultant .deb-s as usual. This brings performance of ruby interpreter back to it's normal level (i.e. when it's built statically). Another optimization that I use, is passing better GCC optimization flags. "-march=' flag is quite important on i386.

P.S. Read excellent http://people.redhat.com/drepper/dsohowto.pdf by GNU libc guru Ulrich Drepper, if you want to know more details about ELF shared libraries.

Monday, January 4, 2010

Power of modern javascript interpreters

Today my colleague  (Vlad Zarakovsky), who is interested in porting gpicker scoring to JS or finding something that's close to that, showed me some filtration code and it's performance results. He demonstrated that some (quite non-trivial it seems, from quick glimpse of the source) filtration implementation takes 33 milliseconds on 22k of filenames, which is not bad. This result was obtained on Safari, which sports state-of-the-art javascript interpreter.

Another thing that we noted is that sorting takes additional 14 milliseconds, which compared to much more sophisticated scoring is quite large. I thought that it should be due to cost of transition from native qsort code to JS code thats implements comparator function. And out of curiosity I've proposed trying pure JS sorting instead.

The results were interesting. At first it was much slower than built-in qsort, but Vlad then discovered that it's actually faster on random data (I haven't asked by how much, but that's not very important). All this led me to the following conclusions:

  • Naive qsort on real world non-random data may be 100 times slower than proper qsort implementation. Which is understandable, but this is the first time I've met such case personally.
  • I was right about pure JS qsort being able to beat C qsort, that has to use slow and barely optimize-able comparator.
  • In general, state of the art javascript interpreters are indeed approaching speeds of optimized C. If only we had Ruby interpreter of same quality :)

So when next time your JS code will depend on fast sorting, consider picking pure JS sort implementation, but pick wisely. Also don't forget that there are still some very slow JS interpreters around (yes, I'm talking about 'the-cursed-browser').

BTW, I'm sure same applies to C standard qsort. In cases where comparator is quite simple, the implementation that is able to inline comparator should be much faster. For gpicker I've stolen source of glibc's qsort implementation and made sure that it's able to inline comparator.

Sunday, December 27, 2009

Speeding up firewatir

My recent work is pure ajax single-page application. And now I'm trying to cover it with some tests. Yes, I'm not TDD-infected, but that's a story for another post. And I've tried Firewatir. I quickly encountered it's known issue of being slow sometimes.

After a bit of investigation it turned out to be lack of TCP_NODELAY on both sides of firefox control socket. It's surprisingly typical that programmers are not aware that Naggle's algorithm affects localhost-to-localhost sockets too. In this case short requests and responses were unnecessarily delayed in kernel because of that.

The most interesting thing was fixing this issue. It seems that (re)building JSSh extension that is firefox side of Firewatir power is not very convenient and quick process. So I made a funny hack instead.

In one of my earlier posts I described a piece of code that drives GDB from ruby to get backtrace from running ruby process. This time I decided to use same approach to set TCP_NODELAY on connected socket inside running firefox process. I extended old code and published it at http://github.com/alk/alk-ruby-gdb-hacks/blob/master/gdb.rb. After creating Firewatir instance you simply need to call
GDB.force_socket_nodelay($jssh_socket)


On biggest numbers

There's a quite long, but very interesting essay on big numbers. Here:
http://www.scottaaronson.com/writings/bignumbers.html.
Author describes math competition to write down a representation of as big as possible number in 30 seconds. It's quite interesting thing to read as it stretches your imagination.
The biggest joy was to read about 'Busy Beaver' numbers. Those are amazing. But there's one mistake IMO. While it's possible to prove that busy beaver numbers sequence grows faster than any other computable sequence, it doesn't let you win biggest number competition. It only says that for each 'opponent' sequence there exists some number N starting from which Busy Beaver sequence will be greater than that sequence. And you will still need to prove, that, say busy beaver sequence number 1111 is greater than Ackermann's number 1111. For example BB(4) is only 107, while Ackermann's 4th number is quite large - around 10E154.
It may be quite tricky to judge such competition. For estimating largest entries we can use floating point numbers for computation, but it seems that we can easily meet such big entrants that resulting floating point mantissa representation will exceed available computer memory! So we may need to use floating point number approximation for mantissas! It's not hard to imagine yet another level of floating approximation of mantissas.
Computation time can also be challenging and may require special methods. How about estimating A(1E1111) ? My math skills and available computation facilities do not allow me to get even estimation of that number in my entire lifetime. But it should be truly huge number.

Sunday, October 26, 2008

gpicker tale

Today I finally released version 1.0 of my file picker. While I'm awaiting project hosting approval it can be fetched from http://github.com/alk/gpicker/tree/master.
This program lets you type part of filename (not necessarily prefix or substring) you want to pick. Unlike it's alternatives it's very fast and supports matching directory name. I also tried to make it smart. This means that it tries hard to place most likely candidates on top, so that you rarely have to select file, just press enter. Details can be seen in README.
It's not very useful as a standalone program, but is designed to be called from Emacs or any other editor/IDE. Emacs support is included. I bind it to Super-f (Super is a key with flag between Ctrl and Alt) and it takes me just few key presses and rarely more than a second to reach any file in project.
It's best to try it out to get its tremendous usefulness. But I'll try to illustrate it's power by few screenshots.

First screen demonstrates picker usage on linux kernel source. Nothing especially fancy here. Notice match highlighting and matching against directory name. And it's fast. On my machine startup takes much less then second. I think something around 0.2 seconds. It's noticeable, but still fast. Real-time filtration response is instantaneous (i.e. less than 0.1 seconds).

Second screen demonstrates smart matching which prefers starts of words and adjacent chars.

This demonstrates using easy to type delimiter to emphasize start of words in pattern.

Usage on openjdk. Demonstrates usage on camel-cased names. Full openjdk sources are around 2 times larger then linux. It has 68k files. And on this size the filtration lag is noticeable. But it's still fast and usable. And you can always point gpicker to a particular subproject.

This demonstrates noncontiguous match.

Friday, October 24, 2008

Live ruby process backtrace tale

On my current job I'm working with ruby on rails. Around half year ago we had one problem on staging environment. Our application processes suddenly started to eat all CPU they could. Something somewhere got into apparently endless loop. On mature platforms (like Java) you have plenty of tools to diagnose the situation. In this case getting a backtrace would reveal the problematic place. But there was no way to get a ruby backtrace from a running process. Now there is!

My original approach was to use GDB to break into runaway process, grab backtrace via interpreter's make_backtrace() function and print it, then detach. And it helped us greatly half year ago. But make_backtrace is static. So GDB may be unable to see it depending on how your ruby was built.

So for this post I've prepared improved version. This version uses rb_backtrace which is not static and which prints backtrace via printf. It was obviously left for debugging purposes. The problem is that printf uses server's stdout which may be /dev/null. So my newer code

  • opens temp file
  • dup(3)-es stdout somewhere
  • then dup2(3)-s temp file descriptor to stdout
  • calls rb_backtrace
  • flushes stdout
  • restores original stdout back

I'll post a ruby script which does that below. But first I need to note several points:

  • it shows backtrace of active thread only (ruby's threads are user-space aka green threads)
  • it's fundamentally not 100% reliable. GDB may interrupt interpreter in places where allocating array for backtrace or printing anything will not work. I haven't yet encountered this case in practice, so it should be very rare. Anyway don't expect that after printing backtrace your process will stay alive.
  • the ruby script code which feeds commands to GDB is uglier then you might expect. That's because GDB is very picky about receiving new command when it's still processing previous one. I can only guess why.
  • this script uses temporary file unsafely. This should be mostly harmless from security standpoint. Point TMPDIR to a safe place if you're afraid.
#!/usr/bin/env ruby

require 'tempfile'

module GDB
  def process_gdb_output 
    begin
      line = begin
               self.readpartial(65536)
             rescue EOFError
               return
             end
      exit = (line =~ /^\(gdb\)\s+/)
      if exit
        line = $`
      end
      yield line
    end while !exit
  end
  def eat_gdb_output
    process_gdb_output {|dummy| }
  end
  def command(cmd)
    silent = false
    cmd = cmd.strip
    if cmd[0] == ?@
      silent = true
      cmd = cmd[1..-1]
    end
    self.puts cmd
    process_gdb_output do |line|
      unless silent
        STDOUT.print line
        STDOUT.flush
      end
    end
  end

  def self.print_backtrace_of_pid(pid)
    tmp = Tempfile.new("backtrace")
    path = tmp.path
    tmp.close
    IO.popen("gdb","r+") do |f|
      f.extend GDB
      f.eat_gdb_output

      script = <<HERE
@set pagination off
@attach #{pid}
@set $newfd = open(#{path.inspect}, #{IO::WRONLY|IO::CREAT}, 0600)
@set $newstdout = dup(1)
@call dup2($newfd, 1)
@call close($newfd)
@call rb_backtrace()
@call rb_funcall(rb_funcall(rb_mKernel, rb_intern("const_get"), 1, rb_str_new2("STDOUT")), rb_intern("flush"), 0)
@call dup2($newstdout, 1)
@call close($newstdout)
@quit
HERE
      script.split(%r{\n}).each {|c| f.command c}
    end
    print IO.read(path)
    File.unlink(path)
   end
end

pid = ARGV[0] || raise("need PID")

if pid == "--mongrels"
  puts "Printing backtraces of all mongrels"
  all_pids = Dir[File.expand_path(File.dirname(__FILE__)+"/../log/mongrel*.pid")].map do |path|
    [File.basename(path), IO.read(path).strip]
  end
  all_pids.each do |basename, pid|
    puts "Backtrace of #{basename} (#{pid}):"
    GDB.print_backtrace_of_pid pid
    puts "----------------------"
  end
else
  GDB.print_backtrace_of_pid pid
end