On my current job I'm working with ruby on rails. Around half year ago we had one problem on staging environment. Our application processes suddenly started to eat all CPU they could. Something somewhere got into apparently endless loop. On mature platforms (like Java) you have plenty of tools to diagnose the situation. In this case getting a backtrace would reveal the problematic place. But there was no way to get a ruby backtrace from a running process. Now there is!
My original approach was to use GDB to break into runaway process, grab backtrace via interpreter's make_backtrace() function and print it, then detach. And it helped us greatly half year ago. But make_backtrace is static. So GDB may be unable to see it depending on how your ruby was built.
So for this post I've prepared improved version. This version uses rb_backtrace which is not static and which prints backtrace via printf. It was obviously left for debugging purposes. The problem is that printf uses server's stdout which may be /dev/null. So my newer code
- opens temp file
- dup(3)-es stdout somewhere
- then dup2(3)-s temp file descriptor to stdout
- calls rb_backtrace
- flushes stdout
- restores original stdout back
I'll post a ruby script which does that below. But first I need to note several points:
- it shows backtrace of active thread only (ruby's threads are user-space aka green threads)
- it's fundamentally not 100% reliable. GDB may interrupt interpreter in places where allocating array for backtrace or printing anything will not work. I haven't yet encountered this case in practice, so it should be very rare. Anyway don't expect that after printing backtrace your process will stay alive.
- the ruby script code which feeds commands to GDB is uglier then you might expect. That's because GDB is very picky about receiving new command when it's still processing previous one. I can only guess why.
- this script uses temporary file unsafely. This should be mostly harmless from security standpoint. Point TMPDIR to a safe place if you're afraid.
#!/usr/bin/env ruby require 'tempfile' module GDB def process_gdb_output begin line = begin self.readpartial(65536) rescue EOFError return end exit = (line =~ /^\(gdb\)\s+/) if exit line = $` end yield line end while !exit end def eat_gdb_output process_gdb_output {|dummy| } end def command(cmd) silent = false cmd = cmd.strip if cmd[0] == ?@ silent = true cmd = cmd[1..-1] end self.puts cmd process_gdb_output do |line| unless silent STDOUT.print line STDOUT.flush end end end def self.print_backtrace_of_pid(pid) tmp = Tempfile.new("backtrace") path = tmp.path tmp.close IO.popen("gdb","r+") do |f| f.extend GDB f.eat_gdb_output script = <<HERE @set pagination off @attach #{pid} @set $newfd = open(#{path.inspect}, #{IO::WRONLY|IO::CREAT}, 0600) @set $newstdout = dup(1) @call dup2($newfd, 1) @call close($newfd) @call rb_backtrace() @call rb_funcall(rb_funcall(rb_mKernel, rb_intern("const_get"), 1, rb_str_new2("STDOUT")), rb_intern("flush"), 0) @call dup2($newstdout, 1) @call close($newstdout) @quit HERE script.split(%r{\n}).each {|c| f.command c} end print IO.read(path) File.unlink(path) end end pid = ARGV[0] || raise("need PID") if pid == "--mongrels" puts "Printing backtraces of all mongrels" all_pids = Dir[File.expand_path(File.dirname(__FILE__)+"/../log/mongrel*.pid")].map do |path| [File.basename(path), IO.read(path).strip] end all_pids.each do |basename, pid| puts "Backtrace of #{basename} (#{pid}):" GDB.print_backtrace_of_pid pid puts "----------------------" end else GDB.print_backtrace_of_pid pid end
fascinating.
ReplyDeleteAlso check out the backtracer gem http://github.com/rdp/backtracer
and hijack/liveconsole [?]
I get the error
ReplyDelete1532 ./include/ruby/ruby.h: No such file or directory.
Interesting. Did not expect to see somebody reading this after so many years.
ReplyDeleteIt looks like error above applies to backtracer gem or other alternative. At least I'm not able to see how ruby.h may be relevant to my script above.
It's entirely possible (although unlikely) that after all this years I'm not able to understand my own code :). So feel free to correct me if I'm wrong.