Wednesday 2 September 2009

Running Debian GNU/Linux 5.0 (Lenny) on Core i7

We recently had the good fortune to upgrade our build machine to something a little more powerful. Although the quoted CPU clock speed is actually slightly lower than that of our previous machine, the new one has two quad-core Core i7 processors with hyperthreading, whereas the old one had a single dual-core processor. The new one also supports Turbo Boost, which provides a limited, sanctioned-by-Intel overclocking mechanism.

I eventually got Debian GNU/Linux 5.0 (Lenny) installed, after some trouble that I'm putting down to out-of-date mirrors, and set about configuring the machine. After a bit of fiddling I discovered that the modules required for Turbo Boost (and SpeedStep) weren't loaded by default. I added acpi-cpufreq and cpufreq_ondemand to /etc/modules and, by inspecting the files in /sys/devices/system/cpu/cpu*/cpufreq/, noted that the frequencies did seem to change when the system was under load. Mission accomplished, I carried on configuring the machine and we started using it for real work.
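
For anyone wanting to repeat that inspection, here is a minimal Python sketch that gathers a few of the per-CPU values in one go. It assumes the standard cpufreq sysfs files scaling_governor, scaling_cur_freq and scaling_max_freq are present (the frequencies are reported in kHz); the function name read_cpufreq and its base parameter are my own, purely for illustration.

```python
import glob
import os


def read_cpufreq(base="/sys/devices/system/cpu"):
    """Return {cpu_name: {file_name: value}} for each CPU exposing cpufreq."""
    # Assumed set of interesting files; adjust to taste.
    wanted = ("scaling_governor", "scaling_cur_freq", "scaling_max_freq")
    result = {}
    pattern = os.path.join(base, "cpu[0-9]*", "cpufreq")
    for cpufreq_dir in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(cpufreq_dir))
        values = {}
        for name in wanted:
            try:
                with open(os.path.join(cpufreq_dir, name)) as f:
                    values[name] = f.read().strip()
            except OSError:
                # File missing or unreadable: leave it out of the result.
                continue
        result[cpu] = values
    return result


if __name__ == "__main__":
    for cpu, values in read_cpufreq().items():
        print(cpu, values)
```

Running it once on an idle machine and again under load should show scaling_cur_freq changing if the modules are doing their job. An empty result suggests that the cpufreq modules aren't loaded at all.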

But a few days later I started noticing processes taking far longer to complete than they should have done. They were always stuck consuming 100% CPU. C++ compilations that usually took at most a couple of seconds were taking over twelve minutes!

My dabblings with strace(1) and time(1) led me to believe that the programs were spending most of their time stuck in the kernel. Initially all the afflicted processes were running in a 32-bit chroot, so I suspected that the problem was related to that.
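
The user/system split that time(1) reports can also be obtained from a script. This is only a rough Python sketch of the idea, not what I actually ran: it uses resource.getrusage() to compare the CPU time charged to child processes before and after running a command. The helper name child_cpu_times is made up for this example.

```python
import resource
import subprocess
import sys


def child_cpu_times(argv):
    """Run argv to completion; return (user_seconds, system_seconds) it used."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=False)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN only covers children that have been waited for,
    # which subprocess.run() has done by the time it returns.
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


if __name__ == "__main__":
    # Default to a trivial command when none is given on the command line.
    command = sys.argv[1:] or [sys.executable, "-c", "pass"]
    user, system = child_cpu_times(command)
    print("user %.3fs  sys %.3fs" % (user, system))
```

A system time that dwarfs the user time for something like a compilation points at the kernel rather than the program itself.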

It was when my VNC session slowed to a complete crawl that I knew the problem wasn't related to the chroot and that it needed solving. My usual techniques for investigation didn't yield anything useful, so I decided to try running a more modern kernel. Lenny uses a v2.6.26 kernel, which was released before Core i7 (although of course it may have been patched since by Debian). The version in backports is v2.6.30, which is much newer. Initial tests failed to reproduce the problem, but it took a couple of weeks of good behaviour before I believed that the problem was solved.