[PlanetCCRMA] FC4 SMP kernels won't boot

Michael Gurevich gurevich@ccrma.Stanford.EDU
Sun Nov 27 17:42:01 2005

I turned HT off, and now it loads the ethernet module but locks up at 
various later points (cups, power management, etc.). Suspecting that the 
problem started earlier in the boot process, I looked back and found an 
earlier error:

FATAL: Error inserting acpi_cpufreq 
No such device

I actually get the same error when I boot the uniprocessor kernel, but it 
doesn't crash. This seems to be a problem for some Intel CPUs with the 
default settings in the FC4 2.6.13 kernels:

But I'm not sure this is the problem. I followed two suggestions I found 
on the web: changing the driver in /etc/cpuspeed.conf, and disabling 
cpuspeed entirely. Both make the error go away, but the SMP kernel still 
doesn't boot.
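One way to test the "some Intel CPUs" theory directly: acpi_cpufreq generally depends on ACPI P-state support, which on Intel processors normally goes together with Enhanced SpeedStep, reported by the kernel as the `est` flag in /proc/cpuinfo. The minimal Python sketch below counts how many logical CPUs advertise that flag. It assumes that heuristic holds for these Xeons (it is not a definitive check), and `cpus_with_flag` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Tuple


def cpus_with_flag(cpuinfo_text: str, flag: str = "est") -> Tuple[int, int]:
    """Count logical CPUs whose "flags" line in /proc/cpuinfo-style text contains `flag`.

    Returns (cpus_with_flag, total_cpus).
    """
    total = 0
    hits = 0
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "flags":
            continue
        total += 1  # each logical CPU (HT siblings included) has its own flags line
        if flag in value.split():  # exact token match, so "est" does not match "test"
            hits += 1
    return hits, total


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # only present on Linux
        hits, total = cpus_with_flag(cpuinfo.read_text())
        print(f"{hits} of {total} logical CPUs report 'est'")
        if hits == 0:
            # Heuristic only: without EIST, acpi_cpufreq may refuse to load (ENODEV).
            print("No Enhanced SpeedStep flag found")
```

If no CPU reports `est`, the "No such device" error would be expected rather than a sign of breakage, which would fit the uniprocessor kernel showing the same error without crashing.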

WRT the ethernet module, modprobe.conf contains
alias eth0 e1000
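Fernando's question of which driver eth0 uses comes down to reading the `alias` lines in /etc/modprobe.conf. For checking several machines the same way, a minimal Python sketch like the one below could pull out the aliased module. The function name `aliases_for` is hypothetical, and the sketch assumes simple one-line `alias <name> <module>` entries (no backslash continuations).

```python
from pathlib import Path
from typing import List


def aliases_for(conf_text: str, name: str) -> List[str]:
    """Return every module that modprobe.conf-style text aliases to `name`, in file order."""
    targets = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank lines and comment lines
            continue
        fields = line.split()
        # Expected shape: "alias <name> <module>"
        if len(fields) >= 3 and fields[0] == "alias" and fields[1] == name:
            targets.append(fields[2])
    return targets


if __name__ == "__main__":
    conf = Path("/etc/modprobe.conf")
    if conf.exists():
        print("eth0 ->", aliases_for(conf.read_text(), "eth0") or "no alias found")
```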

The ethernet device is Intel 82545EM Gigabit. Any thoughts, or should I 
just wait and try the new kernel?


On Thu, 24 Nov 2005, Fernando Lopez-Lezcano wrote:

> On Wed, 2005-11-23 at 23:55 -0800, Michael Gurevich wrote:
> > Hi,
> > 
> > I just did a fresh install of FC4 and PlanetCCRMA. The machine is a little 
> > weird: it has dual Xeons with HT enabled and a SCSI BIOS/Boot ROM 
> > thingie whose purpose I don't really understand. But we get along all right, 
> > and I don't think this is related to the problem. The SCSI HD has WinXP on 
> > it, but I can dual boot from GRUB on an IDE drive, which also has the FC4 
> > install. 
> > 
> > The boot seems to start okay, gets to runlevel 5 and freezes on:
> > 
> > Bringing up interface eth0
> > 
> > Just now I left it at this stage for about 5 minutes and got:
> > BUG: Unable to handle kernel NULL pointer at virtual address 0000008
> > then it went all Matrix on me with junk scrolling down the screen. If I 
> > do the Fedora interactive startup and skip loading the module the rest of 
> > the boot goes normally. I know this isn't very helpful to diagnose what 
> > the problem is but I'm not sure where to look. Is it possible to log what 
> > happens in the kernel during a failed boot? /var/log/dmesg doesn't contain 
> > anything noteworthy and /var/log/boot.log is empty.
> Most probably you are hitting a kernel panic while loading the ethernet
> kernel module. That will not be logged to disk, because the kernel does
> nothing else after the panic. It is also possible to send the console
> output over a serial line to another machine. 
> > 
> > This happens with the smp kernels 2.6.13-0.3.rdt.rhfc4.ccrmasmp and 
> > 2.6.12-0.21.rdt.rhfc4.ccrmasmp but the uni-processor versions of these 
> > kernels boot and run smoothly.
> Hmmm, not much to say except that it has to be a kernel bug (oh boy, am
> I bright today!). 
> What driver is eth0 using? Look into /etc/modprobe.conf. That should be
> the culprit. What happens if you disable HT in the BIOS? (most probably
> the same thing, of course). 
> I'm getting ready to release a more up-to-date kernel. I just need to
> properly patch JACK so that it never uses the TSC for timing, or at least
> avoids it on dual-core Athlon X2s. Stay tuned (or contact me off the list
> if you want to be a guinea pig for it). 
> -- Fernando
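The serial-console route Fernando mentions has two halves. First, the crashing machine boots with its console directed at a serial port, using the kernel's standard `console=` parameter (for example `console=ttyS0,115200 console=tty0` on the GRUB kernel line). Second, another machine records whatever arrives on the other end of the cable. The rough sketch below covers that second half. It assumes a Linux or other POSIX receiver and uses only Python's standard `termios` module to open the port raw at 115200 8N1 and append everything to a log file. The function names are hypothetical.

```python
import os
import termios
from typing import Optional


def open_serial(path: str, baud: int = termios.B115200) -> int:
    """Open a serial device read-only in raw 8N1 mode and return its file descriptor."""
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY)
    attrs = termios.tcgetattr(fd)
    # attrs is [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[0] = 0  # iflag: no CR/NL translation or flow control on input
    attrs[1] = 0  # oflag: no output post-processing
    # cflag: 8 data bits, no parity, 1 stop bit, receiver on, ignore modem control lines
    attrs[2] = termios.CS8 | termios.CREAD | termios.CLOCAL
    attrs[3] = 0  # lflag: no echo, no canonical line editing, no signal characters
    attrs[4] = baud  # input speed
    attrs[5] = baud  # output speed
    attrs[6][termios.VMIN] = 1   # read() blocks until at least one byte arrives
    attrs[6][termios.VTIME] = 0  # no inter-byte timeout
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return fd


def capture(fd: int, log_path: str, max_bytes: Optional[int] = None) -> int:
    """Append bytes read from fd to log_path until end of stream or max_bytes; return the count."""
    written = 0
    with open(log_path, "ab") as log:
        while max_bytes is None or written < max_bytes:
            want = 4096 if max_bytes is None else min(4096, max_bytes - written)
            chunk = os.read(fd, want)
            if not chunk:  # no data returned: treat as end of stream
                break
            log.write(chunk)
            log.flush()  # keep the log current in case this side is interrupted
            written += len(chunk)
    return written
```

On the receiving side, `capture(open_serial("/dev/ttyS0"), "panic.log")` would run until interrupted. The device path and the null-modem cable between the two machines are assumptions about the setup. The point is that the log keeps the oops text that never makes it into /var/log on the crashing machine.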