[PlanetCCRMA] SMP Boot Problem Report

Fernando Lopez-Lezcano nando@ccrma.Stanford.EDU
Tue Dec 20 20:45:02 2005


On Tue, 2005-12-20 at 19:43 -0800, Michael Gurevich wrote:
> On Tue, 20 Dec 2005, Fernando Lopez-Lezcano wrote:
> > On Tue, 2005-12-20 at 08:03 -0800, Timothy Polashek wrote:
> > > Hello-
> > > 
> > > I just tried booting the latest edge FC4 kernels after a
> > > fresh install of FC4.
> > > 
> > > Booting with 2.6.14-0.10.rrt.rhfc4.ccrma works just fine.
> > > 
> > > However, booting with 2.6.14-0.10.rrt.rhfc4.ccrmasmp always freezes the machine
> > > at the "Starting networking" point of the boot process.
> > > 
> > > This problem persisted even after performing an "apt-get dist-upgrade" command.
> > > 
> > > I'd prefer to run the smp version to get the most out of both CPUs.  Has
> > > anyone else experienced this problem with this kernel?  Or better yet, found a
> > > way to fix it?
> > 
> > No, I have not seen this problem. 
> > 
> > It would seem that loading one of the drivers is hanging the machine. Did
> > you try to boot without the graphical boot interface to see if there are
> > any messages? (edit out the "rhgb quiet" part of the boot line
> > in grub). Do you see any messages in /var/log/messages from the failed
> > smp boot when you boot into the up kernel? What kind of hardware do
> > you have?
> > 
> > Even if you manage to get to the actual error messages I would not be of
> > much help - if it is indeed a problem with some particular driver Ingo
> > Molnar and the linux kernel mailing list would be the ones to contact
> > with error messages and such. 
>
> I reported this problem to the planetccrma list about a month ago, for the 
> previous two smp kernels. I just tried the latest and get the same result 
> unfortunately. I turned off the graphical and quiet boot, and when it 
> locks up (after the ethernet modules appear to be loaded ok) I do get 
> error messages like:
> irq 209: nobody cared (try booting with the "irqpoll" option)
> 
> followed by a stack trace or something. I tried adding the irqpoll 
> option in my grub.conf, but that makes things much worse. The messages 
> look similar to ones I get in /var/log/messages when 
> 2.6.13-0.3.rdt.rhfc4.ccrma (not smp) crashes (also reported previously, 
> errors copied below). I've booted into the 2.6.14 uni-processor kernel and 
> will wait to see if I get the same crashes. 
> 
> Nov 28 15:18:23 localhost kernel: irq 21: nobody cared (try booting with the "irqpoll" option)
> Nov 28 15:18:23 localhost kernel:  [<c01471b4>] __report_bad_irq+0x24/0x90 (8)
> Nov 28 15:18:23 localhost kernel:  [<c01472c2>] note_interrupt+0x72/0xd0 (20)
> Nov 28 15:18:23 localhost kernel:  [<c0146ec4>] do_hardirq+0xe4/0x110 (20)
> Nov 28 15:18:23 localhost kernel:  [<c0146ef0>] do_irqd+0x0/0x90 (20)
> Nov 28 15:18:23 localhost kernel:  [<c0146f52>] do_irqd+0x62/0x90 (8)
> Nov 28 15:18:23 localhost kernel:  [<c0137078>] kthread+0x98/0xa0 (24)
> Nov 28 15:18:23 localhost kernel:  [<c0136fe0>] kthread+0x0/0xa0 (12)
> Nov 28 15:18:23 localhost kernel:  [<c01013b1>] kernel_thread_helper+0x5/0x14 (12)
> Nov 28 15:18:23 localhost kernel: handlers:
> Nov 28 15:18:23 localhost kernel: [<c02b08d0>] (usb_hcd_irq+0x0/0x80)

Sorry to hear this is still a problem...
Can you see which devices are on interrupt 21 or 209? (using "cat /proc/interrupts")
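
If scanning the "cat /proc/interrupts" output by eye is tedious, something
like the sketch below might help pick out a single interrupt line. It is
only an illustration: the function name describe_irq is made up here, and
it assumes the usual layout (a header row of CPU columns, then one
"NN: counts... controller devices" line per interrupt), which can differ
between kernels.

```python
import os


def describe_irq(interrupts_text, irq):
    """Return the controller and handler names listed for one IRQ number.

    Assumes the first line is the CPU0 CPU1 ... header and that each
    interrupt line starts with "<number>:" followed by one count per CPU.
    Returns None if the IRQ is not listed.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return None
    ncpus = len(lines[0].split())  # number of per-CPU count columns
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != str(irq):
            continue
        fields = rest.split()
        # Skip the per-CPU counts; what is left is the controller type
        # followed by the driver names sharing this interrupt.
        return " ".join(fields[ncpus:])
    return None


if __name__ == "__main__" and os.path.exists("/proc/interrupts"):
    with open("/proc/interrupts") as f:
        text = f.read()
    for irq in (21, 209):  # the two IRQ numbers from the error messages
        print(irq, describe_irq(text, irq))
```

Run it while booted into the working (up) kernel. An interrupt that is not
listed there prints None, which may well happen for 209 since the numbering
can differ between the up and smp kernels.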

I would report this to Ingo Molnar (mingo __at__ elte __dot__ hu) and
the linux kernel mailing list (linux-kernel __at__ vger __dot__ kernel
__dot__ org)

Unfortunately I'll be away till Jan 14 or so...
-- Fernando