[PlanetCCRMA] Shutdown Problems

Fernando Lopez-Lezcano nando@ccrma.Stanford.EDU
Sat Jun 17 12:17:01 2006


On Sat, 2006-06-17 at 17:59 +0200, Nigel Henry wrote:
> On Saturday 17 June 2006 03:04, Fernando Lopez-Lezcano wrote:
> > On Thu, 2006-06-15 at 14:01 +0200, nigel henry wrote:
> > > On Thursday 15 June 2006 03:19, Fernando Lopez-Lezcano wrote:
> > > > > Shut down the machine, booted up with the 2133 kernel, shut down and
> > > > > same problems again with a segfault and the shutdown stalling.
> > > > >
> > > > > I have another install of FC5 on the same machine, which does not
> > > > > have any planetccrma software on it. There are no problems with the
> > > > > 2133 kernel on this, and shutdown proceeds as it should.
> > > > >
> > > > > It may be worth removing the 2133 kernel, reinstalling the
> > > > > planetccrma stuff, including the kernel and kernel-module-alsa
> > > > > package, and seeing if the machine shuts down ok.
> > > > >
> > > > > I do have the messages from the halted shutdown if you want to see
> > > > > them Fernando.
> > > >
> > > > Yes, please send any info you have on this.
> > > > Sounds weird as both kernels should not interfere with each other.
> > >
> > > Jun 13 18:59:57 localhost gdm[2149]: Master halting...
> > > Jun 13 18:59:58 localhost shutdown[2149]: shutting down for system halt
> > > Jun 13 18:59:58 localhost init: Switching to runlevel: 0
> > > Jun 13 18:59:59 localhost gconfd (djmons-2369): Received signal 15,
> > > shutting down cleanly
> > > Jun 13 18:59:59 localhost gconfd (djmons-2369): Exiting
> > > Jun 13 18:59:59 localhost avahi-daemon[2003]: Got SIGTERM, quitting.
> > > Jun 13 18:59:59 localhost avahi-daemon[2003]: Leaving mDNS multicast
> > > group on interface eth0.IPv4 with address 192.168.0.234.
> > > Jun 13 19:00:03 localhost kernel: List corruption. next->prev should be
> > > cf40ae48, but was cf71ba48
> > > Jun 13 19:00:03 localhost kernel: ------------[ cut here ]------------
> > > Jun 13 19:00:03 localhost kernel: kernel BUG at include/linux/list.h:58!
> > > Jun 13 19:00:03 localhost kernel: invalid opcode: 0000 [#1]
> > > Jun 13 19:00:03 localhost kernel: last sysfs file: /block/hda/hda1/size
> > > Jun 13 19:00:03 localhost kernel: Modules linked in: appletalk ipx p8023
> > > ipv6 autofs4 ip_conntrack_ftp ip_conntrack_netbios_ns ipt_REJECT xt_state
> > > ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables vfat
> > > fat dm_mirror dm_mod video button battery ac lp parport_pc parport floppy
> > > nvram usblp uhci_hcd 3c59x mii gameport snd_seq_dummy snd_seq_oss
> > > snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
> > > snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core ext3
> > > jbd
> > > Jun 13 19:00:03 localhost kernel: CPU:    0
> > > Jun 13 19:00:03 localhost kernel: EIP:    0060:[<d08cf1fa>]    Not tainted VLI
> > > Jun 13 19:00:03 localhost kernel: EFLAGS: 00010082   (2.6.16-1.2133_FC5 #1)
> > > Jun 13 19:00:03 localhost kernel: EIP is at snd_seq_delete_all_ports+0x74/0x17d [snd_seq]
> >
> > So, it looks like something is happening when the snd-seq module is
> > being removed on shutdown.
> >
> > Hmmmm.... I think I know what the problem might be, or rather why this
> > is happening only when Planet CCRMA is installed. Planet CCRMA's
> > alsa-driver package includes an ALSA startup and shutdown script which
> > is activated by default (/etc/rc.d/init.d/alsasound). When it is active
> > (check with "/sbin/chkconfig --list alsasound") it will stop the alsa
> > subsystem as part of the normal shutdown of the computer.
> >
> > I bet that is triggering a bug in the ALSA sequencer kernel module
> > included in the 2133 kernel that only happens on module unload.
> >
> > Try disabling the alsasound script:
> >   /sbin/chkconfig alsasound off
> > most probably you will find the shutdown proceeds normally (but the bug
> > is still there, it is just not being tickled :-)
> >
> > Or, while logged in to 2133 (and after saving work just in case), do a:
> >   /etc/rc.d/init.d/alsasound stop
> 
> It didn't like it at all when I did /etc/rc.d/init.d/alsasound stop. 100% CPU 
> with mouse locked up for a couple of minutes, then a load of output on the 
> Konsole from syslog. Message below.
> 
> [root@localhost djmons]# /etc/rc.d/init.d/alsasound stop
> Shutting down sound driver
> /etc/rc.d/init.d/alsasound: line 215:  3438 Segmentation fault      /sbin/rmmod `echo $line | cut -d ' ' -f 1`
> 
> Message from syslogd@localhost at Sat Jun 17 16:09:30 2006 ...
> localhost kernel: ------------[ cut here ]------------
> Message from syslogd@localhost at Sat Jun 17 16:09:30 2006 ...
> localhost kernel: kernel BUG at include/linux/list.h:58!
> Message from syslogd@localhost at Sat Jun 17 16:09:30 2006 ...
> localhost kernel: invalid opcode: 0000 [#1]
> Message from syslogd@localhost at Sat Jun 17 16:09:31 2006 ...
> localhost kernel: CPU:    0
> Message from syslogd@localhost at Sat Jun 17 16:09:31 2006 ...
> localhost kernel: EIP is at snd_seq_delete_all_ports+0x74/0x17d [snd_seq]
> Message from syslogd@localhost at Sat Jun 17 16:09:31 2006 ...
> localhost kernel: eax: 00000044   ebx: cf446448   ecx: c38c9f1c   edx: d08ef433
> Message from syslogd@localhost at Sat Jun 17 16:09:31 2006 ...
> localhost kernel: esi: cf446448   edi: c12e39c0   ebp: c12e3a48   esp: c38c9f18
> Message from syslogd@localhost at Sat Jun 17 16:09:31 2006 ...
> localhost kernel: ds: 007b   es: 007b   ss: 0068
> Message from syslogd@localhost at Sat Jun 17 16:09:31 2006 ...
> localhost kernel: Process rmmod (pid: 3438, threadinfo=c38c9000 task=c646faa0)
> Message from syslogd@localhost at Sat Jun 17 16:09:31 2006 ...
> localhost kernel: Stack: <0>d08ef433 cf446448 c12e3a48 c12e3a50 00000246 c12e3a5c c2853b1c 005590d0
> Message from syslogd@localhost at Sat Jun 17 16:09:31 2006 ...
> localhost kernel:        c12e39c0 00000000 bfc58960 c38c9000 d08e91bc c12e39c0 d08e927d c12e39c0
> Message from syslogd@localhost at Sat Jun 17 16:09:31 2006 ...
> localhost kernel:        d08eb573 d0858c80 c01323d6 5f646e73 5f716573 6d6d7564 c55b0079 ca4a2e8c
> Message from syslogd@localhost at Sat Jun 17 16:09:32 2006 ...
> localhost kernel: Call Trace:
> Message from syslogd@localhost at Sat Jun 17 16:09:32 2006 ...
> localhost kernel:  [<d08e91bc>] seq_free_client1+0x8/0x7e [snd_seq]     [<d08e927d>] seq_free_client+0x4b/0x80 [snd_seq]
> Message from syslogd@localhost at Sat Jun 17 16:09:32 2006 ...
> localhost kernel:  [<d08eb573>] snd_seq_delete_kernel_client+0x1a/0x2c [snd_seq]     [<c01323d6>] sys_delete_module+0x191/0x1ce
> Message from syslogd@localhost at Sat Jun 17 16:09:32 2006 ...
> localhost kernel:  [<c02e466a>] do_page_fault+0x189/0x51d     [<c0102be9>] syscall_call+0x7/0xb
> Message from syslogd@localhost at Sat Jun 17 16:09:32 2006 ...
> localhost kernel: Code: 88 00 00 00 39 af 88 00 00 00 74 63 8b 9f 88 00 00 00 8b b7 8c 00 00 00 8b 43 04 39 f0 74 17 50 56 68 33 f4 8e d0 e8 43 dc 82 ef <0f> 0b 3a 00 1e f4 8e d0 83 c4 0c 8b 06 39 d8 74 17 50 53 68 e8
> 
> Trying a shutdown at this stage, 2133 just hangs at "Stopping sound driver", 
> and I let it hang for quite a few minutes, just in case.

To be expected. A bug in snd-seq (or somewhere related) is causing the
removal of the module to crash and fail - anything later that tries to
do the same thing will hang forever. 
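For illustration only: judging from the error at line 215, the alsasound stop routine unloads modules one at a time with rmmod, taking each module name from the first space-separated field of a line. The sketch below is a hypothetical dry run of that step. It assumes the lines come from /proc/modules and that sound modules can be picked out by their "snd" name prefix (the real script may select modules differently). `list_snd_modules` is a made-up helper name. It only prints names, so nothing is unloaded and the snd-seq bug is not triggered.

```shell
#!/bin/sh
# list_snd_modules: hypothetical dry-run helper (not part of alsasound).
# Reads /proc/modules-style lines on stdin and prints the names of modules
# whose name starts with "snd". The name is the first space-separated
# field, extracted the same way the alsasound error line shows:
# `cut -d ' ' -f 1`.
list_snd_modules() {
  while read -r line; do
    name=$(echo "$line" | cut -d ' ' -f 1)
    case "$name" in
      snd*) echo "$name" ;;   # would be an rmmod candidate; only printed here
    esac
  done
}

# Usage on a live system (prints only, removes nothing):
#   list_snd_modules < /proc/modules
```

Because it never calls rmmod, it is safe to run on the 2133 kernel to see which sequencer modules (snd_seq, snd_seq_oss, snd_seq_dummy, ...) are loaded before trying anything else.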

> Perhaps strangely, the 2122 kernel, which also came down with updates, and was 
> installed after the planetccrma one, has no such problems on shutdown.
> 
> I'm not bothered about the 2133 kernel myself, as I use the planetccrma one. I 
> don't know what changes there are between the 2122 and 2133 kernels. Both
> boot up and shut down ok on the other install of FC5 on the same machine, 
> which does not have the planetccrma kernel, or the kernel-module-alsa package 
> on it.

The extra Planet CCRMA packages are not the root cause of the problem.
The bug is part of the Fedora kernel (AFAICT), but it is tickled by the
alsasound script, which is part of the Planet CCRMA alsa-driver package.
Disable the script and you should be able to shut down 2133 cleanly
("/sbin/chkconfig alsasound off"). 
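"/sbin/chkconfig --list alsasound" prints one on/off entry per runlevel (0 to 6), for example "alsasound 0:off 1:off 2:on 3:on 4:on 5:on 6:off". A minimal sketch of checking that output before turning the script off, assuming that output format; `service_enabled` is a hypothetical helper, not part of chkconfig:

```shell
#!/bin/sh
# service_enabled: hypothetical helper. Reads `chkconfig --list <service>`
# output on stdin and succeeds (exit status 0) if the service is "on" in
# any runlevel 0-6; otherwise it fails.
service_enabled() {
  grep -q '[0-6]:on'
}

# Usage (as root, on the affected FC5 install):
#   if /sbin/chkconfig --list alsasound | service_enabled; then
#     /sbin/chkconfig alsasound off
#   fi
```

Running `/sbin/chkconfig --list alsasound` again afterwards should show every runlevel as "off".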

> I'm not clued up enough to know whether this is a genuine bug in the 2133 
> kernel. You've had many more years of experience with this sort of thing than 
> me.

Definitely looks like a bug to me. 

> I'm using Apt again now on FC5 for the updates, 

Which apt are you using? Is it back in extras? (at the very beginning of
starting fc5 support I tried to get apt running but could not). I also
seem to remember having read somewhere that it now correctly supports
x86_64 architectures. Do you know if that's a fact for the one you are
using?

> and Yum only for the 
> planetccrma stuff. Any ideas what I need to do to stop the Fedora kernel 
> updates with Apt? This way I could remove the 2133 kernel and it would not be
> reinstalled, along with later versions. It's a pity you've moved to Yum for 
> the planetccrma packages. Yumex is so slow to load up compared to Synaptic, 
> when you're looking for individual packages.

Well, it is the direction in which Fedora is going. 

> Have a nice weekend. Nigel.

Thanks, you too.
-- Fernando