[PlanetCCRMA] Chronic System Freezes

Fernando Lopez-Lezcano nando@ccrma.Stanford.EDU
Sat Jan 6 11:47:01 2007


On Sat, 2007-01-06 at 13:07 -0600, Bill Polhemus wrote:
> SYSTEM:
> 
> Homebrew PC with:
> ASUS A8N-SLI Mobo With "Cool-Pipe" (liquid-cooled) technology (the fan
> just NEVER turns on)
> AMD Athlon 64 X2 4200 Dual-Core Processor
> 2GB DDR2 Memory (4-512MB DIMMs, cannot recall make or designation)
> 2-160GB Western Digital SATA HDs, configured as a single 16GB /boot
> partition and the rest as an LVM volume as set up by Anaconda
> M-Audio Delta 66 PCI Audio Interface card
> BFG 3DFuzion NVidia GeForce 6200 LE PCI Express Graphics Card 
> 
> Running FC5 with up-to-date PlanetCCRMA packages including the latest
> kernel.
> 
> I'm running the GNOME desktop. I also have an M-Audio Keystation 49e
> MIDI keyboard controller hooked up through USB.
> 
> Several times now since I've had the system up and running (about two
> weeks), the system has just "frozen up" in a very un-Linux-like
> manner. Everything just stops. I have left it sitting overnight,
> hoping that it would come unstuck somehow over time, but it appears to
> be a permanent session-death. I have to do a hardware-reboot (which I
> hate to do on a Linux system) to get it back up and running.
> 
> I have the following questions:
> 
> 1) Any ideas as to why this is happening?

Most probably a bug in the realtime preemption patches from Ingo (that
make the kernel good for realtime apps, arghh... :-)

For example, in the kernel I was running up to a week ago I sometimes got
hangs that apparently correlated with high network traffic.

> 2) Where might I be able to find information (e.g. log entries) that
> might help me pinpoint the cause?

That is difficult if we are talking about a complete halt. Does the
system still respond to the sysrq key? (hmmm, is it configured to
respond to that? - see /etc/sysctl.conf). Are the three status LEDs on the
keyboard blinking? (indication of a kernel panic). If the system still
responds to the sysrq key it would be possible to get a dump of the task
state (that might contain clues) but that would require running from the
start with a serial console so that another machine can grab the
output. 
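
To check whether the sysrq key is enabled at all, something along these
lines might help. It is only a rough sketch: it assumes the setting
appears as a "kernel.sysrq = N" line in /etc/sysctl.conf and that the
running value can be read from /proc/sys/kernel/sysrq. The function
names are made up for illustration.

```python
import re

SYSCTL_CONF = "/etc/sysctl.conf"          # boot-time setting
SYSRQ_PROC = "/proc/sys/kernel/sysrq"     # value in the running kernel


def parse_sysctl_sysrq(text):
    """Return the last kernel.sysrq value found in sysctl.conf-style text.

    Returns None if no such line exists. Comment lines (# or ;) are skipped.
    """
    value = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith(";"):
            continue
        match = re.match(r"kernel\.sysrq\s*=\s*(\d+)$", line)
        if match:
            value = int(match.group(1))   # later lines override earlier ones
    return value


def sysrq_enabled(value):
    """0 disables sysrq; any other integer enables at least some functions."""
    return value is not None and value != 0


def read_int_file(path):
    """Read a single integer from a file, or return None if that fails."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    try:
        with open(SYSCTL_CONF) as handle:
            configured = parse_sysctl_sysrq(handle.read())
    except OSError:
        configured = None
    running = read_int_file(SYSRQ_PROC)
    print("configured at boot:", configured, "enabled:", sysrq_enabled(configured))
    print("running kernel:    ", running, "enabled:", sysrq_enabled(running))
```

If the configured value is 0, changing it to 1 in /etc/sysctl.conf and
rebooting (or writing 1 to the /proc file as root) should enable the key
for the next time the machine hangs.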

I imagine you have already looked at /var/log/messages and found
nothing. 
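
If you want to look again more systematically, a small sketch like the
one below pulls out the last few lines logged before each boot, which is
where any hint about a freeze would be. It assumes that every boot shows
up in the log as a line containing "kernel: Linux version"; that marker
and the function name are my assumptions, so adjust them to whatever your
log actually contains.

```python
def lines_before_boots(lines, marker="kernel: Linux version", context=5):
    """For each line containing `marker`, collect the lines just before it.

    Returns a list with one entry per boot; each entry is a list of up to
    `context` log lines that immediately precede the boot marker.
    """
    results = []
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - context)
            results.append(lines[start:index])
    return results


if __name__ == "__main__":
    try:
        # Reading this file normally requires root.
        with open("/var/log/messages", errors="replace") as handle:
            log_lines = handle.read().splitlines()
    except OSError as error:
        print("could not read log:", error)
        log_lines = []
    for number, tail in enumerate(lines_before_boots(log_lines), start=1):
        print("--- last lines before boot %d ---" % number)
        for entry in tail:
            print(entry)
```

After a hard freeze there will be no shutdown messages before the boot
marker, so whatever was logged last is all there is to go on.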

> 3) What are some recommendations for other diagnostic measures?
> 
> BTW, all the above components are new, except for two of the memory
> DIMMs, which are a few months old. There shouldn't be any "memory
> incompatibility problems" as I have the DIMMs in separate banks, but
> if, say, the memory specs ARE different between banks, could that be
> the culprit?

Hmmm, unlikely, more likely it is a bug (of course you could run
memtest86 overnight but I doubt it will find anything). Something that
happens once every two weeks is very hard to catch... Have you noticed
any commonality between occurrences? Things you were doing? Does it always
happen at the same time of day? Things like that...

If you are running >= fc5 you could try the newer versions of the
realtime kernel that Ingo (Molnar) is publishing, just to test them. I
have not yet released any of those on Planet CCRMA. The latest one
appears to be quite stable and has the best realtime performance I've
seen so far but it is not even one week old...

-- Fernando