fleetfootmike wrote on 2004-01-28 02:02 pm

Last night, my mail server died...

Rather messily. Throwing spurious disk errors.

So. Last night I tried to do what I could to fix the disk errors, including turning UDMA off on the offending drive. That fixed it for a bit, and most things were working, except useful things like the virus scanner, and anything to do with Debian package installs, all of which died with 'Segmentation Fault'.

At half past midnight, I gave up, went outside to the server room (discovering it had snowed in the process) to turn it off, and went to bed.

This morning dawned crisp and even brightly...

9:15am: download a new Debian net install CD from the comfort of my armchair onto the machine in the office with the CD-R.

9:40am: finally run out of displacement activity to put off going out in the cold again. Gird loins, go out, start the Debian CD burning, and spend 10 mins wrestling the server out of the rack.

9:50am: bring machine inside, along with spare hard drive and IDE ribbon cable.

9:55am: outside to collect Debian CD.

10:00am: plug machine into power, keyboard, monitor and network port in the semi-warm indoors. Move iffy disk and CD drive to the secondary IDE channel, hang brand new 20G drive off the primary. Hit power... Machine finds new drive and CD, but not old drive.

10:05am: jiggle cables, reroute cables, check jumpers on HDs. Curse idiot who put massive ATI TV tuner card in mini tower, thus making cable routing a complete sod. Remember idiot is self. Curse self. Boot again. Same problem.

10:10am: reboot. Set BIOS to boot from CD first. Hit eject button on CD drive to insert Debian CD. Zip.

10:15am: remove suspect HD. More cable jiggling. Apply paperclip to eject CD drawer, insert CD, close drawer. Boot. Nada.

10:20am: retrieve box from car containing nice Yamaha CD-RW from bedlamhouse.

10:22am: realise nice CD-RW is SCSI, and about as much immediate use as a chocolate fireguard. Gird loins and head outside again.

10:30am: finish reviewing pile of mostly dead PCs in office. Reject DVD-ROM in one as known to be flaky; reject pile of strange CD drives in cupboard (including two external ones and an original Creative SB 8x CD, whose interface I do not wish to contemplate). Find Acer 50x drive in fourth PC down the pile, remove.

10:35am: swap mounting rails over from old CD to newer Acer drive.

10:40am: make tea, start writing this LJ entry.

10:50am: fit Acer drive in server. Pause. Find paperclip. Remove Debian CD from old drive using paperclip.

10:55am: apply power. CD drawer opens, disk spins.

11:00am: remove power. On a whim, reattach suspect hard drive. Reboot with Debian CD in drive. BINGO! Both HDs and CD found, machine starts to boot installer.

11:05am: start fresh Debian install on new HD. It might not need it now, but I'm in the mood to be paranoid: that old HD was giving some very very odd errors.

11:09am: admire shininess of new Debian installer

11:10am: CD fails to read network drivers. Swear, reboot.

11:15am: fine second time round, partition new disk, leave machine badblock scanning, have early lunch while admiring shininess of new Debian installer some more

11:30am: return to machine to find various errors. Reboot, try again. Reboot, try yet again with old HD disconnected. Still craps out. Peer, as instructed, at /target/var/log/debootstrap.log, to find the very helpful 'Segmentation Fault'.

12:00 noon: spend half an hour considering alternatives. Begin to suspect old HD is in fact fine, and can be plonked in a different machine pretty safely. Temporarily cannibalise old Linux workstation...

12:15pm: old HD fscks fine in old workstation, after a tantrum or two. Take deep breath, let it continue booting.

12:20pm: Well, it makes it to a root prompt, and MailScanner seems to be clearing the mail queue.

12:21pm: break to deliver James to the doctor's, via Anne, and for a walk in the park with bardling.

2:00pm: It's still up, and still behaving. Make tea.

hrrunka 2004-01-28 06:47 am (UTC)
Ugh! Irritating when the errors being reported are secondary fallout, and the real cause of the problem is pretending that it's working just fine....

Motherboard failure?

filker0.livejournal.com 2004-01-28 07:07 am (UTC)
You've probably already looked into all of this, but...

I've had motherboard failures that sounded like this. Also power-supply problems. Either one can mimic a hard drive failure. RAM failures can cause random failures as well. Also, you might want to swap out the 80-wire HD cables: a flaky ATA100 cable can cause a world of pain.

In any case, my sympathies, and good luck with getting everything working, and soon.

Re: Motherboard failure?

antonia-tiger.livejournal.com 2004-01-28 08:23 am (UTC)
Trying a spare IDE cable can fix a lot of these puzzling faults, since there's a pretty close connection to the old AT bus, and ribbon cables can fail in all sorts of invisible little ways.

But old boards have a slightly different connector. The current hardware has a pin missing, with a matching blanked-off socket on the cable. It's all for good reasons, but means that an off-the-shelf cable might not work with an old PC that's running Linux in some out-of-the-way corner.

Been there, done that, censored the T-shirt.

Re: Motherboard failure?

bedlamhouse 2004-01-28 12:21 pm (UTC)
By the same token, particularly with newer hard drives, I've run into situations where using an 80-wire IDE cable cleaned up the problem even on an interface that didn't require one in the past. My first troubleshooting activity now tends to be to replace the IDE cable and start again.