One of the most frustrating things in computer repair is an intermittent (seemingly random) fault. It can suck up a lot of diagnosis time and, unless you can pin down the problem, it is difficult to be sure whether any ‘repair’ has actually worked or whether you just got lucky and the intermittent problem will return in a few days…
A perfect example is an issue we saw recently which turned out to be caused by a faulty SATA port on the motherboard. We will explain how we arrived at the solution – it reinforces the need to have a logical approach to diagnosis yet not to become so blinded by the ‘most common’ causes that you fail to consider much rarer alternatives.
Reported Problem – A 4-year-old Vista PC that worked fine most of the time but suffered occasional blue screen of death (BSOD) crashes, or long freezes followed by an automatic shutdown and restart. The issues had been getting worse for 6 months and now averaged about once a day – but with no pattern as to how long the PC had been on or what software was in use at the time…
These types of error can be caused by either a hardware or a software problem – and, being intermittent, are not easy to diagnose so a thorough investigation is needed.
Steps To Diagnose The Problem – We first backed up all the important documents before attempting any diagnosis! This is really important if hardware failure is suspected e.g. if a hard drive is dying then testing it might just finish it off. Always play safe and ensure you have a full backup before continuing with diagnosis.
1. We checked Event Viewer, looking particularly at the System logs for relevant errors. Several errors were of a kind typically related to memory (RAM) faults, so we tested the memory – no faults found. (If the problem had not been so intermittent we could have just swapped out the RAM to test it.)
2. Other errors were of a kind typically related to hard drive faults – although they may have been caused by the frequent unexpected shutdowns corrupting the Windows file system. As the PC was 4 years old we tested the hard drive to check if it was failing – no faults found.
3. A faulty PSU (Power Supply Unit) could cause random freezing and restarts so we tested it under full load – no faults found.
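The log review in step 1 can be sped up by counting which sources the errors come from. The following is only a minimal sketch, assuming the System log has been exported from Event Viewer as a CSV file with `Level` and `Source` columns. The function name `tally_error_sources` and the sample rows are made up for illustration and are not taken from the PC in this article.

```python
import csv
import io
from collections import Counter


def tally_error_sources(csv_text):
    """Count Error and Critical events per source in an exported log."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        # Assumption: the export has 'Level' and 'Source' column headers.
        if row.get("Level") in ("Error", "Critical"):
            counts[row.get("Source", "unknown")] += 1
    return counts


# Made-up sample rows, for illustration only.
SAMPLE = """Level,Date and Time,Source,Event ID
Error,01/01/2012 10:00:00,disk,11
Warning,01/01/2012 10:05:00,Ntfs,55
Error,01/01/2012 11:00:00,disk,11
Error,01/01/2012 11:30:00,atapi,11
"""

if __name__ == "__main__":
    # List the sources with the most errors first.
    for source, count in tally_error_sources(SAMPLE).most_common():
        print(source, count)
```

A tally like this only points at which component to test first. As the steps above show, the component named in the log is not necessarily the one at fault.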
Next Steps – So far we had established that the hard drive, RAM and PSU were all fine, and that temperatures were normal too. There were no other separate components (like a graphics/sound card), so this narrowed down the cause to an intermittent motherboard problem or software errors.
The biggest issue was not being able to reproduce the error – even under stress testing the PC had remained resolutely stable for us with no crashes or freezing. Unless we could find a ‘smoking gun’ we could not justify wasting a lot of time updating drivers/software willy-nilly, and certainly not wasting a lot of money replacing the motherboard or writing off the PC – we needed to make this PC crash.
- Considering all the information to date, our instinct and experience still pointed us towards an issue with heavy file read/write access causing related memory and hard drive issues, hence the crashes/freezing…
- So we ran a full virus scan on the PC (also useful to rule out virus activity) which would read/check every file but found no issues – and still no crash.
- Next we decided to do a disk cleanup – lo and behold, it froze halfway through and the PC shut down!
- To make sure, we tried a few disk cleanups in both normal and safe mode and the PC froze up every time. At last we could reproduce the problem at will so could proceed with diagnosis and repair, confident that we could test for sure if we had been successful.
- To separate a cable fault from a motherboard fault we first replaced the hard drive's SATA data cable – disk cleanup still crashed. We then moved the SATA data cable to a different SATA port on the motherboard – and disk cleanup finished successfully.
We repeated the test by trying a few disk cleanups in both normal and safe mode and they were successful every time – no freezes at all. As a final check, we moved the SATA cable back to the original port on the motherboard and, sure enough, the disk cleanup started failing again – proving for sure that particular SATA port was at fault.
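We used disk cleanup as our repeatable trigger, but any heavy, sustained read/write workload could serve the same purpose. The sketch below is an assumption on our part and not the test used above: it writes random data to files, reads each one back and compares checksums. The function name `stress_disk` and its parameters are hypothetical. Point `directory` at a folder on the drive under test. Note that the operating system may serve the read-back from its cache rather than from the disk, so a clean run does not prove the port is healthy.

```python
import hashlib
import os
import tempfile


def stress_disk(directory, files=20, size_mb=50):
    """Write files of random data, read them back, return mismatched paths."""
    mismatches = []
    for i in range(files):
        data = os.urandom(size_mb * 1024 * 1024)
        expected = hashlib.sha256(data).hexdigest()
        path = os.path.join(directory, f"stress_{i}.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the drive
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        os.remove(path)  # clean up so the test does not fill the disk
        if actual != expected:
            mismatches.append(path)
    return mismatches


if __name__ == "__main__":
    # The temporary directory is only a default for demonstration;
    # it may not be on the drive you actually want to test.
    with tempfile.TemporaryDirectory() as tmp:
        bad = stress_disk(tmp, files=5, size_mb=10)
        print("mismatched files:", bad)
```

With a fault like the one described here, the likely symptom would be the script hanging or the PC freezing rather than a checksum mismatch. Either way, the value is in having a trigger you can run before and after each change.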
Updated 29th Dec 2012 – some readers have asked what to do if you find that all your motherboard SATA ports are faulty (especially if you only have 2). Possible solutions would include:
1. Add more internal SATA ports by fitting a SATA PCIe card (if you have a free PCIe slot) or a SATA PCI card (not as quick, but it will do the job) – note that you may have to play around with loading SATA drivers when installing Windows.
2. Use an IDE port instead if you have an IDE hard drive available.
3. Buy a new motherboard.
4. Even if you decide to buy another PC, as long as the SATA hard drive itself is not faulty, you could reuse it in a new PC or just retrieve your personal data from it (by attaching it temporarily in another PC).
The original SATA port (or associated disk controller) on the motherboard was faulty – but only intermittently and under heavy disk activity. This is a very rare outcome – the usual sign of a faulty SATA port is that the hard drive is not recognized in BIOS or Windows does not load, producing a disk read error or ‘no operating system found’ message.
It underlines the importance of thorough step-by-step testing and of being able to reproduce a fault so that you can test whether a potential fix actually works – otherwise you’re just shooting in the dark.
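To put a rough number on ‘just got lucky’: if you cannot reproduce a fault, the only evidence that a fix worked is a crash-free period. The sketch below estimates how long that period needs to be. It rests on a simplifying assumption that is ours, not something measured on this PC: crashes arrive at random at a steady average rate (a Poisson model). The function names are hypothetical.

```python
import math


def chance_of_quiet_period(crashes_per_day, days):
    """Probability of zero crashes in `days` if the fault is still present."""
    # Poisson model: P(no events in time t) = exp(-rate * t)
    return math.exp(-crashes_per_day * days)


def days_needed(crashes_per_day, max_luck=0.05):
    """Crash-free days needed before luck explains it with chance <= max_luck."""
    return math.log(1 / max_luck) / crashes_per_day


if __name__ == "__main__":
    # The PC in this article averaged about one crash a day.
    print(round(chance_of_quiet_period(1, 3), 3))
    print(round(days_needed(1), 1))
    # A fault that only appears about once a week needs far longer.
    print(round(days_needed(1 / 7), 1))
```

Under these assumptions, a fault that strikes about once a day needs roughly 3 crash-free days before luck becomes an unlikely explanation (under 5%), and a once-a-week fault needs about 3 weeks. A trigger that fails every time, like the disk cleanup here, cuts that wait to minutes.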