Oops, I only did the first one. I set up a chkdsk last night; I'll post the results when I get home.
However, I am thinking we might be going about this the wrong way. The system worked fine when I was running WHS2011 and only started freezing when I switched to Windows 7. I then formatted the drive and performed a clean install, with the same results. Unless a hardware component recently failed, I think it is unlikely to be a hardware issue. Also, it runs indefinitely under safe mode. Whatever is causing the problem is something that is loaded in a normal boot but not during a safe boot. It also must be a Microsoft issue or one of my drivers, because nothing else has been installed.
How can I track down the offending software/driver? I think I am going to try running Verifier without excluding Microsoft items and see if I get anything. Aside from that, what do you propose?
I'm not sure why you would think it isn't hardware if you said that formatting and reinstalling an OS didn't resolve the issue (unless I'm reading it incorrectly). The fact that it runs fine in safe mode doesn't really negate the possibility of it being hardware. Safe mode will limit the amount of hardware being used, and there may be hardware safe mode is not using that would trigger the freezes, or safe mode may not present conditions that would cause your hardware to bug out.
As for Driver Verifier, you want to make sure not to select Microsoft drivers. I personally have done so to test it out, and it will either slow your PC down to a crawl, lock it up, or cause other unusual behavior. Unless you're sure the drivers selected are from Microsoft and are not part of Windows (like they're part of Microsoft Security Essentials), you'll want to keep away from selecting them.
The problem with freezes is that it's very difficult to ascertain the cause because no data is generated about the situation. If Driver Verifier is not triggering, then often one is reduced to brute force tactics like process of elimination to try and isolate the cause. The only exception is live kernel debugging, but I highly doubt you have the resources to perform that.
The only other data I can see that can help is a Process Monitor log. This log can get big fast, so you'll want to start it up and then try to perform some action you know will cause the freeze. The more logs the merrier (to determine patterns). Note that in order for this to work during freezing, you'll have to go to File, then Backing Files, and change the backing file from virtual memory (paging file) to a static file that you define. That way all the data will be stored in that file and will not be lost at restart. Do that for a number of crashes, then zip the logs up and upload them to a 3rd-party filesharing site.
Now, as for process of elimination, if you still believe this to be software, you can use Autoruns and start turning off services, startup junk, drivers, etc., until you manage to narrow it down to a specific one. Your best method for this is to uncheck (never delete!) groups of related drivers/services/etc. (like everything pertaining to a specific application, or anything related to your network card). Unless you can find a pattern in the freezes and what you suspect may be causing them, your best bet is to start from the bottom (uncheck a LOT of items aside from necessary ones) and work up by turning on some, then testing, then turning on more, then testing, etc. If you want a quick way to do this, go to the Start menu, type msconfig, select either Diagnostic startup or Selective startup, and restart the PC. Then you can open Autoruns and see that much of the stuff has already been unchecked.
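If it helps to picture the elimination process, here's a rough sketch in Python of a halving search, which can need fewer reboots than turning things back on one batch at a time. It is not something you run against Autoruns. `isolate_culprit` and `freezes_with` are names I made up for illustration, and `freezes_with` stands in for you manually checking only those items, rebooting, and reporting whether the freeze came back. It also assumes a single culprit that causes the freeze on its own, which may not hold.

```python
from typing import Callable, List, Optional, Sequence


def isolate_culprit(
    items: Sequence[str],
    freezes_with: Callable[[List[str]], bool],
) -> Optional[str]:
    """Halving search for a single item that brings the freeze back.

    `freezes_with(enabled)` is a hypothetical callback: enable only the
    listed items, reboot, use the PC, and return True if it froze.
    Assumes exactly one culprit that triggers the freeze by itself.
    """
    candidates = list(items)
    # If the freeze never shows up with everything enabled, none of
    # these items is to blame.
    if not freezes_with(candidates):
        return None
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if freezes_with(first_half):
            candidates = first_half          # culprit is in the first half
        else:
            candidates = candidates[middle:]  # so it must be in the rest
    return candidates[0]


# Simulated run with made-up group names; "antivirus" plays the culprit.
groups = ["network", "graphics", "audio", "antivirus", "usb"]
found = isolate_culprit(groups, lambda enabled: "antivirus" in enabled)
```

In practice each call to `freezes_with` is one reboot plus however long you need to wait to be confident it isn't going to freeze, so grouping related items first (as described above) keeps the number of rounds down.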
As you can probably tell, this all may well not work, because diagnostic startup reduces startup to the most barebones startup possible, a little above safe mode. If you said diagnostic startup freezes up, then no amount of selecting items in Autoruns is going to fix this, unless you happen to go in after turning on diagnostic startup and use Autoruns to uncheck any drivers that have not been unchecked.
One thing I'm curious about: you said safe mode works. Does safe mode with networking also work? Does the networking in safe mode actually function properly (there may be problems with wifi in safe mode)?
Anyways, as for brute force tactics for hardware, it means performing a battery of hardware tests and removing hardware or swapping it with replacements you know are reliable, and seeing if that does the trick. I'll include a copypasta of a bunch of hardware tests you can use (I know you already ran like 2 of em) as well as how to generate a temp/voltage log for us to check. Basically they're in addition to (or the same as) what writhzeden already mentioned previously. It's best to do anything you haven't already done that's been stated below or by writhzeden.
RAM:
Memtest86+ - 7+ passes
CPU:
Prime95 - Torture Test; Large FFTs; overnight (9+ hours)
GPU:
MemtestG80/CL - Run twice (if any of the tests work on your GPU; ATI cards will need the ATI APP SDK installed, as the test requires OpenCL)
Drives:
Seatools - All basic tests aside from Fix all or the advanced ones.
All of these (excluding MemtestG80/CL) are included in the UBCD if you prefer a Live CD environment (which is the best environment to test hardware on). Note that Prime95 currently does not work on the UBCD. Also, please provide us temps/voltages using HWInfo with the Sensors only option checked. Log two 30-minute instances: one for idle, and one for high load.
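If you want to eyeball the sensor logs yourself before uploading, something like the sketch below could pull the min/average/max out of each column. It assumes the log is a plain CSV with one header row and one row per sample, and that each column name is unique. I haven't verified that against your HWInfo version, so treat the layout as an assumption. `summarize_sensor_log` and the column names in the sample are made up for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def summarize_sensor_log(handle: TextIO) -> Dict[str, Tuple[float, float, float]]:
    """Return {column name: (min, mean, max)} for each numeric column.

    Assumes a CSV with a single header row and unique column names.
    Cells that are not numbers (dates, times, repeated headers) are skipped.
    """
    reader = csv.reader(handle)
    header = next(reader)
    values: Dict[str, List[float]] = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            try:
                values[name].append(float(cell))
            except ValueError:
                continue  # not a number, ignore this cell
    # Drop columns that never held a number (e.g. Date and Time).
    return {
        name: (min(column), sum(column) / len(column), max(column))
        for name, column in values.items()
        if column
    }


# Tiny made-up sample; a real log would be opened with open(path, newline="").
sample = (
    "Date,Time,CPU [C],Vcore [V]\n"
    "1.1.2020,12:00:00,40,1.20\n"
    "1.1.2020,12:00:02,60,1.25\n"
)
stats = summarize_sensor_log(io.StringIO(sample))
```

Comparing the idle log against the high-load log this way makes it easier to spot a temperature that spikes or a voltage rail that sags under load. If opening a real log fails with a decoding error, you may need to pass an explicit `encoding` to `open`.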