In addition to torrentg's excellent advice, I'll try to add a bit of detail about some of these dumps, because a certain member thought them interesting :)
SYNOPSIS: In situations where the stack itself is available, which is most of the time, try to focus on the activity revealed by the stack as a means of gauging what is going on. For those bugcheck types which are not outright hardware error reports (0x124 in particular) the stack is frequently far more informative than the bugcheck code. In no particular order...
103109-15943-01.dmp, from the "!analyze -v" output:
STACK_TEXT:
fffff880`08460780 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!ExAllocatePoolWithTag+0x52e
Any time you see problems with "AllocatePool" or "FreePool" in a dump, the situation is suggestive of "pool corruption" - either a bad driver or a bad hardware component has been mangling pool memory which does not belong to it. It is mostly impossible to understand why from just a minidump. Hence, the driver verifier's (DV) "special pool" tracking option can be used to make the OS pay closer attention next time.
Diagnosis for this one: pool corruption, cause unknown, let's think about enabling DV "special pool".
103109-16130-01.dmp:
4: kd> k
Child-SP RetAddr Call Site
fffff880`07dfe6f8 fffff880`0121ae95 nt!KeBugCheckEx
fffff880`07dfe700 fffff880`0121aae8 Ntfs!NtfsPagingFileIo+0x155
fffff880`07dfe800 fffff880`0109423f Ntfs! ?? ::FNODOBFM::`string'+0x9e89
fffff880`07dfe8b0 fffff880`010926df fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
fffff880`07dfe940 fffff800`02a77a52 fltmgr!FltpDispatch+0xcf
fffff880`07dfe9a0 fffff800`02a77d55 nt!IoPageRead+0x252
fffff880`07dfea30 fffff800`02aa109b nt!MiIssueHardFault+0x255
fffff880`07dfeac0 fffff800`02a83fee nt!MmAccessFault+0x14bb
fffff880`07dfec20 00000000`779055a1 nt!KiPageFault+0x16e
00000000`0018f648 00000000`00000000 0x779055a1
Reading the function names from the bottom upwards: something tried to access memory which wasn't located in the process "working set" at the time, leading to a page fault (KiPageFault). The Memory Manager (Mm*) attempts to handle the condition (MmAccessFault), decides that the particular memory being requested has been paged out, and invokes a specialised handler for "hard" page faults which bring in pages from the paging file on the disk (MiIssueHardFault). The I/O manager has the task of actually reading in data (as with any file I/O), in the form of IoPageRead. The filter driver manager (fltmgr.sys) is consulted before the file I/O operation begins, in case there are any registered "filter drivers", such as AV, which have registered for notification during this particular file I/O activity. Eventually, we get down to NtfsPagingFileIo, whose purpose is now hopefully self-descriptive, and that is when a condition sufficiently severe to trigger a bugcheck (KeBugCheckEx) is noted.
As so frequently happens with minidumps, it's hard to be certain about the absolute cause of the crash, but at least the nature of the activity is providing clues.
Diagnosis: either what we're reading from (the NTFS metadata or the pagefile) or what we're writing to (the RAM) appears to be experiencing some sort of problem.
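The bottom-up reading described above can be sketched in a few lines of Python. This is only an illustration: the helper names (parse_frames, bottom_up) are made up for this post, and the regular expression assumes the plain three-column "k" layout shown in this dump (Child-SP, RetAddr, Call Site), which will not match other debugger output formats.

```python
import re

# One frame of WinDbg "k" output: Child-SP, RetAddr, then the call site text.
# Assumption: addresses are printed as 8 hex digits, a backtick, 8 hex digits.
FRAME = re.compile(
    r"^([0-9a-f]{8}`[0-9a-f]{8})\s+([0-9a-f]{8}`[0-9a-f]{8})\s+(.+)$",
    re.IGNORECASE,
)

def parse_frames(k_output):
    """Return (module, symbol) pairs in the order printed (newest frame first)."""
    frames = []
    for line in k_output.splitlines():
        m = FRAME.match(line.strip())
        if not m:
            continue  # header line ("Child-SP RetAddr Call Site") or blank
        site = m.group(3)
        module, bang, rest = site.partition("!")
        if not bang:
            # No module!symbol form, e.g. a raw user-mode return address.
            frames.append(("<unknown>", site))
            continue
        symbol = rest.rsplit("+0x", 1)[0].strip()  # drop the +0x... offset
        frames.append((module, symbol))
    return frames

def bottom_up(k_output):
    """Same frames, oldest first, which is the order the narrative follows."""
    return list(reversed(parse_frames(k_output)))

SAMPLE = """Child-SP RetAddr Call Site
fffff880`07dfe6f8 fffff880`0121ae95 nt!KeBugCheckEx
fffff880`07dfe700 fffff880`0121aae8 Ntfs!NtfsPagingFileIo+0x155
fffff880`07dfe800 fffff880`0109423f Ntfs! ?? ::FNODOBFM::`string'+0x9e89
fffff880`07dfe8b0 fffff880`010926df fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
fffff880`07dfe940 fffff800`02a77a52 fltmgr!FltpDispatch+0xcf
fffff880`07dfe9a0 fffff800`02a77d55 nt!IoPageRead+0x252
fffff880`07dfea30 fffff800`02aa109b nt!MiIssueHardFault+0x255
fffff880`07dfeac0 fffff800`02a83fee nt!MmAccessFault+0x14bb
fffff880`07dfec20 00000000`779055a1 nt!KiPageFault+0x16e
00000000`0018f648 00000000`00000000 0x779055a1
"""

if __name__ == "__main__":
    for module, symbol in bottom_up(SAMPLE):
        print(f"{module:10} {symbol}")
```

Run against the stack above, this lists the user-mode address first, then KiPageFault, MmAccessFault and so on up to KeBugCheckEx, i.e. the same order in which the story was told.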
103009-13915-01.dmp:
0: kd> k
Child-SP RetAddr Call Site
fffff880`0ca387b8 fffff800`02881228 nt!KeBugCheckEx
fffff880`0ca387c0 fffff800`028fcb51 nt! ?? ::FNODOBFM::`string'+0x31f72
fffff880`0ca38970 fffff800`0290dc4a nt!MiDeleteVirtualAddresses+0x408
fffff880`0ca38b30 fffff800`028cb153 nt!NtFreeVirtualMemory+0x5ca
fffff880`0ca38c20 00000000`778f009a nt!KiSystemServiceCopyEnd+0x13
00000000`0008e2d8 00000000`00000000 0x778f009a
In this case, a user-mode caller made a system call (KiSystemServiceCopyEnd is part of the system call dispatcher) in order to free some memory (NtFreeVirtualMemory). More specifically, that's accomplished by deleting a particular range of addresses (MiDeleteVirtualAddresses), and that's where the OS felt sufficiently in trouble to warrant a KeBugCheckEx (blue screen). It's usually OK to just ignore the "FNODOBFM" business.
Diagnosis: we seem to have hit upon a problem at the conversion layer between virtual and physical memory (RAM).
103109-15272-01.dmp:
Ditto, an attempt to "MiQueryAddressState" appears to provoke a crash.
103009-18501-01.dmp:
And again. "MiUnlinkPageFromLockedList" runs into some sort of problem while attempting to deal with physical memory.
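The pattern-spotting used on the dumps so far can be written down as a toy heuristic. This is a rough sketch, not a diagnostic tool: the function name triage and the substring rules are my own shorthand for the reasoning in this post, and real stacks will need a human to read them.

```python
def triage(call_sites):
    """Map function-name patterns seen on a stack to a rough hint.

    call_sites: strings such as "nt!ExAllocatePoolWithTag+0x52e".
    The rules below are assumptions distilled from this post only.
    """
    # Keep just the part after "module!", if there is one.
    names = [site.split("!", 1)[-1] for site in call_sites]
    hints = []
    if any("AllocatePool" in n or "FreePool" in n for n in names):
        hints.append("pool corruption suspected; consider DV special pool")
    if any("PagingFileIo" in n or n.startswith("IoPageRead") for n in names):
        hints.append("paging I/O involved; suspect disk, pagefile, NTFS or RAM")
    if any(n.startswith("Mi") for n in names):
        hints.append("memory manager internals involved; suspect RAM or the path to it")
    return hints

if __name__ == "__main__":
    print(triage(["nt!ExAllocatePoolWithTag+0x52e"]))
    print(triage(["nt!MiDeleteVirtualAddresses+0x408", "nt!NtFreeVirtualMemory+0x5ca"]))
```

A single hint on its own proves little; it is the same hint turning up across several dumps from one machine that starts to mean something.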
=============================
The other few dumps are so badly broken it's difficult to use them even for instructive purposes, but for reasons which are hard to explain succinctly, they too look like hardware.
Overall, while the first couple of dumps looked (hopefully) like driver pool corruption or some type of disk/NTFS issue, all the others suggest that either the RAM itself or the pathways used to get there (processor, motherboard) are somehow unreliable.
When troubleshooting a bunch of dumps from a single machine, it is never a good idea to assume the worst. For example, "pool corruption" might be caused by bad hardware, but it would be a mistake to jump to that conclusion if all we had was 103109-15943-01.dmp. Instead, we'd assume the cause was a bad driver and use DV to try to find it.
Conversely, once you've got evidence of a rather severe problem - in particular a hardware problem - it becomes almost impossible and definitely pointless to troubleshoot lesser issues such as software pool corruption, at least until the hardware badness is thoroughly resolved.
===============================
Overall diagnosis: RAM or RAM access is highly unreliable. The machine is either overclocked, inadequately cooled, or outright broken. The chances of a software cause are less than 10%.
Recommendation: run memory diagnostics 'til the thing bleeds - for hours or days if necessary.