I found more information. If I'm reading it right, the SD bit was set for this specific PCI-E error. SD means Surprise Down: the controller reported that the link between the PCI-E card and the controller hub was lost unexpectedly. Unfortunately this is rather broad, since either the card or the motherboard could be responsible for losing the connection.
It's also possible that dust or something else is stuck in the PCI-E slot, or that the card isn't seated flush in the slot. Rule those out first. If they check out, then given the complaints from others about this board, I'd lean toward the controller being responsible.
Analysts:
Code:
3: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 00000004, PCI Express Error
Arg2: 869348d4, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000
Arg4: 00000000
Debugging Details:
------------------
TRIAGER: Could not open triage file : C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\triage\modclass.ini, error 2
BUGCHECK_STR: 0x124_GenuineIntel
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT
PROCESS_NAME: System
CURRENT_IRQL: a
STACK_TEXT:
80e4cb2c 8341afcd 00000124 00000004 869348d4 nt!KeBugCheckEx+0x1e
80e4cb68 83506fc4 869334e1 869348d4 8691bc10 hal!HalBugCheckSystem+0xab
80e4cb9c 8c7ce609 8691b638 8690a780 80e4cd20 nt!WheaReportHwError+0x230
80e4cbb4 8c7cf088 869344b4 00000000 8691b638 pci!ExpressRootPortAerInterruptRoutine+0x1e7
80e4cbd8 8c7cf264 8690a780 86934008 80e4cbfc pci!ExpressRootPortInterruptRoutine+0x1a
80e4cbe8 834a9cff 8690a780 86934008 00000001 pci!ExpressRootPortMessageRoutine+0x10
80e4cbfc 83474ded 8690a780 86934008 80e4cc28 nt!KiInterruptMessageDispatch+0x12
80e4cbfc 93be45d6 8690a780 86934008 80e4cc28 nt!KiInterruptDispatch+0x6d
WARNING: Stack unwind information not available. Following frames may be wrong.
80e4cc98 8349ada4 888d8d48 80e35800 80e30000 intelppm+0x15d6
80e4cd20 834985ad 00000000 0000000e ab16ab16 nt!PoIdle+0x524
80e4cd24 00000000 0000000e ab16ab16 8bdf8bdf nt!KiIdleLoop+0xd
STACK_COMMAND: kb
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: GenuineIntel
IMAGE_NAME: GenuineIntel
DEBUG_FLR_IMAGE_TIMESTAMP: 0
FAILURE_BUCKET_ID: 0x124_GenuineIntel_PCIEXPRESS
BUCKET_ID: 0x124_GenuineIntel_PCIEXPRESS
Followup: MachineOwner
---------
3: kd> !errrec 869348d4
===============================================================================
Common Platform Error Record @ 869348d4
-------------------------------------------------------------------------------
Record Id : 01cd07d8bce4740f
Severity : Fatal (1)
Length : 672
Creator : Microsoft
Notify Type : PCI Express Error
Timestamp : 3/22/2012 3:06:44 (UTC)
Flags : 0x00000000
===============================================================================
Section 0 : PCI Express
-------------------------------------------------------------------------------
Descriptor @ 86934954
Section @ 869349e4
Offset : 272
Length : 208
Flags : 0x00000001 Primary
Severity : Recoverable
Port Type : Root Port
Version : 1.1
Command/Status: 0x4010/0x0507
Device Id :
VenId:DevId : 8086:340a
Class code : 030400
Function No : 0x00
Device No : 0x03
Segment : 0x0000
Primary Bus : 0x00
Second. Bus : 0x00
Slot : 0x0000
Dev. Serial # : 0000000000000000
Express Capability Information @ 86934a18
Device Caps : 00008021 Role-Based Error Reporting: 1
Device Ctl : 0107 ur FE NF CE
Dev Status : 0003 ur fe NF CE
Root Ctl : 0008 fs nfs cs
AER Information @ ffffffff86934a54
Uncorrectable Error Status : 00000020 ur ecrc mtlp rof uc ca cto fcp ptlp SD dlp und
Uncorrectable Error Mask : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Severity : 00062010 ur ecrc MTLP ROF uc ca cto FCP ptlp sd DLP und
Correctable Error Status : 00000000 adv rtto rnro dllp tlp re
Correctable Error Mask : 00000000 adv rtto rnro dllp tlp re
Caps & Control : 00000005 ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
Header Log : 00000000 00000000 00000000 00000000
Root Error Command : 00000000 fen nfen cen
Root Error Status : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
Correctable Error Source ID : 00,00,00
Correctable Error Source ID : 00,00,00
===============================================================================
Section 1 : Processor Generic
-------------------------------------------------------------------------------
Descriptor @ 8693499c
Section @ 86934ab4
Offset : 480
Length : 192
Flags : 0x00000000
Severity : Informational
Proc. Type : x86/x64
Instr. Set : x86
CPU Version : 0x00000000000106a5
Processor ID : 0x0000000000000006
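I looked up the VenId:DevId (8086:340a) on PCIDatabase.com. It turned up as the client's Intel 7500 Chipset PCIe Root Port. As far as I can tell, that root port lives in the Intel I/O Hub that is paired with the ICH10 southbridge, not in the southbridge itself.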
I then went to MSDN and looked up the structures related to PCI Express and AER (Advanced Error Reporting). That turned up three possible structures. Because this report was sent by the Root Port (you can tell from the Port Type field in the WHEA record), the structure we want is PCI_EXPRESS_ROOTPORT_AER_CAPABILITY.
Next I looked at the structure details for Uncorrectable Error Status, as it should tell us the status of the error that was triggered. In the WHEA record, SD is capitalized, meaning that bit is set, so the next step is to find out what "SD" means. According to the article it stands for "Surprise Down". Googling that showed it reports a sudden loss of the link between the card and the controller.
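
For anyone who wants to double-check the capitalized flags by hand, here is a minimal Python sketch that decodes the Uncorrectable Error Status and Severity values from the record above. The bit positions are my reading of the AER register layout in the PCI Express specification (they agree with the flags !errrec capitalized in this dump), and the names `UNCORRECTABLE_BITS` and `decode_uncorrectable` are made up for illustration, not part of any debugger API.

```python
# Assumed bit layout of the AER Uncorrectable Error Status register.
# The short labels mirror the abbreviations printed by !errrec.
UNCORRECTABLE_BITS = {
    0: ("und", "Undefined"),
    4: ("dlp", "Data Link Protocol Error"),
    5: ("sd", "Surprise Down Error"),
    12: ("ptlp", "Poisoned TLP"),
    13: ("fcp", "Flow Control Protocol Error"),
    14: ("cto", "Completion Timeout"),
    15: ("ca", "Completer Abort"),
    16: ("uc", "Unexpected Completion"),
    17: ("rof", "Receiver Overflow"),
    18: ("mtlp", "Malformed TLP"),
    19: ("ecrc", "ECRC Error"),
    20: ("ur", "Unsupported Request Error"),
}


def decode_uncorrectable(value):
    """Return (label, description) for every known bit set in value, lowest bit first."""
    return [
        UNCORRECTABLE_BITS[bit]
        for bit in sorted(UNCORRECTABLE_BITS)
        if value & (1 << bit)
    ]


# Values copied from the AER Information section of the record above.
for name, raw in (("Status", 0x00000020), ("Severity", 0x00062010)):
    flags = ", ".join(f"{label.upper()} ({desc})" for label, desc in decode_uncorrectable(raw))
    print(f"Uncorrectable Error {name} {raw:08x}: {flags}")
```

The same table works for the Mask register too, since all three uncorrectable registers share one bit layout under that assumption.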