22 Mar 2012  
Vir Gnarus

Microsoft Community Contributor Award Recipient

Windows 7 64-bit
 
 

I found more information. If I'm reading it right, the SD bit was set for this particular PCIe error. SD stands for Surprise Down: the controller reported that the link between the PCIe card and the controller hub was unexpectedly lost. Unfortunately this is rather broad, since either the card or the motherboard could be responsible for dropping the connection. It's also possible that dust or something else is stuck in the PCIe slot, or that the card isn't fully and correctly seated in the slot. Rule those out first; if neither is the case, then given the complaints from others about this board, I'd lean toward the controller being responsible.
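For anyone who wants to check this themselves: per the AER section of the PCIe base spec, the Surprise Down flag is bit 5 of the Uncorrectable Error Status register (mask 0x20). Here's a minimal Python sketch that tests that bit in the raw value !errrec prints. The constant and function names are just illustrative and don't come from any Windows API.

```python
# Bit 5 of the PCIe AER Uncorrectable Error Status register is
# "Surprise Down Error Status", the flag !errrec prints as SD.
SURPRISE_DOWN_MASK = 1 << 5  # 0x20


def surprise_down_set(uncorrectable_status: int) -> bool:
    """Return True if the Surprise Down bit is set in the raw register value."""
    return bool(uncorrectable_status & SURPRISE_DOWN_MASK)


if __name__ == "__main__":
    # Value copied from the "Uncorrectable Error Status" line of the !errrec output.
    print(surprise_down_set(0x00000020))  # True
```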



Analysis:

Code:
3: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 00000004, PCI Express Error
Arg2: 869348d4, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000
Arg4: 00000000

Debugging Details:
------------------

TRIAGER: Could not open triage file : C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\triage\modclass.ini, error 2

BUGCHECK_STR:  0x124_GenuineIntel

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  a

STACK_TEXT:  
80e4cb2c 8341afcd 00000124 00000004 869348d4 nt!KeBugCheckEx+0x1e
80e4cb68 83506fc4 869334e1 869348d4 8691bc10 hal!HalBugCheckSystem+0xab
80e4cb9c 8c7ce609 8691b638 8690a780 80e4cd20 nt!WheaReportHwError+0x230
80e4cbb4 8c7cf088 869344b4 00000000 8691b638 pci!ExpressRootPortAerInterruptRoutine+0x1e7
80e4cbd8 8c7cf264 8690a780 86934008 80e4cbfc pci!ExpressRootPortInterruptRoutine+0x1a
80e4cbe8 834a9cff 8690a780 86934008 00000001 pci!ExpressRootPortMessageRoutine+0x10
80e4cbfc 83474ded 8690a780 86934008 80e4cc28 nt!KiInterruptMessageDispatch+0x12
80e4cbfc 93be45d6 8690a780 86934008 80e4cc28 nt!KiInterruptDispatch+0x6d
WARNING: Stack unwind information not available. Following frames may be wrong.
80e4cc98 8349ada4 888d8d48 80e35800 80e30000 intelppm+0x15d6
80e4cd20 834985ad 00000000 0000000e ab16ab16 nt!PoIdle+0x524
80e4cd24 00000000 0000000e ab16ab16 8bdf8bdf nt!KiIdleLoop+0xd


STACK_COMMAND:  kb

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: GenuineIntel

IMAGE_NAME:  GenuineIntel

DEBUG_FLR_IMAGE_TIMESTAMP:  0

FAILURE_BUCKET_ID:  0x124_GenuineIntel_PCIEXPRESS

BUCKET_ID:  0x124_GenuineIntel_PCIEXPRESS

Followup: MachineOwner
---------

3: kd> !errrec 869348d4
===============================================================================
Common Platform Error Record @ 869348d4
-------------------------------------------------------------------------------
Record Id     : 01cd07d8bce4740f
Severity      : Fatal (1)
Length        : 672
Creator       : Microsoft
Notify Type   : PCI Express Error
Timestamp     : 3/22/2012 3:06:44 (UTC)
Flags         : 0x00000000

===============================================================================
Section 0     : PCI Express
-------------------------------------------------------------------------------
Descriptor    @ 86934954
Section       @ 869349e4
Offset        : 272
Length        : 208
Flags         : 0x00000001 Primary
Severity      : Recoverable

Port Type     : Root Port
Version       : 1.1
Command/Status: 0x4010/0x0507
Device Id     :
  VenId:DevId : 8086:340a
  Class code  : 030400
  Function No : 0x00
  Device No   : 0x03
  Segment     : 0x0000
  Primary Bus : 0x00
  Second. Bus : 0x00
  Slot        : 0x0000
Dev. Serial # : 0000000000000000
Express Capability Information @ 86934a18
  Device Caps : 00008021 Role-Based Error Reporting: 1
  Device Ctl  : 0107 ur FE NF CE
  Dev Status  : 0003 ur fe NF CE
   Root Ctl   : 0008 fs nfs cs

AER Information @ ffffffff86934a54
  Uncorrectable Error Status    : 00000020 ur ecrc mtlp rof uc ca cto fcp ptlp SD dlp und
  Uncorrectable Error Mask      : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
  Uncorrectable Error Severity  : 00062010 ur ecrc MTLP ROF uc ca cto FCP ptlp sd DLP und
  Correctable Error Status      : 00000000 adv rtto rnro dllp tlp re
  Correctable Error Mask        : 00000000 adv rtto rnro dllp tlp re
  Caps & Control                : 00000005 ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
  Header Log                    : 00000000 00000000 00000000 00000000
  Root Error Command            : 00000000 fen nfen cen
  Root Error Status             : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
  Correctable Error Source ID   : 00,00,00
  Correctable Error Source ID   : 00,00,00

===============================================================================
Section 1     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ 8693499c
Section       @ 86934ab4
Offset        : 480
Length        : 192
Flags         : 0x00000000
Severity      : Informational

Proc. Type    : x86/x64
Instr. Set    : x86
CPU Version   : 0x00000000000106a5
Processor ID  : 0x0000000000000006

I looked up the VenId:DevId (8086:340a) on PCIDatabase.com. It came up as the client's Intel 7500 Chipset PCIe Root Port. Note that this device ID is a PCIe root port on the Intel I/O Hub (the X58/5520-family IOH), not on the ICH10 southbridge.
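If you'd rather not depend on a website, the same VenId:DevId lookup can be done offline against a copy of the pci.ids database. Below is a minimal Python sketch. The embedded excerpt is abridged and only for illustration (check the real pci.ids file for the exact wording, which may differ from PCIDatabase.com's naming), and the parser simply skips subsystem lines and stops at the class-code section.

```python
# Abridged, illustrative excerpt in pci.ids format: vendor lines start in
# column 0, device lines start with one tab. Not the full database.
PCI_IDS_EXCERPT = """\
8086  Intel Corporation
\t340a  5520/5500/X58 I/O Hub PCI Express Root Port 3
"""


def parse_pci_ids(text):
    """Build {(vendor_id, device_id): (vendor_name, device_name)} from pci.ids text."""
    table = {}
    vendor_id = vendor_name = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments
        if line.startswith("C "):
            break  # class-code section follows; not handled in this sketch
        if line.startswith("\t\t"):
            continue  # subsystem entries are ignored in this sketch
        if line.startswith("\t"):
            dev_id, _, dev_name = line.strip().partition("  ")
            table[(vendor_id, int(dev_id, 16))] = (vendor_name, dev_name)
        else:
            vid, _, vendor_name = line.partition("  ")
            vendor_id = int(vid, 16)
    return table


def lookup(ven_dev, table):
    """Resolve a 'VenId:DevId' string such as the one !errrec prints."""
    ven, dev = (int(part, 16) for part in ven_dev.split(":"))
    return table.get((ven, dev))


if __name__ == "__main__":
    ids = parse_pci_ids(PCI_IDS_EXCERPT)
    print(lookup("8086:340a", ids))
```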

I then went to MSDN and looked up the structures related to PCI Express and AER (Advanced Error Reporting). The search turned up three candidate structures. Because this error was reported by the root port (the Port Type field in the WHEA record shows this), the structure we want is PCI_EXPRESS_ROOTPORT_AER_CAPABILITY. Next I checked the details of its Uncorrectable Error Status field, since it records which uncorrectable error was triggered. In the WHEA record, SD is capitalized, meaning the corresponding bit is set, so the question is what "SD" stands for. The article gives it as "Surprise Down". Googling that showed it indicates a sudden loss of the connection between the card and the controller.
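To make that decoding step concrete, here's a rough Python sketch that turns a raw Uncorrectable Error Status (or Severity) value back into the mnemonic line !errrec prints, with set bits in upper case. The bit positions follow the AER section of the PCIe base spec; the long names are modeled on the bitfields of the WDK's PCI_EXPRESS_UNCORRECTABLE_ERROR_STATUS union, and the table and helper names are my own, so treat it as an illustration rather than a reference implementation.

```python
# (bit position, !errrec mnemonic, WDK-style field name) for each defined bit
# of the AER Uncorrectable Error Status/Mask/Severity registers, listed from
# high bit to low bit in the same order !errrec prints them.
UNCORRECTABLE_BITS = [
    (20, "ur", "UnsupportedRequestError"),
    (19, "ecrc", "ECRCError"),
    (18, "mtlp", "MalformedTLP"),
    (17, "rof", "ReceiverOverflow"),
    (16, "uc", "UnexpectedCompletion"),
    (15, "ca", "CompleterAbort"),
    (14, "cto", "CompletionTimeout"),
    (13, "fcp", "FlowControlProtocolError"),
    (12, "ptlp", "PoisonedTLP"),
    (5, "sd", "SurpriseDownError"),
    (4, "dlp", "DataLinkProtocolError"),
    (0, "und", "Undefined"),
]


def errrec_flags(value: int) -> str:
    """Render the mnemonic line the way !errrec does: set bits in upper case."""
    return " ".join(
        short.upper() if (value >> bit) & 1 else short
        for bit, short, _ in UNCORRECTABLE_BITS
    )


def set_fields(value: int) -> list:
    """Return the WDK-style names of every defined bit that is set."""
    return [name for bit, _, name in UNCORRECTABLE_BITS if (value >> bit) & 1]


if __name__ == "__main__":
    print(errrec_flags(0x00000020))  # Uncorrectable Error Status from the dump
    print(set_fields(0x00000020))    # ['SurpriseDownError']
```

Feeding it 0x00000020 reproduces the Uncorrectable Error Status line from the dump with only SD capitalized, and 0x00062010 reproduces the Severity line (MTLP, ROF, FCP and DLP set).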