05 Dec 2012  
Vir Gnarus


All your crashes are identical: the PCI-Express bus is reporting an unexpected completion, most likely because a request on the bus timed out first (a completion timeout). I'm not familiar enough with PCI-E WHEA errors to get into the nitty-gritty of exactly which device caused it, but I do know the USB hubs in your monitors can be involved, because often (especially on OEM mobos) the USB bus is implemented as an extension of the PCI-E bus, so a USB problem can manifest as a PCI-E error. The video card can also be involved, since it sits on the PCI-E bus as well. Again, I cannot be sure which is which.

Previous cases like this that I've dealt with have involved dust or other debris in a PCI-E slot, or a card that was not seated properly. Re-seat the cards and make sure the slots and card connectors are clean. Also make sure the USB ports don't have anything iffy in them.

I recommend we work by process of elimination with the USB devices: either remove them one at a time until the crashes stop, or do it backwards and start with just one of the monitors and its associated hub and see if it bugs out again. I also recommend (if you haven't already) updating the BIOS and chipset drivers, as well as any drivers associated with the USB and/or PCI-E buses, since newer versions may have fixed instability issues. Either way, think about what could be causing completion timeouts on the PCI-E/USB bus and work from that. Typically I've found it's due to a physical connection issue, but it can just as well be related to the drivers for a USB/PCI-E device or for the bus itself. If all other options have been exhausted, then you'll have to blame the motherboard (unfortunately there's no testing procedure for mobos besides a hardware swap).

I wish I could help you pinpoint this further, but the problem with PCI-E errors, as I see it, is that the one reporting the error is typically the root port, which is the central hub of the PCI-E bus. The bus has several nodes (called bridges) that relay between their downstream bridges or end devices and the root port, but if an error gets past the bridges to the root port, the most the root port can say is that it received it from such-n-such bridge, which doesn't really narrow things down. The header log, which often has decent info in it, is empty here as well (all zeroes). The only hint of data I can garner is that it's device #2, but without an understanding of the layout of the PCI-E bus for that motherboard, I cannot determine what this is referring to (is this USB port #2? Is it PCI-E slot #2? Is it device #2 in an enumerated device list?). We'll just have to say it involves the PCI-E/USB bus and go from there. Again, update drivers and BIOS, tinker with the video card and USB devices, and see how that goes.
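
One possible reading of that "device #2", offered as a sketch and not a diagnosis: the Device Id fields in a PCI Express error section are normally the PCI location of the device that logged the error (segment, bus, device, function). Under that assumption, Device No 0x02 would be the root port's own device number on bus 0, not a USB port or a physical slot index; which slot or controller hangs off that root port still depends on the board layout. The helper name format_bdf below is hypothetical, and the values are copied from the dump in this post.

```python
def format_bdf(segment: int, bus: int, device: int, function: int) -> str:
    """Format PCI location fields in the usual segment:bus:device.function style."""
    return f"{segment:04x}:{bus:02x}:{device:02x}.{function:x}"


# Values taken from the Device Id block of the error record in this post:
# Segment 0x0000, Primary Bus 0x00, Device No 0x02, Function No 0x00.
# Assumption: these describe the reporting root port itself.
reporting_port = format_bdf(0x0000, 0x00, 0x02, 0x00)
print(reporting_port)  # 0000:00:02.0
```

That location can then be matched against the PCI location shown for each root port in Device Manager to see which devices sit behind it.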

Analysts:

Code:
Use !analyze -v to get detailed debugging information.

BugCheck 124, {4, fffffa80151b78d8, 0, 0}

TRIAGER: Could not open triage file : C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\triage\modclass.ini, error 2
Probably caused by : GenuineIntel

Followup: MachineOwner
---------

30: kd> !errrec fffffa80151b78d8
===============================================================================
Common Platform Error Record @ fffffa80151b78d8
-------------------------------------------------------------------------------
Record Id     : 01cdcf535a539a27
Severity      : Fatal (1)
Length        : 672
Creator       : Microsoft
Notify Type   : PCI Express Error
Timestamp     : 12/3/2012 21:55:09 (UTC)
Flags         : 0x00000000

===============================================================================
Section 0     : PCI Express
-------------------------------------------------------------------------------
Descriptor    @ fffffa80151b7958
Section       @ fffffa80151b79e8
Offset        : 272
Length        : 208
Flags         : 0x00000001 Primary
Severity      : Recoverable

Port Type     : Root Port
Version       : 1.1
Command/Status: 0x0010/0x0407
Device Id     :
  VenId:DevId : 8086:3c04 // referring to device that reported error (root port), not actual bad device. Same with class code.
  Class code  : 030400
  Function No : 0x00
  Device No   : 0x02
  Segment     : 0x0000
  Primary Bus : 0x00
  Second. Bus : 0x00
  Slot        : 0x0000
Dev. Serial # : 0000000000000000
Express Capability Information @ fffffa80151b7a1c
  Device Caps : 00008001 Role-Based Error Reporting: 1
  Device Ctl  : 0007 ur FE NF CE
  Dev Status  : 0003 ur fe NF CE
   Root Ctl   : 0008 fs nfs cs

AER Information @ fffffa80151b7a58
  Uncorrectable Error Status    : 00014000 ur ecrc mtlp rof UC ca CTO fcp ptlp sd dlp und
  Uncorrectable Error Mask      : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
  Uncorrectable Error Severity  : 00062010 ur ecrc MTLP ROF uc ca cto FCP ptlp sd DLP und
  Correctable Error Status      : 00002000 ADV rtto rnro dllp tlp re
  Correctable Error Mask        : 00000000 adv rtto rnro dllp tlp re
  Caps & Control                : 0000000e ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
  Header Log                    : 00000000 00000000 00000000 00000000
  Root Error Command            : 00000000 fen nfen cen
  Root Error Status             : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
  Correctable Error Source ID   : 00,00,00
  Correctable Error Source ID   : 00,00,00

===============================================================================
Section 1     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ fffffa80151b79a0
Section       @ fffffa80151b7ab8
Offset        : 480
Length        : 192
Flags         : 0x00000000
Severity      : Informational

Proc. Type    : x86/x64
Instr. Set    : x64
CPU Version   : 0x00000000000206d7
Processor ID  : 0x000000000000002e

Read my article on these types of crashes here. The UC status bit means Unexpected Completion, and CTO means Completion Timeout. Most likely the timeout triggered the unexpected completion bit: the request timed out, and the late reply then arrived with no request left to match it. What strikes me as odd is that this error was reported as recoverable, yet it still ended in a BSOD. I wonder if a BIOS/chipset bug has anything to do with this.
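
For anyone who wants to check the flags by hand, here is a minimal sketch of how the Uncorrectable Error Status and Severity values in the dump break down into bits. The bit positions are the ones I understand the PCI Express AER uncorrectable error registers to use; the names UNCORRECTABLE_BITS and decode_uncorrectable are hypothetical, not part of any debugger or library.

```python
# Assumed bit layout of the AER Uncorrectable Error Status/Severity registers.
UNCORRECTABLE_BITS = {
    4: "DLP",    # Data Link Protocol Error
    5: "SD",     # Surprise Down
    12: "PTLP",  # Poisoned TLP
    13: "FCP",   # Flow Control Protocol Error
    14: "CTO",   # Completion Timeout
    15: "CA",    # Completer Abort
    16: "UC",    # Unexpected Completion
    17: "ROF",   # Receiver Overflow
    18: "MTLP",  # Malformed TLP
    19: "ECRC",  # ECRC Error
    20: "UR",    # Unsupported Request
}


def decode_uncorrectable(status: int, severity: int) -> list[tuple[str, str]]:
    """Return (flag, 'fatal' or 'non-fatal') for every bit set in status."""
    decoded = []
    for bit in sorted(UNCORRECTABLE_BITS):
        mask = 1 << bit
        if status & mask:
            level = "fatal" if severity & mask else "non-fatal"
            decoded.append((UNCORRECTABLE_BITS[bit], level))
    return decoded


# Values copied from the AER Information block above.
flags = decode_uncorrectable(0x00014000, 0x00062010)
print(flags)  # [('CTO', 'non-fatal'), ('UC', 'non-fatal')]
```

With these values, both set bits come out as non-fatal in the severity register, which at least fits the section being logged as Recoverable. It does not explain why the record as a whole was escalated to Fatal.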