Debugging Analysis:
Code:
[COLOR="Red"]BugCheck 124[/COLOR], {[COLOR="Blue"]4[/COLOR], [COLOR="SeaGreen"]fffffa8007ae8038[/COLOR], 0, 0}
Probably caused by : GenuineIntel
The first parameter (4) indicates an uncorrectable PCI Express error, so the fault most likely lies with the PCIe bus itself or with a device connected to it.
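For reference, the first parameter of a Stop 0x124 can be mapped to its error source with a small lookup. This is only a sketch: the table follows my reading of Microsoft's Bug Check 0x124 documentation, and the names `WHEA_ERROR_SOURCES` and `describe_whea_source` are made up for illustration.

```python
# Hypothetical helper: map Stop 0x124 parameter 1 to a WHEA error source.
# Descriptions paraphrase the Bug Check 0x124 documentation.
# Assumption: the Itanium-only values (0x9 to 0xB) are left out.
WHEA_ERROR_SOURCES = {
    0x0: "Machine check exception",
    0x1: "Corrected machine check exception",
    0x2: "Corrected platform error",
    0x3: "Non-maskable interrupt (NMI)",
    0x4: "Uncorrectable PCI Express error",
    0x5: "Generic hardware error",
    0x6: "Initialization error",
    0x7: "BOOT error",
    0x8: "SCI generic error",
}


def describe_whea_source(param1: int) -> str:
    """Return a short description for bugcheck 0x124 parameter 1."""
    return WHEA_ERROR_SOURCES.get(param1, f"Unknown error source ({param1:#x})")


# Parameter 1 from the bugcheck line in this dump.
print(describe_whea_source(0x4))
```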
Code:
7: kd> [COLOR="SeaGreen"]!errrec fffffa8007ae8038[/COLOR]
===============================================================================
Common Platform Error Record @ fffffa8007ae8038
-------------------------------------------------------------------------------
Record Id : 01cee18371d6b18f
Severity : Fatal (1)
Length : 672
Creator : Microsoft
Notify Type : [COLOR="Red"]PCI Express Error[/COLOR]
Timestamp : 11/15/2013 1:12:43 (UTC)
Flags : 0x00000000
===============================================================================
Section 0 : PCI Express
-------------------------------------------------------------------------------
Descriptor @ fffffa8007ae80b8
Section @ fffffa8007ae8148
Offset : 272
Length : 208
Flags : 0x00000001 Primary
Severity : [COLOR="Red"]Recoverable[/COLOR]
Port Type : [COLOR="Red"]Root Port[/COLOR]
Version : 1.1
Command/Status: 0x4010/0x0507
Device Id :
VenId:DevId : 8086:340e <-- Device
Class code : 030400
Function No : 0x00
Device No : 0x07
Segment : 0x0000
Primary Bus : 0x00
Second. Bus : 0x00
Slot : 0x0000
Dev. Serial # : 0000000000000000
Express Capability Information @ fffffa8007ae817c
Device Caps : 00008021 Role-Based Error Reporting: 1
Device Ctl : 0107 ur FE NF CE
Dev Status : 0003 ur fe NF CE
Root Ctl : 0008 fs nfs cs
AER Information @ [COLOR="Blue"]fffffa8007ae81b8[/COLOR]
Uncorrectable Error Status : 00014000 ur ecrc mtlp rof UC ca CTO fcp ptlp sd dlp und
Uncorrectable Error Mask : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Severity : 00062010 ur ecrc [COLOR="Red"]MTLP[/COLOR] [COLOR="Red"]ROF[/COLOR] uc ca cto [COLOR="Red"]FCP[/COLOR] ptlp sd [COLOR="Red"]DLP[/COLOR] und <-- Error types classed as fatal (the Status line above shows what was actually logged)
Correctable Error Status : 00002000 ADV rtto rnro dllp tlp re
Correctable Error Mask : 00000000 adv rtto rnro dllp tlp re
[COLOR="Red"]Caps & Control[/COLOR] : 0000000e [COLOR="Blue"]ecrcchken[/COLOR][COLOR="Blue"] ecrcchkcap ecrcgenen[/COLOR] ecrcgencap FEP
[COLOR="Red"]Header Log [/COLOR] : 00000000 00000000 00000000 00000000 <-- Contains the Header for TLP error
Root Error Command : 00000000 fen nfen cen
Root Error Status : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
Correctable Error Source ID : 00,00,00
Correctable Error Source ID : 00,00,00
===============================================================================
Section 1 : Processor Generic
-------------------------------------------------------------------------------
Descriptor @ fffffa8007ae8100
Section @ fffffa8007ae8218
Offset : 480
Length : 192
Flags : 0x00000000
Severity : Informational
Proc. Type : x86/x64
Instr. Set : x64
CPU Version : 0x00000000000106a5
Processor ID : 0x0000000000000007
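The most relevant part here is the AER (PCI Express Advanced Error Reporting) information. It's been a while since I've seen a Stop 0x124 with a PCI/PCIe slot as the source of the problem. The Uncorrectable Error Status register (00014000) shows which errors were actually logged, here UC (Unexpected Completion) and CTO (Completion Timeout), while the Uncorrectable Error Severity register (00062010) marks which error types are treated as fatal. The flags set in the Severity register are described below.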
MTLP stands for Malformed TLP (Transaction Layer Packet). The Transaction Layer is responsible for sending and accepting TLPs, which are the packets used for I/O and memory reads and writes. The Header Log holds the header of the TLP associated with the error as four 32-bit DWORDs (16 bytes in total). Here all four DWORDs are zero, so no header was captured.
As I understand it, a receiver should not update its flow control information when it receives a malformed TLP. I'm guessing that a receiver which updates it anyway could go on to raise the flow control errors described below.
ROF is Receiver Overflow. Reading the specification, it is reported when a received TLP consumes more flow control credits than the receiver has advertised. The credit system is how a receiver tells the transmitter how much buffer space it has available for TLPs.
FCP is a Flow Control Protocol Error, which indicates that a flow control protocol rule has been broken.
DLP is a Data Link Protocol Error. The Data Link Layer sits between the Transaction Layer and the Physical Layer, and is primarily responsible for delivering TLPs between the two devices on a link.
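To double-check which flags are set in each AER register, the raw values can be decoded by hand. Below is a minimal sketch, assuming the bit layout of the Uncorrectable Error Status/Mask/Severity registers from the PCIe Base Specification; the table and function names are my own and not part of any debugger API.

```python
# Bit positions of the AER Uncorrectable Error Status/Mask/Severity registers,
# labelled with the abbreviations that !errrec prints (highest bit first).
UNCORRECTABLE_BITS = [
    (20, "ur", "Unsupported Request"),
    (19, "ecrc", "ECRC Error"),
    (18, "mtlp", "Malformed TLP"),
    (17, "rof", "Receiver Overflow"),
    (16, "uc", "Unexpected Completion"),
    (15, "ca", "Completer Abort"),
    (14, "cto", "Completion Timeout"),
    (13, "fcp", "Flow Control Protocol Error"),
    (12, "ptlp", "Poisoned TLP"),
    (5, "sd", "Surprise Down"),
    (4, "dlp", "Data Link Protocol Error"),
    (0, "und", "Undefined"),
]


def decode_uncorrectable(value: int) -> list:
    """Return the abbreviation of every bit that is set in the register value."""
    return [abbr.upper() for bit, abbr, _name in UNCORRECTABLE_BITS
            if value & (1 << bit)]


status = 0x00014000    # Uncorrectable Error Status from the dump
severity = 0x00062010  # Uncorrectable Error Severity from the dump

print("Logged errors:   ", decode_uncorrectable(status))
print("Treated as fatal:", decode_uncorrectable(severity))
```

Decoded this way, the Status value has UC and CTO set, while the Severity value has MTLP, ROF, FCP and DLP set, which matches the uppercase flags in the !errrec output.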
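Caps & Control corresponds to the Advanced Error Capabilities and Control Register, which holds the First Error Pointer along with the ECRC generation and checking bits. ECRC checking works on the Digest field of a TLP. In this dump the ECRC flags are printed in lowercase, which is how !errrec shows flags that are clear, so ECRC does not appear to be enabled. The value 0000000e is the First Error Pointer (14), which points at bit 14 of the Uncorrectable Error Status register, CTO.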
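The Caps & Control value can be split into its fields in the same way. This is a sketch that assumes the field layout given in the PCIe Base Specification (First Error Pointer in bits 4:0, ECRC bits in 5 to 8); `decode_aer_caps_control` is a hypothetical helper name.

```python
def decode_aer_caps_control(value: int) -> dict:
    """Split the Advanced Error Capabilities and Control value into fields."""
    return {
        "first_error_pointer": value & 0x1F,            # bits 4:0
        "ecrc_gen_capable": bool(value & (1 << 5)),     # bit 5
        "ecrc_gen_enabled": bool(value & (1 << 6)),     # bit 6
        "ecrc_check_capable": bool(value & (1 << 7)),   # bit 7
        "ecrc_check_enabled": bool(value & (1 << 8)),   # bit 8
    }


# Caps & Control value from the dump.
fields = decode_aer_caps_control(0x0000000E)
for name, value in fields.items():
    print(f"{name}: {value}")
```

For 0000000e this gives a First Error Pointer of 14 with all four ECRC bits clear.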
#############################################
Things to do:
Your VenID and DevID fields appear to point to your Intel 7500 Chipset PCIe Root Port. Do you have the latest chipset drivers installed?
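If you want to find that root port in Device Manager, the fields from the error record can be turned into the usual bus:device.function notation and a Windows-style hardware ID. A small sketch follows; the function names are made up, and I'm assuming the Primary Bus field is the bus the root port itself sits on.

```python
def pci_location(bus: int, device: int, function: int) -> str:
    """Format a PCI address as bus:device.function (hexadecimal)."""
    return f"{bus:02x}:{device:02x}.{function:x}"


def windows_hardware_id(ven_dev: str) -> str:
    """Turn a 'vendor:device' pair into a PCI\\VEN_xxxx&DEV_xxxx string."""
    vendor, device = ven_dev.split(":")
    return f"PCI\\VEN_{vendor.upper()}&DEV_{device.upper()}"


# Values taken from the Device Id part of the error record.
print(pci_location(0x00, 0x07, 0x00))
print(windows_hardware_id("8086:340e"))
```

The hardware ID string can be compared against the Hardware Ids property shown on the Details tab of a device in Device Manager.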
I would also suggest some graphics card testing. Please run the test for a few hours, and do not let the temperature exceed 80-90°C.
Graphics Card:
Warning
Read all the steps in the hardware test tutorials very carefully, as stress tests are designed to run components at their maximum capacity in order to expose failing or faulty hardware components.
Use the following program to monitor your hardware temperatures during the stress test:
Code:
7: kd> [COLOR="Red"][COLOR="SeaGreen"]!sysinfo machineid[/COLOR][/COLOR]
Machine ID Information [From Smbios 2.6, DMIVersion 0, Size=1818]
BiosMajorRelease = 8
BiosMinorRelease = 16
BiosVendor = Dell Computer Corporation
BiosVersion = A04
BiosReleaseDate = [COLOR="Red"]10/21/2010[/COLOR]
SystemManufacturer = Dell Inc.
SystemProductName = [COLOR="Red"]Studio XPS 9100[/COLOR]
SystemFamily = High-End Desktop
SystemVersion = 00
SystemSKU = 0
BaseBoardManufacturer = Dell Inc.
BaseBoardProduct = 05DN3X
BaseBoardVersion = A00
Your BIOS seems to be a little outdated (version A04, released 10/21/2010), so I would suggest checking for any BIOS updates. Make sure you flash the BIOS with the correct version for the exact model of your motherboard, otherwise you could completely corrupt the BIOS.
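To put a number on how old the BIOS was at the time of the crash, the two dates from the dump can simply be subtracted. A quick sketch, using the BiosReleaseDate from !sysinfo machineid and the date part of the error record's Timestamp:

```python
from datetime import date

bios_release = date(2010, 10, 21)  # BiosReleaseDate from !sysinfo machineid
crash_date = date(2013, 11, 15)    # Timestamp from the error record (UTC)

age_days = (crash_date - bios_release).days
print(f"BIOS age at time of crash: {age_days} days "
      f"(~{age_days / 365.25:.1f} years)")
```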
Do you suffer from power outages?