BlueRobot
Before reading: this tutorial is outdated. Please check my blog for the updated version, which explains all the errors and gathers all the references in one place: BSODTutorials (check November 2013).
**Please read corrections (See link)**
Vir Gnarus - Post #4 - Corrections
I understand that most of the current BSOD analysts on the forum already use and understand the more efficient way of analyzing Stop 0x124 crashes pointed out by Vir Gnarus. However, I would like to create a page that helps new BSOD analysts understand and use this method in their analysis.
Thanks to Vir Gnarus for explaining this method; he has already written a brilliant tutorial on Sysnative about debugging Stop 0x124 PCI errors (see External Links).
The WHEA_UNCORRECTABLE_ERROR bug check has a value of 0x00000124. This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
Here's the start of a 0x124 bugcheck (without any extensions used):
Code:
[COLOR=Red]BugCheck 124[/COLOR], {[COLOR=Blue]4[/COLOR], [COLOR=SeaGreen]fffffa800aaeb8d8[/COLOR], 0, 0}
Probably caused by : GenuineIntel
The text in red is the bugcheck code; you can use it to look up further information in the BSOD Index.
The text in blue is the first parameter of the bugcheck. It identifies the type of error source that reported the error; in this case it is 0x4, which indicates an uncorrectable PCI Express error.
:info: 0x4 can indicate a conventional PCI error as well as a PCI Express error.
The text in green is the second parameter: the address of the WHEA_ERROR_RECORD structure. We will pass this address to the !errrec extension to extract additional information.
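Because the parameters are just hexadecimal numbers, reading them can be scripted. The Python sketch below is a hypothetical helper, not part of WinDbg: it assumes you have copied the plain-text BugCheck line (without the colour tags) out of the debugger, and its lookup table only covers Parameter 1 values 0x0 to 0x5 (the first WHEA_ERROR_SOURCE_TYPE values in Microsoft's 0x124 reference). The names `parse_bugcheck` and `describe_0x124` are made up for illustration.

```python
import re

# Parameter 1 of WHEA_UNCORRECTABLE_ERROR (0x124) = error source type.
# Only the first six values are listed; anything else is reported as unknown.
ERROR_SOURCE_TYPES = {
    0x0: "Machine Check Exception",
    0x1: "Corrected Machine Check",
    0x2: "Corrected Platform Error",
    0x3: "Non-Maskable Interrupt (NMI)",
    0x4: "PCI Express Error",
    0x5: "Generic Error",
}

# Matches a plain-text line such as "BugCheck 124, {4, fffffa800aaeb8d8, 0, 0}".
BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(line):
    """Return (code, [arg1, arg2, arg3, arg4]), all parsed as hexadecimal."""
    match = BUGCHECK_RE.search(line)
    if match is None:
        raise ValueError("no BugCheck line found")
    code = int(match.group(1), 16)
    args = [int(arg.strip(), 16) for arg in match.group(2).split(",")]
    return code, args


def describe_0x124(line):
    """Name the error source (Parameter 1) and return the record address (Parameter 2)."""
    code, args = parse_bugcheck(line)
    if code != 0x124:
        raise ValueError("not a WHEA_UNCORRECTABLE_ERROR bugcheck")
    return {
        "error_source": ERROR_SOURCE_TYPES.get(args[0], "Other/unknown source type"),
        "whea_error_record": hex(args[1]),  # address to pass to !errrec
    }


if __name__ == "__main__":
    info = describe_0x124("BugCheck 124, {4, fffffa800aaeb8d8, 0, 0}")
    print(info["error_source"])
    print("!errrec " + info["whea_error_record"][2:])
```

Run against the example line, it prints `PCI Express Error` followed by `!errrec fffffa800aaeb8d8`, the command used in the next step.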
Code:
7: kd> [COLOR=seagreen]!errrec[/COLOR] [COLOR=seagreen]fffffa800aaeb8d8[/COLOR]
===============================================================================
Common Platform Error Record @ fffffa800aaeb8d8
-------------------------------------------------------------------------------
Record Id : 01cdc0c21e2fc73d
Severity : Fatal (1)
Length : 672
Creator : Microsoft
Notify Type : PCI Express Error
Timestamp : 11/12/2012 10:45:50 (UTC)
Flags : 0x00000000
===============================================================================
Section 0 : PCI Express
-------------------------------------------------------------------------------
Descriptor @ fffffa800aaeb958
Section @ fffffa800aaeb9e8
Offset : 272
Length : 208
Flags : 0x00000001 Primary
Severity : [COLOR=red]Fatal[/COLOR]
Port Type : [COLOR=Red]Root Port[/COLOR]
Version : 1.1
Command/Status: 0x0010/0x0000
Device Id :
VenId:DevId : [COLOR=Blue]8086[/COLOR]:[COLOR=blue]3405[/COLOR]
Class code : 030000
Function No : 0x00
Device No : 0x00
Segment : 0x0000
Primary Bus : 0x00
Second. Bus : 0x00
Slot : 0x0000
Dev. Serial # : 0000000000000000
Express Capability Information @ fffffa800aaeba1c
Device Caps : 00008020 Role-Based Error Reporting: 1
Device Ctl : 0000 ur fe nf ce
Dev Status : 0000 ur fe nf ce
Root Ctl : 0000 fs nfs cs
[COLOR=red]AER Information @ fffffa800aaeba58[/COLOR]
Uncorrectable Error Status : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Mask : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Severity : 00062010 ur ecrc [COLOR=red]MTLP[/COLOR] [COLOR=red]ROF[/COLOR] uc ca cto [COLOR=red]FCP[/COLOR] ptlp sd [COLOR=red]DLP[/COLOR] und
Correctable Error Status : 00000000 adv rtto rnro dllp tlp re
Correctable Error Mask : 00000000 adv rtto rnro dllp tlp re
Caps & Control : 00000000 ecrcchken ecrcchkcap ecrcgenen ecrcgencap fep
Header Log : 00000000 00000000 00000000 00000000
Root Error Command : 00000000 fen nfen cen
Root Error Status : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
Correctable Error Source ID : 00,00,00
Correctable Error Source ID : 00,00,00
===============================================================================
Section 1 : Processor Generic
-------------------------------------------------------------------------------
Descriptor @ fffffa800aaeb9a0
Section @ fffffa800aaebab8
Offset : 480
Length : 192
Flags : 0x00000000
Severity : Informational
Proc. Type : x86/x64
Instr. Set : x64
CPU Version : 0x00000000000106a5
Processor ID : 0x0000000000000007
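The capitalized abbreviations are simply the bits that are set in that register's value: 0x00062010 sets the MTLP (Malformed TLP), ROF (Receiver Overflow), FCP and DLP bits. Note, however, that the Uncorrectable Error Severity register only records which error types are treated as fatal; the errors that were actually detected are flagged in the Uncorrectable Error Status register, which is 00000000 in this dump.
In general, the abbreviations indicate:
- UR = Unsupported Request Error
- ECRC = ECRC Error
- MTLP = Malformed TLP
- ROF = Receiver Overflow
- UC = Unexpected Completion
- CA = Completer Abort
- CTO = Completion Timeout
- FCP = Flow Control Protocol Error
- PTLP = Poisoned TLP
- SD = Surprise Down
- DLP = Data Link Protocol Error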
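Each AER register value is a bitmask, so the highlighted abbreviations can be recovered from the hex value alone. The sketch below assumes the bit layout of the Uncorrectable Error Status/Mask/Severity registers from the PCI Express specification, with the lowercase tokens WinDbg prints; `set_flags` is a hypothetical helper name. Running it on the Severity and Status values from the dump shows which error types are marked fatal and which were actually logged.

```python
# Bit positions in the PCIe AER Uncorrectable Error Status/Mask/Severity
# registers, using the abbreviations WinDbg prints after each value.
AER_UNCORRECTABLE_BITS = {
    0: "und",    # Undefined
    4: "dlp",    # Data Link Protocol Error
    5: "sd",     # Surprise Down Error
    12: "ptlp",  # Poisoned TLP
    13: "fcp",   # Flow Control Protocol Error
    14: "cto",   # Completion Timeout
    15: "ca",    # Completer Abort
    16: "uc",    # Unexpected Completion
    17: "rof",   # Receiver Overflow
    18: "mtlp",  # Malformed TLP
    19: "ecrc",  # ECRC Error
    20: "ur",    # Unsupported Request Error
}


def set_flags(value):
    """Return the set bits as upper-case names, in ascending bit order."""
    return [name.upper()
            for bit, name in sorted(AER_UNCORRECTABLE_BITS.items())
            if (value >> bit) & 1]


if __name__ == "__main__":
    severity = 0x00062010  # Uncorrectable Error Severity from the dump
    status = 0x00000000    # Uncorrectable Error Status from the dump
    print("Treated as fatal:", set_flags(severity))  # ['DLP', 'FCP', 'ROF', 'MTLP']
    print("Detected:", set_flags(status))            # []
```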
However, the reported device is not always the actual cause; the fault may lie in the port it is using or in the motherboard. Stress tests and component swaps can be used to confirm the culprit.
For processor errors, the same extension (!errrec) can be used, although less information is displayed. I tend to just check the MCA (Processor Machine Check Architecture) section; here is an example:
Code:
===============================================================================
Section 2 : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor @ 86ca6a0c
Section @ 86ca6b94
Offset : 664
Length : 264
Flags : 0x00000000
Severity : [COLOR=red]Fatal[/COLOR]
Error : [COLOR=Red]BUSLG_SRC_ERR_*_NOTIMEOUT_ERR (Proc 1 Bank 0)[/COLOR]
Status : 0xb20000001040080f
I would like to thank Arc for pointing this out to me.
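The Error string is derived from the Status value, which is the processor's MCi_STATUS register. The sketch below decodes it under the assumption that the encoding in Intel's Software Developer's Manual (Vol. 3, Machine-Check Architecture chapter) applies; it only handles the architectural flag bits and the bus/interconnect error pattern, ignores model-specific bits, and all names in it are hypothetical. For 0xb20000001040080f it reports the VAL, UC, EN and PCC flags and a bus error with level LG, participation SRC, request ERR, transaction type OTHER and no timeout, which lines up with the BUSLG_SRC_ERR_*_NOTIMEOUT_ERR string shown above.

```python
# IA32_MCi_STATUS flag bits (Intel SDM Vol. 3, Machine-Check Architecture).
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context may be corrupt
}

# Sub-fields of the bus/interconnect error code, bits 11:0 = 1PPT RRRR IILL.
PARTICIPATION = {0b00: "SRC", 0b01: "RES", 0b10: "OBS", 0b11: "GENERIC"}
REQUEST = {0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD",
           0b0100: "DWR", 0b0101: "IRD", 0b0110: "PREFETCH",
           0b0111: "EVICT", 0b1000: "SNOOP"}
MEMORY_OR_IO = {0b00: "M", 0b01: "RESERVED", 0b10: "IO", 0b11: "OTHER"}
LEVEL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}


def decode_mci_status(status):
    """Decode the architectural parts of a 64-bit MCi_STATUS value (sketch)."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    mca_code = status & 0xFFFF  # bits 15:0, the MCA error code
    result = {"flags": flags, "mca_code": mca_code, "bus_error": None}
    # Bus/interconnect errors: bits 15:13 clear and bit 11 set (bit 12 not checked).
    if (mca_code & 0xE800) == 0x0800:
        result["bus_error"] = {
            "level": LEVEL[mca_code & 0b11],
            "participation": PARTICIPATION[(mca_code >> 9) & 0b11],
            "request": REQUEST.get((mca_code >> 4) & 0xF, "UNKNOWN"),
            "memory_or_io": MEMORY_OR_IO[(mca_code >> 2) & 0b11],
            "timeout": bool((mca_code >> 8) & 1),
        }
    return result


if __name__ == "__main__":
    decoded = decode_mci_status(0xB20000001040080F)
    print(decoded["flags"])  # ['VAL', 'UC', 'EN', 'PCC']
    print(decoded["bus_error"])
```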
Code:
[COLOR=Red]WHEA_UNCORRECTABLE_ERROR (124)[/COLOR]
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
[COLOR=Blue]Arg1: 00000000, Machine Check Exception[/COLOR]
Arg2: 86ca68fc, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000, High order 32-bits of the MCi_STATUS value.
Arg4: 00000000, Low order 32-bits of the MCi_STATUS value.
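Arg3 and Arg4 are simply the high and low 32-bit halves of MCi_STATUS, so the full value is `(Arg3 << 32) | Arg4`. In the output above both happen to be 00000000; the example below uses hypothetical halves of the 0xb20000001040080f value from the MCA section purely to show the arithmetic, and the function name is illustrative only.

```python
def mci_status_from_args(arg3, arg4):
    """Rebuild the 64-bit MCi_STATUS from Arg3 (high 32 bits) and Arg4 (low 32 bits)."""
    if not (0 <= arg3 <= 0xFFFFFFFF and 0 <= arg4 <= 0xFFFFFFFF):
        raise ValueError("each argument must fit in 32 bits")
    return (arg3 << 32) | arg4


if __name__ == "__main__":
    print(hex(mci_status_from_args(0xB2000000, 0x1040080F)))  # 0xb20000001040080f
```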
In such a situation, it is best to use these steps:
If all the hardware seems to be running stably and the tests report no errors, the motherboard may be bad.
External Links:
