Stop 0x124 - what it means and what to try
Synopsis:
A "stop 0x124" is fundamentally different to many other types of bluescreens because it stems from a hardware complaint. Stop 0x124 minidumps contain very little practical information, and it is therefore necessary to approach the problem as a case of hardware in an unknown state of distress.
Generic "Stop 0x124" Troubleshooting Strategy:
1) Ensure that none of the hardware components are overclocked. Hardware that is driven beyond its design specifications - by overclocking - can malfunction in unpredictable ways.
2) Ensure that the machine is adequately cooled. If there is any doubt, open up the side of the PC case (be mindful of any relevant warranty conditions!) and point a mains fan squarely at the motherboard. That will rule out most (lack of) cooling issues.
3) Update all hardware-related drivers: video, sound, RAID (if any), NIC... anything that interacts with a piece of hardware. It is good practice to run the latest drivers anyway.
4) Update the motherboard BIOS according to the manufacturer's instructions. Their website should provide detailed instructions as to the brand and model-specific procedure.
5) Rarely, bugs in the OS may cause "false positive" 0x124 events where the hardware wasn't complaining but Windows thought otherwise (because of the bug). At the time of writing, Windows 7 is not known to suffer from any such defects, but it is nevertheless important to always keep Windows itself updated.
6) Attempt to (stress) test those hardware components which can be put through their paces artificially. The most obvious examples are the RAM and HDD(s). For the RAM, use the in-built memory diagnostics (run MDSCHED) or the 3rd-party memtest86 utility to run many hours' worth of testing. For hard drives, check whether CHKDSK /R finds any problems on the drive(s), notably "bad sectors". Unreliable RAM, in particular, is deadly as far as software is concerned, and anything other than a 100% clear memory test result is cause for concern. Unfortunately, even a 100% clear result from the diagnostics utilities does not guarantee that the RAM is free from defects - only that none were encountered during the test passes.
7) As the last of the non-invasive troubleshooting steps, perform a "vanilla" reinstallation of Windows: just the OS itself without any additional applications, games, utilities, updates, or new drivers - NOTHING AT ALL that is not sourced from the Windows 7 disc. Should that fail to mitigate the 0x124 problem, jump to the next steps. Otherwise, if you run the "vanilla" installation long enough to convince yourself that not a single 0x124 crash has occurred, start installing updates and applications slowly, always pausing between successive additions long enough to get a feel for whether the machine is still free from 0x124 crashes. Should the crashing resume, the very last software addition(s) may be somehow linked to the root cause.
If stop 0x124 errors persist despite the steps above, and the hardware is under warranty, consider returning it and requesting a replacement which does not suffer periodic MCE events. Be aware that attempting the subsequent hardware troubleshooting steps may, in some cases, void your warranty:
8) Clean and carefully remove any dust from the inside of the machine. Reseat all connectors and memory modules. Use a can of compressed air to clean out the RAM DIMM sockets as much as possible.
9) If all else fails, start removing items of hardware one-by-one in the hope that the culprit is something non-essential which can be removed. Obviously, this type of testing is a lot easier if you've got access to equivalent components in order to perform swaps.
Should you find yourself in the situation of having performed all of the steps above without a resolution of the symptom, unfortunately the most likely reason is that the error message is literally correct - something is fundamentally wrong with the machine's hardware.
=====================================================
Background Information:
Windows passes on the hardware error report in the form of a "stop 0x124" because it can't do anything else once the hardware has signalled an uncorrectable fault condition. In technical terms, the vast majority of stop 0x124 crashes correspond to "Machine Check Exceptions" (MCEs) issued by the processor to alert the software to the existence of a hardware problem. It's possible for drivers to indirectly induce hardware to register MCEs by "driving" it in ways that are confusing to the hardware, but from a user's point of view that distinction is so subtle as to be invisible.
It is important to note that there are many different possible MCE triggers, and one machine's stop 0x124 is likely to be entirely different to another's. Hence, it is best not to place too much emphasis on very specialised ways in which other individuals have resolved their own 0x124 problems - the more exotic the other machine's MCE solution, the less likely it is to apply to your own setup.
It is possible - but painful - to interpret the hardware's error report. It's passed along in the so-called "MCi_Status" register, the contents of which are generally visible as bugcheck parameters 3 and 4 on the BSOD screen, as well as in the corresponding minidump.
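For anyone who wants to see the register as a single number, here is a minimal Python sketch of how the two bugcheck parameters could be joined. It assumes that parameter 3 carries the high 32 bits and parameter 4 the low 32 bits, which is how the machine-check flavour of this bugcheck is usually documented; treat that ordering as an assumption and check it against your own dump. The function name and the parameter values are made up for illustration.

```python
def combine_mci_status(param3, param4):
    """Join two 32-bit bugcheck parameters into one 64-bit MCi_STATUS value.

    Assumption: param3 holds the high 32 bits, param4 the low 32 bits.
    """
    for name, value in (("param3", param3), ("param4", param4)):
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(name + " must fit in 32 bits")
    return (param3 << 32) | param4


# Hypothetical parameter values, for illustration only.
status = combine_mci_status(0xB2000018, 0x06000E0F)
print("hex:    {:#018x}".format(status))
print("binary: {:064b}".format(status))
```

The binary form is the one that matters for interpretation, since each bit position has its own meaning.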
The trouble is that the hardware's complaints are almost never "practical", in the sense that they would explain what is wrong in layman's terms and include a recommendation for how to fix it. Instead, it's esoteric stuff which is intended for hardware specialists and driver developers.
Interpreting MCi_Status Contents:
This is not a viable troubleshooting methodology for most cases of stop 0x124 crashes, both because of the procedure's complexity and the impracticality of the resultant output. It is included here for the sake of completeness, and in case anyone should wish to go to the extreme in an attempt to understand recalcitrant stop 0x124 crashes on their machine.
Interpreting the numbers is a matter of consulting information published by Intel and AMD. The MCi_Status register contents are a bitmask, and each individual bit has a very specific meaning. Reference:
http://download.intel.com/design/pro...als/253668.pdf
http://www.amd.com/us-en/assets/cont...docs/24593.pdf
Machine Check Exception - Wikipedia, the free encyclopedia
As an example, a hypothetical stop 0x124 crash may pass on an MCi_Status from the hardware whose contents are below:
1011001000000000000000000001100000000110000000000000111000001111
3210987654321098765432109876543210987654321098765432109876543210
___6_________5_________4_________3_________2_________1
Interpretation is performed based on the position of each significant bit, starting from "63" on the far left and ending with bit "0" on the far right:
63: VAL - MCi_STATUS register valid
61: UC - Error uncorrected
60: EN - Error enabled
57: PCC - Processor context corrupt
36: component has received a parity error on the RS[2:0]# pins for a response transaction.
35: (Reserved)
27/26/25: Bus queue error type = "Response Parity Error" (011)
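Counting bit positions by eye is error-prone, so the lookup above can be sketched in a few lines of Python. This is only an illustration: the flag table covers just the four flags named in this post, the 27:25 field is read the way this example reads it (it is specific to the processor in question, not universal), and the helper names are invented here.

```python
# The example value from this post, regrouped into 4-bit nibbles for readability.
EXAMPLE_BITS = (
    "1011 0010 0000 0000 0000 0000 0001 1000 "
    "0000 0110 0000 0000 0000 1110 0000 1111"
)

# Flag positions as listed in this post (not a complete table).
FLAGS = {63: "VAL", 61: "UC", 60: "EN", 57: "PCC"}


def parse_status(bits):
    """Turn a string of 64 binary digits (spaces allowed) into an integer."""
    digits = bits.replace(" ", "")
    if len(digits) != 64 or set(digits) - {"0", "1"}:
        raise ValueError("expected exactly 64 binary digits")
    return int(digits, 2)


def set_bits(status):
    """Return the positions of all bits that are set, highest first."""
    return [pos for pos in range(63, -1, -1) if (status >> pos) & 1]


def decode_flags(status):
    """Return the names of the known flags that are set."""
    return [name for pos, name in FLAGS.items() if (status >> pos) & 1]


status = parse_status(EXAMPLE_BITS)
print("set bits:", set_bits(status))
print("flags:   ", decode_flags(status))
# Bits 27..25 as a 3-bit field, interpreted as in this post's example.
print("bits 27:25 = {:03b}".format((status >> 25) & 0b111))
```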
MCA [15:0]:
0000 1110 0000 1111
000F 1PPT RRRR IILL
F: "Normal" filtering (0)
PP: Generic (11)
T: Request did not time out (0)
RRRR: Generic Error (0000)
II: Other transaction (11)
LL: Memory hierarchy level "generic" (11)
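The field-by-field split of MCA [15:0] can be sketched the same way. The snippet below handles only the "000F 1PPT RRRR IILL" layout used in this example and rejects anything else; other error classes use different layouts, which is where the vendor manuals come in. The function name is made up, and the values are left as raw numbers rather than mapped to descriptions.

```python
def decode_bus_error(mca_code):
    """Split a 16-bit MCA error code of the form 000F 1PPT RRRR IILL."""
    if not 0 <= mca_code <= 0xFFFF:
        raise ValueError("MCA error code must fit in 16 bits")
    # Bits 15..13 must be zero and bit 11 must be one for this layout.
    if (mca_code >> 13) != 0 or ((mca_code >> 11) & 1) != 1:
        raise ValueError("not in 000F 1PPT RRRR IILL form")
    return {
        "F": (mca_code >> 12) & 0b1,        # filtering
        "PP": (mca_code >> 9) & 0b11,       # participation
        "T": (mca_code >> 8) & 0b1,         # timeout
        "RRRR": (mca_code >> 4) & 0b1111,   # request type
        "II": (mca_code >> 2) & 0b11,       # memory or I/O
        "LL": mca_code & 0b11,              # memory hierarchy level
    }


# The low 16 bits of the example value from this post.
fields = decode_bus_error(0b0000111000001111)
for name, value in fields.items():
    print("{}: {:b}".format(name, value))
```

For the example this gives F=0, PP=11, T=0, RRRR=0, II=11 and LL=11, matching the manual breakdown above.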
Last edited by H2SO4; 04 Nov 2009 at 05:49.