Recurring BSOD Problem


  1. ssu
    Posts : 3
    Windows 7 RC 64 Bit
       #1

    Recurring BSOD Problem


    I'm running Windows 7 64 bit, and I get a few blue screens every day (a variety of different ones). If anyone could help solve this problem it would be great. Attached is my minidump folder.

    Specs:
    AMD Phenom II x4 940 Black Edition
    Gigabyte GA-MA790GP-UD4H
    Radeon 4870 1GB
    OCZ Fatal1ty high performance DDR2 memory 4GB (2x2GB)
    Sound Blaster x-Fi titanium pci-e sound card
    Seagate Barracuda 7200 RPM 3.0Gb/s 16MB cache OEM


  2. kegobeer
    Posts : 2,913
    Windows 7 Ultimate x64 SP1
       #2

    Have you installed any drivers? Also, fill out your system specs in your profile.


  3. ssu
    Posts : 3
    Windows 7 RC 64 Bit
    Thread Starter
       #3

    kegobeer said:
    Have you installed any drivers? Also, fill out your system specs in your profile.
    I have updated my profile. Yes, I have installed drivers (the newest ones, I believe).


  4. H2SO4
    Posts : 1,377
    Win7x64
       #4

    ssu said:
    I'm running Windows 7 64 bit, and I get a few blue screens every day (a variety of different ones). If anyone could help solve this problem it would be great. Attached is my minidump folder.

    Specs:
    AMD Phenom II x4 940 Black Edition
    Gigabyte GA-MA790GP-UD4H
    Radeon 4870 1GB
    OCZ Fatal1ty high performance DDR2 memory 4GB (2x2GB)
    Sound Blaster x-Fi titanium pci-e sound card
    Seagate Barracuda 7200 RPM 3.0Gb/s 16MB cache OEM
    Short version: there's a high likelihood that your machine is experiencing hardware problems. Overclocking or under-cooling should both be ruled out first, then you'd want to run something like memtest for a long time to try to detect unreliable memory.

    ==========================

    Longer version: A whole bunch of different crashes, all seemingly related to memory and memory management.

    091609-18595-01.dmp is perhaps the most telling of the recent ones because it describes an issue called instruction pointer misalignment which is caused by a hardware-level defect in the vast majority of cases. In a nutshell, your machine is executing code which doesn't actually exist - instead of executing instruction 1, followed by instruction 2, followed by inst3, it instead tries to execute 1.5, followed by 2.4, followed by 3.6...

    To draw an analogy, film cameras have a winder motor which advances the film precisely one frame at a time while the camera is taking photos. Should the winder mechanism malfunction, the camera may end up exposing the adjacent portions of two separate frames during a single shot: it has lost track of where each frame begins on the film.

    Since that particular minidump is >99% hardware, there's very little point in performing in-depth software analysis of the others. While the hardware is unreliable, all software bets are off.


  5. kegobeer
    Posts : 2,913
    Windows 7 Ultimate x64 SP1
       #5

    Have you used this exact hardware with a different operating system before, and if so, did you have BSODs?


  6. ssu
    Posts : 3
    Windows 7 RC 64 Bit
    Thread Starter
       #6

    I've only tried the current OS. This computer was built around a month ago, and hasn't been overclocked. Are you saying the hardware is defective, or was it something I did (I've done nothing other than put it together, basically)?


  7. chuckr
    Posts : 1,112
    XP_Pro, W7_7201, W7RC.vhd, SciLinux5.3, Fedora12, Fedora9_2x, OpenSolaris_09-06
       #7

    H2SO4 said:
    091609-18595-01.dmp is perhaps the most telling of the recent ones because it describes an issue called instruction pointer misalignment which is caused by a hardware-level defect in the vast majority of cases.
    H2SO4,
    Just curious:

    What provides you the analysis of the 'mini-dump' -- Crash Analyzer in MS Debug?

    The IP used to be a CPU register.
    Malfunction here would execute various memory locations that actually exist (assuming correct fetches),
    just not in the correct, or intended order.

    Any links to the issue called instruction pointer misalignment?
    I'm not trying to be a wise-azz here, just trying to learn a little bit more.

    I seem to remember a SF member replacing a stock AMD multicore CPU. No o/c, etc.
    In and of itself, this seems highly unusual...

    Thank you,
    Chuck


  8. kegobeer
    Posts : 2,913
    Windows 7 Ultimate x64 SP1
       #8

    Since I don't analyze dump files, I would appreciate if you could post two or three of the BSOD crash screens/error codes.


  9. H2SO4
    Posts : 1,377
    Win7x64
       #9

    chuckr said:
    The IP used to be a CPU register.
    Malfunction here would execute various memory locations that actually exist (assuming correct fetches),
    just not in the correct, or intended order.

    Any links to the issue called instruction pointer misalignment?
    I'm not trying to be a wise-azz here, just trying to learn a little bit more.
    I salute you for being interested in old-school computery stuff :)

    I'll aim to explain the "what" first, then the "why"... Here's the stack of the crashing thread at the time the user sees blue:

    Code:
    0: kd> kv
    Child-SP          RetAddr           : Args to Child                                                           : Call Site
    fffff880`031af358 fffff800`02c884e9 : 00000000`0000000a 00000000`00000020 00000000`00000002 00000000`00000001 : nt!KeBugCheckEx
    fffff880`031af360 fffff800`02c87160 : 00000000`00000000 00000000`00000004 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x69
    fffff880`031af4a0 fffff880`014e7f4b : 00000000`00000001 fffff880`01687391 00000000`00000014 fffffa80`00000000 : nt!KiPageFault+0x260 (TrapFrame @ fffff880`031af4a0)
    fffff880`031af630 00000000`00000001 : fffff880`01687391 00000000`00000014 fffffa80`00000000 00000000`00000014 : ndis+0x1f4b
    fffff880`031af638 fffff880`01687391 : 00000000`00000014 fffffa80`00000000 00000000`00000014 00000000`00000010 : 0x1
    fffff880`031af640 00000000`00000014 : fffffa80`00000000 00000000`00000014 00000000`00000010 00000000`00000000 : tcpip+0x7f391
    fffff880`031af648 fffffa80`00000000 : 00000000`00000014 00000000`00000010 00000000`00000000 00000000`000007be : 0x14
    fffff880`031af650 00000000`00000014 : 00000000`00000010 00000000`00000000 00000000`000007be 00000000`00000078 : 0xfffffa80`00000000
    fffff880`031af658 00000000`00000010 : 00000000`00000000 00000000`000007be 00000000`00000078 fffff880`014028b1 : 0x14
    fffff880`031af660 00000000`00000000 : 00000000`000007be 00000000`00000078 fffff880`014028b1 fffff880`00000001 : 0x10
    And the register context:
    Code:
     
    0: kd> r
    rax=fffff880031af460 rbx=0000000000000004 rcx=000000000000000a
    rdx=0000000000000020 rsi=0000000000000000 rdi=0000000000000000
    rip=fffff80002c88f80 rsp=fffff880031af358 rbp=fffff880031af520
    r8=0000000000000002  r9=0000000000000001 r10=fffff880014e7f4b
    r11=fffff6fb40000000 r12=0000000000000000 r13=0000000000000000
    r14=0000000000000000 r15=0000000000000000
    iopl=0         nv up ei ng nz na po nc
    cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000286
    nt!KeBugCheckEx:
    fffff800`02c88f80 48894c2408      mov     qword ptr [rsp+8],rcx ss:0018:fffff880`031af360=000000000000000a
    But that's not the crashing code - that's the "last resort" mechanism which elevates IRQL to the highest level (HIGH_LEVEL, which is 15 on x64; 31 is the x86 value), thereby interrupting any and all other activity in order to dump physical memory to disk because an irrecoverable error has occurred.

    To see the crashing code, we'd have to set the register context as it looked at the time of the fault. Thankfully, the OS dumps a trap frame (the "TrapFrame @ fffff880`031af4a0" note on the nt!KiPageFault line of the stack above) which records that info. To change the context:

    Code:
     
    0: kd> .trap fffff880`031af4a0
    NOTE: The trap frame does not contain all registers.
    Some register values may be zeroed or incorrect.
    rax=0000000000000008 rbx=0000000000000000 rcx=fffffa8004861b80
    rdx=0000000000000001 rsi=0000000000000000 rdi=0000000000000000
    rip=fffff880014e7f4b rsp=fffff880031af630 rbp=fffffa8004861a00
    r8=0000000000000000 r9=0000000000000000 r10=fffffa8004861c00
    r11=fffff880031af938 r12=0000000000000000 r13=0000000000000000
    r14=0000000000000000 r15=0000000000000000
    iopl=0 nv up ei pl nz na pe cy
    ndis+0x1f4b:
    fffff880`014e7f4b 896e20 mov dword ptr [rsi+20h],ebp ds:5c70:00000000`00000020=????????
    OK, so the immediate cause of the crash is an instruction supposedly at ndis+0x1f4b which tries to move the contents of EBP into the memory address given by RSI plus 0x20 bytes. Trouble is, RSI happens to be zero at the time, so the instruction turns into an attempt to write to memory address 0x20, which is not only in the user-mode address range but also within the first 64KB "guard" block which is always designated no-access in order to catch null pointer references.
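
    To make the arithmetic concrete, here is a rough Python sketch (not something the debugger produces). The register value and displacement are copied from the .trap output above; the 64KB guard size is the figure quoted in this post, and the helper names are made up for illustration.

```python
# Values copied from the .trap output above.
RSI = 0x0000000000000000
DISPLACEMENT = 0x20

# Assumption taken from the post: the lowest 64 KB of the address space
# is kept no-access so that null-pointer dereferences fault immediately.
NULL_GUARD_SIZE = 64 * 1024


def effective_address(base: int, displacement: int) -> int:
    """Address used by an operand of the form [base + displacement]."""
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF


def in_null_guard(address: int) -> bool:
    """True if the address falls inside the assumed 64 KB guard block."""
    return address < NULL_GUARD_SIZE


target = effective_address(RSI, DISPLACEMENT)
print(hex(target), in_null_guard(target))
```

    With RSI at zero the write lands at 0x20, well inside the guard block, which is why the page fault is unavoidable once this instruction runs.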

    Note the (64-bit) RIP instruction pointer though: rip=fffff880014e7f4b

    So the offending instruction which causes the crash (ndis+0x1f4b in the trap frame output) is supposedly at that address, pointed to by RIP. Let's disassemble the code around that address:


    Code:
     
    0: kd> u fffff880`014e7f4b
    ndis+0x1f4b:
    fffff880`014e7f4b 896e20 mov dword ptr [rsi+20h],ebp
    fffff880`014e7f4e 48897e08 mov qword ptr [rsi+8],rdi
    fffff880`014e7f52 48897e10 mov qword ptr [rsi+10h],rdi
    fffff880`014e7f56 48897e18 mov qword ptr [rsi+18h],rdi
    fffff880`014e7f5a 48897e38 mov qword ptr [rsi+38h],rdi
    fffff880`014e7f5e 48897e30 mov qword ptr [rsi+30h],rdi
    fffff880`014e7f62 48897e68 mov qword ptr [rsi+68h],rdi
    fffff880`014e7f66 48897e60 mov qword ptr [rsi+60h],rdi
    For reasons not simple to explain succinctly, that looks a little odd. Let's start the disassembly a bit higher up:

    Code:
     
    0: kd> u fffff880`014e7f40
    ndis+0x1f40:
    fffff880`014e7f40 f6 ???
    fffff880`014e7f41 0f8413840100 je ndis+0x1a35a (fffff880`0150035a)
    fffff880`014e7f47 48893e mov qword ptr [rsi],rdi
    fffff880`014e7f4a 48896e20 mov qword ptr [rsi+20h],rbp
    fffff880`014e7f4e 48897e08 mov qword ptr [rsi+8],rdi
    fffff880`014e7f52 48897e10 mov qword ptr [rsi+10h],rdi
    fffff880`014e7f56 48897e18 mov qword ptr [rsi+18h],rdi
    fffff880`014e7f5a 48897e38 mov qword ptr [rsi+38h],rdi
    Hold on, that looks different, although it's still not right because I happen to have arbitrarily started disassembly in the middle of an instruction. Let's go back a little more:

    Code:
     
    0: kd> u fffff880`014e7f3a
    ndis+0x1f3a:
    fffff880`014e7f3a 8b6c2430 mov ebp,dword ptr [rsp+30h]
    fffff880`014e7f3e 4885f6 test rsi,rsi
    fffff880`014e7f41 0f8413840100 je ndis+0x1a35a (fffff880`0150035a)
    fffff880`014e7f47 48893e mov qword ptr [rsi],rdi
    fffff880`014e7f4a 48896e20 mov qword ptr [rsi+20h],rbp
    fffff880`014e7f4e 48897e08 mov qword ptr [rsi+8],rdi
    fffff880`014e7f52 48897e10 mov qword ptr [rsi+10h],rdi
    fffff880`014e7f56 48897e18 mov qword ptr [rsi+18h],rdi
    That looks like real code. There's a "test" followed by a conditional jump (je), and then we're moving a bunch of structured info into memory starting with [RSI].

    Note that there is no instruction at 0xfffff880`014e7f4b! There are (legit) instructions starting at '4a and '4e, but the attempt to execute an instruction starting at '4b (the address in RIP) leads to an inevitable mangling of the instruction itself.

    Instead of this:
    fffff880`014e7f4a 48896e20 mov qword ptr [rsi+20h],rbp


    ...RIP is one byte too far along and we're executing an instruction which doesn't exist:

    fffff880`014e7f4b 896e20 mov dword ptr [rsi+20h],ebp
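
    The same check can be done mechanically. Below is a small Python sketch, assuming only the addresses and byte strings shown in the "u fffff880`014e7f3a" listing above; the function name is hypothetical. Each instruction's length is simply the number of encoded bytes, so we can list the valid instruction starts and see where the faulting RIP falls.

```python
# (address, encoded bytes) pairs copied from the "u fffff880`014e7f3a" listing.
LISTING = [
    (0xFFFFF880014E7F3A, "8b6c2430"),
    (0xFFFFF880014E7F3E, "4885f6"),
    (0xFFFFF880014E7F41, "0f8413840100"),
    (0xFFFFF880014E7F47, "48893e"),
    (0xFFFFF880014E7F4A, "48896e20"),
    (0xFFFFF880014E7F4E, "48897e08"),
    (0xFFFFF880014E7F52, "48897e10"),
    (0xFFFFF880014E7F56, "48897e18"),
]

# RIP as recorded in the trap frame.
FAULTING_RIP = 0xFFFFF880014E7F4B


def find_containing(listing, rip):
    """Return (start, offset) of the listed instruction containing rip, or None."""
    for start, hex_bytes in listing:
        length = len(hex_bytes) // 2  # two hex digits per byte
        if start <= rip < start + length:
            return start, rip - start
    return None


starts = {address for address, _ in LISTING}
start, offset = find_containing(LISTING, FAULTING_RIP)

print(FAULTING_RIP in starts)  # is RIP on an instruction boundary?
print(hex(start), offset)      # which instruction it is inside, and how far in
```

    RIP is not among the instruction starts; it sits one byte into the instruction at '4a, which is exactly the "one byte too far along" described above.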

    And as for the "why"... (to be continued)


  10. H2SO4
    Posts : 1,377
    Win7x64
       #10

    (continuation from previous lengthy post)

    Remember that on IA-32 and AMD64/EM64T architectures, ordinary sequential code doesn't directly set RIP. Instead, the hardware itself has the job of scanning through the memory which holds the instruction sequence and determining where one instruction ends and the next one begins (the instructions are variable-length). Every time it concludes that it has isolated the next instruction and set in motion its execution, RIP is automatically updated without any involvement from software. In other words, it is almost impossible for software bugs to cause RIP to point in between two instructions because it's the hardware which moves RIP around.
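
    To illustrate why the starting byte matters, here is a deliberately tiny Python decoder. It is a sketch, not how the CPU or WinDBG works internally: it assumes 64-bit mode and handles only opcode 0x89 in its [base+disp8] form without a SIB byte, which is enough for the bytes 48 89 6e 20 from the listing. The function and table names are made up for this example.

```python
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
          "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]


def format_disp(disp: int) -> str:
    """Format a displacement roughly the way the listing does (+20h, +8)."""
    sign = "+" if disp >= 0 else "-"
    mag = abs(disp)
    return f"{sign}{mag:x}h" if mag > 9 else f"{sign}{mag}"


def decode_mov_store(code: bytes):
    """Decode one 'mov [base+disp8], reg' (opcode 0x89) in 64-bit mode.

    Returns (text, length). Anything outside this subset raises ValueError.
    """
    pos = 0
    rex = 0
    if 0x40 <= code[pos] <= 0x4F:  # optional REX prefix
        rex = code[pos]
        pos += 1
    if code[pos] != 0x89:
        raise ValueError("only opcode 0x89 is handled in this sketch")
    modrm = code[pos + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only [base+disp8] without a SIB byte is handled")
    disp = int.from_bytes(bytes([code[pos + 2]]), "little", signed=True)
    wide = bool(rex & 0x08)        # REX.W selects a 64-bit operand
    reg += 8 if rex & 0x04 else 0  # REX.R extends the source register field
    rm += 8 if rex & 0x01 else 0   # REX.B extends the base register field
    size = "qword" if wide else "dword"
    source = (REGS64 if wide else REGS32)[reg]
    text = f"mov {size} ptr [{REGS64[rm]}{format_disp(disp)}],{source}"
    return text, pos + 3


encoded = bytes.fromhex("48896e20")
print(decode_mov_store(encoded))      # decoding from ...f4a
print(decode_mov_store(encoded[1:]))  # decoding from ...f4b, one byte late
```

    Starting one byte late skips the 48 prefix, so the same remaining bytes decode as a 32-bit store of EBP instead of a 64-bit store of RBP, matching the two disassemblies in the previous post.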

    One class of software bug which can lead to RIP getting whacked is when an errant memory write mangles the contents of a thread's CONTEXT structure - the one which contains the values of its registers, especially while the thread is parked and waiting for processor time. For example, if my buggy driver happens to accidentally throw a wild write to location XYZ, and there happens to be a CONTEXT structure stored thereabouts, then the victim thread could have one or more of its register contents - including RIP - point to something totally invalid the next time it gets scheduled for proc time.

    That is exceptionally rare though, and when it does happen the results are almost always stupidly inaccurate, instead of being off by one byte as in this case. In situations like this one where RIP is only off by one or a few bytes, the cause is almost certainly unreliable hardware. A transistor farted somewhere.

    In case you should think I go through this rigmarole for every minidump, let me assure you all I did was to double-click the file. Because I've got the debugger set up quite nicely, it opens the dump, tries to have an automated go at deducing the cause of any crash conditions it finds in there, and spits out a text result thus:

    SYMBOL_NAME: ndis+1f4b
    FOLLOWUP_NAME: MachineOwner
    IMAGE_NAME: hardware
    DEBUG_FLR_IMAGE_TIMESTAMP: 0
    MODULE_NAME: hardware
    FAILURE_BUCKET_ID: X64_IP_MISALIGNED
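
    If you save that text and want to triage a pile of dumps, something like the following Python sketch could flag the ones the debugger blames on hardware. The field names are the ones shown above; the helper names and the rule itself are only assumptions for illustration, not an official interpretation of the output.

```python
# Text copied from the automated analysis result above.
ANALYSIS = """\
SYMBOL_NAME: ndis+1f4b
FOLLOWUP_NAME: MachineOwner
IMAGE_NAME: hardware
DEBUG_FLR_IMAGE_TIMESTAMP: 0
MODULE_NAME: hardware
FAILURE_BUCKET_ID: X64_IP_MISALIGNED
"""


def parse_fields(text: str) -> dict:
    """Split 'KEY: value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def points_at_hardware(fields: dict) -> bool:
    """Hypothetical rule: module is 'hardware' or the bucket mentions IP_MISALIGNED."""
    return (fields.get("MODULE_NAME") == "hardware"
            or "IP_MISALIGNED" in fields.get("FAILURE_BUCKET_ID", ""))


fields = parse_fields(ANALYSIS)
print(fields["FAILURE_BUCKET_ID"], points_at_hardware(fields))
```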

    I wrote up a short intro on how to get WinDBG to do the heavy lifting over on vistax64.com:

    http://www.vistax64.com/tutorials/22...k-process.html

    I enjoy this stuff so if there's something else you wanted to discuss - let's talk :)


 