Win7 x64 - Plethora of BSODs - At my wit's end


  1. Posts : 15
    Win7 x64
       #1

    Win7 x64 - Plethora of BSODs - At my wit's end


    As the topic says - I'm at my wit's end, and this is why:

    • Bought a new computer, hand picked quality components, assembled with care
    • Win7 x64 [retail] installs without problems, but is as unstable as can be
    • The symptoms include - but are not limited to - the following:
    - perhaps the most noticeable: an almost guaranteed BSOD after about 10 minutes of uptime whenever the machine has been switched off for a longer period
    - occasional random BSODs during normal operation
    - crashing Windows components and applications, including MSIE, Firefox, Skype and Live Messenger
    - basically every single application has crashed at least once
    - sometimes an app keeps crashing immediately every time I try to start it, only to work normally again after the next reboot
    - it may even happen that right after logging on, I get a cascade of error messages from all startup apps as well as from numerous Windows components that are starting up, such as services, and all that's left to do is to reboot
    • I've tried all I could think of - including, but -again- not limited to the following:
    - BIOS update
    - vanilla install
    - like above, but with latest windows updates
    - like above, but with latest drivers for mainboard and video card
    - surface scans of all hard disks
    - thorough tests with memtest86+ (twice, 12+h each, not a single error)
    - hour-long stress test with Everest Ultimate
    - removal of all non-essential peripherals
    • An excerpt of my BSODs, loosely sorted by frequency:
    - 0x19 BAD_POOL_HEADER
    - 0x24 NTFS_FILE_SYSTEM
    - 0x1A MEMORY_MANAGEMENT
    - 0x0A IRQL_NOT_LESS_OR_EQUAL
    - 0x3B SYSTEM_SERVICE_EXCEPTION
    - 0x50 PAGE_FAULT_IN_NONPAGED_AREA
    - 0x1E KMODE_EXCEPTION_NOT_HANDLED
    - 0xC5 DRIVER_CORRUPTED_EXPOOL
    I have attached an archive of the 21 minidumps that I have collected in the past 8 days since my latest OS re-install. Please let me know if you need any further details; I will be happy to reply asap.

    Until then, I'm more than curious to hear your opinions and experiences, basically anything that will help me establish and maintain a permanently stable situation!

    ~sahib


  2. Posts : 7,878
    Windows 7 Ultimate x64
       #2

    Have you tried any other operating system on the computer in question? For example, Vista 64-bit?

    It seems you have covered a majority of the steps that I would recommend to further troubleshoot this. I'll take a look at your dump files a bit later when I am off this Linux machine and back on a Windows machine that can properly read these dump files.


  3. Posts : 15
    Win7 x64
    Thread Starter
       #3

    pparks1 said:
    Have you tried any other operating system on the computer in question? For example, Vista 64-bit?
    I haven't done so yet. I do not own a copy of Vista, and while installing 32-bit XP on this machine would feel like an acknowledgement of defeat, this might just be my next task, simply because I'm running out of options...


  4. Posts : 1,377
    Win7x64
       #4

    Exemplary problem description!

    Unfortunately, by far the most likely explanation is that the hardware is unreliable, and indeed many of the minidumps are strongly suggestive of such a situation.

    The crashes vary in exact type, but a lot of them occur while attempting to manage the transition between virtual and physical memory, which makes me suspect that bad or mismatched RAM is at least part of the problem.


  5. Posts : 15
    Win7 x64
    Thread Starter
       #5

    H2SO4 said:
    Unfortunately, by far the most likely explanation is that the hardware is unreliable, and indeed many of the minidumps are strongly suggestive of such a situation.
    Good morning, and thank you for your assessment of the situation.

    My day began with another BSOD [0xC5 DRIVER_CORRUPTED_EXPOOL, minidump attached to post], which occurred after 11 minutes of uptime. Before rebooting, I moved the RAM from slots 0+1 to slots 2+3, something which I had not tried before.

    H2SO4 said:
    a lot of [the crashes] occur while attempting to manage the transition between virtual and physical memory, which makes me suspect that bad or mismatched RAM is at least part of the problem.
    Assuming the transition between virtual and physical memory to be the cause of my troubles would narrow down the search to these components:

    • faulty hard drive
    - the HDD containing the OS partition is brand new
    - surface scans did not reveal any bad sectors
    - I moved the swap file to a dedicated partition of the secondary HDD, which did not help at all
    • faulty memory
    - the memory is brand new
    - extensive testing with memtest86+, of both DIMMs together as well as each of them separately, ran for 12+ hours without errors
    - I moved the DIMMs from banks 0+1 to 2+3 (as mentioned above), the effect remains to be seen
    • faulty drivers
    - this is where I'm rather clueless
    - could the SATA drivers have any impact, be it the win7 native drivers or the ones included with the mainboard drivers?
    - several of the BSODs appear to be related to drivers, judging by the description of the STOP messages
    Is there anything you would recommend I check before I attempt yet another clean install of either win7 x86 (to rule out x64-related causes) or winXP x86 (to rule out win7-related causes)?


  6. Posts : 1,377
    Win7x64
       #6

    This may turn into a long post, so I'll list my suggestions first and then explain them down below:

    1) On an elevated (run as admin) CMD prompt:

    VERIFIER /FLAGS 1 /ALL
    <reboot>

    Then, wait for any other crashes to occur and upload the minidumps.
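For readers wondering what "/FLAGS 1" selects: the value is a bitmask of Driver Verifier options, and 1 sets only the lowest bit, special pool, which is the mode explained further down in this post; /ALL applies it to every loaded driver. The minimal sketch below decodes such a mask. The function name is hypothetical, and only the low six option bits are included, so treat the table as partial and check it against `verifier /?` before relying on it.

```python
# Hypothetical helper: decode a Driver Verifier /FLAGS bitmask.
# Partial table: only the low six option bits are listed here.
VERIFIER_FLAGS = {
    0x01: "Special pool",
    0x02: "Force IRQL checking",
    0x04: "Low resources simulation",
    0x08: "Pool tracking",
    0x10: "I/O verification",
    0x20: "Deadlock detection",
}


def decode_verifier_flags(mask: int) -> list[str]:
    """Return the names of the known options set in `mask`, lowest bit first."""
    names = [name for bit, name in sorted(VERIFIER_FLAGS.items()) if mask & bit]
    # Any bits not covered by the partial table are reported, not silently dropped.
    unknown = mask & ~sum(VERIFIER_FLAGS)
    if unknown:
        names.append(f"other bits: {unknown:#x}")
    return names


if __name__ == "__main__":
    # VERIFIER /FLAGS 1 /ALL enables only bit 0.
    print(decode_verifier_flags(1))  # ['Special pool']
```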


    ===============================================

    Details:

    You're obviously highly methodical, which makes me suspect that you've done the job right when you say that you've already completed BIOS updates and multiple "vanilla" reinstalls. Unfortunately, the fact that crashes persisted after all that virtually proves that you've got hardware problems, despite the memory and HDD diagnostics coming back clean. Lack of detection doesn't always mean there's no hardware problem.

    Many of your crashes fit patterns commonly observed when RAM is unreliable, and many of the others are "exotic" in the sense that they just shouldn't be happening - ever - on a vanilla install, unless again there's an unspecified hardware defect.

    The virtual/physical transition layer that I mentioned doesn't actually involve the HDD, at least not directly. Contrary to popular belief, "virtual memory" is not "the pagefile", but the type of virtualised address space that all applications get to see and experience, parts of which are indeed sometimes paged out to the pagefile. The OS component called the "memory manager" (Mm) has the job of translating, if you like, between the virtual memory (VM) references and the physical memory which backs a particular committed VM page. That's a long-winded way of saying "on your machine the Mm causes a crash sometimes when it tries to touch physical memory, again suggesting RAM issues".
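To make the virtual-to-physical "translation" more concrete: on x64, a virtual address is not looked up as a whole but is split into four 9-bit indices, one per level of page tables (PML4, PDPT, PD, PT), plus a 12-bit byte offset inside a 4 KiB page. The CPU walks those tables, and the memory manager is the component that builds and maintains them. The sketch below only illustrates that generic x64 4-level layout as used by Windows 7 x64; the function name is made up and this is not Windows code.

```python
def split_x64_va(va: int) -> dict[str, int]:
    """Split a canonical x64 virtual address into 4-level paging indices.

    Each table level holds 512 entries, so each index is 9 bits wide; the
    low 12 bits are the byte offset within a 4 KiB page.
    """
    va &= (1 << 64) - 1
    # Bits 63..48 must all equal bit 47, otherwise the address is non-canonical.
    if va >> 47 not in (0, (1 << 17) - 1):
        raise ValueError(f"non-canonical address: {va:#x}")
    return {
        "pml4": (va >> 39) & 0x1FF,  # bits 47..39
        "pdpt": (va >> 30) & 0x1FF,  # bits 38..30
        "pd": (va >> 21) & 0x1FF,    # bits 29..21
        "pt": (va >> 12) & 0x1FF,    # bits 20..12
        "offset": va & 0xFFF,        # bits 11..0
    }


if __name__ == "__main__":
    # Kernel base address taken from the debugger output later in this thread.
    print(split_x64_va(0xFFFFF80001A4B000))
    # {'pml4': 496, 'pdpt': 0, 'pd': 13, 'pt': 75, 'offset': 0}
```

A bad bit anywhere along this chain (in a table entry, or in the RAM holding it) sends the walk to the wrong physical page, which is why flaky RAM tends to surface as crashes inside memory-manager routines.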

    At present, the installation is NOT "vanilla", if only because of a few drivers such as this inexplicable one, which apparently has something to do with a music jukebox app:

    Image path: \SystemRoot\System32\Drivers\PxHlpa64.sys
    Timestamp: Tue Dec 11 10:49:01 2007 (475DD06D)

    Why a jukebox app would need a kernel-mode driver is beyond me, but its presence invalidates the proposition that the OS is entirely "clean" at present, and that all kernel-mode memory corruption must be either hardware or bugs in the OS.

    The "verifier" command I suggested above will enable a mode of operation known as "special pool". After the reboot, the OS will pay closer attention to pool memory allocations and trigger a crash as soon as it notices a driver doing something wrong, instead of the default status quo where such corruption may go completely undetected until another hapless component stumbles upon it and causes a secondary-effect crash.
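The difference between "crash immediately" and "go undetected until someone else stumbles upon it" can be shown with a toy model. It assumes the usual description of special pool: each allocation is placed flush against the end of its own page, and the following page is left inaccessible, so the first byte written past the block faults at once. The class names are hypothetical and the model leaves out pool headers, tags and everything else the real pool has.

```python
PAGE_SIZE = 4096


class GuardPageFault(Exception):
    """Stands in for the immediate crash special pool produces on an overrun."""


class PackedPool:
    """Toy default pool: blocks sit back to back in one page.

    Only the page as a whole is bounds-checked, so writing past the end of
    one block silently overwrites its neighbour.
    """

    def __init__(self) -> None:
        self.page = bytearray(PAGE_SIZE)
        self.next_free = 0

    def alloc(self, size: int) -> int:
        if self.next_free + size > PAGE_SIZE:
            raise MemoryError("toy page exhausted")
        addr = self.next_free
        self.next_free += size
        return addr

    def write(self, addr: int, data: bytes) -> None:
        if addr + len(data) > PAGE_SIZE:
            raise GuardPageFault("ran off the end of the page")
        self.page[addr:addr + len(data)] = data


class SpecialPool:
    """Toy special pool: each block ends exactly at a page boundary,
    and the page after it is treated as a no-access guard page."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytearray] = {}
        self.pages_used = 0

    def alloc(self, size: int) -> int:
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("toy special pool only handles sub-page blocks")
        # Two pages per allocation: one data page, one guard page.
        addr = self.pages_used * 2 * PAGE_SIZE + (PAGE_SIZE - size)
        self.pages_used += 1
        self.blocks[addr] = bytearray(size)
        return addr

    def write(self, addr: int, data: bytes) -> None:
        block = self.blocks[addr]  # toy restriction: writes start at a block's start
        if len(data) > len(block):
            raise GuardPageFault(
                f"overrun of {len(data) - len(block)} byte(s) caught immediately")
        block[:len(data)] = data


def demo() -> tuple[bytes, str]:
    packed = PackedPool()
    a = packed.alloc(8)
    b = packed.alloc(8)
    packed.write(b, b"BBBBBBBB")
    packed.write(a, b"A" * 12)  # buggy "driver": 12 bytes into an 8-byte block
    neighbour = bytes(packed.page[b:b + 8])  # damaged, yet nothing complained

    special = SpecialPool()
    c = special.alloc(8)
    try:
        special.write(c, b"A" * 12)
        outcome = "overrun went unnoticed"
    except GuardPageFault as exc:
        outcome = str(exc)
    return neighbour, outcome


if __name__ == "__main__":
    print(demo())  # (b'AAAABBBB', 'overrun of 4 byte(s) caught immediately')
```

In the packed case the damage only shows up later, when whoever owns block `b` reads it back; that delayed, second-hand crash is exactly what makes normal minidumps point at innocent components.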

    Your last minidump suggests that the OS itself either caused or stumbled upon corrupted pool memory:

    1: kd> k
    Child-SP RetAddr Call Site
    fffff880`0b791598 fffff800`01a8e469 nt!KeBugCheckEx
    fffff880`0b7915a0 fffff800`01a8d0e0 nt!KiBugCheckDispatch+0x69
    fffff880`0b7916e0 fffff800`01bc190d nt!KiPageFault+0x260
    fffff880`0b791870 fffff800`01daa4c9 nt!ExAllocatePoolWithTag+0x53d
    fffff880`0b791960 fffff800`01da6322 nt!MiMapViewOfImageSection+0x199
    fffff880`0b791aa0 fffff800`01d3c112 nt!MiMapViewOfSection+0x372
    fffff880`0b791b90 fffff800`01d3ac47 nt!PspMapSystemDll+0xa6
    fffff880`0b791c30 fffff800`01d3aa5c nt!PsMapSystemDlls+0x5b
    fffff880`0b791ca0 fffff800`01d3bbdf nt!MmInitializeProcessAddressSpace+0x440
    fffff880`0b791db0 fffff800`01d39524 nt!PspAllocateProcess+0x6b3
    fffff880`0b792080 00000000`00000000 nt!NtCreateUserProcess+0x4a3


    The Mm's attempt to load a DLL (nt!MiMapViewOfSection) into the address space of a new process being created (nt!PspAllocateProcess) led to the need for allocating pool memory to store some of the content. In turn, that request for pool led to a wild write attempt to an invalid location, either because the pool metadata structures had already been corrupted by another driver (remember the OS is not "vanilla" at this time), or because the hardware is faulty. The third (unspoken) possibility is that the OS is buggy, but I sincerely recommend you don't focus on that because the routines in question are traversed millions of times per day on each one of our machines, and others (with healthy hardware) don't experience this morass of problems that you find yourself in.

    In short, if you've done the vanilla reinstalls and BIOS updates only to have this happen afterwards, the hardware is indeed broken.


  7. Posts : 15
    Win7 x64
    Thread Starter
       #7

    First of all, thank you very much for your advice, and for taking the time to reply in such detail. I am delighted to have found a forum with users as helpful and competent as you!

    H2SO4 said:
    1) On an elevated (run as admin) CMD prompt:

    VERIFIER /FLAGS 1 /ALL
    <reboot>

    Then, wait for any other crashes to occur and upload the minidumps.
    I have done as you suggested - configured the driver verifier manager, followed by a reboot. I'll post any future BSODs' minidumps as soon as they happen.

    H2SO4 said:
    At present, the installation is NOT "vanilla", if only because of a few drivers such as this inexplicable one, which apparently has something to do with a music jukebox app:

    Image path: \SystemRoot\System32\Drivers\PxHlpa64.sys
    That is correct - it is not vanilla right now, because this computer was intended to be used for work, rather than to keep me busy for weeks, trying to fix all kinds of issues.

    (btw, I have no clue regarding the origin of the mentioned file... it appears to belong to a certain "MusicMatch Jukebox" program, although I'm absolutely positive that I did not install any such application).

    Nevertheless, I am planning to do another win7 x64 reinstall first thing tomorrow morning, keeping it vanilla and starting the driver verifier right after completing the install.

    Even though the current situation strongly suggests faulty memory modules, I'm wondering if you can think of any way to prove this in order to facilitate the RMA?


  8. Posts : 15
    Win7 x64
    Thread Starter
       #8

    I'll post any future BSODs' minidumps as soon as they happen.
    Well, here they go:

    • a 0x109 (CRITICAL_STRUCTURE_CORRUPTION - that one's a first-timer. yay.) while trying to apply a 'sharpen' filter in Photoshop x64

    • followed by a 0x50 (PAGE_FAULT_IN_NONPAGED_AREA) about 3 minutes after rebooting.

    • followed by another 0x50 about 30 minutes after the next reboot, while inspecting the Windows user account settings.

    sigh.
    Last edited by sahib; 07 Nov 2009 at 15:02.


  9. Posts : 8,870
    Windows 7 Ult, Windows 8.1 Pro,
       #9

    You can read it for yourself. It keeps pointing to a memory management problem, "hardware problem or possibly memory settings need adjustment".


    Microsoft (R) Windows Debugger Version 6.11.0001.404 AMD64
    Copyright (c) Microsoft Corporation. All rights reserved.


    Mini Kernel Dump File: Only registers and stack trace are available
    Symbol search path is: SRV*C:\SymCache*Symbol information
    Executable search path is:
    Windows 7 Kernel Version 7600 MP (4 procs) Free x64
    Product: WinNt, suite: TerminalServer SingleUserTS
    Built by: 7600.16385.amd64fre.win7_rtm.090713-1255
    Machine Name:
    Kernel base = 0xfffff800`01a4b000 PsLoadedModuleList = 0xfffff800`01c88e50
    Debug session time: Sat Nov 7 11:56:14.539 2009 (GMT-8)
    System Uptime: 0 days 0:37:17.210
    Loading Kernel Symbols
    ...............................................................
    ................................................................
    .....................
    Loading User Symbols
    Loading unloaded module list
    .....
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************
    Use !analyze -v to get detailed debugging information.
    BugCheck 50, {fffff683fd7e9908, 0, fffff80001ad0ae2, 2}

    Could not read faulting driver name
    Probably caused by : memory_corruption ( nt!MiAgeWorkingSet+1c2 )
    Followup: MachineOwner
    ---------
    2: kd> !analyze -v
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************
    PAGE_FAULT_IN_NONPAGED_AREA (50)
    Invalid system memory was referenced. This cannot be protected by try-except,
    it must be protected by a Probe. Typically the address is just plain bad or it
    is pointing at freed memory.
    Arguments:
    Arg1: fffff683fd7e9908, memory referenced.
    Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
    Arg3: fffff80001ad0ae2, If non-zero, the instruction address which referenced the bad memory
    address.
    Arg4: 0000000000000002, (reserved)
    Debugging Details:
    ------------------

    Could not read faulting driver name
    READ_ADDRESS: GetPointerFromAddress: unable to read from fffff80001cf30e0
    fffff683fd7e9908
    FAULTING_IP:
    nt!MiAgeWorkingSet+1c2
    fffff800`01ad0ae2 488b19 mov rbx,qword ptr [rcx]
    MM_INTERNAL_CODE: 2
    CUSTOMER_CRASH_COUNT: 1
    DEFAULT_BUCKET_ID: VERIFIER_ENABLED_VISTA_MINIDUMP
    BUGCHECK_STR: 0x50
    PROCESS_NAME: dllhost.exe
    CURRENT_IRQL: 0
    TRAP_FRAME: fffff8800236a7a0 -- (.trap 0xfffff8800236a7a0)
    NOTE: The trap frame does not contain all registers.
    Some register values may be zeroed or incorrect.
    rax=0000007ffffffff8 rbx=0000000000000000 rcx=fffff683fd7e9908
    rdx=0000000000000001 rsi=0000000000000000 rdi=0000000000000000
    rip=fffff80001ad0ae2 rsp=fffff8800236a930 rbp=00000003fd7e9908
    r8=0000000000000001 r9=fffffa800454ba88 r10=0000000000000005
    r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
    r14=0000000000000000 r15=0000000000000000
    iopl=0 nv up ei ng nz na pe cy
    nt!MiAgeWorkingSet+0x1c2:
    fffff800`01ad0ae2 488b19 mov rbx,qword ptr [rcx] ds:fffff683`fd7e9908=????????????????
    Resetting default scope
    LAST_CONTROL_TRANSFER: from fffff80001b3abc2 to fffff80001abcf00
    STACK_TEXT:
    fffff880`0236a638 fffff800`01b3abc2 : 00000000`00000050 fffff683`fd7e9908 00000000`00000000 fffff880`0236a7a0 : nt!KeBugCheckEx
    fffff880`0236a640 fffff800`01abafee : 00000000`00000000 00000980`00000000 00000000`00000000 00000000`00000889 : nt! ?? ::FNODOBFM::`string'+0x40f90
    fffff880`0236a7a0 fffff800`01ad0ae2 : 00000003`00000000 88100001`13a9c025 00000000`00000000 00000000`00000081 : nt!KiPageFault+0x16e
    fffff880`0236a930 fffff800`01b3da0e : fffffa80`0454ba88 fffff880`00000001 00000000`00000001 fffff880`0236abb0 : nt!MiAgeWorkingSet+0x1c2
    fffff880`0236aae0 fffff800`01ad16e2 : 00000000`000008c4 00000000`00000000 fffffa80`00000000 00000000`00000005 : nt! ?? ::FNODOBFM::`string'+0x49926
    fffff880`0236ab80 fffff800`01ad196f : 00000000`00000008 fffff880`0236ac10 00000000`00000001 fffffa80`00000000 : nt!MmWorkingSetManager+0x6e
    fffff880`0236abd0 fffff800`01d60166 : fffffa80`039d1ae0 00000000`00000080 fffffa80`03991740 00000000`00000001 : nt!KeBalanceSetManager+0x1c3
    fffff880`0236ad40 fffff800`01a9b486 : fffff880`02056180 fffffa80`039d1ae0 fffff880`02060fc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
    fffff880`0236ad80 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16

    STACK_COMMAND: kb
    FOLLOWUP_IP:
    nt!MiAgeWorkingSet+1c2
    fffff800`01ad0ae2 488b19 mov rbx,qword ptr [rcx]
    SYMBOL_STACK_INDEX: 3
    SYMBOL_NAME: nt!MiAgeWorkingSet+1c2
    FOLLOWUP_NAME: MachineOwner
    MODULE_NAME: nt
    DEBUG_FLR_IMAGE_TIMESTAMP: 4a5bc600
    IMAGE_NAME: memory_corruption
    FAILURE_BUCKET_ID: X64_0x50_VRF_nt!MiAgeWorkingSet+1c2
    BUCKET_ID: X64_0x50_VRF_nt!MiAgeWorkingSet+1c2
    Followup: MachineOwner


  10. Posts : 1,377
    Win7x64
       #10

    chev65 said:
    You can read it for yourself. Keeps pointing to memory management problem "hardware problem or possibly memory settings need adjustment".
    You're right - it does indicate a probable hardware defect, but not because of anything it "says". The vast majority of bugchecks occur when memory that is expected to contain X for some reason contains Y instead. Therefore, even if the debugger's automated analysis spews out "memory management" as a literal text string, that doesn't by itself automatically mean "hardware problem", and indeed PAGE_FAULT_IN_NONPAGED_AREA (0x50) is a very common software bugcheck as well.

    There are some subtle clues though, and good on ya for getting more into the "debugging" thang :)
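As one example of the kind of detail that can be read out of the 0x50 dump above (not necessarily the clue meant here): Arg1, fffff683`fd7e9908, lies in the region Windows 7 x64 uses for its page-table self-map, so nt!MiAgeWorkingSet was trying to read a page table entry, and that read itself faulted. The sketch assumes PTE_BASE = 0xFFFFF680`00000000, the fixed Win7 x64 value (later Windows 10 releases randomize it); the helper names are made up for illustration. The value 0000007ffffffff8 in rax of the trap frame appears consistent with the mask used in this calculation.

```python
PTE_BASE = 0xFFFFF68000000000  # assumption: fixed Windows 7 x64 self-map base
PTE_RANGE = 1 << 39            # 2**36 pages * 8 bytes per PTE = 512 GiB
MASK64 = (1 << 64) - 1


def pte_address(va: int) -> int:
    """Virtual address of the PTE that maps `va` in the self-map region."""
    return PTE_BASE + (((va & MASK64) >> 9) & 0x7FFFFFFFF8)


def va_from_pte(pte: int) -> int:
    """Inverse: the page-aligned virtual address that a PTE address maps."""
    offset = pte - PTE_BASE
    if not 0 <= offset < PTE_RANGE or offset % 8:
        raise ValueError(f"{pte:#x} is not a PTE address in the self-map")
    va = (offset >> 3) << 12
    if va & (1 << 47):  # sign-extend to a canonical kernel address
        va |= 0xFFFF000000000000
    return va


if __name__ == "__main__":
    arg1 = 0xFFFFF683FD7E9908   # Arg1 of the 0x50 bugcheck above
    va = va_from_pte(arg1)
    print(hex(va))               # 0x7fafd321000, a user-mode address
    print(hex(pte_address(va)))  # 0xfffff683fd7e9908 again
```

So the working-set code was looking at the PTE for user-mode address 0x7FA`FD321000, and the page table page holding that PTE was not mapped. Whether that came from a corrupted working-set entry (for instance a flipped bit) or from something else can't be settled from this one dump alone.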



Windows 7 Forums is an independent web site and has not been authorized, sponsored, or otherwise approved by Microsoft Corporation. "Windows 7" and related materials are trademarks of Microsoft Corp.

© Designer Media Ltd
All times are GMT -5.