Plagued by random BSoDs

Page 6 of 7

  1. Posts : 47
    Windows 7 Home Premium SP1 x64
    Thread Starter
       #51

    It does not look like a hardware issue, working on the assumption that the RAM sticks are assigned in the order they are listed by SMBios. If the assumption holds true, then the 4GB stick would be assigned to 0`00100000..1`000FFFFF and the 2GB stick would be at 1`00100000..1`800FFFFF. Looking at my current progress of isolating the issue (by hand), we have corruption at:
    Code:
    VIRTUAL ADDRESS     P ADDRESS    CHANGE
    fffff980`5603c6c6 = 1`197a66c6 : f1 -> ef
    fffff980`5603c6ce = 1`197a66ce : f1 -> ef
    fffff980`5603c6d6 = 1`197a66d6 : f1 -> ef
    fffff980`5603c6de = 1`197a66de : f1 -> ef
    fffff980`5603c6e6 = 1`197a66e6 : f1 -> ef
    fffff980`5603c6ee = 1`197a66ee : f1 -> ef
    fffff980`5603c6f6 = 1`197a66f6 : f1 -> ef
    fffff980`5603c6fe = 1`197a66fe : f1 -> ef
    fffff8a0`10fade6e = 1`1b73fe6e : ff -> 38
    fffff8a0`10fade76 = 1`1b73fe76 : ff -> 05
    fffff8a0`095bfe26 = 0`b73aae26 : ff -> 0c
    fffff8a0`095bfe36 = 0`b73aae36 : ff -> 04
    The last two rows of the table would be on the 4GB stick and the rest on the 2GB stick (working with the mentioned assumption). However, one can note that the low three bits of the address (either physical or virtual; the low 12 bits are always identical) are always 110b, so the low nibble is always 0x6 or 0xe. This means that whatever is doing the spray is doing something like a write to `byte ptr [rax+6]` (where RAX is qword-aligned); I _might_ actually copy the most recent (6GB) memory dump to my Linux-based server box and write up a program which searches for any code which would refer to such a pointer. Maybe a smaller dump, actually, since driver code would be in the kernel memory area, right?
    EDIT: Current physical address mask is 0_1___10_11_01___1___1______110b; I expect the higher bits to disappear as I do more analysis
    EDIT: It simplifies to 110b, so it is almost certain that it is, in fact, spray from SOMEthing.
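    The arithmetic behind these observations can be checked mechanically. What follows is a minimal Python sketch, not the tool used for the analysis by hand: it copies the twelve address pairs from the table, assigns each physical address to a stick under the same SMBIOS-order assumption, and works out which bits are identical across all physical addresses. The names `PAIRS`, `STICKS`, `stick_for`, and `constant_bits` are made up for illustration.

```python
# (virtual, physical) pairs copied from the table in this post.
PAIRS = [
    (0xFFFFF9805603C6C6, 0x1197A66C6),
    (0xFFFFF9805603C6CE, 0x1197A66CE),
    (0xFFFFF9805603C6D6, 0x1197A66D6),
    (0xFFFFF9805603C6DE, 0x1197A66DE),
    (0xFFFFF9805603C6E6, 0x1197A66E6),
    (0xFFFFF9805603C6EE, 0x1197A66EE),
    (0xFFFFF9805603C6F6, 0x1197A66F6),
    (0xFFFFF9805603C6FE, 0x1197A66FE),
    (0xFFFFF8A010FADE6E, 0x11B73FE6E),
    (0xFFFFF8A010FADE76, 0x11B73FE76),
    (0xFFFFF8A0095BFE26, 0x0B73AAE26),
    (0xFFFFF8A0095BFE36, 0x0B73AAE36),
]

# ASSUMPTION (same as in the post): sticks are mapped in SMBIOS order.
STICKS = {
    "4GB": (0x0_0010_0000, 0x1_000F_FFFF),
    "2GB": (0x1_0010_0000, 0x1_800F_FFFF),
}


def stick_for(phys):
    """Return the name of the stick whose assumed range contains phys."""
    for name, (low, high) in STICKS.items():
        if low <= phys <= high:
            return name
    return None


def constant_bits(addrs):
    """Return (mask, value); mask has a 1 wherever all addresses agree."""
    all_and = all_or = addrs[0]
    for addr in addrs[1:]:
        all_and &= addr
        all_or |= addr
    # A bit is constant exactly when AND and OR agree on it.
    mask = ~(all_and ^ all_or) & (2**64 - 1)
    return mask, all_and & mask


if __name__ == "__main__":
    for virt, phys in PAIRS:
        same_offset = (virt & 0xFFF) == (phys & 0xFFF)
        print(f"{phys:#011x} stick={stick_for(phys)} page offset matches={same_offset}")
    mask, value = constant_bits([phys for _, phys in PAIRS])
    print(f"constant bits in low byte: mask={mask & 0xFF:08b} value={value & 0xFF:08b}")
```

    For these twelve rows, the only bits of the low byte that stay constant are the bottom three, and their value is 110b.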
    Last edited by TruePikachu; 27 Aug 2015 at 16:20.


  2. Posts : 20,583
    Win-7-Pro64bit 7-H-Prem-64bit
       #52

    I was reading through this (not sure why), but then noticed this is an 8-month-old thread.


  3. Posts : 47
    Windows 7 Home Premium SP1 x64
    Thread Starter
       #53

    Hm, well I just confirmed that the spray is what causes Minecraft to crash when it is the JVM which dies. The error log file thing (since I can't manage to get Java to dump on crash) reports RDX=0x009d0006f1678120, and the crashing instruction was using RDX as a pointer. This was just after using RAX as a pointer to get the value into RDX.

    Since I have no way to catch the JVM when it crashes, I can't check where it was in the grand scheme of things (either the kernel's virtual address or the system's physical address).
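
    To make the register check concrete, here is a small Python sketch (the helper names `byte_at`, `with_byte`, and `is_canonical` are invented for this example). It pulls out the byte at offset 6 of the logged RDX value and tests whether the value is a canonical 48-bit x86-64 address. It assumes RDX was meant to hold a pointer, as the crash log suggests. Since byte 7 is 0x00, the only canonical value with the same remaining bytes has 0x00 at offset 6, so that is used as the "repaired" value.

```python
RDX = 0x009D0006F1678120  # value reported in the JVM error log


def byte_at(value, index):
    """Byte `index` of a 64-bit value, index 0 being the least significant."""
    return (value >> (8 * index)) & 0xFF


def with_byte(value, index, new_byte):
    """Return `value` with the byte at `index` replaced by `new_byte`."""
    shift = 8 * index
    return (value & ~(0xFF << shift)) | (new_byte << shift)


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit canonical form)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


if __name__ == "__main__":
    print(f"byte at offset 6: {byte_at(RDX, 6):#04x}")
    print(f"canonical as logged: {is_canonical(RDX)}")
    repaired = with_byte(RDX, 6, 0x00)
    print(f"with offset 6 cleared: {repaired:#018x} canonical={is_canonical(repaired)}")
```

    Offset 6 counted from the least significant byte is the same position as the `[rax+6]` write in the kernel dumps, because x86-64 stores values little-endian.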


  4. Posts : 47
    Windows 7 Home Premium SP1 x64
    Thread Starter
       #54

    And just proved it isn't a problem with any drivers that DriverView lists as 3rd party; even having them all either disabled (renamed to .sys.disabled) or reverted to the OEM version (for drivers which have one) results in a crash from spray.

    The issue is either in ir41_qcx.dll (unlikely since it isn't loaded) or one of the 147 Microsoft drivers I have loaded right now. This can be reduced by knowing that it passed a run of SFC, so it is something which either SFC doesn't check, or provided initially through Windows Update (and never registered with SFC).

    I do not look forward to getting checksums and versions for all of these verified against my friends' systems. But I'm almost certain it has to be one of these.

    EDIT: Wrote up a PowerShell script which would provide easily diff(1)able output. For the curious, it is at http://cdusto.selfip.com/getVersionAndMD5.ps1
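
    For readers who cannot fetch the script, the general idea of a diff-friendly listing can be sketched in Python. This is not the linked script and makes no claim about how it works: it records only file name, size, and MD5, and leaves out the version field because reading version resources needs a Windows-specific API. The function names and the suffix filter are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def listing(directory, suffixes=(".sys", ".dll", ".exe")):
    """One 'name<TAB>size<TAB>md5' line per matching file, sorted by name.

    Names are lower-cased and sorted so that two machines produce lines in
    the same order and a plain diff shows only real differences.
    """
    lines = []
    for path in sorted(Path(directory).iterdir(), key=lambda p: p.name.lower()):
        if path.is_file() and path.suffix.lower() in suffixes:
            lines.append(f"{path.name.lower()}\t{path.stat().st_size}\t{md5_of(path)}")
    return lines
```

    Writing `listing(...)` for the same directory on two machines to text files and running diff on them gives one changed line per differing file.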

    EDIT: First run, checking against a different local Win7x64 machine, says that apisetschema.dll, cdd.dll, ksecdd.sys, ksecpkg.sys, mountmgr.sys, mrxsmb.sys, mrxsmb10.sys, mrxsmb20.sys, and ntoskrnl.exe (?!) are all outdated. Additionally, the files ntdll.dll, smss.exe, usbrpm.sys, volsnap.sys, and win32k.sys have matching version numbers, but differing checksums. I'll wait for another output log to help isolate against possible corruption on the first tested system, but I'm pretty sure that corruption exists on here.

    EDIT: The list of differences between my system and a "stable" Win7x64:

    • apisetschema.dll is 6.1.7601.18798 instead of .18933
    • cdd.dll is 6.1.7601.17514 instead of .17554
    • dxgkrnl.sys is 6.1.7601.22720 instead of .18510
    • dxgmms1.sys is 6.1.7601.22410 instead of .18126
    • ksecdd.sys is 6.1.7601.18912 instead of .18933
    • ksecpkg.sys is 6.1.7601.18912 instead of .18933
    • mountmgr.sys is 6.1.7600.16385 instead of .7601.18933
    • mrxsmb.sys is 6.1.7601.18912 instead of .18933
    • mrxsmb10.sys is 6.1.7601.18912 instead of .18933
    • mrxsmb20.sys is 6.1.7601.18912 instead of .18933
    • ntdll.dll is 6.1.7600.16385, and doesn't match MD5 checksum
    • ntoskrnl.exe is 6.1.7601.18798 instead of .18933
    • smss.exe is 6.1.7600.16385, and doesn't match MD5 checksum
    • usbrpm.sys is 6.1.7600.16385, and doesn't match MD5 checksum
    • volsnap.sys is 6.1.7600.16385, and doesn't match MD5 checksum
    • win32k.sys is 6.1.7600.16385, and doesn't match MD5 checksum

    The five files which didn't match checksums all match versions from known-good instances of Win7x64. I'd bet the corruption is coming from one of them.
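
    The version half of that list can be compared numerically rather than by eye. The Python sketch below copies the version pairs from the list, expands the ".18933" style shorthand against the local version, and labels each file as older or newer than the reference machine. The names `parse`, `expand`, `classify`, and `DIFFS` are illustrative only.

```python
# file name -> (version on this system, shorthand for the reference system)
DIFFS = {
    "apisetschema.dll": ("6.1.7601.18798", ".18933"),
    "cdd.dll": ("6.1.7601.17514", ".17554"),
    "dxgkrnl.sys": ("6.1.7601.22720", ".18510"),
    "dxgmms1.sys": ("6.1.7601.22410", ".18126"),
    "ksecdd.sys": ("6.1.7601.18912", ".18933"),
    "ksecpkg.sys": ("6.1.7601.18912", ".18933"),
    "mountmgr.sys": ("6.1.7600.16385", ".7601.18933"),
    "mrxsmb.sys": ("6.1.7601.18912", ".18933"),
    "mrxsmb10.sys": ("6.1.7601.18912", ".18933"),
    "mrxsmb20.sys": ("6.1.7601.18912", ".18933"),
    "ntoskrnl.exe": ("6.1.7601.18798", ".18933"),
}


def parse(version):
    """'6.1.7601.18933' -> (6, 1, 7601, 18933)"""
    return tuple(int(part) for part in version.split("."))


def expand(local, shorthand):
    """Fill in the leading components that the shorthand leaves out."""
    tail = parse(shorthand.lstrip("."))
    head = parse(local)[: 4 - len(tail)]
    return head + tail


def classify(local, shorthand):
    """Compare the local version with the reference one, field by field."""
    mine, theirs = parse(local), expand(local, shorthand)
    if mine < theirs:
        return "older"
    if mine > theirs:
        return "newer"
    return "same"


if __name__ == "__main__":
    for name, (local, shorthand) in DIFFS.items():
        print(f"{name:18} {classify(local, shorthand)}")
```

    Compared this way, dxgkrnl.sys and dxgmms1.sys come out with a higher build number here than on the reference machine, while the other nine are older.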

    EDIT: Just installed a batch of updates. cdd.dll is still old, usbrpm.sys and volsnap.sys still mismatch checksums. Why does Microsoft make modifications to system files without changing their version information? It makes locating corruption a lot harder.

    EDIT: And the issue isn't resolved.
    Last edited by TruePikachu; 28 Aug 2015 at 20:07.


  5. Posts : 47
    Windows 7 Home Premium SP1 x64
    Thread Starter
       #55

    Hmmm...I noticed that one non-OEM driver never was unloaded (LGSHidFilt.sys) - it was determined to be the best driver for my mouse even after being disabled by rename. So I'm doing a disassembly of it right now...I do not really like what I see. For instance, some subroutines have CALL instructions reading from areas of memory (within the driver's image) which appear to never be written to, and are uninitialized (e.g. at virtual address +113C8, referencing +1E550). There are also things like a `call near ptr` pointing to a procedure past the end of the image (this being as early as DriverEntry+1F). While I'm not familiar with driver development or disassembly, it looks to me like this might be the culprit.

    I don't know how I would test the driver by elimination without another mouse, since Windows insists on using the (possibly crash-inducing) driver by default, even after renaming it. The main thing is that I've been using Minecraft to test for the issue disappearing (since it both uses a lot of memory and, when the JVM crashes, it helpfully dumps the registers to file so I can check the 7th byte of the registers for corruption), and I need an external mouse to play it; the touchpad on here has a strange "issue" where it doesn't register any input when typing on the keyboard, so it isn't the best thing to use...

    EDIT: Driver is WHCP signed, and obtained through WinUpdate or something. Despite the strange stuff I was seeing (as someone more experienced with application disassembly instead), it doesn't look like it would be the problem.
    Last edited by TruePikachu; 28 Aug 2015 at 22:24.


  6. Posts : 47
    Windows 7 Home Premium SP1 x64
    Thread Starter
       #56

    And just hit a crash within 2 1/2 minutes of bootup. X64_IP_MISALIGNED versus an access violation (so it's harder to locate the exact site of corruption), but since there wasn't a lot of time since the drivers loaded, it might be possible to locate the problematic pointer in the dump.


  7. Posts : 47
    Windows 7 Home Premium SP1 x64
    Thread Starter
       #57

    Well then.

    Does that bitmask look familiar?

    The failing address puts it on the 4GB stick (I believe), but I'll have it check the 2GB alone tonight, just in case. Yes, right now I'm running on a third of my usual memory.


    So I'd be looking for PC3-12800 DDR3 4GB tomorrow?

    EDIT: Ignore that the pic says DDR2; CPU-Z reports DDR3, as well as PC3-12800.


  8. Posts : 2,781
    Windows 10 Pro x64
       #58

    You must run MemTest86+. Never use 2 different sticks together. Remove one of the sticks. It reports a failing address on the 4GB RAM module, so it must be failing. A single error means that it's dead.


  9. Posts : 47
    Windows 7 Home Premium SP1 x64
    Thread Starter
       #59

    Quote: You must run MemTest86+.
    That was from MemTest86+ (the + blinks, and was not visible in the picture).

    Quote: Never use 2 different sticks together.
    Just want a bit of clarification on this (it doesn't matter for this case anymore): do you mean that only one socket should be filled when running MemTest86+, or that two different sizes of stick shouldn't be used together? The memory was supplied by the OEM, so I figured it was acceptable.

    Quote: It reports a failing address on the 4GB RAM module, so it must be failing. A single error means that it's dead.
    I came to the same conclusion, so earlier today I got a pair of new 4GB modules, and have those in right now. I was meaning to upgrade to 8GB for some time, and this was the perfect excuse to do so. I might put the (most likely OK) 2GB module into one of the other laptops over here (which is running on 3GB right now), but I don't know yet.

    I just finished getting all the drivers (including outdated OEM ones) re-enabled (since I had disabled many of them for troubleshooting, and later so they wouldn't eat up too much of the 2GB I had), and after a fight with HP DriveGuard not recognising the disk (I got that fixed), this system is back in its normal configuration. I'll give it time to see if it crashes before I mark solved.


  10. Posts : 47
    Windows 7 Home Premium SP1 x64
    Thread Starter
       #60

    Might not be entirely out of the woods yet; Garry's Mod just crashed with an access violation (according to WinDbg). While I'm sure it is probably just a bug in the code, there _might_ be spray. I can't check unless I get a kernel crash, though.
    I might see if I can catch spray by rebuilding the Bitcoin blockchain. If even one byte in it gets sprayed on, it won't fully build and will error out.
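
    A cruder alternative to the blockchain rebuild, sketched here in Python as a swapped-in technique rather than anything actually run on this system: fill a buffer with a known byte, leave it alone, and later scan for bytes that changed. The names `make_canary` and `find_changes` are invented for the example. An offset within the buffer matches the address modulo 8 only if the buffer happens to start on an 8-byte boundary, so that is an assumption too.

```python
def make_canary(size, fill=0xFF):
    """Allocate `size` bytes, all set to `fill`."""
    return bytearray([fill]) * size


def find_changes(buf, fill=0xFF):
    """Return (offset, value) for every byte that no longer equals `fill`."""
    # Fast path: count() runs in C, so an untouched buffer is cheap to check.
    if buf.count(fill) == len(buf):
        return []
    return [(offset, value) for offset, value in enumerate(buf) if value != fill]


if __name__ == "__main__":
    canary = make_canary(1 << 20)
    # ... leave the buffer alone for a while, then:
    for offset, value in find_changes(canary):
        print(f"offset {offset:#x} (mod 8 = {offset % 8}): ff -> {value:02x}")
```

    A hit at an offset that is 6 modulo 8 would look like the same pattern as in the kernel dumps; a clean scan proves little, since the buffer covers only a small part of memory.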


 
All times are GMT -5.