Windows 7 Forums

Windows 7: Recurring BSOD Problem

17 Sep 2009   #1
ssu

Windows 7 RC 64 Bit
 
 
Recurring BSOD Problem

I'm running Windows 7 64 bit, and I get a few blue screens every day (a variety of different ones). If anyone could help solve this problem it would be great. Attached is my minidump folder.

Specs:
AMD Phenom II x4 940 Black Edition
Gigabyte GA-MA790GP-UD4H
Radeon 4870 1GB
OCZ Fatal1ty high performance DDR2 memory 4GB (2x2GB)
Sound Blaster X-Fi Titanium PCIe sound card
Seagate Barracuda 7200 RPM 3.0Gb/s 16MB cache OEM


17 Sep 2009   #2
kegobeer

Windows 7 Ultimate x64 SP1
 
 

Have you installed any drivers? Also, fill out your system specs in your profile.
17 Sep 2009   #3
ssu

Windows 7 RC 64 Bit
 
 

Quote: Originally Posted by kegobeer
Have you installed any drivers? Also, fill out your system specs in your profile.

I have updated my profile. Yes, I have installed drivers (the newest ones, I believe).

17 Sep 2009   #4
H2SO4

Win7x64
 
 

Quote: Originally Posted by ssu
I'm running Windows 7 64 bit, and I get a few blue screens every day (a variety of different ones). If anyone could help solve this problem it would be great. Attached is my minidump folder.

Specs:
AMD Phenom II x4 940 Black Edition
Gigabyte GA-MA790GP-UD4H
Radeon 4870 1GB
OCZ Fatal1ty high performance DDR2 memory 4GB (2x2GB)
Sound Blaster X-Fi Titanium PCIe sound card
Seagate Barracuda 7200 RPM 3.0Gb/s 16MB cache OEM

Short version: there's a high likelihood that your machine is experiencing hardware problems. Overclocking and under-cooling should both be ruled out first, then you'd want to run something like memtest for a long time to try to detect unreliable memory.

==========================

Longer version: A whole bunch of different crashes, all seemingly related to memory and memory management.

091609-18595-01.dmp is perhaps the most telling of the recent ones because it describes an issue called instruction pointer misalignment which is caused by a hardware-level defect in the vast majority of cases. In a nutshell, your machine is executing code which doesn't actually exist - instead of executing instruction 1, followed by instruction 2, followed by inst3, it instead tries to execute 1.5, followed by 2.4, followed by 3.6...

To draw an analogy, film cameras have a winder motor which draws film precisely one frame at a time while the camera is taking photos. Should the winder mechanism malfunction, the camera may end up exposing the adjacent portions of two separate frames during a single shot - it has lost track of where each frame begins on the film.

Since that particular minidump is >99% hardware, there's very little point in performing in-depth software analysis of the others. While the hardware is unreliable, all software bets are off.
17 Sep 2009   #5
kegobeer

Windows 7 Ultimate x64 SP1
 
 

Have you used this exact hardware with a different operating system before, and if so, did you have BSODs?
17 Sep 2009   #6
ssu

Windows 7 RC 64 Bit
 
 

I've only tried the current OS. This computer was built around a month ago, and hasn't been overclocked. Are you saying the hardware is defective, or was it something I did (I've done nothing other than put it together, basically)?
17 Sep 2009   #7
chuckr

XP_Pro, W7_7201, W7RC.vhd, SciLinux5.3, Fedora12, Fedora9_2x, OpenSolaris_09-06
 
 

Quote: Originally Posted by H2SO4
091609-18595-01.dmp is perhaps the most telling of the recent ones because it describes an issue called instruction pointer misalignment which is caused by a hardware-level defect in the vast majority of cases.

H2SO4,
Just curious:

What provides you the analysis of the 'mini-dump' -- Crash Analyzer in MS Debug?

The IP used to be a CPU register.
A malfunction here would execute various memory locations that actually exist (assuming correct fetches),
just not in the correct, or intended, order.

Any links to the issue called instruction pointer misalignment?
I'm not trying to be a wise-azz here, just trying to learn a little bit more.

I seem to remember a SF member replacing a stock AMD multicore CPU. No o/c, etc.
In and of itself, this seems highly unusual...

Thank you,
Chuck
17 Sep 2009   #8
kegobeer

Windows 7 Ultimate x64 SP1
 
 

Since I don't analyze dump files, I would appreciate if you could post two or three of the BSOD crash screens/error codes.
18 Sep 2009   #9
H2SO4

Win7x64
 
 

Quote: Originally Posted by chuckr
The IP used to be a CPU register.
A malfunction here would execute various memory locations that actually exist (assuming correct fetches),
just not in the correct, or intended, order.

Any links to the issue called instruction pointer misalignment?
I'm not trying to be a wise-azz here, just trying to learn a little bit more.

I salute you for being interested in old-school computery stuff.

I'll aim to explain the "what" first, then the "why"... Here's the stack of the crashing thread at the time the user sees blue:

Code:
0: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`031af358 fffff800`02c884e9 : 00000000`0000000a 00000000`00000020 00000000`00000002 00000000`00000001 : nt!KeBugCheckEx
fffff880`031af360 fffff800`02c87160 : 00000000`00000000 00000000`00000004 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x69
fffff880`031af4a0 fffff880`014e7f4b : 00000000`00000001 fffff880`01687391 00000000`00000014 fffffa80`00000000 : nt!KiPageFault+0x260 (TrapFrame @ fffff880`031af4a0)
fffff880`031af630 00000000`00000001 : fffff880`01687391 00000000`00000014 fffffa80`00000000 00000000`00000014 : ndis+0x1f4b
fffff880`031af638 fffff880`01687391 : 00000000`00000014 fffffa80`00000000 00000000`00000014 00000000`00000010 : 0x1
fffff880`031af640 00000000`00000014 : fffffa80`00000000 00000000`00000014 00000000`00000010 00000000`00000000 : tcpip+0x7f391
fffff880`031af648 fffffa80`00000000 : 00000000`00000014 00000000`00000010 00000000`00000000 00000000`000007be : 0x14
fffff880`031af650 00000000`00000014 : 00000000`00000010 00000000`00000000 00000000`000007be 00000000`00000078 : 0xfffffa80`00000000
fffff880`031af658 00000000`00000010 : 00000000`00000000 00000000`000007be 00000000`00000078 fffff880`014028b1 : 0x14
fffff880`031af660 00000000`00000000 : 00000000`000007be 00000000`00000078 fffff880`014028b1 fffff880`00000001 : 0x10
And the register context:
Code:
 
0: kd> r
rax=fffff880031af460 rbx=0000000000000004 rcx=000000000000000a
rdx=0000000000000020 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80002c88f80 rsp=fffff880031af358 rbp=fffff880031af520
r8=0000000000000002  r9=0000000000000001 r10=fffff880014e7f4b
r11=fffff6fb40000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000286
nt!KeBugCheckEx:
fffff800`02c88f80 48894c2408      mov     qword ptr [rsp+8],rcx ss:0018:fffff880`031af360=000000000000000a
But that's not the crashing code - that's the "last resort" mechanism which raises IRQL to HIGH_LEVEL (the highest: 15 on x64, 31 on x86), thereby interrupting any and all other activity in order to dump physical memory to disk because an irrecoverable error has occurred.
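
For anyone following along: the first four values handed to KeBugCheckEx (rcx, rdx, r8 and r9 in the register dump above) are the bug check code and its first three parameters. Here is a minimal Python sketch of how they read, assuming the documented parameter layout for bug check 0xA (IRQL_NOT_LESS_OR_EQUAL). The function name `describe_bugcheck_a` is made up for illustration and is not part of any debugger.

```python
def describe_bugcheck_a(code, address, irql, access):
    """Summarise a 0xA bug check from its code and first three parameters."""
    if code != 0xA:
        raise ValueError("this sketch only understands bug check 0xA")
    # Bit 0 of the third parameter: 0 = read, 1 = write.
    operation = "write" if access & 1 else "read"
    return (f"IRQL_NOT_LESS_OR_EQUAL: {operation} of address "
            f"{address:#x} at IRQL {irql}")


# rcx, rdx, r8, r9 as shown in the register context above.
print(describe_bugcheck_a(0xA, 0x20, 0x2, 0x1))
```

So the dump is reporting a write to address 0x20 at IRQL 2, which matches what the trap frame shows further down.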

To see the crashing code, we'd have to set the register context as it looked at the time of the fault. Thankfully, the OS dumps a trap frame (the "TrapFrame @ fffff880`031af4a0" annotation on the nt!KiPageFault line of the stack above) which records that info. To change the context:

Code:
 
0: kd> .trap fffff880`031af4a0
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000008 rbx=0000000000000000 rcx=fffffa8004861b80
rdx=0000000000000001 rsi=0000000000000000 rdi=0000000000000000
rip=fffff880014e7f4b rsp=fffff880031af630 rbp=fffffa8004861a00
r8=0000000000000000 r9=0000000000000000 r10=fffffa8004861c00
r11=fffff880031af938 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na pe cy
ndis+0x1f4b:
fffff880`014e7f4b 896e20 mov dword ptr [rsi+20h],ebp ds:5c70:00000000`00000020=????????
OK, so the immediate cause of the crash is an instruction supposedly at ndis+0x1f4b which tries to move the contents of EBP into the memory address pointed at by RSI plus 0x20 bytes. Trouble is, RSI happens to be zero at the time, so the instruction turns into an attempt to write to memory address 0x20, which is not only in the user-mode part of the address space but also within the first 64KB "guard" block which is always designated no-access in order to catch null pointer dereferences.
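
The address arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration: the helper names and the `NULL_GUARD_LIMIT` constant are mine, and the 64KB figure is the one quoted above.

```python
NULL_GUARD_LIMIT = 0x10000  # first 64KB, the no-access block described above


def effective_address(base, displacement):
    """Address touched by an operand of the form [base + displacement]."""
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits


def in_null_guard(address):
    """True if the address falls inside the first 64KB."""
    return address < NULL_GUARD_LIMIT


rsi = 0x0  # value of RSI in the trap frame
target = effective_address(rsi, 0x20)
print(hex(target), in_null_guard(target))
```

With RSI at zero the write lands at 0x20, well inside the guard block. Had RSI held a real kernel pointer, the same check would come back false.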

Note the (64-bit) RIP instruction pointer though: rip=fffff880014e7f4b

So the offending instruction which causes the crash (the mov at ndis+0x1f4b) is supposedly at that address, pointed to by RIP. Let's disassemble the code around that address:


Code:
 
0: kd> u fffff880`014e7f4b
ndis+0x1f4b:
fffff880`014e7f4b 896e20 mov dword ptr [rsi+20h],ebp
fffff880`014e7f4e 48897e08 mov qword ptr [rsi+8],rdi
fffff880`014e7f52 48897e10 mov qword ptr [rsi+10h],rdi
fffff880`014e7f56 48897e18 mov qword ptr [rsi+18h],rdi
fffff880`014e7f5a 48897e38 mov qword ptr [rsi+38h],rdi
fffff880`014e7f5e 48897e30 mov qword ptr [rsi+30h],rdi
fffff880`014e7f62 48897e68 mov qword ptr [rsi+68h],rdi
fffff880`014e7f66 48897e60 mov qword ptr [rsi+60h],rdi
For reasons not simple to explain succinctly, that looks a little odd. Let's start the disassembly a bit higher up:

Code:
 
0: kd> u fffff880`014e7f40
ndis+0x1f40:
fffff880`014e7f40 f6 ???
fffff880`014e7f41 0f8413840100 je ndis+0x1a35a (fffff880`0150035a)
fffff880`014e7f47 48893e mov qword ptr [rsi],rdi
fffff880`014e7f4a 48896e20 mov qword ptr [rsi+20h],rbp
fffff880`014e7f4e 48897e08 mov qword ptr [rsi+8],rdi
fffff880`014e7f52 48897e10 mov qword ptr [rsi+10h],rdi
fffff880`014e7f56 48897e18 mov qword ptr [rsi+18h],rdi
fffff880`014e7f5a 48897e38 mov qword ptr [rsi+38h],rdi
Hold on, that looks different, although it's still not right because I happen to have arbitrarily started disassembly in the middle of an instruction. Let's go back a little more:

Code:
 
0: kd> u fffff880`014e7f3a
ndis+0x1f3a:
fffff880`014e7f3a 8b6c2430 mov ebp,dword ptr [rsp+30h]
fffff880`014e7f3e 4885f6 test rsi,rsi
fffff880`014e7f41 0f8413840100 je ndis+0x1a35a (fffff880`0150035a)
fffff880`014e7f47 48893e mov qword ptr [rsi],rdi
fffff880`014e7f4a 48896e20 mov qword ptr [rsi+20h],rbp
fffff880`014e7f4e 48897e08 mov qword ptr [rsi+8],rdi
fffff880`014e7f52 48897e10 mov qword ptr [rsi+10h],rdi
fffff880`014e7f56 48897e18 mov qword ptr [rsi+18h],rdi
That looks like real code. There's a "test" followed by a conditional jump (je), and then we're moving a bunch of structured info into memory starting with [RSI].

Note that there is no instruction at 0xfffff880`014e7f4b! There are (legit) instructions starting at '4a and '4e, but the attempt to execute an instruction starting at '4b leads to an inevitable mangling of the instruction itself.
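
The same check can be done mechanically. Below is a rough Python sketch that takes the addresses and encoded bytes from the last disassembly listing, confirms each instruction ends where the next begins, and then asks whether RIP sits on a boundary. The function names are invented for this example; the data is copied from the listing.

```python
# (address, encoded bytes) pairs copied from the last disassembly listing.
LISTING = [
    (0xFFFFF880014E7F3A, "8b6c2430"),
    (0xFFFFF880014E7F3E, "4885f6"),
    (0xFFFFF880014E7F41, "0f8413840100"),
    (0xFFFFF880014E7F47, "48893e"),
    (0xFFFFF880014E7F4A, "48896e20"),
    (0xFFFFF880014E7F4E, "48897e08"),
]


def instruction_starts(listing):
    """Return the set of instruction start addresses, checking that the
    listing is contiguous (each instruction ends where the next begins)."""
    starts = set()
    for i, (addr, hexbytes) in enumerate(listing):
        length = len(bytes.fromhex(hexbytes))
        if i + 1 < len(listing) and addr + length != listing[i + 1][0]:
            raise ValueError(f"gap or overlap after {addr:#x}")
        starts.add(addr)
    return starts


def containing_instruction(listing, rip):
    """Return (start, byte offset) of the instruction whose bytes cover rip."""
    for addr, hexbytes in listing:
        if addr <= rip < addr + len(bytes.fromhex(hexbytes)):
            return addr, rip - addr
    return None


rip = 0xFFFFF880014E7F4B  # RIP from the trap frame
print(rip in instruction_starts(LISTING))
print(containing_instruction(LISTING, rip))
```

RIP is not among the instruction starts; it falls one byte into the instruction that begins at '4a.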

Instead of this:
fffff880`014e7f4a 48896e20 mov qword ptr [rsi+20h],rbp


...RIP is one byte too far along and we're executing an instruction which doesn't exist:

fffff880`014e7f4b 896e20 mov dword ptr [rsi+20h],ebp
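
To see why skipping a single byte turns a 64-bit move into a 32-bit one, here is a deliberately tiny Python decoder. It is a sketch under heavy assumptions: it understands only opcode 0x89 (mov to memory from a register), an optional 0x48 prefix (REX.W, which selects 64-bit operand size), and the plain [base] and [base+disp8] addressing forms. `decode_mov_store` is a made-up name, not anything from WinDBG.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_store(code):
    """Decode one 'mov [base+disp8], reg' instruction.

    Returns (length, operand_bits, base_register, displacement, source_register).
    Raises ValueError for anything outside the narrow subset described above.
    """
    pos = 0
    wide = code[pos] == 0x48          # REX.W prefix: 64-bit operand size
    if wide:
        pos += 1
    if code[pos] != 0x89:             # MOV r/m, r
        raise ValueError("not a mov store this sketch understands")
    modrm = code[pos + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod not in (0, 1) or rm == 4 or (mod == 0 and rm == 5):
        raise ValueError("addressing form outside this sketch")
    pos += 2
    disp = 0
    if mod == 1:                      # one signed displacement byte follows
        disp = code[pos] - 0x100 if code[pos] >= 0x80 else code[pos]
        pos += 1
    regs = REG64 if wide else REG32
    return pos, 64 if wide else 32, REG64[rm], disp, regs[reg]


intended = bytes.fromhex("48896e20")  # the real instruction at '4a
print(decode_mov_store(intended))      # decoded from its first byte
print(decode_mov_store(intended[1:]))  # decoded one byte late, as at '4b
```

Decoded from the correct start, the bytes are a 4-byte store of rbp. Decoded one byte late, the 0x48 prefix is lost and what remains is a perfectly valid 3-byte store of ebp to the same address, which is exactly the pair of lines shown above.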

And as for the "why"... (to be continued)
18 Sep 2009   #10
H2SO4

Win7x64
 
 

(continuation from previous lengthy post)

Remember that on IA-32 and AMD64/EM64T architectures straight-line code doesn't directly set RIP. Instead, the hardware itself has the job of scanning through the memory which holds the instruction sequence and determining where one instruction ends and the next one begins (the instruction encoding is variable-length). Every time it concludes that it has isolated the next instruction and set in motion its execution, RIP is automatically updated without any involvement from software. In other words, it is almost impossible for software bugs to cause RIP to point in between two instructions because it's the hardware which moves RIP around.

One class of software bug which can lead to RIP getting whacked is when an errant memory write mangles the contents of a thread's CONTEXT structure - the one which contains the values of its registers, especially while the thread is parked and waiting for processor time. For example, if my buggy driver happens to accidentally throw a wild write to location XYZ, and there happens to be a CONTEXT structure stored thereabouts, then the victim thread could have one or more of its register contents - including RIP - point to something totally invalid the next time it gets scheduled for proc time.

That is exceptionally rare though, and when it does happen the resulting values are almost always wildly wrong, instead of being off by one byte as in this case. In situations like this one where RIP is only off by one or a few bytes, the cause is almost certainly unreliable hardware. A transistor farted somewhere.

In case you should think I go through this rigmarole for every minidump, let me assure you all I did was to double-click the file. Because I've got the debugger set up quite nicely, it opens the dump, tries to have an automated go at deducing the cause of any crash conditions it finds in there, and spits out a text result thus:

SYMBOL_NAME: ndis+1f4b
FOLLOWUP_NAME: MachineOwner
IMAGE_NAME: hardware
DEBUG_FLR_IMAGE_TIMESTAMP: 0
MODULE_NAME: hardware
FAILURE_BUCKET_ID: X64_IP_MISALIGNED
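
If you end up with a pile of these text results, a few lines of Python can sift them. This is just a sketch: the parsing assumes plain "KEY: value" lines like the ones above, and the "points at hardware" rule is my own rough heuristic based on this single example, not anything the debugger defines.

```python
ANALYSIS = """\
SYMBOL_NAME: ndis+1f4b
FOLLOWUP_NAME: MachineOwner
IMAGE_NAME: hardware
DEBUG_FLR_IMAGE_TIMESTAMP: 0
MODULE_NAME: hardware
FAILURE_BUCKET_ID: X64_IP_MISALIGNED
"""


def parse_fields(text):
    """Turn 'KEY: value' lines into a dict, ignoring lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def points_at_hardware(fields):
    """Heuristic (an assumption): the debugger blamed the pseudo-module
    'hardware', or the bucket name mentions a misaligned instruction pointer."""
    return (fields.get("MODULE_NAME") == "hardware"
            or "IP_MISALIGNED" in fields.get("FAILURE_BUCKET_ID", ""))


fields = parse_fields(ANALYSIS)
print(fields["FAILURE_BUCKET_ID"], points_at_hardware(fields))
```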

I wrote up a short intro on how to get WinDBG to do the heavy lifting over on vistax64.com:

http://www.vistax64.com/tutorials/22...k-process.html

I enjoy this stuff, so if there's something else you wanted to discuss - let's talk.