|23 Dec 2011||#1|
Skyrim Optimization Mod
Tests show that the "Markarth FPS Death Zone" went from ~20 FPS without the plugin to ~30 FPS with it, at a framebuffer size of 2560x1600 with everything maxed except AA, which was off. Not joking or trolling; that's just the worst CPU-bound case I could think of.
Things I'd like to note here:
Only a fraction of the speedup comes from using SSE2 code. The original exe also uses SSE2 code, just not in the places where it is truly needed. This could have been prevented by using automated SSE2 vectorization and/or another math library. Interestingly, the function that has been rewritten in this case is the dot product, which, somewhat ironically, is the #1 textbook example for automated vectorization in compiler demos.
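To give an idea of what a hand-vectorized dot product looks like, here is a minimal sketch using the compiler intrinsics for packed single-precision math. It is not the plugin's actual code; `dot_scalar` and `dot_sse` are names made up for this example, and it assumes an x86 target with SSE/SSE2 available.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2 header; also pulls in the SSE float intrinsics

// Plain scalar version, the kind of loop an auto-vectorizer is meant to handle.
float dot_scalar(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Hand-vectorized version (illustrative only): four multiplies per iteration.
float dot_sse(const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned loads, no alignment assumed
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    // Horizontal sum of the four partial sums held in acc.
    __m128 shuf = _mm_shuffle_ps(acc, acc, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(acc, shuf);
    shuf = _mm_movehl_ps(shuf, sums);
    sums = _mm_add_ss(sums, shuf);
    float sum = _mm_cvtss_f32(sums);
    // Scalar tail for lengths that are not a multiple of four.
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Note that the vector version adds the products in a different order than the scalar loop, so the two can differ in the last bits for general floating-point input.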
Much of the speedup comes from manually eliminating calls along the critical code paths (only possible if the entire called function can be reduced to 5 bytes or less), or at least simplifying them as far as possible. A manual patch is subject to many restrictions a compiler never has to deal with, so this doesn't produce nearly as good results as an optimizing compiler could; every optimizing compiler can do an excellent job at this, and usually does, if told to. Skyrim would probably gain over 100% in execution speed from this single optimization alone, because it drastically increases the amount of code that can be detected as redundant and eliminated completely. I know that sounds exaggerated, and normally it would be, but it isn't once you've read and profiled enough of the code to know just how bad the compiled code is.
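For anyone wondering about the 5-byte limit: on 32-bit x86 a near `call rel32` is one opcode byte (0xE8) followed by a 4-byte displacement, so a replacement that is patched in place presumably has to fit into those 5 bytes. Below is a rough sketch of that idea which only operates on a plain byte buffer, not on live code. `replace_call`, `kCallSize` and `kNop` are hypothetical names for this example and not taken from the plugin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kCallSize = 5;   // 0xE8 opcode + 32-bit displacement
constexpr std::uint8_t kCallOp = 0xE8; // near relative call
constexpr std::uint8_t kNop = 0x90;    // one-byte NOP used as padding

// Overwrites a 5-byte call site with replacement bytes and pads with NOPs.
// Returns false (and leaves the site untouched) if the site is not a
// `call rel32` or the replacement does not fit.
bool replace_call(std::uint8_t* site, const std::uint8_t* repl, std::size_t len) {
    assert(site != nullptr && repl != nullptr);
    if (site[0] != kCallOp || len > kCallSize) {
        return false;
    }
    std::memcpy(site, repl, len);
    std::memset(site + len, kNop, kCallSize - len);
    return true;
}
```

As a usage example, a call to a getter that does nothing but load through `ecx` could be replaced by the two bytes `8B 01` (`mov eax, [ecx]`), leaving three NOPs. Anything longer than 5 bytes has to stay a call, which is why this approach stops scaling quickly.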
Just 3 functions have truly been rewritten. Everything else is either a variant form or an instruction-level simplification of functions consisting of things like "return *this;", which sit at the very top of the profile because the compiler was obviously told not to inline them. So every time a certain kind of pointer needs to be dereferenced, the game calls a lengthy function to do what can be (and now is) done by a single instruction. Fixing this manually isn't feasible beyond a certain point, but the compiler can do it for the whole binary at the cost of just a few seconds of extra compile time, and much better than a human ever could (at least at these code dimensions).
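The difference is easy to reproduce in a few lines. The following is only a sketch with made-up types (`Node`, `Handle`) and has nothing to do with the actual TESV classes; it forces one accessor out of line and leaves the other inlinable, which is roughly the situation described above.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

struct Node {
    int value;
};

// Hypothetical pointer wrapper, for illustration only.
class Handle {
public:
    explicit Handle(Node* p) : ptr_(p) {}

    // Forced out of line: every use is a real call plus return.
    NOINLINE Node* get_slow() const;

    // Defined in the class body, so the compiler may inline it;
    // with optimizations on this typically becomes a single load.
    Node* get_fast() const { return ptr_; }

private:
    Node* ptr_;
};

Node* Handle::get_slow() const { return ptr_; }

// Same loop twice; only the accessor differs.
long sum_slow(const Handle* handles, int n) {
    assert(handles != nullptr);
    long total = 0;
    for (int i = 0; i < n; ++i) total += handles[i].get_slow()->value;
    return total;
}

long sum_fast(const Handle* handles, int n) {
    assert(handles != nullptr);
    long total = 0;
    for (int i = 0; i < n; ++i) total += handles[i].get_fast()->value;
    return total;
}
```

Both loops compute the same result. In an optimized build, the disassembly of `sum_slow` should still contain a call per element, while `sum_fast` should not.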
In general, the TESV code has pretty high register pressure. A huge part of this is simply due to the completely missing optimizations, which would otherwise eliminate the unneeded allocations, but an x86_64 build would definitely help here as well.
Jump targets are completely unaligned, including the so-called hot targets that are hit millions of times in short periods. This leads to cache stress, because multiple fetches are required to execute a jump, whether it was correctly predicted or not. Optimizing compilers can align them properly automatically.
I guess I don't have to mention how bad the threading is; this isn't trivial to fix though. Just sad that it's almost 2012 and this thing can't even properly use two threads. Besides all the other obvious flaws, this is the main reason why the game is so strongly limited by the CPU. Single-core speed didn't grow nearly as much as the number of cores did. Everyone knew it 10 years ago, but back then they could still just wait for the hardware to provide the additional power needed to run the sloppy code - this trick doesn't work anymore.
Yeah, Skyrim is a nice game, but many of the obstacles we've got here have trivial fixes compared to the size of their respective payoffs (little to no extra coding required). Especially with over 10 million copies already sold, I somewhat expect that it will at least run on recent hardware without sub-30 framerates.