KVA Shadow: Mitigating Meltdown on Windows

Brink · Mar 26, 2018

On January 3rd, 2018, Microsoft released an advisory and security updates relating to a newly discovered class of hardware vulnerabilities, termed speculative execution side channels, that affect the design methodology and implementation decisions behind many modern microprocessors. This post dives into the technical details of Kernel Virtual Address (KVA) Shadow, the Windows kernel mitigation for one specific speculative execution side channel: the rogue data cache load vulnerability (CVE-2017-5754, also known as “Meltdown” or “Variant 3”). KVA Shadow is one of the mitigations that is in scope for Microsoft's recently announced Speculative Execution Side Channel bounty program.

It’s important to note that there are several different types of issues that fall under the category of speculative execution side channels, and that different mitigations are required for each type of issue. Additional information about the mitigations that Microsoft has developed for other speculative execution side channel vulnerabilities (“Spectre”), as well as additional background information on this class of issue, can be found here.

Please note that the information in this post is current as of the date of this post.

Vulnerability description & background

The rogue data cache load hardware vulnerability relates to how certain processors handle permission checks for virtual memory. Processors commonly implement a mechanism to mark virtual memory pages as owned by the kernel (sometimes termed supervisor), or as owned by user mode. While executing in user mode, the processor prevents accesses to privileged kernel data structures by way of raising a fault (or exception) when an attempt is made to access a privileged, kernel-owned page. This protection of kernel-owned pages from direct user mode access is a key component of privilege separation between kernel and user mode code.
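The ownership check described above can be illustrated with a small model. This is only a sketch: the `PageTableEntry` layout, the `translate` helper, and the page and frame numbers are hypothetical names chosen for illustration, not the format of any real processor's page tables.

```python
from dataclasses import dataclass


class PermissionFault(Exception):
    """Raised when user mode touches a kernel-owned page."""


@dataclass(frozen=True)
class PageTableEntry:
    frame: int   # hypothetical physical frame number
    user: bool   # True: user-owned page, False: kernel (supervisor) owned


def translate(page_table, virtual_page, in_user_mode):
    """Return the frame backing virtual_page, enforcing the ownership check."""
    entry = page_table.get(virtual_page)
    if entry is None:
        raise KeyError(f"page {virtual_page:#x} is not mapped")
    if in_user_mode and not entry.user:
        # Architecturally, the access never completes: a fault is raised.
        raise PermissionFault(f"user-mode access to kernel page {virtual_page:#x}")
    return entry.frame


# One user-owned page and one kernel-owned page (illustrative values only).
page_table = {
    0x10: PageTableEntry(frame=0x200, user=True),
    0x80: PageTableEntry(frame=0x300, user=False),
}
```

In this model, `translate(page_table, 0x80, in_user_mode=True)` raises `PermissionFault`, while the same lookup with `in_user_mode=False` succeeds.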

Certain processors capable of speculative out-of-order execution, including many currently in-market processors from Intel, and some ARM-based processors, are susceptible to a speculative side channel that is exposed when an access to a page incurs a permission fault. On these processors, an instruction that performs an access to memory that incurs a permission fault will not update the architectural state of the machine. However, these processors may, under certain circumstances, still permit a faulting internal memory load µop (micro-operation) to forward the result of the load to subsequent, dependent µops. These processors can be said to defer handling of permission faults to instruction retirement time.

Out-of-order processors are obligated to “roll back” the architecturally-visible effects of speculative execution down paths that are proven to have never been reachable during in-program-order execution, and as such, any µops that consume the result of a faulting load are ultimately cancelled and rolled back by the processor once the faulting load instruction retires. However, these dependent µops may still have issued subsequent cache loads based on the (faulting) privileged memory load, or otherwise may have left additional traces of their execution in the processor’s caches. This creates a speculative side channel: the remnants of cancelled, speculative µops that operated on the data returned by a load incurring a permission fault may be detectable through disturbances to the processor cache, and this may enable an attacker to infer the contents of privileged kernel memory that they would not otherwise have access to. In effect, this enables an unprivileged user mode process to disclose the contents of privileged kernel mode memory.
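The sequence of events can be sketched as a toy simulation. Everything here is an assumption made for illustration: the `ToyCpu` class, the 64-byte cache line, the probe array base address, and the direct `is_cached` query (which stands in for the timing measurement an observer would need on real hardware). It models the behaviour described above and does not touch real memory or real caches.

```python
CACHE_LINE = 64  # assumed cache line size in bytes


class ToyCpu:
    """Toy model of a processor that defers permission faults to retirement."""

    def __init__(self, kernel_memory):
        self.kernel_memory = kernel_memory  # address -> byte, all kernel-owned
        self.cached_lines = set()           # microarchitectural state
        self.registers = {}                 # architectural state

    def user_load_kernel(self, address, probe_base):
        """User-mode load of a kernel address, followed by a dependent load."""
        # Speculative phase: the faulting load forwards its result...
        secret = self.kernel_memory[address]
        # ...to a dependent load that fills one line of a user-owned probe array.
        self.cached_lines.add((probe_base + secret * CACHE_LINE) // CACHE_LINE)
        # Retirement: the fault is delivered and no register is ever written,
        # but the cache fill above is not rolled back.
        raise PermissionError(f"user-mode access to kernel address {address:#x}")

    def is_cached(self, address):
        return address // CACHE_LINE in self.cached_lines


def recover_byte(cpu, address, probe_base=0x100000):
    """Infer one kernel byte from which probe line ended up cached."""
    cpu.cached_lines.clear()  # stands in for flushing the probe array
    try:
        cpu.user_load_kernel(address, probe_base)
    except PermissionError:
        pass  # the fault itself is expected and handled
    hits = [v for v in range(256) if cpu.is_cached(probe_base + v * CACHE_LINE)]
    return hits[0] if len(hits) == 1 else None
```

The point of the model is that `registers` stays empty (the architectural state is untouched), yet the index of the cached probe line equals the privileged byte.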

Operating system implications

Most operating systems, including Windows, rely on per-page user/kernel ownership permissions as a cornerstone of enforcing privilege separation between kernel mode and user mode. A speculative side channel that enables unprivileged user mode code to infer the contents of privileged kernel memory is problematic given that sensitive information may exist in the kernel’s address space. Mitigating this vulnerability on affected, in-market hardware is especially challenging, as user/kernel ownership page permissions must be assumed to no longer prevent the disclosure (i.e., reading) of kernel memory contents from user mode. Thus, on vulnerable processors, the rogue data cache load vulnerability impacts the primary tool that modern operating system kernels use to protect themselves from privileged kernel memory disclosure by untrusted user mode applications.

In order to protect kernel memory contents from disclosure on affected processors, it is thus necessary to go back to the drawing board with how the kernel isolates its memory contents from user mode. With the user/kernel ownership permission no longer effectively safeguarding against memory reads, the only other broadly-available mechanism to prevent disclosure of privileged kernel memory contents is to entirely remove all privileged kernel memory from the processor’s virtual address space while executing user mode code.

This, however, is problematic, in that applications frequently make system service calls to request that the kernel perform operations on their behalf (such as opening or reading a file on disk). These system service calls, as well as other critical kernel functions such as interrupt processing, can only be performed if their requisite, privileged code and data are mapped into the processor’s address space. This presents a conundrum: in order to meet the security requirements of kernel privilege separation from user mode, no privileged kernel memory may be mapped into the processor’s address space, and yet in order to reasonably handle any system service call requests from user mode applications to the kernel, this same privileged kernel memory must be quickly accessible for the kernel itself to function.

The solution to this quandary is to also switch the processor’s address space on every transition between kernel mode and user mode, alternating between a kernel address space (which maps the entire user and kernel address space) and a shadow user address space (which maps the entire user memory contents of a process, but only a minimal subset of kernel mode transition code and data pages needed to switch into and out of the kernel address space). The select set of privileged kernel code and data transition pages that handle the details of these address space switches, and which are “shadowed” into the user address space, are “safe” in that they do not contain any privileged data that would be harmful to the system if disclosed to an untrusted user mode application. In the Windows kernel, the usage of this disjoint set of shadow address spaces for user and kernel modes is called “kernel virtual address shadowing”, or KVA shadow, for short.
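The switching scheme can be sketched as follows. The `Process` and `Cpu` classes and the page names are hypothetical and exist only to illustrate the idea. They are not Windows kernel structures, and address spaces are reduced to plain sets of page names.

```python
class Process:
    """Holds the two address spaces a process has under this scheme."""

    def __init__(self, user_pages, kernel_pages, transition_pages):
        # Kernel address space: maps the entire user and kernel address space.
        self.kernel_space = set(user_pages) | set(kernel_pages) | set(transition_pages)
        # Shadow user address space: user pages plus the minimal transition set.
        self.shadow_space = set(user_pages) | set(transition_pages)


class Cpu:
    """Tracks which address space is active for the running process."""

    def __init__(self, process):
        self.process = process
        self.in_user_mode = True
        self.active_space = process.shadow_space

    def enter_kernel(self):
        # System call or interrupt: transition code switches address spaces.
        self.in_user_mode = False
        self.active_space = self.process.kernel_space

    def return_to_user(self):
        # Switch back to the shadow address space before running user code.
        self.in_user_mode = True
        self.active_space = self.process.shadow_space

    def is_mapped(self, page):
        return page in self.active_space


proc = Process(
    user_pages={"app_code", "app_heap"},
    kernel_pages={"kernel_pool", "driver_data"},
    transition_pages={"syscall_entry_stub"},
)
cpu = Cpu(proc)
```

While `cpu` is in user mode, `cpu.is_mapped("kernel_pool")` is false but `cpu.is_mapped("syscall_entry_stub")` is true. After `cpu.enter_kernel()`, every page is mapped.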

In order to support this concept, each process may now have up to two address spaces: the kernel address space and the user address space. As there is no virtual memory mapping for other, potentially sensitive privileged kernel data when untrusted user mode code executes, the rogue data cache load speculative side channel is completely mitigated. This approach is not, however, without substantial complexity and performance implications, as will later be discussed.
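Why removing the mapping defeats the side channel can be shown with a deliberately simplified rule: a faulting load can only forward data if the page has a translation in the active address space. The `forwarded_value` function and the page names below are assumptions made for illustration, and the rule is a toy abstraction rather than a description of any specific processor.

```python
def forwarded_value(mapped_pages, memory, page):
    """Value a faulting user-mode load could forward to dependent micro-ops.

    The ownership check is deliberately ignored here, which models the
    vulnerable behaviour: only the presence of a mapping matters.
    """
    if page not in mapped_pages:
        return None  # no translation: nothing to load, nothing to forward
    return memory[page]


# Illustrative contents: one byte per named page.
memory = {"user_heap": 0x11, "transition_stub": 0x00, "kernel_secret": 0x2A}

# Without KVA shadow, user code runs with the full kernel mapping present.
kernel_space = {"user_heap", "transition_stub", "kernel_secret"}
# With KVA shadow, user code runs in the shadow address space.
shadow_space = {"user_heap", "transition_stub"}
```

With the full mapping, `forwarded_value(kernel_space, memory, "kernel_secret")` yields the secret byte. With the shadow address space it yields `None`, and only the transition page, which holds nothing sensitive, remains reachable.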

On a historical note, some operating systems have previously implemented similar mechanisms for a variety of different and unrelated reasons. For example, in 2003 (prior to the common introduction of 64-bit processors in most broadly-available consumer hardware), with the intention of addressing larger amounts of virtual memory on 32-bit systems, optional support was added to the 32-bit x86 Linux kernel in order to provide a 4GB virtual address space to user mode, and a separate 4GB address space to the kernel, requiring address space switches on each user/kernel transition. More recently, a similar approach, termed KAISER, has been advocated to mitigate information leakage about the kernel virtual address space layout due to processor side channels. This is distinct from the rogue data cache load speculative side channel issue, in that no kernel memory contents, as opposed to address space layout information, were at the time considered to be at risk prior to the discovery of speculative side channels...

Read more: KVA Shadow: Mitigating Meltdown on Windows Defense
