Over the past year, we spent a lot of time making Hyper-V research more accessible to everyone. Our first blog post, “First Steps in Hyper-V Research”, describes the tools and setup for debugging the hypervisor and examines the interesting attack surfaces of the virtualization stack components. We then published “Fuzzing para-virtualized devices in Hyper-V”, which has been the focus of our friends at the Virtualization Security Team. In that post, they explore the VSP-VSC communication over VMBus and describe an intriguing guest-to-host vulnerability in the kernel of the root partition (vpcivsp.sys). In August, Joe Bialek gave an incredible talk at Black Hat, demonstrating how he exploited another vulnerability, this time in the IDE emulator, which resides in the Virtual Machine Worker Process (VMWP). With that in mind, it’s time to look deeper inside VMWP to see what other vulnerabilities may be in there.
The Virtual Machine Worker Process: What Is It?
One of the biggest attack surfaces in our virtualization stack is implemented in the userspace of the root partition, in the Virtual Machine Worker Process (VMWP.exe). When Hyper-V is running, there is a separate instance of the VMWP.exe process for each virtual machine. As we mentioned in the first blog post, here are a few examples of the components that reside in VMWP:
- vSMB Server
- Plan9FS
- IC (Integration Components)
- Virtual devices (both emulated and non-emulated)
You could think of the VMWP as our “QEMU”-like process. We need a component to implement emulated and non-emulated devices, and we strongly prefer to implement it in userspace rather than kernelspace. Such components are typically very complex, and complex things are hard to implement correctly, which is where you come into the picture. Given that it is relatively simple to debug, has a sizable attack surface, and implements complex drivers, VMWP is an excellent place to look for vulnerabilities. Oh, and public symbols are available.
In this blog, I’d like to briefly discuss the internals of VMWP, including classes, interfaces, responsibilities, and how it works.
VMWP internals
Let’s examine some of the worker process’s internal components and the interesting flows of various operations. Let’s start with the general virtualization architecture:
The worker process runs in the root partition and communicates with the hypervisor through the Virtualization Infrastructure Driver (vid.sys). This driver also connects the hypervisor and the Virtual Machine Management Services (VMMS). The VID uses hypercalls to send management commands to the hypervisor for tasks such as:
- Partition creation/deletion
- Partition suspend/resume
- Dynamic memory
- Virtual processor add/remove
The VID also emulates ROMs and MMIO.
Because the worker process requires an interface to the VID, there is a userspace library, vid.dll (as opposed to vid.sys), that exposes an interface to the driver. This interface is clearly visible in VMWP’s import table.
A key element of this communication is the VID Notification Dispatcher (VND). Clients can register with the VND and receive VID notifications that are sent as a result of operations performed by the guest. A VND notification is the most common flow that occurs in the worker process. These are some of the possible VID notifications on x64:
    typedef enum _VID_MESSAGE_TYPE {
        VidMessageInvalid = 0x0000,

        //
        // Message types corresponding to generic hypervisor intercepts.
        //
        VidMessageMbpAccess = 0x0001,
        VidMessageException = 0x0002 | VID_MESSAGE_TYPE_FLAG_INTERCEPT,
        VidMessageTripleFault = 0x0003 | VID_MESSAGE_TYPE_FLAG_INTERCEPT,
        VidMessageExecuteInstruction = 0x0004 | VID_MESSAGE_TYPE_FLAG_INTERCEPT,
        VidMessageMmio = 0x0005 | VID_MESSAGE_TYPE_FLAG_INTERCEPT,
        VidMessageRegister = 0x0006 | VID_MESSAGE_TYPE_FLAG_INTERCEPT,

        //
        // Message types corresponding to other monitored events.
        //
        VidMessageHandlerUnregistered = 0x0007,
        VidMessageStopRequestComplete = 0x0008,
        VidMessageMmioRangeDestroyed = 0x0009,
        VidMessageTerminateVm = 0x000A,
        VidMessageGuestControlForPageCopy = 0x000B,

        //
        // Message types for minimal partitions
        //
        VidMessagePassthroughIntercept = 0x000C | VID_MESSAGE_TYPE_FLAG_INTERCEPT,
        VidMessageStopRequestCompleteDirect = 0x000D | VID_MESSAGE_TYPE_FLAG_INTERCEPT,

        VidMessageApicEoi = 0x000E | VID_MESSAGE_TYPE_FLAG_INTERCEPT,
        VidMessageCpuid = 0x000F | VID_MESSAGE_TYPE_FLAG_INTERCEPT,
        VidMessageExecuteIoInstruction = 0x0010 | VID_MESSAGE_TYPE_FLAG_INTERCEPT,
        VidMessageMsr = 0x0011 | VID_MESSAGE_TYPE_FLAG_INTERCEPT,

        VidMessageTypeCount = 0x0012,

    } VID_MESSAGE_TYPE;
Later in this post, we’ll see how important it is to understand those messages and what they mean. The VID is the glue that holds the virtualization stack together: it is what lets components inside the guest communicate with the worker process on the host. Another crucial interface is the VMB (Virtual Motherboard) interface, which VDEVs use to interact with the worker process.
Tip: This enum’s names and values are available in vmwp!TraceVidMessageDetails. Sometimes it’s nice to have tracers do all the reversing for you (spoiler: this is not the only one in this area).
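While reversing, it helps to separate the intercept flag from the message ordinal. The following is a minimal sketch of that idea. The value of VID_MESSAGE_TYPE_FLAG_INTERCEPT is not given in this post, so 0x1000 below is an assumption for illustration only, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: the real flag value is not stated in this post; 0x1000 is a placeholder.
constexpr std::uint32_t kFlagIntercept = 0x1000;

// A few message types from the enum above, rebuilt with the assumed flag.
constexpr std::uint32_t kVidMessageMmio = 0x0005 | kFlagIntercept;
constexpr std::uint32_t kVidMessageTerminateVm = 0x000A;
constexpr std::uint32_t kVidMessageExecuteIoInstruction = 0x0010 | kFlagIntercept;

// True when the message corresponds to a hypervisor intercept.
constexpr bool IsIntercept(std::uint32_t type) {
    return (type & kFlagIntercept) != 0;
}

// Strips the flag, leaving the ordinal used in the enum (0x0001..0x0011).
constexpr std::uint32_t MessageOrdinal(std::uint32_t type) {
    return type & ~kFlagIntercept;
}
```

With helpers like these, a breakpoint log can print the ordinal and whether the message is an intercept, which maps directly back to the enum listing.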
We can examine some parts and begin setting breakpoints to examine the notifications and data once we have a better understanding of how communication is handled over VID messages. Later, we’ll see how we follow the logic’s progression from VID to a vmwp component and back.
Key components of the worker process
Below is an overview of some of the key components of the worker process.
Instruction decoder
The worker process contains an x86 emulator, implemented in a class called EmulatorVP. This class emulates the execution of x86 and x64 instructions. Because vmentries/vmexits have a significant performance overhead, its goal is to speed up intercept-intensive code in child partitions by staying in the worker process context between intercepts. Although this class has 250 functions, some of the flows are simple to understand. Let’s talk about a few of its components:
- EmulatorVP::Execute*Instruction and EmulatorVP::Execute*Op: both are families of functions designed to emulate the execution of many instructions. The Execute*Instruction family sits at a higher abstraction level: after carrying out the basic general operations and checks, these functions call into the Execute*Op family, which contains the more specific functions.
- EmulatorVp::ExecuteIs*: a family of functions used to retrieve values from RFLAGS. Many instructions require checking some conditions (such as Overflow/Zero/Sign/Carry). All the flags are kept in a const union RFLAGS, as seen in one of those handlers:
- EmulatorVP::Decode*: if we perform emulation, we must also decode the instructions. EmulatorVP has a family of Decode* functions for this, which parse the instructions and direct the flow to the appropriate handler for each emulated instruction.
There are more families of that kind; I’ll let you look them up.
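To illustrate what an ExecuteIs*-style predicate has to compute, here is a minimal sketch built on the architectural bit positions of the x86 status flags. It is not the worker process code: the Rflags struct and the predicate names are hypothetical, and it uses plain bit masks instead of a union to keep the example portable.

```cpp
#include <cassert>
#include <cstdint>

// Architectural bit positions of the x86 status flags inside RFLAGS.
constexpr std::uint64_t kCarry    = 1ull << 0;
constexpr std::uint64_t kZero     = 1ull << 6;
constexpr std::uint64_t kSign     = 1ull << 7;
constexpr std::uint64_t kOverflow = 1ull << 11;

// Hypothetical stand-in for the emulator's flag state.
struct Rflags {
    std::uint64_t value;
    bool Test(std::uint64_t mask) const { return (value & mask) != 0; }
};

// Each predicate corresponds to one x86 condition code.
inline bool IsEqual(const Rflags& f)        { return f.Test(kZero); }                      // JE / JZ
inline bool IsBelow(const Rflags& f)        { return f.Test(kCarry); }                     // JB, unsigned <
inline bool IsBelowOrEqual(const Rflags& f) { return f.Test(kCarry) || f.Test(kZero); }    // JBE
inline bool IsLess(const Rflags& f)         { return f.Test(kSign) != f.Test(kOverflow); } // JL, signed <
inline bool IsLessOrEqual(const Rflags& f)  { return IsEqual(f) || IsLess(f); }            // JLE
```

An emulated conditional jump would evaluate one of these predicates against the guest's saved flags and then either update the instruction pointer or fall through.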
Let’s examine a logic flow to better understand how that logic relates to the big picture. Here, for instance, is the call stack when EmulatorVP::DecodePrimaryOp is running:
You can see why we covered VID at the beginning of this post by looking at how VndCompletionThread retrieves messages from a message queue. When VndCompletionThread receives such a message, it handles it accordingly. This flow is an example of the VidMessageExecuteIoInstruction VID message type from the VID_MESSAGE_TYPE enum definition. All messages are handled in a switch-case in VndCompletionHandler::HandleVndCallback, which is called from VndCompletionThread::RunSelf.
Keep those functions in mind. Most of the interesting flows start there.
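The shape of that flow, a thread draining a message queue and a single switch routing each message type, can be sketched as follows. This is a simplified model, not the VMWP implementation: the Message layout, the class, and the logging are all hypothetical, and only the message ordinals are taken from the enum earlier in this post.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical, simplified message; the real VID message layout is not shown in this post.
enum class MsgType : std::uint32_t { Mmio = 0x05, TerminateVm = 0x0A, ExecuteIoInstruction = 0x10 };

struct Message {
    MsgType type;
    std::uint64_t payload;
};

// Stand-in for the completion handler: one switch routes every message type.
class CompletionHandler {
public:
    void HandleCallback(const Message& msg) {
        switch (msg.type) {
        case MsgType::ExecuteIoInstruction:
            log_.push_back("io:" + std::to_string(msg.payload));
            break;
        case MsgType::Mmio:
            log_.push_back("mmio:" + std::to_string(msg.payload));
            break;
        default:
            log_.push_back("unhandled");
            break;
        }
    }
    const std::vector<std::string>& log() const { return log_; }

private:
    std::vector<std::string> log_;
};

// Stand-in for the completion thread's loop: drain the queue, dispatch each message.
inline void RunSelf(std::queue<Message>& pending, CompletionHandler& handler) {
    while (!pending.empty()) {
        handler.HandleCallback(pending.front());
        pending.pop();
    }
}
```

The practical takeaway for debugging is the same as in the real process: a breakpoint on the single dispatch function sees every guest-triggered message before it fans out to a specific component.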
Devices
Now that you’ve seen how intercepts flow into the VMWP instruction emulator, let’s talk about devices. Every virtualization platform needs to emulate devices. In KVM-based platforms, this is typically done in the QEMU process.
Most likely, you’ve heard of “emulated devices” and “para-virtualized devices.” Virtualized devices fall into the following three categories:
- Emulated devices: complete emulation of hardware in software. This is very helpful when we want to run an unmodified guest and keep it unaware that its hardware is not real (for instance, an emulated e1000 NIC running on different hosts).
- Para-virtualized devices: to reduce the complexity of virtualization or improve performance, a guest can be modified so that it is aware it is running virtualized. In the context of devices, hypervisors typically implement some virtual devices that are intended to exist only in virtual form and do not emulate any actual physical hardware.
- Direct access (passthrough): as it sounds, the hypervisor simply allows the guest to communicate with and access the hardware directly. For some devices, virtualization has a performance impact. Without hardware support, the device can sometimes be used only in a single guest partition.
All of the VSPs mentioned in the previous vPCI post are para-virtualized (PV). Drivers in the host kernel handle requests from the guest kernel for storage, network, and vPCI. Now it’s time to look at some emulated devices.
Consider the PCI-ISA bus, for example. It is implemented by a class in vmemulateddevices.dll, IsaBusDevice. By starting to reverse that class’s methods and logic, you can see the actual implementation emulating an Intel chipset called PIIX (PCI-to-ISA/IDE Xcelerator). Or take the EthernetCard class in VmEmulatedNic.dll, which conforms to the specifications of the DEC 21041, 21140A, and 21143 PCI ethernet cards. The list goes on. Numerous emulated and non-emulated devices are implemented in the worker process binary or the DLLs it loads, including:
DLL | Responsibility | Scope of the devices/logic |
---|---|---|
VmEmulatedDevices.dll | Emulates a variety of devices | IsaBusDevice, Ps2Keyboard, I8042Device, VideoS3Device, DmaControllerDevice, PitDevice, PicDevice, PciBusDevice, SpeakerDevice |
vmuidevices.dll | Emulates a variety of devices | VideoDevice, VideoDirt, InputManager, SynthMouseDevice, KeyboardDevice, RdpServerConnection, VideoSynthDevice |
vmEmulatedNic.dll | Network card devices | Class EthernetDevice emulates the following cards: DEC 21041, DEC 21140A, DEC 21143 |
VmEmulatedStorage.dll | Storage devices | IdeControllerDevice, IdeOpticalDrive, HardDrive, DiskMetric, FloppyDrive |
winHvEmulation.dll | Emulation of x86 instructions in user mode (EmulatorVP functions) | N/A |
Vmchipset.dll | Emulates numerous chipsets | PowerManagementEmulator |
Let’s examine what a full flow into that code area looks like. What happens from the moment a guest executes an instruction that requires emulation until the appropriate function in one of the libraries above handles it?
Let’s start with the generic design, which will be useful when looking at other parts of this puzzle. Consider the PCI configuration function NotifyPciConfigAccess:
    STDMETHOD(NotifyPciConfigAccess)(
        _In_ PCI_BUS_NUMBER        Bus,
        _In_ PCI_DEVICE_NUMBER     DeviceNumber,
        _In_ PCI_FUNCTION_NUMBER   FunctionNumber,
        _In_ UINT16                InRegAddress,
        _In_ BOOLEAN               InIsWrite,
        _Inout_ UINT32*            IoData
        );
The worker process is written in C++, and this is how an interface function is defined for devices with PCI configuration access handlers. A few classes implement this interface, each overriding the method with its own implementation:
    0:035> x *!*NotifyPciConfigAccess
    00007ffc`1fdafeb0 vmchipset!VmEmu::WCL::PciHandler::NotifyPciConfigAccess
    00007ffc`477047c0 VmEmulatedStorage!IdeControllerDevice::NotifyPciConfigAccess
    00007ffc`4c56c5e0 vmemulateddevices!VideoS3Device::NotifyPciConfigAccess
    00007ffc`4c556300 vmemulateddevices!IsaBusDevice::NotifyPciConfigAccess
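The pattern of one interface method overridden per device can be sketched in a few lines. This is an illustrative model only: IPciConfigHandler and ToyPciDevice are hypothetical names, the signature is trimmed down from the NotifyPciConfigAccess declaration above, and the 256-byte configuration space is the conventional PCI size rather than something taken from the worker process.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Simplified interface, modeled loosely on the NotifyPciConfigAccess signature above.
struct IPciConfigHandler {
    virtual ~IPciConfigHandler() = default;
    // Returns false when the access is rejected.
    virtual bool NotifyPciConfigAccess(std::uint16_t regAddress, bool isWrite,
                                       std::uint32_t* ioData) = 0;
};

// Hypothetical device with a 256-byte configuration space stored as 64 dwords.
class ToyPciDevice final : public IPciConfigHandler {
public:
    bool NotifyPciConfigAccess(std::uint16_t regAddress, bool isWrite,
                               std::uint32_t* ioData) override {
        // regAddress is guest-controlled: reject null, unaligned, and out-of-range accesses.
        if (ioData == nullptr || (regAddress & 3) != 0 || regAddress >= 256) {
            return false;
        }
        std::uint32_t& reg = config_[regAddress / 4];
        if (isWrite) {
            reg = *ioData;
        } else {
            *ioData = reg;
        }
        return true;
    }

private:
    std::array<std::uint32_t, 64> config_{};
};
```

Each override is a separate parser of guest-controlled values, which is why every implementation listed by the debugger is worth auditing on its own.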
That covers PCI configuration. The read/write access handlers, which likely do more parsing of guest-controlled data, are also visible. The handlers extract the bus, device, function, register address, and type code. Check out PciBusDevice for an example of everything that needs to be sanitized and bounds checked. Thanks to the debug trace strings that explain their purpose, reverse engineering the code there should be fairly simple. For example:
    if ((unsigned int)VmlIsDebugTraceEnabled(0xC0C5i64))
        VmlDebugTrace(
            0xC0C5i64,
            L"Write to PCI Data Port %04X [bus=%02X, unit=%02X, func=%02X, reg=%02X]: %08X",
            ioAddr, …);
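The bus, unit, function, and register fields in that trace string match the layout of the standard PCI CONFIG_ADDRESS register (configuration mechanism #1, port 0xCF8). As a reference while reversing, here is a minimal decoder for that layout. It is a sketch of the standard format, not the worker process code, and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a CONFIG_ADDRESS value (PCI configuration mechanism #1, port 0xCF8).
struct PciConfigAddress {
    bool enabled;            // bit 31
    std::uint8_t bus;        // bits 23..16
    std::uint8_t device;     // bits 15..11
    std::uint8_t function;   // bits 10..8
    std::uint8_t reg;        // bits 7..2, kept as a dword-aligned byte offset
};

inline PciConfigAddress DecodeConfigAddress(std::uint32_t raw) {
    PciConfigAddress a;
    a.enabled  = (raw >> 31) != 0;
    a.bus      = static_cast<std::uint8_t>((raw >> 16) & 0xFF);
    a.device   = static_cast<std::uint8_t>((raw >> 11) & 0x1F);
    a.function = static_cast<std::uint8_t>((raw >> 8) & 0x07);
    a.reg      = static_cast<std::uint8_t>(raw & 0xFC);  // low two bits are ignored
    return a;
}
```

Masking every field to its architectural width, as done here, is the kind of sanitization to look for in the real handlers.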
If you watched Joe Bialek’s talk from Black Hat 2019, you’ll understand how crucial it is to properly sanitize the data entering the IDE (storage) emulator through its IO port write handler, IdeControllerDevice::NotifyIoPortWrite. Let’s look at how to debug the entire flow, from the guest partition to the host worker process.
Let’s go over CVE-2018-0959 to establish a baseline. The root cause of this vulnerability in the storage emulator is that the code handling IO port reads and writes lacks validation of guest-controlled data. For an example, look at this image from Joe’s talk:
Observing the entire flow from the guest to that function in the worker process on the host makes it easier to understand the full picture and possibly to identify new bugs. Let’s debug, then. Consider the following instruction, executed in the guest partition:
     out 0x1f0, 0x41414141
We cannot execute this privileged instruction directly from userspace. We have two options:
- Write a kernel driver that exposes an interface to userspace and allows us to read from and write to IO ports.
- Attach a kernel debugger and execute it from there.
We would choose the first option if we were writing a complete exploit. The second option is good enough for simply running the instruction and debugging the worker process in usermode.
In the screenshots that follow, the vmwp debugger and kd are shown side by side.
Let’s say we want to set a breakpoint on writes to IO ports in the IDE storage emulator. The relevant function is VmEmulatedStorage!IdeControllerDevice::NotifyIoPortWrite. We can set the following breakpoint:
    bp vmemulatedstorage!IdeControllerDevice::NotifyIoPortWrite ".printf \"write to port: 0x%x, accessSize==0x%x, data==0x%x\\r\\n\", @rdx, @r8, @r9;g"
This lets us see every write to the IDE emulator’s IO ports. At the bottom left, we can see ours, with the controlled data. The traces stopped because I broke into the kernel debugger. Even though nothing was running in the guest’s kernel at the time, the write we performed from the debugger was still delivered to the host usermode.
Let’s say we want to break and examine the call stack for our particular flow. We can alter our breakpoint as follows:
    bp vmemulatedstorage!IdeControllerDevice::NotifyIoPortWrite ".printf \"write to port: 0x%x, accessSize==0x%x, data==0x%x\\r\\n\", @rdx, @r8, @r9; .if(@r9 != 41414141) {g;};"
Similar to the instruction emulation example, the call stack here leads from the VND to an emulated storage device.
This is a good starting point for setting some solid breakpoints. Here is a list of interesting functions:
Function | Description |
---|---|
VndCompletionHandler::HandleVndCallback | Called when an IO completion event is received. This is at the core of most flows that handle requests and data from guests. |
EmulatorVp::TryIoEmulation | Emulates a simple IO instruction (including string IO instructions). |
VmbCallback::NotifyIoPortWrite | General handler for IO port writes. It calls the handler of the relevant device. |
EmulatorVp::Execute*Instruction, EmulatorVp::Execute*Op | Emulate the execution of many instructions. |
EmulatorVp::GenerateEvent | Generates an event for the guest. Events could include nested pages, exceptions, etc. This function is crucial because most events are injected into guests through it. For instance, it is used to inject page faults and protection faults into the guest. |
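To make the role of a general IO port write handler concrete, here is a minimal sketch of a router that forwards a guest write to whichever device claimed the port. It is a hypothetical model, not the VmbCallback implementation: IoPortRouter, Register, and the handler signature are invented for illustration. The range 0x1F0 to 0x1F7 used in the usage note is the conventional primary IDE channel, which is why the write to 0x1f0 earlier reached the IDE emulator.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical registry: each device claims an inclusive range of IO ports.
class IoPortRouter {
public:
    using WriteHandler =
        std::function<void(std::uint16_t port, std::uint16_t size, std::uint32_t data)>;

    void Register(std::uint16_t first, std::uint16_t last, WriteHandler handler) {
        ranges_.push_back({first, last, std::move(handler)});
    }

    // Routes a guest write to the owning device; returns false if no device claims the port.
    bool NotifyIoPortWrite(std::uint16_t port, std::uint16_t size, std::uint32_t data) const {
        // Only 1-, 2-, and 4-byte port accesses exist on x86.
        if (size != 1 && size != 2 && size != 4) {
            return false;
        }
        for (const Range& r : ranges_) {
            if (port >= r.first && port <= r.last) {
                r.handler(port, size, data);
                return true;
            }
        }
        return false;
    }

private:
    struct Range {
        std::uint16_t first;
        std::uint16_t last;
        WriteHandler handler;
    };
    std::vector<Range> ranges_;
};
```

Registering a handler for 0x1F0 through 0x1F7 and then routing a 4-byte write of 0x41414141 to port 0x1F0 invokes that handler with the guest-controlled data, mirroring the breakpoint experiment above.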
Para-virtualized devices
Let’s now discuss how the worker process can offer new PCI devices to the guest. For that, the worker process uses a VpciBus interface. This code wraps the ioctl API that the vPCI VSP exposes (we must communicate with the device in some way). As we covered in the previous blog post, VSPs are drivers that run in the kernelspace of the root partition. They communicate over VMBus with Virtualization Service Consumers (VSCs) in the guest’s kernelspace. Inside VMWP, we can find the code that opens a HANDLE to the device exposed by vpcivsp.sys:
Using that HANDLE, VMWP issues ioctls to the vPCI VSP device, requesting that it carry out a variety of tasks. One example is IOCTL_VPCI_ASSIGN_DEVICE, which maps to the function vpcivsp!VpciIoctlAssignDevice inside the kernel driver vpcivsp.sys and is in charge of assigning devices to guests. We can see that call from the function vmwp!VpciBus::AssignDevice:
Note that this is for PCI devices. Another example is the emulation of the two 8237 DMA controllers on a PC’s motherboard. If you want to learn more, look up the class vmemulateddevices!DmaControllerDevice in vmemulateddevices.dll, which implements both of them.
Let’s examine a real-world CVE example. Joe Bialek describes CVE-2018-0959 and its exploit in Exploiting the Hyper-V IDE Emulator to Escape the Virtual Machine, so we will talk about a different vulnerability here.
In last year’s Black Hat talk, A Dive into Hyper-V Architecture and Vulnerabilities, Joe and Nico made a passing reference to CVE-2018-0888. The root cause of the vulnerability is uninitialized memory during MMIO emulation. BatteryEmulator::MmioRead() is expected to initialize ReadBuffer, and it looked as follows:
    void BatteryEmulator::MmioRead(
        std::uint64_t /* RangeBase */,
        std::uint64_t RangeOffset,
        std::uint64_t NumberOfBytes,
        _Out_writes_bytes_(NumberOfBytes) void* ReadBuffer
        ) noexcept
    {
        UINT32 readValue = 0;

        if (NumberOfBytes != 4)
        {
            // log illegal read size
            return;
        }
    ...
Clearly, the buffer is left uninitialized whenever NumberOfBytes != 4, because the function returns before writing to it. This issue was fixed by forcing the buffer to be initialized before calling BatteryEmulator::MmioRead.
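The shape of that fix, clearing the output buffer on the caller side so that no early-return path can leak stale bytes, can be sketched as follows. This is a hedged illustration rather than the actual patch: ToyMmioRead and DispatchMmioRead are hypothetical names, and the value the toy handler returns is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical handler with the same flaw as described above: it returns early
// on an unexpected size and leaves the output buffer untouched.
inline void ToyMmioRead(std::uint64_t offset, std::size_t numberOfBytes, void* readBuffer) {
    if (numberOfBytes != 4) {
        return;  // buffer not written on this path
    }
    const std::uint32_t value = static_cast<std::uint32_t>(offset) ^ 0x5A5A5A5Au;
    std::memcpy(readBuffer, &value, sizeof(value));
}

// Caller-side hardening: clear the buffer first, so no path can leak stale bytes.
inline void DispatchMmioRead(std::uint64_t offset, std::size_t numberOfBytes, void* readBuffer) {
    std::memset(readBuffer, 0, numberOfBytes);
    ToyMmioRead(offset, numberOfBytes, readBuffer);
}
```

Initializing in the dispatcher rather than in each handler protects every device behind it at once, including handlers that have not been audited yet.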
We could also look at the callstack:
    0:001> kc
    Call Site
    vmchipset!BatteryEmulator::NotifyMmioRead
    vmwp!VmbComMmioHandlerAdapter::ReadCallback
    vmwp!VmbCallback::NotifyMmioRead
    vmwp!VND_HANDLER_CONTEXT::NotifyMmioRead
    vmwp!EmulatorVp::DispatchMmioOperation
    vmwp!EmulatorVp::FinishReadMemoryOperation
    vmwp!EmulatorVp::FinishReadModRmOperation
    vmwp!EmulatorVp::ExecuteGEInstruction
    vmwp!EmulatorVp::ExecuteInstructions
    vmwp!EmulatorVp::ActuallyAttemptEmulation
    vmwp!EmulatorVp::TryEmulation
    vmwp!VndIce::HandleExecutionRequest
    vmwp!VndCompletionHandler::HandleVndCallback
    vmwp!VndCompletionThread::RunSelf
    vmwp!::operator()
    vmwp!Vml::VmThread::Run
    vmwp!Vml::VmThread::OnRunThread
    ucrtbase!invoke_thread_procedure
    ucrtbase!thread_start
    KERNEL32!BaseThreadInitThunk
    ntdll!RtlUserThreadStart
VMWP hardening
We invest time and effort in hardening the worker process because, as is obvious, it provides a very good attack surface for guest-to-host vulnerabilities. The following mitigations are in place:
- CFG enabled, along with:
  - CFG export suppression (significantly reduces the number of valid call targets)
  - “ForceCFG” (only CFG-enabled modules can be loaded)
- ACG (Arbitrary Code Guard) enabled
  - Does not permit +X pages to be allocated or mapped
  - Does not permit changing the protection of +X pages
- CIG (Code Integrity Guard) enabled
  - Does not permit unsigned DLLs to be loaded
- Removed SeImpersonatePrivilege
Note that CFG export suppression kills the technique used in the Black Hat 2019 exploit, which called VirtualProtect to make _guard_check_icall_fptr writeable and pointed it to a no-op function.
In general, we prefer to implement components in the worker process rather than in the kernel as VSPs. However, given strong primitives, exploitation is always feasible (with some effort). Two strategies come to mind right away:
- Nothing verifies the validity of return addresses on the stack, even with all the mitigations enabled. Therefore, arbitrary read and write primitives could lead to code execution via ROP.
- Even with only memory corruption primitives, an attacker still has a wide range of options. For instance, you could call filesystem APIs such as CreateFile, WriteFile, etc. to try to gain code execution in other services. An attacker might also use other strategies.
In this blog, you saw how much attack surface is exposed in the worker process. We encourage anyone interested in attacking the virtualization stack to do so for rewards. We examined key debugging scenarios, observed how communication between components works, and provided a list of the relevant mitigations. I’m hoping that more researchers will use those tools to dig in and uncover interesting bugs.
If you have any questions, don’t hesitate to DM me on Twitter at @AmarSaar.
Saar Amar, MSRC-IL