
The State of Virtualization

덕쑤 2013. 7. 9. 13:55

Summarized from

David C, The Definitive Guide to the Xen Hypervisor, Prentice Hall, 2007

Part 1. The Xen Virtual Machine

CH1. The State of Virtualization

1.1 What Is Virtualization?

  • Virtualization is conceptually similar to emulation: a system pretends to be two or more copies of the same system.
  • Modern multi-process OSes already provide a simple form of virtualization:
    • CPU scheduling (i.e., process scheduling)
    • Virtual memory
    • Sharing of hardware devices between processes
  • The isolation this gives often prevents a bug, or intentionally malicious behavior, in one application from breaking others.
  • However, because OSes may contain bugs, they sometimes allow one application to compromise this isolation.
  • Virtualization often provides a greater degree of isolation than an operating system can.

 
  1.1.1 CPU Virtualization

  • Context switching is driven by a timer interrupt (e.g., one that fires every 10 ms).
    • A process runs with exclusive use of the CPU for a while, and is then interrupted.
    • The CPU state is then saved, and another process runs. After a while, this is repeated.
  • The virtual CPU and the physical CPU are not identical.
    • When the OS is running and swapping processes, the CPU runs in a privileged mode.
    • The privileged mode allows certain operations, such as access to memory by physical address, that are not usually permitted.
  • Popek and Goldberg's requirements for a fully virtualizable CPU [1] classify instructions as follows:
    • Privileged instructions: execute in privileged mode, but trap if executed outside this mode.
    • Control sensitive instructions: attempt to change the configuration of resources in the system, such as updating virtual-to-physical memory mappings, communicating with devices, or manipulating global configuration registers.
    • Behavior sensitive instructions: behave differently depending on the configuration of resources, including all load and store operations that act on virtual memory.
  • For an architecture to be virtualizable, all sensitive instructions must also be privileged instructions.
    • The hypervisor must be able to intercept any instruction that changes the state of the machine in a way that affects other processes.
  • Example of CPU virtualization support: DEC Alpha
    • Had a special instruction that jumped to a specified firmware address (PALCode).
    • PALCode passed control from a process to the kernel, or from the kernel to a process, by setting or clearing a flag in a hidden register, respectively.
    • This mechanism could be extended to provide multiple levels of privilege.
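
The Popek and Goldberg condition can be checked mechanically once each instruction is classified. Below is a minimal Python sketch; the instruction set, its names (`add`, `set_page_table`, `io_out`, `read_mode`) and their flags are hypothetical, chosen only to illustrate the check, and do not describe any real ISA.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instr:
    name: str
    privileged: bool          # traps if executed outside privileged mode
    control_sensitive: bool   # changes the configuration of system resources
    behavior_sensitive: bool  # result depends on the configuration of resources

    @property
    def sensitive(self) -> bool:
        return self.control_sensitive or self.behavior_sensitive


def violations(isa: List[Instr]) -> List[str]:
    """Names of sensitive instructions that do NOT trap outside privileged mode."""
    return [i.name for i in isa if i.sensitive and not i.privileged]


def is_virtualizable(isa: List[Instr]) -> bool:
    """Popek-Goldberg condition: every sensitive instruction is privileged."""
    return not violations(isa)


# Hypothetical toy ISA: every sensitive instruction traps, so it is virtualizable.
TOY_ISA = [
    Instr("add", privileged=False, control_sensitive=False, behavior_sensitive=False),
    Instr("set_page_table", privileged=True, control_sensitive=True, behavior_sensitive=False),
    Instr("io_out", privileged=True, control_sensitive=True, behavior_sensitive=False),
]

# One instruction that reveals the current mode without trapping breaks the condition.
LEAKY_ISA = TOY_ISA + [
    Instr("read_mode", privileged=False, control_sensitive=False, behavior_sensitive=True),
]

print(is_virtualizable(TOY_ISA))  # True
print(violations(LEAKY_ISA))      # ['read_mode']
```

A trap-and-emulate hypervisor never gets control when `read_mode` runs in a guest, so it cannot hide from the guest kernel that it is not really running in privileged mode.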

     

1.1.2 I/O Virtualization

  • One way to virtualize memory:
    • Partition the memory.
    • Trap every privileged instruction that accesses physical memory, and replace it with one that maps to the permitted range.
    • These translations are provided by the Memory Management Unit (MMU) included in modern CPUs.
  • Other I/O devices
    • Block devices, such as hard disks, can be virtualized in the same way as main memory,
      • by dividing them into partitions, each of which can be accessed by one virtual machine (VM).
    • Issue 1: devices that are complex to virtualize, such as graphics cards
      • Give each VM a virtual frame buffer, and let the user either switch between them or map them into regions of the physical display.
      • Modern graphics cards
        • Switching between VMs is problematic, because 2D and 3D acceleration keep a lot of internal state on the card.
      • The GUI must be modified to ensure that it also saves this state elsewhere and can restore it when required.
    • Issue 2: interaction between devices and the system through Direct Memory Access (DMA)
      • Data is transferred to and from devices via DMA transfers.
      • Because the device exists outside the normal framework of the OS, it must use physical addresses rather than a virtual address space.
      • In a virtualized environment, the kernel runs in a hypervisor-provided virtual address space in much the same way that a user process runs in a kernel-provided virtual address space.
      • Accordingly, allowing the guest kernel to tell devices to write to an arbitrary location in the physical address space is a serious security hole. The situation is even worse if the guest OS (specifically, its kernel or device drivers) is not aware that it is running in a virtualized environment.
      • In theory, the hypervisor (instead of the kernel) could trap writes to devices and rewrite each DMA address to one in the permitted range.
      • In practice, even discounting the significant performance penalty, detecting a DMA instruction is nontrivial.
        • Each driver defines its own protocol for talking to its device, so the hypervisor would have to understand this protocol, parse the instruction stream, and perform the substitution.

        Figure 1. IOMMU

      • Input/Output Memory Management Unit (IOMMU)
        • Similar to an MMU; it allows pages of the physical address space to be mapped into a device's address space.
        • SPARC systems: network interfaces did not have enough address space to write into all of main memory.
        • AMD's x86-64 systems:
          • Used to connect legacy PCI devices (which support only a 32-bit address space) to x86-64 physical memory.
          • Without an IOMMU, these devices are limited to accessing the bottom 4 GB of physical memory.
        • 8- and 16-bit ISA cards in 32-bit systems
        • Graphics Address Remapping Table (GART) of AGP cards
          • A simple IOMMU used to allow textures to be loaded into an AGP graphics card using DMA transfers.
          • Not sufficient for virtualization,
            • since not all interaction with an AGP or PCIe graphics card passes through the GART.
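
A toy model of what an IOMMU does, sketched in Python under simplifying assumptions: a single-level table per device, 4 KB pages, and the hypothetical names `IOMMU`, `map` and `translate`. Real IOMMUs use hardware-walked, multi-level tables; this only shows the remapping of device addresses and the fault on unmapped ones.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class IOMMU:
    """Toy IOMMU: one table per device, mapping device pages to physical pages."""

    def __init__(self) -> None:
        self.tables = {}  # device id -> {device page number: physical page number}

    def map(self, device: str, dev_page: int, phys_page: int) -> None:
        # Set up by the hypervisor (or a trusted domain), never by the guest itself.
        self.tables.setdefault(device, {})[dev_page] = phys_page

    def translate(self, device: str, dev_addr: int) -> int:
        """Translate a DMA address issued by `device` into a physical address."""
        page, offset = divmod(dev_addr, PAGE_SIZE)
        table = self.tables.get(device, {})
        if page not in table:
            # Unmapped: the DMA is blocked instead of hitting arbitrary memory.
            raise PermissionError(f"DMA fault: {device} at {dev_addr:#x} is unmapped")
        return table[page] * PAGE_SIZE + offset


iommu = IOMMU()
# A 32-bit device page (below 4 GB) remapped to a physical page at 5 GB.
iommu.map("nic", dev_page=0x10, phys_page=0x140000)
print(hex(iommu.translate("nic", 0x10020)))  # 0x140000020
```

The same table is what confines a device to one guest: pages belonging to other VMs are simply never mapped for that device.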


1.2 Why Virtualize?

  • The basic motivation for virtualization is the same as that for multitasking operating systems.
    • Multitasking made it possible to use computing power fully.
      • Hardware became fast enough that a machine could do one task and still have spare resources.
  • Virtualization allows a number of virtual servers to be consolidated onto a single physical machine, without losing the security gained by having completely isolated environments.
  • Features of virtualization
    • Small footprint:
      • lets web hosting companies give each customer his own VM without a dedicated physical machine.
    • Low-cost cloning:
      • e.g., clone a production system to test an uncertain patch before applying it.
    • Easy migration: virtual machines can be moved from one physical computer to another.
    • Low power: consolidating a number of servers into VMs running on a smaller number of hosts can reduce power costs considerably.
    • Portability: a VM is easy to carry on a small storage device such as a USB drive or an iPod, or on a laptop.
    • Secure isolation: provides a greater degree of isolation than a process in an OS.


1.3 The First Virtual Machine

  • IBM's VM
    • Grew out of the System/360 (S/360) project.
    • Aimed to provide a stable architecture and upgrade path to IBM customers.
      • Customers could cheaply move from several S/360 minicomputers to a single S/360 mainframe.
    • The Model 67 had a self-virtualizing instruction set.
      • Each VM could be further partitioned; this made it very easy to migrate from a collection of minicomputers to a single computer.
    • The latest VM, z/VM, runs on IBM's zSeries.
      • These machines run a variety of OSes, such as Linux and AIX, in a fully virtualized environment, as well as VM/CMS applications.


1.4 The Problem of x86

  • The 80386 CPU (a 32-bit processor) was designed with some support for virtualization.
    • Allows multiple existing DOS (16-bit) applications to run at once.
    • Virtual 8086 mode (an isolated 8086 environment for older programs)
      • Runs the old real-mode addressing model on top of protected-mode addressing.
  • Problems of x86 for virtualization
    • Virtualization property: the set of sensitive instructions must be a subset of the set of privileged instructions (sensitive ⊆ privileged).
    • Unfortunately, 17 x86 instructions violate this property: they are sensitive but not privileged.
      • LAR and LSL instructions
        • These load information about a specified memory segment without trapping.
        • So the hypervisor cannot rearrange the memory layout without the guest OS finding out.
      • SIDT
        • Stores the interrupt descriptor table register without trapping, so a guest can see the value the hypervisor has set.
        • More generally, some instructions allow the values of certain condition registers to be set, but there is no corresponding instruction to read them back.
        • This means that every such write must be trapped and the new value stored elsewhere as well, so that it can be restored when the virtual machine is re-activated.
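
The last point can be sketched as a toy model in Python. Everything here is hypothetical: `ToyCPU` has one control register with a write instruction and no read instruction, and `ShadowingHypervisor` traps every guest write so it can keep a per-VM shadow copy to restore when switching VMs.

```python
class ToyCPU:
    """Hypothetical CPU with a control register that can be set but not read back."""

    def __init__(self) -> None:
        self._ctrl = 0  # hardware state; no instruction exists to read it

    def write_ctrl(self, value: int) -> None:
        self._ctrl = value


class ShadowingHypervisor:
    def __init__(self, cpu: ToyCPU) -> None:
        self.cpu = cpu
        self.shadow = {}     # VM name -> last value that VM wrote
        self.current = None  # VM currently running

    def trap_write_ctrl(self, value: int) -> None:
        # Every guest write must trap: the shadow copy is the only record of it.
        self.shadow[self.current] = value
        self.cpu.write_ctrl(value)

    def switch_to(self, vm: str) -> None:
        # Re-activating a VM restores its value from the shadow copy.
        self.current = vm
        self.cpu.write_ctrl(self.shadow.get(vm, 0))


cpu = ToyCPU()
hv = ShadowingHypervisor(cpu)
hv.switch_to("A")
hv.trap_write_ctrl(7)
hv.switch_to("B")
hv.trap_write_ctrl(3)
hv.switch_to("A")  # VM A's value (7) is back in the register
```

If even one write slipped past without trapping, the hypervisor would have no way to recover the value, since the register cannot be read.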


1.5 Some Solutions
  1.5.1 Binary Rewriting

  • Popularized by VMware; allows the VM to run in user space.

Figure 2. Binary Rewriting Scheme

  • Operation
    • In Figure 2, the instruction stream consists of one privileged instruction, one jump instruction, and six unprivileged instructions.
      • 1. The instruction stream is scanned by the virtualization environment.
      • 2. Privileged instructions are identified (in Figure 2, instruction 4 or 8).
      • 3. These are rewritten to point to their emulated versions.
    • Characteristics
      • Aggressive caching
        • Speed boost
        • But at the expense of memory usage
      • Allows most of the virtual environment to run in user space, but imposes a performance penalty.
        • Roughly 80 to 97% of native performance, with worse performance in code segments that contain many privileged instructions.
      • Performance falls further from native when doing anything I/O intensive.
      • Binary rewriting works in a similar way to a debugger.
        • A debugger provides the ability to set breakpoints, which cause the running process to be interrupted.
        • It then allows the interrupted process to be inspected by the user.
        • Machines with debugging support, such as breakpoint registers, make implementing this scheme easier.
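
A minimal Python sketch of the scan, identify and rewrite steps together with the translation cache. Instructions are modelled as tuples, and the privileged set `PRIVILEGED` and the `call_emulator` marker are hypothetical stand-ins; a real rewriter such as VMware's works on x86 machine code, and would also have to redirect jumps so control stays in translated code, which this sketch leaves out.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, ...]  # (opcode, operands...)

# Hypothetical set of opcodes this sketch treats as privileged.
PRIVILEGED = {"cli", "sti", "mov_cr3"}


def rewrite(block: List[Instr]) -> List[Instr]:
    """Scan a block and replace privileged instructions with calls to emulated versions."""
    return [("call_emulator",) + ins if ins[0] in PRIVILEGED else ins for ins in block]


class TranslationCache:
    """Aggressive caching: each block is rewritten once, trading memory for speed."""

    def __init__(self) -> None:
        self.cache: Dict[int, List[Instr]] = {}  # block address -> rewritten block
        self.translations = 0

    def lookup(self, addr: int, block: List[Instr]) -> List[Instr]:
        if addr not in self.cache:
            self.cache[addr] = rewrite(block)
            self.translations += 1
        return self.cache[addr]


block = [("add", "r1", "r2"), ("cli",), ("mov", "r3", "r1"), ("jmp", "0x2000")]
tc = TranslationCache()
out = tc.lookup(0x1000, block)
tc.lookup(0x1000, block)  # second run of the same block hits the cache
print(out[1])             # ('call_emulator', 'cli')
print(tc.translations)    # 1
```

Unprivileged instructions pass through untouched, which is why most guest code runs at close to native speed.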

     

1.5.2 Paravirtualization

  • Paravirtualization
    • Rather than dealing with problematic instructions such as those in x86, paravirtualized systems like Xen simply avoid them: the guest OS is modified so that it does not use them.
  • Differences between native and paravirtualized systems
    • In a paravirtualized VM, the guest cannot execute privileged instructions.
    • To provide similar functionality, the hypervisor exposes a set of hypercalls (analogous to an OS's system calls).

Figure 3. System calls in native and paravirtualized systems
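
A toy Python model of the hypercall interface, assuming one hypothetical hypercall, `update_page_table`. It illustrates the idea rather than Xen's actual ABI: instead of writing page tables with a privileged instruction, the guest kernel asks the hypervisor, which validates the request before applying it, much as a kernel validates a system call.

```python
class PVHypervisor:
    """Toy hypervisor exposing a hypercall table (names are hypothetical)."""

    def __init__(self) -> None:
        self.owned_frames = {}  # domain id -> set of machine frames it owns
        self.page_tables = {}   # domain id -> {virtual page: machine frame}
        self.hypercalls = {"update_page_table": self._update_page_table}

    def hypercall(self, domain: int, name: str, *args):
        # Single entry point, like a system call: dispatch on the call name.
        handler = self.hypercalls.get(name)
        if handler is None:
            raise ValueError(f"unknown hypercall: {name}")
        return handler(domain, *args)

    def _update_page_table(self, domain: int, vpage: int, frame: int) -> int:
        # Validate first: a guest may only map frames that belong to it.
        if frame not in self.owned_frames.get(domain, set()):
            raise PermissionError(f"domain {domain} does not own frame {frame}")
        self.page_tables.setdefault(domain, {})[vpage] = frame
        return 0


xen = PVHypervisor()
xen.owned_frames[1] = {100, 101}
xen.hypercall(1, "update_page_table", 5, 100)  # allowed: frame 100 belongs to domain 1
```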

 

1.5.3 Hardware-Assisted Virtualization

  • Intel's virtualization extensions (for x86)
    • IVT or VT (Intel Virtualization Technology)
    • VMX non-root mode: guest OSes run in ring 0 (the hypervisor can be invisible to them).
    • VMX root mode: a set of extra instructions is added for the hypervisor.
      • These allocate a memory page on which to store a full copy of the CPU state, and start and stop a guest OS.
      • A set of bitmaps is defined,
        • indicating whether a particular interrupt, instruction, or exception should be passed to the guest OS (in ring 0) or cause an exit to the hypervisor.
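
The exception bitmap is the simplest of these to sketch: with Intel VT, if bit n is set, exception vector n raised in the guest causes a VM exit to the hypervisor; if it is clear, the exception is delivered to the guest. The Python below models only that rule; the extra page-fault filtering VT offers and the other bitmaps are left out, and the helper names are hypothetical.

```python
# Architectural x86 exception vectors used in this example.
BP, UD, PF = 3, 6, 14  # breakpoint, invalid opcode, page fault


def make_bitmap(*vectors: int) -> int:
    """Build a 32-bit exception bitmap with one bit set per vector that should exit."""
    bitmap = 0
    for v in vectors:
        bitmap |= 1 << v
    return bitmap


def route_exception(bitmap: int, vector: int) -> str:
    """Decide where an exception raised inside the guest is handled."""
    if not 0 <= vector < 32:
        raise ValueError("x86 exception vectors are 0-31")
    return "hypervisor" if (bitmap >> vector) & 1 else "guest"


# e.g. a hypervisor using shadow page tables needs to see every page fault
bitmap = make_bitmap(PF, UD)
print(route_exception(bitmap, PF))  # hypervisor
print(route_exception(bitmap, BP))  # guest
```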

           

  • AMD's virtualization extensions (for x86)
    • Pacifica (AMD-V)
    • Adds a "ring -1", more privileged than ring 0, in which the hypervisor handles traps.
    • Closely linked to the x86-64 extensions and the Opteron architecture (including its on-die memory controller).
    • Approaches to memory partitioning
      • Shadow page tables
        • The guest OS's page tables are marked read-only.
        • When the guest tries to modify them, the resulting page fault is handled by the hypervisor.
      • Nested page tables
        • Add another layer of translation in hardware: the nested page table.
        • The page tables that map to real physical addresses are managed by the hypervisor.

Figure 4. Nested Page Tables
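
The difference between the two approaches can be sketched with flat, single-level page tables in Python (real x86 tables are multi-level, and the function names here are hypothetical). With nested paging, the guest table and then the hypervisor's table are walked on each translation; with shadow paging, the hypervisor precomputes the combined mapping and must update it whenever the read-only guest table is written.

```python
PAGE = 4096


def walk(table: dict, addr: int) -> int:
    """One level of translation: page number -> frame number, keeping the offset."""
    page, offset = divmod(addr, PAGE)
    if page not in table:
        raise LookupError(f"page fault at {addr:#x}")
    return table[page] * PAGE + offset


def nested_translate(guest_pt: dict, nested_pt: dict, gva: int) -> int:
    """Nested paging: guest virtual -> guest physical -> host physical."""
    gpa = walk(guest_pt, gva)    # table managed by the guest OS
    return walk(nested_pt, gpa)  # table managed by the hypervisor


def build_shadow(guest_pt: dict, nested_pt: dict) -> dict:
    """Shadow paging: one table mapping guest virtual pages straight to host frames."""
    return {vp: nested_pt[gp] for vp, gp in guest_pt.items() if gp in nested_pt}


guest_pt = {0: 5, 1: 6}     # guest virtual page -> guest physical page
nested_pt = {5: 42, 6: 43}  # guest physical page -> host physical frame
shadow = build_shadow(guest_pt, nested_pt)
print(hex(nested_translate(guest_pt, nested_pt, 0x1010)))  # 0x2b010
print(hex(walk(shadow, 0x1010)))                           # 0x2b010
```

Both paths give the same host address; they differ in who does the work and when, which is why nested page tables reduce the number of hypervisor interventions.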

  • Assigning a device to a specific guest OS
    • Device Exclusion Vector (DEV): allows a device to be restricted to writing only to a specific guest OS's address space.
  • Hardware-assisted virtualization
    • Referred to as HVM (Hardware Virtual Machine) in Xen.
    • Advantage: useful when the guest OS's source code is not available, because HVM allows an unmodified OS to run in a VM.
    • Shortcomings: speed and flexibility.
  • Hybrid approaches between hardware-assisted virtualization and paravirtualization
    • From HVM:
      • Fast system calls: hardware-supported privilege transitions (the bounce from the hypervisor in ring 0 to the guest kernel in ring 1 is unnecessary).
      • Nested page tables: reduce the number of hypercalls.
    • From paravirtualization:
      • Efficient I/O: a lightweight interface rather than relying on emulated hardware.

 

1.6 The Xen Philosophy

  1.6.1 Separation of Policy and Mechanism

  • Separation of policy and mechanism
    • Xen hypervisor: implements mechanisms.
    • Domain 0 guest: implements policy.
  • Xen is flexible:
    • Xen implements only basic mechanisms in the hypervisor, so policy is not hard-coded into the hypervisor itself.

 

  1.6.2 Less Is More

  • Xen tries to be as simple and small as possible, so that it is as secure and bug-free as possible.
    • Features that need flexibility are moved to Domain 0,
      • e.g., network multiplexing.
    • Communication mechanism: simple shared memory.


1.7 The Xen Architecture
  1.7.1 The Hypervisor, the OS, and Applications

Figure 5. Ring usage in IA-32 native and paravirtualized systems

  • To virtualize an IA-32 system under Xen (as shown in Figure 5):
    • The guest OS's kernel moves from ring 0 to ring 1 (here, a ring is a privilege level).
    • The Xen hypervisor runs in ring 0.
    • IA-32 supports not only a segment-based memory protection mechanism but also a page protection mechanism.

 

Figure 6. Ring usage in x86-64 native and paravirtualized systems

  • On x86-64 systems, however, both ring 1 and ring 2 are removed (as shown in Figure 6).
    • Accordingly, the guest OS's kernel and applications both run in ring 3, and the hypervisor runs in ring 0.
      • That is, only page-based protection is available, and page protection distinguishes only supervisor mode (rings 0 to 2) from user mode (ring 3).
    • x86 CPUs, going back to the 8086 and 8088, start in real mode.
      • Real mode: a 16-bit mode with access to a 20-bit address space and no memory protection.
      • For compatibility with the 8086 and 8088, later x86 CPUs kept real mode (for legacy systems) alongside the 32-bit instruction set.
        • Problem: the CPU boots in real mode and is only then switched to 32-bit protected mode.
          • Because real mode has no memory protection, a guest OS running real-mode code directly under Xen could interfere with other guest OSes.
          • This makes x86 guests hard to isolate in Xen.
      • Finally, some x86-64 systems offer Intel's Extensible Firmware Interface (EFI), which can boot in protected mode.

         

1.7.2 The Role of Domain 0

  • Xen runs guests in environments known as domains.
  • Xen is loaded by the boot loader, with the Domain 0 guest kernel supplied as a module.
  • The first guest OS in Xen is Dom0 (Domain 0).
    • It has elevated privileges.
      • Other guests are referred to as Domain U (DomU); the "U" stands for unprivileged.
      • Dom0 can access physical devices.
    • The hypervisor itself has no device drivers and no user interface.
      • Dom0 provides the user interface to the hypervisor,
        • e.g., XenStore, which stores Xen's system configuration (i.e., status information), much like /proc in the Linux kernel.
  • Components of device drivers in a Xen system
    • The split device driver: a guest OS in DomU moves data through a ring buffer to the real device driver in Dom0.
      • The split driver in the guest is purely a software driver running in the guest OS's kernel.
        • Ring buffer: memory shared between domains (e.g., carrying packets from the guest's TCP/IP stack).
    • The multiplexer
    • The real driver, located in Dom0.
  • Example: a TCP/IP packet sent from DomU (Figure 7)

    Figure 7. The path of a packet sent from an unprivileged guest via the system
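
A simplified Python sketch of a split-driver ring buffer, loosely modelled on Xen's shared rings: free-running producer and consumer counters, with each response written into the slot of the request it answers. The class and method names are hypothetical, and the grant tables (for sharing the page) and event channels (for notifications) that Xen uses alongside the ring are not modelled.

```python
class SharedRing:
    """Toy split-driver ring shared by a frontend (DomU) and a backend (Dom0)."""

    def __init__(self, size: int = 8) -> None:
        self.size = size
        self.slots = [None] * size
        # Free-running counters; the slot index is counter % size.
        self.req_prod = 0  # advanced by the frontend
        self.req_cons = 0  # advanced by the backend
        self.rsp_prod = 0  # advanced by the backend
        self.rsp_cons = 0  # advanced by the frontend

    def push_request(self, req) -> None:
        """Frontend (DomU): place a request, e.g. an outgoing packet, on the ring."""
        if self.req_prod - self.rsp_cons >= self.size:
            raise BufferError("ring full")
        self.slots[self.req_prod % self.size] = req
        self.req_prod += 1

    def backend_process(self, handle) -> None:
        """Backend (Dom0): consume requests, pass them to the real driver, post responses."""
        while self.req_cons != self.req_prod:
            req = self.slots[self.req_cons % self.size]
            self.req_cons += 1
            # Responses are produced in order, so this reuses the slot just consumed.
            self.slots[self.rsp_prod % self.size] = handle(req)
            self.rsp_prod += 1

    def pop_responses(self) -> list:
        """Frontend (DomU): collect responses posted by the backend."""
        out = []
        while self.rsp_cons != self.rsp_prod:
            out.append(self.slots[self.rsp_cons % self.size])
            self.rsp_cons += 1
        return out


ring = SharedRing(size=4)
for pkt in ["p0", "p1", "p2"]:
    ring.push_request(pkt)
ring.backend_process(lambda p: f"sent {p}")  # stands in for the real NIC driver in Dom0
print(ring.pop_responses())  # ['sent p0', 'sent p1', 'sent p2']
```

Because each side only advances its own counters, the frontend and backend never need to lock the shared page.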


  1.7.4 HVM Domain

  • Virtualization problems of x86 chips
    • The privilege properties of instructions (discussed in 1.4) and real-mode support.
    • Legacy systems: their source code cannot be modified for Xen (it is not open), so they must be run under Xen unmodified.
  • HVM domain
    • Supports HVM:
      • runs an unmodified OS, with devices emulated in Dom0.
    • Hypercalls in an HVM domain
      • The guest uses the CPUID instruction to find a virtual-machine-specific register.
      • Through it, the guest sets up the hypercall page, after which the HVM domain can issue hypercalls (using VMCALL).
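
As a small illustration of the CPUID step, here is how a guest could recognize Xen from the registers returned by the hypervisor CPUID leaf (on Xen usually 0x40000000). Python cannot execute CPUID itself, so the sketch takes the EBX, ECX and EDX values as input; the constants are the signature Xen reports, while the function names are hypothetical.

```python
import struct

# Signature values Xen places in EBX, ECX and EDX for its CPUID leaf.
XEN_EBX, XEN_ECX, XEN_EDX = 0x566E6558, 0x65584D4D, 0x4D4D566E


def hypervisor_signature(ebx: int, ecx: int, edx: int) -> str:
    # Each register holds four ASCII characters in little-endian byte order.
    return struct.pack("<III", ebx, ecx, edx).decode("ascii")


def is_xen(ebx: int, ecx: int, edx: int) -> bool:
    return hypervisor_signature(ebx, ecx, edx) == "XenVMMXenVMM"


print(hypervisor_signature(XEN_EBX, XEN_ECX, XEN_EDX))  # XenVMMXenVMM
```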

 

  1.7.5 Xen Configurations

 

Figure 8. A Xen configuration showing driver isolation and an unmodified guest OS.

 

Figure 9. A single node in a cluster Xen environment

Reference
[1] Popek and Goldberg, "Formal Requirements for Virtualizable Third Generation Architectures," Communications of the ACM, 1974.
