Every operating system student learns about system calls—the gateway through which user-space applications request services from the kernel. When an application needs to read a file, allocate memory, or create a process, it invokes a system call that transitions from user mode to kernel mode, performs the privileged operation, and returns the result.
Hypercalls are the same concept, elevated by one privilege layer.
In a paravirtualized environment, the guest kernel cannot directly perform certain privileged operations—they require hypervisor intervention. When the guest needs to update page tables, handle an interrupt, or configure a timer, it makes a hypercall: a controlled transition from the guest kernel to the hypervisor that requests a specific operation.
The analogy is precise: system calls are the API between applications and the kernel; hypercalls are the API between the guest kernel and the hypervisor. Both provide controlled, validated access to a higher privilege level.
By the end of this page, you will understand the architecture of hypercall interfaces, how hypercalls are implemented at the machine level, the design of common hypercall categories, performance considerations and optimization techniques, and the security validation required for safe hypercall handling.
A hypercall system consists of several components working together to provide efficient, secure communication between guests and the hypervisor:
Architectural Components:
Hypercall Table — A dispatch table in the hypervisor mapping hypercall numbers to handler functions
Entry Mechanism — The instruction or sequence that transfers control from guest to hypervisor
Calling Convention — Register/memory layout for passing arguments and receiving results
Handler Functions — Hypervisor code implementing each hypercall operation
Return Path — Mechanism to resume guest execution with the result
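The components above can be sketched as a toy user-space model of hypervisor-side dispatch (the hypercall numbers, handler names, and `TOY_ENOSYS` value are illustrative inventions, not any real hypervisor's ABI):

```c
#include <stddef.h>

/* Hypothetical hypercall numbers -- illustrative only */
#define HC_CONSOLE_IO 0
#define HC_SET_TIMER  1
#define HC_MAX        2

#define TOY_ENOSYS    38   /* stand-in for ENOSYS */

typedef long (*hypercall_fn)(unsigned long arg0, unsigned long arg1);

static long hc_console_io(unsigned long buf, unsigned long len)
{
    (void)buf;
    return (long)len;              /* pretend we wrote len bytes */
}

static long hc_set_timer(unsigned long deadline, unsigned long unused)
{
    (void)deadline; (void)unused;
    return 0;
}

/* The hypercall table: number -> handler function */
static const hypercall_fn hypercall_table[HC_MAX] = {
    [HC_CONSOLE_IO] = hc_console_io,
    [HC_SET_TIMER]  = hc_set_timer,
};

/* Entry point: validate the number before indexing the table */
static long do_hypercall(unsigned long nr, unsigned long arg0,
                         unsigned long arg1)
{
    if (nr >= HC_MAX || hypercall_table[nr] == NULL)
        return -TOY_ENOSYS;        /* unknown hypercall */
    return hypercall_table[nr](arg0, arg1);
}
```

A real entry mechanism would arrive here via a VM exit with the guest's registers saved; the bounds check before the table lookup is the essential pattern either way.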
| Aspect | System Call | Hypercall |
|---|---|---|
| Caller | User-space application | Guest kernel |
| Callee | Operating system kernel | Hypervisor |
| Privilege transition | Ring 3 → Ring 0 | Ring 1/3 → Ring 0 (or VM exit) |
| Entry instruction | syscall, sysenter, int 0x80 | vmcall, vmmcall, trap instruction |
| Typical latency | ~100-200 cycles | ~200-1000 cycles |
| Validation required | Check user permissions | Check guest identity and permissions |
| Examples | read(), write(), fork() | mmu_update(), event_channel_op() |
The Hypercall Page:
Many hypervisors provide a dedicated hypercall page—a region of memory containing the optimal instruction sequence for invoking hypercalls on the current platform. This allows the guest to use the most efficient available mechanism without hardcoding assumptions:
- vmcall instruction (Intel VT-x)
- vmmcall instruction (AMD-V)
- int $0x82 or a similar software trap (no hardware virtualization support)

The guest copies the hypercall invocation stub from the hypervisor-provided page and uses it for all calls. This provides forward compatibility as hardware evolves.
```c
/* Hypercall Page Setup - Xen Example */

/*
 * The hypervisor provides a page with optimal hypercall entry stubs.
 * Guest copies these to a known location and calls through them.
 */

/* Hypercall page provided by guest kernel */
extern char hypercall_page[PAGE_SIZE];

void __init xen_hypercall_setup(void)
{
    /*
     * HYPERCALL_PAGE_MSR contains the physical address where
     * the hypervisor should write the hypercall page.
     */
    unsigned long hypercall_msr;

    hypercall_msr = __pa(hypercall_page);
    hypercall_msr |= (unsigned long)XEN_SIGNATURE << 32;

    /* Ask Xen to populate the hypercall page */
    wrmsrl(MSR_HYPERCALL_PAGE, hypercall_msr);

    /*
     * Now hypercall_page contains stubs like:
     *
     * hypercall_page + 0*32: (hypercall 0 entry - mmu_update)
     *     mov $0, %eax
     *     vmcall          ; or vmmcall, or int $0x82
     *     ret
     *
     * hypercall_page + 1*32: (hypercall 1 entry - set_gdt)
     *     mov $1, %eax
     *     vmcall
     *     ret
     *
     * Each stub is 32 bytes, allowing up to 128 hypercalls per page.
     */
}

/* Invoking a hypercall - call into the page */
static inline long HYPERVISOR_mmu_update(mmu_update_t *req,
                                         unsigned int count,
                                         unsigned int *success_count,
                                         domid_t domid)
{
    long ret;

    /*
     * Call the stub at hypercall_page + (hypercall_nr * 32).
     * Arguments in registers per AMD64 calling convention:
     *   %rdi = req
     *   %rsi = count
     *   %rdx = success_count
     *   %r10 = domid (note: not %rcx, which is clobbered)
     */
    register domid_t _domid asm("r10") = domid;  /* pin domid to %r10 */

    asm volatile(
        "call *%[entry]"
        : "=a" (ret)
        : [entry] "r" (hypercall_page + __HYPERVISOR_mmu_update * 32),
          "D" (req), "S" (count), "d" (success_count), "r" (_domid)
        : "memory", "rcx", "r11"
    );

    return ret;
}
```

The physical mechanism for transferring control from guest to hypervisor varies by hardware platform and virtualization mode. Understanding these mechanisms reveals important performance and security characteristics.
Software Trap (Pure Paravirtualization):
In pure paravirtualization without hardware virtualization support, hypercalls use software interrupt instructions:
- int $0x82 on Xen (an arbitrary vector not used by standard x86)
- analogous to int $0x80, used for Linux system calls in 32-bit mode
```asm
; Software trap hypercall entry (Xen PV)
; Guest executes this stub from hypercall page

hypercall_entry_int82:
    ; Arguments already in registers:
    ;   eax = hypercall number
    ;   ebx, ecx, edx, esi, edi = arguments 1-5
    int $0x82               ; Trap to hypervisor
    ; On return, eax contains result
    ret

; Hypervisor IDT handler for int 0x82
hypervisor_int82_handler:
    ; Save guest state
    push_all_registers

    ; Validate hypercall number
    cmp eax, MAX_HYPERCALL
    jae .invalid_hypercall

    ; Dispatch to handler
    mov rax, [hypercall_table + rax * 8]
    call rax

    ; Restore guest state with result in eax
    pop_all_registers
    iret

.invalid_hypercall:
    mov eax, -ENOSYS
    pop_all_registers
    iret
```

Hypercalls can be organized into functional categories based on the subsystem they serve. Each category has distinct characteristics and usage patterns.
| Category | Example Hypercalls | Purpose |
|---|---|---|
| Memory Management | mmu_update, mmuext_op, update_va_mapping | Page table updates, TLB flushes |
| Event Channels | event_channel_op, set_callbacks | Virtual interrupt delivery |
| Scheduling | sched_op, vcpu_op | Yield, block, vCPU management |
| Console/Debug | console_io, sysctl | Output, debugging, configuration |
| Domain Management | domctl, memory_op | VM lifecycle (privileged only) |
| Grant Tables | grant_table_op | Inter-domain memory sharing |
| Time | set_timer_op, vcpu_time_info | Timer events, wall clock |
| Multicall | multicall | Batch multiple hypercalls |
Let's examine the most critical hypercall categories in detail:
Memory Management Hypercalls:
These are among the most frequently invoked hypercalls, as every page table modification in a paravirtualized guest requires hypervisor validation. The mmu_update hypercall accepts an array of update requests, allowing batched updates for efficiency:
```c
/* Memory Management Hypercalls */

/* Single MMU update (PTE, PDE, etc.) */
struct mmu_update {
    uint64_t ptr;   /* Machine address of PTE to update */
    uint64_t val;   /* New value to write */
};

/* Extended MMU operations */
struct mmuext_op {
    unsigned int cmd;                   /* Operation code */
    union {
        unsigned long mfn;              /* For NEW_BASEPTR */
        unsigned long linear_addr;      /* For INVLPG */
    } arg1;
    union {
        unsigned int nr_ents;           /* For operations on ranges */
        unsigned long pcid;             /* For TLB operations */
    } arg2;
};

/* MMU operation codes */
#define MMUEXT_PIN_L1_TABLE     0   /* Pin as L1 (PTE) page table */
#define MMUEXT_PIN_L2_TABLE     1   /* Pin as L2 (PDE) page table */
#define MMUEXT_PIN_L3_TABLE     2   /* Pin as L3 (PDPE) page table */
#define MMUEXT_PIN_L4_TABLE     3   /* Pin as L4 (PML4E) page table */
#define MMUEXT_UNPIN_TABLE      4   /* Unpin page table */
#define MMUEXT_NEW_BASEPTR      5   /* Set new page table base (CR3) */
#define MMUEXT_TLB_FLUSH_LOCAL  6   /* Flush local TLB */
#define MMUEXT_INVLPG_LOCAL     7   /* Invalidate single TLB entry */
#define MMUEXT_TLB_FLUSH_MULTI  8   /* Flush TLBs on multiple CPUs */
#define MMUEXT_INVLPG_MULTI     9   /* Invalidate entry on multiple CPUs */
#define MMUEXT_TLB_FLUSH_ALL    10  /* Flush all TLBs (all CPUs) */
#define MMUEXT_INVLPG_ALL       11  /* Invalidate entry on all CPUs */

/* Changing page table base (equivalent to writing CR3) */
static void xen_write_cr3(unsigned long cr3)
{
    struct mmuext_op op;

    op.cmd = MMUEXT_NEW_BASEPTR;
    op.arg1.mfn = PFN_DOWN(__pa(cr3));
    HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
}

/* Flushing TLB for a single address */
static void xen_flush_tlb_single(unsigned long addr)
{
    struct mmuext_op op;

    op.cmd = MMUEXT_INVLPG_LOCAL;
    op.arg1.linear_addr = addr;
    HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
}

/* Batched PTE updates - efficient for bulk modifications */
static void xen_set_pte_batch(pte_t *ptes[], pte_t vals[], int count)
{
    struct mmu_update updates[MAX_BATCH_SIZE];
    int i;

    for (i = 0; i < count; i++) {
        updates[i].ptr = arbitrary_virt_to_machine(ptes[i]);
        updates[i].val = pte_val_ma(vals[i]);
    }

    /* Single hypercall for all updates */
    if (HYPERVISOR_mmu_update(updates, count, NULL, DOMID_SELF) < 0)
        BUG();
}
```

Event Channel Hypercalls:
Event channels provide the virtual interrupt mechanism. The event_channel_op hypercall manages channel lifecycle and notification:
```c
/* Event Channel Hypercalls */

/* Event channel operations */
#define EVTCHNOP_bind_interdomain 0  /* Create channel between domains */
#define EVTCHNOP_bind_virq        1  /* Bind to virtual IRQ */
#define EVTCHNOP_bind_pirq        2  /* Bind to physical IRQ */
#define EVTCHNOP_close            3  /* Close channel */
#define EVTCHNOP_send             4  /* Send notification */
#define EVTCHNOP_status           5  /* Query channel status */
#define EVTCHNOP_alloc_unbound    6  /* Allocate unbound channel */
#define EVTCHNOP_bind_ipi         7  /* Bind for IPI delivery */
#define EVTCHNOP_unmask           9  /* Unmask channel */

/* Bind a virtual IRQ (timer, console, etc.) to an event channel */
int bind_virq(unsigned int virq, unsigned int cpu)
{
    struct evtchn_bind_virq bind = {
        .virq = virq,
        .vcpu = cpu,
    };
    int ret;

    ret = HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq, &bind);
    if (ret == 0)
        return bind.port;   /* Return allocated port number */
    return ret;
}

/* Send notification on an event channel */
static inline void notify_remote_via_evtchn(unsigned int port)
{
    struct evtchn_send send = { .port = port };

    HYPERVISOR_event_channel_op(EVTCHNOP_send, &send);
}

/* Allocate an unbound event channel for inter-domain comms */
int alloc_unbound(domid_t remote_dom)
{
    struct evtchn_alloc_unbound alloc = {
        .dom = DOMID_SELF,
        .remote_dom = remote_dom,
    };
    int ret;

    ret = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound, &alloc);
    if (ret == 0)
        return alloc.port;
    return ret;
}

/* Setting up event channel callbacks during boot */
void xen_setup_callbacks(void)
{
    struct xen_callback_register event_cb = {
        .type = CALLBACKTYPE_event,
        .address = (unsigned long)xen_hypervisor_callback,
    };
    struct xen_callback_register failsafe_cb = {
        .type = CALLBACKTYPE_failsafe,
        .address = (unsigned long)xen_failsafe_callback,
    };

    HYPERVISOR_callback_op(CALLBACKOP_register, &event_cb);
    HYPERVISOR_callback_op(CALLBACKOP_register, &failsafe_cb);
}
```

Since each hypercall incurs fixed overhead (VM exit, state save/restore, dispatch), executing many small hypercalls can be expensive. The multicall mechanism solves this by batching multiple hypercall requests into a single hypervisor transition.
How Multicall Works:
1. The guest accumulates hypercall requests in an array of multicall entries instead of issuing them immediately.
2. A single multicall hypercall passes the array to the hypervisor.
3. The hypervisor executes each entry in order and writes each result back into its entry.

This amortizes the transition overhead across many operations.
```c
/* Multicall Optimization */

struct multicall_entry {
    unsigned long op;       /* Hypercall operation number */
    long result;            /* Result (filled in by hypervisor) */
    unsigned long args[6];  /* Up to 6 arguments */
};

/* Per-CPU multicall batch */
struct mc_buffer {
    unsigned int mc_idx;
    struct multicall_entry entries[MC_BATCH];
};
static DEFINE_PER_CPU(struct mc_buffer, mc_buffer);

/* Add a hypercall to the current batch */
static inline void xen_mc_entry(unsigned long op,
                                unsigned long arg0, unsigned long arg1,
                                unsigned long arg2, unsigned long arg3,
                                unsigned long arg4)
{
    struct mc_buffer *mc = this_cpu_ptr(&mc_buffer);
    struct multicall_entry *entry = &mc->entries[mc->mc_idx++];

    entry->op = op;
    entry->args[0] = arg0;
    entry->args[1] = arg1;
    entry->args[2] = arg2;
    entry->args[3] = arg3;
    entry->args[4] = arg4;

    /* Flush if batch is full */
    if (mc->mc_idx >= MC_BATCH)
        xen_mc_issue();
}

/* Execute the batched hypercalls */
void xen_mc_issue(void)
{
    struct mc_buffer *mc = this_cpu_ptr(&mc_buffer);

    if (mc->mc_idx == 0)
        return;

    /* Single hypercall executes entire batch */
    HYPERVISOR_multicall(mc->entries, mc->mc_idx);

    /* Check results if needed */
    for (int i = 0; i < mc->mc_idx; i++) {
        if (unlikely(mc->entries[i].result < 0))
            handle_multicall_error(&mc->entries[i], i);
    }

    mc->mc_idx = 0;
}

/*
 * Example: Updating multiple PTEs with multicall
 * Instead of N hypercalls, we make 1 multicall + 1 flush
 */
void xen_set_pmd_batch(pmd_t *pmds[], pmd_t vals[], int count)
{
    struct mmu_update updates[count];

    preempt_disable();  /* Keep on same CPU for multicall buffer */

    /* Build array of updates */
    for (int i = 0; i < count; i++) {
        updates[i].ptr = arbitrary_virt_to_machine(pmds[i]);
        updates[i].val = pmd_val_ma(vals[i]);
    }

    /* Add to multicall batch */
    xen_mc_entry(__HYPERVISOR_mmu_update,
                 (unsigned long)updates,
                 count,
                 0,              /* success_count - not needed */
                 DOMID_SELF,
                 0);

    /* Add TLB flush to same batch */
    struct mmuext_op flush = { .cmd = MMUEXT_TLB_FLUSH_LOCAL };
    xen_mc_entry(__HYPERVISOR_mmuext_op,
                 (unsigned long)&flush,
                 1, 0, DOMID_SELF, 0);

    /* Issue single multicall for both operations */
    xen_mc_issue();

    preempt_enable();
}
```

Multicall is most beneficial when you have multiple independent hypercalls to make in sequence. The Linux kernel's Xen code automatically batches related operations (e.g., page table updates before a context switch). For single, isolated hypercalls, the multicall wrapper adds overhead and should be avoided.
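The trade-off can be made concrete with a toy cost model (the cycle counts below are assumptions for illustration, not measurements): N separate hypercalls pay the full transition cost every time, while one multicall pays it once plus a small per-entry dispatch overhead.

```c
/* Assumed, illustrative costs in CPU cycles */
#define T_EXIT  800   /* fixed cost of one guest->hypervisor transition */
#define T_OP    100   /* work per operation */
#define T_ENTRY  20   /* per-entry dispatch overhead inside a multicall */

/* N separate hypercalls: pay the transition cost every time */
static unsigned long cost_separate(unsigned n)
{
    return (unsigned long)n * (T_EXIT + T_OP);
}

/* One multicall: pay the transition once, plus per-entry dispatch */
static unsigned long cost_multicall(unsigned n)
{
    return T_EXIT + (unsigned long)n * (T_OP + T_ENTRY);
}
```

With these numbers, a single wrapped call is slightly slower than a direct hypercall (matching the caveat above), but the multicall wins from the second batched operation onward and the advantage grows with batch size.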
Hypercall handlers are the hypervisor's attack surface. A malicious guest could attempt to exploit hypercalls to:

- read or modify memory it does not own
- escalate privileges, e.g., by mapping another domain's pages or issuing management operations
- crash or hang the hypervisor, denying service to all guests
Security Principles for Hypercall Design:
```c
/* Hypercall Validation Examples */

/* Validating guest-provided memory pointer */
static int validate_guest_buffer(struct domain *d, unsigned long guest_va,
                                 size_t size, bool write)
{
    unsigned long gpfn, mfn;
    p2m_type_t p2m;

    /* Check alignment */
    if (guest_va & (sizeof(long) - 1))
        return -EINVAL;

    /* Check size limits */
    if (size > MAX_HYPERCALL_BUFFER_SIZE)
        return -E2BIG;

    /* Walk guest page tables to get machine frame */
    gpfn = guest_va >> PAGE_SHIFT;
    mfn = get_gfn_query(d, gpfn, &p2m);
    if (!mfn_valid(mfn)) {
        put_gfn(d, gpfn);           /* drop reference on error paths too */
        return -EFAULT;
    }

    /* Check page type allows requested access */
    if (write && !p2m_writeable(p2m)) {
        put_gfn(d, gpfn);
        return -EACCES;
    }

    put_gfn(d, gpfn);
    return 0;
}

/* Validating page table entry before allowing update */
static int validate_pte_update(struct domain *d, unsigned long ptr,
                               unsigned long val)
{
    unsigned long mfn;
    struct page_info *page;

    /* ptr must be machine address of a PTE in guest's page table */
    mfn = ptr >> PAGE_SHIFT;
    page = mfn_to_page(mfn);

    /* Verify guest owns this page table page */
    if (page_get_owner(page) != d)
        return -EPERM;

    /* Verify page is pinned as a page table page */
    if (!(page->u.inuse.type_info & PGT_validated))
        return -EINVAL;

    /* If mapping something, verify guest owns target frame */
    if (val & _PAGE_PRESENT) {
        unsigned long target_mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
        struct page_info *target = mfn_to_page(target_mfn);

        /* Must own or have grant to target frame */
        if (page_get_owner(target) != d &&
            !grant_map_valid(d, target_mfn))
            return -EPERM;

        /* Cannot map writable + executable in same PTE (W^X):
         * writable (RW set) and executable (NX clear) together is refused */
        if ((val & _PAGE_RW) && !(val & _PAGE_NX))
            return -EACCES;         /* Policy decision */
    }

    return 0;
}

/* Domain permission check for privileged hypercalls */
static bool domain_has_privilege(struct domain *d, unsigned int operation,
                                 domid_t target_domain)
{
    switch (operation) {
    case DOMCTL_createdomain:
    case DOMCTL_destroydomain:
    case DOMCTL_pausedomain:
    case DOMCTL_unpausedomain:
        /* Only Domain 0 (management domain) can do these */
        return d->domain_id == 0;

    case DOMCTL_getdomaininfo:
        /* Dom0 can query any domain; others only themselves */
        return d->domain_id == 0 || d->domain_id == target_domain;

    default:
        return false;
    }
}
```

Even with careful validation, hypercall handlers have been a source of security vulnerabilities. The Xen Security Advisory (XSA) list includes numerous hypercall-related issues. Modern hypervisors employ additional defenses: fuzzing of hypercall handlers, formal verification of critical paths, and sandboxing of hypervisor components.
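As a flavor of what hypercall fuzzing looks like, here is a toy harness (the validator, its error codes, and the buffer limit are invented for this sketch): it feeds pseudo-random arguments into a validation routine and checks an invariant on every iteration — a misaligned or oversized buffer must never be accepted.

```c
#include <stdlib.h>

#define TOY_MAX_BUF 4096u  /* invented size limit for this sketch */

/* Simplified stand-in for a hypercall argument validator */
static int toy_validate(unsigned long guest_va, unsigned long size)
{
    if (guest_va & (sizeof(long) - 1))
        return -22;               /* misaligned: reject (like -EINVAL) */
    if (size > TOY_MAX_BUF)
        return -7;                /* too big: reject (like -E2BIG) */
    return 0;
}

/* Fuzz loop: random inputs, check the reject-invariant on each.
 * Returns 0 if no violation was found, -1 otherwise. */
static int fuzz_validator(unsigned iterations, unsigned seed)
{
    srand(seed);                  /* fixed seed for reproducibility */
    for (unsigned i = 0; i < iterations; i++) {
        unsigned long va = ((unsigned long)rand() << 16) ^ (unsigned)rand();
        unsigned long size = (unsigned long)rand() % (2 * TOY_MAX_BUF);
        int rc = toy_validate(va, size);

        /* Invariant: bad inputs are never accepted */
        if ((va & (sizeof(long) - 1)) && rc == 0)
            return -1;
        if (size > TOY_MAX_BUF && rc == 0)
            return -1;
    }
    return 0;
}
```

Real hypercall fuzzers work the same way in spirit, but drive the actual guest-hypervisor boundary and check far richer invariants (no hypervisor crash, no cross-domain leakage) rather than a single validator function.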
Different hypervisors define their own hypercall interfaces. For guests to run on multiple hypervisors, either the guest must detect and adapt to each interface, or a standard interface must be adopted.
Major Hypercall Interfaces:
| Hypervisor | Detection Method | Hypercall Instruction | Notable Hypercalls |
|---|---|---|---|
| Xen | CPUID leaf 0x40000000 = "XenVMMXenVMM" | vmcall/vmmcall/int 0x82 | mmu_update, event_channel_op |
| KVM | CPUID leaf 0x40000000 = "KVMKVMKVM\0\0\0" | vmcall/vmmcall | KVM_HC_KICK_CPU, KVM_HC_CLOCK_PAIRING |
| Hyper-V | CPUID leaf 0x40000001 bits | vmcall/vmmcall | HvPostMessage, HvSignalEvent |
| VMware | I/O port 0x5658 (magic) | in/out instructions | VMWARE_CMD_GETVERSION, etc. |
```c
/* Hypervisor Detection and Interface Selection */

enum hypervisor_type {
    HV_NONE = 0,
    HV_XEN,
    HV_KVM,
    HV_HYPERV,
    HV_VMWARE,
};

/* Detect which hypervisor we're running under */
enum hypervisor_type detect_hypervisor(void)
{
    unsigned int eax, ebx, ecx, edx;
    char signature[13];

    /* Check for hypervisor presence bit first */
    cpuid(1, &eax, &ebx, &ecx, &edx);
    if (!(ecx & (1 << 31)))
        return HV_NONE;     /* Not running under hypervisor */

    /* Get hypervisor signature from CPUID leaf 0x40000000 */
    cpuid(0x40000000, &eax, &ebx, &ecx, &edx);
    memcpy(signature + 0, &ebx, 4);
    memcpy(signature + 4, &ecx, 4);
    memcpy(signature + 8, &edx, 4);
    signature[12] = '\0';

    if (strcmp(signature, "XenVMMXenVMM") == 0)
        return HV_XEN;
    if (strcmp(signature, "KVMKVMKVM") == 0)
        return HV_KVM;
    if (strcmp(signature, "Microsoft Hv") == 0)
        return HV_HYPERV;
    if (strcmp(signature, "VMwareVMware") == 0)
        return HV_VMWARE;

    return HV_NONE;         /* Unknown hypervisor */
}

/* Initialize paravirt ops based on detected hypervisor */
void __init init_hypervisor_platform(void)
{
    enum hypervisor_type hv = detect_hypervisor();

    switch (hv) {
    case HV_XEN:
        xen_start_kernel();
        break;
    case HV_KVM:
        kvm_guest_init();
        break;
    case HV_HYPERV:
        hyperv_init();
        break;
    case HV_VMWARE:
        vmware_platform_setup();
        break;
    default:
        /* Native or unknown - use native ops */
        break;
    }
}

/* Each hypervisor provides its hypercall setup */
void kvm_guest_init(void)
{
    pv_time_ops = kvm_time_ops;
    pv_mmu_ops.read_cr3 = kvm_read_cr3;
    /* KVM uses fewer paravirt hooks - mostly hardware-assisted */
    kvm_guest_cpu_init();
}
```

For I/O, the virtio standard provides hypervisor-agnostic paravirtualized devices. Rather than each hypervisor defining its own block/network/console interfaces, virtio specifies a common interface that works with KVM, Xen, VMware, and others. This significantly reduces guest porting effort.
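A small illustration of why virtio is easy to support everywhere: its split-virtqueue descriptor is a fixed 16-byte layout defined by the virtio specification, independent of the hypervisor underneath (shown here with plain fixed-width types rather than the kernel's little-endian `__le` annotations):

```c
#include <stdint.h>

/* Split-virtqueue descriptor from the virtio specification.
 * Fields are little-endian on the wire. */
struct vring_desc {
    uint64_t addr;    /* guest-physical address of the buffer */
    uint32_t len;     /* buffer length in bytes */
    uint16_t flags;   /* NEXT / WRITE / INDIRECT flags */
    uint16_t next;    /* index of the chained descriptor, if NEXT is set */
};
```

Because every compliant hypervisor and guest agree on this exact layout, a single virtio driver works unmodified across KVM, Xen, and others.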
Debugging hypercall-related issues requires visibility into the guest-hypervisor boundary. Several techniques enable hypercall tracing and analysis:
```bash
#!/bin/bash
# Hypercall Tracing Examples

# === Linux ftrace for hypercalls ===

# Enable Xen hypercall tracing
echo 1 > /sys/kernel/debug/tracing/events/xen/xen_mc_entry/enable
echo 1 > /sys/kernel/debug/tracing/events/xen/xen_mc_callback/enable

# View hypercall trace
cat /sys/kernel/debug/tracing/trace

# Example output:
# kworker/0:1-123 [000] .... 1234.567890: xen_mc_entry: op=2 (mmu_update) arg1=0xffff... arg2=1
# kworker/0:1-123 [000] .... 1234.567895: xen_mc_callback: op=2 result=0

# === Xentrace for hypervisor-side tracing ===

# Start trace capture on dom0
xentrace -D -e 0x20000 /tmp/xentrace.bin &

# Run workload in guest...

# Stop tracing
kill %1

# Analyze trace
xentrace_format /tmp/xentrace.bin | less

# === perf for VM exit analysis ===

# Count VM exits by reason
perf kvm stat live

# Record VM exit trace for specific guest
perf kvm stat record -p $(pgrep qemu) sleep 10
perf kvm stat report

# Example output:
# Event name       Samples   Time      Min Time  Max Time  Avg time
# HYPERCALL        5231      12.50ms   200ns     150us     2.39us
# EPT_VIOLATION    234       1.73ms    800ns     50us      7.39us
# IO_INSTRUCTION   1502      8.44ms    500ns     100us     5.62us
```

We've explored hypercalls from interface to implementation. Let's consolidate the key insights:

- Hypercalls are to the guest kernel what system calls are to applications: a validated gateway to a higher privilege level.
- The hypercall page lets guests use the optimal entry instruction (vmcall, vmmcall, or a software trap) without hardcoding platform assumptions.
- Batching via multicall amortizes the fixed cost of each guest-to-hypervisor transition across many operations.
- Hypercall handlers are the hypervisor's primary attack surface; every guest-supplied argument must be rigorously validated.
- Hypercall interfaces differ between hypervisors; guests detect the platform via CPUID and adapt, while virtio standardizes paravirtualized I/O.
What's Next:
Now that we understand the hypercall mechanism, we'll examine the performance benefits of paravirtualization in detail. We'll see quantitative measurements comparing paravirtualized, full virtualized, and native execution across various workloads, understanding where paravirtualization shines and where modern hardware assistance has closed the gap.
You now understand hypercalls—the fundamental API between paravirtualized guests and hypervisors. From entry mechanisms to security validation, this knowledge is essential for understanding how modern virtualization systems achieve both performance and isolation.