Every operating system student learns about system calls—the gateway through which user-space applications request services from the kernel. When an application needs to read a file, allocate memory, or create a process, it invokes a system call that transitions from user mode to kernel mode, performs the privileged operation, and returns the result.
Hypercalls are the same concept, elevated by one privilege layer.
In a paravirtualized environment, the guest kernel cannot directly perform certain privileged operations—they require hypervisor intervention. When the guest needs to update page tables, handle an interrupt, or configure a timer, it makes a hypercall: a controlled transition from the guest kernel to the hypervisor that requests a specific operation.
The analogy is precise: system calls are the API between applications and the kernel; hypercalls are the API between the guest kernel and the hypervisor. Both provide controlled, validated access to a higher privilege level.
By the end of this page, you will understand the architecture of hypercall interfaces, how hypercalls are implemented at the machine level, the design of common hypercall categories, performance considerations and optimization techniques, and the security validation required for safe hypercall handling.
A hypercall system consists of several components working together to provide efficient, secure communication between guests and the hypervisor:
Architectural Components:
Hypercall Table — A dispatch table in the hypervisor mapping hypercall numbers to handler functions
Entry Mechanism — The instruction or sequence that transfers control from guest to hypervisor
Calling Convention — Register/memory layout for passing arguments and receiving results
Handler Functions — Hypervisor code implementing each hypercall operation
Return Path — Mechanism to resume guest execution with the result
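The components above can be sketched as a toy user-space model of hypervisor-side dispatch (the hypercall numbers, handler names, and `TOY_ENOSYS` value are illustrative inventions, not any real hypervisor's ABI):

```c
#include <stddef.h>

/* Hypothetical hypercall numbers -- illustrative only */
#define HC_CONSOLE_IO 0
#define HC_SET_TIMER  1
#define HC_MAX        2

#define TOY_ENOSYS    38   /* stand-in for ENOSYS */

typedef long (*hypercall_fn)(unsigned long arg0, unsigned long arg1);

static long hc_console_io(unsigned long buf, unsigned long len)
{
    (void)buf;
    return (long)len;              /* pretend we wrote len bytes */
}

static long hc_set_timer(unsigned long deadline, unsigned long unused)
{
    (void)deadline; (void)unused;
    return 0;
}

/* The hypercall table: number -> handler function */
static const hypercall_fn hypercall_table[HC_MAX] = {
    [HC_CONSOLE_IO] = hc_console_io,
    [HC_SET_TIMER]  = hc_set_timer,
};

/* Entry point: validate the number before indexing the table */
static long do_hypercall(unsigned long nr, unsigned long arg0,
                         unsigned long arg1)
{
    if (nr >= HC_MAX || hypercall_table[nr] == NULL)
        return -TOY_ENOSYS;        /* unknown hypercall */
    return hypercall_table[nr](arg0, arg1);
}
```

A real entry mechanism would arrive here via a VM exit with the guest's registers saved; the bounds check before the table lookup is the essential pattern either way.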
| Aspect | System Call | Hypercall |
|---|---|---|
| Caller | User-space application | Guest kernel |
| Callee | Operating system kernel | Hypervisor |
| Privilege transition | Ring 3 → Ring 0 | Ring 1/3 → Ring 0 (or VM exit) |
| Entry instruction | syscall, sysenter, int 0x80 | vmcall, vmmcall, trap instruction |
| Typical latency | ~100-200 cycles | ~200-1000 cycles |
| Validation required | Check user permissions | Check guest identity and permissions |
| Examples | read(), write(), fork() | mmu_update(), event_channel_op() |
The Hypercall Page:
Many hypervisors provide a dedicated hypercall page—a region of memory containing the optimal instruction sequence for invoking hypercalls on the current platform. This allows the guest to use the most efficient available mechanism without hardcoding assumptions:
- vmcall instruction (Intel VT-x)
- vmmcall instruction (AMD-V)
- int $0x82 or a similar software trap (no hardware virtualization support)

The guest copies the hypercall invocation stub from the hypervisor-provided page and uses it for all calls. This provides forward compatibility as hardware evolves.
```c
/* Hypercall Page Setup - Xen Example */

/*
 * The hypervisor provides a page with optimal hypercall entry stubs.
 * Guest copies these to a known location and calls through them.
 */

/* Hypercall page provided by guest kernel */
extern char hypercall_page[PAGE_SIZE];

void __init xen_hypercall_setup(void)
{
    /*
     * HYPERCALL_PAGE_MSR contains the physical address where
     * the hypervisor should write the hypercall page.
     */
    unsigned long hypercall_msr;

    hypercall_msr = __pa(hypercall_page);
    hypercall_msr |= (unsigned long)XEN_SIGNATURE << 32;

    /* Ask Xen to populate the hypercall page */
    wrmsrl(MSR_HYPERCALL_PAGE, hypercall_msr);

    /*
     * Now hypercall_page contains stubs like:
     *
     * hypercall_page + 0*32: (hypercall 0 entry - mmu_update)
     *     mov $0, %eax
     *     vmcall          ; or vmmcall, or int $0x82
     *     ret
     *
     * hypercall_page + 1*32: (hypercall 1 entry - set_gdt)
     *     mov $1, %eax
     *     vmcall
     *     ret
     *
     * Each stub is 32 bytes, allowing up to 128 hypercalls per page.
     */
}

/* Invoking a hypercall - call into the page */
static inline long HYPERVISOR_mmu_update(mmu_update_t *req,
                                         unsigned int count,
                                         unsigned int *success_count,
                                         domid_t domid)
{
    long ret;

    /*
     * Call the stub at hypercall_page + (hypercall_nr * 32).
     * Arguments in registers per AMD64 calling convention:
     *   %rdi = req
     *   %rsi = count
     *   %rdx = success_count
     *   %r10 = domid (note: not %rcx, which is clobbered)
     */
    register domid_t _domid asm("r10") = domid;  /* pin domid to %r10 */

    asm volatile(
        "call *%[entry]"
        : "=a" (ret)
        : [entry] "r" (hypercall_page + __HYPERVISOR_mmu_update * 32),
          "D" (req), "S" (count), "d" (success_count), "r" (_domid)
        : "memory", "rcx", "r11"
    );

    return ret;
}
```

The physical mechanism for transferring control from guest to hypervisor varies by hardware platform and virtualization mode. Understanding these mechanisms reveals important performance and security characteristics.
Software Trap (Pure Paravirtualization):
In pure paravirtualization without hardware virtualization support, hypercalls use software interrupt instructions:
- int $0x82 on Xen (an arbitrary vector not used by standard x86)
- analogous to int $0x80, used for Linux system calls in 32-bit mode
```asm
; Software trap hypercall entry (Xen PV)
; Guest executes this stub from hypercall page

hypercall_entry_int82:
    ; Arguments already in registers:
    ;   eax = hypercall number
    ;   ebx, ecx, edx, esi, edi = arguments 1-5
    int $0x82               ; Trap to hypervisor
    ; On return, eax contains result
    ret

; Hypervisor IDT handler for int 0x82
hypervisor_int82_handler:
    ; Save guest state
    push_all_registers

    ; Validate hypercall number
    cmp eax, MAX_HYPERCALL
    jae .invalid_hypercall

    ; Dispatch to handler
    mov rax, [hypercall_table + rax * 8]
    call rax

    ; Restore guest state with result in eax
    pop_all_registers
    iret

.invalid_hypercall:
    mov eax, -ENOSYS
    pop_all_registers
    iret
```

Hypercalls can be organized into functional categories based on the subsystem they serve. Each category has distinct characteristics and usage patterns.
| Category | Example Hypercalls | Purpose |
|---|---|---|
| Memory Management | mmu_update, mmuext_op, update_va_mapping | Page table updates, TLB flushes |
| Event Channels | event_channel_op, set_callbacks | Virtual interrupt delivery |
| Scheduling | sched_op, vcpu_op | Yield, block, vCPU management |
| Console/Debug | console_io, sysctl | Output, debugging, configuration |
| Domain Management | domctl, memory_op | VM lifecycle (privileged only) |
| Grant Tables | grant_table_op | Inter-domain memory sharing |
| Time | set_timer_op, vcpu_time_info | Timer events, wall clock |
| Multicall | multicall | Batch multiple hypercalls |
Let's examine the most critical hypercall categories in detail:
Memory Management Hypercalls:
These are among the most frequently invoked hypercalls, as every page table modification in a paravirtualized guest requires hypervisor validation. The mmu_update hypercall accepts an array of update requests, allowing batched updates for efficiency:
```c
/* Memory Management Hypercalls */

/* Single MMU update (PTE, PDE, etc.) */
struct mmu_update {
    uint64_t ptr;   /* Machine address of PTE to update */
    uint64_t val;   /* New value to write */
};

/* Extended MMU operations */
struct mmuext_op {
    unsigned int cmd;                   /* Operation code */
    union {
        unsigned long mfn;              /* For NEW_BASEPTR */
        unsigned long linear_addr;      /* For INVLPG */
    } arg1;
    union {
        unsigned int nr_ents;           /* For operations on ranges */
        unsigned long pcid;             /* For TLB operations */
    } arg2;
};

/* MMU operation codes */
#define MMUEXT_PIN_L1_TABLE     0   /* Pin as L1 (PTE) page table */
#define MMUEXT_PIN_L2_TABLE     1   /* Pin as L2 (PDE) page table */
#define MMUEXT_PIN_L3_TABLE     2   /* Pin as L3 (PDPE) page table */
#define MMUEXT_PIN_L4_TABLE     3   /* Pin as L4 (PML4E) page table */
#define MMUEXT_UNPIN_TABLE      4   /* Unpin page table */
#define MMUEXT_NEW_BASEPTR      5   /* Set new page table base (CR3) */
#define MMUEXT_TLB_FLUSH_LOCAL  6   /* Flush local TLB */
#define MMUEXT_INVLPG_LOCAL     7   /* Invalidate single TLB entry */
#define MMUEXT_TLB_FLUSH_MULTI  8   /* Flush TLBs on multiple CPUs */
#define MMUEXT_INVLPG_MULTI     9   /* Invalidate entry on multiple CPUs */
#define MMUEXT_TLB_FLUSH_ALL    10  /* Flush all TLBs (all CPUs) */
#define MMUEXT_INVLPG_ALL       11  /* Invalidate entry on all CPUs */

/* Changing page table base (equivalent to writing CR3) */
static void xen_write_cr3(unsigned long cr3)
{
    struct mmuext_op op;

    op.cmd = MMUEXT_NEW_BASEPTR;
    op.arg1.mfn = PFN_DOWN(__pa(cr3));
    HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
}

/* Flushing TLB for a single address */
static void xen_flush_tlb_single(unsigned long addr)
{
    struct mmuext_op op;

    op.cmd = MMUEXT_INVLPG_LOCAL;
    op.arg1.linear_addr = addr;
    HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
}

/* Batched PTE updates - efficient for bulk modifications */
static void xen_set_pte_batch(pte_t *ptes[], pte_t vals[], int count)
{
    struct mmu_update updates[MAX_BATCH_SIZE];
    int i;

    for (i = 0; i < count; i++) {
        updates[i].ptr = arbitrary_virt_to_machine(ptes[i]);
        updates[i].val = pte_val_ma(vals[i]);
    }

    /* Single hypercall for all updates */
    if (HYPERVISOR_mmu_update(updates, count, NULL, DOMID_SELF) < 0)
        BUG();
}
```

Event Channel Hypercalls:
Event channels provide the virtual interrupt mechanism. The event_channel_op hypercall manages channel lifecycle and notification:
```c
/* Event Channel Hypercalls */

/* Event channel operations */
#define EVTCHNOP_bind_interdomain 0  /* Create channel between domains */
#define EVTCHNOP_bind_virq        1  /* Bind to virtual IRQ */
#define EVTCHNOP_bind_pirq        2  /* Bind to physical IRQ */
#define EVTCHNOP_close            3  /* Close channel */
#define EVTCHNOP_send             4  /* Send notification */
#define EVTCHNOP_status           5  /* Query channel status */
#define EVTCHNOP_alloc_unbound    6  /* Allocate unbound channel */
#define EVTCHNOP_bind_ipi         7  /* Bind for IPI delivery */
#define EVTCHNOP_unmask           9  /* Unmask channel */

/* Bind a virtual IRQ (timer, console, etc.) to an event channel */
int bind_virq(unsigned int virq, unsigned int cpu)
{
    struct evtchn_bind_virq bind = {
        .virq = virq,
        .vcpu = cpu,
    };
    int ret;

    ret = HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq, &bind);
    if (ret == 0)
        return bind.port;   /* Return allocated port number */
    return ret;
}

/* Send notification on an event channel */
static inline void notify_remote_via_evtchn(unsigned int port)
{
    struct evtchn_send send = { .port = port };

    HYPERVISOR_event_channel_op(EVTCHNOP_send, &send);
}

/* Allocate an unbound event channel for inter-domain comms */
int alloc_unbound(domid_t remote_dom)
{
    struct evtchn_alloc_unbound alloc = {
        .dom = DOMID_SELF,
        .remote_dom = remote_dom,
    };
    int ret;

    ret = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound, &alloc);
    if (ret == 0)
        return alloc.port;
    return ret;
}

/* Setting up event channel callbacks during boot */
void xen_setup_callbacks(void)
{
    struct xen_callback_register event_cb = {
        .type = CALLBACKTYPE_event,
        .address = (unsigned long)xen_hypervisor_callback,
    };
    struct xen_callback_register failsafe_cb = {
        .type = CALLBACKTYPE_failsafe,
        .address = (unsigned long)xen_failsafe_callback,
    };

    HYPERVISOR_callback_op(CALLBACKOP_register, &event_cb);
    HYPERVISOR_callback_op(CALLBACKOP_register, &failsafe_cb);
}
```

Since each hypercall incurs fixed overhead (VM exit, state save/restore, dispatch), executing many small hypercalls can be expensive. The multicall mechanism solves this by batching multiple hypercall requests into a single hypervisor transition.
How Multicall Works:
1. The guest accumulates hypercall requests in an array of multicall entries instead of issuing them immediately.
2. A single multicall hypercall passes the array to the hypervisor.
3. The hypervisor executes each entry in order and writes each result back into its entry.

This amortizes the transition overhead across many operations.
```c
/* Multicall Optimization */

struct multicall_entry {
    unsigned long op;       /* Hypercall operation number */
    long result;            /* Result (filled in by hypervisor) */
    unsigned long args[6];  /* Up to 6 arguments */
};

/* Per-CPU multicall batch */
struct mc_buffer {
    unsigned int mc_idx;
    struct multicall_entry entries[MC_BATCH];
};
static DEFINE_PER_CPU(struct mc_buffer, mc_buffer);

/* Add a hypercall to the current batch */
static inline void xen_mc_entry(unsigned long op,
                                unsigned long arg0, unsigned long arg1,
                                unsigned long arg2, unsigned long arg3,
                                unsigned long arg4)
{
    struct mc_buffer *mc = this_cpu_ptr(&mc_buffer);
    struct multicall_entry *entry = &mc->entries[mc->mc_idx++];

    entry->op = op;
    entry->args[0] = arg0;
    entry->args[1] = arg1;
    entry->args[2] = arg2;
    entry->args[3] = arg3;
    entry->args[4] = arg4;

    /* Flush if batch is full */
    if (mc->mc_idx >= MC_BATCH)
        xen_mc_issue();
}

/* Execute the batched hypercalls */
void xen_mc_issue(void)
{
    struct mc_buffer *mc = this_cpu_ptr(&mc_buffer);

    if (mc->mc_idx == 0)
        return;

    /* Single hypercall executes entire batch */
    HYPERVISOR_multicall(mc->entries, mc->mc_idx);

    /* Check results if needed */
    for (int i = 0; i < mc->mc_idx; i++) {
        if (unlikely(mc->entries[i].result < 0))
            handle_multicall_error(&mc->entries[i], i);
    }

    mc->mc_idx = 0;
}

/*
 * Example: Updating multiple PTEs with multicall
 * Instead of N hypercalls, we make 1 multicall + 1 flush
 */
void xen_set_pmd_batch(pmd_t *pmds[], pmd_t vals[], int count)
{
    struct mmu_update updates[count];

    preempt_disable();  /* Keep on same CPU for multicall buffer */

    /* Build array of updates */
    for (int i = 0; i < count; i++) {
        updates[i].ptr = arbitrary_virt_to_machine(pmds[i]);
        updates[i].val = pmd_val_ma(vals[i]);
    }

    /* Add to multicall batch */
    xen_mc_entry(__HYPERVISOR_mmu_update,
                 (unsigned long)updates,
                 count,
                 0,              /* success_count - not needed */
                 DOMID_SELF,
                 0);

    /* Add TLB flush to same batch */
    struct mmuext_op flush = { .cmd = MMUEXT_TLB_FLUSH_LOCAL };
    xen_mc_entry(__HYPERVISOR_mmuext_op,
                 (unsigned long)&flush,
                 1, 0, DOMID_SELF, 0);

    /* Issue single multicall for both operations */
    xen_mc_issue();

    preempt_enable();
}
```

Multicall is most beneficial when you have multiple independent hypercalls to make in sequence. The Linux kernel's Xen code automatically batches related operations (e.g., page table updates before a context switch). For single, isolated hypercalls, the multicall wrapper adds overhead and should be avoided.
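The trade-off can be made concrete with a toy cost model (the cycle counts below are assumptions for illustration, not measurements): N separate hypercalls pay the full transition cost every time, while one multicall pays it once plus a small per-entry dispatch overhead.

```c
/* Assumed, illustrative costs in CPU cycles */
#define T_EXIT  800   /* fixed cost of one guest->hypervisor transition */
#define T_OP    100   /* work per operation */
#define T_ENTRY  20   /* per-entry dispatch overhead inside a multicall */

/* N separate hypercalls: pay the transition cost every time */
static unsigned long cost_separate(unsigned n)
{
    return (unsigned long)n * (T_EXIT + T_OP);
}

/* One multicall: pay the transition once, plus per-entry dispatch */
static unsigned long cost_multicall(unsigned n)
{
    return T_EXIT + (unsigned long)n * (T_OP + T_ENTRY);
}
```

With these numbers, a single wrapped call is slightly slower than a direct hypercall (matching the caveat above), but the multicall wins from the second batched operation onward and the advantage grows with batch size.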
Hypercall handlers are the hypervisor's attack surface. A malicious guest could attempt to exploit hypercalls to:

- read or modify memory it does not own
- escalate privileges, e.g., by mapping another domain's pages or issuing management operations
- crash or hang the hypervisor, denying service to all guests
Security Principles for Hypercall Design:
```c
/* Hypercall Validation Examples */

/* Validating guest-provided memory pointer */
static int validate_guest_buffer(struct domain *d, unsigned long guest_va,
                                 size_t size, bool write)
{
    unsigned long gpfn, mfn;
    p2m_type_t p2m;

    /* Check alignment */
    if (guest_va & (sizeof(long) - 1))
        return -EINVAL;

    /* Check size limits */
    if (size > MAX_HYPERCALL_BUFFER_SIZE)
        return -E2BIG;

    /* Walk guest page tables to get machine frame */
    gpfn = guest_va >> PAGE_SHIFT;
    mfn = get_gfn_query(d, gpfn, &p2m);
    if (!mfn_valid(mfn)) {
        put_gfn(d, gpfn);           /* drop reference on error paths too */
        return -EFAULT;
    }

    /* Check page type allows requested access */
    if (write && !p2m_writeable(p2m)) {
        put_gfn(d, gpfn);
        return -EACCES;
    }

    put_gfn(d, gpfn);
    return 0;
}

/* Validating page table entry before allowing update */
static int validate_pte_update(struct domain *d, unsigned long ptr,
                               unsigned long val)
{
    unsigned long mfn;
    struct page_info *page;

    /* ptr must be machine address of a PTE in guest's page table */
    mfn = ptr >> PAGE_SHIFT;
    page = mfn_to_page(mfn);

    /* Verify guest owns this page table page */
    if (page_get_owner(page) != d)
        return -EPERM;

    /* Verify page is pinned as a page table page */
    if (!(page->u.inuse.type_info & PGT_validated))
        return -EINVAL;

    /* If mapping something, verify guest owns target frame */
    if (val & _PAGE_PRESENT) {
        unsigned long target_mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
        struct page_info *target = mfn_to_page(target_mfn);

        /* Must own or have grant to target frame */
        if (page_get_owner(target) != d &&
            !grant_map_valid(d, target_mfn))
            return -EPERM;

        /* Cannot map writable + executable in same PTE (W^X):
         * writable (RW set) and executable (NX clear) together is refused */
        if ((val & _PAGE_RW) && !(val & _PAGE_NX))
            return -EACCES;         /* Policy decision */
    }

    return 0;
}

/* Domain permission check for privileged hypercalls */
static bool domain_has_privilege(struct domain *d, unsigned int operation,
                                 domid_t target_domain)
{
    switch (operation) {
    case DOMCTL_createdomain:
    case DOMCTL_destroydomain:
    case DOMCTL_pausedomain:
    case DOMCTL_unpausedomain:
        /* Only Domain 0 (management domain) can do these */
        return d->domain_id == 0;

    case DOMCTL_getdomaininfo:
        /* Dom0 can query any domain; others only themselves */
        return d->domain_id == 0 || d->domain_id == target_domain;

    default:
        return false;
    }
}
```

Even with careful validation, hypercall handlers have been a source of security vulnerabilities. The Xen Security Advisory (XSA) list includes numerous hypercall-related issues. Modern hypervisors employ additional defenses: fuzzing of hypercall handlers, formal verification of critical paths, and sandboxing of hypervisor components.
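As a flavor of what hypercall fuzzing looks like, here is a toy harness (the validator, its error codes, and the buffer limit are invented for this sketch): it feeds pseudo-random arguments into a validation routine and checks an invariant on every iteration — a misaligned or oversized buffer must never be accepted.

```c
#include <stdlib.h>

#define TOY_MAX_BUF 4096u  /* invented size limit for this sketch */

/* Simplified stand-in for a hypercall argument validator */
static int toy_validate(unsigned long guest_va, unsigned long size)
{
    if (guest_va & (sizeof(long) - 1))
        return -22;               /* misaligned: reject (like -EINVAL) */
    if (size > TOY_MAX_BUF)
        return -7;                /* too big: reject (like -E2BIG) */
    return 0;
}

/* Fuzz loop: random inputs, check the reject-invariant on each.
 * Returns 0 if no violation was found, -1 otherwise. */
static int fuzz_validator(unsigned iterations, unsigned seed)
{
    srand(seed);                  /* fixed seed for reproducibility */
    for (unsigned i = 0; i < iterations; i++) {
        unsigned long va = ((unsigned long)rand() << 16) ^ (unsigned)rand();
        unsigned long size = (unsigned long)rand() % (2 * TOY_MAX_BUF);
        int rc = toy_validate(va, size);

        /* Invariant: bad inputs are never accepted */
        if ((va & (sizeof(long) - 1)) && rc == 0)
            return -1;
        if (size > TOY_MAX_BUF && rc == 0)
            return -1;
    }
    return 0;
}
```

Real hypercall fuzzers work the same way in spirit, but drive the actual guest-hypervisor boundary and check far richer invariants (no hypervisor crash, no cross-domain leakage) rather than a single validator function.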
Different hypervisors define their own hypercall interfaces. For guests to run on multiple hypervisors, either the guest must detect and adapt to each interface, or a standard interface must be adopted.
Major Hypercall Interfaces:
| Hypervisor | Detection Method | Hypercall Instruction | Notable Hypercalls |
|---|---|---|---|
| Xen | CPUID leaf 0x40000000 = "XenVMMXenVMM" | vmcall/vmmcall/int 0x82 | mmu_update, event_channel_op |
| KVM | CPUID leaf 0x40000000 = "KVMKVMKVM\0\0\0" | vmcall/vmmcall | KVM_HC_KICK_CPU, KVM_HC_CLOCK_PAIRING |
| Hyper-V | CPUID leaf 0x40000001 bits | vmcall/vmmcall | HvPostMessage, HvSignalEvent |
| VMware | I/O port 0x5658 (magic) | in/out instructions | VMWARE_CMD_GETVERSION, etc. |
```c
/* Hypervisor Detection and Interface Selection */

enum hypervisor_type {
    HV_NONE = 0,
    HV_XEN,
    HV_KVM,
    HV_HYPERV,
    HV_VMWARE,
};

/* Detect which hypervisor we're running under */
enum hypervisor_type detect_hypervisor(void)
{
    unsigned int eax, ebx, ecx, edx;
    char signature[13];

    /* Check for hypervisor presence bit first */
    cpuid(1, &eax, &ebx, &ecx, &edx);
    if (!(ecx & (1 << 31)))
        return HV_NONE;     /* Not running under hypervisor */

    /* Get hypervisor signature from CPUID leaf 0x40000000 */
    cpuid(0x40000000, &eax, &ebx, &ecx, &edx);
    memcpy(signature + 0, &ebx, 4);
    memcpy(signature + 4, &ecx, 4);
    memcpy(signature + 8, &edx, 4);
    signature[12] = '\0';

    if (strcmp(signature, "XenVMMXenVMM") == 0)
        return HV_XEN;
    if (strcmp(signature, "KVMKVMKVM") == 0)
        return HV_KVM;
    if (strcmp(signature, "Microsoft Hv") == 0)
        return HV_HYPERV;
    if (strcmp(signature, "VMwareVMware") == 0)
        return HV_VMWARE;

    return HV_NONE;         /* Unknown hypervisor */
}

/* Initialize paravirt ops based on detected hypervisor */
void __init init_hypervisor_platform(void)
{
    enum hypervisor_type hv = detect_hypervisor();

    switch (hv) {
    case HV_XEN:
        xen_start_kernel();
        break;
    case HV_KVM:
        kvm_guest_init();
        break;
    case HV_HYPERV:
        hyperv_init();
        break;
    case HV_VMWARE:
        vmware_platform_setup();
        break;
    default:
        /* Native or unknown - use native ops */
        break;
    }
}

/* Each hypervisor provides its hypercall setup */
void kvm_guest_init(void)
{
    pv_time_ops = kvm_time_ops;
    pv_mmu_ops.read_cr3 = kvm_read_cr3;
    /* KVM uses fewer paravirt hooks - mostly hardware-assisted */
    kvm_guest_cpu_init();
}
```

For I/O, the virtio standard provides hypervisor-agnostic paravirtualized devices. Rather than each hypervisor defining its own block/network/console interfaces, virtio specifies a common interface that works with KVM, Xen, VMware, and others. This significantly reduces guest porting effort.
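A small illustration of why virtio is easy to support everywhere: its split-virtqueue descriptor is a fixed 16-byte layout defined by the virtio specification, independent of the hypervisor underneath (shown here with plain fixed-width types rather than the kernel's little-endian `__le` annotations):

```c
#include <stdint.h>

/* Split-virtqueue descriptor from the virtio specification.
 * Fields are little-endian on the wire. */
struct vring_desc {
    uint64_t addr;    /* guest-physical address of the buffer */
    uint32_t len;     /* buffer length in bytes */
    uint16_t flags;   /* NEXT / WRITE / INDIRECT flags */
    uint16_t next;    /* index of the chained descriptor, if NEXT is set */
};
```

Because every compliant hypervisor and guest agree on this exact layout, a single virtio driver works unmodified across KVM, Xen, and others.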
Debugging hypercall-related issues requires visibility into the guest-hypervisor boundary. Several techniques enable hypercall tracing and analysis:
```bash
#!/bin/bash
# Hypercall Tracing Examples

# === Linux ftrace for hypercalls ===

# Enable Xen hypercall tracing
echo 1 > /sys/kernel/debug/tracing/events/xen/xen_mc_entry/enable
echo 1 > /sys/kernel/debug/tracing/events/xen/xen_mc_callback/enable

# View hypercall trace
cat /sys/kernel/debug/tracing/trace

# Example output:
# kworker/0:1-123 [000] .... 1234.567890: xen_mc_entry: op=2 (mmu_update) arg1=0xffff... arg2=1
# kworker/0:1-123 [000] .... 1234.567895: xen_mc_callback: op=2 result=0

# === Xentrace for hypervisor-side tracing ===

# Start trace capture on dom0
xentrace -D -e 0x20000 /tmp/xentrace.bin &

# Run workload in guest...

# Stop tracing
kill %1

# Analyze trace
xentrace_format /tmp/xentrace.bin | less

# === perf for VM exit analysis ===

# Count VM exits by reason
perf kvm stat live

# Record VM exit trace for specific guest
perf kvm stat record -p $(pgrep qemu) sleep 10
perf kvm stat report

# Example output:
# Event name       Samples   Time      Min Time  Max Time  Avg time
# HYPERCALL        5231      12.50ms   200ns     150us     2.39us
# EPT_VIOLATION    234       1.73ms    800ns     50us      7.39us
# IO_INSTRUCTION   1502      8.44ms    500ns     100us     5.62us
```

We've explored hypercalls from interface to implementation. Let's consolidate the key insights:

- Hypercalls are to the guest kernel what system calls are to applications: a validated gateway to a higher privilege level.
- The hypercall page lets guests use the optimal entry instruction (vmcall, vmmcall, or a software trap) without hardcoding platform assumptions.
- Batching via multicall amortizes the fixed cost of each guest-to-hypervisor transition across many operations.
- Hypercall handlers are the hypervisor's primary attack surface; every guest-supplied argument must be rigorously validated.
- Hypercall interfaces differ between hypervisors; guests detect the platform via CPUID and adapt, while virtio standardizes paravirtualized I/O.
What's Next:
Now that we understand the hypercall mechanism, we'll examine the performance benefits of paravirtualization in detail. We'll see quantitative measurements comparing paravirtualized, full virtualized, and native execution across various workloads, understanding where paravirtualization shines and where modern hardware assistance has closed the gap.
You now understand hypercalls—the fundamental API between paravirtualized guests and hypervisors. From entry mechanisms to security validation, this knowledge is essential for understanding how modern virtualization systems achieve both performance and isolation.