Bootstrap
xv6's MBR contains the bootloader code (bootasm.S + bootmain.c).
Why does xv6 (like other OSes) need a boot loader instead of loading itself into memory?
bootasm.S
Execution starts at 0x0000:0x7c00, in 16-bit mode (real mode).
The PC's Physical Address Space (Scroll down 30%). This page presents more interesting details: https://wiki.osdev.org/Memory_Map_(x86)
Disable interrupts; Clear segment registers; Enable physical address line A20 (nothing interesting here).
The Global Descriptor Table is ready to use (static data in the MBR).
lgdt? Is this still 8086? -- We are in real mode, but this is a modern CPU. Some instructions and registers beyond the 8086 CPU specification are available even in real mode.
Switch to protected mode (the 32-bit mode) to enable virtual memory.
Did you specify 0x7c00 in Lecture-2's boot sector example? Why did it work? -- The example does not use the correct text segment address (0 instead of 0x7c00). However, because the demo boot sector does not use any absolute addressing, it runs without an issue.
The purpose of the segment registers has changed in the protected mode.
On the 8086, the pc (program counter) address is calculated as %cs * 16 + %ip. The 16-bit %cs is used as a base address.
The cs/es/ss/etc. registers became segment selectors on i386/x86_64 CPUs. They usually contain a small value: an index that chooses one of a few entries in a small table (the GDT), and two bits that specify a runtime privilege level.
Why not increase those registers' size to 64 bits or more?
What is the essential step before the bootloader can call any C function? -- Setting up a stack. That's it.
How large is the stack? -- The stack top is at 0x7c00 (the label "start"), and the stack grows downward from there, away from the boot code.
See bootblock target in Makefile
Detect the physical memory map with the BIOS call INT 0x15, EAX=0xE820.
On your Linux machine: $ dmesg | grep e820. Both usable and reserved ranges are backed by DRAM.
bootmain.c
Now we can make function calls and declare variables.
But we cannot use floating point or malloc()/free().
How to pass some arguments to bootmain()? (Calling Conventions)
What's bootmain's task?
It loads the kernel image into the correct memory location and then passes control to the kernel code.
0x1BADB002? mbheader.magic must contain this value. Let's grep for the magic word 0x1badb002. (use -i to ignore case, grep -nri 1badb002)
More information about the multiboot spec: https://wiki.osdev.org/Multiboot and the GNU Multiboot Spec. Example: boot.S
Where is the multiboot header? xxd -g 4 -e kernel | less. See address 0x100.
You can also find it in the xv6.img (xxd -g 4 -e xv6.img | less) at address 0x300.
.bss segment (Block Started by Symbol)
entry: get more hints from Makefile and kernel.ld (LD scripts)
In kernel.ld: Why mboot_load_addr + (mboot_entry - begin)?
All the code in the kernel image is linked at a high address area (see ". = 0xFFFF800000100000" at the beginning of kernel.ld).
However, the kernel image is loaded at a low address (0x100000) at boot time.
The first part of the kernel code (in entry.S) has to run at the low address because the 64-bit memory mapping is not yet ready to use. As a result, to access something through a symbol's address, that address needs to be "shifted down" to the low address.
Finally, we know the expression refers to the data at mboot_entry, at its low (load) address.
See mboot_header: at the beginning of entry.S (We're entering the kernel's binary)
The kernel bootstrap
Let's talk about entry and mboot_entry
bootmain.c:67 entry();
entry.S:45 mboot_entry:
kernel.ld:10 (mboot_load_addr)
kernel.ld:20 (AT)
VMA and LMA
VMA: for linking. The compiler and linker work on it. When you build a userspace project, the virtual address space is the only one that matters.
LMA: for loading. This value has nothing to do with compilation and linking. Consider it totally ignored by the program unless explicitly handled.
99.9% of the time, LMA is equal to VMA.
Why do we have to use a different LMA? (The following explains the current implementation, but other systems can adopt a better approach.)
The bootloader switches to protected mode (32-bit) before entering bootmain. It then follows the "multiboot" protocol to load the kernel image.
The xv6 kernel is a 64-bit program that wants to run at a high address (>>4GB).
"It's really hard to negotiate with the bootloader developers. Their bootloader is supporting 1000 different platforms and it must stay generic. The protocol is the protocol."
The kernel needs to help itself out. It is aware of the fact that its first instruction runs at a low address in 32-bit mode.
memlayout.h: KERNBASE, V2P, P2V
The kernel booting process before entering C
The kernel image is loaded at the 32-bit VA 0x100000, which is directly mapped to physical address (PA) 0x100000 by the bootstrap GDT (see bootasm.S).
The kernel is supposed to run at a high VA (0xFFFF800000100000), which is not reachable in 32-bit mode.
The kernel entry point is mboot_entry in entry.S.
The entry code starts to run at the low VA (0x100000). Every time it dereferences a symbol, it must calculate the correct low VA from the high VA that the linker assigned to that symbol. (You can check the symbols' addresses using "nm kernel".)
The first task of the kernel entry code is to build the simplest possible 64-bit page table for entering 64-bit mode, so the rest of the kernel can run in the high address space.
The four-level page table structure (see mmu.h): PML4T (root) -> PDPT -> PDT -> PT. (Here it uses only the top two levels, mapping 1GB huge pages.)
After setting up the GDT (for the bootstrap VA-to-PA memory mapping), it uses an ljmp to switch to 64-bit mode (see entry64low).
In our case, (entry64low - mboot_header + mboot_load_addr) equals (entry64low - KERNBASE).
Because this ljmp can only use a 32-bit address, execution is still in the low VA space, but already in 64-bit mode.
Meanwhile (right after ljmp), the memory mapping at the high space has been established with the new page table.
It simply jumps to the high VA at entry64high.
Starting from entry64high (entry.S:121), the CPU starts to run the kernel code at "correct" addresses. (finally!)
The last instruction in the assembly world is "jmp main" at line #144
Read Makefile's kernel part, kernel.ld, and memlayout.h to get the dots connected.
What's in a Linux kernel image?
/boot/vmlinuz-linux # a compressed Linux kernel file (6MB)
$ file vmlinuz-linux
vmlinuz-linux: Linux kernel x86 boot executable bzImage, version 5.2.13-arch1-1-ARCH (builduser@heftig-49962) #1 SMP PREEMPT Fri Sep 6 17:52:33 UTC 2019, RO-rootFS, swap_dev 0x5, Normal VGA
Decompress it with a special script:
wget -O extract-vmlinux https://raw.githubusercontent.com/torvalds/linux/master/scripts/extract-vmlinux
bash ./extract-vmlinux vmlinuz-linux > kernel
$ file kernel
vmlinux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=..., stripped
The ELF format
http://www.skyfree.org/linux/references/ELF_Format.pdf
This article explains more about multiboot vs ELF. An ELF executable can be compatible with Multiboot.
The ELF format itself defines some "bootable" metadata. The Linux kernel on my laptop is a pure ELF image. It does not contain a multiboot header.
$ readelf -l kernel
Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
                 0x00000000011d0000 0x00000000011d0000  R E    0x200000
  LOAD           0x0000000001400000 0xffffffff82200000 0x0000000002200000
                 0x00000000004dd000 0x00000000004dd000  RW     0x200000
  LOAD           0x0000000001a00000 0x0000000000000000 0x00000000026dd000
                 0x000000000002d000 0x000000000002d000  RW     0x200000
  LOAD           0x0000000001b0a000 0xffffffff8270a000 0x000000000270a000
                 0x000000000016e000 0x0000000000522000  RWE    0x200000
  NOTE           0x0000000000e00e34 0xffffffff81c00e34 0x0000000001c00e34
                 0x00000000000001ec 0x00000000000001ec         0x4