Tools and bootstrap
git
$ git
$ man git
$ man git-add # add a dash '-' between git and the sub-command
$ git clone https://github.com/bitslab/xv6-public.git
Useful sub-commands: diff, stash, reset, log, etc.
Three sections of a git project:
the Git database (the .git directory)
the staging area
the working tree (the files in your directory)
How do they change:
When you edit any file: the working tree changes.
When you "git add" a file: the staging area changes.
When you commit: the database changes. A new commit is created on top of the HEAD of the current branch.
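The three transitions above can be watched with git status --short. This is a minimal sketch in a throwaway repository; the file name notes.txt and the demo identity are placeholders, not part of xv6.

```shell
# Scratch repository so the demo does not touch any real project.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
# Identity is set only for this throwaway repo (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "first commit"

echo "v2" >> notes.txt                 # edit: the working tree changes
after_edit=$(git status --short)       # second column M: modified, not staged

git add notes.txt                      # add: the staging area changes
after_add=$(git status --short)        # first column M: staged

git commit -q -m "second commit"       # commit: the database changes
after_commit=$(git status --short)     # empty: nothing left to commit
commit_count=$(git rev-list --count HEAD)

printf 'after edit:   [%s]\n' "$after_edit"
printf 'after add:    [%s]\n' "$after_add"
printf 'after commit: [%s]\n' "$after_commit"
printf 'commits:      %s\n' "$commit_count"
```

In the short status format, the first column describes the staging area and the second column describes the working tree.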
Everything is possible:
reset the staging area but keep all changes in the working tree: git reset
discard everything since the last commit: git reset --hard
discard all changes and erase some history: git reset --hard HEAD~3 # this erases the latest three commits
rewind history but still keep your working tree: git reset HEAD~3 (unstages the changes) or git reset --soft HEAD~3 (leaves them staged) # Read the manual for the details.
stash the changes in the staging area and the working tree: git stash # to restore the changes, use git stash pop or git stash apply.
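The difference between the reset modes is easier to see on a small history. A sketch in a scratch repository, assuming git is installed; log.txt and the commit names c1..c4 are made up for the demo.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Demo User"       # placeholder identity
git config user.email "demo@example.com"

# Build a history of four commits: c1, c2, c3, c4.
for i in 1 2 3 4; do
    echo "line $i" >> log.txt
    git add log.txt
    git commit -q -m "c$i"
done

# --soft: HEAD moves back to c3; index and working tree keep c4's content.
git reset -q --soft HEAD~1
soft_state=$(git status --short)       # change is still staged

# default (--mixed): HEAD moves back to c2 and the index is reset;
# the working tree is left alone.
git reset -q HEAD~1
mixed_state=$(git status --short)      # change is only in the working tree
lines_kept=$(wc -l < log.txt | tr -d ' ')

# --hard: discard the uncommitted changes as well.
git reset -q --hard
hard_state=$(git status --short)
lines_after_hard=$(wc -l < log.txt | tr -d ' ')
commits_left=$(git rev-list --count HEAD)

echo "soft:  [$soft_state]"
echo "mixed: [$mixed_state] ($lines_kept lines on disk)"
echo "hard:  [$hard_state] ($lines_after_hard lines on disk, $commits_left commits)"
```

The commits removed from the branch are not deleted immediately; git reflog still lists them for a while, which is a useful safety net when experimenting.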
Tools and tips
Declare some shortcuts (aliases) in ~/.gitconfig.
$ man git-config
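Aliases can be added from the command line instead of editing the file by hand. A small sketch; the alias names st and lg are only examples, and the scratch HOME is there so the demo leaves your real ~/.gitconfig alone.

```shell
# Use a scratch HOME so this demo does not modify your real ~/.gitconfig.
# Remove the next two lines to apply the aliases to your own account.
HOME=$(mktemp -d)
export HOME

# Example aliases (names are arbitrary): "git st" and "git lg".
git config --global alias.st "status --short"
git config --global alias.lg "log --oneline --graph --decorate"

alias_st=$(git config --global --get alias.st)
git config --global --list
```

After this, git st behaves like git status --short.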
GitHub: Use public key authentication.
ssh-keygen -t ecdsa # keep hitting Enter until it finishes.
upload the newly generated ~/.ssh/id_ecdsa.pub to https://github.com/settings/keys.
Use an SSH URL for your remote: edit xv6dir/.git/config, replace https://github.com/AAA/BBB.git with git@github.com:AAA/BBB.git
check your remotes with $ git remote -v
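Instead of editing .git/config by hand, the remote URL can also be switched with git remote set-url. A sketch in a scratch repository; AAA/BBB are the same placeholders as above, not a real repository.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Start with the HTTPS URL, as a fresh clone would have.
git remote add origin https://github.com/AAA/BBB.git

# Switch the same remote to the SSH form.
git remote set-url origin git@github.com:AAA/BBB.git

remote_url=$(git remote get-url origin)
git remote -v
```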
tig is a very handy text-based git helper.
Staging made easy with tig (especially for partially committing your local edits): https://emmanuelbernard.com/blog/2017/08/02/tig-add-interactive/
Creating a hello-world user program in xv6
There is no glibc or stdio.h in xv6. Everything you can use must be provided in the xv6 codebase.
Try some tools
grep -nr <word> # search for files containing <word> in current directory.
grep -nr --include=*.{c,h,S} <word> # search in certain types of files
Creating aliases in your ~/.bashrc (rc is short for "run commands")
Creating two aliases 'gk' and 'gu' will be helpful for grepping kernel-only or user-only symbols in xv6: https://github.com/wuxb45/rc/wiki/Extra-aliases-for-xv6
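As an illustration of the idea, a shell function can wrap the grep invocation above. The name gsrc is hypothetical and made up for this sketch; it is not one of the aliases from the wiki page, whose exact definitions are not reproduced here.

```shell
# Hypothetical helper: search only C and assembly sources under the
# current directory. Put a definition like this in ~/.bashrc to keep it.
gsrc() {
    grep -nr --include='*.c' --include='*.h' --include='*.S' -- "$1" .
}

# Tiny scratch tree to try it on.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'int main(void) { return 0; }\n' > hello.c
printf 'main is mentioned here too\n' > README.txt

hits=$(gsrc main)      # README.txt is skipped by the --include filters
echo "$hits"
```

Spelling out each --include avoids relying on bash brace expansion, so the function also works in a plain POSIX shell.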
User-space resources:
Data type definitions: types.h (uint32, uint64, addr_t)
All user-space functions are declared in user.h, and implemented in ulib.c, usys.c, printf.c, and umalloc.c (90 LOC K&R's malloc).
Debugging a user-space program
In your ~/.gdbinit, add this line to allow loading the .gdbinit in your working directory.
add-auto-load-safe-path ./
In the first terminal: $ make qemu-gdb or $ make qemu-nox-gdb. This will create .gdbinit, start a qemu vm, and wait for gdb to attach.
In the second terminal: $ gdb
Execution first stops at some assembly code. Enter 'c' to continue. It will stop once more; enter 'c' again.
Once you see xv6's shell, use Ctrl+C in the (frozen) gdb window to interrupt the execution.
Load the symbol file of the binary you're going to debug: gdb> sym _divide
And set a breakpoint at the beginning of the main function: gdb> b main
Continue execution from gdb with 'c'.
and run the command in xv6's shell: $ divide 3 5
Now the execution will stop at the main function of divide.c
Show source code in gdb: gdb> tui enable
gdb cheatsheet: https://cs.brown.edu/courses/cs033/docs/guides/gdb.pdf
BIOS
https://en.wikipedia.org/wiki/BIOS
BIOS is a library & executable: contains the first instruction to run when a PC boots up (at 0xFFFF0)
BIOS is firmware: comes with the motherboard. Contains hardware-specific subroutines for upper-level OS (DOS).
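In real mode an address is written as segment:offset, and the linear address is segment * 16 + offset. The arithmetic below is a small sketch of that rule; the helper name linear is made up, and F000:FFF0 is the conventional way to write the 0xFFFF0 entry point mentioned above.

```shell
# Real-mode linear address = segment * 16 + offset.
linear() {
    printf '0x%X\n' $(( ($1 << 4) + $2 ))
}

reset_vector=$(linear 0xF000 0xFFF0)   # F000:FFF0, first instruction after power-on
mbr_load=$(linear 0x0000 0x7C00)       # 0000:7C00, where the BIOS loads the MBR
mbr_alias=$(linear 0x07C0 0x0000)      # 07C0:0000 names the same byte

echo "reset vector: $reset_vector"
echo "MBR load:     $mbr_load (also $mbr_alias)"
```

This matters when debugging with gdb in 16-bit mode: the program counter alone is only the offset, so the segment register has to be taken into account to find the actual instruction.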
Why do we need firmware?
Different computers contain different devices that require vendor-specific drivers.
For example, to receive a keyboard input, computer A needs to read from port 0x1234, but computer B needs to read from port 0x5678, and write to port 0x5680 to acknowledge the read.
Solution: a standard "read-keyboard-input" BIOS call for ALL IBM PCs, int 16h: https://en.wikipedia.org/wiki/INT_16H
Most devices in the old IBM PC era didn't support discovery.
Why has BIOS become less relevant today?
New hardware can be easily discovered and configured (USB, PCIe, etc.). The devices are getting smarter (e.g., can't charge my poor iPhone 6 with an "unsupported Lightning cable").
UEFI is gradually replacing BIOS as a boot manager.
Inspecting the MBR on your x86 PC
MBR: Master Boot Record
The first (512-byte) sector on the disk. Loaded to address 0x7c00 by BIOS.
Code starts running in real mode (the 16-bit mode).
Look into your laptop/PC's boot sector (your laptop may not have one):
# dd if=/dev/sda of=sec0 count=1
# objdump -D -b binary -mi8086 sec0
Who wrote/generated that code?
git clone git://git.savannah.gnu.org/grub.git
vi grub/grub-core/boot/i386/pc/boot.S
vi grub/grub-core/boot/i386/qemu/boot.S
The MBR is too small. It does some essential initialization and then loads a larger program into the memory to execute (the real grub boot loader).
6 Stages of Linux Boot Process (see below)
Read bootasm.S in xv6
Games that fit in an MBR: Tetris, Pillman, Invader, ... (note that they are all written in Intel syntax).
*The init process has been replaced by systemd in most Linux distros.
Creating your own MBR
Create and write some code in mymbr.S (see below)
Assemble it with the GNU assembler (the command is named as): as mymbr.S
Without specifying a name, the default output file is a.out
Extract the MBR binary using objcopy: objcopy -O binary a.out
Again, without specifying a name for the output, objcopy overwrites an existing a.out
Use the objdump command above to inspect the MBR binary: objdump -D -b binary -mi8086 a.out
Alternatively, you may use xxd to inspect the binary file.
Finally, you must add the special boot signature (the bytes 0x55 0xAA) to the end of the MBR file. The perl script "sign.pl" will do this magic: ./sign.pl a.out
Alternatively, you can use .org and .word in the assembly code to hardcode the signature at the desired location (the last two bytes of the sector).
Now the MBR should be exactly 512 bytes (ls -al a.out)
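For reference, the signing step can be sketched in plain shell. This assumes the script does what xv6's sign.pl does (reject images larger than 510 bytes, zero-pad to 510, append 0x55 0xAA); the function name sign_mbr and the three-byte stand-in image are made up for the demo.

```shell
# Sketch of the signing step: pad the image to 510 bytes, then append 0x55 0xAA.
sign_mbr() {
    size=$(wc -c < "$1" | tr -d ' ')
    if [ "$size" -gt 510 ]; then
        echo "boot block too large: $size bytes (max 510)" >&2
        return 1
    fi
    # Zero-fill up to byte 510; conv=notrunc keeps the existing code intact.
    dd if=/dev/zero of="$1" bs=1 seek="$size" count=$((510 - size)) \
        conv=notrunc 2>/dev/null
    # Octal escapes \125 \252 are the bytes 0x55 0xAA.
    printf '\125\252' >> "$1"
}

# Stand-in image: 3 bytes (cli; jmp to self) instead of a real a.out.
img=$(mktemp)
printf '\372\353\376' > "$img"
sign_mbr "$img"

final_size=$(wc -c < "$img" | tr -d ' ')
last_two=$(tail -c 2 "$img" | od -An -tx1 | tr -d ' \n')
echo "size: $final_size bytes, signature: $last_two"
```

The BIOS checks these last two bytes before it treats the sector as bootable, which is why an unsigned image will not start.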
We want to show something on the screen (say, character 'D') using a BIOS interrupt: Int 10h, AH=0Eh (Teletype output).
mymbr.S
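.code16                 # 16-bit real mode
.text
.global start
start:
    cli                 # BIOS enabled interrupts; disable
    movb $0x0, %bh      # BH = display page 0
    movb $0xe, %ah      # AH = 0Eh: teletype output
    movb $'D', %al      # AL = character to print
    int $0x10           # call the BIOS video service
spin:
    jmp spin            # loop forever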
$ qemu-system-x86_64 a.out
or
$ qemu-system-i386 a.out
Debugging the MBR starting from the first instruction
The default 64-bit qemu does not support debugging of a 16-bit MBR. We will use a 32-bit qemu vm for this experiment (and for debugging your code in homework 2).
In your ~/.gdbinit, add this line to allow loading the .gdbinit in your working directory.
add-auto-load-safe-path ./
Edit .gdbinit in your xv6 source code directory.
set architecture i8086
target remote localhost:1234
tui enable
layout asm
b *0x7c00
c
You can also type in the commands manually without using .gdbinit. In this case, check and remove unwanted commands in .gdbinit.
Beware that the .gdbinit will be overwritten if you use make qemu-gdb. We DON'T use "make qemu-gdb" for debugging the MBR.
To start the vm: $ qemu-system-i386 -S -s a.out (-S freezes the CPU at startup; -s is shorthand for -gdb tcp::1234). This will start a 32-bit qemu vm that waits for gdb on the default port 1234.
THEN, from another terminal at the same working directory: $ gdb
(gdb will try to load the .gdbinit at the current working directory, if it exists.)
After a half-second pause, you can see the gdb prompt with a message like this: "Breakpoint 1, 0x00007c00 in ?? ()"
Now you can debug the 16-bit MBR.
Use si to "step instruction".
To show disassembly at some address (show 10 instructions): x/10i 0x7c00
The disassembly of xv6's boot sector can be found in bootblock.asm after a successful build (see Makefile for the rules that generate the disassembly).
Warning: qemu does not report 32-bit target type to gdb. You may see wrong disassembly when debugging an MBR. But the execution will still be correct.
Debugging the 64-bit kernel will be much easier.
What is the very first instruction executed in the above qemu vm instance? Try to customize .gdbinit and let it stop at the very beginning of the vm execution.
Like our simple MBR code does, the DOS operating system uses a lot of BIOS procedure calls (with the int instructions) to interact with the hardware. How can DOS protect the system from malicious applications?