System Calls

The sbrk() system call

A wrapper function in user space: sbrk()

Read about calling conventions.

For system calls, the parameters are passed from the "user context" (which can be a thread or a process) to the kernel handler.

The user context captures the user's current execution "progress": a snapshot of its registers (the trap frame) plus all other related state, such as the file descriptor table and the page table.

The kernel can see everything about the user process/thread. As a result, the kernel can let users pass parameters in registers, in memory (on the stack), or both.

If parameters are passed on the stack, the kernel must check that the user's stack pointer (%rsp) points into a valid memory area. This is more expensive than using registers, which need no such check.
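To make the cost of stack-passed parameters concrete, here is a minimal sketch of the kind of range check a kernel would need before reading arguments from user memory. The struct user_region type and the user_range_ok() helper are hypothetical names invented for this note; they are not taken from xv6.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one valid user memory area: [base, base + size). */
struct user_region {
  uint64_t base;
  uint64_t size;
};

/* Return true if the len bytes starting at addr lie entirely inside the region.
 * The comparisons are arranged so that addr + len is never computed,
 * which avoids integer overflow on hostile inputs. */
static bool user_range_ok(const struct user_region *r, uint64_t addr, uint64_t len) {
  if (addr < r->base)
    return false;
  uint64_t offset = addr - r->base;
  if (offset > r->size)
    return false;
  return len <= r->size - offset;
}
```

A kernel following this idea would call such a check with the user's %rsp and the number of bytes it is about to read, and fail the syscall if the check returns false. Register-passed arguments skip this step because they are already saved in the trap frame.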

Syscall conventions:

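In comparison, regular function calls on x86-64 use %rdi, %rsi, %rdx, %rcx, %r8 and %r9. Only the fourth register differs: the syscall instruction overwrites %rcx with the return address, so the Linux syscall convention passes the fourth argument in %r10 instead.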

The System V ABI specification (very readable):

The syscall calling convention was intentionally designed to match the function calling convention. This minimizes the work a syscall wrapper has to do on the user side.

glibc's syscall wrapper: syscall.S
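As a rough illustration of how little a user-side wrapper has to do, the sketch below issues a raw system call on Linux x86-64 with GCC/Clang inline assembly. It is not glibc's code: raw_syscall4() is a name made up for this note, and the block assumes a Linux x86-64 target.

```c
#include <sys/syscall.h> /* SYS_getpid, SYS_write, ... */

#if !defined(__x86_64__) || !defined(__linux__)
#error "This sketch assumes Linux on x86-64."
#endif

/* Issue a system call with up to four arguments.
 * The first three arguments are already in the right registers under the
 * function calling convention (%rdi, %rsi, %rdx); only the fourth has to be
 * placed in %r10, and the syscall number goes in %rax. */
static long raw_syscall4(long nr, long a1, long a2, long a3, long a4) {
  long ret;
  register long r10 __asm__("r10") = a4; /* 4th argument: %r10, not %rcx */
  __asm__ volatile("syscall"
                   : "=a"(ret)
                   : "a"(nr), "D"(a1), "S"(a2), "d"(a3), "r"(r10)
                   : "rcx", "r11", "memory"); /* syscall clobbers %rcx and %r11 */
  return ret;
}
```

For example, raw_syscall4(SYS_getpid, 0, 0, 0, 0) returns the process ID. Note that the raw instruction reports failure as a negative errno value in %rax; libc wrappers translate that into a return value of -1 plus errno.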

System calls in the 64-bit xv6 kernel

What we know about the kernel part

Who called sys_sbrk() in the kernel?
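One way to picture the answer: in classic xv6, the generic syscall() handler reads the syscall number from the saved user registers and calls the matching sys_* function through a table of function pointers. The sketch below is a user-space model of that pattern, not the 64-bit xv6 source. The trapframe layout, fake_brk, and syscall_dispatch() (standing in for syscall()) are simplified assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the saved user registers (hypothetical layout). */
struct trapframe {
  uint64_t rax; /* syscall number on entry, return value on exit */
  uint64_t rdi; /* first argument */
};

typedef int64_t (*syscall_fn)(struct trapframe *tf);

enum { SYS_sbrk = 12, SYS_uptime = 14 }; /* numbers as in xv6's syscall.h */

static uint64_t fake_brk = 0x4000; /* pretend current process size */

/* Model of sbrk: move the break by the first argument, return the old break. */
static int64_t sys_sbrk(struct trapframe *tf) {
  int64_t old = (int64_t)fake_brk;
  fake_brk += tf->rdi;
  return old;
}

static int64_t sys_uptime(struct trapframe *tf) {
  (void)tf;
  return 42; /* placeholder tick count */
}

/* Table indexed by syscall number; unlisted slots stay NULL. */
static const syscall_fn syscalls[] = {
  [SYS_sbrk] = sys_sbrk,
  [SYS_uptime] = sys_uptime,
};

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

/* Plays the role of xv6's syscall(): look up the handler and store its result. */
static void syscall_dispatch(struct trapframe *tf) {
  uint64_t num = tf->rax;
  if (num > 0 && num < NELEM(syscalls) && syscalls[num] != NULL)
    tf->rax = (uint64_t)syscalls[num](tf);
  else
    tf->rax = (uint64_t)-1; /* unknown syscall */
}
```

The return value is written back into the saved %rax, so the user program sees it as the function result once the kernel returns to user space.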

Who called syscall()?

xv6 does not support user-space %fs/%gs. The kernel uses %fs.

How Linux saves the user context in this situation:

Making a syscall from userspace

Recap: the kernel can retrieve syscall parameters, passed in registers, from the user's trapframe.
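The recap can be sketched as a small helper that maps an argument index to the saved register holding it. This is an illustration only: struct user_regs and fetch_arg() are hypothetical names, and the register order assumes the Linux-style convention (%rdi, %rsi, %rdx, %r10, %r8, %r9) rather than quoting the xv6 source.

```c
#include <stdint.h>

/* Hypothetical subset of a trapframe: only the syscall-related registers. */
struct user_regs {
  uint64_t rax;                      /* syscall number */
  uint64_t rdi, rsi, rdx, r10, r8, r9; /* arguments 0..5 */
};

/* Copy the n-th syscall argument into *out.
 * Returns 0 on success, -1 if n is out of range. */
static int fetch_arg(const struct user_regs *tf, int n, uint64_t *out) {
  switch (n) {
  case 0: *out = tf->rdi; return 0;
  case 1: *out = tf->rsi; return 0;
  case 2: *out = tf->rdx; return 0;
  case 3: *out = tf->r10; return 0; /* %r10, not %rcx */
  case 4: *out = tf->r8;  return 0;
  case 5: *out = tf->r9;  return 0;
  default: return -1;
  }
}
```

No memory validation is needed here, because the values were copied into kernel memory when the trapframe was saved.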

How does a user program pass them to the kernel?

gdb demo with hellobrk

GDB TUI commands https://sourceware.org/gdb/current/onlinedocs/gdb/TUI-Commands.html

Other OSes:

// hellobrk.c
#include "types.h"
#include "user.h"

void onesbrk(int inc) {
  printf(1, "brk(%d) old brk is %d\n", inc, sbrk(inc));
}

int main(int argc, char ** argv) {
  // sbrk(n): set the new brk at old brk + n. n can be positive, zero, or negative.
  // returns the old brk on success (non-negative), returns negative number on error
  int xs[10] = {0, 1, 21, 64, 50000, 0, -50000, -50000, 0, -5000};
  for (int i = 0; i < 10; i++)
    onesbrk(xs[i]);
  exit();
}

Hunt for a kernel bug (or a new feature) with this tiny program.

The brk system call in Linux http://man7.org/linux/man-pages/man2/brk.2.html

Two functions are provided to user space programs: brk() and sbrk()
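For comparison with the xv6 program, here is a small sketch of sbrk() on Linux with glibc. The move_break() helper is a name invented for this note. It relies on the behavior described in the man page: sbrk(0) reports the current break, and sbrk() returns (void *)-1 on error.

```c
#define _DEFAULT_SOURCE /* ask glibc to declare sbrk() */
#include <stdint.h>
#include <unistd.h>

/* Move the program break by inc bytes.
 * On success, store how far the break actually moved in *moved and return 0.
 * On failure, return -1 and leave *moved untouched. */
static int move_break(intptr_t inc, intptr_t *moved) {
  char *before = sbrk(0);        /* sbrk(0) only reports the current break */
  if (sbrk(inc) == (void *)-1)   /* on error the break is unchanged */
    return -1;
  char *after = sbrk(0);
  *moved = after - before;
  return 0;
}
```

Calling move_break(4096, &m) followed by move_break(-4096, &m) grows and then shrinks the heap by one page. Mixing direct sbrk() calls with malloc() in a real program is risky, since the allocator may also move the break.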

IDT

Interrupts: external events that need attention. e.g., received keystroke, timer alarm.

Exceptions: internal events caused by the running instruction flow. e.g., divide-by-zero, memory access violation.

When interrupts or exceptions are sent to the processor, the current execution flow will be immediately interrupted.

The CPU (core) will need to transfer control to a pre-defined handling procedure. The execution flow must be preserved so it can be resumed later (unless it will be terminated).

On x86, the Interrupt Descriptor Table (IDT) tells the CPU where to go for an interrupt or exception.
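To show what "where to go" looks like in memory, the sketch below models one 64-bit IDT entry (a 16-byte gate descriptor) and a helper that splits a handler address across its three offset fields. The field layout follows the x86-64 architecture manuals; the names set_gate() and gate_target(), the selector value 0x08, and the handler address used in examples are illustrative assumptions, not xv6 code.

```c
#include <stdint.h>

/* One x86-64 IDT entry (interrupt/trap gate), 16 bytes. */
struct idt_gate {
  uint16_t offset_low;  /* handler address bits 0..15 */
  uint16_t selector;    /* code segment selector to load */
  uint8_t  ist;         /* bits 0..2: interrupt stack table index, rest zero */
  uint8_t  type_attr;   /* gate type, privilege level (DPL), present bit */
  uint16_t offset_mid;  /* handler address bits 16..31 */
  uint32_t offset_high; /* handler address bits 32..63 */
  uint32_t reserved;    /* must be zero */
};

_Static_assert(sizeof(struct idt_gate) == 16, "IDT gate must be 16 bytes");

enum { GATE_INTERRUPT = 0x8E }; /* present, DPL 0, 64-bit interrupt gate */

/* Fill in a gate so that the vector jumps to handler. */
static void set_gate(struct idt_gate *g, uint64_t handler,
                     uint16_t selector, uint8_t type_attr) {
  g->offset_low  = (uint16_t)(handler & 0xffff);
  g->selector    = selector;
  g->ist         = 0;
  g->type_attr   = type_attr;
  g->offset_mid  = (uint16_t)((handler >> 16) & 0xffff);
  g->offset_high = (uint32_t)(handler >> 32);
  g->reserved    = 0;
}

/* Reassemble the handler address the CPU would jump to. */
static uint64_t gate_target(const struct idt_gate *g) {
  return (uint64_t)g->offset_low |
         ((uint64_t)g->offset_mid << 16) |
         ((uint64_t)g->offset_high << 32);
}
```

A kernel would fill an array of 256 such gates, one per vector (for example, vector 14 for page faults), and then load its address with the lidt instruction.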

lapic.c: lapicinit()  # enable several interrupts and the controller

Some history of syscall

Originally, user programs used the "software interrupt" instruction (int) to invoke system calls. It emulates an interrupt on the calling processor, and control is then transferred to the kernel.

int $0x80

System calls borrow/reuse the interrupt/exception handling mechanism.

However, the interrupt mechanism is very expensive because it is designed to interrupt the execution flow at ANY moment. Few assumptions can be made about the interrupted execution flow.

For example, the user could be using any of the general-purpose registers, so the interrupt handler must preserve EVERY general-purpose register.

x86 has since provided dedicated mechanisms for system calls; on x86-64 this is the syscall/sysret instruction pair.

The setup and usage of the syscall mechanism can be learned from xv6 and the instruction's documentation.
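As a rough guide to what that setup involves, the sketch below models the four model-specific registers (MSRs) that the syscall instruction depends on. The MSR numbers and bit meanings come from the architecture manuals. Everything else is an assumption for illustration: the real wrmsr instruction is privileged, so a hypothetical fake_wrmsr() records values in a struct instead, and the selector values and entry address are made up.

```c
#include <stdint.h>

#define MSR_EFER  0xC0000080u /* extended feature enables */
#define MSR_STAR  0xC0000081u /* segment selectors for syscall/sysret */
#define MSR_LSTAR 0xC0000082u /* 64-bit syscall entry point (target %rip) */
#define MSR_FMASK 0xC0000084u /* RFLAGS bits to clear on syscall */

#define EFER_SCE (1ull << 0)  /* enable the syscall/sysret instructions */
#define FLAGS_IF (1ull << 9)  /* interrupt-enable flag in RFLAGS */

/* User-space stand-in for the real MSRs. */
struct fake_msrs {
  uint64_t efer, star, lstar, fmask;
};

static void fake_wrmsr(struct fake_msrs *m, uint32_t msr, uint64_t value) {
  switch (msr) {
  case MSR_EFER:  m->efer = value;  break;
  case MSR_STAR:  m->star = value;  break;
  case MSR_LSTAR: m->lstar = value; break;
  case MSR_FMASK: m->fmask = value; break;
  default: break; /* other MSRs are ignored in this model */
  }
}

/* STAR[47:32]: syscall loads CS from this value and SS from this value + 8.
 * STAR[63:48]: 64-bit sysret loads CS from this value + 16 and SS from + 8. */
static uint64_t make_star(uint16_t kernel_cs, uint16_t sysret_base) {
  return ((uint64_t)sysret_base << 48) | ((uint64_t)kernel_cs << 32);
}

/* The per-CPU steps a kernel performs once during boot. */
static void syscall_setup(struct fake_msrs *m, uint64_t entry,
                          uint16_t kernel_cs, uint16_t sysret_base) {
  fake_wrmsr(m, MSR_EFER, m->efer | EFER_SCE);
  fake_wrmsr(m, MSR_STAR, make_star(kernel_cs, sysret_base));
  fake_wrmsr(m, MSR_LSTAR, entry);
  fake_wrmsr(m, MSR_FMASK, FLAGS_IF); /* enter the kernel with interrupts off */
}
```

Because the entry point, segments, and flag mask are all fixed in advance, the syscall instruction has far less work to do than a general interrupt, which is where the speedup over int comes from.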

On older versions of xv6-64, you may still see T_SYSCALL and its handling on the regular trap path. With the recent updates (9/8/2019), the syscall code path has been separated from the trap code path to avoid unnecessary confusion.

Extended reading: Linux/Kernel network flow