Process Lifecycle

Some scenarios for process management:

All the above tasks can be handled by a combination of fork() and exec() calls.

Process creation in *nix OSes

The typical way to create a new process in *nix:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int
main(int argc, char **argv)
{
  pid_t child_pid = fork();
  if (child_pid == 0) { // I'm the child
    printf("I'm the child. pid == %ld\n", (long)getpid());
    fflush(stdout); // buffered output would be lost when exec replaces the image
    char *args[] = {"ls", "-al", "/home", NULL};
    execvp("ls", args);
    // execvp() returns only on failure
    perror("execvp");
    exit(EXIT_FAILURE);
  } else if (child_pid > 0) { // I'm the parent
    printf("I'm the parent. child_pid == %ld\n", (long)child_pid);
    waitpid(child_pid, NULL, 0); // reap the child so it does not become a zombie
  } else {
    perror("fork");
    return EXIT_FAILURE;
  }
  return 0;
}


The first two processes in xv6

(Read "06: The First Process" for the initialization of the first process environment.)

userinit()

initcode.S

init.c

sh.c

The fork() system call

allocproc()

copyuvm()

After setting its state to RUNNABLE, the child process is ready to run!

Linux's fork:

The exec() system call

After a successful call to exec() from user space, the original execution flow of the process is destroyed and replaced by the fresh process context set up in exec().

Open file descriptors are inherited by default: they stay open across exec() unless the program marks them close-on-exec (FD_CLOEXEC).
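A minimal sketch of how a program can control whether a descriptor survives exec(), using the standard fcntl() interface. The helper names set_cloexec() and is_cloexec() are made up for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: mark fd close-on-exec, so a successful exec()
 * closes it automatically. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper: returns 1 if fd is close-on-exec, 0 if it would
 * stay open across exec(), -1 on error (e.g. invalid descriptor). */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A descriptor returned by pipe() or a plain open() starts without the flag, so it is visible to the exec'ed program until set_cloexec() is called on it.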

fork() in the real world: Copy-On-Write

If we are going to call exec() right away to run a new program, fork() clearly does a lot of useless work, since everything it copied is discarded by exec().

Real systems implement fork() with a technique called "copy-on-write" (COW) to minimize its overhead, so there is no need to add a separate "CreateProcess" system call.

The core idea: the two processes share the same memory pages until one of them changes something.

Upon a write operation, the writer process creates a private copy of the modified memory (one page at a time).

If exec() is called immediately after fork(), almost nothing is actually duplicated by the fork().

In real systems, the page table can also be COW-ed, so the real cost of a fork() immediately followed by exec() is duplicating only a few pages -- a new stack and the corresponding page table nodes for that stack.
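COW is invisible to user programs; what a program can observe is the isolation it has to preserve. The sketch below (the function name is hypothetical) lets the child write to a variable after fork() and checks what the parent still sees. An eagerly copying fork(), such as xv6's copyuvm(), gives the same result, so this shows the semantics, not the optimization itself.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, let the child overwrite *value, and return
 * the value the parent sees afterwards. Returns -1 if fork() or
 * waitpid() fails, so callers should use non-negative values. */
int parent_view_after_child_write(int *value, int child_value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: under COW this write faults, and the kernel gives the
         * child a private copy of the page before the store completes. */
        *value = child_value;
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    /* Parent: its own page was never modified. */
    return *value;
}
```

Calling it with a variable holding 42 and a child value of 7 returns 42 in the parent: the child's store went to its own copy of the page.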

CreateProcess() in Windows OSes

Windows has CreateProcess(). Why does it need so many parameters?

With fork() and exec(), the child can do a lot of setup work on demand between the two calls (for example, redirecting file descriptors or changing the working directory).

Since the parent process loses control of the child after CreateProcess(), it has to specify everything through the parameters. The side effect is that CreateProcess() needs to check all of them.
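As an illustration of work done between fork() and exec(), here is a sketch of output redirection, roughly what a shell does for "cmd > file". The function name and the exit codes 126/127 are choices made for this example, not something the notes prescribe.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with stdout redirected to `path`.
 * Returns the child's exit status, or -1 on failure. */
int run_with_stdout_to(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: still running the parent's code, so it can set up its
         * own environment before the new program takes over. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(126);
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) < 0)
                _exit(126);
            close(fd);
        }
        execvp(argv[0], argv);
        _exit(127); /* reached only if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The redirection needs no dedicated parameter: the child just rearranges its descriptors, and the program started by execvp() inherits them.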

Threads (POSIX threads)

In Linux, fork() internally calls the clone() system call. clone() takes flags that specify what is shared with the caller and what is duplicated.

POSIX threads, or pthreads, is the standard threading mechanism on Linux.

pthread_create() calls clone() but it does not duplicate the page table. Both "execution flows" share the same VM (and a few other things), which is the behavior of threads.

Read glibc's thread creation code and the man page for details about thread creation.

pthread's use of clone(): (Note that a specified flag means the corresponding resource will be shared by the two processes. -- Don't try to define what is a "process".)

const int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM
         | CLONE_SIGHAND | CLONE_THREAD
         | CLONE_SETTLS | CLONE_PARENT_SETTID
         | CLONE_CHILD_CLEARTID);

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/createthread.c;hb=HEAD

fork()'s use of clone():

const int flags = CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/arch-fork.h;hb=HEAD
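To see the effect of a single flag, the following Linux-only sketch calls glibc's clone() wrapper directly, once with CLONE_VM and once without. The function names and the 64 KiB stack size are assumptions made for this example, and it uses far fewer flags than glibc's real thread creation.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024) /* arbitrary size for this sketch */

/* Entry point of the new execution flow: store 1 through the pointer. */
static int child_sets_flag(void *arg)
{
    *(volatile int *)arg = 1;
    return 0;
}

/* Hypothetical helper: create a child with clone() using extra_flags
 * (e.g. CLONE_VM or 0) and report whether the child's write is visible
 * to the caller. Returns 1 if visible, 0 if not, -1 on error. */
int child_write_visible(int extra_flags)
{
    volatile int flag = 0;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;
    /* The stack grows downward on common architectures, so pass its top.
     * SIGCHLD makes the child waitable with waitpid(), as after fork(). */
    pid_t pid = clone(child_sets_flag, stack + CHILD_STACK_SIZE,
                      extra_flags | SIGCHLD, (void *)&flag);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    int reaped = (waitpid(pid, NULL, 0) == pid);
    free(stack);
    return reaped ? flag : -1;
}
```

With CLONE_VM the child runs in the caller's address space, so its store lands in the caller's variable, which is the thread-like behavior. With no extra flags the child gets its own (COW) copy, as with fork(), and the caller's variable stays 0.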

More on fork()

The reverse of COW, KSM (Kernel Same-page Merging)

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-ksm

It's an "asynchronous" version of your hw5 dedup().