hw5: page table magic – deduplication and copy-on-write

Due by noon, Friday Oct 27, 2023

In this homework, we implement two advanced virtual memory features in xv6. First, we introduce a deduplication system, wherein the kernel scans the system for duplicated pages and frees up memory by reusing identical physical memory page frames in several virtual memory locations.

This, however, introduces a major problem should a process try to write to one of these shared pages. Thus, the second part of our homework is to handle this problem using a copy-on-write approach: write-protect the shared pages, and make a new writable copy for any process that needs it.

Overview

As usual, create your hw5 turn-in repository using this link: https://classroom.github.com/a/qNj1ej8q. Clone the hw5 template from the public repository, using the hw5 branch. 

The changes mostly live in kalloc.c. Pay attention to the new functions krelease() and kretain(), as well as a new array of struct frameinfo. Here, kretain() increases the reference count on a page (indicated using the virtual address of the kernel’s mapping of the physical frame, aka P2V(framenumber<<12), just like how kalloc() and kfree() do it. krelease() decreases the count, and if the count reaches zero, it calls kfree() on the page. kalloc() sets the reference count of a freshly allocated page to 1. Finally, all uses of kfree() throughout xv6 were replaced with krelease().

Attention: whenever you see the term "frame" in xv6, it's about physical pages.

What’s primarily missing is the implementation of dedup and copy-on-write, for which placeholders exist in vm.c.

You only need to edit vm.c to achieve a fully-functional implementation. However, it's okey to edit any other files.

Part 1: Deduplication (70%)

The program dedup_reader allocates a lot of memory and fills it with lots of identical content. This uses up a lot of physical RAM. It then calls the new system call sys_dedup(), which identifies duplicate pages, and frees up most of the memory through virtual memory deduplication. The program then reads from the memory to make sure it still works as expected.

A correct solution finishes dedup in less than 1 second without crashing, and shows an increase in the number of free system pages commensurate with the size of the large allocation in the beginning of the program.

The skeleton dedup() function has be provided at the bottom of vm.c. The new system call "dedup" will come here and print out that message. You only need to implement dedup() to finish this part.

Two helper functions (update_checksum() and frames_are_identical() in kalloc.c) are provided for finding the duplicates quickly. Use these functions with your discretion. You can get full points without using them, as long as it works as expected.

 If your implementation runs correctly but takes a while to execute, dedup() might be repeatedly performing some expensive computations.

Part 2: Copy-on-Write (30%)

The program dedup_writer is similar to dedup_reader, except it then writes to some parts of the memory, checks that other parts of memory are unaffected, calls sys_dedup(), writes some more, and calls sys_dedup() again.

For this to work correctly, you need to write-protect your deduped page table entries, so that a page fault is triggered when a write occurs. This is then caught by a new case in the big trap.c switch statement, and which leads to the copyonwrite() skeleton function in vm.c. You need to implement copyonwrite() to allow write on the deduped pages by creating a private copy of that page and map it there with write enabled.

Note that you may see different numbers but similar patterns--more free pages after each call to dedup.

Bonus challenge: enable copy-on-write for fork() 

A correct copy-on-write implementation (Part 2) is required for this part.

fork() is a core functionality in unix-like systems. It creates an identical copy of the calling process. Combined with a consecutive exec(), who replaces the current process with a given program, a new process can be created. Since exec() will be discarding the old context quickly, creating a duplicate process seems to be very expensive. To minimize this cost, real-world fork adopts copy-on-write, which effectively eliminates the actual copying in fork().

Your task is to enable copy-on-write for xv6's fork(). To get started, read the code from proc.c:fork() to vm.c:copyuvm(). kalloc() is used in copyuvm() to duplicate process memory. You need to change the implementation in copyuvm() so that it get rid of kalloc(). Similar to that in part-2, the actual copying will happen when the shared page is being written to.

In this part, you only need to employ COW for the user pages. Specifically, the page table is still fully duplicated in a fork. In a real-world implementation, the page table is also duplicated using COW, which further reduces the copying efforts (as small as copying only one page in a fork!). However, that will require more extra work in the page table management code.

# of free pages is reduced after the child calls memset.

Requirements

The programs must run without crashing or reporting errors. System free memory should be maximized by the dedup functionality: the reduction in memory usage for a correct program is similar to the size of the full allocation (10 MB) since we fill it with all the same data.

Finally, make sure there are no kernel memory leaks: total system free pages may not change after repeatedly running the test programs.

Turn-in

To verify the correctness of your submission, clone the turn-in repo in a separate directory, build, and test.

dedup:

The process memory starts from 0 to proc->sz. See exec() for the initial size and growproc() for the resizing.

Note that the first page at VA==0 is not mapped. Your code needs to safely ignore those invalid pages.

kretain() increases the reference count of the page by one.

krelease() does the opposite. In addition, if the reference happened to become zero, the page is freed by calling kfree().

krelease() and kretain() automatically maintains the reference count of the each page (using pageframes), and free them when needed.

copy-on-write:

FAQs

How does the CPU distinguish between a copy-on-write page and a read-only page?

How does the kernel know it?

Useful functions/macros: