File systems

Disks

Basics:

IDE interface: One request at a time. Uses IO ports to communicate with the hardware.

Essential file system concepts

Two concepts for managing data on the disk:

Virtual file system (VFS)

The tree structure can be used to index the devices and store the metadata of the OS.

Windows has multiple trees to manage different kinds of resources:

*NIX uses one tree, a.k.a. the VFS, to organize everything that can be exposed to user space:

We use "mount" to link or create a file system in the VFS at a certain path.

Mount point:
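A minimal sketch of how a path could be matched to the file system mounted above it, assuming a small fixed mount table and longest-prefix matching. The names struct mount, mounts, and find_mount, and the table entries, are hypothetical and not taken from xv6 or any real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mount-table entry: a file system attached at a path. */
struct mount {
  const char *point;  /* mount point, e.g. "/proc" */
  const char *fsname; /* label of the file system mounted there */
};

/* Illustrative table; the entries are made up for this sketch. */
static const struct mount mounts[] = {
  { "/",        "rootfs" },
  { "/proc",    "procfs" },
  { "/mnt/usb", "fat32"  },
};

/* Return the mount whose mount point is the longest prefix of path
 * that ends on a component boundary, or NULL for a relative path. */
static const struct mount *find_mount(const char *path) {
  const struct mount *best = NULL;
  size_t best_len = 0;
  assert(path != NULL);
  for (size_t i = 0; i < sizeof(mounts) / sizeof(mounts[0]); i++) {
    size_t n = strlen(mounts[i].point);
    if (strncmp(path, mounts[i].point, n) != 0)
      continue;
    /* "/proc" must not match "/process": require a boundary. */
    int boundary = (n == 1) || path[n] == '/' || path[n] == '\0';
    if (boundary && n >= best_len) {
      best = &mounts[i];
      best_len = n;
    }
  }
  return best;
}
```

With this table, "/proc/1/pid" resolves to the procfs entry, while "/process" falls back to the root file system.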

xv6's file system implementation

Related source code:

Your task: read the code and get familiar with the core operations of read and write.

mkfs.c: A standalone tool for creating file system images. It's NOT part of the kernel file system code.

The structure of xv6's concrete file system (fs.{h,c})

On the disk: [ boot block (the mbr)  | file system ]

Zoom in: [ mbr | superblock | log | inodes | bitmap | data blocks ]
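The layout above can be sketched as simple arithmetic, assuming block 0 is the boot block, block 1 is the superblock, and the remaining regions follow back to back in the order shown. The field names loosely follow xv6's superblock, but make_layout and struct layout are hypothetical helpers and the sizes passed in are up to the caller.

```c
#include <assert.h>

typedef unsigned int uint;

/* Where each on-disk region starts, in block numbers. */
struct layout {
  uint superblock;  /* always block 1 in this sketch */
  uint logstart;
  uint inodestart;
  uint bmapstart;
  uint datastart;
  uint size;        /* total number of blocks in the image */
};

/* Lay the regions out back to back:
 * boot | superblock | log | inodes | bitmap | data.
 * All arguments are counts of blocks. */
static struct layout make_layout(uint nlog, uint ninodeblocks,
                                 uint nbitmap, uint ndata) {
  struct layout l;
  assert(nlog > 0 && ninodeblocks > 0 && nbitmap > 0);
  l.superblock = 1;                        /* block 0 is the boot block */
  l.logstart   = l.superblock + 1;
  l.inodestart = l.logstart + nlog;
  l.bmapstart  = l.inodestart + ninodeblocks;
  l.datastart  = l.bmapstart + nbitmap;
  l.size       = l.datastart + ndata;
  return l;
}
```

For example, 30 log blocks, 26 inode blocks, 1 bitmap block, and 941 data blocks give a 1000-block image whose data region starts at block 59.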

inode

An inode corresponds to a file. A file is a sequence of bytes of some length.

The file system uses some special files to manage file-system metadata, including directories and shortcuts (softlinks).

In a file system, each inode is assigned a unique number, the "inode number" (inum)
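Because on-disk inodes have a fixed size and sit in one contiguous region, an inum translates directly into a disk location. The sketch below assumes 512-byte blocks and 64-byte on-disk inodes; both constants and the helper names are assumptions for illustration (xv6 has a similar IBLOCK macro in fs.h, which is the authoritative definition).

```c
#include <assert.h>

typedef unsigned int uint;

#define BSIZE      512u                  /* assumed block size in bytes */
#define INODE_SIZE 64u                   /* assumed size of one on-disk inode */
#define IPB        (BSIZE / INODE_SIZE)  /* inodes per block */

/* Inodes must pack evenly into a block so none straddles two blocks. */
static_assert(BSIZE % INODE_SIZE == 0, "inode size must divide block size");

/* Block that holds inode inum, given the first block of the inode region. */
static uint inode_block(uint inum, uint inodestart) {
  return inodestart + inum / IPB;
}

/* Byte offset of inode inum inside that block. */
static uint inode_offset(uint inum) {
  return (inum % IPB) * INODE_SIZE;
}
```

No search is needed: the inum itself is the index.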

There are two inode structures in xv6:

In the source code, most of the complexity comes from maintaining the consistency between the on-disk fs and the in-memory copy.

Directory -- a special file

struct dirent { // directory entry
  ushort inum;       // inode number; 0 means the entry is free
  char name[DIRSIZ]; // file name, at most DIRSIZ (14 in xv6) bytes
};

The content of a directory file is effectively a struct dirent[].
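Since a directory is just an array of entries, a name lookup is a linear scan. This is a minimal sketch in the spirit of xv6's dirlookup(), but it works on an in-memory array rather than reading the directory's blocks; dir_lookup and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned short ushort;

#define DIRSIZ 14  /* maximum name length, as in xv6's fs.h */

struct dirent {
  ushort inum;          /* 0 marks a free slot */
  char name[DIRSIZ];    /* not NUL-terminated when exactly DIRSIZ long */
};

/* Scan a directory's contents (an array of n entries) for name.
 * Returns the inode number, or 0 if there is no such entry. */
static ushort dir_lookup(const struct dirent *dir, size_t n, const char *name) {
  assert(dir != NULL && name != NULL);
  for (size_t i = 0; i < n; i++) {
    if (dir[i].inum == 0)
      continue;                         /* unused entry */
    if (strncmp(dir[i].name, name, DIRSIZ) == 0)
      return dir[i].inum;
  }
  return 0;
}
```

Comparing with strncmp bounded by DIRSIZ handles names that fill the whole field and therefore carry no terminating NUL.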

However, the concept of inode and the inode number is completely hidden from user applications.

Inside the file system, the inode number is the handle that ties a name to a file:

Example: open the file "/proc/1/pid"

The root directory's inum is hardcoded to 1 (see ROOTINO in fs.h)
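Path resolution can be sketched as repeated directory lookups that start from the root inum. The toy table below stands in for the on-disk directories; its contents and the inums 2 to 4 are made up so that "/proc/1/pid" resolves, and lookup and resolve are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROOTINO 1   /* root directory's inode number, as in xv6's fs.h */

/* One directory entry in a toy in-memory "disk": (directory, name) -> inum. */
struct entry {
  unsigned dir;       /* inum of the directory holding this entry */
  const char *name;
  unsigned inum;      /* inum the name refers to */
};

/* Made-up contents for this sketch. */
static const struct entry table[] = {
  { 1, "proc", 2 },
  { 2, "1",    3 },
  { 3, "pid",  4 },
};

/* Look up one name of length len inside directory dir; 0 if absent. */
static unsigned lookup(unsigned dir, const char *name, size_t len) {
  for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
    if (table[i].dir == dir && strlen(table[i].name) == len &&
        memcmp(table[i].name, name, len) == 0)
      return table[i].inum;
  return 0;
}

/* Resolve an absolute path one component at a time, starting at the root.
 * Returns the final inum, or 0 if any component is missing. */
static unsigned resolve(const char *path) {
  assert(path != NULL && path[0] == '/');
  unsigned cur = ROOTINO;
  while (*path) {
    while (*path == '/') path++;          /* skip separators */
    if (*path == '\0') break;
    size_t len = strcspn(path, "/");      /* length of the next component */
    cur = lookup(cur, path, len);
    if (cur == 0) return 0;
    path += len;
  }
  return cur;
}
```

Each step turns (current inum, name) into the next inum, so the application only ever sees the path and the final file.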

softlink vs hardlink

A softlink is a special file. It contains a path and works like a shortcut.

A hardlink is actually no different from any other file. See this example:

A softlink can be created for anything. A hardlink cannot be created across FS boundaries.
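One way to observe the difference on a POSIX system is to compare the inode numbers that lstat() reports. The sketch below assumes a writable /tmp on a file system that supports both link types; link_demo and struct link_info are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Result of the experiment, filled in by link_demo(). */
struct link_info {
  ino_t file_ino, hard_ino, soft_ino; /* inode numbers seen by lstat() */
  nlink_t nlink;                      /* link count of the original file */
  int soft_is_symlink;                /* nonzero if "soft" is a symlink */
};

/* Create a file, a hardlink, and a softlink in a fresh temp directory,
 * record what lstat() reports, then clean up. Returns 0 on success. */
static int link_demo(struct link_info *out) {
  char dir[] = "/tmp/linkdemoXXXXXX";
  char file[64], hard[64], soft[64];
  struct stat st;

  assert(out != NULL);
  if (mkdtemp(dir) == NULL) return -1;
  snprintf(file, sizeof file, "%s/file", dir);
  snprintf(hard, sizeof hard, "%s/hard", dir);
  snprintf(soft, sizeof soft, "%s/soft", dir);

  int fd = open(file, O_CREAT | O_WRONLY, 0600);
  if (fd < 0) return -1;
  close(fd);
  if (link(file, hard) != 0) return -1;     /* second name, same inode */
  if (symlink(file, soft) != 0) return -1;  /* new inode that stores a path */

  if (lstat(file, &st) != 0) return -1;
  out->file_ino = st.st_ino;
  out->nlink = st.st_nlink;
  if (lstat(hard, &st) != 0) return -1;
  out->hard_ino = st.st_ino;
  if (lstat(soft, &st) != 0) return -1;
  out->soft_ino = st.st_ino;
  out->soft_is_symlink = S_ISLNK(st.st_mode);

  unlink(soft); unlink(hard); unlink(file); rmdir(dir);
  return 0;
}
```

The hardlink shares the original's inode number and raises its link count to 2, while the softlink gets an inode of its own.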

Maintaining consistency on persistent storage (the disks)

What is "inconsistency"?

old systems:

new systems:

key technologies:

In most Linux distros, the default setting enables only metadata journaling. For example, overwriting the contents of a file does not go through the journal, so you can see a mix of old and new contents in the file after a crash.

xv6 logging

Let's start from filewrite() in file.c -- the "VFS" write().

begin_op() and end_op() mark the critical section (one log transaction). Write operations do not run concurrently with a commit: begin_op() sleeps while a commit is in progress.

in writei():

end_op() calls commit() to do the actual writes.

commit():

When calling commit(), the log.lock is not held!

Instead of holding the lock, commit() relies on the log.committing flag, which blocks other writers from entering the critical section. See begin_op().
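The commit sequence of a write-ahead log can be sketched as a single-threaded toy model: copy the modified blocks into the log, write the log header (the commit point), install the blocks at their home locations, then clear the header. This is a simplified illustration, not xv6's code: the block sizes, the arrays standing in for the disk, the enum crash parameter, and every function name below are assumptions, and locking and log.committing are ignored.

```c
#include <assert.h>
#include <string.h>

#define BSIZE   16   /* toy block size */
#define NBLOCKS 8    /* toy disk: home locations of the blocks */
#define LOGSIZE 4    /* max blocks per transaction */

/* "On-disk" state: survives a simulated crash. */
static char disk[NBLOCKS][BSIZE];
static char logarea[LOGSIZE][BSIZE];
static struct { int n; int block[LOGSIZE]; } loghead;

/* In-memory state: lost on a crash. */
static char cache[NBLOCKS][BSIZE];
static struct { int n; int block[LOGSIZE]; } pending;

enum crash { NO_CRASH, CRASH_BEFORE_HEAD, CRASH_AFTER_HEAD };

/* Modify a cached block and remember it as part of the transaction. */
static void log_write(int b, const char *data) {
  assert(b >= 0 && b < NBLOCKS);
  strncpy(cache[b], data, BSIZE - 1);
  cache[b][BSIZE - 1] = '\0';
  for (int i = 0; i < pending.n; i++)
    if (pending.block[i] == b) return;   /* already in this transaction */
  assert(pending.n < LOGSIZE);
  pending.block[pending.n++] = b;
}

/* Copy committed blocks from the log to their home locations. */
static void install(void) {
  for (int i = 0; i < loghead.n; i++)
    memcpy(disk[loghead.block[i]], logarea[i], BSIZE);
}

static void commit(enum crash when) {
  for (int i = 0; i < pending.n; i++)    /* 1. modified blocks -> log */
    memcpy(logarea[i], cache[pending.block[i]], BSIZE);
  if (when == CRASH_BEFORE_HEAD) return;
  memcpy(loghead.block, pending.block, sizeof pending.block);
  loghead.n = pending.n;                 /* 2. header written: commit point */
  if (when == CRASH_AFTER_HEAD) return;
  install();                             /* 3. log -> home locations */
  loghead.n = 0;                         /* 4. erase the transaction */
  pending.n = 0;
}

/* Run after a reboot: replay the log if a transaction was committed. */
static void recover(void) {
  memset(cache, 0, sizeof cache);        /* in-memory state was lost */
  pending.n = 0;
  install();                             /* no-op when loghead.n == 0 */
  loghead.n = 0;
}
```

A crash before the header write leaves the disk untouched, and a crash after it is repaired by recover(), so a transaction's blocks are installed either all together or not at all.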

xv6's write() is not scalable.

opportunities to improve:

Tree indexes and their overhead

B+-tree

B-epsilon tree

For consistency, the above structures may have to employ expensive logging or copy-on-write (COW) mechanisms.

LSM-tree (log-structured merge tree)
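A minimal sketch of the write and read paths of an LSM-tree, assuming integer keys, a tiny fixed-size memtable, and in-memory arrays standing in for on-disk sorted runs. Compaction (the merging of runs) is left out, and all names and capacities here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MEMCAP  4    /* memtable capacity before a flush */
#define MAXRUNS 8    /* max number of sorted runs in this sketch */

struct kv { int key; int val; };

struct lsm {
  struct kv mem[MEMCAP];            /* in-memory buffer, insertion order */
  int nmem;
  struct kv runs[MAXRUNS][MEMCAP];  /* immutable sorted runs, oldest first */
  int nruns;
};

static int cmp_kv(const void *a, const void *b) {
  int x = ((const struct kv *)a)->key, y = ((const struct kv *)b)->key;
  return (x > y) - (x < y);
}

/* Sort the memtable and append it as a new run (one sequential write). */
static void lsm_flush(struct lsm *t) {
  assert(t->nruns < MAXRUNS);
  qsort(t->mem, (size_t)t->nmem, sizeof t->mem[0], cmp_kv);
  for (int i = 0; i < t->nmem; i++)
    t->runs[t->nruns][i] = t->mem[i];
  t->nruns++;
  t->nmem = 0;
}

/* Writes only touch the memtable; a full memtable is flushed as a run. */
static void lsm_put(struct lsm *t, int key, int val) {
  for (int i = 0; i < t->nmem; i++)
    if (t->mem[i].key == key) { t->mem[i].val = val; return; }
  t->mem[t->nmem].key = key;
  t->mem[t->nmem].val = val;
  t->nmem++;
  if (t->nmem == MEMCAP) lsm_flush(t);
}

/* Newest data wins: check the memtable, then runs from newest to oldest.
 * Returns 1 and stores the value in *val if found, 0 otherwise. */
static int lsm_get(const struct lsm *t, int key, int *val) {
  for (int i = 0; i < t->nmem; i++)
    if (t->mem[i].key == key) { *val = t->mem[i].val; return 1; }
  struct kv probe = { key, 0 };
  for (int r = t->nruns - 1; r >= 0; r--) {
    const struct kv *hit =
        bsearch(&probe, t->runs[r], MEMCAP, sizeof probe, cmp_kv);
    if (hit) { *val = hit->val; return 1; }
  }
  return 0;
}
```

Updates never modify an existing run in place, which is why a read may have to consult several runs until compaction merges them.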