~2012

Debian's initial root filesystem

Debian's initial root filesystem (initrd) is a minimal environment whose main job is to find your real root filesystem on some device in your computer. In this article I unwind what it is composed of and how it is generated, so that I can use the same mechanisms to place a complete Debian (Emdebian) environment in the initial filesystem, removing the need to find the root filesystem device at all.

How is it generated? mkinitramfs

This is done using the mkinitramfs(8) tool, a simple shell script that copies some files and creates a cpio archive. In line 167 the DESTDIR variable is set; this is the directory in which the filesystem will be generated:

DESTDIR="$(mktemp -d ${TMPDIR:-/var/tmp}/mkinitramfs_XXXXXX)" || exit 1

In line 196 all the needed dirs are created:

for d in bin conf/conf.d etc lib/modules run sbin scripts ${MODULESDIR}; do
    mkdir -p "${DESTDIR}/${d}"
done
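
To get a feel for these two steps without touching a real initramfs, here is a minimal sketch that builds the same skeleton in a throwaway directory. STAGING is a made-up name (mkinitramfs uses DESTDIR), the fallback is /tmp rather than /var/tmp, and ${MODULESDIR} is left out because it depends on the kernel version:

```shell
#!/bin/sh
# Sketch only: build an empty initramfs-style skeleton in a scratch directory.
# STAGING is a hypothetical name for this example; mkinitramfs itself uses DESTDIR.
STAGING="$(mktemp -d "${TMPDIR:-/tmp}/skeleton_XXXXXX")" || exit 1

# Same directory list as in mkinitramfs, minus ${MODULESDIR}.
for d in bin conf/conf.d etc lib/modules run sbin scripts; do
    mkdir -p "${STAGING}/${d}"
done

# Show the resulting tree relative to the staging directory.
(cd "${STAGING}" && find . -type d | sort)
```

Remove the directory with rm -rf "${STAGING}" when you are done looking at it.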

After that some essential stuff is done for kernel modules. I'm skipping over it for now.

At line 239 the most important script (init) is copied to the filesystem:

# First file executed by linux-2.6
cp -p /usr/share/initramfs-tools/init ${DESTDIR}/init

After that some optional scripts are copied:

# add existant boot scripts
for b in $(cd /usr/share/initramfs-tools/scripts/ && find . \
    -regextype posix-extended -regex '.*/[[:alnum:]\._-]+$' -type f); do
    [ -d "${DESTDIR}/scripts/$(dirname "${b}")" ] \
        || mkdir -p "${DESTDIR}/scripts/$(dirname "${b}")"
    cp -p "/usr/share/initramfs-tools/scripts/${b}" \
        "${DESTDIR}/scripts/$(dirname "${b}")/"
done
for b in $(cd "${CONFDIR}/scripts" && find . \
    -regextype posix-extended -regex '.*/[[:alnum:]\._-]+$' -type f); do
    [ -d "${DESTDIR}/scripts/$(dirname "${b}")" ] \
        || mkdir -p "${DESTDIR}/scripts/$(dirname "${b}")"
    cp -p "${CONFDIR}/scripts/${b}" "${DESTDIR}/scripts/$(dirname "${b}")/"
done

echo "DPKG_ARCH=${DPKG_ARCH}" > ${DESTDIR}/conf/arch.conf
cp -p "${CONFDIR}/initramfs.conf" ${DESTDIR}/conf
for i in ${EXTRA_CONF}; do
    if [ -e "${CONFDIR}/conf.d/${i}" ]; then
        copy_exec "${CONFDIR}/conf.d/${i}" /conf/conf.d
    elif [ -e "/usr/share/initramfs-tools/conf.d/${i}" ]; then
        copy_exec "/usr/share/initramfs-tools/conf.d/${i}" /conf/conf.d
    fi
done

Have a look in /usr/share/initramfs-tools/scripts/ on your machine to see what's there.
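
The find expression in the two loops above only lets through file names made of letters, digits, dots, underscores and dashes. As far as I can tell the point is to skip things like editor backup files. A small sketch to see the filter in action, assuming GNU find (needed for -regextype); DEMO and the file names are invented for the example:

```shell
#!/bin/sh
# Sketch only: show which names the regex from the loops above accepts.
# DEMO and the file names below are made up for this example.
DEMO="$(mktemp -d "${TMPDIR:-/tmp}/regex_XXXXXX")" || exit 1
mkdir -p "${DEMO}/init-top"
touch "${DEMO}/init-top/udev" "${DEMO}/init-top/all_generic-ide" \
      "${DEMO}/init-top/udev~" "${DEMO}/init-top/udev,v"

# Same filter as in mkinitramfs; requires GNU find.
MATCHED="$(cd "${DEMO}" && find . -regextype posix-extended \
    -regex '.*/[[:alnum:]\._-]+$' -type f | sort)"
echo "${MATCHED}"

rm -rf "${DEMO}"
```

Only ./init-top/all_generic-ide and ./init-top/udev are printed; the names containing ~ or a comma are filtered out.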

At line 277 the script copies some binaries from your system to the filesystem including config files:

# module-init-tools
copy_exec /sbin/modprobe /sbin
copy_exec /sbin/rmmod /sbin
mkdir -p "${DESTDIR}/etc/modprobe.d"
cp -a /etc/modprobe.d/* "${DESTDIR}/etc/modprobe.d/"

I don't know why this is done, since busybox can provide these as well, but there is probably a good reason.
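
The copy_exec helper used above comes from initramfs-tools' hook-functions and copies a binary together with the shared libraries it needs. The following is not that function, only a rough sketch of the idea under my own assumptions: copy_with_libs and DEST are hypothetical names, and unlike copy_exec this version simply keeps the source path below the destination root.

```shell
#!/bin/sh
# Sketch only: NOT the real copy_exec from hook-functions.
# copy_with_libs is a hypothetical helper: copy a binary plus the libraries
# that ldd reports into a destination root, keeping the original paths.
copy_with_libs() {
    src="$1"
    dest_root="$2"
    mkdir -p "${dest_root}$(dirname "${src}")"
    cp -p "${src}" "${dest_root}${src}"
    # ldd prints lines like "libc.so.6 => /lib/.../libc.so.6 (0x...)" or
    # "/lib64/ld-linux-x86-64.so.2 (0x...)"; keep only the absolute paths.
    ldd "${src}" 2>/dev/null \
        | sed -n -e 's|.* => \(/[^ ]*\) .*|\1|p' \
                 -e 's|^[[:space:]]*\(/[^ ]*\) (.*|\1|p' \
        | while read -r lib; do
            mkdir -p "${dest_root}$(dirname "${lib}")"
            cp -p "${lib}" "${dest_root}${lib}"
        done
}

DEST="$(mktemp -d "${TMPDIR:-/tmp}/copyexec_XXXXXX")" || exit 1
copy_with_libs /bin/sh "${DEST}"
(cd "${DEST}" && find . -type f | sort)
```

The listing shows that even a single small binary pulls files from the host into the image, which matters for the concerns further down.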

Line 288 is interesting as well, since the extra hook scripts are run here. On my Ubuntu system these are:

/usr/share/initramfs-tools/hooks
/usr/share/initramfs-tools/hooks/ntfs_3g
/usr/share/initramfs-tools/hooks/kbd
/usr/share/initramfs-tools/hooks/brltty
/usr/share/initramfs-tools/hooks/thermal
/usr/share/initramfs-tools/hooks/console_setup
/usr/share/initramfs-tools/hooks/dmsetup
/usr/share/initramfs-tools/hooks/framebuffer
/usr/share/initramfs-tools/hooks/fuse
/usr/share/initramfs-tools/hooks/busybox
/usr/share/initramfs-tools/hooks/klibc
/usr/share/initramfs-tools/hooks/mountall
/usr/share/initramfs-tools/hooks/fixrtc
/usr/share/initramfs-tools/hooks/udev
/usr/share/initramfs-tools/hooks/compcache
/usr/share/initramfs-tools/hooks/plymouth

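Each of these hook scripts prepares one aspect of the filesystem, typically by copying in the binaries, modules or config files that a particular feature needs.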

The rest of the lines do some workarounds and are self-explanatory.

First thoughts

Since I'm looking to create an easy way to get the whole filesystem into RAM, I'm not so happy that quite a few files are copied from the build machine into the filesystem. This makes sense for the initial filesystem of your own machine, but preparing a filesystem for a different machine this way is potentially dangerous. I cannot rely on the host's setup when creating the filesystem.

When it's run: init

When the kernel is ready for it, it executes /init. This can be any executable. In Debian's setup it's a script doing the following things:

It starts by creating the first directories it needs and mounting /sys and /proc:

[ -d /dev ] || mkdir -m 0755 /dev
[ -d /root ] || mkdir -m 0700 /root
[ -d /sys ] || mkdir /sys
[ -d /proc ] || mkdir /proc
[ -d /tmp ] || mkdir /tmp
mkdir -p /var/lock
mount -t sysfs -o nodev,noexec,nosuid sysfs /sys
mount -t proc -o nodev,noexec,nosuid proc /proc

It then prepares /dev and sets up a lot of variables which will be used later on.

Then it reads the command line that was used to boot the kernel and stores each option in the right variable:

# Parse command line options
for x in $(cat /proc/cmdline); do
    case $x in
    init=*)
        init=${x#init=}
        ;;
#etc....
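
To play with this pattern outside of an initramfs, here is a small self-contained sketch. parse_cmdline is a hypothetical wrapper that takes the command line as an argument instead of reading /proc/cmdline, and the three options handled are just ones I picked for illustration:

```shell
#!/bin/sh
# Sketch only: the same for/case pattern, fed from an argument
# instead of /proc/cmdline. parse_cmdline is a hypothetical name.
parse_cmdline() {
    init=/sbin/init     # default when no init= is given
    ROOT=
    quiet=n
    for x in $1; do     # unquoted on purpose: split on whitespace
        case $x in
        init=*)
            init=${x#init=}     # strip the "init=" prefix
            ;;
        root=*)
            ROOT=${x#root=}
            ;;
        quiet)
            quiet=y
            ;;
        esac
    done
}

parse_cmdline "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet init=/bin/sh"
echo "init=${init} ROOT=${ROOT} quiet=${quiet}"
```

This prints init=/bin/sh ROOT=/dev/sda1 quiet=y. Options the case statement doesn't know about (BOOT_IMAGE=..., ro) are silently ignored.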

Now we come to the part where the extra scripts are run:

maybe_break top

# Don't do log messages here to avoid confusing graphical boots
run_scripts /scripts/init-top

maybe_break modules
[ "$quiet" != "y" ] && log_begin_msg "Loading essential drivers"
load_modules
[ "$quiet" != "y" ] && log_end_msg

[ -n "${netconsole}" ] && modprobe netconsole netconsole="${netconsole}"

maybe_break premount
[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/init-premount"
run_scripts /scripts/init-premount
[ "$quiet" != "y" ] && log_end_msg

maybe_break mount
log_begin_msg "Mounting root file system"
. /scripts/${BOOT}
parse_numeric ${ROOT}
maybe_break mountroot
mountroot
log_end_msg

maybe_break bottom
[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/init-bottom"
run_scripts /scripts/init-bottom
[ "$quiet" != "y" ] && log_end_msg

The maybe_break calls are there for debugging. If you set the break parameter when you boot the kernel, the script stops at the matching point and gives you a shell inside the initial filesystem. Not important now but perhaps later. ;)
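
The idea behind maybe_break can be sketched in a few lines. This is my own simplified stand-in, not the function shipped with initramfs-tools: it only prints a message and reports through its return value whether the breakpoint matched.

```shell
#!/bin/sh
# Sketch only: a simplified stand-in for maybe_break.
# It returns 0 when the requested breakpoint matches, 1 otherwise.
maybe_break() {
    if [ "${break:-}" = "$1" ]; then
        echo "would stop at '$1' and drop to a shell"
        return 0
    fi
    return 1
}

# Pretend the kernel was booted with break=premount.
break=premount
for stage in top modules premount mount bottom; do
    maybe_break "${stage}" || echo "passed ${stage}"
done
```

Only the premount stage triggers the message; all other stages are passed.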

Explore the scripts that will be run. They are essential for setting up the operating system.

The last bits of the script are a bit magical. If you read it right you'll notice the script never returns! This is normal. At the end the script uses exec to switch from the root filesystem in RAM to the one on disk, using switch_root or, if that is not available, run-init:

# Chain to real filesystem
if command -v switch_root >/dev/null 2>&1; then
    exec switch_root ${rootmnt} ${init} "$@" <${rootmnt}/dev/console >${rootmnt}/dev/console
elif command -v run-init >/dev/null 2>&1; then
    exec run-init ${rootmnt} ${init} "$@" <${rootmnt}/dev/console >${rootmnt}/dev/console
fi

From there the ${init} executable on disk takes over. The init variable can be set on the kernel command line. By default it is set to '/sbin/init' (see line 48).
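
The reason the script never returns is the exec in front of switch_root and run-init: exec replaces the running shell with the given program instead of starting it as a child. A tiny demonstration, with an inner sh -c standing in for the init script:

```shell
#!/bin/sh
# Sketch only: exec replaces the inner shell with 'echo',
# so the second echo is never reached.
out="$(sh -c 'exec echo "handed over"; echo "never printed"')"
echo "${out}"
```

This prints just "handed over". In the real init script the same thing happens with ${init} on the new root filesystem taking the place of the script.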