CS261: Research Topics in Operating Systems

Building and Debugging NetBSD Kernels

David A. Holland

August 17, 2009

(Based in part upon "Building and Debugging BSD/OS Kernels" by Keith A. Smith)

This page provides a brief discussion of how to configure, build, test, and debug a NetBSD kernel. These notes are intended for work on NetBSD-current (5.99.x) but should mostly apply to the 5.0 stable branch as well.

Contents:

Hardware environment

Checking out a source tree

Source organization

Building the cross-compile tools

Configuring a kernel

Building a kernel

Booting a kernel

Debugging a kernel

For further information see the NetBSD Guide, as well as other docs. Tracking NetBSD-current may also be helpful.

If you discover something that you think may be helpful to your classmates, please post it on the course wiki.

Hardware environment

Ideally, you will have two machines available to you. The "build machine" is a stable machine that you use to develop and build kernels. The build machine should be accessible over the network, so you can log into it remotely to do development work.

The second machine is the "test machine." This is a machine that you use to test the kernels that you build. As a result this machine will be rebooted frequently. Nobody should log into a test machine except the person who is currently using it to test kernels. You will need access to the console of the test machine in order to test kernels on it.

Note that today either or both machines can be virtual, using e.g. Xen or VMware. Generally benchmarking should be done on real machines, not virtual machines; however, during development and testing a virtual test machine can make your work go much faster.

To make kernel debugging easier, the build machine (real or virtual) can be connected to the test machine via a (real or virtual) null modem serial cable. This should connect COM1 or COM2 (/dev/tty00 or /dev/tty01) on the build machine to COM1 or COM2 on the test machine.

The course staff will inform you of the names and location(s) of the build and test machines available to you. We will also arrange for console access as needed.

Checking out a source tree

The master system source tree on BSD systems lives in /usr/src. In general you don't want to develop in your system's master source tree, because you want to be able to use that source tree to update and recompile the system without having it break. So you want to check out another copy.

To check out a copy of the system source, using the CVS version control tool used by NetBSD, do this:

	cvs -z3 -d anoncvs@anoncvs.netbsd.org:/cvsroot checkout -P src

To get the 5.x stable branch instead:

	cvs -z3 -d anoncvs@anoncvs.netbsd.org:/cvsroot checkout -P -rnetbsd-5 src

The checkout will take a while, because the tree is large and CVS is slow. It will give you a directory "src" containing the latest system sources.

It is possible to check out just the kernel (and save all the space used by the user-space programs, gcc, and so on) but this is not likely to work very well unless you are also running NetBSD-current on your development machine.

Source organization

The kernel source appears in src/sys. (The other subdirectories of src contain programs and libraries organized into directories based on where on the system they install into; for example, the source for the sort program is found in src/usr.bin/sort.)

Within the kernel, most of the core kernel parts are found in src/sys/kern. The VM system (called UVM) appears in src/sys/uvm. Machine-dependent code appears in src/sys/arch, grouped by platform. Network code appears in the various net* subdirectories of src/sys.

File systems appear in src/sys/fs, src/sys/miscfs, and src/sys/ufs, with some appearing at the top level under src/sys. Which file system goes where is a matter of historical accident; because CVS does not readily support renaming files or directories, things get moved around only very slowly.

For more information see hier(7).

Building the cross-compile tools

NetBSD can be cross-compiled from (in principle) any Unix machine. To make this work, and because NetBSD comes with its own make system, builds and various related tasks are started with the build.sh script that lives in the top level of the source tree.

To build the cross-compiler tools, go to the top level of your source tree (src) and run

	mkdir obj
	./build.sh tools

(The mkdir obj is recommended because of some peculiarities in the order things are first set up. If the directory already exists, you don't need to create it. If you're building in /usr/src it will be a symbolic link to /usr/obj instead. The object directories can be set up in any of several alternate ways instead; see the file BUILDING if you really want to know the details.)

By default build.sh builds tools for the same machine type you're building on (e.g. i386, amd64, vax); to build for a different machine type, use the -m option to build.sh. See BUILDING.

If you are going to be using remote gdb (kgdb) and your build machine isn't already running NetBSD, add the option -V MKCROSSGDB=yes to cause a cross-debugging version of gdb to be built along with the other tools. (This is not the default.)
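The options described above can be combined on one build.sh command line, with the options placed before the "tools" operation. The following is a minimal sketch, not part of the NetBSD tree; the function name tools_command and its two arguments are made up for illustration. It only prints the command so you can inspect it before running it from the top of your source tree.

```shell
# Sketch: print a build.sh command line for the tools build.
# tools_command is a hypothetical helper, not something NetBSD provides.
tools_command() {
    machine=$1      # target machine type, e.g. vax; empty means same as host
    want_gdb=$2     # "yes" to also build a cross-debugging gdb
    cmd="./build.sh"
    # -m selects the target machine type.
    [ -n "$machine" ] && cmd="$cmd -m $machine"
    # -V sets a build variable; MKCROSSGDB=yes requests the cross-gdb.
    [ "$want_gdb" = "yes" ] && cmd="$cmd -V MKCROSSGDB=yes"
    echo "$cmd tools"
}
```

For example, tools_command vax yes prints a command that builds VAX tools together with a cross-gdb, while tools_command "" "" prints the plain ./build.sh tools shown earlier.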

When this finishes you'll have a directory named something like /usr/obj/tooldir.NetBSD-5.99.15-i386 (based on the OS, version, and machine type you're running on) with a compiler and various tools installed in it. To (cross-)compile NetBSD or parts of NetBSD, you'll run these tools instead of the default ones on your build system. Mostly the system makefiles will take care of this for you.
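Because the tooldir name depends on the host OS, version, and machine type, it can be convenient to look it up rather than type it. The sketch below assumes a Bourne-style shell and that exactly one tooldir.* directory exists under the object directory you pass in; find_tooldir is a hypothetical helper name.

```shell
# Sketch: print the tooldir found under an object directory.
# Fails unless exactly one tooldir.* directory exists there.
find_tooldir() {
    objdir=$1
    # Expand the glob into the function's own positional parameters.
    set -- "$objdir"/tooldir.*
    # With no match the pattern is left unexpanded, so also check that
    # the single result really is a directory.
    if [ $# -ne 1 ] || [ ! -d "$1" ]; then
        echo "expected exactly one tooldir under $objdir" >&2
        return 1
    fi
    echo "$1"
}
```

With a source tree in ~/cs261/src, something like TOOLDIR=$(find_tooldir ~/cs261/src/obj) would then hold the path used in the commands later on this page.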

Configuring a kernel

Now you need to create a configuration file that specifies the kernel you want to build. The kernel config tool will create a build directory, including a makefile and the source and header files needed to describe your kernel configuration to the compiler. If you took CS161 and used OS/161 this process will be fairly familiar, though the details differ.

Go to the conf directory for the architecture you're using. Assuming that's i386, this directory is src/sys/arch/i386/conf. This directory will contain a number of pre-existing config files. The basic procedure is to copy one (usually GENERIC) to another name (e.g. "TEST") and edit the copy to suit your needs. Generally this means commenting out device drivers and file systems you don't need for your test machine (this reduces the build time) and, for kernel development, enabling one or more debugging-oriented options.

Things you want to enable:

options DIAGNOSTIC (enables assertions)
options LOCKDEBUG (enables various locking-related checks)
options DDB (the in-kernel debugger)
options DDB_ONPANIC=1 (make ddb kick in on panic)
options DDB_VERBOSE_HELP
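As a rough illustration of the copy-and-edit procedure, the sketch below copies a base config and appends the options listed above, skipping any option the base file already enables on an uncommented line. The function name make_debug_config is hypothetical. The check is by option name only, so if the base sets the same option to a different value, that line is left alone and you should edit it by hand.

```shell
# Sketch: copy a base kernel config and append debugging options.
make_debug_config() {
    base=$1     # existing config file, e.g. GENERIC
    new=$2      # name for the copy, e.g. TEST
    cp "$base" "$new" || return 1
    for opt in DIAGNOSTIC LOCKDEBUG DDB DDB_ONPANIC=1 DDB_VERBOSE_HELP; do
        name=${opt%%=*}     # strip any "=value" part
        # An uncommented "options NAME" line means it is already enabled.
        if ! grep -Eq "^options[[:space:]]+$name([[:space:]=]|\$)" "$new"; then
            printf 'options \t%s\n' "$opt" >> "$new"
        fi
    done
}
```

Run from the conf directory, make_debug_config GENERIC TEST would produce a TEST file that you can then trim of unneeded drivers and file systems.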

Turning on DEBUG might or might not be worthwhile depending on what you're working on. It (among other things) allows/requires more manual intervention during bootup; this is good if bootup is not working, but a hassle otherwise.

If you have a serial cable set up to allow remote debugging with gdb, enable KGDB and uncomment makeoptions DEBUG="-g". Make sure the KGDB_DEVNAME, KGDB_DEVADDR, and KGDB_DEVRATE are set up right for the way you have the serial cable plugged in. See the NetBSD kgdb docs for more information.

Caution: On NetBSD-current on i386, the GENERIC kernel is set up to use loadable kernel modules for many vital things. The kernel modules scheme is not fully mature in a number of ways, with the upshot that you'll get a completely unbootable system if you accidentally change any of the kernel-internal binary interfaces used by important modules. The config file MONOLITHIC includes GENERIC and adds the changes needed to run without kernel modules. As of this writing I strongly recommend that you copy these definitions to the end of your own kernel config.

Once you have created a configuration file, run the kernel config tool to generate the build directory. The config tool is called "config", but you want to run the copy from the tools you built in the previous step. If your source tree is in ~/cs261/src, run ~/cs261/src/obj/tooldir.NetBSD-5.99.15-i386/bin/nbconfig TEST, using the appropriate tooldir name and the name of your config file.

Note that you only have to rerun the config tool if you add new source files to the kernel or change something in the configuration file. Therefore, it is probably not worth setting up scripts or shell aliases to streamline this process.
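For the occasions when you do rerun it, the config step amounts to changing to the conf directory and invoking nbconfig from your tooldir. A minimal sketch, with configure_kernel as a hypothetical helper name and all three paths supplied by you:

```shell
# Sketch: run the tools' config program on a kernel config file.
configure_kernel() {
    tooldir=$1      # e.g. ~/cs261/src/obj/tooldir.NetBSD-5.99.15-i386
    confdir=$2      # e.g. ~/cs261/src/sys/arch/i386/conf
    name=$3         # config file name, e.g. TEST
    # Use a subshell so the caller's working directory is unchanged.
    ( cd "$confdir" && "$tooldir/bin/nbconfig" "$name" )
}
```

The subshell keeps the directory change local, so the function can be called from anywhere in the tree.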

Building a kernel

First, go to the build directory you just created:

	cd ../compile/TEST		(or whatever you named your config)

and now run make, except that you need to run the NetBSD make you built with the tools above rather than your system's own make. If your source tree is in ~/cs261/src, do

	~/cs261/src/obj/tooldir.NetBSD-5.99.15-i386/bin/nbmake-i386

Note that if you're cross-compiling to a different processor type (e.g. vax) the machine type for the tooldir is the host type and the machine type tacked onto nbmake is the target type. So if you were compiling on PowerPC Linux for VAX NetBSD you'd run

	~/cs261/src/obj/tooldir.Linux-2.6.33-powerpc/bin/nbmake-vax

You will run nbmake-i386 (or whatever) often enough that you'll want to create a script or shell alias to do it for you. The simplest way to do this is to put

	alias nbmake ~/cs261/src/obj/tooldir.NetBSD-5.99.15-i386/bin/nbmake-i386

(or whatever) in your .aliases or .cshrc file, and log in again to reload that file.
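The alias line above is csh syntax. If your login shell is a Bourne-style shell (sh, ksh, bash), a shell function does the same job. This is a sketch under that assumption; TOOLDIR is a hypothetical variable name and the default path is simply the example path used on this page.

```shell
# Sketch: Bourne-shell counterpart of the csh alias.
# TOOLDIR defaults to the example tooldir; override it for your own tree.
TOOLDIR=${TOOLDIR:-$HOME/cs261/src/obj/tooldir.NetBSD-5.99.15-i386}

nbmake() {
    # Forward all arguments unchanged to the cross make.
    "$TOOLDIR/bin/nbmake-i386" "$@"
}
```

Placed in your shell startup file (for example .profile), this lets you type nbmake depend or nbmake -j4 exactly as with the alias.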

Having done that, to actually compile you do

	nbmake depend
	nbmake

(You can use -jNUM to do a parallel build if you're on a multicore machine.)

The resulting kernel is called "netbsd" and can be copied to the root directory of your test machine and booted there. Note that you want to copy it as "netbsd.test" or something of the sort, not just "netbsd", because if your kernel doesn't work you want to be able to boot the old known-working kernel.

If you enabled full debugging symbols for use with kgdb, you'll also get a file netbsd.gdb that includes all the debug info. When debugging you'll give this kernel to gdb.

Booting a kernel

If you didn't already copy the kernel to the test machine (as /netbsd.test), do that. (One of the great ways to waste time when kernel hacking is to forget to copy the kernel over after recompiling it. This is why each build gets numbered; in theory you can check that the kernel you just booted really is the one you just built.)
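One way to make that check is to pull the build number out of the kernel's version banner. The sketch below assumes the banner has the usual form "NetBSD 5.99.15 (TEST) #3: date", where the number after "#" is the build count; build_number is a hypothetical helper name, and the exact banner layout on your kernel is an assumption you should confirm by eye.

```shell
# Sketch: print N from the first "#N:" in a version banner read on stdin.
build_number() {
    # Match everything up to the first '#', capture the digits before ':'.
    sed -n 's/^[^#]*#\([0-9][0-9]*\):.*/\1/p' | head -n 1
}
```

On the running test machine, sysctl -n kern.version | build_number would print the number of the booted kernel, which you can compare against the banner of the kernel you just built.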

Now reboot the machine. You'll need to log in as root on the console to do this.

When the test machine gets past the BIOS and loads the NetBSD bootloader, you'll get a window of a few seconds to interrupt the automatic boot process. Press the spacebar to do this. At the boot prompt, type

	boot netbsd.test

and away you go.

If you want to boot single-user (that is, just to a root shell, without starting daemons and getty/login and all that) instead do

	boot netbsd.test -s

In single user mode the kernel will probe and configure hardware, and then print

    Enter pathname of shell or RETURN for sh:

At this point press return. From the single-user shell you may need to remount the root device read-write in order to be able to do anything:

   mount -w /

You may also want to mount other file systems or start the network or daemons by hand. If you do any of these things, exit single-user mode by rebooting. Otherwise, you can exit the single-user shell and the system will continue ahead to boot multiuser. (If you started anything up by hand, the multiuser boot process will likely get confused if you don't do a clean reboot.)

Debugging a kernel

There are various techniques that you can use for debugging a kernel. printf() is implemented inside the kernel, so you can print relevant data from within the kernel. The kernel printf() implementation prints directly to the system console. This means that you can only see 25 lines of text at a time. Be careful where you put a printf() statement in the kernel. If you try to print from a routine that is called frequently (e.g., the clock interrupt), the printing will scroll by too fast for you to read.

The builtin debugger (ddb) is good enough for many purposes. In particular, when all you really need is to figure out how you got to a panic, it will generally get you a perfectly good stack trace. However, for more complex operations, like single-stepping, it doesn't always work so well, and ddb isn't all that powerful or smart.

Using remote gdb over a serial port is more involved, but works better for complex tasks. To do this, we run gdb on the build machine, and use it to debug a kernel running on the test machine. gdb communicates with the test kernel over a serial line connecting the two machines. To connect gdb to the test kernel, change directories to your compile directory (e.g., src/sys/arch/i386/compile/TEST) and run:

    gdb netbsd.gdb

Note that if your build machine doesn't run NetBSD you'll need a cross-gdb, which you can arrange to build as part of the tools (see above) and you'll run it out of your tooldir like this:

    ~/cs261/src/obj/tooldir.NetBSD-5.99.15-i386/bin/i386--netbsdelf-gdb netbsd.gdb

where the tooldir varies as usual and the exact name of gdb in that directory depends on the mood of the FSF at the time gdb was imported into NetBSD.

When you get the gdb prompt, type:

    target remote /dev/tty00

or

    target remote /dev/tty01

depending on whether the serial cable is connected to the build machine on COM1 or COM2 respectively. (Note: on Linux the corresponding device names are ttyS0 and ttyS1 instead.)

This tells gdb to debug a remote process communicating over the specified tty. When gdb establishes communication with the test kernel, the test kernel will stop running and wait for instructions from gdb. You can examine data values, set breakpoints, and do all of the other things you're accustomed to doing from gdb. To tell the kernel to resume execution while you are connected with gdb, type "cont". If your test kernel panics, it should give you the opportunity to connect with gdb.
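If you connect often, the attach command can be kept in a small gdb command file and passed to gdb with its -x option. The following sketch writes such a file and prints the matching gdb invocation. The helper name kgdb_command is hypothetical, and it assumes the serial devices appear under /dev with the NetBSD names given earlier; substitute your cross-gdb for plain gdb if your build machine does not run NetBSD.

```shell
# Sketch: write a gdb command file that attaches over a serial device,
# then print the gdb command line that uses it.
kgdb_command() {
    tty=$1          # e.g. tty00 (COM1) or tty01 (COM2)
    script=$2       # path of the command file to write
    echo "target remote /dev/$tty" > "$script" || return 1
    # -x tells gdb to execute the commands in the given file at startup.
    echo "gdb -x $script netbsd.gdb"
}
```

For instance, kgdb_command tty00 kgdb.cmd run in the compile directory leaves a kgdb.cmd file there and prints the command to start the session.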

If you took CS161 and used gdb on OS/161, this process will be mostly familiar. The primary difference is that in this environment the remote gdb support is itself running as part of the kernel, so debugging is not as transparent and you won't be able to e.g. debug early in the boot sequence or, probably, trace through interrupt handlers.

When you are done using gdb, use the "detach" command to tell it to close its connection to the remote machine. This is very important. If you simply quit gdb, or reset the test machine, you may leave the serial line in an inconsistent state. When you detach, the test kernel will resume normal execution.

See the NetBSD kgdb docs for more information.