Monday, June 22, 2009

Regressions

It's become clear in the process of the HARE work that we really need a regular regression suite to verify functionality as well as track performance over the course of development. It strikes me that whatever we come up with may be generally useful for the Plan 9 kernel, since it seems we don't really have any quantitative tracking of kernel performance.

Seems like one of the first things we need is better information about the kernel encoded in the binary. We've got the date it was built in KERNDATE, but little information about where it was built (and by whom). This seems increasingly important in an environment where we aren't just building out of a single directory. For my purposes, some sort of tree identifier such as a venti score or a git HEAD would be useful to encode, so we can track exactly which version of the kernel we are testing. Eventually it seems like it would be a good idea to have a unique id for the build tools and root file system as well.
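As a rough sketch of what such a build record might hold, the Python below gathers the pieces mentioned above into key=value lines. The field names, the `tree_id` parameter and the text format are all assumptions made for illustration; nothing like this exists in the kernel build today.

```python
import getpass
import socket
import time


def build_info(tree_id, builddir, when=None, user=None, host=None):
    """Return build metadata as an ordered list of (key, value) pairs.

    tree_id is whatever identifies the source tree (a venti score or a
    git HEAD hash). All field names here are hypothetical.
    """
    if when is None:
        when = time.time()
    return [
        ("kerndate", str(int(when))),            # build time, seconds since the epoch
        ("builder", user or getpass.getuser()),  # who built it
        ("host", host or socket.gethostname()),  # where it was built
        ("builddir", builddir),                  # directory it was built from
        ("tree", tree_id),                       # source tree identifier
    ]


def format_info(pairs):
    """Render the pairs as one key=value line each."""
    return "".join(f"{k}={v}\n" for k, v in pairs)
```

The same text could be linked into the kernel image or written next to it; which of the two is better is left open here.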

Regressions will be stored in a temporal directory hierarchy (similar to yesterday), then a directory named for the $OBJTYPE, with a directory labeled by the md5sum of the kernel binary at the leaf. This directory will contain an info file with extracted meta-data about the kernel and a numbered subdirectory per run collected. Each of the numbered subdirectories will contain a console output file and a test output file per test (named by the test). There will also be an info file with any relevant information about the run (in the case of BG/P this will be the personality and bump-id).
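The layout described above can be sketched in a few lines of Python. The yyyy/mmdd form of the temporal part is an assumption borrowed from the dump file system that yesterday uses, and the function names and the choice to number runs from 0 are made up for illustration.

```python
import hashlib
import os
import time


def kernel_dir(root, kernel_path, objtype, when=None):
    """Path of the per-kernel directory: root/yyyy/mmdd/objtype/md5sum."""
    t = time.gmtime(when)  # when=None means now
    with open(kernel_path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return os.path.join(
        root, time.strftime("%Y", t), time.strftime("%m%d", t), objtype, digest
    )


def new_run_dir(kdir):
    """Create and return the next numbered run subdirectory under kdir."""
    os.makedirs(kdir, exist_ok=True)
    # Only numeric names count as runs, so the kernel's info file is ignored.
    runs = [int(n) for n in os.listdir(kdir) if n.isdigit()]
    run = os.path.join(kdir, str(max(runs, default=-1) + 1))
    os.mkdir(run)
    return run
```

Each call to `new_run_dir` on the same kernel directory yields the next run number, so repeated runs of one kernel on one day collect side by side under its md5sum.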

The regression binaries will be included with the kernel as part of the initial ramdisk to make things simple at first. The script will be modular and run from the front-end node, requiring nodes be able to boot to the point where one could cpu into them.
This will also require us to be able to determine the assigned IP address of the IO Node(s) automatically. It makes sense to go for 128-node configurations with 2 IO nodes so we can test the Ethernet bandwidth between IO nodes and not just to the front end.

The idea right now is to have a set of 5 initial performance regressions covering computation, sparse memory access, OS interference, file system access, and network access using a typical 4-way SMP configuration. Eventually we may broaden these tests to cover a range of configurations including uniprocessor and large-page. The general idea is to have a short regression which runs around a minute or so for a simple sanity check of the system at boot-up, and a longer regression which runs on a daily schedule and does a more comprehensive evaluation.
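A minimal sketch of the short regression driver, assuming each test is just a command whose output gets saved under the test's name. The `run_tests` function, the 60-second default budget and the idea of skipping tests once the budget is spent are illustrative assumptions, not a settled design; the real script would run the tests on the nodes rather than locally.

```python
import os
import subprocess
import time


def run_tests(tests, outdir, budget=60.0):
    """Run each (name, argv) pair, saving its output to outdir/name.

    Returns {name: elapsed_seconds}. Tests that would start after the
    overall budget (in seconds) has been spent are skipped.
    """
    results = {}
    start = time.monotonic()
    for name, argv in tests:
        if time.monotonic() - start > budget:
            break
        t0 = time.monotonic()
        with open(os.path.join(outdir, name), "wb") as out:
            # Capture stdout and stderr together in the per-test output file.
            subprocess.run(argv, stdout=out, stderr=subprocess.STDOUT, check=False)
        results[name] = time.monotonic() - t0
    return results
```

The daily run could use the same driver with a much larger budget and a longer test list.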

For the longer test one option might be to measure the time it takes to build the entire system from scratch (kernel build or kernel+applications) on every Blue Gene node hitting a single file server for source files and storing results to a ramdisk. A variation on this would be to develop a parallel build system in which all 128 nodes collaborated on compiling the system.

Another idea I am considering is the use of linuxemu to run Linux benchmarks under Plan 9 to measure both the kernel and the emulation infrastructure. Candidates would include lmbench, postmark, dbench, and netperf.

The initial focus is on gathering results, but the idea is that once we have started collecting data we can add web-based visualization of the results tracked over time, so we can monitor the general health and performance of the system. As use cases materialize for testing we'll add them to the matrix.
