Aboriginal Linux


THIS FILE IS HORRIBLY OUT OF DATE.

The FAQ has up-to-date information; all of this needs a complete rewrite.

Documentation for Aboriginal Linux


What is Aboriginal Linux?

Aboriginal Linux is a toolkit for building custom virtual machines. It lets you boot virtual PowerPC, ARM, MIPS and other exotic systems on your x86 laptop, and do development in them.

The name "Aboriginal Linux" describes the project's goal of bootstrapping a new Linux for a new target, doing all the cross compiling necessary to transition to fully native development in the new environment. This new Linux system can then be upgraded or replaced in-situ.

Aboriginal Linux provides an easy way to get started with embedded development. It also lets you build your own code against uClibc and test it on various hardware platforms, and even perform cross-platform regression testing or portability auditing.

This documentation uses the name "Aboriginal Linux" to refer to the build system consisting of a series of bash scripts and configuration files which download and compile software. The output of that build system is referred to as a "system image". The build system compiles a Linux development environment for the specified target system, and packages it into a bootable binary system image.

The base development environment is built from seven source packages: busybox, uClibc, gcc, binutils, make, bash, and the Linux kernel. This is the smallest and simplest environment that can rebuild itself entirely from source code, and thus the minimum a host system must cross compile in order to create a fully independent native development environment for a target.

Booting a development system image under an emulator such as QEMU allows fully native builds for supported target platforms to be performed on cheap and powerful commodity PC hardware. You can then build and install additional packages (zlib, bison, openssl...) within the virtual machine's native development environment, without having to do any additional cross compiling. Several build control images are provided to automate this task, and you're welcome to create your own from those examples.

Aboriginal Linux currently includes full support for arm, mips, powerpc, x86, x86-64 targets, and several other more exotic platforms; see the screenshots page for a complete list. The project's goal is to support every target QEMU can emulate in "system" mode.

Aboriginal Linux is licensed under GPL version 2. Its component packages are redistributed under their respective licenses (mostly GPL and LGPL).

Optional extras

Intermediate stages of the build (such as the cross compiler and the raw root filesystem directory) may also be useful to Linux developers, so tarballs of them are saved during the build.

By default the build cross-compiles some optional extra packages (distcc and uClibc++) and preinstalls them into the target filesystem. This is just a convenience; these packages build and install natively within the minimal development system image just fine.


Using system images

If you want to jump straight to building your own software natively for embedded targets, you can download a prebuilt binary image instead of running the build scripts to produce your own.

Here are the different types of output produced by the build:

system-image-*.tar.bz2

System images boot a complete Linux system under an emulator. Each system-image tarball contains a squashfs root filesystem image, a Linux kernel configured to run under the emulator QEMU, and scripts to launch the virtual system under the emulator in various configurations.

The steps to test boot a system image under QEMU are:

  • install QEMU
  • download the appropriate prebuilt binary tarball for the target you're interested in
  • extract it: tar -xvjf system-image-$TARGET.tar.bz2
  • cd into it: cd system-image-$TARGET
  • execute it: ./run-emulator.sh

This boots the system image under the appropriate emulator, with the emulated Linux's /dev/console hooked to stdin and stdout of the emulator process. (I.E. the shell prompt the script gives you after the boot messages scroll past is for a shell running inside the emulator. This lets you pipe the output of other programs into the emulator, capture the emulator's output with "tee", cut and paste in the terminal window, etc.)
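
For example, you can capture a transcript of the whole session (boot messages included) while still typing at the emulated shell (the log filename here is arbitrary):

./run-emulator.sh | tee session.log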

Type "cat /proc/cpuinfo" to confirm you're running in the emulator, then play around and have fun. Type "exit" when done.

Inside a system image, you generally wget a source code package from a URL and compile it. (You can even wget the Aboriginal Linux build scripts and run them inside one of the system images to trivially prove the project can rebuild itself.)

Inside QEMU you can access the host system's loopback interface using the special address "10.0.2.2". The build control images use this to run busybox's FTP server on the host's loopback address, allowing the system image to upload its results to the host at the end of the build. You can also run web servers and ssh servers on the host's loopback, and the system image can connect to them.
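
For example (a sketch assuming your host busybox includes the httpd applet; the port and filename are illustrative):

busybox httpd -p 127.0.0.1:8080     (on the host, serving the current directory)
wget http://10.0.2.2:8080/myproject.tar.gz     (inside the emulator)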

Extra space and speed

The system images by themselves are fairly small (64 megabytes), and don't have a lot of scratch space for building or installing other packages. If a file named "hdb.img" exists in the current directory, run-emulator.sh will automatically designate it as a second virtual hard drive and attempt to mount the whole unpartitioned device on /home inside the emulator.

Some optional command line arguments to run-emulator.sh provide extra space and extra speed for compiling more software:

  • --make-hdb $MEGABYTES - if the hard drive image to mount on /home doesn't already exist, create a sparse file of the indicated size and format it ext3.

  • --with-hdb $FILENAME - use specified $FILENAME from the host as the hard drive image to mount on the emulated system's /home (instead of the default "hdb.img"). Fail if it doesn't exist, unless --make-hdb was also specified.

  • --with-distcc $CC_PATH - enable the distcc accelerator trick. This option provides the path to an appropriate cross compiler directory, so run-emulator.sh can launch a distcc daemon on the host's loopback device configured to call that cross compiler, and configure the emulated system to call out to that cross compiler through distcc.

Running an armv4l system image with the cross compiler installed in the user's home directory, using a hard drive image in the user's home directory (to be created with a size of 2 gigabytes if it doesn't already exist) might look like:

./run-emulator.sh --make-hdb 2048 --with-hdb ~/blah.img --with-distcc ~/cross-compiler-armv4l

mini-native-*.tar.bz2

These tarballs contain the same root filesystem as the corresponding system images, just in an archive instead of packaged into a filesystem image.

If you want to boot your own system image on real hardware instead of an emulator, the appropriate mini-native tarball is a good starting point. If all you want is a native uClibc development environment for your host, try:

chroot mini-native-x86_64 /usr/chroot-setup.sh

The boot script /usr/qemu-setup.sh or /usr/chroot-setup.sh performs minimal setup for the appropriate environment, mounting /proc and /sys and such. It starts a single shell prompt, and automatically cleans up when that process exits.

If you're interested in building a more complex development environment within this one (adding zlib and perl and such before building more complicated packages), the best way to learn how is to read Linux From Scratch.

Note that mini-native is just one potential filesystem layout; the FWL build scripts have several other configurations available when you build from source.

cross-compiler-*.tar.bz2

The cross compilers created during the FWL build are relocatable C compilers for each target platform. The primary reason for offering each cross compiler as a downloadable binary is to implement the distcc accelerator trick. Using them to cross compile additional software is supported, but not recommended.

If you'd like to use one for something other than distcc, this documentation mostly assumes you already know how. Briefly:

  • download the appropriate cross-compiler-$TARGET.tar.bz2
  • extract it somewhere (doesn't matter where)
  • add the resulting cross-compiler-$TARGET/bin subdirectory to your $PATH
  • either use $TARGET-gcc as your compiler, or set your $CROSS_COMPILE prefix to "$TARGET-" with a trailing dash.
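
For example, a session using an armv4l toolchain (flags and filenames are illustrative) might look like:

export PATH=$(pwd)/cross-compiler-armv4l/bin:$PATH
armv4l-gcc -Os -s hello.c -o hello
make CROSS_COMPILE=armv4l-

The first compile calls the cross compiler directly; the second drives a build that honors the $CROSS_COMPILE prefix convention.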

Also, stock up on aspirin and clear a space to beat your head against; you'll need both. See why cross compiling sucks for more details.

Note that although this cross compiler has g++, it doesn't have uClibc++ in its lib or include subdirectories, which is required to build most c++ programs. If you need extra libraries, it's up to you to cross-compile and install them into those directories.

How do I build my own customized system images from source code?

To build your own root filesystem and system images from source code, download and run the FWL build scripts. You'll probably want to start with the most recent release version, although once you've got the hang of it you might want to follow the development version.

For a quick start, download the tarball, extract it, cd into it, and run "./build.sh". This script takes one argument, which is the target to build for. Run it with no arguments to list available targets.
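
For example, using armv4l (one of the supported targets):

./build.sh
./build.sh armv4l

The first command lists the available targets, the second builds everything for armv4l.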

This should produce all the tarballs listed in the previous section in the "build" directory. To perform a clean build, "rm -rf build" and re-run build.sh.

How building from source works

The build system is a series of shell scripts which download, compile, install, and use the appropriate source packages to generate a system image. These shell scripts are designed to be easily read and modified, acting both as tools to perform a build and as documentation on how to build these packages.

The build.sh script is a simple wrapper which calls the following other scripts in sequence:

  1. download.sh
  2. host-tools.sh
  3. cross-compiler.sh $TARGET
  4. mini-native.sh $TARGET
  5. package-mini-native.sh $TARGET

In theory, the stages are orthogonal. If you have an existing cross compiler, you can add it to the $PATH and skip cross-compiler.sh. Or you can use _just_ cross-compiler.sh to create a cross compiler, and then go build something else with it. The host-tools.sh stage can often be skipped entirely.
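
For example, producing just a cross compiler for armv4l would look something like:

./download.sh
./host-tools.sh
./cross-compiler.sh armv4l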

Build stages

The following files control the individual stages of the build. Each may be called individually from the top level directory of FWL:

  • download.sh - Download source packages from the web.

    This script does not take any arguments. It's a series of calls to a download function (defined in sources/include.sh) that checks if an existing copy of the tarball matching a defined $SHA1 sum exists in the sources/packages directory, and if not uses wget to fetch it from the $URL (or else from a series of fallback mirrors). A blank value for $SHA1 will accept any file as correct, ignoring its contents.
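
    Schematically, each call looks something like this (the package name, URL, and blank $SHA1 here are illustrative, not a real entry):

    URL=http://example.com/mypackage-1.0.tar.bz2 \
    SHA1= \
    download || dienow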

    After downloading all tarballs, the function cleanup_oldfiles deletes any unused files from sources/packages (generally previous versions left over after a package upgrade while using the development version of the FWL build scripts).

    Running this stage with the argument "--extract-all" will extract all the tarballs at once, to populate the cache used by setupfor. (This is primarily used to avoid race conditions when building multiple architectures in parallel with build-all-targets.sh. This is an esoteric internal detail you can safely ignore if you're not doing that.)

  • host-tools.sh - Set up a known environment on the host

    This script does not take any arguments. In theory this is an optional step and may be omitted, as the binaries produced by this script are not included in any of the output tarballs.

    This script populates the build/host directory with host versions of the busybox and toybox command line tools (the same ones that the target's eventual root filesystem will contain), plus symlinks to the host's compiler toolchain (I.E. compiler, linker, assembler, and so on).

    This allows the calling scripts to trim the $PATH to point to just this one directory, which serves several purposes:

    • Isolation - This prevents the ./configure stages of the source packages from finding and including unexpected dependencies on random things installed on the host.

    • Portability - Using a known set of command line utilities insulates the build from variations in the host's Linux distribution (such as Ubuntu's /bin/echo lacking support for the -e option).

    • Testing - It ensures the resulting system can rebuild itself under itself, since the initial build was done with the same tools we install into the target's root filesystem. The initial build acts as a smoke test of most of the packages used to create the resulting system, and restricting $PATH ensures that no necessary commands are missing. (Variation can still show up between x86/arm/powerpc versions, of course.)

      It also moves most failures to the beginning. If anything is going to break, it's usually the host-tools build. After that runs, we're mostly in a known and tested state.

    • Dependency tracking - If we don't explicitly know everything we need to build ourselves in the first place, we can't be sure we added it to the final system to get a self-hosting environment.

    A secondary purpose of host-tools.sh is to build packages (such as distcc and genext2fs) which might not be installed on the host system. Some of these aren't needed by build.sh but may be used by later run-emulator.sh invocations (such as the ./run-build-image.sh script).

    Note that this script does not attempt to build qemu, due to the unreasonable requirement of installing gcc 3.x on the host. The FWL build scripts do not use qemu (except as an optional test at the end of cross-compiler.sh which is skipped if qemu is not available). You will need to install qemu (or another emulator, or find real hardware) to use the resulting system images, but they should build just fine without it.

    This stage is optional. If the build/host directory doesn't exist (or doesn't contain a "busybox" executable), the build will use the host's original $PATH.

  • cross-compiler.sh - Build a cross compiler for the target, for use by mini-native.sh and the distcc accelerator.

    In order to build binaries for the target, the build must first create a cross compiler to build those target binaries with. This script creates that cross compiler. If you already have a cross compiler, you can supply it here (the easy way is to create a build/cross-compiler-$TARGET/bin directory and put "$TARGET-gcc" style symlinks in it) and skip this step.
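
    A minimal sketch of doing that (the armv4l target and the existing toolchain's name prefix are illustrative):

    mkdir -p build/cross-compiler-armv4l/bin
    cd build/cross-compiler-armv4l/bin
    for TOOL in gcc g++ ld as nm ar strip
    do
      ln -s "$(which armv4l-existing-linux-$TOOL)" armv4l-$TOOL
    done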

    This script takes one argument: the architecture to build for. It produces a cross compiler that runs on the host system and produces binaries that run on the target system. This cross compiler is created using the source packages binutils, gcc, uClibc, the Linux kernel headers, and a compiler wrapper to make the compiler relocatable.

    The reason for the compiler wrapper is that by default, gcc hardwires lots of absolute paths into itself, and thus only runs properly in the directory it was built in. The compiler wrapper rewrites its command line to prevent gcc from using its built-in (broken) path logic.

    The build requires a cross-compiler even if the host and target system use the same processor because the host and target may use different C libraries. If the host has glibc and the target uses uClibc, then the (dynamically linked) target binaries the compiler produces won't run on the host. (Target binaries that won't run on the host are what distinguishes cross-compiling from native compiling. Different processors are just one reason for it: glibc vs uClibc is another, ELF vs binflat or a.out executable format is a third...)

    This script produces a working cross compiler in the build directory, and saves a tarball of it as "cross-compiler-$TARGET.tar.bz2" for use outside the build system. This cross compiler is fully relocatable (because of the compiler wrapper), so any normal user can extract it into their home directory, add cross-compiler-$TARGET/bin to their $PATH, and run $TARGET-gcc to create target binaries.

  • mini-native.sh - Use the cross compiler to create a minimal native build environment for the target platform.

    This script takes one argument: the architecture to build for.

    This script uses the cross compiler found at build/cross-compiler-$ARCH/bin (with $ARCH- prefixes) to build a root filesystem for the target, as well as a target Linux kernel configured for use with qemu. A usable cross compiler is left in the build directory by the cross-compiler.sh script, or you can install your own.

    The basic root filesystem consists of busybox and uClibc. If the configuration variable NATIVE_TOOLCHAIN is set (this is enabled by default), this script adds a native compiler to the target, consisting of linux kernel headers, gcc, binutils, make, and bash. It also adds distcc to potentially distribute work to cross compilers living outside the emulator. This provides a minimal native development environment, which may be expanded by building and installing more packages under the existing root filesystem.

  • package-mini-native.sh - Create an ext2 filesystem image of the native root filesystem.

    This script takes one argument: the architecture to package.

    This uses genext2fs to create an ext2 filesystem image from the build/mini-native-$ARCH directory left by running mini-native.sh, and creates a system-image tarball containing the result. It first compiles genext2fs and adds it to build/host if the host system hasn't already got a copy.

    This script also generates a run-emulator.sh script to call the appropriate emulator, using the architecture's configuration information.

  • run-from-build.sh - Runs a system image you compiled from source.

    Calls run-emulator.sh in the appropriate build/system-image-$TARGET directory, with a 2 gigabyte hdb.img for /home and distcc connected to build/cross-compiler-$TARGET. Between runs it calls e2fsck on the system image's root filesystem.

    This is not technically a build stage, as it isn't called from build.sh, but it's offered as a convenience for users. It uses the existing cross-compiler and system-image directories in build/ and doesn't mess with the tarballs that were created from them.
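
    For example:

    ./run-from-build.sh armv4l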

  • The following generally aren't called directly, but are used by the rest of the build.

    • config - User definable configuration variables

      This file contains environment variables which you can set to customize the FWL build process. Setting any of these variables to a nonblank value changes the build.

      • NATIVE_TOOLCHAIN - This tells mini-native.sh to include a compiler toolchain (binutils, gcc, bash, make, and distcc). Without this, it builds a small uClibc/busybox system. This is the only variable enabled by default in config.

        Setting NATIVE_TOOLCHAIN="headers" will leave the libc and kernel header files in the appropriate include directory, for use by some other native compiler. Building and installing additional tools (such as "make", or a compiler such as pcc, llvm/clang, or tinycc) then becomes your problem.

      • NATIVE_TOOLSDIR - This tells mini-native.sh to change the directory layout to conform to a Linux From Scratch "intermediate" system, with everything under a /tools directory. (This provides a cleaner environment for creating a new completely customized system at the root level.)

      • RECORD_COMMANDS - Records all command lines used to build each package.

        This inserts a logging wrapper in the $PATH which logs the command lines used by the build. Afterwards, the script "sources/toys/report_recorded_commands.sh" can generate a big report on which commands were used to build each package for each architecture. To get a single list of the command names used by everything, do:

        echo $(find build -name "cmdlines.*" | xargs awk '{print $1}' | sort -u)

        (Note: this will miss things which call executables via absolute paths instead of checking $PATH, but the only interesting ones so far are the #!/bin/bash type lines at the start of shell scripts.)

      • CROSS_BUILD_STATIC - Tells cross-compiler.sh to statically link all binaries in the cross compiler toolchain it creates.

        The prebuilt binary versions in the download directory are statically linked against uClibc, by building a mini-native environment and re-running the build under that with CROSS_BUILD_STATIC=1. The sources/build-all-targets.sh script can do this automatically with the "--use-static-host $TARGET" argument. (Requires QEMU installed.)

      • PREFERRED_MIRROR - Tells download.sh to try to download packages from this URL first, before falling back to the normal mirror list. For example, "PREFERRED_MIRROR=http://landley.net/aboriginal/mirror".

      • USE_UNSTABLE - Lists packages to build alternate "unstable" versions for.

        The value of this config entry is a comma-separated list of packages.

        Many packages in download.sh have an UNSTABLE= tag providing a URL to an alternate version. Generally these link to newer versions, often unstable development versions, for testing purposes.

        In addition to changing the download location, using alternate versions of packages prepends an "alt-" in front of the package name in various places (such as the patches from the sources/patches directory and the configuration files used from sources/targets). It changes the behavior of the "download" and "setupfor" shell functions.

      • USE_COLOR - Color code the various build stages.

        Enabling this provides a quick visual indicator of which build stage is in progress.

        This is disabled by default both because its utility is a matter of taste, and because finding a half-dozen different colors that work on both white and black backgrounds is hard, and gnome-terminal can't produce an actual black background. (In its default palette, "black" is a fairly light grey.)
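
      Note that (as described under "Shared infrastructure" below) any of these variables can also be supplied as environment variables for a single run, for example (target and values illustrative):

      USE_UNSTABLE=busybox USE_COLOR=1 ./build.sh armv4l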

    • sources/build-all-targets.sh - Build all supported targets at once.

      This performs a similar function to build.sh, but for all targets instead of just one. It can build targets in parallel with the --fork option, logs the output of the various build stages, and generates a README.

      This script populates a second output directory, buildall, with its output. This is probably only of interest to FWL's developers.


    How is Firmware Linux implemented?

    Directory layout

    The top level directory of FWL contains the user interface of FWL (scripts the user is expected to call or edit), and the "sources" directory containing code that isn't expected to be directly called by end users.

    Important directories under sources include:

    • sources/targets - Configuration information for each target.

      Adding a new target to FWL involves creating a new directory under here (which determines the name of the target), adding two miniconfig files (for Linux and uClibc), and adding a "details" file defining environment variables.

    • sources/packages - Source tarballs for the packages to be built. This directory starts empty, and is populated by download.sh.

    • sources/native - This directory hierarchy is copied into the target verbatim (under /usr). It contains the boot script and some sample source code.

    • sources/toys - Build utilities, mostly original code written for FWL. Not necessarily specific to this project, but can't be downloaded from somewhere else.

    Output files from running the build scripts, and all temporary files, go in the "build" subdirectory. This entire directory can be deleted between builds.

    • build/sources - cached copies of the extracted source tarballs, so setupfor can "cp -lfR" instead of having to re-extract and re-patch the source each time.

    • build/host - Output of host-tools.sh. If this directory exists and contains a "busybox" executable, include.sh will set the $PATH to point only to this directory.

    • build/temp-$TARGET - Temporary directory for building each target. Feel free to delete this between runs, it should be empty unless a build broke, in which case it has the source tree that failed to build. (The temporary directory for host-tools.sh is "host-temp", in case someday somebody creates a $TARGET named "host".)

    • build/cross-compiler-$TARGET - Output directory for cross-compiler.sh. The corresponding cross-compiler tarball is just an archive of this directory. Used by mini-native.sh.

    • build/mini-native-$TARGET - Output directory for mini-native.sh. The corresponding mini-native tarball is just an archive of this directory. Used by package-mini-native.sh.

    • build/system-image-$TARGET - Output directory for package-mini-native.sh. The corresponding system-image tarball is just an archive of this directory. Used by run-from-build.sh.

    Shared infrastructure

    The top level file for the behind-the-scenes plumbing is sources/include.sh. This script is not run directly, but is instead included from the other scripts. It does a bunch of things:

  • It parses the "config" file at the top directory, reading in the user defined configuration variables. (You can also supply these as environment variables, if you want to specify them for just one run.)

  • It sets several other environment variables, specifying things like the $SOURCE and $BUILD directories, and detecting the number of $CPUS. Many of these are set using the export_if_blank function, which keeps any existing value of the variable, allowing them to be externally overridden.

  • It adjusts the $PATH. If build/host exists and contains a busybox executable (meaning host-tools.sh did its thing already), $PATH is set to just that directory. If build/wrapdir exists, that's used instead for command line logging via sources/more/record-commands.sh.

  • If host-tools.sh ran after record-commands.sh, it sets the $PATH to point to the logging wrapper directory. ($WRAPPY_LOGPATH specifies where the logging wrapper should write its log file, and $WRAPPY_REALPATH says where to find the actual commands the logging wrapper hands off to.)

  • It also reads sources/functions.sh, which provides shell functions used by the rest of the build, including:

      • read_arch_dir - parses the appropriate sources/targets directory to read architecture information and set lots of environment variables. It takes one argument, the architecture name to build. If run with no arguments, it outputs all available architectures by listing the subdirectories under sources/targets.

        All the build stages except download.sh and host-tools.sh call read_arch_dir with their first command line argument.

      • download - used by download.sh. Calls wget if necessary, uses sha1sum to verify the files. Saves the results in the directory pointed to by $SRCDIR (set to "packages" by sources/include.sh). Treat as a fancy call to "wget".

      • dienow - abort the current script, exiting with an error message. (Can even exit from nested shell functions.) Treat as a fancy "exit".

      • setupfor - extract a source package (named in the first argument) into a temporary directory (under $WORK), and change the current directory to there. Treat as a fancy "tar -xvjf" followed by cd.

        Source code is cached, meaning each package's source tarball is only actually extracted and patched once (into build/sources) and the temporary copies are directories full of hard links (or optionally symlinks) to the cached source.

      • cleanup - delete temporary copy of source code after build. Treat as a fancy "rm -rf" (except that it remembers the working directory from the last setupfor call, rather than requiring it to be specified on the command line).

        If the exit code of the last command was nonzero, it calls dienow instead of deleting the source code that didn't build properly, to preserve the evidence of what went wrong.

      • maybe_fork - If the environment variable FORK is set, run the first argument in the background. Otherwise, run it in the foreground. (This is used with the wait command, which blocks until all background jobs belonging to this shell have finished. If this shell has no background processes wait returns immediately.)

      Most of what these shell functions do is optional. Most of it's there to speed up and simplify the rest of the build, and perform error checking. None of it should be very important to understanding how to build or install any of the actual packages. It just abstracts away repetitive, uninteresting bits.
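
      A hypothetical build script fragment using these functions (a sketch, not an actual excerpt from the build) might look like:

      setupfor busybox
      make defconfig &&
      make
      cleanup

      Here setupfor extracts the (cached) busybox source and changes into the temporary copy, and cleanup either deletes that copy or, if the make failed, calls dienow to preserve the evidence.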

      Downloading source code

      The FWL source distribution does not include any third party source tarballs. Instead, these are downloaded by running download.sh, which calls the shell function download, which calls wget as necessary. The download.sh script contains a series of calls to the download function.

      Only calls to the shell function "download" contain version information for packages. The scripts that actually build the packages do not; they are as version agnostic as possible.

      The following environment variables control the behavior of download.sh, and may be set before calling it:

      • EXTRACT_ALL - prepopulate the source cache.

        If set, each call to the download function will extract and patch the corresponding source tarball (into the sources/packages cache directory) immediately after the download completes, instead of waiting for setupfor to do it. (See "Extracting source code", below.)

      • FORK - Calls to download are usually wrapped in maybe_fork, so if this is set they run in parallel.

      • PREFERRED_MIRROR - This contains the URL of a mirror site to be checked _before_ downloading from the actual $URL specified in download.sh.

        This allows download.sh to fetch some or all of its packages from a local mirror of the files, instead of going out to the net. Any files not found in this mirror will be fetched from the standard URL, and the fallback mirrors as necessary.

        (Note: inside qemu the special address 10.0.2.2 passes through connections to 127.0.0.1 on the host, so if you run a web server on your host's loopback address you can pass source code into the emulator without going out to an external network.)

      The following environment variables control each call to the download function, and are set before each call:

      • URL - The URL from which to download this source package into the sources/packages directory.

        In addition to specifying a web location, this URL determines the name of the source package to fetch. If this source tarball cannot be fetched from this location, the download function tries to download the file from a series of fallback mirrors (stored in the variable MIRROR_LIST, set in include.sh). The primary mirror is http://landley.net/aboriginal/mirror which should have every source tarball used by the build.

        The package name is the filename at the end of $URL minus any version information and file type extensions, so "bash-2.04b.tar.bz2" becomes "bash". The shell function "basename" uses a rather complicated regex to extract the package name from a URL. This versionless package name is used by things like setupfor, allowing the build scripts to mostly ignore the versions of the packages they build.

      • SHA1 - The sha1sum of the source tarball to fetch.

        Used to confirm that the downloaded file is correct. If not, it tries the next mirror in the list, or calls dienow if out of mirrors.

        If this value is blank, the sha1sum calculated from the file will be displayed but not verified. This means any file will be accepted as correct as long as it exists with the right name, but the build won't be able to detect corrupted or truncated files.

        When updating to a new version of a package, a common trick is to update the URL and blank the SHA1, run ./download.sh to fetch the new file, cut and paste the SHA1 value displayed after the download to set the SHA1 variable, and then re-run ./download.sh to confirm they match.
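
        In other words, something like:

        ./download.sh     (fetches the new tarball and displays its sha1sum)
        ./download.sh     (after pasting the displayed value into SHA1=, verifies the file)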

      • UNSTABLE - URL to an alternate version of the file, for testing purposes.

        This version is only downloaded when USE_UNSTABLE contains the name of this package in its package list. It doesn't fall back to check the mirror list, and is not affected by PREFERRED_MIRROR.

        Unstable packages are saved as a tarball called "alt-$PACKAGE-0" plus the file type extension. (Thus the name to save is based on the filename in the normal $URL rather than on what the $UNSTABLE address points to, so even if your UNSTABLE address ends with "snapshot.tgz" or "tip.tar.bz2", it will still wind up somewhere the rest of the build can find it.)

      • RENAME - regex to rename a downloaded file.

        This is a "sed -r" extended regular expression with which to rename a file. The "setupfor" function expects filenames in "$PACKAGE-$VERSION.$TYPE" format. If a source package at $URL isn't named that way (such as squashfs not having a dash between the package name and version), you can adjust it with this.

      At the end of download.sh is a call to the shell function cleanup_oldfiles, which deletes unused files. include.sh snapshots the current time in the variable $START_TIME, and download calls "touch" to update the timestamp on each file whose sha1sum it verifies. Then cleanup_oldfiles deletes every file from sources/packages with a date older than $START_TIME. (It does not recurse into subdirectories.)

      Note that download updates the timestamp on stable packages when downloading corresponding unstable packages (and vice versa), so cleanup_oldfiles won't delete them. In this special case they're not considered "unused files", but it won't verify their integrity or fetch them if they're not already there.

      Extracting source code

      The function "setupfor" extracts sources/packages/$PACKAGENAME-* tarballs. (If $PACKAGENAME is found in the comma separated $USE_UNSTABLE list, the build adds an "alt-" prefix to the package name.) This populates a corresponding directory under build/sources, and applies all the sources/patches/$PACKAGENAME-*.patch files in alphabetical order. (So if a package has multiple patches that need to be applied in a specific order, name them something like "bash-001-dothingy.patch", "bash-002-next.patch" to control this.)

      The trailing "-" before filename wildcards prevents collisions between things like "uClibc" and "uClibc++". Packages are allowed to contain dashes (such as gcc-core), but cannot have a digit immediately after the dash.

      FWL implements source caching. The first call to setupfor extracts the package into build/sources, and then creates a directory of hard links in the current target's build/temp-$TARGET directory with cp -lfR. Later setupfor calls just create the directory of hard links from the existing source tree. (This is a hybrid approach between building "out of tree" and building in-tree.)

      The ./download.sh --extract-all option prepopulates the source cache, extracting and patching each source tarball. This is useful for scripts such as sources/build-all-targets.sh which perform multiple builds in parallel.

      The reason for keeping extracted source tarballs around is that extracting and patching tarballs is a fairly expensive operation, which uses a significant amount of disk space and doesn't parallelize well. (It tends to be disk limited as much as CPU limited, so trying for more parallelism wouldn't necessarily help.) In addition, the same packages are repeatedly extracted: the cross-compiler and mini-native stages use many of the same packages, and some packages (such as the Linux kernel) are extracted and removed repeatedly to grab things like kernel headers separately from actually building a bootable kernel. (Also, different architectures build the exact same packages, with the same set of patches. Even patches to fix a bug on a single architecture are applied for all architectures; if this causes a problem, it's not a mergeable patch capable of going upstream.)

      Building host tools

      The host-tools.sh script sets up the host environment. Usually the host environment is already in a usable state, but this script explicitly enumerates exactly what we need to build, and provides our own (known) versions of everything except the host compiler toolchain in the directory build/host. Once we've finished, the $PATH can be set to just that directory.

      The build calls seven commands from the host compiler toolchain: ar, as, nm, cc, gcc, make, and ld. All those have to be in the $PATH, so host-tools.sh creates symlinks to those from the original $PATH.

      Next host-tools.sh builds toybox for the "patch" command, because busybox patch can't handle offsets and is thus extremely brittle in the face of new package versions. (This is different from "fuzz factor", which removes context lines to find a place to insert a patch, and tends to break a lot.) If USE_TOYBOX is enabled, a defconfig toybox is used and all commands are installed.

      Next host-tools builds a "defconfig" busybox and installs it into build/host. This provides all the other commands the build needs.

      What's the minimum the build actually needs?

      When building a new system, environmental dependencies are a big issue. Figuring out what package needs what, and what order to build things in, is the hardest part of putting together a system.

      Running the build without build/host calls lots of extra commands, including perl, pod2man, flex, bison, info, m4, and so on. This is because the ./configure stages of the various packages detect optional functionality, and use it. One big reason to limit the build environment is to consistently produce the same output files, no matter what's installed on the host.

      The minimal list of commands needed to build a working system image is 1) a working toolchain (ar, as, nm, cc, gcc, make, ld), 2) /bin/bash (and a symlink /bin/sh pointing to it), 3) the following command line utilities in the $PATH:

      awk basename bzip2 cat chmod chown cmp cp cut date dd diff dirname echo egrep env expr find grep gzip hostname id install ln ls mkdir mktemp mv od patch pwd readlink rm rmdir sed sha1sum sleep sort tail tar touch tr true uname uniq wc which whoami xargs yes

      These commands are supplied by current versions of busybox.
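
      A quick hypothetical sanity check of what your host provides (extend the list with the remaining commands above):

      for CMD in awk basename bzip2 cat chmod chown cmp cp cut date dd
      do
        which "$CMD" > /dev/null || echo "missing: $CMD"
      done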

      Bash has been the standard Linux shell since before the 0.0.1 release in 1991, and is installed by default on all Linux systems. (Ubuntu broke its /bin/sh symlink to point to the Defective Annoying SHell, so many scripts call #!/bin/bash explicitly now rather than relying on a broken symlink.) We can't stop the build from relying on the host version of this tool; editing $PATH has no effect on the #!/bin/bash lines of shell scripts.

      The minimal set of commands necessary to build a system image was determined experimentally, by running a build with $RECORD_COMMANDS and then removing commands from the list and checking the effect this had on the build. (Note that the minimal set varies slightly from target to target.)

      $RECORD_COMMANDS tells host-tools.sh to set up a logging wrapper that intercepts each command line in the build and writes it to a log file, so you can see what the build actually uses. (Note that when host-tools.sh sets up build/wrapper, it doesn't set up build/host, so the build still uses the host system's original command line utilities instead of building busybox versions. If you'd like to record the build using build/host commands, run host-tools.sh without $RECORD_COMMANDS set and then run it again with $RECORD_COMMANDS to set up the logging wrapper pointing to the busybox tools.)

      The way $RECORD_COMMANDS works is by building a logging wrapper (sources/toys/wrappy.c) and populating a directory (build/wrapper) with symlinks to that logging wrapper for each command name in $PATH. When later build stages run commands, the wrapper appends the command line to the log file (specified in the environment variable $WRAPPY_LOGPATH, host-tools.sh sets this to "$BUILD/cmdlines.$STAGE_NAME.$PACKAGE_NAME"), recording each command run. The logging wrapper then searches $WRAPPY_REALPATH to find the actual command to hand its command line off to.

      Building a cross compiler

      We cross compile so you don't have to. The point of this project is to make cross compiling go away, but you need to do some to get past it. So let's get it over with.

      The cross-compiler.sh script builds a cross compiler. Its output goes into build/cross-compiler-$TARGET directory, which is deleted at the start of the build if it already exists, so re-running this script always does a clean build.

      Creating a cross compiler is a five step process:

      • binutils - Build assembler and linker for the target platform.

        This package has no interesting dependencies, and thus can be the first thing you build for a target.

      • gcc - Build C/C++ compiler for the target platform.

        This package needs binutils, and must be built after that. It does not need a C library, so can be built before that.

        The mini-native build doesn't require C++ support, but the build adds gcc-g++ to the basic gcc-core and enables C++ support so the distcc accelerator trick can speed up C++ builds.

        We create an "xgcc" symlink pointing to the host compiler to force gcc not to attempt to rebuild itself with itself. (It needed to be able to build xgcc with the host compiler, but doesn't trust the host compiler to build an actual binary to deploy. Note that this xgcc builds _host_ binaries, not target binaries.)

      • compiler wrapper - Install a wrapper around gcc to enforce sane path logic.

        This builds a wrapper for gcc from "sources/toys/gcc-uClibc.c". This compiler wrapper rewrites the gcc command line to start with --nostdinc and --nostdlib, and then explicitly adds the correct header and library search paths, and when linking adds the correct object files and libraries.

        It needs to do this because gcc's path logic has been consistently broken for about two decades now. (See why cross compiling sucks for more details.)

        The compiler wrapper hands off the new command line to $ARCH-rawgcc, so the real $ARCH-gcc gets renamed to that and the wrapper takes the old name.

        To allow the compiler wrapper to easily find the headers and libraries, the build moves them to known locations. The system headers and libraries go into "include" and "lib" directories at the same level as the "bin" directory containing the wrapper script, and gcc's own headers and libraries go into "gcc/include" and "gcc/lib". The wrapper then finds itself (using argv[0] and if necessary searching the $PATH it inherits), and backs up one level to find the headers and libraries it needs to add to the gcc path.
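
        The resulting layout looks roughly like this (using an illustrative armv4l prefix):

        cross-compiler-armv4l/
          bin/armv4l-gcc          (the compiler wrapper)
          bin/armv4l-rawgcc       (the renamed real gcc)
          include/                (system headers)
          lib/                    (system libraries and startup files)
          gcc/include/, gcc/lib/  (gcc's own internal headers and libraries)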

      • linux - kernel headers.

        This package doesn't have any prerequisites, but C libraries need it to build themselves. (Kernel headers define the system call API for the Linux kernel.)

      • uClibc - uClibc (micro C library).

        This package is target code that needs to be built with a cross compiler (gcc and binutils), and also needs kernel headers. It requires all three of the other packages, and thus must be built last.

        Note that we only build a standard C library. We don't build/install a standard C++ library (uClibc++), because distcc doesn't need headers or libraries in the cross compiler. Thus the cross compiler has enough C++ support to be used from the native environment via distcc, but not enough to cross compile C++ code on its own.

        The compiler wrapper actually links against "libc.so", which is a linker script pointing to libuClibc.so.0. We patch uClibc so it doesn't put absolute paths into its libc.so; without them the linker searches the supplied library search paths, and thus the compiler may be installed in an arbitrary location.

      Afterwards the build strips some of the binaries, tars up the result, and performs some quick sanity tests (building dynamic and static versions of hello world). If the target configuration lists a version of QEMU to test individual binaries under on the host, it runs the static version to make sure it outputs "Hello world".


      Building a minimal native development environment for the target system

      The mini-native.sh script uses the cross compiler from the previous step to build a kernel and root filesystem for the target. The resulting system should boot and run under an emulator, or on real target hardware.

      If you really want to learn how to cross compile a target system, this is the script you want to read, and possibly append your own packages to. That said: please don't, and here's why:

      Because cross-compiling is persnickety and difficult, we do as little of it as possible. This script should perform all the cross compiling anyone ever needs to do. It uses the cross-compiler to generate the simplest possible native build environment for the target which is capable of rebuilding itself under itself.

      Anything else that needs to be built for the target can then be built natively, by running this kernel and root filesystem under an emulator and building new packages there, bootstrapping up to a full system if necessary. The emulator we use for this is QEMU. Producing a minimal build environment powerful enough to boot and compile a complete Linux system requires seven packages: the Linux kernel, binutils, gcc, uClibc, BusyBox, make, and bash. We build a few more than that, but those are optional extras.

      This root filesystem can also be packaged using the Linux From Scratch /tools directory approach, staying out of the way so the minimal build environment doesn't get mixed into the final system, by setting the $NATIVE_TOOLSDIR environment variable. If you don't know why you'd want to do that, you probably don't want to.

      In either configuration, the main target directory the build installs files into is held in the environment variable "$TOOLS". If $NATIVE_TOOLSDIR is set this will be "/tools" in the new root filesystem, otherwise it'll be "/usr".

      The steps the script goes through are:

      • directory setup - Create empty directories for the basic filesystem layout.

        If $NATIVE_TOOLSDIR is set, the build script creates a Linux From Scratch style intermediate system by moving the filesystem layout under /tools, which means skipping the top level directories and installing most files into /tools instead of /usr. This also sets the variable $UCLIBC_DYNAMIC_LINKER to tell the compiler wrapper to create binaries that depend on shared libraries in /tools rather than the default "/lib/ld-uClibc.so.0". (With the /tools layout, the qemu-setup.sh script can recreate most of the top level directories at runtime, often as symlinks into /tools.)

      • Copy sources/native - The most important thing here is the qemu-setup.sh script, but there's also example source code in the src directory.

      • Linux kernel - Build a kernel that can boot under QEMU.

        We need kernel headers to build uClibc, so install those while we've got the kernel tarball extracted. (We could grab these files directly from the cross compiler, but we rebuild from source to keep the layers cleanly separated.)

        The kernel build uses sources/targets/$ARCH/miniconfig-linux to configure the kernel for the appropriate QEMU target, and the $KERNEL_PATH variable to figure out which kernel image file to use.

      • uClibc - Build standard C library.

        The binaries in the target system are dynamically linked, so we need shared libraries installed. Again, we could grab these files out of the cross compiler, but we rebuild from source to keep the layers cleanly separated.

        We unconditionally install the development files (headers and static libraries), and delete them later if $NATIVE_TOOLCHAIN isn't set.

        Right after installing the C library, we export the environment variable $WRAPPER_TOPDIR, which tells the compiler wrapper to link against the new headers and shared libraries we've installed into the new root filesystem, rather than the ones in the cross compiler's include and lib directories.

      • toybox - Build optional command line utilities.

        This isn't strictly required. If $USE_TOYBOX isn't set, this only installs symlinks for the "patch" and "oneit" commands. (The oneit command is similar to init=/bin/sh, except it allows terminal control to work and shuts the system down cleanly on exit. It's used by qemu-setup.sh to provide a more forgiving command line.)

        If $USE_TOYBOX is set, this installs toybox versions of many commands instead of the busybox versions. These tend to be simpler, more straightforward implementations than the busybox versions. (Note: your author is biased here.)

      • busybox - Build busybox command line utilities.

        This provides the bulk of the command line utilities for the new system.

        Once upon a time, "make defconfig" provided the largest sane configuration in busybox, enabling every working command and feature that didn't have undesirable side effects (such as debugging options) or require special configuration to use (such as SELINUX). Unfortunately, over time this goal was lost and make defconfig bit-rotted into a fairly random configuration.

        To recapture the original "largest sane configuration" goal, the build starts with "make allyesconfig" and applies sources/trimconfig-busybox to remove features that would otherwise cause problems. The trimconfig file has comments in it if you're wondering why specific features are disabled.

      • Check $NATIVE_TOOLCHAIN - Build a native development toolchain only if $NATIVE_TOOLCHAIN is set.

          $NATIVE_TOOLCHAIN is the only configuration option set by default. You can disable it in "config" if you want to build a skeletal target system and add your own software to it by hand.

        If it is enabled, the following happens:

        • binutils - Build a native assembler and linker.

        • gcc and libsupc++ - Build a native C and C++ compiler.

          This process is still a bit tangled. The fundamental reason for this is that the gcc build process is pathologically misdesigned. (See what the hell is wrong with gcc for a long digression into the details.)

          The secondary reason is that libstdc++ is built into gcc, which makes as much sense as building glibc into gcc. GCC's C++ support is not cleanly separated into layers, so replacing their built-in libstdc++ with the much smaller uClibc++ requires performing additional surgery on the gcc build process to get it to stop being actively stupid. (For simplicity we punted on this while building the cross compiler, but now we need to make it work.)

          So after beating gcc over the head with almost a dozen different environment variables and a bunch of ./configure options to get it to cross compile like a normal program, we then have to chdir into the libsupc++ subdirectory to build a static library which uClibc++ needs in order to interface properly with the compiler. (It defines things like stack unwinding and the current exception model, which the C++ library needs to know but which gcc doesn't cleanly export for external use.) Logically this step belongs with the uClibc++ build, but we have to export this information from the gcc source directory because that's where it lives.

          We also clean up after a bug where gcc uses multilib directories (such as /lib64) on some systems even when we explicitly told it we didn't want multilib. (This package isn't very good at taking "no" for an answer.) And we create a "cc" symlink to gcc, because some packages use that as their compiler and SUSv3 says we should have one.

        • compiler wrapper - Wrap gcc, to control path logic.

          The native build still installs the compiler wrapper (from sources/toys/gcc-uClibc.c) to rewrite gcc's command line arguments and bypass its built-in path logic. In theory native compiling is less tricky and the final location we're installing the compiler at is known at compile time, so we could just patch the compiler's source code to check the right paths. But going there rapidly turns into a nightmare of tangled historical scar tissue, and breaks in new and exciting ways with each new gcc release. The only way to get gcc to use sane paths is to take path decisions out of its hands entirely.

          This does the same header/library shuffling and symlink creation as the cross compiler did, but without a prefix on the symlink names this time.

        • uClibc++ - Build micro standard c++ library.

          C++ has its own standard library, and its own standard header files, without which the overloaded bit shift operators can't even perform I/O. The package uClibc++ provides much smaller and simpler versions of these than the libstdc++-v3 metastasized through gcc-g++.

          This package mostly builds out of the box, assuming the cross compiler has minimal c++ support and you have the right pliers to extract libsupc++ from the gcc build. We start with the defconfig, switch off TLS and LONG_DOUBLE support that uClibc doesn't currently provide, and blank the RUNTIME_PREFIX so it installs where we tell it to. Then we shuffle the libraries around so the compiler wrapper can find them and make symlinks from the generic "libstdc++.{so,a}" names to the corresponding libuClibc++ files.

        • make - Build make

          A toolchain doesn't do you much good without the "make" command. Fairly straightforward to build.

        • bash - The standard Linux command shell.

          Bash has been the standard Linux command shell since 1991, and lots of scripts explicitly say #!/bin/bash. In addition, bash extensions like curly brace filename expansion are in common use.

          Someday, busybox might provide a decent replacement for bash, but since busybox has four different shells (lash, hush, msh, ash) which don't share a lot of code, development is fragmented and proceeds slowly. A bash replacement will need to be callable as "#!/bin/bash" since debian pointed #!/bin/sh at the Defective Annoying SHell and greatly discouraged use of that symlink.

          We intentionally build an older version of bash (2.04b) which is sufficient for our purposes, and much smaller and simpler than the current bash 3.x monsters. We have to hardwire a few ./configure test results because this version doesn't like cross compiling, and we do so by supplying a config.cache file with the appropriate entries. It also doesn't work if you try to build it in parallel, so we don't supply -j.

        • distcc - command to distribute compiles across a network cluster.

          We install this for the distcc accelerator trick. It's entirely optional.

          We create a $TOOLS/distcc directory full of symlinks to distcc with the names of gcc, cc, g++, and c++. Inserting that directory at the start of the $PATH makes the build use distcc in place of the normal native compiler.
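
          For example, a native build inside the emulator can opt in with something like:

          PATH="$TOOLS/distcc:$PATH" make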

        That's everything in the $NATIVE_TOOLCHAIN. The rest is minor cleanup and packaging.

      • Build static and dynamic hello world binaries

        These are installed into $TOOLS/bin as hello-dynamic and hello-static. These are debugging tools: If you can't boot the system to a shell prompt, try running hello-static as init to see if it runs and gives you output. If that works try hello-dynamic to see if shared libraries are loading.

      • Strip some binaries to save space.

      • Create the mini-native tarball - we're done.

      In theory, you can add more packages to mini-native.sh, or run another similar script to use the cross compiler to produce output into the mini-native directory. In practice, this is not recommended. Cross compiling is an endless sinkhole of frustration, and the easiest way to deal with it is not to go there.

      Packaging up a system image to run under emulation

      The package-mini-native.sh script packages a system image for use by QEMU. Its output goes into build/system-image-$TARGET directory, which is deleted at the start of the build if it already exists, so re-running this script always does a clean build.

      The steps here are:

      • use genext2fs to package the output of mini-native.sh as a 64 megabyte ext2 image.
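
        For example (genext2fs counts size in 1k blocks, so 64 megabytes is 65536 blocks; the image file name is illustrative):

        genext2fs -b 65536 -d build/mini-native-$ARCH \
          build/system-image-$ARCH/image-$ARCH.ext2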

      • create run-emulator.sh by appending an emulator invocation command line to a copy of sources/toys/run-emulator.sh.

        This calls a shell function "emulator_command" from the target architecture definition, passing in the name of the ext2 image containing the root filesystem and the kernel image to boot. A shell function "qemu_defaults" is defined to let emulator_command grab lots of common boilerplate, such as kernel command line options. (In theory run-emulator.sh is free to use a different emulator, or even output a command to send the files to real hardware through a network connection or jtag or some such.)

        The path for some of the run-emulator.sh kernel command line arguments is also adjusted based on $NATIVE_TOOLSDIR.

      • For the powerpc architecture, ppc_rom.bin is copied from sources/toys. (This architecture needs a custom boot rom for qemu to be able to boot a bzImage via -kernel.)

      • Tar up the result

      Running on real hardware

      To run a system on real hardware (not just under an emulator), you need to do several things. Dealing with myriad individual devices is beyond the scope of this project, but the general theory is:

      • Figure out how to flash your device (often a jtag with openocd)

      • Configure and install a bootloader (uboot, apex, etc.)

      • Build and install a kernel targeted to your hardware (in the kernel source, see arch/$ARCH/configs for default .config files for various boards)

      • Package and install the root filesystem appropriately for your system (ext2, initramfs, jffs2).

      Speeding up emulated builds (the distcc accelerator trick)

      Native compiling under emulation is reliable but slow: an emulated processor is many times slower than the host it runs on. The distcc accelerator trick recovers most of that speed by running distcc inside the emulator as the native compiler (via the $TOOLS/distcc symlink directory described earlier), distributing the actual compilation over the emulated network to the cross compiler running on the host. Preprocessing and linking still happen natively inside the emulator, so ./configure still asks its questions of the system the binaries will run on, but the heavy lifting moves to the fast host processor.


      Why do things this way

      UNDER DEVELOPMENT

      This entire section is a dumping ground for historical information. It's incomplete, much of it is out of date, and it hasn't been integrated into a coherent whole yet. What is here is in no particular order.

      Why cross compiling sucks

      Cross compiling is fast but unreliable. Most builds go "./configure; make; make install", but the entire ./configure stage is designed wrong for cross compiling: it asks questions about the host system it's building on, and thinks the answers apply to the target binary it's creating.

      Build processes often create temporary binaries which run during the build (to generate header files, parse configuration information ala kconfig, various "makedep" style dependency generators...). These builds need two compilers, one for the host and one for the target, and need to keep straight when to use each one.
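
      As a toy illustration of the two-compiler dance (the file names are hypothetical; $ARCH-gcc is the cross compiler prefix used throughout this build):

      # The host compiler builds a generator that must run here, during the build.
      gcc -o mkheaders mkheaders.c
      ./mkheaders > generated.h
      # The cross compiler builds the actual output, which runs on the target.
      $ARCH-gcc -o myapp myapp.c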

      Cross compilers leak host data, falling back to the host's headers and libraries if they can't find the target files they need.

      TODO: finish this.

      The basic theory

      The Linux From Scratch approach is to build a minimal intermediate system with just enough packages to be able to compile stuff, chroot into that, and build the final system from there.

      This approach completely isolates the host from the target, which means you should be able to run the FWL build under a wide variety of Linux distributions, and since the final system is built with a known set of tools you should get a consistent result. It also means you could run a prebuilt system image under a different host operating system entirely (such as MacOS X, or an arm version of linux on an x86-64 host) as long as you have an appropriate emulator.

      A minimal build environment consists of a compiler, command line tools, and a C library. In theory you just need three packages:

      • A C compiler.
      • BusyBox
      • A C library (uClibc)

      Unfortunately, that doesn't work yet.

      Some differences between theory and reality.

      We actually need seven packages (linux, uClibc, busybox, binutils, gcc, make, and bash) to create a working build environment. We also add an optional package for speed (distcc), and use two more (genext2fs and QEMU) to package and run the result.

      Environmental dependencies.

      Environmental dependencies are things that need to be installed before you can build or run a given package. Lots of packages depend on things like zlib, SDL, texinfo, and all sorts of other strange things. (The GnuCash project stalled years ago after it released a version with so many environmental dependencies it was virtually impossible to build or install. Environmental dependencies have a complexity cost, and are thus something to be minimized.)

      A good build system will scan its environment to figure out what it has available, and disable functionality that depends on anything that isn't there. (This is generally done with autoconf, which is disgusting but suffers from a lack of alternatives.) That way, the complexity cost is optional: you can build a minimal version of the package if that's all you need.

      A really good build system can be told that the environment it's building in and the environment the result will run in are different, so just because it finds zlib on the build system doesn't mean that the target system will have zlib installed on it. (And even if it does, it may not be the same version. This is one of the big things that makes cross-compiling such a pain. One big reason for statically linking programs is to eliminate this kind of environmental dependency.)

      The Firmware Linux build process is structured the way it is to eliminate as many environmental dependencies as possible. Some are unavoidable (such as C libraries needing kernel headers or gcc needing binutils), but the intermediate system is the minimal fully functional Linux development environment we currently know how to build, and then we switch into that and work our way back up from there by building more packages in the new environment.

      Resolving environmental dependencies.

      To build uClibc you need kernel headers identifying the syscalls and such it can make to the OS. We get them from the Linux kernel source tarball, using the "make headers_install" infrastructure created by David Woodhouse. This runs various scripts against the Linux kernel source code to sanitize the kernel's own headers for use by userspace. (This was merged in 2.6.18-rc1, and was more or less debugged by 2.6.19.)
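
      For example (the kernel version and the $CROSS prefix directory are illustrative):

      cd linux-2.6.19 &&
      make headers_install ARCH=$KARCH INSTALL_HDR_PATH="$CROSS/usr"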

      We install bash because the busybox shell situation is a mess. Busybox has several different shell implementations which share little or no code. (It's better now than it was a few years ago, but thanks to Ubuntu breaking the #!/bin/sh symlink with the Defective Annoying SHell, many scripts point explicitly at #!/bin/bash and BusyBox can't use that name for any of its shells yet.)

      Most packages expect gcc. The gnu compiler "toolchain" actually consists of three packages (binutils, gcc, and make). (The split between binutils and gcc is for purely historical reasons, and you have to match the right versions with each other or things break.)

      Adding an SUSv3 make implementation to busybox or toybox isn't a major problem, but until a viable GCC replacement emerges there's not much point.

      None of the other compilers under development are a drop-in replacement for gcc yet, especially for building the Linux kernel (which makes extensive use of gcc extensions). Intel's C Compiler implemented the necessary gcc extensions to build the Linux kernel, but it's a closed source package only supporting x86 and x86-64 targets. Since the introduction of C99, the Linux kernel has replaced many of these gcc extensions with equivalent C99 idioms, so in theory building the Linux kernel with other compilers is now easier.

      With the introduction of GPLv3, the Free Software Foundation has pissed off enough people that work on an open source replacement for gcc is ongoing on several fronts. The most promising is probably PCC, which is supported by what's left of the BSD community. Apple sponsors another significant effort, LLVM/Clang. Both are worth watching.

      Several others (Such as TinyCC and Open Watcom) once showed promise but have been essentially moribund since about 2005, which is when compilers that only ran on 32 bit hosts and supported C89 stopped being interesting. (A significant amount of effort is required to retool an existing compiler to cleanly run on an x86-64 host and support the full C99 feature set, let alone produce output for the dozens of hardware platforms supported by Linux, or produce similarly optimized binaries.)

      Additional complications

      Cross-compiling and avoiding root access

      Any time you create target binaries that won't run on the host system, you're cross compiling. Even when both the host and target are on the same processor, if they're sufficiently different that one can't run the other's binaries, then you're cross-compiling. In our case, the host is usually running both a different C library and an older kernel version than the target, even when it's the same processor.

      We want to avoid requiring root access to build Firmware Linux. If the build can run as a normal user, it's a lot more portable and a lot less likely to muck up the host system if something goes wrong. This means we can't modify the host's / directory (making anything that requires absolute paths problematic). We also can't mknod, chown, chgrp, mount (for --bind, loopback, tmpfs)...

      In addition, the gnu toolchain (gcc/binutils) is chock-full of hardwired assumptions, such as what C library it's linking binaries against, where to look for #included headers, where to look for libraries, the absolute path the compiler is installed at... Silliest of all, it assumes that if the host and target use the same processor, you're not cross-compiling (even if they have a different C library and a different kernel, and even if you ./configure it for cross-compiling it switches that back off because it knows better than you do). This makes it very brittle, and it also tends to leak its assumptions into the programs it builds. New versions may someday fix this, but for now we have to hit it on the head repeatedly with a metal bar to get anything remotely useful out of it, and run it in a separate filesystem (chroot environment) so it can't reach out and grab the wrong headers or wrong libraries despite everything we've told it.

      The absolute paths problem affects target binaries because all dynamically linked apps expect their shared library loader to live at an absolute path (in this case /lib/ld-uClibc.so.0). This directory is only writeable by root, and even if we could install it there polluting the host like that is just ugly.

      The Firmware Linux build has to assume it's cross-compiling because the host is generally running glibc, and the target is running uClibc, so the libraries the target binaries need aren't installed on the host. Even if they're statically linked (which also mitigates the absolute paths problem somewhat), the target often has a newer kernel than the host, so the set of syscalls uClibc makes (thinking it's talking to the new kernel, since that's the ABI described by the kernel headers it was built against) may not be entirely understood by the old kernel, leading to segfaults. (One of the reasons glibc is larger than uClibc is that it checks the kernel to see if it supports things like long filenames or 32-bit device nodes before trying to use them. uClibc should always work on a newer kernel than the one it was built to expect, but not necessarily an older one.)

      Ways to make it all work

      Cross compiling vs native compiling under emulation

      Cross compiling is a pain. There are a lot of ways to get it to sort of kinda work for certain versions of certain packages built on certain versions of certain distributions. But making it reliable or generally applicable is hard to do.

      I wrote an introduction to cross-compiling which explains the terminology, pluses and minuses, and why you might want to do it. Keep in mind that I wrote that for a company that specializes in cross-compiling. Personally, I consider cross-compiling a necessary evil to be minimized, and that's how Firmware Linux is designed. We cross-compile just enough stuff to get a working native build environment for the new platform, which we then run under emulation.

      Which emulator?

      The emulator Firmware Linux 0.8x used was User Mode Linux (here's a UML mini-howto I wrote while getting this to work). Since we already need the linux-kernel source tarball anyway, building User Mode Linux from it was convenient and minimized the number of packages we needed to build the minimal system.

      The first stage of the build compiled a UML kernel and ran the rest of the build under that, using UML's hostfs to mount the parent's root filesystem as the root filesystem for the new UML kernel. This solved both the kernel version and the root access problems. The UML kernel was the new version, and supported all the new syscalls and ioctls and such that the uClibc was built to expect, translating them to calls to the host system's C library as necessary. Processes running under User Mode Linux had root access (at least as far as UML was concerned), and although they couldn't write to the hostfs mounted root partition, they could create an ext2 image file, loopback mount it, --bind mount in directories from the hostfs partition to get the apps they needed, and chroot into it. Which is what the build did.

      Current Firmware Linux has switched to a different emulator, QEMU, because as long as we're cross-compiling anyway we might as well have the ability to cross-compile for non-x86 targets. We still build a new kernel to run the uClibc binaries with the new kernel ABI, we just build a bootable kernel and run it under QEMU.

      The main difference with QEMU is a sharper dividing line between the host system and the emulated target. Under UML we could switch to the emulated system early and still run host binaries (via the hostfs mount). This meant we could be much more relaxed about cross compiling, because we had one environment that ran both types of binaries. But this doesn't work if we're building an ARM, PPC, or x86-64 system on an x86 host.

      Instead, we need to sequence more carefully. We build a cross-compiler, use that to cross-compile a minimal intermediate system from the seven packages listed earlier, and build a kernel and QEMU. Then we run the kernel under QEMU with the new intermediate system, and have it build the rest natively.

      It's possible to use other emulators instead of QEMU, and I have a todo item to look at armulator from uClinux. (I looked at another nommu system simulator at Ottawa Linux Symposium, but after resolving the third unnecessary environmental dependency and still not being able to get it to finish compiling yet, I gave up. Armulator may be a patch against an obsolete version of gdb, but I could at least get it to build.)

      Packaging

      Filesystem Layout

      Firmware Linux's directory hierarchy is a bit idiosyncratic: some redundant directories have been merged, with symlinks from the standard positions pointing to their new positions. On the bright side, this makes it easy to make the root partition read-only.

      Simplifying the $PATH.

      The set "bin->usr/bin, sbin->usr/sbin, lib->usr/lib" all serve to consolidate all the executables under /usr. This has a bunch of nice effects: making a a read-only run-from-CD filesystem easier to do, allowing du /usr to show the whole system size, allowing everything outside of there to be mounted noexec, and of course having just one place to look for everything. (Normal executables are in /usr/bin. Root only executables are in /usr/sbin. Libraries are in /usr/lib.)

      For those of you wondering why /bin and /usr/bin were split in the first place, the answer is that Ken Thompson and Dennis Ritchie ran out of space on the original 2.5 megabyte RK-05 disk pack their root partition lived on in 1971, and leaked the OS into their second RK-05 disk pack, where the user home directories lived. When they got more disk space, they created a new directory (/home) and moved all the user home directories there.

      The real reason we kept it is tradition. The excuse is that the root partition contains early boot stuff and /usr may get mounted later, but these days we use initial ramdisks (initrd and initramfs) to handle that sort of thing. The version skew issues of actually trying to mix and match different versions of /lib/libc.so.* living on a local hard drive with a /usr/bin/* from the network mount are not pretty.

      I.E. The separation is just a historical relic, and I've consolidated it in the name of simplicity.

      On a related note, there's no reason for "/opt". After the original Unix leaked into /usr, Unix shipped out into the world in semi-standardized forms (Version 7, System III, the Berkeley Software Distribution...) and sites that installed these wanted places to add their own packages to the system without mixing their additions in with the base system. So they created "/usr/local" and created a third instance of bin/sbin/lib and so on under there. Then Linux distributors wanted a place to install optional packages, and they had /bin, /usr/bin, and /usr/local/bin to choose from, but the problem with each of those is that they were already in use and thus might be cluttered by who knows what. So a new directory was created, /opt, for "optional" packages like firefox or open office.

      It's only a matter of time before somebody suggests /opt/local, and I'm not humoring this. Executables for everybody go in /usr/bin, ones usable only by root go in /usr/sbin. There's no /usr/local or /opt. /bin and /sbin are symlinks to the corresponding /usr directories, but there's no reason to put them in the $PATH.

      Consolidating writeable directories.

      All the editable stuff has been moved under "var", starting with symlinking tmp->var/tmp. Although /tmp is much less useful these days than it used to be, some things (like X) still love to stick things like named pipes in there. Long ago in the days of little hard drive space and even less ram, people made extensive use of temporary files and they threw them in /tmp because ~home had an ironclad quota. These days, putting anything in /tmp with a predictable filename is a security issue (symlink attacks, you can be made to overwrite any arbitrary file you have access to). Most temporary files for things like the printer or email migrated to /var/spool (where there are persistent subdirectories with known ownership and permissions) or in the user's home directory under something like "~/.kde".

      The theoretical difference between /tmp and /var/tmp is that the contents of /tmp should be deleted by the system init scripts on every reboot, but the contents of /var/tmp may be preserved across reboots. Except there's no guarantee that the contents of any temp directory won't be deleted. So any program that actually depends on the contents of /var/tmp being preserved across a reboot is obviously broken, and there's no reason not to just symlink them together.

      (In case it hasn't become apparent yet, there's 30 years of accumulated cruft in the standards, covering a lot of cases that don't apply outside of supercomputing centers where 500 people share accounts on a mainframe that has a dedicated support staff. They serve no purpose on a laptop, let alone an embedded system.)

      The corner case is /etc, which can be writeable (we symlink it to var/etc) or a read-only part of the / partition. It's really a question of whether you want to update configuration information and user accounts in a running system, or whether that stuff should be fixed before deploying. We're doing some cleanup, but leaving /etc writeable (as a symlink to /var/etc). Firmware Linux symlinks /etc/mtab->/proc/mounts, which is required by modern stuff like shared subtrees. If you want a read-only /etc, use "find /etc -type f | xargs ls -lt" to see what gets updated on the live system. Some specific cases are that /etc/adjtime was moved to /var by LSB and /etc/resolv.conf should be a symlink somewhere writeable.
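
      Continuing the sketch from above (again with $ROOT standing in for the new root filesystem):

      mkdir -p "$ROOT/var/tmp" "$ROOT/var/etc"
      ln -s var/tmp "$ROOT/tmp"
      ln -s var/etc "$ROOT/etc"
      ln -s /proc/mounts "$ROOT/var/etc/mtab"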

      The resulting mount points

      The result of all this is that a running system can have / mounted read only (with /usr living under that), /var can be ramfs or tmpfs with a tarball extracted to initialize it on boot, /dev can be ramfs/tmpfs managed by udev or mdev (with /dev/pts as devpts under that: note that /dev/shm naturally inherits /dev's tmpfs, and some things like User Mode Linux get upset if /dev/shm is mounted noexec), /proc can be procfs, and /sys can be sysfs. Optionally, /home can be an actual writeable filesystem on a hard drive or the network.
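
      An illustrative boot-time sequence implementing the above (the /var seed tarball name is hypothetical):

      mount -t proc proc /proc
      mount -t sysfs sysfs /sys
      mount -o remount,ro /
      mount -t tmpfs tmpfs /var
      tar xf /usr/var-template.tar -C /var   # seed /var's initial contents
      mount -t tmpfs tmpfs /dev              # not noexec: /dev/shm inherits this
      mdev -s                                # busybox: populate /dev from sysfs
      mkdir -p /dev/pts
      mount -t devpts devpts /dev/pts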

      Remember to put root's home directory somewhere writeable (I.E. /root should move to either /var/root or /home/root; change the passwd entry to match), and life is good.

      Firmware Linux is an embedded Linux distribution builder, which creates a bootable single file Linux system based on uClibc and BusyBox/toybox. It's basically a shell script that builds a complete Linux system from source code for an arbitrary target hardware platform.

      The FWL script starts by building a cross-compiler for the appropriate target. Then it cross-compiles a small Linux system for the target, which is capable of acting as a native development environment when run on the appropriate hardware (or under an emulator such as QEMU). Finally the build script creates an ext2 root filesystem image, and packages it with a kernel configured to boot under QEMU and shell scripts to invoke qemu appropriately.

      The FWL boot script for qemu (/tools/bin/qemu-setup.sh) populates /dev from sysfs, sets up an emulated (masquerading) network (so you can wget source packages or talk to distcc), and creates a few symlinks needed to test-build normal software packages (such as making /lib point to /tools/lib). It also mounts /dev/hdb (or /dev/sdb) on /home if a second emulated drive is present.

      For most platforms, exiting the command shell will exit the emulator. (Some, such as powerpc, don't support this yet. For those you have to kill qemu from another window, or exit the xterm. I'm working on it.)

      To use this emulated system as a native build environment, see native compiling.

      Adding a new target platform

      The differences between platforms are confined to a single directory, sources/targets. Each subdirectory under that contains all the configuration information for a specific target platform FWL can produce system images for. The same scripts build the same packages for each platform, differing only in which configuration directory they pull data from.

      Each target configuration directory has three interesting files:

      • details - sets a bunch of environment variables

      • miniconfig-uClibc - configuration for uClibc.

      • miniconfig-linux - configuration for the Linux kernel

      These configuration files are read and processed by the script include.sh.

      Target name.

      The name of the target directory is saved in the variable "$ARCH", and used to form a "tuple" for gcc and binutils by appending "-unknown-linux" to the directory name. So the first thing to do is find out what platform name gcc and binutils want for your target platform, and name your target directory appropriately.

      (Note: if your platform really can't use an "${ARCH}-unknown-linux" style tuple, and instead needs a tuple like "bfin-elf", you can override the default by setting the variable CROSS_TARGET in the "details" file, described below.)

      The name of the target directory is also used in the name of the various directories generated during the build (temp-$ARCH, cross-compiler-$ARCH, and mini-native-$ARCH, all in the build/ directory), and as the prefix of the cross compiler binaries ($ARCH-gcc and friends).

      $ARCH/details

      The following environment variables may be set in the "details" file:

      • CROSS_TARGET - By default this is set to "${ARCH}-unknown-linux".

        This is used by binutils and gcc. If your target really can't use that tuple name (perhaps needing a tuple like "bfin-elf" instead), you can set the variable CROSS_TARGET in the "details" file to override the default value and feed some other --target to gcc and binutils.

        You usually shouldn't have to set this yourself unless gcc doesn't yet fully support Linux on your platform. Try the default first, and fix it if necessary.

      • KARCH - architecture value for the Linux kernel (ARCH=$KARCH).

        The Linux kernel uses different names for architectures than gcc or binutils do. To see all your options, list the "arch" directory of the linux kernel source.

      • KERNEL_PATH - Path in the linux kernel source tree where the bootable kernel image is generated.

        This is the file saved out of the kernel build, to be fed to qemu's -kernel option. Usually "arch/${KARCH}/boot/zImage", but sometimes bzImage or image in that directory, sometimes vmlinux in the top level directory...

      • GCC_FLAGS - Any extra flags needed by gcc.

        Usually blank, but sometimes used to specify a floating point coprocessor, ABI, or --with-cpu.

      • BINUTILS_FLAGS - Any extra flags needed by binutils.

        Usually blank.

      • QEMU_TEST - Optional emulator for sanity test.

        At the end of the cross compiler build, a quick sanity test builds static and dynamic "Hello world!" executables with the new cross compiler. If QEMU_TEST isn't blank and a file qemu-$QEMU_TEST exists in the $PATH, the cross compiler build script will then run qemu's application emulation against the static version of "hello world" as an additional sanity test, to make sure it runs on the target processor and outputs "Hello world!".

        Leave it blank to skip this test.

      • emulator_command - Shell function run to generate the actual emulator invocation at the end of the run-$ARCH.sh shell script in the system image tarball.

        This is actually a shell function, not an environment variable. It's called from package-mini-native.sh to output an emulator command line to stdout (generally using "echo").

        The function receives two arguments: $1 is the name of the ext2 image containing the root filesystem, and $2 is the name of the kernel image. The function can also call another shell function, qemu_defaults, which is defined in package-mini-native.sh and which provides most of the qemu command line. (If you use a different emulator, you don't have to call this function, but if you use qemu it makes things a lot easier and more consistent.) The qemu_defaults function uses the $ROOT and $CONSOLE variables for its root= and console= kernel command line arguments, so set those before calling it.
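
      Pulling that together, a hypothetical "details" file for a powerpc-style target might look like this (the values are illustrative, not copied from a real target):

      KARCH=powerpc
      KERNEL_PATH=arch/${KARCH}/boot/zImage
      GCC_FLAGS=""
      BINUTILS_FLAGS=""
      QEMU_TEST=ppc

      emulator_command()
      {
        # qemu_defaults expands $ROOT and $CONSOLE into root= and console=
        ROOT=hda
        CONSOLE=ttyS0
        echo "qemu-system-ppc -M g3beige $(qemu_defaults "$1" "$2")"
      }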

      Miniconfig files

      The expanded .config files used to build both Linux and uClibc are copied into the /usr/src directory of mini-native filesystems during the build, and kept for future reference.

      The Linux kernel and uClibc each need a configuration file to build. Firmware Linux uses the "miniconfig" file format, which contains only the configuration symbols a user would have to switch on in menuconfig if they started from allnoconfig.

      To generate a miniconfig, first configure your kernel with menuconfig, then copy the resulting .config file to a temporary filename (such as "tempfile"). Then run the miniconfig.sh script in the sources/toys directory with the temporary file name as your argument and with the environment variable ARCH set to the $KARCH value in your new config file (and exported if necessary). This should produce a new file, "mini.config", which is your .config file converted to miniconfig format.

      For example, to produce a miniconfig for a given platform:

      make ARCH=$KARCH menuconfig
      mv .config tempfile
      ARCH=$KARCH miniconfig.sh tempfile
      ls -l mini.config
      

      To expand a mini.config back into a full .config file (to build a kernel by hand, or for further editing with menuconfig), you can go:

      make ARCH=$KARCH allnoconfig KCONFIG_ALLCONFIG=mini.config
      

      Remember to supply an actual value for $KARCH.

      $ARCH/miniconfig-linux

      This is the miniconfig file to build a Linux kernel for the appropriate target. This is usually aimed at booting under QEMU, but if you'd like to come up with your own configuration for actual target hardware, feel free.

      The starting point for kernel configs is generally one of the defconfig files from the Linux kernel source code, usually at "arch/$KARCH/configs/*_defconfig". Copy that to .config at the top of the kernel source, run menuconfig to edit it, then shrink it into a miniconfig.

      Kernels to run system images under qemu generally require the following hardware: a serial port (for /dev/console), a hard drive (for the hda and hdb images), a network card (for distcc), and a persistent realtime clock (make gets unhappy if source files are newer than the current time). The ability to address at least 512 megabytes of memory is also nice, although some targets (such as mips) are limited to less than that by the hardware. The "qemu-system-$ARCH -M ?" and "qemu-system-$ARCH -cpu ?" options may be informative here, as may the "QEMU System emulator for non PC targets" section of the QEMU documentation.
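
      As a hypothetical fragment of such a miniconfig for a PC-style target (symbol names vary by architecture and kernel version, so treat these as examples only):

      CONFIG_SERIAL_8250=y
      CONFIG_SERIAL_8250_CONSOLE=y
      CONFIG_IDE=y
      CONFIG_BLK_DEV_IDE=y
      CONFIG_BLK_DEV_IDEDISK=y
      CONFIG_NE2K_PCI=y
      CONFIG_RTC=y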

      $ARCH/miniconfig-uClibc

      Just like the Linux kernel, uClibc needs a .config file to build, and so the Firmware Linux configuration file supplies a miniconfig. Note that uClibc doesn't require an ARCH= value, because all its architecture information is stored in the config file. Otherwise the procedure for creating and using it is the same as for the Linux kernel, just with a different filename and contents.

      Most of each miniconfig-uClibc is identical from platform to platform. Usually only the "Target Architecture" changes (and occasionally an entry or two out of Target Architecture Features and Options). At some point in the future the rest of the uClibc configuration might be factored out into a common file, but so far removing the duplication hasn't been worth the extra complexity.



    Copyright 2002, 2011 Rob Landley <rob@landley.net>