Aboriginal Linux

Ab Origine - Latin, "From the beginning".

  • Build the simplest linux system capable of compiling itself.
  • Cross compile it to every target supported by QEMU.
  • Boot it under QEMU (or real hardware).
  • Build/test everything else natively on target.

What is Aboriginal Linux?

Creating system images.

Aboriginal Linux is a shell script that builds the smallest/simplest linux system capable of rebuilding itself from source code. This currently requires seven packages: linux, busybox, uClibc, binutils, gcc, make, and bash. The results are packaged into a system image with shell scripts to boot it under QEMU. (It works fine on real hardware too.)

The build supports most architectures QEMU can emulate (x86, arm, powerpc, mips, sh4, sparc...). The build runs as a normal user (no root access required) and should run on any reasonably current distro, downloading and compiling its own prerequisites from source (including cross compilers).

The build is modular; each section can be bypassed or replaced if desired. The build offers a number of configuration options, but if you don't want to run the build yourself you can download binary system images to play with, built for each target with the default options.

(Note: the goal of the 2.0 release is to migrate from busybox, uClibc, and gcc/binutils to toybox, musl-libc, and llvm/lld.)

Using system images.

Each system image tarball contains a wrapper script ./run-emulator.sh which boots it to a shell prompt. (This requires QEMU to be installed on the host.) The emulated system's /dev/console is routed to stdin and stdout of the qemu process, so you can just type at it and log the output with "tee". Exiting the shell causes the emulator to shut down and exit.
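
For example, a minimal sketch (the target name and tarball extension below are illustrative; use whichever system image you actually downloaded):

    tar xvjf system-image-armv5l.tar.bz2     # unpack the system image tarball
    cd system-image-armv5l
    ./run-emulator.sh | tee ../armv5l.log    # type at the resulting shell prompt;
                                             # typing "exit" shuts the emulator down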

The wrapper script ./dev-environment.sh calls run-emulator.sh with extra options telling QEMU to allocate more memory, attach 2 gigabytes of persistent storage to /home in the emulated system, and hook distcc up to the cross compiler so the heavy lifting of compilation happens outside the emulator (if distccd and the appropriate cross compiler are available on the host system).
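
A typical invocation looks just like run-emulator.sh; this sketch assumes you are still in the unpacked system-image directory and relies on the script's defaults for memory size and persistent /home storage:

    # Uses distcc acceleration only if distccd and the matching cross compiler
    # are found on the host; otherwise the build happens entirely inside QEMU.
    ./dev-environment.sh | tee ../dev.log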

The wrapper script ./native-build.sh calls dev-environment.sh with a build control image attached to /mnt in the emulated system, allowing the init script to run /mnt/init instead of launching a shell prompt, providing fully automated native builds. The "static tools" (dropbear, strace) and "linux from scratch" (a chroot tarball) builds are run each release as part of testing, with the results uploaded to the website.
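
For example, a hedged sketch assuming a build control image has already been downloaded or built, and that its path is passed as the script's first argument (the file name below is illustrative):

    # Attach the control image to /mnt and let its /mnt/init drive the build:
    ./native-build.sh ../lfs-bootstrap.hdc | tee ../native-build.log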

For more information, see Getting Started or the presentation slides Developing for non-x86 Targets using QEMU.

Downloading Aboriginal Linux

Prebuilt binary images are available for each target, based on the current Aboriginal Linux release. This includes cross compilers, native compilers, root filesystems suitable for chroot, and system images for use with QEMU.

The binary README describes each tarball. The release notes explain recent changes.

Even if you plan to build your own images from source code, you should probably start by familiarizing yourself with the (known working) binary releases.

Development

To build a system image for a target, download the Aboriginal Linux source code and run "./build.sh" with the name of the target to build (or with no arguments to list available targets). See the "config" file in the source for various environment variables you can export to control the build. See the source README for additional usage instructions, and the release notes for recent changes.
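
For example (the target name below is just an illustration; run the script with no arguments to see the real list):

    ./build.sh              # no arguments: list the available targets
    ./build.sh armv5l       # cross compile and package system images for one target
    # Environment variables documented in the "config" file can be exported
    # first to tweak the build (parallelism, keeping temporary files, and so on).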

Aboriginal Linux is a build system for creating bootable system images, which can be configured to run either on real hardware or under emulators (such as QEMU). It is intended to reduce or even eliminate the need for further cross compiling, by doing all the cross compiling necessary to bootstrap native development on a given target. (That said, most of what the build does is create and use cross compilers: we cross compile so you don't have to.)

The build system is implemented as a series of bash scripts which run to create the various binary images. The "build.sh" script invokes the other stages in the correct order, but the stages are designed to run individually; build.sh itself does nothing important beyond calling them.

Aboriginal Linux is designed as a series of orthogonal layers (the stages called by build.sh), to increase flexibility and minimize undocumented dependencies. Each layer can be either omitted or replaced with something else. The list of layers is in the source README.
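
As a sketch, a build can also be driven stage by stage instead of through build.sh (the three script names below appear elsewhere in this documentation; the remaining stages and their exact names are listed in the source README):

    ./download.sh                        # populate the packages directory
    ./host-tools.sh                      # optional: build known host tools, sanitize $PATH
    ./simple-cross-compiler.sh armv5l    # build a cross compiler for one target
    # ...then the remaining stages (root filesystem, native compiler, kernel,
    # packaging) follow in the order build.sh calls them.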

The project maintains a development repository using the Mercurial source control system. This includes RSS feeds for each checkin and for new releases.

Questions about Aboriginal Linux should be addressed to the project's mailing list, or to the maintainer (rob at landley dot net) who has a blog that often includes notes about ongoing Aboriginal Linux development.

Design goals

In addition to implementing the above, Aboriginal Linux tries to support a number of use cases:

  • Eliminate the need for cross compiling.
  • Allow package maintainers to reproduce/fix bugs on more architectures.
  • Automated cross-platform regression testing and portability auditing.
  • Use current vanilla packages, even on obscure targets.
  • Provide a minimal self-hosting development environment.
  • Cleanly separate layers.
  • Document how to put together a development environment.

  • Eliminate the need for cross compiling.

    We cross compile so you don't have to: Moore's Law has made native compiling under emulation a reasonable approach to cross-platform support.

    If you need to scale up development, Aboriginal Linux lets you throw hardware at the scalability problem instead of engineering time, using distcc acceleration and distributed package build clusters to compile entire distribution repositories on racks of cheap x86 cloud servers.

    But even when distcc calls out of the emulator to a cross compiler, the build still behaves like a native build. It does not reintroduce the complexities of cross compiling, such as keeping multiple compiler/header/library combinations straight, or preventing configure from confusing the system you build on with the system you deploy on.

  • Allow package developers and maintainers to reproduce and fix bugs on architectures they don't have access to or experience with.

    Bug reports can include a link to a system image and a reproduction sequence (wget source, build, run this test). This provides the maintainer both a way to demonstrate the issue, and a native development environment in which to build and test their fix.

    No special hardware is required for this, just an open source emulator (generally QEMU) and a system image to run under it. Use wget to fetch your source, configure and make your package as normal using standard tool names (strip, ld, as, etc), even build and test on a laptop in an airplane without internet access (10.0.2.2 is qemu's alias for the host's 127.0.0.1.).

  • Automated cross-platform regression testing and portability auditing.

    Aboriginal Linux lets you build the same package across multiple architectures, and run the result immediately inside the emulator. You can even set up a cron job to build and test regular repository snapshots of a package's development version automatically, and report regressions when they're fresh, when the developers remember what they did, and when there are few recent changes that may have introduced the bug.

  • Use current vanilla packages, even on obscure targets.

    Nonstandard hardware often receives less testing than common desktop and server platforms, so regressions accumulate. This can lead to a vicious cycle where everybody sticks with private forks of old versions because making the new ones work is too much trouble, and the new ones don't work because nobody's testing and fixing them. The farther you fall behind, the harder it is to catch up again, but only the most recent version accepts new patches, so even the existing fixes don't go upstream. Worst of all, working in private forks becomes the accepted norm, and developers stop even trying to get their patches upstream.

    Aboriginal Linux uses the same (current) package versions across all architectures, in as similar a configuration as possible, and with as few patches as we can get away with. We (intentionally) can't upgrade a package for one target without upgrading it for all of them, so we can't put off dealing with less-interesting targets.

    This means any supported target stays up to date with current packages in unmodified "vanilla" form, providing an easy upgrade path to the next version and the ability to push your own changes upstream relatively easily.

  • Provide a minimal self-hosting development environment.

    "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." - Antoine de Saint-Exupéry

    Most build environments provide dozens of packages, ignoring the questions "do you actually need that?" and "what's it for?" in favor of offering rich functionality.

    Aboriginal Linux provides the smallest, simplest starting point capable of rebuilding itself under itself, and of bootstrapping up to build arbitrarily complex environments (such as Linux From Scratch) by building and installing additional packages. (The one package we add which is not strictly required for this, distcc, is installed in its own subdirectory which is only optionally added to the $PATH.)

    This minimalist approach makes it possible to regression test for environmental dependencies. Sometimes new releases of packages simply won't work without perl, or zlib, or some other dependency that previous versions didn't have, not because they meant to but because they were never tested in a build environment that didn't have them, so the dependency leaked in.

    By providing a build environment that contains only the bare essentials (relying on you to build and install whatever else you need), Aboriginal Linux lets you document exactly what dependencies packages actually require, figure out what functionality the additional packages provide, and measure the costs and benefits of the extra code.

    (Note: the command logging wrapper record-commands.sh can actually show which commands were used out of the $PATH when building any package.)

  • Cleanly separate layers.

    The entire build is designed to let you use only the parts of it you want, and skip or replace the rest. The top level "build.sh" script calls other scripts in sequence, each of which is designed to work independently.

    The only place package versions are mentioned is "download.sh"; the rest of the build is version-agnostic. All download.sh does is populate the "packages" directory, and if you want to provide your own packages you never need to run this script.

    The "host-tools.sh" script protects the build from variations in the host system, both by building known versions of command line tools (in build/host) and adjusting the $PATH to point only to that directory, and by unsetting all environment variables that aren't in a whitelist. If you want to use the host system's unfiltered environment instead, just skip running host-tool.sh.

    If you supply your own cross compilers in the $PATH (with the prefixes the given target expects), you can skip the simple-cross-compiler.sh stage. Similarly you can provide your own simple root filesystem, your own native compiler, or your own kernel image. You can use your own script to package them if you like.

  • Document how to put together a development environment.

    The build system is designed to be readable. That's why it's written in Bash (rather than something more powerful like Python): so it can act as documentation. Each shell script collects the series of commands you need to run in order to configure, build, and install the appropriate packages, in the order you need to install them in to satisfy their dependencies.

    The build is organized as a series of orthogonal stages. These are called in order from build.sh, but may be run (and understood) independently. Dependencies between them are kept to a minimum, and stages which depend on the output of previous stages document this at the start of the file.

    The scripts are also extensively commented to explain why they do what they do, and there's design documentation on the website.

What's next?

Now that the 1.0 release is out, what are the project's new goals?

Move from busybox, uClibc, and gcc/binutils to toybox, musl, and llvm (then qcc).

Now that we've got a simple development environment working, we can make it simpler by moving to better packages. Most of this project's new development effort is going into the upstream versions of those packages until they're ready for use here. In the meantime we're maintaining what works, but only really upgrading the kernel version and slowly switching from busybox to toybox one command at a time.

uClibc: The uClibc project's chronic development problems resulted in multiple year-long gaps between releases, and after the May 2012 release more than three years went by without a release, during which time musl-libc went from "git init" to a 1.0 release. At this point it doesn't matter whether uClibc ever gets another release out; it's over, and musl is the more interesting project. (Musl's main limitation is narrower target support, but it's easy to port musl to new targets and very hard to clean up the mess uClibc has become.)

toybox: The maintainer of Aboriginal Linux used to maintain busybox, but left that project and went on to create toybox for reasons explained at length elsewhere (video, outline, merged into Android).

The toybox 1.0 release should include a shell capable of replacing bash, and may include a make implementation (or that may wind up in qcc, below). This would eliminate two more packages currently used by Aboriginal Linux.

llvm: When gcc and binutils went GPLv3, Aboriginal Linux froze on the last GPLv2 releases, essentially maintaining its own fork of those projects. Several other projects did the same but most of those have since switched to llvm.

Unfortunately, configuring and building llvm is unnecessarily hard (among other things because it's implemented not just in C++ but in C++11, so you need gcc 4.7 or newer to bootstrap it), and nobody seems to have worked out how to canadian cross native compilers out of it yet. But other alternatives like pcc or tinycc are both less capable and less actively developed; since the FSF fell on its sword with GPLv3, the new emerging standard is LLVM.

qcc: In the long run, we'd like to put together a new compiler, qcc, but won't have development effort to spare for it before toybox's 1.0 release. Its goal is to combine tinycc and QEMU's Tiny Code Generator into a single multicall binary toolchain (cc, ld, as, strip and so on in a single executable replacing both the gcc and binutils packages) that supports all the output formats QEMU can emulate. (As a single-pass compiler with no intermediate format it wouldn't optimize well, but could bootstrap a native compiler that would.)

Additional goals for qcc would be to absorb ccwrap.c, grow built-in distcc-equivalent functionality, and add an updated rewrite of cfront to compile C++ code (and thus natively bootstrap LLVM).

Finishing the full development slate would bring the total number of Aboriginal Linux packages down to four: linux, toybox, musl, and qcc.

(Yes, reducing dependency on GPL software and avoiding GPLv3 entirely is a common theme of the above package switches; there's a reason for that: audio, outline, see also Android self-hosting below.)

Untangle distro build system hairballs into distinct layers.

The goal here is to separate what packages you can build from where and how you can build them.

For years, Red Hat only built under Red Hat, Debian only built under Debian, even Gentoo assumed it was building under Gentoo. Building their packages required using their root filesystem, and the only way to get their root filesystem was by installing their package binaries built under their root filesystem. The circular nature of this process meant that porting an existing distribution to a new architecture, or making it use a new C library, was extremely difficult at best.

This led cross compiling build systems to add their own package builds ("the buildroot trap"), and wind up maintaining their own repository of package build recipes, configurations, and dependencies. Their few hundred packages never approached the tens of thousands in full distribution repositories, but the effort of maintaining and upgrading packages would come to dominate the project's development effort until developers left to form new projects and start the cycle over again.

This massive and perpetual reinventing of wheels is wasteful. Each of the proliferating build systems (buildroot, openembedded, yocto/meego/tizen, and many more) has its own set of supported boards and its own half-assed package repository, with no ability to mix and match.

The proper way to deal with this is to separate the layers so you can mix and match. Choice of toolchain (and C library), "board support" (kernel configuration, device tree, module selection), and package repository (which existing distro you want to use), all must become independent. Until these are properly separated, your choice of cross compiler limits what boards you can boot the result on (even if the binaries you're building would run in a chroot on that hardware), and either of those choices limit what packages you can install into the resulting system.

This means Aboriginal Linux needs to be able to build _just_ toolchains and provide them to other projects (done), and to accept external toolchains (implemented but not well tested; most other projects produce cross compilers but not native compilers).

It also needs build control images to automatically bootstrap a Debian, Fedora, or Gentoo chroot starting from the minimal development environment Aboriginal Linux creates (possibly through an intermediate Linux From Scratch build, followed by fixups to make debian/fedora/gentoo happy with the chroot). It must be able to do this on an arbitrary host, using the existing toolchain and C library in an architecture-agnostic way. (If the existing system is a musl libc built for a microblaze processor, the new chroot should be too.)

None of these distributions make it easy: it's not documented, and it breaks. Some distributions didn't think things through: Gentoo hardwires the list of supported architectures into every package in the repository, for no apparent reason. Adding a new architecture requires touching every package's metadata. Others are outright lazy; building an allnoconfig Red Hat Enterprise 6.2 kernel under SLES11p2 is kind of hilariously bad: "make clean" spits out an error because the code it added to detect compiler version (something upstream doesn't need) gets confused by "gcc 4.3", which has no .0 on the end so the patchlevel variable is blank. Even under Red Hat's own filesystem, "make allnoconfig" breaks on the first C file, and requires almost two dozen config symbols to be switched on to finish the compilation, because they never tested anything but the config they ship. Making something like that work on a Hexagon processor, or making their root filesystem work with a vanilla kernel, is a daunting task.

Make Android self-hosting (musl, toybox, qcc).

Smartphones are replacing the PC, and if Android doesn't become self-hosting we may be stuck with locked down iPhone derivatives in the next generation.

Mainframe -> minicomputer -> microcomputer (PC) -> smartphone

Mainframes were replaced by minicomputers, which were replaced by microcomputers, which are being replaced by smartphones. (Nobody needed to stand in line to pick up a printout when they could sign up for a timeslot at a terminal down the hall. Nobody needed the terminal down the hall when they had a computer on their desk. Now nobody needs the computer on their desk when they have one in their pocket.)

Each time, the previous generation got kicked up into the "server space", accessed only through the newer machines. (This time around, kicking the PC up into the server space is called "the cloud".)

Smartphones have USB ports, which charge the phone and transfer data. Using a smartphone as a development workstation involves plugging it into a USB hub and adding a USB keyboard, a USB mouse, and a USB-to-HDMI converter to connect it to a television. The rest is software.

The smartphone needs to "grow up and become a real computer" the same way the PC did. The PC originally booted into "ROM Basic" just like today's Android boots into Dalvik Java: as the platform matures it must outgrow this to run native code written in all sorts of languages. PC software was once cross compiled from minicomputers, but as it matured it grew to host its own development tools, powerful enough to rebuild the entire operating system.

To grow up, Android phones need to become usable as development workstations, meaning the OS needs a self-hosting native development environment. This has four parts:

  • Kernel (we're good)
  • C library (bionic->musl, not uclibc)
  • Posix command line (toolbox->toybox, not busybox)
  • Compiler (qcc, llvm, open64, pcc...)

The Android kernel is a Linux derivative that adds features without removing any, so it's already good enough for now. Convergence to vanilla linux is important for long-term sustainability, but not time critical. (It's not part of "beating iPhone".)

Android's "no GPL in userspace" policy precludes it from shipping many existing Linux packages as part of the base install: no BusyBox or GNU tools, no glibc or uClibc, and no gcc or binutils. All those are all excluded from the Android base install, meaning they will never come bundled with the base operating system or preinstalled on devices, so we must find alternatives.

Android's libc, "bionic", is a minimal stub sufficient to run Dalvik and not much more. Its command line, "toolbox", is also a minimal stub providing little functionality. Part of this is intentional: Google is shipping a billion broadband-connected unix machines, none of which are administered by a competent sysadmin. So for security reasons, Android is locked down with minimal functionality outside the Java VM sandbox, providing less of an attack surface for viruses and trojans. In theory the Linux Containers infrastructure may eventually provide a solution for sandboxing applications, but the base OS needs to be pretty bulletproof if a billion people are going to run code they don't deeply understand while connected to broadband internet 24/7.

Thus replacement packages for the C library and posix command line should be clean, simple code that's easy to audit for security concerns, while also providing functionality that bionic and toolbox do not attempt and are not a good base for. The musl libc and toybox command line packages should be able to satisfy these requirements.

The toolchain is a harder problem. The leading contender (LLVM) is sponsored by Apple for use in Mac OSX and the iPhone's iOS. The iPhone is ahead of Android here, and although Android can use LLVM, it has other problems: it's implemented in C++, making it significantly more complicated from a system dependency standpoint, difficult to bootstrap, and impossible to audit.

The simplest option would be to combine the TinyCC project with QEMU's Tiny Code Generator (TCG). The licensing of the current TinyCC is incompatible with Android's userspace, but permission has been obtained from Fabrice Bellard to BSD-license his original TinyCC code as used in Rob's TinyCC fork. This could be used to implement a "qcc" capable of producing code for every platform qemu supports. The result would be simple and auditable, and compatibly licensed with Android userspace. Unfortunately, such a project is understaffed, and wouldn't get properly started until after the 1.0 release of toybox.

Other potential compiler projects include Open64 and PCC. Neither of these has built a bootable Linux kernel, without which a self-bootstrapping system is impossible. (This is a good smoketest for a mature compiler: if it can't build the kernel, it probably can't build userspace packages of the complexity people actually write.)

Why does this matter?

This is time critical due to network effects, which create positive feedback loops benefiting the most successful entrant and creating natural "standards" (which become self-defending monopolies if owned by a single player). Whichever platform has the most users attracts the most development effort, because it has the most potential customers. The platform all the software ships on first (and often only) is the one everybody wants to have. Other benefits to being biggest include the large start-up costs and much lower incremental costs of electronics manufacturing: higher unit volume makes devices cheaper to produce. Amortizing research and development budgets over a larger user base means the technology may actually advance faster (more effort, anyway)...

Technological transitions produce "S curves", where a gradual increase gives way to exponential increase (the line can go nearly vertical on a graph) and then eventually flattens out again, producing a sort of S shape. During the steep part of the S-curve, acquiring new customers dominates. Back in the early microcomputer days a lot more people had no computer than had an Atari 800 or Commodore 64 or Apple II or IBM PC, so each vendor focused on selling to the computerless rather than converting customers from other vendors. Once the pool of "people who haven't got the kind of computer we're selling today but would like one if they did" was exhausted (even if only temporarily, waiting for computers to get more powerful and easier to use), the largest players starved the smaller ones of new sales, until only the PC and Macintosh were left. (And the Macintosh switched over to PC hardware components to survive, offering different software and more attractive packaging of the same basic components.)

The same smartphone transition is inevitable as the pool of "people with no smartphone, but who would like one if they had it" runs out. At that point, the largest platform will suck users away from smaller platforms. If the winner is Android, we can open up the hardware and software. If the winner is iPhone, we're stuck with decades of Microsoft-like monopoly, except this time the vendor isn't hamstrung by its own technical incompetence.

The PC lasted over 30 years from its 1981 introduction until smartphones seriously started displacing it. Smartphones themselves will probably last about as long. Once the new standard "clicks", we're stuck with it for a long time. Now is when we can influence this decision. Linux's 15 consecutive "year of the linux desktop" announcements (spanning the period of Microsoft Bob, Windows Millennium, and Windows Vista) show how hard displacing an entrenched standard held in place by network effects actually is.

Why not extend vanilla Linux to smartphones instead?

Several reasons.

  • It's probably too late for another entrant. Microsoft muscling in with Lumia is like IBM muscling in with OS/2. And Ubuntu on the phone is like Coherent Unix on the PC, unlikely to even register. We have two clear leaders and the rest are noise ("Coke, Pepsi, and everybody else"). Possibly they could still gain ground by being categorically better, but "Categorically better than the newest iPhone/iPad" is a hard bar to clear.

  • During the minicomputer->PC switch, various big iron vendors tried to shoehorn their products down into the minicomputer space. The results were laughable. (Look up the "microvax" sometime.)

    The successful tablets are big phones, not small PCs. Teaching a PC to be a good phone is actually harder than teaching a phone to be a good PC; we understand the old problem space much better. (It's not that it's less demanding, but the ways in which it is demanding are old hat and long solved. Being a good phone is still tricky.)

  • Deployment requires vendor partnerships which are difficult and slow. Apple exclusively partnered with AT&T for years to build market share, and had much less competition at the time. Google eventually wound up buying Motorola to defend itself from the dysfunctional patent environment. Microsoft hijacked Nokia by installing one of their own people as CEO, and it's done them about as much good as a similar CEO-installation at SGI did to get Microsoft into the supercomputer market. (Taking out SGI did reduce Microsoft's competition in graphics workstations, but that was a market they already had traction in.)

  • Finally, Linux has had almost 2 decades of annual "Linux on the Desktop" pushes that universally failed, and there's a reason for this. Open source development can't do good user interfaces for the same reason Wikipedia can't write a novel with a coherent plot. The limitations of the development model do not allow for this. The old adage "too many cooks spoil the soup" is not a warning about lack of nutrition; it's a warning that aesthetic issues do not survive committees. Peer review does not produce blockbuster movies, hit songs, or masterpiece paintings. It finds scientific facts, not beauty.

    Any time "shut up and show me the code" is not the correct response to the problem at hand, open source development melts down into one of three distinct failure modes:

    1) Endless discussion that never results in actual code, because nobody can agree on a single course of action.

    2) The project forks itself to death: everybody goes off and codes up their preferred solution, but it's no easier to agree on a single approach after the code exists so the forks never get merged.

    3) Delegating the problem to nobody, either by A) separating engine from interface and focusing on the engine in hopes that some glorious day somebody will write an interface worth using, or B) making the interface so configurable that the fact it takes hours to figure out what your options are and still has no sane defaults is now somehow the end user's fault.

    Open source development defeats Brooks' Law by leveraging empirical tests. Integrating the results of decoupled development efforts is made possible by the ability to unequivocally determine which approaches are best (trusted engineers break ties, but it has to be pretty close and the arguments go back and forth). Even changing the design and repeatedly ripping out existing implementations is doable if everyone can at least retroactively agree that what we have now is better than what we used to have, and we should stop fighting to go back to the old way.

    In the absence of empirical tests, this doesn't work. By their nature, aesthetic issues do not have empirical tests for "better" or "worse". Chinese food is not "better" than Mexican food. But if you can't decide what you're doing (if one chef insists on adding ketchup and another bacon and a third ice cream) the end result is an incoherent mess. (At best you get beige and the DMV. Navigable with enough effort, but not appealing.)

    The way around this is to have a single author with a clear vision in charge of the user interface, who can make aesthetic decisions that are coherent rather than "correct". Unfortunately, when this does happen, the open source community pressures the developer of a successful project to give over control of the project to a committee. So the Gecko engine was buried in the unusable Mozilla browser, then Galeon forked off from that and Mozilla rebased itself on the Galeon fork. Then Firefox forked off of that and the Mozilla foundation took over Firefox...

    Part of the success of Android is that its user experience is NOT community developed. (This isn't just about the desktop; this is "if the whole thing pauses for two seconds while somebody's typing in a phone number, that's unacceptable". All the way down to the bare metal, the OS serves the task of being a handheld interactive touch screen device running off of battery power first, and being anything else it _could_ be doing second.)



Copyright 2002, 2011 Rob Landley <rob@landley.net>