Ab Origine - Latin, "From the beginning".
Creating system images.
Aboriginal Linux is a shell script that builds the smallest/simplest linux system capable of rebuilding itself from source code. This currently requires seven packages: linux, busybox, uClibc, binutils, gcc, make, and bash. The results are packaged into a system image with shell scripts to boot it under QEMU. (It works fine on real hardware too.)
The build supports most architectures QEMU can emulate (x86, arm, powerpc, mips, sh4, sparc...). The build runs as a normal user (no root access required) and should run on any reasonably current distro, downloading and compiling its own prerequisites from source (including cross compilers).
The build is modular; each section can be bypassed or replaced if desired. The build offers a number of configuration options, but if you don't want to run the build yourself you can download binary system images to play with, built for each target with the default options.
(Note: the goal of the 2.0 release is to migrate from busybox, uClibc, and gcc/binutils to toybox, musl-libc, and llvm/lld.)
Using system images.
Each system image tarball contains a wrapper script ./run-emulator.sh which boots it to shell prompt. (This requires the emulator QEMU to be installed on the host.) The emulated system's /dev/console is routed to stdin and stdout of the qemu process, so you can just type at it and log the output with "tee". Exiting the shell causes the emulator to shut down and exit.
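As a small illustration of logging a session with "tee", here is a minimal sketch. The helper name log_session and the log file name are made up for this example; only run-emulator.sh comes from the system image tarball.

```shell
#!/bin/sh
# Hypothetical helper: run a command, show its output on the terminal,
# and keep a copy of that output in a log file.
log_session() {
  logfile="$1"; shift
  # stdin still reaches the command, so you can type at the emulated console.
  "$@" 2>&1 | tee "$logfile"
}

# From an extracted system image directory (log name is arbitrary):
#   log_session boot.log ./run-emulator.sh
```

Because the emulator exits when the shell inside it exits, the pipeline ends on its own and the log file is complete at that point.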
The wrapper script ./dev-environment.sh calls run-emulator.sh with extra options to tell QEMU to allocate more memory, attach 2 gigabytes of persistent storage to /home in the emulated system, and to hook distcc up to the cross compiler to move the heavy lifting of compilation outside the emulator (if distccd and the appropriate cross compiler are available on the host system).
The wrapper script ./native-build.sh calls dev-environment.sh with a build control image attached to /mnt in the emulated system, allowing the init script to run /mnt/init instead of launching a shell prompt, providing fully automated native builds. The "static tools" (dropbear, strace) and "linux from scratch" (a chroot tarball) builds are run each release as part of testing, with the results uploaded to the website.
Even if you plan to build your own images from source code, you should probably start by familiarizing yourself with the (known working) binary releases.
Aboriginal Linux is a build system for creating bootable system images, which can be configured to run either on real hardware or under emulators (such as QEMU). It is intended to reduce or even eliminate the need for further cross compiling, by doing all the cross compiling necessary to bootstrap native development on a given target. (That said, most of what the build does is create and use cross compilers: we cross compile so you don't have to.)
The build system is implemented as a series of bash scripts which run to create the various binary images. The "build.sh" script invokes the other stages in the correct order, but the stages are designed to run individually. (Nothing build.sh itself does is actually important.)
Aboriginal Linux is designed as a series of orthogonal layers (the stages called by build.sh), to increase flexibility and minimize undocumented dependencies. Each layer can be either omitted or replaced with something else. The list of layers is in the source README.
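The idea of a thin driver over independent layers can be sketched as follows. This is not build.sh: the stage names are placeholders (the real list is in the source README), and the skip-list argument is a convention invented for this example.

```shell
#!/bin/sh
# Hypothetical stage names for illustration only; the real list is in the
# source README.
STAGES="stage-one stage-two stage-three"

# Walk the stages in order, skipping any named in the space-separated
# list passed as the first argument (an invented convention).
run_stages() {
  skip=" ${1:-} "
  for stage in $STAGES; do
    case "$skip" in
      *" $stage "*) echo "skip $stage" ;;
      *) echo "run $stage" ;;   # a real driver would invoke the stage script here
    esac
  done
}

run_stages "stage-two"
```

The driver holds no state of its own, which is the point: anything it does could be done by running the stages by hand.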
Questions about Aboriginal Linux should be addressed to the project's mailing list, or to the maintainer (rob at landley dot net) who has a blog that often includes notes about ongoing Aboriginal Linux development.
In addition to implementing the above, Aboriginal Linux tries to support a number of use cases:
We cross compile so you don't have to: Moore's Law has made native compiling under emulation a reasonable approach to cross-platform support.
If you need to scale up development, Aboriginal Linux lets you throw hardware at the scalability problem instead of engineering time, using distcc acceleration and distributed package build clusters to compile entire distribution repositories on racks of cheap x86 cloud servers.
But using distcc to call outside the emulator to a cross compiler still acts like a native build. It does not reintroduce the complexities of cross compiling, such as keeping multiple compiler/header/library combinations straight, or preventing configure from confusing the system you build on with the system you deploy on.
Most build environments provide dozens of packages, ignoring the questions "do you actually need that?" and "what's it for?" in favor of offering rich functionality.
Aboriginal Linux provides the smallest, simplest starting point capable of rebuilding itself under itself, and of bootstrapping up to build arbitrarily complex environments (such as Linux From Scratch) by building and installing additional packages. (The one package we add which is not strictly required for this, distcc, is installed in its own subdirectory which is only optionally added to the $PATH.)
This minimalist approach makes it possible to regression test for environmental dependencies. Sometimes new releases of packages simply won't work without perl, or zlib, or some other dependency that previous versions didn't have, not because they meant to but because they were never tested in a build environment that didn't have them, so the dependency leaked in.
By providing a build environment that contains only the bare essentials (relying on you to build and install whatever else you need), Aboriginal Linux lets you document exactly what dependencies packages actually require, figure out what functionality the additional packages provide, and measure the costs and benefits of the extra code.
(Note: the command logging wrapper record-commands.sh can actually show which commands were used out of the $PATH when building any package.)
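One way to picture this kind of dependency regression test is to run a command with a deliberately tiny $PATH. The sketch below is not record-commands.sh and makes no claim about how that script works; the function names and the tool list are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: fill a directory with symlinks to an explicit
# allow-list of host commands.
make_minimal_path() {
  dir="$1"; shift
  mkdir -p "$dir" || return 1
  for tool in "$@"; do
    # "command -v" prints the full path of an external command in $PATH.
    src="$(command -v "$tool")" || { echo "missing on host: $tool" >&2; return 1; }
    case "$src" in
      /*) ln -sf "$src" "$dir/$tool" ;;
      *) echo "not an external command: $tool" >&2; return 1 ;;
    esac
  done
}

# Run a command in a subshell whose $PATH contains only that directory,
# so anything outside the allow-list fails with "not found".
run_with_minimal_path() {
  dir="$1"; shift
  ( PATH="$dir"; export PATH; "$@" )
}
```

A build step that succeeds on the host but fails under run_with_minimal_path is reaching for a tool that was never declared as a dependency.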
Now that the 1.0 release is out, what are the project's new goals?
Move from busybox, uClibc, and gcc/binutils to toybox, musl, and llvm (then qcc).
Now that we've got a simple development environment working, we can make it simpler by moving to better packages. Most of this project's new development effort is going into the upstream versions of those packages until they're ready for use here. In the meantime we're maintaining what works, but only really upgrading the kernel version and slowly switching from busybox to toybox one command at a time.
uClibc: The uClibc project's chronic development problems resulted in multiple year-long gaps between releases, and after the May 2012 release more than three years went by without a release, during which time musl-libc went from "git init" to a 1.0 release. At this point it doesn't matter whether uClibc gets another release out: it's over, and musl is the more interesting project. (Musl's main limitation is lack of target support, but it's easy to port musl to new targets and very hard to clean up the mess uClibc has become.)
toybox: The maintainer of Aboriginal Linux used to maintain busybox, but left that project and went on to create toybox for reasons explained at length elsewhere (video, outline, merged into Android).
The toybox 1.0 release should include a shell capable of replacing bash, and may include a make implementation (or in qcc, below). This would eliminate two more packages currently used by Aboriginal Linux.
llvm: When gcc and binutils went GPLv3, Aboriginal Linux froze on the last GPLv2 releases, essentially maintaining its own fork of those projects. Several other projects did the same but most of those have since switched to llvm.
Unfortunately, configuring and building llvm is unnecessarily hard (among other things because it's not just implemented in C++ but in the 2011 C++ spec, so you need gcc 4.7 or newer to bootstrap it), and nobody seems to have worked out how to Canadian cross native compilers out of it yet. But other alternatives like pcc or tinycc are both less capable and less actively developed; since the FSF fell on its sword with GPLv3, the new emerging standard is LLVM.
qcc: In the long run, we'd like to put together a new compiler, qcc, but won't have development effort to spare for it before toybox's 1.0 release. Its goal is to combine tinycc and QEMU's Tiny Code Generator into a single multicall binary toolchain (cc, ld, as, strip and so on in a single executable replacing both the gcc and binutils packages) that supports all the output formats QEMU can emulate. (As a single-pass compiler with no intermediate format it wouldn't optimize well, but could bootstrap a native compiler that would.)
Additional goals for qcc would be to absorb ccwrap.c, grow built-in distcc equivalent functionality, and an updated rewrite of cfront to compile C++ code (and thus natively bootstrap LLVM).
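The "single multicall binary" idea is the same trick busybox and toybox use: one executable checks the name it was invoked under and dispatches accordingly. A minimal shell sketch of that dispatch follows. The applet bodies are placeholders, the function name is made up, and nothing here reflects actual qcc code.

```shell
#!/bin/sh
# Hypothetical multicall dispatcher: one entry point, several command names.
# Each name would normally be a symlink pointing at the same file.
multicall() {
  name="${1##*/}"   # strip any leading directory, like basename
  shift
  case "$name" in
    cc|ld|as|strip) echo "would run built-in $name with args: $*" ;;
    *) echo "unknown applet: $name" >&2; return 1 ;;
  esac
}

# When installed as symlinks, the entry point would be: multicall "$0" "$@"
multicall /usr/bin/ld -o hello hello.o
```

The example call prints "would run built-in ld with args: -o hello hello.o".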
Finishing the full development slate would bring the total number of Aboriginal Linux packages down to four: linux, toybox, musl, and qcc.
Untangle distro build system hairballs into distinct layers.
The goal here is to separate what packages you can build from where and how you can build them.
For years, Red Hat only built under Red Hat, Debian only built under Debian, even Gentoo assumed it was building under Gentoo. Building their packages required using their root filesystem, and the only way to get their root filesystem was by installing their package binaries built under their root filesystem. The circular nature of this process meant that porting an existing distribution to a new architecture, or making it use a new C library, was extremely difficult at best.
This led cross compiling build systems to add their own package builds ("the buildroot trap"), and wind up maintaining their own repository of package build recipes, configurations, and dependencies. Their few hundred packages never approached the tens of thousands in full distribution repositories, but the effort of maintaining and upgrading packages would come to dominate the project's development effort until developers left to form new projects and start the cycle over again.
This massive and perpetual reinventing of wheels is wasteful. Each of the proliferating build systems (buildroot, openembedded, yocto/meego/tizen, and many more) has its own set of supported boards and its own half-assed package repository, with no ability to mix and match.
The proper way to deal with this is to separate the layers so you can mix and match. Choice of toolchain (and C library), "board support" (kernel configuration, device tree, module selection), and package repository (which existing distro you want to use), all must become independent. Until these are properly separated, your choice of cross compiler limits what boards you can boot the result on (even if the binaries you're building would run in a chroot on that hardware), and either of those choices limits what packages you can install into the resulting system.
This means Aboriginal Linux needs to be able to build _just_ toolchains and provide them to other projects (done), and to accept external toolchains (implemented but not well tested; most other projects produce cross compilers but not native compilers).
It also needs build control images to automatically bootstrap a Debian, Fedora, or Gentoo chroot starting from the minimal development environment Aboriginal Linux creates (possibly through an intermediate Linux From Scratch build, followed by fixups to make debian/fedora/gentoo happy with the chroot). It must be able to do this on an arbitrary host, using the existing toolchain and C library in an architecture-agnostic way. (If the existing system is a musl libc built for a microblaze processor, the new chroot should be too.)
None of these distributions make it easy: it's not documented, and it breaks. Some distributions didn't think things through: Gentoo hardwires the list of supported architectures into every package in the repository, for no apparent reason. Adding a new architecture requires touching every package's metadata. Others are outright lazy; building an allnoconfig Red Hat Enterprise 6.2 kernel under SLES11p2 is kind of hilariously bad: "make clean" spits out an error because the code it added to detect compiler version (something upstream doesn't need) gets confused by "gcc 4.3", which has no .0 on the end so the patchlevel variable is blank. Even under Red Hat's own filesystem, "make allnoconfig" breaks on the first C file, and requires almost two dozen config symbols to be switched on to finish the compilation, because they never tested anything but the config they ship. Making something like that work on a Hexagon processor, or making their root filesystem work with a vanilla kernel, is a daunting task.
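The "gcc 4.3" failure is a general hazard of splitting version strings. As a hedged illustration (this is not the Red Hat kernel code, and the function name is made up), a shell helper can default missing fields to 0 instead of leaving them blank:

```shell
#!/bin/sh
# Hypothetical helper: split "major.minor.patch" on dots and default any
# missing field to 0, so "4.3" yields patchlevel 0 rather than an empty string.
parse_version() {
  old_ifs="$IFS"
  IFS=.
  set -- $1        # unquoted on purpose: let IFS split the fields
  IFS="$old_ifs"
  echo "${1:-0} ${2:-0} ${3:-0}"
}

parse_version 4.3      # prints "4 3 0"
parse_version 4.2.1    # prints "4 2 1"
```

Any later numeric comparison on the third field then sees 0 rather than an empty operand.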
Make Android self-hosting (musl, toybox, qcc).
Smartphones are replacing the PC, and if Android doesn't become self-hosting we may be stuck with locked down iPhone derivatives in the next generation.
Mainframe -> minicomputer -> microcomputer (PC) -> smartphone
Mainframes were replaced by minicomputers, which were replaced by microcomputers, which are being replaced by smartphones. (Nobody needed to stand in line to pick up a printout when they could sign up for a timeslot at a terminal down the hall. Nobody needed the terminal down the hall when they had a computer on their desk. Now nobody needs the computer on their desk when they have one in their pocket.)
Each time the previous generation got kicked up into the "server space", only accessed through the newer machines. (This time around kicking the PC up into the server space is called "the cloud".)
Smartphones have USB ports, which charge the phone and transfer data. Using a smartphone as a development workstation involves plugging it into a USB hub, adding a USB keyboard, USB mouse, and USB to HDMI converter to plug it into a television. The rest is software.
The smartphone needs to "grow up and become a real computer" the same way the PC did. The PC originally booted into "ROM Basic" just like today's Android boots into Dalvik Java: as the platform matures it must outgrow this to run native code written in all sorts of languages. PC software was once cross compiled from minicomputers, but as it matured it grew to host its own development tools, powerful enough to rebuild the entire operating system.
To grow up, Android phones need to become usable as development workstations, meaning the OS needs a self-hosting native development environment. This has four parts:
The Android kernel is a Linux derivative that adds features without removing any, so it's already good enough for now. Convergence to vanilla linux is important for long-term sustainability, but not time critical. (It's not part of "beating iPhone".)
Android's "no GPL in userspace" policy precludes it from shipping many existing Linux packages as part of the base install: no BusyBox or GNU tools, no glibc or uClibc, and no gcc or binutils. Those are all excluded from the Android base install, meaning they will never come bundled with the base operating system or preinstalled on devices, so we must find alternatives.
Android's libc is called "bionic", and is a minimal stub sufficient to run Dalvik, and not much more. Its command line is called "toolbox" and is also a minimal stub providing little functionality. Part of this is intentional: Google is shipping a billion broadband-connected unix machines, none of which are administered by a competent sysadmin. So for security reasons, Android is locked down with minimal functionality outside the Java VM sandbox, providing less of an attack surface for viruses and trojans. In theory the Linux Containers infrastructure may eventually provide a solution for sandboxing applications, but the base OS needs to be pretty bulletproof if a billion people are going to run code they don't deeply understand connected to broadband internet 24/7.
Thus replacement packages for the C library and posix command line should be clean simple code easy to audit for security concerns. But it must also provide functionality that bionic and toolbox do not attempt, and do not provide a good base for. The musl libc and toybox command line package should be able to satisfy these requirements.
The toolchain is a harder problem. The leading contender (LLVM) is sponsored by Apple for use in Mac OSX and the iPhone's iOS. The iPhone is ahead of Android here, and although Android can use this it has other problems (implemented in C++ so significantly more complicated from a system dependency standpoint, making it difficult to bootstrap and impossible to audit).
The simplest option would be to combine the TinyCC project with QEMU's Tiny Code Generator (TCG). The licensing of the current TinyCC is incompatible with Android's userspace but permission has been obtained from Fabrice Bellard to BSD-license his original TinyCC code as used in Rob's TinyCC fork. This could be used to implement a "qcc" capable of producing code for every platform QEMU supports. The result would be simple and auditable, and compatibly licensed with Android userspace. Unfortunately, such a project is understaffed, and wouldn't get properly started until after the 1.0 release of Toybox.
Other potential compiler projects include Open64 and PCC. Neither of these has built a bootable Linux kernel, without which a self-bootstrapping system is impossible. (This is a good smoketest for a mature compiler: if it can't build the kernel, it probably can't build userspace packages of the complexity people actually write.)
Why does this matter?
This is time critical due to network effects, which create positive feedback loops benefiting the most successful entrant and creating natural "standards" (which become self-defending monopolies if owned by a single player). Whichever platform has the most users attracts the most development effort, because it has the most potential customers. The platform all the software ships on first (often only) is the one everybody wants to have. Other benefits to being biggest include the large start-up costs and much lower incremental costs of electronics manufacturing: higher unit volume makes devices cheaper to produce. Amortizing research and development budgets over a larger user base means the technology may actually advance faster (more effort, anyway)...
Technological transitions produce "S curves", where a gradual increase gives way to exponential increase (the line can go nearly vertical on a graph) and then eventually flattens out again producing a sort of S shape. During the steep part of the S-curve acquiring new customers dominates. Back in the early microcomputer days a lot more people had no computer than had an Atari 800 or Commodore 64 or Apple II or IBM PC, so each vendor focused on selling to the computerless rather than converting customers from other vendors. Once the pool of "people who haven't got the kind of computer we're selling today but would like one if they did" was exhausted (even if only temporarily, waiting for computers to get more powerful and easier to use), the largest players starved the smaller ones of new sales, until only the PC and Macintosh were left. (And the Macintosh switched over to PC hardware components to survive, offering different software and more attractive packaging of the same basic components.)
The same smartphone transition is inevitable as the pool of "people with no smartphone, but who would like one if they had it" runs out. At that point, the largest platform will suck users away from smaller platforms. If the winner is Android we can open up the hardware and software. If the winner is iPhone, we're stuck with decades of Microsoft-like monopoly except this time the vendor isn't hamstrung by their own technical incompetence.
The PC lasted over 30 years from its 1981 introduction until smartphones seriously started displacing it. Smartphones themselves will probably last about as long. Once the new standard "clicks", we're stuck with it for a long time. Now is when we can influence this decision. Linux's 15 consecutive "year of the linux desktop" announcements (spanning the period of Microsoft Bob, Windows Millennium, and Windows Vista) show how hard displacing an entrenched standard held in place by network effects actually is.
Why not extend vanilla Linux to smartphones instead?
Copyright 2002, 2011 Rob Landley <firstname.lastname@example.org>