The stat breakage was printing long long as long, which is 32/64 bit type confusion on 32 bit hosts. Of course the real type was hidden by layers of typedefs, which are worse in statfs than in posix's statvfs because the linux structure is trying to hide a historical move from statfs() to statfs64(). But statvfs has the fsid as a 64 bit field, and statfs has fsid as a 128 bit field (struct of two 64 bit fields, and it uses all the bits), so switching from the LSB api to the posix API would truncate a field. Grrr.
Anyway, stat's fixed now and I ran a build of all the targets and half of them broke, with WEIRD breakage. On i486, i586, and i686 the perl build said the random numbers weren't random enough (but /dev/urandom is fine). Sparc and ppc440 segfaulted with illegal instructions.
Four things changed: the kernel version, the qemu version, toybox, and the squashfs vs initramfs packaging. The illegal instructions sound like a qemu problem, the perl build breakage might be kernel version? Sigh. Too much changed at once.
Oddly enough, arm built through to the end. Well of course.
Still trying to get an Aboriginal release out. The lfs-bootstrap control image build broke because the root filesystem is writeable now, so the test whether or not we need to create a chroot under /home and run the build in that isn't triggering.
So I need to add another condition to the test... but what? The obvious thing to do is df / and see if there's enough space, but A) how much is "enough", B) df doesn't have an obvious and standardized way to get the size as a numeric value. You have to chop a field out with awk, which is (to coin a phrase) awkward.
Yes, classic unix tool, standardized by posix, not particularly scriptable.
The tool that _is_ scriptable (and in toybox) is "stat", and in theory "stat -fc %a /" should give the available space... but it doesn't. It gives it in blocks, and how big is a block? Well that's %S, so you have to do something like $(($(stat -fc '%a*%S'))) and have the shell multiply them together (and hope you have 64 bit math in your shell, but for the moment we do).
Next problem: stat is broken on armv5. It works fine on x86, but it's breaking in aboriginal. (Is this an arm thing, a uClibc thing, a 3.18 kernel thing... sigh.)
So now to debug that...
Still banging on Aboriginal Linux. You rip out one little major design element and replace it with something wildly different and there's consequences all over the place...
The ccwrap path logic is still drilling past where I want it to stop (and thus not finding the gmp.h file added to /usr/include because it's looking in /usr/overlay/usr/include in the read-only squashfs mount). I pondered using overlayfs to do a union mount for all this, but that's a can of worms I'm uncomfortable with opening just yet. (Stat on a file and stat on a directory containing the file disagree about which filesystem they're in. I suppose the symlink thing is similar, but one problem at a time...)
Since I was rebuilding ccwrap so much, I decided to make a new "more/tweak.sh" wrapper script to generally make it easier to modify a file out of a stage and rerun the packaging. The stage dependencies are encoded in build.sh using the existence of tarballs (if the tarball is there, the stage build successfully), so it can just delete the tarball before the check and the stage gets blanked and rebuild.
However, I want to manually do surgery on a stage, and then rebuild all the stages _after_ that one without rebuilding that one. (Avoiding a fifteen minute cycle time for rebuilding native-compiler on my netbook is sort of the point of the exercise.) And build.sh didn't know how to do that, so I added an AFTER= variable telling it to blank the dependencies for a stage as if the stage was rebuilt, but not to rebuild the stage. (Sounds simple. Took all day to debug.)
The other fun thing is that system-image.sh is rebuilding the kernel, which is the logical place for it to go (it's not part of the root filesystem, and all the kernels this is building are configured for QEMU so you'd want to replace that when using real hardware anyway), but it's also an expensive operation that produces an identical file each time (when you're not statically linking the initramfs cpio archive into the vmlinux image, anyway).
So I added code to system-image.sh that when you set NO_CLEANUP it checks if the vmlinux is already there and skips the build if so. (The filesystem packages blow away existing output files, the same way tar -f does.) And have tweak.sh set NO_CLEANUP=temp (adding a new mode to delete the build/temp-$ARCH directories but not the output files) to triger that.
So when I finally finished implementing this extensive new debugging mode, it took me a while to remember what problem I wanted to use it on. It's been that kind of week...
And then, when I got "more/tweak.sh i686 native-compiler build_section ccwrap" to work, it put the new thing in bin/cc instead of usr/bin/cc because natie-compiler is weird and appends "/usr" to $STAGE_DIR. So special case that in tweak.sh...
And after all that, it produced an x86-64 (host!) binary for usr/bin/cc, because sources/sections/ccwrap.sh uses $HOST_ARCH isn't set. (Sigh: there's TOOLCHAIN_PREFIX, HOST_ARCH, ARCH, CROSS_COMPILER_HOST... I'd try to figure out how to get the number down but they all do slightly different things, and the hairsplitting's fine enough that _I_ have to look it up in places.
The toybox build is using $ARCH, the toolchain build is using $HOST_ARCH. This seems inadvisable. ($ARCH is target the toolchain produces output for, and $HOST_ARCH is the one the toolchain runs on. They're almost always the same except when doing the canadian cross stuff in the second stage cross compiler. In fact native-compiler.sh will set HOST_ARCH=$ARCH if HOST_ARCH isn't already set, which is the missing bit here.)
Sigh. Reproducing bits of the build infrastructure in a standalone script is darn fiddly. Reminding me how darn fiddly getting it all to work in the _first_ place was...
I switched the Aboriginal Linux stage tarballs from bzip to gzip, because bzip2 is semi-obsolete at this point in a way gzip isn't. There's still a place for gzip as a streaming protocol (which you can implement in something like 128k of ram including the buffer data), while kernel.org has stopped providing tar.bz2 and replaced them with tar.xz.
This gets me off the hook for implementing bzip2 compression-side in toybox. (Half of which is a horrible set of cascading string sort algorithims where if each one takes too much time it falls back to the next with no rhyme or reason I can see, it's just magic stuff Julian Seward picked when he came up with it, just like the "let's do 50 iterations instead of 64" magic constants all over the place that scream "mathematician, not programmer" (at the time, anyway). And I can't use the existing code because it's not public domain, but if I can't understand it I can't write a new one.)
Yes, I'm enough of a stickler about licenses that I won't use 2-clause BSD code in an MIT-licensed project, or Apache license, or ISC... They all try to do the same thing but they have slightly different license phrasing with the requirement to copy their chosen phrasing exactly, which is _stupid_ but if you think no troll with a budget will ever sue you over that sort of thing, you weren't paying attention to SCO or the way Oracle sued Google over GPL code. In theory you can concatenate all the license text of the various licenses you used, which is how the "Kindle Paperwhite" wound up with over 300 pages of license text under its "about" tab. If you ever _do_ wind up caring about what the license terms are, that's probably not a situation you want to be in.
The advantage of public-domain equivalent licenses is they collapse together. You're not tied to a specific phrasing, so nobody bikesheds the wording (which is what's given us so many slightly incompatible bsd-alikes in the first place).
But it's also that toybox isn't about combining existing code you can get elsewhere. If I can't write a clean, polished, well-integrated version of the thing, you might as well just install the other package. If I can't do it _better_, why do it _again_? (That's why I didn't merge the bsd compression code I had into busybox in the first place. I had the basics working over a decade ago.)
So back to Aboriginal: switching tarball creation from bzip to gzip actually made things _slower_. Yes, the busybox gzip compression is slower than the bzip compression. That's impressively bad. (Numbers: running busybox gzip on the uncompressed native-compiler tarball takes 2 minutes 3 seconds. Cat the same data through host gzip, it takes 21 seconds. Busybox is _six_times_ slower. The point of gzip is to do the 80/20 thing on compression optimized for speed, simplicity, and low memory consumption. Slower than bzip is... no.)
For a while now I've been considering how to parallelize compressors and decompressors. I don't want to introduce thread infrastructure into toybox, but I could fork with a shared memory region and pipes and probably make it work. (Blocking read/write on a pipe for synchronization and task dispatching, then a shared memory scoreboard for the bulk of the work.)
In the case of gzip, if I chop the input data into maybe 256k chunks (with a dictionary reset between each one), and then have each child process save its compressed to a local memory buffer until it's ready to write the data to the output filehandle (they can all have the same output filehandle as long as they coordinate and sync() properly).
However, first I'd like to see if I can just get the darn thing _faster_ in the single processor version, because the busybox implementation is _sad_. (Aboriginal is only still using it because I haven't written the toybox one yet. I should do that. After the cleanup/promotion of ps and mdev, which is after I cut a release with what's already in the tree, which is after I get all the targets built with it.)
Did a fairly extensive pass to try to fix up distcc's reliability, tracing through distcc to see why a simple gcc call on the command line was run locally instead of distributed. And I found the problem, in distcc's arg.c line 255: if (!seen_opt_c && !seen_opt_s) return EXIT_DISTCC_FAILED;
Meaning I have to teach ccwrap to split single comple-and-link gcc command lines into two separate calls, because distcc itself doesn't know how to do it. (At which point I might as well just distribute the work myself...)
The new build layout breaks halfway through the linux from scratch build, and the reason is that the wrapper is not compatible with relocating the toolchain via symlinks.
The wrapper does a realpath() on argv to find out where the binary actually lives, and then the lib and include directories are relative to that (basically ../lib and ../include, it assumes it's in a "bin" directory).
I need to do that not just because it's the abstract "right thing", but but because I actually use it: aboriginal's host-tools.sh step symlinks the host toolchain binaries it needs into build/host (so I can run with a restricted $PATH that won't find things like python on the host and thus confuse distcc's ./configure stage and so on). The toolchain still needs to figure out where it _actually_ lives so it can find its headers and libraries.
But in the new "spliced initramfs" layout, the toolchain is mounted at /usr/hda and then symlinked into the host system. So /usr/hda/usr/bin/cc is the real compiler, which gets symlinked to /usr/bin. The wrapper is treating /usr/hda/usr/include as the "real" include directory, but the package installs are adding headers to /usr/include... which isn't where the compiler is looking for them. I created an environment variable I can use to relocate the toolchain, but I'd prefer if it could detect it from the filesystem layout. So how to signal that...
I was thinking it could stop at a directory that also contained "rawcc", but there's two problems with that. 1) libc's realpath() doesn't give a partial resolution for intermediate paths, 2) I moved rawcc into the compiler bin directory where the real ld and such live. (You thought the binaries in the $PATH were the actual binaries the compiler runs rather than gratuitous wrappers? This is gcc we're talking about, maintained by the FSF: it's unnecessary complexity all the way down.) So instead of /usr/hda/usr/bin having rawcc symlinked into /usr/bin, my toolchain has it in usr/$ARCH/bin/rawcc (which corresponds to /usr/lib/gcc/x86_64-linux-gnu/4.6/cc1 on xubuntu and yes this INSANE tendency to stick executables in "lib" directories needs to die in a fire. "/opt/local" indeed...)
I guess what I need to do is traverse the symlinks myself, find a directory where ../lib/libc.so exists relative to that basename(), and then do a realpath() on the _directory_.
Excellent article describing how recessions work, but it raises a question: Why would anyone save at 0% interest? Where's this "excess of desired savings" coming from, why _now_?
The answer is that paying off debt is a form of saving. (It's often one of the best investments you can make, it gives you guaranteed tax free returns at higher interest rates than you get anywhere else.) Having a zero net worth is a step up for a lot of people, burdened by student loans and credit card debt and an underwater mortgage...
But borrowing creates money, because money is a promise and borrowing is makes new promises. When you swipe your credit card the bank accounts the money was borrowed from still show the same number of dollars available in them, and the people you bought things from keep the money you borrowed and spent. That money now exists twice, because your promise to repay the credit card debt is treated as asset in your bank's books. Money is _literally_ a promise, new money comes from people making new promises, and credit cards allow banks to turn individual civilian promises into temporary money.
For the same reason, paying off debt destroys money, by canceling the magnifying effect of debt. When the debt is repaid, that money no longer exists in multiple places, so there is now less money in circulation. The extra temporary money created by securitizing your promise expires when the promise is fulfilled and the loan is repaid. Result: there are fewer effective dollars in circulation, which is a tiny contraction of the money supply.
But debt is not just magnifying existing money: this is where _all_ money comes from. It's promises all the way down, and _nothing_else_. This is the part that freaks out libertarians, who use every scrap of political power they can buy to forbid the government from ever printing new money, so they can pretend the existing money is special and perpetual and was immaculately created by god on the seventh day or something.
Here's what really happens.
A lot of government "borrowing" is actually printing money while pretending not to. A "bond" is a promise to pay a specific amount of money at a specific future date (say $100 in 30 years), which is then sold today for a lower value than the future payback (so the "interest rate" is the difference between the future payoff and the current price, expressed as an annual percentage change). The federal Department of the Treasury regularly issues "treasury bonds", which it auctions off to the highest bidder. (Again, the auction sale price determines the interest rate: divide the amount the bond pays at maturity by the amount it auctioned for and annualize it, and that's the interest rate the bond yields to the buyer.)
The trick is that the Federal Reserve (the united states' national bank) can buy treasury bonds with brand new money that didn't exist before the purchase. When buying completely safe assets (such as debt issued by people who can, if all else fails, literally print money to pay it back), the federal reserve is allowed to borrow money from itself to do so. The Federal Reserve's computer more or less has a special bank account that can go infinitely negative, and they transfer money out of it to buy treasury bonds, using the bonds as collateral on the "loan".
The Federal Reserve doesn't need congressional authorization to create this new money because "cash" and "treasury bonds" are considered equivalently safe assets (issued by the same people even), so it's just swapping one cash-equivalent asset for another, a mere bookeeping thing of no concern to anyone but accountants. At the other end the Treasury Department is auctioning bonds all the time (including auctioning new bonds to pay for the old ones maturing), but these bonds are all made available for public auction where investors other than the Federal Reserve can bid for them, so in _theory_ the federal debt could be entirely funded by private investors and thus the libertarians can ignore the fact it doesn't actually work that way.
This is why the federal reserve can control interest rates. When it's buying the vast majority of treasury bonds at each auction, the price it bids for them determines the interest rate earned on them by everybody else. (If you bid less than the fed you don't get any bonds. If you bid much more than the fed you're a chump losing money, and anyway your entire multi-billion dollar fortune is a drop in the bucket in this _one_ auction.)
So a _giant_ portion of the federal debt is money the government owes to itself. (Not just the federal reserve, but the social security trust fund, which is its own fun little fiction: when social security was created people retiring right then got benefits without ever paying into the system. Current taxpayers paid for retirees, and that's still how it works today.)
This debt created money, and the expanding national debt expanded the money supply not just so the US economy could expand but so foriegn countries can use the US dollar as their reserve currency (piling up dollars instead of gold and using them as ballast to stabilize the currency exchange rates).
The hilarious part is that the federal reserve makes a "profit" due to the interest paid on the treasury bonds. When the bonds mature and get paid back, the Fed gets more money from Treasury than it paid them to buy the things. What does the Fed do with this profit? Gives it back to the Treasury.
No really, that's literally what happens: the federal reserve's profits are given to the government and entered into the government's balance sheet as a revenue source. People are _proud_ of this, even though it's just money going in a circle. The treasury pays interest to the federal reserve which gives it right back, and it's just as good as taxes!
(Half the point of taxes to keep inflation down by draining extra money out of the system after the government spends money the federal reserve and treasury have spun up out of promises to pay each other back. It's a bit like the two keys to launch a missile thing: they have to cooperate because printing presses were too obvious. There are still printing presses, but you have to buy cash from them with money in bank accounts. _New_ money is created in the bank accounts by borrowing previously nonexistent dollars from the federal reserve in exchange for previously nonexistent treasury bonds. Welcome to the ninteenth century.)
Clueless nutballs like Ayn Rand Paul who have no idea how the system actually _works_ constantly attack it because they are incensed at the idea that the money they worship is a social construct rather than a real tangible thing, so they attack the machinery that makes it work to prove that everything will still work without the machinery. (Just like if you stop paying the water bill you no longer get thirsty. Well how do you know until you've tried? As was established in Red Dwarf, "Oxygen is for losers".)
But as I said years ago, money is just debt with a good makeup artist.
So when people respond to us being in a recession (because everybody's paying down debt so nobody has any money to buy stuff with) by trying to cut federal spending and balance the federal budget and pay down the _national_ debt at the same time...
They are IDIOTS, who should not be allowed to operate heavy machinery they clearly do not understand _anything_ about. The government _can't_ run out of money. It can cause inflation if it overdoes things, but right now we've been stuck with the _opposite_ problem for almost eight years. We could do with some inflation. (If you take on a 30 year mortgage expecting 3% annual inflation and get 1%, you wind up paying off twice as much money as you thought you would over the life of the loan. Inflation benefits debtors. That's why creditors hate it so much. They always go on about retirees, but it's billionares doing the lobbying to screw over people with mortgages.)
Oh wow, somebody donated to my patreon.
Sometime last year I claimed my name on Patreon. (My last name is nearly unique: during World War I a german immigrant named "Landecker" decided that immigrating to the US with a german sounding name wasn't a good idea, so he made something up, and my family has been the proud bearer of this tradition ever since. All what, three of us? My father's sister Barbara changed her name when she married, as did my sister Kris, so there's my father, my brother, and myself. Oh, and my father remarried. Four people with the name, of which I'm the only programmer.)
Of course nothing's entirely unique on google, it shows up as a typo in a number of places, one of my brother's friends wrote fanfic using the name for a character, and some random bodybuilder decided to use my last name as the first name of his stage name (so if you do a google image search for it you mostly get him), but eh. Unique enough for most purposes, not a lot of competition for it as login handles, but still worth grabbing if you have any plans for someday caring about the service.
Anyway, I filled out a creator profile on Patreon and did some goals called "Do the thing!" where in exchange for the money I promised to appreciate receiving it, and then largely ignored the thing. I didn't even bother to mention it here or on my project websites or mailing lists. (Once upon a time I proposed crowdfunding on the toybox list. Literally nobody replied.) It's not even the patreon account my household sponsors other patreons through (that would be Fade's patreon, through which I send like a dollar a month each to a half-dozen different webcomic artists).
Over the past year or so several companies have contacted me to ask if I had time to do toybox or aboriginal linux work for them, and I mentioned the patreon each time and they went "we're not really set up to do that". I guess it's not surprising somebody eventually took me up on it, but still... Cool.
(They're strongly encouraging me to work on mdev next. Ok then...)
I've been wrestling with an Aboriginal Linux design change for a couple months now, and it's fiddly. The problem is the linux-kernel build.
In the old design, the top level wrapper build.sh calls:
(This is slightly simplified, ignoring the optional second stage cross compiler, the ability to disable the native compiler, and so on.)
The idea behind the new design is to move simple-root-filesystem into initramfs. Then the native-compiler is packaged up into a squashfs on /dev/hda and gets symlinked into the initramfs at runtime via "cp -rs /dev/hda/. /".
This means the simple-root-filesystem output gets packaged into a cpio image, the native-compiler.sh output gets packaged into a squashfs, and the old root-filesystem.sh script that used to splice the two together at compile time goes away (they're now combined at runtime).
So the new design is:
Yes, I could have made the splice part cp -s the files into /usr instead of /, and thus not had to modify native-compiler at all, but that's less generic. In theory you can splice arbitrary extra complexity into the initramfs from the hda mount, no reason _not_ to support that. (There's a "directory vs symlink" conflict if the new filesystem has an /etc directory, mkdir tries to mkdir /etc" and complains something that isn't a directory already exists there. Of course it's a symlink _to_ a directory so if it just continued everything would work. I should check what the host cp does, reread what the standard says, and figure out what I want toybox cp to do here. But for the moment: the /etc symlink points to /usr/etc so just put the files there for now and it works...)
So root-filesystem.sh, root-image.sh, and linux-kernel.sh got deleted, simple-root-filesystem became root-filesystem, native-compiler.sh got its output moved into a "usr" subdirectory, and system-image once again builds the kernel (which is really slow, but conceptually simple).
The packaging is completely different. The old root-filesystem.sh script goes away, because the simple-root-filesystem and native-compiler output get combined at runtime instead of at a compile time. (This means more/chroot-setup.sh script also has to know how to combine them, but since it's just a cp -a variant that's not a big deal anymore.)
The old root-image.sh and linux-kernel.sh stages used to be broken out to avoid extra work on rebuilds, but I put them back because I don't like describing the design. It makes iterative debug builds take longer, but I can rebuild individual packages outside the build system if I need to fiddle with something many times. (I'm almost always too lazy to bother...)
A lot of optional configuration tweaks the old build supported go away too: ROOT_NODIRS was a layout based on linux from scratch chapter 5, but lfs-bootstrap didn't use it. NO_NATIVE_COMPILER let the build do just the simple-root-filesystem, now you can select that at runtime.
On the whole, a big conceptual cleanup. But a real MESS to explain (mostly because of what's going away), and a lot of surgery to implement.
Happy birthday to me...
Didn't really do anything for it this year. (Last year was 42, that's an important one. 43 means you survived the answer. Didn't ask for what I really wanted either year, because I want other people to be happy more.)
Work ate this week dealing with kernel stuff (adding icache flushing support to a nommu system), and now I'm back poking at toybox trying to remember where I left off. According to (hg diff | grep +++ | wc) I've got differences in 20 files, and that's _after_ moving some of the longstanding stuff like grep -ABC or the half-finished dd rewrite (the bits that broke a "defconfig" build for me) to patches...
But I'd like to ship both sed and aboriginal releases this weekend, and now that sed is in the next aboriginal linux todo item is expr. And expr is weird. Testing the host version:
$ expr +1 +1 $ expr -1 -1 $ expr 1- 1- $ expr 1+ 1+ $ expr 1 + expr: syntax error $ expr + 1 1 $ expr - 1 expr: syntax error $ expr 1 - expr: syntax error
So now I'm staring at the posix spec to try to figure out what portion of this nonsense is required by the spec, and what portion is implementation weirdness. (I think +1 is being treated as a string, -1 being treated as an integer, but I have no idea why nothing + 1 is allowed but nothing minus 1 isn't? (Maybe the first becomes "" + "1", but there's no minus behavior for integers? Maybe?)
One of my four talk proposals got accepted at CELF. Unsurprisingly, they went with "What's new in toybox". (Not the rise and fall of copyleft.)
I'd link specifically to this year's page, but this is the Linux Foundation. They never archive old stuff, it's all about squeezing sponsorship money out of thing du jour and moving on to the next revenue garnering opportunity while history vanishes. Sigh. Oh well. At least the free electrons people record and archive stuff.
The ancient and decrepit version of Bash I've been using in Aboriginal linux, 2.05b, doesn't understand "set -o pipefail". It doesn't have the "jobs" command. And it doesn't build toybox.
I was lulled into a false sense of complacency by the fact that aboriginal uses an "airlock step" where it populates build/host with busybox and so on, and then restricts $PATH to point to just that directory for the rest of the build. So the system should rebuild under itself since it initially built with the same set of tools
The exception to this is stuff called at an absolute path, namely any #!/script/interpreter because the kernel doesn't search $PATH when it runs those so they need an absolute path. (The dynamic library loader for shared libraries works the same way.)
I think this old version of bash _should_ have pipefail and jobs, but apparently the way I'm building it they're not switching on in the config. I don't know why.
Of course I tried a quick fix of switching to #!/bin/ash to see if the various rumors I've been hearing about busybox's shell being upgraded actually meant something, and got 'scripts/make.sh: line 79: syntax error: unexpected "("' which is ash not understanding <(command) redirects. Of course.
I may have to do toysh sooner than I expected. This is not cool.
(Yes, I could upgrade to the last GPLv2 release of bash, but that's not the point. I plan to replace it eventually, upgrading it wouldn't be a step forward.)
Working on switching Aboriginal so simple-root-filesystem lives in initramfs unconditionally. Have native-compiler.sqf live on /dev/hda and splice them together at runtime instead of in root-filesystem.sh. Use initmpfs for initramfs, and have a config knob for whether the initramfs lives in vmlinux or in a separate cpio (SYSIMAGE_TYPE=rootfs or cpio). This probably means that simple-root-filesystem needs to be unconditionally statically linked, otherwise the "delete old /lib contents and replace with new lib" gets tricky. (No, you don't want to bind mount it because the squashfs is read-only so you can't add more, you want symlinks from writable lib into the squashfs.)
All this means run-emulator.sh just gives you a shell prompt, without native toolchain, so move the qemu -hda argument to dev-environment.sh.
While I'm making stuff unconditional: drop NO_ROOTDIRS, it's fiddly and unnecessary. (The idea was to create an LFS /tools style layout, but lfs-bootstrap.hdc doesn't use it.)
Leftover issue: more/chroot-splice.sh needs a combined filesystem and root-filesystem.sh isn't doing it anymore...
By the way, these are the busybox calls left in the basic Aboriginal Linux build:
2 busybox gzip 4 busybox dd 11 busybox bzip2 28 busybox tar 121 busybox diff 215 busybox awk 275 busybox sh 1623 busybox tr 2375 busybox expr 21692 busybox sed
And I'm almost done with toybox sed.
The development category has more commands than that, the Linux From Scratch build and general command line stuff switches on another 20 commands (bunzip2 fdisk ftpd ftpget ftpput gunzip less man pgrep ping pkill ps route sha512sum test unxz vi wget xzcat zcat) but none of that is actually used by aboriginal linux itself. So, approaching a milestone...
Linux Weekly News covered Toybox's addition to Android, (Ok, I emailed them a poke and a couple links but they decided it was newsworthy and tracked down several more links than I sent them.)
Meanwhile it keeps showing up in contexts that surprise me, such as openembedded.
Heh. Grinding away at the todo list...
Finished cleaning up printf.c, back to the sed bugs. Felix Janda pointed out that posix allows you to "split a line" by escaping an end of line in the s/// replacement text. So this _is_ probably local to the 's' command the way the other one was local to 'a', and I don't need to redo the whole input path.
Which is good, because redoing the input path to generically handle this ran into the problem that it _is_ context-specific. If "a\" and just "a" could no longer be distinguished because input processing had already removed the escape, that conflicts with how the gnu/dammit extensions for single line a are supposed to behave. Or that "echo hello | sed -e 'w abc\' -e 'p'" is apparently supposed to write a file ending in a backslash, and then print an extra copy of the line.
(This is all fiddly. You wind up going down ratholes wondering how "sed -e "$(printf "a\\\n")"" should behave, and decide the backslash is _not_ the last thing on the line because the input blocking put the newline into the line, and in that context it's as if it was read in by N and becomes significant... I think?)
So my big laptop (now dubbed "halfbrick") is reinstalled with 14.04, which has all sorts of breakage (the screensaver disable checkboxes still don't work coming up on a year after release, you have to track down and delete the binary), but I used that at Pace and at least found out how to kill the stupid Windows-style menus with fire and so on.
Last night, I started it downloading the Android Open Source Project.
This morning I found out "repo sync" had given up after downloading 13 gigabytes, and ran it again. It's resumed downloading.
Now that toybox is merged into android, I really need to set up an android build environment to test it in that context (built against bionic, etc).
The software on my system76 laptop has been stuck on Ubuntu 13.04 since the upgrade servers went away. (I thought I could upgrade versions after that point, but apparently no.) This has prevented me from poking at the Android Open Source Project, because building that needs packages I haven't got installed (can't install more without the upgrade servers), and my poor little netbook hasn't got the disk spacei, let alone CPU or memory to build it in less than a week.
I've held off upgrading because it's also my email machine, but finally got to the point where I need to deal with this. (Meant to over the holidays, didn't make it that far down my todo list.)
The fiddly bit was clearing enough space off my USB backup disk to store an extra half-terabyte of data. (Yes, I filled up a 1.5 terabyte disk.) then leaving it saving overnight, and now it's installing.
The ubuntu "usb-cd-creator" is incredibly brittle, by the way. I have a 10 gigabyte "hda.img" in the ~/Downloads directory, and it finds that when it launches and lists that as the default image it's going to put on the USB key (but does _not_ find the xubuntu-14.04.iso file), insists that it won't fit, and doesn't clear this "will not fit" status even if I point it at the ISO and select that instead. So, delete the hda.img so it won't find it, and then I hit the fact that the "format this disk" button prompts you for a password and includes the time you spend typing the password in the timeout for the "format failed" pop-up window. I.E. the pop-up will pop up WHILE YOU ARE TYPING THE PASSWORD.
This is not a full list of the bugs I hit in that thing, just the two most memorably stupid.
The printf work I've done turns out to have broken all sorts of stuff because the regression tests I was running were invalid, because printf is a bash builtin! Which means the tests/printf.test submitted to toybox last year is not actually testing toybox printf: it doesn't matter what printf is in the $PATH, the bulitin gets called first.
(Is there a way to disable unnecessary bash builtins? Other than calling the binary by path each time...)
I keep hitting funky bugs in gnu commands while testing their behavior to see what toybox should do. The most recent example:
$ printf "abc%bdef\n" "ghi%sjkl\n" "hello\n" abcghi%sjkl def abchello def
What I was trying to test was "does the %b extension, which isn't in posix, interpret %escapes as well as \escapes?" And the answer seems to be: sort of. Only, badly.
I'm not entirely sure what this bug _is_. But it's not in the printf the man page is about, it's in the printf you get from "man bash" and then forward slash searching for printf. :)
(Well, ok, /usr/bin/pritnf produces the same output. But probably shouldn't, and I don't think I'm implementing that strangeness in toybox.)
Finally fixed that sed bug I was head scratching over so long: it was that I need to parse escapes in square brakets, ala [^ \t], because the regex engine isn't doing it for me. (It's treating it as literal \, literal t.)
Now on to the NEXT sed bug, which is that you can have line continuations in the regex part of a s/// command. (Not so much "bug" as "why would anyone do that? That's supported?"
In theory, this means that instead of doing line continuations for specific commands, I should back up and have a generic "if this line ends with \, read the next line in". (Except it's actually if this line ends with an odd number of \ because \\ is a literal \.)
The problem is, the "a" command is crazy. Specifically, here are some behavioral differences to make you go "huh":
$ echo hello | sed -e a -e boom sed: -e expression #1, char 1: expected \ after `a', `c' or `i' $ echo hello | sed -e "$(printf 'a\nboom')" sed: can't find label for jump to `oom'
In the first instance, merely providing a second line doesn't allow the 'a' command to grab it, the lines need to be connected and fed in as a single line separated by a newline. but in the second instance, when we _do_ that, the gnu/dammit implementation decides that the a command is appending a blank line (gnu extension: you can provide data on the same line). (Busybox treats both cases like the second one.)
I suppose the trick is distinguishing 'echo hello | sed -e "a hello" "p"' from 'echo hello | sed -e "a hello\" "p"'. In the first case, the p is a print command. in the second, it's a multiline continuation of a.
And the thing is I can't do it entirely by block aggregation because 'echo hello sed -e "$(printf "a hello\np")"' is a potential input. (The inside of $() has its on quoting context, so the quotes around the printf don't end the quotes around the $(). Yeah, non-obvious. One more thing to get right for toysh. The _fun_ part is since 'blah "$(printf 'boom')"' isn't actually _evaluating_ the $() during the parse ($STUFF is evaluated in "context" but not in 'context'), then the single quotes around boom _would_ end and restart the exterior single quote context, meaning both the single quotes would drop out of the printf argument and the stuff between them wouldn't be quoted at all if you were assigning that string to an environment variable or passing it as an argument or some such. Quoting: tricksy!)
Anyway, what it looks like I have to do is retain the trailing \ at the end of the line. I have to parse it to see if I need to read/append more data, but then I leave it there so later callers can parse it _again_ and distinguish multiline continuations.
Sigh. It's nice when code can get _simpler_ after a round or two of careful analysis. So far, this isn't one of those times, but maybe something will crop up during implementation...
So the kernel developers added perl to the x86 build again, and of course I patched it back out again. Amazingly, and despite this being an x86-only problem, it _wasn't_ Peter Anvin this time. It was somebody I'd never heard of on the other side of the planet, and it went in through Thomas Gleixner who should know better.
The fact my patch replaces 39 lines of perl with 4 lines of shell script (and since the while shell script fits in the makefile in place of the call to the perl script it only adds _2_ lines to the makefile) is par for the course. I could clearly push this upstream as another build simplification.
But I haven't done so yet, because I really don't want to get linux-kernel on me anymore. It's no fun. I'm fighting the same battles over and over.
There are a bunch of patches I should write extending initmpfs. It should parse the existing "rootflags=" argument (in Documentation/kernel-parameters.txt) because size=80% is interesting to set (the default is 50%, the "of available memory" is implicit). I should push a patch to the docs so the various people going "I specified a root= and didn't get rootfs as tempfs because that explicitly tells it not to do that" have somewhere I can point them other than the patch that added that to go "yeah, you don't understand what root= means". I should push a patch so CONFIG_DEVTMPFS_MOUNT works for initmpfs (the patch is trivial, grep -w dev init/noinitramfs.c shows a sys_mkdir("/dev") and sys_mknod("/dev/console") and do_mounts_initrd.c has a sys_mount() example too, this is like 20 minutes to do and test.
But... It's zero fun to deal with those guys anymore. It's just not. The Linux Foundation has succeeded in driving away hobbyists and making Linux developent entirely corporate. The "publish or perish" part of open source is still there, just like in academia. But I'm not hugely interested in navigating academic political bureaucracy for a tenure-track position, either.
Sigh. I wrote a series of articles about this a decade ago. The hobbyists move on, handing off to the 9 to 5 employees, who hand off to the bureaucrats. I'm not _surprised_. I'm just... I need to find a new frontier that doesn't involve filling out certification forms and collecting signatures to navigate a process.