Rob's Blog (rss feed) (mastodon)

2021 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2002

November 22, 2022

Broke down and walked to the table at the UT geology building. (No sign of the killer robot Elliott warned me about. So far that seems to just be a San Francisco PD thing.) There's still no outlet, but my laptop has a decent battery and can start with a full charge. And technically I could go to the courtyard of the... biology building I think it is? The one down Speedway past the vending machines... where there's an outlet I can charge at if I actually do run it down. Although the charger still has the overheating problem with the bigger battery if I use the laptop while it's charging. (Dell charge controller BIOS needs an update or something; as far as I know they only ever shipped Windows tools to do that, and it's old enough to be out of support so I'm guessing "no".) I could buy a higher wattage charger, but this one still technically works...

Fade posted a new chapter in her In Nomine fanfic on AO3, and I want audiobooks because eyestrain (focusing on stuff far away is an advantage of the walk; admittedly not so much while watching anime, but for BOOKS...). So on the walk I tried to get my phone to read it to me.

Here's the FIRST bug report I sent to the Android devs via crash pop-up:

You don't have a text to speech app in Android. You have Google speech services, but I have to install third-party adware to actually have it speak pages at me. So I cut and pasted the contents of a web page and tried to send them to Google translate to get it to speak it out loud. It crashed. And then the error reporter crashed trying to report it.

I gave up and installed "Speechify", which is spectacularly badly programmed. It has like 10 minutes of intro text it wants to read at you with no way to skip it (I turned the volume down and let it work through while I wasn't looking at the screen). But it turns out if you TRY to skip it, you screw up focus for stuff later: after reading its unskippable intro speech it insists on getting a first name from you, and the first time I tried it the text wasn't going into the input field because I'd tapped stuff earlier. The Android on-screen keyboard will happily send text nowhere if nothing has focus. There was no way to switch focus TO the input field (the app had locked focus changes out, but didn't ensure the field HAD focus first), so I had to kill and restart it (which made it re-read all that intro text yet again). After the name it makes you performatively select which voice to use and which speed to read at (despite these being changeable later, it has no defaults and makes a big deal of walking you through a WIZARD that delays actually getting to use it even longer). Once again it got focus wrong (I didn't wait long enough for it to STOP TALKING) and it had to be restarted a third time, reading through all the intro nonsense AGAIN...

Here's the SECOND bug report I sent Android's marketplace crash thingy once I plowed through all that:

I installed one of the third party shareware text to speech readers with the in-app purchases. It crashed when I sent it text, and your reporter crashed again the same way. Just like Google translate.

Why do you even have text to speech capability in the OS if you won't use it?

This crash was similar to a web page mastodon linked me to yesterday that had a "listen to article" button that started reading through the page, paused a longish time when it hit the "please subscribe" pop-up before the body of the article, and then the chrome tab crashed when the reader reached the end of the pop-up. (Redid it three times, always crashed the same way.) Because of that, I guessed that doing "select all" on the AO3 page was making Android's cut-and-paste plumbing Get Unhappy about the block of navigation links and punctuation and such at the top. So I carefully selected just the BODY of the chapter with the little start and end brackets, and instead of sending it to the other app I opened a new file in the app and told it "copy from clipboard". And THEN it was willing to read the text, mispronouncing lots of words.

Except I ran out of my 1500 "free premium words" (despite DECLINING their free whatsis offer twice) less than halfway through the chapter (which wasn't even the new one Fade uploaded, I tend to reread some context before going into the new stuff), and the voice suddenly changed to a much louder and more grating one. Because capitalism. This was after three different pauses in reading (pop-ups trying to get me to subscribe and rate the app). It's better about working "screen off in my pocket" than prudetube, but not by much.

Yes, I'm still using an official Google Pixel phone instead of a version of Android modified by a phone vendor. Stock, unmodified image. I break everything.

November 21, 2022

Oh hey, Tumblr is adding support for the same protocol Mastodon uses (and a half-dozen other apps, technically it's the "fediverse") so you can subscribe to tumblrs from mastodon and add mastodon accounts to your tumblr feed. Cool.

The reason ASAN=1 make test_command builds in ASAN support but doesn't use the unstripped version of "command" is a nasty tangle of shell inheritance.

The theory is scripts/ has an "if $ASAN" block that adds stuff to $CFLAGS, but it unsets $ASAN inside that block so future calls to portability.h won't redundantly add the same stuff to $CFLAGS. It also sets NOSTRIP=1 (telling scripts/ to cp generated/unstripped/$FILE up to the top level instead of washing it through the strip command) but doesn't _export_ it. Which in theory is fine because either we inherit the local variable or the change to ASAN gets dropped when we return to the parent context.

But in the make test_command case, Makefile exports CFLAGS (initially with an empty value) and calls scripts/ which sets CFLAGS in ./configure and does "source scripts/" (thus parsing the if ASAN block, switching on NOSTRIP and unsetting ASAN), and then calls scripts/ as a child process... which inherits the updated CFLAGS (because Makefile exported it) but since we unset ASAN that isn't passed through to the child process (it had been exported, but we deleted it when we consumed it), and the NOSTRIP is just a local variable that was never exported so that isn't passed through to the child either.

So it's not that ASAN and NOSTRIP get out of sync, it's that CFLAGS and ASAN/NOSTRIP get out of sync. I don't want to update CFLAGS twice, but don't want to lose the NOSTRIP change. And only one of those is exported.

Hmmm. CFLAGS is exported in the Makefile because when you say "make CFLAGS=abc" that needs to be provided to scripts/ It's a UI thing: make variables only become environment variables in children when exported. But that means the behavior differs between scripts/ sed and make test_sed, which is where this ASAN weirdness came from. (It HALF works. ASAN is enabled, but in a stripped binary.)

To fix this, I tried converting the various scripts/*.sh handoffs so they all "source" the relevant scripts, ensuring the local variables marshalled through... but then make test_sed failed because set -o pipefail in caused tests that pipe into head to exit with an error (because the producer got cut off). And even when I've just introduced a bug myself, I have to stop everything and root cause it before it scurries under the fridge: I never know if I introduced it or just REVEALED it, and backing out the change means the bug stops reproducing without being understood, which is a BAD THING. Doing this soaks up time and energy, but I gotta do it to produce code I can actually use. (I really do break everything.)

Once again, lifetime rules are the hard part. I _could_ just switch pipefail off again after sourcing the other files, but that smells brittle: arbitrarily polluting the environment gives future changes a wider scope than they should have. (Alas mksh does not have "export -n" or "declare", so REMOVING the export property from CFLAGS is awkward; I have to either unset ASAN or test whether the specific flags have already been added to $CFLAGS so I don't redo that, while still setting NOSTRIP again...)

Of course I'm working around debris in my tree while I do this. I have a bunch of accumulated changes in my scripts/ directory I'd like to check in, but... I removed the HOSTCC change in ASAN_FLAGS so I can actually use it locally: yes I know kconfig is funky, I need to replace it, but those binaries DO NOT SHIP. Various people want to run ASAN against temporary build tools because purity, but I should just check it in anyway since it's what I've been testing against and without it ASAN breaks kconfig. (And I refuse to modify kconfig more for license reasons: it gets REPLACED not fixed.)

Another change: my current calculates $PART2 of the prefix tuple differently, and I can't find a blog entry about why. The file is dated July 5, but blog entries around there aren't saying much. If $PART2 doesn't start with a - it prepends linux-musl to it, moving that out of the assignment to $TARGET. The advantage is that if you provide a $PART2 starting with - it won't say "linux-musl", so you can make toolchains with more flexible names. Except the patch I have can't work, because it's still putting a dash between PART1 and PART2 when assembling $TARGET, so you'd get two dashes if you provided your own in PART2. I needed this for something while fiddling with a toolchain build 6 months ago, but it's a half-finished change that can't work as-is and I've lost the context for it. Oh well: redo it if I cycle back around to doing that thing and thus recall WHY I needed the change...

I want to keep the rewrite I just did to have scripts call other scripts, because calling back into make is a layering violation: make is supposed to be a purely optional convenience wrapper, the actual work is in shell scripts. But having the scripts SOURCE each other also seems wrong (I don't want this script to need to care what was or wasn't exported three scripts up). Which means I need to export the data that gets used by child processes, and the proper place to do that is... maybe the top level "configure"? The question is, should everything defined in there be exported? Hmmm... I THINK that's ok? This is all my code being run. Specifically, (intentionally) does NOT call ./configure or (It does an env -i purge then sets a minimal set of environment variables, none of which are set out of sight or arbitrarily exported, because who knows how third party package builds will react?)

Eh, exporting a bunch of stuff from configure is intrusive though. The ASAN test block in needs to export the variables it's modifying (so the updated values get inherited) if it's gonna unset ASAN (to avoid duplicate appends to those variables). That's probably the correct fix.

Coding is easy. Figuring out what the code should DO is hard.

November 20, 2022

Fun little corner case: a while back I hacked a "-q" option into toys/example/demo_number.c which just forces a segfault via *(void *)0 = 0; and I added a call to it in tests/demo_number.test to make sure the test infrastructure caught it. It appends a line "exited with signal (or returned 139)" to the recorded output, which does not match the otherwise empty expected output of the test, so the test fails.

Except now I want a test that segfaults in the SUCCESS condition, because I'm trying to test that my plumbing is catching it and handling it properly. Yes I have manually tested that the test suite catches segfaults (even when they don't otherwise modify the output), but I want to add it to the regression tests because I break everything and I don't trust it to STAY working. While it's easy enough to just add the line to the expected output (in which case it passes), bash itself outputs scripts/ line 137: 26061 Segmentation fault "demo_number" -q 2> /dev/null which is kind of annoying and not something I can apparently suppress within the test itself? Several different gyrations of redirecting stderr didn't do it.

Apparently bash won't report segfaults if you pipe the output somewhere else, so a gratuitous | cat at the end shuts it up... but also discards the return code so my script doesn't catch it either. (Plus dmesg is filling up with gratuitous "Oh noes, a SEGFAULT occurred, here's a hex dump!" which is utterly useless at the best of times... ah, easy enough to disable: echo 0 > /proc/sys/debug/exception-trace. Tempted to add that to my rc.local. I don't think mkroot's kernel config has that enabled in the first place.)

If I add a trap "echo no" SIGSEGV; to the start of my test it then prints just "Segmentation fault" instead of the run-on sentence, but it still prints it, and doesn't run my trap. Stackexchange seems to think bash just gets a SIGCHLD with an exit code, so I could intercept that... except when I do, the SIGCHLD handler gets called _after_ bash has printed the warning spam. I do not want bash to handle this AT ALL. I want to handle this myself. Stop taking control away from me, I do not want to argue with a hammer. A tool that demands attention is a bad tool.

I don't want to ask Chet about this. Partly because I don't want to bother him, and partly because I'm dreading one of those "stop wanting this" replies, although mostly I expect that's a reaction to the Mastodon maintainer's insistence that wanting quote tweets or a URL listing a user's Favorites makes you a bad person. "You're using it wrong" is seldom the correct answer.

November 19, 2022

My "cannot focus" has turned into a headache and runny nose. Whee. It's still not bad enough to call myself exactly sick, but as Seanan McGuire once tweeted, "I cannot brain today, I has the dumb."

Slept until noon, which didn't obviously help. Trying to push through it...

Heh. If I run the current-ish version of bash in a deleted directory (mkdir -p sub/sub; cd sub/sub; rm -rf ../../sub; env -i /bin/bash) the result is CHATTY. And I don't know if "job-working-directory: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory" is from bash or from libc, because I seem to be getting it trying to run readlink and ls, and I'm pretty sure neither of those is a bash builtin. (For all its faults, bash is not systemd.) Meanwhile "which readlink" gives me two different irrelevant error messages and THEN the right answer...

Ah, the big long nested redundant error message IS from bash, any time it tries to launch a new process: echo doesn't print it (builtin), /bin/echo does (child process).

So in an existing directory, Debian's realpath -m --relative-to=does/not/exist also/missing prints ../../../also/missing but in a non-existing directory it complains that does/not/exist doesn't exist despite the -m. I'd kinda wondered how it would handle getcwd() failure in that context. (My previous attempt at this calls xgetcwd(), so it doesn't produce any output except the error message when you're in a nonexistent directory, but that's not correct for readlink -f /proc/self/exe and such. Again, I need the mkroot test environment finished before I can properly TEST this...)

I was also going "will the kernel have a /proc/self/cwd" and yes it does, and it says "/path/to/my/sub (deleted)" which isn't helpful because that's in-band signaling: you could legitimately have a directory named that! (Deleting the cwd entry out of proc would be a clever way to signal you haven't got one, but no...) Still, that's the kernel fscking up, not my immediate problem...

November 18, 2022

Blah, I have just been under the weather for a couple days now. Dunno if this is fallout from travel, from five vaccinations in three weeks, or I've just come down with something.

Cannot focus at all today.

Watching twitter burn. Neil Gaiman and George Takei have joined us on mastodon, and people are posting threads explaining how to legally insulate your own mastodon instance if you offer free hosting to third parties: you can apparently do a proper legal moat for less than the web hosting costs. (Modulo the optional recommendation to incorporate an LLC. Which I've done twice already, and if Texas is being stroppy I should just do a Delaware one I guess...)

November 17, 2022

I've accumulated a number of things that are hard to finish because there's sort of a summit to get over. I think I know what I want to do with diff.c now and just need to cycle back to it, but there are users for whom it's already load bearing and I'm proposing swapping out the implementation. Playing with live wires there.

On the deflate.c compression side, yes the algorithm requires a certain amount of sustained focus to maintain mental context, and sure there are a number of potential users like zip that I can't really start writing until I've got an implementation of this plumbing, but the actual BLOCKER way back when was trying to exactly match the output of other implementations (at least for the default and -9 compression levels, because people hash tarballs all the time) without reading their source: there's no documentation on when to flush dictionaries. Or when to use the default synthetic dictionary vs the calculated one for a block: slight loss of encoding efficiency for the fake dictionary vs the savings from not having to write a dictionary into the stream. What, do you just do it twice and use the smaller one? That seems expensive for an algorithm that's still popular as the 80/20 of encoding strategies: it's lightweight enough to implement in hardware if necessary. Making it slow removes half the point.

Deflate extract doesn't care about this because it doesn't make DECISIONS here, just plays back what the encoder did. There are multiple valid ways to encode the same data, and finding the smallest possible one presumably involves basically fuzzing (re-encoding the data thousands of times), for fairly small gains. (Somebody implemented that once, by the way. It wandered by years back. The advantage is it's the same decoder so everything already supported it.) Dictionary reset is more or less the "when do we insert a new keyframe" problem in video. I should probably just examine some large historical tarballs and try to grok the resets and mode selection I guess?

A big tangle of changes I need to finish and check in is redoing lib/passwd.c, which gets rid of lib/portability.h (which was a horrible instance of technical debt that should not be allowed to repeat), and also removes the dependency on at least shadow.h which is a layer of scar tissue reimplementing most of the libc password plumbing (one which bionic doesn't actually have, hence its position in /lib/portability.h inside a __has_include() wrapper).

The reason /etc/shadow was invented is that back in 1970 /etc/passwd was created world readable (chmod 644) so things like ls and ps can map UIDs to names without requiring special permissions. The file contains lines of colon separated text, each of which is username:password:uid:gid:display name:/home/dir:shell, and all those fields EXCEPT password are still maintained in their original places so all the tools that weren't asking a user for their password didn't have to change.

In Bell Labs' original unix, the passwords in the second field were hashed, and at the time those hashes were computationally unbreakable: the calculation of each hash was expensive enough that a brute force attempt to try all possible passwords would take centuries. But computers increase in power exponentially, and despite swapping out the original "crypt" algorithm for DES in the 1980s, then md5 (22 characters) in the 90's, then sha256 (43 characters) and sha512 (86 characters)... (my man 3 crypt is out of date; I believe type 7 is scrypt these days?) it wasn't enough. No matter what the hash is, the human typeable password space isn't THAT big. Even by the late 1990s you could have a beowulf cluster precompute rainbow tables, so if you could grab a site's password hashes (if a web page ad escapes its sandbox, /etc/passwd is a small file to upload) it's pretty much game over. Even adding salt to chaff the rainbow tables just makes the tables' storage requirements bigger... no hash was good enough. Since then we've added pervasive SMP, leaseable cloud instances, and hardware accelerated hash calculation... the only way to secure a site's passwords is to prevent outsiders from getting your hashes (and ratelimit login attempts). All the hashing really does is prevent casual snooping, and maybe give you a little time to change your passwords WHEN an adversary gets them.

So /etc/shadow was invented, for the sole purpose of moving the password hash into a file only root could read, requiring SUID root tools to read as well as write it. And it was done STUPIDLY. Instead of just being a "UID:passwd" mapping in the same colon separated text format, the new file had NINE FIELDS for some reason (more than /etc/passwd!). And instead of updating getpwnam() and putpwent() and friends to transparently handle both files (presumably with the get functions leaving pw_passwd NULL or blank when you're not root), the old "passwd" field got turned into "x" to signal the need to look in /etc/shadow, and it became the application's problem to do MORE work to support the new format. (And yes, this means you can have SOME hashes in shadow and some still world readable in passwd.) A whole new set of functions got layered on top to access the new fields out of the new file, with it being the caller's job to collate them and make sense of it. (I worked with the person who did this when I worked at JCI, by the way. Ego the size of a planet.)

My lib/passwd.c rewrite gets rid of the need for shadow.h, and just fiddles with two colon separated text files directly using the same code. (Well, four counting /etc/group and gshadow.) But a lot of other stuff uses libc's original password plumbing, and I've even wrapped it with xgetpwuid() and xgetpwnam() in lib/xwrap.c and bufgetpwnamuid() and such in lib/lib.c (to cache recently seen entries in a simple linked list to avoid re-fetching them from the file, so things like ls -l linked against musl don't thrash redoing the lookup: yes linux has disk cache but all the extra syscalls and text file parsing are unnecessary).

I'm trying to make sure that individual apps don't mix both sets of plumbing, because I don't trust the libc stuff NOT to cache, and thus retain stale data I've just updated. But at the same time, reimplementing libc infrastructure and converting everything over to the new one is seldom a good idea. When I did it for lib/env.c I wound up regretting it because yes any conventional environment variable update is a memory leak, but when I thought I could use that plumbing for sh.c I turned out to be wrong, and reimplemented it there anyway because it has a bunch of status flags and nested layers of env plumbing (local variables in function context) that just does not work like environ[] does. Infrastructure in search of a user sucks even when you have the user in mind and just haven't written it yet. All the existing users that were historically happy with the libc API were not motivations for the new API, and I shouldn't move them over to it just because it exists. (Yeah, cleanup and collate... but not away from what's still there in libc?)

So I'm unsure how much of the existing libc passwd plumbing to keep. Conventional unix has a cleanly defined /etc/passwd file containing colon separated human readable text, and my current reimplementation of lib/passwd.c manipulates colon-separated text /etc files directly in the file formats defined by man 5 passwd and man 5 shadow, because doing it for shadow and for passwd is 90% the same code already so it might as well be consistent. I've long punted on pam (pluggable authentication modules) and friends: if you're using shared network logins via kerberos this is not the toolset for you. (That teeming pile of dlopen nonsense is what broke static linking against glibc in the first place.) But just local /etc/files support should be plenty for a simple chroot environment you can build stuff in, including the "posix containers" I've made puppy eyes at the Android guys about for years but which remain moot until I have a working build environment to use in them.

I really don't care how/whether it works on MacOS. I suspect they're storing user data in something like the windows registry, a database only accessible through One True Library that mere humans can never look directly at without replaying the Ark of the Covenant scene from Raiders of the Lost Ark. But I haven't checked either, because I can't have a MacOS test VM on my Linux box. Proprietary OS tied to proprietary hardware, Darwin died long ago. They want to exclude developers who haven't paid to play, I am thereby absolved from caring about them.

For a long time Android's solution to user logins was to not have them: every installed app has its own UID outside the normal "user" paradigm. (Once upon a time I vaguely recall getpwnam() being just a stub emitting a warning about being unimplemented.) These days Android seems to MOSTLY use the classic text format, but bionic has some curve balls such as an AID_APP_START nothing should be below? (Which I can't find a definition of under aosp/bionic but is defined in system/extras/ioblame/ as 10000.) And I mentioned the warning messages about my username not having a "system_" prefix when I linked ls against bionic. (Maybe that's what it wants under APP_START?) Also their /etc/passwd is a symlink somewhere else: my code is deleting the file and doing an atomic mv to update it (plus Stupid File Locking for historical reasons), which would break the symlink. And reading symlinks to see where /etc/passwd has gone seems DEEPLY CREEPY at a security level? (Did I already add a .config string to point toybox at the actual location at build time? It's been a while since I last touched the new plumbing...)

But the actual blocker for merging my lib/passwd.c rewrite back when I implemented it is that many different commands use this plumbing, and I changed the lib function arguments so I'd have to update all callers at once. I converted passwd, su, login, and mkpasswd enough so they'd compile with the new stuff locally in my tree, but have not yet converted all the stuff in pending: groupdel, chsh, sulogin, groupadd, userdel, and useradd. Which kind of need to be tested and promoted at the same time, and the blocker to THAT was always the lack of test environment (I do not muck about with my dev machine as root thanks). Now I've got mkroot far enough to do that in at least some contexts that CAN be unblocked, but is still a lot of work...

November 16, 2022

Trying to settle back into a programming rhythm but the table setup outside the UT geology building seems permanently dismantled. It's not that the tables are still against the far side of the space, it's that they built a giant plywood box where the outlet was. (For no other obvious purpose than blocking the outlet?)

Hanging out at the Wendy's in Jester Center, but that closes at 11 and the music is turned up to "discourage lingering" volume. (I need noise cancelling headphones.)

Sigh. I need to confront the "home office" problem. (The clingiest of cats is 19 years old, but quite healthy. We'll probably sell the house before I have a cat-free space. I used to view cats as training wheels for kids, but Fade just never got pregnant and I turned 50, and as much as I love cats I'm tired of the endless demands for attention. The cat will never outgrow this. There is no process, we're not working TOWARDS anything.)

November 15, 2022

Flying back to Austin. Not likely a busy programming day either.

Given the option of a pile of free movies to watch on the plane... I'm not interested in any of them? The post-endgame marvel stuff didn't hook me because I don't have Disney's Google+ channel so I missed enough not to care about follow-up movies. I didn't see Scarlet Witch's TV series so the Dr. Strange sequel's a pass. I didn't see Loki so Thor Love and Thunder's meh. The mouse tightened its grip and I slipped through their fingers. More money from fewer people, standard capitalist upward retreat. Same for Light Buzzyear and other Disney properties: I don't perceive it as a standalone property, just a 2 hour ad for Disney+. (I almost want to see Turning Red, but not enough to watch it right now.)

Elsewhere: Shazam was the last movie I saw in theatre before the Drafthouse's acquirer out of pandemic bankruptcy (downside of taking care of your people in a capitalist hellscape: running out of money) scrubbed away the last trace of individuality. Heard bad things about WW84 and it got sucked into that Snyder Cut thing that's an active turnoff. (They kept Ezra Miller but cancelled Batgirl. That's a hard no.) I think I saw Detective Pikachu already... (Or maybe was just spoiled that Ryan Reynolds was Luke's father? I remember surprisingly little detail from this movie, but was never really a pokemon fan.) Warner's milking-a-dead-cow trilogy Fanatic Beasts, Crimes of Crinkly Bindlewurdles, and Secrets of Dorbledumb: I didn't even see the last non-prequel movie they gratuitously split in two so they could fit in all 300 pages of camping in the woods, and that was before the author publicly joined slytherin. I'm ordinarily into the DC Animated Universe (there were two canonical bat-voices: Adam West was the Bright Knight: batman as paladin. Kevin Conroy was the Dark Knight: batman as Batman) but the animated super-pets poster is giving me "boss baby" vibes. Or maybe that "cats vs dogs" movie? In brief: avoid. The Minions-prequel-prequel and Joker-as-maga-incel (the edgiest of edgelords) I might pay money to AVOID having to see.

So instead I watched more episodes of Realist Hero on my phone via Hulu's download function. (I think Crunch has started to let you download, but I haven't tried it yet. They didn't used to and it's not habit there yet. By the time I thought of it the plane was taking off.)

Hulu's subtitle plumbing uses a HUGE font size. I usually zoom the font but this is excessive even for me. And it ignores your system setting for font size preference: I shrunk it all the way in system settings, killed and restarted the app, no change. Hulu also puts a black rectangular background behind the subtitles, which means the subtitles are taking out an unnecessarily large chunk of the video even in landscape mode. In portrait mode it's sometimes overwriting half the video, and in picture-in-picture mode it completely overwrites the video and usually the top of each subtitle set is off the top of the window. Hulu's app is really badly implemented, is what I'm saying. I guess they checkboxed this as a disability feature, did minimal legal compliance, and moved on.

I'm impressed by how small a budget Realist Hero was animated on. Not a complaint! They tell a compelling story, and it took me a second watch through the first 12 episodes and then a new episode to notice (and even then it was while watching the first new episode on a big screen with Fade that I actually noticed), but... they only actually make a couple hundred new drawings per episode? They get huge mileage out of panning and zooming over still frames. Sometimes using the two layer effect where a drawing moves slightly over an out of focus background drawing to show depth; ninety years ago Disney had really expensive cameras made to do that for Snow White and now that's a trivial digital composite milking five seconds of footage out of two drawings. When characters do move, a given shot will usually only animate mouths or a tail or something. Actual motion in a shot is generally about 4 frames from one still to another (sometimes repeating). The individual drawings are good, and they're great at hiding the tricks (again, to the point I didn't notice for the longest time). They switch to a new "shot" (completely new drawing from a different angle) every few seconds so it doesn't get stale, but they don't even farm out interstitials. The ~4 keyframes they draw to show full body motion display in sequence... which actually draws attention AWAY from the static nature of the rest of the shot? Smoother movement would make settling into stillness seem odd. It works, and they clearly know what they're doing, but this is Hanna-Barbera levels of cheating and the voice actors are carrying the action on their backs.

November 14, 2022

Going through old patches to try to get them off the todo heap. It's a pain trying to reverse engineer, from a pre-pandemic patch, the test case which should have gone with it. This is not a finished change, what was I trying to DO?

Especially when PARTS of the patch have since gone in, but other parts haven't. (This command was since made mayfork, so this work was not preparing it to be mayfork. Was this the Wrong Fix for an issue that already got fixed? It's dated 2 days after a related-ish commit so I think I had something left over at the time? But what was the TEST?)

My normal flow, which I've been avoiding, would be to grind away at toysh until it's ready to go in, both because it blocks so much and because of its shared infrastructure. I want to promote "expr" out of pending, but I ALSO want it to share the $(( )) logic in sh.c.

The $(( )) use case works on a single string (breaks it up into operators itself) and treats strings as environment variables to read from or assign to. The expr use case works on separate strings (each constant or operator is its own argument) and half the operators work on strings (ala "abc" + "def" becomes "abcdef"). But what's really shared between them is the operator precedence rules: * happens before + and ( ) are weird and so on. Having that in two places basically ITCHES.

But $(( )) interacting with shell variables (with all the local/export/readonly tagging) makes splitting the recalculate() plumbing out into lib/ awkward. But I can't promote an expr merged into sh.c without promoting sh.c: it's either in pending or it's in posix.

One discomfort I had with expr.c was the int vs str struct and associated tests, and staring at it a bit I went "the bottom bit of a string is always 0 because the pointer has to be aligned", so if I set the bottom bit in integers and do a one bit shift around math, I could use that bottom bit to distinguish between them and peel off the string operations first, falling through to the int logic with a one bit shift (so 63 bit math instead of 64 bit math, but presumably that's enough range; posix doesn't specify the integer size here at all and is happy with 32 bit math, which even 32 bit platforms mostly don't use for shell calculations on Linux anymore; I might have had something to do with that back when my posix removal patch series replaced the clock tick calculation with a shell script before redoing that with C, before Peter Anvin inflicted bc on people)...

Ha. Wait. That's not true for environment strings, only for malloc() returns. Environment strings (including argv[]) are collated one after the other with null terminators and no padding, so the start of the string is an arbitrary offset within the block and strings after argv[0] CAN have the bottom bit set. I'd have to strdup() them. Or the test is if (!(str&1) || (str >= argv[0] && str < environ)) which is still reasonable for a prefix and/or wrapper function. Although if it's a wrapper how does recalculate() know which function to recurse to...

Speaking of shell math, assignment suppression is wrong. It has to persist into recursive calls: echo $((0 ? (x++) : 2)); echo x=$x should NOT print x=1 at the end.

November 13, 2022

Sigh, I didn't even OPEN my laptop today. (Yeah, it's sunday, but still. I MEANT to get a lot of work done.)

Headed out in the morning with laptop to Penzey's (spice store) since the one in Austin closed during the pandemic but there are two here in Minneapolis. It's a 40 minute bus ride, and back when I commuted on the bus to and from Pace I got half my toybox programming done on the bus. This time the destination is a block from a McDonald's (which also hasn't got a surviving location near me in Austin anymore: strangely enough Texas mitigated against covid way less hard than sane states, but more stuff seems to have closed), so I brought my laptop and... watched phone videos on the bus instead of programming.

Bought twice as much as I expected at Penzey's. There were actual humans there I could describe a spice to (one I got one time but can't remember the name of) and they told me where to find it on the shelves. Afterwards, McDonald's was a huge disappointment: the smallest combo is $10 now? The "dollar" menu only has a single $1 item left on it (small soda) and the McDouble variants it used to have are now all over $3. (So 300% inflation since I was last there.)

Left again without getting anything and had my sister pick me up (Penzey's is halfway to her place and this is my last weekend in town). She has a PC under the TV in the living room whose windows install finally collapsed in on itself, so when I was there last week I offered to install Devuan on it and see if it could be revived without buying a new PC. (Which isn't cheap around here.)

The problem is they had a USB wifi dongle with no in-tree Linux support, instead it's download and compile a driver from source. I could plug in my phone and tether to download extra stuff, but the external module build wanted kernel headers under /lib/modules/$(uname -r)/build. In theory to get those you install the linux-headers-$(uname -r) package... which didn't exist in the repo because the oldest version it had was newer than the kernel the devuan install media I had put on the box, and "apt-get update && apt-get upgrade" did not update the kernel version. And installing the headers package does NOT pull in the relevant kernel as a dependency, because of course it doesn't. And there's no linux-kernel-* packages in the repo.

Last week it was late and I told them the easy thing to do was just buy a USB dongle Linux would support, so my sister ordered one off Amazon with "linux" in the descriptive text. Except what arrived was a cheap chinese brand with no mention of Linux on the box or in the pamphlet. It came with a micro-CD with the Mac and Windows drivers on it (and a PDF version of the exact same pamphlet), but no Linux driver. Ok, tether my phone again, Google, and... their website had a downloadable linux driver. Which is a zip file that turns into a source directory, within which make dies with the same lack of /lib/modules/$VER/build.

Ok, if I can't find the old headers, how do I force it to upgrade the kernel? I wanted to avoid this because if the kernel version ever upgrades AGAIN it will leave the built-from-source driver behind, but I also want it to WORK. Alright, dpkg-query -S /boot/vmlinuz* because obviously civilians are expected to know that, and the magic name I should have known to look for is linux-image-$VER. Right, manually force it to install a new version of that package which matches a headers package still in the repository on the server (the newest one available on the server is older than the one my laptop's devuan install is using and I am NOT ASKING WHY right now) and now that's installed... do I have to adjust grub to boot into this? The package install ran some sort of grub rebuild but I dunno which kernel it wants to boot into by default? /etc/grub.d/README says I should look in /etc/default/grub which says I should look at /boot/grub/grub.cfg... Oh let's just try rebooting and see what happens. Hey, it came up in the new kernel. And loaded the driver! How do I associate it with the wifi...

Waiting for my sister to get back from the store so she can tell me the wifi password. My brother knows approximately what it is but not which letters are capitalized, nor which numbers are spelled out and which are numerals. (He has it on his phone, but the battery is dead and he went off to charge it. The younger niece is busy on her computer and unwilling to be interrupted, the older is a night owl and woke up just long enough to deny knowledge, neither nephew is home.)

Ok, the old machine is connected to the net! And it froze watching a video because it only has 4 gigs ram and 256 megs of swap, and I stupidly opened four chrome tabs to different websites before starting a video in the 5th and chrome is SUCH A PIG. (Ctrl-alt-f1, pkill -f renderer; swap's in a file so swapoff the file, make it a gig with dd, mkswap and swapon the file again. Not a real fix but maybe SLIGHTLY less unstable now.)

And the kernel stack dumped in dmesg when I moved the USB dongle from the front to the back of the machine. Didn't panic, but wouldn't reconnect until I rebooted it. Such a lovely out-of-tree driver. A USB device that does not support hotplug. Bravo.

The new install defaults to playing sound through the PC speaker instead of the TV. The fix is to click the volume icon, select audio mixer from the pulldown, go to the output devices tab, and click the green checkbox next to HDMI to make it the default audio output device. Which it doesn't remember across reboots. So I showed my sister how to do that. What I SHOULD have done is deleted "pcspkr.ko" from under /lib/modules (as long as we're hacking up drivers that won't survive a kernel version upgrade), but it was late and I was tired... (No, I have not tried to get it to play DVDs yet. At home we just stick them in the praystation. I'm scheduled to fly back to Austin tuesday morning so that's not likely to happen this visit...)

Linux on the desktop: smell the usability.

November 12, 2022

Yes there's a release due today. No I'm not ready to do it. I've spent a BUNCH of cycles arguing with myself about the lib/args.c design being wrong for things that care about argument order (which I have multiple examples of now; sed was easy enough to fix up after the fact but tar isn't) and... I think I need to add a callback? To do the tar --wildcards stuff I'm either separately traversing to reproduce multiple bits of state (ew), or I need a callback from the args traversal where I can access the state at the time and act on what it looks like at that moment. And the second is clearly less nuts.

(The command calling back into lib/args.c after the fact would be nuts. Not just exposing the lib internals and manually piloting them pickle-rick style, there's also a bunch of parsing state ala gof.opts and gof.longopts set up by parse_optflaglist() that gets freed again before it returns, plus all the actual traversal functions are side effect city, manipulating toys.optflags and toys.optargs and GLOBALS() and so on as they traverse. To avoid conflicts you'd have to clear it and then do the whole thing again, or else you leak memory and maybe have [!ab] flag relation tests go boing the second time. The callback doesn't even have to understand lib/args.c's mess, it just has to look at the normally exposed optflags and optargs and GLOBALS() and make decisions, with the arguments saying "this just changed".)

This probably means adding a new TOYFLAG_ARGSKIP and then having the command call get_optflags() itself? Probably add a new argument that's the function pointer to call, and have main.c pass in NULL for it? Some sort of void *callback(char *arg, void *where) every time it adds a string either to a struct arg_list or toys.optargs (which you can check for via the where pointer)? Not sure what it should return, but I don't know what decisions the callback would be making either?

But I still need to audit all the callers, which means delaying doing this as long as possible so I HAVE all the callers... Which means trying to half-ass something in the meantime to get past this so I can do all the OTHER things I need to get this release out.

I hate being sick. I pile up technical debt faster than I untangle the gordian knot. I am trying to CLOSE tabs and REMOVE items from the TODO list, not net increase the uncanned volume of worm. (I need to pace. Not sure where around here is good pacing territory. Everything outside is really cold.)

The problem is, if you really start looking closely at the man page for gnu tar, it has --sort=inode and --no-delay-directory-restore and there's very clearly decades of scar tissue and a whole lot of dinosaur engineering at play here.

Alright, the quick and dirty thing I can do is tweak the filter() function in tar.c. It's currently already using fnmatch(), so --wildcards is hardwired on. Um, why is it not using the FNM_LEADING_DIR macro? The header is posix, it's in toys.h? Ah, but FNM_LEADING_DIR isn't. Most likely BSD or mac broke. Sigh. Well, I need to add some more flags and should probably be using the symbols for them, need to add portability.h stuff to handle that, but I don't have a test environment for checking if I broke mac... Um, why do I need FNM_LEADING_DIR here? Because unlike regex I can't get the length of the match. What a terrible API.

Ok, handwaving that for now: there are three existing calls to filter(): creation is calling each name against the exclusion list, and extraction each name against both the inclusion and exclusion lists.

The theory here is there are a set of patterns, and a set of names the patterns apply to. Tar always seems to be applying the patterns to raw names before any other transforms (like --xform or --strip-components; I should add tests to confirm that and hope it's not positional too)...

For file creation, the --exclude pattern list is applied to the command line filenames to CREATE the include list. (The name list patterns are applied to isn't filesystem contents, it's command line arguments.) The include list is then used verbatim.

The timing doesn't match what I've implemented either: the creation include list seems to filter excludes while parsing the command line (so they never get saved if they're excluded): that would need that callback to implement.

For extraction, the command line include list is a set of filters and is used AS the list of patterns to be applied to archive contents. That's why creation includes can't have patterns, but extraction includes can. (More "how" than "why", really.)

November 11, 2022

Last rabies stab of the series. Did not enjoy it.

Today I have a crick in the right side of my neck that goes from my sinuses to my rib cage, which is weird because the last two injections were on the left side? Between the tetanus booster, the covid shot, the rabies series, and being up north when it suddenly dropped below freezing while NOT getting my regular multi-mile walks for a couple weeks now... my system is unhappy. (How do your legs get pins and needles lying on your back on a bed? That's new...)

Failed my politeness roll again this morning but... if I remove the nav bar, how do you get to the other pages on the site? What is the specific alternative he's suggesting? I asked to see the proposed plan and got more demands. I'm not going to tear the nuthatch out of the book nor do I have a copy of "Ethel the Aardvark Goes Quantity Surveying" ready at hand.

On the tar front, create with wildcards can't apply to children because there's no syntax to specify --include filters, only --exclude. By the time tar gets them, the names supplied on the command line are literal paths to open all-or-none. (Creation directory traversal is never filtered.) Either the shell expands wildcards before calling tar, or you pipe the output of "find" through xargs or $() to produce the tar list. Extract starts with fixed archive contents, so the arguments you give filter against that, but again they only specified negative filters and not positive ones because reasons?

Except -T will redo shell parsing in the absence of --verbatim-files-from including word splitting, wildcard traversal (the sub/*/blah/*.txt kind), and you can even have command line options in the -T payload:

$ echo -T loop.txt > loop.txt
$ tar -c -T loop.txt | tar tv
tar: loop.txt: file list requested from loop.txt already read from command line
tar: Exiting with failure status due to previous errors

Bravo. So during command line assembly gnu/tar reimplements part of the shell, separately from the archive creation plumbing. And the sequencing is:

$ tar c www/index.html --exclude '*.html' www/roadmap.html | tar t
$ tar c www/index.html --no-wildcards --exclude '*.html' --wildcards -T <(echo www/roadmap.html) | tar t

The wildcard enablement flag is checked when the filename is added to the list, not when the pattern is added to the list. (Although this is still comparing against command line options and not filesystem contents: the name existing or not in the filesystem is irrelevant to this filter pass.)

Sigh. I ACTIVELY don't want to have to care about this part, but it has users now...

November 10, 2022

Various github issues boil down to "this project should work like other projects", which I'd be fine with if other projects were consistent, but they're not. Trying to reply as politely as possible, but it's difficult and I am under the weather.

Coding is easy. Figuring out how the result should behave is hard. I keep getting feedback that makes the hard parts harder. And most of my open tabs or dirty files in the tree are blocked on the design part, not implementation.

November 9, 2022

As Peter Dinklage said on Colbert, "Lady Moderna was not kind."

November 8, 2022

Got a moderna covid booster this morning. Did not enjoy it.

Ok, tar --wildcards support is actually six flags: --wildcards, --wildcards-match-slash, and --anchored plus the corresponding --no-thingy versions. Exclusion defaults to --no-anchored, --wildcards, and --wildcards-match-slash.

I should NOT fiddle with lib/args.c as a prerequisite for this, just do the brute force thing and then apply a cleanup patch on top later. (The bloat and ugliness of the brute force version acts as motivation for the cleanup.) However, design-wise there's multiple pieces of information to collate here:

Because a flag that defaults to on and has a short opt for just the "no" version is funky to represent in "x(no-blah):" syntax. And then you get into the positional nonsense which tar uses, if a change only applies to entries after the flag in the command line you have to track when it happened, and the for(;;) loop trick I've been using means I'm duplicating the string in two places. (Because strcmp(x, "thingy") ends with a NUL and the optstr thingy ends with ')' so they can't collapse together even if the compiler or programmer is clever.)

There's very definitely some missing infrastructure here, but let's not try to design it BEFORE I have users for it.

Alright, what are tar's various use cases for this, each of which has a different set of defaults. Um, I did a trawl through that recently.

So --exclusion is --wildcards and --wildcards-match-slash but --no-anchored. Inclusion differs for creation and extraction, does it also differ for command line arguments vs recursively encountered filenames? (Inclusion is immutable for creation-side command line arguments, what does it do with recursion... Tests. I need to start with lots of tests.)

November 7, 2022

Paralyzed by twitter drama and upcoming election.

Also by my todo list being wide. I have many many tabs still open and collating tabs is not DOING the work in those tabs. I need to get the lib/password.c changes checked in (which touches like a dozen command files several of which aren't in pending, and requires mkroot test infrastructure). I still have to write gzip compression side, and do a zip.c implementation. I haven't finished promoting dd.c yet! I know what I need to do on route.c now but haven't circled back to it. I'm about halfway through at least two different patch sets people sent into the list and github but in both cases I found an unrelated bug while testing them and need to track that down...

Update on JWZ using mastodon. Also, the german non-profit Mastodon GmbH that hosts has a patreon that among other things defrays their hosting costs. (The number of mastodon users has almost doubled in the past week passing a million on sunday, so they're really having to scale up their servers' hardware and bandwidth. I haven't noticed over on the server, but if I click "explore" over there I need Google Translate to read anything, so...)

November 6, 2022

Huh, so that's why the keyserver network went down. I wonder what replaced it?

Squinting at a patch trying to figure out if it's still relevant, because it's to a year old version of host.c before it was cleaned up and promoted out of pending (does not remotely apply to the current codebase), and doesn't come with a test case demonstrating the issue.

Unfortunately, my current host.c has a lot of changes in it, and is segfaulting. I'm not sure those two statements are related but they're way harder to debug in combination. The changes were mostly Moritz Weber's attempt to add nslookup, which got bogged down by my observation that nslookup has a whole scripting language attached to it. (Kinda like some ftp client versions.) Not being an nslookup user, I didn't know what's actually needed here, and then we both got distracted...

That recent module list traversal email I closed a couple days ago was a follow-up to my original grep -r 'MODULE_ALIAS_FS' fs | sed 's/[^"]*"\([^"]*\).*/\1/' | xargs with the manual fixup that virtio_fs.c had a funky entry. I don't really WANT to collate the ALIAS_FS() entries to compare them with file_system_type .name entries to check for inconsistencies, but linux-kernel has gotten way more complicated than necessary over the years.

Anyway, the stack all that pops back up to is having mkroot capable of testing toybox tools on every relevant filesystem type, which means enumerating and categorizing the filesystems. From years ago I know the real headache is the flash ones, which you can't even loopback mount without inserting an emulation layer capable of specifying an erase block size. (I wrestled with that at Johnson Controls so 2018 and early 2019 blog entries would be where to look...)

November 5, 2022

Spent the day at my sister's visiting my brother and the niecephews. (Well, three of them. One of them has joined the navy to get trained to do nuclear submarine stuff.) Not much programming time. Left my bluetooth headphones there.

Jamie Zawinski expressed frustration at mastodon:

And I linked to both the text and video versions of my rant about open source development's inability to do user interfaces, and then wrote up a brief explanation of mastodon as a blog comment:

Squatting a mastodon address is like squatting an email address. Having more than one just makes life harder. As long as you've got one somewhere you can follow everybody from that account and they can follow you. If you care about being at gmail or hotmail great, if not pick a smaller server or you can set up your own (it's just a web thing written in ruby, about like setting up mercurial or cgit). Moving servers means your old account does a 403 to the new location (there's a config thingy for it in your account's settings page). I presume other people's subscriptions auto-update? (Haven't tried.)

To follow people go to the search bar on YOUR server (logged in as you) and type @user@server, and when they come up there's a "follow" clicky icon right there.

When you go to another website any interaction will try to do an OAUTH sort of thing to your server, which is a pain, so don't do that. Instead view their feed through YOUR server's page, and then you can just interact as you directly. If you go into settings and enable "advanced web view" you get columns like tweetdeck, which makes this much easier. Then clicking on the name of anybody you follow (while looking on YOUR page) pulls them up as a column.

As usual look at other people's following/followers list to find more people. Although half the time I have to go to their page for that (right click open in new tab) because of security settings nonsense, but then I picked a server in Japan.

November 4, 2022

Third rabies shot this morning. I did not enjoy it.

I have created a mastodon account. Tweaked the links up top to add that and yank the old twitter I lost access to in 2018 and the livejournal I stopped using after Russia bought it.

Got some programming in at the zombie burger king on the green line. (Their $6 snack combo thing seems to be an inflation-adjusted 4-for-4.) Noticed the "no loitering, 30 minute time limit" sign on my way out and went "no WONDER nobody else is here". Either they're a money laundering front or they're not long for the world as a business. Multiple people came to the "back" door leading into the big empty dining area (which was locked, only the front door near the counter was open), rattled it, and then turned right and went to the restaurant literally next door.

New developers have sent in some toybox patches. Alexander Holler is interested in the shell (which could both use a lot of work and isn't quite ready for company yet), and nomas2000 on github is sending me patches via issues instead of pull requests, which I can't quite apply properly? Pull requests I can wget the URL manually adding ".patch" to the end, and then feed the resulting file to "git am" and it shows up attributed to the correct author. If I cut and paste the patch out of the issue description, I still don't know who "nomas2000" is (no email) and going to their profile page was not illuminating. Eh, giving it a couple days before applying them improperly.

November 3, 2022

Finally spent some time on the phone with Jeff working on the ASIC toolchain build on the new machine. We used debootstrap to create a known good build environment for the openroad/openlane stuff (with Jeff's patches adding VHDL support), and uploaded the results to github. It creates a devuan chroot, uses "unshare" to use it as a half-assed container, and runs the build in that.

Bunch of todo items: the hardwired -j 8 should become -j$(nproc) but the big server is a 24 processor machine and he's worried about it overheating while he isn't there (the server is in a Hello Office in Akihabara and we're in North America) so his personal build (which we're genericizing) throttled the processors. We should instead throttle cpufreq:

$ cd /sys/devices/system/cpu
$ cat cpu0/cpufreq/scaling_min_freq | sudo tee cpu*/cpufreq/scaling_max_freq

Except... scaling_min_freq is 2200000 and scaling_max_freq before adjustment is 6442480 and that's a BIG difference. I'm tempted to babysit the thermal sensors during a build and try to bump it up a little (especially since portions of any build are single threaded), but what's the step granularity here? Long ago I got a list of numbers out of this API for every step it knew how to do, and now, not so much. 6442480-2200000 is 4242480, half of which is 2121240, add that back to 2200000 and maybe 4321240 would be recognized? I guess? (Thank you kernel guys for once again taking an old simple API and replacing it with guesswork and black magic. Red Ha-IBM's full employment program for middle aged white guys continues to chug along.)

Another reason I want to fix this is because now that I have an account on this machine, I can use it to do fast builds of all the mkroot targets, and maybe even restart LFS bootstrapping.

I want to put out a toybox release around the 12th so I'm trying to close tabs. The tar --wildcards stuff got sidetracked on the whole --no-wildcards argument parsing issue, which aptly demonstrates one of my failure states:

I don't have time/energy to do the proper thing here, I'll just do The Wrong Thing.

[Time passes. The Wrong Thing does not get done.]

Alright, I've circled back to the molehill, let's build this mountain.

It's sort of like writer's block. It's not that I don't know how to do it... it just doesn't look right.

The Q4 money has arrived, as was foretold in legend and song email. Life is good. Now I need to implement all the things.

November 2, 2022

Sigh, the very nice internet at Fade's apartment doesn't have a password (it uses a login screen that records the mac address of authorized devices), which means the wireless hop isn't even using WPA2, meaning any http:// connection (such as because Dreamhost can't figure out how to apply the "let's encrypt" key to their shared listhost server; yes this problem is over 6 years old now) is plaintext in the clear. And since Dreamhost will NOT give me command line access to mailman, I have to log in to their web page with a plaintext password to manage that site, and am never quite sure what cookie data it's sending when I connect to it.

Pulled out the phone to do USB tethering. That's not hugely secure either but at least it's not trivial middle school "leave a snoop program running to see what's going past" levels of outright incompetence in an apartment building on a college campus. Sheesh.

Spent the morning converting my old timeconst.bc replacement patch to linux v6.0. Since the 4.2 kernel the header it generates has grown a nanoseconds field (which was trivial: made the loop do a third pass with a third entry in the name array), the time consuming bit was figuring out how the makefile plumbing had changed. (There's no more hostprogs-y, now it's hostprogs. The patch that went through and removed $(obj) prefixes from stuff is inappropriate for a host tool, so I put the $(obj) back on the new line. Plus this filechk_gentimeconst nonsense means that instead of the tool writing the file, you spit the data to stdout so it stores it in memory as a make variable, and then kbuild writes it to the file in a later pass. Oh, and V=1 doesn't apply to $(call filechk,blah) invocations, those are still silent. Bra fscking vo. Overcomplicating everything.)

Anyway, now I've got a patch but there's no way thunderbird will post it without wordwrapping, so I googled to find the updated version of the plugin... which wants me to restart thunderbird, and I have SO many open reply windows to close first. So now instead of closing terminal tabs, I'm closing unsent email reply windows.

I have a partial reply to this message where I ran a standard terrible command line to answer a question:

diff -u <(grep -r file_system_type -A 9 * | grep '[.]name' | sed 's/[^"]*"\([^"]*\)".*/\1/' | sort -u) <(find * -type f | xargs sed -n 's/.*ALIAS_FS[(]"\([^"]*\)".*/\1/p' | sort -u) | grep -v '^[ @]'

Specifically, what .name entries in file_system_type structs were NOT found in ALIAS_FS() macros? And the answer I got at the time was that binfmt_misc is in ALIAS_FS() but not found in a .name, and the other way around the extras are btrfs_test_fs, devpts, hugetlbfs, proc, pstore, pvfs2, ramfs, sysfs, virtiofs.

I recognize binfmt_misc, proc, sysfs, and devpts as synthetic filesystems. There's no way btrfs_test_fs is a real thing, not even looking.

hugetlbfs: ooh a THIRD ram backed filesystem. And a completely broke one at that:

# mount -t hugetlbfs sub sub
# echo potato > sub/file
bash: echo: write error: Invalid argument

This filesystem exists to use huge pages for all its mappings (on x86 an individual translation lookaside buffer entry can be either 4k long or 2 megabytes long; using the big ones eliminates a lot of soft faults and resulting unnecessary page table traversals). So while you can create files normally, the writes bounce unless... What, am I supposed to truncate() and madvise() here? (The history is that long ago the kernel's generic page allocation plumbing didn't know how to make use of huge pages at all, so hugetlbfs was invented to manually request them via mmap(), then Mel Gorman did a whole lot of work and huge pages started getting used automatically in at least some circumstances, but the filesystem is still there for databases and such to micromanage stuff.)

Anyway, that one's somewhere BETWEEN a ram backed filesystem and a synthetic filesystem. It provides control knobs for the memory management subsystem, for decisions that have been at least partially automated since. Retail system memory kinda peaked in the 32-64 gigabyte range: you can get bigger but it becomes rapidly more expensive as the sales volume drops. I've seen boards supporting 256 gigs for sale with a price attached; anything above that is bespoke "ask for prices" nonsense. (If you have to ask, you can't afford it.)

I know what virtio is but not virtiofs (different from 9p using a virtio transport), I dunno what pstore or pvfs2 are... and I'm confused why tmpfs isn't here if ramfs is? (I mean it's a diff so that just means both treat it the same, but...)

Sigh. When I tried to rerun that giant horrible command line instead of trusting the cut and paste in the email, I got 39 hits instead of 10. And binfmt_misc was not among them. Great. (Version skew!)

Cycle back to this later I guess...

Next unsent email: a reply to this message starting with:

Ok, busybox ash and dash both do a fairly crappy job at this:

  $ busybox ash -c 'x() x; x'
  Segmentation fault
  $ dash -c 'x() x; x'
  Segmentation fault

Which is presumably related to the changes in my current sh.c (55 insertions, 45 deletions according to diffstat). Need to finish that and add a test.
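For what it's worth, bash survives the same runaway recursion if you cap the depth first; FUNCNEST is a real bash variable, though the depth of 10 here is an arbitrary choice of mine:

```shell
# x() { x; } is a function whose body calls itself forever; without a
# limit that eventually overflows the stack, but bash's FUNCNEST turns
# it into a clean "maximum function nesting level exceeded" error.
bash -c 'FUNCNEST=10; x() { x; }; x' 2>&1
```

Something like that is probably what a fixed sh.c wants to emit instead of segfaulting.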

Anyway, I got the timeconst.bc removal patch posted to linux-kernel. And I should redo it for LP64 instead of ugratuitously_long_names_t from the C committee that still doesn't admit filehandles exist. (Windows and Unix both use filehandles; is there anything left that DOESN'T do filehandles?) It's a pity Linux doesn't have a read_until(fd, buf, len, '\n') system call that could do readline() for us. The point is I can't ungetc() into a pipe or char device, so block reads past the terminator are bad, but char-at-a-time reads are also bad. The FILE structure stores the data, but we've been over that.
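The shell's own read builtin is a live demonstration of that char-at-a-time cost (this demo is mine, not from the original discussion): on a pipe it has to read one byte at a time precisely because it can't push overread data back, which you can verify because the rest of the stream stays unconsumed for the next reader:

```shell
# read consumes exactly through the first newline, so cat still sees
# "line two" -- proof the shell didn't block-read past the terminator.
printf 'line one\nline two\n' | { read first; cat; }
# → line two
```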

Sigh. I've got a half-dozen kernel patches at this point that should really be collated somewhere? Three in mkroot now, this one, the CONFIG_DEVTMPFS_MOUNT on initramfs one, the rootfstype=tmpfs one...

November 1, 2022

Bit of a hiccup mounting /dev/?da and /dev/?db on each target... when I got to m68k there's a /dev/adb, which is Apple Desktop Bus. Why does a bus have a /dev node? No idea.

Ok: aarch64, armv7l, s390x, and x86_64 are vda, armv5l, i486, i686, m68k, mips, mipsel, powerpc, powerpc64, powerpc64le are sda. No remaining hda but I object to its removal on general principles. (Hard disk a. It's still a hard drive, that's the generic name like "eth0". The implementation technology should be irrelevant. Stop complicating things that don't need to be complicated.)

And sh4 doesn't support -hdb? It supports -hda (which becomes sda) but add -hdb and it says "machine type does not support if=ide,bus=0,unit=1" which... is IDE. What's been renamed "PATA" since, even though it isn't. How does IDE not support both a master and a slave device? It's inverting like 4 wires on the cable.

Because in qemu's hw/sh4/r2d.c it says "Master only" from when support was added back in this commit. Hmmm... They added support for a piece of hardware that's wired to the board without an actual IDE controller? Sigh, we REALLY need to add a Turtle board emulation to qemu, and genericize the Turtle kernel build so you can tell device tree it's got an sh4 in it. Then it could replace the r2d emulation with something holding more than 64 megs of ram and only one block device, and that deeply weird serial port setup.

Jeff gave me access to the fast 24 CPU machine in the tokyo office yesterday, and I rebuilt all the mcm toolchains on there. Today on the VoIP call with him I got sudo fixed. (He told me to change my account's password without telling me what it IS. Kind of an important step there. I could ssh in via key but not sudo.) I walked Jeff through installing a debootstrap, flinging a minimal init script into it (to mount /proc and such), and invoking it via unshare to kind of half-ass a container. Then we ran his hardware toolchain build scripts (the ghdl/yosys/icestorm/magic/openroad stuff for FPGA and ASIC), and got quite a ways before running out of time. The build we left running at the end failed because... wget wasn't installed by default. Debian's base OS package choices continue to mystify.

I tried to get it to build mkroot with those new toolchains, but "bc" isn't installed by default either. Which is FAR MORE UNDERSTANDABLE, since that's a useless archaic tool that's been replaced by either "python" or "matlab" depending on which direction you're going. But Peter Anvin blocked one of my perl removal patches with a bc implementation, forcing gentoo and Linux From Scratch and such to install bc, which they'd never bothered to do before because it's seriously useless.

I have a bc removal patch that replaces the timeconst.h generator with a C implementation (that's the patch Peter blocked; he ACTIVELY does not want Linux to build with minimal dependencies, still dunno why), and even though it was last updated in 2015, the bc script also hasn't been touched much since... linux 4.2 my script says. Let's see what's changed: in 2015 a comment, in 2017 a function added four new defines to the header, and in 2018 they sprayed license identifiers over everything. And then no change in 4 years. So the only real change is the 4 new defines, and that's still pretty trivial.

October 31, 2022

Had the next rabies shot this morning. Really unhappy about that. Now hanging out at an actual coffee shop (caribou coffee at the U of M) for the first time in forever. I've missed this.

Finally actually checked in that patch I converted to sed on thursday. It's a small part of ongoing work I'm in the middle of testing, and the REASON for it was to make the kernel build work with cc pointing to llvm, but my attempt to rebuild the llvm-hexagon toolchain got derailed by version skew. Still: release early, release often. THAT part works, so check it in...

Ok, for the mkroot "tests" package do I want to union mount a loopback squashfs instance? While that would keep the data compressed in the initramfs, the amount decompressed into the page cache for use probably makes it a net loss. But we're talking multiple megabytes of pinned memory and qemu's sh4 board still maxes out at 64 megs ram.

The obvious alternative is to mount two block devices, -hda and -hdb. One for the read only data, and one for the writeable scratch space needed to do nontrivial builds. I shouldn't need more than one instance of scratch space: just make it bigger. If I need to bundle together multiple read only block devices, I can either partition an image or just extract them into subdirectories and mksquashfs the lot into one image.

I THINK modules_install is just doing a find . -name '*.ko' and then preserving the subdirectory layout when copying the results, and I'm trying to do a make allmodconfig to confirm that (or at least check that devuan hasn't got any .ko files that aren't at the corresponding place in the kernel build)... except for the part where x86-64's defconfig uses the Russian unwinder instead of frame pointer, which needs that gratuitous third ELF implementation package in order to compile. (Which no non-x86 target needs.) I have a patch to yank that, but it means other bits of x86-64 haven't been regression tested recently, and it turns out arch/x86/kernel/ftrace_64.S breaks in something like 4 places. Almost all of them are under CONFIG_DYNAMIC_FTRACE... but one isn't. And I don't know how to fix it off the top of my head, so yank CONFIG_FTRACE from the allmodconfig build, and...

Sigh. Gnu make is demonstrably crap, specifically when you do an SMP build that hits a build break, the other threads will continue on for many screens of files until finally running out of stuff to do, and then you have to scroll back up a lot to hunt for the actual error message that explains the break. And after several of those I stopped providing -j to the build, and it's taking a while. (But if it breaks, it tells me immediately WHY.) Of course gmake could stop launching new child processes when it has a break, it just doesn't. That's why I say it's demonstrably crap. "The build broke and I would like to see the error" is not an OBSCURE use case. (Eventually hit ctrl-c and switched back to a -j 3 build anyway. The allmodconfig build compiles a LOT of stuff.)

It's strange: "make" is building a bunch of [M] lines for modules, but that just creates .o files for them, you have to "make modules" to link those into *.ko files. Why? Ah, no you don't. It just does all that in one big pass at the end for some reason.

Ok, switch CONFIG_UNWINDER_ORC to CONFIG_UNWINDER_FRAME_POINTER and disable CONFIG_FTRACE and... amdgpu's display_mode_vba_30.c died with stack frame larger than 2048 bytes, but the build continued for at least TWELVE MINUTES after that and I had to re-run make without -j to see what the error was. Bravo. Ok, disable CONFIG_DRM_AMDGPU and it wants to do a complete rebuild because the .config file changed? Ooh, not this time. (Tweaking anything under FTRACE is a full rebuild.)

Hey, it finished! I had to run it again single threaded to confirm that it finished rather than just tailed off, because at -j 3 the end of the output looked like it was all failures (shoulda checked echo $? to see the return code I guess). And how does it differ from the modules devuan has installed? Let's see:

  for i in $(cd /lib/modules/$(uname -r)/kernel; find . -name '*.ko'); do [ -e $i ] || echo $i; done | wc -l

There are 358 modules left over. If I change that || to an && there are 3201 modules that match. Because I built v6.0.0 and the host has 4.19.

Ok, I was asking the wrong question: how does a modules_install compare with find *.ko in the source tree? Let's see:

  make modules_install INSTALL_MOD_PATH=$PWD/../blah
  for i in $(cd ../modpath/lib/modules/6.0.0+/kernel; find . -name '*.ko'); do [ -e $i ] || echo $i; done

There are no modules in the install that aren't at the same relative path in the kernel source dir after the build! Woo! And the other way:

  for i in $(find . -name '*.ko'); do [ -e ../modpath/lib/modules/6.0.0+/kernel/$i ] || echo $i; done

No modules in the kernel source path were NOT copied to the destination dir!

So modules_install is literally just doing something like "mkdir destdir && tar c $(find . -name '*.ko') | tar xC destdir", and creating some unrelated files in a parent of destdir which we don't actually need.
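That tar-pipe idiom is easy to sanity check outside a kernel tree; this throwaway demo (paths invented by me, .txt files standing in for .ko files) shows the relative layout surviving the copy:

```shell
# Copy files matching a pattern while preserving their relative paths,
# the same shape as "tar c $(find . -name '*.ko') | tar xC destdir".
mkdir -p src/a/b dest
echo module > src/a/b/one.txt
(cd src && tar cf - $(find . -name '*.txt') | tar xf - -C ../dest)
ls dest/a/b/one.txt    # the subdirectory layout came through intact
```

(Using explicit "f -" so it works the same whether tar's compiled-in default archive is stdout or a tape device.)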

Hey, I got an email from Wise saying the middleman organization "has sent you money", it should show up in the account tomorrow, and the reference number is... a single period. Well I'm reassured.

October 30, 2022

Quiet weekend at home with Fade. Cleaned up a couple old github issues and commented on a few more to see if they can be closed, but spent more time playing Persona 5 than either.

I also built all the mkroot kernels with module support, so I can test that everywhere. I'd like to promote modprobe this coming release, but the one in pending depends on "depmod" which toybox hasn't got an implementation for. Also: module loading isn't a common operation requiring low latency? We haven't got a depmod for shared and static libraries, and yet somehow the linker and dynamic linker cope. (Ok, they know what filename they're looking for, but still. 3559 files, all but 276 of which are under 100k on my x86-64 debian system. I have no idea WHAT the gpu driver authors are thinking though: 6 megabytes for amdgpu.ko and 4 for nouveau? And neither btrfs nor xfs have any business being almost 3 megabytes each, that's just sad...)

Anyway, teaching modprobe to just traverse *.ko each call and parse the module symbols into RAM to find the other modules this one depends on shouldn't be that hard: by modern computing standards this is a small amount of processing. Especially if it starts by looking at the other modules in the same directory and stops when it's satisfied the symbols it needs? And otherwise goes for locality (looking at all the fs/*/*.ko before poking at fs/sound and fs/virt and such). And it can remember the ones it's seen already this run so it can satisfy nested dependencies in a single pass.
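For context, the dependency list a depmod-less modprobe needs is already inside each module: the .modinfo ELF section stores NUL-separated key=value strings including a "depends=" entry. A hedged sketch of pulling it out (using a fabricated .modinfo blob so this runs without a kernel tree; on a real .ko you'd dump the section first, e.g. with objcopy):

```shell
# Stand-in for the raw bytes of a module's .modinfo section:
printf 'license=GPL\0depends=usbcore,mii\0' > fake.modinfo
# Split the NUL-separated strings onto lines, keep the depends= value:
tr '\0' '\n' < fake.modinfo | sed -n 's/^depends=//p'
# → usbcore,mii
```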

The wrinkle there is that if I DON'T implement a depmod in toybox to produce an unnecessary cache file, the kernel modules_install is still going to want to call depmod... no, worse, it wants to call scripts/ Aha! But that wrapper script has "Warning missing '' file. Skipping demod." So if I just delete, maybe it'll still install the *.ko files? Or alternately, if DEPMOD doesn't exist it does exit 0 and the install continues... except it manually adds /sbin to the $PATH to find the host one even when I gave it a restricted $PATH for cross compile building. Thanks EVER so much. (Debian's refusal to have things like mke2fs in the $PATH for normal users even though users can use them as non-root JUST FINE is historic stupidity on par with pointing /bin/sh at the Defective Annoying SHell.)

And in the top level kernel Makefile target __modinst_pre is messing with modules.order, modules.builtin, and modules.builtin.modinfo (expecting them to be there)... Does the kernel produce them itself? (Maybe?) And it's sucking in a scripts/Makefile.modinst because of course. And there's also a Makefile.modpost and Makefile.modfinal next to that. NONE OF THIS COMPLEXITY HAS ANY REASON TO EXIST.

Can I maybe just find linux -name '*.ko' and do something useful with that myself instead of calling modules_install? There's a signatures pass, but A) I'm not using it, B) it can probably be run separately?

October 29, 2022

The llvm-hexagon build failed while trying to build libclang which is a bunch of RTSanitizer code. It wanted to link against the C library, which hasn't been built yet (because this is the compiler it would be built WITH). So llvm has reached the point where it can only be built with an existing llvm for the same target. LLVM must already exist in order to build LLVM, or else you depend on gcc building it. Bra fscking vo.

Here's an excellent piece of writing (from a professional author with personal experience), but the site it's on is rotting. It's the geocities problem: the original site goes down and sometimes someone mirrors it and sometimes you can fish it out of (and occasionally it inexplicably stays up despite statements to the contrary)... but a lot is just lost. No first sale doctrine, no used copies.

This is often exacerbated by the same creator producing work via multiple channels (that link is the patreon for the inactive podcast of a well known twitter thread writer talking about the book they're writing that's put everything else on hold), making it hard to "catch them all". How does handle tweet threads? If something nice was on youtube and that site goes crazy after the creator dies, doesn't even address that.

And who indexes all this stuff if it IS preserved? Back in the day sites like slashdot would show you the dozen most interesting links of the day, and presumably that could have turned into some kind of librarian curated trove like yahoo was trying to create before Google's algorithmic sorting profoundly outscaled what a small team of curators could do for "the entire internet".

This used to be a thing universities and libraries tried to tackle, but have been under direct political attack for 30 years because Reagan decided that an "educated proletariat" was a threat to the oligarchy he wanted to create. No really, in those terms. This is where the Stem Stem Uber Alles push came from: Nazi Scientists didn't second-guess Hitler's policies because a functioning society without bigots running gas chambers wasn't their area of expertise. They had the SS, the GOP has ICE, and the modern nazi science brigades happily work for them. To quote Tom Lehrer's song about an actual Nazi scientist who was instrumental to the V2 rocket bombings of london and then happily worked for the US space program after the war: "Once the rockets are up who cares where they come down? That's not my department..."

But collating those kinds of links and performing exactly that sort of historical comparison is why the GOP has most viciously attacked any attempt to preserve or teach history...

October 28, 2022

Last week I asked the sponsorship middleman's support guy if he could check the updated payment info, and he said "It seems correct since Wise validation didn't catch this time." Which didn't reassure me because Wise never caught anything until AFTER they had the money (one of which had to be reversed, two of which deposited but were not allowed to be repeated). But that's the best I was going to get, so I approved the invoice and let the middleman's plumbing do its thing.

This morning I replied to the email thread to ask the support guy why, a week later, the money has not yet shown up in my account. (Ye saga continues...)

Running the hexagon llvm build script with updated repos. For a definition of "updated" which means the llvm repo is a live pull last updated on the 18th, and the musl repo is a January 2021 fork of musl with a hexagon patch stack on top of it going through August 2021. So it's either almost 2 years old or only 1 year old depending on how you want to measure. (Again, that's what the quic hexagon branch has live on github.) And then the linux repo is a cp -a of my fullhist one, I think with 6.0 still checked out, and a couple patches applied but they don't touch the headers_install plumbing that toolchain builds use.

On sunday morning a puppy bit my ankle, and I've been waffling about getting a rabies shot ever since. Three puppies came out of the bushes near the railroad tracks to bark at me when I went by making a loud noise (returning a lost shopping cart, which rattles a lot on pavement), and since I was at HEB anyway I bought them some dog kibble and returned to feed them. Turns out there's a homeless encampment in that overgrown ditch which the puppies are being trained to guard (or at least there was a voice yelling "hey" out of the bushes, it was like 6:30 am). The puppies are not remotely full grown and are adorable, but one of them nipped at my ankles as I walked away (which is standard dog behavior) and his teeth scratched the skin. I poured mouthwash and rubbing alcohol on it a couple minutes later (hadn't been really bleeding before that but the alcohol made it bleed a LOT), but still: technically I got bit by a strange dog. Which is highly unlikely to have rabies (none of the puppies were a foot long unless maybe you count the tail), but still.

I didn't even attempt to engage with the texas healthcare system because it's been capitalismed to death (and because they don't hold dogs for observation but jump straight to dissection, and I'm not gonna do that to some poor homeless guy's innocent puppy). But I saw a dead possum in the gutter half a block from there the next day and went "what did it die of", and remembered all the mice the cats brought in which I released into that ditch over the years (there's a very small stream running down it), and the raccoons that live in the storm drains, and...

Since my health insurance is through Fade's work (either being a graduate student or the attached teaching position) the hospitals around the university of minnesota are all in network (and actually functional, there's a medical school here) and I was flying here on wednesday anyway (already vaguely hoping to schedule appointments for a large backlog of medical questions I've been ignoring), so I thought I'd ask here. Wednesday I got in and collapsed post-travel. Thursday I brought it up and Fade said we'd go to urgent care in the morning. So now it's friday morning.

Originally I'd planned to ask "do I need", but walking into the building it occurred to me that their lawyers wouldn't let them NOT give it to me if there was a 0.1% chance of "bad thing happens we could get sued over", and I was right. (And in their defense, this isn't "you don't wake up one morning", this is "after a month of futile intensive care your family gets a medical bill bigger than your life insurance policy so the mortgage isn't posthumously paid off".)

Rabies treatment is not "a shot". It was 5 shots, one in each arm and three into my ankle (because they couldn't get all the immunoglobulin in one site) which REALLY HURT. My voice is still hoarse from the screaming. The other two were rabies vaccine and tetanus. Yes I am still needle-phobic.

Have to come back for more vaccine on monday, next friday, and the friday after that. Bumped the return flight to the 15th. (No, the covid booster and flu shots couldn't be combined with this mess, they're given by different people in a different part of the building. Outpatient pharmacy. Whole 'nother psych-your-self-up-for-a-shot cycle to get those. Wheee.)

Not expecting to get anything else done today.

October 27, 2022

The net at Fade's place is SO much faster than Google Fiber. I go to a web page and the whole thing finishes loading in under 5 seconds, as opposed to the slow creaky population of each new element one at a time. (Each new connection can take over 5 seconds just to start, and one page can have dozens of separately fetched elements with all the graphics and CSS nonsense.)

Google Translate's web page reports that the Japanese->English translation of "aoshima" is "Qingdao". The ENGLISH translation of "blue island" is a CHINESE word. Well that's nice. (I got there trying to use google translate to find out if shima and jima are meaningfully different suffixes or just an accent thing. Looks like accent.)

Aw, darn it. Wolfgang Denk died. (Creator and maintainer of u-boot. He tried out some of my stuff years ago but decided it wasn't ripe yet. He found lots of good bugs for me to fix in a short amount of time, though...)

It's quiet here in Fade's dorm room: the second bedroom didn't get anyone assigned in it this year, so it's very quiet when she's off teaching or in the graduate student office, modulo the dog freaking out when I either close him in the bedroom (if I'm out here he must cling) or when somebody closes a door halfway down the hall (bigger territory to defend, STRANGERS within sensor range). Still, it's a decent work environment (too tired to accomplish much today, but I'm poking at stuff), and I could probably record videos (I brought a pair of the headphones with the good microphone).

But it's still disheartening that youtube constantly screws up with wildly inconsistent rules that change randomly and the site only listens to social media appeals. Which I suppose is better than never listening to anybody. (This wouldn't be so bad except for the rampant acquisitions without a shred of oversight. Sigh. Whole system still has Boomers dragging it down for a few more years yet. No ethical consumption under capitalism because the Boomergenarians insist the rest of us will only eat the rich over their dead bodies. Taps foot. Checks watch. Get on with it then.)

I've got a pile of kernel patches that never went upstream. I've converted a few to sed invocations in mkroot that modify the kernel source snapshot before compiling it. Just converted another one (the "use cc instead of gcc when available" one that lets llvm "just work" when that's what cc points to, including prefixed $(CROSS_COMPILE)cc toolchains). Can't quite keep the sed under 80 chars but it's not too bad.
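The pattern looks something like this hypothetical stand-in (NOT the actual mkroot sed or the real kernel file, just the shape of patching a freshly extracted source snapshot before calling make):

```shell
# Hypothetical mkroot-style sed patch: the directory, file contents,
# and pattern here are invented for illustration only.
mkdir -p snapshot
echo 'HOSTCC = gcc' > snapshot/Makefile
sed -i 's/= gcc/= cc/' snapshot/Makefile
cat snapshot/Makefile   # → HOSTCC = cc
```

Since the snapshot gets re-extracted every build, the sed re-applies cleanly instead of needing patch fuzz handling.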

But some changes aren't one liners you can easily sed. Stuff like my patch making CONFIG_DEVTMPFS_MOUNT work with initramfs (so PID 1 starts with stdin/stdout/stderr pointing to /dev/console even when you used cpio to create the initramfs file as a normal user, so you can actually see early boot error messages which saves SO much debugging time when anything hiccups). Or the patch I did to let rootfstype=tmpfs force initmpfs even when somebody goes "root=donotuse" which would ordinarily disable it because you said you were not staying on rootfs. That is LITERALLY what that argument means, don't say it if you don't mean it. Grrr. But people still want an override, so I did one.

Beyond the size of the changes, there's the number: three sed changes to the kernel is already pushing it. Ideally all this would go upstream, but linux-kernel is old and insular and bureaucratic. And 6.1 is adding Rust support next to BPF. Sigh. Yes I saw Google's new Fuchsia 2: now in Rust, except it says "almost entirely" (don't make me use two toolchains to build a base system) and it's literally claiming to eliminate off-by-one errors in the description. (Java had bounds checking on all access, tinycc had a bounds checker for arrays in C, do they mean they're doing more than that? Python had "for i in array: print i" 20 years ago and trust me, you can still have an off by one error in python.) The whole thing still seems overcomplicated to me, but eh, let it play out. C wasn't my first programming language, Commodore 64 basic was, and I'm still sad Lua decided not to have a full posix binding standard library. I'm just not remotely convinced any proposed alternative is better at doing what C does, and I'm trying to make a simple understandable system that can build itself under itself in a small circle. Pulling in two distinct toolchains as build dependencies and requiring knowledge of two programming languages to follow the code (make + C is already bad enough) is not simple.

Wouldn't say I got a lot done today. Still recovering from travel and changing to a day schedule.

October 26, 2022

Flying to minneapolis.

I meant to get work done on the bus to the airport, in the airport waiting for the plane, on the plane itself, and on the Light Tactical Rail trip from the airport to Fade's dorm. Alas, I did none of those things, and after a full day's travel my back is very unhappy with me. Full flight, 4th boarding group, only seats left next to very overweight people, so I spent hours leaning at an angle in a non-reclinable airplane seat with the armrest down.

Arrived. Very tired.

Every time I push blog entries to the website I get a shower of email from SEO people wanting to submit "guest posts". Some kind of rss feed crawler with google site rank scraping, I take it? Entirely automated, of course. Capitalism is WELL past its sell-by date...

October 25, 2022

Tomorrow I need to get on a plane to minneapolis. I should be preparing more. But so far...

Going through various old pull requests and seeing what's obsolete, already fixed, or can be easily tied off. Some of them it's hard not to be snarky about.

Elliott asked for tar --wildcards which is a little more fraught than it seems at first. Hmmm. Ok, posted that rumination to the list instead of here.

My attempt to do some small videos triggered working through the swapon/swapoff todo lists and I've now added a lot of code which I'm not entirely sure is an improvement? The previous swapon and swapoff were tiny: take one argument on the command line and do the thing. Then -d discard support was added, but that's kinda needed for modern flash devices. (Although presumably it's the swap driver's problem? When it can discard, it should discard. Do you ever NOT want to discard?) And swapon -p priority support wasn't hard to do, although I'm a little fuzzy on the use case.

But by the time I'm adding UUID and LABEL support, and parsing CSV options out of /etc/fstab and merging them with -o options from the command line... this command is no longer "simple". I'm now facing the standard editorial dilemma where I've DONE the work, but do I want to check it in? The usual design question about the scope and boundaries of toybox, which has no empirically right answer. What do I want the project to look like? Implementing the todo items lets me measure the cost... but against what exactly?

And don't ask me how to automatically regression test any of this, even with mkroot.

October 24, 2022

Broke down and ordered Persona 5 through the switch's built-in store. I was seriously considering ordering a cartridge through amazon but "do I want Jeff Bezos to get part of this money or all of it to go to Nintendo" was a pretty heavy weight on that scale. Physical media vs not giving money to a billionaire launching dick rockets. Yeah Nintendo's still a BIT of a middleman, but they at least made the console if not the game. The fact I already own P5 for PS4 means they're not an abusive monopoly, and beyond Sony there's Steam, and even xbox (from The Law Offices of Small and Limp Esquire, which, no thanks, but "let's you and him fight" is one of the central tenets of diplomacy. It's potentially useful the toe of the giant is in this space to be cut off. The enemy of my enemy is an interposable ablative defense.)

A significant problem with netcat -u -L cat is UDP doesn't let you know when the connection's closed. It almost demands a -W entry to time out the connection when no more data's received for a while, but I don't want netcat to magically impose one without being asked. I could maybe try to send periodic zero length keepalive packets, but even if that works (are zero length UDP packets allowed?) I can easily see NAT routers swallowing those, or systems being configured not to send any response to connections to closed ports to annoy nmap users...

Alas, the solution to this is "use TCP/IP". They already DID this. Maybe I should just disallow -L with netcat? Otherwise this is a FAQ entry waiting to happen...

Heh, noticed a Charles Stross line admonishing someone for "falling into the gap between what is and what should be". Kinda what I'm doing with the whole "pointers are not references" pet peeve, but that's a "should be" that's worth fighting for. (On the C side of things: C++ can disappear into a maze of twisty undefined behavior exceptions, all alike. Go and be consumed by Rust. We still NEED a portable assembly language. The job of C is to have just enough abstraction that switching between x86 and arm is not a complete rewrite, but to otherwise describe what the hardware is actually doing in explicit manual terms that map closely enough you can go through the assembly and C side by side and match them up.)

October 23, 2022

I phoned gamestonk yesterday to see if they had Persona 5 for the Switch, and spent a while explaining that no I don't mean the various praystation versions from the past few years I mean the version that just came out for the Nintendo Switch, the handheld console, and yes I want a cartridge version because I could already have downloaded it from the store within the console otherwise... Initially they said they only stock preorders (because why would a store have anything you could walk in and buy, the ideal store is apparently an amazon pickup locker with real estate and staff), but then after the guy on the phone talked to his manager they said they had one copy of the "metal box" version (whatever that is). I'm not sure if they really do or were just trying to get me into the store to upsell me on something else, and even if they do would it still be there by the time I made it across town without a car (they're next to the Northcross Mall, which apparently grew back and is operating again).

That said, when I walked to UT on thursday they'd moved all the tables against the open side of the porch and taken away the extension cord I'd left plugged into the outlet at the start of the pandemic (which students regularly used, I left a 3-way splitter on the end). It's possible my voicemail to the building manager about resetting the timer on the lights to NOT leave 4 hours of darkness before they come on each evening has triggered some kind of spring cleaning of the patio area? Either way, the table near the outlet isn't back until they finish whatever they're doing (and I'm not gonna drag it because that would interfere with whatever it is)... and I fly to Minneapolis on wednesday... so I haven't bothered to walk back to UT since. Hopefully they'll have worked it all out by the time I get back, and I was kinda wandering to a day-ish schedule anyway (because Fade's on one; yes I should get covid booster before getting on airplane, yes I still have a needle phobia, working on it).

Currently my schedule is "get up at 3am and go to bed at 6pm" which remains night-esque so far. I still can't work at home because cat, but there's no shortage of tables WITHOUT outlet. (I'm currently at the outside dining area of In-N-Out: it's 4am, nobody's here to object.) Working on battery instead of with an outlet means working in shorter bursts, but again: flying to Fade's on Wednesday for two weeks, after which UT may have sorted itself out.

That said, In-N-Out is a 5 minute walk each way, not an hour, so I'm not getting nearly as much exercise. Pondering walking to gamestonk before the sun comes up and taking the bus back: twice as far a walk but I'd only be doing it one way, and I got a lot of practice working with laptop on bus when I commuted to work at Pace up in the Arboretum, the job I had before going to work for Jeff... Alas, they don't open until noon on sundays.

I had to write the first paragraph of today's blog entry TWICE because I accidentally hit "u" one too many times in vim (undo) and it ALL VANISHED AFTER ONE KEYSTROKE (because it was all one typing session without me hitting escape out of input mode to save, so that was the undo transaction size), and of course if you type one more letter by accident there's no "redo" because you're already at the newest change and it doesn't retain branches. If I ever manage to circle around to properly focus on toybox vi I'd like to work out how to make that suck less, but I'm not sure how. Some sort of undo navigator? In text mode? Hmmm. Last I checked, the busybox reaction to "redo" (ctrl-r, yes I have to look it up each time) was to print "not implemented". Eh, jot a note on the todo list and move on...

I did NOT fix netcat -u, the test in my commit message doesn't have -u in it. I still think TCP and UDP can mostly share a common codepath, because you CAN read and write from udp sockets via normal read() and write() calls, and that's how it HAS to work in order to provide stdin/stdout to a child process when netcat -l or -L is running a command line to handle the connection. (The child process is not going to recvfrom() stdin.)

Right now, it works halfway: a ./netcat -u 9876 client connects to a ./netcat -u -s -p 9876 -l server, and what I type on the client is printed out by the server (-l without a command to run does "chat mode", passing along stdin/stdout data to the other end), but as soon as I type something in the server it goes nc: xwrite: Destination address required.

The problem is the server side hasn't called accept(), because the sequencing's changed. For -l (single connection) mode the first incoming packet lets you know who to reply to (and to ignore any incoming packets that AREN'T from them). In TCP this is handled by accept(). In UDP this is sort of recvfrom's job, but the difference is accept() doesn't have a payload. When you recvfrom() you get data, and need to pass that data along to the consumer. The pollinate() function I did does poll() and read/write for a pair of filehandles, properly handling the half-connection required for the far side's read() to get EOF and know it's time to finish and flush pending data. (HTTP 1.0 connections didn't work without that, 1.1 uses more explicit framing, but it's still a thing you want to get right.) And if I write() the extra data to the output fd before calling pollinate(), that write can block until the child reads it, meaning a child that produces a lot of output before reading its input can fill up its output pipe buffer and the whole thing hangs. This is why pollinate() exists: to do that particular dance RIGHT.

This "I've read data but can't lseek() back in the stream" problem cropped up in tar as well (autodetecting tarball type from stdin), and I think I wound up having to fork() an extra child process to deal with it. (I very much wanted to make that use case work for interface reasons: if "tar xvf file.tgz" works reliably then the "z" is going to drop out of people's muscle memory, which means "tar xf < file.tgz" should work too. It doesn't in debian, and would be easy enough to implement with lseek(), but you can't lseek() on "wget -O- | tar xv" because pipes aren't seekable. Hence putting a lot of work into making that work for toybox tar.)
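The fork-a-child trick looks something like this sketch (illustrative, not the actual toybox tar code): having consumed the magic bytes from a non-seekable fd, spawn a helper that replays them followed by the rest of the stream into a fresh pipe, and hand the caller that pipe's read end as an "unconsumed" stream:

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>
#include <assert.h>

// Sketch (illustrative, not actual toybox tar code): we already read() 'len'
// bytes of magic from a non-seekable fd, so fork a helper that writes those
// bytes back out followed by the rest of the stream, and hand the caller the
// read end of that pipe as an "unconsumed" replacement stream.
int replay_fd(int fd, const char *peeked, size_t len)
{
  int pipefd[2];
  pid_t pid;

  if (pipe(pipefd)) return -1;
  if ((pid = fork()) < 0) return -1;
  if (!pid) {
    char buf[4096];
    ssize_t n;

    close(pipefd[0]);
    write(pipefd[1], peeked, len);  // replay the consumed bytes first
    while ((n = read(fd, buf, sizeof(buf))) > 0) write(pipefd[1], buf, n);
    _exit(0);
  }
  close(pipefd[1]);
  close(fd);

  return pipefd[0];
}

// Demo: consume 2 "magic" bytes of "hello" from a pipe, then read all 5 back.
int replay_demo(void)
{
  int p[2], fd;
  char buf[16];
  ssize_t n;
  size_t total = 0;

  if (pipe(p)) return 1;
  write(p[1], "hello", 5);
  close(p[1]);
  if (read(p[0], buf, 2) != 2) return 1;      // "type detection" consumed "he"
  if ((fd = replay_fd(p[0], buf, 2)) < 0) return 1;
  while ((n = read(fd, buf+total, sizeof(buf)-total)) > 0) total += n;
  close(fd);
  while (wait(0) > 0);

  return !(total == 5 && !memcmp(buf, "hello", 5));
}
```

The cost is an extra process doing nothing but shoveling bytes, which is why it only gets used when lseek() genuinely isn't available.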

Aha! recvfrom(MSG_PEEK) is UDP's alternative to accept(). Until then I'd half-convinced myself that the write() would only block if it filled up the pipe's output buffer, which should never happen on Linux unless I supported recvfrom() atomically collecting "jumbo frames" (which I wasn't planning on; don't ask me how splitting works here though: probably sanely? I guess? If UDP drops data instead of splitting frames too big to fit into the... no, the man page says recv() is equivalent to read() with flags, it HAS to split right...). Anyway, that write() could still go to blocking stdout and cause the same problem. Yay MSG_PEEK as -u's accept() alternative. (I was wondering if a zero length read might also do it: block until there is data but consume no data, but that's not the way they decided to go...)
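A sketch of the MSG_PEEK dance (my guess at the shape of it, not the actual netcat commit): peek to learn the peer's address without consuming the datagram, then connect() so plain read()/write() work on the socket and the kernel filters out everybody else's packets:

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>

// Sketch of a UDP "accept" (my guess at the shape, not the actual commit):
// MSG_PEEK blocks until a datagram arrives but leaves it in the queue, so we
// learn the sender's address without consuming payload, then connect() the
// socket to that peer. After this, plain read()/write() work and the kernel
// discards datagrams from anybody else.
int udp_accept(int sock)
{
  struct sockaddr_storage peer;
  socklen_t len = sizeof(peer);
  char c;

  if (recvfrom(sock, &c, 1, MSG_PEEK, (void *)&peer, &len) < 0) return -1;

  return connect(sock, (void *)&peer, len);
}

// Demo: loopback client/server pair, one datagram each way.
int udp_accept_demo(void)
{
  int srv = socket(AF_INET, SOCK_DGRAM, 0), cli = socket(AF_INET, SOCK_DGRAM, 0);
  struct sockaddr_in addr = {0};
  socklen_t alen = sizeof(addr);
  char buf[8];

  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  if (srv < 0 || cli < 0) return 1;
  if (bind(srv, (void *)&addr, sizeof(addr))) return 1;  // kernel picks a port
  if (getsockname(srv, (void *)&addr, &alen)) return 1;  // learn that port
  if (connect(cli, (void *)&addr, sizeof(addr))) return 1;
  if (write(cli, "hi", 2) != 2) return 1;
  if (udp_accept(srv)) return 1;
  if (read(srv, buf, sizeof(buf)) != 2) return 1;  // peek didn't eat payload
  if (write(srv, "ok", 2) != 2) return 1;          // plain write() now works
  if (read(cli, buf, sizeof(buf)) != 2) return 1;
  close(srv);
  close(cli);

  return memcmp(buf, "ok", 2) != 0;
}
```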

The next problem is -L (multiple connection) mode, because with -u there's no accept() call so we call read() on the socket directly, which means UDP needs to call connect() on the socket to target it at the far end in order to use normal write() on that filehandle... but there's only one socket shared by all the incoming connections. If we fork() and multiple children call connect() on their inherited socket filehandle, will they be sufficiently isolated? (This is similar to "if I do an lseek() on this fd is the position change visible to another fd".) What is shared and what is separate? No idea, gotta test it...
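A quick standalone experiment (not toybox code) suggests the answer is no, they are NOT sufficiently isolated: after fork() both processes' fds refer to the same open file description, and connect() on a UDP socket records the peer address in that shared object, so a child's connect() shows up in the parent:

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/wait.h>
#include <unistd.h>
#include <assert.h>

// Experiment (standalone, not toybox code): after fork(), parent and child
// fds refer to the SAME open file description, and connect() on a UDP socket
// just records the peer address in that shared object. getpeername() only
// succeeds on a connected socket, so it detects the child's connect().
// Returns 0 if the parent sees the child's connect().
int shared_connect_test(void)
{
  int sock = socket(AF_INET, SOCK_DGRAM, 0), status;
  struct sockaddr_in addr = {0};
  socklen_t len = sizeof(addr);
  pid_t pid;

  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(9);   // discard port: UDP connect() sends no packets
  if (sock < 0 || (pid = fork()) < 0) return 1;
  if (!pid) _exit(connect(sock, (void *)&addr, sizeof(addr)) != 0);
  waitpid(pid, &status, 0);
  if (status) return 1;

  return getpeername(sock, (void *)&addr, &len) != 0;
}
```

If that holds, -L children can't each connect() the one inherited socket without stomping on each other, which is an argument for the dispatcher approach (or per-child sockets) however much dowanna applies.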

And yes, I am making all this house of cards work ON NOMMU SYSTEMS. The actual non-vfork fork() is for netcat -l without -p (so it can print out the random port number it's bound to and the script that ran it can assign it to a variable and use it without waiting for the process to exit). The child processes are launched via xpopen(), which means I can have an xpopen_setup() do a callback to call connect(). Which on nommu is still in vfork()-before-exec context (where the parent process is suspended and the new child process still shares all the parent process's memory mappings and anything it writes will affect the parent process, BUT the child already has its new file descriptor table, so if connect(sockfd) DOES have different state than the parent's fd, that would be where to modify it.)

(I mean I COULD do a whole new UDP dispatcher based on recvmsg() to pass data back and forth between known recipients in a single host process using one big long poll[] but... dowanna? Like, really dowanna?)

Sigh. Netcat's unix domain socket codepath shares very little code with the TCP and UDP codepaths, and interleaving them makes it if/else salad. Wanna ask Tom Cherry if there were any actual USERS of this, or if he just added it because other implementations had it? But not now...

Today I learned that if you vfork() from within vfork() context, the error return is "resource temporarily unavailable". Good to know.

October 22, 2022

Attempted to navigate my pet peeve with a minimum of venting. Probably didn't manage it, but there's a commit now and I think I have a test for it? (The downside of catching these entries up to date is I uploaded multiple extensions to yesterday's entry, some of it NOT in the middle of the night US time, and have no idea if RSS readers notice the updates? Oh well...)

I still can't run the whole test suite for the NDK build, but very much do NOT trust this is the end of it if they're inventing fresh "undefined" behavior. A quick grep -l 'void [*])1' toys/*/*.c finds that not just in sed, but in tunctl, sh, find, and test. (And demo_editline, but eh.) And that's just ONE variant of this sort of thing: the case they hit wasn't that. Another variant is that I'm incrementing the bottom bit of known-aligned pointers in my diff.c implementation to mark potential matches (and then decrementing them again before freeing them, because I know what the system is ACTUALLY DOING, and why). The first annoying "compiler mis-optimizes" case was main.c calculating stack consumption, where I typecast to (long). (Not unsigned long, because I need the difference between the two of them and if I ever DO support pa-risc Linux, stack there grows up. So it's absolute value of signed difference.)
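The diff.c bottom-bit trick can be phrased through uintptr_t so the set/clear math happens on an integer, and only a correctly-aligned value ever gets converted back to a pointer, which takes away the optimizer's "that pointer value is undefined" excuse. A sketch with hypothetical names:

```c
#include <stdint.h>
#include <assert.h>

// Sketch of diff.c-style bottom-bit pointer tagging done via uintptr_t, so
// the tag math happens on integers and only an aligned value is converted
// back to a pointer and dereferenced. Function names are hypothetical.
static inline void *tag_ptr(void *p)
{
  return (void *)((uintptr_t)p | 1);    // low bit is free on aligned data
}

static inline void *untag_ptr(void *p)
{
  return (void *)((uintptr_t)p & ~(uintptr_t)1);  // clear before free()/deref
}

static inline int is_tagged(void *p)
{
  return (uintptr_t)p & 1;
}
```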

But yeah, I see the reference they gave to c11 and I checked the relevant part of c99 and it's got similar "we can't guarantee char is a byte" hedging. A pointer is not a reference. I guess I need to lean MORE heavily on LP64 and thus typecast MORE math to unsigned long to keep the compiler from being fscking stupid. But it would be nice if I could run the whole test suite from the NDK build, since that's the compiler exploring strange new worlds, seeking out new breakage, and boldly going there. Need to put in more work on testing, as always...

I fired up the Fedora vm to see what swapoff -va is supposed to print, and apparently if /proc/swaps has "/dev/zram0" (the VM booted from a read-only DVD image with no persistent storage, but I'm guessing that's a compressed ramdisk) then it prints "swapoff /dev/zram0". And of course Fedora won't let me swapon /dev/zram0 again right afterwards because "read swap header failed" and you just make jazzhands at Fedora while backing slowly away. (The system should be straightforward and intelligible. Lots of variants of it aren't. Those are broken and unmaintainable and will go away in time.)

It's a pity I can't give Fedora display size hints on the kvm command line so it starts in 1024x768 instead of needing manual adjusting each time. There's probably a way to do it, but it's not immediately obvious what it is and I used all my polite on the compiler bug that hit sed. (Yes, I read the part that says c99 was already breaking. I am sad, but not repentant. A pointer is not a reference. I have used Java, Python, Lua, and more. I know what a reference is. Half-assing it like this does not help.)

So anyway, for i in potato fungus; do dd if=/dev/zero of=$i bs=1M count=10 && mkswap $i; done (and their dd still says bs=1m is "invalid number" because apparently the case of the suffix matters) and... that -v output is very, very chatty. And redundant. Is it swapon's job to complain about insecure permissions and ownership? And the "found signature" line isn't helping, the useful info it provides is repeated on the next line.

What does busybox... not installed in the VM and I don't have net right now. Ok, compile a static defconfig busybox with the musl-cross-make toolchain and... huh. "make distclean defconfig busybox" dies with a missing header file, they haven't got the dependencies set up between those targets to let you just say CROSS_COMPILE and LDFLAGS once and tell make to do all the things in one pass. In 2022. Ok.

Darn it, I can't find that test plumbing I thought I'd checked in. Where I went to ALL THAT EFFORT to get it to work in parallel and everything... where did that go? Ah, scripts/. I thought more of it was in scripts/root/tests, but apparently not. (Wanted to grab the httpd invocation out of there rather than work it out again, because no matter how I invoke dropbear I can't get it to run as a normal user on a high port and let me log in via username and password (or without password). I need a chicken-and-egg "copy key file onto the board I need dropbear to copy files into", or I need to run dropbear as root on the host so it can read /etc/shadow (and then type my password into the Fedora VM, which I'd rather not). Fedora has wget and I have httpd, and busybox is just a file...) (Yes I could just read busybox's source to see what the -v message is, but dowanna because license. Hygiene thing.)

And busybox swapoff doesn't have -v. Well of course. Half hour well spent, that.

Anyway, Fedora's swapoff -av prints one "swapoff /dev" line per device, not "swapoff /dev /dev" which is what I was wondering. (No of course there's no swapon or swapoff in posix.) Swapon accepts multiple files on the command line and iterates through them, and given -a will still go through the command line files. How does it handle spaces in the mount point field... "man 5 fstab" says octal escapes, same as the kernel produces. Ok. Assuming /dev/disk/by-uuid is kept up to date, handling UUID= entries shouldn't be a big deal. (I admit to curiosity about "UUID=../../../home/user/filename" but you have to be root to run this already. Dunno what /dev/disk/by-partuuid is for, my first guess that I could open any filename in there using the first few digits of one of the UUIDs out of the other directory was not borne out by experiment. It's a synthetic filesystem, but they're PRETENDING it's organic. None of the sort of tricks you see in /proc...)
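Decoding those octal escapes is only a few lines; here's a sketch (a hypothetical helper, not the actual toybox plumbing) of turning "\040" back into a space in place:

```c
#include <string.h>
#include <assert.h>

// Hypothetical helper (not the actual toybox code): /proc/swaps,
// /proc/mounts and fstab encode awkward characters (space, tab, newline,
// backslash) as 3-digit octal escapes, e.g. "\040" for space. Decode a
// string in place; anything that isn't a valid escape passes through.
void unescape_octal(char *s)
{
  char *in = s, *out = s;

  while (*in) {
    if (in[0] == '\\' && in[1] >= '0' && in[1] <= '3'
        && in[2] >= '0' && in[2] <= '7' && in[3] >= '0' && in[3] <= '7') {
      *out++ = ((in[1]-'0')<<6) | ((in[2]-'0')<<3) | (in[3]-'0');
      in += 4;
    } else *out++ = *in++;
  }
  *out = 0;
}
```

In-place works because the decoded string can only shrink (4 escape bytes become 1).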

Hmmm, the "swapoff /dev" lines go to stdout and the rest of the diagnostics go to stderr, making them less likely to be parsed by accident? "swapon doesnotexit" sets $? to 255, and swapoff -a as a normal user sets $? to 16. I am 99% sure I do not care, and am unlikely to start. (Wait for people to complain...)

October 21, 2022

Sigh, they're breaking the compiler again.

A pointer is not a reference. A pointer is a variable containing a memory address. If you dereference a pointer that does not point to a valid mapping, bad things can happen. (Your attempt to access memory may have other constraints like word alignment, you can read but not write here, appropriate access size for memory mapped hardware that only knows how to give you 32 bits at a time and gets unhappy when asked for 8...) But the POINTER should never be the problem, it's just an address. It can point ANYWHERE. Dereferencing the pointer is the fraught bit.

If your compiler's pointer arithmetic can't deal with putting any value into the pointer, A) your compiler is broken, B) we need to use integer arithmetic and typecast the result back to a pointer, or some other way to work around the compiler bug.

In this case, dead code elimination is triggering inappropriately. That's another class of optimization present in Turbo C for DOS circa 1987, although that proprietary codebase back then called it "jump elimination" and the classic ACM paper people refer back to explaining the basic technique is from 1991. Still: over 30 years ago now, the general idea is not new.

Historically the big hammer for this is the "volatile" type modifier, which literally means "the optimizer gets this wrong". That is all volatile has EVER meant, initially due to register allocation preserving values and thus optimizing away accesses to memory that got modified by hardware or interrupts (yes, the simple optimizations of the olden days still sometimes managed to be wrong), and the more optimizations the compiler grows the bigger a hammer "volatile" becomes. Alas, you can't tell it to disable just SOME optimizations performed on this variable: volatile disables even maintaining the variable in a register, so each access becomes a fresh read/write of the appropriate memory location. (CISC often did operations while loading registers, RISC usually means register load and operation on registers are separate steps, then write back from register is a third step. The ability to execute multiple operations per clock cycle was an acceptable tradeoff for this.) And as with "const" there's a slightly different syntax for "the contents of the pointer is volatile" and "what I dereference through this pointer is volatile" that I always have to look up, although I'm pretty sure the first covers the second...

It's a pity I can't just while ((volatile)thingy) in this instance to disable JUST the bad dead code elimination instance, but typecasts don't modify the existing type, and that would default to "signed int" for the rest of the characteristics. Which to be honest would still mostly get the right result... except on big endian 64 bit systems. Casting to (volatile long) is probably the shortest thing to type because we're checking for null so sign bits aren't gonna make a difference. (LP64 means pointer fits in long.) That's assuming this compiler understands "volatile" as part of a typecast, not just a variable declaration. (I don't THINK any modern compilers still have that bug, but I can't reproduce this so don't know if I've fixed it unless the submitter tells me. We're talking compiler bugs here, anything could happen...)
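For the record, here's the difference between the two formulations as a sketch (hypothetical function names). A qualifier on a plain value cast applies to an rvalue, where the standard lets the compiler ignore "volatile", so forcing the access through a volatile-qualified LVALUE is the form that reliably survives:

```c
#include <stdint.h>
#include <assert.h>

// Sketch of the two formulations (hypothetical names, not toybox code).
// C permits compilers to ignore qualifiers on the result of a value cast,
// so the lvalue form below is the one that reliably forces the access.
int check_cast(void *p)
{
  return (volatile long)(intptr_t)p != 0;  // LP64 assumption: pointer fits in long
}

int check_lvalue(void **pp)
{
  return *(void * volatile *)pp != 0;      // re-read *pp as a volatile object
}
```

In the while loop case that would look like while (*(void * volatile *)&thingy), which is ugly but only disables optimization of that one access.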

Anyway, waiting to hear back from the bug submitter. Expanding "undefined" behavior in C is a pet peeve of mine. I'm trying to be reasonable about this, but C is not C++. All those memory access constraints I listed above I've personally hit. If toybox needs to measure the stack consumption on a nommu system to figure out whether to recurse or exec, there should be a way to do that without inline assembly. Stop breaking the language.

Phone tethering is slow. I got the text about "you have used 50 gigs of bandwidth this month" on the 18th. Alas, Google Fiber keeps having very long delays before it starts loading a thing (I need to throw the white circle router out and get a new one) with sometimes outright "does not work at all for 30 seconds" dropouts, so I keep turning my phone's wifi off when it fails to load a video and then forgetting to turn it back on again. T-mobile doesn't cut me off, but the ratelimiting gets progressively slower until it resets sometime after the 1st, and that's still a ways off this time. The tethering is separately metered (I think that WILL cut me off if I go over? Capitalist shenanigans, it's the same data!), but the RATE of the data is a subset of the regular phone data rate (slower than the phone gets), which puts it into unhappy territory just now.

Sigh, update from Elliott with a test. Which of course does not reproduce the problem in the x86-64 toolchain on my devuan system. Alright, do a "repo sync" in aosp and grab the latest NDK (I have 25 and 25b is out) in another tab. He gave me arm assembly output so try that I guess...

Alas, forgot to reboot the google fiber router before starting this, so after an hour repo's at 3% and the NDK is halfway through the download. I've shifted my schedule forward a bit, but I'm still mostly on a night schedule. May have to let it run "overnight" as it were and resume this evening...

October 20, 2022

Robert Piekarski reported a netcat bug (via linkedin! That's a new one).

$ toybox nc -4 -u -l -p 9876
netcat: listen

$ strace toybox nc -4 -u -l -p 9876
listen(3, 5) = -1 EOPNOTSUPP (Operation not supported)

Dunno why errno isn't getting set by the listen() libc wrapper? (It should be netcat: listen: strerror_message but the last bit isn't there if errno is 0. According to strace errno is NOT zero, but it's not winding up in errno. And the funky part is musl and glibc are BOTH doing that...?)

Josh Gao added UDP support and I haven't poked at it, but I have an old netconsole tutorial bookmarked... which has turned into a "flywheel" hosting error page? But has both parts, and here's the kernel docs on the same thing. I've been meaning to add a mkroot option to do that, probably as a scripts/root package... (Except that would add on TOP of the serial console? Hmmm...)

Sigh. I just wasted an hour tracing through musl's listen() implementation and __syscall_ret() and sticking raw syscall() into netcat and then tracing through my lib/ code to eventually work out "oh, that error_exit() should be a perror_exit()". Yes, that would explain why it's not printing the errno string: because I didn't ask it to. And the bugfix was just to make the call to listen() conditional on not having -u, the rest was changing the command line to do the server/client thing right. (Server needs -s and -p. Possibly the user interface should be updated so it automatically shifts target address to source address when you -l, but that's how it's been all along...)

Can you syscall(__NR_syscall) and if so, how many times? (I explicitly blocked "toybox toybox toybox ls" because shenanigans. Dunno what the kernel guys did about reentrancy there. It would eat a register every time and there's a maximum of something like 6 arguments at the best of times...)

I'm tempted to make a lib/ function to concatenate an array of strings, since the two pass loop to do that is so common: first pass to measure, malloc(), second pass writes the data in with a separator. Except looking around for instances of it that could be replaced with the library function is a bit fraught because some of them use s += sprintf(s, blah), some are strcpy... The first one that comes to mind is dirtree_path(), but that traverses a linked list (actually goes up the ->parent nodes of a tree, but it's a linked list for this purpose). Should the array end with a null pointer entry ala argv[] or should it have a length? The instance of this in watch.c has the start of the string be a bespoke sprintf() instance that's not part of the array. Should there be a way to tell it to allocate extra data at the start (or end) that you can add extra "weird" entries to that aren't in the array? The extra argument to dirtree_path() is the number of extra bytes to allocate at the end.

There's an awful lot of ALMOST generic code that stops being quite generic enough when you look at it closely, and isn't easy to massage into a common format. The changes and wrapping you'd need to factor out common code are sometimes intrusive enough that the copying's worth it. Lots of balancing act analysis that results in NOT doing the thing. Sadly, a lot of the work I do doesn't produce immediate tangible results.

Elliott has been cc'd on a lot of the back and forth with the middleman handling the Google sponsorship money, and when Lars asked whether we should use the same middleman company again Elliott asked how that was going. I edited a lot of sidequest out of my reply to the two of them in a vain attempt to seem vaguely professional. Here's the cut and paste before I did that. (Waste not want not, I wrote it up, might as well put it here.)

> did they resolve your most recent problem yet?

No, ball's in my court. There was no deadline, and I'm always reluctant to do financial transactions via web because I know just enough about how the sausage is made to see it as rolling security dice. "Here's three different ways every link in this chain could be compromised, will we get away with it this time..." But you're right, I should get this over with...

Their support says the problem is the ABA routing code has expired or something? (Which isn't a thing I knew they could do.) Fade says the one she gave me is still on the bank's website... which it is. Ah, I see. I used the "wire transfer" ABA routing number, and I guess I was supposed to use the per-state checking account ABA routing number. (I didn't know one bank has many different ABA routing numbers, more than one of which can transfer money into the same account. Fade's in Minnesota, I'm in Texas, we share a bank account... Nope, not going further down this rathole now.)

I tend to procrastinate about opening websites with financial data because I know just enough about how the sausage is made to be reluctant to roll the dice on security each time. Ordinarily I just assume anything I type into a browser is probably public, and that's... not good for bank info.

I don't THINK anybody's got a keylogger on my laptop, don't THINK my browser's had a fake master certificate installed with a MITM for everything, don't THINK any data's leaking through non-encrypted channels across the wifi (when WPA2 got kracked in 2017 did they duct tape it back together or just ignore it and move on like rowhammer?), I don't THINK devuan's package repos have been compromised (which would presumably be noticed by somebody other than me), I don't THINK the website at the far end is exploited or likely to have an embezzling employee or sends offsite archives someplace insecure (but really dunno their infrastructure layers), I don't THINK the handoff to Wise is being monitored, don't THINK Wise's own infrastructure is cracked (and that would fall under a Berkshire Hathaway style reinsurance policy if so... maybe even trigger FDIC? I'd have to ask Cathy...)

But knowing a bunch of links in the chain that COULD each fail makes me nervous, and reluctant to open the can of worms. I can mostly filter out a bunch of "comfortably willing to ignore" categories: I'm 99.9% certain nobody would bother to point a yagi antenna through my window to reproduce my screen and keyboard data from the radio signal leakage just to perform minor bank fraud. (Although it turns out you CAN work out what keys people pressed from an audio recording using the little infrared laser dot on the window trick, or from the laptop's built in microphone. But again, that's five eyes BS, not cost effective for minor theft.) And I doubt my kernel's particularly compromised because while you can run rowhammer in Javascript (most web page ad suites use way too much CPU for no obvious reason while running random third party code in my browser), nobody's gonna target Devuan. Security through obscurity is real in that we're too small to care about. But just now, looking up routing number pages on my phone while I had the account info page open in my laptop browser, I tried not to point the backward facing camera at the laptop screen even though I _know_ how much effort you guys put into layering security everywhere in the phone. But android is THE big monoculture-ish target that gets a LOT of attention, and "phone got pointed at a check or this known list of banking websites" seems like something somebody might set up an OCR recognizer for...

Brevity is the soul of wit in part by editing out the data and reasoning you used to reach your conclusions. "I am idiosyncratically uncomfortable" != "here's why". As with politics, there's a liminal space between being fully truthful and sounding like an absolute doomspouting loon. At least in the computer science space I've got xkcd on my side. And as bad as politics gets, I take solace in the numbers: white male Boomer is the core demographic of the Magats, and the average Boomer is 67. If you split out the Joneses as a separate cohort, the midpoint of what's left puts the average Boomer at something like 72, with the pandemic having reduced the average US male life expectancy to 73. Part of the reason the political operatives doing mass elder abuse are moving so fast to undermine democracy: they're on a literal deadline.

Still, bad things that legitimately COULD happen doesn't mean they always do. (Don't get me started on the hypochondriac streak I developed from my two years of a pre-nursing major in college, lying awake thinking that the crunchy neck joint I inherited from my mother could turn into a hangman's fracture... A little knowledge is a dangerous thing, and in some areas that curve doesn't flatten out for a while.)

October 19, 2022

Fade's Pixel 3 stopped receiving calls until she rebooted it. "It's like being on Windows XP again", she says. I myself made an actual phone call for the first time in I don't know how long (poking the maintenance guy about changing the timer the lights are on at the geology building), and the Android phone app asked me to rate it. The stock one from Google! YOU ARE A PHONE. You may have forgotten, but I have not! I do not get asked to rate a car when I park it and turn it off. I am not asked to rate my front door when I get home and unlock it. A public urinal does not spit out a business card when I flush it. I'm aware Late Stage Capitalism is having extinction bursts all over the place, but no. If I open the bun of a hamburger and see third party advertising text written on the underside of the bun, I'm not eating it. Boil your own frogs, I object.

Heard back from busybox about the grep thing, and I can't even. The busybox devs are not seeing the same busybox behavior I'm seeing? From a fresh build from a fresh git pull? Ok, backing slowly away... I can wait for somebody to come to me with a use case where this affects something real.

The kernel has obviously broken dependencies in a way that... I can't have been the first person to hit this, can I? (I wanted mkroot to do the kernel invocation in a single line to avoid having to repeat all the variable assignments, but I can work around this if necessary...)

Ok, built a kernel with modules, what's my test here... cachefiles depends on fscache, so modprobe cachefiles... and toybox hasn't got modprobe because it's not in defconfig. Right, rebuild toybox... except rebuilding raises another design issue: mkroot can rebuild the root filesystem by itself without rebuilding the kernel just by skipping the LINUX= on the command line, and it'll re-use the existing kernel. Which lets me build and test the filesystem quickly without waiting multiple minutes for a kernel rebuild (and draining my laptop battery fast when it's not plugged in; I'm currently trying the breakfast at the UT Wendy's in Jester Center: the honey chicken biscuit is very sticky and not compatible with typing, but I still have 74% battery as I go for my first soda refill. I can do my own evaluation and review of this stuff thank you very much. It's ok, but not worth almost $8. I wonder how much just fries and the soda are without the sandwich? The old "rent the space" problem again. The 4 for 4 doesn't start up until the lunch menu, and the only reason that hasn't already been repriced is they made the price part of a large branding exercise with significant sunk costs. I give it until maybe January.)

Efficiently rebuilding mkroot with modules: if the kernel build writes modules into root/$ARCH/fs then recreating that directory without re-running the kernel build produces a filesystem WITHOUT the kernel modules. Is there a "make modules_tarball" perhaps? Not that I can find in the linux kernel's "make help". Hmmm. I can install to a temporary directory, tar them up myself, and then untar them into fs, but... that seems silly? And if I install them into fs/lib, tar them into "modules.tgz" next to the kernel build, and then untar them again only when NOT rebuilding the kernel I have two codepaths (I.E. less testing, and the very real possibility that something can break and I don't notice for quite a while because my usage pattern was consistently using the other codepath).

On the one hand, I'm reluctant to special case this. On the other hand, trying to work it into the more generic overlay mechanism... doesn't fit. The overlay I recently added is entirely optional, doesn't necessarily live in the build directory, and is externally user supplied rather than edited by the build. NOT special casing it sounds worse.

Right, install modules into a temporary directory under the linux build directory (so the existing delete of the temporary kernel source snapshot takes care of the temporary modules too), create a tarball from that, and then have the initramfs build include the tarball. Hmmm, you know, if I made a cpio instead of a tarball, I could just append them together? (Remember that nonsense with the TRAILER!!! entry being a hardlink flush, I know initramfs cpio extract gets that right. Dunno if it handles two concatenated gzips but I can (cat file1; cat file2) | gzip without too much effort...)

I let my fries get cold. Still 61% battery. But those mosquitoes I was annoyed at around the table? They're here in the Wendy's. And it would be far more rude to pull out my bug repellent can in here than it is out there. (Should have given myself a good spray before relocating.)

Walked home and stopped at HEB table! It is now 10am. I am borking my sleep schedule big time, but I'm trying to get modules working in mkroot... So modules.cpio instead of tar (and might as well make it cpio.gz since I can zcat as easily as cat and if it's there it'll get tarred up into the distributed result tarballs...)

$ scripts/ PENDING="modprobe insmod" CROSS=x86_64 tests
$ (cd root/x86_64 && ./
# modprobe cachefiles
# cat /proc/modules
cachefiles 40960 0 - Live 0xffffffffa0049000
fscache 294912 1 cachefiles, Live 0xffffffffa0000000
# modprobe -r cachefiles
# cat /proc/modules
# exit
reboot: Restarting system

To quote Philo from UHF: "Yeah. It works."

October 18, 2022

*blink* *blink* Lars at Google emailed to ask if I want to extend the toybox sponsorship next year. Ummm... yes? Very much so. (Right. What do I need to ship the 1.0 release...)

Still closing tabs. Trying to figure out if I should have a toyonly grep test for "echo -e 'one\0two' | grep -l ^t" saying it matches. (But grep -al doesn't? Wha?) Pondered emailing the coreutils list I'm STILL subscribed to (someday, they might add cut -DF like they said) but grep isn't part of coreutils? It has its own git repo. Right, clone that, there's no configure, run autoconf... and it dies with an error? (What does "build-aux/git-version-gen: not found" mean?)

Sigh, I should grab a grep source release tarball (within which they've presumably already run autoconf), but dowanna right now. However, I did notice that busybox is doing the same thing gnu/dammit is, so I emailed them to ask what's up with that...

October 17, 2022

Looked up my prudetube channel on my phone to see which videos I'd actually finished and uploaded, and... only four. And given what's there, filling out the command list seems like low hanging fruit, so I did the usual ls -oS toys/*/*.c | grep -v pending and... the smallest one is "clear" which is actually fraught (the escape does not reset the TTY out of "cooked" mode, nor does it fix the stupid "wrap at edge of screen" issue QEMU leaves that I had to put a fix into mkroot's scripts for), so I need to fix that but I need research to see if those are the ONLY issues (that's just what I've personally hit), and then figure out how to test it (mkroot is necessary but may not be sufficient)...
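For reference, a sketch of what a more thorough clear might emit. The escape choices here are my reading of ECMA-48/xterm behavior, not a tested toybox patch:

```shell
# Home the cursor, wipe the screen and scrollback, and re-enable autowrap
# (DECAWM, the "wrap at edge of screen" state QEMU's console can leave
# wrong). Note escape sequences still can't fix cooked/raw tty mode:
# that's termios ("stty sane") territory, a separate problem.
printf '\033[H\033[2J\033[3J\033[?7h'
```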

Next are true and false, which are the commands I've already done. And next is swapoff, which... should really have -a and -v? And swapoff -a means reading /proc/swaps, which presumably uses the same octal escapes as /proc/mounts so that code should be moved out of portability.c into lib.c so I can call it here...
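The parsing half of that hypothetical swapoff -a might look like this. Sample text stands in for /proc/swaps, whose escaping I'm assuming matches /proc/mounts:

```shell
# Skip the header line, then decode the \040-style octal escapes in each
# path before using it (printf %b interprets them); real code would call
# swapoff on the decoded path instead of printing it.
printf '%s\n' \
  'Filename Type Size Used Priority' \
  '/dev/dm-1 partition 8388604 0 -2' \
  '/swap\040file file 1048572 0 -3' |
tail -n +2 | while read -r dev _; do
  printf '%b\n' "$dev"
done
```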

ANYWAY: I also want to test swapoff, which means running it and having it do a thing, and that should be reasonably safe to do on my laptop... except for the darn Fedora KVM instance eating half my swap partition.

And that VM has gradually gone SEPTIC, which is annoying. Keeping that VM open has become gradually more of an imposition on my system to the point where when I switch tabs to thunderbird the system spends 5 seconds swapping memory with X11 task switcher frozen (on an SSD!). So I went into there to try to finish this: I left off where "LD_LIBRARY_PATH=$LIBS ./toybox" was going "./toybox: path stuff: version `XCRYPT_2.0' not found (required by ./toybox)" which is another facet of glibc being horrific.

Coming back to it I thought I'd just statically link toybox, but cp $(find . -name '*.a') libs/ is going "cp: input/output error" which says the kernel in the vm has gotten... unhappy. (Again: Fedora. Pointy Hair Linux. People who think systemd is a good idea. Amazed it works at all ever.) Oh, and some stupid cron job deleted my test file in /tmp. (Instead there are a dozen systemd-private-BIGLONGHASH-dbus-broker.service variants. Which are directories full of MORE CRAP. Of COURSE there isn't a /tmp/systemd-private to put them all in, or having them be in /run since you imposed that on everybody... No, a dozen separate very long filenames making sure ls can't do column view in there. Bra fscking vo.)

So I need to reproduce my debugging state in there from scratch, which I'm not looking forward to. What data can I still marshal out of the VM, what did I record in the mailing list posts and earlier blog entries, and what do I need to recreate? I was reproducing it in a specific toybox version, but both git log and git describe --tags in the toybox directory are giving me "Bus error (core dumped)". Thanks Fedora. But ./toybox --version gave me the hash (107996e296a5, commit on August 30 from Yi-Yo Chiang) of the version I built last, and that I can work with.

Yeah, leaving a Fedora instance running a long time is like leaving a Windows instance running too long: it gradually goes SEPTIC. (The host Debian that's been up all that time is still just fine. I _should_ reboot it more often, but mostly that happens because of laptop battery issues and the new one's been pretty solid so far. Fingers crossed...)

Alright, let's copy out the command history in the open windows: echo /tmp/foo.txt | LD_LIBRARY_PATH=../glibc/build ./toybox tar czf /tmp/out.tar.gz --absolute-names --transform 's,^/,,' -T- and git clone; git checkout glibc-2.35; mkdir build; cd build; ../configure --prefix=$PWD/sub, and then I stuck a dprintf in libio/iogetline.c function _IO_getline_info, and can probably take it from there? Command history didn't show a lot of package installs. Quite possibly that was also "garbage collected" by Fedora. (The original Red Hat was my primary distro until it was ended in favor of the whole "enterprise" thing. Code 11a, 11a2b, 1b2b3. Zero zero zero destruct zero..)

Devuan is MUCH happier with that VM closed. I still need to pkill -f renderer to free up enough memory to swapoff, lemme read some of these tabs I have loaded first...

October 16, 2022

Ok, want to do videos, want to try posting them to tumblr for previously explained reasons (and the way everyone seems to be abandoning ship or at least hedging their bets over on prudetube), need to either contact support or find a workaround to reset my account password. (Or break down and make a new one...)

Fixed the grep issue. Oops.

And I had grep -z and -Z wrong: it's not input and output separately controlled, -z does both input and output and -Z controls whether file listing (grep -l) ends with NUL or newline. (Filenames with newlines in them are kind of the original reason for a lot of this NUL termination stuff: / and \0 are the only two characters guaranteed not to be in a filename on Linux.)
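A quick demonstration of the split, as I understand the GNU behavior:

```shell
# -Z only NUL-terminates the filename list from -l (so it can feed
# xargs -0); -z switches both input and output records to NUL delimiters.
printf 'one\ntwo\n' > /tmp/grepz_demo
grep -lZ t /tmp/grepz_demo | tr '\0' '\n'    # filename ends in NUL, not \n
printf 'alpha\0beta\0' | grep -zc a          # two NUL-delimited records match
```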

Next question: my first attempt to test -Z was testcmd '' '-lZ ^t input' 'input\0' 'one\ntwo' '' and should I keep that as a toyonly test? The simpler test for the issue is echo -e 'one\0two' | grep -l ^t and the issue is the gnu/dammit version is considering "start of line" to start after an embedded NUL (when NOT using -z) and mine isn't. When \n is the line terminator, embedded NUL should basically be a normal character. I note that gnu/dammit sed is getting it right: echo -e 'one\0two' | sed 's/^t/x/' outputs onetwo and echo -e 'one\0two' | sed 's/t/x/' outputs onexwo. ...sigh, and mine isn't finding anything past the NUL.

October 15, 2022

Nobody's replied to me on the mailing list all week. Weird. Checked the spam filter, but that doesn't appear to be it? (Elliott gets busy with other things, the dmesg bug guy hasn't confirmed I fixed his bug, the debian guy hasn't replied about those patches...)

The tar --xform="flags=" syntax starts from 0. And it's not a delta from the previous default: it resets each time. So "flags=rh" is the same as "flags=rhS". No, my reading of the manual page did not give me that impression, and I did not have existing test cases for this feature either, so... (Implementing a feature I've never used before, which is not in posix, and does not come with existing test cases... is not the fast way to do it.)

Thinking of getting a small HDTV for the bedroom desk and moving the tiny cube computer there (it's only under the big TV because that's a TV, but there's nowhere convenient to keep the keyboard and mouse). Alas, people seem to just be throwing OUT the smaller ones rather than selling them used. The local used computer store ("discount electronics") moved from Anderson Lane up to Parmer Lane, down the street from where Fry's used to be. (About twice as far away from my house, but I biked to Fry's from here several times when it was still open.) Pulled up Google Maps for the first time in forever (uninstalled the app from my phone when it started showing me ads and stopped showing me local black-owned small businesses like the Great Clips in hancock center EVEN AT full zoom centered over them), and I was curious if anything's replaced the dead Fry's yet so I pulled it up in street view...

It's hilarious. Going from one side of the Fry's parking lot to the other takes multiple clicks, and along the way it switches between sunny midday and overcast evening, the trees gain and lose leaves (and in one corner there are even red fall leaves, which is hard to find in Austin). In adjacent clicks Fry's is open with a full parking lot and long-gone without a sign on the building anymore. It's utterly incoherent. Some parking lot lanes are eight clicks to go down and others jump from end to end in one click.

The walk to and from the University of Texas (2 miles each way) is pretty much the only exercise I get, and you can't do that in the day in Texas because sunburn and heatstroke. But the downside of working overnight at the UT table on the porch of the geology building is the lights switch off promptly at 6:30 am (on a timer; I've found and poked the relevant maintenance people and they went "oh yeah we should fix that" and then didn't, I should follow up but haven't yet). If I stay too long after the lights go off various Bad Things happen: in summer, mosquitoes descend in a CLOUD, and it gets hot walking home once the sun's come up. A little after that, the stupid carillon in the High Powered Rifle Tower starts playing "I've been working on the railroad" very slowly in a loop for 15 minutes which is far more annoying than it sounds. And eventually, it gets bright enough it's hard to see the laptop screen, which rules out using the phone screen on the walk. So I'm in the habit of packing up and heading out reasonably promptly after the lights switch off.

Since plugging into the outlet at the table leaves my laptop battery fully charged, I keep thinking I should stop at the tables at HEB's deli section (two blocks from home) and get a couple more hours work in there before calling it a night, especially when I was in the middle of something. But I've gotta buy SOMETHING as an excuse to eat/drink it at HEB's table. (Capitalism refuses to allow public places where humans can exist without paying, it actively destroys places like libraries and civic centers where that would be acceptable. I'm kinda pushing it at the UT geology building but the security guards accepted my "alumni" explanation and are used to me now; being a reasonably groomed white male probably helped there). So I have to shop first, which turns into a circuit of the store when I arrive (especially when they're out of my preferred cans of checkerboard tea, it's stocked by a vendor so HEB itself doesn't control the supply, and it's sold out around 4/5 of the time), and there are usually good clearance deals first thing in the morning on perishable stuff like ground beef, and then I want to get the perishables home to the fridge promptly, and once I get home I collapse and don't head out again despite what I'd planned before walking in the door. (And if I try to sit down and work at home, there is cat. There is always cat.)

It's all psychology but ADHD starts you with a deficit of executive function; I was young enough they still called it "hyperactive" and "gifted" when I was diagnosed at age 7, and they didn't start giving kids Ritalin until long after that. We had to self-medicate with caffeine. Uphill, both ways, in the snow.

Anyway, today I made my saving throw vs clearance and actually sat down at the HEB table and did more work because I ALMOST had tar --xform working and wanted to GET IT DONE. And did. It's checked in now. (For a definition of "now" that's actually 8 am on the 16th, but I started this blog entry on the 15th so it counts. Night schedules make dating things weird. I generally start the new day's entry AFTER sleeping...)

October 14, 2022

Anime update: I wouldn't say I _finished_ "Jobless Reincarnation" so much as ran out of episodes. (I assume they're working on another season because plotwise it just trailed off.) Now rewatching Season 1 of "Realist Hero" and "The Devil is a Part-timer" because it's been a while and Season 2 of both is out, although in each case "watching" is mostly "listening to without looking at the screen", because I'm still trying to learn Japanese and it's not like I'm going to miss major plot developments the second time around. (Yeah, this is about my normal rate. The big burst last time was "what I watched over the pandemic", only a couple of those series were particularly recent. Usually my Incomprehensible Japanese Exposure is "Konnichiwa, NHK News Des" because podcasts work with the screen off, but in THEORY I already know these stories so should have a bonus to picking out words and sentences I've been exposed to the translation of already. I'm recognizing a lot of repeated words that I don't know the meaning of, I suppose that's a kind of progress?)

Finally updated the music playlist on my phone so I can listen to current stuff again. Five years ago I'd program with a youtube music list playing in headphones, but when prudetube switched to playing two 30 second ads between each song (with interruptions in the middle of anything 8 minutes or longer) I switched back to mp3s. But the mp3 directory I had on my phone was a bit stale (only a couple things newer than you could get through Napster), partly because I mostly don't compress stuff to mp3 anymore? (I was ripping CDs to .wav files for a while, and then I found flac. Lossless codecs for the win.)

I've had various music collections dating back to cassette tapes recorded off vinyl records and the radio. Took me years to find proper sources for "Deteriorata", "Dead Puppies", or The Band With No Instruments' rendition of "girl you've really got me now". I still own all sorts of stuff on CD, but seldom pull those out anymore, my USB DVD drive is basically there to rip stuff into a format I can use and then it goes back on the shelf. (There's a blu-ray player built into the praystation, but "I Pancreas Video" closed down over the pandemic and the collection on the living room shelves is DVD not Blu-Ray. Yeah it plays them, and we do sometimes, mostly for the commentary tracks on the Leverage boxed set and so on. But you have to sit AT THE TV to do that. That's for special occasions.)

I still occasionally buy CDs and DVDs (poked Fade to get the Harley Quinn Season 1 and 2 DVDs so we could watch them together), but that's more "voting with our wallet" and "buying a license" than doing much with the physical media. If I'm going to spend money I want to GET physical media, the first season of Jodie Whitaker's Doctor Who we bought through Prime sits there unwatched without even a house ornament to show for it. (I made it as far as the episode with the pokemon eating the spaceship, but then I never finished Capaldi's last season either: "extra oxygen to protect from fire" and "the moon is an egg that hatched" led into a multi-arc episode about blindness that turned into "we have brainwashed the world" which I was not up for during the trump administration; simultaneously tense and boring.) But even though my Alpocalypse CD (birthday present some years back) is still in the shrinkwrap, I don't really own it unless I own a physical thing: first sale doctrine does not enforceably attach to downloads. If I physically own a copy of Cringley's "Triumph of the Nerds" PBS series on VHS, I'm legally allowed to have a copy on my laptop. No longer being entirely sure if we still have a VHS player is beside the point: somebody gives me trouble about the rip, I get to physically hit them with the box set. It's 99% "please make more of this" and 1% "someday I might really get to ruin a lawyer's day", but still... (I'm a lot looser about this with books, but I'm used to books wandering off. Somewhere I have a list of multiple dozens of books I gave to my apprentice Nick, and I lost touch with her when she moved to Florida years ago. Plus my house is probably at least 1% printed material by weight. I do not have multiple rooms lined with DVD shelves.)

Alas, some stuff just isn't available on physical media: I'd happily buy the She-Hulk series on DVD but Disney refuses to sell them. You can't buy Rock Sugar's "reimaginator" anymore due to a really stupid court decision from a batty judge, but Rock Sugar turns a blind eye to its presence on prudetube. Svrcina's "My Domain" was on an album that does not seem to have had a physical release (and the downloads they did sell were mp3 encoded). And where exactly would you buy copies of mashups like DJ Lobsterdance's "Stayin' Alive In The Wall" or Grave Danger's "Under My Chandelier"? (Or classics like Nirgaga and Radioactive In the Dark and so on.) Or Oleg Berg's minor key versions of songs like Wrecking Ball and YMCA (noticeably better than the originals if you ask me: Miley sounds like she has actual _remorse_ in the minor key version, and minor key YMCA is a FIGHT SONG).

But I'm not going to switch to a DIFFERENT streaming service because all of them have the same basic problem as prudetube: even if you've "bought" copies of things through them it's not really yours. (Just try applying "first sale doctrine" or "archival copy" or "space/time shifting" to Audible.)

October 13, 2022

Oh no, tar --xform="flags=x;s/a/b/" is supported, isn't it? It's not JUST the RSH flags? No, the docs say "Default scope flags can also be changed using ‘flags=’ statement in the transform expression." It is specifically for SCOPE flags: rRsShH. (That's my story and I'm sticking to it.)

The problem with having misplaced a half-terabyte microsd card is it's physically small enough it's kind of hard to find it. (The one in the slot was a random 32 gig one full of temporary junk, used for transferring files to an embedded board. Looks like I was sneakernetting a debug boot cycle to a turtle board at some point, and didn't put the big one back.) I mean... maybe it's in my backpack? It would probably take half an hour to prove that negative. Hope it's not another variant of "placed on a windowsill weeks ago, swept off and fell behind the baseboard" again. Who needs a steganography strategy, I can't reliably find my own stuff at the best of times.

October 12, 2022

Still vaguely trying to close tabs, and one of the tabs I'm closing had the test I was using for the sed -e -f interleaving, so I tried to add it to sed.tests and it didn't work. Because that test:

echo hello | ./sed -e 'a one\' -f <(echo -n 'two\') -e 'three'

...actually tests TWO things, and the other is line continuations. The patch fixed the interleaving, but an -f FILE ending with a backslash on the last line does not line continue out of the file's scope, because the EOF indicator (a NULL entry passed to the callback at the end of the file) flushes pending continuations. Which SOMETHING has to do, because echo hello | sed -e 'a bcd\' outputs two lines: "hello" and "bcd". The continuation gets a nominal "" line appended to it. But it's not end of THIS file, it's end of all input, and right now the do_lines() callback is flushing at end of file.

Aha! If I take the -n out of the echo 'two\' gnu/dammit complains it can't find a jump target for "hree", meaning the continuation was terminated by the newline at the end of the file. Continuation only happens if the -f file does NOT end with a newline? Which is just ELDRITCH. Do I even want to fix that? I suppose I can teach do_lines() not to flush a file where the last line doesn't have a newline, but that sounds like a bug waiting to happen. I could also just check in a different test and wait for somebody to complain: \ continuing past EOF on -f is an implementation detail stuff REALLY shouldn't be depending on, and while I _can_ make it work, I also kinda want to back slowly away. (Watch some build script somewhere break...)
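The quirk in question, runnable against whatever sed is installed (behavior described here is what GNU sed appears to do):

```shell
# printf leaves no trailing newline on the script file, so (with GNU sed)
# the backslash continuation crosses the -f boundary into the next -e,
# appending "one", "two", and "three" after "hello". Add a trailing
# newline to the file and the continuation stops at end of file instead.
printf 'two\\' > /tmp/sedcont
echo hello | sed -e 'a one\' -f /tmp/sedcont -e 'three'
```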

$ echo hello | busybox sed -f <(echo 'a one\') -f <(echo 'two')

Darn it. (They're still parsing all -f after all -e, presumably because back in the day I wrote their sed. But it's doing cross-file continuations, even WITHOUT the -n. Hmmm.)

Another tab has echo -e 'one\ntwo\nthree' | grep -ze $'x\n' in it which outputs those three lines with debian's grep and outputs nothing with mine, and I stared at it for quite a while before going: oh right, -z doesn't apply to PATTERN input in the gnu/dammit one, so the extra \n is turning it into two patterns: "x" and "", the second of which is match-whole-line.
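In other words (assuming GNU grep, per the debian binary described above):

```shell
# -z does not apply to the pattern argument, so the embedded newline
# splits it into two patterns: "x" and "" (empty matches everything),
# which is why the whole input comes back as one NUL-terminated record.
pat='x
'
printf 'one\ntwo\nthree\n' | grep -ze "$pat" | tr '\0' '\n'
```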

In THEORY I have a test for this in grep.test already: testcmd "-z with \n in pattern" "-f input" "hi\nthere\n" "i\nt" "hi\nthere" which unravels to echo -ne "hi\nthere" | grep -f <(echo -ne "i\nt") and expects "hi\nthere\n" as its output. Except... A) despite the description that's not passing -z to grep. B) even if it did, splitting "i\nt" in the middle gives you "i" and "t" as two separate patterns, one of which is in each line, so yes it will output both lines but for the wrong reason.

Sigh, the old story: a test that isn't testing what I think it's testing. I can add -o and | hd the result and then it's obvious that "i\0t\0" is not "i\nt\0". Yes it passed both TEST_HOST and through toybox, but it's still a bad test. And I guess the definition of -z (which is not in posix) does not modify patterns, just input and output text lines. Sigh. (Wanna REAL spec. People who have listened to Keith Richards can't always get what they want, he has cursed us all musically, but reverse engineering stuff to derive what should be specified somewhere uses up all my spoons.)

What other tabs... modules in mkroot. On the one hand, I want to turn mkroot into a better testing environment including module support and things like overlayfs (another open tab I need to close), on the other I want to keep it simple, and the fact it builds WITHOUT module support has historically caught at least one bug in the kernel.

Hmmm. I have a local scripts/root/tests target I haven't checked in yet that copies the toybox test plumbing into the image (although that's a megabyte and a half of text so maybe it should make a squashfs and loopback mount it instead). I can extend that and put a module list in there, and set MODULES="modules modules_install" if $KMOD or $KMODEXTRA aren't empty... except then it wouldn't build the zimage because the implicit __all target would get dropped? Which is not the same as "all" (for historical reasons I'm assuming), which DOES build modules. So maybe the linux build should always use the explicit "all" target to build modules if there are any (presumably harmless when there aren't). Is modules_install also a NOP if there were no modules, or does it mkdir or something?

Iffy about $KMODEXTRA because all I really need to do is append to KMOD with += since I went to all that trouble of blanking the environment with env -i at the start of the script (and the loop semantics I'm using should work fine with space separated CSV blobs. Well, the =y loop will print "#architecture independent" twice if you set $KEXTRA. Hmmm.)

But the existing plumbing has $KEXTRA and each target overwrites $KCONF instead of appending to it. And there's no KERNEL_CONFIG_EXTRA, nor does that append instead of overwriting. Sigh, do I have enough of an existing userbase for mkroot that changing the API for a cleanup is warranted? (Explanatory videos!)
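The append-instead-of-overwrite idea could look something like this. The variable names mirror mkroot's KCONF/KEXTRA but the logic is my sketch, not its current code:

```shell
# Each target appends to a shared baseline instead of clobbering it, and
# user additions ride along via KEXTRA from the environment.
TARGET=x86_64
KCONF="BINFMT_ELF BINFMT_SCRIPT"               # architecture independent
case "$TARGET" in
  x86_64) KCONF="$KCONF ACPI PCI" ;;           # target-specific appends
esac
KCONF="$KCONF ${KEXTRA:-}"
for i in $KCONF; do echo "CONFIG_$i=y"; done   # emit the =y miniconfig
```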

October 11, 2022

Patreon's website is working again! Woo!

(The site taking a moment to load and saying "0 patrons, $0/month" before finishing loading is a bit of a heart attack. The money's nice, but it's more "17 people showed up for your talk" vs "0 people showed up for your talk" happening as a SUDDEN CHANGE leading to a panicked moment of "did I just milkshake duck myself without noticing?" before it redrew. I didn't THINK my recent mailing list or blog posts were that inflammatory... :)

Speaking of inflammatory, back to implementing the moderately horrific gnu/dammit tar --xform sed semantics. (Singable to the Lucky Charms cereal jingle, of course.)

So xform "flags=R;s/a/b/r" means I can't just have the flags switch on and off bits at runtime, because while r (apply to regular files) is the default value, in this instance specifying r isn't a NOP, it reasserts something that's been disabled (by flags=R meaning do NOT apply to regular files). But I can't just record what bits I saw either, because order matters. (R then r means r wins, r then R means R wins.) Having it live in two places means it's neither "switch on and off a bit" (because I have to collate them later), nor "maintain two bitmasks and combine later". It's maintain THREE bitmasks and combine later: flags= is just one bitmap because it applies to previous instances of itself (so I can cancel out R and r there), and then s///flags needs both R and r but only whichever was seen last (yes, you can s///rR). I need to implement toggle logic there so setting r unsets R and vice versa. The initial state is still 'all bits unset' because even though "r" is the default, s///r switches off flags=R in a way s/// just using the defaults does not.

Adding flags to unsigned sedcmd->sflags steals bits from the offset field, which is used to store the "which match" field, ala s///7 only replacing the 7th match in the string. (Sort of like s///g, but with a filter.) Except I haven't quite documented that properly: the backreferences in the "replace" part of s/search/replace/flags are limited to 0-9 (partly because \0 is an escaped single character in the middle of arbitrary text, and partly because I'm feeding regexec() an array of 10 regmatch_t entries, so it can only record 9 parenthetical sub-matches: \0 is an alias for & because the first regmatch_t entry is the entire matched span. Honestly \0 should insert a literal NUL, but that's not what the gnu/dammit one does, so...)

But s///42 in the flags part is NOT limited to 9 entries, the sed s///g logic increments a counter until it matches and replaces only that one, so there's no limit on how high it can go. The limit is just how big a number it can store, and instead of giving it its own field I had it share unsigned sedcmd->sflags which is 32 bits. I was using the bottom 4 bits of that for flags, now with --xform RHS and a flag for the trailing slash hack I'm stealing 8 bits, which leaves 24 bits with a range of 16.7 million. Stealing another 3 still leaves a couple million. Not a current issue, but something to keep an eye on.

Hmmm... maybe I'm looking at it wrong. If it's resolved at PARSE time instead of runtime, then flags= is one bitmask and the s///flags is a delta from that. What's lost in parsing is the order the flags occur in, but while I'm parsing I have that and can come up with a correct result. I can't do it after the fact with just two bitmasks and ":label;s///r;FLAGS=R;b label" ordering because "saw r" and "didn't see anything" are the same bitmask. But PARSE time resolving means I only need the two sets of flag bits because in the above case, FLAGS= was all zeroes when s///r happened, so the resulting mask is all zeroes, and that's semantically consistent. (Is it _right_? Elliott's on record saying I shouldn't ask that question, but if I was good at not asking I wouldn't have wound up hip-deep in the guts of the system. I pull loose threads. Can't help it.)
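A minimal sketch of that parse-time folding (illustrative shell, nothing like toybox's actual C):

```shell
# Fold flag characters left to right so the later of r/R always wins,
# and start each s///'s mask from the current flags= state.
fold() {  # $1 = mask so far, remaining args = flag chars in order seen
  mask=$1; shift
  for c in "$@"; do case $c in
    r) mask="$(printf %s "$mask" | tr -d rR)r" ;;
    R) mask="$(printf %s "$mask" | tr -d rR)R" ;;
  esac; done
  printf '%s\n' "$mask"
}
global=$(fold '' R)   # flags=R
fold "$global" r      # s///r reasserts r over flags=R: prints "r"
```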

Huh, speaking of the s/// help text being a little off, I wonder what I meant by \ at end of line appends next line of script? Because gnu/dammit:

$ echo abcbdb | sed -e 's/b/x/\' -e g
sed: -e expression #1, char 7: unknown option to `s'
$ echo abcbdb | sed -e 's/b/\' -e 'x/g'
sed: -e expression #1, char 5: unterminated `s' command
$ echo abcbdb | sed -e 's/b\' -e '/x/g'
sed: -e expression #1, char 4: unterminated `s' command

Ain't doin' that, far as I can tell. (Sigh. I need to write TEST CASES.)

October 10, 2022

Tried to invoice the middleman again (for the Google toybox sponsorship money: October is a new fiscal quarter)... and it failed. The middleman's entire purpose in this relationship is to hold and disburse money, and it has not gone smoothly even once, although "I break everything" is definitely in play here. (But so is "you had one job".)

The middleman is involved because I, as an individualish person, am basically too small for Google's machinery to see, and firing up a new LLC from scratch (they can SEE another corporation) had like an 8 week lead time post-pandemic. (Everything was shut down, then everything was stacked up. Kinda like the shipping containers at the ports, only for state paperwork.) The middleman organization is somebody Google had worked with before, who could take the money and then give it out again (minus a cut) in a way their lawyers had existing paperwork to just re-sign off on. Ok, fine.

The first time I invoiced the middleman back in Q2, I accidentally used the SWIFT code instead of ABA to indicate bank. Swift was all over the news, Russia was blocked from Swift which is the big way money gets routed... except that's international only, you can't use it for domestic transfers. But if you DO use it for a domestic transfer, nothing in the middleman's web page flags it, and the Wise system (the middleman's middleman that does the actual financial transfers) takes the money out of the middleman's account, and does not post it into my bank's account, and weeks go by, and Fade and I call our bank and they don't know what happened, and the middleman's support people call Wise and THEY don't know what happened, and it takes several weeks to get... most of the money back. (The numbers don't add up but I haven't gone through to try to track down where because I was just glad it's over. Except it isn't, but we'll get to that.)

So I set up a second "payment entry" in the middleman's system with the same bank account number but this time the ABA routing number, and we redid the transaction, and it went through! And there was much rejoicing.

When Q3 rolled around, I tried to invoice them again, and got a pop-up saying "Wise: TransferWise validation error: We cannot currently accept payments to this recipient." (I dunno what a TransferWise is either, but I did not remove a space there.) I confirmed I'd used the ABA entry and not the SWIFT entry, but no. Transfers to that account were blocked by Wise. (Yes the Q2 payment went through, but time had passed and now Q3 wouldn't.) I pinged the middleman's support person again (a nice woman named Alina who lives in New Zealand), and she wanted a screenshot of the little javascript pop-up in the web page that fades after 5 seconds, and rather than try to work out how to capture that Fade pointed out that our savings account (at the same bank) has a DIFFERENT account number than checking, so I added a third payment option entry for that and re-invoiced, and the transaction went through, and there was much rejoicing. (I hoped the existing support request would still eventually fix whatever was wrong at Wise, because there's a nonzero chance that attempting to set up direct deposit at a future employer would go "boing" if they use Wise and this is left unfixed. But in the meantime I had a workaround and money is usually something I satisfy deficiencies of rather than actively pursue.)

Now it's Q4. I invoice again, to the savings account. I get the same blocking pop-up from Wise on the SAVINGS ACCOUNT. There's a pattern here: Q2 went into an account that got blocked afterwards. Q3 went into a different account that got blocked afterwards. Wise isn't blocking the account, it's blocking ME. And it's doing it after the fact.

Yeah, this needs fixing. Worked out how to get them their screenshot a while ago (the xfce screenshot tool has a region selection and programmable delay, so select region the pop-up WILL be in with 5 second delay and THEN press "submit" on the web form) and attached it to the email, which is a reply to the Big Long Support Thread going back to Q2.

It's weird, both ABA transactions went through, were not reversed, but then the target was blocked for future transactions. That smells to me like I've triggered anti-money-laundering software? Banks need to be seen to take action against money laundering (to mollify regulators) but have no interest in ACTUALLY stopping money laundering (they profit from it), so this kind of "I could probably open a temporary account to receive Q4 payment and it would work once" thing smells like anti-money-laundering algorithms to me. (Locking each barn door after each horse escapes, without recovering a single horse, is probably intentional.) I hope they have a human oversight "bad algorithm *whap*" button rather than fatalistic "we don't know how our own system works and can't fix it" nonsense. (Sorry, the Urge to Debug is strong, but I really don't have visibility into this situation. Cannot legally stick printfs into a bank.)

October 9, 2022

Arguing with tar --xform. See two posts on the list.

Now trying to implement it (yes, it appears to be special casing trailing slash), and another test case I'm wondering about is... what about multiple trailing slashes? But it looks like neither gnu tar nor mine will _create_ a file with multiple trailing slashes on a directory entry, or at least toybox tar c --no-recursion boom/////// doesn't. So if a hacked up tar file DOES have one when you go to extract, it's presumably ok to be weird? (Eh, but security... Or does the existence of an --xform kinda derail that anyway? Toybox tar has --restrict for a reason, THAT'S your security...)

Eh, still not comfortable. If "weird" was just another "empty filename becomes '.' even though it's a file not a directory" I'd leave it at that, but in this case "weird" means the filename would become "/" which seems wrong, so let's back up past all the trailing slashes... But what if that leaves us with an empty string? Meaning we have (some stuttering version of) the root directory... Ha. Toybox tar c --no-recursion / skips it (with an error) because it strips leading / and then an empty string is "no such file or directory". What does Debian's do... it special cases saving "/" as "./" (despite not being in / at the time).
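That "back up past the trailing slashes, and don't let the result degrade to '/' or an empty name" logic can be sketched in shell (this is an illustration of the rule being discussed, not the actual toybox C code; the fallback to "." matches what gnu tar prints in its "Substituting" message):

```shell
# Strip ALL trailing slashes from a member name; if nothing is left
# (some stuttering version of the root directory), fall back to "."
# rather than emitting "/" or an empty string.
normalize_name() {
  s="$1"
  while [ "${s%/}" != "$s" ]; do s="${s%/}"; done
  [ -z "$s" ] && s=.
  printf '%s\n' "$s"
}
```

So `normalize_name boom///////` gives `boom`, and `normalize_name ///` degrades to `.` instead of the root directory.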

Implementing remains easy. Figuring out what to do is still the hard part. And when I poke, I keep finding more weird corner cases in the gnu/dammit tar:

$ tar c --no-recursion "" > /dev/null
tar: Substituting `.' for empty member name
tar: : Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

I was wondering how it substituted '.' for a file it couldn't open, but I think that just means it processed the output filename before opening the input file. (Which means you haven't got the readdir/dentry info so can't do ALL the processing, but...?) I had to care because filtering "/" and ".." as filenames (without -P) can result in an empty string, and gnu/dammit tar seems to replace those with either "." or "./" depending on circumstances.

(Since it's been a while: Stallman keeps going "It's gnu/linux, dammit!" but Linux has nothing to do with gnu and never did, so if you leave linux out of it you can shorten his tirade to gnu/dammit and thus wind up discussing the gnu/dammit versions of various tools. Since he does his "no, it's a BUD lite" tirade any time somebody tries to say "linux", we might as well use a slash any time someone would otherwise say "gnu". Only fair.)

Except... their tar's NOT consistently doing that? You CAN have it create files with members with empty names:

$ cd ~/linux
$ tar c --xform 's/Makefile//' Makefile | tar tv
tar: Substituting `.' for empty member name
-rw-r--r-- landley/landley 65933 2022-10-03 00:57 

The substituting message there is from the extract, not the create. What are the rules here...

October 8, 2022

Sigh. I want to care about compiler stuff, but haven't got the bandwidth. (I don't want to discourage people that DO, but I don't really view myself as an authority here either. Just a disgruntled user.)

October 7, 2022

It would be nice to get modprobe out of pending, and that involves setting up a test environment, which means not only enabling CONFIG_MODULES in mkroot, but compiling some modules it can load. (I note that mkroot packages up the cpio.gz for the root filesystem _after_ building the kernel precisely so it can include modules in the initramfs image. It's not doing a "make modules" and "make modules_install" yet, but that's because I haven't got any modules for it to build...)

My first instinct was to use the net "dummy" module... but it hasn't got dependencies. (And I might want to use it in ifconfig tests, especially since for some reason you can't assign aliases to loopback interfaces.)

The bug report that got me looking at this involved a module loading another one as a dependency, and grep ': ' /lib/modules/$(uname -r)/modules.dep lists all the modules that have dependencies. Alas, most of them bind to hardware qemu doesn't emulate.
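The same modules.dep query can pull out just the module names, assuming the usual one-entry-per-line "module.ko: dep1.ko dep2.ko" format (a sketch; the guard keeps it from erroring on systems without a modules.dep):

```shell
# List modules that declare at least one dependency in modules.dep.
# Lines without dependencies are just "path.ko:" with nothing after
# the colon, so splitting on ": " leaves them with a single field.
deps="/lib/modules/$(uname -r)/modules.dep"
[ -r "$deps" ] && awk -F': ' 'NF > 1 && $2 != "" {print $1}' "$deps" || true
```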

What I want is a module that doesn't need anything and doesn't do anything, but has a dependency. I don't want to enable a lot of extra infrastructure to get it (especially not have qemu emulate more hardware for it to bind to, since that's unlikely to be available on all the virtual test boards), and I don't want loading the module to potentially screw up any other tests I'm running. And I don't want to turn static infrastructure I may actually have tests depend on into a module that may not be loaded yet when the test is run, so the module can't be something I may actually NEED.

I'm currently squinting at the tcp_yeah module, which seems pleasantly useless. It implements some random congestion control algorithm that predates CoDel (Controlled Delay), which according to the guy who would know is what actually addressed BufferBloat. But sure, here's a driver that Linux still has, and it loads another module (tcp_vegas) as a dependency. Will it randomly vanish from a future kernel? Probably at some point, but in the meantime... I already have ipv4 enabled, shouldn't interfere with the other tests I'm running... unless it decides to screw up all ipv4 tcp connections with a weird congestion control algorithm? Sigh, that means it threatens to actually DO something. And maybe vanish in future, breaking my tests. Hmmm...

The nice thing about binding to the net plumbing is it doesn't probe for hardware. That's another driver's problem, which exports an API to talk to the outside world. Filesystems are another thing that does that: ext4/scsi/atapi are like netfilter/ipv4/ethernet. Similar layering. So what are some filesystems I'm unlikely to use? romfs.ko can depend on mtd.ko, but Memory Technology Device = flash = magic hardware again. xfs.ko depends on libcrc32c.ko but that sounds generic enough other things may need that (and either autoload the module or fail because it's not loaded).

Ooh: cachefiles.ko depends on fscache.ko, and both sound PROFOUNDLY useless for my purposes. Both "likely to stick around" and "never going to interfere with my use cases". Alright, let's try to enable that mess.

Um, next question. How do I add modules to the miniconfig in mkroot? Right now I'm testing with " KEXTRA=MODULES,MODULE_UNLOAD" but that adds =y symbols, not =m. I can add the full uncompressed CONFIG_BLAH=m lines to KERNEL_CONFIG, but architectures that set KERNEL_CONFIG overwrite it rather than appending. And we still haven't got a "make modules_install" invocation anywhere.

The build script needs some surgery, but first this needs design work. Possibly a MODULES=CSV setting with its own little stanza.
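The hypothetical MODULES=CSV stanza could be as small as expanding the comma-separated list into CONFIG_X=m lines appended to the miniconfig; a bash sketch (the MODULES variable and the module names here are my assumption, not existing mkroot plumbing):

```shell
# Hypothetical MODULES=CSV expansion: emit the =m config lines a
# miniconfig would need. Example module names, not mkroot code.
MODULES="CACHEFILES,FSCACHE"
for i in ${MODULES//,/ }; do echo "CONFIG_$i=m"; done
```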

October 6, 2022

Bug report that lspci -i doesn't work, and... I never wired up -i so it's always looking for the database in the search path. Oops. And it's a little awkward to fix because the compressed vs uncompressed search is two passes...

The mkroot plumbing is forcing CONFIG_EMBEDDED off, but the kernel doesn't seem to require that anymore? Bisect, bisect... ah, they fixed it last year, undoing the amazingly stupid hack they did in 2014 that caused the problem in the first place. Good! I can remove it from mkroot now. (One less thing to explain!)

I should do a video demonstrating that bisection as a general "how to use git bisect" introduction. I should poke tumblr again to get my account back to see if posting videos there works, and if linking those videos from patreon works better than linking them from (Where having every video you post go into a SINGLE GLOBAL NAMESPACE kinda led to decision paralysis every time I wanted to upload something. Writing in ink on the Wailing Wall: be sure. Plus then patreon thumbnailed it spectacularly wrong and I have no idea which side of that transaction to talk to about it.) I should fire up a VM and log into Patreon and reassure everyone there I Aten't Dead. Except that's a different VM from the one I've got the Fedora Bug reproduced in and the memory usage of both at the same time would NOT make this laptop happy, so I should go finish sticking printfs into glibc and seeing what's going wrong there...

Trying to close tabs until I'm at least back into euclidean tab-space. I keep forkbombing myself trying to collate todo items.

October 5, 2022

Cycled back to poking at debootstrap and half-assing container support with unshare. The reason chrooting into a fresh debootstrap DIDN'T have command history wasn't a missing package: it was because /bin/sh pointed to the Defective Annoying SHell and I wasn't explicitly saying /bin/bash to chroot. When I didn't env -i the host environment, chroot picked up $SHELL pointing to /bin/bash so it worked anyway, as long as I _didn't_ do the half-assed container thing.

So maybe I want some variant of:

$ sudo env -i USER=root TERM=linux SHELL=/bin/bash LANG=$LANG PATH=/bin:/sbin:/usr/bin:/usr/sbin unshare -Cimnpuf chroot beowulf

Trying to figure out what environment variables the init script in the chroot should set, and which ones toysh should initialize if it's blank...

I copied the mkroot init script into there but unshare -pf makes it think it's pid 1, so it's doing the qemu setup instead of the chroot setup. Which is a problem for a half-dozen reasons, but the first one it hits is [ $$ -eq 1 ] && exec 0<>/dev/console 1>&0 2>&1 in the devtmpfs mount, which sends the output to... I'm not entirely sure where actually? /sys/class/tty/console/active says tty0 which would be the ctrl-alt-F1 vga console? But the output didn't go there, instead the script exited (presumably because stdin was closed). The unshare blocked access to the host's text mode consoles, I guess.

I could manually strip down the init script so it works in a chroot that's actually PID 1, but what I'd like to do is detect this case somehow and have it do the right thing automatically so I'm not maintaining multiple forks. With a normal chroot without container plumbing, PID 1 is the distinction. But here... how about "is root a tmpfs"? Let's see, does calling stat -fc %T / work without /proc and such mounted... KARGS=rdinit=/bin/sh ./ gives me a shell without the init script having run, so ls -l /proc has no contents, and yes I get "tmpfs" from stat -f. Ok, I could have that be a distinction instead of (in addition to?) PID 1.
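The combined check might look something like this sketch (Linux coreutils stat assumed; the echoed strings are placeholders for the two setup paths, not the actual mkroot init script):

```shell
# "Are we init in an initramfs under qemu, or in a chroot/container?"
# PID 1 plus a tmpfs root means the qemu case; anything else gets the
# stripped-down chroot behavior.
if [ $$ -eq 1 ] && [ "$(stat -fc %T /)" = tmpfs ]; then
  echo "qemu/initramfs setup"
else
  echo "chroot/container setup"
fi
```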

Another question is "Do I have a stdout?" Which isn't really a qemu question, it could also mean the kernel has my CONFIG_DEVTMPFS_MOUNT patch so I don't need to do early console fiddling to see error messages if the init script has issues. (Ideally that would get merged upstream and I could just generally not do it, but the linux-kernel guys' heads are too far up their asses to see or hear the outside world anymore. Sigh, I should repost quarterly on general principles. Kinda like voting in a red state, it's not that I expect to succeed it's that it annoys the right people.)

Huh... If I do 2>&3 on bash it says "bad file descriptor", which is a good way to determine it's not open! But if I do that in toysh it says "syntax error" instead, which is... right detection, wrong reporting. And if it aborted the running script, that would also be bad, I want a (silent) failure I can test and react to.
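In bash, wrapping the redirection in a compound command gives exactly that silent, testable failure: the dup error is confined and the script keeps going (a sketch of the probe, not toysh behavior):

```shell
# bash: test whether fd 3 is open without aborting on failure. The
# redirection error stays inside the compound command and is silenced.
fd3_open() { { true >&3; } 2>/dev/null; }
if fd3_open; then echo "fd 3 is open"; else echo "fd 3 is closed"; fi
```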

Sigh, I left off in the middle of toysh work too. Something about pl2str() outputting function bodies properly? The file date is september 6, but that's when Zabina died and I apparently did not do good blogging about it. Did I email about it? Well, there was some interaction with Chet on the list and that probably _triggered_ the work, but what tangent did I go off on? Sigh, time to try to reverse engineer my half-finished patch and reproduce whatever failing test I was trying to make work...

(You'd think there would be less of this with the Google sponsorship, but I'm still trying to prioritize the incoming bug reports and requests, and also the todo list Elliott gave me that the Google hermetic build guys would like to see, and that means I'm still ping-ponging around a bit. Plus I'm just not back to full speed after three rounds of Covid, the Trump administration, riding down a startup, various fuckery from the supreme court and the governor of my state... I turned 50 this year and my baseline productivity is not where it once was. Nobody's complained I'm being too slow yet, but _I_ expect better than this.)

October 4, 2022

Closing tabs for a maybe-release with shortest job first scheduling (I.E. collecting low hanging fruit), and I'm cycling back to what turned into a request for ls --sort=nocase. It's kind of generally hard to get utf8 case insensitive sorting right without more locale support than toybox is doing, but I can do the basic ASCII case insensitivity which seems to be what's asked for here.

Trying to genericize the unambiguous --longopt plumbing from lib/args.c (currently line 430 or so), thinking it could be a lib function that ls --sort also uses, so the current ls --help sorting options saying "-t timestamp" would let --sort=time also work...? Mostly I dread trying to rephrase the help text coherently. But --longopt strings can also end with "=" which makes it a bit tricksy. Plus the data structure being traversed is a linked list of structures, and the result is to identify the structure in question. (Not "it's this string" but "it's this option".)
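Conceptually the lib function would do something like this shell sketch (pure illustration, not the lib/args.c linked-list traversal; the option names are hypothetical --sort values): succeed only on an unambiguous prefix match, with exact matches winning outright.

```shell
# Unambiguous prefix matching: print the one option the prefix expands
# to, fail if it matches zero options or more than one.
match_option() {
  want="$1"; shift
  hit=""
  for i in "$@"; do
    case "$i" in
      "$want") echo "$i"; return 0;;              # exact match wins
      "$want"*) [ -n "$hit" ] && return 1; hit="$i";;
    esac
  done
  [ -n "$hit" ] && echo "$hit"
}
match_option ti time size extension nocase   # prints "time"
```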

And adding ls --sort also raises the spectre of csv: can you have fallback sorts the way ps does? Ala "ls --sort=ext,time" and such?

I'm also going through and deleting old *.patch files from my toybox work directory, confirming each one either got applied or the issue was addressed in another way. Back in 2020 I got a patch from Juhyung Park to fix a bug he was seeing in dmesg, which I couldn't reproduce and which fell through the cracks. Email sent to see if he's still contactable...

Being able to work on toybox full time has certainly gotten me CLOSER to keeping up with this sort of thing at the rate they come in, but it's still a firehose with a large existing backlog.

October 3, 2022

The problem with trying to make the new qemu mkroot tests run in parallel is qemu does file locking on the init.sqf file, and the qemu invocation fails if another instance is using it. (I keep forgetting file locking exists, I haven't used it since OS/2, except a couple weird places in /etc where it's part of some silly access protocol from the 1980s.)

The sqf file is tiny enough I suppose I can cp it for each process using it, and then rm it right afterwards so it's using an unlinked file that autodeletes itself when the process exits?
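That works because the kernel only frees a file's data when the last open descriptor closes; a quick demonstration of the unlink-while-open trick with a stand-in for the qemu process:

```shell
# Copy the image, let something open it, then unlink the name: the
# read still works, and the data self-destructs when the fd closes.
echo data > copy.img                      # per-instance copy of init.sqf
{ rm copy.img; wc -c; } < copy.img        # fd opened first, then unlinked
[ -e copy.img ] || echo "already unlinked, read still worked"
```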

So much shell fiddliness. I'm trying to get different levels of output for V=1 and V=2, using a construct like { block } | tee logfile | { [ -z "$V" ] && grep '^=== ' || >/dev/null; } which of course hangs. The fix is to put a gratuitous "cat" before the >/dev/null because sure, why not. Bash can echo <(<README) but not here.
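For reference, the version with the gratuitous cat that doesn't hang: every branch consumes stdin, so tee always has a reader on the other end of the pipe.

```shell
# V unset: show only "=== " summary lines. V set: discard (but still
# drain!) the output. The full text lands in logfile either way.
V=""
{ echo "=== pass 1"; echo "some detail"; } | tee logfile | \
  { [ -z "$V" ] && grep '^=== ' || cat >/dev/null; }
```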

Another fun waste of an hour is I had #/bin/bash at the start of the script (rather than #!/bin/bash) so the kernel's execve() system call didn't recognize the file type as BINFMT_SCRIPT. When bash ran it and the exec failed, bash then checked if it looks reasonably like a shell script and ran it with itself (I.E. under bash). But when busybox "time" tried to exec it and failed, it fell back to calling /bin/sh on it... which points to the Defective Annoying SHell on Debian. So running the script directly worked but "time" failed a few lines into the script. (If you suspect this is happening, having the script readlink /proc/self/exe is useful.)

October 2, 2022

Working on a proper scripts/ which is a horrible name for it but nothing better comes to mind at the moment.

When stdin is a tty but stdout isn't, qemu will SIGTTOU itself (whose default action suspends the process, same effect as SIGSTOP). I don't know why. Took a while to track down what was even happening. The workaround is < /dev/null (so stdin isn't a tty either) and I boggle.

Yup, Linux 6.0 dropped. Maybe I should do a toybox release early just to sync mkroot up with that kernel? Hmmm. Two in a row with no new commands promoted is a bit embarrassing, but eh. Doing the thing...

October 1, 2022

Happy first of Halloween! (Look, if the xmaspolists keep trying to have Santa start making them money BEFORE the end of the thanksgiving day parade as "Miracle on 34th Street" promised, then Halloween can start sometime in August. And pushing the other way, we can totally decorate the tree with little pumpkins and skeletons. Jack Skellington is christmas.)

Once again, walked to the table and left my charger at home. The habit of not always having my charger in the bag when the laptop is in the bag has some sharp edges.

They're calling the new kernel 6.0 and it's rc7, and it's saturday with new drops tending to happen on sundays, so I pulled the new kernel and built all the mkroot targets to see what I'm up against. When doing a taskset 7 scripts/ CROSS=allnonstop LINUX=~/linux/linux the magic invocation to see which ones DIDN'T successfully build a kernel is ls root | egrep -xv "$(ls root/*/linux-kernel | sed 's@root/\([^/]*\)/linux-kernel@\1@' | xargs echo -n | tr ' ' '|')" | xargs which in this case produces armv4l armv7m build microblaze mips64.
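A more legible version of the same "which targets didn't build a kernel" check (a sketch; same "build" false positive as the egrep pipeline, since it lists every root/* directory lacking a linux-kernel file):

```shell
# List target output directories under root/ with no built kernel.
for i in root/*/; do
  [ -e "$i/linux-kernel" ] || basename "$i"
done
```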

None of that is actually a surprise: "build" is a false positive (it's the only directory under there that ISN'T a target output directory), "armv4l" kinda fell through the cracks of the OABI->EABI change (which technically just requires thumb1 extensions so armv4tl should work, but I haven't done a kernel config for it yet). I made an attempt at "armv7m" a couple months ago and should circle back to it at some point, it's nommu and support took a while to go upstream into the toolchain and libc and kernel and I need to hammer on it more. "Microblaze" is another nommu one that isn't producing binaries right? (Even trying to run them with qemu through debian's binfmt_misc foreign binary autodetection thingy, I get "Exec Format Error" which... I don't know what's wrong? I should ask Rich, he added microblaze support to musl and theoretically got it to work at least once?) It doesn't continue on to a kernel because I can't tell what "success" looks like if the binaries won't run. And then "mips64" I also need to do a kernel for. (It LOOKS like qemu has a malta emulation for 64 too?)

Anyway, none of those failures are regressions, so now I want to test that each of the systems that built actually WORK, and I'd like to do so in an automated way instead of the manual testing I've been doing. So for network, I can go echo world > hello && ./toybox netcat -p 8080 -s -L ./toybox httpd . on the host and have a quick [ "$(wget -O -)" = world ] test in qemu to say if the net's up... but I would like qemu to run that itself on startup.

I _could_ drive qemu from outside, with a script on the host. I've done expect nonsense and have the txpect plumbing for that. But what I really want to do is have something like the build control images from back in Aboriginal Linux, and the first step of that is automounting /dev/?da if it's available, which is a one line addition to the mkroot init script...

Except the kernel source I'm building has my patch to make CONFIG_DEVTMPFS_MOUNT work for initramfs (but apparently not my patch to automatically use gcc or llvm without having to explicitly specify it), which means /dev is mounted before init runs, and for some reason the init script is trying to mount /dev again and producing an error (because it's already mounted). And the failure is that "mountpoint" is saying that /dev is not a mount point, when it clearly is. It's in /proc/mounts and stat / vs stat /dev produce different device major:minor. (Synthetic filesystems are usually major 0, minor counting up from 1, in this case initramfs is 0:2 and devtmpfs is 0:5. I don't know what has minors 1, 3, and 4 but there's weird internal stuff like "pipefs" that should basically never be user-visible....)
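That major:minor comparison can be done without /proc at all; a sketch using stat's device number (Linux coreutils stat assumed — this is the idea, not the toybox mountpoint code):

```shell
# A directory is a mount point when it lives on a different device
# than its parent. Doesn't need /proc/mounts (but won't catch --bind
# mounted files, which are on the same device as their parent).
is_mountpoint() {
  [ "$(stat -c %d "$1")" != "$(stat -c %d "$1/..")" ]
}
if is_mountpoint /proc; then echo "/proc is a mount point"; fi
```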

I grabbed the last published static strace binary from aboriginal linux and it's saying helpful things like syscall_383(0xffffff9c, 0xbf9e1f91, 0x100, 0x7ff, 0xbf9e1580, 0xbf9e16d0) = 0 because it seems the kernel's system calls have changed a bit in the past 7 years. (Or at least musl is using different ones than uClibc knew about?)

Rummaged around in the headers a bit to try to track it down, but I was testing the i686 build and the host headers are x86-64. Too much deadline pressure with the battery draining (and I really want to look at stuff on the net but plugging my phone in via USB would drain the battery faster), so I closed up the laptop and walked back home, got the charger, and walked back to the table. Step counter saying 21k steps today and I've still got the walk back home after this... (Snark at the table all you want but this is pretty much my only form of exercise.)

Ok, the problem with mountpoint is that the conversion to same_file() was inappropriate here: the first test was != and the second test was == and inverting the logic can't salvage them not matching. Also, while we're there, mountpoint -q should also be silent when the argument doesn't exist. Checking TODO items while I'm there, no easy way to have mountpoint detect --bind mounted files without looking in /proc/mounts, which strace says is what the other implementation is doing. I don't want to depend on /proc being there: in the mkroot init script it isn't yet (for chroot or for qemu), and I don't want to implement two codepaths to do the same thing.

Ok, taught the mkroot init script to mount /dev/?da on /mnt when it's there, but did NOT add a call to /mnt/init. At least not yet. The overlay package I just added means I can have an "etc/rc/init" symlink to /mnt/init in the overlay (being broken on the host is fine), and then have an init in the filesystem I feed qemu as -hda.

Alright, what's my smoketest script here, something like:

echo running init
[ "$(date +%s)" -gt 1500000000 ] && echo date ok
wget -O -
reboot -f

And then mksquashfs init init.sqf (if you give mksquashfs a file as its source, it makes a filesystem containing just that file in the root directory) and ./ -hda init.sqf... which is complaining because the big iron loons that took over qemu development want the fill-it-out-in-triplicate syntax to do the exact same thing... Sigh.

September 30, 2022

According to the debootstrap installation docs, debootstrap uses wget, ar, /bin/sh, and "basic linux tools" (footnote lists sed, grep, tar, and gzip). That's sounding fairly feasible to run under toybox. Especially since I can run it under scripts/record-commands and look at log.txt to see what commands it called out of the $PATH.

$ sudo scripts/record-commands debootstrap beowulf newdir
$ awk '{print $1}' log.txt | sort -u | xargs
apt-config arch-test basename cat chmod chown chroot cp cut debootstrap dirname dpkg dpkg-deb gpgv grep head id ln ls mkdir mknod mv perl rm sed sha256sum sort tar touch tr umount uniq unxz wc wget
$ grep wget log.txt

Fundamentally what it does is wget a list of deb files and install them all into a new chroot. (Under the covers dpkg is something like an ar file containing tarballs.) Except if you "debootstrap beowulf beowulf" and then try to run it again without net (phone tether at the table still dodgy, plus it's got a monthly quota), it makes it as far as trying to wget and dies. Did not cache all those packages it downloaded, apparently? Not at the system level, anyway.

So to play with it self-contained, I need a good run with record-commands telling me all the files it fetched off the server, mirror them to a directory locally, and run toybox httpd exporting them. Probably something could be done with the pool1.iso file? (A while back I made puppy eyes at the #devuan people and they started uploading pool1.iso disks with basically the whole web repo on them each release, so you can install and fully populate a VM image without net access.) Except if I loopback mount that and search for InRelease, it doesn't find it. There's a dists/chimaera/Release but... so much undocumented implicit knowledge.

Sigh. The downside of the debian project being 29 years old (as of two weeks ago) is lots of old hands learned stuff from long-gone sources, and don't notice that newbies wandering in have no obvious way to read up on it. I've complained the NOMMU space is like that, oral traditions are no substitute for documentation. It's great the people we can ask are still around, but if you're new to the area how do you know what to ask?

September 29, 2022

Elliott has received the grep fix and it passed the presubmit tests. Alas pushing that also flushed some tar/sed --xform work that... may be destabilizing? (Hopefully isn't. Should work.) But I should try to finish that up this weekend.

Starting to get chilly at the table. Lots of students out, fairly late into the night.

September 28, 2022

The Fedora VM I've had open for a while so I can eventually track down that glibc bug (it's on the todo list!) has started keeping a CPU pegged (bad for battery life), so I went into it and ran "top" and of course it's a systemd instance stuck in a CPU eating loop. There's a reason... ok, a very long list of reasons... I don't allow systemd on my laptop.

I haven't focused on tracking the bug down because... well, I got interrupted and haven't popped my way back up the stack to that, but the OTHER reason is I'm trying to fix the tar issue in a different way (the whole sed --tarxform protocol thing), which means the bug isn't blocking. It's still a problem I want to FIX, but as with the recent glibc header inclusion nonsense it's not necessarily MY bug.

Speaking of which, I keep meaning to subscribe to the sed mailing list so I can ask them if the inconsistent append behavior is intentional or not (and if so why). Also, the sed man page documents a --sandbox option that disables the e/r/w commands. I had --tarxform disable the w command, so I was curious, but... what's the "e" command? Their man page doesn't document one, posix hasn't got one, and I'm not installing their version of gopher to look at gnu's proprietary documentation format. (Tar had an html manual, and the bash man page is the most complete and definitive documentation bash has.)

Unfortunately, the table (not so much my home-away-from-home as my office-away-from-cat) has developed another problem: now that all the students are back, T-mobile's cell tower for campus proper is badly overloaded. As in internet connectivity comes and goes, sometimes it just stops working for ten minutes, and occasionally it gets so bad the signal strength indicator turns into an exclamation point. So if they do have a web page version of their info page (they'd HAVE to, it's 2022, not even gnu is THAT out of touch), I can't load it right now. Sigh.

Update: net came back eventually, yes there's an html manual and e is "execute pattern space as a command", ala "echo 'rm -rf ~' | sed e" (don't run that). I boggle at the security-mindedness of the gnu project, and am pretty happy not to have implemented that.

September 27, 2022

Wait, Tumblr lifted the porn ban? Hmmm, more "loosened" rather than "lifted", but still. That's a sign of life I hadn't expected from them. I wonder if... yes they host video. That's Very Interesting.

(While I don't expect to post porn, I refuse to live in The Good Place's censored Holy Forking Shirtballs world, in which only certain people are allowed to watch Deadpool let alone produce it. No. You do not childproof the planet. That's not how that works, and helicopter parenting is terrible for the kids too. Children in Europe drink alcohol at the dinner table and aren't as screwed up about it as we are where you're expected to go from virgin to 100% knowledgeable at the stroke of midnight. Right wing loons who think in black and white need to die off. In the weeks after "the shoe bomber" everyone took off their shoes for the TSA before they were ever asked to, and inevitably the security theater expanded to fill the space by mandating the behavior. If you self-censor safely back from the blurry line, the line will move towards you every time. Sheep who only graze in half the field get the fence around them tightened up. Rinse repeat. Stop it. Push BACK already. The Boomers will die. If prudetube wants to go down with them, let it and move on.)

Attempted to reset my tumblr password on an account last used during the Jurassic period. Got the reset link, but every password I type into it is "too short", including 30 character mixtures of upper and lowercase letters, punctuation, and digits long enough to scroll the input field. Tried contacting support and while I can fill out the form explaining my issue, and click the captcha (which times out after like 30 seconds), the "next" button remains greyed out and inactive. I suspect they don't support chrome on Linux? (The actual _site_ does fine, at least when not logged in...) Right, throw it on the todo heap and pop the stack...

Got the new sed --tarxform mode implemented, and taught tar to use it, but haven't quite checked it in yet because the protocol needs to change just a little more: it's not sending a type indicator for the "transformation scope flags", which I haven't implemented yet because there's a zillion unanswered questions about them. I answered some of them at one point, but it was long enough ago that I don't remember what they were and need to answer them again. Complained about it on the list, on general principles.

Sigh, I should update the protocol so tar is sending the type info and sed is parsing it, and then worry about actually using the info (or setting non-placeholder information in the field) later. Except there's a sequencing issue on the tar side: the logical place to put the call to xform() is fairly early on in add_to_tar(), which is before the "Handle file types" if/else staircase that sets hdr.type. I can't easily move the xform() call down because the results get used twice before the end of the hdr.type staircase.

The TRICKY bit is hard link detection. I can't just feed st_mode to xform() and let it figure it out because there's a list of previously seen nodes to check. (The link count being elevated isn't enough, this tarball has to include multiple instances of the same inode in order to care.) And one of the uses of the transformed name is saving it in the hardlink list when it's the first time we've seen it and thus need to store it as a normal file. So the xform has to happen before hardlink detection, and can't happen before hardlink detection if we're selecting on the "Hardlink" type.

Hmmm... I suppose if hardlink detection moved before the transform and saved the untransformed path, it would be consistent? No, because when there IS a hardlink, the link name we save into the file is the transformed one. (Name transform collisions can't _cause_ hardlinks, can they? No, that would be insane. I mean, even for gnu.)

Urgh, what happens if you're saving a hardlink to a symlink and the scope flags say to act on hardlinks but not on symlinks? Who designed this mess? What are the actual use cases?

Jeff wants me to test his ASIC toolchain build scripts. The obvious way to start that would be to reproduce them under Debootstrap, which gives me a broken bash with no command history. I remember installing more stuff into debootstrap to make that work at one point but have no idea what the fix actually WAS? It was many moons ago, I of course wrote random documentation du jour but it doesn't mention this issue. Haven't circled around to bulldozing through it the hard way yet, too much else to do...

Somebody messaged me on linkedin, which has an actively stupid (ahem, "sticky") setup where it sends you an email letting you know you got a 15 word message, but does not include the message in the email, just a phishing link. It's really annoying. (They wanted contact info for an old co-worker, I forwarded them. Email would have saved me 10 minutes.)

September 26, 2022

Checked in the grep fast path code, with a timeout 5 test that times out with the old code and takes less than a second with the new (even on my somewhat ancient laptop). Explained the general ideas behind the change on the list more than once, with pointed hints about how real world test data would be nice so I know what needs to be IN the fast path and what can fall back to the slow path, but I'm basically back to waiting for people to complain again. (I implemented more than they asked for and less than the openbsd implementation they pointed me at was doing.)

Started on the new sed protocol for tar --xform. Sed wasn't exactly designed for this...

September 25, 2022

Made it to the table. Do not have laptop charger. New longer battery is still finite. (Before I noticed that, I spent most of the battery I had editing and uploading blog entries -- we're live through September 4 -- before rummaging for the charger and going "oops".)

Fade bought me a new battery (not just newer but bigger, 65 nominal ampere-fortnights instead of the old one's 58), and I swapped it in on the 17th (you don't lose your open windows if the laptop is plugged in while you take out the battery), and one of the first things I noticed is that if I let the laptop charge the new higher-wattage battery while using it for something CPU-intensive the power brick gets very hot and emits a noticeable chemical odor.

So I haven't done that again since; I mostly recharge it while it's suspended, and otherwise try not to stress the system when the battery isn't full. And I don't ALWAYS carry the charger with me, in hopes of flexing the cord less, because I still have to position the cord carefully for it to start charging. I'm assuming there's a data line between Dell's laptop and Dell's proprietary power brick that only makes intermittent contact, and it needs to send a packet to give the microcontroller permission to charge the battery. (If you're using a third party charger, the laptop will run but not charge. That's how Dell forces you to buy their expensive proprietary chargers.)

If the data line stops connecting to the brick again (see "intermittent") the battery continues to charge, but maybe then the brick can't signal to the charge controller to slow down? (Hence the overheating?) No idea. It's Dell, which means they're almost certainly reselling something they don't really understand, and they seem to actively PREFER engineers who don't speak English (and can thus hide from western customers more easily, and not be cut out as a useless middleman). The adapter's made in China so the designers were probably in Taiwan, Korea, or Vietnam? Well, this tech's like a decade old now, so I suppose it could have been designed in Hong Kong. Emperor Xi didn't kill the goose that laid Shenzhen's golden eggs until 2020.

September 24, 2022

It's Youtube's hypocrisy that gets me. I just saw a youtube ad for "Cleveland Sex Therapy" that says "How to Open Up your Relationship: a guide to starting polyamory. Tips and recommendations to start." (Yes, with that capitalization.) But if a creator put up a video about that, they'd kill the whole channel. Alas, prudetube is where the new Mark Blyth video got posted. I'd much rather watch stuff through a different site, but so far that's still the one people post such things on.

The AOL->livejournal->myspace cycle grinds forward, but we haven't quite got an ao3/wikipedia answer to video hosting yet. (I've looked.) Sadly sites like sourceforge/github/twitter go through bait-and-switch cycles where the producers slowly become the product until Cthulhu purchases them. I remember when slashdot was a "community". Then it got bought and its founders evicted, and slowly moved from serving its community to milking it. The same thing happened to youtube: a founder stopped being CEO in 2010, the last founder left in 2011, then in 2013 Google+ got shoved down the site's throat which turned the comments section toxic (it wasn't great before, but the reset made it a CESSPOOL), then they unsuccessfully pushed subscriptions by taking away existing features (like playing with the screen off) and making new ones (downloads without youtube-dl) pay-only, and when that didn't work they raised the price and doubled down to literally "annoy people into subscribing" (no seriously, they announced that as their explicit plan), as part of which they started to randomly insert ads into the middle of existing videos. And this is alongside the whole "Prudetube" issue where the woman in charge of youtube is a raging misogynist. (Sadly, women who hate women are surprisingly common, and being "not like other girls" has always been a profitable business model.)

So I'm spending a lot more time on crunchyroll these days. I still want to learn Japanese, and as long as I'm watching something with a plot...

Finished "Trapped in a Dating Sim", which was disappointing. (Ended abruptly, Protagonist was an asshole devoid of character development, most of the secondary characters were interchangeable. I get the feeling there were a lot of cultural references going "whoosh" past me, but that was also the case with Yakitate Japan and I enjoyed that series. Possibly my standards were lower before I had access to Crunchyroll's library?)

Also finished "Parallel world pharmacy", which was better than "Drugstore in Another World" (only made it through 6 episodes of that one) but still disappointing. It ended kind of abruptly without really accomplishing much, a big bad showed up suddenly at the end and was dealt with in a couple episodes, and I didn't particularly find any of the characters memorable. The protagonist is kind of one note, the father is "a father", there's "younger girl in awe of him" and "older girl who admires him" who didn't really have any other characteristics or development I could recall...?

It's not that either of the two series I just finished were exactly bad, but I have watched some EXCELLENT anime over the past couple years that they suffer in comparison to. Ascendance of a Bookworm (I may break down and buy the light novels to read ahead even _without_ audiobooks), Kobayashi's Dragon Maid (Kobayashi Dragon Mom), My Next Life as a Villainess... (I'm not sure what Bakarina goes through could really be described as character development, but everyone in her herem gets tons. Wait, if a herem is full of women and a himem is full of men, would a gender balanced one be a themem? Her cult of followers: they benefit from hanging out with her.) Interviews with Monster Girls had interesting characters I'd love to watch another season of (advance Sakie and Tetsuo's relationship you cowards).

Heck even "How Not to Summon a Demon Lord" had a deeply broken protagonist go through rather a lot of uncomfortable character development surrounded by supporting characters with backstory and plot arcs. (I also note it's probably as close to hentai as crunchyroll is willing to carry, and when the protagonist has certain emotional scars tweaked he has a nasty tendency to gloatingly murder bad guys while they're begging for their lives, which other members of the setting do not call him on. Strong implication that the "wives" of the season 2 midboss were mind controlled, and "rescue" is not what happened. Still, I'd watch a season 3.)

Earlier I finished "Banished from the Hero's Party" and "In the Land of Leadale" which were also both kind of mediocre. I watched them to the end, but am not clamoring for another season of either. I'm not sure an anime has to be necessarily GOOD for me to want another season of it: "Kuma Kuma Kuma Bear" had like one idea in it, and I'd happily watch another season, because the people in it seemed to be enjoying themselves. And "Death March to the Parallel World Rhapsody" was a kinda by the numbers isekai, but I'd watch another season if they did one (they probably won't).

Yes those last five anime all had wildly overpowered protagonists who were the strongest person in their setting, but a protagonist being wildly overpowered for the setting isn't a problem for an anime that handles it well. That just means physical threats aren't the main source of conflict or dramatic tension.

Bofuri has an unstoppable protagonist and that was both very good and something I look forward to another season of. I could write a LOT about that one: a VR game under active development with a certain amount of emergent algorithmic world building meets a player whose play style repeatedly breaks the game mechanics in ways the devs didn't expect but the AI-ish algorithms keep jackpot-rewarding. (Having my own "I break everything" tech issues, I relate, except for her it's contextually more or less a superpower.) They nerf her during system upgrades more than once, and her choices impose a lot of limitations that other players don't experience, so it's more balanced than it sounds and she forms a guild of other characters providing good context and character interaction and such. But so much of the setting raises questions. They can taste things when eating in VR? The special events have time compression where they're in-game for a week when it's only a single night back home? People get knocked unconscious by in-game events more than once? That is some impressive VR helmet technology, presumably magnetic fields or something instead of skull-drilling electrodes, but it still sounds REALLY DANGEROUS to stick your head into. It's handwaved as not relevant to the plot, which wants an isekai-but-not-isekai setting, so ok: the technology is not the focus. A player badly breaking PVP MMORPG game balance, with occasional asides from the devs struggling to deal with it (and chatrooms of other players boggling) is the focus, and it's the story of her and the people around her having fun while the devs struggle to try to keep the world entertaining for everyone playing and at least a LITTLE balanced, as new content is introduced.

Reincarnated as a Slime followed the power creep (what if a trash mob spawned in the Final Dungeon and got the XP from the ender dragon on day one, plus mad loot including the Enemy Skill materia immediately used to obtain skills from a bunch of high level mobs, then they exited that dungeon into the Newbie Start Zone) to logical-ish worldbuilding conclusions: beyond a certain point you start having a political impact, so politics and logistics and relationship building became the focus. Plus the guy who'd been isekaied was a General Contractor who sees a goblin village and wants to give it roads and drainage and establish supply chains. The series suffers a bit from what I'm told is a compressed manga-to-anime conversion that sometimes covers events without the backstory and setup, and the result is some gear grinding tonal shifts where... he's teaching a school now? Why should we care about Diablo again? But on the whole, it keeps most of the cargo inside the train cars. (Sadly, Slime Diaries was kinda boring in spots. Interesting concept, ok execution, but side vignettes that run parallel to the main story have limited opportunities for plot _or_ character development if they can't significantly affect the main story. A variant of prequelitis. The creator has done ALL THIS WORLDBUILDING and for anything good there's an iceberg of content they haven't shown onscreen. But when they actually do, the result is the Silmarillion because the story they chose to tell was by definition the most interesting bit, and they already showed that. This is lesser content, it can't affect the status quo we've already seen, and spoilers are built-in. This instance of it isn't BAD, but was not as entertaining as mainline. I was hoping for something like the Harper Hall trilogy in Anne McCaffrey's Dragonriders of Pern series (not Anime, 1970s sci-fi books), which I actually preferred to the first trilogy.
But that had an adjacent setting with its own protagonist(s), and plenty of new stuff happening for those people within that context with only occasional crossover with the first series.)

One Punch Man (the definition of overpowered protagonist) was gory but entertaining, although season 2 mostly took the camera off the protagonist and onto side characters. That series has a similar structural problem to the original Star Trek's transporter, which tended to abort the plot if they could just beam out of danger. So every episode either the transporter would break, their communicators got stolen, or the Enterprise had to leave orbit. First season of One Punch Man focused on Saitama basically having to keep playing a game he'd already beaten, and trying to find meaning in the absence of challenge. Living life with too high an AC to take damage and attacks that one-hit everything and being STUCK THAT WAY, in his apartment, at the grocery store... Him punching the bad guy was never the solution to the REAL problems he faced, it was just "oh this again". The second season focused on the side characters more and contained multiple problems that WOULD be solved by Saitama punching them, and it had to come up with reasons for him not to do so enough times for the seams to show. (Still decent, but things like the martial arts tournament dragged on too long as an Obvious Distraction.)

Alas, some of the overpowered protagonist anime... "Kid from the Last Dungeon Boonies" had an interesting premise but they had to nail the Idiot Ball to him in a way that prevents character development, and then the plot conspires to avoid further character development (by blackmailing his teacher?), and I only made it a few episodes in. (Being competent is fun to watch, especially exercising earned competence. Apologizing for competence, or being embarrassed by competence, isn't. Saitama worked hard to become strong and was proud of his strength until he overshot his setting and became melancholy from lack of further challenge. Maple's game breaking abilities are often somewhat goofy (sheep have the ability to grow wool, it was not a monster ability any player was expected to LEARN from them, nor use it as a defensive tactic in a boss fight) but she's never embarrassed or apologetic, and besides she's problem-solving in her own way. The stream of massive powerups that result (because nobody expected her and a boss monster to be unable to damage each other until she bit it to death doing 1 HP at a time, at which point The Algorithm awarded her its poison attack as a new skill) are just a side effect. (Gain a skill that absorbs an incoming attack? Instead of awarding it to yourself, equip it on your shield that already blocks attacks with knockback on the attacker, thus changing the skill's nature slightly to target (and thus absorb)... the attacker. As a one hit kill. Repeatable. Until the devs nerfed it to 10 uses/day in the next patch because COME ON. It has its own logic and recognizable debugging klutz blast radius and is fun to watch.))

The problem with things like "The Strongest Sage With the Weakest Crest" isn't how strong the protagonist is, it's that he's a cipher who undergoes no character development, never discusses the setting he came from (apparently remembers his previous life but never misses anything about it), surrounds himself with people he orders around but doesn't really have a personal relationship with, and they don't really seem to have backstories either? (None of them have parents, a hometown, childhood memories, other friends, goals beyond Graduate from Setting University...)

Contrast that with "Didn't I say to make my abilities average in the next life", which for all its flaws surrounds the protagonist with multiple characters with actual connections to their world, who do not appear to have been vat-cloned right before the start of the first episode. (Fade described that series as popcorn. Not a lot of substance, but fun.)

The Netflix-funded anime I've seen have been inconsistent. I've been Killing Slimes for 300 years started strong, and then had no idea where to go with it. Turned into "adopt house guests as family members with very little justification" and then they did a dungeon crawl in a shopping mall. (The protagonist's pre-isekai life basically never came up. Immortal witch, lived 300 years, completely unchanged by it. There was a lot of that: the characters were varied, but none of them really changed from how they were introduced.)

"So I'm a Spider, So What?" had reasonably interesting protagonist level-grinding in her hikikomori dungeon (not the same as character growth, but eh), but the power creep was mostly interesting in the context of a rigidly defined monster dungeon environment, and made less sense (and the mechanics seemed more artificial) once she got out of it. And all along the series kept cutting away to "people we have no reason to care about" who were trying to do politics without sufficient backstory (because not knowing what's going on counts as intrigue?) and there were a bunch of time jumps in both directions (not time travel, just telling the story out of order) plus shapeshifting that made it kind of confusing who was who, and basically EVERYBODY was a bad guy with no reason to root for them, and I honestly don't remember most of the last third of it.

Anyway, I've watched enough anime that Sturgeon's Law is starting to rear its head. (And that's AFTER whatever selection process sends us a subset of Japan's output. Before there was also survivorship bias: the classic anime people remember 20 years later kind of HAVE to be classic, because they wouldn't be remembered otherwise. Just like all the old 1950s cars that have survived to the present are so much longer-lasting than the average car built today. The ones that didn't make it aren't being counted.)

September 23, 2022

The problem with redoing existing commands people are using is you can't check in partial results. I'm working through the grep test suite (mine, I don't even know where the gnu/dammit one lives), and each fix gets the new -F plumbing to make it three or four more tests down the list, and then I find some new "-o has inappropriate NUL bytes in the output" weirdness to look at (not necessarily something I just broke, just something I hadn't noticed before) and I make another tweak to code that hasn't been checked in yet, and it all eventually gets presented as having emerged fully formed from the forehead of Zeus. *shrug* Worked for Athena, I guess. (Except the full myth is that Zeus ate Metis and then took credit for her work, while also perpetuating the cycle of abuse his father visited upon his siblings. That's privileged white men for you.)

Grep patterns having \( and \1 mean it's NOT literal and IS special is just cruel. (But the fixed-vs-regex categorization pass before the bucket sort needs to get that right too. The fast path is still "fixed" because it doesn't handle variable length matches, which also means it never has to back up and redo anything. It's also not handling [groups].)

The semester is definitely underway at the University of Texas Austin Campus, because sitting here at the table at 1:30 in the morning my phone has just given up trying to get data from T-mobile's towers. The anime I was watching on the walk here (another pharmacy isekai: not the one with some dude who lived with a wolfgirl and ghost girl and mixed things via musical numbers, but one that the first 2 episodes so far seem to be "jobless reincarnation, only with a pharmaceutical researcher") started to glitch out when I hit the edge of campus, but that could have just been Crunchyroll's horrible app with the bufferbloat problems. But now my signal bar has full strength with an exclamation point on it. (I could call T-mobile tech support, but they've never actually helped. And I'm guessing the issue is "50k students in 40 acres, only now with 3G"... or however many G's we're up to. I heard 6 is in development. I suspect the G's are a modern version of retsyn.)

September 22, 2022

Trying to make the -F plumbing work for all the corner cases regex was handling was intrusive enough I wound up just doing the single pass bucket sorted version anyway. I wound up not needing the bitmask because I can just if (TT.fixed[firstchar]) which isn't quite as cache local, but 256*8 is 2k, which on a modern processor isn't a big deal. Besides, the top 1k of that (entries above ascii 127) should never be populated unless we're doing unicode sequences, at which point all bets are off. Practically speaking, it's at most 96 entries (starting at ascii 32) so 768 bytes that actually need to take up L1 cache space in the inner loop when NOT matching the start of a pattern.

I can handle a little more than literal matches: ^ and $ are only special at the start and end of the pattern respectively (anywhere else they're matched literally). If you \ escape an unknown character you just get that character (so \y becomes y), and . isn't too hard to deal with either: it's a little tricky when it's the first character, but I do have a list of such patterns: it's TT.fixed['.'].

The logic in parse_regex() gets a bit more complicated for all that, of course. With FLAG(i) it has to toupper() the first character. I'm just gonna punt on utf8 for the moment.
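The bucket sort itself is small enough to sketch. This is illustrative code with made-up names (fixpat, fixed_add, fixed_match), not the actual toybox grep plumbing, and it ignores -i (which per the parse_regex() note would toupper() the index and compare case insensitively):

```c
// Bucket-sorted fixed patterns: chain each pattern into an array
// indexed by its first byte, so the inner loop only memcmp()s the
// patterns that could possibly start at this position.
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fixpat { struct fixpat *next; char *pat; int len; };
static struct fixpat *fixed[256];

static void fixed_add(char *pat)
{
  struct fixpat *f = malloc(sizeof(*f));

  f->pat = pat;
  f->len = strlen(pat);
  f->next = fixed[(unsigned char)*pat];
  fixed[(unsigned char)*pat] = f;
}

// Does any fixed pattern match starting exactly at s? (The caller
// loops over each position in the line being searched.)
static struct fixpat *fixed_match(char *s)
{
  struct fixpat *f;

  for (f = fixed[(unsigned char)*s]; f; f = f->next)
    if (!memcmp(s, f->pat, f->len)) return f;

  return 0;
}
```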

And while we're at it:

$ echo hello | grep '$'
$ echo hello | grep '^'

That's a little awkward to make the match logic work with patterns sorted by starting character. (Before I just punted to the regex implementation and that sort of thing was its problem. Need to add tests for empty pattern that isn't empty...)
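What the regex layer reports for those patterns is a zero-length match, which is the awkward part for a matcher sorted by starting character: there IS no starting character. A quick demonstration via plain POSIX regcomp()/regexec() (the helper name is made up):

```c
// '^' and '$' match every line, but as zero-length matches
// (rm_so == rm_eo), at offset 0 and at end of string respectively.
#include <assert.h>
#include <regex.h>

// Returns 1 if pat matches str with a zero-length match (offset stored
// in *where), 0 for a nonempty match, -1 on error or no match.
static int zero_len_match(char *pat, char *str, regoff_t *where)
{
  regex_t re;
  regmatch_t mm;
  int rc;

  if (regcomp(&re, pat, 0)) return -1;
  rc = regexec(&re, str, 1, &mm, 0);
  regfree(&re);
  if (rc) return -1;
  *where = mm.rm_so;

  return mm.rm_eo == mm.rm_so;
}
```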

What IS the regex plumbing doing with backslashes? You can escape a special character, but if you escape anything ELSE...

$ echo 'a[c' | grep 'a\[c'
$ echo 'a\bc' | grep 'a\bc'
$ echo abc | grep 'a\bc'
$ echo ac | grep 'a\bc'

As far as I can tell, an escape sequence that isn't escaping a special character means the result never matches anything, but it doesn't throw a parse error either? Is that right? (I have not installed nor will I install the gnu "info" tool modeled on the gopher protocol that got killed off by HTML in the 1990s, so if they refuse to put that info online it's proprietary documentation that doesn't matter.)
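The portable half of that can at least be checked mechanically: POSIX says \[ escapes the bracket so it matches a literal '[', while a backslash before an ORDINARY character is explicitly undefined behavior, which is presumably why nothing throws a parse error. A quick harness (helper name made up) that asserts only the defined part:

```c
// POSIX BRE: backslash before a special character makes it literal.
// Backslash before an ordinary character (\y) is undefined per POSIX,
// so that case is deliberately NOT asserted here.
#include <assert.h>
#include <regex.h>

static int matches(char *pat, char *str)
{
  regex_t re;
  int rc;

  if (regcomp(&re, pat, 0)) return -1; // pattern didn't compile
  rc = !regexec(&re, str, 0, 0, 0);
  regfree(&re);

  return rc;
}
```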

I found a stackoverflow post that suggested -E was the extended regex version of -e so you'd stick an argument after it and could repeat it to provide multiple extended patterns and I went "wait, what?" and tried it... no, they were wrong. It's a global type flag, but -e is still the pattern indicator either way. (Thought so, but confidently stated misinformation still triggers a double take and check...)

September 21, 2022

Going through grep and going "NUL bytes in patterns never work, even on debian's version with -zFf", and noticing that grep --color '' will emit two invisible color change sequences at the start of each line (because empty string matches every line, but does so as a zero length match at start of line), and staring at "int baseline = mm->rm_eo;" and going that's trying to be the amount start was incremented by, but we just reset it to toybuf above and whatever mm instance it was in a regex match isn't the one in toybuf, so... how do I make a test to show that failure?

This is not what I'm trying to do, I'm trying to optimize many fixed (or semi-fixed) patterns being tested in parallel, but I have to understand the context the code's running in and... I've already found a bunch of places it's not RIGHT? (Admittedly those were stuff like combining -F with -w and the empty pattern, which nobody seems to have ever done, but still. I need tests so I can get it all right...)

September 20, 2022

Recently I've been trying to speed up grep, starting with autodetecting fixed (grep -F) patterns and putting them in a separate list, and then traversing both lists at search time. So all FLAG(F) does is force all patterns into the fixed list during initial processing.

For the first pass, that's ALL I was trying to do: no bitmask thing, and the checks are still "loop over the whole string for each pattern". I want to do this in stages I can check in, which means running regression test_grep every time I get something coded, and what I'm finding is -F was not heavily tested in combination with flags like -w, or with empty patterns... and then I hit embedded NUL bytes. The fixed string searches don't currently work at ALL with NUL bytes.
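The autodetection itself is the easy part. Something like this (a hypothetical helper, not the actual toybox code, and deliberately conservative: it rejects anything containing a metacharacter or escape, even ones the fast path could learn to handle later) decides which list a pattern goes into:

```c
// A pattern containing no BRE metacharacters and no backslash escapes
// can go in the fast fixed-string list; anything else falls back to
// the regexec() list. Hypothetical helper for illustration.
#include <assert.h>
#include <string.h>

static int is_fixed(char *pat)
{
  // strcspn() returns the length of the prefix containing none of
  // these characters, so indexing the pattern there hits the NUL
  // terminator exactly when no metacharacter is present.
  return !pat[strcspn(pat, "\\[.*^$")];
}
```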

It's very tempting to just back up and do a larger change, but I don't want this to turn into another "80% completed and I got distracted by bug report du jour" thing.

September 19, 2022

Fixing the xxd -c 0 and -g 0 weirdness. Even the version of xxd my devuan install has didn't get it right, but I'm told newer ones do better, although still not in a conceptually consistent manner.

Implementing is easy, figuring out what the behavior should BE is hard. Matching exactly what some existing implementation does is the default option, but when that implementation has version skew or is obviously half-assing it (let alone both), things get... sad.

I don't use busybox as an authoritative implementation, but I do compare their output to see what they got away with. They're DEFINITELY half-assing it anywhere they can (because binary size), and on the "wait for people to complain" front they've usually had longer elapsed since they shipped whatever feature it was, and they have a more diverse userbase directly interacting with the result. Toybox has more seats, but it's mostly sealed behind plastic inside Android. Busybox has a 20 year headstart of bespoke tinkerers assembling fiddly things, although "it should be load bearing, if not an outright drop-in replacement for the tool versions in mainstream Linux distros" dates to my tenure on the project (so 15 years). Still, much more surface area to catch bug reports, so if they HAVEN'T had to implement something yet, that's a reasonable argument that it doesn't matter.

(That said, busybox xxd -pc0 is still wordwrapping at 60 characters. The fact they haven't fixed what I'm fixing is an argument against caring, but not a definitive one.)

September 18, 2022

Banging on the test infrastructure a bit: when it skips tests it should say SKIP: but doesn't always.

The test infrastructure could absorb my full-time attention for many moons, but this is kind of fallout from the removal of SKIPNEXT. Or at least adjacent. Unifying SKIP and SKIPNEXT into a counter makes it easier to say SKIP: with a single codepath, but there's still a lot of "if test; then... fi" blocks and equivalents in the test scripts, which bypass calls to testing/testcmd, and if it isn't called they don't get to output their SKIP: message. The "and equivalents" are the fiddly parts: skipnot test-returns-false sets $SKIP so the "testing" lines get called and can print their message, but "toyonly line-to-run-only-when-testing-toybox" was only calling its line when we have toybox. When we didn't have toybox, it wasn't saying SKIP. (It wasn't saying ANYTHING, the test was _silently_ skipped.)

Highly unlikely I'm done here, I need to audit the entire test suite for semantic consistency about test skipping, but I need to audit the entire test suite for full test coverage and that's essentially the same audit. Todo item for later.

September 17, 2022

I've been trying to fix a few places where sed -z was still using newline instead of NUL, and I hit this in devuan's host sed:

$ echo -en 'abc\0def\0' | sed -z 'ax' | hd
00000000  61 62 63 00 78 0a 64 65  66 00 78 0a    |abc.x.def.x.|

That really seems like a bug: the "append" command ends the appended line with a newline, even when it glued it on after a NUL. (Changing a toybox command's behavior is easy in comparison to figuring out what the right behavior IS, especially when upstream seems to be getting it wrong. Is it better to be consistent or to be right? Of course posix is no help, they haven't noticed -z exists yet. SUSv4's "find" didn't even have -print0 yet, I wouldn't expect them to notice sort -z and grep -zZ for quite a while yet.)

Unfortunately to poke the developers of the gnu/dammit version, it looks like I have to subscribe to yet another mailing list. (I'm still on the coreutils list, because they still haven't merged cut -DF that I've noticed, but sed is an unrelated project.)

Eh, not connected to the internet right now, worry about it later...

September 16, 2022

Just saw a six month old tweet from Ariadne Conill of Alpine Linux who was apparently disturbed by some of my leftover sleep deprivation names in toybox. The more tired I am while coding, the more obscure my variable and function names get. I usually go back and genericize the worst offenders, but I admit I miss a few from time to time.

When enumerating display fields, if I ask myself "what kind of fields are these" the obvious answer is "strawberry fields", and with a structure name like strawberry the obvious loop iterator is "forever" (because of the Beatles song), and after the dozenth reference to "pattern" I'm going to throw in a "logrus" just for balance, but that sort of thing bothers some people. *shrug* Oh well. I regularly do cleanup passes even on mostly finished stuff, going back to try to clarify things, and I often remove stuff I don't want to explain. The sleep dep names are just one category of that.

I've intentionally left several. I don't expect everyone reading the grep code to be familiar enough with discworld to recognize "struct reg *shoe;" as a reference, but a "shoe" is also a box used for shuffling and dealing cards, and even if you don't... it's a variable name. And I already RESISTED having "struct timespec ts;" be "struct timespec its;" in the millitime() function in lib (which would also be a perfectly acceptable name for what it does). You can't please everyone, and in the case of Alpine I'm not trying.

Alpine doesn't use toybox and I don't expect them to start, because they don't want to. The stated reasons for their decision may change, but I don't expect the decision to. They built their system around busybox years ago, and it's the devil they know. Toybox wasn't ready back when Alpine started, and toybox becoming ready later doesn't matter to Alpine because that part of their system is done, and changing has costs. The relative quality of busybox and toybox isn't why they stay with busybox: they're familiar with their existing codebase and it ain't broke enough to fix.

Looking at toybox to FIND additional reasons to support a decision they've already made makes me grumble a bit, but hey: free code review. Throw it on the todo heap. (Or explicitly confirm I'm not gonna, which is a valid result of my review of their review.)

September 15, 2022

So I got a bug report that toybox grep is too slow in the presence of literally thousands of patterns to check in parallel. Um, yeah. It would be. Unfortunately the suggestion of gluing them together with | (like I originally did before abandoning that approach as problematic, and which could just as easily be done by the caller making the big -f file) turns out to be even slower.

My first guess here is to grab the "fixed" patterns (I.E. the ones starting with a matchable first character and having no control characters) and check those first using a cache-local loop (memcmp each pattern at same position, so it's hot in L1 cache) and possibly a bitmask of first characters that exist in a pattern (so I can rapidly skip any character that can't match any of those fixed patterns). I could even bucket sort the patterns into an array[256] so I'm only trying the ones that start with this character. If the pattern has no utf-8 escapes in it (nothing >127) then I can even handle case insensitivity cheaply, and the "." wildcard at a place other than the first position isn't too hard to handle either. Plus the -F logic kinda folds into this.
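The bitmask part is cheap to sketch: one bit per possible first byte, tested before doing any per-pattern work at all. Again, illustrative names rather than actual toybox code:

```c
// First-character bitmask: 256 bits marking which bytes can start a
// fixed pattern, so the scan loop skips impossible positions with a
// shift and an AND instead of walking a pattern list.
#include <assert.h>

#define BITS (8*sizeof(long))
static unsigned long firstmask[256/BITS];

static void mask_set(unsigned char c)
{
  firstmask[c/BITS] |= 1UL<<(c%BITS);
}

static int mask_test(unsigned char c)
{
  return !!(firstmask[c/BITS] & (1UL<<(c%BITS)));
}
```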

The simple thing to do is then fall back to the actual regex patterns using the old "traverse the whole string for each pattern" regexec() logic, although maybe it's faster to prepend a ^ match start of line to the ones that don't already have it and try it at each position so it's cache hot? (Especially if it has a non-magic first character for the bitmask check, although x* does NOT count because that's zero-or-more, and having an | anywhere in the pattern is another can of worms.) Depends how much startup overhead each call to regexec() has, and how much slower calling a ^pattern at each position is vs calling regexec(pattern) once on the full string to be checked and letting it advance internally. (Does skipping characters it can't start with make it a win, at least in context of the rest of the plumbing?)

Sigh, I should do the simple thing first and see if that speeds stuff up enough for the use cases people are hitting. JUST the patterns that would work the same with -F first. (Adding in "." is easy, but it's also an extra test in an inner loop, and unicode says it should match a character... Speaking of does that mean _printable_ character? What do combining characters do with '.'? Can I get away with ignoring that in Android's use cases?)

According to man 7 regex, \ can escape ^.[$()|*+?{\ and case insensitivity gets weird in the presence of UTF-8 sequences (needing to convert to wide character and then do towupper() so I can't even strncasecmp() because I don't know how long it is, and wcsncasecmp() assumes the string has been converted into the wide character "array-of-int" representation)... anyway, those are "this is not a simple pattern" indicators. Having the . wildcard in it is ok (modulo probably ignoring unicode above), and $ I can also deal with (heck, for -F strings it's an optimization to stop looking before the end although probably not worth it for bulk pattern handling), and ^ might as well stay conventional regex because that HAS to be cheap (it's just checking once at the start of string, not at every position in the string)... But again, moving patterns from the fallback list into the -F-ish array[256] with the bitmask can be done incrementally to speed up more use cases, one pattern feature at a time. (Doing so incrementally also makes regression testing easier.)

As for Elliott giving me real use cases: they seem to think the test case with nothing but hex digits is enough. Meaning I only ever care about fully fixed patterns and my mask can fit in a short. Are any of the tests needing acceleration case insensitive? Dunno. Do any of those have unicode? (Do any of them NOT have unicode, which is easier to accelerate grep -i for?) I'm reluctant to speculatively implement codepaths that MIGHT help, I want to know what's NEEDED...

Elliott pointed me at what BSD does, which is good to know, but... they're still traversing the string start to finish for each pattern instead of trying a more lex-style breadth-first approach? (Ala the bitmask.) Which means tricks they're doing like figuring out how many characters they can skip ahead when it doesn't match... don't really apply to what I want to try? Plus they have a whole second codepath for case insensitive unicode support and I'm waiting to see if that's actually necessary or if those can be handled by the fallback regexec() plumbing? (I don't know if the regex plumbing in musl or bionic currently handles unicode matches anyway, but if they _don't_ it's not my problem to _fix_ it. :)

September 14, 2022

Here's a section I edited out of my reply to the Debian guy, right after the observation that _POSIX_PATH_MAX and _XOPEN_PATH_MAX are even more meaningless than PATH_MAX, because "They don't describe anything about the current environment you're running in."

Toybox is sometimes clever _within_ a command, but tries not to be clever _outside_ of a command, if that makes sense? There's a certain amount of "I don't know this, I can't know this, I'm not going to TRY to know this" in the external interfaces. I can't predict the weird. They regularly invent new weird after I ship any given version.

The linux VFS lets filenames be 255 bytes long containing any character except NUL and '/' (with utf8 parsing left as an exercise for the reader: here are your bytes back), and the limit on directory depth is inode exhaustion. That's been true for years, and is likely to remain so.

You can always wind up with loops due to bind mounts (or mv while you're traversing, or probably other shenanigans) so the dirtree plumbing has to be prepared to traverse back up your ->parent pointers to detect same_file().

An individual filesystem doing something other than that is "notice system call failure and print errno while exiting" territory. (Hurd or BSD or MacOS may also violate the Linux VFS assumptions, but they're unlikely to _increase_ any of those constraints.) Some mangle case and some truncate filenames and selinux is probably Turing complete and other processes can asynchronously delete CWD out from under you (hence openat() needing to be able to recover in a directory traversal implementation; that recorded tree down to where we think we are is useful for getting back as far as we can and going "ok, this branch is gone now, continue from last parent"...)

P.S. no lseek() variant on readdir() is ever reliable, modern filesystems return stuff in hash order which gets perturbed by large flowers and brightly colored wallpaper, so a reliable deep traverse that wants to back up and recover has to read an entire directory level and THEN descend into it, because otherwise you can't recover back up to where you were unless you re-read the directory from the start having recorded each one you saw and filtering it out to avoid processing it again. Presumably less of an issue with rm but if you want rsync or tar to handle arbitrary depth you have to close the parent filehandles to avoid filehandle exhaustion, and you kinda need to read all their data before doing so or it's just a mess. And I need to add tests to make sure they can recover from a directory subtree vanishing during traversal or being mv-ed out from under the old parent, and that's mkroot territory...

I have a tendency to write up long technical... "explanations" is a strong word but "this is what I'm thinking" doesn't quite have a snappy explanatory phrase. Anyway, I then cut them out of the thingy as a digression, but sometimes want to refer back to them. I sometimes think maybe that's what I should post to youtube, just aimless technical rants. That's more podcast stuff, really... Anyway, sticking it here is a compromise.

September 13, 2022

Got a little extra sleep, and had the first reasonably productive day (by my standards) in a while.

September 11, 2022

I wanted to see why glibc was pulling in linux/limits.h, and I don't like editing my global system state as root (never quite sure I've restored it all and I do NOT want to track down those bugs a month from now), so I did "cp /usr/include/linux/limits.h limits.h; vi limits.h; sudo mount --bind limits.h /usr/include/linux/limits.h" which works just fine... as long as you edit the local limits.h before --bind mounting it. And the changes show up fine in the standard headers included by your test build. But if you then "vi limits.h" again after the bind mount, vi deletes the limits.h file and writes a new one to break hardlinks, which also orphans the --bind mounted inode. (It's like still having an open filehandle to a deleted file, it gets deleted when the last reference goes away. The old contents are still there at the bind mount, not the updated contents vi wrote.)

Knowing WHY it didn't work doesn't make it better.

September 10, 2022

I'm running a bit behind editing and posting blog entries (as usual), and I just got my exasperated rant at gcc up yesterday, and woke up to an email response:

...from a sequencing point of view, I think this line is equivalent to:

> ss = ss + sprintf(ss = toybuf, "%d", h1->pos);

... which has a sequencing problem because the ss read isn't sequenced with respect to the inner ss assignment.

To which I replied:

Yes, I know. I'm saying that's nuts for multiple reasons.

The semantics of what SHOULD happen look very clear to me. Entirely possible the standard is what's crazy here, but it IS crazy. They can warn about this but can't fix it? If ss was a global that was modified within the function call they wouldn't be able to warn about this, so breaking it down like that is extremely dubious anyway, just perform the load last before the assignment and treat += as atomic with regard to the optimizer (not threading or smp, but you need locking and cmpxchg and stuff there already). The target of += has to be an lvalue, it CAN'T be that complicated to fetch. Sure the address-of-data could be an outright function call with multiple array dereferences and structure member offsets, but once you've GOT the final pointer to primitive type, don't dereference it until it's needed because yes it can change.

They're also saying that a register load from a local variable can persist across a function call. The function call MUST happen before the addition, they're requiring support for an order of operations that MAKES NO SENSE just because some optimizers might be crazy.

If it was ss = 1+(2+3) then the 2+3 has to be resolved before it gets added to 1, and there isn't an actual value stack on real hardware, what it does is reorder the operations so 2+3 happens before +1 in the resulting stream of generated assembly. So from THAT viewpoint there's no reason ss = ss + (ss = 7); isn't clear when the assignment is inside parentheses, we're SAYING it has to happen first (increasing its priority). And a function call should not be weaker than parentheses.

Maybe my viewpoint is affected by having maintained a compiler fork for 3 years. They could trivially have specified this behavior as stable and CHOSE not to, and I think that choice was really really stupid. (If sticking "volatile" on ss would have made the warning go away, then the optimizer was given too much freedom.)

I'm not saying "the compiler isn't like this". I'm saying it SHOULDN'T BE. Reality is that this particular compiler bug is at the design level, possibly at the standards level. But it's still a bug.

I occasionally engage with the posix committee, because I'm writing toybox and what they say... eh, doesn't really affect me since SUSv4/Posix-2008 came out 14 years ago and is still current barring typos (rumors of Issue 8 abound but until it actually comes out it's not a thing yet), plus I just try to DOCUMENT "deviations from posix" rather than not have them. But it would be nice if my delta was kept small and even diminished in places.

But I've never bothered to engage with the C committee, both because I haven't maintained my own compiler version in 14 years, and I just RECENTLY moved from the 23 year old release of their standard to the 11 year old release. And also because the C++ compiler developers actively trying to sabotage C very much WANT to make this worse (so C++ looks less bad in comparison), and I would be defending "the way it should be" rather than "the way it is", which is seldom a strong philosophical position. If I was working on qcc it would fscking get this RIGHT... except I'm not and don't expect to have time to in the foreseeable future.

But "optimizers affect the otherwise obvious language semantics" is a pet peeve of mine. No, it means your optimizer is broken. If you can warn about it, you can fix it. While I CAN dance around the user-visible sharp edges of your insane over-optimizer, I should not have to. "But other compilers might not..." then those other compilers are broken. At the very least give me more -fno-strict-aliasing variants to MAKE IT STOP.

Some squishiness is historical. Heck sizeof(int) was 2 on DOS and 4 on Linux, but the march TOWARDS things like LP64 has been an improvement. Yes historically "x = 0; printf("%d %d\n", ++x, ++x);" varied its output back under Turbo C depending on whether you selected C or Pascal function call semantics (and thus pushed arguments on the stack left to right or right to left, which affected the order in which it resolved them), and that got baked into the standard (the same way it refused to acknowledge "char is a byte" for the longest time). But that's childhood trauma, not a virtue.

If I know "int x = 0; x = printf("%d\n", x++) + printf("%d\n", x++); printf("%d\n", x);" is GOING to perform those first two printf() calls in order (as visible in their output), then why would it NOT perform the two x++ in order? (Which it DOES. Even with -O2 gcc gives the right output: 0, 1, and x=4. And yet it warns that some OTHER compiler might not.)

Optimizers are absolutely FULL of logic to figure out what it can move without changing the visible result. X has to happen before Y, therefore that constrains what can shuffle around. If "a[ss] = ss++;" has an unclear order of operations, then DEFINE it. Pick one, specify, agree what should happen, document it, publish it, and now it's no longer unclear. This is not fundamentally different than 1+2*3 doing the 2*3 before the 1, we have operator precedence tables for a REASON.

When there's ALREADY nothing semantically unclear about what should happen, it's just a question of what people have chosen to defend. (Grumble grumble mark every variable "volatile" by default grumble.)

Optimizers need a MUCH HIGHER BAR for breaking code than compiler writers want to give them, we all should push back on this more. THAT is my complaint here.

September 9, 2022

Zabina died last night. Well, we took her to the vet and had her put down because she couldn't breathe anymore.

She was diagnosed with a congenital heart condition back in December, the vet said it would be a miracle if she lasted 6 weeks. That was 9 months ago, so she had a pretty good run. She hasn't murdered anything in the yard in over a month, but she was still enjoying herself until about 3 days ago. That's when she started randomly meowing in distress, and when we came to see she was just open mouth breathing really hard. At first it was like the occasional attacks she's had over the past few months, an unpleasant minute or so choking on nothing, but it would pass and then she'd feel better. (She's been doing that since last year, it's why we took her to the vet and then the cardiologist in the first place.) But these were lasting longer, and she was listless in between them. And petting her made her breathing worse. We'd hoped she'd rally like she has before, but lunchtime yesterday she meowed in distress from under my bed (not a usual place for her to hang out), and wouldn't come out. I gave her a bowl of cat gravy (liquid cat food both cats really love) and she perked up enough to come out, but just sat near the water dish in the kitchen drinking and open mouth breathing.

She never hid from people; even when she was sick she was a social and affectionate cat who considered humans basically fungible. She never met a human she didn't like (snuggling up to vet techs and every house guest). She wasn't always a lap cat (and when she was she'd randomly flop on your lap and then leave again 5 minutes later), but she'd usually position herself a few feet away from a human, or strategically between multiple humans in the standard cat chess way. (And even yesterday, sick as she was, when Fuzzy went to her room around 4pm Zabina moved to be halfway between her and me.)

Two things Zabina's always loved are shoulder rides (she can be TALL) and when we have a fire in the yard (it means we are out in her yard for an extended period of time, with a blanket on the grass). It rained last night so flying sparks shouldn't set stuff on fire with Austin's weather being so weird. Everything was so damp the fire went out 3 times just trying to start it with a bunch of paper and the driest wood and twigs we could find, but we eventually got a small fire of neighborhood branches going in the little brick lined pit we dug to cook dinner on it... and Zabina wouldn't come out. I picked her up (very carefully) and gave her a shoulder ride out to the blanket... and she went right back inside, flopped in front of the door to Fuzzy's room, meowed in distress, and started choking on nothing. And didn't stop.

At that point, we put the fire out and called the vet. Apparently the number Fuzzy had and confirmed WASN'T the vet who came to our house for Aubrey years ago (the cat who got cancer from her flea drops), this was just the new 24 hour vet on Guadalupe that we had to call a car and take Zabina to in a carrier. I didn't think she'd make it, and she peed herself going into the carrier (VERY LOUD PROTESTS), but once we got there they put her in a 40 percent oxygen enclosure, and after a half hour or so she woke up enough we got to say goodbye. (Even then she was still open mouth breathing, and while not rejecting petting she wasn't really responding positively either.)

Zabina lasted 9 months longer than the professionals thought she would, and was enjoying things until this week. Once we confirmed she was no longer enjoying the things she really used to look forward to, and was in persistent physical distress, it was time.

September 8, 2022

Alright, so what should a sed protocol for tar --xform look like?

Launching an instance of sed per filename was always a hack, and the Android guys have noticed it's really slow. But a persistent sed process can get out of sync with tar, either by not returning any output for a name or returning what parses as two outputs (not only can you stick \n or \0 in the output via substitution, but filenames can have \n in them already). Doing a big read() doesn't guarantee we got all the data (PATH_MAX went byebye and if sed isn't doing a single atomic write() I can't guarantee all output will have traversed the pipe yet anyway), and no, fiddling with timeouts to see if more data is coming isn't faster than forking a bunch of sed instances (which at least exit promptly).

I was thinking I could add a new --longopt to wrap a protocol around each transaction providing an explicit length (ascii decimal number plus newline before the data) on input and output. The downside is you'd HAVE to use toybox sed, the MacOS or gnu/dammit ones wouldn't have the longopt. (At which point I guess it could still fall back to the current fork-an-instance-each-time behavior, but maintaining two codepaths to do the same thing is not ideal.)
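
That framing might look something like this (hypothetical sketch, nothing like it has shipped; the function names are mine):

```c
// Sketch of the length-prefixed transaction framing described above
// (assumption: hypothetical protocol, my names). ASCII decimal length,
// newline, then exactly that many bytes: an embedded \n or \0 in a
// filename can't make one name parse as two or desync the pipe.
#include <stdio.h>
#include <stdlib.h>

static void send_name(FILE *out, char *data, size_t len)
{
  fprintf(out, "%zu\n", len);
  fwrite(data, 1, len, out);
  fflush(out);
}

// Returns a malloc()ed buffer of *len payload bytes, or 0 on EOF/error.
static char *recv_name(FILE *in, size_t *len)
{
  char *buf;

  if (fscanf(in, "%zu", len) != 1 || fgetc(in) != '\n') return 0;
  if (!(buf = malloc(*len+1))) return 0;
  if (fread(buf, 1, *len, in) != *len) {
    free(buf);

    return 0;
  }
  buf[*len] = 0;  // convenience terminator, NOT part of the payload

  return buf;
}
```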

I started trying to make it symmetric, but the existing plumbing is NOT set up to do that: it uses do_lines() out of lib/ which converts the fd it's fed into a local FILE * and then loops calling getdelim() and the provided function pointer, so when the callback function gets control (with the number line it can atoi() so it knows how much to read), it hasn't got access to that FILE * to fread() more data from. No, reading from the fd doesn't work because line reads don't know ahead of time where the delimiter is so they grab a block of data and store the extra until next time, the FILE * almost certainly has the start of the data we need stored in the structure. (This is glossing over sed's "read one line ahead to detect end of file for $ matching" complication, which is merely annoying in context.)

Probably what I want to do is just use -z for input: a filename can't have NUL in it (forward slash and NUL are the two characters prohibited in filenames by the Linux VFS) and tar is providing the input so won't play games with it: the user-provided sed pattern is the tricksy source of breakage.

At which point I'm tempted to just say "sed -z" and if somebody's stupid enough to "d" the string or insert a \0 call it pilot error? I doubt macos sed has -z, but we don't compile with macos sed either: macos sed is trash. The bigger problem is this SMELLS security exploitable. I don't know how, but this is exactly the kind of sharp edge that gets buried three layers deep and then combined with unforeseen weirdness, and suddenly you've got a tarball validator stomping system files. Except... you could already do that with ../../.. couldn't you? Hmmm... (Note to self: test that transform happens BEFORE checks for "under this directory" and so on.)

September 7, 2022

Wendy's! Actually hanging out at the booth with a 4 for 4 and my laptop for the first time in I have no idea how long.

Oh goddess, github has ACHIEVEMENTS now? Apparently I've contributed code to a mars explorer (via linux-kernel), to an arctic vault (also via linux-kernel), and have "starstruck" level 3. Did this really need gamification?

Got a bunch of email answered and cleaned up after the recent test suite API change, and had serious momentum going when my laptop battery ran down enough it suspended. As I suspected, the fact Wendy's no longer has public bathrooms is not the limiting factor to how long I can work there...

This confirms that I definitely need better office space to get work done. I miss the hotel rooms in Japan. (And even the Hello Office.) My house's central air is one-directional: air comes OUT of vents but the only air INTAKE is one big one in the ceiling of the little hallway off the living room (I.E. right under the air conditioner itself, where we change the filter). So if I close any of the rooms and stay in them for hours, they get noticeably warm and stuffy. More of an issue with the small front room that used to be an office before we turned it into storage, but even the big master bedroom isn't _ideal_. (I can set the bathroom exhaust fan going there though.)

Alas using your bedroom as an office is a recipe for insomnia, and the kitchen table I normally use means Fuzzy and Peejee constantly interrupt me (no door I can close to keep them out, which is also why the air conditioning works well in there).

September 6, 2022

Zabina is unwell. She's lived nine months longer than anybody thought she would, but we're having to make a judgement call about whether she's still enjoying life. (I _think_ she is? But there are definite periods of distress. We have a card on the fridge of a vet who makes house calls for one specific purpose. We haven't called them since Aubrey, but Fuzzy's confirmed the number is still current.)

Reading a page which says "China's sudden lurch from Third World basketcase to dynamic modern economy... In the 1960s, sixty million people died of famine in the Chinese countryside; by the 2010s, that same countryside was criss-crossed with the world's most advanced high-speed rail network..." and those things are orthogonal?

It's like saying "millions died in Pol Pot's massacres, but a few decades later they had cell phones". The one did not cause the other. The mass deaths in China were because Mao was the poster child for Dunning-Kruger and kept doing stupid crap like the "great leap forward" where he forced everyone to melt down their farming equipment into ingots for display purposes, and the "four pests campaign" where he trashed the ecosystem. Nixon going to China and Deng Xiaoping pulling a 180 on private industry were a repudiation of Maoism, and emperor Xi pivoting back to Mao is doing huge damage to his country.

China see-sawing rapidly between flood and drought today is partly because of climate change, but MOSTLY because they dammed every river in the country multiple times turning the natural water distribution network into a series of evaporation ponds (and then stuck the south-to-north water transfer project on top of it with perverse financial incentives). Here in the USA, Atlanta Georgia was already a deadly heat island decades ago because they replaced enough trees with black asphalt parking lots to affect the weather patterns for miles around. The CCP did that sort of thing on a much larger scale, and every time things got worse they doubled down.

China also _prevented_ more Chinese people than Mao killed with the "one child policy", which got worse every year it was left in place (kind of like "stack ranking" does in silicon valley) to the point where they have two consecutive "only child" generations and have now lost the cultural norm of HAVING children (or the economies of scale that make raising kids an economically viable life choice). This means China's army is completely hollow because they can't afford any personnel losses; every dead soldier has two parents and four grandparents who have no remaining descendants, have seen their line end, and have nothing left to lose.

The USA has its own problems here, of course. Thirty five years ago we had "latchkey kids" all over the place, and older siblings babysitting younger ones. Now our broken healthcare system attaches a $30,000 hospital bill to childbirth, day care went from "hanging out at a neighbor's house" to licensed specialists with college degrees and regular inspections (the closest of which is a 4 mile round trip drive with a 2 year waiting list and 4 figure per-child monthly cost), and children are so rare and precious that the slightest hiccup (such as an unescorted nine year old seen walking a block from home) makes Child Protective Services come take them away, meaning children are now risky and stressful to have. Raising two kids to adulthood is more expensive than paying off the mortgage on a house big enough to raise two kids in, and two is still below replacement rate...

Once again, the solution starts with "every single Boomer dies". How the unsustainable social norm is then removed is an open question: taking the society it's attached to down with it is the most obvious way. Ghost towns become ghost cities, sped up by hurricanes, tornadoes, golf ball sized hail, 120 degree days with rolling blackouts... Hopefully we find a softer landing than that, but we can't START while christian dominionists are actively trying to bring about the apocalypse because they think they'll be raptured to cloud nine and get to meet Santa Jesus.

September 5, 2022

I am reminded of Neil Gaiman's observation (in his "make good art" commencement speech) that one day he realized he had become someone who professionally responded to email, and wrote on the side.

I'm mildly exasperated that I keep writing up largeish responses to _private_ messages. Three different such threads in progress at the moment, including bash maintainer Chet Ramey replying to something off-list. It's not that I don't want to have these discussions, it just seems non-optimal to research stuff and write it up, and then not be able to publicly refer back to it. (I like to feel I'm at least contributing to a public body of knowledge. It doesn't have to be GOOD to form part of somebody else's research slush pile, if all else fails learning what NOT to do by example.)

And I KNOW I'm not doing youtube videos fast enough when other people wind up basically doing them for me. (Which I'd point people at more prominently if he wasn't using the old abandoned github fork that says at the top of the README "it moved into toybox", as visible in the start of his video. I'm simultaneously proud and embarrassed. He spent the second half of the video trying and failing to get the dropbear build to work. It works in the current one. I checked last release!)

Mixed feelings about one's work are normal. (Neil Gaiman defines a novel as "a long piece of prose with something wrong with it", although he attributed the quote to Randall Jarrell.) I remember when I stumbled across an old discussion of toybox that seemed at the time like it was dominated by Gavin Howard going "he hasn't updated my bc code, therefore the project is trash". I was strongly tempted to just delete that bc and maybe write a new one from scratch at some point in the future. What stopped me from doing that was, ironically, being unsure if bc is even still a thing? All I've seen anybody use bc for in ages is to run Peter Anvin's kernel script to calculate the clock tick granularity. (Which I posted both C and shell replacements for before he checked it in, it was Peter's spanner in the works for the perl removal patch series.) Sure bc is in posix, but so are compress, uucp, sccs, ed, fort77, mesg, qdel, pax... All the kernel _really_ needs bc to do is perform the equivalent of kernel/time/timeconst.bc, last modified in 2018. (There's one other use in tools/edid/Makefile where it basically performs printf %x $((100-$NUM%100)) for some reason? I mean, seems a bit overkill? But the tools/edid directory was moved from Documentation/ and isn't listed in tools/Makefile so I can probably safely ignore it.)

If I did want to write a new bc, I finally got the $((math)) logic in the shell implemented, and I know the _theory_ of arbitrary length math and calculating trig functions by formula. But... I have literally zero use cases for it? The timeconst.bc plumbing is just multiplication, division, and modulus (remainder from division). It LOOKS like it's using exponents but 2^b is actually a bit shift. Everything it does fits comfortably in 64 bit math.
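
For flavor, that kind of precomputation fits in plain 64-bit C (hedged: my simplification of the idea, not the kernel's actual constants or script):

```c
// Sketch of turning "jiffies * 1000 / HZ" into a multiply and a shift,
// the way timeconst.bc precomputes its tables (assumption: simplified
// demo with my names; the real script handles rounding and overflow
// more carefully). bc's 2^32 is just a bit shift here.
#include <stdint.h>

static uint64_t msec_mul(uint32_t hz)
{
  // ceil(1000 * 2^32 / hz): a ~42 bit constant, comfortably 64-bit
  return ((1000ull<<32)+hz-1)/hz;
}

static uint32_t jiffies_to_msecs(uint32_t jiffies, uint32_t hz)
{
  // (the product can overflow for huge jiffy counts, hence "sketch")
  return (jiffies*msec_mul(hz))>>32;
}
```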

That says to me we really don't NEED bc. Linux From Scratch and gentoo didn't bother to add it to the base OS until it showed up as a kernel build dependency.

September 4, 2022

Circling back around to that bug I need Fedora to reproduce, I downloaded a new Fedora .iso that runs a version of glibc that deigns to compile with a current-ish gcc version, confirmed the issue does reproduce there, built glibc, confirmed "LD_LIBRARY_PATH=../glibc/build toybox" gave me the command list... but then trying to reproduce the issue under (as yet unmodified git build of) glibc went:

Fatal glibc error: ../sysdeps/nptl/fork.h:83 (reclaim_stacks): assertion failed: l->next->prev == elem

That bailed before it even GOT to the failing part I could (presumably now) stick printfs into. Dunno if this is git commit du jour having an issue (continuous integration sucks), or subtle version skew between whatever libraries it found in the glibc build directory vs ones it loaded out of /lib, or if it's detecting subtle memory corruption that ASAN, musl, bionic, and debian's version of glibc didn't...

Ok, I'm pretty sure it's not that last one because:

$ LD_LIBRARY_PATH=../glibc/build ldd toybox
Fatal glibc error: ../sysdeps/nptl/fork.h:83 (reclaim_stacks): assertion failed: l->next->prev == elem
ldd: exited with unknown exit code (134)

Yeah, it didn't even enter toybox main() and already bailed. Presumably in the dynamic linker code. Except ldd without all that says:

The first of which is kernel magic, the middle three I gave it symlinks for (and I ran ldd on each of those three to make sure they didn't pull in any OTHER libraries than the ones in the above list)... is there version skew with the dynamic linker?

The infrastructure is just _spectacularly_ brittle.

Ok, let's try rolling the version backward... and Fedora doesn't say what the version it's using IS. Almost every other shared library is a symlink to a filename with the full version number, but /lib64/ is a file, not a symlink. Great, let's ask "yum info libc6"... network is unreachable. Fedora no longer has a local package repository? It has to talk to the server AGAIN after I already installed strace using yum so it's already initialized whatever there is to initialize...

Sigh. My phone still has a web browser, and Google says Fedora 36 should be using 2.35, so check out glibc-2.35 from git and... yay? I think that worked?

September 3, 2022

[The note I left myself for today's blog entry is "200000/28=7142, 7142*6=42852". That's it. I remember that this was how I'd worked out something important, that it proved or demonstrated something, and I meant to write up an explanation of it as a blog segment. Which would have been fine if I'd gotten back to it in the next day or two, but it's now September 16 as I go through and edit this, and I have NO IDEA what the actual topic was.]

[Editorially I note that going back through the blog entries before this I come off as REALLY stressed. Fade had returned to Minnesota, one of my cats was dying, I had writer's block on some chunks of toybox infrastructure... Accurate record of how I was feeling at the time, but probably not a fun read.]

September 2, 2022

For a while I've wanted to change the toybox test plumbing so:

testing "cat1" "cat" "hello\n" "" "hello\n"
testing "cat2" "cat infile" "hello\n" "hello\n" ""

Is instead:

testcmd 'cat1' '' $'hello\n' '' $'hello\n'
testcmd 'cat2' 'infile' $'hello\n' $'hello\n' ''

The testing->testcmd switch is ongoing, but the move to $'' would avoid gratuitous use of echo and in theory makes it a little more obvious what fields are processed (although 4 of the 5 are, so it's only the first one that ISN'T). It also makes NOT processing them easier: when I have to pass through backslashes or escape sequences to a command it can take me several attempts to get the amount of extra escaping right.

The downside is A) escape processing is needed commonly enough that having testing() just do it for you all the time made sense -- most input and output ends with \n, B) mixing environment variables with this becomes a bit awkward because $"blah" vs $'blah' aren't the same as "blah" vs 'blah': the double quote version is some completely unrelated (and useless) internationalization thing. (I don't remember whether to be annoyed at bash or posix for that, and am not looking it up right now.)
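Since the distinction trips people up, a quick sketch of the quoting behaviors in play (bash):

```shell
# $'...' does C-style escape processing; plain quotes don't.
printf '%s\n' $'a\tb'   # real tab between a and b
printf '%s\n' 'a\tb'    # literal backslash-t
printf '%s\n' "a\tb"    # double quotes: still a literal backslash-t
# $"..." is the unrelated locale-translation thing; without a message
# catalog installed it just behaves like plain double quotes.
printf '%s\n' $"a\tb"
```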

Part of the reason I've wanted to change this is the txpect plumbing I implemented (yes, I wrote a version of "expect" in shell, which sh.test uses quite a bit) already does this: when you want a newline, you $'\n'. And I can do "$VAR"$'\n' when I need to, it's a LITTLE awkward but not THAT awkward?

But the friction of changing every existing test, and the relatively minor benefit... It's one of those "In academia the fighting is so vicious because the stakes are so small" things. (I did, however, check to make sure mksh can do this too, and it can...)

Aha! Ok, the REASON not to do that is shell variables can't contain embedded NUL bytes, so echo -en 'a\0b' can feed the whole string with the NUL byte in it to a command's stdin, but $'a\0b' fed as an argument into the testing() function gets truncated at the NUL byte (because that's end of string using C's efficient but limited in-band signaling).
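A minimal bash demonstration of the truncation (byte counts assume bash's printf builtin):

```shell
# A pipe carries the NUL byte through intact:
printf 'a\0b' | wc -c    # 3 bytes make it to wc

# But a shell variable (and thus a function argument) stops at the NUL:
X=$'a\0b'
echo "${#X}"             # 1: everything from the NUL on is gone
```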

I should do an explainer video on C vs non-C string handling. (Back in the day "Pascal" was the big comparison because Turbo Pascal and Turbo Basic were the big competing languages under DOS, and C and Pascal could share .o files as long as you told the C compiler to --use-pascal function calling conventions, hence the "function arguments evaluated in reverse order" user-visible behavior change.)

The tl;dr is the alternative to an in-band signalling string terminator is explicit length tracking in a second variable, which means each pascal string is a struct from C's point of view. If your length tracker is a single byte, your strings can't be longer than 255 bytes long. If it's a short, you're limited to 64k which still comes up sometimes. If it's a 32 bit length then each string takes an extra 4 bytes to store, which adds up and imposes alignment constraints.

But the bigger problem was you wind up doing a lot of unnecessary copying. A pascal (or Java, or Python) string object points to the start of a string, so if I want to remove the first character I memmove() the whole string (or add _another_ variable to the struct to track starting offset, which has its own death-by-a-thousand-cuts problem set). In C, I can just ptr++ as long as I keep track of what I need to free later. Grabbing data out of a big mmap() or something is easy in C, but in Python you allocate new memory and copy it even for trivial things. Since string processing is (more often than not) the most expensive thing a program DOES (and fundamentally cache unfriendly), this was kind of an important thing to optimize for. Now after a zillion iterations of Moore's Law people care less than they used to, but C (and its Bizarro world version C++) still dominates because "runs twice as fast" still means "eats half as much battery doing so".

The downside, of course, being that in-band signaling can always spuriously trigger and treat data as metadata without rigorous (and inherently recursive) escaping. Which brings us back to "unix process arguments can't include a NUL byte".

September 1, 2022

Another fun corner case with diff -B is that the return value is 0 if all the matching hunks were suppressed, meaning the FLAG(q) logic has to go _after_ collating the match lines and determining -B suppression. One more test for the test suite...
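A sketch of that corner case with GNU diff (the file names are just examples; the exit status is the point):

```shell
printf 'one\ntwo\n' > a
printf 'one\n\ntwo\n' > b
# The only hunk is an added blank line, so -B suppresses it entirely
# and the exit status becomes 0 ("no differences"):
diff -B a b
echo "exit: $?"
# ...which means -q combined with -B has to report "no difference" too:
diff -q -B a b && echo "treated as identical"
```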

August 31, 2022

I started downloading the new Fedora 36 ISO before testing something (partly in hopes it would fix the version skew issue I was seeing trying to swap out the glibc version in the Fedora 34 ISO), and it said it was going to take 2 hours to download because Google Fiber is slower than my phone tether. But my phone tether has a maximum monthly tethering bandwidth before it slows WAY down, and a gratuitous extra 2 gigs is noticeable especially at the end of the month, so I would LIKE to make the fiber do this.

And then while watching a youtube video on my phone as it downloaded... the video hung. For a full minute. Because my phone was associated with the Google router instead of T-mobile (again, save metered bandwidth at the end of the month), and it can't manage to watch a video on Google's servers using a Google phone now? Great.

So I called Fade to see if she could contact Google Fiber tech support yet again, and they wanted me to install the Google Home app on my phone since she's back up in Minneapolis, but when she couldn't make it add me to the thing remotely as a "guest", she eventually gave me her login info for the "Google Home" app...

At which point Google added 77 contacts to my phone. I now have the phone number of a bunch of people I barely know mixed in with my phone numbers. I did not ask it to do this. I actively do not want this. There is no undo button. I uninstalled the "google home" app again but apparently the damage is done. (This sort of unwanted "integration" is one of many reasons I've avoided Microsoft products.)

I deleted a test contact off my phone and it deleted it off of Fade's phone in Minnesota. I was afraid of that. It's got her credentials stored somewhere, probably under system settings... yup, there's a google accounts tab, remove her account... it gave me a big warning about how that would remove all her contacts and I went "yes exactly what I want"... and then it didn't. The pop-up lied, they're all still there in my contact list.

Now that I've removed her account, let's try rebooting the phone... Nope, they're still there, but now Signal has notified me that 17 people I don't personally know are all on Signal. Which means Google has inappropriately notified THEM that I'm on Signal (NOT UNDER MY REAL NUMBER, it's some voip thing Jeff set up that's since expired I think, I just installed this app for the j-core engineering chat channel when they decided to move there from Line). The cross-contamination ripples out without asking, propagating misinformation, and I can't seem to stop it.

Of course this isn't exactly a high water mark for Google doing that to people, which is why I consider uploading anything to Google cloud services to be equivalent to posting it to usenet. Any information put into a Google Doc is sent to advertisers (Jeff's complained that he gets ads related to things he uploads into the private corporate workspace he pays hundreds a month for; that's information about upcoming patents and trade secrets that's STILL going to advertisers), which also means it's purchased by all the state actors and law enforcement agencies and so on, since those all have front companies buying all the advertiser data alongside the corporations doing so. Not that Faceboot and Amazon and Palantir and Blackwater being able to buy my data is any better. Every time I look at an amazon product on my phone or laptop it's in Spanish these days, and I can't change the language preference without logging in, which means creating an account. Plausible deniability, of course. As with twitter demanding my phone number, it just means I can't use that service anymore.

"Can't complain: I'm the product, not the customer." Except we pay for Prime, so I AM the customer. And Jeff pays for the Google Workspace that bases ads on proprietary business plans. But being the customer doesn't mean you're not ALSO the product.

And without selling all our data, how else do we expect the GOP to figure out who to arrest for buying birth control? Preventing them from arresting people would deprive them of their religious freedom to burn witches. Long christian tradition: Cain killed Abel, Moses had 3000 of his followers killed for disobedience after escaping Egypt (in the movie Charlton Heston threw tablets at the golden calf and it exploded, but in the book he ordered his followers to stab them: Exodus 32, 26-28, see also Matthew 10, 34-37), Abraham was told to sacrifice Isaac, king David had Uriah killed so he could marry Bathsheba (king Solomon was their son). An especially godly thing to do is murder the kids of parents you don't like, such as David's first child with Bathsheba (after which David apologized for the murder, which made it ok), the firstborn of Egypt, Job's wife and kids... (But that's ok because God gave Job replacements later. No, the original wife and kids stayed dead, but they're apparently fungible? The important thing is that a beardy man had an experience. Anyway, that's why ICE putting kids in cages is very christian.)

At least now that I've removed Fade's login from my phone's "list of Google logins" (again, didn't ask it to add her there, I was trying to log into just one app to poke the router, not change my phone's global state and send my wife's contact list out through Signal)... Anyway, at least now that connection's broken, I've confirmed I can delete Fade's contacts that have spilled over into mine without deleting them off her phone as well. If I can figure out which ones are hers and which ones are mine that I'm just not recognizing out of context...

And of course the Google Fiber support drone once again couldn't fix the router, just like all the other times. This time they wanted us to move the router out of the kitchen, which... look, aside from signal strength NOT BEING THE PROBLEM (the "2 hour download" was while my laptop was on the kitchen table about eight feet from the router), the kitchen is where Google's installer drilled the hole in the wall to install the fiber jack, and the ethernet cable molded to that jack is maybe 5 feet long and would go over a tile floor if we did try to run it anywhere. No, I can't feasibly move it from where the installer put it. The fundamental problem is they made the curb cut on the wrong side of the property, so they couldn't go up the same side of the concrete driveway the cable TV cable went up but instead had to go up the other side, where the concrete driveway slab continues around the side of the house to become the concrete back porch: they drilled through the first wall that's not behind multiple feet of surface concrete, and on that side of the house it's the kitchen. All that concrete was poured long before we bought the house, and easily visible on Google Maps' own satellite photos. That said, it's not a big house and the wifi signal reaches it all reasonably well from there.

The guy who was here to swap out the fiber jack's lens months ago explained exactly what was wrong with the little white circle router, basically it's too "smart" and once it gets this kind of confused it needs to be replaced. (He did NOT say "probably because of all the data collection spyware", he said something like "adaptive routing tables", but a gigabit connection to a single next hop at a fixed IP does not need to make any real routing decisions? There only IS one choice of where the packets go from here. That guy was ready to replace the router while he was here, but couldn't without the associated app, and Fade was in Minneapolis at the time.) Ever since, the Google Fiber phone drones don't understand the system anywhere near as well as that tech did. No the ethernet cable is physically attached to the fiber jack so it can't be unplugged, no it's only a few feet long, yes rebooting the router specifically (not the fiber jack) is what makes it fast again for hours at a time, which means it can't be a cable problem or a wifi signal problem. And even when the router's slowed way down two devices talking to each other within the house are still fast, and that goes through the router because of the way wifi access points work. Phone Drone does not understand any of that, they're always reading from a script titled "how to avoid spending money on service calls".

I'm trying to deal with this now because during the several months Fade was here over the summer, Google Fiber support refused to send the original tech BACK out to replace the router like he'd offered to. Every time we contacted them since it's been excuse du jour.

Anyway, I rebooted the router again, so it's fast again for the evening, and I downloaded that "2 hour" Fedora ISO in 4 minutes 58 seconds. (The FIBER part's been fine ever since the support guy replaced the lens, it's Google's crappy ROUTER that's broken and which they've refused to acknowledge, let alone fix. And it is so very clearly a software problem, not a hardware one...)

Now to spend far too long cleaning unwanted contacts out of my phone. Is this context-free half-remembered name one of my wife's friends I've heard mentioned but never met, or someone I know but haven't talked to since before the pandemic? Is 'Dad' her father or mine? A fun evening to look forward to.

(Yes, this is my normal experience with technology. I break everything. Too-smart automation did something I didn't ask it to, and now I have to get a mop. Ironically, it happened while I was trying to fix OTHER too-smart automation that keeps doing things I didn't ask it to. I get angry at Google because I still expect better from them, but I'm slowly learning to treat them as a Curate's Egg.)

August 30, 2022

"I'm good at reading code, but I know you are not."

What a nice way to start the morning. From a guy telling me about the developer behind an attempt to revive Java's "write once run anywhere" promise by genericizing Apple's old universal binary concept to support every OS on every type of hardware simultaneously... It's ok because that developer used to work for the company that removed support for all but 2 architectures from Android, which makes them a domain expert on portability. Ok. You do you. Of course they feel I need to read the new attempt's code to be allowed not to care, the same way each new attempt at making microkernels work is sure I need to read THEIR code to be qualified to continue not to care about microkernels.

This is fundamentally the same issue as "proof of my particular god" du jour. A theologian who has studied every Star Trek novel word by word and gone through every movie and TV show frame by frame to PROVE klingons are real, refusing to talk to anyone who hasn't even seen the last season of Deep Space Nine. Uh-huh. Let me know when whatever you're excited about actually arrives. "Coming Soon Now, You'll See" is a promise I've heard before. If it's the kind of thing where ignoring _one_ person being excited about it means I'm in serious danger of never hearing about it again... how is that an argument AGAINST my position? I heard about llvm repeatedly for something like 6 years before I tried it: it was still there when I got around to it.

So yeah, that's today's new tangent somebody strongly feels I should be following instead of closing tabs I already have open in the giant todo heap.

Got my new diff implementation producing a reasonable diff for Ray Gardner's test case, which is a step up from segfaulting. It's still wrong in two places (the @@ line count of a hunk is off by one, and the three trailing match lines are repeated in the hunk and the greedy matching algorithm is eating them early, so it ends with three - lines. Which is technically correct, that IS a valid view of the change that happened... but unified diffs should only do that at EOF, which this isn't). Not sure what's up with the first problem, need to stick a bunch of printfs into it to see where it's going off the rails. (Also, "mine doesn't match theirs" doesn't always mean WRONG, but I'm not sure how that would be the case here? Do we NOT have the same number of lines in the hunk?)

The easy way to fix the second problem is to compare and peel off the trailing lines before evaluating matches and stick them back on after, which is ugly but conceptually simple. A _better_ way is probably to have the algorithm burn the candle at both ends (have the loop alternate pairing off matches from the start of file and the end of file), but that's a bigger change and I want this DONE. (Sigh, I also want it _right_...)

Meanwhile, I'm not entirely certain what the definition of diff -B is supposed to be, because:

$ diff -Bu <(echo $'one\n\ntwo\n\nthree') <(echo $'one\n\ntwo\nthree')
$ diff -Bu <(echo $'one\ntwo\n\nthree') <(echo $'one\n\ntwo\nthree')
--- /dev/fd/63
+++ /dev/fd/62
@@ -1,4 +1,4 @@

Technically, BOTH hunks only differ by blank lines. But debian's diff is not suppressing the second hunk with -B because it decides to treat "two" as having moved instead of the blank line having moved. Which is a coin toss. (Again, "why is the diff algorithm making these decisions" seems like a thing I should understand. I see WHAT it's doing. Should it be?)

August 29, 2022

My laptop battery's been increasingly fiddly over the past few weeks: whatever communication with the microcontroller lets it start charging doesn't happen unless I hold it at exactly the right angle for a moment, then I can put it back down and it's fine up to 100%. The other day I plugged it in and went to bed and hadn't noticed it hadn't started charging, and then no matter what I tried it wouldn't _START_ charging. (The battery was at 90% so I could have gone off and used it... but that's the old Voyage of the Dawn Treader when-to-blow-the-horn question, isn't it? Besides, if it loses power completely, all my open terminal windows with half-finished things, where screen backscroll and command line history act as notes-to-self about what I was in the middle of, go poof, and all my half-finished thunderbird email reply windows go poof with no record of them, because thunderbird has normal Linux levels of usability.)

(Thunderbird's "repeat every recognized email address twice" thing is annoying enough, some sort of html reply code triggering in text mode. And when I'm composing in text mode reply lines still act weird for editing reasons and sometimes hitting backspace at the start of the line deletes the end of the previous line instead of collating. And then it wordwraps the sent mail at something like 76 characters, which combines badly with dreamhost's web archive when viewed on my phone's browser which isn't wide enough to show 76 characters even in landscape mode and of course pinch-to-zoom doesn't re-wordwrap but instead puts text off the edge of the screen and makes me scroll to see it, or zooms out to only take up half the screen but doesn't change how it wordwrapped it... but I KNOW the phone is doing the wordwrapping because my laptop browser doesn't do that. Yes that's thunderbird, mailman, and android chrome combining into a captain planet of bad presentation, with each somehow making it worse.)

Anyway, a replacement E6230 battery (6 cells, nobody can quite agree how many coulomb-fortnights it should have but somewhere AROUND 65 probably) is $30 through amazon and like $12 through ebay. (Fade went through amazon because she didn't trust ebay, devil you know I suppose.) And it arrived this morning and I left it on the couch rather than fiddle with it because after sufficient tilting and tapping, I got the old one charging again! (The CAPACITY seems reasonable, still well over half of factory specs. There's just something corroded or cat-haired or dendrited about the connector that makes it not want to START charging, but once it starts it stays started. Again, devil you know...)

And of course when I got to the table, it was powered off and I'd lost all my open windows. Fired up again claiming to have 100% charge though! Sigh.

August 28, 2022

Still trying to debug what's wrong with tar --xform as Elliott's seeing it but I'm not. I reproduced the problem on a Fedora image and it's somewhere in glibc's getdelim(). It looks like the "stdin" object is getting corrupted, although I have no idea how, and would really like to stick printfs into libc.

If you build glibc from source out of git and do an LD_LIBRARY_PATH=. ./toybox with the freshly built libc in $PWD, it says "version `GLIBC_2.33' not found (required by /lib64/" which I did not link it against but Fedora is a taste sensation dependency explosion.

But if I "git checkout glibc-2.3.3", ../configure then dies with:

checking version of as... 2.31.1, bad


$ as --version
GNU assembler (GNU Binutils for Debian) 2.31.1
Copyright (C) 2018 Free Software Foundation, Inc.

But configure has:

echo $ECHO_N "checking version of $AS... $ECHO_C" >&6
  ac_prog_version=`$AS --version 2>&1 | sed -n 's/^.*GNU assembler.* \([0-9]*\.[0-9.]*\).*$/\1/p'`
  case $ac_prog_version in
    '') ac_prog_version="v. ?.??, bad"; ac_verc_fail=yes;;
    2.1[3-9]*)
       ac_prog_version="$ac_prog_version, ok"; ac_verc_fail=no;;
    *) ac_prog_version="$ac_prog_version, bad"; ac_verc_fail=yes;;
  esac
Yes, the as version I'm building with is TOO NEW to build glibc 2.3.3, which will only accept 2.13 through 2.19. Same for most of the other tools. Yes, the "copyright 2018" tool is too new to build the libc in a Fedora version that has a March 2021 date on ls -l /bin/ls.

The _levels_ of brittle they've built into this thing are just... sigh.

August 27, 2022

Ray Gardner wrote up his own explanation of the classic diff algorithm, and I'm trying to read it to see if it makes more sense to me than the other writeups did.

Diff and longest common subsequence

The algorithm is trying to collect/mark all the lines that will be kept (leading space instead of leading - or +). I got that part. We want to keep the most possible unchanged lines, but there may be more than one way to do that and some are less legible than others.

Saying that a subsequence is zero or more symbols and that it's different than a common subsequence didn't really help my understanding, because it's spending time defining terminology in a way different than what someone who has used the output of diff and the input of patch for many years already calls these things, instead of bringing the mountain to McDonald's as it were.

The paper by Hunt and McIlroy describes HM better than I can

Then why...?

Sigh. He KNOWS I already found and read that paper, and I said it did not explain to me "why are they doing it that way?" (I can see WHAT they're doing. That was never the problem. Some PDP-11 programmers back in the 1970's decided loading the entire file into memory at once and hashing every line was the best approach: Why did they do that? What are the actual advantages of this approach, what are the downsides, what are the corner cases and pathological inputs, what are the design tradeoffs and alternatives? The McIlroy paper SAID that doing it right is N^2 and the approach they've taken is cheaper but imperfect, but didn't really go into HOW. Sorting has quick sort and heap sort and shell sort and so on, with explanations of the tradeoffs and videos showing them in action, some by dance troupes. I am not finding anything like that here.)

Asking me to read a paper which literally says I should just read the other paper again, presumably over and over until I "get it"... Great.

Let's look at his python code and see if it's illuminating:

# Here if returning actual LCS elements:
# Omit step 9. It weeds out jackpots, where the LCS contains matches
# between values that hashed the same but are not actually equal
# due to hash collisions. Instead just return LCS elements from B.

If a "jackpot" is a spurious hash collision, why not say spurious hash collision instead of inventing more terminology? Is omitting step 9 what weeds out "jackpots", or would step 9 weed out jackpots and we're not bothering to do that? (If so wouldn't NOT checking for hash collisions have consequences? If our hunk says lines are equal that aren't actually equal, that seems problematic...)

Sigh, I can sit and chew through this too, but it doesn't seem much more productive than the first five such writeups? There have been at least two variants since "patience diff" all built on top of the same base, and git had performance issues due to changing hash algorithms, and everybody keeps tweaking diff but nobody has bothered to explain WHY the unquestioned base they're all building on is like that.

The real problem is I've reached the point where I'm tired of this problem space and don't really WANT to implement a "diff" at this point. It is not fun. Especially since a while back I threw up my hands and went "I'll write a new one from scratch that I _do_ understand!" and did most of that work, and people keep going "how stupid are you not to understand the old one, here, it's easy, stop what you're doing and look at what I wrote"... which literally says "read the math paper from the 1970's again".

Long ago I wrote a "patch" implementation that did not work like other implementations, and it turned out to be ok. I wanted to explore doing that in reverse for diff, but I didn't get to because I got an external submission dictating how it should go (which wasn't even patience diff, which I'd asked about at the time), and my questions about it are met with incredulity. No "why" question has received a "why" answer so far, and I am tired.

August 26, 2022

Arrived at the table. Did not remember to pack my laptop charger. Ok, deadline pressure.

Debugging toybox tar --xform under fedora is extra-annoying because along the way I tried to "yum install strace", and did not immediately notice (elsewindow while it did its thing, then distracted working on the context switch) that it tried to INSTALL ALL SOFTWARE UPDATES SINCE THE DVD WAS PRESSED. Without asking. Into a vm running from read only storage union mounted with tmpfs in 4 gigs of memory. Yeah, system got very swappy and unhappy when I let that run a bit. (You CAN paralyze the GUI on an ssd with swap thrashing. Probably terrible for the ssd's lifespan. I wonder if there's a tool to say how many cycles it guesses it has left and how many bad sectors it's remapped and so on? I very vaguely remember such tools for server crap ten years ago, but am not swap-thrashing over to THAT right now for the same reason I'm not diving deeper into ASAN or valgrind just now...)

And here's more BIG LONG TANGENT that I edited out of my reply to Elliott's implication that x86-64 is a z80 with no registers:

Nah, it's 32 bit x86 that only had 8 registers. (Well, that and the original arm "thumb" instruction set. :) Z80 had 16, as do armv7, thumb2, and x86-64. (And superh, vax, s360, m68k... 32 registers is also popular, but whether the lower code density from the extra bit in the instruction word is actually a net win is a judgement call...)

We can all agree 8 is too few. Federico Faggin wasn't the only one to add 8 more general purpose registers to the 8080 lineage, the DEC Alpha guys did too. When Compaq purchased the corpse of Digital Equipment Corporation in 1998, they didn't want the chip design team, so AMD scooped them up and asked "if you were to make an x86 chip, what would it look like?" The result was Athlon. Then they asked "If you were to extend it to 64 bits, what would THAT look like?" And the result was Opteron, with only 10% more circuitry than Athlon. (That's why it fit in Alpha EV6 motherboards with a different BIOS, and supported SMP out of the gate.)

The Alpha designers doubled the number of registers in the chip (and made all the new ones general purpose) because it was basically a shame NOT to, and the result was a surprisingly non-disgusting chip given its heritage.

Intel tried everything to kill x86-64 and all SORTS of executives staked their careers on Itanic, but at the end of 2004 Dell (which 50% of Intel's sales volume went through) said all their high end customers had informed Dell that they were buying x86-64 systems next year, and Dell had the amount of time it took their in-house bureaucracies to approve purchase orders through new vendors to provide them with x86-64 systems before they went elsewhere to buy them. Dell passed the ultimatum on to Intel, which had a SCREAMING MELTDOWN but produced what they called the "ia32e" which ran the x86-64 instruction set although they intentionally ripped the iommu out of it to break compatibility (and of course they bolted it onto the Pentium 4 base because politics)... but it was close enough and the market consolidated and now we all have 8 more general purpose registers.

(And then as laptops displaced desktops Intel was still shipping Pentium II into that space because P4 would drain a full battery in minutes and the India design team just kept making it WORSE but P3 was fundamentally unportable to newer manufacturing processes, so Intel's Israeli design team took the Pentium III and properly latched all the racy logic and bolted 2 megs of L2 cache on die and called the result "Pentium M" and it absolutely KILLED the Pentium 4 in the market, and everybody watched the lightning show and listened to the Queen soundtrack until the Pentium 4 was no more and the Pentium M got renamed the "core", and then they made it SMP (core duo), and THEN they finally made it x86-64 (core 2 duo), at which point it was A) everywhere, B) didn't suck NEARLY as much as everything else Intel had done in the previous ~7 years. Intel eventually learned not to outsource its tech to Bangalore, because the REASON it was cheap was they threw recent college graduates at it and then 2-3 years later cycled those guys out to go service higher margin contracts for customers in China/Korea/Vietnam, so you were perpetually paying to train newbies with no continuity of experience on the project and then they went to work for your competition. This was 20 years ago of course, no idea what the relevant technogeopolitics looks like now; the center of gravity migrated to Shenzhen then faceplanted when the tanks rolled into Hong Kong after that asshole murdered his girlfriend on a vacation in Taiwan and Hong Kong passed a bill that let Taiwan extradite people from Hong Kong and Emperor Xi went "Taiwan is mine mine mine mine mine and here's the list of Hong Kong citizens I'm extraditing to Beijing under the new bill for the crime of comparing me to Winnie the Pooh"... Kinda eclipsed by Covid but it's all going pear shaped.)

I know I've done better writeups on the above than this, but I can never find them except for the REALLY old stuff. The point is, x86-64 has 16 registers which is enough to dedicate one to TLS and one to the stupid C++ "this" pointer and one to a frame pointer and so on. Heck add 3 more to go from ELF to FDPIC (instead of "base pointer plus segment offset" for text/data/rodata/bss, each of the four is located independently and gets its own register). And under the covers the modern intel stuff is doing HORRIFIC speculative register renaming crap opening endless sidechannel data leaks and security holes. The HARDWARE probably has at least 256 registers, it's just the instruction set's namespace that's constrained...

P.S. this and this are useful for discussions in this area.

You can tell I'm a bit fried when I go on a computer history tangent.

August 25, 2022

Sitting on my hands and NOT appending the following to my reply to Elliott's "You are not the user" comment:

I miss the days when anthropologists analyzed our community and gave us vocabulary to distinguish "user" from "user".

I used to do a lot on the anthropological side of things myself, back in the old days, especially before I fell out with the author of the cathedral and the bazaar (alas, he went crazy), but I just haven't had time recently. I've got a broken buzzy recording of the prototype and the fan club talk on my talks page but never even got that much recorded of the three waves talk I did at the same conference a few years later. (Once upon a time I even did user interface development. No, "the inmates are running the asylum" and "the design of everyday things" weren't the last words on usability and affordances and understanding your community, and I told Eric at the time his Aunt Tillie metaphor was... not ideal. I kinda got discouraged back around 2010 but this is still VERY NECESSARY work to do...)

Any ecosystem has layers dependent on other layers. "This area is my problem" is not the same as "this is all that matters". Since McDonalds serves 100 times as many customers per day as there are cattle ranchers in the united states (yes actual numbers: 70 million vs 0.7 million), does that mean it's clear which group isn't "real" and can be safely ignored?

Sigh. The Linux Foundation basically clearcut and stripmined the Linux ecosystem hobbyists built and we're all watching people who went into it before then age out. It's gonna get UGLY in about 15 years. (Unless this got fixed while I wasn't looking.) Oh well...

At that point I stopped and just cut and pasted it here because A) I'd wandered way off on a tangent and B) it's not my place to subtweet at Elliott about how he treats other Android team members. But me, I'd probably find "your use case doesn't matter because you aren't real" somewhat demotivating. I _do_ want to make things better for developers...

A few hours later I forwarded a lightly cleaned up version of this thread (hey, it's public) to Jonathan Corbet of Linux Weekly News, along with the patch that started it, and a brief explanation:

They'd like to use devtmpfs, and the missing piece they need is obvious enough I independently came up with it off the top of my head, but the friction of attempting to engage with the kernel clique AT ALL is just too high to even bother trying.

Once again the "sitting on my hands" part was NOT appending to that explanation:

I may be biased on this subject because it took me five years of reposting to get the perl removal patches in, and initmpfs took 5 years from "hey this should happen" to me bashing it repeatedly through, and even afterwards it still has obvious bugs that nobody fixes even with the patch posted to the list. Plus things like this and this don't even get replies from humans anymore. Perhaps people not bothering to engage with lkml anymore is normal now? This was almost a decade ago and the situation has not improved. If we've all just given up, then sorry for wasting your time...

August 24, 2022

The glibc devs backported the fix to the 2.36 release branch! In other news, they've decided to have a 2.36 release branch. Life is good.

I cherry picked commits from my ongoing diff fork and pushed them upstream, in part so externally visible development doesn't look dead and in part because I'd checked the portability.h workaround for glibc 2.36 into my fork, and it's no longer needed, and rather than add a commit removing it I can just transplant commits I haven't pushed yet with "git format-patch" and "git am". I still have multiple diff commits in there that need to go upstream as a series because the previous one was usable to some people and the new one isn't yet. Although part of the reason there is I'm implementing stuff the previous one didn't have as I go through...

Sigh. Seriously, gcc?

warning: operation on 'ss' may be undefined [-Wsequence-point]
ss += sprintf(ss = toybuf, "%d", h1->pos);

How is that undefined? ss = toybuf happens in the function arguments, which all MUST happen before we return a value to the +=. You can't perform the increment assignment before calling the function, which returns the value you're incrementing by, and you can't call the function before evaluating the argument to the function! If ss is a global it's visible in the function, and if I have a global containing a pointer to the local that the function can access its value through then it's equally visible. Unless you're saying a FUNCTION CALL is not a sequence point, there is no ambiguity about what's supposed to happen here. (Function call arguments are special in that "," isn't evaluated strictly left-to-right as it is outside function arguments, but not special in that they must be evaluated and their side effects take place before we call into a function? So you can only ever INCREASE undefined behavior, never constrain it?)

I'm aware that historical optimizers broke x++, and also that the order of argument evaluation was reversed for people doing "pascal" stack order, so "if (x) y, z;" doesn't work the same in function arguments as it does everywhere else. That's annoying, and the standards committee cowardly refused to specify behavior in a way that would outlaw stupid things Microsoft was doing. Got it. But this has no reason to ever break.

I'm sad when the C++ people who took over C compiler development sprinkle yet more random land mines around C development in hopes it becomes as much of a cargo cult as C++. Eventually I expect them to render C++ unviable and for it to be replaced by the zillions of languages (go, rust, swift etc) whose entire reason for being is "C++ is terrible, let's not use it".

Unfortunately C++ is the drowning swimmer determined to drag C down with it, and I am sad. C is a very good language for doing what it does. C++ is a terrible language for doing anything ever, and its entire marketing strategy is being Mr. Hyde to C's Dr. Jekyll, living a parasitic existence standing in for something simple and sustainable.

August 23, 2022

Amazon is evil may be sung to "Every sperm is sacred".

Oh hey, remember how I lost my twitter account because they insist on tying it to a phone number which I refuse to do? Twitter just paid a $150 million fine for doing that. (They continue to demand it anyway. As John Rogers regularly says, "a fine is a price". They paid the fine, and continue the behavior. Just more surreptitiously.)

Seriously, twitter is held together by string. They are LESS evil than the obvious alternatives, but capitalism, plutocracy, and lead poisoned Boomer voting patterns are corroding every good institution the country has. The LD50 on the Baby Boom is still 2034.

August 22, 2022

Ok, this is just sad. And it's probably posix-mandated sadness:

$ mkdir three four
$ seq 1 1000000 | dd bs=4096 count=8 > file1.txt
$ seq 1 1000000 | dd bs=4096 count=8 > file2.txt
$ sudo toybox losetup -f file1.txt
$ sudo toybox losetup -f file2.txt
$ sudo toybox mknod three/loop b 7 0
$ sudo toybox mknod four/loop b 7 1
$ diff three four
File three/loop is a block special file while file four/loop is a block special file
$ diff -r three four
File three/loop is a block special file while file four/loop is a block special file

I made two block devices that should compare identical (loopback mounts of files with equivalent contents), and it won't compare them. The posix wording "If both file1 and file2 are directories, diff shall not compare block special files, character special files, or FIFO special files to any files and shall not compare regular files to directories" seemed a little unclear to me: does it mean it won't compare them to REGULAR files, won't compare them to different kinds of files (but can compare a block device to another block device), or won't compare them to ANYTHING? And the answer is "won't compare them to anything, but for some reason we made directories a separate clause". If you mean anything, why not say anything?

It says the rationale is that diff shouldn't get hung up reading from a blocking input source, but you can hit that with an automount or a network filesystem anyway. And they have the explicit rule that block or char devices with the same major/minor compare identical, but when they're not identical it just DOESN'T compare them? That's not consistent behavior...

August 21, 2022

Ah, I did NOT have "meat sweats" from the lovely steak dinner Fuzzy prepared for the household. When I went to turn the air conditioner down a couple degrees, I found out that the inside temperature was 80 and despite the thermostat's insistence that the air conditioner is running, no air was coming out of the vents.

The soonest an air conditioning repair person can get here is thursday. Lovely.

August 20, 2022

I am managing to be meta-annoyed by gnu projects, which I suppose shouldn't be surprising. They got "being wrong" wrong. (I want to make a "Century of the Fruitbat" reference here, but it's been Century of the Anchovy a while now.)

I got the new diff.c implementation implemented enough to start compiling and testing it again. This round is the "dir vs file" and such logic, implemented without dirtree. So much debugging, and then a zillion command line options to go through and make sure I've implemented and tested...

August 19, 2022

Oh good, people are working on running the haber-bosch process with renewable energy.

Context: power generation and transportation together are around 1/3 of the global economy, and the way those switch over to renewables is obvious. But something I've been tracking for years is technology to replace the OTHER uses of fossil fuels, beyond directly burning them to produce heat, electricity, and motion.

The next largest chunks of the global economy are agriculture, manufacturing, and finance (what order they go in depends on how you measure it: finance claims to be the biggest thing ever, but finance is entirely a social construct moving score points around so people have social permission to do stuff, and how many score points it accumulates to itself versus how many people spend their working hours doing finance are some distance from each other. I'm not sure what the "money isn't real" version of "atheist" is called; people keep bringing up Karl Marx, which is a bit like saying atheism is based on the writings of Christopher Hitchens. That's... not how it works? There wasn't a specific guy who proved Santa Claus doesn't exist. If anybody is recently important here it's David Graeber, who was apparently important enough to be seduced by a russian spy and dead within months of marrying her.)

Anyway, all the uses of oil and methane (marketed as "natural gas") as feedstocks in plastics and pharmaceuticals and such can generally be replaced by renewable sources, and people are working on that. The natural world already builds every organic compound out of sunlight, air, water, and trace minerals. It has to: anything organic exposed to an oxygen atmosphere for a few thousand years basically evaporates, so either the biosphere can reproduce it or it's non-organic slag (like the silicon dioxide and aluminum oxide making up most rocks). The ground is made of stuff whose most-oxidized state is a solid, but carbon dioxide is a gas. That means any carbon compounds on the surface came from living cells that used energy from sunlight to pull it out of the air within the past few thousand years.

We use fossil fuels as cheap inputs to organic chemistry synthesis because they were buried deep enough underground that oxygen couldn't get to them. Just as plate tectonics only became accepted in the 1950s and the asteroid that killed the dinosaurs was traced to a specific impact crater in the 1980s, we've only recently started figuring out that the majority of fossil carbon comes from the brief (geologically speaking, a couple million years tops) period after plants invented cellulose as a building material, before microbes figured out how to break it down. Undigested cellulose piled up in thick layers and got buried under sediment, kind of like today's "great plastic garbage patch" in the pacific ocean. (Although since cellulose is a carbon based polymer, the biosphere has a head start working out how to eat plastic and is making visible progress already. But even today most things can't digest cellulose; animals that do require elaborate plumbing with symbiotic bacteria, like a cow's four stomachs or a termite's abdomen. Horses can die if they drink the wrong temperature water because their grass-eating digestive system is COMPLICATED.)

So in theory, you can always replace fossil fuels in any organic synthesis with electricity, air, water, and maybe some trace minerals. Because that's where it came from in the first place. The trick is making it cheap enough to compete, and scaling it up to industrial quantities.

I've mentioned before how Norman Borlaug's green revolution quadrupled the world's food supply, which is the reason all those 1960s predictions of overpopulation and starvation never panned out. Borlaug bred new strains of wheat, corn, and rice that could absorb more nitrogen fertilizer and turn it more efficiently into food (breeding "dwarf" varieties that grew less plant and more seed, so with enough water and fertilizer you could both grow up to 3 crops per season on the same field and have each crop yield more grain; he also bred in a bunch of disease resistance, but modern agribusiness didn't follow up on that part, choosing instead to spray each new bug and fungus with chemicals since the 1970s).

The nitrogen fertilizer is the main reason agriculture is so dependent on fossil fuels. The main input to the plastic and pharmaceutical ecosystem is usually methane, but the main input to the agricultural nitrogen ecosystem is ammonia. The Haber-Bosch process is named after the German chemist who invented it and the industrial engineer who made the reaction scale from "enough to gas soldiers in their trenches" to "enough to feed the world". The process makes ammonia by combining nitrogen and hydrogen under great pressure (200 to 400 atmospheres) and high temperature (400 to 650 degrees celsius) in the presence of an iron catalyst.

(Just like with digesting cellulose, this "nitrogen fixing" is a thing most plants CAN'T do, and the small number that can generally have symbiotic bacteria actually doing the chemically tricky part.)

So the haber-bosch inputs are nitrogen, hydrogen, and energy to compress and heat the gasses. Pure nitrogen comes from pressurizing air to a liquid and letting the layers separate (into liquid nitrogen, liquid oxygen, and a trace amount of "everything else"). You can pass an electric current through water to break it down into hydrogen and oxygen, but currently the cheapest source of hydrogen is combining steam with methane at a thousand degrees celsius. (The fossil fuel industry marketed methane as "natural gas" back when it was replacing coal gas in the 1920s, and they've paid a lot each year to keep the name current so nobody calls it "fart gas", which it is.) The fossil fuel industry leaks a bunch of methane into the atmosphere (and even into the water supply) both digging it out of the ground and piping it to its various consumers, and in the atmosphere it's 80x as potent a greenhouse gas as CO2. (Luckily it has only about a 7 year half-life, oxidizing into CO2, but in the short term switching from oil to "natural gas" methane has made global warming NOTICEABLY worse.)

So the Haber-Bosch process produces nitrogen fertilizer from atmospheric nitrogen using a BUNCH of energy (instead of the clever enzymes you find in clover and peanuts and such), and the article linked back at the start is about doing that from solar and wind instead of fossil fuels. Given that agriculture is something like 1/6 of the world's economy, we need to scale this WAY THE HELL UP as quickly as possible. Just switching everybody's stove and water heater and furnace over to electricity requires trillions of dollars of infrastructure investment (and completely reinventing how we pay for it all, since the transmission infrastructure becomes the dominant cost when renewable generation is distributed instead of centralized). Manufacturing and distributing billions of tons of nitrogen fertilizer using new and different infrastructure is a similar level of investment.

But as with aluminum smelting and electric arc furnace steel smelting, this is potentially a big "demand load" that can consume surplus power when variable generation is high and idle when extra power isn't available. (With sufficient automation so you're not mobilizing factory workforces on an hour's notice. Batteries can provide an hour or two to smoothly start up and shut down production lines, but "Will we work tomorrow? Only if it doesn't rain..." Well, farm labor's been doing that forever...)

August 18, 2022

Elliott poked the glibc people, and they say the sys/mount.h breakage should be fixed by these two commits. Still needs testing, but they've at least acknowledged the problem.

I keep oscillating between dirtree-based and non-dirtree-based approaches to the parallel directory traversal code, which means I'm throwing out stuff I implemented and then regretting it and rewriting it. The most recent problem is the lifetime of the directory filehandles: I want to openat() the entry within the directory, but I don't have an "end of this directory" notification, and by the time dirtree_recurse() returns it's already closed the directory filehandle.

Yes, keeping a filehandle open for each directory level of two parallel traversals COULD hit filehandle exhaustion. And no, I don't expect "file/dir moves out from under us during traversal" to be particularly exploitable here. (Merely annoying, and reporting it as an error or even aborting partway through is probably fine.) But it sets my teeth on edge to do it WRONG...

This is one of my personal known failure modes: paralysis thrashing between two exclusive but nearly equivalent options because the difference between them is small enough I have trouble deciding. The usual way I break this is sort of a variant of "don't ask questions, post errors" by trying to make the simpler/crappier one work and see what's actually wrong with it. In this case, that means don't use the lib/ plumbing, just code up a standalone thing.

It _is_ a little silly to stat() each entry just to get the dev_t, but I have to stat the command line arguments because they didn't come from readdir, AND since this traverses symlinks I have to stat through any symlink I see (because the dentry is just "it's a symlink"), and at that point having one codepath beats having more efficient execution (when it's all cache local anyway and only nommu systems running at 2 digit megahertz in 8 megabytes ram or less are likely to care about the difference, and it's probably still ok-ish there.)

August 17, 2022

The mapping between stat->st_mode and dirent->d_type is actually pretty simple, if non-obvious: type is mode>>12. That's it. It doesn't LOOK like that because the DT_BLAH macros for d_type in dirent.h are listed in decimal and the S_IFBLAH entries in bits/stat.h are listed in octal, but if you compare them they do line up. So I can have readdir() plumbing only do a stat() if it gets DT_UNKNOWN and trivially translate the value, and then fall back into the same codepath. So on non-broken filesystems it wouldn't do an unnecessary stat.

I still have the "initial entries didn't come from readdir" problem, but if I force them to DT_UNKNOWN (which is zero, so that's easy) and always have it do the fallback path for the first entries, I can avoid writing the same code in two places which is sort of the point of the exercise. (If at all possible, code that fetches data from the OS and makes the same decision on the result should not exist in two places. Because over the years it can version skew so one instance is using the openat() variant and the other isn't, and eventually there's some weird corner case where the behavior differs between the two under a specific selinux regime on this particular filesystem... Or somebody tracks it down and changes it and only later finds out they've only changed some but not all of the cases where it does that. There should just BE one place it does the thing and makes the decision. Single Point Of Truth: I'm a fan.)

Um, what on earth is DT_WHT? Whiteouts in union mounts shouldn't be user-visible? The linux source doesn't seem to have it? (There's a translation table in fs/fs_types.c with all the _other_ types, for no obvious reason since it's just a bit shift and mask...) The file include/linux/fs_types.h defines it, but that file was added in 2019 and it says the values were copied from glibc. (Why...?) The coda, xfs, and vboxsf filesystems have DT_WHT checks, but it's all in code that copies all the types from the fs_types.h file... Ah, vboxsf does say SHFL_TYPE_WHITEOUT, which again SHOULD NOT BE USER VISIBLE. If you are SEEING whiteouts, your union mounts are BROKEN...

Right, rathole. The kernel has no hobbyists left to do cleanup, so is accumulating cruft. No surprise.

Ok, I need to do two parallel directory traversals, apparently following symlinks (why else would it need duplicate detection?), so "cd .." isn't useful and I need to keep an open filehandle or cd back down from the top. I need to readdir() both and sort them (and filter for -S, although that doesn't require sorting, it's just an inequality comparison which strcmp can do; the sort is for stable output). And ideally I don't want to recurse to unlimited depth because that's not friendly on nommu systems with fixed stack size.

The problem is it's hard to make "not recurse" and "inherit DT_UNKNOWN structs and fix them up automatically" work cleanly together. Um... Well, maybe... Ah, problem: I would _also_ have to fix up the dev_ino pair if I didn't stat it on the way in. Although that's easy enough to do since I can just copy over the ones from the stat all the time (they should never be WRONG, and they're basically free at that point)... oh hang on, I have to stat() to get the dev value: readdir() just gives me the inode number but assumes everything in the same directory is on the same device. (Which is wrong in the presence of bind mounts, and yes you can bind mount a file.) So I always have to stat() each entry, which is tipping the balance BACK in favor of having dirtree() do what it can here, and supplement it with a really weird callback.

Hmmm, maybe not that weird. Part of the reason I never implemented DIRTREE_BREADTH is you can do it as a callback, just return DIRTREE_SAVE without DIRTREE_RECURSE. There isn't a callback for "end of directory", but the function you called to traverse the directory returns and you can then manually call dirtree_recurse() on each child of interest? So I could collect all the entries, sort them (using a fresh array of pointers), and then process the result manually outside of a callback.

August 16, 2022

Elliott is having some sort of problem with tar --xform that I can't reproduce, and I've got better error reporting here in my tree but can't push to the kernel until diff.c isn't a huge regression for them.

Sigh. There's a REASON diff stayed in pending so long...

The next problem with diff.c -r is it needs to descend two directories in parallel, which dirtree.c isn't designed to do. Unlike cp/mv this cares what's already in the target directory, because the "only in $DESTDIR: file" messages need to be produced, which means we need to readdir() the target directory as well, and compare them.

If I don't use dirtree then I can't use dirtree_path() to make a path string to this entry from all those saved parent entries; plus the loop detection requires me to keep the list of dev/ino entries for each parent anyway, which dirtree already does. That's significant reinvention of existing infrastructure right there.

The other bit of diff funkiness is that it alphabetizes the contents of each directory, which means loading in all the contents of the current directory before descending into subdirectories. While I defined a symbol for DIRTREE_BREADTH, I never actually implemented breadth first search because nothing was using it yet (and this would thus be the only user).

I'm caught in one of those cases where I have to decide between multiple close options: the infrastructure I have isn't QUITE a good fit, but it's good enough that reimplementing it in this command would be obvious bloat. Should I implement DIRTREE_BREADTH in lib/dirtree.c even though this is the only user? In theory breadth first search can be controlled via callback (return DIRTREE_SAVE instead of DIRTREE_RECURSE, then do a second pass that calls back into the dirtree infrastructure manually for each new level). In any case, using two instances of dirtree in parallel on two different directories requires brand new glue logic. (I could have DIRTREE_SAVE snapshot both trees and then traverse in a second pass, but that consumes who knows how much memory up front...)

On the other hand, if I do readdir() manually, can I trust the d_type entries? The man page says "not all filesystems" support this (I.E. return DT_UNKNOWN for everything), but the only one I remember being broken like that was reiserfs, and Hans basically murdered his filesystem when he murdered his wife. (If Linux still had hobbyists somebody probably would have CHECKED, and either gone through and implemented support in all the filesystems or poked Michael Kerrisk to update the man page. But no corporation is going to pay for engineering time for that kind of general cleanup. And I still have dangling patches the toxic bro dudes have ignored for years, although I still occasionally repost a couple. Still, no point making more of them.)

Then again I have to stat() the immediate command line arguments to see if they're directories, so if I can get that stat() to be the same codepath somehow...

August 15, 2022

Khem Raj submitted a fix for the glibc issue, which both shows interest and is a very effective (if unintentional, which is always the best way) "don't ask questions, post errors". His fix modified multiple commands to move the sys/mount.h include into them, which punishes the commands for glibc's bug. The correct fix is to treat this as the glibc bug it is and have an #ifdef glibc in portability.h that copies the individual symbols out of the damaged header, and then #else include sys/mount.h for every other libc that hasn't broken this.

I've checked that into my ongoing branch, but still can't push it until I'm ready to replace diff.c for the android guys.

(A newbie programmer wants to have a long discussion with me in email, which on my part boils down to sending him links to somebody already doing the next thing he's enthused about 30 years ago. Shouldn't this be what college is for?)

August 14, 2022

Every time I try to sit down to edit tutorial video, I bump into daily reminders of why I really don't want to have anything to do with prudetube. It's disheartening.

What I really want to do is make a "videos" page under the toybox page, with an index of all the videos I've made and links to them. (Should the status page should be involved, since it's a command list with "usage" and "implementation" links under each, plus a bunch of mkroot and lib/ videos...?)

Unfortunately actually _hosting_ videos on my website is awkward because capitalism: the bandwidth is limited and dreamhost has had a history of objecting to large directories full of video files that I didn't link anywhere and was just watching myself ("Prohibited data on your account being moved to DreamObjects storage", which they then tried to charge me extra for until I deleted and disabled it). And just random web spiders periodically downloading all of the files I published for Aboriginal Linux had a history of straining stuff...

The main problem with uploading videos there is that every piece of content ever uploaded seems to live in a single flat namespace (for no apparent reason), and there's no obvious way to glue together collections. (It's been done before so there obviously IS a way to do it, but if the answer is uploading one big zip file of everything, I'd have to produce it all before publishing rather than periodically adding to a collection, which isn't really helpful.)

August 13, 2022

Oh great, there's a glibc bug coming up that breaks the toybox build. In a way that really doesn't look obvious to _fix_ cleanly.

Poked at the "qemu-system-ppc64 -M powernv" target, and I think the kernel's powernv_defconfig probably works with it, but this "IBM opal" bios nonsense spends 10 seconds spitting out all sorts of errors before it even tries to load the kernel. Not my first choice of target. (Openbios is chatty and overengineered enough as it is...)

Adding all the diff.c file/dir behavior back is funky, and I'm not sure the previous implementation got it right either. There's stuff like --from-file that's its own little world, and my recent changes to have error_msg() use ARGFAIL() instead of 1 only solve PART of the problem. (If you go through a list of files, some of which are there and some of which aren't, do you use the error value or the "saw a difference" value? Does one always win or is it positional? Looks like the error case always wins. Ok, so I need a TT.sawdiff that gets set when a difference shows up, and on exit sets toys.exitval only if it's still zero. Ah, I already worked that out a few days ago, right...)

It's not entirely obvious to me where the stat() should go. The --from-file stuff is a loop, then the comparison against directory is a loop, and then -r is a loop, and diff -r dir1 dir2 is probably two loops. (Or DIRTREE_SAVE, especially with that crazy alphabetical order constraint.)

My general instinct is always to open a filehandle and fstat() it to avoid race conditions, because if you stat() a name then act on the name you're never sure it's the same object, but if you have a filehandle it doesn't matter if somebody moved the dentry since then. But the stat() is primarily to figure out if we're working on a directory (in which case we recurse into the contents), and the dirtree stuff isn't really set up to inherit a filehandle. How security critical is diff -u against inotify races? Probably not very, but I want to do it right on general principles. Do I want to do it right enough to add an entry path to the dirtree plumbing? Hmmm...

Of course the dirtree plumbing isn't 100% airtight in this regard either: dirtree_add_node() does an fstatat() on the name and the actual open happens in dirtree_handle_callback() when we recurse into a directory. But then it can compare the dev+ino values with what it previously statted, and I have TODO notes to make all that work without keeping persistent filehandles open, because things like rm -r must have infinite recursion depth: you can mkdir a/a/a/a... then cd a/a/a/a... and mv ~/x . (where x is itself x/x/x/x...) to assemble arbitrarily deep directory stacks no matter what the system thinks PATH_MAX should be. Heck, something like while true; do mkdir -p a b; mv a b; mv b a; done would keep going until inode exhaustion...

So yeah, I guess I can just stat and call dirtree and if somebody gets sneaky and clever out from under diff it'll either compare the wrong files or throw an error message and exit with 2.

August 12, 2022

I got a release out! And of course testing the mkroot images found stuff that I wasn't going to hold up the release for. Namely, I haven't been regression testing that "./ -hda ext3.img" works, and several architectures' block device support has bit-rotted. I fixed m68k easily enough (looks like some new guard symbols showed up, CONFIG_SCSI and such). Unfortunately, 32 bit powerpc is more deeply broken.

A kernel commit last year removed the IDE_PMAC driver (including deleting drivers/ide/pmac.c) used by "qemu-system-ppc -M g3beige". The commit comment says that the scsi bigots finally convinced everybody to pretend yet another category of non-scsi hardware is vaguely scsi shaped. (Which is exceptionally annoying because only the scsi layer has asynchronous enumeration problems, meaning the order devices appear in can vary from boot to boot. There was never any question which piece of hardware "/dev/hdc" was, since you had to physically unplug it and move it to a different connector to change which device it showed up as, but "/dev/sdb" can be a different device if you reboot without changing anything, just because racing drivers completed asynchronous probing in a different order. The IBM developers broke this 15 or so years back to force the desktop Linux guys to confront the dynamic enumeration issues the mainframe systems were never able to solve. They thought the hordes of open source linux on the desktop guys could fix anything! What actually happened was desktop Linux got worse, helping Mac pull far ahead and giving Windows time to recover, until Linux on the Desktop was buried at a crossroads with a stake through its heart. Basically anybody with taste went "that's terrible, don't do that", and they went "bwahaha IBM is spending a billion dollars a year on Linux you WILL BOW DOWN BEFORE ZOR-EL!", and the people with taste just backed away slowly from the entire hotplug layer, leaving it in the hands of Greg Kroah-Hartman and Kay Sievers. Since then we've had the whole udev debacle, the failure of which was exploited by systemd. The Linux Foundation drove hobbyists away from Linux development until the community forgot what a hobbyist even is. All around a completely unnecessary self-own.)

Anyway, the commit comment says to switch over to libata but does NOT say whether or not there even IS a new pmac driver dealing with this particular memory-mapped I/O hardware interface. Let's see, what does find -iname '*pmac*' dig up in the current kernel source... there's a pmac32_defconfig in arch/powerpc? Ok, let's make ARCH=powerpc pmac32_defconfig and... it wants to build 8 gazillion modules. Of course it does. Fire up menuconfig and switch off "Enable loadable module support" in the top level menu, and NOW build it (with CROSS_COMPILE pointing at the powerpc compiler prefix and ARCH=powerpc), and... it built. It booted. And it kernel panicked instead of giving me a shell prompt, lost in the nebulous space between "Warning: unable to open an initial console." and the init script getting far enough to mount devtmpfs and redirect its own stdin/out/err. I submitted a patch to the kernel guys MANY TIMES to fix this, most recently in 2020... No, I've got one here from February of this year? Which is just adding three lines and moving one line down, it's the simple tiny version without the debian bug workaround. (Basically if the init_eaccess() fails add an else if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT)) {init_mkdir("/dev", 0755); devtmpfs_mount();} and then move console_on_rootfs() to right after that.)

Ah: CONFIG_DEVTMPFS is not enabled. Right, enable it. And CONFIG_DEVTMPFS_MOUNT since I've applied my patch. And it's still panicking even though the boot messages say it mounted devtmpfs...? Because CONFIG_PMACZILOG_CONSOLE isn't enabled, even though the serial driver ITSELF is -- has anyone ever actually USED this defconfig?

And now it went:

pmac_zilog: Error registering serial device, disabling pmac_zilog.
pmac_zilog: Did another serial driver already claim the minors?

And then somehow managed to kernel panic into the openbios monitor? I didn't even know that was an option. "Vector: 300 (Data Access) at [d1015c40]" it says. Right. Turn off the OTHER serial drivers in menuconfig, and...

Yay! That got me my shell prompt! And when I added -hda blah.img to the command line and did the "mount /dev/?da /mnt" I got my block device. Now to work out which set of symbols I need to add to my existing config...

Part of the reason I'm doing this is to see how hard it is to recapitulate phylogeny. Doing bringup on a new board often starts from an existing defconfig, and getting something that can usefully boot to a shell prompt can take a while. I should do a tutorial video on this, but youtube is a wretched hive of scum and villainy and it's hugely demoralizing having anything to do with it...

I gave up on following tutorial-like procedures, took a guess as to which symbols I needed next, and was right. CONFIG_PATA_MACIO looked likely, according to menuconfig the dependency stack under it was ATA,ATA_SFF,ATA_BMDMA, and then when the init spam showed it finding the qemu disk but there was no device node in /dev I switched on CONFIG_BLK_DEV_SD because the SCSI bigots are COMPLETELY INSANE. Sigh... Anyway, works now, on to the next thing...

August 11, 2022

Alright, I've done enough sidequest work here that hasn't gotten me closer to a release. I wanted to get diff.c and the readlink relative path stuff fixed up this release, and instead ate several days recording video footage I haven't edited, working out diff algorithm corner cases, and teaching lib/args.c new tricks. Time to start on the release notes and building all the targets for testing...

August 10, 2022

I'm teaching the error_msg() plumbing to change a zero error value to the TOYFLAG_ARGFAIL() value (when there is one) instead of to 1. I'm triaging the current list of commands to make sure that's the right thing to do, which is currently chroot timeout cmp env grep nohup sort, of which chroot, timeout, cmp, grep, and nohup already use the argfail return value for file not found. The env errors are things like malloc() failure (which basically doesn't come up) or exec failure (which has its own rules)...

Lots of digression into checking the debian version's exit value for errors OTHER than command line parsing. Rathole city, as usual.

Hang on, I don't support sort -C? Huh, just -c apparently. Trivial to add, it's just -c with the warning suppressed. Probably showed up in susv4 and I wrote the code for susv3. This is part of the reason I need to triage/audit every command against posix before the 1.0 release. (That and making the test suite check every crook and nanny...)

August 9, 2022

Posted a checkpoint of my new diff to the list: it only compares two files and hadn't been fully debugged yet, that was just the first thing I got working. (As in it DID give me a correct unified diff for a changed file pair I threw at it.)

Elliott thinks I should check it in as a new file next to the old one, but I'm a bit gun shy about that for many reasons? I've checked it into one of my "clean" directories and we'll see if that becomes the new history. (So far toybox's history only has two merge commits in it, one was a mercurial bug and the other was when I was new to github. I'm not going to do a merge commit for this, I do git pull --rebase in the external repository instead. A big advantage of "git format-patch" and "git am" is no merge commits.)

I want to move the --color support into lib/xwrap.c, but it turns out the gnu crap is not consistent, to the surprise of nobody:

$ ls --color=walrus
ls: invalid argument ‘walrus’ for ‘--color’
Valid arguments are:
  - ‘always’, ‘yes’, ‘force’
  - ‘never’, ‘no’, ‘none’
  - ‘auto’, ‘tty’, ‘if-tty’
Try 'ls --help' for more information.
$ diff --color=walrus
diff: invalid color 'walrus'
diff: Try 'diff --help' for more information.

Sigh. I think mine just needs to support always/never/auto and then wait for somebody to complain?

Ok, how do I handle the error return code of:

$ diff -u --from-file=README doesnotexist REED; echo $?
diff: doesnotexist: No such file or directory
+++ REED
@@ -7,7 +7,6 @@

The return code is 2 because "doesnotexist" didn't exist, not 1 because the files differed. (And not 0 if there were no changes.) The problem is, if I set the exit code to 2 at the start, the error_msg() call doesn't change it (because it's nonzero), but then how do I know NOT to clear it later when starting an actual diff?

Hmmm. I need some variant of out of band signaling here. I need to keep toys.exitval zero so the error plumbing can change it, and I need a "saw differences" indicator in GLOBALS, and then I need to set the exit code on the way out. I think?

August 8, 2022

Oh dear. I just noticed that (at least before I started cleaning up diff) the 2014 external submission had EXACTLY the same help text as busybox:

$ diff -w -u <(busybox diff --help 2>&1) <(./diff --help)
--- /dev/fd/63	2022-08-09 17:19:14.415993160 -0500
+++ /dev/fd/62	2022-08-09 17:19:14.415993160 -0500
@@ -1,9 +1,4 @@
-BusyBox v1.31.0 (2019-08-13 06:54:37 CDT) multi-call binary.
-Usage: diff [-abBdiNqrTstw] [-L LABEL] [-S FILE] [-U LINES] FILE1 FILE2
-Compare files line by line and output the differences between them.
-This implementation supports unified diffs only.
+usage: diff [-abBdiNqrTstw] [-L LABEL] [-S FILE] [-U LINES] FILE1 FILE2
 	-a	Treat all files as text
 	-b	Ignore changes in the amount of whitespace
@@ -20,3 +15,4 @@
 	-t	Expand tabs to spaces in output
 	-U	Output LINES lines of context
 	-w	Ignore all whitespace

That's... not good from a licensing perspective. I mean code aside, that's a large enough chunk of text to be clearly copyrightable (and not scenes a faire either). Not only can I _not_ keep that, but it makes me wonder where the REST of that code came from?

I vaguely remember chewing somebody out about this sort of thing on the list years ago, but can't find it now? I refused to take a patch, I think to mdev, because I recognized the code as coming from busybox (I'd written some but not all of it; when I can isolate and relicense my own old work I'll do that myself, thanks). But for some reason google is not finding half the messages in the toybox mailing list's web archive? As in checking my local mbox file finds ones they don't for some pretty obvious keywords...

Sigh. Luckily, I'm doing a brand new implementation of diff ANYWAY. I feel noticeably less guilty about that now.

August 7, 2022

The problem with each diff finder design is I can come up with an input that breaks it.

Conceptually, the diff result is that each side of a hunk has a bitmap of "matched" lines (which could be implemented as setting the bottom bit of the string pointer so it's not aligned, that's easy to clean up in the print pass). The constraints are that each line can only match with one other line (so a marked line doesn't mean "this has a match" but "the next marked line on the other side is this line's match"), and matches can't "cross" (because lines are emitted in order).

If A is XZXWX and B is XWXZX then each line DOES have a match, and B[1] could match any of A[1,3,5]. A simple "consume next match" approach works ok here: 1-1 match, then 2-4 match, then 3-5 match resulting in " X+W+X Z X-W-X". And if A=XZXCDEWX B=XWXFGHZX that's 1-1, 2-7, 3-8 ala " X+W+X+F+G+H Z X+C+D+E+W+X". And A=XXZXXWX B=XWXZX 1-1, 2-3, 3-4, 4-5...

But then A=XXCDEFGHI B=XCDEFGHIX matches 1-1, 2-9 and nothing else. The big run in the middle is missed because "next match" jumped after it and we can't go back. It's not a question of duplicates either: A=XZCDEFGHI B=XCDEFGHIZ also matches 1-1, 2-9. When it SHOULD match 1-1, 3-2, 4-3, 5-4, 6-5, 7-6, 8-7, 9-8 for " X-Z C D E F G H I+Z".

Hmmm, when the two sides differ on which match is "next", take the closer one? Something's gotta break that, but I'm not sure what?

August 6, 2022

So the new code I've got to digest diff hunks starts by collating and sorting both halves of the hunk to find matching lines, then traverses the matches to find all the possible spans of "kept" lines that are neither inserted nor deleted (at every possible alignment), then sorts the found matches by length and grabs the longest ones first...

And I'm not sure that's right? The problem is you can have conflicting matches, only some of which can be used as matches. The obvious example is when you move a hunk: the old location becomes - lines and the new location becomes + lines even though the moved hunk is a match. But if the part you moved is longer than the context you moved it around in the rest of the hunk, then the context is what gets - and + lines. Either way, when ALL you do is move then every line matches _somewhere_, but not all of those matches can be used. Some of them have to be + and - lines in the diff. And in that case, it's keeping the longest runs of unchanged stuff to bracket + and - lines around.

The question is, WHICH sets of matches is the "optimal" set. Do I _just_ grab the longest ones I can? If I grab a 10 line match, are there two 8 line matches that would collectively add up to more but neither of which can be used if the 10 line match is used? I suspect this only comes up if there's a lot of repetition in the lines (ala abcabcdabcabcdabc), because you'd have to have long matches at multiple different offsets... that's probably not real? Maybe? Hmmm...

But what might be real is a long match bumps a less-long match that was occluding an even shorter match. And when the long match bumps the medium sized match, I don't know to go back and re-enable the short match if I already looked at it and discarded it in favor of the medium sized match. Maybe if I sort the match list by size and look at the SHORTEST matches first, and then bump them for longer matches... no, that still would zap the short match for the medium match and then the long match. Take the longest match and just keep it, and fit in any shorter matches around it that you can. Either you never bump anything or you traverse the match list more than once.

Sigh, the thing about all this run sorting stuff is the runs can't CROSS. They have to occur in order. Which implies sorting the matches by starting position... but is that starting position within the first file or within the second? When hunks move, those conflict. This entire approach smells wrong.

August 5, 2022

I've been trying to add a proper cortex-m target to mkroot, because in theory qemu and Linux both support the mps2-an385 board. In practice, I'm not sure either side's support is entirely real yet?

In the process, I hit a truly infuriating kernel change that's just... Ok, backstory time.

One of the big advantages of toybox over busybox is that toybox DOESN'T have dozens and dozens of config sub-options like CONFIG_MD5_SMALL micromanaging stuff even I don't really have the domain expertise to evaluate. Our config questions are "do you want to include the 'ls' command?" and if you do it's basically always the same command with the same options behaving the same way.

Over on the Linux kernel side, I like miniconfig because it's conceptually simple: start from allnoconfig and switch on these options, letting dependency resolution do its thing just like it would if this was a checklist you went through manually via menuconfig. This means there's nothing in your config that's not visible in the list, either explicitly set or as a consequence of what you explicitly set. And back in the day you could use some really tiny miniconfig files to configure a runnable kernel.

Over the years, Linux and busybox both have grown a lot of unnecessary complexity, and the kernel guys have done some deeply silly things. Both have wandered far away from "simple", but you can still achieve simple with a flamethrower.

What I _really_ dislike is when a new version adds new options you have to switch on to get the existing behavior. Saying "but this defaults to on" doesn't help, because there are 8 zillion symbols out there that ALSO default to on, each of which is a judgement call or it WOULDN'T BE A CONFIG SYMBOL. Complaining that I'm ignoring your defaults is complaining that I'm trying to understand my config.

So back to that kernel change: imagine if toybox cpio suddenly added a special config item you had to switch on in order to get "mtime" support in the archiver tool. Linux has been Just Doing That since 2008, but no. Now there's a config symbol you have to switch on to do that. (By the way, as horrible as the CONFIG_EXPERT stuff is... isn't that what it's FOR? These are config symbols most people won't fiddle with, we hide them in a special group where most of them default to on despite being hidden. The existence of both CONFIG_EXPERT and CONFIG_EMBEDDED is infuriating, but the linux-kernel development community has managed to chase away both groups so nobody's left to complain anymore, to the point some lists are basically abandoned. Not that this always stops me.)

August 4, 2022

Checked in on prudetube again today trying to motivate myself to edit and upload the taskset video, and prudetube still hates women. Yes, I know some of this is fallout from Fosta/Sesta, but these specific videos are created for tiktok, and reposting them on prudetube is what requires the extra censorship. (Meanwhile, pornhub still doesn't even require a login.) Nope, prudetube has no excuses (just like their kids videos disabling the mini-player was about having a snit at regulators objecting to them collecting and storing data on minors, so they intentionally sabotaged their own site). Jon Stewart recently mentioned watching baseball on Odysee: is that site load bearing? I mostly hear it's not as good at punching nazis and had written it off with crap like "parler", and while this is making me wonder if I should take a second look at it, Jon's gotten old and is getting sucked in to some right wing misinformation (and this isn't the first time that's happened either), so... eh?

I got my new diff implementation identifying hunks, but efficiently outputting the hunks takes more thought. I read some more writeups on existing diff algorithms and it's still all explained mathematically or using graph theory, and that just doesn't map to the territory for me. I feel bad about rewriting this, but A) my new code so far is only about 250 lines, B) I can't support something I don't understand! (If this hadn't been an external contribution, this is the approach I would have taken in the first place. I've spent a lot of time trying to grind down the diff.c I was given, and if I'd just done a new one earlier I'd be done by now. Yes, I feel like a failure as a developer here, and am worried about dunning-kruger, but why can nobody explain what the old stuff is doing without pretending it's math instead of an algorithm?)

I think what I want to do is a simple "greedy" algorithm that collates the two half-hunks into one big array of lines, qsorts the big array, and then uses the resulting adjacent pairs (identifying matching lines) to find runs. (No, I'm not bothering with a hash, I'm just using my comparison function on line pairs.)

Each matching pair is a candidate alignment: loop to find how many lines before this and how many lines after this match. This creates a new array of known runs (length,start1,start2), qsort THAT by length of match, and keep the longest matches filtering out overlapping matches that conflict with an already kept match. Then use that run data to output the hunk: anything not in a matching run is a + or - line.

If the new "big" array is pointers to pointers, then I know which side each line came from by comparing "is pointer >= start && <= start+len" against the first half-hunk (and I only have to test one because if not then it's the other).

I'd also like to work out how to avoid redundantly calculating runs, because "match at position 1, match at position 2, match at position 3..." are all matches at different points which the sorted big array doesn't distinguish. I can search back through the array of already found runs to see if this line pair is in it (both lines at the same offset from start of match within their respective half-hunk) although that smells unpleasantly n^2 if the array of found matches gets long. Hmmm...

I could use the bottom bit of each pointer (modifying the hunk array, advantage of pointer-to-pointer) to indicate that I've already used it in a run. (Pointers returned by malloc are always aligned to word size, so the bottom TWO bits are usable for signaling if I mask them back out afterwards, which I can do on the printing pass so free() isn't confused.) The problem is, any line can be repeated, so the other hunk could have multiple different places this line can match up to, rendering a "this line has been matched" bit useless. The sorted "big" array lets me know how many candidates there are to match this on each side, but is that useful info here? (Well, "there are only two" is. Then I could set a "used" bit. But I don't like having multiple codepaths, and I need to NOT do that when there AREN'T only two...)

Hmmm. With each new line pair I can search the array of found matches to see if this is known (the "from" line and "to" line are within the range of and at the same offset from the start of an already found match). This keeps the found matches down to a dull roar, and it's a cache-local array I'm just doing math on, not even reaching out to DRAM for the string contents. The counter-argument is if I have 7 instances of this line in the "from" and 12 instances in the "to" I have to do 7*12 comparisons, and even a one line long match at a unique alignment has to be recorded in case it's needed.

Hmmm... the array of known runs can be 12 bytes per entry, because it's an array of unsigned int times 3 because each value is measuring lines: the length of the match is a number of lines, and the starting offset within each hunk is also the line number within the hunk. (I could PROBABLY get away with unsigned short, but don't need to quibble: at 32 bits it's still 341 entries per 4k page. Most hunks shouldn't be anywhere near that.)

Sigh, the thing about all this run sorting stuff is the runs can't CROSS. They have to occur in order, and I strongly suspect there's an algorithm that takes that into account earlier in the process. I've tried to write it a couple times and haven't quite gotten there, but...

It's all a question of what inputs it would be confused by. It should never produce WRONG hunks, just... not ideal ones. Most concise and most readable aren't quite the same metric. The greedy "grab longest run" approach is theoretically an attempt to optimize for readability...

August 3, 2022

I added mount --rbind for Yi-Yo Chiang yesterday, but haven't heard back from him about whether or not it met his needs. I guess no news is good news?

The 5.19 kernel dropped on the 31st, so I need to do the full set of mkroot builds to confirm I'm good shipping that in the mkroot images with this weekend's release. I guess that's about as synced as it's likely to get. :)

August 2, 2022

Finally got an nproc+taskset usage video recorded. Now I need to edit it. I note that the process of going through that resulted in FIVE separate commits to taskset.c, so I guess yay polishing pass?

August 1, 2022

Elliott gave some good advice about prioritization, and after the next toybox release (this weekend) I should use it. But right now I'm trying to close tabs. I want to get a video out, and I want to get diff finished. Both are slightly "bit off more than I can chew" territory; not blocked, but it's taking a whole lot of chewing.

He also said:

i'm reliably informed by people who watch a lot more youtube than i that "regular uploads" and "convenient upload lengths" are the two keys to making viewers happy :-)

And I have a reply window open, but... I've watched a lot of meta-videos from content creators about struggling with The Algorithm and trying to get paid (and this is aside from the insane censorship), and I really don't want to make that a big part of my life. (And yes I've also watched the Spiffing Brit's series on hacking the youtube algorithm so I know about shorts and polls and watch time and so on... But I just don't want to GO there.) If they weren't prudetube, announcing their intent to annoy users into subscribing, doing completely random copyright strikes, with comments full of scams (having destroyed their actual commenter community back when they shoved Google+ down everybody's throats)... Then maybe I'd care? But right now I view youtube as a legacy platform nobody is actually HAPPY with anymore, which I expect to see die and be replaced. In the meantime I would happily do this as a video podcast if that was still a thing, but I don't have scalable hosting and while rss has recovered a bit since its lows I dunno how to wrap/tag an rss video. I suppose I could make a twitter account for them? (I intend to have an index page under the toybox page with links to each video...)

Fade gets back late tonight. I really haven't left the house in several days; I keep meaning to go out somewhere with laptop but I can't record video where it's not quiet. (Staying home to let my back recover has developed its own sort of inertia.)

July 31, 2022

Trying to record video. The dog keeps barking at random things because Fade's not here and he's VERY NERVOUS. (She's at a family reunion in California, gets back tomorrow.) I'd like to get this up before the patreon rollover, but my skill level at this is still very low.

This time around I'm doing a script to read from, not just bullet points but actual "here's all the words I want to say, in order" typed out ahead of time and put onto a file I can read off my phone. But this means I'm not typing and talking at the same time. Possibly I should record audio in one pass and then do stuff on screen in a separate pass? (I want to come up with a method that scales. So far, this isn't it...)

July 30, 2022

Remember how lawyers hate the public domain because it takes away their ability to get paid? Attack du jour. (THIS DOCUMENT is only a copyright license, and does not address trademark or patent licenses. NOT THE SAME as a pledge not to license trademarks or patents in other documents, it just says we are not addressing them here! Scope of what is granted!) So busybody lawyers trying to justify their existence (see the part of David Graeber's book where "we need an air conditioner repairman on standby but can't admit you have nothing to do when nothing's broken, so we will make up busy work to fill your time", this would appear to be the legal equivalent of that) are attacking the public domain trying to drive a wedge between public domain equivalent licenses. Which are EQUIVALENT, it says so in the name, therefore an attack on one is basically an attack on all, but what else is new?

July 29, 2022

Sigh, I should break nproc off into a separate file and have the simple version just do the two sysconf() calls with maybe a comment that musl has been broken for many years and has been arguing with itself for 3 years instead of fixing it...

Except that when I just tried that, glibc is doing ONLN for CONF. I.E. it has the opposite bug musl has, and never shows the taskset-available processors but ALWAYS shows the number installed. (Modulo hotplug.) Which is why my code wasn't doing sysconf for that. And in fact, the initial commit adding support for this back in 2015 did both of them manually, and the code has more or less reverted to what that was doing (modulo manual readdir instead of dirtree callback, which avoids a global). Meh, at least there's a reason for it. (And "glibc is broken one way, musl is broken the other way" is starting to sound familiar here, now that I think about it...)

I got the blkid feature request from yesterday in, but the fstype request is funky. It turns out debian is grabbing the klibc implementation out of /usr/lib/klibc/bin (which is not in the $PATH for good reason), and what it's doing is just ugly at a design level and I don't know why initramfs-thingy thinks it needs it? When I run "whoami" it says "landley" because it's a unix command. It does not say "USER_NAME=landley" because IT IS A UNIX COMMAND. Nor does nproc say "NR_CPUS=4", it just says "4". That's how unix works. The klibc developers do not grok unix. (I mean, I could add a --prefix command line option or something to the toybox one? But klibc would still produce the prefix without it, and wouldn't understand the command line option if it was provided and would try to parse it as an image name. The klibc design's not just broken, it's brittle.)

While I'm on the subject, I've added escaping back to blkid -o export because blindly calling "eval" on its output (as debian's initramfs-mangler is doing for the klibc output) when somebody created a fat filesystem image with a volume name of ";rm -rf /home" seems inadvisable somehow? Not sure what the full list of characters that should trigger quoting is. My first pass just did it for spaces, quotes, and backslashes, but there's a whole bunch of characters that are not safe in shell, off the top of my head $><|&;*\n\t`~()!#? ... I think bash extended wildcard syntax is covered by the parentheses? Sigh, well it's a start.

Circling back around to diff, I'm trying to figure out what hitting EOF means for ending an in-progress hunk when doing streaming parsing. Normal processing is toggling between the two inputs searching backwards to find 2*TT.U matching lines to end a hunk, but when we hit EOF we didn't find that yet (or we would have stopped loading new lines). I guess keep advancing the non-ended file until it hits the end, checking each new line against the ended half-hunk the same way? Seems potentially expensive, but at least the expense isn't _increasing_ each time since the one you're checking against isn't expanding. (Large insert in the middle of a file is gonna suck no matter what, that's what all that hashing stuff was for. The question is what counts as "large" and how much of a common case is it to optimize? Hmmm... There comes a point where you want to fall _back_ to hashing, I expect...)

The thing about C is you wind up with a lot of leading tabs/spaces, and lines with just a closing curly bracket, so string comparisons don't end quite immediately and may even spuriously go on for a line or two. But even then, it's still pretty cheap: a "lot" of tabs is 10 and a "lot" of spaces is 80. Again, probably not a real issue because 50 years of Moore's Law since the original diff paper, and "diff -u" is sort of an easier problem to solve although I suspect "-U 0" may not work at all with my approach, or at least it would never re-sync. Either I shouldn't support -U 0, or I should have it parse at least 1 context line even when it doesn't emit it, just so it can resync and end the hunk...

July 28, 2022

Lay down for a nap and slept through until morning (including missing a scheduled phone call). Woke up in "everything hurts" mode, but I think that was mostly just dehydration.

Broke open the "Khaos" monster energy drink I've had in the fridge for quite a while (got as a treat; I can't have those things regularly anymore but I can fill a shot glass from them and treat it as a sipping beverage). I have mostly finished the can, but I'm GETTING WORK DONE!

I poked musl about the nproc ONLN/CONF thing and the answer is that they've been arguing about it for over 3 years and it's not finished yet. Right. So I taught toybox nproc to look at sysfs, and am backing away from the musl list slowly and quietly, trying not to make any sudden moves.

Elliott poked me because I've broken diff again (ouch), which was half of yesterday's "must... do... work!" despite whatever lurgy that was (probably switching between day and night schedules three times in the same week; my internal clock gave out). The problem was my recent lib/args.c changes didn't properly handle "--longopt=" with nothing after the = (it's the same as -x "" ala an explicitly blank argument; note that for --longopt="" the shell will drop out the "" but "" as a separate argument adds an argv[] entry of length zero so the user interface logic is the same but the onus on who handles it is different). Added tests to skeleton.test and fixed it. (Yes, in that order. The hard part isn't fiddling with the code, it's the missing tests. I broke the other variants 3 different ways fixing it, but I had the tests to NOTICE I'd done that and could jenga it around until the pachinko ball made it to the bottom through the test staircase. It's not so much the code being complicated as there just being a lot of cases to deal with: the same command line input is interpreted at least 4 different ways by commands, hence the "x:", "x:;", "x: ", and "x:; " cases. Note that ; and " " are modifiers to a base type, so that's all 4 variants of each bit being there or not being there.)

Next up, I got a feature request for blkid and fstype. The blkid one boils down to "your blkid hasn't got -o and initramfs-tools uses it", except presented in essay format. (Pascal's apology again, concise = making it look easy = lots of work. Well, concise+intelligible anyway. :)

July 27, 2022

Deeply under the weather today. Not sure why. I've tried to get started working but just... haven't.

July 26, 2022

And the site is down again, timing out attempting to load on both my laptop (google fiber) and my phone (t-mobile). Yet another reply sent to the Dreamhost email thread. (Yesterday on github someone wouldn't believe that Dreamhost can't add https support in 2022. And yet...) Ooh, the Dreamhost Quality Team just sent me a "How did our support team do?" survey. Asking specifically about the performance of "Art", from whom I received exactly one email at the end which said (and I quote): "Our engineering team looked into the server move and see that it's working now. We are seeing the archives being written to now." So his contribution was to show up and go "it looks fine to me" a week into the process, not even explicitly acknowledging that it WAS broken and that unbreaking it required work. This was after multiple emails from "Mark" and a subscription notification from "Hector Dreamhost" joining the mailing list. How did Art do? I couldn't tell you WHAT Art did. (Yes that email was yesterday. Today the server was down for hours.) Should I send the "how did we do" survey a link to the github thread above?

Reviewing a command to do a video on it is basically another round of promotion/cleanup inspection, and I found a bug in taskset, which sent me down a rathole at the end of which I figured out I'd already created the infrastructure to fix the problem, and could just do a small tweak.

The problem is that when I first wrote taskset long ago, I thought it needed SUID root powers to work, so I annotated it TOYFLAG_NEEDROOT. And then later I worked out it didn't, so I downgraded that to TOYFLAG_STAYROOT, on the theory that if toybox runs as suid root and the wrapper then calls something like "mount" that needs root, we wouldn't yet have dropped permissions that a new exec() couldn't re-acquire.

Except that's NOT how the SUID plumbing works: we drop permissions at the first opportunity. The wrappers don't retain them "in case", we do the darn re-exec when we need to. And that's why toy_exec_which() in main.c checks toys.wasroot to see if xexec() needs to fail the toy_exec() and fall back to the execvp(). (And making that work so the "toybox" multiplexer entry path doesn't drop permissions along the way and then go into a re-exec loop was... fiddly. That's why the extra variable.)

So the fix is to remove TOYFLAG_STAYROOT from taskset's NEWTOY flags, but I had to go through the above to confirm WHY that's the right thing to do.

Meanwhile, I found out that musl has a bug! The nproc command has two modes, the default shows available processors (as modified by taskset), and nproc --all shows installed processors (whether or not the current process can schedule on them). One codepath is _SC_NPROCESSORS_CONF and the other is _SC_NPROCESSORS_ONLN. Except musl does ONLN for both, it hasn't got the second codepath, which according to strace is checking /sys/devices/system/cpu in glibc, and the bionic source has a comment saying that /proc/cpuinfo works fine on x86 but arm is broken because arm filters out the taskset-unavailable processors from that, so you have to look at the sysfs one to work around the arm bug.

I'd like to test that myself, but mkroot is building UP kernels for all its targets, on the theory we're testing userspace not the kernel. Last I checked qemu's SMP emulation was all done using a single host processor, so all it did was slow down the virtual system with extra context switches and cache thrashing. There was a presentation a decade back (I think I watched it in my office at polycom? Some kvm conference video sponsored by Red Hat and IBM and the concept of bureaucracy...) about using threads to make qemu SMP actually use multiple host processors, and the synchronization requirements were just horrific; it was basically a "this is why we haven't done this yet". QEMU dynamically recompiles code blocks rather than emulating individual instructions, and getting the details of inter-processor interaction right in that for things like spinlocks and memory barriers and so on is... apparently not fun? It's been a while, I forget the details...

Huh, work's been done since then on getting it to work, and grep -r TARGET_SUPPORTS_MTTCG configs/targets | wc is showing 17 targets supporting this now. I might need SMP support to actually test taskset? Hmmm, do I want to go there...

Well I don't want to go there right NOW, anyway.

July 25, 2022

The mail archive is fixed! It's displaying, and new mail being posted is getting added to it, AND messages are going out via email again! Only took a week. By Dreamhost standards, that's pretty snappy.

Writing a script for the nproc and taskset video. It keeps spinning off tangents. I picked this because it's closed and finished and I have real use cases for it in other videos, and review keeps going down ratholes...

July 24, 2022

I just loaded the mailing list web archive on my phone and it had 4 new messages at the end! (All ones I'd already seen due to the cc, but still!) I went to my laptop to see if dreamhost had sent me a support email reply... and the list URL is 404 loading from there.

But hey, I guess that means it's being actively worked on now?

I keep using "taskset 7" on my laptop to tell some CPU hog to use 3 of the 4 processors, which has a bunch of benefits: the system is still responsive while I'm doing it, the power draw never causes a sudden shutdown running from my somewhat elderly battery (which still lasts a couple hours if I don't overload it), when recording video there shouldn't be dropouts...

Which means I want to do a "taskset" video pair (using, implementation), which has me looking up the maximum possible number of processors in Linux again. Because I'm using "toybuf" for the bitmask, page size 4096 means 32768 bits so my taskset can't do more processors than that. I _think_ the biggest SMP target Linux has ever merged into vanilla was 8192, and that was a mostly theoretical powerpc thing? But it's been a while since I looked it up and I'm hitting the "I don't remember what symbol name to google for" chicken and egg problem trying to find it in my blog (assuming I blogged it rather than emailed it or something).

Digging through the linux source (I'm SURE I've done this before, and thought I'd written it up...) after a few greps for likely candidates the symbol I want is NR_CPUS, about which Documentation/RCU/Design/Data-Structures/Data-Structures.rst says 'NR_CPUS': number of 'rcu_data' structures, one for each possible CPU. (Except the rcu_data array is actually defined via the per_cpu() macro which is black magic #defined in include/linux/percpu-defs.h, which does not actually include "NR_CPUS" in the header because modern Linux is layers of nested full employment complexity. Job security through obscurity: if you can't understand what I do, you can't replace me, thus you can't fire me. An old IBM trick, among other things. I prefer "if you can't be replaced, you can't be promoted", which was in the Nancy Lebowitz button catalog I got at my first Doctor Who convention in 1984.)

So include/linux/threads.h #defines NR_CPUS from CONFIG_NR_CPUS, which has a default of 1 right above that but CONFIG_BLAH symbols mostly come from the various arch/*/Kconfig files. Grepping for =[0-9]*$ in the defconfig files, there are four =2048 defconfigs all under powerpc (pseries, skiroot, ppc64, powernv), and two =512 definitions (the defconfig and debug_defconfig files for the s390 architecture). (Well, there's also gensparse_defconfig for ia64 but that's not real on multiple levels.) All the other defconfigs set it to 2, 4, 8, 16, 24, 32, or 64 processors. So supporting 32768 is way massive overkill.

That said, defconfig isn't definitive. The range possibly selectable in the Kconfig file would be. So find arch -name 'Kconfig*' | xargs grep -r -A 5 NR_CPUS and... hexagon is 2-6, alpha is 2-32, arc is 2-4096 (unlikely), nios2 is 1, mips boils down to 2-256, x86 has a purely theoretical 2-8192 (powerpc envy?), loongarch is 2-256, openrisc is 2-32 (I want to pat that architecture on the head), parisc is 2-32, arm is 2-32, arm64 is 2-4096, sh is 2-32, xtensa is 2-32, microblaze is 1, csky is 2-32, sparc is 2-4096, s390 is 2-512, and neither ia64 nor riscv is real. (The NR_CPUS range starts at 2 because you switch off SMP support for 1, letting a bunch of infrastructure drop out.)

The high-end SMP stuff is a ragged abandoned mess. For example, arch/powerpc/Kconfig selects CPUMASK_OFFSTACK "if NR_CPUS >= 8192", and then arch/powerpc/platforms/Kconfig.cputype gives config NR_CPUS a "range 2 8192 if SMP" so it can't be greater. (And this is the one architecture that MIGHT actually be using this stuff, presumably for that "watson" thing that won Jeopardy.)

Mips isn't ACTUALLY 2-256, of course. That's just the range line for config NR_CPUS in arch/mips/Kconfig. It's an overengineered mess with a staircase of NR_CPUS_DEFAULT_XX symbols and then the help text for the config symbol says "The maximum supported value is 32 for 32-bit kernel and 64 for 64-bit kernels" so WHY HAVE A RANGE GOING TO 256???? (Perhaps because mips alienated its customer base via patent trolling and its engineering department slowly strangled to death? Who knows...)

The x86 mess is even sillier somehow: NR_CPUS_RANGE_BEGIN just should not exist, nor should MAXSMP, and NR_CPUS_RANGE_END with CPUMASK_OFFSTACK seems deeply aspirational. The CPU mask was moved in 2008 back when the top 500 supercomputer list was being dominated by Linux machines and having thousands of SMP nodes sharing NUMA memory seemed like it might be real someday. (Spoiler: not so much, they went with cloud setups descending from beowulf clusters instead.) Support on x86 was added by Mike Travis of SGI 3 days later, which seems odd for a company that shipped mips hardware but the end of SGI came about when a Microsoft executive got control of SGI and made them pivot from Mips to hardware capable of running Windows NT. (As with the time a Microsoft executive took over Nokia and sucked its guts out to feed Microsoft, the brainwormed company didn't entirely survive.) As for x86 massive SMP, Knights Landing/Hill became Xeon Phi and was discontinued in 2020 because people either used clusters or ported their stuff to GPUs. And I only remember talk about that aspiring up to 512? Stuff that's actually still selling(?) like Cascade Lake can do 32 cores which might hyperthread to twice that?

So almost 15 years ago x86 and powerpc had a dick-measuring contest up to 8192 (but didn't follow through with actual hardware), and arm64 and sparc claim half that possibly for marketing reasons. (Sparc had Fujitsu doing supercomputers for a while, and arm64 is still growing although probably not THAT far.) The s390's 512 processors seem a reasonable "big iron" limit (and that's the "runs an entire company taking up two floors of a building pumped full of Fluorinert with its own power substation" level of scaling). Everything else maxes out at word size for optimization reasons: 32 on 32 bit processors, 64 on 64 bit processors, and even the 64 bit ones seldom actually ship more than 32 processor systems, presumably due to memory bus contention and such.

So yeah, taskset handling 32768 is waaaaaay overspecced, but I had the buffer lying around and it was easy to do.

July 23, 2022

The delivery failure bounce messages from everything Elliott and I sent to the toybox mailing list since it died have started to come in. The interesting part seems to be "<> (expanded from <>): temporary failure. Command output: local: fatal: execvp /dh/mailman/pat/mail/mailman: No such file or directory" which is 100% "their new setup is borked". I wonder if it actually works for anyone else?

Circling back around to getting ready to do those mkroot videos, I checked youtube... which seems to have pissed off most of their creators (which leaks out to other platforms but is quite PRONOUNCED on youtube's own site now). As for the escalating prudishness ala livejournal strikethrough, here is a MALE DOLL WEARING SHORTS with a censor logo over it in the thumbnail (which was fine to show on TV, but youtube is more prudish than TV, and it's an obvious piece of plastic which presumably has Ken Doll anatomy anyway???). And of course if you have an anime with a school sports competition (I.E. DRAWINGS of women in ATHLETIC SHORTS), clips from that need censor sprites over their hips in every scene because prudetube. And here's a censor bar in the thumbnail of a Don Bluth film for children where a fully dressed cartoon character shown in a children's movie is too risque for prudetube. All self-censorship from terrified creators.

This was just a couple minutes scrolling through their recommendations! Nooooot a healthy website. Meanwhile the non-video ads mixed into the feed are for "Lewd festival pics", "Beach photos for adults only", something represented by a picture of two women in burqas (ok, niqabs) with bare legs kicked up high (really!), some game called queen blade with the tagline "Feel the fighting spirit surge and burst through your beautiful warriors' tight armors"... If the ads are full of stuff they won't allow in the videos, the problem CAN'T be finding advertisers for the videos? And a bunch of right-wing loon ads: no, I do not "stand with trump", and I've blocked "michael knowles" channel (some glenn beck wannabe) and clicked "stop seeing this ad" on every new ad from him (cycling through inappropriate, irrelevant, and repetitive) but the ad spend continues to contaminate my feed... Also, youtube has developed some sort of bug where the video hangs but the audio keeps playing over a still frame frozen for the rest of the video. It did that on three videos. I checked and the app is upgraded to current? Sigh. Why am I planning to add more video to this mess again?

In toybox, I keep breaking stuff that's in "pending", which wouldn't be a problem except Android USES stuff out of pending. Next release is scheduled for the 6th (3 months since last time), but the release AFTER that I should prioritize trying to promote all the commands Google uses out of pending.

Let's see, Android.bp says:

  • all_srcs: dd, diff, expr, getopt, tr
  • device_srcs: brctl, getfattr, lsof, modprobe, more, stty, traceroute, vi

Hmmm. I'm working on the first two of those already. Where's the list Elliott sent me a while back...

  diff (--line-format, -I)
  expr (GNU unary +)
! hexdump (-n#, -s#, -C, -e, -f FILE, -v)
  realpath (-s (--no-symlinks), --relative-to)
  tar (--transform)

less important, just used for debugging buildbot issues:
! fuser (no flags, explicit filename, -v, -k)
! pstree (-pal for ninja, -A for ltp)

Which mostly overlaps the above two (and adds features to non-pending stuff), so the third batch of commands would be:

  • enh_srcs: hexdump, realpath, tar, fuser, pstree.

Which isn't "fish these out of pending", just "todo this year".

So the collated Android TODO list would be: dd, diff, expr, getopt, tr, brctl, getfattr, lsof, modprobe, more, stty, traceroute, vi, hexdump, realpath, tar, fuser, pstree.

But "fix unshare so I can do the introductory mkroot video the way I want to" is for _this_ release. As is finishing diff, I hope...

Except the next thing I chased down after I typed that was why seemingly innocuous changes I made to lib/args.c were causing tests/timeout.test to fail with trap: SIGTERM: bad trap. The answer of course being it wasn't the args.c changes, it was Elliott reverting my change because on debian /bin/sh points to the Defective Annoying SHell, meaning #!/bin/sh is fundamentally unreliable and should never be used for anything. Leaving the revert in place for now because presumably mksh isn't as broken as dash and they need it to work. Not sure how to properly fix this other than finishing toysh so we can agree on a known working shell. I might chmod -x timeout.test for the moment...

July 22, 2022

So let's see, how is it going:

On 7/19/22 14:39, DreamHost wrote:

Discussion Lists Maintenance Complete.

Hello, Rob!

As promised, we've just completed scheduled maintenance on the following DreamHost discussion list(s):


The maintenance went well, and no action is required on your part.

If you've got any questions about this upgrade, please don't hesitate to reach out to our technical support team [LINK]. We're standing by, ready to help!


My reply on 7/20/22:


On 7/20/22 14:10, DreamHost Customer Support Team wrote:

Hello Rob,

Thank you for writing to DreamHost Support! I am happy to assist you today!

I am sorry to hear of the trouble you have had here and any inconvenience it may have caused. I was able to restore the archives for you and they should now be accessible to you again.

If you have any other questions, comments, or concerns, please feel free to reply or open a ticket via our panel.

My reply:

You did, thanks. Could you get the archive to update when new messages come in again?

I sent a message to the list 9 hours ago, shortly after I got this message from you, and it hasn't shown up in the web archive yet.

And a few hours later, I followed up with:

And now is timing out. (On both google fiber and t-mobile.)

And the latest exchange (I.E. my reply from this morning):

On 7/21/22 09:02, DreamHost Customer Support Team wrote:
> Hello Rob,
> Thank you for writing to DreamHost Support! I am happy to assist you
> today!
> I reviewed the list here and suspect that you may have sent the email
> while the list was still in a broken state and that particular post may
> not show.

Elliott Hughes sent an email to the list 3 hours after you sent me this message, which still hasn't shown up in the web archive. He also sent 5 yesterday and I sent 4 replies. None of them have shown up on the list.

I don't actually know if they're going through the list either, so it's quite possible that list messages are just vanishing? I haven't seen anybody send messages to the list without cc-ing me since your server maintenance broke the web archive.

And yes, re: the above archive link, I'm aware that saying:

> Luckily, I'm in the habit of checking said archive or stuff I missed.
> And luckily, the archive hasn't broken again recently...

Was tempting fate about this given dreamhost's history of being unable to keep a mailing list server working, ala (I believe this is the sixth time this has happened? I'd have to go back and count. And no and are not a complete record, there was plenty before that and a few since. It _had_ been a while, though...)

I also note that the "temporarily disabled" message in has been there for over 10 years now.

> I did try to test this from my end, but current settings need moderator
> approval and there do appear to be several others waiting to be approved.

Where? says "There are no pending requests."

Which is odd because on the 18th there were 301 pending moderator requests (all spam, I'd triaged them when the notification emails came in but hadn't deleted them yet). I usually get about 5/day in there and haven't seen any since the "upgrade". Looks like mail to the list is just vanishing...

> You may want to look through those and approve them before they get sent
> and recorded in the archives.

I just looked. The admin web interface says there are no requests. Not even any fresh spam. If you're seeing pending requests but I'm not seeing pending requests, then there are two places pending requests could go and they're not lining up.

> As far as I can tell, this seems to be working now, but I won't be able
> to know for sure until you approve my message.

I just tried. The admin web interface says it does not exist.

> If you continue to have
> issues with this, please provide the email address that you are sending
> from and I can investigate the logs further with that info.

enh at google dot com and rob at landley dot net have both posted to the list multiple times over the past 24 hours. I can forward you the actual messages if you like, but I dunno how this support@ email address deals with attachments?

> If you have any other questions, comments, or concerns, please feel free
> to reply or open a ticket via our panel.

So I'm waiting to see what happens next. The hard part is always getting them to recognize that whatever it's doing now is not, by definition, "working properly"... (Talking to tech support has eaten WAY too many of my spoons this week.)

If you're wondering why I write a text file in vi and rsync it up to a server instead of still using a blogging service like my old livejournal... well, livejournal specifically drove away half its userbase with "strikethrough" (I.E. prudish capitulation to pearl-clutching right wing self-appointed censors, just like youtube is doing today) and then the corpse was bought by Russia, which had filled it with spyware by the time that article was written. More recently Russia (where Putin moved livejournal's servers so he could control/spy on his servants better) became an international pariah state by invading Poland Kuwait Ukraine. (Yes, I could have moved from Livejournal to Blogger or Google Plus... neither of which still exist. I _had_ a twitter account until they decided "guillotine the billionaires" was hate speech instead of a valid political platform, and yes they kept Trump's account up for over a year after that, and now they're suing to force Elon Musk (who has promised to reinstate the Dorito's account) to follow through on buying them. And yes I downloaded my tweet archive semi-regularly because I didn't trust it to stay up.)

I'm a child of the 8-bit era: "this too shall pass". The platform I'm on is going away eventually, I always have to be prepared to migrate. The Commodore 64, the Amiga, Dos with Desqview, OS/2, the Java ecosystem, Red Hat, Ubuntu, Devuan... I'm actively trying to make a Phone OS self-hosting so PC hardware can get kicked up into the "big iron" server space like mainframes and minicomputers and nobody under 50 needs to care that it exists because your entire software life cycle can be authored and run on battery powered portable devices.

This is one of many reasons why deep dependency chains bother me. You don't know what you're pulling in from a security or maintainability perspective, you can't reproduce it in isolation and thus are not doing science, you don't know what the bus number of any of this stuff is, your attack surface is enormous, it's regression city as things "upgrade" out from under you or bit-rot...

So yeah, I think I've got all the messages I could reconstruct the archive with on another site. (Although I'd have to filter/collate at least 3 mbox files because gmail's insane filtering doesn't send back copies of my own messages and only sends one copy of messages I'm cc'd on, so the list mbox, my inbox, and my outbox need to have the relevant messages fished out of them to make the ONE mbox I'd get for free if gmail DIDN'T try to "help" in a way I can't prevent.) But as with moving my own email off of gmail... I'm really busy with other things and would much rather NOT deal with this right now?

July 21, 2022

I'm listening to the "Stuff you should know" podcast episode on "what happens when the government thinks you're dead", which started talking about identity theft numbers being 9 digits and the USA having used half of them already, a topic I was having Shower Thoughts on just yesterday. (The current US population is around 1/3 of a billion, but they started issuing them during FDR's New Deal so there had to be a bunch of dead people who'd consumed identity theft numbers, and recycling them is problematic...) Anyway, the main topic of the podcast is people mistakenly listed as dead in the social insecurity database and the endless game of whack-a-mole to undo it (and you thought identity theft was bad)... and here I am going "oh great, ANOTHER benefit of being a 'Junior' I have to look forward to". My father's in his 70s and AARP sent ME his invitation to join when he turned 50, and then never sent me one when I did because presumably they've still got me confused with him. Coin flip whether or not I get declared legally dead if I outlive him...

The "Google Podcasts" app has LAYERS of horrible design. Half the podcasts won't play unless I download them (it's probably the https vs http thing in the URL in their RSS feed, that's why I had to stop using the previous podcast app because "upgrading" android refused to read any http URL anymore and completely broke the majority of my podcasts). The workaround is to download the episode, which puts it into library->downloads (gratuitous extra layer), but when I listen to something it vanishes out of there! Did it delete it? No, it moved it to "history". NEITHER has a "delete" option like EVERY OTHER PODCAST APP does, how I free the space back up probably involves using the system file browser and going to find where they're stored and deleting them out from under the stupid app. (How do you miss basic stuff like this? Your objects have no lifetime rules!)

Can PID 1 install usable signal handlers for kill -9 and SIGSTOP and so on? Since it's the one process that doesn't normally receive those, but it ALSO has special magic behavior that the SIG_DFL behavior for most signals is SIG_IGN because it won't be killed by anything it hasn't set up a handler for? Is the magic implemented at the signal mask level, or in the signal handler dispatch level?

So I haven't fixed unshare yet, but...

$ sudo ./unshare -iu hostname walrus
$ hostname
$ sudo ./unshare -iu hostname driftwood
sudo: unable to resolve host walrus: Temporary failure in name resolution

Yes, unshare -u didn't work there, but why does Debian's sudo care that the host name changed? What horrific crap is it doing behind the scenes that should not be happening? I haven't implemented toybox sudo yet because I want to understand the various ramifications of security weirdness here, but that was mostly about clearing environment variables, not... whatever that's doing.

July 20, 2022

I knew better. I knew better, I knew better, I knew better, and I did it anyway.

Yesterday I posted to the toybox list that the web archive on dreamhost's server "hasn't broken again recently". Mere hours later, dreamhost emailed me:

As promised, we've just completed scheduled maintenance on the following DreamHost discussion list(s)... The maintenance went well, and no action is required on your part.

And when you go to look it says "Currently, there are no archives." Yes, they deleted the entire mailing list web archive WITHIN HOURS of me posting that.

Sigh. Support ticket submitted... Ok, they restored from an only slightly stale backup, but the site still isn't updating with new posts. Support ticket reopened... And now the site is failing to connect and timing out. Support ticket updated...

Meanwhile, Google Fiber got itself deeply borked again (2.2 mbps upload, 63.6 mbps download, it's still supposed to be symmetrical and the speed drop is just a symptom that happens alongside connections taking 45 seconds to start sending data). And once again a router reboot "fixed" it, except I rebooted the router TWO DAYS AGO and that's how long it took to get unusable again? Sigh...

At the same time, T-mobile had a recurrence of whichever problem it had on the 12th with the signal strength "!", which was borked for half an hour but fixed itself while I was waiting for their support people to call me back? (I noticed because I switched off wifi on the phone so it started using t-mobile's bandwidth... which was not load bearing. No idea how bad it's been when I wasn't trying to use it.) So I called the support line back, and pointed out I'd talked to them about this on the 12th when their towers were having a "known issue" that was "being worked on". After about the 5th time I said "no" to the Indian support drone trying to upsell me on their home 5G product (do not telemarket at me on a support call, if I call because the service I already pay you for isn't working the answer is NOT going to be me giving you MORE money, the ONLY option here is me giving you LESS money and neither of us want that), the drone transferred me to somebody on THIS side of the planet who had more details: it's 105 degrees out and their tower is basically melting. (Well, probably the equipment keeps putting itself into thermal protection shutdown; he didn't have THAT level of detail.) I sympathize, and will stop calling. The party to blame here is Exxon, not T-mobile; their design constraints changed out from under them because the planet got broken by late stage capitalism.

The flailing Google Fiber Router inside my air-conditioned kitchen does not have that excuse: it's gradually getting slower between reboots as the spyware algorithms get more and more confused. Whatever is breaking there is something it never should have been doing in the first place. If I was a lot younger I'd probably stick a raspberry pi between the router and the fiber jack to wireshark the traffic and see where it's dialing home and reporting data to. I assume somebody's already done a hardware teardown and I would have heard the screams if it had a microphone in it, but I categorically object to being both the customer AND the product. If I am paying you for the service, you don't get to show me advertising or resell my data to the five eyes. Pick the relationship you want with me.

I forgot android's building lsof out of pending: when adding same_file() I screwed something up there that I hadn't fixed yet (because it's pending and there's no tests/lsof.test). And "make tests" was dying in cpio because I screwed up xpopen_setup() with what was supposed to be a refactoring, where I didn't reverse a test properly while simplifying the logic:

-    if (pipes[pid] != -1) continue;
-    if (pipe(cestnepasun+(2*pid))) perror_exit("pipe");
+    if (pipes[pid]!=-1 && pipe(cestnepasun+(2*pid))) perror_exit("pipe");

That first test needed to be pipes[pid]==-1 because in this case "continue" means NOT to continue with the current expression, so when combining the two into one if we need to invert the first test. (Long long ago I learned basic boolean algebra, and the big thing I remember is "!(a && b)" is equivalent to "!a || !b" because moving the not into or out of a parenthetical group inverts each individual item and ALSO swaps the boolean logical operators. Also !(a==b) is the same as a != b, and so on. It's one of the more useful general thingies I learned back in the day...)

The reason the sed s///x test hangs with "make tests" but not "make test_sed" is the toybox timeout command is used for the first, and the debian timeout is used out of the $PATH for the second. Which is why I didn't notice it when I checked in the timeout -i support, because I was testing individual commands and "make tests" tests the toybox commands in combination, so finds more weird corner cases.

The problem here is a race condition: if SIGCHLD from the child comes in before timeout is ready for it, we miss it and then the full timeout elapses before the command moves on.

I set up the SIGCHLD handler AFTER the call to xpopen() because I didn't want the child process to inherit weird signal handler shenanigans. I'm a bit gun-shy there after the long ago multi day saga of trying to debug a blocked SIGALRM inherited from PID 1 due to an old bash bug, although that was the child inheriting the signal _mask_ not the signal handlers, so I don't ACTUALLY have to use the xpopen_setup() child callback to reset SIGCHLD (since the child process can't have any children of its own before calling exec). And yes of course I have to set the signal handler before forking the child to avoid a race window with the child exiting. (Even without SMP, it could get scheduled first.)

Darn it, there's still a race condition after that's moved, because the SIGCHLD can come in before we record the fd we're reading from into the poll structure, so the signal handler can't close the fd, and there's no other way to tell poll() to exit early EXCEPT closing the fd? Trying NOT to pass the fd into poll always has at least the race of making the poll() function call (copying the fd into the function's local variable as an argument value) and THEN having the signal happen before the actual poll system call starts.

I cut the Gordian knot with siglongjmp() from the SIGCHLD handler so the loop _doesn't_ exit and just goes "receiving SIGCHLD means timeout can exit" because we only spawn one child process; if that has grandkids those would reparent to PID 1 instead of us so getting the signal means we can exit now. BUT while this is good enough for a standalone timeout, it's not good enough for a bash builtin that doesn't fork: xpopen_both() can create a pipe but not return it yet before the signal handler goes, leaking filehandles. I may have to manually pipe() and VFORK() myself instead of letting xpopen() do it, because pipe(TT.pipes) is atomic: either the pipes got written where the signal handler can see them, or they weren't created yet. The kernel either backs out any halfway states before returning to userspace, or else doesn't check for signals until the system call is about to return to userspace anyway.

Anyway, that should fix up the "timeout 10 does_not_exit" problem Elliott spotted, and also the test suite slowdown I noticed.

July 19, 2022

I haven't gone out to the table since I strained my back, giving it some recovery time (walking 4 miles with a heavy backpack doesn't help recovery), but I'm in the habit of going _out_ to work. Even when I'm in another city for work (Chicago, Tokyo, Milwaukee, St. Paul...) I tend to find places to go away from the room I'm renting because after about 3 days in the same work corner I get claustrophobic. (Long ago I took my first thinkpad with a butterfly keyboard to the Northcross mall food court at lunch, and basically never looked back.)

I tried to set up a home office during the pandemic, and I get decent work done from the living room couch or the kitchen table when everyone else is asleep, until the cat climbs up my arm, flops on my shoulder, and purrs adorably, which very effectively prevents me from working. When we bought the house we set aside a room to be a home office, which turned out to have terrible airflow when the door is closed so it's 10 degrees warmer than the rest of the house (we emptied our storage cube into it a few years back when Fade was getting her doctorate in another state and I expected to be in Tokyo a lot; it's chest high with boxes now). Setting up a desk in the bedroom had multiple issues, and the only other rooms lockable against a cat are Fuzzy's room and the bathrooms.

The catch-22 situation the table outside the UT geology building solves is "I don't want to walk more than a couple blocks under the blazing texas sun" vs "what's open late post-pandemic"? It's (usually) LOVELY solid blocks of uninterrupted time (electrical outlet even, not limited by battery time if I get in a groove), but it's also a two hour round trip travel commitment (great exercise, BUT), and puts me solidly on a night schedule.

In the Before Times there were plenty of places around the city with "seat, table, beverage, sun protection" that were open late and didn't mind me hanging out with a laptop for a couple hours at a time, but the ones I used to go to all closed or reduced their hours. The closest library is 2.2 miles away, and the closest Starbucks is 1.5 miles away. Both close at 8pm today, earlier on other days (the library is closed entirely sunday because cheeses). For most other things Google Maps' hours of operation are unreliable (for example it shows when the drive-through is open, never when the inside is open; I didn't know a donut shop could HAVE a drive-through until I got there).

I live two blocks from a large strip mall (Hancock Center) that used to have plenty of choices for places to go. In practice, pandemic recovery is... an ongoing process. It technically has a coffee shop in the far corner, but it's one of those "large standalone booth in the parking lot" types with very little indoor seating, and using their outside seating at 2pm is not an option in texas. (Today's projected high is 105 degrees Fahrvergnugen, and I can't see my phone or switch screen outside even at 8am, so the laptop's right out.)

I've been bringing my laptop to the HEB deli section's tables, but they're getting crowded (I always forget the schedule for veterans breakfasts and trivia night) and there's a surprising lack of interesting caffeinated beverages there at the moment. They stopped carrying "peace tea" during the pandemic. (Gas station on the corner still has it, but they have no seating.) HEB just reorganized the Intergalactic Foods aisle eliminating the slot for my favorite beverage (the checkerboard tea cans, aka "Chin Chin Loire River Assam milk tea") which was of course out of stock during the reorg because it sells out immediately every time they restock it, but they kept the slot for the without-milk version of the same product because that's NEVER out of stock, they still have most of the same individual cans they had last year. (Keep the thing that doesn't sell, ditch the thing that immediately sells out. Check.) Most of the other beverages I get are gallon sizes or six packs, not convenient individual versions. (Except, of course, the monster energy drinks I no longer drink because that got unhealthy. And even then they've switched out most of the interesting flavors.)

The Wendy's in Hancock is open inside again (although HEB opens at 6am and Wendys opens when they feel like it: I've been there at 10am and they weren't open yet despite mailing me coupons for their breakfast menu: drive-through only again), but they put locks on the bathrooms which is a "you are not welcome here" vibe that makes me reluctant to hang out for any length of time. You don't actually want strangers inside your building, got it.

Currently hanging out at Jack in the Box, which is... actually quite nice. They still have OUTLETS here. I did not pack my laptop cord because nobody else still has outlets. (Eh, still 65% battery...)

Ok, the diff free() failure was an off by one error in quote handling (it always decremented it on return even when we hadn't incremented it past a starting quote character), and Elliott needed me to work around a false positive generator that Android requires their builds to go through now, and then I got a little work in on starting proper test suite coverage for lib/args.c by hijacking skeleton.c and using the fact it has a zillion example command line processing types (so I don't have to look them up when using it to start a new command) to actually regression test the lib/args.c plumbing. Which helps me cycle back to finishing lib/args.c so I can fix unshare so I can do the mkroot usage video...

Except something is wrong with the test suite. It's got delays in it. One of the sed tests sits there for multiple seconds, which only happens in "make tests" but not in "make test_sed". Sigh, git bisect time...

July 18, 2022

My script took 28 seconds to start the rsync of this blog file to my website (and then completed it in under a second because fiber). Time to reboot the google white circle router again. Or take a hammer to it to FORCE myself to dig a 15 year old linksys out of a box to drastically speed up the house's internet. (I did, however, confirm I physically CAN'T replace the cat-5 cable coming out of the fiber jack on the wall without taking off the outer plastic case it disappears into. The case does not obviously pop off, and I'm not accidentally ripping the jack off my wall fiddling with it.)

Wrapping my head back around lib/args.c. The motivating case for the " " optstr attribute was "kill -stop" vs "kill -s top", in which case yes you pass -stop through. But in THIS case, I want it to accept it but consider it not to have an argument, which means also checking bit 8 when I check bit 4 of opt->flags. And fun corner case: "ls --color=none --color" enables color. So the later --color without an argument needs to discard the --color argument previously given. (Needs a test case!)

Kind of annoyed at the variable name "shrt" but "int short;" is just going to confuse C because short's a keyword, and "is_short" is at least as awkward. Alas "long" is also a keyword, so inverting it isn't much better. I guess I could call it longopt...

Alright, what are the cases here: "kill -s top" passes through -s (": "), "date -Id" must be attached (":;") or date "--iso=d". And actually that no longer needs "(iso)(iso-1234)" because we have automatic longopt abbreviation recognition now... except abbreviations mixed with ":;" are currently broken:

$ ./date --iso-=d
date: bad -I: ELL=/bin/bash (see "date --help")

Right, I need tests/skeleton.test. Because despite that command being intended as a template for new commands, what it DOES is have an example of just about every type of option parsing (so I don't have to look them up when starting a new command that has an occurrence counter or type "*" collecting struct arg_list or something). And that means it can TEST all these weird corner cases and make sure they don't bit-rot again. (It's hard to fully test "date" or "kill" without root access, or at least more container plumbing than I've been willing to take for granted on the host.)

Sigh, if I was starting the test suite today I'd make a lot more use of bash's $'blah\n' strings instead of having the expected return (argument 3) passed through echo -e behind the scenes. I'm still tempted to do a conversion pass, but it would probably annoy Elliott and I can't do it until toysh can run the test suite because I don't think mksh has $'' support? No, looks like it does. Huh...

Ok, important cases: "kill -s top" vs "kill -stop" (: ), "date -Id" vs "date -I" vs "date --iso=d" vs "date --iso-8601=d" (:;), and now "unshare -u -n" vs "unshare -un" vs "unshare --uts" vs "unshare --uts=spam" (:; )...

What does unshare --uts=spam do, anyway? Bind mounts it, but where? On "spam" in the current directory where you run unshare, looks like. (Expects an absolute path?) And the EXAMPLES section of the unshare man page shows the --mount-proc option, which is silly levels of micromanaging (there's no --mount-sys or --mount-devtmpfs, so why have... REALLY?)

And I've read through the --setgroups=allow/deny thing a couple times now and am not entirely sure what exploit they're patching. I'm sure there IS one, but... it requires a random capability bit (in the container or outside of it... "man 7 user_namespaces" opened in another tab to come back to), the kernel only allows it after gid_map has been set (for a container, or even on the host?), and gid_map becomes writeable by unprivileged processes when setgroups(2) is permanently disabled with deny? (Why?)

I do NOT HAVE FOCUS for this can of worms right now. Gotta finish diff. Gotta finish dd. Gotta finish shell. Gotta finish getting the test suite running in mkroot. Gotta finish the hermetic build command list. And the sed changes for tar --transform too.

(Yes, toybox tar --transform is going to require toybox sed because I can't figure out how to do it otherwise, short of building most of sed into tar.)

Good grief, they allow ramfs mounts within a semi-privileged namespace? What keeps "cat /dev/zero > ramfs/file" from locking the system with NO SIZE LIMIT? (The cgroup setup for memory limits is presumably separate...)


July 17, 2022

The unshare option collating issue is fallout from this commit eighteen months ago. I.E. I broke it recent-ish-ly, which is often the case. (Regression test suite! Needs soooooo much more filling out.)

The problem is that the behavior of various commands is not consistent: date -I accepts an attached format for the short option (which the man page describes as -I[FMT], --iso-8601[=FMT]): when there's a space after the -I there's no argument, and when there isn't, the rest of the string is the argument. That's where the behavior lib/args.c currently implements comes from. But "unshare" does not allow its short options to have attached argument strings; it instead allows grouped short options, while its --longopts can have optional arguments.

I was thinking I could have the semicolon options only accept attached short option arguments when there is no corresponding long option, but the date -I use case has both. So I need to specify this in the option string somehow.

But that's not the problem. The relevant chunk of lib/args.c that makes this decision is:

// Does this option take an argument?
if (!gof->arg || (shrt && !gof->arg[1])) {
  gof->arg = 0;
  if (opt->flags & 8) return 0;
  gof->arg = "";
} else gof->arg++;
type = opt->type;

When an option has the semicolon indicator, opt->flags is ored with 8, and there's a test that returns early... but I don't remember what the context of the test means, in part because it's changed. The date -I commit above added "shrt" as a third argument to the gotflag() function this code is in, indicating whether we're processing a short or a long option. And I don't remember under what circumstances gof->arg is NULL. The date -I commit added the second || (shrt...) part of the test, so the test USED to just be "is arg null?" And it looks like it currently can't be, because on the way in if !opt we print gof->arg in the error message?

Sigh, the downside of having written most of this infrastructure in 2006 and modified it a LOT over the years is my mental model of what's going on is out of date in places and fuzzy in others. Let's see, from the top:

Data structures in lib/args.c:

  • struct getoptflagstate - populated by parsing the option string in NEWTOY(), the local variable pointing to it is usually "gof".
  • struct opts - linked list in gof->opts, one structure per FLAG_bit in toys.optflags.
  • struct longopts - linked list hanging off of gof->longopts, listing all the --long options. Each links to the relevant gof->opts short entry (where the character we're looking for, opts->c, is -1 if it's a bare longopt).

Functions in lib/args.c:

  • get_optflags() - entry point called from outside, parses the current command line in toys.argv using the current NEWTOY() struct in toys.which.
  • parse_optflaglist() - populate getoptflagstate from the option string in NEWTOY(name, "optionstring", FLAGS).
  • gotflag() - called for each recognized long or short option to set flags, save arguments, enforce boundaries, handle groupings, etc.

In main.c, toy_singleinit() calls get_optflags(), which starts by calling parse_optflaglist() to populate struct getoptflagstate gof; and the opts and longopts lists hanging off it. Then it iterates through argv[] to populate toys.optflags, toys.optargs, and TT.arguments. Then on the way out there's a couple sanity checks ("Need %d arguments", etc).

The big loop over argv[] is basically for (gof.arg = argv[i = 0]; argv[i]; gof.arg = argv[++i]) so gof.arg always points to the current argument being parsed. Each argument (unless/until it stops early) is checked for an initial dash. If no dashes, it appends the argument to toys.optargs. If there's two initial dashes it searches gof.longopts and calls gotflag(&gof, longopt->opt, 0). For one dash it iterates through the characters of the arg, searching gof.opts for the short option and calling gotflag(&gof, opt, 1) if found.

Before either call to gotflag(), gof.arg is incremented to point to where the next argument would be within the current option string, so the "=" for --longopt=abc and the next character for "tar -Cdirname"... ah-ha! The " " indicator says the additional argument must be separate! I've already got a cuddling disable-er... And it doesn't work: now for unshare -im it says -im is an unrecognized option. (More tests! Fluffier test suite!)

The current code is setting gof.arg to NULL when the --longopt hasn't got an = at the end, which seems... unnecessary? For a short opt, gof.arg would point to an empty string (the NULL terminator at the end of -abc) when there's no attached argument, and it seems like the same logic would mostly work for both? Modulo arg actually pointing to the current short opt and needing to be advanced in the short opts case, vs pointing to the null terminator or = in the longopts case (so advance if it's = but do NOT advance past the NULL...)

This quick go-over of lib/args.c glossed over error handling and figuring out when to stop early, and that whole "tar xvzf" and "ps ax" thing where the first set of characters is a short option even with no dash. And it didn't go through how parse_optflaglist() actually populates gof/opts/longopts. But that's the general idea of what lib/args.c is doing. Yeah, I need to do a video on THAT too...

(The problem with posting videos is I haven't figured out how to iteratively improve videos. I can check in code and then check in an update a dozen times if I need to, even after it's pushed to the server; a video is kind of one-and-done in my current process. In theory editing can help here, in practice I haven't worked out how to retcon bits of it with a different cut and new footage inserted...)

July 16, 2022

Ha! All this time I had Form Energy confused with ESS. Both are doing iron flow batteries (I.E. the battery chemistry is rust in salt water), but Form's done literally nothing the entire time I've been tracking them, silently sucking up investment while doing one or two pilot programs with nothing to say for months at a time. ESS is the one giving compelling tech walkthroughs (ala let's open up this real working thing and show you how it works) and actually offering products for sale now. ESS also has a much more active twitter feed that's actually about their stuff being used instead of "look, we hired another MBA!" and "Here's an interview with an executive about how batteries are important!"

ESS is even already selling into Europe, and that article says its production capacity has grown from 250 MWh last year to 2 GWh this year. Their current cost is $20/kwh (last year Lithium Ion was $132/kwh but due to high demand the price is expected to increase). Like the old "iron edison" batteries from a century ago (Edison bought some eastern european guy's patent and renamed the technology after himself, kind of like Musk today: those were iron/nickel batteries in aqueous solution, the new ones are iron/oxygen batteries in aqueous solution) the iron flow batteries have unlimited charge/discharge cycles with no capacity loss. (Their 25 year estimated lifespan is because it's full of PVC pipes and pumps that will eventually need maintenance, not because iron rusting and unrusting has unaccounted-for side reactions, nor does iron grow dendrites. Iron/oxygen/water/salt technology is mostly plumbing, and its technological complexity is closer to the "swimming pool" than "aquarium" end of things.)

Unlike Form, ESS has actual numbers for their stuff (because it currently exists and is for sale now): their standard setup stores 6 megawatts per acre with 74 megawatt-hours capacity (so 12.3 hours to fully charge and 12.3 hours to fully discharge when electricity goes in/out at 6 megawatts). Austin's total electricity generation is 4600 megawatts which would need a little over 750 acres somewhere in the 437 square miles Austin Energy services to have batteries provide the total peak load. (For comparison the Tesla Gigathing is 2500 acres, and those 750 acres of storage don't have to be contiguous. It's better if it ISN'T, but spread out around the dozens of existing substations.)

And most of Austin's electricity load is air conditioning, which happens while the sun is shining and solar panels are generating full blast, so needing to run the whole city from batteries should basically never happen modulo another uber-blizzard. All this solar is why the ERCOT incompetence panic du jour hasn't hit Austin yet: over 50% of our local daytime power comes from renewables, with 37% of daytime load coming from solar and the rest wind plus that one wood-burning boondoggle. Yeah, record heat waves and record air conditioning consumption, but as Tom Baker said "the moment has been prepared for".

Sigh, the lovely pie chart of live electric generation for Austin gives the percentage for each TYPE, but not the total megawatts being produced/consumed, or any kind of non-live average over time. 4600 MW is total generating CAPACITY, but it's not going full-bore 24/7 and the batteries' job is to absorb excess solar power and feed it back out again at night. Doing that could get us to 100% renewables pretty quickly: double wind and quadruple solar with batteries absorbing the excess to feed back at night. (Note that wind seems to generate MORE at night around here, although they only give percentages not absolute numbers so...)

July 15, 2022

Fuzzy wanted to talk so went out with me on my late night walk, and followed me all the way to the table at UT. (I still describe Dean Keaton as the wife from Family Ties, she suspected Norman Hackerman was a character on Mr. Robot. That's "26th street" and "the geology building" in a sane naming scheme.)

She was quite impressed (not necessarily in a good way) by the "Giant Ugly Pile of Recycled Metal Canoes Lashed Together on a Stick" at 25th and Speedway, and took a picture. The plaque, which disingenuously describes it as "whimsical", says it's been there since 2015. I'd thought it was longer. It took me years to stop being afraid it would fall on me walking under it, and I still don't want to go near the thing when there's lightning. When I first moved to Austin you could rent these canoes on town lake to go under the bat bridge and get pooped on. (I would have thought that much aluminum would have a significant resale value, but maybe it's alloyed with something toxic? Dunno...)

And Fuzzy came back 5 minutes later with a homeless guy, of the "this person's brain is clearly not working properly" kind. (Paranoia, and something REALLY wrong with his short term memory.) There are no homeless shelters open after midnight that Google can find... not MUCH open overnight since the pandemic. So leaving him in a better position than we found him took... I don't even know how long? (For a definition of "better" that involved buying him cigarettes and a new lighter, because we couldn't get him a shower and a place to sleep, and cigarettes were his second priority. Food was third, but he couldn't have meat because "ulcers" and literally the only non-chip food Several Elevens had without meat in it was... a banana. They have sandwiches, and canned pasta, and ramen... which all has meat in it. Ten minutes in I was starting to suspect the slurpees are meat based.) We left him at a seat in front of the Dobie Mall entrance Google Maps shows for the "Homeless are Human" corporation in suite six hundred something, which opens at 8am. And then we both went home, because I dunno how to usefully do more than that here.

Somewhere during this, I appear to have strained my back. Which is not fun. It was twinging a bit over the past few days, but it REALLY did not like all the running around trying to find something open at 3am with a staggering mumbling dude who kept bumping into me.

Not my most productive evening, programming-wise.

July 14, 2022

I honestly hadn't noticed what day it was until I saw the hashtag trending on twitter. I should get out more...

The first mkroot video I want to do is on using it, starting with "make root" and sudo chrooting into the directory, showing how broken static linking against glibc is ("wget" says "System error", and "ls -l" can only show user/group IDs not usernames), and then either dynamic linking and downloading a cross compiler and building against musl...

But something I want to demonstrate is how changing the host name within the chroot changes it on the host too, and how "unshare" can prevent that... which is how I found the unshare problem where "unshare -imnpru" only does -i and not the other ones.

The problem is that unshare --ipc=name creates a persistent name space, and this is the "optional argument" logic where you can leave off the =name and it's fine. For the SHORT options, I implemented that as "unshare -iname -m" supplies the argument, and "unshare -i -m" doesn't. Except that means "unshare -im" treats the m as an argument to -i and not as -m. And the debian "unshare" command lets you say "unshare -imnpru" treating them all as grouped flags, but currently toybox requires that to be "unshare -i -m -n -p -r -u".

I dug and noticed the new -C argument, and "nsenter -a", and I went "great!" and added -C and -a which means I can say "unshare -a" and maybe bypass the whole argument cuddling issue for now... and then after I'd checked it in I noticed that nsenter has -a but unshare DOESN'T. (I added it, but debian's hasn't got it.) So I can't use "sudo unshare -a chroot root/host/fs" as the example in my video, because that would have to be the toybox unshare on the host. (And ALSO the whole "adding flags other things don't support makes Denys Vlasenko sad", although honestly, they SHOULD have added it if they were going to put it in nsenter. That has mostly the same flags... although again why does nsenter have -S and -G to set user and gid of the process in the new namespace, but unshare doesn't? One creates a new namespace, the other reuses an existing namespace, this is basically the same use case, why have they diverged???)

So this is why I haven't started that video yet, I have a rough edge in my script where I dunno what to say when I get to this point. I don't want to provide a tightrope walk over an abyss where if you do EXACTLY what I show it'll work, but a seemingly trivial change (-i -m -n vs -imn) breaks stuff and you dunno why.

If debian's unshare only lets you create "persistent" namespaces with via --longopt then toybox's unshare should do the same, which means changing lib/args.c so I need to either audit the other users to confirm they're ok with the change or add another indicator to specify the desired behavior? Hmmm... Design issue. I was thinking ";" without ":;" would be --long= only but it uses both because you can have OTHER kinds of optional arguments, ala "#;"... Maybe ; up front in the <1>1 prefix part to indicate longopt only for ";"? Hmmm... it's all subtler than I like...

July 13, 2022

Austin Energy has a live pie chart of its energy source mix, and coming up on 11am solar is up to 37% of the generation total. (Which I think is probably about where it's likely to peak, based on how many panels are installed.) This says if you quadruple the panel count and add a few parking lots full of shipping container batteries (which can stack eight deep), solar+wind could easily supply the city's electricity needs. Probably want to go to 5x or even 6x when we switch over to electric cars.

The first of those last two links is Bloomberg going "S-curves go vertical soon after hitting 5% and we're about to hit 5% electric car sales", and the second is Ford's electric vehicle sales numbers. Ford sells around half a million vehicles per quarter and they sold 16k EVs in Q2 (increases in F-150 Lightning and E-transit cannibalizing Mustang sales as they divert the battery supply from the fancy sportscar to the workhorses), which is around 3% of their total. Lightning and E-transit are both sold out a year in advance (at which point they stopped taking preorders), so their limiting factor is ENTIRELY how fast they can make them, and they're investing billions in increasing production as rapidly as possible.

Austin's stated goal is "to provide 65% of customers' energy needs with renewable resources by 2027", which sounds roughly like doubling the solar installation capacity. There's a new 150 megawatt solar installation up in Pflugerville. Their generation portfolio has 645MW solar, 1425MW wind, 770MW gas turbines, 430 MW share of a nuclear plant near houston, 570 MW of a coal plant they're actively trying to get rid of (but being politically blocked), and a 100MW wood-burning boondoggle they bought to get out of their contract with it, and now fully own but don't really want because it's crazy expensive to run. (The price of wood has gone way UP, not down.) Austin has a couple megawatts of lithium-ion batteries, but what it REALLY needs are those big iron/air container batteries. (Which are still doing pilot projects and building manufacturing capacity, and don't have actual retail prices posted where you can order units yet.)

July 12, 2022

I've been leaning on t-mobile to avoid dealing with the google fiber issues. Fade had me reboot the google router again this morning because she'd scheduled a watch-along of the new Hilda movie with a friend in maryland, and the little white circle had gotten stroppy again. (She'd power cycled the fiber jack on the wall, but that doesn't fix anything. It's specifically the router that's borked. Power cycling the router fixed things again for the moment, but it deteriorates over time.)

Of course over the last couple of days T-mobile decided to break, as in my signal strength has been "!" for large chunks of the day (meaning no bandwidth). A reboot would fix it for like 30 seconds but then it would go "!" again. I can still make phone calls and send/receive texts, but neither is something I regularly do with my phone. The household grocery list is a slack channel.

Calling T-mobile's tech support revealed that a tower near my house is handling calls but not data, and they've got people frowning at it but no idea when it'll be fixed. The support lady in India suggested I go into android "settings->system->reset options->reset Wi-Fi, mobile, & bluetooth" in hopes it selects a non-broken tower. As with power cycling the phone, that lasted about 30 seconds before navigating itself back to "!". (It did forget my wifi password and all the saved bluetooth headphone associations, so I guess it did something? And of course the reset ended the call with tech support, probably why their script suggests it.)

Eh, that problem I SHOULD be able to wait out. And it still works fine when I'm at the UT campus instead of home. The ball there is not in my court: broken tower, being fixed. Half the reason I'm dithering about engaging with Google Fiber tech support again is the PROPER fix is to throw their router in the trash and get a new one from Best Buy (which has an actual physical location within walking distance)... Let's see, Google for "best buy router"... ahem, "best buy router -prime" (Adam Savage was right, Google has become useless for finding products to buy...) And the first hit is... oh goddess. No, I am not deploying another one of Google's white circles EVER AGAIN. They AREN'T DUMB ENOUGH.

Right, that goes back on the todo heap, worry about it later...

For some reason plugging a USB-A->USB-C cable into my laptop's combination USB/HDMI port (which only does USB-2) can charge the skyrim machine switch, but plugging it into either of the USB-3 ports on the other side does this weird cycle thing where the switch's screen keeps powering on and back off again? The USB-2 driver gives up and goes "I dunno what this is, but it wants power", and the USB-3 driver NEVER STOPS TRYING TO IDENTIFY IT, or something? Odd. (I was charging a USB battery from my laptop and then charging the switch from that until I noticed this. Both types can drive a _battery_...)

The Posix-2008 (ala SUSv4) spec wants diff to support -bcefru -C# -U#, and we're not doing that. We're only (I.E. always) producing -u output, so no -cefC, just need -bruU (although -C could be treated as a synonym for -U I guess).

Posix says when traversing directories don't compare block/char/fifo/dir to files... which is ALMOST a complete list: stat has S_IFBLAH macros for FIFO CHR DIR BLK REG LNK SOCK. I guess LNK passes through but what about unix domain sockets? (I guess they forgot them, I'm assuming they're excluded too.) So using stat() instead of lstat() (modulo not traversing through symlinks with -r), the test is basically "file vs not-file".

It's got the "Only in" and "Common subdirectories" messages specified, and a semi-specified binary file comparison format required to have the two path names and the string "differ". Hmmm, posix's same_file() test includes same block/char major:minor, but then says what to DO about same files is undefined. (So why...?)

Ok, posix requires redundantly outputting the "diff -ruN FILE1 FILE2" line before each file comparison, even though it's got the +++ and --- lines providing that info? Great. Eh, not that HARD to do, just... silly? Lots of stuff in here about the diff -u output format but it's basically stuff I already had to learn to implement patch. And nothing about any actual "finding diffs efficiently" algorithm. Grumble.

You know, all these algorithms are finding lists of ranges, but the lines are never reordered: diff is basically finding a BITMAP of unchanged lines. Well, possibly two bitmaps (one for each file). Of interest is the set of lines in each file that were NOT changed from the other file, so when you output you traverse forward discarding pairs and then writing the +line or -line for each unmatched entry. And sufficiently long matched spans are _discarded_, as in diff doesn't output anything for them and can basically stop thinking about them. Which sounds a lot like the way I wrote patch.c. Hmmm...

All this search stuff is thinking in terms of spans, but the spans can't be reordered. Not in diff -u output. I've seen graphical diff output that displays moved stuff: when I was working at IBM fresh out of college OS/2 had a lovely side-by-side visualizer that showed moved stuff with diagonal lines. Alas I haven't encountered that since, it doesn't seem to exist outside of IBM. Yes I looked when I switched to Linux, at some length...

Alright, if I were writing a diff the way I wrote patch (conceptually turn my patch.c inside out to PRODUCE said output from 2 files), how would I go about it? My patch is streaming, which means it can't apply hunks out of order (or even overlapping hunks). It forgets "seen" lines, and just searches forward for the next matching place remembering as much in-progress context as it needs to, and when it successfully finds a place to apply a patch it outputs the appropriate remembered state and discards both the saved context and the hunk it just applied. (When the hunk fails partway through, it flushes the remembered context a line at a time reevaluating its way forward...)

In this case "success" means I can discard lines. There are three lines of context before/after a hunk, which means to avoid producing overlapping patch hunks you can have at most five unchanged context lines between two changes. (Otherwise three trailing lines of end context + three leading lines of new context means splitting it into two hunks won't overlap and thus won't confuse patch: this is why the format has the @@ counts at the start instead of looking at consecutive context lines to detect end of patch.)
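
That overlap arithmetic reduces to a one-liner (my sketch, hypothetical function name; whether the exact-boundary gap of 2*U merges or splits is an implementation choice, and real diffs may merge it anyway):

```c
// With U lines of context, splitting between two changes needs U trailing
// lines of context for the first hunk plus U leading lines for the next,
// so a gap of fewer than 2*U unchanged lines forces one shared hunk.
// Given sorted line numbers of changed lines, count the resulting hunks.
int count_hunks(int *changed, int n, int U)
{
  int hunks = n > 0, i;

  for (i = 1; i < n; i++)
    if (changed[i] - changed[i-1] - 1 >= 2*U) hunks++; // clean split possible

  return hunks;
}
```

For U=3 that's exactly the "at most five unchanged lines between two changes stay in one hunk" rule above: a gap of 5 merges, a gap of 6 can split without the hunks overlapping.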

The next question is reacquiring unchanged lines after a change, which is where the N^2 search thing comes in I guess? Hmmm, alternate between the two inputs reading one more line, and check that line backwards along the other input for a match, and then check for six consecutive unchanged lines (well, 2 times -U many matching lines: -U lines of trailing context for the old hunk and -U lines of leading context for the next hunk so you can break the hunk without overlap), which means it's time to flush the hunk and forget the previous context. So reading a line at a time into memory and storing it is the correct behavior (modulo keeping/reusing lines we already read ahead that turned out not to be part of the last hunk), and all this "copying to a seekable file" stuff is silly, because your memory consumption is based on size of the HUNK you're assembling, not the size of the file. If you diff two multi-gigabyte unchanged files, you're pairing lines and discarding them again and only keeping 3 trailing context lines for starting a new hunk with.

Does this produce nicely human readable output? Dunno. Will it marshal changes between two trees in a way patch can deal with? Definitely. Can Bram Cohen's "patience" technique maybe tweak this slightly to be less ugly? Dunno. How ugly is it? Why was this not how it's ALWAYS been implemented? What did I miss here?

I want test cases. I REALLY want test cases. People have written at least four different diff algorithms because they produce different diff output on the same input, and I would like that input please. But no, THIS was not considered relevant to preserve, apparently? Grrr...

July 11, 2022

The unshare(2) man page says CLONE_NEWCGROUP showed up in Linux 4.6, which was released May 2016. Ten months left on the seven year rule. Trying to figure out if I should add an #ifndef to portability.h just to remove it again. I'll probably wait for somebody to complain...

One of the most annoying rough edges of C11 (sigh, yeah, I've upgraded from c99 because (type){a,b,c} complex inline constants are just so CONVENIENT, plus named struct member initializers with the rest guaranteed zeroed is nice) is that void * doesn't nest. Specifically, I want to rewrite xrealloc() so instead of having the same semantics as realloc, ala ptr = realloc(ptr, size), it's just xrealloc(&ptr, size) with ptr passed only once. It already xexit()s on error, so if (NULL) on the return value isn't relevant. It CAN still return the pointer, there's one instance of memcpy(x = xrealloc(x), blah) in xwrap.c for example, but there's a whole LOT of instances of big[complicated].thing = xrealloc(big[complicated].thing, size) that don't need to repeat the calculation.

Unfortunately, the LEVEL of indirection still matters: if I make the argument type a void * then xrealloc(ptr, size) instead of xrealloc(&ptr, size) won't get caught at compile time: it'll accept any pointer. And if I make the argument void **, then char *blah; xrealloc(&blah, size); doesn't work because a char ** isn't a void **. Since the pointer is ultimately to void (meaning we don't care what it's a pointer TO since we can't access it without a typecast anyway) all that SHOULD matter is it's the same number of levels of indirection. But no. That's not what the C committee decided to do. (Yes, this is the opposite of how C++ thinks. Thank goodness.)
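
So the only option that compiles for every caller is "none": take void *, do the double indirection inside. A minimal sketch (my placeholder name xrenew and stand-in error handling, NOT the actual toybox function; punning through void ** like this is technically aliasing-dubious but works on the flat-pointer platforms toybox targets):

```c
#include <stdlib.h>

// Sketch of xrealloc(&ptr, size) semantics with a void * argument: accepts
// any pointer-to-pointer without a cast, at the cost that the compiler
// can't catch xrenew(ptr, size) where the & was forgotten.
void *xrenew(void *pptr, size_t size)
{
  void **p = pptr;
  void *new = realloc(*p, size);

  if (!new && size) exit(1); // stand-in for xexit() on allocation failure
  return *p = new;           // updated in place, but also returned
}
```

Call sites then become xrenew(&big[complicated].thing, size) with the expensive lvalue calculation written once, and memcpy(xrenew(&x, len), src, len) still works because the new pointer is also returned.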

In this case I want a LITTLE type checking, but I only have the options of "none" and "too much". This is the same basic problem that llist.c has all over the place: llist_pop() is a pointer to a pointer but has to be void * to avoid a bunch of typecasts, which means if I forget the &head and just pass head the compiler won't catch it. (It's pretty obvious at runtime, but shouldn't HAVE to be.)

For the dlist stuff I kinda have a vague notion I can stick a common wrapper type at the start of all the dlist structures containing JUST prev and next, since a pointer to a structure is the same as the pointer to the first member, so C can actually do single inheritance pretty easily. But this adds an extra level of name traversal to access prev and next, which isn't really a net win, and &(tt->a) isn't THAT much better than a typecast, which is why I haven't really tried to open that can of worms yet. (Plus it's a big intrusive change touching a lot of places, albeit mostly mechanically...)
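
For reference, the trick leans on C guaranteeing that a pointer to a structure can convert to a pointer to its first member and back (C11 6.7.2.1). A sketch with made-up names (not the actual toybox dlist types):

```c
// The generic link header goes FIRST in every concrete node type, so
// generic list code traffics in struct dlist * and concrete code casts
// back to the real type: single inheritance, C style.
struct dlist { struct dlist *next, *prev; };

struct tt_node {
  struct dlist hdr;  // must be the first member for the cast to be legal
  int payload;
};

// Generic insert at head that knows nothing about the payload:
void dlist_push(struct dlist **head, struct dlist *node)
{
  node->prev = 0;
  node->next = *head;
  if (*head) (*head)->prev = node;
  *head = node;
}
```

The cost is visible at every access: tt->hdr.next instead of tt->next, which is the extra level of name traversal complained about above.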

July 10, 2022

Facing "diff" again. I dowanna become an expert on diff algorithms. There seem to be at least four of them now, and that's if you don't count whatever the "Hunt–Szymanski algorithm" is (wikipedia's a bit of a forkbomb here).

Doug McIlroy's diff paper has failed to sink in for me, but long ago I bookmarked a blog post series that I vaguely recall did a good job covering the territory at the time. It may have predated the "histogram" algorithm which is surprisingly hard to google for...

I was really hoping I could implement just ONE algorithm and call it good. Right now it looks like "histogram" would be that one (it's an improvement upon "patience" which is the one I'd planned to implement before somebody else sent in a contribution). But that's not at all what the current diff.c is doing, and "throw out code because I don't understand it and write a new one I do understand" is all wrong. I need to understand this BEFORE giving myself permission to throw it out (if I still want to after I understand it).

Oh, and the free() bug is NOT just in my current tree's diff.c, it's in the last checked in version too, as demonstrated by:

$ seq 1 100000 > one
$ seq 1 4 100000 > two
$ diff -u one two

Which is a lovely test I saw on this page (once upon a time demonstrating a hashing bug in git that caused diff to run pathologically slowly) and I'd happily put it in the test suite if I knew what result to compare it against? According to "wc" it produces 100,003 lines of output, which I am not copying into diff.test and would need to generate myself somehow...

$ diff -u <(diff --label nope --label nope -u one two) <(echo -e '--- nope\n+++ nope\n@@ -1,100000 +1,25000 @@'; for((i=1;i<=100000;i++)); do (((i-1)&3)) && printf - || printf ' '; echo $i; done)

A test which takes 1.6 seconds to run on my laptop because of the shell loop, and could easily produce different output for different diff algorithms (dunno yet), but eh... (Probably use a shorter version for regression testing, but for the moment I want to make the long one work.)

ANYWAY, that test causes toybox's currently checked in diff -u to throw a free() error, and I dunno if I broke it with cleanups or if it was already broken when I started but it doesn't really matter at this point. The existing test suite code didn't catch it, I gotta fix it. I hate free() errors, a random explosion long after the cause of the problem. I hacked up a function to manually validate musl's heap once, right before musl changed how they maintain the heap so that pointer traversal didn't work anymore. I should do it again for the new stuff at some point, but I've got ENOUGH tangents right now...

July 9, 2022

Oh hey, the Linux kernel guys are rediscovering User Mode Linux and using it for container plumbing. When did I get involved with that... looks like 2006. No wait, the build system I did before aboriginal was based on User Mode Linux, and I _ended_ that in 2005 when I got QEMU to work. When did I start that... looks like 2003.

For the mkroot docs I'm trying to stick "unshare" between sudo and chroot in my example to give some quick and dirty container support, but "sudo unshare -imnpru chroot root/host/fs" is not actually protecting the host from changing hostname in the container, and pulling on the thread is... I don't think I properly cleaned up izabera's last commit to this thing. I CERTAINLY didn't properly test it...

Don't need another tangent right now. [Goes down tangent...]

July 8, 2022

What I DIDN'T say in my reply on the toybox list was...

My Google Fiber comes and goes. My speed test on the 29th was 73.2 mbps upload and 4.6 mbps download. They're supposed to be symmetrical. It's really the latency from all the dropped packets that's noticeable: on a bad day sites can take 15 seconds for DNS to resolve (and then load quickly once that's done). When it gets bad crunchyroll pauses every 5 seconds to buffer (that site has terrible bufferbloat, it's bad on any connection that drops any packets). And when whatever it is gets REALLY bad youtube does the same, which I hadn't seen before: for some reason the youtube app on my phone will read ahead and buffer on t-mobile, but it WON'T read ahead when associated to the google fiber router. How does it know? WHY does it know? How does this help? Why would they even implement two codepaths? Please stop trying to be "smart". At least netflix/hulu/prime buffer reasonably, they're used to being at the end of crappy internet...

We already had a guy out to replace the fiber lens thingy in the box on the side of the house a month or so back when it was just being consistently awful, and that DID help, but he wanted to replace the little white router circle Google gave us but couldn't because Fade had the app to control it installed on her phone and she was in Minneapolis for 4th street and we couldn't transfer control to the new one without it. With linksys routers you could just plug the new one in and throw the old one away, but apparently the technology's been "improved" until it lost that capability? (I don't WANT a control app. Visiting from the wifi side to control it was _fine_.) The service dude explained how lots of dropped packets like the bad lens was causing can drive the circle router's algorithms persistently nuts because it's a SMART router in a way I'd really rather it wasn't and remembers the wrong things in its flash, which is why he was ready to just replace it with a spare he had with him. ("Can you just factory reset the one we have?" "No.")

Except when it all started going pear shaped again however many weeks later, rather than send the same guy back out now that Fade and her phone were home, the support chat bot suggested we replace the six inch ethernet cable connecting their router to their wall plug, which I keep meaning to do (I dug up a fresh still-in-plastic cable and everything) but it seems to have cleared up again on its own (for the moment) and I'm still not sure how a dodgy ethernet cable would cause DNS lookups of a new site to take 15 seconds and then the page loads quickly once it's connected? (Service dude was ready to replace the router, chatbot is now trying to avoid sending someone else out, problem goes on my todo list because I don't want to argue with a bureaucracy and it's mostly still usable?)

*Shrug* It's on the todo list. It comes and goes with the weather or something, and I'm using my phone tether half the time anyway. (T-mobile reset the monthly "fast" bandwidth quota at the start of the month, and I've mostly stopped watching youtube anyway because the site just got unbearable for other reasons. Netflix works. Prime and Hulu let me download just about anything to watch later so connection quality doesn't come up much... I also checked tubi and pornhub and both of them can stream video reliably: crunchyroll's had obvious bufferbloat issues for years and youtube is being "smart" if and only if it's connected to google hardware.)

All this is still more reassuring than the time the Google fiber got cut and they fixed it with a couple hundred feet of patch cable duct taped down the sidewalk (crossing a street and at least one gravel driveway) for over a month. They fixed it properly eventually...

Heh. I just tried to send myself the video I took of walking along said hilariously duct-taped patch cable by telling the android files app to "share with slack", and my phone gave me the "slack has crashed" pop-up, with the option to send feedback to the android guys. So I told it to send the crash dump and it then said the "market survey app has crashed, send feedback", so I wound up sending them a crash dump for their crash report utility instead. Par for the course...

July 7, 2022

Trying to write up an explanation of mkroot is always a bit of a black hole, and I keep winding up checking polishing patches in because THIS order of operations or THIS naming choice is easier to explain. And I think I found an actual bug (with CROSS_COMPILE out of the $PATH instead of an absolute path), fixed now...

July 6, 2022

Looked at busybox's timeout to see if adding -i was an easy patch, but they seem to have gutted it and moved all the actual workings under libbb/ somewhere via multiple layers of indirection. So no, not easy.

Running "make test_diff" does not reliably display the diff for failures, for reasons that become much clearer with 30 seconds of thought rather than 10 minutes of debugging.

Sigh. Diff is freeing memory wrong (with glibc throwing an error in free() which is never a good sign). I dunno what I screwed up, and the main problem is that "make test_diff" is not stressing the plumbing in any particularly illuminating ways. Passing those tests does NOT demonstrate the code is load bearing. What I need is two different git checkouts of linux-kernel to do a diff -r against and see if they produce the same output, except... are different versions of the gnu/dammit diff guaranteed to do that? I suspect there wouldn't be a -d "try harder to find a small set of changes" option if that was the case. (And that's before you get to Bram Cohen's "patience" algorithm.)

Ok, the test I SHOULD be running is two different kernel checkouts, create a diff, apply it with patch, and then ensure the trees are identical. That at least tells me diff _worked_. And I totally can't check that into tests/diff.test so... Hmmm.

I also need to do an explanatory video about mkroot. I'm gonna go poke at that for a bit.

July 5, 2022

Today is national bikini day. Google brings up this 2019 video about it as one of the first hits, because if you tried to post that today prudetube would take it down. Why does this issue bother me so much? Stephen Fry explained it well in his Intelligence Squared debate appearance: imagine all the cooking channels were blocked during lent/passover/ramadan, and then five years later Youtube escalated to going after any video game clip showing a pixelated character healing by eating food, all the while insisting this is NOT a merely religious crusade... except there's literally no other reason to do it. This is a normal part of life, condemned by certain religions but not others, and Youtube is basically going after pictures of bacon because some religion somewhere says eating pork is unclean and they're politically ascendant right now because Boomers have gone senile and want to performatively buy indulgences. And youtube is being EXTRA insidious by not having clear lines or consistent enforcement so people proactively self-censor and corrode the culture. We went through Prohibition in the 1920s and it didn't work, this is just actively stupid and I hope it destroys Youtube quickly so it can be replaced by something less stupid.

I checked in timeout -i since it wasn't just me wanting it, and managed to get the old codepath using the new code so I could take out the previous timer and signal stuff. (The trick is making a pipe() after the fork and waiting on the read half of the pipe only this process could write to and never does when I want an unconditional timeout from poll(). I tried /dev/null first but that always immediately returns that there is data to be read... and then read hangs. I'd call it a kernel bug, but I'm not up to stepping in that swamp right now.)
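
A standalone sketch of that NOP-filehandle trick (my reconstruction of the idea, not the toybox code): poll() on the read end of a pipe whose write end only this process holds and never writes to behaves as a pure timeout, unlike /dev/null which immediately reports readable and then hangs in read().

```c
#include <poll.h>
#include <unistd.h>

// poll() a pipe read end nobody will ever write to: it's never readable,
// so poll() reliably sleeps out the full timeout. Returns poll()'s result
// (0 means the timeout expired, which is the point).
int nop_timeout(int ms)
{
  int fds[2], ret;
  struct pollfd pfd;

  if (pipe(fds)) return -1;
  pfd.fd = fds[0];
  pfd.events = POLLIN;
  ret = poll(&pfd, 1, ms);
  close(fds[0]);
  close(fds[1]);

  return ret;
}
```

The appeal is that the -i case (poll the child's actual output for activity) and the plain-timeout case can then share one poll() loop, just fed different filehandles.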

I need to poke Denys over at busybox and see if he'd like timeout -i on his side, and then maybe poke the coreutils guys again but they STILL haven't done anything with cut -DF last I checked, despite assuring me that it would be going in eventually? (I'm not implying malice or disingenuousness here, just that todo items get buried without badgering.)

I also let cut -DF marinate for quite some time and had other people poke me about "standardizing" it. I'm still subscribed to that coreutils list (waiting for -DF to go in) and half the traffic is people going "I just noticed the date command exists and have suggestions for completely rewriting it from scratch" which has to just get tiring...

Doug McIlroy's old diff paper from 1976 is still written in math-ese. It SEEMS to be describing very simple concepts but it's trying to explain them as if this is calculus, which it is not. I was never good at wrapping my head around the way math people think, and this is not helping.

July 4, 2022

Bit of a divergence into computer history to try to work out why "root" isn't reliably present in /etc/group. (Nope, posix is of no use at all here, it's never specified what any of the unix users should be and doesn't even guarantee uid 0 is special. Asking it to weigh in on GROUPS is just silly: Windows NT didn't have groups so Posix obviously couldn't standardize them.)

Anyway, I dug up group "root" with gid 0 in Red Hat 1.0 from 1995, but it was NOT in v7 Unix circa 1979. (The user "root" was there already in v2 Unix from 1972, but groups seem to have first shown up in v6.) My real question is, if I redo the tar tests to work on MacOS, where group 0 is "wheel", can I rely on any of the OTHER groups to be there?

The "wheel" thing is because BSD started when Ken Thompson took a sabbatical from Bell Labs to teach two semesters at UC Berkeley, which were fall 1975 and spring 1976. The dates in Dennis' v6 tarball say Ken shipped the v6 release right before leaving Bell Labs (in New Jersey) for California: most of the file dates are July 18, 1975. The only entries in etc/group were "bin:1" and "other:3", so "root:0" was a standard user but not yet a standard group, and BSD made something up. (Bill Joy's terrible design sense went on to give us "vi" and "network interfaces do not have /dev nodes", before Joy eventually announced humanity must abandon all science and technology before we destroy the world. No really.)

So, can I rely on any of the OTHER groups to be there semi-portably? I have tests that specify name:id on the command line, but reading values out of /etc is one of the things I'm testing. ALWAYS overriding that from the command line leaves some plumbing untested.

Groups 1-9 on my Devuan Cranberry laptop are "daemon, bin, sys, adm, tty, disk, lp, mail, news". The sys through lp ones were the same in RH1 in 1995, but bin and daemon swapped places, and the last two used to be mem and kmem. So... stable-ish? The problem is this is "cave fish go blind" territory, most of this DNA isn't coding for anything that's being used... No wait, back in 2013 Greg KH's bosom buddy Kay Sievers perpetrated a profound layering violation by moving knowledge of specific users and groups into devtmpfs. (Linus eventually threw Kay off of linux-kernel, allowing him to focus on being one of the main systemd developers. No really!)

So yes, LINUX has fixed low group numbers, and is kind of stuck with them now. Next question: does MacOS? Elliott posted a list and I guess Apple moving it now is their problem. Where was the list... Ah, here. Yes I got confused by the negative gids too, mac is nuts:


And the ONLY one that matches in MacOS, old Linux, and current Linux is... "sys" with gid 3. Eh, I can use that I guess? I wonder what freebsd has? Ah, it's 3 there too. Good to know. Ok, I can use that one.

I'm implementing "timeout -i" to track inactivity, and I felt quite clever for working out that I could indeed handle both the -i and non-i codepaths with the same code by feeding poll() a NOP filehandle to time out on... except when the command closes its stdout the wait() can hang because closing stdout doesn't mean you're about to exit. Hmmm, but the child doesn't have access to the NOP filehandle so that can't close so that's only a concern for the -i codepath, and I can just define "closes stdout" as "it's exiting now" and set the kill timeout to 1 second if it's not already set.

July 3, 2022

Watching other countries fight back hard, while here we wait out the Boomers and late stage capitalism. I should find a protest to go to, but I'm mostly nocturnal at the moment...

I broke the return code of "VERBOSE=all make tests" when I moved the individual tests into subshells (because FAILCOUNT wasn't shared with the parent shell), and my first fix handled stopping for ctrl-c but not reporting the return code correctly. So I fiddled with it, and the result is bash getting into a state where it says "argument list too long" trying to run any command? Wha...? Ah: having the subshell do "echo $FAILCOUNT > continue" and then the host "FAILCOUNT+=$(cat continue)" doesn't do what I meant if I don't declare FAILCOUNT integer. It does string concatenation on a rapidly expanding variable, eating all the environment space. (Remember when they started capping that at 10 megs a few years back? Yeah, that.) Right.

Darn it, the arch/x86 kernel build broke again, once again demanding some random elf package be installed that NO OTHER ARCHITECTURE requires to build. "The power of git bisect compels you..." Right, the fix is to chop out "selects HAVE_OBJTOOL" instead of "selects HAVE_STACK_VALIDATION", and I can just tweak mkroot's sed invocation to yank both so it works with a wider range of kernel versions. This is why we regression test.

July 2, 2022

I'm trying to read Doug McIlroy's paper on how the original diff algorithm worked back in 1976, but it's written as math not as programming. It still seems to boil down to "sort the lines, find spans of same line, show what's left", but it's in a darn foreign language my brain just bounces off of.

Aha, if you diff a regular file and a directory, it compares against the same filename in that directory. Sort of "cp" logic. In an earlier cleanup pass I replaced some overly complicated tests with "can't compare directory with non-directory", and this seems to be the exception to that, which I spotted puzzling out an otherwise unreachable else case in diff_main(). (Going through the CODE is tedious and tiring, but doable. Going through the paper full of mathematical notations, I need a whiteboard.)

Of course there was no test case for this. "My changes pass all the diff tests" is not proof of anything. My cleanups have broken this implementation like three different subtle ways (not in the diffs it produces once it's comparing two files, but in what files it chooses to compare when given leeway to choose), and I strongly suspect it would have been faster/easier to just write a new one from scratch, but I'm halfway through now. I hope.

Once again, "I was trying to get tests to work under mkroot before tackling this" rears its ugly head because otherwise I have NO IDEA how to write a test case for "the exit value of a directory diff is not just the exit value of the last match" since I can't control directory traversal order and different filesystems hash their input differently. I can read the order with "ls -f" and I _guess_ modifying a file won't change the dentry order? (Doesn't rewrite the dentry, so shouldn't re-hash... If there's a filesystem out there that rehashes due to atime updates, I'm not humoring it.) So I can echo same | tee one two > three and then echo differ > $(ls -f | tail -n 2 | head -n 1) and maybe that will work? Seems reliable on tmpfs, dunno about other filesystems...

July 1, 2022

I've been cleaning up diff.c, and I need test cases. Each cleanup I've checked in passed "make test_diff" but I'm not sure what that proved. It's easy for me to diff -ruN two different Linux versions, but another matter entirely to chip off lots of little specific manageably sized regression tests without large external dependencies.

And I still haven't got a full mental model of the diff specification. I just now noticed that "diff dir1 dir2" compares the files in each. (The -r compares the contents of subdirectories rather than saying "only in A" or "common subdirectory $BLAH", I suspect I DID know this at one point but forgot because I always use -r or list files on the command line.)

The next question of course is what a test case for dir1 dir2 without -r should look like, because "Only in " and "Common subdirectories:" are longish English phrases of the kind I try to avoid in toybox... and it looks like posix is specifying them ("Diff Directory Comparison Format"), so I need to reread the spec.

Both Elliott and Doug McIlroy pointed me at the new location of the PDF I couldn't find at the top of diff.c (because Lucent took it down in 2015), so I need to read that too. And neither is casual reading. I still haven't quite gotten diff.c to the point where I'm puzzling out the algorithm either, although I know the basic gist of it (sort the lines to find matches, collect runs of consecutive matches, display what's left), but the difference between Doug's algorithm and Bram Cohen's patience algorithm was something I knew at one point, but have forgotten...

The thing I'm wondering at the moment though is why the existing diff code is doing quite so much with lseek. Is there a file you can lseek but not mmap? (Other than directories, which you can no longer exactly _read_ from so it's kinda moot.) Was this lseek stuff to support diffing files >4G on 32 bit systems? (A use case I don't think I care about in toybox, not before fixing "truncate -s" anyway...)

June 30, 2022

I've been trying to motivate myself to do some videos I can upload to prudetube (which can't even show scenes from original 1960s Star Trek anymore), but the supreme court's still attacking democracy (and quite possibly life on earth, not just the US), and Biden's still going "those scamps, aren't they adorable". AOC and Ilhan Omar and the other squad members are saying exactly the right things, but being blocked by the senile Boomercrats (yeah yeah, hashtag #notallboomers), and it's hard not to curl into a ball. At least today was the last day of the supreme court's term. But they start this crap up again in October...

Indiana Jones repeatedly demonstrated "punching nazis". Babylon 5 did an entire season on fighting creeping fascism. In the first season finale of Steven Universe as the most powerful threat so far shows up to kidnap the entire cast, Steven's rebuttal to Lapis Lazuli's "That's why we can't fight them" was "That's why we HAVE to fight them." Public entertainment has tried to keep alive the lessons learned from World War II: that Neville Chamberlain was wrong, appeasement never works (danegeld!), cooperation with fascists makes you complicit, you MUST FIGHT BACK. The british empire spanned the globe when Gandhi took them on. Martin Luther King Jr. was very much the underdog against Jim Crow. The other side convincing you that you can't win so must meekly submit is their greatest weapon, don't fall for it.

Ukraine didn't fight back against Russia's invasion because it was guaranteed to beat Russia. It wasn't even LIKELY. They didn't EXPECT to win, but they DID fight back, and KEPT fighting, and now it looks like Russia is going to run out of Russia before Ukraine even has to switch to the vietnam/afghanistan style war of attrition behind enemy lines to oust the occupiers everybody expected in the first WEEK of the conflict. Russia was widely expected to take Ukraine in days but be unable to hold it long term, and instead it turns out when people actually fight back bullies are often bluffing incompetent cowards with the depth of tinfoil. (Or else they wouldn't be bullies.) Four months in Russia hasn't quite managed to take 20% of a country and they STARTED OUT occupying 13% of it already after the "little green men" surprise attacks in 2014. (Showing the math: Wikipedia for square kilometers of the "independent" Donetsk, Luhansk, and Crimean "republics" is 26517+26684+26100=79301 km2, Wikipedia's page on Ukraine itself says 603628 km2, dividing out gives 13.1% captured in the wake of ousting Yanukovych, I.E. Ukraine's Trump/Lukashenko. Russia precedes their invasions by bribing the local plutocrats to install a fascist puppet, which is usually how fascists get into power: find people who would sell their own mother and make an offer.)

Ukraine is up against a 69 year old man in a country where the average male lifespan is 68, whose immune system is so trashed he stays 50 feet away from people he's meeting and seems to be making half his decisions via 'roid rage. He's purged his government of any succession plan or infrastructure capable of functioning without his personal oversight, meaning when he dies Russia's government essentially ceases to exist and has to be rebuilt from scratch. Ukraine just has to fight against a Boomer until the Boomer dies of old age, and the more stressful they make his life the faster that's likely to be. When it comes to who can outlast who, geezers lose.

Part of the reason Ukraine is doing so well is Zelensky is 44 years old. His geriatric advisors actively blocked preparations for defense, but once the tanks started rolling they had people in charge capable of learning and delegating and allowing their underlings to try new things. No 70 year old would have believed drone warfare could possibly be important, as Douglas Adams explained.

This pattern is historically repeated a lot: FDR was 51 when he became president, Abraham Lincoln was 52, John F. Kennedy was 43, Bill Clinton was 44 (balanced the federal budget while presiding over a massive economic boom) and Obama was 47. The good ones ain't geezers, they're flexible with stamina and quick reaction times.

Proper elder statesmen transition to an advisory role, assisting and teaching without becoming the limiting factor without which necessary work can't get done. They may still have plenty to contribute, but beyond a certain point asking people to depend exclusively upon them stops being a good idea. Elderly people thinking they're more capable of wielding gatekeeping authority at 80 than they themselves were at 50 (and thus gathering ever more power to themselves without end, preventing younger people from getting any practice) is the same kind of snowballing ego that led J.K. Rowling to produce longer and longer doorstops as her series progressed because her editor was no longer allowed to touch her words (not even to shorten the camping trip where the protagonists literally sat in a tent avoiding the action for a large chunk of the last book's 759 pages). This "me and only me, unfiltered pure me" is called hubris. George Lucas "lived long enough to become the villain" because Han really DID shoot first, and at a certain point he'd devoted his life to undoing his own past accomplishments not just with inferior prequels but by preventing anyone from seeing the original works unmodified. (Hence people painstakingly recreating the "despecialized edition" from old film stock he hadn't managed to reclaim and destroy.)

Here in the USA Pelosi (82), Biden (79), and Schumer (71) have also tried hard to purge their party of any possible succession plan. It's just a thing Boomers do. Our gerontocracy is all older than Putin, mostly because the USA's less toxic food, water, and air lets rich white people live longer here. Russia's coverup of its first Chernobyl successfully suppressed knowledge of it for decades, resulting in a whole lot of contaminated land/water/people over there, not even counting the heavy metals and dioxin and so on from industry and mining with no oversight. Over here people could historically stand up to that sort of thing without winding up in a Siberian Gulag; annoying the rich on behalf of sufficiently large numbers of poor people was at least an OPTION. Not that we're perfect either, lots of people STILL think Denver has a "naturally higher background level of radiation"... And these days our political system has been captured by doddering geezers, largely because Boomers still don't see themselves as adults and thus vote for someone older than they are. But on average, people here live longer than there because we at least HAVE an EPA and FDA and so on.

But they still die. The Baby Boom has ALMOST worked its way through. We just have to fight back until the lead poisoned senile dementia patients who vote for other lead poisoned senile dementia patients STOP being 1/5 of the population. (In the 2020 census, Boomers were 21.45% of the population. In 2010, they were 25%. Racist Grandpa stacking on top of the crazy 27% is how the evangelical dominionists got a political power base. The Boomers dying puts the fascists back down around a quarter of the population, where they're afraid to come out and wave their swastikas in public.)

Various other countries are cleaning house. But we still haven't managed to take the keys away from the geezers here.

June 29, 2022

Elliott emailed in a feature request for diff (-F) on Monday, which doesn't seem too hard to add, except diff is in pending and needs a bunch of cleanup. So I'm doing an initial pass, starting with the "I don't have to understand how the code works" stuff: mechanical-ish transforms where if the old code worked the new code should do the same thing... Lots of "variable declarations should have the asterisk-meaning-pointer on the variable name, not on the type name, because indirection is declared per-variable not per-type: whoever wrote this does not understand (or is not comfortable with) how C works", yanking any "const" that doesn't move a global from the data segment to the rodata segment, eliminating gratuitous use of enum (which they were ALREADY mixing with integers in the same variable, yes overlapping ones)...

And in the process of cleanup, gcc started warning about an unused variable. And I didn't CHANGE that. As far as I can tell, the variable was unused in the old code too, it just didn't get spotted by the compiler until I refactored things? (I've gone over it multiple times. I do not trust this. Then again, I don't trust gcc's warning generation anyway. May be used uninitialized indeed.) Sadly, this variable was structure->member capped to file length, and the uncapped structure->member is being used several times in the following code, so this is probably a real bug in the code. But I need a couple more passes with a weed-whacker before I start trying to understand the actual logic of the code.

It still passes "make test_diff" but it hasn't got thorough enough tests for me to consider that load-bearing...

June 28, 2022

Putting some cycles into dd cleanup, because I promised, and I've hit one of those "the stakes are too small to see clearly" things.

I want to swap the dd time reporting to use millitime(), in part because this eliminates a use of gettimeofday() which posix has declared obsolete, and also a gratuitous use of floating point (I still care about cheap disposable linux systems running on wind-up-toy hardware), but also because... who is this output for? Humans can't visually perceive even individual milliseconds. If 24 frame/second movies update the screen every 42 milliseconds and 1080p updates at 60hz (every 17 milliseconds), what do fractions of a millisecond tell a human?

This is a divergence from the gnu/fsf version, which is showing a randomish number of digits (eight in this case, I guess the ninth was zero?):

$ dd if=README of=/dev/null
12+1 records in
12+1 records out
6306 bytes (6.3 kB, 6.2 KiB) copied, 0.00291766 s, 2.2 MB/s

But the jitter of calculating what to print in the message wipes out something like half the digits of that? (I usually say "failure to do nothing successfully is a registered trademark of the Free Software Foundation": I guess being inaccurate to many decimal places is a gnu thing.)

On my laptop, two back to back calls to clock_gettime() with nothing else in between ranges from 148 to 388 nanoseconds difference. That's filling in two cache local stack variables with no page faults or anything. Add a single printf() assembling a buffer actually going out to a pipe and all bets are off: I'm getting 130k to 200k nanoseconds from printf("It's %d\n", var).

(Some years back gnome terminal had a synchronous output issue where piping your output through "tee" could significantly speed up compiles, because pipes have a buffer in the kernel but pty output blocked until the recipient read the data, and gnome terminal wouldn't read the next lump of data until it had processed the previous one, so writes to stdout blocked waiting for gnome terminal to update the display. Xfce's terminal doesn't seem to have that problem, or at least piping it through tee didn't make much difference to a printf() sandwiched between two clock_gettime() calls.)

I don't HAVE to switch to millitime(): I could easily switch it to clock_gettime() and use nanodiff() and keep the same basic logic... but why? Milliseconds are a unit for humans, and considering that dd is a standalone program, "time /bin/true" is ranging from 2 to 5 milliseconds on my laptop (2 to 5 million nanoseconds jitter, 7 of the 9 digits of precision). What are we measuring, why, and for whom? Sure I could carefully measure JUST the time to copy the data, not including command startup, option parsing, or reporting the output, but just a before/after around the tight inner loop... and the noise could still easily be bigger than the signal for small transfers. You wanna bench an I/O device, transfer enough data to mean something.

That said, I've added some tests for integer overflow. Yes on 64 bit unsigned numbers. Because otherwise, it couldn't report a sane transfer rate for transfers larger than 18 petabytes. (The +999 is to avoid a division by zero error, I.E. it's rounding seconds up, not to the closest half-second. Yes I saw the news article about the petabit per second optical fiber in 2020, but you'd still have to send 144 seconds of data at that rate to hit the overflow path so this would introduce less than half a percent error anyway, and for dd's rate reporting to be relevant you'd need a Linux machine that could pump that much data through a single process's stdin and stdout on one CPU. And the failure mode would be not reporting the transfer rate quite as accurately as you'd like.)

So circling back around to the first paragraph of today's rant: doing all this is easy, figuring out WHAT to do is the hard part, and when you have essentially aesthetic issues (it'll more or less work either way) I can easily spend ten times as long figuring out "should I bother to change this or leave it how it is" than actually doing it. I want the code to be RIGHT, but I don't want to rewrite everything just to smell like me, and which one am I doing here?

June 27, 2022

Got taxes filed. Paying less than I expected, but then I made less than I expected last year. (The downside of having been somewhat underemployed. Plus working 1/3 of last year for Jeff, who paid less than half the going rate. *shrug* Passion projects. "This would revolutionize a science-based world..." but a capitalist one that kept Windows on top for 30 years can stay irrational longer than you can stay solvent, as they say. They also say never go broke chasing someone ELSE'S dream...)

Bash's read command echoes back the characters typed into it when neither stdin nor stdout are redirected, and in THEORY I can just traverse the "unredirect" list to check for that. In practice, there isn't ONE unredirect list, there's several that stack. I can grab TT.ff->blk and traverse that easily enough (flow control block state is global), but pipe_subshell() has an unredirect list in a local variable, and each sh_process has an unredirect list (pp->urd). In theory read can look at its own sh_process via TT.pp, and I guess pipe_subshell can temporarily push a block, maybe? (Or does that give me another prototype loop...)

Sigh, implementing the read command is trying to pull in rather a large chunk of job control as a dependency. The tty management is kinda tied up with the rest of it. (When are we interactive, exactly?)

$ read&
[1] 1380
$ potato
bash: potato: command not found
[1]+  Stopped                 read

Ah, backgrounded commands don't get /dev/null as their stdin (in the absence of other redirects), instead they get stopped when reading from stdin, which is tty permission magic having to do with... where is it... tcsetpgrp(). Why do the man pages for setsid, tcsetattr, and "man 4 tty" all NOT mention tcsetpgrp in the "see also" section at the bottom? You've gotta already know this stuff to FIND it. Grrr...

So if you stop being a member of a tty's associated process group, you get suspended trying to read from that tty. (Even as root!) Meanwhile, WRITING to the terminal seems happily allowed for background processes. But then crapping all over your output is something the kernel does all the time too. At least that's not an additional constraint: the read command just cares about having stdin or stdout redirected. Right...

Still need the signal handlers to restore the raw console state on interrupt, and the PREVIOUS signal handler setup is in the job control fork I haven't finished and merged yet. Which needs some serious forward-porting at this point. (The "trap" command is sort of "eval" with a different entry path, but remember how I said some of the state was global? Hmmm, probably just needs call_function() to push a new TT.ff entry, and then end_function() in the return path. Keep in mind trap doesn't happen IN the signal handler: the handler records the signal it saw, and the function evaluation loop checks that before running the next command and inserts essentially a function call to the trap entry.) And yes, this means I need to have the prompt also check for this, because:

$ trap "echo hello" SIGINT
$ ^Chello

$ ^Chello

June 26, 2022

Paralyzed by right wing fuckery. Gonna come out of this mad, but for now I'm very very tired.

No, I am mad. I'm mad at prudetube for significantly contributing to this. "Elvira, mistress of the dark" could not have survived on current Youtube, meaning 1980s broadcast television was more permissive than current youtube. Which means (this portion of) Google is not part of the solution here, they are part of the problem.

Youtube isn't just complicit, they're actively driving the erosion of women's rights by being prudish assholes. They police women's bodies, disapprove of what women choose to wear (in public, where they are not bothered by other people or local law enforcement), and say women can't consent to appearing in their own videos. By being hypocritical, inconsistent, unpredictable, and capricious with disproportionate penalties (including loss of account for a first offense), they're trying to force people to self-censor, and it's culturally corrosive.

I grew up on Kwajalein (the new base commander giving a speech at the start of this video is in a building I recognize, the old "non-denominational" church where we had singing practice) and every 6 months we'd go to Hawaii for vacation: in the winter we'd just spend 2 weeks there, in the summer we'd pass through after a few days to spend a month stateside, and then back to Hawaii on the way home. (Hickam had THE worst scrambled eggs I've ever eaten.) Hawaii was the common, non-exciting getaway from Kwaj, and the mainland was the big EXOTIC place we'd tour (but never get to see snow because school let out in the summer).

So I'm following a hawaiian diving account, where prudetube manages to be hypocritical within a single account. Sure, the guy narrating the videos is obviously using a fellow diver as "pretty girl on the hood of the car to sell the car" so people will watch video after video about cleaning trash off the sand in hopes tourists stop polluting the beach quite so much. But recently prudetube has started insisting on censor blobs OVER BIKINIS. In this video the woman is not doing anything sexual. She is jumping into the water, is wearing a bathing suit, is viewed FROM THE SIDE, and had to have her hips pixelated anyway. It's not the only one. Youtube is insisting that a woman's body is inherently sexual, but random dudes can be fully naked in their thumbnails and it's fine, because prudetube says men's bodies aren't sexual, only women's bodies are.

Any handwaving about "exploitation" here has to answer "did she consent to appearing in the video"? Let's see, the woman is addressed by name (Britney), in most of the videos looks directly at the camera and interacts with the person filming her (even taking the camera to film the narrator on multiple occasions), appears regularly in videos filmed on different days, and is clearly posing for the camera. Heck, in addition to the narrator's personal friends/family, he also does behind the scenes on models doing a photoshoot he and his friends are helping film. Any theory of these women NOT consenting to this would basically have to involve blackmail, but youtube gatekeeps because she's not acting how they want her to act. (Remember, "if you can't say no you can't say yes" works both ways. Youtube demonstrably does not believe a woman can give consent.)

After years of censorship and abuse from youtube, Naomi Wu went so far as to post a long video explaining her history (raised as a boy under China's "one child" policy, because her family lied about having a son and followed through on that lie until she left home and came out as female; making her physically "cis" but socially/mentally "trans", and thus FLAMBOYANTLY feminine in a drag queen sort of way except with a different Chinese cultural context, because a lot of OTHER women went through this under China's one child policy, so it's an immediately recognizable thing over there like being an Elvis impersonator or Cowboy is over here...) Anyway, she posted this video of her history leading her to the choices she's made in order to try to get youtube to BACK OFF. And yet she STILL regularly complains of continuing censorship from youtube. Because how DARE a mad scientist version of "Elvira, mistress of the dark" exist, let alone be GOOD at it? Her shorts are too short for her to be in tech!

This is what's fueling the erosion of women's rights (and thus everybody's rights, ala "first they came for"). Mass media bending the culture to assume women cannot exist in public, certainly not without control by men. She can't make choices about her own body here (Youtube will make them for her), and that spreads. Youtube is bringing back the Comstock act, and I'm VERY TIRED OF IT.

(Saying the woman running youtube can't be sexist is like saying Clarence Thomas can't be racist. Demonstrably untrue.)

Anyway, I've stopped watching youtube on the walk to and from the table, and pulled out the switch and started a new skyrim playthrough. The downside is this is harder to STOP doing when I get there. Youtube was always a lot easier to put down than Skyrim...

June 25, 2022

For the read command I need a signal handler to de-raw stdin, pop itself from the list, and siglongjmp. Except this raises job control issues, since there should already be signal handlers installed for this and the new stuff should work with them. There currently aren't because that work was in one of the many forks and never got finished/committed, but I need to work out the design. In theory, sigatexit() does a signal_all_killers(exit_signal), which sets the rc to 128+signo and then calls xexit(), which will traverse the toys.xexit list popping each function in the reverse order it was added and calling it once before freeing the structure.

The problem is A) most of the shell plumbing wants generic_signal, which records the signal it saw and then returns (interrupting whatever function call was in progress with EINTR or similar, so the main loop can check toys.signal to see if we got a signal and handle it appropriately, such as inserting a function call to a trap handler). B) if you have two signals in a row exit_signal() won't return to the interrupted function that got popped, because it doesn't return. Instead it traverses the toys.xexit list again before calling _xexit() which will siglongjmp back to the shell.

Possibly I need to teach exit_signal() to recognize when it's interrupted itself (a repeat of the same signal should be masked, but signal_all_killers() funnels a bunch of DIFFERENT signals into the same handler function) and have it return instead in that case.

Linux Weekly News highlighted a comment that using the eBPF bytecode VM environment for more stuff could lead to decreased maintainability with two contexts and domain crossing between them and fragmentation. This is pretty much exactly my argument against things like "rust" in the kernel: start over from scratch with a new one in nothing BUT rust and I'll break down and learn it. Treat it like a cancer slowly spreading through the codebase and I'm against it. Sigh, tempted to dig up my old login and reply to the comment there...

Here's another test case to file under "syntax error abort granularity", which jumps you out of "source" but continues the enclosing whatsis:

$ cat one
source two
echo hello
$ cat two
syntax error)
$ bash one
two: line 1: syntax error near unexpected token `)'
two: line 1: `syntax error)'

But if "two" does an exit instead, that exits the caller as well.

June 24, 2022

The supreme court just crapped out the "women do not have bodily autonomy" ruling providing legal precedent for taking your kidney if anyone anywhere needs a kidney. (Once again, Monty Python was NOT ASPIRATIONAL.) The court did this in the friday news dump slot, in part to distract everybody from the way they just eliminated separation of church and state, made permitless concealed carry mandatory in all 50 states, said that ICE doesn't need a warrant to barge into the homes of 2/3 of the population, and so on. It feels a lot like living in 1930s Germany. The Boomers are not dying fast enough. FDR expanded the court and Lincoln outright ignored it, but all Biden and Pelosi and Schumer can do is wring their hands and tell people to vote harder in future, for more potted plants like them who will do nothing about it ever because the threat of looming armageddon is the only reason to vote for them. The problems are getting worse on their watch. Gerontocracy sucks, we need hard mandatory retirement from politics the day someone turns 65. We already have laws saying you can't vote or serve in offices (or buy things, drive, hold jobs) younger than a certain age; older is exactly symmetrical. Take all the "advisory" positions you like, but an octogenarian is not the guy you wake up at 3am to deal with a crisis. You should not make final decisions about a future you have no chance of participating in when people who DO are all around you. This is Boomers speaking over the people who will actually be doing it.

The best summary I've seen so far is "Democrats who wring their hands, weeping 'oh I don’t agree with it but we’ll lose the election if we fight it right now'." They're worse than useless if they do not fight.

Not feeling like tech blogging today. I envy my wife's xanax prescription for days like this.

June 23, 2022

Fade's game launched today. She's quite proud.

The supreme court did like five separate horrible things this week. Boomer Biden will never expand the court like FDR did, so we're waiting for the Boomers to die. Keep busy in the meantime...

The sh.c file is approaching 5000 lines of code. Sigh. Disappointed. I really thought I could do this in 3500 lines, but there's just SO MUCH weird little corner case functionality...

June 22, 2022

Slept most of today again. Gave up and had a little caffeine. I need to do a proper caffeine detox at some point, but it looks like that's the kind of thing I'll have to actually schedule.

The next failure in "toysh scripts/" is that scripts/ calls "command -v $COMPILER" as a shell builtin equivalent of "which" (one that does NOT have the Red Hat brain damage of outputting "$THINGY does not exist", which is what breaks checking for output to see if the file exists on Red Hat. Yes, they're the people who pushed systemd on everyone else). So I need to implement the "command" shell builtin.

Meanwhile, I'm fixing the next "make test_sh" failure, and backgrounding pipeline segments was easy enough: the test that was already there should have been testing the _previous_ block (because the pipeline redirection happens before the add_block()), although this does raise the question of why "echo hello | x=42" doesn't allow the assignment in the terminal segment to persist in bash. (How it works has apparently changed over the years, but bash doesn't CURRENTLY run that in the current context, presumably to make suspending the pipeline with ctrl-z easier. So I still need to add a test for that.)

But the next problem is the test uses the "read" command, which I hadn't implemented yet. So I've started that, and its $IFS behavior is funky. The current expand_arg_nobrace() splits the argument all the way, and I need it to stop after X splits. I just added a "measure" argument for the funky semicolon handling in for ((;;)) because swapping out $IFS didn't do what I wanted. I kind of need a "do not expand variables" flag, which is awkward because it still has to PARSE through X=abc; echo "${X/b/\};;}" and just not save the results. Which also means it shouldn't call $(subshell) or $((math))...

I've tested that "echo abc potato | read IFS potato oyster" does NOT immediately adjust what $IFS is doing when it sets it: it updates NEXT call. And once again I stumbled across echo $IFS not producing any output because of course. (Have to quote it, it's splitting ITSELF on every character it contains. I remember that one!)

Ah, read is simpler than I thought:

$ read one
$ echo $one
$ read one two three
"one two" three "four five"
$ echo $one
$ echo $two

Not ENTIRELY simple, but it doesn't need the expansion logic. Just iterate through ignoring quotes... backslash escapes?

$ read one two
one\ two three
$ echo $one
one two
$ read one two
> two
$ read one two
one\ntwo three
$ echo $one

Right. Bespoke logic not sharing any of the common plumbing, except possibly the $PS2 prompt? Yes, but it doesn't expand \w and friends in said prompt, just emits it as-is. But the variable SETTING logic...

$ declare -i walrus
$ read walrus
bash: read: @: syntax error: operand expected (error token is "@")

Yup, full song and dance. Modulo NOT updating $IFS partway through the line, which is just more implementation details bubbling up to the surface: collect all the split strings, then assign them to variables in a second pass.

Hmm, the fact that I personally never used bash arrays much, and thus chose not to implement them on the first pass, is starting to loom a bit. Added a TODO for "read -a"...

Blah, the -d option puts read into "raw mode". It's manually echoing stuff back, but backspace writes out as ^? and when you -d x the input stops IMMEDIATELY when you type x (although \ to escape it still works). And the parameter parsing is positional, ala "read one -d x" complains that -d isn't a valid variable name. And when you're in -d mode newline doesn't emit the $PS2 prompt on continuation lines. (Why would I expect consistent behavior from bash?) And although -n and -N also do raw mode and disable backspace, if you \ newline it prompts! And I think -d "" is going to use null (ascii zero) as the delimiter?

The answer to the question "how does it handle typing off the end of the line and then backspacing" is: it doesn't. Your cursor stays at the left edge.

Raw mode requires an interrupt handler to get it back OUT of raw mode when you ctrl-c, which is the first interrupt handler I've needed for a NOFORK command. That's going to take some design pondering, the shell's job control handlers are gonna fight a bit here. Well, I made the toys.xexit signal handler list to be called in sequence, and in theory I can manually pop a signal handler off the end of that. Cleanup and siglongjmp...

I have not figured out what read -i does. As far as I can tell, it doesn't do anything.

I should really take some cycles away from the black hole that is shell development to finish up tar --transform and dd.

Ha! I can send email again! Thunderbird gratuitously changed my mail configuration when it "upgraded" itself, but the old option was still there and I could go in and set it BACK once I dug down to the right place. (Account settings aren't under preferences, they're a different slot in the thunderbird pulldown menu, which unlike everything else is NOT accessible from the top left but is instead the three straight lines at the far right edge, because all things Mozilla ignore how your desktop works and implement their own terrible UI regardless.)

June 21, 2022

Been mostly off caffeine since yesterday. Slept most of the day. Predictably irritable.

Half the books on amazon don't have audiobook versions. I read the text version of the latest books in the Tinker and Good Intentions series, despite the eyestrain, because I wasn't waiting months or years for the audiobook. In theory there's text to speech built into android, and the Kindle app USED TO support it, but Amazon disabled it to get more money from their audiobooks. (Hopefully that link is in english for other people. The amazon website is showing everything in spanish for my laptop browser because it wants to force me to log in to view stuff. Not gonna: amazon's website is broken, so I mostly don't use it.)

But amazon's greed here is self-defeating because they don't HAVE purchasable audiobooks for things like the Ascendance of a Bookworm light novels. Which started life as free web novels anyway (somewhere between a webcomic and Japan's, here in the states "50 shades of grey" and "The Martian" started that way too), which might still be online in Japanese (or fishable out of but those aren't translated into english that I've found). A text-to-speech conversion was being posted one per week on a youtube channel with few enough followers it wasn't monetized (and thus didn't interrupt itself to tell me about Our Lord And Savior The Almighty Dollar every ninety seconds), but this morning it had been deleted from my watch history. (Guessing the whole channel got copyright struck, and of COURSE capitalism means youtube edits it out of your history rather than showing you the "this thing is blocked because $CAPITALIST complained" message like it would if you'd actually bookmarked the link... Yup, they did. I notice that my Android web browser no longer shows "history" at all, it just comes up as a reload symbol. Sigh.)

So I still can't buy an audiobook of this in english, and I couldn't text-to-speech the text version if I bought that, and Late Stage Capitalism wants to block my ability to get it through any OTHER channels despite anime only becoming popular in the USA in the first place due to fan translation. If I go to a library that has story time where they read a book out loud to the audience, that's fine. But record that and put it online (which plenty of people do), and it gets taken down a couple weeks later. (Meaning they never get far out of book one.)

I 100% agree with authors wanting to get paid and have careers, and would happily mail Miya Kazuki $20 if I had her address (or in this case a 2000 yen note, you can get them from any reasonably sized american bank with about 24 hours notice, I've mailed cash to japan in a card before, it's way easier than sending mangoes). But I have very little sympathy left for most publishers, and the problem here is I literally CANNOT BUY what's being blocked.

I found a vietnamese website that will text-to-speech the english version of at least the first two (of five) volumes. Entirely possible they've got it "properly" licensed for vietnam, and it does display image ads on the page, which unlike Youtube plays fine with the screen off so I don't have to care. Of course I would be excluded via "region locking" if this was something like a DVD, but countries the USA hasn't proselytized its religion into quite as effectively don't refuse to staunch your bleeding without payment up front. (The various Monty Python hospital sketches were FARCE, not aspirational.)

But even that wouldn't actually advance the plot for me: they only have through the end of volume 2 and I already watched the anime on crunchyroll (a service I pay for). The first 3 seasons of the Anime made it through the end of volume 2 of the 5 volume completed series, and season 4 hasn't even been greenlit yet, let alone produced and distributed. The web novel's been complete since 2017, and I'm told the web novel, light novel, and anime are all basically the same story. (Yes I've been trying to learn japanese for the past 5 years but I REALLY SUCK at it. Foreign languages were always my academic achilles heel...)

Sigh. Really looking forward to the end of capitalism, as soon as the last Boomer dies and we get to guillotine the billionaires and use their estates to provide universal basic income. (Because taxing them to do that is "unrealistic". Keep in mind you can stop being a billionaire with the stroke of a pen. Their defenders are making the "black lives matter" vs "blue lives matter" mistake: one is a voluntary condition you can resign from at any time, the other is not. Letting rich people continue to exist causes problems.)

(P.S. You have to click the tiny "read more" in that last link to see 90% of what he wrote because Youtube's UI is terrible.)

June 20, 2022

I'm told ELC has started, something like two miles from my house. Haven't swung by to see, because I'm not paying $300 for the privilege. (When I went to submit talks to the CFP on the last day, the website was no longer accepting submissions. *shrug*)

The handyman came by with his tools this morning, frowned at the granite counter, and made an appointment to come back with more tools a week from tomorrow. (It's not "smooth down a bump", it's "a two foot segment needs a millimeter shaved off of it". Which is apparently a hard problem because granite.) I didn't go to the table last night so I could be up this morning, but wound up going back to bed and sleeping until about 2pm anyway.

I got email back from the middleman organization asking for a screenshot of the pop-up about the financial services company rejecting the bank transfer, which is kind of awkward because it's a self-dismissing pop-up with a timeout somewhere between one and two seconds. And said screenshot would have my banking info in it. Hmmm...

A patch series wandering by for the linux kernel's nolibc.h got me looking at it again: it's no longer one big file but got broken up into a dozen individual .h files with nolibc.h just being #includes of all the others. This includes arch-*.h files for aarch64, arm, i386, mips, riscv, and x86_64. I.E. those are the only real architectures that are actually supported. I poked Rich about adding support for arch/sh TWO YEARS AGO, but it never happened. Oh well. (I tried to do it myself, but mapping the musl entry point assembly to the nolibc assembly was nonobvious and the emails I sent to Rich and pokes on IRC got an "I'll look at it" which was... a year and two months ago.) Sad, but I'm overcommitted and I think I have to let that go...

Replied to two emails from Elliott on the list via gmail web login. I should have started the migration today but it's 7pm already. Maybe tomorrow.

Finished implementing the for ((;;)) math logic, for both type 1 and type 2 blocks. In the case of loops, type 1 does the setup and type 2 does the later passes through the logic. It isn't doing a $X variable resolution pass at ALL yet, because I haven't figured out where to put it (see yesterday), but the basic for ((i=0;i<5;i++)) stuff should work: the math logic itself can read and set variables now, and I've filled in all the operators the bash man page mentions.
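Quick sanity check of the baseline behavior against bash itself: clause 1 runs once on the way in, clauses 2 and 3 run on every pass through the loop.

```shell
# The basic C-style for loop: should print 0 through 4.
bash -c 'for ((i=0; i<5; i++)); do echo $i; done'
```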

While I was there it REALLY looked like various cleanup stuff wasn't freeing the arg.v array out of sh_process structures, but when I added frees for that it threw double free errors, and now I'm going "but if they got added to the delete list how does the realloc not corrupt that when it goes over 32 entries and resizes?" So now I have a todo to trace through my own infrastructure again just because it's been so long since I've looked at it... (Lifetime rules: who frees what. It's a big deal in C, and I need leak detection plumbing I can run after feeding big scripts through it that tells me what file and line number each leftover chunk of memory was allocated on...)

And yes, I've still got that TODO to make it so $((x=42)) assignments that change variable values in the middle of resolution don't screw up the metrics or lifetime rules, but I'm trying to get to a good stopping point so I can throw some cycles at dd and tar --transform and so on. Next release is due August 3, and I'd like it to be on time for once.
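For reference, the bash behavior in question: the assignment both yields a value and has a side effect that's still visible after the expansion finishes.

```shell
# $((x=42)) resolves to 42 AND sets x in the current shell
bash -c 'x=0; echo $((x=42)); echo $x'
```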

June 19, 2022

Sigh, I can't use IFS splitting to implement for((;;)) because that's only done for expanded contents:

$ bash -c 'IFS=@; echo one@two'
one@two
$ bash -c 'IFS=@ X='one@two'; echo $X'
one two

Which this very much isn't:

$ Z='x=1;x<5;x++'; for(($Z)); do echo $x; done
bash: syntax error: arithmetic expression required
bash: syntax error: `(($Z))'

But for some reason if you backslash escape or quote the semicolons in for((;;)) they don't count and it complains about expecting an arithmetic expression. What variant of parsing are they doing here?

$ for ((${x/;/;;};;0)); do echo hello; done | head -n 3

Nope, it's pretty much GOT to be doing the full variable expansion logic just to FIND the semicolons, but it should not be modifying anything, just traversing it, measuring how far we get until we hit a specified terminator. Hmmm, gotta edit the variable expansion plumbing to add a new mode for this.

Next question: are the variables expanded just the first time, or are they expanded every time? Looks like every time:

$ X=1; for ((x=0;x<10;x+=$X)); do [ $x -eq 5 ] && X=2; echo $x $X; done
0 1
1 1
2 1
3 1
4 1
5 2
7 2
9 2

But... not competently?

$ X=i; for (($X=0;++$X;(i==5)?X=y:1)); do echo i=$i y=$y; done
i=1 y=
i=2 y=
i=3 y=
i=4 y=
i=5 y=
$ y=3 X=i; for (($X=0;++$X;(i==5)?X=y:1)); do echo i=$i y=$y; done | head -n 10
i=1 y=3
i=2 y=3
i=3 y=3
i=4 y=3
i=5 y=3
i=5 y=3
i=5 y=3
i=5 y=3
i=5 y=3
i=5 y=3

I do not understand what bash is doing there. Clearly X is getting updated because $i stops incrementing, but $y doesn't START to increment? (When $X switches over to resolving to y, shouldn't the "test" part of the for loop then become ++y which should resolve to 1 its first time through so the first one wouldn't stop...?)

Sigh, emailing Chet would involve logging into the gmail web GUI. Dowanna. I should start my email migration monday morning (when dreamhost tech support people are presumably around.)

Syntax error doesn't just abort current statement:

$ toybox sh -c 'for i in one ${j?x} two; do echo $i; done; echo next'
sh: j: x
$ bash -c 'for i in one ${j?x} two; do echo $i; done; echo next'
bash: j: x

June 18, 2022

I've been attending the wednesday call with Jeff Dionne's team, but I think I'm out now. The google sponsorship to focus on toybox means I'm focusing on toybox, and it just developed into too much of a conflict.

Jeff's team is making a j-core ASIC through the skywater 130nm fab process Google was poking at last year. They submitted a minimal test chip to the last shuttle a couple months back, and now they're trying to get a fully fleshed out version ready for the next shuttle deadline, which could potentially be used as-is to make the mask for a production chip sellable as a real mass-produced product if all the tests pass on the resulting shuttle chips...

Um, context: a "shuttle run" is a special shared mask with dozens of individual chips glued together into one big pattern, so when you burn a wafer from it instead of getting thousands of the same chip (repeated over the surface of the wafer like bathroom tile, to be cut out and packaged into the little ceramic/plastic cases with metal pins sticking out) you get fifty different kinds of chip but only a few hundred of each. A fab runs a shuttle to cheaply test out new designs, because instead of paying for creating a mask, mounting it in the fab, burning wafers from it (a multi-step process involving more-than-lasers and repeated applications of various toxic chemicals), and cutting the wafer apart into individual chips... you instead SPLIT the cost with everybody else on the shuttle. So if they can round up 49 other designs you each pay 1/50th of the total. (And the fabs generally run shuttles more or less at cost because it's a marketing expense instead of a profit center: this is how new designs get tested out, often requiring the designers multiple tries to get it right, but it's where new designs they WILL profit off of come from. There are also various programs where educational institutions pool together a bunch of class projects to run a shuttle for their graduate students learning chip design, those are the cheapest but small business can't necessarily get in on those.)

Shuttle runs vary per fab, the big expensive ones might only do one shuttle annually, and the oldest cheapest fabs may do one every couple weeks. (Basically the more you make of the SAME chip, the less often you have to run shuttles.) And you're only dividing the costs, not eliminating them: an old process where a mask costs $50k may only cost a couple thousand dollars to submit a design to a shuttle and get back a couple hundred chips. A shuttle slot on a modern process with a $5 million mask is still going to cost as much as a small building even split 50 ways. There's a lot of supply chain complexity I'm glossing over here (getting a useful chip packaged and mounted on a board has a lot of steps), but that's the gist of it.

So anyway, I've still been dialing into the wednesday calls despite no longer working for Jeff. (He stopped paying me in january because the Google guys thought the budget allocation approved back in november would be available to spend in February so I could start doing toybox stuff full-time. I offered to keep working for him part time for a while, but that's not what he wanted. So I got paid by _neither_ of them for a while, while still working on both projects. Wheee...)

I was still trying to help out the j-core open source project, except... is there one? The website roadmap still says "coming in 2017" (the plans have changed but nobody bothered to update the website because it's not important), the project's github says zero contributions in the past year (plenty of stuff was committed in private repositories, but not published), the mailing list's web archive was still up last I checked but the mailserver that lets you post to it has been down for YEARS now despite me bringing it up on EVERY WEEKLY CALL for SIX MONTHS after it went down, because Jeff was happier with it down...

Over the years there were I believe two instances of bikeshedding on the mailing list where somebody wanted to talk about their project instead of ours and wouldn't shut up about it until I unsubscribed them from the list. This seems to have confirmed Jeff's belief the rest of the world is too dumb to help out with his project, and that trying to teach people stuff they don't already know is just a distraction (after all, HE learned it on his own)... So we haven't had an active open source community around the project for years. Instead the project did trailing abandonware releases where we'd work for months or years before anybody else heard about it, and by the time they get to see something there was no way they could even usefully comment on what got released, because we'd long since moved on and what got published was no longer close to the version we were working on.

Anyway, wednesday before last Jeff needed a piece of toolchain glue software made, and when I didn't volunteer for it after repeated hinting he assigned me work, of the "learn these two file formats and convert one to the other" variety. Which would mean I'd have to stop what I'm doing, focus on his thing instead, and then go back to the toysh stuff I've spent all this time loading into my head... and I didn't. I have a window of opportunity to get toybox to a 1.0 release (or at least WAAAAY closer), and I am TAKING IT. So I didn't respond to his texts about how it's going, and I didn't dial in to this wednesday's group call. I feel bad for ghosting him, but honestly don't know how else to handle it. (I don't want to contradict/undermine him to the rest of his team, but I am currently on a different project and ain't got the spoons for this.)

The work Jeff wants me to do is good work: Google's open toolchain stuff is organized by a guy named Mithro who does not seem to have actually ever made an ASIC before. He may have handed verilog source over to a third party, with a large check, so they could take a design that ran on an FPGA and turn it into an actual fabbable ASIC design. But that's a bit like porting a C program from gcc to tinycc, from glibc to bionic (the version from "kit-kat"), from 64 bit to 32 bit, from little endian to big endian, from Linux to a nommu RTOS that's multi-threaded instead of multi-process (every program in the system is a thread all running together simultaneously in one big app), and the new target fails on unaligned access (silently, without faulting). All at the same time. And then you hit weirdness like "no ELF segment can be longer than 64k" that doesn't throw an error, just silently fails, and you have to track it down.

This is buckets of work not just requiring experience and expertise, but access to carefully guarded IP trade secrets (each fab hoards their tech specs like a dragon's jewels), and there's a lot of for-profit companies happy to do this work FOR you while completely hiding every detail of what they actually DO and making you sign an NDA just to talk to them. The whole point of Darpa's Open Road project was to make a fully open ASIC toolchain that actually worked with modern inputs and targets. And then Google's Open Lane project forked that to make it much more complicated, or something? Anyway, there's politics, and I haven't really been able to keep track because it's not my soap opera. Jeff was trying to make the older "qflow" work, but unfortunately the maintainer of qflow was hired to maintain OpenRoad/OpenLane instead, and his cycles are mostly going into the new packages rather than the old ones.

Unfortunately instead of making things LESS proprietary and confusing, Mithro's project seems to have spawned yet another of these middleman companies who take your source and run it through their black box to come up with a magic file that goes to the fab. (No seriously, one of the important tools here is literally named magic.) How Mithro can hire consultants to make things more open and understandable, who then form a consulting company based on those things NOT being open and understandable enough for you to do it yourself instead of paying them to do it... what does success look like here? (Success for whom?)

The specific problem Jeff currently wants me to solve is that the timing information about how long signals take to go through each part of the routed circuitry does not always get propagated across file format conversions, so when you package laid-out circuitry into a "component" that gets reused (just stamped as-is into various bits of the design like a chip being wired into a board would be), the delay between signals entering the component and resulting signals exiting the component is not properly tracked.

A number of users are saying "we've got nothing here, it doesn't work", and the Google/Efabless guys complain that the proposed fix isn't a rigorously complete solution to the entire problem space. Mithro's guys imply there's some mythical perfect that's the enemy of this good, so they won't implement an imperfect solution that people would then (horrors!) start actually using. They're not going to implement this "full" solution any time soon either, and haven't even suggested a design for it that I'm aware of. They're just using it as an excuse NOT to fix this problem.

Alas "Don't ask questions post errors" doesn't work when it's not a real open source project, but one of those proprietary-development-that-leaks things with all the devs working on it being paid by the same employer and outsiders being unable to usefully contribute. They just close the bug on you because you are an invalid user with an invalid use case since you don't have an office in their building like everyone else on the project does, so they never see you in the cafeteria where the real design discussions happen. There's no threat you'll implement the "wrong" thing because YOUR code will never make it into THEIR project anyway.

So Jeff wants me to route around them and implement a converter tool out of tree to generate component timing files from the stuff the components themselves are generated from, I think? Bonus points if IBM and ESA can use the result. (Bolt it to the side of their toolchains, so what if it doesn't make it upstream? It's open ENOUGH.)

Oh here, I'll just cut and paste my assignment from Jeff's email:

Liberty files contain timing information for the interface to cells, macro cells or standard (PDK library) cells. The PDK has .lib files provided by the Fab, but the OpenROAD tool flow does not know how to generate .lib files for e.g. a CPU macro cell. The OpenROAD maintainers say that .lib files are insufficient to allow hierarchical design, and others suggest that .lib files are at least better than a guarantee that any sufficiently complex result will not work.

What we need to do, at least as a first step, is extract the timing information from the timing closure report files generates when we build macro cells, and put that into .lib format for Yosys and OpenROAD to read when those macros are used. This will guide the tool flow when it tries to meet the timing requirements of the macros, as opposed to now when it simply wires macros together and hope.

This GitHub issue describes what needs to be done, search for Anton’s message containing “I've been trying to build up the liberty files from STA output”. The timing file for a timing constrained run of the J1 cpu is attached also. Liberty file docs and a link to a Perl script that creates an (empty, no timing information) .lib file from a Verilog netlist.
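For my own notes, the general shape of that converter is a report-scraper: read the delay lines out of the STA output and wrap them in a Liberty cell stanza. Everything below is invented for illustration (the report format, the pin names, the j1_cpu cell name); real OpenSTA reports and real .lib syntax are both considerably more involved.

```shell
# Toy sketch: scrape delay lines out of a MADE UP timing report format
# and emit a Liberty-ish cell wrapper around them. Just the skeleton of
# "extract timing, reformat as .lib", not real Liberty syntax.
awk '
  BEGIN { print "cell (j1_cpu) {" }                 # hypothetical macro name
  /^delay/ { printf "  /* %s: %s ns */\n", $2, $3 } # one line per timing arc
  END { print "}" }
' <<'EOF'
delay clk_to_q 1.2
delay setup 0.4
EOF
```

The real version would need actual Liberty pin/timing group syntax for Yosys and OpenROAD to consume; this just shows the extract-and-reformat plumbing.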

I admit that after years of both GNU/FSF shenanigans and the "libertarian" political movement having a pipeline to fascism, I'm primed to be suspicious. (Imagine a spherical frictionless electorate that respected John Galt enough he didn't take his ball and go home so society collapsed without him you'll see really it will he's the one and only Great Man of History without which progress can't happen, what do you mean taking your economic theory from Ayn Rand's fiction is like preparing to enter the confectionary business by studying "Willy Wonka and the Chocolate Factory" frame by frame?) Anyway, at this point I kinda distrust anything called a "liberty file" on general principles. But that's a rounding error here.

I am very sympathetic to Jeff's goal. I want to see this done. If we had any kind of open source community, I'd wave it at them to see if anybody wants to paint Tom Sawyer's fence here. Apparently IBM's Open PowerPC project, the European Space Agency, and the Skywater fab people would ALSO like to see this done, because they want to run designs through skywater using OpenWhatsis. The guy who did Qflow got hired to oversee a massively overcomplicated from-scratch rewrite that combines a hundred different packages in a giant docker file containing many different bespoke tools written in over a dozen different languages, and just downloading it requires a fiber connection and several hours. (Basically a different part of Google has recreated the Android build process again, to the surprise of nobody.)

Unfortunately, the Google development team that's accreted all this code keeps closing this bug-report-as-feature-request. I told Jeff weeks ago that he should try to work with the IBM/ESA/Skywater guys already agitating to get this fixed, and see if they might want to fund his team to do any of this work. But that would involve building up relationships and publishing his findings where people can pester him with bikeshedding, and writing a white paper about "here is the problem, here's what should be done about the problem" and circulating it to people who might potentially fund work on the problem (like the endless rounds of VC funding the entire time I've known him)... And he's focused on getting a tapeout in time for the next shuttle deadline.

Right now, I'm focused on getting toysh to run scripts/ inside a mkroot system so I can actually implement the tests requiring root access and a specially configured system (insmod tests requiring the "dummy" driver, and so on). I am BUSY. I am not available to be assigned work I did not volunteer for. Yes I _can_ read the liberty file specification and come up to speed on a new file format and convert data from one text file format to another in python... but it would flush all the carefully rebuilt toysh state right out of my head.

June 17, 2022

Sigh, I guess bash's echo $((8*y=42)) behavior is a priority issue, with the multiplication binding before the assignment. Still annoying, but alright. (The only OTHER variable modifying operations bash implements are pre and postincrement, which are higher priority than everything else. TODO: ensure (x---y) parses as (x-- -y). Yes, it does.)

Ok, pruning the config probes Elliott pointed out on the list. I removed most support for uClibc in 2017. The HASTIMERS one is a glibc bug work around. Couple of 7 year support horizon ones, and most of the rest can use __has_include() to check for a header's existence (since gcc 5.1, llvm has it too)... That leaves two probed symbols: TOYBOX_FORK and TOYBOX_ON_ANDROID.

The TOYBOX_ON_ANDROID compile time probe is just #ifdef __ANDROID__ so I've already got a build-time symbol to check, but the kconfig "depends on" plumbing needs a .config symbol, and right now two commands (log and sendevent) depend on that, and one command (mkpasswd) depends on it NOT being set. I just converted a couple other symbols the same way on the theory that the platforms that weren't setting those symbols already can't build defconfig, but in this case it's vanilla Linux that can't build toys/android/{log,sendevent}.c so removing it would build break vanilla defconfig. I can come up with workarounds: I could make an IS_ANDROID() wrapper and put it around the NEWTOY() line or something, so it would compile but then get dead code eliminated because the multiplexer wouldn't have a reference to it in the toy_list[] command table... but that's awkward. I could also #ifdef __BIONIC__ around the bionic-specific function calls in each command_main() and just leave the command there as a NOP on vanilla Linux. (You can run it, but it wouldn't do anything; maybe print "needs bionic" or something instead and exit with an error?) But that's even _more_ awkward. Then again, the android menu being empty on vanilla linux without selinux is kinda awkward already. Commands vanishing from the menu without explanation isn't the world's most elegant solution, it's just the EXISTING one that seems comfortable because familiarity breeds contempt. Lemme think about that...

As for TOYBOX_FORK, possibly I could check __FDPIC__ instead? At the moment, if the compiler is building an fdpic binary, that means it's nommu and I should use vfork() instead of fork. Two problems: 1) there are binflt (a.out based) nommu binaries that some old school hardware guys still cling to (and that's an aftermarket modification to the linker so no built-in compiler #define for that), 2) in theory fdpic could also be used as improved ASLR security stuff someday, where instead of just the program's start address being randomized, the four main segments (text/data/rodata/bss) can all be independently randomized relative to each other. This didn't happen because x86 and arm didn't implement fdpic and the userbase of every other architecture combined is less than either of those... but arm finally merged fdpic support because Linux on cortex-m is a thing, so it could START to happen.

(Note: the original fdpic support patches were against the main ELF loader, but conservative kernel developers forced the work into a fork for the same reason ext3 and ext4 were forked into separate drivers from ext2: because they were afraid of bugs in well-used infrastructure so they wanted to split their testing base and make sure the new stuff got far less use and gratuitously duplicated infrastructure in the name of "stability". Eventually the ext3 driver got deleted, but ext2 and ext4 are both still there despite ext4 mounting an ext2 filesystem just fine last I checked..? The point is, the fdpic driver should not only be able to load conventional ELF binaries, it should be able to replace the old ELF loader entirely if the Linux development process hadn't become increasingly brittle over the years.)

Leaving both probed symbols in place for now. For one thing, still having the probe infrastructure in place is useful in case I do come up with more stuff that needs it. (Not gonna retain infrastructure that isn't being used, it would bit-rot without natural regression testing...)

Speaking of bit-rot, I haven't been running "TEST_HOST=1 make tests" enough. Specifically, the upgrade from devuan bronchitis to devuan chlamydia seems to have introduced a BUNCH of regressions:

$ TEST_HOST=1 VERBOSE=allnopass make tests 2>&1 | grep FAIL | wc

Which could be anything, it just means the output no longer matches exactly, but digging into them can open cans of worms. For example, "cmp" now says "after byte %d, line %d" when a match fails, which is easy enough to add, but it doesn't ALWAYS say it:

$ cmp <(echo -en 'ab\nc\nx') <(echo -en 'ab\nc\n')
cmp: EOF on /dev/fd/62 after byte 5, line 2
$ cmp -l <(echo -en 'ab\nc\nx') <(echo -en 'ab\nc\n')
cmp: EOF on /dev/fd/62 after byte 5

The point of -l is to show differing bytes (apparently with gratuitous indent in their version, for no obvious reason):

$ cmp -l <(echo one) <(echo three)
                  1 157 164
                  2 156 150
                  3 145 162
                  4  12 145
cmp: EOF on /dev/fd/63 after byte 4

Why does it modify the EOF message? No idea!

And while I'm at it, seeking off the end isn't an error anymore?

$ cmp <(echo -n potato) <(echo -n "") 99
$ echo $?
$ cmp <(echo hello) <(echo also) 123
cmp: EOF on /dev/fd/63 which is empty

And of COURSE it's going to special case that EOF message rather than saying byte and line as normal. The message is wrong, the file wasn't empty, we just skipped past the non-empty part, but skipping past it didn't advance the byte count displayed because it's "of the bytes we compared" rather than "of the input". The GNU/FSF is as always choosing the least useful available metric.

The problem here is that A) the behavior of the GNU/FSF tools changed out from under us due to version skew, B) the new behavior is stupid, C) I can't even appeal to posix because they were ALREADY violating posix (saying "byte" instead of "char", no of COURSE this tool isn't unicode aware)...

Making TEST_HOST work is a generally hard problem because "host" isn't well defined, and is a moving target over time. (I should update to devuan diphtheria more promptly when it comes out, but probably won't.) What does TEST_HOST do when run on alpine, thus with busybox? I'd LIKE the tests to all pass there too, but it's not the same capabilities let alone the same output. I can regex and $NOSPACE my way past some of the rough edges, handwaving that my/their output is "close enough"... but what counts as close enough? What exactly am I supposed to be testing? Fundamentally this is an aesthetic issue! No right answer! Where the real-world rubber hits the road is "the scripts people run through this pass" (but when should the scripts be changed vs the tools?), plus filtering through the complaints people send me (where "filtering" is fundamentally aesthetic). I lean towards "provide the behavior the largest existing userbase expects", but what about when that's a moving target? Right, circle back to this one later. (I could literally spend YEARS on the test suite. I kind of AM already, since scripts/ under mkroot is motivating my current round of shell work...)

Sigh, I guess bash's echo $((8*y=42)) behavior is a priority issue, with the multiplication binding before the assignment. Still annoying, but alright. The only OTHER variable modifying operations bash implements are pre and postincrement, which are higher priority than everything else so you just check at the start of the function without paying attention to the priority level. (Does (x---y) parse as (x-- -y)? Yes, it does: although -y is higher priority than x-- the only other interpretation of that is x - --y and non-prefix subtraction is WAY lower priority, so relative priority of postdecrement vs prefix subtraction is funky because both operators are positional and bind in only one direction. Wheee...)
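That tokenization is easy to confirm from the command line:

```shell
# x---y tokenizes greedily as x-- - y: postdecrement binds to x first,
# then binary minus. With x=5 y=2 this yields 5-2=3 and leaves x at 4.
bash -c 'x=5 y=2; echo $((x---y)); echo $x'
```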

In toybox ./sh -c 'help $COMMAND' is working for : and . but not for [ which is... only happening in the standalone build, if I build a full toysh "help [" works fine. Ah, the only reason ./sh -c '[ 1 -lt 2 ] && echo yes' works in the standalone build is because it's calling /usr/bin/[ out of the $PATH. Which means the "sh" special case configuration plumbing in scripts/ to enable shell builtin symbols from other files is only working for NOFORK commands not for MAYFORK commands... except looking at it, the sed invocation looks for TOYFLAG_MAYFORK but not TOYFLAG_NOFORK? Which is the OPPOSITE of what I'm seeing? How are the NOFORK symbols working now in standalone builds? (I test those a lot, "make test_$COMMAND" tests the standalone build of that command, "make tests" is what tests the multiplexer.)

June 16, 2022

I remember groggily telling Fade where to find the USB A-to-C cable in my backpack (I was still awake for the quarterly bug spraying guy to do his thing as she was heading out to the airport), which explains why when I got to the table tonight my backpack did not contain the cable to tether my phone to my laptop. I have another somewhere, no idea where, other than "not with me now". The wifi around here was still too congested to actually work last I checked, the university environment hitting the limit on the number of access points returned in a single scan. Dunno if it's debian or linux, but instead of returning what it can the scan returns an error because the buffer is too small or something, and the wicd network manager does NOT retry with a larger buffer. But it's been at least a year, maybe they fixed that? Which would mean my limiting factor is draining the phone battery, still not ideal...

Got to a good stopping point on the shell stuff and checked several changes in... and then dove right back in because I'm partway through upgrading the "math" plumbing. (I was weak! Yes my email is still broken. Yes I have Elliott's todo list and should be removing the compile-time probes, but... toysh!)

Dear bash: when does a syntax error abort processing? Sometimes it does:

$ echo ${abc?blah}; echo here
bash: abc: blah

And sometimes it doesn't:

$ ((0+)); echo $?
bash: ((: 0+: syntax error: operand expected (error token is "+")
1

The second one SAYS "syntax error", but that's not even a special error return distinct from normal "false".

And for some reason ((math)) doesn't change the "$_" last argument?

$ echo hello
hello
$ ((1+2)); echo $_
hello
$ ((1+2)); echo $_
hello
$ ((1+2)); echo $_ there
hello there
$ ((1+2)); echo $_ there
there there

I got stumped by bash's integer conversion behavior for a bit, but I think I've got it sorted out now. (I can't really complain that x=1a2; echo $((x)) had a non-obvious error message if I plan to just say "bad math" with the equation string and an offset of the character it gave up trying to parse. Although in this case that might have been less confusing...)

Trying to work out how to handle infinite math loops without adding another argument to recalculate() to track recursion depth:

$ x=y; y=x; echo $((x))
bash: y: expression recursion level exceeded (error token is "y")

And it IS full recursion:

$ x='1+2'; echo $((x))
3

No, x is not an integer variable resolved at assignment time. The calculate logic is being called on the value of x to turn it into a number. You can tell it's being resolved to a number rather than expanded inline in the equation:

$ x='1+2'; echo $((x*3))
9

Otherwise the precedence rules would have said 1+2*3=7 instead of (1+2)*3=9.

$ echo $((++(q)))
0
$ echo $(((q)--))
bash: (q)--: syntax error: operand expected (error token is "-")

Well, we must be thankful for small mercies, I guess. I was going "oh no" about supporting that, but I guess in bash lvalues are not transitive. Hmmm, which suggests the man page's precedence table is wrong: assignment is only detected right after a variable gets resolved.

So $(()) is zero, and not an error, but $((,4)) is an error. Why? If an empty expression is valid, then why is an empty expression invalid? (I'm not looking for truth here, I'm looking for CONSISTENCY!) Sigh, "toysh accepts things bash can't handle" is a deviation I've been willing to make before. The question is whether $((1+2,)) is a common error they're really guarding against (resolves to "0" because the right hand is $(()) which becomes "0"), or was their math parser just inconsistently implemented? When is it or isn't it ok to have an empty expression?
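For contrast, the comma operator itself works fine when both sides are nonempty, and the fully empty expression really is zero:

```shell
# comma evaluates left to right and yields the right-hand value
bash -c 'echo $((1+2, 10))'
# empty arithmetic expansion resolves to 0, not an error
bash -c 'echo $(( ))'
```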

On a related note:

$ bash -c 'x=3;echo $((x?:7))'
bash: x?:7: expression expected (error token is ":7")

Bash does not implement the gcc "test ? : other" extension where a nonzero value can be used without calculating it twice by just omitting it in the middle there. I kind of think I want to on general principles?
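The spelled-out form works fine; it's only the elided middle that bash rejects (same semantics as the gcc extension, just evaluating x twice):

```shell
# full ternary: picks x when x is nonzero, 7 otherwise
bash -c 'x=3; echo $((x ? x : 7))'
bash -c 'x=0; echo $((x ? x : 7))'
```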

Hmmm, I need to be able to suppress assignment, for ? : and for the short circuit behavior of && and ||. What's a good way to signal that to a recursive call? (Dowanna add another argument...)

Wait, bash? What are you doing:

$ echo $((8*y=42))
bash: 8*y=42: attempted assignment to non-variable (error token is "=42")
$ ./toybox sh -c 'echo $((8*y=42)); echo $y'

I'm trying to work out tests for the relative priority of assignment (the lvalue requirement means it can't easily be part of the normal lvl test stack so it's extra fiddly to get the priority right; all of the OTHERS are highest priority and this is near-lowest, although possibly all I need to do is pass on the current "lvl" to the recursive evaluation call to get the rvalue) and bash is NOT SUPPORTING IT AT ALL? Why even LIST a priority for assignment in the bash man page when it can ONLY GO AT THE START OF A BLOCK?

Nope, bash is mathing wrong. I'm following C here, at least as far as I've already implemented. And what is this nonsense?

$ echo "$((!0))"
bash: !0: event not found
$ echo "$((\!0))"
bash: \!0: syntax error: operand expected (error token is "\!0")
$ echo "$(("!"0))"
bash: !"0: event not found
$ echo "$(("\!"0))"
bash: \!0: syntax error: operand expected (error token is "\!0")

The bash man page has ! listed as logical negation, but there's no obvious way to USE it! (Why can I "echo !" just fine if it's the last thing on the command line?) Hmmm...

$ echo 'echo "$((!0))"' >
$ chmod +x
$ ./

Really? REALLY?
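
What the script demonstrates, for what it's worth: "event not found" is csh-style history expansion, which interactive bash runs before the math parser ever sees the line. Scripts and bash -c have it off by default, so there the arithmetic ! works fine:

```shell
# non-interactive shells don't do history expansion, so the math
# parser actually gets to see the ! here:
bash -c 'echo $((!0))'   # prints 1: logical NOT of zero
bash -c 'echo $((!5))'   # prints 0
```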

$ echo "$(("'!'"0))"
bash: '!'0: syntax error: operand expected (error token is "'!'0")

What even is going ON there?

$ echo "$((""0))"
$ echo "$(("''"0))"
bash: ''0: syntax error: operand expected (error token is "''0")
$ echo "$(("''"))"
bash: '': syntax error: operand expected (error token is "''")
$ echo "$((''))"
bash: '': syntax error: operand expected (error token is "''")

There _isn't_ a "''" token in that last one.

$ echo "$((""))"

I'm angry enough I need to step away from the keyboard for a bit. Bash is constructed from scar tissue and held together by spite. (MY PARSER PARSES THIS! MY PARSER IS BARELY A PARSER! YOU SHOULD NOT BE LOSING TO MY PARSER!)

And back. I understand why "$(("''"))" fails because of the recursive context, but that's WHY "$((''))" should not fail: the '' in the middle is a NOP empty quote within the $(()) context.

$ echo $(("1"+2))
$ echo $(('1'+2))
bash: '1'+2: syntax error: operand expected (error token is "'1'+2")

Bash is special casing double quotes within their math parser, but did not implement single quotes. Why?

Having looked at how chrome renders the above, you can't tell two single quotes '' from one double quote " in the default font. That's just... *chef's kiss*. (Luckily <pre> blocks are monospaced.)

June 15, 2022

Napped with Clingy Dog yesterday afternoon after Fade left, and slept through until morning. I guess I needed it? (Still not QUITE recovered from this nearly month-long cold, but... better than I was? The need to clear my throat every 20 seconds yesterday has backed off to every five minutes today. Progress!)

I decided to invoice the middleman for Q3 a couple weeks early, in case it went weird again. (Third quarter is July, August, September, and July starts 2 weeks from Friday.) Got a pop-up saying Wise is "not currently accepting payments to this recipient" when I clicked "submit" on the invoice. I hadn't even approved the invoice yet, this was their system refusing to let me CREATE the invoice. Confirmed I selected the ACH routing entry instead of the Swift routing entry in the pulldown. (Both are still there, giving identical account numbers and nothing else in the description, but I guessed it was the second one and then confirmed the fields it had filled in.) Wrote an email to the support person from last time. Then I cut and pasted said email contents over to the gmail web interface (manually filling in the to, cc, and subject fields, and apologizing in a PS that I dunno how to set reply-to through the gmail web interface to retain threading) and sent it from there because thunderbird and gmail's smtp servers are still talking past each other.

Then I went to to check how late Best Buy in Mueller center is open, and it said I don't have access to google maps. Oh right, I was still logged into my gmail account to use the web interface to send email. Clicked the pulldown menu at the top of the "you do not have access" page's nav bar and selected "log out", which then loaded a page saying:

400. That’s an error.

The server cannot process the request because it is malformed. It should not be retried. That’s all we know.

Ok, opened a new tab, typed "" into the URL bar, and selected "log out" from there, and THEN I could pull up google maps. (Yeah, I should have started the dreamhost migration process today. Maybe tomorrow.)

Composed a quick email to Elliott about the 400 error, partly "my day continues as expected" and partly because he might know who to report it to (the website is in THEORY still Google's core business), with a cut and paste of the full URL to the failure that I didn't include here because I don't know how secure the "security token" is. Which of course triggered the same darn thunderbird login pop-up web window that does not work trying to send it. (Muscle memory.) Discarded the email rather than logging BACK into gmail to try to send it from there, largely because it's not really his job and I don't want to bother him. (I have played the role of "bug concierge" often enough not to want to saddle other people with it.)

Pretty typical wednesday so far.

I recklessly, flagrantly, and with malice aforethought allowed my phone to download and apply a new system update during all of this. Living on the edge! Haven't noticed breakage yet, but it's usually package updates that break stuff on Android, not system updates. (Google pray is not allowed to auto-update ANYTHING.) Elliott and company do a pretty good job with the base OS layers (from my perspective anyway) and within a release it's just security exploit mitigations rather than interface changes.

The phone's behavior seems exactly the same so far: I still have to select the Rock Sugar album's ogg file twice in the Google "Music Player" app because the first one always immediately closes itself. No idea why, dunno why the second one works, and it's not a rathole I'm currently equipped to go down. I haven't understood why the "back" button now needs to be hit three times to work these days either. (It used to work every time, then a couple of years ago it started regularly ignoring the first hit and needing to be double tapped. Now it's three times.) No, I never did get the spurious invocations of "Google Assistant" from worn headphone cables fixed either: after about eight years of putting up with it with no acknowledgement or fix from upstream I gave in and switched to bluetooth headphones.

As I said, pretty typical wednesday...

June 14, 2022

Email broke, as described in the PS on the mailing list. At a guess, thunderbird assumes that the interactive web login thingy will only ever be two pages, but mine has that disambiguation page. I ACTIVELY don't want thunderbird to upgrade to the new authentication method, I want to continue using the old login method for this old protocol, but I didn't get asked. It's a bit like sitting down in your car in the morning and finding the steering wheel's been replaced overnight with a touchscreen/joystick combination. I did not ask for this, now is NOT THE TIME, but it would be an ordeal to even TRY to get the old thing back...

Fade flew back to Minneapolis for a week, in part attending the 4th Street convention this weekend. The dog is AMAZINGLY clingy.

I've slowly migrated to a night schedule over the past week, largely because I've finally recovered enough from this cold to head out to the table three times in the past 7 days. (Not on consecutive days, but I'm only so recovered.)

June 13, 2022

Before World War II the T-shirt was considered underwear to be worn under other clothes, and was inappropriate (even explicitly illegal) to show in public. (Men were also arrested for going topless until 1936.) The two world wars chipped away at the taboo status of t-shirts, especially the south pacific campaign of World War II stationing thousands of soldiers in damp tropical heat, until it was merely rude/uncouth to wear a t-shirt without another shirt over it (meaning old people remained horrified and teenagers didn't care). Finally in the 1954 movie "On the Waterfront", a young Marlon Brando wearing a t-shirt with nothing over it did for t-shirts what Don Johnson did for beard stubble in 1986. It went from "embarrassed to be seen like that in public" to "the hottest new fashion trend" in a matter of weeks, and the layers of clothing people were required to wear in "polite company" shrank. (Geezers in legacy positions still kept t-shirts out of corporate dress codes and fancy restaurants until they died, but even that seems to have mostly worked its way through.)

I consider this progress, and expect all clothing/nudity taboos to eventually go that way. (This doesn't mean you won't be able to wear white cotton gloves and a top hat if you want to, just that you won't be required to nor punished for not doing it.) We live inside climate controlled buildings with central heat and central air conditioning, most often conveyed between those buildings in climate controlled vehicles, yet we "protect" ourselves from this artificial environment because legacy religious teachings held in place by inertia say it's simultaneously "shameful" and "immodest" not to. (Isn't modesty the opposite of bragging? So not hiding your body is bragging? Failing to be ashamed... is bragging? Does this remind anybody else of fascism's "the enemy is weak and strong at the same time"? It's shameful and not hiding it is bragging.)

The distinction between "underwear" and "swimwear" covering the same areas is that one is taboo and the other is on billboards and magazine covers: neither actually helps you swim. There's also no categorical difference between "woman in hijab would be stoned to death for showing her face" and the american vestigial need for a bikini/speedo to avoid arrest for the euphemistically named "public indecency". (13 US states put you on the sex offender list for public urination, although in Texas mere nudity requires a second offense. No we can't throw stones about the women's headwear thing either, technically catholicism required women to cover the head in church until 1983.) It's all shame-based religious inertia: the Mormon "special underwear" isn't much different from how underwear is treated in the other 49 states. Being ashamed of your own body is just another variant of "original sin", priests screaming that an invisible man will smite you for thinking wrong because a book about the 5000 year old oral tradition of middle eastern tribes says so, somewhere between the talking snake and the seven-headed dragon. (Their attacks on evolution can never stop because if they're wrong about where we came from, why would they be right about where we're going to?)

As recently as 1970 hats were socially required, the suit and tie was ubiquitous work attire, and "shoeshine boy" was still a common profession. Hats, suits, and leather shoes have all retreated from daily life within living memory. Maybe not gone away entirely, but no longer required. IBM eliminated its requirement to wear a tie in 1997, even Goldman Sachs finally gave in in 2019 shortly before Brooks Brothers went bankrupt. The few remaining shoe repair businesses around here are basically retirees running out the clock. Few homes or office buildings still have a prominent hat rack. (You can wear a hat today if you like, but the two most common popular images for american hats are red "maga" ballcaps and guys who call a trilby a fedora and say "milady" a lot.)

Unfortunately "eventually" can take a while, and progress isn't constant. What really annoys me is when society temporarily rolls BACKWARDS and geezers spread their influence at the expense of the young. This is always self-limiting because the old die first, but you can have some really bad decades before that happens. Fascism is a yearning for an imaginary past, and lead-poisoned Boomers do not clearly remember the childhoods they yearn towards.

June 12, 2022

Yay, I got a new patreon supporter! Welcome Vikram! (I'd send a hello message but I have to fire up a VM while connected to internet for that, and I'm out at a table on battery reading the email I downloaded before heading out.)

I also need to re-enable every gmail user on the mailing list YET AGAIN because its out of control spam filter sometimes rejects delivery instead of putting false positives into the spam folder where its mistake can be fixed. Youtube also kept getting a true/false test wrong so it turned it into multiple categories EACH of which it gets wrong. (Why does gmail sort stuff into folders AND sometimes reject delivery?) Complicating things is not always the right fix. Youtube at least has the excuse of being under constant attack from nazis trying to game its recommendation algorithms with the backing of multiple state actors (China, Russia, Saudi Arabia...) plus the plutocratic moralizing of payment processors who have literally cornered the market on "spending money" without anyone seeming to object and are happy to bow to right-wing loons who think we should all wear hairshirts and flagellate ourselves (it's amazing how much random anti-sex bias everything in the USA gets sprayed down with that you don't find out about outside of funny anecdotes; as with all the racism and misogyny it's just cultural background noise). That's on TOP of the insane IP regime and capitalism's insatiable demands for growth (squeezing blood from a stone until whatever it is collapses). Youtube is doing a terrible job under extreme duress. What's Gmail's excuse?

Our oven's heating coil blew out last month, and the new replacement stove we ordered just arrived. This one's induction, and literally NONE of our pans (except the two cast iron frying pans) work with it. Several are magnetic on the sides, but not the bottom? (Really?) Oh well, Fuzzy bought one induction-friendly pot for now...

Laptop rebooted last night, lost all my open windows. Rebuilding state again (meaning this blog is once again serving its original purpose of reminding me what I was doing). For some reason the laptop battery did not survive the walk home, which is odd because it had over 10% capacity left and I went two blocks? The connections are getting a little fiddly. For the past month or so when I plug it into AC it charges for about 5 seconds and then stops again, and I have to tilt it 90 degrees for it to start charging again, at which point it will charge all the way to full. Some sort of microcontroller negotiation saying "yes, this is a battery with X capacity, go do the thing" that's not completing properly because of a dirty contact or hair in the connector or something. I found out about the tilting because that's what I do to remove and reinsert the battery, which won't kill your open windows as long as it's plugged into AC. Except it starts charging again BEFORE popping the battery. Dell technology! (Ok, rebadged taiwanese technology, I doubt anybody working directly for Dell has a clue how it works. They just order Compal and Wistron and Quanta and such to do it for them, and put their name on it. And they seem to have moved the assembly part to Brazil? I'm out of the loop...)

So many toysh parse modes to keep straight, where a double quoted "~" doesn't expand but "$HOME" does (and so does unquoted cat <<< ~ but not cat << ~ in which case it wants a literal ~ on its own line as the EOF marker).
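
A quick tour of those modes (herestrings are bash-specific; outputs shown in the comments assume HOME is /home/user):

```shell
echo ~          # /home/user: a bare tilde expands
echo "~"        # ~: double quotes suppress tilde expansion...
echo "$HOME"    # /home/user: ...but variables still expand inside them
cat <<< ~       # /home/user: unquoted herestring contents get tilde expansion
cat << ~        # << wants a literal ~ on its own line as the EOF marker
the delimiter below gets no tilde expansion
~
```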

(It is SUCH a pain to blog about shell syntax, because I have to replace all the < > & with &lt; &gt; &amp; and yes, doing that just now was kind of meta.)

And there are still places I DISAGREE with bash's parsing, and don't think I wanna do it that way?

$ ((1+2)) potato laryngitis && echo hello << EOF
bash: syntax error near unexpected token `potato'
$ ((1+2)) $potato $laryngitis && echo hello << EOF
bash: syntax error near unexpected token `$potato'
$ ((1+2)) < hello && echo hello << EOF
> ^C
$ if true; then echo hello; fi $POTATO
bash: syntax error near unexpected token `$POTATO'

If those variables expand to nothing, how is that an error? It's erroring on something at parse time that I'm erroring on at run time, and... I think I'm ok with that? This is another symptom of my redirect logic and variable expansion logic being the same pass, and bash doing it in two passes. (I grumbled about error message granularity stemming from this a while back. I think I need to start a "deviations from bash" section under "deviations from posix" up in the top comment.)

Why do "help :", "help [", and "help ." all work, but not "help !"? I can "echo !" and it doesn't seem to mind. The ! command is more or less "run this command line and return inverted true/false status". Sure bash calls it a "reserved word" instead of a builtin because it works on pipelines, but bash's "time" also works on pipelines:

$ time echo | sleep 3

real	0m3.003s
user	0m0.000s
sys	0m0.004s

And THAT's got a "help" entry. (Sigh, I need to make "time" work on pipelines. The bash man page part on pipelines specifically calls out "time" and "!" as being magic at the start of a pipeline. Am I gonna have to move time.c into sh.c as well as test.c? Dowanna...)
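
The reserved word in question just inverts the exit status of the whole pipeline:

```shell
! true; echo $?                    # prints 1
! false; echo $?                   # prints 0
! echo hi | grep -q bye; echo $?   # prints 0: grep failed, ! inverted it
```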

Back to hammering on parse_line() but now I think I know what to do? Hang on, I thought [[ test ]] couldn't do line continuations and had to all be on one line, but:

$ [[
> 1 -lt 2
> ]]
$ echo $?
$ [[
> ""
bash: unexpected token `newline', conditional binary operator expected
bash: syntax error near `""'
$ [[
> a
bash: unexpected token `newline', conditional binary operator expected
bash: syntax error near `a'
$ [[ a ]]; echo $?; [[ "" ]]; echo $?

But that's a valid test? [ "" ] is false and [ a ] is true because empty string vs non-empty string? It wants a continuation for an A -op B test, but doesn't recognize the one argument variety? What? This is NOT CONSISTENT! The rejected lines can theoretically run fine, the bash line continuation logic isn't handling the full "test" complexity.
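
For comparison, the one-argument forms that regular test handles fine (the same strings bash's [[ line continuation rejects above):

```shell
[ "" ]; echo $?    # prints 1: a single empty argument tests false
[ a ]; echo $?     # prints 0: a single non-empty argument tests true
```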

Grrr. Ok, my parsing should be MORE lenient than Bash's parsing (accept anything they accept), so I need to do the line straddling that I didn't think I needed to do (because "[[ a" DOES error out), so... (Grrr, rewriting the same piece of code 5 times as I figure out new corner cases bash doesn't document. Yup, I'm back doing shell work...) Alright, this is ANOTHER case where I check it at runtime and just accept stuff at parse time, which means [[ blah ]] has to span multiple lines, which () {} if/fi and so on already do so... except this still isn't a type 1 block.

You know, add_pl() snapshots TT.LINENO into the new pipeline segment so we can do error reporting, should we snapshot the FILENAME in there as well? I dowanna because allocation lifetime on the string, and I don't THINK if/else/fi blocks can straddle "source" statements?

$ cat one
if true
source two
$ cat two
echo hello;
$ source one
two: line 2: syntax error near unexpected token `fi'
two: line 2: `fi'

Yup, they don't. Meaning the error statements don't need pipeline filename to report the line an unterminated flow control statement started on, if I want to do that. They can just report the current filename being parsed and it should be correct.

Got sh.c to the point it compiles again, and it immediately segfaults when run with no arguments, crashing in the "cd ." it runs internally as part of the setup. (Because doing so sets $PWD and some other internal state.) Par for the course given how much surgery I did on the code, debug it in the morning...

June 11, 2022

The toysh function declaration transplant logic needs to move after HERE documents are all resolved, putting it at the end of the loop, and that means the trick I'm doing adding a special (void *)1 entry guarding the pipeline pointer onto the expect stack is no longer necessary: the code at the end can just traverse the doubly linked list backwards to find the type 'f' entries and process each one. As long as we do it back to front it's not recursive, any function declaration within the body of another function will already have been transplanted by the time we get to the enclosing declaration.

But the OTHER test that special expect entry was triggering was the check and syntax error if the function body we just added WASN'T a flow control block. (Ala x() echo;) And the easy way to fix that is to add the check to the NULL expect entry, which means "there must be a statement here" but doesn't care what kind. (I.E. { } is a syntax error because the block has to have something IN it.) That "mandatory statement seen" test can also go "and if the statement before us was type f and we're not type 1, barf". (Type 1 being "start of flow control block", ended by a type 3.)

EXCEPT it turns out that [[ ]] and (( )) are ALSO viable function bodies! Without exactly being flow control blocks? Because they parse differently: not only does ]] not have to be the first thing on a line, it CAN'T be: [[ without an ending ]] on the same line is an error which I need to add a check and a test for. Even quotes let you span lines and continue, but not [[ ]] tests. Although (( )) is quoting logic, and thus happily does line continuation. Which I'm getting wrong. My logic is removing newlines from (()) contents, which means ((1+2\n3)) turns into 1+23 while bash keeps the whitespace break:

$ ((1+2
> 3))
bash: ((: 1+2
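
Backing up to the function-body rule for a moment, here's bash's version of it: the body must be a compound command, and [[ ]] and (( )) both qualify:

```shell
bash -c 'x() echo hi'              # syntax error: simple command body rejected
bash -c 'x() { echo hi; }; x'      # prints hi: brace group works
bash -c 'y() ((1+2)); y; echo $?'  # prints 0: (( )) is a valid body
bash -c 'z() [[ a ]]; z; echo $?'  # prints 0: so is [[ ]]
```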

So many tests I need to add...

So the ((math)) parsing in toysh is turning it all into a single string, which seemed like a good idea at the time, but raises issues now. It MOSTLY works like [[ ]] but... not entirely. For one thing ((1+2)) is valid without spaces, and [[ needs a space after it or [[1<2]] says "bash: 2]]: No such file or directory". But the for ((a;b;c)) parsing doesn't count any ; that's backslash escaped or quoted...

Darn it. I know what I need to do. Expand the (( )) string with IFS=; so it splits it into three words at the (unescaped, unquoted) semicolons. I hate it when IFS actually winds up being useful, it's a TERRIBLE IDEA. Whole thing. I have to match how it works for compatibility, but it's ALSO a giant 1970s boondoggle I am highly reluctant to encourage.
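
A minimal sketch of that split (using read with a one-shot IFS; the variable names are just for illustration, and this naive version doesn't skip quoted or escaped semicolons, which is the part that needs the real expansion logic):

```shell
s='i=0;i<10;i++'
# split the (( )) contents at semicolons into three words
IFS=';' read -r init cond step <<EOF
$s
EOF
echo "$init | $cond | $step"   # prints i=0 | i<10 | i++
```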

And you can have ( ) inside [[ ]] blocks, which do not parse as flow control blocks but DO retain the lack of need for spaces:

$ [[(1 -lt 2)]]; echo $?
$ [[(1 -gt 2)]]; echo $?
$ [[((1 -gt 2))]]; echo $?
$ [[((1+2
bash: unexpected token `newline', conditional binary operator expected
bash: expected `)'
bash: expected `)'
bash: syntax error near `[[((1+2'

Meanwhile, my parse_word() is currently getting [[(1 -gt 2)]] sort of right, but [[((1 -gt 2))]] becomes the word "[[" followed by the word "((1 -gt 2))]]" because it's STARTING as a parenthetical flow control statement (parentheses are word breaks) but ENDING as quote logic (the same way "abc"def is all one word).

It would be SO NICE if I could figure out how to do this in smaller chunks, but I keep making changes and adding notes without reaching a good stopping point to compile-and-test. Oh well, thanks to Google I have a BIG BLOCK OF TIME to work on this at the moment.

Just trying not to get overwhelmed. My brain has a somewhat finite stack depth, and this is all tangent from tangent from tangent. It KINDA circles around to make a complete system? Ish? But it's a huge pain to debug when everything changed before you can next try to run it.

June 10, 2022

Ok, I think I'm going to have to punt on =~ support because I honestly don't know how to do this part:

$ [[ abc =~ . ]]; echo $?
$ [[ abc =~ 1* ]]; echo $?
$ [[ abc =~ "1*" ]]; echo $?
$ [[ abc =~ "1"* ]]; echo $?
$ [[ abc =~ 1"*" ]]; echo $?
$ X='.*'; [[ abc =~ $X ]]; echo $?
$ X='.*'; [[ abc =~ "$X" ]]; echo $?

The quoted part is not a regex. The unquoted parts of the same string are interpreted as regex. With all the positional shenanigans test is doing, including:

$ test "(" "" ")"; echo $?
$ test "(" x ")"; echo $?
$ test "(" == ")"; echo $?

In order: parenthesized "empty string is false" test, parenthesized "non-empty string is true" test, "two strings do not equal each other" test. I am supposed to detect when a =~ comparison is happening in that and escape the string on the right so it has backslashes before active regex symbols. (And no, I can't just escape every quoted character because \n doesn't mean 'n'. Is this regex symbol active/inactive is more or less the state being tracked for wildcard expansion and $IFS splitting.)

Sigh. Ok, I _do_ know how to do it, it's just really ugly. I'm probably gonna have to move test into sh.c, defer argument expansion (and thus quote removal) until after the positional stuff gets worked out by test_main(), and then probably add a new expansion mode flag to backslash escape punctuation? But I can do all that LATER...
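
A rough sketch of that escaping step, assuming the quoted chunk arrives as a plain string (the function name and sed pattern are mine, not toysh's; it escapes only ERE metacharacters, which sidesteps the \n-doesn't-mean-n problem):

```shell
# backslash-escape ERE metacharacters in a literal chunk so it can be
# pasted into a larger regex without becoming active
quote_chunk() {
    printf '%s\n' "$1" | sed 's/[][\.*^$()+?{}|]/\\&/g'
}
quote_chunk '1*'      # prints 1\*
quote_chunk 'a.b'     # prints a\.b
```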

And the header dependency stuff has broken the android build. Great. Trying to figure out how to fix it but I'll probably just turn the error into a warning for now, this whole area needs a lot more design work and I have too many open tabs already....

You know what I need a test for? Not covid: mono. This cold is in its THIRD WEEK. I'm feeling a lot better than I was, but boy is it lingering. Eh, maybe I'm just old...

June 9, 2022

Sigh, I need to bite the bullet and start my email migration off gmail at some point. The deadline got pushed back somewhere around June 27, but I should definitely deal with it before then... ah, that says the ACTUAL deadline (where it stops working) is probably August 1. Still, I'm already paying for a slice of an email server as part of my Dreamhost package, might as well use it...

And I still don't have reasonable transportation to Star Ranch for the AANR-SW annual meeting taking place there this weekend. I was thinking of biking, but today's high is 98, tomorrow's is 103, and checking the website it actually started tonight so I'm _already_ missing it, and I didn't reserve a room early enough if I was going to stay there overnight. (In addition to a big convention competing for a small number of rooms, they got harassed into adding paperwork hoops to jump through if you're not a regular visitor, and I haven't made it there in _years_...) Sigh, I didn't go to Worldcon when it was ~50 miles away in San Antonio a few years back, and am not going to ELC here in Austin later this month either (didn't get a talk proposal submitted in time so no comped attendance, and a ticket's multiple hundreds of dollars I could afford but not justify). I should get out more. (Yes, both event links are ephemeral, so a year from now following the links from this blog entry will not give you information about either specific event I just mentioned, and there's no obvious way to turn either one INTO a persistent link to that year's event. Oh well, this is one reason exists I guess. Not that they have 100% coverage...)

I'm trying to chip off some of the smaller bits of the toysh digging I've been doing so I can check them in separately. (And may even spend a few cycles spinning some of the _other_ plates in this project...)

I'm teaching test to handle < and > comparisons, which apparently bash's builtin test does if you backslash escape them so it goes through to the command instead of being parsed as a redirect? (I think the [ ] and [[ ]] logic is basically the same, only the parsing is different.) But all the string comparisons need to be case insensitive for sh -o nocasematch and friends, and that means I either need a magic argument to test (which is in-band signaling and can theoretically break a legitimate input), or I need test.c to read the internal state of sh.c. Which I have provision for in NOFORK commands: if toys.rebound isn't NULL, we're called as a shell builtin. Currently help.c uses that to determine whether to show the NOFORK commands or not (because "help" called from the shell should, but via the $PATH or multiplexer should not).
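
The comparisons in question, for reference (bash's builtin; the backslashes keep the shell from treating them as redirects):

```shell
bash -c 'test abc \< abd; echo $?'   # prints 0: "abc" sorts before "abd"
bash -c 'test abd \< abc; echo $?'   # prints 1
bash -c '[ abc \> abd ]; echo $?'    # prints 1
```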

The namespacing is still a little funky. I've got MOST of the data I need: none of the NOFORK commands have any actual optstr yet, so the global "toys" doesn't get modified when they're called, meaning all the data is still there. (I'd have to do some variant of the "copy this->temp" trick if they started needing that, but so far none have.) The header generated/globals.h is always fully populated so I can use struct sh_data (or*) to access the relevant shell global variable. But all the shell #defines are internal to sh.c. I haven't actually implemented -o nocasematch yet so there isn't a define for that one, but variable processing has VAR_TOUPPER and VAR_TOLOWER, the wildcard plumbing has WILD_CASEMATCH that (eventually) gets set from somewhere... Alas the bash man page does not have a "summary of when to be case insensitive" section (one of many things I need to read the whole thing for and holistically extract). I need to work out when test should be case insensitive, and then inform it somehow. But I can defer that and leave a TODO...

Sigh: bash's "test" builtin supports < and > but /usr/bin/test does not. Of course not. (Remember all those years when bash's echo -e handled escapes but /bin/echo from coreutils didn't? The aristocrats gnu project!) And even bash's [ ] doesn't handle =~ but [[ ]] does... why? It has to be PARSED by the test logic rather than the caller or else all that ! -a -o nesting has to be reproduced. Why have a test to DISABLE it, when A) you didn't for < and >, B) positionally it has to parse as a syntax error otherwise, so there's no case where adding support could BREAK anything that wasn't already broken. (I also dunno why bash did < and > but not <= and >=. They have -ge and friends on the integer side...)

Sigh, do I have to call regfree() if regcomp() exited with an error? Or does the error path free all of it? The man page isn't being helpful here, so let's read bionic's regex implementation... which is copied from BSD, copyright 1992. Thirty years ago. Anyway, yes the error path frees all the data so xregcomp() doesn't need to call regfree() and the longjmp() from error_exit should be sufficient cleanup for the nofork test doing regex with an illegal back reference or something in the pattern. Good.

June 8, 2022

I have fresh github bug reports about toysh issues. (Working on it!) And a fresh dozen "bounce action notifications" because gmail decided it didn't like a mailing list message again. Sigh, gotta dig up an ethernet cable to plug the laptop into Google's little white circle because dreamhost can't apply the Let's Encrypt certificate that provides https to to because reasons. (Yes I've asked dreamhost about this before. The lists domain isn't the same machine as the site webserver, it's an alias for a Very Shared Server that customers do not get a shell prompt on.)

The parsing changes I'm making for [[ test ]] apply to (( math )) as well (perform trailing redirects but ((1<2)) is not a redirect, yet it's ALSO a command not a flow control block). Alas I can't yet plug calculate() into for ((i=0;i<10;i++)); do echo $i; done because the recalculate plumbing doesn't do variable resolution/assignment yet: no i=0 or i++.
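
The target construct, for reference (what calculate() will eventually need to drive once variable assignment and ++ work):

```shell
bash -c 'for ((i=0; i<3; i++)); do echo $i; done'
# prints:
# 0
# 1
# 2
```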

Hmmm, I _think_ the NO_PATH|NO_SPLIT expansion being done inside [[ ]] is also correct for (( )) contents?

$ ((1<2)) && echo hello
$ ((2<2)) && echo hello
$ X=4; (($X<6)) && echo hello
$ X=y; (($X++)); echo $y
$ let "$X++"; echo $y
$ y=1 (( y -lt 3 ))
bash: syntax error near unexpected token `('
$ y=1 [[ y -lt 3 ]]
bash: [[: command not found

Gratuitously inconsistent error message at the end there, of course. And why DOESN'T it accept prefix variable assignments for these cases? My current logic does and it would be extra work to make it NOT do so...

Just glanced at the busybox mailing list for the first time in a while, and there was a post about an alpine bug report that leads to an entire busybox subsystem (libbb/printable_string.c and libbb/unicode.c function unicode_conv_to_printable2()...) I hadn't noticed existed, trying to do output sanitization? What? Why?

Long ago, the ANSI.SYS driver in DOS implemented an escape sequence to remap keyboard keys to produce arbitrary strings, so the next time you hit the "a" key it could output "\nformat c:\ny\n" or something. This is breakage Linux EXPLICITLY NEVER IMPLEMENTED because it was REALLY STUPID. And the Alpine devs are implying that there are broken terminals out there which can be hacked by outputting arbitrary strings to the TERMINAL. They do not say how, but it's implied not just to be "the terminal is now in an unhappy state" (which leaving things in raw mode does, and XON/XOFF flow control could when it got ctrl-S, and QEMU still does breaking the line wrap state and thus bash history editing, and catting a binary used to regularly change the character mapping so everything was funky graphics characters until you blindly ran "reset" although I haven't hit that in a while so I don't think xfce's current terminal implements that...). But no, there's a CVE out there that says some terminals can "execute arbitrary code".

Look, if your terminal allows "echo" to execute arbitrary code by outputting an attacker-controlled string, that's a broken terminal. Fix your xterm or use a different one. (And I'm trying to teach toysh's prompt to reset the terminal state a little more aggressively in interactive mode.)

Why does busybox have an entire SUBSYSTEM to sanitize its output? I mean, the "less" command is gonna need to parse that sort of thing to get the scrolling right, as would an eventual "screen" implementation, but a lot of that boils down to "output a sequence then ask where the cursor is now" rather than trying to perfectly predict what a variety of terminals will do. "Sanitizing" normal output? I really really really don't want to put that kind of crap in toybox. So why did busybox do it?

The git log says printable_string.c was added January 31, 2010 as "ls: unicode fixes", with no further explanation. That commit has some test suite entries but they're labeled things like "concatentation of incomplete sequences" and "overlong representation of the NUL character" and so on, and seem to be a preexisting package of utf8 corner cases rather than anything to do with terminals? (I keep thinking each new length of utf-8 sequence starts offset by the previous length's highest byte so there would be no redundant encodings and you could represent more encodings with fewer bytes, because surely Ken Thompson wouldn't have gotten that wrong? Unicode is nuts but utf-8 isn't? But I haven't nontrivially looked at the utf8 plumbing in many moons...) Also, the test suite plumbing checked in at the end of that commit has a ls.mk_uni_tests and a ls.tests where the second seems to be the first with all the actual utf8 sequences replaced with '?' characters, and I'm not even ASKING why that happened. Nope, not my problem.
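For context on those "overlong" test labels: UTF-8 requires each code point to have exactly one encoding, so a decoder must reject any sequence that uses more bytes than the minimum. (Which also answers the offset question above: the raw bit layout CAN redundantly encode small values in longer sequences, the lengths aren't offset to exclude them, the spec just bans them by fiat.) A minimal sketch of the two-byte case (decode2() is a hypothetical helper, not libbb's code):

```c
#include <assert.h>

/* Minimal sketch of the "overlong" check UTF-8 decoders need: a decoded
   code point is invalid if it would have fit in fewer bytes. This handles
   only the 2-byte case; 3- and 4-byte sequences have minimums 0x800 and
   0x10000 respectively. Returns the code point, or -1 if invalid. */
static int decode2(unsigned char a, unsigned char b)
{
  int cp;

  if ((a & 0xe0) != 0xc0 || (b & 0xc0) != 0x80) return -1; // not 2-byte form
  cp = ((a & 0x1f) << 6) | (b & 0x3f);
  if (cp < 0x80) return -1; // overlong: fits in one byte (e.g. C0 80 = NUL)

  return cp;
}
```

The classic attack this blocks is smuggling NUL or '/' past string checks as C0 80 or C0 AF.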

Looking at the busybox web archive for January 2010 doesn't show any discussion of WHY this stuff went in. There was a thread about ls -l not working on android (going on for a long time, apparently not realizing it hasn't got /etc/passwd). Nor does February (closest I spotted is this which ain't it). I was still participating in the busybox list back then, so presumably if something about this had wandered by I'd have noticed? (This was my "if you can't beat 'em join 'em" period when I'd mothballed toybox and ported my "patch" implementation to busybox and met Denys in person at ELC and explained "everything about a command can be in a single file, so you don't have to modify five different files each time you add a command" theory. Back before the 2011 restart targeting Android...)

June 7, 2022

I need to make and upload more youtube videos, but every time I check in youtube continues to implode. Why even have age restricted content if you randomly delete people's videos or entire channels under the same capricious guidelines? Sure, Youtube is not the only online service that's gone nuts against its own userbase, but since I've never bought or sold anything through ebay I have the luxury of not personally caring about its decline.

(Speaking of which: dear The Economist, saying "When the driver shortage normalizes..." about Uber/Lyft's problems is another way of insisting we'll inevitably return to a pre-pandemic "normal". That's an "if" at BEST. A third of the Boomers haven't even retired yet and the trucking industry is already trying to recruit them for its own driver shortage. The average number of children in the USA is down to 1.7 (replacement rate is 2.1 since their kids-per-adult only counts adults, and not everybody survives to adulthood). We've historically made up for that with immigrants but that's cratering too (gee, wonder why). In addition to the Boomers not getting any younger, a significant contributing factor to "the great resignation" was workers dying of covid (especially people who interacted with the public), and an order of magnitude more got long covid and no longer had the energy for a second or third "side hustle". So Uber's dependence on all those schoolteachers driving for them after class isn't necessarily replicable going forward.)

Youtube has always had a creator burnout problem, but at this point half the things wandering by on my youtube recommendations are reuploads of old videos that got some retroactive stupid strike months or even years after they went up. Channels where the creator died inevitably become motheaten as videos in series go down. And no it's not "algorithms", the prudish sexism is coming from the top, on a site where the CEO gave herself a "freedom of expression" award (and yes the annoy people into subscribing guy reported to her).

*shrug* I'm happy to let Youtube die, I just need to figure out where to host my own videos if it's no longer viable, vimeo doesn't want to be in that business anymore, dreamhost is too cheap to be particularly load bearing, and Google Fiber... even if they weren't doing the "it would inexplicably cost 4x as much to get a static IP" scam, their router drops to dialup speeds if I don't unplug it and plug it back in every few days. (The guy came out and replaced the little lens in the box on the side of the house, but for some reason the white circle Google left here keeps getting slower and slower the longer it's on. We have to power cycle it every time we want to watch anything on crunchyroll.)

Still banging on toysh: possibly I should defer function body transplanting to the first execution of the function declaration. I already have the type 'f' vs type 'F' distinction, and if the function declaration goes out of scope before being executed (ala if false; then x() { echo boom; }; fi) that's fine?

I have reached the semi-inevitable point of any deep dive into sh.c where I have enough parallel pending threads of change in progress at once that I want to apply a discrete self-contained change to a clean tree, test that it works, and check it in so I have a smaller diff against vanilla. Except I checked in Moritz's change to host.c and have that only half cleaned up, so should I diverge to finish cleaning that up first? Except he took my question about "do we want the interactive mode of nslookup" as a suggestion to IMPLEMENT the interactive mode of nslookup (despite not, himself, being a user of it that I can tell), so I now have a 2-patch series waiting for me on the mailing list that is not on top of the one I've already applied to the tree, so I guess I should back the old one out of my tree? I guess he sent it to me as a two patch series so I can apply the first but not the second...

I am not very good at this "maintainer" thing. The conflicting demands of authoring code and editing an anthology of code ideally means a mature project has a dedicated editor who doesn't author. (When Douglas Adams was script editor for Season 17 of Doctor Who in 1979, he didn't MEAN to completely rewrite the episode "City of Death", and used a pseudonym when the episode needed that much work...) But toybox isn't at 1.0 yet, I have a lot of authoring left to do, and need to get better at editing/integrating external contributions in parallel without swap-thrashing. Oh well, doing the best I can...

Oh goddess, after all my supposed dependency improvement work, running scripts/ once rebuilt all the headers THREE TIMES. The third time so "make install" could build "instlist". I mean... yeah sort of? But... not really a net improvement? (Sigh, rebuilding the headers doesn't take long, but there was a lot of implicit "I know this changed but that only matters in obscure corner cases" before that now... aren't ignored. Grumble grumble. I guess the important case is that "make change" is fast? The rest of the time it doesn't really take long enough to matter even for interactive builds on my nearly decade old laptop. (I'd say "try it on a raspberry pi" but Pi 4 eats 3 amps and the Pi foundation sells an official case fan now. Not exactly the "runs happily off a USB battery" Pi of yore.))

I just went through and triaged the functions in sh.c, they're not QUITE sorted in a rational order? Some of it's dependencies (what calls what without excessive function prototypes), but a lot of it was just organic growth without tidying passes to reorganize stuff...

Sigh, I pushed a "tidying" commit and now I regret it, but it's up on github already. The problem is "git annotate" and friends can't see through the moves, so they become a lot less useful after that sort of "tidying" commit. (Even when the file moves out of PENDING it has a --follow option to see through it, but not hunks of code moved within the same file.) So I think I want to hold off on moving a lot of stuff until I've got everything implemented.

For the record, the groups were something like:

# libraryish (could move into lib/ if external users)
nospace, getutf8, anystart, anystr, fpathopen
# library but on sh_specific structures
syntax_err, arg_add, push_arg, arg_add_del
# variable get/set
varend, findvar, addvar, getvar, getvar_special
cache_ifs, setvar_found, setvar_long, setvar
unsetvar, setvarval, visible_vars, declarep
# math
recalculate, calculate
# file descriptor redirection
next_hfd, save_redirect, unredirect
# token parsing
redir_prefix, parse_word
# managing flow control blocks and function calls
clear_block, pop_block, add_block, call_function,
free_function, end_function
# marshalling data into subshells (especially on nommu)
subshell_callback, pl2str, run_subshell, pipe_subshell
# wildcard expansion
wildcard_matchlen, wildcard_match, wildcard_path,
do_wildcard_files, collect_wildcards, wildcard_add_files
# argument expansion, quote removal, variable resolution...
slashcopy, expand_arg_nobrace, brace_end, expand_arg,
expand_one_arg, expand_redir
# executing commands in pipelines
sh_exec, run_command, free_process, free_pipeline, add_pl
# parsing each line of input (move up with parse_word?)
# job control
find_plus_minus, is_plus_minus, show_job, wait_job, wait_pipeline
# readline with prompt
do_prompt get_next_line
# Navigating flow control in pipelines (calls run_command() above)
# setup plumbing that maybe goes with variables above?
initvar initvardef set_varflags export
# The main entry point for parsing any chunk of shell script text
# Setup code called from sh_main
nommu_reentry subshell_setup
# The main entry point for the shell
# all the other entry points for shell builtins

June 6, 2022

Moritz Weber sent me a snapshot of his git.c, and posted nslookup mode for host.c to the list. Both I need to find multiple hours of focus to go over properly. The first isn't quite at the handoff point where he's done with it and I can change it enough he'd have work to do wrapping his head around my changes. The second raises design issues.

When there are multiple commands that do basically the same thing, I'm always torn about which to implement. I use "dig" for lookups but busybox didn't have that one. The "host" command is fairly simple, in that it doesn't really do much, and that's what Rich (the musl-libc maintainer) sent me an implementation of, with no existing competition, so it went right in. The "nslookup" command has an interactive command syntax I've never really looked into, and the new patch doesn't implement. The design questions this raises are of the aesthetic variety, easy to say what we CAN do, but hard to say what we SHOULD do... (The "host" command is probably the most unixy, do one thing and do it well. But "dig" matches what DNS is doing behind the scenes and I've found it very useful for diagnosing ISSUES with all that zone transfer nonsense. I haven't used nslookup much, I'm mostly familiar with it as the only one windows has, where I just used it as "host". (Can you show me the number for this name? You can do this. Try for me? I'll let you play an advertisement at me if you can do it. Ok, that was ALMOST right, do you know what "IPv4" is?) I'm assuming the Linux version is slightly more load bearing...)

I had enough notes in sh.c about ripping <<< parsing out of the main HERE document path that I just went ahead and did it. Turns out I already did part of that before, so part of the problem my tests had was the <<< implementation in the current tree was incoherent, half treated as HERE document and half as a normal redirect. Oops. The recurring problem is getting distracted halfway through this stuff and not having a thorough regression test suite that we actually PASS, so the interesting test isn't hiding after other failing tests. Establishing a baseline! Working on it...

HERE documents needing to go back and annotate existing segments get kinda awkward when the parsing gets complicated. Function completion is a bit non-obvious: the closing } of a function triggers a transplant of the function body into a separate function structure (because functions can be called by name arbitrarily later, completely separate lifetime from the block of code they were declared in which gets freed when it goes out of scope). BUT that transplant can happen long before the HERE document line collection has a chance to happen, because lines come in asynchronously (I can't _request_ lines from the input) and the processing is "what do I do with the line that just came in":

$ x() { cat << EOF;} && echo hello
> potato
$ x
$ x

Yes, that HERE document is persistent and part of the function, but the contents of the here document don't get collected until after another statement is parsed and queued up to run. That "echo hello" could even be a call to the function instead; the call won't happen until we've filled out the here document contents. It's the same basic line continuation logic: we can't run until we've read enough input to complete the current thought. Return 1 if we need more input to glue to what we've got, return 0 when what we've parsed is complete and executable.

$ fun() { cat; } << EOF
> potato
$ fun
$ fun

Still persistent, but that's a trailing redirection on the enclosing block instead of a statement inside the function.

$ fun() << EOF
bash: syntax error near unexpected token `<<'
$ fun() { << EOF
> potatoid
> { cat; }
> }
$ fun

That HERE document redirect is on an empty statement at the start of the function body, so it applies to nothing. The same way this:

$ > filename

... is basically a synonym for "touch filename", creating an empty file with no contents. This is the input version of that, and thus a NOP. And yes, the extra nested curly bracket took me a moment, but it's because:

$ fun() { << EOF
> hello
> cat; }
$ fun

HERE document lines are higher priority than logic continuation lines, so we complete the HERE input and then complete the function body. That has one { and one } and they match. The HERE document is still applying to an empty statement before the cat statement (not to the whole function: the newline after the first EOF is a statement break), and thus cat hangs reading stdin.

Sigh. Function declarations are weird because the declaration adds a chunk of code to a global name=value dictionary, but PARSING the function isn't what does that. It's executing the function declaration that does it, which is not the same as calling the function:

$ bash -c 'if true; then x() { echo hello; }; fi; x'
$ bash -c 'if false; then x() { echo hello; }; fi; x'
bash: x: command not found

The reason I need to transplant code into struct sh_function->pipeline is so it can be reference counted, because the lifetime rules are weird. When you execute a function declaration you copy the function into the global function name table. (I'm unaware of these being namespaced in nested functions or anything, you can't stick "local" on a function declaration. Or at least every variant I've tried is a syntax error so far.) So if the original chunk of code (containing the declaration) falls out of scope and gets freed, the copy in the function table needs to persist. It's basically reference counted, and gets freed when the reference count drops to zero.

But moving function bodies into function structures screws up the HERE document parsing a lot, because I have to traverse backwards to find the first unprocessed pipeline segment (because you can have lots of segments in the line that got parsed, which may include multiple scattered HERE documents that then consume each new line in sequence until satisfied; I even have to mark "bridge" segments between them that DON'T have unsatisfied HERE documents but are preceded by one that does; it's that or N^2 reparsing of everything for each new line). While traversing backwards, I can descend into transplanted function bodies (which involves marking the function declaration during the transplant process, the type 'f' node needs to know that the type 3 at the end was a bridge segment).

The problem is, when I hit EOL on a HERE document and satisfy it, I need to traverse FORWARD (unmarking bridge segments) to see if there's more to do so it can return the line continuation info, I.E. should parse_line return 1 (need another line) or 0 (can start executing now). But when traversing forward, I can't pop UP out of an sh_function because the tree is singly linked so it can only be traversed in one direction. (It's actually worse than that because the doubly linked list of pipeline segments in the function body gets dlist_terminate()d when it's transplanted, and I only have a pointer to the start so have to loop to the end and then work backwards from there.)

Sigh, I think what I need to do is move the function transplanting out of line, and make it a second pass that happens before returning 0. That way it doesn't interfere with HERE document parsing. It's an O(N) traversal but that's not so bad... (Of course I have to parse them in reverse, because function declarations can contain other function declarations. But it _is_ a doubly linked list, backwards is easy enough...)

I really want to deal with Moritz's two files and Elliott's todo list, but toysh is "elephant chewing" territory. I need to get to a good stopping point where I can put it down again. Every time I cycle back to it I have to climb a BIG MOUNTAIN in order to tap the top with a hammer a few times.

June 5, 2022

HERE document parsing in toysh's parse_line() is a multi-step process: First we parse all the pipeline segments out of the current input line, and reach a logical stopping point where we don't need a line continuation because of unterminated quotes or backslashes. As we terminate each pipeline segment, we mark pl->count = -1 which says we need to check the segment for HERE documents in the exit path. Note that not only can you queue up multiple HERE documents in a single pipeline segment, but multiple pipeline segments can have pending HERE documents, ala:

cat << EOF1 && echo hello && cat << EOF2 && echo there
> one
> EOF1
> two
> EOF2

I'm tempted to move parsing of three shift here documents (<<< "string") out of HERE document parsing and make it a normal redirect, because they sort of ARE weird file redirects. Except they're awkward to handle because that data comes from the shell's memory and goes through an fd to the process, which is hang city if done wrong. I can't just make a pipe, write the string to the pipe, and close it, because it's unlimited size and could be too big for the pipe buffer (if the write blocks, the shell hangs). Launching a background process to feed in the data is AWKWARD on nommu (we are not threaded, don't ask). So what it's doing is creating a /tmp file, deleting it, and writing the data into it. That way the write can't block (it can -ENOSPC but we can legitimately error out for that), and a filehandle reading from a deleted file is even seekable! Since << HERE documents already have that plumbing, having the three shift version funnel into that codepath is easy at runtime. At PARSE time it kinda sucks, but it's a question of which end should suck.
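The deleted-temp-file trick is a standard Unix idiom; roughly this (heredoc_fd() is a hypothetical helper, not toysh's actual plumbing):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Sketch of the deleted-temp-file trick described above: write the <<<
   string to an unlinked /tmp file, then hand back a seekable fd reading
   from the start. Unlike a pipe, the write can't block no matter how big
   the string is (it can -ENOSPC, which is a legitimate error), and the
   file's storage vanishes when the last fd to it is closed. */
static int heredoc_fd(const char *data)
{
  char name[] = "/tmp/toyhereXXXXXX";
  size_t len = strlen(data);
  int fd = mkstemp(name);

  if (fd == -1) return -1;
  unlink(name); // file now has no name: only this fd keeps it alive
  if (write(fd, data, len) != (ssize_t)len || lseek(fd, 0, SEEK_SET) == -1) {
    close(fd);

    return -1;
  }

  return fd;
}
```

The returned fd can then be dup2()ed onto the child's stdin exactly like a << HERE document, which is why funneling <<< into that codepath at runtime is the easy part.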

At parse time all the fiddly bits are "more lines are coming, stack their data into prepared side-channel spaces and return need-more-info until we've matched all the EOF tags and filled up all the prepared slots". Except three shift HERE doesn't do that at all, it's one string and we have it right there. So the trick is making the later parsing skip OVER these entries, which are already filled out...

I still haven't got adb hooked back up to my phone: the version installed in debian, the version in the AOSP checkout, and what my phone expects are three different things and the first thing ADB does is launch a server daemon out of some sort of search path, which requires installing a magic debian package (android-sdk-platform-tools-common) so it can supply /lib/udev/rules.d/51-android.rules so the adb daemon can find my phone when I ask adb "what phones are plugged in". And THAT is why I can't use the version from AOSP and need to use the version from Debian, unless I'm copying random bits of AOSP checkout du jour into my host system's config files by hand...

Anyway, the EASY way to copy files onto my phone was to just throw them into a directory on my website, download them one at a time through the browser, and then create a new directory with the "Files" app and move them to it. Tedious, but straightforwardish.

Oh, and Android has a passable "Music Player" app. It has a bunch of things that cause dropouts in the playback (such as turning the screen on and off), and one of my aac files drives it NUTS to the point it plays random tones and then hangs and pops up the "would you like to report this" window. And for some reason Rock Sugar's Reimaginator album (an ogg file) always has to be played twice. The first time exits immediately, the second plays fine. *shrug*

Between that, audiobooks, podcasts, and the four video streaming services I already PAY for (netflix, hulu, crunchycrunch, and prime), I'm getting over the loss of Youtube. (And no, I haven't reinstalled the Google Maps app either. I pull up the web version a couple times a month, but only when absolutely necessary. When a service goes beyond the Advertising Event Horizon, I wander off. I STILL can't get Google Maps to show me the Great Clips in Hancock Center. I zoom ALL THE WAY IN and it's not there, unless I search for it by name in which case suddenly there's a pin that tells me the phone number and hours they're open. Because they didn't pay Google for the privilege of being listed. Why would I use a service that's not going to show me places that haven't paid to be listed? Applying the "relevance" filter to local geography shows the bias quite clearly, and they won't let me opt out of the filtering because money.)

June 4, 2022

Going down a longish toysh rathole where a pointer value is wrong in a test case, and tracking through the creation of the relevant linked list to find out what's stomping it, I just spent AN HOUR trying to figure out why the newly allocated list member had THE SAME POINTER VALUE as the previous entry in the list, despite the previous one NOT HAVING BEEN FREED? Cut and pasted the hex next to each other just to be sure: it's identical. And can't be. (New call to add_pl() returns the same hex value, stuck a dprintf(2, "HA!") in free_pipeline() and that's not being called, but nothing ELSE should be freeing it, can't be heap corruption because it proceeds way too far in the rest of the program for that and later code is successfully USING the supposedly overlapping structure instances...)

It was, of course, because I have a <<< "here document variant" in the test case, which does a realloc() to expand the structure size, which allocates a new instance, memcpy()s the data over, and frees the old one. So the next convenient chunk of memory exactly the size of the allocation for a new pipeline stage is... the previous chunk of memory the realloc() freed.
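The pattern that bit me, and the cache-an-index-not-a-pointer fix, boils down to this (hypothetical names, not sh.c's actual structures):

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the hazard above: realloc() may move the block, so a pointer
   cached into the old allocation goes stale, and the freed chunk can be
   handed right back out by the next malloc() of the same size (hence the
   "same pointer twice" mystery wasn't heap corruption). Caching an INDEX
   and re-deriving pointers from the current base survives the move. */
struct stage { int count; };

static int last_stage_count(void)
{
  struct stage *base = malloc(4 * sizeof(*base));
  size_t idx = 3; // an index stays valid when realloc() moves the array

  if (!base) return -1;
  base[idx].count = 42;
  base = realloc(base, 4096 * sizeof(*base)); // may move the whole block
  if (!base) return -1;

  int count = base[idx].count; // re-derived from the new base: still valid
  free(base);

  return count;
}
```

A pointer saved as &base[idx] before the realloc() would be the bug: dereferencing it afterwards reads freed (and possibly recycled) memory.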

<fozzie_bear>Oh. I knew that.</fozzie_bear>

I have to back out SO MANY DEBUG PRINTFS now. (So much of debugging is meticulously, painstakingly answering the wrong question. Of course the reason it's always in the last place you look is when you find it you stop looking.) And of course the next question is what subset of this debug scaffolding I've piled on here is still relevant, and what can be cleaned away.

Emailed a question to Chet about what bash was doing, which resulted in another "Oh. I knew that..." I suppose embarrassment that my code isn't already complete and bug-free is the programmer's version of "that part of the book" novelists reliably hit halfway through where they want to delete the book and possibly quit writing. In my case, I'm frustrated because this code should already be complete and bug-free by now, and me still trying to work out design elements (let alone elaborately reverse engineering stuff I wrote and should already understand) is just sad. Of course ideally someone else would already have done it all before I even started, and done a better job than I could, since I'm not really that good at this. But it turns out that while there ARE programmers out there both better and faster than I am... Fabrice Bellard is busy. And I've encountered enough of the rest not to be particularly bothered by Impostor Syndrome: on average we have Weinberg's Law instead, and we collectively EARNED it. (And no, replacing C with Language Du Jour ain't gonna help, Flon's Law is just as old.)

So, the problem I've hit in my test case is that <<< "here documents" aren't actually as debugged as I'd like: the realloc() I mentioned earlier wasn't updating the cached arg pointer to a member that could move, and THEN the counter for the number of entries it was allocating was off by one (although I recently fiddled with that so may have just recently broken it), and THEN the data it was saving into the new entry had an object lifetime problem: I needed to make a copy of the string or it would get double-freed, but tracking down WHEN I needed to make a copy involved working through the << and <<- cases where EOF is just a marker that gets dropped as data is read over it vs <<< "contents" that get kept until runtime: only this one case had the lifetime issue.

So many more tests I need to add to sh.test. But again, I knew that...

Another problem is that EOF processing has its own sequencing issues:

$ cat << EOF1 && cat << EOF2
> EOF2
> EOF1
> EOF1
> EOF2

The line gets parsed all the way to the end with the relevant HERE document redirects queued up attached to each statement, and then future lines of input are assigned to the appropriate HERE documents in sequence, terminated by the appropriate terminators. (So the first EOF2 is input, not a terminator, and second EOF1 is input, not a terminator.) And THAT means that my current code looking at the last pipeline segment is insufficient, it needs to run each new line of input past every finished pipeline segment in sequence. Either I can traverse the whole linked list each time looking for negative pl->count (scalability issue if I try to run a 20k line script and it's doing an n^2 traversal on it), or I can have a pointer to the last one that was "finished" (I.E. known not to need more data). And the pointer is iffy because I'm not entirely certain when to clear it? (It sort of logically belongs in "expect".) Or advance it: can't point to the trailing entry or the realloc() would also screw it up. Hmmm...

Anyway, if I get THAT solved, then the differing handling of trailing redirections on [ test ] vs [[ test ]] that I mentioned at the end of my second email to Chet comes into play, as in I need to finish teaching [[ test ]] to be a weird pseudo flow control block, which is what started me down this rathole. And THEN see what fails next in scripts/

And then if I'm really lucky I can transplant the test infrastructure inside mkroot and run it under QEMU, and start doing proper regression tests for things that require root to test, or delicate setup to reproduce. Which is what I need this for in the first place...

June 3, 2022

A side note to the ongoing discussion about MacOS binary versions: to me the frustrating part about MacOS is that Apple CLAIMS "Darwin" to be an open source OS like the Android Open Source Project. AOSP is real: I've built it on my own laptop and even booted it in the magic qemu fork AOSP ships a couple times. AOSP is not AIMED at me, but it works, and actual open source distros (lineage, replicant, eelo, calyxos...) have even produced versions that run on phones.

For me AOSP mostly goes in the raspberry pi bucket of "trying to actually USE this spins off too many tangents that distract me before I get to the shell prompt it can theoretically provide". In the case of PI it's things like what is the status of the actually open source pi firmware rewrites (answer: don't quite work yet, can of worms, possibly stalled by pandemic, whack-a-mole adding peripheral support), if anything at all goes wrong there's no HDMI output so what random GPIO pins do I use to get a serial port again? (Thanks to that short board bringup contract I've got a couple of those USB-to-serial-wires adapters lying around now, never did find my original one but I ordered more.) I STILL can't use a vanilla kernel on pi but must grab a stale fork off of github so what changes did they make... But my inability to NOT pandora every black box I come across is a "me" issue, and it works fine for other people who can blindly follow instructions to wave the wand without asking how it works. AOSP exists, my inability to lie comfortably in it without turning around 3 times first and a lot of digging is on me. (That said, it would be nice if these projects HAD fewer sharp edges and unanswered questions, especially after their tenth birthday. It SHOULD all be old hat by now, not magic hat producing undocumented rabbits via rote invocation, and now australia has a new invasive species...)

But Darwin isn't even that. When I google "Darwin QEMU" the first hit links to a List of available ISO images which is 404. The next hit is about the same darwin release version on a page dated 11 years ago. No actual apple page comes up on the first page of hits, although there is something called the "puredarwin" project, which on its main page links to a "teaser video" from only 4 years ago. (So... LESS dead?)

The Darwin wikipedia page says there are still "releases", but that's just tracking tags on Apple's kernel repo on github. The problem isn't that no file there has been touched in 13 months (so trailing source-under-glass release with no external ability to participate in development even with current bug reports). No, the problem is that's just the kernel, not a distro. No libc, no userspace bootable to a shell prompt, no toolchain, and the README is about how to build the kernel under paid MacOS involving a lot of menu selections for things Apple preprepared for you. (There's a "bless" command required as part of the build?)

According to this post-mortem (from 2006), Darwin failed because only the engineers wanted to do it, and this is another company where the engineers have no political power. For once the real power is Marketing rather than Legal (or weird executive financial scams), and Marketing did the absolute minimum to SAY they were doing something they didn't actually do.

So Darwin goes in the OpenSolaris/Illumos bucket: the mission is to convince our customers the snake oil we sell has every attribute our competitors' antibiotics do, not to change actual performance in any way. AOSP is successful enough as an open source project that even Huawei's HarmonyOS and Amazon's FireOS can adversarially fork it yet still wind up chasing the original's taillights ad infinitum. Darwin could not ever be shipped by a competitor to Apple, and "developer wants a MacOS build environment without spending over $1000 on Apple hardware" counts as a competitor. (Eating their seed corn is part of Apple's business model, they eat EVERYTHING.) Thus corporate strangled Darwin but made no actual announcement of its death, because it was only ever about perception.

June 2, 2022

Ok, bash WASN'T lying about not performing word splitting, it's just that a test with an otherwise unrecognized string is "true" when the string is nonzero length:

$ [[ BOOM ]] && echo yes
yes
$ A='1 -lt 2'; [[ $A ]] && echo yes
yes
$ A='2 -lt 1'; [[ $A ]] && echo yes
yes
$ [[ -e $A ]] && echo one
$ touch "$A"; [[ -e $A ]] && echo two
two


Bug reports about top misbehaving. From the email title I thought they were complaining about top itself using too much CPU (which is TOTALLY TRUE and one of my todo items for it), but they're saying the numbers don't add up, and their example (8x cpu with 724% system and 141% user, plus others) would bear that out.

Let's see, the kernel's Documentation/filesystems/proc.rst says the numbers on the CPU lines of /proc/stat are: user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice. Ok... Unfortunately, the bug report gave me ONE snapshot of the numbers, and top is displaying the difference between two sets. (These numbers are total ticks consumed in each category, meaning they count up monotonically until the system is rebooted. You read them and then read them again to see what changed, by subtracting the first reading from the second. That's pretty much what "top" does with everything...)
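
The subtract-two-snapshots arithmetic can be sketched in shell (an illustration only, not toybox's actual C code; assumes a Linux /proc/stat):

```shell
# Read the aggregate "cpu" line of /proc/stat twice, one second apart,
# and subtract. The counters are cumulative ticks per category, in the
# order given by Documentation/filesystems/proc.rst:
# user nice system idle iowait irq softirq steal guest guest_nice
read -r cpu a_user a_nice a_sys a_idle rest < /proc/stat
sleep 1
read -r cpu b_user b_nice b_sys b_idle rest < /proc/stat
# Deltas are the ticks spent in each category during the interval.
d_user=$((b_user-a_user)) d_sys=$((b_sys-a_sys)) d_idle=$((b_idle-a_idle))
echo "user=$d_user system=$d_sys idle=$d_idle"
```

(Note bash's read may itself perform short reads under the covers; the point here is the delta arithmetic, not read atomicity.)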

Hmmm... I wonder if this is a kernel locking issue? Specifically, getline() is performing multiple small reads from /proc, and the kernel ain't gonna hold "do not update these numbers" locks across multiple reads from userspace. Which means if we don't do one big read into toybuf or similar, the numbers can basically be incoherent. (I dunno what libc they're using, but if it's doing byte-at-a-time reads, or at least smaller than line length which looks like 60 bytes or so, then the kernel may be regenerating the line to find the part to append, and the regenerated line won't match the original, so two sets of numbers spliced together.)

The problem with reading into a fixed size buffer is that imposes a maximum supported SMP level. 4096/80 is 51, although the lines are usually shorter than that, so we're probably good for at least 64 processors here. And it looks like it already IS doing a single read into toybuf? Hmmm...

It's hard to debug top because it's hard to regression test it. The command is inherently interactive, and reads live system data you can't easily replay exactly. Hmmm... Step 1 is to reproduce the issue.

June 1, 2022

Jacinda Ardern, prime minister of New Zealand, was on Colbert recently, and I'm going "it's amazing what a country whose Boomers didn't breathe quite so much leaded gas fumes for 40 years can accomplish". I mean gun control, sure, but every country on the planet EXCEPT the USA has managed that and medicare for all (usually called a National Health Service). But their Boomers seem to have stepped aside gracefully for a prime minister young enough to give birth in office. They're NOT still driving well into their 80s with the youngsters having to stage an intervention to take the keys away.

The New Zealand Boomers also aren't obviously gullible idiots falling for every conspiracy theory (and nigerian prince scam) and screaming full-throated racist grandpa "off my lawn" slogans all the way into the grave (while clinging to power). Nor do they accept "nothing can be done, the billionaires are too powerful, just accept your fate" arguments for 37 different existential crises switching seamlessly between "masking will traumatize my child" and "active shooter drills are the price of freedom". Our local cohort, subjected to massive pediatric lead poisoning, is not going senile quietly. Here, the dumber they get the louder and more confidently they dunning-kruger, and our billionaires embrace the cannon fodder while trying to reinvent monarchy.

I'm also watching the local Boomers freaking out about how they've destroyed their species' birth rates. The racists are of course sure that Those People are still having lots of kids, because Those People are magically super-powerful when not being inferior. They're weak AND strong, as needed to feed racist delusion du jour. That's how fascists work: superior but victims, the rightful rulers always on the verge of being overwhelmed by a vast "conspiracy" that somehow massively outnumbers these supposedly "normal" people. How you can be both normal and a tiny minority raises significant questions about your definition of "normal" of course.

The dumbest part of "replacement theory" is the idea that it would be... not even a bad thing, that it would be UNUSUAL. The next generation will not look like you or act like you, and will be deeply embarrassed that you ever existed. What else is new? You don't need "trail of tears" genocide for that to be the case: kids flee abusive parents and escape religious cults all the time. We no longer wear neck ruffs and buckles on our shoes. Hats are no longer a regular thing. The necktie is an endangered species.

(The nudist argument is that ALL clothing should do that because we're sitting in air conditioning "protecting" ourselves from literal climate control. How does a bathing suit help you bathe? Just admit the puritans and shakers and quakers and other witch-burning loons left you with a shame-based culture, literally ashamed of your own body. That's not mentally healthy, but we repress it and don't talk about it because taboo. There are things you can't say and questions you cannot ask, which obviously means we're dealing with it well and it's causing no problems for anybody.)

Anyway, it turns out humans don't breed well in captivity. Latchkey kids were replaced by helicopter parenting and KEPT GOING. It now costs AT LEAST $30k to give birth in a hospital (when nothing goes wrong and they don't pull one of those "your anesthesiologist was out of network and gets paid $2000/minute" scams). The pandemic briefly resulted in $300/month child tax credit payments but Biden shut that down again immediately because he's 80 and doesn't need it. And of course the student loan forgiveness was merely a campaign promise; Good Cop never delivers on promises. (Although he MIGHT forgive $10k/borrower! Maybe! So one semester, back before the interest compounded.)

I keep repeating to myself, as a comforting mantra, "The Boomers will die". All the problems the world has right now seem Boomer-related to me, starting with hanging on to capitalism long long LONG past its sell-by date. Pediatric lead poisoning meets senility: bad combo. Last decade they named themselves after the Mad Hatter's tea party because that stereotype was based on mercury poisoning, and lead is a similar neurotoxin except it's less "mad" and more "stupid". Suppresses higher reasoning and puts emotions and "gut feeling" in charge. Tribalism, following a strong charismatic leader... fascism. The Boomers will die. Putin's 69, Biden's 79, Pelosi is 82, Schumer is 71, Trump is 75, McConnell is 80, Clarence Thomas is 73. The Boomers will die, and the survivors will Damnatio Memoriae everything a Boomer ever touched.

The Simpsons finale "Poorhouse Rock", when asked for a solution, literally said to burn it all down. (I'd link to that part but late stage capitalism intellectual property law, you know the drill.)

May 31, 2022

I _think_ all I had to do for the case/esac parsing was accept type 3 as a synonym for type 0, meaning it was a tiny fix. Lots of reading to refamiliarize myself with the code enough to trust that, though. (I look forward to a 1.0 release of toysh where I can do a video series explaining how it all works. Dowanna do one while it's still in flux.)

While I was there I checked in a pending one liner making do_source() complain about unfinished lines. This is sometimes redundant (at the command line "if true" and enter to get the continuation prompt, then ctrl-D to close input), but the $((if true)) case was exiting without a complaint. It's still not getting the error code right because I need to redo all that to ACTUALLY recursively parse the syntax, which... I am not looking forward to. There's GONNA be ways to make nommu run off the end of the stack if I go there, but the behavior's subtly wrong otherwise...

The NEXT failure using toysh to run scripts/ is because it's trying to use [[ ]] and I haven't fully implemented that yet. The failure is manifesting as not matching the trailing ]] because the code to handle it is missing. It's a weird case: unlike ( ) you can "echo ]]" just fine without quotes. Unlike { } it does not require a ; in front of it to be parsed but is recognized in the middle of the line. Hence the need for its own syntax mode. But what does that mode DO? Trying it at the bash command line, [[ ; == ; ]] doesn't work with those unquoted either. The reason [[ test ]] exists (instead of just [ test ]) is to disable some of the shell command line processing for the test, but only SOME of it. Great.

The bash man page says that within double bracket tests < and > are "lexicographic operators" (I.E. less than/greater than), but will that work ok if they're _parsed_ as redirects but not actually _used_ as redirects?

$ [[a<b]] && echo hello
bash: b]]: No such file or directory
$ [[ a<b ]] && echo yes
yes
$ [[ ; == ; ]]
bash: unexpected token `;' in conditional command
bash: syntax error near `;'
$ [[ @ == @ ]] && echo yes
yes
$ [[ # == # ]] && echo yes
> ^C

Sigh. I need to come up with a bunch of test cases. Token parsing in this context is modified, and not just slightly.

$ [[ >>abc 1 -lt 2 ]] && echo wha
bash: unexpected token `>>' in conditional command
bash: syntax error near `>>a'
$ [[ <<<potato 1 -lt 2 ]] && echo wha
bash: unexpected token `<<<' in conditional command
bash: syntax error near `<<<p'
$ [[ 1 >> 2 ]] && echo wha
bash: unexpected token `>>', conditional binary operator expected
bash: syntax error near `>>'

Error message changed for that last one: parser state advanced to fail a DIFFERENT way.

$ [[ 1 &> 2 ]] && echo wha
bash: unexpected token `&>', conditional binary operator expected
bash: syntax error near `&>'
$ [[ 1 <<<< 2 ]] && echo wha
bash: unexpected token `<<<', conditional binary operator expected
bash: syntax error near `<<<<'

Ha! Leaking implementation details: <<< is a redirection operator and <<<< is not. Yeah, it's doing the standard shell token parsing in between the double square brackets, it's just interpreting the tokens differently afterwards.

$ [[ 1 | 2 ]] && echo wha
bash: unexpected token `|', conditional binary operator expected
bash: syntax error near `|'
$ [[ 1 & 2 ]] && echo wha
bash: unexpected token `&', conditional binary operator expected
bash: syntax error near `&'
$ [[ 1 ; 2 ]] && echo wha
bash: unexpected token `;', conditional binary operator expected
bash: syntax error near `;'

Statement ending tokens do not end statements here (and redirect tokens don't redirect), but it's not exactly happy about it.

$ [[ 1 ;& 2 ]] && echo yes
bash: unexpected token `;&', conditional binary operator expected
bash: syntax error near `&'
$ [[ 1 |& 2 ]] && echo yes
bash: unexpected token `|&', conditional binary operator expected
bash: syntax error near `&'
$ [[ 1 |; 2 ]] && echo yes
bash: unexpected token `|', conditional binary operator expected
bash: syntax error near `;'

But they're still being _grouped_ the same way when parsing. It sees ";&" as one token and "|;" as two.

Hmmm, all the changes might be local to expand_redir(). I thought I might need a NO_REDIR flag for expand_arg_nobrace() but the redirections are done before calling that? It's sort of the "skip" argument of expand_redir, which previously jumped over prefix variable assignments. Except we DO still expand those arguments:

$ A="1 -lt 2"; [[ $A ]] && echo wha
wha

Dear bash man page: when you say "Word splitting... not performed on the words between the [[ and ]]" you are LYING.

Implementing is easy! Figuring out what it should DO is a huge time sink. Right, this is SORT of like prefix assignments, only... a new loop I have to write that handles the argument expansion here, and then call into expand_redir() with the skip so trailing redirects happen. (Yes, you can "[[ thingy ]] > walrus", works fine.)
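
Both halves of that (no word splitting inside the brackets, trailing redirect outside them) can be pinned down in a bash snippet (the mktemp scratch file stands in for "walrus"):

```shell
# [[ ]] expands $A but does not word split it: any single nonzero-length
# word tests "true". Old-style [ ] does split, so the same string becomes
# a real -lt comparison (false here: 2 is not less than 1).
A="2 -lt 1"
dbl=no; if [[ $A ]]; then dbl=yes; fi   # one word "2 -lt 1": nonzero, true
sgl=no; if [ $A ]; then sgl=yes; fi     # splits into [ 2 -lt 1 ]: false
walrus=$(mktemp)
red=no; if [[ $A ]] > "$walrus"; then red=yes; fi  # trailing redirect works
rm -f "$walrus"
echo "dbl=$dbl sgl=$sgl red=$red"
```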

May 30, 2022

Elliott gave me a longish todo list he was interested in having me implement and/or fish out of pending and cleanup/promote, and of course the magnetic pull of toysh has sucked me back in. The logic is that I want the test suite running under mkroot so I have a framework within which to implement a lot of the missing stuff, and scripts/ is a specific script acting as a specific test case with specific things I need to fix. That's a good checkpoint to reach for implementing everything else, but doing toysh requires loading a WHOLE lot of state back into my head. I've got maybe 1/3 of it so far.

Right now I'm re-reading parse_line(). Adding various additional TODO lines, and coming up with more tests. For example, what happens if...

$ cat <<< one <<< two <<< three
$ cat << EOF << EOF
> one
> two

Because stdin gets redirected (and the old one stomped) and then redirected AGAIN with the first redirection stomped by the second, so the command only sees one. Got it.
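
That last-redirect-wins behavior is easy to confirm from a script (bash):

```shell
# All three herestrings redirect stdin; each stomps the previous one,
# so cat reads only the last.
out=$(cat <<< one <<< two <<< three)
echo "$out"   # prints "three"
```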

I need leak detection plumbing. Ordinarily I don't care because exit() is the ultimate process resource cleanser, but this is a long-running thing with complicated behavior that juggles filehandles and memory and COULD lose track of them. (It SHOULDN'T, it cleans itself up, but what I can't see I can't be sure. Hmmm...)

May 29, 2022

[Still sick. Pascal's apology for this blog entry, stream of consciousness is a failure mode when I haven't got the focus to filter.]

Over in another project, someone pointed at a paper about "all digital PLLs for low noise applications", and I commented that they need some place collecting that sort of thing. Unfortunately, that maintainer's response to that kind of suggestion has historically been a non-public wiki, and things we didn't publish where google could find them were seldom available years later. It gets buried under daily clutter and fails to survive migration du jour.

It turns out "publish or perish" applies to the data itself: If we put stuff out where EVERYBODY could see it, then maybe we can google it up again years from now. The theoretical possibility that other people could see what we were doing and "scoop" us was a rounding error compared to us knowing we USED to have something but not being able to find it anymore on a dozen old system images migrated from place to place. Howard Aiken and Cory Doctorow were both right: nobody is INTERESTED in stealing your ideas because ideas are easy, turning them into something useful is A LOT OF WORK.

The "sufficiently public" question is both "have we mirrored the article" (will our link to it go away?) and "where is our index of metadata published"? Just the LIST of important articles, things we've read and our own notes about what we've learned from them... even just our to-do and to-read lists. These get lost easily. Stuff like this blog, as rambling and unfocused as it is, is out there. Which means I can find it again. I have the OPTION to go back and research my own history and take better notes, and collate stuff. (I keep meaning to go back and reread my old blog entries and turn it into current non-duplicative todo checklists and regression tests and such. Verify that the done things are done and evaluate the things that aren't done yet to see if they should BE done, and make a plan of attack for that subset. Yes, that's buckets of work needing to be done before I can do MORE WORK.)

Even though I'm quite the digital pack rat, I've still lost a lot of stuff over the years. SOMEWHERE I have the second half of 2016's blog entries, at least what I wrote of them at the time. Unedited, entries trailing off unfinished, sometimes just a couple sentences I was going to flesh out later with twitter and email and mailing list posts and git commit messages. But it was the skeleton I could use to edit and post the entries for that time with. I got hung up on a long unfinished writeup about some topic (probably a licensing issue), and the newer entries remained unposted because I can't upload them out of order with this technology, and then January 1st I skipped ahead (new year new file) always meaning to go back... But at some point I rsynced bits of what was on the website over what was in my laptop's www directory, and the "partial but clean" public version overwrote the "messy but far more complete" private version. I'm reasonably certain I _have_ the full 2016 file in the various incremental backups I made before then, but... I'm not quite sure where? I have to pull out old USB disks and binary search for "older than that rsync but newer than the file not existing yet or being naturally incomplete". They're... not exactly labeled with the appropriate granularity for that? Some of them are formal backups, some are "got a new machine and old hard drive still has old files". I've had a candidate drive sitting on a shelf in the bedroom for a year or so now, and need to dig up the USB-to-sata adapter to see what's on it. But the unedited file remained a todo item for a couple years before I noticed I'd lost it, and this is just getting the todo item BACK, without having a week set aside to edit and upload half a year's semi-finished blog entries.

You don't always realize you don't still have something (at least not conveniently available) until you try to dig it up and realize what failure mode it fell into. The 2016 blog entry is a variant of the "consolidation removed duplicates that weren't" failure mode. The oldest backup tarball on my laptop (2010, theoretically containing earlier years) got truncated by an incomplete rsync so I thought I had a convenient copy when I didn't (another thing I should go fish out of old hard drives). Sometimes I can't find a reader for an old format (vlc can still read this old talk but chrome can't, and I've meant to finish a transcript of what's there and redo the talk for years, but... it's another todo item that's aging out from "time consuming" to "hard to do"). Sometimes a file was moved between two places and I have versions of each place that don't have it. That lovely old talk on institutional memory had several more real-world examples. The data usually still exists SOMEWHERE but fishing it out takes time and effort, and then when you're DONE you've resurrected a todo item demanding a bunch more work which is usually why it sank in the first place. Going back far enough I have things in "arj" archives, an old OS/2 image I should get college files off of, at least one interesting zip disk I'm pretty sure is packed with an IDE zip drive that could read it... if I had a computer with a pre-SATA disk interface. (Building an old computer and installing an old version of Linux on it that could drive the old hardware... Not an afternoon's project. I might have a USB spaceship adapter that could work with that old disk, but it would also need a separate one of those 4-pin power supply things? I think?)

Meanwhile, if I posted a rant to linux-kernel or the busybox mailing list back in 2002, the archives are there and google pulls them right up. Livejournal less so (even before Russia pariahed itself), although it has the data in a sad unsearchable format I could mirror manually and grind through if necessary.

Data locked in a proprietary service doesn't really count as "published". People keep thinking "it's different now" and whatever walled garden du jour they're uploading to is eternal THIS time. Meanwhile I'm watching Charles Stross point out that Musk is trying to do to Twitter what Romney did to Toys-R-Us (load it up with unsustainable debt to avoid having to take a loan against his Tesla shares).

Youtube has been one of the most stable archive sites (compare to google video, vine, vimeo's recent announcement it's exiting the public video hosting business...), but Youtube's also been wobbly for a while now and is starting to smell a bit ripe with random unappealable copyright strikes, midroll ads retconned into 15 year old videos, and INSANE prudishness. They have "youtube for kids" and they have "age restricted videos", and being in EITHER category is a death sentence for recommendations; how does that make sense? And then on top of that their community guidelines on "nudity and sexual content" -- not the same thing -- say that basically anything they feel like "may result in content removal or channel termination". This topic is FAMOUSLY hard to define, and Youtube does an extra-bad job of it. They explicitly say fully clothed people and "clips from non-pornographic films" can run afoul of their policies, talk vaguely about "fetishes" (which can be literally anything, how does ASMR avoid this), repeat "please note this is not a complete list", mention "depicts someone in a sexualized manner without their consent" once but people being depicted as such WITH consent (including someone filming and posting themselves) still 100% violates youtube's policy so I don't know why they bothered, and end with "Remember these are just some examples, and don't post content if you think it might violate this policy" because their solution is keeping a wide moat of self-censorship around anything that might possibly offend anyone anywhere (such as in Iran or Saudi Arabia). Modern youtube has a surfeit of white men in part because it regularly decides some woman's neckline is too low in her book review and blocks the video. You cannot ever say what will NOT trigger their censors, which is not a healthy ecosystem, and they can literally just delete the whole account on a whim. ("We may also terminate your channel or account after a single case...")

The community sites that can lean on Moore's Law storage growth to bundle their entire history into a mirrorable format tend to reliably retain data, because if THEY screw up somebody catches it. Wikipedia[citation needed] says its complete database is currently just over 20 gigabytes (available via bittorrent), so if an article got lost somebody would likely still have it. There's sort of a Mary-Poppins style "run on the bank" dynamic: as long as you CAN get your data out, then the data is generally safe. If youtube-dl ever stops working, that's probably youtube's death rattle with its budget diverted to lawyers doing the Frank Drebin "nothing to see here" dance in front of the exploding fireworks factory. (Similarly, the bits of github you can clone feel safe. The bits of github you CAN'T clone are a walled garden now owned by Microsoft.)

The endgame for Youtube seems likely to look more like Netflix, which used to have 100k options you could watch but gradually shrank to 15k options. The great variety of content you used to be able to get through that service narrowed and homogenized until people started losing interest and wandering off. Even if you've bookmarked the old stuff, they just can't show it to you anymore. As for the generic stuff they're replacing it with, why go to youtube for clips of Colbert's show when you can watch the whole episode elsewhere? The unique remix stuff like reaction videos or AMVs is in violation of youtube's terms and gets copyright struck almost immediately these days. ("See the full reaction on my Patreon...") Commentary and analysis rolls to a stop, and even the people who don't leave feel the need to justify staying to themselves.

I'm still waiting for Patreon to get its own hosting together to replace vimeo. Yes I looked at onlyfans but they focus on subscribers to the exclusion of all else: not a lot of options to show stuff to people who HAVEN'T paid. They're doing an OFTV thing that's sort of a free service, but it's by invitation only from their existing creators and its page on their main service says "subscribe to see user's posts" for every post because the service isn't set up NOT to do that.

It's not really the bandwidth that makes competing video services hard to set up: we've had almost 20 years of Moore's Law since youtube started, and a new service can always scale with usage like youtube did (and of course bittorrent scales basically infinitely). New _streaming_ services get set up all the time. No, the hard part is the intellectual property regime for uploading user content without getting sued, which Tiktok skirted by being in China. Said regime is another thing that can't go away before the Boomers do. Give creators basic income, fix Patreon and OnlyFans and so on, and let Steamboat Willie and the Wizard of Oz enter the public domain already.

As soon as the Boomers die...

May 28, 2022

Hmmm. I've taught scripts/ when it needs to rebuild generated/Config.probed but doing that doesn't rebuild .config at the top level. Running "make oldconfig" would do it, but scripts/ is a passive consumer of the kconfig output for design layering reasons (and kconfig still being under a toxic license: all that directory does is produce a single .config file which scripts/*.sh then reads). The Makefile can ask kconfig to reproduce .config, but scripts/*.sh does not, so if the Makefile was going to update the .config it already would have before calling scripts/ Which implies I need to split this test out earlier maybe? But where would I put it?

I don't know how to teach "make" to run a script, THEN test a dependency. It's the old "mixing imperative and declarative code" problem again. I want to control what order things go in, and it makes sure I cannot as a primary design goal. This is why so many efforts are underway to replace it, but there's no cvs->git sea change, just a bunch of cvs->svn/perforce/bzr factions none of which are actually compelling.

Possibly a lot of these probed symbols shouldn't be Configure symbols, just #defines in a header. They're config symbols so the menuconfig plumbing can have dependencies on them, but how many currently do that? Let's see...

$ egrep "($(sed -n 's/^config \(TOYBOX_.*\)/\1/p' generated/Config.probed | xargs | tr ' ' '|'))" generated/ | sort -u

  depends on TOYBOX_FIFREEZE
  depends on TOYBOX_FORK
  depends on TOYBOX_ICONV
  depends on !TOYBOX_ON_ANDROID
  depends on TOYBOX_ON_ANDROID
  depends on TOYBOX_PRLIMIT
  depends on TOYBOX_SHADOW
  depends on TOYBOX_UTMPX

Yeah, that's fairly significant.

Ok, how about having it barf and exit with an error if the .config file is old and needs rebuilding? I can't recalculate those depends without calling back into kconfig, which would open stupid "derived work" gpl nonsense that I refuse to mess with. Someday I need to reimplement kconfig from scratch, but this is a tangent from multiple tabs I was trying to close and I'm not opening that can of worms _now_.

The next question is: how _do_ I figure out the .config is old? Answer: with egregious and flagrant use of sed: convert Config.probed into SYMBOL=y and SYMBOL=n lines based on each default, then grep those symbol names out of .config and make similar =y and =n lines, sort them, and diff them. If the probed symbols and their values match, we can use it, otherwise they need to be recalculated.
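
A sketch of that sed-and-diff check (the function names and exact sed incantation are my own, assuming the usual Config.probed layout of "config NAME" blocks with a "default y/n" line; this is not the actual scripts/*.sh code):

```shell
# Convert Config.probed into NAME=y / NAME=n lines based on each default.
probed_state() {
  sed -n '/^config /{s/^config //;h};/^[[:space:]]*default /{s/.*default *//;x;G;s/\n/=/p}' \
    generated/Config.probed
}

# Same symbols as .config sees them: "CONFIG_NAME=y" lines are =y,
# everything else (including "# CONFIG_NAME is not set") is =n.
config_state() {
  for i in $(probed_state | sed 's/=.*//'); do
    grep -q "^CONFIG_$i=y" .config && echo "$i=y" || echo "$i=n"
  done
}

# If the sorted lists differ, the probed symbols changed after .config
# was generated, so barf and make the user rerun kconfig.
if [ -e generated/Config.probed ] && [ -e .config ] &&
   ! diff <(probed_state | sort) <(config_state | sort) > /dev/null
then
  echo "probed symbols changed, run 'make oldconfig'" >&2
  exit 1
fi
```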

The NEXT question is, I went down this rathole because "make tests" failed in cp, because a simple "cp README newfile" was producing a zero byte file (without error!) probably because CONFIG_TOYBOX_COPYFILERANGE was set wrong. But why was that breaking? It should have fallback code. Really, that probed symbol is there to avoid a build break when we haven't GOT copyfilerange. It should fall back to a read/write loop through libbuf, which shouldn't cause runtime problems. What exactly was going on there? I may have screwed up that test environment tracking down the other issues (occupational hazard, this is why I stop and debug, but finding nested bugs can defeat that). I don't want to just wander off and NOT fix the other issue(s) I stumbled on falling down this rathole, but I need to figure out how to reproduce it again...

THEN maybe I can get back to whichever github bug report I was trying to look at before all this. (I don't even remember. I need to tie off my shell change first though, I think? This is why I blog, so I can read my own trail of breadcrumbs.)

May 27, 2022

Money arrived! In the actual bank account where Fade can see it! I'm sort of collapsing into a heap with relief, although still being sick is a contributing factor. (Not AS sick, but a nonzero amount of sick. Which was in the "not sleeping well" phase last night, and in the "any exertion that would raise my breathing rate turns straight into nausea" stage today, but other than that I feel pretty decent except for the coughing and sweating.)

The problem with having nontrivial technical discussions on github (instead of the mailing list) is they don't get properly archived. Even if I tried, there are lots of different nooks and crannies they can fall in. This one was commit comments on lines of code, stacked into a long back-and-forth. If I "git clone" the repo, I don't get those comments, they're only ever on the website's proprietary metadata, and do not leave the custody of the servers Microsoft bought 4 or 5 years back.

Alright, I should cycle back to tar --transform (and dd before that, and film multiple tutorial videos) but I've got mental state loaded for toysh. The current failing test case simplifies to bash -c $'case X in\n X) printf %s "X" >&2 || { echo potato;} ;;\nesac' and the plumbing needs to search backwards to find the X) instead of looking at the most recent pipeline segment. Hmmm...

And of course I hit another unrelated bug, where my recent rework to try to get it to do fewer "rebuild all" cycles and actually have better dependency generation... is producing broken results that I need to "make clean" to fix. Seems to boil down to it not detecting that the compiler/linker command line has changed between a musl cross compile and a debian host toolchain build.

(I'm still sick enough to waste an hour of debugging going down a "wrong theory of what's happening" fork. I do that all the time, but usually not for an HOUR before spotting a wrong premise.)

Sigh. I let the youtube app upgrade on my phone (I KNEW better, never let the google apps update or the UI randomly changes in bad ways and loses non-paid capabilities or inserts ads) and now the "drag the slider to the end of the video and hit replay" trick to kill the midroll ads no longer works. I don't mind ads before the video and after the video (I don't _watch_ them, but that's true of all ads) but randomly stopping the video to replay a completely unrelated video is unusable levels of malfunction, which I've always treated as "the video suddenly ended, youtube doesn't have the rest, such a shame". TV shows were _created_ to have ad breaks, the algorithmic insertion sticks them in the middle of a sentence, partway through songs (this chord change is brought to you by some random capitalist thing I would actively refuse to buy if I was paying enough attention to remember who they were instead of treating it as a pure malfunction; the phone ringing with "Scam Likely" and thus pausing the video contains exactly as much information).

So my phone no longer has working youtube support. Oh well. I need to set ADB back up and copy mp3 files into the thing again.

May 26, 2022

Respect to The Onion. And here's a good thread from a cartoonist who lives in Uvalde. (Remember, police have no duty to protect, but they do get full immunity from any liability and can literally steal anything from you at any time without needing an excuse, up to and including your house. This is another aspect of billionaire-owned late stage capitalism that apparently cannot be fixed while a single Boomer still draws breath.)

Woo, I got to correct the invoice with different banking info today! Let's see what happens next.

Walked to the table last night (very slowly) and managed to get a little work done, or possibly it counts as research? I'm trying to run scripts/ under toysh, and it barfs on the ;; on line 221. The REASON is it sees that the last block is type 3, which doesn't logically terminate with ";;". That's because there's an implicit terminator, notice how these two lines work the same:

$ a=X; case $a in X) echo $a ;; esac
X
$ a=X; case $a in X) echo $a ; ;; esac
X

I need to teach this bit of the parsing not to just look at the last block group but to dig backwards through symmetrical block groups to go "ok, what block am I really terminating?", and I think I made a function for that already? (It's not hard, but it is one of those for loops requiring its own temporary variable that's easier to make a function out of just for the variable.)

But the bigger issue is I'm stale on this code and need to re-absorb the context of which block group means what. I really SHOULD have documented all this somewhere, but remember it was all still in flux last time I tried?

Ok, so the general toysh data flow is that something calls do_source() on a FILE * (having already created at least one TT.ff function context), which calls get_next_line() in a loop (display prompt and do command history/editing as necessary, return a malloc()ed string), and then passes that to parse_line() which populates a linked list of struct sh_pipeline (its second argument) and returns 0 if it finished a complete thought, 1 if it needs another line of input to finish the thought (that $PS2 ">" line continuation prompt), or -1 if there was a syntax error and the caller should just discard everything up to that point. When it returns 0, the caller can call run_lines() to run the parsed lines (the struct sh_pipeline list), and afterwards free the sh_pipeline list. When do_source() hits EOF on the FILE *, it returns. (When you need to run an arbitrary string via -c or eval or something, fmemopen() can give you a FILE *.)

The third argument to parse_line() is the "expect stack", which is a standard struct double_list of block terminators. It's temporary storage, and when the list is empty we're not in a flow control block. (That's not the only reason we may need to request more line(s) of input, an unterminated "quoted string" or trailing || doesn't require us to be in a flow control block, but we can't start running the next thing until we're NOT in a flow control block either.)

Each sh_pipeline struct has an integer "type" field, which indicates flow control block type. The main block types are:

  • 0) normal executable pipeline statement (run as a command)
  • 1) start of block (if/while)
  • 2) "gearshift" (then/do) switching from the head of the block to the body of the block
  • 3) end of block (fi/done)

Some blocks, such as { } or ( ), have type 1 and 3 but no gearshift; every type 3 has a corresponding type 1 and vice versa. The "end" field in sh_pipeline points from type 1 (and type 2) to its type 3. (At runtime there's a "struct sh_blockstack" with start and middle pointers letting us get back to our type 1 and type 2 blocks, but since those go past as we're evaluating them we don't need to record them in the pipeline structure.)

Parsing "if true; then echo hello; fi > potato" gives us a type 1 "if" block, a type 0 "true" block, a type 2 "then" block, a type 0 "echo hello;" block, and a type 3 "fi" block (with the > potato redirect attaching to the type 3 block). Running that pipeline means: hit the type 1 if, allocate a new struct sh_blockstack pointing to this if statement's pipeline segment and push it onto the stack, check its type 3 (using the end pointer) for redirects (and save THEM in the sh_blockstack to undo when we pop it), and advance to the type 0 pipeline segment to run the test. Keep going through the test code until we hit our type 2 gearshift (there could be multiple statements separated by && or something, thus multiple type 0 or even nested "if if true; then true; fi; then echo ha; fi" -- anywhere a type 0 can go can also have nested flow control blocks, that's what the "struct sh_blockstack" tracks. We know when we've hit OUR type 2 by looking at the blockstack.) Then since this type 2 goes with an "if" statement, evaluate the return code of our test to see whether or not we should run the body (jump to the type 0 "echo hello;" segment after the type 2 "then"). If not, jump straight to the "end" segment (using the if segment's end pointer, which our sh_blockstack points to). Evaluating the end pops the sh_blockstack and runs its cleanup (freeing memory and undoing any redirects).
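
The nested flavor of that is easy to sanity check against an existing shell; this is just the example from the paragraph above as a runnable snippet:

```shell
# The test position of an "if" can itself hold a nested flow control
# block: the inner if/fi exits 0 (its body ran "true"), so the outer
# "then" branch runs and prints "ha".
if if true; then true; fi; then echo ha; fi
```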

Loops add some jumps back to the start, and instead of a type 0 block between type 1 and type 2 they can have a type 's' block, which is a list of strings instead of a command (for i in one two three; do echo $i; done). It's still a pipeline segment with the strings held in an argv list, and yes it has to terminate with ; instead of || or | because it's a syntax error otherwise.
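
The terminator rule is checkable against a stock shell (bash here; the exact error message varies):

```shell
# The word list of a "for" has to end with ";" or a newline before
# "do": a trailing "||" in that position is a syntax error, not a
# pipeline continuation.
if bash -c 'for i in one two || true; do echo $i; done' 2>/dev/null; then
  echo accepted
else
  echo rejected
fi
```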

Functions add type 'f' blocks right before a type 1. (For some reason functions in bash must be followed by a compound statement, but you can have abc() if true; then echo "$@"; fi and they're fine with it?) When a function finishes parsing it can convert into a type 'F' block where the function gets exported into the function name table and the whole span of struct sh_pipeline segments after it gets moved into the type 'F' sh_pipeline segment's first argv pointer. Why "first"? Because the argv in sh_pipeline is an array of arrays: its first entry is the command line arguments and its second and later entries are the HERE documents fed into this command. A struct can only cleanly have one variable length array in it, at the very end: if you malloc() extra space then indexing off the end of that array uses the extra space. So it realloc()s and resizes as necessary to handle an unlimited number of HERE documents. Because you can have multiple HERE documents read in sequence, and HERE documents redirected to different filehandles, and so on:

$ echo one <<EOF two three
> ha
> EOF
one two three

That example didn't USE the substituted input, but it did supply command line arguments after starting the HERE document. Yes you can attach HERE documents to function definitions, and you need to be able to rewind and replay them each time you run the function:

$ a() { cat; } <<< potatosalad
$ a
potatosalad
$ a
potatosalad

I honestly don't remember when a function definition is and isn't exported, although I vaguely recall it's one of those scope things like local variables and I have a pending bug report about popping the wrong thing when a function definition goes out of block scope. MY headache left over from poking at that was working out the lifetime rules for the string allocations in the pipeline segments. (These pipeline segments need to survive past do_source() returning. Functions are maybe 2/3 of the way implemented?)

The next bug in the test suite is that pipeline segments that should run in their own process, I.E. an implicit ( subshell ), aren't, and thus the parent shell hangs generating an infinite amount of data that it would then infinitely read if it proceeded to the next pipeline segment (the whole mess then gets fed to "head", which chops out the part it wants and lets the rest know to exit by closing the filehandle, so their output errors when they write to it). Except you can't just subshell EVERYTHING, because which (if any) parts of it should work in the parent's context so variable assignments and cd and stuff persist? (Of COURSE posix doesn't say.)
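
The hang case is basically the classic generator-plus-head pipe shape, sketched here with stock commands:

```shell
# "yes" generates output forever and only exits because head closes
# the read end of the pipe after three lines, making the generator's
# next write fail with EPIPE. If the generator ran in the parent
# shell instead of its own process, the shell itself would sit there
# producing data forever.
yes | head -n 3
```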

And then there's case/esac, which was added after the rest (including after functions), and introduces multiple gearshifts (type 2) within the same block. Quite possibly those shouldn't be type 2. This is the part I need to wrap my head around now to fix the next bug in front of me to get the test suite to run under toysh, and this is also the general area that's going wrong trying to run debian's /usr/bin/ldd script.
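
For reference, the shape that introduces the extra gearshifts, one case block with several pattern) bodies:

```shell
# Unlike if/then or while/do, a case statement can gearshift more
# than once: each pattern) line opens another candidate body, and
# ";;" jumps to the closing esac (the end of block).
a=Y
case $a in
  X) echo one ;;
  Y) echo two ;;
  *) echo three ;;
esac
```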

I still have the cold, I'm just tired of waiting until I can think clearly to go back through this stuff. To be honest I've had covid brain for months, and other stress-related problems before that (my startup collapsing during the 2016 election was... not good). I don't get to handle this by being "smart enough", I have to be persistent with duct tape and nail file until the result looks acceptable to me, dorodango style.

HEB had shrimp on sale, so Fuzzy grilled shrimp for dinner. Zabina was SO HAPPY we were in her yard. The dog was dubious.

May 25, 2022

Even more sick. All the sore throat and coughing and unfocusedness. Bug reports and fixes are going by on github and I'm trying to keep up, but I do not have a lot of spare brain right now.

I figured out what's going wrong with the darn ctrl-c issue in toysh though: it's waking up multiple instances of the shell, which then take turns reading from the command line. I chroot in with /bin/sh, then run /init manually, which pauses that first shell and gives me a new shell (conceptually a child of the "pid 1" that ran it). But the new shell does NOT take over the tty and create a new session ID and such (calls to setsid() and tcsetpgrp()), which means the ctrl-c is sent to the original chroot shell as well, and it wakes up and starts reading from the tty. The child shell ALSO caught the signal and didn't exit, so they're both reading from the terminal, meaning they grab alternating lines. That's how $PWD and getcwd() look like they get out of sync: one shell eats the "cd subdir" and the next eats the "cd ..", and then when I query $PWD in one and getcwd() in the next I get different answers...

I am not wording well today. I hope that last paragraph was coherent.

Anyway, the fix is to implement proper terminal control stuff. Which I think I have partway done in one of these fork directories already. I did a lot of stuff on job control that needs finishing and integrating...

Bad things happened yesterday. I'm trying to keep this blog mostly technical these days, but Uvalde is something like 150 miles from Austin: head to San Antonio and continue west when you get there for about as far again. The gun nuttery, and cops being useless drama queens with extra useless (and even actively harmful and lying about it, and yes Uvalde spends 40% of its budget on police, which have no legal obligation to actually do anything), and general refusal to lift a finger against white men no matter what they're doing, is common to both places. Austin may be the dot of yin in the giant yang that is Texas, but that just makes it slightly less concentrated here. We're all just waiting for the last Boomer to die.

Trying to focus on work in the meantime.

May 24, 2022

Still sick. Fuzzy had this for a week. Wheee.

I've reproduced the toysh confusion where "ls -l /proc/$$/cwd" and "echo $PWD" disagree, and it definitely seems to have something to do with interrupting a process via ctrl-C. The shell is in a slightly unhappy state after that. Unfortunately, as reproduction sequences go, "run something that takes 3 seconds and ctrl-c it partway through" is a bit... squishy. Not exactly easy to add to the regression test suite even if I do work out what's going on, and not exactly easy to run exactly the same test multiple times in a row with slightly different printf()s.

But it does NOT happen if the process is allowed to run to completion, only when it's interrupted. Hmmm...

Yay! The middleman said the domestic SWIFT transaction has finally errored out, and the whole thing should unwind over the next few days. Eventually the transaction on the middleman's website should go from "paid" to "error", then I get to resubmit it with fixed bank info, and then maybe actually see money 3 business days after that? That would be nice.

May 23, 2022

Various bug reports today highlighting the lack of mkroot-based testing, and I'm trying to poke at that (despite being sick and really out of it), but I've hit a WEIRD bug where $PWD and /proc/self/cwd do not match. When I run a child process, it's using $PWD. When I run a builtin, it's using cwd. Except it's weirder than that, because "ls" and "ls dirname" are behaving differently and they should be the same builtin-vs-child state?

This is the standard "freeze, do not step on the broken glass, examine the crime scene" problem where I have to STOP AND FIGURE THIS OUT rather than lose the test case, except slightly creepier because it LOOKS like filesystem corruption, but it's in a chroot not a vm, which means it would be HOST filesystem corruption. (I have a full backup from earlier this month, but still.) Nothing in dmesg. 99% likely it's my code and not the host code, but... WEIRD.

Alas, I don't have the focus to mentally model anything complicated right now. Trying, but... (Youtube used to have a clip from the Phantom Tollbooth where the Terrible Trivium convinced the Humbug to dig a hole through a cliff with a needle, but of course late stage capitalism has weaponized intellectual property law, so short clips from a largely forgotten 50 year old movie must be torn down.)

May 22, 2022

Sick. Fuzzy had a bad cold (muscle aches, etc) most of last week and I just came down with it. Turns out there's still diseases other than covid wandering around...

I was waiting for "minimum electrical bills" to punish rooftop solar panels for undermining the fossil fuel industry. That's why people in Austin have been speccing enough batteries to unplug from the grid entirely. (Yes, downtown. The Texas electrical grid came very close to collapsing again a couple weeks ago, and it's spring. Enron was the energy trading arm of Houston Light and Power, and the whole "deregulation, free market" libertarian nonsense poisoning the infrastructure here has rendered it basically unsalvageable, so people are opting out.)

May 21, 2022

And gmail unsubscribed almost 1/3 of the list again. (Including me this time, although all I had to do for that was open the confirmation link it emailed me. At least gmail's feral spam filter didn't eat the confirmation link. This time. It really does NOT like my replies to scsijon. Dunno why. And yes, it unsubscribed me by refusing delivery of one of those emails it would then put in "All Mail" but not in "Inbox" where pop3 could fetch it so I wouldn't get broken threads in the toybox folder. Gmail has achieved nested failures.) On the bright side, Google has apparently backed off on the forced migration to a pay service. Still haven't got the domain admin login, but it gives me a few more weeks to deal with the migration.

Checked in tar --xform, which left me with various loose ends: 1) --long-name-truncation (ambiguous, possibilities), 2) dirtree without eating filehandles (and an rm -rf infinite depth test), 3) testing under mkroot (mount had tortoise() swapped for xrunread() but no regression tests because root; the df test failed on the kvm ubuntu image because unionfs returned 0 blocks used). I've got a bunch more: the pending readlink relative path stuff (also hits cp -s), Elliott's todo list... heck, I should read the last couple years of blog and collate todo items mentioned there. (And collate my various todo.txt files.)

Poking at the --longopt-truncation stuff now, because while implementing tar --xform I could never remember if it was tar --show-transform or tar --show-transformed and realized I'd been using both, and according to the man page it's actually tar --show-transformed-names, with any unambiguous subset being automatically recognized:

$ tar --show
tar: option '--show' is ambiguous; possibilities: '--show-defaults' '--show-snapshot-field-ranges' '--show-omitted-dirs' '--show-transformed-names' '--show-stored-names'

This is easy enough for me to implement in lib/args.c, but the question is WHEN should it do this? What, a new leading punctuation type to say longopts are abbreviatable? I could easily make EVERYTHING auto-abbreviate, which would encourage scripts to abbreviate longopts in a way that might break with other implementations, and thus break compatibility with the gnu/stuff if it DOESN'T do that. Hmmm...

"sed p --null" works (it's --null-data), base64 --de works (it's --decode), cat --squeeze works (--squeeze-blank, ala -s)... Ok, it's looking like Debian handles any unambiguous truncation of --longopts by default. So lib/args.c should just do that for any --longopt.
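
A minimal sketch of that matching rule (hypothetical function name, NOT the actual lib/args.c code; note the real getopt_long also prefers an exact match over a longer option it happens to be a prefix of):

```shell
# Resolve an abbreviated --longopt against a list of known options:
# exactly one prefix match wins, more than one is ambiguous.
resolve_longopt()
{
  prefix="$1"; shift
  match=""
  for opt in "$@"; do
    case "$opt" in
      "$prefix"*)
        # Second prefix hit: the abbreviation is ambiguous.
        if [ -n "$match" ]; then echo ambiguous; return; fi
        match="$opt"
        ;;
    esac
  done
  echo "${match:-unknown}"
}

resolve_longopt show-t show-defaults show-omitted-dirs show-transformed-names show-stored-names
resolve_longopt show show-defaults show-omitted-dirs show-transformed-names show-stored-names
```

The first call prints the single match, the second reports the ambiguity, mirroring the tar error message above.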

May 20, 2022

Fade's back!

Still no money. They said the borked transaction would definitely be reversed this week, but it wasn't. Three recruiters have dangled work from home contracts at me this week. Hmmm...

Debugging the tar --xform stuff, it looked like the new xrunread() was leaking filehandles, but it just turns out that was dirtree recursing down directories and keeping one filehandle open per level. (Which I have a todo item to deal with already, needed for rm -r to handle infinite depth but I'm never comfortable TESTING weird rm -r corner cases so that's in the "do it when I've got mkroot running the test suite" bucket.)

May 19, 2022

Hmmm, maybe I don't need to spawn a sed instance per filename for tar --xform? If I added sed --tarhack I could force NULL terminator mode and prefix each input with a type (for s///rRsShH), and then on output filter out any interstitial NULL bytes (from somebody being tricksy with \0).

The problem is toybox tries never to DEPEND on calling toybox commands out of the $PATH. Toybox tar should work with debian's sed or gzip, toybox mount should work with debian's losetup, etc. But if I _don't_ depend on being able to modify sed, how do I get it to handle the input type filters at all? I'd have to implement a parallel sed expression parser to find the "r" in "/abc/,2{y/s/q/;s/a[s/q]/\1/gr;}"? (Almost certainly that's what debian's tar DID. Having five implementations of the same functionality across your command suite is very gnu.)

Another toybox plumbing rule is that commands shouldn't call other commands' functions directly, because that way lies build breaks. The shared infrastructure lives in lib, or sh.c has NOFORK plumbing (but again, commands that aren't in the same .c file drop out of the build cleanly when deselected, which a direct call to command_main() would not; it's a design layering violation). I don't want to make sed a nofork because it doesn't guarantee that all its memory is freed and its filehandles closed when a random error_exit() calls xexit(). The nofork plumbing can intercept that to longjmp() back to the caller, but unless the command is sanitized for nofork the error paths will leak resources. And for most of them the normal exit path leaks because the OS cleans all that up for us anyway, so calling close() and free() right before main returns is extra code accomplishing nothing.

The "clean" way to do it is just call xexec() to run the other command: if we can recurse we do so, and when we've hit our recursion stack limit we fork and exec out of the $PATH. (And the Android guys enabled TOYBOX_NORECURSE in the .config for some reason, so it NEVER does that for them and always forks and calls whatever's in the $PATH. I don't know why, but haven't asked.) The standalone builds always call out of the $PATH unless you build the other command in as a NOFORK (which is currently special cased for toysh and nobody else).

I posted my ruminations to the list about what exactly is tar --xform actually DOING with the s///S and such indicators, and the answer is apparently "not much". That's just sad. Hopefully I can just NOT IMPLEMENT THEM and we'll be ok? Let's see who complains...

Sigh, the new xrunread() function replaces tortoise() in mount.c, which shelled out to a command and returned its output, and that A) could output an error message with the command's return code (although the command itself presumably already complained to stderr, or else the failure to exec was spat to stderr by the forked child process), B) read into toybuf so it didn't have to free stuff. Which meant it was length limited, not that losetup -f is going to return over 4k of "/dev/loop1" string. But the new thing returns a malloced block of memory, which means it needs to free it to avoid leaking, and... I really didn't want to bother because it's a tiny amount of memory and how long is the world's longest /etc/fstab anyway? (How many times can this function be called in one run?) But I felt the need to write a comment, then rephrased that comment so many times I gave up and added strlen(s)>=sizeof(toybuf) to the error_exit() case and then strcpy()ed it to toybuf and freed the returned string. (It wasn't WRONG, just... untidy.)

Another very SLIGHT rough edge: calling out to sed will add a newline to the end of the output, and my chomp() implementation strips as many trailing \r \n as the file has. If they sed a file to end with multiple \r\n then MAYBE the gnu version wouldn't strip more than one (I haven't checked). I could fix this by calling sed -z, but the tiny possibility of maybe calling the macos sed that hasn't got -z (I dunno?) balanced vs the tiny possibility that somebody would try to create filenames ending in newline/linefeed characters...

Ha! Wait, my sed implementation WON'T add a newline if its input didn't end with a newline. So I don't have to care about that, and neither does the gnu/dammit version... but once again, I have no idea what mac would do, so I should probably chomp() it anyway? Grrr. Now I'm not sure what the right thing to do is here, the common case is two sed implementations (no, three: I wrote the busybox one back in the day and it gets this right also) that DON'T need the input mangled with rules of thumb (not that utf-8 is ever going to care, but who knows what weird binary crap people have in filenames?) vs the tiny possibility someone will dig up an actually nominally posix compliant sed from the 1980s that adds a newline to the last line when there wasn't previously one there, and thus transformed names all have newlines appended to the filename.
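
A quick check of the no-trailing-newline behavior (GNU sed on a Linux box here; still no idea about mac, as above):

```shell
# Feed sed a line with no trailing newline and count the output
# bytes: GNU sed (and toybox sed) preserve the missing newline, so
# "aBc" is still 3 bytes, not 4.
printf 'abc' | sed 's/b/B/' | wc -c
```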

It would be easier if my brain felt like coming up with things I could actually TEST today, but that's not the kind of day I'm having. (Speaking of which: tar --exclude is checked before tar --xform, and xform happens before tar --strip. Trying to figure out if I should add that to the tests because normally my policy is to add every test I needed to run when implementing, but if the order changes in future does TEST_HOST failing really MATTER? Sigh. Does anybody use these in combination? I'm not implementing the file type limiters in the sed expression, and what I AM doing is passing through to sed so you can use full sed syntax (multiple of each if you like, it'll do "sed -e abc -e def -e ghi" in order). Already deviating a bit, no standard, I matched the behavior and added comments to the source, and WHAT I'm testing is a weird corner case...)

Sigh: why is tar --strip only on extract? Is there a reason? ("tar c --xform" is done after stripping leading ../ and once again I CAN add a test but am unsure if I SHOULD...)

May 18, 2022

Garrett Kajmowicz (coworker of mine back at Timesys, who once upon a time wrote uClibc++) is visiting. He's taking advantage of work-from-home to go on a road trip. We had a cookout last night. He's asking me to "show him Austin" and I honestly haven't been out into it much recently? Well, lots of walking around late at night but not while it's open. He's not gonna go downtown to visit a courthouse or the state LLC issuer or the DMV. We have a bunch of really good restaurants... most of which are national chains? Various places to shop, but I'm not big into stuff. Took him to Rudy's. It's still good, but suffered a bit during the pandemic. We should hit the Alamo Drafthouse I guess? Except they're not doing anything interesting until this weekend, and he's only here a couple days midweek. Same problem with the conference center and theatres and music venues and stuff... all that stuff requires you to camp the spawn. (The lesson here is I'm not really a good guide for this sort of thing.)

Here's a good example of how ontogeny recapitulates phylogeny within Fortune 500 corporations. "I want simple." "Sure, I'll add that on top of the pile." (Every new C++ release wraps another layer of simplification around what's already there.) Completed debugged code is an asset with a dollar value attached, therefore deleting it costs the company money. After the people who did it left, reverse engineering something that already works is a waste of salary hours: everything becomes a black box you must wrap your changes around, encapsulating to form a pearl around the speck of random trash at the center. (Of course that's not how simple works. But it is how yocto works. Rather than cleaning the floor, install a new floor over it.)

Poking at tar --xform raises some funky sequencing issues: do you filter (--exclude) on the transformed name or the original name? And does --strip happen before or after --xform? I should add tests, but right now I'm just implementing SOMETHING and I can tune it later...

Sigh, and I have a TODO note that tar implements --exclude but not --include and that's just silly? (Well, I guess you use find for that, but you can use find with grep -v too? If you already did most of the work, why NOT wire it up? Sigh...)

Grumble grumble one of my sleep deprivation variable names slipped through (int ala = mode;) and it's a bit intrusive to change it...

Grrr... I think I have to spawn an instance of "sed" for each filename. I was thinking I could use -z for null delimiters, but what if s///p produces two output lines (thus getting us out of sync even with NULL delimiters)? And if I'm not strongly policing what sed -e they send (which I'd rather not, why can't they y/// if they want to? Or a whole sed script with however much complexity they need to get the job done?) then there's no guarantee it's gonna produce ANY output for a given input line. (Which I assume translates to "skip this file".) I can delimit the "this input produced this output, possibly none" by closing the input pipe and waiting for the output pipe to close because process lifetime. This spins the PID but eh, shell scripts do that all the time. (Making sed run as a nofork could be done, but isn't trivial. And so far all the nofork plumbing lives in sh.c, moving THAT isn't trivial either, although I need to do some subset of it to properly integrate Moritz Weber's upcoming git implementation..)
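
The per-filename scheme is simple enough to sketch in shell (hypothetical helper name; the real thing would wire up pipes to one child per archive member rather than going back through the shell):

```shell
# One sed process per name: write the name with no trailing newline,
# close the pipe (printf exits), and read back whatever the user's
# sed expression produced. Empty output would mean "skip this file",
# and a multi-line result from s///p can't desync the next name
# because each name gets its own process lifetime as the delimiter.
xform_name()
{
  printf '%s' "$2" | sed "$1"
}

xform_name 's,^usr/,,' usr/bin/ldd
```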

Of course sed itself can't parse the s///rRsShH filters on file type. (Regular, symlink, hardlink... why no dD for directory? Is the filter on the whole path or just the last member name? Because if you apply a filter to the path down to a filename but not to the directory entry that created that directory, it's not gonna match up right.) And so I need to run MORE tests against debian's tar to see what it does in order to work out a spec to implement...

May 17, 2022

Dear bash: why does a trailing || pull in the next line, but a trailing | does not? (Yes I know what the standard says, but WHY?)

Still ripping the generated/flags.h plumbing a new one. What I'd LIKE to do is turn something like the "mkfifo" command's optstring "<1"USE_MKFIFO_Z("Z:")"m:" into:

#define FLAG_m (1<<0)

And then generated/flags.h doesn't need to be regenerated when .config changes.

I also kind of want it to be generated via bash script instead of C (tricky), and I also want to collapse together the two FORCED_FLAG macros and just have it always be 1LL. Unfortunately, it turns out gcc's optimizer does "long long" math in 64 bit mode and then truncates the result to 32 bits even when it CAN'T MATTER, ala:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  unsigned long ul = 1LL<<atoi(argv[1]);

  printf("%lu\n", ul);
}

Compiling that with 1LL vs 1 before the shift and then diff -u <(objdump -d one) <(objdump -d two) says that it produces different code, even though the assignment to ul will truncate at 32 bits so the left shift might as well be 32 bits, the conversion-to-int can move one step up in the math because shift left can't have the bottom 32 bits influenced by anything in the top 32 bits. Even on 64 bit platforms it's an extra instruction (because then the truncate becomes explicit). I'm guessing on 32 bit platforms it's calling the library functions to do 64 bit FOIL algebra instead of the simple 32 bit math it has CPU instructions for.

Sigh, that's why I had the two macros in the first place (and switched to using the big one only when it was shifted by enough to matter). Hmmm. And with llvm it's an even bigger difference between the two assembly outputs. Sigh, you'd think with all the work they put into the optimizer, gcc or llvm would pick this up, but apparently not...

Right now, mkflags.c takes two different optstr inputs, one for allyesconfig and one for the current config; it resolves the macros with gcc -E, but does it twice and then diffs the strings. If I switch to preserving the USE_X() macro inside the FLAG_X macro then it only needs the one (unpreprocessed) input, which more or less means rewriting mkflags.c, and if I'm going to rewrite it anyway I might as well do it as shell instead of C. Except C string syntax is kind of nontrivial, ala "abc\"USE_"USE_POTATO("abc")"def" and yes that is an escaped literal quote and USE_ is legitimate optstr payload, although I think I can guarantee it'll never actually occur in that order in a string... I mean it COULD show up in a longopt, but I'm willing to bet it won't and fix it up if it does?

Anyway, trying to process this in bash turns out to be kind of unpleasant. I made a sed invocation to give me the command name and option string for each NEWTOY:

sed -ne ':a;s/^USE_[^(]*(\(.*\))$/\1/;ta;/^NEWTOY(/!d;s/NEWTOY(\([^,]*\),/\1/' \
    -e 's/, *[^"]*) *$//;/"/p' toys/*/*.c | while read NAME ARGS; do
  echo NAME="$NAME"
  echo ARGS="$ARGS"
done

But taking the result and emitting flag macros for it... that's easier to do in C than in bash? Which means I'm writing a new C scripts/mkflags.c, which is not ideal, but still... could be worse?

Except... The reason I have the low-ascii skip handling in lib/args.c is so when it hasn't got -Z the parser will error on -Z instead of silently accepting but ignoring it. Even if I chop the optstr contents out with macros instead of replacing them with spacers, the flag positions change so the FLAG macros have to be regenerated, which means rewriting generated/flags.h anyway.

Sigh, I WANT to simplify this infrastructure, but it's all got reasons for what it's doing. Needs more design thought.

In the meantime, I started down this path because tar --transform and --xform were synonyms without a short option to collate them (which matters because they take arguments and put them in two different buckets which don't maintain order relative to each other), and I should implement a solution to that in the current plumbing if this isn't a good near-term excuse to redo said plumbing.

May 16, 2022

Gmail is back on its bullshit. So many "bounce action notifications": refusing delivery of a message gets retried by dreamhost until the number of attempts exceeds the unsubscribe threshold. I can't get an actual human at either side to fix the design assumption conflict (can you just discard the one MESSAGE and not unsubscribe the USER?), just like I can't get gmail and thunderbird to talk to each other about their conflicting imap design assumptions (both the "All Mail" vs "Inbox" folder problem that screws up threading in mailing lists, and the problem that you can only permanently delete out of the "Trash" folder but thunderbird implements imap "move mail" as "copy" then "delete", so trying to move messages to the trash before deleting them DOESN'T HELP).

The common element here is, of course, gmail. I asked Elliott but he isn't aware of any humans working at gmail from his perspective either.

I went through and fixed all the mailing list accounts up by hand in the web interface again. Something like 1/4 of the list users are gmail. Nobody else got kicked into "hold:B" state, just gmail, as usual.

And I still need to move my OWN email off gmail by the end of the month due to gmail's forced migration to a pay service of my old account type (which I don't have the admin login info for, and thus couldn't migrate anyway even if I DID want to pay for it, which I don't). But I still can't start the migration yet because I want this "getting paid by the middleman" thing sorted out first.

Speaking of which, the middleman emailed me this morning with the reply they got from their outsourced payment processor:

So it looks like we need to ensure transaction is rejected at the bank and the money is refunded first before we can proceed. It should normally take a few business days but I hope to have all of this sorted for you this week.

Please let me know if you have any questions.

Oh do I have questions, most of which I didn't ask. I DID ask if "sorted out this week" involved just reversing the first transaction, or if actually getting money into MY account might happen this week, because of the whole "email needs to change ISPs at the end of the month" and me just assuming it will go as wrong as this because it often does for me. I break everything. It SHOULD take maybe 24 hours for the DNS to update. I EXPECT to be unable to receive email through maybe the end of June? No idea.

Questions I DIDN'T ask include whether they could elaborate on the word "ensure": is there a strong possibility of the money NOT being refunded? If the money went into a random account that didn't have my name on it and somebody got dollar signs in their eyes and absconded with it, is there some sort of insurance here? (I have zero visibility into this process. I've heard of large windfalls winding up in people's accounts due to bank error and them getting in trouble for spending it, but they still spent it.)

And of course, is the transaction being rejected because we asked them to reverse it, or because the transaction that "successfully completed" couldn't have successfully completed, both because you can't do a domestic transfer via swift and because there's no destination account with that name/number pair at that bank? (The name "Landley" was made up during World War I because a german immigrant named "Landecker" wanted a less german-sounding name while fleeing the war to the side that WOULDN'T draft him; there's not a lot of us. Google says a few people not related to me have taken it as a stage name, usually as their first name.) If "rejected" means rejected rather than cancelled, then how did it "complete successfully" first? What was the three day waiting period for if "destination account exists" was not part of the success criteria?

I had more questions, which I decided to edit out of the email reply as "not helpful", so I saved the longer version as a draft in thunderbird, edited it down, sent it, and watched thunderbird delete it from the drafts folder, without backup or undo option. Thank you thunderbird. You wrote a copy of the short version in the sent mail folder, but completely deleted the long version I explicitly saved. Bra fscking vo.

Yes, this is a normal weekday for me, why do you ask? I break everything, all the time. Nothing is foolproof because fools are so ingenious, and whatever soup of neuroatypicality I have combines with knowing just enough to be dangerous to near Newton Pulsifer levels at times. I am trying very hard not to have to migrate my email until the Q2 bank deposit is sorted, meaning I'm on a deadline. Wheee.

And yes, I make an EXCELLENT TESTER because of this. Doing it for other people hasn't been how I prefer to spend my time, in part because Murphy's Law only fails when you try to demonstrate it so RELYING on being able to break things seems like it'd break, but if you wonder why the software I make is like it is, this is what works for me. Achieving basic functionality (not gonna say reliability) by process of elimination. The Linux USB maintainers do not use dodgy cables nearly often enough, or they'd have fixed whatever locking problem causes ports to get tied up by a device that's been unplugged. (My laptop has 3 of them, but I've run out before. Luckily software suspend unloads and reloads the USB subsystem as part of its power management shenanigans, which seems to fix the issue.)

People worry about fork bombs, I worry about "tr '\0' a < /dev/zero | sort" or "while true; do mkdir a; cd a; done". (Has anyone ever actually TESTED inode exhaustion in most modern filesystems? Running out of SPACE, sure, but ext4 has fixed-size inode tables, if you touch a zillion files but leave them zero length, you run out of inodes without running out of disk space. What do OTHER filesystems do, and when were those error paths really last exercised?)
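For the record, you can at least check how close a filesystem is to inode exhaustion without actually triggering it, since df -i reports inode totals the way plain df reports blocks:

```shell
# inode headroom on the current filesystem: total, used, and free
# inode counts (row 2 of df -i output, columns 2-4)
df -i . | awk 'NR==2 {print "inodes:", $2, "used:", $3, "free:", $4}'
```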

I have so much practice at debugging because "how did I break it THIS time?" is a question other people have consistently been unable to answer. Still dunno what was wrong with my phone's volume down button. It's happy again now. Could go out again at any time. Same as it ever was...

May 15, 2022

Last time I sat down to do "tar --transform" the problem was that "tar --xform" is a synonym, but there's no short option. A short option can have as many --longopts as you want as synonyms, and they all get collated, which means they can collect a single * argument list where everything goes into the same variable. So if "tar -Z" were a synonym, the optstr "Z(transform)(xform)*" would let the command line "tar --transform s/abc/def/ --xform s/ghi/jkl/ --transform s/mno/pqr/" put all three in the same list in the original order. But without a short option to collate them under, leading "bare" longopts don't collate, so each would collect into its own arg_list and processing them after the fact would have lost which order they arrived in relative to each other.

It's not a BIG flaw, and this problem has come up before: sed doesn't remember what order it saw -e vs -f in when both are on the same command line. A certain amount of detail being lost is inherent to having generic code do the option parsing before calling main(): the neatly collated data can miss some subtleties of the original. But this time it bothered me enough I shelved it, and now I want to fiddle with the option parsing infrastructure to provide a way to at least collate bare longopts.

Except... $HOSTCC scripts/mkflags.c running on the host to generate a header file at all is a bit of a wart in the design. MOST of generated/*.h is created with a shell script calling the existing toybox commands. I've slowly been deprecating scripts/config2help.c so maybe at some point I could yank it (possibly replacing some of what it does with USE_COMMAND() macros in the help text?) and it really SEEMS like what mkflags.c does can be done with bash/sed. Especially if the result was more like:


I.E. when a string snippet is in a macro (INSIDE the string, I don't care about the guard macro around the NEWTOY() line here), I could just put that macro in the flag definition, and this avoids the whole dance where I currently run the arguments through the cc -E preprocessor twice to produce different option strings and then have mkflags.c compare them. (Which is a BIG part of the reason the headers have to be regenerated whenever the config changes!) It can be done in ONE pass through the data, and the result would be the same no matter what .config said.

Except flag macros can theoretically nest, and teaching sed to push and pop them... would not be pretty. Hmmm.

ANYWAY, what I should do is add an "invisible short option" that lets me collate longopts but doesn't result in a new short option, and teach the existing mkflags.c to handle that. What I DID was a fairly extensive cleanup pass on the build plumbing that doesn't address this at all, but makes the output a little more readable and makes explaining it in a video easier. :)

May 14, 2022

If a Pixel 3a's "volume down" button sticks, the phone is completely bricked. Specifically, when you reboot it, the bootloader menu comes up and then can't be dismissed while the "down" button is held, so it will never reboot back into android again. (And you can't power it OFF either, holding down the power button only works when it's not also holding volume down. I have to wait for the battery to die.)

The phone repair place in Hancock center didn't survive the pandemic, neither did the one that used to be in Fiesta across the street. The closest phone repair place I can find is the "1up" phone repair place on Guadalupe at UT (which I've walked past late at night and it specifically says they repair Android devices on the window). Dug around to find a paper book to read on the bus. (No bluetooth headphones. No podcasts or audiobooks. No e-books. No phone games...)

The 1up phone repair said a Pixel 3a is unserviceable because the case is designed not to come apart (and wouldn't go back together again if it did), so even if it doesn't need a replacement part (which they DEFINITELY couldn't get), they can't get it open to service anything inside without destroying the phone. A stuck button is the most trivial POSSIBLE hardware problem (other than maybe the usb port filling up with pocket lint, which you can fix with a needle or a sim card ejector tool), but the phone is a brick now. Because of the contact switch under a button.

The thing is, I bought this phone through Google instead of tmobile because I specifically wanted an unlockable one. (Even though the bootloader says I never actually bothered to unlock it.) And mail ordering another one would take a week. I could presumably pop the sim card out and put it in an old phone, but I stopped using my Nexus 5 because it got wet (and was never quite happy afterwards), and besides it was 32 bit so stopped getting security updates. I expected THIS phone to become my backup phone someday, and possibly be a thing I'd unlock and reimage and test software on...

Really not impressed with Google's hardware policy right now. I've had this thing for less than 3 years but it is now completely unserviceable. (I haven't even scratched the display, the problem is OBVIOUSLY TRIVIAL, but no. It's a brick. Thanks Google.)

Sigh, what was that cheap Motorola phone Louis Rossman liked? (Can't google it right now because I'm still out at UT and have no phone tethering because brick. Ah, UT guest wifi. It's the moto g. Which is $190 new and has an sd card slot, something the pixel did not. Its processor is cortex-a53, 64 bit, 8 cores maxing out at 2.3 ghz. In theory that's faster than my laptop. Only 4 gigs ram instead of 16, of course, but THIS phone accepts a microsd card and I've got a half-terabyte one of those somewhere...) Still multiple days to mail-order it, but there's a T-mobile store in Hancock center. Back to the bus...

HA! The guy at the t-mobile place was able to unstick the button! Even he's not quite sure how he did it (I took it out of the case and pressed, scraped with the SIM ejector pin, and whacked it against tables and such for like an hour) but it's working again! For the moment, anyway. Limp by until money comes in again, at least.

May 13, 2022

As Sam Vimes said, you do the job that is in front of you. Right, back on the horse.

Finished editing and uploading the second video (first was how to download and use binaries, second is how to build from source, including basic cross compiling if you have cross compilers lying around). I need to start fiddling with playlists...

The dropbear problem is because dropbear's autoconf is stupid. Zlib built for arm64 just fine, but then dropbear did:

checking for aarch64-linux-musleabi-gcc... no
checking for gcc... gcc

Because IT'S NOT -GCC, it's -CC. So dropbear has the same hardwired headache Linux does if you don't hit it with a rock. (Why did the absolute path version bypass that but the search-in-path version didn't? Autoconf! Why is it NATIVELY COMPILING WHEN IT CAN'T FIND THE CROSS COMPILER DESPITE THAT NOT BEING WHAT IT WAS ASKED TO DO? Because gnu/autoconf has "gnu" in it, so it doesn't care what it was ASKED to do, it thinks it knows better than any unwashed masses outside its cathedral (it was the original one) and will impose its will upon them for their own good, "this hurts me more than it hurts you", "look what you made me do", etc. Of course there's a middle aged white man at the helm, you can't be properly patronizing without "pater", the latin word for "father", as in The Fatherland. And of course and OF COURSE. Why did Matt decide to use autoconf? Sigh. I object to the design philosophy behind this tool. Oh well, usual whack-a-mole solution...)
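A sketch of what a saner probe would even look like (function name invented, this is not dropbear's actual configure logic): accept $host-cc as well as $host-gcc, and fail loudly instead of quietly native compiling:

```shell
# hypothetical cross-compiler probe: try the -cc and -gcc names, and
# error out rather than silently falling back to the host's gcc
find_cross_cc() {
  local host="$1" c
  for c in "$host-cc" "$host-gcc"; do
    command -v "$c" >/dev/null 2>&1 && { echo "$c"; return 0; }
  done
  echo "no cross compiler for $host" >&2
  return 1
}
```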

And moving the package builds before the toybox build (because the new scripts/root/dynamic needs to run before toybox builds if toybox is to be dynamically linked on the target) revealed the bug that my dropbear build script isn't returning to the top level menu. (The package builds are sourced, not run as child processes, because dropbear has to set an environment variable that writes extra stuff into the qemu-*.sh launch script. But that ALSO means each build script has to clean up after itself for things like $PWD. The problem in this case is that dropbear builds two packages, zlib and dropbear, and the lifetime of those package builds overlap because the zlib build creates a static target library dropbear links against but which is not installed to the target (no, I didn't modify it to be aware of the new dynamic build option), and setupfor/cleanup does not nest so "setupfor zlib; cd ..; setupfor dropbear; cleanup" puts us back in the directory "cd .." got us to...)

Hmmm. I kind of want a debug function that compares "env" output before and after, and $PWD, but... eh, that's not SIMPLE. Defeats the purpose. Really, the package builds should work as child processes. Maybe the QEMU_MORE stuff should live in a file somehow. Hmmm...

Anyway, fix up what's currently there. For the directory I can just pushd . before the source and popd afterwards. Bit of a band-aid but it works.
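The band-aid looks something like this (wrapper name is illustrative, not the actual mkroot plumbing):

```shell
# wrap the sourced build script so its cd/setupfor leftovers can't
# change the caller's directory, while still letting it export
# variables (like the QEMU_MORE stuff) into the calling shell
build_sourced() {
  pushd . >/dev/null
  . "$1"           # sourced, not a child process, so exports survive
  popd >/dev/null  # ...but we always come back to where we started
}
```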

And while I'm at it have there been new releases of zlib or dropbear? Yes to both. Except zlib 1.2.12 isn't on the sourceforge mirrors. The previous release (from 2017) is there, but the current release (from 2 months ago) is not. Despite still linking to them as "mirrors". Time to email the maintainer. (If I link to the copy on, the build will break as soon as he releases a new version because he deletes the old ones when he does that. "Because building against it would be insecure", he says. That's not how this works for any other package. Ok, it took him 5 years to get this release out because it's a very stable project that does not do "heartbeat" releases to show the maintainer hasn't quietly died without anybody noticing. But still. Back in aboriginal linux I had my own mirror page I put the tarballs on, but manually maintaining that and other people depending on me to do that going forward isn't SIMPLE.)

Eh, I can update the dropbear version, anyway...

May 12, 2022

Sigh, miscommunication within the household resulted in the mortgage check being sent out on time (so it wouldn't be late), and bouncing today because the money wasn't in the account yet. (Fade then got a cash advance against her credit card to put enough money in the account if they resubmit it. If I'd known the money would take this long to get here she could have done that last week, or I could have taken another contract after the last one, or sold stock, or tapped a Roth IRA... Two recruiters have called me this week to ask if "things are finally panning out with Google" and I replied that I honestly don't know. Xeno's invoice.)

The money SHOULD have been in the account because the dot-com company that the middleman outsourced the wire transfer to said it completed at 8:15 this morning, but the money wasn't in our account. Fade called the bank people, and they said there was no trace of a transaction. Cue hours of digging...

So of course it turns out I filled out the bank deposit information wrong. I typed in the info it asked for accurately: the account number was right, my name was right, my mistake was when it asked for a swift code I gave it one. The bank guy Fade talked to assured her you can't use a swift code for a domestic transfer, it's ONLY for international transfers. (And yet the transfer went through...? "Yes, but what happens if you DO?" "Oh. Huh. Ummm..." Ah, that old refrain. Story of my life.)

The middleman company's web form defaulted to "swift" instead of "ABA routing number" (and thus there was no place to type a routing number, since I didn't notice you could click on it and switch the type), and their system did not check that it was trying to do a domestic transfer to a US address via swift code. Instead it did the transfer to "Wells Fargo Bank International" (which apparently isn't the same as the domestic Wells Fargo Bank?). The account number was right... for the domestic bank. There apparently WAS an account with that number to receive the money, since IT WENT THROUGH. Not minding that my name was not on that account?

So now I've asked the middleman company to ask their outsourced wire transfer company to reverse the transaction, although this took hours to do because the person I was emailing last time turns out to be in New Zealand and didn't get into work until around 4pm my time, so I'd sent four emails and created an account on the middleman's slack by the time I got a reply.

They say I should wait for the transaction to reverse (3 business days), then change the deposit info and resubmit (3 more business days). So hopefully sometime next week I might actually see money. And I shouldn't fiddle with anything while it's reversing which rules out a parallel invoice with the remaining money parked at the middleman. (I only invoiced myself for 3 months because I did not trust this process. A few weeks back I explained to Elliott why Fade handles all our online purchases, and he tried to reassure me as to their safety. Did I mention that as far as I can tell, none of my computerized medical records still exist? You literally CAN'T get rid of that stuff, and yet. Welcome to my world...)

That ate my energy for the day. Curled up into a ball for the rest of it and finished reading Harbinger... which ended on a cliffhanger. Book 4 came out in 2014, EIGHT YEARS AGO. Sigh. I really hope it doesn't take her another 8 years to resolve the cliffhanger, she turns 60 next year. (I'd subscribe to her patreon except for the stupid screen blanking bug every time I visit her profile, no of COURSE patreon hasn't fixed that. I should do that on my phone, but "my phone being authorized to spend money, even indirectly" is a line I have not crossed. The orwellian tracking device with cameras, GPS, and whole-room microphone that's on and with me 24/7 does not get to spend money. Yes I am contributing significant chunks of software to its operating system, but that's one lump in the katamari. They try very very hard to secure it, and state level actors and criminal syndicates (venn diagram's not QUITE a circle there but between trump/putin/xi it's pretty darn close at times) push back the other way with a literal body count. The hardware was made in china and I'm supposed to TRUST it? My main defense here is being largely uninteresting and blending in with the crowd.)

May 11, 2022

I got a bug report about disabling logging changing the cross compiler behavior in a way that broke the dropbear build. It looks like an absolute path to the cross compiler prefix (thus any tool name you append becomes an absolute path to that prefixed tool) becomes an unpathed prefix with the $PATH adjusted to have the set of tools in it. In THEORY both should behave the same, in practice the dropbear build started using gnu/autoconf back in 2009 and thus only ever works by coincidence.

But when I tried to reproduce this problem in a clean directory, I got:

scripts/root/plumbing: line 17: root_download/zlib-1.2.11.tar.gz: No such file or directory
wget: unsupported protocol:

Oops. I promoted wget and updated the airlock list to replace host wget with my wget, but the "defconfig" build doesn't include openssl (and thus https support). And I really don't want mkroot to DEPEND on the host having openssl installed. Long-term I want to do what busybox did and implement my own https plumbing. (Denys did it from the RFC, it's not an unmanageable lift. I'd REALLY like to make puppy eyes at him and use his code under 0BSD but he's been incommunicado for a bit.)

In the meantime, possibly I should modify the dropbear build to use /usr/bin/wget for https downloads. (Or at least fall back to it on a second attempt...?)
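The "fall back on a second attempt" idea sketched generically (the helper and the commands fed to it are placeholders, not the actual mkroot plumbing):

```shell
# run each command in order until one succeeds; real use would be
# something like: fallback "wget -O $f $url" "/usr/bin/wget -O $f $url"
# so the host's openssl-enabled wget only gets used when ours can't
# handle the protocol
fallback() {
  local cmd
  for cmd in "$@"; do
    $cmd && return 0
  done
  return 1
}
```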

May 10, 2022

Wen Spencer's "Harbinger" is out, so I'm re-reading the series up to that point before reading the new book. The new one's book 5, so this may take a while. (It's been out for a month but I was waiting for the audiobook, and I've stopped waiting. I prefer actually reading books, but since eyestrain is my limiting factor for working...)

May 9, 2022

My problem with $(echo||) being unable to return an error is that pipe_subshell() returns an fd, but not the PID, without which we can't wait() to get the exit code. That's fine, the shell won't accumulate zombies because wait_job() will eventually harvest the child data (and discard it as not in the jobs table). But it does mean that the return code from the subshell doesn't get propagated anywhere.

Which is actually what bash does: echo -n $(false); echo $? prints 0. What barfs is a syntax error, which bash notices early because it's the host shell recursively parsing the shell syntax within the $() and thus noticing the unbalanced statements before the end of the parenthetical block. Which means this isn't JUST an obscure issue of using case statements in shell substitutions (which I've never actually seen done), this is easily user visible syntax error checking. Hmmm.
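Both behaviors are quick to confirm from a terminal:

```shell
# the substitution's exit status is discarded: $? reflects echo, not false
bash -c 'echo -n $(false); echo $?'     # prints 0

# but a syntax error inside $() is caught at parse time, so the whole
# command aborts with a nonzero exit before anything runs
bash -c 'echo $(true &&)'               # syntax error, nonzero exit
```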

Darn it, I have to make $() parsing catch syntax errors at parse time. Which means parsing has to be recursive, which it's REALLY not set up to do. Grrr.

Probably I should finish the rewrite to let while true; do echo hello; done | head -n 3 work properly first. (The first pipe segment needs an implicit subshell, the while can't run to completion in the host shell context before launching the next pipe segment. Also, head closing its stdin needs to terminate the while loop. Right now if you add explicit parentheses around the first pipeline segment yourself, "echo" is noticing the output is gone but the while true isn't and thus doesn't terminate, so the pipeline hangs eating CPU after outputting its three lines.)
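For reference, the target behavior (which bash already gets right) is:

```shell
# in bash this terminates: head exits after 3 lines, the next echo in
# the implicit subshell gets SIGPIPE, and that kills the while loop
bash -c 'while true; do echo hello; done | head -n 3'
```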

All of this is tangled into terminal and job control, of course. I did large plumbing changes for this and got pulled away before they were finished, and need to reverse engineer my own code so I can complete it. Which is a well I haven't dived back into yet because I need to pick off some of the items on Elliott's list first: at least dd, tr, readlink, and tar --transform since all those are ALSO open cans of worms already...

May 8, 2022

Darn it, it's been long enough since I've edited videos that I'm not remembering the key bindings. I need to make myself a cheat sheet. Let's see...

I did blender setup and saved defaults so it starts in video editing mode with a good panel layout, so I don't need to do anything with that, but I should dig up and bookmark the video I used to do the setup in case I need to install it on a new machine.

Plug the USB headphones in first, click the xfce volume icon and select "audio mixer" at the bottom of the drop down, go to "output devices" and click the green checkbox, go to "input devices" and click the green checkbox, THEN run blender (because you won't be able to move what it's outputting to otherwise and for some reason plugging in the headphones doesn't make them the default input/output device). This is more important for recording than editing, but it's still nice to have headphones when editing.

Dismiss the stupid advertising pop-up it always starts with, then at the bottom click "add->movie" (in view/select/marker/add/frame/strip). Navigate to the first mp4 file from simplescreenrecorder and select it. Mouse over the display window and hit "home" to zoom it, mouse over the edit strip at the bottom and hit "home" to zoom that. In the upper left (properties) area scroll to the top and update the "end frame" field to the number of frames in the clip. (That has to be manually updated every time material is added, "home" doesn't change it. The default of 250 is insanely short.)

Now down at the bottom press "play". If it loops back to the start you've gone past the "end frame" and need to adjust. If you pause and advance the frame number with the arrows it plays just one frame, and at 5/sec it's reasonable to navigate by that. The vertical green line is the current frame number. Playing forward is more coherent than playing backward. A cut at the current frame happens at the START of that frame, so if you advance to where a sound plays or a screen change happens, your ending frame is ON that change (not the frame before it). The starting frame for a cut is AFTER the last frame you want to keep. Left clicking also moves the line (selecting a strip is right click, yes it's crazy/backwards). Or hit "b" to batch select, hold down the left mouse button to drag the rectangle over both strips (audio is on top, video on the bottom), and release the mouse button. No there isn't a reliable visual indicator of what is and isn't selected, it sort of changes a bit but not in an easily confirmable way.

Hit shift-k to hard cut at the vertical green line. (Only applies to selected strips, and you have to re-select after every move. Non-shifted k is a "soft cut" which has never worked right for me.) Right click the little deleted bit, hit x to delete, confirm in the pop-up menu (sigh, it's got ctrl-Z to undo with a VERY deep stack but no, it needs a confirmation pop-up every time), select the remaining strips with "b" again, hit "g" to move, and move them left to fill in the gap (cursor left, then enter when done. If it overlaps the existing one it'll go red, but it bumps it back right to not overlap when you release so you don't have to be too precise. If there's a black dropout, you accidentally left a gap so move it again.)

Video editing is basically repeating that over and over to chop out the 1/5 second intervals that shouldn't be there until done. Add the image trailer at the end, then in resolution select "100%" in scale for render resolution (it keeps going back to 50%, maybe I need to update the save thingy?), and then at the top render->render animation. This writes out the file, very very slowly, to a random-ish name in whatever directory it started in (it doesn't prompt you for an output file name, and when you navigate to another directory to read the filename, it doesn't care).

Watch the result in vlc to make sure it's acceptable, then upload it to prudetube.

May 7, 2022

I'm still on some University of Wisconsin mailing list from that japanese class I took in Milwaukee, and they're having a "Trauma in our community" seminar. (The Boomers are not dying fast enough. Alito is 72. Clarence Thomas is 73. That's almost a decade younger than Biden and Pelosi, but still most of a decade past what used to be mandatory retirement age (65) before Boomers refused to step down while still alive. And that's ignoring the 40 years of breathing leaded-gas fumes that gradually made the lot of them measurably dumber even before they started going senile.)

Slowly getting pulled back into shell headspace. I have toysh to-test notes for things like echo $(if true) and echo $(true &&) that should produce syntax errors (and thus an exit code of 1 despite inability to run the command, same general idea as ${unset?complaint} erroring out in the command setup before having something to run). Making that work requires some poking at sh.c. It also raises the problem that if it just produced error output without changing the return code (luckily not the case here), I'd have no idea how to make the test suite TEST that properly, with TEST_HOST also passing. I kind of need a regex match on the produced output. Or "did it produce any output"? Hmmm... I could probably do it with txpect.

The next fun todo item in that same list is echo $(case a in a) echo hello;; esac) which toysh can't parse because $() contents just matches up parentheses, it doesn't do full recursive parsing (I'm waiting for somebody to complain who actually NEEDs it) because it would have to either discard the parse results and re-parse it again later anyway, or else result in a nasty branching tree structure that would be awkward to free() again. Right now I parse shell script into a circular doubly linked list with each node containing an array of argv[]. (Ok, it's an array-of-arrays with the first being the command line and the later ones being HERE documents; reality is ALREADY pretty complicated.) Having each element of that argv[] potentially have MORE THAN ONE entire subsidiary shell script circular-linked-list-containing-array-of-argv[] is just... really dowanna? That's why I'm keeping $(substitutions) as a string until it's time to execute them. You've still gotta get the sequencing right:

$ X='$((1+2))'; echo $($X)
bash: $((1+2)): command not found

But that's why I have/need so many tests...

May 6, 2022

I got the release out, after the usual shenanigans.

Back to poking at dd again, and hating dd conceptually. This keyword=value thing was really not a good interface (posix never even says what conv= is short for), and the DEFAULTS are all wrong: Why is there no seek=end? Without conv=notrunc dd defaults to truncating the file before writing to it, but oflag=append does not imply notrunc? (The toybox one probably should; that's a deviation from what upstream does but it's a NOP otherwise! Or undefined behavior when two oflag=append stomp some of each other's initial data and then interlace the output.)

Right, nose to the grindstone. Pickaxe to the coalface. Read the spec again, and the man page. Oh, speaking of which, I'm still subscribed to the coreutils mailing list (camping the spawn of "cut -DF") and I saw a MUCH nicer syntax for something wander by... here. iseek and oseek instead of remembering which one's seek and which one's skip. I definitely want toybox dd to support that. Although the adjacent commit is damn awkward to implement because toybox passes all this through to "atolx()" which is case insensitive, and "b" is already 512 byte blocks... Wait, no, it was originally bytes? Why did that change... because "most things" wanted b to be 512. Which things? Apparently dd! Probably posix says that. And indeed, dd if=/dev/zero count=1 bs=1b | wc gives me 512 bytes. So they added a new capital letter that means something different than the lower case letter. Bra filesystem checking vo, gnu/dammit guys.
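The truncation default is easy to demonstrate with a scratch file:

```shell
# without conv=notrunc, dd truncates the output file before writing
cd "$(mktemp -d)"
printf 'hello world' > f

printf 'HI' | dd of=f conv=notrunc 2>/dev/null
cat f    # → HIllo world (tail preserved)

printf 'HI' | dd of=f 2>/dev/null
cat f    # → HI (file truncated first)
```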

Over on github Elliott wrote a thing more or less in favor of assert() to check for "should never happen" errors. Personally, I lean towards "never test for an error condition you don't know how to handle".

The problem is half the time a "can't fail" error is benign. I've seen spurious -EINTR leak through because the process was suspended and resumed, or -ENOENT when a device got hot-unplugged (which should have been treated as EOF but wasn't; I.E. same general reason for nerfing SIGPIPE). In a lot of cases the correct thing to do is ignore it and continue, because it doesn't actually indicate a problem. Exiting in such a case just makes the program more brittle.

An assert() is fundamentally about being afraid to do the wrong thing, thus exiting. But exiting can BE the wrong thing. (Especially when your caller doesn't notice and carries on with the truncated data, but simple denial of service is enough to be wrong when it would otherwise have worked.) In trying not to do the wrong thing, you do the wrong thing. In biology, this is the histamine problem. We deploy antihistamines at everything because our bodies respond to a WIDE RANGE of stuff with swelling, which is often a bigger problem than the original issue... but not always. Exiting is not always the wrong thing: I've got buckets of xfunctions() in lib/xwrap.c that exit when they detect something going wrong. But I've also removed a LOT of spurious asserts from code over the years where it was complaining about a non-problem (and thus the proper fix _was_ to remove the assert).

"What is the best thing to do here" is a DECISION. I do a lot of staring and pondering at cases, trying to figure out possible inputs and their ramifications. But "if the unexpected happens, do this"... having a stock response to the unexpected is a contradiction in terms. When I don't know what the correct response is, I lean towards not making it worse. If I don't know how to respond, I do not prepare a response. (And "suppose the kernel is compromised and sends you malicious attack data when you're reading /proc/filesystems"... I can't fix that. I'll suppose a future kernel produces more than 4k of input, sure, but a filesystem name of "../../../../etc/shadow" is not an attack vector I'm willing to add code to defend against: in that circumstance they can probably already ptrace me. I'm out.)

What I really want is a proper test suite that triggers every possible error path, but that turns out to be really hard to do. An error handler that is never tested can BE the problem. Case in point just now: I got the order of operations wrong because that error path has never triggered. How do I add a test to trigger it? I don't remember how you can fail to create a new variable, but I remember there was a way to do it? I need to read through setvar_found() to see why it can fail... the two "goto bad;" cases are because of VAR_READONLY, or += on VAR_INT failing calculate(). Except (vv = addvar(s, ff))->flags = VAR_NOFREE; does not set either of those flags? Oh what was this case... something about dangling symlinks with VAR_NAMEREF? Have I not implemented the variant that can fail yet?

I gotta set aside a block of time to get back into this headspace. If I can't come up with a test case, then the error condition can't occur here and I don't need a handler for it. (And that's not a "should never happen" because a function I wrote is calling another function I wrote. When you change the function, you check the callers before committing it. And then you add tests for the externally visible behavior your new code causes/allows/prevents/changes.)

Sigh. It is entirely possible I'm wrong, but I don't think going EITHER way is always right. Elliott leans one way, I lean the other, but it's an individual judgement call in each case what to do.

May 5, 2022

I did not put a jar of mayonnaise in the sink and take a picture of it today. Not feeling the holiday spirit.

Today was the day that money from google was supposed to make it to the middleman organization. (45 day payment terms on invoices, and Fortune 500 companies always wait until the last day to pay so they can eke out tiny little bits of interest on payables. That's late stage capitalism for you.) The website balance is still reading $0.00, but I sat down and figured out how to do the paperwork anyway (with Fade handholding via phone). I invoiced myself for Q2 (13 weeks, at exactly halfway between the hourly rates the last 2 gigs were paying me), approved my own invoice, entered my banking info for deposit, filled out the third party form they use (docusign is neither secure nor particularly legal, and yet...) to hand over my identity theft number so they can do tax reporting (although not withholding)... and now we wait. Their documentation says they process approved invoices twice a week. I'm assuming the invoice existing before money comes in won't cause a problem (it'd just get bumped until next time)...

Almost got the toybox release out. I actually uploaded a version of my release checklist which isn't entirely up to date but gives a general idea. Wrote up the release notes, and edited them into parseable HTML. Built the mkroot images with the right version tag and uploaded them. (I cheated slightly and undid the tag, committed two more things, and redid the tag after the build, but both changes were just to documentation and wouldn't affect the binary). The toybox static binaries come from mkroot these days, out of root/$ARCH/fs/bin/toybox in each target build. Yeah it pulls in two things out of $PENDING but I'm ok with that.

I spent April shoveling through open tabs and a backlog of half-finished stuff, trying to close down for a release, but what I really need to do now is get serious about the dev environment stuff. Google gave me a list of stuff they want for AOSP to hermetically build itself, which is the priority to get done first, but that's only part of the picture.

I want to run the toybox test suite under mkroot, which is MOSTLY a question of toysh being good enough. Right now, when I run "./sh scripts/" it goes "source: syntax error 'scripts/'@221: ;;" and yes I had to upgrade the shell to beat that much detail out of the error message. (It was initially just saying "source: syntax error ;;" which... isn't helpful.) This is the same part of the code (case/esac parsing) that's preventing /usr/bin/ldd from running under toysh. (Gnu ldd involves a large shell script, which starts with #!/bin/bash so once again ubuntu's decision to switch /bin/sh to point to dash remains stupid 15 years later. You can't NOT have bash on the system. You can TOTALLY not have dash.) Once I've got the tests running under mkroot, I have a lot MORE tests I need to add that run as root and/or require a known environment containing things like the ethernet "dummy" driver.
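For reference, the general shape of construct that parser has to cope with (a made-up example to show the ";;" terminators, not the actual failing line in the script):

```shell
# Minimal case/esac: pattern arms each end with ;; before esac.
word=scripts/
case "$word" in
  *.sh) echo "shell script" ;;
  */)   echo "directory" ;;
  *)    echo "other" ;;
esac
```

Running that under a working shell prints "directory".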

I also want to build toybox under mkroot with a native.sqf compiler loopback mounted. This immediately dies due to "make" not being there, but assuming I harvest static binaries from an old aboriginal linux build and stick them after toybox in the $PATH, I can presumably attempt that with toybox providing what it can, and both verify that the toybox commands are load-bearing in this context, and get a list of what's missing via scripts/record-commands.

I eventually want to do a new automated Linux From Scratch build under mkroot. I'm tempted to take my old LFS 6.3 build I automated years ago and run that (because it was known working at one point), except more than one of those old packages has #if/#else staircases ending in "#error what is musl libc". This was the case even when I was trying to use musl in my old "last gplv2 release of everything" toolchain under aboriginal (and was the main thing delaying me from switching that project over to musl: I'd lose the ability to build LFS natively under the result). Using an even newer toolchain with the old packages will not have improved this.

This sort of version skew problem happens a lot. On Monday SJ Hill emailed me asking about initramfs console issues in a 2.6 kernel, and I couldn't build a 2.6.39 kernel from a git checkout to test my answers with the current debian toolchain because apparently newer gcc versions default to producing PIC code now? And then when I applied that fix, the 2.6 build tried to #include linux/compiler-gcc8.h which doesn't exist. It really really should not CARE, but does. He said he's building it under an old debian version running in a VM, which was more effort than I wanted to go to just answering email related to documentation I wrote in 2005. Anyway, we got his problem fixed and he wandered off to do his work. The point is, old packages not working with new build environments isn't always a problem with the new build environment, but with the old package making stupid assumptions. When "gnu" is involved, you can just about guarantee it.

May 4, 2022

A google developer is sending me fixes for toysh. I am simultaneously happy somebody's paying attention to it and embarrassed it's not ready yet. It's like a realtor coming to show somebody the house when you haven't tidied up. I'm suppressing the urge to preemptively apologize. They're reviewing an unfinished draft, but I've been working on this for over 2 years now so it's my fault it's not entirely there yet. Speaking of which....

I recorded two large chunks of what should be the next video, on building toybox from source. Now I need to edit it into... at least 2 videos, it's 20 minutes of raw footage. I need to remember to type "reset" and clear my terminal window more often, because my editing decisions are a lot more constrained when I don't. (The backscroll in the terminal window jumping around would be distracting.)

May 3, 2022

The GOP was especially evil today. Too stressed to focus much. And yes, this is the same supreme court that ruled in favor of Nestle in the recent child slavery case. The Boomers are not dying fast enough.

I'm trying to remind myself that youtube becoming prudetube isn't JUST a sign of US culture being co-opted by witch-burning puritan Boomers clutching their pearl rosaries as they sink beneath the waves, it's mostly about youtube being incompetent and self-destructing by slowly driving away its creators. (Note how in this great resignation howto video at 3:13 the word "onlyfans" is bleeped out, in a list of patreon/etsy sites people use to make money online.) And that Amazon becoming an SEO hellscape being called out by celebrities is about them having bait-and-switched from customer-driven search to bid-for-placement while even screwing over sellers with its own copycat products that get the top five search spaces for free.

All platforms have a lifespan, and if Youtube is going the way of Vine, Twitter going the way of Livejournal, and Amazon going the way of Woolworth's that's not unexpected. More like inevitable. Similarly, I now go entire weeks without remembering microsoft EXISTS, which did not used to be the case. They're still around, but after 30 years of unassailable dominance they have no presence in the phone space, can no longer exclude macs and chromebooks from the PC world, and have their own version of "Win-OS/2" (which meant back in 1993 anybody deciding whether to ship an OS/2 binary or a windows binary could just ship a windows binary and cover both systems, quickly ending most native OS/2 development).

There's plenty of places to buy stuff other than amazon. Even if I'm not personally active these days, plenty of people are still promoting nudism, and holding events with reasonable turnout. (I'm told Star Ranch's naked 5k run had record attendance this year. I wanted to go but it was too close to my return flight from Minneapolis.)

May 2, 2022

Trying to update various toybox documentation, and it's HARD. I very carefully phrased what was there, over a large number of attempts, and now I need to change what it's saying in a way that resequences stuff, reorders categories, changes category divisions... Hmmm.

Zeno's release process continues as usual. Halfway to halfway... Meanwhile the bug reports continue to come in. *shrug* The tree's in pretty good shape, I'd just like to nail it down and SNAPSHOT it already. But that involves a LOT of documenting, which is a special kind of turning over rocks to see what's underneath. (Honestly, I'm not really that good at this. I'm just persistent!)

May 1, 2022

My wife is getting a degree in classics, doing her dissertation on "Social interactions between (the group of marginalized people composed of) slaves and sex workers in Plautus". She's currently taking a class on sex work in the ancient world.

A recent thread from Dr. Sarah Taber on how men took over the egg trade from women (replacing "egg money" with factory farming and Salmonella) neatly explained why prostitution is illegal. The relevant observation is "Men force women into the worst-paying parts of the... system. But once those trades start making [money], the women have to go." Here's a 1983 paper on how men pushed women out of computer science as soon as there was money in it. Centuries ago a woman invented kabuki dancing and later only men were allowed to do it.

Prostitution doesn't cause any actual PROBLEMS, it's just that women always make more money at it than men do, thus it can't be allowed. Yet it's been legal all my life in Nevada and Amsterdam which have no problem with it. The current war on camgirls is related to the blowback against "the great resignation": young women achieving financial independence working from home cannot be allowed by a patriarchal dominance hierarchy. Everything else is just an excuse.

Centuries of home weaving and sewing were forced to give way to the Triangle Shirtwaist Fire, an example of human trafficking that had nothing to do with sex work. Most human trafficking today is in agriculture. Blaming sex is the usual pearl clutching diversion.

Fade's tumblr is full of references to, among other things, "our stronger nudity taboo", and as a (largely lapsed) nudist, I am sad about said stronger nudity taboo. Made recently much, much stronger by senile lead-poisoned boomers.

Fade, Fuzzy and myself were recently trying to watch Thermae Romae on Netflix, a remake of an anime about a roman bath designer time traveling to modern japan to steal ideas. It's TV-MA... for no apparent reason. They never show a single dick. Awkward camera angles, shiny water, and anachronistic towels. (In the first episode they put all the bathing romans in towels, and then a few episodes later one of the great discoveries the roman bath architect time-travels to modern Japan to appropriate is... towels.)

Seriously? What is this nonsense? People have bodies. I showered with my younger brother until I was ten. Parents taking naked pictures of their children was entirely normal before the Boomers took over; nudity is not the same thing as sex, and this is A) drawings, B) already a pay service, C) an entire story about people having baths.

Daryl Hannah's naked body in the 1984 movie Splash was PG, but the Disney+ version recently deepfaked dog hair over her butt. (Oddly enough Tubi had the original unedited version last month, because an ad-supported service was less scared of itself than a billion dollar conglomerate that regularly rewrites the law.) The video for Wrecking Ball is still on youtube, 1.1 billion views over the past 8 years, which somehow has not led to the collapse of civilization or even the singer (despite the Disney child star curse).

As for full frontal male nudity, Richard Donner's 1978 Superman showed a naked little boy's dick (when Kal-El lands in Kansas and stands up) in a PG movie. In R-rated films nonsexual male full frontal nudity was everywhere from The Terminator to Life of Brian, and as recently as 2009 Doctor Manhattan spent most of Watchmen naked.

But Netflix decided to rate something TV-MA:nudity (no other reason given) and then NOT show anything in ANIMATION (drawings, not even pictures of real people) where everybody is repeatedly naked for UNAVOIDABLE PLOT REASONS? Seriously? Either show the nudity, or don't make it TV-MA:nudity. Pick one!

The senile lead poisoned Boomers (no really, they are uniquely measurably stupid) are not dying fast enough. From climate change to fascism (which always includes racism, sexism, and homophobia in the definition of "fatherland" because everyone must perform their assigned societal roles full time no exceptions), it's a race between enough Boomers dying and the collapse of civilization. I'm aware this is a rounding error compared to that, but it's personally annoying.

April 30, 2022

Oh good, Denys is posting to the busybox mailing list again. That means I can stop paying attention. (I noticed in the first place because there was a guy emailing me about a bug disclosure that he thought might be a significant security vulnerability, and I was trying to help him. He's posted it publicly now. Yes, I still get emailed about busybox stuff, and try to at least gracefully hand off...)

I've copied the toybox release note slush pile into the start of news.html, grabbed a new hitchhiker's guide quote for the top, and am trying to beat it into some semblance of order to cut a release. There's still a zillion trailing loose ends but it's been FIVE MONTHS since the last release and I'd really rather not make it six.

Blah, what is my todo list: Release notes. Youtube channel for the videos. Update nav bar with youtube and patreon links, and collate the documentation somehow. While I'm at it, the main toybox page should go to about.html but with a release notes link from there... probably that goes at the top of the page?

The last time I tried to collate that I hit the fact that the summary at the top of news.html and the summary at the top of about.html are redundant to each other, but not as easily mergeable as I'd like. Really, I should have a summary blurb in header.html (which produces the nav bar via ssi, and yes I have a todo item to make toybox httpd support ssi...)

This is my problem doing this sort of work, it's fractal. Attempting to close tabs winds up opening more tabs. Doing the work makes the todo list longer, and there's sort of a hydra thing going on where trying to find half-finished stuff and finish it results in more half-finished stuff...

(You can tell I'm in debug mode. I'm angry at software design again.)

April 29, 2022

As always, updating documentation leads onto a development tangent. In this case, now that I've got binary toolchains uploaded I want to explain CROSS_COMPILE= as its own part of the FAQ, and the examples I have for that are both statically linked, and I want to teach mkroot to set up dynamic library directories in the target (as a new "dynamic" argument like dropbear), which means writing a new scripts/root/dynamic so you can "scripts/ dynamic" to add the shared libraries the way you can add dropbear. (And then move those builds before toybox, so the toybox build can test -e $ROOT/lib/ and do a dynamic build if it's there.)

But despite all the talk about merging /bin and /usr/bin, what Devuan did NOT merge is /lib and /usr/lib, which is annoying for me trying to add a scripts/root/dynamic that works with the host toolchain too, because some of the shared libraries live in /usr/lib and others live in /lib. (Well, both add a gratuitous x86_64-linux-gnu-gnu-gnu-stallman-forever-ftaghn-ia-ia-all-hail subdirectory which has nothing whatsoever to do with Linux and never did.)

The migration from the libc5 the kernel guys wrote to libc6 (Ulrich DrPepper's fork of glibc is where "glibc 2.0" came from, pretty much the same as egcs where the original project was long dead and linux people frankensteined the corpse into something useful) happened because Ulrich's project was all about adding thread support (I.E. spraying everything down with gratuitous locking), and the flood of Java people in 1998 (tripling the size of the community after Netscape released its source code and credited Linux for the decision) outright DEMANDED thread support. James Gosling hadn't included bindings for unix staples like poll/select in the standard Java library, so the ONLY way to do nonblocking I/O was to spawn a blocking thread for each background read, and Java devs who'd learned nothing but threading couldn't function without it. (Sun invented threading because forking processes on Solaris was so heavyweight; Linux could fork processes faster and cheaper than Solaris could spawn threads, so it had never bothered with threads. If twice as many cobol programmers had shown up to use Linux as the entire previous community, they'd have gotten good cobol support in a year or two.) Anyway, the FSF was the largely passive recipient of a lot of external development work that tried to use them as a coordination point largely for historical reasons: the gnu project was announced in 1983 and was still vaporware when it got dinged by Linus' 0.0.1 announcement in 1991. Stallman initially responded by telling people not to use it, then pivoted to trying to take credit for it. And no, moving off libc5 wasn't about ELF support, that's what the move from libc4 to libc5 was about. Libc6 was about threading.

So anyway, I need to copy everything out of the directory that has one of the libraries, the one that has the versioned .so.[0-9] ones, and the one... No, hang on. I don't care what the host has, I only care what the toolchain has, and rather than doing library whack-a-mole I need to use cc -print-search-dirs and parse that.
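The parsing is pretty mechanical; a sketch of the kind of one-liner I mean (the "libraries:" line format is gcc's, and the exact sed may need tweaking per toolchain):

```shell
# Ask the compiler where it looks for libraries instead of guessing host paths.
# gcc's -print-search-dirs emits a line like:
#   libraries: =/usr/lib/gcc/x86_64-linux-gnu/12/:/lib/:/usr/lib/
# Strip the prefix and split on colons to get one directory per line:
cc -print-search-dirs | sed -n 's/^libraries: =\{0,1\}//p' | tr ':' '\n'
```

For a cross toolchain, substitute "$CROSS_COMPILE"cc for cc.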

Still need to special case the dynamic linker though. (/lib64 should not be a thing at the top level. Seriously. You ALREADY ADDED multilib directories UNDER /lib, and then you have another one on top. That's very gnu, used here to mean "trying so hard to be clever it wraps back around to stupid". The kind of clever that piles higher and deeper without ever clarifying anything. Eternally accumulating, never simplifying. I'm trying to eliminate problems so you don't have to deal with them anymore, this nonsense confronts you with endless "look what I did, you'll never understand it, worship me" chest beating. When I'm successful at my goals, it doesn't look like I did anything. I'm not building monuments, I'm TIDYING. Grrr.)

April 28, 2022

I'm trying to update qemu, and once again it's "upgraded" itself into uselessness. I want to run "make install" without root access to see where it's trying to install stuff. (Is it in /usr/local/bin or /usr/bin or /opt or what?) But when I run "make install" it pops up a gui window prompting me for my password so it can sudo, and if I don't give it the password it exits with failure. There is no "continue as this user" option so I can see it try (and fail) to install and thus know WHERE it's trying to install stuff.

So I try to find and rip out the call to sudo from the build plumbing, but build/ is calling meson/ which is calling some library I can't easily find. Second attempt: make a NOP sudo wrapper script that does nothing but run its arguments and stick that in the front of the $PATH... and it's not calling sudo at all, this GUI pop-up is coming from somewhere else entirely (I.E. it's from the systemd people). The COMMAND LINE BUILD is going through GUI DESKTOP PLUMBING to INSTALL FILES. That's not right.

I remember now why I haven't upgraded qemu in so long. I WILL NOT run a random script as root to modify my system in ways I can't see before it does the thing. That is not happening. I tried running it under fakeroot, but that said it couldn't LD_PRELOAD in one of the child processes and popped up the same GUI prompt.

Ok, what I needed to do (after reading files under qemu's "meson" subdirectory, which have spaces in them and lots of references to running on windows) was add --dry-run to the install command line in build/, and THEN it says it's trying to install stuff to /usr/local. Ok then.
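For what it's worth, the other traditional no-root trick is staging into a DESTDIR, which most make/ninja install targets (including meson's) honor. A toy demonstration with a throwaway Makefile (nothing to do with qemu's actual build):

```shell
# DESTDIR prefixes every install path, so you can inspect what WOULD be
# installed without touching the real filesystem or needing root.
mkdir -p /tmp/destdir-demo && cd /tmp/destdir-demo
printf 'install:\n\tmkdir -p $(DESTDIR)/usr/local/bin\n\ttouch $(DESTDIR)/usr/local/bin/demo\n' > Makefile
make DESTDIR=$PWD/stage install
find stage -type f
```

which ends by listing stage/usr/local/bin/demo, showing the install goes to /usr/local.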

The new qemu-system-ppc64 is 102976760 bytes. That's over a hundred megabytes. It is not statically linked, ldd lists 68 shared libraries on TOP of that nonsense. Grand total 3.7 gigabytes of JUST /usr/local/bin/qemu-* files, and that's not even counting the crap it installed in /usr/local/share and who knows where else. Bloatware of the highest order. Meanwhile, Fabrice Bellard's new tinyemu is a 250k source tarball.

(Ok, as Rich points out, stripping qemu-system-ppc64 brings it down to only 20 megs. But that's still insanely large, and qemu installed the unstripped binaries by default. The Devuan install ISO is only a 1.2 gigabyte download; QEMU's default install is THREE TIMES BIGGER THAN DOWNLOADING THE ENTIRE OPERATING SYSTEM. Yeah sure, compressed vs uncompressed, I don't care. That's NOT RIGHT.)

April 27, 2022

A problem sei/coresemi/j-core hit a lot was collapsed manufacturing chains, I.E. this is no longer manufactured, there's just an existing inventory of irreplaceable parts. Apparently this is now coming up trying to send enough stinger missiles to ukraine. The USA has a stockpile of them but hasn't ordered any new ones in 15 years, and manufacturing more requires redesigning the electronics to replace parts that are no longer available. Of course once you're aware of this category of problem, scratch the surface ANYWHERE in the Boomers' world and you find stuff the Greatest Generation did that nobody inherited responsibility for. Sometimes hobbyists have reverse engineered enough to know how to do it, but not at scale, and the original sources have lost that institutional memory (let alone ready production capacity) in some capitalist merger/layoff/consolidation.

This morning I MEANT to sit down and fix the sha1sum endianness issue, but instead I got sucked into cataloging various data types the file command doesn't recognize, and now I'm trying to design file -i support, to output mime types instead of the ad-hoc text format file usually produces. There was zero provision for that in Elliott's original design, and it's not easy to retrofit.

Ok, technically I want to implement file --mime-type which is JUST the mime type without the stupid ;charset=potatosalad decoration gnu/file keeps wanting to crap over things. But I have an aversion to long options without corresponding short options (it's not unix!) and thus want an -i that does the concise/simple output, EXCEPT that makes TEST_HOST harder to support in the test suite.
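The core of the concise output is just matching leading magic bytes and printing a bare type string. A toy sketch of the idea (hypothetical three-entry table, nothing to do with the actual file.c logic):

```shell
# Hypothetical magic-byte lookup: hex-dump the first 4 bytes and match
# against known signatures. Note: bare type, no ";charset=" decoration.
mimetype() {
  case "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" in
    7f454c46) echo "application/x-executable" ;;  # \x7f E L F
    89504e47) echo "image/png" ;;
    1f8b*)    echo "application/gzip" ;;
    *)        echo "text/plain" ;;                # fallback guess
  esac
}
printf '\211PNG\r\n\032\n' > /tmp/demo.png       # PNG signature, octal escapes
mimetype /tmp/demo.png
```

which prints image/png.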

(If Elliott hadn't submitted his own file.c, I'd probably have done one that JUST produced mime output to start with, and added the long text version later when people asked, and then probably not for everything... but that's not where we are today. Hmmm... This needs a long walk and a design think.)

Ahem, but AFTER I get a release out. And a bunch of videos filmed edited and posted. (Put the shiny thing down and back away slowly. Doesn't matter that I've just half-assed it based on file extension in httpd.c, I need to CLOSE tabs...)

April 26, 2022

Grinding along doing release prep: building all the targets and testing that the network actually works in them, by running my new httpd on loopback on the host and having wget fetch the toybox README file and sha1sum it. (Remember: qemu's address passes through to on the host, so ./netcat -p 8080 -s -L ./httpd on the host matches up with wget -O - | sha1sum in the guest, which would be nice to automate somehow.) I have not reopened the can of worms that is build control images yet, but I'm working towards it.

This is a kind of test I've done piecemeal over the years, but there's been nothing systematic up until now because I didn't have the pieces, and now that I _AM_ being systematic-ish about it (or at least retesting all the targets)... my armv7l fix for the network card wasn't applied right. (It's a QEMU bug: QEMU's arm device tree is the same for 32 bit and 64 bit platforms and the address it gives for the network adapter is up above 4 gigs and thus inaccessible on 32 bit targets. You can work around this by enabling the page addressing extensions on the kernel so it can map high memory down into the 32 bit address space, but you really shouldn't HAVE to. The QEMU bug that was opened about this got marked invalid, so we're stuck with the workaround. Anyway, the PROBLEM is when I added the symbol I added CONFIG_ARM_LPAE to the csv list for the armv7l target, and the plumbing prepends the CONFIG_ itself so it was setting CONFIG_CONFIG_ARM_LPAE=y which didn't work.)
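The failure is easy to reproduce in isolation (an illustrative loop, not the actual mkroot plumbing):

```shell
# The csv plumbing prepends CONFIG_ itself, so entries must NOT include the
# prefix. Listing "CONFIG_ARM_LPAE" produces a doubled symbol the kernel
# config silently ignores:
for i in ARM_LPAE CONFIG_ARM_LPAE; do echo "CONFIG_$i=y"; done
```

which prints CONFIG_ARM_LPAE=y and then CONFIG_CONFIG_ARM_LPAE=y; only the first is a real kernel symbol.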

Meanwhile, on m68k the wget seems to be working fine, but sha1sum is producing the wrong hash! (Not just for this: echo hello | sha1sum produces the wrong hash too.) I thought I had the endianness and alignment right for sha1sum? Hmmm, mips is getting it wrong the same way, that's looking like endianness. (Yup, mipsel is getting the right answer.)

Let me guess: In 2014 Daniel Verkamp did a commit which made md5sum and sha1sum run faster, but made the code bigger and more complicated. Shortly thereafter I did a commit which made them work on big endian. Then in 2016 Elliott did a commit to add a config option to pull the assembly optimized versions of md5sum and sha1sum (and sha256 and friends) out of openssl. Then in 2019 I did a commit which reverted Daniel Verkamp's changes putting the non-openssl versions back to the slow-but-tiny versions. So the question is: did I forget to re-apply the big endian fix?
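The quick way to check any target is against the well-known reference value; SHA-1 defines its output as big-endian words, so a correct implementation prints the same hash on every architecture, and a big endian target printing anything else (typically the same bytes word-swapped) means the state words aren't being serialized properly:

```shell
# Known-good SHA-1 of "hello\n", identical everywhere when endianness
# is handled correctly:
echo hello | sha1sum
# f572d396fae9206628714fb2ce00f72e94f2258f  -
```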

The other thing I'm not really testing this pass is block device support. Everything's still running out of initramfs, no "create an ext2 formatted hda.img and mount it on /home". That's an important step for doing native builds.

Oh hey, powerpc64le found a new way to fail. Can't tell if this is a qemu bug or a kernel bug, so I asked both lists. Probably going to be ignored by both too, due to the list traffic volume. (The powerpc list isn't that high traffic, but it gets giant piles of inappropriate crap cc'd to it. Standard kernel-dev failure mode for the past decade, signal buried in noise.)

Huh, sh4 is doing a similar failure, except it doesn't "catch up" when you hit a key, the extra data has been discarded. (It got the sha1sum of the fetched file right when I typed it in manually though.)

April 25, 2022

Still congested.

I caught up on the git commit log, and have (unedited) release notes covering all the commits. Now I need to figure out what else should and should not go in the release, so I'm back to picking through the "git diff" in my toybox dirs. I have a conversion of nbd to work on nommu (it daemonizes, which was using fork() and should now use vfork(), but that's not a trivial conversion because fork-and-continue is not available with vfork: you have to re-exec yourself). But it's not very tested?

And THIS reminds me that I was trying to come up with a nommu test environment for qemu, because having to pull out my turtle board and sneakernet files onto it via sd card is a slow and awkward way to test compared to mkroot and qemu. (Sure I can manually set CONFIG_MUSL_LIES_ABOUT_HAVING_FORK, but I never quite trust that isn't leaking something that won't work on a REAL nommu system, and having an actual nommu target in the mkroot list gives me testing organically.)

And specifically the nommu target I was trying to bring up a week or two back was coldfire, ala nommu m68k which qemu supported long before it supported full m68k.

... and my laptop decided to power down walking to the table, so instead of resuming from suspend I've lost all my open windows again.

April 24, 2022

On my way out to the table after sundown I noticed an abandoned walker on the corner of Red River and 38th, the owner of which was an old man standing in the middle of the intersection. I talked to him (his name's Glen) and he was VERY lost and tired, having been walking since morning. (With a walker, due to a back injury he's got surgery scheduled for next month.) His cell phone battery had died and he could barely see at night, he'd been trying to read the street signs up above the stoplights. He was trying to get to the driver's license office to deal with citation paperwork (related to nighttime "driving while unable to see") so he could move his truck out of the parking lot he'd had to leave it in after the police guy stopped him and told him not to drive until he sorted the citation.

I used my phone to work out the bus route to get back to his truck, which was only like 4 blocks from the driver's license office. (It's that one up on Lamar across from the whataburger. He knew that area because he used to live in an apartment there years ago, but his truck was parked half a mile east of there and he went in the wrong direction when he headed out this morning, hence winding up 30 blocks away from where he needed to be at the end of the day.) I got on the bus with him (day passes are lovely), and took him to the relevant whataburger (open 24 hours), and from there we walked very very slowly east to his truck. (He only fell off the curb into the street twice. Once because his left wheel went off the edge of the 4 inch curb, and the second time because he tried to sit down on the curb where there was a storm drain in the gutter.)

As we talked, it turned out that the reason he's in this much trouble is his wife (ok, "girlfriend of 9 years") died suddenly 3 months ago, leaving him living on his own about a year away from qualifying for full Social Security. He has a brother, but he's out of town until monday (attending the funeral of a friend Glen couldn't go to because he had to work and deal with the citation on the truck or he can't get to the construction jobs he works at).

After a couple hours we finally made it to his truck, which he said he was fine sleeping in. (And it started fine; he was worried he'd drained the battery the previous night.) He said he can see well enough during the day to make it back to the driver's license office, and this time knows which direction it's in. He couldn't find his car charger in the first 5 minutes so I left him my worse phone battery and one of my zillion USB charger cables so his phone would be useful again in the morning.

I walked home from there, which took me by my house and I stopped in for a beverage and a bathroom break, and then wound up staying instead of heading out to the table again.

So I got my exercise, but missed out on about half the day's programming.

April 23, 2022

Huh. According to "git log --format=fuller" it looks like back on wednesday Bernhard Fischer committed a half-dozen back patches to the busybox git repo? (That's the guy who took over uClibc from Erik Andersen and let it die.) Still no word from Denys...

I bit the bullet and uploaded binary toolchains (cross and native, for all the targets I've gotten working so far except hexagon) to my website. I told Patreon about it on the 15th (my first locked post!), and will probably add a link to the FAQ or something as part of release prep. It's not too hard to figure out where they are, but I haven't been publicizing it yet because... ew. GPLv3. Ouch. Dowanna. (Yeah there's a source tarball-of-tarballs in the same directory, it's still horrific to provide ANY exposed surface to the people who sued Mepis because even explicit endorsement from Mark Shuttleworth was not enough to let Mepis point to Ubuntu's servers for source packages their Ubuntu derivative hadn't modified: this one-man garage operation was forced to mirror everything locally or get sued by the FSF.)

But my old aboriginal linux toolchains have fallen off the map (too old to build current kernels), and the LACK of proper toolchains I can point people at has become limiting. I need to do videos building toybox and mkroot and so on that kind of need "wget toolchain" as a step I can SHOW rather than "here's the procedure for building a zillion toolchains at once, building just one is kinda hinky because it builds an i686 musl toolchain first and then builds the other toolchains with that because they're static linked on the host for portability, and if you just run it without rather complex command line arguments it's going to build dozens of different toolchains and take a double digit number of hours to finish..."

Yeah, not exactly intro material. "wget this tarball, extract it, put it here" is 15 seconds of instruction and maybe 60 seconds of demonstration while explaining why. It's very limiting NOT to have them up somewhere.

April 22, 2022

I just hit a real world example of why "infrastructure in search of a user" is a bad thing. Years ago I implemented fileunderdir(file, dir) in lib/lib.c, on the theory that "take the absolute path of both and compare them" is an expensive but reliable way to avoid all the "../../.." and symlink escapes for things like web servers. That approach potentially kills a whole category of tricksy security nonsense. The current function only has one caller in the tree so far (cp.c), but I put it in lib anyway because I expected more to come. And here I am doing httpd, so I call fileunderdir() and...

Its API is wrong on multiple levels. When you "wget" that turns into a fetch of "/" which (once adjusted) is exactly EQUAL to the constraining path, and this wants the file to be UNDER the constraining path so an exact match doesn't work. Meaning I tried "wget" and got a 404 error, and had to debug why. Plus the function returns a malloc()ed string (to the abspath of the file) and I was treating its return value as "true or false" so was leaking that without even noticing. And changing the semantics of the existing function would subtly break cp in ways I'm not sure I have a test for yet...

So for the moment, I've added my own "static int isunder(dir, file);" to httpd with the semantics I want, and then I might want to move the other one into its only user (cp.c) and wait for a third instance to show up before trying to work out common semantics they'd all be happy with.

April 21, 2022

Hmmm. I think I need to rip scripts/config2help.c out.

I just built a defconfig-ish toybox in a mostly clean source dir, and netcat --help is not showing the "server" options, but when I "netcat -l" it does listen on a local port. The web help generated last release shows the extra options, so this broke recently-ish, but I really don't feel like spending a lot of effort debugging it?

Way back when, config2help would shuffle together help text for commands with config options that could change what sets of command line options it supported, but HAVING that sort of micromanagement was a design idea left over from busybox. That project had the goal of being as small as possible at the expense of compatibility, and as I pushed it into being something you could use on a desktop (as Alpine Linux does), being able to yank features out of individual commands to save 35 bytes at a time got less interesting. Toybox never had as many config sub-options, and I've been slowly eliminating the ones I did implement, because I want toybox commands to behave consistently. "Does toybox cp support --preserve" should be a question you can answer just by knowing which version you're running.

The only two commands I'm spotting in menuconfig that still have these kind of sub-options are cat and netcat. In the case of cat it's because of the catv command, which is my fault (I took some bad advice). Nobody uses it, they just use cat -v, and google isn't even finding mention of its existence outside of busybox command lists. I'm not finding any scripts using it, no human writeup mentioning it, etc. Android's toybox config does not enable it, but DOES have -v in cat itself.

As for netcat, the argument was that offering server functionality was "dangerous", although... that's why iptables exists? And dialing out is just as dangerous? Android DOES enable netcat's server mode, and they're as paranoid as you get...

Sigh, I zapped both of those but there's still MKDIR_Z and such. (Commands with features enabled via build environment probes.) I think the correct fix here is to run the config help text through the preprocessor so USE_COMMAND(blah) macros can apply to it? What have we got left: MKNOD_Z, PASSWD_SAD, WGET_LIBTLS, ID_Z, MKDIR_Z, MKFIFO_Z, SORT_FLOAT...

Anyway, I went down this giant tangent because I got an initial httpd implementation to the point where I need to start testing it, but I only implemented inetd-style support so far (not a standalone daemon server engine), so I was gonna run it under netcat -l (which provides the same stdin/stdout network wrapper semantics) and I was doing --help to remember the server options and the options weren't there...

April 20, 2022

Strangely under the weather. Sore throat, sneezing, general lethargy. Not sure if this is a cold or if it's from all the smoke I inhaled last night. (We had marshmallows! We had hot dogs! We had wind shifting around and blowing smoke straight AT me no matter how many times I moved.)

Finally got wget promoted. And in order to properly test wget (in a regression test way instead of fetching stuff from that I put there via ssh), I need a web server I can run on loopback as part of wget.test. So of course I've added an httpd.c to pending and am filling it out...

April 19, 2022

Today's email reply I did NOT send, regarding my ongoing attempt to get posix to notice freadahead():

On 4/18/22 07:36, Chet Ramey wrote:
> On 4/18/22 12:53 AM, Rob Landley wrote:
> So the gnulib folks looked at a bunch of different stdio implementations
> and used non-public (or at least non-standard) portions of the
> implementation to augment the stdio API.
> If that's what you want to do,

I thought starting with "is this already in here and I missed it" was more polite than "you should add a thing that may already be there and I just haven't spotted it", but I'd assumed one would flow naturally into the other due to the nature of the venue. My mistake.

(And hey, I just saw Geoff's proposed workaround and I think I can use it. Still problematic in a threaded context, but what isn't?)

> propose adding freadahead to the standard.

Yes, I would love to. How? does not say how, googling "how do I propose an addition to the posix standard" brought up the faq, the wikipedia page (also doesn't say), some rant from Richard Stallman (not even clicking)...

The front matter of the standard itself has "participants" (a credits roster), and before that an "updating IEEE standards" section that points to which is a glossy marketing brochure site but I dug down to which does not obviously help?

(I did not expect "ask on the list" to require monkey paw phrasing.)

> Or reimplement the gnulib work and accept that the stdio implementation
> can potentially change out from under you.

Or ask the bionic maintainer to add the API the musl maintainer already added (which is basically what's in dragonfly bsd and z/os already, and is also the api gnulib implements with that staircase), so two of the five C libraries I care about would export an explicit API for this, helping a cleaner way to do this gain more widespread availability...

> Current POSIX provides no help here.

No, Geoff had a good suggestion. I just hadn't checked my spam filter promptly.

>> I was just wondering if there was a _clean_ way to do it.
> OK. Do you think you've gotten an answer to that?

"Does posix have a way to do this?"


"It would be nice if it got standardized."

"Maybe it would, but that's a different question."

When I asked the C standards committee chair about adding this, and reported back his observation that the C spec does not have filehandles, along with my interpretation that this makes posix the relevant context to add such a function (which would be analogous to the existing fileno()), the reason was that I wanted to establish that posix was the relevant context for it.

This is because I was hoping to lay the groundwork to convince posix to add such a function.

>> The C99 guys point out they haven't got file descriptors and thus this would
>> logically belong in posix, for the same reason fileno() does. "But FILE *
>> doesn't have a way to fetch the file descriptor" was answered by adding
>> fileno(). That is ALSO grabbing an integer out of the guts of FILE *.
> Sure. And adding that to the standard would require the usual things, for
> which there's a process.

What is this process? Where does one find documentation on this process? Is it in one of the documents linked from It's not in, and describes the defect reporting mechanism, is a new feature considered a defect? Ah yes, halfway through that document, it says:

> Everything starts out as a defect report, unless raised
> during a plenary session when the ORs are present.

(I wonder if by "plenary session" they mean the thursday conference calls?)

Ok, so by "there is a process" you meant "add it to". That would be the process.

Which I should have known because who hasn't read down to line 138 of "Committee Maintenance Procedures for the Approved Standard" document of the "Committee Draft Development Procedures" section near the bottom of the righthand column of the project's about page. How did I miss it?

>> This exists. It would be nice if it got standardized.
> Maybe it would. But that's a different question.

Out of curiosity, why do _you_ think I brought this issue up on the posix mailing list specifically?

Anyway, I've got a workaround now, and I don't care about making it work with threads, so...

Thanks Geoff,


April 18, 2022

A tree fell into our back yard recently. It was growing up between our fence and the neighbors' and Tornado Warning Du Jour brought it down while I was up at Fade's. (That whole "storm of the century every few weeks this time of year" thing hasn't really gone away, we just did many thousands of dollars worth of landscaping after the fourth time the house flooded so the water part isn't as big a problem. We did have to repaint that corner of the house scoured bare by Hurricane Harvey.)

It's been hanging OVER our fence ever since, because the part where it broke is just a little above the fence and the tree itself is in our yard, and we didn't want to let it fall the rest of the way down and take a chunk out of said fence, so Fuzzy went to Home Despot and bought a saw, and spent several hours today performing topiary with extreme prejudice. And now we have a large pile of wood bits, so of course she put some bricks in a circle and tried burning some. (After trying to google for the tree type to make sure it wasn't one of the more obviously poisonous ones.)

Green wood does not actually burn all that well (the tree was alive until it came down, and yes this is the SECOND tree to rot and fall into our yard, at least this one wasn't full of bees), but we still had our dead christmas tree awaiting disposal, and THAT burns quite easily and gets the other wood going eventually. And we have zillions of fallen branches that have been accumulating because breaking them up to stuff in the compost bin or lawn bags was a chore.

We need to buy marshmallows. We only have the little mini ones you roast over a candle with a toothpick.

April 17, 2022

I'm looking at shipping container batteries.

No, not Ambri, they suck. Bog standard ivory tower academia needlessly complicating things, with a full decade of failure behind them. Solar projects storing energy in molten salt is an old idea and they were all very expensive failures. Putting the molten salt in a shipping container so you can have it downtown instead of out in the desert is NOT an improvement. Observing "you know, we could make big batteries out of shipping containers" ten years ago was laudable, but not THAT creative given that converting shipping containers into diesel generators was an established business model. Back in 2012 people were already living in them, using them for greenhouses, and one had already been converted into a starbucks. Deciding it was a good idea to FILL THEM WITH LAVA and park them downtown in population centers was very... grey haired white man with academic titles fundraising from silicon valley. (Where accidentally reinventing things that already exist is a regular event. A place where "new to me personally" = revolutionary.)

Anyway, the ongoing problem is that solar panel deployments are storage limited far earlier than they should be, because curtailment triggers loss aversion, one of many well known bugs in the human brain. We won't install solar panels if we're sometimes "wasting" their output, because monkey brain says it's better to let rain fall on pavement than install water barrels that might sometimes overflow. Failure to capture EVERYTHING is wasteful, so don't capture ANYTHING. Brilliant! If you never even start, you can't do an imperfect job!

So we need batteries and lots of them. The chronic first mover in the psychological hype space deploying a half-assed solution in perpetual development has of course installed space heaters, which as usual have plenty of competitors trying to do exactly the same thing. GE has its own shipping container full of lithium cells called the reservoir system, which was also announced years ago and then silence, but it technically still exists so yay I guess? Samsung's stab at this was via partnership with a chinese company so I assume all their IP was stolen and the venture folded when their partner ran off with the money, but I can't say I've actually checked. That's just how most chinese partnerships go...

But all those guys are doing lithium, which got popular providing PORTABLE power. It's light and energy dense, explodes on contact with water (or various other things), and is fairly rare. Annually we only mine about 30 times as much lithium as gold. We mine over 6800 times as much copper as gold each year, and copper is still expensive enough for people to steal pipes and wires out of empty buildings.

But when you're filling shipping containers with batteries, and can stack said shipping containers on top of each other, weight and density are far less of an issue than cost and durability. Something cheap and plentiful, which doesn't burn (let alone explode), can be recharged thousands of times without losing capacity, and isn't made out of toxic waste or blood diamonds.

Which is why I'm interested in Form Energy, the people making shipping container batteries out of iron and salt water. Iron gives off electricity when it rusts, and if you run a current through the rust it gives up the oxygen and turns back into iron. This is how aircraft carriers made out of steel avoid rusting: they run a low-level electric current through the hull. It's called "impressed current cathodic protection" and according to wikipedia[citation needed] they've been doing it via batteries since 1824 and with alligator clamps from a generator since 1928.

Form Energy makes "iron air" batteries using big piles of iron pellets (which are about as explosive as metallic iron usually is), and salt water (which will actually put OUT a fire if poured on it, and if it wasn't safe to pour straight into the sewers we couldn't use road salt to de-ice roads in the winter). We make buildings out of steel girders and live in them, so I'm not too worried about toxicity, nor about sourcing the materials. Any modern really big cheap human thing probably has a lot of iron in it. Even concrete is full of rebar.

Form Energy more or less fills a shipping container with PVC pipe, iron pellets, water, and road salt. To speed up the reaction they add a little vinegar to acidify the liquid, and wire the whole thing up to accept and output electricity. The reaction still isn't very fast so they make it big, and you get a battery that takes 10 hours to fully charge and 10 hours to fully discharge which is what you WANT for municipal battery applications. If you short it across the terminals whatever you're shorting it with gets hot, but the battery itself isn't likely to care. (It's iron soaked with salt water. What's it gonna do? "Rust faster!")

Pretty much all batteries have side reactions (when you recharge them the energy doesn't ALL go into reversing the discharge reaction, it makes Weird Other Chemicals too), but the "salt, water, iron, and air" reaction is simple enough (especially since they're using ROAD salt, not table salt, so you haven't got any chlorine involved) that they can slap an upscale pool/aquarium filter on there and recycle it all back into the original chemicals without too much effort. (They call this bit the "kidneys" and it's where all their patents are.) The main other thing they care about is temperature control (because water has to stay liquid: even in death valley it's not going to get anywhere near boiling and the water is FULL OF SALT so it can get pretty cold without freezing, but minnesota winter can still manage to freeze the pipes and the reaction would slow down in the cold anyway, so you're going to want to heat it a bit in the cold, or at least insulate the container...)

The limit on how much capacity one of these can store is apparently the weight of the iron: it's really big and really heavy. The maximum legal weight of a shipping container is around 20 thousand kilograms (pretty much the same for 20 foot or 40 foot ones), so while they CAN make individual batteries that last longer than 10 hours the containers they're in wouldn't be portable or stackable. But the duration of ONE battery isn't really limiting: if you have twenty batteries that last an hour each, you have twenty hours of batteries. The reason not to do that with the lithium ones was the space, the expense, the limited manufacturing capacity due to the shortage of materials to make them from, and of course the tendency for even non-Tesla lithium to catch fire in any sort of quantity.

I'm glossing over some details, but less than with most battery technologies. "Let's use 200 year old chemistry plus pool supplies to make batteries out of a material so abundant we build oil tankers and skyscrapers out of it, which wouldn't burn if you threw it on a bonfire and left it there overnight". Seems like a win to me.

Anyway, here's a video with a tour of their factory, and a talk given by one of the women who run the company talking about how fast they're hiring and growing. (They've got a bunch of women in actual positions of authority.)

April 16, 2022

My attempt to organize an explanation of the toybox infrastructure has resulted in me outlining a complete C tutorial (and checking it against other tutorials to see what I've missed). Hands up everybody surprised by this.

So I now have a whole script for a quick and dirty C tutorial, which I'm pretty sure I could cover in a one hour talk at ELC or similar. Can I do it in half an hour? Hmmm...

Of course I haven't really explained ELF file format and Linux process attributes and so on... ("man 2 execve" and "man 2 fork" have lists of process attributes, "man 7 environ" has another chunk...) Or walked anybody through the kernel's ELF loader. I already have gratuitous mount documentation lying around...


Meanwhile, the representative of "512 solar" came by to see my roof (I was napping on the couch, he never called or knocked during this "visit" and only texted me afterwards), and claimed it is impossible for us to have solar panels because our property has trees on it. I mentioned I was looking into taking two of them down over the best southern facing exposure, and he said it didn't matter, his mind was made up, solar panels would explode or something if put on our roof. As far as I can tell, this person did not go into the back yard to see the southern facing part of our roof and the yard that only has trees around the edges. I know this because the way the leaves are piled up in front of the gate at the moment totally jams it (and said leaves have not been disturbed anyway); by far the easiest way to the back yard is through the house. He just didn't want to bother with our house at all.

Yeah I know, I should really pay off the home equity loan before fiddling with that. (And when Fade finishes her dissertation who knows where she'll get a job, we may sell this place and move.) But I'm still disappointed at a rooftop solar panel installer that doesn't want to finish the quote process. (Yes demand surge after the ukraine invasion has caused the price of solar panels and batteries to go UP for the first time in decades, which even Trump's tariffs couldn't manage. But refusing to do your job without even saying "I've changed my mind and don't want to do this one" is just unprofessional.)

Speaking of which, the preliminary quote they gave us was going to use panels from "Mission Solar" in San Antonio, which has plenty of nice puff pieces about how they manufacture their own panels... except they don't. They USED to back in 2014 (although it didn't say where they got the silicon wafers), but in 2016 they abandoned that business model and started buying cheap asian solar cells like everybody else. Then in 2018 they "doubled production" (back to pre-layoff levels perhaps?) but as that article clarifies they hadn't gone back to making their own cells, they were basically assembling Ikea flat packs from china like everybody else. And then in 2021 they were hoping Biden would expand their business, but nothing about going back to manufacturing their own cells. Just assembling them into panels.

If the whole "china supporting russia and threatening taiwan while having endless covid lockdowns because their vaccine can't handle variants" thing goes sour, does Mission Solar have a sustainable business model? I'd love to hear that they do, but their website does not answer this question. (In fact they list solar cells as a "raw material", which would be a bit like Intel listing silicon chips as a raw material and saying they just package Samsung's output into little ceramic containers with pins. Which sure is a viable business model but it's not the same thing as IC design and photolithography. Outsourcing your core business is not a sign of health.)

(Oddly, this is exactly the problem Russia is having with sanctions: they THOUGHT they made their own stuff but it's full of inputs they can't reproduce, there was an excellent thread about that last month. Possibly the most relevant tweet in the thread is this one, about how their drive to BECOME independent turned into them lying to themselves and hiding the origins of stuff.)

April 15, 2022

Nope, adding gzip support to wget would be trivial if the http standards bureaucracy hadn't hallucinated "chunked" encoding. Because any response body can be "chunked", the gzipped data can itself be chunked, and making gzip and chunked work together turns out to be stupidly difficult. (Piping it to zcat works fine when it isn't chunked, does not work when it is. That's just broken.)

I grepped busybox wget.c (which is 22 years old at this point) for the strings "gzip" or "deflate", which it does not contain. And their wget hasn't got -d but -o log.txt shows an uncompressed transfer. Busybox wget has not gone unmodified for 22 years, but if it's been in USE for 22 years without growing support for this feature yet, nobody cares much and I'm pretty happy skipping it.

(This might also explain why pulling pages from or could never give me "deflate" encoding, only gzip. The spec says it exists, but nobody's bothered to implement it.)

Design question: if I escape an URL I'm given so space becomes %20 and such, what do I do if I'm given a URL that already has escapes in it? Because % is an illegal character that needs escaping...

Hmmm, looks like the server unescapes whatever it's given.

April 14, 2022

The lib/deflate.c plumbing I wrote strongly assumes it's reading from an fd and writing to an fd. This is to avoid trying to return and restart in the middle of stuff, the "get next X bits" and "write next X bits" functions just handle buffers emptying and filling up behind the scenes because the data has somewhere to go.

What I could do is make empty() and refill() callback functions that get called instead of the read() and write(), and then if either is non-NULL call that instead of reading the fd. (Because the fd legitimately could be zero, reading from stdin is a thing.) But I'm not sure that's LESS awkward that just forking a background process to pipe the data through? Especially if it's a MAYFORK so it avoids the exec. (Which on Linux with MMU is 90% of the overhead, the fork itself is close to free. On nommu fork requires an exec and there still isn't a "re-exec-self" syscall, despite me repeatedly asking on linux-kernel, so it's dependent on /proc/self/exe being mounted.)

One problem with the fork-and-fd approach is there's no standard zlib or deflate command line utility to pipe stuff through the way there is for gzip format. The obvious thing to do is add flags to zcat, but non-toybox zcat wouldn't have them. (Once upon a time I was trying to turn "decompress" into the general purpose swiss army knife decompressor with command line options selecting the type for compression and autodetecting decompression, posix be damned since their "deflate" is for the .Z algorithm that was abandoned in the 1980s for patent reasons and less efficient than deflate anyway... but that kind of fell by the wayside a while ago.)

Another problem is wget.c doesn't provide a clean filehandle to read from, because https support. Instead wget_read() is a wrapper around three different functions (read(), tls_read(), or SSL_read()), two of which take a structure instead of a filehandle as their "from" argument. The design for adding decompression support to wget with https is trying very hard to boil down to for (;;) { weird_read_func(in, buf, len); weird_write_func(out, buf, len); } in a loop, which is... not ideal? Gratuitous data shoveling through a buffer I'm both the producer into and consumer from, without even reblocking the data. Hmmm...

April 13, 2022

Called the tax lady and had her file an extension. Still haven't seen a dime from Google, and the piecemeal contracts I've worked over the past 6 months have not included "money for taxes". Well, they did tax withholding, but not "pay the backlog from Jeff not withholding"? I took zero deductions to try to catch up, but I haven't gotten a tax REFUND since before I first worked for Jeff, and I gotta pay the preparer too. I REALLY don't want to have to sell stock or something to pay taxes, because I have no idea what my cost basis on years of Dividend Reinvestment Plan purchases is? (I don't want to file tax paperwork on something I sell to pay taxes. The Sorcerer's Apprentice music from Fantasia would start playing on a financial level. I can if I have to, but... sigh. Waiting for this would be a lot easier if I hadn't ridden down yet another start-up and lost tens of thousands of dollars to the house repeatedly flooding. I am REALLY appreciating patreon right now.)

Moritz is working on his git implementation, and git is basically a multiplexer with a bunch of subcommands. I want to reuse the toybox multiplexer plumbing so "git clone" can work like "toybox ls", which seems straightforward. In main.c, split toy_find() into two functions, with the new toy_list_find() taking the list to search and the length as arguments. (The old name would still prefix-match "toybox" at the start and pass toy_list_find() an adjusted list to skip that out of order first entry.)

The git_main() call then looks something like:

struct toy_list *which, toy_list[] = {
  {"clone", git_clone_main, "<1", 0},
  {"init", git_init_main, 0, 0},
};

which = toy_list_find(name, toy_list, ARRAY_LEN(toy_list));
toy_init(which, toys.optargs);

Which assumes the git top level command's option string has the "stop at first non-option argument" thing, because "git -C dir pull" is a thing...

But the problem is 1) if this parses the new option string how do I get corresponding FLAG() macros? 2) help text: "git --help pull" and "git pull --help" are both a thing. (There's also "git help pull"...)

To get the flag macros and help, I need the subcommands to have NEWTOY() macros and config blocks like normal. I need them to exist in the main toy_list[], but not be externally visible. And THAT means they're basically shell builtins. I probably need a new TOYFLAG_SUBCOMMAND flag which includes TOYFLAG_NOFORK (so they don't show up in the "toybox" output), but also a way to tell the shell not to make them available as shell builtins. (can't be called directly from the shell and don't show up in "help").

(There's a lot of "I already did this but it's not generic enough...")

Ok, probably if I chord together NOFORK and MAYFORK, this can give me the behavior I want. If all the subcommands have a common prefix, I can make a function that collects them together into a temporary list and searches just that list, but returns their position in the original list.

No, wrong: I can put the search function back how it was because this doesn't need binary search. THAT was an optimization to minimize the latency of running toybox commands in the common entry path. This is just "search the list for the prefixed entries" via brute force. I care a lot how fast "true" runs in the shell, but don't care at all if "git pull" takes an extra half a millisecond. A shell script can run a hundred commands per second, fractions of milliseconds add up there.

April 12, 2022

Working on toybox release notes, but as I go through the commit list I keep finding half-finished things that need more work.

Also working on toybox video scripts. What I really need to film is an introduction to the infrastructure, which is a big topic that's hard to break down into bite size chunks. I've tried multiple times already, and keep writing more or less the same script over and over, and not being happy enough with it to record. (This SHOULD be simple, both to explain and to use...)

I kind of want to approach it with a video walking through "hello world" (toys/example/hello.c) and explaining the comment block at the start (top line describing the file, copyright line, links to standards*, NEWTOY() line(s)* have USE() guards* and three arguments each (command name*, option string*, flags*), kconfig block*), then include toys.h*, GLOBALS()*, and finally command_main()*.

But each asterisk above is a can of worms requiring its own video. For example, the NEWTOY lines are SORT of collected via grep -w 'NEWTOY(' toys/*/*.c > generated/newtoys.h except they're sorted in alphabetical order by "first argument to newtoy" and you can't use sort -t -k for that because the relevant field starts with an open parenthesis and ends with a comma, and -t only lets you choose one delimiter, so instead we use sed magic before and after the sort to copy the relevant field to the start of the string and then strip it back off again. Another can of worms is how the "command name" argument from newtoy is used as both a string (toys.which->name) and a symbol name (the command_main function pointer) to populate toy_list[] in main.c via macro magic...

I kinda have to explain how generated/*.h gets created by grepping data out of each command's source file. (Except it's usually sed, not grep, because I modify the data slightly as it's extracted.) There's five generated/*.h files, two Config.* files...

What is instlist doing there, that should go in unstripped. (Goes off to poke at scripts/*.sh, it's used in and neither of which are using the $UNSTRIPPED variable, because is including ./configure not scripts/, although another problem is UNSTRIPPED is defined in scripts/ not ./configure, and while I'm at it ASAN is used in both configure and and half what it's defining overlaps. And scripts/ is not including OR ./configure, because it's calling scripts/ and can be included twice without glitching stuff? Hmmm, yes if I unset ASAN after consuming it...

Ahem. This is another reason doing this takes so long. Documenting stuff always leads to fixing things rather than trying to explain them.

ANYWAY, at some point I need to do a birds and the bees talk on how command_main() gets called, walking through main.c from the top level main() to the command_main() receiving control. Including argument parsing and toy_find() and toy_init() and so on. Plus the standalone non-multiplexer codepath too.

And then the NEXT thing to explain is lib/*.c. How many commands are currently exported from there?

$ sed '/^$/d;/^[ {}/#]/d;:r;/, *$/{N;s/\n//;s/ [ ]*/ /g;br};/) *$/p;d' lib/*.c | grep -v ^static | wc -l

Hmmm, let's organize that a bit better...

for i in $(ls lib/*.c | sort); do echo -e "\n--- $i\n"; sed '/^$/d;/^[ {}/#]/d;:r;/, *$/{N;s/\n//;s/ [ ]*/ /g;br};/^static/d;/) *$/p;d' $i; done

Right. EACH of those files is a video unto itself. You'd think lib/args.c would be easy with only one entry point, but that one's a doozie, however that's spelled. And these command lists are moving targets enough that I'm reluctant to record said videos without a mechanism in place to keep them up to date? (Re-recording them and editing the playlists? Some sort of notification to rewatch, except if 90% of it's still the same... Hmmm.)

April 11, 2022

While looking for something else, I stumbled across Hacker News noticing toybox exists again 6 months ago. And half the comments are Gavin Howard going "But my bc! It's the most important thing toybox could possibly have! And toybox is useless because it hasn't been paid attention to!"

Makes me want to remove bc and start over from scratch when I get around to that. Probably easier than trying to trim SIX THOUSAND lines of code down to something manageable. (I just checked, and the one in busybox is 7500 lines. Which he considers an improvement.)

Back when it was first presented to me, the development team for this bc exploded into a cloud of drama. And out of that drama stepped a guy who claimed someone ELSE was the source of drama, yet stirs up a cloud of drama in the comments here. (Drama magnets are always surrounded by other people who are At Fault For All The Drama, and they constantly go find new groups of people where Drama Inexplicably Occurs as soon as they show up. Dunno how much of this is drama magnets finding each other, and how much... isn't.)

Honestly, the most useful part of the whole process was probably the test suite. Except the only USE of bc I've noticed is the one script the kernel uses, because Peter Anvin blocked my perl removal patch with a rewrite in bc. Before which neither Linux From Scratch nor Gentoo had bc in their base install, because it's obsolete and useless otherwise. Sure it's in posix. So are the 1977 edition of Fortran, the compress utility for the .Z extension which was abandoned due to patents that expired in 2004, an ancient source control system that's five generations out of date (sccs was replaced by rcs, which was replaced by cvs, which was replaced by svn, which was replaced by git, and Linux switched to git SIXTEEN YEARS AGO and it's gone through multiple user interfaces and file formats since then), and of course a bunch of commands for managing batch job queues...

Posix is supposedly coming out with Issue 8 soon. I doubt they'll bring "tar" back (stop trying to make "pax" happen), or cpio (still the basis for RPM and initramfs)...

April 10, 2022

Huh. Busybox hasn't committed anything to the repo for over a month, nor has Denys posted to the list. I hope he's all right. (His country is still being invaded, although he was living elsewhere in Europe last I checked.)

Yay, Rich finally got the 1.2.3 musl release out. Woo! I should rebuild the toolchains and then bite the bullet and upload the binaries to my website.

April 9, 2022

Following up on:

As for adding coldfire support, let's see... google for "qemu coldfire" first hit is which says the default board in qemu-system-m68k is coldfire and has run uclinux. There's a defconfig for it (arch/m68k/configs/m5208evb_defconfig) so:

$ make ARCH=m68k m5208evb_defconfig
$ CROSS_COMPILE=m68k-linux-musl- make ARCH=m68k
$ qemu-system-m68k -nographic -kernel vmlinux

Hey, console output on serial using my existing m68k toolchain. Good sign. Ok, let's see, can I get a userspace...

I did a quick stab at trying to get fdpic support working. (I haven't got an fdpic toolchain for the target, but the loader should be able to handle ELF PIE binaries, if somewhat less efficiently.)

The next steps to trying to get a mkroot target for it would be:

$ cp .config walrus
$ ARCH=m68k ~/aboriginal/more/
Calculating mini.config...
[1] 135/135 lines 3260 bytes 100%
$ grep -v '=y$' mini.config

I.E. make ARCH=blah relevant_defconfig, digest it down to a miniconfig, and pull out the config symbols that AREN'T boolean. (No modules here so I can skip dealing with them; they can't be needed to boot an initramfs but may be needed for things like ethernet support or mounting block devices...)

Those symbols get glued together into the KERNEL_CONFIG= setting in the mkroot target. I probably don't need to change the LOG_BUF size from whatever the default is... where is that, init/Kconfig and it's 17, which is 128k. Meh, close enough. I probably don't need to change the baudrate (it's virtual, qemu won't care) and the bootparam string needs revisiting. (I'm not faking up an mtdblock device for qemu, I'm booting initramfs. But if I _do_ want a block device for native builds...)

As for the config symbols that are enabled, let's filter out the ones all targets set:

$ sed -n 's/CONFIG_\([^=]*\)=y$/\1/p' mini.config | egrep -v "^($(grep -o 'BINFMT_ELF,[^ ]*' ~/toybox/toybox/scripts/ | tr , '|'))\$"

And now I'm working through that config symbol list, to see which ones are relevant. Currently I'm trying to figure out why switching off CONFIG_EXPERT makes all serial console output go away. (I have a sneaking suspicion I've tracked this issue down before. It feels familiar...)

April 8, 2022

Jeff's finishing up the first J-core ASIC (first round is just a test chip in a shuttle, maybe 50 chips resulting and they're proof of concept rather than general purpose), and needed to fill an extra 100 bytes of ROM space with a chip demo, so I tried to come up with the smallest CRC32 implementation I could so the ROM can checksum itself and then flash the LEDs in sequence to indicate the resulting bits. (We can't fit the proper CPU test program in the ROM space we've got, and this chip doesn't have serial output to display the results anyway.)

unsigned crc32(unsigned char *data, unsigned len)
{
  unsigned int i, j, k, c;

  // Calculate little endian crc32 with pre and postinversion
  c = ~0;
  for (i = 0; i<len; i++) {
    k = (c^data[i])&0xff;
    for (j = 8; j; j--) k = (k&1) ? (k>>1)^0xEDB88320 : k>>1;
    c = k^(c>>8);
  }

  return ~c;
}

This produces the same as the "crc32" command line program, but calculates each "table" entry as needed because there isn't a spare 1024 bytes of sram. Inefficient use of CPU but it's only got to grind through a couple kilobytes of ROM, and everything it's doing should fit in registers.
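One nice property of this polynomial is that there's a well-known check value to verify against: CRC-32 of the ASCII string "123456789" should be 0xCBF43926. A sketch of a self-test the ROM could run (the function is repeated from above so this compiles standalone; the selftest wrapper is my own addition, not part of the actual ROM code):

```c
// crc32() from above, repeated so this check compiles on its own.
unsigned crc32(unsigned char *data, unsigned len)
{
  unsigned int i, j, k, c;

  c = ~0;
  for (i = 0; i<len; i++) {
    k = (c^data[i])&0xff;
    for (j = 8; j; j--) k = (k&1) ? (k>>1)^0xEDB88320 : k>>1;
    c = k^(c>>8);
  }

  return ~c;
}

// Returns 1 if crc32() produces the standard CRC-32 "check value":
// crc32 of the 9 bytes "123456789" must equal 0xCBF43926.
int crc32_selftest(void)
{
  return crc32((unsigned char *)"123456789", 9) == 0xCBF43926;
}
```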

April 7, 2022

Scientists researching indoor air pollution from gas stoves found it was so bad they switched their own homes to electric induction. (Quote: "All the researchers were pretty horrified.")

Sigh. Now that I've forced the USB phone tether to stop grabbing a gratuitous ipv6 address (by switching off ipv6 support in the kernel)... Devuan's udhcp daemon keeps exiting? My address expires and I have to re-run the daemon. I suspect it's throwing some sort of stupid error I can't see (because daemon) and exiting? Buggy piece of crap, I need to replace it with toybox's dhcp client so I can fix any problems. (Such as -4 being ignored and the darn thing grabbing an ipv6 address I did not ask for and ACTIVELY DO NOT WANT because it SABOTAGES SSH'S MAN IN THE MIDDLE PREVENTION.)

Huh. Well, there appear to be something like 30 instances of dhclient running, and when I killed them all and ran a new "dhclient -d -4 usb0" (because dhclient said -d4 was an unknown argument: facepalm), it complained "RTNETLINK answers: file exists" which is the error it gives if the interface is already up with an address... and then says renewal in 1624 seconds. (Um, ok? So... did it or did it not work? I thought that error made it exit, and that I had to run it again. Also, when I unplug the USB so the interface goes away, this does NOT make dhclient exit? It just leaves an old copy talking to a no longer existing device?)

Not opening the dhclient can of worms right now, I'm trying to finish cleaning up "wget" so it can go into this release. Closing tabs, not opening new ones...

April 6, 2022

Ok, the Google guys say that the middleman organization invoiced them and was approved on March 23, and given 45 day payment terms they should have the money by Star Wars Day early next month, at which point I can submit an invoice to the middleman and finally get paid for working on toybox! Woo!

Which means I shouldn't have to get another contract to keep the lights on, I can start tackling the toybox todo backlog now and invoice for this month at the end of the month. (Yay!)

Probably starting with getting a release out, since that's hugely overdue. (I've put some time into it this past week but I'm like a third of the way there...)

April 5, 2022

Japan points out how much of the "carbon capture" technology silicon valley techbros are working on belongs in the "you've reinvented trains, badly" bucket because plants are already amazingly good at sucking carbon out of the air. (They're solar powered and everything.) All we have to do is process the resulting biomass to convert that carbon into whatever format we want it to be in, and "mixing it into dirt" turns out to grab rather a lot for potentially quite a long time.

Sigh, I need to resubscribe to Linux Weekly News. I'm terrible about spending money online. (I know too much about how the sausage is made to trust typing payment information into any keyboard. And my phone does NOT have authorization to spend money. I generally ask Fade to do it for me, which I KNOW is silly and yet... I should buy gift cards or something.)

Anyway, I am reminded of this because I'm waiting to read the updated story about the Debian /usr merge, which is a follow-up on their story from six years ago, which is apparently my fault. Yet ANOTHER case where I yodeled in an avalanche zone and then couldn't steer the result. (The high water mark of which is probably still the patch penguin thread, which just goes to show that "Don't ask questions, post errors" sometimes really works. Here's an old news article about it, here it's linked from a syllabus, here's somebody's dissertation in which I am at least 3 people. (Mostly just "Landley", sometimes Rob or Robert, on pages 36 and 37 it's "Langley", and page 173 quotes a Roger Landley.) It's that old "Don't ask questions, post errors" thing.)

Anyway, back in the Yellowbox days symlinking /bin and such under /usr let me mount a read only zisofs image on /usr that contained more or less the entire operating system. (Zisofs was a compressed CDROM image format that predated squashfs. The read only root filesystem was partly a security measure: I wanted something you couldn't mount -o remount,rw if you'd partially cracked the system and "isofs" fit the bill.) My yellowbox init was doing unnatural things that Linux didn't quite support yet, since then the kernel has grown "mount --move" and lazy unmounts. (And yes, my habit of posting things to lkml that get completely ignored goes back quite a ways, doesn't it?)

Anyway, I had the luxury to merge /bin with /usr/bin in 2002 because I didn't have a package management system to worry about. "Where files live" is kind of important to those, and the various packages all have to agree or it gets weird. Debian has tens of thousands of packages and decades of historical cruft accumulation. Plus there are hardwired "#!/bin/bash" in zillions of scripts, so the symlinks aren't going away.

April 4, 2022

URGH, I HATE HATE HATE the way ipv6 lobotomizes ssh's man-in-the-middle protection, so that every "git push" to github does:

Warning: Permanently added the RSA host key for IP address '2607:7700:0:2d:0:1:c01e:ff71' to the list of known hosts

I've been running "dhclient -4 usb0" with phone tethering to disable that, but apparently devuan "upgraded" it so that it requests both address types at the same time now even when you EXPLICITLY TELL IT NOT TO.

So now to stop it, I have to do:

sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1

Only then does ssh use a consistent address which DOES NOT GET AUTO ADDED WITHOUT ASKING ME EVERY TIME. (A reminder that the and block and most of the multicast address space can totally be released for general use, as Dave Taht was trying to do. That's somewhere around 300 million more ipv4 addresses freed up without redistributing any of the existing ones. (And does NOT need netmask, there's another 16 million...) But no, the IPv6 loons see doing that as an existential threat and mobilize to block it. They can't win unless everyone else loses, because a proper fix to the problem renders them unnecessary. IPv6 is such a clusterfsck that the only way for it to succeed is for IPv4 to fail, so it must be sabotaged.)

April 3, 2022

Attempting to put together scripts to record videos from. The problem of needing to explain everything before explaining everything else remains, but I suppose that's more a "what order do I put video lists in" rather than what order I RECORD material in.

Tempted to try to optimize "factor", but the largest positive signed 64 bit prime seems to be:

$ time toybox factor $(((1<<63)-25))
9223372036854775783: 9223372036854775783

Which takes 14 seconds to (fail to) factor on my 10 year old laptop. That's probably good enough.

(This laptop is newer than that to ME, because I bought four of them from surplus clearance so I have spares if it breaks, but this model was introduced a decade ago now and all I did was stick more memory and a big SSD into it. The CPU is a quad-core i5 at 2.7ghz. Faster than a speeding raspberry pi at 1.5ghz, but not by all THAT much. Factor may not be snappy in a loop, but it's USABLE on any 64 bit value and isn't using a bignum library so that's all it can do anyway.)
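For scale: without a bignum library, factoring comes down to trial division, and for a prime near 2^63 that's around sqrt(n)/2, or roughly 1.5 billion division attempts, which is where the 14 seconds goes. A sketch of the approach (illustrative, not toybox's actual factor code):

```c
// Smallest prime factor by naive trial division. For a prime near
// 2^63 the loop runs ~1.5 billion times: seconds, not microseconds.
unsigned long long smallest_factor(unsigned long long n)
{
  unsigned long long i;

  if (!(n&1)) return 2;
  for (i = 3; i*i <= n; i += 2) if (!(n%i)) return i;

  return n;  // no divisor found: n is prime
}
```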

April 2, 2022

I get emails:

> Good morning!
> Attn. Mr Rob Landley :
> Recently my Huawei Nova 2 Plus Android phone started to show me some signs
> of being rooted in a very different way with some elements like a toybox
> /toolbox as I'm gonna show you : Please take a look to all of the images
> above attached.
> All I hope Mr Landley is to receive some free advise on the issue I'm
> dealing with I mean how to get rid of this nasty problem and is ruining my
> phone I mean I can not run apps mainly those of banking services and stuff
> like that!
> Thanks in advance for any can provide to help me out with this issue and
> resuming my activities normally.
> Best regards
> Sergio XXXXX
> Sent from my Huawei Mobile

I make one specific component of the system, which is incorporated by Google into the Android Open Source Project with over a thousand other packages, and then the version you're using was extensively modified by Huawei (a Chinese company).

This is a bit like your car getting infested with ants, and emailing the designer of your car's fuel tank on the theory the car can't run without a fuel tank so I must know what to do about ants.

Have you tried a factory reset?


April 1, 2022

A few months back Joe Biden tried to change the IRS so any account with $600 in it must report every single transaction (no matter how small) to federal authorities. (This neatly disarmed the "double the IRS's budget so they can go after Billionaires" push from the left, by offering to double the IRS's budget so they can go after the poor. This doubling is of course undoing decades of cuts by the GOP reducing the IRS to a shadow of its former self, it's shrunk 20% just since 2010 because the rich hate taxes and the GOP wants to shrink the federal government to the point they can "drown it in the bathtub". This would allow billionaires to become kings and keep slaves again, the basic goal of plutocracy.)

Now Boomer Biden's taking his next step in the War on Cash: he wants all online sellers to provide the government with Photo ID to Protect the Children... no wait (checks notes) because intellectual property law.

A recent Paul Krugman Column included the paragraph:

I still sometimes encounter people who say that we live in a digital age, so we should be using digital money. But we already do! Like many people, I pay for most things by clicking a mouse, tapping my debit card or pressing a button on my phone. I used to keep singles in my wallet to buy fruit and vegetables from New York’s ubiquitous sidewalk stands, but these days even they often accept Venmo.

And later in the same column:

And we shouldn’t discount the importance of illegal activity. There’s about $1.6 trillion worth of $100 bills in circulation — 80 percent of all U.S. currency — even though large-denomination bills are very hard for ordinary consumers to spend. What do you think people are doing with all those Benjamins?

So 69 year old Krugman notes that the ability to make anonymous financial transactions on a daily basis is going away, and tuts that using cash for any reason is inherently suspicious (only criminals use cash), and never questions whether that's a good thing.

Meanwhile, the GOP is empowering vigilantes to sue anyone who "assists" someone to have an abortion, and make traveling across state lines to have one a felony. (The GOP has always relied on vigilantes, they just used to wear white hoods and lynch people instead of suing them.) Even if the credit card company or regulatory agency collecting this data ISN'T captured directly by nazis, the next data leak will give a complete list of donors to Planned Parenthood or the DNC so they know who to target with vigilante armament bill du jour. And Biden's refusal to make the Supreme Court "go to eleven" means said laws proliferate and remain in force, while Merrick Garland lets all the statutes of limitations expire on rich white men who might otherwise be prosecuted ever for anything.

Krugman's a Boomer, which means baked into his worldview is the assumption that anything they inherited has always been like this, therefore it requires no maintenance and no amount of abuse Boomers heap upon it can possibly damage it in any way. The Boomer world is eternal and unchanging and required no effort (from a Boomer) to be like this in the first place. Therefore nothing important is ever really destroyed or goes extinct, no matter how incompetent those lead-poisoned narcissists are at dunning-krugering their way through life. Impostor syndrome only happens to people with the slightest trace of self-awareness.

March 31, 2022

Followed Fade to her graduate office, which is a nice quiet place to record videos from when nobody else is here. Except I need to prepare scripts. And trying to explain stuff keeps leading me off on tangents to fix things I've just reexamined and gone "that's a rough edge, could be better"... Plus I'm leaving myself a lot of notes like:

Can prlimit query stack size (ala ulimit) for comparison in main() instead of hardwired stack limit for command recursion? Is that worth doing, and does it work on nommu where stack size is a per-process attribute in the FDPIC ELF headers? Does prlimit show the correct process-local stack size on nommu, and if not should I poke Rich to fix the j-core kernel...

March 30, 2022

Finished the board bringup contract yesterday: the board is essentially up now. All three issues were hardware, but were easily fixable (in hardware) once diagnosed. So I have once again done a software project which resulted in zero lines of code, and yet fixed the problem. (Yay?)

The most interesting was that the ATWILC3000 and ATWINC3400 wifi chips are pin-compatible, so when one was out of stock (ongoing chip shortage) they substituted the other. Except being electrically compatible does not make them REMOTELY the same chip: the 3000 is a standard-ish WIFI adapter with a linux driver in the vanilla tree, and the 3400 is a router-on-a-chip that boots its own multitasking OS from built-in SPI flash and doesn't just have its own TCP/IP stack but runs its own web server, does its own crypto, it runs SNTP to set its onboard clock so it can check https certificate expiration... The 3400 is designed to act as a router for deeply embedded arduino-style systems too small to do their own networking, and its manual does not contain the word "Linux" because it's not FOR that.

The fix was to find 3000s in stock at digikey and swap the chips back. In THEORY we might have botched up some variant of a PPP connection to let Linux use the 3400 as a gateway to talk to the net through an extra hop, but that would be a HARDER hardware change because they hadn't wired up the serial port pins (which the 3000 only uses for testing).

There was a whole lot of head scratching along the way to figure out why trying to load 3000 firmware into the 3400 was producing the WEIRDEST driver failures. (Oh, the driver will panic the kernel if the firmware isn't in the format it expects, because it only checks that the SOURCE ranges it's copying data from fit, not that the TARGET ranges it's copying data to fit. Needless to say the 3400's SPI flash image and the 3000's SRAM image aren't quite compatible.) Whole lot of sticking printfs into the wifi driver to answer The Wrong Questions. Searching for keywords in the giant programming manual and not finding them (because they weren't there) was also frustrating. Reading the sales brochure is what finally clarified things: oh, wait, this is the wrong category of product. (And it's pin compatible? Really? Yes, yes it is... Why?)

When you have a really FRUSTRATING debugging session, often you've missed something more fundamental/obvious than you're examining. This was an "I am in the wrong house" level mistaken assumption. Took quite some backing up until I could see it.

March 29, 2022

My laptop kept powering itself back up fifteen seconds after suspending, so I went through the dance of closing every window in all eight desktops until I could actually shut the thing down. This involved scribbling a note about the state of many open terminal windows into a text file to track which todo item that open terminal represented. (I semi-regularly go through and close tabs that CAN be closed, but a lot are something I was in the middle of and context-switched away...)

On Linux I can fix most things without a reboot, but "the BIOS has gotten itself confused" is not one of them.

March 27, 2022

"Streaming" means things vanish. You either have your own copy, or you don't. (This is a subset of "intellectual property law destroys intellectual property". Years ago Cory Doctrow said we're living in a dark age, because so much will fail to be preserved from today due to IP law that future historians will have a much clearer picture of the 1800s than today. Or was it lawrence lessig who said that? I googled, but the talk has fallen off the net...)

March 26, 2022

Visiting Fade and doing board bringup contract. Not very active on other stuff at the moment.

Twitter has gotten deeply confused. The 12 hour timeout expired, and now I can read things again, and tweet, but I can't favorite things? (Cached CSS entry maybe? Meh, not worth trying to debug.)

March 25, 2022

Fade set up a new twitter account for me so I can read threads without Twitter's stupid paywall page blanking the screen as soon as you scroll down three tweets, and while I didn't intend to post anything with it I got tempted into replying to stuff. Yesterday, I replied "We're all just waiting for the Boomers to die" and today:

We have determined that this account violated the Twitter Rules. Specifically, for:

Violating our rules against hateful conduct.

You may not promote violence against, threaten, or harass other people on the basis of race, ethnicity, national origin, sexual orientation, gender, gender identity, religious affiliation, age, disability, or serious disease.

As a result, we’ve temporarily limited some of your account features. While in this state, you can still browse Twitter, but you’re limited to only sending Direct Messages to your followers –– no Tweets, Retweets, Fleets, follows, or likes. Learn more. Your account will be restored to full functionality in: 12 hours and 0 minutes.

You can start your countdown and continue to Twitter once you:

Verify your phone number

Delete the content that violates our Rules

1 Tweet

If you think we’ve made a mistake, contact our support team.

So twitter considers "waiting" to be hate speech. (This Onion article is from twenty years ago before the Boomers got NEARLY this bad.) Meanwhile, the "your account can still browse twitter" is clearly untrue because now it's the phone number entry paywall at every page (paying with information is still paying, and my spouse already GAVE THEM her phone number to create the account so they HAVE THIS INFORMATION, it can only be some kind of "we know who you are" threat?), and they want me to performatively delete a tweet they've already blocked.

This is not a healthy website.

March 24, 2022

When busybox ash encounters an elf binary it doesn't recognize (such as an arm64 binary on an armv7l system) it tries to run it as a shell script.

root@stm32mp1:~# /boot/toybox-arm64 
/boot/toybox-arm64: line 1: ELF��@@�C
                                        : not found
/boot/toybox-arm64: line 2: syntax error: unexpected "(" (expecting ")")

Meanwhile toybox notes any utf8 parsing failures on the first line (I.E. anything >127 that doesn't decode as valid utf8) and goes "this is a binary". Not the best heuristic, but it catches most things.
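A rough sketch of that heuristic (the function name and details here are invented for illustration, not the actual toybox code): scan the first line and call it binary if any byte >127 isn't part of a valid UTF-8 sequence.

```c
// Heuristic, not a validator: doesn't bother rejecting overlong
// encodings, just checks that each high byte starts or continues a
// plausible UTF-8 sequence within the first line.
int looks_binary(unsigned char *data, int len)
{
  int i = 0, j, n;

  while (i < len && data[i] != '\n') {
    unsigned char c = data[i];

    if (c < 128) {
      i++;
      continue;
    }
    // How many continuation bytes does this lead byte promise?
    if ((c&0xe0) == 0xc0) n = 1;
    else if ((c&0xf0) == 0xe0) n = 2;
    else if ((c&0xf8) == 0xf0) n = 3;
    else return 1;  // stray continuation byte or invalid lead byte
    for (j = 1; j <= n; j++)
      if (i+j >= len || (data[i+j]&0xc0) != 0x80) return 1;
    i += n+1;
  }

  return 0;
}
```

An ELF header trips this almost immediately, while a script starting with "#!/bin/sh" (or valid UTF-8 text) sails through.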

March 23, 2022

In minneapolis.

Here's a tangent I edited out of an email I was sending:

I'm trying to make a minimal self-hosting system for a bunch of reasons, and C remains the obvious language to do that in. But "where the source code comes from" is not strictly required to be part of that core. My Aboriginal Linux system would wget tarballs, and the automated building plumbing at created squashfs images of pre-extracted source code so the VM could just refer to a mount point with source code for each package in a separate directory.

The source code download mechanism doesn't affect "here is a classroom of students studying everything to understand it all". It doesn't affect the "Ken Thompson's Trusting Trust attack" analysis. It doesn't affect "here's an archival system you can rebuild known state from" (because if you can't reproduce it from scratch under laboratory conditions what you're doing isn't science). If you don't have everything locally cached before starting you've thrown the "reproducibility" part out the window.

(Sigh, I need to do DESIGN videos. Lots of explanations about _why_...)

March 22, 2022

Flying to Minneapolis.

The Posix committee has approved adding strlcpy() to the next release (SUSv5, Issue 8, and POSIX-whatever year they get around to publishing it). Which raises an interesting issue of how to deal with that in portability.c: I independently invented "strlcpy()" back under Turbo C on DOS, and was happy to find it in SunOS' toolchain and BSD and so on, but it was never in glibc because Urlich DrPepper hated it for some reason. Fine, easy enough to implement my own. But if it gets added to libc and the standard #include headers, that's going to clash. (We're 100% going to disagree about "const" if nothing else.)

Do I put the definitions inside #if __STDC_VERSION__ < V4_RELEASE_DATE perhaps? Dunno. I've been using c99 and basically ignoring c11 (and bits of c99 to be honest, LP64 is better for known integer type sizes and declaring variables at the start of a block lets you FIND them), but I haven't been saying -std=c99 because I haven't needed to: both gcc and llvm have given me reasonable behavior as the default. I never had to tell /bin/sh -std=susv4 to specify which version of posix to use either, because THAT'S STUPID. I just run a sufficiently capable version of the shell that can handle my scripts, and it does the thing. If you find yourself having to say "python3" instead of python, it means the python transition faceplanted spectacularly.
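For reference, the usual strlcpy() contract the portability fallback has to provide (renamed local_strlcpy here precisely to dodge the clash this entry is about; a sketch, not toybox's actual implementation):

```c
#include <string.h>

// Copy at most size-1 bytes, always NUL terminate (when size > 0), and
// return strlen(src) so the caller can detect truncation by comparing
// the return value against size.
size_t local_strlcpy(char *dst, const char *src, size_t size)
{
  size_t len = strlen(src);

  if (size) {
    if (len >= size) size = size-1;
    else size = len;
    memcpy(dst, src, size);
    dst[size] = 0;
  }

  return len;
}
```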

Hmmm, looks like somebody on BSD complained (they have their own version in libc) and I already removed it from toybox. (I forgot.) Oh well, bullet dodged I guess?

March 21, 2022

The USA had nine mass shootings over the weekend.

Completely lost in the noise of course, but no party is "pro life" or "protecting children" when active shooter drills have replaced singing "duck and cover" and THEY ARE OK WITH THIS.

March 20, 2022

Back to work on the stm board. Serial console works now!

Something about the yocto install on this board puts it in a mode where the mouse wheel scrolls through command history instead of backscroll. The fix is "tput rmcup" which isn't installed on the board, but running that on my laptop and piping it to hd says "echo -e '\e[?1049l\e[23;0;0t'" should fix it (and did). The man page implies that the first one is involved in "mouse tracking" support but doesn't really properly explain (it says what ?1000l does but not ?1049l), and if it mentions the "t" one at all I can't find it. Still, it worked, and that's the important part.

So the first thing I'm trying to fix is the gigabit ethernet jack. In theory it's the same chipset and driver as the raspberry pi form factor evaluation board, but something about it is unhappy. I set up a static PTP connection between it and my laptop, and the board never sees packets sent from my laptop, and my laptop increases the rx error count for each packet (although occasionally some get through!) Hmmm...

March 19, 2022

Boomer Prudishness has gone completely insane.

March 18, 2022

Trying to close tabs, in this case tabbed terminal windows within which I have a bunch of half-finished commits, and also the various "clean1, clean2, clean3..." directories in my toybox workspace where I was doing something off on its own branch. (Mostly toysh changes.)

One of the abandoned forks was a very long rathole trying to figure out a toysh crash where free() was aborting on one of the targets, which turned out to be two things: 1) I was popping an entry off of a list but had a cached pointer to it that needed to be updated, 2) I was hitting a bug in musl-libc's realloc() that corrupted the heap. Apparently this bug only existed briefly a year ago, and of course the toolchain I built (with git commit du jour, since musl hasn't had a release in over a year) was within the range that had it.

March 17, 2022

Evangelical christianity continues to suck

I really hope the fall of Putin takes down the GOP he funded. (And yes, Manchin is a republican.) Texas suppressed 19% of the mail-in ballots this last vote.

Boomer Media sucks. Defund the police. The LAPD has been terrible for a hundred years, and they even made a movie about it. But as notable as that instance is, it's not exactly unique.

March 16, 2022

So, definitely not vimeo then. (They're charging patreon creators thousands of dollars per month for hundreds of views per month, as part of pivoting away from hosting video for individuals into hosting video for corporations. Patreon has not yet caught up with the change, and continues to think it has a partnership with them.)

Started on that very brief contract (just through the end of the month) to keep the lights on while the Google guys work out how to get money from point A to point B through a legal briar patch, and we just worked out that the board I'm doing bringup work on was damaged in transit, which is why I couldn't get serial output. So I packed it up and overnighted it back to them, and am taking tomorrow and friday off.

March 15, 2022

Tesla sucks. (All billionaires suck. USA's own oligarchs.)

March 14, 2022

Started the board bringup contract. Oh goddess another Yocto build. But I guess that's why they need help: if they weren't using Yocto they'd be done by now.

The ELC-in-Austin CFP closes in an hour and a half, and I still can't come up with the enthusiasm to submit proposals. I mean, I could do "building the simplest possible linux system" again, presumably much less jetlagged this time and with mkroot actually functioning. I could do "the state of toybox". I could propose a talk on mkroot. I could do a talk on android convergence. Heck, that stupid colocated "community leadership conference" suggests "Open Source Governance and Models", meaning I could dig up "the prototype and the fan club" talk again and try to get a version properly recorded.

But I'm just not feeling it. I don't feel connected to this community. This is The Linux Foundation: stamping out "community" is what they do. Chasing away hobbyists so the suits can charge rent. I wouldn't be going there to see anybody...

*shrug* Looks like it's moot anyway, the website closed the CFP earlier than the posted time. Probably using client-side javascript to use my timezone instead of the stated pacific time, but I'm not resetting my laptop clock to get around that.

March 13, 2022

The dollar value of Asset Forfeiture has surpassed burglary in the USA. The police are now MEASURABLY stealing more money than the "criminals". They need to be defunded.

March 12, 2022

I picked up a brief contract through Imperial Staffing (2-3 weeks of board bringup) to keep the lights on while Google works out how to give me the money they have sitting in an account already approved for toybox development. (There are many lawyers involved, because I am not a corporation, a state Google's bureaucracy cannot comprehend.)

Going through my various toybox/clean[1-9]* directories where I forked off some task and then got distracted from it before it was finished. One of them involved sticking several dozen dprintf(2, "thingy") debug prints into toysh, and running the mkroot init script. Unfortunately, what I DIDN'T record was the actual TEST I was running. The dates say these files were last updated July 5-10, 2021, and my blog says I was trying to track down an error manifesting in musl's free() on 64 bit platforms. Aha! I ran the i686 mkroot image but not the x86-64 mkroot image, and that one is segfaulting. Right.

March 11, 2022

The Boomers' prudish pearl clutching censorship continues. It will escalate continuously until they die.

The GOP's new strategy is modeled on the "fugitive slave laws", where they pass a state law and make crossing state lines to avoid it a felony. This is OBVIOUSLY unconstitutional, but they've packed the supreme court with confederate loons and Biden will never make it "go to eleven" because he's a Boomer and Boomers solve nothing.

Disney squeezing blood from a stone: pay to watch ads. (This is why it's "late stage" capitalism: they can never have enough. They can never stop. They must ALWAYS escalate to the indefensible and trigger a revolt.)

We know what policies work to make a good sustainable society. We always have. The problem is oligarchs lying, cornering the market, inserting unnecessary middlemen to bleed the system dry, burning down the homes people own so they must pay rent forevermore. This is why brexit is privatizing britain's NHS. This is why doctors in the USA no longer make house calls, and the surgery for a broken arm ($16k) costs more than a year's minimum wage salary ($15k, and that's before taxes).

March 10, 2022

Trying to finish up the "cd in deleted directory" stuff and I am annoyed at bash again:

~$ mkdir sub
~$ cd sub
~/sub$ rmdir ../sub
~/sub$ cd -P ..

What is the -P for then, exactly? What, does it fall BACK to -L to try to... Ah, I see. This is a Linux kernel thing. The ".." symlink is retained active in the dead directory, pointing to the directory it WAS in. And when I rmdir the parent directory as well it's still pinned (ls .. shows it as empty but doesn't error, and stat .. gives it an inode), and then cd .. fails but cd -P .. squirts back up to the next non-deleted directory.

So I don't think I have to handle this specially, because the kernel is already being weird about it.
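A scratch-directory reproduction of the above, without the prompt trimming (Linux-specific: the kernel keeps the deleted directory's ".." entry pinned to its old parent):

```shell
# Delete the directory we're standing in, then confirm ".." still
# resolves and "cd -P .." climbs back to the surviving parent.
tmp=$(mktemp -d)
cd "$tmp"
mkdir sub
cd sub
rmdir "$tmp/sub"    # we are now standing in a deleted directory
ls -A ..            # no error: ".." still points at the parent
cd -P .. && pwd -P  # physical cd pops back up to the live directory
```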

March 9, 2022

The EU has just ordered that the RT inteview with the late David Graeber be scrubbed from the internet, because ten years later a dictator went nuts and therefore everything they've ever posted is damnatio memorae. Not just the original posts, but reposts must be removed. And not region-locked out of the EU, they get to export their censorship globally. Sigh. I can see blocking new stuff, but removing the old stuff has a lot of splash damage. (And once again, a reminder that "streaming is not ownership".)

Of course the worst kind of censorship is self-censorship. These days a 1970s Australian public service announcement that drew cartoon humans that look like humans is considered AMAZINGLY WEIRD because acknowledging how humans look is JUST NOT DONE anymore.

America is one of the LEAST sexually healthy or body positive countries in the world today. What messages do people grow up with here? Your body is shameful and no one must ever see it, but other people's bodies are fascinating and you desperately want to see them. If what YOU have is bad, and what THEY have is good... Sure it's not the only issue (historically half a percent of humanity seems to always have been trans, which in the USA's population of 330 million would be 1.6 million people), but perhaps a bit of a thumb on the scales? If most actual physical differences are scrupulously hidden from view at all times, at what point does sexual dimorphism become an abstract concept and thus a matter of opinion? Of course no attempt to understand early development influences is any excuse for nazis to persecute people. My point is just that we live in a DEEPLY repressed, unhealthy society. (And as always, the capitalist corporations latch on to any social flaw and make it worse.)

Sadly, the world is complicated. For example, Putin's "denazification" is standard right-wing projection, his forces marching under the Zwastika are mass-murdering people to depose a jewish president. Meanwhile Israel is one of the few countries that's refused to condemn the invasion, and their government is settling Ukrainian refugees on occupied Palestinian land while they're the main worldwide source of spyware repressive governments use to monitor dissidents.

And the usual issues remain at full volume: Defund the police, the GOP is still literally a death cult...

March 8, 2022

So I made a Vimeo account and uploaded a video (a lightly edited version of the first one-take tutorial video on downloading and using a toybox static binary), tried to make a video post on patreon, and... it says I can't do this with a "Vimeo basic" account, I need one of the pay accounts.

I logged in with my google account (well, one of them) and uploaded the video there, and then tried to post THAT to patreon as a video post, and instead of embedding the video the summary box is a notice about having an outage for scheduled maintenance, and then when you click through it goes to the download page where there are inexplicably five files (I only uploaded one). Although it has a nice video player (that goes to 3x when youtube's maxes out at 2x), you have to click "parent directory" to get to it for some reason.

I'm not sure if this is actual incompetence on patreon's part, or strategic incompetence promoting specific partners in a plausibly deniable fashion?

Meanwhile, the Google guys suggested I make an account on something called "Open Source Collective", so I can go through an approved vendor and thus bypass the lawyers boggling endlessly at the concept of an individual human being existing as a standalone entity rather than part of a corporation. (Doesn't EVERYONE have a corporate shell? Are you truly real without a corporation wrapped around you?)

This collective of course takes a 10% tithe, and I asked if just sponsoring my Patreon was doable (I have the old-style account there that takes less than that), and apparently the tax people are doing a bunch of head-scratching about that. (So giving 10% of the money to some organization I've never heard of is doable, but going through the one that many open source developers actually use is not currently an option. This reminds me of IBM's lawyers spending the whole of 1998 frowning at GPLv2.)

March 7, 2022

Yet another study shows that lead poisoning cost Boomers several IQ points.

Capitalism intentionally creates scarcity, and some of that scarcity is permanent.

Here's a good thread on how deregulation under capitalism screwed up rock music in 1996. (With bonus misogyny.) And here's a good writeup on how Boomer Capitalism isolated everyone and destroyed our social skills.

And here's a 20 year old article explaining how Russia was already a plutocracy, I.E. capitalism is to blame for Putin. "Oligarch" is just a billionaire with political power.

Cultural misogyny is often internalized, although other times it's explicit (and incompetent, and implausibly deniable).

Defund the police.

March 6, 2022

Fuzzy was up all night barfing again, right after I was sick/tired/anxious again. This is definitely acting like that long-covid sine wave thing, which sucks. (And would imply we got Omicron late last year. Wheee.)

So I'm out at the table trying to edit another video I recorded, and I've hit a snag.

For years I've left the house to work, which might be an unmedicated ADHD thing? I'm more productive OUT somewhere, at a coffee shop or fast food booth or the picnic table on the UT Geology Building's porch. The hotel rooms in Japan were marvelously productive, but the apartment I had there stopped being so after the first week. Same 4 walls does not productivity make, for some reason. Not exactly writer's block, just really hard to get into the zone and WAY easy to get distracted.

But recording video at the table has the problem that it isn't QUIET. Right now there's cicadas, and even when there isn't there's some sort of climate control hum from across the street. (Most likely a multi-story building installation of heat pumps.) I'm assured that the hum is tolerable in the recordings (the true/false videos were recorded here), but my problem right now is I recorded several minutes of footage on the "tty" command (usage and implementation) back at home where it actually IS quiet-ish, and took it here to edit, but now I want to supplement the usage part because during editing I realized I never really explained what a TTY _is_. That's easy enough: I've worked out a brief script pointing at man 4 tty and basically saying it's a pipe with extra side channel info such as screen width and height, cursor position, and also process membership for signal delivery, so the shell can signal every process associated with that TTY at once: ctrl-C kills everything in a pipeline and ctrl-Z suspends the whole pipeline (and then bg or fg can resume it all). (On the implementation side I should probably mention you can go look at the man pages for tcgetpgrp() and tcgetsid() if you want to know how that's implemented, but it's out of scope for now.)

The problem is, if I splice together footage taken at the table with footage NOT taken at the table, the background air conditioner hum cutting in and out probably gets a lot more noticeable. (The cicadas certainly would.) So I need to edit as much as I can here and then go somewhere quieter to record supplementary footage.

Re-recording turns out to be harder than I thought, because on later takes of the same thing I wind up tongue tied and... tired? Dunno. It's strangely exhausting to explain things into a microphone, I don't quite know why.

This is WAY harder than just standing in front of people and talking to them. When I can actually SEE my audience I don't care so much if I repeat myself a little or explain stuff out of order and then backfill. Live talks are never GOING to be perfect, so they don't have to be. I can TELL if the audience is confused or bored or following it. Even giving talks over zoom, there's still people listening live. This is like trying to do stand up comedy without anybody laughing. Maybe it's because I'm new at this format, despite having spoken to groups of people FOREVER. (I taught community college courses back during the dot-com boom, in front of rooms full of people at a time. That part's old hat. One on one I can do, one to many I can do, one to NONE I lose traction somehow...)

March 5, 2022

A right wing loon added an amendment to ban postal banking to the current post office repair bill, which would include the $21 billion of money orders the post office sold last year. It's another part of the "war on cash", the US Dollar has been privatized and may now only be spent through the monopsony of Visa, Mastercard, and Chexsystems who will ban you for life if you do anything they consider "immoral" such as visit a nudist resort. (Of course spray-painting moral judgements over everything is how they prevented AIDS research in the 1980s.)

Hertz is actively hurting its customers for no obvious reason by reporting its properly rented cars stolen and having the renter arrested. They interviewed one man who wound up homeless, another who spent 6 months in jail. It apparently does this over 3000 times per year and knowingly refuses to provide accurate information to the police because if it did the police wouldn't accept its theft reports. (Leto's Law has been covering this story for a while.) The moral of the story: corporate personhood allows people to do this sort of thing and escape punishment because the PEOPLE who did it aren't liable, the ablative corporate shell is.

The GOP remains funded by russia even today. Of course capitalism in general has that problem, but right now right wing loons are actively blocking US military operations near Ukraine.

Meanwhile, Defund, Guillotine, as usual.

March 4, 2022

Blah. Sick again. Or at least VERY VERY TIRED all day, for no reason. And borderline anxiety.

Posted an email to the list about the ongoing attempt to do videos.

I need to figure out where to host the videos I'm making, but youtube's copyright claims are just so amazingly stupid that I really don't want to get it on me, but there isn't an obvious second choice yet? They're like AOL, Myspace, or Livejournal: totally unassailably dominant. An american institution like Sears. (Although in this case it's mostly that Patreon has good youtube integration, and hasn't bothered to implement that for most other places, including raw URLs.)

So there's a bug report about toybox md5sum treating directories as empty files, and this is a more general problem that should be fixed in lib/ somewhere. If I put an fstat() into notstdio()... it doesn't have a good way to report back that it failed. Hmmm. Maybe just in loopfiles()? Grrr. Trying to retrofit a design constraint.

March 3, 2022

Happy International Sex Worker's Rights day. (That thread is from a bookstore, highlighting a lot of books about the racism and sexism behind the attacks on the industry.) Speaking of, here's an old video showing that 80% of american men admitted to watching porn and pornhub had literally BILLIONS of visits per day, and yet the Boomer prudes think that prohibition's going to somehow improve matters here.

The USA defeated Stalin's Soviet Union and Mao's Red China by converting them from communism to plutocracy, which turned out to be an even bigger problem in the long run. Late Stage Capitalism is now the toxic problem threatening to destroy the world, with global warming and nazis and countries invading each other (China's plutocrats eyeing Taiwan).

The plutocrats' playbook is to take over public services and then destroy them, both to convince people that government is unworkable (with tories in charge) and to make people desperate enough to beg for crumbs from oligarchs. (Bill Gates does not hire Warren Buffett to wash his car, they need piles of starving peasants to boss around or being a zillionaire means nothing. If they can embezzle the funds along the way, that's a bonus. As is writing laws to prevent themselves from being prosecuted, such as needing to prove intent. Sure you can show what I _did_ but without reading my mind you can never know what I MEANT by it...)

The most recent scummy republican thing is to outlaw postal banking, including the $21 billion annual business selling money orders the post office currently does. (It's part of the War on Cash, all financial transactions must go through one of three private companies that can track and veto each purchase. This is why we need to guillotine the billionaires, as soon as the last boomer dies.)

Here in the USA we have the return of slumlords. Still trying to buy up every residence in the country and corner the market on housing.

Russia bombing a nuclear reactor is alarming but not as bad as it could be, because nobody outside of Russia is stupid enough to still run RBMK reactors. (Meanwhile, Russia still has ten of them on its territory, each of which could still go full Chernobyl. Luckily, Russia's post-soviet turn to capitalism gave it massive embezzlement and fraud which has hollowed out its army.)

March 2, 2022

Sarah Taber has great coverage of the strategic importance of ukranian agriculture.

HEB's 88 cent small avocados now have a purple sticker, saying they're from the Dominican Republic. The big ones still say they're from Mexico; presumably they take longer to ripen in the warehouse? (There's a nonzero chance Avocado Laundering is going on, but I expect they really were trucked through a third country if so.)

This laptop has bluetooth built in (my previous one didn't but the Dell Latitude E6230 does, I was reminded when rewriting lsusb recently). It's on an internal USB bus and has EXACTLY the same problem that the external USB bluetooth dongles did: the GPL Linux bluetooth stack is incompetent. (Android's got a working one, they threw out the GPL version and wrote an Apache licensed version from scratch. But nobody's ported that back to Debian.) The xfce GUI dropdown menu for bluetooth sees my headphones just fine (iJoy Logo), but when I click "select" it can't identify them (the options are "serial port", "audio sync", or "handsfree"), and when I select any of those it thinks for a few seconds and then goes "Device added successfully, but failed to connect". This has been the case with the Linux Bluetooth stack for at least 7 years now, completely unchanged. Linux on the desktop is not just dead but embalmed.

I need to do a video on the toybox build infrastructure, but first I want to split header generation out of scripts/ just so it's easier to explain.

There's already a NOBUILD=1 that tells it to return early (just regenerate the headers, don't compile code), but that still leaves this script being hundreds of lines long and doing all SORTS of stuff before actually compiling *.c into *.o. And the basic technique for "make -j" in shell script seems like it would be of use to other projects. (You really don't NEED ninja and so on, 90% of the problem here is that "gcc *.c -o blah" won't use multiple processors because compilers are still stuck in the 20th century which has been over for two full decades now.)

The problem with splitting up scripts/ is that it cheats: the file generated/ is both a standalone script you can run to rebuild the toybox binary from this source and generated/*.h with a simple "gcc *.c -o blah" invocation (sadly currently broken again because I don't regression test it enough), but it's ALSO used by the plumbing to see if we need to re-run the library probes. (The compiler can eliminate used libraries via --as-needed, but if you tell it to link against a library it can't FIND that's a hard error, so we need to probe for what libraries are available and only include those.) And the "figuring out what to build" logic that theoretically only goes in the build-not-headers part gets used to create generated/ Hmmm, I suppose I can have it generate the start of the script as a separate file and then "cat file; make_more >> file". Bit of an intrusive change...

March 1, 2022

There was apparently exactly one airplane that could move the largest pumps and cranes and such around the world, and Russia destroyed it attacking the airport in Ukraine. I should apparently add the book "Kleptopia" to the to-read pile because it annoys the right people. New research shows that the glass ceiling is held in place by insecure, poorly performing men.

The GOP is reviving its family separation policy, this time taking native american children away from their families so they can be raised by white people (or sent to catholic boarding schools to be quietly murdered).

February 28, 2022

Still anxious, but largely trying to dismiss it as lingering stomach upset.

Lars at Google is back from his week off and says that legal came back with more questions and it'll be at least another week of processing. Fade moved some savings around to pay the mortgage, but I should probably sell my Coca-cola stock to have enough cash cushion to start job hunting again if it takes longer than that. (Ok, so not ENTIRELY lingering stomach upset. And my incredibly generous patreon donor edited his pledge down by 50%, which is still incredibly generous, especially considering I owe like 8 videos by this point. But even the original incredibly generous rate was a small fraction of my mortgage payment, so does not strongly influence the "can I afford to keep doing this" decisions.)

February 27, 2022

Darn it, decided to experiment since I was feeling "tired but not anxious" and HEB was out of the milk tea I like, so I got a bottle of the japanese milk tea (not the brand I get when I'm in japan, but they don't have that here, but this is still a fairly standard sweet black tea with milk), which I'd stopped drinking because the green tea version of it seemed to be triggering anxiety (I.E. upsetting my digestion for a day or more at a time, in a way that messes with my emotions). Two hours later, I'm full of anxiety. Funny that.

Walking around hancock center in a circle watching the blender video tutorial series on my phone. Everybody keeps taking "I have one video I want to cut chunks out of and stitch together" as too trivial a case to ACTUALLY SHOW AN EXAMPLE OF. (I'm not trying to overlay picture in picture. I'm not trying to do transitions. I just want to edit my one long take down to something that sounds more intentional.)

February 26, 2022

Speaking of the USA being uniquely prudish... (Because we were colonized by the Quakers and Puritans, people who fled pre-Victorian England for being too licentious. The mayflower landed at Plymouth Rock in 1620, between this portrait of english court fashion in 1616, and this in 1625. The thanksgiving images of buckled hats don't just look stupid by MODERN standards, they were silly cult stuff at the time, without even taking into account how much colder England's climate was back before the industrial revolution and coal burning steam locomotives and such kicked off global warming.)

So I was sick and then Ukraine got invaded which is quite distracting. I haven't emailed Denys to see if he's ok because I don't want to bother him right now, but I did finally give the busybox list a heads up that he might be a bit distracted. He's posted a single terse message to the list this month.

Found a situation where bash is leaking internal state, I think? Running this in my home directory (and WITHOUT the usual prompt trimming I do for these shell examples):

landley@driftwood:~$ mkdir one
landley@driftwood:~$ cd one
landley@driftwood:~/one$ rmdir ../one
landley@driftwood:~/one$ cd .
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
landley@driftwood:~/one/.$ declare -p | grep one
declare -x OLDPWD="/home/landley/one"
declare -x PWD="/home/landley/one/."
landley@driftwood:~/one/.$ unset PWD OLDPWD
landley@driftwood:.$ cd .
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

How did it get "one" back at the end there? The directory no longer exists and I zapped the environment variables that contained that string.

I don't think I want to emulate this one in toysh?

February 25, 2022

I've done almost no work since I got sick, partly because I'm still recovering and partially because Ukraine got invaded, which is hugely distracting. I have no special expertise here and there's nothing I can do about it, but strangely despite draining my energy it's also reducing my anxiety, by taking things that seemed unresolvable and making them... much less abstract.

You know how in Persona 5 the Phantom Thieves have to send a calling card to manifest the treasure so it becomes real enough they can steal it? That's how I feel about Ukraine. All the nebulous Russian influence behind Trump and Brexit where the Mueller report can come out and be ignored because untouchable oligarchs endlessly throw oil money around to run troll farms and bribe politicians and talking heads. The guy who ran Trump's presidential campaign got a presidential pardon and the russian oligarch who got Boris Johnson elected was elevated to the House of Lords.

Russia's invasion of Ukraine brings all that stuff to a head so it can actually be addressed. The invasion is NOT plausibly deniable, and the responses to it aren't endlessly delayable until the statute of limitations expires. Tanks on the ground means now, and as with Saddam Hussein invading Kuwait there's no middle ground to hide in.

Faux News can't be propped up by Russian money when Europe has permanently cut BOTH nordstream pipelines and blocked Russia's largest banks. (Creating "the unbanked" is fine when it's punching up, doing it to a country in response to firing missiles at apartment buildings is way different than doing it to individual civilians for being seen naked.) Putin turns 70 in October: the math on a dictator for life changes when he's 3 years from the average russian male lifespan (BEFORE covid) and the sanctions only come off the day he dies and not one second earlier...

This defines the rest of Putin's life. Russians are out in the streets protesting again despite ALL the crackdowns. The morale of Russian troops is terrible: one group of deserters said they were told they'd be welcomed as liberators, a group of POWs say they were told the invasion was just more "exercises" until they were well past the border.

There's a kind of "big boat stuck" aspect as well, a concrete problem where everybody who doesn't agree on what the issues ARE is outing themselves as a transparent shill (and even those are backpedaling). Alas this one is killing people and turning children into refugees and has a small but nonzero chance of escalating into World War III, while also bringing a bunch of other looming nebulous sword of damocles issues down into reach of resolution.

So yeah, this is distracting me and draining all my energy. (The weather being too cold and rainy to go to the table isn't helping there.)

I've still got a backlog of other links, as usual: Guillotine the billionaires, the GOP are all hypocrites, defund the police, the USA's capitalist medical system is useless... And here's a strong argument for school lunch being free for all students: if you're legally required to be there and can't leave, how do they get to charge for feeding you?

February 24, 2022

Still sick. Got nothing done today.

February 23, 2022

Sick all day, lying on the couch. Nauseous, exhausted, and my sense of smell is misidentifying the catbox as the burnt dust smell you get from the vents when you first start the heater up in the fall, and my checkerboard tea tastes like band-aids. Fuzzy was barfing all night when she had this a few days ago, which is the _third_ time she's had this (about a week apart). Starting to suspect we got Omicron and this is the long tail sine wave thing where you feel better then it comes back then goes away again then comes back for several weeks. (As with the Flu, getting the new strain annually is only to be expected. It's Covid 19 because 2019, it's now 2022, welcome to YEAR THREE of the pandemic. Got the 'rona again. Same old same old.)

Russia invaded Ukraine and late stage capitalism still can't manage to wean itself off Russian money (and not just the right wing loons in Russia's pocket). The modern economy is literally built around money laundering for oligarchs, with corrupt regulators and complicit media. Of course we aren't sending them people or hardware; Late Stage Capitalism can only conceive of responding in financial terms. Meanwhile Russia's billionaires are committing their own acts of war in preparation for taking territory directly from NATO countries, while we wait for the other shoe to drop.

The GOP is evil because they don't consider anyone else to be people just puppets, to be manipulated.

Plutocratic prudishness is a new thing. And of COURSE it doesn't stop with sex workers, they're already censoring legal abortion information and attacking gay children. The Evangelical Boomers will continue to get more extreme until their dying day.

Luckily, there's light at the end of the tunnel on that one. Turns out the great resignation is also the great retirement.

February 22, 2022

A good interview with Spike Trotman about leaving kickstarter. She was an early pioneer of the platform, and one of the first to go when they decided to "switch to blockchain" whatever that means. (Probably an attempt to escape that whole "the unbanked" thing where visa/mastercard/swift veto any financial transactions based on "I know it when I see it" moral judgements based entirely on evangelical christian religious prejudices: if your book acknowledges teenagers can be gay it can't be purchased for US dollars, and if you object too hard YOU PERSONALLY may be banned from ever having a bank account again. But at the same time, "if blockchain is the answer you're asking the wrong question", and Kickstarter is being vague about its reasoning because they don't want to point fingers at the payment processing monopoly and wind up with a retaliatory chexsystems ban on their executives. This is another reason we desperately need postal banking, where every american citizen has the inalienable right to a bank account. Once again easily fixed once the Boomers get out of the way...)

So Spike left and is now crowdfunding on her own website, using a wordpress plugin. I'm all for increased decentralization (I.E. undoing of web3), good to see the pendulum swing back the other way. (It's done this many times over the course of the computer industry, that's why X11 workstations tried to be a big thing back in the 1990s.)

February 21, 2022

Apparently Apple's "security" is entirely performative. Youtube has a new feature to make sure the next video of police misconduct posted on a brand new youtube account can't go viral on their platform. The united states remains uniquely prudish and easily scammable via sex panics.

Fascists are murdering protestors on US soil again. It's only ever the GOP doing this, with the police equally blaming the victims. (The protests were against the police, the police say the protesters deserved to be shot, pointing out the large number of witnesses contradicting their story makes the situation "very complicated". This is AFTER portland police were documented attacking protestors 6000 times and got censured by the Department of Justice.)

Another day, another leak of secret billionaire bank account info. Who owns $80 billion of swiss bank accounts and what did they spend it on.

Of course the scandal isn't that "they still bank here": adding individuals to "the unbanked" is bad whoever they are. The scandal is the tax evasion and that billionaires anywhere remain unguillotined. "Punching up" is good, "punching down" is bad, even though both actions are punching. Basic income (plus a National Health Service and rent control) fixes most of this: let people afford to do what they want to do. And stop criminalizing sex work: we don't criminalize chocolate harvesting despite modern corporations enslaving children to do that. The triangle shirtwaist fire was about textile workers locked in and unable to leave when a fire broke out. Sex is NOT THE PROBLEM here.

The above article about the most recent billionaire banking leak also details the path by which Visa/Mastercard/Chexsystems achieved a proprietary stranglehold on the US dollar so it can unbank "immoral" customers like marijuana dispensaries and camgirls and escalate into the "war on cash" to eliminate all untracked transactions. I'm hoping that The Great Retirement (followed by The Great Die-off) will take the wind out of their sails, but really the problem of cornering the market and creating oligarchs is inherent in capitalism, going back to the Gilded Age with railroad robber barons in the 1800s, who were only displaced by the spanish flu pandemic, World War I, the 1929 stock market crash leading to a decade long Great Depression, World War II, and finally Sputnik and the "atomic bomb delivered by ICBM that no amount of radar and fighter jets can defend against so we will REORDER SOCIETY because of it". The period from New Deal through Cold War beat back the oligarchs for most of a century, but that century is up. AOC's Green New Deal was not NEARLY aggressive enough, capitalism needs to END, along with corporate personhood and civil forfeiture and the existence of billionaires. (Both as a category and as individuals. Guillotines are designed to divide. Proper tool for the proper job. Until then, Billionaires continue to literally kill people.)

February 20, 2022

Among the many MANY failures of IPV6 is the way it defeats half the purpose of ssh: every time I connect to a server via an ipv6 address it's a DIFFERENT address, and it goes "Warning: permanently added host key for address BLAH to known hosts" which provides ZERO protection from man in the middle attacks. (I know the "Earn It" act is poised to outlaw encryption in the USA entirely in the name of locking up all the Boomers who took naked baby pictures of their kids and are thus guilty of child porn by insane modern standards, while also denying that one House episode was actually describing a real thing because THESE days nobody does that under the age of 30. But it's still annoying for the infrastructure to get this stupidly wrong in the meantime.)

Grinding away merging lspci and lsusb, so they can share the .ids file loading/parsing/searching infrastructure, and while we're at it the parser for the uevent keyword=value stuff is pretty much the same too. (The lspci DISPLAY logic is nuts, but that's because it's got a half-dozen different output formats. -m, -n, -nm, -nn, -nnm, plus -e and -k modifiers for all that. And -nn puts the data in a DIFFERENT ORDER than all the others, the vendor:device data gets collated. Everything else is string [numeric] with fields optionally dropping out, but that one emits string string [numeric:numeric] unlike anything else. Hysterical raisins.)
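The shared .ids loading could look something like this standalone sketch (function name and buffer sizes invented, this is not the toybox code): both pci.ids and usb.ids put vendors at column zero ("vvvv  Vendor Name") and devices indented one tab ("\tdddd  Device Name"), with deeper indentation for subsystems/interfaces.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

// Hypothetical shared lookup for pci.ids/usb.ids style files. Skips
// comments and blank lines, tracks whether we're inside the requested
// vendor's block, and copies out the vendor and device names on match.
// Returns 1 if the vendor was found.
int ids_lookup(FILE *fp, unsigned vendor, unsigned device,
               char *vname, char *dname, size_t len)
{
  char line[256], name[200];
  unsigned id;
  int in_vendor = 0, found = 0;

  *vname = *dname = 0;
  while (fgets(line, sizeof(line), fp)) {
    if (*line == '#' || *line == '\n') continue;
    if (*line != '\t') {
      // Vendor line at column 0: are we entering our vendor's block?
      in_vendor = (sscanf(line, "%x %199[^\n]", &id, name) == 2
                   && id == vendor);
      if (in_vendor) {
        snprintf(vname, len, "%s", name);
        found = 1;
      }
    } else if (in_vendor && line[1] != '\t') {
      // Device line (one tab). Two tabs would be a subsystem, skipped here.
      if (sscanf(line+1, "%x %199[^\n]", &id, name) == 2 && id == device)
        snprintf(dname, len, "%s", name);
    }
  }
  return found;
}
```

The uevent keyword=value parsing is a separate loop, but the "load file, walk indentation levels, match hex IDs" part above is the piece both commands could share.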

This tweet reminds me I miss the non-STEM parts of college. I got an english minor, and had a bunch of courses in sociology and psychology and cultural anthropology and comparative religion and so on, and those were the actually useful ones. (Open source is an example of a potlatch culture!) The computer science I taught myself starting in 5th grade, and only wound up taking classes in it to pull my GPA up after the disastrous foreign language requirements. (The only F I ever got in high school was the second year of German: I remembered nothing from the first year. The only D I got in college was French: my french teacher taught french entirely in french. I tried to explain that if I knew french I wouldn't need to be there, but it didn't work because he wanted me to explain it to him in french. I WORKED for that D. Don't remember a darn thing from it, I eventually traversed the foreign language requirement with two back-to-back summer courses in spanish where the compressed schedule plus grading on a curve meant 50% was a C+.)

Oligarchs try to divert all funding to STEM and away from humanities because STEM isn't a political threat to oligarchy, and humanities are. Silicon valley techbros are politically fungible.

February 19, 2022

Good news: the copyright office refused to register AI-generated work, and the number of people "certain God exists" has fallen below 50%. Here's a good argument that Faceboot is following Myspace's end-of-life playbook. People mobilizing against the ongoing Boomer Stupidity and outing fascist funders. Meanwhile, california's sued Elon Musk for racism.

We still need to defund the police. Each misdemeanor arrest costs NYC $1750, so buying and giving away diapers and food would be significantly cheaper than policing petty thefts of essentials. And here's police teaming up with fossil fuel interests to stage an attack on themselves as an excuse to evade an eviction order by the native american tribes who legally own the land. They don't have to fool anyone, they just need a paper-thin excuse to "yeah but" silence protests at a casual glance: "Isn't that horrible?" answered with "Oh no, she was asking for it..."

Law enforcement is always protecting the powerful against the powerless, and the silver-spoon crowd are unused to being on the wrong side of that, so every time they meet a bigger fish or swim out of their little pond they cry like babies over the injustice of this thing that has never happened to anyone else before.

The war on cash continues. All purchases must be trackable back to an individual in realtime, and the recipient of the money potentially culled from the financial system. Switching everything digital is a bad idea. Once upon a time you could do the NYT crossword without having your data harvested, but of course Wordle is full of trackers under NYT ownership.

If the Earn-It Act punches a big hole in Section 230, Google's ad business could wind up with an awful lot of legal liability, and they'll still somehow manage to avoid going after the real problems (which existing laws apply to just fine). Meanwhile creeping prudishness continues to have knock-on effects, and the GOP is openly attacking contraception already. They consider taking away abortion a given at this point because of the supreme court packing, they're on to the next misogyny. And as always, Boomer Biden fixes nothing.

Scammy right wing loons never stop fighting to chip off the most vulnerable 10% of everyone else.

February 18, 2022

I was recently pondering that the switch to electric cars means you can't drive around the country anonymously anymore. Even if you "own" one, and somehow disabled the built-in cellular tracker that does the "software updates", you can fill up a car at a gas station with cash but all the chargers are credit card only, and I expect the vehicle talks to the charger reporting its VIN.

I've long considered atheism a religion the way zero is a number, and that agnosticism is the "lack of belief", because you can't prove a negative. Whether it's the existence of thor, zeus, ganesha, amaterasu, pele, anansi, osiris, morrigan, sun wukong, baba yaga, or santa claus. Religious people have lots of distraction arguments, like "was santa claus based on a real historical person", because obviously if that person did exist that proves they're still around today as an omniscient arbiter of morality seeing you when you're sleeping or awake and judging you to be bad or good while conveyed by a flying sleigh around the world to millions of houses in a single night with a dimensionally transcendental toy sack filled by elves living at the north pole. (Frosty's hat is clearly "Emet" brand, and has a label on the brim.) But since we all know the songs and can name the reindeer, and when we were young our parents told us it was true, all that counts as evidence for it being true. And obviously you aren't qualified to argue Klingons aren't real if you don't speak Klingon and haven't first read every novel and seen every episode and movie Klingons appear in (including the cartoons!) to confirm that proof of their existence isn't in there somewhere.

But some people strongly disagree with "agnosticism" being the default, and consider atheism to be the "lack of belief" one. It's a semantic argument, really. Personally I got disgusted with the atheist "movement" a decade ago for its proselytizing (a la "I know better than you do what you really are, and what you must call yourself"), and then either turning into outright fascists or (eventually) stepping way back and reevaluating their lives. If you have missionaries, try to convert people, and have meetings about your belief structure, you just might be a religion. Firm belief in nothing is still a firm belief. Absence of information is not the basis for belief. Black swans turned out to exist. The saddest thing about Elon Musk (and it's a long list) is he did NOT put a Utah teapot into his stupid red sportscar he launched into space. In honor of Bertrand Russell, of course.

February 17, 2022

There are still avocados at HEB today, despite the import ban. (I'm not sure how much of this is "don't threaten US officials" and how much of it is the US government telling mexico to get its act together with all the reshoring we're doing there.)

New covid strain dropped. 30-50% more contagious than Omicron, as injurious as Delta (it's got that s-gene back), mutated enough to reinfect, detected in 47 states, just as the CDC is declaring the pandemic over (just like Denmark did). So looks like March is gonna be a fun month, although as long as you're fully vaccinated you should be ok, modulo no hospital beds for heart attacks, car crashes, appendicitis...

Meanwhile in global warming news...

Hmmm, got a bug report that sha1sum and friends treat directories as empty files, when they should treat them as -EISDIR and skip them with error_msg().

This is actually a library code problem: an awful lot of stuff should refuse to accept a directory in place of a file. There doesn't seem to be an faccess() variant that works on a file, just fstat(), so the check is doable but a bit more heavyweight than I like. (Eh, the overhead of copying a bunch of dentry data into a struct is probably swamped by the system call entry/exit code.) The two obvious places to put this are in loopfiles_rw() or in notstdio(), and although the second is the more thorough fix, adding a stat() to basically every open() call makes me wince, and auditing all the calls for correctness (69 xopen and 23 xcreate under toys/ according to a quick grep, plus more in lib/ and possibly main.c) seems uncomfortable. And those are just the DIRECT calls, not through something else.

Speaking of which, there are 48 loopfiles calls under toys/*/*.c (each of which goes through xopen), and just auditing those if that's where I put this fix is... annoying. But that's probably the first place to fix it.
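The check itself is tiny wherever it ends up living; here's a minimal standalone sketch (function name invented for illustration, this isn't the actual loopfiles_rw()/notstdio() plumbing):

```c
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>

// open() happily hands back a read-only fd for a directory, so one
// fstat() after the open tells us whether to bail with -EISDIR instead
// of treating the directory as an empty file.
int open_not_dir(const char *name)
{
  struct stat st;
  int fd = open(name, O_RDONLY);

  if (fd == -1) return -errno;
  if (!fstat(fd, &st) && S_ISDIR(st.st_mode)) {
    close(fd);
    return -EISDIR;
  }
  return fd;
}
```

The caller sees one of three things: a usable fd, -EISDIR to skip with error_msg(), or -errno from the failed open.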

I checked in bash and "echo hello < dirname" is not an error. (No, I'm not poking the maintainer. "The implementation is the specification" is hard enough to cope with, and then he keeps FIXING things I point out to him when I just want to understand why it's doing that.)

February 16, 2022

Naomi Wu just did a good thread about how Youtube's CEO "has decided to deliberately target channels this year that *aren't* breaking the rules but that she just doesn't like. One of the things Susan doesn't like is women in STEM channels since there are none in the top 100 tech channels." Here's a previous thread she did (and a study linked in the comments).

Meanwhile, since youtube's music playlists are now "ad for ulcerative coitus, one song, two 30 second ads for bitcoin, one longer song interrupted midway by another ad", I decided to dig up adb and use my USB cable to copy mp3s onto my phone, but "Android Play" has been discontinued in favor of youtube music. Android has LOST THE ABILITY to act as an ipod replacement, they've ceded that market entirely to iPhone (although the usual reason iPhone users give for 2/3 of the US phone market having gone iPhone is the blue/green texting thing. Android taking multiple years to maybe eventually also block Facebook's tracking cookies might also factor in somewhere, although Apple has plenty of its own problems there).

I am old enough to own easily rippable CDs, and digital downloads that aren't permanent rentals locked inside a walled garden ready to go away when they do. I have my own music, I wanna put it on my phone and listen to it, and stock Android is no longer capable of that. So I downloaded VLC for Android, which wants global read/write access to all files on my phone. No, I'm putting files in a directory and pointing it at them, IT DOES NOT GET WRITE ACCESS. But it can't do read access without write access, just net play. (Which sounds kinky, but I'm busy.) So I installed "simple music player" which does the normal "allow app to access photos and media". (That's what it needs to do its job, and if it uploads my cat pictures to Russia, oh well.)

February 15, 2022

Boomers solve nothing. Boomers can never resist letting Russia spend money at them, which is why they're so easily hired to push Russian agitprop. Sedition is just another commercial endorsement deal to them, like hawking shoes or burgers. They keep letting the same clowns cause trouble over and over, without learning anything ever. (Speaking of which, Canadian truck loons are still in the news.) Everything the Boomers did was ultimately self-defeating, and the lead poisoning combined with senility has made them really stupid.

Oligarchy uses racism as a shield and a distraction. (Keri Leigh Merritt wrote a whole book about this in 2017.) Meanwhile actual racists are stupid and spend their time manufacturing victimhood and being profoundly hypocritical.

Quartz' translation of the spotify apologia is weapons grade snark. And here's a long 4-part thread of actual analysis of the podcasts triggering the backlash. But the rot at spotify goes far beyond Joe Rogan.

Amazon needs to be broken up: turns out Audible's shafting creators too. Abolish ICE. Remove cars. Slumlords are a problem. An excellent analysis of why all blockchains suck. Uber is a scam. Facebook is also a scam. Youtube continues to suck. The great resignation is happening for many reasons. Religious parables are often kind of horrifying if you understand what they made a story about. Defund the police, then defund the police some more. Funds meant for other things keep getting diverted to police.

Don't forget how bad the guy behind the Comstock Act was. Historical censorship has always been insanely stupid, but the right wing loves endless creeping censorship, intimidation, voter suppression, and conspiracy.

The earn it act continues to be horrific. First they came for trans people, then they came for sex workers, then they came for gay teens (and teachers), then they came for abortion rights... How any of the "freeze peach" advocates support the end of anonymity and banning encryption in the name of being shriveled up old prudes is beyond me. (Sex workers look out for each other, the reason there's so much trafficking is they're prevented from doing so, and the ones who are preventing them from helping each other are the ones victimizing them: the same rich white men who are openly persecuting them are paying for the trafficking. It's the old "homophobe turns out to be gay" issue. No catholic priest gets to weigh in on this issue.)

No it's NOT the stimulus payments: inflation is happening because the cost of shipping things from china went from $3k/container to $30k/container with a month long wait to unload in the port of los angeles, and it goes back down when we finish onshoring/reshoring the manufacturing within NAFTA. Playing with the demand side (via money) only has an impact when supply can't keep up, it's SUPPLY constraints that cause inflation, not demand. The classic example is rare comic books where there's only a few dozen bidders for that issue in the WORLD and if you found a box of 1000 of Batman's first appearance in an attic somewhere the price of all of them combined would plummet to less than what ONE costs now. "Act now supplies running out" is a classic sales line trying to induce demand via the illusion of scarcity. Cornering the market raises the price, it's how Late Stage Capitalism works: when everyone is adequately supplied, the thing being produced is cheap. Only when there isn't enough to go around do prices get bid up. It doesn't matter what the DEMAND is, air is still free and clean water coming out of the tap is cheap enough to shower in the stuff without a timer. Inflation is happening because of disrupted SUPPLY chains, not a change in demand.

The reason I keep coming back to twitter isn't what twitter thinks it is. Unfortunately last year twitter was purchased by a right wing loon, just like the Wall Street Journal and Newsweek were. "Conservatives" are all about censorship. They want everything they say to be mandatory viewing (clockwork orange style when at all possible), but to be the only ones allowed to speak and immune from criticism.

Elon Musk, "incompetent narcissist". He should have it on his business cards. Guillotine the billionaires.

February 14, 2022

The Atlantic just ran an article on how the USA's doctor shortage is intentionally caused by the AMA imposing medical school graduation quotas, restricting visas for foreign doctors, and preventing nurses from taking on doctors' tasks. It's basically the same story from this excellent 2009 writeup, and of course I've done my own deep dives into the topic before: we can't fix healthcare in this country until the cartel intentionally screwing it up for profit gets broken up.

The reason for fiddling with struct string_list yesterday is I wanted to add struct dev_list next to it, because lsusb and lspci are reading basically the same file format for their device ID databases. There's a third layer to pci.ids, but toybox's lspci isn't using it (because it doesn't implement -v, which turns out to be kinda fraught).

But I've sort of changed my mind and decided that merging lspci and lsusb into the same file probably makes more sense? Because I don't see a third caller showing up any time soon, and neither of them is particularly big (each a little over 100 lines).

Meanwhile, I REALLY need to break up ps.c and move the shared code to lib/ps.c, and that's... a can of worms. But triaging commands to do a new round of documentation on them (videos) is resulting in various cleanups. As documentation always does. It's easier to fix it than explain it. (Which can be read as BOTH avoidance productivity and filing off corner cases that are too much backstory to digress into during the writeup.)

February 13, 2022

Reading the C99 specification is no fun. I forget what I was trying to look up but along the way I convinced myself it did NOT require unmentioned structure members to be zero initialized when you did &(struct boing){.potato=37} (I mean, it WORKS, and you can see the explicit initialization via objdump -d, but I wasn't sure it was RELIABLE because {1,2,3} and {.a=1,.b=2,.c=3} seemed to be treated differently by the c99-draft? The confusing part is that section says "Unnamed members of structure objects have indeterminate value even after initialization", but apparently section is the part that says to zero initialize them, and "elements or members" clarifies it applies to both styles of initialization. (According to grep '[)]{ *[.]' main.c lib/*.c toys/*/*.c the only command using member initialization in compound literals is host.c, which I rewrote NOT to do that and then didn't check in because... no, it's fine as is. I think.)
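A quick standalone demonstration of the behavior in question (struct and member names are the made-up ones from above):

```c
#include <assert.h>

// C99 zero-initialization check: members not named in a compound
// literal's designated initializer read back as zero, the same way
// trailing members do in a {1,2,3} style initializer.
struct boing { int potato; int tomato; char *name; };

int check_zeroed(void)
{
  struct boing *b = &(struct boing){.potato = 37};

  // .tomato and .name were never mentioned, but must be zero/NULL.
  return b->potato == 37 && !b->tomato && !b->name;
}
```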

I then tried to look up what "unnamed members" _ARE_ but it seems to be 1) a bitfield thing (might have seen that but I can never trust bitfield endianness and the compilers traditionally generate HORRIBLE code for bitfield access, so I always shift and mask myself) and 2) a way to add padding to a struct? (Never used it, never encountered it. Lots of stuff adds padding to structs but they always call it stuff like char __unused1[2];)

Ah right, I was trying to look up whether the variable length array element at the end of a struct should have [0] or just [] and it's section that says it's char blah[]; with nothing in it. (It of course adds some nonsense about access being "undefined" and then immediately subverts it via the examples in 16-18, and I refuse to be baited twice by the same standard in the same session. This one I KNOW, I'm just making sure that removing the 0 from struct string_list in lib/lib.h is the correct thing to do. They're equivalent with or without the 0, but static checkers throw a wobbly and people keep trying to use them...)
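For the record, the pattern the [0]-vs-[] question is about looks like this (the struct shape matches lib/lib.h's string_list, but the allocator function here is invented for illustration):

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

// Flexible array member, C99 style: the last member is "char str[];"
// with nothing in the brackets, and sizeof(struct string_list) excludes
// it. The old gcc extension spelled this "char str[0];". Either way the
// point is allocating header plus payload with one malloc.
struct string_list {
  struct string_list *next;
  char str[];
};

struct string_list *new_string(const char *s)
{
  struct string_list *sl = malloc(sizeof(*sl) + strlen(s) + 1);

  if (sl) {
    sl->next = 0;
    strcpy(sl->str, s);
  }
  return sl;
}
```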

Heh, I apparently need a current writeup of my whole "Linux uncontaminated by gnu" plan to make a system you CANNOT call "gnu-slash-linux" because it isn't even built with gcc. I've mentioned this multiple times over the years, I think it was in my 2013 "rise and fall of copyleft" talk at ohio linuxfest for example, and probably on the old aboriginal linux mailing list long before that. Heck, it was probably in my old livejournal back in the busybox days, and a quick google finds people referencing me talking about it back in 2007. And, of course, my original sin that interested me in undoing the damage from the toxic waste spill I'd initially triggered in the first place...

February 12, 2022

Apparently the reason for the boil water notice (attachment B, page 1, "How exactly did this happen") is that Austin adds lime (calcium carbonate) to the water to soften it, and somebody left the lime running overnight until it clogged up the filters. Whether this was related to somebody being unable to get to work because of the ice storm is not specified.

Yay, I figured out how to switch the keyboard backlight in this laptop on and off. (Despite poking the geology building's management at UT a couple months back, they have yet to fix the timer that turns the lights on at midnight, so if I get there at 10pm I'm in the dark for a couple hours, and the keyboard backlight is very convenient.) It's function right-arrow. While fiddling with that, I toggled something with a moon symbol, and something that's three squares fit together. (I was thinking it MIGHT be a keyboard but it's probably supposed to look like the mousepad and the left and right click buttons under it?) I hope selecting whatever it is an even number of times turns it back off so I don't notice some weird new symptom I forget how I triggered days from now. Ah, it turns out there is a webpage explaining that stuff, and most of it doesn't work on Linux anyway. Oh good.

I'm going through lots of little toybox commands that might make good short intro videos I can both get out easily and use to learn video editing. (This is resulting in lots of little cleanups to commands I haven't looked at in a while.) There's a lot of "I need to explain everything before I explain everything else" circular dependency stuff in this, but there's pretty clearly gonna have to be playlists: how to use commands, how commands are implemented, common infrastructure, building bootable Linux systems...

I want to do videos with GROUPS of small commands, because true, false, clear, reset, sync... they're all basically one function call. Another genre is "open a file and call an ioctl on it", maybe with or without a loopfiles(), which covers freeramdisk, fsfreeze, partprobe... (Huh, I have a "detach.c" in my tree that isn't checked in, I should look into that...) It seems a bit silly to have a "how to use unlink" video, and then an "unlink.c implementation walkthrough" when each is under a minute: the implementation is "if (unlink(*toys.optargs)) error_exit("message");" plus the usual command boilerplate and help text.
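That whole "open a file and call an ioctl on it" genre boils down to one shape, sketched here with an invented helper name (freeramdisk/fsfreeze/partprobe each supply their own device path, request number, and argument):

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <assert.h>

// The entire command is open, ioctl, close, plus error plumbing.
// Returns the ioctl result, or -1 if the open failed.
int ioctl_on(const char *path, unsigned long request, void *arg)
{
  int fd = open(path, O_RDONLY), rc = -1;

  if (fd != -1) {
    rc = ioctl(fd, request, arg);
    close(fd);
  }
  return rc;
}
```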

But at the same time if each command has its usage video and its implementation video, it's easier to do a table of them. A few commands (ps, sed, sh...) will probably need multiple implementation videos though. I REALLY need to do the breakup work on ps.c to make the 5 commands in it have their own files and move the rest to lib/ps.c before doing an explainer video on that. (The shell and sed are just big, I expect awk and vi to be too. I see tar.c has crept over 1000 lines too, that one's not CONCEPTUALLY hard to explain but somewhat voluminous.)

Everything else (that's completed and not in pending) probably fits reasonably well into a single implementation walkthrough video that isn't exhausting to watch. But even the "how to use" video for sed or sh is gonna be a bit of a marathon viewing if it's all one go...

February 11, 2022

To avoid burying the lead: people are organizing against the latest monstrosity, and here's Stephen Fry on Graham Norton decades ago explaining the fundamental problem.

So the anti-sex evangelical loons have a new bill to make showing ankle on the internet illegal: the Earn It Act. (Of course the main people stopping sex trafficking are sex workers, but victim-blaming is fundamental to christianity. Eve listened to the snake that God made and let into the garden, so God held a grudge for thousands of years and had to arrange a human sacrifice to give himself permission to (conditionally) forgive their descendants. That is literally the core story of christianity, so if the church admits Eden is a "metaphor" then original sin needing to be "forgiven" by this benevolent god is...?)

Wikipedia blacked out to stop SOPA and PIPA, and white guys spoke out against it, but not FOSTA and SESTA because that was "icky speech". Germany's nazi party started out by treating trans people as non-human, and expanded it to concentration camps for millions of people. Of COURSE the misogynist evangelicals going after camgirls won't stop there, they want to rigidly censor the internet and bring back the Comstock Act.

But the real driver is to force everyone participating in the great resignation back into the crappy jobs they left making plutocrats money. And yes, they're still using other tools like civil forfeiture (except that example was Amazon convincing a friendly judge to steal the money straight from bank accounts), and of course both of the news stories linked from that gofundme are behind paywalls.

The evangelical clowns burning a zillion books are never addressing actual problems. The actual fixes are all undoing damage the GOP intentionally caused in the first place (or at least pushing back against it). There are zero bright lines with prudery, just one big long gradient smoothly shading through every possible grey. The censors will never have enough, and will keep grasping until the Boomers die and we end capitalism.

Speaking of capitalism, Faceboot's cover-up of its role in Cambridge Analytica just blew up in its face. The billionaire whose seed money was inheriting blood emeralds from apartheid south africa might be a giant racist (who could have seen that coming) and he continues to be arrogantly incompetent.

February 10, 2022

It warmed up a bit, and I went to the table and actually stayed for a while and got work done. (Also, HEB restocked on my lovely checkerboard tea.) Enjoyable evening, on the whole.

Except I tried to walk down to the river first (and then walk back TO the table) to get a longer walk since it's an exercise-poor week, and the construction downtown is just NUTS. And extremely pedestrian unfriendly. You can't walk south on the east side of the stadium, it's all fenced off, so I kept going east along the fence until it finally gave out IN THE I-35 FRONTAGE ROAD. No sidewalk, having to walk IN the road for multiple blocks down to... somewhere past MLK I think? With a 4 foot high crash divider and a high chain link fence right on the other side of it. And then when I finally could get back out of the road and went west a block before continuing south... that road gave out after half a block and was all fenced off by construction and cranes. So I went back north, another block west, started south again... same thing. Wound up walking back to the table from there (still through a lot of construction and backtracking, but less bad when I got away from I-35).

I have no idea what they're doing. I vaguely expect it's related to the I-35 expansion that's tearing down the upper deck and killing the bat colony at 42nd and turning the whole thing into a belowground flooding deathtrap? Yes, looks like it is. What a horrible project. It's ALREADY making downtown unwalkable just in the CONSTRUCTION phase (no pedestrian through access for literally multiple blocks). We've known about induced demand for decades now; expanding the highway is totally self-defeating. The proper thing to do is reduce car dependence by eliminating single family zoning and parking minimums and so on. You don't have to drive when you can walk to the dentist, but you can't walk to the dentist if zoning doesn't allow a dentist to set up shop within a mile of your house, and literally half the space in your city is legally required to be parking lot.

February 9, 2022

It's fascinating, I can never remember the name of the hackermonthly PDF (let alone the URL for the original slightly wrong explanation I posted to the busybox mailing list off the top of my head a dozen years ago) so I googled for "landley usr bin" to find it and...

No, Ken and Dennis did not have floppies, the PDP-7 used a ramdisk loaded from paper tape. (And /usr wasn't "renamed", it was the user directories with a three letter name like bin, etc, lib, and dev because the pdp-7 had 1024 18-bit words with half running the OS and half being used as a ramdisk so yes the extra letter counted and they kept the name when they moved to the pdp-11.)

No, dynamic linking did not solve any problems with static linking, it made things WORSE so that the proposed "mount this, then that" approach wouldn't work reliably. (And yes, the a.out executable format had shared libraries back before ELF.)

Obviously I can't really throw stones when my own first stab at the explanation got the disk sizes wrong. (I remembered the TOTAL size but not how it was partitioned, nor how MUCH faster the tiny system disk was than the external disk packs. In modern terms they filled up their SSD so leaked the system onto the external USB disk holding user files, and then got a second USB disk later to move the user files to so the system could have all the first one.) That's why I went back and corrected it when a magazine asked to republish it, and gave links to primary sources at the end.

February 8, 2022

I have filled out the "Embark" online paperwork. I already know one thing I got wrong, but that's normal for me filling out paperwork: back in middle school I expected about an 85% on tests where I knew the material cold, just because it was paperwork so I would of course fill some of it out wrong, miss the back of one of the sheets (of the instructions if not the test), and then hand it in to the wrong place. (Programming is all about iteratively fixing the thinkos when the machine points out where you got it wrong, sometimes backing up and frogging entire sections when something's misaligned.)

To relax, I'm trying to get toysh to match bash's behavior for moved and deleted current working directories. I already added a test:

mkdir -p one/two/three
testing 'cd in renamed dir' \
  'cd one/two/three && mv ../../../{one,four} && cd .. && echo ${PWD: -9:9}' \
  '/four/two\n' '' ''
rm -rf one
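
(For anyone squinting at the ${PWD: -9:9} in that test: it's bash substring expansion taking the last 9 characters, and the space before the -9 matters, because ${PWD:-9} would parse as the "default value" syntax instead. A standalone sketch:)

```shell
# bash substring expansion: ${var:offset:length}. A negative offset counts
# back from the end of the string, but it needs a space (or parentheses)
# so it isn't parsed as the ${var:-default} fallback syntax.
s=/home/user/four/two
echo "${s: -9:9}"    # last nine characters: /four/two
echo "${s:(-9):9}"   # same thing, parenthesized form
```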

Which the host passes but toysh doesn't, and the logic is sadly a bit tangled. But then bash is a bit confused too. When launched in a deleted current working directory, bash goes:

$ env -i bash --norc --noprofile --noediting

shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory


Except when I ran it under strace and piped the output to less so I could search it (even with --noediting, bash and less both try to read from the same tty and get alternating letters until you hit control-C, which kills the bash instance but not the less instance for some reason; probably gotta make my less implementation do that too eventually), there was only ONE call to getcwd() (with a fixed size 4k buffer instead of an automatic allocation), so I have no idea what it's talking about with "parent directories" plural?
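
(Easy enough to reproduce without strace if you want to see the complaint yourself; a sketch, and the exact wording may vary by bash version:)

```shell
# Start bash in a directory that no longer exists and watch it complain
# at startup, before it runs anything:
d=$(mktemp -d)
cd "$d" && rmdir "$d"
env -i bash --norc --noprofile -c true
# stderr: shell-init: error retrieving current directory: getcwd: ...
```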

I added support for "cd -" while I was there (and already emailed Chet back on January 20 to add it to "help cd" in bash because seriously, it's in posix: 'cd -' is 'cd "$OLDPWD"' the same way 'cd ~' is 'cd "$HOME"').
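
(In case "cd -" is unfamiliar: it swaps back to the previous directory and echoes where it landed, exactly as if you'd typed cd "$OLDPWD" && pwd. A quick sketch:)

```shell
# "cd -" is shorthand for: cd "$OLDPWD" && pwd
cd /tmp
cd /
cd -     # prints /tmp, and we're back in /tmp
pwd      # /tmp
```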

It was already exporting $PWD and $OLDPWD only on the first cd, because that's what bash does (and yes I should add a test):

~$ cd sub
~/sub$ cd ..
~$ unset OLDPWD
~$ cd sub
$ declare -p OLDPWD
declare -- OLDPWD="/home/landley"
~/sub$ declare -p PWD
declare -x PWD="/home/landley/sub"
~/sub$ unset PWD
~/sub$ cd ..
~$ declare -p PWD
declare -- PWD="/home/landley"
~$ env -i bash --norc --noprofile -c -- 'cd sub && env'

It SETS them both each time you cd, but only _exports_ them the first time, and just does a local assignment the rest of the time. And I do mean "local":

$ x() { local PWD; cd .; }
$ unset PWD
$ x
$ echo $PWD

So I think my code was already getting that right (I need to add more tests) but before it was wasting an entire int for this in the TT globals block and I switched it to using the high bit of TT.options, and collated the export statements under the same test.
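
(The trick there, for anyone following along: instead of dedicating a whole variable to one boolean, fold it into a spare bit of a flags word you already have. Here's the same idea as shell arithmetic, with made-up names; the real thing is C in toysh's globals:)

```shell
# Pack a boolean into the high bit of an existing options word instead of
# burning a separate int on it. (Names here are invented for the demo.)
FLAG_EXPORTED=$((1<<31))
options=5                                  # pretend some bits are already set
options=$((options | FLAG_EXPORTED))       # set the flag
[ $((options & FLAG_EXPORTED)) -ne 0 ] && echo "already exported once"
options=$((options & ~FLAG_EXPORTED))      # clear it; the other bits survive
echo "$options"                            # 5
```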

When I say I'm not just writing a shell but specifically a BASH replacement, I'm SERIOUS about that. My spec is not posix, or even the bash man page. It's WHAT BASH DOES. Tied to the comfy chair, poked with the soft cushions, and added to the regression test suite. I'm reluctant to communicate with Chet about stuff because he keeps FIXING THINGS I POINT OUT and thus changing the behavior. Which probably breaks somebody's script somewhere...

February 7, 2022

Still not sleeping well, because anxiety. I did not used to have anxiety. It comes and goes, but having it lurking is INCREDIBLY ANNOYING.

Corporations are not your friend, and regulatory capture is a thing.

Intellectual property law destroys intellectual property. Ten years ago the netflix DVD library had 100,000 titles in it, but the online streaming version they transitioned to keeps losing content and these days has 15,000 and shrinking. All the services are like that: Youtube just showed me a clip from something called "Sakura Trick" which Google says came out in 2014 and was on Crunchyroll, but of course it's not on Crunchyroll anymore. It's about two high schoolers dating; if they were heterosexual it would be "cute" but if they're gay cue the pearl clutching about corrupting the youth. Me, I'm thinking it might be better "I want to learn japanese" background noise to put on without subtitles than Hulu's run of the original Sailor Moon, because watching Usagi murder people and be lauded for it "because protagonist" gets old. Each week she kills somebody who was passing as human just fine at the start of the episode, and I have yet to see any of her opponents actually permanently harm anyone? As I said, watching without subtitles so I couldn't tell you much about the plot. It's hard to find things that aren't so bad they're unwatchable but aren't good enough I turn on the subtitles to follow them after a few minutes.

Of course the Boomer extinction burst of evangelical prudishness is cover for late stage capitalism misogynistically shoving self-employed young women back into retail jobs, and the war on cash (no untracked purchases for-profit corporations can't individually veto and PUNISH you for trying!) has literally escalated to police seizing armored cars. (And no, crypto solves nothing.)

I'm confused by Mark Cuban. He's a billionaire, and he's maybe possibly doing a good thing? Most likely because he wants to run for president (which should DEFINITELY NOT HAPPEN), so it's really for his own personal gain, but... I wouldn't want to STOP him from subsidizing affordable medications, if it is true? It's not Mackenzie Scott's get-out-of-guillotine free pass (checkboxes: did not voluntarily become a billionaire, is convincingly and consistently attempting to STOP being a billionaire, has embraced Uncle Ben's Spider-Mantra in doing so). Mark Cuban isn't absolving himself of the slumlord guilt of having taken a billion dollars from poor people and hoarding it (starting by giving it BACK), he's doing the "little people" a favor by pushing back against a specific societal problem that plutocrats have been consistently exacerbating since 1970 (here's a lovely writeup on that). But that IS one heck of a favor... Maybe it means he goes to the back of the line leading up to the guillotine? Hmmm... (Obviously it's not my call, but I'm trying to figure out what to lobby for. In a just society, hoarding a billion dollars while anybody is homeless would be a capital offense.)

This is an unusual exercise because most billionaires constantly broadcast how bad they are, and even PR-minded twerps like Elon Musk are much clearer: sure Muskrat's cheerleading the advance of cool tech, but none of it's happening because of him. He bought Tesla, he bought Solar City, he bought Maxwell. All the advances in electric cars and solar panels and batteries were already happening and he literally purchased credit for other people's ongoing work. ("Oh but they might not have been sufficiently funded otherwise" that's only because for a hundred years the USA massively funded basic research but we STOPPED when Reagan lowered the anti-billionaire tax shield until plutocrats could hijack the government and iteratively "drown it in a bathtub". The Boomers have been eating the seed corn since they took over.) Musk's assets have been mortgaged to the hilt for decades, all of Tesla's money (like Uber's) comes from friendly financiers willing to hand him tiny slices of the $21 trillion the USA printed after the 2008 mortgage crisis and gave to the BANKS rather than using it to bail out any homeowners or students with loans or the piles of predatory credit card debt... That man is a scam artist. Adam Something and Common Sense Skeptic have multiple video takedowns of his endless dumb ideas. Ford's F150 Lightning is still restricted by the dealership model, and even though they're inching away from that model, they're hamstrung by stupid protectionist laws from years ago, and nothing Musk did to punch holes in that for Tesla fixed the dealership problem for anyone else, because he's not ABOUT fixing anything for anyone else. He's about getting a spotlight shone on him as a savior because he inherited african blood emeralds during Apartheid which gave him seed money to hang out with finance bros during the 1990s dot-com boom.

But even without the threshold being "net positive", it's rare to see a billionaire where guillotining them immediately would have ANY discernable downside. Encountering a second brings us from "Spiders Georg" territory into... black swan event maybe?

February 6, 2022

Walked out to the table for the first time all week. It's been below freezing each night, and the UT Carillon plays "I've been working on the railroad" (no really!) twice through very slowly starting at 9pm, but in between there's a few hours where it's quiet but not TOO cold yet.

Didn't really get anything done, but oh wow I needed the exercise. Couple hours of walking. Listened to podcasts in the wifi headphones. Good times.

February 5, 2022

Austin has issued a "boil water" notice. Not due to the freeze, due to "human error" which usually means infrastructure mismanagement and underfunding/understaffing.

Various links from my talk went up on the golug website. Basically stuff I'd pasted into the chat channel. There's no archive of the video that I can find, and the page itself doesn't seem to be set up to have a persistent link to any of the entries. Presumably will snapshot it at some point? Still, I'm glad there was interest, and I need to make my own videos on this.

Some good political news: multiple federal courts have struck down republican gerrymanders in upwards of 5 states. (Doesn't stop the voter suppression, propaganda networks, replacing the people doing the counting with partisan hacks... But still, step in the right direction.)

February 4, 2022

So it looks like logging into Google Embark (their contractor payment thingy) with my main email address has a hiccup, namely that I break everything.

I've had the google account my email is spam filtered through for many years, through multiple upgrades of Google's systems, and it's... kinda borked. For example, to this day I have to log OUT of google to use because otherwise it's not an "authorized service".

Elliott recently pointed me at a bug in (that's what those /b/ links translate to for the rest of us) and I should probably log in to comment on it, but I used my "work account" (at to view it. The gmail account my personal email goes through is some sort of "google domain" organizational thingy set up in 2008-ish by a friend who has since undergone a religious conversion and moved to Europe. Once upon a time gmail wouldn't accept mail for other domains without special setup, and if I ever had the domain-side login to that it would be a half-dozen laptops ago.

Both the organization account and my previous individual Google account had the login "landley" (yes, at the same time: initially different namespaces), and when Google collapsed them together it had a name collision and got itself VERY confused (meaning I couldn't use Google Plus for the first two years of its existence), and Google periodically sends me emails like this one from Thursday:

We are writing to let you know that your G Suite legacy free edition will no longer be available starting July 1, 2022. To maintain your services and accounts, you need to upgrade to Google Workspace.

Which I have historically ignored and they auto-migrated whatever they had to and email still worked so I left it alone, but I do not consider an account with a pending Windows Scheduled Shutdown timeout to be load bearing, and if it STOPS passing email I move the DNS record to dreamhost's mail servers and update the pop3 and smtp server addresses in my client.

I've been thinking of preemptively pointing my domain's MX record at dreamhost's servers anyway to cut the gordian knot, but to be honest have been reluctant to touch it. And now I need a Google account to use Google vendor stuff and should probably interact with android's bug system directly and...

The thing is, Google's "Embark" draws a bright line between "gmail" and "non-gmail" addresses for some reason, and moving the MX record in dreamhost's nameservers (where the domain is registered) doesn't clean out the account association on Google's side: it still thinks "" goes with the 15 year old fused pair of gmail accounts that have undergone multiple system migrations. The tutorial video warns that if I try to tell Embark my email is an external address when it smells gmail, it will throw an "email mismatch error".

I could use my phone's gmail account, but my phone has NEVER had authorization to spend money, except for one fixed-size gift card I fed it back in Milwaukee. (The problem with programmer paranoia is it tends to be justified, not that intel is any better. It's not that the phone is inherently less secure, it's that the always-on orwellian tracking device with the whole-room microphone, camera pointed at the user's face, and second-by-second GPS tracking is a much higher profile target, and reusing google accounts between phone and payroll smells WAY too much like reusing passwords.)

Besides, until recently I just created a new gmail account for each phone. I kept it for the most recent phone because I remembered the login/password and because I wanted my youtube music playlist, but youtube has lost that capability and no longer lets you listen to two consecutive songs without 60 seconds of advertising in between. Youtube has taken away the POINT of having a persistent account for phones, and I'll probably just create another one once the Pixel 3a stops being supported (or allowing battery replacements).

I asked Elliott if there was somebody at Google I could talk to who could maybe apply cleansing flame to the situation, and he's never even MET anybody who works with gmail, and doesn't believe there are any at the Googleplex. (My own theory is it's entirely automated.)

Anyway, if you're wondering why I compulsively document and make everything be a self-contained reproduction sequence script with minimal external dependencies and regression tests... (And also why I use a vanilla unmodified distro on my laptop with a fresh reinstall each version upgrade and a written checklist of setup changes, and why I leave my phone running the stock Android version with nothing sideloaded.) Bit-rot is sadly real: the context shifts out from under you, the space something lived in stops being maintained, migration copies symlinks but not what they point to (at a dozen different levels of abstraction), they back up the data but not the reader/parser...

I got inoculated against this early: by the time I'd moved from a C64 to Amiga to DOS to OS/2 to Linux I was ALREADY expecting "this too shall pass" and becoming a digital pack-rat. But it's hard to be paranoid ENOUGH...

February 3, 2022

Austin is paralyzed by an ice storm. HEB is open but understaffed, which is ok because there's almost nobody here. (And then they closed at 5pm.) This is nothing like last year's blizzard, though; this is just our version of a snow day. We had a worse ice storm than this the first year I moved here. (That was over an inch of ice on everything, this is like, a third of an inch tops. Makes the trees pretty, but they'll live.)

Politics is unchanged: the GOP is a terrorist organization, missouri has legalized lynching, and Oklahoma just outlawed teaching evolution. (Because if christianity is wrong about where we came from, why would it know where we're going to? Must preserve useful societal control where the man molesting the choirboys is the man telling the choirboy what is and isn't moral, and has the child confess all thoughts to him in a small booth where holding back any private thoughts the child DOESN'T confess is its own sin, and then if the child breaks the seal of the confessional to show where on the doll the man in the frock touched him, he will be tortured in hell forever. And when the boy grows up, his stockholm syndrome will fight sex workers and defend priests.)

February 2, 2022

Yay, Lars pointed me at the Google individual contractor signup process (but warned that there are usage limits in a given year, presumably fallout from the Microsoft permatemps lawsuit). So I don't have to go visit my credit union to open a new bank account and so on, yay. They have an unlisted youtube video walking people through the web wizard. (I'm not sure if Embark is the name of the new system or just the wizard part. Still, it's quite straightforward, and I should totally make the account and do the thing. Any time now...)

I poked Jeff about whether he's running payroll, and he wanted me to commit to working for him full time and blowing off Google as a condition of paying me for this month. He wouldn't say it in so many words, of course, but "being committed"... He IS my friend, and I'm sympathetic to his position (he's understaffed and trying to hit ambitious deadlines to get the next tranche of funding), and I want to see everybody's projects succeed, but I'm still recovering from stress burnout from Stenograph (and the pandemic and the trump administration right after riding down SEI in the first place) and CANNOT overcommit myself right now. (I'm already to the "want to lie on the couch for a week and do nothing" stage.)

Speaking of overcommitting myself, I really need to do the toybox videos I promised my Patreon supporters. I have scripts, I need to sit down and learn blender video editing (the way I need to sit down and make that Embark account and track down my shoe size and blood type and whatever else it needs; finish watching the videos first, probably. Yes the video editor is a tiny bump on the side of a large project, but it's no sillier than Netscape having a built-in email client was, and I used that for years. This one does not require installing half of KDE to boot it.)

Tonight I gave a mkroot talk at Orlando Linux User Group (because their call for talks went across the Silicon Valley LUG list I'm still subscribed to from when Jeff and I gave a talk there years ago, and I decided a practice run at the mkroot talk I need to do for patreon wouldn't hurt). They're using a web page called that can screen share on Linux assuming you're not using any CPU for anything else. (I have 4 processors in this laptop and it used half of each of them just to do the tiled Google Meet thing even when I wasn't presenting.) I walked people through the three sections of the mkroot source (setup, creating the root filesystem, kernel build and packaging), and referenced the initramfs docs I wrote years ago (and need to update: the new domain for the web archive links is), and the miniconfig docs, plus links to some of the kernel patches I've sent upstream over the years to actually fix issues I'm working around. (Of course they were ignored, I usually give up after resubmitting 3 times, and the successful ones need to be consistently resubmitted over and over for at least five years with full-on dead parrot sketch levels of explaining why.) I even briefly went into the dropbear build to show what adding packages looks like. During the Q&A I diverged into 0BSD with a little backstory of how I wound up there...

Now I need to figure out how to break that material up into a series of 5-10 minute videos that progress logically but aren't TOO hard to understand out of context. Hmmm...

HEB was positively SLAMMED today because there's an ice storm coming, so the line to use the self-service checkout machines ran all the way around the deli and into the produce aisles. They apologized for being understaffed (despite offering $17.50/hr to start according to the signs), but it's panic buying. Last year we went down under 10 degrees fahrenheit and STAYED there, and all the pipes froze, including the gas pipes leading to the turbines generating electricity: the gas comes straight up out of the holes they drilled and into the pipes with no processing, which means they didn't bother to remove the steam and such mixed into it, which means if it's cold enough ice plates up inside the pipes until they close up.

Meanwhile solar panels are self-de-icing if you install them at a slight angle, because the parts that aren't lit up act as resistors and heat up when current from the lit bits tries to flow through them, so the snow on them melts at the bottom until it slides off, and the newly revealed bits collect more electricity to heat up the remaining parts faster until it's all clear and dry. (This is why casting a shadow over part of a solar panel disproportionately reduces the amount generated, and they use microinverters to wire them up individually instead of in series when they can, so the shaded ones merely don't generate instead of consuming the power produced by the other panels).

But that was a hard freeze lasting days. This is just getting down into the 20s overnight (below freezing but not by that much) with rain putting a thin layer of ice over everything, and then going back up above freezing during the day and then down into the 20s again the following night (repeat for most of a week). That's not even climate change, Austin was doing that when I moved here in 1996. (We have six weeks of winter every year, they're just not consecutive. Last year we got in trouble because two of them WERE consecutive: the destruction of the Polar Vortex means nothing's keeping the arctic winds up where they belong anymore. So the ice up there doesn't get built up and maintained, and all the cold leaks down here where it's just gonna melt again immediately. Leaving the freezer door open does not mean your frozen peas are going to last longer just because it feels so cold in the kitchen. It's a BAD sign, not a good one. The cold should not be down here, it should be up there: it went walkabout because the oil companies screwed up the weather.)

February 1, 2022

Turned 50 today. Forgot until well after I woke up, and then wound up taking the day off and (let's be honest) moping.

Fade ordered me a fresh set of headphones. Yay. (I do break them every 6 months or so; they have planned obsolescence plastic hinges that are always the first thing to go. If those were metal I'd probably still be on my original pair. Capitalism again.)

January 31, 2022

The plan for today was to file new LLC paperwork at the Secretary of State's office when they opened at 8:30 am, but it was pouring out when I woke up. I tried to wait it out (the #10 bus picks up across the street but I still have to walk several blocks downtown and then I'd planned to walk to UT from there to work with my laptop), but by around 2pm it was clear there wasn't going to be a good gap so I went anyway.

Talked to the LLC specialist in the little booth for half an hour, and came away with a stack of paperwork to fill out and return. They say that with covid it could take up to 40 days to process, although I can bribe them an extra $25 to cut that in half, so that's nice.

I have to do name clearance searches for a new name because my old LLC from 2008 (Impact Linux) was apparently never properly shut down and still sort of zombie-exists but not in a useful way. (It went "inactive" in 2011 due to unpaid annual continuing-to-exist fees. The person who handled that at the time was born again into one of those religions where you get a new name, dress differently, and move away from everyone who knew the old you.) At this point properly killing the old LLC off means fully resurrecting it first, which is MORE paperwork and fees than just starting over with name clearance searches for a new name, which is like finding an unused domain or twitter login, only worse and probably requiring trademark searches too (which the initial state search doesn't do). Decisions, decisions...

Got caught in the rain on the way back, dried off at the Wendy's in Jester Center. Which is open 10am to 5pm, and for once I was there during daylight hours. Bought a new frosty keychain and everything, although sadly I didn't have those coupons with me. And then trying to walk home from UT during what SEEMED to be a break in the rain, it started pouring enough I gave up and asked Fade to order me a Lyft while huddling in a parking garage. (I haven't got the app on my phone; nothing on there is authorized to spend money. Yeah, I should have just taken the bus home, but it LOOKED like a gap in the rain. Tricksy weather.)

January 30, 2022

I went out to jack in the box today (the sign on Wendy's said the lobby is only open for take-out orders, which if true would be an improvement) and got a combo. They were out of the #1 and #2 combos but still had the junior burgers. Their self-serve soda machine is down but they can still fill sodas from the drive-in machine. I sat at the wall table near the outlets and read fanfic. Then I went into HEB and bought a can of milk tea from the interdimensional foods aisle and read more fanfic. Then I went home and napped while my electronics recharged, and watched some anime with Fade in the evening, and walked out to the table (2 miles each way, it's good exercise).

(Ever since high school I've tended to absorb the writing style of what I've been reading recently, and that author does a lot of matter of fact descriptive statements. You have been warned.)

At the table, but... really tired. I am not load bearing when it comes to ANY additional stress right now, not even GOOD stress. I am highly pleased to be starting work on Google's toybox hermetic build todo list, but tomorrow I have to fill out paperwork with the state of texas (can I reuse the name Impact Linux from 2008, or do I need to find another one that passes a conflict search), and then do I need a federal EIN or not? I finally got the tax envelope from Triple Crown (now that the box of stuff has returned to Stenograph; packed it as well as I could with those air baggies from amazon and wads of paper, but that thinkpad never had a case, I hope it made it there ok) and need to file taxes for 2021 soon, I hope Juanita can tell me how to do the "passthrough ignore" thing for a single person corporation.

I want to get work done. I need to get a lot of todo items cleared. I wrote a bunch of things I need to remember to do on the back of an envelope this morning, and of course left it at home.

I did a little programming doodle yesterday, which was great stress relief. That's the kind of programming I do for fun. I was going to do a quick ldd and posted about it on the list in hopes somebody had a trick to make dlopen() work from a statically linked context (ideally to portably poke a shared library loader to tell me about foreign libraries in a chroot context; I want the load address and maybe whether all the symbols resolve, but I don't need to USE the result), and Elliott reacted like I'd tried to poison his cat. (We've been in a long thread about it all month.)

My working style has always spun off todo items faster than I can close them, and the really satisfying work sessions are the ones where I can close tabs, finish stuff, document it, and ship it. I'm trying to clear my plate before diving into a new chunk of todo, and... I'm having the programmer version of weak noodle-arm can't lift anything. I'm TIRED.

I'm about to sign up to serve two masters, because Jeff's doing the J-Core ASIC that I REALLY WANT TO SEE SHIP, and Google's finally ready to get serious about toybox in a way I REALLY WANT TO SEE FINISH. And first I have to work out the division of my time between them (both want me full time; I would LOVE to do both full time) and do a bunch of paperwork.

I need to do more toybox videos, those should probably be weekly. And I signed up for a lightning talk at golug on wednesday (the topic is an intro to mkroot), and Jeff and I are speaking about making an open ASIC at Japan Technical Jamboree #78. Both online via different streaming clients, of course.

Still haven't decided if I should submit an ELC talk. My "not sure I really want to go" talk strategy has been to submit ONE proposal instead of the usual four or five for the committee to chose from, but then I feel rejected when they don't select it. I'm aware that makes no sense, and yet I know my failure modes. And coping strategies. I have decades of experience representing a project rather than myself, and met my wife when she was working the SJGames table at various conventions I was pitching Linucon at. I could be poised and confident pitching the convention to potential vendors because it was a great convention and other people not seeing that was their loss. I was just coincidentally creating it, the work stands on its own. But "would you like me to talk" can't avoid being at least a little about me, it's not a context in which I can easily intellectualize empirical metrics to moot a subjective judgement, and I can't dismiss it with "this person's opinion doesn't matter" when I'm the one who asked to be judged.

(Oddly, if I submit 4 topics and they choose none of them, it's easy to dismiss the conference because clearly our interests do not overlap. It's less of a personal rejection. I didn't ask them to trust my judgement about what would make a good talk but offered them a menu, and they didn't want anything on that menu because they don't get why any of that is important. Ok then.)

January 29, 2022

Sigh, I need to vent but I'm too tired. Fuzzy and I tried to walk through the Wendy's drive-through today (corporate is still mailing out coupons) but they were too understaffed to serve food. (When we knocked on the little window, the guy behind the register ran away, and the woman who (eventually) answered said come back in half an hour.)

Here's a good story about how the last wave of american prudish censorship got beaten back in the 1980s (between panics about dungeons and dragons being witchcraft and AIDS being a judgement from some god or other). When television ruled the culture they got a stranglehold on broadcast television and vetoed every image and word spoken. Now that late stage capitalism has its tentacles everywhere they're making moral judgements about every dollar spent and rendering those they deem insufficiently pure "unbanked", and the gerontocracy's promises about properly funding the IRS to get billionaires and corporations to actually pay taxes turn into monitoring the poor to make sure the have-nots pay every penny of tax on every tip and side hustle. (Of course the increase in the IRS's budget is all about punching down, not up.) Here's a good link about how plutocracy is a problem from my wife's tumblr, and Dr. Sarah Taber pointing out that the war on camgirls is part of plutocracy pushing back against the great resignation: how dare anyone pay their bills WITHOUT flipping burgers for a large corporation.

A clear explanation of how modern christianity turned into a toxic death cult of doomsday preppers, and thus why church attendance is plummeting.

David Graeber talked about "direct action", which takes a lot of forms.

January 28, 2022

YES! Google has approved funding to sponsor toybox work! (Specifically to advance the "hermetic build" work I've been poking at for many years.)

At the start of this (a full 6 months ago) Elliott didn't think I'd need to resurrect my old LLC, but now that we're actually doing it... yeah, down to the Secretary of State's office monday morning to fill out paperwork.

And I'm still hip deep in the J-core ASIC work and don't want to leave Jeff in the lurch, so I'm presumably doing each half-time for a bit? No idea how long that's going to take, but I want to see both these things FINISH.

Speaking of which, back to explaining how hardware toolchains work:

The toolchain we're currently trying to make our ASIC mask with is qflow, which is a set of build wrappers around four other tools (graywolf, netgen, qrouter, magic), plus then you need to feed it the fab's #include file with the standard cells for the process you're actually trying to build a mask for.

I mentioned last time how insanely proprietary most fabs are: their SDK (except it's a PDK here, a Process Design Kit) is TOP SECRET. Various NDA material has leaked over the years (especially for the older fabs that have been passing this info around for decades), but if you present a self-made mask to such a fab without a good explanation of how you got it, they will be EXTREMELY unhappy with you. The fab WANTS to get your netlist and make their own mask, because that's a service they can charge you for, and sprinkle in lots of per-chip royalties for their libraries, and it means your design is completely unportable so if you ever want to fab it through a competitor you basically start over from the netlist and pay all over again.

BUT: back at the start of the pandemic Google paid one fab (SkyWater) to publicly release their PDK data without an NDA, for a complicated but powerful 130 nanometer fab process with its physical facility up in minnesota somewhere. This includes (quite a large) standard cell library, and corresponding process data such as how thick the P in a P/N junction has to be and how close two wires can be together before they interfere. (There's hundreds of weird little corner cases worked out by scientists in lab coats when they were designing the fab process, which are usually part of the NDA data without which your placer can't place and your router can't route. Sky130 is extra weird because the layers are different sizes: five volts up top, 3 volts in the middle, one volt down at the bottom, so it can do a lot of flexible signal processing stuff and is frankly a bit odd to make a CPU with.)

Before this, the other main option was the OSU standard cells, which a university made years ago using extremely conservative estimates for a very old process, and would PROBABLY work... if spaced widely and clocked very slowly. And originally targeting 350 nanometers, I think? It's a terrible option, but at least does not involve an NDA, and was used to develop a lot of these open tools back in the day. And I've talked about Graham Petley's work before, but it's more a generator than a ready to use PDK package, and the guy who did it apparently retired to paint watercolors. Jeff was involved in that project back in the day and could pick it up again, but there's a lot of heavy lifting to actually use it in a real modern process and we'd like to establish a working baseline first. But the advantage there is you could (in theory) assemble a PDK for any process from maybe 25 nanometers on up. Below that it gets weird and quantum, and they cheat by turning the cells on their side to pack them more tightly together...

The standard modern file format for this stuff is LEF (Library Exchange Format) and DEF (Design Exchange Format). The PDK is generally LEF and DEF files, and the final mask file is a big DEF file. The netlist could be supplied as a DEF file instead of as verilog but usually isn't for historical reasons, and qflow isn't JUST a pile of glue scripts but also has a half-dozen little C programs to do file format conversions.

So, back to qflow: Graywolf is the placer, qrouter is the router, netgen does multiple things (most notably collating the list of labeled pins into a list of networks), and magic is the design rule checker (and it's a GUI visualizer, and format converter, and it actually calls back to netgen to perform some of those design rule checks).

The input to qflow is a netlist describing the circuit (which can be a bunch of formats but verilog is popular). This is still literally just a list of components with pins, and a corresponding list of "nets" which are just groups of pins to wire together. (I was calling them "pin pairs" until Jeff corrected me that there can be 3 or more pins in a net, all of which get wired together so current flows between them all. So networks, not just wires.)

The output is a "routed netlist", a bit like a PDF or photoshop file (except in DEF format) with a bunch of layers each containing a bunch of rectangles. Each component from the input netlist has been assigned to a physical location (layer, x, y, and orientation: greywolf places cells in a grid but can rotate or flip each one to make the routing work better) and broken into the series of rectangles of each type of material that implement that component. Each net from the input netlist has also been filled out with the set of rectangles (on metal layers, plus the vertical bridges connecting layers) needed to route the connections (I.E. the output of place and route).

The rectangles on each layer are all filled with the same thing, like "metal" or "insulator" or "dopant" (the chemicals that turn silicon positive or negative) or vertical connectors between layers punching little squares in the insulator layers. These layers are defined in the fab's PDK, and the PDK also has rules for each layer describing how thin the rectangles can be and the minimum gap needed between them, and the design rule checker's job is to verify that none of the PDK's rules are violated by a given DEF file generated by the toolchain: it checks the output for errors after you've generated a mask candidate.

The components don't just connect to each other, they connect to power and ground and clock. On the outside of your chip is a "pad ring" with special cells that talk to the outside world; there's generally a separate pad ring generator that produces all that, and then your place and route fills in the space inside the ring and links up the relevant nets to the ring's connections to satisfy all the references to the outside world.

The hardware version of QEMU is called spice. It's a physics simulator package that can actually "run" a circuit in a mask definition file. (Very slowly and unreliably, and it hasn't benefitted from 30 years of Moore's Law the way the other tools have, although there are ongoing attempts to rewrite it. But if the stars align and you chop your chip into tiny enough pieces, it can show you little pieces of your circuit working before you spend $$$ to fab it.)

So that's what we're TRYING to get the ASIC toolchain to do: combine a verilog netlist (produced by GHDL from the ICE40 J1 repo) with the sky130 PDK to run it through the qflow tools to place and route a mask file. We also need some libraries, most notably an SRAM generator (what's wrong with openram is its own whole writeup, but the tl;dr is it SHOULD have a few bits of SRAM available as a macrocell we can stick in the PDK and place and route instances of like any other component, and instead it tries to take over a chunk of silicon wafer space and produce raw rectangles, routing its own power lines and everything (badly). In software terms, we want some C functions and it gives us a .o file with a giant binary blob in it that isn't even -fpic).

One persistent bug in qflow is that verilog uses an insane escape sequence for names with punctuation in them: it sticks a backslash on the front and a space at the end (which either gets stripped so the symbol no longer matches, or causes a syntax error, in any other format the symbol name is copied into). The proper thing to do when converting from verilog to any other format is to strip this escaping, and re-escape any characters that need it in the new file's format. (So far, we haven't hit a case where anything WOULD need to be escaped anywhere else, it's mostly that verilog hasn't got a lot of concepts VHDL does so things need to be made very explicit for it, which involves creating compound symbol names to represent components of VHDL objects, and GHDL uses an "illegal" character (period) to glue those together to avoid potential in-band signaling conflicts. Period is a valid name component requiring no escaping in all the other file formats EXCEPT verilog...)

What qflow decided to do instead (which is nuts) was keep the leading backslash and turn the trailing space into another backslash. And it didn't do it CONSISTENTLY, lots of names leak through unconverted. Sometimes this broke the reader of the new format, and other times it meant bits of the design didn't match up (mostly broken nets because it couldn't match up pins when the converted and unconverted names didn't match at each end). The other problem is, while LEF and DEF don't seem to escape anything and just treat backslash as a normal character... json doesn't. And yes, one of the conversions is into and out of json, because that's what some tool wanted. So the json parser threw a syntax error trying to read the file, which we hadn't noticed before because it didn't get far enough to HIT that failure mode before we fixed several of the other conversions. (The name mismatches were showing up as design rule check failures, now they're showing up as a build break, and this is PROGRESS. Sigh.)

So the PROPER fix is to go back and strip out ALL the escaping: when the verilog reader library encounters an escaped name, it should use the escaping to parse the name, then strip it. Unfortunately, this both means digging through qflow's verilog code to figure out WHERE to strip the escaping (it needs it to figure out where symbols end during some of the parsing), AND it means ripping out all the wrong conversions later turning the "\name [123]" format into "\name\[123]". (Whether or not the [123] should be part of the resulting symbol is one of those things I need to dig further into, but... probably? Work out "what would this do if the name didn't need to be escaped" and make it do that.)

It would have been easier if the earlier code had been at all consistent or systematic, but there's a lot of whack-a-mole fixes that happened here over the years, which need to be ripped out again.

Qflow was apparently trying to preserve the escapes because one of the big Design Rule Checks at the end makes sure the input and output files contain all the same components (nothing got dropped), so it wanted a conversion that was trivially reversible. But the question "how do you know when to put the escaping BACK so it looks like the original" was THE WRONG QUESTION to ask. You don't, you consistently strip escapes EVERY time you load data from verilog: you now know the length of the symbol because it's in a C string, and you don't CARE what punctuation is in a null terminated C string. Teach the design rule checker to strip the escapes when it loads verilog, so it doesn't HAVE to escape the data it loads from the DEF file.

And if you DO need to write the data out into a new verilog file, you know to escape that symbol name BECAUSE IT HAS PUNCTUATION IN IT. Same way the escape wound up there in the first place when the original tool wrote out the first verilog file. You don't have to preserve a decision an earlier tool automatically made, you just have to competently make the same decision again as needed.

So yeah: qflow is a set of build wrappers around four other tools (graywolf, netgen, qrouter, magic), and then spice (or Xyce) is a physics simulator package that can actually "run" a mask (badly), and it doesn't quite work right with verilog that was generated from VHDL, and it doesn't quite work right with the sky130 PDK. We're working on it.

January 27, 2022

Poking at toysh. Combining command history with continuations is horrible. Both of these multiline input patterns:

$ for i in a b c
> do echo $i
> done
$ for i
> in a b c
> do echo $i
> done

Produce the same for i in a b c; do echo $i; done history entry when you cursor up. The newline between "for i" and "in" does NOT take a semicolon, and the code stitching the multiple lines together into a single long line knows that. This means we have TWO line continuation types: ends current command and does not end current command. No wait:

$ echo $(
> echo potato)
$ echo "

When you cursor up, the history entry PRESERVES the newline in both those cases. So three states so far. And you can cursor left and right past the line break just fine, which is why I wanted the code I'm writing to be the basis of my text editor implementation.

Sigh, what does busybox ash do... keeps everything entered as separate lines in the history and does no stitching together. Makes sense, but a lot less useful for rerunning things.

Alas, that's all the time I have for this today. Jeff needs me to fix vlog2Spice and vlog2Def the way I fixed vlog2Cel yesterday, and then look at the other project he found that's also trying to use VHDL with qflow to build a chip, and see if we can collaborate at least on toolchain fixes...

January 26, 2022

Working on the qflow stuff (toolchain for making our own ASIC). Trying to debug various design rule verification failures, and work out what magic's display output is trying to say when there are a bunch of layers... Ok, back up: I should do videos about this stuff too.

We've had a fully open source "VHDL to bitstream" toolchain for a while, which compiles human readable VHDL source code into a "netlist" file describing the circuit, and then the toolchain links that netlist into a bitstream which you can load into an FPGA to run your circuit.

The fully open source toolchain uses GHDL to parse the VHDL source code, and yosys to produce the netlist, then feeds the netlist to icestorm to turn it into a bitstream for Lattice's ice40 or ecp5 FPGAs.

We've recently updated this bitstream toolchain: the previous version was using a package called ghdlsynth-beta that's since been merged into one of the other packages. Unfortunately the new one reads a vital component via dlopen() and thus can't be distributed as a statically linked binary, which isn't exactly an improvement. But anyway, open source FPGA bitstream generation toolchain (for lattice): solved-ish problem.

An FPGA is primarily made up of LUTs and switching fabric. Switching fabric is a bunch of wires connecting all the inputs and outputs of every other circuit in the cell, which run together into junction boxes full of programmable switches that control which lines get plugged together or kept apart.

LUT is short for Lookup Table, which is a tiny amount of SRAM with a set of input pins and one output pin, so each possible set of inputs can produce a known output (and thus act as any combination of and/or/nand/nor/not gates turning a set of binary inputs into binary output) just by using the input pins as address lines to look up that bit in the SRAM.

Historically there were 3 big FPGA vendors: Xilinx, Lattice, and Altera, but Altera was bought by Intel in 2015 and isn't available to individuals or small business. The J-core stuff builds for Xilinx and Lattice FPGAs.

Lattice uses 4-input LUTs (each indexing 16 bits of SRAM), and Xilinx uses 6-input LUTs (64 bits of SRAM). Neither's really better, it's just the granularity with which you can program the circuits, but it does mean that 1000 Lattice LUTs is potentially less circuitry than 1000 Xilinx LUTs so you can't compare them directly. (Although it also depends on how efficiently the toolchain's optimizer uses those extra inputs and how many are left unconnected, so it's not a strict ratio either. It's ballpark estimates then "try it and see".)

Xilinx was a higher-end FPGA vendor than Lattice, but Lattice has been creeping upmarket. The ice40 line maxes out at 7680 (4-bit) LUTs, while Xilinx's "LX9, LX25, LX45" were how many thousands of (6-bit) LUTs each contained. But Lattice's ecp5 line scales up into tens of thousands of LUTs, albeit at a slower clock speed than the high-end (much more expensive) Xilinx chips. J-core runs around 66mhz on cheap (Spartan 6) Xilinx and 100 mhz on more expensive (Kintex) Xilinx chips. It runs between 12mhz and 20mhz on various ICE40 chips, and isn't expected to be much faster on ecp5 (although we can do the full J2 SMP on there). This also makes things like 100baseT ethernet, GPS correlators, and USB2 noticeably harder to implement on Lattice.

One reason Lattice is hobbyist-friendly is price: an ICE40 is $3.5-$5 even with the chip shortage. An LX45 Spartan 6 is $35-$50 depending on speed, and a similarly sized Kintex is easily twice that. (And that's just the bare chip, bought in volume. Then you need boards to put them in, and Xilinx needs more expensive boards.)

But the real advantage of Lattice is the community has reverse engineered how to program it. Xilinx spent a LOT more on lawyers to sue anybody trying to understand its IP, so you can program Lattice with open source tools quite well (we've completely discarded Lattice's proprietary FPGA toolchain), but the Xilinx equivalents are buggy and incomplete and only furtively developed when Xilinx isn't looking.

So you compile VHDL to a netlist and link it into a bitstream that loads into the FPGA to program the LUTs and switching fabric (and initialize any other SRAM blocks). Great. What's a netlist?

A netlist is actually two lists: a list of hardware components (such as "and gate"), and a list of the named I/O pins sticking out of each component. The linking-ish phase of the hardware build is called "Place and route". The "placer" assigns each component of the circuit into one of the FPGA's programmable cells (including all those LUTs). The "router" programs the FPGA's switching fabric to connect up the various inputs and outputs of each LUT (by matching up the pin labels in the netlist and wiring together everything with the same name).

Conceptually, a hardware toolchain is a bit like a software toolchain. A C compiler produces a bunch of instructions bundled into functions and variables bundled into structures, and then the linker glues them together using a bunch of annotations where thing X needs to connect to thing Y. (In static linking this is all done at compile time, in dynamic linking another program does some of it at load/runtime.)

In a hardware toolchain, the compiler produces components (called "cells") instead of ALU instructions, and instead of jump/branch instructions it has I/O pins to wire together (called "nets", short for wiring networks, because when you connect together 3 or more pins that need to share a signal, or power, or ground, or clock, those wires have to be able to branch.) The toolchain backend reads the netlist so it can place the cells and route the nets.

The main limiting factors on how fast you can clock a circuit are 1) how long are the wires the signal has to travel down, 2) how much electricity does the component at the end need to fill it up? Physically smaller circuitry has shorter wires and less capacitance, so clocks faster. But layout also matters: if a wire "trace" is too long, the signal trying to go down it won't have done its job by the time the clock advances, so your circuit doesn't "meet timing". You can annotate signals with minimum required timing (if this 100baseT transceiver is passing 4 bits at 25mhz, the circuitry at the far end has 40 nanoseconds to handle it), and place and route iteratively try to get the best timing and tell you if anything failed to meet its minimums. (You can "floorplan" by manually grouping chunks of circuitry together with more annotation to basically give the placer hints, but that's a more advanced optimization technique usually only done when automatic place and route just can't figure it out.)

The big difference between different targets is what components are available: the VHDL or Verilog parser produces an Abstract Syntax Tree and runs the usual optimizations on it, then generates a component list and net list from that. Each Lattice FPGA family has one set of components, each Xilinx family has a very different list, and each hardware fab has its own list of available components for making photolithography masks from. A given process's list of available components is called its "standard cells", and the last "generic" step in a hardware build is the AST, because you need to #include the target process's standard cell file to generate a netlist.

The reason we still have to use the Xilinx Web ISE package to produce bitstreams for Spartan 6 (the bigger and faster FPGA in the turtle board) is icestorm doesn't know how to place and route for xilinx, and packages like Symbiflow that try to target xilinx aren't as mature because Xilinx is proprietary and litigious and has ways of encrypting its bitstreams. We have an ok list of standard cells for xilinx, so we can use GHDL to create a netlist (and thus use a more modern version of VHDL than Web ISE supports), but then we have to place and route that netlist using Web ISE to make a bitstream. Unfortunately, Web ISE is a binary only package last updated halfway through the Obama administration, which won't run without a registration key: it's basically the Flash plugin of bitstream toolchains, if Flash had required individual registration keys for each install with a working email address and a physical street address (ostensibly for export control reasons). They can probably remotely disable it too, I haven't checked. (Xilinx is REALLY trying to push everyone to its new Vivado toolchain, which only supports the newer more expensive FPGA families. Think Microsoft trying desperately to pull a reluctant userbase off of Windows XP without the new releases being interesting enough to move them on their own. But Vivado is no less proprietary or tightly controlled, and can't make a Turtle bitstream anyway because they dropped support for the older cheaper FPGAs.)

The standard cells of an FPGA aren't just LUTs, there are also macrocell "libraries" including clock inputs, phase locked loops, SRAM blocks, integer multipliers, and in extreme cases entire DRAM controllers. It's a bit like pulling functions out of libc that would either be a lot of work to reimplement or which you can't really implement yourself in a portable manner (because they need OS-specific system calls). Clocks need crystals, phase locked loops need capacitors, etc. The gotcha here is that each library component you pull in has a finite number of instances available on that FPGA, which live in a specific place on the chip you have to wire the rest of your circuitry up to, a constraint which makes meeting timing harder.

Compiling for ASIC (I.E. making a hardware mask) is slightly different: we take the AST produced by ghdl+yosys and run it through a different backend to produce a netlist using that fab's standard cells. In ASIC there's no limit to how many of each component it can create, the limit is how much space it has to draw cells in. The ASIC placer puts your standard cells in a big grid, initially randomly distributed (modulo any floorplanning), and then performs a kind of sort, swapping them with each other to collectively minimize the distance between pins needing to be connected to each other. The ASIC placer also tries to leave enough space for the ASIC router to add the necessary wires in the next pass, using various heuristics (fancy word for an automated way to guess). The result is more or less a giant photoshop file full of many layers of colored rectangles, with metadata so you can trace back to where those rectangles came from if you need to debug stuff.

A hardware toolchain still produces components ("cells") instead of processor instructions, but instead of libraries it has the option to bundle them together into "macrocells" (which is just cells containing other cells recursively, but with the routing locally within them already worked out). Each macrocell can be treated like a tiny little chip that has I/O pins, each with a long complicated name saying which OTHER pin on some OTHER cell it needs to connect to.

The whole ASIC has a "pad ring" around it (think crt0.o and friends), and sucks in "standard cells" (think libc, except this defines things like "how to make a transistor in this fab process"). The pad ring is what connects to the externally facing pins the chip will have connected up to it when it gets cut out of the wafer and packaged into a ceramic case. This has special I/O pad cells (GIANT capacitor-ish thingies to try to protect the chip from static damage, or generate an enormous output current visible from outside the chip), and GPIO blocks that let the chip remap which pins do what (so the same ASIC can be packaged into different form factors without making different masks; sometimes there's a runtime-programmable register that lets you swap stuff in and out, and sometimes it's fuses blown during manufacturing).

Fabs treat their standard cell libraries as crown jewels: you sometimes have to sign a contract with them just to get enough to make a netlist with, and then the fab wants your netlist and will do everything else itself. If you want just OUTLINE versions of the standard cells (reserve this much opaque space you can't route through on this many adjacent layers, with I/O pins connecting at these spots) to do your own place and route, that's generally serious NDA territory. And you will NEVER get the full version that allows you to make a complete mask, in software terms you can only make a .o file that THEY link into an executable.

Standard cell libraries aren't just transistors and nor gates, they also include big macrocells like the FPGA libraries do, but fabs charge per-chip royalties for them. Using a fab's SRAM or their DDR3 controller adds extra costs, which you pay again every time you run another wafer from that mask.

January 25, 2022

Back in 2006 I was invited (by David Mandala and Tim Bird) to speak at the Consumer Electronics Linux Forum, which I've attended semi-regularly since. This year it's being held in Austin, and I'm tempted to submit a talk proposal... except I should really be doing Youtube videos directly instead of having someone else record and post them. Maybe it makes sense to do a talk there to advertise my youtube channel? Which would involve me actually starting a youtube channel; so far I have one video and the scripts for several more I haven't recorded yet because I need to learn video editing.

Ok, technically CELF isn't in Austin this year, ELC is. (That's the new name the Linux Foundation peed over it to make it smell like them.) Except ever since the Linux Foundation grabbed it they've been trying to glue it to other random events in hopes its success will rub off on whatever random nonsense emerged from their bureaucratic focus group du jour. This year the Linux Foundation is calling the collected mess the "Open Source Summit" (since O'Reilly is no longer using the name: the Linux Foundation is nothing if not opportunistic, derivative, ostentatious... Is imitation still flattery if they're not sincere?)

The latest katamari they've assembled includes all SORTS of random padding to look bigger. Linuxcon has always made me sigh a bit at the name, but that one's at least happened before, and I think CloudOpen also previously existed, but ContainerCon isn't really a name, OSPOCon (Open Source Program Office) is meta-bureaucracy, SupplyChainSecurityCon was CLEARLY just made up to cash in on current headlines, and then you've got Critical Software Summit, Community Leadership Conference, Emerging OS Forum, Embedded IoT (as opposed to all the non-embedded IoT going on), Diversity Empowerment Summit (really!), Open AI & Data Forum (not even trying anymore), and Open Source On-Ramp. (Because none of the rest has an educational component?)

So yeah, they're holding CELF in Austin this year and trying to pretend it's got 14 other events bolted onto it (not tracks, EVENTS), at least half of which they just made up. Ten years ago the event's T-shirt already had more sponsor logos than your average Nascar jacket and I'm trying to figure out if I should bother with it.

January 24, 2022

Oil producers are treating their entire infrastructure as stranded assets. No matter how high oil prices go, they're not investing in more drilling because the transition to renewables could shrink demand and make the price go negative again at any time. While the USA can scale up production,