Rob's Blog (rss feed) (mastodon)



April 28, 2024

Finished the hwclock.c fixes to work around the glibc and musl breakage. (The trick was realizing that asm/unistd.h is what's getting called under the covers, and if we __has_include() it and #include it before any other header the existing header guards against double inclusion should make it just work.)

I should probably move some of that into lib/portability.c, but this is its only current user, and "musl breakage" has a history of being sprayed around the tree because Rich really puts EFFORT into breaking stuff to punish people writing software he doesn't approve of.

I need to figure out an automated way to test watch 'while true; do echo -n .; sleep .1; done' because it's easy to check manually, but painful to check automatically. For one thing, there's no way to tell watch "run this twice then stop". I suppose I could -e and "exit 1" but the debian one goes "press any key to exit" when that happens, which is EXTRA useless. And of course if you run the above command on debian's watch it produces no output, just hangs there waiting. (Presumably if I left it running long enough it would eventually fill up some buffer, but I gave it a full minute and nothing happened.)

So yes, here's a command that is LITERALLY USELESS for scripting, it can ONLY be used interactively as far as I can tell... and it doesn't produce progressive output because of stdio buffering. Bra fscking vo, procps maintainers. You bought into the gnu/stupid.

Honestly, we need nagle on stdio. They did it for net but won't do it for stdout and I dunno why. Make write() a vdso call marshalling data into a single page (4k) vdso ring buffer mapped write-only, which flushes when full or on a timer 1/10th of a second after the last write to it (tasklet driven by the kernel timer wheel). This avoids syscall overhead for the "small writes to stdout" common case without all this NONSENSE around manually flushing. Which the gnu loons have been arguing about on the coreutils list for weeks, inventing whole new APIs that read another magic environment variable to change default behavior, oh yeah that's not gonna have security implications somewhere. A denial of service attack because something in a pipeline never flushed and hung instead...

And yes, I'd special case PID 1 here. Unix pipelines are a thing. Put nagle on writes specifically to stdout, that way you don't need lots of 4k buffers to handle byte at a time writes to the kernel without syscall overhead.
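
Purely as a userspace sketch of the coalescing behavior I mean (hypothetical demo code, NOT the proposed kernel/vdso implementation, which wouldn't need the process's cooperation at all):

  // buffer small writes, flush when the page fills or when poked more
  // than 1/10 second after the last write
  #include <string.h>
  #include <time.h>
  #include <unistd.h>

  static char buf[4096];
  static size_t used;
  static struct timespec last;

  void nagle_write(char *data, size_t len)
  {
    while (len) {
      size_t chunk = sizeof(buf)-used;

      if (chunk>len) chunk = len;
      memcpy(buf+used, data, chunk);
      used += chunk, data += chunk, len -= chunk;
      if (used == sizeof(buf)) write(1, buf, used), used = 0;
    }
    clock_gettime(CLOCK_MONOTONIC, &last);
  }

  // the kernel version would be a timer wheel tasklet instead of a poll
  void nagle_poke(void)
  {
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    if (used && (now.tv_sec-last.tv_sec)*1000000000L+now.tv_nsec-last.tv_nsec
        > 100000000L) write(1, buf, used), used = 0;
  }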


April 27, 2024

Elliott replied to Oliver (with a "no, because..." on something to do with readelf) and now I feel guilty for leaving Elliott to clean up the mess. My lack of sufficient "no, because..." should not leave him having to do it.

On the one hand, if I read Oliver's Mt. Email accumulation and reply to them I will literally do nothing else on the project because he drains my energy and DOES NOT STOP. On the other, letting him run rampant and unsupervised... He is referring to toybox as "our code", and will be calling it "my code" (meaning his) soon enough.

I totally admit this is me failing as a maintainer. Someone comes in well-meaning and energetic and I am not making proper use of their enthusiasm. I should stop coding and become a full-time mentor of other people. I can't do both.

Bug reports are useful. I'm all for _suggestions_. But "right about the problem, wrong about the solution" still applies, and people who won't take "no" for an answer are a time sink. "That's not how I want to fix it" isn't final, people can argue against my point, but reiterating the exact same thing more emphatically without adding new information isn't it, and "you are a bad person for saying that" (shooting the messenger) is exhausting. Plus sprinkling in words like "defective" and "obviously" in your "don't ask questions post errors" posts... sigh.

Right now github has two related threads: in one somebody's arguing that they'd like a different aesthetic user interface to trigger something they can already do. Meanwhile, in another thread, static linking with the optional external libraries (zlib/libssl/libselinux and so on) had an order dependency that parallel probing broke, because dynamic linking automatically remembers symbols seen in previous libraries and static linking does not. Each of the 2 github threads has a "wrong fix". One wants me to add a static linking checkbox to kconfig (you can already LDFLAGS=-static but busybox had a _checkbox_ to add -static to LDFLAGS for you), the other wants me to maintain magic library order. And that's not how I want to solve either problem.

Let's start with the second one: yes I COULD create a software contraption to maintain the library order: turn the library list into a bash array, have each probe use/return an array index, and then output the enabled array indexes in array order. But that's ugly and brittle and complicated and not how I want to fix it. It can still break on library combinations I haven't personally tested, and it isn't immediately clear WHY it's doing that (because dynamic linking doesn't need it).

Instead I want to tell the linker to use --start-group, which is a flag to tell the linker to just do the right thing. It turns out the linker CAN do this already (they just don't because "performance", which again is a C++ problem not a C problem, and probably last came up in the 1990s but hasn't been re-evaluated, and again it's already how it works for dynamic linking because it WILL tell you at compile time (not runtime) about unresolved symbols that weren't mentioned in any previous dynamic library). But adding -Wl,--start-group to the default LDFLAGS in scripts/portability.sh makes some linker versions complain if there's no corresponding --end-group (and then do the right thing, but first they need to noisily announce their unhappiness, which is very gnu). Another reason I didn't check it in immediately is because I needed to test that it IS a NOP on dynamic linking, and specifically that it didn't break --gc-sections (in both gcc and llvm linkers), but my default build doesn't have any optional libraries in it, and at the moment neither "defconfig" nor "android_defconfig" build under the android NDK (the first because it assumes crypt() is available but I haven't finished and checked in the lib/ version yet, the second because the NDK hasn't got selinux.h but the shipped android build enables it because AOSP's toolchain still isn't quite the same as the NDK toolchain). So I needed to come up with test build configs/environments (and try it on mac and bsd with their silly --dead-strip thing), and make it add --end-group as appropriate.
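
(For the record, the manual workaround this implies looks something like this, modulo exactly how your build consumes LDFLAGS:)

  LDFLAGS="-static -Wl,--start-group" make toybox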

But by NOT immediately checking it in, the submitter seemed to think I meant everyone doing LDFLAGS=-static should remember to also manually add -Wl,--start-group to their LDFLAGS, which would be a sharp edge no matter how I documented it: people who Do The Obvious Thing without needing to be told would still hit breakage because they didn't read the docs thoroughly before building, and then dismiss toybox as broken rather THAN read the docs. (I myself would definitely move on to something else if that was my early impression of the project.)

And the guy in the SECOND thread then posted to the FIRST thread advocating that the magic kconfig checkbox should add the magic extra "static link properly" flags. Which is STILL WRONG, it's just more deeply wrong.

The "just add a checkbox" solution to the first one is wrong because static linking is already fraught in numerous ways unrelated to this, in part because glibc is terrible. One result of these threads is "maybe I should collect the various faq.html mentions of static linking into a dedicated static linking faq entry". There's some in "how do I cross compile toybox" and some in "what architectures does toybox support" (in all three parts) and some in "What part of Linux/Android does toybox provide" and then there's MORE material about mkroot/packages/dynamic that's just in the blog and/or mailing list not the faq and none of that actually addresses link order. So a faq entry collecting together information about static linking (how to do it and why it's fraught) could be good.

Another todo item resulting from this is trying to make static linking LESS fraught, which a kconfig entry for static linking WOULD NOT FIX. I don't want to have multiple ways to do things: you can already LDFLAGS=--static and that's the obvious way to do it to a lot of people (and on a lot of other projects). Requiring people to add -Wl,--start-group to --static in LDFLAGS is a land mine, and having a kconfig entry that performs extra magic but leaves LDFLAGS people facing nonobvious breakage is NOT GOOD. I miss when "there should be one obvious way to do it" was python's motto (back before 3.0 broke everything).

I don't want to add a kconfig entry for static linking for several reasons. I'm not setting CROSS_COMPILER through there, or setting binary type (fdpic or static pie): the only reason there's a TOYBOX_FORCE_NOMMU option is it used to be called TOYBOX_MUSL_NOMMU_IS_BROKEN, in a proper toolchain you can autodetect this but Rich refuses to have a __MUSL__ symbol you can check for and ships a broken fork() that fails at runtime to defeat conventional compile time probes for mmu support.

The existing kconfig entries are all things the code needs to make a decision about but can't probe for. When you link in zlib or openssl it calls different functions which provide different behavior. And it's not the same as just having a library and headers installed on the host: we don't pull in random crap just because it's available. Should we use this or that implementation is a DECISION, I can probe for availability but not intent.

So adding a kconfig entry, and making it do increasingly magic things, would pile up ever-increasing amounts of magic but never make it reliable. For example, it's easy to have dynamic libraries but not static libraries installed, which came up in the NDK and is also a Fedora problem. I tried to get a selinux test environment set up, which means Fedora, but they don't install ANY static libraries by default (because that's where Ulrich Drepper railed in German against the unclean ways needing to be purged for many years before leaving to work for Goldman Sachs during some financial crisis or other), and the online instructions I found to "install static libraries on fedora" only installed static libc but not static versions of the other libraries from other packages. Which means you can have the headers but not the (right kind of) library, meaning even __has_include() doesn't help.

What I want is to make it "just work" for as many people as I can, while NOT getting in the way of existing experts who want to handle the difficult cases (or provide answers to people who ask them). The solution I came up with was to have scripts/make.sh probe $LIBRARIES and then if it's not empty, LIBRARIES="-Wl,--start-group $LIBRARIES -Wl,--end-group". So it's only added if it has something to do, and there's an end tag to stop the linker's silly warning spam. Yes it does it for dynamic linking, which is why I had to test it was a NOP, and was supported in all the build environments I want. (I first used this flag doing hexagon bringup in 2011 and it wasn't brand new then either.)
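
Which boils down to something like this in scripts/make.sh (a paraphrase of the idea, not the verbatim checked-in version):

  # only add the group wrapper when we actually probed some libraries,
  # with the end tag to keep noisy linkers happy
  [ -n "$LIBRARIES" ] &&
    LIBRARIES="-Wl,--start-group $LIBRARIES -Wl,--end-group"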

Unfortunately, Oliver piped up in the first thread before I got to fixing stuff and turned the situation into an outright flamewar. Somebody (not the original issue submitter, just a drive-by rando) got mad I tyrannically wouldn't add the aesthetic checkbox despite the Will Of The People or some such, and Oliver managed to fan the flames, and I wound up actually looking up how to block somebody on github for the first time. (After just deleting something inflammatory I didn't want to reply to and getting a HOW DARE I indignant response that confirmed I never want to hear from that person again.) And no, it wasn't Oliver, but it may have been collateral damage from Oliver trying to act in an administrative capacity for the project. (Not dealing with Oliver is having side effects. I'm 99% sure he MEANS well, and he's trying very hard to contribute positively to the project, unlike the guy I blocked. But I never had to block anyone before Oliver acted as self-appointed moderator.)

I want to get things done. I want to clean UP messes and REMOVE unnecessary complexity. And I'm not always immediately sure how best to do that in any given situation, but it's not about voting or who is the loudest, it's about working out the right thing to do. Half the time it's a question of keeping up with the flood and finding time/energy to properly think it through. There's always more corner cases. I just made a note that lib/portability.h has a glibc-only prototype for crypt() that needs to go when the new crypt() replacement in lib/ gets finished. I'd like a mechanism to annotate and expire old workarounds that lets me run a scan as part of my release.txt checklist, but right now portability.h has #ifndef AT_FDCWD with the note Kernel commit 5590ff0d5528 2006 and that's old enough (18 years, 2.5 times the 7 year horizon) that I've probably looked at it before and kept it for some reason? But what is the reason and when can it go away? Do I need to test on mac and freebsd? The bash "wait -n" thing was centos having a 10 year horizon: has THAT expired yet? (And then MacOS needed it because last GPLv2 release of bash doesn't understand -n, so... no. It gets an exception to the 7 year rule.) Doing that by hand is tedious and error prone, I'd like some automated way to check.
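
The sort of thing I mean would be tagging each workaround with a date and grepping at release time, something like (hypothetical convention, nothing like this is checked in):

  # flag workarounds whose expiration date has passed, assuming each one
  # grows a "// EXPIRE: 2025-06" style comment next to its #ifdef
  grep -rn 'EXPIRE: 2[0-9-]*' lib toys | while read i; do
    when="${i##*EXPIRE: }"
    [ "${when:0:7}" \< "$(date +%Y-%m)" ] && echo "stale: $i"
  done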

But that is SO far down the todo list...


April 26, 2024

Ok, got compare_numsign() rewritten and now I'm trying to write new find tests (there weren't any for -link -size -inum or -*time let alone checking rounding and corner cases) and as always getting TEST_HOST to pass is the hard part. It turns out the debian one is crappier than I remembered: "-atime 1s" isn't recognized because the time suffixes are apparently something I added? (Which I guess is why they never had to wrestle with "-atime 1kh" multiplying the units.)

Another question is which find -filters implicitly add "-type f" so "find dir -blah" doesn't include "dir" itself. I've noticed "-size" is one such, but -mtime is not.


April 25, 2024

Yay, at 9am a Dreamhost employee got in and put my website back up. That's a relief. (It was sort of understandable... except for the part that not one file they've been concerned about so far has changed in the past 10 years. As in they did a deeper scan of the whole mess for other files that might retroactively justify their concern, and the list literally did not include a single file that hasn't been there IN THAT DIRECTORY, unchanged, since 2014 or earlier. How can they be INFECTED if they're UNCHANGED FOR A DECADE?)

Under the weather today. Minor sore throat's been building for a few days, probably got a thing. Trying to squint at the find compare_numsign() weirdness but I'm low on focus.

Good to know I'm not alone at being annoyed at the crunchyroll censorship and han shot first trend in modern society. Downside of digital media: if you don't own your own copy, fascists just LOVE to quietly rewrite all the textbooks each year and claim they're unchanged no it was always like that you're remembering wrong. How can you know history if you can't preserve it? Outsourcing stuff to archive.org or a streaming service doesn't cut it, and the Comstock Act was never actually repealed, it just got overruled by various court judgements rendering it unenforceable... which the maga-packed supreme court is reinstating. (Yes maga-packed: six of the current members are in that category. Five were appointed by presidents who LOST the popular vote: Barrett, Kavanaugh and Gorsuch by Trump, Alito and Roberts by Dubyah, and of course daddy Bush appointed Clarence "uncle" Thomas, whose confirmation, where Anita Hill accused him of sexual harassment, was chaired by Joe Biden, no really. Politically Bush Sr. had to pick a black person to replace Thurgood Marshall, so the guy behind the Willie Horton ads found a black man who hates black people, and that's before he and his wife's personal corruption in office.)


April 24, 2024

Oh bravo Dreamhost. Chef's kiss. They took my website down today. Calloo callay. Twardling dreamhost. (I used to have a button that said "The mome rath isn't born that can outgrabe me." But I am, currently, frumious at the whiffling tulgey manxome burblers.)

Yes, I know that malware authors have been using my old toolchains to build their malware since something like 2013, and yes gnu crap used to leak the host path the libraries were built at into the resulting binaries until the debian guys did their "reproducible build" work last decade and came up with patches to stop some of the stupid (and yes, I'd been yelling at people about this in public for years before... ahem). And some bug bounty people were very bad at googling back when google could still find stuff (I shipped a general purpose compiler, yes you can build bad stuff with it, I have no say in this), and now Dreamhost has identified THE ORIGINAL COMPILER SOURCE TARBALL as containing those same strings and thus CLEARLY INFECTED. (It GOES the other WAY. Causality... ARGH.)

So I need to explain to a human that they're putting Descartes before the horse here. Luckily Dreamhost _does_ have actual humans on staff (unlike gmail), there's just a bit of a turnaround time getting their attention. (They strive for nine fives of uptime, and mostly achieve it.)

Meanwhile, I've got work to do...

Implementing lsns requires some options, and -p behaves non-obviously because every process has every namespace, but namespaces "belong" to the first process that has it. So when I lsns -p my chromium task (with two local namespaces), it shows the first bash process as the owner of all but 2 of the namespaces. (So lsns -p 3457 shows 2 lines belonging to that and 5 lines belonging to pid 581.) Except when I ran this at txlf it reported pid 459 owning those namespaces, which has exited since. It's NOT claiming that PID 1 or similar owns this, because ls -l /proc/1/ns is permission denied. So it's attributing it to the first one it FINDS, which when run as a non-root user is somewhat potluck.

This seems easy to implement because "ls -1f /proc" shows PIDs in numerical order, so I don't need to do any special sorting. EXCEPT that pids wrap, so a lower numbered PID can be the parent of a higher numbered PID. What does the util-linux implementation of lsns do? Not a clue! What's the CORRECT behavior to implement here? Dunno.
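
My working theory of the behavior to match, as a shell sketch (assuming the numerically-sorted /proc traversal above):

  # attribute each namespace to the first (lowest) pid we can read it
  # from, which is why non-root runs are potluck: unreadable pids skip
  for pid in $(ls /proc | grep '^[0-9]*$' | sort -n); do
    for ns in /proc/$pid/ns/*; do
      link="$(readlink $ns 2>/dev/null)" && echo "$link $pid"
    done
  done | awk '!seen[$1]++'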

I want to ask on the list if anybody really needs octal (since two people have complained about it), and just have atolx skip leading zeroes followed by a digit, but Oliver would reply five times and drown out any other conversation. (The mailing list is still up, including the archive. For once being a separate server I don't/can't administer was a net positive, at least in context.)


April 23, 2024

Darn it, got an email notification that Google is disabling pop/imap access to gmail in September (unless I want to login on blockchain). I need to migrate my email to Dreamhost before then...

Went through my inbox and laboriously restored the unread messages, although somewhere in double digits from Oliver I stopped marking his. He's been replying as the Representative Of The Project on github too, holding threads where he solemnly comes to a decision with the bug reporter, and then presumably sends me a patch. I haven't read those threads, just skimmed to see what the actual bug report is.

Oh hey, Oliver finally noticed that I haven't been reading his stuff for weeks. (I assume that's what the body of the message is, I've just seen the title.) I'm tempted to reply with that Neil Gaiman quote, but... do I want to reply at all?

If Oliver had noticed I wasn't replying and rolled to a stop, and then poked me after some silence, I would feel obligated to re-engage and shovel through the backlog. But he's never stopped. He's never paused. He's INCREASED his output, including speaking on behalf of the project on github. Oliver does not care that he's making work for me. He does not care that reading and replying to his messages takes time and energy on my part. Even when I'm mostly saying "no" to him, it still eats time and energy, and when he objects to the "no" and I have to give a more detailed explanation and then he KEEPS objecting to the "no" because he's sure he's smarter than me and I just didn't understand the point he was making...

I find the signal to noise ratio to be poor here. Being spammed with low-quality review that results in a string of "no, because... no, because... no, because..." does not help the project. Oliver is absorbing engineering time to educate himself at the EXPENSE of the project. He's not listening, he's telling. He's not asking questions about the years of old mailing list posts or blog entries where we discussed stuff. He's seldom asking questions at all, he's making assertions. Questions are fine: if it's written up somewhere I can point him at it, and if it isn't then once I HAVE written it up maybe it should go in the FAQ or code.html or design.html or something. That way if I do a writeup the work contributes towards an end beyond just answering one person's questions. But Oliver seems to believe I owe him ENGAGEMENT, and that I am a bad person for not prioritizing him more, and I am SO TIRED.

And the longer I wait, the larger the accumulated pile of demands becomes because Oliver keeps talking to an empty room, piling up more and more posts he 100% expects me to shovel through, and any time I spend on that is time I'm not spending shoveling through my own backlog of todo items and other people's pokes. (Which at least have novelty and often shortest-job-first scheduling. Those MAY be a quick fix, or that person MAY just need unblocking rather than hand-holding and spoon feeding. Often I do get a patch and apply it. Sometimes it's "good question, wrong answer" and I can fix it or add it to the todo list.)

It's the difference between random interrupts and a screaming interrupt. One source constantly providing low-quality interrupts gets squelched. I really don't want to make it formal, but I am not scheduling a tasklet for this RIGHT NOW, and the longer the unanswered queue gets the more likely I am to just dump it. I'm losing faith that dealing with Oliver's backlog would help the project. I'm losing faith that I'm capable of helping Oliver mature into a developer that would help other projects in future. I expect he eventually will, but I personally do not have the social skills to expedite this process at a time/energy cost I have budget for. Yes, this is a failing on my part, I know. Failure acknowledged, I suck. Moving on to what I _can_ do...


April 22, 2024

Bit of a ping-pong day. Swap thrashing between various tasks, none of which are low hanging fruit collectable without a heavy lift. Keep rattling bars to see if any are loose...

I've done the start of a konfig.c to replace kconfig, but there's design questions kind of looming. I'm currently writing a standalone C program the build compiles with $HOSTCC and runs... Which means I'm reimplementing xzalloc() and strstart() and friends, which is a bit awkward. I mean I COULD have it pull in lib.c, but that smells like extending the scripts/prereq/build.sh plumbing I recently did and that is intentionally as simple as I could figure out how to make it at the time. I'd kind of LIKE to do this in bash so you don't compile anything, but this much string processing in bash is awkward. (It's awkward in C too, but I'm used to it there.) And I kind of want to have this replace scripts/config2help.c while I'm there, which would be WAY more work to try to do in bash...

Since I recently fiddled with the record-commands plumbing, I ran my Linux From Scratch build "ch5.sh" script (from October) twice in a row under "taskset 1" to see what differences show up in two presumably identical single processor builds run consecutively in the same directory. (So I can start replacing commands in the $PATH and see if the output has any detectable differences: that's one of my big near-term consumers of record-commands output.) There are 3 build artifacts from that: log.txt with the record-commands output, out.txt with the |& tee output, and an "lfs" directory with the new chroot. If I move each of those to a save directory and run the build again in the original location, any absolute paths written out into the build are the same, so the only noise should be actual differences...

The diffstat of the captured stdout/stderr has 16 insertions/deletions, which is 4 different lines: for some reason the bash build does "ls -l bash" on the file it just built, which of course has a varying timestamp in it. There's 3 instances of "configure: autobuild timestamp... 20240422T005241Z", 2 instances of "Configuring NCURSES 6.4 ABI 6 (Sun Apr 21 19:53:21 CDT 2024)", and the rest are "-/path/to/lib/gcc/x86_64-x-linux/12.2.0/../../../../x86_64-x-linux/bin/ld: total time in link: 0.043413" with the amount of MICROSECONDS THE LINK TOOK varying between builds. (Because we needed to know!)

I can filter most of that through sed easily enough without worrying TOO much about false positives getting yanked: sed -E 's/(autobuild timestamp...|total time in link:) [0-9].*//;s/^-rwx.* bash$//'; but the "Configuring NCURSES" line is less obvious how best to trim. (I want to narrowly identify stuff to remove, not encode knowledge about stuff to _keep_, that way lies version skew.) Hmmm... I suppose if I match the parentheses at the end and just yank from those... s/^(Configuring NCURSES .* )[(].*[)]$/\1/ seems to work.

(x() { sed -E 's/(autobuild timestamp...|total time in link:) [0-9].*//;s/^-rwx.* bash$//;s/^(Configuring NCURSES .* )[(].*[)]$/\1/';};diff -u <(x<out.txt) <(x<out1.txt))

Of course I left off work on this LFS build script with pending design issues. One of them is the record-commands setup requires a toybox binary that's not part of the toybox multiplexer, which is a bit of a sharp edge about where best to get it from. The problem is logpath does argv[0] shenanigans that are incompatible with the toybox multiplexer's argv[0] shenanigans, and rather than special case the command in toybox_main() I made it only work as a standalone binary with a #warning if you compile it as a builtin. Both approaches suck, pick your poison...

The annoying part is I'd like record-commands to work either from a host build or within mkroot: the obvious way to do it in each context is very different, and I don't want to do both with if/else context detection. I just updated record-commands so you can ~/toybox/mkroot/record-commands blah blah from wherever you are and it should run the command line with the hijacked $PATH writing everything into log.txt in the current directory, and then clean itself up on the way out. But I haven't got the toybox source in mkroot, and don't want to add a dependency on that to the LFS build. Which means I'd need to build and install the "logwrap" binary into the $PATH and have the script "which logpath" and do its own setup. EXCEPT I can't trust that to be there on the host, and when it IS there maybe it's running under the first record-commands invocation and the path is already wrapped.

In theory I can just have mkroot/packages/lfs build logwrap for the target AND copy the mkroot/record-commands script from the toybox source into the new root filesystem, and run it myself to wrap the lfs.sh runner at the appropriate point. If logwrap is in the $PATH it won't rebuild it, but just do the setup, so can still be used as a wrapper. Except this build sets up a chroot environment and then runs a second script in the chroot, and if the contents of THAT are to be logged...

What I was in the process of writing when I left off on the LFS work last time was a logwrap_reset() function that can run inside the chroot to _update_ the log wrapper path when a command just installed new commands, and I want to put them at the start of the $PATH but record when they get run. That can assume (or detect) that we already have a wrapper set up, and just tweak the existing setup.

Proving that toybox provides enough of a command line to set up the chroot build is one thing. Proving that toybox provides enough of a command line to run the builds that happen WITHIN the chroot is a second thing. I can do them in stages, but it's hard to sit on my hands and not attack the second part during the first part. The goal is to eventually have something vaguely alpine-shaped where the base system is toybox but any other packages you need to build under that build fine, using toybox.

I should track down who the riscv guy was at txlf and ping him, but looking at buildroot the bios it built is an ELF file passed to QEMU via -bios, and I've done various elf build shenanigans for the "hello world" kernel stuff moving the link address around, and all I really care about in the FIRST pass is that it stop complaining about a conflict and try to actually run the vmlinux kernel I gave it. I refuse to pull in an external package dependency, but ${CROSS_COMPILE}cc -x c -nostartfiles -nostdlib -Wl,-Ttext-segment=0xdeadbeef - <<<"void x(void){;}" -o notbios seems feasible?

Except since I never added the partial riscv config I'd worked out to mkroot.sh (because it didn't _work_), I dunno where it is. I know I built a riscv vmlinux that didn't work, but am not immediately in a position to repeat it. (Other than "defconfig with that one extra symbol switched on", which takes FOREVER to build. Sigh, ok, find an electrical outlet...)

Ok, I did a "git pull" in buildroot and rebuilt the qemu_riscv32_virt_defconfig target, and readelf -a on the "fw_jump.elf" in that says the .text segment starts at 0x80000000. And when I yank that argument... it still boots. Huh.

Oh. Aha! It's not booting the vmlinux, it's booting the arch/riscv/boot/Image file. The build also creates an Image.gz file in the same directory, which doesn't boot under qemu, but the MIDDLE of the three files (vmlinux->Image->Image.gz) is the one that works with qemu. And doesn't complain about conflicting mapping ranges.


April 21, 2024

Right clicked on the "Inbox" folder and thunderbird popped up the menu and immediately dismissed it, apparently selecting "mark folder as read" with no undo option. Thank you thunderbird. I had like 50 unread messages in there since the start of the month. (Admittedly half of them from Oliver.)

Android gave me the "79 files (your mp3 collection on this phone) should be deleted!" pop-up WHILE I was using the File app to play one of them. There is no "permanently fuck off" option, it will do it again over and over as long as I have this phone.

Ok, I need to add the "return" builtin to toysh, which means popping function contexts. I think I've done this analysis before, but it's been a while so let's re-do it: function contexts are created by call_function() which doesn't actually call a function, lemme rename that new_fcall(). It's called from run_subshell(), run_command(), sh_main(), eval_main(), and source_main().

The three main()s are relatively straightforward: sh_main() creates the initial function context and ->next being NULL means you can't return. The function context in eval_main() is there so I have a pipeline cursor (TT.ff->pl) that I can return to the calling code from, and to snapshot LINENO:

$ X=$'echo one $LINENO\necho two $LINENO\necho three $LINENO'; eval "$X"; echo here $LINENO
one 1
two 2
three 3
here 1

Sigh, in this old devuan, bash -c 'echo $LINENO' says zero, but I think one of the conversations with Chet pointed that out to him and he changed it. I should wait until after the version upgrade to add tests, or maybe run tests in an LFS chroot? Hmmm...

Anyway, the transparent function context from eval should basically be ignored:

$ echo $(return)
bash: return: can only `return' from a function or sourced script
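
(For contrast, return from a sourced script is fine:)

$ echo 'return 3' > walrus.sh; . walrus.sh; echo $?
3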

But there's a "stop context", preventing child processes from running parent commands. And return is looking PAST that sometimes:

$ x() { echo $(return); }; x
$

Sigh. I want to ask Chet why that DOESN'T error, but there's a significant chance that would introduce more version skew.


April 20, 2024

Trying to fix a bug report that the submitter closed once the issue was diagnosed and they could work around it. Nope, that's not the same as FIXING it, so I've added more comments that probably nobody will ever see in future because "closed issue". (Not a fan of Microsoft Github.) Two of those comments document my wrestling with alpine:

I tried to set up an alpine test environment (my last one was a chroot years ago), but it doesn't seem like they ship a livecd? Or at least the "extended" x86-64 image on their "downloads" page isn't one.

I downloaded their CD, kvm -m 2048 -cdrom blah.iso and got a login prompt instead of a desktop, the only account I could guess was "root", then I couldn't "git clone https://toybox" because it didn't have "git" installed. I googled and did an "apk add git" but it said it didn't know the package, "apk update" and "apk upgrade" didn't help...

This is not really a livecd.

I may have been a bit spoiled by knoppix and devuan's livecds, which set up a union mount reading the iso and writing changes into an overlaid tmpfs, with apt-get set up to install arbitrary additional packages. (Ok, you need to boot a recent enough livecd that skipping the "apt-get update/upgrade" (which would fill up the tmpfs with noise) doesn't complain that the package versions it's trying to find aren't available or compatible with the existing install, but that's just bog standard cloud rot trying to talk to servers that aren't local. I made puppy eyes at the devuan guys and they packaged up pool1.iso for me, with the whole repo on a big DVD image so VM bringup doesn't require talking to servers that may not be there anymore when regression testing against an older image, and sometimes I even bother to set that up and use it properly. I have the incantations written down somewhere...)

Anyway, the saga continued:

Used the setup program to install it to a virtual disk, booted that, logged in, installed git, logged in as the non-root user I'd created, cloned the repo, there was no make... and no sudo. And "apk add sudo" didn't work. Right... Ok, installed make, there was no gcc, installed that, and now it says ctype.h not found. I have to install an additional package to get standard posix headers supplied by musl, installing the compiler does not get me headers.

This is not the friendliest distro I've encountered. Also, what's the difference between the "extended" image and the "minimal" image?

Installed musl-dev. Installed bash. And now the build is complaining linux/rfkill.h isn't installed...

Which is the point where I gave up and just installed a local busybox airlock dir to stick at the start of the $PATH for testing. I don't actually care about alpine specifically (until someone complains specifically), the question here is do the busybox commands work here, and the answer was "no" but not a deep no. The airlock setup failed because -type a,b isn't implemented in busybox find (actually the wrapper directory setup failed, which is odd because it came AFTER the airlock setup...?) which failed back to the host $PATH which meant busybox commands were doing all sorts of things and going "I don't understand this option to this command!" But fixing the airlock to use the toybox commands made the build work, which, you know, is why it's there...


April 19, 2024

The problem with cleanup and promotion of stty is I dunno what half this crap DOES, and the stty man page doesn't really explain it either.

There's a bunch of legacy nonsense leftover from 1970's tty devices that connected a physical printer (with ink on paper) with keyboard via serial cable. (Back in the day special purpose video monitors were too expensive for mere mortals, and using mass produced televisions as displays had a half-dozen different problems: heavy, expensive, hot, NTSC resolution was poor, generating the input signal yourself had regulatory issues... Technology advanced to normalize video monitors in the 1980s but Unix was 15 years old by then.) This is why the Linux tty layer is a nightmare for maintainers. Or so I'm told...

Setting serial speed makes sense (for serial devices), although independent ispeed and ospeed were last relevant when Hayes/USR/Telebit and v32.bis modems were fighting it out in the market in 1992. (The proprietary encodings all lost, the Navy bought a zillion of one of them, USR I think, as they were end of lifed but nobody else cared. That was the "fast one direction, slow the other direction" encoding that didn't have echo cancellation so didn't care about satellite transmission delays, but these days the satellite transmissions start out digital. v32 sent basically the same data in both directions and cancelled out the echo of what it knew it had sent, which meant there was a maximum delay before the ring buffer cycled and it couldn't recognize the echo to cancel it, which never got exceeded in domestic calls but happened routing through satellites.)

Yesterday I poked at setting cols and rows without the xterm noticing the change. "min" sets minimum characters per -icanon read and I have no clue why you'd want to do that. "time" sets a read timeout but doesn't say what the UNITS are (seconds? Milliseconds?) and isn't that what poll/select are for anyway?
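
For the record, what "stty min/time" pokes is c_cc[VMIN] and c_cc[VTIME] in termios, and posix says VTIME is in tenths of a second. A sketch (not toybox code):

  #include <termios.h>
  #include <unistd.h>

  int set_min_time(int fd, int min, int time)
  {
    struct termios t;

    if (tcgetattr(fd, &t)) return -1;
    t.c_lflag &= ~ICANON; // VMIN/VTIME only apply with -icanon
    t.c_cc[VMIN] = min;   // read() waits for at least this many bytes...
    t.c_cc[VTIME] = time; // ...or this many tenths of a second of silence
    return tcsetattr(fd, TCSANOW, &t);
  }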

"Line discipline" is not documented: the number selects which tty driver Linux loads to handle a serial port, there's a list of numbers in bits/ioctl-types.h (0 is N_TTY) and the kernel has MODULE_ALIAS_LDISC() lines that tag drivers as handling a specific line discipline number, but of the 16 in the 6.8 kernel only 3 might matter (other than 0, which means NOT loading a driver): N_PPP, N_SLIP, and N_MOUSE. And you don't set any of those via stty.

The Linux Test Project makes me sad (and mostly tests kernel anyway). The posix conformance tests (which I've never seen and last I heard were very expensive) also make me sad. Coming up with the tests the code needs to pass is WELL over half the work of most commands. And other projects' test suites either don't test anything of interest, are full of tests I don't mind NOT passing, or I never bothered to work out how to get it to run on anything but its built-in command. (They never did a TEST_HOST that I could find.)


April 18, 2024

I haven't checked yesterday's stty fix in yet because... how do you test this? I don't have physical serial hardware currently set up, and the hardware I have at hand that could do it is currently set up as serial consoles, which means changing them is kinda awkward (if something goes wrong I probably have to reboot the board to get it back). I mean I should set up ssh _and_ console in parallel, which also means setting up at the desk where all the boards are instead of "laptop out at coffee shop away from endlessly barking dog"...

I'm wondering if some sort of tty master/slave thing can let me regression test this? Or strace? (The problem with "stty write, stty read and display" is that if it's the SAME stty, anything it's got wrong it's likely to get bidirectionally wrong.) But I suppose in the short term I can use debian's stty to test that MY stty set the right stuff. Yes, I am changing the speed of ptys. (It records them!)

Another just WEIRD thing stty can do is set columns and rows for the current terminal, but xfce's "Terminal" program does NOT resize itself when you do this, so when you "stty cols 37 rows 15" bash then wordwraps VERY strangely until you grab the edge of the window and resize it (which resets the pty's cols and rows to the xterm's size). I tried "kill -SIGWINCH $PPID" but that didn't help. I thought I'd strace the "resize" command to see what that's doing, but:

$ resize 37 15
resize: Can't set window size under VT100 emulation
$ TERM=linux resize 37 15
resize: Can't set window size under VT100 emulation
$ reset -s 15 37

Oh wow, that made bash VERY unhappy. And "reset" doesn't fix it! Hmmmm. Weeeird... that will make the terminal _bigger_, but not smaller. Ooh, and the grab-and-resize is out of sync now! It thinks a window that is 20 rows tall (I counted) is 80x2 and won't let me shrink it vertically any farther. I should email the xfce guys about this... Ok, "stty rows 25 cols 80; resize -s 25 80" seems to have gotten the terminal back into something controllable. And I can shrink it to... 22x3. Which counting characters agrees with. Yay. And resizing that BACK up has remembered what the first half of the screen had, but bash has 8 lines of garbage at the bottom ala "landley@dlandley@dlandley@d..."
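
(For reference, the ioctl under "stty rows/cols", and presumably what resize -s winds up calling, is TIOCSWINSZ, which updates the kernel's idea of the pty size without the xterm hearing about it, hence the desync. A sketch:)

  #include <sys/ioctl.h>

  int set_size(int fd, int rows, int cols)
  {
    struct winsize ws = {.ws_row = rows, .ws_col = cols};

    return ioctl(fd, TIOCSWINSZ, &ws);
  }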

Does nobody else actually TEST CORNER CASES? Sigh...

So yeah, "man 4 console_codes" probably has some resize magic I could dig into (and toybox's reset.c may need a bigger hammer), but that doesn't help with stty.


April 17, 2024

Poking at stty, promoting which is the last thing left in an old todo file I'd like to delete and it's only 460 lines so presumably reasonably low-hanging fruit? The problem is, it's basically impossible to TEST in an automated fashion. (Or at least I haven't got a clue how, except for setting values and having it spit them back? For what that demonstrates?)

The list of speeds is duplicated in the command, I've got it in lib/lib.c but... xsetspeed() just calls the ioctl(), it doesn't have a way to convert a baud rate to/from the magic BOTHER values the ioctl eats, which we need to display the values. Ok, break out the array into a static, add new to/from functions and make the existing function call the converter... Sigh, the conversion is evil magic, what's it doing... Ok, the magic extension bit for "we ran out of speeds, let's glue another 0-15 range on" is bit 13 (4096), and +1 because I skipped B0 in my table (why save zero in the table when you can't set the hardware to rate zero), and then BOTHER isn't actually a usable value (it's defined as a mask, but the first VALUE they made a macro for is 010001 for NO APPARENT REASON, they just wasted another entry), so there's two magic +1 in there depending where you are in the range, and then you have to subtract the first range when setting the second (except it's not -16, it's -14 because we skipped B0 and then we skipped BOTHER)...

And previously I rolled all that up into a test adding a constant, which I commented insufficiently, the commit comment did not explain, and looking at it I don't trust it. Great. Ok, cp toys/example/{skeleton,bang}.c and then edit bang.c into a test function with the size array and the #defined constant array (all the B50, B75, B110 and so on), and make sure that all the from conversion and to conversion produce what the constants SAY they should produce... No I am not checking bang.c in, I confirmed it but that really doesn't seem to be the kind of thing we need to regression test? (Unless the values are different on BSD and such, in which case... I'm not sure I CARE if it works there?)
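
For my own future reference, here's what I think the conversion boils down to (a sketch reconstructed from the description above using the asm-generic termbits values, not the checked-in lib/ code):

  unsigned rates[] = {50,75,110,134,150,200,300,600,1200,1800,2400,4800,
    9600,19200,38400,57600,115200,230400,460800,500000,576000,921600,
    1000000,1152000,1500000,2000000,2500000,3000000,3500000,4000000};

  // index -> B-constant: +1 because the table skips B0, and past the 15
  // legacy entries set the 4096 extension bit (skipping BOTHER too)
  unsigned idx2code(int i)
  {
    return (i<15) ? i+1 : 4096+i-14;
  }

  // B-constant -> index, the inverse of the above
  int code2idx(unsigned code)
  {
    return (code&4096) ? (code&15)+14 : code-1;
  }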

You'd think this would just be "set an arbitrary speed" by feeding it an integer and having the driver work out what clock divisor to set the hardware to, but alas half the drivers out there don't do that because modems and physical tty devices didn't do that (they had standard speeds), and those were dominant users of serial ports forever. So there is some way to set an arbitrary one, but the last couple drivers I looked at ignored what you tried to set through that and only used the B38400 style values. And you can set it to 4 million bits/second through that, which is pretty much the limit of what serial hardware's going to do with a cable longer than a few inches anyway: if you need to go faster than half a megabyte per second, you might wanna twist the wires and have a packet protocol for error correction and retransmission. I mean yeah you can layer ppp etc in userspace, and people do... The point is 500 kilobytes/sec hasn't been limiting enough for people to put much effort into fixing it because if you push that hardware much further things get weird anyway because of how the cables and signaling work.

The fancier protocols like USB send complementary data across two wires twisted together with encoding that breaks up runs of zeroes and ones and makes sure there's roughly equal numbers of each to avoid radio interference weirdness, and they care about things like "pin capacitance" that just didn't come up much with slow serial data... In the USB turtle hat we just grabbed an off the shelf USB 2.0 PHY ("physical transceiver") chip that sent/received the wire signals for us and gave us 4 bit parallel data running at 50mhz, so we could send/receive a byte every 2 clocks at a rate our FPGA could run at. (Going that fast over millimeters of wire is a lot less fraught than going that fast over even a few inches of wire. Presumably signals work better in metric.) For the turtle's builtin USB ports we were talking USB 1.1 to a hub chip that downshifted for us, so it was an order of magnitude slower. You could still plug USB 2.0 into the other end of the hub (on the 4 exterior ports the board exposed to the outside world) and the hub chip would forward packets to the USB 1.1 "host" connection inside the board, and it presumably all worked because the USB protocol is a call-and-response thing where the "device" end mostly just replies to packets sent by the "host" end asking it for data. So it would go slow but work... if we'd ever made a bitstream that actually IMPLEMENTED a USB host controller. (The stuff for turtle board was the other end, USB gadget side. Which is simpler because it can advertise a single protocol and doesn't care what other devices are plugged in, while the host has to support lots of different protocols and track the state of all the attached devices.)


April 16, 2024

Sigh, I hadn't replied to Oliver since the 8th but I fell off the wagon. I knew better. (Ok, technically I replied to Jarno, but...)

And in reply, Oliver says I can just wait to read his replies so he can speak for the project to everybody on the mailing list I maintain, without me having to care what he says. Yup, that'll solve everything... Oh well, as long as I have his permission to ignore him (clearly something I needed to have). I wonder how long it'll take him to notice?

Rather than try to deal with magic "/usr/bin/env" path or making sure I "bash script.sh" everywhere instead of just running it, I want to merge scripts/genconfig.sh into scripts/make.sh. The reason it's separate is the config plumbing needs to call it: anything sourcing Config.in is going to try to import generated/Config.in and generated/Config.probed. That might be another vote for bumping "rewrite kconfig" up the list, although a drop-in replacement for the old kernel kconfig would still have the same sequencing issue.

There are only 2 probed symbols left: TOYBOX_ON_ANDROID and TOYBOX_FORK. In theory both of them could just check #defines, the first __ANDROID__ and the second __FDPIC__. But configuration dependency resolution needs config symbols, the C only gets compiled (and can check #ifdefs) after the .config file is written out and processed. That's the real sequencing issue. Is there an easy design way to have a config symbol "depends on" a #define? The current upstream kernel kconfig is turing complete and can do all sorts of things (including rm -rf on your home directory), but I'm unaware of a specific existing syntax for this sort of check. I also dunno what's gotten migrated into busybox, buildroot, u-boot, or whatever other packages are using kconfig forks these days. "depends on #ifdef __FDPIC__" is easy to implement but "a subset" and "a fork" are different things from an "other people learning this stuff" standpoint. Forks diverge further over time, once I start ADDING stuff there's no obvious bright line way to say "no" (or regression test against another implementation)...
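
The syntax I keep wishing existed (hypothetical, implemented in no kconfig dialect I'm aware of):

  config TOYBOX_ON_ANDROID
          bool
          default y
          depends on #ifdef __ANDROID__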

The other thing this sort of implies is "depends on #ifdef __SELINUX__" except that requires an #include before the test because the symbol is defined in a header rather than built in to the compiler. The android guys patched their compiler to say __ANDROID__ without #including any of the bionic headers. (I don't know WHY they did that, but it's what the NDK is doing and you work with the toolchain you have, not the one you'd like. The compiler also says __linux__ but that's the ELF ABI it's generating when it writes out a .o file.)

Hmmm, I do NOT want the plumbing automatically sucking in dependencies "because they're there", but dependencies that don't show up in the config when not available ALSO means they'd magically vanish when not available, which means the build DOESN'T break if you told it to build against zlib and zlib wasn't there in your build environment. The config symbol would instead silently switch itself off again because dependencies, and silently working with a slower fallback isn't what they ASKED FOR. Breaking at build time (the current behavior) seems like the right thing there. Hmmm...

Tricksy. It would be nice if the kernel, uclibc, busybox, buildroot, and u-boot had already gotten together and SOLVED this for me, but it doesn't look like they were even asking questions along these lines.

I suppose I can pipe the cc -dM output through sed to produce config symbols in one pass (even with some __has_include() nonsense at the start) which means I can do it CHEAPLY. Something like :|${CROSS_COMPILE}cc -dM -E -|sed -En 's/^#define __(FDPIC|ANDROID)__ .*/CONFIG_\1\n\tbool\n\tdefault y/p'. That still needs to happen at config time instead of make time, but maybe it ONLY has to happen at config time? I think scripts/make.sh doesn't read Config.in, it just reads .config. Still a question of WHERE to put "FDPIC" and "ANDROID" though, the LOGICAL place is in the top level Config.in file. There just isn't a syntax for it.

Alright, what did the kernel guys add for this? Documentation/kbuild/kconfig-language.rst says depends on $(cc-option,-fstack-protector) on line 538 (long after it's done explaining what "depends on" is; this is not documentation, it's a wiki page of notes). Which is not what I want, a #define and a command line --compiler-option are two different things. The other syntax it mentions is def_bool $(success,$(srctree)/scripts/cc-check-foo.sh $(CC)) which is the outright turing complete "go ahead and run rm -rf ~ when pulling in an external module, why not" stuff that made me nope out when they added it in 2018. I mean make can already do that, but CONFIGURE doing it is new.

I want "preprocess this source snippet, then set this list of symbols based on output strings being found or not being found in the result". I'm not spotting it in the existing kconfig kernel documentation. I can make a shell script that does it, but... I've GOT that already, and would like to avoid having to call it from 2 places so I don't have the freebsd guys bugging me about what shell to call it WITH just because they made a bad call years ago and are stuck with it now.

I can just take the call to scripts/genconfig.sh out of scripts/make.sh and just have the Makefile call "bash scripts/genconfig.sh", which would make the BSD guys happy. That also means yanking the "Config.probed changed" warning...

Ah, the other problem is that config2help parses Config.in, which means pulling in generated/Config.in. That's why make.sh needed to call it.


April 15, 2024

Called the tax lady and got through, confirming that she filed an extension. Yay.

So many messages from Oliver, speaking for the project to other people on github, dictating ultimate truth instead of making suggestions or asking questions. I am so tired. It's increasingly hard to edit my replies to be polite. (And of course every time I DO object, I'm being unreasonable because he IS the only arbiter of absolute truth in the universe...)

I should be an adult. I should not be bothered by this. It just... adds up.


April 14, 2024

Night on airport floor. Cold, loud, and the alarms keep going off. (Pretty sure the alarms are intentional to punish people doing what I'm doing. The cold is probably to bank up air conditioning so when the sun comes up and crowds arrive the climate control has a headstart, arbitraging cheap overnight electricity.)

Once again trying to charge my phone from the laptop, since that's the only thing I could plug into the wall. Did not get a full charge this time either.

It's weird to consider that you do not need to show a boarding pass to go through security theatre. They don't care whether you're getting on a plane, you can go through to meet people at the gate. What the TSA is even theoretically securing _against_ remains an open question.

Yesterday's "evolution of computers" rant reminded me of the theory that living cells evolved from zeolite deposits near undersea volcanic vents, a mineral which which naturally develops a bunch of roundish little empty niches on the surface in certain chemical environments, which then naturally develop an electric charge near active volcanic vents, and the wide range of energetic organic compounds constantly flow out of the vents even today often can form an organic film somewhere between soap scum and the inner cell membranes around various organelles inside the cell. This electric charge can then discharge itself to ratchet all sorts of other chemical reactions "upwind" against entropy, and today we call this a cell's "resting membrane potential" and the main job of molecules like ATP and NADH and so on is to recharge the membrane potential, which is the cell's actual chemcial synthesis worktable. The theory is this process developed interesting molecules that spread from indentation to indentation in some patch of zeolite, and then contaminated other patches of zeolite near other vents (in which case viruses may have predated freefloating cells), and one thing that made molecules more "interesting" (or at least more likely to reproduce and spread) was building/improving membranes to collect higher concentrations of interesting molecules (collect the components, maintain a better electrical charge across the membrane, catalyze reactions likely to turn compoments into more complicated molecules using the membrane charge), and after a long enough time some cells "better membrane" process didn't just extend them across holes faster (both to fix damage and to colonize new surfaces) but extended out protrusions that closed themselves off, turning the membrane into a free-floating sphere, inventing free-floating cells. And then those cells could bud off another one when they'd collected enough chemicals (so yeast budding predated full cell division)...

I miss studying biology.

Got home. Collapsed. The usual.


April 13, 2024

TXLF day two

Signing (docusign, there's no WAY that has any legal weight) the actual "put the house on the market when the realtor is ready" paperwork. She's listing it for only $125k less than the tax assessment (Fade negotiated well), so the amount various contractors have invoiced to take out of the sale price has increased the sale price... approximately one to one. Ok then. And it looks like the realtor is taking 6% and then any buyer's realtor would take 3% on TOP of that? So 9% commission total? Sigh, Fade read this closely, I leave it to her.

Our usual handyman Mike was very insistent that he could do a lot of the prep work cheap and get paid at closing, and "a lot" became EVERYTHING ALL OF IT GIVE ME THE WORK, and he underbid the other contractors and bit off waaaaay more than he could chew, and is now the one holding up the listing. (Or so the realtor told me on the phone yesterday, I haven't spoken to him since leaving Austin.) The realtor said she's going to change the locks and have her team finish the last of the work. Fine. Good luck. I'm still letting Fade handle all this because I have not recovered sufficient emotional resilience in this area to have coherent opinions. We are in the process of washing our hands of it, and just need to navigate the extrication.

Back to the Palmer Center for TXLF: Spent fifteen minutes in the talk room getting laptop hdmi displaying on the projector. Yay. (The trick was 1024x768 and using the mirror checkbox in the main xfce "display" widget, ignoring the destination selector pop-up because clicking on that does NOT mirror the displays.)

The riscv guy said he'd be in the dealer's room at 9am, but the dealer's room isn't open. I'd email him, but I do NOT remember his name. (I brought reading glasses this trip, so I have to tilt them and squint to read people's badges. My see stuff far away glasses are on the desk in my bedroom in minneapolis.) He already knew my name and I forgot to ask his: I almost certainly know who he is, he implied we've exchanged email before, the question is WHICH person. Email does not help attach a name to a face. I'm not sure how to check the schedule for people running booths in the dealer's room, and the signs only say which company it is, not who's running the booth... Eh, likely to bump into him later.

Sitting in a talk called "what I wish I'd known about containers", which so far I could have given except for the "terminology" part: a container "image" like the "RHEL Universal Basic Income Image", a container "engine" (podman, docker) so basically the launcher, a container "orchestrator" (kubernetes, swarm) which I think is doing cluster management at a level I have never personally had to care about. (I remember back in the beowulf days when there was a multi-ssh tool that connected to multiple systems and mirrored what you typed at all the sessions. We've come a ways since then, but not THAT far.)

He brought up an "unshare, cgroups, seccomp, selinux" slide near the start, and now he's explaining the unshare command. I'm curious if there's anything I should add to the unshare command I wrote for toybox. He's using all --longopts for his unshare --user --pid --map-root-user --mount-proc --fork bash example. (I got to ask a question: if --mount-proc used any special flags or anything to distinguish it from simply "mount -t proc /proc /proc" inside the container. He didn't know. Eh, I can strace it.)

His selinux explanation was just a slide saying "stopdisablingselinux.com", and now he's brought up that page which is a plea and a link to somebody's video. Nope. (Debian hasn't got selinux even installed by default, it's one of the things I like about it.)

Sigh, and now it's all podman ephemera. I should go dig into "bocker", or the "implement containers in 100 lines of C" link, or the rubber-docker repository...

Ooh, he just ran an "lsns" command in passing that looks interesting. And "man unshare" has stuff about /proc/pid/thingies used to export shared namespaces or something? Ok, add those to the todo heap. I have learned something this talk! Time well spent.

He also mentioned that "runc" and "crun" are both container runtimes, in a "fun facts" sort of way. I note that "runtime" was not in his image/engine/orchestrator terminology slide. Is this the container's PID 1 that talks to the outside world through inherited pipes, maybe? I've seen _previous_ container plumbing talks, I just mostly haven't gone on a deep dive into here because too many plates spinning...

Good point about persistent vs ephemeral data. (I was aware of the topic but he highlighted it as a thing administrators spend brain on setting up containers for people.) For "persistent" he says bind mounts and "volumes" are the main options, but did not explain what volumes ARE. (So, like, qcow? I note that bocker assumes you have a btrfs mount and uses the weird magic snapshot stuff in that. The last time I heard anything described as a "volume" was IBM S360 DASD volumes from the 1990s, and since IBM peed all over KVM until it smelled like them it's no surprise to see the term show up here, but what do they MEAN by it in this context? Loopback or NBD mounted disk image, maybe? The raid management plumbing?)
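
Answering my own question after the fact (worth double checking): a podman/docker "volume" appears to be neither qcow nor DASD, just an engine-managed directory that gets bind mounted into the container, which inspecting one shows:

$ podman volume create scratch
$ podman volume inspect scratch

The Mountpoint field in the output is a plain directory under the engine's storage area (something like ~/.local/share/containers/storage/volumes/ for rootless podman), so "volume" here mostly means "a bind mount the engine names and garbage collects for you".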

I gave my mkroot talk! Hopefully, someday, there may be a video posted. Argued with the projector a bit _again_ but got there early enough to have time for it. Turns out you have to select "mirror" from the output type selection pop-up AND click the unrelated "mirror displays" checkbox. Can't blame the venue, that's XFCE user interface being... I can't say "disappointing" because my expectations weren't really violated here. Open source cannot do user interfaces, XFCE is _less_bad_ than most.

I got through about half the material I'd prepared, and of course not in the order I wrote down. My "simplest possible linux system" talk from 2017 started with a rant about circular dependencies because that's the big problem here: everything needs something else _first_, both to run it and to explain it. So the urge to stop in the middle of an explanation and TANGENT into the thing you need to understand first is very strong, and I'm always weak to that. (ADHD! Weave the tangents into a basket!)

The fundamental problem with system bringup dependencies is the last evolutionary ancestor that could actually light a fire by rubbing sticks together went extinct. In the microcomputer world, the last piece of hardware that could boot up without using a program saved out by some other computer was the MITS Altair, which could toggle a boot program into memory using the front panel switches and buttons. (Select address, select value, press "write". Eventually you flip the "cpu is stopped" switch to "run" and let it go from the known address it resets to when power cycled.)

In the minicomputer world DEC's PDP minicomputers could boot from the tty serial peripheral devices (dunno if that was a small ROM or a physical circuit that held the processor in reset until it finished a read/write loop or what, it's probably in the PDP-8 FAQ or something). The ASR-33 teletype and similar (big clackety third party printer+keyboard I/O peripheral) included a paper tape reader/writer on the side, and not only were there mechanical punching keyboards that could punch paper tapes as you pressed the keys via basically clockwork (or presumably an ASR-33 could do it running standalone), but you could work out the bit patterns and punch appropriate holes in a blank tape by hand with a push pin if you really had to. This is how the PDP-7 bootstrapped unix for the first time, by loading a paper tape. Haven't got a bootloader program? Work one out by hand with the processor documentation and graph paper, punch a tape by hand, then turn the machine on and feed in the tape you made. You can't brick that without damaging the hardware.

But modern computers can only read code written by another computer program. Lots of programs take human input, but it's a program writing it out in machine-readable format. A blank computer with no program can't do ANYTHING without lighting its fire from another computer. The olympic torch relay from the sacred fire distribution point is mandatory, even matches are obsolete outside of the embedded space.

Saw Elizabeth Joseph's talk on mainframeness and s390x. (She was at least the third presenter in this room who couldn't get the HDMI to work in the first 5 minutes.) She says I should join the "linux distributions working group" and apply to IBM LinuxOne to get an s390 login, a bit like the mac login Zach van Rijn gave me. I mean there's no obvious reason I _couldn't_ cross-compile all the toolchains from s390x. Other than nobody else having done so and thus they're unlikely to work. (Let's make a cross compiler from s390x to superh! That's clearly going to be a well-tested codepath...)

Went to the dealer's room, the sci-five guy did not get qemu working last night. I gave him a card and he said he'd email me. Forgot to get _his_ contact info again, but presumably he'll email me?

Bus to the airport from palmer center is a direct shot, good to know. I had the second pipeline punch while giving my talk, but I still had the "rio" flavor monster can left over at the airport and of course security theatre wouldn't let it through. It's kind of nasty and I wound up pouring most of it out. Oh well, learning experience. (Never been to Rio, for all I know that's what the city tastes like. Not in the habit of licking architecture. Pipeline Punch is guava flavored, Khaos is tangerine, Ripper was pineapple, this was not a recognizable fruit. Maybe it's Durian. I wonder what Durian tastes like?)


April 12, 2024

TXLF day one.

Walked to the Leander Light Tactical Rail station this morning: it's about 4 miles, which is about "there and back" to the UT geology building's picnic tables from my old house. Left well before the sun came up, so it wasn't too bad. Bought a "3 for $7" deal on Monster on the walk. Two pipeline punches and a new "rio" flavor, green can with a lady dressed as a butterfly on it. Had one of the pipelines on the walk, and breakfast was about 1/3 of the tupperware container of strawberry lemon bars Fuzzy gave me. Bit more sugar than I'd like, but hey: walking the calories off.

Rode the rail to the end (a downtown dropoff point) and walked to palmer center from there, across the 1st street bridge. All of this early enough that the sun wasn't doing much yet, and it was still reasonably cool, because many years ago I gave myself heatstroke by walking to an earlier Texas Linuxfest in 110 degree midday austin heat and rehydrating with the Rockstar "hydrating" tea flavored abomination: when the caffeine wore off I thought I was having a heart attack, had to lie prone for most of an hour, and I suspect that's what damaged my left eye. (Blind spot in that one's three times the size of the blind spot in my right eye. It's in the right "optic nerve plugs in here" place but should not be that big, and I first noticed it the next day.) I've been very careful NOT to push stuff like that again, and yes I was going "drinking the pipeline on the long walk is not the smartest thing" but I hydrated a LOT before heading out and the sun wasn't up yet, and there are (terrible) beverages at the venue. (And spoilers: I had lunch at a nearby burger place, with ISO standard diet coke.) I'm generally fine while walking, I can "walk it off" for a surprising number of issues. It's when I STOP that it catches up with me.

Of course traveling to the venue so early in the morning means the tax lady wouldn't have been there yet when I went past on the light rail, meaning I basically did not manage to make it to the tax office this trip. (It's a half-hour walk each way from the house and at least twice as far from Palmer Center, so without a car or bike "just drop by" is an hour time investment or a Lyft fee, and their voicemail message basically said they're not taking visitors right now, and yesterday I'd have gotten there around 5:30 so they might have left already anyway). I emailed her to request she file an extension. I should follow up on monday, but I'm not entirely sure how if they're not answering their phone and don't reply to my email...? (If I really have to, I can probably file my own extension. Or have another tax person do it. But... not today.)

Checked in to TXLF, got a bag with a t-shirt proclaiming a date from this decade. Yay! That's been a bit of a problem with my stash of t-shirts, I'm embarrassed to wear something from a conference in 2014 because that's a decade ago now. Yeah I'm old, but I prefer not to broadcast it quite THAT much, and I think my last in-person conference was pre-pandemic? (The TXLF guys say this is their first in-person conference SINCE the pandemic, they went virtual for a while.)

Eliminating talks given by a CEO or about Kubernetes, the first thing I wanted to see was a 4:30pm talk about bash (which I eventually walked out of after 15 minutes into a 1 hour talk, because the guy was still going on about how to clone his github to set up his testing framework and had yet to actually say anything about bash except how to check the version number). Hung out in the dealer's room a lot before then. 2/3 of the booths are pointy hair nonsense too, but there's still more interesting people running booths than giving talks.

Bothered the python booth people to see if maybe there's a ph7/mruby variant for python? Which seems unlikely due to the 3.7 expiration being quite so rigidly policed: not only can There Be Only One Implementation, but there can be only one active VERSION of that implementation. Three different forks of python are _going_ to vary more than python 3.6 vs 3.7; if it's THAT much of a problem when people use slightly old versions, the ecosystem is way too brittle to stay compatible. Add in the general tendency for embedded versions NOT to stay cutting edge all the time and constantly replace themselves... The embedded world is still installing 2.6 kernels half the time: we're BIG into "stable", and when we do implement new stuff we try to 80/20 a compatible subset cutting as many corners as we can get away with. Python's Progeria Policing would be quite a headwind for an embedded version.

Anyway, the python guys suggested two projects, micropython and circuit python, which turns out to be a fork of micropython. Google for "tiny python" also finds "tinypy", "tiny python", and "snek". And python.org has a wiki with links to a bunch of implementations: python written in python, lithp, php, one written in haskell... The google summary for the link shows "rustpython", which I haven't scrolled down to yet but I'm pretty sure that's not in the first half of the page. (Google seems to have a bit of a bias here. Then again maybe that's the most recent change to the page, I dunno how much of the previous stuff here dates back to Python 2.0 before they started aggressively purging their ranks. Logically... probably most of it.)

Anyway, I'm interested in maybe adding ph7 and mruby and whatever the python equivalent is to mkroot as packages. You want this language on the target? Sure, here's how to build it. (Although for me rust goes in the "riscv" bucket: wake me in 5 years if it's still a thing, after I've done enough others that "yeah, if I'm adding or1k I suppose riscv isn't _less_ important"...)

Speaking of, I bothered the guy at the Sci Five booth about my inability to get qemu-system-riscv to boot a vmlinux built from vanilla source without external dependency packages, which is the hack buildroot used. This architecture still has NO BOARD DEFCONFIGS, just the "use the default y/n for each symbol and thus build hundreds of modules" defconfig. He identified what buildroot was using that firmware for: riscv needs some sort of hypervisor layer so the kernel can call into a vendor-supplied equivalent of Intel's system management mode and run code behind your back, or something? (Perhaps it's more like Sony's playstation Hardware Abstraction Layer they did their PS3 Linux port on top of? Because that ended well.) The point is, there IS a "CONFIG_RISCV_SBI_V01" symbol in the vanilla kernel I can enable to build one into the vmlinux, and the help text for that symbol says "This will be deprecated in future once legacy M-mode software are no longer in use". So his workaround is something they've promised to remove. How nice. And then of course when I did build that, I was back to the "qemu wants to map a rom in the same place as the vmlinux so refuses to load" problem, which I showed him and he went "huh" and promised to take a look when he had time.

Staying at my house tonight turned out to be fraught: I pinged the realtor to be sure that A) it's not currently on the market (it is not), B) no contractors are doing work on it tonight (they're not), but rather than answer my text she voice called me and wouldn't get off the phone for 20 minutes trying to find me a hotel. (I didn't ask her to do this, that's not what I wanted, it's still my house, stop it. Her REASONS for saying I couldn't stay at my own house back when my talk was approved DO NOT APPLY yet. She has not come up with a DIFFERENT reason, she's just squicked by me being in HER house.)

Once it became clear I wasn't taking no for an answer without some sort of actual REASON, me spending the night in what's still technically my house then became HER PROJECT where she had to drop off an air mattress and towels and so on, and... I didn't ask for that? I couldn't STOP her (I tried), and then she texted me another FOUR TIMES about it at various points during the day until I blew up at her. Look: I just dowanna check an unknown hotel for bedbugs, potentially oversleep, and then work out transit from wherever it is to the venue in the morning. The #10 bus picks up from "within sight of the house's driveway" and drops off within sight of palmer center. This is a known place that I technically still own and is not being used. It's a stretch of floor, behind a lockable door in a climate controlled space, with a shower and electrical outlets for charging stuff (which we're still paying the monthly electric bills for). I have spent the night in worse. It's NOT A BIG DEAL, and I am not HER GUEST. My flight out on sunday takes off at 5:30 am so I'm planning on spending tomorrow night on the floor of the airport (otherwise I'd have to leave at 3 am anyway), which I have also done rather a large number of times before (usually without warning), and the airport has neither a lock nor a shower. I don't plan to leave trash in the house or anything, and I intend to be out before sunrise. I shouldn't have to spend more than half an hour trying to GET PERMISSION to do this.

This is my relationship with the realtor in a nutshell: what I want to do, and what I consider obvious to do, is completely irrelevant to her. It simply does not fit into her head. She will force me to do everything exactly her way unless I make a scene, and then it's a big production that's my fault when all I wanted was for her to just not. Can we NOT replace the (completely undamaged) floors? No, that was not an option. And now the floors have been replaced wrong, the new not hugely waterproof flooring in both bathrooms up to the edge of the shower (because Mike apparently stopped listening to her at some point too). Apparently I should feel guilty about "the thing I said we shouldn't do at all" being done wrong over my objections, because we didn't use HER contractor to do it.

Sigh. I have a finite capacity for politeness processing, which I've been sadly overbudget on the past couple months. I can smile and stay silent on most things, or walk away and let them get on with it without me, but diplomatic negotiating to "let the other person have my way" is something I've been handing off to Fade where possible. I am so tired.

Dinner at the HEB. I bought all their remaining cans of checkerboard tea, so I have something other than energy drinks to drink at the conference tomorrow.

I should have brought a USB outlet charger. I thought I had one in the backpack, but apparently not. My phone is "charging slowly" from my laptop, which has to stay on for it to do so. It has not been at 100% this entire trip, but has brought up its "dying in an hour" warning more than once. Overnight last night at Stu's place got it to 85%. (It's also possible I'm just not getting enough sleep...)


April 11, 2024

Flying to Texas LinuxFest today.

Called the tax lady, but voicemail says they're full and not listening to voicemail. Huh. I knew I can't get an appointment now, but I need to file an extension (which in previous years took them like 30 seconds), and would like to hand them a pile of paperwork while I'm in town to stick in a folder until they DO have time to look at it. (Taking pictures with my phone to email to them violates my "the phone does not handle money" rule, which covers giving it my identity theft number as well. Kinda all over the tax info...)

Airport, airplane to Austin (no luggage, I can fit a couple changes of clothes in my backpack), bus to the house (because that's the busses I know, didn't fish the key out of the lock box but peeked in through the windows and grabbed the mail; all spam, forwarding should have kicked in by now).

Alas, by the time I arrived at the house the half hour walk to the tax place would have put me there well after 5pm. Showing up unannounced after hours while they're slammed seems impolite, maybe I can do this tomorrow morning. Instead I had dinner at the HEB (where I bought several cans of checkerboard tea; they're fully stocked because I haven't been buying it).

Then I took the light tactical rail to visit Fuzzy and Stu, and Fade got me a lyft from Leander station to Stu's house. Fuzzy is stressed. Peejee has lost weight. Stu was mostly asleep.


April 10, 2024

Speaking of languages with multiple implementations (I.E. _real_ programming languages), there's an embedded "mruby" implementation of Ruby, and I got asked if that works with mkroot. (Or at least I'm choosing to interpret the question that way, there was some confusion.)

The mruby downloads page provides a microsoft github link to dynamically generate a git snapshot of a tag from a project. Meaning the release archives go away when microsoft github does. A hard dependency on a microsoft cloud service is... "not ideal". But I guess it's not THAT much worse than sourceforge links persisting in 2024? (Except when sourceforge went evil in 2016 it changed hands again and the new owners have worked to rebuild trust. So it doesn't have the same "inevitable decline" aura the big boys have, squeezing blood from every stone...)

You can adjust the .zip extension to .tar.gz to get a known archive format (the github URL parser microsoft inherited is flexible that way), but the archive name is still just "3.3.0.extension" with no project name on the front of it, and my download function in mkroot/packages/plumbing doesn't know what to do with that. (Maybe I need a fourth argument? Hmmm...)
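
(The by-hand workaround, until the download plumbing learns this URL shape, is just renaming the file at fetch time, using microsoft github's tag-snapshot URL convention:

$ wget -O mruby-3.3.0.tar.gz https://github.com/mruby/mruby/archive/refs/tags/3.3.0.tar.gz

Presumably the same trick works for any project hosted there, since the naming problem is github's, not mruby's.)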

The next problem is that ruby has its own magic build utility, "rake", which is IMPLEMENTED IN RUBY. So, circular dependency there. (And yet another build tool to add to the pile of wannabe make replacements.)

I tried running the rake build under record-commands to see if maybe I could create a canned build script, but there's a lot of varying -D define arguments, and a large section where it's building a bunch of small C programs and then running them to produce output it then assembles. (Some sort of self-bootstrapping JIT code maybe?) As for creating a rake replacement in C: the build dependencies are written in Ruby. The language needs itself installed to build itself. There does not appear to be a "microperl" build option here that can create a tiny portable mruby just big enough to run "rake". Hmmm...


April 9, 2024

Python 3.7 came out in 2018 and had a dot-release in 2023, but QEMU stopped building with it a year ago because it's "too old" (not "there was a bug because it used a new feature", but it had an EXPLICIT VERSION CHECK and REFUSED). The kernel b4 utility just broke the same way and it's apparently explicit policy, amongst all USERS of python. Projects like ph7 or tinycc can implement fairly stale forks of the language and still get widely used, but python POLICES and SANITIZES its userbase. You Are Not Welcome Here with that old stuff. (That's still in a debian LTS release that's still supported.) They go out of their way to break it, over and over.

Python's progeria would drive me away from the language even if the transition from 2.0 hadn't pretty much done it for me. "How dare you continue to run existing code! For shame!" Seriously, they BURNED OUT GUIDO. When your "benevolent dictator for life" steps down because the flamewars got too bad, something is wrong with the community.

Meanwhile, I only moved toybox from C99 to C11 in 2022. Partly because I'd already broken it without regression testing and didn't want to clean up the (struct blah){x=y;} inline constants I'd started using (which turned out to be a C11 feature), partly because C11 offered a convenient bug workaround for LLVM, and partly because I'd been envying the __has_include() feature for a while so there was an actual obvious benefit to moving (turning configure probes into #ifdefs in the code, simplifying the build plumbing).

If I had an 8 year old car that stopped being able to fill up at current gas pumps or drive on current roads, and had to move to a lease model going forward because ownership is no longer allowed, I would object. But the Python guys seem to have no problem with this. "Subscribe or die." You own nothing, you must rent.


April 8, 2024

I should just stop replying to Oliver, which eats all my energy and accomplishes nothing. I'm trying to get a release out, and have instead wasted multiple entire work sessions replying to Oliver.

One of the harder parts of cutting toybox releases is remembering a Hitchhiker's Guide quote I haven't already used. I wanted to go with "For a moment, nothing happened. Then, after a second or so, nothing continued to happen" since it's been WAY TOO LONG since the last release, but it turns out I already used that one in 2012. The "Eddies in the space time continuum, and this is his sofa is it?" line got used last year. I wanted a little more context to the "spending a year dead for tax purposes" line but google is unhelpful and I put my actual physical copies of the books in boxes and then a storage cube last month. (I tend to go with the book phrasing rather than the BBC miniseries phrasing, especially since half the clever lines are only in the book description and weren't actually dialogue or narration.)

After 1.0 I might switch over to Terry Pratchett quotes. Who knows. Insert disclaimer about forward looking statements and so on.

I outright PANICKED when I checked my email and saw a $10k invoice from some random stranger against the middleman, but it wasn't _approved_. (Anybody with an account can submit an invoice.) I logged in and rejected it, then submitted my own invoice for Q1 (which I was waiting until after I got a release out to do, because last year the middleman made a stink about invoicing for work I hadn't done yet; they put _conditions_ on passing along the Google money). Then their website went "something is wrong" at the end of the submission process, and gave a full screen error when I went back to the main page.

And I'm going "oh yeah, I had to borrow Fade's macbook to approve my invoice last quarter" (it _submitted_ fine, but then the site went nuts), because even though debian applies security fixes to this ancient chromium build (where "ancient" = 2020), the VERSION it claims to be is old and various websites reject it. Plus devuan balderdash is probably actually end of life now? No, it says it's still maintained as "oldoldstable", and I fetched security updates last night and there was one. Possibly through June?

I should update after Texas Linuxfest anyway. (And buy a new hard drive at the best buy there, I dunno where to go to get those in person here in Minneapolis and I'm always reluctant to order stuff like that online. I like to _see_ it before buying. Yes I bought stuff through Computer Shopper back in high school, and bought the Orange Pi 3b boards online and had them mailed to me, but for storage specifically there's way too much chinese fake stuff online these days. Amazon is completely useless.)


April 7, 2024

I held my nose and honestly tried to get a riscv qemu target booting, but arch/riscv/configs/defconfig is gigantic (it's not a config, it's the "default y/n" entries from Kconfig, and the result has little to do with the architecture and is full of =m modules), and arch/riscv/configs doesn't offer a lot of obvious alternatives, nor does make ARCH=riscv help. My next guess, make CROSS_COMPILE=riscv32-linux-musl- ARCH=riscv nommu_virt_defconfig, which at least claims to be for qemu's "virt" board, produces a kernel that qemu-system-riscv32 -M virt -nographic -kernel vmlinux complains has "overlapping ROM regions", because "mrom.reset" lives at 0x1000-0x1028 and the kernel is trying to load itself at address zero.

Buildroot's qemu_riscv32_virt_defconfig is building firmware blobs from a separate source package and feeding -bios fw_jump.elf to qemu's command line. I do NOT want external dependency packages, that's why I have an x86-64 patch to remove the ELF library dependency (and allow it to use the frame pointer unwinder every other architecture can use).
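
For reference, what buildroot is doing boils down to something like this (reconstructed from their defconfig, not their exact command line, with fw_jump.elf built from the separate firmware package):

$ qemu-system-riscv32 -M virt -nographic -bios fw_jump.elf -kernel Image

It's that -bios blob from a second source tree I'm objecting to.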

So qemu has a -kernel loader for riscv, but it doesn't work. A brand new architecture needs a spyware blob running "system management mode" over the kernel. Bra fscking vo.

I tried the defconfig build with all the modules just to be sure (that has EIGHT console drivers enabled: vt, hw, serial_{8250,sh_sci,sifive}, virtio, dummy, and framebuffer: no idea what the qemu board's default is for -nographic, and don't ask me what device console= should be set to for any of those), but it had the same ROM/kernel conflict. And the problem isn't qemu board selection either: every -M board type had the same conflict except "none", which instead complains it doesn't support "-kernel".

Eh, revisit this after upgrading devuan, since I can't build current qemu with python 3.7. That's unlikely to fix it, but if I'm building current I can ask questions on the qemu mailing list...


April 6, 2024

I spend SO MUCH TIME writing and rewriting responses to Oliver's messages. Here's my first reply to "utf8towc(), stop being defective on null bytes" (yes, that's his title) which I did NOT send, but instead copied here and then wasted hours trying to make it sound "professional" instead of honest.

On 4/6/24 17:48, Oliver Webb via Toybox wrote:
> Heya, looking more at the utf8 code in toybox. The first thing I spotted
> is that utf8towc() and wctoutf8() are both in lib.c instead of utf8.c,
> why haven't they been moved yet, is it easier to track code that way?

Love the accusatory tone. "Yet." Why haven't I moved xstrtol() from lib.c to xwrap.c "yet".

> Also, the documentation (header comment) should probably mention that
> they store stuff as unicode codepoints, I spent a while scratching my
> head at the fact wide characters are 4 byte int's when the maximum
> utf8 single character length is 6 bytes.
>
> Another thing I noticed is that if you pass a null byte into utf8towc(),
> it will assign, but will not "return bytes read" like it's supposed to,
> instead it will return 0 when it reads 1 byte.

And strlen() doesn't include the null terminator in the length "like it's supposed to". That can't possibly be intentional...

> Suppose you have a function that turns a character string into a array
> of "wide characters", this is easily done by a while loop keeping a
> index for the old character string and the new wide character string.
> So you should just be able to "while (ai < len) ai += utf8towc(...",
> the problem?

Again with the "should". No point checking what existing commands using these functions do:

$ grep -l utf8towc toys/*/*.c | grep -v pending | wc -l
9

Rob

I'm aware of "don't ask questions, post errors" but being polite in response to Oliver is EXHAUSTING. And takes a ZILLION rewrites to scrub the sarcasm from, and even then my reply is not all smiles, but at least provided a lot of patient explanation.


April 3, 2024

Tried to run scripts/prereq/build.sh on mac without first running "homebrew" and it spat SO many warnings and errors. The warnings I don't care about: they deprecated vfork() and syscall() and so on but they're still there; why would anybody EVER think adding an integer to a string constant would append to the string? That's a strange thing to warn about in C, which still is not C++. And shut up about "illegal character encoding in string literal" because it's NOT a unicode character...

But the part I don't understand is "toys/other/readlink.c:67:7: error: no member named 'realpath' in 'union global_union'" when grep realpath scripts/prereq/generated/globals.h finds it just fine. It's there! If you couldn't read the headers out of that directory we wouldn't have gotten that far. There are no #ifdefs in that file. You know what global_union _is_, so why isn't mac's /usr/bin/cc finding the member? This is clang:

$ /usr/bin/cc --version
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

I've built this code with clang. Is there some flag I need to hit it with to tell it to stop being weird?

Huh. Hang on, it's also complaining that wc can't find FLAG_L which means it's reading old headers from somewhere. Lemme try a "make clean" and then...

$ grep -i error out2.txt
toys/other/taskset.c:52:17: error: use of undeclared identifier '__NR_sched_getaffinity'
toys/other/taskset.c:81:15: error: use of undeclared identifier '__NR_sched_setaffinity'
toys/other/taskset.c:119:29: error: use of undeclared identifier '__NR_sched_getaffinity'
3 warnings and 3 errors generated.

Ok, that's a lot more reasonable. (This compiler is searching the current directory before -I even though I had to -I . over on Linux or it WOULDN'T search the current directory for things under a path. PICK A SEMANTIC.)

Next problem: it wants nproc, which uses taskset. Splitting it out into its own function won't help because it's the same sched_getaffinity() plumbing being called to populate the cpu usage mask and then count enabled processors. I dunno the "right" way to do that on a mac or BSD, I should ask somebody...
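
(If the answer is what I suspect, it's sysctl rather than sched_getaffinity(), i.e. sysctlbyname() from C, which is at least easy to poke at from a shell on the mac:

$ sysctl -n hw.ncpu

But I should still confirm that's the "right" way before wiring it into portability plumbing.)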


April 2, 2024

Ok, I went through the commits up to now and made primordial release notes from them (which, like my blog, require a lot of rephrasing and gluing bits together and HTML massaging to be publishable).

Doing that meant writing down a lot of TODO items that a commit left unfinished, four of which I have already decided NOT to hold up this release for (finish leftover backslash newline shell stuff, promote csplit, redo tsort for new algorithm, file parsing JPEG EXIF data doesn't refill buffer) and five of which seem kind of important: test/fix the passwd rewrite and re-promote passwd.c to toys/lsb/passwd.c, finish fixing up hwclock.c (glibc and musl broke it in DIFFERENT WAYS), the new mkroot init rewritten not to use oneit wants "trap SIGCHLD" but toysh hasn't got a trap builtin yet, toysh also hasn't got "return", and I need to post the kernel patches I'm using to build 6.8 to the linux-rectocranial-inversion mailing list so they can be sneered at and then ignored again.

Possibly I should just punt on those fixes and try to get a follow-up release out soonish.


April 1, 2024

I am not putting out a release on April 1, so I have a little more time to poke at stuff.

Updating the roadmap, which has a "packages" section. In theory mapping commands to packages is basically (declare -A ray; for i in $(toybox); do which $i >/dev/null && ray[$(dpkg-query -S $(readlink -f $(which $i)) | toybox cut -DF 1)]+=" $i" || ray["none:"]+=" $i"; done; for i in ${!ray[@]}; do echo $i ${ray[$i]}; done;) In practice, that dumps a lot in "none" because the relevant package isn't installed on my laptop. (Although a lot less if you remember to add /sbin and /usr/sbin into the $PATH. Debian is insane, it thinks calling "ifconfig" to see the machine's IP address is something non-root should never do. Everyone else puts those directories in normal users' $PATH for a REASON.) Debian also breaks packages up at fairly stupid granularity: things like eject, passwd, pwgen, and login are each in their own package. Other things are in WEIRD places: cal is in bsdmainutils (which is NOT the same as bsdutils containing the two commands "logger" and "nice"), "which" is in debianutils, mkpasswd is in whois (what?), crc32 is in libarchive-zip-perl (really?)...

I'm not entirely convinced this is a useful exercise. The list I did before was mostly based on Linux From Scratch, which says what commands are installed by each source package it builds. I checked each package list and grouped stuff by hand, which was a lot of work. Updating that list based on an automated trawl of debian source control is EASY, but not necessarily USEFUL, because debian's package repository seems like a lower quality data source and I can't figure out how to query packages I don't currently have installed.

At the end of my local copy of the roadmap is a TODO section I've been meaning to check in, and one of the things on it is:

Ship a minimal generated/build.sh with snapshot generated/ directory that builds _just_ the commands used by the toybox build, with no optional library dependencies, so minimal host compiler can build toybox prerequisites instead of requiring "gsed" and "gmake": mkroot/record-commands make clean defconfig toybox && toybox cut -DF 1 log.txt | sort -u | xargs

I want to build the toybox prerequisites without any optional libraries, so you can run a "scripts/prerequisites.sh" with a simple enough syntax even the shell built into u-boot can handle it (just substitute in $VARIABLES and run commands, no flow control or redirection or anything) and have it compile a toybox binary that provides what toybox needs out of the $PATH. Maybe even letting you build on mac without homebrew, and making native bootstrap on qnx and similar feasible-ish.

Hmmm... I've already got plumbing to collect the actual commands used by the build: mkroot/record-commands make clean defconfig toybox populates log.txt, and a config with JUST those symbols enabled would be... (should I throw in --help support while we're there?)

for i in toybox toybox_help toybox_help_dashdash $(toybox cut -DF 1 log.txt | sort -u | xargs); do grep -qi CONFIG_$i'[= ]' .config && echo CONFIG_$i=y; done | tr '[:lower:]' '[:upper:]'

Ok, grind grind grind... hmmm, I want to simplify the shipped headers and if I DON'T have --help I can basically stub out generated/help.h so let's NOT add --help support here. Need a script to regenerate all this automatically, of course...

Sigh, I put a test for MacOS in the simplified build.sh so I could feed the linker two different kinds of garbage collection (because -dead-strip on Linux's ld is interpreted as -d -e ad-strip replacing the entry point with the nonexistent symbol "ad-strip", which then gets replaced with a default value that results in a segfault when you try to run it, so I can't just feed both to the build all the time). Except with the Defective Annoying SHell, my test dash -c '[ "$(uname)" == Darwin ] && echo hello' says "[: Linux: unexpected operator" which makes me tired.

I figured it out (posix is = not == and dash goes out of its way to break on any non-posix syntax) but I wound up just blanking LINK="" anyway because it's simpler: the binary still builds and runs, there's no unreachable symbols getting pulled in without the dead code elimination here, it's just a bigger binary and I don't really care in this context. Smaller and simpler build.sh script wins out.
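
For posterity, the one-character fix dash would have accepted:

$ dash -c '[ "$(uname)" = Darwin ] && echo hello'

Posix test only defines single =, and bash accepts both spellings, so the portable version costs nothing.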

Got it working enough to check in.


March 31, 2024

Hammering away to get a toybox release out today, because I don't want to do an April 1 release but I _do_ want one in Q1 2024.

Sigh, didn't manage it. Got a lot done, but the tunnel didn't go all the way through. Got the mkroot kernel configs converted to use be2csv with curly bracket grouping, added a microblaze kernel config, documentation updates...


March 30, 2024

The xz exploit is all over mastodon. Hopefully my paranoia about not wanting to run Orange Pi's default image seems slightly less silly now.

Seeing so many rust worshippers going "rust wouldn't have stopped this, but it proves we need to switch everything to rust ANYWAY". They're attacking C for a MULTI-YEAR SOCIAL ENGINEERING EXPLOIT, which was found precisely because people were intimately familiar with what to expect from the C ecosystem, including 30 years of ELF linking in Linux and "objdump -d". If somebody did this exploit to rust stuff, nobody would ever find it. (Google for "objdump disassembly on rust output" vs "objdump disassembly on C output". One has tons of blogs and tutorials and such, the other doesn't seem to have a single relevant link newer than 2013 in the first page. That seems to me like a PROBLEM.)

How is this social engineering attack an argument FOR replacing 30 years of established field-tested software that was developed in public with public logs of the discussions and years of history and everybody attending in-person conferences and networking and giving recorded talks and so on... let's throw all that out for brand new software developed from scratch by unknowns now that state level actors have shown interest in targeting this area. Because it's written in The Anointed Language and coding in anything else is a sin, you vile unconverted heathen still clinging to the old ways.

Sigh, one of my concerns about self-driving cars was the skill of driving a car atrophying, so after a while nobody could improve the self-driving cars because nobody knew how to do the task anymore. Automating away a task can eliminate expertise in that task from the wider population, which isn't necessarily a bad thing but it's something to be AWARE of. C is a portable assembly language, however much the C++ developers hate anyone pointing out an advantage of C that clearly C++ does not have. You can map from C to assembly in your head, even with fairly extreme optimizer shenanigans a bit of paleontology will dig up the ancestral relationship. It is therefore POSSIBLE to dig down into "here is where what the machine is doing diverged from what I thought I told it to do", and this is a pretty standard debugging technique in C. It's not intro level, but usually by the time you've got a few years experience you've done it more than once. "This code resulted in this write or jump instead of that one, and here's the instruction that made the decision I didn't expect".

Of course right now the venture capitalists have pivoted from blockchain to large language models, so expressing concern about loss of human expertise is going to fall on deaf ears at least until they cash in their bailouts after the next crash. (Billionaires are not allowed to lose money, congress will print more and hand it to them every time they screw up bigly enough for as long as capitalism remains the state religion. Oh well...) And collapse is not the end: the industry got built from scratch over a century, our descendants can do it again I suppose. Not immediately useful for strategic decision making, though.


March 29, 2024

The xz exploit looks like somebody checked in a "test case" with an x86-64 binary blob, and the overcomplicated build system spliced that in to the build with some variant of LD_PRELOAD linker shenanigans overriding the correct symbol. (Which means it does not affect toybox, and my own systems use dropbear for ssh anyway, yes including my laptop).

This is similar to the problem I had with things like pam, and why I tend not to enable module support. You can start with a secure system, then add arbitrary binary blobs at runtime to change how it works. If nothing else that makes the system less AUDITABLE. I can't usefully examine blobs provided to me from on high, and a signing chain of custody is still GIGO. I have to trust that my upstream didn't get exploited, and when that upstream includes systemd they've already lost control of what is and isn't included in that system. (And then people inexplicably want stuff like ssh-agent talking through d-bus: just keeping up with the version skew on how it works when you apt-get update is more than I have bandwidth for. A .ssh/key file in a directory may not be "as secure" but I at least think I understand what's GOING ON.)

A more secure system is one that has LESS in it. Same logic as "watertight". I mean, you can argue about encapsulation and layers of privilege (yay containers), but the people who talk about that tend to think microkernels are a good idea. If I cut and paste an ssh key from one window to another, my clipboard has privileged information in it. My clipboard is not particularly secure. (Yes, there have been attacks on this.) And the threat model of keyloggers, screen scrapers, and processes listening to the laptop's microphone (from which you can apparently reconstruct what keys were typed on the keyboard!) doesn't require a kernel exploit if the information isn't being securely collected and distributed. If my laptop or phone camera had a physical LED that lit up when it was powered, at the HARDWARE level, there wouldn't be a band-aid and electrical tape over them, respectively. If I can't ask the kernel to enumerate all listeners to the microphone, what's the POINT? (Sure, output's got a mixer, but input probably shouldn't.)

Ahem. Tangent. A black box with a sign lighting up saying "all is well" is actually saying "trust me bro". You can sign it but I can't meaningfully examine it.


March 28, 2024

Finally got a sh4eb target with the fdpic loader running under qemu, which can run the sh2eb nommu root filesystem! Woo! It's not a 100% solution because it won't suffer from fragmentation the way the real nommu target does, and if the code DOES call mmap() with the wrong flags it'll work fine anyway because the underlying kernel isn't a nommu kernel.

Still, it's an alternative to sneakernetting an sd card over to the turtle board every time I want to do ANY nommu smoketesting. Modulo I haven't got a build that's putting them together, instead I'm manually cpio.gz-ing the fs directory and editing run-qemu.sh to use "-kernel ../sh2eb/fs.cpio.gz". I should probably automate that somehow...
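
The manual version, for when I need it again (assuming the fs directory is the root of what goes in the archive):

$ (cd ../sh2eb/fs && find . | cpio -o -H newc | gzip) > ../sh2eb/fs.cpio.gz

That's the standard "newc" archive format the kernel's initramfs loader expects.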

Meanwhile, I have a reasonably sized kernel patch adding FDPIC support to the with-mmu version of superh, which would go upstream if linux-kernel was still functioning. Sigh. Throw it in the list of 6.8 patches to post along with mkroot binaries, I guess? (I should post them to the list again for the usual mockery and derision. Don't really want to, but it's conceptually adjacent to spring cleaning. Big pain, big mess, probably healthy.)


March 27, 2024

I applied the commit as-is, but I wonder what a tests/inotifyd.test would look like? I mean, even under mkroot, there's some design work here...


March 26, 2024

Trying to add a bootable microblaze target to mkroot now that either musl-1.2.4 or musl-1.2.5 seems to have fixed whatever segfault was happening in the userspace code, or at least I ran some toybox commands with qemu-microblaze application emulation and they didn't die like they used to.

I built a kernel from linux's one and only microblaze config (arch/microblaze/configs/mmu_defconfig which nominally implies a nommu variant they didn't bother providing a defconfig for but let's worry about that later) and trying to boot it under qemu-system-microblaze died immediately complaining about unaligned access. And left the terminal in "raw" mode so nothing you type produces output until you run "reset" blind, definitely an arch with all the rough edges polished off.

Eventually I ran "file" on the vmlinux to see that the defconfig had built a little endian kernel, and the presence of qemu-system-microblazeel in the $PATH suggests qemu-system-microblaze is big endian. The root filesystem I built is also big endian, because telling the gcc tuple "microblaze-unknown-linux" with no further details produces a big endian toolchain with big endian libraries, which built a big endian toybox binary. But Linux's .config defaults to little endian unless I add an explicit CONFIG_CPU_BIG_ENDIAN=y config symbol that isn't in the defconfig.

Switching endianness gave me a kernel that booted on qemu's default board (-M petalogix-s3adsp1800), and CONFIG_SERIAL_UARTLITE wants the serial device "ttyUL0" which gave me boot messages. (Tempted to do targets for both endiannesses since there's a qemu-system for the other one, but I already published new toolchains which did NOT include a little endian microblaze toolchain with little endian libraries... maybe next time.)

The external initramfs.cpio.gz loader works and I got a shell prompt! As with or1k I can't figure out how to get the kernel to halt in a way that causes qemu -no-reboot to exit, but it's better than nothing. (Worry about that once I'm running current qemu builds again, which requires a newer version of perl.)
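
Collecting the recipe that worked into one place (the CROSS_COMPILE prefix here is a guess at what musl-cross-make names its binaries, adjust to match the actual toolchain):

$ make ARCH=microblaze CROSS_COMPILE=microblaze-linux-musl- mmu_defconfig
$ echo CONFIG_CPU_BIG_ENDIAN=y >> .config
$ make ARCH=microblaze CROSS_COMPILE=microblaze-linux-musl- olddefconfig
$ make ARCH=microblaze CROSS_COMPILE=microblaze-linux-musl- vmlinux
$ qemu-system-microblaze -M petalogix-s3adsp1800 -nographic -kernel vmlinux -initrd initramfs.cpio.gz -append console=ttyUL0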

Trying to harvest config symbols out of this defconfig, the next problem is it has the same kind of CPU feature micromanagement nonsense that or1k had:

CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR=1
CONFIG_XILINX_MICROBLAZE0_USE_PCMP_INSTR=1
CONFIG_XILINX_MICROBLAZE0_USE_BARREL=1
CONFIG_XILINX_MICROBLAZE0_USE_DIV=1
CONFIG_XILINX_MICROBLAZE0_USE_HW_MUL=2
CONFIG_XILINX_MICROBLAZE0_USE_FPU=2

Which is just LEVELS of sad. Isn't this a compiler -m flag rather than config nonsense? I already BUILT userspace and it didn't need to be micromanaged like this. Can you maybe trap on the missing instruction and emulate the way FPUs are handled (sure it's slow but it means I don't have to care), or some kind of cpu version feature bitfield with the runtime linking patch nonsense all the other architectures do? (Reserve space for the function call, turning it into instruction plus NOP when you don't need it.) I mean seriously, I don't have to do this on a real architecture.

But the annoying part for ME is how verbose the config is: I can either leave them all out so the already slow emulator is even slower because it's making function calls for instructions qemu is clearly emulating (it booted!) or else the microconfig version of the above is the outright tedious XILINX_MICROBLAZE0_USE_MSR_INSTR=1 XILINX_MICROBLAZE0_USE_PCMP_INSTR=1 XILINX_MICROBLAZE0_USE_BARREL=1 XILINX_MICROBLAZE0_USE_DIV=1 XILINX_MICROBLAZE0_USE_HW_MUL=2 XILINX_MICROBLAZE0_USE_FPU=2 which is BEGGING for bash's curly bracket expansion syntax. Which the bash man page calls "brace expansion". That would be XILINX_MICROBLAZE0_USE_{{MSR_INSTR,PCMP_INSTR,BARREL,DIV}=1,{HW_MUL,FPU}=2} which is almost reasonable. (I mean still CONCEPTUALLY broken in a "this is not a real processor" way, but not quite as horrible to include in mkroot.sh. One line vs three.)

The problem is brace expansion produces space separated output, and this is CSV (comma separated values). I can of course trivially be2csv() { echo "$@" | tr ' ' ,; } in a function, and calling that function would perform the brace expansion on its arguments, so using it would look like $(be2csv abc{def} blah blah) which I guess isn't that bad? Conceptually it's extra complication (now there's FOUR levels of config processing), but there's a bunch of other repetition in the existing microconfigs that could get cleaned up with brace expansion, and while I'm at it I could properly wordwrap the Very Long Lines that most configs are right now.
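
Sanity checking that in an interactive shell:

$ be2csv() { echo "$@" | tr ' ' ,; }
$ be2csv XILINX_MICROBLAZE0_USE_{{MSR_INSTR,PCMP_INSTR,BARREL,DIV}=1,{HW_MUL,FPU}=2}
XILINX_MICROBLAZE0_USE_MSR_INSTR=1,XILINX_MICROBLAZE0_USE_PCMP_INSTR=1,XILINX_MICROBLAZE0_USE_BARREL=1,XILINX_MICROBLAZE0_USE_DIV=1,XILINX_MICROBLAZE0_USE_HW_MUL=2,XILINX_MICROBLAZE0_USE_FPU=2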

I note that this would increase the line count of mkroot.sh which I brag about, but Goodhart's Law applies here: a metric that becomes a target stops measuring anything useful. More lines containing LESS DATA and being easier to read is a good thing. This is also why I've got a bunch of comment lines in the code (and yes they're in the line count).

The slightly embarrassing part is I have a mkroot talk back in Austin on the 12th, and I think I put the mkroot.sh line count in the talk description. Oh well, I can explain. The Very Long Lines were always a cheat, anyway.


March 25, 2024

Updated musl-cross-make, got it to build more targets, and uploaded the resulting toolchains.


March 23, 2024

Remember my rant last month about crunchyroll censorship? A brief follow-up. You can't make the "cartoons are for kids" argument when you show that much gore (which is not new for this show), but of course everybody was wearing towels in the shower because THAT can't be shown while a single Boomer still draws breath.

Half of my problem here is "Han shot first". Spielberg came to publicly regret editing guns out of ET, and was quite eloquent about Not Doing That again in future.

I want to watch the original version that made this thing popular. Not some pearl-clutching geezer's edits showing me what THEY want me to see, even when the geezer editing it was once involved in the property's creation back before they ossified into a loon and were compelled to render unwatchable the work they did when they were younger.

But having a distribution channel do this en masse? Sets my teeth on edge. And every time I wonder if what's on screen is a choice the original made or a choice the distributor airbrushed over the original breaks my immersion and pulls me right out of the story. Fade to black, clever camera angles, non-transparent water, ALL FINE. But only if it's the original doing it and not changed "for your protection" by someone who knows better than me what I should be allowed to see. Distributors want the exclusive right to convey stuff they didn't create to an audience... and then only provide changed stuff that's NOT what gets shown in Japan. Makes me want to _speculatively_ buy DVDs to see if I MIGHT like things.

This is a separate issue from the original artist _disgracing_ the work so it's still available in its original form but seems tainted, like Dilbert, Harry Potter, Bill Cosby... Death of the Author vs Harvey Weinstein holding Dogma hostage. When Disney's attempts to bury Song of the South turn into photoshopping cigarettes out of pictures of its founder who died of lung cancer, and then its streaming service is riddled with changes... Disney is really big and keeps buying stuff it didn't create and has a history of editing those properties once it owns them. Like crunchyroll is doing.


March 21, 2024

There's no convenient place to set my laptop up in Fade's bedroom: it's full of stuff. There are at least 3 nice places to set my laptop up elsewhere in the apartment, but Adverb will scratch constantly at the bedroom door if I don't let him out and bark constantly at the front door out into the hallway if I do. I have my own bedroom I could close the door to, but again: constant scratching to be let out if someone else is in the apartment and he can't cling to them.

So once again, despite escaping the cat situation, I have a dog situation where I need to leave and go find workspace out in the wider world to take my laptop to. Luckily the apartment has a couple of shared workspaces, which haven't been _too_ busy so far...


March 20, 2024

9am phone call with the realtor, who wants to spend an additional $12k to (among other things) do a more extensive version of the floor replacement I keep trying to talk her out of. (It's entirely for aesthetic reasons, the floor isn't damaged, she just doesn't like it. Now she wants to rip out the toilets so new flooring can go under it in the bathrooms, which I explicitly said no to the last week I was packing up, but nothing she ever wants to do is settled until she gets her way, "no" just means it will be brought up again later.)

The City of Austin's tax assessment on the place was $700k. When we first spoke to her she thought it was worth $550k but could be brought up to $600k with about $20k of work. Now she wants to spend an extra $12k on top of that, and is saying it's worth $400-450k. The argument that money we spend fixing the place up will have twice that impact on the sale price isn't very convincing when the base number for the sale price was never in writing and seems subject to endless downward revision.

So to recap: we said we could probably afford about $6k-$8k of work, got talked up to $20k, and now she wants to increase it to $32k. And the result of the work done so far seems to have been to DECREASE the amount she wants to list it for.

I find this process stressful. She's also insisting that the city of Austin's tax evaluation is fraudulent, that the three biggest online house assessment sites are fraudulent (that part's plausible), and the two realtor email lists telling me how other houses in the area sold (one I've been on since I bought the place a decade ago, the other I got subscribed to by the mortgage guy I talked to when I tried to refinance back when rates were briefly under 3% during the pandemic) are also fraudulent. Everybody everywhere is giving bad numbers except her, and her numbers keep changing, always in the same direction.

But my wife agrees with the realtor her sister recommended, so fine. There's no equity in the house, meaning I have very little saved for retirement. Good to know. (I don't THINK all the realtor's aesthetic judgements are because she has a specific friend she wants to sell the house to cheap. She's somehow guessing what everyone everywhere would universally like. FINE. Not my area of expertise.)

I have moved beyond finding the process stressful to finding it exhausting.

Update: running the numbers again, we might get out the same amount of equity we put into it from selling the condo back in 2012, only having lost money to ten years of inflation. At this point, that seems like a best-case scenario.


March 18, 2024

Looking at orange pi 3b kernel building, the vanilla kernel still claims to have Orange Pi 3 support, but not 3b. I dunno what the difference is between them: it's an rk3566 chipset either way but bunches of stuff use that, apparently very differently.

Orange pi's github has a new "orange-pi-6.6-rk35xx" branch that looks promising. Of course it doesn't have actual linux git history in it, the entire branch history is just 3 commits, labeled "First Commit", "Init commit for linux6.6", and "Support Orange Pi 3B". So in order to read through a patch of what they added to vanilla linux, I need to come UP with such a patch via diff -ruN against a fresh vanilla v6.6 checkout.
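(I.E. something like the following, with the directory names assumed:)

diff -ruN linux-v6.6 linux-orangepi > orangepi.patch
diffstat orangepi.patch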

The first difference from orange pi's "init" commit is that the first line of arch/alpha/boot/bootloader.lds (the SPDX-identifier line) is missing, and git annotate in 6.6 says that was added in commit b24413180f560 in 2017. So I dunno what this "init" commit is, but it's ANCIENT... the top level Makefile says 4.9.118. Why would you even... I mean what's the POINT?

Ok, let's try the SECOND commit, the one that says it's linux 6.6, and piping the diff into diffstat we get 1209 files changed, 12719 insertions(+), 11506 deletions(-) which is NOT a vanilla release. Maybe it's one of Greg KH's ME ME ME releases? Hmmm... Not obvious how to get those in a git repo. I can get incremental patches, even fetch them all via for i in $(seq 1 21); do wget https://cdn.kernel.org/pub/linux/kernel/v6.x/incr/patch-6.6.$i-$((++i)).xz; done but there's no zero to one, it starts with 1-2, meaning I think I have to start with 6.6.1 instead of Linus's release version?

Except the first patch in that series (the 1-2 one) starts by adding a "dcc:" entry between the "dc:" and "sym:" entries of Documentation/ABI/testing/sysfs-driver-qat and the "init commit" for linux-6.6 does NOT have that change. Was it reverted by a later patch? Grep says the line only appears in the first patch, not in any later patch (reverting it would have a minus line removing it).

So the orange pi chinese developers went from some variant of 4.9 to something that is not 6.6 nor one of the dot releases after... hang on. Check the Makefile... That says 6.6-rc5. Maybe it's an EARLIER version? (I just want to see where they forked off vanilla! I'm assuming any changes that actually made it into vanilla AREN'T spyware. Probably. Or at least multiple people other than me looked at them already to catch anything obvious.)

Ok, *cracks knuckles*: for i in $(git log v6.6-rc5..v6.6-rc6 | grep '^commit ' | awk '{print $2}'); do git checkout -q $i; echo -n ${i:0:12}; diff -ru . ../linux-orangepi | diffstat | tail -n 1; done

The point of divergence has to be newer than the one that changed the Makefile to say -rc5, but older than the commit that changed it to say -rc6. I could also look at individual diff lines and try to annotate them to a commit from -rc6, but this just runs in the background...

Sigh, the closest commit (6868b8505c80) still has 416 files changed, 9404 insertions(+), 4611 deletions(-). Whatever orange pi checked in as their "base", it is NOT a vanilla commit.


March 17, 2024

I have a pending fix I'm staring at because I called the variable "edna" and I should change it to "mode" but I have recently been informed that my variable names aren't good enough even when I do cleanup passes to remove idiosyncratic naming.

I don't want to be reverse psychologied into making the codebase worse just because someone else threw a tantrum, but I've had an exhausting month and it's _really_ hard for me to get "in the zone", as it were.

Anyway, the technical issue is my install -d was creating the directory with permission 0777 and letting the default umask 022 drop out the group and other write bits, but for the _files_ the callback was using base permissions of 0755 to apply the string_to_mode() delta against, so of course I had to test (umask 0; install -d potato) and confirm that yes, the base permissions are 0755 for the directory too.

But THEN I did:

$ (umask 0; install -dm +w potato)
$ ls -o
total 4
d-w--w--w- 2 landley 4096 Mar 17 04:51 potato

Which says that when it DOES have a delta, the base permissions are ZERO, which is just SAD. I mean, I can do that, but... ew?
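If I do end up matching that, the logic is at least tiny. Minimal sketch, assuming toybox's existing string_to_mode(str, base) parser and a hypothetical mstr holding the -m argument:

#include <string.h>
#include <sys/stat.h>

mode_t string_to_mode(char *modestr, mode_t mode); // lib/lib.c

// The behavior observed above: base 0755 normally, but base ZERO when
// the mode string contains a +/- delta.
mode_t install_mode(char *mstr)
{
  if (!mstr) return 0755;

  return string_to_mode(mstr, strpbrk(mstr, "+-") ? 0 : 0755);
}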

As always, doing it is easy, figuring out WHAT to do is hard...


March 14, 2024

Ok, Oliver has explicitly progressed to flamewar and there's no useful reply I can make to that.

What's my current todo list:

finish log/release notes
hwclock
fix /etc/passwd, re-promote command, promote other commands
build with current kernel
toysh builtins "return" and "trap"
orange pi kernel and/or qemu arm64 debootstrap chroot
cut a release
LFS build to create chroot
LFS build part that runs under chroot
android's pending list
  diff expr tr brctl getfattr lsof modprobe more stty traceroute vi
blog catch up
close tabs, reboot laptop, reinstall 16 gig ram, devuan update

I should go work on some of that...


March 13, 2024

I am irritable. I don't WANT to be irritable, but line buffering is being stroppy in more or less the way I'd expected, and I'm being lectured by Oliver again.

Sigh, I'm pretty sure Oliver MEANS "your half-finished code could use more cleanup and comments" and not "I am the omniscient arbiter of taste, bow before my iron whim". But he's dictating to me how my own code MUST be organized because there's exactly one right way to do it and I was Clearly Wrong, and I just don't have the spoons to handle this gracefully right now. (That's why I've ignored it as long as I have, even when I don't pull my laptop out I tend to check the web archive on my phone to see if there's something new I should respond to. This was a "definitely should not respond to it JUST NOW", with the move and all.)

Busybox had lots of commands that I didn't maintain, but delegated and forwarded requests about. Awk most prominently comes to mind. I tried to let that happen in toybox a few times, which is how I wound up with bc.c being the longest file in the tree (longer than news.html, AND perched in a Cloud of Drama but I mostly try to ignore that). Sigh: it's hard to delegate _and_ maintain the code equivalent of bonsai.

I should book the flight back to Austin for my Texas LinuxFest talk. The realtor was very unhappy at the idea of me bringing a sleeping bag back to Austin and crashing on the floor of my own house for 2 nights. Oh well, I've flown to random cities and spent money on a hotel room before. I just... really don't want to.


March 12, 2024

I moved to Minneapolis. There were weeks of tetris-ing things in boxes and lifting heavy things into various piles. I did 6 consecutive nights on 4 hours or less of sleep per night, which I am no longer young enough to bounce back from the next day.

We moved my flight to Minneapolis back from the 5th to the 10th, and moved back the deadline to have the storage pod picked up TWICE, because SO MUCH TO PACK. Podzilla finally came for it Saturday morning, and a couple hours later I rented a U-haul (from the place a 10 minute walk away on I-35) so we could fill it up with Fuzzy's stuff and I could drive it to her Father's place in Leander. (I _tried_ to get some sleep there, but he played a podcast about the "pillowcase rapist" at full volume ten feet away; he's gotten far older in the past ~5 years than in the previous 15.)

Peejee is settling in well at Stu's. She has a familiar caretaker monkey, and her warm square, and slurry. There was rather a lot of hissing at their existing cat, Scoop, but she's lived with other cats before.

When we finally got the dead motorcycle and chest freezer and SO MANY BOXES out of the U-haul and swept it out and I drove it back, I returned to the house one last time to pack the final 3 suitcases to take on my 8pm flight to Minneapolis: everything else got thrown out (or donated if the realtor's up to elegantly disposing of stuff), including half my clothes that didn't fit in the suitcase. (I tried to get a nap first, but workmen were pressure washing the driveway: our handyman was willing to work on contingency, so the realtor got her $20k worth of work so she could sell the place for $150K less than the current tax assessment. Wheee.)

Headed to the airport, caught my flight to Minneapolis, collapsed at Fade's, and was informed the next day that Drama Had Occurred in my absence. (Pretty sure it's the guy who crossed the street from the apartment complex to ask me about the giant container with the storage company's billboard on the side of it in my driveway, but not much I can do about it from here and... strangely, only minor annoyance levels of harm done? When we first moved in, our game consoles were stolen, then nothing for 12 years, and moving out, the realtor didn't get the air fryer because it had been stolen.)

Heh, I forgot the 2012 breakin was why I stopped trying to get a kernel.org account. (Went to a mandatory in-person keysigning, backup disk got stolen with that key on it, didn't bother to try again.)


March 11, 2024

Oh goddess, I just want to know what the RIGHT BEHAVIOR IS, so I can implement it.

Except what coreutils is doing/advocating is very clearly NOT the right behavior. And I'm a monolingual english speaker with a TINY SMATTERING of japanese, so really not qualified to opine on this stuff. But watching silicon valley financially comfortable white males make decrees about it leaves an aftertaste, you know? Bit more humility please. You do not live in the "circle of rice" (which can be sung to that song from the Lion King), and are thus outvoted.

I note that the original circle of rice from reddit is probably correct. I don't trust the smaller one the guy in singapore redrew to exclude Japan because it depends on china's inflated estimates of its population. China's local governments get funding based on head count, so when the "one child policy" reduced population inventing more people on paper and self-certifying their existence was a big temptation. One theory why it's so hard to migrate within china was local governments trying to hide that sort of thing. (This was a chronic problem throughout history, the phrase "pass muster" in Europe originally meant inspecting a regiment of troops to confirm each listed soldier could be present at the same time, because officers would make up enlisted men so they could pocket the extra salaries. The inspection by the people paying the bills wasn't to make sure their boots were shined, it was making sure those boots actually had someone in them.)

That's why estimates of china's actual current population run as low as 800 million, but even China's own central government has been unable to actually _check_ because the local governments really really really don't want them to. Since covid, china relaxed its internal migration rules, in part because they can blame covid for any _specific_ missing people and the central government really doesn't want to do that so carefully doesn't look: one cover-up hides the other. But some fraction of the declining number of births might be because some portion of the young adults nominally capable of having them only ever existed on paper. There's so much fraud it's hard to tell, especially from here.

[Backdated entry: I didn't touch my laptop for several days during the height of the move, but this is when the email came in.]


March 7, 2024

Got a google alert, which I set on my last name over 10 years ago and hasn't been useful in forever (and barely ever triggers anymore), telling me that my grandmother died.

Nothing I can do about it at this point. She lived to be 100, like her mother before her. More boxes to pack...


March 5, 2024

If you collect your mp3 files into a directory, the Android 12 ("snow cone") built-in file browser app can be convinced to play them in sequence, and will continue playing with the screen switched off. (Just go to "audio files" and it shows you folders you've created in random other places, for some reason?)

But as soon as focus returns to the app (which is what happens by default when you switch the screen back ON), the playback immediately jumps to the position it was at when you switched it off, and playback switches to that point in that song. Redrawing the app's GUI resets the playback position. Oh, and if you let it play long enough, it just suddenly stops. (And then jumps to the old position when you open it to see what's going on.) The user interface here is just *chef's kiss*.


March 4, 2024

We're tentatively having the storage pod picked up on friday, renting a u-haul to take Fuzzy's stuff to her father's place on saturday, including the 20 year old cat, and then I drive to the airport Sunday. Fingers crossed.

My proposed talk at Texas LinuxFest (explaining mkroot) got accepted! Except I plan to be in minneapolis after this week, and have to fly BACK for the talk. (And get a hotel room, because the realtor is highly dubious about me bringing a sleeping bag to crash on the floor of a house with a lockbox on the front. Yes, this is the same realtor that insists the place has to be listed for $150k less than the tax assessment. She's a friend of my wife's sister.) So I may have to get a hotel in order to speak at an Austin conference. Oh well, I've done that for a zillion other conferences...

In the netcat -o hexdump code, TT.ofd is unsigned because I'm lazy and wanted one "unsigned ofd, inlen, outlen;" line in the GLOBALS() declaration instead of two lines (one int ofd, one unsigned inlen, outlen), since xcreate() can't return -1 (it does a perror_exit() instead). I thought about adding a comment, but adding a comment line to explain I saved a line seems a bit silly.
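I.E. the line in question is roughly (sketch, other GLOBALS fields elided):

GLOBALS(
  // ...other netcat globals...
  unsigned ofd, inlen, outlen; // ofd unsigned because xcreate() can't return -1
)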

I found an old pair of glasses while packing (in a box with a prescription slip from 2012), which is kind of backwards from the pair I've been wearing in that the LEFT eye is more or less clearly corrected, but the RIGHT eye is fuzzy at any distance. I've refused to update my prescription for several years now with the excuse "they're reading glasses" ever since I figured out that the reason I'm nearsighted is my eyes adjust to whatever I've been looking at recently, and I read a lot. The day of the school eye test in second grade on Kwaj I'd been reading all morning and my eyes hadn't had time to adjust BACK, so they gave me glasses. Which my parents kept reminding me to wear. So I'd read with those, focusing up close, and 20 years of feedback loop later I finally figured out what's going on and STOPPED UPDATING. But I still spend most of my time staring at a laptop or phone or similar, so far away is fuzzy unless I've taken a couple days off. But it mostly stopped GETTING WORSE, as evidenced by glasses from 2012 not being worse than the current set, just... different.

My last few sets of glasses I just went "can you copy the previous prescription", which they can do by sticking it in a machine that reads the lenses, but after a few "copy of a copy" iterations it went a little weird in a church glass sort of way. (Which my eyes mostly adjusted to!) But I've developed a dominant eye over the past couple years... and these old glasses are BACKWARDS. The dominant eye with these glasses is the LEFT one, and the right is hard to read text at my normal length with just that one eye open.

So I'm wearing that pair now, on the theory variety's probably good in terms of not screwing up my visual cortex so nerves atrophy or something, in a "that eye's input isn't relevant" sort of way. Honestly I should go outside and stare at distant things more often, but texas sunlight and temperatures are kind of unpleasant most of the year.

(I remember why I stopped wearing this pair. One of the nose pieces is sharp and poky.)


March 3, 2024

Gave up and admitted I'm not making the March 5 flight to minneapolis, and had Fade bump it back to the evening of the 10th (which is when I actually told the realtor I'd be out of here). I immediately got hit with ALL THE STRESS, because my subconscious knew the deadline on the 5th wasn't real but the one the 10th is. (My brain is odd sometimes, but I've been living with it for a while now.)

Red queen's race continues: I hadn't checked in the hwclock rewrite motivated by glibc breakage which screwed up the syscall wrapper to not actually pass the arguments to the syscall. Meanwhile, musl-libc changed their settimeofday() to NOT ACTUALLY CALL THAT SYSCALL AT ALL, which is the only way to set the in-kernel timezone adjustment. So I rewrote hwclock to call the syscall directly, but before checking it in I wanted to test that it still works properly (I.E. reads and writes the hardware clock properly), and I'm not gonna do that on my development laptop so I needed to do a mkroot build to test under qemu.

Which is how I just found the musl commit that removed __NR_settimeofday, thus breaking my new version that calls the syscall directly. Rich both broke the wrapper AND went out of his way to make sure nobody calls the syscall directly, because users aren't allowed to do things he disapproves of. (For their own good, they must be CONSTRAINED.)
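For the record, the direct call is about this much code (sketch: assumes __NR_settimeofday is still visible in the headers, which is exactly what musl now prevents):

#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

// Set just the in-kernel timezone offset: a NULL tv means "leave the
// clock alone", and the tz argument is the part the wrappers stopped
// passing through.
int set_kernel_tz(int minuteswest)
{
  struct timezone tz = { .tz_minuteswest = minuteswest };

  return syscall(__NR_settimeofday, (void *)0, &tz);
}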


March 2, 2024

I've had mv -x sitting in my tree for a couple days, but it came up on the coreutils mailing list (in a "don't ask questions, post errors" sort of way) so I'm checking it in.

In theory both renameat2() and RENAME_EXCHANGE went in back in 2014 (ten years ago now!), but glibc doesn't expose either the Linux syscall or the constant Linux added unless you #define STALLMAN_FOREVER_GNU_FTAGHN_IA_IA and I categorically refuse. Also, this should build on macos and freebsd, which probably don't have either? So I need a function in portability.[ch] wrapping the syscall myself inside an #ifdef.
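The wrapper itself is small, something like this sketch (hypothetical function name, not the actual portability.c code):

#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1<<1)
#endif

// Atomically swap two paths, without needing glibc to admit renameat2() exists.
int rename_exchange(int olddirfd, char *oldpath, int newdirfd, char *newpath)
{
#ifdef __linux__
  return syscall(SYS_renameat2, olddirfd, oldpath, newdirfd, newpath,
    RENAME_EXCHANGE);
#else
  errno = ENOSYS; // macos/freebsd: caller has to fall back or error out

  return -1;
#endif
}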

Which is a pity, because renameat() seems like what "mv" really WANTS to be built around. Instead of making a bunch of "path from root" for the recursive case, the clean way to handle -r is to have openat() style directory filehandles in BOTH the "from" and "to" sides, and that's what renameat() does: olddirfd, oldname, newdirfd, newname.

Although there's still the general dirtree scalability issue I have a design for but haven't properly coded yet: keeping one filehandle open per directory level leads to filehandle exhaustion if you recurse down far enough. I need to teach dirtree() to close parent filehandles and re-open them via open("..") as we return back up (then fstat() and compare the dev/ino and barf if it's not the same). (And even if I teach the dirtree() plumbing to do this, teaching _mv_ to do it would be separate because it's two parallel traversals happening at the same time.)

Without conserving filehandles you can't get infinite recursion depth, and you can trivially create an infinite depth via while true; do echo mkdir -p a b; echo mv a b/a; echo mv b a; done or similar so at least "rm -r" can't be limited by PATH_MAX. And without the stat to see if that gets us the parent node's same dev/ino back rm -rf could wind up deleting the WRONG STUFF if an ill-timed directory move happened in a tree that was going away, which is important to prevent. So we both need to check that the parent filehandle is safe to close because we can open("..") to get it back (if not, we followed a symlink or something and should keep the filehandle open: if you cause filehandle exhaustion by recursing through symlinks to directories, that's pilot error if you ask me), AND we need to confirm we got the right dev/ino back after reopening.
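The reopen-and-check on the way back up is simple enough in isolation (sketch, hypothetical names; the hard part is wiring it into dirtree(), not this):

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Reopen the parent via ".." and confirm it's the same directory we came
// down through, comparing against the dev/ino we saved on the way down.
int reopen_parent(int childfd, dev_t dev, ino_t ino)
{
  struct stat st;
  int fd = openat(childfd, "..", O_RDONLY|O_DIRECTORY);

  if (fd == -1) return -1;
  if (fstat(fd, &st) || st.st_dev != dev || st.st_ino != ino) {
    close(fd); // tree changed out from under us: error out or prune?

    return -1;
  }

  return fd;
}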

But if we DO get a different dev/ino when eventually reopening "..", what's the error recovery? We can drill back down from the top and see how far we get, but do we error out or prune the branch or what? Doing "mv" or "rm" on a tree we're in the middle of processing is bad form, and if we're getting different results later somebody mucked with our tree mid-operation, but what's the right RESPONSE? At a design level, I mean.

Anyway, that's a TODO I haven't tackled yet.


March 1, 2024

The pod people's flatbed truck arrived today, and dropped off a storage container using what I can only describe as an "elaborate contraption". (According to Fade, their website calls it PODzilla, imagine a giant rectangular daddy longlegs spider with wheels, only it lifts cargo containers on and off a big flatbed tow truck.) There is now a large empty box with a metal garage door on one side in the driveway, which I have been carrying the backlog of cardboard boxes we packed and taped up into.

I'm very tired. Fuzzy's gone to the u-haul store to buy more boxes. We're like 20% done, tops.

I tried to get a toybox release out yesterday (using the "shoot the engineers and go into production" method of just SHIPPING WHAT I HAVE, with appropriate testing and documentation), but got distracted by a mailing list question about the "getopt" command in pending and wound up wasting the evening going through that instead. Although really the immediate blocker on the release is that I un-promoted the passwd command when I rewrote lib/password.c, until I can properly test that infrastructure (under mkroot, not on my development system!), and that's a pain to set up tests for (the test infrastructure doesn't run under toysh yet because I've refused to de-bash it, I'm trying to teach toysh all the bashisms it uses instead), and there's a half-dozen other commands (groupadd, groupdel, useradd, userdel, sulogin, chsh) that are low hanging fruit to promote once that infrastructure's in, and what even ARE all the corner cases of this plumbing...

There are like 5 of these hairballs accumulated, each ALMOST ready, but that's the one that causes an actual regression if I don't finish it.

Wound up promoting getopt, so that's something I guess. Still not HAPPY with it, but it more or less does the thing. Given my stress levels accomplishing anything concrete is... an accomplishment.


February 29, 2024

The coreutils maintainer, Pádraig Brady, just suggested using LLMs to translate documentation. I keep thinking gnu can't possibly get any more so, but they manage to plumb new depths.

The University of Texas just started offering a master's degree program in "AI".

Linus Torvalds recently talked about welcoming LLM code into the kernel, in the name of encouraging the younguns to fleet their yeek or some such. (The same way he wants to have language domain crossings in ring zero by welcoming in Rust while the majority of the code is still C. Because nothing says "maintainable" like requiring a thorough knowledge of two programming languages' semantics and all possible interactions between them to trace the logic of a single system call. So far I've been able to build Linux without needing a BPF compiler. If at some point I can't build kernels without needing a Rust compiler, that's a "stay on the last GPLv2 release until finding a different project to migrate to" situation.)

The attraction of LLMs is literally Dunning-Kruger syndrome. Their output looks good to people who don't have domain expertise in the relevant area, so if you ask it to opine about economics it looks GREAT to people who have no understanding of economics. But if you ask it to output stuff you DO know about, well obviously it's crap. I.E. "It's great for everything else, but it'll never replace ME, so I can fire all my co-workers and just have LLMs replace them while I use my unique skills the LLMs do a bad job replicating".

Fundamentally, an LLM can't answer any question that hasn't got a known common answer already. It's morphing together the most common results out of a big web-scraped google cache, to produce the statistically most likely series of words from the input dataset to follow the context established by the prompt. The answer HAS to already be out there in a "let me Google that for you" sense, or an LLM can't provide it. The "morphing together" function can combine datasets ("answer this in the style of shakespeare" is a more advanced version of the old "jive" filter), but whether the result is RIGHT is entirely coincidental. Be careful what you wish for and caveat emptor are on full display.

I can't wait for license disputes to crop up. Remember the chimp who took a photo of itself and a court ruled the image wasn't copyrighted? LLM code was trained on copyrighted material, but the output is not itself copyrightable because human creativity wasn't involved. But it's not exactly public domain, either? Does modifying it and calling your derived work your own IP give you an enforceable copyright when 95% of it was "monkey taking a selfie" and the other 5% is stolen goods?

Lovely comment on mastodon, "Why should I bother to read an LLM generated article when nobody could be bothered to write it?" Also people speculating that ChatGPT-4 is so much worse than ChatGPT-3 that it must have been intentionally sabotaged (with speculation about how this helps them cash out faster or something?) when all the LLM designers said months ago that sticking LLM output into an LLM training dataset was like sticking a microphone into a speaker, and the math goes RAPIDLY pear shaped with even small amounts of contamination poisoning the "vibe" or whatever's going on there. (Still way more an art than a science.) So scraping an internet that's got LLM-generated pages in it to try to come up with the NEXT round of LLM training data DOESN'T WORK RIGHT. The invasive species rapidly poisons its ecosystem, probably leading to desertification.

Capitalism polluting its own groundwater usually has a longer cycle time, but that's silicon valley for you. And white guys who confidently answer questions regardless of whether they actually know anything about the topic or not are, of course, highly impressed by LLMs doing the same. They made a mansplaining engine, they LOVE it.

"Was hamlet mad" was a 100 point essay question in my high school shakespeare class, where you could argue either side as long as you supported it. "Was hamlet mad" was a 2 point true/false question in my sophomore english class later the same month. Due to 4 visits to the Johns Hopkins CTY program I wound up taking both of those the same semester in high school, because they gave me the senior course form to fill out so I could take calculus as a sophomore, so I picked my other courses off there too and they didn't catch it until several months later by which point it was too late. I did not enjoy high school, but the blatant "person in authority has the power to define what is right, even when it's self-contradictory and patently ridiculous" experience did innoculate me against any desire to move to Silicon Valley and hang out with self-important techbros convinced everyone else is dumber than they are and there's nothing they don't already know. A culture where going bankrupt 4 times and getting immediate venture capital funding for a 5th go is ABSOLUTELY NORMAL. They're card sharps playing at a casino with other people's money, counting cards and confidently bluffing. The actual technology is a side issue. And now they've created a confident bluffing engine based on advanced card counting in a REALLY BIG deck, and I am SO TIRED.


February 28, 2024

Trying hard to get a leap day toybox release out, because the opportunity doesn't come along that often.

This is why Linux went to time based releases instead of "when it's ready" releases, because the longer it's BEEN since the last release the harder it is to get the next release out. Working on stabilization shakes todo items loose and DESTABILIZES the project.


February 27, 2024

When I tested Oliver's xz cleanup, which resulted in finding this bug, what I muttered to myself (out loud) is "It's gotta thing the thing. If it doesn't thing the thing it isn't thinging."

This is my clue to myself that it may be time to step away from the keyboard. (I didn't exhaust myself programming today, I exhausted myself boxing up the books on 4 bookshelves so somebody could pick the empty bookshelves up and move them to her daughter's bedroom. This leaves us with only 14 more bookshelves to get rid of.)

Remember how two people were working on fdpic toolchain support for riscv? Well now the open itanium crowd has decided to remove nommu support entirely. Oh well. (It's a good thing I can't _be_ disappointed by riscv...)


February 24, 2024

Sigh, started doing release notes with today's date at the top, and as usual, that was... a bit ambitious.

Editing old blog entries spins off todo items as I'm reminded of stuff I left unfinished. Going through old git commits to assemble release notes finds old todo items. Doing "git diff" on my dirty main dev tree finds old todo items... The question is what I feel ok skipping right now.

I'm too stressed by the move to make good decisions about that at the moment...


February 23, 2024

Sigh, the censorship on crunchyroll is getting outright distracting. Rewatching "kobayashi maid dragon" (_without_ subtitles this time, I've heard it so many times I kind of understand some of the japanese already and I know the plot, so I'm trying to figure out which word means what given that I sort of know what they're saying), and in the first episode Tohru (the shapeshifted dragon) was shown from behind, from the waist up, with her shirt off. But you can no longer show a woman's bare back on crunchyroll (you could last year!), so they edited in another character suddenly teleporting behind her to block the view.

This is 1950's "Elvis Presley's Pelvis can't be shown on TV" levels of comstock act fuckery. (And IT IS A CARTOON. YOU CANNOT HAVE SEX WITH A DRAWING. There are so many LAYERS of wrong here...)

Imagine the biblical prohibitions on food had been what survived into the modern day instead of the weirdness about sex. The bible's FULL of dietary restrictions predating germ theory, the discovery of vitamins, or any understanding of allergens: can't mix milk and meat, no shellfish, no meat on fridays, give stuff up for lent, fasting, the magic crackers and wine becoming LITERALLY blood and human flesh that you are supposed to cannibalize but it's ok because it's _church_ magic... Imagine black censor bars over the screen every time somebody opens their mouth to eat or drink. Imagine digitally blurring out any foodstuff that isn't explicitly confirmed, in-universe, as kosher or halal. Imagine arguing that watching "the great british bake-off", a dirty foreign film only available to adults on pay-per-view in 'murica, was THE SIN OF GLUTTONY and would make you statistically more likely to get tapeworms because FOOD IS DANGEROUS.

Kind of distracting, isn't it? Whether or not you're particularly interested in whatever made anime character du jour shout "oiishiiii" yet again (it's a trope), OBVIOUSLY CENSORING IT is far, far, far more annoying than the trope itself could ever be. Just show it and keep going. Even if I wanted to (I don't) I can't eat a drawing of food through the screen... but why exactly would it be bad if I could? What's the actual PROBLEM?

I am VERY TIRED that right-wing loons' reversion to victorian "you can see her ankles!" prudishness is being humored by so many large corporations. These idiots should not have traction. Their religion is funny about sex EXACTLY the same way it's funny about food, with just as little scientific basis. These days even their closest adherents ignore the EXTENSIVE explicit biblical dietary prohibitions (Deuteronomy 14 is still in the bible, forbidding eel pie and unagi sushi although Paul insists that God changed his mind since then, but even the new testament forbids eating "blood" and "meat of strangled animals" in Acts 15:29, and the medieval church had dozens of "fast days" on top of that, plus other traditions like anorexia mirabilis), but these days we ignore all that because their god isn't real and we all AGREE the food prohibitions were nothing but superstition propagated from parent to child the same way santa claus and the tooth fairy are. Even the more RECENT stuff like "lent" (which gave us the McDonalds Fish sandwich because christianity was still culturally relevant as recently as the 1960s) is silly and quaint to anyone younger than Boomers.

But the SEX part persists (officiating marriage was too lucrative and provided too much control over the populace to give up), and is still causing enormous damage. Religious fasting is obsolete but shame-based abstinence is still taught in schools. Except most sexually transmitted diseases only still EXIST because of religious shame. Typhoid mary was stopped by science, because we published the information and tracked the problem down and didn't treat getting a disease as something shameful to be hidden and denied. Sunlight was the best disinfectant, we find OUT sources of contamination and track them down with the help of crowdsourcing. NOT with medieval "for shame, you got trichinosis/salmonella/listeria what a sinner, it's yahweh jehovah jesus's punishment upon you, stone them to death!" It's precisely BECAUSE we drove the religious nonsense out and replaced it with science and sane public policy that you can eat safely in just about any restaurant even on quite rural road trips. We have regular testing and inspections and have driven a bunch of diseases out of the population entirely, and when there IS an outbreak of Hepatitis A we don't BLAME THE VICTIMS, we track down the cause and get everybody TREATED.

I don't find cartoon drawings of women particularly arousing for the same reason I don't find cartoon drawings of food particularly appetizing... but so what if I did? So what if "delicious in dungeon" or "campfire cooking" anime made me hungry? Cartoon food on a screen is not real food in front of me for MULTIPLE REASONS, which also means I can't get fat from it, or catch foodborne pathogens, or allergens, or deprive someone else of their rightful share by eating too much, or steal the food on screen, or contaminate it so other people get sick. Even if I _did_ salivate at cartoon food... so what?

Even if I was attending a play with real actors eating real food up on the stage live in front of me, which I could literally SMELL, I still couldn't run up and eat it because that's not how staged entertainment works. But the Alamo Drafthouse is all about "dinner and a movie" as a single experience, and when I watched Sweeney Todd at the Alamo Drafthouse they had an extensive menu of meat pies (which is how I found out I'm allergic to parsnips), and it was NOT WRONG TO EAT WHILE WATCHING when the appropriate arrangements had been made to place reality in front of each individual attendee, EVEN THOUGH THAT MOVIE IS LITERALLY ABOUT CANNIBALISM. You can't make a "slippery slope" argument when the thing LITERALLY ACTUALLY HAPPENING would be fine. Oh wow, imagine if a summoned elf from another world climbed out of the TV and had sex with me right now! Um... ok? This is up there with wanting to fly and cast "healing" from watching a cartoon with magic in it. The same church also did a lot of witch burnings, it was wrong of them and we're over that now. Today, watching Bewitched or I Dream of Jeanie, I'm really not expecting to pick up spells because I'm not four years old, but if watching "The Tomorrow People" taught me to teleport... where's the downside? What do you think you're protecting anyone FROM?

These entertainments regularly show people being brutally, bloodily murdered, and THAT is just fine. Multiple clips of deadpool on youtube show the "one bullet through three heads in slow motion" scene unblurred, but the scenes showing consensual sex with the woman Wade Wilson lives with and proposes marriage to and spends half the movie trying to protect and/or get back to, THAT can't be shown on youtube. (And even the movie has some internalized misogyny, albeit in the form of overcompensating the other way and still missing "equality": in the scene where he collapses from the first sign of cancer, he's fully naked and she's wearing underwear, because male nudity isn't sexual while women in underwear or even tight clothing are always and without exception sexual and beyond the pale, and showing an orifice literally HALF THE POPULATION has is unthinkable even in an R rated movie.)

Sexual repression has always correlated strongly with fascism. The nazis' first book burning was the library of a sexual research institute. The victorian prudishness of the british coincided with the period they were conquering an empire, with jamaican slave plantations and feeding opium to china and the East India company subjugating india and native american genocides (George "town killer" Washington) and so on.

It's currently the boomers doing it. As teenagers in the 1960s they pushed "sex drugs rock and roll" into the mainstream, and then once they were too old to have sex with teenagers they outlawed teenagers having sex with EACH OTHER or selling pictures they took of themselves (the supreme court's Ferber decision in 1982 invented the legal category of "child porn" because some teenage boys selling pictures they took of themselves masturbating made it all the way to the supreme court, which is why everybody used to have naked baby pictures before that and the 1978 movie "superman" showed full frontal nudity of a child when his spacecraft lands without anybody thinking it was sexual, but 4 years later the law changed so filming things like that is now SO TERRIBLE that you can't even TALK ABOUT IT without being branded as "one of them", which makes being a nudist a bit frustrating). And now the Boomers are so old even the viagra's stopped working, they're trying to expunge sex from the culture entirely.

Sigh. This too shall pass. But it's gonna get uglier every year until a critical mass of Boomers is underground. (In 2019 there were estimated to be about 72 million Boomers left, and 4 million of them died between the 2016 and 2020 elections, which was the main reason the result came out differently.)

In the meantime... crunchyroll. Last week I tried to start a new series called "I couldn't become a hero, so I reluctantly decided to get a job", and I'm tempted to try to buy the DVD of a series I may not even like because I CANNOT WATCH THIS. In the first FIVE MINUTES they'd clearly edited a half-dozen shots to be less porny. I'm not interested in trying to sexualize cartoon characters, but this is "han shot first" and the ET re-release digitally editing the guns into walkie-talkies levels of obvious and unconvincing bullshit. Even when I'm theoretically on their side (defund the police, ACAB, I'm very glad the NRA is imploding) the cops who showed up to separate Elliott from his alien friend HAD GUNS and STOPPIT WITH THE PHOTOSHOP. If I can tell on a FIRST WATCH that you're editing the program within an inch of its life... every time I'm pulled right out of my immersion again.

I dislike smoking, but Disney photoshopping cigarettes out of Walt Disney's photos is historical revisionism. If a show had a bunch of characters chain-smoke but they digitally edited them to have lollypops and candycanes in their mouths all the time instead, gesticulating with them... You're not fooling anyone. Imagine if they did that to Columbo. Columbo with his cigar digitally removed and every dialog mention of it clipped out. You can be anti-cigar and still be WAY CREEPED OUT BY THAT. Cutting the "cigarette? why yes it is" joke out of Police Squad does not make you the good guy.

Do not give these clowns power. The law is whatever doesn't get challenged.


February 22, 2024

Sat down to rebuild all the mcm-buildall.sh toolchains this morning for the upcoming release (so I can build mkroot against the new kernel), but the sh4 sigsetjmp() fix went in recently (a register other stuff used was getting overwritten) and Rich said it was just in time for the upcoming musl release, so I asked on IRC how that was doing, and also mentioned my struggle with nommu targets and the staleness of musl-cross-make, and there was a long quite productive discussion that resulted in Rich actually pushing an mcm update bumping musl to 1.2.4! Woo! And it looks like they're doing a lot of cool stuff that's been blocked for a bit.

As part of that discussion, somebody new (sorear is their handle on the #musl channel on libera.chat) is working on a different riscv fdpic attempt, and meowray is working on adding fdpic support to llvm-arm. Either could potentially result in a nommu qemu test environment, I'm all for it.


February 21, 2024

One of my phone apps "updated" itself to spray advertising all over everything, after 2 years of not doing that. Showing one on startup I'd probably wince and let the frog boil, but having an animated thing ALWAYS on screen when it's running: nope. And Android of course does not let me downgrade to the previous version of anything because that would be giving up too much control.

It doesn't show ads if I kill the app, go into airplane mode, and relaunch it without network access. Then I get the old behavior. So I went into the app permissions, viewed all, and tried to revoke the "have full network access" permission. The app is an mp3 player reading files off of local storage; I switched to it from the google built-in one because Google's didn't understand the concept of NOT streaming, of just "only play local files"...

But Android won't let me revoke individual app permissions. I can view "other app capabilities", but long-press on it does nothing, nor does swipe to the side, and tapping on it just brings up a description with "ok". No ability to REVOKE any. Because despite having purchased a phone, I am the product not the customer. Even having put the phone into debug mode with the "tap a zillion times in a random sub-menu" trick, I still don't get to control app permissions. (So what are the permissions FOR, exactly?)

Sigh, serves me right for running vanilla android instead of one of the forks that actually lets me have control over my phone. I suppose there's a thing I could do with adb(?), but keeping the sucker in airplane mode while listening is a workaround for now...

And no I don't feel guilty about "but what about all the effort the app developer put into it": I can play an mp3 I downloaded through the "files" widget, it's built into the OS. Which is fine for the copy of Rock Sugar's "Reimaginator" Fade bought me for christmas: the whole album is one big ogg file, I threw it on my web server and downloaded it, and it plays fine. But the File app doesn't advance to the next one without manual intervention. "Play this audio file" is probably a single line of java calling a function out of android's standard libraries. Going from an android "hello world" app tutorial to "display list of files, click on one to play and keep going in order, show progress indicator with next/forward and pause/play button, keep going when screen blanked with the lock screen widget"... In fact nevermind that last bit, the "file" widget is doing the exact same lock screen widget playing that ogg file, so this is probably a standard gui widget out of android's libraries and you just instantiate it with flags and maybe some callbacks. (Sigh, it's Java, they're going to want you to subclass it and provide your own constructor and... Ahem.) Anyway, that's also built into the OS.

This is probably a weekend's work _learning_ how to do all that. Including installing android studio. And yes my $DAYJOB long ago was writing java GUI apps for Quest Multimedia and I taught semester long for-credit Java courses at austin community college: I'm stale at this but not intimidated by it.

But I haven't wanted to open the app development can of worms because I'm BUSY, especially now you have to get a developer ID from Google by providing them government ID in order to have permission to create a thing you can sideload on your OWN PHONE.

Not going down that rathole right now. I am BUSY.


February 20, 2024

Hmmm, you know a mastodon feed of this blog doesn't have to be CURRENT, I could do audio versions of old entries, do notes/01-23-4567 dirs each with an index.html and mp3 file (alongside the existing one-big-text version), and post links to/from a (new, dedicated) mastodon account as each one goes up, which would allow people to actually comment on stuff, without my tendency to edit and upload weeks of backlog at a time. (Hmmm, but _which_ mastodon account? Does dreamhost do mastodon? Doesn't look like it. I don't entirely trust mstdn.jp to still be around in 5 years, I mean PROBABLY? But it's outside of my control. How much of the legal nonsense of running your own server is related to letting OTHER people have accounts on it, and how much is just "the Boomers are leaving behind a dysfunctionally litigious society"? There was a lovely thread about mastodon legal setup tricks for individuals running their own server, things like notifying some government office (a sub-program of the library of congress I think?) to act as a DMCA takedown notice recipient "agent" on your behalf, but it was on twitter and went away when that user deleted their account. Mirror, don't just bookmark...)

Ahem: backstory.

This blog is a simple lightly html formatted text file I edit in vi, and I tend to type in the text extemporaneously and do most of the HTML formatting in a second pass, plus a bunch of editing to replace [LINK] annotations with the appropriate URL I didn't stop to grab at the time, and finish half-finished trail off thoughts not englished wordily because brain distract in

Anyway, the "start of new entry" lines are standardized, and as I go through editing I replace my little "feb 20" note with a cut and paste from the last entry I edited to the start of the new one, and change the date in the three places it occurs. Yes vi has cut and paste: "v [END] y [PAGEUP... cursor cursor...] p" and then "i" to go into insert mode and cursor over to the three places the entry's date shows up in the first line and type over it because I'm sure there's a "search and replace within current line" magic key but I've never bothered to learn it. It would be great to to have the date in just ONE place, but I'm editing raw HTML and it's got an <a name="$DATE"> to provide jump anchors, an <hr> tag to provide a dividing line, <h2> start and end tags to bump the font up, an <a href="#$DATE"> tag to provide an easily copyable link to the entry (each entry links to itself), and then an expanded english date to provide the display name for the link. (And then on the next line, usually a <span id=programming> tag so SOMEDAY I can make multiple rss feed generators that show only specific categories, if you "view source" there's a commented out list of span tags at the top I've historically used and try to stick to.)

The advantage of each new entry having a standardized line at the start is it's easy to search for and parse, and I have a python script a friend (Dr. What back at timesys) wrote ages ago to generate an rss feed for my blog, which I've rewritten a lot since then but it's still in python rather than sed out of historical inertia, and also me treating rss (actually "atom", I think?) as a magic undocumented format likely to shatter if touched. (It is python 2. It will not ever be python 3. If a debian upgrade takes away python 2, that's when the sed comes out. Posix has many failings, but "posix-2024" is not going to force you to rewrite "posix-2003" scripts that work, the same way modern gasoline still works in a 20 year old car.)

What this form of blogging does NOT provide is any way for readers to leave comments (other than emailing me or similar), which was the big thing I missed moving from livejournal back to blogging on my own site. And I am NOT doing that myself: even if I wanted to try to deal with some sort of CGI plumbing for recording data (I don't), user accounts and moderation and anti-spam and security and so on are way too much of a pain to go there. (I have met the founders of Slashdot. It ate their lives, and that was 20 years ago.)

But now that I'm on mastodon (as pretty much my only social network, other than some email lists and the very occasional youtube comment under an account not directly connected to anything else), using a mastodon account as an rss feed for the blog seems... doable? Ok, the entries don't have TITLES. Summaries would be a problem. (On mstdn.jp posts have a 500 character limit, I guess I could just do start of entry. But they're not really organized with topic sentences, either.)

The real problem has been that I'm not posting promptly, and tend to do so in batches (because editing) which floods the feed. Possibly less of an issue with rss feeds, where you can get to it much later. (The feed readers I've seen had each data source basically in its own folder, not one mixed together stream like social media likes to do so stuff gets buried if you don't get to it immediately.)

There's also a lot of "chaff", since a blog has multiple topics and I might want to serialize just one (the id=programming stuff). I've (manually) put the tags in, but haven't USED them yet. Haven't even mechanically confirmed the open/close pairs match up, just been eyeballing it...


February 19, 2024

Watched the building a busybox based debian peertube video, which really should have been a 5 minute lightning talk. It boils down to "I use mmdebstrap instead of debootstrap, here's some command line options that has and how I used them to install debian's busybox package in a semi-empty root directory and got it to boot". It's not _really_ a busybox based debian, more hammering in a screw and filing the edges a bit.

First he established "debian's too big for embedded" by doing mmdebstrap --variant=minbase unstable new-dir-name and showing the size (not quite 200 megs), then he trimmed it with --dpkgopt='path-exclude=/usr/share/man/*' and again for (/usr/share/doc/* and /usr/share/locale/*) which was still over 100 megs.

Next he mentioned you can --include packagename (which takes a CSV argument) and introduced the --variant=custom option which only installs the packages you list with --include. And he talked about --setup-hook and --customize-hook which are just shell command lines that run before and after the package installs (in a context he didn't really explain: it looks like "$1" is the new chroot directory and the current directory already has some files in it from somewhere? Maybe it's in the mmdebstrap man page or something...)

Putting that together, his "busybox install" was:


INCLUDE_PKGS=dpkg,busybox,libc-bin,base-files,base-passwd,debianutils
mmdebstrap --variant=custom --include=$INCLUDE_PKGS \
  --hook-dir=/usr/share/mmdebstrap/hooks/busybox \
  --setup-hook='sed -i -e "1 s/:x:/::/g" "$1/etc/passwd"' \
  --customize-hook='cp inittab $1/etc/inittab' \
  --customize-hook='mkdir $1/etc/init.d; cp rcS $1/etc/init.d/rcS' \
  unstable busybox-amd64

(Note, the "amd64" at the end was just naming the output directory, the plumbing autodetects the current architecture. There's probably a way to override that but he didn't go there.)

He also explained that mmdebstrap installs its own hooks for busybox in /usr/share/mmdebstrap/hooks/busybox and showed setup00.sh and extract00.sh out of there, neither of which seemed to be doing more than his other customize-hook lines so I dunno why he bothered, but that's what the --hook-dir line was for apparently. (So it doesn't do this itself, and it doesn't autodetect it's installing busybox and fix stuff up, but you can have it do BITS of this while you still do most of the rest manually? I think?)

In addition to the packages he explicitly told it to install, this sucked in the dependencies gcc-12-base:amd64 libacl1:amd64 libbz2-1.0:amd64 libc6:amd64 libdebconfclient0:amd64 libgcc-s1:amd64 liblzma5:amd64 libpcre2-8-0:amd64 libselinux1:amd64 mawk tar zlib1g:amd64 and that list has AWK and TAR in it (near the end) despite busybox having its own. I haz a confused. This was not explained. (Are they, like, meta-packages? I checked on my ancient "devuan botulism" install and awk claims to be a meta-package, but tar claims to be gnu/tar.)

Anyway, he showed the size of that (still huge but there's gcc in there) then did an install adding the nginx web server, which required a bunch more manual fiddling (creating user accounts and such, so he hasn't exactly got a happy debian base that "just works" for further packages, does he) and doing that added a bunch of packages and ~50 megs to the image size. (Plus naginiks's corporate maintainer went nuts recently and that project forked under a new name, but that was since this video.)

Finally he compared it against the alpine linux base install, which is still smaller than his "just busybox" version despite containing PERL for some reason. This is because alpine builds against musl, which the above technique does not address AT ALL. (It's pulling packages from a conventionally populated repository. Nothing new got built from source.)

Takeaway: the actual debian base appears to be the packages dpkg, libc-bin, base-files, base-passwd, and debianutils. This does not provide a shell, command line utilities, or init task, but something like toybox can do all that. Of course after installing a debootstrap I generally have to fiddle with /etc/shadow, /etc/inittab, and set up an init ANYWAY. I even have the checklist steps in my old container setup docs somewhere...


February 18, 2024

The limiting factor on a kconfig rewrite has been recreating menuconfig, but I don't really need to redo the current GUI. I can just have an indented bullet point list that scrolls up and down with the cursor keys and highlight a field with reverse text. Space enables/disable the currently highlighted one, and H or ? shows its help text. Linux's kconfig does a lot with "visibility" that I don't care about (for this everything's always visible, maybe greyed if it needs TOYBOX_FLOAT or something that's off?). And Linux's kconfig goes into and out of menus because an arbitrarily indented bullet point list would go off the right edge for them: the kernel's config mess goes a dozen levels deep, but toybox's maximum depth is what, 4? Shouldn't be that hard...

As for resolving "selects" and "depends", according to sed -n '/^config /,/^\*\//{s/^\*\///;p}' toys/*/*.c | egrep 'selects|depends' | sort -u there aren't current any selects, and the existing depends use fairly simple logic: && and || and ! without even any parentheses, which is the level of logic already implemented in "find" and "test" and such (let alone sh). Shouldn't be too challenging. I should probably implement "selects" and parentheses just in case, though...

The cursor up and down with highlighting stuff I already did in "top" and "hexedit" and such, and I should really revisit that area to do shell command line editing/history...


February 17, 2024

The deprecation news of the week:

The last one is sad. FreeBSD is rendering itself irrelevant in the embedded world. Oh well, if they want to embrace being "MacOS Rawhide and nothing more", it's their project...

Ongoing sh4 saga: I might be able to get FDPIC working on qemu-system-sh4, but it turns out qemu-system-sh4 doesn't boot mkroot anymore, even in a clean tree using the known-working kernel from last release.

I bisected it to a specific commit but commenting out the setvbuf() in main didn't help. Tracked it down to sigsetjmp() failing to return. Note that this is SET, which should just be writing to the structure. Yes it's 8 byte aligned. This bug is jittery crap that heisenbugs away if my debug printfs() have too many %s in them (then it works again). Asked for help on the musl, linux-sh, and toybox lists.

And of course, I got private email in reply to my list posts. As always:

On 2/16/24 20:22, [person who declined to reply publicly] wrote:
> Shot into the blue:
>
> try with qemu-user; mksh also currently has a regression test
> failing on a qemu-user sh4 Debian buildd but with one of the
> libcs only (klibc, incidentally, not musl, but that was with
> 1.2.4)

Hmmm, that does reproduce it much more easily, and I get more info:

Unhandled trap: 0x180
pc=0x3fffe6b0 sr=0x00000001 pr=0x00427c40 fpscr=0x00080000
spc=0x00000000 ssr=0x00000000 gbr=0x004cd9e0 vbr=0x00000000
sgr=0x00000000 dbr=0x00000000 delayed_pc=0x00451644 fpul=0x00000000
r0=0x3fffe6b0 r1=0x00000000 r2=0x00000000 r3=0x000000af
r4=0x00000002 r5=0x00481afc r6=0x407fffd0 r7=0x00000008
r8=0x3fffe6b0 r9=0x00456bb0 r10=0x004cea74 r11=0x3fffe6b0
r12=0x3fffe510 r13=0x00000000 r14=0x00456fd0 r15=0x407ffe88
r16=0x00000000 r17=0x00000000 r18=0x00000000 r19=0x00000000
r20=0x00000000 r21=0x00000000 r22=0x00000000 r23=0x00000000

Might be able to line up the PC with the mapped function with enough digging to find the failing instruction...

What IS a trap 0x180? Searching the sh4 software manual for "trap" says there's something called an exception vector... except "exception" has over 700 hits in that PDF and "exception vector" has two, neither of which are useful.

Ok, in qemu the string "Unhandled trap" comes from linux-user/sh4/cpu_loop.c which is printing the return code from cpu_exec() which is in accel/tcg/cpu-exec.c which is a wrapper for cc->tcg_opts->cpu_exec_enter() which is only directly assigned to by ppc and i386 targets, I'm guessing uses one of those curly bracket initializations in the others? According to include/hw/core/tcg-cpu-ops.h the struct is TCGCPUOps... Sigh, going down that path could take a while.

Alright, cheating EVEN HARDER:

$ grep -rw 0x180 | grep sh
hw/sh4/sh7750_regs.h:#define SH7750_EVT_ILLEGAL_INSTR 0x180 /* General Illegal Instruction */

What? I mean... WHAT? Really? (That macro is, of course, never used in the rest of the code.) But... how do you INTERMITTENTLY hit an illegal instruction? (What, branch to la-la land? The sigsetjmp() code doesn't branch!)

That email also said "It might just as well be just another qemu bug..." which... Maybe? It _smells_ like unaligned access, but I don't know _how_, and the structure IS aligned. I don't see how it could be uninitialized anything, since A) the sigsetjmp() function in musl writes into the structure without reading from it, B) adding a memset() beforehand doesn't change anything. If a previous line is corrupting memory... it's presumably not heap, because nothing here touches the heap. The "stack taking a fault to extend itself" theory was invalidated by confirming the failure case does not cross a page boundary. "Processor flags in a weird state so that an instruction traps when it otherwise wouldn't" is possible, but WEIRD. (How? What would put the processor flags in that state?)
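
Next debugging idea: qemu-user has a built-in gdb stub, so I can maybe watch the trap happen instruction by instruction. Sketch, port number arbitrary:

$ qemu-sh4 -g 1234 ./sh -c 'echo hello' &
$ gdb-multiarch ./sh -ex 'target remote :1234' -ex 'break sigsetjmp' -ex 'continue'

Then stepi through sigsetjmp and see which instruction raises the trap and what the registers looked like when it did. Assuming the heisenbug survives running under a debugger at all, which given how it reacts to extra printf()s is not a safe bet.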

Continuing the private email:

> There's also that whole mess with
> https://sourceware.org/bugzilla/show_bug.cgi?id=27543
> which affects {s,g}etcontext in glibc, maybe it applies
> somewhere within musl? (The part about what happens when
> a signal is delivered especially.)

Which is interesting, but musl's sigsetjmp.s doesn't have frchg or fschg instructions.

But what I _could_ try doing is building and testing old qemu versions, to see if that affects anything...
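
Something like this, if it turns out to be a qemu regression (tag list arbitrary, and the build layout has moved around between releases):

$ git clone https://gitlab.com/qemu-project/qemu.git && cd qemu
$ for TAG in v5.2.0 v6.2.0 v7.2.0 v8.0.0; do git checkout -f $TAG &&
  ./configure --target-list=sh4-linux-user && make -j $(nproc) &&
  echo === $TAG === && ./build/qemu-sh4 ../sh -c 'echo hello'; done

If an old one works and a new one doesn't, git bisect between them... modulo the intermittent nature of this thing making "good" expensive to trust.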


February 16, 2024

Broke down and added "riscv64::" to the mcm-buildall.sh architecture list, which built cross and native toolchains. (Because musl/arch only has riscv64, no 32 bit support.)

To add it to mkroot I need a kernel config and qemu invocation, and comparing qemu-system-riscv64 -M '?' to ls linux/arch/riscv/configs gives us... I don't know what any of these options are. In qemu there's shakti, sifive, spike, and virt boards. (It would be really nice if a "none" board could be populated with memory and devices and processors and such from the command line, but that's not how IBM-maintained QEMU thinks. There are "virt" boards that maybe sort of work like this with a device tree? But not command line options, despite regularly needing to add devices via command line options ANYWAY.) Over on the kernel side I dunno what a k210 is, rv32 has 32 in it with musl only supporting 64, and nommu_virt_defconfig is interesting but would have to be a static PIE toolchain because still no fdpic. (Maybe later, but I could just as easily static pie coldfire.)

(Aside: static pie on nommu means that running "make tests" is unlikely to complete because it launches and exits zillions of child processes, any of which can suddenly fail to run because memory is too fragmented to give a large enough contiguous block of ram. FDPIC both increases sharing (the text and rodata segments can be shared between instances, meaning there's only one of each which persist as toybox processes run and exit), and it splits the 4 main program segments apart so they can independently fit into smaller chunks of memory (the two writeable segments, three if you include stack, are small and can move independently into whatever contiguous chunks of free memory are available). So way less memory thrashing, thus less fragmentation, and way less load in general (since each instance of toybox doesn't have its own copy of the data and rodata segments) thus a more reliable system under shell script type load. This is why I'm largely not bothering with static pie nommu systems: I don't expect them to be able to run the test suite anyway.)

This leaves us with linux's riscv "defconfig", which I set building and it ran FOREVER and was full of modules, and I really wasn't looking forward to stripping that down, so I went "does buildroot have a config for this?" And it does: qemu_riscv64_virt_defconfig with the corresponding qemu invocation from board/qemu/riscv64-virt/readme.txt being "qemu-system-riscv64 -M virt -bios fw_jump.elf -kernel Image -append "rootwait root=/dev/vda ro" -drive file=rootfs.ext2,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -netdev user,id=net0 -device virtio-net-device,netdev=net0 -nographic" which... needs a bios image? Really? WHY? You JUST INVENTED THIS ARCHITECTURE, don't make it rely on LEGACY FIRMWARE.

But maybe this is an easier kernel .config to start with (less to strip down anyway), so I tried building it and of course buildroot wants to compile its own toolchain, within which the glibc build went: checking for suffix of object files... configure: error: in `/home/landley/buildroot/buildroot/output/build/glibc-2.38-44-gd37c2b20a4787463d192b32041c3406c2bd91de0/build': configure: error: cannot compute suffix of object files: cannot compile

Right, silly me, it's a random git snapshot that's weeks old now, so I did a "git pull" and ran it again and... exact same failure. Nobody's built a 64 bit riscv qemu image in buildroot in multiple weeks, or they would have noticed the build failure.

Open source itanic. It's not a healthy smell.

(WHY is it building a random glibc git snapshot? What's wrong with the release versions? Buildroot can PATCH STUFF LOCALLY, overlaying patches on top of release versions was one of the core functions of buildroot back in 2005. Right, ok, back away slowly...)


February 15, 2024

Rich confirmed that he intentionally broke another syscall because he doesn't like it, and wants all his users to change their behavior because it offends him. So I wrapped the syscall.

But the problem with fixing up hwclock to use clock_settime() and only call settimeofday() for the timezone stuff (via the wrapped syscall, yes this is a race condition doing one time update with two syscalls) is now I need to TEST it, and it's one of those "can only be done as root and can leave your host machine in a very unhappy state" things. The clock jumping around (especially going backwards) makes various systemwide things unhappy, and doing it out from under a running xfce and thunderbird and chromium seems... contraindicated.
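
My current plan is to only ever test it inside a disposable mkroot VM, where a clock flailing around can't hurt anything I care about. Roughly (flag syntax from memory, assuming the usual mkroot output layout and that the new hwclock is baked into the initramfs):

$ cd root/x86_64 && ./run-qemu.sh -rtc base=2000-01-01T00:00:00
# hwclock              # emulated RTC reports Y2K because of -rtc base=
# hwclock -s           # set system clock from RTC
# date                 # should now also say Y2K
# date -s 2038-01-19   # warp the system clock
# hwclock -w           # write the warped time back to the emulated RTC

The -rtc base= is the important bit: starting the emulated RTC at a known wrong value means I can tell whether each direction of the copy actually moved anything, and the whole mess evaporates when qemu exits.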


February 14, 2024

Emailed Maciej Rozycki to ask about the riscv fdpic effort from 2020 and got back "Sadly the project didn't go beyond the ABI design phase."

Since arm can (uniquely!) do fdpic _with_ mmu, I tried to tweak the sh4 config dependencies in fs/Kconfig.binfmt in the kernel to move superh out of the !MMU group and next to ARM, and the kernel build died with binfmt_elf_fdpic.c:(.text+0x1b44): undefined reference to `elf_fdpic_arch_lay_out_mm'.

Emailed the superh and musl mailing lists with a summary of my attempts to get musl-fdpic working on any target qemu-system can run. (Not including the or1k/coldfire/bamboo attempts that, it turns out, don't support fdpic at all.) Hopefully SOMEBODY knows how to make this work...


February 13, 2024

Emailed linux-kernel about sys_tz not being namespaced, cc-ing two developers from last year's commit making the CLONE_NEWTIME flag actually work with clone().

I don't expect a reply. As far as I can tell the kernel development community is already undergoing gravitational collapse into a pulsar, which emits periodic kernels but is otherwise a black hole as far as communication goes. Members-only.

The clone flag that didn't work with clone() was introduced back in 2019 and stayed broken for over 3 years. Linux's vaunted "with enough eyeballs all bugs are shallow" thing relied on hobbyists who weren't just focusing on the parts they were paid to work on. You don't get peer review from cubicle drones performing assigned tasks.

I am still trying to hobbyist _adjacent_ to the kernel, and it's like being on the wrong side of gentrification or something. The disdain is palpable.


February 12, 2024

So glibc recently broke settimeofday() so if you set time and timezone at the same time it returns -EALLHAILSTALLMAN.

But if you DON'T set them together, your clock has a race window where the time is hours off systemwide. And while "everything is UTC always" is de-facto Linux policy, dual boot systems have to deal with windows keeping system clock in local time unless you set an obscure registry entry which isn't universally honored. Yes this is still the case on current Windows releases.

Digging deeper into it, while a lot of userspace code uses the TZ environment variable these days, grep -rw sys_tz linux/* finds it still used in 36 kernel source files and exported in the vdso. The _only_ assignment to it is the one in kernel/time/time.c from settimeofday(), so you HAVE to use that syscall to set that field which the kernel still uses.

When musl switched settimeofday() to clock_settime() in 2019 it lost the ability to assign to sys_tz at all, which I think means it lost the ability to dual boot with most windows systems?

The other hiccup is sys_tz didn't get containerized when CLONE_NEWTIME was added in 2019 so it is a systemwide global property regardless of namespace. Then again they only made it work in clone rather than unshare last year so that namespace is still cooking.

The real problem is the actual time part of settimeofday() is 32 bit seconds, ala Y2038. That's why musl moved to the 64 bit clock_settime() api. The TZ environment variable assumes the hardware clock is returning utc. The point of sys_tz is to MAKE it return UTC when the hardware clock is set wrong because of windows dual booting.


February 11, 2024

The paper Decision Quicksand: How Trivial Choices Suck Us In misses an important point: when the difference in outcome is large, it's easier to weigh your options. When the difference in outcome is small, it's harder to see/feel what the "right thing" is because the long-term effect of the decision is buried in noise. So more important questions can have a clearer outcome and be easier to decide, less important ones tend to get blown around by opinion. (Hence the old saying, "In academia the fighting is so vicious because the stakes are so small". See also my longstanding observation that open source development relies on empirical tests to establish consensus necessary for forward progress, subjective judgements from maintainers consume political capital.)

The classic starbucks menu decision paralysis is similar (there's no "right choice" but so many options to evaluate) but people usually talk about decision fatigue when they discuss that one (making decisions consumes executive function). These are adjacent and often conflated factors, but nevertheless distinct.


February 10, 2024

Sigh, shifting sands.

So gentoo broke curses. The gnu/dammit loons are making egrep spit pointless warnings and Oliver is not just trying to get me to care, but assuming I already do. Each new glibc release breaks something and this time it's settimeofday(), which broke hwclock.

And I'm cc'd on various interminable threads about shoving rust in the kernel just because once upon a time I wrote documentation about the C infrastructure they're undermining.

I can still build a kernel without bpf, because (like perl) it's not in anything vital to the basic operation of a Linux compute node. If the day comes I can't build a kernel without rust, then I stay on the last version before they broke it until finding a replacement _exactly_ like a package that switched to GPLv3. I have never had a rust advocate tell me a GOOD thing about Rust other than "we have ASAN too", their pitch is entirely "we hate C++ and confuse it with C so how dare you not use our stuff, we're as inevitable as Hillary Clinton was in 2016"; kind of a turn-off to be honest. They don't care what the code does, just that it's in the "right" language. This was not the case for go, swift, zig, oberon, or any of the others vying to replace C++. (Which still isn't C, and I'm not convinced there's anything wrong with C.)

All this is a distraction. I'm trying to build towards goals, but I keep having to waste cycles getting back to where I was because somebody broke stuff that previously worked.


February 9, 2024

Finally checked what x86-64 architecture generation my old laptop is, and it's v2. Presumably upgrading from my netbook to this thing got me that far (since the prebuilt binaries in AOSP started faulting "illegal instruction" on my old netbook circa 2018, and this was back when I was trying to convince Elliott the bionic _start code shouldn't abort() before main if stdin wasn't already open so I kinda needed to be able to test the newest stuff...)

Meaning the pointy haired corporate distros like Red Hat and Ubuntu switching to v3 does indeed mean this hardware can't run them. Not really a loss, the important thing is devuan/debian not abandoning v2. (Updating devuan from bronchitis->diphtheria presumably buys me a few years of support even if elephantitis were to drop v2 support. I _can_ update to new hardware, just... why?)

Went to catch up on the linux-sh mailing list (superh kernel development) and found that half the "LTP nommu maintainer" thread replies got sorted into that folder due to gmail shenanigans. (Remember how gmail refuses to send me all the copies of email I get cc'd on but also get through a mailing list, and it's potluck which copy I get _first_? Yeah, I missed half of another conversation. Thanks gmail!)

There's several interesting things Greg Ungerer and Geert Uytterhoeven said that I totally would have replied to back on January 23rd... but the conversation's been over a couple weeks now. Still, "you can implement regular fork() on nommu with this one simple trick" is an assertion I've heard made multiple times, but nobody ever seems to have _done_ it, which smells real fishy.

Arguing with globals.h generation again: sed's y/// is terribly designed because it doesn't support ranges so converting from lower to upper case (which seems like it would be the DEFINITION of "common case") is 56 chars long (y///+26+26), and hold space is terribly designed because "append" inserts an un-asked-for newline and the only way to combine pattern and hold space is via append. With s/// I can go \1 or & in the output, but there's no $SYNTAX to say "and insert hold space here" in what I'm replacing. You'd think there would be, but no. (More than one variable would also be nice, but down that path lies awk. And eventually perl. I can see drawing the line BEFORE there.)

But some of this is REALLY low hanging fruit. I don't blame the 1970s Unix guys who wrote the original PDP-11 unix in 24k total system ram (and clawed their way up to 128k on its successor the 11/45), but this is gnu/sed. They put in lots of extensions! Why didn't they bother to fix OBVIOUS ISSUES LIKE THAT? Honestly!

My first attempt produced 4 lines of output for each USE() block, which worked because C doesn't care, but looks terrible. Here's a variant that glues the line together properly: echo potato | sed -e 'h;y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/;H' -e 'g;s/\n/ /;s/\([^ ]*\) \(.*\)/USE_\2(struct \1_data \1;)/'

Which is mildly ridiculous because all it's using hold space for is somewhere to stash the lower case string because I can't tell y/// to work on PART of the current line: the /regex/{commands} syntax says which entire lines to trigger on, and s/// doesn't have a way to trigger y/// or similar on just the text it's matched and is replacing.
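
Unrolled into sed -f form with comments, since the one-liner is write-only:

# hold space = copy of the lowercase line ("potato")
h
# uppercase the pattern space: y/// has no ranges, hence the whole alphabet
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
# append pattern to hold (with that un-asked-for newline): "potato\nPOTATO"
H
# copy hold space back over the pattern space and glue the two lines together
g
s/\n/ /
# rearrange "potato POTATO" into USE_POTATO(struct potato_data potato;)
s/\([^ ]*\) \(.*\)/USE_\2(struct \1_data \1;)/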

(And while I'm complaining about things sed SHOULD let you do, why can't I match the first or last line WITHIN a range? The 1,$p ranges don't _nest_, so in sed -n '/^config /,${/^ *help/,/^[^ ]/{1d;$d;p}}' toys/*/ls.c | less the 1d;$d is irrelevant because that's "whole file", not "current match range". I want a syntax to say "this range is relative to the current scope" which would be easy enough for me to implement in the sed I wrote, but wouldn't be PORTABLE if I did that. It's like the gnu/dammit devs who added all these extensions never tried to actually USE sed in a non-trivial way...)
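
Spelled out with the nesting indented, what that command is actually saying:

# outer range: first "config" line through end of file
/^config /,${
  # inner range: each "help" line through the next unindented line
  /^ *help/,/^[^ ]/{
    # but 1 and $ still mean the first/last line of the FILE, not of the
    # enclosing range, so these never fire where I want them to:
    1d
    $d
    p
  }
}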

But eh, made it work. And it runs on toys/*/*.c in a single sed invocation (and then a second sed on the output of the first to generate the GLOBALS() block from the previous list of structure definitions) and is thus WAY faster than the "one sed call per input file" it was doing before. Fast enough I can just run it every time rather than doing a "find -newer" to see if I need to run it. (And, again, potentially parallelizable with other headers being generated.)

But that just cleaned up generation of the header with the wrong USE() macros, which still build breaks. I need per-file USE() macros, or some such. Back up, design time. (Meaning "restate the problem from first principles and see where telling that story winds up".)

The GLOBALS() block is unique per-file, and shared by all commands using the same file. Previously the name of the block was the name of the file, but sed working on toys/*/*.c doesn't KNOW the name of the current file it's working on (ANOTHER thing the gnu clowns didn't extend!) and thus I'm using the last #define FOR_walrus macro before each GLOBALS() block (once again: sed "hold space", we get ONE VARIABLE to save a string into) as the name of both the structure type name and the name of the instance of that struct in the union. So now instead of being the name of the file, it's the name of the first command in the file, which is fine. As long as it's unique and the various users can agree on it.

Which means the manual "#define TT.filename" overrides I was doing when the "#define FOR_command" didn't match can go away again. (And need to, they're build breaks.) So that's a cleanup from this...

But there's still the problem that the first command in the file can be switched off in menuconfig, but a later command in the same file can be enabled, so we're naming the struct after the first command, but a USE() macro with the name OF that command would be disabled and thus yank the structure out of the union, resulting in a build break.

The REASON I want to yank the structure out of the union is so the union's size is the ENABLED high water mark, not the "every possible command including the ones in pending" high water mark.

Oh, but I'm generating the file each time now, which means I don't need the USE() macros. Instead I need to generate globals.h based on the toys/*/*.c files that are switched on by the current config, meaning the sed invocation takes $TOYFILES as its input file list instead of the wildcard path. There's an extra file (main.c) in $TOYFILES, but I don't care because it won't have a GLOBALS() block in it. Generating $TOYFILES already parsed .config earlier in make.sh so I don't even have to do anything special, just use data I already prepared.


February 8, 2024

So scripts/make.sh writes generated/globals.h via a pile of sed invocations against toys/*/*.c and alas it can't do just ONE sed invocation but has to loop calling sed against individual files because it needs to know the current input filename, which slows it down tremendously _and_ doesn't parallelize well, but anyway... I just modified it to wrap a USE_FILENAME() macro around each "struct filename_struct filename;" line in union global_union {...} this; at the end of the file, in hopes of shrinking sizeof(this) down to only the largest _enabled_ GLOBALS() block in the current config. (So the continued existence of ip.c in pending doesn't set a permanent high water mark according to scripts/probes/GLOBALS.)

Unfortunately, while the current filename is used to name the structure and the union member, and TT gets defined to TT.filename even with multiple commands in the same file... there's no guarantee a config FILENAME entry actually exists, which means there's no guarantee the USE_FILENAME() macro I'm adding is #defined. This showed up in git.c, and then again in i2ctools.c: lots of commands, none of them with the same name as the file.

Need to circle back and redesign some stuff to make this work...

Ok, second attempt: use the #define FOR_blah macros instead of the filename, which _does_ allow a single sed invocation to work on toys/*/*.c in one go, although I have to do a lot of hold space shenanigans and use y/// with the entire alphabet listed twice instead of "tr a-z A-Z" to do the upper and lower case variants. But I made the header file I wanted to make! Which now doesn't work for a DIFFERENT reason: if the first command in the file isn't enabled, the USE_BLAH() thing removes the TT struct from the union, and the second command in the same file attempting to use the shared structure gets an undefined member error dereferencing TT.

Which... um, yeah. That's what would happen. I need a USE() macro that's X or Y or Z, which I haven't got logic for. I can add a new hidden symbol and do either selects or depends, but I kinda want to SIMPLIFY the kconfig logic instead of complicating it.

Long ago when I was maintaining busybox, I proposed factoring out the Linux kernel's "kconfig" so other packages can use it, about the way "dtc" (the device tree compiler) eventually got factored out. This fell apart because I wanted to keep it in the kernel source but make it another thing the kernel build could install, and Roman Zippel or whoever it was wanted to remove it from the kernel and make a new package that was a build dependency of the linux kernel, which was such a horrible idea that NOT EVER RE-USING THIS CODE was better than adding a build dependency to the kernel, so the idea died. (I note that dtc is still in Linux, despite also being an external project. They didn't do the "make install_dtc" route from the linux source, but they didn't add the dependency either. Instead they maintain two projects in parallel forever, which is what the then-kconfig maintainer insisted was impossible. He's also the guy who rejected properly recognizing miniconfig as a thing unless I did major surgery on the kconfig.c files. I waited for him to go away. He did eventually, but I haven't bothered to resubmit. The perfect is the enemy of the good, and if my only option is the Master Race I'm ok siding with extinction. Kinda my approach to Linux development in a nutshell, these days.)

And since factoring out kconfig DIDN'T happen, and I've instead got an ancient snapshot of code under an unfortunate license that has nothing to do with modern linux kconfig (which became TURING COMPLETE and can now rm -rf your filesystem, bravo), I need to discard/rewrite it and want to reproduce as little as possible. The scripts/mkflags.c code was supposed to be the start of that, but that wound up using output digested by sed. And then the scripts/config2help.c code was going to be the start of a kconfig rewrite, but that stalled and started to back itself out again at the design level because a zillion sub-options is a bad thing. (Somebody once contributed the start of one written in awk. I still haven't got an awk.)

I haven't reopened this can of worms recently, but changing the config symbol design requirements is... fraught. What do I want this to DO...


February 7, 2024

Sigh, I needed a second email account and went "my phone demanded a google account to exist for Android, I'll use that one"... and was then waiting for the email to arrive for 2 weeks. Today they texted me about it and I investigated and "auto-sync" is turned off, so of course I'd never get a notification or see a new email in the list: I had to do the "pull down" gesture to load new emails. (I remember this! Same problem came up last time I tried to use this app some years back, when I still had a work gmail account on the phone for the weekly google hangouts calls that became google meet calls when hangouts joined the google graveyard and we were forced to migrate and I needed an updated link from an email...)

I went into the settings to turn auto-sync back on, along the way turning off two new "we're sending all your data to google to train our chatgpt-alike and sell to advertisers by calling it personalization" options it grew and auto-enabled since the last time I was there (because if you never had the chance to say no, it's not a lack of consent?), but turning on auto-sync has a pop-up:

Changes you make to all apps and accounts, not just Gmail, will be synchronized between the web, your other devices, and your phone. [Learn more]

And now I remember why it was turned OFF. (And why I usually create a new gmail account every time I get a new phone, discarding the old history.) You do not get to flush every photo I take of my cat to your cloud service as a condition of checking email. I don't care what the bribe is, that's microsoft-level creepy bundling and monopoly leverage and yes disabling it renders YOUR phone app unusable which is a YOU problem, that's why I wasn't using that email account for anything before now.

This round of gmail being creepy on my phone is separate from gmail being buggy recently on the account I use on my laptop via pop3 to fetch email sent to my domain. They're not the same account, and the only way google ever has to connect the two is intrusive data harvesting. Of a kind that occasionally makes it confuse me with my father, who saddled me with his name and a "junior" which is why I started getting AARP offers in my 30's. Which admittedly had some pretty good discounts in the brochure, but no, they had me confused with someone else over a thousand miles away.

(Ok, the AARP thing was because when I moved out of Austin as my mother was dying and didn't get a new place there for a year, I had my mail forwarded to my father's place in pennsylvania. And then had it forwarded from there to the new place in Austin when I moved back. And wound up getting more than half his mail because of similar names and disabled the forwarding fairly quickly (he'd just box up and mail me my accumulated junk mail every few weeks), but places like AARP had voraciously "updated" based on scraps of misinformation to TRACK ITS PREY... and wouldn't accept "no". This was years before "FAANG" started doing it, although I dunno why netflix is considered intrusive in that acronym? I keep forgetting I _have_ that, mostly it's Fade watching.)

So yeah, the gmail phone app's useless because they intentionally refused to offer an "automatically notice new email on the server" option that does NOT "constantly send every photo you take and random audio recordings to data harvesting servers even if you never open this email app again".

The reason I needed the second email account is the second room of Fade's apartment up in minneapolis has been empty since mid-pandemic (they were assigning her roommates until then, but her last one moved back in with her family to ride out the pandemic, and it's been empty for well over a year now), and we asked her front office and they made us a very good deal on a 6 month lease through August, when we might be moving anyway depending on where Fade gets a job. (Now that she's graduated, she got piecemeal teaching work for the spring semester but is also job-hunting for something more permanent.) Which is why I'm trying to sell the house and move up there. Fuzzy's moving back in with her father (who's old and in the hospital way too much and could use more looking after anyway, she's been visiting him on weekends already, he lives up in Leander about a five minute drive from the far end of Austin's Light Tactical Rail line), and she's taking the geriatric cat with her.

Fade's made it clear she's never moving back to a state that wants her to literally die of an ectopic pregnancy, so we were going to sell the house at some point anyway, and "timing the market" is another phrase for "reading the future", so now's as good as any. (Last year would have been way better. Next year could be anything.)

The second email account came in because I was the "guarantor" on her lease for the first account, since she was a student and obviously student housing involves a parent or similar co-signing, doesn't it? Except with my email already in the system _that_ way, me actually signing up to get a room there confused their computer deeply, so to apply to RENT there I had to create a new account, which required a new email address... (I can bypass "guarantor" by just paying multiple months in advance.)

I continue to break everything. (And just now trying to e-sign the lease document, I noticed the "download a PDF copy" link was on the first page but hitting the checkbox to accept electronic delivery advanced to the second page, and hitting the back button put me back in the email, and clicking on the link again said it had already been used and was thus expired... Eh, the usual. Fade's handling it.)


February 6, 2024

Alas, devuan doesn't seem to have qemu-debootstrap (anymore?), so trying to reverse engineer it to set up an arm64 VM image, the root filesystem part looks like:

$ dd if=/dev/zero of=arm64-chimaera.img bs=1M count=65536
$ /sbin/mkfs.ext4 arm64-chimaera.img
$ mkdir sub
$ sudo mount arm64-chimaera.img sub
$ sudo debootstrap --arch=arm64 --keyring=/usr/share/keyrings/devuan-archive-keyring.gpg --verbose --foreign chimaera sub
$ sudo umount sub

And then fishing a kernel out of the network installer and booting the result:

$ wget http://debian.csail.mit.edu/debian/dists/bullseye/main/installer-arm64/current/images/netboot/debian-installer/arm64/linux -O arm64-vmlinux
$ qemu-system-aarch64 -M virt -cpu cortex-a57 -m 2048 "$@" -nographic -no-reboot -kernel arm64-vmlinux -append "HOST=aarch64 console=ttyAMA0 root=/dev/sda init=/bin/sh" -drive format=raw,file=arm64-chimaera.img

Which died because the ext4 driver is not statically linked into that kernel image and thus can't mount the root=. In fact the list of drivers it tried was blank, it has NO drivers statically linked in. Which implies you have to insmod from initramfs in order to be able to mount any filesystem from a block device, which is just INSANE. Swapping in the kernel mkroot builds for the aarrcchh6644 target, and using root=/dev/vda instead (because different drivers and device tree), I got a shell prompt and could then run:

# mount -o remount,rw /
# /debootstrap/debootstrap --second-stage
# echo '/dev/vda / ext4 rw,relatime 0 1' > /etc/fstab
# ifconfig lo 127.0.0.1
# ifconfig eth0 10.0.2.15
# route add default gw 10.0.2.2
# apt-get install linux-image-arm64

Which successfully installed packages from the net into the VM, but I'm not sure that last install is actually helpful? It installed a kernel, but didn't install a bootloader. Can qemu boot if I just give it the -hda and not externally supply a -kernel?

$ qemu-system-aarch64 -M virt -cpu cortex-a57 -m 2048 "$@" -no-reboot -drive format=raw,file=arm64-chimaera.img

Nope, looks like it did not. Or doesn't know how to produce any output? It popped up a monitor window but not a display window, and didn't produce serial console output. And fishing that kernel out of the ext4 filesystem and passing it to -kernel in qemu means I'd also need to pass -initrd in as well (still assuming it does not have any static filesystem drivers), and then what is it trying to display to? Where exactly does it think it's getting its device tree from? (If it's statically linked into the kernel then I haven't got one to feed to qemu to try to PROVIDE those devices. And still no way to add console= to point at serial console...)
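
(For the device tree question: -M virt synthesizes a dtb at startup and hands it to whatever payload it loads, and you can ask for a copy to inspect, which at least lists what devices the guest is being offered and which uart would be the console. Something like:

$ qemu-system-aarch64 -M virt,dumpdtb=virt.dtb -cpu cortex-a57 -m 2048
$ dtc -I dtb -O dts -o virt.dts virt.dtb

That doesn't solve "no way to type console= at the firmware", but it beats guessing.)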

Eh, stick with the mkroot kernel for now I guess. This should let mcm-buildall.sh build native arm hosted toolchains, both 32 and 64 bit, for next release. It would be way better to use one of the orange pi 3b actual hardware devices I can plug into the router via cat5 and leave on 24/7, that can do the qemu regression testing via cron job and everything. Plus my home fiber's faster than the wifi so stuff physically plugged into the router doesn't even count against the bandwidth we're actually using, it could act as a SERVER if they didn't go to such extreme lengths to make you pay extra for a static IP (four times the base cost of the service, for no reason except "they can").

But I don't trust the Orange Pi's chinese kernel not to have spyware in it (like... 30% chance?) and I haven't sat down to hammer a vanilla kernel into giving me serial output and a shell prompt on the hardware yet. Mostly because I can't power an orange pi from my laptop USB the way I can a turtle board, it wants a 2 amp supply and the laptop wants to give half an amp. I mostly think of working on it when I'm out-with-laptop...


February 5, 2024

I fell behind on email over the weekend (dragged the laptop along but didn't connect it to the net), and gmail errored out a "denied, you must web login!" pop-up during my first pop3 fetch to catch up.

So I went to the website and did a web login, and it went "we need need need NEED to send you an sms, trust us bud honest this will be the only one really please we just GOTTA"... I have never given gmail a phone number, and refuse to confirm or deny its guess.

So I clicked the "get help" option... which also wanted me to login. So I did and it said it needed to verify the account, and this time offered to contact my next-of-kin email (it's 2am, she's asleep).

So I decided to wait (and maybe vent on mastodon a bit, and look up what I need to do in dreamhost to switch my mx record to point at the "you are a paying customer" servers I get with my domain and website rather than the "you are the product" servers... yeah I'd lose the accumulated weekend of email but the main reason I _hadn't_ done it was screwing up and losing access to email for a bit would be annoying and here gmail has DONE IT FOR ME), and messed with some other windows for a bit, then out of habit switched desktops and clicked the "get messages" button in thunderbird...

And it's downloading email again just fine. (And did so for the 5 logins it took to grab a couple hundred messages at a time and clear the backlog: linux-kernel and qemu-devel and so on are high traffic lists and their pop3 implementation has some arbitrary transaction limit.) And it looks like a reasonable weekend's worth of email...? Nothing obviously wrong?

I haz a confused.

I don't _really_ want to move email providers at the same time I'm trying to sell a house and move, but... leaving this alone feels kind of like ignoring termite damage. Some things you descend upon with fire. Gmail is _telling_ me that it's unsafe.

I'm _pretty_ sure this is their out of control data harvesting trying to connect together pieces of their social graph to map every human being to a phone that has a legal name and social security number using it, and can be tracked via GPS coordinates 24/7. If there WAS any actual "security" reason behind it, it obviously didn't WORK. I got access back without ever providing more than the old login. I didn't get WEB access back, but that just means I can't fish stuff out of the spam filter. So... greedy or incompetent?

But why _now_? What triggered it...


February 4, 2024

I have a pending pull request adding port probing to netcat. It adds two flags: -z is a "zero I/O mode" flag where it connects and closes the connection immediately, which isn't really zero I/O because a bunch of TCP/IP packets go through setting up and tearing down the connection so the other side totally notices. Also a separate -v flag that just prints that we've connected successfully, which seems weird because we print a big error message and exit when we DON'T connect successfully, so saying that we did seems redundant.

The patch didn't invent these options, I checked and both are in busybox's "nc_bloaty" which seems to be a full copy of Netcat 1.10, because busybox has multiple different implementations of the same command all over the place in the name of being small and simple. In theory nc_bloaty.c is Hobbit's netcat from the dawn of time which Denys nailed to the side of busybox and painted the project's color in 2007, although maybe it's had stuff added to it since, I haven't checked.

(Sorry, old argument from my busybox days: making cartridges for an Atari 2600 and coin-op machines in a video arcade are different skillsets, and gluing a full sized arcade cabinet to the side of an atari 2600 is NOT the same as adding a cartridge to its available library. As maintainer I strongly preferred fresh implementations to ports because license issues aside, if it already existed and we couldn't do BETTER why bother? Hobbit's netcat is actually pretty clean and slim as external programs you could incorporate go, but Vodz used to swallow some whales.)

Anyway, that's not the part that kept me from merging the netcat patch from the pull request into toybox the first day I saw it. Nor is the fact I have the start of nommu -t support using login_tty() in my tree (another thing I need a nommu test environment for) and have to back it out to apply this.

No, the head scratcher is that the name on the email address of the patch I wget by adding ".patch" to the github URL is "कारतोफ्फेलस्क्रिप्ट™" which Google Translate says is Marathi for "Kartoffelscript" with a trademark symbol. Marathi is the 4th most widely spoken language in India (about 90 million speakers), and Kartoffel is german for Potato.

I mean, it's not some sort of ethnic slur or exploit or something (which is why I checked the Thing I Could Not Read), so... yay? I guess I could apply that as is, I'm just... confused.

And I'm also looking at the OTHER available options in the bigger netcat's --help output and going "hex dump would be lovely". I don't need a "delay interval" because the sender of data can ration it easily enough, and each call to netcat does a single dialout so the caller can detect success/fail and delay in a loop if they're manually scanning a port range for some reason. (Look, nmap exists.) I'm reluctant to add -b "allow broadcasts" because... what's the use case here? I can do that one if somebody explicitly asks for it, which means they bring a use case.
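
The scripting usage I'd expect, assuming the pull request's semantics where the exit status reflects the connection attempt:

$ nc -z 127.0.0.1 631 && echo cups is listening
$ for i in $(seq 1 1023); do nc -z 127.0.0.1 $i 2>/dev/null && echo port $i open; done

One dialout per invocation, success/failure in the exit code, and any looping or rate limiting belongs to the caller.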


February 3, 2024

Moving is exhausting, and so far I've barely packed up one bookcase.

Follow-up to yesterday's email, my correspondent is still looking into the IP status of older architectures, sending me a quote from a reuters article:

> "In 2017, under financial pressure itself, Imagination Technologies sold the
> MIPS processor business to a California-based investment company, Tallwood
> Venture Capital.[47] Tallwood in turn sold the business to Wave Computing in
> 2018,[48] both of these companies reportedly having their origins with, or
> ownership links to, a co-founder of Chips and Technologies and S3 Graphics.[49]
> Despite the regulatory obstacles that had forced Imagination to divest itself of
> the MIPS business prior to its own acquisition by Canyon Bridge, bankruptcy
> proceedings for Wave Computing indicated that the company had in 2018 and 2019
> transferred full licensing rights for the MIPS architecture for China, Hong Kong
> and Macau to CIP United, a Shanghai-based company.[50]"

As far as I can tell mips imploded because of the PR backlash from screwing over Lexra.

Mips used to be all over the place: Linksys routers were mips, Playstation 2 was mips, the SGI Irix workstations were mips... Then they turned evil and everybody backed away and switched to powerpc and arm and such.

China didn't back away from mips, maybe due to a stronger caveat emptor culture and maybe due to not caring about lawsuits that couldn't affect them. The Lexra chips that got sued out of existence here were still widely manufactured over there (where US IP law couldn't reach at the time; that's how I got involved, somebody was importing a chinese router and trying to update its kernel to a current version, and it needed an old toolchain that didn't generate the 4 patented instructions). China's Loongson architecture recently added to the linux kernel is a Mips fork dating back to around 2001.

Yes, "homegrown clone". Don't ask, I don't know. See also this and this for the arm equivalent of what china did to mips. Any technology sent to china gets copied and then they claim to have invented it.


February 2, 2024

I get emails. I reply to emails. And then I cut and paste some long replies here:

> Is there an expiration on ARM patents such as the ARM7TDMI and ARM9? With the
> SH-2 being developed in 1992, and expiring in 2015, I am curious if the ARM7
> would be synthesizable.

In theory?

Ten years ago there was a big push to do open hardware arm, and Arm Inc. put its foot down and said they didn't mind clones of anything _before_ the ARMv3 architecture (which was the first modern 32 bit ARM and the oldest one Linux ran on) but if you tried to clone ARMv3 or newer they would sue.

That said, the point of patents is to expire. Science does not advance when patents are granted, it advances when they expire. Lots of product introductions simultaneously from multiple vendors, such as the iphone and android launching within 18 months of each other, can be traced back to things like important touchscreen patents expiring.

The problem is, the big boys tend to have clouds of adjacent patents and patent-extension tricks, such as "submarine" patents where they file a patent application and then regularly amend it so it isn't granted promptly but instead remains an application for years, thus preventing its expiration clock from starting since it expires X years after being _granted_, not applied for. (But prior art is from before the _application_ for the patent.) Or the way drug companies patented a bunch of chemicals that were racemic mixtures, and then went back and patented just the active isomer of that chemical, and then sued anybody selling the old racemic mixtures because it _contains_ the isomer. (Which CAN'T be legal but they can make you spend 7 years in court paying millions annually to _prove_ it. The point of most Fortune 500 litigation isn't to prove you're right, it's to tie the other side up in court for years until you bankrupt them with legal fees, or enough elections go by for regulatory capture to Citizens United up some pet legislators who will replace the people enforcing the law against you.)

Big companies often refuse to say exactly what all their relevant patents ARE. You can search yourself to see what patents they've been granted, but did they have a shell company, or did they acquire another company, so they control a patent their name isn't on? And this is poker: they regularly threaten to sue even when they have nothing to sue with. Bluffing is rampant, and just because they're bluffing doesn't mean they won't file suit if they think you can't afford a protracted defense. (Even if they know they can't win, they can delay your product coming to market for three years and maybe scare away your customers with "legal uncertainty".)

You can use existing hardware that was for sale on known dates, and publications that would invalidate patents that hadn't yet been filed (there was some attempt to bring submarine patents under control over the past couple decades, but it's reformers fighting against unguillotined billionaires with infinitely deep pockets and they have entire think tanks and lawfirms on retainer constantly searching for new loopholes and exploits).

My understanding (after the fact and not hugely informed) was that a big contributor to J-core happening was going to Renesas with old hardware and documentation to confirm "anything implementing this instruction set has to have expired because this came out on this date and either the patent had already been granted or this is prior art invalidating patents granted later", and when Renesas still insisted on license agreements demanding per-chip royalties, refusing to sign and telling them to sue. Which they did not, either because they were bluffing or the cost/benefit analysis said it wasn't worth it. But standing up to threats and being willing to defend against a lawsuit for years if necessary was an important part of the process, because the fat cats never STOP trying to intimidate potential competitors.

The J-core guys could have chosen any processor from that era to do the same thing with: m68k, Alpha, etc. And in fact they initially started trying to use an existing Sparc clone but it didn't do what they needed. The sparc was memory inefficient and power hungry, which led to the research into instruction set density, which led to superh as the sweet spot. In fact superh development started when Motorola's lawyers screwed over Hitachi on m68k licensing, so their engineers designed a replacement. x86 is even more instruction dense due to the variable length instructions, but requires a HUGE amount of circuitry to decode that mess at all efficiently. Starting with the Pentium it has a hardware frontend that converts the x86 instructions into internal RISC instructions and then actually executes those. (That's why RISC didn't unseat x86 like everybody expected it would: they converted their plumbing to RISC internally with a translation layer in front of it for backwards compatibility. The explosion of sparc, alpha, mips, powerpc, and so on all jockeying to replace x86... didn't. They only survived at the far ends of the performance bell curve, the mainstream stayed within the network effect feedback loop of wintel's dominant market share. Until phones.)

Arm Thumb, and thus Cortex-m, was a derivative of superh. To the point it got way cheaper when the superh patents expired and arm didn't have to pay royalties to renesas anymore, which is why that suddenly became cheap and ubiquitous. But from a hardware cloning perspective, keep in mind "thumb" was not present in the original arm processors. Also, things like "arm 7" and "arm 9" are chips, not different instruction set architectures. (Pentium III and Pentium M were both "i686".) The instruction set generations have a 'v' in them: armv1, armv2, armv3, up through armv8.

It goes like this:

Acorn Risc Machines started life as a UK company that won a contract with the BBC to produce the "BBC Micro" back in 1981 alongside an educational television program teaching kids how to compute. Their first machine was based on the MOS 6502 processor, same one in the Commodore 64 and Apple II and Atari 2600: that had 8-bit registers and 16 bit memory addressing, for 64k RAM total. (The story of MOS Technology is its own saga, the 6502 was to CPU design a bit like what Unix was to OS design, it showed people that 90% of what they'd been doing was unnecessary, and everybody went "oh".)

ARMv1 came from acorn's successor machine the Archimedes (released in 1987, circa the Amiga) which used a home-grown CPU that had 32 bit registers (but only 26 bit addressing, 64 megs max memory). ARMv2 added a hardware multiplier and a faster interrupt mode (which only saved half the registers), but still 26 bit addressing. Think of ARMv1 and ARMv2 as a bit like the 286 processor in intel-land: a transitional attempt that wound up as a learning experience, and fixing what was wrong with them means backwards compatibility doesn't go back that far.

The oldest one Linux runs on is ARMv3, which did a proper flat 32 bit address space, and is generally considered the first modern ARM architecture. ARMv4 introduced a bunch of speedups, and also a way of announcing instruction set extensions (like different FPUs and such) so you could probe at runtime what was available. These extensions were indicated by adding a letter to the architecture. The most important extension was the "thumb" instruction set, ARMv4T. (But there was also some horrible java accelerator, and so on.) ARMv5 had various optimizations and integrated thumb so it wasn't an extension anymore but always guaranteed to be there: recompiling for ARMv5 speeds code up about 25% vs running ARMv4 code on the same processor, I don't remember why. ARMv6 added SMP support which is mostly irrelevant outside the kernel so you generally don't see compilers targeting it because why would they? And then ARMv7 was the last modern 32 bit one, another big speedup to target it with a compiler, but otherwise backwards compatible ala i486/i586/i686. All this stuff could still run ARMv4T code if you tried, it was just slower (meaning less power efficient when running from battery, doing the "race to quiescence" thing).

Along the way Linux switched its ARM Application Binary Interface to incorporate Thumb 1 instructions in function call and system call plumbing, the old one retroactively became known as "OABI" and the new (extended) one is "EABI", for a definition of "new" that was a couple decades ago now and is basically ubiquitous. Support for OABI bit-rotted over the years similarly to a.out vs ELF binaries, so these days ARMv4T is pretty much the oldest version Linux can run without serious effort. (For example, musl-libc doesn't support OABI, just EABI.) In THEORY a properly configured Linux kernel and userspace could still run on ARMv3 or ARMv4 without the T, but when's the last time anybody regression tested it? But if ARMv3 was your clone target, digging that stuff up might make sense. Easier to skip ahead to ARMv4T, but A) lots more circuitry (a whole second instruction set to implement), B) probably more legal resistance from whoever owns ARM Inc. this week.

And then ARMv8 added 64 bit support, and kept pretending it's unrelated to historical arm (stuttering out aarrcchh6644 as a name with NO ARM IN IT), although it still had 32 bit mode and apparently even a couple new improvements in said 32 bit mode so you can compile a 32 bit program for "ARMv8" if you try and it won't run on ARMv7. Dunno why you WOULD though, it's a little like x32 on intel: doesn't come up much, people mostly just build 64 bit programs for a processor that can't NOT support them. Mostly this is a gotcha that when you tell gcc you want armv8-unknown-linux instead of aarrcchh6644-talklikeapirateday-linux you get a useless 32 bit toolchain instead of what you expected. Sadly linux accepts "arm64" but somehow the "gnu gnu gnu all hail stallman c compiler that pretends that one of the c's retroactively stands for collection even though pcc was the portable c compiler and icc was the intel c compiler and tcc was the tiny c compiler" does not. You have to say aarrcchh6644 in the autoconf tuple or it doesn't understand.

So what's Thumb: it's a whole second instruction set, with a mode bit in the processor's control register saying which kind it's executing at the moment, a bit like Intel processors jumping between 8086 vs 80386 mode, or 32 vs 64 bit in the newer ones. Conventional ARM instructions are 32 bits long, but thumb instructions are 16 bits (just like superh). This means you can fit twice as many instructions in the same amount of memory, and thus twice as many instructions in each L1 cache line, so instructions go across the memory bus twice as fast...

Note that both Thumb and ARM instruction modes use 32 bit registers and 32 bit addresses, this is just how many bits long each _instruction_ is. The three sizes are unrelated: modern Java Virtual Machines have 8 bit instructions, 32 bit registers, and 64 bit memory addresses. Although you need an object lookup table to implement a memory size bigger than the register size, taking advantage of the fact a reference doesn't HAVE to be a pointer, it can be an index into an array of pointers and thus "4 billion objects living in 16 exabytes of address space". In hardware this is less popular: the last CPU that tried to do hardware-level object orientation was the Intel i432 (which was killed by the 286 outperforming it, and was basically the FIRST time Intel pulled an Itanium development cycle). And gluing two registers together to access memory went out with Intel's segment-offset addressing in the 8086 and 286, although accessing memory with HI/LO register pairs was also the trick the 6502 used years earlier (8 bit instructions, 8 bit registers, 16 bit addresses). These days everybody just uses a "flat" memory model for everything (SO much easier to program) which means memory size is capped by register size. But 64 bit registers can address 18 exabytes, and since an exabyte is a triangular rubber coin million terabytes and the S-curve of Moore's Law has been bending down for several years now ("exponential growth" is ALWAYS an S-curve, you run out of customers or atoms eventually), this is unlikely to become a limiting factor any time soon.

The first thumb instruction set (Thumb 1) was userspace-only, and didn't let you do a bunch of kernel stuff, so you couldn't write an OS _only_ in Thumb instructions, you still needed conventional ARM instructions to do setup and various administrative tasks. Thumb 2 finally let you compile a Linux kernel entirely in Thumb instructions. Thumb2 is what let processors like the Cortex-M discard backwards compatibility with the original 32-bit ARM instruction set. It's a tiny cheap processor that consumes very little power, and the trick is it's STUCK in thumb mode and can't understand the old 32 bit instruction set, so doesn't need that circuitry. Along the way, they also cut out the MMU, and I dunno how much of that was "this instruction set doesn't have TLB manipulation instructions and memory mapping felt icky" or "as long as we were cutting out lots of circuitry to make a tiny low-power chip, this was the next biggest thing we could yank to get the transistor count down". Didn't really ask.

Thumb 2 was introduced in 2003. I don't know what actual patentable advances were in there given arm existed and they were licensing superh to add this to it, but I assume they came up with some kind of fig leaf. (People keep trying to patent breathing, it's a question of what the overworked clerks in the patent office approve, and then what the insane and evil magic court that ONLY hears IP law cases on behalf of rich bastards gets overruled on as they perpetually overreach.) But it still came out 20 years ago: patents are going to start expiring soon.

The ARM chip design company the original Acorn RISC guys spun out decades ago was proudly british for many years... until the Tories took over and started selling the government, and then they did Brexit to avoid the EU's new financial reporting requirements (which were going to force billionaires doing money laundering through the City of London and the Isle of Man to list all their bank accounts and how much money was in each, Switzerland having already caved some years earlier so "swiss bank account" no longer meant you could launder stolen nazi gold for generations)... and the result was Worzel Gummidge Alexander "Boris" de Pfeffel Johnson (Really! That's his name! Look it up!) sold ARM to Softbank, a Japanese company run by a billionaire who seemed absolutely BRILLIANT until he decided Cryptocoins were the future and funded WeWork. Oh, and apparently he also took $60 billion from Mister Bone Saw, or something?

So how much money ARM has to sue people these days, or who's gonna own the IP in five years, I dunno.


February 1, 2024

Happy birthday to me...

Closing tabs, I have a bunch open from my earlier trudge down nommu-in-qemu lane, which started by assuming or1k would be a nommu target, then trying to get bamboo to work, then coldfire...

A tab I had open was the miniconfig for the coldfire kernel that ran in qemu, and that's like half the work of adding it to mkroot... except that was built by the buildroot uclibc toolchain. So I'm trying to reproduce the buildroot coldfire toolchain with musl instead of uclibc, but there IS no tuple that provides the combination of things it wants in the order it wants them, and patching it is being stroppy. Alas gcc is as far from generic as it gets. This config plumbing is a collection of special cases with zero generic anything, and it's explicitly checking for "uclinux" in places and "-linux-musl" in others, and that leading dash means "-uclinux-musl" doesn't match, but "-linux-musl-uclinux" doesn't put data in the right variables (because some bits of the config thinks there are 4 slots with dedicated roles) plus some things have * on the start or the end and other things don't, so sometimes you can agglutinate multiple things into a single field and other times you can't, and it is NOT SYSTEMATIC.

This isn't even fdpic yet! This is just trying to get the config to do what the other thing was doing with musl instead of uclibc. I can probably whack-a-mole my way down it, but if the patch is never going upstream... (Sigh. I should poke coreutils about cut -DF again.)

Now that Fade's graduated, we've decided to pull the trigger on selling the house. Fade's already done paperwork for me to move into the other room at her apartment for the next 6 months, and they start charging us rent on the extra room on the 15th I think? But if I fly back up there with an actual place to live, I don't really want to fly back here, and this place is EXPENSIVE. (I bought it thinking "room to raise kids", but that never happened.) So packing it out and getting it on the market... I should do that.

Fuzzy took the news better than I expected, although her father's been sick for a while now and moving back in to take care of him makes sense. She's keeping the 20 year old cat.

I bought 4 boxes at the U-haul place across I-35 and filled them with books. It didn't even empty one bookshelf. Um. Moving from the condo at 24th and Leon to here was moving into a BIGGER place, so we didn't have to cull stuff. And that was 11 years ago. Before that Fade and I moved a U-haul full of stuff up to Pittsburgh circa 2006... and then moved it all back again a year and change later. The third bedroom is basically box storage, we emptied our storage space out into that to stop paying for storage, and still haven't unpacked most of it. Reluctant to drag it up to Minneapolis (and from there on to wherever Fade gets a job with health insurance, it's the exchange until then). But I don't have the energy to sort through it either. I have many books I haven't read in years. (Yes I am aware of E-books. I'm also aware you don't really _own_ those, just rent them at a billionaire's whim.)

I'm reminded that packing out the efficiency apartment I had for a year in Milwaukee took multiple days (and that was on a deadline), and I'd gone out of my way NOT to accumulate stuff while I was there because it was always temporary. And lugging it all to Fade's, I pulled a muscle carrying the sleeping-bag-repurposed-as-a-carry-sack I'd shoved all the extra stuff that wouldn't fit into the suitcases into, while switching from a bus to Minneapolis's Light Tactical Rail. This time Fade wants to do the "storage pod, which can be somewhat automatically moved for you" thing.


January 31, 2024

Parallelizing the make.sh header file generation is a bit awkward: it's trivial to launch most of the header generation in parallel (even all the library probes can happen in parallel, order doesn't matter and >> is O_APPEND meaning atomic writes won't interleave) and just stick in a "wait" at the two places that care about synchronization (creating build.sh wants to consume the output of optlibs.dat, and creating flags.h wants to consume config.h and newtoys.h).

The awkward part is A) reliable error detection if any of the background tasks fail ("wait" doesn't collect error return codes, creating a "generated/failed" file could fail due to inode exhaustion, DELETING a generated/success file could have a subprocess fail to launch due to PID exhaustion or get whacked by the OOM killer... I guess annotate the end of each file with a // SUCCESS line and grep | wc maybe?), B) ratelimiting so trying to run it on a wind-up-toy pi-alike board or a tiny VM doesn't launch too many parallel processes. I have a ratelimit bash function, but explicitly calling it between each background & process is kinda awkward? (And it doesn't exit, it returns error, so each call would need to perform error checking.) It would be nice if there was a proper shell syntax for this, but "function that calls its command line" is a quoting nightmare when pipelines are involved. (There's a reason "time" is a builtin.) I suppose I could encapsulate each background header generation in its own shell function? But just having them inline with & at the end is otherwise a lot more readable. (I'm actually trying to REDUCE shell functions in this pass, and do the work inline so it reads as a simple/normal shell script instead of a choose-your-own-adventure book.)
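The // SUCCESS idea would be shaped something like this rough sketch (genone and gentwo standing in for the real header generation steps, nothing here is what's checked in):

  # generate headers in parallel, appending a marker on success
  { genone > generated/one.h && echo '// SUCCESS' >> generated/one.h; } &
  { gentwo > generated/two.h && echo '// SUCCESS' >> generated/two.h; } &
  wait
  # every file has to END with the marker or something upstream failed
  [ "$(tail -qn 1 generated/one.h generated/two.h | grep -c '^// SUCCESS')" -eq 2 ] ||
    { echo 'header generation failed' >&2; exit 1; }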

While I'm going through it, the compflags() function in make.sh is its own brand of awkward. That function spits out nine lines of shell script at the start of build.sh, and although running generated/build.sh directly is pretty rare (it's more or less a comment, "if you don't like my build script, this is how you compile it in the current configuration"), it's also used for dependency checking to see if the toolchain or config file changed since last build. When we rerun make.sh, it checks that lines 5-8 of a fresh compflags() output match the existing build.sh file, and if not deletes the whole "generated" directory to force a rebuild because you did something like change what CROSS_COMPILE points to. That way I don't have to remember to "make clean" between musl, bionic, and glibc builds, or when switching between building standalone vs multiplexer commands (which have different common plumbing not detected by $TOYFILES collection). The KCONFIG_CONFIG value changes on line 8 when you do that: it's a comment, but not a CONSTANT comment.

The awkward part is needing to compare lines 5-8 of 9, which involves sed. That magic line range is just ugly. Line 1 is #!/bin/bash and lines 2 and 9 are blank, so comparing them too isn't actually a problem, but lines 3 and 4 are variable assignments that CAN change without requiring a rebuild. Line 3 is VERSION= which contains the git hash when you're building between releases: if we didn't exclude that, doing a pull or checkin would trigger a full rebuild. And line 4 is LIBRARIES= which is probed from the toolchain AFTER this dependency check, and thus A) should only change when the toolchain does, B) used to always be blank when we were checking if it had changed, thus triggering spurious rebuilds. (I switched it to write the list to a file, generated/optlibs.dat, and then fetch it from that file here, so we CAN let it through now. The comparison's meaningless, but not harmful: does the old data match the old data.)

Unfortunately, I can't reorganize to put those two at the end, because the BUILD= line includes "$VERSION" and LINK= includes "$LIBRARIES", so when written out as a shell script (or evaluated with 'eval') the assignments have to happen in that order.

Sigh, I guess I could just "grep -v ^VERSION=" both files when comparing them? The OTHER problem is that later in the build it appends a "\$BUILD lib/*.c $TOYFILES \$LINK -o $OUTNAME" line to the end, which isn't going to match between runs either. Hmmm... I suppose if TOYFILES= and OUTNAME= were also variable assignments, then that last line could become another constant and we could have egrep -v filter out "^(VERSION|LIBRARIES|TOYFILES|OUTNAME)=" which is uncomfortably complicated but at least not MAGIC the way the line range was...
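I.E. the dependency check would wind up shaped something like this (untested sketch, not what's in the tree):

  # filter out the lines that legitimately vary, compare the rest
  FILTER='^(VERSION|LIBRARIES|TOYFILES|OUTNAME)='
  cmp -s <(compflags | egrep -v "$FILTER") \
    <(egrep -v "$FILTER" generated/build.sh) || rm -rf generated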

(The reason main.c lives in TOYFILES instead of being explicit on the last line is to avoid repetition. The for loop would also have to list main.c, and single point of truth... No, I'm not happy with it. Very minor rough edge, but it's not exactly elegant either...)


January 30, 2024

What does make.sh do... First some setup:

  • declares some functions
  • does a (safe) rm -rf generated/ if compiler options changed
  • checks if build.sh options changed
    • function compflags, just check lines 5-8: $BUILD $LINK $PATH $KCONFIG_CONFIG
    • delete the whole "generated" dir if they don't match, forcing full rebuild
  • sets $TOYFILES (grep toys/*/*.c for {OLD|NEW}TOY()s enabled in .config)
  • warns if "pending" is in there (in red)

And then header generation:

  • write optlibs.dat (shared library probe)
  • write build.sh (standalone build script, to reproduce this binary on targets that have a compiler but not much else, I.E. lacking make or a proper sed)
  • Call genconfig.sh which writes Config.probed, Config.in, and .singlemake (that last one at the top level instead of in generated, because "make clean" can't delete it or you wouldn't be able to "make clean; make sed").
  • Check if we should really run "make oldconfig" and warn if so.
  • write newtoys.h (sed toys/*/*.c)
  • write config.h (sed .config)
  • write flags.h (compile mkflags.c, sed config.h and run newtoys.h through gcc -E, pipe both into mkflags)
  • write globals.h (sed toys/*/*.c)
  • write tags.h (sed toys/*/*.c)
  • write help.h (compile config2help.c, reads .config and Config.in which includes dependencies ala generated/Config.)
  • write zhelp.h (compile install.c and run its --help through gzip | od | sed)

And that's the end of header generation, and it's on to compiling stuff (which is already parallelized).

It's awkward how scripts/genconfig.sh is a separate file, but "make menuconfig" needs those files because they're imported by Config.in at the top level, so it has to be possible to build those files before running configure. Possibly I should split _all_ the header generation out into mkheaders.sh (replacing genconfig.sh), and just have it not do the .config stuff if .config doesn't exist? (And then make.sh could check for the file early on and go "run defconfig" and exit if it's not there...)

Having .singlemake at the top level is uncomfortably magic (running "make defconfig" changes the available make targets!) but getting the makefile wrapper to provide the semantics I want is AWKWARD, and if it's in generated/ then "make clean" forgets how to do "make sed".

The reason the above warning about calling "make oldconfig" doesn't just call it itself is that doing so would be a layering violation: scripts/*.c CANNOT call out to kconfig because of licensing. The .config file output by kconfig is consumed read-only by the rest of the build, meaning the kconfig subdirectory does not actually need to _exist_ when running "make toybox". Kconfig is there as a convenience: not only is no code from there included in our build, but no code from there is RUN after the configuration stage (and then only to produce the one text file). You COULD create a .config file by hand (and android basically does). Blame the SFLC for making "the GPL" toxic lawsuit fodder that needs to be handled at a distance with tongs. (I _asked_ them to stop in 2008. Eben stopped, Bradley refused to.)

Of the three scripts/*.c files built and run by the build, the only one I'm _comfortable_ with is install.c, I.E. instlist, which spits out the list of commands and which I recently extended to spit out the --help text so I could make a compressed version of it. It's basically a stub version of main.c that only performs those two toybox multiplexer tasks, so I don't have to build a native toybox binary and run it (which gets into the problem of different library includes or available system calls between host and target libc when cross compiling, plus rebuilding *.c twice for no good reason). This is a ~60 line C file that #includes generated/help.h and generated/newtoys.h to populate toy_list[] and help_data[], and then writes the results to stdout.

The whole mkflags.c mess is still uncomfortably magic, I should take a stab at rewriting it, especially if I can use (CONFIG_BLAH|FORCED_FLAG)<<shift to zero them out so the flags don't vary by config. I still need something to generate the #define OPTSTR_command strings, because my original approach of having USE() macros drop out made the flag values change, and I switched to annotating the entries so they get skipped but still count for the flag value numbering. Maybe some sort of macro that inserts \001 and \002 around string segments, and change lib/args.c to increment/decrement a skip counter? I don't really want to have a whole parallel ecology of HLP_sed("a:b:c") or similar in config.h, but can't think of a better way at the moment. (Yes it makes the strings slightly bigger, but maybe not enough to care? Hmmm... Actually, I could probably do something pretty close to the _current_ processing with sed...)

The config2help.c thing is a nightmare I've mentioned here before, and has an outstanding bug report about it occasionally going "boing", and I'd very much like to just rip that all out and replace it with sed, but there's design work leading to cleanup before I can do real design work here. (Dealing with the rest of the user-visible configurable command sub-options, for one thing. And regularizing the -Z support and similar so it's all happening with the same mechanism, and working out what properly splicing together the help text should look like...)


January 29, 2024

It's kind of amusing when spammers have their heads SO far up their asses that their pitch email is full of spammer jargon. The email subject "Get High DA/DR and TRAFFIC in 25-30 Days (New Year Discount!" made it through gmail's insane spam filter (despite half of linux-kernel traffic apparently NOT making it through and needing to be fished out), but the target audience seems to be other SEO firms. (No, it didn't have an ending parenthesis.)

Wrestling with grep -w '' and friends, namely:

$ for i in '' '^' '$' '^$'; do echo pat="$i"; \
  echo -e '\na\n \na \n a\na a\na  a' | grep -nw "$i"; done
pat=
1:
3: 
4:a 
5: a
7:a  a
pat=^
1:
3: 
5: a
pat=$
1:
3: 
4:a 
pat=^$
1:

The initial bug report was that --color didn't work right, which was easy enough to diagnose, but FIXING it uncovered that I was never handling -w properly, and needed more tests. (Which the above rolls up into one big test.)

As usual, getting the test right was the hard part. Rewriting the code to pass the tests was merely annoying.


January 28, 2024

Managed to flush half a dozen pending tabs into actual commits I could push to the repo. Mostly a low-hanging-fruit purge of open terminal tabs, I have SO MANY MORE half-finished things I need to close down.

Heard back from Greg Ungerer confirming that m68k fdpic support went into the kernel but NOT into any toolchain. I'm somewhat unclear on what that MEANS: did they select which register each segment should associate with, or not? (Did that selection already have to be made for binflt and it just maps over? I'm unclear what the elf2flt strap-on package actually DOES to the toolchain, so I don't know where the register definitions would live. I was thinking I could read Rich's sh2 patches out of musl-cross-make but they vary WIDELY by version, and some of this seems to have gone upstream already? For a definition of "already" that was initially implemented 7 or 8 years ago now. It LOOKED like this was one patch to gcc and one to binutils in recent versions, but those mostly seem to be changing config plumbing, and grepping the ".orig" directory for gcc is finding what CLAIMS to be fdpic support for superh in the base version before the patches are applied? So... when did this go upstream, and at what granularity, and what would be LEFT to add support for a new architecture?)

People are trying to convince me that arm fdpic support was a heavy lift with lots of patches, but looking back on the superh fdpic support it doesn't seem THAT big a deal? Possibly the difference was "already supported binflt", except the hugely awkward bag on the end postprocessor (called elf2flt, it takes an ELF file and makes a FLT file from it) argues against that? But that doesn't mean they didn't hack up the toolchain extensively (pushing patches upstream even!) and THEN "hit the output with sed" as it were. You can have the worst of both worlds, it's the gnu/way.

I got a binflt toolchain working in aboriginal way back when. Maybe I should go back and look at what elf2flt actually DID, and how building the toolchain that used it was configured. (I honestly don't remember, it's been most of a decade and there was "I swore I'd never follow another startup down into bankruptcy but here we are" followed by the Rump administration followed by a pandemic. I remember THAT I did it, but the details are all a bit of a blur...)

But now is not the best time to open a new can of worms. (I mean there's seldom a GOOD time, but... lemme close more tabs.)


January 27, 2024

Sigh. I'm frustrated at the continuing deterioration of the linux-kernel development community. As they collapse they've been jettisoning stuff they no longer have the bandwidth or expertise to maintain, and 5 years back they purged a bunch of architectures.

Meanwhile, I'm trying to get a nommu fdpic test environment set up under qemu, and checking gcc 11.2.0 (the latest version musl-cross-make supports) for fdpic support, grep -irl fdpic gcc/config has hits in bfin, sh, arm, and frv. I'm familiar with sh, and bits of arm were missing last I checked (although maybe I can hack my way past it?) But the other two targets, blackfin and frv, were purged by linux-kernel.

I.E. the increasingly insular and geriatric kernel development community discarded half the architectures with actual gcc support for fdpic. Most of the architectures you CAN still select fdpic for don't seem to have (or to have ever had) a toolchain capable of producing it. That CAN'T be right...

Cloned git://gcc.gnu.org/git/gcc.git to see if any more fdpic targets spawned upstream: nope. Still only four targets supporting fdpic, two of which linux-kernel threw overboard to lighten the load as the Hindenburg descends gently into Greg's receivership. As the man who fell off a tall building said on his way down, "doing fine so far"...

Yes I still think driving hobbyists away from the platform was a bad move, but as with most corporate shenanigans where you can zero out the R&D budget and not notice for YEARS that your new product pipeline has nothing in it... the delay between cause and effect is long enough for plausible deniability. It "just happened", not as a result of anything anyone DID.

And which is worse: Carly Fiorina turning HP into one of those geriatric rock bands that keeps touring playing nothing but 40 year old "greatest hits" without a single new song (but ALL THE MONEY IN THE WORLD for lawyers to sue everybody as "dying business models explode into a cloud of IP litigation" once again)... or Red Hat spreading systemd? Zero new ideas, or TERRIBLE ideas force-fed to the industry by firms too big to fail?

Caught up on some blog editing, but haven't uploaded it yet. (Japanese has a tendency to omit saying "I", which has been a tendency in my own writing forever. "I" am not an interesting part of the sentence. That said, it technically counts as a bad habit in english, I think?) I made a mess of december trying to retcon some entries (I'd skipped days and then had too many topics for the days I did write and wanted to backfill _after_ I'd uploaded, which probably isn't kind to the rss feed), and I only recently untangled that and uploaded it, and I'm giving it a few days before replacing it with the first couple weeks of January.

My RSS feed generator parses the input html file (capping the output at something like the last 30 entries, so the rss file isn't ridiculously huge in the second half of the year), but that makes switching years awkward unless I cut and paste the last few entries from december after the first few entries of January. Which I've done for previous years, and then at least once forgotten to remove (which I noticed back when Google still worked, by searching for a blog entry I knew I'd made and finding it in the wrong year's file). Trying to avoid that this year, but that means giving the end of december a few days to soak.


January 26, 2024

Hmmm... can I assume toybox (I.E. the multiplexer) is available in the $PATH of the test suite? Darn it, no I can't, not for single command tests. Makes it fiddly to fix up the water closet command's test suite...

So Elliott sent me a mega-patch of help text updates, mostly updating usage: lines that missed options that were in the command's long one-per-line list, tweaking option lists that weren't sorted right, and a couple minor cleanups like some missing FLAG() macro conversions that were still doing the explicit if (toys.optflags & FLAG_walrus) format without a good excuse. And since my tree is HUGELY DIRTY, it conflicted with well over a dozen files so applying it was darn awkward... and today he gave me a "ping" because I'd sat on it way too long (I think I said a week in the faq?), at which point my documented procedure is I back my changes out, apply his patch, and port my changes on top of it, because I've already had PLENTY OF TIME to deal with it.

And of course trying to put my changes back on top of his was fail-to-apply city (the reason I couldn't just easily apply it in the first place), so I went through and reapplied my changes by hand, some of which are JUST conflicting documentation changes (like patch.c) and others are fairly low hanging fruit I should just finish up.

Which gets us to wc, the water closet word count command, where I was adding wc -L because somebody asked for it and it apparently showed up in Debian sometime when I wasn't looking. (It's even in the ancient version I still haven't upgraded my laptop off of.) It shows maximum line length, which... fine. Ok. Easy enough to add. And then which order do the fields show up in (defaults haven't changed and the new fifth column went in at the end, which was the sane way to do it), so I add tests, and...

The problem is TEST_HOST make test_wc doesn't pass anymore, which is not related to THIS change. The first failure is a whitespace variation, which already had a comment about it in the source, and I can just hit it with NOSPACE=1 before that test (not fixing it to match, one tab between each works fine for me, I do not care here; poke me if posix ever notices and actually specifies any of this).

But the NEXT problem is that the test suite sets LC_ALL=C for consistent behavior (preventing case insensitive "sort" output and so on), and we're testing utf-8 support (wc -m) which works FINE in the toybox version regardless of environment variables, but the gnu/dammit version refuses to understand UTF-8 unless environment variables point to a UTF-8 language locale. (Which makes as much sense as being able to set an environment variable to get the gnu stuff to output EBCDIC, THIS SHIP HAS SAILED. And yet, they have random gratuitous dependencies without which they refuse to work.)

On my Debian Stale host, the environment variables are set to "en_US.UTF-8", so the test works if run there, but doesn't work in the test suite where it's consistently overridden to LC_ALL=C. (In a test suite it's more important to be CONSISTENT than to be RIGHT.)

I could of course set it to something else in a specific test, but nothing guarantees that this is running on a system with the "en_US" locale installed. And fixing this is HORRIFIC: in toybox's main.c we call setlocale(LC_CTYPE, "") which reads the environment variables and loads whatever locale they point to (oddly enough this is not the default libc behavior, you have to explicitly REQUEST it), and then we check that locale to see if it has utf8 support by calling nl_langinfo(CODESET) which is laughable namespace pollution but FINE, and if that doesn't return the string "UTF-8" (case sensitive with a dash because locale nonsense), then we try loading C.UTF-8 and if that doesn't work en_US.UTF-8 because MacOS only has that last one. (So if you start out with a french utf8 locale we keep it, if not we try "generic but with UTF-8", which doesn't work on mac because they just RECENTLY added mknodat() from posix-2008. As in it was added in MacOS 13 which came out October 2022. FOURTEEN YEARS later. Yes really. Steve Jobs is still dead.)

So ANYWAY, I have painfully hard-fought code in main.c that SHOULD deal with this nonsense, but what do I set it to in a shell script? There is a "locale" command which is incomprehensible:

$ locale --help | head -n 3
Usage: locale [OPTION...] NAME
  or:  locale [OPTION...] [-a|-m]
Get locale-specific information.
$ locale -a
C
C.UTF-8
en_US.utf8
POSIX
$ locale C.UTF-8
locale: unknown name "C.UTF-8"
$ locale en_US.utf8
locale: unknown name "en_US.utf8"

Bravo. (What does that NAME argument _mean_ exactly?) So querying "do you have this locale installed" and "what does this locale do" is... less obvious than I'd like.

I was thinking maybe "toybox --locale" could spit out what UTF-8 aware locale it's actually using, but A) can't depend on it being there, B) ew, C) if it performed surgery on the current locale to ADD UTF-8 support with LC_CTYPE_MASK there's no "set the environment variable to this" output for that anyway.

Sigh. I could try to come up with a shell function that barfs if it can't get utf8 awareness, but... how do I test for utf8 awareness? Dig, dig, dig...

Dig dig dig...

Sigh, what a truly terrible man page and USELESS command --help output. Dig dig dig...

Ah: "locale charmap". for i in $(locale -a); do LC_ALL=$i locale charmap; done
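So the shell function would be something like this (my guess at a helper, not anything in the test suite yet: prefer C.UTF-8, fall back to en_US.UTF-8 for mac, else take anything installed whose charmap says UTF-8):

  utf8locale() {
    local i
    for i in C.UTF-8 en_US.UTF-8 $(locale -a); do
      [ "$(LC_ALL=$i locale charmap 2>/dev/null)" == "UTF-8" ] &&
        { echo "$i"; return 0; }
    done
    return 1
  }

And then the utf8 tests could run with LC_ALL=$(utf8locale), or get skipped when it returns failure.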

What was the question again?


January 25, 2024

Running toybox file on the bamboo board's filesystem produced a false positive. It _said_ it had ELF FDPIC binaries, but the kernel config didn't have the fdpic loader enabled. And the dependencies for BINFMT_ELF_FDPIC in the kernel are depends on ARM || ((M68K || RISCV || SUPERH || XTENSA) && !MMU) so I only have 5 targets to try to get an fdpic nommu qemu system working on. (And need to read through the elf FDPIC loader to figure out how THAT is identifying an fdpic binary, it seems architecture dependent...)

I haven't poked at arm because musl-cross-make can't build a particularly new toolchain and hasn't been updated in years, but maybe the toolchain support went in before the kernel support did? I should come back to that one...

SuperH I'm already doing but only on real hardware (the j-core turtle board), and qemu-system-sh4 having "4" in the name is a hint WHY sh2 support hasn't gone in there yet. (Since qemu-sh4 application emulation can run, it might be possible to build a kernel with the fdpic loader if I hack the above dependency to put superh next to ARM and outside of the !MMU list? Dunno what's involved but presumably arm did _some_ of that work already.)
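(Presumably that dependency hack would look something like

  depends on ARM || SUPERH || ((M68K || RISCV || XTENSA) && !MMU)

but I haven't tried it, and dunno what else breaks when the fdpic loader builds on a with-mmu sh4 kernel.)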

M68K is coldfire, I ran buildroot's qemu_m68k_mcf5208_defconfig to get one of those which booted, but all the binaries are binflt. I grepped the patched gcc that mcm built to see how its configure enables fdpic support, but the patches vary greatly by version. Hmmm...


January 24, 2024

Sigh, I really need to add a "--shoehorn=0xa0000000,128m" option to qemu to tell it to just forcibly add DRAM to empty parts of a board's physical address range, and a kernel command line option for linux to use them...

My first attempt at fixing grep -w '' didn't work because it's not just "empty line goes through, non-empty line does not"... Turns out "a  a" with two spaces goes through also. Which means A) the '$' and '^' patterns, by themselves in combination with -w, suddenly become more interesting, B) my plumbing to handle this is in the wrong place, C) 'a*' in the regex codepath has to trigger on the same inputs as empty string because asterisk is ZERO or more so this extension to the -w detection logic still needs to be called from both the fixed and regex paths without too much code duplication, but how do I pass in all the necessary info to a shared function...

Marvin the Martian's "Devise, devise" is a good mantra for design work.


January 23, 2024

I want a qemu nommu target so I can regression test toybox on nommu without pulling out hardware and sneakernetting files onto it, and or1k's kernel config didn't have the FDPIC loader in it so I'm pretty sure that had an mmu.

Greg Ungerer said he tests ELF-fdpic on arm, and regression tests elf PIE nommu on arm, m68k, riscv, and xtensa. Which isn't really that helpful: I still don't care about riscv, arm requires a musl-cross-make update to get a new enough compiler for fdpic support, and xtensa is a longstanding musl-libc fork that's based off a very old version. (I could try forward porting it, but let's get back to that one...)

The three prominent nommu targets I recall from forever ago (other than j-core, which never got a qemu board) are m68k (I.E. coldfire), powerpc (where bamboo and e500 were two nommu forks from different vendors, each of which picked a slightly different subset of the instruction set), and of course arm (cortex-m, see toolchain upgrade needed above).

Buildroot's configs/ directory has "qemu_ppc_bamboo_defconfig" and board/qemu/ppc-bamboo/readme.txt says "qemu-system-ppc -nographic -M bamboo -kernel output/images/vmlinux -net nic,model=virtio-net-pci -net user" is how you launch it. Last time I tried it the build broke, but let's try again with a fresh pull...

Hey, and it built! And it boots under qemu! And hasn't got "file" or "readelf" so it's not immediately obvious it's fdpic (I mean, it's bamboo, I think it _has_ to be, but I'd like to confirm it's not binflt). And qemu doesn't exit (halt does the "it is now safe to turn off" thing, but eh, kill it from another window). And from the host I can "toybox file output/target/bin/busybox" which says it's fdpic.

Ok, the kernel build (with .config) is in output/build/linux-6.1.44 and... once again modern kernel configs are full of probed gcc values so if I run my miniconfig.sh without specifying CROSS_COMPILE (in addition to ARCH=powerpc) the blank line removal heuristic fails and it has to dig through thousands of lines of extra nonsense, let's see... it's in output/host/bin/powerpc-buildroot-linux-gnu- (and of COURSE it built a uclibc-necromancy toolchain, not musl) so... 245 lines after the script did its thing, and egrep -v "^CONFIG_($(grep -o 'BINFMT_ELF,[^ ]*' ~/toybox/mkroot/mkroot.sh | sed 's/,/|/g'))=y" mini.config says 229 lines aren't in the mkroot base config, with the usual noise (LOCALVERSION_AUTO and SYSVIPC and POSIX_MQUEUE and so on)... static initramfs again, does bamboo's kernel loader know how to specify an external initramfs or is static a requirement like on or1k?

Yet another "melting down this iceberg" session like with or1k (which I'd HOPED would get me a nommu test system), but the other big question here is does musl support bamboo? It supports powerpc, and the TOOLCHAIN supports bamboo, but is there glue missing somewhere? (Long ago I mailed Rich a check to add m68k support, but he had some downtime just then and gave me a "friend rate" on an architecture nobody else was going to pay to add support for probably ever, and I was working a well-paying contract at the time so had spare cash. If nothing else, there's been some inflation since then...)


January 22, 2024

So, unfinished design work: I want more parallelism and less dependency detection in make.sh setup work (mostly header generation).

It's not just generating FILES in parallel, I want to run the compile time probes from scripts/genconfig.sh in parallel, and probe the library link list (generated/optlibs.dat) in parallel, and both of those have the problem of collecting the output from each command and stitching it together into a single block of data. Which bash really doesn't want to do: even a=b | c=d | e=f discards the assignments because each pipe segment is an implicit subshell to which assignments are local, yes even the last one. I can sort of do a single x=$(one& two& three&) to have the subshell do the parallelizing and collect the output, but A) each output has to be a single atomic write, B) they occur in completion order, which is essentially randomized.
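To demonstrate both problems (one, two, and three standing in for real probe commands):

  # all three assignments vanish: each pipeline segment is a subshell
  a=b | c=d | e=f; echo "a=$a c=$c e=$e"    # prints "a= c= e="
  # a subshell CAN collect parallel output, but it arrives in completion
  # order and each command's output has to be one atomic write
  x=$(one & two & three &)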

The problem with A=$(one) B=$(two) C=$(three) automatically running in parallel is that variable assignments are sequenced left to right, so A=abc B=$A can depend on A already having been set. Which means my toysh command line resolver logic would need to grow DEPENDENCIES.

In theory I could do this: the obvious way (to me) is another variable type flag that says "assignment in progress" so the resolver could call a blocking fetch data function. Also, I'd only background simple standalone assignments, because something like A=$(one)xyz where the resolution was just _part_ of the variable would need to both store more data and resume processing partway through... Darn it, it's worse than that, because variable resolution can assign ${abc:=def} and modify ala $((x++)), so trying to do them out of sequence isn't a SIMPLE dependency tree: you'd have to lookahead to see what else was impacted with a whole second "collect but don't DO" parser, and that is just not practical.

I can special case "multiple assignments on the same line that ONLY do simple assignment of a single subshell's output" run in parallel, but... toysh doing that and bash NOT doing that is silly. Grrr. Alright, can I extend the "env" command to do this? It's already running a child process with a modified environment, so env -p a="command" -p b="command" -p c="command" echo -e "$a\n$b\n$c" could... no, that would resolve $a $b and $c in the host shell before running env, and if I put single quotes around them echo DOESN'T know how... Nope, this hasn't got the plumbing, and once again my command would be diverging uncomfortably far from upstream and the gnu/dammit guys still haven't merged cut -DF.

The shell parallelism I have so far is a for loop near the end of scripts/make.sh that writes each thing's output to a file, and then does a collation pass from the file data after the loop. Which I suppose is genericizeable, and I could make a shell function to do this. (I try to quote stuff properly so even if somebody did add a file called "; rm -rf ~;.c" to toys/pending it wouldn't try to do that, and maintaining that while passing arbitrary commands through to a parallelizer function would be a bit of thing. But it's also not an attack vector I'm hugely worried about, either.)
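The shape of that loop, roughly (probeone/probetwo/probethree being hypothetical stand-ins for the real probes):

  # run each probe in the background, each writing to its own file
  for i in probeone probetwo probethree; do
    "$i" > "generated/$i.out" &
  done
  wait
  # collate in a deterministic order now that everything's finished
  for i in probeone probetwo probethree; do
    cat "generated/$i.out"
  done > generated/optlibs.dat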


January 21, 2024

Bash frustration du jour: why does the "wait" builtin always return 0? I want to fire off multiple background processes and then wait for them all to complete, and react if any of them failed. The return value of wait should be nonzero if any of the child processes that exited returned nonzero. But it doesn't do that, and there isn't a flag to MAKE it do that.

I'm trying to rewrite scripts/make.sh to parallelize the header file generation, so builds go faster on SMP systems. (And also to just remove the "is this newer than that" checks and ALWAYS rebuild them: the worst of the lot is a call to sed over a hundred or so smallish text files, it shouldn't take a significant amount of time even on the dinky little orange pi that's somehow slower than my 10 year old laptop.) And the OBVIOUS way to do it is to make a bunch of shell functions and then: "func1& func2& func3& func4& func5& wait || barf" except wait doesn't let me know if anything failed.

Dowanna poke Chet. Couldn't use a new bash extension if I did, not just because of the 7 year time horizon, but because there's still people relying on the 10 year support horizon of Red IBM Hat to run builds under ancient bash versions that predate -n. And of course the last GPLv2 version of bash that MacOS stayed on doesn't have that either, and "homebrew" on the mac I've got access to also gives you bash 3.2.57 from 2007, which hasn't got -n. So a hacky "fire off 5 background processes and call wait -n 5 times" doesn't fix it either. (And is wrong because "information needs to live in 2 places": a manually updated background process count. And "jobs" shows "active" jobs, so using it to determine how many times I'd need to call wait -n to make sure everything succeeded doesn't work either.)

Meanwhile, wait -n returns 127 if there's no next background process, which is the same thing you get if you run "/does/not/exist" as a background job. So "failure to launch" and "no more processes" are indistinguishable if I just loop until I get that, meaning I'd miss a category of failure.
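The workaround I know of is tracking the PIDs myself, because "wait $pid" DOES return that child's exit status even in old bash, at the cost of the PID list being a second place information lives (func1 through func3 being stand-ins here):

  # collect PIDs via $!, then harvest each child's exit status
  pids= rc=0
  for job in func1 func2 func3; do "$job" & pids="$pids $!"; done
  for pid in $pids; do wait "$pid" || rc=$?; done
  [ "$rc" -eq 0 ] || echo "a background job failed" >&2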

I made some shell function plumbing in scripts/make.sh to handle running the gcc invocations in the background (which, as I've recently complained, is just a workaround for "make -j" being added instead of "cc -j" where it BELONGS. HONESTLY! How is cc -j $(nproc) one.c two.c three.c... -o potato not the OBVIOUS SYNTAX?) Maybe I can genericize that plumbing into a background() function that can also handle the header generation...

That said, I think at least one of the headers depends on previous headers being generated, so there's dependencies. Sigh, in GENERAL I want a shell parallelism syntax where I can group "(a& b&) && c" because SMP is a thing now. I can already create functions with parentheses instead of curly brackets which subshell themselves (a function body needs to be a block, but "potato() if true; then echo hello; fi" works just fine because THAT'S A BLOCK). I want some sort of function which doesn't return until all the subshells it forked exit, and then returns the highest exit code of the lot. It would be easy enough for me to add that to toysh as an extension, but defining my own thing that nobody else uses is not HELPFUL.

Meanwhile, cut -DF still aren't upstream in gnuutils. Despite repeated lip service. Sigh, I should poke them again. And post my 6.7 patches to linux-kernel...


January 20, 2024

Oh dear:

unlike Android proper, which is no longer investigating bazel, the [android] kernel build fully switched to bazel, and doesn't use the upstream build at all. (but there's a whole team working on the kernel...

I had to step away from the keyboard for a bit, due to old scars.

On the one hand, "yay, multiple independent interoperable implementations just like the IETF has always demanded to call something a standard". That's GREAT. This means you're theoretically in a position to document what the linux-kernel build actually needs to DO now, having successfully reimplemented it.

On the other hand... oh no. Both "build system preserved in amber" and "straddling the xkcd standards cycle" are consulting bingo squares, like "magic build machine" or "yocto".

AOSP is actually pretty tame as fortune 500 examples of the Mongolian Hordes technique go: everything is published and ACTUALLY peer reviewed with at least some feedback incorporated upstream. Their build has to be downloadable and runnable on freshly installed new machines with a vanilla mainline Linux distro and retail-available hardware, and at least in theory can complete without network access, all of which gets regression tested regularly by third parties. And they have some long-term editors at the top who know where all the bodies are buried and shovel the mess into piles. (There's a reason DC Comics didn't reboot its history with "Crisis on Infinite Earths" until Julius Schwartz retired. Then they rebooted again for Zero Hour, Infinite Crisis, 52, Flashpoint, the New 52, DC Rebirth, Infinite Frontier, Dawn of DC... I mean at this point it could be a heat problem, a driver issue, bad RAM, something with the power supply...)

This means AOSP does NOT have a magic build machine, let alone a distributed heterogeneous cluster of them. They don't have Jenkins launching Docker triggered by a git commit hook ported from perforce. Their build does not fail when run on a case sensitive filesystem, nor does it require access to a specific network filesystem tunneled through the firewall from another site, which it both writes into and which is full of files with 25 year old dates. Their build does not check generated files into an oracle database and back out again halfway through. They're not using Yocto.

(YES THOSE ARE ALL REAL EXAMPLES. Consulting is what happens when a company gives up trying to solve a problem internally and throws money at it. Politics and a time crunch are table stakes. It got that bad for a REASON, and the job unpicking the gordian knot is usually as much social skills, research, and documentation as programming, and often includes elements of scapegoat and laxative.)


January 19, 2024

Onna plane, back to Austin.

Did some git pulls in the airport to make sure I had updated stuff to play with: the most recent commit to musl-cross-make is dated April 15, 2022, updating to musl-1.2.3. (There was a 1.2.4 release since then, which musl-cross-make does not know about.) And musl itself was last updated November 16, 2023 (2 months ago). He's available on IRC, and says both projects do what they were intended to so updates aren't as high a priority. But the appearances worry me.

I am reminded of when I ran the website for Penguicon 1, and had a "heartbeat blog" I made sure to update multiple times per week, even if each update was something completely trivial about one of our guests or finding a good deal on con suite supplies or something, just to provide proof of life. "We're still here, we're still working, progress towards the event is occurring and if you need to contact us somebody will notice prompt-ish-ly and be able to reply."

Meanwhile, if a project hasn't had an update in 3 months, and I send in an email, will it take 3 more months for somebody to notice it in a dead inbox nobody's checking? If it's been 2 years, will anybody ever see it?

That kind of messaging is important. But I can't complain about volunteers that much when I'm not the one doing it, so... If it breaks, I get to keep the pieces.


January 18, 2024

If I _do_ start rebuilding all the toybox headers every time in scripts/make.sh (parallelism is faster than dependency checking here, I'm writing a post for the list), do they really need to be separate files? Would a generated/toys.h make more sense? Except then how would I take advantage of SMP to generate them in parallel? (I suppose I could extend toysh so A=$(blah1) B=$(blah2) C=$(blah3) launched them in parallel background tasks, since they already wait for the pipe to close. Then bash would be slow but toysh would parallelize...)

I originally had just toys.h at the top level and lib/lib.h in the lib/ directory, and it would make sense to have generated/generated.h or similar as the one big header there. But over the years, lib grew a bunch of different things because scripts/install.c shouldn't need to instantiate toybuf to produce bin vs sbin prefixes, and lib/portability.h needed ostracism, and so on. Reality has complexity. I try to collate it, but there's such a thing as over-cleaning. Hmmm...


January 16, 2024

Sat down to knock out execdir and... it's already there? I have one? And it's ALWAYS been there, or at least it was added in the same commit that added -exec ten years ago.

And the bug report is saying Alpine uses toybox find, which is news to me. (When they were launching Alpine, toybox wasn't ready yet. They needed some busybox, so they used all of busybox, which makes sense in a "using all the parts of the buffalo" sort of way.)

Sigh, I feel guilty about toybox development because a PROPER project takes three years and change. Linux took 3 years to get its 1.0 release out. Minix took 3 years from AT&T suing readers of the Lions book to Andrew Tanenbaum publishing his textbook with the new OS on a floppy in the back cover. The Mark Williams Company took 3 years to ship Coherent. Tinycc took three years to do tccboot building the linux kernel. There's a pretty consistent "this is how long it takes to become real".

Toybox... ain't that. I started working on it in 2006, I'm coming up on the TWENTIETH ANNIVERSARY of doing this thing. Admittedly I wasn't really taking it seriously at first and mothballed it for a bit (pushing things like my patch implementation, nbd-client, and even the general "all information about a new command is in a single file the build picks up by scanning for it" design (which I explained to Denys Vlasenko when we met in person at ELC 2010)). I didn't _restart_ toybox development until 2012 (well, November 2011) when Tim Bird poked me. But even so, my 2013 ELC "why is toybox" talk was a decade ago now.

I'm sort of at the "light at the end of the tunnel" stage, especially with the recent Google sponsorship... but also losing faith. The kernel is festering under me, and I just CAN'T tackle that right now. The toolchain stuff... I can't do qcc AND anything else, and nobody else has tried. gcc and llvm are both A) written in C++, B) eldritch tangles of interlocking package dependencies with magic build invocations, C) kind of structurally insane (getting cortex-m fdpic support into gcc took _how_ many years, and llvm still hasn't got superh output and asking how to do it is _not_ a weekend job).

And musl-libc is somewhere between "sane" and "abandoned". Rich disappears for weeks at a time, musl-cross-make hasn't been updated since 2022. Rich seems to vary between "it doesn't need more work because it's done" and "it doesn't get more work because I'm not being paid", depending on mood. It's the best package for my needs, and I... SORT of trust it to stay load bearing? And then there's the kernel growing new build requirements as fast as I can patch them out (rust is coming as a hard requirement, I can smell it). I would like to reach a good 1.0 "does what it says on the tin" checkpoint on toybox and mkroot before any more floorboards rot out from under me.

Sigh, once I got a real development environment based on busybox actually working, projects like Alpine Linux sprang up with no connection to me. I'd LIKE to get "Android building under android" to a similar point where it's just normal, and everybody forgets about the years of work I put in making it happen, because it's not something anybody DID, just the way the world IS. I want phones to be real computers, not locked down read-only data consumption devices that receive blessings from the "special people who aren't you" who have the restricted ability to author new stuff.

And I would really, really, really like to not be the only person working toward this goal. I don't mind going straight from "toiling in obscurity" to "unnecessary and discarded/forgotten", but I DO mind being insufficiently load-bearing. Things not happening until I get them done is ANNOYING. Howard Aiken was right.


January 15, 2024

I saw somebody wanting execdir and I went "ooh, that seems simple enough", although git diff on the find.c in my main working tree has debris from xargs --show-limits changing lib/env.c to a new API, which is blocked on me tracing through the kernel to see what it's actually counting for the size limits. (Since the argv[] and envp[] arrays aren't contiguous with the strings like I thought they were, do they count against the limit? If not, can you blow the stack with exec command "" "" "" "" ""... taking a single byte of null terminator each time but adding 8 bytes of pointer to argv[] for each one? So I have to read through the kernel code and/or launch tests to see where it goes "boing".)

Elliott's going "last time you looked at this you decided it changed too often to try to match", which was true... in 2017. When it had just changed. But as far as I can tell it hasn't changed again SINCE, and it's coming up on 7 years since then. (My documented time horizon for "forever ago".) So it seems worth a revisit. (And then if they break me again, I can complain. Which, if Linus is still around, might work; and if Greg "in triplicate" KH has kicked him out, there's probably a 7 year time horizon for replacing Linux with another project. On mastodon people are looking at various BSD forks and even taking Illumos seriously, which I just can't for licensing reasons.)


January 14, 2024

Bash does not register <(command) >(command) $(subshells) with job control, and thus "echo hello | tee >(read i && echo 1&) | { read i; wait; echo $?; }" outputs a zero. This unfortunately makes certain kinds of handoffs kind of annoying, and I've had to artificially stick fifos in to get stuff like my shell "expect" implementation to work.

On an adjacent note, a shell primitive I've wanted forever is "loop" to connect the output of a pipeline to the input back at the start of the pipeline. Years and YEARS of wanting this. You can't quite implement it as a standalone command for the same reason "time cmd | cmd | cmd" needs to be a builtin in order to time an entire pipeline. (Well, you can have your command run a child shell, ala loop bash -c "thingy", a bit like "env", but it still has to be a command. You can't quite do it with redirection because you need to create a new pipe(2) pair to have corresponding write to and read from filehandles: writing to the same fd you read from doesn't work. Which is where the FIFO comes in...)
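For the record, the FIFO workaround looks something like this (cmd1 and cmd2 being hypothetical pipeline stages, and modulo races around who opens the pipe first):

  # a named pipe provides the matching write/read filehandle pair that
  # plain redirection can't: the pipeline's output feeds its own input
  mkfifo loopback
  cmd1 < loopback | cmd2 > loopback &
  echo "seed data" > loopback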


January 13, 2024

Ubuntu and Red Hat are competing to see who can drop support for older hardware fastest, meaning my laptop with the core i5-3340M processor won't be able to run their crap anymore.

I guess I'm ok with that, as long as Debian doesn't pull the same stupidity. (I bought four of these suckers, and have broken one so far, in a way that MOST of it is still good for spare parts. I am BUSY WITH OTHER THINGS, don't force me to do unnecessary tool maintenance.)


January 11, 2024

A long thread I got cc'd on turned into a "Call for LTP NOMMU maintainer", which... I want Linux to properly support nommu, but don't really care about the Linux Test Project (which is an overcomplicated mess).

Linux should treat nommu/mmu the way it treats 32/64 bit, or UP vs SMP, as mostly NOT A BIG DEAL. Instead they forked the ELF loader and the FDPIC loader the way ext2 and ext3 got forked (two separate implementations, sharing no code), and although ext4 unified it again (allowing them to delete the ext2 and ext3 drivers because ext4 could mount them all), they never cleaned up the FDPIC loader to just be a couple of if statements in the ELF loader.

It's just ELF with a separate base register for each of the 4 main segments (text, data, rodata, and bss) instead of having them be contiguous following from one base register. Dynamic vs static linking is WAY more intrusive. PIC vs non-PIC is more intrusive. They handle all THAT in one go, but fdpic? Exile that and make it so you CANNOT BUILD the fdpic loader on x86, and can't build the elf loader on nommu targets, because kconfig and the #ifdefs won't let you.

And instead of that, when I try to explain to people "uclinux is to nommu what knoppix was to Linux Live CDs: the distro that pioneered a technique dying does NOT mean Linux stopped being able to do that thing, nor does it mean nobody wanted to do it anymore, it just means you no longer need a specific magic distro to do it"... Instead of support, I get grizzled old greybeards showing up to go "Nuh-uuuh, uclinux was never a distro, nobody ever thought uclinux was a DISTRO, the distro was uclinux-dist and there was never any confusion about that on anyone's part". With the obvious implication that "the fact uclinux.org became a cobweb site and eventually went down must be because nommu in Linux IS obsolete and unsupported and it bit-rotted into oblivion because nobody cared anymore. Duh."

Not helping. Really not helping.


January 10, 2024

Got the gzipped help text checked in.

My method of doing merges on divergent branches involves checking it in to a clean-ish branch, extracting it again with "git format-patch -1", and then a lot of "git am 000*.patch" and "rm -rf .git/rebase-apply/" in my main branch repeatedly trying to hammer it into my tree, with "git diff filename >> todo2.patch; git checkout filename" in between. Then once I've evicted the dirty files, editing the *.patch file with vi to fix up the old context and removed lines that got changed by other patches preventing this from applying. And then when it finally DOES apply and I pull it into a clean tree and testing throws warnings because I didn't marshall over all the (void *) to squelch the "const" on the changed data type, a few "git show | patch -p1 -R && git reset HEAD^1" (in both trees) and yet MORE editing the patch with vi and re-applying. And then once it's all happy, don't forget "patch -p1 < todo2.patch" to re-dirty those bits of the tree consistently with whatever other half-finished nonsense I've wandered away from midstream.

Meanwhile, the linux-kernel geezers have auto-posters bouncing patches because "this looks like it would apply to older trees but you didn't say which ones". (And "I've been posting variants of this patch since 2017, you could have applied any of those and CHOSE not to, how is this now my problem" is not allowed because Greg KH's previous claim to fame was managing the legacy trees, and personal fame is his reason for existing. Then again it does motivate him to do a lot of work, so I can only complain so much. Beats it not happening. But there are significant negative externalities, which Linus isn't mitigating nearly as much as he used to.)


January 9, 2024

I've been up at Fade's and not blogging much, but I should put together a "how to do a new mkroot architecture" explainer.

You need a toolchain (the limiting factor of which is generally musl-libc support), you need a linux kernel config (assuming arch/$ARCH has a defconfig file to start from), and you need a qemu-system-$ARCH that can load the kernel and give serial output and eventually run at least a statically linked "hello world" program out of userspace. (Which gets you into elf/binflt/fdpic territory sometimes.)

The quick way to do this is use an existing system builder that can target qemu, get something that works, and reverse engineer those settings. Once upon a time QEMU had a "free operating system zoo" (at http://free.oszoo.org/download.html which is long dead but maybe fishable out of archive.org?) which I examined a few images from, and debian's qemu-debootstrap is another interesting source (sadly glibc, not musl), but these days buildroot's configs/qemu_* files have a bunch (generally uclibc instead of musl though, and the qemu invocations are hidden under "boards" at paths that have no relation to the corresponding defconfig name; I usually find them by grepping for "qemu-system-thingy" to see what they've got for that target).

Once you've got something booted under qemu, you can try to shoehorn in a mkroot.cpio.gz image as its filesystem here to make sure it'll work, or worry about that later. If you don't specify LINUX= then mkroot doesn't need to know anything special about the target, it just needs the relevant cross compiler to produce binaries. (The target-specific information is all kernel config and qemu invocation, not filesystem generation.)
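So a first smoketest invocation might look like this (hedged: the toolchain prefix obviously varies by target, and the kernel path is an example):

  # rootfs only: no LINUX= means no target-specific kernel config needed
  CROSS_COMPILE=powerpc-linux-musl- mkroot/mkroot.sh
  # add LINUX=/path/to/kernel/source once you've worked out the microconfig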

Adding another toolchain to mcm-buildall.sh is its own step, of course. Sometimes it's just "target::" but some of them need suffixes and arguments. Usually "gcc -v" will give you the ./configure line used to create it, and you can compare with the musl-cross-make one and pick it apart from there.

The tricksy bit of adding LINUX= target support is making a microconfig. I should probably copy my old miniconfig.sh out of aboriginal linux into toybox's mkroot directory. That makes a miniconfig, which laboriously discovers the minimal list of symbols you'd need to switch on to turn "allnoconfig" into the desired config. (Meaning every symbol in the list is relevant and meaningful, unlike normal kernel config where 95% of them are set by defaults or dependencies.)

Due to the way the script works you give it a starting config in a name OTHER than .config (which it repeatedly overwrites by running tests to see if removing each line changes the output: the result is the list of lines that were actually needed). You also need to specify ARCH= the same way you do when running make menuconfig.

The other obnoxious thing is that current kernels do a zillion toolchain probes and save the results in the .config file, and it runs the probes again each time providing different results (so WHY DOES IT WRITE THEM INTO THE CONFIG FILE?) meaning if you don't specify CROSS_COMPILE lots of spurious changes happen between your .config file and the tests it's doing. (Sadly, as its development community ages into senescence, the linux kernel gets more complicated and brittle every release, and people like me who try to clean up the accumulating mess get a chorus of "harumph!" from the comfortable geezers wallowing in it...)

Then the third thing you do once you've got the mini.config digested is remove the symbols that are already set by the mkroot base config, which I do with a funky grep -v invocation, so altogether that's something like:

$ mv .config walrus
$ CROSS_COMPILE=/path/to/or1k-linux-musl- ARCH=openrisc ~/aboriginal/aboriginal/more/miniconfig.sh walrus
$ egrep -v "^CONFIG_($(grep -o 'BINFMT_ELF,[^ ]*' ~/toybox/mkroot/mkroot.sh | sed 's/,/|/g'))=y" mini.config | less

And THEN you pick through the resulting list of CONFIG_NAME= symbols to figure out which ones you need, often using menuconfig's forward slash search function to find the symbol and then navigating to it to read its help text. Almost always, you'll be throwing most of them away even from the digested miniconfig.

And THEN you turn the trimmed miniconfig into a microconfig by peeling off the CONFIG_ prefix and the =y from each line (but keep ="string" or =64 or similar), and combining the result on one line as a comma separated value list. And that's a microconfig.
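Mechanically, that last squash is about one sed and a tr (my throwaway version, not anything checked in):

  # strip the CONFIG_ prefix and bare =y (keeping ="string" and =64 style
  # values), then join the lines with commas
  sed 's/^CONFIG_//; s/=y$//' mini.config | tr '\n' ',' | sed 's/,$//'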

And THEN you need to check that the kernel has the appropriate support: enough memory, virtual network, virtual block device, battery backed up clock, and it can halt/reboot so qemu exits.


January 6, 2024

The amount of effort the toys/pending dhcpd server is putting in is ridiculous for what it accomplishes. Easier to write a new one than trim this down to something sane.

Easier != easy, of course.


January 5, 2024

I had indeed left the 256 gig sd card at Fade's apartment, which is what I wanted to use in the "real server". (I had a half-dozen 32 gig cards lying around, but once the OS is installed that's not enough space to build both the 32 bit and 64 bit hosted versions of all the cross compilers, let alone everything else. I want to build qemu, both sets of toolchains for all targets, mkroot with kernel for all targets, and set up some variant of regression test cron build. So big sd card.)

The orange pi OS setup remains stroppy: once I got the serial adapter hooked up to the right pins, there's a u-boot running on built-in flash somewhere, as in boot messages go by without the sd card inserted. Not hugely surprising since the hardware needs a rom equivalent: it's gotta run something first to talk to the SD card. (And this one's got all the magic config to do DRAM init and so on, which it chats about to serial while doing it. At 1.5 megabit it doesn't slow things down much.) Which means I'm strongly inclined to NOT build another u-boot from source and just use that u-boot to boot a kernel from the sd card. (If it's going to do something to trojan the board, it already did. But that seems a bit low level for non-targeted spyware? My level of paranoia for that is down closer to not REALLY trusting Dell's firmware, dreamhost's servers, or devuan's preprepared images. A keylogger doing identity theft seems unlikely to live THERE...)

Besides, trying to replace it smells way too bricky.

I _should_ be able to build the kernel from vanilla source, but "I have a device tree for this board" does not tell me what config symbols need to be enabled to build the DRIVERS used by that device tree. That's kind of a large missing data conversion tool, which is not Orange Pi's fault...

So anyway, I've copied the same old chinese debian image I do not trust (which has systemd) to the board, and I want to build qemu and the cross compilers and mkroot with Linux for all the targets on the nice BIG partition, and record this setup in checklist format. (In theory I could also set up a virtual arm64 debian image again and run it under qemu to produce the arm toolchains, but I have physical hardware sitting RIGHT THERE...)

I _think_ the sudo apt-get install list for the qemu build prerequisites is python3-venv ninja-build pkg-config libglib2.0-dev libpixman-1-dev libslirp-dev but it's the kind of thing I want to confirm by trying it, and the dhcp server in pending is being stroppy. I got it to work before...

Sigh. It's HARDWIRED to hand out a specific address range if you don't configure it. It doesn't look at what the interface is set for, so it's happy to try to hand out addresses that WILL NOT ROUTE. That's just sad.
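
For the record, once the network cooperates, the qemu build I want to confirm boils down to something like this (package list from above; configure flags and -j count to taste):

$ sudo apt-get install python3-venv ninja-build pkg-config libglib2.0-dev libpixman-1-dev libslirp-dev
$ ./configure && make -j $(nproc)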


January 2, 2024

I fly back to Minneapolis for more medical stuff on Wednesday (doing what I can while still on the good insurance), which means I REALLY need to shut my laptop down for the memory swap and reinstall before flying out.

So of course I'm weaning mkroot off oneit, since writing (most of) a FAQ entry about why toybox hasn't got busybox's "cttyhack" command convinced me it could probably be done in the shell. Something like trap "" CHLD; setsid /bin/sh <>/dev/$(sed '$s@.*/@@' /sys/class/tty/console/active) >&0 2>&1; reboot -f; sleep 5 presumably covers most of it.

But then I was testing mkroot to make sure reparent-to-init doesn't accumulate zombies and such. That's what the trap doing SIG_IGN on SIGCHLD is for: a zombie sticks around while its signal delivery is pending, presumably so the parent can attach to it and query more info. But if the parent doesn't know it's exited until the signal is delivered, and it goes away as soon as the signal IS delivered, I don't know how one would take advantage of that?

Anyway, I noticed that "ps" is not showing any processes, which is a thing I hit back on the turtle board, and it's because /proc/self/stat has 0 in the ttynr field, even though stdin got redirected. But stdout and stderr still point to /dev/console? Which means the kernel thinks we're not attached to a controlling tty, so of course it won't show processes attached to the current tty.

I vaguely remember looking at lash years ago (printed it out in a big binder and read it through on the bus before starting bbsh) and it was doing some magic fcntl or something to set the controlling tty, but I'm in a systematic/deterministic bug squishing mood rather than "try that and see", so let's trace through the kernel code to work backwards to where this value comes from.

We start by looking at MY code to confirm I'm looking at the right thing. (It's worked fine on the host all along, but you never know if we just got lucky somehow.) So looking at my ps.c line 247, it says SLOT_ttynr is at array position 4 (it's the 5th entry in the enum but the numbering starts from zero), and function get_ps() is reading /proc/$PID/stat on line 749, skipping the first three oddball fields (the first one is the $PID we needed to put in the path to get here, the second is the (filename), and the third is a single character type field; everything after that is a space-separated decimal numeric field), and then line 764 is the loop that reads the rest into the array starting from SLOT_ppid, which is entry 1 back in the enum on line 245. So stat field 4 (counting from 1) lands in array position 1 (counting from 0), an offset of 3, which means SLOT_ttynr at array position 4 corresponds to stat field 4+3=7 in the kernel documentation's field table. (In theory we COULD overwrite this later in get_ps(), but it only recycles unused fields and this is one we care about.)
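
Since the field counting is fiddly, here's a standalone sanity check compilable on the host (a sketch of the same arithmetic, NOT toybox's actual parsing code): skip past the (comm) field, which can itself contain spaces, then count space separators up to field 7.

#include <stdio.h>
#include <string.h>

int main(void)
{
  char buf[4096], *s = 0;
  FILE *fp = fopen("/proc/self/stat", "r");
  int i, tty_nr = 0;

  if (fp && fgets(buf, sizeof(buf), fp)) s = strrchr(buf, ')');
  if (!s) return 1;
  s += 2; // skip ") " to field 3 (the single character state field)
  for (i = 0; s && i<4; i++) {
    s = strchr(s, ' '); // advance past fields 3 (state), 4 (ppid), 5 (pgrp), 6 (session)
    if (s) s++;
  }
  if (s) sscanf(s, "%d", &tty_nr); // field 7: tty_nr
  printf("tty_nr=%d\n", tty_nr);

  return 0;
}

Run from a normal terminal that should print a nonzero encoded device number; under the misbehaving mkroot init it would print tty_nr=0, matching what ps is seeing.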

The kernel documentation has bit-rotted since I last checked it. They converted proc.txt to reStructuredText (proc.rst, a move that mostly just makes the git log/annotate history harder to parse), and in the process the index up top still says "1.8 Miscellaneous kernel statistics in /proc/stat" but if you search for "1[.]8" you get "1.8 Ext4 file system parameters". Which should not be IN the proc docs anyway, that should be in some sort of ext4 file? (Proc is displaying it, but ext4 is providing it.)

I _think_ what I want is "Table 1-2: Contents of the status fields (as of 4.19)" (currently line 236), but right before that it shows /proc/self/status, which _looks_ like a longer version of the same info, one per line with human readable field descriptions added... except it's not. That list skips Ngid, and if you look at the current kernel output it's inserted "Umask" in second place. So "which label goes with which entry offset" is lost; they gratuitously made more work for everyone by being incompatible. That's modern linux-kernel for you: an elegant solution to making the kernel somewhat self-documenting is right there, and instead they step in gratuitous complexity because "oops, all bureaucrats" drove away every hobbyist who might point that out. Anyway, table 1-2 is once again NOT the right one (it hasn't even GOT a tty entry!), table 1-4 on line 328 is ("as of 2.6.30-rc7", which came out May 23, 2009, so that note is 15 years old, lovely), and the 7th entry in that is indeed tty_nr! So that's nice. (Seriously, when Greg finally pushes Linus out this project is just going to CRUMBLE TO DUST.)

Now to find where the "stat" entry is generated under fs/proc in the kernel source. Unfortunately, there's not just /proc/self/stat, there's also /proc/stat and /proc/self/net/stat, so grep '"stat"' fs/proc/*.c produces 5 hits (yes, single quotes around the double quotes, I'm looking for the string constant), but it looks like the one we want is in base.c connecting to proc_tgid_stat (as opposed to the one below it connecting to proc_tid_stat, which is /proc/$PID/task/$TID/stat). Of course neither of those functions is in fs/proc/base.c, they're in fs/proc/array.c right next to each other, where each calls do_task_stat() with the last argument being a 0 for the tid version and a 1 for the tgid version. The do_task_stat() function is in that same file, and THAT starts constructing the output line into its buffer on line 581. seq_put_decimal_ll(m, " ", tty_nr); is the NINTH output, not the seventh, but seq_puts(m, " ("); and seq_puts(m, ") "); just wrap the truncated executable name field, and subtracting those two makes tty_nr entry 7. So yes, we're looking at the right thing.

So where does tty_nr come from? It's a local set earlier in the function via tty_nr = new_encode_dev(tty_devnum(sig->tty)); (inside an if (sig->tty) right after struct signal_struct *sig = task->signal;) which is _probably_ two uninteresting wrapper functions: new_encode_dev() is an inline from include/linux/kdev_t.h that shuffles bits around (major:minor are no longer 8 bits each, but when they expanded, minor wound up straddling major to avoid changing existing values that fit within the old ranges). And tty_devnum() is in drivers/tty/tty_io.c doing return MKDEV(tty->driver->major, tty->driver->minor_start) + tty->index; for whatever that's worth. But really, I think we care that it's been set, meaning the pointer isn't NULL.
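
For the curious, the bit shuffle in question (from include/linux/kdev_t.h, quoted from memory so check your tree) is:

static inline u32 new_encode_dev(dev_t dev)
{
	unsigned major = MAJOR(dev);
	unsigned minor = MINOR(dev);
	return (minor & 0xff) | (major << 8) | ((minor & ~0xff) << 12);
}

So devices with major and minor both under 256 keep the old major<<8|minor encoding, and the extra minor bits hide up above the 12-bit major field.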

So: where does task->signal->tty get set? I did grep 'signal->tty = ' * -r because the * skips the hidden directories, so it doesn't waste a bunch of time grinding through gigabytes of .git/objects. There's no guarantee that's what the assignment looks like, but it's a reasonable first guess, and it finds 4 hits: one in kernel/fork.c and three in drivers/tty/tty_jobctrl.c. The fork() one is just copying the parent process's status. The assignment in proc_clear_tty() sets it to NULL, which is getting warmer. A function called __proc_set_tty() looks promising, and the other assignment is tty_signal_session_leader() again setting it to NULL. (Some kind of error handling path?)

So __proc_set_tty() is the unlocked function, called from two places (both in this same file): tty_open_proc_set_tty() and proc_set_tty() (a wrapper that just puts locking around it). The second is called from tiocsctty(), which is a static function called from tty_jobctrl_ioctl() in case TIOCSCTTY, which means this can be set by an ioctl.

Grepping my code for TIOCSCTTY, it looks like that ioctl is getting called in openvt.c, getty.c, and init.c, the latter two of which are in pending.

The main reason I haven't cleaned up and promoted getty is I've never been entirely sure when/where I would need it. (My embedded systems have mostly gotten along fine without it.) And it's STILL doing too much: the codepath that calls the ioctl is also unavoidably opening a new fd to the tty, but I already opened the new console and dup()'d it to stdout and stderr in the shell script snippet. The openvt.c plumbing is just doing setsid(); ioctl(0, TIOCSCTTY, 0); which is a lot closer to what I need, except I already called setsid myself too. Ooh, the setsid man page says there's a -c option! Which didn't show up in my TIOCSCTTY grep because it's tcsetpgrp(), which in musl is a wrapper around ioctl(fd, TIOCSPGRP, &pgrp_int). In the kernel that's back in drivers/tty/tty_jobctrl.c, where tty_jobctrl_ioctl() dispatches it to tiocspgrp(), which does if (!current->signal->tty) retval = -ENOTTY; so that would fail here. And it's setting a second field, which seems to depend on the first one.

TWO fields. Um. Ok, a non-raw controlling tty does signal delivery, when you hit ctrl-C or ctrl-Z. Presumably, this is the process (group?) the signal gets delivered TO?
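
So the full "become the thing the tty delivers signals to" dance is presumably something like this sketch (my guess at the combination, not what any existing tool does verbatim; error handling omitted and stdin assumed to already be the target tty):

#include <unistd.h>
#include <sys/ioctl.h>

int main(void)
{
  // Become session leader (fails harmlessly if we already lead one),
  // then claim stdin as the controlling tty: this is what populates
  // task->signal->tty, i.e. the tty_nr field in /proc/self/stat.
  setsid();
  ioctl(0, TIOCSCTTY, 0);

  // The second field: make our process group the foreground one, so
  // ctrl-C and ctrl-Z signals get delivered to us. tcsetpgrp() is the
  // TIOCSPGRP ioctl under the covers, and fails with ENOTTY without
  // the TIOCSCTTY step above.
  tcsetpgrp(0, getpgid(0));

  execl("/bin/sh", "sh", (char *)0);
  return 1;
}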

Ah, man 4 tty_ioctl. Settling in for more reading. (I studied this EXTENSIVELY right when I was starting to write my own shell... in 2006. And I didn't really get to the end of it, just... deep therein.)

My real question here is "what tool(s) should be doing what?" Is it appropriate for toysh to do this for login shells? Fix up setsid -c to do both ioctl() types? Do I need to promote getty as "the right way" to do this?

I don't like getty, it SMELLS obsolete: half of what it does is set serial port parameters, which there are separate tools for (stty, and why stty can't select a controlling tty for this process I dunno). Way back when, you had to manually do IRQ assignments depending on how you'd set the jumpers on your ISA card, and there was a separate "setserial" command for that nonsense because apparently putting it in getty or stty wasn't an option. There's tune2fs, and hdparm, and various tools to mess with pieces of hardware below the usual "char or block device" abstractions.

But getty wants to know about baud rate and 8N1 and software flow control for historical reasons, and I'm going... this could be netconsole or a frame buffer, and even if it ISN'T, the bootloader set it up already (or it's virtual, or a USB device that ACTS like a serial port but isn't really, or hardware like "uartlite" that's hardwired to a specific speed, so those knobs spin without doing anything) and you should LEAVE IT ALONE.


Back to 2023