Rob's Blog (rss feed) (mastodon)

2022 2021 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2002


December 26, 2023

I got pointed at a page of "Obscure features of C" and I'm always up for learning new corner cases about my stomping grounds, so...

The link calling it "obscure" is misleading. The page itself says "lesser known tricks, quirks, and features of C" which is a lot more accurate. The list includes "last needed in about 1977" for trigraphs, "introduced in the 2011 standard so not everybody still doing c99 has noticed it yet" for compound literals and designated initializers, and borderline "bug in some but not all compilers" for multi-char literals (probably a terrible early attempt at wide char support kept around because some program still won't build without it).

I admit I didn't know the syntax for array pointers, but I suspect that's because I almost never use multidimensional arrays in C: a[b*c+d] is easy to do and by the time you're adding a third layer it's hardly ever an array, it's usually a tree. And if it's a _single_ dimensional array that you're trying to add bounds checking to... again, I don't expect compile-time bounds checking to mean anything outside MAYBE the stack. (An ASAN or HWASAN variant that figures out you're indexing outside of a single heap allocation might be nice, but the bound to check comes from malloc() not from the access type.) Still, good start, technically I've learned something. (It's a bit like function pointer syntax where you know it's there but don't remember where the extra parentheses go.)
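
For future reference, a sketch of the syntax in question (the grid here is made up, I just needed a multidimensional array to point at):

  #include <stdio.h>

  int main(void)
  {
    int grid[3][4] = {{0}};
    int (*row)[4] = grid;  // pointer to array of 4 ints: the parens matter,
                           // "int *row[4]" would be an array of 4 pointers

    row[2][1] = 42;             // same memory as grid[2][1]
    printf("%d\n", grid[2][1]); // prints 42

    return 0;
  }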

Bitfields are sadly useless because nothing defines which bit goes where (order and padding) so if you try to match them up with hardware registers even compiler _version skew_ can break your program. (I suppose you can hit it with __attribute__((packed)) and maybe newer standards tried to address some of the worst bits since last I checked? The "zero length bitfield" entry implies so, although this document isn't distinguishing between "introduced in C99" and "only happens in Microsoft Visual Studio versions 6.1-12.7, not including 11.3".) The other problem is what the compiler does with bitfields tends to be horribly inefficient compared to just doing it yourself. (And you didn't _define_ short to be 16 bits, that's something even Windows' LLP64 and everybody else's LP64 agree on. Char is 8, short is 16, int is 32, everybody except Windows makes long the same size as a pointer, and long long is 64.)
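
By which I mean something like this sketch (the register layout is made up): where the top version's bits land is up to the compiler, the bottom version is pinned down:

  #include <stdio.h>

  struct ctrl_bits {        // which bit "enable" lands in is implementation
    unsigned enable : 1;    // defined: order, padding, and storage unit size
    unsigned mode   : 3;    // can all vary between compilers (or compiler
    unsigned count  : 4;    // versions)
  };

  int main(void)
  {
    // doing it by hand says exactly which bits you get
    unsigned reg = 0;

    reg |= 1;                // enable = bit 0
    reg |= (5 & 7) << 1;     // mode = bits 1-3
    reg |= (9 & 15) << 4;    // count = bits 4-7
    printf("%x\n", reg);     // 9b

    return 0;
  }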

Are there C programmers who don't know about "volatile"? How is that obscure? (Various people complain it's too big a hammer, and also that it's not the same as a memory barrier when your problem is the processor hardware reordering accesses in an SMP context, but "what it does" was in K&R wasn't it?) Stuff like "register" and "inline" have been around forever but get widely ignored by compilers which do or don't do that stuff automatically anyway. (They don't FORCE anything, therefore they don't MEAN anything. Do not give advisory keywords to a compiler that's going to have its optimizer rewritten twice over the next decade as we bounce between generations of hardware. Technically there's an "auto" keyword to be the opposite of "static" but it's a weird vestigial thing with no use I'm aware of.)
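
(For the record, the canonical volatile use is a flag something sets behind the optimizer's back, roughly:)

  // without volatile the compiler can hoist the load out of the loop and
  // spin forever on a stale copy cached in a register; volatile forces a
  // fresh read each time around
  volatile int done;  // set from a signal handler or interrupt

  void wait_for_it(void)
  {
    while (!done);
  }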

Not knowing about "restrict" is good: Dennis Ritchie sent a long screed to the committee to try to get them to stop breaking stuff, and "restrict" is less bad than the "noalias" he argued them out of adding, but it's still pointless and the correct response to seeing it in a program is ripping it out.

Was it C99 that added "flexible array members"? Because everybody was doing char blah[0]; as the last array member and then indexing off the end (so they could do math in the malloc() and easily use the extra space), and the pearl clutchers went "but but but BOUNDS CHECKING!" and everybody else told them to sod off because the compiler literally CAN'T get that right, so C99 made a special magic invisible zero to put there instead and we all use it because eh, one less character? I think it was C99...
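
I.E. the pattern looks like this (a sketch with made-up names, error checking elided):

  #include <stdlib.h>
  #include <string.h>

  struct blob {
    int len;
    char data[];  // C99 flexible array member, previously char data[0];
  };

  struct blob *new_blob(char *s)
  {
    int len = strlen(s);
    // do the math in the malloc(), use the extra space through the member
    struct blob *b = malloc(sizeof(struct blob)+len+1);

    b->len = len;
    memcpy(b->data, s, len+1);

    return b;
  }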

The %n format specifier is obscure? You can't get granular error information out of sscanf() without it. (How far in the parsing did it get before losing traction? The return value is the number of fields populated, except it doesn't count %n itself, and also skips the * ones you told it not to write to, and is generally less useful than you'd think. If you scanf("[%d,%d]", &a, &b) that doesn't say whether there was a closing ']', you need len=0; scanf("[%d,%d]%n", &a, &b, &len); if (len)...)
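
Spelled out as a runnable example:

  #include <stdio.h>

  int main(void)
  {
    int a, b, len = 0;

    // %n records how many bytes were consumed so far, and isn't counted in
    // the return value; len stays 0 if parsing lost traction before the ']'
    if (2 == sscanf("[12,34]", "[%d,%d]%n", &a, &b, &len) && len)
      printf("parsed %d bytes: %d,%d\n", len, a, b);

    return 0;
  }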

And using %.*s to truncate string output: not obscure! Regularly useful! (What was obscure was tracking down that the limit is indeed measured and enforced in BYTES not utf-8 characters; I made puppy eyes at Rich Felker back around 2013 and he tracked it down for me, confirming it is so. :)
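
For anyone who hasn't seen it:

  #include <stdio.h>

  int main(void)
  {
    char *s = "potato salad";

    printf("%.*s\n", 6, s);  // prints "potato", and that 6 is measured
                             // and enforced in BYTES not utf-8 characters
    return 0;
  }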

"Interlacing syntactic constructs" is a verbose way of saying case labels are goto: labels that can plonk down in the middle of loops, and "switch" is just a multi-destination goto. Which was a horrific thing I encountered cleaning up the original bunzip2 code which made EXTENSIVE USE OF THIS for very bad reasons (his code needed to request more data and he did it by COPYING EVERY LOCAL VARIABLE INTO AND OUT OF A STRUCT EVERY FUNCTION CALL, with big a=s.a; b=s.b; blocks, and corresponding s.a=a; s.b=b; blocks, and setting a "state=LABEL;" value right before each case label... My rewrite just had a grab_bits(3) function call that could read() more data from upstream or longjmp() out with an error as necessary when the input buffer ran dry. That change alone eliminated like 2/3 of his source lines.

That's another reason I mostly avoid switch/case in my own programs, although mainly it's an unnecessarily verbose syntax (it annotates the common case, requiring a zillion "break" statements, instead of annotating falling THROUGH), it produced larger code than if/else staircases when I actually checked, wasn't measurably faster, nobody can agree how to indent it, and despite being restricted to checking integers (because the jump targets aren't case "string":) you STILL don't get any sort of...

Does any C programmer think --> isn't two operators (decrement, greater than)? That's not obscure, that's C++ polluting people's minds. Add the space if it helps you read it.
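
In case it's somehow unclear:

  #include <stdio.h>

  int main(void)
  {
    int i = 3;

    while (i --> 0) printf("%d\n", i);  // means while ((i--) > 0): 2 1 0

    return 0;
  }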

The idx[arr] thing again seems less... it's pointer math. The question is can the compiler figure out the stride for the addition of the two types (because pointer+offset is actually pointer+sizeof(*pointer)*offset, and yes sizeof(void) being 0 vs 1 was an area of standards contention for a while... I still say it should have been 0, catch the bugs quickly). But I first hit 7[str] in the obfuscated C code contest back in college, and I've had compilers spit warnings at me when I did 7+s instead of s+7. I have stuff like -Wno-char-subscripts in toybox's CFLAGS because of that sort of thing.
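
The pointer math in question:

  #include <stdio.h>

  int main(void)
  {
    char *str = "walrus";

    // a[b] is *(a+b), and addition commutes, so these are the same byte
    printf("%c %c %c\n", str[2], *(str+2), 2[str]);  // l l l

    return 0;
  }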

Same for negative array indexes: I actually use those, sometimes you increment past something and then need to check what you were just on. I know x[-1] is valid for the exact same reason I know x[1] is valid: it's not what my pointer points to but I have context. Some compiler's warning generator was mad, I had to find a -Wstop-that to silence it.
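
A typical (made up) example of having that context:

  #include <string.h>

  // s[-1] is valid for the same reason s[1] is: we know where we are
  int ends_with_slash(char *path)
  {
    char *s = path+strlen(path);

    return s > path && s[-1] == '/';
  }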

Toybox's OPTSTR generation is an instance of constant string concatenation, and I suspect I've done it to break up some long printf() strings across lines but grep -A 1 '"[ \t]*$' main.c lib/*.c toys/*/*.c | grep '^[ \t]*"' didn't find one, so...
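
(For reference, constant string concatenation is just adjacent string literals gluing together at compile time:)

  #include <stdio.h>

  int main(void)
  {
    char *s = "abc" "def";  // same as "abcdef"

    printf("so you can split a long printf string "
           "across lines without a backslash: %s\n", s);

    return 0;
  }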

Don't get me started on backslash line splicing, I still need to fix that in toysh. (The hard part is cleanly backing _out_ my previous attempt to fix it and adding enough tests.)

Yes, I do use && and || as conditionals in shell scripts all the time, and I know about (and rely upon) the short circuit behavior in C where the right side only gets evaluated if the left side hasn't already determined the result. Lots of if (x && *x) avoids null pointer dereferences that way. (I further rely on dead code elimination to yank function calls for stuff I'm not linking in, if (FLAG(x) && potato()) won't link in the function if FLAG_x is constant zero because the feature is disabled.) That said, trying to avoid if() statements in C is asking for "statement with no effect" warnings, which they note. And doesn't result in smaller binaries. if (x) blah(); and x && blah(); presumably compile to the same thing.

I don't like "enums". It's syntactic sugar hiding what the program is DOING. That said, I broke down and used them in a few places, like ps.c; it's not a tool with NO use, just... widely overused. And dumping out into the larger program symbol namespace while SEEMING not to, which is why I prefixed them... It is interesting that you can use them as asserts (although the real trick here is having the compiler notice a division by zero error at compile time), but you can just as easily if (0) a block instead: it still has to syntax check it, which means doing the math for compile-time constants. And I'm not a fan of asserts in general, which is its own whole topic...

(Sigh, I had to do a lot with what is and isn't a compile-time constant in my tinycc fork, because tinycc wasn't letting you do all the global variable assignments it should, so I fixed it. Which brings us to the wonderful world of "constant propagation", which _is_ in the C standard, and DOES come up in what is and isn't a legal initialization of global and static variables. You cannot run CODE to initialize a global, it has to be a constant. And yes, static variables inside functions are functionally equivalent to static variables _outside_ functions, just with a layer of namespacing, so you can't run code to initialize a static variable in a function, either.)
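
I.E. the rule works out to something like:

  int x = 42;         // fine: constant expression
  int y = 40+2;       // fine: constant propagation folds it at compile time
  int *p = &x;        // fine: address constants are allowed too
  //int z = getpid(); // nope: you can't run code to initialize a global

  int counter(void)
  {
    static int count = 8*8;  // static locals follow the same rule

    return count++;
  }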

Declaring a struct in a function return type is no sillier than declaring one when you declare an array of an instance and initialize the members, which we do all the time. Make sure your function declaration precedes its callers, as usual... (A more likely gotcha is that you can pass structures as arguments, not just pointers to them, meaning you can bloat your stack usage TREMENDOUSLY without noticing. Structure assignment in general is basically an implicit memcpy(), but the compiler usually generates TERRIBLE CODE for it.)
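
A sketch of both halves (the structs here are made up):

  struct point { int x, y; };

  // returning a struct instance works fine, declare before use as usual
  struct point midpoint(struct point a, struct point b)
  {
    return (struct point){(a.x+b.x)/2, (a.y+b.y)/2};
  }

  struct big { char wasted[65536]; };

  void careful(struct big b);  // each CALL copies 64k onto the stack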

I actually _use_ the nested struct declaration not being kept nested (because it's a flat namespace) in toysh. The struct declaration has to be inside the GLOBALS() block for me to declare instances of them that the union can calculate the size of at the end of generated/globals.h with all the struct definitions it's block copied out of the various source files. Otherwise it goes barf trying to sizeof() an incomplete type when you #include the header.

Flat initializer lists is another "I don't use multidimensional arrays much" thing. I was vaguely aware of this but don't think I've ever used it. If I have to malloc() a big block of memory by doing math, I'm likely to _access_ it by doing math. Besides, that way it doesn't _have_ to be rigidly sized, that's a choice. The first 32 entries of length 64, the rest of length 16, easy enough... Using them for structs happens all the time, in fact when you struct potato blah = {0}; what you're actually doing is initializing the first member to 0 and then all the members you don't mention also get initialized to 0. (And the compiler generally produces terrible code to do this, it's not a memset() it's a bunch of assignments, but it's not so bad for small things and hopefully they'll fix it upstream someday.) And the "named initializers" (the designated initializers introduced back in C99) clarified that it's not the REMAINING fields, it's EVERY field you didn't mention gets initialized to zero when you have at least one initializer, so if you potshot one in the middle all the others are guaranteed to start zero. With a note that this applies to arrays too but I forget the syntax for initializing only certain field members in an array. I could look it up... (Also, uninitialized global variables are guaranteed to start zeroed thanks to the ELF spec, it wasn't in C99. Maybe they've added it to a more recent one, but those variables living in the .bss section and that section's contents being guaranteed zeroed was what ensured they're zeroed when I researched this in 2011 doing the Linux port to Hexagon.)
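
(For the record, the array syntax is square brackets, if I'm remembering it right:)

  int sparse[8] = {[2] = 42, [5] = 99};  // every entry you didn't mention
                                         // is guaranteed to start zero
  struct potato { int a, b, c; };
  struct potato spud = {.b = 7};         // a and c start zero too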

Implicit casting of void pointers is NOT obscure, it's a core advantage of C over C++. It's only obscure to C++ programmers. (Once again, C++ polluting people's minds.)
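
I.E. this is a type error in C++ but perfectly good C:

  #include <stdlib.h>

  int main(void)
  {
    int *p = malloc(16*sizeof(int));  // void * converts implicitly in C,
    free(p);                          // C++ demands a cast here

    return 0;
  }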

Static array indices in function parameter declaration is yet more compiler micromanagement. It's the kind of thing "whole tree" compiles should get you, and if you're not looking to do that don't try to make it happen MANUALLY. That's asking for REALLY funky optimizer bugs brought on by moon phase and ambient humidity.
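
(The feature in question, for reference:)

  // "static 4" promises the caller always supplies at least 4 valid
  // elements, so the compiler may assume it, warn about it, and "optimize"
  int sum4(int a[static 4])
  {
    return a[0]+a[1]+a[2]+a[3];
  }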

Is "macro overloading by argument list length" actually in one of the standards? I've seen varargs abuse with nested macros where one macro expands to having or not having a comma in it so "what is argument one" changes depending on a comparison result, I believe that's how Linux implemented their IS_ENABLED(BLAH) without a preprocessor populating a .h file like I did back on busybox 20 years ago. (Well, their FIRST stab at it copied my .h file full of zero or one #defines from busybox, except they added it to the headers kconfig wrote out, but I recall them switching to a clever preprocessor hack later.)

Yes, I have seen function pointer typedefs. I mostly avoid typedefs (again, hiding what the program is actually doing), but I have done a bunch of function pointers as arguments, "local variable is a function pointer you assign functions to", and inline (test ? strcmp : strcasecmp)(a, b) stuff that's basically function pointer under the covers. But I mostly avoid typedefs as information hiding: if my variable is type struct walrus I say "struct walrus" in every variable definition and function argument so you NEVER HAVE TO CHASE DOWN THE ACTUAL TYPE. (Did I remember to mention that in the coding style page? Probably...)
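
Meaning given the choice between these two, I write out the second one:

  #include <string.h>

  typedef int (*cmp_t)(const char *, const char *);

  int with_typedef(cmp_t cmp) { return cmp("a", "b"); }

  // spelled out, the argument's actual type is visible at every use site
  int spelled_out(int (*cmp)(const char *, const char *))
  {
    return cmp("a", "b");
  }

  // either way you can call it as with_typedef(strcmp)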

Oh hey, X macro has a wikipedia[citation needed] page. Once again I've been using something for years without knowing there was a name for it. Toybox is using them in several places, somewhat extensively in getconf.c for example.
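
The trick being that the list lives in ONE place and each user redefines X to expand it differently (a made-up example):

  #include <stdio.h>

  #define COLORS X(red) X(green) X(blue)

  #define X(name) COLOR_##name,
  enum { COLORS COLOR_MAX };
  #undef X

  #define X(name) #name,
  char *color_name[] = { COLORS };
  #undef X

  int main(void)
  {
    int i;

    for (i = 0; i<COLOR_MAX; i++) printf("%d=%s\n", i, color_name[i]);

    return 0;
  }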

No, you do NOT have "named function parameters", you created a MACRO to abuse C99 designated struct initializers and you'd have to create a similar macro wrapper around every single function you want to do that for. Similarly for the "combining" version, that way lies generated code and reinventing cfront. And the "Abusing unions" one (I agree it's abuse) once again requires the same fields in the same order to be listed twice, manually setting up something brittle just so your users then don't have to say where things actually are.
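
I.E. the "named parameters" amount to this, with one wrapper macro per function (names made up):

  #include <stdio.h>

  struct draw_args { int x, y, color; };
  void draw_it(struct draw_args a) { printf("%d,%d,%d\n", a.x, a.y, a.color); }

  // a MACRO abusing designated struct initializers, needed PER FUNCTION
  #define draw(...) draw_it((struct draw_args){__VA_ARGS__})

  int main(void)
  {
    draw(.x = 3, .color = 7);  // unmentioned fields (y here) start zero

    return 0;
  }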

The quality is declining in this list, let's knock a few more out quickly: "Unity builds"... yes you can #include all your source files from one top level source file. That's not a secret, merely a bad idea. (When SMP happened, "make -j 4" was stupid and "cc -j 4 *.c" was RIGHT THERE, notice how the compiler resets its state between each file so statics and function prototypes and such reset, which a pile of #includes won't.) Yes, I was aware of scanf square bracket syntax: it's in the scanf man page. Boehm GC doesn't WORK, and "Cosmopolitan Libc" is a nightmare aping Java's "write once debug anywhere" from 20 years ago. Inline assembly isn't well-known? Now you're REALLY reaching. The sizeof() duplicate case thing comes right out and says "depends on compiler" (and I've exfiltrated embedded data by blinking LEDs and beeping speakers, but these days you almost always have a serial port, it's 2 contacts and a clock). If you want to detect a constant expression, just try using it as a global initializer (you can't generate code to initialize globals so if it's NOT a constant expression the compiler will barf). Yes, C can be used in an object oriented manner, the Linux kernel does this. I do not care about the "Metaprogramming" one enough to even insult it, and yes the preprocessor can do a lot more than is a good idea to use.
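
(The global initializer trick from that paragraph, spelled out:)

  // compiles if and only if the initializer is a constant expression,
  // because you can't generate code to initialize a file scope variable
  int must_be_constant = 3*14;   // fine
  //int wont_compile = getpid(); // error: initializer element is not constant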

There is a distinct, "Oh wow, KNIVES! I love knives. Watch THIS!" vibe towards the end of that list. And a lot of "here's things people built on top of C, which did not see wide use, gee I wonder why?"

Sigh, I'd hoped to learn more from that. Oh well...


December 24, 2023

[Ok, this was the original entry for the 23rd but I'm moving it here. Still in late January. Honestly, editing these for human consumption takes longer than writing them.]

In response to the recent email thread [your guess is as good as mine, it's been a month] I dug up rebootstrap again, and it said there was a 2015 talk from its maintainer about it, so I went to see if there was a recording, and it turns out there was! Not on youtube, on peertube. And the first hit on peertube was a server that went "actually I don't have this, want me to redirect you to the server that does?" which is a silly manual step, but eventually I found the page with the video, which does not include the word "rebootstrap", it says "Automating Architecture Bootstrap" and had literally zero views before I found it.

And the site, peertube.debian.social, has a BUNCH of interesting videos which you have to scroll back through an insanely slow loading index page of the endless-scrolling variety so you can't even temporarily bookmark your place if the page ever has to reload, or go "these seem roughly chronological, let me skip ahead about 2 years" by manually fiddling with the index number in the URL for the next reload. Oh no, you've got to let the javascript VERY SLOWLY do its thing, adding yet more entries to a far-too-long list that you must keep your own place in, because that's "modern". You also can't ctrl-F search what you haven't loaded yet, so you have to scroll "to the end" (I'm not even sure that's possible) and then do a text search of the page. (At least it doesn't do that stupid thing twitter used to do where the actual text of each tweet was some sort of dynamically populated submodule that ctrl-F searches couldn't see, so you had to CUT AND PASTE the entire page to a text file to search recent history. No idea if/when that got fixed, I deleted my twitter account six months after Twitler declared his intention to sink the place.)

Anyway, I checked to see if "toybox" was there and didn't find it, so I tried "Android" and found a promising "Android tools BOF" link, and RIGHT NEXT TO THAT in the list was "Building a busybox based debian" which I am VERY interested in, because swapping out the base packages for toybox (and redoing the build to create a repo of existing supported hardware architectures but with every package linked against musl so it's a new SOFTWARE architecture... kinda like hard float vs soft float, that's what the rebootstrap interest is for)... I stumbled across that video by accident. Scrolling a bit more there's also something called the "Spontaneous Cross Compiling BOF" (is "spontaneous cross compiling" a specific cross compiling technique, or was the BOF spontaneous yet recorded, and is this general "cross compiling source packages" or "cross compiling DEBIAN packages"? Dunno, haven't watched it yet.) And also "Running debian desktop in containers" because sure, why not. (At a guess either ssh x11 tunneling or some kind of VNC server? Either way a pretested recipe would be quite welcome...)

Back in the days of hand-curated internet, somebody would go through this and go "here's today's interesting video!" with a brief description. I could totally do that on Mastodon. I do not have the spoons for such a commitment, but would happily subscribe to somebody who did.


December 23, 2023

I regret having my blog up to date (as in uploaded to the website), because the gap from the 17th to the 19th could easily be backfilled with a couple of the topics I instead have to cram into today's entry. There's an umbrella sort of topic paragraph making multiple unrelated things seem connected (or at least they connected in my head enough to want to blather about them a bit), but I could pretend I was mulling it over ala 2-3 days instead of "I did an ADHD work burst and now let me trail-of-breadcrumbs it for you in a less overwhelming way while I go stare at butterflies for a while recharging".

I went ahead and backfilled three entries, and then my blog languished for a while un-uploaded [It's currently January 24], and now I've cut and pasted them back to the end of this one because interleaving historical entries made me lose track of how far I'd properly edited and posted, until I downloaded the file that was on the server and diffed it with the one on my laptop. Other people's definitions of "the hard things" and "the easy things" don't necessarily map well to my working style...

Anyway, have a giant wall of text:

Fake December 17th:

You know what's missing from the modern open source world? As in hard VACUUM thereof? Online resource curation. My "prototype and the fan club" talk had that chunk about the giant "slush pile" of unsolicited manuscripts in the fiction publishing space, and how editors filter through it to find the small subset they actually clean up and publish. It's a lot of work to chew through the massive bland pablum and boil it down and pan for gold and whatever other metaphors seem appropriate to produce highly concentrated Topics Of Interest. But this is what slashdot did, back in the day. This was why we followed people on Livejournal who were good link curators pointing us at good articles to read.

Current example: somebody should edit Cory Doctorow's "craphound" podcast. Grab just the episodes of him reading his columns (no short stories or "daddy-daughter" episodes), strip out the 3-5 minutes of throat clearing at the start of each one where he chats about his travel schedule and how the pool near his house just reopened, and also remove the "thanks for tuning in again" part at the end about his upcoming book publication schedule. And shorten the actual end credits to the 8 seconds of "You've been listening to the Cory Doctorow Podcast, licensed under Creative Commons Attribution Non-commercial Sharealike US 3.0" and then stop there before he quotes Woody Guthrie for half a minute. (His license DOES allow remixing, so we can redistribute tightened episodes.)

For example, his "Microincentives and Enshittification" podcast should be 4:08 to 14:43 plus 15:09 to 15:17, thus shrinking his 16:04 length to 10:43 and making an audio experience that doesn't require a lot of patience to sit through before he GETS TO THE POINT.

Then you could start up your own RSS feed of the better episodes, maybe do a mastodon account that posts the title and link (or embedded MP3 right there in the post), and offer a link back to his website for the text version.

This is curation work anybody could do... but nobody does. And "algorithms" certainly aren't gonna, by whatever name ("AI", "wizards", "agents", "alexa/cortana", "Clippy", "Microsoft Bob"...).

Fake December 18th:

Yesterday's brief thing on curation was the result of THROWING OUT a lot of initial writing on the topic, and trying to get to the point. My first attempt was longer and wider ranging, and there's a wider point I'm still not sure how to express. For context, here's the material that GOT me to yesterday's short version:

A recent youtube video pointed me at "antennapod" as a good replacement for google's own podcast app (which is joining the Google Graveyard due to ravenous monetization forcing all RSS feed content through the Youtube Music interstitial ad injector every time a speaker inhales, this breath brought to you by a brand of peanut butter cups I used to like quite a lot but haven't bought in YEARS because they WON'T STOP INTERRUPTING WHAT I'M WATCHING so they need to go out of business as an example to others... Ahem: tangent.)

Anyway, I haven't worked out how to move over my existing subscriptions yet (there's a promise of some CSV export and then import thing when Google finally Graveyards its podcast app), but first I wanted to make sure this new one was an ok player experience (it's an open source project so the UI is potluck), and to do that I had it play a few podcasts. The sound quality's ok, it works with the phone screen off and in my pocket, I can speed it up 2x or faster, it turns out it DOESN'T do some weird "force me to subscribe just to listen to one video I searched up" but the UI to make it work is non-obvious (close the beg screen with the tiny x in the corner and then expand the bar at the bottom; THERE'S your controls). Downsides: there's no obvious way to download the MP3, or to get a URL I could share with other people. And every time you reopen the app (even switching away for a moment) it loses context so you have to drill down the list again... Still, I've used worse and google's one is going away...

While evaluating this, I searched for Cory Doctorow's podcast feed and listened to 2 or 3 of his recent episodes, and... there's excellent material here, buried in a layer of throat clearing. The Wadsworth Constant is strong with this one. The first 3-5 minutes of each post is him chatting about recent travel and how happy he is the municipal pool near him reopened, and the last 30 seconds of each podcast is boilerplate of which only the first 8 seconds is actually relevant. Somebody has to listen to all this to get the timestamps to truncate each episode to just the actual material, but we shouldn't make EVERYBODY listen to that.

Making people read that to get to the "here's how you tighten up an episode of this podcast series, every episode could benefit from that, the result would make much more concise, interesting, hard-hitting listening and might spread the material more widely" center... well it's like reading all those recipes that give the cook's life story before listing ingredients. A properly curated cookbook does NOT include that.

I also note that I've been putting "span" tags around my blog entries for years now, on the theory that someday I might teach my rss feed generator script to generate different feeds for the different tags (not hard, just not at the top of the todo list), and... I have no idea what tag to put on today's entry. Yesterday's was the "programming" tag (as most entries are). Back during the Rump administration I had a lot of "politics" tag before settling on the even/odd day thing where I carved out days to NOT TALK ABOUT BOOMERDAMARUNG (or I'd never get anything done, just chanting "oh the humanity" while those obsolete gasbags burned down everything they fell upon)... But this is sort of a meta-observation? If you "view source" the page, my notes at the top have cataloged spans for: links, politics, programming, donotwant (I.E. "Things that open source developers should not be doing"), health, entertainment, smof (Secret Masters of Fandom, an old way of saying convention runner inside baseball stuff, been a while since I ran a convention), and comphist (computer history).

This is sort of a combination of smof and comphist with a side of donotwant. Haven't got a tag for it, and I dislike creating categories with one entry...

Fake December 19th:

Here's another stab at the wider topic. (And back under the "programming" tag because open source USED to do this right. You could argue for comphist but that isn't prescriptive.)

We're starting with a giant slush pile of stuff that Google can't even FIND (especially if you don't know what questions to ask), and which youtube's algorithm will never show you because popularity contests with significant feedback loops aren't the only possible evaluation of merit.

Linux Weekly News still technically exists, although it's stopped even pretending to be weekly, and its kernel summary page stopped even trying to cover major discussion threads of interest on the linux-kernel mailing list in a timely enough manner that participation was remotely possible (even BEFORE the week-long embargo for non-subscribers got added).

Kernel-traffic used to do a great job summarizing the linux kernel mailing list, highlighting a dozen or so threads each week and giving edited summaries of them, and it was recent enough you could jump on and do "and another thing" if they'd missed something major you cared about. When that site's editor stopped having the time/energy to do the job, Jon Masters' podcast version carried the torch for a bit, and other sites did their own summary articles... but it was a bunch of work. And nothing's really replaced it since. (I tried to hire Mark to do it back in the day; he cashed the check and never did the work. Oh well.)

I tried to do it for qemu, but got overwhelmed pretty quickly. Anybody else COULD HAVE done this, but nobody did. (Ok, Julia Evans I.E. B0rk does so. And does an excellent job of it. But there's room for several hundred more people here.)

The failure mode of Linux Weekly News is they want to make original contributions. Their analysis must be a creative addition in its own right or they won't cover it... which is not what Kernel-Traffic did. They just summarized the material as best they could, and got out of the way to let OTHER people be clever/insightful/informative. Linux Weekly News isn't HUMBLE enough to be a good news correspondent, the news is not about the reporter.

I recommended usefully condensing Cory Doctorow's podcast in a way that cut ~5 minutes out of each one without adding a second of new material, and (in my estimation) greatly improving them thereby, and that consistently doing so and posting links with just the TITLE of each entry would be a great service. There is, essentially, ZERO creative input in that, and I would totally subscribe if somebody did that for me. (And went through the backlog of old ones and did them too.) But it's not what people want to do. They want to make their mark. They want the work to be about THEM, not about the work.

This is janitorial work. This is washing dishes. This is cutting the crust off sandwiches. It _can't_ be "about you", it is inherently humble and self-effacing. There's expertise in choosing what to keep and what to throw out, but if you notice the janitor they're doing it wrong. The creativity is in finding places where work needs doing. Where doing work would improve people's lives. You can find faster and more efficient ways of doing it, a cleaner result with less effort, maybe even automate it away so it now "just happens" and NOBODY has to do it anymore. And that's great. But "I'm so happy THIS person is ladling out my soup today" implies either you're hitting on them or somebody else does it noticeably wrong.

Alas, "These floors need sweeping. Somebody should do that..." has only ever been a useful observation when I then personally find the time and energy to do it, and often to keep doing it to have any real impact. I don't have a staff or budget to order somebody else to do it. People keep asking me how they can help, and if I tell them something like this they take it as a hazing ritual or "paying their dues". No, it's real work that needs to be done that makes people's lives better, and there's ALWAYS MORE OF IT.

There ARE places where taste is important. Where knowledge of the material really helps, and where judgement calls need to be made. In my old prototype and the fan club talk, I mentioned Linux distros selecting which packages to include in the base OS image. It's a judgement call. There's no right answer (but a whole lot of wrong ones), and you do your best and if you succeed nobody really notices because no sharp edges stick out to snag their attention.

In 2007 when I took a Linux Foundation Documentation fellowship (and long after the funding ended, eventually became linux-kernel Documentation maintainer because SOMEBODY had to), I was trying to do this sort of shoveling. But it was drinking from a firehose, I was overwhelmed. I got the kernel's Documentation directory online (it previously had no web version), collected the Ottawa Linux Symposium papers together in one place, indexed a bunch of Linux Journal entries, linked to the (previously fairly obscure) Linux Weekly News kernel page index, and put it all together in a single kernel.org/doc page. And I _kept_ updating it even after the Linux Foundation lost interest (until I lost access to it in 2011 because of encroaching bureaucracy), and even briefly became the linux-kernel Documentation directory maintainer (again, long after the Linux Foundation pivoted away), because somebody needed to clean stuff up. (Alas, the kernel devs didn't listen to me any more after I was maintainer than before. The kernel clique had already circled and closed, part-timers had no place.)

This is also what I had in mind for the youtube videos I was trying to do: no wadsworth constant throat clearing, just the edited condensed material. Which is SO much work to get to, and has a "perfect is the enemy of the good" problem where I don't want to add anything that's less refined and purified than it could be (while also missing coverage), so I wind up not adding _anything_. I could do an off the cuff extemporaneous drivel about any of the commands with 15 seconds notice, no problem. That's definitely non-ideal, but would it be better than nothing? Hmmm...

I should figure out how to set up my own peertube instance. Youtube is too shifty for me, they keep going "I am altering the deal, pray I don't alter it any further". (I've started a youtube list of creators complaining about youtube.)


December 22, 2023

The same guy asked a follow-up question "How did the Other Linux Derivatives build their own version of Linux. Like what Software or Tools did they use to make Ubuntu, Red Hat, Fedora, CentOS and others" and my first answer was:

The fundamental problem is all these distros only build under themselves. It's the circular dependencies problem. I gave a sadly jetlagged talk a few years ago that I've been meaning to redo ever since, but in the absence of anything more coherent: Building the simplest possible Linux system. (I'm amazed I stayed upright for that whole thing.)

Red Hat uses a build system called Koji, which last I heard was extremely brittle, and requires very specific host configuration and package versions in order to work. (It's a question of priorities: their goal is to build and ship the next version rapidly with a large team of full-time employees, not make the process reproducible for individual outsiders.)

SuSE uses something called "Open Build Service" which, as with most things using the word "open", is proprietary and specific to its creator. (Methinks the lady doth protest too much.) It very much likes to pretend it's this generic thing applicable to distros other than SuSE. This is marketing bullshit, SuSE would love the rest of the world to use their stuff exclusively and be dependent on them. (So would Microsoft Github.)

Ubuntu is a Debian variant, but uses its own build system that's proprietary to Canonical. I've lost track since they switched to systemd (and thus became uninteresting to me), it has something to do with Launchpad I think?

Once upon a time I researched this whole area at some length, and the answer was "it's terrible". People who've been doing something for 15 years tend to get really insular about it, surrounded by other people of similar seniority, they forget the path that got them there (everything they're not using bit-rots and they accumulate magic unreproducible inputs), and lose the ability to onboard new people.

These days I'm mostly paying attention to debian (the least bad one I found, and yes that includes looking extensively at Gentoo, and its successor Funtoo), and focusing on getting Android to build under Android.

Rob

And then a couple hours later I went back and replied again:

I'm not sure I answered the question you asked. There's two layers here, building binary packages from source, and installing binary packages into a system.

Most people install Linux from binary packages fetched from a server. There's a "base OS install" of some kind which installs the default set of packages (you need enough of an OS to run the package management system, although most distro's default package set is way bigger than that) and then you add packages with the distro's install utility.

Linux distributions are made from packages, each of which is compiled from source and the resulting binaries are archived together and (usually) uploaded into a distribution repository, which is a server full of .rpm files for red hat variants like SuSE, or .deb files for debian variants like ubuntu and knoppix. There's less common ones like pacman (um, arch linux I think?) but rpm and deb are the two big ones.

Each package file is basically an archive of compiled binary files produced by the build (deb is an ar archive wrapping tar files, rpm is based on cpio) with some additional data files saying:

  1. What is this package, and what version is it?
  2. What prerequisite packages does this depend on (and thus need to be installed first), possibly including minimum or exact required versions?
  3. What script do I need to run when I install this?
  4. What cleanup script do I need to run when I _uninstall_ this?
And so on.

The "debootstrap" tool I mentioned creates a fresh debian filesystem in a new directory, installing the "base OS" and "default packages". You can then do a little bit of setup and chroot into it. Way back when I first learned how to do this, I wrote it down. (The principle's the same, but all the version info's 10 years out of date.)

Here's a more recent one that created a known working root filesystem using debootstrap, and then added stuff to it. (Which were the build prerequisites for the rest of the toolchain build it was doing...)

All that's _using_ a server of prebuilt binary packages. The next question is how do I fill UP a server with packages I built myself?

In THEORY, when building a distro from source, what you're really doing is populating a server with package files. Note that you can have _build_ prerequisites that aren't needed at _runtime_, so there's a second layer of prerequisites when building from source. Each distro generally has an extra file they add to the package that describes what it means in the context of their distro, for my toybox project in debian it's probably this file (which is in a directory of files the debian guys added for some reason).

In practice, there's a bunch of little gotchas. If you're starting from an existing system and adding a new package, that's reasonably well understood and documented, because debian has over 40,000 packages in it so there's somewhere over ten thousand people who need to maintain one or two packages, and there's enough handoff that they keep the docs up to date.

Note, all distros do this, the difference is that Debian is a volunteer effort run entirely online, so their processes are accessible to outsiders. Companies like Red Hat (now owned by IBM), SuSE (they're German), and Ubuntu (the company behind them is called Canonical, it's a vanity project of a South African billionaire named Mark Shuttleworth) have hundreds of full-time employees, and when they hire new people they can afford to "onboard" them with in-house training and mentoring from existing employees. So while they THEORETICALLY have similarly open and documented processes... in reality those aren't load-bearing, and if they're broken for 5 years or 90% of the people who try to get through them give up, who's really gonna notice?

But maintaining one package and assembling a bunch of packages into a fresh system are different, and creating a new system from scratch doesn't happen all that often. (Usually it's when people want to bring it up on a new hardware architecture, like that riscv stuff recently, or when people want to replace one of the base OS components in a way that's potentially binary incompatible, such as building with musl-libc instead of glibc, or using llvm instead of gcc. Then you rebuild EVERYTHING, and generally you find somebody who's done it before and ask them because the documentation's incomplete and bit-rotted...)

Plus as I mentioned before, creating a root filesystem is one thing, installing a bootable system (with a bootloader+kernel and with appropriately partitioned storage formatted as a filesystem that gets mounted properly during system bringup) is a bit harder. Your root filesystem is _mostly_ the same for all users (modulo which sets of packages they've chosen to install), but each slightly different hardware needs slightly different install (different bootloader, different kernel config, different partition layout, different drivers to talk to the hardware...)

This is half the reason something like "buildroot" is so complicated to configure: it's got support for a bunch of different hardware boards, it builds its own cross compilers from source... Often what you want to do is build a system for QEMU and run it under the emulator, and THEN try to tackle your actual hardware once you've succeeded getting the emulator to boot and run the stuff you build.

Regression testing hundreds or even thousands of physical boards is _not_ easy, only the really big distros can reasonably try. Most of the rest either fork an existing distro and regularly resync with upstream, go "we only care about vanilla PC hardware with these 3 types of graphics card", or have a certain amount of Caveat Emptor in the system bringup part (here's a bunch of control knobs, have fun, let us know if you get it working and we'll add yet another defconfig that worked for somebody 5 years ago... I.E. buildroot).

Rob


December 21, 2023

I haven't been doing much programming this week, but I have been keeping up with email, and somebody emailed me the question "What is the main difference between Yocto Project vs Linux From Scratch when building our own Linux Distribution" and since I typed my usual long reply I might as well share it here:

They're completely different kinds of projects.

Linux From Scratch is basically a book. It describes how to build packages, but expects the user to type each line of commands into the terminal themselves, to produce a new system by hand. The goal is to understand what all the pieces are and what they do.

Yocto is a large bureaucracy put together by the Linux Foundation, a 501c6 trade association (the same kind of legal entity as the Tobacco Institute) on behalf of multiple Fortune 500 companies. Its deeply layered design is reminiscent of filling out forms in triplicate, and its design goal is to hide from the user what is actually going on so you have to hire consultants to do it for you.

One compromise between the two is the "buildroot" project, which is designed around menuconfig (the same configuration the Linux kernel and busybox use) to select which packages you'd like to include and what system you'd like to configure output for. It's reasonably automated, but if you know how the parts work you don't have to fight the automation to figure out how to make changes to the parts.

If you're using an architecture that's already supported by debian (which most are), and don't have particularly tight space constraints, "debootstrap" is probably your easiest option for putting together a new system. (It's a pity they don't have better musl-libc support yet.) If you'd like to avoid systemd, the devuan repositories do that but are otherwise vanilla debian.

Embedded system bringup generally comes in 3 parts: getting your kernel up and running, putting together the root filesystem you'd like to use, and adding your application that the embedded system will run. The ability to separate those three and do one at a time makes things much easier.

Kernel bringup:

Root filesystem bringup:

Install your application and get it to work:

If you want to throw money at the problem, Yocto is designed to soak up as much money as possible. I've never been entirely sure how that helps, but it does make you strongly dependent on yocto. 90% of what you learn about yocto doesn't translate elsewhere, it's about managing the layers and processes of yocto, not Linux. People migrate between ubuntu and red hat all the time, but if you join the yocto ecosystem it's very sticky.

Rob

P.S. I wrote my own tiny system builder in 300 lines of bash, but _all_ it does is boot linux to a shell prompt under qemu for a dozen architectures.


December 20, 2023

Fade and Adverb arrived back in Austin, and I was out of commission with an eye issue.

My left eye was being REALLY WEIRD (spot in the middle as if I'd stared at a bright light but I hadn't and it lasted hours), and then it developed a hemispherical sort of _crack_ all around the left side, and my hypochondriac streak went "retinal detachment!", and I basically slept all day hoping it would fix itself. It eventually resolved into a REALLY BIG new floater of the long twisty line type, which was apparently stuck to the back of my eye the first day and a half (so not moving). Still disturbing and annoying and I don't think I'm out of the woods yet (where did it come from and why?) but not "there's a crack in my vision" levels. I've got plenty of floaters already, and the last one similar to this to show up was in... I was working that contract at Dell, so 2004.

I'd still go see a medical professional about it if I wasn't in Texas, where I'm not convinced there still ARE any. Setting foot in St. David's Of the Bloody Eucharist is just asking for a $20k charge due to the receptionist being out of network. Catholic hospitals especially seem just gratuitously exploitative. Why religion is allowed in medicine, I couldn't tell you.

*shrug* It's christmas. If I can't take a few days off over the holidays with family visiting while having a medical issue...


December 17, 2023

Going through tabs, closing windows, cleaning stuff up...

I had updates to findglobals in a window, and I guess I should just add them to the script and check it in, but there are some judgement calls.

The changes are basically a wrapper command line I came up with that parsed the data a bit and reorganized it so it was easier for me to read, and if I prefer it maybe other people will? The new output looks like:

D 8	toybox_version
B 80	toys
B 4096	libbuf
B 4096	toybuf
D 7648	toy_list
B 8232	this

Which has decimal sizes, sorts by size, with the type indicator at the left edge. Before it was just picking specific lines out of the nm --size-sort output which meant the first field was 16 hex digits, with a lot of leading zeroes and then the sizes were in hexadecimal so 2028-1de0 is how many bytes? I do way too much of this stuff and still have to stop and convert. The new output has the type on the left so I can trivially pipe it through sort to get that grouped, but is otherwise symbols sorted by size.

But to _get_ that output, it's doing a "grep -v GLIBC" which is both right and wrong. That crap should not be in the output, but it also shouldn't EXIST. I can't fix gnu/dammit anything, just mitigate it at best, and the caller can easily pipe to filter it out themselves... but a thing the user pretty much has to do every time is probably something the script should do.

But there's more: ASAN adds another half-dozen symbols (only SOME of which have "asan" in them), and it also pulls in syslog plumbing? It adds "prioritynames" and "facilitynames" which toybox only references in toys/pending/syslogd.c (which is _not_ enabled in that build), and that's sucking them out of libc as globals. Why would ASAN be outputting stuff to syslog? No idea. I'd rather it didn't.

If I statically link against musl-libc (which is in theory the sane one), the result is 55 lines of symbols, with things like __libc and __stdout_FILE and __malloc_context and _abort_lock and who knows what else. Similarly static linking against bionic (the only kind I can run on my host) has 24 symbols, from __android_fdtrack_globally_disabled to _ZTV18ContextsSerialized (which is probably the result of c++ name mangling). I'm not having my script even try to filter those out.

I think filtering GLIBC is probably correct. Filtering the asan symbols doesn't remove prioritynames/facilitynames so I might as well not bother.

Another issue: my comment in findglobals was "We should have this, toy_list, toybuf, and toys." which is 2 short: there's libbuf now (which is ok) and toybox_version which... really shouldn't be a global? It should be static, or possibly even a string constant in the rodata section. (So hey, running the script found a thing I should maybe address!)

Except that toys/*/help.c uses it to produce the HTML header output for help -h. Otherwise the only user is either the --version option or show_help(), both of which are in main.c. Which does NOT know about any command's -h option: main.c works under the "lib/" rules of nothing command-specific being in there. (It's at the top level as the obvious "start reading here" entry point to the code. I'm kinda tempted to have xexit() and friends be in there but it's long enough as it is.)

The compromise I came up with is the entry code is in main.c, with an unfortunately deep dependency stack ala toy_find() and toy_singleinit() and toy_exec_which() and so on, so people reading the code have an obvious place to start (here's the main() function) and can trace their way _into_ commands. "When I run sed, how does it call sed_main()" should be easy to answer. But the _exit_ code is in "lib/", which in _theory_ is only used by certain commands... well, xexit() always happens, which _can_ call the toys.xexit list populated by sigatexit() but _usually_ doesn't have to, and which _can_ siglongjmp(*toys.rebound) but usually doesn't... and you basically have to be looking at "true" to not have an error_exit(), which calls perror_msg(), which calls verror_msg() with the funky varargs traversal printf-alikes do...

So I'm hiding a lot of the shutdown code that USUALLY doesn't really matter. Conceptually, you exit() and maybe fprintf(stderr, "bad thing happened"); on the way out. And I reimplemented atexit() because the libc one doesn't let you remove entries or call the accumulated functions _without_ exiting. And the shell can call certain builtin commands and then get control back afterwards even if the command hit an error and didn't return normally...
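
(A minimal sketch of the idea, NOT the actual toybox code: a list of handlers you can also run without exiting:)

  struct exit_callback { struct exit_callback *next; void (*func)(void); };
  static struct exit_callback *exit_list;

  void add_exit_callback(struct exit_callback *new, void (*func)(void))
  {
    new->func = func;
    new->next = exit_list;
    exit_list = new;
  }

  void run_exit_callbacks(void)  // callable WITHOUT exiting, unlike libc's
  {
    while (exit_list) {
      exit_list->func();
      exit_list = exit_list->next;
    }
  }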

Not stuff I want to confront people with right when they start to read the code.

The other big thing that's not in main.c is lib/args.c (the argument parsing logic) which is its own can of worms. But TECHNICALLY not used by all commands either, even though it's invoked from main (toy_singleinit() calls get_optflags()). But that's a 500 line digression, and a fairly densely packed one at that. (Option parsing turns out to be a surprisingly hard problem, that's why libc's getopt.h has getopt(3) and getopt_long() and so on. I reimplemented it and don't use that libc plumbing either: I had my reasons but explaining them is another can of worms.)

*shrug* We all draw our lines somewhere...

Anyway, trying to figure out a clean way to move that one user of toybox_version back into main.c so it can be a static variable, possibly even const. (Maybe _then_ the compiler could figure out it should be a string constant and thus rodata. I could have the macro expand to the "string" but I don't trust the compiler not to create multiple instances of it, and somehow some way it would wind up with version skew where different versions in different files had different strings on some platform due to a weird build dependency. Single point of truth: if there only IS one, it can't get out of sync with itself.)


December 16, 2023

I'm _trying_ to record the partial todo items in the open tabs instead of get distracted doing them, but I wound up spending far too many hours trying to coherently explain mkroot, which started life as the external mkroot project's README and took a lot of hammering to get even half a README for the mkroot/ subdirectory under toybox. (More like 2/3 than half, but nowhere near complete yet either.)

There is SO much backstory, and it's all circular dependencies so trying to figure out what order to put it in that has the least handwaving is HARD. It's not done, and I'm not happy with it, but... it's a stopping point, at least.


December 15, 2023

The tutoring area on the third floor of Jester center is deathly quiet, and all the chairs are up on the tables. I think the dorms have closed for the holidays? Yup, academic calendar says the last day was the 12th. The building's still unlocked, and nobody's actually minded me going and sitting quietly in a corner with a laptop. (If they do I'm happy to go sit at the outside table, but the raccoons were getting a bit insistent the last couple times I was there and I thought I'd give them a chance to get less used to me. If I didn't have to worry about rabies I'd treat them as another form of feral cat, but at least pre-pandemic Austin was a hotspot.)

Closing more tabs, lots of little things to check in...


December 14, 2023

Fade has defended her dissertation, defeated the snake, and is now officially Dr. Fade.

Hey, I got a reply from the openrisc guy... and it looks like I should start over on a different board emulation. The "virt" board knows how to shut down, the "sim" board does not.

(I worked around the bug where feeding the sim board an external initrd.gz file hung and never gave me boot messages by statically linking in the initramfs. I had the plumbing from the Turtle board. It's not ideal, but maybe switching to the virt board would fix that.)

Let's see if they can fix or work around the "sim" problems...

Checked in the or1k toolchain build (including the bugfix for musl-cross-make to install the kernel headers), and also added the second stage host toolchain as a build dependency when you specify toolchain(s) to build on the command line. For reasons I tried to explain but boil down to "gcc starts with gnu, and thus sucks".


December 13, 2023

Closing more tabs with shell tests in. Why does cat<<\EOF count as quoting the HERE document's terminator so variables in the body do not expand, but doing a backslash newline line continuation in the EOF does NOT count as quoting the terminator and variables still get expanded?

I'm not convinced the logic is consistent here.

The expr logic boils down to "convert everything back to a string each time, and then atol() it again when doing the next operation". Because "\( 111 + 222 \) : 33" produces 2: the sum is a string, so the : operator regex matches "33" against "333" and returns the match length. It doesn't matter where the string came from. So I don't need this weird struct with a string member and a long member, I just need a char *, which means I can probably cut and paste the $(( )) logic out of sh.c and do a string version with fewer operators so the logic is, if not shared, at least _consistent_ between commands.
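
Something shaped like this, just a sketch of the string-centric approach (not actual code from the plan):

  #include <stdio.h>
  #include <stdlib.h>

  // each operation atol()s its inputs and converts the result right back
  // to a string, because the next operation might treat it as one
  char *expr_add(char *a, char *b)
  {
    char *result = malloc(24);

    sprintf(result, "%ld", atol(a)+atol(b));

    return result;
  }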

The scripts/install.sh plumbing that sets up the build airlock has PENDING="expr git tr bash sh gzip" and also "awk bison flex make", the first group being the ones toybox has in pending that aren't ready yet, and the second being the group that aren't even started in pending. Of the left hand group, expr, tr, and gzip are low hanging fruit to me, sh I'm working on but it's big, and then the four on the right are just a lot of work. (Half of which is me learning how to use each command in a lot more depth than I know now, but what else is new?)

I keep glancing at expr and tr and trying to knock 'em out, and I think I finally have a _plan_ for expr. (And I've had a plan for tr for a bit now but it's a heavy-ish lift; doing proper utf-8 support. It should have an ascii fast path which is just a 127 byte -> 127 byte translation table even WITH the weird [:upper:] escapes, and then a second high ascii conversion function that's basically going to traverse the from and to strings for each character to figure out what to do with it, and no I'm not really trying to get [=CHAR=] right because that one's a nightmare.)
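
(The fast path sketch, assuming a prebuilt translation table:)

  // ascii fast path: plain table lookup for 7 bit characters, anything
  // with the high bit set (i.e. utf-8 sequences) falls through to the
  // slow path
  void tr_ascii(char *s, char map[128])
  {
    for (; *s; s++) if (!(*s & 0x80)) *s = map[*s & 127];
  }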

But not this release for either of those. CLOSING TABS. Ahem. (So many distractions just CATALOGING all this nonsense. I left a desk piled with post-it notes and I have to translate them into a log book before I can sweep the thing clean and give it a good scrubbing which it desperately needs...)


December 12, 2023

Oh hey, there is an archive of the openrisc list! The UI is terrible, but it exists. Alas, no replies yet...

Closing more tabs and writing them to todo.txt. Stuff is coming in (I need to add a second tar sparse extract format because mac/bsd) but I am trying to resist doing that RIGHT NOW, because I need to reboot, upgrade memory, reinstall, and cut a toybox release with new toolchains (including new target support) and kernel version (with rebased patches). Not necessarily in that order...

Getting reminders that healthcare.gov open enrollment ends the 15th. I believe Fade's health insurance I've been on runs through the end of January, and I do NOT want to try to get something through the Texas exchange because the reich-wing loons actively sabotaged it, which I've explained and professional authors have explained more eloquently... (The tl;dr is Reagan passed EMTALA in 1986 requiring hospitals to treat anybody who showed up at the emergency room, because pushing them out the door to die on the sidewalk was bad optics, and Obamacare rerouted the subsidies to pay for that through medicaid (same money arriving through a different channel), and any state that turned down the medicaid expansion didn't get the old subsidies back. So any state that "blocked obamacare" suddenly had hospitals going predictably bankrupt left and right, because duh.) Which is why you CANNOT get good health care in Texas. The health care system here had already collapsed even BEFORE covid burned out what was left. There are strip mall clinics where a nurse can get you a prescription for anything your grocery store pharmacy sells, and some end-of-life care for Boomers with Medicare. Everything else is terrible, long waiting lines, and random large charge city. (Yes, that's the hospital I walk past going to/from the table at night. My "local" one. It's notorious for that, but its investors are happy to pay to expand it...)

So I would like to be on my wife's health insurance in Minnesota. It's cheaper to fly there for care than to get any here, and the quality is SO MUCH BETTER. Alas, health care is tied to employment, so when she graduates she has to change providers, and it's all up in the air at the moment...


December 11, 2023

One problem with being on a night schedule is I get out of the shower and notice my hair is ridiculous (I don't see a mirror often otherwise)... and the Great Clips in Hancock closed at 7pm. And this is at least the third time I've done that this week. Oh well, maybe tomorrow...

Still banging on or1k kernel config. Going back to the defconfig, I switched off several obvious symbols in menuconfig and made sure that gave me serial output. If I have output I can stick printk() calls into the kernel to figure out why it's doing stuff, it's the "no output at all" hump that's hard to get over because I dunno what's wrong and have to guess to find out; yes, I could whip up a register-bang loop for early serial output, but nobody seems to DOCUMENT that for any given board, and it's always a pain to find.

Aside: one of my big TODO items for mkroot is doing a "hello world" kernel like we did for the turtle board for each supported qemu target, which is especially nice for the targets that support vmlinux because it's just the _start entry and a couple extra command line options. I wrestled with that a couple years back, and a quick google finds recipes for several other architectures which saves me digging through the kernel's serial drivers and early printk support to try to find port values and bitmasks, although these days you can presumably scrape some of that out of device trees? Maybe?

Anyway, the point of having a zillion hello world kernels lying around would be precisely so I could cut and paste the two line for(;;) loop to bitbash a string constant out to the serial port, and thus stick a printf-equivalent into the earliest boot of each kernel, up to and including right at the start before the actual kernel code runs anything. Then work my way down to where it stopped making progress, and figure out why. (A dumb little for loop to convert an int to a char[16] buffer on the stack and print _that_ out as a string is a standard part of this toolkit.)
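
For the record, the toolkit in question is about this big. It's a sketch, not any real board's values: the 0x90000000 base address is made up, and the register layout assumes a byte-mapped 8250-style UART (LSR at offset 5, bit 0x20 meaning "transmit holding register empty"), which is exactly the part you have to dig out of the data sheet per board:

#define UART ((volatile unsigned char *)0x90000000) /* hypothetical base */

static void putch(char c)
{
  while (!(UART[5] & 0x20)); /* spin until the transmitter is empty */
  UART[0] = c;               /* write the byte to the transmit register */
}

static void putstr(const char *s)
{
  while (*s) putch(*s++);
}

/* the dumb little int-to-string loop */
static void putnum(unsigned long n)
{
  char buf[16];
  int i = 16;

  do buf[--i] = '0' + n%10; while (n /= 10);
  while (i < 16) putch(buf[i++]);
}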

There's really only two kinds of debugging: 1) I have a deterministically reproducible thread of execution, which I can restart at will, and at some point it diverges from my expectations. 2) A strange interrupt from mars happens when the stars align and the first sign of trouble is flying shrapnel, but it's happened more than once so PROBABLY wasn't a cosmic ray. (Or maybe I just have really, really, really good log data and I can't reproduce it but can still try to do forensics on what I've got?)

You can printf your way through pretty much anything in the first category (although your output may be flashing an LED or beeping a speaker, which you can "clever hans" to answer most questions if you try). Debugging the second category is either trying to turn it INTO the first category (disable threads, etc), or guess what happened via the Sherlock Holmes/House M.D. method (his sidekick got renamed from Watson to Wilson to be slightly less obvious) and then prove it somehow. (Mentally model the system and figure out what COULD cause that, and then try to add meteor catchers.)

The point is, if I can A) reproduce the issue at will, and B) stick printfs into it, the rest is usually just work.

It's nice when you have sprintf() and friends to make your bare metal output loop more flexible, but previous times I've done it include "inside the uclibc dynamic loader for Hexagon" where I couldn't use global variables or call functions because it hadn't dynamically linked _itself_ yet (they were all symbols in the Procedure Linkage Table and Global Offset Table which hadn't been traversed and fixed up yet, that's what I was trying to debug; turned out to be REL vs RELA confusion), and I did it in u-boot on an arm board where the u-boot image initially ran out of SRAM and then relocated itself to DRAM after initializing the DRAM controller, and the linked addresses were all for the post-DRAM relocation so to access any string constants BEFORE that I had to work out the appropriate value to subtract from them, ala #define THINGY 0x12345678 and then output("potato"-THINGY); a lot. Fun times. Or I could do char blah[8]; blah[0]='p'; blah[1]='o'; blah[2]='t'... if I hadn't worked out the value for THINGY yet. (Yes you CAN search memory for a known string when the MMU isn't set up yet so everything will just return nonsense instead of faulting on bad address reads, and then just see if that works as the offset. I prefer to work it out from the data sheet or bootloader source so I know WHY it works and what code changes the relationship should survive, but data sheets lie and bootloader source from vendors is often beyond messy.)

Disabling the optimizer is sometimes useful in this sort of thing, because if "where the program thinks the rodata lives is currently garbage memory", then if you go (char blah[]){'p','o','t','a','t','o',0} and THINK it's equivalent to the above set of assignments to stack memory, but the optimizer instead goes "I'll just save that as an rodata block and insert a memcpy to move it to the stack", it won't WORK. (Sometimes you have to hit the compiler with a rock until it does what you SAID to do. Don't give me guff about "undefined behavior", I will hurt you in ways that don't heal.)

Also, on real hardware this assumes the bootloader already set the serial port to the correct speed (QEMU doesn't care, and only some USB serial devices do). You need a bit of initialization otherwise: stuff values into a couple registers (crystal divisor) and start the clock going (register write setting an enable bit). Which brings your two line loop (writing bytes to a register and checking a status bit) up to 6 or 8 line territory, but only the first instance needs the extra. It _should_ stick until changed.
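
For the hypothetical 8250 sketched above, that one-time setup looks roughly like so (reusing the UART pointer from the earlier sketch; divisor 1 assumes the standard PC 1.8432 MHz UART clock giving 115200 baud, a real board's divisor depends on its actual crystal):

static void uart_init(void)
{
  UART[3] = 0x80; /* LCR: set DLAB to expose the divisor latch */
  UART[0] = 1;    /* divisor low byte */
  UART[1] = 0;    /* divisor high byte */
  UART[3] = 0x03; /* LCR: clear DLAB, 8 data bits, no parity, 1 stop bit */
}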

Anyway: or1k kernel config. So I fed the one I'd lightly simplified in menuconfig (but which still gives me output) to miniconfig.sh, which resulted in 57 lines, and then "cp mini.conf mini.cooper" so I can compare what I'm doing now to what I had before, and start stripping lines DOWN from "known working" by commenting out one or two I think I don't need each time, building to make sure I still get serial output, and then next time I can delete the ones I've commented out and comment out a couple more. Slower, but I can see per-symbol whether or not I still get boot messages output on serial. (Which is ALL I care about at the moment: it speaks or it does not speak.) The compile/test cycle looks like:

$ make CROSS_COMPILE=or1k-linux-musl- ARCH=openrisc allnoconfig KCONFIG_ALLCONFIG=mini.cooper vmlinux -j $(nproc)
$ timeout 3 qemu-system-or1k -nographic -kernel vmlinux

I _do_ need OPENRISC_BUILTIN_DTB=or1ksim, whatever default it's using (virt_defconfig doesn't set it) does not produce output in qemu. I also need SERIAL_8250, SERIAL_8250_CONSOLE, and SERIAL_OF_PLATFORM.

I do NOT need any of the OPENRISC_HAVE_INST_* -- maybe they're performance things but they don't change the boot messages. I don't seem to need JUMP_UPON_UNHANDLED_EXCEPTION either. Adding PANIC_TIMEOUT=1 makes it TRY to reboot (and say it failed), but adding POWER_RESET, POWER_RESET_SYSCON, POWER_RESET_SYSCON_POWEROFF, and SYSCON_REBOOT_MODE (all from virt_defconfig) accomplishes nothing obvious. If QEMU implements a software reboot or poweroff trigger, the kernel doesn't seem to know how to poke it.

So really, the kernel mkroot was building SHOULD have worked, why didn't...

Ah. Figured it out. The same vmlinux binary produces output if I do NOT tell qemu "-initrd file.gz", but if I feed any of them a gz it hangs. I was thinking maybe the memory layout only reserved a little space for the initrd file and segments were stomping each other? But it doesn't matter what's IN the gzip file, if I feed it an EMPTY gzip file (ala ":|gzip > file.gz" produces a 20 byte archive) it still hangs. But if I "dd if=/dev/zero of=file.gz bs=1k count=2048" up a 2 megabyte empty file and feed THAT in, it does NOT hang, so it's not a question of running out of room. Which means... what? It's crashing in the type detection logic for external initrd files? The extractor is writing data to the wrong place? This should be generic plumbing, but the memory layout is arch-specific, which is why I initially suspected it...

There's no CONFIG_EARLY_PRINTK driver here so it's buffering the printk output and writing it out once it can finally enable drivers that require interrupts, which is shortly before launching init, so if it crashes earlier than that I may not get any output at all. That's why EARLY_PRINTK got invented, and their config didn't enable it...

This is weird kernel (or qemu?) bug territory, I should ask openrisc people questions now. (How did this work for you but not me?)


December 10, 2023

Trying to add or1k (I.E. openrisc) to mkroot. It looks like there's a reasonable defconfig (arch/openrisc/configs/or1ksim_defconfig), so digesting that down to a miniconfig using my old script looks something like:

$ cd ~/linux
$ make ARCH=openrisc CROSS_COMPILE=or1k-linux-musl- distclean or1ksim_defconfig
$ make ARCH=openrisc menuconfig # zap CONFIG_EXPERT and CONFIG_BLK_DEV_INITRD
$ cp .config walrus
$ ARCH=openrisc ~/aboriginal/more/miniconfig.sh walrus
$ GSYM="$(grep -o 'BINFMT_ELF,[^ ]*' ~/toybox/mkroot/mkroot.sh | sed 's/,/|/g')"
$ egrep -v "^CONFIG_($GSYM)=y" mini.config | less

Make the .config, tweak it a bit to remove some obvious overgrowth, shrink it with the script (which has to be run in the kernel dir and operates on a filename _other_ than the .config it repeatedly replaces as it runs tests to see if each line is actually necessary), then get a list of global symbols enabled for all architectures by mkroot, and filter those out of the miniconfig we just made.

The result is 70 symbols enabled in this architecture. Then I fire up a window with "make ARCH=openrisc menuconfig" in the kernel source directory so I can use forward slash to look them up and read their help text. I have to search for the name, find what menu they're in, and then navigate to them in that menu to see their help text, because the search DOES NOT LET YOU SEE THAT for no obvious reason. Gratuitous busywork due to terrible UI, because no young hobbyists who might want to improve that sort of thing have joined linux-kernel development in decades, it's all brittle old farts ensconced in layers of bureaucracy. Over a decade ago I made a web version that let you navigate to a symbol (it's repeated per-arch because each architecture has a different menu tree) but I lost access to the website I was uploading it to in the great kernel.org breakin of 2011 (they locked the barn door HARD after the horses had escaped), and the python 2.x script that produced it has presumably bit-rotted because the kconfig language is Turing complete now and can rm -rf your home directory. (The backstory is that I used to maintain kernel.org/doc before linux-kernel's collective head traveled too far up their collective ass to hear outsiders shouting.) Still, maybe I should try to dig that up and update my local version of the page? It would have been personally useful here. Sigh, throw it on the todo heap...

Anyway: 70 symbols, menuconfig open, let's do this.

Spoilers: I think I need OPENRISC_BUILTIN_DTB="or1ksim", ETHOC, SERIAL_8250 and SERIAL_8250_CONSOLE. I MIGHT also need OPENRISC_HAVE_INST_{FF1,FL1,MUL_DIV}, JUMP_UPON_UNHANDLED_EXCEPTION, SERIAL_OF_PLATFORM, and maybe IOMMU_SUPPORT? But I got there by process of elimination.

I'm familiar with and can thus yank a bunch of symbols: LOCALVERSION_AUTO, LOG_BUF_SHIFT, UTS_NS, PID_NS, HZ_100... They enabled MODULES but have no =m entries selected. Similar global decisions that aren't board-specific include enabling SWAP, SLAB_MERGE_DEFAULT, COMPACTION, TCP_CONG_ADVANCED, STANDALONE, LEGACY_TIOCSTI (really? why?), LDISC_AUTOLOAD (ditto)...

Why is CROSS_MEMORY_ATTACH configurable even with CONFIG_EXPERT switched off? (Why even HAVE that symbol outside the expert menu?) And INITRAMFS_PRESERVE_MTIME is also new, and dumb. Why would you ever NOT do that? The help text for the commit that added it is so FSCKING stupid. With LITERALLY A MILLION DIRECTORIES (in initramfs!) you save less than a second. Again: if this config option should even exist (unlikely) put it in the expert menu.

ETHOC is OpenCores 10/100 ethernet mac... but why the hell is there something called PHYLIB that's separately configurable, and a need to care about MICREL_PHY? The need to select the menu symbol so you make its contents visible is a (sad, decades-old) limitation of kconfig I never personally got around to fixing and thus nobody else did either because no hobbyists remain to fix issues in Linux that aren't directly tied to their paychecks. But "this driver needs this hardware to work" is what config dependencies EXIST FOR, isn't it?

The OPENRISC_HAVE_INST_BLAH stuff is just sad, along with OPENRISC_NO_SPR_SR_DSX. This architecture is not cooked yet. Nobody on x86 or arm needs that kind of micromanagement nonsense, you set up your toolchain and then maybe #ifdef predefined declarations out of the ":| cc -E -dM -" list if there's significant variation in -m types. (Colon is a synonym for "true", :| is faster to type than < /dev/null when you want immediate EOF on stdin.) Sigh, we had lunch with the openrisc guy in Tokyo once, I didn't know enough about his architecture to mention any of this stuff but it could use some serious cleanup...

JUMP_UPON_UNHANDLED_EXCEPTION is similarly nonsense. Pick a behavior already. Don't ask ME to select it. (I lean towards fewer symbols.)

Strangely, BLOCK_LEGACY_AUTOLOAD is not about preventing legacy firmware from autoloading, it's a legacy autoload feature in the block layer which has no business being in modern systems. It is selected by BLK_DEV_MD which is just INSANE. I'm a little confused why BLK_DEV_MD is switched on in this .config (multi-disk, I.E. RAID support), but... let's just assume I can yank this? The MQ_IOSCHED_* stuff also seems nuts to have on a deeply embedded board in the first place. (It all goes into QEMU which feeds off to a host kernel with its own I/O scheduler... but again, from a mkroot perspective these are global config options not the province of any one architecture.)

I don't need INPUT_KEYBOARD and KEYBOARD_ATKBD, nor INPUT_MOUSE and MOUSE_PS2, if I'm just using serial console. Similarly, the array of CONFIG_HID* symbols (human interface device: mostly USB keyboards, mice, joysticks, and tablets).

SERIO_SERPORT is "serial port line discipline" and I really hope nobody needs that anymore. It's way far away from SERIAL_8250 and SERIAL_8250_CONSOLE in the menu, which are two symbols I presumably do need (although why there's TWO is still... funky; the console should be able to wrap an arbitrary character device without weird glue code). And under the 8250 stuff: SERIAL_8250_DEPRECATED_OPTIONS is crazy to select, and did they honestly implement a weird historical SERIAL_8250_16550A_VARIANTS that needs extra probing (why would you do that?), and WHY is SERIAL_OF_PLATFORM a separate config option? If you enable device tree (which long ago emerged from the primordial sparc/powerpc "open firmware" sort-of-consortium thing that died ages ago) and you have this driver, then device tree may initialize such devices. Why is there a CONFIG OPTION to plug together two things you ALREADY SELECTED?

DEVMEM is another legacy thing (and a global decision), PTP_1588_CLOCK is Precision Time Protocol which we do not need (think NTP over ethernet LAN, used in compute clusters that _really_ care about the nodes having closely and constantly synchronized clocks at a sub-millisecond level despite local thermal variations), GPIO_CDEV_V1 is another legacy thing and somewhat deep in the weeds at the best of times? There are VIRTIO and VHOST menus open, but nothing selected under them?

Did they really implement IOMMU_SUPPORT? That seems somewhat... elaborate for this hand-rolled hardware. No, it doesn't look like they did. How did this symbol get enabled? I mean, make ARCH=openrisc or1ksim_defconfig and then grep IOMMU_SUPPORT .config and it's =y, but it's not IN the defconfig file. I guess they just didn't explicitly turn it OFF?

More global options: INOTIFY_USER, NFS_*, XZ_DEC (shouldn't selecting xz in initramfs cover it? Why does it need the library support manually selected as well? Can you decompress xz by piping it through /proc or something? I thought decompressing modules was userspace's job and any crypto signing was on the decompressed one inside the wrapper?) SYMBOLIC_ERRNAME, DEBUG_KERNEL, DEBUG_MISC, SECTION_MISMATCH, FRAME_POINTER, FTRACE, and RUNTIME_TESTING_MENU are also all global, not board-specific.

Ok, enabling _just_ KCONF=OPENRISC_BUILTIN_DTB=\"or1ksim\", ETHOC, SERIAL_8250, SERIAL_8250_CONSOLE produced no output. And adding the SERIAL_OF thingy didn't help. Hmmm... Back to building or1ksim_defconfig and yes that still works. (Although if I supply ARCH but not CROSS_COMPILE when running menuconfig, the actual build prompts me for config symbols, which is FSCKING stupid, linux-kernel clique. If you probe it at compile time, you don't have to write it into the .config file. You can just probe it AGAIN next time!)

Alright, we do this the tedious way...


December 8, 2023

I'm redoing the old standalone mkroot project's README into one I can put in toybox's mkroot directory, and in addition to being unable to name mkroot/packages (I've got the same paragraph referring to files in there as a module, a script, and a package), I'd very much LIKE to have a simple example that downloads a tarball, cross compiles it, installs it, and cleans up after itself without a lot of extraneous sidequests.

I have dropbear, which is a fairly COMPLETE example... but overcomplicated. It grabs multiple packages (zlib and dropbear), builds them in an overlapping way via whack-a-mole overrides to make cross compiling work (specify CC= with the prefix but pass --enable-static to configure) and I still don't think I've got it handling dynamic linking properly...

I have busybox, which is... political. I don't want to use it as an example _and_ I hit its increasingly hardwired dependency on bzip2 compression with a rock in a way that broke the help text. And it hasn't got install_flat so I improvised: not clean.

I glanced through the Linux From Scratch tarballs but they break down into "things toybox should have built-in", "things like glibc or gcc that are both horrific and inappropriate to try to build in mkroot", giant packages like python and perl that I don't want to even think about the dependency chains of, or... man-pages and tzdata aren't really a source package you compile, are they? Nothing's really jumping out at me from there as a good example. (I'm eyeing xz, but it's autoconfified and uses libtool, and I dread trying to get it to run under the airlock.)

I rooted around a bit and found unfs3-0.9.22.tar.gz which is the old standalone nfsv3 server (I.E. a userspace implementation which uses TCP/IP sockets instead of the old UDP nonsense, basically the least bad NFS), but building it against musl fails because rpc/rpc.h doesn't exist in musl. (Because nothing other than NFS used Sun's insane Remote Procedure Call plumbing for ANYTHING, so having it in libc was NUTS.)

It's like staring at an open fridge and having no idea what I want to eat. I know there's a zillion packages out there that would be perfect for this, but none of them is coming to mind, usually because of dependencies. (Heck, dropbear had one, that's why it's building 2 packages...) Most real world builds AREN'T clean. Hmmm...


December 7, 2023

Finally had the phone call with Louis Rossman, except it wasn't with him, it was with... his boss? The guy behind the new project, who I think is also sponsoring the right to repair lobbying. (Nice guy, I did not write down his name and am terrible with names.)

Interesting project, not sure how much of the information they talked about is public (it didn't come up), but on the call I named 3 previous attempts at doing basically the same thing and explained what happened to each of them, thus leaving the hole they're still trying to fill. It's a "wouldn't it be great if"... that people have tried before. It would be lovely if they succeeded, but it's hard for me to get excited.

Then again, Linus wrote "yet another unix-like kernel" and it turned into Linux, so I can't say this WON'T be "the one". Just... most people who write a kernel do NOT wind up with a worldwide community organized around it. And as an amateur computer historian, I've tried to understand (at least in retrospect) how and why the things that became successful did so, and I'm not smelling the competitive advantage here? (Beyond maybe "properly and consistently funded", which puts it ahead of J-core...)

In the case of Linux, it took off by inheriting the existing minix community which Andrew Christmastree Tanenbaum did not want to serve. The author of Minix never took external patches, because of his publishing contract, because he wanted a teaching tool not a real load-bearing OS (once students passed his class they were no longer his problem), and because it was never a collaborative project to begin with but a work of single authorship. The comp.os.minix users were almost all running modified versions of minix, applying patches maintained by the community to port it from 16 to 32 bits and add features like virtual terminals and buffered I/O. Linux jumped from 0.12 to 0.95 basically overnight because shortly after Linus posted his own kernel, some of the minix add-on patch maintainers ported their patches to Linux, and Linus merged them into the next release causing their maintainers (and users) to switch allegiance and rapidly absorbing the accumulated backlog of third-party development work into the new kernel. That was the social environment within which Linux grew and spread, consuming an existing community by providing an outlet/platform for a large backlog of work they'd already done. Lightning could strike because it had been raining a long time and the clouds were charged up.

Minix itself had an opening because AT&T had strangled the original Unix community in the wake of the Apple vs Franklin decision extending copyright to cover binaries, so the first 15 years of "open source due to benevolent neglect" ended abruptly, with the various commercial sublicensors squabbling over the fragments (the "unix wars" and the BSDi lawsuit). Tanenbaum wrote a new implementation from scratch with no legacy legal entanglement to get AWAY from that (replacing the now-illegal Lions Book)... and sold it bundled with his own new textbook. It was not freely redistributable, it cost $69 in 1987 money.

Unix took off in 1974 by being described in detail in an article in Communications of the ACM (a prominent journal at the time), among other things being the first OS written in a higher level language than assembly, so porting it to new hardware was not a complete rewrite. (I.E. this can theoretically run on YOUR computer, whatever it is.) This publication led a bunch of people to beg AT&T for copies, which AT&T was required to provide due to the 1956 antitrust consent decree.

Unix was the first portable operating system, a better mousetrap in lots of ways (hierarchical filesystem, unstructured files, pipe composition, the command interpreter's a regular userspace program...), and the people who funded its creation were legally prohibited from hoarding it (until AT&T allowed itself to be broken up in 1984 to escape the consent decree). Linux was first-ish mover in a market vacuum created by capitalism destroying the original unix community, which incubated in a group of hundreds of skilled professionals desperate for what it provided.

Lightning actually struck Linux THREE TIMES in its first decade: the minix crowd got it to 1.0 rapidly, but the real growth driver from 1.0 to 2.0 was the way adding Apache to Linux could turn any old 386 in a closet into a web server back in 1993, when the NSF changed their AUP (dropping the restrictions on who could connect to the federally subsidized backbone), and the web EXPLODED in popularity as for-profit internet service providers offered dialup access to all comers (most infamously America Online), but nobody had budgets for new hardware for something as strange and unexpected as a "web server" and rather than argue with management they put linux+apache on 5 year old PCs fished out of the trash. And then in 1998 Netscape's source code release was explicitly inspired by a 1997 talk comparing Linux development to other Unix variants (Gnu Hurd, BSD, and conventional licensed Unixes). (The talk was given again at usenix... several months after Netscape's source was already released.) Which meant Netscape directed the "anything but microsoft" crowd it had united under Java since 1995 into Linux development, tripling the userbase in a year (212% annual growth in a widely-publicized study). At that point, Microsoft and Sun started attacking it directly, and IBM and Oracle started making money from it...

Linus didn't consciously know this when he did what he did. He was in the right place at the right time, and lightning struck. But a project that would LIKE for lightning to strike... Well, Ben Franklin flew kites in a storm. (From inside a barn which kept the last 6 feet of string dry, which were made from nonconductive silk. A detail people who copied his experiment tended to neglect, with more than one fatal result...)


December 6, 2023

I've been unhappy with yes.c being overcomplicated. I matched the 2.1 gigs/second output of debian's host version, but the code's ugly and I wanted to see if I could come up with a version that looked less like I was throwing a tantrum.

I tried to move the writev() to lib.c, some sort of writevall(toys.optargs), but coming up with a sane syntax for it turns out to be hard because optargs[] doesn't have spaces between the entries, nor a newline at the end, and by the time I've got writev(fd, argv, separator, terminator) it's getting silly and seems kinda special purpose anyway. If I'm manually editing the incoming argv[] list, I might as well just create the iovec[] by hand and save a step, which leaves very little for the libc function to do. Modulo it's not handling short writes gracefully, it expects writes to block and finish instead. Which is fine until someone suspends and resumes the pipeline, but maybe SA_RESTART handles that? Used to be a problem long ago, the kernel has moved on, I should retest at some point. But anyway, moving this to lib/ to hide the complexity and potentially share it between commands was not trivial, after several days of coming back and staring at it again, I've shelved that approach.
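
For reference, the shape of the vectored write approach is roughly this (a sketch, not toybox's actual yes.c: it hardwires the array size and punts on short writes, but shows where the speed comes from, each writev() pushing many copies of the argument list per syscall):

#include <string.h>
#include <sys/uio.h>

int main(int argc, char *argv[])
{
  struct iovec iov[1024];
  char *def[] = {"yes", "y"};
  int i, n = 0;

  if (argc < 2) argv = def, argc = 2;
  // one entry per argument, plus a space/newline separator entry
  for (i = 1; i < argc; i++) {
    iov[n].iov_base = argv[i];
    iov[n++].iov_len = strlen(argv[i]);
    iov[n].iov_base = (i == argc-1) ? "\n" : " ";
    iov[n++].iov_len = 1;
  }
  // replicate the pattern to fill the array, amortizing syscall overhead
  for (i = n; i+n <= 1024; i += n) memcpy(iov+i, iov, n*sizeof(*iov));
  for (;;) if (writev(1, iov, i) < 0) return 1;
}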

So I thought I'd bite the bullet and switch the FILE *stdout buffering modes to be "block-vs-line": if we request TOYFLAG_LINEBUF in our NEWTOY() flags toy_init() calls setlinebuf(stdout) otherwise it calls setbuffer(stdout, xmalloc(4096), 4096). (The man page seems to imply fwrite() and friends can allocate their own buffer on first read/write if you just give setbuffer a length, but if you pass a NULL as the middle argument of setbuffer() the result is immediate/unbuffered output, hence the malloc.) And then I can just use dprintf(1, blah) or similar when I want unbuffered writes (since xprintf() got nerfed).

So I edited main.c to work that way, and simplified yes.c back to about what it looked like before the rewrite, and fed that to count -l and the performance was TERRIBLE. With no arguments "yes" was doing printf("y\ny\ny\ny\ny\ny\ny\ny\n") in a loop which did about 250 megs/second (I.E. 1/10 the speed of the writev() version), and "yes one two three" was even worse doing only about 40 megs/second output on my laptop, with "yes" consuming 100% cpu according to top (and count eating a single digit percentage of another processor keeping up with it). It turns out printf(" %s", blah) is REALLY SLOW, in both glibc and musl. I have no idea how many times it's calling strlen() and malloc() and memcpy() behind the scenes, but it's consistently expensive.

So NEXT attempt, I had yes.c convert the input to one contiguous string (via two for() loops with a malloc in between) and had the output loop call fwrite() with a precalculated length (I could have used puts() but I was trying to avoid the plumbing doing strlen() inside the output loop), and THAT version still only got about 250 megs/second. Looks like the limiting factor is copying the data into and out of the FILE * buffer: doing that is just plain slow, there's always a memcpy() before each write().

I want to simplify this code, but the speed penalty is an order of magnitude, and "yes" DOES get used as a data source fed into things like "dd" so its performance could significantly impact builds.

With the current vectored write version, "yes | count" pegs a processor at 100% on each side (which is odd, because "./count -l < /dev/zero > /dev/null" maxes out at 11G/sec not 2.1, but eh...) I _mostly_ don't care about performance until somebody comes to me with a complaint, but dude: order of magnitude.

The next question is whether block buffering would address top's CPU usage at all, but I think I need to profile that and find out WHERE the cpu is being used? (I think I did that once and the sscanf() loop reading the "stat" data was surprisingly painful? But it's been forever. I can switch to a while (*a>='0' && *a<='9') x = x*10 + *a++ - '0'; or something, but there's always a bottleneck. That's how bottlenecks work: removing one makes something ELSE the bottleneck, it's just the narrowest part of the pipe.)

Trying to close down for a release and laptop upgrade, not a good rathole to go down just now, but I can move it up the todo list.


December 5, 2023

I have laryngitis. The fourth Doctor's admonition of K9 keeps running through my head: "Laryngitis! What's a robot doing with laryngitis, I mean what do you NEED it for?" (The voice actor was unavailable.)

Sigh:

$ . - <<<$'cat<<EOF';. - <<<$'potato\nEOF'
bash: -: No such file or directory
bash: -: No such file or directory

I'm trying to work out a "supply file input from a constant" syntax that does NOT run a subshell the way cat <(cat<<<$'hello world') does, and what I've _hit_ is "bash's 'source' doesn't parse - meaning stdin". (Toybox tries to be generic and have all commands behave the same way handling that sort of thing with common plumbing. So here's a tension between "toybox expectations" and "matching bash behavior".)

I am not emailing Chet Ramey. Not doing it. Bash is enough of a moving target already, and I'm closing windows in hopes of FINALLY upgrading my laptop (new debian release, putting the big memory back, maybe buy a new hard drive if I've got it in the budget) which is likely to cause BUCKETS of regressions in TEST_HOST already, which I am prepared to tackle but...

The original test in the window I was closing was to see if HERE documents could straddle source contexts. They're logically kind of like an #include in C, but they work more like a function call in bash. Can you define "local" variables in them? What impact do they have on $LINENO? How about $0 and $@ and so on? The bash man page explains SOME things, but not all, so I run experiments and then leave the tab open as a note-to-self to come back to later, and now I'm trying to marshal the test into tests/sh.test and would like a better syntax for it than writing two temporary files I then have to delete...

$ (source <(cat<<<$'cat<<EOF'); source <(cat<<<$'potato\nEOF');)
bash: warning: here-document at line 1 delimited by end-of-file (wanted `EOF')
bash: potato: command not found
bash: EOF: command not found

That's a very ugly syntax. I wonder if txpect would help here somehow? I could do CTRL-D from the command line to trigger EOF if it was just "source <(cat); source <(cat);" and then I supplied input, does $'\x4' work in txpect? Is that terrible enough black magic I don't want to put it in a test?

It's possible to produce HERE document EOF conditions that cannot be met:

$ bash -c $'cat<<EO\'\\\nEO\'\n'
bash: line 1: warning: here-document at line 1 delimited by end-of-file (wanted `EO\
EO')

Fuzzy's theory is my throat is irritated by the massive clouds of dust the construction at Hancock Center is kicking up (I walked by last night and all the walls of sears are down so you can see inside both floors, and they'd scraped the asphalt off the driveway in front of the dead sears all the way down to HEB, and this morning she says it's dust city). So I walked north to the McDonald's in the Target parking lot and hung out there with laptop for a bit. The cheapest combo is now $8, but you can still order by speaking to a human and handing over cash if you really insist. (There's one register left in one corner. They REALLY want to turn the whole building into a giant vending machine with no humans working at it, and all the burgers and fries produced by machinery... except humans are still cheaper, I guess?)


December 4, 2023

Sick. I've had a cough and irritable throat/lungs on and off since before I went to Fade's, but it's spiked a bit. Kinda hard to program.

It does make closing tabs and copying them to various todo.txt files without giving in to the temptation of trying to DO any of them slightly easier, I guess. Got a lot closed, may actually see a light at the end of the tunnel there.

(My usual method of rebooting my laptop is lossy. It hangs, crashes, or powers itself off because loose/dead battery, or Thing I Did As Root (and _that_ test unmounted the home directory...), or whatever cat hair filled up the last motherboard, and I just lose a lot of work. Swapping in new hardware earlier this year means that hasn't happened this time, so I do it the hard way. :)


December 3, 2023

Auditing all the toybox commands to check which ones need LINEBUF, which ones would be ok with a 4k block buffer output, and which ones need switching to dprintf() or similar. (Yes, like I've needed to do for weeks, I've stopped trying to find ways to avoid it and am basically "cleaning the old one with a toothbrush". Likely to take a while...)

In theory, the commands that CARE are the ones that produce progressive output, meaning they read input, process it, and write output based on that input. Because the input may trickle in, or it may have to go through a lot of input to produce output, so if you pipe the results to less or similar you may not see anything for quite a while if it's accumulating 4k of output before showing anything. So "hexdump" needs line buffering, but "ascii" or "basename" produce all their output as quickly as they can without stopping.

I am, of course, finding unrelated things to fix. And I really want to add some sort of safety catch to blkdiscard. Partitioning software and filesystem formatters would go "you are about to wipe the drive, are you sure". The default action of this command, with no arguments, is to erase 100% of the supplied device's data. If there's EVER a time for an "are you sure (y/n)?" that seems like it. But it would be an API change...

Wandered to west campus instead of immediately walking home at the end of my usual overnight programming thing (the round trip is 10k steps!) to actually find the Rossman Repair Group office. It's not _behind_ the Starbucks on 24th, it's _next_ to the starbucks on 24th. Which currently opens at 5:30 am on weekdays and it's technically monday morning now, so I hung out there with a hot chocolate and edited blog entries for a bit to see if anybody was an early riser. Nope. Headed back around 7:30am because I'm tired.


December 2, 2023

Set qemu building toolchains on the orange pi again, and doing just BOOTSTRAP=armv7l (no linux32 wrapper) it once again died with undefined reference to host_detect_local_cpu() meaning I need to give it --build=armv7l-doesnotmatch-linux so the moronic "build==host" plumbing in ./configure (which SHOULD NOT EXIST) does not trigger. It should never, ever trigger. It's the height of gnu that they even have it.

Hmmm, and my first attempt didn't override it right. Last time I put --build=armv7l-walrus-linux in the armv7l:: tuple by hand. Trying to get the plumbing to detect the need for this (I.E. building the first static toolchain) and supply it automatically. In THEORY there's already a test for this here so it can add the host compiler to the $PATH, which kinda HAS to work already or we wouldn't have gotten this far... The cycle time for letting it build the host toolchain and then try to build the next one (which is the one that breaks because gnu) is longish, need to drill down a bit for a better test setup of just this plumbing...

At least this one has the 32 gig USB stick plugged in. The huaweicloud debian system (which had NO updates over the time I was in minneapolis, another reason to replace it) takes up half the sd card. Doubling the storage triples the free space.

Sigh. Design-wise, the alternative to TOYFLAG_LINEBUF should probably be a 4k output buffer, so it can just deal with it via iterative printf. Converting it means auditing all the commands that DON'T use LINEBUF to make sure they're ok with that and we don't have the "if you pipe the output to less you see nothing for over a minute" problem.

Don't want to. It's probably the right thing. It's a BUNCH of janitorial work. Back when xprintf() actually FLUSHED I thought I had a handle on it, and going back over that territory yet again is annoying...


December 1, 2023

Broke down and hit send on the email from last night. (Ironically, their reply to it then cited the "do not quote the deep magic at me" meme _back_ at me, which I hadn't mentioned there. I _said_ I was trying not to come off like that. Unsuccessfully, it seems.)

What happens if dprintf() has a short write? I know it returns length, but how am I supposed to know what the full length of each printf("%ld %s", num, string) is going to be? It varies! (I miss having a peer group of programmers smarter than me available to answer questions. Moving users of xprintf() to dprintf() kinda requires knowing the dprintf error semantics...)

How do I boot qemu from an sd card image? I did -sd blah.img but -boot doesn't have an sd option in --help? ("qemu-system-aarch64 -m 1024 -machine orangepi-pc -sd orange.img -nographic" just hangs with no output. It didn't complain, but I don't know if it's _trying_? Without -nographic I get a qemu console window but have no idea what to do with it. I was hoping to do some of the u-boot or at least kernel bringup for the orange pi under the emulator rather than sneaker netting sd cards into a physical board that is NOT IN A CASE so I have to be very careful about handling, electrically speaking.)

I want to be surrounded by people smarter and more experienced than me. How else am I supposed to learn? The hard way where I end up writing the documentation I wanted to read at the end of it is VERY FRUSTRATING.

Asked both questions on the libera.chat #musl and #qemu channels, respectively. Waited an hour to see if anybody responded. (It's saturday, nobody seems to be around...) Eventually got answers that dprintf() returns -1 for short writes (which reading through the source seems to confirm; it also says that dprintf is _not_ atomic, every %s produces another small write() call). And on #qemu somebody pointed me at a web page on how to specify various block device types, which was neither helpful nor responsive to the question I'd asked, but after some clarification rounds they said that if I build u-boot and feed that to qemu's -kernel, it can do more elaborate bootloading from devices qemu can't specify, and there's a page on that I should read.
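
Since dprintf() just reports failure without saying how far it got, anything that genuinely needs to survive short writes pretty much has to drop down to write() and loop. The usual shape (a sketch, not toybox's actual lib code):

#include <errno.h>
#include <unistd.h>

ssize_t writeall(int fd, const void *buf, size_t len)
{
  size_t done = 0;

  while (done < len) {
    ssize_t n = write(fd, (const char *)buf + done, len - done);

    if (n < 0) {
      if (errno == EINTR) continue; /* e.g. suspend/resume of the pipeline */
      return -1;
    }
    done += n;
  }

  return done;
}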

Speaking of qemu questions, is there a way to disable the locking of the image file so I can get multiple qemu instances to share the same squashfs image without copying it? (As mkroot/testroot.sh would like to do?) I tried adding readonly=on to -drive back when I wrote that script but qemu still locked it, and when I just tried it AGAIN it went "qemu-system-sh4: Block node is read-only" and refused to boot. So these days adding readonly=on to a -drive argument seems to prevent qemu from running at _all_, which is another example of "lateral progress". (Now it doesn't work in a DIFFERENT way.)

Asked that one too. Nobody replied before I logged off...


November 30, 2023

Involved in a linux-kernel thread I don't really want to reply to, but at the same time I'd really rather the patch in question did NOT go in, neither as a new CONFIG option nor imposed on everybody.

I also really don't want to wave 20 years of back links at the person, but they're doing a "you don't understand how this stuff works" thing and... sadly, I kind of think I do? Credentialism is sad and pointless, but I've been through this and have the scars to prove it. No idea how to split that hair without being an asshole about it, though.

I also have no authority WHATSOEVER on Greg Kroah-Hartman's Linux-kernel list, and am not trying to pretend I do. I can't even get my OWN patches in. (Well, I probably could, the same way I _could_ lose 80 pounds in 6 months while simultaneously learning enough japanese to pass the N2 proficiency test. I know what's _required_, actually doing it is a bit more of a stretch. Heck, I could INVENT a name to submit patches under to avoid triggering Greg, it's not like they actually _check_, it's all performative.) So I can't exactly "you come into my house!?" when it is very much not my house, the carpet's worn through in places and the roof's leaked for years. But this change (add a kernel CONFIG option so initramfs doesn't extract into rootfs, but instead mounts a second tmpfs instance on top and extracts into that, so the pivot_root shenanigans containers should never have been doing don't go "boing" from initramfs) makes me wince. They have switch_root. They could use an initrd instead. Adding a config option that takes an expert five minutes to explain the BACKSTORY of, about why anyone might or might not want to select it... oh please no. Do not add more inconsistency to the early boot semantics.

Anyway, typed up another long reply and... haven't hit send yet. Do I really WANT to keep participating in this thread? Hmmm... I could cut and paste what I've written here, as I usually do when deciding not to send it, but... they asked technical questions, which I am attempting to reply to. NOT replying is conceding the technical point, and they did suggest they might be amenable to a redirection that makes it less bad. (Kernel command line option instead of config symbol, much more easily ignorable and closer to how it already works.)


November 29, 2023

Wrote most of a writevall(char **args, char *sep) function for lib/lib.c so I can simplify yes.c again, but... that doesn't append a newline.

The proper API for this is not clear. (Or at least more complicated than I like.) I either need head/tail entries in writevall()'s args (along with the separator, so 4 arguments), or I'd need to edit the args[] array to add stuff to the toys.optargs list before calling writevall(), in which case I might as well just assemble the iovec myself. The "head" entry would be needed because the OTHER currently identified use case is logpath, which does not currently quote the command name in the first entry so filtering by command name's easier, and having the separator be quote-space-quote leaves unbalanced quotes at the start and the end, which I can compensate for at the end with quote-newline... unless there's just a command name with no arguments. But that doesn't fix the unbalanced quote at the start, plus that wouldn't escape quotes _within_ the arguments...

Should I audit the FILE * linebuf nonsense to change the options between block buffered and line buffered, and then just use dprintf(1,) when I want raw output? Back when xprintf() actually flushed, I had commands overriding the buffering so they produced progressive output promptly, but xprintf() got nerfed which is why this whole mess is still a problem.

But a flag day global API change in a dirty tree does NOT get me closer to cutting a release...


November 28, 2023

Flying back to Austin today.

Fade's snake wrestling dissertation defense is scheduled! I am not gonna be in town for it, but bits of it happen over zoom and I've previously made that work on my phone.

I was, however, here for her finishing the dissertation, and performed a completely traditional marriage role. I cooked and cleaned, made sure dinner was ready on time after she got home from the office, usually followed by evening beer, snacks, and a succession of hot beverages. In the morning I provided coffee and breakfast, and a packed lunch (generally leftovers from dinner) as she headed out to the office, did the associated dishes and a certain amount of laundry, pilled and walked the dog... As I said, traditional marriage stuff. (She actually has two offices this semester, teaching a monday/wednesday/friday class at a different college, and grading a class at U of M on tuesday/thursday, the second of which is why we still have health insurance...)

Getting a little work in on the flight. The shell backslash newline parsing, which glues the next line on to the current one but only in certain quoting contexts, turns into whack-a-mole in expand_arg_nobrace() and inside recalculate() and probably elsewhere, and I want to make another attempt at getting parse_line to handle it...


November 26, 2023

Poking at the lib/password.c stuff a bit. Still need a mkroot test environment for it...

I've been emailing back and forth a bit with Louis Rossman, the mac repair guy from youtube, who moved from New York to Austin a year or so back. He asked about needing an embedded developer to help him with a project on a Sitara AM625 chip (not repair, he's making a thing) and I emailed asking for details. I don't think I'm the engineer he's looking for, but I piqued his interest enough for him to email back, and we did some ping-pong trying to set up a phone call which hasn't happened yet. (He's busy and I tend to see his emails several hours after he sends them.) His repair shop is like 4 blocks from the table at UT (behind the Starbucks on 24th street), I'm tempted to just stop by and say hi when I get back...


November 25, 2023

Happy thanksgiving.

Orange pi filled up trying to build 64 bit toolchains (only gave the huawei distro a 32 gig sdcard), but it did all but the last 2? (Modulo sh2eb died with an error I've seen before, but that's on the "hexagon/or1k/riscv64" pile at best.)

So I deleted the 64 bit toolchain directory, sent across a kernel directory and my patches (it's not connected to the wifi, I just ran a cat5 cable to my laptop and toybox dhcpd on my laptop's eth0 so I can ssh/scp in), and told it to build all the kernels with the 32 bit arm hosted toolchains, and then run the qemu tests. (I built qemu first, back before all those toolchains.)

The kernel build took about 7 hours, which isn't TOO bad for a dozen architectures. There were several mkroot/testroot.sh failures trying to run them under QEMU, but I think the timeout isn't long enough? (It's testing them in parallel on a machine with an obvious DRAM bus bottleneck, and the sdcard isn't blazing fast either.) Eh, look at it on a non-holiday...


November 24, 2023

Adding xargs --show-limits because it seems useful, and trying to fix up the limit detection while I'm at it. According to man 2 execve the limit since 2.6.25 (in 2008) has been 1/4 the stack size ala getrlimit(RLIMIT_STACK)/4 with a floor of 32 pages (the old limit before 2.6.23; in between those two it did RLIMIT_STACK/4 without the floor, which broke stuff). Unfortunately musl's sysconf(_SC_ARG_MAX) returns a compile time constant which is the old 32 page value, which is wrong in multiple ways: it's NOT a compile time constant (you can change it per-process with ulimit) and how it was calculated changed 15 years ago.
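
A sketch of the calculation that rule implies (not the actual lib/env.c code, and ignoring corner cases like RLIMIT_STACK being unlimited):

#include <sys/resource.h>
#include <unistd.h>

/* what sysconf(_SC_ARG_MAX) arguably should return on a post-2.6.25 kernel */
long argmax(void)
{
  struct rlimit rl;
  long floor = 32*sysconf(_SC_PAGESIZE);

  if (getrlimit(RLIMIT_STACK, &rl)) return floor;

  return (long)(rl.rlim_cur/4) > floor ? rl.rlim_cur/4 : floor;
}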

So I redid the function in lib/env.c, which has a second user (find.c) that also needs fixing, and checked its output against what xargs --show-limits is producing on debian, which is...

$ xargs --show-limits < /dev/null
Your environment variables take up 2508 bytes
POSIX upper limit on argument length (this system): 2092596
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2090088

That last line is wrong. I did "env | wc" and got 2506 so I guess that's close enough (the NUL terminators swap out 1:1 with newlines... rounding it up to word size maybe? Except this is a 64 bit machine and that's 4 byte alignment, not 8. Extra null terminators for zero length strings? Dunno...)

And 2092596-2090088 = 2508, which is the part that's both true and wrong. The layout in memory goes "argv[argc], NULL, envp[envc], NULL, strings", I.E. the environment space starts with two arrays of pointers to the start of each entry, with a NULL entry at the end of each array so you know when you're done. (This is at the start of the stack space, and using it AS a stack grows down from the end, so running out of stack presumably means you hit a guard page between the environment space and the stack growing down? I don't THINK that guard page counts against the environment space?)

This calculation is ignoring the argv[] array here because it hasn't been provided yet... except you can go "xargs --show-limits one two three" and you'd THINK those would figure in, but it makes no difference to the gnu output in my stale devuan version. And even totally empty it should have the NULL array entry terminating argv[], which it doesn't. (Not that argv _can_ be totally empty, the kernel's fs/exec.c function do_execveat_common() requires argv's array length to be greater than zero, meaning the first entry can't be NULL, although it could be a zero length string.) But that's still two pointers plus a NUL byte, adding 17 bytes on a 64 bit target.

But envp[] _is_ known, and their code is counting its string contents. (The global variable "environ" is initialized from envp in the _start() code that calls main(), but it's also the usually ignored third argument of main() which is where the name envp comes from: int argc, char **argv, char **envp. There's no envc argument, you have to traverse the array until the null pointer and count it yourself.) Since we know how many environment variables get passed on to the child, we can calculate the size consumed by envp: it's sizeof(char *) times one more than the number of environment strings (because of that NULL entry at the end). In this case, there are 35 environment strings and 36*8 = 288. If xargs isn't reserving the right amount of space, then on long unlucky inputs the exec() will fail returning E2BIG in errno.
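
Counting it is straightforward (illustrative standalone program, not the lib/env.c version):

#include <stdio.h>
#include <string.h>

extern char **environ;

int main(void)
{
  size_t bytes = 0;
  long envc;

  // string bytes, including each NUL terminator
  for (envc = 0; environ[envc]; envc++) bytes += strlen(environ[envc]) + 1;
  // plus the envp[] pointer array, including its terminating NULL entry
  bytes += (envc + 1)*sizeof(char *);
  printf("%ld vars, %zu bytes\n", envc, bytes);

  return 0;
}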

Once again I'm torn between making toybox match gnu or do the RIGHT THING. Leaning towards the right thing: count the arguments supplied to xargs, add in the argv[] and envp[] array sizes, ACTUALLY TEST that I can launch a child with exactly as much space as I'm calculating (to the byte)... And of course if you supply 1000 arguments it'll eat an additional 8000 bytes for pointers, so "length of the strings" is NOT a complete metric...

Also, the chatty english nonparseable output is just sad, and not even consistent! The first line has the number in the middle of english text, with no colon and the word "bytes" after it. And --show-limits is then falling through to normal xargs behavior, hanging reading stdin or trying to launch the supplied command (or "echo"). No, this is like --help, it happens INSTEAD of normal operation.

I wonder if rounding up to pointer alignment implies the pointers go AFTER the strings, which I didn't think they did? I should check...

$ cat hello3.c
#include <stdio.h>

int main(int argc, char *argv[], char *envp[])
{
  printf("%p %p %p %p\n", argv, *argv, envp, *envp);
}
$ ./a.out
0x7fffa83118b8 0x7fffa831360e 0x7fffa83118c8 0x7fffa8313616

Yup, both arrays are before the string data. argv is 18b8 and *argv is 360e, envp is 18c8 and *envp is 3616... except there's a 7.5k gap between the start of argv and the start of the string data? What? And argv is 2232 bytes into the start of its page (um... the task struct?) Hmmm... Dowanna read the kernel source right now, this has "rathole" written all over it...


November 23, 2023

Hmmm, I want to ask the question "what are the newest text files under my home directory". Meaning I want to show all files of a given type in the filesystem in reverse chronological order, which SEEMS like "find . -name '*.txt' 2>/dev/null -print0 | xargs -0 ls -lotr" but that hits xargs batch size issues and gets chopped up into individually sorted blocks, which isn't helpful.

My next attempt is "find . -name '*.txt' | while read i; do echo "$(date -r "$i" +%s) $i"; done | sort -n | while read a b; do ls -l "$b"; done" which works but the indentation's wrong because each file is listed individually, so all the spacing is single space.

I guess something like... "find . -name '*.txt' | while read i; do echo "$(date -r "$i" +%s) $i"; done | sort -n | while read a b; do echo -n "$b"; printf \\0; done | xargs -0 ls -ltr" Which works for the wrong reasons, but eh. Answered my question? Except most of this is under git repositories and I don't want that...

Next attempt, filter out files under git repos ala: find . -name '*.txt' | while read i; do [ -e "$i" ] || continue; echo "$(date -r "$i" +%s) $i"; done | sort -n | while read a b; do echo -n "$b"; printf \\0; done | egrep -zZvf <(find . -name .git 2>/dev/null | sed 's/\(.*\)[.]git$/^\1/') | xargs -0 ls -ltr

Which would filter out the todo.txt and similar under my ~/toybox/toybox dir, because they're IN a git repo, just not checked in. How do I distinguish between "mkroot extracts source tarballs under root/ which may have *.txt files in them", and "this is a recently edited file that may have a partial video script or talk idea in it". Using "git status" to see if it's checked in won't exclude the tarball ones...

Back to serendipitously stumbling across things, I guess...


November 22, 2023

Shoveling through a zillion open windows trying to close enough stuff to be able to finally reboot my laptop without losing lots of pending work, which alas has the "tidying the bookcases" problem where you stop and read the books. I keep going "oh that's an easy fix, lemme just finish that" but if it was an easy fix I would have DONE it already. This is a large accumulation of self-selected deceptively non-easy fixes.

Writing down zillions of todo items implied by each open tab. For example, I need to reproduce the toybox build under the new freebsd-14 release, replacing the freebsd-13.raw image I was testing on under qemu. I probably blogged about what my setup was? Yup, I did, which points at the mailing list and ALSO says I should add a FAQ entry. Don't do it NOW. Write it DOWN. And then squint dubiously at it in 6 months wondering why I haven't already done it and what the gotcha was...


November 21, 2023

According to the toolchain build commit, the qemu-system-hexagon build is in the forked repository github.org/quic/qemu in tag hexagon-sysemu-6-nov-2023, but alas, after downloading that and checking out the tag (leaving the tree in nukekubi state), running configure barfed that I haven't got a new enough python 3. (In theory I could probably build this on the orange pi, but dowanna. Need to upgrade my laptop, which means closing so many tabs.)

More shell bugs came in on github. I'm sick of thrashing around with the shell, I want to go through my make test_sh and fix what's already THERE, in sequence, so I can see progress and clear some space and have regression testing that means something for each new coding session. But I keep injecting HARD PROBLEMS near the front of the list so I actually hit them in my current testing (if I add a test case to the end of the list and there are a zillion failures how do I even know the test is correct... well, it all passes on TEST_HOST or I don't check it in).

The current "next FAIL" is that reading HERE documents performs line continuations BEFORE checking for the EOF marker, so if you do:

$ cat << EOF
thingy\
EOF
more thingy
EOF

It SHOULD print two lines of output: "thingyEOF" and "more thingy". But right now my scanner is assembling the lines before trying to parse their contents, which means it stops at the first EOF. And yes, thingy\\ should print a backslash at the end of the line and does NOT escape the newline, so you can't really parse backwards, but going forward from the start of the line is the full expand_arg_nobrace() logic with subshells and everything.
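(A quick way to check the expected behavior against bash; this should print exactly those two lines:)

$ printf 'cat << EOF\nthingy\\\nEOF\nmore thingy\nEOF\n' | bash
thingyEOF
more thingy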

Another one of these is bash -c $'echo $\\\nPATH' which should expand to the variable contents despite the backslash newline after the $, but does not currently with my shell because it's not checking for it there specifically. I initially had the line reading and word parsing plumbing handling this and stripping it out, but it's highly contextual so I moved it to expand_arg_nobrace() which is both whack-a-mole of corner cases like the above (every time I look for the next character I need an advance() function to potentially strip out escaped newlines) and it's too late for HERE document EOF detection like above. Maybe the line reading plumbing needs to detect it like it was doing but record something to act on later?

There needs to be a place to put this where it can be both generic enough and specific enough, and maybe layers need to be shuffled somehow...

Cycling back to a half-remembered design problem is always mildly annoying. Step 1: working out what the actual PROBLEM was and what's wrong with the first few "obvious" solutions, and thus why I didn't finish it last time. I couldn't get all the pieces to fit when I downed tools on this, and coming back after a while I don't remember what all the pieces WERE...


November 20, 2023

Submitted the mkroot talk I didn't properly give in Taiwan to Texas Linuxfest, despite some rough edges with the process. (A human emailed me and assured me humans are still involved. Not an outsourced marketer working for the event venue whose customer is a director of human resources for strategic planning, but somebody who actually cares about the topic of the event.)

Somewhere I should keep a list of "talks I should maybe give". This mkroot walkthrough, toybox status du jour, countering trusting trust, an updated and non-jetlagged version of "building the simplest possible Linux system", what's actually involved with making Android self-hosting, what's actually involved with open source software release engineering (I have a checklist, but also reference the earlier debian time based releases talk)...

There's so much stuff I should write down somewhere, because I don't remember I know it until it comes up in context. For example, you don't want "true" to have --help output because "true --help > /dev/full" would produce an error message and return nonzero. (That's HOW that could be exploitable.)


November 19, 2023

Next round of the ongoing kernel farce, I noticed that my patch had the new code:

+if (IS_ENABLED(CONFIG_TMPFS) && (!root_fs_names ? !saved_root_name[0] :
+  !!strstr(root_fs_names, "tmpfs")))

Which the kernel proctology process has converted into:

+if (IS_ENABLED(CONFIG_TMPFS)) {
+  if (!saved_root_name[0] && !root_fs_names)
+    is_tmpfs = true;
+  else if (root_fs_names && !!strstr(root_fs_names, "tmpfs"))
+    is_tmpfs = true;
+}

The !! in my patch was so the ? : types matched, otherwise the compiler complains. Logical && already coerces the result to 0 or 1 since at least C99, so it's extra-unnecessary there.
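(For reference, a minimal example of the warning the !! was shutting up, assuming gcc-style diagnostics:)

#include <string.h>

int main(void)
{
  char *s = "tmpfs";
  int flag = 0, x;

  // x = flag ? !flag : strstr(s, "tmpfs");  // warning: pointer/integer type mismatch in conditional expression
  x = flag ? !flag : !!strstr(s, "tmpfs");   // ok: !! coerces the char * to an int 0/1

  return x;
}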

So they "clarified" the code by turning one assignment into two assignments, turning one test on root_fs_names into two tests on rootfs names, and to further clarify they swapped the order in which root_fs_names gets tested. But didn't remove the now-unnecessary !! coersion. Sigh. I could almost see "clarifying" into something like:

if (IS_ENABLED(CONFIG_TMPFS)) {
  if (root_fs_names && strstr(root_fs_names, "tmpfs")) is_tmpfs = true;
  else if (!root_fs_names && !saved_root_name[0]) is_tmpfs = true;
}

But that's not what they did. And I'm not going to comment on it because a fix going in means I can drop my patch, and I don't want to interrupt the necromantic ritual mid-sacrifice, or I could be on the hook for a new goat.

I had to stare at kernel commit fad1db8a351c to figure out whether it was doing a bad thing. It's marking the function devtmpfs_mount() as _init which means it's freed before starting PID 1, which I thought might explain why mount -t devtmpfs in a subdirectory is just giving me a --bind mount of the parent version even in a new namespace (because it CAN'T create a new instance), but it looks like that function is NOT the filesystem's mount entry point, it's a wrapper only called from init/main.c and init/do_mounts.c (why is that two codepaths, I am SO tired). The actual mount code is public_dev_mount() in drivers/base/devtmpfs.c which is responsible for the horrible singleton nonsense. (And I'm coincidentally setting CONFIG_DEVTMPFS_MOUNT in my configs anyway because I have a patch to make it do something useful, so I didn't notice this breaking configs that DIDN'T have it.)

I need an or1k toolchain for u-boot, and both musl and qemu-system claim to have or1k support, so I might as well try to add or1k to mkroot while I'm there. So I tried or1k:: in mcm-buildall.sh and it produced a toolchain. I then poked around in the kernel source to see if there looked to be a reasonable defconfig, and found arch/openrisc/configs/or1ksim_defconfig which even has CONFIG_OPENRISC_BUILTIN_DTB="or1ksim" so I shouldn't need to copy a device tree binary file, so I built this kernel by running CROSS_COMPILE=/path/to/or1k-linux-musl- make ARCH=openrisc or1ksim_defconfig and then again without the defconfig. Technically the crazy default build target name is "__all" (which is very different than what "all" does) but last time I tried putting both on the command line with -j $(nproc) it broke because of missing dependencies. (Maybe it's been fixed since? Who knows...) I can do "vmlinux" as a target but that won't build whatever strangely packed arch/$ARCH/boot/thingy qemu-system-blah might need; the vmlinux loader is NOT universally wired up to "qemu-system-blah -kernel" for some reason. Anyway, that's why I'm in the habit of just running it twice.
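(Spelled out, with the toolchain path as a placeholder: configure and build as two separate make invocations rather than combining the targets:)

$ CROSS_COMPILE=/path/to/or1k-linux-musl- make ARCH=openrisc or1ksim_defconfig
$ CROSS_COMPILE=/path/to/or1k-linux-musl- make ARCH=openrisc -j $(nproc)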

The build created an arch/openrisc/boot/vmlinux.bin but I tried just the vmlinux at the top of tree, ala qemu-system-or1k -no-reboot -nographic -kernel vmlinux -append "console=ttyS0 panic=1" and I got boot messages! Woo!

What it did NOT do is exit qemu when it panicked at the end, but got stuck in the usual CPU-eating loop of a kernel that doesn't know how to tell its board to power off (or even have a proper halt instruction wired up to this so it can hang WITHOUT spinning eating power). Hmmm... the virt_defconfig entry has more config entries, including POWER_RESET, POWER_RESET_SYSCON, and POWER_RESET_SYSCON_POWEROFF which look promising. But switching those on doesn't make a difference... The boot messages at the end don't say it's _trying_ to reboot. Is the panic=1 making it through? Hmmm... nope, the boot spam has "Kernel command line: earlycon" and then a stack dump because earlycon doesn't work (ioremap failure, possibly imperfect board emulation by qemu), which looks like qemu's -append isn't setting the kernel command line. So where's the kernel getting "earlycon" from? Back to the kernel source, grep -r '"earlycon' arch/openrisc says... Of course it isn't in the .config file where I can override it, instead they hardwired it into the board file device tree: boot/dts/or1ksim.dts has bootargs = "earlycon"; If qemu-system-or1k's -append doesn't know how to edit (overlay?) the device tree to override that then I can't panic=1 to force the kernel to exit qemu if something goes wrong. (I could patch the kernel source, but then I'm not consistently specifying this info the way I am for the other platforms.)


November 18, 2023

The reason toybox's dhcpd refused to run is I hadn't assigned an IPv4 address to eth0 yet. The xioctl (hex number) failed message does not adequately explain the situation, although I did eventually figure it out.

For some reason "linux32 uname -m" returns armv8l on the orange pi kernel, which is wrong: armv7l was the last 32 bit release, armv8 was the one that introduced the 64 bit extensions. I'm unaware of any armv8 that does NOT support 64 bit. I don't care how precious Arm Inc. has been about the name: armv8 was the 64 bit one and aarrcchh6644 is an alias for it.

So my attempt to build an armv7l toolchain needed me to set BOOTSTRAP=armv7l in the environment, which then hit the old problem with gcc builds that think host=target when they don't, which means I need to add --build=armv7l-fuckoff-linux to the armv7l gcc configure line for no other reason than "strcmp don't match so don't do the stupid codepath". I should teach mcm-buildall.sh to do that automatically I guess, although i686 managed to avoid it because the linux32 wrapper worked there so I didn't have to set BOOTSTRAP.


November 17, 2023

Grinding away at the mcm-buildall.sh update to autodetect host type. While I'm at it, I'm pushing all my various fixes in there so you can run it on a fresh musl-cross-make checkout without having to manually select a newer gcc version in the Makefile and so on to get the toolchains I'm building.

Lots of reproducing old issues and trying to remember how I fixed them, the -nostdinc++ one happens on the x86_64 host build with gcc 11.2 and I went "wait, wasn't that a j-core issue?" (Yes, but moving from i686 to x86-64 hit it too. And I didn't have the patch in my standard stack.) And I wasted TWO DAYS trying to figure out why it kept re-downloading a perfectly good linux-6.6.tar.xz. (Yes it's only using it for the kernel headers, which is a lot to download for that purpose, but the external one it had was ancient and didn't have m68k or sh4 in it, plus I _do_ want to know whatever weird upstream breakage the kernel guys do promptly when possible. Build with current to know when I _can't_ build with current, and complain promptly enough they remember breaking it and can't say "nobody complained for years, obviously you're weird for caring" -- because everybody ELSE stopped bothering to ever upgrade and just runs really old stuff that works for them.) The problem turned out to be that if the date stamp on the hash file is newer than the datestamp on the tarball, the Makefile will re-download it. And my script was writing the hash file with an echo redirect each run. (The solution: "touch -d @1 hashfile" back to 1969. BIG HAMMER. Long-term solution: don't use make for this.)

I really want to unwrap this build so it's not using musl-cross-make at all anymore, especially since the last commit to mcm was a year and a half ago now. But at the moment, I need to add stuff to the known-working one first.


November 15, 2023

The linux kernel mailing list bureaucracy (his name is Greg) has gone to plaid. Keystone cops level here.

Ooh, very nice email from the vxworks guys at wind river, one of their new developers ported toybox to vxworks and they asked about upstreaming it. I'm all for it. We just need to work out how...


November 14, 2023

I did not send it, I did not send it, I typed up a reply but I did NOT hit send. (Not even to the nice guy resubmitting my patch to lkml over and over to attempt bureaucratic insertion.)

> This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him
> a patch that has triggered this response. He used to manually respond
> to these common problems, but in order to save his sanity (he kept
> writing the same thing over and over, yet to different people), I was
> created.

To cc: multiple mailing lists with auto-generated form responses, because his sanity is more important than everyone else's COMBINED.

Greg regularly dumping hundred-long backport patchlists for multiple old kernel versions to lkml is probably the largest single contributor to its volume, and of course he has them from: him instead of something easily filtered out because The Spotlight Must Always Be On Greg At All Times, he's Mr. Important.

> Hopefully you will not take offence and will fix the problem
> in your patch and resubmit it so that it can be accepted into the Linux
> kernel tree.
>
> You are receiving this message because of the following common error(s)
> as indicated below:
>
> - You have marked a patch with a "Fixes:" tag for a commit that is in an
> older released kernel, yet you do not have a cc: stable line in the
> signed-off-by area at all, which means that the patch will not be
> applied to any older kernel releases.

His bot cc'd multiple mailing lists because you _didn't_ ask for this to be backported to stale kernels, and at the end of that first paragraph he's saying he won't accept it into _current_ if you don't tell him to _backport_ it to old releases.

It's out of my hands (because I can't navigate this bureaucracy), but personally I'd drop the "fixes" tag: it's an interface improvement adding a new requested capability.

The fact it's a bug from 2013 that's been commented on here before repeatedly but nobody bothered to do the one line fix in 10 years of open source scrutiny (because the bureaucracy chased away all the hobbyists who do that sort of thing sometime before 2013) is incidental...

The snark goes in the blog, not on the list. The lkml situation is not salvageable, the problem is Linus is an empty nester whose kids have all graduated, and Greg "Pay attention to meeeee..." Kroah-Hartman's quest to have the spotlight on himself at all times has shoved aside Alan Cox and Andrew Morton to become the Designated Successor (because it's a high status position, not for any technical reason) which puts a narcissist bureaucrat in charge of the ever-increasing "process". We must carve our submission on stone tablets to submit them to The Greg with proper deference and ceremony. There is a little dance.

That's not fixable. That's "rebuild after it collapses", or before by starting over from scratch and rendering the old monolith irrelevant the way Linux did to Unix. (I've got my hands full with userspace, and if I _did_ have bandwidth would rather tackle compiler than kernel.)

Wait, qemu hexagon has system emulation now? Finally! And I do NOT have time for this right now because I'm A) trying to get a toybox release out, B) trying to get the toolchains built on the second orange pi with all vanilla downloads. Which means fixing the mcm-buildall.sh script, which I TOTALLY need to replace so it builds toolchains in the LFS style without the deeply stagnant musl-cross-make but I don't have time for that either. And to get my own bootloader and OS build on there I need an or1k cross compiler because of the firmware for the power controller chip in u-boot, which comes AFTER the toybox release, but updating mcm-buildall.sh to work on aarrcchh6644 hosts (without special _casing_ that architecture: autodetect via uname -m and reorder the switchover) does not...

Ahem. Gotta do the thing...


November 13, 2023

Finally checked in for the sleep study I'm theoretically up here in Minneapolis for. (Where I get much better healthcare as the husband of a U of M graduate student, although she defends her dissertation in December so it's COBRA and then the exchange unless she rolls REALLY WELL on the job search table.) Technically I didn't have to come up yet because this first bit is a glorified Zoom call to set things up, but in THEORY they then do the actual sleep study after that prompt-ish-ly?

Yay, nolibc got fixed in the kernel! In a clean checkout of 6.7-rc1:

$ make headers_install INSTALL_HDR_PATH=$PWD/walrus
$ gcc -nostdinc -nostdlib -fPIC -I tools/include/nolibc -I walrus/include -I include/linux -xc - <<< $'#include <unistd.h>\nint main(void) { return write(2, "Hello world\\n", 12);}'
$ ./a.out
Hello world

Oh hey, it's even got printf. It's not VERY printf: %.3s doesn't truncate the string which will break all SORTS of stuff in toybox, but still... Heh, that's even documented in tools/include/nolibc/stdio.h:

/* minimal vfprintf(). It supports the following formats:
 *  - %[l*]{d,u,c,x,p}
 *  - %s
 *  - unknown modifiers are ignored.
 */

Hmmm... the original nolibc was all inlines and such, and this is declaring functions. Meaning if you #include it from two different files, you'll get two instances of printf() that probably fight. And no obvious way to say "I want the declarations not the definitions" so I can #include it everywhere and then have one file #include <once-only.h> to get the function definitions. That's MUCH less useful than the old version, which I could supplement with my own code and just let it handle the syscall wrappers and basic types for me. This one grew the limitation that it can't be used from a multi-file source project, by PROVIDING a broken printf() I'd need to replace anyway. The kernel guys overshot and became crappier: how much of what they've done the past decade does THAT describe?
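(The pattern I'm asking for looks something like this; the header and macro names here are ones I just made up:)

/* once-only.h: declarations from every file, definitions from exactly one */
#include <string.h>
#include <unistd.h>

int xputs(char *s);      /* declaration: safe to include everywhere */

#ifdef ONCE_ONLY_DEFINE  /* exactly one file defines this before the #include */
int xputs(char *s)
{
  return write(1, s, strlen(s));
}
#endif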

When I "make clean defconfig baseline" and then switch off "cal" in menuconfig and run "make bloatcheck", why doesn't it show the help text getting smaller? It noticed the size of toy_list[] changed, but... Hmmm. Because the help text is a big string constant, and those aren't exactly named. I don't think nm --size-sort is showing them, it's only listing named objects: functions and global variables. In readelf I can see .rodata shrink from 1f05f to 1ef24 but that's buried deep and not remotely orthogonal. Um... aha: if I change char *help_data = MAGIC; to char help_data[] = MAGIC; now it shows up in bloatcheck... and is no longer in the rodata segment. Ok, add the dreaded const qualifier, and a (void *) typecast on the user of the symbol so the compiler STOPS COMPLAINING ABOUT DROPPING IT (string constants are in the rodata segment and you don't complain about THEM, give me the segfault at runtime if I screw up and otherwise SHUT UP ABOUT THIS)... Anyway, that works.

Ever since the C++ developers took over compiler development, I get the equivalent of a chef under a loudspeaker that goes "Your knife moved too fast! The fire is too hot! I detect blackened food but you didn't chant 'cajun style' out loud over and over the whole time it was cooking!" It's kind of annoying. Yes, you can totally injure yourself in the kitchen and poison people if you do it wrong, but Microsoft Clippy looking over your shoulder and spouting a stream of pop-up complaints DOES NOT IMPROVE MATTERS.

Updating mkroot/packages/busybox so I can build a current busybox in mkroot and compare the busybox and toybox commands side by side (especially useful in the Linux From Scratch build since I can supplement the build with the commands I haven't implemented yet), and... it hasn't got a "make install_flat" target but it DOES have "make modules_install". Why does the _busybox_ build have modules_install? Ah, because they blindly copied makefile plumbing from the kernel back in... oh goddess they added a mechanism to embed shell scripts in the busybox binary, why would... ahem. Ok, reading "git log Makefile" is WAY too long, git annotate says Denys copied kernel build plumbing in 2006 (shortly after he took over from me) and hasn't changed that part of it since.

I know it's been a while since I looked at any of that stuff, but... wow. Just wow.

And bb_common_bufsiz1 (their version of toybuf) has a 127 line generate_BUFSIZ.sh shell script to determine whether it should be 1K or 4K (because nit-picking micromanagement), which I noticed because that script ran right before trying to link the busybox binary, and complained that "diff" wasn't in the $PATH, then hung instead of exiting. (Bravo.) While I plan to provide a diff command for toybox 1.0 I'm not switching over to working on that RIGHT NOW. So my build script needs to replace this size determination script with "echo 4096" or some such. Except of course it's not just emitting the size to stdout, so I've got to read the script to figure out what it's doing in enough detail to be able to NOT do it...


November 12, 2023

I'm aware I don't blog as much up at Fade's. :)

Ok, the way to get a proper text display on an HDTV is to turn off "overscan". (Fade's TV had the same problem as the one at home, so I dug around and found the config option. Given the layer of dust on everything, I'm not _too_ worried about forgetting to switch it back, and it's just one toggle anyway. She has a TV in her room for the switch and streaming services, this is the one in the other room.)

I can't associate the orangepi with the wifi here, and just plugged in my phone tether for the moment. I can't find a "download sources" option in musl-cross-make, but "make extract_all" de facto does the same thing (has to have them to extract them), and then I could "make clean" to undo the extract so my script can add the extra patches. It's kind of impressive that 70 kilobytes/second is now "crazy slow". (Yes, t-mobile is rate limiting the tethering connection. For one thing, I can watch youtube while downloading with no impact on the speed. For another, I helped _implement_ a 40 megaBYTE per second USB-2.0 ethernet hardware adapter in VHDL two trips to Japan ago. Our cheapish FPGA could clock fast enough to deal because there are serial-to-parallel USB PHY chips that give you 8 bits of data at 1/8 the speed the USB signal is clocked at, which even automatically handles the funky zero balancing stuff the protocol does to avoid electrical interference in the cable. More a protocol transceiver than a PHY, but it handed us all the raw data to interpret into packets, and we had to assemble our own headers and such going out the other way...)

A post scrolled by on Mastodon that I can't find again, but it said the fix for tethering slowness is to set the laptop's TTL to 65, because the phone TTL is 64 and various carriers are checking packet TTL to detect and slow tethered bandwidth. I should try that. (I'm PAYING FOR TETHERING. It's the same bandwidth. I could download ISO images to my phone and then transfer them to the laptop and THAT wouldn't be slowed. You ratelimit my phone when I watch too much crunchyroll in a month _already_, which I notice there because netflix and hulu have "download" options and prudetube apparently pays to be exempt or something?)
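(If I do try it, it's presumably just the default TTL sysctl, plus the IPv6 hop limit equivalent:)

$ sudo sysctl net.ipv4.ip_default_ttl=65
$ sudo sysctl net.ipv6.conf.all.hop_limit=65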

I don't want to set the pi to work building stuff with my phone plugged into it: even though it's "charging slowly" it's still pulling power (probably 500mA of the 2000mA budget) and I'd rather not strain things unnecessarily. So I need everything downloaded before the compile starts, so I can unplug the phone before the CPU gets hungry. Also, I haven't checked in the "autodetect host type and reorder builds appropriately" change, and I'd like this run to use it rather than fixing stuff up by hand again.

Succumbed to the allure of the 3 for $7 monster deal at the gas station down the street from Fade's apartment. They have the "khaos" flavor (tangerine) that got me into this stuff in the first place, and the "pipeline punch" (guava) that's my go-to these days. I am sipping them VERY SLOWLY over a period of many hours, and trying to stick to one can per day.


November 9, 2023

The part of my reply I did _not_ cc: to linux-kernel (but instead sent as a separate email to the individually cc'd people)... turns out to have gone there anyway because thunderbird is constructed entirely out of bugs. (Sigh, I had it all copied here and HTML formatted with blockquote tags and the links turned into links and everything. Oh well.)

It's not that it's embarrassing to post it to the list or anything, it just can't possibly help, and the usual suspects will go "I am NOT a rigid bureaucrat" and get mad without actually contradicting anything I said other than a blanket "Nuh-uuuh!". Except the central point I made that "they HAVE the patch, they want literally the same thing resubmitted to the same place and can't function without it" = the definition of bureaucratic paralysis.

When Linux had its 1.0 release, Unix was 25 years old. Linux is now 32 years old. Those "unix war" greybeards with suspenders arguing about vaxes and ignoring the PC were where the linux-kernel crowd ignoring the Raspberry Pi was SEVEN YEARS AGO. (Sigh, back in 1995 there was a good Dilbert cartoon about unix greybeards, which apparently Neal Stephenson cited in one of his books. Alas like Eric Raymond the cartoonist went down the right wing rathole to crazytown, and possibly UNLIKE Eric he did so prominently enough you can't even cite his OLD work from the before times anymore. His syndicate dropped him and the behind the bastards podcast did multiple episodes on him. Oh well...)

People are celebrating unix time hitting 1.7 billion seconds, and I remarked on mastodon that the original Unix lasted about 1 billion seconds before collapsing and being replaced by Linux (that rollover was in 2001), and the era of Linux can thus be expected to collapse and be replaced at the 2 billion seconds mark in 2033. And judging by last time, that replacement kernel is probably being created about now. Also, 1995-1969 means Unix was 26 years old when a newspaper comic mocked its egocentric senescence, and 1991+26 = 2017, 6 years ago now.


November 8, 2023

Attempting to set up another Orange Raspberry here at Fade's, since the two spares I ordered got delivered here because she didn't change the address pulldown when ordering. I brought various sd cards with me. I did NOT bring a 3 amp USB-C charger, but the little instruction paper in the box says it's maximum 2 amp draw at 5 volts (so 10 watts power consumption), and 2.1 amp chargers come in 3-packs at gas stations so there's plenty. (The 1.5-2 amp range may be CPU/GPU load, or maybe plugging in 4 downstream USB devices into the USB host ports? Dunno, they didn't say. Downstream USB devices gotta get their power from somewhere. I don't plan to use the GPU on this thing, and still haven't got a CPU fan, so we're probably good on power.)

The Wikipedia[citation needed] page for Frank Hayes has ARCHIVED a link to a 404 page. (The actual link is still up, it just moved over the years, but footnote 3 has some weird dead redirect that they washed through archive.org.) I tried editing the wikipedia "talk" page to gripe at them (as is my wont; I don't edit wikipedia articles directly because of their "firsthand knowledge is anathema" policy, but will edit talk pages to point at obvious problems), but my wife's apartment's IP address range, which requires individual devices' MAC addresses to be registered to connect to it, has been blocked in perpetuity by wikipedia as some kind of coffee shop allowing drive-by editing? (Which, to be clear, it is not. Although the last coffee shop I tried to edit a talk page from worked fine, but it's been a while.)


November 7, 2023

Onna plane, flying up to Fade's for thanksgiving and to use my health insurance for a sleep study. (My sleep/wake schedule is terrible and Fade says I snore sometimes. Can they prescribe me modafinil, or get me a CPAP machine? Who knows, but Texas' healthcare system is fully borked so having it done up in Minneapolis is way better, and best to get everything I can done while I still have the good insurance.)

The usual last minute panicking, I got my laptop backed up but not reinstalled to the new devuan version. Pulled the 16 gig memory (two 8 gig sticks) out of the old laptop and wrapped them in the orange pi's anti-static bag to take to Fade's, in case I manage to close all the windows and reboot here. (It was last trip here that I switched over TO this laptop and did NOT move over the memory because I didn't know what was causing the crashes on the old one, so it would be... not poetic justice, but something, to finally get that migrated over on the same countertop.)


November 6, 2023

I've mostly fixed the bzcat.c rename thing but there's a rough edge where it's calling copy_tempfile() and then delete_tempfile() or replace_tempfile() at the end, which is plumbing originally written for patch.c and pulled in for sed -i and other stuff, where it's doing the right thing. You create a temporary file that gets deleted on exit, and then at the end you rename that file over the original file as an atomic update, leaving the file unchanged if the update didn't complete for whatever reason.

Except that in THIS case, bunzip2 creates a new file with a known name (either the file without the .bz2 extension or the .tar version of the .tbz file) and that's the final name. There's no rename step. This ALSO means that the final name to rename to doesn't need to be recorded, meaning we don't need a malloc() copy of it that gets free()d, so there's no lifetime tracking.

But NOT using copy_tempfile() means the new file we're creating won't get deleted if you ctrl-c instead of letting it finish. Setting up a different signal handler to delete the file atexit() seems duplicative. I want to use PART of the infrastructure, which implies I should split copy_tempfile() into two functions, one a wrapper around the other. Does that mean I also need to split the delete/replace ones? No, I think it means I don't CALL them when I'm using the simpler version, because the cleanup is either just close the filehandle or delete the file myself. Hmmm... except the signal handler can't delete if it hasn't got the name saved, so I _do_ still need... and what does bunzip2 do when the disk fills up, I think it leaves the partial file there?
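(The generic shape of the cleanup I'd be duplicating; a sketch, not toybox's actual plumbing:)

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static char *outname;  // half-written output file, if any

static void zap(void)  // registered with atexit()
{
  if (outname) unlink(outname);
}

static void sigzap(int sig)  // fatal signal: clean up, then die
{
  zap();
  _exit(1);
}

int main(void)
{
  atexit(zap);
  signal(SIGINT, sigzap);
  outname = "out.partial";
  // ... create and write outname here, clearing it on success so zap() keeps the file ...
  outname = 0;

  return 0;
}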

Infrastructure in search of a user isn't _just_ a case where writing functions before you need them is bad. It's a case of later users trying to reuse the same infrastructure and having slightly different needs means you rewrite stuff that wasn't generic enough. Which turns into large rewrites for small gains, and the question of whether this should share the infrastructure at all. (Which if you had all the users lined up and fully understood to start with, you could have grabbed the 80/20 subset in the design phase.) Sigh, it's annoying.

Very much not getting a release out before the plane trip. On the bright side, the orange pi built all the qemu instances in less than a day, and the result presumably works. Alas I need to fix the toysh bug before I can test them properly, but every time I sit down at the laptop there's new email about new problems, and when I ignore that and try to find the window where I was doing a specific fix I'd decided to work on while away from the keyboard, I hit 37 other things I haven't finished yet which should be quick to tie off but aren't. (If they were quick to tie off they wouldn't be lingering, they'd be checked in.)

And then there are tabs where I go "What does THIS mean?" when it's got diff -u <(for i in $(./toybox); do echo $i; done | sort) <(sed -n 's/^[ \t]*usage: //p' Config.in $(echo toys/*/*.c | tr ' ' '\n' | grep -v pending | grep -v example) | awk '{print $1}' | sort) | less sitting there at the prompt, and I squint and head scratch until I eventually figure out "ah, somebody hit a config2help.c bug that boils down to trying to combine the usage: lines of two config options, the parent of which doesn't HAVE a usage: line, how did that happen..." (Probably gentoo edited the source, but I haven't gone and looked at what they patched. Could still be a really weird config selection.) That shell monstrosity was me comparing the list of built toybox commands with the list of usage: lines in the code, which is popping up all sorts of stuff.

I hadn't noticed that i2cdetect has THREE usage: lines at the start of the code (which is wrong and driving the parser nuts), and gzip is default n (because I need to finish it, that would be in pending if it wasn't sharing a file with a promoted command; compression vs decompression side).

A recurring refrain is that OLDTOY() aliases need a better way to share help text: chgrp and chown report not having usage lines because the usage line of chgrp says "chgrp/chown" and the help text of chown says "see: chgrp". And md5sum's usage: line is ???sum and then sha1sum and sha256sum and friends all say "see: md5sum" with no usage line. But egrep/fgrep does the same thing in a different way: they're also OLDTOY() aliases for grep, but each has invisible config options (no description, so it doesn't show up in menuconfig) with no help section in them. Those config options aren't individually selectable, they're default y with a depends on grep (which probably breaks standalone builds: the dispatch table has multiple entries in it so "grep" is always all three, and it might work like grep, not fgrep or egrep, when built standalone). Meanwhile, reboot/halt/poweroff just has a CONFIG block for reboot, meaning if you "make halt" or "make poweroff" the standalone build winds up with an empty dispatch table, which means the ARRAY_LEN() on it breaks the build in main.c. (Which is why the other two are pulling shenanigans with stub config options, which isn't the right fix either...)

There are a few lingering command sub-options, mostly adding -z for selinux nonsense: id -z, mkfifo -z, mknod -z...

A simpler way to get "what's the line after each help" entry is probably grep -A 1 '^[ \t]*help[ \t]*$' toys/*/*.c | egrep -v '(--|: *help$)' but that answers a slightly different question, and wouldn't catch the egrep/fgrep or halt/poweroff issues above.

Ahem: NOT WHAT I'M TRYING TO WORK ON RIGHT NOW. Tangent...


November 5, 2023

The problem with orangepi's git repositories is they're NOT forks of upstream, they're on top of a random historyless "initial commit" of some u-boot source directory. (Which is a common corporate thing to do, it was actually in the vendor's yocto setup instructions at a contract I worked a couple years back. Step 1: discard all history.) So I'm trying to figure out which version orange pi's u-boot used, and their "initial commit" has a Makefile that identifies its version as v2014.07 (a decade old, great) but when I check that vanilla version out and diff it there's a bunch of changes, for example arch/arm/cpu/armv7/cpu.c grew an #ifdef CONFIG_ARM_A7 around the dcache disable on line 43. So I check out the next commit (v2014.10) in vanilla, do a git annotate on that file, and... there's no #ifdef line to pull a commit number from. Orange Pi's "initial commit" of u-boot WAS NOT CLEAN, it was full of their changes. Yeah, the Makefile line adding -Werror to PLATFORM_CPPFLAGS was (I HOPE) something upstream wouldn't have accepted either.

Darn it, I tried to "git checkout master" in the orangepi repo to go back to where I started but they followed the "you can't say MASTER anymore, that's sexist" blog post on Microsoft Github (because nobody respects your autonomy or preserves your agency like Microsoft), so I have no idea what commit it was on when I downloaded it, and probably have to clone the repo again. (I await disney plus editing Tron to have everybody saying the Main Control Program, and the various self-help books about Achieving Mainery. You have created your mainpiece, this mainwork will live through the ages.) Meanwhile on upstream uboot, "git checkout master" worked fine.

Hmmm... that old page said they were building vanilla u-boot to run orange pi hardware, and the vanilla u-boot git says they added Orange Pi 3 support back in 2021 and that they copied it from vanilla linux to do it. And their most recent orange pi commit was two weeks ago. So _probably_ vanilla works fine if I can work out how to configure it?

Hmmm, check the configs/ directory... they have an orangepi_3_defconfig. What happens if I "make orangepi_3_defconfig" in u-boot with my musl-cross-make aarrcchh6644 cross compiler? It complains it can't run "swig", which aptitude says is some sort of interface generator package. Ok, "aptitude search swig" says the package is just "swig", so apt-get install that and... well it compiled, but says Image 'u-boot-sunxi-with-spl' is missing external blobs and is non-functional: atf-bl31 scp which fills me with "joy" (facepalm). It says I should read board/sunxi/README.sunxi64 in the u-boot source, and... Ok, that explains things. The SCP one is power control running on an embedded or1k processor so I need an or1k cross compiler to build it. This also says it's optional. I don't mind losing suspend (it's a server), but I'd be kind of annoyed if I can't reboot the board from the command line. Still, I can try without it first and add this if I need it.

As for the other one, it looks like the bureau of Alcohol Tobacco and Firearms partnered with ARM to produce "ARM Trusted Firmware", which I can't blame China for (although I _can_ blame Japan since Softbank bought them, but Masayoshi Son is unlikely to want spyware in everybody's boards. Then again, he was the main backer of WeWork and has recently pivoted from crypto to AI so we're not talking unassailable judgement here either. But trusting him vs Xi it's absolutely no contest. Although I keep thinking his first name is Wayward. His wikipedia[citation needed] page says he held the record for "person who lost the most money in history" from the Dot-com crash until Elon Musk took the title from him twenty years later. A rags to riches to less riches to rather a lot of riches again story, but an actual self-made man who studied computer science at Berkeley from what his publicists say, and it's his fortune to lose...)

Hmmm... you know, _technically_ or1k is supported by musl. And there's a kernel port and qemu. The reason I hadn't poked at it before is I was under the impression nobody'd ever bothered to make actual hardware, it was an FPGA-only chip like Xilinx' Microblaze. (Xilinx is an FPGA manufacturer, their processor was never MEANT to have an ASIC version as far as I can tell. It was something to do with their FPGAs. Back when I worked at timesys, Mips having an FPGA version you could license from them was their big advantage over ARM. Until the whole Lexra thing, anyway.)


November 4, 2023

Scheduled to fly up to Fade's on the 7th, I should start thinking about packing.

Somebody on linux-kernel wants me to repost my patch, which hasn't changed since last time and doesn't depend on anything in the rest of that patch series. (The amount of bureaucratic maneuvering necessary to insert anything into linux-kernel's hemorrhoidal orifice is just AMAZING.) Sigh, I have meant to post my updated kernel patches for the next musl release, which was supposed to happen on the 1st. I just have a shortage of spoons recently.

More github bug reports: both gunzip and bunzip2 aren't handling the renames properly. Another person hit the glibc crypt() breakage which requires a lot of reshuffling to fix properly.


November 3, 2023

Last day of early voting, walked to the thing. I'd heard of prop A, prop B, and prop 1, but there were a dozen more numbered ones I wasn't prepared for. Gave it my best guess, but the summaries are often misleading. (Basically any republican initiative on the ballot is going to try to sound like it's about feeding kittens and actually be about baiting traps for the fur trade.)

On the way back we finally went to the new grocery store ("sprouts") that opened in Mueller Report Center, which is another member of the Central Market/Black Hole Foods/Traitorous Joe's family, purveyors of snooty vegetables. Nice enough as these things go, but it's only been open a week. Nice sale on cheese sticks with prosciutto wrapped around them, 99 cents each instead of the usual price of somewhere over $3. How "a cheese stick with 2 pieces of thinner cut salami wrapped around it" becomes $3 is one of those mysteries of capitalism, which you are not meant to understand until you make it through harvard business school and can be told about xenu and the spaceship in the volcano. Before then knowing gives you asthma, apparently. Unless I'm mixing up my religions, this one may be about not translating the writings of Milton Keynes into english lest ye be burned at the stake. (I'm pretty sure Adam Smith's writings were an early "do not invent the torment nexus".)

Did not walk to the table tonight, because "there and back again, and what happened after" got me my 10k steps for the day, and a mild sunburn.


November 2, 2023

Walked to UT last night, very slowly with a jacket over a sweater, wool socks, and gloves (one fluorescent orange and the other fluorescent yellow; I've got another pair like them with the hands swapped, but no idea where it's got to). Sat on a couch in a quiet corner of the third floor of Jester West and got a single small programming thing done, then walked quietly home again, coughing and sniffling the whole way. I'm calling it a win.

Watching the discussion on the OSI list continue from what I hope is "Minimum Safe Distance", to quote the recorded voice chastising Ripley in Aliens.


November 1, 2023

New month. I should have a toybox release out today, but I am nowhere near ready.

Giovanni Lostumbo emailed to say he liked my latest rant on computer history (and has added it to his list), and I let him know I broke down and replied to the thread anyway, with a somewhat different off the top of my head rant with several more links to "I do not expect you to believe me" source references, and mentioning topics like Coherent and Imsai and Atari/Activision not in the blog one. (Although given the Motley Fool article I linked to is one I wrote half a lifetime ago, it really only counts as a reference because it gives a specific number and which book I got it out of. And I suppose "this statement was made in 1999, made it through the editorial process of a large publishing institution, and has not been refuted since". But, you know, the New York Times bestselling book by the harvard professor it was quoting probably managed that part...)

The kernel guys are at it again. I can't even.

Alright, speaking of kernel builds, the orange pi built a random linux-git kernel snapshot from earlier this week, using the musl-cross-make toolchains it built locally with a two line fix to swap i686 host for aarrcchh6644 host (changing the tuple and moving it to the front of the build target list sequencing), and there were a number of failures:

$ ls log/*.n
log/armv4l.n  log/armv7m.n      log/mips64.n
log/armv5l.n  log/microblaze.n  log/x86_64.n

Of those, two are actual "the kernel build broke". On x86-64 attempting to AR linux/tools/objtool/libsubcmd/libsubcmd.a it went /bin/sh: 1: ar: not found which means it's trying to run the unprefixed ar instead of x86_64-linux-musl-ar out of the provided cross toolchain. I dunno if that hurts anything in this instance, but it's wrong and that's why the airlock doesn't put the host version of things like ar in the $PATH.

And then on armv5l my build script said cp: bad 'arch/arm/boot/dts/versatile-pb.dtb': No such file or directory when trying to install the generated device tree binary (which you have to feed to qemu to run that board), and the problem is the file moved to arch/arm/boot/dts/arm/versatile-pb.dtb in the kernel source. Why did it grow an extra arm? I have no idea. I should change my script to do a "find" for the file it's installing rather than try to specify a path to where to find it because the kernel guys are old now.
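(I.e. the install step becomes something like this, with $OUTPUT as a placeholder, instead of a hardwired path:)

$ cp "$(find arch/arm/boot/dts -name versatile-pb.dtb)" "$OUTPUT"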


October 31, 2023

Still sick, but less sick. Brain might be back up to 50% today, not sure? I'm in the "I don't drink but may be glimpsing what it's like" version of not entirely coherent, feeling better than I was but probably not in a load-bearing trustworthy way.

Poking at bash arrays to redo scripts/mcm-buildall.sh so when it autodetects the $BOOTSTRAP type with uname -m it can pull the right tuple to the front of the list. I know bash array theory, but it's corner cases all the way down...

$ NAME=(one "two three" four five)
$ echo $NAME
one
$ echo ${NAME[*]}
one two three four five
$ for i in "${NAME[@]}"; do echo =$i=; done
=one=
=two three=
=four=
=five=
$ for i in "${NAME[#]}"; do echo =$i=; done
bash: #: syntax error: operand expected (error token is "#")
$ echo ${#NAME}
3
$ echo ${#NAME[@]}
4
$ echo ${NAME[0]}
one
$ echo ${NAME[1]}
two three
$ for i in {0..${#NAME[@]}}; do echo =$i=; done
={0..4}=
$ X=3
$ echo {1..$X}
{1..3}
$ echo {1..3}
1 2 3

Because curly brackets are resolved before variable expansion, right. I know that because I implemented it in my shell. Still kind of annoying. (Yeah I can call $(seq 0 $((${#NAME[@]}-1))) but it seems kind of silly.)
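(For what it's worth, bash also has ${!NAME[@]} to expand the indices, which avoids the seq entirely:)

$ for i in "${!NAME[@]}"; do echo $i=${NAME[$i]}; done
0=one
1=two three
2=four
3=five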

Yes, I should add all the above to tests/sh.test and implement arrays. At this point "go back through my blog and see what tests I've missed" is part of the todo compost heap.

Hmmm... right now mcm-buildall.sh is just appending -linux-musl to the bootstrap name and ignoring the rest of the tuple, which was fine for i686 because it hasn't got anything. I've been vaguely considering moving the host toolchains to x86-64 for years (the days where the 32 bit build provided useful extra portability are probably gone), so this rewrite doing that naturally is fine... EXCEPT that my x86-64 toolchain build has --with-mtune=nocona because my old netbook died with "illegal instruction". And although my CURRENT netbook hasn't got that problem, if I am worried about "portability" I think I'd still like to keep that?

This is sort of the i486 vs i686 of x86-64, except Intel renamed i586 the "pentium" when they lost a lawsuit against AMD saying they couldn't trademark a number, and the x86-64 series never had sequential numbers that clued you in on what architecture was newer/older than what. Intel's modern theory of backwards compatibility is "Throw out everything you have and give us a lot of money. Now do it again." It's really easy to lose track of architecture versions here, and there's a similar problem with aarrcchh6644 although I'd mostly have to ask Elliott about that. (There's probably a large diagram with boxes and arrows. They presumably run tsort on it.)

Cloned qemu git to the 3b and ran ./configure and it went "*** Ouch! ***" which seems kind of melodramatic. (No really, that's a cut and paste!) It then spent 7 lines editorializing, which is an elaborate way of telling me "install python3-venv". Then it wanted ninja-build, which had to be shoveled out of a very differently worded complaint. And then it wanted pkg-config which was again phrased quite differently (and cryptically), then libglib2.0-dev (which is pulling in a DOZEN other libraries, libblkid-dev, libffi-dev, libmount-dev... what a horrible hairball but that's gnu for you: glibc wasn't enough, they need a recycling bin next to the trash receptacle)... Sigh, I'd be pretty happy _without_ libpixman-1-dev because this is a headless server without x11 installed which should only ever run stuff with -nographic but QEMU provides no obvious way to skip it. (It skipped sphinx so it doesn't build documentation. Bitched about it for multiple lines of output, but it did it. A lot of other stuff just said "NO" in response to the probe and happily moved on. Hugely inconsistent behavior here.)

And after ./configure successfully completed I looked through the "NO" list and remember how they moved "slirp" out into a separate library a couple years back? So if you build qemu without it you haven't got NETWORKING SUPPORT? Yeah, add "libslirp-dev" to the list. The configure dies without a pixel management library but happily chops out the masquerading virtual network support that works fine without root access (and was there back in 2005), because QEMU DEVELOPMENT IS INSANE. (Oh, and it still has a bunch of git "subprojects", which pixman used to be one of but is now a dependency package instead. Why have submodules AND mandatory package dependencies? I mean... pick one? And why is the kernel's device tree compiler one of the.... nope, not asking.)

Alright, so what I needed to know was "sudo apt-get install python3-venv ninja-build pkg-config libglib2.0-dev libpixman-1-dev libslirp-dev" and I wasted quite a while working that out.

Then I went "would running this under taskset 5 be faster?" Would it thrash the orange pi 3b's cache less and wind up a net win in completion time? A dozen years back when I was building Aboriginal Linux on comet boards, the Qualcomm Hexagon's barrel processor design (at least the v2 I was using) had 5 virtual processors but insufficient TLB entries, and thus performance was best around -j 3 and got SLOWER with more parallelism because gcc is a pig that thrashes memory; back before memtest86, compiling the kernel was considered the best Linux memory test. Running "taskset 5" actually requests 2 processors: taskset takes a hexadecimal bitmap of enabled processors, so 5 is bits 0101, meaning even if this sucker's "4 processors" are 2 real processors hyperthreaded to 4, staggering it like that should request two physical processors. I think. (Works that way on intel.)
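(The experiment, for the record; the mask is hex, so:)

$ taskset 5 make -j 2  # mask 0x5 = binary 0101 = CPUs 0 and 2, one job per (hopefully) physical core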

But then I broke out of that after about 30 seconds because I went "Faster compared to WHAT?" I wouldn't be able to tell without a baseline, it could just as easily lose badly leaving half the compute resources on the table, and I have no idea if ninja is smart enough to count active processors or all processors, and running 4 gcc instances on 2 physical processors is quite possibly a worst case scenario performance-wise.

I have no idea why typing "make clean" spams a bunch of "tests/tcg/aarch64-linux-user: -march=armv8.1-a+sve detected" variants. Why are you PROBING TOOLCHAIN FLAGS in the CLEAN TARGET? Because IBM is driving, wearing a Red Fedora and saying "m'ilady" a lot with a pubic goatee, and I am still so tired.


October 30, 2023

Feeling terrible. Very tired. Can't sleep. Still coughing and sneezing and so on. Still drizzle outside, but now it's dropped down to near freezing. Spent 5 minutes outside and wanted gloves. (Which means I don't just have a cold, I have a STEREOTYPICAL cold. Of the "wet and drizzly somehow inexplicably encourage viruses" kind. Next up I'll have joint aches because of the weather and want a rocking chair...)

The orange pi finished building all the toolchains early this morning. My current theory is the memory bandwidth to the 8G ram chip is way too small to satisfy 4 processors running gcc, the rk3566 has 32k i/d cache per processor (256k total) and then 512k "L3 cache" on top of that which is _laughably_ small for this use case (my ten year old laptop has 6 times as much), so having 8 gigs of DRAM doesn't mean much if it's constantly thrashing the cache. Still, the thermal sensors never got above 60 celsius and it chugged away quietly on the counter to completion, and I'm really not asking THAT much from the system.

Symlinked the ccc directory into a clean toybox checkout and set it building mkroot/mkroot.sh CROSS=allnonstop LINUX=~/linux to see how long that takes. I haven't fixed the shell yet and I pointed it at a clean linux checkout without any of my patches, so the result is unlikely to WORK. And I haven't installed qemu on the thing yet anyway. But I'm trying to get confidence that "nightly toybox+kernel builds" is a thing this hardware could conceivably do as a cron job or similar.

Sat down to check email, with a message from Ray Gardner that basically reads to me "I know you said you finally think you understand how to rewrite tsort back on the 19th, but you didn't do it fast enough so here's an implementation that permanently prevents you from writing your own and thus confirming that you DO understand it, which you'll be an asshole if you don't just apply immediately, meaning you will never have confidence that you can maintain this command if I vanish, meaning you should really just remove tsort from toybox entirely and declare it out of scope, except that would be politically even worse." Yes, I'm aware that is a 100% unfair reading and I don't want to SAY that... which is why I haven't tried to reply yet. I am not coherent enough to handle the situation right now.

Yup, I'm definitely still sick. Trying to code right now means I'll probably do more harm than good and make more work for myself in the long run, but I'm behind and falling behinder and making people increasingly disappointed.

The two additional Orange Pi's I asked Fade to order are waiting in her apartment, because it's a shared family account and she selected the wrong address from the pulldown. Oh well, in theory I fly there on the 7th (getting the NEXT round of medical care while the insurance is good. Sleep study this time).

The mkroot allnonstop build made a kernel for about 2/3 of the targets. Really _weird_ failures on some of them, for example x86-64 is trying to call the host "ar" at the end instead of the cross prefixed one. I'm sure I fixed that before, but nobody's been regression testing a cross compile _to_ x86-64 recently I guess? (This was git checkout du jour, I haven't tried to bisect when the problem was introduced.) Didn't even take half a day to finish, that's easily a nightly build on this hardware.

Sigh, to reduce wear and tear on the flash memory I have a TODO item to set up a tmpfs mount for the cp -s source snapshot to copy into, since symlinks don't care about crossing filesystem boundaries and this way all the temp files wind up in the tmpfs. Then the build can tar up any build failures (following symlinks of course) to peruse later. But if the performance bottleneck _is_ the DRAM bus (haven't really tried to chase it down yet but it smells like memory bandwidth, they designed the thing for 2 gigs and 8 gigs isn't FASTER, going at the same speed it actually takes 4x as long to read/write all of it) then a tmpfs might actually make performance _worse_? Well... probably a wash, actually. Ram backed filesystems are essentially mounting the page cache, the data already goes through the page cache anyway because that's how the VFS works. (Long ago there was separate block cache and file cache, but that went bye-bye sometime during the 2.3 development cycle I think? 2.5 at the latest.)
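(The TODO item sketched out, with hypothetical paths:)

$ sudo mount -t tmpfs tmpfs build/tmp
$ cp -sR "$PWD/src" build/tmp/work  # symlink snapshot: the build's temp files land in the tmpfs
$ tar chf fail.tar build/tmp/work   # tar -h follows the symlinks when archiving a build failure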


October 29, 2023

Spent most of yesterday with a sore throat, sneezing, stuffed up nose, and a headache that felt like it was pushing down on all my top teeth until I took a Fake Zyrtec. (Well, store brand.) Not shaping up to be my most productive day. Dunno if it's a bacteria or virus, or if something's blooming after all the rain. Gradually got worse through the day and I didn't accomplish anything after about 10am. Hoping for better today, but not starting out feeling great...

I miss walking to/from UT to clear my head. The weather has consistently refused to cooperate for over a week (downpours every night, wind that downs tree branches when there isn't lightning), and it's really hard to replace a 4 mile walk with pacing inside or a couple circuits around Hancock center. I haven't been getting enough exercise or enough focused work time away from home, and haven't been sleeping well either, which has left me generally feeling overwhelmed by accumulating TODO items. This general lurgy has turned (back?) into a cold with pronounced symptoms, but that doesn't mean I'm feeling BETTER...

I applied Elliott's microcom patch yesterday (and a fix to not use the CTRL() macro that bionic and musl don't #include without another explicit header, so those builds don't break, although maybe I need a more general fix for that) but hadn't pushed it yet because I kinda want to do more cleanup. The immediate stuff is user interface cleanup I wasn't feeling up to (and I wanted to let him comment on switching command letters), and some library work. The progress indicator from count.c should wind up in lib so wget.c can also use it. On the one hand, count's progress indicator seems like overkill and precludes just using sendfile() here, but on the other if I went to all the trouble of writing it already and it's in use, having MORE of them seems silly. And somebody's going to want a bar graph, they always do...

And the "raw mode" line input from lib/passwd.c... well let's face it I have a HUGE todo item to do proper line input for sh.c with cursoring around and command history and so on. I can't block this awaiting that but should maybe figure out what function interface it wants so I can drop in the better one later?

I also have a todo item to make sure the terminal shenanigans I'm doing with microcom don't screw stuff up like they were doing way back when with tab stops being set wrong and so on. I mean, I THINK I fixed that, the recent tty work should address the worst of it if I hadn't earlier: now it's always a delta from the TTY settings it read rather than trying to define everything to a "known good" state that turns out to break some actual serial devices for reasons I never DID track down. But the tty driver used by the turtle's serial board got very unhappy, and I should pull it out and test it again. And set up serial on the Orange Pi, although I can ssh into it now. Still need to reimage it, I am NOT giving Emperor Xi's image any of my ssh keys or passwords. Anyway, I have a todo item to go through all the tty stuff again when I have spare brain (and maybe figure out how to TEST these failures I've only been seeing when I plug in obscure real hardware), which is kind of bundled into the "write new raw line editing" TODO item because that's all TTY black magic...

But I haven't had a lot of spare brain recently. Elliott implying there are people making load bearing use of this thing that somewhat recently had sharp edges I'm not sure I've resolved means I need to come up with some focus here, and I am NOT WELL at the moment.

But this morning there's an email with a ping from Elliott about applying his patch, so I pushed what I had and typed up a reply, which I haven't sent yet because I kept doing rambling digressions about unrelated todo items that belong here more than on the list.

Part of the reason I'm messing with writev() in yes.c is there might be a lib/ function that takes an array of strings and writes them out as one big atomic transaction... except A) the obvious users take optargs[] which hasn't got a "\n" on the end or extra space to add it, B) the other obvious user that came to mind is logpath.c which wants "quotes" around the arguments after the first, which I suppose could be handled by interleaving elements into a new array but the caller would probably have to do that, C) there's a limit of 1024 iovec entries per atomic writev(), so enough arguments stops being atomic, and is that an error or "fall back to malloc and stpcpy() loop"? The complexity moving OUT of yes is good, but... maybe the function here is actually "copy this array of strings into one big string with X bytes extra space" like dirtree_path() does on dirtree linked lists? That's what watch.c and logger.c seem to want...
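
If it goes the dirtree_path() route, a minimal sketch of the shape (hypothetical name and signature, made up for illustration; it's not actual lib/ code, which would use xmalloc() and friends):

// Hypothetical helper: copy a NULL terminated array of strings into
// one big malloc() block, space separated, with "extra" spare bytes
// at the end so the caller can append "\n" or whatever before handing
// the whole thing to a single write().
#include <stdlib.h>
#include <string.h>

char *join_array(char **str, unsigned extra)
{
  size_t len = extra+1;
  char *buf, *s;
  int i;

  for (i = 0; str[i]; i++) len += strlen(str[i])+1; // +1 covers ' '/NUL
  if (!(buf = malloc(len))) return 0;
  for (i = 0, s = buf; str[i]; i++) {
    if (i) *s++ = ' ';
    s = stpcpy(s, str[i]);
  }
  *s = 0;

  return buf;
}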

So many little todo items. I haven't fixed the "toybox sed --help is single line output" bug yet. (Meanwhile "toybox --help sed" produces the full output. The flags are set wrong somewhere in the different codepaths, not hard, I just haven't yet.) I haven't finished adding gzipped help text support yet (it's sitting mostly done in a directory). I need to redesign logpath because gnu/autoconf got worse while I wasn't looking. I got a github bug report about the shell glitch that I've both known about and FIXED at some point but dunno which tree that's in and this fix was part of some larger shell work I never finished (probably the bash -c $'echo $\\\n{PATH}' work where you SHOULD be able to stick a \ line continuation almost anywhere, but it shouldn't filter it out early in line parsing because there are cases it needs to stay, and adding a hundred different checks for it in every corner case where I advance to the next character is not the correct answer either...) So I haven't checked in a fix to that shell glitch, and at a quick glance don't remember exactly where it WAS (I have a general idea but there's like 4 functions calling each other in sequence and I have to trace down the stack again), and when I sit down and go "building defconfig with the Android NDK doesn't work because commands like su.c are still trying to use getspnam() and such from the shadow nonsense that I re-implemented but didn't wean the users off of, sure that open window is in front of me and I feel like I have the brain to understand it, I'll work on THAT this morning..." But I can't. Nobody's waiting for that. There's a bunch of other things people are waiting for that I should be doing instead. Whatever I try to focus on, I should be focusing on something else instead.

No individual log is responsible for a logjam, but here we are...


October 28, 2023

Oh good grief, two of my least favorite licensing people, Larry Rosen and Bradley Kuhn, are interacting on the OSI's license-discuss list, where they're doing bad computer history and insisting that a guy Larry Rosen coincidentally interviewed for a book years ago is clearly the origin of something obvious that was independently invented simultaneously multiple times. I typed up a rebuttal I didn't send, and as usual that's what the blog is for. (There's "someone is wrong on the internet", which is hard to resist, and "Bradley and Larry are amplifying each other's wrong", which is easy to resist because them being wrong is kind of the ground state of existence.)

Given that open source was the default before proprietary software was invented by the Apple vs Franklin decision in 1983 extending copyright to cover binaries (previously "just a number" and thus uncopyrightable), which triggered things like IBM's "Object Code Only" announcement and Stallman's famous printer tantrum, there's a fairly small window for anybody else who thought about this obvious thing before then to publish it. There's a lot of exploratory business model stuff, like the way the WWIV bulletin board system gave source to people who registered the shareware, creating an open source community completely disconnected from unix (one which had never heard of "patch" and was sharing "mod files" as english instructions), all of which shows people trying to find working models for distributed collaboration on source code while capitalism tried to consume everything around them. I'm pretty sure doing "I would gladly pay you tuesday for a hamburger today" with source code was not a unique thought among budding capitalists.

Entire communities like DECUS and CP/M northwest grew up in the decades before Apple vs Franklin. Magazines like "Compute" and "Byte" had BASIC listings in the back of every issue because that's how software worked before the law changed. Software had source code the way food had recipes. Unix wasn't _that_ special, it was just tied to the VAX TCP/IP routers darpa replaced BBN's initial honeywell IMP hardware with in 1979, and thus spread with the internet: in the darpa subsidy days it was the provided router OS for the big broadband pipe you didn't have to pay for. (The initial "flag day" switchover was related to the switch from honeywell to VAX hardware, they went from single byte addresses to 4 byte addresses once everybody had the new hardware.) That community talked to each other via subsidized backbone connections instead of store-and-forward dialup like bulletin boards or uucp. Far fewer storage constraints and much less message purging than wwivnet or fidonet or any of the others.

The bestselling computer in the world was the PDP-8 from 1973 until it was displaced by the Apple II around 1979, and in its entire production run the PDP-8 sold a grand total of around fifty thousand units EVER, meaning there was no consumer base for a "software industry" before microcomputers. Most software before that was either produced by hardware manufacturers bundling software with the hardware they sold, or by local staff maintaining an installation, or collaborations like the one that produced Multics. What little commercial software got created was bespoke development tailored to specific installations because there was no other business model yet due to a lack of customers to sell to. (Not a lot of speculative development when your total potential worldwide market for PDP-6 software was 23 machines: you talk to them FIRST and get paid before putting in the engineering time, and then you DO the work on their hardware because you haven't got one.) The first computer to sell a million units was the Commodore VIC 20 at the end of 1982, and "the computer" was Time's man of the year for 1982. The Apple vs Franklin legal battle happened when it did because shrinkwrap software finally had a potential customer base THAT YEAR. People fought over the money once there actually was money.

Sure unix was doing stuff, but so were Commodore, Atari, OS/360 and the seven dwarves, the BBC micro, NIST FIPS... The idea of dangling an eventual source release like a carrot has been reinvented, independently, multiple times. Start researching history and this reinvention turns out to be surprisingly common: once the forest is full of rabbits, rabbit stew gets "invented". (Heck, I independently invented bytecode in college without ever having heard of it, and was all excited until I did enough research to find pascal P-code from the 1970s. When you encounter campaigns like Konrad Zuse's kids insisting "this guy independently did stuff which was then destroyed and nobody heard about it for 40 years until his grandkids started claiming he should get credit for being first even though if he'd never been born the industry would be no different in any way"... It's tiring. There were lots more that did not get preserved. People only dug up stuff like the Atanasoff-Berry Computer because there was a lawsuit to break a patent on "this now-obvious thing that has been reinvented multiple times but nobody bothered to document it, so let's dig up prior art before it composts". Twenty years ago microsoft patented updating the mouse cursor via XOR because nobody had bothered to patent it, not because nobody else was DOING it. Howard Aiken explicitly credited Babbage, in interviews he talked about having read about the analytical and difference engines in the harvard library and how that gave him the idea for the Harvard Mark I, and that's half the reason we credit Babbage today, because Aiken did. But we can PROVE nobody followed up on Zuse's work, and if Zuse had never been born the industry would be no different in any way, but sure: it's an Apatosaurus not a Brontosaurus because academics call dibs and play musical chairs with credit. Not because any ancestral influence was actually transmitted.)

Did you coincidentally interview someone important, or does he seem important to you because you interviewed him, and he lived in a compost heap of 30 other people doing approximately the same thing with a mail-order software business back in the BBS days?

Rob

P.S. at the tail end of the 1970s a lot of PC and other micro software was regularly cross compiled from big iron (as the first versions of DOS were, and Peter Salus' A Quarter Century of Unix describes the first few bootstrap versions of unix for the PDP-7 being cross compiled from a GE-645 via interdepartmental mail) so possession of the source code didn't always do the average user much good anyway. People didn't value what they hadn't yet lost. It's complicated.

Seriously, they're Doing History Wrong. When you dig up the fossil record you find a lot of stuff that had descendants and a lot of stuff that did not. I want to know what the ancestors of modern stuff were, and what factors led us to turn out this way. Stuff that died out without descendants six ice ages ago is of some interest, but it's a different thing.

Once you wire electricity into everybody's homes to run the radio and the lights, people start inventing electrical devices like vacuum cleaners. The switch from whale oil to kerosene meant people started finding lots of uses for petroleum distillation products because they HAD them. Twitter being founded a year before the iphone's release was not a coincidence: there were probably 30 similar sparks over the previous decade that DIDN'T land in a pile of oily rags. Broadband standards were only established in 1997 and began mass deployment in 2000, which meant youtube's founding in 2005 was about as early as enough people had broadband to watch video at a reasonable resolution. Atom Films was 7 years earlier but had to contend with dialup, which meant it was showing cartoons as flash animations, not video recorded by cameras. Atom.com got bought by MTV and relaunched about the same time as youtube, when video sites were already "a dime a dozen". What set youtube apart wasn't the technology, it was the legal settlement. (Setting legal precedents in court by employing more expensive lawyers for longer than the other side can afford is a prominent mechanism by which rich people buy laws.)

Back when Gates' 1976 "open letter to hobbyists" was widely mocked, people didn't think they COULD lose the ability to share source code just like cooking recipes, until rich people bought new laws. People creating new stuff aren't the ones cornering the market. Only people who think "this is irreplaceable because I couldn't have created it" want to fence in and charge admission to The Precious. The people who actually made it usually think "Anyone could have done that, I know because I did and I'm not that special. And I could do a better job if I tried again or tinkered more". Capitalists who come along and commercialize the work of others get rich, hoarding wealth far beyond anyone's needs. Meanwhile the actual engineers like Steve Wozniak, Paul Allen, and Mitch Kapor all quit working for money when they had $100 million, because that's more money than you can spend in a lifetime (at 5% interest that's just under $100k/week) so there's no more point trying to financially PROFIT from your work anymore, and you can go do your thing without cubicles and meetings and sales goals. Those three walked away from Apple, Microsoft, and Lotus respectively in the mid-80's. With inflation that's about $300 million in today's money, so $300k/week, I.E. "buy somebody a house each week" territory. But mostly it's a big round psychologically impactful number that makes people go "Is that enough? If there is such a thing as enough, that probably qualifies." People who need MORE money than that to feel satisfied have a hole in their heart no amount of money will ever fill.

But no, Larry and Bradley are pursuing "great white beardyman theory". Nobody has to have ever heard of this guy to be copying him, and "pioneering failed business model" is not a thing. Clearly stallman was the first male man dudebro guy hombre to get mad at a printer (rather than someone who chronically spends more time on self-promotion than he ever did on programming), and movies like Office Space owe credit to him, and movies like 9-to-5 and Desk Set don't count because gender beats chronology...

Heck, this isn't even the "promising to escrow it but not following through" model so popular today. According to this 10 year old version of the wikipedia page for "Source Code Escrow", "In 1982, mathematician Dwight Olson founded Data Securities International, the first software escrow company." They're crediting a software package that first shipped in 1988 for something there was already a service you could outsource to 5 years earlier.

I am so tired of this.


October 27, 2023

New keyboard arrived. Asked Fade to order it after I went to Best Buy yesterday and they did not have a USB keyboard. As in they only theoretically carried one type and were out of stock. Plenty of wireless ones, but I don't want to type passwords in via wireless. So I had Fade order me one from amazon. (And you wonder why amazon is winning?)

Broke down and plugged the orange pi into the router as-is. Everything pretty much works out of the box. The board's throttled its performance down farther than I expected in the absence of a heat sink and CPU fan, although the slowness doesn't really seem like it's the processor, maybe the sd card? You'd think disk cache would compensate for most of this, is ext4 doing a lot of unnecessary syncing?

Oddly, the 4 gig image I copied to the card auto-resized itself at some point so the data partition can see the full 32 gig capacity. (This is the old card, I bought a 256 gig card on my previous trip to best buy when I got the 19 inch HDTV. Which is propped up on two stacks of 3 coasters because the plastic stand has a hardwired viewing angle and I'm doing a standing desk thing at the kitchen counter. Eventually I want to remove the monitor and keyboard and access the thing via ssh, but... baby steps.)

The debian image is updating itself from huawei servers. Add in the auto-resize and other similar cleverness in the OS image and I'm going "yeah, there's totally Xi-mandated spyware buried in there; treat the home internet like public wifi until I get that reimaged". Fuzzy's windows machine is a windows machine so probably nothing to protect there, and I don't expect much of the praystation 4 we sometimes watch netflix/hulu/prime on. (I haven't actually played a game on it since spring. I got a switch. The switch is SO much more convenient. And I haven't played anything on THAT this month.) The household does not have any Internet of Thongs IPv6 enabled underwear, so the rest isn't TOO bad. I suppose it could hack the router, but I kinda assume somebody already DID? It's ancient. (And still way way way faster than the one Google gave us, go figure.)

Started an mcm-buildall.sh all toolchains build. Had to tweak it a little and restart because it built an i686 toolchain and then started building everything else with that, which is wrong for arm64 hardware. I have a variable indicating which host toolchain to use (which should set itself via $(uname -m) but doesn't yet) but I also have to pull the appropriate cross/native toolchain pair to the front of the list because all the other toolchains are built with the cross toolchain that matches the host. The initial "host" toolchain is only used to build that one cross toolchain. (There's nonsense with thread local storage and shared libraries and such which... gcc generally does a multi-stage build where it builds itself and then rebuilds itself with the one it just built, and I've mostly told it not to, and this is the price for that.) It looks like we only have one toolchain per machine name (which wasn't historically guaranteed with EABI vs OABI and such, but is currently the case) so I can move the list to something I can search and juggle a bit. It's a little awkward because a lot of my extended tuple data is quoted strings with spaces in it, providing extra gcc/binutils build command line arguments, and doing anything complicated in the shell without splitting the arguments at the wrong place is a bit annoying. (I should probably learn bash arrays, I need to implement them in toysh anyway...)


October 26, 2023

Today I learned that newer versions of autoconf are EVEN CREEPIER, in that if the version of the command in the $PATH doesn't do what they want, instead of reporting the error they silently check the rest of the $PATH to see if there's another version and use that one via absolute path instead. Meaning my "insert stuff at the start of the $PATH to see if it's good enough for the build to use" approach isn't sufficient, I need to create a SPECULATIVE AIRLOCK from which I have REMOVED the host versions of the commands I'm testing.

And also possibly that my logwrap.c design of having the first entry in the $PATH call the second entry in the $PATH, approximately the way ccache and such do, may not be ideal for working around gnu/stupid. (That's not how the one in aboriginal linux worked, how long ago was autoconf pulling this nonsense? When sneering old white guys say "I've forgotten more than you know about this"... no really, it's a PROBLEM at times. I love being second banana to someone smarter than me in an area, pair programming is GREAT. I'm a natural born assistant. I just tend to pick problems nobody seems to know how to do, and head out flailing into the unknown with a pickaxe and some road flares.)

90% of dealing with gnu crap has always been MAKING IT STOP. The proper solution is RIPPING STUFF OUT, not adding to a steaming pile of "gnu". Gnu is like a 2 year old wanting to "help in the kitchen": you have to distract it and put up baby gates to PREVENT it from reaching out and touching stuff it should not, because it has an infantile ego thinking it's the only thing in the world that matters, and everything else is just a prop put there for its personal benefit by sky daddy him-male-self, and none of it continues to exist when baby's not looking at it.

Oh well, at least it's not systemd. Yet. (Give it time, there will be a gnu/systemd. It's like /opt/local, the pathology CAN'T NOT. It's like predicting the behavior of alcoholics. Or in this case pathological narcissists. I don't WANT to be right, and yet...)


October 25, 2023

Feeling a bit overwhelmed with TODO items. Bug reports, rough edges I'm stumbling across and noting, tangents, new patch submissions, algorithm arguments...

Yeah, it just means I'm not clearing them fast enough. But being overwhelmed isn't helping me clear them faster. I'm still JUST under the weather enough that I can't get into "the zone" as it were, so I dink away...


October 24, 2023

The main reason I never fiddled with busybox's loadkmap and dumpkmap is I had no idea what the file format they'd invented was. You use the host's loadkeys to load a keymap, and then you dumpkmap to save it to some kind of binary blob, and that's presumably where these files come from. They're never properly generated, they only exist as a parasite on a larger ecosystem, which is wrong, but I'd like to be compatible with them on the theory that people out there may have existing files they want to use.

And thus I've dug up the old version of busybox I used to maintain (ending with release 1.2.2) which I'm willing to look at since I read almost all of it thoroughly back when I was maintaining it (or in the ~3 years leading up to that) and am already as contaminated as it gets with that old stuff, this is basically just refreshing my memory. (Yes I know the difference between reading publicly published material and copyright infringement, but I prefer to honestly be able to say I haven't seen it in [timeframe estimate].) I'm reluctant to look at the CURRENT versions without a good reason, but I can just imagine Bradley suing me over the OLD stuff. In 2023. I could turn that into a PR mushroom cloud for him...

The actual kernel data get/fetch interface is straightforward enough (and visible via strace): ioctl(fd, KDGKBENT, data) with a pointer to struct {char table, index; unsigned short value;} where table is capped at MAX_NR_KEYMAPS and index is capped at NR_KEYS, and then the set is the same except KDSKBENT to set instead of get. One ioctl per key, and I'm guessing that's the scancode that corresponds to each ascii value. I'm unclear why there's more than one table or how you switch between them, but eh. (Unicode maybe...?)
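
For reference, the fetch side of that interface boils down to something like this sketch (not toybox code; assumes it's run on a virtual console, and skips error handling):

// Sketch: one KDGKBENT ioctl per (table, key) pair, using struct
// kbentry from linux/kd.h and the size constants from
// linux/keyboard.h. Only prints nonzero entries.
#include <fcntl.h>
#include <linux/kd.h>
#include <linux/keyboard.h>
#include <stdio.h>
#include <sys/ioctl.h>

int main(void)
{
  struct kbentry ke;
  int fd = open("/dev/tty", O_RDONLY), i, j;

  for (i = 0; i < MAX_NR_KEYMAPS; i++) for (j = 0; j < NR_KEYS; j++) {
    ke.kb_table = i;
    ke.kb_index = j;
    if (!ioctl(fd, KDGKBENT, &ke) && ke.kb_value)
      printf("table %d key %d = %#x\n", i, j, ke.kb_value);
  }

  return 0;
}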

The file format busybox is writing is... moderately sad. The file starts with the 7 character string "bkeymap", followed by MAX_NR_KEYMAPS (256) bytes that are set to 0 or 1 to indicate which tables are included. This REALLY wants to be a bitfield, but... isn't. It's not aligned either, a 256 byte entry coming after a 7 byte header. Followed by a bunch of unaligned shorts, all those scancodes written out in binary host order. You can't dumpkmap on one endianness and loadkmap on the other, and mmap() is out of the question due to the alignment. This is just painful.
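
Consuming the format is at least mechanically simple. A loadkmap-shaped sketch of the above description (assuming busybox's hardwired 128 keys per table rather than the kernel header's current value, stdin as the file, stdout open on the console, and no error checking, which a real version would obviously need):

// Sketch: 7 byte magic, 256 enable bytes, then 128 host-order shorts
// per enabled table, each fed to the console via KDSKBENT.
#include <linux/kd.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
  char magic[8] = {0}, flags[256];
  unsigned short value;
  struct kbentry ke;
  int i, j;

  read(0, magic, 7);   // should be "bkeymap", unchecked here
  read(0, flags, 256); // one enable byte per table
  for (i = 0; i < 256; i++) {
    if (!flags[i]) continue; // table not present in this file
    for (j = 0; j < 128; j++) {
      read(0, &value, 2);    // unaligned, host byte order: sigh
      ke.kb_table = i;
      ke.kb_index = j;
      ke.kb_value = value;
      ioctl(1, KDSKBENT, &ke);
    }
  }

  return 0;
}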

The old 1.2.2 version of dumpkmap (which I did not write and never actually used) is hardwiring the table so that only the first 14 entries are set, and then switching off every 4th one of those. The result is basically for (i = 0; i<14; i++) if (~i&3) do_table(i); but they did this more elaborate thing instead, I do not know why. If dumpkmap is the only source of this info, and it's hardwired to produce this table, can it ever NOT read stuff in this order? And why is it skipping tables...? (Why is it using more than ONE table? What do the other tables DO? *shrug* What it does and why it does it are two different things, if I want to know why I probably need to read the kernel source or some such...)

For some reason the struct definition and constant #defines are split across two headers: linux/kd.h has the struct and the ioctl values, and linux/keyboard.h has the two size constants: NR_KEYS=128 and MAX_NR_KEYMAPS=256. Busybox has of course copied all four into their source, but I consider including linux headers from individual toybox commands the right thing to do... except my /usr/include/linux/keyboard.h on Devuan Botulism #defines NR_KEYS to 256, not 128. Which is funky because this value is part of the binary format busybox made up (each table is NR_KEYS shorts long), so what did they do when it changed? (Different initial magic string to differentiate the new format?) Checking the current busybox source... they didn't do anything, they still have their copied value hardwired at 128. They do not appear to have noticed yet that the kernel switched it to 256 back in 2005. Great.

Maybe I should just implement the ascii parser for loadkeys. Doesn't seem more complicated than lsusb and lspci parsing the vendor database files. I'd have to learn more about what all this data MEANS, but maybe that's a good thing.

But not a today thing. A TODO list thing.


October 23, 2023

Although loadkeys is installed in the debian base OS image, no actual keymap files were available on the orange pi image. The /usr/share/keymaps directory the man page references does not exist, and find / -name '*.map' does not find any files. Dug around and found defkeymap.map files in both the linux kernel source (drivers/tty/vt, this one with comments and various how-to-get-umlauts extensions) and the loadkeys source (a more bare bones version), which reminds me of the todo item that busybox has loadkmap, dumpkmap, setkeycodes, and scankey, which I've never tackled in toybox.

The linux-kernel developers have been working diligently to deprecate tty consoles, and everybody who actually sets up systems has been putting them BACK with framebuffer and such as necessary, because the VT100 hardware existed for a reason. (The 8008 processor was created for a "glass tty" project, modern microprocessors exist in part because a full screen of text attached to a keyboard is a useful way to interact with a computer, and grizzled greybeards considering themselves above that is a sign they're reaching the end of their useful lifespan).

Anyway, did a "raw" download of the second file and copied it onto the cheap 32 gig USB stick and... I can't plug the USB stick into the orange pi while the keyboard and mouse are plugged in, because the 4 USB slots are close together in the standard raspberry pi formation, and the USB stick's case bulges out a bit in all directions.

Some rather extensive jiggling and a couple reboots later I got the stick plugged in, ran loadkeys on the file, and... nothing happened. The enter and backspace keys still didn't work. So I ran scankey and... they're not generating scancodes. Ah. So that's why that keyboard was on the shelf with the one missing the "x" key, it's a broken keyboard I should throw out. (Dunno where electronics waste recycling lives.)

Right. Need to buy a new keyboard. (I could try to rely on the serial port, but that's even fiddlier and requires a second computer to be present and working to interact with this computer.)


October 22, 2023

Responding to a post to the list and reading a page I dug up in response, I'm reminded of something Linus Torvalds said once (probably on lkml): throughput can always be increased with parallelism and increased clock speed and so on, but latency you can never get back.

It's a bit like that old trick question about "a car goes X miles per hour for Z many miles, how fast would it have to go the rest of the way to get there on time" and the answer is "faster than the speed of light" because it's already used up all the time and isn't there yet. Add up your pauses and it doesn't matter how fast you're going the rest of the time, you're not finishing before the total amount of time you spend waiting. "Hurry up and wait" is not solved JUST by hurrying up, you have to address the waits.
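
Worked through with the classic setup (my assumption of the usual version: the first half of the trip happens at half the target average speed), for distance $D$ at target average speed $v$ the time budget is $T = D/v$, and

$$t_{\rm used} = \frac{D/2}{v/2} = \frac{D}{v} = T$$

so the entire budget is gone with half the distance still to cover, and the required speed for the remainder is unbounded.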

Spent the morning cleaning off counter space next to the router, finding and shuffling cables, trying to research whether granite is a sufficiently insulating surface to leave the bare board on (it would act as a lovely heat sink if so, and I _still_ don't have a case that fits it, although I have multiple little cardboard boxes that should). But I don't want to move the little TV there (not persistently, anyway), so I need to get a boot image set up.

Researching to set up the Orange Pi 3b properly. I walked to Best Buy and bought a 19" HDTV I can actually see all the console text on... but the NEXT problem is the keyboard mapping is nuts. (Chinese? It's a bog standard USB 101 key keyboard, you'd think it wouldn't NEED mapping, but that's not the world we live in.) The ENTER key does nothing (but the one on the number pad works), the backspace key does nothing (I have to hit CTRL-H), and eventually I expect to need a tilde or something I'll have to look up the unicode hex entry procedure for. Apparently the fix to reconfigure the keyboard is some variant of "sudo dpkg-reconfigure keyboard-configuration", or "setxkbmap us", or "loadkeys us; localectl set-keymap us". (There are various web pages on this.)

But the bigger problem is A) I'm not running a chinese distro on a machine I intend to keep (the engineers seem nice but I do NOT trust xi-who-must-be-obeyed not to have ordered a flunky to order a flunky to make sure all their software has backdoors in it), B) it's got systemd all over it (ew). What I really want is to be able to run a stock devuan on it, or failing that build u-boot and linux from vanilla source and run vanilla devuan userspace. The current distro has all the hardware working (GPU and everything) but the minimum I NEED is the processor, memory, thermal sensors, USB, ssd, and ethernet. Keeping the wifi and emmc and so on would be nice, but I ordered the cheap one without storage populated and I bought an ssd and thumb drive but not emmc. Looking up the deltas between "stock install" and their image is presumably useful. And I haven't installed a CPU fan so it seems to have throttled itself to 1.2 ghz instead of 1.8, I should probably retain that.

So I'm reading the 350 page PDF, which is full of good information presented in basically random order and the index is trash:

page 107: serial setup (USB->TTL, gnd/rxd/txd)
page 127: wifi setup (nmcli, nmtui)
page 143: create_ap
page 179: sensors (thermal sensor)
page 191: sudo orangepi-config
page 289: linux sdk

Some nice stuff here. I've struggled to set up wifi access from the command line (let alone my own access point) before. (Unencrypted is easyish, associating with a password was like 5 stages of passing data between different programs, and of course I was trying to do it in noisy areas where the card/driver kept overloading when it did a scan.)

I should probably set up a serial console so I have a non-HDMI console for the hardware. (I have a USB to TTL module with the three little wires each with a pin header, and the diagram shows where to connect the gnd/rxd/txd pins. I _think_ black is GND? That's the dangerous one.) Of course it needs the same "disable flow control" magic I recorded on the j-core web page ages ago: "stty -F /dev/ttyACM0 -crtscts". (Or whatever this device shows up as, possibly ttyUSB0.) And of course sudo service modemmanager stop if your distro is being fscking stupid and running a daemon that sends random hayes AT commands from the 1980s down every serial device it finds to make sure things like updating flash get reliably corrupted. (Guys, don't put that in the base OS packages. Just no.)

Oh hey, it explains the stupid new systemd ethernet name policy that the most recent debian upgrade broke, I was wondering why the ethernet device was no longer ETH0. But you can rip it out again via "net.ifnames=0" on the kernel command line, and hopefully devuan won't be that stupid.

What I'd really like to do is get QEMU emulating this hardware, so I can try out new potential setups without hunching over a non-portable piece of hardware. The elderly cat remains feisty and clingy, and you can't ssh into a machine until after you've set it up at least partway. There's a page on doing this, but as with most such pages it's several years old, describing an earlier iteration of the hardware. Still, I only care about a subset of the devices...


October 21, 2023

Who was it who predicted computers would shrink down to a bump on a cable? We're not there yet, but we're in "board does not weigh enough to pull the HDMI cable flat and instead dangles at a 45 degree angle" territory. Oh well, extra ventilation vs the thermal conductivity of granite, either way...

When I plug the orange pi into the big TV, the display is funky in two ways: 1) it's solid black unless I switch away and then switch back in the input settings, 2) the display is bigger than the screen so I'm only seeing a chunk in the middle, with the first few lines off the top, the first few characters of each line off the left edge, and then when it gets to the bottom I can't see that either. So there's a certain amount of "reset" and "echo -n '     '; command" going on with the full screen text console.

The little cube computer had this problem too, back when it was working: it's not the computer, it's the TV. The TV's resolution negotiation is not dynamic when the input changes. I set the cube up as a server and used VNC to access its desktop a couple times, but mostly did the standard "ssh -t screen -dR walrus" to run persistent stuff under daemonized screen (ctrl-x ctrl-d to detach leaving everything running; the capital R kicks any other tab that's logged in, the way vim refuses to do, forcing me to manually :recover and rm the .file.txt.sw? stack that sadly tends to accumulate).

Spent half an hour not finding the tiny video screen I ordered back in Wisconsin, it might be in the third bedroom that's full of all the boxes we dumped there when we stopped paying for public storage. Yes it's a waste to not use all the rooms in the house, but it's also a waste to pay $100/month to keep a pile of boxes we haven't opened from our past several moves. Yeah, I should go through it and cull, but Fade's up in Minneapolis so hasn't needed office space, and the last time I opened a box from there, taking out some very nice waterford crystal I inherited from my mother, Fuzzy broke a piece within literally the first 30 seconds by knocking it over onto the granite countertop. It's mostly papers and old computer equipment anyway. Recycling 20 year old PCs is a thing I don't REALLY know how to do (I used to know where to take batteries and fluorescent light bulbs, but should check if it's still there after the pandemic), and it seems environmentally impolite to just put them in bulk trash. Plus "check and scrub the old hard drives". There's digital sorting as well as the paper kind...

Last time I went through the boxes I was trying to find any more Munchkin "heart of the anomaly" cards from Linucon. We printed 1000 in total, which came in 2 boxes of 500. The full box I gave to the Linucon year 2 chair at the end of the event (who swears he doesn't remember this, but I was there), and the other box (the one we'd distributed however many hundred from to attendees, but it still had some left) I thought I gave to Mark? Not sure. There were a dozen or so leftover attendee schwag bags that had a card each in them (we prepped bags ahead of the registration line and had a buffer), I took some of those out to give to people over the next few months but thought maybe I'd missed some? But a pretty thorough rustle through the boxes didn't find any, and I've done that more than once over the years. (I owe John Kovalic a copy! His price for doing the art was he wanted one of the cards, and sending some to him was the reason I gave the year 2 chair that box. Totally blanking on the chair's name at the moment, but it has been 20 years...) I gave my personal card to Steve Jackson so they had one for their archives. You'd think Steve would have gotten one in HIS schwag bag as a Guest of Honor, but "we got special permission to print a limited edition munchkin card" apparently did not trigger "hey, maybe people will want this later" immediately, despite years of pokemon and magic the gathering and so on. Oh well... (I still have the art files in ancient email archives, but the _point_ is it was a limited run of 1000.)

("Heart of the Anomaly" is both classic star trek technobabble and a munchkin necklace, a heart-shaped jewel worn by Gilly the Perky Goth, which has an extra +1 if signed by Wil Wheaton, who was also GoH at Linucon 1 and one of the first scheduled events after opening ceremonies was "Guests of Honor all play Munchkin with each other" where we made sure to deal it into Wil's initial hand. (It was right after opening cremonies to attendees didn't show him the card first before he examined his hand and discovered the new card we'd made.) We ALSO made sure, by having someone ask for his autograph beforehand, that he did not have a pen. And that everyone else at the table did (Steve, Howard Tayler, etc), to pull out and show him. It was lovely.)

(And yes, I meant +1 EXTRA if signed by Wil Wheaton. I did not mean for his signature to REDUCE the bonus of the card, and there was an errata on the official munchkin website clarifying this. We were trying to taunt him with his lack of pen, and give the attendees something to interact with him over during the rest of the con. Incentive to NOT sign it would defeat the purpose.)

Anyway, I need an HDTV monitor I can actually adjust the edges of (dowanna screw up the big TV's settings for watching video and playing games, not that we use it all that often), and it looks like the cheapest way to avoid Amazon here is to walk to Best Buy. (Shipping makes buying local about $5 cheaper.) It's a thing I should have anyway, so...


October 20, 2023

Thought I should set up another home server to do fire-and-forget builds under "screen -dR walrus" that I can come back to later without driving my laptop into swap, draining the battery, or just opening it up and realizing the compute job I left running has another 2 hours to go and wasn't making progress while it was suspended. Plus I can set up nightly builds of linux kernel git snapshot du jour, mkroot du jour, rebuild the musl-cross-make toolchains more regularly, run the toybox test suite on each new commit (which github does not do reliably and then fishing logs and such out of it is a pain), eventually automate the Linux From Scratch builds for regression testing... I have a bunch of uses for a little server grinding away at stuff.

Alas the "gmktec nucbox" (tiny cube computer) I was using before spontaneously bricked itself a while back. I think I plugged it into the switch docking station's USB-C power supply, which instantly fried it because USB-C is not compatible with USB-C, as the rasperry pi guys found out (isn't it CREEPY that the "full featured" USB cables have a chip inside?) back before they hired a cop to build covert surveilance devices and literally bragged about it on mastodon. ("Our stuff is watching you! We couldn't be more proud! Buy it, install it in your home, put it in all your critical infrastructure!" And they couldn't understand why people found this a bit tone-deaf...)

So I ordered an Orange Pi 3b off amazon a couple weeks back (saw somebody's quick review and went "sure, why not"), and it arrived today. This is a cheap shenzhen pi clone from a vendor I've been vaguely following since 2018, but back then an 8x processor board with 2 gigs ram was not balanced like I wanted. This board has an 8g RAM 4x CPU variant for about $60, in a standard pi form factor. Rustled up a 32g sd card and set about downloading debian for Orange Pi 3b... which is on a google drive that says (and I quote) "Download quota exceeded for this file, so you can't download it at this time" for all four files (desktop and server images for the current and previous debian releases). But there's a "download all" option in the corner, which zipped the four files into two files and then did multiple parallel downloads of those two zip files. Which total 3 gigs, and expects to take over 20 minutes to download via Google Fiber (downloading from Google's server using Google's connection which brags about its speed, and it's managing 2.5 megabytes/second), and of course since they made zip files out of them I can't check the contents before the download completes because Google _converted_ the uploaded files into a file format that puts the index at the end of the file (thus even slight truncation renders the entire file useless). I continue to suspect that those 12k people Google laid off in January may have been load-bearing.

So the first zip file contains the two Debian "word that starts with B" releases (desktop and server), as opposed to the older Debian "word that starts with B" releases that went in the second zip file. (Thank you Debian.) So "Bookworm" is stable and "Bullseye" is "oldstable" and I will not remember that but if I'm only dealing with one I don't have to.

Extracting the zip file gives me a .7z file, and strangely my devuan laptop has a 7z archiver installed. (The relationship between 7z and xz and lzma is something I'm still unclear on.) The archiver is another one of those tar/ps command parsers where you don't put a dash on the command letter, so "7z l blah.7z" to list the files and then "7z x blah.7z" to extract them... which gives an elaborate multiline progress indicator to make sure you know this ain't a unix tool designed to be scriptable. Did I really need the lines "Scanning the drive for archives:" and "1 file, 488048704 bytes (466 MiB)" echoed out at me when I specified the archive name on the command line? What the output does NOT contain, at least not when I came back after it finished animating, is the names of the files it extracted. Because why would I need to know THAT?

The archive extracted to a blah.img file and a blah.img.sha file, which MAKES NO SENSE. If an adversary replaces the archive with a trojaned one, they could replace the sha1 that's IN THE SAME ARCHIVE. You did not prove anything. Right: it's china, anybody who asked these sort of questions wound up shot or living in Uighurs' homes under orders to prevent married couples from ever having sex so Xi's han supremacist genocide could be nominally bloodless at least where people can see it.

The blah.img file is 3.7 gigs, with a 1g partition and a 2.5g partition. So to recap: an orangepi web page linked me to a google drive where I got a zip file containing a .7z archive which contained an .img file with two partitions, and now I'd like to figure out how to sanely put the 3.7g image onto a 32g sdcard. I guess just cat it, accept the smaller size for now, and try to repartition and resize filesystems later? The current question is just "does it boot"?

What I really want is the bootloader so I can install my own Debian on it. The official raspberry pi bootloader is a nest of horrible binary firmware for the broadcom chipset, which loads an extensively forked kernel (again, thanks broadcom, but it was the pi guys who selected that component manifest). But over in Orange Pi land there are pages at least implying that vanilla source builds have been made to work on orange pi hardware. (Number two has to try harder, I guess.) I don't really care about half this hardware (the GPU and "AI engine" and so on), I want a little fire-and-forget build server. It's a 4xSMP 64 bit 1.8ghz processor with 8 gigs of ram, it could be faster than my laptop. But of course the above "how to build" is an old page circa 2017 about a 32 bit board, and this is a 64 bit board that came out two months ago. (For once, actually using current hardware! And paying the penalty for it.)

But let's boot it up with their stuff first. (It's their software running on their hardware, any exploits or spyware can stay there.) They have an english wiki which links to a 358 page PDF manual with pretty much the same contents as the wiki... Sigh. Of COURSE it wants a CPU fan. Will it auto-throttle itself if I DON'T order a CPU fan for it? (And will it cook itself if so?) I was kinda hoping this could eventually live in the kitchen physically plugged into the router's gigabit ethernet. (That way it's not fighting for wifi bandwidth with everything else in the house, I can forward it a port to ssh into from outside the house and life is good. In theory the google fiber is far faster than the house wifi. In theory.)

What does "backup battery interface" mean? Does it come with a battery for the onboard clock or do I need to provide one? (I mean, it was cheap... And does NOT quite fit in the raspberry pi case I've had the turtle board in, it's a fraction of a milimeter too long. I dunno if this is intentional or a design tolerances thing or what.)


October 19, 2023

Ray Gardner sent me the explanation of Knuth's topological sort algorithm from the awk book, which is MUCH clearer. When reading in the before/after string pairs, you make a list of unique strings you've seen so far, each of which has a count of ancestors and a list of successors. (This is where the insertion sorted array comes in, because you binary search to find each entry you're updating as you read the input.) Once all the data's read in and the big string array is finished, you loop through the array once adding the ones with zero ancestors (which can go out now) to a second list of TODO items. Then loop through the TODO list, and as you process each entry you iterate through its "successors" list decrementing each entry's "ancestor" count, and if that count falls to zero you add that entry to the tail of the TODO list because it can go out now. You don't even remove the entries from the big string list because the TODO list is what matters for the second half of processing. You can detect cycles by decrementing a count as you add entries to the TODO list, and if the count doesn't fall to zero you had at least one cycle.

That's it. It's reasonably simple, the core data structure is just struct { char *name; unsigned ancestor; struct arg_list *successor; } and I should definitely redo tsort with that... once I clear some space.
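
For my own notes, the second half compresses down to something like this sketch (with the successor list flattened to an index array instead of toybox's struct arg_list, and assuming the parsing pass already populated the node array):

#include <stdio.h>
#include <stdlib.h>

struct node {
  char *name;
  unsigned ancestor;    // predecessors that haven't been output yet
  int *successor, slen; // indexes into the node array
};

void tsort(struct node *n, int count)
{
  int *todo = malloc(count*sizeof(int)), head = 0, tail = 0, i, j;

  // Seed the TODO list with everything that has no ancestors.
  for (i = 0; i < count; i++) if (!n[i].ancestor) todo[tail++] = i;

  // Drain the list: emitting a node may release its successors.
  while (head < tail) {
    i = todo[head++];
    printf("%s\n", n[i].name);
    for (j = 0; j < n[i].slen; j++)
      if (!--n[n[i].successor[j]].ancestor) todo[tail++] = n[i].successor[j];
  }

  // Anything that never made it onto the TODO list was in a cycle.
  if (tail != count) fprintf(stderr, "%d node(s) in a cycle\n", count-tail);
  free(todo);
}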

It's really hard to sit on my hands instead of trying to make suggestions, because I don't WANT them to add more gratuitous complexity to these commands and solving their problems so they do wind up implementing something would not be _success_. (There's been more than one Dr. Who plot where he got tricked into fixing the enemy's stuff because he couldn't NOT point out where it was wrong, although in the 7th doctor's first episode he was still delirious from the regeneration. And for the 10th doctor, Yana hadn't switched over to being a bad guy yet.)

Elsewhere, I'm not sure what success looks like because somebody seems to have turned on strict overcommit and then their toybox build broke, but switching snprintf() for xsnprintf() just means it exits with a different error message? The build still breaks: it did not process the input because libc returned NULL from snprintf(), and I dunno why it did that. Nobody else has seen it do that, and the code hasn't changed since April 2019. We've had ten releases since then. I'm pretty sure it's worked for somebody other than me in that time. (Yes it's janky but it's operating on known inputs in a build environment, this code DOESN'T SHIP! Did they change the inputs? Obviously they changed the compiler and libc but I've tested it on multiple of each...)

That said... I honestly don't know what's going wrong here. (As I explained in the github comment.) I looked at the gentoo bug too and it's not enlightening. This is 80% likely something wrong with their gentoo build environment, but... I mean it could always be my code doing stuff wrong, but... how? (Even switching on strict overcommit still SHOULDN'T cause this to return NULL? I'd think?)

The thing is, this isn't just gentoo's build environment (overcommit switched on?) using gentoo's compiler (fortify or electric fence or ASAN) using gentoo's libc (outright bug?) but this is also gentoo's patched version of toybox: did they change the help text in a way config2help is misparsing?

This used to build for gentoo, and isn't now. What changed?

AHA! I misread the bug report. (My eyesight was not having a good time with github changing the font and background color repeatedly, and from some of the text I'd assumed it was the same "we hit it here but it was allocated over here" two part thing ASAN gives, so I saved some eyestrain by not looking closely enough at the other half, which was a different test on the SAME part.) It's not strndup() RETURNING null, we're PASSING it NULL. Which means it is my code, albeit parsing broken input. (And wrote up more explanation.)

All this nonsense is vestigial and I've probably removed enough sub-options at this point that I could just yank the rest? We still need the complexity mkflags() does, but we are NOT collating help text across sub-options. (There's stuff wrong with the help text parsing and we may need to figure out how to do a proper usage: line for md5sum and sha1sum having the same help text, but the current stuff isn't handling that right as is, so it can probably go.)

And toybox sed --help is producing --short output. That's a bug. Fix it later. And toybox sed -s treats 'q' as skipping to the next file (separate files!), but gnu/sed ends input entirely. Sigh. Another bug, throw it on the pile. (I noticed because it means I have to do sed -s -n '/^config /,/^\*\//{/^\*\//d;p}' toys/*/*.c instead of the simpler sed -s -n '/^config /,${/^\*\//q;p}' toys/*/*.c which is sad.)

Alright, fixing up yes.c to use writev() with toybuf as an iovec array of repeated mappings of the same output: the old toybox is doing 1.2 megs/sec, my new one is doing 104 megs/sec, and the debian one is doing 3.6 gigs/sec.

$ toybox timeout 1 yes | ./count -l > /dev/null
1340528 bytes, 1.2Mb, 260Kb/s, 0m01s   
$ ./timeout 1 ./yes | ./count -l > /dev/null
108696064 bytes, 104Mb, 21Mb/s, 0m01s   
$ timeout 1 yes | ./count -l > /dev/null
Terminated bytes, 3.6Gb, 704Mb/s, 0m01s   

Hmmm, still not good enough, need to submit bigger outputs...


October 18, 2023

I did not expect to spend this evening's brainpower optimizing yes.c, but here we are. I wandered through setvbuf(toybuf) but that only got me up to 53mb/s and the debian one is doing almost 3 gigabytes/second, so now I'm going "screw it: writev() with a redundant iovec stack!" (I can fill toybuf up with an array of 256 of them all pointing to the same thing.)

If I was cheating on ALL cylinders I'd iterate through argv[] and turn the NUL terminators into spaces, because Linux is calling us with contiguous environment space (and it doesn't screw up ps/top output THAT much because they're collating the individual arguments with spaces when they display them anyway, so it should look the same once they've done that)... except that xexec() isn't guaranteed to give us contiguous inputs, those strings can come from anywhere. So our first call would work but later calls would not. Alas, that's the downside of being too clever. So I need a contiguous malloc() big enough to assemble the output string into, pretty much what dirtree_path() is doing except on an array instead of a list.

And THEN I need my loop to retry short writes, so the pipe filling up or SIGSTOP/SIGCONT doesn't glitch the output by making it restart at the wrong place and truncating one of the copies...
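
Something shaped like this, probably (a sketch of the plan rather than the actual commit; the fiddly part is the resume-mid-line bookkeeping after a short write):

// Fill an iovec array with 256 references to the same preassembled
// line, then loop writev() forever, resuming partway through a line
// after a short write so no copy of the output gets truncated.
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

void yes_loop(char *line) // line already ends with '\n'
{
  struct iovec iov[256];
  size_t len = strlen(line), total = 256*len, done = 0, first, skip;
  ssize_t out;
  int i;

  for (i = 0; i < 256; i++) {
    iov[i].iov_base = line;
    iov[i].iov_len = len;
  }

  for (;;) {
    // Skip the entries already written in full, and temporarily
    // shorten the first partially written one.
    first = done/len;
    skip = done%len;
    iov[first].iov_base = line+skip;
    iov[first].iov_len = len-skip;
    out = writev(1, iov+first, 256-first);
    iov[first].iov_base = line; // restore for the next pass
    iov[first].iov_len = len;
    if (out < 0) return;        // EPIPE and friends: let the caller exit
    if ((done += out) == total) done = 0;
  }
}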

(No, I'm not really recovered. But I am caffeinated!)


October 17, 2023

Stephen Colbert has covid. One of the authors I follow on tumblr has covid. One of the youtubers I follow has covid. I guess I'm in good company?

Trying to feel better. Unfortunately, it doesn't quite work that way. Sigh, I should still get the updated covid booster up at Fade's next month. Which is probably going to be another couple days of really no fun, getting vaccinated against something I've already had but can't prove, so my immune system goes boing. Fade just filled out paperwork for another teaching gig (it's all piecemeal and adjunct work until she finishes dissertating) and they wanted a photo of her vaccination card, so I should remember to bring that so they can update it with the new shot...


October 13, 2023

Sigh. So I've gotten what I'm pretty sure is covid multiple times since 2020. I've never timed it right to test positive, but I don't test often (they're a finite consumable resource, I tend to hoard those) and I've reacted REALLY strongly to a couple of the vaccinations, which implies my immune system was already familiar with the pathogen. Plus the symptoms du jour roughly matched what people said to expect from whatever round was going around, and I've never been isolating THAT hard (what with a grocery store two blocks away that I've been in the habit of visiting daily since I moved into this house). And these days they say cloth masks were always useless and it's N95 or nothing (despite earlier saying N95 had to be specially fitted or it didn't work).

Anyway, some of the previous rounds of probably-covid did a sine wave thing where I'd feel better for a couple weeks and then get worse again, and I'm having such an aftershock. I don't feel BAD exactly, just... I can't focus, am very tired in a way sleep doesn't seem to help, my heart rate's a bit elevated even when lying prone, and everything aches.

Weekend coming up, let's see if that's sufficient recovery time. And some prophylactic zyrtec and ibuprofen...


October 12, 2023

The first thing Linux From Scratch does is build a chroot (I.E. "chapter 5"), and the lowest hanging fruit for toybox is providing the command line utilities necessary to build that chroot under mkroot.

It's actually a bit simpler than that, because I'm externally building a native compiler toolchain using musl-cross-make, so I can break the compiler/libc part out and deal with it separately. If all I wanted to do was bootstrap LFS under mkroot, then I need to get the native compiler hooked up, provide writeable scratch space, marshal the data in (source tarballs and build scripts and so on), and run the thing until it can chroot. In practice I want to do way more than that, but it's a good first milestone reference point.

Musl-cross-make is building binutils gcc gmp mpc mpfr linux musl, which means the chroot packages that AREN'T built by musl-cross-make are: bash coreutils diffutils file findutils gawk glibc grep gzip m4 make ncurses patch sed tar xz.

The glibc build is funky for multiple reasons:

  1. I don't want to use it (I prefer musl and failing that bionic)

  2. Building glibc needs host dependencies like perl that nothing ELSE in the LFS base system needs, at least not since I patched perl out of the kernel build. (The other packages will call perl for pod2man if it's available, but will skip it if the probe finds perl isn't available, because they don't NEED to install man pages during a bootstrap cycle.)

  3. I'm not entirely sure that glibc CAN be bootstrapped under a musl system, it's one of those FSF head up its ass packages where (at one point) glibc had a hard requirement on host glibc in order to build. I remember it couldn't be built under uClibc because of some thread local storage assumption (it built a tool that required a feature uClibc didn't provide in order to build glibc, and did so even when you'd theoretically switched that feature off) which is why my aboriginal LFS bootstrap skipped doing a toolchain build. I _couldn't_ get glibc to build, so I couldn't build the LFS toolchain, so I built the LFS packages with (mostly) the LFS configuration under the aboriginal toolchain.

    But that was a long time ago and a different libc. It's likely musl DOES provide whatever bootstrapping feature glibc needs, because Rich wouldn't leave that out. He was doing glibc _binary_ compatibility for a while there, which is something different versions of glibc don't even reliably provide with each other. (Heck, the FSF added a whole new symbol versioning nightmare to break glibc compatibility with.) But if I didn't care about "building gnu patch" because I've already got a good patch, I wouldn't be doing most of this exercise. I'm trying to prove I _can_ bootstrap the other environment, if necessary, and that includes building glibc.

The sed, grep, patch, and tar packages each provide basically one command, all of which toybox already has a version of that SHOULD be good enough, and I'm confident I can fix them if something's wrong. Oddly findutils provides both find and xargs (why?), but toybox has both.

In theory I've got an xz decompressor in toybox, although I'm not sure how load bearing it is. It's a public domain implementation by somebody named Lasse Collin, which Isaac Dunham added to toybox way back when. At some point I should go down that rabbit hole, it's 3136 lines of external code that need a focused cleanup, but it hasn't been a priority. That said, I should probably see if it extracts the LFS source tarballs in question when called from toybox tar.

Maybe our "file" is good enough? I honestly don't know what packages are using it for what... Grep on the command log says there are 5 different invocations, two of which are "file conftest" and "file conftest.o" (one reads as ASCII the other ELF) and the other three are file "-b" "--mime-type" "-e" "tokens" "-L" "-z" followed by either "-n", "-r", or an argument consisting of a couple hundred backslashes. Which means none of them have a filename you can open, meaning they spit out a "usage" message or other error. (Autoconf is wild, man. I have a TODO item about implementing --mime-type in file but it's almost a second command. I also note that httpd.c has a mime() function to convert file extensions to mime types...)

Pulling those out, we're left with: bash coreutils diffutils gawk gzip m4 make ncurses. I know my shell isn't good enough yet, so we need bash and I expect that's what's pulling in ncurses (for user friendly command line editing and history). I haven't finished diff. I haven't STARTED awk, m4, or make. My gzip is still missing compression support.

And of course "coreutils" is a hairball containing over a hundred commands, so that's its own analysis.

Hmmm, which commands are called by which package builds? I changed setupfor() to call "$(which printf)" instead of just printf because it's a bash builtin, but when it's called out of the $PATH it winds up in the wrapped command log so I can do this little horror:

( unset X; { grep -n 'n=== ' log.txt | sed 's/\([^:]*\):.*=== \([^[\\]*\).*/\1 \2/'; echo \$ done; } | while read a b; do [ -n "$X" ] && { echo === $Y && sed -n "$X,${a}p" log.txt | awk '{print $1}' | sort -u | xargs;}; X=$a; Y="$b"; done ) | less

And see what commands each individual package build is calling. It's a pity I can't easily feed that to "watch" as the build runs, but it's got both single and double quoting contexts (doing different things) and shell quoting contexts don't easily nest. I suppose I could make a shell script...

The NEXT thing I did was fire up good old Python 2.x and make an inverter consuming the output of the above horror:

#!/usr/bin/python

import sys

# stdin alternates "=== package" lines with the space separated list of
# commands that package's build called (the output of the above horror).
x = sys.stdin.read().split('\n')[:-1]
y = {}
cmd = {}
# y maps each package to its command list ([4:] strips the "=== ").
for i in range(len(x)/2): y[x[i*2][4:]] = x[i*2+1].split(' ')
# Invert it: cmd maps each command to the packages that call it.
for i in y:
  for j in y[i]:
    if not j in cmd: cmd[j]=[]
    cmd[j].append(i)

# One line per command: count, name, tab, then the packages using it.
for i in cmd:
  print "%s %s\t%s" % (len(cmd[i]), i, " ".join(cmd[i]))

Yeah I should have used map() and friends but I didn't remember the syntax off the top of my head and wasn't looking it up for something simple. Anyway, THAT produces a list of commands followed by each package that uses it, which I can pipe to sort -nr | less to get a priority list of what to test, I guess? Although the hotspots (which, uname, mkdir, and printf) that EVERYTHING uses are actually used by setupfor() and announce() and so on in the shell script, whether or not the actual package builds call them, so that's cheating...

I still need to run this in a more restricted environment, though. Especially down at the bottom of the list there's a lot of one-package calls to python, python3, expect, automake-1.16, autom4te, autoheader, autoconf, aclocal-1.16, and so on. Stuff that's probably probed for and not used if not present.


October 11, 2023

Ray Gardner benchmarked my tsort implementation: at 400k pairs the debian implementation took 1.5 seconds and mine took 50 seconds, and given a million pairs the debian implementation took 3.5 seconds and mine took 7 and 1/2 minutes. Yeah, that's N^2 scalability.

I MOSTLY don't care because I don't know of a use case that big (he didn't send the test data, nor say how he created it), but wikipedia[citation needed] says there is an algorithm that's roughly linear-ish somewhere in Knuth's "The Art of Computer Programming" pages 261 to 268. So I grabbed a library copy and... "Somewhere". The author is one of those academics who puts "exercises" every couple pages rather than EXPLAIN stuff (pet peeve of mine, it's why I can't read Paul McKenney's lwn.net articles), and what he DOES say in this case is math salad:

"Our first example is a problem called _topological_sorting_, which is an important process needed in connection with network problems, with so-called PERT charts, and even with linguistics; in fact, it is of potential use whenever we have a problem involving a _partial_ordering_. A partial ordering of a set S is a relation between objects of S, which we may denote by the symbol [squiggle] satisfying the following properties for any objects x, y, and z (not necessarily distinct) in S:

i) if x [squiggle] y and y [squiggle] z, then x [squiggle] z,
ii) if x [squiggle] y and y [squiggle] x, then x = y.
iii) x [squiggle] x. (Reflexivity.)

The notation x [squiggle] y may be read "x precedes or equals y." If x [squiggle] y and x [different squiggle probably meaning not equals] y, we write x [another squiggle, possibly less than in a very strange font] y and say "x precedes y". It is easy to see from (i) (ii) and (iii) that we always have

i') If x [squiggle2] y and y [squiggle2] z, then x [squiggle] z. (Transitivity.)
ii') If x [squiggle2] y, then y [squiggle2 with a line through it, which he just made up and might mean "not squiggle2" but he didn't SAY that] x. (Asymmetry.)
iii') x [squiggle 2 with a line through it] x. (Irreflexivity.)

That was the portion JUST from page 261 of the third edition of The Art of Computer Programming. Wikipedia [citation needed] says to read through to page 268. It has not actually gotten near any algorithms yet. I dunno what a PERT chart is (or why it's so-called), or how you'd apply this to neolithic computational linguistics from either 1968 (when this was first published) or 1997 (when the third edition came out, assuming that part got updated 25 years ago). I had never previously encountered the word "Irreflexivity" (nor apparently has thunderbird's spellchecker) but it was nice of Knuth to say he personally found it "easy to see" and that anyone to whom it is not would not be the target audience of his book. (Possibly I need to have read the book from the beginning, and he'd have defined Irreflexivity somewhere previously. Or he just assumes my math minor was more thorough than it was and more recent than it was.)

I was hoping that reading the corresponding part of the book would help explain the sample code I dug up (which is MIT licensed so I'm reasonably comfortable reading it and writing a new one), but I think I'm better off figuring out what that code is doing and trying to reverse engineer why.
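
For reference, the standard roughly-linear approach (Kahn's algorithm, which I'm told is what Knuth is driving at under the squiggles) is simple once stated in english: count the incoming edges on each node, emit every node whose count is zero, and each time you emit a node decrement the count of everything it points at, emitting whatever hits zero. Each edge gets touched exactly once, so it's O(nodes+edges). A minimal sketch, assuming the strings already got mapped to integers 0 through nn-1 (names and array layout are made up for illustration, this is not what's in tsort.c):

#include <stdlib.h>

// Kahn's algorithm: edge[i][0] must precede edge[i][1], nn nodes
// numbered 0..nn-1, ne edges. Writes sorted order to out[] (capacity
// nn), returns nodes emitted: less than nn means the leftovers loop.
int toposort(int (*edge)[2], int ne, int nn, int *out)
{
  int *incoming = calloc(nn, sizeof(int)), *first = malloc(nn*sizeof(int)),
    *next = malloc(ne*sizeof(int)), done = 0, todo = 0, i;

  // Chain each node's outgoing edges into a list, count incoming edges.
  for (i = 0; i<nn; i++) first[i] = -1;
  for (i = 0; i<ne; i++) {
    next[i] = first[edge[i][0]];
    first[edge[i][0]] = i;
    incoming[edge[i][1]]++;
  }

  // Seed the queue with nodes nothing depends on, then emit nodes and
  // "remove" their outgoing edges until the queue drains.
  for (i = 0; i<nn; i++) if (!incoming[i]) out[todo++] = i;
  while (done<todo) {
    int n = out[done++];

    for (i = first[n]; i != -1; i = next[i])
      if (!--incoming[edge[i][1]]) out[todo++] = edge[i][1];
  }
  free(incoming); free(first); free(next);

  return done;
}

Anything left unemitted when the queue drains is part of a cycle, which is where the loop detection error message would hook in.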


October 10, 2023

All my usual Ukraine information sources have suddenly started covering Israel, which I find exhausting and slightly sad. Yes the recent attack is terrible, and the response is also going to be terrible, but I am not a domain expert on the middle east and neither are most of the people I've been getting info about Russia and Ukraine from.

I am out of my depth in the middle east, and do not expect to ever have anything useful to say about most of it. What little I know there post-1700 comes from studying the oil trade, which Israel isn't a big part of. (Some recent offshore natural gas, but it's not very developed yet and there's no pipes so exports would have to be LNG which is limited by tankers and import terminals.) What's going on in gaza is nasty and complicated and I do not pretend to understand it, and I'm not GOING to in a useful amount of time. (There are a bunch of terrible things in the world that I can't do anything about. I have no expertise nor influence over India's slide into fascism, for example. I hope they don't?)

Even just understanding US domestic politics involves knowing about the Southern Strategy and the Civil War and the 3/5 compromise all the way back to the introduction of Malaria to the New World by the Jamestown colony that eventually led the colonists' pivot from indentured servants to African slavery in the first place, all of which is backstory to Reagan undoing FDR's New Deal and recreating the Gilded Age (lowering the top tax rate from 70% to 28% gave us the national debt and created billionaires, the two graphs are a mirror image of each other) and the explicit circa-2010 strategy of moving the Overton Window that gave the Gerasimov Doctrine an opening for Russian troll farms to eventually outright hijack an election. (That wikipedia article being studded with "this doesn't really exist" disclaimers is an EXAMPLE of the Gerasimov doctrine in action.) Along the way there's been a bunch of religion involved, and at least three major CATEGORIES of racism (neither the Chinese Exclusion Act nor the Trail of Tears were about Africa), and recently Sputnik begat Darpa in time to educate the Baby Boom but Tetraethyl Lead undermined everything and all that's just some of the CONTEXT you need to even TRY to make sense of recent moves in the country I _live_in_.

I'm trying to follow Russia's invasion of Ukraine because Russia constantly interferes with US politics in ways that directly affect me, which I've already spent a decade studying closely, and I would like them to stop. I'm fairly certain the end of Russia would mean the end of the US Republican Party, which would be a very good thing.

Russia has a HISTORY of dicking with the USA, and their return to "fascist regime trying to conquer the world" after a brief interlude as merely a crime syndicate is unsurprising. Their cold war security state got recycled into the world's largest online troll farm, which I have to care about both as a programmer and due to merely being online. I've also studied the oil industry a lot (even worked in it briefly in 2008), and the poster children for the resource curse are Russia and Saudi Arabia. (Resource curse: when the government's money doesn't come from taxing its people then it doesn't really NEED its people, and can become genocidally repressive and just import what the elites want.) I started with a lot of context on Russia, and still don't remotely consider myself an expert but have enough background that constantly consuming multiple sources of information about the conflict doesn't make me feel TOTALLY lost.

I'm rooting for Ukraine to kick Russia's ass, as Putin's overextension and miscalculation here could topple Putin's regime, or at least greatly reduce his ability to exert influence, or maybe just tie up the 70 year old man with the tremors and puffy face until he dies of old age. Post-Putin, I expect Russia to fragment, with an unknown amount of civil war as it breaks apart (and possibly China eats the eastern 1/3 of it, but Xi-who-must-be-obeyed has his own problems and the land's surprisingly useless.)

The USA has piles of military surplus left over from the cold war that was designed to be fired at Russia, which we still have largely because it's expensive to dispose of safely. Handing it over to Ukraine so they can dispose of it by firing it at Russia makes sense to me in context, and it's nuts we haven't done more of it faster. We spend 3/4 of a trillion dollars annually on defense, and the entire cold war was "USA vs Russia" for half a century, so nickel and diming the prevention of that threat re-emerging is fscking stupid, and largely happening because the constant Russian interference mentioned above is the reason the modern GOP still exists. (And gerrymandering and voter suppression and so on, but Russia financed and organized surprising amounts of it because it's the most effective way to sabotage the country. Every US government shutdown has been the republicans.) If Ukraine is willing to stand up to defend itself from Russian aggression we should give them railguns, orbital lasers, and the SHIELD helicarrier to do it with. But just about the only NEW stuff the USA has given Ukraine so far was the Patriot battery, which is not the same technology we used in the 1990 Iraq war any more than each new year's iPhone is the same technology. The newest F-150 trucks are electric vehicles, it's still an F-150.

So I am out of my depth in the middle east. I strongly suspect that part of Russia's payments to Iran for drones and such to attack Ukraine with has come in the form of support for this attack on Israel: training, weapons, intelligence, and so on. MBS (Mister Bone Saw) in Saudi Arabia doesn't care much about Israel and bought all his weapons from the USA so using them to attack our ally would be a career limiting move (he wants to conquer Yemen and Qatar instead, just... somewhat surreptitiously). Israeli intelligence's failure to see the attack coming has Russian fingerprints all over it. (Maybe Wagner? That's tangled...)

But that's not what most of the Ukraine bloggers are talking about. They're chasing clicks, and trying to predict how many war crimes Netanyahu feels he's owed now, and I just can't.


October 9, 2023

I did not expect to spend this evening's brainpower writing a long screed on toybox help text spacing, but that's what wound up happening. Long, and surprisingly exhausting because I just audited all the help text. (In a way where the result is "I need to do it again", but I managed to categorize stuff and explain some regularities, so that's... something?)


October 8, 2023

I listened to the first five Amber books on youtube, and the guy uploaded another classic science fiction novel called "fungus", with a book cover of somebody screaming as fungus grows out of his mouth and nose. I have no interest in listening to it (it predates "the last of us" by something like 40 years but nobody ever said "a human version of cordyceps" was a new idea), and find the picture kind of disturbing on repeated exposure, so I told Youtube "do not recommend this video again".

So Youtube made a playlist with that as the first video (and thus the thumbnail), and has put it in every single recommendation list since (including under other videos). It's not music, it's an audiobook, but it playlisted it anyway. And there is nothing in the pulldown menu for a playlist to tell it "never do this again" the way you can with a video.

I went in and gave the video a thumbs down. Didn't help. I selected "hide this user" in the user's page. Didn't help. Next time it recommended a different video from that user, I told it never to recommend anything from this user ever again. (Which is apparently NOT the same thing as "hide this user" from the user page, and which you can't do unless it DOES recommend a video from the user.) Which I didn't really want to do because it's a channel of vintage audiobooks that are out of copyright because of the berne convention and library of congress' old program for the blind, and in theory I'm quite interested in what ELSE they put up, but after multiple days of this thumbnail showing up in every list I'm willing to sacrifice that to STOP SEEING THIS THUMBNAIL. But it didn't help, youtube is STILL SHOWING ME THE PLAYLIST BASED ON THE THUMBNAIL I DON'T WANT TO SEE. I've submitted two help tickets about it already, but youtube doesn't actually have humans working there anymore.

Maybe the fungus got them.

Oh, and today I got a quite explicit youtube ad for erectile dysfunction medication that among other things spent about 1/3 of its runtime on an animation of a guy giving oral sex, which played before a scuba diving video that had censor blur following a woman in a bikini because she's not allowed to dress like that. You can't make this stuff up. Way beyond double standards here. (If it was on "sexplanations" or "sex positive gaming" or one of the other channels that's basically 100% demonetized and makes its money off sponsorships and patreon, that would be one thing. But it wasn't. To be clear, I don't object to the ad, any more than any other ad on youtube. I object to the hypocritical sexist censorious prudery from assholes who cordon off "for kids" videos and then scatter caltrops over the rest of the site anyway with constantly shifting standards.)

This is the problem with KOSA and Fosta/Sesta and all the other attempts to bring back the Comstock act (which was still being used to outlaw birth control in 1960). "I'll know it when I see it" is NEVER a good basis for legal enforcement. It CAN'T BE. That's NOT HOW LAWS WORK. (And another 30 layers of bad starting with the whole "victimless crime" can of worms. The Boomers aren't dying fast enough.)


October 7, 2023

"You need 16.1 gb to install Linux Mint, you only have 8 gb."

Really? Sigh. Lots of bugs coming from ecryptfs and/or mint. Fixed two, but installing it in a VM seems like a good idea. Modulo it's a pig.


October 6, 2023

Alas the REASON that help.html doesn't #include the header.html file (with the nav bar down the left and the version info up top) is the file isn't checked in, it's generated by help -ah, and that generator is trying to be generic, not part of the www/ directory. Adding an #include of the header.html file would sort of be a layering violation.

I'm pretty sure I need to change the nav bar ordering. And hammer on the about page some more. (I shuffled things around again a couple days back so the index.html for toybox goes to about.html. A few months back I added the quickstart page and made that the default, and before that news.html was the default.)

The point of making news.html the default is to show the project is active. The point of making quick.html the default is to quickly explain how to use it (or at least try it out). The point of making about.html the default is to explain what it is, why it is, and why you might care.

Circular dependencies again. I want to explain everything at once, and can't. I should break down my "simplest possible linux system" talk and try to redo that in more digestible chunks.

I'm told patreon does video now...


October 5, 2023

Thunderstorms kept me awake last night, but it's cooled Austin down quite a bit.

The construction on the porch at the UT Geology Building ended a while back, but they didn't exactly put everything back gracefully. They left the tables a mess (all scrunched into only half the patio) and the Large Pointless Art Object is unboxed but only moved a little way out of the corner (it's big and heavy, and where exactly it "belongs" on the porch is debatable), but they seem done for now. There was enough space for me to drag a table into the corner near the outlet, and set up on it again.

I can do compiles and kernel builds here again! Time to update my kernel patches for 6.5... looks like just one failed to apply. Fixed it up. Let's do an i686 test build...

Oh right. I broke the shell. And I have a partial fix in one of my directories, but it wasn't the FULL fix, and I got distracted...

Why is it when I have 100% initialized a variable, deterministically, gcc gives me a zillion "may be used uninitialized" warnings (because it can't track variable pairs, ala y is only used when z is set), but when I move a group of initializations between the loop start and their declaration several times and drop one, it compiles fine without warnings (because I fed &var to the function so it COULD have initialized it)? And it works in my testing because the stack is zero when I first call the function (the missing initialization is to zero). And ASAN never blinks because yes, that is my stack space.

Tools that are just good enough to give one a false sense of security set one up for embarrassment. Sigh. Oh well, fixed now. (Oliver Webb caught it. And apparently several other things in the test suite; he's running in a very different environment than I am. I'd like to reproduce and fix all the issues he's hit, but "reproduce" is step 1. This one I could understand via inspection, and was in code that got checked in earlier this week so is still on probation anyway.)

I _suspect_ the tar issue he hit is the sparse stuff again? But... I could be wrong about that? I didn't properly fix the sparse stuff because I don't know HOW to, the code in tar.c is behaving properly and it's the TEST that's being funky. The filesystem never honors the sparse granularity we feed it because there are no 1 byte holes, it's going to be rounded to some kind of extent granularity and I don't know how to query a filesystem to ask what that granularity IS. In something like isofs or certain flash filesystems it may not even be a power of 2.

Sigh. I suppose I could do something like "echo hello > file && truncate -s 1m file && tar cf file.tar --sparse file && rm file && tar x file.tar && du file" and anything less than half a megabyte from du would pass? My TODO item here is "--hole-detection=raw" which says I should teach tar to have a mode where any run of zeroes 512 bytes or longer is recorded as a hole (tar's internal block size is 512 bytes). That would at least reliably produce a tarball regardless of what the filesystem is doing. Except the ability to interrogate the filesystem is one of the things I'm presumably testing. But the filesystem behaving consistently kinda needs to happen there. Unless I make a tool to query the filesystem for the test and then check that what tar did matches what the query tool said, except the test suite is designed NOT to require a compiler. (It tests existing binaries, so you can run it on a slim target.) And the only thing that's really coming to mind is "du", which doesn't tell me where the holes ARE. Unless maybe I can do something clever creating the test file in stages...? Hmmm...


October 4, 2023

Watched the start of b0rk's talk about bash, and she pointed out that set -e is nerfed within shell functions that are the left side of the || operator, and now I'm wondering about scope. I need to move something into struct sh_fcall to track this, but "bash -c 'x() { set -a; x=123; }; x; y=456; declare -p x y'" says that y is exported so set -a survives the function context pop. I _think_ this is just set -e being special? The bash man page says:

The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test following the if or elif reserved words, part of any command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return value is being inverted with !.

Which sort of implies it's a blockstack property, or at least honoring -e involves a blockstack traversal at exit time? Hmmm...


October 3, 2023

Tracking down a bug, and it turns out I broke timeout. Specifically, adding timeout -i broke it, and my fix for that broke a different part of it. The problem is that occasionally the child exits before the parent calls wait(), and if the SIGCHLD handler's already been called wait() says no such child. I thought you could safely call wait() IN the SIGCHLD handler, and thus logically could call wait() after longjmp()ing out of the SIGCHLD handler. I needed the longjmp() because getting the poll() for timeout -i to reliably exit otherwise was just not happening despite multiple attempts, and never resuming the poll() was the fix I wound up with. But apparently wait() is returning "no such child", and thus we produce the wrong exit code from the test.

I was feeling good about having found the problem via thinking about the issue and code inspection, and then right after writing an email about it noticed a 3 hour old message in my inbox where Android's test guy had found the issue via strace. (Eh, either way works, but for a moment there I felt competent. Which is not a requirement here: the nature of toybox isn't "things nobody else could do", it's all "things nobody else has BOTHERED to do". Janitorial work, really. Luckily I hadn't yet edited and uploaded the debugging war stories blog entry, so this wasn't a case of crowing "Ha, I can find your bug!" and then failing to find it before everyone else did. Not that I ever said I was particularly FAST at doing so, mostly just unusually stubborn about it...)

As for fixing it, I don't know what's up with wait() and probably need to change the SIGCHLD handler so it gets delivered the extra data blob and then harvest the signal-or-exit value out of said data blob, which I dunno how to do but that's what man pages are for. The point is WHEN the parent gets the SIGCHLD, at that moment the data we need is available. And later it apparently isn't, at least not reliably. So I need to figure out how to grab it as it goes by. (Is it leaving the signal stack context that frees the zombie? How would the kernel _know_? Why would it _care_? Was it always a race condition that the signal handler can call wait() until the child returns from the signal delivery to exit from zombie status, and the longjmp() is just giving it extra time to do so? Hmmm...)
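
(Having now glanced at the man pages: the extra data blob route appears to be sigaction() with SA_SIGINFO, which hands the handler a siginfo_t that already contains si_pid and si_status at delivery time, so nothing has to win a race against zombie reaping later. A sketch of the shape, emphatically not what timeout.c currently does:)

#include <signal.h>
#include <sys/wait.h>

static volatile pid_t dead_pid;
static volatile int dead_status;

// SIGCHLD handler: harvest the exit-or-signal value from the siginfo_t
// the kernel fills in at delivery time. si_status is the exit code for
// CLD_EXITED and the killing signal for CLD_KILLED/CLD_DUMPED.
static void handle_sigchld(int sig, siginfo_t *si, void *context)
{
  if (si->si_code==CLD_EXITED || si->si_code==CLD_KILLED
      || si->si_code==CLD_DUMPED)
  {
    dead_pid = si->si_pid;
    dead_status = si->si_status;
  }
}

void install(void)
{
  struct sigaction sa = {0};

  sa.sa_sigaction = handle_sigchld;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGCHLD, &sa, 0);
}

The zombie still wants a waitpid(dead_pid, 0, WNOHANG) somewhere to get reaped (or SA_NOCLDWAIT to tell the kernel not to make one in the first place), but the status grabbed in the handler stays valid even if that later wait comes back empty-handed.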

Using the bigger signal handler struct is one of those things like calling stat() where I wince slightly because the kernel is populating a struct with 50 things of which I need 3, but I've gotta rein in my microoptimizing instincts because the structure setup is swamped by the syscall overhead. Like, an order of magnitude. That's why they return so much data in a big struct, it's easier than making individual system calls to query each thing. (Which is why they added the vdso page to make some hotpath syscalls very cheap...)


October 2, 2023

Circling back to tsort: when I feed "a b c d f f d e" to debian's tsort I get "a c f b d e", and when I feed it to busybox tsort I get "a f b c d e", so whatever my tsort produces (in this case "a b c d e f") it's not gonna reliably match TEST_HOST=1 because different hosts don't match EACH OTHER.

Getting the algorithm to work is easier than getting the test suite and documentation right. There should be a name for this, because it comes up a LOT.

Urgh, and the STUPID COMPILER will not SHUT UP. I've got a qsort() over an array of entry pairs which sorts them by the second element (as described here at least twice), and bsearch() is designed to work on the same thing (after you quicksort it you can binary search it) with the same search function. Except in one of the cases I want to search to see if the FIRST element is in any of the second elements, and since I'm feeding it an array of length 2 the obvious thing to do is subtract 1 from the pointer I feed bsearch, so when it adds 1 again it's looking at the first element instead of the second.

The compiler INSISTS ON WARNING ABOUT THIS. It never references memory I don't own, but the POINTER goes out of bounds briefly until the increment undoes the decrement. Which DOES NOT MATTER. But typecasting it to (void *) does not undo the warning that gcc INSISTS on generating. Only typecasting everything to long and doing the math in longs where the compiler CAN'T SEE IT AS POINTER MATH allows it to happen without a warning. I want a "shut up, I know what I'm doing" annotation like (void *) or extra parentheses around an assignment usually do. (To be honest, I'd like it to NOT WARN about stuff that ISN'T BROKEN, but the compiler isn't that smart and all the AI nonsense that tries to make it "help out in the kitchen" just makes stuff worse.) So I had to check in an LP64 workaround with a large comment.
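
(For anyone who wants the shape of it, something like this, with made-up names; the real version lives in tsort.c under said large comment:)

#include <stdlib.h>
#include <string.h>

// qsort()/bsearch() comparator: order string pairs by second element.
static int pair2cmp(const void *a, const void *b)
{
  return strcmp(((char **)a)[1], ((char **)b)[1]);
}

// Does pair[0] occur as the SECOND element of any entry in the sorted
// array? Handing bsearch() "pair-1" makes the comparator's key[1] land
// on pair[0], but writing the -1 as pointer math draws the warning
// (the pointer is briefly out of bounds, though never dereferenced),
// so do the subtraction as integer math the compiler can't see through
// (unsigned long holds a pointer on LP64, hence "LP64 workaround").
static int first_is_a_dependency(char **pair, char **sorted, long count)
{
  char **key = (char **)((unsigned long)pair - sizeof(char *));

  return !!bsearch(key, sorted, count, 2*sizeof(char *), pair2cmp);
}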

On the bright side, I'm angry enough to make progress again.


October 1, 2023

Sigh. I extended dirtree's "again" field to note when we'd followed a symlink because of DIRTREE_SYMFOLLOW, which was mostly a question of changing all the callers that just wanted the AGAIN bit to mask that out... and then it didn't fix the problem I was trying to address. I don't want to rip the new infrastructure out (since it's an instance of something fairly generic: DIRTREE_BREADTH and DIRTREE_AGAIN and so on are already setting their respective bit when they're responsible for a callback), but it does not currently have a user.

The actual problem I was trying to fix (that a symlink you follow shows up as a hardlink when the same file's in there twice) got fixed a different way, which I'm still a bit unhappy about (the hardlink field being greater than 1 was used as a filter so I was only adding a few entries to the dev/inode pair list, but with --dereference anything could be a hardlink, so I have to add everything, and it's not using a hash table or tree structure that's designed to scale to lots of entries). But rather than prematurely optimize, I'm waiting for somebody to complain...


September 29, 2023

Saw that Elizabeth Warren has become a co-sponsor of KOSA (I.E. Comstock Act II, the Focus on the Family Online Tracking Act outlawing mention of gay people to anyone too young to rent a car, and thus requiring photo ID to access the internet so Rhonda Santis can dox you via subpoena to direct the tiki torch brigades at their next target's living room). I will never give Elizabeth Warren money again. I will never vote for her again. I'm working on a guttering candle's fragment of energy and this is VERY DISPIRITING. We can't have nice things while a Boomer lives.

"FOSTA/SESTA 2: even harder" is trying make a country where "truth leaving her well to shame mankind" can no longer be posted to the internet. It's already a problem but Wikipedia needing photo ID to allow you to see the original is a BAD THING. Which I suppose was the point of the painting, highlighting the irony. I should give money to AO3's lawyers, they'll need to move all the servers overseas.

Alright, focus on the work. Sam Vimes' mantra: do the job that is in front of you. I want to match debian's "fold" output exactly (at least for non-utf8 input), and the edge cases are kinda fiddly, but I have a bunch of test cases I cut and pasted while writing the thing, which (as always) are very helpful because each one was me asking "so what is it supposed to DO here?" because I didn't know. Just getting these two right:

$ echo -ne '\t\n' | fold -w2 | hd
00000000  09 0a 0a                                          |...|
00000003
$ echo -ne 'a\t' | fold -w2 -s | hd
00000000  61 0a 09                                          |a..|
00000003

Was a bit of an ordeal. And my big clue that this fold.c needed a rewrite was that of the 5 tests it already had, about half didn't pass TEST_HOST. As in the output of this command, and the output of the debian command, as exercised by ITS OWN TESTS (the tests the previous fold.c in pending had before I touched anything), did not match.

Sigh: and it turns out there's MORE stuff I didn't know:

$ echo -e 'abc\t\bdefghi'
abc    defghi
$ echo -e 'abc\t\b\bdefghi'
abc   defghi
$ echo -e 'abc\t\b\b\bdefghi'
abc  defghi

When you print a backspace character after a tab, it backs up one column not one character, I.E. the terminal does NOT remember that it printed a tab and undo the tab. The xfce terminal window distinguishes space from tab for cut and paste purposes, because otherwise you break makefiles and python and patch hunks and all SORTS of things that care about non-visible whitespace differences. But the legacy terminal output plumbing grandfathered in this tab vs space behavior, and I implemented the obvious MODERN interpretation, not the legacy one from before we had wide and combining characters and HAD to track this stuff.

Sigh, do I need some sort of --legacy command line option to tell this fold to work like a fold that is not utf8 aware? Hmmm, that won't help. The problem is "when you cat this file from the command line to the terminal, what do you see". And the terminal retained some legacy stupid due to our old friend, the hysterical raisins.

Alright, how does backspace interact with unicode on xfce's terminal? Let's stick a combining umlaut on the letter w, backspace once, and overwrite it with x: echo -e 'w\xcc\x88\bx' and I get an x in the first column, which does not have an umlaut. What's a double width character I can stick that combining umlaut on... the first character of my japan.txt file is three utf8 bytes, so echo -e '\xe7\xa7\x81', then stick the umlaut on and... it's showing the umlaut NEXT to it. Well it is a fairly tall character. So backspace and type x (echo -e '\xe7\xa7\x81\bx') and... really? It put an x in the second column, with the first column blank. So it backed up width one, but zapped the whole character.

Really?

Ok, before the Boomers shove KOSA up our asses and you have to log in to see me say this, I'd just like to say, from the bottom of my heart, "unicode is kinda fucked up, isn't it?" (I implemented categorically LESS broken plumbing, and now I have to break it.)

I wonder how the terminal program Rich wrote years ago handles this stuff? It's abandoned, at some obscure repo URL, and may not compile with a current compiler if nobody's tried in 7 years, but NOW I'M CURIOUS.

Except the point of focusing on fold.c is that I'm trying to do a shortest-job-first scheduling where I FOCUS on something and actually FINISH it. I've advanced it to the point where new difficulties present themselves. Again. I should not switch away, I should FINISH it. Hmmm...


September 28, 2023

Reached the point on fold where I'm deleting my TODO item comments out of the various debris sections I repeatedly reimplemented (mostly each reason why I abandoned THAT bit, because it didn't handle a corner case I'd just realized) and I think I've got them all? It's all notes-to-self like "only update space if it doesn't put us over, without checking over twice?" that wouldn't make much sense to anyone else. (There should be a way to have just ONE check for the line getting long enough we need to insert a newline, which means updating the "after last space" checkpoint needs to occur after that check, otherwise we'd back up to the wrong spot when a line ended with a space.)

Lots of traversing two parallel metrics (width and bytes) that have to stay in sync but also be kept separate and each handled appropriately. And various temptations to inappropriately optimize. For example: when I output a partial line only up to the last space, do I _have_ to restart measuring both width and bytes from there, going back over the part of the line I already did see before deciding it was too long? Answer: yes, because I know how many bytes it's been since then but not how much WIDTH, and it's not just "also checkpoint width for space" because I'd need to back up in my bit array of character starting positions.


September 27, 2023

Listening to audiobooks of the Amber series. This particular online source of dubious legality (no guilt: I've owned the complete series in paperback since childhood and the author died almost 30 years ago) was apparently recorded "for the blind" by the Library of Congress in 1979. I expect the narrator to switch for the first Merlin book, since this was recorded before that came out. (He came to a sort of stopping point, and then resumed with a different protagonist 5 years later. Yes, this series is the reason typing "pattern" so many times in sed.c wound up with me inserting "logrus" a few times to balance it out. Due to eyestrain being a limiting factor on how long I can work, I don't read nearly as much for fun anymore, but I listen to a lot of audiobooks.)

I've been interacting with the Hexagon toolchain guys on the musl mailing list. They submitted their hexagon musl changes to Rich, and I mentioned I'd fished their hexagon llvm+musl build out of QEMU's test suite a couple years back but version skew in the llvm repositories broke it not long after, and they pointed me at a git repo for their build plumbing! Which is... not newbie friendly. There was a thread where they added documentation and I described my attempts to make it work.

I don't want to go down this rathole right now. I have too many cans of worms open already, and am spending all my time running between spinning plates as each mixed metaphor loses context and needs refreshing. Swap thrashing, close tabs. Which brings us back to the TODO list critical mass problem: working on TODO items spins off new tangents faster than I tie them off.

My EXCUSE for parking this for the moment is I need to do that devuan version update first, because their build is using python 3.8 for some reason. And I need to close a zillion tabs and swap over the 16 gig chips. And so many tabs are where I left off working on a toybox command...


September 26, 2023

The question of whether I feel under the weather due to actual biology or some sort of malingering has been answered by digestive issues of the kind that make leaving the house inadvisable. Which is frustrating for a number of reasons, not least because I walked to UT and back last night for the first time in something like a week and rather enjoyed the exercise, and now I can't repeat that tonight.

Alright, jettisoning -u in fold. It's not in any standard, wasn't actually a requested feature (tr '\n' ' ' unfolds just fine), and trying to make utf8 support work with -s in that was just nuts. (The need to checkpoint -s data means you can't move on FROM it, plus the -w 0 infinite unfold part is essentially a different codepath entirely...)

I can tell when I'm not at my best because I get stuck on little things, like a boat floating in too little water. The -s option to fold wraps at the last space, meaning I can't just output each character as I encounter it but need to bookmark and potentially go back. Which means instead of reading into toybuf and cycling through it (thus potentially overwriting the space character I'd want to back up to) I should getline(), but that reads arbitrary amounts of data into memory at once when the line's really long (tr '\0' ' ' < /dev/zero | thingy) and my above suggestion of using tr to unwrap lines produces exactly that kind of input.

As I said: little things. Unsure of which approach is best. I could stick with toybuf because the posix fold spec says -s backs up to the last blank "meeting the width constraints" and 4096 bytes seems entirely reasonable... but it's still an arbitrary magic limitation. Which very few people other than me would ever notice.

The other thing I'd LIKE to do is write each line out in a single write() instead of having a lot of individual write() calls for each (wide) character. I'm aware that getline() could easily be doing something icky under the covers, and I could pass output along to fwrite() so the FILE * shenanigans handle it. (Or, quite often, don't, but then it's not MY fault.) And the semi-logical thing to do is write out chunks between each space so I've disposed of data I KNOW I don't have to back up to later. Except it doesn't always have -s which implies two codepaths, which is bad... Again, little things. None of these are hard to implement, it's PICKING AN APPROACH.

As for the line based approach, "while (ss = getline(fp)) {stuff();}" is simple but can't handle embedded NUL bytes. Do I _want_ to handle embedded NUL bytes? Would anybody other than me notice if I didn't? And here's the REAL failure mode: I write 15 different versions of the code and frog them again because I hit a corner case that this approach ISN'T best for so go switch to another one with its own downsides.

And there's confusion between the "width" and "byte" contexts. If backspace actually backs up by the width of the previous character, I essentially need a bitfield of character starting positions because variable width unicode characters aside: tab is DYNAMIC width depending on where the cursor currently is. You can't traverse backwards through tabs, you have to either remember the position or re-parse from the beginning. But the bitfield isn't BYTES, it's WIDTHS, meaning traversing back tells you how much width the last character consumed but that doesn't tell you where in the STRING to go. You still have to traverse backwards to get that. Which isn't TOO bad in utf8: if the top two bits are 11 it's the start of a unicode sequence and if the top two bits are 10 it's a later character in the sequence, otherwise it's conventional 7 bit ascii. EXCEPT that utf8 sequences aren't necessarily well formed and going forward I took utf8 parsing failures as "width one", and the above reverse traversal heuristic would happily skip a dozen 10 bytes after a 11 byte when the stupid unicode crap (because microsoft) is broken so you can only have 3 extension bytes. (Gratuitously limiting the total unicode space to a little over a million characters.) And I'm just gonna PUNT there: I think I'm ok getting the counting wrong in the presence of malformed utf8/unicode. (You also can't conceptually backspace through leading combining characters at the start of a line, which have nothing to combine with.)
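
The reverse traversal in question comes out something like this (a sketch with the punt built in: malformed sequences get counted wrong on purpose, matching the forward parsing):

// Back up one character: s points just past the character to back
// over, start is the beginning of the buffer (caller guarantees
// s > start). Skip at most 3 continuation bytes (top bits 10), then
// expect a sequence start (top bits 11); if the bytes aren't well
// formed utf8, back up a single byte instead and let the "parse
// failures are width one" policy cope.
char *utf8_back(char *start, char *s)
{
  char *was = s;

  while (--s>=start && (*s&0xc0)==0x80 && was-s<4);
  if (s<start || (*s&0xc0)!=0xc0) s = was-1;

  return s;
}

Plain 7 bit ascii falls through the second test to "back up one byte", which is also the right answer there.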

Of course "not at my best" is another one of those "ADHD has hyperfocus surges and when you're not in hyperfocus you're performing below potential, you must PERPETUALLY SPRINT" things. Best is a bad metric, I know. But the gap between "now" and "best" is pretty noticeable at the moment.

I still trip over problems left and right: I can test and debug all sorts of crap. SOLVING the problems so there AREN'T bugs seems way harder than it used to be.


September 25, 2023

Trying not to reply negatively to emails from well-meaning people. Oliver Webb is contributing well on the list, and I want to encourage him, even if "do we really need a 'dc' implementation" is a question I was not prepared to address before it came up. (But he's finding stuff to do! When people ask me what needs doing I never know what to say, and I'm still a bit gun-shy at even trying after pointing Divya at the test suite.) And scsijon pointing me at a GPLv2 licensed shell is... he MEANS well, and shell stuff is indeed hard, but writing up everything wrong with the suggestion would eat an evening. I tried to restrain myself.

Still better than the guy on the coreutils list who wants to "promote a culture of cautious file management practices" by removing the -f option from rm, because it's too convenient and people clearly won't develop new muscle memory and who needs compatibility and it's been around for 40 years so clearly is a pressing problem and I'm 95% certain that guy's trolling because no human being is that fscking stupid _and_ able to find a project's mailing list. But... maybe not?

Luckily, I'm not obligated to respond to that one. (I did weigh in on the "why do we still use info" thread, but just the once.)


September 24, 2023

An easy combining character to play with is the umlaut, but while echo -e 'x\xcc\x88' adds dots to "x", it does NOT add dots to tab or space.

I think the logic I need is to skip past (basically ignore) combining characters and handle iswspace() as one category (for -s), printable characters with width >= 1 as another category, tab is magic (because it's variable width), most low ascii is essentially combining characters (zero width), and then newline/linefeed are pretty much the same thing to the plumbing (resets character count for newline insertion purposes).
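
In code terms that dispatch comes out roughly like so (a sketch, not what's checked in; wcwidth() hands back 0 for combining characters and -1 for unprintable low ascii, so both collapse into the zero width bucket, and the -s break points are a separate iswspace() test layered on top):

#include <wchar.h>

// How many columns one decoded character advances the cursor, given
// the current column: tab is magic (variable width, assuming standard
// 8 column tab stops), newline resets the count, combining characters
// and most low ascii are zero width, printables are wcwidth() columns.
int advance(wchar_t wc, int col)
{
  int width;

  if (wc=='\t') return 8-(col&7);
  if (wc=='\n') return -col;
  if ((width = wcwidth(wc))<1) return 0;

  return width;
}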

There's no model for how -u should work other than the patch I got (it's a new feature), but it should turn newlines into spaces because otherwise echo -e 'abc\ndef' | fold -u turns into abcdef with no space between, and the line break IS a form of whitespace. (Nope, not doing magic backslash nonsense here.)


September 23, 2023

Everything hurts. I seem to have managed to injure my knee in my sleep. Took a prophylactic ibuprofen and zyrtec, and now feel less bad, but I dunno if that's just the placebo effect. I've gone through a 12-pack of the v8 caffeinated blueberry juice cylinders in 2 days (ordinarily it takes at least 4); the caffeine is basically having no effect right now. Haven't walked to UT since... my step counter says it's been 6 days. Wheee...

The problem with poking at fold.c is a well-meaning contributor added -u which makes it WAY more complicated (the combination of -suw becomes semi-unbounded for one thing, although "re-parse the whole mess from the start again each time" is the brute force approach), and handling unicode properly (posix talks about "columns" a lot) is extra-funky because backing UP in the presence of combining characters is kind of unpleasant (although backing up through well-formed utf8 isn't that hard because (char&c0)==c0 means it's the start of a utf8 sequence and trailing combining characters just get skipped), and then there's the question of how low ascii is handled, although four of them are already special cased and a quick run of for i in $(seq 1 31); do echo -e $i "abc\x$(printf %02x $i)def"; done says that 8 values (ascii 8 9 10 11 12 13 26 27) are weird and the rest silently drop out. \n \b \r \t are 10, 8, 13, and 9 respectively, which leaves 11 and 12 (both going straight down one line) and 26 and 27 both sort of starting an ascii escape sequence, except not really?)

And then of course I ponder interpreting ascii escape sequences, which is code I kind of want for less and watch (so they can at least handle color changes), but that probably should NOT be in fold. Even if the CODE can be shared, the CONCEPTUAL complexity is... not ideal. Nope, not going there in this command.

I miss being able to think straight. Which depending on your definition was basically the Obama administration, but I'd take a local peak right now...


September 22, 2023

Fuzzy took the robot to central market last night and bought ghost peppers. She is fermenting them in a jar to make ghost pepper sauce with. Be afraid.

Got a mail notification from the 401k that cashed itself out because I never made it down to Fidelity before the deadline. (In the special invitation only building you can't get into without giving them your social security number over the phone, which I'm not comfortable doing: you can no longer speak to a human without first making an appointment that requires giving out identity theft information over an unsecured channel, I was not good with this.) So in theory they're mailing me a check and a large tax bill. This wasn't the check, this was the notice of how big the tax bill is going to be. 10% penalty plus a higher tax bracket than I've been in since 2018.

Sigh. It would be nice to have more executive function so I could steer this sort of thing better.


September 20, 2023

The going theory is I've come down with covid again. (This would be what, the 4th time? I don't even have tests this time around, although I'm told I can order another set of free ones. Tests are free, but the new booster shot is not. Lovely. Oh, and if you've already had covid repeatedly, people are saying with the current variant they didn't start testing positive until their second week of being sick. Not really sure how the test helps at that point.)

All I know is I slept for 8 hours (somewhat unexpectedly, throwing what laughingly passes for my sleep schedule way off), and then lay down again and slept for another 5 hours. (Which is unusual for me.) And have now recovered enough to feel merely mediocre. After 13 hours of sleep.


September 19, 2023

Got a bug report about a memory leak in toysh (um... yes?) and finally downloaded valgrind, built sh with CFLAGS=-g and ran valgrind --leak-check=yes generated/unstripped/sh -c 'echo hello' which says there were 45 extra malloc blocks at exit and the first three are... environment variable allocations that still existed in our environment space when we exited.

Which is because we allocate "name=value" environment variable strings via malloc, and we put them in our own environ[] array block, and neither gets freed before toysh exits. Multiple such variables are allocated in the shell entry path, because we set SECONDS RANDOM LINENO GROUPS BASHPID EPOCHREALTIME EPOCHSECONDS EUID UID PPID PATH HOME SHELL USER LOGNAME HOSTNAME HOSTTYPE MACHTYPE OSTYPE OPTERR BASH PS1 PS2 PS3 PS4 SHLVL PWD OLDPWD and so on. The plumbing should free the old allocation for each of those variables if it gets set to a new value, so it doesn't LEAK, but there's no cleanup on exit to get rid of them because the OS does that for us. (Oddly enough, the shell itself is not a NOFORK or MAYFORK, it very much expects _exit() to have the OS clean up after it.)

This is a "dead dove, do not eat" situation. I don't know what I expected. Rather a lot of filtering would be necessary to find INTERESTING leaks, which is a thing I knew, but there it is in front of me. The question is what accumulates over time as scripts run. The shell design does manual garbage collection (the "delete" lists) and I could easily miss stuff. The big loop in do_source() calls parse_line() until it returns 0 to accumulate struct sh_pipeline segments in pl, and then calls run_lines() on the accumulated pl, and afterwards calls free_pipeline() (via llist_traverse), and also cleans up the expect list (which is error handling, it should be empty after successful parsing). Right after that (currently line 4130-ish), if anything has accumulated that is NOT on one of the persistent lists, it's a leak. So I'd like to call some sort of dump_heap() function right there to tell me what's been malloced but not yet freed, and let me whack-a-mole out the stuff that SHOULD be there, so I can spot leaks. Neither valgrind nor ASAN are really designed to work at that layer.
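
(The crude way to get that dump_heap() would be a debug build where allocations go through a wrapper that chains them on a doubly linked list; a generic sketch, not actual toysh plumbing:)

#include <stdio.h>
#include <stdlib.h>

// Debug allocation tracker: each allocation gets a hidden list node
// prepended so dump_heap() can walk everything malloced but not freed.
struct track {struct track *next, *prev;};
static struct track live = {&live, &live};

void *dbg_malloc(size_t size)
{
  struct track *t = malloc(sizeof(struct track)+size);

  if (!t) return 0;
  t->next = live.next;
  t->prev = &live;
  live.next->prev = t;
  live.next = t;

  return t+1;
}

void dbg_free(void *ptr)
{
  struct track *t;

  if (!ptr) return;
  t = (struct track *)ptr-1;
  t->next->prev = t->prev;
  t->prev->next = t->next;
  free(t);
}

// Call between script lines: anything printed here that isn't on one
// of the known persistent lists has leaked.
void dump_heap(void)
{
  struct track *t;

  for (t = live.next; t != &live; t = t->next)
    printf("live: %p\n", (void *)(t+1));
}

Recording __FILE__ and __LINE__ in the node via a malloc() macro is what would make the dump actually readable, and realloc() needs the same treatment, but that's the general idea.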

The other problem is, there are multiple persistent lists. (Well, resizeable arrays really.) There's the environment variable list, there's the defined functions, there's shell aliases, there's job control, there may be some "set" debris from having changed our command line arguments (but the next set should free them if they're not pointing to environment space)... Each one locally cleans up after itself, but does not try to cleanup anything on exit because the OS does it for us. So a shell-local leak checker needs to traverse all those to see if each hunk is tracked in a known category that could still free it as needed (and thus hasn't _leaked_ yet).


September 18, 2023

Yes, the hancock wendy's also has pumpkin spice frosties. They taste more gingerbread than anything, but that's not a bad thing.

Distracted into looking at httpd.c (big surprise), the .htaccess file needs to be readable to the httpd process which means I have to special case check for and NOT show it. (And even if the permissions did something clever anyway I'd probably still want to do that because forgetting the permissions on a file shouldn't be a waiting land mine.)


September 17, 2023

Wendy's has pumpkin spice frosties, but the line in Jester Center at 1:30am was waaaaaaay too long to try it, and I got to programming in a quiet classroom until after it closed (at 4am). Maybe the one in Hancock Center is also doing them?

Sigh, trying to close tabs so I can reboot my laptop to swap memory (and do a full backup, and upgrade to the new devuan release), and one of the open windows is a thunderbird email reply to a message from the qemu-devel mailing list reminding me to confirm that the malta fix went in and worked so I don't need to patch qemu locally anymore when testing mkroot's mips build under qemu... and I can't build qemu anymore because it requires python >= 3.8 and Devuan Bulimia has 3.7 something. (And I still refuse to USE anything newer than 2.x because dude, seriously?) So that's a todo item with a blocking dependency, can't do it now.


September 16, 2023

Trying to close some old windows, an email reply I had open since last month (half-replied and buried under other windows) reads back to me like a cross between bragging and "old geezer repeating the same war stories everyone's heard over and over", so I cut and pasted the war stories here to my self-indulgent blog for posterity. [And then finished the thought where it trailed off, because that's what I do when editing and posting blog entries.] The context is needing to keep a bug in a project's bug system forever if you can't prove it _isn't_ your fault. My assertion is that _not_ tracking a given bug down is almost always a choice driven by resource constraints. And I know that because more than one of my consulting gigs has been somebody who's worked with me before flying in "that guy who root causes baffling intermittent problems that only SOMETIMES happen".

A few such war stories that aren't particularly proprietary or NDA:

  • "There's nothing wrong with your java process bringup, this other background boot process checksumming the firmware image for DRM reasons is causing enough memory contention to trigger the OOM killer and it's selecting your java app instead of the DRM process as what to kill... ah, don't tell management they can just have the background task add an madvise(unneeded) on the part of the mmap it's already read as part of its loop, you want to use that as political leverage, that part's not my problem..."

  • "Wow, we've actually hit a processor errata where it returns from interrupt using the WRONG REGISTER as the stack pointer which is why there's a stack frame clobbered all over this data structure out of the blue; turns out there's already a workaround for this in the vendor toolchain's libc (never use these two CPU instructions next to each other because an interrupt occuring between then is bad) but we're only using the vendor kernel with a vanilla toolchain, this patch should probably go upstream to the libc project..."

  • "Your network protocol timeout happens even when you make the thread realtime priority because another thread in your giant 80 thread hairball is calling fork() which copies and then discards the entire process' heap because it can only do the COW page table trick when you fork from the main PID=TID thread, so when you fork from a child thread it falls back to copying all the memory which happens under the page table lock and that takes about 75 milliseconds on this hardware and your thread's deadline is 4ms so yeah, lemme replace that fork() and every other fork() that ISN'T from the main thread with vfork() instead..."

  • A company mailed me a board last year so I could fix their drivers, and instead I diagnosed 3 hardware bugs, one of which was simple, one was "this chip doesn't do what you think it does, you want this other chip with a very similar number, yes they're pin compatible but way different, here's the backstory I just researched...", and one was "this trace you cut to fix a problem caused this other problem because you need to understand how USB device probing works and in THIS case how it's conceptually complicated by something called USB 'On The Go' even when you're not using it... so yeah, replace the cut with a diode and you're good." (Note: I am not a hardware person. They were mostly hardware people.)

And of course I do this sort of thing all the TIME in hobbyist work, and can even document it better (because talking in too much detail about employers' unreleased projects outside of work is AT THE VERY LEAST impolite), but blathering about hobby work is just a question of time/energy/focus/memory. Here's a week and a half long chase of a bug from a build script into the kernel and back out again (which is only memorable because of the detailed writeup): Wednesday, Friday, Sunday, Friday, Saturday, Sunday Monday.

Another random hobbyist example that has more than averagely explicit writeup is when I got annoyed that cursor up/down on an actual serial terminal in busybox vi behaved differently than using the same thing through ssh, and instead of "don't do that then" I reported it to the busybox list, and when that didn't help I tracked it down and worked out how to fix it. I wasn't an expert on terminal escape sequence collation, I was just annoyed and capable of reproducing the problem while sticking fprintf(logfile, "here\n") into the code. (Because printing to stdout or stderr would heisenbug the problem I was trying to track down, but writing to a file didn't.) You can see the fix and some follow-up in the relevant section of the git log, the mailing list thread descending from that report above is pretty detailed but sadly mostly me arguing with everybody about what's actually going on, which is one of those cases where I'd rather I HADN'T been right but alas was. (The moral of that story is all wrong! I _want_ the community to be smarter than me and to teach me something new! I learned buckets from the busybox community back in the day! Alas, ex-maintainer Erik Andersen wandered away from the project for multiple reasons (post-Lineo he had his own startup which led to overwork which led to marital issues), I myself was chased away from the project by Bruce Perens, Manuel Novoa got really bitter after his girlfriend (wife) died, Glenn McGrath burned out trying to do his own GPL enforcement in australia, Mike Frysinger started treating open source as a teaser to get you to buy the "real" proprietary fork of projects he was being paid to work on (prominently blackfin, which resulted in that architecture getting removed from the vanilla linux kernel in 2018 (commit 4ba66a976072) because no work had gone upstream in so long the community around it had collapsed), Vladimir Oleynik refused to learn english (Russia seems to produce two kinds of engineers, those who move out of the country like Kir Kolyshkin and those who vanish into it like Vladimir did)... Alas the community that brought me in to busybox development back in the day broke apart and drifted off...)

Anyway, enough war stories. The point is I can't think of a "great white whale" bug I could never solve (or at least prove to my satisfaction was the fault of other code I could avoid using and didn't care enough about to fix for those who did). The question is always "am I going to spend my anger at the world not being how it should on this, or on something else right now". Fortune 500 companies have budget allocations, but cleaning up loose ends that bother them is what hobbyists DO.

Except I've already got far more loose ends than I can clean up. I entirely understand "I can't drop everything and go medieval on this bug's ass for as long as it takes right NOW, I'm BUSY", and I also understand "I don't have the environment this was seen in, I cannot reproduce your bug". And of course deciding it's not worth fixing (we can live with it or declare it a feature). But I tend to treat "WTF?" bugs that happen to me personally as a crime scene until I've got an isolated and minimized reproduction sequence: nobody move, cordon off the area, start collecting evidence to theorize with. Do NOT let it escape under the fridge, it will be back with friends at 3am. Once I've got a reliable reproduction sequence and vague theory of what's going on I can squish it at leisure, but the point is you can't let it GET AWAY. Yes, this approach is IMMENSELY time consuming (and why I _cannot_ use windows). And probably not entirely by choice, because I'm very bad at NOT doing it. (Call it a luxury or something akin to laziness if you like, I suspect it's a symptom of ADHD.)

It's not that I don't respect other people having a different mindset and/or choosing to prioritize differently. My point was just that it _IS_ a choice.


September 15, 2023

If you're wondering why I keep putting the [citation needed] after Wikipedia, the current edition of their page on the tsort command says "As of 2017, it is part of the Posix.1 standard" (what did 2017 have to do with anything, it's currently 2023. It was in the 1997 version of posix from 25 years ago). Then it says "According to its info page" which is just insane: info was the GNU project's proprietary documentation format nobody else uses, and the GNU project was a failed 1980's unix clone that has nothing to do with modern unix and nothing to do with original unix either. And then the wiki says the FreeBSD manual page dates tsort's appearance to Unix V7 which... we _have_ Unix v7, the Henry Spencer archive for example, so couldn't you use a primary source instead of referencing the FreeBSD manual for a statement about non-bsd? No? Of course not...

Alright, implementing the tsort dup list (which is an optimization, but the algorithm's O(N^2) otherwise): it needs start/end, and then it needs two alternating "last" entries that march down from end each time through the outer loop. Does the outer loop's stop condition (cycle detection) care if we _output_ anything (because duplicate suppression) or if we didn't _remove_ anything? Can we have a pass that removes a dependency on something that's already been output and thus suppressed? No, because only depends-on-self nodes could have already gone out if a node depended on them. The closest we can get is a backwards march "a b c a d c e d" producing "e d c a b" and that still needs to output at least one new unsuppressed string each pass.


September 14, 2023

Sigh. It's hard to get good information about china because of biased reporting. The sources that translate accurate, up-to-date information into english are often selectively disclosing a biased narrative.

China's even more regional than the USA is, with all the politics that implies. (Good source but also biased.) Here in the USA our "northern" and "southern" states fought a civil war a century and a half ago, and our east/west coasts are separated by "the midwest" which is really the great lakes area (the far less populated expanse from montana down through new mexico is something else entirely, and only matters in contexts where empty land gets to vote instead of people, such as the Senate). The point is, neither "Utah did a thing" nor "California did a thing" are necessarily particularly representative of the country as a whole.

The current leader of China, Emperor Xi of the Communist Dynasty, is from china's north up near the mongols and the great wall (like most of china's historical military aggressors who periodically conquered the rest of the country and ethnically cleansed all the non-han ethnicities; it doesn't guarantee "brutal stabby man" but rising through the local power structure there is a bit like rising to prominence in the entourage of the current Governor of Florida: there are connotations).

The second largest power base (and thus Xi's opposition faction) is centered in Shanghai, the port city whose name became a verb in the west ("shanghaied").

Hong Kong isn't really a power center in chinese politics because it was foreign-owned until recently, and Shenzhen was just a suburb of Hong Kong (just across the border in CCP-owned territory) where young Igors would gather to implement the designs of Hong Kong's foreign-trained mad scientists who hadn't had the very concept of "asking questions and trying to understand how the world works" beaten out of them by China's obedience-centered school system. (Which is why China's domestic efforts always need imported foreign tech to copy; the mindset that allows innovation gets you killed under the system Chairman Mao set up. Not hyperbole. All the chinese tech companies are using designs from Hong Kong or Taiwan, or technology "transferred" from western firms, because growing up doing bible study on Mao's Little Red Book then becoming a scientist is like studying to become a geneticist at a creationist bible college.)

In the 1990s the british were guilted into handing Hong Kong "back" to the CCP, which is a bit like Spain handing Florida "back" to the United States when they took it from the native americans. What's left of China's previous government fled to Taiwan, although Japan's invasion in the 1930's deposed the last emperor of the Qing dynasty so the "Republic of China" was a bit like the Bolshevik revolution in Russia deposing the Czars and then all THOSE leaders getting lined up and shot by Stalin a few years later. Yes, same Trotsky who led the Red Army when they fought the White Army... look, fascists tend to take power messily. The heap 'o skulls they build a throne out of comes from multiple sources, divide and conquer means allying with a faction to dispose of another faction, then switching sides, rinse repeat until no more factions but yours. They purge their own ranks by peeling off 10% of their base to be Enemy Du Jour until nobody left has any power except the dictator. It's betrayal all the way down, that's the only way an ignorant hick like Mao got the authority to impose idiocy like the "great leap forward" and "four pests campaign" on a pacified populace, because everybody who could say no had been lured out of hiding and killed repeatedly over several decades first.

Anyway: Shenzhen is a cash cow to be milked, but not a historical power center with any kind of political base, because it first rose to prominence under the current repressive system and was never allowed to make its own decisions.

So Emperor Xi hated Shanghai, it was the historical power base of his opposition, and Xi used the pandemic to punish his opposition. The harshest most draconian covid lockdown measures were imposed on shanghai. That's where people were welded into their apartments to literally starve to death, and where the economy collapsed completely without recovery. By the end of the pandemic, Xi had reduced the opposition faction's power to the point where he could kill their leader without reprisal, and he never rebuilt the city after that because he thinks it would be handing power back to his enemies.

So every time I see more coverage of how bad Shanghai is doing... guys, it's INTENTIONAL. The leader of china MEANT TO DO THAT. It's entirely possible he succeeded too hard (lots of splash damage in China's larger economy), and entirely possible Xi (like Putin) has retreated into a bubble of yes-men completely isolated from the outside world so he honestly isn't aware of large swaths of his country collapsing around him, but using the condition of Shanghai as proxy for China's economy is like using Prigozhin's fate after the Wagner coup as proof of the predicted decline in Russian airline safety now they no longer have access to western replacement parts. That's not what happened. The fate of Detroit and New Orleans under the Dubyah administration was an interesting data point about the USA under the Dubyah administration, but 20 years later the larger country has yet to collapse because Dubyah was being RACIST: 80% of Detroit's population was black and 67% of New Orleans population was black. That's why a republican administration destroyed those american cities, because the GOP is the KKK with a public relations staff. When Houston (23% black) flooded under Hurricane Harvey it got rebuilt.

The data sources talking about pain in Shanghai are not WRONG. They're not LYING. But they are putting two unrelated things next to each other and implying a causal connection, and it's really hard to put together a mental map from the snippets we get out of context. I think china probably IS doomed, but what "doomed" means in a context this big... The Soviet union completely collapsed around 1990 and Russia was a laughably failed state for a while there... and yet that area of land still exists and still has people on it doing things with leftover equipment from back in the day, which still make rather a lot of news on the international stage today. Biden's approach to the Ukraine invasion seems to be intentionally dragging things out to grind through the old soviet stockpile that Russia can never replace, because it came from the dozen countries like Poland and East Germany that Russia captured from the nazis during World War II with the help of United States lend-lease providing them endless high-tech equipment to conquer territory with, and was then allowed to keep afterwards. (The USA gave back Japan and Germany, the Soviets held on to everything and tried to conquer more.) Without that empire of vassal states providing tribute, Russia is just an oil exporter like Iran, Iraq and Saudi Arabia. In 2020 Russia's top five exports were all fossil fuels, followed by gold, iron, wheat, fish, and ammonia (made from fossil fuel). Without the dozen other conquered nations of the Soviet Empire, Russia isn't very interesting.

But despite repeated collapse (Russia defaulted on its debts in the late 90's amid the Asian Economic Crisis Japan is still recovering from) Russia still technically exists, and China is similarly unlikely to just suddenly stop. I dunno what's going to happen. I'm trying to find out...


September 13, 2023

I need to leave myself little explanations at the top of each modified sh.c with what problem I was trying to address. Right now I'm trying to figure out which of my modified sh.c instances has the work I was doing to fix the thing where a line of HERE document ends with a backslash so even though the next line is the EOF indicator it doesn't _count_:

$ cat << EOF
> abc\
> EOF
> EOF
abcEOF

Sigh: on my phone, <pre> blocks like the above are in an absolutely TINY font, even though they look fine in debian's chrome on my laptop. No, I'm not specifying a font size change. No, I haven't got a stylesheet. It's a bug in the default android browser that showed up several years ago now, which nobody at Android has ever cared to fix. When it switches to the monospace font the size of the previous font does not transfer over, and is instead set to "microscopic" until the </pre> tag pops the font stack. Obvious, glaring bug, which I complained about back before the pandemic (although it's years older than that), and it's still there in the chrome version I updated to this morning (with the client identification DRM and the new ad tracking you have to switch off in 3 places) because of the new charged vacuum emboitment. Oh well. (Dr. Who used "CVE" back in the 1970s, it's either a Brontosaurus or I'm right on this one.)

Anyway, sometimes I work around that android display bug by doing a paragraph tag inside the blockquote and individually terminating each line inside with <br /> instead of using a pre tag. (The trailing slash is xml's way of saying this tag is self-terminating, and I know html isn't xhtml but if you're going to make me add </p> tags I'm gonna self-terminate the standalone tags on general principles. Paragraph tags have a blank line between them, line breaks do not.) But that's enough extra work I usually don't bother. (No idea how it renders on iphone, I should track down somebody with one and ask.)


September 12, 2023

Snuck a peek at busybox tsort.c to see if I got it right, and I actively don't want to KNOW what xrealloc_vector_helper() is doing. And I've forgotten what FAST_FUNC is. And they seem to be using some kind of tree structure instead of just an array? Nope. I'll stick with my naive, uninformed, possibly inefficient implementation that isn't named after anybody. Send me test cases that break it if you care that much...

Sigh, my tsort is outputting stuff in a different order than debian's and busybox's tsort. It's not WRONG: topological sort is not unique, you just need something that satisfies the constraints. But the main difference is I'm peeling out circular entries first and printing orphaned second entries when I print (and discard) the unpinned first entry, so "a b c d f f d e" through debian or busybox gives "a c f b d e" but through mine gives "f a b c d e", which isn't WRONG. (a is before b, c is before d, d is before e, and f can go anywhere). If I add a constraint so f _can't_ go anywhere (a b c d f f d e d f) you get "a b c d e f" from mine and "a c b d f e" from the other two. Again: neither is WRONG.

But the inconsistency does make testing harder. If my test and TEST_HOST do not agree on the result, what is a "right answer" to compare against in the test? I'm not testing for canned results, I'm testing for reproducible correct answers that I can hopefully get from other implementations. This makes writing tests WAY HARDER than just "I eyeballed this as good once, make sure it didn't change". Nobody said my eyeballing was correct! "Other implementation also did it" is much more reassuring. At least we're CONSISTENTLY wrong...

Hmmm, do I really need to peel out the circular entries first? It smells like I shouldn't, but it's a special case. Nodes that depend on THEMSELVES don't count as a cycle, but when there are multiple pairs that depend on the same string the binary search may return a pair other than this one as the answer to "find somebody who depends on this". So the depends-on-self pair needs to be removed when encountered, THEN we check if any of the remaining pairs depend on it. Hmmm. Maybe with careful ordering... The thing is, all the depends-on-self pairs should be removed in the first pass. Whether or not they're printed then is a separate issue, but if they're NOT printed when removed then something else depends on them and THAT pair is responsible for printing this string when enough dependencies are satisfied that it can eventually be removed. I hate having the strcmp(a, b) as part of EVERY pass through the list. It can only trigger on the FIRST pass through the list.

Alright, I can make the initial collection do the strcmp(), and then set pair[0] = pair[1] so I can do (cheaper) pointer comparisons instead of strcmp(). Doesn't matter from an allocation perspective, this is readfd() doing one big malloc() to hold the input data (actually a realloc() loop but details: one big heap allocation) and then a second malloc() holding the pairs[] pointer array. We read the data into memory, do a pass over it to count words, malloc the pairs list, do a second pass over the data to fill out pairs[] and null terminate all that whitespace (newline or space doesn't matter, it separated strings and now terminates strings, readfd() automatically adds a single null terminator at the end of the file it read in, properly allocated and initialized but not included in the returned length)... anyway, I can throw an if ((len&1) && !strcmp(pair[len], pair[len-1])) pair[len-1] = pair[len]; in that second pass and that means the strcmp() only happens on one pass through the list, not every time. (Well it was bothering me.)

My algorithm is assembling an array of string pairs sorted by second element. (Busybox did an insertion sort, but I just did loop and qsort() out of libc.) Then we loop over the pairs and do a bsearch() for the first string in each pair to see if anybody else has it as their second string, meaning that other pair depends on this pair. (Happily bsearch() uses the same sort function as qsort() did, yay code reuse.) If something depends on this pair, leave it alone and continue the loop. If we get to the end of the loop without finding any loose ends, what's left is one or more unprintable loops, and we error out on the circular dependency, printing the first loop if I'm feeling fancy. (I should implement that. Right now it's just error_msg("loop containing %s\n", pair[0][0]) and let the user figure it out from there.)
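
Here's the shape of that plumbing as a minimal sketch (names are mine, this is not the actual tsort.c): one comparator keyed on the second string serves both qsort() and bsearch(), so "does anything depend on this?" is a single binary search.

#include <stdlib.h>
#include <string.h>

// Each pair means str[0] must be output before str[1].
struct pair { char *str[2]; };

// Sort and search by the SECOND string: the dependent one.
static int by_second(const void *a, const void *b)
{
  return strcmp(((struct pair *)a)->str[1], ((struct pair *)b)->str[1]);
}

// NULL means nothing still depends on "name", so it's printable.
static struct pair *who_depends_on(char *name, struct pair *pairs, size_t len)
{
  struct pair key = {{name, name}};

  return bsearch(&key, pairs, len, sizeof(*pairs), by_second);
}

Setup is one qsort(pairs, len, sizeof(*pairs), by_second) after reading the input, and then every lookup in the removal loop is the bsearch() above.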

When the bsearch() returns NULL we've found a printable first element, so copy the pair to a local variable char *keep[2]; and remove it from the list, which is just a memmove() and decrement len. We then iterate through the list of saved strings we've already printed this pass through the pair list (to kill duplicates), and if it's not found in there we both print it and add it to the duplicate list so we won't print it AGAIN.

Then, since I've removed this pair from the list and am about to discard it, I bsearch() for the SECOND entry to see if anything depends on that. If we depend on something that nothing depends on, it can go out now too! If something else does depend on it, printing it is the other pair's job so we can just discard it. So if we should print it, do the same duplicate-suppressed print for the second string as the first string.

I put the duplicate suppression list in the space at the end of the pair list that we moved the entries down out of in the earlier memmove(), back when we removed the element from the list and copied it to the keep[2] local variable. That left free space, and the duplicate list can never have more entries than we removed (because that's where they came from). And once we've traversed to the end of the pair list, we can discard the duplicate list because nothing in it can still be in the pair list...

Nope, that's wrong: echo f a c f | ./toybox tsort | xargs printed c f f a because "f a" couldn't be printed on the first pass (since "c f" depended on it), but when "c f" was yanked nothing depended on f so it went out, and then the duplicate list got cleared before "f a" went out.

Grrr, the duplicate list needs to survive one extra pass through the pair list. That's awkward. (Test cases! So many test cases! Object lifetime rules! The classic saying "The two hard problems in computer science are naming things, cache invalidation, and off by one errors" comes up again: object lifetime tracking is hard even when you're NOT trying to keep two copies of the same data in sync.) Alright, progressive deletion? (For a definition of "deletion" that's just moving an "end" pointer up, but still.)

I miss my 30's. I could keep all this in my head at once and it would still be THERE half an hour later. This is a small enough algorithm I'm not actually having that much trouble with it: the actual tsort.c code minus comments and whitespace is currently 48 lines, 10 of which are opening or closing curly brackets on their own line. (Yeah, "minus comments" knocks out the entire header block with the menuconfig info and help text and such. Add back the NEWTOY() line I guess for 39 load-bearing lines. And I still need to fix the duplicate entry bug I just realized in the last paragraph.)

But oddly enough, this one is easy to code and REALLY HARD TO EXPLAIN. Or at least the explanation is several times longer than the actual code.


September 11, 2023

Added a ts -m option to append milliseconds to the time (since the darn strptime() escape format doesn't handle fractional seconds because neither time_t nor the broken-down struct tm do) and switched it to fetching time with millitime() and got all the way through until... millitime() returns uptime, not unix time, because that's what clock_gettime(CLOCK_MONOTONIC) returns. Which is _better_ for ts -i and -s but not what ts without those needs. Sigh, EXISTING API (of ts) SUCKS.
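
The distinction is just which clock you hand to clock_gettime() (a sketch, not toybox's actual millitime()):

#include <time.h>

// CLOCK_MONOTONIC counts from boot, which is right for -i/-s interval
// math; CLOCK_REALTIME is wall-clock unix time, which is what plain ts
// needs. Both resolve to nanoseconds, truncated to milliseconds here.
static long long mtime(int wall)
{
  struct timespec ts;

  clock_gettime(wall ? CLOCK_REALTIME : CLOCK_MONOTONIC, &ts);

  return ts.tv_sec*1000LL + ts.tv_nsec/1000000;
}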

I've been walking to UT and back fairly regularly the past week and change, which adds 10k steps to my day (about 4 miles) and is good for my health... except the Wendy's in Jester Center has finally recovered from the pandemic and is open until 4am again. Heading back home this morning they were still open as I headed out, and I got a 4 for 4. (Which this location still has because they do not offer kids meals. I asked.) Pretty much balancing out the calories of the walk right there. Eh, win some...

Hmmm, I could have the help text decompress into a 128k malloc buffer. Right now defconfig's help text is 82732 bytes and adding all the pending commands brings it up to 108573 bytes. If I move the horrible #ifdef salad into pending.c, and then the rest of the help text doesn't change at all because the decompression works on the same kind of buffer it has now...


September 10, 2023

HEB is selling 3-packs of 32 gig USB sticks for $12. (It rang up as $21 but they corrected it when I showed them the price on the wall, which is conveniently near the self-checkout registers. My habit of twisting the package off the hanger instead of calling an employee to unlock the little thing was not commented on, since I was in the middle of buying it and all.)

My old USB sticks are all ancient and terrible: several are dead and the rest are 1 or 2 gigs, with the occasional 8 gig acting all big. On the one hand 32 gigs is tiny by modern standards, on the other hand it's big enough to be pretty useful.

There's a neon orange one, a neon yellow one, and a neon red one, in transparent cases so you can see the circuit board inside. Gotta do something to stand out from the crowd, I guess... Ooh, interesting. It's formatted with a FAT variant that maxes out at 4.2 gigs instead of 2.1 gigs for an individual file. Um... yay I guess? (Is that how all vfat works or is this that exfat I keep hearing about, or...?) This means if I _do_ get another tiny little USB cube server (or finally get a raspberry pi working), I could run VMs attached to 4 gig ext2 mounted loopback images on USB stick providing scratch space for a mkroot build without worrying about burning out the built-in flash. We've recently established that 2 gig images may not be enough for current gcc, because gnu. I mean yeah, technically I could do it with network block device mounts too, but A) what would the NBD be served _from_ (needing a server that can stay up is the point of the exercise), B) I suspect if the network goes down for an hour, a kernel using NBD mounts might get unhappy in a way that requires physically poking the hardware to reset it, and I want a server I can leave up for weeks and use from another state without worrying about that. Anything can break, but avoiding a known sharp edge is mental load. Don't want to have to worry about it.

Writing to the USB stick drained my laptop's battery noticeably fast. At the start of copying that 4.29 gig file the battery was 97%, at the end it was 94%. The copy took 6 minutes 25 seconds (I ran it under "time"), just over 11 megabytes/second write speed. Eh, that's acceptable.


September 9, 2023

Had to look up what it was the qemu loons replaced "-hda block.img" with again (it's "-drive format=raw,file=block.img"). Note that kvm --help (qemu --help was removed) has one instance of "hda" and 17 instances of the word "drive", and the -hda argument takes exactly one argument while -drive takes 31 different comma separated keyword=value options like "detect-zeroes=unmap,iops_rd_max=irm,group=g" (the --help output does not provide further information about what any of those _mean_).

How did an open source project get this bad? Because bureaucrats took over. Story time!

Thirty-ish years ago Beowulf clusters based on Linux, networking together groups of cheap PCs, started seriously eating into IBM's mainframe market. So IBM retasked a lot of old white men doing cobol on punched cards on big iron to instead do Linux, and they took over "xen" development (an ugly hypervisor technology people like me were trying to avoid at the time), which was rendered completely irrelevant by "KVM" so IBM's guys took over KVM development, which was based on QEMU so they took over QEMU development. The stench of bureaucracy drove away QEMU's creator Fabrice Bellard, so these days QEMU is maintained entirely by punched card whallopers who think any technology that DOESN'T require at least three full-time employees per install is leaving money on the table, so of course it needs a mandatory configuration file written in its own language that requires you to take courses from IBM and get a certification in. (Like JCL.)

Yes my first job out of college was doing OS/2 at IBM, but the Boca Raton facility that had created the PC was an oasis like Xerox PARC (only with less structural protection from the surrounding bureaucracy, a tide pool instead of a walled garden). The Boca people made fun of the Poughkeepsie people even back then, and IBM destroyed the Boca Raton facility a few months after I got there (which is how I wound up being "site consolidated" to Austin), and although the Linux Technology Center they started in 2000 was in the 900 buildings on the east side of Burnet where they'd dumped the OS/2 guys (starting their Linux development with Boca refugees only 5 years after being mixed back into IBM proper), IBM's Big Push Into Linux was really a Sam Palmisano thing and he didn't last.

IBM's 1980s implosion took place under two CEOs (both interchangeable white men named John, one "Opel" and one "Akers"), and then a guy named Lou Gerstner brought it back from the dead starting in 1993 and made the company relevant again for a while. He fell on his sword when the dot-com crash happened, handing off to a guy named Sam Palmisano, who basically inherited a todo list from Gerstner in 2002 and Did Those Things, one of which was "spend $1 billion/year on Linux". (It was one of those "Sun/Microsoft are killing IBM but Linux is killing both of them faster" rock-paper-scissors things.) When Sam reached the end of Lou's roadmap in 2011, he retired. And handed the company off to Ginni Rometty, a bloodless beige accountant from central casting who proceeded to cost-cut the company to death, including eliminating the entire R&D budget. (Robert Cringely, who did the "Triumph of the Nerds" PBS miniseries on computer history, chronicled the fall and even wrote a book about it.) But it turned out Ginni did actually have a plan: all her bean-counter cost-cutting briefly juiced the stock so she could use it as monopoly money to buy another company that DID have a future: Red Hat, I.E. Pointy Hair Linux. She burned old IBM to the ground to buy a company that HAD briefly understood Linux 10 years earlier before ossifying.

Red Hat was one of the first Linux distributions to be run by someone who understood marketing, and during the dot-com boom of the 90's he built it up until it was big enough to have an IPO in 1999. And the consultants Red Hat brought in to handle the IPO explained to Red Hat's founders (Robert Young and friends) what Sun Microsystems actually DID for a living: exploit a quirk in really big procurement contracts. When people bid to sell Very Expensive Things to governments or Fortune 500 companies, the contracts have piles of legalese restrictions as the dinosaurs try to protect themselves in a way that winds up costing them even more money. A common stipulation is to cap a vendor's maximum allowed profit at a percentage of the cost of materials... which means the vendor specs the most expensive possible materials. If somebody putting together a system for the U.S. Navy can only mark up the operating system running their new device by 10%, management WILL specify a $5000/seat Solaris license they can make $500 profit on instead of a $29 retail boxed copy of Red Hat that nets them $3, even if the engineers building the system would much rather use Linux than Slowaris. And Red Hat went "Wait, if we find an excuse to charge way way WAY more for the same thing, there's a class of customers that will buy significantly MORE of it?" And they hallucinated up some marketing bullshit to create "Red Hat Enterprise", and the company went from something like $15 million annual revenue to over $100 million just in time for that IPO, and "the tail wagging the dog" situation resolved itself with the company being sucked entirely out of the retail market because all their engineers were spending all their time being 24/7 grape peelers and fan wavers and fluffers at the enterprise side (there's nothing for you to DO but we continually reassure ourselves that you are ON CALL just in case with endless busywork which pays REALLY WELL), which created a market vacuum for "actually usable Linux distro that the open source community actually creating Linux can use", which Ubuntu stepped into around 2004-ish.

Why did it take so long? In January 2001 George "Dubyah" Bush and Dick "Dick" Cheney caused the dot-com implosion, because in the run up to the supreme court overriding the results of the "hanging chad" election they'd lost at the start of November 2000, Dubyah and Darth Cheney were out giving stump speeches about What We Will Do Now That We've Totally Won Of Course It Will Be Us Don't Look Down Just Keep Walking (which all the news stations covered because the unresolved election was THE big story) and the CONTENT of the speeches was all "We're going to give away piles of money to billionaires, explicitly undoing the balanced budget that Clinton left us with and running the national debt up to never before seen heights in the name of creating an oligarch class that can afford to kidnap harems off the street and hunt peasants for sport." And everybody went "but WHY???" And their answer was "Uh... the economy will totally collapse if we don't, giant tax cuts for the rich are the only way to save it." And this was INSANE, nobody else had heard even a whisper of an upcoming recession before then: we'd won the cold war and invented the internet, everything was going GREAT and the big worry was overheating causing inflation because the economy was doing so well there was nobody left to hire. (I had three jobs at once during this time period: day job programming, teaching community college courses at night, and writing a stock market investment column.) But the corporate decision makers heard this "tax cuts because recession" speech repeated over and over for weeks, and went "well if you say so, we'll tighten our belts in the new year's budget to save up a rainy day fund just in case you know something we don't". But they couldn't cut inventory because sales were through the roof and besides they'd already done that (all the just-in-time delivery shenanigans, and things like Coca-Cola spinning out its bottlers as a separate company so the manufacturing and distribution facilities weren't on the books of the company that sold the syrup... that was all 80's and 90's developments; inventory was the denominator in the "cash conversion cycle" figure that investors started looking at when P/E ratios went crazy and nobody understood the businesses they were investing in enough to do proper discounted cash flow analysis). And back then it was common knowledge that cutting R&D spending is highly disruptive introducing huge bubbles in the development pipeline (a one month disruption can cause a two year delay sort of thing; this was back when the USA was still DOING a lot of R&D so we still understood how it worked, the big "outsource all our thinking to india then china" stuff came later)... But there was one expense you could switch on and off like a light: advertising.

So everybody just didn't budget to buy any advertising in Q1 of 2001, to bank up money for the recession Bush and Cheney assured them was coming... And it turned out that magazines and television and websites all had the same revenue model, paid for by advertising. It wasn't a BAD revenue model, multiple companies had survived doing that for centuries... until the entire economy suddenly stopped paying for ads at the same time, and then there was splash damage.

Ms. Magazine was 40 years old and McCall's magazine had been around for a century, but both folded in 2001. 3 of the 5 television networks (ABC, NBC, CBS, FOX, UPN) at the time ended the year in the red, despite cancelling most of their series and replacing them with cheap Reality TV where instead of paying professionals to produce shows, some cameras follow random people around for a while (possibly inside a Spirit Halloween or similar) and then editors cut together a story after the fact like a sculptor with a block of marble. But this collapse was mostly known as "the dot-com bust", because it ended the dot-com boom HARD. The most common website business was "online magazine without the printing and distribution costs" (people wrapped fish in newspaper because buying newspaper is cheaper than buying the blank paper it's printed on, due to the advertising subsidizing the material, and who worried about toxins in the ink back when everyone was breathing tetraethyl lead from gasoline exhaust?).

All the websites supported by advertising were just GUTTED in January 2001. That's how Bush and Cheney triggered the dot-com crash, by cratering the advertising market with a self-fulfilling prophecy about recession to justify their tax cuts for billionaires.

1/3 of the dot-com businesses were always doomed, but ordinarily their collapse wasn't synchronized. Another 1/3 needed time to establish themselves, either by growing to a profitable scale (Amazon and Tesla lost lots of money for many years before turning a profit) or "finding themselves" (the way Twitter started via SMS texting but pivoted to web-and-app, or how Flickr started as an online game but pivoted to photo sharing when that's what its users actually spent their time doing with its service, or how Youtube had to survive the lawsuit from Viacom). And the remaining 1/3 of the dot-com businesses already HAD a sustainable business model, but what does that matter when your customers suddenly go away? A friend of mine was on the Board of Directors of VA Linux: 3 of their 5 largest customers went into Chapter 11 in the space of 2 weeks, owing them money for hardware that had been delivered on vendor-financed credit paid off a little each month (server payments like car payments). Now those payment claims were tied up in bankruptcy court and VA might some day see pennies on the dollar, years from now if they were lucky. AND those customers were gone and wouldn't be buying any more machines, so future sales were looking dismal. AND that shiny new hardware the customers had been using was all getting auctioned off for pennies on the dollar at bankruptcy liquidation sales, so any surviving customers wouldn't need to buy anything new at retail price for YEARS. That's why VA exited the hardware business, and it wasn't just VA. Dell laid off 17,000 people in Austin in 2001, Intel idled its fabs when it ran out of warehouse space to store chips nobody was buying... the splash damage rippling out into the rest of the economy was brutal.

But the "online magazine" style dot-com companies were at ground zero of George W. Bush's stupidity. I was working for The Motley Fool until November 2000 when the vibe got Really Uncomfortable (management was STRESSED), and it was just no fun anymore, and I handed in my notice effective at the end of the year. The Fool's revenue fell 50% between Q3 2000 and Q1 2001, and they had an all hands meeting (I still got the emails) and laid off 50% of their staff to cut expenses in line with revenue. They were well-managed and had outlets other than the web (newspaper column, radio program, hardcover books...), and thus survived, but it also killed what was unique about them and they became just another stock market investment site...

So backing up: the delay between Red Hat eating Sun Microsystems' business model and Ubuntu stepping into the retail Linux market vacuum was partly due to the dot-com bust. Red Hat pivoted HARD to the enterprise market, and over the next ~15 years Red Hat turned into Pointy Hair Linux, the operating system equivalent of filing everything in triplicate, and that's what Ginni Rometty bought when she zerg rushed IBM at a big acquisition to get a replacement business model.

And THAT is the IBM that took over QEMU development and pushed out all the hobbyists. The IBM made of dead wood that was too expensive to fire (and who didn't head for the exits back when David Niven threw his hat into the fire after Ginni had burned all the furniture and pried up the deck planks heading out into deep ocean as fast as possible), combined with the portions of Red Hat that survived a decade of ISO-9001 certification audit training update preparation meeting pre-meeting scheduling conference catering budget review email reply-all sessions.

And that's why qemu's -hda option is broken, and why it's easier to maintain a local patch than try to argue about it on the list.

Apparently a dude named Arvind Krishna took over Red IBM Hat in 2020. Literally all I know about him is the quote from the statesman article about wanting to replace 30% of his employees with AI. Oh, did I mention that the IBM Austin facility they merged the Boca Raton developers with was itself dismantled? Their hardware manufacturing in Austin (used to design and make PowerPC chips) was sold to build a shopping mall called "The Domain", and what's left on the east side of the road is currently being sold, and towards the end of that article CEO du jour says he looks forward to replacing 30% of IBM's remaining workforce with basically ChatGPT. So that's nice.

Anyway, "-hda file.img" is really simple, so it had to be deprecated in favor of -drive,argument,argument and I am sad.

Open Source projects only avoid being embraced and extended by bureaucracy to the extent a motivated hobbyist is willing to fork them, or to reimplement them from scratch. A project that is good enough to prevent competition is susceptible to frog boiling, and I dunno how to fix that. It's not technical, it's social.


September 8, 2023

I received a "ts" submitted via the list, which is... from something called moreutils? Debian's repository has over 70k packages and yes this is one of them, but... why? Then again, busybox added it in 2019. (Is this an argument _against_ adding tsort support just because busybox added tsort? Sigh...)

The busybox version works on integer seconds, and the timestamps in dmesg show microseconds. I am sad that neither strftime() nor date(1) seem to have caught up to the idea that computers are fast enough to measure fractional seconds these days. No "fraction of a second" field in any of the print commands.
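
So the workaround is gluing the fraction on by hand, since struct tm bottoms out at whole seconds. A sketch (function name is mine, and it assumes milliseconds are enough):

#include <stdio.h>
#include <time.h>

// Print "HH:MM:SS.mmm": strftime() formats the whole seconds, then the
// milliseconds come straight off the timespec.
static void stamp(void)
{
  struct timespec ts;
  struct tm tm;
  char buf[32];

  clock_gettime(CLOCK_REALTIME, &ts);
  localtime_r(&ts.tv_sec, &tm);
  strftime(buf, sizeof(buf), "%T", &tm);
  printf("%s.%03ld\n", buf, ts.tv_nsec/1000000);
}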

I also wanna specify digits of precision, because milliseconds are plenty for humans and nanoseconds are the best the machine sees. And no the machine's not gonna see MORE than nanoseconds even with a 5ghz clock, the page fetch latency from DRAM is still gonna be dozens of nanoseconds. The closest two lines I can spot in my current dmesg are 3 microseconds (3000 nanoseconds) apart, and those are adjacent printk() statements in the kernel. Back in j-core's "setting our clock from GPS" days we had a thermally stabilized clock in a little electric oven under styrofoam and it would still drift ~8 nanoseconds from the (not human perceptible!) breeze any time somebody walked by the desk. The air conditioner turning on, the door closing across the room... (Microsemi sold an atomic clock on a chip, but it cost WAY too much, ate a processor's worth of electricity, and the lead time for ordering one -- nobody actually _stocked_ it, gotta get it from the manufacturer -- was 22 weeks back BEFORE the global pandemic-induced chip shortage. Maybe the tech has improved since, but there was basically no demand for it at the time.)

So yeah, milliseconds and nanoseconds are useful, so of course somebody added a whole "microseconds" ecosystem which is neither fish nor fowl. Why did they do that... oh hey, historical explanation. Still a bad thing.


September 7, 2023

Sigh. There's a long thread ongoing on the coreutils mailing list about posix creating an unnecessary alias for the printf command's %b option, which the bash maintainer has already declined to go along with so it's kinda moot, and I have written at least three replies that I then deleted instead of sending. I am RESISTING talking about how the standards committee that removed "tar" in favor of "pax" over 20 years ago (and STILL hasn't admitted nobody followed them over that cliff) should stop trying to demand changes of existing code. A good standards body should document, not legislate.

But I am sitting on my hands and not Doing Flamewar. Nope. There's a dial-in zoom thing for the posix bureaucracy to talk about it later today, which I am Not Dialing In To either. (I've circled back to a night schedule again so it would take half an energy drink of caffeine to stay awake for it anyway.)

Oh hey, bash's builtin printf has %n which assigns to environment variables. I need a MAYFORK on printf.c to implement that. Throw it on the todo list...

Checking if busybox tsort handled the echo a a a a | tsort edge case, I ran "make distclean defconfig busybox -j $(nproc)" and it barfed saying it couldn't find fixdep. (Sigh: not my problem, build it single processor... And yes, "echo a a a a | ./busybox tsort" did NOT consider that a loop, but output just "a" like it's supposed to.)

Posix's "tsort" command is a HORRIBLY DOCUMENTED simple dependency resolver which takes whitespace-separated pairs of inputs describing dependencies, meaning the first entry in each pair must come before the second, and outputs a "topologically sorted list" which contains each unique input once in an order that obeys all those before/after rules, ala:

$ echo a e b a c a d c | tsort
b
d
c
a
e

The resulting list has a before e, b before a, c before a, and d before c. If the list contains circular dependencies, tsort errors out and shows the cycle, although 'echo a a a a | tsort' just shows 'a' because depending on yourself is not a cycle. (Special case!)

My first pass through posix had this in the "uninteresting" bucket but busybox added it last year. I didn't notice at the time but my every few weeks check of the busybox list folder (yes I'm still subscribed) had a memory leak fix for the command, and digging back to the original submission somebody was using it as a dependency resolver for their init script ordering. Ok, sure.

I'm not gonna look how busybox implemented it because gpl. (I'm comfortable going back to look at the 1.2.2 version I _released_, but this code was added later.) But the obvious way is to read the whole mess into an array of pairs and go over it with two for(;;) loops, checking each first entry against each second entry. If this first entry is not in any second entry (and entries where first and second entry match don't count), print this one, and yank this entry (and every other entry with a first entry matching this one) from the list. If we make it to the end and haven't printed anything this pass, what's left is all circular dependencies with no loose ends to unravel.
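
As a sketch (my illustration of that obvious way, with made-up names, not what busybox does), letting the duplicate suppression list live for the whole run:

#include <stdio.h>
#include <string.h>

// Has string s already been output?
static int seen(char **list, size_t len, char *s)
{
  while (len--) if (!strcmp(*list++, s)) return 1;

  return 0;
}

// pair[i][0] must be printed before pair[i][1]. Caller supplies done[]
// with room for 2*len strings (the worst case of all-unique input).
static void naive_tsort(char *pair[][2], size_t len, char **done)
{
  size_t ndone = 0, i, j;

  while (len) {
    size_t before = len;

    for (i = 0; i < len;) {
      // Blocked if another (non-self-dependent) pair still lists our
      // first string as its second string.
      for (j = 0; j < len; j++)
        if (j != i && strcmp(pair[j][0], pair[j][1])
            && !strcmp(pair[i][0], pair[j][1])) break;
      if (j != len) { i++; continue; }

      if (!seen(done, ndone, pair[i][0])) puts(done[ndone++] = pair[i][0]);
      // The second string goes out too, unless another pair still has
      // to be printed before it.
      for (j = 0; j < len; j++)
        if (j != i && !strcmp(pair[i][1], pair[j][1])) break;
      if (j == len && !seen(done, ndone, pair[i][1]))
        puts(done[ndone++] = pair[i][1]);

      // Yank this pair by swapping in the last one (don't advance i).
      pair[i][0] = pair[len-1][0];
      pair[i][1] = pair[len-1][1];
      len--;
    }
    // A pass that removed nothing means what's left is all cycles.
    if (len == before) {
      fprintf(stderr, "cycle involving %s\n", pair[0][0]);

      return;
    }
  }
}

Feeding it char *p[][2] = {{"a","b"},{"c","a"}}; char *done[4]; naive_tsort(p, 2, done); prints c, a, b (one valid order of several). Keeping done[] alive for the whole run instead of per-pass costs a little memory but dodges the list lifetime headaches.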

Problem: that's an n^2 algorithm. Sorting the table and binary searching could give me log(n) lookups, but sometimes it wants to find second entries (does anybody else depend on this), and sometimes it wants to find first entries (duplicates to remove so we don't output it multiple times). I'd need TWO sorts, and finding the same entry in the other table is not fun. (The "other entries matching this one" problem: "a b a c a d"... Even with a fallback sort each pair is not guaranteed to be unique, see again "echo a a a a | tsort". I suppose I could have a deduplication pass but ew?)

Hmmm, maybe I want a suppression linked list? "Things I have already output this pass, so even though I'm removing them from the table don't show them again?" Except that can go O(N^2) if the entire table is "a z b z c z d z" and gets removed all in one go. But that doesn't smell like a common case, and even if it is we'd output the whole table on the first pass (discarding _everything_ immediately) so it's not THAT bad, I think one of the other N's drops out. (And I don't want to code an insertion sort for the removal list. That just feels wrong.)

Ha: not a linked list. This is an array: move the entry we're yanking to the end when we move the rest down to fill in the hole. Then you naturally get "all the ones we've removed" together at the end, and just have to keep track of how big the table was at the start of this pass. Loop from new end to old end. (This is all obvious enough it's probably the "standard" implementation. Not really a hard problem, just new to _me_.)
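
Sketched out (my guess at the shape, not the finished tsort.c), the yank is just:

#include <string.h>

struct pair { char *str[2]; };

// Close the hole with memmove(), then park the removed pair in the
// now-free slot past the (decremented) end: the tail of the array
// doubles as the "already output this pass" list.
static void yank(struct pair *pairs, size_t *len, size_t i)
{
  struct pair keep = pairs[i];

  memmove(pairs+i, pairs+i+1, (--*len - i)*sizeof(*pairs));
  pairs[*len] = keep;
}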


September 6, 2023

Blah. I've had an upset stomach on and off for days (of the "reads as anxiety if I'm not careful" type) and it's REALLY hard to concentrate.

The plumbing.sh in the lfs build I'm doing does a "find $LFS -newer .timestamp" thing (in theory that can be used for package management, here are the files this new package installed) which works on the host but inside the chroot the $LFS dir is /root/lfs containing source tarballs, build directories, and build scripts. After the chroot the packages are installed into / not into $LFS, so the find is looking in the wrong place. This codebase is brand new and already accumulating scar tissue and changing assumptions. Great.

So I've mentioned Rock Sugar before, which is a side-project of the voice actor behind Wakko Warner and several of his friends. (As with the Blues Brothers, famous-ish person with unrelated day job sometimes goes slumming as a rock star, and I cannot argue with this. Their schtick is doing VERY GOOD mashups where they sing the lyrics and melody of one song to the backing music of another song, which I find EXCELLENT programming music. Challenging enough to keep my ADHD at bay without actually being distracting because all the _components_ are familiar. But their first album's only an hour long...)

Some time ago Fade bought me the second Rock Sugar album off Rock Sugar's website, which is only available as a digital download because their first album's physical CDs all got recalled due to a lawsuit from the ex-lead singer of Journey who was ABSOLUTELY CERTAIN the professional voice actor hadn't done a spot-on impression of him but had instead used an unauthorized recording of the actual Journey geezer! (Who broke his hip in 1998 and retired from singing.) And of course Jess Harnell (the voice actor) proved in court that this retiree loon couldn't tell what was his own voice and what wasn't, but the judge threw the doddering elder a bone by saying the album caused "market confusion" and couldn't be sold anymore. And Jess didn't spend the time/money to appeal, and just let people upload the album to youtube where people could listen to it for free. (I found out about them from the Professor of Rock interview with him there.) You can in theory still buy the CD used, but it's $200 for a scratched up one, and the artist doesn't get the money.

But the judge's ruling didn't apply to A) digital downloads of B) their SECOND album. (The first was reimaginator, the second is called reinventinator.) And besides, Steve Perry is 74 years old now and presumably busy suing other people.

So anyway, I legally own a copy of Rock Sugar's second album but couldn't immediately FIND it, which I was reminded of by the second album being uploaded to youtube a few months back. So the easy thing to do is youtube-dl the copy of the whole album that's on youtube, tell ffmpeg to strip the audio part out of it, copy it to my website, download it to my phone from there, and move it into the mp3 player directory. (For a definition of "easy" that meant I could theoretically do all that in 5 minutes from the laptop I was sitting at without bothering other people who were asleep at the time.)

And THIS is how I found out that youtube-dl does not work with Google Fiber. It does... something. And detects that it's an unauthorized stream in under a minute of downloading, aborting the download and immediately failing on "cursor-up enter" restart attempts. (And yes, I did a fresh pull of youtube-dl to make sure I was using the current version.)

But it still works fine with phone tethering. (Grumble grumble monthly bandwidth quota.) So Google's service is measurably less capable than its competitors, because Google imposes restrictions and the various Google services collude together to impose additional layers of data harvesting and activity tracking and digital restrictions management, which in this case blocks something I personally HAVE A LICENSE TO DO. (I paid Rock Sugar directly for a copy of this material! Actually slightly higher resolution than youtube has, which I could totally shift onto my phone if I could be bothered to scrape through my hard drive to find wherever I put the file (under whatever name I called it), or search back through a year of email to find the download link to re-fetch it from Rock Sugar's website, or wait until 7am for Fade to wake up.) But the youtube link was right there and I have a tool that can grab it (which I've needed to archive my OWN presentations from the linux foundation's channels and such; yes the same people who accidentally deleted the entire 2015 ELC conference off youtube, in theory I can contact Tim Bird and get a copy through official channels but that takes weeks at best). And this tool works fine... when I'm not on Google Fiber, because Google Fiber is uniquely restricted in a way that other providers are not.


September 5, 2023

Proper fix for the backslash segfault thing. Of course I have a pending "78 insertions, 17 deletions" patch to sh.c to make more changes to the backslash logic, because (among other things) bash -c $'XYZ=xyz; echo "abc$\\\nXYZ"' outputs "abcxyz" but toysh is outputting "abc$XYZ" so I need to fix that, and not via whack-a-mole but in a more generic way. (I have like 5 pending changes to toysh in different directories. They fight. This one was in a new fresh directory and conflicts with probably all of them. Oh well...)

Oh goddess, Linux From Scratch 12.0 came out on the first. Nope, NOT CHANGING HORSES MIDSTREAM.

List of things that should happen with the LFS build:

  1. Finish through the end of chapter 8 as-is, unmodified, to establish a baseline so we know what success looks like.
  2. Rebuild ch5 with the musl-cc cross compiler, so the statically linked chapter 7 tarball isn't so fscking huge.
  3. Use a busybox $PATH to build ch5, and ensure the result matches? (Ensure how?)
    • Matches means it still builds to completion, and the log and resulting binaries are similar-ish. Can I examine any config.log stuff? I guess the list of commands that get called in a single processor run matching is a good start. At least examine/explain any observable differences...
  4. Start inserting toybox commands in place of the busybox commands and re-run the build.
  5. Swap glibc for musl in chapters 5 and 8. Yanking perl if possible.
    • Still compile it, but don't install it. Maybe put both in some kind of optional side build? Figure out what needs python too. Oh, and patching the kernel to NOT need libelf and bc probably lets us yank those too.

There's another step, which could go anywhere above:

6) Transplant the ch7 build into kvm chroot so the ch7.0 with all its bind mounts and such can be a mkroot script running under the toybox environment.

And then, of course, dive into beyond linux from scratch...

Step 1 goes through the end of chapter 8 because most of chapter 9 is about booting, not chroot stuff, so not really needed for this use case. Grub might be nice to compile, but not to install into a vm that boots with qemu -kernel as its bootloader. Building a kernel is a good smoketest, but modern LFS kind of handwaves away the configuration step, and I already have kernel builds in mkroot happening under a toybox-only $PATH.

There's two goals here: 1) Make sure what we provide is good enough to run the package builds, 2) Make sure that what we provide can replace most of this gnu/crap so nobody ELSE needs to compile it unless they're sufficiently masochistic or think they must be using "standard" versions and yet somehow haven't been peer pressured into running Windows or MacOS. (Such people exist. For some reason.)

I left off in the glibc build, which is horrific, specifically the time zone data, which is packaged wrong and wants me to hardwire a timezone into the image because there's no sane way to select one. Austin and Minneapolis are both in the "Chicago" time zone.


September 4, 2023

Dealing with the sh.c segfault: ./sh -c $'abc\\\n def' triggers it and it's that backslash handling from earlier this year again. They WERE being stripped during initial parsing, and now they persist way longer (and have to be filtered out later), and something's getting confused by it.

This would be SO much easier if glibc's asan worked. It tells me the line it segfaulted on, with no backtrace. There's a backtrace for where the nearest memory block was allocated, but not for where the fault happened. Great, something somewhere called skip_redir_prefix()! Who and why? Your guess is as good as mine, I dunno how it GOT there...

Back to drilling down by inserting dprintf(2, "florp\n"); dprintf(2, "wheep\n"); and dprintf(2, "pang\n"); statements into the code. The point is "I got here, uniquely, passing these points in this order, before it exploded." Adding var=%p statements as needed to examine decision state. This is the debugging version of percussive maintenance: hit it with a rock until it's the right shape, but it never NOT works. You can have the "an interrupt came out of nowhere" problem, or "previous actions had delayed consequences so free() on a seemingly valid address threw a heap corruption error", at which point you start reducing your test case by ripping out previous chunks of code until the problem stops happening, at which point whatever you last ripped either threw the grenade or disturbed the Jenga tower so the grenade missed anything vital. Debugging the uncooperative ones is a whole lecture, and yes I have isolated them to "compiler bug" and "processor errata" before, but never both in the same year. (A zillion other people have used this toolchain with this processor on this OS. I'm the first one trying this code. That's a 1/userbase chance that it's NOT in my code. Happens, but really not often. And usually because I'm building a cutting edge compiler or libc or kernel from source and the bug was introduced in the past few months. For example, the processor errata in the cortex-m already had a workaround in the vendor's uClibc toolchain but not in the vanilla uClibc I was using; fix existed but had not made it upstream yet. I was using their kernel source to run on the board in question, but not their toolchain...)

This particular bug hunt isn't being remotely stroppy, just tedious. Window 1 is the text editor, window 2 is the command line within which command line history recompiles and re-runs the test every time I hit cursor up and enter. (The && operator in make && ./thingy is very useful; if I typo in the source it doesn't re-run the test.) Using ASAN actually makes that far LESS convenient because I then have to two-finger scroll up three screens to see my last printf output, because it's spat all sorts of useless "shadow byte legend" garbage after the interesting stuff I WANT to see, and there's no obvious way to get JUST the stack traces. (I asked Elliott once what the shadow bytes were about and he didn't know off the top of his head.)

Sigh, looking at the existing code with my usual security paranoia, expand_redir() is doing arg->v[j] where j is a signed int and going "if they ever manage to feed more than 2 billion arguments to the same command line, that could wrap to negative and index out of range", and I mostly DIDN'T take that kind of thing into account when writing this because the maximum glibc contiguous malloc() size was 128 megs at one point, and the kernel would cap environment space at 10 megs... but that kind of stuff changes with the weather and I can't rely on it, I should probably have an explicit check for "2 billion argument maximum" in the parsing somewhere. (Not that I expect you can easily exploit a pointer 16 gigabytes before the start of the heap, but let's not reach out and touch hyperspace and expect it to end well on general principles.)
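
A sketch of what that explicit check might look like (hypothetical add_arg() helper, not the actual parser code):

#include <limits.h>
#include <stdio.h>

// Refuse to count past INT_MAX so a later signed array index can never
// wrap negative, no matter what malloc() and the kernel let through.
int add_arg(int *count)
{
  if (*count == INT_MAX) {
    fprintf(stderr, "too many arguments\n");

    return 0;
  }
  (*count)++;

  return 1;
}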


September 3, 2023

Sigh, I've written 3/4 of help text compression support: if you've enabled both CONFIG_TOYBOX and CONFIG_ZCAT then scripts/install.c builds an instlist whose --help spits out the big help text block (with embedded NUL bytes and special 0xff entries for OLDTOY redirects), and make.sh runs it through gzip -9 | od | sed to turn it into a header file...

And now I'm writing the consumer side, and what I really want to do is decompress it into libbuf and print that entry out. I can iterate counting NUL bytes, and then memmove() what's left down and decompress the rest of the 4k block so I'm sure I've got all the data, then print it and it should be null terminated already. Except that doesn't work: for i in $(toybox); do echo $i $(toybox --help $i | wc -c); done | sort -k2,2n says sed --help is 4934 bytes. Which means I can't do the simple decompress-then-print because of ONE ENTRY. (It's the LITTLE THINGS that screw up seemingly elegant solutions. This could be so much cleaner if not for this ONE OUTLIER...)

And of course now I'm going "eh, is this worth doing at all?" It saves about 50k space in the binary (80k text to 30k text), but that binary could always live on squashfs or similar? It's really for embedded systems doing xip, and if that needs a nontrivial extractor or a large malloced DRAM buffer to work how much of a gain is it really? (The point is leveraging the deflate code we've already got.)

Plus digging into actually using the deflate stuff, A) Elliott turned zcat into #ifdef salad because he wants to use zlib's slightly faster duplicate implementation of this code (I am NOT copying that to a second C file), B) I never bothered to implement the decompress-into-memory codepath. It's not hard, it's just that the ~3 places doing flush are all doing so to filehandles right now. (The filehandle to write to is copied into both the bitbuf and deflate structs, which seems redundant, but revisiting all that is part of the "implement compression side" todo item which I'm NOT DIVERGING INTO RIGHT NOW.) The big design issue was that stopping decompression partway through, and backing out and returning in a way we can easily resume, is MUCH harder than just giving it a place to flush data to when buffers fill up. So I did the easy thing at the time, and now... decompressing into an always-big-enough memory buffer would be the easy way.
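
One plausible shape for the retrofit (a sketch with made-up names, NOT toybox's actual inflate plumbing): parameterize the flush target so the same ~3 flush sites can write to a filehandle or append to memory:

#include <string.h>
#include <unistd.h>

struct outbuf { char *buf; unsigned len, max; int fd; };

// Flush to filehandle: the current behavior.
void flush_fd(struct outbuf *o, char *data, unsigned len)
{
  write(o->fd, data, len);
}

// Flush to memory: the help text case. Caller guarantees max is big
// enough, so overflowing here means an upstream size calculation bug.
void flush_mem(struct outbuf *o, char *data, unsigned len)
{
  if (o->len+len > o->max) _exit(1);
  memcpy(o->buf+o->len, data, len);
  o->len += len;
}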

At the moment, compressing the help text doesn't seem like enough of a win to really want to do infrastructure lifting. It seemed like low-hanging fruit when I was writing an outline to describe how make.sh works for an instructional video, but... as with so many things left at this point, there's a REASON it's still on the todo list. But I don't want to throw out hours of work either...

Sigh, this is the same reason I recently bounced off of moving the hash functions from toys/*/md5sum.c to lib/hash.c so I can use the internal ones in the password code: it's another tangle of external library code the Android guys wanted, and the result is ugly enough I rotated to bang on SOMETHING ELSE rather than hold my nose and deal with it. I need to go back and do it, I just... really don't want to? It's icky. Sigh. (Nobody seems to have noticed yet that toys/other/sha3sum.c does NOT implement a libcrypto codepath, it just does the internal one which WORKS FINE...)

Maybe I should work out lib/portability.c shenanigans with weak symbols? That sounds better than having command implementations full of #ifdefs, might be a good approach... (I occasionally suffer from something a bit like writer's block, which is my subconscious telling me that the design is wrong and I need to work out how. I can smash through it under deadline pressure when I need to, but an awful LOT of design work is staring aimlessly into space doing the is-this-it routine with the blind man and the Rubik's cube from UHF...)

Ok, if I move the library stuff into portability.c with the CONFIG_TOYBOX_GRATUITOUS_EXTERNAL_LIBRARY symbol checking in there, and have weak versions of the functions in lib/*.c, then that moves the config symbol checking out of lib/ which is one of my big objections to migrating this code INTO lib/ in the first place. I don't LIKE having config symbols checked in lib/*.c because the build dependencies don't really work out well doing that. To properly check them, you'd have to rebuild lib/*.c every time you build a new command, and that makes compile/build/test cycles slow and "make change" _really_ slow. But if you don't, it's subtle bug city. My compromise is only checking CFG_TOYBOX* symbols in lib/ (which don't change often) but it's still... icky. (Moving icky to portability.c makes me FEEL better. That's where it GOES.)
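
The weak symbol version would look something like this (made-up hash_update() name, two files shown in one listing): the lib/ copy is the default, and linking in the portability.c copy silently replaces it, no #ifdefs in the command code:

// lib/hash.c: the default builtin implementation, marked weak.
__attribute__((weak)) void hash_update(void *state, void *data, int len)
{
  // builtin hash code lives here
}

// lib/portability.c: only compiled in when the external library config
// symbol is enabled. The strong definition silently wins at link time,
// so callers just call hash_update() without caring which they got.
void hash_update(void *state, void *data, int len)
{
  // hand off to the external library here
}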

In md5sum.c the divergence point between the library and builtin function dispatching is the loopfiles() callback do_hash(fd, name), which calls either do_lib_hash() or do_builtin_hash(), both of which operate on a file descriptor instead of a buffer. So... kind of a lot like the zcat code, actually: not the API I actually need for the new use, I need this to operate on a memory buffer too. (The FILE * plumbing has fmemopen() for this, but the cure is worse than the disease. I also have xrunread() but that seems like overkill. Hmmm, still pending design work...)


September 2, 2023

Isn't C's #include "thingy.h" supposed to search the current directory, as opposed to #include <thingy.h> which searches just /usr/include and friends? So why do I need to say -I . to get #include "generated/blah.h" to work with the devuan botulinium toolchain? (Near as I can tell the quoted form searches the directory of the file containing the #include, not the compiler's working directory, so when a header in another directory does the including, the build directory drops out of the search path.)

There's another bug report that sh.c does something wrong. (Segfault with line continuation.) I should circle back to shoveling that out, but it's an endless time sink and I have so many open tabs. I want to close tabs, which means shortest-job-first scheduling to get stuff done and checked in. (The hard part is I'm terrible at telling how long something will take until I've finished. Hmmm...)

I'm cheating slightly in that it's a weekend, so I don't really HAVE to look at the new sh segfault until monday...


September 1, 2023

Hey, cruise changed its mind and now charges a flat $5/ride in its tiny little beta-test service area. That makes a lot more sense. Flat monthly rate would make more sense still (especially since the cars drive around constantly rather than parking even when empty, they can de-prioritize heavy users the same way my phone bandwidth does when I go over however many gigabytes per month it is, so your rides have a longer wait time before arriving when you've done a lot of them close together). But expanding the service area is the first priority. (I'd be tempted to sign up for the thing myself if not for the iron rule that My Phone Is Not Authorized To Spend Money, Ever. I hooked a $200 gift card up to it once, and even that didn't end well.)

I'm still banging on video outlines. I should actually record the videos at some point. The classrooms on the second floor of Jester Center are pretty much _ideal_ recording areas. Very quiet at 3am, and now that the students are back they're open again. Of course this also makes them excellent work space, which means I do lots of typing and then go home not having recorded anything. (Headphones with the good microphone sitting right there next to me...)

I'm getting lots of documentation written (which looks a bit like code review in a certain light, and means I'm doing stuff like trimming global sizes in passing), but... despite old people consistently saying "I hate watching videos like all these zoomers do, I want written documentation" I've CREATED buckets of written documentation over the years and nobody reads it. And I'm not entirely sure how to organize it, either. I can write a 5000 word treatise on the nuances of sed or ls, and _I_ wouldn't read it, so...

The basic "command walkthrough" is a bit of a porcupine because there are SO many potential tangents. Even explaining "true" and "false"... true does literally nothing, and false has one line: "toys.exitval = 1;" at which point the explanation takes a sudden 90 degree turn into explaining where toys.exitval came from, which is why I need an explainer on the three seashells 6 global variables. I need a walkthrough of the entry path (which might as well explain all of main.c). And then I need a whole thing on lib/args.c (called from that main.c entry path!) which is A) kind of a large explanation (500 lines of fairly dense code parsing its own input data format in the option strings), B) initializes globals like toys.optflags and toys.optargs, C) initializes the start of the GLOBALS() block.

It's simple like riding a bike is simple. Unfortunately, riding (and maintaining) a bike isn't actually simple, or else training wheels wouldn't exist. I very much want to make it all simpler, but can't figure out how to still make it all WORK if I do...

P.S. You'd _think_ toybox_version from yesterday would be in rodata instead of writeable data, but it turns out "const" is useless: "extern const char * const toybox_version;" still puts it in data, and "extern const char const * const toybox_version const;" complains about a duplicate "const". I could probably hit it with __attribute__((section)) but that's too micromanagy for me. And when I do apply "const" to variables, the rewritten-in-c++ compilers complain about assigning pointers-to-const to non-const pointers (I vaguely recall there was a brief period where they did this for signed/unsigned mismatch, and everybody -fstop-being-stupid'ed it until they backed off), and I am NOT spreading a communicable disease through my codebase to silence warnings. Strings don't work like that: if I try to modify a string constant I get a segfault at runtime, as Ken and Dennis and Brian Kernighan intended. And thus toy_list is writeable data that never gets written to, to shut the stupid compiler up. (Sigh, I should look into that __attribute__((section(".rodata"))) thing, shouldn't I? Smells way too much like busybox micromanagement. But the advantage is rodata can collapse together between multiple running instances of the same program, especially on nommu fdpic, and thus has an actual measurable benefit...)


August 31, 2023

One of the videos I need to do is explaining the global variables in toybox, which you can beat out of it with scripts/findglobals.sh (a wrapper around the ever-useful "nm --size-sort" piped into a couple grep filters):

$ make distclean defconfig toybox && scripts/findglobals.sh | grep -v GLIBC
0008 D toybox_version
0050 B toys
1000 B libbuf
1000 B toybuf
1d60 D toy_list
2028 B this

(Building against musl adds stdin/stdout/stderr, building against bionic adds __PREINIT_ARRAY__, and statically linking against anything adds dozens of entries, but those six are the only global variables that should actually be in toybox itself, by policy.)

The ones with "B" are the bss segment, which means they start out zeroed. The two with "D" (data segment, initialized to specific values) are toybox_version which is the version string in toys.h or from git describe in scripts/make.sh, and toy_list which is the sorted list of command structures describing the commands toybox knows how to be.

The two 4k scratch buffers are toybuf and libbuf (one for use in commands and one for use in lib/*.c), toys is a global instance of struct toy_context from toys.h which is filled out by toy_init() and lib/args.c and a few other places (explaining each toys.field would be half of any video about the globals because there's over a dozen and they're all different, and most of them are important), and this is a union containing each command's GLOBALS() data, with the size being that of the largest command's GLOBALS struct...

Hmmm, what is going on with:

$ grep '\t' generated/globals.h | wc
    985    3194   25327
$ grep '^\t' generated/globals.h | wc
      0       0       0
$ toybox grep '\t' generated/globals.h | wc
    169     507    4482
$ toybox grep '^\t' generated/globals.h | wc
    169     507    4482

I'm trying to make a script to tell me the sizeof() each command's GLOBALS() struct, and the struct lines in the union coincidentally have a leading tab (for historical reasons) so I tried to grep that, and... debian's grep doesn't want to play?

Toybox is doing what I expect but I wrote it so that's not evidence that "what I expect" is right. The question is, why does the debian one treat \t as magic? (I tried with and without square brackets...) Ah, it's NOT treating \t as magic. And that's the problem. If I have bash expand it instead:

$ grep $'^\t' generated/globals.h | wc
    169     507    4482

Hmmm... busybox also isn't interpreting \t there. Will mine doing that break something? Do I need to "fix this" (make it LESS capable) and add a test for it NOT understanding escapes? Anyway, back to writing my script. (TODO list critical mass is where working on your todo list makes it longer. I have been over that event horizon for many years now...)

The script is: { echo -e '#include "toys.h"\nint main(void) {'; sed -n 's/^\tstruct \(.*\)_data .*/printf("%d \1\\n", (int)sizeof(struct \1_data));/p' generated/globals.h; echo '}'; } | gcc -xc - && ./a.out | sort -n which only needed two manual fixups to render properly in html! (The & becomes &amp; because it's an html special character, no redirects this time that need &lt; and &gt; replacing.) What the script does is create, compile, and run a small C program to print sizeof() each struct in the union in generated/globals.h, taking advantage of the fact that only those lines start with a tab because the script that generates them is really old. (Yes, even gnu/sed understands \t in the pattern, you see why I'm confused? BE CONSISTENT!)
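
For reference, the generated program is just this (reconstructed with two example commands, one printf line per GLOBALS struct; it only compiles inside a built toybox tree where "toys.h" and generated/globals.h exist):

#include "toys.h"
int main(void) {
printf("%d grep\n", (int)sizeof(struct grep_data));
printf("%d sed\n", (int)sizeof(struct sed_data));
}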

So anyway, that script says how big each command's GLOBALS block is (sorted by size in bytes), and the last few lines of its output are:

520 tr
1024 cksum
2080 modprobe
2192 grep
2192 telnet
8232 ip

Everything "tr" and earlier is reasonably sized, and "ip" and "telnet" are in pending. That leaves three commands: cksum, modprobe, and grep.

The 1k for cksum is the crc table, which isn't using toybuf because we use that as our read() input buffer in the data processing loop, fair enough. I could trivially split toybuf between the two uses, but 1k is small enough it can go on the stack even for nommu, so I might as well move that, and inline the little endian and big endian per-byte functions while I'm at it. (There were two callers of each, which is why it was a function, but I can move the second call into an else case inside the loop if I add a "done" variable.) Having it on the stack like that means the EASY way to do this re-initializes the table for each file, which... I mean I could have the table be a local variable in command_main() and stick a pointer to the table back in GLOBALS to avoid the re-init, but... probably not worth it? Microoptimization to avoid a 256*8=2048 iteration loop maybe 50 instructions long, happening once per input file...
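
A sketch of the on-stack version (generic little endian CRC-32 table code, not necessarily what cksum.c will end up doing):

#include <stdio.h>

// 256 entries, 8 shift steps each, so re-running this per file is
// ~2048 cheap loop iterations.
void crc_init_le(unsigned table[256])
{
  unsigned i, j, c;

  for (i = 0; i < 256; i++) {
    for (c = i, j = 8; j; j--) c = (c&1) ? (c>>1)^0xedb88320 : c>>1;
    table[i] = c;
  }
}

int main(void)
{
  unsigned crc_table[256];  // 1k: fits on the stack even for nommu

  crc_init_le(crc_table);
  printf("%08x\n", crc_table[1]);  // 77073096 for the standard polynomial
}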

Grep is big because of struct arg_list *fixed[256]; which with 8 byte pointers is 2048 bytes. That's fallout from adding fixed string bucket sort optimization last year (commit a7e49c3c7860 and then like 4 fixes on top of that), and that I probably DO want to turn into a pointer and malloc().

Modprobe has a struct arg_list *dbase[256]; which is the same 2k, but the code there uses hash %= ARRAY_LEN(TT.dbase); so it would care about just changing it from array to pointer, and WHY is it doing a modulus on a power of 2? Also, under what circumstances might TT.dbase change? Did I never clean this up... oh, I didn't: modprobe is still in pending. (The promoted ones are insmod, lsmod, modinfo, rmmod, but not yet modprobe.)
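
The array-to-pointer trap, worked through (ARRAY_LEN as the usual sizeof macro, everything else made up):

#include <stdio.h>

#define ARRAY_LEN(x) (sizeof(x)/sizeof(*(x)))

struct arg_list;

struct {
  struct arg_list *dbase_array[256];  // sizeof is 2048: ARRAY_LEN works
  struct arg_list **dbase_ptr;        // sizeof is 8: ARRAY_LEN is junk
} TT;

int main(void)
{
  unsigned hash = 12345;

  printf("%zu\n", ARRAY_LEN(TT.dbase_array));  // 256
  // ARRAY_LEN(TT.dbase_ptr) silently becomes sizeof(ptr)/sizeof(*ptr) = 1

  // and since 256 is a power of 2 the modulus could have been a mask:
  printf("%u %u\n", hash%256, hash&255);  // same answer
}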

Sigh, I don't usually use modules in the embedded systems I build, and proper testing of modules is one of my big "get tests working under mkroot" motivations, which ain't there yet. (Ahem: am not there yet.) I do NOT regularly run root tests on my development laptop, which is why /proc/uptime is approaching 8 digits. (Not something I'm proud of, I still need to close all my windows so I can swap those 16 gig memory chips over from the previous laptop. And now that Devuan Diptheria has come out I should really upgrade off of Devuan Bronchitis...)


August 30, 2023

I don't usually post my todo list because it's probably unintelligible to anyone else. For example, "lfs wrap granularity" means I have an open design question about the linux from scratch wrapper that logs each command out of the $PATH. If I just wrap the initial inherited $PATH, I don't get the calls to the new commands that are built and installed along the way, some of which toybox implements. But "this command isn't needed until after an external package providing it can be successfully built on this system" is useful dependency information. And the initial wrap is less noisy, later re-wraps may have python and so on in them.

What I should probably do is have each package install re-wrap the $PATH but also update the log filename it writes to, so I have a separate log of each package build's command line invocations. That way I can later slice and dice the data however I want, and the trivial one is to cat them all together and pipe through awk '{print $1}' | sort -u | xargs to get the full command list. I.E. I haven't LOST anything by doing that. But this means I need to edit setupfor(), which I'm trying to keep simple? Maybe the log update should be a separate shell command, called manually at the start and then again by cleanup()? Except cleanup can't update the log target file for a NEW package...
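
For the record, the wrapper concept itself is simple (a sketch with made-up WRAPLOG/WRAPPATH variable names, not the existing logging plumbing): every command name in the wrapper directory is a link to something like this, and "update the log filename" is just re-exporting the variable, which setupfor() could do per package:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libgen.h>

int main(int argc, char *argv[])
{
  char *log = getenv("WRAPLOG"), *path = getenv("WRAPPATH");
  FILE *fp = log ? fopen(log, "a") : 0;
  int i;

  // Append the command line to the current log file, one line per call.
  if (fp) {
    for (i = 0; i < argc; i++) fprintf(fp, "%s%s", i ? " " : "", argv[i]);
    fputc('\n', fp);
    fclose(fp);
  }

  // Swap in the saved pre-wrap $PATH and exec the real command.
  if (path) setenv("PATH", path, 1);
  execvp(basename(argv[0]), argv);

  return 127;
}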

Other todo items like "Replace glibc", "yank perl", "What needs python?" are probably more transparent, but have connotations. Perl and Python seem like they belong in Beyond Linux From Scratch, which is Linux From Scratch Book II and says how to build x11 and sshd and postgresql and so on. If toybox can provide a linux from scratch equivalent system with just itself and a compiler (from which you can then build any of the LFS packages without further prerequisites, modulo stuff like curses), then Perl and Python logically DO go in the BLFS bucket along with Ruby and Lua and Java and so on. (Even git, dhcpcd, ntp, and rsync are blfs, not lfs base system.) Except I'm not writing BLFS, and nobody else is likely to competently maintain an "ELFS/EBLFS" embedded book. (If I was an insomniac teenager I'd happily take that on... but back then I didn't know HOW. Now I'm spread thin and haven't got the spoons for large new projects, I can barely keep my existing plates spinning...)


August 29, 2023

Got the Linux From Scratch 11.3 automated build script up to section 8.5.2.2, at which point the time zone data is stroppy because the tarball doesn't have a subdirectory. (It just extracts a dozen files into the CURRENT directory.) I have code to fiddle with that in aboriginal linux, but am trying to minimize complication this time around? Also, this is part of the glibc build which is just horrific all around, although the gcc build is second-worst. (WHY is the cc1plus binary 366 megabytes? There's no excuse for that. And that's not the whole compiler, that's ONE binary: I found it by asking "why is tarring up this file taking so long"...)

I want to make a cleaned up musl-based version of this build, keeping toybox at the start of the $PATH instead of the end, so it keeps using it instead of replacing binaries with new ones as they're built and installed. But first I need to reproduce LFS as it exists so I know what success looks like and have a frame of reference to diverge from.

Another thing is I haven't got anywhere to check this in. I don't want to make a separate project for it like I did last time, but the logical place to put it in toybox would be mkroot/packages/lfs, except it's not to the point where it even tries to work under mkroot yet. (One of these packages is not like the others...) It's currently 5 scripts: a "plumbing.sh" that factors out common code (some setup like "umask 022" but also the announce(), setupfor(), and cleanup() shell functions that bracket each package build), a pre-chroot script (chapters 5 and 6) that builds the initial directory, a chroot-into-directory script (current LFS doesn't build a "mount" command in the chroot and instead does a lot of --bind mounts from the host before chrooting into the new system), and then two more build scripts so far: chapter 7 does all the work before deleting the /tools directory, and then chapter 8 (up to the timezone nonsense).

So I run ch5.sh on the host, then ch7.0.sh to chroot into lfs, and then inside the chroot ch7.sh, ch8.sh. And ch7.0.sh copies plumbing.sh and the other 2 scripts into LFS. Awkward, but reproducible.

What it DOESN'T do yet is set up the command line logging wrapper inside the chroot. I have a log of the chapter 5/6 files, but not a log of what gets used inside the chroot. In theory I only need to provide the host binaries and can let the chroot do its thing, and that's system bootstrapping done... except I want toybox to provide a working system the way busybox does in alpine, which means building arbitrary packages with toybox plus supplemental binaries toybox doesn't implement. If toybox DOES provide a command, that implementation needs to be load bearing...

A passing anime (kuma kuma kuma bear, which I re-watched season 1 of because season 2 is showing now) had an egg-based "pudding" without particularly describing the recipe. Fuzzy's been making egg custard for a bit (which is really good), but this is solid rather than liquid. (Fuzzy's of the opinion japan reinvented the flan, and I've been referring to it as a marsupial flan.) I watched a couple videos of people making it, which didn't do a good job of providing recipes, but the ingredients included eggs, milk, and vanilla, and also starch, gelatin, cream, and a caramel sauce (usually just sugar and water, cooked). To be honest I want to go back to tokyo and eat the stuff there (so I know I'm experiencing the actual professionally prepared version on model and as intended, not judging it by our random attempts to replicate something we've never tried before), but I no longer work for a Japanese company even part-time...


August 28, 2023

I've been banging on stuff long enough that every once in a while something wanders by where I'm honestly not sure whether I'm responsible for it or not.

I mean stuff percolates around, that's normal. I saw "we've replaced the dilithium they normally use with Folger's Crystals" on a bumper sticker at Worldcon a couple years after using it as an original fidonet tagline. When people reflect my "containers are chroot on steroids" phrasing back at me, I know that came from the OpenVZ booth I ran with Kir Kolyshkin at Scale back in 2011 where I gave a rehearsed 90 second patter to dozens of people explaining what this "container" stuff is and why they should care. And a few months after I gave my "prototype and the fan club" talk at Flourish in 2010, I watched a video of Greg KH repeating a chunk of it (the red hat as fanzine editor analogy) more or less verbatim in one of his own talks a few months later.

In this case, "the C locale does not support UTF-8" was a problem Rich Felker and I struggled with way WAY back (I have a memory of trying to wrap my head around the problem in the kitchen at Cray's office in Minneapolis, which is a contract I worked for 6 months in 2013). Thus the C.UTF-8 locale in Android is something I advocated for early on for toybox-in-android, involving both Rich Felker and Elliott Hughes in working out how to get it right. So it LOOKS like my "this should happen" and trying to get the ball rolling fed into android, and is feeding into coreutils, but... it seems REALLY obvious, and like something that would it have happened anyway from another proximate cause? (Android's previous internationalization stuff was all at the GUI level inside java, I dunno when bionic actually developed locale support. I suppose I could check the git log, but it's not actually _important_. It works now, that's what matters. And coreutils is sort of finally catching up, at least in being aware that it's an option and making their test suite not die in such an environment.)

Sigh: the comment at the top of musl's des implementation said it's derived from "freesec" which sounds like it _might_ be public domain, so I googled "freesec des" and the first 5 pages of hits are entirely porn sites. Google's not even coming up with the musl source page I got that from. While it's great that google search isn't trying to bowdlerize the web the way prudetube is, it would be nice if it could actually FIND STUFF anymore. Later the same day, I tried to find "site:lwn.net python kubler ross" and... zero hits. Luckily I linked to it from my blog. (I have no idea why the python developers thought forcing people to leave python 2 behind would make them move to python 3 instead of any other language. It was completely unjustified.) But another obvious thing that exists which Google can no longer find.


August 27, 2023

Oh goddess. The recent coreutils talk of next release accepting new features (and Elliott's ping about not having brief hash output) reminded me that I added -b to toybox md5sum and friends many moons ago, and I should offer that to busybox to see if it becomes more standard. (Coreutils can then ignore it for many years, as usual, but eh.)

So I'm looking at current busybox code for the first time in forever to whip up an add-b.patch and... look, I created the ENABLE_BLAH macros because the CONFIG_BLAH macros were only defined sometimes and had to be tested with #ifdef, while the ENABLE ones were always defined to SOMETHING (either 0 or 1) so they could be if (ENABLE_BLAH) triggering dead code elimination without invoking the preprocessor to create different codepaths where the parentheses or curly brackets might not match up and thus cause build breaks in certain configurations. Right? Simple. Straightforward. Useful, I thought. I tried to explain this but the other busybox devs never quite understood the difference.

The current code has added an ENABLE macro that gets tested with #ifdef and needs #ifdef around every use of it? WHAT THE FSCK IS WRONG WITH... sigh. The ENABLE_ naming convention meant it was a _type_ of symbol with a consistent behavior. Meant, past tense, apparently. (Oh well, not my project...) Also, this nonsense from line 266 to 276 where there's 4 calls to getopt32() repeating 4 different variants of the option string, which result in DIFFERENT FLAG VALUES only 2 of which match the manually defined FLAG_ macros on line 140... That's just... ow? I stopped looking at this codebase for a REASON.

(Yes, they count their flag values from the left and I count mine from the right. The REASON I do that: in the binary number "1011" the first bit is 8 and the last bit is 1, and the bit that's _not_ set is 4. So in toybox a command with optstr "abcd" receiving -a -c -d would set bit 8 for -a, 2 for -c, and 1 for -d, leaving 4 (-b) off. The flag bits go where the binary number bits go. The letters are in the same order as the binary digits.)
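
Spelled out as code (illustrative FLAG macros with the values the text describes for optstr "abcd"):

#include <stdio.h>

// optstr "abcd" assigns bits from the right: last letter, lowest bit.
#define FLAG_d (1<<0)
#define FLAG_c (1<<1)
#define FLAG_b (1<<2)
#define FLAG_a (1<<3)

int main(void)
{
  unsigned optflags = FLAG_a|FLAG_c|FLAG_d;  // -a -c -d

  printf("%x\n", optflags);  // prints b, binary 1011: the unset 4 is -b
}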


August 26, 2023

I've been making "truncate -s 2g blah.img" ext2 images by reflex for kvm build scratch space, but the chapter 5 build of LFS 11.3 is over 3 gigabytes, 1.5 each for the "tools" and "usr" directories. A combination of glibc being hilaribad at static linking and the gcc that got rewritten in C++ bloating to insane sizes. Honestly, usr/libexec/cc1 is 258 megabytes (and cc1plus is bigger, and lto1 is about the same size whatever that is), there's something called lto-dump in usr/bin that's 250 megabytes, libstdc++.a is 30 megabytes...

This is NOT NORMAL. Nor is it necessary. I regularly ran aboriginal builds on qemu images with only 256 megs of ram for the whole OS (kernel and everything), and this ONE BINARY is bigger than that. What they have done to it is not an improvement.

And no you can't "oh but Moore's Law" your way out of it when laptops got stuck at 4 gigs ram for about 15 years. (That was the high end that triggered the switch to 64 bit processors in 2005, and I was still pulling up "preinstalled with 2 gigs" machines over the pandemic.) They finally seem to be unstuck, but not by much: I just typed "laptop" into google and clicked on the first and third sponsored links, and both had 8 gigs ram. So 4 gigs was the high end in 2005 and 8 gigs is "standard" in 2023, 18 years later. An 18 month doubling time is not an 18 year doubling time, the same way C is not C++, the move to which is WHY compilers in 2023 eat so much more memory than they did in 2007 without accomplishing significantly different tasks, and no it's not a bigger optimizer. (You've screwed up a perfectly good compiler is what you've done. Look at it, it's got template instantiations.)


August 25, 2023

Went out to the airport on 3 hours sleep, sat next to an elderly couple on the airplane who fell asleep leaning into my seat space, watched saved anime episodes on my phone rather than trying to pull out the laptop. Got home, hugged Fuzzy, petted the cat, and slept for several more hours.

Fuzzy is in GM's self-driving car beta program ("Cruise"), and gets a week of free rides (starting from her first ride), so we went on an adventure! We had it take us to a DIFFERENT grocery store!

In theory cruise's robotaxi service costs something like $5 plus 30 cents/mile and 20 cents/minute (she told me and I may not be remembering accurately). I understand the price per mile, and maybe charging for wait time before/after the actual travel, but charging per minute during the ride means the passenger cares more about how long the ride takes, so "the robot is driving like a nearsighted octogenarian" is potentially aggravating. (Giving people extra reasons to criticize the performance of your beta product seems less than ideal to me, but hey: free for now. And Waymo was proposing a flat monthly fee back in the day...)

First conceptual problem: restricted hours. Cruise beta starts running at 8pm and stops at 5:30am, so not a lot of places would be open by the time we get there (or would be closing soon). Fuzzy's first several planned trips on the thing turn out not to be possible because it just doesn't go there, or it's closed by the time we get there. (Austin Central Library is open until 8pm monday through Thursday.)

Second conceptual problem: restricted service area that starts a couple blocks away from our house and excludes over half of Austin. It's basically a donut around the university, going almost as far as us, and almost to the river, and west to... I dunno, Mopac? That part's mostly residential, nice to pick people up from but not a lot of destinations suggesting themselves there. (It won't drive through UT proper for some reason, hence donut.) Which means we have to walk a ways to catch a robot (not carrying back a lot of heavy stuff) and most of the places Fuzzy thought of going turned out not to be available destinations. Fuzzy was especially disappointed it couldn't take her to Central Market (HEB's snooty overpriced grocery store, the "buy local" version of whole foods).

Eventually we worked out that we can go to La Madeleine's parking lot on Lamar, which is a few blocks south of HEB Central Market (home of many snooty exotic overpriced things) and across the street from Rudy's (home of much barbecue, and of cream corn that is not _creamed_ corn but "corn cooked in cream"). We also passed the Kolache Factory on that walk, but it's a breakfast/lunch place that opens at 6am and closes at 3pm. Our old veterinarian is also there... if we wanted to carry a cat in a carrier a long way down the sidewalk of a busy street.

TL;DR summary of the experience: the robotaxi is in beta test but the phone app is alpha at best. The actual driving part works ok (although it selects residential backstreet routes optimized to go over as many nimby speed bumps as possible), most of our problems seem to have been with the app.

First actual problem: the robot started heading to us then aborted. (Just like human drivers!) I think the problem is we picked a pickup point (church parking lot) on the edge of the service area, but the route it calculated to navigate there took it out of the service area, at which point it went "boing" mid-journey. (The routes are idiosyncratic, at one point it did a three right turn loop around a block instead of turning left, at a no-light no-traffic residential intersection. Eh, as long as I didn't have to do it...) I'm assuming the Cruise engineers noticed the abort and we don't have to tell them. (Not that we have an obvious way to provide feedback.)

For our second attempt we used one of its suggested pickup points another couple blocks southwest (well inside the coverage area, we'd now walked about 4 blocks to get there). And the robotaxi arrived!

Second actual problem: it sat in front of us going "click click" repeatedly, but the doors were still locked. This went on for a couple minutes before we backed two car lengths away from it (thinking maybe we need to let this one go and summon another?), at which point it turned the corner and drove two houses down and put on its emergency flashers. Aha! It didn't think it had reached its pickup point! And it did not indicate this to us in any obvious way, and our proximity to it (standing next to one of the rear doors) apparently paralyzed it and made it re-lock its doors repeatedly, or something? Weird.

We got in! It drove! The screens showed us our route on a map! It dropped us off! Fuzzy was literally giggling through at least half of this. That part worked fine (good shocks, the speed bumps weren't that bad). It got us there, we got out and walked to Central Market, and Fuzzy got to shop. (She bought lobster mushrooms and smelt. I got an avocado and some yogurt coated lemon shortbread bites from the bulk bins. We were hoping to find rice bran but they didn't have it.)

For the return trip, we summoned another one back at the original place, and the app warned us it was a 9 minute walk and we went "yeah, we know", and... the app did not take the walk time IT HAD WARNED US ABOUT into account. It summoned the car immediately (4 minutes) and said it would wait 3. So we jogged and got there in 7 minutes, got in the car... and it aborted the trip WITH THE DOOR OPEN. It let us open the door, timed out as we got into the car, and then drove off in the wrong direction (we were at the north edge of the service area but it continues off to the west) with the display saying it wasn't currently conveying passengers. The seats have weight sensors so they can beep about the seat belts not being fastened, but that isn't taken into account for the "people are in the car" logic?

We hit the help button and had a conversation with an engineer who said no, we had to get out because "Panda" (each robotaxi is uniquely named, it shows it on the app and in the display) was heading to another customer. So we hit the emergency exit button and it let us out, summoned another one through the app, and it soon said it had arrived but it never drove down the street either direction from us... and then we saw it (hazard lights) at the far end of another business's parking lot, and went there... and of course it was Panda again. No it wasn't going to another customer, it was just driving randomly around like they've been doing for months before they started taking customers. (As long as they keep moving, they don't need to pay for parking.)

Panda took us back to our original pickup point 4 blocks from the house (saved previous location in the app, so Fuzzy sent us there as a known quantity), and we walked home. This trip was where it made that loop to avoid turning left, and it also did a sudden DON'T-HIT-CAT style full brake stop that... we didn't see what it was braking for? But sure? It slammed on the brakes from like 20 mph so a noticeable lurch but not a huge deal. We got there, got out, it drove off, and we walked home.

On the whole, a more pleasant experience than Lyft or Uber, I suppose. I'm used to being exasperated at technology (I break everything), and don't take it personally. (The most frustrating part of the whole experience was several times we wanted to see the edge of the service area, but couldn't pull it up in whatever app or vehicle screen mode we were in at the time. As complaints go, that's pretty minor.)


August 24, 2023

Flying back to Austin tomorrow, I should try to get my act together, or at least packed back into the suitcase.

I have temporarily de-promoted passwd.c because the new infrastructure needs waaaay more testing. Plus I'm unclear on what it should DO in several corner cases: are -d and -l and -u root-only? They seem like they should be root only. Running "passwd -l" as a normal user seems dangerous, and "passwd -d" seems likely to violate policy. Not that we've GOT a good policy, I really want to remove CONFIG_PASSWD_SAD because toybox commands mostly don't have sub-options anymore, we've come a bit far from busybox at the design level over the years. But I don't want to enforce an arbitrary heuristic on everybody? Said heuristic is VERY minimal, bordering on useless. It doesn't require multiple character types (upper/lower/digit/punctuation), it enforces a minimum length of 6 (which even at a full 256 values per character would only be a 48 bit keyspace, probably laptop crackable in realtime)...

The modern use case for passwords is rate limited login attempts. I just assume if they've got the hash they can probably brute force anything a human is willing to type with a GPU farm no matter what the algorithm is, but "you've made enough bad guesses to notice and do something about" is still useful. Yet another failure of IPv6 is it makes "this IP failed 10 times, block it for 5 minutes" much less feasible than IPv4. Wikipedia[citation needed] permanently blocked the whole of IPv6 years ago (nobody on it can edit pages, period). Other sites do similar, but it only ever comes up when I'm using phone tethering and DON'T run "dhclient -4" to force an IPv4 address. (I don't actually edit wikipedia pages, because of a personal policy: they refuse to let people with firsthand knowledge contribute to the site, and I'm not going to edit something I don't know about. But I do edit wikipedia _talk_ pages anonymously from time to time to point them at references proving them wrong about something, which is then generally ignored but I feel I did my part.)

The unixy way to do this sort of thing would be to have passwd call out to mkpasswd to generate the actual $id$salt$hash string, which implies also calling out to some sort of password policy command to validate it's a good enough password. But doing so _securely_ is non-obvious in multiple ways (can't call it from $PATH, don't leak the new password through /proc), and the fact that the unix guys DIDN'T do this back in v7 means there's no standard for it.

Of course one big REASON Ken and Dennis didn't bother (modulo Trusting Trust) is brute forcing through even an unsalted 6 character keyspace was prohibitively expensive to break on a PDP-11. At 6 bits per character, on a 16 bit processor running at 1.25 mhz, assuming a trivial hash that takes 1000 clock cycles per attempt: (((1<<(6*6))*1000)/1250000.0)/(60*60*24) is 636 days to exhaust the keyspace, and that was on a shared machine where people would probably NOTICE the long-running job. Unix wasn't regularly networked until Vaxen running BSD replaced the original IMP hardware in 1980, so the Labs' threat model was "your coworkers" and the AT&T Patent and Licensing Department's secretarial pool wasn't all THAT rowdy in 1972. Even the people on the early arpanet were all vetted to a certain extent anyway, since you needed a close relationship with a large institution as price of admission until the NSF AUP changed in 1993 allowing randos to buy access, and THAT didn't happen until after the BSDI lawsuit. Richard "old stick-in-the-mud fogey standing astride history shouting no" Stallman got away with famously having no password on his internet-connected account well into the 1990s, because he refused to acknowledge the internet was no longer a small rural town, because that would be allowing change to exist. (Some change being worth opposing is different than idolizing an ultra-conservative who hates the concept of change.)

Exponential growth of both processing power and the userbase made password security a real world concern (or at least elevated its priority beyond "clean desk policy inside a building you have to badge into anyway") only _after_ capitalism had muzzled the Bell Labs devs. (Unix v7 was the last _release_ from the labs, but the labs guys continued to make newer versions through Unix v10. Nobody outside the labs ever heard about it, because AT&T commercialized Unix with System III and System V, so the labs versions lost permission to be published outside the company. Same reason nobody ever saw their successor system Plan 9 before Bell Labs got spun off as Lucent around Y2K: it was proprietary and cost a lot of money to sneak a peek at. When your advertising plan is "pay me a lot of money to see what I've got", nobody's likely to bother and even if they do word of mouth doesn't spread far when nobody ELSE can see it either.) Unix features that DIDN'T come from Ken and Dennis were far less universally adopted, modulo the bits you needed to connect to the internet...

Hmmm, can I have something in /etc signaling what kind of password policy to enforce? How about the default format for new passwords is the format of root's password, and failing that sha256?
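
Something like this, maybe (hypothetical helper, assuming crypt-style $id$salt$hash fields): peel the $id$ prefix off root's entry, and fall back to sha256's $5$ when root's field doesn't parse:

#include <stdio.h>
#include <string.h>

// Given root's hash field from /etc/shadow, return the $id$ prefix to
// use for new passwords, defaulting to sha256crypt when unparseable.
char *new_hash_format(char *roothash)
{
  static char fmt[8];
  char *end;

  if (roothash && *roothash == '$' && (end = strchr(roothash+1, '$'))
      && end-roothash < (long)sizeof(fmt)-1)
  {
    memcpy(fmt, roothash, end+1-roothash);
    fmt[end+1-roothash] = 0;

    return fmt;
  }

  return "$5$";
}

int main(void)
{
  printf("%s\n", new_hash_format("$y$j9T$salt$hash"));  // $y$
  printf("%s\n", new_hash_format("*"));                 // $5$
}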


August 23, 2023

Oh goddess, glibc broke userspace again. 2.36 broke mount.h and now 2.38 is breaking crypt(), because of course it is. (Why does anyone anywhere use gnu/anything?)

Alright, not only has the passwd rewrite become time critical, but I need to get it using internal hash functions. Great. Ok, let's do this...


August 22, 2023

Giovanni Lostumbo poked me about ongoing discussions of getting a modern kernel to run in 2 megs of ram. And I of course pointed him at the old "2.6 kernel running in 256k of SRAM" thing from ELC 2015 that the Linux Foundation deleted the talk video of, but Vitaly Wool did indeed get Linux running on a "microcontroller" with no DRAM (tl;dr: 80% of the heavy lifting was kernel executing in place out of memory mapped flash, binaries executing in place out of cramfs in mapped flash, and NOMMU so no page tables).

I thought trying to get an XIP system working under QEMU might be interesting, but after a bit of digging it looks like the Linux kernel clique deleted the xip subsystem and replaced it with "dax" which does not sound like it does the same thing. The old xip stuff let you execute code directly out of mappable ROM or flash memory. But that documentation file was deleted in kernel commit 95ec8daba310, in favor of "dax", which looks like it's just more of that O_DIRECT oracle database nonsense? Haven't dug far but none of the MTD code seems to implement it...

*Shrug* I'm a bit out of the loop here, all the people I know doing this stuff are still using the 2.6 kernel _today_ because if they ask a question on linux-kernel they either get ignored or mocked. "You're a tiny minority we can ignore and bully", "No, there are zillions of us", "We never see you around here", "Yes because if we come here you ignore _and_ bully us". See also the recent "your statically linked initramfs is weird for merely existing" argument. And you wonder why I don't push my patches at linux-kernel that often/hard anymore?


August 21, 2023

Not as good about blogging while I'm up at Fade's. More face to face social interaction, less with the computer I guess?

Fuzzy says the air conditioning went out in Austin. Radiant has been informed. They replaced the vents earlier this year and we bought the warranty service thingy from them last time the air conditioner went weird, so assuming the outside unit doesn't need to be replaced it might be under warranty. (They're the good/fast option in the good/fast/cheap trichotomy. They're not cheap, but... Last Week Tonight has done multiple segments on their advertising?)

And the "blower motor" died. $1800 to replace it. The extra twice a year inspection thingy did not catch that it was corroded enough it looks like it's been undersea for months. The power flicker last night pushed it over the edge. Installed in 2015, lasted 8 years: owning a house continues to be expensive. (But renting's gotten nuts these days. Too many unguillotined billionaires.)


August 20, 2023

I recently squinted at Android's microdroid and gave up something like 5 screens in where it still hadn't explained "what is this intended for and how do I use it", which inspired me to redo the toybox main page in an attempt to answer that question for casual browsers encountering it for the first time. I put a link up top with the current release version and date, which when clicked goes to the release notes that used to be the first page. (The point of that is "proof of life". Yeah, way more missable, but prioritization questions boil down to what you decide to suck at...)

Oh hey, there's a new devuan release. I should close enough windows on my laptop I can shut it down and move over the 16 gig ram chips from the old one to the new one.


August 19, 2023

Sigh. I miss Michael Kerrisk maintaining man7.org. For one thing, if you try to drill down from man7.org it bounces off of kernel.org/doc/man-pages (for historical reasons), and THAT page got zapped to not point back to Michael Kerrisk's website. So although all the old pages are still up, the indexes that let you FIND them got deleted by the kernel maintainers. With no replacement! Golf claps everyone! (The kernel clique is all turning 50, none of them have learned how to do this in 30 years, so "new people learning how to do this" is no longer part of their worldview.)

It would also be nice if I could ask Michael Kerrisk whether I need to do a regfree() after a failed regcomp(). Does the error path leak memory? Who knows? Posix says "undefined" which is NOT HELPFUL. (Musl's internal error handler already calls regfree(). Is it safe to call it twice?) It would be nice if the man page got updated.
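
In the absence of an answer, the defensive pattern is to only ever regfree() what definitely compiled, exactly once:

#include <regex.h>
#include <stdio.h>

int main(void)
{
  regex_t re;

  // regcomp() returns 0 on success: only then does regfree() happen.
  if (!regcomp(&re, "te+st", REG_EXTENDED)) {
    puts(regexec(&re, "a teest", 0, 0, 0) ? "no match" : "match");
    regfree(&re);  // exactly once, only on the success path
  } else puts("compile failed, not calling regfree()");
}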

Also, since "groupdel --help" produces help text, it would be nice if there was some indication of what legal characters for user and group names were. The /etc/passwd file seems ok with everything but colon, newline, and NUL, but the above is an example of, shall we say, "another constraint"...

Yes, I remember the blog post from earlier this year about building my own HTML version of the man pages git repository and posting it on my website, which alas doesn't quite work as-is because the repo that's there does file:///usr/include/stdio.h links and the new maintainer doesn't seem to care at all about web versions of anything so I'd have to do it myself (and I don't do CSS so the result's likely to look like the rest of my blog at best). It's ON THE TODO LIST...


August 18, 2023

Heard from Google that there's budget cuts and they can't fund me full-time next year, but they're requesting _some_ funding for me which is way better than nothing. (Fingers crossed it goes through.)

I thought if I had a year of full time focus I could get everything done, but the hard stuff remains hard and I've been prioritizing support requests so there's quite a bit of ping-pong. And I don't exactly have writer's block on the videos, more... paralyzing perfectionism? Hmmm. Stream of consciousness is trivial, complete and coherent explanations not so much.

Still, it's been lovely, and if it wasn't for a half-dozen household emergencies (most recently needing to replace all the ducts in the house and the hot water heater) we'd have paid off the home equity loan we ran up over the pandemic. (I should just sell the house and move. Which would involve vanishing for a month while I packed everything out...)


August 17, 2023

Darn it, mstdn.jp is borked. It works fine from the app and when I'm logged in, but when I use my phone's browser (not logged in) or an incognito window (ditto) it loads as a black screen. Which is also what happens when I send somebody offsite a link to one of my posts.

I pulled up the site's "about" page (while logged in) and that gave me a blob of japanese text that google translate says means email sns at bunsan dot social. I gave that a try, and got back a Delivery Status Notification (delay) the next day. Not a good sign.

Is this some sort of language detection thing? When I joined the site it was run by "sugitech", a journalism organization headed by a 20-something woman, but I heard rumblings of them handing it off to a big organization with deep pockets when Twitler sent a flood of refugees their way. (Which involved cloudflare failure messages for a bit as the domain got handed over, don't ask me how a CDN works into activitypub and regularly updating individual timelines, but the site did get a lot more responsive when it came back up.)

I noticed the phone link issue on the bus heading to the Barbie movie, so it's been going on for at least a few days already and has not resolved itself. I really don't want to switch servers, in part because selecting a new server is annoying. (I still haven't moved my email off gmail!) Luckily b0rk made a guide to running your own mastodon instance. (There are various places that'll run a dedicated mastodon container for you in the $5/month range, and in theory I can have arbitrary subdomain.landley.net addresses redirect who knows where. But A) I'm _BUSY_, B) although requesting an archive gives you all your old posts, there's no obvious way to load those posts into the new server. I mean yeah the old links won't redirect automatically either, but I could at least give out NEW links to old posts on the new server instead of the content going away if the old server does. Nobody seems to have written a "parse the json and manually stick the posts into the database" script yet, although I can't say I've looked that hard...)

Sigh. Throw it on the todo heap...


August 16, 2023

The other issue with this cp -r stuff is filehandle exhaustion. I'm using openat() variants for everything I can so I'm not re-traversing paths that can change out from under us, which means as cp -r is descending into directories it's opening two filehandles per level, so when the directory gets >500 levels deep the default 1024 filehandles allowed in a linux process (ulimit -n) get used up.

I have a plan for rm -r to teach the traversal to close filehandles above the current parent and re-open them via ".." (and then compare the dev:ino pair in stat and traverse back down from the top if it's not the same). I can optimize that slightly by: A) allowing the first 50 or so directory levels to keep their filehandles so the common case never hits this, B) leaving discontinuous filehandles open (if stat ".." doesn't give the dev/ino we have for the parent, keep the filehandle) which catches symlinks (but not bind mounts). I still need the "drill back down from top" logic in case somebody does a "mv" down in the tree during a traversal, and yes it might not find the directory we were in. The question is what error handling looks like there: maybe error_exit()? There are potentially DIRTREE_COMEAGAIN calls I can't make if I haven't got a parent filehandle, and again we wouldn't do this for the first 50 levels, which should cover any non-pathological filesystem layout. (Yeah, famous last words...)
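
The climb-one-level step might look like this (a sketch, not actual dirtree.c code):

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Climb from dirfd to its parent via "..", then verify the parent is
// the dev:ino we remembered. If not, a symlink or mount got involved
// and the caller drills back down from the top instead.
int reopen_parent(int dirfd, dev_t dev, ino_t ino)
{
  struct stat st;
  int fd = openat(dirfd, "..", O_RDONLY|O_DIRECTORY|O_CLOEXEC);

  if (fd == -1) return -1;
  if (fstat(fd, &st) || st.st_dev != dev || st.st_ino != ino) {
    close(fd);

    return -1;
  }

  return fd;
}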

I probably want to do it WITHOUT keeping the first 50 for a release or two, just to catch errors in the less-used codepath.

Doing something similar with cp means saving a dev/ino pair for the NEW directory somewhere. Right now dt->extra is the filehandle of the destination directory, and even if I did want to sometimes replace that with the dev/ino pair (and had a reliable way of distinguishing which it was... negating it only leaves me 31 bits in a long on 32 bit platforms) there isn't enough space to store _both_ device and inode in one integer (even with clever bit packing/shaving, kernel_dev_t is 32 bits and inodes can be 64 bits on modern filesystems, sort of goes in the "large file support" bucket, disks are big now). Putting a struct there adds malloc/free I haven't got construction/destruction callbacks for. With dirtree_path() I can request extra allocation space up front, maybe I need something like that for dirtree? (It's already a variable sized object because of the name string at the end.) Where would I PUT it? We haven't got global dirtree traversal data (a lack I've noticed before), and it's not easy to fit that extra info into the function call API. But realloc() after the fact has the problem that the pointer can change when you do that, so other things that point TO it need updating. Hmmm...

There's a reason it's still on the todo list. :)


August 15, 2023

I have a pending cp.c change where xattrs don't apply to directories, and the problem is there's no mkdir variant that returns a filehandle to the open inode, you have to create-then-open which is a race window for shenanigans. Applying selinux labels after such a race window is just CONCEPTUALLY WRONG. But also unavoidable with the existing API?

Having cp operate in a less than ideally secure way is annoying but not unprecedented. Applying SECURITY LABELS in an insecure way is just... why bother? I have a conceptual objection to this. It bothers me.

In theory I can open (not following symlinks) and then do some paranoia on the filehandle: confirm S_ISDIR and that .. is the expected dev:ino of the parent, right user:group, it's on the same dev... except that cp -a is EXPECTED to follow an existing symlink if it's there, isn't it? I'm going "what if they did a bind mount" but... normal use case could theoretically have a bind mount. If you cp -a into an existing directory, does it modify the ownership and permissions of that directory?

These are design questions I need to resolve, and then add tests for, but it's the kind of tests that requires a magic build environment that has/supports xattrs and runs as root so it can fiddle with ownership...

Alright, if cp creates a directory with permissions 700 and populates it and then adds world fiddlable permissions on the way back OUT, then at least other users shouldn't be able to take advantage of the create/open window. It already does a DIRTREE_COMEAGAIN chmod, but that's because we forced the directory to be writeable to ourselves, and because other changes already drop the suid bit. This means that today, if you ctrl-C in the middle of a copy the permissions are only SLIGHTLY wrong, but if I create directories 700 then an interrupted copy's directory permissions are VERY wrong, and access during the copy is no longer a thing. Plus if we didn't create the directory then we don't change its permissions and thus its contents aren't protected, and directories we create at the top level would be in an existing directory that wasn't protected, so that _can't_ be a complete fix. (Unless I create each directory in a hidden .subdir, then open it and "mv newdir .." into place, which is just WAY too magic, and again kill -9 would leave debris...)
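
And the create-then-open half, sketched (not actual cp.c code): since mkdirat() can't hand back a filehandle, open the result with O_NOFOLLOW and check the paranoia list from above before trusting it:

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Create mode 700, open with O_NOFOLLOW so a swapped-in symlink fails,
// then confirm what we opened is a mode 700 directory we own before
// anybody applies security labels to it. (Assumes umask leaves 0700.)
int mkdir_open(int atfd, char *name)
{
  struct stat st;
  int fd;

  if (mkdirat(atfd, name, 0700)) return -1;
  fd = openat(atfd, name, O_RDONLY|O_DIRECTORY|O_NOFOLLOW);
  if (fd == -1) return -1;
  if (fstat(fd, &st) || st.st_uid != geteuid() || (st.st_mode&0777) != 0700) {
    close(fd);

    return -1;  // somebody raced us: bail rather than label it
  }

  return fd;
}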

Sigh. Secure vs obvious.


August 14, 2023

And the btrfs fix got merged into the btrfs maintainer's tree. We _just_ missed the -rc6 pull but there might be an -rc7 pull before the release. I have NO idea why the commit's in the log there twice (as c5e6134bb363 and also 9b378f6ad48c an hour apart) but I am NOT ASKING. Selling past the close, trust the process, do not interrupt the enemy while he is making a mistake...

Got distracted by sort.c for a bit until I hit a snag. (It's probably "skip" but wanted to ask first.)


August 13, 2023

Saw barbie. It earned that billion. Have not heimed yet, might barb again as a lead-in. Or perhaps afterwards. (Fade is already looking forward to watching the DVD extras.)

Tested the btrfs fix so I could post my In-Triplicate-By: line to the mailing list in accordance with the prophecy. Alas I can't seem to link to spinics at the moment because it's failing to connect. (Meaning _both_ btrfs web archives are down; one went away in either 2016 or 2020, the other connected this morning but won't now, possibly it's a phone tethering vs apartment wifi issue?) But here's a cut and paste of the reproduction sequence I sent there:

$ mkroot/mkroot.sh CROSS=x86_64 LINUX=~/linux/btrfs-patched KEXTRA=BTRFS_FS
$ cd root/x86_64
$ truncate -s 1g btrfs.img
$ mkfs.btrfs btrfs.img
$ ./run-qemu.sh -hda btrfs.img
# wget http://10.0.2.2:8888/btrfs-test-static
# chmod +x btrfs-test-static
# grep btrfs /proc/mounts # just confirming
# mkdir /mnt/sub
# cd /mnt/sub
# for i in {1..1000}; do touch $i; done
# /btrfs-test-static
# exit

That didn't include the btrfs-test-static source or build because it was already on the list, but that's:

$ cat test.c
#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  DIR *dir = opendir(".");
  struct dirent *dd;

  while ((dd = readdir(dir))) {
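    // Rename each entry out of the way and back: the directory's
    // contents end up unchanged, but on unfixed btrfs each rename
    // appends fresh entries to the open readdir(), so this loop
    // never reaches the end of the directory.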
    printf("%s\n", dd->d_name);
    rename(dd->d_name, "TEMPFILE");
    rename("TEMPFILE", dd->d_name);
  }
  closedir(dir);
}
$ x86_64-linux-musl-cc --static test.c -o btrfs-test-static
$ toybox netcat -s 127.0.0.1 -p 8888 -L toybox httpd .

The bug was that changes to the directory were appended to active readdir() sessions (actually getdents() under the covers), meaning if you traversed the directory touching files your readdir() would never end, which hit users of my find implementation trying to build AOSP on btrfs. I could have worked around it, but not _reliably_. There's an unavoidable denial of service attack if one process can pin another process's readdir() in a loop. (Yeah, maybe process scheduler batching would prevent it but do you want to trust that? Are you sure systemd never does a readdir() on a user-modifiable directory?)

This was found by toybox's find --exec, which ran a process for each directory entry as it was read. I could have switched it to DIRTREE_BREADTH to read each directory's contents into memory before running the first child, which would have worked around this trigger for this bug. (And been noticeably worse on embedded systems with low memory, thus I'd want to NOT do it for most filesystems meaning add a config option for it, but I don't LIKE having this sort of config option so it would have been called something like CFG_TOYBOX_BTRFS_BUG which just seemed rude. I could also have added a find --breadth command line option, and might yet, but that just gives me something to REPLY to bug reports with. Nobody would ever organically AVOID being hit by this issue, it would be a mop and bucket to clean up with.)

But read all then use would still be vulnerable to something like my test program running: it doesn't just pin itself, it pins ANY OTHER PROCESS doing a readdir() on that directory. They all get broadcast updates a la inotify, so "not triggering ourselves" does not actually prevent this problem.

I also thought about caching the dev:ino pairs to eliminate dupes, but a flood of duplicates coming in can still pin us forever: we still hang eating 100% CPU even if we're discarding them so the OOM killer doesn't zap us. Maybe our process would eventually outrace the other one due to scheduler batching letting us run to the end of what was queued so far before the other process gets to add more... modulo SMP, and assuming the other one isn't a malicious actor WANTING a denial of service that spawned 16 threads to hammer a directory (which is just dentry spinning, doesn't even need to hit backing store if they never get old enough to flush). And that's if the libc's readdir() implementation was using a big enough getdents() buffer size under the covers that we ourselves don't schedule a bunch because of all the system calls we're making to fetch entries one or two at a time...
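(For reference, what readdir() does under the covers, as a sketch of driving getdents64 directly with a caller-chosen buffer size; not anything toybox currently does:)

#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Count a directory's entries via raw getdents64: a bigger buffer means
// fewer system calls, so fewer chances to get descheduled mid-traversal.
long scan_dir(char *path, char *buf, unsigned len)
{
  int fd = open(path, O_RDONLY|O_DIRECTORY);
  long total = 0, bytes;

  if (fd == -1) return -1;
  while ((bytes = syscall(SYS_getdents64, fd, buf, len)) > 0) {
    long off = 0;

    // each variable length record says how long it is
    while (off < bytes) {
      struct dirent64 *dd = (void *)(buf+off);

      off += dd->d_reclen;
      total++;
    }
  }
  close(fd);

  return (bytes < 0) ? -1 : total;
}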

The caching could stop at the first duplicate, but that would stop _early_ on filesystems that store things in trees or hash tables and return _some_ duplicates because "it moved later in the tree/hash we're traversing" means a renamed entry gets returned again under the new name, just not in a way that results in an endless loop. You'll reach the end of any given tree/hash table eventually, entries move earlier as often as they move later (and even a calculated attack is traversing a finite keyspace, plus the calculation's gonna slow the attacker down so the defender outraces it and terminates). That does mean such filesystems can miss renamed entries that jump BACK past a traversing cursor, which is its own kind of bug, but a much smaller one. (I suppose I should make toybox rm -rf try to traverse a directory _again_ if it can't delete it, in case it missed a renamed entry? Eh, file creation is the same exploit, and that can legitimately happen at any time. Do a good faith effort and report shenanigans if it's changing out from under us, which is the current behavior.)

And if I _don't_ stop on the first duplicate, when _can_ I stop? As many duplicates as I've read entries? That fails if the first entry is renamed: read one, dupe one, ignore rest of directory. (Ok, astronomically unlikely but still not RIGHT.)

So the workaround I'd worked out before they fixed it was "cache entries, stop at 16 times as many dupes as legitimate entries" which is a horrible evil heuristic but would at least terminate while finding entries reasonably reliably. And I hadn't applied it because "ew" and "just don't use btrfs", and was waiting for a _third_ bug report about it. (Although the first one was already two bug reports.) But actually fixing the problem in the kernel is SO MUCH BETTER.
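From memory it was shaped something like this (a sketch keyed on d_ino with a dumb linear seen-list for brevity, where the real version cached dev:ino pairs; not the actual abandoned patch):

#include <sys/types.h>
#include <dirent.h>
#include <stdlib.h>

// Collect a directory's inode numbers, discarding duplicates, bailing
// out once duplicates outnumber real entries 16 to 1 (the horrible
// evil heuristic). Returns entry count, caller frees *inodes.
long list_dir_bounded(char *path, ino_t **inodes)
{
  DIR *dir = opendir(path);
  struct dirent *dd;
  long count = 0, dupes = 0, i;

  *inodes = 0;
  if (!dir) return -1;
  while ((dd = readdir(dir))) {
    for (i = 0; i < count && (*inodes)[i] != dd->d_ino; i++);
    if (i < count) {
      // Already saw this one: tolerate some (trees/hash tables can
      // legitimately re-return renamed entries), but not forever.
      if (++dupes > 16*count) break;
      continue;
    }
    *inodes = realloc(*inodes, ++count*sizeof(ino_t));
    (*inodes)[count-1] = dd->d_ino;
  }
  closedir(dir);

  return count;
}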

Oddly enough this is one of the bugs posix enshrined as allowable, due to refusing to call broken implementations from the 1980s non-conformant (yes, even in the 2018 version). A posix readdir() is never guaranteed to finish, it can return infinite results on finite filesystems. But I _also_ remember linux-kernel developers arguing about this back in the day (circa the 2.5 development cycle) which is why I was aware of it in the first place: trying to return "new" files added after the directory was opened (or at least after the first getdents() call on the open file descriptor) opens denial of service attacks.


August 12, 2023

So the mkroot failure is in main.c where toy_init() frees the old toys.optargs if it's not an incremented variant of toys.argv, and something in sh.c is setting toys.optargs to something that A) isn't malloc()ed so can't be freed, B) isn't part of argv's existing environment space. So the free faults. Alas, the musl-cross-make cross compilers I'm building don't support ASAN so I can't get the "it was allocated here" stack dump, for which I admit a growing fondness.

The best debuggers I ever used were A) integrated into Turbo C for DOS, B) part of some OS/2 IDE at IBM, and both went away again, so I stopped relying on them and moved to my current "stone knives and bear skins" approach of editing with vi and compiling from the command line with a bunch of printf()s stuck into the code to track down problems, because those tools can't easily be taken away again. Yeah this is open source but that still has life cycles. I remember when xmms was _the_ mp3 player for Linux, and it was declared unmaintainable and had its last release in 2007. We all had to migrate from xfree86 to x.org, and "death before systemd" puts one distinctly in the minority these days. Lots of desktop software I relied on quite heavily was tied to KDE, meaning it went away when KDE became ergonomically unusable to me (and others, although Linus is apparently far more forgiving than I am). I still miss Kmail, and Konqueror, but "giant hairball tied together so breaking one part breaks all of it" ain't my jam.

Sigh, I need to divert into doing an LLVM+musl toolchain build script from source so I can do most of my cross compile testing with clang, but there are just ENDLESS bug reports...

Speaking of which, the btrfs issue got independently reported again so I checked vger for a btrfs mailing list and then posted there and a day later, there's a fix. Very nice. Triaging the bug and getting the right person's attention is always the hard part.

There's also a cultural difference between up-and-coming projects like btrfs, which are working to convert people away from established alternatives like ext4, and entrenched king of the hill projects like Linux where gatekeepers who've been running things for a quarter century insist supplicants work to prove their issue worthy of consideration. (I watched Linux turn from the first type to the second type, and am sad.)


August 11, 2023

The junior combo at Borger King (have it our way, your way is irrelevant) has, like Wendy's, developed reasonable portion sizes. For $7 instead of $5, but this is what's within reasonable walking distance of Fade's.

Poking at the Linux From Scratch 11.3 build: yes I have a bit of swap thrashing going on here, but "driving test environment" is kind of an important thing I've been missing to organize all the OTHER work...

I miss how clean the earlier LFS versions were, this one does half the chroot in chapter 5 and the other half in chapter 7, has a fairly awkward handoff where the new chroot hasn't got "mount" in it so has to be fairly extensively set up (as root!) by the host in a way that isn't really reentrant (not a problem doing the work manually, but awkward to develop a build script in stages under), and I no longer follow the logic of the /tools directory at ALL.

In earlier LFS versions when you chrooted there was ONLY the /tools directory containing all the binaries you'd cross-compiled from the host, and you set $PATH to point at /tools/bin and ran your builds in the chroot, and then you'd rm -rf /tools once you'd used it to build enough of the new system you no longer needed it. This was the original "airlock step", which made sure that none of the files built on the host wound up in the final system. But now, 2/3 of the new chroot is outside of /tools when you chroot. I'm not sure why any of it is IN /tools anymore...

I miss the Linux Luddites podcast. (Motto, "Not all change is progress", and intro tagline "Every week we try the latest free and open source software and then decide we like the old stuff better". The Linux Late Night podcast was not a sufficiently interesting replacement. Oh well...)


August 10, 2023

Fade had an appointment with her shrink (who prescribes her ADHD meds and the anti-anxiety pills) and she had me tag along to meet her. Said shrink can't take me on as a patient (and thus prescribe modafinil at me) because I'm not a UofM student, but she recommended a couple people (I am VERY OBVIOUSLY a poster child for ADHD) and meanwhile she suggested I get a sleep apnea study (a thing Fade had previously mentioned she thinks I have; I dunno, I'm generally not conscious for that part). There was a slot to see somebody to start that process a half hour later in the same building... but he stopped listening after he took my blood pressure and it was 140/90 (well I had caffeine this morning, didn't know somebody was going to measure my blood pressure). He scheduled a blood draw for monday. He did not schedule a sleep study, I have to engage with the "online portal" to do that, which means working out how to log into it again.

This will be at least the third time somebody noticed something weird about my circulation and did a blood draw. The third time I went to the emergency room with 3am chest pain in Austin I let them draw blood (since they'd done a chest cat scan and found nothing wrong; it happened every spring when I lived 2 blocks downwind of pease park and left the windows open at night, dunno what blooms that time of year but it was an annual event that stopped when I moved to the other side of campus) -- their tests found nothing wrong. And back when I was on diuretics it was after another doctor did a blood draw and found nothing wrong. My blood pressure has ALWAYS been at the high end of normal (my father was put on blood pressure medication in his 20's, not because of a problem but because of a measurement). That's not what I was there for. If an alcoholic gets shot and goes to the hospital, they're there ABOUT THE BULLET.

Sigh. The doctor did prescribe me five ativan, so I could try one to see how it affected me and then take one before the blood draw. Haven't picked them up yet. It's... not the problem? I mean my needle phobia IS a significant problem for his desired course of action, but he's willing to put some effort into (and provide controlled substances for) pursuing the goal HE wants to see, and not a lot towards pursuing the goal I came there for.

And this is the GOOD medical system, not the completely dysfunctional Austin mess where nobody was taking new patients but Fade diligently found me a general practitioner at a "men's health" sports clinic near the UT stadium, which I had exactly ONE meeting with (he basically ignored the issues I came there for because there wasn't a bone sticking out, and I might as well not have bothered) and then the practice closed down 6 months later so I'd have to either find a new GP or drive to Round Rock. (You'd think the light rail would go there: it doesn't. You'd think the bus system would go there: it doesn't. Greyhound goes PAST it up I-35 to stop in Wacko. Similar problem trying to visit Elgin to the east without a car: it's 35 miles away. I could either walk for 9 hours or get an Über there for $150.)

P.S. When the founder of a company names it after the middle word of "Deutschland Über Alles" they're an OBVIOUS nazi. Not exactly trying to hide it. Yes, Silicon Valley has a pronounced rich white male incel eugenicist "social engineering a heap of skulls" problem. Last one out of Silicon Valley remember to flush.


August 9, 2023

I've circled back to trying to clean up expr.c again but I have zero experience using it: when $(( )) math isn't sufficient I call python. This weird "some arguments are strings, some arguments are integers" business is:

$ expr abc + def
expr: non-integer argument
$ expr abc '*' 3
expr: non-integer argument

The obvious results would have been "abcdef" and "abcabcabc" but no. And then there's the colon operator, which I thought would produce a substring but:

$ expr a123b : '[0-9]*'
0
$ expr a123b : 'Z'
0
$ expr a123b : '1'
0
$ expr a123b : '[0-9]'
0

It's not true or false... ah, figured it out. It's doing the least useful thing it could possibly do:

$ expr a123b : a123b
5
$ expr a123b : a12
3
$ expr a123b : a1
2

Returning length of initial (anchored) match. Bra fsking vo.

Ah, and reading the expr.c source in pending, it returns the match length when there isn't a sub-match in the regex, and otherwise returns the string of the first sub-match. I would not have guessed that. Want to know what the expr man page says? STRING : REGEXP - anchored pattern match of REGEXP in STRING. Which doesn't say what the actual RESULT should be at ALL.
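So in C terms the observed behavior works out to something like this (a sketch against POSIX regcomp(), not the pending expr.c code; expr patterns are BREs implicitly anchored at the start):

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// STRING : REGEXP per the above: returns the match length as a string
// when the pattern has no \( \) group, else the text of the first
// group ("0" or "" respectively when nothing matches).
char *expr_colon(char *str, char *pattern)
{
  regex_t re;
  regmatch_t m[2];
  char buf[256];
  char *result;

  snprintf(buf, sizeof(buf), "^%s", pattern);  // expr anchors implicitly
  if (regcomp(&re, buf, 0)) return strdup("0");
  if (regexec(&re, str, 2, m, 0)) result = strdup(re.re_nsub ? "" : "0");
  else if (re.re_nsub) result = (m[1].rm_so == -1) ? strdup("")
    : strndup(str+m[1].rm_so, m[1].rm_eo-m[1].rm_so);
  else {
    sprintf(buf, "%ld", (long)(m[0].rm_eo-m[0].rm_so));
    result = strdup(buf);
  }
  regfree(&re);

  return result;
}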


August 8, 2023

Dentalized. My front teeth look like teeth again, which I wasn't sure was possible but they did an excellent job. Face all screwed up by chemicals, and when those started to wear off my _nose_ hurt, which I wasn't expecting. Wound up napping until afternoon.

Going through the pile of old patches in my toybox directory trying to at least delete the ones I historically applied. Found a half-finished "move this out of the way so I can apply something else" save from years ago that I eventually worked out was "in printf.c \0 doesn't work with %b" and managed a fresh fix for. (At least I THINK that was the issue? Found a bug, fixed the bug, deleted the old unfinished change. Best I can manage.)

I've got an old patch to expr.c which I never finished cleaning up because I wanted it to share code with $(( )), but I've got a reasonable $(( )) implementation in toysh now and it doesn't look anything LIKE expr. As in they don't do the same thing: strings are variables to the shell, but literal strings to expr. Plus $((1+2)) doesn't have spaces and expr 1 + 2 has a hard requirement for spaces separating tokens. And expr hasn't got any assignment operators, single = is comparison. So with the benefit of hindsight it's not just "factor out recalculate() and have it use a callback to look up strings", it's got enough differences that large chunks of the infrastructure would need to drop out and become callback plugins, and it's probably past the point where trying to make the rest collapse together is worth it. But it's still really uncomfortable having both.

And the one in expr is just NOT MY STYLE. A table and code operating on that table using an enum to pass data between each other? So very much NOT "single point of truth". (And I've fallen out of the habit of using case statements because they're only a win about 10% of the time, and I don't think this is one of those times either.)


August 7, 2023

Gotta be up early tomorrow for an 8am dental appointment.

There's no obvious way to tell chrome that the URL bar has nothing to do with search, and it should not pollute the URL autocomplete suggestions with "how old was the series of tubes guy when he misunderstood the internet as badly as Joe Biden is doing now". I try very hard to pull up google.com and run my searches there so as NOT to pollute the autocomplete history, but chrome disguises empty pages as google pages, except what you type in there gets shoved into the "pollute your URL autocomplete" namespace. Once again, the "we know better than you what you want and will shove our way down your throat until you comply" attitude. You'd think they'd know better than to do that on Linux, but no. (From hell's heart I stab at thee, for hate's sake I spit my last breath at thee, otherwise I would already be using Windows or at least a Mac, honestly...)

People online are panicking that we all need to migrate off of chrome to avoid Google's new web DRM nonsense anyway, so I'm curious if the vivaldi browser doesn't have this un-disableable URL autocomplete pollution problem. (If it does, I at least have the mastodon contact of their project lead. He's already responded to a poke, and left to his own devices posts cat pictures. Yes, that's a positive sign.)


August 6, 2023

On a plane to Fade's. Didn't blog for a bit after getting the toybox release out, largely a "collapse" thing. There was the Taiwan talk, the toybox release, and the flight to Fade's all in a row being Looming Deadlines. (Returning for more dental work, although the University of Minnesota wants Fade to teach one more class this fall which may extend the health insurance another semester, who knows?)

Lots of airplane prep stuff: got a haircut, got a suitcase packed at the very last minute (with two requested boxes of HEB store brand cereal and a very frozen tray of Fuzzy's lemon bars).

Didn't get the 16 gig ram chips moved from the old laptop (which I left behind in Austin) to the new one, but I've noticed that the battery standby time is twice as long on the new setup. I don't think it's just the fresher battery, I suspect twice as much ram pulls more power. Plus I still haven't gotten to a good "close all the windows and shut the laptop down" point since my last visit to minneapolis.

I really need to record a proper version of the taiwan talk. I did 90% of the prep work and then hit the scheduled release window like a bird, as it were. I packed the good microphone, so hopefully I can get that done at Fade's.

Fade's posted about how half-done is better than not done. I did NOT respond with a link to the Simpsons song "Do a half-assed job". But that's half my problem with the videos, I'm being a perfectionist. The other half is the same problem I had with the "simplest possible linux system" talk years ago: circular dependencies. There is SO MUCH BACKSTORY...


August 5, 2023

I noticed busybox added "tsort" (which is hubwards of Hersheba), a posix command I skipped as irrelevant because nothing in the Linux From Scratch build (or the portion of the Beyond Linux From Scratch build I tried) ever used it, nor have I in my various unix poking since 1992. But it _seems_ like low-hanging fruit... Except the posix page doesn't give even a hint of what the command actually DOES, and the man page is basically the posix page. The wikipedia page at least gives an example: you feed it pairs of names, each pair meaning "the first must come before the second", and it emits the names in an order satisfying every pair. But... I don't understand WHY? (Sort an acyclic graph! What does it output if there IS a cycle? It outputs an error message "input contains a loop". Uh-huh.)

But... why? I mean... what's it FOR? (The history section says it was part of the innards of an ancient linker? Um... ok? As a command line utility still out there in 2023? And just recently added to busybox. WHY was it added to busybox? Is this one of those "because it was there in posix" things, or did someone actually have a use case?)


August 4, 2023

Watching the second episode of Good Omens Season 2 with Fade and Fuzzy. It's excellent.

This role really allows David Tennant to show his range: not of this world, centuries old, lives in an obsolete supernatural vehicle, passing for human but sometimes only just, saves the world by wandering around talking to people and performing the occasional minor miracle, interacts with famous historical figures but generally treats the high and mighty the same as shop clerks, treats money as a minor annoyance he can largely ignore, can't function properly without his companion...

At the start of the 10th Doctor's tenure he asked "Am I ginger", and at the end he predicted some new man would go sauntering away. Crowley is ginger and has a heck of a saunter.


August 3, 2023

Collapsed a bit after getting the release out. Gotta pack to fly to Fade's, but just sort of... not doing it. (Lot of taking my laptop out somewhere and sitting down listlessly shuffling through stuff.)

There's a form of stunlock where I have so many todo items laid out in front of me that every time I open my laptop and select a window with a todo item in it I get a different one, and do a few hours work on it (half of which is refamiliarizing myself with where I left off and working out the design again) but not enough to get it checked in, and then next time picking a DIFFERENT window. And if I spend more than a couple hours on one thing without getting it done, I go "no, I'm spending too much time on this, everything ELSE needs to get done" and swap, often without realizing I'm doing it. (Wasn't an issue before I had people waiting on my output...)


July 30, 2023

Toybox 0.8.10 is out.


July 28, 2023

Panic panic panic talk in half an hour. Cut and paste my TODO list for the talk out of the outline and into here:

#TODO: https://landley.net/bin/{toybox,mkroot,toolchains}
#TODO: upload new toolchains
TODO: test busybox package
#TODO: test extra in miniconfig

Sigh. I have now given a talk by pointing my phone camera at my laptop screen and typing with one hand. [Achievement unlocked: sigh.]

Note to self: do not assume that just because you're trying to use google meet with google's chromium browser, and because the meet page is showing you yourself through your webcam, that "google meet" won't crash when you click the "join" button. Rich says I could have worked around that by changing my user-agent string so Google Meet doesn't try to call some windows-only DRM library? The failure was trying to record video ahead of time and only testing that my phone worked with the google meet link for the Q&A, then running out of time (partly "perfectionism" but mostly "this isn't finished yet, let me try to nail this together real quick") and trying to "do it live"...

Anyway, I owe them a PROPER version of the talk. But I'd like the talk to actually describe a release version, so I need to do a release...

The commented out TODO items above mean: I made the "quick" bin symlinks so I can say https://landley.net/bin/mkroot instead of having to point people at https://landley.net/toybox/downloads/binaries/mkroot but unfortunately the way dreamhost's web server works (might still be apache?) there's no obvious way to discover the second URL from the first. I was thinking bin/toybox then click "parent directory" but that just peels off the symlink...

Built and uploaded toolchains with gcc 11.2 and musl 1.2.4. (Still built i686 on the host rather than x86-64, I should probably switch that over next time). Poked Rich about maybe actually upgrading musl-cross-make (hadn't had a commit in a year) and sent him my 3 local patches for it (not counting the package version upgrades). Redid the comment generation in linux-miniconfig so the third block says "# architecture extra" instead of architecture independent again.

Still haven't checked that mkroot/packages/busybox actually does anything useful, haven't actually run it in more than a year. Theoretically useful doing LFS bootstrapping if alpine's been regression testing that everything still builds under busybox (after I got it all working in the first place under aboriginal, thus allowing alpine to exist).


July 25, 2023

Sitting down to create the mkroot video for friday (yes prudetube emits fresh suck every day but I don't have to upload it there), and... I have 30 minutes total, which is not much time. There is SOOOO much stuff I want to complain about, and can't fit in the time allotted.

For example, in the README I have the example invocation KARGS=quiet ./run-qemu.sh -hda docs/linux-fullconfig but if you do that on x86-64 the QEMU bios still clears the screen and outputs a bunch of text (no obvious way to suppress this) despite the "quiet", which includes the magic broken esc[?7l sequence which screws up bash command line editing and history (it disables automatic wordwrap), and which you just have to know is undone by esc[?7h, which is why both the mkroot init script and run-qemu.sh emit the antidote. And then despite the "quiet" the kernel goes "you didn't enable this one specific bug mitigation, Doom and Gloom!" which... I'm running bottled code in a NAT-ted VM that's not trying to sandbox unknown code from the net? I do not care? If I did, I'd have added whatever config that is? (Plus this is an EMULATOR, I'm pretty sure it doesn't emulate the hardware flaw! And honestly, Spectre is just flaw du jour. There's tons! Did they ever even fix rowhammer? Or just smile and collectively agree not to look down?) But no, the kernel won't shut up because the kernel devs have been convinced they know better than everybody else for several years now...
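(The antidote is a single escape sequence; mkroot's init script and run-qemu.sh emit it from shell, but the C version would just be:)

#include <stdio.h>

// Turn terminal autowrap back on (DECAWM), undoing the esc[?7l the
// bios output spews. Harmless if wrap was already enabled.
int main(void)
{
  printf("\033[?7h");

  return 0;
}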

Argh, there's no WAY I'm describing a coherent subset of this in half an hour, let alone Q&A time. This is a similar problem to why the standalone mkroot project had a README but I don't have one in the mkroot subdir yet. If you want to build on alpine linux you're probably ok (haven't tried it), or if you use the cross compilers you're ok, but the "simple" create-a-chroot build against glibc only partly works, and explaining what's wrong is at least a 15 minute digression right there. Unless I just want to say "because Ulrich Drepper was an asshole and the bureaucratic committee that inherited the project when he bogged off to the finance industry hasn't had the spine to actually reverse any of his bad decisions". (I can badmouth instead of explain, which is TRUE but probably not helpful. "My project is good, these people are idiots"... not a good intro.)

I can start with downloading prebuilt binary versions, except... I want to reorder the mkroot binary output a bit, so linux-miniconfig and linux-fullconfig are in the "docs" directory. Just the files you NEED at the top level directory, run-qemu.sh and the files it calls...


July 24, 2023

Sigh, I don't WANT to switch browsers off chrome, but that's what's going around the zeitgeist right now. Right now vivaldi looks like the least bad option? (From one of the co-founders of Opera, who started over when his old company sold itself to china because capitalism. Yeah the code is a webkit derivative which means it's the same rewrite of a rewrite of konqueror, but that's a little like libre office forking off open office. Some of the gui stuff is source-under-glass, but people used QT for decades without caring about that part?)

Alright, what's standing between me and a toybox release: I want to fix cp -s the way I fixed readlink. I have a large /etc/passwd rewrite that (among other things) lets at least a lot MORE of defconfig build under the ndk. There's a bunch of pending sh changes but I can probably punt on that because that's NOT getting promoted this release.

I'm grinding through the LFS stuff: I ticked off dd from the pending list, and the other ones I've already done a good chunk of are diff, expr, gzip, tr, and xz. But what I really need to do is A) rerun that under a clean minimal debootstrap to get a $PATH dependency list without a bunch of extraneous crap that configure opportunistically included, B) finish the within-chroot part and log what THAT'S using. (And prepare a "yes it worked" double build smoketest I can just leave running in the background.) But I'm reluctant to do complicated things in a chroot because "ifconfig" and "date" and so on can still screw up the host context as root within the chroot. Which is why I had the unshare line, but toysh running mkroot's init script within the chroot didn't detect that stdin was already open so replaced it with the container's /dev/console which apparently goes nowhere, so I went "lemme just do this under mkroot" and I made an 8 gig ext2 partition, used toybox httpd to serve a tarball of the debootstrap result, wget that into the new partition and extracted it, chrooted into that, and it didn't work and I don't remember why and need to try again.

Too much of a tangent to block the release for.

And I've got that remote "intro to mkroot" talk on friday for the Taiwan conference that should really describe how to use the vanilla release and the currently uploaded mkroot system images and so on. Need to do a 10 minute "download prebuilt binary tarballs and play", and a 10 minute "building this from source with the cross compilers" (which keeps getting derailed by "why dynamic linking instead of static linking is REALLY COMPLICATED, which is what screws up building WITHOUT the cross compilers on glibc hosts, although it works ok on something like alpine"...)

I mean honestly, I've got a bunch of tricks to harvest shared libraries out of the host toolchain, but they all suck. There's a sequencing issue about needing to select dynamic _before_ building toybox if that's to be dynamically linked, but you don't know what shared libraries you need until AFTER you've built all the binaries, and if you just copy everything out of debian it's 1.7 gigabytes of shared libraries on my install, which ain't gonna fit in initramfs. But if you try to be selective and recursively run ldd on the binaries after you've built them, plus each shared library you copy, that STILL doesn't identify the dlopen() crap that glibc calls even from static builds. (It's not IN the runtime linking, it's done by functions after the program starts running. For BAD REASONS.) So if you want to make dynamic linking against glibc work, you need a hardcoded list of additional shared libraries to copy to target in case they're dlopen()ed, and that list will of course change with new glibc releases.

Did I mention that Red Hat maintains glibc? Yes, the same people who did systemd. The same people who are trying to shove wayland down everybody's throats. The same people who stopped releasing their source code. IBM, You BM, we all BE for IBM.

Anyway, wasted a bit of time trying to make dynamic linking (against glibc) not just work but be cleaned up enough to be easily explicable in 2 or 3 minutes of the upcoming talk, and it just wasn't happening.

Darn it, Microsoft github's tests are failing. Spotted it earlier but couldn't see what was wrong from my phone because Microsoft won't show me test results unless I log in. (It literally says "log in to see test results" when I click, and I'm not giving Microsoft my phone credentials.) And there's... some sort of version skew with ubuntu, maybe? Lots of "bzcat: out EOF" I'm not seeing in a "git clone toybox blah && cd blah && make distclean defconfig toybox tests" on my machine. And those aren't even the actual failures, which seem to be in tar...

Although I AM seeing spurious output from that clean run (and yes my rote memory version is disabling ASAN because I'm not letting make tests build a toybox binary but telling it to build one from the command line, but one issue at a time). It's diff saying expected/actual don't exist in the pwd tests, because it's creating ../expected and ../actual and then doing "cd ..; diff expected actual" afterwards. But pwd.tests did an "ln -s . blah; cd blah" which means ../file is doing a physical file traversal to the parent directory, but cd .. is peeling off the last $PWD entry which is the NOP circular symlink. Although it's still saying it's NOT an error because the diff produces no output! (Which is right for the wrong reasons, and _itself_ a bug.)

On the one hand, I don't want the test doing a cd to change where expected/actual live. On the other, I don't want to pollute the environment variable space with extra stuff? Still, the second is definitely the lesser evil. And I should also capture stderr as part of the diff output when detecting test failure.

None of which is what's going wrong on Microsoft github, of course.


July 23, 2023

Still checking the kernel bug report for the btrfs issue. No response yet...

Got dd cleaned up and promoted. That was one of my big "I want to get this into the next release" things I'd been holding the release for, so I might cut a release today.

Part of the dd promotion was just NOT adding the block granularity tests and instead just waiting for somebody to complain. I THINK I'm getting them right? (I'm still not sure conv=sync is handled right, but if I'm getting it wrong I'm pretty sure the previous code was too? There's no test for it yet... ok, added a test.)

Watching twitter's dumpster fire du jour from Mastodon and being glad I got out before the frogs REALLY started boiling. The waves of "this is fine" burning dog energy are just... it's like watching catholics justify each new pedophile priest scandal and find reasons not to react or change to each new input. (Another hundred children's bodies found buried under a catholic school? Ho hum. We're still the ultimate arbiters of morality, never mind the schools we filled with kidnapped native americans are like a serial killer's backyard, after all we finally rescinded the Doctrine of Discovery in... March of this year. Now go eat your human flesh and drink your human blood which may look like crackers and wine but we insist are literally, not merely symbolically, actual cannibalism. In a good way!)

I have a history of not being on AOL, not using Windows, not using Faceboot, not drinking, and for most of the past decade not driving a car. Doing Without is pretty normal. Heck, my twitter account got blocked in 2019 for tweeting "Guillotine the billionaires" as my comment on link du jour a couple hundred times, and their "I'll know it when I see it" ever-changing community moderation trends shifted out from under me so that was retroactively No Longer Acceptable. Except it's a political position: if this country has capital punishment, how does it NOT apply to the Sackler family behind the opioid crisis having provably killed 100k people? So old Twitter-under-@jack retconned all my old posts' status and wanted me to performatively delete each instance from the history _and_ give them my phone number, and I went "oh well, no more twitter" back in 2019. Jack let The Resident keep his twitter account. "This is not a place of honor. The danger is in a particular location. It increases towards a center." Muskrat buying the thing was a matter of degree.

As for missing it... I miss Livejournal. When Russia bought that, its userbase fled. I miss the #busybox freenode channel circa 2003, which broke up and wandered off long before some Korean billionaire bought freenode and its userbase fled to three different servers. (Erik was better at community management than me. It's never been a strong suit of mine.) "This too shall pass." Insane billionaires destroying companies I grew up with like Sears (Eddie Lampert) and Toys-R-Us (Mitt Romney) were a bigger deal to me than an insane billionaire destroying a website that was founded over a decade after I graduated college.

And no, it's not "capitalism" doing it, any more than "monarchy" killed Anne Boleyn and Jane Seymour. There was a specific guy. He had a neck. Society allows Lampert, Romney, Musk, eight Sacklers, and (according to Forbes) 2629 other billionaires as of May 2023 to all sleep safely each night in a country with half a million homeless people and 16 million homes sitting vacant, with 34 million "food insecure" people (9 million of which are children) in the world's largest food exporter. That's a choice. I have a political objection to that. Our gerontocracy (Biden's 80, Pelosi's 83, Feinstein is 90 and basically a vegetable) isn't going to change until the Boomers die. The non-defunded police are constantly punching down, with bullets.

So yeah, twitter delenda est. This too shall pass and could be quicker about it. I didn't try to defend "the wall street journal" when Rupert Murdoch bought it, I'm not trying to defend twitter from its new owner. Burn baby burn.


July 22, 2023

I want to hammer my SSD _slightly_ less hard, which means both reducing the amount of swapping it's doing and running the big LFS package builds in ramfs (well, at least cp -s the source into a tmpfs mount), which means it's finally time to transplant the 16 gig memory chips from my old laptop to the new one, which still has 8 gigs (2x 4 gig; the other has 2x 8 gig).

Alas, this involves a reboot, which involves closing piles of open windows in 8 desktops, which means a lot of cut and paste of "here's a test I ran to figure out a toysh corner case" into my sh.tests file from which I should eventually try to extend tests/sh.test into something with actual design coverage.

It's very slow going. I have quite a lot of open tabs. And trying to do the thing IN a tab tends to spin off tangents (more tabs)...


July 21, 2023

Sometimes I get a poke on github where I honestly don't understand what's being asked. Somebody finds busybox "strings" to be performance critical to them, and this should be of interest to toybox?

I feel like I'm missing something, but honestly can't spot it.


July 20, 2023

I poked a little at adding a config TOYBOX_BTRFS_BUG_WORKAROUND that enables a lib/portability.c section wrapping opendir()/readdir()/closedir() with a version that glues an extra field onto the end of the dir structure that's an array of dev_t/ino_t pairs we've seen so far... but really, adding a -breadth option is probably better? It prevents the btrfs bug from making find denial-of-service-attack itself, and if another program operating on the same filesystem does that... it's an obvious btrfs bug. A loop calling readdir() is never guaranteed to terminate on btrfs. That's a problem: no matter what filtering you do on the results, you're never sure you're DONE.


July 19, 2023

A recently restarted discussion wandered over onto kernel.org bugzilla (to confirm that they're going to say it's a feature), and when I tried to make an account it said I already had one, and I went "how long ago was this" and guessed a password I haven't used in at LEAST 10 years. Which was it. That's disturbing.

Then I composed a comment because I think they should have the simple standalone 15 line C program that reproduces this issue instead of knee-jerk saying "it's a toybox bug go away". Also, "this is a denial of service attack waiting to happen" seems an important point to make? Could just be me. (A program running as root traversing a userspace directory could get pinned in an endless readdir() loop by a program repeatedly renaming a file.)

As usual, my comment went on a bit long and I decided to edit out most of the more inflammatory "I assume you're going to be stubborn about this so let's explain just HOW problematic the position I expect you're defending is" and cut and paste it to my blog (standard move for me), but when I did that and hit "preview" it went "your session token has expired, log in again".

So for security reasons, bugzilla.kernel.org would not generate a preview if I took too long composing the message, but happily let me log in using a 10 year old _highly_ insecure password. That's nice.

Here's the text I removed:

It seems like other filesystems are trying to provide a snapshot of the directory at query time (with the same stuff-shows-up-and-stuff-goes-away problems as "ps"), and btrfs's readdir() is trying to be inotify as well, appending a stream of updates that happened after the open and first getdents(). (I'm guessing it does not append deletions because getdents can't return negative dentries.)

While caching entries before acting on them prevents us from doing this to ourselves (so never use btrfs on embedded systems if it's unsafe to use without extra caching), that still doesn't prevent any other process from intentionally making a readdir literally continue forever, hanging programs that don't know they need a workaround for this filesystem's behavior.

And coming up with a workaround is non-obvious, because "stop at the first dev/inode pair I've already seen" would break on a filesystem that returns entries in alphabetical order (so renaming bcd to ghi could repeat an inode before returning a zzz we hadn't seen yet, but that doesn't mean it wouldn't EVER terminate). Any method to stop before EOF is an imperfect heuristic, and continuing to EOF is never guaranteed to terminate on btrfs.

Hmmm... maybe "I've seen every previously returned dev+inode a second time" is a good btrfs loop detector heuristic? Probably not going to see any new ones at that point...

Sigh, I can see the logic of "if we make getdents() act like inotify() that's more _reliable_", but as with many security things the logic is wrong. The result is losing the guarantee that getdents() will _ever_ end, which is a livelock denial of service waiting to happen. Inotify exists, if you _want_ to make something "reliable" like that in userspace you can. And then it's YOUR job to work out how to defend the extra complexity _you_ just added against denial of service attacks like the one I just got a bug report about.


July 18, 2023

I have received an "event invitation" from the Taiwan guys (which thunderbird has somehow processed, in its "I want to grow up to be Microsoft Office" way), stating that at least the Q&A part happens via Google Meet. Ok, my phone can do that.

I 100% want to send them the videos ahead of time. Possibly post them to that youtube channel I haven't been using because youtube's gotten... Urgh, I could add a hundred more links to those three. Nobody who actually posts to Youtube seems to enjoy being there anymore.


July 17, 2023

Staring at toys/pending/hexdump.c which... there's nothing fundamentally WRONG with it, except that toybox has od.c (from posix) and xxd.c (from Elliott's personal preferences) and hexedit.c (interactive gui tool) and this is a FOURTH implementation of basically the same functionality that shares NO CODE with the other three.

It's... I mean... usually when I'm dumping stuff on the command line I use "hd" which is an alias for "hexdump -C", and if I had to pick one of od or xxd or hd I would totally go for hd. But... FOUR implementations of the same general thing? Sharing NO CODE?

Keeping it out is a bad call, putting it in is a bad call. And unifying the implementations sticks on "xxd isn't in my wheelhouse". I dunno what success looks like there because I don't use it, and that one's full of sharp edges. Although half the code is do_xxd_include() and do_xxd_reverse(), and looking at dehex() I really want to replace it with sscanf()...
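(Something like this, probably; a hypothetical replacement, dehex() being the pending code's name:)

#include <stdio.h>

// sscanf() already knows how to parse hex: read up to two digits,
// return the value or -1 on bad input.
int dehex(char *s)
{
  unsigned u;

  return (sscanf(s, "%2x", &u) == 1) ? u : -1;
}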

Sigh, printing out a hex dump is easy to do. There's not MUCH code, which is why it's hard to share. But FOUR implementations, sharing NOTHING? Ouch...


July 16, 2023

Sigh, I'm trying to set up debootstrap under mkroot, which means doing a quick and dirty rm -rf blah.img && truncate -s 8g blah.img && mke2fs -j blah.img && ./run-qemu.sh -hda blah.img but... there's a bunch of WEIRD going on here. Bash in the chroot is saying that /dev/ttyS0 is not a controlling console, but SERIAL_8250_CONSOLE is enabled in the kernel config? I'm using qemu so I can do chroot _without_ unshare. No container weirdness here, it should be a normal chroot.

And then I'm transferring in the data by running toybox netcat -s 127.0.0.1 -p 8888 -L toybox httpd . on the host, and within qemu doing wget http://10.0.2.2:8888/debootstrap.tgz -O- | tar xv which is looping saying "inflate EOF", which is just weird. (Why _looping_? How is it not _exiting_? It works fine if I fetch the file and extract it with tar -f but not when wget pipes into stdin?)

I'm trying to write this down and NOT go down tangents debugging them, but it's hard. (Define "focus". This is a prioritization problem.)


July 15, 2023

The entry on the 13th was kind of long and my diversion into the gnu "info" format was an off-topic tangent, so I moved it here. :)

The REASON the "makeinfo" command should just not exist in 2023 is that the gnu info format was a derivative of "gopher" from before html took over, meaning it DIED THIRTY YEARS AGO. The University of Minnesota announced it would charge license fees for gopher in February 1993 but CERN disclaimed ownership of www, so http:// became ubiquitous and gopher:// was strangled even though Mosaic could read from both servers. This was another "IBM lost its lawsuit against Compaq so PC clones were royalty-free, Apple won its lawsuit against Franklin so Apple II could not be cloned" all over again. (Apple begged Steve Jobs to come back in 1997 because it was _dying_.) NFS was a terrible network filesystem protocol, but Sun released it for free and the competitors like IBM's AFS were proprietary.

In 2003 the gnu documentation maintainers agreed info was obsolete and should be replaced. I know this because Eric Raymond was writing "doclifter" (a tool to parse man page output and heuristically produce docbook), and he asked them while I was sitting next to him in his home office.

I met Eric before he went crazy: he used to be a bit weird but functional. Back before my mother's cancer returned she lived in Marlton New Jersey and Eric lived in Malvern PA, about an hour's drive away. He had a gun hobby but it was like an archery hobby, really didn't come up much. (He had exactly one rifle or possibly shotgun in the basement, I never saw him fire it. He'd had some kind of pistol for use at shooting ranges before that, but it got confiscated out of his luggage by the TSA after 9/11. He was mad about that because both Terry Pratchett and Larry Wall had borrowed and used it at a shooting range adjacent to various conventions.)

I met Eric at Atlanta Linux Showcase in 1999, and again at Worldcon in 2000, after reading The Cathedral And the Bazaar and tracking down as much else of his writing as I could find. He lived an hour's drive from my relatives, so I promised the next time I was in town I'd drive by his house and drop off a copy of the movie 1776 (because my theory at the time was ESR was Ben Franklin, RMS was John Adams, and Linus Torvalds was Thomas Jefferson, at least as portrayed in that movie. I also gave RMS a copy when I visited _him_ in Boston in February 2001, and he was very upset I thought he was Adams, he thought HE was Franklin.)

Eric and I started collaborating on stuff (I was writing for The Motley Fool, he'd been stalled on The Art of Unix Programming for a while, I offered to review and help edit...) The morning of September 11, 2001 I'd spent the night on the futon in Eric and Cathy's basement and was driving home when I noticed multiple police cars driving aimlessly with their flashers on before I'd even left Malvern, and turned on the radio to hear about "the hole where the World Trade center used to be", and turned back around and knocked on Eric's door and went "I think somebody just nuked the World Trade Center" and we went back in and I tried to get Slashdot to load while he loaded The Drudge Report. (I remember cnn.com was down, whether from load or due to half the eastern seaboard's internet having gone through the WTC's basement was not immediately clear.)

I say "before Eric went crazy" a lot, but he was important to open source for years. It's like Newton spending the end of his life studying alchemy and magic, or Linus Pauling deciding massive doses of vitamin C weren't a placebo despite being measurably flushed out of the body. And of course William Shockley (who stole credit for the transistor from Bardeen and Brattain) and James Watson (who stole credit for discovering the structure of DNA from Rosalind Franklin) announced themselves as virulent racists as they got older. (Eric had at least DONE his early work.)

Eric was a Libertarian back when the Koch Bros had whole think tanks devoted to capturing and radicalizing Libertarians. The Nobel Effect plays in here a bit, and the way smart successful people Dunning-Kruger hard in adjacent areas to their speciality. (If you're good at one thing you must be "smart" as a universal trait, and thus good at everything, so how hard can everyone else's areas of earned expertise be?) Smart people are often _more_ susceptible to con artists, and depression works in here too, they do the work of seeing the faces in clouds and building stories to justify their expectations.

9/11 didn't hit me that hard. I honestly didn't see why it was a bigger deal than The Oklahoma City Bombing six years earlier. I'd grown up on Kwajalein, which was littered with World War II debris, and where they regularly tested ICBMs for eventual use against Russia or possibly China. When my family moved to New Jersey I used to hold my breath and count after planes went overhead the first year back in the states because WHAT IF THAT WAS AN INCOMING NUKE? I was 10 in a strange country with Ronald Reagan in charge, and I'd only heard the ICBMs _launch_ from Kwaj, the incoming ones from Vandenberg Air Force Base in California generally came down miles away in the lagoon. (Except that one where they tried a land impact and missed and wound up pointing the tradex at brand x, but... tangent.) Kwajalein was too small and isolated to be much of a target, although my mother had explained to my sister, when my sister got a silver necklace for christmas when I was 6, that if civilization collapsed she could trade that for food so she wouldn't have to eat our cat Namaur immediately. (The Boomers were NOT all right, even back then.) Me, I was the strange child viewing civilization like an alien anthropologist going "If Rome collapsed 2000 years ago and civilization rebuilt itself, then if we DO have a nuclear war we'll probably be back where we are now in another 2000 years, but we need to solve aging before losing our current tech level for that to matter..." (Did I mention I read The Ship Who Sang when I was 7?) Anyway, once we were back in the states Reagan was on TV all the time going on about how mustache-twirlingly evil the Russians were, and "The Day After" was on TV, and Sting was singing "if the Russians love their children too" and Star Trek's future history just _assumed_ we'd go through a nuclear war before rebuilding and managing serious space travel, and I really wasn't HAPPY with the move to New Jersey between the definite nuclear targets of New York and Philadelphia. (Also, we moved from Florida to Kwaj when I was 5, moved from Kwaj to NJ when I was 10, and did NOT move out of New Jersey when I was 15. Unfair.) Still, it was a bit of a stress relief when the berlin wall came down in high school. China could still nuke us, but since Nixon did the divide-and-conquer thing and started outright bribing them, they hadn't really wanted to. They made way too much money off of us.

In comparison, "a couple airliners got hijacked and suicide bombers took down a building"... and? Hijackings were a regular-ish occurrence in the 1970s, and Japan had done kamikaze plane attacks through World War II. Again, Kwaj was a big WWII battleground and the military housing we lived in was built after the US navy took the island during the war: while collecting shells on the reef "45s" were less common than cone shells or brownies, but more common than strawberry cowries.

Some planes hitting a building a hundred miles away was not REMOTELY an existential threat. I mean yeah, a big shame, and my brother had visited that building and had a cup from the cafe on top in the dishwasher. But they'd blown up less than 5% of one city in a country that had multiple dozens of big cities and hundreds of little ones. Hurricane Andrew had caused a much bigger swath of destruction, and hadn't somebody already tried to blow up the World Trade Center a few years earlier with a car bomb in the basement? I couldn't understand why it was such a big deal, but everybody around me was PANICKING... One more way I didn't fit in, all I could do was wait for them to work through it.

(I didn't understand that Boomers were raised on duck-and-cover rhetoric, where the country would pop like a soap bubble at the first attack by a foreign power. Even though the cold war was over, they couldn't grok that there were ways to be attacked by a foreign power that did NOT mean the immediate end of world civilization. Even today, people keep thinking that Russia's tantrums mean World War 3 if they don't get everything they ever ask for, when the most GENEROUS estimates of their current ICBM and warhead capacity are that today they MIGHT do about as much damage to the rest of the world combined as the USA did to Japan with Fat Man and Little Boy in 1945. "Two cities lose 20% of their population" is the kind of thing Ukraine is going through NOW on a regular basis (Mariupol basically no longer exists) and they're still fighting. The USA had already beaten Japan in 1945, but Truman needed something showy enough to surrender to rather than fighting to the death. The "thousands of missiles on constant alert" thing 20 years later was like the bomb shelters stocked with food: we paid rather a lot of money to maintain them and they went away again when we stopped paying for them. A dozen disgruntled junior Saudi royals never had that kind of resources: they were suicide bombers taking advantage of cockpit doors that didn't lock and passengers who expected to live if they didn't resist. If that could ever be an existential threat Israel wouldn't have lasted 5 years.)

In response to 9/11 Eric doubled down on the gun-nuttery, because he coped with the stress by writing a terrible post (on _paper_ in that restaurant with the raspberry cheesecake) about how it wouldn't have happened if everybody on every plane had guns at all times (which I didn't believe, but he was stressed and feeling helpless and needed to vent). And then Eric defended himself online from the inevitable blowback, thus doubling down more. And later his wife Cathy (a lawyer who wanted to become a judge) got involved with local GOP politics (the path to judgeship, apparently) during the Cheney administration's "Duct Tape and Plastic Sheeting" days of warrantless wiretaps and the TSA being a law unto themselves while Halliburton invaded Iraq (stealing 2 _billion_ dollars in cash along the way, as in it vanished from shipping containers in their/Blackwater's custody) and the "threat level" changed daily with blue being one of the options for some reason. (The dubyah administration had no fucking clue, but everyone would have rallied around a potted plant. Huddling together for reassurance, really. It became unpatriotic to make fun of what we had all previously agreed was a clearly incompetent idiot puppeted by Darth Vader. Everybody had to fall in line and obey, just like under McCarthyism in the 5 "red scare" years right after the soviets detonated their first nuclear bomb.)

Over the next few years Eric gradually got more brittle (hanging out in the online spaces the Drudge Report and libertarianism led him to) until my ability to collaborate with him went on indefinite hiatus around 2009. While I was launching Penguicon and helping defend IBM from SCO around 2003 we still worked together great. We finished the 64 bit transition paper in 2006 with growing but still manageable friction. We made multiple attempts at finishing the "Why C++ is not my favorite language" paper starting around 2008 but the last couple visits turned into shouting matches. (The Koch Brothers had think tanks to capture libertarians, so on my visits Eric kept showing me articles about how seeding the oceans with iron might cause huge algae blooms that would sink to the bottom and trap carbon, and I was going "whale fall is not new, stuff eats food even on the bottom of the deep bits, it'll all be back in the atmosphere in twenty years tops", and showing me his "research" about how oil might be generated in the planet's mantle and seep up towards the surface and thus essentially never run out, and I was going "I just worked a contract at Ion Geophysical in Houston where I MET the guys who came up with that lie, you are DEEP into some sort of cult nonsense here and this is Tobacco Institute levels of targeted lying which you should NOT be dumb enough to fall for"... I remember one car trip towards the end where he was telling me about the book "The Bell Curve" he was impressed by (and me going "no"), and in his office him explaining to me how eskimos could DEFINITELY all rotate 3D objects in their heads because they could get rid of waste heat more easily because living further north makes people "genetically smarter" and me calling bullshit; by that metric bald people would be measurably smarter and nobody could think while wearing a hat: I was pre-med in college, temperature is equalized by blood circulation, excessive heat loss through the head is a BUG not a feature, and heat dissipation has never been a limiting factor on THOUGHT, nobody should fall for this! Sigh. Eric's online social channels were feeding him targeted crap, because libertarian gun nuttery was identified as exploitable by the Koch Brothers' think tanks. People were ALREADY writing articles about the "libertarian to fascist pipeline", but he was too far gone. I _watched_ him get radicalized, and couldn't stop it. The same think tanks got the atheists too, as I said basically exploiting Nobel Disease. People who think they're smarter than the grifter are the easiest marks.

(One of the first times Eric got "you just accidentally spit on me" mad in 2008 was when I said "we're having a semantic argument" and he insisted that did NOT mean we're uselessly arguing about the definition of words. He had some sort of philosophical nonsense about "semantic" being the most important category of knowledge or some such, talked about a philosopher whose name I didn't recognize, and patronized me a bit for neither knowing nor caring about him. I was never convinced, I just stopped arguing. He got ANGRY about it...)

I stopped visiting Malvern after 2009, and in 2011 we stopped speaking to each other at all when he went full climate change denialist and I asked him "When did you turn into Glenn Beck?" on twitter. (He did not take that well. We didn't speak again for many years, and that was one brief phone call nominally burying the hatchet because Cathy asked me to. Eric had at least acknowledged that climate change denialism specifically he'd been wrong about, but his libertarianism led him down other right wing loon paths...)

Anyway, I miss my friend, I'm sorry he went crazy. But the point is, back in 2003 even the GNU maintainers admitted that the "info" documentation format was toast, so still using it 20 years later is just SAD.

Admittedly the gnu devs' "we'll try doclifter when it's ready" statement is a bit like the kernel guys saying they'd move to Eric's cml2 instead of the kconfig rewrite that wound up happening instead, because Eric wrote cml2 in Python (not previously a kernel build dependency) and it took 20 seconds to open because he was doing some horrible analysis thing that made sense to a Lisp programmer, and he refused to cache the results because "shipping generated data was wrong". Meanwhile, the kconfig rewrite had blah.c_shipped files so you didn't need lex or yacc on your build machine.

But Eric being stubborn wasn't the fundamental problem: Doclifter mostly failed because docbook was pointless, something I argued with Eric about at the time, or at least while we were editing The Art of Unix Programming, which was also written in Docbook. If there are no visual editors for your format, and thus you MUST edit the tags by hand in a text editor, the format is of limited use. The distinction between "semantic" and "presentation" means this is a PROGRAMMING LANGUAGE, not a document authoring format, and you'll never get tech writers to touch it. Even microsoft word had "show edits" and such to make the invisible visible and thus GUI editable. Saying "there's a bunch of semantic markup that doesn't translate to presentation layer" is ivory tower academic bullshit, it's useless in real life. What little oxygen docbook had eventually went to wiki markup, because you could tab back and forth between editing and display.

But still, they could have converted info to anything else. It failed as a "standard", nothing outside the FSF ever picked it up, and the FSF sticking with gopher-based info in 2023 is like still publishing in EBCDIC. The Gopher core devs realized they'd lost in 1993, Web traffic passed gopher traffic in 1994, and the University of Minnesota where it was developed (and named after their sportsball mascot) disbanded the gopher programming team in 1995 (retasking the developers to develop a web-based accounting software package instead). Firefox (generally a trailing indicator) dropped support for Gopher with the 4.0 release in 2011 (10 years after Internet Explorer 6 dropped it). Chrome never had it. Info can STOP NOW. I'm pretty sure I posted a very similar rant about this at least a dozen years ago because it's a ZOMBIE...

So yeah, info sucks. Don't install it, and rip it out of any configure file that can't cleanly drop it out. Richard Stallman went crazy (along a different axis) about 10 years _before_ Eric did, and the FSF Bill Cosbying him back into the fold so it's still _his_ call to stick with this rejected data format he'd created or move to _anything_ else? Sigh.


July 14, 2023

I'm rereading the posix dd spec, and it was clearly NOT written by someone who had ever tried to implement it. It talks at length about what to do about short reads but never even considers the possibility of short WRITES. A bunch of rules for figuring out what command line arguments and input block sizes result in which output block sizes, including a 6 step list starting "The processing order shall be as follows"... but writes are assumed to complete, block, go into a kernel buffer... never a short write.
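The spec never asks for it, but an actual implementation needs a loop like this on the write side (a from-memory sketch of the usual writeall() shape, not a quote of the one toybox keeps in lib/), because write() to a pipe or socket can legitimately return less than you asked for:

#include <errno.h>
#include <unistd.h>

// Keep calling write() until the whole block is out or a real error.
ssize_t writeall(int fd, void *buf, size_t len)
{
  size_t done = 0;

  while (done < len) {
    ssize_t ret = write(fd, (char *)buf+done, len-done);

    if (ret < 0) {
      if (errno == EINTR) continue; // interrupted by a signal: retry
      return -1;                    // actual write error
    }
    done += ret;                    // short write: loop for the rest
  }

  return done;
}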

The section saying what goes to stderr says a count of whole and partial input and output blocks, and also if there are any truncated blocks a line about those too. Where truncated only applies to input blocks, not output blocks, and not the "short read" kind of truncated but instead their stupid "conv=block" feature. You have "conv=noerror" but the summary does NOT say how many read or write errors were encountered and seeked past? Really?

And of course even the 2018 version (I.E. what's live on the website today) has an ASCII to EBCDIC conversion chart, despite that already being irrelevant back in the 1990s. And in the RATIONALE section it says "a failed read on a regular file generally does not increment the file offset, and dd must then seek past the block on which the error occurred; otherwise the input error occurs repetitively. When the input is a magnetic tape, however, the tape normally has passed the block containing the error... and thus no seek is necessary." So it's commemorating an ancient unix kernel bug from 30 years ago, and noting that the driver for a piece of hardware that was already obsolete 20 years ago behaved differently. And another footnote in RATIONALE says that EBCDIC doesn't have the [ and ] characters (without which you can't use any modern programming language) so they fudged it.

Sigh. The whole "is dd actually reblocking" question... while it's what the tool was originally FOR, I'm not sure anybody actually _uses_ it for that? It's "read this amount of data at this offset from this source, and write it at that offset to that destination". Any transforms like toupper can be done by some other filter in a pipeline. Back in the day they micromanaged the block size, and that's great, but it's really HARD to care in modern contexts because the 40th anniversary of the publication of Nagle's algorithm is next year.

I've been grinding away at trying to come up with a test suite that can detect the transaction granularity, but really: if we read and write the correct data at the correct offsets? Probably good enough.


July 13, 2023

Sigh, not sure why I thought LFS 12 was out? Probably clicked on the "unstable" link on the web by accident. Cloned the git repo (which says 404 in a browser but clones from the command line; so friendly) in hopes of having an easier time keeping track. In the meantime: yay 11.3.

You know, technically, to get the LFS 11.3 build working all I actually NEED is the "chapter 5" build. As in if a mkroot chroot can run the script I already wrote, then everything else happens in the new chroot and THAT means that if toybox can provide the commands that record-commands says the script I already wrote is calling, we're ready for at least the naive LFS bootstrapping. Hmmm...

There's still a bit of glue layer though: Chapter 7 starts by running crap as root in the host system, starting with a "chown -R root:root lfs" (except in an elaborate overcomplicated way where it does every individual subdirectory instead of just cleanly recursing), and then mkdir lfs/{dev,proc,sys,run} which is STUPID to put there, then it mounts dev, dev/pts, proc, sysfs, tmpfs, and dev/shm (which is slightly awkward because there's no "mount" command in the chroot yet). And THEN it does more or less chroot lfs env -i HOME=/root TERM="$TERM" PATH=/bin:/sbin /bin/bash --login.

Three of those four things require root access to do, so it kinda makes sense to put them here, although I either want to shoehorn my unshare -Cimnpuf layer in there and run as fake root inside the chroot, or more likely boot QEMU and run as root inside the emulator.

But only three of the four, and I really REALLY want to clean that fourth thing up: the mkdir in the middle could be part of the initial mkdir. And having it come AFTER the chown... so the mountpoints belong to the host user? Going out of your WAY to do that? Why?

And of course, historically the glibc build was an abomination requiring both perl AND python as hard build prerequisites. I substituted uClibc back in the day, and can presumably use musl instead now. But let's get what's there working first.

Ok, I re-ran ch5.sh under record-commands and the new list of commands it called are:

aclocal-1.16 ar as autoconf autoheader autom4te automake-1.16 awk basename bash bison cat cc c++filt chmod cmp cp cut date dd diff dirname echo egrep env expand expect expr fgrep file find flex g++ gawk gcc getconf git gnat gnatgcc gnatmake gnatprep grep gzip head hostname id install ld ldd ln ls m4 make makeinfo mkdir mktemp msgfmt msgmerge mt mv nm nproc objcopy objdump od paste patch perl pkg-config pod2man print python python3 ranlib readelf realpath rm rmdir sed sh sleep sort strip tail tar test touch tput tr true tty uname uniq wc which x86_64-pc-linux-gnu-pkg-config xargs xgettext xmlcatalog xz

Which isn't even necessarily the full list, because PATH=newtools:$PATH means it might have built stuff that got called later: commands toybox COULD supply, but which the build compiled its own version of before ever calling them. Ideally I'd like the toybox host binaries to be able to build as many packages as possible, and thus be able to build ncurses or similar WITHOUT building coreutils first, so I want the build to KEEP using the toybox versions when they are available, thus installing the new stuff AFTER toybox in the $PATH. (Ala PATH=$PATH:newtools instead.) But let's start with the low hanging fruit first.

It's likely at least some of the above crap got called by autoconf to see if it was there, but it would have happily worked without it. There is NO REASON anything in 2023 should be calling "mt" (the magnetic tape control command), for example.

Ok, which binaries from this list does defconfig toybox already provide:

basename bash cat chmod cmp cp cut date dirname echo egrep env expand fgrep file find getconf grep head hostname id install ln ls mkdir mktemp mv nproc od paste patch readelf realpath rm rmdir sed sh sleep sort tail tar test touch true tty uname uniq wc which xargs

And which does the toolchain provide:

ar as cc c++filt g++ gcc ld nm objcopy objdump ranlib strip

Plus the stuff already in the toybox roadmap (if not pending) is:

awk bison dd diff expr flex gawk git gzip m4 make tr xz

Which leaves:

aclocal-1.16 autoconf autoheader autom4te automake-1.16 expect gnat gnatgcc gnatmake gnatprep ldd makeinfo msgfmt msgmerge mt perl pkg-config pod2man print python python3 tput x86_64-pc-linux-gnu-pkg-config xgettext xmlcatalog

Ok, for i in $FILES; do dpkg-query -S $(readlink -f $(which $i)); done | sort and then a slight manual cleanup:

autoconf: autoconf autoheader autom4te
automake: aclocal-1.16 automake-1.16
cpio: mt-gnu
expect: expect
gcc-8: x86_64-linux-gnu-gcc-8
gettext: msgfmt msgmerge xgettext
gnat-8: x86_64-linux-gnu-{gnat,gnatmake,gnatprep}-8
libc-bin: ldd
libxml2-utils: xmlcatalog
mime-support: run-mailcap
ncurses-bin: tput
perl-base: perl
perl: pod2man
pkg-config: pkg-config x86_64-pc-linux-gnu-pkg-config
python2.7-minimal: python2.7
python3.7-minimal: python3.7
texinfo: texi2any

Sigh, I try not to have autoconf and automake installed on my laptop, I'm guessing that's left over debris from trying to get some gnu/dammit package to compile from a random git snapshot. I should set up and run this in a clean debootstrap, but for the moment assuming those do drop out (they USED to)...

The only calls to mt (the magnetic tape control utility, from back in the days when big iron computers had tape reels on the front, and presumably flashing lights and made beep-boop noises and a lot of relay clicks before speaking flatly in Majel Barrett's voice) in the log are all "mt -?" with no follow-up, so I'm guessing this is historical gnu configure debris checking to see if something is there and then never caring. What packages do that: egrep '^tar "xvf"|^mt ' log.txt | less says mpc, file, gawk, and xz. Ugly. But it's autoconf, so that goes without saying.

I boggled a bit at "expect" because grep '^expect ' log.txt produced zero hits but awk '{print $1}' log.txt | sort -u was finding it, but it turns out expect was called with no arguments (so the log line had no space after it, just an immediate newline). It's only checked for (not used) by binutils, which can presumably get along fine without it.

The downside of the readlink -f is a few things renamed themselves, but without that dpkg-query -S isn't smart enough to find the packages. The gcc-8 thing is actually what "gnatgcc" redirects to, so this isn't host toolchain leakage, it's gnat leakage. (Not that we were using a cross compiler anyway, but still.)

Presumably if gettext isn't installed it'll drop out? Back in aboriginal I had a gettext-stub library that would A) symlink msgfmt to true, B) provide a stub libintl.h and libintl.c that did as little as possible, lots of #defining things to NULL and functions doing return msgid; or return "C"; but right NOW the question is will this work if this package just ISN'T installed? I remember having to patch binutils back in the day. (Not a big patch, just... they never regression tested the not-installed path so they asked a question and then couldn't handle one of the answers for stupid typo reasons. Which persisted for at least 5 years because nobody wanted to talk to gnu zealots and everyone just patched it locally.)

The "gnat" stuff is because I still have a package installed for building the ASIC hardware toolchain out of ghdl and yosys and such. It's one of those obscure gcc "compiler collection" things like fortran or gcc's built-in java support, what language... it's gcc's ADA compiler. Which is a horrible overcomplicated language the US Navy bloated into uselessness back in the 1980s (back in college Rutgers had a class in it in the catalog, but wasn't actually offering it, so I asked a college advisor why and got an earful), and for some reason GHDL was implemented in it. (I think because VHDL the language is an ADA derivative or some such? Which makes as much sense as saying C and Python use "algol syntax", which is technically true but the last Algol standard in 1968 was presented with a report from the committee saying they already considered the language a failure and were going to stop now. C carried the banner from 1972 onwards and other languages have C-like syntax, the fossil ancestor is effectively extinct.) Anyway, gnat almost certainly drops out cleanly if it's not installed because 99% of the userbase won't have it. (And ghdl being written in ADA is the main reason ghdl isn't more widely used, which alas drags VHDL down with it.)

HA! THE BUILD IS CALLING LDD. To do what, exactly? Grep of log.txt says that "ldd --version" is called 3 times (twice by the "file" build, once by the "patch" build). Yet more useless autoconf shenanigans, not actually used for anything.

The gcc and mpc builds are calling xmlcatalog "" "http://docbook.sourceforge.net/release/xsl-ns/current/" which does not need to happen.

Another readlink -f renaming head scratcher: run-mailcap is what "print" is symlinked to, and... there are only 2 calls to that in the log (repeated several times though), and both of them produce a usage: message and several "error:" lines. Calls like print -r -- -n are nonsense, neither -r nor -n are recognized options to the mime-support "print" command. And then one has a zillion backslashes as an argument, but it doesn't produce output to stdout (only stderr) because again, unrecognized options so error message instead. This one's happening in over half the package builds. (You can sing "autoconf is useless" to "every sperm is sacred".)

I kinda have tput in the toybox roadmap except that it's got as many options as stty and I dunno what's _relevant_, but in this case the build is calling:

$ grep '^tput ' log.txt | sort -u
tput "bold"
tput "setaf" "1"
tput "setaf" "2"
tput "setaf" "4"
tput "setaf" "5"
tput "sgr0"
tput "smso"

Oh goddess, who's calling perl. Is it still just glibc being that stupid? No, lots of stuff is (et tu, binutils?) but almost entirely just calls to texi2pod for documentation generation, so I can probably either not install perl or --disable-docs somehow and get it to not. And this is related to pod2man as well.

Lots of calls to pkg-config: binutils is looking for libdebuginfod, libzstd, and msgpack, grep is looking for libpcre2, and make is just checking if the tool itself exists but not actually using it.

Of course it's using both python 2 _and_ 3. Findutils is using "python" unprefixed (and getting python 2), in some sort of sysconfig replacement with comments like "Can't use sysconfig in CPython 2.7, since it's broken in virtualenvs", which is not so much an "I can't even" as an "I strongly believe I shouldn't". And then glibc is using python3 and running whatever scripts/gen-as-const.py is rather a lot of times (some sort of compiler wrapper?) plus gen-translit.py and dso-ordering-test.py and gen-libm-test.py. As I said: replace with musl.

And the "texi2any" nonsense is actually a symlink from "makeinfo", which is a command that should just not exist in 2023. It's in the "mt" bucket, the gnu info format was a derivative of "gopher" before html took over, meaning it DIED THIRTY YEARS AGO. (I'm pretty sure I ranted about how obsolete it was already at least a dozen years ago?) So yeah, info sucks. Don't install it and rip it out of any configure file that can't cleanly drop it out..


July 12, 2023

It's very easy to sit down and open a new can of worms, and cover the room with slush pile scribbling about ideas and implementation... vs the slow tedious heavy lift to finish and clean and package and test and document it so it's DONE. I have SO many open cans of worms spilling into each other, which I've been hammering on Zeno's paradox style to close off and check in for WEEKS... They self-select, because the ones that are easy to finish get finished, and the ones that are hard to finish accumulate unfinished.

The Taiwan guys replied that the pandemic "trained us to adapt" for remote talks, so I need to record... let's see, half hour timeslot which is 9:30-10:00 am on the 29th there, which I think is 8:30 pm on the 28th my time? So, half an hour of material, allocate that with a ten minute prerecorded talk on using mkroot, then stop for 5 minutes live Q&A via zoom variant du jour, then ten more minutes explaining the implementation of mkroot, and 5 minutes left for more questions?

So, two ten minute videos on mkroot. Time to make some bullet point lists and hammer them into outlines. And of COURSE I'm doing the "I should clean that up, I should change this part" dance. Trying to document stuff always results in the desire to simplify away the bits I don't want to explain to someone who doesn't already know them. Which is a net positive (by rubber ducking at a theoretical audience I've found more things to fix), but also a tangent from a tangent...

Ah right. Trying to explain this goes "If you run mkroot with no arguments it builds a broken binary because we statically linked against glibc, which can't do things like DNS lookups because glibc sucks. Copying the dynamic libraries out of the toolchain takes 1.7 gigabytes on my laptop, and although I long ago had a trick to run ldd recursively against the binaries and libraries I copied, Elliott Hughes of Google strongly objected to me adding ldd to toybox for reasons I'm still unclear about, so you can't do that in an airlock build. So you pretty much HAVE to use the provided cross compilers for this to be at all useful."

I can of course explain the clear path, where I carefully do not walk across any land mines. Which is sort of cheating. I.E. ONLY show them using a cross compiler (or building on a distro like alpine that has a musl host library), and hand-wave away how profoundly glibc sucks.

Or I could just ignore Elliott and implement the simple tool I need to make this work, and he can leave it out of Android's config. (I note that the Android NDK does not contain ldd either, so it's not like I can use the version out of the toolchain that he insists provides the "right" functionality, because it's NOT THERE. He argued at length that does-not-exist trumps "good enough", and I am so tired.)

Sigh. I can also run sed against the readelf output and then do elaborate path shenanigans... which is way too big to put in mkroot proper but has to happen _AFTER_ the toybox build and all the package builds. In _theory_ toybox just links against libc and the dynamic linker which you can get by building hello world, but in PRACTICE when that's glibc toybox pulls in libcrypt.so.1, libm.so.6, and libresolv.so.2. And even THAT doesn't tell me what magic dlopen() crap it needs for the DNS resolver and so on, this is "run it under strace and see what gets opened" territory, with the question of what codepaths get exercised in your test...

Blah, how am I supposed to programmatically find the dlopen libraries of glibc? Hardwire knowledge of glibc into the "dynamic" harvester script, I guess. There's a reason I haven't done this before now, but I need to explain it to a new audience in a couple weeks, and I would like the explanation to make SENSE and not have large holes. The problem is, the design of glibc makes no sense and has large holes.


July 11, 2023

Sigh, I wimped out of attending the taiwan conference. When they initially contacted me I was in Japan and totally expected to be back regularly, and Japan to Taiwan and back is a day trip so a two day conference: not a problem. And having it be at the start or end of a multi-week visit to Japan, also not a problem. The international flight was amortized over a long stay in the area.

But I wasn't really in control of that schedule, and eventually plans changed so I wasn't going back to Japan for work at all, which meant I was now amortizing an international round trip against a 2 day conference where I knew nobody and was only scheduled to give one half hour talk. I still tried to make it work, but with the FASTEST (not cheapest) option each way being something like "22 hour travel time, including a 3 hour layover in San Francisco and then 6 hours in Los Angeles"... add in travel to/from the airport at each end and we're talking a day and a half of travel before and after, so at least a 5 day commitment to give a half hour talk, and "don't stay for the whole conference" would be _less_ incentive to go...

And of course the longer I delayed making an uncomfortable decision the fewer options were available and the more the price went up... trend line did not improve. I feel bad about this, but it was one of the multiple "things looming at me" that have been piling up recently, that tend to pile up when my executive function gets overwhelmed. (Bit like a large log blocking a river, and lots of little things plugging up the gaps.)


July 10, 2023

The Linux From Scratch automation script I started was LFS release 11 and LFS 12 is out now, so I should probably redo it. Back under aboriginal I stuck with an old version and got waaaaaay behind, until updating it was a heavy lift. Although part of that was accumulated version skew from sticking with the last GPLv2 toolchain releases until they got ancient, and the new stuff needing more and more patches to build with old tools, especially after C11 came out and packages started depending on it.

But I don't want to redo my recent LFS build stuff yet, I want to continue through to the end, because a partial script isn't very useful to me. I need TESTS. What I need from this exercise is a reproducible Linux From Scratch build that has some obvious success/failure indicator, and "rebuilds itself under itself twice, and I get a shell prompt from the second one" is the obvious smoke test. It worked, and what it made also worked. "The build didn't break" isn't the same as "the build WORKED", and "the first build works well enough to complete a second build" is more or less my definition of success. (I _could_ do it a third time to make sure the rebuilt-under-itself one works properly, but it's one of them iterative 80/20 things: 80% vs 96% vs 99.2%. While I have seen bugs that only occurred on the third pass, that happened like twice over ten years, and both times aboriginal's users emailed me about it.)

Anyway, once I've got such a successful test, with the record-commands log of which commands the $PATH needed to have in them (and either the chroot or the second build should catch anything it calls via /absolute/path), I could swap in toybox commands one at a time and see how the build differs. Especially a single-processor build if I can compare the log output (both stdout and any config.log files) to detect non-obvious decision differences during the build. (Whether the new packages get installed before or after toybox in the $PATH is editable after the fact. The best test coverage for toybox is "PATH=/toybox:$PATH" and the lowest hanging make-it-work fruit is "PATH=$PATH:/toybox".)

But I can't start going "now try my sed!" until I've got an automated run-to-completion test. Because otherwise all I can say is "the build didn't break", which is less helpful. And wandering away for a couple months and then having to redo what I've already done rather than continuing down the path from where I left off is... typical, really. Frustrating, but typical.

Tardis envy...


July 9, 2023

Circled back to toysh and the function call lifetime stuff that's blocking the "return" builtin: looks like sh_fcall->delete only has two users, which are set_main() updating the $@ command line argument list (same lifetime as the function context, so deleting that is when you'd delete any memory allocated to hold those values), and then when run_command() does a function call it transplants the deletion list from struct sh_process to sh_fcall... and I don't remember why I did that? I know a function call's sh_process struct isn't entirely real (the pid field is zero), but it's got one with the same lifetime as anything else in a pipe? I suppose the function context gets popped a little earlier (when we exit the function, not when job control waits for the child process to calculate the pipeline exit code), but why would that be _important_?

Sigh, I did work in a branch to automatically subshell pipeline elements that needed it, because otherwise for i in {1..100000}; do echo $i; done | while read i; do echo $i; done would fill up the pipe buffer and hang. Part of the reason I need to block out the world and focus on the shell for a bit again is my mental model of what it's doing has lost track of what got CHECKED IN and what was only sketched out and worked on in an unfinished branch.

The QUESTION I was trying to answer was whether end_fcall() should bail out refusing to pop the root context before or after running the deletion list, and the answer is "before" because only the set_main() case applies to the root context (and you don't want to free the command line arguments while still using them). The run_command() case is always operating on a freshly added function context which CAN'T be the root context where the global variables live, so updating global variables in a loop outside of any function can't accumulate debris and fill up memory. (I'd be surprised if I got that wrong, but wanted to be sure.)
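In code terms the conclusion looks something like this (the structures here are minimal stand-ins, not the real toysh declarations):

#include <stdlib.h>

// Stand-in shapes: the real sh_fcall has a lot more in it.
struct delnode { struct delnode *next; void *data; };
struct sh_fcall {
  struct sh_fcall *next, *prev;
  struct delnode *delete;
};
struct { struct sh_fcall *ff; } TT;

// Pop the current function context: the root test comes FIRST, so the
// root context's deletion list (the set_main() command line strings,
// which live as long as the shell) never gets run.
int end_fcall(void)
{
  struct sh_fcall *ff = TT.ff;
  struct delnode *dd;

  if (ff == ff->next) return 1;  // sole entry = root context: refuse

  // Anything here is function-local by construction: run_command()
  // contexts are never the root, so freeing now can't hurt globals.
  while ((dd = ff->delete)) {
    ff->delete = dd->next;
    free(dd->data);
    free(dd);
  }
  ff->prev->next = ff->next;     // unlink and advance to the caller
  ff->next->prev = ff->prev;
  TT.ff = ff->next;
  free(ff);

  return 0;
}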


July 8, 2023

Bleurgh, I have some kind of lurgy. My sleep schedule is completely unintelligible, and when I am up I'm too tired to do anything.


July 7, 2023

I did a cleanup pass on i2cdetect but don't have a test environment for it, and setting up a raspberry pi has always been a flaming pain. (Poked at it again, but vanilla linux kernel doesn't have a defconfig for the chipset, and I'm not very interested in building an out-of-tree fork that's stayed out of tree for over 10 years.)

But there's a web page claiming to set up an I2C temperature sensor for qemu. So I built mkroot's x86-64 target and ./run-qemu.sh -device tmp105,id=sensor,address=0x50 which made qemu complain 'tmp105' is not a valid device model name.

The web page says they built qemu specially, doing echo CONFIG_TMP105=y >> default-configs/i386-softmmu.mak which doesn't work on current qemu because there's no default-configs directory. Grepping for TMP105 brings it up in a bunch of places, one of which is hw/sensor/Kconfig which says this depends on I2C already being set... what boards is that already set for? And there's kconfig in qemu? There's no make kconfig... docs/devel/kconfig.rst agrees there's no UI for it, you manually modify symbols under default-configs. There IS no default-configs. Alright, git log --stat and search for... Yup, two years ago commit 812b31d3f914 claimed to "rename default-configs to configs" (although the --stat output doesn't agree that those filenames actually got modified) and of COURSE the bureaucracy maintaining this didn't update the documentation.

Alright, the "configs" directory has two subdirectories under it: targets and devices. The "devices" directory has *-softmmu subdirectories, and all but one of those only contains a single "default.mak" file. (The magic special aarrcchh6644 target that's licensed differently than the other targets in TCG because it's magic and special has _two_ files here, "default.mak" and "minimal.mak". You couldn't have used the kconfig plumbing to if out a chunk of the file, could you? No? That would be too obvious for the punched card types that took over maintenance of this when Fabrice Bellard fled the encroaching bureaucracy...

Ahem, so under the two gratuitous extra levels of directory, configs/devices/x86_64-softmmu/default.mak is one line including ../i386-softmmu/default.mak because of course. And THAT has a bunch of commented out CONFIG_SOMEDEV=n lines (sigh, that's not how kconfig works) with a comment that you can uncomment them "to disable these optional devices". (Why can't you do this from the qemu command line?)

What does setting CONFIG_ISAPC actually do, grep -rl for it and... the only file using it other than this one is hw/i386/pc_piix.c which is using it in two #ifdefs to chop out two functions: pc_init_isa() and isapc_machine_options(). The rest of the file is still compiled, because of course.

So they use kconfig in a way that people already familiar with kconfig can get no information from. My earlier question of "where is CONFIG_I2C set" has no obvious answer so far. Can I just glue the symbol the gist page says to the i386 default.mak and have it work? Where _is_ CONFIG_I2C set, let's grep for it and... there are 281 hits under roms/u-boot and 7 hits _not_ under there. Bra fscking vo. I remember the YEARS that u-boot wouldn't run under qemu because Wolfgang Denk unconditionally refused to make dram init configurable. Then Wolfgang died. Often the way such refusals end...

There's something called "meson.build" that sucks this in but I don't want to run a meson, whatever that is. And it's also in build/mips-softmmu-config-devices.h... Ah, meson is yet another build system. So you type "make" to build with ninja using a ./configure shell script that runs python 3, and now there's another layer called meson. And I'm trying to trace through their highly nonstandard use of some subset of kconfig. I doubt anybody still working on qemu understands half the layers, they're just cargo cult piling up black boxes.

I'm just trying to emulate an i2c test environment. This is like two chips and a couple wires on a bread board. The whole POINT of i2c is it's very simple, that's why it exists. It's a mildly structured serial protocol. QEMU went to the trouble of implementing it, and adding a command line add-a-device syntax, and then won't actually DO it. The qemu-system-x86_64 binary that DOESN'T include support for this is 18 megabytes and ldd | wc says it links against 69 shared libraries, but supporting i2c by default would just be silly.

Hmmm, I googled for "qemu configure enable all devices" and... this doesn't seem to have occurred to anyone in the qemu community? The first "device emulation" documentation hit (on gitlab.io) ends with a 14 link "emulated devices" list that does not include i2c, but does include "CAN Bus Emulation Support", "Network emulation", and "Sparc32 keyboard". (Um, pick a level?)

Ooh, another hit near the end of the page (or at least just before the "people also search for" google advertising bar) says I can type "qemu -device help" which doesn't work because they removed the "qemu" link, but qemu-system-x86_64 -device help does indeed provide a long list of... fairly useless information. Half of it is CPU version strings, and adding -M isapc does not change the list in any way so it's not a list of devices available in a given context. BUT grepping that output for i2c produces 4 hits, two of which are on bus i2c-bus! So i2c-ddd and smbus-ipmi. More stuff to google, but I'm tired just now...


July 6, 2023

I have renamed call_function() to new_fcall() because I need to be better with the function vs fcall distinction: sh_function is a function definition (created by encountering name() { body; } and storing code that CAN be called, with its own name and sh_pipeline list, in a big global namespace), and fcall is a running call from one part of the script to another part of the script, with local variables and a blockstack and so on to keep track of where we are in loops and such right now, which lets us know where to go back to when this call ends. If you call a function recursively, one sh_function can instantiate multiple instances of sh_fcall.
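Roughly, in struct form (field names reconstructed from this description and the entries around it, not the actual toysh declarations):

struct sh_arg { int c; char **v; };  // split-up command line words

struct sh_function {         // a definition: code that CAN be called
  char *name;
  struct sh_pipeline *pipeline;  // the stored body
  int refcount;                  // live calls sharing this body
};

struct sh_fcall {            // a running call: where we ARE right now
  struct sh_fcall *next, *prev;  // doubly linked list, TT.ff = current
  struct sh_pipeline *pl;        // pipeline cursor into the body
  struct sh_blockstack *blk;     // nested if/while/for housekeeping
  struct sh_vars *vars;          // local variables for this call
  struct sh_arg arg;             // the $1, $2... this call received
  struct sh_function *func;      // which definition we're an instance of
};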

So run_subshell() uses an anonymous function context for the fork() path, because the child process should exit when hitting the end of the current parenthetical block, and should not be allowed to return or break or anything out of it. The only reason we don't rip up and free the existing TT.ff list in the child is we retain the local variable context in the subshell, so we instead cap the list with a hard stop marker so we never return past it and redundantly execute the same shell code in the parent _and_ the child.

What DOES happen when you return from a parenthetical in a function call:

$ x() { (echo hello; return); echo two; }; x; echo three
hello
two
three
$ x() { { echo hello; return;}; echo two; }; x; echo three
hello
three
$ (echo one; return;); echo two
one
bash: return: can only `return' from a function or sourced script
two

Ok. So in a subshell, return detects that there's an enclosing function/source context OUTSIDE of the subshell (inherited from the parent process) which it could return from, but still exits the child process rather than running code outside the subshell. So THEY didn't clear their function stack either. :P

(It's gotta be some kind of cap entry, because subshells can nest. You can't have a global "we are in a subshell" indicator, it has to indicate a position on the stack. I suppose it could be a pointer to a stack entry instead of a type of entry ON the stack, but I don't see how that's an improvement. Still needs its own blockstack so you don't "continue" outside the lines either.)

The next caller of new_fcall() is run_command(), which is NOT an anonymous function call. This is fully nonymous, an actual call to a named function from which we can "return". It does an addvar(0, TT.ff) to indicate this, initializing the local variable stack. (It doesn't add a variable to it, but it _exists_ is the point so you CAN add local variables to it later.)

Next up sh_main() calls new_fcall() to create the initial function context, but that one's magic. As with PID 1 and initramfs in the kernel, it's never not there and doesn't quite have the same properties as later ones. It's neither anonymous nor nonymous: you can't return from it, but can't reach past it either: you get an error message if you try.

Next up eval_main() calls new_fcall() but that one gets reached past: we're not even in a subshell, return works immediately. This one is as transparent as possible, but still stops running at the end so you can free the FILE * passed into do_source().

And finally source_main() calls new_fcall() which is a hybrid between run_command() and eval_main(): return acts like a function popping specifically _this_ function context (not drilling past it), but we recursed into do_source() and need to return from that to close the filehandle and such.

All five users want some form of cleanup, but run_command has the cleanup happen inside the run_lines() loop, specifically the context pop checks if ff->func is set and calls free_function() on it, which does the reference counting. While I could add an fp field to TT.ff so it also did its own cleanup, do_source() is iterating through input lines and handling line continuation requests, and we need a signal to break out of that and return from it. I suppose this COULD all be one big line handling loop in sh_main() (which would be easier on nommu stacks), but it's not currently written that way...

But run_subshell() wants isolation, run_command() wants command() semantics, sh_main() wants an init task, eval_main() wants transparency (not just for return but for break, and of course you can eval 'eval "echo hello"'), and source_main() wants command semantics with cleanup.

$ for i in a b c; do eval 'if [ $i == b ]; then break; fi'; done; echo $i
b

Ok, (subshell), command(), main, eval, source: 1) eval and subshell are transparent to return, main errors, command and source are returned to, 2) eval and subshell are transparent to break/continue (when it digs its way out of the blockstack; subshell will abort when it pops the transparent context so diddling with a child process's forked copy of the function list is harmless), 3) everything except command causes run_lines() to exit so the caller can perform cleanup, 4) local variables are command() only.

Returning drills down through "transparent" contexts to the next command or source context, erroring if it hits main. But this error is a normal function failure return with no other effect on flow control:

$ echo hello; return; echo $?
hello
bash: return: can only `return' from a function or sourced script
1

When return DOES find a target context to return from, it pops all but the last blockstack in each context (including the transparent ones it drilled past) and sets ff->pl to NULL.
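So the return builtin sketches out something like this (stand-in structures again, with a hypothetical "transparent" flag standing in for however eval and subshell cap contexts actually get marked):

#include <stdio.h>
#include <stdlib.h>

struct sh_blockstack { struct sh_blockstack *next; };
struct sh_pipeline;
struct sh_fcall {
  struct sh_fcall *next, *prev;
  struct sh_pipeline *pl;     // pipeline cursor
  struct sh_blockstack *blk;  // flow control stack, never empty
  int transparent;            // hypothetical eval/subshell-cap marker
};
struct { struct sh_fcall *ff; } TT;

int builtin_return(void)
{
  struct sh_fcall *ff, *target = 0;

  // Drill toward the root (TT.ff->prev is always the root context)
  // looking for a function or source context we can return from.
  for (ff = TT.ff; ff != TT.ff->prev; ff = ff->next)
    if (!ff->transparent) { target = ff; break; }

  // Hit the root: normal builtin failure, no effect on flow control.
  if (!target) {
    fprintf(stderr, "can only `return' from a function or sourced script\n");
    return 1;
  }

  // Unwind each context down to the target: pop all but the base
  // blockstack entry, and NULL the pipeline cursor so run_lines()
  // sees "ran off the end" and the C callers get to do their cleanup.
  for (ff = TT.ff;; ff = ff->next) {
    while (ff->blk->next) {
      struct sh_blockstack *bb = ff->blk;

      ff->blk = bb->next;
      free(bb);
    }
    ff->pl = 0;
    if (ff == target) break;
  }

  return 0;
}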

In order to make "break" and "continue" work in eval context I basically need to give them a similar logic to return, which implies they should also be shell builtins rather than keywords like if/else interpreted within run_lines() as they are now. So those need to move, and to leave empty transparent contexts as necessary to get popped and trigger run_lines() to return so eval can clean up.

I THINK that's right?

I'm tempted to try to restructure things so do_source() just queues up work and then there's a loop in sh_main() that reads the next line from the current fd in the current sh_fcall(), but that's major design surgery and I'm just trying to get "return" in...

[Editorial note: I continued filling out this day's description until the 9th because I wanted it together in one place.]


July 5, 2023

So what function contexts in toysh do is tell the run_lines() plumbing about discontinuities in the input script. Right, back up:

Running anything starts in do_source(), which takes a FILE * argument saying where the lines of shell script come from. The heart of that function is a loop calling get_next_line(), parse_line(), and run_lines() as appropriate. (With signalling back and forth about line continuations.)

At some point I need to teach get_next_line() to do command history and editing, but right now it just outputs a prompt (do_prompt()) and does... not even a getline(), a getc()/realloc() loop, because signal handling.

The function parse_line() takes one char *line at a time and assembles a doubly linked list of struct sh_pipeline (ala "pipeline segments"). It returns 1 if it needs another line to complete the current thought, such as unterminated quotes or an if statement without a corresponding then/fi, and 0 if the resulting sh_pipeline list is runnable as-is (and -1 if there was a syntax error). Behind the scenes parse_line() calls parse_word() a lot (which returns the length of the next token in bytes, handling quoting and such, and itself returning 0 if we need another line to finish), and handles the results with a big if/else staircase. At the start, parse_line() glues the new line to what's left of the previous line if this _is_ a line continuation.

Each struct sh_pipeline contains an int type field and a struct sh_arg {int c; char **v} with the split-up command line. (The arg->v[arg->c] entry is NULL if the statement ended with a newline or semicolon, but if it's | or && or something that string is saved as the terminating entry so the pipeline segments can be appropriately stitched together when run.) The int type indicates what _kind_ of statement this is: zero for a normal command you fork and exec, 1 for the start of a flow control block (if/for/while), 2 for the "gearshift" between the test and the body (then/do), and 3 for the terminator (fi/done). Each of these still has an arg so it can if (!strcmp(arg->v[0], "while")) to distinguish between them at runtime, then an if/then/else statement has more pipeline segments between the type 1 and 2 representing the test you run, and segments between type 2 and 3 are the body that's conditionally run if the test returned success. (There's various other types for functions and case statements, and "for" loops have more data fields, etc.) Plus these flow control statements nest so you can have types other than 0 between types 1 and 2 or 2 and 3...
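Pulling that together as a sketch (guessed field names, not the real declarations):

struct sh_arg { int c; char **v; };  // v[c] = NULL or "|", "&&"...

struct sh_pipeline {
  struct sh_pipeline *next, *prev;
  int type;           // 0=command, 1=if/while/for, 2=then/do, 3=fi/done
  struct sh_arg arg;  // the split-up command line for this segment
};

// So "if true; then echo hi; fi" parses to five segments:
//   type 1 {"if"}   type 0 {"true"}   type 2 {"then"}
//   type 0 {"echo","hi"}   type 3 {"fi"}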

When run_lines() is processing all this stuff it USED to have a struct sh_pipeline *pl local variable to point to the current pipeline, and a struct sh_blockstack *blk which is the runtime stack of nested if/else/fi and for/do/done contexts (which is never empty: you always start running in an initial block that's kind of an invisible { } around everything, and in a | pipeline there's an invisible ( ) around each segment). The blockstack structure contains the housekeeping information that needs to be duplicated at each nested level of flow control.

Then I added a function call stack, globally pointed to by TT.ff, which the pipeline cursor and block stack moved into so calling a function can jump to a new pipeline segment, and then eventually return back to where it came from even if it was in the middle of nested while/if/case flow control. And each level of function call has also got a struct sh_vars array containing the local variables at this level, and another struct sh_arg containing the command line arguments this function was called with (so $1 and friends expand appropriately), and so on. Just as the blockstack list is never empty, the function call list is also never empty: sh_main() creates an initial function call context containing the global variables and command line arguments to the shell, and so on. And the order of TT.ff entries is slightly non-obvious, the one it points to is the current one, the one we'd return to is ->next, and it's a doubly linked list so TT.ff->prev is always the "root" context containing the global variables. When TT.ff->next == TT.ff we're not in a function.

When run_lines() gets to the end of the current pipeline list and has thus run out of things to do (without asking for more input), it has two options: 1) pop the function stack and return to where we got called from, or 2) break out of the loop and return to whoever called run_lines(). The "break out and return" case happens when we have an "anonymous" function context at the end of the stack, which calls to do_source() add, allowing it to clean up after itself. This happens when sh_main() does an initial call to "cd ." to set up $PWD and $OLDPWD, and later in sh_main() it calls do_source() again with the -c string or the script file or the global "stdin" FILE * instance. Calling do_source() is also what eval_main() and source_main() do internally, and all of those calls need to exit do_source() when exiting that function context, so the caller can close the FILE * and otherwise clean up and return.

The problem is, you can "return" from source context, and you can "return" from eval context. And if you've got an anonymous function context at the end of the list (indicating run_lines() needs to return to its calling C function when done) it's not immediately clear to me what "return" should _do_. It's gotta traverse down the function call stack, edit the appropriate parent function context(s), but leave the anonymous function context(s) intact so we return to the right C functions. And the search for the first non-anonymous context may find there isn't a valid one to return to, so it should emit the "return not in a function" message which might be a syntax error? (I need to make sure syntax errors flush the call stack appropriately, but still signal the need to return from the C functions.)

The previous "just handle it when we hit the end" logic never cares about an _enclosing_ function context. That's new to "return"...


July 3, 2023

Walking to the UT tables, making up for lost time by getting extra steps. Walked to the river and back to the university, now in a certain amount of pain. But 20k steps so far and still a couple miles to get home. Exercise!

So adding return to toysh: in "source context" return works but "local" doesn't. Working out the granularity of this stuff is kind of annoying. So it's an active function context without a variable context, look at setvar and it's calling getvar to locate existing variables that may need replacement (well, setvar() calls setvar_long() which calls findvar() and addvar()... Except when do you need to search from current function context and when do you need to add to the root context? Urgh, I already redid all this logic in one of the branches I never got to finish and merge! My mental model doesn't match the code because it forked and didn't get back together.)

Right, tangent. What I need to do HERE is work out signalling, specifically how should I label each struct sh_fcall instance in the linked list so "return" knows what to pop? Each function context records: 1) pipeline cursor (where in the script are we executing), 2) local variables, 3) command line arguments for $* and friends, 4) flow control block stack for nested if/else/while stuff. (Which has to be in the fcall instance so when you return from a function it knows how much to pop.)

I'm creating a new function context in five places:

  • sh_main() - initial function context, this is the root one that holds the global variables.
  • run_command() - normal function() calls.
  • source_main() - it's basically a function call, has command line arguments even! But no local vars.
  • eval_main() - temporarily swap out the command line arguments so we can repurpose $* expansion, and is... otherwise wrong (can of worms).
  • run_subshell() - so the child process has its own pipeline cursor and empty blockstack, maybe wrong too?

Alright, so the question is what happens if you return from eval or subshell context. In the case of subshell return can (should?) be ignored, in the case of eval it needs to return to the _parent_ context, which is funky because eval_main() set up a function context and wants to tear down one function context, but return would reach past that? How do I make that work...

$ x() { echo one; eval "return 37"; echo two; }; x; echo $?
one
37

Hmmm... Ok, create "transparent" function context that return would blow past, manually cache both TT.ff and TT.ff->pl in a local pointer, do_source() on the resolved $* string, and then restore TT.ff->pl to the cached pl value ONLY if TT.ff hasn't changed? No, it needs the blank blockstack context too so "run off the end" returns to the calling context (I.E. how do_source() knows when to end). So kind of what it is doing, but put the if (TT.ff==ff) on the end_fcall(). If they called return, that fcall already got ended.

I think.


July 2, 2023

Ah, the standard way to add an hour to a debugging session: compare the wrong output files! (And figure out you're doing so after you run out of printfs to stick in, and run the binary under strace to confirm that the system call is being made to emit the correct output.)


July 1, 2023

Darn it, bash does NOT set the O_DIRECT flag on its pipes, at least not that I can tell. (Maybe only if there was a read?) cat /proc/self/fdinfo/0 says flags 0100002 on the pty, which /usr/include/asm-generic/fcntl.h says is O_LARGEFILE and O_RDWR, which... maybe pty already doesn't do collating? Fine. But then echo hello | cat /proc/self/fdinfo/0 says flags 00 which is just sad. (Can fdinfo not read flags out of a pipe, or are no flags set? No idea!)
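A throwaway way to poke at the flags question from C (not test suite material, per the next paragraph, and fcntl() isn't guaranteed to show the same thing fdinfo does):

#define _GNU_SOURCE  // for O_DIRECT
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
  int fds[2], flags;

  pipe(fds);
  flags = fcntl(fds[0], F_GETFL);
  printf("pipe flags %o, O_DIRECT %s\n", flags,
    (flags & O_DIRECT) ? "set" : "clear");

  // Linux pipes have accepted O_DIRECT ("packet mode") since 3.4:
  fcntl(fds[1], F_SETFL, flags | O_DIRECT);
  printf("after F_SETFL: %o\n", fcntl(fds[1], F_GETFL));

  return 0;
}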

Dowanna try to insert an extra C binary into the dd tests. The test suite does NOT depend on having a compiler in the $PATH at runtime.


June 30, 2023

I need to get a toybox release out. I need to submit a quarterly invoice. I need to book the flight to speak at the taiwan conference (although they still haven't gotten back to me about hotel information). I need to post that new kernel patch to lkml.

It's one of those "I need to do everything before doing everything else" stunlocks where I really wanted to get "dd" cleaned up in this release, but I need to finish the patch.c rewrite and then get back to the like FIVE nested shell issues (I'm halfway through fixing HERE document variable expansion line continuations in one tree, and implementing "return" in another...) But a new kernel just dropped and it's best for mkroot if I have a fresh kernel version out when they do...

One of my old 401k plans has ended and needs me to transfer the money to something else (they suggest rolling it over into an IRA), and after 6 months of pestering I'm 95% sure this isn't an identity theft scam, but I still want to go to a fidelity office in person to deal with it, and thought "I'll do that in minneapolis", but the closest fidelity office was an hour away by bus. Sigh. (Only Boomers have significant retirement savings, and they all live in suburbia. This was one year of maxed 401k deductions around 10 years ago, and if I withdrew it I could cover my bills living in this house for... 2, maybe 3 months?)

I don't have MUCH retirement savings, but they are at least badly managed. I have two old 401k accounts from when I worked at Pace and Polycom (stuck in the closest thing they offered to an index fund, neither of which was very close), both companies have been renamed since (I can't keep them straight), and I really really really should have rolled them both over into a Roth IRA during the pandemic because I was making less than usual and the tax hit would have been comparatively minor. But I just didn't have the spoons, and even a minor tax hit was still more free cash than I had: I'm still paying down the home equity loan I ran up following a start-up down instead of trying to get a new job.

I should get a proper psychiatric whatsis (before Fade graduates and we lose the good health insurance where we're pooled together with a bunch of 20-something college students) so I can get the kind of regularly tailored-and-adjusted ADHD meds Fade has, because caffeine borrows a cup of executive function from your future self, and I have recently confirmed that one "monster" energy drink with sugar gives me about an hour of focus and then knocks me on my ass (irritably) for the rest of the day. (The diet ones just go straight to headache now.) Alas, Texas no longer has a healthcare system, instead strip malls have emergency care kiosks, which sprout up like mushrooms around here. Google Maps refuses to show the one closest to me (across the parking lot from the dead Sears in Hancock Center) unless you know to search for it by name (presumably because it's for poor people, they won't show the black-owned haircutting place just past the Wendy's either: zoom in all the way and it never pops up), but on the walk to-from the table I pass where the second closest Wells Fargo location to me closed during the pandemic, which was replaced by "Next Level Urgent Care" which shares its building with a convenience store, a coffee shop, a nail salon, and a Jimmy John's. Of course they're down the street from St. David's but nobody can afford to _go_ there. (It's owned by HCA.)

In Minneapolis I can go to health care places that remember I exist between visits (and do not ask if I'd like fries with my prescription), but alas I'm not there for long enough periods to deal with half the medical issues I want to get looked at while we still have the good insurance. (When Fade graduates, we go back on... the obamacare plans I guess? Post-dissertation she's already lined up a one semester teaching gig covering for someone's maternity leave this fall, but I don't think it comes with health insurance. We can do that Common Object Request Broker Architecture thing to very expensively extend the previous health insurance month by month for a bit, but...)

That's another reason spending a few years in Japan looked interesting: they have a functioning healthcare system. Admittedly they don't believe in psych meds because it's all in your head and you should just stoically suffer and play out your social role until it's time to commit suicide or be killed by the system (if you're not up for dying at your desk via Karoshi or becoming a hikikomori they have a special forest you can go to, suicide is the leading cause of death there for men age 20-44 and women age 15-29). But for things like dentistry and blood pressure they're way ahead of us, so... net win? Then again people go to Mexico from the USA for affordable health care all the time, so "ahead of us" is almost an information-free statement? Anyway, it looked good while I was there.


June 29, 2023

Flying back to Austin. The middle seat was empty this time so I had enough elbow room to use my laptop a bit, and the new one still has a fresh battery, so I could do some kernel compiles. Last night I ran the usual mkroot/mkroot.sh CROSS=allnonstop LINUX=~/linux/linux build on a clean checkout of the new 6.4 kernel, with none of my patches applied, and wonder of wonders everything except x86-64 built. Which means my patch to remove the stupid x86-64-only host ELF library nonsense is the only thing I actually NEED to forward-port. (My other patches are nice to have, but not release blockers.)

So I did that on the plane, starting with the "proper" fix of changing the HAVE_OBJTOOL line in arch/x86/Kconfig to say "if X86_64 && !UNWINDER_FRAME_POINTER". (The ORC unwinder is implemented stupidly in a way that drags in external dependencies that build break a simple host environment. You can select the frame pointer unwinder on every other architecture, and USED to be able to on x86-64, but when this new feature was added in 2019 they broke the existing one but only on this one architecture. I've been hitting it with a rock ever since, which means I'm regression testing that the unwinder that works on EVERY OTHER ARCHITECTURE also still works here.)

Except, since last release, Josh Poimboeuf broke arch/x86/entry/entry_64.S in commits 4708ea14bef3 and fb799447ae29 by adding some sort of stack guard that never got tested with the relevant config option switched off. It adds hardwired dependencies on the ORC stack unwinder to the x86-64 system call entry code. Bra fscking vo. A bunch of undefined macros sprinkled everywhere, which needs me to add this nonsense to the start of the assembly file:

+#ifndef CONFIG_HAVE_RELIABLE_STACK_TRACE
+#define UNWIND_HINT_ENTRY
+#define UNWIND_HINT_IRET_ENTRY
+#define validate_unret_begin
+#endif

Which took quite a while to figure out, because everything going wrong is in the middle of nested macro expansions. And then it STILL breaks because the assembler says the file arch/x86/include/asm/idtentry.h line DECLARE_IDTENTRY_RAW(X86_TRAP_BP, exc_int3); has garbage on the end of the line, the error message says first unrecognized character 's', and after MUCH DIGGING I changed a second line of entry_64.S with this hunk:

+       UNWIND_HINT_IRET_ENTRY offset=\has_error_code*8
        .if \vector == X86_TRAP_BP
                /* #BP advances %rip to the next instruction */
-               UNWIND_HINT_IRET_ENTRY offset=\has_error_code*8 signal=0
-       .else
-               UNWIND_HINT_IRET_ENTRY offset=\has_error_code*8
+               UNWIND_HINT_IRET_ENTRY signal=0
        .endif

That might not be the right fix, because now there are two instances of the UNWIND_HINT_IRET_ENTRY macro and I don't actually know what it DOES. Will it insert the wrapper code twice? Why does it take statements as arguments? I'm just GUESSING. The ORC codepath uses an assembly macro that takes multiple arguments, and does... something... with them. Note that the difference between .s and .S files is that lower case doesn't go through the C preprocessor and the upper case does, so they have C macros AND assembly macros in this thing, and I'm not familiar enough with proper assembler syntax to author it from scratch. I've done a lot with machine language, and poked at existing assembler like this, but as soon as you have an assembler doing symbolic name=value stuff I do THAT part in C with inline assembly statements as needed. So I've never actually used assembler macro syntax because if you're using macros why are you doing it in assembly?

The build break here is that the assembler won't accept multiple statements on the same line (another reason to use inline assembly in C functions to do anything fancy), and when I #define the macro to nothing so it drops out that's what this tries to do. Possibly I need to change my "#define UNWIND_HINT_IRET_ENTRY" to instead take a list of arguments and split them so each one is on its own line, but I don't know how to make a C macro do that off the top of my head, and don't know how to make an assembler macro do anything, and it's a bit tricksy to look that up on an airplane with no wifi.
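(The thing I couldn't look up on the plane: GNU as has its own macro syntax that can swallow a variable number of arguments, so instead of an empty #define leaving "offset=..." stranded on the line, something like this at the top of the .S file should make the whole invocation vanish. Same guard as my patch above, and untested, so just a sketch:

#ifndef CONFIG_HAVE_RELIABLE_STACK_TRACE
/* assembler macro: swallow any arguments so nothing is left on the line */
.macro UNWIND_HINT_IRET_ENTRY args:vararg
.endm
#endif

This would replace the "#define UNWIND_HINT_IRET_ENTRY" line rather than sit next to it, because if the C preprocessor macro stays defined it eats the name before the assembler ever sees it.)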

I also wist for the days of "don't ask questions, post errors" where I could post something that worked for me to the list and make puppy eyes at this Josh Poimboeuf guy who broke it so he could fix the #else case of his macro. The last dozen times I've posted anything to the list they never engaged with HOW I did anything, it was either "you got the bureaucracy wrong" or "you're crazy for wanting to do that, stop wanting things". Questioning my goal and questioning my paperwork filing skills, seldom if ever discussing the code. (Oh, I did get coding style complaints about semicolon placement and where to break lines and such.)

Anyway, I got a patch that works for me. I need to post it to the kernel list, but... I do not have the executive function to steel myself to deal with that wretched hive of scum and villainy just now.

Back in Austin, hanging out at the HEB deli tables with laptop, and re-examining patch.c, I kind of want to rewrite it to do work in a different order. This patch implementation is designed to work as a stream editor, meaning it grabs a hunk and searches forward through a stream of input lines for a place to apply it, and if it hits EOF it announces failure, emitting the hunk it couldn't apply to stderr. It only buffers enough data to evaluate the current hunk, and writes each buffered line out to the file as soon as it can be sure that line wasn't the start of this hunk. This means two things: 1) It can't back up to apply hunks out of order (which is fine, diff never generates them out of order, nor do they overlap), and 2) it can't fail a hunk in the middle but apply later ones. The way it figures out a hunk DIDN'T apply is by reaching the end of the file without finding a place to apply it.

What I _could_ do is read all the hunks and try to apply them in parallel, and take the first one that matches at any given position. This would allow hunks to apply out of order. It could introduce failure modes where two identical sections are replaced with two different things, but as long as earlier hunks win over later hunks in the case of a tie, that should work out ok?

But this is not a tangent I want to go down right NOW. I'm trying to close tabs for a long-overdue release...


June 28, 2023

I'm in another one of those stun-lock situations, spinning enough plates that each is only glacially advancing. Not blocked, just... chipping away.

Sigh, this sort of thing is why I factored out and genericized the toybox argument parsing right at the start. It's not the command's job to implement support for "--", although it may have a flag to _disable_ what it gets by default.

I'm still subscribed to the busybox mailing list, and check the folder every once in a while to see if they have interesting test cases or feature requests from their userbase. A surprising amount of the busybox issues that go by are "I got the toybox plumbing right, doesn't apply to me" issues, and I'm mostly looking for test cases or new feature requests, although in one case the android guys submitted a getfattr years ago and then yanked it again (I moved it to pending) because their internal stuff binds to some elaborate C++ library or something? And "man getfattr" in Devuan Bulimia doesn't mention it? But I already have xattr getting/setting logic in "tar" so I suppose I should deal with the command line version at some point, starting with looking up why they wanted that weird library. (Probably some kind of magic filtering. I remember discussing it on the list, I just don't remember the _result_. But that's what the list archive is for...)


June 27, 2023

Got a bug report in patch, which looks like yet more fallout from adding fuzz factor support. (The problem is if you're speculatively applying a hunk and it fails after a few lines, you have to back up and try the first line of the hunk at each of the lines you've already traversed past, but the loop wasn't really set up to do that.)

But patch has several pending todo items already. The reversed hunk detection is... not exactly wrong, but insufficient. Right now it's just "one added line works as a removed context line", but it SHOULD be "a full hunk would have applied in reverse, and no hunk has yet applied forwards". Also the payload of struct double_list in lib/lib.h should really be void *data instead of char *data, because that way you can stick any pointer type in there without needing a typecast. But the first user was patch.c which wanted strings, and did plist->data[1] and so on, meaning in the rest of the tree there are a bunch of gratuitous typecasts, which I've been meaning to clean up. But cleaning it up requires making patch NOT use it as a char *...
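For context, the whole structure (quoting lib/lib.h from memory, so double-check me) is just:

struct double_list {struct double_list *next, *prev; char *data;};

and the cleanup is s/char/void/ on data: assigning any object pointer to a void * needs no cast, so the non-string users stop casting on the way in, and the string users like patch.c become the ones adding a cast (or keeping a char * alias) to keep doing things like plist->data[1].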

I should just fix the bug in front of me and not poke the house of cards, but I'm bad at that. :)

Hmmm... is the "no hunk has yet applied forwards" test actually helpful? Because hunks apply in order, so the check happens at the end of the file when we didn't find a place to apply this hunk. Meaning we can only evaluate one hunk for having been reversed, and you COULD append a reversed patch to a normal patch. If you do that, it re-opens the file because generating a diff starts with @@ filename lines, and then the hunks within that must go in order in the file and can't overlap. So it's "the first hunk of a file, not a file". (Meaning the first hunk of a "@@ filename", but not necessarily the first hunk of a file.patch. Coming up with vocabulary to distinguish concepts is a surprisingly large amount of the design work here, you need unique NAMES for this stuff and they haven't necessarily already got them...)


June 26, 2023

Still under the weather. I mostly spent the day listening to discworld audiobooks and nursing a headache from healing dentistry.


June 25, 2023

Toybox release prep has a whole bunch of externalities, the most annoying of which is building current linux kernels in mkroot. I should update to the new kernel version, but I never did reply to Andrew Morton, haven't tried to shove my patches up LKML's orifice since last toybox release, and I REALLY don't look forward to finding out what they broke this time. (And I should make sure it builds the patched and unpatched version, although I've NEVER installed the gratuitous elf package on my laptop so I've never built the unpatched kernel for x86-64 in mkroot since that dependency cropped up. Built fine for arm last I checked. It's so absolutely unavoidably necessary that most architectures still don't need it and x86-64 worked fine without it for 30 years. I am SO TIRED of arguing with those people.)

I miss aboriginal linux as a test environment, and want to slap together a mkroot environment capable of at least running the test suite. If that involves grabbing a bash binary from aboriginal or some such, oh well. And as long as I was doing that, I could grab the make binary and do more "build self under self" testing, which leads into the mkroot stuff... and that implies putting the mkroot /bin and the aboriginal linux /bin on the same filesystem with the first before the second in the $PATH... But small easy steps. If I try to do everything at once I never get to check anything IN. (Plus aboriginal linux was layered, with the toolchain being a second filesystem spliced in at mount time, so getting bash is easy but getting make involves pulling in an obsolete glibc version that links against ancient uclibc that lots of modern packages won't compile against. Remember, uclibc development lost power shortly BEFORE posix-2008 came out and is only really RELIABLY susv3 (not susv4), so me beating a dead horse through 2017 doesn't help the staleness of the API. I ended that project for _reasons_...)


June 24, 2023

I have not been keeping up with my blog since dentalism. (And the days I hadn't written up before that are kinda lost to history now.) Still not back to full speed, but shoveling out.

I'm trying to close tabs for release, but tabs keep opening as fast as I close them. Especially when there are a couple of gristle changes that take an inordinate amount of chewing before I'm satisfied with them: it's not BIG, it's just not _right_ yet.

Still grinding away at the toysh HERE document line continuation logic, which is being a whack-a-mole change. (Currently it's gluing the lines together AFTER checking for the HERE terminator, so sh -c $'cat<<EOF\nabc\nE\\\nOF\necho hello' doesn't work because neither E\ nor OF were the end of line marker. But again, you can't just look at the last two bytes of the line for a backslash and a newline because that backslash could itself be escaped, you've gotta parse forward from the start, and THAT logic is basically the variable expansion logic (at least for detecting line continuations, there's a call to parse_word() in there), so unless I want to duplicate it in two places it's tricksy to get the ordering right.)

I have a smallish change to cp that would mean backing out the pending change to cp -s. I got readlink fixed and the cp -s logic is separate but similar, I want to get both fixed. But it's DARN FIDDLY. (And no, the gnu/dammit one doesn't get it right. I want cp -sr to work whether source is above dest, dest is above source, or one or both are absolute paths. When given relative paths I want it to produce relative path symlinks, and when given absolute paths I want it to produce absolute path symlinks. And I need tests for all of it in cp.test)

I'm actually most of the way through reviewing and cleaning up dd.c but I'm now at the "part I want to rewrite" (not hard, just fiddly) and the "testing the hard to test parts", namely the I/O blocking that dd is all about. (If you DO get a short read, how big is the next read and the next write? Posix has a LOT to say about that, but it's hard to externally test that a program is doing that right.)

The trick to testing dd is to use O_DIRECT pipes, a "packet mode" feature added in 3.4 that maintains the existing blocking (does not merge or split blocks of data sent through them). Which I _think_ bash is already doing, because otherwise "while read i; do blah $i; blah stdin; done" is really hard to do in a pipeline. (On a seekable fd you can fseeko(fp, 0, SEEK_CUR) to force it to discard the buffered input and back up the underlying file descriptor, so the child process gets the next byte after the line read put into i, but you can't unget pipe data. But if the input was one write() per line, the O_DIRECT means the next read() naturally stops there and readline() sees the trailing newline and returns at the right place without readahead. It's a bit delicate, but workable.)
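For the "how do you get a packet mode pipe" part, it's just a flag (Linux 3.4 and later; minimal sketch, and the helper name is made up):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

// Create a pipe where each write() comes out as its own read(),
// so a test can observe the actual I/O transaction sizes.
int packet_pipe(int fds[2])
{
  return pipe2(fds, O_DIRECT);
  // Or retrofit an existing pipe's write end:
  // fcntl(fds[1], F_SETFL, fcntl(fds[1], F_GETFL)|O_DIRECT);
}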

Of course even if bash does that, no guarantee that mksh or zsh (which macos switched to when it gave up on the last gplv2 release of bash) does, so if I depend on the shell's implicit behavior tests might fail in an uninformative way.

I'm tempted to make a C wrapper to test this more reliably, but the test suite is not set up to build C code (it doesn't strictly depend on a compiler being available) and creating some sort of toys/example/demo_testpipe.c makes the tests using it really awkward to invoke. (None of the example stuff is in defconfig. I suppose I could have "make tests" build with a slightly weird config, but that's horribly magic and I don't want to go there.)

Sigh. I guess I could check for $BASHPID being set and only run the pipe blocking tests in that case? Because bash should set the flag on the pipe already.

And of course all the github issues, which tend to come in bursts it seems...


June 22, 2023

Dental surgery day. One of medical science's most granular weight loss programs, and one of the few modern thingectomies where "bite down on this" is still a core part of the procedure.

Sticking various forms of pliers and a miniature angle grinder in my mouth isn't actually the part I mind, that's just engineering. No, the DO NOT THINK ABOUT THE INCIPIENT PANIC part is that I needed four injections into my mouth fifteen minutes before the surgering. Why was my blood pressure thirty points higher this visit than it was during my consult visit to schedule this a couple weeks back? Before the doctor even got into the room and it was the assistant setting me up? The same reason I was sweating so much: my subconscious knew NEEDLES WERE COMING!

But first, a fifteen minute talk with the surgeon where I agreed to let them use lidocaine, despite me having put it on my allergy list, because there are apparently a half-dozen something-caine drugs that are basically minor variants of the same thing, and then once you move outside that family your face is now numb and paralyzed for 12 hours instead of 4, and I eventually just decided to risk/lump the migraine. (Does it really matter WHAT they're injecting? NEEDLES!)

My lidocaine-inspired trip to the emergency room back in 2013 happened when a lidocaine rinse got in my sinuses and triggered a full field visual migraine I thought was a stroke making me go blind. (When it's in BOTH eyes the same way, it's not an eye issue, it's a brain issue.) I could see again an hour and change later (lying on the hospital gurney with nobody having seen me yet), and the "everything is overlaid with sparkly television static" went away after three or four days. Injecting the same drug into my circulatory (not lymphatic) system is less likely to get through the blood/brain barrier intravenously than getting coughed into my sinuses, infiltrating the nasolacrimal plumbing, and coming in contact with the optic nerve like I suspect happened last time. Although I did not armchair-opine that at an actual medical professional, just said I was willing to risk the migraine.

Of course the OTHER problem with lidocaine is it's not very effective on me. Despite thinking I could relax a bit once the four injections were over and let them get on with the easy part (the actual surgery)... turns out I could still feel it. Muted a lot, but ow. They offered a fifth injection, but I went "that's not an improvement, just get it over with". (Yes I had considered asking if they could just yank the tooth WITHOUT injections, but strongly suspected I would regret it and there's no way they would have agreed for insurance reasons anyway.)


June 20, 2023

Capitalism's doing another "embrace and extend": podcasts started life as an open protocol where you post an mp3 file at a URL, and have an xml file called an "rss feed" listing the available episode URLs with attached descriptions and dates and so on. Then capitalist aggregators like apple and spotify got ahold of them, and now the mp3 is hidden behind layers of paywall.

For example, an interesting episode of "on with kara swisher" scrolled by recently on my android podcast app, called "the man making self-driving trucks", and I wanted to send a link to Fuzzy because that topic interests her. But when I asked the Google Podcasts app on my phone for a URL, it gave me an absolutely insane pile of hash crap that I do NOT want to cut and paste into discord, that smells like it would link-rot within days when some server cache expires, and that had no obvious link to the mp3 anyway. (There's a play-in-web-page button, some sort of "mark as played" cookie thing, and an "add to playlist" cookie thing. So a captive portal wrapper around the content, keeping you in their walled garden ecosystem.)

When I google for the podcast's website, there are apparently two of them, one on vox media and one at nymag... with different content? But nymag pops up a paywall if you try to actually look at an episode, so screw 'em. Neither page gives links to the actual mp3 files, instead they link off to "spotify, apple podcasts, or wherever you listen" with the third being some weird service that also has a play button but no link to the mp3 file (and that one doesn't even have separate pages for each episode, just a big page you click expand from, so if you want to link to a specific episode don't use them). And of course if you "view source" on any of these pages (which Google has removed from its phone browser) it's nested layers of obfuscated javascript assembling a URL out of pieces, and part of the point of https:// everywhere is you can't easily route your browser request through a proxy to see what actual URL it's fetching. (Remember how Google is switching DNS lookups to go through https so you can't block advertising websites at your firewall?)

Luckily there's still an rss feed behind the scenes, and googling for "on with kara swisher rss" gave me a URL that's human-readable enough to get an MP3 URL out of, in this case https://www.podtrac.com/pts/redirect.mp3/pdst.fm/e/chrt.fm/track/524GE/traffic.megaphone.fm/VMP4773935195.mp3?updated=1686187670 which I THOUGHT might have an expiration in it (so it would force you to reload the RSS to get a more recent one), but date -d @1686187670 says June 7 so that's apparently upload date. (Why...?)

I eventually wound up sending fuzzy the apple podcasts link as the most human-readable of the lot. None of them are really things you type into a URL bar yourself, because the people making this infrastructure don't care about that. (Even youtube's horrible hashes are intentionally short enough you can copy them from one machine to another by hand. That was by design, and its inheritors haven't broken it yet.)

(Later I noticed that the URL the rss pointed at is itself a wrapper, and https://pdst.fm/e/chrt.fm/track/524GE/traffic.megaphone.fm/VMP4773935195.mp3 was the ACTUAL link to the MP3. Which is a 403 redirect to https://dcs.megaphone.fm/VMP4773935195.mp3?key=2b15ad537ed1f7f2041d2bd4dbdd1139&request_event_id=c162bbdb-7b1f-43db-a97e-76a7551db06b because of course it is, although you can strip the ? and everything after it and the result still works... albeit by being a 403 redirect putting it _back_.)

I hope the fediverse guys take podcasts back under their wing. Youtube used to let you listen to the audio when the phone was otherwise off, but Google moved that behind their paywall (taking away something it could already do and now charging extra for it). Podcast apps still work with the screen off, meaning you don't have to hold the phone and prevent anything from touching the screen, and the battery doesn't run down as fast.


June 19, 2023

Google has developed another weird semi-failure mode where typing a search query into the main google.com search field and hitting enter doesn't work. Instead it inserts a line break and lets me type a second line of search query, and I have to use the mouse to hit the "Google Search" button under the entry field to get it to actually do the search.

I think the problem is that the page is loading from my android wifi hotspot (I still haven't got the wifi in Fade's apartment to accept the new laptop's mac ID), so it loads more slowly than anyone at Google has regression tested in years, and the "actually do the search now" behavior got moved to javascript or something in a separate file that's #included from the initial html file, and thus it's loaded after the first file loads. So there's a few second window where the page is displayed and lets me enter data into the field, but hitting enter in the field doesn't submit the form data because the file adding that behavior hasn't loaded yet. So I hit enter and scroll the google search bar (it does not resize) with my cursor ready to type a second line of text.

How does google NOT NOTICE that they inserted a bug into the-thing-they-do? The behavior changing while the page loads is NEWBIE bad web design, they used to teach about avoiding that in single semester courses on this stuff. I know laying off 12k load bearing people in january was a bad move, but seriously. What, only poor people have slow internet anymore, we don't matter? Or "you shouldn't ever see the main google.com page, you should just type your query into the URL bar and pollute your URL autocomplete with random search phrases that you couldn't load without interposing a google.com page of advertising"? Neither is exactly reassuring.

(Yes, I am aware that pedants say "URI" when it hasn't got https:// on the beginning. I file it under "no, a BUD lite", and if there is a useful distinction then "URL" is the good one (points to a potentially interesting site) and "URI" is the bad one (points to a site frequented by people who say "URI", so I don't want to go there). File it with kibibytes: not gonna.)


June 18, 2023

I think parse_word() has to separately flag trailing backslash newline as the reason for requesting a new line so it can be removed when gluing lines together, because having expand_arg() do it is just a MINEFIELD where bash -c $'echo "$\\\nPATH"' prints $PATH instead of expanding, because "is the next character X" becomes a noticeably harder question to answer if "escape sequence that collapses to nothing" isn't removed before asking.

But this means parse_word() can't just return a NULL pointer for "needs more data", because that could mean "unterminated quote" (there are a BUNCH of different types of those) or it could mean this special magic instance where we want to chop 2 bytes off the end when gluing them together, and you can't just look at the END of the returned string because that \ could be the second part of \\ or it could be in single quotes where the \ isn't removed at all, and of course "\'$(' isn't a balanced set of single quotes but "\'$('\' is, the enclosing quotes aren't balanced but the question is do we remove-or-keep the final \ if the line breaks at this point? Gotta start at the beginning and work forward to know.

I have two sets of plumbing that are independently parsing this nightmare, parse_word() nondestructively figures out where the next word ends and expand_arg_nobrace() does quote removal. Did I mention even presumably simple single and double quote removal is modal? Inside single quotes you ignore everything except a single quote, so '"', '\', and '$' are all complete single quote strings, and yes even newlines just continue as part of the string until you get the next single quote which is why the escaped newline keeps the escape. But inside double quotes you still expand variables, but backslash no longer vanishes before most characters (so \x passes through), except that \\ or \$ are still escaped so consume that backslash, and backslash newline is a third instance where the backslash is consumed.
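Here's the shape of that forward scan as a from-scratch sketch (NOT the actual parse_word() code, and it punts entirely on ${}, $(), and backticks, which the real thing has to nest through):

// Does this line (newline already stripped) end with a live continuation?
int is_continuation(char *s)
{
  int quote = 0; // 0 = unquoted, otherwise the quote character we're in

  for (; *s; s++) {
    if (quote == '\'') {
      if (*s == '\'') quote = 0;     // inside '' only ' matters
    } else if (*s == '\\') {
      if (!s[1]) return 1;           // trailing \ unquoted or in "": glue
      s++;                           // otherwise skip the escaped character
    } else if (quote) {
      if (*s == quote) quote = 0;
    } else if (*s == '\'' || *s == '"') quote = *s;
  }

  return 0;
}

Which encodes the rules above: an escaped backslash can't start a continuation (the s++ consumes it), and inside single quotes a trailing backslash is just data.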

The OTHER thing I could do is just have parse_word() return NULL to indicate "need another line", but remove the trailing two characters that triggered that itself, by replacing the \ with a null byte. It's uncomfortably magic (and violates the "nondestructively" above), but I really don't have a better place to put it. The signaling handoff isn't designed for this.


June 17, 2023

I talk about being out of spoons a lot, but it's not disability. It's ADHD. I have limited executive function, and ALWAYS HAVE. I work based on momentum but have huge trouble STEERING. Lots of times I know "this is not what I should be doing", and yet here we are.

It's not so bad with people waiting on a deadline like RIGHT NOW. Pair programming or in teams. But I have often spent an entire day Needing To Do A Thing, and Trying To Do A Thing, and not having started that thing at 6pm. Things like "book a flight to a conference" or "update my bank information in a website". Looming. (I mean, I'm GOING to do it wrong. Or sub-optimally. If I could just get my stars aligned I could do it RIGHT and feel confident I'd done so. And not that I'd made the wrong reservation or participated on the receiving end of identity theft or something. Which is probably the executive function version of free floating anxiety trying to find something specific to latch on to: this is big and immovable for no obvious reason, what could the reason be...)

Back in the olden days rich white men had wives/secretaries/servants that compensated, and the less rich were miserable with constant Deadline Crisis because nothing got done before it HAD TO HAPPEN RIGHT NOW. (Been there, done that...) And I can compensate somewhat with a routine built around a rigid schedule, which is another variant of "externally imposed deadlines"...

One of the nice things about programming is "doing it wrong" is a normal part of the process. Debugging an empty screen. It's not good enough yet, keep hammering. It'll never be RIGHT, just good enough for now. I'm sure there's a better way to do this but I haven't been able to think of it yet. But what if THIS happens? Add more tests. Almost any piece of my code, if you ask me "how do you break this" I'll have a LIST, but it's acceptable stuff like "provide a single line of input larger than RAM plus swap". I do not handle $((1<<67)) on a 64 bit system. And I'm provisionally ok with that. For now. I am COMFORTABLE with programming, both because I've already MADE most of the obvious mistakes multiple times and dealing with it is just shoveling, and because _everybody_ is bad at this. Iterative pareto principle, 80/20 and then 80/20 what's left. Find checkpoints to commit that are better than what was there before.

But I need traction and momentum. Scrabbling across the surface and bouncing off because I can't get started is frustrating. I need to drive a crampon in to get started. I've spent far too long being five minutes from getting started. And if I don't know where my next checkpoint is because the code's being can-of-worms but people are already _using_ it...


June 16, 2023

Yay, the big endian fix for QEMU finally went in. (It's not specific to mips malta, it affects s390x as well.)

This morning's dental appointment was apparently just diagnostic, not pulling the bad molar. $400 worth of X-rays, and then they want to do $1500 of work on all the OTHER teeth. Which is way cheaper than I expected, to be honest, I got charged $8k to get my front teeth screwed up like this back in 2013, so probably double that for inflation since. (There's a reason I'm going to the place Fade gets her teeth done, it's the hospital attached to a medical school. Either they get graded on their work, or it's the teachers doing it.)


June 15, 2023

Ooh, new bug with variable expansion: ${0::0} is erroring in toysh but not in bash, which I noticed in my monthly-ish glance at the busybox list to see if they've hit anything interesting. (Most issues they mention don't apply to toybox, but every once in a while there's an interesting test case.) Which is EXTRA weird because in bash ${0:} errors but ${0::} does not. And nothing you put _in_ the slice math fields seems to produce an error: "echo ${0:=}" is fine, so is "echo ${0:]}", even "echo ${0:0/0}" which SHOULD throw a "division by zero" error...

I'm still wrestling with HERE document expansion line continuations being tricksy, turns out I can't just iterate over parse_word() since external quoting is irrelevant but INTERNAL quoting is still a thing. So ${PATH/"abc"} still has to match but quotes before or after are ignored (normal text having apostrophes in it does not suppress variable expansion). Except you can \$ the initial $, which seems to be the only backslash escape other than the terminal newline that counts, the rest are ignored/retained.

This raises a design issue: I only need to traverse and resolve line continuations (and glue together lines) once, but collecting HERE documents doesn't care about that. Resolving variables does. So when I DO have an unbalanced HERE document, the time to generate the error is when using it. For example, a HERE document in a function definition can contain anything, but when you CALL the function it would error if you have ${ without a concluding } in the HERE document. Checking each time you call the function is right for error reporting but expensive for gluing together line continuations.

Sigh, when I wrote expand_one_arg() the design assumption was the input was already sanitized, and changing it to reliably detect unbalanced ${} and such would be a lot of auditing work to get something I could trust. Scanning the input before calling it in HERE document expansion isn't hard, but the obvious place to do it is the WRONG place to do it. Grrr. I have multiple bad options here.

According to the thread with Chet, bash takes a different approach, storing HERE documents as one big string. I'm keeping the individual lines we read in as long as I can to be nice to nommu systems, and having one big string still doesn't fix the "unfinished expansions cause error" problem. I guess as long as I need a traversal pass to spot those, HERE document traversal at expansion time is the right thing to do. The gluing-lines-together part should only do WORK the first time, and we need the rest for error detection.


June 14, 2023

Onna plane to minneapolis.

The man page for readv() and writev() says that they work just like read() and write() except for taking multiple buffers, which means they SHOULD perform a single atomic input/output in the non-interrupt case. So I don't actually need to do memmove() between dd blocks to convert input and output sizes, I can wrap around the end of the buffer using (at most 2) iovecs.
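Which turns the dd buffer into a ring buffer. A minimal sketch of the output side (hypothetical helper, not the dd.c code):

#include <sys/uio.h>

// Write "len" bytes starting at offset "pos" within a ring buffer of
// "size" bytes, as a single writev() using at most two iovecs.
ssize_t ring_write(int fd, char *buf, size_t size, size_t pos, size_t len)
{
  struct iovec iov[2];
  int count = 1;

  iov[0].iov_base = buf+pos;
  iov[0].iov_len = len;
  if (pos+len > size) {
    iov[0].iov_len = size-pos;      // the part before the wrap
    iov[1].iov_base = buf;          // then start over at the beginning
    iov[1].iov_len = len-(size-pos);
    count = 2;
  }

  return writev(fd, iov, count);
}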

So the question then becomes, what's the error case where a short read or short write (which any signal can cause) leaves me with not enough data to do a write() of the requested size, but also not enough free BUFFER to do a read of the requested size? In theory the necessary buffer size is input+output, but for bs=1g that's... uncomfortable. In theory on a system with an mmu the physical pages should mostly remain unpopulated if I haven't touched them. (Modulo whatever weirdness transparent huge pages get up to, items allocated AFTER this in the heap and so on...) In practice, this is a fairly obscure error recovery path where input and output writes get out of sync.

Reading POSIX on the plane: bs= disables block aggregation, so allocating ibs+obs when either is specified makes sense. Posix says the default is ibs=512 and obs=512 so by default you DO have block aggregation that can get out of sync.

Posix says: "if the read returns less than a full block and the sync conversion is not specified, the resulting output block shall be the same size as the input block", which means if ibs=1g obs=512 then if you only read 800 megabytes instead of a full gigabyte, you perform a single output write of 800 megabytes, completely ignoring the 512 byte size request.

Sigh, there's aggregating short blocks and SPLITTING long blocks, I want to know when to do each, and it's not explaining the difference clearly here.


June 13, 2023

Flying back to minneapolis tomorrow, to dogsit while Fade's at 4th street and for my delayed dental work. Trying to finish stuff up here before then.

[Editorial note: screw it. I've been blocked on this for a couple weeks now, gaijin smash time.]

Sigh, I'm editing this on July 5th and here's what I left myself as notes to finish this entry:

I got invited to patreon video.

Trying to get a release out, 0.8.9 vs 0.8.10

Finish dd
  - testing with O_DIRECT pipes
  - has a TODO about buffer overflow

Thanks past me. The patreon video is a todo note-to-self because for a long while half the videos I watch on prudetube were creators complaining about prudetube and it just doesn't sound like fun, but I still haven't followed up on getting the tutorial videos I want to do hosted on that german peertube server instance the "topless topics" lady uses. I looked into hosting videos directly on patreon last year (along with tumblr and a few other places) but they hadn't properly rolled it out yet, and were pointing people at vimeo which was publicly exiting the video hosting space. (Their new business model is, like, corporate training videos? Or maybe just serving nothing but advertising, I forget. It seemed uninteresting, either way.) And I still need to update my patreon bank info for the new credit union.

The release note-to-self is obvious, the part about the version numbers means that the logical successor to 0.8.9 is 0.9.0 but that signals... at least getting dd promoted? Adding the LFS build? Adding command line editing and history to the shell so it FEELS functional even if that wouldn't significantly change the number of scripts it can run... I haven't quite EARNED a 0.9.0 yet, but I've held off releasing 0.8.10 for twice as long as I should in part because I didn't want to CALL it that.

The dd stuff is that 1) I know how to test it now: when you set O_DIRECT on a linux pipe it does NOT merge packets in the pipe buffer, which gives you a method to preserve read/write transaction sizes and actually TEST that all this funky dd blocking is doing what it claims. (This test will most likely fail on macos. I'm ok with that. The remaining question is HOW to set that flag on the pipe.)

And 2) is a note in the existing dd.c implying the buffer size it's allocating is insufficient. Since then I've decided to rewrite the actual copy-and-realign loop to use readv() and writev() because then I don't HAVE to realign anything! The kernel can do a single atomic I/O transaction from or to multiple userspace buffers! (Since posix 2001, apparently. I don't remember if that was susv3 or susv2...) So I don't need to do funky memcpy things when ibs!=obs to copy the data back to the start of the buffer so it can potentially be "topped off" by another read. (Yes of COURSE I should test this with prime numbers.)

Knowing how to do it and clearing headspace TO do it is a different matter: I need to CLOSE tabs. Jumping from thing to thing results in nothing getting checked in and all my time spent reverse engineering where I left off.

So yeah, spoilers. (Like half of tomorrow's entry is me working out what I just summarized there...)


June 12, 2023

I'm still subscribed to the coreutils mailing list (well they STILL haven't merged the cut -DF stuff they said they would, and said was still in progress last time I poked them; getting a Linux From Scratch build going is a good way to test new releases of that before they make it into debian), and recently they were talking about "dd count=3x4x5" syntax (an insanity required by posix, from the days before $((MATH)) was built into the shell), and I went "eh, it's not hard to implement" so I added it to the dd.c in pending. Which already had some outstanding cleanups from the last time I looked at it, and I sat down and did the REST of the review while I was there (I want to rewrite the main copying loop, and need to expand the test suite now I've figured out how to do the O_DIRECT pipe thing that can actually measure the input and output block sizes)...

I got to a point where I could check in the changes I'd made, so I ran make test_dd to make sure I hadn't introduced any regressions and... There's a test for count=0x2 hex notation. Because I'm using atolx() which automatically grabs hexadecimal prefixes. And implementing the multiplication thing BROKE that. Ah, THAT'S why I hadn't done that. Of course, I _could_ make both work together, since 0x means multiplying by zero if not interpreted as the hex prefix (something you probably never want to do in this context)...
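The multiplication part itself really is trivial, which is why I went "eh". A sketch (NOT the actual atolx() plumbing) that shows the conflict, because parsing base 10 left to right can't also honor a hex prefix:

#include <stdlib.h>

// "3x4x5" -> 60. Parsed this way, "0x2" is 0*2 = 0, not hex 2.
long long parse_count(char *s)
{
  long long ll = strtoll(s, &s, 10);

  while (*s == 'x' || *s == 'X') ll *= strtoll(s+1, &s, 10);

  return ll; // caller checks *s for trailing garbage
}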


June 11, 2023

Still digging through HERE document variable expansion and I'm pondering the fact that int x, len; x = writeall(out, str, len = strlen(str)); if (x != len) barf; is subtly wrong, in that signed 32 bit integers max out at 2 gigabytes and a string on a 64 bit platform COULD be longer than that. I dunno if malloc() still has maximum contiguous chunk sizes it can return (my recent glib comment on the list to Elliott that malloc() should only fail for virtual address space exhaustion isn't TECHNICALLY true)...
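The shape of the fix is easy, it's finding all the instances that's work. Assuming writeall() keeps returning a byte count the way write() does, a sketch (writeall() and error_exit() being the toybox library functions):

#include <string.h>

// Sketch: size_t/ssize_t instead of int avoids the 2 gigabyte truncation.
void emit(int out, char *str)
{
  size_t len = strlen(str);

  if (writeall(out, str, len) != (ssize_t)len) error_exit("short write");
}

Although write() itself only promises to handle SSIZE_MAX bytes per call, so "one transaction" has a ceiling no matter what type the length is.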

This is a category of "thing I do not look forward to auditing". Same as the "how big a string can getline() return" problem, where in THEORY you can tr '\0' ' ' </dev/zero | sort and the OOM killer triggers. There's no reason a normal user CAN'T run :(){ :&:&};: and forkbomb the system. (That's what the container plumbing is for.) Outright ATTACKS against this code are currently in the "don't test for errors you don't know how to handle" bucket, but as we approach 1.0 anything with security implications is a thing.

In this case when I wrote the code the assumption was that string-too-big errors would either be misattributed as disk errors (passing a negative number to write() should return error) or silently truncate the string in the HERE document output, either of which is acceptable for an explicitly insane input. But I need some way to AUDIT this stuff. Which may just boil down to "grep for each call to each library function and check its user", tedious but not really that hard. Toybox is small on purpose, and I've always vaguely thought a pre-1.0 audit of EVERYTHING was probably a good idea.

But I still don't know how to handle the getline() issue. It's sort of definitional, I don't know what the correct behavior IS. Barf on any single line longer than some arbitrary length? This is where OOM returns are good, from a libc that knows how much memory the system/container has available and can use that as a frame of reference for "this is too big to load into memory".

Which reminds me that the pending get_next_line() rewrite in sh.c (to actually have command editing/history) also needs to do the fseek() trick so the buffered FILE * readahead that getline() inevitably does isn't reflected in the file descriptors that commands inherit. I think I already have that on my sh TODO list? It's probably in tests/sh.test already, I should check...


June 10, 2023

So HERE document variable expansion requires line continuations, which is REALLY AWKWARD. A test case: the space in "${PATH//:/ }" can be a newline instead, which is preserved AS a newline if you didn't escape it, so you get path components one per line instead of space separated. That's not the awkward part, that's actually useful (although $'\n' instead of a literal newline on input is easier to read). No, the problem is that variable expansion assumes the input already has all the line continuations resolved by parse_word(), to the point there's a couple places expand_arg_nobrace() doesn't do bounds checking because input without a sufficiently unquoted trailing } for each ${ can't make it that far.

Which means I need to iterate over unquoted HERE document input with parse_word() to detect when I need to glue lines together. (It's very much NOT a fast path, you should almost never need to glue lines together in HERE documents, but if I EVER need to do it I only have bad places to put it. Bash doesn't detect unterminated variable expansions at parse time:

$ x() { cat<<EOF
> ${PATH
> EOF
> }
$ x
bash: ${PATH
: bad substitution

But the logical time to glue lines together is parse time. Once you've traversed a set of lines once and attached the ones needing continuation, you don't need to do it AGAIN, so doing it every expansion seems wasteful. I dowanna mark it as traversed, that's bad magic. Sharp poky outy bits, already too much magic I can't avoid, not doing it as an optimization. (In this context, "magic" is similar to the "you are not expected to understand this" comment in the original Unix. It's a thing that is not sufficiently obvious from reading the code. I myself spend too much time going "why did I do that...?" looking at my old code, it can't be easier for other people. When I say "simple" in the toybox design goals, half of that is "readable". Possibly that should be its own goal, but I haven't got a separate metric for it.)

On the bright side, HERE document quoting is simpler than I remembered, because not only are "EOF" and 'EOF' indistinguishable, but E\OF and EOF"" are all the same too. All quoting is removed for symbol matching, but if there was any quoting (at all, even a single backslash) then variables are NOT expanded in the HERE document. (Yes, this is horrifying. Hysterical raisins.)

Last month google couldn't grep for my name (autocorrected it to "langley" if not quoted, without saying it had done so; I think it added "misses" to the search and then ranked every single resulting page higher for what WOULD have been multiple pages if it hadn't gone endless autoscroll, but I'm just guessing at how they broke it). But I just googled for rasin to see if I got the spelling right (I've corrected it so many times over the years that I now correct it to be wrong when I _do_ get it right) and it did NOT go "did you mean raisin", instead it showed me the genre of haitian music and the Dragon Ball Z henchman, and the castor oil brand and the Rasin Foundation and the Czechoslovakian author with the eyebrows over the s, and the "image search" block it inserts is three pictures of raisins from shutterstock and such and one picture of the dragonball Z character...

Sigh: it's a defensible position I suppose, but now it's failing THE OTHER WAY. Pick one. I can come up with percussive maintenance workarounds to my workflow, but randomly shifting inconsistent behavior is... disconcerting. I still treat Google like a hammer, every time the head flies off mid-swing I double-take out of my workflow. I need to get used to the idea Google search is no longer solid, but... the reason I've never been able to work with Microsoft products is I need CONSISTENT failures. Go wrong the SAME WAY, predictably, and we can call it a feature. I'm happy to work with knives and fire as long as I _understand_ them. Reliable doesn't mean GOOD.


June 9, 2023

Sigh, I want to focus on the shell and work through the existing test suite in order, fixing each failing test until the entire test suite passes. It's a pain to add MORE tests to the test suite before doing this, because unless I put them right at the start they don't get run (because they're after the first existing failure), and in THEORY there is some sort of intelligible order to these tests (which probably vaguely reflects the order things are mentioned in the bash man page? Maybe? If nothing else, collating the HERE document tests and the flow control tests and the $((MATH)) tests and the variable resolution tests and the line continuation tests and so on...)

Then, logically, I'd take my various unfinished shell work branches and turn each one into a patch to review and finish like any other submission. Presumably reducing the number of outstanding branches.

My normal workflow also produces a zillion tangents, most of which are local stack push/pop that resolve themselves, but not always. For example reading through the HERE document variable resolution logic I just removed the "delete" argument from expand_one_arg() and made it clean up after itself instead, so it passes a local deletion list to expand_arg() and then frees every entry that isn't the string it's returning. All the existing callers except one were passing NULL for the list, and then freeing the returned string when they were done with it if it wasn't the same string they'd passed in. Which unfortunately leaks memory these days, because variable resolution got more complicated and I added stuff like $((MATH)) which produces intermediate results that go on the deletion list, because it's not immediately clear what their lifespan should be. (When variable resolution produces a result it uses it until it replaces it. When we're not sure there are no other users of the thing we're replacing, the deletion list lets us defer the free until we are sure. It also lets us mix copied and original data in other data structures: the delete list lets us know what to free. At various points, like exiting a flow control block, we know there can't be any users left, and we can traverse the associated deletion list. It's kind of very manual garbage collection, and the shell logic is full of it.)
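Boiled down to a sketch, the concept looks like this (NOT the actual sh.c plumbing; arg_list is the singly linked list from lib/lib.h, reproduced from memory):

#include <stdlib.h>

struct arg_list {struct arg_list *next; char *arg;};

// Free every string on a deletion list except the one being returned.
char *expand_cleanup(char *keep, struct arg_list *delete)
{
  struct arg_list *next;

  for (; delete; delete = next) {
    next = delete->next;
    if (delete->arg != keep) free(delete->arg);
    free(delete);
  }

  return keep;
}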

I need to enable ASAN's memory leak detector so I can see if there's more stuff like I just fixed, and ALSO come up with test cases to see if it fails the other way: if a "while true" loop adds entries to a deletion list that aren't freed each time through the loop, memory can fill up until the OOM killer goes boink. It's ok to batch the frees, but not let them accumulate endlessly. If I hadn't just been able to audit all the callers of the function whose semantics I was changing to make sure each one properly freed the returned value that ISN'T going on a deletion list, I would have left myself either breaking or non-breaking TODO entries in the code where I didn't have the spoons to traverse and prune that logical branch just now. A non-breaking TODO entry is a comment with TODO I can search for later, things like possible memory leaks or missing features that don't stop me from testing it: it WORKS, it's just not RIGHT. Current examples from sh.c include "// TODO ctrl-Z suspend should stop script" and in syntax_err() "// TODO: script@line only for script not interactive". (I.E. the error message shouldn't include $LINENO when you're typing individual commands at a shell prompt.) A breaking TODO entry is an UNCOMMENTED specific nonsense word thrown into the code so I can search for it, and also so trying to compile the code will point out the line number where I haven't fixed a thing yet. That's a "must fix this before checking in" indicator.

Unfortunately, the new issues people are submitting to me when they try to use the shell, and the issues that come up talking to Chet, are not remotely in test suite order. They're GOOD INPUT, I'm happy to have them and want to fix all of them ASAP, but they're random potshots from left field, and there's a certain amount of drowning. This is a deficiency on my part, I know, and yet. I'm no longer an overcaffeinated 25 year old. These days I do 2 hours of programming and have to stand up from the keyboard. Well, ok, I still occasionally look up and 6 hours have passed and I had no idea, or my laptop suddenly suspends because the battery ran out. But getting into the zone like that is harder than it used to be. Spinning multiple plates leaves me with the nagging feeling that focusing on anything is letting down everything _else_ I should be spending the time on instead.

I'm very grateful to Android for letting me focus on toybox, and winding down the Japan stuff was a choice clearing more time for that. (Admittedly Mike put his thumb on the scales there.) But it's opportunity cost as far as the eye can see. Android wants a hermetic build but I want to go beyond that and make AOSP fully self-hosting. Making a self-hosting system has kernel and toolchain work too. (I have RESISTED poking at the kernel's nolibc or similar projects: the C library is NOT MY AREA. Even though I'm pretty sure I could write "just enough for toybox" in a couple weeks and need to go there at least somewhat to make strace work right... Ahem.) My old busybox work was driven not just by aboriginal but by building Linux From Scratch under the result, and I haven't poked at that in months (there's a new version out already). I need to get a toybox release out, but SO MANY TABS that I could _almost_ close and get in... I should make travel arrangements for the mkroot talk in taiwan, and mkroot currently semi-assumes a kernel patch stack I haven't updated or tried to chase upstream in a while, and there's QEMU work I should do: the -kernel loader in half the targets can't boot from a vmlinux, there's an outstanding mips patch, my "simplest possible linux system" talk in 2017 included some hello world kernel examples and I have a pure C one now that I really should genericize for the different QEMU architectures and explain about stage 1 vs stage 2 bootloaders (the difference is DRAM init and relocation out of read-only memory and SRAM, Wolfgang Denk's refusal to make dram init optional is why you couldn't run u-boot under QEMU for a long time), and in THIS tab is where I was doing builds of each historical QEMU release until I found the last one that could actually boot Linux 0.0.1 and figure out what they broke... (If I redo the simplest possible Linux system talk I wanted to show that...)

Ahem. Tangents. Focus. People send me bug reports, I prioritize them, but getting to the far end of long-term plans while supporting a userbase with conflicting needs turns out to be hard.

I brought an umbrella to the table this time, and then half an hour later there was MUCH LIGHTNING. Umbrella does not protect against lightning, and all the buildings are closed due to the summer. (It's between summer sessions so the university is shut down fairly hard at the moment.) Packed up and walked back home again before the storm could actually reach me, but the evening did not produce the block of productive work time I wanted...


June 8, 2023

Got the big lump of shell work checked in and now I have many, many tabs open with tests in them that I should deduplicate and marshal into tests/sh.test, plus I need to reply to pending email from Chet. And at some point come up for air and look at what I've been ignoring in github requests and so on while I've been head down on this thing. Except... I'm still doing HERE document variable expansion wrong.

I watched some "first time seeing" reaction videos to Moana, and the comments pointed out a bunch of things I hadn't noticed (the heart of Te Fiti was keeping the grandmother alive, she passed it off knowing she would die; half the music is done by a New Zealand group called "Te Vaka"; and they linked to some other good details).

But I ALSO noticed that Lin-Manuel Miranda saw the Disney "Heroes get I Want songs, villains get I Am songs", and gave Maui and Tamatoa their "I Am" songs, but started Moana receiving a "You Are" song from her father (which she had to work to overcome), then she got her "I Want" song (How Far I'll Go), then the ancestors sang a "We Are" song at her, then after she rejected the call and got grandmothered back onto the path she had a song literally ending with "I Am Moana", and then at the climax she sang a "You Are" song. At God.

This was very "hold my beer" of him. (And in Anime terms, instead of fighting god and killing them, you fight god and defeat = friendship. Percussive maintenance factory reset, returning fire to the gods...)


June 7, 2023

There is apparently some sort of National Smoke Emergency, like the 1930s Dust Bowl except fire instead of airborne topsoil blowing away. The topsoil blew away because steel plows were a marvelous new technology with no possible downsides, and nobody cared what the native americans who had been terraforming this "lush wilderness" for tens of thousands of years thought about sustainability. They organized forests instead of orchards, herds of semi-tame bison you didn't even have to hunt, and salmon runs where you could scoop up baskets full of fish. Since nobody OWNED that stuff, it must have just _happened_ instead of being the exact opposite of "the tragedy of the commons", which appears to be a specific failure mode of the british upper classes (just like "lord of the flies" was). Some cultures really do work for the collective good and look out for their fellow human, and others steal anything that isn't nailed down, mug children, rape women and say "she was asking for it, look how she was dressed" afterwards... Did you know europeans defected to join the native americans on a regular basis? Just walked away from "civilization" to join the superior culture. It was called "going native", and was hushed up by the rich landowners it embarrassed. There's a reason rich slaveowners aimed so much genocide at cultures that DIDN'T orbit around exploitative capitalism: the side by side comparison made them look so bad quite a lot of people switched sides.

Anyway, having a smoke-heavy cookout last night was... bad timing. Fuzzy still has a sore throat, and I've had a headache all day.

Alright, what's still broken in the backslash and HERE document change... the <<< operator isn't adding a newline to its output. The line continuation can of worms...

A P.S. I removed from a mailing list post because it waxes unfortunately political: It would be nice if there was a janitorial community that made cleaned up versions of simple tools WITHOUT turning into survivalist preppers. I don't want to poke the git devs about bug du jour any more than I want to poke the kernel devs about my quarterly patch list, but when I look around at groups like "suckless" or "less wrong" they somehow manage to fail in the _other_ direction. Libertarians tearing down society, atheism becoming a religion (firm belief in nothing is still firm belief: zero is a number) that funnels people into incel nazi spaces. What's the old Nietzsche quote, "you become what you fight"? I want something like XV6 that could actually be a sustainable auditable load bearing base layer with no external dependencies, without falling into either the microkernel trap or having the "minix problem" of refusing to be real-world useful because it's "just a teaching tool". Yeah, it's hard to figure out where to draw the line, but I keep seeing posts about "software manifests" from corporate types and going "you're SO CLOSE, you could REMOVE dependencies and actually SIMPLIFY..." But no...


June 6, 2023

Another HERE document corner case is that toysh is parsing 'cat<<EOF' as one word, and bash is making it three words. Which is possibly related to "echo abc<(true)" outputting "abc/dev/fd/63" in bash? (Yes I found a use case for it: if /dev is mounted in a subdirectory. Some horrible thing I was doing with a chroot I think.) So multiple words, but not with SPACES between them. It's doing redirects according to variable expansion logic. And then there's:

$ echo 1<2
bash: 2: No such file or directory
$ touch 2
$ echo 1<2
bash: echo: write error: Bad file descriptor
$ ls 1
ls: cannot access '1': No such file or directory
$ echo abc<2
abc

The redirect prefix logic (apparently?) only triggers at the start of a word, so that 2>&1 stuff has to be its own word. And yes that includes the abc{def}<2 assign-to-variable prefix, still only triggers at the start of a word. So I was partly right and partly wrong. I THINK what I need to do is move the variable expansion logic out of expand_redir() into its own function, and then both call it from there (handling the prefixes) and from expand_arg_nobrace()? Except expand_redir() gets an unredirect list (well, resizeable double entry array) as an argument, and expand_arg_nobrace() does not. And in quite a number of the contexts we expand arguments FROM, redirection is not an appropriate operation. Urgh, I REALLY dowanna add another argument (it's got six, this is turn it into a structure time) and another NO_BLAH flag.

Ok, I have two sets of code that are traversing the nested quote contexts: parse_word() and expand_arg_nobrace(). (Which includes backslash escapes, as echo 1\<2 outputs 1<2 instead of redirecting, and echo 1\<<2 redirects from the file "2" instead of a HERE document, and yes I should have tests...)

So should parse_word() break an unescaped < into its own token (which means we lose the "was there a space before this" information for "echo abc<(def)" but I already knew we were getting that one wrong), or try to push redirection down into expand_arg_nobrace() which means it has to feed back the undo information to its caller? Meaning I'd need to change all the callers, and then an error path if you try to redirect somewhere we passed in a NULL for urd? Urgh, I just wanna get to a point I can check in the trailing backslash rewrite. I'm not trying to find MORE reasons to do major surgery...

Alright, parse_word() is already using the redirectors[] list, it's just doing so only at the start of a word. It needs to do so at any unquoted point within the word (breaking like parentheses would). The reason I DIDN'T is the skip_redir_prefix() stuff only happens at the start of the word, so yes there are two instances of redirection detection that behave slightly differently. I can have parse_word() split at recognized (unprefixed) redirectors, and declare the abc<(blah) case a known divergence from bash. Not happy about it, but the alternative is an unbounded amount of rewriting...

Fuzzy bought a Red Snapper (very tasty, according to UHF), and we tried to grill it over a fire but an hour and a half of fire building didn't result in noticeable coals: everything's too wet, and too big. The kindling left over from last year decomposed noticeably since we collected it. We got a lot of smoke but not a lot of fire. Even with a bunch of newspaper under it, the wood hissed constantly.

Eventually Fuzzy pulled the charcoal briquettes out of the shed and cooked the fish in the actual charcoal grill. It was indeed very tasty. Both of us have sore throats from the smoke.


June 5, 2023

While walking to the table at UT last night, I got some news that at least gave me closure on the projects I was visiting Japan for, and let me uninstall Signal from my phone. Now I'm wondering about my attempts to learn japanese, which have turned into a large anime to-watch queue since "walk for exercise, exposure to japanese, and entertainment" is more compelling multitasking than most of my other viewing options. I would still LIKE to understand japanese, but no longer have a specific use for it. (I mean, I can go vacation there, but I don't really KNOW anybody?) The talk in Taiwan is now also just a round trip to give a single talk, and then straight back home again. More time spent traveling than at the destination, I think.

I think I've worked through the trailing backslash line continuation changes, but the HERE document processing is still broken, maybe even MORE broken now: the backslash changes broke EOF token matching because it's got a newline on it now, and echo -en 'cat<<X\nabc\nX' should match without a trailing newline on the X, yes I need to add a test...
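
That test would presumably look something like this (expected output from memory, needs confirming against actual bash before it goes in the test suite):

$ echo -en 'cat<<X\nabc\nX' | bash
abc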

The problem is, do I check in the backslash continuation changes by themselves, even if it causes regressions elsewhere? I'm sort of working on one big lump of stuff that it's hard to break into chunks, which means it's hard to get good stopping points where I can check things in. But that's how I wound up with so many orphaned development branches that never quite made it into the main line of development.

I got caught in a thunderstorm on the way home. Note to self: googling "austin weather" still gives an hour-by-hour expected precipitation timeline in the Google search result, but it is COMPLETELY USELESS. It said less than 10% chance of 0.01 inches of precipitation all night half an hour before I left the table, and then there was a DOWNPOUR with lightning before I even made it off campus. So I stood under a small awning for 2 and 1/2 hours, as the weather alternated between "merely raining" and "water is running down the brick wall on the INSIDE of the awning and splashing on me from that side due to electrical cables for the lighting above the door". And of course I had my new laptop in my backpack, and was holding my phone under my chin, hoping neither got soaked through enough to short out. (Yes, this ordeal started about half an hour after I was pondering whether or not to check in those changes. I had plenty of time to ponder the irony.) Eventually got enough of a gap I could run to a parking garage with better roof coverage, and at least sit down for a while.

Got home after sunrise. My sleep schedule is totally borked.


June 4, 2023

I'm encountering comments on a daily basis by everyone from famous authors to my own wife about how Google's services are deteriorating, and it worries me. I don't even bother to track the youtube complaints anymore, although at least they walked that last one back. Their "let's delete the majority of all youtube videos ever posted" policy got modified a few days later when enough people pointed out to them what they were about to do. Now they're just going to delete grandma's photos of her kids and people's school records and so on. (Never trust any online service to retain your data. Once they've sold it to advertisers they don't care anymore. You are the product, not the customer.)

I would miss Google. I really really REALLY don't want to be forced to use services from Microsoft, Apple, Amazon, Faceboot, or any of the other "walled hellscape" model late stage capitalist nonsense. Google has its faults, but it historically started from "don't be evil" and whatever its trajectory since then, its competitors have all been boiling the same frogs from a far worse starting point, without Google's history of employees pushing back. But that was before Google laid off 12k people at the start of the year, at least some of whom appear to have been load bearing. (They're blaming the search degradation on large language models producing extra SEO spam, but that doesn't explain why they stopped being able to find 10 year old resources that hadn't changed location. As with youtube, they valued "new" over "old" until the established stuff became unfindable.)

Capitalism destroys. It consumes like fire. And late stage capitalism is where it's eating _itself_, in some weird attempt to profit-via-potlatch. When HBO got bought by an octogenarian right-wing billionaire and started deleting its own catalog, I thought that was dumb. But now Disney is doing it. I had "little demon" bookmarked as a series to watch on Hulu (Episode 1 summary: "Chrissy Feinberg's first day of seventh grade goes south when she discovers she's the Antichrist") and it's gone now. DISNEY OWNS IT (when they bought Fox they wound up owning both FX networks and Hulu). It was hulu-exclusive content created by Hulu's parent company, which is now available nowhere. Ka-ching? Youtube declaring their intention to join that parade wasn't _surprising_, but they DID walk it back. Google does still listen, at least sometimes. That's why their core business rapidly and obviously deteriorating worries me: they would be _missed_. (And yes the whole apple vs android thing I've been pushing for over a decade now too.)

Sigh, a common failure mode with petty criminals is smashing a thousand dollar plate glass window to steal a hundred dollars of display items behind it. The billionaire equivalent used to be strip mining and slum lords, but now they've moved on to "imposing disproportionate externalities" upon the intellectual property world. And when you dig for reasons, "some billionaire's personal fee-fees" motivate far too much. The general consensus around why the batgirl movie was destroyed (completed but not released) is that the actress was black and the billionaire who bought it was racist. Disney's purge is attributed to the writers' strike, and the desire to punish the strikers by ceasing to pay residuals. Of course residuals are a fraction of the money they MAKE from the program, so they have to give up a big share to deny other people a small share. A tiny fraction of hurt goes to people who annoyed them, a bigger hurt goes to themselves and their customer base. But of course Disney was one of the first and worst offenders here, inventing the legal fiction that they could acquire properties like Star Wars without acquiring the obligations that came with them (to pay royalties to authors) and then using their size to stonewall starving artists until they finished starving.

The "Citizens United" decision is 13 years old, Mitt Romney's corporations are people was a year later (implying he should have faced murder charges for Toys R' Us), and they've packed the supreme court since until it's completely dysfunctional. The nominal opposition is literally senile, meaning this is unlikely to end without guillotines, which can only happen over the Boomers' dead bodies. And so we wait. Opinions vary on how long we wait, but we all know how societal change works, it "progresses one funeral at a time". I would LOVE to be wrong about that.


June 3, 2023

Sigh, I'm trying to redo the toysh line input processing. The trailing \ logic is different from what bash does (ongoing thread on the list with chet about that), and I _also_ have a hack I've been meaning to clean up: because parse_line ignores completely blank lines, I'm having EOF feed in single space " " lines to flush pending line continuations, which is wrong for multiple reasons. For one, toybox sh -c 'echo hello\' has a space on the end instead of a backslash, which is TWO bugs (\ gets eaten, space gets added). For another, you can have MULTIPLE pending line continuations, and a single EOF line won't necessarily flush them. Except we return 1 when we need another line, and there are different reasons for needing another line, which behave differently: unterminated if or || flow control errors out, but unterminated HERE documents are terminated by EOF (with a warning in bash, silently in the defective annoying shell). And you can have more than one HERE document pending at the same time: cat << EOF1; cat << EOF2 or even just cat << EOF1 << EOF2 (no they don't append, stdin gets redirected twice so the first one is dropped, but 3<<EOF would let you read from fd 3... Oh, and HERE documents are seekable, I should add a test for that. It writes to a deleted temp file to get a seekable filehandle that frees its contents automatically when closed. Classic unix filesystem semantics...)
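
A transcript of the doubled HERE document case worth pinning down in a test (bash behavior from memory: both bodies get consumed from the input, then the second redirect of stdin wins):

$ cat << ONE << TWO
> first
> ONE
> second
> TWO
second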

My line input logic was removing trailing newlines right at the start, but I can't do that because an escape at the end of a line that was NOT ended with a newline gets preserved. (Well, not RELIABLY by bash, but I poked Chet about that. The -c processing is still magic.) So now I've got to propagate that \n through and it's essentially trailing whitespace which I'm already mostly handling, but something somewhere's likely to break. Plus NULL pointer, empty string, and strings that only contain whitespace being DIFFERENT is why that "send in a line with a space in it" hack happened in the first place...
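
The corner being preserved, for test suite purposes (this is "what my bash did", not gospel, given that Chet's been poked about it):

$ bash -c 'echo hello\'
hello\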

I'm also patching the HERE document logic to terminate all outstanding HERE documents at EOF. It can still return "nope, I need more" for unterminated flow control, at which point the caller errors out because there is no more, but the caller can't distinguish "need more HERE document lines" from "need more flow control logic", it just returns 1 to ask for another line, so it has to do it within parse_line(). And I've had to add multiple goto statements to get it to work because the existing logic really isn't set up to turn into a loop. Multiple gotos are not elegant, it means there should be a loop here which would require major surgery to insert...


June 2, 2023

Multiple people are now trying to use toysh and sending me bug reports, but what I _really_ need to do is grind through the "ASAN=1 make test_sh" bugs because every time I hit an issue or major todo item I try to throw a test in there which bash passes and thus toysh probably _should_. And there are a whole lot of existing tests toysh doesn't pass yet, which makes adding new ones awkward. (I keep sticking them near the start so they trigger, but there should be some logical order to all this...)

The next test_sh failure is a double free when a command comes after a HERE document, ala ASAN=1 make sh && ./sh -c '<<0;echo hello' which did print the hello! It didn't warn that the HERE document hit EOF, but I can presumably add that.

ASAN says the second free happened on line 2923... which is in the function free_pipeline() so yes it would, wouldn't it? This being gcc, it doesn't say who CALLED that function, because gcc's ASAN is crap. And I can't use the Android NDK's ASAN because it's only available as a dynamic library, and if I dynamically link against bionic it's not available on the host so the binaries won't run. And you can't LD_LIBRARY_PATH your way around the dynamic loader being /system/bin/linker64. Elliott suggested I could symlink /system to somewhere in the NDK, but find android-ndk-r25c -name linker64 produced zero hits.
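
What sometimes coaxes better free/malloc stacks out of ASAN is turning off the fast frame-pointer unwinder at allocation sites. These are generic sanitizer runtime knobs rather than anything toybox-specific, and whether gcc's libasan honors them as gracefully as clang's does is exactly the open question:

$ ASAN=1 make sh
$ ASAN_OPTIONS=fast_unwind_on_malloc=0:malloc_context_size=30 \
    ./sh -c '<<0;echo hello'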

So backing up, the first free was in do_source() which calls llist_traverse(pl, free_pipeline) after run_lines() returns. Because we've executed all the stuff and it's done with it now... Ah, and then it frees the HERE document but the last entry in that is the EOF which was one of the arguments to the earlier command line that already got freed. (Because why copy it when we're already holding a pointer to the string? Except now two lists each think they own it, hence the double free.)

I need to enable the leak detector, and whitelist the EXPECTED leaks at exit. I know how to do the first, not sure how to do the second. Other than writing a debug function to laboriously free stuff the OS is about to free for us. I'm worried about accumulating leaks during long runs, not blocks of data with the same lifetime as the process. I want some sort of leak_forget() function that says "anything that's already been allocated is not interesting for leak detection, only show me NEW allocations after this point that don't get freed".
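
The closest existing knobs I know of are LSan suppression files (which match function names appearing anywhere in the allocation stack; the pattern below is a made-up example, not an actual toysh whitelist), plus a __lsan_disable()/__lsan_enable() bracket in sanitizer/lsan_interface.h, which ignores allocations made WHILE disabled, which is backwards from the leak_forget() semantics I actually want:

$ cat > lsan.supp << 'EOF'
# ignore "leaks" whose allocation stack passes through do_source()
leak:do_source
EOF
$ LSAN_OPTIONS=suppressions=lsan.supp ./sh -c 'echo hello'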

Alas, gcc's ASAN is abandoned crap, and the LLVM toolchain I have wants dynamic bionic installed on the host, and will NOT work static linked (for no obvious reason other than they either didn't think of it or didn't bother). So puppy eyes about adding stuff would add it to a context I can't use anyway.

Hmmm, I should try getting a dynamic bionic chroot working again. In theory the stdin panic fix in the _start code has made it in to the release version by now? (I should really learn to build the NDK from source. Too many tangent ratholes...)


June 1, 2023

Email from Chet: my trailing backslash line parsing is wrong in toybox (or at least doesn't match bash). I knew my line parsing was wrong and that I'd have to redo it, but it turns out it's wrong in more ways than I was aware of. Hmmm...

Also bash -c 'cat<

Oh, and no matter how you fiddle with the priority, HERE documents always seem to eat their lines before line continuation logic does:

$ if cat << EOF; then
> blah
> EOF
> echo hello; fi
blah
hello

Which toysh is already getting right, but I want to make sure I have tests for. And also:

$ if [ $(cat) == blah ]; then echo hello
> fi << EOF
> blah
> EOF
hello

I.E. your REASON for requesting line continuation can vary from line to line, based on parsing the new input. And that $(cat) can't be evaluated until the trailing redirect has replaced stdin for the whole block, which I'm already getting right but the new changes can't break that, hence regression testing...

I mean, if you REALLY want to go down the rathole here:

$ bash -c 'echo $LINENO'
0
$ bash -c $'\n\n\necho $LINENO'
3
$ echo 'echo $LINENO' > weeb
$ bash -c '. weeb;. weeb;echo $LINENO'
1
1
0
$ bash -c $'. weeb;. weeb;echo $(eval $\'echo $LINENO\\necho $LINENO\');echo $LINENO'
1
1
1 2
0
$ bash -c $'. weeb\n. weeb\necho $(eval $\'echo $LINENO\\necho $LINENO\');echo $LINENO'
1
1
3 4
2
$ bash -c $'. weeb\n. weeb\neval $\'echo $LINENO\\necho $LINENO\';echo $LINENO'
1
1
2
3
2

That's why each "do_source()" has it's own pseudo-function context, because LINENO is often a local variable even without a function call, which sometimes gets reset and sometimes gets inherited as you enter/exit each new parsing context, and I need tests for all of it...


May 31, 2023

Flew back to Austin first thing in the morning, early enough in the day I could hang out with new/old laptop at Wendy's and the HEB tables.

Finally got the ls --sort tests in, and fixed more than one bug found by them.

For example --sort can handle csv arguments (in toybox, not in gnu/dammit), but feeding it more than one sometimes looped endlessly: once an earlier sort type has matched you don't do further comparisons, which meant the parsing wasn't advancing past the arguments it wasn't processing, but it still has to CHECK those later arguments to see if there's a "reverse" in there, so it was looping without advancing... (There's also "unsorted" which stops argument processing despite not having matched, but I can't TEST unsorted because it means the filesystem order leaks through and I can't control what that IS)...

Yesterday the dentist said the tooth should come out, and responded to my "it doesn't hurt" with a poke with a tool demonstrating that the nerve is alive and well and not protected by much and capable of hurting a VERY LARGE AMOUNT QUITE SUDDENLY.

This would be the SEVENTH tooth I've lost. (Four wisdom teeth and two on top next to the incisors removed presumably for cosmetic reasons as part of the braces years ago? I had a somewhat pronounced overbite as a teenager. My parents were really making those decisions at the time, I was in early high school.)

He didn't quite come out and say it, but when my wisdom teeth were removed this tooth was left without a matching tooth to chew against, and that's apparently bad for teeth. Which means I'm losing this one because they took out too many wisdom teeth back in the day. Those two removed in the front meant the braces shifted the rest forward, which made room to KEEP the top two wisdom teeth (which I pointed out at the time) and the dentists went "no, you don't want to leave a tooth with no matching tooth for it to work against"... but they DID, didn't they?

Add in the TMJ I got because the braces for some INSANE reason involved a rubber band from my upper left to the lower right (across my tongue) for 6 months, and this means basically every major experience I've had with american dentistry has caused future problems. The braces made my jaw click and grind, the previous tooth removals left me with an orphan tooth that's now collapsing, and the 2013 experience in St. Paul left two of my front teeth looking obviously terrible. The cost was probably somewhere around ten grand each time (if not more adjusted for inflation)...

Anyway, had him grind off the pointy bit (which didn't hurt for about five seconds and then essentially repeated the first poke; he is a VERY good dentist to not have done further damage inside my mouth when I lurched like that, but hey: my cheek can heal now). And then I scheduled a follow-up appointment for a month from now. I was thinking of flying back to dogsit while Fade's at 4th street anyway. (Previous years she flew back from Austin and left Adverb with us, but she's staying in Minneapolis this summer to finish her dissertation so she can defend it in August, which is why I'm flying up to see her so many times this year.)


May 30, 2023

I may have gone a little overboard. Somebody emailed me to track down a citation, and I replied with my usual "THIS IS A SPECIAL INTEREST OF MINE" enthusiasm:

> Is this yours?
> https://landley.net/history/mirror/cpm/history.html
> I'm citing some of it in a book I'm writing and I wanted to make sure it was you.

As the "mirror" states, it's a copy of an old page from geocities. Here's the original pulled out of archive.org.

Back in 1984 Gary Kildall was one of the original co-hosts of the TV show "computer chronicles" (until he became too busy with his company to continue). Here's an episode he co-hosted on "programming languages", and here's an episode on "operating systems". When Kildall died, the show did a retrospective on him.

Here's the "standard" interview with him (I have a copy of this book). And here's another computer industry pioneer reminiscing about him.

> Also, Im going a little beyond what's on that page, and might you be able to
> confirm it's (more or less!) accurate, please?
> Thanks
> [NAME]
>
> In 1974 Gary Kildall, co-founder (with his wife) of Digital Research, personally
> created CP/M, which became the standard operating system for 1970s personal
> computers.

You should really watch the PBS series Triumph of the Nerds (which is based on the book "Accidental Empires", the presenter is the book's author).

CP/M only became the standard operating system for "S-100" systems. (Here's a song Frank Hayes, columnist for ComputerWorld, wrote/performed about the S-100 bus. Yes, it's a "C shanty". From the album "never set the cat on fire".)

The "PC vs Mac" of the day was Apple II vs S/100 systems (which started as clones of the Dec Altair: MITS manufacturing couldn't keep up with demand but they shipped a full schematic with every system were using off the shelf parts, so other people bought the parts and assembled them according to the schematic, and then started making improvements).

The company "Imsai" (that's the computer the protagonist of the movie "Wargames" had in his bedroom) convinced Kildall to break his OS into two parts (BDOS and BIOS, Basic Disk Operating System and Basic Input Output System), with the BIOS essentially being a driver package provided by the hardware manufacturers so the same BDOS could talk to disk and console hardware. That way, ALL CP/M machines could run from the same floppy disk, rather than having separate disks for each manufacturer.

All that was 8-bit, and since 1979 Kildall had been chasing multiprocessing (MP/M) as the next big thing (about 20 years too early, the cost of memory was a big limiting factor so at the time running multiple programs in parallel on the same machine wasn't _that_ much cheaper than buying multiple machines and networking them, although S-100 systems didn't have a "motherboard", memory expansion was on cards (which you can keep adding as long as you have slots), and even the CPU was on a card, so having a multi-processor system with two CPU cards wasn't a far-fetched idea, the trick was making it WORK...) so he basically ignored the 16 bit 8086 for the first couple years.

But a guy named Tim Paterson at Seattle Computer Products was working on a new 8086 board which was intended to run CP/M, and since DR hadn't shipped it yet he bought an off the shelf CP/M manual and implemented 16 bit versions of the system calls it listed so he had something to test the hardware with, calling the result "QDOS" (a play on BDOS, Quick and Dirty Operating System).

Tim had previously worked a summer job for Microsoft where he created their first hardware project (a Z-80 processor card for the Apple II, allowing it to run CP/M instead of the DOS Steve Wozniak had written), and when Paul Allen realized that IBM's project Acorn was basically a 16 bit CP/M machine he and Gates threw $50k at Tim (split with his employer) to buy QDOS from him, which they renamed "DOS 1.0"...

> But the first versions of CP/M, like the early personal computers,
> had very limited functionality: the first version merely supported
> single-tasking on 8-bit microprocessors and no more than 64 kilobytes of memory.

8-bit machines all had only 64 kilobytes of memory, and hacks like "bank switching" historically never made much difference. CP/M was about the best you could do on that generation of hardware. Paul Allen thought the PC that IBM was developing could do better and wanted to run Unix on it, so he licensed Unix from AT&T and contracted a small 2-man garage outfit called SCO (the Santa Cruz Operation) to port it to the Intel 8086 and Motorola 68000 processors (because IBM hadn't decided which it would go with yet), and called it "xenix" to indicate "we'll port it anywhere IBM needs it to go".

Then when they signed the NDA and got the hardware specs of the original IBM PC (IBM wanted to put Microsoft's BASIC in ROM as the PC's built-in software, like the Commodore 64 and so on, and IBM's CEO, who sat on the board of directors of the United Way with the mother of William H. Gates III, "trey" to his friends, made an exception for "Mary's Boy": Microsoft was too small to qualify as an IBM vendor normally. That part is in the book "Big Blues" about the history of IBM, by the way)... Anyway, the IBM PC specs read "16k of ram expandable to 64k if you pay extra, and the ISA bus is just the S-100 bus with unused wires removed" (as in there were literally adapters that plugged the bigger cards into the smaller slots, no electrical or timing fiddling required, just shifting wires over)...

Paul Allen went "oh: you're going to run CP/M on it". But he and Gates had already expanded their ambitions to sell a bigger OS to IBM, and Gates said he knew Kildall and offered to set up the meeting with IBM, whereupon SOMEHOW Kildall got the impression that the meeting was in the afternoon but IBM got the impression that the meeting was in late morning, so Kildall was off at the airport flying his airplane (cessna probably?) to cool his nerves, and when the IBM guys unexpectedly showed up at his house (he worked from home) his wife paniced and called Gates who suggested that the company lawyer look over the NDA while Kildall got back from the airport, and as lawyers do he went "ew" and started negotiating terms, and since they refused to sign it as is the IBM guys went away empty handed before Kildall even got back from the airport, and the whole meeting was set back weeks...

Which gave Paul time to contact Tim Paterson and scrape up $50k to buy QDOS and offer to be IBM's "second source" with "their" 16 bit CP/M clone (filing off the Q and renaming it MS-DOS). IBM did the PC after their salesbeings saw Apple II running Visicalc on the secretaries' desks when they went to meet with executives in otherwise pure IBM shops, and after allowing Digital Equipment Corporation and the PDP-1 to live (creating the minicomputer ecosystem) they vowed NEVER AGAIN. They estimated they had a year to flood the market and smother Apple before it got entrenched, but an internal process audit had just measured it took them 9 months to ship an empty box, so they had NO TIME to make the new product and get it to market. The head of the Boca Raton department offered to make one out of off the shelf third party parts they could order in volume with a phone call, which is NOT how IBM normally did things but this was an emergency and the CEO personally granted absolution and indulgences to the Boca team. IBM was a monopoly used to squeezing customers, so they carefully made sure none of these new suppliers could ever use monopoly leverage against IBM, by ensuring there was a second source for EVERYTHING, with their one unique contribution being the BIOS ROM (the thing Compaq clean room cloned). They hadn't been second sourcing the software (just the hardware), but hey: good idea! Another CP/M, sure thing.

Meanwhile, Kildall was a Navy instructor before he started Digital Research, which meant he knew about being a vendor to big bureaucratic institutions, and wasn't really keen on going there. It's lots of money, but most of it's pie-in-the-sky someday money after jumping through lots of hoops and years of delay, and to get there you need a dozen full-time staff just to navigate the bureaucracy. His company was a couple people running out of his house: he'd take free money if IBM offered it, but he already had a CP/M ecosystem built around the stuff he was already selling to existing customers, and this new thing was either part of the S-100 family or it wasn't.

So when the PC shipped, CP/M-86 was late, and when it arrived it cost several times what Microsoft priced DOS at. But the real nail in the coffin was that Paul Allen didn't give up on his dream of having this machine run Unix. Each new release (PC, XT, AT) could support more memory, and the 8086 processor could physically address up to a full megabyte. (The DOS 640k barrier was because they'd arbitrarily mapped I/O memory starting at 10x the original PC's 64k memory capacity: the bottom 640k of the megabyte for RAM, the top 384k for I/O memory space. You had to move the VGA card's memory window in order to use more contiguous address space in your application, and even then you don't get ALL the space because I/O memory is still needed.)

DOS 1.0 was a bug-for-bug clone of CP/M (well, a 16-bit port of an 8 bit system, but otherwise identical). But for the DOS 2.0 release, Paul Allen added as many Unix features to MS-DOS as he could. You could now use filehandles instead of file control blocks, and stdin/stdout/stderr were filehandles now. He added unix-style subdirectories, although DOS 2.0 used "\" and "/" interchangeably because "dir /s" was how CP/M had indicated command line options, so DOS 2.0 let you use both "dir /s" and unix style "dir -s" with the / version deprecated, but he couldn't quite REMOVE it yet, so the syscalls supported both directory separators. And he publicly announced that a future DOS version (hand-wiggle maybe around 4.0) would just be Xenix with a DOS emulation layer for old programs. You'd need something like 256k of RAM for it to be worth it, and hey: you'd get multiprocessing for free. (Remember how Kildall was doing MP/M? Maybe not THAT crazy. For reference, IBM announced its "Topview" multitasking graphical desktop for DOS in August 1984, and the first version of the Desqview multitasker for DOS shipped in July 1985. If 8 bit systems max out at 64k, a 16 bit system with 128k of RAM running 2 of those 8-bit programs at once sounds pretty feasible...)

The new unix features in DOS 2.0 made it a way better programming environment than CP/M-86, so it wasn't just cheaper now it was BETTER, and CP/M-86 receded from use on the IBM PC. (And clones, Compaq had happened by now. The reason the IBM PC took over the world and the Apple II didn't is that when IBM sued Compaq they lost, but when Apple sued Franklin they won: https://en.wikipedia.org/wiki/Apple_Computer,_Inc._v._Franklin_Computer_Corp. That was the legal decision that extended copyright to cover binaries and thus invented "shrinkwrap" software, see also the 1980 Audio interview with Bill Gates (mp3 and transcript both linked from https://landley.net/history/mirror/#:~:text=1980%20audio ). The GNU project, IBM's "Object Code Only" announcement, and AT&T's post-breakup commercialization of Unix were all responses to Apple vs Franklin...)

IBM's competitive focus on Compaq and the hardware clones distracted it for years from the fact it had lost its second source competition on the operating system side when DOS 2.0 rendered CP/M-86 irrelevant. IBM shipped its own PC-DOS and Digital Research eventually came out with DR-DOS, but by then Microsoft was doing "CPU tax" contracts with motherboard manufacturers (see the 1995 antitrust trial under Judge Sporkin), and used aggressive bundling (buy X get Y for free, and you can't NOT buy X) to promote Windows and Office... But I'm getting ahead of myself.

Two things happened to derail the dos->xenix move:

1) the IBM PC/AT (developed in 1983, shipped August 1984) added a hard drive, so the DOS 3.0 release was mostly about adding hard drive support (the C: drive) rather than furthering the convergence with Xenix.

2) in 1983 Paul Allen came down with Hodgkin's lymphoma. (That's the same cancer Hank Green just got. It's one of the most treatable forms of cancer, but it IS cancer, and can totally kill you.)

Nobody initially knew WHY Paul Allen was so sick (looked like overwork during the DOS 3.0 crunch), but Paul Allen owned 1/3 of Microsoft's stock because Bill Gates was an asshole: they originally wrote BASIC for the MITS Altair, and the owner of MITS offered Paul a job working at MITS. When incorporating Microsoft Gates insisted he have 2/3 of the stock and Allen only 1/3 because Gates would be working at Microsoft full time and Allen only part time due to his job at MITS, and Allen agreed... and then immediately after that was signed, Gates asked Allen if he could get him a job at MITS. As I said: asshole.

But the ultimate asshole move was that while Paul Allen was working himself to death trying to get DOS 3.0 out fast, and clearly sick but not yet properly diagnosed, Paul heard Bill Gates and Steve Ballmer (Microsoft employee #30, Gates' old poker buddy from Harvard before Gates dropped out; Ballmer later dropped out of Stanford's business school to work at Microsoft) talking to each other in the next room about how to get Paul Allen's 1/3 ownership of Microsoft back when Paul died. They didn't want it going to his family, they wanted to figure out how to take it back.

When Paul Allen took a leave of absence to get cancer treatment, he never returned to Microsoft. The drive to switch everything to Xenix left with him, and Gates looked around for other people to copy technical agendas from instead. He saw the Apple Lisa (because Apple gave them an early unit to port their application software to), and tried REAL HARD to copy it but Windows 1.0 and Windows 2.0 were just pathetic. DOS 4.0, 5.0, and 6.0 offered nothing that DOS 3.0 hadn't. Gates teamed up with IBM to work on OS/2 which was IBM's attempt to port mainframe technology down to the PC space... alas, targeting the 286 instead of the 386.

IBM had bought the entire first year production run of the Intel 286 processor to keep it out of the hands of competitors (like Compaq), and was then stuck with a warehouse full of the slowest, most expensive, rapidly depreciating 286 processors ever made. That's why they refused to go to the 386 and even the IBM PS/2 was mostly 286 chips, they were trying to unload that backlog of 286 chips! (They eventually landfilled some portion of them, but it took YEARS.) In 1986 the Compaq Deskpro 386 was the first 386 PC: the 386 had been out since 1985, IBM still hadn't used it, and Compaq got tired of waiting. (As did IBM's customers.) So yeah, that's why OS/2 was so far behind the times that Windows 3.0 could get out ahead of it and establish a new programming API standard.

When David Weise made Windows work years later on his own and against orders, the first person he showed it to thought he'd get in trouble for it because Microsoft was focused on OS/2. Microsoft never had a plan, they had a monopoly that let them fail repeatedly until they got lucky. Their "CPU tax" monopoly contracts forced manufacturers to license Microsoft products for entire "product lines", meaning PC manufacturers who wanted to ever sell a Microsoft operating system on ANY machine had to put them on EVERY machine. They couldn't sell even a small number of machines without the preinstalled Microsoft software, and Microsoft fought a marketing campaign for years against "naked machines" because obviously the only thing anyone could do with a machine that DIDN'T have Microsoft software on it was install pirated Microsoft software. Microsoft's monopoly leverage also let them prevent other operating systems from being installed alongside theirs, and when Windows 95 came out they extended this to preventing IBM from installing OS/2 on any of its own PCs if it wanted any access at all to Windows 95. (See the 1998 antitrust trial with Judge Jackson.) But again, getting ahead of myself.

The death blow for Xenix was that after the 1983 AT&T breakup, when AT&T was commercializing unix, it sucked in code (without attribution) from all the third party unix variants and shipped it in Unix System III. (System V was a successor to System III, there was a 4.0 but it never shipped to customers.) This is why the AT&T vs BSDi lawsuit ended favorably for BSDi: they were able to prove in court that AT&T had sucked in THEIR code without attribution, and thus forced a settlement on AT&T. AT&T also did the same thing to Xenix, and when Gates found out Microsoft code was in an AT&T product without permission or payment he went BALLISTIC, but didn't think he had the legal heft to take on AT&T so instead he purged Xenix from Microsoft (it had been running their internal email system and so on) and unloaded Microsoft's interest in Xenix on SCO (which is how SCO wound up fully owning Xenix, they'd initially just been a subcontractor doing work on somebody else's IP, but they got it cheap), and basically developed a Dave Cutler level of Unix hatred going forward...

I note that back in the day I did a LOT of research on this for my rebuttal to SCO's second amended complaint against IBM, and xenix is all through it. The indented parts in green are mostly stuff I wrote, with a little bit from Eric, but the OSI position paper was his baby and the rebuttal paper was mine. The rebuttal links to a lot of primary sources, many of which have sadly gone away over the years but you can still pull most of them out of archive.org if you try...

(You should TOTALLY get a copy of Peter Salus' book "a quarter century of unix". And a copy of "Where wizards stay up late" which is about the formation of the internet. Soul of a new Machine and A Few Good Men From Univac are more tangential, but loads of fun.)

Oh, and the book "Hackers" by Steven Levy is the other half of this Ken Olsen Smithsonian interview, literally two halves of the same story with the TX-0 and so on.

Oh, and the first four interviews in the Intel section of my mirror are the four parts of the story of the birth of the microprocessor: Ted Hoff (the actual creator), Federico Faggin (who went on to found Zilog and create the Z80 processor), Masatoshi Shima (their actual customer at Busicom, who many people say was the ACTUAL inventor of the 4004), and then their boss Gordon Moore (of Moore's Law fame).

Then read "Crystal Fire" about the invention of the transistor. The second half of that book is about the creation of Silicon Valley (which exists because William Shockley was an utter asshole), and Gordon Moore is a featured player (part of the "traitorous 8" that bounced from Shockley to Fairchild to found Intel)...

Ahem: computer history is a hobby of mine. Here's a 2 part writeup (part 1, part 2) on some interesting plot threads I did a dozen years ago.

(I've been meaning to write my own book for years, but... too busy.)


May 29, 2023

Sigh, fell out of the habit of blogging during the week when I couldn't. (Nothing for my editing pass to elaborate on when I didn't leave myself a trail of breadcrumbs...)

Git log shows a couple of shell fixes. I should get a release out, then do a deep dive into shell stuff again and try to get that properly finished.

Cut up one of Fade's old disposable mouthguards to get a chunk of plastic I can put over the tooth so my cheek can get some relief from endless stabbing. (It was keeping me awake, and it's not fun to talk either.)

Fade got me an appointment at the dental school attached to the university she gets all her tooth care done at. Of course she gets it free as a grad student, and I don't. We pay like $500/month to get me on her health insurance plan, but it doesn't cover dental for me: luxury bones. Still, these guys are known to be very good at their job, and should not make it WORSE. I'd very much like treatment that didn't cause more problems than it solved...


May 26, 2023

Back on the horse. (For a definition of "horse" that involves taking my new laptop to the common work area in building 1 of Fade's apartment, which is playing a spanish cover of "Achy Breaky Heart" for some reason.)

The "repeated hang" failure mode left me with a lot of vi :recover files where it prompts me which of the three .swp files to read, and I'm just zapping all that. There's a lot of pausing to stare at "am I deleting the .blah.c.sw? file or the blah.c file" before each one JUST TO BE SURE. (I have made that mistake. Less of an issue when the file is in git, and I'm just losing recent changes instead of trying to dig it up out of a USB backup drive.)

Sigh, the hard part of fiddling with a command like ulimit/prlimit is A) coming up with the new help text, B) coming up with test suite entries. Once I've got those, the CODE is generally pretty easy. Implementation is seldom the hard part, DESIGN is the hard part. What should it DO?
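
(For context, toybox test suite entries are calls to a shell function with the signature: testing "name" "command" "expected stdout" "file contents" "stdin". The design problem with ulimit is picking invocations whose output is actually stable across hosts; the soft core size limit defaulting to 0 is about the safest guess, and even that's a guess:)

testing "ulimit -c" "ulimit -c" "0\n" "" ""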


May 25, 2023

New laptop arrived. The freeze problem advanced to "happens 30 seconds after a reboot", so I ordered another of the same type I could just swap the hard drive into. (I have 3 such spares at home, but they're in Austin and I'm with Fade in Minneapolis.)

It's so CLEAN. Not covered in scratches and gunk, almost as if I HAVEN'T been dragging it around with me everywhere for a couple years. Same model (Dell E6230) but this one's refurbished and thus in a slightly different case (doesn't say Dell E6230 on it for one), and with this case I can't see the charge/disk LEDs with the lid open. Seems like a tiny thing, but kinda significant now that I'm confronted with its absence. Yeah, there are software versions of those indicators up in the toolbar (which I have configured to only be visible with the mouse hitting the top of the screen), but I don't TRUST the software ones. I wouldn't have a band-aid over the laptop camera if it had a physical LED that lit up when it was powered independent of any software. The fact they refuse to do that stuff is why one of the first things I do with any new laptop is stick a band-aid over the camera. The pad protects it for when I want to use it, and when I don't it's NOT LOOKING AT ME. Grrr.

I tried borrowing Fade's old macbook during the gap, which was a comedy of errors in and of itself. She dug it out of the closet, confirmed it worked, set it to charge on the counter, and went to work. I opened the lid to be confronted with a login prompt. Ah. Day 2: armed with the password I tried to ssh out to a linux machine to do some work and... none of the ones I can think of are configured to allow password, they're all key-only. (I have backups of everything... in Austin.)

It's pretty late in the day by this point (shipping estimated the new laptop would arrive yesterday, instead it came in after 3pm today), and by the time I'd rustled up an appropriate screwdriver and got the hard drive swapped and network access sorted out (registering the mac address with Fade's apartment's wifi... gave me an intercept screen asking me to log in? Seems redundant somehow. Oh well, phone tethering still works...) it's after 5pm. Old machine still has the bigger memory but I'm making sure this is STABLE before swapping more parts than strictly necessary. To be honest it's possible I could have fixed the old one with a can of compressed air, but I haven't got one here and am not entirely sure where to buy one (target?), and the hang problem going away and then coming back again is how I _got_ here. I want reliability, please.

Via the phone tether I'm downloading SO MUCH EMAIL... (Gmail's pop3 does about 1 message per second in 250-500 message chunks. Between linux-kernel and qemu-devel and so on, I get well over 1000 messages a day. This is likely to take a while...)

On the bright side, the time off probably gave my eyes time to adjust to the new glasses. (The myopia is the same, only the astigmatism has been changed to protect the innocent.)

Wrote up yesterday's broken tooth while email downloads. Not gonna backfill the rest because I didn't do anything of note and don't remember what most of it was anyway...


May 24, 2023

Still backfilling: this is the day I broke a tooth. Molar all the way in the back, bottom left side, next to where I got a wisdom tooth removed years ago. The tooth itself doesn't hurt, the magic japanese toothpaste is quite effective. Hydroxyapatite deposits more calcium phosphate on top of any exposed dentin and keeps the nerves protected behind bone equivalent... but it does nothing about the enamel, a large chunk of which is what broke off here, leaving a sharp pointy bit that's stabbing my cheek. The cheek hurts a LOT.

Regretting not getting to a dentist while I was in Japan, but while I trust the medical providers over there FAR more than the ones in the USA... there's still a language barrier, and my teeth are in terrible shape due to the extensive dental work I paid thousands for back in 2013. (The 6 month apartment I had for the Cray contract in St. Paul was right down the hall from a dentist, and I used them as a second opinion to say "yeah, those two front teeth that got chipped before they even came in because of that car accident when you were 5 years old smashing your baby teeth up into your gums so they've got grooves on the front? We ALSO want to just drill all that out and turn it into fillings because it's weird and we're calling that cavities even though you yourself can't detect them in any way." So I went with it, and all the fillings they put in chipped to pieces and fell out entirely over the next 18 months, leaving me with large obvious holes in two front teeth. I paid a lot of money to get those holes, and felt really silly about it, but regular application of japanese toothpaste meant it didn't hurt and did not appear to be getting worse...)

And now I need to wrestle with US dentistry for a _different_ problem. Dowanna.


May 23, 2023

This gap was due to my laptop being dead and having to mail-order a replacement because all my spares were back in Austin.


May 21, 2023

The "battery charging while using laptop" problem is getting worse, I just had two reboots (well, freezes forcing me to reboot) in half an hour with nothing plugged into USB, and while basically just typing in a text editor and no cpu-intensive anything pulling power.


May 19, 2023

I originally had this as "April 31" but then my python RSS feed generator went "boing" parsing the date, because there isn't one. (Midnight according to my laptop is in the middle of the day in tokyo, it's a bit fuzzy which day I'm writing for over there...) So I moved it here because I didn't write a blog entry today, due to ongoing travel recovery:

Somebody in email said "Canada as a whole seems to be determined to be a branch plant operation of US-based multinationals," and I replied:

This too shall pass. Maybe not fast enough to benefit either of us personally, and when both the roman empire and the british empire receded they left behind a lot of scar tissue, but no empire lasts forever, and the USA is already pulling back from the world now we're a net exporter of oil (have been since 2019) and thus don't really care about having a trillion dollars of navy policing everybody's shipping quite so much anymore. We've still got aircraft carriers, but they can't be everywhere, and we've gotten rid of most of the smaller patrol boats that used to _be_ everywhere...

It's hard to take solace in bad things happening, but the currently powerful aren't going to stay powerful forever. The USA is facing the end of the boomers (1946 was 77 years ago, their refusal to hand off anything gracefully is going to cause a LOT of loss of institutional continuity), climate change (Houston's flooded twice more since hurricane Harvey, but "again" isn't as newsworthy), the exhaustion of the Ogallala and California's Central Valley aquifers underpinning the majority of our agriculture (don't get me started on crop monocultures), the collapse of the US health care system coinciding with the rise of antibiotic resistance, a dozen kinds of invasive species (Texas has "Rasberry crazy ants", named for an exterminator not the fruit, that are attracted to electrical equipment and will thus ball up inside your wifi router), the reshoring of manufacturing (if our trillion dollar annual defense budget stops paying for the navy protecting container ships "for free" then floating everything from the far side of the world instead of mexico gets a lot more uncertain)...

Canada has its own issues to work out, but the worldwide fascist crazy should recede with the Boomers (they got BADLY poisoned by airborne lead in the gasoline for 50 years, and in 2/3 of who's left it's combining terribly with senility to go past "kids these days get off my lawn" into flat earth territory). Once that's past, then maybe the wretched survivors can start shoveling out. (Step 1: universal basic income, which yes will only happen over the Billionaires' dead bodies. So maybe UBI is step 2.)

Personally, I would like the new victorian prudishness and the FOSTA/SESTA (Comstock Act II) nonsense to stop being imposed on every other country in the world, along with the USA's tendency to treat children as non-persons. In Japan quite small kids buy stuff in the store and go on the train by themselves. In Europe kids can have wine or beer at dinner as soon as they can walk. In the USA we look back in horror on the days of "latchkey kids" because now they're non-persons legally confined to a building every day where they go through metal detectors and are watched over by police with live ammo who randomly search their bags and lockers; they can be arrested and permanently removed from any family that allows them to be alone on the street two blocks from home. It's "for their protection" that they can't work or vote or drive, and if a teenager sexts a naked selfie to another teenager they can BOTH wind up on a sex offender watch list for life, which is a change to the law the Supreme Court only made in 1982 by the way. Ronald Reagan hijacked the federal highway funds to force states to raise the drinking age from 18 to 21 (after vietnam lowered it since Johnson and Nixon were drafting 18 year olds to die, so they-who-were-about-to-die protested their way into being treated like adults in other areas, which involved "the man" gunning down protestors).

So yeah, rooting for the american empire to collapse. When France invented the guillotine they had to work through Robespierre and Napoleon (liberty, equality, oh look a rich white guy has seized dictatorial power again, rinse repeat), but it worked out for them in the long run, and they're currently braving tear gas to push back against late stage capitalists who want cheaper and more obedient servants. "We can't afford this" means you squeeze the rich harder. We went to the moon WHILE fighting the cold war, whether or not we could "afford it" wasn't the big question.

I would love to reach a point where I could take solace in GOOD things happening. Not just looking forward to the end of bad things and hunkering down to minimize the inevitable collateral damage...


May 18, 2023

Travel recovery day. Headache. I had a row to myself on the montreal -> minneapolis flight so I managed to get an hour of sleep, but missed beverage service and got intensely dehydrated. (It's the pressure changes, that's WHY they do constant beverage service on airplanes.)

Then lyft couldn't find where I was in terminal 2 and cancelled the pickup, and then taking the light tactical rail home was... interesting. Minneapolis seems to have cut the rail maintenance budget, the second train (after the half-hour wait between trains because they don't run frequently enough) was the dirtiest public transport I've ever been on. And that includes Camden New Jersey and the New York subway. (I am sooooo spoiled by tokyo.)

Got home to Fade's, crashed, woke up in the morning with a headache. Still have the headache. I have a lot to catch up on, but am unlikely to be very productive today.

I'm also realizing that part of the headache is probably adjusting to the new glasses I got the night before my flight. (Japan does better glasses than the states, but the ones I've been wearing are from 2017 and finally started scratching last year.) We went in to order them over a week ago, but since I wanted the extra anti-scratch coating they needed a week instead of an hour, and then when I came to pick them up a week later they apologized and wanted to _redo_ them because the left lens was slightly off center, but they did a rush job in 2 days this time (and gave me the other pair of lenses in case I need spares). Which is all fine, but I'm adjusting to new glasses and this is the first quiet unrushed "ok, sit down and try to work" session I've had since... and it's stacking debuffs.


May 17, 2023

Flying back to Minneapolis.

Sigh, this laptop power supply with the intermittent data connection works fine when I'm not trying to charge the battery _and_ use the laptop at the same time. Suspend it, charge it up, then keep it plugged in and use it: great. But when I don't do that, it has three obvious failure modes: 1) power supply gets VERY hot (the first time I noticed the _smell_ of volatile plastic compounds becoming airborne), 2) laptop toggles every 15 seconds or so between charge and discharge (which can't be good for the battery, and has corresponding screen brightening and dimming power management weirdness sometimes), 3) if I forget and try to charge a USB thingy (such as my bluetooth headphones) while it's also charging the battery, the laptop freezes solid. (Probably a kernel panic that doesn't get marshalled through X11 into actually showing me the panic or even THAT it panicked. The kernel guys have been throwing functionality overboard like hot air balloons dropping sandbags, and one of the things they gave up on a while ago was "go into VGA mode or framebuffer and dump text to the screen when you get a panic". Because who cares about THAT?)

I note that the OLD battery, which was smaller even before it lost 1/3 of its capacity due to age, had far fewer problems, I think because its maximum charging current was lower (fewer cells). The battery charging logic goes "aha, I can feed THIS much power into the battery" and when the controller can't interrupt that and go "I said I could deliver this much power but I'd like to take it down a notch now" because the data line's gone walkabout again, Dell's charging logic goes all pear shaped.

Anyway, I tried to charge my laptop and phone before getting on the plane, laptop went suddenly catatonic and had to have the power key held down until it turned off, so I've lost all my open windows again. Sigh. Right as I was getting into a position to dig out and address the backlog.

Plane is 100% crowded again, and they made the seats smaller (again!). Front to back AND side to side, I'm trying to use laptop at awkward angle on the TV tray but it just plain doesn't FIT. And while I theoretically have a power outlet (I assume that's what the green LED on the seat in front of me near my right ankle is about), I can't see it well enough to actually plug into it. They turned the cabin lights off because it's an overnight flight, and my phone battery is fully dead so I can't use its flashlight (well I didn't really get to CHARGE it in the airport, did I?). The overhead reading light doesn't make it down there...

Not _much_ makes it down there. Air Canada added seats to their planes since pre-pandemic times, and reduced flight frequency so every international flight is 100% full. 13 and 1/2 hours in a space too small to pick up anything that fell onto the floor (unless I can hook it with my foot, I'd have to ask the guy next to me to get out of his seat, which means the person next to HIM would have to get up and stand in the aisle). The accumulated muscle cramps from being unable to move is not pleasant. Add in the usual 6pm tokyo departure time and the sleep deprivation is... getting unpleasant.

Dunno how much of this is Air Canada and how much of this is post-pandemic late stage capitalist profiteering, but the days where I wrote most of the ps.c infrastructure on a flight back from Japan seem long gone.


May 16, 2023

Jeff wants me to meet Mike today and talk about future plans. I really, really, really don't want to, but there isn't a graceful way to back out of it. I strongly suspect they're going to try to pressure me into making more of a commitment to Jeff's company. (I'm not signing anything before Fade can read it. I also have no idea if Google wants to continue the toybox funding beyond what they've already done, but I'm not moving on from that until I've done all I can there. Jeff's project does not take precedence over MY project.)

Jeff hates when I say I'm working on "his projects", and insists that it's "our project". He has a "vision" that he's upset he hasn't been able to explain all of to me because I keep getting derailed into practical things we need to do, but I'm not interested in infrastructure in search of a user and a big strategic goal that can't be concretely implemented. We've worked on and then left half-finished a dozen different pieces of technology. I care about what can get completed and put in the hands of users. If Jeff had funding for us to spend 5 years focusing on Basic Research in the vein of bell labs or xerox parc, great. But we don't. And I got enough swap-thrashing on toybox, thanks.

Still, I'm learning interesting stuff I didn't know before. And I'm FINALLY getting to the point where I know a LITTLE japanese. There have been anime dialog scenes where I followed multiple consecutive sentences! Yeah, ok, simple ones, but once there were FIVE sentences, in a row, that I understood almost all of. Alas, in actual interactions with japanese people, knowing "this is the point where the person running the cash register asks me if I need a bag" is still far more useful than my ability to parse the words...


May 15, 2023

The OpenLane git checkout is about 2 gigabytes. We went on a "deleting stuff we can prove isn't needed" spree (.git, designs, docker, docs, regression_results)... and the result is 2.1 megabytes of scripts that actually need to be installed. The .git directory is over a gigabyte, full of old long-deleted churn. They also checked in every project that's successfully built against this thing INTO THE OPENLANE REPO. (Remember when uClibc had a test suite containing every package that had ever successfully built against uClibc, and the invocation necessary to make it work in the new context? That test suite turned into the "buildroot" project. Well OpenLane has something similar, and it's FAR BIGGER THAN THE ACTUAL PROJECT.) This is pufferfish territory, the project is making itself look big, but once you cut through the cloud of squid ink there's not actually much there.

I need to fix up the toybox shell because people are using it, which means I need to finally add the command line editing and history (without which it seems way less finished than it is because monkey brains conflate user interface polish with functionality, and yes that includes _me_).

Command line editing is adjacent to the crunch_str() logic in exactly the same way fold() is, namely that "backspace eats how much" is the big missing piece of both. Which is a nonobvious question to answer because the HARD part is that tabs advance by a variable amount based on where they started. (Also, nonprintable characters are TRAILING, which is the dumbest thing the unicode committee ever did. A printable character does not FLUSH pending nonprintable characters, the printable character comes first and is then modified by following characters, being REDRAWN ON THE SCREEN multiple times in some instances, which also means you're never sure you've finished a stack of combining characters until you've read PAST it and gotten a character (not byte, utf-8 sequence parsed to unicode point!) that is NOT part of this one, which you then need to unget and process separately in the next go 'round the loop. When you've got a string fragment, you CAN'T know. It could end in an unfinished utf8 sequence. There could be a combining character following it. Pretty much the only thing that DOES tell you it's done is newline... and then what do combining characters at the start of a line MEAN exactly when there's NOTHING FOR THEM TO COMBINE WITH? (What, do they combine with an implicit NUL? What would that mean? The last newline has an umlaut!) It's REALLY STUPID because they did it BACKWARDS.)
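
To make that concrete, here's roughly what the measuring half looks like with the stock C API (a minimal sketch assuming a UTF-8 locale, not the actual toybox code): wcwidth() says 2 columns for wide CJK glyphs, 1 for most printables, and 0 for combining characters, which attach to the PREVIOUS glyph.

    #define _XOPEN_SOURCE 700 /* for wcwidth() */
    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    int display_columns(const char *s)
    {
      mbstate_t mb = {0};
      wchar_t wc;
      size_t len;
      int cols = 0, w;

      while (*s) {
        len = mbrtowc(&wc, s, MB_CUR_MAX, &mb);
        if (len == (size_t)-1 || len == (size_t)-2) break; // invalid/truncated utf8
        w = wcwidth(wc);
        if (w > 0) cols += w; // w==0 is a combining char: no extra columns
        s += len;
      }
      return cols;
    }

    int main(void)
    {
      setlocale(LC_ALL, "");
      // Katakana KA plus combining voiced sound mark: 2 code points,
      // 6 bytes of utf8, 2 display columns.
      printf("%d\n", display_columns("\u30ab\u3099"));
      return 0;
    }

Note this still punts on tabs (the width of a tab depends on what column it starts at, which the string alone can't tell you), and on the fragment problem above: a combining character in the NEXT read would retroactively belong to the last glyph here.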

But it's what's there, so we must cope.


May 14, 2023

I keep wanting to do a "how to bloat code" presentation, starting from the classic K&R "Hello World" and then shifting over to C++ with accessor functions doing exactly the same job but passing the data between all the various "enterprise" style contexts, demonstrating the full range of "it's not code reuse the first time" nonsense, and generally showing that you can have a very large amount of infrastructure that LOOKS like it's doing something but isn't really.

I'm reminded of this because the skywater toolchain turned their README from a text file into one of those "markup that generates HTML" things, yes of course it has stylesheets, and along the way they added a lint variant to check the validity of their markup, then of COURSE they factored it out into a subrepo which does a "git submodule update" at build time. (So you check out the repository, and then when you run the build it checks out more repository within the build.)

Remember: this is a README. Historically, this was ONE SMALL TEXT FILE. There's no build infrastructure for a text file. None of this is NEEDED, but the crap it's metastasized into is pulling in a chunk of another one of Mithro's projects, which is doing dependency checking against the host to make sure various packages and versions are installed... except it's checking RPM and we're running it on a Debian system, so it's looking for magic red hat package names in the wrong kind of repository and finding nothing. And since they did a "-include" this whole mess should just DROP OUT if the repository isn't installed... but they're installing it WITHIN the build.

Someday I should do a proper writeup on why Google's OpenLane project stalled...

tl;dr: OpenRoad is a DARPA funded initiative, which works. OpenLane is fundamentally a small number of shell scripts that call the OpenRoad tools to do their thing in order, with reference to the Sky130 PDK which is basically "fonts and CSS to make a mask for this fab". (A fab is sort of a really high end printer. We're submitting a job to it. The job is literally a big data file.)

OpenLane is a partnership between Google and SkyWater (which used to be Cypress Semiconductor) to create an open toolchain for Sky130. Google hired the guy who did QFlow to work on the tools part, and then subcontracted much of the fab integration work to a company called "Efabless" which has an existing business taking people's design files (mostly in Verilog) and converting them into something the fab can accept. Which means if Efabless were to succeed at what Google's paying them to do, it would undercut their existing core business. There are two big projects here: OpenLane, a set of control scripts that call the OpenRoad tools in the right order to perform tasks, and the "Skywater PDK", which is a data dump from the fab. Tim Edwards is running a giant pile of fixup scripts in the Skywater PDK build because the fab's data dump is horrible (there's like... OCR errors in it or something?) And the resulting PDK is subtly broken half the time, although somebody found that if you run make TWICE, the result is usually good after the second time. (But the only way to determine if the result is good is to build the REST of the toolchain around the resulting PDK, then build your project with the resulting toolchain, then test the result. Which is time consuming and labor intensive.)

The guy at Google running this is Tim "Mithro" Ansell, who is writing his own build system to do some portion of all this, except he doesn't seem to have done this before in a nontrivial way so doesn't really know what success looks like? He's a fan of the concept, but not a veteran. Jeff (who has done this before) keeps telling him "you need to do this" and getting dismissed as silly, and then 6 months later they realize they need to do what Jeff was telling them. Kinda like the RiscV guys, really...

So Mithro grabbed chunks of his symbiflow project and stuck them to the Skywater PDK builder, specifically he's grabbed anaconda, which long ago used to be Red Hat's system installer. It was the large python program that would run when you booted an install CD (or floppies) that would partition and format your disk and let you select what type of Red Hat system you wanted to install, and would then install all the packages. This was back before Fedora and Enterprise happened; they replaced it with something else rather than rewrite the large pile of Python 2 code in Python 3. But the old 1990s Red Hat system installer seems to have spun out into its own project (maintained by yet another proprietary company producing source-under-glass that you can see but would be crazy to try to build or install yourself), and the PDK builder is using it to confirm prerequisites are available in the local RPM repository. On Debian systems that don't use RPM, this doesn't find much.

Specifically, it's complaining that "yosys" isn't installed. It is, and it's in the $PATH, but since Mithro's slurped-up symbiflow plumbing that's installing a proprietopen fork of Anaconda didn't install it, it's not finding it. If it just tries to call "yosys" it's there, and presumably "yosys --version" might say if it's new enough, but instead Mithro/symbiflow/anaconda/openlane runs a large pile of python 3 which returns an incorrect answer.

Note: it doesn't have to do ANY OF THIS AT ALL, because if yosys isn't there then you should get an obvious build break where the last line of output is an attempt to run yosys and a file not found error. This is basically the "assert" problem where the bug IS THE EXISTENCE OF THE ASSERT.

It looks like if you remove that whole subdirectory, the enclosing makefile should just work because it has - before the include to skip the nonexistent file, and then the $(wrapper) variable drops out and it just calls the rest of the command line. Seems worth a try, anyway. So I'm trying to remove the git repository, and I chopped out the makefile target that clones it.

Except he didn't just clone it there, he added it as a submodule, which means it's getting cloned already. So I need to remove it from the Makefile AND remove it as a submodule from the parent repository. Except the makefile of the parent repository is checking it out. (Remember yesterday's "how do I remove a submodule"? Because me not checking it out didn't prevent THIS from checking it out.)

One of the subrepos needs to be patched, which means we check out our own copy and tell the build where to find it. And there's even a make variable for this! Except autoconf is marshalling data from variable to variable, and if you track it back the top level configure is setting it with a hardwired path.

The whole project is like this. They keep making layers of infrastructure and then hardwiring it to do specific things. There are obviously multiple teams working at cross purposes here, and the PROPER fix would be to RIP OUT all the stuff we can PROVE is not doing anything. But you don't show progress in a fortune 500 company by REMOVING code. Code has a dollar value attached, generating more of it is always progress. Code gets depreciated and amortized, not _deleted_. Deleting it costs MONEY. Creating more is profit! IBM's KLOCS and so on...

We, on the other hand, are trying to get something to WORK.


May 13, 2023

So git grew a "fatal: detected dubious ownership" error whenever you cd into another user's directory and try to "git log" a repository. Not a "warning", but "I stubbornly refuse to perform the requested operation". So far the only fix is to sudo and run git as root, where it doesn't care about permissions.

That's really stupid. Barfing this way when WRITING to a repo is one thing, but I'm cd-ing into another user's directory and trying to "git log" and "git show" individual commits there. (I could tar the repository and extract a copy in my home directory so it all belongs to me, but that's deeply silly. And inconvenient.)

I don't know if google has deteriorated to the point it can't find the answer, or if there's no way to fix git other than to build it from source with this test patched out. Luckily, there's part of a git implementation in toybox, and this would be a reason to finish and use it.

In the meantime, it makes debugging a build that runs as a different user extra-annoying... and even more brittle than I thought? Darn it, the sudo workaround isn't load-bearing: if I do an "env -i PATH=$PATH git log" as root, I get the "fatal: dubious ownership" abort again. It's something about HAVING RUN SUDO that makes git go "oh well, if you really mean it". Actually BEING root isn't enough for git. (I mean, I could destructively "chown -R root:root .git" but then the original user couldn't use it. Before finding out git was treating sudo as magic, I was thinking the right thing to do here is create an LD_PRELOAD library that wraps stat() to patch ownership to always equal getuid(), but even that won't fix it?)
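
For reference, the LD_PRELOAD idea would be something like this (an untested sketch: newer glibc exports stat() as a real symbol while older versions route it through __xstat() which you'd have to intercept instead, git may also use fstat()/fstatat() which would need the same treatment, and per the above the SUDO_UID magic means git might STILL refuse):

    #define _GNU_SOURCE /* for RTLD_NEXT */
    #include <dlfcn.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Lie about file ownership so "dubious ownership" checks see every
       file as belonging to the current user.
       Build: cc -shared -fPIC -o fakeown.so fakeown.c
       Use:   LD_PRELOAD=$PWD/fakeown.so git log */

    static void fixup(struct stat *st)
    {
      st->st_uid = getuid();
      st->st_gid = getgid();
    }

    int stat(const char *path, struct stat *st)
    {
      static int (*real)(const char *, struct stat *);

      if (!real) real = (int (*)(const char *, struct stat *))dlsym(RTLD_NEXT, "stat");
      int rc = real(path, st);
      if (!rc) fixup(st);
      return rc;
    }

    int lstat(const char *path, struct stat *st)
    {
      static int (*real)(const char *, struct stat *);

      if (!real) real = (int (*)(const char *, struct stat *))dlsym(RTLD_NEXT, "lstat");
      int rc = real(path, st);
      if (!rc) fixup(st);
      return rc;
    }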

I am so tired of myopic git developers. The reason I stopped maintaining kernel.org/doc is that when kernel.org had a breakin (because one of the devs was ssh-ing in from a windows machine, and once you'd logged in the server wasn't that secure internally across users) they locked the barn door after the horses had escaped by removing generic ssh support, including the ability to rsync over ssh. I pointed them at a way to make ssh explicitly call rsync, with forced prefixes and everything, but they weren't interested: they'd homebrewed some horrible wrapper tool that ONLY let ssh run git (nothing else), so to update the website I had to check everything into git, including things like the gigabyte video file (USB driver writing tutorial) they'd removed after the breakin, which I wanted to put back online. If the file then moved elsewhere (it was eventually uploaded to youtube) it would STILL be taking up space in the .git directory both in my local copy and on the server, forever. But they were deep in "If all you have is a hammer, everything looks like a thumb" territory, and I gave up and moved on...

Aha: I asked on the #git channel on freenode, and the magic invocation is "git config --global safe.directory '*'", and they pointed at a reference for why the stupid happened. (And they confirmed it's checking SUDO_UID, which is just wrong.) Yay, found a way to make it stop.


May 12, 2023

Long argument with somebody online who claims that statically linking an initramfs into the kernel is weird, and that in 20 years of messing with Debian and Ubuntu they've never encountered it, so obviously nobody does it because their experience is universal. And apparently contradicting them was considered insulting. (Insisting that having used debian and a derivative of debian makes your experience universal, thus everyone else is weird, was apparently NOT insulting. Go figure.)

I can't link to it because it wasn't cc'd to a mailing list, only to Jeff. Mike texted to yell at me about "burning bridges", which means Jeff forwarded it to him. I'm also informed that Jeff has apologized to them on my behalf, which was not something I'd asked for (or been aware of at the time).

A lot more "huh, Google can't find this" instances during said exchange, which kinda undercut my point that what I was doing is not unusual. The deterioration of Google is getting alarming. But I did manage to at least dig up a few interesting numbers, which I can cut and paste here:

My perspective is skewed. Not just because I'm the guy who wrote the initramfs documentation in the kernel back in 2005, but I also maintain the command line utilities of Android and used to maintain the command line utilities used by Linux routers. That means I hear from those communities a lot, and they're orders of magnitude bigger than desktop Linux. That's not hyperbole: this page estimates there are 33 million active Linux workstations, and 1.6 billion active Android devices. Add in ~750 million routers of which around 91% run Linux, so "somewhere over 500 million" seems a reasonable guess, bringing the embedded total from just those two sources to over 2 billion active installs.

So I regularly hit things that are "weird" for 33 million installs and "normal" for a couple billion. It's hard for me to convince the developers who make those billions of devices to show up even briefly on linux-kernel because they got tired of being called weird, and being seen as pushy when they try to explain. And if they won't show up, out of sight out of mind. (They think _I'm_ weird for still engaging with the kernel community at all.)

(I didn't even go into the PC hardware space, where Red Hat claims to have a 33% share of the "worldwide server market" although that's in terms of who's paying for their OS, not installs. In terms of seats, all conventional Linux distros together are collectively 2.1% of desktop installs, behind ChromeOS at 2.2%. Windows is over 74%, Mac is 15.3%, and neither NOTICES those two. And in the PC "cloud" space... it's still Windows at 72%.)


May 11, 2023

Working on toybox stuff today instead of Jeff's thing, but I no longer feel safe huddling in my hotel room (and they're cleaning the room today anyway), so I went to the Hello Office, and when Jeff arrived he got mad that I didn't immediately stop working on the toybox thing and start working on his thing instead, and he left abruptly and angrily. I took the train to Akihabara to try to find a coffee shop there (it's the other part of tokyo I'm familiar-ish with), but didn't bring an umbrella and got caught in a rainstorm. (I have an umbrella in my hotel room and we found FOUR cleaning up the hello office, I am not buying ANOTHER ONE). Holed up in a random not-mall space, but it didn't have good seating to use laptop with so I mostly watched stuff on my phone. Eventually a lull in the rain let me take the train back, except I was most of the way to Shibuya before I realized I was going the wrong way down the Ginza line. The train was too crowded to pull out laptop there either. Got back to the hotel eventually.

Yesterday's entry was too long so I moved the battery tech description here. I have actually learned a lot about battery technology this trip:

Step 1 was to stare at boards taken out of the OLD system being salvaged and repurposed. (Well, diverted. The batteries were ordered for another project but never actually installed. The pandemic messed with shipping logistics or some such. I think they were going to be used at a wind farm in another country, but didn't make it there?) The existing system has thousands of batteries in over a hundred big cube things, each cube is sort of an industrial garden shed meant to live outdoors. It's a proprietary Chinese design built from imported western chips, and getting programming specs from the (german?) chip vendor requires an NDA, which we need in order to figure out what we can salvage and what we have to reimplement. Each battery management board attaches to a case containing 52 Lithium Iron Phosphate prismatic cells: big blue rectangles with two terminals on top like a car battery, each roughly 15x10x4 centimeters and weighing a little over a kilogram.

The resulting pack of 52 is in a big (aluminum?) case, kind of a horizontal silver version of the Monolith from 2001, which is too big to fit in the elevator to the office, and most of them are a 2 hour car ride away anyway. (Mike has a car but Jeff and I don't. The guy who salvaged the batteries bought a plot of cheap land out in the countryside to store them. It's not anywhere near a train line, and does not have much in the way of hotels either. You know the parts of rural japan that have a lot of abandoned houses and entire towns with no one younger than 65 because of the declining birth rate? Yeah that. Jeff and Mike went out there and fetched stuff a few days before I flew in to Tokyo, but did not bring back an actual battery. Just an assortment of easily removable electronics and lots of photographs.)

So terminology: "battery" is a collection of cells, and "cell" is an individual anode/cathode pair (with electrolyte and separator), in this case those big blue rectangles ("prismatic cells"). The battery cells are wired in series because Lithium Iron Phosphate chemistry produces 3.2 volts (plus or minus ~10% depending on how charged the cell is; the voltage rise/drop is actually how you tell when you're done charging and discharing the battery). So each prismatic cell holds a LOT of power (over a hundred amp-hours) but produces a tiny voltage. Wiring them in series adds up the voltages of each battery, so 52 x 3.2 = 166.4 volts, at a LOT of amps. (Each monolith is roughly like a Ford E-transit battery I think? Same ballpark anyway, I don't have the numbers in front of me.) And each cube has a couple dozen of them: it was a VERY big battery farm.

So the board attached to the front of each of these 52-cell battery monoliths has four AFEs, which stands for "Analog Front End". It's a big analog to digital converter that measures voltages, and each AFE has a 28 pin connector hooked up to it through a zillion little resistors and capacitors. Those pins come from the cells: the general idea is to have a connection before/after each cell so you can measure the voltage put out by just that one cell, and if it's higher than it should be while you're charging the battery, you can route current around it through the same pins so it doesn't charge up as much as the others in the string. Except the AFE chip can only divert like 1% of the charge current around a cell, so it's just a LITTLE bit of balancing, but it can happen each time you charge, and you can choose to stop early on either the charge or discharge if some cells are hitting an end stop and others aren't yet, sacrificing collective capacity to avoid damaging any of the individual cells.

This is how all battery management systems work, people do youtube videos about this. You start with balanced cells when you assemble the battery pack, and then do a tiny amount of balancing each time you charge them to _keep_ them balanced. If they're all from the same production run and have been linked together since, they should only really get UNBALANCED due to slightly uneven heating. But for the home users making their own battery walls by mixing and matching scavenged cells with very different origins and histories, the battery management system has a LOT more work to do, and may not be able to keep up. Hopefully not an issue here.
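
The decision logic is simple enough to sketch (toy illustration with made up names and thresholds, emphatically NOT the NDA'd firmware): bypass whichever cells read high relative to the lowest one, and stop the charge when any single cell hits its ceiling.

    #include <stdbool.h>

    #define CELLS        52
    #define CELL_MAX_MV  3650 /* typical LiFePO4 per-cell charge ceiling */
    #define BALANCE_MV   10   /* bypass cells more than 10mV above the lowest */

    /* mv[] holds one millivolt reading per cell from the AFE chain.
       Returns whether to keep charging, and sets bypass[] to divert
       the ~1% balancing current around cells that are ahead. */
    bool update_charge(int mv[CELLS], bool bypass[CELLS])
    {
      int i, min = mv[0];

      for (i = 1; i < CELLS; i++) if (mv[i] < min) min = mv[i];
      for (i = 0; i < CELLS; i++) {
        bypass[i] = (mv[i] - min) > BALANCE_MV;
        // One cell at its end stop ends the charge for the whole string,
        // sacrificing pack capacity to protect that individual cell.
        if (mv[i] >= CELL_MAX_MV) return false;
      }
      return true;
    }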

So anyway, this chinese design is 4 copies of the AFE chip vendor's reference design board glued together (literally; they sell it on their website and it looks _identical_), with a different 5th board on one end (haven't found what they copied that from yet), and then the whole thing laminated under at least a millimeter of plastic to keep moisture out (and maybe for electrical insulation). The 4 AFE chips are daisy chained together talking some SPI/serial variant, and then on the 5th board the SPI connection goes to a microcontroller. The microcontroller is from the same company that makes the AFE chip, and it's more or less a motorola 6800 from the 1970s with a bunch of SRAM and flash bolted on.

The 6800 is an 8-bit predecessor to the 32-bit m68k from the Amiga and Macintosh. The MOS Technology 6502 in the Commodore 64 and Apple II was to the Motorola 6800 what the Zilog Z80 was to the Intel 8080: in both cases, engineers who worked on the earlier one left to form their own company. So a 6800 SOC with 128k of sram is roughly equivalent to a Commodore 128, albeit clocked a bit faster 30 years later.

So that 5th board section is the controller, and at the far end of THAT board is a CANBUS connection to the outside world. CANBUS came from the car industry and is also used in manufacturing automation. The problem is, CANBUS is just a "read value from address, write value to address" protocol that tells us nothing about what's being said. The chinese manufacturer's board design is all under NDA (we recognized the AFE reference board they copied because there's a picture of it on the chip manufacturer's website), and the proprietary-is-good chinese company built their stuff out of chips that are all themselves under NDA. (There are perfectly good non-NDA AFEs on the market, but that's not what they chose to use.)

Even if we felt up to reverse engineering an assembly dump of the biggest program that could fit in a Commodore 128, we can't get at it because this NDA SOC has a fuse you blow to prevent reading the flash back out. (What this system is doing is generic and well understood, and the patents on Lithium Iron Phosphate batteries themselves expired last October (although that's mostly about manufacturing more cells, not managing them). But every design decision in the electronics has been about obfuscating what they're doing to protect largely nonexistent intellectual property. They want to SEEM unique and magic because you can't tell what they're doing, or interoperate with any of their existing stuff to repurpose it.)

So EITHER this control SOC is a dumb translator passing on the AFE info to whatever is at the other end of the CANBUS connection, OR this is where the battery management program lives that's measuring the voltages and making the bypass decisions and reporting how "full" or "empty" the battery is so it doesn't overcharge or undercharge and damage any of the cells. It's either a passthrough or it's the brain, no idea which.

Solution: build a new one and replace the whole board. Which also has the advantage that when these repurposed wind farm batteries run out, we can order more prismatic cells and put them in our own case (Jeff found an aluminum fabricator in japan that would do nicely), and make our own battery systems.

Oh, the old battery cases are also water cooled. (Well, half water half ethylene glycol.) The big cube shed things have an elaborate climate control system, and the spec sheets say that these batteries operate within about a 3 degree celsius temperature range. This is partly because they packed a LOT of batteries tightly together into each cube (and run a LOT of power through them), but also because they apparently didn't want to do the math about how the batteries behave differently at different temperatures. (Which Jeff has papers on with graphed curves and math... but the chinese engineers apparently didn't bother.) Which is funny because the 28 pin connector only NEEDS 14 wires to measure the battery voltages (52/4=13 cells per AFE, so 12 taps between adjacent cells plus 2 at the ends of the string), and most of the rest are probably temperature sensors? If we're using the same case and wiring harness for the initial deployment we want to reuse those temperature sensors, but... no documentation. Gotta go poke at stuff with voltmeters and crack one open to find what's actually there and look up data sheets...

All this stuff has to get re-certified to be hooked up to the grid, so we need to find or create documentation for the parts we keep. (Jeff also wants to find a "current shunt" for measuring the whole battery, because adding up the individual cells isn't good enough. It's apparently somewhere in all this.) Personally, I'm uncomfortable mixing enough electricity to run a car with conductive liquids, but it's what's already there. Cracking them open and then deploying the result is a thing we would rather not do, so we want to just replace the electronics on the front without opening the case. (Also, Lithium Iron Phosphate is WAY SAFER than Lithium Ion. I would not want to do ANY of this with Lithium Ion. The downside is LiFePO4 only has half the energy density of the best Lithium Ion, but it can go through a LOT more charge/discharge cycles without losing capacity. "Lasts fifty times longer and puncturing a single membrane doesn't result in a three hour fire water won't extinguish" is rather a nice trade-off.)

Anyway, coming up with a plan for what to do with all that was "Milestone 2". Initially the goal for that was "design a demonstration prototype unit" (and milestone 3 is building/delivering like 3 prototypes they could show to people), but we wound up debugging through enough of the original electronics (and out the other side) that we came up with a scalable manufacturing plan for all-new replacement parts.

At which point I assumed we would actually start making stuff, but so far...


May 10, 2023

Jeff wanted me to come along to Shibuya for a meeting with Mike and PK today so we can go over business plans for fundraising. Because of course. I mentioned not wanting to talk to Mike, and Jeff went through several variants of "that's not good for me", "you can't do that", "get over it", and "suck it up and deal" (none of them phrased _quite_ that way), and I went along rather than argue.

I think Jeff's position here is "ha ha, Mike just _threatened_ to have you arrested and presumably deported and barred from the country, it didn't actually happen, so no harm no foul". My position is "Mike showed me who he is". There's probably some divergent neurochemistry in there, what with me being an ADHD poster child and all. (Growing up I was diagnosed "hyperactive and gifted". They hadn't invented ADHD yet. This isn't exactly rejection sensitive dysphoria because I never wanted Mike's approval, he's a friend of Jeff's whom Jeff finds useful for running the business that gives us the opportunity to work on the interesting tech. That's not the relationship Jeff wants me to have with Mike, and "I work on interesting tech with you" is not the relationship Jeff wants me to have with his business, either. But I started working for him in October 2014, it's 8 and 1/2 years later, and I can't think of a single thing we worked on that actually got deployed. We have not shipped ANYTHING to a customer except prototypes and demonstration units. As with Linux on the Desktop and making Android self-hosting, I keep grinding away and want it to work, and we get closer. There's a bunch of good side effects. But I am no longer trying to organize my household finances around it succeeding: for the moment Google is paying the bills so I can focus on toybox, and THAT is plenty of challenge for me. I don't know how long that situation will last, and am trying to make the most of it. This is VACATION TIME from that.)

I want to learn tech stuff from Jeff, and he's got a lot of great projects to do. I was hoping this trip we might reopen the VHDL to implement the barrel processor and fourier engine functionality we were talking about before Covid happened. Making an ASIC work through Sky130 _or_ ArtAnalog/TSMC would be great too. Jeff's also talked about doing a fresh j-core implementation starting over with a tomasulo/scoreboard design so it can do multi-issue. But we're not working on any of that, because we have more important things to do... as in chase money. I always get lured here with promises of tech work, and then we do a big fundraising document.

This time the document was called "milestone 2", and there was at least a lot of high level technical design work involved as we worked out how to recondition a load of batteries someone wants to repurpose from storing power at a windmill farm into individual combini and factory power walls. Only 6 of Japan's 17 nuclear reactors have reopened since Fukushima, meaning they leaned hard back into fossil generation without really PLANNING for that to happen, and in the past couple years the cost of Japanese electricity has tripled as fart gas got expensive due to Vladimir Putin's dick being too small. Load shifting from overnight to daytime is now potentially a big cost savings, and that's a market with legs anyway as wind+solar ramps up, so let's get into the battery management system business! Sure, why not, sounds like fun. I've been watching prudetube videos from Will Prowse and such about this sort of thing for years anyway, I'd love to learn more.

When I first went to work for Jeff in 2014 his company Smart Energy Instruments was trying to retrofit the electrical grid with sensors so we could feed a lot more wind and solar into it. I'm big into renewables and getting off fossil fuel, this IS my idea of fun. I very much want to see this project succeed, and grow into a sustainable business.

In the first ~2 weeks of this trip I learned how battery management systems work, although I doubt I could quite reproduce all the math myself. We've confirmed we can do a new one from off the shelf chips that don't require an NDA, plus technology Jeff has lying around from previous projects. Yay! The result was... a document that goes to somebody who gives Jeff money for having completed the project milestone. (But that somebody is not a "customer" and Jeff was angry that I kept calling him that.)

At the end of my original trip it looked like we were just about to actually start building stuff, so I agreed to stay a couple more weeks. Then as soon as the trip was extended, we did the Open Project stuff to make gantt charts, and today's meeting I didn't want to attend was about preparing for a fundraising round.

After the meeting in Shibuya, Mike wanted to talk to me about setting up a meeting where he and I go over a new contract for me to come back to work for Jeff's company full-time. I was noncommittal. I'd rather NOT be arrested and deported before the 17th.


May 9, 2023

Travel arrangements for the potential talk in taiwan this summer went a bit off the rails yesterday, because when they originally asked about airport selection I thought they were talking about the DESTINATION airport not source (I didn't recognize airport code IAH)... so they booked me out of Houston. Oops. That's a 2.5 hour drive away from Austin (if I still had a car, bit longer by bus). I asked if they could add a connecting flight since a quick check of commuter flights from Austin to Houston shows a bunch for around $70... and they offered to refund me the $70 when I got there. I was too tired to cope and thought I'd try again in the morning. (Actually I went "maybe I could just send them a video of my talk, and not go in person because even if they can't amend the itinerary I'm sure they can still get a refund this far in advance...", and was up until 3am doing an outline.)

The money isn't the problem. The USA's insane security theater is the problem. Airports these days, you're supposed to budget 2 hours to get through security, and if I arrive on a different itinerary than I'm continuing on I have to go through security AGAIN (and collect my luggage and re-check it), which means my one hour layover adds 2 more hours, and I still have to arrive 2 hours early in Austin, plus the actual austin to houston flight, plus me getting up and going to the airport in Austin, which means my ~1:30 pm departure out of Houston is now something I should leave home for around 6 am. It's now a red-eye flight with something like 17 hours of travel before I arrive in a strange country to deal with a new kind of customs check and trying to find the hotel. (Tokyo I more or less know my way around now, and can recover from inevitably getting lost more than once on my way anywhere. Taiwan I've never been to, I'm assuming it has a rail system or buses of some sort? To... a hotel? Somewhere?)

What I'd MEANT to work on last night back at the hotel (instead of outlining a talk I don't have to give for months yet as a way of venting "dowanna deal with this right now" anxiety) was getting Jeff an initramfs that extracts a tarball into a subdir and then does a proper switch_root into that. Which means teaching switch_root that it's not only partition boundaries that block file deletion, it should also skip the destination directory. Oh, and it wasn't doing the mount --move on the existing partition mounts: I thought it was, but apparently I hadn't implemented that.

Doing this means the j-core developers don't have to rebuild the kernel each time to change the root filesystem contents. (They never taught the j-core bootloader to load an external initrd.gz file because doing that with device tree involves either patching an existing device tree on the fly or doing the device tree overlay thing, and we just never got around to it.) Upgrading switch_root is a good toybox thing to add to the release I'm preparing anyway, seemed like a good thing to prioritize.
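
The deletion rule itself is small; sketched out (hypothetical illustration, not the actual switch_root.c plumbing), it's basically:

    #include <dirent.h>
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Delete the old initramfs contents under dir (passed without a
    // trailing slash), without crossing into other filesystems (st_dev
    // changes at a mount point) and without touching the directory
    // we're about to switch into.
    void wipe(char *dir, dev_t dev, char *newroot)
    {
      char path[PATH_MAX];
      struct dirent *dd;
      struct stat st;
      DIR *d = opendir(dir);

      if (!d) return;
      while ((dd = readdir(d))) {
        if (!strcmp(dd->d_name, ".") || !strcmp(dd->d_name, "..")) continue;
        snprintf(path, sizeof(path), "%s/%s", dir, dd->d_name);
        if (lstat(path, &st) || st.st_dev != dev) continue; // mount point: skip
        if (!strcmp(path, newroot)) continue; // the destination: skip
        if (S_ISDIR(st.st_mode)) {
          wipe(path, dev, newroot);
          rmdir(path);
        } else unlink(path);
      }
      closedir(d);
    }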

So after about 3 hours of sleep, I got up and started working on that, and an hour later Mike called up to yell at me that I was making "us" look poor by arguing over $70 (the conference that invited Jeff and me to speak had offered to cover travel, I took them up on it, I hadn't yet responded to the email offering $70 instead of amending the itinerary). When I told him my position on that, he shifted to yelling I hadn't left the hotel in days (not true?) and that I didn't have the work for Jeff done yet which he had CALLED INTERRUPTING ME WORKING ON. (And which I do best in the hotel because our Hello Office is hot, stuffy, windowless, and full of trash.)

They're paying for my trip but not paying me a salary. I like tokyo, hadn't been here in years, from my point of view I got a free vacation to Japan and am helping out friends, but I've GOT a day job working on toybox, and am trying to keep up with that while I'm here. I want to see them succeed, but this ain't paying the mortgage (or the $8k to clean the mold out of the vents back home).

I told Mike that his call had interrupted me doing the thing he had called to yell at me about not doing, and that I was going to go back to doing it, and hung up and muted my phone for a bit. Twenty minutes later I noticed his Signal message: "Rob pick up the phone or I will get you arrested by the police". (Did I mention Mike is a japanese citizen and I'm not? On an unrelated note, when I first got ADHD meds I didn't know they weren't allowed in Japan: different schedule levels and not honoring foreign prescriptions and so on. When Jeff and I cleaned up the office I found all sorts of old pre-pandemic things, many of which I'd asked Mike about the status/location of before said cleanup. Back when I left the apartment in Japan expecting to return _before_ the end of the pandemic, I left behind two suitcases worth of stuff which Mike moved to the office when the lease on that apartment finally expired and they chose not to renew it.)

So at that point it was me trying to get the code done and sent to Jeff before the police showed up. I got it done, tested, checked in, and an updated image (with build procedure) emailed to Jeff. I then left the hotel room (still sans police), met Jeff at the Hello Office, and walked him through the build so I was sure he could reproduce what I'd done.

I'm sure other things happened later that day, but I couldn't tell you what they were. No, police didn't show up. (I dunno if Mike was bluffing or merely changed his mind. I was actually looking forward to maybe getting to go home early, I'm kind of regretting extending my stay from the 3rd to the 17th if it's going to be like this, and NOW the complication with the Taiwan itinerary thing has switched to "I asked them to fly me in from the USA and out to Tokyo, but I'm no longer sure I could/should come back to Tokyo"...)


May 8, 2023

Cleaned out the spam folder again. For some reason, gmail has decided that all of Elliott Hughes' posts to the musl mailing list were spam. (Not the REST of the threads he's replying to, just him. And not his posts to me or to the toybox list.) Yes, he's a google employee emailing from a google.com address. *jazzhands*

Finding stuff wrong while doing release notes, as you do. (Nothing highlights gaps and weird cornercases like writing documentation.)


May 7, 2023

I should record my talk proposal writeups instead of just entering them into various call for papers websites where the ones that aren't selected vanish. (I mean, I SHOULD do them as online talk videos, but it's really hard to motivate myself to talk to a camera by myself in a quiet room. As the Twelfth Doctor said in the episode Heaven Sent: "I'm nothing without an audience". I was doing ELC but the Linux Foundation's really done a number on that one and I haven't been able to brace myself for it since 2019.)

Slogging away at toybox release notes. I've done a new entry skeleton that I may leave commented out at the start of the page because I haven't entirely been consistent in my category headings. I should probably ALMOST CERTAINLY switch the index.html link to point to the "about" page (which tries to explain what the project is and why) instead of the "news" page (which is long technical gibberish release notes and not a good first impression: proof of life, yes, but a vertical cliff instead of a gentle slope to ascend).


May 6, 2023

Oh goddess. You know how news coverage and articles always seem authoritative until you read something you already know about, and then there's multiple obvious errors? I just read the Wikipedia[citation needed] article on Bionic. Lots of "that's just wrong", "that's almost a decade out of date", "Elliott fixed that because of _me_", "no you can just do this instead"... I may need to go lie down. (And I'm not even a Bionic developer!)

Jeff and I cleaned up the Hello Office by dragging most of its contents out into the conference room down the hall and then putting 3/4 of it back and throwing out the rest. (Well, right now it's a pile of trash in the middle of the office because the building's trash room is only open for an hour in the mornings, and we didn't find a box cutter for the boxes so may need to buy one. But I'm calling it a win anyway. Three hours of lifting and hauling. I do NOT get enough exercise.)

During the shoveling I unearthed a mysterious CD which turned out to have the professional photo I had taken (japan still has that service, like Sears used to) for my now-expired zairyu card, which means my own attempt was not strictly necessary.


May 5, 2023

Woo, 5000th commit to the toybox repository! I feel I should have some sort of celebration. (I bought an instance of the famed famichicki from famimart. Tasty, but very greasy. I prefer their teriyaki grilled chicken breast to their fried offering. I guess both technically qualify as famichicki, but the fried one is the meme.)

I am REALLY TEMPTED to add a new option to toybox echo so it can split the arguments it's printing with newline instead of space. There's a lot of "ls blah/*/blah | xargs | blah" to glue things together, but the other way has to use "echo blah/*/blah | tr ' ' '\n'" which is awkward (quoting both arguments!), but not awkward ENOUGH to add "echo -N blah/*/blah" and open the whole "should I try to teach busybox and coreutils about that" can of compatibility worms.

Yes, I added "test -T 37" recently, because I care that the file descriptor is OPEN, not that it's necessarily a terminal. And I couldn't figure out how to do that otherwise... Sigh, figured it out: I can just do "2>/dev/null <37" instead and the shell will error if the filehandle isn't open (because the dup2() fails). Alright, remove that then. (This is why releases take so long: writing documentation reveals needed code changes, and both blogging and release notes count.)


May 4, 2023

Under the weather today. The kebab place Mike likes to go has very spicy food, and I killed a roach that crawled over the table while I was eating. This morning my digestion was not happy. Go figure.

Working on closing tabs for a toybox release. So many tabs...

I need to rebuild toolchains with musl-1.2.4 but I never did bisect why sh2eb won't build under newer gcc, and I can't ship new toolchains without that. (Well, ideally I want to rebuild the hexagon llvm too, which had its own version skew a while back. I also want to redo mcm-buildall.sh to not need mcm, at which point I can probably stick the new replacement in mkroot.)

And I need to finish mkroot/README and update the mkroot faq.html entries...


May 3, 2023

Some text I cut out of my reply, as "not helpful". (I have a venting-about-lkml budget I try to stay under, and that message already had plenty. Here, no such limit. Well, I spent so much of the rump administration venting about what the ruling nazis were doing that I forced myself to only do it on odd numbered days and left the even ones for technical stuff, but A) much higher limit, B) that was people's lives rather than just niche drama.) Anyway, what I wrote was, with URLs moved into actual links because blog instead of non-HTML email:

I was going to point you at the last kernel commit with "oppenlander" in it so you could confirm which email to use, but I have a repository here going back to 0.0.1 and his name's not there despite the patch submissions. He's not a regular in the clique, so nothing he submitted ever got in. That's linux-kernel for you.

(For all my faults, I historically _have_ managed to get code into linux-kernel. Largely because I'm really old so have been around longer than a lot of the grognards gatekeeping these days, and I was even technically the Documentation maintainer between commits 01358e562a8b and 5191d566c023. And I understand the whole "if you want to get anything done you have to complain until you're blue in the mouth" Dead Parrot aspect of the project the author of Squashfs eloquently explained ("a closed community which know everyone worth knowing by sight") ten years ago when Linux Weekly News asked if the Linux Foundation had completed its purge of all hobbyists from the open source development process, which it had. They've ossified a LOT more in the 10 years since Philip Lougher wrote that...)

So yeah, happy to submit patches to someone who will actually talk about the code and not the bureaucracy+politics (he says, venting about the bureaucracy+politics).


May 2, 2023

Jeff and Mike are turning a big todo list I made into Open Project Work Items. I'm sitting with my laptop doing other stuff, but available in case they need to ask questions about the todo list.

I have an old rant about open source being unable to do user interfaces, and it's about how any time it's faced with a user interface issue the process melts down into one of three distinct failure modes. I know I blogged about it but couldn't remember which year off the top of my head, so I googled for "landley three distinct failure modes"... and then put quotes around "landley" because I recently learned that it's silently substituting in random misspellings for words it doesn't think are popular enough... and my blog STILL does not show up in ANY of Google's hits. Nor does the copy of the rant I put into the aboriginal linux about page, which I was reminded of when I looked at the talk version of the rant I gave years ago at ELC and I had that about page version up on the screen.

Google found NONE OF THAT. Despite all three containing the phrase "three distinct failure modes", and two of them being on landley.net. Google search is not healthy. It's kind of concerning: twitter going away is one thing, but Google Search will be _missed_. (They're panicking about chatgpt, but NOT about rapidly losing competence at their original core business. It seems to have started about when they laid off those 12,000 people.)

Today I learned that Open Project (and presumably whatever the generic name for crap-like-jira is) has "stories" and "epics", and an epic is a collection of stories. (Like the Epic of Gilgamesh... which seems kinda unique, and nobody ELSE calls a collection of stories an epic? It's usually a series when it's not a trilogy. Kevin Feige is trying to brand the MCU iron man to endgame collection as "the Infinity Saga".) This "epic" naming is pretentious enough I'm actually slightly nauseous. I would go out of my way to avoid meeting the people who decided on that naming.

Still getting emails for the "Austin Tech Happy Hour", which was a vaguely interesting thing many years ago. It seemed like a good idea to maybe meet some people on the same side of the planet, now that all the local LUGs I knew broke up; I went three or four times, don't think I actually met anybody I wound up seeing again). But at some point it grew a cover charge to keep the riff-raff out, and I really don't feel the need to pay $10 to attend a gathering of people I don't know at a bar, thanks. (Meeting random strangers with shared interests in-person is what giving talks at conferences is for. And science fiction conventions. In THEORY it's what meetup.com was about, but all the ones I tried to attend of those were "oh no, you're not allowed to enter the building without paying" nonsense too. And all those SxSW events that supposedly didn't require a badge, I stopped trying those YEARS ago because I never once got in. They were either full to capacity from preregistrations I couldn't access without a badge, or just plain "it said it didn't need a thousand dollar badge but does". As with the twitter blue checks, it's not the ability to afford it that's the problem, it's that the kind of people you're selecting for means I don't want to meet them.

Alas, my normal daily schedule involves sitting quietly in various corners reading and/or writing things, with the occasional long walk by myself. I often have _extensive_ correspondence with people at least a thousand miles away, but have to go out of my way to exchange ten consecutive words with anybody in the same town who I don't actually live with. There's a reason I founded more than one science fiction convention back in the day. :)


May 1, 2023

Darn it, glibc's wcwidth() is returning at most 1 for every character in toybox, never 2 even though when you cat tests/files/japan.txt it's all hiragana characters of width 2 (visibly measurable against an ascii text line above it). I'm trying to rewrite fold.c to do unicode properly and the glibc apis don't work.
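
A useful sanity check here (standalone sketch, nothing toybox-specific): wcwidth()'s answers depend on which locale is active, and a process that never calls setlocale() is stuck in the default "C" locale, where glibc treats basically everything outside ASCII as unprintable and never says 2.

    #define _XOPEN_SOURCE 700 /* for wcwidth() */
    #include <locale.h>
    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
      wchar_t hira = L'\u3042'; // HIRAGANA LETTER A

      printf("before setlocale: %d\n", wcwidth(hira)); // likely -1 in "C" locale
      setlocale(LC_ALL, "");
      printf("after setlocale: %d\n", wcwidth(hira));  // 2 in a UTF-8 locale
      return 0;
    }

If it prints 2 after setlocale() then the library data is fine and the problem is in the caller's locale setup; if it doesn't, the UTF-8 locale isn't installed.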

Jeff is deeply enamoured of a pointy haired management thing called "OpenProject", so we spent HOURS yesterday setting it up so he can do gantt charts in it. Except the admin account doesn't work because it immediately goes "cross site scripting!" which turns out to be because the browser doing https is not enough, the openproject application ALSO has to have access to your let's encrypt keys. (Why? I don't know. Third base.)

This thing is the kind of "open source" you see when a corporation produces regularly updated abandonware. It has no community. There is no libra chat channel for it. Googling for things about it produces hits on their site and nowhere else (although with the sad state of google search I'm not sure what that proves).

A recurring error in our attempts to set up OpenProject is that their git integration breaks apache, which refuses to start because "OpenProjectGitSmartHttp" is a made up word its config file parser doesn't know. Googling for that word finds a closed bug report on the Let's Encrypt website where the Let's Encrypt people say "this is not our bug, ask openproject". There's also a bug report on the OpenProject website where somebody said it broke, and someone else replied "yeah it broke for me too", with no response and no fix. The bug report is from 3 years ago.

We EVENTUALLY figured out that the magic word is exported by the subversion integration code, so if you enable git integration WITHOUT also enabling subversion integration, it CAN'T WORK. (I repeat, this project has no developer community except employees of the company producing it, and THEY want you to run their magic docker where everything is preinstalled for you and you do not touch their proprietary inexplicable secret sauce "open source" code that you're crazy for trying to install/configure yourself.)

And of course if you enable the svn integration it breaks apache for a DIFFERENT reason, so we just switched them both off for now.

I also noticed that the gmail account Jeff set up for me years ago, which I'm only logged into on my phone, hasn't been inactive like I thought. When I open the gmail app on my phone (only thing logged into it), it says "auto sync is off", and I have to pull down to load to see if there's new mail. This is why I haven't gotten a new mail notification from it since last year. BUT if I try to turn auto sync on, I get a full-screen pop-up saying this doesn't apply to just gmail but will also flush all my photos to google's cloud so they can scan them on behalf of ICE and the TSA. There's no obvious way to enable "tell me when I get new email" without "send my contact list and location history to the governor of texas whose boomer supporters can sue you for a million dollars if they think your wife had a miscarriage". Hell no. I don't want to sync my photos, contacts, location history, I don't want it uploading (let alone retaining) the voice samples from speech-to-text (which I KNOW it can do locally because it does it in airplane mode, I don't know what Rossman is on about? Or is this one of those "it _can_ operate independently but there's no way to tell it not to upload everything anyway" things like I'm having with the email client?)...


April 30, 2023

So I'm writing a new unicode aware fold and I'd just like to say that posix really needs to move past the Y2K bug and enter the 21st century at some point. They have a "-b" meaning "interpret as bytes", but do NOT really handle the "not that" case.

Backspace is defined as reducing the column count by one, but unicode characters can have variable width (including zero for combining characters which should logically come BEFORE the character(s) they combine with but don't because somebody REALLY STUPID was on the unicode committee, I'm assuming from Microsoft). So in THEORY backspace should remove the number of columns consumed by the last printable character.

In practice, the flush-and-forget approach to output when toybuf fills up is a problem because we may have to backspace into it... unless we record how wide each column of output was? I mean, that's just a malloc of length -w (or shorter if we want to get fancy), AND avoids having to back up through utf8 to find the last printable unicode character.
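
The bookkeeping could be as simple as this (sketch with made-up names, not fold.c): one byte per glyph on the current line recording how many columns it took, so backspace gives back the right amount without parsing utf8 backwards.

    // One entry per glyph on the current line: how many columns it took.
    // The array only needs to be as long as the wrap width (-w), since
    // each printable glyph consumes at least one column.
    struct line_cols {
      char *width; // malloc'd to the -w wrap width
      int glyphs;  // glyphs currently on the line
      int col;     // current cursor column
      int max;     // allocated entries
    };

    // Printable glyph consuming w columns (w from wcwidth()). Combining
    // characters (width 0) take no columns so they don't get an entry.
    void push_glyph(struct line_cols *lc, int w)
    {
      if (w && lc->glyphs < lc->max) lc->width[lc->glyphs++] = w;
      lc->col += w;
    }

    // Backspace: back off the most recent glyph's columns.
    void pop_glyph(struct line_cols *lc)
    {
      if (lc->glyphs) lc->col -= lc->width[--lc->glyphs];
    }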


April 29, 2023

Talked to Jeff about whether I should bump my flight back. We're getting a bunch done, and Fade and Fuzzy don't... strongly object. I enjoy Tokyo, and get about as much toybox work done here as I do elsewhere (the better work environment balancing out the extra demands on my time, although I hope fixing the mold in the vents back in texas changes that going forward).

I'm basically getting a free vacation in Japan, modulo not really seeing much of it outside of late night walks (tried to walk south to the beach; there's no beach, it turns into an industrial harbor sort of thing. Oh yeah, city. Right...) but I'm STILL trying to learn nontrivial amounts of the language.

Everybody keeps saying that "the food is so much healthier here you will lose weight", but with 24 hour combini selling tuna mayo rice balls and sweet milk tea EXACTLY the way I made it growing up (which was dismissed as absurd by everyone else; where's the Fools I'll Show Them All lightning when you need it, VINDICATION!!1!ichi!) it hasn't exactly worked out that way. I have two relevant Claire Ting videos queued up but would have to shuffle luggage for multiple minutes to clear space... so long walks. I wonder if there are any swimming pools available?

I'm eyeing a toybox release. It's overdue, I know, but there are so many things I'd like to get IN said release. Still, 6.3 came out and I should more or less try to stay synced with kernel releases...


April 27, 2023

The downside of this lovely office is I have no ADHD meds here, because they're not legal in Japan, and I'm really starting to notice.

Sigh, SMB_SUPER_MAGIC is still lying around, and it got moved to staging in commit 2116b7a473bf and then removed in 939cbe5af5fb in 2011, which was TWELVE YEARS AGO and yet the debris is still not just in the kernel tree, but in the header files exported to userspace. Oh hey, and USBDEVICE_SUPER_MAGIC is gone too (commit fb28d58b72aa back in 2012), but the symbol's still exported in the header. Oh, and last time I was poking at this, Novell Netware went away in commit bd32895c750b but they still have NCP_SUPER_MAGIC in the header.

The observation in The Cathedral and the Bazaar that "with enough eyeballs all bugs are shallow" has been demonstrably untrue of linux-kernel for at least that long. There are not enough eyeballs because the kernel community is unwelcoming of newbies who would go over the obvious with fresh eyes and thus point out stuff like that. Just like the geriatric Unix community it replaced, it's now all old farts with long white beards and suspenders telling everyone who will listen about the glory days ~25 years ago.

Huh. I've got "fat" as 0x4006 in my list, and can't find that in the kernel source (not current, not 4.0, 3.0, 2.6, 2.4, 2.2, or 2.0). It came from a patch from Hyejin Kim but I have no idea where he(?) got that from? There's "msdos" and "vfat" (both 0x4D44), but no "fat" using 0x4006... And the 4d44 constant was added in linux-0.9.7 in 1992.

Right, posted the patch with a jazzhands comment, poked the github request to see if that fixed it for them, and punted on a BUNCH of questions. (If I identify smb do I say "cifs" or "smb3", both of which are driver names you can mount it with but... different behavior? msdos vs vfat is another but there's never reason NOT to use vfat these days that I'm aware of...) What I should really do is come up with a Horrible Sed Invocation that just extracts this data from the kernel source so I can regression test, but I'm not up for it right now. (In part because grep -ho 'register_filesystem[(][&][^)]*)' -r * | sort -u | wc -l says there's 97 of them, and in part because the first one grep finds is in arch/s390/hypfs/inode.c and I really can't bring myself to care about that one at the moment. And because grep 'static struct file_system_type .*_fs_type = {' -r * | wc returns 83 hits rather than 97 meaning this is NOT quite regular enough to make it easy.)


April 26, 2023

Fade finally tried one of the cans of The Dintiest Moore I left behind (I asked her to order a flat while I was there), and is a fan. It's the only american product I've encountered that makes serious use of Demi-Glace. (I don't know what non-demi glace would look like. Full glace? Ask the french. Highly boiled cow.)

Watched a frustrating history of gasoline video, which both had good historical information and repeated debunked lies out of old industry press releases verbatim.

A hundred years ago, Standard Oil worked out that mixing about 10% ethanol into gasoline prevents engine knock. All the lead in tetraethyl lead EVER did was make it PATENTABLE, because ethanol (which is the kind of alcohol humans have been drinking for thousands of years) already existed. The lead served no other function in the mixture EXCEPT to make it patentable. Tetraethyl lead is four ethyl groups connected to an atom of lead (resulting in a molecule shaped like a swastika), and when you heat it you get four ethyl radicals (the combustible part of ethanol), plus a free radical of lead which goes out the tailpipe. It otherwise behaves EXACTLY like mixing ethanol into the gasoline would (which was the goal of developing the compound), and when it was finally restricted by the EPA they replaced it in gasoline with pure ethanol. Old engines that COULD use leaded gasoline (because they didn't have a catalytic converter, which the lead binds to, covering over the catalyst surfaces that otherwise break down incomplete combustion products like carbon monoxide and nitrogen oxides), all those old engines worked JUST FINE with "unleaded" gasoline, and people only thought the stuff with lead was "better" because of years of advertising lying to them and causing placebo effect performance evaluation.

The airborne lead also made people exposed to it measurably more stupid, which is combining badly with senility in the current Boomer generation as age-related neurological degeneration overcomes their ability to compensate for a lifetime of nerve damage from massive pediatric and chronic lead exposure. (This is why everyone fled the cities to the suburbs, they moved upwind so they could breathe! But it was only RELATIVELY better, the air of the ENTIRE PLANET was poisoned: airborne lead was like acid rain and the CFCs that caused the antarctic ozone hole.)

Keep in mind that organic lead compounds are generally even worse than metallic lead, because the human body is better at absorbing organic compounds and bringing them inside cells. So both tetraethyl lead itself and the free lead radicals going out the tailpipe in a cloud of superheated moist carbon monoxide and so on... that may have poisoned the Boomers WAY more than the largely inert residue it's broken down into 30+ years later. Some compounds are worse than others: the movie Erin Brockovich talks about hexavalent chromium being WAY MORE TOXIC than other chromium compounds, and the research chemist Karen Wetterhahn was killed by a couple drops of dimethylmercury poisoning her through her glove. The leaded gasoline profiteers were intentionally putting lead into volatile organic compounds that people would inhale, and the neurological damage the Boomers suffered from this is manifesting VERY STRONGLY in their senior years.

Seriously, I wrote about this at length, with citations to multiple articles about it. Water samples taken in the middle of the pacific ocean had 20 times as much lead near the surface as the same location a few hundred feet down. Blood lead levels were SIX HUNDRED TIMES higher than samples from ancient egyptian mummies, and children absorbed 5 times as much as adults did. The Boomers were the first generation to grow up surrounded by cars, and it HURT THEM BADLY. In their 20s they could mostly compensate. But as they slowed down in their 40s the brain damage really started to show, and now that they're turning 70 two thirds of them are losing all touch with reality. This is not a case of oligarchs being better at manipulating people than the Railroad Robber Barons of the Gilded Age of the late 1800s, this is a population of lead poisoned vegetables ripe for elder abuse. Ten years ago they were falling for nigerian prince email spam, and now it's fascists finding them useful political cannon fodder. If even the rich and famous regularly suffer from elder abuse, imagine what the wider population of brain damaged Boomers is undergoing. Boomerdom going full nazi is because they literally have brain damage, which means our best chance to pull out of it and clean up afterwards is to outlive them.

Back to the frustrating video: when he later goes on to talk about "oxygenates" like ETHANOL... he does not connect the dots. This was not a new discovery. Thomas Midgley and his bosses understood this JUST FINE a hundred years ago. They chose to poison LITERALLY BILLIONS OF PEOPLE around the world entirely for profit. And then when the oil industry stopped needing "the Ethyl Institute", the think tank reorganized itself into The Tobacco Institute to defend poisoning OTHER people for profit. And when that ran out, they reorganized into a bunch of global warming denialist think tanks to continue to kill people for profit.

Billionaires love to profit from fascists, and gerontocracy collapses into fascism, and we're suffering from both right now. On the gerontocracy thing: Hitler came to power in Germany because the previous President of Germany, 86 year old World War I veteran Paul Hindenburg, made him Chancellor in 1933 to shut him up (ahem: in hopes sharing power would appease him). Hindenburg was then manipulated into signing an emergency declaration ONE MONTH LATER giving Hitler's edicts the force of law, not subject to judicial review, for the duration of an emergency that lasted until Hitler said it was over. A year and a half later Hindenburg died, at which point Hitler appointed himself president AND chancellor. Hindenburg was the same age as Dianne Feinstein (who is still in the senate), 3 years older than Nancy Pelosi (who is still in congress), and the same age Biden would be at the end of the second term he just announced he's running for. At least all those guys are old enough to predate the pediatric (but not chronic) lead exposure from gasoline.

Oh good grief, now the guy in the video is on about ethanol coming from plants that absorb carbon dioxide: STOP IT. All that matters is whether it's fossil carbon or not. Plants taking carbon out of the atmosphere for SIX MONTHS before it goes right back into the atmosphere does not change the amount of carbon in the atmosphere in any meaningful way. Mining operations that take carbon that's been underground for millions of years and release it into the atmosphere, THAT is what permanently increases atmospheric carbon. I do not care about rearranging deck chairs on the titanic, either you're mining fossil carbon or you aren't. (The problem with "carbon sequestration" is finding someplace to put it. A trillion dollar industry digging up carbon from miles underground is kinda hard to run in reverse at the DESIGN level...)

Reading press releases is not research.

Sigh, archive.org decided to commit seppuku during the pandemic (let's aggro every major publisher by putting their books online for free!) so I should definitely mirror the institutional memory post in my own computer history archive before it goes away. (Yes IP law is stupid the same way car-centric cities are stupid, but running out into traffic is not the answer.)


April 25, 2023

Finally got the turtle board running the 6.3 kernel and current toybox (increasing my kernel patch stack to 10 patches in the process), and... there are bugs. For some reason, ctrl-C doesn't work in the console which means oneit isn't doing the switch from /dev/console to /dev/ttyS0 (well, ttyUL0 there) properly. Another problem is that "ps" produces no output, even though I can cat files out of /proc and see the raw data it should be transforming into output.

Alas, I still don't have a proper nommu test system set up under qemu, and sneakernetting sd cards over to the turtle board for compile/install/test cycles is... really hard on the fingernails. I burned out (gummed up?) one SD card adapter already and bought a new one that's REALLY TIGHT, and have been slowly chipping bits of plastic off the ridge at the end getting the sd card back out. (The turtle board itself does the push-to-click thing but the laptop end uses a microsd-to-sd adapter, unless I want to dig up a USB adapter which is worse. I've already trimmed my fingernail to be less pointy in hopes of chipping out LESS plastic, but it's a question of degree.)

Sigh, at some point I need to do this dance with QEMU's virtual cortex-m board so I have a nommu test environment that runs under qemu, which should make regression testing this a lot easier. The problem is I don't have a "what success looks like" reference version there. Maybe I can beat one out of buildroot? (Or make puppy eyes at Geert Uytterhoeven about coldfire, that's a nommu target qemu theoretically supports as well, although I recall that getting a kernel/board config with nontrivial amounts of RAM and useful peripheral devices to line up didn't work out last I checked. Sigh, I should learn to modify QEMU, but just haven't got the spoons.)


April 24, 2023

The magic to stop vim from intercepting the mouse, thus preventing the terminal from letting me copy and paste text between a screen session at the far end of ssh and a local window, is the colon command "set mouse=" with nothing after it. There may have been a small rant.


April 23, 2023

I'm not just merging the j-core turtle board config into mkroot, I'm also cleaning up mkroot in general in preparation for cutting a toybox release, and testing the 6.3 kernel. Of course there's kernel config weirdness. Kernel commit 3508aae9b561 memorializes a lot of config changes back around v5.8 that I wasn't paying much attention to at the time. IOSCHED_CFQ became IOSCHED_BFQ, IOSCHED_DEADLINE seems to have replaced the NOP one (always configured in), and MMC_BLOCK_BOUNCE went away because you can't switch off the bounce buffers anymore. MTD_M25P80 got merged into MTD_SPI_NOR.

Dirty trick: I can detect NAME=VALUE in the mkroot microconfig format and automatically insert lines other than =y or =m without needing the separate KERNEL_CONFIG mechanism... Except that the value can in theory have a comma in it. (None of the ones I'm using yet do, but they CAN.) Hmmm, I suppose I can come up with an escape mechanism for the comma? And then NOT have an obvious example of it in the file. Hmmm... The alternative is keeping the second mechanism for passing in raw lines despite nothing in the file currently using it. Or waiting for somebody to complain, which... isn't really better here because said complaint is likely to turn into "oh I can't use this" rather than "I'd better report this to the maintainer". Hmmm...

(I can backslash escape quotes and spaces, but can't backslash escape commas because the escape gets eaten before that parsing happens. I could transpose it with another character but that's black magic. I could say an assignment has to be the last thing on the line so it eats commas but I've already got multiple assignments in one config. Hmmm...)


April 22, 2023

And the air conditioner service guy back in Austin found mold in the vents. So we have to make an appointment with a Mold Remediation Specialist. Great. Well, that explains why I'd feel so tired five minutes after getting home, and had so much trouble getting a good working environment there and preferred to do all my work out at a fast food table or at the university.

(This is exactly why we had very expensive specialists come after every flood with HUGE BLOWERS and refrigerator-sized dehumidifiers drilling holes and spraying gallons of chemicals into the walls: DID NOT WANT MOLD. Didn't really think of the air conditioner vents, where condensation is kinda normal. What's in there for them to eat, anyway? Dust, I guess...)

Anyway, I'm here in tokyo, where mold smells completely different anyway. (The clothes I left hanging to dry in the apartment needed some serious re-washing.)


April 21, 2023

It's so easy to just spend the ENTIRE DAY in an APA hotel room, and ignore the outside world. I shouldn't, because it's Tokyo out there and I really like tokyo, but it's SO QUIET. (APA is apparently the middle three letters of Japan, at least on their posters... Which is weird because here it's Nippon. Medieval Dutch and Portuguese traders asked OTHER countries what those islands over there were called and "Japan" seems to have emerged via consensus from multiple languages playing telephone, and then the same insane map makers who named two whole continents after Amerigo Vespucci went sure, "Japan", sounds great).

I figured out why Jeff can't stand them, or the windowless Hello Office: the pandemic gave him claustrophobia. Being in enclosed spaces too long gradually increases his stress levels and he needs to go OUT somewhere. I can relate, but am personally experiencing the opposite here. Let me work!

The main limiting factor is Jeff calling me up and pulling me in to his projects, but he is paying for the trip.


April 20, 2023

The old j-core ethernet driver was just too messy to submit to mainline. It's not secret, but Jeff outsourced it to some cheap Russian programmer (the lowest bidder) years before we ever met and only like 1/3 of it is actually relevant. It's got all sorts of debris from IEEE time synchronization and such that were never completed. We should really write a new one, but never got around to it.

That said, the last time it got forward-ported was 5.8, and we'd like to use it on current (6.3) kernels, and I bisected the FIRST build break to commit adeef3e32146, which made a field const and added a gratuitous new API to change it. There's a bunch of commits (bb52aff3e321, 0f98d7e47843, 9a962aedd30f) converting drivers to the new API, so it wasn't too hard to fix it up. The other breakage (b48b89f9c189) removed an argument from a function, and was easy to fix up.


April 19, 2023

Bisected the "Turtle works now" bug to commit 5d1d527cd905, which was a rewrite of the RCU plumbing for the networking code that starts "Using rwlock in networking code is extremely risky..." So yeah, I'm willing to leave that part to the professionals. The symptom they saw was soft lockups, and it fixed our boot hang, so I'm calling it good and moving on.

I have discovered LaserPig on Youtube, who answers the question "What if Sheogorath, Daedric prince of Madness from Skyrim, did youtube videos about the war in Ukraine in character as an extremely drunk farm animal with strong opinions about military history and equipment". I discovered him via a team up with the "oh god you've reinvented trains again" guy who keeps photoshopping Elon Musk into a clown outfit.


April 18, 2023

Trying to have mkroot more gracefully straddle the patched vs unpatched kernel issues, and also get the init script to work nicely in both QEMU and a chroot/container. Added test -T to check if stdin is open, test already has -t but I don't care if it's a tty because a chroot with redirected stdin/stdout (or piped through something) is fine and does not need to be replaced with /dev/console.
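
Which means the init script can boil down to something like this (a sketch, assuming the new -T takes a file descriptor argument the same way -t does):

test -T 0 || exec </dev/console >/dev/console 2>&1

I.E. only grab the console when PID 1 started life without an open stdin.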

Sat down to figure out why the current vanilla kernel broke on turtle, and... it's fixed? It smelled like an alignment issue (unaligned access), and maybe it got perturbed so it's aligned again? (Or else there was a bug that hit somebody else and they fixed it?) Either way, that means the revert commit in Rich's j-core patches is no longer needed although I'm still gonna track down what fixed it so I know. (If bad alignment got perturbed into place again, it'll be back.)

And now that there's a new arch/sh kernel maintainer I'm looking at those to see what's still relevant. Vladimir Murzin's commit was merged into vanilla. There's one adding extra percpu memory... why? Rich does not actually provide descriptions with his patches, so I have no idea what actual PROBLEM he was trying to fix, and the kernel's attempts to describe this plumbing are not enlightening.

Ok, generic commits that still apply: 4c7333b0fb9e, 53ac9fc75ae0, and 262e1e5884da, which could maybe go upstream as-is. Commit 155d2abffb8b is jcore-specific (the clock thing), I think it's generic-ish and could go into the vanilla tree? (Need to test that the turtle board as is still works with it.) The ethernet is 186e1d80a89b and 666583fa6d5d, gratuitously split in two for no obvious reason.

I'm test building on my turtle board by repeating:

for i in ../toybox/000[34578]*.patch; do echo $i; patch -p1 -i $i || break; done  # patch stack
sed -Eis '/select HAVE_(STACK_VALIDATION|OBJTOOL)[^_]/d' arch/x86/Kconfig
patch -p1 -i ../linux-sh/0001-percpu-km-ensure-it-is-used-with-NOMMU-either-UP-or-.patch
patch -p1 -i ../linux-sh/0001-revert-790eb67374-to-unbreak-j2.patch
mkroot/mkroot.sh CROSS=sh2eb LINUX=~/linux/github  # cross compile kernel + root filesystem
sudo bash -c 'mount /dev/mmcblk0p1 /mnt && cp root/sh2eb/linux-kernel /mnt/vmlinux && umount /mnt'  # sd card
sudo microcom -s 115200 /dev/ttyACM0  # serial console to the board

Bash command line history comes in handy there: cursor up a few times and hit enter, take out the sd card, put it in the holder, put it in the laptop, run the thing, take it out of the holder, ka-click it into the board, plug in usb, frown at boot messages, rinse, repeat.

Bisecting stuff is awkward (which is why the above [34578] skips some numbers and a couple patches are broken out as individual lines). Much annoyance because git insists that old is good and new is bad: you're never searching for where something got FIXED, only for where a bug was introduced. Therefore, to find the commit where the turtle board started working again (without reverting commit 790eb67374 in the patch above, that's the "it just started working again" thing) I had to call the one that does not boot "good" and the one that boots to a shell prompt "bad". Because git.

For extra fun, I have to build each one without the revert first to see if I get output, and then build it again with the revert to make sure I DO get output and it's not a different bug. So the cycles are a bit slow.

And mkroot is set up to always do a full build. I can do incremental builds out of tree, but then I have to hack the config file to point to the initramfs directory via absolute path and I dowanna.


April 17, 2023

The hotel rooms in Japan are lovely. Jeff says they drive him nuts, but they're giving me something I haven't had nearly enough of: quiet well-lit isolation with a desk and an outlet and internet access where I can get work done without interruption. (Especially now I don't have to be out of them by 10am, at least 3 days out of 4.) A 5 minute walk away there's tea the outright weird way I grew up drinking it (cold, sweet, with milk: confusing BOTH sides of the atlantic), and cheap tasty rice triangles (onigiri) which I finally figured out the intended way to open so the seaweed goes around the rice. (There's a pull tab that peels away to split the packaging with a plastic thread down the middle, and then you pull it equally off both sides so the seaweed stays in place. Gets seaweed crumbs on the desk, but otherwise works great. The point is the seaweed and rice are separate until you open them so the seaweed stays dry and crispy.)

I hadn't actually _installed_ the qemu targets I rebuilt back at the end of march, and now that I'm trying to test them "mips" still isn't working. And I don't remember what specifically I built (I can see the git log but it doesn't help), so I think I need to pull and rebuild just mips, which is probably still "./configure --target-list=mips-softmmu"? (The QEMU devs have a terrible habit of breaking their API for no obvious reason.)

I locally checked in the "move mkroot to its own directory" stuff and pulled it into my main tree, but haven't pushed yet. I should add a "Hey! This moved!" stub when you try to run the old one, but the #!/bin/echo command line gets the name of the script you're currently running as an extra argument, and "This script moved to mkroot/mkroot.sh scripts/mkroot.sh" is... not clear.
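
Probably the less clever fix is a real shell stub instead of trying to make #!/bin/echo read well, something like:

#!/bin/sh
echo 'scripts/mkroot.sh moved to mkroot/mkroot.sh' >&2
exit 1

Three lines instead of one, but the message comes out unmangled.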

I need ldd to fix up mkroot/root/dynamic, which runs after populating the airlock. Alas Elliott strenuously and repeatedly objected to toybox containing an ldd capable of running that loop, because it wouldn't invoke glibc's dynamic linker to find where something is currently loaded into memory when you haven't loaded it into memory. How a cross compilation running ldd on a mips binary is supposed to tell you where that library is currently loaded into an x86-64 system, I couldn't tell you, but the FSF keeps making their binaries fatter and fatter. The one in uclibc never did this.

Back in 2020 a third article about the greying of Linux came out, but it's already fallen off the web and you have to fish it out of archive.org because "Linux has gone the way of Unix, maintained by crotchety greying grognards who scoff in all directions outside their insular little niche" isn't really NEWS anymore.


April 16, 2023

It's one of those days where I'm skittling along a giant dependency chain doing twenty minutes work on one thing and then going "but first I need to do X" and getting five things deep before I go "what can I actually FINISH AND CHECK IN RIGHT NOW".

The most recent email with the guy who needs me to update scripts/root/dynamic wound up with him being able to use the prebuilt musl toolchains I provided (he was saying Linux= instead of LINUX= but it's case sensitive), but I should still poke at the "dynamic" target because mkroot shouldn't REQUIRE musl, which led me to asking whether static linking in bionic is working yet (mkroot didn't used to be able to use it because of the "segfault with no stdin" bug hitting PID 1, which got fixed upstream but hadn't made it into the NDK yet, and there's a new NDK (r25c) so I downloaded that and extracted it and went "huh, creating the cc symlink I've been doing works but seems silly because there's no OTHER tools prefixed like that in that directory anymore, where did they go? There are a bunch prefixed with llvm- except there's no llvm-ld in there...") and so on down a rathole I've also parked and stepped away from because NOT RIGHT NOW.

But I tried to build with that in a fresh directory and defconfig of course barfed with bionic because it hasn't got the shadow password plumbing, except I redid lib/password.c and friends in my tree (the new one doesn't USE the shadow.h nonsense awkwardly bolted alongside the original user/group stuff by shadow-utils back in the 1990s) and I really need to test and check that whole rewrite in, except it's big and intrusive so copying it to a new fresh directory for proper testing required some investigation: "git diff lib" says that lib.h, password.c and pending.h (which I deleted as part of this work) are the changed files to marshal over, but the new password.c has three functions (get_salt, read_password, update_password) which grep says are used by: passwd.c, su.c, login.c, mkpasswd.c, chsh.c, groupadd.c, groupdel.c, sulogin.c, useradd.c, and userdel.c. And in my big dirty working tree the changed files are passwd.c, mkpasswd.c, chsh.c, groupadd.c, and groupdel.c, so that's what I should copy to the new tree and try to build.

Except to test this stuff I need a mkroot build (not letting it write to my /etc directory as root just yet, thanks), and I ALSO have a toybox directory where I'm moving mkroot out of scripts/ and into its own mkroot/ subdirectory (where I can give it its own README), and there are two edge cases that I'm not sure whether I should move: mcm-buildall.sh and record-commands.

Design-wise scripts/mcm-buildall.sh remains a rough edge because it populates the ccc/ directory at the top level, not under mkroot.sh. The problem is once again "lifetime rules" (you don't rebuild the toolchains every time you rebuild mkroot). So... does it stay in scripts/ or does it move to mkroot/ with mkroot.sh and test_mkroot.sh and the scripts/root directory? It's not really part of toybox, it's an important dependency for mkroot (CROSS= there is what expects the ccc/ directory), and mkroot is what has the plumbing to download external packages (via mkroot/root/plumbing) so it kind of _does_ need to be in there... But if it IS in there then the README is hard to write, because the logical sequence of scripts is then 1) cccbuild.sh, 2) mkroot.sh, 3) test_mkroot.sh. But 99% of the time, you don't RUN cccbuild.sh. Heck, most newbies will probably download binary toolchains because it's a pain.

The other thing is I want to rewrite mcm-buildall.sh so it doesn't use Rich's musl-cross-make repository anymore and is its own standalone cccbuild.sh instead, because Rich doesn't reliably maintain musl-cross-make (the last commit to it was just over a year ago), and it's really not helping much anyway. The Linux From Scratch partial build script I posted to the toybox list last month builds a gcc variant without jumping through that many hoops, and I'm leaning towards just doing my own build directly rather than working out how to feed configuration stuff through Rich's plumbing to the gcc build. I've already added a couple of my own patches to his that he won't take, and have a couple more queued up that I poked him about but he ignored. (That said, I believe he and his family are still touring Indonesia? I type this from Tokyo, can't throw glass houses at anybody, but I try to stay in touch. He's been insufficiently communicado for a while now.)

And then there's the whole "llvm toolchains" can of worms I need to reopen at some point, which musl-cross-make is no help at all about... I suppose the pending rewrite is a good excuse to leave the old one in scripts/ for now?

ANYWAY, I'm trying to write up the new README, starting from the ancient README back when it was a standalone project, and the FAQ entry (which is another thing I need to update before checking in the move; I should probably leave a symlink from scripts/mkroot.sh to ../mkroot/mkroot.sh in the tree).

Oh hey, today _is_ the every-fourth-day that they clean the room. When I asked the guy at the front desk what time I had to be out by, he said it was tomorrow.


April 15, 2023

And lo, I have my laptop available again (yay adapter), a quiet hotel room (APA is now only cleaning the rooms every 3 days so I can stay in it all day if I like), and rather a largeish todo backlog. Let's see:

Upgrade test suite so gentoo can run it.
  Request filesystem type, umount -l.
  ldd chroot https://github.com/landley/toybox/commit/e70126eabef8
Finish lspci -x fallout.
  Check compression? https://github.com/landley/toybox/issues/386
  http://lists.landley.net/pipermail/toybox-landley.net/2023-April/029520.html
Finish cgroup stat support.
  https://github.com/landley/toybox/issues/423
Yifan Hong's continuing tar weirdness:
  https://android-review.googlesource.com/c/2536710
Peter Maydell qemu Malta patch?
Tom Lisjac (and previous guy) want scripts/root/dynamic
  https://github.com/landley/toybox/issues/418
David Legault, fold tests. (Promote fold?)
  https://github.com/landley/toybox/issues/424
vmstat for zhmars
  https://github.com/landley/toybox/issues/422
sizeof(toybuf)
  https://en.cppreference.com/w/c/language/_Alignas
fix sh2eb mkroot build (toolchain and kernel)
gzip --rsyncable
  implement deflate, implement rsync...
Ongoing cleanup of mdev.c started on plane due to /sys/block poke.
  http://lists.landley.net/pipermail/toybox-landley.net/2023-April/029525.html
Finish the cp -s work so I can do install -T
Try to beat a multi-console thing out of mkroot+qemu to test oneit change
  http://lists.landley.net/pipermail/toybox-landley.net/2023-April/029531.html

Pretty sure I've missed multiple things there. Plus I _was_ planning on cutting a release before visiting Tokyo. And there's the Linux From Scratch automation script so I can go back down the aboriginal path of making a self-hosting toybox environment...


April 14, 2023

Ah right, there are no three prong outlets in Tokyo. And I brought a three prong laptop charger. That's inconvenient. My plan to program all morning until the sun came up (what with being waaaaay off this timezone in my sleep schedule) hit a bit of a snag there.

Met with Jeff in his office, unboxed, disassembled and reassembled the oscilloscope, talked about his battery project, went out to dinner with Mike and some of Mike's friends in Shibuya where we went to a chinese-run restaurant that allows smoking indoors, where I found out that after a few years of not trying to eat while breathing cigarette smoke I've lost my tolerance for it. (As in "mouthful of food and lungful of air combines to convince my brain I've got a mouthful of cigarette ash, and forcing myself to swallow triggers a nausea reaction that lasts all night." That was not fun.)


April 13, 2023

Air travel moved the clock forward 12 hours and more or less ate today. Went to bed at 8pm local time anyway, which was something like 5am relative to where I got up this(?) morning, after getting maybe an hour of sleep on the plane. (Sitting bolt upright. Horrible neck cramp.)

But at least I have delivered the giant oscilloscope box to Jeff, who dumped it in the office. Tomorrow I need to reclaim the giant pile of laundry and books and such I left in the apartment I couldn't get back to during the pandemic.


April 12, 2023

Onna plane. Got up at 5am to go to the airport. Flying from Minneapolis to Toronto (which is the wrong direction?) and then Toronto to Narita airport in Tokyo. Between the layover and the going the wrong way part, it's like 17 hours of travel before I even get to customs at the far end.

It's a lot easier to get programming done on a plane that ISN'T 100% full. Getting up at 5am after finally adjusting back to a day schedule doesn't help either. I had grand plans for the 14 hour uninterrupted block, but don't have the focus.

Forgot to eat this morning (caffeine yes, food no), was quite appreciative of the first meal on the international flight at like 1pm minneapolis time. That may be a contributing factor to the lack of focus...


April 11, 2023

Huh. Given the way adler32 works, if you're just looking for a run of zeroes at the bottom and it's 16 bits or less... you don't need the whole algorithm. It's just "add up the bytes, modulo the largest 16 bit prime".
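
To show how small that is, here's the whole thing as shell arithmetic (a sketch assuming GNU od; bash math is 64 bit so deferring the modulus to the end is safe for any sane file size):

s=1; for b in $(od -An -tu1 -v file); do ((s+=b)); done; echo $((s%65521))

That's the bottom 16 bits of the adler32 of "file", no second sum needed. (The -v matters: without it od collapses repeated lines, which would silently eat runs of zeroes.)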

That really seems unreliable? I mean... ok, fast. But "runs of zeroes" are legitimately a thing? If you compress all zeroes it's just gonna reset every minimum window size (4k)?

I still want to figure out how to do the rolling adler32 of the top part. I KNOW I worked this out before, my blog says I did it in 2001 and again in 2013 and it would be nice if I'd actually DONE it back then rather than restarting every 10 years.

Of course today's interrupt is updating the filesystem type detection list, which is tricksy because the kernel isn't consistent. I already have one more small patch (basically a repeat of the v850 patch one) to send to lkml, but they'll just ignore it. (I need to reply to Andrew Morton, but "you guys no longer take obvious one line fixes" is hard to say POLITELY.)

[Editorial, April 15th as I'm fixing this up to post and replacing the [LINK] with an actual link... WOW Google search is imploding fast. Googling for "linux landley v850 patch" does not find that patch, nor does adding "elf" before patch. Adding "remove" before v850 finally found one copy of it in mail-archive.org, which is not the kernel's own lore.kernel.org/lkml nor is it the iu.edu one that's been there since 1995, nor did it find a copy in any of the archives in the vger list for linux-kernel. Google search is blind to all of those. I got the above link out of my preferred archive by checking the date on the post in the one copy Google DID eventually find after all those retries, and then going to lkml.iu.edu and manually navigating there from the top down. Remember when it was easier to google for stuff than bookmark things? Not anymore...]

To avoid preparing for my flight I've been stress baking, using up the half-finished ingredients in the fridge (the types Fade doesn't use) to produce food she'll eat. She tends to make big batches of what she calls "kibble" once a week and put it into individual plastic tubs, and then have the same thing for the majority of her lunches and dinners until it runs out. Generally "pasta or rice with stuff in it". I'm leaving her a cheese pasta tomato casserole sort of thing, and a chicken rice dish, and a large pile of steamed green beans, and a meat pie.

The household's standard meat pie recipe (which I learned from Fade but I'm the one who always cooks it now) is cook and drain one pound of ground beef, add a can of condensed cream of mushroom soup (as-is), a can of sweet baby peas (drained), a significant amount (most of a pound?) of shredded cheese, a dozen or so shakes of Penzey's "california seasoned pepper", stir it all together and decant into pie crust, bake at 375 for half an hour. Her pie tins are smaller than the ones I use, so trimming the extra off the bottom pie crust leaves enough for stripes of pie crust across the top, which I can bridge with torn up cheddar slices to get two pies from one pair of pie crusts. (Which is good because premade rolls of pie crust are like $5 a box now.)


April 10, 2023

And Jeff got back to me about the Tokyo trip with less than 36 hours before the plane takes off, because of course he did. Ok, the long-delayed trip to target for 2 more pairs of pants and a new pair of shoes needs to happen tomorrow, because Tokyo hasn't got anything in gaijin sizes. (Hopefully I can get new glasses in Tokyo the same place I got glasses last time, they're much better than you get through Zenni. I think it was somewhere in a Tokyu Hands, but that doesn't narrow it down that much. They're sort of vertical shopping malls, and there's at least 3 of them we went to in Akihabara and Asakusa and possibly Shibuya?)

Sitting down to actually implement gzip --rsyncable, I'm hitting the problem that the USE I'm making of the zlib stuff is "pass off an fd and it returns when done", meaning my code doesn't get to read the data and partition it. I could do a wrapper that reads the data and passes it along, and probably will eventually, but that seems kinda silly?


April 9, 2023

Weekend. Hung out at my Sister's, saying hi to the niecephews.

The one of the four that maintains their original gender (despite whatever their father's new wife does to them every week that they refuse to talk about but are very unhappy about) got screwed over by his father in a DIFFERENT way, apparently if you've ever gone for mental health counseling even once, the navy's nuclear submarine training program will happily give you a "waiver" to get through boot camp (because they're SO not making their recruitment numbers), but will then kick you out right afterwards even if you come in 6th in your class (because you volunteered to drop a spot so somebody else who was coming in as an E1 could get promoted to E2 by being in the top 5).

Personally, I'm not a fan of career paths which you aren't allowed to quit that could order me into combat when we're not at war, especially when I know multiple people who wound up permanently disabled in military "incidents" that weren't even combat related. (Shinga got crippled in a training accident, and spent the next decade plus having to deal with VA underfunding. Remember my "apprentice" Nick from 10 years ago? Her dad got poisoned working near a burn pit in Iraq, degenerative neurological something or other, I got to watch him get worse every time I visited...) But I also didn't grow up hand-to-mouth poor. (I've done what I can to help, but it's intermittent and from far away. Kris could never move out of Minnesota without losing custody of the kids to their father's new wife...)

Honestly, put Jon Stewart in charge of it. (The six minute video in there counts as "nailing an interview" to me...)


April 7, 2023

We switched the household slack to Discord. Let's see how that goes. I have so many notes-to-self in my DM-to-me slack channel, which was a scratchpad I could easily access on both my phone and laptop, and now I'm laboriously copying the to-laptop ones over by hand since I've lost that. Gotta do it before deleting the slack accounts and uninstalling the app. (Alas, the phone side doesn't have a selection option that can highlight more than one entry. On the web side I could mouse drag and scroll and grab multiple pages of stuff to a text file at once, but the android UI doesn't have anything similar. Press-and-hold to highlight a single entry. No obvious shift-click to highlight another entry without deselecting the first. And so, I laboriously type one entry at a time into the keyboard. I'm back to February...)

Doing gzip --rsyncable kinda implies doing rsync. According to the rsync wikipedia[citation needed] page the checksum in question is adler32, which seems simple enough, although I'm squinting at the modulus: I'm pretty sure that can be deferred instead of done every byte? Sigh, wikipedia[citation needed] keeps saying "look at the zlib source code to see a more optimized version" rather than just saying WHAT THE MORE OPTIMIZED VERSION IS. This is a five line algorithm! Obviously it's moving the modulus to the end. Alright, let's do a for loop here to see where the overflow is... If both sums start just under the modulus and every byte you add is ff, the second sum (the sum of sums, which grows fastest) overflows a 32 bit counter after... 5552 entries. So page sized inputs are fine. I can add a comment.
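
Checking that bound with shell arithmetic (64 bit, so the overflow test is explicit): start both sums as big as they can be right after a modulus, feed worst case 0xff bytes, count how many steps fit in 32 bits.

s1=65520 s2=65520 n=0
while ((s2<=0xFFFFFFFF)); do ((s1+=255, s2+=s1, n++)); done
echo $((n-1))

Prints 5552, which is what zlib calls NMAX.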

Ok, so the theory here is you do a running checksum on the input, and when the bottom X bits are all zero, you reset the deflate stream. (When Hayase Nagotoro invented blockchain there was a lot less originality than I thought: the post proposing --rsyncable for debian came out YEARS earlier.) Since deflate is designed to work on concatenated archives, I don't even really need to communicate with the encoder, this is a "close and reopen, append results together" situation. Probably you want some minimum amount of input before checking the results, and maybe initialize the CRC to something other than 0 so a run of zeroes doesn't leave it zero? (Or does the "minimum amount of input test" cover that case?)
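
So the trigger logic is roughly this shape (a shell sketch of the idea, not the implementation; the 4096 minimum and 12 bit mask are placeholders I haven't committed to, and the echo stands in for the close-and-reopen of the deflate stream):

count=0 sum=0
for b in $(od -An -tu1 -v file); do
  ((sum+=b, count++))
  if ((count>=4096 && !(sum&4095))); then
    echo "flush deflate and restart, +$count bytes"
    sum=0 count=0
  fi
done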

The next question is "how many bits of zeroes" and "what's the minimum block size", and the original paper isn't even using adler32 like rsync is, so I don't want to take its answers? Unfortunately nobody seems to actually document what gzip --rsyncable is actually doing here, let alone how many bits it considers worth resetting for in its --rsyncable. I hate looking at gnu crap both for licensing reasons AND because it's always TRULY HORRIBLE CODE, but I'd like to be at least somewhat compatible? And in the absence of ANY sort of documentation, hold my nose and see what's publicly available on github's web view... Looks like they're using 4096. And they're using MODULUS on a POWER OF TWO to check for the zeroes.

That's just sad. I need to step away from the keyboard for a long walk.


April 6, 2023

The toybox test suite is a bunch of shell scripts in tests/*.test (one for each command name) that get run by scripts/test.sh (which calls scripts/runtest.sh). The actual tests are shell functions that look like:

testing "name" "command line" "expected output" "file input" "stdin input"

I.E. each test has five arguments: 1) the name to print when running the test, 2) the command line to run for the test, 3) what the test is expected to produce on standard output, 4) what to write to a file named "input", and 5) the input to pipe into the command line's stdin.

There's some complexity: each test gets run in an empty directory (generated/testdir/testdir) with the $PATH set up so it's testing the right command(s). Arguments 3, 4, and 5 are run through echo -e to resolve escapes, and if there's a newline at the end you have to explicitly state it. (There's almost always \n on argument 3.) If argument 4 is an empty "" string then no input file is created (and "input" is deleted between tests) so it's not there messing up ls output and so on. If any command fails to produce the expected output, the script exits (unless the environment variable $VERBOSE contains the string "all" somewhere in it) so later tests don't even get run. But that's the basic idea of the test suite.
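
So a hypothetical test (made up for this writeup, not one from the tree) looks like:

testing "case folding" "sort -f input" "a\nB\nc\n" "B\nc\na\n" ""

which writes B/c/a into a file named "input", runs "sort -f input" with empty stdin, and expects the case-folded ordering back, trailing newline included.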

There's a few more corner cases, such as the checks that conditionally skip tests (shell functions like "toyonly" and "optional" and "skipnot", which set the $SKIP environment variable), and a whole second set of testing apparatus providing the "txpect" function (txpect NAME COMMAND [I/O/E/Xstring]...) that works like 'expect', listing a series of inputs to stdin and expected outputs (on both stdout and stderr) and eventually an expected exit value, ala:

PS1='$ ' txpect 'shell hello' 'bash --norc --noprofile -i' E$'$ ' I$'echo hello\n' O$'hello\n' E$'$ ' I$'exit 3\n' X3

And someday maybe I need to figure out how to hook that up to pty master/slave plumbing (or do they call it dom/sub now? Hey, that's a consensual relationship...) so I can query cursor position in a virtual screen (testing stuff like "top" in an automated fashion is REALLY nonobvious). But implementing "expect" in pure bash was hard enough...

Possibly the most complex part of all this, from my perspective, is that Android doesn't use my scripts/test.sh, it just uses scripts/runtest.sh. All the shell functions are defined in runtest.sh, but the test.sh script is the one "make tests" and "make test_sed" call to set up the generated/testing/testing directory and work out whether we're testing a single toybox command (calling scripts/single.sh to build it and install it into generated/testing if so), all the toybox commands (calling scripts/install.sh to put all of them into generated/testing so they're all in the $PATH at once), or running the tests against the host commands (which is testing the tests themselves, not testing toybox's command implementations: I haven't proved much if I pass tests I wrote but nothing ELSE passes those tests. Alas the host is a moving target and each time I upgrade devuan some tests that used to pass start failing because their output changed, there's some regex fuzzing I can do but it's a red queen's race...) I'm never entirely sure what will and won't break android's testing when I fiddle with my test plumbing, but Elliott pokes me when I do and I can fix it after the fact.

So anyway, I recently added "ls --sort" which needs tests, and first I was converting the existing ls.tests from "testing" to "testcmd", which is another wrapper in scripts/runtest.sh that supplies the name of the command being tested so you don't need to start each command line string with the same command. (Not just to eliminate redundancy, but so it can force testing the _toybox_ command instead of shell builtins and alias trickery by providing absolute path to command as necessary. Otherwise testing "echo" under bash isn't testing toybox, it's testing bash.) I didn't want to change the base "testing" function to do that because sometimes you want to be explicit, ala "VAR=VALUE $COMMAND --blah" or "for i in a b c; do $COMMAND $i; done", and besides: switching them over requires editing the command string to remove the command name, which gives me an excuse to review tests using my standard lazy approach. (Same general idea as the college study advice that taking notes helps because when you write it down you remember it.) But converting entire test files from "testing" to "testcmd" is generally good because the result is shorter and less redundant, and often avoids wordwrapping. In THEORY a nice low-brain activity I can do when I'm not feeling up to much... (Except that when I do review, I tend to find stuff and go off on tangents. It's basically horror movie logic here, there WILL be something. But I'm getting ahead of myself...)

Another entry in the "there's some complexity" pile above is that 1) the name of the command being tested is prepended to the name of the test, so you don't have to repeat it each time, 2) if the first argument to testing is an empty string, then the second argument gets used as the name of the test. So if I say testing "" "-R" ".." ".." ".." it'll go "PASS: ls -R" in the output. (Or FAIL: or SKIP: depending on what happened. They're all the same number of characters so the output lines up either way. And when going to a tty, it's color-coded.)

I converted "testing" to always prepend the command name because I never want it to NOT do that (or at least couldn't think of any use cases), but when I left the first argument of testcmd blank I wound up with output like "PASS: ls ls -R" (or worse, "PASS: ls /big/long/path/to/ls -R") because they were BOTH adding it, in a way that was nontrivial to untangle.

So that's where I went off on a tangent and parked the "add ls --sort tests" todo item last time. And this was AFTER my previous excursion into fixing the plumbing ls was using (so the tests actually ran in an EMPTY directory, so ls didn't keep having to duck into a subdirectory to avoid showing debris; this is the downside of me saying "I can always use more tests" when people ask me "how can I help": Divya meant well but left me with some technical debt to shovel out. The real problem is I'm a perfectionist acting like I'm making Faberge eggs, but if I'm not doing a BETTER job than what's already there, why bother? I mean ok, "licensing", but that's not sufficient reason by itself...)

ANYWAY, with all THAT sorted, I converted the ls tests from "testing" to "testcmd" and now I'm looking at a few of them I noticed are kinda weird. The -N test was actually testing -q, which means back when it went in I didn't review it enough, even though I found one issue right off (which seems obvious: -q isn't the default and -N switches off -b but not -q, so you have to be able to switch -q on first to tell?). And now that I'm going back to try to PROPERLY test it (it turns off -b but not -q, because gnu/dammit of course), I hit:

$ ls --show-control-chars $'hello \rworld'
'hello '$'\r''world'
$ ls --show-control-chars $'hello \rworld' | cat
world

What is this "shell escaping but only to tty" nonsense? Does THAT have a new command line option? I actively do not want to implement it, because it's STUPID. What is the POINT of doing SHELL ESCAPING ONLY TO TERMINAL OUTPUT? And if you're going to do a $'' wrapper why not just have it be around the whole thing? Why have MORE THAN ONE QUOTE CONTEXT IN THE SAME OUTPUT? What is WRONG WITH THESE PEOPLE?

I miss the days where the gnu/dammit clowns failed to add -j support to tar for 5 years because the gnu development had completely stopped. Everybody just had a standard patch they added and it was all good. That was the period during which the gnu tools became actually popular in Linux, when they WEREN'T CONSTANTLY BREAKING NEW STUFF.


April 5, 2023

And I rebooted my laptop, losing all my open windows. Not because of any hardware or OS thing this time, but because I was working out test plumbing (to fix gentoo's inability to reliably run the toybox test suite when they build the package, by letting tests request which filesystem they run under), and I did "mount blah.img sub && cd sub && umount -l ." and SOMEHOW instead of unmounting the new loopback filesystem the debian host umount command unmounted my /home partition out from under every running desktop process. So that's nice.

That's a "press the power button and hold it down until it does the unclean override sudden power down" thing. THIS is why I like to test this sort of infrastructure in qemu instances. And of course thunderbird doesn't retain emails in the process of being composed the way kmail did, nor do my 8 gazillion terminal windows+tabs restore their state to show what I was in the middle of...

The design reason I was doing that is a test should be able to go "force_filesystem ext234" at the start and if "stat -fc%T ." says that's not what we're currently using then it should dd if=/dev/zero up an image (because loopback mount can't use sparse files so truncate -s won't work here, although it's transparent to an emulator so qemu can eat one just fine), mke2fs it, loopback mount it in a directory, cd into that directory, delete the loopback file and lazy umount the directory (so both the file and the mount point get freed when the last process using them does; the mount pins the file's inode, and our test process has the mount pinned as its cwd, but no matter _how_ the test exits it can't leave the mount lying around on the host afterwards), and then the rest of the test proceeds as normal and then does a cd out of the directory as part of normal cleanup.
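
In shell the fixture is something like (a sketch of the sequence, glossing over sizes, error checking, and the fact that loopback mounts want root):

stat -fc%T .     # not the filesystem the test asked for, so:
dd if=/dev/zero of=img.ext2 bs=1M count=64
mke2fs -Fq img.ext2
mkdir dir && mount -o loop img.ext2 dir && cd dir
rm ../img.ext2   # the mount pins the file's inode
umount -l .      # our cwd pins the mount
# ... run the test here, then cd out and both evaporate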

Except of course the gnu/dammit stat command says "ext2/ext3" instead of a proper driver name. Gotta add filters to the parsing because they got "clever" in an actively harmful way. Toddlers "helping" in the kitchen, minus the learning part.

And now that I've rebooted, chromium stopped working with slack, which is now a fullscreen "you can switch to a supported browser or you can install our data mining app but otherwise fuck you" page. I asked on the #devuan channel which says it's a #debian issue because chromium 90.0.4430.212 is the newest version in the "oldstable" category (which devuan beowulf lines up with), so now I'm asking on the #debian channel. There is a "backports" repository, but this package isn't in it.

I've heard that chromium is near-impossible to compile from source, so I'm not TOO surprised that debian can't get it to build in older environments. The standard Google problem of their code both having a zillion dependencies and being completely unportable to the point they care about the specific dot-release of each dependency. Sigh. (And hermetic builds are technically a move to be LESS accepting of variations in build environment. You deploy the one true build environment to target, because building in an emulator on a provided image is slow. I care very much about reproducibility from first principles. This does not appear to be a common viewpoint.)

But needing to do a major OS version update to regain access to my household slack on my laptop? Sigh. I might need to start caring about firefox again. (It's a "household slack" with Fade and Fuzzy, but half of what I use slack for is cut-and-paste of URLs to my phone, and running another OS in a vm to run chromium in there means I'd have to get cut and paste working in kvm, which... enough ratholes for one day, thanks.)


April 4, 2023

Had to go to the hospital to have a piece of glass professionally removed from my foot. Not my most productive day otherwise.


April 3, 2023

One of those "spent all day trying to get in the headspace to do productive work, and didn't" days. Cooking and cleaning and generally being Fade's housewife worked out ok. And I did actually invoice the middleman. (Yay!)

(Why does avoidance productivity either put me into DO ALL THE THINGS mode or else completely stop all work, with no middle ground? This wasn't even tax paperwork, it was "resubmit an invoice". Which yes I had a bad bureaucratic experience with 3 months ago, but seriously...)


April 2, 2023

The mkroot dynamic build (which there's a waiting user for) SEEMS simple, but the current script is using a "cc --print-search-dirs | xargs cp -a $TARGET" approach that winds up populating the target with over a dozen gigabytes of crap which will NEVER fit in a ramfs, and is big enough that repeatedly doing the build seems likely to noticeably shorten the life of my laptop's SSD. (And that's after I fixed the "it's copying symlinks" problem that cropped up in an OS version upgrade; before the fix it wasn't that big, but it didn't reliably work either.)

My first stab at cleaning that up was "copy everything to target then sort the hashes, use them to compare files and hardlink together what's identical". Which cuts the space in half but the result is still multiple gigabytes and doesn't reduce the disk thrashing at all (the files still get copied before being discarded).

Now I want to dig up my old "run ldd on each file on target to get the list of libraries actually in use and copy just those into the new chroot" approach that I had a bash script for even back in the busybox days, code which I recently removed from toybox in theory because mkroot superseded it, and in PRACTICE because I need ldd in the $PATH to make that work, and when I mentioned my desire to add that to toybox Elliott had kittens. (I still don't understand WHY. He doesn't have to enable it for android, but it's a thing I personally have an immediate use case for. Copy this and the library files it needs, and the library files THAT needs. Do it recursively but skip ones already present on target, which prevents endless loops. My first script to do that was in 2001, back when I first put together a tiny boot image with binaries harvested from the distro I was running. I want to say... Red Hat 6?)
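
From memory, the shape of it was (a sketch, not the removed script; $ROOT is the new chroot directory):

copydeps()
{
  local lib
  for lib in $(ldd "$1" 2>/dev/null | sed -n 's/.*=> *\(\/[^ ]*\).*/\1/p'); do
    [ -e "$ROOT$lib" ] && continue  # already copied: this stops the loops
    mkdir -p "$ROOT${lib%/*}" && cp "$lib" "$ROOT$lib"
    copydeps "$lib"                 # libraries have dependencies too
  done
}
copydeps /bin/sh  # and so on for each binary going into the chroot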


April 1, 2023

Sigh, need to invoice the middleman. I can do it on monday. (It's not EXACTLY rejection sensitive dysphoria, "submitted paperwork that got bounced because weird politics, reluctant to do it again" is at least in PART me wondering "do they want to hold on to the money because their finances are dire enough that having it in their bank account reassures THEM, and if so will they actually pass it on once the arbitrary limits they invented are satisfied?" A bank limiting withdrawals to $3000/day is not a healthy bank. Their behavior is NOT A GOOD SIGN, and I'm reluctant to put pressure on the broken thing and see if it hurts... but I'm trying not to invent an unnecessary crisis here either, and "I last got paid in October and we just sent more money to the IRS than I've ever paid for a car" is ticking audibly...)

I suppose the flood of "april fool's" nonsense online shows that people are feeling better? It went away pretty much entirely during the Rump administration because geriatric fascists shouting "fake news" to discredit their opposition by loudly and repeatedly asserting that anything they didn't like simply couldn't be true (the previous nazis called this the "big lie") made anybody ELSE not scrupulously telling the truth and fact checking everything they could... kinda inadvisable. It's still really annoying, and seldom even slightly funny. Oh well.

Going through the github requests trying to find simple things to close, but I've just been overcomplicating stuff recently.

For example somebody requested the "shuf" command yesterday, which I added, but I spent a while arguing with myself over whether it should use random(), lrand48(), or getrandom(). In theory the randomest is getrandom() but each call consumes kernel entropy, which seems overkill for something like this? And it's _wasting_ entropy because it returns whole bytes and I then have to chop it down to just what I need, the common case of which is what, 500 entries?

Initializing a prng from a proper entropy source is a classic middle ground for a reason, and the _easy_ way to chop a randomness source down to a specific integer range is modulus, which will introduce bias unless the modulus is MUCH smaller than the range of the random number (so the uneven coverage of the last wraparound is statistically insignificant).

In the end I just went with srandom(millitime()) and then random()%count which is good enough. (And the trick to make it efficient is lines[ll] = lines[--TT.count] because if you don't care what order the not-yet-used entries are in, swapping the last one down into the hole you just left avoids the memmove() you'd do to close the hole while keeping them in order, or any sort of usage bitmap nonsense.)
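
The same swap translates to bash arrays if you want to watch it run (illustration only: $RANDOM is 15 bits, don't shuffle anything big with it):

lines=(one two three four five)
count=${#lines[@]}
while ((count)); do
  i=$((RANDOM%count))
  echo "${lines[i]}"
  lines[i]=${lines[--count]}  # last unused entry fills the hole
done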


March 31, 2023

What?

FAIL: chmod 750 dir 640 file
echo -ne '' | chmod 750 dir 640 file &&
  ls -ld 640 dir file | cut -d' ' -f 1 | cut -d. -f 1
--- expected	2023-04-01 02:57:10.424197685 -0500
+++ actual	2023-04-01 02:57:10.428197685 -0500
@@ -1,3 +1,3 @@
--rwxr-x---
 drwxr-x---
 -rwxr-x---
+-rwxr-x---

Sigh, I broke ls with the new --sort stuff, because when I reused the -A and -d flags I didn't UNSET them from the base set, so ls -d no longer produces output in the same order. Oops.

Of course I left myself a todo about this: a pet peeve of mine is --longopts without a corresponding short option are un-unixy, and the new short options I defined for the new --sort types that didn't already have any were -! and -? (except ? is a shell wildcard and CAN occasionally misbehave: an old annoyance with "qemu-system-mips -M ?" to list available machines is you need to quote the ? if there's a single character file in your current directory; the reason the magic . and .. files don't count here is wildcards won't match hidden files unless the first character is an explicit period... Ahem. Anyway, maybe -~ would be a better option, since tilde is only special to the shell as the _first_ character of an argument, and ~ means approximately anyway so "case insensitive sort" shouldn't be TOO hard to remember.)

Using punctuation like that means I'm MUCH less likely to conflict with existing or future gnu nonsense. (The cut -DF support STILL isn't upstream in coreutils, last I checked. I should poke them again...)

So I need to grab the extended argument parsing plumbing I added WAY WAY BACK while working on mkdosfs, which wanted -@ to set the offset and I added a whole mess of lib/args.c and scripts/mkflags.c plumbing to allow that. Which I checked in and tested and everything, but the only user of it is still out of tree in my local pending directory because I got distracted and still haven't finished mkfs.vfat. So, how does it work:

Take your ascii character value (@ is hex 40), set the high bit to turn it into "high ascii", turn that into a good old K&R C octal escape circa 1976, and include the octal escape in the option string: for -@ it's "\300". The FLAG macro you get is FLAG_X followed by two hex digits, in this case FLAG_X40 which means FLAG(X40) should work.
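
Sanity checking the math with a throwaway sketch (not toybox code):

#include <stdio.h>

int main(void)
{
  char c = '@';  // ascii hex 40

  printf("optstr escape: \\%o\n", c|0x80);  // high bit set: prints \300
  printf("flag macro: FLAG_X%X\n", c);      // prints FLAG_X40
  return 0;
}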


March 30, 2023

I didn't get a haircut before I left Austin, and Fade's suggestion is next to her office which is a half-hour walk each way, but I could use the exercise. That and visiting the Tiny Target next door ate the whole morning.

Fade pointed me at a lovely little study room off the side of one of the light courts in the apartment, which would be perfect for recording tutorial videos. It's also a nice place to get away from Endlessly Barking Dog.

I want to write up a quick mkroot explanation for the qemu guys (who are testing with my mips and mipsel mkroot images, yay!) but alas, it's not quick and simple. It SHOULD BE, but I'm having Pascal's Apology again. Which is why I need to record a video tutorial for this. (Hearing it out loud helps me get a written version to be concise and intelligible too.)

Whole bunch of work, that.


March 29, 2023

Recovery from travel.

A QEMU thread has me rebuilding all the qemu targets again, which is a bit of a time sink.

It's kind of hilarious that Ubuntu doubling down on Snap and SuSE doing a flatpak distro came out the same day. Snap is Ubuntu's proprietary version of flatpak the same way Ubuntu had the upstart init system, unity desktop, mir 3d compositor thingy... Ubuntu is run by a white male billionaire from south africa, not a whole lot of "listening" or "following" going on there. It's a real pity that their move to replace /bin/sh with the Defective Annoying SHell was swallowed by Debian, but Debian also switched to systemd which was approximately as stupid. Being at the mercy of a billionaire's whims can't be comfortable. (Debian has an unfortunate history of FSF-adjacency, which means its development got so flamewar constipated the project almost died with many years between "debian stale" releases, and Canonical hired at least one full-time developer to shovel out the mess on the engineering side of _debian_ (not ubuntu) because the open source project he'd overlaid his proprietary project on going under would have been embarrassing. This was also the period where fleeing Debian developers squashed Gentoo, which that distro never really recovered from...)


March 28, 2023

Travel day. Onna Airplane.

I'm carting two pieces of checked luggage, the first of which is a suitcase inside a suitcase so I can fill one with Japan Loot for the return trip. I've probably missed Milk Seafood Ramen season (to Fuzzy's great disappointment), but I also left a bunch of clothes and books and stuff in the apartment when I left (meaning to return) over the pandemic, and it got packed away into storage and I should reclaim it.

The second piece of luggage is the GIANT CARDBOARD BOX with the oscilloscope Jeff sent me in the middle of the pandemic. Apparently the stuff they made 50 years ago is way better than the stuff you can get today, because nobody makes analogue waveform storage anymore and the digital equivalents are hundreds of thousands of dollars IF you can find something sufficiently high resolution. So when one comes available at a good price (usually because the people who knew how to operate it retired or died, and the inheritors don't value something they can't use), he snatches them up. This sort of thing can record signals for DDR3 and USB3 busses. We've done LPDDR2 and USB2 already because the cheap digital stuff can keep up with that, but anything faster gets expensive rapidly to see what's actually going across the wire.

Analog storage is the same general idea as a mercury delay line: giant capacitor that reproduces the wiggles of the input in its output, then you can loop it back on itself to retain the signal for a while. It is INSANELY high accuracy, calibrated with NASA-style equipment that sadly doesn't exist anymore. The downside of the analog stuff is A) the stored signal only lasts a few minutes, B) there's a hard cap on the SIZE of the capture because the delay between input and output is fixed and you can't record more than that at a time.

(When Scotty stored himself in a transporter pattern buffer for decades, the technobabble description was a bit like this. And in the Dr. Who episode Timelash, the 6th doctor used a McGuffin based on this principle to hit the bad guy with his own zap gun. Modulo "no, it's still totally murder when you're counting down like that while pointing the output at him, you could have just turned it to face the wall; there might be a limit to self defense when the dude is literally begging", but that's the kind of writing Colin Baker suffered under. At least he never had to deal with Yellow Kangs or the Kandy Man.)

Anyway, giant box under my desk in the bedroom became giant box in bedroom closet which is now giant box in Fade's apartment, which I hope to convert to giant box somewhere in japan that is no longer my problem. MAILING it to japan would cost hundreds of dollars (more than Jeff paid for it, that's why the seller only offered US shipping, not even to Canada), but it's just under the weight cap for checked luggage, and they don't charge extra for it being _bulky_. (The weight limit is a health-and-safety thing, maximum weight workers can be expected to individually lift between conveyor belts lots of times per day. Anything heavier than that requires two people to lift for liability reasons, and thus special labeling and handling procedures, and generally gums up the works trying to load and unload the plane quickly.)

No idea when Jeff plans to fly me to Japan, but hanging out with Fade until then. Disposing of Giant Box is a nonzero portion of the reason I agreed to the Japan trip. (I also really LIKE tokyo, and it would be nice if the stuff I worked on for years actually got launched out into the world, although toybox comes first these days thanks to Google.)


March 27, 2023

Bunch of errands today. Four bus rides.

I've meant to switch credit unions for years now, and as long as I was going to be down at UT after 9AM anyway... When I walk down there I'm seldom still around that late, because I head back around sunup or I get all hot and sweaty, plus I can't see my phone display in full sunlight. And then I needed to go from UT up to The Domain to close the old credit union account, because that's the closest remaining Amplify location. University of Texas credit union has 2 locations and 2 ATMs within a half hour walk of my house, and two more at the university from a bus that picks up within sight of my driveway. The closest remaining "Amplify used to be the IBM Texas Employees Federal Credit Union but renamed itself" location is eight miles away, a two bus minimum each way or something like a $40 round trip on lyft WITHOUT surge pricing.

Fresh full backup of my laptop to USB drive. This SSD is old enough I'm occasionally checking dmesg to see if it's started to get unhappy about stuff. (Shouldn't, but I can be hard on things...)

Huh, corner case in the toybox test suite. So the general theory of toybox tests is a file full of testing 'name' 'cmdline' 'result' 'infile' 'stdin' lines (each is a call to a bash function) where the first argument's the name of the test to print on the PASS: line, the second argument's what to run, the third is the stdout output to expect, the fourth is data to write into a file named "input" (which only gets created when that's not blank), and the last is what to feed into the command's stdin.

Three complications to this: 1) The 'name' has the name of the command being tested automatically prepended to it so you don't have to repeat it every time, 2) there's a wrapper function testcmd which inserts the name of the command we're testing into the start of the 'cmdline' argument so we don't have to repeat it (and it makes sure we call it out of $PATH instead of a bash builtin by providing the absolute path when necessary), and 3) if the 'name' argument is blank it uses 'cmdline'.

The problem is that if you leave 'name' blank in testcmd it prepends the command name TWICE. Once when testing() prints the PASS/FAIL/SKIP line, and once in the testcmd() wrapper.


March 26, 2023

Got tired of waiting for Jeff to actually schedule a trip, and got a plane ticket to visit Fade up in minneapolis. (If I'm flying to tokyo, it should be from there.)

This means I have SO MUCH TO DO before then. Laundry! Fresh full backup of my laptop! Toybox todo items I should flush up to github... And it means I should NOT walk to the table tonight, because then I won't get anything done during daylight hours tomorrow because sleep schedule. Alas...

Fuzzy's birthday was on the 20th and we ordered her an Oculus 2 so she could play beatsaber, and it does not work. So we're returning it, which means I need to drop it in the return box at the Amazon lockers in Gregory Gym (the building with two first names), and I thought that was my excuse to do my 4 mile nightly walk watching anime on my phone despite the earlier "I shouldn't do that for schedule reasons"... but the building doesn't open until 9am. (I'm currently on a night schedule. The flight tuesday's noon-ish. Gotta impedance match between now and then.)

If I'm planning to be at the university during daylight hours I should get a new credit union account at the UT credit union on Guadalupe. Which means I should also close down my old Amplify account (which used to be IBM Texas Employees Federal Credit Union before they moved entirely out of Austin up into the northern suburbs. The closest location left is in <snootiness>The Domain</snootiness>, which is 8 miles from my house, an hour away by bus or bicycle. All their closer locations closed years ago.)

I deleted the Google Maps app off my phone screen back when it turned into all advertising all the time and stopped showing me black owned businesses (such as the haircut place I regularly go to in Hancock Center) even when I zoom in all the way, but sometimes I still need to see how far it is from point A to point B and what bus to take (and/or when things open, which it's never been quite right about since the pandemic), and when I do that I'm using the web version on my phone. Here's the SERIES of bugs I just hit in Google Maps' web version: enter the two addresses, hit the arrow on the keyboard to actually search and... it doesn't do anything. Plus it's scrolled itself to the right in a way that won't let me scroll back left so I can see the start of what's written on the page. And when I rotate it from landscape to portrait mode in hopes it resets itself... it loses track of the addresses I entered to ask directions about. It loses track of the location I was looking at, and instead resets itself all the way back to zoomed out full city view. That part's trivially reproducible, does it every time. Ask directions, type in the first address, rotate the phone, and the page undergoes a hard reset losing all context. Bravo Google. Your own browser in YOUR PHONE can't handle your website. That's... *chef's kiss*.

Anyway, from UT to The Domain is one bus (the 803). Yay. I should do that. (I don't want to give Patreon and such the banking info for the household account. I'm still paranoid about combining "money" with "internet".)


March 25, 2023

I got the ls --sort stuff checked in but not properly tested. Confirmed it didn't cause any obvious regressions in the test suite, but then got distracted by the whole Microsoft Github clusterfsckery trying to check it in. Had to delete the man-in-the-middle key four times before it stopped complaining. (IPv6 is not fun.)

Hmmm, tests/ls.test is ugly. Each test is bracketed with "cd lstest && $TEST && cd .." because otherwise the "expected" and "actual" files wind up in the current directory listing, and hence the output of most tests. The first is the output the test is expected to generate (argument 3 to testing()) and the second is the file that output currently gets redirected to; they're files so we can diff them and naturally get useful labels on the results. There's a fourth file, "input", but these days that's only created when testing() argument 4 isn't blank.

I suppose I could move them up a directory level? Because the action's taking place in generated/testdir/testdir, with the first "testdir" being where temporary binaries we're testing live. Since none of them are called "expected" and "actual" it shouldn't conflict if I use it as a work directory. (Modulo whatever Android's doing to use this test infrastructure, I THINK it should be ok? They use my scripts/runtest.sh but not my scripts/testing.sh which sets this up... Sigh, I should poke Elliott, shouldn't I?)

Walked to the bat bridge instead of UT, 25k steps total instead of just 10k, but my back was killing me when I sat down on the couch in Jester Center and I didn't get anything done. (They're kind of terrible faux leather couches on the second floor, mostly there for show I think, and my lower back's been unhappy since I slept on it wrong a few days ago, like a crick in the neck but older and more decrepit. I _really_ don't want this to become chronic because it wouldn't just suck, it would be CLICHE. The difference between being 15 and being 50 is problems resolving in about 8 minutes vs problems resolving in about 8 days. Lots easier to fall behind on cumulative wear when it's not clearing itself nearly as fast as it used to.)


March 24, 2023

Finally dug up an old-style micro-USB cable that WASN'T a charger cable but actually did data, so I can see the serial output on the turtle board. It works fine once I got a cable, but the linux-kernel I built and released last time does not work at all. (No output to serial once the bootloader hands off to it.) The one the sdcard had on it was linux-5.10 (dunno if I tested something newer since, that's just the reference version I know works), so there's some bisecting to do.

Huh, the musl-cross-make toolchain rebuild I did with gcc 11.3 earlier this month didn't build the sh2eb cross compiler because libgcc/unwind-pe.h had an error: '_Unwind_gnu_Find_got' was not declared in this scope which... I mean clearly it's a gcc bug, but what exactly broke? (It built sh4. Is this a nommu thing?) How do I track that down... What I _want_ to do is bisect it in the git repository, which is tricksy. It's slow to build gcc at the best of times, and mcm with my wrapper script doesn't do partial compiles.

I'm kinda tempted to compare the Linux From Scratch chapter 5+6 build script with musl-cross-make and just do a toolchain build script. If I have to fish out my own patches to make the build work _anyway_... I did that in aboriginal linux, this time it should probably be a proper project all on its own.

That's already 2 nested tangents from what I'm TRYING to do.


March 23, 2023

Got the LFS chapter 5+6 script building to the end. No idea if the result's actually useful yet, haven't done the chroot and started the second script. For some reason following the current LFS instructions, half the new commands _aren't_ in the /tools directory? They're in the normal paths. What's the point of the airlock step if you do that? I has a CONFUSED...

Alas, my initial naive attempts to run record-commands to get a log of the host commands called for this build script... did not work. I need to update scripts/record-commands until it works right out of the box even when I haven't looked at it in 6 months and don't remember how I'm "supposed" to use it. (For one thing, it calls scripts/single.sh to build the log wrapper. It should check if "toybox" is already in the $PATH and symlink logwrapper to that if so, and only try to build it if it can't. Otherwise, you can't use it from anywhere OTHER than the toybox directory...)

I also have a github bug request from somebody who did scripts/mkroot.sh and then couldn't "ping" anything because glibc is crap at static linking. Um, yeah. That's why I added a "dynamic" script, but I've updated devuan since the last time I poked at that and now it's copying a bunch of symlinks into the target, including absolute paths outside the chroot. Unfortunately, when I add a -L to the cp -a the result is 1.7 gigabytes of usr/lib space because glibc is an insane pig, so I need to hardlink them back together to get the size down to a dull roar.


March 22, 2023

Back at the table again (I've missed this), putting together a Linux From Scratch 11.3 build script, so I can do the old trick of substituting in toybox commands one at a time and comparing the output to make sure nothing changed. (I should probably diff the config.log as well. To get consistent results I should do single processor builds, but I'm having the script make -j $(nproc) and then I can just "taskset 1" to force that single threaded later.)

Jeff thinks he might wind up flying me to tokyo on monday, but the hard part is working out hotels. It's cherry blossom viewing season there, which coincided with spring break in the states, and it's the first time in 3 years Japan's been open for tourists. The hotel room shortage has not eased up at all yet. Still a big staff shortage. They've announced plans to allow more foreign workers, but it apparently hasn't manifested results yet...


March 21, 2023

At the table, with a can of checkerboard tea. It's been a while. (Ok, I'm at one of the tables NEXT to the original one, working on battery because the outlet's blocked off, and ignoring the construction fencing. But still: same porch, same lighting, same comfortable seating.)

Poking at dd.c because I had the tab open, and... ok, that's a kind of painful use of TAGGED_ARRAY. There's nothing BUT the strings and the position indicator for the strings. This makes me sad. There's gotta be a better way to do that. I'm not sure what that better way IS, but this is ugly...

And now distracted by the half finished ls --sort plumbing, which I have now finished and the result compiled and failed the very first test in "make test_ls". Great.


March 20, 2023

I have done something to my back while sleeping. It's like a crick in my neck, except lower back, and I'm on something like day 3 of this. Reeeeally hoping it doesn't go chronic.

There's an i2c bug report on github that's been... badly explained repeatedly. I think the submitter doesn't have english as their first language, and I have no i2c domain expertise, nor do I have a test environment, which is why I haven't done the normal level of cleanup on this command, which ALSO means I haven't done as much review.

Because writing code is easier than reading code, I tend to rewrite as I go to utilize my far-more-practiced writing code muscles to help with the reading. Yes I know it's a bad habit, and sometimes I throw away the result because it's just marking stuff up in red pen, but that seems a waste with toybox? If I'm gonna clean up the code and think the result is an improvement, I want to check it in, but I can't test for stupid thinko/typo regressions if I don't have a test environment and ANY change can theoretically introduce a regression. I've borked semicolons or bracket nesting levels in code refactoring before (back in my tinycc fork), and the result compiled but subtly misbehaved. Gotta test. CAN'T test. It's a problem. I've USED i2c tools on various boards over the years, but it was all at contracts where I left the hardware behind with the job. My laptop hasn't got it. I don't THINK the turtle board does either but when I just tried to boot it up I didn't get serial console... it uses the old pre-C usb cables and I think this one might just be a charger cable not data? (Why do they DO that?)

I'm poking at qemu to see if that has a good test environment for i2c somewhere, but none of the ones I built did because the kernel hasn't got CONFIG_I2C enabled, and when I switched that on (and CONFIG_I2C_CHARDEV because that's not enabled by the first thing for some reason), then there's DRIVERS: I2C_SCMI, I2C_CBUS_GPIO, I2C_GPIO, I2C_OCORES, I2C_PCA_PLATFORM, I2C_SIMTEC, I2C_XILINX, I2C_MLXCPLD, I2C_VIRTIO... Plus whatever I2C_HELPER_AUTO is for... protocols? I suppose I could just switch it ALL on and see if any of the QEMU board emulations bind to something? Whatever I come up with should probably be added to scripts/root/tests so I can build regression test systems that do this automatically, but first I need to make it work _once_.


March 19, 2023

You can sing "closing tabs" to "closing time".

Trying to collect old superh patches for Glaubitz (the new arch/sh maintainer in Linux), but... there's so much old debris here and I have no idea what's still relevant. I collected lots of groups of 4 or 5 patches at a time and sent them to Rich when he was nominal maintainer, most of which never got applied, but I didn't exactly archive them again afterwards. (Checking back email in my sent box is one of the avenues of investigation here...)

Hah, scp-ing my blog file and corresponding rss file up to the website takes LESS THAN A SECOND with the new router. The old one took long enough I usually tabbed away and came back.

I keep meaning to find a way to post these blog entries to mastodon so people can reply there. This thing is a text file I edit with vi and periodically rsync, with a python script that generates an rss feed based on the lines that start each entry being regular enough (mostly thanks to cut and paste) that the text parsing to chop stuff out and plonk it into wrappers is pretty simple. But there's no WAY I'm turning that into an activitypub feed any time soon.

Mastodon can provide an rss feed, but won't let you FOLLOW an rss feed. Or easily convert an rss feed into mastodon posts at some known @user@server account. (If you google for it there's dozens of weird little projects on github or websites to do-it-as-a-service that seek to address this, but no real winner emerges and Google's search ranking to indicate which ones to look at first has deteriorated into uselessness over the past few months. My wife regularly complains about google becoming useless and she's not a techie.)


March 18, 2023

Blah, I need network block device tests, which is fiddly both because it's a client/server thing requiring root access AND kernel support for the /dev nodes, but also because the server and the client test against each other and "make test_nbd_client" would build just the client and then try to grab the server out of the $PATH, which most likely isn't there. As with the tar --xform stuff needing toybox sed, the test is looking at a _combination_ of toybox commands, which... the current test suite isn't really set up to do. (Well, "make tests" that tests ALL of toybox at once can, but not in a more granular fashion.)

I can have the nbd-client test check that nbd-server is there and fail to run if it isn't, but... the tests are mostly the same on both sides? Sigh, what are the tests:

  1. nbd-client can mount nbd-server device on loopback, read a 4k block from it, write a 4k block to it, flush, and exit.
  2. nbd-server -r: can export a read-only file, client can mount it read only and read from it, client can't mount it read/write. (Or does it fall back to read-only like iso9660?)
  3. nbd-client without -b does default to 4k, and -b 1024 is a different block size. (Different ext2 filesystem mounts care about underlying block size for reasons I'm not entirely clear on, but it gives a failure case to check).

Hmmm, so far most (all?) of the toybox servers are inetd style. I should probably find some way to indicate that in the help text? Ok, sntp isn't because that's a UDP protocol, and figuring out when a UDP transaction is _finished_ is AI-complete. By which I mean "C3P0 could do it, but I wouldn't trust chatgpt near it". There's one of those P=NP things going on with this AI nonsense, where closing the gap is likely to take multiple lifetimes if it can be done.

Possibly I need more lib/net.c code to do a server wrap thing that takes a callback function? Except then my httpd and nbd_server need more command line arguments to indicate the server and port to bind to, which is a UI issue. Hmmm, I need to revisit httpd anyway to add the rest of cgi support. And nbd_server already says "ala inetd" which is funky since I don't have an inetd in toybox.
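
The shape I'm imagining is roughly this (a hypothetical sketch, NOT existing lib/net.c API, with error handling elided):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Accept TCP connections and hand each one to a callback. The
// inetd-style commands skip all this: they just read stdin/write stdout.
void server_loop(int port, void (*handler)(int fd))
{
  struct sockaddr_in sa = {.sin_family = AF_INET, .sin_port = htons(port)};
  int fd, sock = socket(AF_INET, SOCK_STREAM, 0);

  bind(sock, (void *)&sa, sizeof(sa));
  listen(sock, 5);
  while ((fd = accept(sock, 0, 0)) != -1) {
    handler(fd);  // a real version would fork() here (vfork() for nommu)
    close(fd);
  }
}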

Long ago the samsung guys contributed tcpsvd to pending, which doesn't share any code with netcat. It does do a number of things netcat doesn't: limits on simultaneous connections, sets a bunch of environment variables... it also doesn't support nommu (which netcat server mode does), and combining vfork() with the -h option to look up remote connections (which can take an arbitrarily long time) does NOT sound like fun. Um, wouldn't the -b N thing be rendered irrelevant by kernel syncookie support? It's been YEARS since I've looked at that, where does -b get used... no FLAG_b and it's not TT.b it's... Sigh, count the arguments: TT.bn. Am I going to have to clean this thing up just to properly EVALUATE it? Grumble grumble... I really dislike duplicate infrastructure, but at the same time netcat doesn't track multiple children. Plus this one hasn't got the "cat" part, it's always setting up filehandles and leaving the reading and writing of them to a child process.

Hmmm... I suppose I could clean it up and potentially merge them _later_? They have a hand-rolled hash table implementation. It's doing an error_exit() on recvfrom() errors. Is there a UDP packet you can send that DOSes the server? (I remember TCP out of band data, but not UDP?) Why is it using sigemptyset/sigsuspend instead of just pause()? Does tcpsvd MEAN to write a trailing nul byte on the message part of -C COUNT:MESSAGE or is this an accident? (What do other implementations do? Is there a spec? Sigh, break down and look at what busybox does: no they do not have a trailing NUL byte, and they use nonblocking send() instead of write, which seems kind of important. Although I could probably fcntl(F_GETFL/F_SETFL) to set O_NONBLOCK, but why when send() exists? I could also check MTU length vs the message, but again... simple thing.)
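
(For reference, the two flavors of nonblocking being weighed there, as a sketch:)

#include <fcntl.h>
#include <sys/socket.h>

void nonblock_two_ways(int fd, char *buf, int len)
{
  // Sticky per-fd flag: every later read/write on this fd is nonblocking.
  fcntl(fd, F_SETFL, fcntl(fd, F_GETFL)|O_NONBLOCK);

  // Per-call flag: just this one send() is nonblocking, fd state untouched.
  send(fd, buf, len, MSG_DONTWAIT);
}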

Oh this looks like a long cleanup. And learning domain expertise. Why am I opening another can of worms when I'm trying to CLOSE TABS again?


March 17, 2023

Got the new toolchains built with gcc 11.2, the patch worked and I should poke dalias about merging it into musl-cross-make. (It's a backport, this should not be controversial to upstream? But then I felt that about the kernel patches. Oh well, it's on the #musl backscroll, maybe he'll notice...)

Built scripts/mkroot.sh CROSS=allnonstop LINUX=~/linux/github followed by scripts/test_mkroot.sh and everything except sh4 and the "no kernel" targets (armv4l armv7m microblaze mips64) passed, and the sh4 problem is the qemu+kernel clock issue (that emulated board isn't getting a battery backed up clock, and I ran the test without my laptop connected to the net so it can't set the clock from NTP).

So the new toolchain's working as well as the old I guess? More warnings, such as 'sprintf' argument 4 may overlap destination object 'ifs' in sh.c which... Ok, I can see an "even more optimized" version getting that wrong and I should maybe switch that to memmove() but first I should refresh my "what data is in which variable" mental working state which implies I should have a LOT more comments here (and possibly rename some variables) but reading through this code I did a couple quick simplifications but NO I have like FIVE DIRTY VERSIONS OF THIS FILE to collate already (I was working on this at Fade's last month, where was that... I _just_ dirtied the toybox/toybox file which had previously been clean, where's the recent... not in clean, not in kleen... it's in kl2). Ahem: NOT NOW...

Yay, somebody who seems to know this i2c stuff finally piped up on the confusing bug there. I still haven't got a test environment, and "get a raspberry pi working" is not ideal there. (I've been meaning to do that forever: their bootloader needs horrible proprietary blobs to bring the system up, the hdmi+keyboard setup in front of the TV is awkward and the connections buried and I haven't got a hdmi monitor for the desk in the bedroom (been trying to get out to Discount Electronics to buy one for months but they moved 5 miles further away, up near where Fry's used to be), and the only non-broken pi case I have is in use on my turtle board... Sigh, I should sit down and do it anyway.) So many tangents.

Speaking of tangents, the recent "cpio -i extra garbage arguments" thing really SHOULD have them be extract filters, and opening cpio I see I have that "cpio skip NUL" test still not passing on the host, and a TODO about hardlink support since that's what the TRAILER!!! entry actually flushes (the cached hardlink detection), which means I really should try to get the other mother to sew buttons onto the hardlinks in a test directory to confirm what the output looks like, and then confirm the kernel's consuming it the same way AND poke the people who were talking about adding xattr support to initramfs...

Sigh. Pull a thread in this jenga tower...

One of my early posts to mastodon was reminiscing about how the INTENDED use of a Tardis in Dr. Who seems to be for very long-lived Time Lords to bog off to deep space or some deserted beach for a few years while they catch up on STUFF, and then return 5 minutes after they left actually caught up on all their reading and browser tabs and todo lists without the society around them moving on so they missed anything or accumulated new todo items while they were gone. And the Doctor got in trouble because using one to go to a planet and interact with people was Doing It Wrong.

Yeah, 500 years between regenerations (both because the second doctor said he was about 450 years old and the eleventh lasted about that long in his little exile town), the ability to pause the world for a decade at a time in a nice quiet workshop area with kitchens and libraries and swimming pools and long corridors to walk down... I can definitely see the appeal.


March 16, 2023

The coreutils guys have got their knickers in a twist about new gcc releases breaking trying to build existing packages again, and rather than go "our code didn't change, yours did, this is your bug", they're capitulating because gnu. And providing horrible emacs examples.

Anyway, I should probably try newer gcc so I'm at least not surprised and can have -fno-stupid-thing workarounds prepared for fresh compiler bugs from C++ loons? The current musl-cross-make git version has gcc 11.2.0 as its newest toolchain... And it broke. The new version can't even do a canadian cross:

from ../../../../../src_gcc/libstdc++-v3/src/c++17/floating_to_chars.cc:31:
build/i686-linux-musl/i686-linux-musl/obj_gcc/i686-linux-musl/libstdc++-v3/include/fenv.h:58:11: error: 'fenv_t' has not been declared in '::'

The line in question is "using ::fenv_t;" which can't possibly be a good idea.

The fix is to tell the libstdc++ build not to include the standard C++ headers in its search path. No really! Adds a compile flag. (And according to heat on the #musl irc channel, that's what got merged upstream.)

No wonder each new release breaks. It failed to build itself with itself, and THIS SHIPPED.


March 15, 2023

Weekly call with the J-core engineering team. Still no word about actually going to Tokyo. The tourists are back, it's sakura season. You'd think the overflow of hotel rooms from the olympics would mean they aren't all full, but having plenty of ROOMS does not mean having plenty of STAFF to service those rooms, and everybody got laid off during the pandemic. Japan does not have extra people in general these days (under the age of 60, anyway), and the covid restrictions allow tourists back but not yet foreign workers to run cash registers. It's apparently a problem, but there are worse problems. (The "nobody has any money, everything's going out of business" problem has at least been arrested by the return of the hordes of tourists. Although a lot of individual shops didn't survive.)

The new router arrived, and we eventually got it set up without installing any apps. It is SO much faster than the little white circle from Google (and the signal strength bar is green rather than yellow when my laptop's on the desk in the bedroom), although we haven't gone all "office space printer" and smashed the google circle with a hammer yet because we're giving it a few days.

The fiber connection itself is actually quite nice: the router SUCKED. The _service_ is mixed: why can't we get a static IP for less than twice what we're paying for the connection now? It's LITERALLY THE SAME SERVICE with a trivial config tweak. Wasn't the whole point of IPv6 that even if you can't get a stable IPv4, everyone everywhere could have a stable IPv6? But no, they want to capitalism at us.


March 14, 2023

Downloaded a fresh LFS book, and the magic all-in-one source tarball which should probably be more well documented, and I should automate another build and then try to get mkroot to do it. I can insert a toybox dir at the start of the $PATH and switch over commands one by one, just like I did with busybox back in the day.

Alas I'm not feeling inspired, because I have too many open tabs. Closing tabs tends to be hard because they're all only still open if I didn't get them closed last time I sat down at it. But starting anything NEW just makes it worse. And it's deep into "if I work on anything specific I'm not doing anything ELSE" territory. Generally a sign I'm still undervolt. (The cedar pollen is not helping.)


March 13, 2023

Still not feeling great, but I should do stuff.

I reached the point of editing and uploading blog posts where the entire entry for Feb 22 is "Oh god, kernel people." I know exactly what that's about but... really don't WANT to expand it? For the same reason I stopped replying to the kernel threads. Can I just use old kernels? I want them to stop breaking stuff that USED to work.


March 12, 2023

Sore throat, couldn't sleep. Spent most of the day huddled on the couch.

Tried to watch the "campfire cooking" isekai with Fuzzy, which Did Not Work because of the stupid Google router continuing to die. (I tried associating my phone with Google's router to save bandwidth for like five minutes when I got back, and then undid it again because even when T-mobile is throttling me for going over my 50 gigabyte monthly quota it's STILL WAY FASTER THAN THAT STUPID ROUTER.)

This was finally enough for us to break down and get a new router. It was $50 cheaper to overnight a netgear from Amazon than to buy the exact same router at the Best Buy a fifteen minute walk from here. So far it looks like it needs an app installed on somebody's phone to set it up (the card in the box says what app to install, or gives a URL to talk to a support being; no other instructions), so we haven't actually swapped it in yet, but there SHOULD be a way to talk to it directly...


March 11, 2023

Sore throat. Kind of lurgy-ish. Trying to figure out if this is allergies or dryness or microorganisms. Possibly it's a team effort. And, of course, I'm old.

Jeff got his contract signed, which means I may be heading back to Tokyo to help him organize the giant archive of stuff we did so it can get spliced together into a new product. Historically speaking, I can do toybox stuff from tokyo MORE easily than from Austin (he hates Apa hotel rooms, I find them just about my platonic ideal of a work environment, with a conbini downstairs for lunch rice balls), so...


March 10, 2023

We rebooted the Google Fiber router yesterday because it had become unusable again. Today it's already bad enough that reloading the household slack tab (after a "pkill -f renderer" because chrome was taking up too much memory again) did the ?cdn_fallback=1 thing, then added ?force_cold_boot=1 for the third attempt, and then timed out saying it couldn't contact slack.

I don't mind google.com taking 7 seconds to load nearly as much as I mind being completely unable to use some sites, or thunderbird pausing for ~3 seconds between each email it downloads via pop3 (meaning a 400 message download takes over 10 minutes, so downloading my ~1500 daily messages is a background task that takes over half an hour).

Capitalism's really BIG failure is externalities. Engineers should be forced to dogfood their own products. I want THIS router put on the desk of the person who designed it, with all their traffic going through it, and to be forbidden from rebooting it for a week.

And yes, I'm happy to dogfood toybox. The main reason I don't already is I want a feel for what the other versions do so I can make toybox roughly match it. (When your frame of reference is your own output, it's really easy to spiral off into the weeds.)

Much wrangling with cpio, trying to fix three different issues. Got two of them fixed, calling it good enough since the third isn't a regression and nobody's waiting for it. (That's the "TEST_HOST fails, when did that start?" Moving targets...)

Pondering (st.st_mode&S_IFMT) == (mode&S_IFMT) and wondering if the compiler is smart enough to turn that into !((mode1^mode2)&S_IFMT) or if that's even a win. (3 operations vs 3 operations, although ! is only an operation sometimes? It could also go r1 = S_IFMT; r2 &= r1; r3 &= r1; branch-not-equal r2,r3 or some such. The repeated constant is PROBABLY something the compiler can handle for me, I don't need to go "that's redundant, I could rephrase it in a way it's not stated twice" and then ponder whether or not that's actually an improvement.)
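
(The two spellings, as a sketch:)

#include <sys/stat.h>

// xor leaves 1 bits only where the operands differ, so masking the xor
// with S_IFMT tests the same condition as masking each side and comparing.
int same_type(mode_t a, mode_t b)
{
  return !((a^b)&S_IFMT);  // same as (a&S_IFMT) == (b&S_IFMT)
}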

Ahem: premature optimization. Back away slowly.


March 9, 2023

Ok, I _think_ for the help fixes: "toybox --help COMMAND" should print Elliott's advertising line and "toybox help command" should not, and "toybox --help" is equivalent to "toybox --help toybox", but "toybox help" is equivalent to "toybox help help".

This is all UI stuff, so there isn't a right answer, but I'm trying to come up with an answer that makes sense for me without obviously disappointing anybody else.


March 8, 2023

I have 8 zillion accumulated 80/20 patches where I've done most of the work and then hit "does this cover all the cases, what ARE all the cases, and what are all the test cases I need to put this through to prove that" and I can't quite work that part out. I'd very much LIKE to check all this stuff in, but making sure it's _right_ is hard.

The sad part is I keep trying to grab low-hanging fruit, finding out the thing is not low hanging fruit, parking it at a good "almost finished but not feeling up to finishing just now" parking spot, grabbing OTHER presumably low hanging fruit, and then coming back a couple weeks later and having to reconstruct my mental state from scratch.

The external bug reports are actually easier to field because somebody else is waiting for me to finish and I can tell whether or not I've fixed their test case.


March 7, 2023

I'd like to have a nommu test system that runs under qemu, and "coldfire" (an m68k variant) is the oldest of the lot. The problem I had back under aboriginal linux is none of the nommu board emulations had the complete set of hardware devices I wanted (256 megs RAM, battery backed up clock, serial I/O, two block devices, network card), but most things have a serial console and I can fake the clock with sntp or an environment variable, and if I have a network card I can use network block devices. It's not ideal, but it's _something_. Alas I can't use swap on nommu so a board with only 64 megs ram isn't running modern gcc on anything complicated.

Alas qemu is terrible about labeling its boards (it's getting better, but there's no docs/system/m68k yet). I can go "qemu-system-m68k -M ?" and I THINK the two coldfire boards there (as opposed to the with-mmu ones that Linux inexplicably won't let me build a nommu kernel for) are an5206 (Arnewsh 5206) and mcf5208evb, the latter of which is the default board. As far as I can tell (from reading through hw/m68k/an5206.c and hw/m68k/mcf5206.c) the 5206 has 128 megs ram but no hardware except a serial port? The 5208 has one network card, which is at least something.

So, back to the linux source: arch/m68k/configs has a file m5208evb_defconfig so let's build that and see if I can feed it to qemu-system-m68k -nographic -no-reboot -kernel vmlinux and hey: boot messages! Panicking because no initramfs. And the -no-reboot is ignored, implying this board doesn't know how to reboot or power off, which is... sigh.

Memory goes from 40000000-41ffffff which... echo $((0x1ffffff)) is 32 megs ram. That's a bit squished. And it ignores qemu's -m option to try to give it more, which beats the cortex-m boards that were erroring out when you gave it any -m value other than the default. (QEMU may be undocumented, but at least its behavior is inconsistent.)

What else is in these boot messages: ttyS0 is the "mcfuart" driver. A dozen TCP/IP layer boot messages about hash table initialization and such but no line about the actual network card initializing itself. (Doesn't mean it didn't, which messages happen at which printk verbosity level is kinda potluck in embedded board drivers.) Oooh, mtd probe address, we've got a Memory Technology Device which means flash chip. Data storage onna block device, which QEMU might be able to stick a host file under. /dev/mtdblock0 which the "initramfs didn't work" root= fallback logic tried to mount as ext2... because apparently the default kernel command line (from qemu? built into the kernel?) is root=/dev/mtdblock0 and WHY does it bother saying /dev/ there? Honestly, what's the alternative?

Ok, I got a kernel to boot and spit out messages to serial port, which means I MIGHT be able to get an initramfs to boot to a shell prompt with serial console, even if I don't have any other I/O devices working yet. Assuming I can figure out how to get musl to...

Ah, darn it. I did this a year ago. And why did google not find musl's official web mirror on openwall? Google searches are getting RAPIDLY less useful, it's very annoying. I manually navigated to the right place but for some reason Google can't find that. Do THEY have a borked robots.txt? No, looks sensible. This is just Google increasingly sucking. I hope they recover.

Anyway, yeah, that's why I didn't do this earlier. Puppy eyes at Rich time again, I guess?


March 5, 2023

Took ADHD meds _and_ a store brand zyrtec AND a prophylactic ibuprofen this morning, just for good measure. Actually able to concentrate for once, at least so far.

And Elliott's having build trouble on mac, which... how slow is it to launch executables on mac? Is it just a homebrew thing, or are all mac binaries latency spike city? And yes, I should have realized old version of bash without "wait -n" isn't just a centos thing, it's also a mac thing. So my centos hack is insufficient if you care about the mac build being well-supported, which Elliott does.

Checked in the fixes for the warnings from yesterday.

Grrr, tests/files/* is design-level wrong, but it would take a largeish rewrite to make it right. I need generally better organization for "not the actual toybox source" files: scripts/make.sh and scripts/mcm-buildall.sh and scripts/mkroot.sh and scripts/root are all slightly different categories.

Cycling back to the "help" redo...


March 4, 2023

Dear compiler loons:

toys/posix/ls.c:393:16: warning: too many arguments for format [-Wformat-extra-args]
printf(" "+FLAG(m), 0); // shut up the stupid compiler

But if I yank it, llvm goes:

toys/posix/ls.c:393:16: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
printf(" "+FLAG(m));

So one compiler warns if you give it an extra argument, the other warns if you DON'T give it an extra argument, and in NEITHER case is it an ACTUAL PROBLEM. (Sigh, switching it to xputsn() but still. This is unsuppressible false positive noise. Stop it.)
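
(The trick in question, as a standalone sketch with flag_m standing in for FLAG(m): a string literal is a char pointer, so adding 0 or 1 picks where the string starts.)

#include <stdio.h>

int main(void)
{
  int flag_m = 1;

  printf("[%s]\n", " "+flag_m);  // flag set: points at the NUL, prints []
  printf("[%s]\n", " "+0);       // flag clear: prints [ ] with the space
  return 0;
}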

Meanwhile, gcc is also going:

toys/posix/cat.c:31:32: warning: the omitted middle operand in ?: will always be 'true', suggest explicit middle operand [-Wparentheses]
int i, len, size = FLAG(u) ? : sizeof(toybuf);

It's not "true", it's a constant 1. Guaranteed by C99. I WANT it to be a 1. When the flag returns 0, I want to replace it with sizeof(). That's what that code is DOING. (This warning showed up when I added the !! because previously it was (integer&mask) which coincidentally was 1 but gcc wasn't treating "1" and "true" as different because THEY ARE NOT DIFFERENT IN C, THAT IS A C++ THING AND C IS NOT C++.

On the bright side, flag position is less important, so less of a lurking land mine. The cost is that gcc's rapacious stupidity triggers on more irrelevant crap. I'm sad I can't just compile with old toolchain versions from Before The Stupid, but I did that in aboriginal and there was a limit.

Garrett (the uclibc++ guy I worked with at timesys way back) drove through Austin and met me for lunch, and we wound up talking for 5 hours. (Rudy's no longer has the reasonably sized reusable plastic cups, it's styrofoam now. Oh well, I've still got like 10 of the old ones.)

Too tired to do more programming after that, although I'm not sure how much was the truly insane quantity of cedar pollen in the air today. Yesterday's apocalypse du jour dropped the temperature 30 degrees, which always wakes up the cedar trees this time of year and gets them bukakkeing their needles off. The recent apocalii also left us with a large pile of broken branches out front, between the ice storm and the tornado warning, and it's a race between mail-ordering a hatchet to make firewood and municipal brush collection to see who gets them first.


March 3, 2023

How limp did I go after getting back home? I was 3 days behind on reading my webcomics.

Sigh, I was so _amazingly_ spoiled by the speed of Fade's internet connection. I'm back here with Google Fiber and pages are taking 30 seconds to load, and email is downloading at one message every 3 seconds. (In batches of 400. It takes a bit.)

Finally applied Elliott's pending patch. (Saw it in the web archive yesterday but hadn't downloaded enough email to grab a local copy, yesterday I took my laptop to Wendy's and HEB and neither offered net access. Phone tethering drains the laptop battery fast, and the radio signal situation in Hancock center is appalling: it can't see my WIFI access point if I lay the phone on the keyboard, and around the Corpse of Sears (which Wendy's is across the parking lot from) my bluetooth headphones need to be within 6 inches of my left ear to avoid dropouts. Plus t-mobile did the "you have used 48gb of your 50gb gratuitous metering quota before we throttle the hell out of you" ping in the airport, and doesn't reset until the 5th...)

So FLAG(x) now uses !! to force the return value to 0 or 1, which gets optimized away when it's used as a logic value. Audited all the users to remove a bunch of existing VALUE*!!FLAG(x) that are now redundant, and removed several subtle dependencies on a flag having a specific value along the way (some of which were commented, some weren't, including at least one subtle bug introduced by a commit that moved flags). There's still several VALUE*!FLAG(x) which now turns into VALUE*!!!(x&y) but the extra ! also get optimized out.
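
The shape of the change, approximately (from memory, not a verbatim diff):

// Old, position dependent: #define FLAG(x) (toys.optflags&FLAG_##x)
// New: force 0 or 1 so VALUE*FLAG(x) works wherever the flag lives,
// and the old hand-rolled !!FLAG(x) double negations become redundant.
#define FLAG(x) (!!(toys.optflags&FLAG_##x))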

Whole lot of little style fixes as long as I was doing a review pass, spaces around the = in assignments, removing inconsistently used parentheses, str = FLAG(x) ? "" : "K" becoming str = "K"+FLAG(x), etc. A few cases of "FLAG(x) ? TT.x : other" becoming "TT.x ? : other" which is actually subtle: sometimes you check the flag to see if it was set because it's an argument that only takes collated arguments, so --blah=abc sets TT.blah to "abc" but --blah leaves it NULL. But I checked that it wasn't the case, and switched to the "test only one value and hopefully it's still in a register" version. (That said, I _kept_ one in patch.c because TT.p is numeric and could legitimately be -p 0 which is different behavior from not saying -p, so we need to check the flag not just the value.)

Whole lot of other "verification" that VALUE*FLAG(x) was _previously_ the rightmost flag, and not a hidden *4 or something. (The one case where it was had a comment.)

While I was there, I normalized todo and Todo to TODO so it's easier to grep for. (Can't just grep -i because "todo" shows up in comments and at least one local variable name.)

This wasn't (intended as) a micro-optimization to shave a few bytes off the code, this was "remove some conceptual land mines", but I did run bloatcheck a few times in hopes it wasn't making the result noticeably larger.

Oh goddess, this chunk of tar.c:

do {
  TT.warn = 1;
  ii = FLAG(h) ? DIRTREE_SYMFOLLOW : 0;
  if (FLAG(sort)|FLAG(s)) ii |= DIRTREE_BREADTH;
  dirtree_flagread(dl->data, FLAG(h) ? DIRTREE_SYMFOLLOW : 0, add_to_tar);
} while (TT.incl != (dl = dl->next));

Is assigning to ii but not USING it; the argument to dirtree_flagread() recalculates one of the flags and leaves the other zero. How is this passing the test suite? Would fixing it _break_ the tests?
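
Presumably it was meant to actually pass ii:

do {
  TT.warn = 1;
  ii = FLAG(h) ? DIRTREE_SYMFOLLOW : 0;
  if (FLAG(sort)|FLAG(s)) ii |= DIRTREE_BREADTH;
  dirtree_flagread(dl->data, ii, add_to_tar);
} while (TT.incl != (dl = dl->next));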

Fixing it does not break the existing test suite. I'm gonna fix it and see who (if anyone) complains? (I think it might only affect sorting at the top level, which might not be a thing since even when the top level is a directory that's one entry. I need to think through it and come up with a test, which I dowanna do now because this is big and I want to get it CHECKED IN.)

I have SO much half-finished crap in my tree I need to FINISH and FLUSH. The recent help plumbing changes aren't quite done yet. My most recent bout of shell work. The lib/passwd.c rewrite. I get a bunch done but don't make it over the hump so it accumulates instead of reducing. Need to CLOSE TABS...

And I think even tar --sort isn't going to sort the command line arguments? They get processed in the order provided, but the CONTENTS get sorted. Which still gives us the stable ordering, which is what they were after... Ahem, NOT FOLLOWING THE TANGENT RIGHT NOW.


March 2, 2023

Still kinda collapsed. I have pending email from 2 people to reply to, and spent most of the day not doing it. So many tabs to close...

Alright, the design issue with the --help output is when should it have the toybox summary line? Going with my most recent release binary, it looks like "toybox --help ls" prints it but "toybox help ls" does not? I can work with that...

Sigh, there's a lot of THINGY*!!FLAG(x) and Elliott's most recent patch also modified code that assumes FLAG(x) is producing 1, which is an artifact of position. (There's a comment about making sure the flag is in the right place in the optstr. That's... more brittle than I like.)

Possibly the FLAG() macros should have the !! built in? I should check whether the optimizer is smart enough to produce the same code. (No, I am not going to start using the "boolean" type.) Time to dig out make baseline and make bloatcheck! Which don't quite work here because changing toys.h at the top level doesn't get dependency checked and cause a rebuild, and "make clean" deletes the baseline out of generated/unstripped. Workaround: rm -rf generated/obj before make bloatcheck.

It's not _quite_ the same output. (With gcc, anyway.) In do_sha3sum() it's because we care about the flag position, which should be masking instead of using the FLAG() macro anyway. In do_gzip() it's because we're passing the value to a function which does not appear to be being inlined, so even though it's only ever used as a logic value the status doesn't propagate far enough. In cp_main() FLAG(f) and FLAG(n) are being assigned to local variables which are then used as logic values... which shouldn't make a difference to code generation, but does? Ha! And when I yank those local variables and just use the FLAG() macros directly, it shrinks 34 bytes! In touch_main() it's another "we care about flag position" thing saving 3 bytes: I'll live. In cpio_main() it's another "flag is 1" with a comment, and also assigning FLAG(t) to a variable which only cares that it's nonzero but the variable's incremented a couple times later (to make it nonzero) so... take the hit. In cksum_main() FLAG(L) is passed as a function argument, so "zero or nonzero" must become "0 or 1" (and crc_init() is in lib/ so I don't expect it to be inlined across compilation units). Still kinda surprised su_main() isn't in pending because that whole subsystem is still unfinished, but reset_env() is taking FLAG(l) as an argument which lives in lib/ so isn't inlined so doesn't see it's being used as true/false. In pidof.c print_pid() is returning FLAG(s) and that function isn't being inlined because the function pointer is passed to names_to_pid(). Ha: nl_main() was doing another "depend on the flag being in position 1" but did NOT have a comment about it... and there's about 5 more of those. Sigh.

Huh, patch -R looks broken: apply_one_hunk() did reverse = FLAG(R) and then part of the "allow fuzz" test was c=="-+"[reverse], which means it depended on FLAG(R) being 1, but when Elliott added -s in commit 6f6b7614e463 I didn't catch that he put it at the end and moved R to 2, meaning in the reverse case it'll be comparing against the NUL terminator instead of the '+'. And we don't have a test for autodetecting fuzz. So adding the !! would actually _fix_ this.
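
(The failure mode, spelled out as a standalone sketch:)

#include <stdio.h>

int main(void)
{
  int reverse = 2;  // what FLAG(R) returned once -s pushed R up a bit

  printf("%c\n", "-+"[1]);        // '+': what the reverse case should match
  printf("%d\n", "-+"[reverse]);  // 0: the NUL terminator it actually matched
  return 0;
}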

Alright, I think I want to audit all the FLAG() uses in toys/*/*.c because there's a lot of !! I can now remove, and I should be consistent about not parenthesizing VAL*FLAG()|VAL*FLAG() because * is higher priority than |. It's a pity there's no "make test_dmesg" to make sure I didn't break that. I expect this is gonna come up a lot in a treewide audit...


March 1, 2023

Day after travel. Collapsed.


February 28, 2023

Flying back to Austin.


February 27, 2023

Oh hey, another email in my inbox this morning about what somebody thinks I SHOULD be doing instead of what I am doing. (Watched a good video on "autistic inertia". I've mentioned before that I work based on momentum, and there you go.) Having something Looming can either be extremely motivating (avoidance productivity: I will do SO much cat waxing to "virtuously" avoid the Looming Thing), or extremely demotivating (loss of momentum and traction because I can't muster the executive function to address picking up that piece of paper, it just won't budge).

Anyway, ignoring the "linux-kernel community is so broken" pile, the question du jour in my email was:

what is the official toybox opinion on rust being added to toybox?

And "My gut reaction is "Oh goddess not again" and I've been actively ignoring it?" was too short, so Pascal's Apology kicked in, and I replied:

Define "added"? I'm not putting a rust compiler in toybox, if that's what you mean?

If you mean "should I implement some commands in Rust and some in C", having a single simple context everything is done in the same way is part of toybox's design goals? Early in Toybox development the build needed Python, and I cleaned out that build dependency so it's all C and bash, and I'm implementing my own bash compatible shell so toybox builds under toybox. Early on I even had some commands implemented as shell scripts, and I wound up removing them again and doing them in C even though I planned to ship a shell interpreter, because I wanted the whole thing to be a single file with no external dependencies which you could statically link and drop into an empty chroot directory and have it just work.

If you mean "rewrite the whole project from scratch in a different language", long ago I was thinking of rewriting the whole of toybox in Lua but the problem I hit is that Lua doesn't ship with a standard set of posix bindings so I had to install something like 7 different prerequisite packages just to manage things like "wget", let alone implementing "mount" or "ifconfig", and if I had to implement/ship my own new Lua bindings written in C (and cross compile those to every supported target architecture) I might as well just do everything in C. (Which is a pity, Lua was quiet elegant, but their deployment strategy was too minimalist to be usable on its own.)

If you mean how would Rust affect my variant of countering trusting trust then having the project be in multiple languages again kinda defeats the purpose of a minimal installable base capable of reverse engineered binary auditing.

If you mean coming up with a replacement tiny system written in a single language that's both learnable the way minix and xv6 are _and_ scales up to actual load bearing deployment in real world usage (the way Linux 0.95 through about 2.2 did)... I'm still trying to make that work in _C_ (well, a non-GPL one, I had it working with busybox but the insane FSF poisoned that well so thoroughly with GPLv3 around 2007 I wound up starting over.) I'm told the Rust compiler is now written in Rust and dunno what its system call binding approach is, but I still await a Rust kernel that actually ships in a product. (Even a vxworks level of kernel: I decided to wait for that when I saw that blonde lady's Rust talk at linuxconf.au in 2017 and I'm still waiting. Heck, even something as silly as Fuchsia, just something somebody somewhere actually used for something in a non-demonstration manner. There's a dozen different My Little Kernel variants people have done, but nobody actually seems to do real work in Rust? It's all either reimplementing stuff that already exists because "Ew Icky C", or "here's how we're going to change the language governing bureaucracy" and "here's how we're going to add yet more complexity to the language" and it's very tiring...)

Show me a serious attempt at a system that rebuilds itself under itself from source code, all written entirely in Rust with no C anywhere, and I might start to care? ADDING Rust on top of existing complexity is just more xkcd standards layering. (Yes, you have garbage collection and bounds checking like Java did in the 1990s. Yes you have native compilation to binaries like Java did with IBM's Java Native Compiler back in the 1990s. Yes you have a Big Marketing push and drive to rewrite everything in this one language like Java did in the 1990s. Yes you have a strong argument that C++ is a terrible language like Java did in the 1990s, which is not the same as C being a terrible language but try telling any C++ developer that. From a safe distance. Bring popcorn.)

If you mean the "Rust is inevitable, the same way Hillary Clinton was in 2008 and again in 2016", I note I've lived through the following:

I learned C in 1989, spent about 1992-1995 doing C++, and then was all in on Java as my main programming language from 1996 until about 2000, caught the Python 1.x->2.x transition and then bowed out again when staying on 2.x actively offended the 3.x developers... The C I learned way back when remains relevant. If I tried to write new code in the ~1995 version of any of those other languages it wouldn't build in modern environments.

I was part of the "rewrite everything everywhere in Java" crowd for about 5 years. My bug report was the reason the ability to truncate a file was added to Java 1.2. I worked on IBM's port of JavaOS to the PowerPC in 1997, taught Java at the local community college in 1998 and 1999, designed a hard realtime garbage collector... It was really exciting. (I wrote a little about how that ended in my blog.)

Any time someone goes "why aren't you using Rust" as an accusation, I treat it the exact same way as the C++ and Java people doing that before them. I had 20 years of Windows people asking why I didn't do windows (smoothly transitioning from rejecting OS/2 to rejecting Linux). I don't care if "everybody's doing it", I've never had a Facebook account either. It's not _my_ job to "be convinced". Lua had "here's cool stuff Lua does better", which appealed to me enough to take a look. I have yet to see arguments in _favor_ of rust, they've all been _against_ C. "C bad, icky and dangerous, we blame you for perpetuating it, you must stop now". No thanks.

A big reason I keep coming back to C is I can stay 10 years behind on the standards without a problem. Heck, I can still compile K&R stuff from 1978 if I really need to. The main deficiency of ANSI C from 1989 is that the first 64 bit processor came out in 1991 so the 64 bit "long long" type was a widely implemented compiler extension that worked its way into the standard later. I only moved toybox from C99 to C11 recently because of like ~3 minor convenience features (typecast array/struct literals, the "has_include" macro, and an alternate "inline" syntax that let us work around an llvm bug that's probably since been fixed).

Rust still hasn't settled down and decided to be nearly that stable: from a distance it looked to me like the first decade or so of the language was just WILD THRASHING leaving the language unrecognizable 5 years later, and now it sort of knows what it is, but still changes?

Has anybody made Rust work on a nommu system? Or only XIP from read only storage with 256k of sram? (Which Linux has been made to do, for example. Good luck pulling that off with garbage collection...) If not, your argument is "we'll still need C, but just less of it, so a smaller pool of people will have less expertise and age out without replacement". That's kind of Tesla's version of the self driving car argument: 99% of the time it'll drive for you just fine, and the remaining 1% it will crash and/or kill pedestrians and we're calling that the driver's fault but the driver won't be paying attention and may be way out of practice assuming they ever knew how to drive in the first place. How this is supposed to be a net improvement, I couldn't tell you.

Is there a Rust version of tinycc? What's the smallest, simplest Rust compiler out there? (Tinycc could happen because the language wasn't a moving target. If I decided to pick it back up and bang on it again the old stuff I did is still theoretically relevant. Is even a 5 year old version of Rust still relevant?)

If you want to implement commands in rust yourself, you can stick them in the $PATH and it should just work. Is there an obvious reason this should have anything to do with toybox? The "start over and rewrite everything in rust" approach like I was poking at doing with Lua would mean getting all four packages written in Rust. And preferably a stable version of Rust where a newbie could grab an existing system deployed 10 years ago and not touched since then, fire up the old build, reproduce it, understand it, and be able to modify it. As far as I can tell, this isn't a thing the Rust community _wants_, let alone is actively trying to achieve.

Sigh, I haven't got anything _against_ Rust, any more than against Ruby or PHP or Lisp or Prolog. I just don't care. Nor was I _offended_ by the people submitting forth and lisp interpreters (yes, plural) to toybox over the years. (In the absence of toysh, people have decided it needs a programming language.) I understand this guy's interest, and would like to politely decline... except I DO have something against projects like systemd that don't give me a graceful option not to participate, and the push to rewrite the linux kernel in rust without forking it is exhausting in the same way the build requiring perl was exhausting.

This guy didn't exactly knock on my door with a rust version of The Watchtower to tell me the good news about our new savior, but... I'm not getting "live and let live" vibes from this community either.

(I have a youtube video bookmarked, which claims to explain Rust in an hour. It's on my giant to-watch heap. I'm not AGAINST Rust. I just... still don't see the point?)


February 26, 2023

Fiddling with toybox help plumbing. Kinda spiraled.

So "toybox --help toybox" wasn't producing output, because of fallout from changes to prevent "toybox toybox toybox" stacking arbitrarily deep (and blowing the stack now that Linux doesn't necessarily enforce environment size limits even on mmu systems). So I started poking at that, but the show_help() flags API did the old "this argument was a yes/no boolean, then it grew a second bit, then it grew a third bit, and now it needs #defines" thing that I hadn't cleaned up yet. And while I'm there, "help -au" should print the usage lines for all commands, but calling help as a shell builtin does unique filtering so what happens when you "help -u" on the builtin? And the "See:" logic isn't filtering right as a builtin (redundant lines). And this whole "Toybox 0.8.9 multicall binary (see https://landley.net/toybox)" line at the start (which wasn't my idea, but then calling Linux "linux" wasn't Linus's idea either) should only be output SOME of the time and when is that some?

I keep trying to do quick fixes that wind up touching a half-dozen different files and leave off unfinished after hours of work and then it just ADDS TO THE MESS.


February 25, 2023

Flying back to Austin on tuesday. Not up for programming stuff today. Reading fanfic on AO3 instead.

Some months back I posted an observation about the Tardis to mastodon, which is why I want one. Just catch up on everything and come back when you're feeling up to it.

I wrote up an email reply which is a bit rambling and off topic for the toybox list (see "not up for" above, combined with pascal's apology for writing a long letter, substituting "spoons" for "time") so here it is instead. The context is that Michael Kerrisk, the man-pages maintainer, retired and handed the project off to a new guy, and didn't properly announce it (quietly added a co-maintainer to the git repo and then ghosted everybody), and now that we've finally figured out what HAPPENED we're trying to adjust.

On 2/24/23 11:46, enh wrote:

> > Possibly the new maintainer needs to poke Konstantin to get access to update the
> > directory, and then put stuff under the actual kernel.org page? (Or you could
> > put some under an android.org location? Either way they'd be up to date with the
> > repo instead of a couple years behind...)
>
> yeah, that's one of the options... generate the html and stick it on one of the
> android-specific sites, but that seems a bit odd (people are already confused by
> places where the man pages are actually only talking about glibc; hosting them
> on an android site would only make that worse) and there are already a lot of
> links to man7.org out there in the wild, that it would be
> unfortunate to see go stale. (though if no-one has access to man7.org
> any more, there's nothing we can do about that anyway.)

The downside of depending on individuals is you're inconvenienced when they cycle out. The downside of depending on organizations is they're all just a bunch of individuals who get together and collectively pretend, so things go just as pear shaped when the people actually doing the work well leave without a proper handoff to someone else who will actually do the work well, but you tend not to notice as fast (before _or_ after: see the Linux Foundation's consumption of the Free Standards Group and thus the Linux Standard Base). This lack of warning isn't necessarily an improvement.

Ahem: man7.org was offered as a community resource but is actually Michael Kerrisk's personal page and he is not handing it off to the next guy. (The maintainer of landley.net does not get to throw stones here, although all the toybox.net variants are camped by people who want thousands of dollars.)

The responsibility for the man-pages git repository was handed off (resulting in the repo effectively moving to a new URL which nobody seems to really care about), but not the website or the release announcement email list. (Haven't gotten one since, if it's still having releases?) If it's a good idea for the project to move to more of a "package deal" where there's a repository+website+mailing list that can be passed to a new maintainer as a group, that's sort of a design issue.

Jeff Dionne set up the original uclinux project, which I believe busybox.net was modeled on: after Lineo ended in the dot-com crash and the kernel parts got merged upstream, Erik Andersen kept the busybox+uclibc subset of uclinux going as a personal project. He handed busybox.net off to me in 2005 by giving me a login to the server (it moved from the DSL line in his basement to osuosl, but Erik still pays for the domain renewals). When buildroot forked off of uclibc, I'm the one who abused my root login on the shared server image to create a new mailing list and kicked the buildroot traffic off to the new list. (Alas, too late to save uClibc.) Buildroot has since separated itself the rest of the way from uclibc (its own VM with its own domain), so it's not inconvenienced by shared infrastructure going down (as has happened a few times, what with uclibc being dead and all), which also means that handing over the keys to a new maintainer is a thing buildroot could potentially do if necessary.

Sigh, somebody should write up a non-stream-of-consciousness "handing over the keys of an open source project to a new maintainer" document. Do you even have a manifest of what all the project's resources ARE for something as big as Android? Not that Google's ever going to hand off Android. I remember when Red Hat set up Fedora, which pretended to be independent until Red Hat finally admitted it was just Red Hat Enterprise Rawhide. (So an independent CentOS emerged... and Red Hat bought it.) Anyway, the point is when people/management change, the project's gonna wobble no matter what the corporate structure says, because it's people who do things and know things and remember things.

> > Let's see, how hard is it to produce html output from this git repo... it's got
> > a top level Makefile to do exactly that as its default target, but it wants a
> > package called "man2html". And installing that on my laptop installed apache
> > which LAUNCHED AN INSTANCE ON LOOPBACK. Why on EARTH would... that's just sad.
> >
> > But ok, I can uninstall it again after building... looks like it populated
> > tmp/html with files? No top level index. Let's see, the first file under "man3"
> > is __after_morecore_hook.3.html which seems to be a synonym for malloc_hook (not
> > symlinks or hardlinks, just redundantly generated files). The "Return to main
> > contents" link goes to file:///cgi-bin/man/man2html which does not exist. The
> > #include link goes to file:///usr/include/malloc.h which ain't gonna
> > work on a web server either...
> >
> > Looks like there's the start of something workable here, but it needs a bit of
> > shoveling? (Or at least digging into how to configure it?)
>
> yeah, and one problem with being part of a large bureaucracy is that the docs
> folks and the branding folks will all want a say in making it look "right" if
> it's on an android site!

My first really well-paid consulting gig was working at a dot-com that was managing a rewrite of IBM's mainframe pricing and sales system. Various departments within IBM had wrestled for control of the project so extensively that upper management had outsourced our bit of it so NONE of them had it. Taken the ball away and given it to someone else entirely so they'd stop fighting.

Which meant my job was to be on an 8am conference call with IBM Europe (Böblingen, Germany: initial deployment) and IBM USA (Poughkeepsie and Dallas, one did frontend one did backend), a 6pm conference call with IBM USA and IBM Australia (Worldwide Integration and Test, it was _not_ in didjabringabeeralong because that's a Discworld reference but don't ask me what city it WAS in, somewhere that was simultaneously under water and on fire at one point but that came later), and when I needed Australia and Europe to talk to each other that was a 3am call and I slept under my desk AND BILLED FOR THE TIME. (The dot-com manager told me to.) I don't think I authored a line of code (for them) the entire contract; the _technical_ part of my job was matching up defect reports that told us to do one thing and defect reports that told us to do the exact opposite (or explicitly NOT to do that thing) and bringing up pairs of them in the meeting.

Somebody eventually explained to me that a specific manager in Dallas (Ken somebody?) had figured out how to get promoted by sabotaging projects: during the design phase he demanded to know why implementation hadn't started yet, then when they started implementing an unfinished design he'd demand to know why it wasn't being tested yet... The answer was always "because we're not ready", but he'd make a stink and get it started, and the reputation was Ken Got Things Done. It wasn't happening before he made it happen. It all collapsed into chaos the moment he left, but that just showed how vital he'd been, didn't it?

So this project had fundamental design changes coming in regularly, requiring not just multiple complete rewrites years into the project, but constant changes to the test plan. ("Why can we never get real database data to test with?" "It's their strictest trade secrets." "What is this system for anyway?" "Pricing 360 mainframes." "How much do those usually cost?" "That's not how it works, the salesman figures out how much the customer is able to pay, and then they produce an invoice that adds up to that amount." "So this whole system is a giant bullshit generator that emits nonsense to produce a predetermined result?" "The invoice has to be reproducible and comply with a bunch of legal and regulatory clearance issues, you have to word things right for the technology to be exportable to various jurisdictions..." "You didn't answer my question." "No I did not.")

Eventually the Australians did a slimy clever political thing to extricate themselves from this clusterfsck death march, by declaring that one of the endless thrashing "release candidates" they'd been given had PASSED THE TESTS and they certified it as deployable, closed out their budget, and scattered to the winds (reassigning the testing staff to other projects). Completely ignoring the fact that the testing they were doing was useless (something nobody could call them on, because for political reasons nobody could admit it to be true in so many words). They took a random passing snapshot in time of the vague contradictory specifications they'd been given and ran the red queen's race fast enough to catch up just long enough to call Bingo. They'd been given an impossible job and claimed to have done it, because ignoring the "it's just busy work until we're ready for you" nature of the thing and instead declaring victory meant they could STOP DOING IT. Which immediately clogged up the pipeline leading to them, because NOTHING MORE COULD BE TESTED, an existential constipation crisis leading to ALL THE PHONE CALLS.

That's about when my 6 months were up, at which point the consulting company all this had been outsourced to offered me a 50% raise to just STAY AND BE ON THE CALLS... and I just couldn't. I couldn't put into words WHY, this was almost 15 years before David Graeber wrote his first article on "Bullshit Jobs". But at the start of the contract, the existing employee who'd been doing it had used all his accumulated vacation time AND some family emergency under the family and medical leave act to take a solid two months sabbatical, forcing them to reassign the project to ANYONE ELSE BUT HIM. They'd thrown money at a passing junior dev to just Sit In The Chair And Be On The Calls, and he'd left me a pile of useless printouts to "get up to speed" with. There WAS no documentation. The job was babysitting, and the burnout was just insane if you didn't understand that and tried to actually accomplish anything ever. I found myself physically unable to just shut up and take the money longer than I'd already done.

Anyway, tl;dr there are sometimes political advantages to having something live outside an organization.

> > > nope... that's still
> > > https://thephd.dev/c-undefined-behavior-and-the-sledgehammer-guideline
> >
> > I apply the sledgehammer to the compiler. (Push back against the abuser causing
> > the damage, don't make the victims endlessly escalate ever-changing "compliance"
> > that's never good enough. Danegeld encourages the dane.)
>
> read the link (or listen to what i've been telling you for the best part of a
> decade) --- the problem is that the compiler folks don't believe we're their
> customer. they don't care about "is it useful?", they care about "microbenchmark
> line goes up?". or, in your analogy "the law is currently on the abuser's side".

Oh sure. I know. Doesn't mean I'm going to stop fighting. (There's a reason I was poking at tinycc/qcc.)

Steven Universe's "that's why we can't fight them", "that's why we have to fight them" line works in here somewhere.


February 24, 2023

Got an email from Andrew Morton which would have been great if it was the first one, but after Thomas Gleixner's repeated replies ignoring the code and talking about bureaucracy (I actually MET him, and later recommended his company to Taylor Simpson at Qualcomm for handling Hexagon's kernel patch review and upstreaming, nice guy back in the day...), and that Japanese guy going "we voted on this stupid unnecessary API so adding code that renders it irrelevant would highlight how stupid it had been and embarrass us all"...

I'm trying to scrape up the politeness to answer Andrew's questions in a constructive manner. Rather than an honest one. "Who do I expect to merge this?" Nobody. I do not expect the kernel clique to be functional enough in 2023 to merge external contributions from individuals. All of this code was submitted to the list before, and ignored. This is a roundup for people outside the kernel. If people build their own kernels, this can add to their patch stack. If lawyers give me guff I go "look, I submitted it to them, they chose not to merge it for their own reasons". But linux-kernel being a functional place to discuss patches? That's LONG gone.

But I can't just SAY that. It's not USEFUL. I'm not sure what would be, and dealing with them makes me SO TIRED. (Andrew is being polite and constructive! I should do the same! I really should. I'm just out of spoons for kernel "community".)


February 23, 2023

Many moons ago I was trying to add cortex-m support to mkroot but seem to have lost my notes. (I want a qemu nommu target so I can more easily test nommu support without copying stuff to my turtle board. Yeah, I can tell toybox to enable nommu support anywhere and use the nommu codepaths, but that doesn't prove nothing LEAKED, and that the result actually WORKS on a nommu system.)

My old blog entry from the time just says I was working on it, but doesn't provide useful context like what QEMU board or Linux defconfig I was trying to make work. So we start over.

According to qemu-system-arm -M ? | grep '[-]M' the list of QEMU Cortex-m boards includes "stellaris" (64k sram), bbc microbit (no obvious Linux target), and stm32 has two boards: vldiscovery and netduino, neither of which implement ethernet or block devices. That leaves mps2.

The mps2-an500 and mps2-an511 each have 16 megs DRAM, and qemu's hw/arm/mps2.c has a gratuitous explicit test for an -m trying to increase it and then refusing to do so: if (machine->ram_size != mc->default_ram_size) error_report("Invalid RAM size, should be %s", mc->default_ram_size); (Which seems silly, there's space in the mapping? Oh well...)

Linux has an mps2_defconfig build. I need a kernel config, QEMU board emulation, and compiler that all agree on the target, where "compiler" includes both gcc tuple and musl support. I have a static PIE toolchain for armv7m (I.E. thumb 2 I.E. cortex); I'd like an fdpic toolchain but haven't made that work yet because support hadn't been merged upstream yet last I checked. (gcc, binutils, and linux all need it, not sure about musl?)

Standard ELF has absolute memory addresses hardwired into it, which means you could only run at most one instance of each ELF binary on a nommu system (it's kind of the same problem a.out had with shared libraries: in practice the ELF loader just isn't allowed on nommu). Position Independent Executables (PIE) use relocatable Position Independent Code (relative addresses from a base pointer kept in a register), which is basically building your executables the same way you build your shared libraries, so they can be loaded anywhere in memory. It's slightly less efficient but the security nuts love it because exploit shellcode hasn't got known absolute addresses to use on the target system. FDPIC takes that concept and expands it to make all four of the standard ELF segments (text, data, rodata, bss) independently relocatable, which means your program doesn't require one big contiguous chunk of memory to fit into, but can instead fit into four smaller chunks (which is very useful on nommu systems, where memory tends to get fragmented over time), AND it means the read-only segments can be shared between program instances (five copies of bash can all use the same text and rodata, each one just needs their own data, bss, stack, and heap), but the downside is you need 4 registers to store the 4 base pointers (or have your base pointer point to an array of 4 pointers with an extra dereference on most memory accesses). But that's ALSO something the security guys like because foreign exploit shell code can't even know where rodata is relative to text in a given running binary, it's even fiddlier to exploit.

You'd think the FDPIC loader would be the standard one by now (since it can handle normal ELF binaries just fine: FDPIC is ELF with an extra flag in the header, giving it the OPTION to make the segments non-contiguous but not the obligation to do so). But as with the ext2/ext3/ext4 drivers the kernel guys went "no, fork it and have a completely separate file that will get out of sync with the other one", and then years later it's a mess...


February 22, 2023

Oh god, kernel people.

[That's all I wrote for this entry at the time. It's now March 19 and I haven't edited and uploaded past this entry yet because I just don't have the emotional energy to deal with that toxic waste dump, but we're coming up on a month behind, so here goes:]

Thomas Gleixner replied, ignoring the actual code parts and instead having a multi-part exchange entirely about the bureaucracy, where I didn't cc: the right people (I cc'd who get_maintainer.pl said!) and my subject line was wrong and UGH, my DESCRIPTION, and also there's something in some thousand line documentation file I missed but he literally won't specify what it was because the onus is on me to FIGURE IT OUT. And he's also arguing that if a dependency is EVER needed then it's ALWAYS needed, so my patch to be able to build without objtool is conceptually wrong because SOME configurations need that dependency, therefore EVER building without it is a crazy thing to want to do.

Meanwhile, Masahiro Yamada is literally saying that my patch to try the "cc" name before falling back to "gcc", and thus autodetecting llvm in both native and cross compilers (with no other behavior change I am aware of), can't go in because, and I quote: "In the discussion in the past, we decided to go with LLVM=1 switch rather than 'cc'. We do not need both." (With a link to the previous vote.) This was his REPLY to me pointing out that the name "gcc" is like "gawk" and "gmake" (and "gsed" on macos homebrew) and that just about everything else uses the generic name where possible. What's his logic here, "we voted, therefore the topic cannot be revisited"? I just...

So tired.


February 21, 2023

Got my patch series posted to linux-kernel. The oldest patch in that series was first submitted over 15 years ago, albeit in a different form then. Another one fixes a minor bug I myself introduced 10 years ago, which nobody else has bothered to fix since even when I pointed it out to them.

If you're wondering why I'm tired of dealing with the kernel clique...


February 19, 2023

Dear bash, up yours:

$ bash -c $'cat << EOF\nthingy'
bash: line 1: warning: here-document at line 0 delimited by end-of-file (wanted `EOF')
thingy
$ echo $?
0

I'm trying to test ERROR PATHS to make sure they exit gracefully instead of throwing ASAN allocation errors, and you have WARNINGS? The shell has errors and the shell has success, having WARNINGS is new territory. (Still exits with 0... I do not have a syntax_warn() function.) Comparing with the Defective Annoying SHell... that accepts it without a warning and also exits 0. Fine, change the error path to be a... strange sort of success? (Dash does not append a newline to thingy, bash does. Dash doing it strongly argues in favor of NOT doing it, so always newline it is.)


February 18, 2023

Going through the HERE document parsing logic, I hit a "this can't work" bit (comparing two pointers only one of which gets incremented in the loop), and tried a simple ./sh -c $'cat << EOF\nhello\nEOF\n' test and sure enough it didn't (never recognizes EOF), and tried to find the last place it did and gave up in 2021...

That can't be right. There's a regression test suite. I know it doesn't make it through all the tests, but... I had this working at one point.

Sigh, symptom of swap thrashing: sh is a big command that requires a lot of focus and I've had to do it in small increments with all the other demands. Now that I'm focusing way more on toybox, there's still lots of bug reports that spawn off tangents that SEEM quick but aren't. (I spent a couple days basically SCOPING mkisofs. I need to cycle back to diff. I still haven't set up a test environment to check in the lib/passwd.c rewrite that's actually buildable and testable with the bionic NDK...)

I've fixed a lot of sh bugs that were in front of me, which broke other stuff, and I either hadn't made it through the test suite (because of the expected failures from still missing features) or the relevant test isn't in the test suite yet. So I need to grind away at fixing stuff for a sadly large, hopefully uninterrupted block of time.

I miss the 36 hour programming sessions of my youth. These days I look up after 4 and need a long walk...


February 17, 2023

Sed bug report came in while I was poking at the shell double free, and of course the sed thing is another object lifetime rule issue, introduced by the sed speedups which added extra caching. Got it sorted, I think?

The double free is in an exit path, where the cleanup does not match the assumptions. The HERE document logic adds the EOF marker to the end of the ARG list: not a COPY of the marker, the actual pointer to the original string we parsed earlier. The sh_pipeline variables "count" and "here" let us know we're in HERE document accumulation mode so each time parse_line() gets called it moves the marker and discards it when matched, but the cleanup function called by the exit path isn't looking at that.

There's also something I called "bridge segments" where additional commands that do NOT have HERE documents attached to them get parsed before the line continuation logic fetches the body of the HERE document(s), ala:

$ cat<<EOF; echo hello
potato
EOF
potato
hello

In that case the pipeline segment that "echo hello" parses into would get marked as a bridge (its ->count set to -1) so the parse_line() entry path knows to back up through it and look for uncompleted HERE document segments. Once they're completed it works its way forward unmarking completed segments until it can either return "we have a complete thought, you can execute it now" or finds another reason to ask for line continuation (being in the middle of a for loop or if statement, for example).

The PROBLEM is that when you DON'T complete the HERE document, that extra entry indicating what EOF string we're looking for shouldn't be freed, because it exists earlier in the pipeline (in whatever statement had the redirect) so if you free it in both places... double free.

Alas, while fiddling with this I found MORE wrong cases. For example, if the redirect ISN'T attached to a statement, it gets freed early (when the NOP statement is freed) and thus the HERE document can't be concluded for a different reason, ala bash does:

$ <<EOF; echo hello
potato
EOF
hello

And toysh can't handle that yet because free("EOF") happens after parsing the first line and then the HERE document fetching use-after-frees it.

I think I need to just xstrdup() it. Premature optimization strikes again, "I don't need to copy this, the original's lifetime is longer than HERE document parsing by definition"... "yes I do to be CONSISTENT".
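The shape of that fix, names approximate (going by the ARG list description above, not the actual toysh code):

  // before: the HERE document list borrows the redirect's string, so the
  // exit path can end up freeing the same allocation twice
  arg->v[arg->c++] = eof;
  // after: each list owns its own copy, and the lifetimes decouple
  arg->v[arg->c++] = xstrdup(eof);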

Sometimes "progress" is just adding yet more tests the existing code doesn't pass.


February 16, 2023

I've been meaning to post my patch stack to linux-kernel for weeks (not because I think they'll merge it but so it's not my fault that they haven't), and hey: Linus did an -rc8 so this isn't the merge window week. Yay extra time, but I sat down to do mkroot builds of 6.2 anyway and... I broke the shell. Darn it. One of those fixes for Eric Roshan-Eisner's fuzzing bugs introduced a strcmp(ex, blah) without a test for ex being null, and running mkroot's init script triggers that codepath and segfaults. Stupid thinko, but have I really not tested mkroot in a month? Sigh.

Oddly enough, I'd already hit this and fixed it up in the shell work I did yesterday, but getting that to a good stopping point so I can check it in is tricksy. (It started with an attempt to add the read builtin and there's a lot of half-finished debris lying around the tree.)

Went to Walgreens early this morning and bought earplugs. Much less painful work experience. (I am not a dog person. Never developed the skillset. If I lock Adverb in the bedroom when he's not alone in the apartment, he claws at the door endlessly and will damage it. If I let him out into the center room (combination living room and kitchen), he barks at the front door for maybe ninety seconds every time somebody else in the apartment complex walks through the hall. Fade sits on the bed with her laptop and closes the door, and Adverb thinks that's the correct way to be home and keeps trying to lure me there, but I'm used to a table and chair and this room has better lighting.)


February 15, 2023

If you're wondering how my day is going, my attempt to add a shell "read" builtin has diverged into reverse engineering my ${variable} expansion code to figure out what all the corner cases are which led to reading the relevant part of the bash man page which led to me restarting the bash man page from the beginning which led to redoing sh_main() flag parsing and adding tests for sh -cs "arg" thingy vs sh -c "arg" thingy which led to me changing the logic so -c "arg" aren't an arg.c colon attachment (because they aren't in bash: it reinterprets the first argument as a command instead of a shell script but sh -c -s "echo hello" prints "hello" instead of trying to run -s and yes I need a test for it) which circled back to me trying to get all the existing tests to run under ASAN which means tracking down why sh -c '<<0;echo hello' was faulting which is because TT.ff->pl = xrealloc(TT.ff->pl) sometimes ALSO needs to update TT.ff->pl->end and now I'm trying to work out when that's true. (The only realloc of an existing pipeline segment is when attaching HERE documents to one, which expands the arg[] array at the end, but I need to update ALL the pointers.) And then once I added a loop to check all the pl->end in the pipeline and update it if necessary (which SHOULD happen before function bodies get moved so it should all be in the one doubly linked list), that revealed a double free error I need to track down.

None of this was what I planned to do next, but with Android in feature freeze it seems like a good time to make a dive back into shell stuff...

Adverb has been barking continuously throughout this. Fade's dog is unhappy when Fade isn't here, and expresses it when he's not alone. (If he barks at the front door long enough, clearly I will bring Fade back. It's worked every day so far, after enough hours. I have headphones, but need earplugs. I have escaped the clingiest cat to visit a neurotic dog.)


February 12, 2023

I've taken a break from caffeine here at Fade's, which has resulted in some very long naps. As in more than one unexpected 8 hour nap. Not the most productive, but eh, it's a weekend...

Gentoo's "make tests" is failing on du because overlayfs lies. My first instinct was to mount a tmpfs when run as root, ala if [ $(id -u) -eq ]; then mount -t tmpfs tmpfs .; cd "$PWD"; umount -l .; fi (the lazy unmount means it's still there on the current directory while we're in it, but automatically unmounts as soon as we cd out or exit the process).

Unfortunately, the results from tmpfs are very different from the ext4 I developed it on: mkdir allocates a 4k block up front on ext4 but in tmpfs directories are always size zero (because the dentry cache doesn't take up space in the page cache). And I can't convert the tests to what tmpfs produces unless I'm going to _require_ it to run under tmpfs, which you can't do as a normal user. I think I need to do:

  dd if=/dev/zero of=ext2.img bs=1M count=1 status=none
  mke2fs -Fq -b 4096 ext2.img   # -F: it's a regular file, not a block device
  mount ext2.img .              # mount sets up the loop device itself
  rm ext2.img                   # the loop device keeps the deleted file alive
  cd "$PWD"                     # re-enter the new mount
  umount -l .                   # lazy unmount: detaches when we leave

Which should get me a filesystem that behaves like the one I'm developing on. (How does "dd" manage to get "unix" so wrong? Success is supposed to be silent so your pipeline isn't full of trash, but dd needs status=none to manage it... I'm blaming IBM, they got ebcdic in there somehow. I'd use "truncate -s" but you can't loopback mount a sparse file...)


February 11, 2023

Attempting to close tabs: the gentoo locale thing should be fixable by having it try C.UTF-8 (which macos hasn't got) before en_US.UTF-8 (which gentoo hasn't got). My reading of "man 7 locale" says it should try C.utf8 in its search path (feed it the "official" name and it tries four different variants: upper and lowercase, with and without dash)... gentoo still didn't work. I tried to run it under strace to see why, but "emerge strace" doesn't work on last Sunday's LiveCD because /etc/portage/make.profile is a broken symlink. Emailed a "huh?" at Patrick Lauer...

Oh goddess, whatever Horrible Gnome Thing gentoo's livecd is using as its terminal (or is it a Horrible KDE Thing?) is FLASHING the broken symlink at me, causing KVM to gratuitously eat CPU doing perpetual screen updates just so the display can inflict ADDITIONAL EYESTRAIN. That manages to be counterproductive on multiple levels. (And I haven't dug into figuring out how to make the background actually black instead of dark grey, because they decided "less contrast, that'll help".) Cleared the terminal and my CPU usage graph no longer looks like a heart monitor.


February 10, 2023

Onna plane. Heading to Minneapolis, visiting Fade until the end of the month. (Flying back on the 28th, which is as far as February goes this year.)

Haven't blogged for the past few days, felt under the weather ever since the ice storm. (It _really_ threw off my sleep schedule.) Made a few notes about "huh, I should blog about that" and then didn't. (Sigh, I should backfill but mostly the things I thought about blogging were when I wasn't in front of the computer, so said notes would be in Austin and I'm onna plane.)

What did I do: aggroed the bash maintainer into a coreutils thread. (Still subscribed because cut -DF still hasn't been merged or rejected.) The arch/sh maintainership transfer is still up in the air. Started researching mkisofs. Did NOT post my kernel patch stack to lkml yet.

On the toysh "read" builtin front, bash's behavior is subtle: read -p hello > /dev/null doesn't work because the prompt is output to stderr not stdout (justs like the $ prompts). If I go read "" it exits with an error immediately (because "" is not a variable name it can assign to), but if I read potato "" it reads a line of data, splits it, assigns the first part to potato, and THEN exits with the error. I don't understand why it only checks the FIRST value for validity before reading input? (Why check it at all before reading if you're not going to check the rest...)

$ read -p % ""
bash: read: `': not a valid identifier
$ read -p % potato ""
%one two three
bash: read: `': not a valid identifier
$ echo $potato
one
$ read -p % potato ""; echo $potato
%blat
bash: read: `': not a valid identifier
blat
$

First time it doesn't even output the prompt, the third read shows it's not a syntax error (just a normal error exit). So that's good to know. I should add tests...

And then of course, after all that bashing my head against input granularity, sitting down to write "read" I'm hitting OUTPUT granularity. Namely: you can list multiple variable names on the read command line and it does IFS splitting to put a word in each argument the way it does for $1 $2 $3 etc for commands and functions... but if there are fewer variables than arguments it STOPS splitting early, and puts the rest of the string into the last argument, not having consumed the remainder's $IFS characters. Meaning read A B <<< "a b c" will preserve a run of multiple spaces or whatever space/tab/space combo was between "b c" when assigning to $B. Which is NOT the "split and glue back together with the first $IFS character" logic of "$*", nor the "glue back together with specifically space regardless of what IFS says" behavior I implemented SEMI_IFS for in "eval" and "case"...

The problem is, my function that does all this work is expand_arg_nobrace() which is already taking six arguments, the last two of which are usually zero. I'm reluctant to add a third "usually zero" argument, especially since the last one that's currently there is "long *measure" which seems like it could be repurposed, but what it currently does is "set it to a character to search for a bit like $IFS but this one's a hard stop where you write the offset at which you found this character into *measure and return early", which is used to reliably find the semicolons in ((math;moremath;evenmoremath)) regardless of quoting and ${thingy#$((blah))} nesting levels. Totally different from "set NO_SPLIT in flags after argument 3".

(I also hate $IFS as a concept, and spent months wrapping my head around the details of what does and doesn't become a separate argument with "" and ""$EMPTY and """$*" when there are no arguments, and how x() { echo $#;}; x """" should print 1 not 2... and looking back through this code I remember that there ARE a bunch of special cases but not WHAT they all were, which is why I made so many tests/sh.test cases for it, and I dowanna touch this forest of nested horror that laboriously jenga-style made them all work, but I have to find exactly the right place to drop in a state change with no state inappropriately crossing the change point... and I dowanna.)

Setting *measure to a negative number is uncomfortably magic.

Adding an IFS flag to change the meaning of *measure would let me avoid changing all the callers to add another zero, but it has a naming problem: the common prefix of almost all the existing flags is NO_ as in NO_SPLIT and NO_IFS to disable something expand_arg() would otherwise be doing. (Which isn't great either, but EXPAND_NO_SPLIT is too long when you're or-ing together five of them). I already violated that with SEMI_IFS and dowanna do so again or I've just got a bunch of random #defines floating around the code.

I made a quick stab at adding an expand_arg_nobrace() wrapper calling expand_arg_nobrace_raw(). After all, the original API is expand_arg(), which handles ab{c,d} processing and then passes on to expand_arg_nobrace(). But two of the calls ending in double zeroes are recursive calls within expand_arg_nobrace() itself, and I'd need to provide a function prototype (with seven complex arguments to keep in sync if anything changes) to let those two call each other, which is exactly the kind of nonsense I'm trying to avoid with the ever-widening API on this sucker as I find new corner cases.


February 9, 2023

Of course make tests breaks on gentoo, why wouldn't it?


February 7, 2023

Fixed tar yet again. Here's hoping it sticks this time.

I am now researching mkisofs implementation. (I actually made the mythical "bootable hard drive image" one of the pages said they can't find an example of, back in the yellowbox days. Took some fiddling to get the machine's BIOS to accept it, what with all the legacy hard drive types. Probably why it didn't get used as widely as "floppy image", which had a lot less variants.)


February 6, 2023

I'm amused by Hyrum's Law. (It's the API version of "with enough eyeballs all bugs are shallow". With enough users, all observable behaviors of your system become "the API" and changing it breaks somebody. That's why my spec for toysh is "what bash does" and then run a bunch of existing scripts through it to see what breaks.)

While emailing somebody I checked to see if I'm still in the first page of Google results for "patch penguin", and the answer is "no, but creepy".

The minor discomfort is Google search no longer produces a paged interface, it's one of those perpetual scroll things that loads more as you scroll down. I didn't ask for this and actively don't want it, but they wanna be fancy javascript nonsense. (If I switch off javascript for google.com will I get pages back?)

The MAJOR discomfort is I scrolled down something like a hundred entries and it's ALL ADVERTISEMENTS. Every entry is a product and the google summary gives a price in dollars at the bottom, and half of them say "in stock". And it's a special line that's a slightly different shade of grey than the other lines: Google has a "product" category in the search and is showing me almost entirely products. I don't want products. I confirmed I had NOT selected the "shopping" tab, but 2023 Google weights shopping pretty much to the exclusion of all else. I can't EXCLUDE "shopping" from my search, because they don't want me to and I'm "the product not the customer"...

(Um, since Google is apparently determined to become useless now: the Charged Vacuum Emboitment mentioned above was space technobabble the Tardis passed through in the 4th doctor episode Full Circle to wind up in "E-space" instead of the normal universe. "Emboitment" is apparently a mangled french word meaning something like "to put in a box". All TLAs have bad collisions in the modern world, and my brain tends to lock onto the one I encountered first. Mitre is as far as I can tell an NSA front organization, so I guess it's nice the US government is collecting and publishing security vulnerabilities, but I'm always confused when something I do is considered important enough to mention? But I guess I should finish the httpd Common Gateway Interface functionality.)


February 5, 2023

Wait... really? There's a toybox CVE for httpd? (Yeah I remember fixing that bug, but was it really worth a Charged Vacuum Emboitment?)

So I came up with an fpfix() function that does the fseek(ftell(fp)) thing (and should PROBABLY also do the fcntl(O_DIRECT) thing with maybe a stat() determining which is appropriate), and I inserted a call to it in both save_redirect() and unredirect() doing if (fd<3) fpfix((FILE *[]){stdin,stdout,stderr}[fd]); and then ripped it back out again because... that's not right. The extra syscalls are expensive if they'll happen a lot, so I want to make sure they happen at only the necessary places. (Yes, it's lifetime rules again. No, garbage collection wouldn't help. Which made me start wondering how rust or go intend to apply to nommu systems until I got a headache and had to walk away for a bit.)
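A minimal sketch of that fseek(ftell()) thing, going by the description above rather than the code I just ripped back out:

  // resync: drop the FILE * readahead by seeking the stream back to the
  // position stdio thinks it's at, "putting back" unconsumed buffered data
  void fpfix(FILE *fp)
  {
    long pos = ftell(fp);

    if (pos != -1) fseek(fp, pos, SEEK_SET);
  }

On a seekable fd that fseek() flushes the buffer and lseek()s the file descriptor back into agreement; on a pipe ftell() fails and it's a NOP, which is why pipes need the O_DIRECT treatment instead.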

I'm 95% certain we ONLY care about "fixing" stdin, because that's what uses getline(). For everything else toysh is using file descriptors, so our stdout and stderr global FILE * instances should never _get_ out of sync if we just avoid ever using them. (Is THIS why each dprintf() call on glibc does a gratuitous lseek(fd, 0, SEEK_CUR) before doing a write() of the appropriate data? It's mildly annoying that dprintf() on glibc has such noisy strace output, and you'd think that fileno() would do it too if so, but no...)

I can only think of two actual stdin consumers in toysh: get_next_line() and the "read" builtin can each eat extra data because of FILE * readahead, and then child processes we run can inherit a gap, making three cases in need of potential adjustment. But the further complication is there are two TYPES of adjustment: seekable file descriptors can get fixed up with a seek after the fact, but if it's a pipe we want to set O_DIRECT preferably before the producer _writes_ data into it (because once the pipe buffer's collated we've lost the blocking information).

So toysh needs to fixup each pipe() it creates, and _maybe_ sh_main() should fixup the stdin we inherit? Hmmm, what about "read < /dev/tty"? That says we SHOULD set O_DIRECT on nonseekable save_redirect() input? (Or maybe expand_redirect() should do it when opening the redirect file? Grrr...) I really want an elegant design chokepoint everything has to go through rather than trying to whack-a-mole every entrance and exit. Three consumers of the data, two types of fixup, SHOULD be six total cases, but pipe() vs < /dev/tty isn't in that paradigm.

Ok, toysh needs to O_DIRECT incoming pipe inputs as soon as possible (so sh_main() and expand_redir()), and also set that flag on outgoing pipes at creation time before we write anything to them. The seekable kind can need to set back to the right place when we're done reading them, which does NOT belong in get_next_line() but instead should go at the start of run_line() so multiline reads get optimized (line continuations don't have to re-read the input, so scripts can load chunks), and also on the exit path of each read builtin (because we assume we're going to run at least one command on what we read).

Alright, that SEEMS to make sense...

I'm trying to read through the musl source to see what its getline() block read size is... it really looks like that's doing single byte reads too? src/stdio/getdelim.c is repeatedly calling getc_unlocked(f) and getc_unlocked.c is this strange little wrapper function doing int (getc_unlocked)(FILE *f) { return getc_unlocked(f); } which is explained by src/internal/stdio_impl.h which has #define getc_unlocked(f) ( ((f)->rpos < (f)->rend) ? *(f)->rpos++ : __uflow((f)) ) (and thus the parentheses around (getc_unlocked) aren't some weird function pointer syntax, they're so the name isn't immediately followed by an argument list and thus the macro preprocessor doesn't recognize it as the macro defined to take arguments... and then the body DOES expand to that macro. Me, I would have PUT A COMMENT THERE.) Anyway, this __uflow(f) is in src/stdio/__uflow.c (yes with two underscores on the filename) which is basically doing f->read(f, &c, 1) except... that read() function pointer takes a FILE * as its first argument, not a file descriptor. Where is the function pointer set? Well, one of them is the function __stdio_read() which... is doing crazy things with an iovec that I am NOT puzzling through right now ("len - !!f->buf_size" again needs a COMMENT) but it looks like it might be reading buf_size, whatever that is.
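(For the record, the parenthesized function name is a standard if underdocumented idiom, not musl weirdness. Generic example:

  #define twice(x) ((x)+(x))
  // the parens stop the preprocessor matching "twice(" at the definition,
  // so this emits a real linkable function whose body DOES expand the macro
  int (twice)(int x) { return twice(x); }

That way code that needs the address of a real function gets one, and everything else gets the macro.)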

I no longer care about the numbers. (If I need to know I can run a test program under strace.) I very vaguely remember from years ago it was 512 in at least some cases? Anyway yes, it can maybe read ahead with block size big enough to reasonably amortize the system call overhead. And thus needs some serious unget to pass the file descriptor to other users. No, I am not trying to look at bionic just now, not after that.


February 4, 2023

Oh goddess fsetpos() is a stupid API, isn't it? The classic ftell() returns long which is signed 32 bits on 32 bit systems, and files are bigger than that these days, but instead of doing some sort of lftell() which returns long long (and an lfseek that accepts it) they invented a new gratuitous fpos_t type which they pretend isn't just a typedef for "long long", and then created two new libc functions with completely unrelated names: int fgetpos(FILE *fp, fpos_t *pos) and int fsetpos(FILE *fp, const fpos_t *pos), both of which are FUCKING STUPID.

WHY does fsetpos() take a POINTER to pos? If you just passed it the value, you wouldn't need to say "const" would you? Yes the get function that WRITES the value is taking a pointer, because they decided these need to return 0 or 1 to indicate error instead of returning -1 when there's an error like the previous one did (since that's not a valid file position), which is itself stupid. (The old way was smarter.) But the set function has ZERO REASON for its pos argument to be a pointer. Feed it the value, then you don't need to annotate it with "restrict" or "auto" or "static" or anything because IT IS A NORMAL ARGUMENT. (Symmetry is not an argument here, the functions DO DIFFERENT THINGS. You don't printf("%d", &i) because %n can write to i and thus needs a pointer, therefore the arguments should ALL be pointers. That would be INSANE.)
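(For what it's worth, the off_t variants POSIX eventually added do behave like the old API, which just makes fsetpos() look worse. A sketch with a hypothetical demo() wrapper:

  #define _FILE_OFFSET_BITS 64  // 64 bit off_t even on 32 bit targets
  #include <stdio.h>

  int demo(FILE *fp)
  {
    off_t pos = ftello(fp);  // returns the value, -1 on error, like ftell()

    return pos==-1 ? 1 : fseeko(fp, pos, SEEK_SET);  // takes the value
  }

No pointer arguments, no new opaque type, errors signaled in-band the way ftell() did it.)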

The C++ clowns who took over C development make me sad. Ken and Dennis and Doug McIlroy and Brian Kernighan were very smart. The people they handed off to... not so much. (I did NOT point out that gnu would have made rm -rf be "filesystem-modifier remove --no-prompt --recurse-into-dirs-newer-than=all --ignore-read-only", and that unix was all about individual commands that "do one thing and do it well" and connecting commands with pipes, instead of "git subcommand" or "ip subcommand" or "systemd subcommand" or...)

Simple systems survive. Increasing complexity eventually collapses under its own weight. Alas, "this too shall pass" does not usually do so on timescales I get to personally benefit from. There are a lot of "marsupial rat" versions of unix out there (including the 8 zillion posix RTOS variants) because it _works_. Linux wandering away from unix says bad things about LINUX, not about unix.

You can get a full understanding of a unix RTOS in a couple years, although xv6 sadly has the minix problem. (Ken Thompson taught his working Unix system to a generation of grad students who created BSD from it, but ivory tower academics zealously guard their abstract teaching tools from being fouled by any feedback from real world use: patches decidedly unwelcome.)

Which is odd because a complete course on something like vxworks could easily happen in high school, it's CLEVER but not that big and not that complicated, and it's a multitasking posix system with the standard bells and whistles. (NFS over USB? Out of the box, and fits comfortably in 2 megabytes...) Not remotely unique either, that one's just 36 years old and still going so it's easy to talk about. You'd think Linux would have knocked out all the proprietary unixes, but Linux is a PIG that hasn't fit comfortably in 2 megabytes RAM since the 1990s.

Yes it's entirely possible to come up with a brand new replacement paradigm, but it would have to be equally simple and elegant to persist nearly as long. Java/JavaOS tried 20 years ago (back when I taught classes in it at the local community college), but it was an uphill battle even before Sun trashed that quite thoroughly. And then oratroll happened: the other problem with Java was IP entanglements. Technology advances when patents expire, not when they're granted. Unix escaped AT&T early and laboriously purged itself of lingering corporate taint in the early 90's. Anything trying to replace unix has to reckon with late stage capitalism's relentless embrace-extend-extinguish clearcutting and strip mining. The settlers come in and find a carefully curated land with a bounty of buffalo and passenger pigeons and american chestnuts, and all of it's dead and gone within a few decades. The descendants of britain's imperial capitalism do the same thing to any resource that can't defend itself from rapacious unsustainable exploitation as they did to their own people before metastasizing into a global empire, and they are 100% convinced that ideas are property. The livejournal->myspace->twitter->mastodon cycle is about communities as property being embraced extended and extinguished, their members fleeing to a new territory the would-be owners haven't conquered yet. France solved this problem with guillotines.

As SCO proved, there's no money in suing modern Unix. (The Mormon activist behind the lawsuit still managed to take advantage of Novell's founder's descent into Alzheimer's to elder-abuse away all his money and use it to make the handmaid's tale a reality, eventually achieving success under the Trump administration, a misogyny the octogenarian democrats are happily complicit in sustaining to this day.)

Yes this is a cultural thing, the native americans who were here for 36,000 years before the white man came terraformed the place to be full of food you'd just reach out and pick. They modified their environment to make hunting and gathering _easy_, and were also a lot cleaner than europeans. (The ubiquitous "road dust" that medieval europeans brushed off their cloaks was powdered horse manure, which is a health hazard even with modern sanitation, and don't get me started on the cows and pigs and chickens and it somehow managed to be even worse in the cities...) The highly contagious European settlers who came here and killed almost everyone they met (Start watching this charlie brown thanksgiving episode at 18 minutes and 10 seconds, it's educational) didn't realize they were wandering through the equivalent of Kew Gardens, they thought it was wild and that nobody needed to maintain it, and smashed up enormous salmon runs and screwed up controlled burns and just made a mess of the place. Capitalism has ALWAYS been unsustainable. It's just that "expanding until you eat the whole world" was a viable strategy until quite recently, when capitalism predictably ran out of world.

This is why the GOP wants to ban "critical race theory", by the way. When even 1960's Charlie Brown episodes go "we took this land by literal genocide"... the German nazi party literally sent study teams to america in the 1930s to learn how to codify racism in law and get away with mass murder, in response to which president Roosevelt put Japanese americans into american concentration camps, which they could only escape by joining the army to fight in the war. Today we call "plantation owners" billionaires. Might want to maintain some awareness of this general cultural context.


February 3, 2023

Darn it, fseek() is underspecified. If I lseek() on a file descriptor I know what happens, and what error conditions to check for if the fd isn't seekable. But if I fseek() back a few bytes, is it doing an lseek() on the underlying file descriptor or just adjusting the buffer in the FILE * object? If I fseek() on something that isn't seekable does it cause a problem for future reads?

I just fixed head.c, but toysh's read builtin also needs to put back extra data it read for the corresponding test to work right, and lseek(fileno(FILE)) would leave the FILE * readahead buffer with leftover trash in it, so in THEORY I want to do fseek() but in practice I dunno how much I can trust it? (More debris from the C specification people pretending file descriptors don't exist so they don't need to interact with them, and posix refusing to go far enough specifying the interaction.) Honestly, the spec's "fseek() shall fail... IF the call to fseek() causes an underlying lseek() and [error happens]" phrasing means calling fseek() is by no means guaranteed to cause an actual lseek() to update system status. (Grr, do an fseek() AND an lseek(fileno(FILE)) maybe? I'm not convinced this is BETTER than just doing single byte reads of the input so we never get ahead...)

Sigh, time to read multiple libc implementations...

Ok, from musl and bionic it LOOKS like fseek() is generally implemented as a wrapper around lseek() that flushes and drops the FILE * internal buffer data when the seek works, and the ambivalence about whether or not it actually does that is because fmemopen() and friends exist, so some FILE * objects AREN'T a wrapper around a file descriptor. And those are weird, but I don't have to care about them here.
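So the behavior I'm tentatively relying on looks like this (a minimal sketch of the musl/bionic reading above, only valid when the FILE * wraps a real, seekable fd):

  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
    FILE *fp = fopen("file.txt", "r");
    char buf[16];

    if (!fp) return 1;
    fread(buf, 1, sizeof(buf), fp);  // libc may read ahead much further
    fseek(fp, 16, SEEK_SET);         // flushes the buffer, lseek()s the fd
    // If the wrapper theory holds, the kernel offset now matches the
    // stream position instead of the readahead high water mark:
    printf("%ld\n", (long)lseek(fileno(fp), 0, SEEK_CUR));  // expect 16
    return 0;
  }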

Ha! If I feed the O_DIRECT flag to pipe2(2) then in THEORY that prevents multiple writes from being collated in the pipe buffer, meaning "while true; do echo $((++x)); done | while read i; do echo $i; done" shouldn't skip any numbers even if it creates and destroys a separate FILE * each time through. (Which it still shouldn't for stdin/out/err, but I need to throw in whatever the read equivalent of a fflush() is each time we redirect stdin.)
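A quick sanity test of the packet pipe theory (needs Linux 3.4 or later, and as covered below, non-glibc headers or feature-macro incantations to get pipe2() and O_DIRECT declared):

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
    int fds[2];
    char buf[64];

    if (pipe2(fds, O_DIRECT)) return perror("pipe2"), 1;
    write(fds[1], "one\n", 4);
    write(fds[1], "two\n", 4);
    // Packet mode: each read() returns at most one write()'s worth, so
    // this prints 4 (just "one\n") instead of draining all 8 bytes.
    printf("%d\n", (int)read(fds[0], buf, sizeof(buf)));
    return 0;
  }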

Hmmm. There's a gratuitous artificial limitation on fcntl(F_GETFD/F_SETFD) which ONLY lets it change FD_CLOEXEC and NOTHING ELSE. Why even have the API then?

Wow, glibc is truly craptacular. If I go over to my freebsd-13 image and include unistd.h and fcntl.h and do pipe2(fds, O_DIRECT); it works fine. And it works fine built with musl-libc too. In bionic, they have O_DIRECT but not pipe2 because their unistd.h has an inexplicable #ifdef IA_IA_STALLMAN_FTAGH around the prototype. (And I still haven't figured out how to #ifdef for the presence of a function prototype.) But if I do that on glibc it complains about pipe2 _and_ O_DIRECT both failing to be exported from the header files I included without #defining about how RMS sleeps in R'lyeh. Guys: pipe2() was introduced in 2008 and O_DIRECT has been in Linux for more than 20 years (and grew its pipe2 meaning in Linux 3.4, released May 2012), it is a Linux system call, not a gnu thing.

Linux is not and never has been part of the gnu project, and RMS explicitly objected to the existence of Linux before he switched to trying to take credit for it, and yes his explanation at that link is a big lie because Linux forked off minix not gnu, which is why the early development was all done on comp.os.minix and he had a famous design argument with Minix's creator (when said professor returned from summer break) who kicked him off minix's usenet newsgroup and made him start his own mailing list. I collected some interesting posts from the first couple years on my history mirror: note the COMPLETE lack of Stallman or FSF participation in any of it, and if you boot 0.0.1 under an emulator, the userspace ain't gnu either. Stallman was 100% talking out of his ass: Linux was inspired by (and developed under) Minix with the help of printed SunOS manuals in Torvalds' university library, and it incorporated a bunch of the BSD work going on at the time. The gnu project was one of MANY unix clones happening in the wake of the 1983 Apple vs Franklin decision extending copyright to cover binaries and inspiring AT&T to try to close and commercialize Unix after 15 years of de facto open source development (and the FIRST full Unix clone shipped in 1980). By the time Linux happened, the GNU "project" had been spinning its wheels for eight years. When Linus's 1991 announcement said it WOULDN'T be like gnu, he was MOCKING WIDELY KNOWN VAPORWARE, like a game developer referencing Duke Nukem Forever or Daikatana.

Anyway, the point is the glibc developers have had PLENTY OF TIME to get these symbols into the darn userspace headers, and the only reason they haven't is the same reason Stallman tries to take credit for Linux, which has led to bad blood in both directions. (Stallman also tries to take credit for the existence of FreeBSD, but they just point and laugh at him. He had nothing to do with Wikipedia or project gutenberg either. The term "Freeware" was invented by Andrew Fluegelman years before Stallman's GNU announcement. Magazines like Compute's Gazette had BASIC listings in the back every month dating back to the 1970s. Dude can shut up and sit down already, that sexist privileged white male Boomer has Elon Musk levels of taking credit for other people's work going on, and needs to just stop.)

Aha! There's a SECOND fcntl(F_GETFL/F_SETFL) API which CAN toggle O_DIRECT. That's just _sad_, but sure. Assuming I can reliably beat a definition of O_DIRECT out of the headers, which I can't really #ifdef/#define myself because it varies by architecture. But I can get that from everything except glibc, and maybe I just don't care about it working with glibc? There's only so persistently stupid you get to be before I leave you behind. Define it to zero when glibc's broken headers don't provide it and let the call drop out: you get unreliable behavior due to a libc bug. I will not, ever, define stallman because my code is not part of the gnu project. One of its many goals is to provide an antidote to gnu.
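So the fallback boils down to something like this (a sketch, assuming O_DIRECT survived the header gauntlet or got defined to zero):

  #define _GNU_SOURCE
  #include <fcntl.h>

  // F_SETFD only does FD_CLOEXEC, but F_SETFL can toggle O_DIRECT on an
  // existing pipe end. With O_DIRECT defined to 0 on glibc's broken
  // headers this quietly becomes a no-op: the call "drops out".
  static void pipe_packet_mode(int fd)
  {
    int flags = fcntl(fd, F_GETFL);

    if (flags != -1) fcntl(fd, F_SETFL, flags|O_DIRECT);
  }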

Huh, it's surprisingly easy to get derailed into half an hour of closing tabs. Something like a hundred accumulated open terminal windows in desktop 7 (email), which are mostly just "type exit, hit enter" in each one because it's some man page I was looking at or command line tests I can confirm I finished with (or "pulldown->move to another workspace" and send off to desktop 2 (toybox) or 6 (linux/qemu/mkroot, and my kvm instance running freebsd hangs out there too)), a bunch of "last thing here was pushing to git" or git show $HASH, or running some simple command like "pkill -f renderer" or df /mnt (shows me what if anything is currently mounted on it) or doing math with $((123*456)), or grepping for a symbol in /usr/include or the output of something like "aptitude search thingy" (an apt-get wrapper with better syntax) where I recognize and can discard the results but switched away from that window once I had my answer. When vi is editing a file, exiting out and doing a git diff shows me whether I was browsing or actually made changes.

And lots and LOTS of "vi was editing a file and then got killed", because when you fire up vim on a file that's already being edited, it tells you the PID of the old vim instance but doesn't have an obvious way to just kill the old one and let you inherit the editing session. Instead you have to "kill PID" manually if it's still running (or search around to try to find the tab, but good luck with that), then :recover and if the file's changed write it out under a new name to see if the changes are interesting, then rm the temp file and the .file.swp and THEN you can go back and edit it normally. Wheee... If I'm feeling posh I can even go collate windows that got moved to the proper desktops. (You can not only drag and reorganize tabs within a window, on xfce you can drag and drop them between terminal windows. If you haven't got a tab, open a new tab to force the tab bar to show up, then exit the new tab when it's the last one in the window.)
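For reference, the whole dance goes roughly like this (assuming the usual .file.swp naming; $PID comes from vim's warning screen, and the :w line happens inside vim):

  kill $PID                   # if the old vim instance is still running
  vim -r file                 # recover the swap file contents
  :w file.recovered           # inside vim: write it out to compare
  diff file file.recovered    # were the changes interesting?
  rm file.recovered .file.swp # then go back and edit normally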

Heh, here's the directory where I was re-ripping some CDs (usb DVD drive still works, cdparanoia still works, most of the CDs are still in the right cases) and hitting them with flac to scp up to my website so I could download them to my phone. (Long ago I had big youtube music playlists, but youtube became 100% useless without paying. Not just two ads between each song, but interrupting longer songs in the middle to play ads. Digging out old CDs and mp3 collections it is...) Pretty sure I can rm *.wav in there, I could zap the .flac files too but eh, I'm not short of space just now. (2 terabyte ssd covers a multitude of sins. Or at least allows them to quietly accumulate.)

Here's the window I downloaded and filed my twitter archives in (both for my original account, which I then deleted, and the backup account Fade made me during all those years I refused to give @jack my phone number, which I still have but haven't posted to even once since making that archive because downloading a fresh archive wants to do 2FA through Fade's phone in Minneapolis, which is just not worth it). (I check a couple individual feeds there about as often as I remember to check Charles Stross' blog or Seanan Mcguire's Tumblr. I don't have an account on either site...)

That's the EASY part of tidying one's desktop, of course. Browser tabs have gone beyond the timesink event horizon. Chrome remembering them between restarts is both a blessing and a curse, but at least "pkill -f renderer" keeps the memory usage down to a dull roar. It would be nice if it could save each inactive tab to a zip file under .chrome somewhere so that tab didn't have to reload from the website as if freshly opened whenever I get back to it, but hey. I've learned I basically never look at bookmarks again, and I _do_ periodically revisit and finish/cull old browser tabs. Not as fast as they accumulate, but still...


February 2, 2023

The ice storm has REALLY screwed up my sleep schedule. Woozy. (Couldn't work, couldn't go out, the lights were off all day, and it was stressful.) My internal clock is flashing 12, doing the whole "Too tired to focus but lying down does not result in sleep" thing...

It's hard for me to get worked up about "yoda conditions" when it's THE SAME COMPARISON. 1 == x and x == 1 are equivalent, but the one on the left can't be mistaken for/typoed into an assignment. "Correcting" everything to the one on the right because it's not "mentally comfortable" is something I'm having trouble sympathizing with? (My mental arithmetic apparently does not have "handedness". This is a thing the language has always allowed you to do, and there is a reason to do it and zero reason to do the other one. Arguing "but it's not a _strong_ reason to do it" vs having literally zero reason other than aesthetic preference... Sigh.)
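The entire argument fits in two lines of C (launch_missiles() being a stand-in for whatever the branch guards):

  if (x = 1) launch_missiles();  // typo compiles: assigns 1, always true
  if (1 = x) launch_missiles();  // same typo in yoda order won't compile

(Modern compilers do warn about the first one under -Wall, which is presumably part of why the convention faded.)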

Darn it, my clever "while read" combo hack in toysh has a problem.

So getline is a glacial CPU-eating slog without caching, and FILE * is the caching layer ANSI C decided to provide back in the day and (eventually) implement getline() on top of, and if you're just reading from stdin then the "read" builtin can use the stdin global constant (as get_next_line() is currently doing), and my THEORY was that for anything else (either read -u or read < source) I could fdopen() a FILE * object and cache it in the struct sh_blockstack instance for the enclosing loop (adding a field to the control flow data), and thus not lose cached readahead buffer data by destroying and recreating the FILE * wrapper each time the read command ran and exited.

BUT: read -u $VARIABLE is not guaranteed to be the SAME filehandle each time through the loop. I guess I can call fileno() on the FILE * and compare the fd we're trying to operate on, and tear down the old one and replace it when they change it?

while read -u 37 i; do for x in {1..10}; do read -u 37 j k l; echo $i $j $k $l; done; done

I can come up with a bunch of test cases I don't care about OPTIMIZING, but I'd prefer they didn't actively break. (But why would anyone do that? "for i in a b c d; do read a b c < $i; do_stuff; done" could happen. Hmmm, but then it's doing an open/close on the file object in the read context, so caching the FILE * object in the flow control would be wrong. Grrr. Lifetime rules!)

Hmmm... alright, there are two cases here: read from a tty and read from a file. In the tty case, the input (should) come in line-at-a-time chunks, so the block reads are short and shouldn't read ahead much anyway. (If you've ever typed stuff before a shell was ready and the input got lost... that. Password prompts are notorious for it, but it happens elsewhere.)

The other case is "while read... < file.txt" where it will very much read all the way ahead, and if you ever discard extra buffer you deterministically lose bits of the file. Which says (oh goddess) I need a reference counted cache of FILE * wrappers for file descriptors >2 (stdin, stdout, stderr have persistent globals) but bump the reference increment/decrement to the enclosing loop block object (if any), which STILL won't work with "while read x; do command that also reads input $x; done < file.txt" because the FILE * will read ahead and then pass the filehandle to the command which starts reading after whatever the FILE * ate.

$ while read i; do echo =$i; head -n 1; done <<< $'one\ntwo\nthree\nfour\nfive'
=one
two
=three
four
=five

How. HOW? Is it doing single byte reads from input?

$ echo -e 'one\ntwo\nthree\nfour\nfive' | while read i; do echo =$i; head -n 1; done
=one
two

Ah. It gets it right when the input is seekable. Of course.

$ while read i; do echo =$i; toybox head -n 1; done <<< $'one\ntwo\nthree\nfour\nfive'
=one
two

And it's at least partly "head" doing extra work, and toybox is getting it wrong. (New test!)

RIGHT.

And this says that FILE * is generically borked in the presence of fork/exec _anyway_, because the inheritor of our fd 0 won't see the data read ahead into the FILE *stdin buffer. I'm more familiar with this problem as it relates to stdout flushing, because glibc's gotten that very wrong before, and that was just trying to make flush on exit() reliable, let alone exec without exit.

The two big problems in computer science REMAIN naming things, cache invalidation, and off by one errors.


February 1, 2023

For my birthday, an ice storm knocked out the power from before I woke up in the morning until sometime after 10pm. I had some battery in my laptop, but didn't fire it up because if it drains all the way I lose all my open windows, and with more freezing rain predicted tonight I didn't know if power would be restored before thursday. (Plus our geriatric cat's heating pad was off, so she sat on me for hours instead.)

Luckily it got just cold enough to sleet instead of more freezing rain. None of the trees that could have collapsed on my house did, although two on the block dropped some quite big chunks, and one such tree has drooped significantly and is resting half its branches on our roof, but in a bend-not-break sort of way. (One around the corner has bent basically in half and is resting its branches on the _ground_, which I find impressive. Pecans are survivors.)

So yeah, not a productive day, but way better than it could have been. No flood damage, no hurricane scouring the paint off a corner of the house...

Sigh. The very nice glasses I got in Japan shortly before the pandemic are finally wearing out. The lenses were outright scratchproof for a good three years, but the coating's weathered enough they're starting to scratch. They've been WAY more durable than anything I got from Zenni, and I dunno whether Zenni's still functional at all with that whole "outsource to china" strategy meets china's covid lockdowns, the container pileup, and now wolf warrior diplomacy and reshoring? (I didn't get my prescription checked in Japan and instead handed them an old pair of glasses to copy the prescription from, and I've passed them off as "reading glasses" ever since. That was intentional: I'm not driving so I care more about reading up close for long periods, and glasses that focus more naturally at that length cause less eyestrain.)

I _have_ newer/stronger glasses somewhere, but about 5 years ago I worked out that my eyes are adjusting to my normal usage patterns (staring at up-close things for hours at a time), and the whole reason my vision sucks is years of a correct-and-adapt cycle I probably could have just avoided if I hadn't been reading comic books all morning before the school eye test back on Kwaj. I'd never needed glasses before, but the roofline was a touch blurry... because my eyes took a couple hours to swing back to looking at far away stuff. I'm a lot older so it takes my eyes a lot longer to move their overton window, but even today it still happens: if I stop wearing glasses for 8 hours or so far away things are WAY sharper when I finally do put them back on. I just... hardly ever do that? No phone, no lights, no motorcars, not a single luxury... Sometimes I take them off on long walks to the table while listening to podcasts, but that's about it.)


January 31, 2023

Honestly, WHY does qemu keep gratuitously changing its user interfaces? Once again the old one was simple and straightforward, the new one is insane, and removing the old simple API serves no obvious purpose. They broke tcp forwarding, they broke -hda, they broke -bootp... Stoppit.


January 30, 2023

It occurs to me I can test the lib/passwd.c rewrite under a debootstrap chroot instead of waiting for mkroot, because it's just twiddling files rather than poking at syscalls or /proc the way route and insmod do to actually change the host kernel's system state.

In theory, it's "debootstrap beowulf beowulf" (for devuan anyway) and then when that's finished copy a stripped down version of mkroot's "init" script in there and sudo env -i USER=root TERM=linux SHELL=/bin/bash LANG=$LANG PATH=/bin:/sbin:/usr/bin:/usr/sbin unshare -Cimnpuf chroot beowulf /init and... in PRACTICE it's being stroppy. I dealt with this for Jeff some months back, but apparently didn't blog about it enough, and can't find my notes? Hmmm... I remember tracking down a weird bug involving accidentally running the Defective Annoying SHell instead of bash, hence the SHELL= export there, and that's the kind of thing I WOULD have blogged about, but no?

I might have tweeted about it, in which case it's lost to history because of the muskrat's midlife crisis. (For his quarter life crisis he bought a company that makes shiny red sports cars. The bald Amazon billionaire bought a newspaper, the south african emerald brat tried to pretend he wasn't copying him by instead buying the latest iteration of aol/livejournal/myspace. Because SpaceX clearly isn't in a dick measuring contest with Blue Origin. A company named after the X-prize, which he lost -- Paul Allen sponsored Burt Rutan to win -- is clearly NOT about competition and ego, it's an entirely original thing that emerged fully formed from his very large brain, which is in no way a cry for help.)


January 29, 2023

Alright, FIX WHAT'S THERE in dirtree. BREADTH traversal means dirtree_recurse() needs to iterate through the child list of stored entries (if any), which calls handle_callback() which frees the node when the callback didn't return DIRTREE_SAVE. The problem is, we're recursing through that list and free(node) doesn't remove it from the list. We're only told AFTERWARDS whether or not it saved it (did handle_callback return a pointer or NULL). So I need to fetch the next entry _before_ calling handle_callback so we can iterate without read-after-free list traversal, but I need to update and advance the saved-node pointer _after_ calling handle_callback, making sure it always points to valid memory.
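The loop shape I'm after is something like this (a sketch, not the actual code: handle_callback() here stands in for dirtree_handle_callback(), returning the node if the callback SAVEd it and NULL if it freed it):

  struct dirtree *child = node->child, **ddt = &node->child, *next;

  while (child) {
    next = child->next;              // fetch before the callback can free it
    if (handle_callback(child, callback)) ddt = &child->next;  // kept: advance
    else *ddt = next;                // freed: unlink so we never revisit it
    child = next;
  }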

Dear C++ developers who have hijacked gcc development:

In file included from ./toys.h:69,
                 from lib/dirtree.c:6:
lib/dirtree.c: In function 'dirtree_recurse':
./lib/lib.h:71:35: error: label 'done' used but not defined
 #define DIRTREE_ABORTVAL ((struct dirtree *)1)
                                   ^~~~~~~
lib/dirtree.c:174:21: note: in expansion of macro 'DIRTREE_ABORTVAL'
     else if (new == DIRTREE_ABORTVAL) goto done;
                     ^~~~~~~~~~~~~~~~
lib/dirtree.c:154:18: warning: unused variable 'entry' [-Wunused-variable]
   struct dirent *entry;

Bravo on the warning and error message generation. Exactly what I would expect from people who think C++ is a good idea. (And yes, that is a single processor build with no output interleaving. I double-checked. And yes, those were the first output messages before it had a chance to get itself good and confused, which it did and complained just as uselessly for quite a while after that. For the record, I had an extra } on line 177, a few lines AFTER all that nonsense. The compiler was no help whatsoever in finding it.)

Ok, got tar --sort checked in. It uses -s as its short option, which is a bit questionable (as far as I can tell the gnu/dammit one has -s produce the behavior it was already _doing_ for extract and throws an error if you try to use it with create: bravo guys), and my --sort can take an optional =thingy argument for compatibility but only implements sort by name. (Again, there's no "rm -r --switch-off-r" so --sort=none seems useless, and --sort=inode is a micro-optimization for 1980s vax systems without disk cache? It claims a performance improvement but extract ain't gonna care (it's not USING the old inodes) and create has to read all the directory entries in order and then do a second pass to open them when it sorts ANYTHING, and then using inode number as a proxy for disk layout is optimizing seek time on uncached spinning disks, which is also assuming they're regularly defragmented in a way that doesn't get the file locations out of sync with the inodes AND which assumes the disk was basically empty when all the files were created so the on-disk file locations correspond to the inode numbers, AND assumes a filesystem that's allocating inodes sequentially instead of using them as hash values... seriously, this was a marginal idea in 1989, and trying to do it on a VM using virtfs to talk to a host storing data in btrfs is just NONSENSE.)

The request was just for generating stable tarballs. I'm a little "eh" about mine vs gnu/dammit producing different output because I'm using strcmp() and the FSF loons are probably listening to the locale information and doing the same "upper case sorts mixed in with lowercase" nonsense that forces everybody to go LC_ALL=C before calling 'sort' out of the host path, but I can't control that and "stable output produced with the same tool" is presumably the goal here.

Yes, the test I added for --sort is not using "professional" names. No, I'm not cleaning it up to look presentable. Possibly I should have left sed as it was and let the culture catch back up...


January 28, 2023

Grrr, the design of dirtree.c isn't right. And I've known it isn't right, but it's hard to GET right. There are THREE interlocking functions (dirtree_add_node(), dirtree_recurse(), and dirtree_handle_callback()), plus a fourth wrapper function dirtree_read() you generally start out by calling, and that's way too complicated.

The job of dirtree_add_node() is to stat a directory entry and populate a struct dirtree instance from it, which is fine. That's good granularity. That's the only one of the lot that ISN'T crazy, although possibly that assumption is what needs to change to let me fix everything...

When each dirtree instance gets created a callback function can happen, with behavior that happens in response to that callback's return code. That's what dirtree_handle_callback() does: you feed it a dirtree instance and the callback function, and it calls one on the other and responds to its return code. Possibly dirtree_add_node() could just take the callback as another argument... except what I was trying to avoid was recursing into subdirectories causing the function to recurse too. I don't want NOMMU systems with tiny unexpandable stacks to have unnecessarily limited directory traversal depth. Although I don't think I've got that right NOW either, so...

The dirtree_recurse() function handles recursion into subdirectories. Badly. Right now it opens a filehandle at each level to use the openat() family of functions, meaning directory traversal depth is limited by number of filehandles a process can open simultaneously. Instead I need to traverse ".." from the directory I'm in to get back to the parent directory, and then compare the saved dev/ino pair in the cached stat structure to see if that's the same node, and if not traverse back down from the top again. (And if THAT doesn't work, prune the traversal. That's "mv a subdir while archiving" levels of Don't Do That. SECDED memory falls back to DETECTING an error it can't correct, quite possibly this is xexit() time.)
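Sketched out, that check looks something like this (hypothetical helper; the real version would live in dirtree.c and use the parent node's cached stat data):

  #include <fcntl.h>
  #include <sys/stat.h>
  #include <unistd.h>

  // Walk up via ".." and confirm it's still the directory we descended
  // from, by comparing the saved dev/ino pair. Returns the parent fd,
  // or -1 meaning "tree changed under us: prune the traversal".
  static int ascend(int dirfd, struct stat *parent_st)
  {
    struct stat st;
    int fd = openat(dirfd, "..", O_RDONLY|O_DIRECTORY);

    if (fd != -1 && !fstat(fd, &st) && st.st_dev == parent_st->st_dev
        && st.st_ino == parent_st->st_ino) return fd;
    if (fd != -1) close(fd);
    return -1;
  }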

The linked list of dirtree structures is less of a problem than the recursion stack depth because a linked list doesn't have to be contiguous, you can fragment that allocation all you want.

Sigh, the real outlier here is ls.c. Everything else just calls dirtree_flagread() and gets callbacks, but ls micromanages the traversal because it had weird sequencing requirements. So I need to refamiliarize myself with the ls weirdness to make sure a new cleaner dirtree implementation could provide the callbacks it needs (quite possibly it _is_ the new DIRTREE_BREADTH semantics) so I can stop exporting dirtree_recurse().

Grrr, but Elliott pinged me about a new android code freeze and I wanna get him --sort before that goes in. I should debug what's THERE instead of redesigning it, but it's REALLY hard to get the object lifetimes right with multiple functions passing stuff off between them in a loop like it is now.

I think I need two functions: dirtree_add_node() and dirtree_read() that does all the callback handling by non-recursively traversing the tree (adding/removing nodes as it goes if/when the callback says to). Hmmm, but what would the arguments be? There isn't a global "tree" object that can hold things like "flags", and I want to be able to traverse on a path _or_ under an existing struct dirtree *node... Maybe dirtree_read(char *path, int flags, function *callback) which is a wrapper for dirtree_traverse(dirtree_add_node(char *name, int flags), int flags, function *callback)... except the reason dirtree_add_node() needs the parent pointer is for parentfd due to the openat() stuff, that's why the caller can't just set it after it returns. Right...

Fiddly. Hmmm...

When I'm done, all this plumbing SHOULD look so simple that it's all obvious and trivial and seems like I didn't do anything. Getting there is usually a flaming pain, and a lot of the time I DON'T and have to ship something overcomplicated, which says to ME that I'm not very good at this. Alas, the reason I _don't_ have impostor syndrome is that the rest of the industry turns out, on average, to be even worse at it than me.


January 27, 2023

Trying to debug tar --sort and it's being stroppy. I'm not sure I've got the design right, which is sad for something so seemingly simple?

Sort of regretting having implemented --no-ignore-case. It's the default, just don't specify it when you don't mean it? I didn't have sort check it, and am going "eh...". (The extra code to check it is bad. Having it and NOT checking it here is bad. Grrr. NOT PICKING AT IT. I haven't figured out how to make lib/args.c gracefully handle this category and I'm trying NOT to go down a rathole of spending 3 days on the design of something relatively unimportant. Not a fan of ---longopts at the best of times, and having extra options to put the behavior BACK to the default... rm -r does not have a turn-off-r-again option because it DOES NOT NEED TO.)

The gnu/dammit clowns are BAD AT UNIX. Stallman only cloned unix after ITS died, because his community had collapsed under him and he wanted to hijack an existing userbase; he hated and fought unix until he was forced by circumstance to join, and was an outsider who never properly understood WHY it worked.

The old history writeup I did on this years ago didn't even MENTION Digital Equipment Corporation's Project Jupiter, which was the proposed successor to their 36-bit mainframes (the PDP-6 and PDP-10). The Jupiter prototype system was used to render part of the graphics in the 1982 disney movie Tron, but DEC pulled the plug on development in April 1983, and THAT's what caused Stallman to give up on ITS and start over cloning Unix. He'd backed the wrong horse, the hardware platform he'd inherited (after everybody else who worked on it graduated and moved on with their lives, he stuck around as a perpetual college student) died out from under it, and NOBODY ELSE CARED. He was forced to move because the university was going to unplug the old hardware and throw it away. This wasn't a decision, this was a forced REACTION. RMS was always a conservative reactionary working to prevent change, who took the smallest steps possible each time the legacy position he defended became untenable. As with all ultra-conservatives, he mistakes this for "visionary thinking" and talks himself up, but it's the same "looking back to a largely imaginary golden age" you see so much of from any other privileged old fogey complaining about kids these days.

Stallman couldn't even predict the obvious near future: 6 bit characters inevitably lost to 8 bit characters as memory got cheaper, because the whole POINT had been that text took 25% less memory at 6 bits per symbol instead of 8... with glaringly obvious limitations. With only 64 combinations you just couldn't fit everything: 26 upper case characters, 26 lower case characters, and 10 digits left only TWO symbols for space and newline -- you couldn't even end sentences with a period. If you wanted ANY punctuation, you had to sacrifice digits, or make everything uppercase, and different compromises meant incompatible encodings.

The first 7 bit ASCII standard was published in 1963. With twice as many symbols there was no need to compromise -- after upper, lower, and digits half the space was still available for punctuation and control characters -- so every 8-bit system could use a compatible encoding for all documents. Gordon Moore's article describing Moore's Law was published in 1965, predicting exponential increases in memory availability for the foreseeable future. Clinging to a 6-bit system almost 20 years later (after all his classmates had already abandoned it) was head-in-the-sand levels of stubbornness on Stallman's part.

DEC had introduced its first system with 8-bit bytes (the 16-bit PDP-11) in 1970, 13 years before canceling Jupiter, and its 32-bit successor the VAX came out in 1977. In DEC's entire history it only ever sold about 700 of its 36-bit PDP-10 mainframe systems. DEC sold almost a _thousand_ times as many PDP-11, and DEC shipped a dual-processor VAX the year before canceling Jupiter.

Stallman is the exact opposite of "visionary". He's just another classically educated white male with decades of practice retroactively justifying what he's already decided to do by constructing a convincing shell of logic around his emotional motivations, and it is just as exhausting dealing with his fanboys as it is dealing with the fanboys of muskrat or jordache peterman or the ex-Resident or any of the others.

Jeff's flying back to Japan. I am jealous. But Fade made a flight reservation for me to visit her from Feb 10 to 22, so that's nice. (Her dorm apartment thingy still has the second room empty and locked, so it doesn't bother anybody if I stay more than a couple days.)


January 26, 2023

Last year I ordered a cthulamp for the desk in the bedroom (one of them "five positionable metal tentacles with a lampshade at the end of each" deals), but couldn't figure out how to assemble it properly and then wound up flying off to Fade's and finishing the contract from there. Took another stab at assembling it today and figured out what I got wrong this time (the little plastic not-washer thing with the raised inner bit was both on the wrong side of the shade AND rotated 180 degrees, so it fit perfectly but then the light bulb didn't), and WOW that desk is a nicer workspace with 5 more LED bulbs right next to it.

Finished and checked in --wildcards. Needs more tests in the test suite, but it didn't cause obvious regressions and should be enough to unblock the android kernel guys?

Implementing tar --sort next.

I tried Chloe Ting's "5 minute warmup" video.

Made it to the end this time.

Everything hurts.

(It wasn't even one of her proper EXERCISE videos. I did the WARMUP and am still in pain an hour later. It turns out slowly walking 4 miles a night 3 or 4 times a week does not exercise a wide variety of muscle groups.)


January 25, 2023

Elliott emailed me asking for a bug report if I could reproduce the adb compatibility issue, because he says the policy is the developer kit should be backwards compatible all the way back to kit kat, including ADB working. I apologized and acknowledged it's been a while since I've tried the distro version of ADB. (For file transfer I scp files to my webserver so my phone can download them, and attach stuff to myself in slack going the other way. I installed an ssh app on my phone but haven't bothered to use it in forever.)

Back when I was running Devuan Ascii, _many_ things out of the repo didn't work (llvm was too old for the packages I was trying to build, ninja was too old, I finally upgraded to Beowulf because building qemu from source demanded a newer version of python 3...) The adb in Ascii having been broken probably wasn't surprising. I got in the habit of downloading a new version of the android tools rather than trying the distro version, and haven't checked if I still NEED to in a while...

My current phone's a Pixel 3a that end-of-lifed on Android 12 (the system->update menu has a big "regular updates have ended for this device" banner, with the last one 10 months ago), so isn't exactly a moving target anymore anyway. (Yeah, I should upgrade my laptop to Devuan Chimaera, but nothing major's broken yet that I've noticed?)

At a guess, debian breaking adb is like debian breaking qemu: I always build that from source because debian's distro version never works. Even when the theoretically exact same release built from source via "./configure; make; make install" works fine.

Alright, where did I leave off with wildcards: --{no-,}wildcards{-match-slash,} --{no-,}anchored --{no-,}ignore-case, and this is why I got so distracted by trying to automate the no- prefix in the plumbing. Right, just explicitly spell out all 8 flags for now and clean it up later. What are the USERS? Inclusion vs exclusion, creation vs extraction, command line arguments vs recursively encountered arguments: that's 8 combinations. No, 16 with and without case sensitivity. (This is assuming extract and test behave the same.) Each of those can have wildcards default to enabled or disabled: case sensitivity is the global default, exclusion defaults to wildcards no-anchored match-slash. Not everything can be enabled in every position, for example --wildcards does not affect command line arguments when creating an archive. (That's one of the tests I wrote back in October.)
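The create-vs-extract asymmetry in one pair of commands (pattern purely illustrative):

  tar cf a.tar --wildcards '*.ko'   # no readdir(): open("*.ko") just fails
  tar xf a.tar --wildcards '*.ko'   # pattern compared against each member name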

I'm also annoyed at --show-transformed-names and --show-stored-names because it should just pick one. I'm also reminded that --verbatim-files-from exists, and I think that's what I'm doing already? (Need to compare with busybox...)

Sigh, it's so easy to find -K and -N and go "I could implement that" but nobody's ASKED for it and if you go down that road even ignoring crap like -n (not implementing multiple codepaths to do the same thing, thanks) and --sparse-version there's gratuitous complication like --owner-map (not the same as --group-map) and the $TAPE environment variable and twelve --exclude variants that really could be done via "find" ("find -print0 | xargs -0" covers a multitude of sins, fairly portably) and then just nuts stuff like --hard-dereference that... what's the alternative? Linux doesn't let you hardlink directories, and a file with more than one hardlink is A FILE. Would --ignore-command-error apply to the compressor or just programmatic output streams?

Busybox NOT implementing stuff for a long time is a useful data point: they got a couple decades of people poking them and going "I need this". If it didn't happen (strongly enough for them to react), that's informative.

Except I got asked (on github somewhere) to support appending: -r and -u and maybe -A? (Which is appending one existing archive to another, which you don't need tar for...? I mean, it cuts off the trailing NUL blocks I guess. There's an -i option which... I don't know why that always being on would be a bad thing? Probably some historical reason...)

The existence of "lzip", "lzop", and "lzma" makes me tired. None of which are "xz". (It's like being back in the days of arj and zoo.)

Ahem: ok, back up to the motivating use case: tar --directory={intermediates_dir} --wildcards --xform='s#^.+/##x' -xf {base_modules_archive} '*.ko'

Oh yes, and with gnu/dammit tar --wildcards affects patterns AFTER it but not before it in the command line. Sequencing! Right.

Ok, wildcards can be switched on for extract but NOT for create because creation isn't doing a directory search but is opening (and recursing into) specific command line thingies so there's no comparison being done: there's no readdir() in that codepath, the open(argv[x]) either succeeds or fails. Comparisons are done for creation exclusion (while recursing?), extraction inclusion, extraction exclusion... which corresponds to toybox tar's 3 existing calls to filter() with add_to_tar() calling filter(TT.excl), and then unpack_tar() doing both filter(TT.incl) and then filter(TT.excl). Both TT.excl calls should default to --no-anchor --wildcards-match-slash but the TT.incl call shouldn't (but currently does because I only implemented one filter behavior). The man page implies incl should default to --anchored --no-wildcards --no-wildcards-match-slash...

Sigh, I can just compare my argument with the global variable to distinguish the two cases, and set the default that way. It's ugly, but having the caller (redundantly!) specify the defaults is also ugly, and having an extra argument to distinguish the modes when I can just test for it... Wanna get this finished and move on to the next thing.


January 24, 2023

It's been a while since I've had a significant visual migraine.

The experience is not raising any positive nostalgia.

Not a productive evening.


January 23, 2023

Checked in the probably correct but not actually tested DIRTREE_BREADTH code (which at least didn't cause regressions in the test suite) this morning, but haven't used it to implement tar --sort yet because I still have 2/3 of --wildcards in my tree. Which is actually a half-dozen options because there's --no-wildcards-match-slash and so on.

Urgh, why is tar.c using a raw constant instead of FNM_LEADING_DIR? I did not leave myself a comment about WHICH build environment barfed on the macro. The fnmatch.h header is in posix but this particular constant isn't. It's unsurprisingly in glibc, it's in bionic (which says it got it from openbsd), it's in musl. Boot up freebsd-13 under kvm... that's got it too. And Zach got me a mac login... it's there as well.

Ok, is it a 7 year time horizon thing? The date on the line according to git annotate is 4 years ago, so most likely 7 years has expired by now if that was the case? (It's not a kernel thing, it's a libc thing. Annotate on musl's fnmatch.h says it's from 2011, a full dozen years ago.) Eh, swap the constant for the macro and see who complains...

Oh wow. It's glibc that complains. It wants #define ALL_HAIL_STALLMAN to provide the constants, but on Bionic and FreeBSD and MacOS they're just there without magic #defines. And it's the same constant value everywhere. Right, #ifndef in portability.h time, maybe posix will catch up somewhere around 2040...
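So portability.h grows something like this (the value 8 matches every libc surveyed above, but check your own headers before trusting it):

  // glibc hides this behind feature test macros, everybody else just
  // provides it, and it has the same value everywhere it exists.
  #ifndef FNM_LEADING_DIR
  #define FNM_LEADING_DIR 8
  #endif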

Yay, dreamhost fixed it. My two posts about it to the list didn't wind up in the web archive and I was all ready to take up my sword again... but it's because I sent the message and the reply to "lists@landley.net" which is not a real address. Hopefully google and archive.org will start populating again at some point.


January 22, 2023

That tar --xform test failure which only happens on musl is because musl still doesn't have regexec(REG_STARTEND). So it's just a new manifestation of a known failure, eating another round of debugging time because 10 years ago Rich explicitly refused to implement something even the BSDs have.
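For reference, what REG_STARTEND buys you (it's a BSD extension glibc also ships: matching within a slice of a buffer without copying or NUL-terminating it):

  #include <regex.h>

  // pmatch[0] doubles as input: rm_so/rm_eo delimit the region of buf
  // to search, instead of the match stopping at the first NUL byte.
  int match_slice(regex_t *preg, char *buf, int start, int end)
  {
    regmatch_t match = {.rm_so = start, .rm_eo = end};

    return regexec(preg, buf, 1, &match, REG_STARTEND);
  }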

Sigh. I'm eventually either going to have to fork musl or drop support for it. I should just switch that date test back on. There are multiple "yup, musl and musl only is broken, this even works on BSD" cases already. The test suite needs a MUSL_IS_BROKEN flag on tests, or something...

A tech writer recently boggled at the pointless "undefined behavior" in C compilers written by C++ developers. And here's a rant I edited out of a post to lkml:

The C language is simple. The programs you write aren't, but the LANGUAGE is. C combines the flexibility of assembly language with the power of assembly language: it's basically a portable assembly language, with just enough abstraction between the programmer and what the hardware is actually doing that porting from x86 to arm isn't a complete rewrite. You manually allocate and free all resources (memory, files, mappings) and all sorts of stuff like endianness, alignment, and word size is directly whatever the hardware does. In C, single stepping through the resulting assembly and matching it up with what your code does isn't that unusual. I've gone looking at /proc/self/maps on a sigstop'd binary and objdump -d on the elf executable to figure out where it got to, and in C you _can_ do that.

C++... isn't that. The language is DESIGNED to hide implementation details, all that stuff about encapsulation and get/set methods and private and protected and friend and so on is about hiding stuff from the programmer. Then when implementation details leak through anyway, it tries to fix everything by adding more layers (a la "boost") on top of a broken base, but that's like adding floors to a skyscraper to escape a cracked foundation. It's still static typing with static allocation (they're insanely proud of tying stuff to local variable lifetimes and claiming that's somehow equivalent to garbage collection) and it's GOING to leak implementation details left and right, so they have buckets of magic "don't do that" prohibitions which they cargo cult program off of. Most of C++ is learning what NOT to do with it.

C was simple, so C++ developers hijacked compiler development and have worked very hard for the past 15 years to fill C with hidden land mines so it can't be obviously better than C++.

C is a good language for what it does. C++ is a terrible language. The C++ developers have worked tirelessly to make C and C++ smell identical, and as a result there's a big push to replace BOTH with Rust/Go/Swift and throw the C baby out with the C++ bathwater.

Haven't heard back from dreamhost, so I've submitted ANOTHER support request:

http://lists.landley.net/robots.txt prevents Google from indexing http://lists.landley.net/pipermail/toybox-landley.net/

I did not put http://lists.landley.net/robots.txt there and cannot delete it.

The contents of http://lists.landley.net/robots.txt are:

User-agent: *
Disallow: /

Would you please delete this file, or change it to allow Google to index the site? I do not have access to it.

Here's hoping THAT is explicit enough for them to actually do something about it. Third time's the charm?


January 21, 2023

Properly reported the qemu-mips breakage. That list may be corporate, but it's not the wretched hive of scum and villainy linux-kernel's turned into, so maybe... (Yay, there is a patch, and it Worked For Me.)

So what DIRTREE_BREADTH _should_ look like is something like...

  1. The initial callback (which always happens) returns BREADTH, and the calling function populates the ->child list one level down the same way DIRTREE_SAVE would.
  2. The second callback has ->again set to DIRTREE_BREADTH, which lets you sort the children. When this one returns, it recurses into those children unless you returned DIRTREE_ABORT. This recursion frees each child if its initial callback didn't return DIRTREE_SAVE.
  3. The DIRTREE_AGAIN callback is handled normally, although the children were already freed if not SAVEd.

Hmmm, instead of checking for DIRTREE_BREADTH a lot the "populate children" loop should just pass a NULL callback while accumulating children... Sigh, I need to stress test DIRTREE_ABORT to make sure A) it returns from anywhere, B) it doesn't leak memory. Except most of my actual users don't choose the abort path, they continue on despite errors: tar, rm, cp...
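Under those semantics a sorting consumer would look something like this (sketch only; sort_children() is a hypothetical helper that reorders node->child):

  static int callback(struct dirtree *node)
  {
    if (!node->again) return DIRTREE_BREADTH;  // populate children first
    if (node->again & DIRTREE_BREADTH) sort_children(node);  // now sort them
    return 0;  // and let the plumbing descend into the sorted children
  }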


January 20, 2023

We have a dishwasher again! Exact same type as last time, so it looks like nothing has changed but so much work went into this. (Ah, that old story.) The install guy set it doing an empty practice run first, but then we have so many dishes to wash...

Jeff is trying to set up an sh4 development environment so he can come up with mmu patches and send them to linux-kernel, and I've been feeding him the trail of breadcrumbs I've laid out with mdm-buildall and mkroot and so on, but even using my prebuilt binary system image tarball the network didn't work for him, and that's because I'm using an older qemu version than he is.

Building QEMU from source recently broke network support for all platforms by splitting it out into a separate package your distro has to install for you. Because obviously the ability to talk to the network is not a standard thing a VM would want to do. This now requires "libslirp". There's an existing slirp package, for the serial line internet protocols slip and ppp, which has nothing to do with libslirp that I can tell. Luckily devuan has a "beowulf-backports" repository alongside all the others, which I can add (why didn't the OS install do that?) to get this libslirp-dev package. I'm still annoyed the IBM mainframe guys who took over QEMU development when kvm displaced xen as Linux's standard "hypervisor" are suddenly demanding it, but at least I can get Jeff unblocked now.

Mainframe punched card culture should not be allowed to turn functional software into bloated "enterprise" crap: qemu-system-arm64 (ahem, I mean qemu-system-aarrcchh6644) is A HUNDRED AND TWENTY FIVE MEGABYTES. Dynamically linked! That can't be right. You can tell Fabrice Bellard moved on long ago, and was replaced by a committee.

And test_mkroot.sh says mips is still broken... because the ethernet hardware isn't binding even WITH the library installed. And that's... because an endianness "fix" broke big endian for pretty much the entire PCI bus. Sigh. Vent about it all and move on...

Ok, tangent du jour beaten back down, let's circle back to the toybox design issue I'm frowning at. What notes did I leave myself:

why are recurse and handle_callback split?
  dirtree_add_node(): clear design, yay
    - maybe add callback as argument to dirtree_add_node()?
  dirtree_handle_callback: 

stages:
  fetch dir, initial callback: returns DIRTREE_BREADTH
    fetch children, via recurse with BREADTH.
      problem: closed fd already? (don't close for BREADTH)
    breadth callback: returns DIRTREE_RECURSE
      traverse children now
        call handle_callback on each?

Which means: DIRTREE_BREADTH isn't that hard to implement, but the existing code has three functions that really seem like they shouldn't be split that way?

  • dirtree_add_node(dirtree *parent, char *name, int flags) - creates a struct dirtree from a file. Handles the flags FOLLOW, STATLESS, and SHUTUP. Returns a new node with ->parent connected but not ->child.

  • dirtree_handle_callback(dirtree *new, function *callback) - calls callback(new) and handles the return value: flags RECURSE, COMEAGAIN, SAVE, and ABORT. (And I'm trying to add BREADTH here.)

  • dirtree_recurse(dirtree *node, function *callback, int dirfd, int flags) - most of the plumbing.

One sharp edge is that handle_callback() is opening the dirfd for recurse, but then recurse is closing it, which is NOT a happy lifetime rule.

I think the reason for all this tangle in the first place is I was trying to recurse the data structure without making the FUNCTIONS recurse, so it didn't eat an unbounded amount of stack when descending into a tree of unbounded depth? (Especially nasty on nommu.) Except that pretty much means having all three of them be a single function, because otherwise they're calling back and forth between each other. Or having one function that calls the others in a loop, which isn't what it's currently doing.

In any case, "implement breadth first search" and "reorganize this to not be designed wrong" really need to be two different passes, otherwise I'm here for a while...


January 19, 2023

Ha! The dirtree.c plumbing shouldn't have separate DTA_BLAH flags for the "again" field to distinguish different types of callbacks, it should reuse the existing DIRTREE_COMEAGAIN, DIRTREE_STATLESS, and DIRTREE_BREADTH bits. (The "again" field is a char so can only hold the first flags, but I can reorder the DIRTREE flag list as necessary so the ones that cause callbacks are all at the start. Nobody else cares which flag is which, that's why there's macros.) This way, the again bits are the same as the reason for the callback: no flags is the initial "we found and populated a struct" callback you always get when callback isn't NULL, then BREADTH is "finished populating a directory with implicit DIRTREE_SAVE but did not descend into it yet, so now would be a good time to sort the children", and then the COMEAGAIN call would be the final call on the way out of the directory after handling all children. (STATLESS doesn't cause a separate callback, but is set on any callback when stat isn't valid.)

I should rename DIRTREE_COMEAGAIN to just DIRTREE_AGAIN (it was a Simpsons reference), but my tree's too dirty for comfort, need to check other stuff in first.

For BREADTH child callbacks are deferred until traversal: if the initial no-flags callback on the directory returns DIRTREE_BREADTH the plumbing should populate all the child structures without making any callbacks on them yet, then it does a callback on the same dir again with DIRTREE_BREADTH, then traverses the child list doing normal callbacks but freeing all the non-dir children after each callback returns, and then traverses the now-shortened list again handling the directories it needs to descend into...

Hmmm, that's not what gnu/dammit tar is doing, though. It's populating and sorting the list, then traversing it but descending into each directory as it encounters it in the traversal. Which isn't a true breadth-first search, it has ELEMENTS of breadth-first but... Ok, the return codes from the callback functions need to control order. Maybe if the DIRTREE_BREADTH callback returns DIRTREE_RECURSE then we descend into it now, and if not we do the second pass thing? Hmmm. I've got DIRTREE_SAVE, DIRTREE_RECURSE, and DIRTREE_BREADTH, and can return a chord of any of them to mean what I need it to, the question is what's the most obvious way to signal what I need it to do? What ARE the use cases?

This needs some pacing and staring into the distance....


January 18, 2023

Sitting at HEB with a stack of beverages I just bought (refill on blueberry energy cylinders, the checkerboard teas are back in stock, and there was a good coconut water coupon today)... but no snacks.

I miss japanese grocery stores and conbini. The conbini of course had rice balls and steamed buns and even microwaveable hamburgers if you wanted to get serious. The grocery store near the office had lovely little 100 yen sandwiches, which were just two pieces of cheap white bread with some filling (I usually got the strawberry jam or tuna varieties), crimped in some variant of a panini press that cut off the crusts and sealed the edges, and then presumably run through a nuclear reactor to sterilize them so they have multi-week shelf lives. (Like mythbusters did to sterilize those tortilla chips in the "double dipping" episode: a conveyor belt moves the product past a strong radiation source, basically a non-heating microwave that kills all the bacteria with a few seconds of intense gamma radiation. The expiration date on the package is when the sandwich dries out slightly and is less tasty, I never had one actually go bad.) We could totally do that here in the states, we just don't: some variant of laws, culture, inclination, and capitalism optimizing for profit over all else.

Ok, tar --sort needs DIRTREE_BREADTH to do breadth first search. I could instead do DIRTREE_SAVE to populate the whole tree up front, then sort the whole tree, and then traverse the resulting whole tree, but don't want to because A) directories changing out from under us are less icky if you do it all in one pass, B) I've already got the openat() directory-specific filehandles for local access (I can open "file in this directory") in that initial pass. A second traversal has to either re-establish the openat() filehandles, or create/open big/long/path and potentially hit PATH_MAX issues. Since I don't have existing plumbing to do either of those yet, and as long as I have to write new plumbing ANYWAY, I might as well implement the DIRTREE_BREADTH stuff I have some existing design stubs for.

DIRTREE_BREADTH brings up the DIRTREE_COMEAGAIN callback semantics: to enforce a specific traversal order I need to sort each directory's contents before descending into it. I reserved a DIRTREE_BREADTH flag back at the start but never implemented it, and I now have _three_ users of this plumbing that I'm aware of (ls, find, tar) so sounds like time to implement it. (Whether or not I poke ls.c with a stick afterwards remains an open question.)

Looking at find -depth is... sigh. The toybox find help text describes -depth as "ignore contents of dir" and the debian man page describes -depth as "Process each directory's contents before the directory itself" and I don't remember if posix even has -depth, and I probably need to spend an hour or two on this rathole, but I haven't got spare cycles for it right now. (And I've already REVIEWED this one multiple times, so 99% likely I wouldn't be fixing the code but just updating my memory of it.) Anyway, -depth existing implies that _without_ it find is doing a breadth first search... which it demonstrably isn't in simple testing. Ok, find is NOT doing breadth first search. I thought it had an option for this, but no. It has an option to tell it what order to _act_ on what it's traversing, but it still descends into each directory it encounters when it encounters it.

The ls.c code is taking manual control of the traversal by having the callback return DIRTREE_SAVE without DIRTREE_RECURSE so the traversal populates a directory's children, then it converts the linked list to an array, sorts the array, uses the array to re-link the list objects in the right order, then it iterates over the sorted list and calls dirtree_recurse() again on each directory entry.
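(Quick demonstration of the find -depth point above: it changes the order entries are acted on, not the traversal itself:

$ mkdir -p a/b; find a
a
a/b
$ find a -depth
a/b
a

Same descent either way, the directory just reports after its contents.)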

So I want dirtree_recurse to assemble the list, call a sort callback on the directory that can reorder the children, and then traverse them and descend. Which is a different callback from the current DIRTREE_COMEAGAIN callback? Do I need a third dirtree->again flag value? It's got 1 (callback on directory after processing all contents) and 2 (DIRTREE_STATLESS returning a file we can't stat), which are set/used as constants without macros defined for them. A third means macros, what would... DTA_AGAIN and DTA_STATLESS maybe?

Hmmm... but IS this callback a different one than DIRTREE_COMEAGAIN? It sounds like DIRTREE_BREADTH means: 1) DIRTREE_SAVE a linked list of a directory's children without recursing, 2) call the DIRTREE_COMEAGAIN callback on the directory, 3) traverse the saved list... doing what exactly? When are these freed? If we free them in the step 3 traversal, how do they ever get used?

Ok, I think I do want a third flag: DTA_DIRPOP lets you sort a directory after it's populated, and then we call with DTA_AGAIN on each entry right before we free it. Except the find -depth question comes in: does the directory count as occurring before or after its contents? That's a question for the sort function... ah, ok: while traversing the list, do a DTA_DIRPOP call before descending into it, DTA_DIRPOP|DTA_AGAIN after populating it, and then DTA_AGAIN without DTA_DIRPOP before freeing it. Silly, but it gives the callback multiple bites at the apple while still having generic infrastructure actually do the traversal.
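
To make the ordering concrete, here's a toy traversal showing where the three callback invocations would land (the DTA_* values and all the names here are guesses from my notes above, not committed code):

  #include <stdio.h>

  #define DTA_AGAIN   1  // hypothetical values, not toybox's
  #define DTA_DIRPOP  4

  struct node {
    char *name;
    struct node **kids;  // null terminated array, or 0 for a file
  };

  static void callback(struct node *n, int dta)
  {
    printf("%s:%s%s\n", n->name, (dta & DTA_DIRPOP) ? " dirpop" : "",
      (dta & DTA_AGAIN) ? " again" : "");
  }

  // Files would get their normal single callback, elided here.
  static void traverse(struct node *n)
  {
    int i;

    if (!n->kids) return;
    callback(n, DTA_DIRPOP);            // before descending into it
    // ...populate the saved child list here (DIRTREE_SAVE style)...
    callback(n, DTA_DIRPOP|DTA_AGAIN);  // after populating: the sort hook
    for (i = 0; n->kids[i]; i++) traverse(n->kids[i]);
    callback(n, DTA_AGAIN);             // last look before freeing it
  }

  int main(void)
  {
    struct node file = {"dir/file", 0};
    struct node *kids[] = {&file, 0};
    struct node dir = {"dir", kids};

    traverse(&dir);

    return 0;
  }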

And this is basically a wrapper function before the existing add_to_tar() dirtree callback that checks the flags and does sorting stuff as necessary, but otherwise calls the other callback. And you only insert the second callback when doing --sort. Ok, that seems feasible?
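
Reduced to toy form, that wrapper is just a decorator around a function pointer (all names here invented for the sketch, not tar.c's):

  #include <stdio.h>

  typedef int (*dt_callback)(const char *name);

  // the existing callback, unchanged
  static int add_to_archive(const char *name)
  {
    return printf("add %s\n", name) < 0;
  }

  // extra work bolted on in front, falling through to the old callback
  static int sort_hook(const char *name)
  {
    printf("sort bookkeeping for %s\n", name);

    return add_to_archive(name);
  }

  int main(int argc, char *argv[])
  {
    // pretend --sort was given when there's any argument
    dt_callback cb = (argc > 1) ? sort_hook : add_to_archive;

    return cb("file.txt");
  }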

Implementing is easy, figuring out WHAT to implement is hard.

Darn it, one of the commands that came up in need of tweaking when I change dirtree semantics is chgrp... which was never converted to FLAG() macros. But chgrp.tests needs root to run, meaning I want to run it under mkroot, and that whole BRANCH of development is... several pops down the stack.

My _development_ plan has circular dependencies. Gordian knot cutting time, let's do it "wrong" for a bit just to clear some things...


January 17, 2023

My sleep schedule has been creeping forward towards my usual "walk to UT well after dark and spend the wee hours at the university with laptop", but I got woken up at the crack of dawn by sirens, flashy lights, and engine sounds right outside my window because the big house on the corner caught fire, and between something like 7 fire trucks and the police blocking off the street at both ends it was Very Definitely A Thing even from bed. I got up to make sure there wasn't incoming danger to us, and then I was up...

Kind of out of it all day as a result. Got a nap later, but "5 hours then FORCED UP" is something I may be too old to handle gracefully...

1pm call with Jeff to go over the Linux arch/sh patches, and the mmu change that apparently motivated the latest round of dickishness.

Elliott wants --sort=name, so looking at that. The man page has a short -s right next to it, which... "sort names to extract to match archive". What does that _do_ exactly? I'm already going through the archive in the order the names in the archive occur. There's not much alternative with tar. You can pass a bunch of match filters on the command line, but it's going to encounter them in the archive it's extracting, and thus extract them, in the order they occur in the archive. Tar != zip, it's not set up to randomly seek around, especially when it's compressed.

Sigh, my tar tree still has 2/3 of a --wildcards implementation in it, and does not currently even compile. Plus a bunch of test suite tests the host passes but my version doesn't. Need to finish that or back it out...

And when I do full tests against the musl build, tar is failing the "xform trailing slash special case". Which I don't notice when it's skipping the xform tests because it's using non-toybox sed (as happens on "make test_tar" unless I do special $PATH setup), and which I don't notice when testing a full glibc build because it works there. 95% likely it's musl's regex implementation, but... what specifically is diverging?
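
One way to narrow down a suspected regex divergence is a scratch program compiled against each libc and diffed, something like this (ERE here so + works unescaped; a sketch, not the tar code):

  #include <regex.h>
  #include <stdio.h>

  int main(void)
  {
    regex_t re;
    regmatch_t match;
    char *tests[] = {"usr/bin/ls", "usr", "/", "a/", 0};
    int i;

    // "one or more of anything, then a slash", anchored at the start:
    // greedy, so it should eat up through the LAST slash.
    if (regcomp(&re, "^.+/", REG_EXTENDED)) return 1;
    for (i = 0; tests[i]; i++) {
      if (!regexec(&re, tests[i], 1, &match, 0))
        printf("\"%s\" matched %d-%d\n", tests[i], (int)match.rm_so,
          (int)match.rm_eo);
      else printf("\"%s\" no match\n", tests[i]);
    }
    regfree(&re);

    return 0;
  }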

I would have an easier time with this if I remembered exactly what the "xform trailing slash special case" IS. October wasn't that long ago, but I checked this in as part of a large lump after days of work and there were a bunch of tests? It's searching for "^.+/" which... ^ is start of string, . is single character wildcard, + is * except "one or more" instead of "zero or more", and then / is presumably a literal / except it says "special case" here... Sigh, was this in the tar manual?

The example at the very end of that page is about specifying multiple sed transforms on the same command line, the first of which is NOT TERMINATED PROPERLY. (I.E. --transform='s,/usr/var,/var/' is missing a comma at the end.) And they repeat it twice the same way. Is this a doc mistake they cut and pasted, or does their implementation accept that? I'm afraid to check, and have NO idea how to deal with it if their implementation DOES allow it but normal sed doesn't. Maybe circle back to --xform after implementing the new stuff...


January 16, 2023

Ok, here's how I could cheat on the toysh "read" builtin: the case I care about optimizing is "while read", and the "while/do/done" block has an entry/exit lifespan. I can have the "while" cooperate with "read" to cache a FILE object. The read has to save it because "-u fd" is a read argument, but the while gives it someplace TO save it with a longer lifespan than the individual read call, and passing out of the "done" lets us know when to free the FILE *. Hmmm, I could store it in sh_blockstack's char *fvar with an evil typecast, that's not used by while... I'm dubious. Need to pace about it more. Probably best to implement just the slow path first. (There are SO many read options... timeout, length with and without terminator, -s puts the terminal in raw mode... I'm gonna need to go back and implement array variable support in everything at some point? How do I stage this sanely...)
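
The cache itself could be as small as an fdopen() wrapper whose lifespan the enclosing loop owns, vaguely like this (a sketch, not toysh code; the FILE buffering still reads ahead, it's just confined to the loop's lifetime now):

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  struct read_cache {
    FILE *f;
    int fd;
  };

  // "read" builtin body: reuse the cached FILE * while -u fd stays the same
  static ssize_t read_line(struct read_cache *rc, int fd, char **line,
    size_t *len)
  {
    if (!rc->f || rc->fd != fd) {
      if (rc->f) fclose(rc->f);
      if (!(rc->f = fdopen(dup(fd), "r"))) return -1;
      rc->fd = fd;
    }

    return getline(line, len, rc->f);
  }

  int main(void)
  {
    struct read_cache rc = {0, -1};  // owned by the while/do/done block
    char *line = 0;
    size_t len = 0;

    while (read_line(&rc, 0, &line, &len) >= 0) printf("got: %s", line);
    free(line);
    if (rc.f) fclose(rc.f);          // freed on the way out of "done"

    return 0;
  }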

Oh hey, Greg KH is _also_ yanking most of the classic graphics drivers from linux-kernel. It REALLY sounds like linux-kernel development is collapsing and they're throwing code overboard as fast as they can. I hope that's NOT the case, I really thought we had another 5 to 10 years before that happened, but if Linus has decided to retire early because his daughters are all off to college... Let's see, his and his three daughters' birthdays are the easter egg values in "man 2 reboot" LINUX_REBOOT_MAGIC2, which are:

$ printf '%x\n' 672274793 85072278 369367448 537993216
28121969
5121996
16041998
20112000

So Linus is 53 (December 28, 1969) and his _youngest_ daughter is 22. Yeah, he's probably recently become an empty nester, and may be "quiet quitting" to go do other things with his life. And Greg has been waiting DECADES for the opportunity to do to Linux what Elon Musk is doing to twitter. Like an alcoholic buying a distillery. Sigh.

My annoyance with current linux kernel development is "Stop breaking stuff. Can the things that used to work still work?" And the reason we CAN'T have a stable kernel that doesn't shed features is... Greg Kroah-Hartman! Who many years ago proudly wrote a document named stable-api-nonsense arguing that Linux EVEN HAVING a stable driver API, so you could keep portable drivers between versions the way Windows did for many years, was a crazy idea that Linux would never ever do. Userspace can still run a static binary from 1996, the kernel can't load a module from 9 months ago. Partly because GPL, and partly because Linux MUST be free to completely rewrite all the plumbing every 18 months to gain half a percent performance improvement with worse latency spikes. And now Greg's deleting a bunch of working drivers that are too hard to maintain under his insane regime. Wheee...

Sigh. Speaking of spiraling narcissists, did you know that Elon Musk got the idea of going to mars from a science fiction book the Nazi rocket scientist Wernher von Braun wrote in 1949, in which the emperor of Mars was named "Elon"? Back in the 1950s the reason Musk's grandparents gave for leaving Canada for apartheid South Africa was they perceived a "moral decline" in Canada (Wikipedia says "Most of the recorded student deaths at residential schools took place before the 1950s" so Musk's grandparents left Canada about when the mass kidnapping and murder of native children declined, and instead they traveled halfway across the world to participate in Apartheid). So there's a nonzero chance Musk was named after that character in the 1949 German book, since his family was VERY familiar with a wide range of otherwise obscure nazi materials. So of course various Musk fans are now going "famous rocket scientist predicted Elon would be emperor of mars!" and I'm going "you have the causality here exactly backwards". Why do people keep thinking the man's ever had an original idea? That's NEVER been how he works...

My grandfather also interacted with Von Braun, they worked together on the Apollo program. (My parents met on the apollo program, because my father dated his boss's daughter.) The story grampa told me was that Von Braun's most important contribution to the US space program was statistical failure analysis. Grampa never mentioned the NSA until he got _really_ low while my mother was dying of cancer in the other room, shortly after my grandmother had died of her chronic lung problems (emphysema and eventually lung cancer, from years of smoking back before I was born). They'd had three kids, I never met Uncle Bo who volunteered to fight in Vietnam over Grampa's strenuous objections and died there when his helicopter was shot down. Grandpa was now outliving a second kid and not taking it well, and started by complaining about how his hearing was shot and his big interest that got him into electronics had been crystal radios and audio. He was telling me how the allies recorded sound on magnetized metal wire but it got cross-talk when spooled and you couldn't splice it if it broke or got a kink, but they captured desk-sized audio reel to reel tape recorders from nazi bunkers which were a MUCH better design: built-in insulation between the magnetic layers in the spool and the tape could be cut cleanly and attached together with scotch tape on the back, and some of the GIs shipped a couple of them to the states where Bing Crosby paid to have them reverse engineered (and vastly simplified) so he could ship tape reels around to radio stations instead of constantly flying to give live performances, and this became the company "Ampex". Grandpa also told me how he did cryptography during the war creating one time pad vinyl records of "music of the spheres" radiotelescope recordings of ionized particles hitting the upper atmosphere sped up to be good random static which completely drowned out the voice unless you had the exact same record to do noise cancelling on the other side (stacks of these records were carried across the Atlantic via courier, each one smashed and the pieces melted after one use). Churchill and FDR used these to securely talk to each other live over transatlantic cable, and this proceeded naturally to grampa venting about being blackmailed into joining (what became) the NSA after the war because they were going to draft him and put him on the front line in Korea if he didn't "volunteer", and then not being able to get out for decades until some idiot almost got him killed in Iraq in the 1980s by trying to hand off intelligence to him in his hotel room while he was there as a technical expert for General Electric upgrading (and bugging) the Iraqi phone system. (Apparently the various spy services are the best technical recruiters, finding you companies to work at. Well, they were decades ago, anyway. My take-away was "don't get any of that crap on you, you'll never get out again", and I learned it from my father's simple defense contracting.)

Oh hey, Dreamhost replied. They escalated to somebody who DID NOT BOTHER TO READ MY SUPPORT REQUEST. Not even the subject line, which reads "Re: The robots.txt you put on lists.landley.net (which you won't let me log into) blocks google."

On 1/15/23 23:48, DreamHost Customer Support Team wrote:

> Hello,

> Thank you for contacting DreamHost Support! My name is XXX and I'd be happy to assist you with your concerns.

> With regards to the discussion list service, the last time this service was touched was last year in July when we had a maintenance on it to where we upgraded the services to new hardware. This didn't change much of how the service functions, though, as we're still running the same Mailman version as before under 2.1.39.

The robots.txt file is not technically part of mailman. Mailman runs in a web server, and that web server is serving the robots.txt file.

> About the http://lists.landley.net/listinfo.cgi page, that page has been disabled for a long time now.

I noticed. I complained about it at the time.

> The list overview page for the discussion list service was disabled over 5 years ago, actually.

Yes, as I told you in my last email. Closer to ten, really: https://landley.net/notes-2014.html#20-12-2014

> So, that page posted the "The list overview page has been disabled temporarily" message for a very long time now.

What does the word "temporarily" mean here?

> Unfortunately, that cannot be edited, but you already have your list archives set to public, so they can all be accessed here: http://lists.landley.net/listinfo.cgi/toybox-landley.net

Yes, I know, they are linked from https://landley.net/toybox on the left. But if people go to the top level lists.landley.net page, they do not get a list of available lists, and every couple months (for many years) people ask me why, and I tell them "because Dreamhost is bad at this".

For comparison, if I go to http://lists.busybox.net I don't need to remember the exact URL of the list I want to look at, because there is a navigation link right to it. That is not true of the toybox project, and I can't fix it, and my stock response to everyone who asks is "because Dreamhost is bad at this". Your service makes my open source project look bad to the point it's a FAQ.

The top level index page is especially nice if I'm sitting down at a different machine than I'm normally at and using a standard web browser to see if there are new posts, because remembering the full URL with the "dash between toybox and landley but the dot between landley and net and also a dot between listinfo and cgi"... that's tricky to do from memory.

> Since it's public, clicking the "Toybox Archives" link will open up the archives for that list for anyone that finds it.

I know how mailing lists work. I use them. If you looked at the mailing list in question you'd see I last posted to it on Thursday. The "enh" posts are from Elliott Hughes, the maintainer of the Android base operating system for Google. He's the second most active poster to the list in question. I used to have other mailing lists for other projects, but they ended or moved off dreamhost "because Dreamhost is bad at this".

> As for the robot.txt file,

It's robots.txt.

> your 'lists.landley.net' sub-domain for the list does not use a robot.txt file.

Because it's robots.txt, as defined in the IETF RFC documents: https://www.rfc-editor.org/rfc/rfc9309.html

Point a web browser at: http://lists.landley.net/robots.txt

Do you see the file there? The file is wrong. The result returned by fetching that URL (which I CUT AND PASTED INTO MY LAST MESSAGE TO YOU) prevents Google from indexing the site. I do not have control over this file, for the same reason I had no control over the "temporarily disabled" message. It is a thing Dreamhost did on a server I do not have direct access to.

> In fact, on the mailman server, the services are not actually under the list sub-domain. That's just the sub-domain that all of your lists are managed under.

Do you see the "which you won't let me log into" up in the subject: line of this email, from my original support request?

In the message you are replying to I explained that "landley.net" and "lists.landley.net" are on different servers and I don't have access to the lists.landley.net one to fix what's wrong with it. You are repeating my problem statement back at me.

> But, on the mailman server, each list has its own set of configurations and files. For example, the stuff for the 'toybox' list is under the 'toybox-landley.net' location on the mailman server and has no robots.txt file.

When you fetch the URL from the server, there _is_ a robots.txt file. (Which you spelled properly this time.) The text "temporarily disabled" probably wasn't in the toybox-landley.net subdirectory either. The mailman install has templates for shared infrastructure.

This implies that it's a global setting, and you have thus blocked google search on EVERY mailman domain that Dreamhost serves. (Which I suspected was the case but don't know what other server pages to look at to confirm it.)

> It's just a sub-domain DNS record that points to the list server for where the list is managed.

Yes, I know. I managed my own DNS for the first few years you hosted my site, until I took advantage of your free domain renewals as part of the bundle.

I'm sure there was a little "yes I am experienced at web stuff" radio button selector when I submitted this help request? It did not have a "I ran my own apache instance for about 10 years, have also configured nginx, and even wrote my own simple web server from scratch in three different languages" option, but still. (The httpd implementation I wrote last year is at https://github.com/landley/toybox/blob/master/toys/net/httpd.c because I needed something to test the new wget implementation with, so I did a simple little inetd version. Haven't wired up CGI support yet but it's got about 1/3 of the plumbing for it in already.)

The problem isn't that I don't know what's wrong, it's that I do not have access to fix it. I thought I'd explained this already, but I can repeat it.

I can SEE the robots.txt file. So can google. It is there. It should not be.

> And lastly, I'm afraid that our list services are not configured to run through HTTPS and there are no plans on getting that updated at this time, unfortunately.

Yes, I know. But that isn't _fresh_ breakage, so I'm living with it as part of the general "dreamhost is bad at this" Frequently Asked Question.

But Google _could_ find my mailing list entries a year or so back, and can't now, so Dreamhost adding a bad robots.txt is fresh breakage. (Dunno how long the google cache takes to time out when a new deny shows up?)

Given that the project I'm maintaining through that mailing list is Google's command line utilities for Android (I.E. their implementation of ls/cat/set etc as described in https://lwn.net/Articles/629362/ ) that's especially embarrassing.

> This would be quite the project as it would require an upgrade of Mailman, likely to version 3, which is quite different from version 2. So, the list admin page can only be accessed through HTTP. I'm very sorry about that.

Eh, I'm used to it.

I don't _think_ Android has entirely dropped support for non-encrypted URLs yet, only for certain api categories. (Which sadly broke my podcast player when upgrading to Android 12 no longer let it load http:// podcast files, only https.) I think you still have a couple more years before your mailing list infrastructure becomes entirely inaccessible from phones: https://transparencyreport.google.com/https/overview?hl=en

That uptick to 100% in the chart when Android 13 came out is a bit worrying, but I haven't bought a new phone in a few years and mine is only supported through 12. _I_ can still access it. (And from my Linux laptop, of course. No idea if random windows or mac users still can though. Safari's policy and chrome's policy and mozilla's policy don't advance in lockstep, but I hear rumblings.)

Most websites have put mandatory http->https forwarding in place where accessing http just gets you a 301 redirect to https for _years_ now. Try explicitly going to "http://cnn.com" or "http://google.com" in your browser, it will load the secure page. It can't _not_ do so.

The rise of "let's encrypt" (nine years ago according to https://lwn.net/Articles/621945/ ) was what finally let people start deprecating the old protocol in clients, because sites no longer have to pay for a certificate so even the third world organizations running a solar powered raspberry pi on their cell phone towers can afford https now.

> I hope that helps to clear things up.

No, it doesn't. The robots.txt file excluding * for / still needs to be removed so Google can index my mailing list posts like it used to do.

> Please contact us back at any time should you have any questions or concerns at all. We're here to help!

The concern I expressed in the subject line is still not fixed.

I'd guess they did this because they didn't have any other way to manage server load, and their servers are underprovisioned. I suppose if they're truly this incompetent and have no other solution, I can set up a cron job to scrape the lists.landley.net web archive and mirror it under landley.net? It's EXTREMELY SILLY to do so, but I can just add that to the FAQ I guess?


January 15, 2023

Oh hey, Greg Kroah-Hartman is also removing the RNDIS driver from Linux, which is how Android does USB tethering. I wonder when Linus stopped being maintainer? The glory hound's been trying to drag the spotlight onto himself for decades now, but used to get told "no" a lot for hopefully obvious reasons. Honestly, he's half the reason I don't post to lkml anymore. Al Viro was less abrasive: I'll take honest disdain over two-faced self-aggrandizing politics any day.

I have some domain expertise with USB ethernet: a couple years back Jeff and I implemented CDC-ACM USB ethernet hardware for the Turtle boards, which could talk to Linux and MacOS but not Windows because Windows doesn't support CDC-ACM. It's a reference implementation from a standards body, but does NOT have a generic Windows driver because Microsoft wants money from each hardware vendor to be "supported". To test it we got a beta of a driver from somebody that made it work for half an hour at a time (before you had to unplug it and replug it because the driver was an "evaluation" version that timed out), but Microsoft charged $30k to sign a driver for Windows, and each is specific to a vendor ID and model number. Microsoft chose to have no generic driver for the protocol, only drivers for specific devices, so each hardware vendor had to pay microsoft $30k each time they needed to update their driver. (They claim they eliminated unsigned drivers for "security", but it's a profit center for them.)

Everybody Jeff talked to suggested we implement the RNDIS protocol instead, which is something Microsoft invented but which both Mac and Linux support out of the box, and that one DOES have a generic driver in Windows that doesn't require $30k periodically sent to Microsoft. Switching our hardware to RNDIS didn't look hard, we just hadn't done the research to make sure there weren't any lurking patents. (PROBABLY not? https://web.archive.org/web/20120222000514/http://msdn.microsoft.com/en-us/library/windows/hardware/gg463298.aspx says "updated 2009" and "assumes the reader is familiar with Remote NDIS Specification; Universal Serial Bus Specification, the Revision 1.1" but that document has been carefully scrubbed off the internet, the oldest I can find is 5.0. Because implementing against the old version is a prior art defense, so the old version is yanked.)

The protocol was all in the FPGA bitstream, the actual USB chip we'd wired to the FPGA pins was just a fancy transceiver that didn't even know about packets, and USB 2.x "bulk" protocols are all the same packet transfers with different header info. We never got around to prototyping it, we ran out of time shortly after we got the CDC-ACM version working (including our own TERRIBLE userspace driver that just spun sending data to/from a memory mapped I/O interface into the kernel's "raw packet" plumbing; improving THAT was our next todo item, but the benchtop prototype was 2x SMP so the driver eating a processor affected power consumption but not performance). Jeff and I both flew out of Tokyo, and a year and change into the pandemic the funding for that project ran out, so it got mothballed without doing a proper production run, and we just didn't get back to it. But using RNDIS was the easy fix, and it's what everybody ELSE in the industry did, including Android's USB tethering.

Now Greg KH seems to be saying "we're losing features left and right, our collapsing development team can't maintain the stuff we've already got, so let's flex OUR market muscle to out-influence Microsoft". Or something?

I suspect Android's response will be "USB tethering is no longer supported on desktop Linux then, oh well, here's a Linux driver for RNDIS if you want to make it work". I haven't asked Elliott, but I remember when USB file transfer between my Linux laptop and android phone used to be really simple... and then it was replaced by some Microsoft protocol I could theoretically install an elaborate Gnome program for which never worked. (Or I could install the Android Development Kit, enable the debug menu in my phone, and use ADB file transfer from the command line. I've had to download a new copy of the android tools from their website every time I've needed to get that to work, because version skew.) Linux on the Desktop is not a commercially significant target market, we get _courtesy_ at best.

Even years from now, it would still be WAY easier for the J-core guys to ship an out-of-tree Linux kernel module than to externally add a driver to Windows without paying them $30k annually-ish. Stuff like the Steam Deck could 100% use an out of tree driver if they needed to. Greg is making vanilla linux development smaller, but who's really surprised? He was the author of the kernel's "Code of Conflict" after all, and Linus was the one who apologized on behalf of the community and very publicly went to therapy to dig the community even a little way out of that hole, not Greg. The aging development community was emitting distress signals in 2013, and again in 2017, and now it's 2023...

(Yes I know Greg wrote "Android has had this disabled for many years so there should not be any real systems that still need this." My phone's running Android 12, I just tethered to check and dmesg said "rndis_host 3-1.2:1.0 usb0: unregister 'rndis_host' usb-0000:00:1a.0-1.2, RNDIS device". Oh, and hey, there's a more convenient way to configure it than I've been doing. I honestly don't know if Greg is clueless or lying, but does it matter? He is a Confidently Wrong White Male.)

USB 2.0 shipped in 2000 so it's fairly recently gone out of patent (hence predictable badmouthing from for-profit manufacturers TERRIFIED of commodity competition from cheap generic hardware; the instant anything becomes available for open royalty-free implementation it MUST BE DESTROYED). As I said above, the oldest RNDIS documentation I could find says "updated 2009" (not authored, updated, so it's older than that) and assumes familiarity with a revision 1.1 specification that's been carefully scrubbed off the internet, because implementing against the old version is a prior art defense. It is entirely possible that RNDIS recently DID go out of patent... and thus must be destroyed. How that idea made it from one of the Linux Foundation's largest contributors to one of the Linux Foundation's most prominent employees, I couldn't speculate, but he's sure confident about it.

RNDIS isn't tied to a specific USB generation (it's a packet protocol going across a transport), but USB 2.0 should be out of patent now (the spec is dated April 27, 2000) and that chugs along around 40 megabytes per second, which is still a quite useful modern data rate: over 20 parallel 4K HD netflix streams, over two gigabytes per minute, just under 7 hours per terabyte. It's about 1/3 the _theoretical_ max rate of gigabit ethernet (which I never get), and we were implementing it full speed on hardware running at... 60MHz I think? Either 4 bit or 8 bit parallel bus into and out of the chip, moving multiple bits per clock. A USB-powered device talking USB-2.0 RNDIS ethernet isn't hard to implement. Our CDC-ACM implementation fit in an ICE-40 with space left over.

I'm grinding through some of those email files from yesterday, trying to identify all the patches sent to the linux-sh list (grep '^+++ ' seems a reasonable first pass there once they're in individual files), but thunderbird saved all the files with the current date so it's not easy to filter for relevance. So I'm doing this (as you do; yes, gnu/dammit date gets unhappy with \r on the end of a date string, apparently):

  for i in sub/*; do toybox touch -d @$(date -d "$(toybox dos2unix < "$i" | sed -n 's/^Date: [ \t]*//p;T;q')" +%s) "$i" || break; done

And I get an error message:

date: invalid date ‘Mon Sep 29 01:50:05 2014 +0200’

And I'm going... wha? Cut and paste that string to toybox date and... yes, it fails too. First guess: click back in xfce's little calendar widget, September 29, 2014 was a... sunday. Seriously? Sigh. Ok, FINE. Oddly, that date's not from the headers, it's from an inline patch which means... how is my T;q on the sed not triggering? (Back before I added that, date was complaining that multiple concatenated dates with \n were not a valid date...)

Ah, my sed is wrong. It expects a space after "Date:" but that message has a tab in the headers, so it didn't match there and instead pulled a date out of a "git am"-ish patch in the body of the message. Ok, fix that and check that they all convert... yup, now they do.

Huh. You know, _technically_ netcat UDP server mode could handle one packet and then go back into "listening for new connection" mode, which would solve the 'locks itself to a specific source port' issue. That wouldn't work for child processes: the reason it's handling UDP packets the same way as TCP connections is so we can pass off stdin/stdout filehandles to child processes. Which is where the "no notification of when a connection with no flow control closes" problem comes from: we'd need some sort of keepalive packet and there's no obvious place to insert that (if the kernel hasn't got a flag we'd need a Gratuitous Forwarder Process of some kind). The reason I didn't do that before is I don't want two codepaths to implement the same thing. Really, my use case here for interactive mode is "Linux net console". Does that send from a consistent source port even across reboots? Hmmm...
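
For what it's worth, Linux does let a UDP socket dissolve its association again by connect()ing to an AF_UNSPEC address, so "one packet per lock" would look vaguely like this sketch (error handling elided):

  #include <netinet/in.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
    struct sockaddr_in addr = {.sin_family = AF_INET,
      .sin_port = htons(12345)}, peer;
    struct sockaddr unspec = {.sa_family = AF_UNSPEC};
    socklen_t plen;
    char buf[65536];
    ssize_t len;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    bind(fd, (void *)&addr, sizeof(addr));
    for (;;) {
      plen = sizeof(peer);
      len = recvfrom(fd, buf, sizeof(buf), 0, (void *)&peer, &plen);
      if (len < 0) break;
      connect(fd, (void *)&peer, plen);      // lock to this source to reply
      write(1, buf, len);                    // ...handle the one packet...
      connect(fd, &unspec, sizeof(unspec));  // unlock, listen for anyone
    }

    return 0;
  }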

At this point I honestly expect healthcare.gov to KEEP sending me emails after today: "You missed the deadline, it was yesterday! How could you!" Yes I did try Obamacare one year, but at the moment I have the classic "VERY NICE health insurance through spouse's work" arrangement (in this case Fade's graduate program through the end of the summer, and then maybe we'll do that Common Object Request Broker Architecture thing to extend it a bit if she hasn't found a job yet, at which point it's _next_ healthcare.gov enrollment period). Alas there's no obvious way to tell obamacare's automated system that A) I'm currently good, B) you are basically useless in Texas because republican assholes bounced the subsidies and sabotaged implementation, C) I schedule doctor's appointments when I visit my wife up in Minneapolis because the hospitals _here_ are collapsing unless all you need is a $100 visit to a nurse practitioner in a strip mall to get regulatory permission to purchase pills from a pharmacy, which are all over the place now. (How much of that collapse was covid and how much was foretold in legend and song is an open question. To answer the follow-up questions: 1) Yes it's intentional, 2) if you don't work for a billionaire-owned company getting a UTI costs more than a car and potentially more than a house so you will put up with ANYTHING to keep your job and they have less competition from small businesses and independent contractors. Guillotine the billionaires.)


January 14, 2023

I wonder if there's some way to get mastodon to do the green check mark thing? If you view source, I've had the link up top for a while now, with the magic rel="me" thing that's apparently an important part of it, but it just doesn't register? (I was reminded by updating the page links for the new year...)

Always odd when I get a request to do a thing I'm in the middle of doing. Yay, I'm on the right track? Not quite sure how to reply... "Um... yeah."

While Fade was here, heading out to poke at my laptop usually meant I'd use like a quarter of my battery then head back. Getting the low battery warning comes as a surprise after a few weeks of Not Doing That.

Dreamhost forwarded my support request to a higher level tech. That's nice. Unlikely to hear back before Monday.

I am once again impressed by how broken Thunderbird is. This needs some context. So Rich Felker theoretically maintains Linux's arch/sh but he hasn't updated his linux-sh git repo in over a year, and Christoph Hellwig unilaterally decided to delete Linux support for the architecture despite plenty of people still using it and having an active debian port and so on. He didn't just suggest it, but posted a 22 patch series to remove it. (The charitable explanation is he's doing a "don't ask questions, post errors" thing and putting the onus on US to object loudly enough.) Of course Greg KH immediately jumped up and went "I am deciding" because he's like that, but in THEORY Linus still has the final say here, and has not weighed in last I checked? And of course the motivations for the removal are contradictory: the primary complaint is it hasn't been converted to device tree (which is true of a lot of stuff), so the reply to objections is to be sure to remove the stuff that IS using device tree. Thanks ever so much.

The guy who maintains the Debian fork has tentatively volunteered to become the new maintainer, and one thing he'd need is all the patches that Rich chronically hasn't applied for years now. (Jeff informed me of this, and has volunteered to help the new guy, but will NOT say so on the list, and I quote: "Not going to engage with LKM toxicity in any way, got permanently away from that way back in 2002." So I connected them in private email and am very tired of doing that. But I still haven't posted this set to the kernel list myself, so can't exactly blame him?) So a useful concrete thing I can do is grab the accumulated linux-sh patches that have gone by on the list. So I'm giving it a go.

The first problem is gmail is crazy, and only ever keeps ONE copy of a message when I'm sent multiple copies with different headers, which means when I get emails cc'd to linux-kernel and linux-sh I only get one copy and which list-id it is is semi-random. (Usually linux-sh because that server has fewer subscribers so sends my copy out faster, but not reliably.) In each mbox in which I _don't_ get a copy, reply threads get broken, and if I ever wanted to put together a canonical toybox history mbox file (and start a quest chain to eventually insert it into Dreamhost's web archive to fix the gaps) I'd have to check my toybox folder AND my inbox AND my sent mail (because I don't get my OWN messages sent back to me through the list either). But that's not FRESH stupid.

So I've done a search on my linux-kernel folder in thunderbird for messages with linux-sh in the "to or cc" field, which defaulted to searching subfolders but ok. Some of those subfolders are architectures or subsystems I follow (linux-hexagon and such, linux-sh is a separate top level folder in my layout because I checked it regularly), but most of those subfolders are previous years of linux-kernel that I moved messages out to because thunderbird not only melts if you try to open a folder with a few hundred thousand messages in it, but email delivery slows down because the filters appending email TO those large mbox files somehow scale with the size of the mbox file they're appending to, and having linux-kernel as a regular destination gets noticeably slow every 3 months or so, and email fetch CRAWLS after 9 months without reorganization. So I have to periodically do maintenance to keep thunderbird running by moving messages into yearly folders to fight off whatever memory eating N^2 nonsense is in thunderbird's algorithms (a name and an offset in an mbox file doesn't seem like THAT big a struct, but it is in C++). Thunderbird's "click then shift click to highlight a bunch of messages, right click, move to other folder" plumbing ALSO scales badly with lots of messages (the "swap thrash" threshold is somewhere around 40k messages, which is much faster with an SSD but really not good for it, and the actual OOM killer kicks in somewhere in the 60k-90k range. There's something N^2 in their algorithm maybe? Yes the right click menu popping up even with 20k messages selected can take 30 seconds; it's a chore). But again, that's not FRESH stupid.

Thunderbird's search results window presents a list of messages but doesn't let me right click and DO anything to them. (I can double click one at a time to open in a new window, but not what I want here.) Instead I have to create a virtual "search subfolder", which has a pink icon and populates itself slowly as it re-performs the search (of each subfolder) each time you go into it, but otherwise seems to act as a regular folder. Fine. And after it had stopped adding messages, clicking on the last message in the list pegged the CPU for 45 seconds before it showed me its contents. FINE. So eventually I manage to highlight all the messages in the pink folder, right click and get a menu, tell it to "save as"... and the resulting destination pop-up doesn't give the option of making a new mbox file, it wants to save them as individual messages. Ok. So I give it an empty folder to do so in, and...

Here's the FRESH stupid: a thousand empty files show up at once, with no contents yet; 30 seconds later the contents fill out in the filesystem, but I also get a pop-up saying "couldn't save file". Because it tried to open all the files it was writing IN PARALLEL and ran out of filehandles. (Or maybe loop to open them all, loop to write them all, loop to close them all? Why would anyone do that? The default ulimit -n is 1024, the default HARD ulimit is 4096 filehandles per process without requiring root access to increase. Don't DO that.)
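
For reference, the limit it blew through is trivially queryable; something like this prints what one process gets to play with:

  #include <stdio.h>
  #include <sys/resource.h>

  int main(void)
  {
    struct rlimit rl;

    // RLIMIT_NOFILE: how many filehandles one process may have open at once
    if (getrlimit(RLIMIT_NOFILE, &rl)) return 1;
    printf("soft limit %lu, hard limit %lu\n", (unsigned long)rl.rlim_cur,
      (unsigned long)rl.rlim_max);

    return 0;
  }

Writing a thousand files wants open/write/close one at a time, not a thousand simultaneously open filehandles.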

Remember how I said Mozilla was not a real open source development organization? They are BAD AT THIS. So is the Free Software Foundation, the Linux Foundation, and even Red Hat. Capitalism mixes badly with open source development, even when it's a nominal foundation claiming to shield developers from capitalism. Red Hat inflicted systemd on the world for profit (we're not allowed to opt out), the FSF zealots became as bad as what they fought, and the Linux Foundation and Mozilla did that weird 501c6 trade association thing (a for-profit nonprofit tax dodge) where they're endlessly fundraising to provide exclusive members-only benefits.


January 13, 2023

Fade is on the airplane (as is This Dog).

Sigh, I do not have time for kernel shenanigans, but I guess I need to make time. Grrr. (There's a maintained debian port, Hellwig. Stop it. I didn't even post the bc removal patch to the list! I should do so, every release from here on...)

And Wednesday's question has been answered: the reason the middleman couldn't manage to pay my invoice THIS time is because "our policy is to pay invoices in arrears rather than in advance". Which is news to me because I was previously doing quarterly invoices (not risking TOO much of the money to a single transaction) and they paid Q4 in October. This time I invoiced for 2 quarters at once (hoping not to go through a multi-week debugging/negotiation process QUITE as often), and... Sigh. (They have literally one job. This is the fourth time it has not gone smoothly.)

The Executive Director of the middleman went on to suggest "if there is a strong reason for invoicing in advance, please do let us know, and we may be able to make specific arrangements for this — such a binding you to an agreement as a consultant."

I replied:

I invoiced for 2 quarters this time largely because each of the previous 3 invoices had some sort of multi-week issue. I honestly did not expect this one to go through smoothly either, but was at least hoping to deal with the problem less often. (I left a good chunk of the sponsorship money in there because I'm still not ENTIRELY convinced it won't vanish in transit again and maybe not come back this time.)

Now that I know the fourth roadblock is bureaucracy, let me know when and how much I'm allowed to invoice for to conform with your procedures, and I'll do that then. I'm assuming invoicing for Q1 would still be paying me in advance, so... March? (In previous quarters I got paid for the quarter we were in, but now that I'm aware of "policy" I'm assuming that no longer works either. Can I invoice for Q1 now and get paid March 1, or do I have to wait to submit and approve the invoice?)

As for whether I'm a flight risk, I've been working on https://github.com/landley/toybox/commits/master for 16 years (ala https://github.com/landley/toybox/commit/13bab2f09e91) which is longer than github's existed (hence https://landley.net/hg/toybox). Every commit in that repo was applied to the tree by me, and I personally authored... grep says 3642 of them. Even the current mailing list goes back to 2011 (http://lists.landley.net/pipermail/toybox-landley.net/) and dreamhost is terrible at mailing lists (https://landley.net/dreamhost.txt and https://landley.net/dreamhost2.txt and no I don't know where the threading info went back at http://lists.landley.net/pipermail/toybox-landley.net/2013-August/thread.html but after https://landley.net/toybox/#12-21-2015 and https://landley.net/toybox/#23-07-2022 and such I'm not asking).

Most of that time toybox was a hobby project I did around various day jobs (https://landley.net/resume.html). Google decided to use toybox in late 2014 and I kept working on it as a hobby for another 7 years. I am very grateful to them sponsoring me to focus on this project, and have said so publicly multiple times including in the release notes (https://landley.net/toybox/#12-08-2022). Disclosure: before the sponsorship I did get the Google Open Source Award twice, which came with a $200 gift card each time.

I suppose I could always get hit by a bus or have a stroke or something, but I'm not sure how signing a contract with you would affect that?

How WOULD the middleman perform oversight? Do they have any idea what success looks like? The only other guy who gets cc'd on this sort of thing is Elliott, and even I can't reliably find stuff like that again 6 months later. (Would they assign somebody to read my blog? Would that HELP?) Eh, KLOCs I suppose. Judge the value of a car by the weight of metal added to its construction...

While trying to google for a link writing the above, I noticed that lists.landley.net is no longer visible via google at all, and traced it to Dreamhost adding a robots.txt blocking... everything. I didn't change anything, and don't have ACCESS to change anything (remember: it's a shared server and they don't let me log in directly, everything happens through a web panel). I have opened a support ticket.

Oh goddess:

Subject: The robots.txt you put on lists.landley.net (which you won't let me log into) blocks google.

Hello Rob,

Thank you for contacting the DreamHost support team, I'm sorry you're having this issue, I will be happy to help. After checking your site under landley.net, I was not able to find the robots.txt you've mentioned, so to check the rules and offer you solutions. Have you deleted the file to prevent blocking Google crawling your site?

Please, have a look at our article on how to create a robots.txt file that is convenient for you https://help.dreamhost.com/hc/en-us/articles/216105077

I hope this troubleshooting and information was useful for you. Please, don't hesitate to contact back the support team in case you need it.

They didn't even read the TITLE of my support request, did they?

My reply:

> After checking your site under landley.net, I was not able to find the robots.txt you've mentioned,

Because lists.landley.net is not the same web server as landley.net. Your mailing lists run on a different (shared) server which I don't have direct access to, and which I can only interact with through your web panel.

Your server has been mildly broken for years, such as refusing to give a list of available mailing lists under https://lists.landley.net (which has been "temporarily disabled" for over a decade).

But sometime in the past year or so the robots.txt on lists.landley.net (which is not landley.net) changed, so that:

https://www.google.com/search?q=site%3Alists.landley.net

Says "no information available on this page", and when I click "learn why" under that it goes to:

https://support.google.com/webmasters/answer/7489871?hl=en#zippy=%2Cthis-is-my-site%2Cthe-page-is-blocked-by-robotstxt

> Have you deleted the file to prevent blocking Google crawling your site?

I would love to get access to lists.landley.net to fix stuff there, but the lack of that has been a persistent issue dealing with you for some years now:

https://landley.net/dreamhost.txt
https://landley.net/dreamhost2.txt
https://landley.net/toybox/#23-07-2022

I haven't even bothered to ask where the thread information for older months went:

http://lists.landley.net/pipermail/toybox-landley.net/2013-August/thread.html

(It used to be able to indent those, but not anymore.) But far and away the BIGGEST problem with lists.landley.net is you can't access it via https but only http, which means mailing list administration sends a plaintext password across the internet for every page load. (Because the Let's Encrypt certificate for landley.net isn't available to the shared lists.landley.net server.)

> Please, have a look at our article on how to create a robots.txt file that is convenient for you https://help.dreamhost.com/hc/en-us/articles/216105077

I know what a robots.txt file is. But I do not have access to change any of the files at https://lists.landley.net. I can only ssh into the server that provides landley.net (ala www.landley.net) because the different domain name resolves to a different host.

> I hope this troubleshooting and information was useful for you.

Not really. Here is the issue:

https://landley.net/robots.txt - 404 error
https://lists.landley.net/robots.txt - ERR_CONNECTION_REFUSED
http://lists.landley.net/robots.txt

  User-agent: *
  Disallow: /

Meaning Google cannot index the site. It USED to index the site, but it stopped sometime during the pandemic, because of you.

I hate having to explain people's own systems to them. It's embarrassing for both of us. I also dislike having to reenact the Dead Parrot Skit. ("If you want to get anything done in this country you've got to complain until you're blue in the mouth.") I feel there should be a better way, but I'm not good enough to find it.

Walked to the table for the first time in a while. (I was hanging out with Fade in the evenings, and mostly staying on a day schedule with her.) Pulled up the list of pending shell items... and then spent the evening editing old blog entries since new year's.

My blog plumbing (such as it is) has a slight year wrapping issue: I switch to a new filename for 2023. The rss feed generator takes the html file as input and emits the most recent 30 entries in rss format, using the stylized start of new day html lines to split entries. Which means if the December entries aren't appended to the new year's file they'll prematurely vanish from the RSS feed, but when I DO append them I keep forgetting to delete them and I think some previous years might STILL have the previous december at the end?

There's always a temptation to cheat and not edit/publish January for a week or two, so that when the RSS feed updates everybody's had plenty of time to notice the old stuff, and anybody new checking it won't see a questionably short list. Not that I need MORE incentive to procrastinate about a large time sink...


January 12, 2023

Fade flies home tomorrow, mostly spending time with her.

The ls.c stuff (fallout from this) is harder than it needs to be because ls --sort has a lot of long options (and I added a couple more). The current help text looks like:

-c  ctime      -r  reverse  -S  size     -t  time   -u  atime   -U  none
-X  extension  -?  nocase   -!  dirfirst

And I went "well of course that should be comma separated values with fallback sorts, just like sort itself does!" and that's... tricksy. Each of those can be specified as a short option (which doesn't save the order it saw them in), and you can presumably mix short and long options, and I dowanna re-parse the list each time because that feels slow but I don't have a good format to put it in?

Eh, data format's not hard: array of char that's the offset of the flag value for the sort type. Break the comparison out into its own function and feed it either toys.optflags or 1<<sort[i] in a loop. If I ensure flag 0 isn't interesting (it's currently -w, not a sort option) then it's even a null terminated string (of mostly low ascii values, but still). But the design and user interface parts are still funky: the longopts would accumulate as fallback sorts and the single character sort flags should switch each other off? No, it's more complicated than that: you can do ls -ltr which is reversed mtime, so they DO chord at least sometimes... Actually, "reverse" is specifically weird. Sticky. It should ALWAYS go last because otherwise it has no effect.
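
Something like this, where the key list is a string of small integers and the compare function walks it until a key disagrees (enum values and names invented for the sketch, not ls.c's):

  #include <stdio.h>
  #include <string.h>

  enum { SORT_NAME = 1, SORT_SIZE, SORT_TIME };  // offset 0 stays boring

  struct entry {
    char *name;
    long size, mtime;
  };

  static char sortkeys[8];  // e.g. {SORT_SIZE, SORT_NAME, 0} for size,name
  static int reverse;       // sticky: applied once, at the end

  static int compare(struct entry *a, struct entry *b)
  {
    int i, result = 0;

    for (i = 0; sortkeys[i]; i++) {
      if (sortkeys[i] == SORT_NAME) result = strcmp(a->name, b->name);
      else if (sortkeys[i] == SORT_SIZE)       // biggest first
        result = (a->size < b->size) - (a->size > b->size);
      else if (sortkeys[i] == SORT_TIME)       // newest first
        result = (a->mtime < b->mtime) - (a->mtime > b->mtime);
      if (result) break;  // first key that differs wins, the rest are fallback
    }

    return reverse ? -result : result;
  }

  int main(void)
  {
    struct entry a = {"apple", 100, 5}, b = {"banana", 100, 9};

    sortkeys[0] = SORT_SIZE;
    sortkeys[1] = SORT_NAME;
    printf("%d\n", compare(&a, &b));  // sizes tie, name fallback: negative

    return 0;
  }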

No, the really WEIRD chording is -cut with or without -l. (I was working on this before, I know I was, and I document COMPULSIVELY, but it's not in the blog. Is it on the mailing list? One of the myriad github pages? A commit comment? Did I make the mistake of typing it at someone in IRC and setting the little mental "it's been written up!" flag? Who knows...)

(Once upon a time the #uclibc channel on freenode, where all the busybox developers hung out back in the day, was logged to a web page, which I believe went down when the guy hosting it went through a bad divorce, and in any case freenode got bought by a Korean billionaire who did to it what the muskrat is doing to twitter. I still sometimes think "written in irc" means "I can find it again later", but have mostly trained myself back out of that these days.)

Anyway, the issue is that the ls man page (and behavior) is nuts:

-c     with -lt: sort by, and show, ctime (time of last modification of
       file status information); with -l: show ctime and sort by  name;
       otherwise: sort by ctime, newest first

So -l disables -c's sorting behavior and you add -t to get it back. Same for -u. That's horrible historical nonsense and I need to make it work, but where do --sort ctime and --sort atime fit into this mess?
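
My current reading of that man page paragraph as code, so I have something concrete to test --sort=ctime against (hypothetical names, sketch only):

  #include <stdio.h>

  enum timefield { MTIME, CTIME };
  enum sortkey { BY_NAME, BY_TIME };

  // Decode the -c chording from the man page text above: with -lt sort by
  // and show ctime, with -l show ctime but sort by name, alone sort by ctime.
  static void decode_c(int l, int t, int c, enum timefield *show,
    enum sortkey *sort)
  {
    *show = MTIME;
    *sort = BY_NAME;
    if (c) {
      *show = CTIME;
      *sort = (l && !t) ? BY_NAME : BY_TIME;
    } else if (t) *sort = BY_TIME;
  }

  int main(void)
  {
    enum timefield show;
    enum sortkey sort;

    decode_c(1, 0, 1, &show, &sort);  // ls -lc: shows ctime, sorts by name
    printf("-lc: show %d sort %d\n", show, sort);
    decode_c(1, 1, 1, &show, &sort);  // ls -ltc: shows AND sorts by ctime
    printf("-ltc: show %d sort %d\n", show, sort);

    return 0;
  }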

As always "how to implement" is easy and "what to implement" is hard.

Sigh: [-Cxm1][-Cxml][-Cxmo][-Cxmg][-cu][-ftS][-HL][-Nqb] is tangly. But lib/args.c hasn't got a [-Cxm[1log]] syntax and there haven't been other callers for it.


January 11, 2023

I have invoiced the middleman! Let's see how it fails to work THIS time.

Staring at the bugs shell fuzzing found. And ls.c. And the shell "read" command. And the shell "command" command, because command -v is where scripts/portability.sh barfs trying to run the test suite under toysh.

Not really making a lot of progress on any of them, but looking. Oh, and I should read that git format documentation...


January 10, 2023

Finally cut a toybox release. And then updated the date in news.html to the correct day AFTER cutting the release. (The tarball and tagged commit still say the 8th. Always something.)


January 9, 2023

I have reached the "revert and rip stuff out" stage of release prep: if it's not ready, punt it to later.

Disabled the date test in "make tests" again: not shipping a fresh toolchain this time. Put bc and gcc back in the airlock because not demanding people build a patched linux this time either. More FAQ updates. I can't get the ls.c work in this time...

Right wing loons are flipping out about a possible ban on gas stoves, which means Fuzzy has been involved in an argument online where somebody insisted it was impossible to make proper custard on induction, so we have a big pot of custard now. It's lovely. Peejee has a want. (Cat, YOU may be spry and feisty but your kidneys are 19.)

Peejee had custard.


January 8, 2023

Found a problem with "make sh_tests" where some of the early tests weren't testing the right shell. There's a context switch before which you can do "sh -c 'test' arguments", and then it switches to having all the tests run through sh -c _for_ you, to ensure it's all being tested in toysh rather than being "eval" under whichever shell the test suite is itself running in. (On Debian, bash. In Android's case, mksh.) You can manually wrap tests before that yourself, but I found a set of tests that were before the switch but weren't wrapped, and moved them after the switch so they happen in the proper context... and some fail. Now I've gotta fix unrelated stuff before running the test suite gets me back to seeing the failures I was debugging before. Progress of a sort, I suppose. But it puts us firmly into "punt this whole mess until AFTER next release" territory.

Oh hey, right after I tried to pivot AWAY from working on toysh, Eric Roshan-Eisner ran a fuzzer on toysh and found several ways to segfault it. Fixed some low hanging fruit, punting the rest until (say it with me) after the release.


January 7, 2023

I kiiiinda wanted to get "make test_sh" passing its test suite this release. Not happening, but I have multiple dirty sh.c forks I'm trying to check in and delete. The release is the time to finish unfinished things and clean up what you can.

The next "make test_sh" failure was a simple fix [editorial note: while trying to add the link to the blog I realized I'd checked it in to the WRONG TREE: pushed now] but the next thing that test tried to do is call the shell builtin "read"... which I haven't implemented yet. Taking a stab at it now, but there's a design problem: it does line reads but lets read -u substitute a different file descriptor. Hmmm...

Strings are hard, and that includes efficiently reading lines of input. This is why I had get_line() all those years: byte-at-a-time is slow and CPU intensive, but larger reads inevitably overshoot, and you can't ungetc() to an fd. (Well you _can_ but only to a seekable fd, which does not include pipes or tty devices.) This is why the ANSI/ISO C committee invented the FILE * back in the 1980s: somewhere to store a buffer of data you block read and save the extra for next time. But shells don't USE file pointers, they use file descriptors, both for redirect and when spawning child processes.

This isn't AS bad because pipe and tty devices return short reads with the data they've got, so when a typing human is providing input MOST of the time the computer will respond to the enter key before you press the next key. And piping between programs, each printf() turns into a separate write() system call which sends a batch of data through the pipe, and if the read() at the far end receives that data before more gets sent (and concatenated in the pipe buffer) then it hasn't read ahead part of the next line there either. But if you DO type fast (or something like "zcat file.gz | while read i" happens) then the read gets extra characters that go on the next line, but the read returns and the next read happens only knowing the file descriptor. (If you're wondering why you see "echo $X && sleep .1 && echo $Y && sleep .1" in some places in the test suite... generally this sort of thing. Even that's not ENTIRELY deterministic if the system's heavily loaded enough.)

This same problem would also screw up trying to provide input to a command, such as echo 'fdisk blah.img\nn\np\n1\n\n\nw' | sh because the FILE * stdin used to read the fdisk line will read ahead to the rest of the data in the input block, which is then NOT provided by file descriptor 0 to the fdisk child process, because it was already delivered and is waiting in an unrelated buffer. (I bothered the Posix and C99 guys about querying how many bytes of data are waiting in the buffer so I could fread() them back OUT and pass them along, about like my tar.c does when autodetecting compression type from pipe input. You read data and then need to DO something with it, you can't go back into the file descriptor.)

(If you COULD unget data into a read-only file descriptor, that would be a security issue. Unix semantics generally do make sense when you dig far enough, because everybody's had 50 years to complain and usually would have fixed it by now if it was actually wrong.)

All this reminds me I'm ALREADY mixing FILE * and stdin in sh.c because get_next_line() takes a FILE * argument, but that's always been a placeholder function. I need to implement interactive line editing and command history, which I should tackle soon but it's not a _small_ can of worms to open and I want to get the shell more load bearing before putting a big neon "welcome" sign out front. But the interactive stuff should use the input batching trick I introduced to busybox vi years ago to figure out what is and isn't an ANSI escape sequence, which I already implemented in lib/tty.c scan_key_getsize() and which is an even STRONGER reliance on input batching. (It would be lovely if I could fcntl() a pipe to NOT collate data written to it in its buffers, but this seems to be another "optimization" I can't turn off. It would also make testing dd SO much easier if I could do that...) Anyway, scan_key_getsize() is always doing 1 byte reads to assemble the sequence from a file descriptor without overshooting, because "interactive keyboard stuff" really should not be a big CPU/battery drain on modern systems. (He says knowing that "top" is a giant cpu hog that really needs some sustained frowning to try to be less expensive. I dunno if it's all the /proc data parsing or the display utf8 fontmetrics or what, but something in there needs more work.)
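The batching trick itself boils down to something like this (an illustration of the idea, not the lib/tty.c code, and the 50ms timeout is an arbitrary choice):

    // Tell a bare ESC keypress apart from an ANSI escape sequence by
    // checking whether more bytes arrived in the same input batch.
    // Assumes the terminal is already in raw mode.
    #include <poll.h>
    #include <unistd.h>

    int scan_esc(int fd)
    {
        struct pollfd pfd = {.fd = fd, .events = POLLIN};
        char c = 27;

        // A terminal sends the whole sequence in one write(), but a
        // human can't type ESC [ A fast enough to land in one batch.
        if (poll(&pfd, 1, 50) < 1) return 27;  // nothing pending: lone ESC

        while (read(fd, &c, 1) == 1) {         // 1 byte reads: no overshoot
            if (c == '[' || c == 'O' || (c >= '0' && c <= ';')) continue;
            break;                             // final byte, e.g. 'A' = up arrow
        }

        return c;  // caller maps the final byte to a key
    }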

I suppose I could try to lseek() on the input, and do block reads when I can and single byte reads if I can't? The problem is the slow path is the common case. I don't want zcat file | while read i to be an unnecessarily slow CPU hog, and the FAST way is using getline() through a FILE * (or writing my own equivalent; generally not an improvement). Which doesn't work for -u, and if I wrapped that in a FILE * where would I save the cache struct between "read" command calls? How do I know when it's done and I can free it? Can of worms. Redirect and FILE *stdin aren't going to play nice together, but what's the BETTER way?
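The "lseek() when you can" hybrid would look roughly like this (sketch again, with the byte-at-a-time fd_getline() from above as the fallback):

    // Block-read a line from a seekable fd, then seek back over
    // whatever we grabbed past the newline. Pipes and ttys fail the
    // lseek() probe (ESPIPE) and take the slow path instead.
    #include <unistd.h>
    #include <string.h>
    #include <stdlib.h>

    char *fd_getline(int fd);  // byte-at-a-time fallback from earlier

    char *fd_getline_fast(int fd)
    {
        char buf[4096], *nl;
        ssize_t n;

        if (lseek(fd, 0, SEEK_CUR) == -1) return fd_getline(fd);
        if ((n = read(fd, buf, sizeof(buf)-1)) < 1) return 0;
        buf[n] = 0;

        // A real version would loop on lines longer than the buffer
        // and cope with embedded NUL bytes; this one just demonstrates
        // handing back the overshoot.
        if ((nl = strchr(buf, '\n'))) {
            lseek(fd, (nl+1-buf)-n, SEEK_CUR);  // rewind the extra bytes
            nl[1] = 0;
        }

        return strdup(buf);
    }

But as noted above, the slow path is the common case.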

Sigh, I'm not entirely sure what the corner cases here ARE. Coming up with test cases demonstrating it causing problems is a headache all its own. And I'm pretty sure bash suffers from some of those corner cases too, and its answer is 1/10th second sleeps.

I don't WANT two codepaths, one for stdin and one for -u other than 0. That's just creepy.


January 6, 2023

Elliott didn't like xmemcmp() so it's smemcmp() now. (Yeah, I know he's right, just... trying to get a release out!) Bugfix from somebody at Google for sh.c (yay, people are looking at it). FAQ update...

The new dishwasher is not arriving today, supply chains are failing to supply, or possibly chain. New estimate is the 20th. Fuzzy is very tired of doing dishes by hand, so we have purchased paper plates, cups, bowls, and plastic utensils.


January 5, 2023

Day 2 of Actual Release Prep, which _seems_ like it's mostly just creating a big page of release notes but is actually "go through each commit since the last release and re-examine them", which is second pass review (more design than code per se) and a MASSIVE tangent generator. It always winds up taking multiple days to actually get a release out, and that's AFTER I've done a full mkroot build-and-test recently on the kernel version I plan to release with, using the toolchain I plan to release with. (I.E. no blocker bugs.)

The toolchain issue's a bit wonky this time, because llvm version skew broke my ability to rebuild the hexagon toolchain with the script that worked last time, but I need to rebuild musl to include the patch for the issue Rich refused to fix, which otherwise breaks the date test that passes on glibc and bionic but not musl. (I enabled the date and gunzip tests in "make tests" because they worked fine on glibc and bionic, and only after I'd checked that in did I realize date had been disabled because the musl bug was a pending todo item.)

It's a one line fix to musl, but Rich won't do it because Purity or something, and after many years fruitlessly arguing with Rich I just put the simpler workarounds in my code, and otherwise patch musl in the toolchains I build. I wave a patch at Rich once so it's not MY fault when he says no, same general theory as linux-kernel. (Except there I tend to do second, third, even fourth versions when I get engagement. Less so when they're ignored.)

Speaking of which, my musl patches are still inline in scripts/mcm-buildall.sh but I've moved the kernel patches from scripts/mkroot.sh out to a separate repository. I should philosophically collate my patch design approach at some point, but I'm not holding up the release for it now.

In general broken out patches are better in case other projects want to pick them up. Squashfs was widely used out of tree for many years before lkml deigned to notice, and some of the Android stuff still is I think? Back on Aboriginal I had a patches dir with linux-*.patch in it. For this release I'm probably putting 0001-*.patch files for the kernel in the mkroot binaries release dir because "apply these to your arbitrary kernel version" seems easier than collating two otherwise unrelated kernel trees via git. But how much of that is what I'm used to vs what other people are used to? (I mean I HAVE a branch published on github, but I have to redo it each release I do a musl thingy on and then can't delete the old ones if they're load bearing, which is non-collated cruft I dowanna encourage/accumulate. "Probably gonna delete this later" is not a good ongoing policy.)


January 4, 2023

Rant cut and pasted from the list to the blog:

I'm not a fan of over-optimizing compilers. My Commodore 64 had a single 1 MHz 8-bit processor with 38911 basic bytes free, and it was usable. I'm typing this on a quad-core 2.7 GHz 64-bit laptop with 16 gigabytes of ram, and this thing is completely obsolete (as in this model was discontinued 9 years ago: they were surplussed cheap and I bought four of them to have spares).

Performance improvements have come almost entirely from the hardware, not the compiler. The fanciest compiler in the world today, targeting a vintage Compaq Deskpro 386, would lose handily to tinycc on a first generation raspberry pi. Hardware doubled performance roughly annually (cpu was 18 months but memory and storage and stuff increased in parallel) and each major compiler rewrite would be what, 3% faster? The hardware upgrades seldom broke software (rowhammer and spectre/meltdown meant the hardware didn't work as advertised, but that's an obvious bug we worked around, not "everything intentionally works different now, adjust your programs"). Every major gcc upgrade had some package it wouldn't build right anymore, and the gcc devs said we shouldn't EXPECT it to.

Part of this attitude is fallout from the compiler guys back around 1990 making such a big deal about the move from CISC to RISC needing instruction reordering and populating branch delay slots and only their HEROIC EFFORTS could make proper use of that hardware... and then we mostly stayed on CISC anyway (yes including arm) and the chips grew an instruction translation pipeline with reordering and branch prediction.

I'm aware this is a minority view and I'm "behind the times", but if I wanted the tools to be more clever than necessary I'd be up in javascript land writing code that runs in web browsers, or similar.

This difference of viewpoint between myself and people maintaining compilers in C++ keeps cropping up, and I have yet to see a convincing argument in favor of their side. They're going to break it ANYWAY.

I'm currently editing the December 28 blog entry about tidying up the html help text generator, and I realized a corner case I hadn't handled: nbd-client says "see nbd_client" which doesn't exist. (Public dash version vs private underscore version because C symbol generation.) Sigh. Ok, fix the help text generator AGAIN...

I keep nibbling at the release, but... time to start writing release notes. Ok, git log 0.8.8..HEAD and... there have been a few commits, haven't there? Lot to go through. But first, the hardest part: picking a Hitchhiker's quote I haven't already used.


January 3, 2023

Working to make "ASAN=1 make test_sh" pass, which is whack-a-mole. The address sanitizer's a net positive, but it's a close thing at times. (Gimme a stack trace of where the problem OCCURRED, not just where the closest hunk of memory was historically allocated.)

Refrigerator dude stopped by and vacuumed the refrigerator coils, which were COVERED with cat hair. Showed me how to do it myself, not that I own a vacuum cleaner. (Tile floors. Big flood back in 2014. The only carpet in the house these days is throw rugs we take outside and do a sort of bullfighter thing with.)

The outsourced washing machine guy called and said the symptoms I'm seeing on the model I have mean the circuit board's almost certainly fried, probably had water leak onto it, which with labor is something like $800 to replace (Bosch is reliable, but not repairable), and getting a new dishwasher of the same model and having the old one hauled away is basically the same price, so there's not much point him coming out to look at it and billing us for not being able to help. (Professional repair dude, no shortage of work.) Thanked him, and Fade ordered a new one from the people we got the replacement washer and dryer through; they think it'll be here Friday.

Yes, I am aware I did the refrigerator thing because "they're already coming" and then it was two separate servicebeings. There was meta-upsell there, apparently. Unlikely to use Sears again, which is convenient since they only barely seem to still exist as a kind of temp agency.


January 2, 2023

Finishing up pending toysh fixes. I was redoing math stuff and it's always more work to figure out what I was doing when I leave myself half-finished commits in the tree. I can see what the code there is doing, but have to work out what I MEANT it to do and examine how much I got done to figure out what I left out. The design work is as always the tricksy part: is there anything I didn't think of at the time, or thought of but didn't implement, or thought of then but am not remembering now? (There's no such thing as sufficiently copious design notes that do NOT include a finished implementation. Not that the implementation by itself is always sufficient either, which is why there's code comments and commit comments and blog entries...)

Not gonna manage to merge expr.c in with $((math)) this release, and I'm not 100% sure it's doable (well, worth doing) at all. And the big missing piece seems to be a floating point version of this same code. Python seems to do arbitrary precision math: 1<<256 and 1/1.7 resolve but 1.7<<256 is an error. Multiple codepaths!

Eh, punt for now. Close the tab, move on to the next...

The dishwasher died. As in the power button does nothing, acts like it's not plugged in, but the outlet works when we plug in other stuff? (RIGHT at New Year's. IS there such a thing as a Y2023 bug?)

Despite Sears having died years ago, Google Maps has a number for "sears appliance services" in Hancock Center. (In the middle of the parking lot on the map.) And when I called it I got... what's probably an Indian call center, but sure. They have our file from when we bought the thing. And want $150 for a service technician to come look at it. Hmmm...

I'm not entirely sure how they upsold me on having my refrigerator serviced, but it was just an extra $50 and nobody's looked at it in ~5 years and I'd rather it didn't go out, so sure. Why not. (Fade thought we might as well, anyway. Long as the dude's already making the trip...)


January 1, 2023

Happy new year! The first in a while that isn't "the year of hindsight", "last year won", or "also that year". We are finally "2020 free", or at least experiencing a diet version thereof.

Grrr. The recent xmemcmp() changes in file.c left me with an open tab where I WANT to replace a bunch of memcmp(x, "string", 8) with #define MEMCMP(x, y) xmemcmp(x, y, sizeof(y)) so you don't need to specify the length, but unfortunately it won't quite work. Yes, sizeof() treats a string constant as an array and thus gives you the allocation size, including the NUL terminator. But some of the comparisons in file.c are checking that terminator, and some aren't. Having two #defines for the two different cases pushes this out of net-positive mental complexity savings territory: subtle enough that the NEW thing becomes a sharp edge you can cut yourself on. The other way is redundant/tedious but very explicit.
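To spell out the two cases (hypothetical macros, not anything checked in):

    // sizeof() on a string literal counts the trailing NUL, and
    // sometimes you want that byte compared, sometimes you don't.
    #include <stdio.h>
    #include <string.h>

    #define MEMCMPZ(x, y) memcmp(x, y, sizeof(y))     // includes the NUL
    #define MEMCMPS(x, y) memcmp(x, y, sizeof(y)-1)   // string bytes only

    int main(void)
    {
        char buf[8] = "BBCDE";  // starts with "BBCD", no NUL after the D

        printf("%zu %zu\n", sizeof("BBCD"), strlen("BBCD"));  // 5 4
        // Prefix match succeeds, terminator match doesn't: prints 0 1
        printf("%d %d\n", !MEMCMPZ(buf, "BBCD"), !MEMCMPS(buf, "BBCD"));

        return 0;
    }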

While reviewing them I did find a memcmp(s+28, "BBCD\0", 5) so once again no review is ever COMPLETELY wasted...

Maybe I should rename _mkroot_ to "dorodango"...


Back to 2022