Rob's Blog


September 12, 2023

Snuck a peek at busybox tsort.c to see if I got it right, and I actively don't want to KNOW what xrealloc_vector_helper() is doing. And I've forgotten what FAST_FUNC is. And they seem to be using some kind of tree structure instead of just an array? Nope. I'll stick with my naive, uninformed, possibly inefficient implementation that isn't named after anybody. Send me test cases that break if you care that much...

Sigh, my tsort is outputting stuff in a different order than debian's and busybox's tsort. It's not WRONG: topological sort is not unique, you just need something that satisfies the constraints. But the main difference is I'm peeling out circular entries first and printing orphaned second entries when I print (and discard) the unpinned first entry, so "a b c d f f d e" through debian or busybox gives "a c f b d e" but through mine gives "f a b c d e", which isn't WRONG. (a is before b, c is before d, d is before e, and f can go anywhere). If I add a constraint so f _can't_ go anywhere (a b c d f f d e d f) you get "a b c d e f" from mine and "a c b d f e" from the other two. Again: neither is WRONG.

But the inconsistency does make testing harder. If my test and TEST_HOST do not agree on the result, what is a "right answer" to compare against in the test? I'm not testing for canned results, I'm testing for reproducible correct answers that I can hopefully get from other implementations. This makes writing tests WAY HARDER than just "I eyeballed this as good once, make sure it didn't change". Nobody said my eyeballing was correct! "Other implementation also did it" is much more reassuring. At least we're CONSISTENTLY wrong...

Hmmm, do I really need to peel out the circular entries first? It smells like I shouldn't, but it's a special case. Nodes that depend on THEMSELVES don't count as a cycle, but when there are multiple pairs that depend on the same string the binary search may return a pair other than this one as the answer to "find somebody who depends on this". So the depends-on-self pair needs to be removed when encountered, THEN we check if any of the remaining pairs depend on it. Hmmm. Maybe with careful ordering... The thing is, all the depends on self pairs should be removed in the first pass. Whether or not they're printed then is a separate issue, but if they're NOT printed when removed then something else depends on them and THAT pair is responsible for printing this string when enough dependencies are satisfied that it can eventually be removed. I hate having the strcmp(a, b) as part of EVERY pass through the list. It can only trigger on the FIRST pass through the list.

Alright, I can make the initial collection do the strcmp(), and then set pair[0] = pair[1] so I can do (cheaper) pointer comparisons instead of strcmp(). Doesn't matter from an allocation perspective, this is readfd() doing one big malloc() to hold the input data (actually a realloc() loop but details: one big heap allocation) and then a second malloc() holding the pairs[] pointer array. We read the data into memory, do a pass over it to count words, malloc the pairs list, do a second pass over the data to fill out pairs[] and null terminate all that whitespace (newline or space doesn't matter, it separated strings and now terminates strings, readfd() automatically adds a single null terminator at the end of the file it read in, properly allocated and initialized but not included in the returned length)... anyway, I can throw an if ((len&1) && !strcmp(pair[len], pair[len-1])) pair[len-1] = pair[len]; in that second pass and that means the strcmp() only happens on one pass through the list, not every time. (Well it was bothering me.)
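
In code, the collection pass winds up doing something like this (a sketch with made-up names, not the actual tsort.c: word is the next NUL terminated string, pair[] one flat array of char pointers):

pair[len] = word;
if ((len&1) && !strcmp(pair[len], pair[len-1])) pair[len-1] = pair[len];
len++;

...after which every LATER pass through the list can spot the depends-on-self entries with a cheap pointer comparison, pair[i] == pair[i+1], no strcmp() in sight.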

My algorithm is assembling an array of string pairs sorted by second element. (Busybox did an insertion sort, but I just did a loop and qsort() out of libc.) Then we loop over the pairs and do a bsearch() for the first string in each pair to see if anybody else has it as their second string, meaning that other pair depends on this pair. (Happily bsearch() uses the same sort function as qsort() did, yay code reuse.) If something depends on this pair, leave it alone and continue the loop. If we get to the end of the loop without finding any loose ends, what's left is one or more unprintable loops, and we error out on the circular dependency, printing the first loop if I'm feeling fancy. (I should implement that. Right now it's just error_msg("loop containing %s\n", pair[0][0]) and let the user figure it out from there.)
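
The sort/search plumbing is just the stock libc calls; a sketch (made-up names again, same flat char *pair[] array holding the strings two at a time) of the comparison function doing double duty:

// each element is two adjacent char pointers, compare by the SECOND string
static int second_cmp(const void *a, const void *b)
{
  return strcmp(((char *const *)a)[1], ((char *const *)b)[1]);
}

qsort(pair, len, 2*sizeof(char *), second_cmp);
...
char *key[2] = {0, pair[2*i]};  // anybody got our first string as their second?
if (!bsearch(key, pair, len, 2*sizeof(char *), second_cmp)) {
  // nothing depends on this pair: printable
}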

When the bsearch() returns NULL we've found a printable first element, so copy the pair to a local variable char *keep[2]; and remove it from the list, which is just a memmove() and decrement len. We then iterate through the list of saved strings we've already printed this pass through the pair list (to kill duplicates), and if it's not found in there we both print it and add it to the duplicate list so we won't print it AGAIN.

Then, since I've removed this pair from the list and am about to discard it, I bsearch() for the SECOND entry to see if anything depends on that. If we depend on something that nothing depends on, it can go out now too! If something else does depend on it, printing it is the other pair's job so we can just discard it. So if we should print it, do the same duplicate-suppressed print for the second string as the first string.

I put the duplicate suppression list in the space at the end of the pair list that we moved the entries down out of in the earlier memmove(), back when we removed the element from the list and copied it to the keep[2] local variable. That left free space, and the duplicate list can never have more entries than we removed (because that's where they came from). And once we've traversed to the end of the pair list, we can discard the duplicate list because nothing in it can still be in the pair list...
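
Mechanically that's something like (same made-up names, with len0 the pair count at the start of this pass and nseen counting strings printed so far this pass):

char *keep[2] = {pair[2*i], pair[2*i+1]};

memmove(pair+2*i, pair+2*i+2, 2*(--len-i)*sizeof(char *));

// Each removal frees two slots and prints at most two strings (the first
// string plus maybe an orphaned second), so the "already printed this
// pass" list can grow down from the top of the array without colliding
// with the live entries:
for (j = 0; j < nseen; j++) if (!strcmp(pair[2*len0-1-j], keep[0])) break;
if (j == nseen) puts(pair[2*len0 - ++nseen] = keep[0]);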

Nope, that's wrong: echo f a c f | ./toybox tsort | xargs printed c f f a because "f a" couldn't be printed on the first pass (since "c f" depended on it), but when "c f" was yanked nothing depended on f so it went out, and then the duplicate list got cleared before "f a" went out.

Grrr, the duplicate list needs to survive one extra pass through the pair list. That's awkward. (Test cases! So many test cases! Object lifetime rules! The classic saying "The two hard problems in computer science are naming things, cache invalidation, and off by one errors" comes up again: object lifetime tracking is hard even when you're NOT trying to keep two copies of the same data in sync.) Alright, progressive deletion? (For a definition of "deletion" that's just moving an "end" pointer up, but still.)

I miss my 30's. I could keep all this in my head at once and it would still be THERE half an hour later. This is a small enough algorithm I'm not actually having that much trouble with it: the actual tsort.c code minus comments and whitespace is currently 48 lines, 10 of which are opening or closing curly brackets on their own line. (Yeah, "minus comments" knocks out the entire header block with the menuconfig info and help text and such. Add back the NEWTOY() line I guess for 39 load-bearing lines. And I still need to fix the duplicate entry bug I just realized last paragraph.)

But oddly enough, this one is easy to code and REALLY HARD TO EXPLAIN. Or at least the explanation is several times longer than the actual code.


September 11, 2023

Added a ts -m option to append milliseconds to the time (since the darn strptime() escape format doesn't handle fractional seconds because neither time_t nor the broken-down struct tm do) and switched it to fetching time with millitime() and got all the way through until... millitime() returns uptime, not unix time, because that's what clock_gettime(CLOCK_MONOTONIC) returns. Which is _better_ for ts -i and -s but not what ts without those needs. Sigh, EXISTING API (of ts) SUCKS.
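
What the default (no -i/-s) case actually wants is wall clock time, i.e. CLOCK_REALTIME instead of the CLOCK_MONOTONIC under millitime(); a standalone sketch (not the toybox code, and the format string is just ts's usual default) of splicing milliseconds in by hand, since struct tm can't hold them:

#include <stdio.h>
#include <time.h>

int main(void)
{
  struct timespec tv;
  struct tm tm;
  char buf[64];

  clock_gettime(CLOCK_REALTIME, &tv);
  localtime_r(&tv.tv_sec, &tm);
  strftime(buf, sizeof(buf), "%b %d %H:%M:%S", &tm);
  // no fractional seconds field anywhere in struct tm, so do it manually
  printf("%s.%03ld\n", buf, tv.tv_nsec/1000000);

  return 0;
}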

I've been walking to UT and back fairly regularly the past week and change, which adds 10k steps to my day (about 4 miles) and is good for my health... except the Wendy's in Jester Center has finally recovered from the pandemic and is open until 4am again. Heading back home this morning they were still open as I headed out, and I got a 4 for 4. (Which this location still has because they do not offer kids meals. I asked.) Pretty much balancing out the calories of the walk right there. Eh, win some...

Hmmm, I could have the help text decompress into a 128k malloc buffer. Right now defconfig's help text is 82732 bytes and adding all the pending commands brings it up to 108573 bytes. If I move the horrible #ifdef salad into pending.c, and then the rest of the help text doesn't change at all because the decompression works on the same kind of buffer it has now...


September 10, 2023

HEB is selling 3-packs of 32 gig USB sticks for $12. (It rang up as $21 but they corrected it when I showed them the price on the wall, which is conveniently near the self-checkout registers. My habit of twisting the package off the hanger instead of calling an employee to unlock the little thing was not commented on, since I was in the middle of buying it and all.)

My old USB sticks are all ancient and terrible: several are dead and the rest are 1 or 2 gigs, with the occasional 8 gig acting all big. On the one hand 32 gigs is tiny by modern standards, on the other hand it's big enough to be pretty useful.

There's a neon orange one, a neon yellow one, and a neon red one, in transparent cases so you can see the circuit board inside. Gotta do something to stand out from the crowd, I guess... Ooh, interesting. It's formatted with a FAT variant that maxes out at 4.2 gigs instead of 2.1 gigs for an individual file. Um... yay I guess? (Is that how all vfat works or is this that exfat I keep hearing about, or...?) This means if I _do_ get another tiny little USB cube server (or finally get a raspberry pi working), I could run VMs attached to 4 gig ext2 mounted loopback images on USB stick providing scratch space for a mkroot build without worrying about burning out the built-in flash. We've recently established that 2 gig images may not be enough for current gcc, because gnu. I mean yeah, technically I could do it with network block device mounts too, but A) what would the NBD be served _from_ (needing a server that can stay up is the point of the exercise), B) I suspect if the network goes down for an hour, a kernel using NBD mounts might get unhappy in a way that requires physically poking the hardware to reset it, and I want a server I can leave up for weeks and use from another state without worrying about that. Anything can break, but avoiding a known sharp edge is mental load. Don't want to have to worry about it.

Writing to the USB stick drained my laptop's battery noticeably fast. At the start of copying that 4.29 gig file the battery was 97%, at the end it was 94%. The copy took 6 minutes 25 seconds (I ran it under "time"), just over 11 megabytes/second write speed. Eh, that's acceptable.
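
(That's 6 minutes 25 seconds = 385 seconds, and 4.29e9 bytes / 385 seconds ≈ 11.1 million bytes/second.)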


September 9, 2023

Had to look up what it was the qemu loons replaced "-hda block.img" with again (it's "-drive format=raw,file=block.img"). Note that kvm --help (qemu --help was removed) has one instance of "hda" and 17 instances of the word "drive", and the -hda argument takes exactly one argument while -drive takes 31 different comma separated keyword=value options like "detect-zeroes=unmap,iops_rd_max=irm,group=g" (the --help output does not provide further information about what any of those _mean_).
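
(For my own future reference, the before and after, using the one -drive combination quoted above that I know works:

$ kvm -hda block.img
$ kvm -drive format=raw,file=block.img

Same behavior, four times the typing.)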

How did an open source project get this bad? Because bureaucrats took over. Story time!

Thirty-ish years ago Beowulf clusters based on Linux, networking together groups of cheap PCs, started seriously eating into IBM's mainframe market. So IBM retasked a lot of old white men doing cobol on punched cards on big iron to instead do Linux, and they took over "xen" development (an ugly hypervisor technology people like me were trying to avoid at the time), which was rendered completely irrelevant by "KVM" so IBM's guys took over KVM development, which was based on QEMU so they took over QEMU development. The stench of bureaucracy drove away QEMU's creator Fabrice Bellard, so these days QEMU is maintained entirely by punched card wallopers who think any technology that DOESN'T require at least three full-time employees per install is leaving money on the table, so of course it needs a mandatory configuration file written in its own language that requires you to take courses from IBM and get a certification in. (Like JCL.)

Yes my first job out of college was doing OS/2 at IBM, but the Boca Raton facility that had created the PC was an oasis like Xerox Parc (only with less structural protection from the surrounding bureaucracy, a tide pool instead of a walled garden). The Boca people made fun of the Poughkeepsie people even back then, and IBM destroyed the Boca Raton facility a few months after I got there (which is how I wound up being "site consolidated" to Austin), and although the Linux Technology Center they started in 2000 was in the 900 buildings on the east side of Burnet where they'd dumped the OS/2 guys (starting their Linux development with Boca refugees who'd only been mixed back into IBM proper for 5 years), IBM's Big Push Into Linux was really a Sam Palmisano thing and he didn't last.

IBM's 1980s implosion took place under two CEOs (both interchangeable white men named John, one "Opel" and one "Akers"), and then a guy named Lou Gerstner brought it back from the dead starting in 1993 and made the company relevant again for a while. He fell on his sword when the dot-com crash happened, handing off to a guy named Sam Palmisano, who basically inherited a todo list from Gerstner in 2002 and Did Those Things, one of which was "spend $1 billion/year on Linux". (It was one of those "Sun/Microsoft are killing IBM but Linux is killing both of them faster" rock-paper-scissors things.) When Sam reached the end of Lou's roadmap in 2011, he retired. And handed the company off to Ginni Rometty, a bloodless beige accountant from central casting who proceeded to cost-cut the company to death, including eliminating the entire R&D budget. (Robert Cringely, who did the "Triumph of the Nerds" PBS miniseries on computer history, chronicled the fall and even wrote a book about it.) But it turned out Ginni did actually have a plan: all her bean cutter cost-cutting briefly juiced the stock so she could use it as monopoly money to buy another company that DID have a future: Red Hat, I.E. Pointy Hair Linux. She burned old IBM to the ground to buy a company that HAD briefly understood Linux 10 years earlier before ossifying.

Red Hat was one of the first Linux distributions to be run by someone who understood marketing, and during the dot-com boom of the 90's he built it up until it was big enough to have an IPO in the year 2000. And the consultants Red Hat brought in to handle the IPO explained to Red Hat's founders (Robert Young and friends) what Sun Microsystems actually DID for a living: exploit a quirk in really big procurement contracts. When people bid to sell Very Expensive Things to governments or Fortune 500 companies, the contracts have piles of legalese restrictions as the dinosaurs try to protect themselves in a way that winds up costing them even more money. A common stipulation is to cap a vendor's maximum allowed profit at a percentage of the cost of materials... which means the vendor specs the most expensive possible materials. If somebody putting together a system for the U.S. Navy can only mark up the operating system running their new device by 10%, management WILL specify a $5000/seat Solaris license they can make $500 profit on instead of a $29 retail boxed copy of Red Hat that nets them $3, even if the engineers building the system would much rather use Linux than Slowaris. And Red Hat went "Wait, if we find an excuse to charge way way WAY more for the same thing, there's a class of customers that will buy significantly MORE of it?" And they hallucinated up some marketing bullshit to create "Red Hat Enterprise", and the company went from something like $15 million annual revenue to over $100 million just in time for that IPO, and "the tail wagging the dog" situation resolved itself with the company being sucked entirely out of the retail market because all their engineers were spending all their time being 24/7 grape peelers and fan wavers and fluffers at the enterprise side (there's nothing for you to DO but we continually reassure ourselves that you are ON CALL just in case with endless busywork which pays REALLY WELL), which created a market vacuum for "actually usable Linux distro that the open source community actually creating Linux can use", which Ubuntu stepped into around 2004-ish.

Why did it take so long? In January 2001 George "Dubyah" Bush and Dick "Dick" Cheney caused the dot-com implosion, because in the run up to the supreme court overriding the results of the "hanging chad" election they'd lost at the start of November 2000, Dubyah and Darth Cheney were out giving stump speeches about What We Will Do Now That We've Totally Won Of Course It Will Be Us Don't Look Down Just Keep Walking (which all the news stations covered because the unresolved election was THE big story) and the CONTENT of the speeches was all "We're going to give away piles of money to billionaires, explicitly undoing the balanced budget that Clinton left us with and running the national debt up to never before seen heights in the name of creating an oligarch class that can afford to kidnap harems off the street and hunt peasants for sport." And everybody went "but WHY???" And their answer was "Uh... the economy will totally collapse if we don't, giant tax cuts for the rich are the only way to save it." And this was INSANE, nobody else had heard even a whisper of an upcoming recession before then: we'd won the cold war and invented the internet, everything was going GREAT and the big worry was overheating causing inflation because the economy was doing so well there was nobody left to hire. (I had three jobs at once during this time period: day job programming, teaching community college courses at night, and writing a stock market investment column.) But the corporate decision makers heard this "tax cuts because recession" speech repeated over and over for weeks, and went "well if you say so, we'll tighten our belts in the new year's budget to save up a rainy day fund just in case you know something we don't". But they couldn't cut inventory because sales were through the roof and besides they'd already done that (all the just-in-time delivery shenanigans, and things like Coca-Cola spinning out its bottlers as a separate company so the manufacturing and distribution facilities weren't on the books of the company that sold the syrup... that was all 80's and 90's developments; inventory was the denominator in the "cash conversion cycle" figure that investors started looking at when P/E ratios went crazy and nobody understood the businesses they were investing in enough to do proper discounted cash flow analysis). And back then it was common knowledge that cutting R&D spending is highly disruptive introducing huge bubbles in the development pipeline (a one month disruption can cause a two year delay sort of thing; this was back when the USA was still DOING a lot of R&D so we still understood how it worked, the big "outsource all our thinking to india then china" stuff came later)... But there was one expense you could switch on and off like a light: advertising.

So everybody just didn't budget to buy any advertising in Q1 of 2001, to bank up money for the recession Bush and Cheney assured them was coming... And it turned out that magazines and television and websites all had the same revenue model, paid for by advertising. It wasn't a BAD revenue model, multiple companies had survived doing that for centuries... until the entire economy suddenly stopped paying for ads at the same time, and then there was splash damage.

Ms. Magazine was 40 years old and McCall's magazine had been around for a century, but both folded in 2001. 3 of the 5 television networks (ABC, NBC, CBS, FOX, UPN) at the time ended the year in the red, despite cancelling most of their series and replacing them with cheap Reality TV where instead of paying professionals to produce shows, some cameras follow random people around for a while (possibly inside a Spirit Halloween or similar) and then editors cut together a story after the fact like a sculptor with a block of marble. But this collapse was mostly known as "the dot-com bust", because it ended the dot-com boom HARD. The most common website business was "online magazine without the printing and distribution costs" (people wrapped fish in newspaper because buying newspaper is cheaper than buying the blank paper it's printed on, due to the advertising subsidizing the material, and who worried about toxins in the ink back when everyone was breathing tetraethyl lead from gasoline exhaust).

All the websites supported by advertising were just GUTTED in January 2001. That's how Bush and Cheney triggered the dot-com crash, by cratering the advertising market with a self-fulfilling prophecy about recession to justify their tax cuts for billionaires.

1/3 of the dot-com businesses were always doomed, but ordinarily their collapse wasn't synchronized. Another 1/3 needed time to establish themselves, either by growing to a profitable scale (Amazon and Tesla lost lots of money for many years before turning a profit) or "finding themselves" (the way Twitter started via SMS texting but pivoted to web-and-app, or how Flickr started as an online game but pivoted to photo sharing when that's what its users actually spent their time doing with its service, or how Youtube had to survive the lawsuit from Viacom). And the remaining 1/3 of the dot-com businesses already HAD a sustainable business model, but what does that matter when your customers suddenly go away? A friend of mine was on the Board of Directors of VA Linux: 3 of their 5 largest customers went into Chapter 11 in the space of 2 weeks, owing them money for hardware that had been delivered on vendor financed credit paid off a little each month (server payments like car payments). Now those payment claims were tied up in bankruptcy court and VA might some day see pennies on the dollar, years from now if they were lucky. AND those customers were gone and wouldn't be buying any more machines, so future sales were looking dismal. AND that shiny new hardware the customers had been using was all getting auctioned off for pennies on the dollar at bankruptcy liquidation sales, so any surviving customers wouldn't need to buy anything new at retail price for YEARS. That's why VA exited the hardware business, and it wasn't just VA. Dell laid off 17,000 people in Austin in 2001, Intel idled its fabs when it ran out of warehouse space to store chips nobody was buying... the splash damage rippling out into the rest of the economy was brutal.

But the "online magazine" style dot-com companies were at ground zero of George W. Bush's stupidity. I was working for The Motley Fool until November 2000 when the vibe got Really Uncomfortable (management was STRESSED), and it was just no fun anymore, and I handed in my notice effective at the end of the year. The Fool's revenue fell 50% between Q3 2000 and Q1 2001, and they had an all hands meeting (I still got the emails) and laid off 50% of their staff to cut expenses in line with revenue. They were well-managed and had outlets other than the web (newspaper column, radio program, hardcover books...), and thus survived, but it also killed what was unique about them and they became just another stock market investment site...

So backing up: the delay between Red Hat eating Sun Microsystems' business model and Ubuntu stepping into the retail Linux market vacuum was partly due to the dot-com bust. Red Hat pivoted HARD to the enterprise market, and over the next ~15 years Red Hat turned into Pointy Hair Linux, the operating system equivalent of filing everything in triplicate, and that's what Ginni Rometty bought when she zerg rushed IBM at a big acquisition to get a replacement business model.

And THAT is the IBM that took over QEMU development and pushed out all the hobbyists. The IBM made of dead wood that was too expensive to fire (and who didn't head for the exits back when David Niven threw his hat into the fire after Ginni had burned all the furniture and pried up the deck planks heading out into deep ocean as fast as possible), combined with the portions of Red Hat that survived a decade of ISO-9001 certification audit training update preparation meeting pre-meeting scheduling conference catering budget review email reply-all sessions.

And that's why qemu's -hda option is broken, and why it's easier to maintain a local patch than try to argue about it on the list.

Apparently a dude named Arvind Krishna took over Red IBM Hat in 2020. Literally all I know about him is the quote from the statesman article about wanting to replace 30% of his employees with AI. Oh, did I mention that the IBM Austin facility they merged the Boca Raton developers with was itself dismantled? Their hardware manufacturing in Austin (used to design and make PowerPC chips) was sold to build a shopping mall called "The Domain", and what's left on the east side of the road is currently being sold, and towards the end of that article CEO du jour says he looks forward to replacing 30% of IBM's remaining workforce with basically ChatGPT. So that's nice.

Anyway, "-hda file.img" is really simple, so it had to be deprecated in favor of -drive,argument,argument and I am sad.

Open Source projects only avoid being embraced and extended by bureaucracy to the extent a motivated hobbyist is willing to fork them, or to reimplement them from scratch. A project that is good enough to prevent competition is susceptible to frog boiling, and I dunno how to fix that. It's not technical, it's social.


September 8, 2023

I received a "ts" submitted via the list, which is... from something called moreutils? Debian's repository has over 70k packages and yes this is one of them, but... why? Then again, busybox added it in 2019. (Is this an argument _against_ adding tsort support just because busybox added tsort? Sigh...)

The busybox version works on integer seconds, and the timestamps in dmesg show microseconds. I am sad that neither strftime() nor date(1) seem to have caught up to the idea that computers are fast enough to measure fractional seconds these days. No "fraction of a second" field in any of the print commands.

I also wanna specify digits of precision, because milliseconds are plenty for humans and nanoseconds are the best the machine sees. And no the machine's not gonna see MORE than nanoseconds even with a 5ghz clock, the page fetch latency from DRAM is still gonna be dozens of nanoseconds. The closest two lines I can spot in my current dmesg are 3 microseconds (3000 nanoseconds) apart, and those are adjacent printk() statements in the kernel. Back in j-core's "setting our clock from GPS" days we had a thermally stabilized clock in a little electric oven under styrofoam and it would still drift ~8 nanoseconds from the (not humanly perceptible!) breeze any time somebody walked by the desk. The air conditioner turning on, the door closing across the room... (Microsemi sold an atomic clock on a chip, but it cost WAY too much, ate a processor's worth of electricity, and the lead time for ordering one -- nobody actually _stocked_ it, gotta get it from the manufacturer -- was 22 weeks back BEFORE the global pandemic-induced chip shortage. Maybe the tech has improved since, but there was basically no demand for it at the time.)

So yeah, milliseconds and nanoseconds are useful, so of course somebody added a whole "microseconds" ecosystem which is neither fish nor fowl. Why did they do that... oh hey, historical explanation. Still a bad thing.


September 7, 2023

Sigh. There's a long thread ongoing on the coreutils mailing list about posix creating an unnecessary alias for the printf command's %b option, which the bash maintainer has already declined to go along with so it's kinda moot, and I have written at least three replies that I then deleted instead of sending. I am RESISTING talking about how the standards committee that removed "tar" in favor of "pax" over 20 years ago (and STILL hasn't admitted nobody followed them over that cliff) should stop trying to demand changes of existing code. A good standards body should document, not legislate.

But I am sitting on my hands and not Doing Flamewar. Nope. There's a dial-in zoom thing for the posix bureaucracy to talk about it later today, which I am Not Dialing In To either. (I've circled back to a night schedule again so it would be half an energy drink of caffeine to stay awake for it anyway.)

Oh hey, bash's builtin printf has %n which assigns to environment variables. I need a MAYFORK on printf.c to implement that. Throw it on the todo list...

Checking if busybox djelibeybi hersheba tsort handled the echo a a a a | tsort edge case, I ran "make distclean defconfig busybox -j $(nproc)" and it barfed saying it couldn't find fixdep. (Sigh: not my problem, build it single processor... And yes, "echo a a a a | ./busybox tsort" did NOT consider that a loop, but output just "a" like it's supposed to.)

Posix's "tsort" command is a HORRIBLY DOCUMENTED simple dependency resolver which takes whitespace-separated pairs of inputs describing dependencies, meaning the first entry in each pair must come before the second, and outputs a "topologically sorted list" which contains each unique input once in an order that obeys all those before/after rules, ala:

$ echo a e b a c a d c | tsort
b
d
c
a
e

The resulting list has a before e, b before a, c before a, and d before c. If the list contains circular dependencies, tsort errors out and shows the cycle, although 'echo a a a a | tsort' just shows 'a' because depending on yourself is not a cycle. (Special case!)

My first pass through posix had this in the "uninteresting" bucket but busybox added it last year. I didn't notice at the time but my every few weeks check of the busybox list folder (yes I'm still subscribed) had a memory leak fix for the command, and digging back to the original submission somebody was using it as a dependency resolver for their init script ordering. Ok, sure.

I'm not gonna look how busybox implemented it because gpl. (I'm comfortable going back to look at the 1.2.2 version I _released_, but this code was added later.) But the obvious way is to read the whole mess into an array of pairs and go over it with two for(;;) loops, checking each first entry against each second entry. If this first entry is not in any second entry (and entries where first and second entry match don't count), print this one, and yank this entry (and every other entry with a first entry matching this one) from the list. If we make it to the end and haven't printed anything this pass, what's left is all circular dependencies with no loose ends to unravel.
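
Spelling that out as a standalone toy (hardwired input matching the example above, made-up names, emphatically not the eventual tsort.c), with the one extra wrinkle that yanking a pair has to notice when its second string just became an orphan, or leaf nodes like the "e" above would never get printed:

#include <stdio.h>
#include <string.h>

// pairs corresponding to "echo a e b a c a d c | tsort"
static char *pair[][2] = {{"a","e"},{"b","a"},{"c","a"},{"d","c"}};
static int len = sizeof(pair)/sizeof(*pair);

// Does any pair (x,s) with x != s remain? ((s,s) isn't a real dependency.)
static int depended_on(char *s)
{
  int j;

  for (j = 0; j < len; j++)
    if (!strcmp(s, pair[j][1]) && strcmp(s, pair[j][0])) return 1;

  return 0;
}

// Does "s" still appear anywhere in the list at all?
static int mentioned(char *s)
{
  int j;

  for (j = 0; j < len; j++)
    if (!strcmp(s, pair[j][0]) || !strcmp(s, pair[j][1])) return 1;

  return 0;
}

int main(void)
{
  while (len) {
    int i, j, progress = 0;

    for (i = 0; i < len; i++) {
      char *first = pair[i][0];

      if (depended_on(first)) continue;

      // Nothing depends on our first string: print it once, yank every
      // pair starting with it, and print any second string that no longer
      // appears anywhere (because nothing else will ever print it).
      printf("%s\n", first);
      progress = 1;
      for (j = 0; j < len;) {
        if (!strcmp(first, pair[j][0])) {
          char *second = pair[j][1];

          memmove(pair+j, pair+j+1, (--len-j)*sizeof(*pair));
          if (strcmp(first, second) && !mentioned(second))
            printf("%s\n", second);
        } else j++;
      }
      i = -1;  // the list shifted under us, restart the scan
    }
    if (!progress) {
      fprintf(stderr, "tsort: cycle involving %s\n", *pair[0]);

      return 1;
    }
  }

  return 0;
}

Which prints "b d c a e" one per line, same as the worked example above.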

Problem: that's an n^2 algorithm. Sorting the table and binary searching could get the lookups down to O(log n) each, but sometimes it wants to find second entries (does anybody else depend on this), and sometimes it wants to find first entries (duplicates to remove so we don't output it multiple times). I'd need TWO sorts, and finding the same entry in the other table is not fun. (The "other entries matching this one" problem: "a b a c a d"... Even with a fallback sort each pair is not guaranteed to be unique, see again "echo a a a a | tsort". I suppose I could have a deduplication pass but ew?)

Hmmm, maybe I want a suppression linked list? "Things I have already output this pass, so even though I'm removing them from the table don't show them again?" Except that can go O(N^2) if the entire table is "a z b z c z d z" and gets removed all in one go. But that doesn't smell like a common case, and even if it is we'd output the whole table on the first pass (discarding _everything_ immediately) so it's not THAT bad, I think one of the other N's drops out. (And I don't want to code an insertion sort for the removal list. That just feels wrong.)

Ha: not a linked list. This is an array: move the entry we're yanking to the end when we move the rest down to fill in the hole. Then you naturally get "all the ones we've removed" together at the end, and just have to keep track of how big the table was at the start of this pass. Loop from new end to old end. (This is all obvious enough it's probably the "standard" implementation. Not really a hard problem, just new to _me_.)


September 6, 2023

Blah. I've had an upset stomach on and off for days (of the "reads as anxiety if I'm not careful" type) and it's REALLY hard to concentrate.

The plumbing.sh in the lfs build I'm doing does a "find $LFS -newer .timestamp" thing (in theory that can be used for package management, here are the files this new package installed) which works on the host but inside the chroot the $LFS dir is /root/lfs containing source tarballs, build directories, and build scripts. After the chroot the packages are installed into / not into $LFS, so the find is looking in the wrong place. This codebase is brand new and already accumulating scar tissue and changing assumptions. Great.
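
The theory being something like this (sketch, made-up file names):

touch .timestamp
...configure, build, and install the package...
find $LFS -newer .timestamp > this-package.filelist

Which stops working the moment the install prefix and the $LFS directory part ways, as above.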

So I've mentioned Rock Sugar before, which is a side-project of the voice actor behind Wakko Warner and several of his friends. (As with the Blues Brothers, famous-ish person with unrelated day job sometimes goes slumming as a rock star, and I cannot argue with this. Their schtick is doing VERY GOOD mashups where they sing the lyrics and melody of one song to the backing music of another song, which I find EXCELLENT programming music. Challenging enough to keep my ADHD at bay without actually being distracting because all the _components_ are familiar. But their first album's only an hour long...)

Some time ago Fade bought me the second Rock Sugar album off Rock Sugar's website, which is only available as a digital download because their first album's physical CDs all got recalled due to a lawsuit from the ex-lead singer of Journey who was ABSOLUTELY CERTAIN the professional voice actor hadn't done a spot-on impression of him but had instead used an unauthorized recording of the actual Journey geezer! (Who broke his hip in 1998 and retired from singing.) And of course Jess Harnell (the voice actor) proved in court that this retiree loon couldn't tell what was his own voice and what wasn't, but the judge threw the doddering elder a bone by saying the album caused "market confusion" and couldn't be sold anymore. And Jess didn't spend the time/money to appeal, and just let people upload the album to youtube where people could listen to it for free. (I found out about them from the Professor of Rock interview with him there.) You can in theory still buy the CD used, but it's $200 for a scratched up one, and the artist doesn't get the money.

But the judge's ruling didn't apply to A) digital downloads of B) their SECOND album. (The first was reimaginator, the second is called reinventinator.) And besides, Steve Perry is 74 years old now and presumably busy suing other people.

So anyway, I legally own a copy of Rock Sugar's second album but couldn't immediately FIND it, which I was reminded of by the second album being uploaded to youtube a few months back. So the easy thing to do is youtube-dl the copy of the whole album that's on youtube, tell ffmpeg to strip the audio part out of it, copy it to my website, download it to my phone from there, and move it into the mp3 player directory. (For a definition of "easy" that meant I could theoretically do all that in 5 minutes from the laptop I was sitting at without bothering other people who were asleep at the time.)

And THIS is how I found out that youtube-dl does not work with Google Fiber. It does... something. And detects that it's an unauthorized stream in under a minute of downloading, aborting the download and immediately failing on "cursor-up enter" restart attempts. (And yes, I did a fresh pull of youtube-dl to make sure I was using the current version.)

But it still works fine with phone tethering. (Grumble grumble monthly bandwidth quota.) So Google's service is measurably less capable than its competitors, because Google imposes restrictions and the various Google services collude together to impose additional layers of data harvesting and activity tracking and digital restrictions management, which in this case blocks something I personally HAVE A LICENSE TO DO. (I paid Rock Sugar directly for a copy of this material! Actually slightly higher resolution than youtube has, which I could totally shift onto my phone if I could be bothered to scrape through my hard drive to find wherever I put the file (under whatever name I called it), or search back through a year of email to find the download link to re-fetch it from Rock Sugar's website, or wait until 7am for Fade to wake up.) But the youtube link was right there and I have a tool that can grab it (which I've needed to archive my OWN presentations from the linux foundation's channels and such; yes the same people who accidentally deleted the entire 2015 ELC conference off youtube, in theory I can contact Tim Bird and get a copy through official channels but that takes weeks at best). And this tool works fine... when I'm not on Google Fiber, because Google Fiber is uniquely restricted in a way that other providers are not.


September 5, 2023

Proper fix for the backslash segfault thing. Of course I have a pending "78 insertions, 17 deletions" patch to sh.c to make more changes to the backslash logic, because (among other things) bash -c $'XYZ=xyz; echo "abc$\\\nXYZ"' outputs "abcxyz" but toysh is outputting "abc$XYZ" so I need to fix that, and not via whack-a-mole but in a more generic way. (I have like 5 pending changes to toysh in different directories. They fight. This one was in a new fresh directory and conflicts with probably all of them. Oh well...)

Oh goddess, Linux From Scratch 12.0 came out on the first. Nope, NOT CHANGING HORSES MIDSTREAM.

List of things that should happen with the LFS build:

  1. Finish through the end of chapter 8 as-is, unmodified, to establish a baseline so we know what success looks like.
  2. Rebuild ch5 with the musl-cc cross compiler, so the statically linked chapter 7 tarball isn't so fscking huge.
  3. Use a busybox $PATH to build ch5, and ensure the result matches? (Ensure how?)
    • Matches means it still builds to completion, and the log and resulting binaries are similar-ish. Can I examine any config.log stuff? I guess the list of commands that get called in a single processor run matching is a good start. At least examine/explain any observable differences...
  4. Start inserting toybox commands in place of the busybox commands and re-run the build.
  5. Swap glibc for musl in chapters 5 and 8. Yanking perl if possible.
    • Still compile it, but don't install it. Maybe put both in some kind of optional side build? Figure out what needs python too. Oh, and patching the kernel to NOT need libelf and bc probably lets us yank those too.

There's another step, which could go anywhere above:

6) Transplant the ch7 build into kvm chroot so the ch7.0 with all its bind mounts and such can be a mkroot script running under the toybox environment.

And then, of course, dive into beyond linux from scratch...

Step 1 goes through the end of chapter 8 because most of chapter 9 is about booting, not chroot stuff, so not really needed for this use case. Grub might be nice to compile, but not to install into a vm that boots with qemu -kernel as its bootloader. Building a kernel is a good smoketest, but modern LFS kind of handwaves away the configuration step, and I already have kernel builds in mkroot happening under a toybox-only $PATH.

There's two goals here: 1) Make sure what we provide is good enough to run the package builds, 2) Make sure that what we provide can replace most of this gnu/crap so nobody ELSE needs to compile it unless they're sufficiently masochistic or think they must be using "standard" versions and yet somehow haven't been peer pressured into running Windows or MacOS. (Such people exist. For some reason.)

I left off in the glibc build, which is horrific, specifically the time zone data, which is packaged wrong and wants me to hardwire a timezone into the image because there's no sane way to select one. Austin and Minneapolis are both in the "Chicago" time zone.


September 4, 2023

Dealing with the sh.c segfault: ./sh -c $'abc\\\n def' triggers it and it's that backslash handling from earlier this year again. They WERE being stripped during initial parsing, and now they persist way longer (and have to be filtered out later), and something's getting confused by it.

This would be SO much easier if glibc's asan worked. It tells me the line it segfaulted on, with no backtrace. There's a backtrace for where the nearest memory block was allocated, but not for where the fault happened. Great, something somewhere called skip_redir_prefix()! Who and why? Your guess is as good as mine, I dunno how it GOT there...

Back to drilling down by inserting dprintf(2, "florp\n"); dprintf(2, "wheep\n"); and dprintf(2, "pang\n"); statements into the code. The point is "I got here, uniquely, passing these points in this order, before it exploded." Adding var=%p statements as needed to examine decision state. This is the debugging version of percussive maintenance: hit it with a rock until it's the right shape, but it never NOT works. You can have the "an interrupt came out of nowhere" problem, or "previous actions had delayed consequences so free() on a seemingly valid address threw a heap corruption error", at which point you start reducing your test case by ripping out previous chunks of code until the problem stops happening, at which point whatever you last ripped either threw the grenade or disturbed the Jenga tower so the grenade missed anything vital. Debugging the uncooperative ones is a whole lecture, and yes I have isolated them to "compiler bug" and "processor errata" before, but never both in the same year. (A zillion other people have used this toolchain with this processor on this OS. I'm the first one trying this code. That's a 1/userbase chance that it's NOT in my code. Happens, but really not often. And usually because I'm building a cutting edge compiler or libc or kernel from source and the bug was introduced in the past few months. For example, the processor errata in the cortex-m already had a workaround in the vendor's uClibc toolchain but not in the vanilla uClibc I was using; fix existed but had not made it upstream yet. I was using their kernel source to run on the board in question, but not their toolchain...)

This particular bug hunt isn't being remotely stroppy, just tedious. Window 1 is the text editor, window 2 is the command line within which command line history recompiles and re-runs the test every time I hit cursor up and enter. (The && operator in make && ./thingy is very useful; if I typo in the source it doesn't re-run the test.) Using ASAN actually makes that far LESS convenient because I then have to two-finger scroll up three screens to see my last printf output, because it's spat all sorts of useless "shadow byte legend" garbage after the interesting stuff I WANT to see, and there's no obvious way to get JUST the stack traces. (I asked Elliott once what the shadow bytes were about and he didn't know off the top of his head.)

Sigh, looking at the existing code with my usual security paranoia, expand_redir() is doing arg->v[j] where j is a signed int and going "if they ever manage to feed more than 2 billion arguments to the same command line, that could wrap to negative and index out of range", and I mostly DIDN'T take that kind of thing into account when writing this because the maximum glibc contiguous malloc() size was 128 megs at one point, and the kernel would cap environment space at 10 megs... but that kind of stuff changes with the weather and I can't rely on it, I should probably have an explicit check for "2 billion argument maximum" in the parsing somewhere. (Not that I expect you can easily exploit a pointer 16 gigabytes before the start of the heap, but let's not reach out and touch hyperspace and expect it to end well on general principles.)
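
I.e. something like this at the point where the argument list grows (a sketch, not currently in sh.c; error_exit() is the normal toybox helper):

// refuse to build an argument list a signed int can't safely index
if (arg->c >= INT_MAX/2) error_exit("bad arg count");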


September 3, 2023

Sigh, I've written 3/4 of help text compression support where if you've enabled both CONFIG_TOYBOX and CONFIG_ZCAT then scripts/install.c builds an instlist --help that spits out the big help text block (with embedded NUL bytes and special 0xff entries for OLDTOY redirects) and make.sh runs it through gzip -9 | od | sed to turn it into a header file...

And now I'm writing the consumer side, and what I really want to do is decompress it into libbuf and print that entry out. I can iterate counting NUL bytes, and then memmove() what's left down and decompress the rest of the 4k block so I'm sure I've got all the data, then print it and it should be null terminated already. Except that doesn't work: for i in $(toybox); do echo $i $(toybox --help $i | wc -c); done | sort -k2,2n says sed --help is 4934 bytes. Which means I can't do the simple decompress-then-print because of ONE ENTRY. (It's the LITTLE THINGS that screw up seemingly elegant solutions. This could be so much cleaner if not for this ONE OUTLIER...)

And of course now I'm going "eh, is this worth doing at all?" It saves about 50k space in the binary (80k text to 30k text), but that binary could always live on squashfs or similar? It's really for embedded systems doing xip, and if that needs a nontrivial extractor or a large malloced DRAM buffer to work how much of a gain is it really? (The point is leveraging the deflate code we've already got.)

Plus digging into actually using the deflate stuff, A) Elliott turned zcat into #ifdef salad because he wants to be able to use zlib's slightly faster duplicate implementation of this code (I am NOT copying that to a second C file), B) I never bothered to implement the decompress-into-memory codepath. It's not hard, it's just the ~3 places doing flush are all doing so to filehandles right now. (The filehandle to write to is copied into both the bitbuf and deflate structs, which seems redundant but revisiting all that is part of the "implement compression side" todo item which I'm NOT DIVERGING INTO RIGHT NOW). The big design issue was that stopping decompression partway through, and backing out and returning in a way we can easily resume is MUCH harder than just giving it a place to flush data to when buffers fill up. So I did the easy thing at the time, and now... decompressing into an always-big-enough memory buffer would be the easy way.

At the moment, compressing the help text doesn't seem like enough of a win to really want to do infrastructure lifting. It seemed like low-hanging fruit when I was writing an outline to describe how make.sh works for an instructional video, but... as with so many things left at this point, there's a REASON it's still on the todo list. But I don't want to throw out hours of work either...

Sigh, this is the same reason I recently bounced off of moving the hash functions from toys/*/md5sum.c to lib/hash.c so I can use the internal ones in the password code: it's another tangle of external library code the Android guys wanted, and the result is ugly enough I rotated to bang on SOMETHING ELSE rather than hold my nose and deal with it. I need to go back and do it, I just... really don't want to? It's icky. Sigh. (Nobody seems to have noticed yet that toys/other/sha3sum.c does NOT implement a libcrypto codepath, it just does the internal one which WORKS FINE...)

Maybe I should work out lib/portability.c shenanigans with weak symbols? That sounds better than having command implementations full of #ifdefs, might be a good approach... (I occasionally suffer from something a bit like writer's block, which is my subconscious telling me that the design is wrong and I need to work out how. I can smash through it under deadline pressure when I need to, but an awful LOT of design work is staring aimlessly into space doing the is-this-it routine with the blind man and the Rubik's cube from UHF...)

Ok, if I move the library stuff into portability.c with the CONFIG_TOYBOX_GRATUITOUS_EXTERNAL_LIBRARY symbol checking in there, and have weak versions of the functions in lib/*.c, then that moves the config symbol checking out of lib/ which is one of my big objections to migrating this code INTO lib/ in the first place. I don't LIKE having config symbols checked in lib/*.c because the build dependencies don't really work out well doing that. To properly check them, you'd have to rebuild lib/*.c every time you build a new command, and that makes compile/build/test cycles slow and "make change" _really_ slow. But if you don't, it's subtle bug city. My compromise is only checking CFG_TOYBOX* symbols in lib/ (which don't change often) but it's still... icky. (Moving icky to portability.c makes me FEEL better. That's where it GOES.)
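
The shape of the thing would be something like this (a sketch reusing the do_hash() names from the next paragraph, with my sarcastic config symbol standing in for whatever it would really be called; I haven't tested how this interacts with --gc-sections):

// lib/hash.c: the builtin implementation is the weak default.
void __attribute__((weak)) do_hash(int fd, char *name)
{
  do_builtin_hash(fd, name);
}

// lib/portability.c: when the library build is configured in, this strong
// definition quietly replaces the weak one at link time, and neither the
// command nor lib/*.c ever sees an #ifdef.
#if CFG_TOYBOX_GRATUITOUS_EXTERNAL_LIBRARY
void do_hash(int fd, char *name)
{
  do_lib_hash(fd, name);
}
#endif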

In md5sum.c the divergence point between the library and builtin function dispatching is the loopfiles() callback do_hash(fd, name) which calls either do_lib_hash() or do_builtin_hash(), both of which operate on a file descriptor instead of a buffer. So... kind of a lot like the zcat code, actually: not the API I actually need for the new use, I need this to operate on a memory buffer too. (The FILE * plumbing has fmemopen() for this, but the cure is worse than the disease. I also have xrunread() but that seems like overkill. Hmmm, still pending design work...)


September 2, 2023

Isn't C's #include "thingy.h" supposed to search in the current directory, as opposed to #include <thingy.h> which searches just /usr/include and friends? So why do I need to say -I . to get #include "generated/blah.h" to work with the devuan botulinium toolchain?

There's another bug report that sh.c does something wrong. (Segfault with line continuation.) I should circle back to shoveling that out, but it's an endless time sink and I have so many open tabs. I want to close tabs, which means shortest-job-first scheduling to get stuff done and checked in. The hard part is I'm terrible at telling how long something will take until I've finished. Hmmm...

I'm cheating slightly in that it's a weekend, so I don't really HAVE to look at the new sh segfault until monday...


September 1, 2023

Hey, Cruise changed its mind and now charges a flat $5/ride in its tiny little beta-test service area. That makes a lot more sense. Flat monthly rate would make more sense still (especially since the cars drive around constantly rather than parking even when empty, they can de-prioritize heavy users the same way my phone bandwidth does when I go over however many gigabytes per month it is, so your rides have a longer wait time before arriving when you've done a lot of them close together). But expanding the service area is the first priority. (I'd be tempted to sign up for the thing myself if not for the iron rule that My Phone Is Not Authorized To Spend Money, Ever. I hooked a $200 gift card up to it once, and even that didn't end well.)

I'm still banging on video outlines. I should actually record the videos at some point. The classrooms on the second floor of Jester Center are pretty much _ideal_ recording areas. Very quiet at 3am, and now the students are back they're open again. Of course this also makes them excellent work space, which means I do lots of typing and then go home not having recorded anything. (Headphones with the good microphone sitting right there next to me...)

I'm getting lots of documentation written (which looks a bit like code review in a certain light, and means I'm doing stuff like trimming global sizes in passing), but... despite old people consistently saying "I hate watching videos like all these zoomers do, I want written documentation" I've CREATED buckets of written documentation over the years and nobody reads it. And I'm not entirely sure how to organize it, either. I can write a 5000 word treatise on the nuances of sed or ls, and _I_ wouldn't read it, so...

The basic "command walkthrough" is a bit of a porcupine because there are SO many potential tangents. Even explaining "true" and "false"... true does literally nothing, and false has one line: "toys.exitval = 1;" at which point the explanation takes a sudden 90 degree turn into explaining where toys.exitval came from, which is why I need an explainer on the three seashells 6 global variables. I need a walkthrough of the entry path (which might as well explain all of main.c). And then I need a whole thing on lib/args.c (called from that main.c entry path!) which is A) kind of a large explanation (500 lines of fairly dense code parsing its own input data format in the option strings), B) initializes globals like toys.optflags and toys.optargs, C) initializes the start of the GLOBALS() block.

It's simple like riding a bike is simple. Unfortunately, riding (and maintaining) a bike isn't actually simple, or else training wheels wouldn't exist. I very much want to make it all simpler, but can't figure out how to still make it all WORK if I do...

P.S. You'd _think_ toybox_version from yesterday would be in rodata instead of writeable data, but it turns out "const" is useless: "extern const char * const toybox_version;" still puts it in data, and "extern const char const * const toybox_version const;" complains about duplicate calls to "const". I could probably hit it with __attribute__((section)) but that's too micromanagy for me. And when I do apply "const" to variables, the rewritten-in-c++ compilers complain about assigning pointers-to-const to non-const pointers (I vaguely recall there was a brief period where they did this for signed/unsigned mismatch, and everybody -fstop-being-stupid it until they backed off), and I am NOT spreading a communicable disease through my codebase to silence warnings. Strings don't work like that: if I try to modify a string constant I get a segfault at runtime, as Ken and Dennis and Brian Kernighan intended. And thus toy_list is writeable data that never gets written to, to shut the stupid compiler up. (Sigh, I should look into that __attribute__((section rodata)) thing, shouldn't I? Smells way too much like busybox micromanagement. But the advantage is rodata can collapse together between multiple running instances of the same program, especially on nommu fdpic, and thus has an actual measurable benefit...)
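
For reference, the three placements do mean different things (minimal demo; where each one actually lands is the compiler's call):

const char *a = "x";        // pointer to const: *a = 0 won't compile, a = p is fine
char *const b = "x";        // const pointer: b = p won't compile, *b = 0 compiles then segfaults
const char *const c = "x";  // both: the only candidate for rodata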


August 31, 2023

One of the videos I need to do is explaining the global variables in toybox, which you can beat out of it with scripts/findglobals.sh (a wrapper around the ever-useful "nm --size-sort" piped into a couple grep filters):

$ make distclean defconfig toybox && scripts/findglobals.sh | grep -v GLIBC
0008 D toybox_version
0050 B toys
1000 B libbuf
1000 B toybuf
1d60 D toy_list
2028 B this

(Building against musl adds stdin/stdout/stderr, building against bionic adds __PREINIT_ARRAY__, and statically linking against anything adds dozens of entries, but those six are the only global variables that should actually be in toybox itself, by policy.)

The ones with "B" are the bss segment, which means they start out zeroed. The two with "D" (data segment, initialized to specific values) are toybox_version which is the version string in toys.h or from git describe in scripts/make.sh, and toy_list which is the sorted list of command structures describing the commands toybox knows how to be.

The two 4k scratch buffers are toybuf and libbuf (one for use in commands and one for use in lib/*.c), toys is a global instance of struct toy_context from toys.h which is filled out by toy_init() and lib/args.c and a few other places (explaining each toys.field would be half of any video about the globals because there's over a dozen and they're all different, and most of them are important), and this is a union containing each command's GLOBALS() data, with the size being that of the largest command's GLOBALS struct...

Hmmm, what is going on with:

$ grep '\t' generated/globals.h | wc
    985    3194   25327
$ grep '^\t' generated/globals.h | wc
      0       0       0
$ toybox grep '\t' generated/globals.h | wc
    169     507    4482
$ toybox grep '^\t' generated/globals.h | wc
    169     507    4482

I'm trying to make a script to tell me the sizeof() each command's GLOBALS() struct, and the struct lines in the union coincidentally have a leading tab (for historical reasons) so I tried to grep that, and... debian's grep doesn't want to play?

Toybox is doing what I expect but I wrote it so that's not evidence that "what I expect" is right. The question is, why does the debian one treat \t as magic? (I tried with and without square brackets...) Ah, it's NOT treating \t as magic. And that's the problem. If I have bash expand it instead:

$ grep $'^\t' generated/globals.h | wc
    169     507    4482

Hmmm... busybox also isn't interpreting \t there. Will mine doing that break something? Do I need to "fix this" (make it LESS capable) and add a test for it NOT understanding escapes? Anyway, back to writing my script. (TODO list critical mass is where working on your todo list makes it longer. I have been over that event horizon for many years now...)

The script is: { echo -e '#include "toys.h"\nint main(void) {'; sed -n 's/^\tstruct \(.*\)_data .*/printf("%d \1\\n", (int)sizeof(struct \1_data));/p' generated/globals.h; echo '}'; } | gcc -xc - && ./a.out | sort -n which only needed two manual fixups to render properly in html! (The & becomes &amp; because it's an html special character, no redirects this time that need &lt; and &gt; replacing.) What the script does is create, compile, and run a small C program to print sizeof() each struct in the union in generated/globals.h, taking advantage of the fact that only those lines start with a tab because the script that generates them is really old. (Yes, even gnu/sed understands \t in the pattern, you see why I'm confused? BE CONSISTENT!)
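
For the curious, the generated program comes out looking something like this (two struct names shown as examples; the sed stamps out one printf line per struct in the union):

#include "toys.h"
int main(void) {
printf("%d grep\n", (int)sizeof(struct grep_data));
printf("%d sed\n", (int)sizeof(struct sed_data));
}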

So anyway, that script says how big each command's GLOBALS block is (sorted by size in bytes), and the last few lines of its output are:

520 tr
1024 cksum
2080 modprobe
2192 grep
2192 telnet
8232 ip

Everything "tr" and earlier is reasonably sized, and "ip" and "telnet" are in pending. That leaves three commands: cksum, modprobe, and grep.

The 1k for cksum is the crc table, which isn't using toybuf because we use that as our read() input buffer in the data processing loop, fair enough. I could trivially split toybuf between the two uses, but 1k is small enough it can go on the stack even for nommu, so I might as well move that, and inline the little endian and big endian per-byte functions while I'm at it. (There were two callers of each, which is why it was a function, but I can move the second call into an else case inside the loop if I add a "done" variable.) Having it on the stack like that means the EASY way to do this re-initializes the table for each file, which... I mean I could have the table be a local variable in command_main() and stick a pointer to it back in GLOBALS to avoid the re-init, but... probably not worth it? Microoptimization to avoid a loop of 256*8=2048 iterations, each maybe 50 instructions long, happening once per input file...
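
For context, the table init in question is the usual reflected crc32 loop, something like this sketch (toybox's lib has its own crc_init() variant; this is just to show the size of the work being re-done per file):

void crc_table_init(unsigned *table)
{
  unsigned i, j, c;

  // 256 entries times 8 shift/xor steps: the 256*8 loop mentioned above
  for (i = 0; i < 256; i++) {
    for (c = i, j = 8; j; j--) c = (c&1) ? (c>>1)^0xEDB88320 : c>>1;
    table[i] = c;
  }
}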

Grep is big because of struct arg_list *fixed[256]; which with 8 byte pointers is 2048 bytes. That's fallout from adding fixed string bucket sort optimization last year (commit a7e49c3c7860 and then like 4 fixes on top of that), and that I probably DO want to turn into a pointer and malloc().
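
Which is a small change, something like this sketch (xzalloc() being the zeroing malloc-or-die wrapper in lib/):

TT.fixed = xzalloc(256*sizeof(struct arg_list *)); // was: struct arg_list *fixed[256];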

Modprobe has a struct arg_list *dbase[256]; which is the same 2k, but the code there uses hash %= ARRAY_LEN(TT.dbase); so it would care about just changing it from array to pointer, and WHY is it doing a modulus on a power of 2? Also, under what circumstances might TT.dbase change? Did I never clean this up... oh, I didn't: modprobe is still in pending. (The promoted ones are insmod, lsmod, modinfo, rmmod, but not yet modprobe.)
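
For the record, a power-of-2 modulus reduces to a mask, so (assuming hash stays unsigned) that line could be the cheaper:

hash &= ARRAY_LEN(TT.dbase)-1; // same result as %= when the length is a power of 2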

Sigh, I don't usually use modules in the embedded systems I build, and proper testing of modules is one of my big "get tests working under mkroot" motivations, which ain't there yet. (Ahem: am not there yet.) I do NOT regularly run root tests on my development laptop, which is why /proc/uptime is approaching 8 digits. (Not something I'm proud of, I still need to close all my windows so I can swap those 16 gig memory chips over from the previous laptop. And now that Devuan Diptheria has come out I should really upgrade off of Devuan Bronchitis...)


August 30, 2023

I don't usually post my todo list because it's probably unintelligible to anyone else. For example, "lfs wrap granularity" means I have an open design question about the linux from scratch wrapper that logs each command out of the $PATH. If I just wrap the initial inherited $PATH, I don't get the calls to the new commands that are built and installed along the way, some of which toybox implements. But "this command isn't needed until after an external package providing it can be successfully built on this system" is useful dependency information. And the initial wrap is less noisy, later re-wraps may have python and so on in them.

What I should probably do is have each package install re-wrap the $PATH but also update the log filename it writes to, so I have a separate log of each package build's command line invocations. That way I can later slice and dice the data however I want, and the trivial one is "cat them all together and pipe through awk '{print $1}' | sort -u | xargs" to get the full command list. I.E. I haven't LOST anything by doing that. But this means I need to edit setupfor(), which I'm trying to keep simple? Maybe the log update should be a separate shell command, called manually at the start and then again by cleanup()? Except cleanup can't update the log target file for a NEW package...

Other todo items like "Replace glibc", "yank perl", "What needs python?" are probably more transparent, but have connotations. Perl and Python seem like they belong in Beyond Linux From Scratch, which is Linux From Scratch Book II and says how to build x11 and sshd and postgresql and so on. If toybox can provide a linux from scratch equivalent system with just itself and a compiler (from which you can then build any of the LFS packages without further prerequisites, modulo stuff like curses), then Perl and Python logically DO go in the BLFS bucket along with Ruby and Lua and Java and so on. (Even git, dhcpcd, ntp, and rsync are blfs, not lfs base system.) Except I'm not writing BLFS, and nobody else is likely to competently maintain an "ELFS/EBLFS" embedded book. (If I was an insomniac teenager I'd happily take that on... but back then I didn't know HOW. Now I'm spread thin and haven't got the spoons for large new projects, I can barely keep my existing plates spinning...)


August 29, 2023

Got the Linux From Scratch 11.3 automated build script up to section 8.5.2.2, at which point the time zone data is stroppy because the tarball doesn't have a subdirectory. (It just extracts a dozen files into the CURRENT directory.) I have code to fiddle with that in aboriginal linux, but am trying to minimize complication this time around? Also, this is part of the glibc build which is just horrific all around, although the gcc build is second-worst. (WHY is the cc1plus binary 366 megabytes? There's no excuse for that. And that's ONE binary, not the whole compiler; I noticed while asking "why is tarring up this file taking so long"...)

I want to make a cleaned up musl-based version of this build, keeping toybox at the start of the $PATH instead of the end, so it keeps using it instead of replacing binaries with new ones as they're built and installed. But first I need to reproduce LFS as it exists so I know what success looks like and have a frame of reference to diverge from.

Another thing is I haven't got anywhere to check this in. I don't want to make a separate project for it like I did last time, but the logical place to put it in toybox would be mkroot/packages/lfs, except it's not to the point where it even tries to work under mkroot yet. (One of these packages is not like the others...) It's currently 5 scripts: a "plumbing.sh" that factors out common code (some setup like "umask 022" but also the announce(), setupfor(), and cleanup() shell functions that bracket each package build), a pre-chroot script (chapters 5 and 6) that builds the initial directory, a chroot-into-directory script (current LFS doesn't build a "mount" command in the chroot and instead does a lot of --bind mounts from the host before chrooting into the new system), and then two more build scripts so far: chapter 7 does all the work before deleting the /tools directory, and then chapter 8 (up to the timezone nonsense).

So I run ch5.sh on the host, then ch7.0.sh to chroot into lfs, and then inside the chroot ch7.sh, ch8.sh. And ch7.0.sh copies plumbing.sh and the other 2 scripts into LFS. Awkward, but reproducible.

What it DOESN'T do yet is set up the command line logging wrapper inside the chroot. I have a log of the chapter 5/6 files, but not a log of what gets used inside the chroot. In theory I only need to provide the host binaries and can let the chroot do its thing, and that's system bootstrapping done... except I want toybox to provide a working system the way busybox does in alpine, which means building arbitrary packages with toybox plus supplemental binaries toybox doesn't implement. If toybox DOES provide a command, that implementation needs to be load bearing...

A passing anime (kuma kuma kuma bear, which I re-watched season 1 of because season 2 is showing now) had an egg-based "pudding" without particularly describing the recipe. Fuzzy's been making egg custard for a bit (which is really good), but this is solid rather than liquid. (Fuzzy's of the opinion japan reinvented the flan, and I've been referring to it as a marsupial flan.) I watched a couple videos of people making it, which didn't do a good job of providing recipes, but the ingredients included eggs, milk, and vanilla, and also starch, gelatin, cream, and a caramel sauce (usually just sugar and water, cooked). To be honest I want to go back to tokyo and eat the stuff there (so I know I'm experiencing the actual professionally prepared version on model and as intended, not judging it by our random attempts to replicate something we've never tried before), but I no longer work for a Japanese company even part-time...


August 28, 2023

I've been banging on stuff long enough that every once in a while something wanders by where I'm honestly not sure whether I'm responsible for it or not.

I mean stuff percolates around, that's normal. I saw "we've replaced the dilithium they normally use with Folger's Crystals" on a bumper sticker at Worldcon a couple years after using it as an original fidonet tagline. When people reflect my "containers are chroot on steroids" phrasing back at me, I know that came from the OpenVZ booth I ran with Kir Kolyshkin at Scale back in 2011 where I gave a rehearsed 90 second patter to dozens of people explaining what this "container" stuff is and why they should care. And a few months after I gave my "prototype and the fan club" talk at Flourish in 2010, I watched a video of Greg KH repeating a chunk of it (the red hat as fanzine editor analogy) more or less verbatim in one of his own talks a few months later.

In this case, "the C locale does not support UTF-8" was a problem Rich Felker and I struggled with way WAY back (I have a memory of trying to wrap my head around the problem in the kitchen at Cray's office in Minneapolis, which is a contract I worked for 6 months in 2013). Thus the C.UTF-8 locale in Android is something I advocated for early on for toybox-in-android, involving both Rich Felker and Elliott Hughes in working out how to get it right. So it LOOKS like my "this should happen" and trying to get the ball rolling fed into android, and is feeding into coreutils, but... it seems REALLY obvious, and like something that would have happened anyway from another proximate cause? (Android's previous internationalization stuff was all at the GUI level inside java, I dunno when bionic actually developed locale support. I suppose I could check the git log, but it's not actually _important_. It works now, that's what matters. And coreutils is sort of finally catching up, at least in being aware that it's an option and making their test suite not die in such an environment.)

Sigh: the comment at the top of musl's des implementation said it's derived from "freesec" which sounds like it _might_ be public domain, so I googled "freesec des" and the first 5 pages of hits are entirely porn sites. Google's not even coming up with the musl source page I got that from. While it's great that google search isn't trying to bowdlerize the web the way prudetube is, it would be nice if it could actually FIND STUFF anymore. Later the same day, I tried to find "site:lwn.net python kubler ross" and... zero hits. Luckily I linked to it from my blog. (I have no idea why the python developers thought forcing people to leave python 2 behind would make them move to python 3 instead of any other language. It was completely unjustified.) But another obvious thing that exists which Google can no longer find.


August 27, 2023

Oh goddess. The recent coreutils talk of next release accepting new features (and Elliott's ping about not having brief hash output) reminded me that I added -b to toybox md5sum and friends many moons ago, and I should offer that to busybox to see if it becomes more standard. (Coreutils can then ignore it for many years, as usual, but eh.)

So I'm looking at current busybox code for the first time in forever to whip up an add-b.patch and... look, I created the ENABLE_BLAH macros because the CONFIG_BLAH macros were only defined sometimes and had to be tested with #ifdef, while the ENABLE ones were always defined to SOMETHING (either 0 or 1) so they could be if (ENABLE_BLAH) triggering dead code elimination without invoking the preprocessor to create different codepaths where the parentheses or curly brackets might not match up and thus cause build breaks in certain configurations. Right? Simple. Straightforward. Useful, I thought. I tried to explain this but the other busybox devs never quite understood the difference.

The current code has added an ENABLE macro that gets tested with #ifdef and needs #ifdef around every use of it? WHAT THE FSCK IS WRONG WITH... sigh. The ENABLE_ naming convention meant it was a _type_ of symbol with a consistent behavior. Meant, past tense, apparently. (Oh well, not my project...) Also, this nonsense from line 266 to 276 where there's 4 calls to getopt32() repeating 4 different variants of the option string, which result in DIFFERENT FLAG VALUES only 2 of which match the manually defined FLAG_ macros on line 140... That's just... ow? I stopped looking at this codebase for a REASON.

(Yes, they count their flag values from the left and I count mine from the right. The REASON I do that: in the binary number "1011" the first bit is 8 and the last bit is 1, and the bit that's _not_ set is 4. So in toybox a command with optstr "abcd" receiving -a -c -d would set bit 8 for -a, 2 for -c, and 1 for -d, leaving 4 (-b) off. The flag bits go where the binary number bits go. The letters are in the same order as the binary digits.)
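
In macro terms that works out to roughly this (a sketch: the real generated/flags.h is machine-generated and fancier, but the values are the point):

#define FLAG_d (1<<0)  // rightmost option letter is the lowest bit
#define FLAG_c (1<<1)
#define FLAG_b (1<<2)
#define FLAG_a (1<<3)  // leftmost option letter is the highest bit

// so -a -c -d sets FLAG_a|FLAG_c|FLAG_d = 8+2+1 = 11 = binary 1011,
// and the unset -b is the 4 bit, matching the digits of "1011"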


August 26, 2023

I've been making "truncate -s 2g blah.img" ext2 images by reflex for kvm build scratch space, but the chapter 5 build of LFS 11.3 is over 3 gigabytes, 1.5 each for the "tools" and "usr" directories. A combination of glibc being hilaribad at static linking and the gcc that got rewritten in C++ bloating to insane sizes. Honestly, usr/libexec/cc1 is 258 megabytes (and cc1plus is bigger, and lto1 is about the same size whatever that is), there's something called lto-dump in usr/bin that's 250 megabytes, libstdc++.a is 30 megabytes...

This is NOT NORMAL. Nor is it necessary. I regularly ran aboriginal builds on qemu images with only 256 megs of ram for the whole OS (kernel and everything), and this ONE BINARY is bigger than that. What they have done to it is not an improvement.

And no you can't "oh but Moore's Law" your way out of it when laptops got stuck at 4 gigs ram for about 15 years. (That was the high end that triggered the switch to 64 bit processors in 2005, and I was still pulling up "preinstalled with 2 gigs" machines over the pandemic.) They finally seem to be unstuck, but not by much: I just typed "laptop" into google and clicked on the first and third sponsored links, and both had 8 gigs ram. So 4 gigs was the high end in 2005 and 8 gigs is "standard" in 2023, 18 years later. An 18 month doubling time is not an 18 year doubling time, the same way C is not C++, the move to which is WHY compilers in 2023 eat so much more memory than they did in 2007 without accomplishing significantly different tasks, and no it's not a bigger optimizer. (You've screwed up a perfectly good compiler is what you've done. Look at it, it's got template instantiations).


August 25, 2023

Went out to the airport on 3 hours sleep, sat next to an elderly couple on the airplane who fell asleep leaning into my seat space, watched saved anime episodes on my phone rather than trying to pull out the laptop. Got home, hugged Fuzzy, petted the cat, and slept for several more hours.

Fuzzy is in GM's self-driving car beta program ("Cruise"), and gets a week of free rides (starting from her first ride), so we went on an adventure! We had it take us to a DIFFERENT grocery store!

In theory cruise's robotaxi service costs something like $5 plus 30 cents/mile and 20 cents/minute (she told me and I may not be remembering accurately). I understand the price per mile, and maybe charging for wait time before/after the actual travel, but charging per minute during the ride means the passenger cares more about how long the ride takes, so "the robot is driving like a nearsighted octogenarian" is potentially aggravating. (Giving people extra reasons to criticize the performance of your beta product seems less than ideal to me, but hey: free for now. And Waymo was proposing a flat monthly fee back in the day...)

First conceptual problem: restricted hours. Cruise beta starts running at 8pm and stops at 5:30am, so not a lot of places would be open by the time we get there (or would be closing soon). Fuzzy's first several planned trips on the thing turn out not to be possible because it just doesn't go there, or it's closed by the time we get there. (Austin Central Library is open until 8pm monday through Thursday.)

Second conceptual problem: restricted service area that starts a couple blocks away from our house and excludes over half of Austin. It's basically a donut around the university, going almost as far as us, and almost to the river, and west to... I dunno, Mopac? That part's mostly residential, nice to pick people up from but not a lot of destinations suggesting themselves there. (It won't drive through UT proper for some reason, hence donut.) Which means we have to walk a ways to catch a robot (not carrying back a lot of heavy stuff) and most of the places Fuzzy thought of going turned out not to be available destinations. Fuzzy was especially disappointed it couldn't take her to Central Market (HEB's snooty overpriced grocery store, the "buy local" version of whole foods).

Eventually we worked out that we can go to La Madeleine's parking lot on Lamar, which is a few blocks south of HEB Central Market (home of many snooty exotic overpriced things) and across the street from Rudy's (home of much barbecue, and of cream corn that is not _creamed_ corn but "corn cooked in cream"). We also passed the Kolache Factory on that walk, but it's a breakfast/lunch place that opens at 6am and closes at 3pm. Our old veterinarian is also there... if we wanted to carry a cat in a carrier a long way down the sidewalk of a busy street.

TL;DR summary of the experience: the robotaxi is in beta test but the phone app is alpha at best. The actual driving part works ok (although it selects residential backstreet routes optimized to go over as many nimby speed bumps as possible), most of our problems seem to have been with the app.

First actual problem: the robot started heading to us then aborted. (Just like human drivers!) I think the problem is we picked a pickup point (church parking lot) on the edge of the service area, but the route it calculated to navigate there took it out of the service area, at which point it went "boing" mid-journey. (The routes are idiosyncratic, at one point it did a three right turn loop around a block instead of turning left, at a no-light no-traffic residential intersection. Eh, as long as I didn't have to do it...) I'm assuming the Cruise engineers noticed the abort and we don't have to tell them. (Not that we have an obvious way to provide feedback.)

For our second attempt we used one of its suggested pickup points another couple blocks southwest (well inside the coverage area, we'd now walked about 4 blocks to get there). And the robotaxi arrived!

Second actual problem: it sat in front of us going "click click" repeatedly, but the doors were still locked. This went on for a couple minutes before we backed two car lengths away from it (thinking maybe we need to let this one go and summon another?), at which point it turned the corner and drove two houses down and put on its emergency flashers. Aha! It didn't think it had reached its pickup point! And it did not indicate this to us in any obvious way, and our proximity to it (standing next to one of the rear doors) apparently paralyzed it and made it re-lock its doors repeatedly, or something? Weird.

We got in! It drove! The screens showed us our route on a map! It dropped us off! Fuzzy was literally giggling through at least half of this. That part worked fine (good shocks, the speed bumps weren't that bad). It got us there, we got out and walked to Central Market, and Fuzzy got to shop. (She bought lobster mushrooms and smelt. I got an avocado and some yogurt coated lemon shortbread bites from the bulk bins. We were hoping to find rice bran but they didn't have it.)

For the return trip, we summoned another one back at the original place, and the app warned us it was a 9 minute walk and we went "yeah, we know", and... the app did not take the walk time IT HAD WARNED US ABOUT into account. It summoned the car immediately (4 minutes) and said it would wait 3. So we jogged and got there in 7 minutes, got in the car... and it aborted the trip WITH THE DOOR OPEN. It let us open the door, timed out as we got into the car, and then drove off in the wrong direction (we were at the north edge of the service area but it continues off to the west) with the display saying it wasn't currently conveying passengers. The seats have weight sensors so they can beep about the seat belts not being fastened, but that isn't taken into account for the "people are in the car" logic?

We hit the help button and had a conversation with an engineer who said no, we had to get out because "Panda" (each robotaxi is uniquely named, it shows it on the app and in the display) was heading to another customer. So we hit the emergency exit button and it let us out, summoned another one through the app, and it soon said it had arrived but it never drove down the street either direction from us... and then we saw it (hazard lights) at the far end of another business's parking lot, and went there... and of course it was Panda again. No it wasn't going to another customer, it was just driving randomly around like they've been doing for months before they started taking customers. (As long as they keep moving, they don't need to pay for parking.)

Panda took us back to our original pickup point 4 blocks from the house (saved previous location in the app, so Fuzzy sent us there as a known quantity), and we walked home. This trip was where it made that loop to avoid turning left, and it also did a sudden DON'T-HIT-CAT style full brake stop that... we didn't see what it was braking for? But sure? It slammed on the brakes from like 20 mph so a noticeable lurch but not a huge deal. We got there, got out, it drove off, and we walked home.

On the whole, a more pleasant experience than Lyft or Uber, I suppose. I'm used to being exasperated at technology (I break everything), and don't take it personally. (The most frustrating part of the whole experience was several times we wanted to see the edge of the service area, but couldn't pull it up in whatever app or vehicle screen mode we were in at the time. As complaints go, that's pretty minor.)


August 24, 2023

Flying back to Austin tomorrow, I should try to get my act together, or at least packed back into the suitcase.

I have temporarily de-promoted passwd.c because the new infrastructure needs waaaay more testing. Plus I'm unclear on what it should DO in several corner cases: are -d and -l and -u root-only? They seem like they should be root only. Running "passwd -l" as a normal user seems dangerous, and "passwd -d" seems likely to violate policy. Not that we've GOT a good policy, I really want to remove CONFIG_PASSWD_SAD because toybox commands mostly don't have sub-options anymore, we've come a bit far from busybox at the design level over the years. But I don't want to enforce an arbitrary heuristic on everybody? Said heuristic is VERY minimal, bordering on useless. It doesn't require multiple character types (upper/lower/digit/punctuation), it enforces a minimum length of 6 (which even with full 256 values would only be a 48 bit keyspace, that's probably laptop crackable in realtime)...

The modern use case for passwords is rate limited login attempts. I just assume if they've got the hash they can probably brute force anything a human is willing to type with a GPU farm no matter what the algorithm is, but "you've made enough bad guesses to notice and do something about" is still useful. Yet another failure of IPv6 is it makes "this IP failed 10 times, block it for 5 minutes" much less feasible than IPv4. Wikipedia[citation needed] permanently blocked the whole of IPv6 years ago (nobody on it can edit pages, period). Other sites do similar, but it only ever comes up when I'm using phone tethering and DON'T run "dhclient -4" to force an IPv4 address. (I don't actually edit wikipedia pages, because of a personal policy: they refuse to let people with firsthand knowledge contribute to the site, and I'm not going to edit something I don't know about. But I do edit wikipedia _talk_ pages anonymously from time to time to point them at references proving them wrong about something, which is then generally ignored but I feel I did my part.)

The unixy way to do this sort of thing would be to have passwd call out to mkpasswd to generate the actual $id$salt$hash string, which implies also calling out to some sort of password policy command to validate it's a good enough password. But doing so _securely_ is non-obvious in multiple ways (can't call it from $PATH, don't leak the new password through /proc) and the fact that the unix guys DIDN'T do this back in v7 means there's no standard for it.

Of course one big REASON Ken and Dennis didn't bother (modulo Trusting Trust) is brute forcing through even an unsalted 6 character keyspace was prohibitively expensive to break on a PDP-11. At 6 bits per character, on a 16 bit processor running at 1.25 mhz, assuming a trivial hash that takes 1000 clock cycles per attempt: (((1<<(6*6))*1000)/1250000.0)/(60*60*24) is 636 days to exhaust the keyspace, and that was on a shared machine where people would probably NOTICE the long-running job. Unix wasn't regularly networked until Vaxen running BSD replaced the original IMP hardware in 1980, so the Labs' threat model was "your coworkers" and the AT&T Patent and Licensing Department's secretarial pool wasn't all THAT rowdy in 1972. Even the people on the early arpanet were all a certain amount of vetted anyway since you needed a close relationship with a large institution as the price of admission until the NSF AUP changed in 1993 allowing randos to buy access, and THAT didn't happen until after the BSDI lawsuit. Richard "old stick-in-the-mud fogey standing astride history shouting no" Stallman got away with famously having no password on his internet-connected account well into the 1990s, because he refused to acknowledge the internet was no longer a small rural town because that would be allowing change to exist. (Some change being worth opposing is different than idolizing an ultra-conservative who hates the concept of change.)

Exponential growth of both processing power and the userbase made password security a real world concern (or at least elevated its priority beyond "clean desk policy inside a building you have to badge into anyway") only _after_ capitalism had muzzled the Bell Labs devs. (Unix v7 was the last _release_ from the labs, but the labs guys continued to make newer versions through Unix v10. Nobody outside the labs ever heard about it, because AT&T commercialized Unix with System III and System V, so the labs versions lost permission to be published outside the company. Same reason nobody ever saw their successor system Plan 9 before Bell Labs got spun off as Lucent around Y2K: it was proprietary and cost a lot of money to sneak a peek at. When your advertising plan is "pay me a lot of money to see what I've got", nobody's likely to bother and even if they do word of mouth doesn't spread far when nobody ELSE can see it either.) Unix features that DIDN'T come from Ken and Dennis were far less universally adopted, modulo the bits you needed to connect to the internet...

Hmmm, can I have something in /etc signaling what kind of password policy to enforce? How about the default format for new passwords is the format of root's password, and failing that sha256?
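
A sketch of the "inherit root's format" idea (hedged: new_hash_id() is hypothetical plumbing, getspnam() is the standard shadow API, and the policy question itself is still open):

#include <shadow.h>
#include <stdio.h>
#include <string.h>

// Return the $id$ to use for new hashes: copy root's if it has one, else "5"
// (i.e. sha256). The id[] size is an arbitrary guess.
char *new_hash_id(void)
{
  static char id[16] = "5";
  struct spwd *sp = getspnam("root");
  char *end;

  if (sp && *sp->sp_pwdp == '$' && (end = strchr(sp->sp_pwdp+1, '$')))
    snprintf(id, sizeof(id), "%.*s", (int)(end-(sp->sp_pwdp+1)), sp->sp_pwdp+1);

  return id;
}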


August 23, 2023

Oh goddess, glibc broke userspace again. 2.36 broke mount.h and now 2.38 is breaking crypt(), because of course it is. (Why does anyone anywhere use gnu/anything?)

Alright, not only has the passwd rewrite become time critical, but I need to get it using internal hash functions. Great. Ok, let's do this...


August 22, 2023

Giovanni Lostumbo poked me about ongoing discussions of getting a modern kernel to run in 2 megs of ram. And I of course pointed him at the old "2.6 kernel running in 256k of SRAM" thing from ELC 2015 that the Linux Foundation deleted the talk video of, but Vitaly Wool did indeed get Linux running on a "microcontroller" with no DRAM (tl;dr: 80% of the heavy lifting was kernel executing in place out of memory mapped flash, binaries executing in place out of cramfs in mapped flash, and NOMMU so no page tables).

I thought trying to get an XIP system working under QEMU might be interesting, but after a bit of digging it looks like the Linux kernel clique deleted the xip subsystem and replaced it with "dax" which does not sound like it does the same thing. The old xip stuff let you execute code directly out of mappable ROM or flash memory. But that documentation file was deleted in kernel commit 95ec8daba310, in favor of "dax", which looks like it's just more of that O_DIRECT oracle database nonsense? Haven't dug far but none of the MTD code seems to implement it...

*Shrug* I'm a bit out of the loop here, all the people I know doing this stuff are still using the 2.6 kernel _today_ because if they ask a question on linux-kernel they either get ignored or mocked. "You're a tiny minority we can ignore and bully", "No, there are zillions of us", "We never see you around here", "Yes because if we come here you ignore _and_ bully us". See also the recent "your statically linked initramfs is weird for merely existing" argument. And you wonder why I don't push my patches at linux-kernel that often/hard anymore?


August 21, 2023

Not as good about blogging while I'm up at Fade's. More face to face social interaction, less with the computer I guess?

Fuzzy says the air conditioning went out in Austin. Radiant has been informed. They replaced the vents earlier this year and we bought the warranty service thingy from them last time the air conditioner went weird, so assuming the outside unit doesn't need to be replaced it might be under warranty. (They're the good/fast option in the good/fast/cheap theodicy. They're not cheap, but... Last Week Tonight has done multiple segments on their advertising?)

And the "blower motor" died. $1800 to replace it. The extra twice a year inspection thingy did not catch that it was corroded enough it looks like it's been undersea for months. The power flicker last night pushed it over the edge. Installed in 2015, lasted 8 years: owning a house continues to be expensive. (But renting's gotten nuts these days. Too many unguillotined billionaires.)


August 20, 2023

I recently squinted at Android's microdroid and gave up something like 5 screens in where it still hadn't explained "what is this intended for and how do I use it", which inspired me to redo the toybox main page in an attempt to answer that question for casual browsers encountering it for the first time. I put a link up top with the current release version and date, which when clicked goes to the release notes that used to be the first page. (The point of that is "proof of life". Yeah, way more missable, but prioritization questions boil down to what you decide to suck at...)

Oh hey, there's a new devuan release. I should close enough windows on my laptop I can shut it down and move over the 16 gig ram chips from the old one to the new one.


August 19, 2023

Sigh. I miss Michael Kerrisk maintaining man7.org. For one thing, if you try to drill down from man7.org it bounces off of kernel.org/doc/man-pages (for historical reasons), and THAT page got zapped to not point back to Michael Kerrisk's website. So although all the old pages are still up, the indexes that let you FIND them got deleted by the kernel maintainers. With no replacement! Golf claps everyone! (The kernel clique is turning all 50, none of them have learned how to do this in 30 years, so "new people learning how to do this" is no longer part of their worldview.)

It would also be nice if I could ask Michael Kerrisk whether I need to do a regfree() after a failed regcomp(). Does the error path leak memory? Who knows? Posix says "undefined" which is NOT HELPFUL. (Musl's internal error handler already calls regfree(). Is it safe to call it twice?) It would be nice if the man page got updated.
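
In the meantime the defensive pattern is to only ever regfree() what regcomp() said succeeded, ala this sketch:

#include <regex.h>

void example(char *pattern)
{
  regex_t re;

  // only free what successfully compiled: freeing after a failed regcomp()
  // (or freeing twice) is the undefined part
  if (!regcomp(&re, pattern, REG_EXTENDED)) {
    // ... regexec() calls here ...
    regfree(&re);
  }
}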

Also, since "groupdel --help" produces help text, it would be nice if there was some indication of what the legal characters for user and group names are. The /etc/passwd file seems ok with everything but colon, newline, and NUL, but the above is an example of, shall we say, "another constraint"...

Yes, I remember the blog post from earlier this year about building my own HTML version of the man pages git repository and posting it on my website, which alas doesn't quite work as-is because the repo that's there does file:///usr/include/stdio.h links and the new maintainer doesn't seem to care at all about web versions of anything so I'd have to do it myself (and I don't do CSS so the result's likely to look like the rest of my blog at best). It's ON THE TODO LIST...


August 18, 2023

Heard from Google that there's budget cuts and they can't fund me full-time next year, but they're requesting _some_ funding for me which is way better than nothing. (Fingers crossed it goes through.)

I thought if I had a year of full time focus I could get everything done, but the hard stuff remains hard and I've been prioritizing support requests so there's quite a bit of ping-pong. And I don't exactly have writer's block on the videos, more... paralyzing perfectionism? Hmmm. Stream of consciousness is trivial, complete and coherent explanations not so much.

Still, it's been lovely, and if it wasn't for a half-dozen household emergencies (most recently needing to replace all the ducts in the house and the hot water heater) we'd have paid off the home equity loan we ran up over the pandemic. (I should just sell the house and move. Which would involve vanishing for a month while I packed everything out...)


August 17, 2023

Darn it, mstdn.jp is borked. It works fine from the app and when I'm logged in, but when I use my phone's browser (not logged in) or an incognito window (ditto) it loads as a black screen. Which is also what happens when I send somebody offsite a link to one of my posts.

I pulled up the site's "about" page (while logged in) and that gave me a blob of japanese text that google translate says means email sns at bunsan dot social. I gave that a try, and got back a Delivery Status Notification (delay) the next day. Not a good sign.

Is this some sort of language detection thing? When I joined the site it was run by "sugitech" which was a journalism organization run by a 20-something woman, but I heard rumblings of them handing it off to a big organization with deep pockets when Twitler sent a flood of refugees their way. (Which involved cloudflare failure messages for a bit as the domain got handed over, don't ask me how a CDN works into activitypub and regularly updating individual timelines, but the site did get a lot more responsive when it came back up.)

I noticed the phone link issue on the bus heading to the Barbie movie, so it's been going on for at least a few days already and has not resolved itself. I really don't want to switch servers, in part because selecting a new server is annoying. (I still haven't moved my email off gmail!) Luckily b0rk made a guide to running your own mastodon instance. (There are various places that'll run a dedicated mastodon container for you in the $5/month range, In theory I can have arbitrary subdomain.landley.net addresses redirect who knows where. But A) I'm _BUSY_, B) although requesting an archive gives you all your old posts, there's no obvious way to load those posts into the new server. (I mean yeah the old links won't redirect automatically either, but I could at least give out NEW links to old posts on the new server instead of the content going away if the old server does. Nobody seems to have written a "parse the json and manually stick the posts into the database" script yet, although I can't say I've looked that hard...)

Sigh. Throw it on the todo heap...


August 16, 2023

The other issue with this cp -r stuff is filehandle exhaustion. I'm using openat() variants for everything I can so I'm not re-traversing paths that can change out from under us, which means as cp -r is descending into directories it's opening two filehandles per level, so when the directory gets >500 levels deep the default 1024 filehandles allowed in a linux process (ulimit -n) get used up.

I have a plan for rm -r to teach the traversal to close filehandles above the current parent and re-open them via ".." (and then compare the dev:ino pair in stat and traverse back down from the top if it's not the same). I can optimize that slightly by: A) allowing the first 50 or so directory levels to keep their filehandles so the common case never hits this, B) leaving discontinuous filehandles open (if stat ".." doesn't give the dev/ino we have for the parent, keep the filehandle) which catches symlinks (but not bind mounts). I still need the "drill back down from top" logic in case somebody does a "mv" down in the tree during a traversal, and yes it might not find the directory we were in. The question is what error handling looks like there: maybe error_exit()? There are potentially DIRTREE_COMEAGAIN calls I can't make if I haven't got a parent filehandle, and again we wouldn't do this for the first 50 levels which should cover any non-pathological filesystem layout. (Yeah, famous last words...)
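
A sketch of that ".." re-open with the dev:ino paranoia (reopen_parent() and the saved_dev/saved_ino arguments are hypothetical stand-ins for wherever the traversal stashed the parent's stat on the way down):

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int reopen_parent(int child_fd, dev_t saved_dev, ino_t saved_ino)
{
  struct stat st;
  int fd = openat(child_fd, "..", O_RDONLY|O_DIRECTORY);

  // same dev:ino we descended through? Then it's really our parent.
  if (fd != -1 && !fstat(fd, &st) && st.st_dev == saved_dev
      && st.st_ino == saved_ino) return fd;

  // symlink, bind mount, or mv shenanigans: caller re-drills from the top
  if (fd != -1) close(fd);
  return -1;
}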

I probably want to do it WITHOUT keeping the first 50 for a release or two, just to catch errors in the less-used codepath.

Doing something similar with cp means saving a dev/ino pair for the NEW directory somewhere. Right now dt->extra is the filehandle of the destination directory, and even if I did want to sometimes replace that with the dev/ino pair (and had a reliable way of distinguishing which it was... negating it only leaves me 31 bits in a long on 32 bit platforms) there isn't enough space to store _both_ device and inode in one integer (even with clever bit packing/shaving, kernel_dev_t is 32 bits and inodes can be 64 bits on modern filesystems, sort of goes in the "large file support" bucket, disks are big now). Putting a struct there adds malloc/free I haven't got construction/destruction callbacks for. With dirtree_path() I can request extra allocation space up front, maybe I need something like that for dirtree? (It's already a variable sized object because of the name string at the end.) Where would I PUT it? We haven't got global dirtree traversal data (a lack I've noticed before), and it's not easy to fit that extra info into the function call API. But realloc() after the fact has the problem that the pointer can change when you do that, so other things that point TO it need updating. Hmmm...

There's a reason it's still on the todo list. :)


August 15, 2023

I have a pending cp.c change where xattrs don't apply to directories, and the problem is there's no mkdir variant that returns a filehandle to the open inode, you have to create-then-open which is a race window for shenanigans. Applying selinux labels after such a race window is just CONCEPTUALLY WRONG. But also unavoidable with the existing API?

Having cp operate in a less than ideally secure way is annoying but not unprecedented. Applying SECURITY LABELS in an insecure way is just... why bother? I have a conceptual objection to this. It bothers me.

In theory I can open (not following symlinks) and then do some paranoia on the filehandle: confirm S_ISDIR and that .. is the expected dev:ino of the parent, right user:group, it's on the same dev... except that cp -a is EXPECTED to follow an existing symlink if it's there, isn't it? I'm going "what if they did a bind mount" but... normal use case could theoretically have a bind mount. If you cp -a into an existing directory, does it modify the ownership and permissions of that directory?
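
Something like this sketch (make_and_open_dir() is hypothetical; it narrows the window and detects swaps after the fact, but can't actually close the race the way a mkdir-that-returns-a-filehandle could):

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int make_and_open_dir(int parent_fd, char *name)
{
  struct stat st, up, dots;
  int fd;

  if (mkdirat(parent_fd, name, 0700)) return -1;
  fd = openat(parent_fd, name, O_RDONLY|O_DIRECTORY|O_NOFOLLOW);

  // paranoia: we own it, it's on the parent's device, and its ".." really
  // is the directory we just created it in
  if (fd != -1 && !fstat(fd, &st) && !fstat(parent_fd, &up)
      && !fstatat(fd, "..", &dots, 0) && st.st_uid == geteuid()
      && st.st_dev == up.st_dev && dots.st_dev == up.st_dev
      && dots.st_ino == up.st_ino) return fd;

  if (fd != -1) close(fd);
  return -1;
}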

These are design questions I need to resolve, and then add tests for, but it's the kind of tests that requires magic build environment that has/supports xattrs and runs as root so it can fiddle with ownership...

Alright, if cp creates a directory with permissions 700 and populates it and then adds world fiddlable permissions on the way back OUT, then at least other users shouldn't be able to take advantage of the create/open window. It already does a DIRTREE_COMEAGAIN chmod, but that's because we forced the directory to be writeable to ourselves, and because other changes that drop the suid bit already do. This means if you ctrl-C in the middle of a copy the permissions are only SLIGHTLY wrong, but if I create directories 700 then an interrupted copy's directory permissions are VERY wrong, and access during the copy is no longer a thing. Plus if we didn't create the directory then we don't change its permissions and thus its contents aren't protected, and directories we create at the top level would be in an existing directory that wasn't protected, so that _can't_ be a complete fix. (Unless I create each directory in a hidden .subdir, then open it and "mv newdir .." into place, which is just WAY too magic and again kill -9 would leave debris...)

Sigh. Secure vs obvious.


August 14, 2023

And the btrfs fix got merged into the btrfs maintainer's tree. We _just_ missed the -rc6 pull but there might be an -rc7 pull before the release. I have NO idea why the commit's in the log there twice (as c5e6134bb363 and also 9b378f6ad48c an hour apart) but I am NOT ASKING. Selling past the close, trust the process, do not interrupt the enemy while he is making a mistake...

Got distracted by sort.c for a bit until I hit a snag. (It's probably "skip" but wanted to ask first.)


August 13, 2023

Saw barbie. It earned that billion. Have not heimed yet, might barb again as a lead-in. Or perhaps afterwards. (Fade is already looking forward to watching the DVD extras.)

Tested the btrfs fix so I could post my In-Triplicate-By: line to the mailing list in accordance with the prophecy. Alas I can't seem to link to spinics at the moment because it's failing to connect. (Meaning _both_ btrfs web archives are down; one went away in either 2016 or 2020, the other connected this morning but won't now, possibly it's a phone tethering vs apartment wifi issue?) But here's a cut and paste of the reproduction sequence I sent there:

$ mkroot/mkroot.sh CROSS=x86_64 LINUX=~/linux/btrfs-patched KEXTRA=BTRFS_FS
$ cd root/x86_64
$ truncate -s 1g btrfs.img
$ mkfs.btrfs btrfs.img
$ ./run-qemu.sh -hda btrfs.img
# wget http://10.0.2.2:8888/btrfs-test-static
# chmod +x btrfs-test-static
# grep btrfs /proc/mounts # just confirming
# mkdir /mnt/sub
# cd /mnt/sub
# for i in {1..1000}; do touch $i; done
# /btrfs-test-static
# exit

That didn't include the btrfs-test-static source or build because it was already on the list, but that's:

$ cat test.c
#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  DIR *dir = opendir(".");
  struct dirent *dd;

  while ((dd = readdir(dir))) {
    printf("%s\n", dd->d_name);
    // rename each entry out of the way and back: on the buggy btrfs each
    // rename re-queues the entry into active getdents() sessions, so this
    // loop never reaches the end of the directory
    rename(dd->d_name, "TEMPFILE");
    rename("TEMPFILE", dd->d_name);
  }
  closedir(dir);
}
$ x86_64-linux-musl-cc --static test.c -o btrfs-test-static
$ toybox netcat -s 127.0.0.1 -p 8888 -L toybox httpd .

The bug was that changes to the directory were appended to active readdir() sessions (actually getdents() under the covers), meaning if you traversed the directory touching files your readdir() would never end, which hit users of my find implementation trying to build AOSP on btrfs. I could have worked around it, but not _reliably_. There's an unavoidable denial of service attack if one proceess can pin another process's readdir() in a loop. (Yeah, maybe process scheduler batching would prevent it but do you want to trust that? Are you sure systemd never does a readdir() on a user-modifiable directory?)

This was found by toybox's find --exec, which ran a process for each directory entry as it was read. I could have switched it to DIRTREE_BREADTH to read each directory's contents into memory before running the first child, which would have worked around this trigger for this bug. (And been noticeably worse on embedded systems with low memory, thus I'd want to NOT do it for most filesystems meaning add a config option for it, but I don't LIKE having this sort of config option so it would have been called something like CFG_TOYBOX_BTRFS_BUG which just seemed rude. I could also have added a find --breadth command line option, and might yet, but that just gives me something to REPLY to bug reports with. Nobody would ever organically AVOID being hit by this issue, it would be a mop and bucket to clean up with.)

But read all then use would still be vulnerable to something like my test program running: it doesn't just pin itself, it pins ANY OTHER PROCESS doing a readdir() on that directory. They all get broadcast updates ala inotify, so "not triggering ourselves" does not actually prevent this problem.

I also thought about caching the dev:ino pairs to eliminate dupes, but a flood of duplicates coming in can still pin us forever: we still hang eating 100% CPU even if we're discarding them so the OOM killer doesn't zap us. Maybe our process would eventually outrace the other one due to scheduler batching letting us run to the end of what was queued so far before the other process gets to add more... modulo SMP, and assuming the other one isn't a malicious actor WANTING a denial of service that spawned 16 threads to hammer a directory (which is just dentry spinning, doesn't even need to hit backing store if they never get old enough to flush). And that's if the libc's readdir() implementation was using a big enough getdents() buffer size under the covers that we ourselves don't schedule a bunch because of all the system calls we're making to fetch entries one or two at a time...

The caching could stop at the first duplicate, but that would stop _early_ on filesystems that store things in trees or hash tables and return _some_ duplicates because "it moved later in the tree/hash we're traversing" means a renamed entry gets returned again under the new name, just not in a way that results in an endless loop. You'll reach the end of any given tree/hash table eventually, entries move earlier as often as they move later (and even a calculated attack is traversing a finite keyspace plus the calculation's gonna slow the attacker down so the defender outraces it and terminates). That does mean such filesystems can miss renamed entries that jump BACK past a traversing cursor, which is its own kind of bug, but a much smaller one. (I suppose I should make toybox rm -rf try to traverse a directory _again_ if it can't delete it, in case it missed a renamed entry? Eh, file creation is the same exploit, and that can legitimately happen at any time. Do a good faith effort and report shenanigans if it's changing out from under us, which is the current behavior.)

And if I _don't_ stop on the first duplicate, when _can_ I stop? As many duplicates as I've read entries? That fails if the first entry is renamed: read one, dupe one, ignore rest of directory. (Ok, astronomically unlikely but still not RIGHT.)

So the workaround I'd worked out before they fixed it was "cache entries, stop at 16 times as many dupes as legitimate entries" which is a horrible evil heuristic but would at least terminate while finding entries reasonably reliably. And I hadn't applied it because "ew" and "just don't use btrfs", and was waiting for a _third_ bug report about it. (Although the first one was already two bug reports.) But actually fixing the problem in the kernel is SO MUCH BETTER.
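
For posterity, a sketch of the shape that workaround would have taken (hedged: list_dir() and the linear inode scan are illustration, not the actual patch; within a single directory the dev half of dev:ino is constant, so caching inodes suffices):

#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

void list_dir(DIR *dir)
{
  struct dirent *dd;
  ino_t *seen = 0;
  long unique = 0, dupes = 0, i;

  while ((dd = readdir(dir))) {
    // linear scan of inodes already returned (a sketch, not tuned)
    for (i = 0; i < unique; i++) if (seen[i] == dd->d_ino) break;
    if (i < unique) {
      // horrible evil heuristic: tolerate 16 dupes per real entry, then bail
      if (++dupes > 16*unique) break;
      continue;
    }
    seen = realloc(seen, ++unique*sizeof(*seen));
    seen[unique-1] = dd->d_ino;
    printf("%s\n", dd->d_name);
  }
  free(seen);
}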

Oddly enough this is one of the bugs posix enshrined as allowable, due to refusing to call broken implementations from the 1980s non-conformant (yes, even in the 2018 version). A posix readdir() is never guaranteed to finish, it can return infinite results on finite filesystems. But I _also_ remember linux-kernel developers arguing about this back in the day (circa the 2.5 development cycle) which is why I was aware of it in the first place: trying to return "new" files added after the directory was opened (or at least after the first getdents() call on the open file descriptor) opens denial of service attacks.


August 12, 2023

So the mkroot failure is in main.c where toy_init() frees the old toys.optargs if it's not an incremented variant of toys.argv, and something in sh.c is setting toys.optargs to something that A) isn't malloced() so can't be freed, B) isn't part of argv's existing environment space. So the free faults. Alas, the musl-cross-make cross compilers I'm building don't support ASAN so I can't get the "it was allocated here" stack dump, which I admit a growing fondness for.

The best debuggers I ever used were A) Integrated into Turbo C for DOS, B) part of some OS/2 IDE at IBM, and both went away again so I stopped relying on them and moved to my current "stone knives and bear skins" approach of editing with vi and compiling from the command line with a bunch of printf()s stuck into the code to track down problems because those tools can't easily be taken away again. Yeah this is open source but that still has life cycles. I remember when xmms was _the_ mp3 player for Linux, and was declared unmaintainable and had its last release in 2007. We all had to migrate from xfree86 to x.org, and "death before systemd" puts one distinctly in the minority these days. Lots of desktop I relied quite heavily on was tied to KDE, meaning it went away when KDE became ergonomically unusable to me (and others, although Linus is apparently far more forgiving than I am). I still miss Kmail, and Konqueror, but "giant hairball tied together so breaking one part breaks all of it" ain't my jam.

Sigh, I need to divert into doing an LLVM+musl toolchain build script from source so I can do most of my cross compile testing with clang, but there are just ENDLESS bug reports...

Speaking of which, the btrfs issue got independently reported again so I checked vger for a btrfs mailing list and then posted there and a day later, there's a fix. Very nice. Triaging the bug and getting the right person's attention is always the hard part.

There's also a cultural difference between up-and-coming projects like btrfs, which are working to convert people away from established alternatives like ext4, and entrenched king of the hill projects like Linux where gatekeepers who've been running things for a quarter century insist supplicants work to prove their issue worthy of consideration. (I watched Linux turn from the first type to the second type, and am sad.)


August 11, 2023

The junior combo at Borger King (have it our way, your way is irrelevant) has, like Wendy's, developed reasonable portion sizes. For $7 instead of $5, but this is what's within reasonable walking distance of Fade's.

Poking at the Linux From Scratch 11.3 build: yes I have a bit of swap thrashing going on here, but "driving test environment" is kind of an important thing I've been missing to organize all the OTHER work...

I miss how clean the earlier LFS versions were, this one does half the chroot in chapter 5 and the other half in chapter 7, has a fairly awkward handoff where the new chroot hasn't got "mount" in it so has to be fairly extensively set up (as root!) by the host in a way that isn't really reentrant (not a problem doing the work manually, but awkward to develop a build script in stages under), and I no longer follow the logic of the /tools directory at ALL.

In earlier LFS versions when you chrooted there was ONLY the /tools directory containing all the binaries you'd cross-compiled from the host, and you set $PATH to point at /tools/bin and run your builds in the chroot, and then you'd rm -rf /tools once you'd used it to build enough of the new system you no longer needed it. This was the original "airlock step", that made sure that none of the files you wrote from on the host wound up in the final system. But now, 2/3 of the new chroot is outside of /tools when you chroot. I'm not sure why any of it is IN /tools anymore...

I miss the Linux Luddites podcast. (Motto, "Not all change is progress", and intro tagline "Every week we try the latest free and open source software and then decide we like the old stuff better". The Linux Late Night podcast was not a sufficiently interesting replacement. Oh well...)


August 10, 2023

Fade had an appointment with her shrink (who prescribes her ADHD meds and the anti-anxiety pills) and she had me tag along to meet her. Said shrink can't take me on as a patient (and thus prescribe modafinil at me) because I'm not a UofM student, but she recommended a couple people (I am VERY OBVIOUSLY a poster child for ADHD) and meanwhile she suggested I get a sleep apnea study (a thing Fade had previously mentioned she thinks I have; I dunno, I'm generally not conscious for that part). There was a slot to see somebody to start that process a half hour later in the same building... but he stopped listening after he took my blood pressure and it was 140/90 (well I had caffeine this morning, didn't know somebody was going to measure my blood pressure). He scheduled a blood draw for monday. He did not schedule a sleep study, I have to engage with the "online portal" to do that, which means working out how to log into it again.

This will be at least the third time somebody noticed something weird about my circulation and did a blood draw. The third time I went to the emergency room with 3am chest pain in Austin I let them draw blood (since they'd done a chest cat scan and found nothing wrong; it happened every spring when I lived 2 blocks downwind of pease park and left the windows open at night, dunno what blooms that time of year but it was an annual event that stopped when I moved to the other side of campus) -- their tests found nothing wrong. And back when I was on diuretics it was after another doctor did a blood draw and found nothing wrong. My blood pressure has ALWAYS been at the high end of normal (my father was put on blood pressure medication in his 20's, not because of a problem but because of a measurement). That's not what I was there for. If an alcoholic gets shot and goes to the hospital, they're there ABOUT THE BULLET.

Sigh. The doctor did prescribe me five Ativan, so I could try one to see how it affected me and then take one before the blood draw. Haven't picked them up yet. It's... not the problem? I mean my needle phobia IS a significant problem for his desired course of action, but he's willing to put some effort into (and provide controlled substances for) pursuing the goal HE wants to see, and not a lot towards pursuing the goal I came there for.

And this is the GOOD medical system, not the completely dysfunctional Austin mess where nobody was taking new patients but Fade diligently found me a general practitioner at a "men's health" sports clinic near the UT stadium, which I had exactly ONE meeting with (he basically ignored the issues I came there for because there wasn't a bone sticking out, and I might as well not have bothered) and then the practice closed down 6 months later so I'd have to either find a new GP or drive to Round Rock. (You'd think the light rail would go there: it doesn't. You'd think the bus system would go there: it doesn't. Greyhound goes PAST it up I-35 to stop in Wacko. Similar problem trying to visit Elgin to the east without a car: it's 35 miles away. I could either walk for 9 hours or get an Ub̈er there for $150.)

P.S. When the founder of a company names it after the middle word of "Deuts̈chlan̈d Uber̈ Alles" they're an OBVIOUS nazi. Not exactly trying to hide it. Yes, Silicon Valley has a pronounced rich white male incel eugenicist "social engineering a heap of skulls" problem. Last one out of Silicon Valley remember to flush.


August 9, 2023

I've circled back to trying to clean up expr.c again but I have zero experience using it: when $(( )) math isn't sufficient I call python. This weird "some arguments are strings, some arguments are integers" business is:

$ expr abc + def
expr: non-integer argument
$ expr abc '*' 3
expr: non-integer argument

The obvious results would have been "abcdef" and "abcabcabc" but no. And then there's the colon operator, which I thought would produce a substring but:

$ expr a123b : '[0-9]*'
0
$ expr a123b : 'Z'
0
$ expr a123b : '1'
0
$ expr a123b : '[0-9]'
0

It's not true or false... ah, figured it out. It's doing the least useful thing it could possibly do:

$ expr a123b : a123b
5
$ expr a123b : a12
3
$ expr a123b : a1
2

Returning length of initial (anchored) match. Bra fsking vo.

Ah, and reading the expr.c source in pending, it returns the match length when there isn't a sub-match in the regex, and otherwise returns the string of the first sub-match. I would not have guessed that. Want to know what the expr man page says? "STRING : REGEXP - anchored pattern match of REGEXP in STRING". Which doesn't say what the actual RESULT should be at ALL.
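
For my own notes, the ':' operator boils down to something like this (a minimal sketch assuming POSIX regcomp/regexec, with my own function name, not what pending/expr.c actually does):

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// expr's ':' operator: anchored BRE match of pattern against str,
// producing \1 as a string if the pattern has a group, else the
// number of characters matched (as a string).
char *expr_colon(char *str, char *pattern)
{
  regex_t re;
  regmatch_t match[2];
  char *anchored = malloc(strlen(pattern)+2), *result;
  int rc;

  sprintf(anchored, "^%s", pattern);  // expr anchors at start of string
  rc = regcomp(&re, anchored, 0);     // flags 0 = basic regular expression
  free(anchored);
  if (rc) return strdup("0");

  if (regexec(&re, str, 2, match, 0)) result = strdup(re.re_nsub ? "" : "0");
  else if (re.re_nsub)
    result = strndup(str+match[1].rm_so, match[1].rm_eo-match[1].rm_so);
  else {
    result = malloc(16);
    sprintf(result, "%d", (int)match[0].rm_eo);
  }
  regfree(&re);

  return result;
}

Which would reproduce the 5/3/2 answers above, if I've got it right.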


August 8, 2023

Dentalized. My front teeth look like teeth again, which I wasn't sure was possible but they did an excellent job. Face all screwed up by chemicals, and when those started to wear off my _nose_ hurt, which I wasn't expecting. Wound up napping until afternoon.

Going through the pile of old patches in my toybox directory trying to at least delete the ones I historically applied. Found a half-finished "move this out of the way so I can apply something else" save from years ago that I eventually worked out was "in printf.c \0 doesn't work with %b" and managed a fresh fix for. (At least I THINK that was the issue? Found a bug, fixed the bug, deleted the old unfinished change. Best I can manage.)
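
If it WAS the \0 thing, the trap is at least a classic one: the octal escape decodes to a NUL byte, and anything strlen()-based downstream silently truncates the output. A sketch of the byte-honest approach (my reconstruction of the issue, not the actual fix):

#include <stdio.h>

// Decode %b escapes byte-at-a-time and fputc() the result, so an
// embedded NUL goes to stdout instead of terminating a C string.
// (Only the \0NNN octal path shown; other escapes pass through.)
void emit_b(char *s)
{
  while (*s) {
    if (s[0]=='\\' && s[1]=='0') {
      int n = 0, len = 0;

      for (s += 2; len<3 && *s>='0' && *s<='7'; len++) n = n*8 + *s++ - '0';
      fputc(n, stdout);  // can legitimately be '\0'
    } else fputc(*s++, stdout);
  }
}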

I've got an old patch to expr.c which I never finished cleaning up because I wanted it to share code with $(( )), but I've got a reasonable $(( )) implementation in toysh now and it doesn't look anything LIKE expr. As in they don't do the same thing: strings are variables to the shell, but literal strings to expr. Plus $((1+2)) doesn't have spaces and expr 1 + 2 has a hard requirement for spaces separating tokens. And expr hasn't got any assignment operators, single = is comparison. So with the benefit of hindsight it's not just "factor out recalculate() and have it use a callback to look up strings", it's got enough differences that large chunks of the infrastructure would need to drop out and become callback plugins, and it's probably past the point where trying to make the rest collapse together is worth it. But it's still really uncomfortable having both.

And the one in expr is just NOT MY STYLE. A table and code operating on that table using an enum to pass data between each other? So very much NOT "single point of truth". (And I've fallen out of the habit of using case statements because they're only a win about 10% of the time, and I don't think this is one of those times either.)


August 7, 2023

Gotta be up early tomorrow for an 8am dental appointment.

There's no obvious way to tell chrome that the URL bar has nothing to do with search, and it should not pollute the URL autocomplete suggestions with "how old was the series of tubes guy when he misunderstood the internet as badly as Joe Biden is doing now". I try very hard to pull up google.com and run my searches there so as NOT to pollute the autocomplete history, but chrome disguises empty pages as google pages, except what you type in there gets shoved into the "pollute your URL autocomplete" namespace. Once again, the "we know better than you what you want and will shove our way down your throat until you comply" attitude. You'd think they'd know better than to do that on Linux, but no. (From hell's heart I stab at thee, for hate's sake I spit my last breath at thee, otherwise I would already be using Windows or at least a Mac, honestly...)

People online are panicking that we all need to migrate off of chrome to avoid Google's new web DRM nonsense anyway, so I'm curious if the vivaldi browser doesn't have this un-disableable URL autocomplete pollution problem. (If it does, I at least have the mastodon contact of their project lead. He's already responded to a poke, and left to his own devices posts cat pictures. Yes, that's a positive sign.)


August 6, 2023

On a plane to Fade's. Didn't blog for a bit after getting the toybox release out, largely a "collapse" thing. There was the Taiwan talk, the toybox release, and the flight to Fade's all in a row being Looming Deadlines. (Returning for more dental work, although the University of Minnesota wants Fade to teach one more class this fall which may extend the health insurance another semester, who knows?)

Lots of airplane prep stuff: got a haircut, got a suitcase packed at the very last minute (with two requested boxes of HEB store brand cereal and a very frozen tray of Fuzzy's lemon bars).

Didn't get the 16 gig ram chips moved from the old laptop (which I left behind in Austin) to the new one, but I've noticed that the battery standby time is twice as long on the new setup. I don't think it's just the fresher battery, I suspect twice as much ram pulls more power. Plus I still haven't gotten to a good "close all the windows and shut the laptop down" point since my last visit to minneapolis.

I really need to record a proper version of the taiwan talk. I did 90% of the prep work and then hit the scheduled release window like a bird, as it were. I packed the good microphone, so hopefully I can get that done at Fade's.

Fade's posted about half-done is better than not done. I did NOT respond with a link to the Simpsons song "Do a half-assed job". But that's half my problem with the videos, I'm being a perfectionist. The other half is the same problem I had with the "simplest possible linux system" talk years ago: circular dependencies. There is SO MUCH BACKSTORY...


August 5, 2023

I noticed busybox added "tsort" (which is hubwards of hersheba), a posix command I skipped as irrelevant because nothing in the Linux From Scratch build (or the portion of the Beyond Linux From Scratch build I tried) ever used it, nor have I in my various unix poking since 1992. But it _seems_ like low-hanging fruit... Except the posix page doesn't give even a hint of what the command actually DOES, and the man page is basically the posix page. The wikipedia page at least gives an example, but... I don't understand WHY? (Sort an acyclic graph! What does it output if there IS a cycle? It outputs an error message "input contains a loop". Uh-huh.)

But... why? I mean... what's it FOR? (The history section says it was part of the innards of an ancient linker? Um... ok? As a command line utility still out there in 2023? And just recently added to busybox. WHY was it added to busybox? Is this one of those "because it was there in posix" things, or did someone actually have a use case?)


August 4, 2023

Watching the second episode of Good Omens Season 2 with Fade and Fuzzy. It's excellent.

This role really allows David Tennant to show his range: not of this world, centuries old, lives in an obsolete supernatural vehicle, passing for human but sometimes only just, saves the world by wandering around talking to people and performing the occasional minor miracle, interacts with famous historical figures but generally treats the high and mighty the same as shop clerks, treats money as a minor annoyance he can largely ignore, can't function properly without his companion...

At the start of the 10th Doctor's tenure he asked "Am I ginger", and at the end he predicted some new man would go sauntering away. Crowley is ginger and has a heck of a saunter.


August 3, 2023

Collapsed a bit after getting the release out. Gotta pack to fly to Fade's, but just sort of... not doing it. (Lot of taking my laptop out somewhere and sitting down listlessly shuffling through stuff.)

There's a form of stunlock where I have so many todo items laid out in front of me that every time I open my laptop and select a window with a todo item in it I get a different one, and do a few hours work on it (half of which is refamiliarizing myself with where I left off and working out the design again) but not enough to get it checked in, and then next time picking a DIFFERENT window. And if I spend more than a couple hours on one thing without getting it done, I go "no, I'm spending too much time on this, everything ELSE needs to get done" and swap, often without realizing I'm doing it. (Wasn't an issue before I had people waiting on my output...)


July 30, 2023

Toybox 0.8.10 is out.


July 28, 2023

Panic panic panic talk in half an hour. Cut and paste my TODO list for the talk out of the outline and into here:

#TODO: https://landley.net/bin/{toybox,mkroot,toolchains}
#TODO: upload new toolchains
TODO: test busybox package
#TODO: test extra in miniconfig

Sigh. I have now given a talk by pointing my phone camera at my laptop screen and typing with one hand. [Achievement unlocked: sigh.]

Note to self: do not assume that just because you're trying to use google meet with google's chromium browser, and because the meet page is showing you yourself through your webcam, that "google meet" won't crash when you click the "join" button. Rich says I could have worked around that by changing my user-agent string so Google Meet doesn't try to call some windows-only DRM library? The failure was trying to record video ahead of time and only testing that my phone worked with the google meet link for the Q&A, then running out of time (partly "perfectionism" but mostly "this isn't finished yet, let me try to nail this together real quick") and trying to "do it live"...

Anyway, I owe them a PROPER version of the talk. But I'd like the talk to actually describe a release version, so I need to do a release...

The commented out TODO items above mean: I made the "quick" bin symlinks so I can say https://landley.net/bin/mkroot instead of having to point people at https://landley.net/toybox/downloads/binaries/mkroot but unfortunately the way dreamhost's web server works (might still be apache?) there's no obvious way to discover the second URL from the first. I was thinking bin/toybox then click "parent directory" but that just peels off the symlink...

Built and uploaded toolchains with gcc 11.2 and musl 1.2.4. (Still built i686 on the host rather than x86-64, I should probably switch that over next time). Poked Rich about maybe actually upgrading musl-cross-make (hadn't had a commit in a year) and sent him my 3 local patches for it (not counting the package version upgrades). Redid the comment generation in linux-miniconfig so the third block says "# architecture extra" instead of architecture independent again.

Still haven't checked that mkroot/packages/busybox actually does anything useful, haven't actually run it in more than a year. Theoretically useful doing LFS bootstrapping if alpine's been regression testing that everything still builds under busybox (after I got it all working in the first place under aboriginal, thus allowing alpine to exist).


July 25, 2023

Sitting down to create the mkroot video for friday (yes prudetube emits fresh suck every day but I don't have to upload it there), and... I have 30 minutes total, which is not much time. There is SOOOO much stuff I want to complain about, and can't fit in the time allotted.

For example, in the README I have the example invocation KARGS=quiet ./run-qemu.sh -hda docs/linux-fullconfig but if you do that on x86-64 the QEMU bios still clears the screen and outputs a bunch of text (no obvious way to suppress this) despite the "quiet", which includes the magic broken esc[7l sequence which screws up bash command line editing and history (disables automatic wordwrap), and which you just have to know is undone by esc[7h which is why both the mkroot init script and run-qemu.sh emit the antidote. And then despite the "quiet" the kernel goes "you didn't enable this one specific bug mitigation, Doom and Gloom!" which... I'm running bottled code in a NAT-ted VM that's not trying to sandbox unknown code from the net? I do not care? If I did, I'd have added whatever config that is? (Plus this is an EMULATOR, I'm pretty sure it doesn't emulate the hardware flaw! And honestly, Spectre is just flaw du jour. There's tons! Did they ever even fix rowhammer? Or just smile and collectively agree not to look down?) But no, the kernel won't shut up because the kernel devs have been convinced they know better than everybody else for several years now...
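
For reference, the antidote is basically one line. (My understanding, not a spec quote: wordwrap is DEC private mode 7, so the disable sequence would be ESC [ ? 7 l and the re-enable ESC [ ? 7 h. Standalone version:)

#include <stdio.h>

// Turn terminal autowrap back on, undoing whatever the bios left behind.
int main(void)
{
  printf("\033[?7h");
  fflush(stdout);

  return 0;
}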

Argh, there's no WAY I'm describing a coherent subset of this in half an hour, let alone Q&A time. This is a similar problem to why the standalone mkroot project had a README but I don't have one in the mkroot subdir yet. If you want to build on alpine linux you're probably ok (haven't tried it), or if you use the cross compilers you're ok, but the "simple" create-a-chroot build against glibc only partly works, and explaining what's wrong is at least a 15 minute digression right there. Unless I just want to say "because Ulrich Drepper was an asshole and the bureaucratic committee that inherited the project when he bogged off to the finance industry hasn't had the spine to actually reverse any of his bad decisions". (I can badmouth instead of explain, which is TRUE but probably not helpful. "My project is good, these people are idiots"... not a good intro.)

I can start with downloading prebuilt binary versions, except... I want to reorder the mkroot binary output a bit, so linux-miniconfig and linux-fullconfig are in the "docs" directory. Just the files you NEED at the top level directory, run-qemu.sh and the files it calls...


July 24, 2023

Sigh, I don't WANT to switch browsers off chrome, but that's what's going around the zeitgeist right now. Right now vivaldi looks like the least bad option? (From one of the co-founders of Opera, who started over when his old company sold itself to china because capitalism. Yeah the code is a webkit derivative which means it's the same rewrite of a rewrite of konqueror, but that's a little like libre office forking off open office. Some of the gui stuff is source-under-glass, but people used QT for decades without caring about that part?)

Alright, what's standing between me and a toybox release: I want to fix cp -s the way I fixed readlink. I have a large /etc/passwd rewrite that (among other things) lets at least a lot MORE of defconfig build under the ndk. There's a bunch of pending sh changes but I can probably punt on that because that's NOT getting promoted this release.

I'm grinding through the LFS stuff: I ticked off dd from the pending list, and the other ones I've already done a good chunk of are diff, expr, gzip, tr, and xz. But what I really need to do is A) rerun that under a clean minimal debootstrap to get a $PATH dependency list without a bunch of extraneous crap that configure opportunistically included, B) finish the within-chroot part and log what THAT'S using. (And prepare a "yes it worked" double build smoketest I can just leave running in the background.) But I'm reluctant to do complicated things in a chroot because "ifconfig" and "date" and so on can still screw up the host context as root within the chroot. Which is why I had the unshare line, but toysh running mkroot's init script within the chroot didn't detect that stdin was already open so replaced it with the container's /dev/console which apparently goes nowhere, so I went "lemme just do this under mkroot": I made an 8 gig ext2 partition, used toybox httpd to serve a tarball of the debootstrap result, wget and extracted it into the new partition, and chrooted into that, and it didn't work and I don't remember why and need to try again.

Too much of a tangent to block the release for.

And I've got that remote "intro to mkroot" talk on friday for the Taiwan conference that should really describe how to use the vanilla release and the currently uploaded mkroot system images and so on. Need to do a 10 minute "download prebuilt binary tarballs and play", and a 10 minute "building this from source with the cross compilers" (which keeps getting derailed by "why dynamic linking instead of static linking is REALLY COMPLICATED, which is what screws up building WITHOUT the cross compilers on glibc hosts, although it works ok on something like alpine"...)

I mean honestly, I've got a bunch of tricks to harvest shared libraries out of the host toolchain, but they all suck. There's a sequencing issue about needing to select dynamic _before_ building toybox if that's to be dynamically linked, but you don't know what shared libraries you need until AFTER you've built all the binaries, and if you just copy everything out of debian it's 1.7 gigabytes of shared libraries on my install, which ain't gonna fit in initramfs. But if you try to be selective and recursively run ldd on the binaries after you've built them, plus each shared library you copy, that STILL doesn't identify the dlopen() crap that glibc calls even from static builds. (It's not IN the runtime linking, it's done by functions after the program starts running. For BAD REASONS.) So if you want to make dynamic linking against glibc work, you need a hardcoded list of additional shared libraries to copy to target in case they're dlopen()ed, and that list will of course change with new glibc releases.
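
The recursive ldd trick looks something like this sketch (hypothetical helper of my own naming, scraping ldd's "=> /path" lines via popen; the point being that no amount of this finds the dlopen() list):

#include <stdio.h>
#include <string.h>

// Print the shared libraries a binary links against, by scraping ldd
// output. A real harvester would dedupe, recurse into each library,
// and copy the files to the target. It still can't see dlopen().
void harvest(char *binary)
{
  char cmd[1024], line[1024];
  FILE *fp;

  snprintf(cmd, sizeof(cmd), "ldd %s 2>/dev/null", binary);
  if (!(fp = popen(cmd, "r"))) return;
  while (fgets(line, sizeof(line), fp)) {
    char *path = strstr(line, "=> /"), *end;

    if (!path) continue;
    path += 3;
    if ((end = strchr(path, ' '))) *end = 0;
    printf("%s\n", path);
  }
  pclose(fp);
}

int main(int argc, char *argv[])
{
  while (--argc) harvest(*++argv);

  return 0;
}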

Did I mention that Red Hat maintains glibc? Yes, the same people who did systemd. The same people who are trying to shove wayland down everybody's throats. The same people who stopped releasing their source code. IBM, You BM, we all BM for IBM.

Anyway, wasted a bit of time trying to make dynamic linking (against glibc) not just work but be cleaned up enough to be easily explicable in 2 or 3 minutes out of the upcoming talk, and it just wasn't happening.

Darn it, Microsoft github's tests are failing. Spotted it earlier but couldn't see what was wrong from my phone because Microsoft won't show me test results unless I log in. (It literally says "log in to see test results" when I click, and I'm not giving Microsoft my phone credentials.) And there's... some sort of version skew with ubuntu, maybe? Lots of "bzcat: out EOF" I'm not seeing in a "git clone toybox blah && cd blah && make distclean defconfig toybox tests" on my machine. And those aren't even the actual failures, which seem to be in tar...

Although I AM seeing spurious output from that clean run (and yes my rote memory version is disabling ASAN because I'm not letting make tests build a toybox binary but telling it to build one from the command line, but one issue at a time). It's diff saying expected/actual don't exist in the pwd tests, because it's creating ../expected and ../actual and then doing "cd ..; diff expected actual" afterwards. But pwd.tests did an "ln -s . blah; cd blah" which means ../file is doing a physical file traversal to the parent directory, but cd .. is peeling off the last $PWD entry which is the NOP circular symlink. Although it's still saying it's NOT an error because the diff produces no output! (Which is right for the wrong reasons, and _itself_ a bug.)

On the one hand, I don't want the test doing a cd to change where expected/actual live. On the other, I don't want to pollute the environment variable space with extra stuff? Still, the second is definitely the lesser evil. And I should also capture stderr as part of the diff output when detecting test failure.

None of which is what's going wrong on Microsoft github, of course.


July 23, 2023

Still checking the kernel bug report for the btrfs issue. No response yet...

Got dd cleaned up and promoted. That was one of my big "I want to get this into the next release" things I'd been holding the release for, so I might cut a release today.

Part of the dd promotion was just NOT adding the block granularity tests, and instead just waiting for somebody to complain. I THINK I'm getting them right? (I'm still not sure conv=sync is handled right, but if I'm getting it wrong I'm pretty sure the previous code was too? There's no test for it yet... ok, added a test.)

Watching twitter's dumpster fire du jour from Mastodon and being glad I got out before the frogs REALLY started boiling. The waves of "this is fine" burning dog energy are just... it's like watching catholics justify each new pedophile priest scandal and find reasons not to react or change to each new input. (Another hundred children's bodies found buried under a catholic school? Ho hum. We're still the ultimate arbiters of morality, never mind the schools we filled with kidnapped native americans are like a serial killer's backyard, after all we finally rescinded the Doctrine of Discovery in... March of this year. Now go eat your human flesh and drink your human blood which may look like crackers and wine but we insist are literally, not merely symbolically, actual cannibalism. In a good way!)

I have a history of not being on AOL, not using Windows, not using Faceboot, not drinking, and for most of the past decade not driving a car. Doing Without is pretty normal. Heck, my twitter account got blocked in 2019 for tweeting "Guillotine the billionaires" as my comment on link du jour a couple hundred times, and their "I'll know it when I see it" ever-changing community moderation trends shifted out from under me so that was retroactively No Longer Acceptable. Except it's a political position: if this country has capital punishment, how does it NOT apply to the Sackler family behind the opioid crisis having provably killed 100k people? So old Twitter-under-@jack retconned all my old posts' status and wanted me to performatively delete each instance from the history _and_ give them my phone number, and I went "oh well, no more twitter" back in 2019. Jack let The Resident keep his twitter account. "This is not a place of honor. The danger is in a particular location. It increases towards a center." Muskrat buying the thing was a matter of degree.

As for missing it... I miss Livejournal. When Russia bought that, its userbase fled. I miss the #busybox freenode channel circa 2003, which broke up and wandered off long before some Korean billionaire bought freenode and its userbase fled to three different servers. (Erik was better at community management than me. It's never been a strong suit of mine.) "This too shall pass." Insane billionaires destroying companies I grew up with like Sears (Eddie Lampert) and Toys-R-Us (Mitt Romney) were a bigger deal to me than an insane billionaire destroying a website that was founded over a decade after I graduated college.

And no, it's not "capitalism" doing it, any more than "monarchy" killed Anne Boleyn and Jane Seymour. There was a specific guy. He had a neck. Society allows Lampert, Romney, Musk, eight Sacklers, and (according to Forbes) 2629 other billionaires as of May 2023 to all sleep safely each night in a country with half a million homeless people and 16 million homes sitting vacant, with 34 million "food insecure" people (9 million of which are children) in the world's largest food exporter. That's a choice. I have a political objection to that. Our gerontocracy (Biden's 80, Pelosi's 83, Feinstein is 90 and basically a vegetable) isn't going to change until the Boomers die. The non-defunded police are constantly punching down, with bullets.

So yeah, twitter delenda est. This too shall pass and could be quicker about it. I didn't try to defend "the wall street journal" when Rupert Murdoch bought it, I'm not trying to defend twitter from its new owner. Burn baby burn.


July 22, 2023

I want to hammer my SSD _slightly_ less hard, which means both reducing the amount of swapping it's doing and running the big LFS package builds in ramfs (well, at least cp -s the source into a tmpfs mount), which means it's finally time to transplant the 16 gig memory chips from my old laptop to the new one, which still has 8 gigs (2x 4 gig; the old one has 2x 8 gig).

Alas, this involves a reboot, which involves closing piles of open windows in 8 desktops, which means a lot of cut and paste of "here's a test I ran to figure out a toysh corner case" into my sh.tests file from which I should eventually try to extend tests/sh.test into something with actual design coverage.

It's very slow going. I have quite a lot of open tabs. And trying to do the thing IN a tab tends to spin off tangents (more tabs)...


July 21, 2023

Sometimes I get a poke on github where I honestly don't understand what's being asked. Somebody finds busybox "strings" to be performance critical to them, and this should be of interest to toybox?

I feel like I'm missing something, but honestly can't spot it.


July 20, 2023

I poked a little at adding a config TOYBOX_BTRFS_BUG_WORKAROUND that enables a lib/portability.c section wrapping opendir()/readdir()/closedir() with a version that glues an extra field onto the end of the dir structure that's an array of dev_t/ino_t pairs we've seen so far... but really, adding a -breadth option is probably better? It prevents the btrfs bug from making find denial-of-service-attack itself, and if another program operating on the same filesystem does that... it's an obvious btrfs bug. A loop calling readdir() is never guaranteed to terminate on btrfs. That's a problem: no matter what filtering you do on the results, you're never sure you're DONE.


July 19, 2023

A recently restarted discussion wandered over onto kernel.org bugzilla (to confirm that they're going to say it's a feature), and when I tried to make an account it said I already had one, and I went "how long ago was this" and guessed a password I haven't used in at LEAST 10 years. Which was it. That's disturbing.

Then I composed a comment because I think they should have the simple standalone 15 line C program that reproduces this issue instead of knee-jerk saying "it's a toybox bug go away". Also, "this is a denial of service attack waiting to happen" seems an important point to make? Could just be me. (A program running as root traversing a userspace directory could get pinned in an endless readdir() loop by a program repeatedly renaming a file.)
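
The shape of that reproducer (my reconstruction from the description above, not the actual attachment): loop readdir() on a btrfs directory while another process renames a file in it, and watch it never hit EOF.

#include <dirent.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  DIR *dir = opendir(argc>1 ? argv[1] : ".");
  struct dirent *dd;
  long count = 0;

  if (!dir) return 1;
  // On btrfs, with a concurrent rename loop, this never returns NULL.
  while ((dd = readdir(dir))) printf("%ld %s\n", ++count, dd->d_name);
  closedir(dir);

  return 0;
}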

As usual, my comment went on a bit long and I decided to edit out most of the more inflammatory "I assume you're going to be stubborn about this so let's explain just HOW problematic the position I expect you're defending is" and cut and paste it to my blog (standard move for me), but when I did that and hit "preview" it went "your session token has expired, log in again".

So for security reasons, bugzilla.kernel.org would not generate a preview if I took too long composing the message, but happily let me log in using a 10 year old _highly_ insecure password. That's nice.

Here's the text I removed:

It seems like other filesystems are trying to provide a snapshot of the directory at query time (with the same stuff-shows-up-and-stuff-goes-away problems of "ps"), and btrfs's readdir() is trying to be inotify as well, appending a stream of updates that happened after the open and first getdents(). (I'm guessing it does not append deletions because getdents can't return negative dentries.)

While caching entries before acting on them prevents us from doing this to ourselves (so never use btrfs on embedded systems if it's unsafe to use without extra caching), that still doesn't prevent any other process from intentionally making a readdir literally continue forever, hanging programs that don't know they need a workaround for this filesystem's behavior.

And coming up with a workaround is non-obvious, because "stop at the first dev/inode pair I've already seen" would break on a filesystem that returns entries in alphabetical order (so renaming bcd to ghi could repeat an inode before returning a zzz we hadn't seen yet, but that doesn't mean it wouldn't EVER terminate). Any method to stop before EOF is an imperfect heuristic, and continuing to EOF is never guaranteed to terminate on btrfs.

Hmmm... maybe "I've seen every previously returned dev+inode a second time" is a good btrfs loop detector heuristic? Probably not going to see any new ones at that point...

Sigh, I can see the logic of "if we make getdents() act like inotify() that's more _reliable_", but as with many security things the logic is wrong. The result is losing the guarantee that getdents() will _ever_ end, which is a livelock denial of service waiting to happen. Inotify exists, if you _want_ to make something "reliable" like that in userspace you can. And then it's YOUR job to work out how to defend the extra complexity _you_ just added against denial of service attacks like the one I just got a bug report about.
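
Back in the present: a sketch of that "seen every pair twice" heuristic (untested guess, with a linear scan where a real one would want a hash or sorted array). Record each dev/ino pair, bump a counter on repeats, and declare a loop once everything recorded has come around at least twice.

#include <stdlib.h>
#include <sys/types.h>

struct seen { dev_t dev; ino_t ino; int count; };

// Returns 1 when every dev/ino pair recorded so far has been returned
// at least twice, the "this readdir() is probably looping" signal.
int record_entry(struct seen **tab, long *len, dev_t dev, ino_t ino)
{
  long i;

  for (i = 0; i < *len; i++)
    if ((*tab)[i].dev==dev && (*tab)[i].ino==ino) break;
  if (i == *len) {
    *tab = realloc(*tab, ++*len*sizeof(**tab));
    (*tab)[i].dev = dev, (*tab)[i].ino = ino, (*tab)[i].count = 0;
  }
  (*tab)[i].count++;
  for (i = 0; i < *len; i++) if ((*tab)[i].count < 2) return 0;

  return 1;
}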


July 18, 2023

I have received an "event invitation" from the Taiwan guys (which thunderbird has somehow processed, in its "I want to grow up to be Microsoft Office" way), stating that at least the Q&A part happens via Google Meet. Ok, my phone can do that.

I 100% want to send them the videos ahead of time. Possibly post them to that youtube channel I haven't been using because youtube's gotten... Urgh, I could add a hundred more links to those three. Nobody who actually posts to Youtube seems to enjoy being there anymore.


July 17, 2023

Staring at toys/pending/hexdump.c which... there's nothing fundamentally WRONG with it, except that toybox has od.c (from posix) and xxd.c (from Elliott's personal preferences) and hexedit.c (interactive gui tool) and this is a FOURTH implementation of basically the same functionality that shares NO CODE with the other three.

It's... I mean... usually when I'm dumping stuff on the command line I use "hd" which is an alias for "hexdump -C", and if I had to pick one of od or xxd or hd I would totally go for hd. But... FOUR implementations of the same general thing? Sharing NO CODE?

Keeping it out is a bad call, putting it in is a bad call. And unifying the implementations sticks on "xxd isn't in my wheelhouse". I dunno what success looks like there because I don't use it, and that one's full of sharp edges. Although half the code is do_xxd_include() and do_xxd_reverse(), and looking at dehex() I really want to replace it with sscanf()...
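
Because the sscanf() version of a hex pair decoder is tiny, something like (untested, and note %x skips leading whitespace, which may or may not be what xxd -r wants):

#include <stdio.h>

// Decode two hex digits into one byte, replacing a hand-rolled dehex().
int dehex_pair(char *s, unsigned char *out)
{
  unsigned u;

  if (sscanf(s, "%2x", &u) != 1) return -1;  // not hex digits
  *out = u;

  return 0;
}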

Sigh, printing out a hex dump is easy to do. There's not MUCH code, which is why it's hard to share. But FOUR implementations, sharing NOTHING? Ouch...


July 16, 2023

Sigh, I'm trying to set up debootstrap under mkroot, which means doing a quick and dirty rm -rf blah.img && truncate -s 8g blah.img && mke2fs -j blah.img && ./run-qemu.sh -hda blah.img but... there's a bunch of WEIRD going on here. Bash in the chroot is saying that /dev/ttyS0 is not a controlling console, but SERIAL_8250_CONSOLE is enabled in the kernel config? I'm using qemu so I can do chroot _without_ unshare. No container weirdness here, it should be a normal chroot.

And then I'm transferring in the data by running toybox netcat -s 127.0.0.1 -p 8888 -L toybox httpd . on the host, and within qemu doing wget http://10.0.2.2:8888/debootstrap.tgz -O- | tar xv which is looping saying "inflate EOF", which is just weird. (Why _looping_? How is it not _exiting_? It works fine if I fetch the file and extract it with tar -f but not when wget pipes into stdin?)

I'm trying to write this down and NOT go down tangents debugging them, but it's hard. (Define "focus". This is a prioritization problem.)


July 15, 2023

The entry on the 13th was kind of long and my diversion into the gnu "info" format was an off-topic tangent, so I moved it here. :)

The REASON the "makeinfo" command should just not exist in 2023 is that the gnu info format was a derivative of "gopher" from before html took over, meaning it DIED THIRTY YEARS AGO. The University of Minnesota announced it would charge license fees for gopher in February 1993 but CERN disclaimed ownership of www, so http:// became ubiquitous and gopher:// got strangled even though Mosaic could read from both servers. This was another "IBM lost its lawsuit against Compaq so PC clones were royalty-free, Apple won its lawsuit against Franklin so Apple II could not be cloned" all over again. (Apple begged Steve Jobs to come back in 1997 because it was _dying_.) NFS was a terrible network filesystem protocol, but Sun released it for free and the competitors like IBM's AFS were proprietary.

In 2003 the gnu documentation maintainers agreed info was obsolete and should be replaced. I know this because Eric Raymond was writing "doclifter" (a tool to parse man page output and heuristically produce docbook), and he asked them while I was sitting next to him in his home office.

I met Eric before he went crazy: he used to be a bit weird but functional. Back before my mother's cancer returned she lived in Marlton New Jersey and Eric lived in Malvern PA, about an hour's drive away. He had a gun hobby but it was like an archery hobby, really didn't come up much. (He had exactly one rifle or possibly shotgun in the basement, I never saw him fire it. He'd had some kind of pistol for use at shooting ranges before that, but it got confiscated out of his luggage by the TSA after 9/11. He was mad about that because both Terry Pratchett and Larry Wall had borrowed and used it at a shooting range adjacent to various conventions.)

I met Eric at Atlanta Linux Showcase in 1999, and again at Worldcon in 2000, after reading The Cathedral And the Bazaar and tracking down as much else of his writing as I could find. He lived an hour's drive from my relatives, so I promised the next time I was in town I'd drive by his house and drop off a copy of the movie 1776 (because my theory at the time was ESR was Ben Franklin, RMS was John Adams, and Linus Torvalds was Thomas Jefferson, at least as portrayed in that movie. I also gave RMS a copy when I visited _him_ in Boston in February 2001, and he was very upset I thought he was Adams, he thought HE was Franklin.)

Eric and I started collaborating on stuff (I was writing for The Motley Fool, he'd been stalled on The Art of Unix Programming for a while, I offered to review and help edit...) The morning of September 11, 2001 I'd spent the night on the futon in Eric and Cathy's basement and was driving home when I noticed multiple police cars driving aimlessly with their flashers on before I'd even left Malvern, and turned on the radio to hear about "the hole where the World Trade center used to be", and turned back around and knocked on Eric's door and went "I think somebody just nuked the World Trade Center" and we went back in and I tried to get Slashdot to load while he loaded The Drudge Report. (I remember cnn.com was down, whether from load or due to half the eastern seaboard's internet having gone through the WTC's basement was not immediately clear.)

I say "before Eric went crazy" a lot, but he was important to open source for years. It's like Newton spending the end of his life studying alchemy and magic, or Linus Pauling deciding massive doses of vitamin C weren't a placebo despite being measurably flushed out of the body. And of course William Shockley (who stole credit for the transistor from Bardeen and Brattain) and James Watson (who stole credit for discovering the structure of DNA from Rosalind Franklin) announced themselves as virulent racists as they got older. (Eric had at least DONE his early work.)

Eric was a Libertarian back when the Koch Bros had whole think tanks devoted to capturing and radicalizing Libertarians. The Nobel Effect plays in here a bit, and the way smart successful people Dunning-Kruger hard in adjacent areas to their speciality. (If you're good at one thing you must be "smart" as a universal trait, and thus good at everything, so how hard can everyone else's areas of earned expertise be?) Smart people are often _more_ susceptible to con artists, and depression works in here too, they do the work of seeing the faces in clouds and building stories to justify their expectations.

9/11 didn't hit me that hard. I honestly didn't see why it was a bigger deal than The Oklahoma City Bombing six years earlier. I'd grown up on Kwajalein, which was littered with World War II debris, and where they regularly tested ICBMs for eventual use against Russia or possibly China. When my family moved to New Jersey I used to hold my breath and count after planes went overhead the first year back in the states because WHAT IF THAT WAS AN INCOMING NUKE? I was 10 in a strange country with Ronald Reagan in charge, and I'd only heard the ICBMs _launch_ from Kwaj, the incoming ones from Vandenberg Air Force Base in California generally came down miles away in the lagoon. (Except that one where they tried a land impact and missed and wound up pointing the tradex at brand x, but... tangent.) Kwajalein was too small and isolated to be much of a target, although my mother had explained to my sister, when my sister got a silver necklace for christmas when I was 6, that if civilization collapsed she could trade that for food so she wouldn't have to eat our cat Namaur immediately. (The Boomers were NOT all right, even back then.) Me, I was the strange child viewing civilization like an alien anthropologist going "If Rome collapsed 2000 years ago and civilization rebuilt itself, then if we DO have a nuclear war we'll probably be back where we are now in another 2000 years, but we need to solve aging before losing our current tech level for that to matter..." (Did I mention I read The Ship Who Sang when I was 7?) Anyway, once we were back in the states Reagan was on TV all the time going on about how mustache-twirlingly evil the Russians were, and "The Day After" was on TV, and Sting was singing "if the Russians love their children too" and Star Trek's future history just _assumed_ we'd go through a nuclear war before rebuilding and managing serious space travel, and I really wasn't HAPPY with the move to New Jersey between the definite nuclear targets of New York and Philadelphia. (Also, we moved from Florida to Kwaj when I was 5, moved from Kwaj to NJ when I was 10, and did NOT move out of New Jersey when I was 15. Unfair.) Still, it was a bit of a stress relief when the berlin wall came down in high school. China could still nuke us, but since Nixon did the divide-and-conquer thing and started outright bribing them, they hadn't really wanted to. They made way too much money off of us.

In comparison, "a couple airliners got hijacked and suicide bombers took down a building"... and? Hijackings were a regular-ish occurrence in the 1970s, and Japan had done kamikaze plane attacks through World War II. Again, Kwaj was a big WWII battleground and the military housing we lived in was built after the US navy took the island during the war: while collecting shells on the reef "45s" were less common than cone shells or brownies, but more common than strawberry cowries.

Some planes hitting a building a hundred miles away was not REMOTELY an existential threat. I mean yeah, a big shame, and my brother had visited that building and had a cup from the cafe on top in the dishwasher. But they'd blown up less than 5% of one city in a country that had multiple dozens of big cities and hundreds of little ones. Hurricane Andrew had caused a much bigger swath of destruction, and hadn't somebody already tried to blow up the World Trade Center a few years earlier with a car bomb in the basement? I couldn't understand why it was such a big deal, but everybody around me was PANICKING... One more way I didn't fit in, all I could do was wait for them to work through it.

(I didn't understand that Boomers were raised on duck-and-cover rhetoric, where the country would pop like a soap bubble at the first attack by a foreign power. Even though the cold war was over, they couldn't grok that there were ways to be attacked by a foreign power that did NOT mean the immediate end of world civilization. Even today, people keep thinking that Russia's tantrums mean World War 3 if they don't get everything they ever ask for, when the most GENEROUS estimates of their current ICBM and warhead capacity are that today they MIGHT do about as much damage to the rest of the world combined as the USA did to Japan with Fat Man and Little Boy in 1945. "Two cities lose 20% of their population" is the kind of thing Ukraine is going through NOW on a regular basis (Mariupol basically no longer exists) and they're still fighting. The USA had already beaten Japan in 1945, but Truman needed something showy enough to surrender to rather than fighting to the death. The "thousands of missiles on constant alert" thing 20 years later was like the bomb shelters stocked with food: we paid rather a lot of money to maintain them and they went away again when we stopped paying for them. A dozen disgruntled junior Saudi royals never had that kind of resources: they were suicide bombers taking advantage of cockpit doors that didn't lock and passengers who expected to live if they didn't resist. If that could ever be an existential threat Israel wouldn't have lasted 5 years.)

In response to 9/11 Eric doubled down on the gun-nuttery, because he coped with the stress by writing a terrible post (on _paper_ in that restaurant with the raspberry cheesecake) about how it wouldn't have happened if everybody on every plane had guns at all times (which I didn't believe, but he was stressed and feeling helpless and needed to vent). And then Eric defended himself online from the inevitable blowback, thus doubling down more. And later his wife Cathy (a lawyer who wanted to become a judge) got involved with local GOP politics (the path to judgeship, apparently) during the Cheney administration's "Duct Tape and Plastic Sheeting" days of warrantless wiretaps and the TSA being a law unto themselves while Halliburton invaded Iraq (stealing 2 _billion_ dollars in cash along the way, as in it vanished from shipping containers in their/blackwater's custody) and the "threat level" changed daily with blue being one of the options for some reason. (The dubyah administration had no fucking clue, but everyone would have rallied around a potted plant. Huddling together for reassurance, really. It became unpatriotic to make fun of what we had all previously agreed was a clearly incompetent idiot puppeted by Darth Vader. Everybody had to fall in line and obey, just like under McCarthyism in the 5 "red scare" years right after the soviets detonated their first nuclear bomb.)

Over the next few years Eric gradually got more brittle (hanging out in the online spaces the Drudge Report and libertarianism led him to) until my ability to collaborate with him went on indefinite hiatus around 2009. While I was launching Penguicon and helping defend IBM from SCO around 2003 we still worked together great. We finished the 64 bit transition paper in 2006 with growing but still manageable friction. We made multiple attempts at finishing the "Why C++ is not my favorite language" paper starting around 2008 but the last couple visits turned into shouting matches. (The Koch Brothers had think tanks to capture libertarians, so on my visits Eric kept showing me articles about how seeding the oceans with iron might cause huge algae blooms that would sink to the bottom and trap carbon, and I was going "whale fall is not new, stuff eats food even on the bottom of the deep bits, it'll all be back in the atmosphere in twenty years tops", and showing me his "research" about how oil might be generated in the planet's mantle and seep up towards the surface and thus essentially never run out, and I was going "I just worked a contract at Ion Geophysical in Houston where I MET the guys who came up with that lie, you are DEEP into some sort of cult nonsense here and this is Tobacco Institute levels of targeted lying which you should NOT be dumb enough to fall for"... I remember one car trip towards the end where he was telling me about the book "The Bell Curve" he was impressed by (and me going "no"), and in his office him explaining to me how eskimos could DEFINITELY all rotate 3D objects in their heads because they could get rid of waste heat more easily because living further north makes people "genetically smarter" and me calling bullshit; by that metric bald people would be measurably smarter and nobody could think while wearing a hat: I was pre-med in college, temperature is equalized by blood circulation, excessive heat loss through the head is a BUG not a feature, and heat dissipation has never been a limiting factor on THOUGHT, nobody should fall for this! Sigh. Eric's online social channels were feeding him targeted crap, because libertarian gun nuttery was identified as exploitable by the Koch Brothers' think tanks. People were ALREADY writing articles about the "libertarian to fascist pipeline", but he was too far gone. I _watched_ him get radicalized, and couldn't stop it. The same think tanks got the atheists too, as I said basically exploiting Nobel Disease. People who think they're smarter than the grifter are the easiest marks.)

(One of the first times Eric got "you just accidentally spit on me" mad in 2008 was when I said "we're having a semantic argument" and he insisted that did NOT mean we're uselessly arguing about the definition of words. He had some sort of philosophical nonsense about "semantic" being the most important category of knowledge or some such, talked about a philosopher whose name I didn't recognize, and patronized me a bit for neither knowing nor caring about him. I was never convinced, I just stopped arguing. He got ANGRY about it...)

I stopped visiting Malvern after 2009, and in 2011 we stopped speaking to each other at all when he went full climate change denialist and I asked him "When did you turn into Glenn Beck?" on twitter. (He did not take that well. We didn't speak again for many years, and that was one brief phone call nominally burying the hatchet because Cathy asked me to. Eric had at least acknowledged that climate change denialism specifically he'd been wrong about, but his libertarianism led him down other right wing loon paths...)

Anyway, I miss my friend, I'm sorry he went crazy. But the point is, back in 2003 even the GNU maintainers admitted that the "info" documentation format was toast, so still using it 20 years later is just SAD.

Admittedly the gnu devs' "we'll try doclifter when it's ready" statement is a bit like the kernel guys saying they'd move to Eric's cml2 instead of the kconfig rewrite that wound up happening instead, because Eric wrote cml2 in Python (not previously a kernel build dependency) and it took 20 seconds to open because he was doing some horrible analysis thing that made sense to a Lisp programmer, and he refused to cache the results because "shipping generated data was wrong". Meanwhile, the kconfig rewrite had blah.c_shipped files so you didn't need lex or yacc on your build machine.

But Eric being stubborn wasn't the fundamental problem: Doclifter mostly failed because docbook was pointless, something I argued with Eric about at the time, or at least while we were editing The Art of Unix Programming, which was also written in Docbook. If there are no visual editors for your format, and thus you MUST edit the tags by hand in a text editor, the format is of limited use. The distinction between "semantic" and "presentation" means this is a PROGRAMMING LANGUAGE, not a document authoring format, and you'll never get tech writers to touch it. Even microsoft word had "show edits" and such to make the invisible visible and thus GUI editable. Saying "there's a bunch of semantic markup that doesn't translate to presentation layer" is ivory tower academic bullshit, it's useless in real life. What little oxygen docbook had was eventually taken by wiki markup because you could tab back and forth between editing and display.

But still, they could have converted info to anything else. It failed as a "standard": nobody outside the FSF ever picked it up, and the FSF sticking with gopher-based info in 2023 is like still publishing in EBCDIC. The Gopher core devs realized they'd lost in 1993, Web traffic passed gopher traffic in 1994, and the University of Minnesota where it was developed (and named after their sportsball mascot) disbanded the gopher programming team in 1995 (retasking the developers to develop a web-based accounting software package instead). Firefox (generally a trailing indicator) dropped support for Gopher with the 4.0 release in 2011 (10 years after Internet Explorer 6 dropped it). Chrome never had it. Info can STOP NOW. I'm pretty sure I posted a very similar rant about this at least a dozen years ago because it's a ZOMBIE...

So yeah, info sucks. Don't install it, and rip it out of any configure file that can't cleanly drop it out. Richard Stallman went crazy (along a different axis) about 10 years _before_ Eric did, and the FSF Bill Cosbyed him back into the fold, so it's still _his_ call to stick with this rejected data format he'd created rather than move to _anything_ else? Sigh.


July 14, 2023

I'm rereading the posix dd spec, and it was clearly NOT written by someone who had ever tried to implement it. It talks at length about what to do about short reads but never even considers the possibility of short WRITES. A bunch of rules for figuring out what command line arguments and input block sizes result in which output block sizes, including a 6 step list starting "The processing order shall be as follows"... but writes are assumed to complete, block, go into a kernel buffer... never a short write.
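
Handling a short write isn't even hard, the spec just has to admit it can happen. The retry loop is boilerplate (toybox's lib has a writeall() of roughly this shape, if memory serves):

#include <errno.h>
#include <unistd.h>

// Keep calling write() until the whole buffer is out: write() returning
// less than len is not an error, it just means "go again".
ssize_t writeall(int fd, char *buf, size_t len)
{
  size_t done = 0;

  while (done < len) {
    ssize_t rc = write(fd, buf+done, len-done);

    if (rc < 0) {
      if (errno == EINTR) continue;

      return -1;
    }
    done += rc;
  }

  return done;
}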

The section saying what goes to stderr says a count of whole and partial input and output blocks, and also if there are any truncated blocks a line about those too. Where truncated only applies to input blocks, not output blocks, and not the "short read" kind of truncated but instead their stupid "conv=block" feature. You have "conv=noerror" but the summary does NOT say how many read or write errors were encountered and seeked past? Really?

And of course even the 2018 version (I.E. what's live on the website today) has an ASCII to EBCDIC conversion chart, despite that already being irrelevant back in the 1990s. And in the RATIONALE section it says "a failed read on a regular file generally does not increment the file offset, and dd must then seek past the block on which the error occurred; otherwise the input error occurs repetitively. When the input is a magnetic tape, however, the tape normally has passed the block containing the error... and thus no seek is necessary." So it's commemorating an ancient unix kernel bug from 30 years ago, and noting that the driver for a piece of hardware that was already obsolete 20 years ago behaved differently. And another footnote in RATIONALE says that EBCDIC doesn't have the [ and ] characters (without which you can't use any modern programming language) so they fudged it.

Sigh. The whole "is dd actually reblocking" question... while it's what the tool was originally FOR, I'm not sure anybody actually _uses_ it for that? It's "read this amount of data at this offset from this source, and send it to that offset at that destination". Any transforms like toupper can be done by some other filter in a pipeline. Back in the day they micromanaged the block size, and that's great, but it's really HARD to care in modern contexts because the 40th anniversary of the publication of Nagle's algorithm is next year.

I've been grinding away at trying to come up with a test suite that can detect the transaction granularity, but really: if we read and write the correct data at the correct offsets? Probably good enough.
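
A possible probe (my harness idea, not something in tests/dd.test): a sink that logs the size of each chunk a pipe delivers, since pipes mostly preserve write boundaries, so you can see what block sizes dd actually emitted. Coalescing can blur it under load, which is half of why I'm not sure it's worth it.

#include <stdio.h>
#include <unistd.h>

// Report the size of each chunk read from stdin, e.g.
//   dd if=/dev/zero bs=1k count=4 | ./readsize
int main(void)
{
  char buf[65536];
  ssize_t len;

  while ((len = read(0, buf, sizeof(buf))) > 0) printf("%zd\n", len);

  return len < 0;
}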


July 13, 2023

Sigh, not sure why I thought LFS 12 was out? Probably clicked on the "unstable" link on the web by accident. Cloned the git repo (which says 404 in a browser but clones from the command line; so friendly) in hopes of having an easier time keeping track. In the meantime: yay 11.3.

You know, technically, to get the LFS 11.3 build working all I actually NEED is the "chapter 5" build. As in if a mkroot chroot can run the script I already wrote, then everything else happens in the new chroot and THAT means that if toybox can provide the commands that record-commands says the script I already wrote is calling, we're ready for at least the naive LFS bootstrapping. Hmmm...

There's still a bit of glue layer though: Chapter 7 starts by running crap as root in the host system, starting with a "chown -R root:root lfs" (except in an elaborate overcomplicated way where it does every individual subdirectory instead of just cleanly recursing), and then mkdir lfs/{dev,proc,sys,run} which is STUPID to put there, then it mounts dev, dev/pts, proc, sysfs, tmpfs, and dev/shm (which is slightly awkward because there's no "mount" command in the chroot yet). And THEN it does more or less chroot lfs env -i HOME=/root TERM="$TERM" PATH=/bin:/sbin /bin/bash --login.

Three of those four things require root access to do, so it kinda makes sense to put them here, although I either want to shoehorn my unshare -Cimnpuf layer in there and run as fake root inside the chroot, or more likely boot QEMU and run as root inside the emulator.

But only three of the four, and I really REALLY want to clean that fourth thing up: the mkdir in the middle could be part of the initial mkdir. And having it come AFTER the chown... so the mountpoints belong to the host user? Going out of your WAY to do that? Why?

And of course, historically the glibc build was an abomination requiring both perl AND python as hard build prerequisites. I substituted uClibc back in the day, and can presumably use musl instead now. But let's get what's there working first.

Ok, I re-ran ch5.sh under record-commands and the new list of commands it called are:

aclocal-1.16 ar as autoconf autoheader autom4te automake-1.16 awk basename bash bison cat cc c++filt chmod cmp cp cut date dd diff dirname echo egrep env expand expect expr fgrep file find flex g++ gawk gcc getconf git gnat gnatgcc gnatmake gnatprep grep gzip head hostname id install ld ldd ln ls m4 make makeinfo mkdir mktemp msgfmt msgmerge mt mv nm nproc objcopy objdump od paste patch perl pkg-config pod2man print python python3 ranlib readelf realpath rm rmdir sed sh sleep sort strip tail tar test touch tput tr true tty uname uniq wc which x86_64-pc-linux-gnu-pkg-config xargs xgettext xmlcatalog xz

Which isn't even necessarily the full list, because PATH=newtools:$PATH means the build might install a tool early on and call it later, shadowing a command toybox COULD have supplied but never got asked for. Ideally I'd like the toybox host binaries to be able to build as many packages as possible, and thus be able to build ncurses or similar WITHOUT building coreutils first, so I want the build to KEEP using the toybox versions when they are available, thus installing the new stuff AFTER toybox in the $PATH. (Ala PATH=$PATH:newtools instead.) But let's start with the low hanging fruit first.

It's likely at least some of the above crap got called by autoconf to see if it was there, but it would have happily worked without it. There is NO REASON anything in 2023 should be calling "mt" (the magnetic tape control command), for example.

Ok, which binaries from this list does defconfig toybox already provide:

basename bash cat chmod cmp cp cut date dirname echo egrep env expand fgrep file find getconf grep head hostname id install ln ls mkdir mktemp mv nproc od paste patch readelf realpath rm rmdir sed sh sleep sort tail tar test touch true tty uname uniq wc which xargs

And which does the toolchain provide:

ar as cc c++filt g++ gcc ld nm objcopy objdump ranlib strip

Plus the stuff already in the toybox roadmap (if not pending) is:

awk bison dd diff expr flex gawk git gzip m4 make tr xz

Which leaves:

aclocal-1.16 autoconf autoheader autom4te automake-1.16 expect gnat gnatgcc gnatmake gnatprep ldd makeinfo msgfmt msgmerge mt perl pkg-config pod2man print python python3 tput x86_64-pc-linux-gnu-pkg-config xgettext xmlcatalog

Ok:

$ for i in $FILES; do dpkg-query -S $(readlink -f $(which $i)); done | sort

and then a slight manual cleanup:

autoconf: autoconf autoheader autom4te
automake: aclocal-1.16 automake-1.16
cpio: mt-gnu
expect: expect
gcc-8: x86_64-linux-gnu-gcc-8
gettext: msgfmt msgmerge xgettext
gnat-8: x86_64-linux-gnu-{gnat,gnatmake,gnatprep}-8
libc-bin: ldd
libxml2-utils: xmlcatalog
mime-support: run-mailcap
ncurses-bin: tput
perl-base: perl
perl: pod2man
pkg-config: pkg-config x86_64-pc-linux-gnu-pkg-config
python2.7-minimal: python2.7
python3.7-minimal: python3.7
texinfo: texi2any

Sigh, I try not to have autoconf and automake installed on my laptop, I'm guessing that's left over debris from trying to get some gnu/dammit package to compile from a random git snapshot. I should set up and run this in a clean debootstrap, but for the moment assuming those do drop out (they USED to)...

The only calls to mt (the magnetic tape control utility, from back in the days when big iron computers had tape reels on the front, and presumably flashing lights and made beep-boop noises and a lot of relay clicks before speaking flatly in Majel Barrett's voice) in the log are all "mt -?" with no follow-up, so I'm guessing this is historical gnu configure debris checking to see if something is there and then never caring. What packages do that: egrep '^tar "xvf"|^mt ' log.txt | less says mpc, file, gawk, and xz. Ugly. But it's autoconf, so that goes without saying.

I boggled a bit at "expect" because grep '^expect ' log.txt produced zero hits but awk '{print $1}' log.txt | sort -u was finding it, but it turns out expect was called with no arguments (so the log line had no space after it, just an immediate newline). It's only checked for (not used) by binutils, which can presumably get along fine without it.
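Note to self: matching both the with-arguments and bare invocations needs an end-of-line alternative in the pattern, ala:

$ grep -E '^expect( |$)' log.txt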

The downside of the readlink -f is that a few things renamed themselves, but without it dpkg-query -S isn't smart enough to find the packages. The gcc-8 thing is actually what "gnatgcc" redirects to, so this isn't host toolchain leakage, it's gnat leakage. (Not that we were using a cross compiler anyway, but still.)

Presumably if gettext isn't installed it'll drop out? Back in aboriginal I had a gettext-stub library that would A) symlink msgfmt to true, B) provide a stub libintl.h and libintl.c that did as little as possible, lots of #defining things to NULL and functions doing return msgid; or return "C"; but right NOW the question is will this work if this package just ISN'T installed? I remember having to patch binutils back in the day. (Not a big patch, just... they never regression tested the not-installed path so they asked a question and then couldn't handle one of the answers for stupid typo reasons. Which persisted for at least 5 years because nobody wanted to talk to gnu zealots and everyone just patched it locally.)
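From memory, the stub amounted to something like this sketch (not the actual aboriginal file; $PREFIX is whatever directory the airlock tools live in):

# Make msgfmt a no-op and give configure a stub header to find:
ln -sf "$(which true)" "$PREFIX/bin/msgfmt"
cat > "$PREFIX/include/libintl.h" << 'EOF'
/* do as little as possible: hand back the untranslated string */
#define gettext(msgid) (msgid)
#define dgettext(domain, msgid) (msgid)
#define textdomain(domain) ("C")
#define bindtextdomain(domain, dir) ("C")
EOF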

The "gnat" stuff is because I still have a package installed for building the ASIC hardware toolchain out of ghdl and yosys and such. It's one of those obscure gcc "compiler collection" things like fortran or gcc's built-in java support, what language... it's gcc's ADA compiler. Which is a horrible overcomplicated language the US Navy bloated into uselessness back in the 1980s (back in college Rutgers had a class in it in the catalog, but wasn't actually offering it, so I asked a college advisor why and got an earful), and for some reason GHDL was implemented in it. (I think because VHDL the language is an ADA derivative or some such? Which makes as much sense as saying C and Python use "algol syntax", which is technically true but the last Algol standard in 1968 was presented with a report from the committee saying they already considered the language a failure and were going to stop now. C carried the banner from 1972 onwards and other languages have C-like syntax, the fossil ancestor is effectively extinct.) Anyway, gnat almost certainly drops out cleanly if it's not installed because 99% of the userbase won't have it. (And ghdl being written in ADA is the main reason ghdl isn't more widely used, which alas drags VHDL down with it.)

HA! THE BUILD IS CALLING LDD. To do what, exactly? Grep of log.txt says that "ldd --version" is called 3 times (twice by the "file" build, once by the "patch" build). Yet more useless autoconf shenanigans, not actually used for anything.

The gcc and mpc builds are calling xmlcatalog "" "http://docbook.sourceforge.net/release/xsl-ns/current/" which does not need to happen.

Another readlink -f renaming head scratcher: run-mailcap is what "print" is symlinked to, and... there are only 2 distinct calls to it in the log (each repeated several times), and both of them produce a usage: message and several "error:" lines. Calls like print -r -- -n are nonsense: neither -r nor -n is a recognized option to the mime-support "print" command. And then one has a zillion backslashes as an argument, but it doesn't produce output to stdout (only stderr) because again, unrecognized options so error message instead. This one's happening in over half the package builds. (You can sing "autoconf is useless" to "every sperm is sacred".)

I kinda have tput in the toybox roadmap except that it's got as many options as stty and I dunno what's _relevant_, but in this case the build is calling:

$ grep '^tput ' log.txt | sort -u
tput "bold"
tput "setaf" "1"
tput "setaf" "2"
tput "setaf" "4"
tput "setaf" "5"
tput "sgr0"
tput "smso"

Oh goddess, who's calling perl? Is it still just glibc being that stupid? No, lots of stuff is (et tu, binutils?), but it's almost entirely just calls to texi2pod for documentation generation, so I can probably either not install perl or --disable-docs somehow and get it to not. And this is related to pod2man as well.

Lots of calls to pkg-config: binutils is looking for libdebuginfod, libzstd, and msgpack, grep is looking for libpcre2, and make is just checking if the tool itself exists but not actually using it.

Of course it's using both python 2 _and_ 3. Findutils is using "python" unprefixed (and getting python 2), in some sort of sysconfig replacement with comments like "Can't use sysconfig in CPython 2.7, since it's broken in virtualenvs", which is not so much an "I can't even" as a "strongly believe I shouldn't". And then glibc is using python3 and running whatever scripts/gen-as-const.py is rather a lot of times (some sort of compiler wrapper?) plus gen-translit.py and dso-ordering-test.py and gen-libm-test.py. As I said: replace with musl.

And the "texi2any" nonsense is actually a symlink from "makeinfo", which is a command that should just not exist in 2023. It's in the "mt" bucket, the gnu info format was a derivative of "gopher" before html took over, meaning it DIED THIRTY YEARS AGO. (I'm pretty sure I ranted about how obsolete it was already at least a dozen years ago?) So yeah, info sucks. Don't install it and rip it out of any configure file that can't cleanly drop it out..


July 12, 2023

It's very easy to sit down and open a new can of worms, and cover the room with slush pile scribbling about ideas and implementation... vs the slow tedious heavy lift to finish and clean and package and test and document it so it's DONE. I have SO many open cans of worms spilling into each other, which I've been hammering on Zeno's paradox style to close off and check in for WEEKS... They self-select, because the ones that are easy to finish get finished, and the ones that are hard to finish accumulate unfinished.

The Taiwan guys replied that the pandemic "trained us to adapt" for remote talks, so I need to record... let's see, half hour timeslot which is 9:30-10:00 am on the 29th there, which I think is 8:30 pm on the 28th my time? So, half an hour of material, allocate that with a ten minute prerecorded talk on using mkroot, then stop for 5 minutes live Q&A via zoom variant du jour, then ten more minutes explaining the implementation of mkroot, and 5 minutes left for more questions?

So, two ten minute videos on mkroot. Time to make some bullet point lists and hammer them into outlines. And of COURSE I'm doing the "I should clean that up, I should change this part" dance. Trying to document stuff always results in the desire to simplify away the bits I don't want to explain to someone who doesn't already know them. Which is a net positive (by rubber ducking at a theoretical audience I've found more things to fix), but also a tangent from a tangent...

Ah right. Trying to explain this goes "If you run mkroot with no arguments it builds a broken binary because we statically linked against glibc, which can't do things like DNS lookups because glibc sucks. Copying the dynamic libraries out of the toolchain takes 1.7 gigabytes on my laptop, and although I long ago had a trick to run ldd recursively against the binaries and libraries I copied, Elliott Hughes of Google strongly objected to me adding ldd to toybox for reasons I'm still unclear about, so you can't do that in an airlock build. So you pretty much HAVE to use the provided cross compilers for this to be at all useful."

I can of course explain the clear path, where I carefully do not walk across any land mines. Which is sort of cheating. I.E. ONLY show them using a cross compiler (or building on a distro like alpine that has a musl host library), and hand-wave away how profoundly glibc sucks.

Or I could just ignore Elliott and implement the simple tool I need to make this work, and he can leave it out of Android's config. (I note that the Android NDK does not contain ldd either, so it's not like I can use the version out of the toolchain that he insists provides the "right" functionality, because it's NOT THERE. He argued at length that does-not-exist trumps "good enough", and I am so tired.)

Sigh. I can also run sed against the readelf output and then do elaborate path shenanigans... which is way too big to put in mkroot proper but has to happen _AFTER_ the toybox build and all the package builds. In _theory_ toybox just links against libc and the dynamic linker which you can get by building hello world, but in PRACTICE when that's glibc toybox pulls in libcrypt.so.1, libm.so.6, and libresolv.so.2. And even THAT doesn't tell me what magic dlopen() crap it needs for the DNS resolver and so on, this is "run it under strace and see what gets opened" territory, with the question of what codepaths get exercised in your test...
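The non-recursive version of the sed-against-readelf idea is at least short (sketch: lists the DT_NEEDED entries of one file, doesn't chase transitive dependencies, and per the above can't see dlopen()):

readelf -d "$1" | sed -n 's/.*(NEEDED).*\[\(.*\)\]/\1/p'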

Blah, how am I supposed to programmatically find the dlopen libraries of glibc? Hardwire knowledge of glibc into the "dynamic" harvester script, I guess. There's a reason I haven't done this before now, but I need to explain it to a new audience in a couple weeks, and I would like the explanation to make SENSE and not have large holes. The problem is, the design of glibc makes no sense and has large holes.


July 11, 2023

Sigh, I wimped out of attending the Taiwan conference. When they initially contacted me I was in Japan and totally expected to be back regularly, and Japan to Taiwan and back is a day trip so a two day conference: not a problem. And having it be at the start or end of a multi-week visit to Japan, also not a problem. The international flight was amortized over a long stay in the area.

But I wasn't really in control of that schedule, and eventually plans changed so I wasn't going back to Japan for work at all, which meant I was now amortizing an international round trip against a 2 day conference where I knew nobody and was only scheduled to give one half hour talk. I still tried to make it work, but with the FASTEST (not cheapest) option each way being something like "22 hour travel time, including a 3 hour layover in San Francisco and then 6 hours in Los Angeles"... add in travel to/from the airport at each end and we're talking a day and a half of travel before and after, so at least a 5 day commitment to give a half hour talk, and "don't stay for the whole conference" would be _less_ incentive to go...

And of course the longer I delayed making an uncomfortable decision the fewer options were available and the more the price went up... the trend line did not improve. I feel bad about this, but it was one of the multiple "things looming at me" that have been piling up recently, as tends to happen when my executive function gets overwhelmed. (Bit like a large log blocking a river, and lots of little things plugging up the gaps.)


July 10, 2023

The Linux From Scratch automation script I started was LFS release 11 and LFS 12 is out now, so I should probably redo it. Back under aboriginal I stuck with an old version and got waaaaaay behind, until updating it was a heavy lift. Although part of that was accumulated version skew from sticking with the last GPLv2 toolchain releases until they got ancient, and the new stuff needing more and more patches to build with old tools, especially after C11 came out and packages started depending on it.

But I don't want to redo my recent LFS build stuff yet, I want to continue through to the end, because a partial script isn't very useful to me. I need TESTS. What I need from this exercise is a reproducible Linux From Scratch build that has some obvious success/failure indicator, and "rebuilds itself under itself twice, and I get a shell prompt from the second one" is the obvious smoke test: it worked, and what it made also worked. "The build didn't break" isn't the same as "the build WORKED", and "the first build works well enough to complete a second build" is more or less my definition of success. (I _could_ do it a third time to make sure the rebuilt-under-itself one works properly, but it's one of them iterative 80/20 things: 80% vs 96% vs 99.2%. While I have seen bugs that only occurred on the third pass, that happened like twice over ten years, and both times aboriginal's users emailed me about it.)

Anyway, once I've got such a successful test, with the record-commands log of which commands the $PATH needed to have in them (and either the chroot or the second build should catch anything it calls via /absolute/path), I could swap in toybox commands one at a time and see how the build differs. Especially a single-processor build if I can compare the log output (both stdout and any config.log files) to detect non-obvious decision differences during the build. (Whether the new packages get installed before or after toybox in the $PATH is editable after the fact. The best test coverage for toybox is "PATH=/toybox:$PATH" and the lowest hanging make-it-work fruit is "PATH=$PATH:/toybox".)
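The swap itself is the easy part, something like this sketch (hypothetical paths; a symlink named after the command is enough, since the toybox multiplexer dispatches on argv[0]):

mkdir -p swapbin && ln -sf "$PWD/toybox" swapbin/sed
PATH="$PWD/swapbin:$PATH" ./ch5.sh 2>&1 | tee log.sed.txt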

But I can't start going "now try my sed!" until I've got an automated run-to-completion test. Because otherwise all I can say is "the build didn't break", which is less helpful. And wandering away for a couple months and then having to redo what I've already done rather than continuing down the path from where I left off is... typical, really. Frustrating, but typical.

Tardis envy...


July 9, 2023

Circled back to toysh and the function call lifetime stuff that's blocking the "return" builtin: looks like sh_fcall->delete only has two users, which are set_main() updating the $@ command line argument list (same lifetime as the function context, so deleting that is when you'd delete any memory allocated to hold those values), and then when run_command() does a function call it transplants the deletion list from struct sh_process to sh_fcall... and I don't remember why I did that? I know a function call's sh_process struct isn't entirely real (the pid field is zero), but it's got one with the same lifetime as anything else in a pipe? I suppose the function context gets popped a little earlier (when we exit the function, not when job control waits for the child process to calculate the pipeline exit code), but why would that be _important_?

Sigh, I did work in a branch to automatically subshell pipeline elements that needed it, because otherwise for i in {1..100000}; do echo $i; done | while read i; do echo $i; done would fill up the pipe buffer and hang. Part of the reason I need to block out the world and focus on the shell for a bit again is my mental model of what it's doing has lost track of what got CHECKED IN and what was only sketched out and worked on in an unfinished branch.

The QUESTION I was trying to answer was whether end_fcall() should bail out refusing to pop the root context before or after running the deletion list, and the answer is "before" because only the set_main() case applies to the root context (and you don't want to free the command line arguments while still using them). The run_command() case is always operating on a freshly added function context which CAN'T be the root context where the global variables live, so updating global variables in a loop outside of any function can't accumulate debris and fill up memory. (I'd be surprised if I got that wrong, but wanted to be sure.)


July 8, 2023

Bleurgh, I have some kind of lurgy. My sleep schedule is completely unintelligible, and when I am up I'm too tired to do anything.


July 7, 2023

I did a cleanup pass on i2cdetect but don't have a test environment for it, and setting up a raspberry pi has always been a flaming pain. (Poked at it again, but vanilla linux kernel doesn't have a defconfig for the chipset, and I'm not very interested in building an out-of-tree fork that's stayed out of tree for over 10 years.)

But there's a web page claiming to set up an I2C temperature sensor for qemu. So I built mkroot's x86-64 target and ./run-qemu.sh -device tmp105,id=sensor,address=0x50 which made qemu complain 'tmp105' is not a valid device model name.

The web page says they built qemu specially, doing echo CONFIG_TMP105=y >> default-configs/i386-softmmu.mak which doesn't work on current qemu because there's no default-configs directory. Grepping for TMP105 brings it up in a bunch of places, one of which is hw/sensor/Kconfig which says this depends on I2C already being set... what boards is that already set for? And there's kconfig in qemu? There's no make kconfig... docs/devel/kconfig.rst agrees there's no UI for it, you manually modify symbols under default-configs. There IS no default-configs. Alright, git log --stat and search for... Yup, two years ago commit 812b31d3f914 claimed to "rename default-configs to configs" (although the --stat output doesn't agree that those filenames actually got modified) and of COURSE the bureaucracy maintaining this didn't update the documentation.

Alright, the "configs" directory has two subdirectories under it: targets and devices. The "devices" directory has *-softmmu subdirectories, and all but one of those only contains a single "default.mak" file. (The magic special aarrcchh6644 target that's licensed differently than the other targets in TCG because it's magic and special has _two_ files here, "default.mak" and "minimal.mak". You couldn't have used the kconfig plumbing to if out a chunk of the file, could you? No? That would be too obvious for the punched card types that took over maintenance of this when Fabrice Bellard fled the encroaching bureaucracy...

Ahem, so under the two gratuitous extra levels of directory, configs/devices/x86_64-softmmu/default.mak is one line including ../i386-softmmu/default.mak because of course. And THAT has a bunch of commented out CONFIG_SOMEDEV=n lines (sigh, that's not how kconfig works) with a comment that you can uncomment them "to disable these optional devices". (Why can't you do this from the qemu command line?)
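So presumably the modern spelling of that web page's recipe, post-rename, is just (untested guess):

echo CONFIG_TMP105=y >> configs/devices/i386-softmmu/default.mak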

What does setting CONFIG_ISAPC actually do, grep -rl for it and... the only file using it other than this one is hw/i386/pc_piix.c which is using it in two #ifdefs to chop out two functions: pc_init_isa() and isapc_machine_options(). The rest of the file is still compiled, because of course.

So they use kconfig in a way that people already familiar with kconfig can get no information from. My earlier question of "where is CONFIG_I2C set" has no obvious answer so far. Can I just glue the symbol the gist page says to the i386 default.mak and have it work? Where _is_ CONFIG_I2C set, let's grep for it and... there are 281 hits under roms/u-boot and 7 hits _not_ under there. Bra fscking vo. I remember the YEARS that u-boot wouldn't run under qemu because Wolfgang Denk unconditionally refused to make dram init configurable. Then Wolfgang died. Often the way such refusals end...

There's something called "meson.build" that sucks this in but I don't want to run a meson, whatever that is. And it's also in build/mips-softmmu-config-devices.h... Ah, meson is yet another build system. So you type "make" to build with ninja using a ./configure shell script that runs python 3, and now there's another layer called meson. And I'm trying to trace through their highly nonstandard use of some subset of kconfig. I doubt anybody still working on qemu understands half the layers, they're just cargo cult piling up black boxes.

I'm just trying to emulate an i2c test environment. This is like two chips and a couple wires on a bread board. The whole POINT of i2c is it's very simple, that's why it exists. It's a mildly structured serial protocol. QEMU went to the trouble of implementing it, and adding a command line add-a-device syntax, and then won't actually DO it. The qemu-system-x86_64 binary that DOESN'T include support for this is 18 megabytes and ldd | wc says it links against 69 shared libraries, but supporting i2c by default would just be silly.

Hmmm, I googled for "qemu configure enable all devices" and... this doesn't seem to have occurred to anyone in the qemu community? The first "device emulation" documentation hit (on gitlab.io) ends with a 14 link "emulated devices" list that does not include i2c, but does include "CAN Bus Emulation Support", "Network emulation", and "Sparc32 keyboard". (Um, pick a level?)

Ooh, another hit near the end of the page (or at least just before the "people also search for" google advertising bar) says I can type "qemu -device help" which doesn't work because they removed the "qemu" link, but qemu-system-x86_64 -device help does indeed provide a long list of... fairly useless information. Half of it is CPU version strings, and adding -M isapc does not change the list in any way so it's not a list of devices available in a given context. BUT grepping that output for i2c produces 4 hits, two of which are on bus i2c-bus! So i2c-ddc and smbus-ipmi. More stuff to google, but I'm tired just now...


July 6, 2023

I have renamed call_function() to new_fcall() because I need to be better with the function vs fcall distinction: sh_function is a function definition (created by encountering name() { body; } and storing code that CAN be called, with its own name and sh_pipeline list, in a big global namespace), and fcall is a running call from one part of the script to another part of the script, with local variables and a blockstack and so on to keep track of where we are in loops and such right now, which lets us know where to go back to when this call ends. If you call a function recursively, one sh_function can instantiate multiple instances of sh_fcall.

So run_subshell() uses an anonymous function context for the fork() path, because the child process should exit when hitting the end of the current parenthetical block, and should not be allowed to return or break or anything out of it. The only reason we don't rip up and free the existing TT.ff list in the child is we retain the local variable context in the subshell, so we instead cap the list with a hard stop marker so we never return past it and redundantly execute the same shell code in the parent _and_ the child.

What DOES happen when you return from a parenthetical in a function call:

$ x() { (echo hello; return); echo two; }; x; echo three
hello
two
three
$ x() { { echo hello; return;}; echo two; }; x; echo three
hello
three
$ (echo one; return;); echo two
one
bash: return: can only `return' from a function or sourced script
two

Ok. So in a subshell, return detects that there's an enclosing function/source context OUTSIDE of the subshell (inherited from the parent process) which it could return from, but still exits the child process rather than running code outside the subshell. So THEY didn't clear their function stack either. :P

(It's gotta be some kind of cap entry, because subshells can nest. You can't have a global "we are in a subshell" indicator, it has to indicate a position on the stack. I suppose it could be a pointer to a stack entry instead of a type of entry ON the stack, but I don't see how that's an improvement. Still needs its own blockstack so you don't "continue" outside the lines either.)

The next caller of new_fcall() is run_command(), which is NOT an anonymous function call. This is fully nonymous, an actual call to a named function from which we can "return". It does an addvar(0, TT.ff) to indicate this, initializing the local variable stack. (It doesn't add a variable to it, but the point is that it _exists_, so you CAN add local variables to it later.)

Next up sh_main() calls new_fcall() to create the initial function context, but that one's magic. As with PID 1 and initramfs in the kernel, it's never not there and doesn't quite have the same properties as later ones. It's neither anonymous nor nonymous: you can't return from it, but can't reach past it either: you get an error message if you try.

Next up eval_main() calls new_fcall() but that one gets reached past: we're not even in a subshell, return works immediately. This one is as transparent as possible, but still stops running at the end so you can free the FILE * passed into do_source().

And finally source_main() calls new_fcall() which is a hybrid between run_command() and eval_main(): return acts like a function popping specifically _this_ function context (not drilling past it), but we recursed into do_source() and need to return from that to close the filehandle and such.

All five users want some form of cleanup, but run_command has the cleanup happen inside the run_lines() loop, specifically the end-of-call path checks if ff->func is set and calls free_function() on it, which does the reference counting. While I could add an fp field to TT.ff so it also did its own cleanup, do_source() is iterating through input lines and handling line continuation requests, and we need a signal to break out of that and return from it. I suppose this COULD all be one big line handling loop in sh_main() (which would be easier on nommu stacks), but it's not currently written that way...

But run_subshell() wants isolation, run_command() wants command() semantics, sh_main() wants an init task, eval_main() wants transparency (not just for return but for break, and of course you can eval 'eval "echo hello"'), and source_main() wants command semantics with cleanup.

$ for i in a b c; do eval 'if [ $i == b ]; then break; fi'; done; echo $i
b

Ok, summarizing (subshell), command(), main, eval, source:

  • eval and subshell are transparent to return, main errors, command and source are returned to.
  • eval and subshell are transparent to break/continue (when it digs its way out of the blockstack; subshell will abort when it pops the transparent context so diddling with a child process's forked copy of the function list is harmless).
  • everything except command causes run_lines() to exit so the caller can perform cleanup.
  • local variables are command() only.

Returning drills down through "transparent" contexts to the next command or source context, erroring if it hits main. But this error is a normal function failure return with no other effect on flow control:

$ echo hello; return; echo $?
hello
bash: return: can only `return' from a function or sourced script
1

When return DOES find a target context to return from, it pops all but the last blockstack in each context (including the transparent ones it drilled past) and sets ff->pl to NULL.

In order to make "break" and "continue" work in eval context I basically need to give them a similar logic to return, which implies they should also be shell builtins rather than keywords like if/else interpreted within run_lines() as they are now. So those need to move, and to leave empty transparent contexts as necessary to get popped and trigger run_lines() to return so eval can clean up.

I THINK that's right?

I'm tempted to try to restructure things so do_source() just queues up work and then there's a loop in sh_main() that reads the next line from the current fd in the current sh_fcall(), but that's major design surgery and I'm just trying to get "return" in...

[Editorial note: I continued filling out this day's description until the 9th because I wanted it together in one place.]


July 5, 2023

So what function contexts in toysh do is tell the run_lines() plumbing about discontinuities in the input script. Right, back up:

Running anything starts in do_source(), which takes a FILE * argument saying where the lines of shell script come from. The heart of that function is a loop calling get_next_line(), parse_line(), and run_lines() as appropriate. (With signalling back and forth about line continuations.)

At some point I need to teach get_next_line() to do command history and editing, but right now it just outputs a prompt (do_prompt()) and does... not even a getline(), a getc()/realloc() loop, because signal handling.

The function parse_line() takes one char *line at a time and assembles a doubly linked list of struct sh_pipeline (ala "pipeline segments"). It returns 1 if it needs another line to complete the current thought, such as unterminated quotes or an if statement without a corresponding then/fi, and 0 if the resulting sh_pipeline list is runnable as-is (and -1 if there was a syntax error). Behind the scenes parse_line() calls parse_word() a lot (which returns the length of the next token in bytes, handling quoting and such, and itself returning 0 if we need another line to finish), and handles the results with a big if/else staircase. At the start, parse_line() glues the new line to what's left of the previous line if this _is_ a line continuation.

Each struct sh_pipeline contains an int type field and a struct sh_arg {int c; char **v} with the split-up command line. (The arg->v[arg->c] entry is NULL if the statement ended with a newline or semicolon, but if it's | or && or something that string is saved as the terminating entry so the pipeline segments can be appropriately stitched together when run.) The int type indicates what _kind_ of statement this is: zero for a normal command you fork and exec, 1 for the start of a flow control block (if/for/while), 2 for the "gearshift" between the test and the body (then/do), and 3 for the terminator (fi/done). Each of these still has an arg so it can if (!strcmp(arg->v[0], "while")) to distinguish between them at runtime, then an if/then/else statement has more pipeline segments between the type 1 and 2 representing the test you run, and segments between type 2 and 3 are the body that's conditionally run if the test returned success. (There are various other types for functions and case statements, and "for" loops have more data fields, etc.) Plus these flow control statements nest, so you can have types other than 0 between types 1 and 2 or 2 and 3...

When run_lines() is processing all this stuff it USED to have a struct pipeline *pl local variable to point to the current pipeline, and a struct sh_blockstack *blk which is the runtime stack of nested if/else/fi and for/do/done contexts (which is never empty, you always start running in an initial block that's kind of an invisible { } around everything, or in | a | pipeline ( ) around each one). The blockstack structure contains the housekeeping information that needs to be duplicated at each nested level of flow control.

Then I added a function call stack, globally pointed to by TT.ff, which the pipeline cursor and block stack moved into so calling a function can jump to a new pipeline segment, and then eventually return back to where it came from even if it was in the middle of nested while/if/case flow control. And each level of function call has also got a struct sh_vars array containing the local variables at this level, and another struct sh_arg containing the command line arguments this function was called with (so $1 and friends expand appropriately), and so on. Just as the blockstack list is never empty, the function call list is also never empty: sh_main() creates an initial function call context containing the global variables and command line arguments to the shell, and so on. And the order of TT.ff entries is slightly non-obvious, the one it points to is the current one, the one we'd return to is ->next, and it's a doubly linked list so TT.ff->prev is always the "root" context containing the global variables. When TT.ff->next == TT.ff we're not in a function.

When run_lines() gets to the end of the current pipeline list and has thus run out of things to do (without asking for more input), it has two options: 1) pop the function stack and return to where we got called from, or 2) break out of the loop and return to whoever called run_lines(). The "break out and return" case happens when we have an "anonymous" function context at the end of the stack, which calls to do_source() add, allowing it to clean up after itself. This happens when sh_main() does an initial call to "cd ." to set up $PWD and $OLDPWD, and later in sh_main() it calls do_source() again with the -c string or the script file or the global "stdin" FILE * instance. Calling do_source() is also what eval_main() and source_main() do internally, and all of those calls need to exit do_source() when exiting that function context, so the caller can close the FILE * and otherwise clean up and return.

The problem is, you can "return" from source context, and you can "return" from eval context. And if you've got an anonymous function context at the end of the list (indicating run_lines() needs to return to its calling C function when done) it's not immediately clear to me what "return" should _do_. It's gotta traverse down the function call stack, edit the appropriate parent function context(s), but leave the anonymous function context(s) intact so we return to the right C functions. And the search for the first non-anonymous context may find there isn't a valid one to return to, so it should emit the "return not in a function" message which might be a syntax error? (I need to make sure syntax errors flush the call stack appropriately, but still signal the need to return from the C functions.)

The previous "just handle it when we hit the end" logic never cares about an _enclosing_ function context. That's new to "return"...


July 3, 2023

Walking to the UT tables, making up for lost time by getting extra steps. Walked to the river and back to the university, now in a certain amount of pain. But 20k steps so far and still a couple miles to get home. Exercise!

So adding return to toysh: in "source context" return works but "local" doesn't. Working out the granularity of this stuff is kind of annoying. So it's an active function context without a variable context, look at setvar and it's calling getvar to locate existing variables that may need replacement (well, setvar() calls setvar_long() which calls findvar() and addvar()... Except when do you need to search from current function context and when do you need to add to the root context? Urgh, I already redid all this logic in one of the branches I never got to finish and merge! My mental model doesn't match the code because it forked and didn't get back together.)
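Quick sanity check of that return-vs-local asymmetry (the comments are what bash does):

echo 'return 37' > s.sh; . s.sh; echo $?   # prints 37: return works when sourced
echo 'local x=1' > s.sh; . s.sh            # error: local can only be used in a function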

Right, tangent. What I need to do HERE is work out signalling, specifically how should I label each struct sh_fcall instance in the linked list so "return" knows what to pop? Each function context records: 1) pipeline cursor (where in the script are we executing), 2) local variables, 3) command line arguments for $* and friends, 4) flow control block stack for nested if/else/while stuff. (Which has to be in the fcall instance so when you return from a function it knows how much to pop.)

I'm creating a new function context in five places:

  • sh_main() - initial function context, this is the root one that holds the global variables.
  • run_command() - normal function() calls.
  • source_main() - it's basically a function call, has command line arguments even! But no local vars.
  • eval_main() - temporarily swap out the command line arguments so we can repurpose $* expansion, and is... otherwise wrong (can of worms).
  • run_subshell() - so the child process has its own pipeline cursor and empty blockstack, maybe wrong too?

Alright, so the question is what happens if you return from eval or subshell context. In the case of subshell return can (should?) be ignored, in the case of eval it needs to return to the _parent_ context, which is funky because eval_main() set up a function context and wants to tear down one function context, but return would reach past that? How do I make that work...

$ x() { echo one; eval "return 37"; echo two; }; x; echo $?
one
37

Hmmm... Ok, create "transparent" function context that return would blow past, manually cache both TT.ff and TT.ff->pl in a local pointer, do_source() on the resolved $* string, and then restore TT.ff->pl to the cached pl value ONLY if TT.ff hasn't changed? No, it needs the blank blockstack context too so "run off the end" returns to the calling context (I.E. how do_source() knows when to end). So kind of what it is doing, but put the if (TT.ff==ff) on the end_fcall(). If they called return, that fcall already got ended.

I think.


July 2, 2023

Ah, the standard way to add an hour to a debugging session: compare the wrong output files! (And figure out you're doing so after you run out of printfs to stick in, and run the binary under strace to confirm that the system call is being made to emit the correct output.)


July 1, 2023

Darn it, bash does NOT set the O_DIRECT flag on its pipes, at least not that I can tell. (Maybe only if there was a read?) cat /proc/self/fdinfo/0 says flags 0100002 on the pty, which /usr/include/asm-generic/fcntl.h says is O_LARGEFILE and O_RDWR, which... maybe pty already doesn't do collating? Fine. But then echo hello | cat /proc/self/fdinfo/0 says flags 00 which is just sad. (Can fdinfo not read flags out of a pipe, or are no flags set? No idea!)

Dowanna try to insert an extra C binary into the dd tests. The test suite does NOT depend on having a compiler in the $PATH at runtime.


June 30, 2023

I need to get a toybox release out. I need to submit a quarterly invoice. I need to book the flight to speak at the Taiwan conference (although they still haven't gotten back to me about hotel information). I need to post that new kernel patch to lkml.

It's one of those "I need to do everything before doing everything else" stunlocks where I really wanted to get "dd" cleaned up in this release, but I need to finish the patch.c rewrite and then get back to the like FIVE nested shell issues (I'm halfway through fixing HERE document variable expansion line continuations in one tree, and implementing "return" in another...) But a new kernel just dropped and it's best for mkroot if I have a fresh kernel version out when they do...

One of my old 401k plans has ended and needs me to transfer the money to something else (they suggest rolling it over into an IRA), and after 6 months of pestering I'm 95% sure this isn't an identity theft scam, but I still want to go to a fidelity office in person to deal with it, and thought "I'll do that in minneapolis", but the closest fidelity office was an hour away by bus. Sigh. (Only Boomers have significant retirement savings, and they all live in suburbia. This was one year of maxed 401k deductions around 10 years ago, and if I withdrew it I could cover my bills living in this house for... 2, maybe 3 months?)

I don't have MUCH retirement savings, but they are at least badly managed. I have two old 401k accounts from when I worked at Pace and Polycom (stuck in the closest thing they offered to an index fund, neither of which was very close), both companies have been renamed since (I can't keep them straight), and I really really really should have rolled them both over into a Roth IRA during the pandemic because I was making less than usual and the tax hit would have been comparatively minor. But I just didn't have the spoons, and even a minor tax hit was still more free cash than I had: I'm still paying down the home equity loan I ran up following a start-up down instead of trying to get a new job.

I should get a proper psychiatric whatsis (before Fade graduates and we lose the good health insurance where we're pooled together with a bunch of 20-something college students) so I can get the kind of regularly tailored-and-adjusted ADHD meds Fade has, because caffeine borrows a cup of executive function from your future self, and I have recently confirmed that one "monster" energy drink with sugar gives me about an hour of focus and then knocks me on my ass (irritably) for the rest of the day. (The diet ones just go straight to headache now.) Alas, Texas no longer has a healthcare system, instead strip malls have emergency care kiosks, which sprout up like mushrooms around here. Google Maps refuses to show the one closest to me (across the parking lot from the dead Sears in Hancock Center) unless you know to search for it by name (presumably because it's for poor people, they won't show the black-owned haircutting place just past the Wendy's either: zoom in all the way and it never pops up), but on the walk to/from the table I pass where the second closest Wells Fargo location to me closed during the pandemic, which was replaced by "Next Level Urgent Care" which shares its building with a convenience store, a coffee shop, a nail salon, and a Jimmy John's. Of course they're down the street from St. David's but nobody can afford to _go_ there. (It's owned by HCA.)

In Minneapolis I can go to health care places that remember I exist between visits (and do not ask if I'd like fries with my prescription), but alas I'm not there for long enough periods to deal with half the medical issues I want to get looked at while we still have the good insurance. (When Fade graduates, we go back on... the obamacare plans I guess? Post-dissertation she's already lined up a one semester teaching gig covering for someone's maternity leave this fall, but I don't think it comes with health insurance. We can do that Common Object Request Broker Architecture thing to very expensively extend the previous health insurance month by month for a bit, but...)

That's another reason spending a few years in Japan looked interesting: they have a functioning healthcare system. Admittedly they don't believe in psych meds because it's all in your head and you should just stoically suffer and play out your social role until it's time to commit suicide or be killed by the system (if you're not up for dying at your desk via Karoshi or becoming a hikikomori they have a special forest you can go to, suicide is the leading cause of death there for men age 20-44 and women age 15-29). But for things like dentistry and blood pressure they're way ahead of us, so... net win? Then again people go to Mexico from the USA for affordable health care all the time, so "ahead of us" is almost an information-free statement? Anyway, it looked good while I was there.


June 29, 2023

Flying back to Austin. The middle seat was empty this time so I had enough elbow room to use my laptop a bit, and the new one still has a fresh battery, so I could do some kernel compiles. Last night I ran the usual mkroot/mkroot.sh CROSS=allnonstop LINUX=~/linux/linux build on a clean checkout of the new 6.4 kernel, with none of my patches applied, and wonder of wonders everything except x86-64 built. Which means my patch to remove the stupid x86-64-only host ELF library nonsense is the only thing I actually NEED to forward-port. (My other patches are nice to have, but not release blockers.)

So I did that on the plane, starting with the "proper" fix of changing the HAVE_OBJTOOL line in arch/x86/Kconfig to say "if X86_64 && !UNWINDER_FRAME_POINTER". (The ORC unwinder is implemented stupidly in a way that drags in external dependencies that build break a simple host environment. You can select the frame pointer unwinder on every other architecture, and USED to be able to on x86-64, but when this new feature was added in 2019 they broke the existing one but only on this one architecture. I've been hitting it with a rock ever since, which means I'm regression testing that the unwinder that works on EVERY OTHER ARCHITECTURE also still works here.)

Except, since last release, Josh Poimboeuf broke arch/x86/entry/entry_64.S in commits 4708ea14bef3 and fb799447ae29 by adding some sort of stack guard that never got tested with the relevant config option switched off. It adds hardwired dependencies on the ORC stack unwinder to the x86-64 system call entry code. Bra fscking vo. A bunch of undefined macros sprinkled everywhere, which needs me to add this nonsense to the start of the assembly file:

+#ifndef CONFIG_HAVE_RELIABLE_STACK_TRACE
+#define UNWIND_HINT_ENTRY
+#define UNWIND_HINT_IRET_ENTRY
+#define validate_unret_begin
+#endif

Which took quite a while to figure out, because everything going wrong is in the middle of nested macro expansions. And then it STILL breaks because the assembler says the file arch/x86/include/asm/idtentry.h line DECLARE_IDTENTRY_RAW(X86_TRAP_BP, exc_int3); has garbage on the end of the line, the error message says first unrecognized character 's', and after MUCH DIGGING I changed a second line of entry_64.S with this hunk:

+       UNWIND_HINT_IRET_ENTRY offset=\has_error_code*8
        .if \vector == X86_TRAP_BP
                /* #BP advances %rip to the next instruction */
-               UNWIND_HINT_IRET_ENTRY offset=\has_error_code*8 signal=0
-       .else
-               UNWIND_HINT_IRET_ENTRY offset=\has_error_code*8
+               UNWIND_HINT_IRET_ENTRY signal=0
        .endif

That might not be the right fix, because now there's two instances of the UNWIND_HINT_IRET_ENTRY macro and I don't actually know what it DOES. Will it insert the wrapper code twice? Why does it take statements as arguments? I'm just GUESSING. The ORC codepath uses an assembly macro that takes multiple arguments, and does... something... with them. Note that the difference between .s and .S files is that lower case doesn't go through the C preprocessor and the upper case does, so they have C macros AND assembly macros in this thing, and I'm not familiar enough with proper assembler syntax to author it from scratch. I've done a lot with machine language, and poked at existing assembler like this, but as soon as you have an assembler doing symbolic name=value stuff I do THAT part in C with inline assembly statements as needed. So I've never actually used assembler macro syntax, because if you're using macros why are you doing it in assembly?

The build break here is that the assembler won't accept multiple statements on the same line (another reason to use inline assembly in C functions to do anything fancy), and when I #define the macro to nothing so it drops out that's what this tries to do. Possibly I need to change my "#define UNWIND_HINT_IRET_ENTRY" to instead take a list of arguments and split them so each one is on its own line, but I don't know how to make a C macro do that off the top of my head, and don't know how to make an assembler macro do anything, and it's a bit tricksy to look that up on an airplane with no wifi.

I also wist for the days of "don't ask questions, post errors" where I could post something that worked for me to the list and make puppy eyes at this Josh Poimboeuf guy who broke it so he could fix the #else case of his macro. The last dozen times I've posted anything to the list they never engaged with HOW I did anything, it was either "you got the bureaucracy wrong" or "you're crazy for wanting to do that, stop wanting things". Questioning my goal and questioning my paperwork filing skills, seldom if ever discussing the code. (Oh, I did get coding style complaints about semicolon placement and where to break lines and such.)

Anyway, I got a patch that works for me. I need to post it to the kernel list, but... I do not have the executive function to steel myself to deal with that wretched hive of scum and villainy just now.

Back in Austin, hanging out at the HEB deli tables with laptop, and re-examining patch.c, I kind of want to rewrite it to do work in a different order. This patch implementation is designed to work as a stream editor, meaning it grabs a hunk and searches forward through a stream of input lines for a place to apply it, and if it hits EOF it announces failure, emitting the hunk it couldn't apply to stderr. It only buffers enough data to evaluate the current hunk, and writes it out to the file as soon as it can be sure that line wasn't the start of this hunk. This means two things: 1) It can't back up to apply hunks out of order (which is fine, diff never generates them out of order, nor do they overlap), and 2) it can't fail a hunk in the middle but apply later ones. The way it figures out a hunk DIDN'T apply is by reaching the end of the file without finding a place to apply it.

What I _could_ do is read all the hunks and try to apply them in parallel, and take the first one that matches at any given position. This would allow hunks to apply out of order. It could introduce failure modes where two identical sections are replaced with two different things, but as long as earlier hunks win over later hunks in the case of a tie, that should work out ok?

But this is not a tangent I want to go down right NOW. I'm trying to close tabs for a long-overdue release...


June 28, 2023

I'm in another one of those stun-lock situations where I'm spinning enough plates that each is only glacially advancing. Not blocked, just... chipping away.

Sigh, this sort of thing is why I factored out and genericized the toybox argument parsing right at the start. It's not the command's job to implement support for -- although it may have a flag to _disable_ what it gets by default.

I'm still subscribed to the busybox mailing list, and check the folder every once in a while to see if they have interesting test cases or feature requests from their userbase. A surprising amount of the busybox issues that go by are "I got the toybox plumbing right, doesn't apply to me" issues, and I'm mostly looking for test cases or new features requests, although in that case the android guys submitted a getfattr years ago and then yanked it again (I moved it to pending) because their internal stuff binds to some elaborate C++ library or something? And "man getfattr" in Devuan Bulimia doesn't mention it? But I already have xattr getting/setting logic in "tar" so I suppose I should deal with the command line version at some point, starting with looking up why they wanted that weird library. (Probably some kind of magic filtering. I remember discussing it on the list, I just don't remember the _result_. But that's what the list archive is for...)


June 27, 2023

Got a bug report in patch, which looks like yet more fallout from adding fuzz factor support. (The problem is if you're speculatively applying a hunk and it fails after a few lines, you have to back up and try the first line of the hunk at each of the lines you've already traversed past, but the loop wasn't really set up to do that.)

But patch has several pending todo items already. The reversed hunk detection is... not exactly wrong, but insufficient. Right now it's just "one added line works as a removed context line", but it SHOULD be "a full hunk would have applied in reverse, and no hunk has yet applied forwards". Also the payload of struct double_list in lib/lib.h should really be void *data instead of char *data, because that way you can stick any pointer type in there without needing a typecast. But the first user was patch.c which wanted strings, and did plist->data[1] and so on, meaning in the rest of the tree there are a bunch of gratuitous typecasts, which I've been meaning to clean up. But cleaning it up requires making patch NOT use it as a char *...

I should just fix the bug in front of me and not poke the house of cards, but I'm bad at that. :)

Hmmm... is the "no hunk has yet applied forwards" test actually helpful? Because hunks apply in order, so the check happens at the end of the file when we didn't find a place to apply this hunk. Meaning we can only evaluate one hunk for having been reversed, and you COULD append a reversed patch to a normal patch. If you do that, it re-opens the file because generating a diff starts with @@ filename lines, and then the hunks within that must go in order in the file and can't overlap. So it's "the first hunk of a file, not a file". (Meaning the first hunk of a "@@ filename", but not necessary the first hunk of a file.patch. Coming up with vocabulary to distinguish concepts is a surprisingly large amount of the design work here, you need unique NAMES for this stuff and they haven't necessarily already got them...)


June 26, 2023

Still under the weather. I mostly spent the day listening to discworld audiobooks and nursing a headache from healing dentistry.


June 25, 2023

Toybox release prep has a whole bunch of externalities, the most annoying of which is building current linux kernels in mkroot. I should update to the new kernel version, but I never did reply to Andrew Morton, haven't tried to shove my patches up LKML's orifice since last toybox release, and I REALLY don't look forward to finding out what they broke this time. (And I should make sure it builds the patched and unpatched version, although I've NEVER installed the gratuitous elf package on my laptop so I've never built the unpatched kernel for x86-64 in mkroot since that dependency cropped up. Built fine for arm last I checked. It's so absolutely unavoidably necessary that most architectures still don't need it and x86-64 worked fine without it for 30 years. I am SO TIRED of arguing with those people.)

I miss aboriginal linux as a test environment, and want to slap together a mkroot environment capable of at least running the test suite. If that involves grabbing a bash binary from aboriginal or some such, oh well. And as long as I was doing that, I could grab the make binary and do more "build self under self" testing, which leads into the mkroot stuff... and that implies putting the mkroot /bin and the aboriginal linux /bin on the same filesystem with the first before the second in the $PATH... But small easy steps. If I try to do everything at once I never get to check anything IN. (Plus aboriginal linux was layered, with the toolchain being a second filesystem spliced in at mount time, so getting bash is easy but getting make involves pulling in an obsolete glibc version that links against ancient uclibc that lots of modern packages won't compile against. Remember, uclibc development lost power shortly BEFORE posix-2008 came out and is only really RELIABLY susv3 (not susv4), so me beating a dead horse through 2017 doesn't help the staleness of the API. I ended that project for _reasons_...)


June 24, 2023

I have not been keeping up with my blog since dentalism. (And the days I hadn't written up before that are kinda lost to history now.) Still not back to full speed, but shoveling out.

I'm trying to close tabs for release, but tabs keep opening as fast as I close them. Especially when there are a couple of gristle changes that take an inordinate amount of chewing to be satisfied with, it's not BIG, it's just not _right_ yet.

Still grinding away at the toysh HERE document line continuation logic, which is being a whack-a-mole change. (Currently it's gluing the lines together AFTER checking for the HERE terminator, so sh -c $'cat<<EOF\nabc\nE\\\nOF\necho hello' doesn't work because neither E\ nor OF were the end of line marker. But again, you can't just look at the last two bytes of the line for a backslash and a newline because that backslash could itself be escaped, you've gotta parse forward from the start, and THAT logic is basically the variable expansion logic (at least for detecting line continuations, there's a call to parse_word() in there), so unless I want to duplicate it in two places it's tricksy to get the ordering right.)

I have a smallish change to cp that would mean backing out the pending change to cp -s. I got readlink fixed and the cp -s logic is separate but similar, I want to get both fixed. But it's DARN FIDDLY. (And no, the gnu/dammit one doesn't get it right. I want cp -sr to work whether source is above dest, dest is above source, or one or both are absolute paths. When given relative paths I want it to produce relative path symlinks, and when given absolute paths I want it to produce absolute path symlinks. And I need tests for all of it in cp.test)

I'm actually most of the way through reviewing and cleaning up dd.c but I'm now at the "part I want to rewrite" (not hard, just fiddly) and the "testing the hard to test parts", namely the I/O blocking that dd is all about. (If you DO get a short read, how big is the next read and the next write? Posix has a LOT to say about that, but it's hard to externally test that a program is doing that right.)

The trick to testing dd is to use O_DIRECT pipes, a "packet mode" feature added in Linux 3.4 that maintains the existing blocking (does not merge or split blocks of data sent through them). Which I _think_ bash is already doing, because otherwise "while read i; do blah $i; blah stdin; done" is really hard to do in a pipeline. (On a seekable fd you can fseeko(fp, ftello(fp), SEEK_SET) to force it to discard the buffered input and back up the underlying file descriptor so the child process gets the next byte after the line read put into i, but you can't unget pipe data. But if the input was one write() per line, the O_DIRECT means the next read() naturally stops there and readline() sees the trailing newline and returns at the right place without readahead. It's a bit delicate, but workable.)
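
A minimal sketch of the packet mode trick (the fcntl() dance is documented in pipe(7); buffer sizes arbitrary, error checking mostly skipped):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Show that an O_DIRECT pipe preserves write boundaries: two small
// writes come back as two separate reads, not one merged read.
int main(void)
{
  int fd[2];
  char buf[64];

  if (pipe(fd)) return 1;
  fcntl(fd[1], F_SETFL, O_DIRECT);  // packet mode, Linux 3.4+

  write(fd[1], "hello", 5);
  write(fd[1], "world", 5);

  // Each read stops at a packet boundary despite the bigger buffer.
  printf("%d\n", (int)read(fd[0], buf, sizeof(buf)));  // 5, not 10
  printf("%d\n", (int)read(fd[0], buf, sizeof(buf)));  // 5

  return 0;
}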

Of course even if bash does that, no guarantee that mksh or zsh (which macos switched to when it gave up on the last gplv2 release of bash) does, so if I depend on the shell's implicit behavior tests might fail in an uninformative way.

I'm tempted to make a C wrapper to test this more reliably, but the test suite is not set up to build C code (it doesn't strictly depend on a compiler being available) and creating some sort of toys/example/demo_testpipe.c makes the tests using it really awkward to invoke. (None of the example stuff is in defconfig. I suppose I could have "make tests" build with a slightly weird config, but that's horribly magic and I don't want to go there.)

Sigh. I guess I could check for $BASHPID being set and only run the pipe blocking tests in that case? Because bash should set the flag on the pipe already.

And of course all the github issues, which tend to come in bursts it seems...


June 22, 2023

Dental surgery day. One of medical science's most granular weight loss programs, and one of the few modern thingectomies where "bite down on this" is still a core part of the procedure.

Sticking various forms of pliers and a miniature angle grinder in my mouth isn't actually the part I mind, that's just engineering. No, the DO NOT THINK ABOUT THE INCIPIENT PANIC part is that I needed four injections into my mouth fifteen minutes before the surgering. Why was my blood pressure thirty points higher this visit than it was during my consult visit to schedule this a couple weeks back? Before the doctor even got into the room and it was the assistant setting me up? The same reason I was sweating so much: my subconscious knew NEEDLES WERE COMING!

But first, a fifteen minute talk with the surgeon where I agreed to let them use lidocaine, despite me having put it on my allergy list, because there are apparently a half-dozen something-caine drugs that are basically minor variants of the same thing, and then once you move outside that family your face is now numb and paralyzed for 12 hours instead of 4, and I eventually just decided to risk/lump the migraine. (Does it really matter WHAT they're injecting? NEEDLES!)

My lidocaine-inspired trip to the emergency room back in 2013 happened when a lidocaine rinse got in my sinuses and triggered a full field visual migraine I thought was a stroke making me go blind. (When it's in BOTH eyes the same way, it's not an eye issue it's a brain issue.) I could see again an hour and change later (lying on the hospital gurney with nobody having seen me yet), and the "everything is overlaid with sparkly television static" went away after three or four days. Injecting the same drug into my circulatory (not lymphatic) system is less likely to get through the blood/brain barrier intravenously than getting coughed into my sinuses, infiltrating the nasolacrimal plumbing, and coming in contact with the optic nerve like I suspect happened last time. Although I did not armchair-opine that at an actual medical professional, just said I was willing to risk the migraine.

Of course the OTHER problem with lidocaine is it's not very effective on me. Despite thinking I could relax a bit once the four injections were over and let them get on with the easy part (the actual surgery)... turns out I could still feel it. Muted a lot, but ow. They offered a fifth injection, but I went "that's not an improvement, just get it over with". (Yes I had considered asking if they could just yank the tooth WITHOUT injections, but strongly suspected I would regret it and there's no way they would have agreed for insurance reasons anyway.)


June 20, 2023

Capitalism's doing another "embrace and extend": podcasts started life as an open protocol where you post an mp3 file at a URL, and have an xml file called an "rss feed" list the available episode URLs with attached descriptions and dates and so on. Then capitalist aggregators like apple and spotify got ahold of them, and now the mp3 is hidden behind layers of paywall.

For example, an interesting episode of "on with kara swisher" scrolled by recently on my android podcast app, called "the man making self-driving trucks", and I wanted to send a link to Fuzzy because that topic interests her. But when I asked the Google Podcasts app on my phone for a URL, it gave me an absolutely insane pile of hash crap that I do NOT want to cut and paste into discord, smells like it would link-rot within days when some server cache expired, and which had no obvious link to the mp3 anyway. (There's a play-in-web-page button, some sort of "mark as played" cookie thing, and "add to playlist" cookie thing. So a captive portal wrapper around the content, keeping you in their walled garden ecosystem.)

When I google for the podcast's website, there are apparently two of them, one on vox media and one at nymag... with different content? But nymag pops up a paywall if you try to actually look at an episode, so screw 'em. Neither page gives links to the actual mp3 files, instead they link off to "spotify, apple podcasts, or wherever you listen" with the third being some weird service that also has a play button but no link to the mp3 file (and that one doesn't even have separate pages for each episode, just a big page you click expand from, so if you want to link to a specific episode don't use them). And of course if you "view source" on any of these pages (which Google has removed from its phone browser) it's nested layers of obfuscated javascript assembling a URL out of pieces, and part of the point of https:// everywhere is you can't easily route your browser request through a proxy to see what actual URL it's fetching. (Remember how Google is switching DNS lookups to go through https so you can't block advertising websites at your firewall?)

Luckily there's still an rss feed behind the scenes, and googling for "on with kara swisher rss" gave me a URL that's human-readable enough to get an MP3 URL out of, in this case https://www.podtrac.com/pts/redirect.mp3/pdst.fm/e/chrt.fm/track/524GE/traffic.megaphone.fm/VMP4773935195.mp3?updated=1686187670 which I THOUGHT might have an expiration in it (so it would force you to reload the RSS to get a more recent one), but date -d @1686187670 says June 7 so that's apparently upload date. (Why...?)

I eventually wound up sending fuzzy the apple podcasts link as the most human-readable of the lot. None of them are really things you type into a URL bar yourself, because the people making this infrastructure don't care about that. (Even youtube's horrible hashes are intentionally short enough you can copy them from one machine to another by hand. That was by design, and its inheritors haven't broken it yet.)

(Later I noticed that the URL the rss pointed at is itself a wrapper, and https://pdst.fm/e/chrt.fm/track/524GE/traffic.megaphone.fm/VMP4773935195.mp3 was the ACTUAL link to the MP3. Which is a 403 redirect to https://dcs.megaphone.fm/VMP4773935195.mp3?key=2b15ad537ed1f7f2041d2bd4dbdd1139&request_event_id=c162bbdb-7b1f-43db-a97e-76a7551db06b because of course it is, although you can strip the ? and everything after it and the result still works... albeit by being a 403 redirect putting it _back_.)

I hope the fediverse guys take podcasts back under their wing. Youtube used to let you listen to the audio when the phone was otherwise off, but Google moved that behind their paywall (taking away something it could already do and now charging extra for it). Podcast apps still work with the screen off, meaning you don't have to hold the phone and prevent anything from touching the screen, and the battery doesn't run down as fast.


June 19, 2023

Google has developed another weird semi-failure mode where typing a search query into the main google.com search field and hitting enter doesn't work. Instead it inserts a line break and lets me type a second line of search query, and I have to use the mouse to hit the "Google Search" button under the entry field to get it to actually do the search.

I think the problem is that when the page is loading from my android wifi hotspot (I still haven't got the wifi in Fade's apartment to accept the new laptop's mac ID), the page loads more slowly than anyone at Google has regression tested in years, and the "actually do the search now" behavior got moved to javascript or something in a separate file that's #included from the initial html file, and thus it's loaded after the first file loads. So there's a few second window where the page is displayed and lets me enter data into the field, but hitting enter in the field doesn't submit the form data because the file adding that behavior hasn't loaded yet. So I hit enter and scroll the google search bar (it does not resize) with my cursor ready to type a second line of text.

How does google NOT NOTICE that they inserted a bug into the-thing-they-do? The behavior changing while the page loads is NEWBIE bad web design, they used to teach about avoiding that in single semester courses on this stuff. I know laying off 12k load bearing people in january was a bad move, but seriously. What, only poor people have slow internet anymore, we don't matter? Or "you shouldn't ever see the main google.com page, you should just type your query into the URL bar and pollute your URL autocomplete with random search phrases that you couldn't load without interposing a google.com page of advertising"? Neither is exactly reassuring.

(Yes, I am aware that pedants say "URI" when it hasn't got https:// on the beginning. I file it under "no, a BUD lite", and if there is a useful distinction then "URL" is the good one (points to a potentially interesting site) and "URI" is the bad one (points to a site frequented by people who say "URI", so I don't want to go there). File it with kibibytes: not gonna.)


June 18, 2023

I think parse_word() has to separately flag trailing slash newline as the reason for requesting a new line so it can be removed when gluing lines together, because having expand_arg() do it is just a MINEFIELD where bash -c $'echo "$\\\nPATH"' prints $PATH instead of expanding because "is the next character X" becomes a noticeably harder question to answer if "escape sequence that collapses to nothing" isn't removed before asking.

But this means parse_word() can't just return a NULL pointer for "needs more data", because that could mean "unterminated quote" (there are a BUNCH of different types of those) or it could mean this special magic instance where we want to chop 2 bytes off the end when gluing them together, and you can't just look at the END of the returned string because that \ could be the second part of \\ or it could be in single quotes where the \ isn't removed at all, and of course "\'$(' isn't a balanced set of single quotes but "\'$('\' is, the enclosing quotes aren't balanced but the question is do we remove-or-keep the final \ if the line breaks at this point? Gotta start at the beginning and work forward to know.

I have two sets of plumbing that are independently parsing this nightmare, parse_word() nondestructively figures out where the next word ends and expand_arg_nobrace() does quote removal. Did I mention even presumably simple single and double quote removal is modal? Inside single quotes you ignore everything except a single quote, so '"', '\', and '$' are all complete single quote strings, and yes even newlines just continue as part of the string until you get the next single quote which is why the escaped newline keeps the escape. But inside double quotes you still expand variables, but backslash no longer vanishes before most characters (so \x passes through), except that \\ or \$ are still escaped so consume that backslash, and backslash newline is a third instance where the backslash is consumed.
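
For flavor, a toy version of just the modal backslash logic described above (nothing like parse_word()'s actual plumbing, and ignoring $() and friends entirely):

#include <string.h>

// Walk forward tracking quote state to decide what each backslash
// means; you can't decide from the end because the modes nest from
// the start. Returns 1 if the line needs a continuation.
int needs_more(char *s)
{
  char q = 0;  // 0 = unquoted, '\'' or '"' = inside that quote type

  for (; *s; s++) {
    if (q == '\'') {                    // single quotes: only ' matters
      if (*s == '\'') q = 0;
    } else if (*s == '\\') {
      if (!s[1]) return 1;              // trailing \ : need another line
      // inside double quotes \ only escapes these, otherwise literal
      if (q == '"' && !strchr("\\$\"`\n", s[1])) continue;
      s++;                              // consume the escaped character
    } else if (q == '"') {
      if (*s == '"') q = 0;
    } else if (*s == '\'' || *s == '"') q = *s;
  }

  return !!q;  // unterminated quote also needs more input
}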

The OTHER thing I could do is just have parse_word() return NULL to indicate "need another line", but remove the trailing two characters that triggered that itself, by replacing the \ with a null byte. It's uncomfortably magic (and violates the "nondestructively" above), but I really don't have a better place to put it. The signaling handoff isn't designed for this.


June 17, 2023

I talk about being out of spoons a lot, but it's not disability. It's ADHD. I have limited executive function, and ALWAYS HAVE. I work based on momentum but have huge trouble STEERING. Lots of times I know "this is not what I should be doing", and yet here we are.

It's not so bad with people waiting on a deadline like RIGHT NOW. Pair programming or in teams. But I have often spent an entire day Needing To Do A Thing, and Trying To Do A Thing, and not having started that thing at 6pm. Things like "book a flight to a conference" or "update my bank information in a website". Looming. (I mean, I'm GOING to do it wrong. Or sub-optimally. If I could just get my stars aligned I could do it RIGHT and feel confident I'd done so. And not that I'd made the wrong reservation or participated on the receiving end of identity theft or something. Which is probably the executive function version of free floating anxiety trying to find something specific to latch on to: this is big and immovable for no obvious reason, what could the reason be...)

Back in the olden days rich white men had wives/secretaries/servants that compensated, and the less rich were miserable with constant Deadline Crisis because nothing got done before it HAD TO HAPPEN RIGHT NOW. (Been there, done that...) And I can compensate somewhat with a routine built around a rigid schedule, which is another variant of "externally imposed deadlines"...

One of the nice things about programming is "doing it wrong" is a normal part of the process. Debugging an empty screen. It's not good enough yet, keep hammering. It'll never be RIGHT, just good enough for now. I'm sure there's a better way to do this but I haven't been able to think of it yet. But what if THIS happens? Add more tests. Almost any piece of my code, if you ask me "how do you break this" I'll have a LIST, but it's acceptable stuff like "provide a single line of input larger than RAM plus swap". I do not handle $((1<<67)) on a 64 bit system. And I'm provisionally ok with that. For now. I am COMFORTABLE with programming, both because I've already MADE most of the obvious mistakes multiple times and dealing with it is just shoveling, and because _everybody_ is bad at this. Iterative pareto principle, 80/20 and then 80/20 what's left. Find checkpoints to commit that are better than what was there before.

But I need traction and momentum. Scrabbling across the surface and bouncing off because I can't get started is frustrating. I need to drive a crampon in to get started. I've spent far too long being five minutes from getting started. And if I don't know where my next checkpoint is because the code's being can-of-worms but people are already _using_ it...


June 16, 2023

Yay, the big endian fix for QEMU finally went in. (It's not specific to mips malta, it affects s390x as well.)

This morning's dental appointment was apparently just diagnostic, not pulling the bad molar. $400 worth of X-rays, and then they want to do $1500 of work on all the OTHER teeth. Which is way cheaper than I expected, to be honest, I got charged $8k to get my front teeth screwed up like this back in 2013, so probably double that for inflation since. (There's a reason I'm going to the place Fade gets her teeth done, it's the hospital attached to a medical school. Either they get graded on their work, or it's the teachers doing it.)


June 15, 2023

Ooh, new bug with variable expansion: ${0::0} is erroring in toysh but not in bash, which I noticed in my monthly-ish glance at the busybox list to see if they've hit anything interesting. (Most issues they mention don't apply to toybox, but every once in a while there's an interesting test case.) Which is EXTRA weird because in bash ${0:} errors but ${0::} does not. And nothing you put _in_ the slice math fields seems to produce an error, "echo ${0:=}" is fine, "echo ${0:]}", even "echo ${0:0/0}" which SHOULD throw a "division by zero" error...

I'm still wrestling with HERE document expansion line continuations being tricksy, turns out I can't just iterate over parse_word() since external quoting is irrelevant but INTERNAL quoting is still a thing. So ${PATH/"abc"} still has to match but quotes before or after are ignored (normal text having apostrophes in it does not suppress variable expansion). Except you can \$ the initial $, which seems to be the only backslash escape other than the terminal newline that counts, the rest are ignored/retained.

This raises a design issue: I only need to traverse and resolve line continuations (and glue together lines) once, but collecting HERE documents doesn't care about that. Resolving variables does. So when I DO have an unbalanced HERE document, the time to generate the error is when using it. For example, a HERE document in a function definition can contain anything, but when you CALL the function it would error if you have ${ without a concluding } in the HERE document. Checking each time you call the function is right for error reporting but expensive for gluing together line continuations.

Sigh, when I wrote expand_one_arg() the design assumption was the input was already sanitized, and changing it to reliably detect unbalanced ${} and such would be a lot of auditing work to get something I could trust. Scanning the input before calling it in HERE document expansion isn't hard, but the obvious place to do it is the WRONG place to do it. Grrr. I have multiple bad options here.

According to the thread with Chet, bash takes a different approach storing HERE documents as one big string. I'm keeping the individual lines we read in as long as I can to be nice to nommu systems, and having one big string still doesn't fix the "unfinished expansions cause error" problem. I guess as long as I need a traversal pass to spot those, HERE document traversal at expansion time is the right thing to do. The gluing-lines-together part should only do WORK the first time, and we need the rest for error detection.


June 14, 2023

Onna plane to minneapolis.

The man page for readv() and writev() says that they work just like read() and write() except for taking multiple buffers, which means they SHOULD perform a single atomic input/output in the non-interrupt case. So I don't actually need to do memmove() between dd blocks to convert input and output sizes, I can wrap around the end of the buffer using (at most 2) iovecs.
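
The wraparound idea, sketched with hypothetical names (not dd.c's actual plumbing; a real loop also has to handle short writes and EINTR):

#include <sys/uio.h>
#include <unistd.h>

// Write len bytes starting at offset pos in a circular buffer of
// size bytes, using at most two iovecs so data wrapping past the
// end goes out in one writev() call instead of getting memmove()d.
ssize_t write_wrapped(int fd, char *buf, size_t size, size_t pos, size_t len)
{
  struct iovec iov[2];
  int count = 1;

  iov[0].iov_base = buf+pos;
  if (pos+len <= size) iov[0].iov_len = len;
  else {
    iov[0].iov_len = size-pos;      // tail of the buffer...
    iov[1].iov_base = buf;          // ...then wrap to the start
    iov[1].iov_len = len-(size-pos);
    count = 2;
  }

  return writev(fd, iov, count);
}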

So the question then becomes, what's the error case where a short read or short write (which any signal can cause) leaves me with not enough data to do a write() of the requested size, but also not enough free BUFFER to do a read of the requested size? In theory the necessary buffer size is input+output, but for bs=1g that's... uncomfortable. In theory on a system with an mmu the physical pages should mostly remain unpopulated if I haven't touched them. (Modulo whatever weirdness transparent huge pages get up to, items allocated AFTER this in the heap and so on...) In practice, this is a fairly obscure error recovery path where input and output writes get out of sync.

Reading POSIX on the plane: bs= disables block aggregation, so allocating ibs+obs when either is specified makes sense. Posix says the default is ibs=512 and obs=512 so by default you DO have block aggregation that can get out of sync.

Posix says: "if the read returns less than a full block and the sync conversion is not specified, the resulting output block shall be the same size as the input block", which means if ibs=1g obs=512 then if you only read 800 megabytes instead of a full gigabyte, you perform a single output write of 800 megabytes, completely ignoring the 512 byte size request.

Sigh, there's aggregating short blocks and SPLITTING long blocks, I want to know when to do each, and it's not explaining the difference clearly here.


June 13, 2023

Flying back to minneapolis tomorrow, to dogsit while Fade's at 4th street and for my delayed dental work. Trying to finish stuff up here before then.

[Editorial note: screw it. I've been blocked on this for a couple weeks now, gaijin smash time.]

Sigh, I'm editing this on July 5th and here's what I left myself as notes to finish this entry:

I got invited to patreon video.

Trying to get a release out, 0.8.9 vs 0.8.10

Finish dd
  - testing with O_DIRECT pipes
  - has a TODO about buffer overflow

Thanks past me. The patreon video is a todo note-to-self because for a long while half the videos I watch on prudetube were creators complaining about prudetube and it just doesn't sound like fun, but I still haven't followed up on getting the tutorial videos I want to do hosted on that german peertube server instance the "topless topics" lady uses. I looked into hosting videos directly on patreon last year (along with tumblr and a few other places) but they hadn't properly rolled it out yet, and were pointing people at vimeo which was publicly exiting the video hosting space. (Their new business model is, like, corporate training videos? Or maybe just serving nothing but advertising, I forget. It seemed uninteresting, either way.) And I still need to update my patreon bank info for the new credit union.

The release note-to-self is obvious, the part about the version numbers means that the logical successor to 0.8.9 is 0.9.0 but that signals... at least getting dd promoted? Adding the LFS build? Adding command line editing and history to the shell so it FEELS functional even if that wouldn't significantly change the number of scripts it can run... I haven't quite EARNED a 0.9.0 yet, but I've held off releasing 0.8.10 for twice as long as I should in part because I didn't want to CALL it that.

The dd stuff is that 1) I know how to test it now: when you set O_DIRECT on a linux pipe it does NOT merge packets in the pipe buffer, which gives you a method to preserve read/write transaction sizes and actually TEST that all this funky dd blocking is actually doing what it claims. (This test will most likely fail on macos. I'm ok with that. The remaining question is HOW to set that flag on the pipe.)

And 2) is a note in the existing dd.c implying the buffer size it's allocating is insufficient. Since then I've decided to rewrite the actual copy-and-realign loop to use readv() and writev() because then I don't HAVE to realign anything! The kernel can do a single atomic I/O transaction from or to multiple userspace buffers! (Since posix 2001, apparently. I don't remember if that was susv3 or susv2...) So I don't need to do funky memcpy things when ibs!=obs to copy the data back to the start of the buffer so it can potentially be "topped off" by another read. (Yes of COURSE I should test this with prime numbers.)

Knowing how to do it and clearing headspace TO do it is a different matter: I need to CLOSE tabs. Jumping from thing to thing results in nothing getting checked in and all my time spent reverse engineering where I left off.

So yeah, spoilers. (Like half of tomorrow's entry is me working out what I just summarized there...)


June 12, 2023

I'm still subscribed to the coreutils mailing list (well they STILL haven't merged the cut -DF stuff they said they would, and said was still in progress last time I poked them; getting a Linux From Scratch build going is a good way to test new releases of that before they make it into debian), and recently they were talking about "dd count=3x4x5" syntax (an insanity required by posix, from the days before $((MATH)) was built into the shell), and I went "eh, it's not hard to implement" so I added it to the dd.c in pending. Which already had some outstanding cleanups from the last time I looked at it, and I sat down and did the REST of the review while I was there (I want to rewrite the main copying loop, and need to expand the test suite now I've figured out how to do the O_DIRECT pipe thing that can actually measure the input and output block sizes)...

I got to a point where I could check in the changes I'd made, so I ran make test_dd to make sure I hadn't introduced any regressions and... There's a test for count=0x2 hex notation. Because I'm using atolx() which automatically grabs hexadecimal prefixes. And implementing the multiplication thing BROKE that. Ah, THAT'S why I hadn't done that. Of course, I _could_ make both work together, since 0x means multiplying by zero if not interpreted as the hex prefix (something you probably never want to do in this context)...
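
Making both work is mostly a disambiguation rule, and strtoll() with base 0 already implements it. Something like this hypothetical helper (not what dd.c does, and ignoring the k/M/G suffixes a real one needs):

#include <stdlib.h>

// Parse dd's count=AxBxC multiplier syntax: base 0 makes strtoll()
// eat a leading 0x as a hex prefix itself, resolving the ambiguity
// in favor of hex (multiplying by zero being useless).
long long parse_blocks(char *s)
{
  long long result = 1;
  char *end;

  while (*s) {
    result *= strtoll(s, &end, 0);
    if (*end != 'x' && *end != 'X') break;
    s = end+1;
  }

  return result;
}

So "3x4x5" comes out 60, "0x2" comes out 2 (hex), and "2x0x3" is 2 times hex 3, which is 6.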


June 11, 2023

Still digging through HERE document variable expansion and I'm pondering the fact that int x, len; x = writeall(out, str, len = strlen(str)); if (x != len) barf; is subtly wrong, in that signed 32 bit integers max out at 2 gigabytes and a string on a 64 bit platform COULD be longer than that. I dunno if malloc() still has maximum contiguous chunk sizes it can return (my recent glib comment on the list to Elliott that malloc() should only fail for virtual address space exhaustion isn't TECHNICALLY true)...
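
A sketch of the failure mode (this writeall() is a hypothetical stand-in for the real one, the point is the int):

#include <string.h>
#include <unistd.h>

// Stand-in: loop until all len bytes are written or an error occurs.
static ssize_t writeall(int fd, void *buf, size_t len)
{
  size_t done = 0;
  ssize_t i;

  while (done < len) {
    i = write(fd, (char *)buf+done, len-done);
    if (i < 1) return i;
    done += i;
  }

  return done;
}

void here_write(int out, char *str)
{
  // On a 64 bit system strlen() can exceed 2 gigabytes, so storing it
  // in a signed 32 bit int truncates (or goes negative), and both the
  // length passed to writeall() and the x != len check go wrong.
  int x, len;  // should be ssize_t

  x = writeall(out, str, len = strlen(str));
  if (x != len) return;  // barf
}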

This is a category of "thing I do not look forward to auditing". Same as the "how big a string can getline() return" problem, where in THEORY you can tr '\0' ' ' < /dev/zero | sort and the OOM killer triggers. There's no reason a normal user CAN'T run :(){ :&:&};: and forkbomb the system. (That's what the container plumbing is for.) Outright ATTACKS against this code are currently in the "don't test for errors you don't know how to handle" bucket, but as we approach 1.0 anything with security implications is a thing.

In this case when I wrote the code the assumption was that string-too-big errors would either be misattributed as disk errors (passing a negative number to write() should return error) or silently truncate the string in the HERE document output, either of which is acceptable for an explicitly insane input. But I need some way to AUDIT this stuff. Which may just boil down to "grep for each call to each library function and check its user", tedious but not really that hard. Toybox is small on purpose, and I've always vaguely thought a pre-1.0 audit of EVERYTHING was probably a good idea.

But I still don't know how to handle the getline() issue. It's sort of definitional, I don't know what the correct behavior IS. Barf on any single line longer than some arbitrary length? This is where OOM returns are good, from a libc that knows how much memory the system/container has available and can use that as a frame of reference for "this is too big to load into memory".

Which reminds me that the pending get_next_line() rewrite in sh.c (to actually have command editing/history) also needs to do the fseek() trick so the buffered FILE * readahead that getline() inevitably does isn't reflected in the file descriptors that commands inherit. I think I already have that on my sh TODO list? It's probably in tests/sh.test already, I should check...
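
The fseek() trick itself, roughly (generic stdio, not sh.c's actual code; it only works when the input is seekable, which a pipe is not):

#include <stdio.h>
#include <sys/types.h>

// Read one line through a buffered FILE *, then push the readahead
// back so the underlying fd points just past that line and child
// processes inherit the position they'd naively expect.
ssize_t read_one_line(FILE *fp, char **line, size_t *size)
{
  ssize_t len = getline(line, size, fp);

  // fseeko() discards the stdio buffer and repositions the fd at the
  // stream's logical position.
  if (len >= 0) fseeko(fp, ftello(fp), SEEK_SET);

  return len;
}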


June 10, 2023

So HERE document variable expansion requires line continuations, which is REALLY AWKWARD. A test case is "${PATH//:/ }" can have a newline instead of a space in it, which is preserved AS a newline if you didn't escape it, so you get path components one per line instead of space separated. That's not the awkward part, that's actually useful (although $'\n' instead of a literal newline on input is easier to read). No, the problem is that variable expansion assumes the input already has all the line continuations resolved by parse_word(), to the point there's a couple places expand_arg_nobrace() doesn't do bounds checking because input without a sufficiently unquoted trailing } for each ${ can't make it that far.

Which means I need to iterate over unquoted HERE document input with parse_word() to detect when I need to glue lines together. (It's very much NOT a fast path, you should almost never need to glue lines together in HERE documents, but if I EVER need to do it I only have bad places to put it.) Bash doesn't detect unterminated variable expansions at parse time:

$ x() { cat<<EOF
> ${PATH
> EOF
> }
$ x
bash: ${PATH
: bad substitution

But the logical time to glue lines together is parse time. Once you've traversed a set of lines once and attached the ones needing continuation, you don't need to do it AGAIN, so doing it every expansion seems wasteful. I dowanna mark it as traversed, that's bad magic. Sharp poky outy bits, already too much magic I can't avoid, not doing it as an optimization. (In this context, "magic" is similar to the "you are not expected to understand this" comment in the original Unix. It's a thing that is not sufficiently obvious from reading the code. I myself spend too much time going "why did I do that...?" looking at my old code, it can't be easier for other people. When I say "simple" in the toybox design goals, half of that is "readable". Possibly that should be its own goal, but I haven't got a separate metric for it.)

On the bright side, HERE document quoting is simpler than I remembered, because not only are "EOF" and 'EOF' indistinguishable, but E\OF and EOF"" are all the same too. All quoting is removed for symbol matching, but if there was any quoting (at all, even a single backslash) then variables are NOT expanded in the HERE document. (Yes, this is horrifying. Hysterical raisins.)

Last month google couldn't grep for my name (autocorrected it to "langley" if not quoted, without saying it had done so; I think it added "misses" to the search and then ranked every single resulting page higher for what WOULD have been multiple pages if it hadn't gone endless autoscroll, but I'm just guessing at how they broke it). But I just googled for rasin to see if I got the spelling right (I've corrected it so many times over the years that I now correct it to be wrong when I _do_ get it right) and it did NOT go "did you mean raisin", instead it showed me the genre of haitian music and the Dragon Ball Z henchman, and the castor oil brand and the Rasin Foundation and the Czechoslovakian author with the eyebrows over the s, and the "image search" block it inserts is three pictures of raisins from shutterstock and such and one picture of the dragonball Z character...

Sigh: it's a defensible position I suppose, but now it's failing THE OTHER WAY. Pick one. I can come up with percussive maintenance workarounds to my workflow, but randomly shifting inconsistent behavior is... disconcerting. I still treat Google like a hammer, every time the head flies off mid-swing I double-take out of my workflow. I need to get used to the idea Google search is no longer solid, but... the reason I've never been able to work with Microsoft products is I need CONSISTENT failures. Go wrong the SAME WAY, predictably, and we can call it a feature. I'm happy to work with knives and fire as long as I _understand_ them. Reliable doesn't mean GOOD.


June 9, 2023

Sigh, I want to focus on the shell and work through the existing test suite in order, fixing each failing test until the entire test suite passes. It's a pain to add MORE tests to the test suite before doing this, because unless I put them right at the start they don't get run (because they're after the first existing failure), and in THEORY there is some sort of intelligible order to these tests (which probably vaguely reflects the order things are mentioned in the bash man page? Maybe? If nothing else, collating the HERE document tests and the flow control tests and the $((MATH)) tests and the variable resolution tests and the line continuation tests and so on...)

Then, logically, I'd take my various unfinished shell work branches and turn each one into a patch to review and finish like any other submission. Presumably reducing the number of outstanding branches.

My normal workflow also produces a zillion tangents, most of which are local stack push/pop that resolve themselves, but not always. For example reading through the HERE document variable resolution logic I just removed the "delete" argument from expand_one_arg() and made it clean up after itself instead, so it passes a local deletion list to expand_arg() and then frees every entry that isn't the string it's returning. All the existing callers except one were passing NULL for the list, and then freeing the returned string when they were done with it if it wasn't the same string they'd passed in. Which unfortunately leaks memory these days because variable resolution got more complicated and I added stuff like $((MATH)) which produces intermediate results that go on the deletion list, because it's not immediately clear what their lifespan should be. (When variable resolution produces a result it uses it until it replaces it. When we're not sure there are no other users of the thing we're replacing the deletion list lets us defer the free until we are sure. It also lets us mix copied and original data in other data structures: the delete list lets us know what to free. At various points like exiting a flow control block we know there can't be any users left, and we can traverse the associated deletion list. It's kind of very manual garbage collection, and the shell logic is full of it.)
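
The shape of the thing, as a minimal sketch (generic list code, not sh.c's actual structures):

#include <stdlib.h>

// A deletion list: intermediate strings get pushed here instead of
// freed immediately, because their lifespan isn't clear at the time.
struct delete_list {
  struct delete_list *next;
  char *data;
};

char *defer_free(struct delete_list **list, char *data)
{
  struct delete_list *new = malloc(sizeof(*new));

  new->next = *list;
  new->data = data;
  *list = new;

  return data;
}

// At a point where no other users can remain (exiting a flow control
// block, say), free everything except the string we're keeping.
void flush_deletes(struct delete_list **list, char *keep)
{
  struct delete_list *dl;

  while ((dl = *list)) {
    *list = dl->next;
    if (dl->data != keep) free(dl->data);
    free(dl);
  }
}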

I need to enable ASAN's memory leak detector so I can see if there's more stuff like I just fixed, and ALSO come up with test cases to see if it fails the other way: if a "while true" loop adds entries to a deletion list that aren't freed each time through the loop, memory can fill up until the OOM killer goes boink. It's ok to batch the frees, but not let them accumulate endlessly. If I hadn't just been able to audit all the callers of the function whose semantics I was changing to make sure each one was good about freeing the returned value that ISN'T going on a deletion list, I would have left myself either breaking or non-breaking TODO entries in the code where I didn't have the spoons to traverse and prune that logical branch just now. A non-breaking TODO entry is a comment with TODO I can search for later, things like possible memory leaks or missing features that don't stop me from testing it: it WORKS, it's just not RIGHT. Current examples from sh.c include "// TODO ctrl-Z suspend should stop script" and in syntax_err() "// TODO: script@line only for script not interactive". (I.E. the error message shouldn't include $LINENO when you're typing individual commands at a shell prompt.) A breaking TODO entry is an UNCOMMENTED specific nonsense word thrown into the code so I can search for it, and also so trying to compile the code will point out the line number where I haven't fixed a thing yet. That's a "must fix this before checking in" indicator.

Unfortunately, the new issues people are submitting to me when they try to use the shell, and the issues that come up talking to Chet, are not remotely in test suite order. They're GOOD INPUT, I'm happy to have them and want to fix all of them ASAP, but they're random potshots from left field, and there's a certain amount of drowning. This is a deficiency on my part, I know, and yet. I'm no longer an overcaffeinated 25 year old. These days I do 2 hours of programming and have to stand up from the keyboard. Well, ok, I still occasionally look up and 6 hours have passed and I had no idea, or my laptop suddenly suspends because the battery ran out. But getting into the zone like that is harder than it used to be. Spinning multiple plates leaves me with the nagging feeling that focusing on anything is letting down everything _else_ I should be spending the time on instead.

I'm very grateful to Android for letting me focus on toybox, and winding down the Japan stuff was a choice clearing more time for that. (Admittedly Mike put his thumb on the scales there.) But it's opportunity cost as far as the eye can see. Android wants a hermetic build but I want to go beyond that and make AOSP fully self-hosting. Making a self-hosting system has kernel and toolchain work too. (I have RESISTED poking at the kernel's nolibc or similar projects: the C library is NOT MY AREA. Even though I'm pretty sure I could write "just enough for toybox" in a couple weeks and need to go there at least somewhat to make strace work right... Ahem.) My old busybox work was driven not just by aboriginal but by building Linux From Scratch under the result, and I haven't poked at that in months (there's a new version out already). I need to get a toybox release out, but SO MANY TABS that I could _almost_ close and get in... I should make travel arrangements for the mkroot talk in taiwan, and mkroot currently semi-assumes a kernel patch stack I haven't updated or tried to chase upstream in a while, and there's QEMU work I should do: the -kernel loader in half the targets can't boot from a vmlinux, there's an outstanding mips patch, my "simplest possible linux system" talk in 2017 included some hello world kernel examples and I have a pure C one now that I really should genericize for the different QEMU architectures and explain about stage 1 vs stage 2 bootloaders (the difference is DRAM init and relocation out of read-only memory and SRAM, Wolfgang Denk's refusal to make dram init optional is why you couldn't run u-boot under QEMU for a long time), and in THIS tab is where I was doing builds of each historical QEMU release until I found the last one that could actually boot Linux 0.0.1 and figure out what they broke... (If I redo the simplest possible Linux system talk I wanted to show that...)

Ahem. Tangents. Focus. People send me bug reports, I prioritize them, but getting to the far end of long-term plans while supporting a userbase with conflicting needs turns out to be hard.

I brought an umbrella to the table this time, and then half an hour later there was MUCH LIGHTNING. Umbrella does not protect against lightning, and all the buildings are closed for the summer. (It's between summer sessions so the university is shut down fairly hard at the moment.) Packed up and walked back home again before the storm could actually reach me, but the evening did not produce the block of productive work time I wanted...


June 8, 2023

Got the big lump of shell work checked in and now I have many, many tabs open with tests in them that I should deduplicate and marshal into tests/sh.test, plus I need to reply to pending email from Chet. And at some point come up for air and look at what I've been ignoring in github requests and so on while I've been head down on this thing. Except... I'm still doing HERE document variable expansion wrong.

I watched some "first time seeing" reaction videos to Moana, and the comments pointed out a bunch of things I hadn't noticed (the heart of Te Fiti was keeping the grandmother alive, she passed it off knowing she would die; half the music is done by a New Zealand group called "Te Vaka"; and they linked to some other good details).

But I ALSO noticed that Lin-Manuel Miranda saw the Disney "Heroes get I Want songs, villains get I Am songs", and gave Maui and Tamatoa their "I Am" songs, but started Moana receiving a "You Are" song from her father (which she had to work to overcome), then she got her "I Want" song (How Far I'll Go), then the ancestors sang a "We are" song at her, then after she rejected the call and got grandmothered back onto the path she had a song literally ending with "I Am Moana", and then at the climax she sang a "You Are" song. At God.

This was very "hold my beer" of him. (And in Anime terms, instead of fighting god and killing them, you fight god and defeat = friendship. Percussive maintenance factory reset, returning fire to the gods...)


June 7, 2023

There is apparently some sort of National Smoke Emergency, like the 1930s Dust Bowl except fire instead of airborne topsoil blowing away. The topsoil blew away because steel plows were a marvelous new technology with no possible downsides, and nobody cared what the native americans who had been terraforming this "lush wilderness" for tens of thousands of years thought about sustainability. They organized forests instead of orchards, herds of semi-tame bison you didn't even have to hunt, and salmon runs where you could scoop up baskets full of fish. Since nobody OWNED that stuff, it must have just _happened_ instead of being the exact opposite of "the tragedy of the commons", which appears to be a specific failure mode of the british upper classes (just like "lord of the flies" was). Some cultures really do work for the collective good and look out for their fellow human, and others steal anything that isn't nailed down, mug children, rape women and say "she was asking for it, look how she was dressed" afterwards... Did you know europeans defected to join the native americans on a regular basis? Just walked away from "civilization" to join the superior culture. It was called "going native", and was hushed up by the rich landowners it embarrassed. There's a reason rich slaveowners aimed so much genocide at cultures that DIDN'T orbit around exploitative capitalism, the side by side comparison made them look so bad quite a lot of people switched sides.

Anyway, having a smoke-heavy cookout last night was... bad timing. Fuzzy still has a sore throat, and I've had a headache all day.

Alright, what's still broken in the backslash and HERE document change... the <<< operator isn't adding a newline to its output. The line continuation can of worms...

A P.S. I removed from a mailing list post because it waxes unfortunately political: It would be nice if there was a janitorial community that made cleaned up versions of simple tools WITHOUT turning into survivalist preppers. I don't want to poke the git devs about bug du jour any more than I want to poke the kernel devs about my quarterly patch list, but when I look around at groups like "suckless" or "less wrong" they somehow manage to fail in the _other direction_. Libertarians tearing down society, atheism becoming a religion (firm belief in nothing is still firm belief: zero is a number) that funnels people into incel nazi spaces. What's the old Nietzsche quote, "you become what you fight"? I want something like XV6 that could actually be a sustainable auditable load bearing base layer with no external dependencies, without falling into either the microkernel trap or having the "minix problem" of refusing to be real-world useful because it's "just a teaching tool". Yeah, it's hard to figure out where to draw the line, but I keep seeing posts about "software manifests" from corporate types and going "you're SO CLOSE, you could REMOVE dependencies and actually SIMPLIFY..." But no...


June 6, 2023

Another HERE document corner case is that toysh is parsing 'cat<<EOF' as one word, and bash is making it three words. Which is possibly related to "echo abc<(true)" outputting "abc/dev/fd/63" in bash? (Yes I found a use case for it: if /dev is mounted in a subdirectory. Some horrible thing I was doing with a chroot I think.) So multiple words, but not with SPACES between them. It's doing redirects according to variable expansion logic. And then there's:

$ echo 1<2
bash: 2: No such file or directory
$ touch 2
$ echo 1<2
bash: echo: write error: Bad file descriptor
$ ls 1
ls: cannot access '1': No such file or directory
$ echo abc<2
abc

The redirect prefix logic (apparently?) only triggers at the start of a word, so that 2>&1 stuff has to be its own word. And yes that includes the abc{def}<2 assign-to-variable prefix, still only triggers at the start of a word. So I was partly right and partly wrong. I THINK what I need to do is move the variable expansion logic out of expand_redir() into its own function, and then both call it from there (handling the prefixes) and from expand_arg_nobrace()? Except expand_redir() gets an unredirect list (well, resizeable double entry array) as an argument, and expand_arg_nobrace() does not. And in quite a number of the contexts we expand arguments FROM, redirection is not an appropriate operation. Urgh, I REALLY dowanna add another argument (it's got six, this is turn it into a structure time) and another NO_BLAH flag.

Ok, I have two sets of code that are traversing the nested quote contexts: parse_word() and expand_arg_nobrace(). (Which includes backslash escapes, as echo 1\<2 outputs 1<2 instead of redirecting, and echo 1\<<2 redirects from the file "2" instead of a HERE document, and yes I should have tests...)

So should parse_word() break an unescaped < into its own token (which means we lose the "was there a space before this" information for "echo abc<(def)" but I already knew we were getting that one wrong), or try to push redirection down into expand_arg_nobrace() which means it has to feed back the undo information to its caller? Meaning I'd need to change all the callers, and then an error path if you try to redirect somewhere we passed in a NULL for urd? Urgh, I just wanna get to a point I can check in the trailing backslash rewrite. I'm not trying to find MORE reasons to do major surgery...

Alright, parse_word() is already using the redirectors[] list, it's just doing so only at the start of a word. It needs to do so at any unquoted point within the word (breaking like parentheses would). The reason I DIDN'T is the skip_redir_prefix() stuff only happens at the start of the word, so yes there are two instances of redirection detection that behave slightly differently. I can have parse_word() split at recognized (unprefixed) redirectors, and declare the abc<(blah) case a known divergence from bash. Not happy about it, but the alternative is an unbounded amount of rewriting...

Fuzzy bought a Red Snapper (very tasty, according to UHF), and we tried to grill it over a fire but an hour and a half of fire building didn't result in noticeable coals: everything's too wet, and too big. The kindling left over from last year decomposed noticeably since we collected it. We got a lot of smoke but not a lot of fire. Even with a bunch of newspaper under it, the wood hissed constantly.

Eventually Fuzzy pulled the charcoal briquettes out of the shed and cooked the fish in the actual charcoal grill. It was indeed very tasty. Both of us have sore throats from the smoke.


June 5, 2023

While walking to the table at UT last night, I got some news that at least gave me closure on the projects I was visiting Japan for, and let me uninstall Signal from my phone. Now I'm wondering about my attempts to learn japanese, which has turned into a large anime to-watch queue since "walk for exercise, exposure to japanese, and entertainment" is more compelling multitasking than most of my other viewing options. I would still LIKE to understand japanese, but no longer have a specific use for it. (I mean, I can go vacation there, but I don't really KNOW anybody?) The talk in Taiwan is now also just a round trip to give a single talk, and then straight back home again. More time spent traveling than at the destination, I think.

I think I've worked through the trailing backslash line continuation changes, but the HERE document processing is still broken, maybe even MORE broken now, the backslash changes broke EOF token matching because it's got a newline on it now and echo -en 'cat<<X\nabc\nX' should match without a trailing newline on the X, yes I need to add a test...

The problem is, do I check in the backslash continuation changes by themselves, even if it causes regressions elsewhere? I'm sort of working on one big lump of stuff that it's hard to break into chunks, which means it's hard to get good stopping points where I can check things in. But that's how I wound up with so many orphaned development branches that never quite made it into the main line of development.

I got caught in a thunderstorm on the way home. Note to self: googling "austin weather" still gives an hour-by-hour expected precipitation timeline in the Google search result, but it is COMPLETELY USELESS. It said less than 10% chance of 0.01 inches of precipitation all night half an hour before I left the table, and then there was a DOWNPOUR with lightning before I even made it off campus. So I stood under a small awning for 2 and 1/2 hours, as the weather alternated between "merely raining" and "water is running down the brick wall on the INSIDE of the awning and splashing on me from that side due to electrical cables for the lighting above the door". And of course I had my new laptop in my backpack, and was holding my phone under my chin, hoping neither got soaked through enough to short out. (Yes, this ordeal started about half an hour after I was pondering whether or not to check in those changes. I had plenty of time to ponder the irony.) Eventually got enough of a gap I could run to a parking garage with better roof coverage, and at least sit down for a while.

Got home after sunrise. My sleep schedule is totally borked.


June 4, 2023

I'm encountering comments on a daily basis by everyone from famous authors to my own wife about how Google's services are deteriorating, and it worries me. I don't even bother to track the youtube complaints anymore, although at least they walked that last one back. Their "let's delete the majority of all youtube videos ever posted" policy got modified a few days later when enough people pointed out to them what they were about to do. Now they're just going to delete grandma's photos of her kids and people's school records and so on. (Never trust any online service to retain your data. Once they've sold it to advertisers they don't care anymore. You are the product, not the customer.)

I would miss Google. I really really REALLY don't want to be forced to use services from Microsoft, Apple, Amazon, Faceboot, or any of the other "walled hellscape" model late stage capitalist nonsense. Google has its faults, but it historically started from "don't be evil" and whatever its trajectory since then, its competitors have all been boiling the same frogs from a far worse starting point, without Google's history of employees pushing back. But that was before Google laid off 12k people at the start of the year, at least some of whom appear to have been load bearing. (They're blaming the search degradation on large language models producing extra SEO spam, but that doesn't explain why they stopped being able to find 10 year old resources that hadn't changed location. As with youtube, they valued "new" over "old" until the established stuff became unfindable.)

Capitalism destroys. It consumes like fire. And late stage capitalism is where it's eating _itself_, in some weird attempt to profit-via-potlatch. When HBO got bought by an octogenarian right-wing billionaire and started deleting its own catalog, I thought that was dumb. But now Disney is doing it. I had "little demon" bookmarked as a series to watch on Hulu (Episode 1 summary: "Chrissy Feinberg's first day of seventh grade goes south when she discovers she's the Antichrist") and it's gone now. DISNEY OWNS IT (when they bought Fox they wound up owning both FX networks and Hulu). It was hulu-exclusive content created by Hulu's parent company, which is now available nowhere. Ka-ching? Youtube declaring their intention to join that parade wasn't _surprising_, but they DID walk it back. Google does still listen, at least sometimes. That's why their core business rapidly and obviously deteriorating worries me: they would be _missed_. (And yes the whole apple vs android thing I've been pushing for over a decade now too.)

Sigh, a common failure mode with petty criminals is smashing a thousand dollar plate glass window to steal a hundred dollars of display items behind it. The billionaire equivalent used to be strip mining and slum lords, but now they've moved on to "imposing disproportionate externalities" upon the intellectual property world. And when you dig for reasons, "some billionaire's personal fee-fees" motivate far too much. The general consensus around why the batgirl movie was destroyed (completed but not released) is because the actress was black and the billionaire who bought it was racist. Disney's purge is attributed to the writer's strike, and the desire to punish the strikers by ceasing to pay residuals. Of course residuals are a fraction of the money they MAKE from the program, so they have to give up a big share to deny other people a small share. A tiny fraction of hurt goes to people who annoyed them, a bigger hurt goes to themselves and their customer base. But of course Disney was one of the first and worst offenders here, inventing the legal fiction that they could acquire properties like Star Wars without acquiring the obligations that came with them (to pay royalties to authors) and then using their size to stonewall starving artists until they finished starving.

The "Citizens United" decision is 13 years old, Mitt Romney's corporations are people was a year later (implying he should have faced murder charges for Toys R' Us), and they've packed the supreme court since until it's completely dysfunctional. The nominal opposition is literally senile, meaning this is unlikely to end without guillotines, which can only happen over the Boomers' dead bodies. And so we wait. Opinions vary on how long we wait, but we all know how societal change works, it "progresses one funeral at a time". I would LOVE to be wrong about that.


June 3, 2023

Sigh, I'm trying to redo the toysh line input processing. The trailing \ logic is different from what bash does (ongoing thread on the list with chet about that), and I _also_ have a hack I've been meaning to clean up: because parse_line ignores completely blank lines, I'm having EOF feed in single space " " lines to flush pending line continuations, which is wrong for multiple reasons. For one, toybox sh -c 'echo hello\' has a space on the end instead of a backslash, which is TWO bugs (\ gets eaten, space gets added). For another, you can have MULTIPLE pending line continuations, and a single EOF line won't necessarily flush them. Except we return 1 when we need another line, and there's different reasons for needing another line, which behave differently: unterminated if or || flow control errors out, but unterminated HERE documents are terminated by EOF (with a warning in bash, silently in the defective annoying shell). And you can have more than one HERE document pending at the same time: cat << EOF1; cat << EOF2 or even just cat << EOF1 << EOF2 (no they don't append, stdin gets redirected twice so the first one is dropped, but 3<<EOF would let you read from fd 3... Oh, and HERE documents are seekable, I should add a test for that. It writes to a deleted temp file to get a seekable filehandle that frees its contents automatically when closed. Classic unix filesystem semantics...)
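
That last trick is small enough to show (mkstemp() flavor, a sketch rather than toysh's actual code):

#include <stdlib.h>
#include <unistd.h>

// Get a seekable fd whose storage vanishes when the last fd closes:
// create a file, then immediately delete its directory entry.
int here_document_fd(char *contents, size_t len)
{
  char name[] = "/tmp/sh-hereXXXXXX";
  int fd = mkstemp(name);

  if (fd == -1) return -1;
  unlink(name);            // no name left, the fd keeps the data alive
  write(fd, contents, len);
  lseek(fd, 0, SEEK_SET);  // rewind so the command reads from the top
  return fd;
}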

My line input logic was removing trailing newlines right at the start, but I can't do that because an escape at the end of a line that was NOT ended with a newline gets preserved. (Well, not RELIABLY by bash, but I poked Chet about that. The -c processing is still magic.) So now I've got to propagate that \n through and it's essentially trailing whitespace which I'm already mostly handling, but something somewhere's likely to break. Plus NULL pointer, empty string, and strings that only contain whitespace being DIFFERENT is why that "send in a line with a space in it" hack happened in the first place...

I'm also patching the HERE document logic to terminate all outstanding HERE documents at EOF. It can still return "nope, I need more" for unterminated flow control, at which point the caller errors out because there is no more, but the caller can't distinguish "need more HERE document lines" from "need more flow control logic" (it just gets a 1 asking for another line), so the EOF termination has to happen within parse_line(). And I've had to add multiple goto statements to get it to work because the existing logic really isn't set up to turn into a loop. Multiple gotos are not elegant, it means there should be a loop here which would require major surgery to insert...


June 2, 2023

Multiple people are now trying to use toysh and sending me bug reports, but what I _really_ need to do is grind through the "ASAN=1 make test_sh" bugs because every time I hit an issue or major todo item I try to throw a test in there which bash passes and thus toysh probably _should_. And there are a whole lot of existing tests toysh doesn't pass yet, which makes adding new ones awkward. (I keep sticking them near the start so they trigger, but there should be some logical order to all this...)

The next test_sh failure is a double free when a command comes after a HERE document, ala ASAN=1 make sh && ./sh -c '<<0;echo hello' which did print the hello! It didn't warn that the HERE document hit EOF, but I can presumably add that.

ASAN says the second free happened on line 2923... which is in the function free_pipeline() so yes it would, wouldn't it? This being gcc, it doesn't say who CALLED that function, because gcc's ASAN is crap. And I can't use the Android NDK's ASAN because it's only available as a dynamic library, and if I dynamically link against bionic it's not available on the host so the binaries won't run. And you can't LD_LIBRARY_PATH your way around the dynamic loader being /system/bin/linker64. Elliott suggested I could symlink /system to somewhere in the NDK, but find android-ndk-r25c -name linker64 produced zero hits.

So backing up, the first free was in do_source() which calls llist_traverse(pl, free_pipeline) after run_lines() returns. Because we've executed all the stuff and it's done with it now... Ah, and then it frees the HERE document but the last entry in that is the EOF which was one of the arguments to the earlier command line that already got freed. (Because why copy it when we're

I need to enable the leak detector, and whitelist the EXPECTED leaks at exit. I know how to do the first, not sure how to do the second. Other than writing a debug function to laboriously free stuff the OS is about to free for us. I'm worried about accumulating leaks during long runs, not blocks of data with the same lifetime as the process. I want some sort of leak_forget() function that says "anything that's already been allocated is not interesting for leak detection, only show me NEW allocations after this point that don't get freed".
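
As far as I can tell clang's LeakSanitizer gets partway there: there's no bulk "forget everything allocated so far", but <sanitizer/lsan_interface.h> has __lsan_ignore_object() to whitelist individual allocations and __lsan_do_recoverable_leak_check() to report leaks mid-run without exiting. Untested sketch, and it assumes an LLVM toolchain (gcc's sanitizer support varies):

#include <stdlib.h>
#include <sanitizer/lsan_interface.h>

// Build with: cc -fsanitize=address demo.c (leak checking rides along)
int main(void)
{
  void *keep = malloc(64);             // same lifetime as the process
  __lsan_ignore_object(keep);          // whitelisted: never reported

  void *oops = malloc(128);
  oops = 0;                            // now unreachable: an actual leak
  __lsan_do_recoverable_leak_check();  // report leaks so far, keep going
  return 0;
}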

Alas, gcc's ASAN is abandoned crap, and the LLVM toolchain I have wants dynamic bionic installed on the host, and will NOT work static linked (for no obvious reason other than they either didn't think of it or didn't bother). So puppy eyes about adding stuff would add it to a context I can't use anyway.

Hmmm, I should try getting a dynamic bionic chroot working again. In theory the stdin panic fix in the _start code has made it into the release version by now? (I should really learn to build the NDK from source. Too many tangent ratholes...)


June 1, 2023

Email from Chet: my trailing backslash line parsing is wrong in toybox (or at least doesn't match bash). I knew my line parsing was wrong and I'd have to redo it, but it turns out it's wrong in more ways than I was aware of. Hmmm...

Also bash -c 'cat<

Oh, and no matter how you fiddle with the priority, HERE documents always seem to eat their lines before line continuation logic does:

$ if cat << EOF; then
> blah
> EOF
> echo hello; fi
blah
hello

Which toysh is already getting right, but I want to make sure I have tests for. And also:

$ if [ $(cat) == blah ]; then echo hello
> fi << EOF
> blah
> EOF
hello

I.E. your REASON for requesting line continuation can vary from line to line, based on parsing the new input. And that $(cat) can't be evaluated until the trailing redirect has replaced stdin for the whole block, which I'm already getting right but the new changes can't break that, hence regression testing...

I mean, if you REALLY want to go down the rathole here:

$ bash -c 'echo $LINENO'
0
$ bash -c $'\n\n\necho $LINENO'
3
$ echo 'echo $LINENO' > weeb
$ bash -c '. weeb;. weeb;echo $LINENO'
1
1
0
$ bash -c $'. weeb;. weeb;echo $(eval $\'echo $LINENO\\necho $LINENO\');echo $LINENO'
1
1
1 2
0
$ bash -c $'. weeb\n. weeb\necho $(eval $\'echo $LINENO\\necho $LINENO\');echo $LINENO'
1
1
3 4
2
$ bash -c $'. weeb\n. weeb\neval $\'echo $LINENO\\necho $LINENO\';echo $LINENO'
1
1
2
3
2

That's why each "do_source()" has its own pseudo-function context, because LINENO is often a local variable even without a function call, which sometimes gets reset and sometimes gets inherited as you enter/exit each new parsing context, and I need tests for all of it...


May 31, 2023

Flew back to Austin first thing in the morning, early enough in the day I could hang out with new/old laptop at Wendy's and the HEB tables.

Finally got the ls --sort tests in, and fixed more than one bug found by them.

For example --sort can handle csv arguments (in toybox, not in gnu/dammit), but when I fed it more than one it sometimes looped endlessly: once a comparison has already matched you don't do further sorts, which meant it wasn't advancing past the arguments it wasn't processing, but it still needs to CHECK those later arguments to see if there's a "reverse" in there. So it was looping without advancing... (There's also "unsorted" which stops argument processing despite not having matched, but I can't TEST unsorted because it means the filesystem order leaks through and I can't control what that IS)...
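
The shape of the fix, sketched generically (this is not the actual ls.c code, and compare_key() is a made up stand-in): keep consuming the comma separated list even after the comparison is decided, because a later "reverse" still matters.

#include <string.h>

// Hypothetical stand-in for the real per-key comparisons.
int compare_key(char *key, char *a, char *b)
{
  if (!strcmp(key, "name")) return strcmp(a, b);

  return 0;  // keys this sketch doesn't know decide nothing
}

int sort_compare(char *csv, char *a, char *b)
{
  int result = 0, reverse = 0;
  char *key;

  // strsep() consumes its argument (csv must be writable) and always
  // advances, so a key that already matched can't wedge the loop, but
  // every remaining key still gets LOOKED at.
  while ((key = strsep(&csv, ","))) {
    if (!strcmp(key, "reverse")) reverse = 1;
    else if (!result) result = compare_key(key, a, b);
  }

  return reverse ? -result : result;
}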

Yesterday the dentist said the tooth should come out, and responded to my "it doesn't hurt" with a poke with a tool demonstrating that the nerve is alive and well and not protected by much and capable of hurting a VERY LARGE AMOUNT QUITE SUDDENLY.

This would be the SEVENTH tooth I've lost. (Four wisdom teeth and two on top next to the incisors removed presumably for cosmetic reasons as part of the braces years ago? I had a somewhat pronounced overbite as a teenager. My parents were really making those decisions at the time, I was in early high school.)

He didn't quite come out and say it, but when my wisdom teeth were removed this tooth was left without a matching tooth to chew against, and that's apparently bad for teeth. Which means I'm losing this one because they took out too many wisdom teeth back in the day. Those two removed in the front meant the braces shifted the rest forward, which made room to KEEP the top two wisdom teeth (which I pointed out at the time) and the dentists went "no, you don't want to leave a tooth with no matching tooth for it to work against"... but they DID, didn't they?

Add in the TMJ I got because the braces for some INSANE reason involved a rubber band from my upper left to the lower right (across my tongue) for 6 months, and this means basically every major experience I've had with american dentistry has caused future problems. The braces made my jaw click and grind, the previous tooth removals left me with an orphan tooth that's now collapsing, and the 2013 experience in St. Paul left two of my front teeth looking obviously terrible. The cost was probably somewhere around ten grand each time (if not more adjusted for inflation)...

Anyway, had him grind off the pointy bit (which didn't hurt for about five seconds and then essentially repeated the first poke; he is a VERY good dentist to not have done further damage inside my mouth when I lurched like that, but hey: my cheek can heal now). And then I scheduled a follow-up appointment for a month from now. I was thinking of flying back to dogsit while Fade's at 4th street anyway. (Previous years she flew back from Austin and left Adverb with us, but she's staying in Minneapolis this summer to finish her dissertation so she can defend it in August, which is why I'm flying there to see her so many times this year.)


May 30, 2023

I may have gone a little overboard. Somebody emailed me to track down a citation, and I replied with my usual "THIS IS A SPECIAL INTEREST OF MINE" enthusiasm:

> Is this yours?
> https://landley.net/history/mirror/cpm/history.html
> I'm citing some of it in a book I'm writing and I wanted to make sure it was you.

As the "mirror" states, it's a copy of an old page from geocities. Here's the original pulled out of archive.org.

Back in 1984 Gary Kildall was one of the original co-hosts of the TV show "computer chronicles" (until he became too busy with his company to continue). Here's an episode he co-hosted on "programming languages", and here's an episode on "operating systems". When Kildall died, the show did a retrospective on him.

Here's the "standard" interview with him (I have a copy of this book). And here's another computer industry pioneer reminiscing about him.

> Also, Im going a little beyond what's on that page, and might you be able to
> confirm it's (more or less!) accurate, please?
> Thanks
> [NAME]
>
> In 1974 Gary Kildall, co-founder (with his wife) of Digital Research, personally
> created CP/M, which became the standard operating system for 1970s personal
> computers.

You should really watch the PBS series Triumph of the Nerds (which is based on the book "Accidental Empires", the presenter is the book's author).

CP/M only became the standard operating system for "S-100" systems. (Here's a song Frank Hayes, columnist for ComputerWorld, wrote/performed about the S-100 bus. Yes, it's a "C shanty". From the album "never set the cat on fire".)

The "PC vs Mac" of the day was Apple II vs S/100 systems (which started as clones of the Dec Altair: MITS manufacturing couldn't keep up with demand but they shipped a full schematic with every system were using off the shelf parts, so other people bought the parts and assembled them according to the schematic, and then started making improvements).

The company "Imsai" (that's the computer the protagonist of the movie "Wargames" had in his bedroom) convinced Kildall to break his OS into two parts (BDOS and BIOS, Basic Disk Operating System and Basic Input Output System), with the BIOS essentially being a driver package provided by the hardware manufacturers so the same BDOS could talk to disk and console hardware. That way, ALL CP/M machines could run from the same floppy disk, rather than having separate disks for each manufacturer.

All that was 8-bit, and since 1979 Kildall had been chasing multiprocessing (MP/M) as the next big thing, about 20 years too early: the cost of memory was a big limiting factor, so at the time running multiple programs in parallel on the same machine wasn't _that_ much cheaper than buying multiple machines and networking them. (Although S-100 systems didn't have a "motherboard": memory expansion was on cards, which you could keep adding as long as you had slots, and even the CPU was on a card, so a multi-processor system with two CPU cards wasn't a far-fetched idea. The trick was making it WORK...) So he basically ignored the 16 bit 8086 for the first couple years.

But a guy named Tim Paterson at Seattle Computer Products was working on a new 8086 board which was intended to run CP/M, and since DR hadn't shipped a 16 bit version yet he bought an off the shelf CP/M manual and implemented 16 bit versions of the system calls it listed so he had something to test the hardware with, calling the result "QDOS" (a play on BDOS, Quick and Dirty Operating System).

Tim had previously worked a summer job for Microsoft where he created their first hardware product (the SoftCard, a Z-80 processor card for the Apple II, allowing it to run CP/M), and when Paul Allen realized that IBM's project Acorn was basically a 16 bit CP/M machine he and Gates threw $50k at Tim (split with his employer) to buy QDOS from him, which they renamed "DOS 1.0"...

> But the first versions of CP/M, like the early personal computers,
> had very limited functionality: the first version merely supported
> single-tasking on 8-bit microprocessors and no more than 64 kilobytes of memory.

8-bit machines all had only 64 kilobytes of memory, and hacks like "bank switching" historically never made much difference. CP/M was about the best you could do on that generation of hardware. Paul Allen thought the PC that IBM was developing could do better and wanted to run Unix on it, so he licensed Unix from AT&T and contracted a small 2-man garage outfit called SCO (the Santa Cruz Operation) to port it to the Intel 8086 and Motorola 68000 processors (because IBM hadn't decided which it would go with yet), and called it "xenix" to indicate "we'll port it anywhere IBM needs it to go".

Then they signed the NDA and got the hardware specs of the original IBM PC. (IBM wanted to put Microsoft's BASIC in ROM as the PC's built-in software, like the Commodore 64 and so on, and IBM's CEO, who was on the board of the United Way with the mother of William H. Gates III, "trey" to his friends, made an exception for "Mary's Boy". Microsoft was too small to qualify as an IBM vendor normally; that part is in the book "Big Blues" about the history of IBM, by the way.) Anyway, the IBM PC specs read "16k of ram expandable to 64k if you pay extra, and the ISA bus is just the S-100 bus with unused wires removed" (as in there were literally adapters that plugged the bigger cards into the smaller slots, no electrical or timing fiddling required, just shifting wires over)...

Paul Allen went "oh: you're going to run CP/M on it". But he and Gates had already expanded their ambitions to sell a bigger OS to IBM, and Gates said he knew Kildall and offered to set up the meeting with IBM. Whereupon SOMEHOW Kildall got the impression that the meeting was in the afternoon but IBM got the impression that the meeting was in late morning, so Kildall was off at the airport flying his airplane (a Cessna, probably?) to cool his nerves, and when the IBM guys unexpectedly showed up at his house (he worked from home) his wife panicked and called Gates, who suggested that the company lawyer look over the NDA while Kildall got back from the airport. As lawyers do, he went "ew" and started negotiating terms, and since they refused to sign it as-is the IBM guys went away empty handed before Kildall even got back from the airport, and the whole meeting was set back weeks...

Which gave Paul time to contact Tim Paterson and scrape up $50k to buy QDOS and offer to be IBM's "second source" with "their" 16 bit CP/M clone (filing off the Q and renaming it MS-DOS). IBM did the PC after their salesbeings saw Apple II running Visicalc on secretaries' desks when they went to meet with executives in otherwise pure IBM shops, and after allowing Digital Equipment Corporation and the PDP-1 to live (creating the minicomputer ecosystem) they vowed NEVER AGAIN. They estimated they had a year to flood the market and smother Apple before it got entrenched, but an internal process audit had just measured that it took them 9 months to ship an empty box, so they had NO TIME to make the new product and get it to market. The head of the Boca Raton department offered to make one out of off the shelf third party parts they could order in volume with a phone call, which is NOT how IBM normally did things, but this was an emergency and the CEO personally granted absolution and indulgences to the Boca team. IBM was a monopoly used to squeezing customers, so they carefully made sure none of these new suppliers could ever do monopoly leverage against IBM, by ensuring there was a second source for EVERYTHING, with their one unique contribution being the BIOS ROM (the thing Compaq clean room cloned). They hadn't been second sourcing the software (just the hardware), but hey: good idea! Another CP/M, sure thing.

Meanwhile, Kildall was a Navy instructor before he started Digital Research, which meant he knew about being a vendor to big bureaucratic institutions, and wasn't really keen on going there. It's lots of money, but most of it's pie-in-the-sky someday money after jumping through lots of hoops and years of delay, and to get there you need a dozen full-time staff just to navigate the bureaucracy. His company was a couple people running out of his house: he'd take free money if IBM offered it, but he already had a CP/M ecosystem built around the stuff he was already selling to existing customers, and this new thing was either part of the S-100 family or it wasn't.

So when the PC shipped, CP/M-86 was late, and when it arrived it cost several times what Microsoft priced DOS at. But the real nail in the coffin was that Paul Allen didn't give up on his dream of having this machine run Unix. Each new release (PC, XT, AT) could support more memory, and the 8086 processor could physically address up to a full megabyte. (The DOS 640k barrier was because they'd arbitrarily mapped I/O memory at 10x the original PC's memory capacity: 2/3 for RAM, 1/3 for I/O memory space. You had to move the VGA card's memory window in order to use more contiguous address space in your application, and even then you don't get ALL the space because I/O memory is still needed.)

DOS 1.0 was a bug-for-bug clone of CP/M (well, a 16-bit port of an 8 bit system, but otherwise identical). But for the DOS 2.0 release, Paul Allen added as many Unix features to MS-DOS as he could. You could now use filehandles instead of file control blocks, and stdin/stdout/stderr were filehandles now. He added unix-style subdirectories, although DOS 2.0 used "\" and "/" interchangeably because "dir /s" was how CP/M had indicated command line options, so DOS 2.0 let you use both "dir /s" and unix style "dir -s" with the / version deprecated, but he couldn't quite REMOVE it yet, so the syscalls supported both directory separators. And he publicly announced that a future DOS version (hand-wiggle maybe around 4.0) would just be Xenix with a DOS emulation layer for old programs. You'd need something like 256k for that to be worth it, and hey: you'd get multiprocessing for free. (Remember how Kildall was doing MP/M? Maybe not THAT crazy. For reference, IBM announced its "Topview" multitasking graphical desktop for DOS in August 1984, and the first version of the Desqview multitasker for DOS shipped in July 1985. If 8 bit systems max out at 64k, a 16 bit system with 128k of RAM running 2 of those 8-bit programs at once sounds pretty feasible...)

The new unix features in DOS 2.0 made it a way better programming environment than CP/M-86, so it wasn't just cheaper now it was BETTER, and CP/M-86 receded from use on the IBM PC. (And clones, Compaq had happened by now. The reason the IBM PC took over the world and the Apple II didn't is that when IBM sued Compaq they lost, but when Apple sued Franklin they won: https://en.wikipedia.org/wiki/Apple_Computer,_Inc._v._Franklin_Computer_Corp. That was the legal decision that extended copyright to cover binaries and thus invented "shrinkwrap" software, see also the 1980 Audio interview with Bill Gates (mp3 and transcript both linked from https://landley.net/history/mirror/#:~:text=1980%20audio ). The GNU project, IBM's "Object Code Only" announcement, and AT&T's post-breakup commercialization of Unix were all responses to Apple vs Franklin...)

IBM's competitive focus on Compaq and the hardware clones distracted it for years from the fact it had lost its second source competition on the operating system side when DOS 2.0 rendered CP/M-86 irrelevant. IBM shipped its own PC-DOS and Digital Research eventually came out with DR-DOS, but by then Microsoft was doing "CPU tax" contracts with motherboard manufacturers (see the 1995 antitrust trial under Judge Sporkin), and used aggressive bundling (buy X get Y for free, and you can't NOT buy X) to promote Windows and Office... But I'm getting ahead of myself.

Two things happened to derail the dos->xenix move:

1) the IBM PC/AT (developed in 1983, shipped August 1984) added a hard drive, so the DOS 3.0 release was mostly about adding hard drive support (the C: drive) rather than furthering the convergence with Xenix.

2) in 1983 Paul Allen came down with Hodgkins Lymphoma. (That's the same cancer Hank Green just got. It's one of the most treatable forms of cancer, but it IS cancer, and can totally kill you).

Nobody initially knew WHY Paul Allen was so sick (looked like overwork during the DOS 3.0 crunch), but Paul Allen owned 1/3 of Microsoft's stock because Bill Gates was an asshole: they originally wrote BASIC for the MITS Altair, and the owner of MITS offered Paul a job working at MITS. When incorporating Microsoft, Gates insisted he have 2/3 of the stock and Allen only 1/3 because Gates would be working at Microsoft full time and Allen only part time due to his job at MITS, and Allen agreed... and then immediately after that was signed, Gates asked Allen if he could get him a job at MITS. As I said: asshole.

But the ultimate asshole move was that while Paul Allen was working himself to death trying to get DOS 3.0 out fast, and clearly sick but not yet properly diagnosed, Paul heard Bill Gates and Steve Ballmer (an early Microsoft employee, Gates' old poker buddy from Harvard before they each dropped out of school to work at Microsoft) talking to each other in the next room about how to get Paul Allen's 1/3 ownership of Microsoft back when Paul died. They didn't want it going to his family, they wanted to figure out how to take it back.

When Paul Allen took a leave of absence to get cancer treatment, he never returned to Microsoft. The drive to switch everything to Xenix left with him, and Gates looked around for other people to copy technical agendas from instead. He saw the Apple Lisa (because Apple gave them an early unit to port their application software to), and tried REAL HARD to copy it but Windows 1.0 and Windows 2.0 were just pathetic. DOS 4.0, 5.0, and 6.0 offered nothing that DOS 3.0 hadn't. Gates teamed up with IBM to work on OS/2 which was IBM's attempt to port mainframe technology down to the PC space... alas, targeting the 286 instead of the 386.

IBM had bought the entire first year production run of the Intel 286 processor to keep it out of the hands of competitors (like Compaq), and was then stuck with a warehouse full of the slowest, most expensive, rapidly depreciating 286 processors ever made. That's why they refused to go to the 386 and even the IBM PS/2 was mostly 286 chips, they were trying to unload that backlog of 286 chips! (They eventually landfilled some portion of them, but it took YEARS.) In 1986 the Compaq Deskpro 386 was the first 386 PC: the 386 had been out since 1985, IBM still hadn't used it, and Compaq got tired of waiting. (As did IBM's customers.) So yeah, that's why OS/2 was so far behind the times that Windows 3.0 could get out ahead of it and establish a new programming API standard.

When David Weise made Windows work years later on his own and against orders, the first person he showed it to thought he'd get in trouble for it because Microsoft was focused on OS/2. Microsoft never had a plan, they had a monopoly that let them fail repeatedly until they got lucky. Their "CPU tax" monopoly contracts forced manufacturers to license Microsoft products for entire "product lines", meaning PC manufacturers who wanted to ever sell a Microsoft operating system on ANY machine had to put them on EVERY machine. They couldn't sell even a small number of machines without the preinstalled Microsoft software, and Microsoft fought a marketing campaign for years against "naked machines" because obviously the only thing anyone could do with a machine that DIDN'T have Microsoft software on it was install pirated Microsoft software. Microsoft's monopoly leverage also let them prevent other operating systems from being installed alongside theirs, and when Windows 95 came out they extended this to preventing IBM from installing OS/2 on any of its own PCs if it wanted any access at all to Windows 95. (See the 1998 antitrust trial with Judge Jackson.) But again, getting ahead of myself.

The death blow for Xenix was that after the 1983 AT&T breakup, when AT&T was commercializing unix, it sucked in code (without attribution) from all the third party unix variants and shipped it in Unix System III. (System V was a successor to System III, there was a 4.0 but it never shipped to customers.) This is why the AT&T vs BSDi lawsuit ended favorably for BSDi: they were able to prove in court that AT&T had sucked in THEIR code without attribution, and thus forced a settlement on AT&T. AT&T also did the same thing to Xenix, and when Gates found out Microsoft code was in an AT&T product without permission or payment he went BALLISTIC, but didn't think he had the legal heft to take on AT&T so instead he purged Xenix from Microsoft (it had been running their internal email system and so on) and unloaded Microsoft's interest in Xenix on SCO (which is how SCO wound up fully owning Xenix, they'd initially just been a subcontractor doing work on somebody else's IP, but they got it cheap), and basically developed a Dave Cutler level of Unix hatred going forward...

I note that back in the day I did a LOT of research on this for my rebuttal to SCO's second amended complaint against IBM, and xenix is all through it. The indented parts in green are mostly stuff I wrote, with a little bit from Eric, but the OSI position paper was his baby and the rebuttal paper was mine. The rebuttal links to a lot of primary sources, many of which have sadly gone away over the years but you can still pull most of them out of archive.org if you try...

(You should TOTALLY get a copy of Peter Salus' book "a quarter century of unix". And a copy of "Where wizards stay up late" which is about the formation of the internet. Soul of a new Machine and A Few Good Men From Univac are more tangential, but loads of fun.)

Oh, and the book "Hackers" by Steven Levy is the other half of this Ken Olsen Smithsonian interview, literally two halves of the same story with the TX-0 and so on.

Oh, and the first four interviews in the Intel section of my mirror are the four parts of the story of the birth of the microprocessor: Ted Hoff (the actual creator), Federico Faggin (who went on to found Zilog and create the Z80 processor), Masatoshi Shima (their actual customer at Busicom, who many people say was the ACTUAL inventor of the 4004), and their boss Gordon Moore (of Moore's Law fame).

Then read "Crystal Fire" about the invention of the transistor. The second half of that book is about the creation of Silicon Valley (which exists because William Shockley was an utter asshole), and Gordon Moore is a featured player (part of the "traitorous 8" that bounced from Shockley to Fairchild to found Intel)...

Ahem: computer history is a hobby of mine. Here's a 2 part writeup (part 1, part 2) on some interesting plot threads I did a dozen years ago.

(I've been meaning to write my own book for years, but... too busy.)


May 29, 2023

Sigh, fell out of the habit of blogging during the week when I couldn't. (Nothing for my editing pass to elaborate on when I didn't leave myself a trail of breadcrumbs...)

Git log shows a couple of shell fixes. I should get a release out, then do a deep dive into shell stuff again and try to get that properly finished.

Cut up one of Fade's old disposable mouthguards to get a chunk of plastic I can put over the tooth so my cheek can get some relief from endless stabbing. (It was keeping me awake, and it's not fun to talk either.)

Fade got me an appointment at the dental school attached to the university she gets all her tooth care done at. Of course she gets it free as a grad student, and I don't. We pay like $500/month to get me on her health insurance plan, but it doesn't cover dental for me: luxury bones. Still, these guys are known to be very good at their job, and should not make it WORSE. I'd very much like treatment that didn't cause more problems than it solved...


May 26, 2023

Back on the horse. (For a definition of "horse" that involves taking my new laptop to the common work area in building 1 of Fade's apartment, which is playing a spanish cover of "Achy Breaky Heart" for some reason.)

The 'repeated hang" failure mode left me with a lot of vi :recover files where it prompts me which of the three .swp files to read, and I'm just zapping all that. There's a lot of pausing to stare at "am I deleting the .blah.c.sw? file or the blah.c file" before each one JUST TO BE SURE. (I have made that mistake. Less of an issue when the file is in git, and I'm just losing recent changes instead of trying to dig it up out of a USB backup drive.)

Sigh, the hard part of fiddling with a command like ulimit/prlimit is A) coming up with the new help text, B) coming up with test suite entries. Once I've got those, the CODE is generally pretty easy. Implementation is seldom the hard part, DESIGN is the hard part. What should it DO?


May 25, 2023

New laptop arrived. The freeze problem advanced to "happens 30 seconds after a reboot", so I ordered another of the same type I could just swap the hard drive into. (I have 3 such spares at home, but they're in Austin and I'm with Fade in Minneapolis.)

It's so CLEAN. Not covered in scratches and gunk, almost as if I HAVEN'T been dragging it around with me everywhere for a couple years. Same model (Dell E6230) but this one's refurbished and thus in a slightly different case (doesn't say Dell E6230 on it for one), and with this case I can't see the charge/disk LEDs with the lid open. Seems like a tiny thing, but kinda significant now that I'm confronted with its absence. Yeah, there are software versions up in the toolbar (which I have configured to only be visible with the mouse hitting the top of the screen), but I don't TRUST the software ones. I wouldn't have a band-aid over the laptop camera if it had a physical LED that lit up when it was powered, independent of any software. The fact they refuse to do that stuff is why one of the first things I do with any new laptop is stick a band-aid over the camera. The pad protects it for when I want to use it, and when I don't it's NOT LOOKING AT ME. Grrr.

I tried borrowing Fade's old macbook during the gap, which was a comedy of errors in and of itself. She dug it out of the closet, confirmed it worked, set it to charge on the counter, and went to work. I opened the lid to be confronted with a login prompt. Ah. Day 2: armed with the password I tried to ssh out to a linux machine to do some work and... none of the ones I can think of are configured to allow password, they're all key-only. (I have backups of everything... in Austin.)

It's pretty late in the day by this point (shipping estimated the new laptop would arrive yesterday, instead it came in after 3pm today), and by the time I'd rustled up an appropriate screwdriver and got the hard drive swapped and network access sorted out (registering the mac address with Fade's apartment's wifi... gave me an intercept screen asking me to log in? Seems redundant somehow. Oh well, phone tethering still works...) it's after 5pm. Old machine still has the bigger memory but I'm making sure this is STABLE before swapping more parts than strictly necessary. To be honest it's possible I could have fixed the old one with a can of compressed air, but I haven't got one here and am not entirely sure where to buy one (target?), and the hang problem going away and then coming back again is how I _got_ here. I want reliability, please.

Via the phone tether I'm downloading SO MUCH EMAIL... (Gmail's pop3 does about 1 message per second in 250-500 message chunks. Between linux-kernel and qemu-devel and so on, I get well over 1000 messages a day. This is likely to take a while...)

On the bright side, the time off probably gave my eyes time to adjust to the new glasses. (The myopia is the same, only the astigmatism has been changed to protect the innocent.)

Wrote up yesterday's broken tooth while email downloads. Not gonna backfill the rest because I didn't do anything of note and don't remember what most of it was anyway...


May 24, 2023

Still backfilling: this is the day I broke a tooth. Molar all the way in the back, bottom left side, next to where I got a wisdom tooth removed years ago. The tooth itself doesn't hurt, the magic japanese toothpaste is quite effective. Hydroxyapatite deposits more calcium phosphate on top of any exposed dentin and keeps the nerves protected behind bone equivalent... but it does nothing about the enamel, a large chunk of which is what broke off here, leaving a sharp pointy bit that's stabbing my cheek. The cheek hurts a LOT.

Regretting not getting to a dentist while I was in Japan, but while I trust the medical providers over there FAR more than the ones in the USA... there's still a language barrier, and my teeth are in terrible shape due to the extensive dental work I paid thousands for back in 2013. (The 6 month apartment I had for the Cray contract in St. Paul was right down the hall from a dentist, and I used them as a second opinion to say "yeah, those two front teeth that got chipped before they even came in because of that car accident when you were 5 years old smashing your baby teeth up into your gums so they've got grooves on the front? We ALSO want to just drill all that out and turn it into fillings because it's weird and we're calling that cavities even though you yourself can't detect them in any way." So I went with it, and all the fillings they put in chipped to pieces and fell out entirely over the next 18 months, leaving me with large obvious holes in two front teeth. I paid a lot of money to get those holes, and felt really silly about it, but regular application of japanese toothpaste meant it didn't hurt and did not appear to be getting worse...)

And now I need to wrestle with US dentistry for a _different_ problem. Dowanna.


May 23, 2023

This gap was due to my laptop being dead and having to mail-order a replacement because all my spares were back in Austin.


May 21, 2023

The "battery charging while using laptop" problem is getting worse, I just had two reboots (well, freezes forcing me to reboot) in half an hour with nothing plugged into USB, and while basically just typing in a text editor and no cpu-intensive anything pulling power.


May 19, 2023

I originally had this as "April 31" but then my python RSS feed generator went "boing" parsing the date, because there isn't one. (Midnight according to my laptop is in the middle of the day in tokyo, it's a bit fuzzy which day I'm writing for over there...) So I moved it here because I didn't write a blog entry today, due to ongoing travel recovery:

Somebody in email said "Canada as a whole seems to be determined to be a branch plant operation of US-based multinationals," and I replied:

This too shall pass. Maybe not fast enough to benefit either of us personally, and when both the roman empire and the british empire receded they left behind a lot of scar tissue, but no empire lasts forever, and the USA is already pulling back from the world now we're a net exporter of oil (have been since 2019) and thus don't really care about having a trillion dollars of navy policing everybody's shipping quite so much anymore. We've still got aircraft carriers, but they can't be everywhere, and we've gotten rid of most of the smaller patrol boats that used to _be_ everywhere...

It's hard to take solace in bad things happening, but the currently powerful aren't going to stay powerful forever. The USA is facing the end of the boomers (1946 was 77 years ago, their refusal to hand off anything gracefully is going to cause a LOT of loss of institutional continuity), climate change (Houston's flooded twice more since hurricane Harvey, but "again" isn't as newsworthy), the exhaustion of the Ogallala and California's Central Valley aquifers underpinning the majority of our agriculture (don't get me started on crop monocultures), the collapse of the US health care system coinciding with the rise of antibiotic resistance, a dozen kinds of invasive species (Texas has "crazy raspberry ants" that are attracted to electromagnetism and will thus ball your wifi router), the reshoring of manufacturing (if our trillion dollar annual defense budget stops paying for the navy protecting container ships "for free" then floating everything from the far side of the world instead of mexico gets a lot more uncertain)...

Canada has its own issues to work out, but the worldwide fascist crazy should recede with the Boomers (they got BADLY poisoned by airborne lead in the gasoline for 50 years, and in 2/3 of who's left it's combining terribly with senility to go past "kids these days get off my lawn" into flat earth territory). Once that's past, then maybe the wretched survivors can start shoveling out. (Step 1: universal basic income, which yes will only happen over the Billionaires' dead bodies. So maybe UBI is step 2.)

Personally, I would like the new victorian prudishness and the FOSTA/SESTA (Comstock Act II) nonsense to stop being imposed on every other country in the world, along with the USA's tendency to treat children as non-persons. In Japan quite small kids buy stuff in the store and go on the train by themselves. In Europe kids can have wine or beer at dinner as soon as they can walk. In the USA we look back in horror on the days of "latchkey kids" because now they're non-persons legally confined to a building every day where they go through metal detectors and are watched over by police with live ammo who randomly search their bags and lockers, and they can be arrested and permanently removed from any family that allows them to be alone on the street two blocks from home. It's "for their protection" that they can't work or vote or drive, and if a teenager sexts a naked selfie to another teenager they can BOTH wind up on a sex offender watch list for life, which is a change to the law the Supreme Court only made in 1982 by the way. Ronald Reagan hijacked the federal highway funds to force states to raise the drinking age from 18 to 21 (after vietnam lowered it since Johnson and Nixon were drafting 18 year olds to die, so they-who-were-about-to-die protested their way into being treated like adults in other areas, which involved "the man" gunning down protestors).

So yeah, rooting for the american empire to collapse. When France invented the guillotine they had to work through Robespierre and Napoleon (liberty, equality, oh look a rich white guy has seized dictatorial power again, rinse repeat), but it worked out for them in the long run, and they're currently braving tear gas to push back against late stage capitalists who want cheaper and more obedient servants. "We can't afford this" means you squeeze the rich harder. We went to the moon WHILE fighting the cold war, whether or not we could "afford it" wasn't the big question.

I would love to reach a point where I could take solace in GOOD things happening. Not just looking forward to the end of bad things and hunkering down to minimize the inevitable collateral damage...


May 18, 2023

Travel recovery day. Headache. I had a row to myself on the Montreal -> Minneapolis flight so I managed to get an hour of sleep, but missed beverage service and got intensely dehydrated. (It's the pressure changes, that's WHY they do constant beverage service on airplanes.)

Then lyft couldn't find where I was in terminal 2 and cancelled the pickup, and then taking the light tactical rail home was... interesting. Minneapolis seems to have cut the rail maintenance budget, the second train (after the half-hour wait between trains because they don't run frequently enough) was the dirtiest public transport I've ever been on. And that includes Camden New Jersey and the New York subway. (I am sooooo spoiled by tokyo.)

Got home to Fade's, crashed, woke up in the morning with a headache. Still have the headache. I have a lot to catch up on, but am unlikely to be very productive today.

I'm also realizing that part of the headache is probably adjusting to the new glasses I got the night before my flight. (Japan does better glasses than the states, but the ones I've been wearing are from 2017 and finally started scratching last year.) We went in to order them over a week ago, but since I wanted the extra anti-scratch coating they needed a week instead of an hour, and then when I came to pick them up a week later they apologized and wanted to _redo_ them because the left lens was slightly off center, but they did a rush job in 2 days this time (and gave me the other pair of lenses in case I need spares). Which is all fine, but I'm adjusting to new glasses and this is the first quiet unrushed "ok, sit down and try to work" session I've had since... and it's stacking debuffs.


May 17, 2023

Flying back to Minneapolis.

Sigh, this laptop power supply with the intermittent data connection works fine when I'm not trying to charge the battery _and_ use the laptop at the same time. Suspend it, charge it up, then keep it plugged in and use it: great. But when I don't do that, it has three obvious failure modes: 1) the power supply gets VERY hot (the first time I noticed the _smell_ of volatile plastic compounds becoming airborne), 2) the laptop toggles every 15 seconds or so between charge and discharge (which can't be good for the battery, and has corresponding screen brightening and dimming power management weirdness sometimes), 3) if I forget and try to charge a USB thingy (such as my bluetooth headphones) while it's also charging the battery, the laptop freezes solid. (Probably a kernel panic that doesn't get marshalled through X11 into actually showing me the panic or even THAT it panicked. The kernel guys have been throwing functionality overboard like hot air balloons dropping sandbags, and one of the things they gave up on a while ago was "go into VGA mode or framebuffer and dump text to the screen when you get a panic". Because who cares about THAT?)

I note that the OLD battery, which was smaller even before it lost 1/3 of its capacity due to age, had far fewer problems, I think because its maximum charging current was lower (fewer cells). The battery charging logic goes "aha, I can feed THIS much power into the battery", and when the controller can't interrupt that to say "I said I could deliver this much power but I'd like to take it down a notch now" because the data line's gone walkabout again, Dell's charging logic goes all pear shaped.

Anyway, I tried to charge my laptop and phone before getting on the plane, laptop went suddenly catatonic and had to have the power key held down until it turned off, so I've lost all my open windows again. Sigh. Right as I was getting into a position to dig out and address the backlog.

Plane is 100% crowded again, and they made the seats smaller (again!). Front to back AND side to side: I'm trying to use the laptop at an awkward angle on the TV tray but it just plain doesn't FIT. And while I theoretically have a power outlet (I assume that's what the green LED on the seat in front of me near my right ankle is about), I can't see it well enough to actually plug into it. They turned the cabin lights off because it's an overnight flight, and my phone battery is fully dead so I can't use its flashlight (well I didn't really get to CHARGE it in the airport, did I?). The overhead reading light doesn't make it down there...

Not _much_ makes it down there. Air Canada added seats to their planes since pre-pandemic times, and reduced flight frequency so every international flight is 100% full. 13 and 1/2 hours in a space too small to pick up anything that fell onto the floor (unless I can hook it with my foot, I'd have to ask the guy next to me to get out of his seat, which means the person next to HIM would have to get up and stand in the aisle). The accumulated muscle cramps from being unable to move are not pleasant. Add in the usual 6pm tokyo departure time and the sleep deprivation is... getting unpleasant.

Dunno how much of this is Air Canada and how much of this is post-pandemic late stage capitalist profiteering, but the days where I wrote most of the ps.c infrastructure on a flight back from Japan seem long gone.


May 16, 2023

Jeff wants me to meet Mike today and talk about future plans. I really, really, really don't want to, but there isn't a graceful way to back out of it. I strongly suspect they're going to try to pressure me into making more of a commitment to Jeff's company. (I'm not signing anything before Fade can read it. I also have no idea if Google wants to continue the toybox funding beyond what they've already done, but I'm not moving on from that until I've done all I can there. Jeff's project does not take precedence over MY project.)

Jeff hates when I say I'm working on "his projects", and insists that it's "our project". He has a "vision" that he's upset he hasn't been able to explain all of to me because I keep getting derailed into practical things we need to do, but I'm not interested in infrastructure in search of a user and a big strategic goal that can't be concretely implemented. We've worked on and then left half-finished a dozen different pieces of technology. I care about what can get completed and put in the hands of users. If Jeff had funding for us to spend 5 years focusing on Basic Research in the vein of bell labs or xerox parc, great. But we don't. And I got enough swap-thrashing on toybox, thanks.

Still, I'm learning interesting stuff I didn't know before. And I'm FINALLY getting to the point where I know a LITTLE japanese. There have been anime dialog scenes where I followed multiple consecutive sentences! Yeah, ok, simple ones, but once there were FIVE sentences, in a row, that I understood almost all of. Alas, in actual interactions with japanese people, knowing "this is the point where the person running the cash register asks me if I need a bag" is still far more useful than my ability to parse the words...


May 15, 2023

The OpenLane git checkout is about 2 gigabytes. We went on a "deleting stuff we can prove isn't needed" spree (.git, designs, docker, docs, regression_results)... and the result is 2.1 megabytes of scripts actually needing to be installed. The .git directory is over a gigabyte, full of old long-deleted churn. They also checked in every project that's successfully built against this thing INTO THE OPENLANE REPO. (Remember when uClibc had a test suite containing every package that had ever successfully built against uClibc, and the invocation necessary to make it work in the new context? That test suite turned into the "buildroot" project. Well OpenLane has something similar, and it's FAR BIGGER THAN THE ACTUAL PROJECT.) This is pufferfish territory, the project is making itself look big, but once you cut through the cloud of squid ink there's not actually much there.

I need to fix up the toybox shell because people are using it, which means I need to finally add the command line editing and history (without which it seems way less finished than it is because monkey brains conflate user interface polish with functionality, and yes that includes _me_).

Command line editing is adjacent to the crunch_str() logic in exactly the same way fold() is, namely that "backspace eats how much" is the big missing piece of both. Which is a nonobvious question to answer because the HARD part is that tabs advance by a variable amount based on where they started. (Also, nonprintable characters are TRAILING, which is the dumbest thing the unicode committee ever did. A printable character does not FLUSH pending nonprintable characters, the printable character comes first and is then modified by following characters, being REDRAWN ON THE SCREEN multiple times in some instances, which also means you're never sure you've finished a stack of combining characters until you've read PAST it and gotten a character (not byte, utf-8 sequence parsed to unicode point!) that is NOT part of this one, which you then need to unget and process separately in the next go 'round the loop. When you've got a string fragment, you CAN'T know. It could end in an unfinished utf8 sequence. There could be a combining character following it. Pretty much the only thing that DOES tell you it's done is newline... and then what do combining characters at the start of a line MEAN exactly when there's NOTHING FOR THEM TO COMBINE WITH? (What, do they combine with an implicit NUL? What would that mean? The last newline has an umlaut!) It's REALLY STUPID because they did it BACKWARDS.)
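
The tab part at least has a closed form. A sketch of the column math (ASCII only, assuming 8-column tab stops; wide and combining characters are where it gets ugly, per the above):

// How far does this character advance the cursor from column "col"?
int char_width(int c, int col)
{
  if (c == '\t') return 8-(col&7);  // distance to the next tab stop

  return 1;  // printable ASCII placeholder
}

// What column are we in after the first len bytes? There's no way to
// know what a backspace "eats" without replaying the line from zero.
int column_after(char *s, int len)
{
  int i, col = 0;

  for (i = 0; i<len; i++) col += char_width(s[i], col);
  return col;
}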

But it's what's there, so we must cope.


May 14, 2023

I keep wanting to do a "how to bloat code" presentation, starting from the classic K&R "Hello World" and then shifting over to C++ with accessor functions doing exactly the same job but passing the data between all the various "enterprise" style contexts, demonstrating the full range of "it's not code reuse the first time" nonsense, and generally showing that you can have a very large amount of infrastructure that LOOKS like it's doing something but isn't really.

I'm reminded of this because the skywater toolchain turned their README from a text file into one of those "markup that generates HTML" things, yes of course it has stylesheets, and along the way they added a lint variant to check the validity of their markup, then of COURSE they factored it out into a subrepo which does a "git submodule update" at build time. (So you check out the repository, and then when you run the build it checks out more repository within the build.)

Remember: this is a README. Historically, this was ONE SMALL TEXT FILE. There's no build infrastructure for a text file. None of this is NEEDED, but this crap it's metastasized into is pulling in a chunk of another one of Mithro's projects, which is doing dependency checking against the host to make sure various packages and versions are installed... except it's checking RPM and we're running it on a Debian system so it's not finding the magic red hat package names out of the wrong kind of repository. And since they did a "-include" this whole mess should just DROP OUT if the repository isn't installed... but they're installing it WITHIN the build.

Someday I should do a proper writeup on why Google's OpenLane project stalled...

tl;dr: OpenRoad is a DARPA funded initiative, which works. OpenLane is fundamentally a small number of shell scripts that call the OpenRoad tools to do their thing in order, with reference to the Sky130 PDK which is basically "fonts and CSS to make a mask for this fab". (A fab is sort of a really high end printer. We're submitting a job to it. The job is literally a big data file.)

OpenLane is a partnership between Google and SkyWater (which used to be Cypress Semiconductor) to create an open toolchain for Sky130. Google hired the guy who did QFlow to work on the tools part, and then subcontracted much of the fab integration work to a company called "Efabless" which has an existing business taking people's design files (mostly in Verilog) and converting them into something the fab can accept. Which means if Efabless were to succeed at what Google's paying them to do, it would undercut their existing core business. There are two big projects here: OpenLane is a set of control scripts that call the OpenRoad tools in the right order to perform tasks, and the other is "Skywater PDK" which is a data dump from the fab. Tim Edwards is running a giant pile of fixup scripts in the Skywater PDK build because the fab's data dump is horrible (there's like... OCR errors in it or something?), and the resulting PDK is subtly broken half the time, although somebody found that if you run make TWICE, the result is usually good after the second time. (But the only way to determine if the result is good is to build the REST of the toolchain around the resulting PDK, then build your project with the resulting toolchain, then test the result. Which is time consuming and labor intensive.)

The guy at Google running this is Tim "Mithro" Ansell, who is writing his own build system to do some portion of all this, except he doesn't seem to have done this before in a nontrivial way so doesn't really know what success looks like? He's a fan of the concept, but not a veteran. Jeff (who has done this before) keeps telling him "you need to do this" and getting dismissed as silly, and then 6 months later they realize they need to do what Jeff was telling them. (Kinda like the RiscV guys, really...)

So Mithro grabbed chunks of his symbiflow project and stuck them to the Skywater PDK builder, specifically he's grabbed anaconda, which long ago used to be Red Hat's system installer. It was the large python program that would run when you booted an install CD (or floppies) that would partition and format your disk and let you select what type of Red Hat system you wanted to install, and would then install all the packages. This was back before Fedora and Enterprise happened, they replaced it with something else rather than rewrite the large pile of Python 2 code in Python 3. But the old 1990s Red Hat system installer seems to have spun out into its own project, maintained by yet another proprietary company producing source-under-glass that you can see but would be crazy to try to build or install yourself, and Mithro is using it to confirm prerequisites are available in the local RPM repository. On Debian systems that don't use RPM, this doesn't find much.

Specifically, it's complaining that "yosys" isn't installed. It is, and it's in the $PATH, but since Mithro's slurped-up symbiflow plumbing that's installing a proprietopen fork of Anaconda didn't install it, it's not finding it. If it just tries to call "yosys" it's there, and presumably "yosys --version" might say if it's new enough, but instead Mithro/symbiflow/anaconda/openlane runs a large pile of python 3 which returns an incorrect answer.

Note: it doesn't have to do ANY OF THIS AT ALL, because if yosys isn't there then you should get an obvious build break where the last line of output is an attempt to run yosys getting a file not found error. This is basically the "assert" problem where the bug IS THE EXISTENCE OF THE ASSERT.

It looks like if you remove that whole subdirectory, the enclosing makefile should just work because it has - before the include to skip the nonexistent file, and then the $(wrapper) variable drops out and it just calls the rest of the command line. Seems worth a try, anyway. So I'm trying to remove the git repository, and I chopped out the makefile target that clones it.

Except he didn't just clone it there, he added it as a submodule, which means it's getting cloned already. So I need to remove it from the Makefile AND remove it as a submodule from the parent repository. Except the makefile of the parent repository is checking it out. (Remember yesterday's "how do I remove a submodule"? Because me not checking it out didn't prevent THIS from checking it out.)

One of the subrepos needs to be patched, which means we check out our own copy and tell the build where to find it. And there's even a make variable for this! Except autoconf is marshalling data from variable to variable, and if you track it back the top level configure is setting it with a hardwired path.

The whole project is like this. They keep making layers of infrastructure and then hardwiring it to do specific things. There are obviously multiple teams working at cross purposes here, and the PROPER fix would be to RIP OUT all the stuff that we can PROVE is not doing anything. But you don't show progress in a fortune 500 company by REMOVING code. Code has a dollar value attached, generating more of it is always progress. Code gets depreciated and amortized, not _deleted_. Deleting it costs MONEY. Creating more is profit! IBM's KLOCS and so on...

We, on the other hand, are trying to get something to WORK.


May 13, 2023

So git grew a "fatal: detected dubious ownership" error whenever you cd into another user's directory and try to "git log" a repository. Not a "warning", but "I stubbornly refuse to perform the requested operation". So far the only fix is to sudo and run git as root, where it doesn't care about permissions.

That's really stupid. Barfing this way when WRITING to a repo is one thing, but I'm cd-ing into another user's directory and trying to "git log" and "git show" individual commits there. (I could tar the repository and extract a copy in my home directory so it all belongs to me, but that's deeply silly. And inconvenient.)

I don't know if google has deteriorated to the point it can't find it, or if there's no way to fix git other than to build it from source with this test patched out. Luckily, there's part of a git implementation in toybox, and this would be a reason to finish and use it.

In the meantime, it makes debugging a build that runs as a different user extra-annoying... and even more brittle than I thought? Darn it, the sudo workaround isn't load-bearing: if I do an "env -i PATH=$PATH git log" as root, I get the "fatal: dubious ownership" abort again. It's something about HAVING RUN SUDO that makes git go "oh well, if you really mean it". Simply BEING ROOT isn't enough for git. (I mean, I could destructively "chown -R root:root .git" but then the original user couldn't use it. Before finding out git was treating sudo as magic, I was thinking the right thing to do here was to create an LD_PRELOAD library that wraps stat() to patch ownership to always equal getuid(), but even that won't fix it?)

I am so tired of myopic git developers. The reason I stopped maintaining kernel.org/doc is when kernel.org had a breakin (because one of the devs was ssh-ing in from a windows machine, and once you'd logged in the server wasn't that secure internally across users) they locked the barn door after the horses had escaped by removing generic ssh support, including the ability to rsync over ssh. I pointed them at a way to make ssh explicitly call rsync, with forced prefixes and everything, but they weren't interested: they'd homebrewed some horrible wrapper tool that ONLY let ssh run git (nothing else), so to update the website I had to check everything into git, including things like the gigabyte video file (USB driver writing tutorial) they'd removed after the breakin which I wanted to put back online. If the file then moved elsewhere (it was eventually uploaded to youtube) it would STILL be taking up space in the .git directory both in my local copy and on the server, forever. But they were deep in "If all you have is a hammer, everything looks like a thumb" territory, and I gave up and moved on...

Aha: I asked on the #git channel on freenode, and the magic invocation is "git config --global --add safe.directory '*'", and they pointed at a reference for why the stupid happened. (And they confirmed it's checking SUDO_UID, which is just wrong.) Yay, found a way to make it stop.
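
Spelled out, since the quoting matters (the * has to reach git unexpanded); the second line is a guess at how you'd scope it to one repo instead of everything, path hypothetical:

  # turn the ownership check off everywhere, for this user
  git config --global --add safe.directory '*'
  # or whitelist one specific repository (path hypothetical)
  git config --global --add safe.directory /home/otheruser/src/project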


May 12, 2023

Long argument with somebody online who claims that statically linking an initramfs into the kernel is weird, and that in 20 years of messing with Debian and Ubuntu they've never encountered it, so obviously nobody does it because their experience is universal. And apparently contradicting them was considered insulting. (Insisting that having used debian and a derivative of debian makes your experience universal, thus everyone else is weird, was apparently NOT insulting. Go figure.)

I can't link to it because it wasn't cc'd to a mailing list, only to Jeff. Mike texted to yell at me about "burning bridges", which means Jeff forwarded it to him. I'm also informed that Jeff has apologized to them on my behalf, which was not something I'd asked for (or been aware of at the time).

A lot more "huh, Google can't find this" instances during said exchange, which kinda undercut my point that what I was doing is not unusual. The deterioration of Google is getting alarming. But I did manage to at least dig up a few interesting numbers, which I can cut and paste here:

My perspective is skewed. Not just because I'm the guy who wrote the initramfs documentation in the kernel back in 2005, but I also maintain the command line utilities of Android and used to maintain the command line utilities used by Linux routers. That means I hear from those communities a lot, and they're orders of magnitude bigger than desktop Linux. That's not hyperbole: this page estimates there are 33 million active Linux workstations, and 1.6 billion active Android devices. Add in about ~750 million routers of which around 91% use Linux and "somewhere over 500 million" seems a reasonable guess, bringing the embedded total from just those two sources over 2 billion active installs.

So I regularly hit things that are "weird" for 33 million installs and "normal" for a couple billion. It's hard for me to convince the developers who make those billions of devices to show up even briefly on linux-kernel because they got tired of being called weird, and being seen as pushy when they try to explain. And if they won't show up, out of sight out of mind. (They think _I'm_ weird for still engaging with the kernel community at all.)

(I didn't even go into the PC hardware space, where Red Hat claims to have a 33% share of the "worldwide server market" although that's in terms of who's paying for their OS, not installs. In terms of seats, all conventional Linux distros together are collectively 2.1% of desktop installs, behind ChromeOS at 2.2%. Windows is over 74%, Mac is 15.3%, neither NOTICES those two. And in the PC "cloud" space... it's still Windows at 72%.)


May 11, 2023

Working on toybox stuff today instead of Jeff's thing, but I no longer feel safe huddling in my hotel room (and they're cleaning the room today anyway), so I went to the Hello Office, and when Jeff arrived he got mad that I didn't immediately stop working on the toybox thing and start working on his thing instead, and he left abruptly and angrily. I took the train to Akihabara to try to find a coffee shop there (it's the other part of tokyo I'm familiar-ish with), but didn't bring an umbrella and got caught in a rainstorm. (I have an umbrella in my hotel room and we found FOUR cleaning up the hello office, I am not buying ANOTHER ONE.) Holed up in a random not-mall space, but it didn't have good seating to use a laptop with, so I mostly watched stuff on my phone. Eventually a lull in the rain let me take the train back, except I was most of the way to Shibuya before I realized I was going the wrong way down the Ginza line. The train was too crowded to pull out a laptop there either. Got back to the hotel eventually.

Yesterday's entry was too long so I moved the battery tech description here. I have actually learned a lot about battery technology this trip:

Step 1 was to stare at boards taken out of the OLD system being salvaged and repurposed. (Well, diverted. The batteries were ordered for another project but never actually installed. The pandemic messed with shipping logistics or some such. I think they were going to be used at a wind farm in another country, but didn't make it there?) The existing system has thousands of batteries in over a hundred big cube things, each cube is sort of an industrial garden shed meant to live outdoors. It's a proprietary Chinese design made from imported western chips, which require an NDA with the (german?) chip vendor just to get programming specs, needed to figure out what we can salvage and what we have to reimplement. Each battery management board attaches to a case containing 52 Lithium Iron Phosphate prismatic cells: big blue rectangles with two terminals on top like a car battery, each roughly 15x10x4 centimeters and weighing a little over a kilogram.

The resulting pack of 52 is in a big (aluminum?) case, kind of a horizontal silver version of the Monolith from 2001, which is too big to fit in the elevator to the office, and most of them are a 2 hour car ride away anyway. (Mike has a car but Jeff and I don't. The guy who salvaged the batteries bought a plot of cheap land out in the countryside to store them. It's not anywhere near a train line, and does not have much in the way of hotels either. You know the parts of rural japan that have a lot of abandoned houses and entire towns with no one younger than 65 because of the declining birth rate? Yeah that. Jeff and Mike went out there and fetched stuff a few days before I flew in to Tokyo, but did not bring back an actual battery. Just an assortment of easily removable electronics and lots of photographs.)

So terminology: "battery" is a collection of cells, and "cell" is an individual anode/cathode pair (with electrolyte and separator), in this case those big blue rectangles ("prismatic cells"). The battery cells are wired in series because Lithium Iron Phosphate chemistry produces 3.2 volts (plus or minus ~10% depending on how charged the cell is; the voltage rise/drop is actually how you tell when you're done charging and discharging the battery). So each prismatic cell holds a LOT of power (over a hundred amp-hours) but produces a tiny voltage. Wiring them in series adds up the voltages of each cell, so 52 x 3.2 = 166.4 volts, at a LOT of amps. (Each monolith is roughly like a Ford E-transit battery I think? Same ballpark anyway, I don't have the numbers in front of me.) And each cube has a couple dozen of them: it was a VERY big battery farm.

So the board attached to the front of each of these 52 battery monoliths has four AFEs, which stands for "Analog Front End". It's a big analog to digital converter that measures voltages, and each AFE has a 28 pin connector hooked up to it through a zillion little resistors and capacitors. Those pins come from the batteries, the general idea is to have a connection before/after each cell so you can measure the voltage put out by just that one cell, and if it's higher than it should be while you're charging the battery, you can route current around it through the same pins so it doesn't charge up as much as the others in the string. Except the AFE chip can only divert like 1% of the charge current around the battery, so it's just a LITTLE bit of balancing, but it can happen each time you charge, and you can choose to stop early on either the charge or discharge if some cells are hitting an end stop and others aren't yet, sacrificing collective capacity to avoid damaging any of the individual cells.

This is how all battery management systems work, people do youtube videos about this. You start with balanced cells when you assemble the battery pack, and then do a tiny amount of balancing each time you charge them to _keep_ them balanced. If they're all from the same production run and have been linked together since, they should only really get UNBALANCED due to slightly uneven heating. But for the home users making their own battery walls by mixing and matching scavenged cells with very different origins and histories, the battery management systems will have a LOT more work to do, and may not be able to keep up. Hopefully not an issue here.

So anyway, this chinese design is 4 copies of the AFE chip vendor's reference design board glued together (literally; they sell it on their website and it looks _identical_), with a different 5th board on one end (haven't found what they copied that from yet), and then the whole thing laminated under at least a millimeter of plastic to keep moisture out (and maybe electrical insulation). The 4 AFE chips are daisy chained together talking some SPI/serial variant, and then for the 5th board the SPI connection goes to a microcontroller. The microcontroller is from the same company that makes the AFE chip, and it's more or less a motorola 6800 from the 1970s with a bunch of SRAM and flash bolted on.

The 6800 is an 8-bit predecessor to the 32-bit m68k from the Amiga and Macintosh. The MOS Technology 6502 in the Commodore 64 and Apple II was to the Motorola 6800 what the Zilog Z80 was to the Intel 8080: in both cases, engineers who worked on the earlier chip left to form their own company. So a 6800 SOC with 128k of sram is roughly equivalent to a Commodore 128, albeit clocked a bit faster 30 years later.

So that 5th board section is the controller, and at the far end of THAT board is a CANBUS connection to the outside world. CANBUS came from the car industry and is also used in manufacturing automation. The problem is, CANBUS is just a "read value from address, write value to address" protocol that tells us nothing about what's being said. The chinese manufacturer's board design is all under NDA (we recognized the AFE reference board they copied because there's a picture of it on the chip manufacturer's website), and the proprietary-is-good chinese company built their stuff out of chips that are all themselves under NDA. (There are perfectly good non-NDA AFEs on the market, but that's not what they chose to use.)

Even if we felt up to reverse engineering an assembly dump of the biggest program that could fit in a Commodore 128, we can't get at it because this NDA SOC has a fuse you blow to prevent reading the flash back out. (What this system is doing is generic and well understood, and the patents on Lithium Iron Phosphate batteries themselves expired last October (although that's mostly about manufacturing more cells, not managing them). But every design decision in the electronics has been about obfuscating what they're doing to protect largely nonexistent intellectual property. They want to SEEM unique and magic because you can't tell what they're doing, or interoperate with any of their existing stuff to repurpose it.)

So EITHER this control SOC is a dumb translator passing on the AFE info to whatever is at the other end of the CANBUS connection, OR this is where the battery management program lives that's measuring the voltages and making the bypass decisions and reporting how "full" or "empty" the battery is so it doesn't overcharge or undercharge and damage any of the cells. It's either a passthrough or it's the brain, no idea which.

Solution: build a new one and replace the whole board. Which also has the advantage that when these repurposed wind farm batteries run out, we can order more prismatic cells and put them in our own case (Jeff found an aluminum fabricator in japan that would do nicely), and make our own battery systems.

Oh, the old battery cases are also water cooled. (Well, half water half ethylene glycol.) The big cube shed things have an elaborate climate control system, and the spec sheets say that these batteries operate within about a 3 degree celsius temperature range. This is partly because they packed a LOT of batteries tightly together into each cube (and run a LOT of power through them), but also because they apparently didn't want to do the math about how the batteries behave differently at different temperatures. (Which Jeff has papers on with graphed curves and math... but the chinese engineers apparently didn't bother.) Which is funny because the 28 pin connector only NEEDS 14 wires to measure the battery voltages (52/4 = 13 cells per AFE, needing 14 taps: 12 between adjacent cells plus one at each end of the string), and most of the rest are probably temperature sensors? If we're using the same case and wiring harness for the initial deployment we want to reuse those temperature sensors, but... no documentation. Gotta go poke at stuff with voltmeters and crack one open to find what's actually there and look up data sheets...

All this stuff has to get re-certified to be hooked up to the grid, we need to find or create documentation for the parts we keep. (Jeff also wants to find a "current shunt" for measuring the whole battery, because adding up the individual cells isn't good enough. It's apparently somewhere in all this.) Personally, I'm uncomfortable mixing enough electricity to run a car with conductive liquids, but it's what's already there. Cracking them open and then deploying the result is a thing we would rather not do, so we want to just replace the electronics on the front without opening the case. (Also, Lithium Iron Phosphate is WAY SAFER than Lithium Ion. I would not want to do ANY of this with Lithium Ion. The downside is LiFePO4 only has half the energy density of the best Lithium Ion, but it can go through a LOT more charge/discharge cycles without losing capacity. "Lasts fifty times longer and puncturing a single membrane doesn't result in a three hour fire water won't extinguish" is rather a nice trade-off.)

Anyway, coming up with a plan for what to do with all that was "Milestone 2". Initially the goal for that was "design a demonstration prototype unit" (and milestone 3 is building/delivering like 3 prototypes they could show to people), but we wound up debugging through enough of the original electronics (and out the other side) that we came up with a scalable manufacturing plan for all-new replacement parts.

At which point I assumed we would actually start making stuff, but so far...


May 10, 2023

Jeff wanted me to come along to Shibuya for a meeting with Mike and PK today so we can go over business plans for fundraising. Because of course. I mentioned not wanting to talk to Mike, and Jeff went through several variants of "that's not good for me", "you can't do that", "get over it", and "suck it up and deal" (none of them phrased _quite_ that way), and I went along rather than argue.

I think Jeff's position here is "ha ha, Mike just _threatened_ to have you arrested and presumably deported and barred from the country, it didn't actually happen, so no harm no foul". My position is "Mike showed me who he is". There's probably some divergent neurochemistry in there, what with me being an ADHD poster child and all. (Growing up I was diagnosed "hyperactive and gifted". They hadn't invented ADHD yet. This isn't exactly rejection sensitive dysphoria because I never wanted Mike's approval, he's a friend of Jeff's whom Jeff finds useful for running the business that gives us the opportunity to work on the interesting tech. That's not the relationship Jeff wants me to have with Mike, and "I work on interesting tech with you" is not the relationship Jeff wants me to have with his business, either. But I started working for him in October 2014, it's 8 and 1/2 years later, and I can't think of a single thing we worked on that actually got deployed. We have not shipped ANYTHING to a customer except prototypes and demonstration units. As with Linux on the Desktop and making Android self-hosting, I keep grinding away and want it to work, and we get closer. There's a bunch of good side effects. But I am no longer trying to organize my household finances around it succeeding: for the moment Google is paying the bills so I can focus on toybox, and THAT is plenty of challenge for me. I don't know how long that situation will last, and am trying to make the most of it. This is VACATION TIME from that.)

I want to learn tech stuff from Jeff, and he's got a lot of great projects to do. I was hoping this trip we might reopen the VHDL to implement the barrel processor and fourier engine functionality we were talking about before Covid happened. Making an ASIC work through Sky130 _or_ ArtAnalog/TSMC would be great too. Jeff's also talked about doing a fresh j-core implementation starting over with a tomasulo/scoreboard design so it can do multi-issue. But we're not working on any of that, because we have more important things to do... as in chase money. I always get lured here with promises of tech work, and then we do a big fundraising document.

This time the document was called "milestone 2", and there was at least a lot of high level technical design work involved as we worked out how to recondition a load of batteries someone wants to repurpose from storing power at a windmill farm into individual combini and factory power walls. Only 6 of Japan's 17 nuclear reactors have reopened since Fukushima, meaning they leaned hard back into fossil generation without really PLANNING for that to happen, and in the past couple years the cost of Japanese electricity has tripled as fart gas got expensive due to Vladimir Putin's dick being too small. Load shifting from overnight to daytime is now potentially a big cost savings, and that's a market with legs anyway as wind+solar ramps up, so let's get into the battery management system business! Sure, why not, sounds like fun. I've been watching prudetube videos from Will Prowse and such about this sort of thing for years anyway, I'd love to learn more.

When I first went to work for Jeff in 2014 his company Smart Energy Instruments was trying to retrofit the electrical grid with sensors so we could feed a lot more wind and solar into it. I'm big into renewables and getting off fossil fuel, this IS my idea of fun. I very much want to see this project succeed, and grow into a sustainable business.

In the first ~2 weeks of this trip I learned how battery management systems work, although I doubt I could quite reproduce all the math myself. We've confirmed we can do a new one from off the shelf chips that don't require an NDA, plus technology Jeff has lying around from previous projects. Yay! The result was... a document that goes to somebody who gives Jeff money for having completed the project milestone. (But that somebody is not a "customer" and Jeff was angry that I kept calling him that.)

At the end of my original trip it looked like we were just about to actually start building stuff, so I agreed to stay a couple more weeks. Then as soon as the trip was extended, we did the Open Project stuff to make gantt charts, and today's meeting I didn't want to attend was about preparing for a fundraising round.

After the meeting in Shibuya, Mike wanted to talk to me about setting up a meeting where he and I go over a new contract for me to come back to work for Jeff's company full-time. I was noncommittal. I'd rather NOT be arrested and deported before the 17th.


May 9, 2023

Travel arrangements for the potential talk in taiwan this summer went a bit off the rails yesterday, because when they originally asked about airport selection I thought they were talking about the DESTINATION airport not source (I didn't recognize airport code IAH)... so they booked me out of Houston. Oops. That's a 2.5 hour drive away from Austin (if I still had a car, bit longer by bus). I asked if they could add a connecting flight since a quick check of commuter flights from Austin to Houston shows a bunch for around $70... and they offered to refund me the $70 when I got there. I was too tired to cope and thought I'd try again in the morning. (Actually I went "maybe I could just send them a video of my talk, and not go in person because even if they can't amend the itinerary I'm sure they can still get a refund this far in advance...", and was up until 3am doing an outline.)

The money isn't the problem. The USA's insane security theater is the problem. Airports these days, you're supposed to budget 2 hours to get through security, and if I arrive on a different itinerary than I'm continuing on I have to go through security AGAIN (and collect my luggage and re-check it), which means my one hour layover adds 2 more hours, and I still have to arrive 2 hours early in Austin, plus the actual austin to houston flight, plus me getting up and going to the airport in Austin, which means my ~1:30 pm departure out of Houston is now something I should leave home for around 6 am. It's now a red-eye flight with something like 17 hours of travel before I arrive in a strange country to deal with a new kind of customs check and trying to find the hotel. (Tokyo I more or less know my way around now, and can recover from inevitably getting lost more than once on my way anywhere. Taiwan I've never been to, I'm assuming it has a rail system or busses of some sort? To... a hotel? Somewhere?)

What I'd MEANT to work on last night back at the hotel (instead of outlining a talk I don't have to give for months yet as a way of venting "dowanna deal with this right now" anxiety) was getting Jeff an initramfs that extracts a tarball into a subdir and then does a proper switch_root into that. Which means teaching switch_root that it's not only partition boundaries that block file deletion, it should also skip the destination directory. Oh, and it wasn't doing the mount --move on the existing partition mounts; I thought it was, but apparently I hadn't implemented that.
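
For concreteness, the shape of the thing (a minimal sketch with hypothetical paths, not the actual script I sent Jeff):

  #!/bin/sh
  # initramfs /init: unpack the real root, then pivot into it
  mkdir -p /newroot
  tar xzf /root.tar.gz -C /newroot
  # switch_root deletes the old initramfs contents (skipping mount points,
  # and with the above fix the destination directory), moves the existing
  # mounts over, chroots, and execs the new init
  exec switch_root /newroot /sbin/init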

Doing this means the j-core developers don't have to rebuild the kernel each time to change the root filesystem contents. (They never taught the j-core bootloader to load an external initrd.gz file because doing that with device tree involves either patching an existing device tree on the fly or doing the device tree overlay thing, and we just never got around to it.) Upgrading switch_root is a good toybox thing to add to the release I'm preparing anyway, seemed like a good thing to prioritize.

So after about 3 hours of sleep, I got up and started working on that, and an hour later Mike called up to yell at me that I was making "us" look poor by arguing over $70 (the conference that invited Jeff and me to speak had offered to cover travel, I took them up on it, I hadn't yet responded to the email offering $70 instead of amending the itinerary). When I told him my position on that, he shifted to yelling I hadn't left the hotel in days (not true?) and that I didn't have the work for Jeff done yet which he had CALLED INTERRUPTING ME WORKING ON. (And which I do best in the hotel because our Hello Office is hot, stuffy, windowless, and full of trash.)

They're paying for my trip but not paying me a salary. I like tokyo, hadn't been here in years, from my point of view I got a free vacation to Japan and am helping out friends, but I've GOT a day job working on toybox, and am trying to keep up with that while I'm here. I want to see them succeed, but this ain't paying the mortgage (or the $8k to clean the mold out of the vents back home).

I told Mike that his call had interrupted me doing the thing he had called to yell at me about not doing, and that I was going to go back to doing it, and hung up and muted my phone for a bit. Twenty minutes later I noticed his Signal message: "Rob pick up the phone or I will get you arrested by the police". (Did I mention Mike is a japanese citizen and I'm not? On an unrelated note, when I first got ADHD meds I didn't know they weren't allowed in Japan: different schedule levels and not honoring foreign prescriptions and so on. When Jeff and I cleaned up the office I found all sorts of old pre-pandemic things, many of which I'd asked Mike about the status/location of before said cleanup. Back when I left the apartment in Japan expecting to return _before_ the end of the pandemic, I left behind two suitcases worth of stuff which Mike moved to the office when the lease on that apartment finally expired and they chose not to renew it.)

So at that point it was me trying to get the code done and sent to Jeff before the police showed up. I got it done, tested, checked in, and an updated image (with build procedure) emailed to Jeff. I then left the hotel room (still sans police), met Jeff at the Hello Office, and walked him through the build so I was sure he could reproduce what I'd done.

I'm sure other things happened later that day, but I couldn't tell you what they were. No, police didn't show up. (I dunno if Mike was bluffing or merely changed his mind. I was actually looking forward to maybe getting to go home early, I'm kind of regretting extending my stay from the 3rd to the 17th if it's going to be like this, and NOW the complication with the Taiwan itinerary thing has switched to "I asked them to fly me in from the USA and out to Tokyo, but I'm no longer sure I could/should come back to Tokyo"...)


May 8, 2023

Cleaned out the spam folder again. For some reason, gmail has decided that all of Elliott Hughes' posts to the musl mailing list were spam. (Not the REST of the threads he's replying to, just him. And not his posts to me or to the toybox list.) Yes, he's a google employee emailing from a google.com address. *jazzhands*

Finding stuff wrong while doing release notes, as you do. (Nothing highlights gaps and weird cornercases like writing documentation.)


May 7, 2023

I should record my talk proposal writeups instead of just entering them into various call for papers websites where the ones that aren't selected vanish. (I mean, I SHOULD do them as online talk videos, but it's really hard to motivate myself to talk to camera by myself in a quiet room. As the twelfth doctor said in the episode Heaven Sent: "I'm nothing without an audience". I was doing ELC but the Linux Foundation's really done a number on that one and I haven't been able to brace myself for it since 2019.)

Slogging away at toybox release notes. I've done a new entry skeleton that I may leave commented out at the start of the page, because I haven't entirely been consistent in my category headings. I should ALMOST CERTAINLY switch the index.html link to point to the "about" page (which tries to explain what the project is and why) instead of the "news" page (which is long technical gibberish release notes and not a good first impression; proof of life, sure, but a vertical cliff instead of a gentle slope to ascend).


May 6, 2023

Oh goddess. You know how news coverage and articles always seem authoritative until you read something you already know about, and then there's multiple obvious errors? I just read the Wikipedia[citation needed] article on Bionic. Lots of "that's just wrong", "that's almost a decade out of date", "Elliott fixed that because of _me_", "no you can just do this instead"... I may need to go lie down. (And I'm not even a Bionic developer!)

Jeff and I cleaned up the Hello Office by dragging most of its contents out into the conference room down the hall and then putting 3/4 of it back and throwing out the rest. (Well, right now it's a pile of trash in the middle of the office because the building's trash room is only open for an hour in the mornings, and we didn't find a box cutter for the boxes so may need to buy one. But I'm calling it a win anyway. Three hours of lifting and hauling. I do NOT get enough exercise.)

During the shoveling I unearthed a mysterious CD which turned out to have the professional photo I had taken (japan still has that service, like Sears used to) for my now-expired zairyu card, which means my own attempt was not strictly necessary.


May 5, 2023

Woo, 5000th commit to the toybox repository! I feel I should have some sort of celebration. (I bought an instance of the famed famichicki from famimart. Tasty, but very greasy. I prefer their teriyaki grilled chicken breast to their fried offering. I guess both technically qualify as famichicki, but the fried one is the meme.)

I am REALLY TEMPTED to add a new option to toybox echo so it can split the arguments it's printing with newline instead of space. There's a lot of "ls blah/*/blah | xargs | blah" to glue things together, but the other way has to use "echo blah/*/blah | tr ' ' '\n'" which is awkward (quoting both arguments!), but not awkward ENOUGH to add "echo -N blah/*/blah" and open the whole "should I try to teach busybox and coreutils about that" can of compatibility worms.
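
I.e. the current idioms side by side, plus (for what it's worth) the printf spelling, which already does one-per-line because the format string gets reused for each argument:

  ls blah/*/blah | xargs            # many lines into one line
  echo blah/*/blah | tr ' ' '\n'    # one line into many (the awkward one)
  printf '%s\n' blah/*/blah         # also one per line, no tr needed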

Yes, I added "test -T 37" recently, because I care that the file descriptor is OPEN, not that it's necessarily a terminal. And I couldn't figure out how to do that otherwise... Sigh, figured it out: I can just do "2>/dev/null <37" instead and the shell will error if the filehandle isn't open (because the dup2() fails). Alright, remove that then. (This is why releases take so long: writing documentation reveals needed code changes, and both blogging and release notes count.)


May 4, 2023

Under the weather today. The kebab place Mike likes to go has very spicy food, and I killed a roach that crawled over the table while I was eating. This morning my digestion was not happy. Go figure.

Working on closing tabs for a toybox release. So many tabs...

I need to rebuild toolchains with musl-1.2.4 but I never did bisect why sh2eb won't build under newer gcc, and I can't ship new toolchains without that. (Well, ideally I want to rebuild the hexagon llvm too, which had its own version skew a while back. I also want to redo mcm-buildall.sh to not need mcm, at which point I can probably stick the new replacement in mkroot.)

And I need to finish mkroot/README and update the mkroot faq.html entries...


May 3, 2023

Some text I cut out of my reply, as "not helpful". (I have a venting-about-lkml budget I try to stay under, and that message already had plenty. Here, no such limit. Well, I spent so much of the rump administration venting about what the ruling nazis were doing that I forced myself to only do it on odd numbered days and left the even ones for technical stuff, but A) much higher limit, B) that was people's lives rather than just niche drama.) Anyway, what I wrote was, with URLs moved into actual links because blog instead of non-HTML email:

I was going to point you at the last kernel commit with "oppenlander" in it so you could confirm which email to use, but I have a repository here going back to 0.0.1 and his name's not there despite the patch submissions. He's not a regular in the clique, so nothing he submitted ever got in. That's linux-kernel for you.

(For all my faults, I historically _have_ managed to get code into linux-kernel. Largely because I'm really old so have been around longer than a lot of the grognards gatekeeping these days, and I was even technically the Documentation maintainer between commits 01358e562a8b and 5191d566c023. And I understand the whole "if you want to get anything done you have to complain until you're blue in the mouth" Dead Parrot aspect of the project the author of Squashfs eloquently explained ("a closed community which know everyone worth knowing by sight") ten years ago when Linux Weekly News asked if the Linux Foundation had completed its purge of all hobbyists from the open source development process, which it had. They've ossified a LOT more in the 10 years since Philip Lougher wrote that...)

So yeah, happy to submit patches to someone who will actually talk about the code and not the bureaucracy+politics (he says, venting about the bureaucracy+politics).


May 2, 2023

Jeff and Mike are turning a big todo list I made into Open Project Work Items. I'm sitting with my laptop doing other stuff, but available in case they need to ask questions about the todo list.

I have an old rant about open source being unable to do user interfaces, and it's about how any time it's faced with a user interface issue the process melts down into one of three distinct failure modes. I know I blogged about it but couldn't remember which year off the top of my head, so I googled for "landley three distinct failure modes"... and then put quotes around "landley" because I recently learned that it's silently substituting in random misspellings for words it doesn't think are popular enough... and my blog STILL does not show up in ANY of Google's hits. Nor does the copy of the rant I put into the aboriginal linux about page, which I was reminded of when I looked at the talk version of the rant I gave years ago at ELC and I had that about page version up on the screen.

Google found NONE OF THAT. Despite all three containing the phrase "three distinct failure modes", and two of them being on landley.net. Google search is not healthy. It's kind of concerning: twitter going away is one thing, but Google Search will be _missed_. (They're panicking about chatgpt, but NOT about rapidly losing competence at their original core business. It seems to have started about when they laid off those 12,000 people.)

Today I learned that Open Project (and presumably whatever the generic name for crap-like-jira is) has "stories" and "epics", and an epic is a collection of stories. (Like the Epic of Gilgamesh... which seems kinda unique, and nobody ELSE calls a collection of stories an epic? It's usually a series when it's not a trilogy. Kevin Feige is trying to brand the MCU iron man to endgame collection as "the Infinity Saga".) This "epic" naming is pretentious enough I'm actually slightly nauseous. I would go out of my way to avoid meeting the people who decided on that naming.

Still getting emails for the "Austin Tech Happy Hour", which was a vaguely interesting thing many years ago. It seemed like a good idea to maybe meet some people on the same side of the planet, now that all the local LUGs I knew broke up. (I went three or four times, don't think I actually met anybody I wound up seeing again.) But at some point it grew a cover charge to keep the riff-raff out, and I really don't feel the need to pay $10 to attend a gathering of people I don't know at a bar, thanks. (Meeting random strangers with shared interests in-person is what giving talks at conferences is for. And science fiction conventions. In THEORY it's what meetup.com was about, but all the ones I tried to attend of those were "oh no, you're not allowed to enter the building without paying" nonsense too. And all those SxSW events that supposedly didn't require a badge, I stopped trying those YEARS ago because I never once got in. They were either full to capacity from preregistrations I couldn't access without a badge, or just plain "it said it didn't need a thousand dollar badge but does". As with the twitter blue checks, it's not the ability to afford it that's the problem, it's that the kind of people you're selecting for means I don't want to meet them.)

Alas, my normal daily schedule involves sitting quietly in various corners reading and/or writing things, with the occasional long walk by myself. I often have _extensive_ correspondence with people at least a thousand miles away, but have to go out of my way to exchange ten consecutive words with anybody in the same town who I don't actually live with. There's a reason I founded more than one science fiction convention back in the day. :)


May 1, 2023

Darn it, glibc's wcwidth() is returning at most 1 for every character in toybox, never 2 even though when you cat tests/files/japan.txt it's all hiragana characters of width 2 (visibly measurable against an ascii text line above it). I'm trying to rewrite fold.c to do unicode properly and the glibc apis don't work.

Jeff is deeply enamoured of a pointy haired management thing called "OpenProject", so we spent HOURS yesterday setting it up so he can do gantt charts in it. Except the admin account doesn't work because it immediately goes "cross site scripting!" which turns out to be because the browser doing https is not enough, the openproject application ALSO has to have access to your let's encrypt keys. (Why? I don't know. Third base.)

This thing is the kind of "open source" you see when a corporation produces regularly updated abandonware. It has no community. There is no Libera Chat channel for it. Googling for things about it produces hits on their site and nowhere else (although with the sad state of google search I'm not sure what that proves).

A recurring error in our attempts to set up OpenProject is that their git integration breaks apache, which refuses to start because "OpenProjectGitSmartHttp" is a made up word its config file parser doesn't know. Googling for that word finds a closed bug report on the Let's Encrypt website where the Let's Encrypt people say "this is not our bug, ask openproject". There's also a bug report on the OpenProject website where somebody said it broke, and someone else replied "yeah it broke for me too", with no response and no fix. The bug report is from 3 years ago.

We EVENTUALLY figured out that the magic word is exported by the subversion integration code, so if you enable git integration WITHOUT also enabling subversion integration, it CAN'T WORK. (I repeat, this project has no developer community except employees of the company producing it, and THEY want you to run their magic docker where everything is preinstalled for you and you do not touch their proprietary inexplicable secret sauce "open source" code that you're crazy for trying to install/configure yourself.)

And of course if you enable the svn integration it breaks apache for a DIFFERENT reason, so we just switched them both off for now.

I also noticed that the gmail account Jeff set up for me years ago, which I'm only logged into on my phone, hasn't been inactive like I thought. When I open the gmail app on my phone (only thing logged into it), it says "auto sync is off", and I have to pull down to load to see if there's new mail. This is why I haven't gotten a new mail notification from it since last year. BUT if I try to turn auto sync on, I get a full-screen pop-up saying this doesn't apply to just gmail but will also flush all my photos to google's cloud so they can scan them on behalf of ICE and the TSA. There's no obvious way to enable "tell me when I get new email" without "send my contact list and location history to the governor of texas whose boomer supporters can sue you for a million dollars if they think your wife had a miscarriage". Hell no. I don't want to sync my photos, contacts, location history, I don't want it uploading (let alone retaining) the voice samples from speech-to-text (which I KNOW it can do locally because it does it in airplane mode, I don't know what Rossman is on about? Or is this one of those "it _can_ operate independently but there's no way to tell it not to upload everything anyway" things like I'm having with the email client?)...


April 30, 2023

So I'm writing a new unicode aware fold and I'd just like to say that posix really needs to move past the Y2K bug and enter the 21st century at some point. They have a "-b" meaning "interpret as bytes", but do NOT really handle the "not that" case.

Backspace is defined as reducing the column count by one, but unicode characters can have variable width (including zero for combining characters which should logically come BEFORE the character(s) they combine with but don't because somebody REALLY STUPID was on the unicode committee, I'm assuming from Microsoft). So in THEORY backspace should remove the number of columns consumed by the last printable character.

In practice, the flush-and-forget approach to output when toybuf fills up is a problem because we may have to backspace into it... unless we record how wide each column of output was? I mean, that's just a malloc of length -w (or shorter if we want to get fancy), AND avoids having to back up through utf8 to find the last printable unicode character.


April 29, 2023

Talked to Jeff about whether I should bump my flight back. We're getting a bunch done, and Fade and Fuzzy don't... strongly object. I enjoy Tokyo, and get about as much toybox work done here as I do elsewhere (the better work environment balancing out the extra demands on my time, although I hope fixing the mold in the vents back in texas changes that going forward).

I'm basically getting a free vacation in Japan, modulo not really seeing much of it outside of late night walks (tried to walk south to the beach; there's no beach, it turns into an industrial harbor sort of thing. Oh yeah, city. Right...) but I'm STILL trying to learn nontrivial amounts of the language.

Everybody keeps saying that "the food is so much healthier here you will lose weight", but between 24 hour combini with tuna mayo rice balls, and sweet milk tea EXACTLY the way I made it growing up (which was dismissed as absurd by everyone else; where's the Fools I'll Show Them All lightning when you need it, VINDICATION!!1!ichi!), it hasn't exactly worked out that way. I have two relevant Claire Ting videos queued up but would have to shuffle luggage for multiple minutes to clear space... so long walks. I wonder if there are any swimming pools available?

I'm eyeing a toybox release. It's overdue, I know, but there are so many things I'd like to get IN said release. Still, 6.3 came out and I should more or less try to stay synced with kernel releases...


April 27, 2023

The downside of this lovely office is I have no ADHD meds here, because they're not legal in Japan, and I'm really starting to notice.

Sigh, SMB_SUPER_MAGIC is still lying around, and it got moved to staging in commit 2116b7a473bf and then removed in 939cbe5af5fb in 2011, which was TWELVE YEARS AGO and yet the debris is still not just in the kernel tree, but in the header files exported to userspace. Oh hey, and USBDEVICE_SUPER_MAGIC is gone too (commit fb28d58b72aa back in 2012), but the symbol's still exported in the header. Oh, and last time I was poking at this, Novell Netware went away in commit bd32895c750b but they still have NCP_SUPER_MAGIC in the header.
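
All three are easy to spot from userspace; a quick probe from a kernel tree, assuming they still haven't been garbage collected since:

  # dead filesystems whose magic numbers still ship to userspace
  grep -E '(SMB|USBDEVICE|NCP)_SUPER_MAGIC' include/uapi/linux/magic.h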

The observation in The Cathedral and the Bazaar that "with enough eyeballs all bugs are shallow" has been demonstrably untrue of linux-kernel for at least that long. There are not enough eyeballs because the kernel community is unwelcoming of newbies who would go over the obvious with fresh eyes and thus point out stuff like that. Just like the geriatric Unix community it replaced, it's now all old farts with long white beards and suspenders telling everyone who will listen about the glory days ~25 years ago.

Huh. I've got "fat" as 0x4006 in my list, and can't find that in the kernel source (not current, not 4.0, 3.0, 2.6, 2.4, 2.2, or 2.0). It came from a patch from Hyejin Kim but I have no idea where he(?) got that from? There's "msdos" and "vfat" (both 0x4D44), but no "fat" using 0x4006... And the 4d44 constant was added in linux-0.9.7 in 1992.

Right, posted the patch with a jazzhands comment, poked the github request to see if that fixed it for them, and punted on a BUNCH of questions. (If I identify smb do I say "cifs" or "smb3", both of which are driver names you can mount it with but... different behavior? msdos vs vfat is another but there's never reason NOT to use vfat these days that I'm aware of...) What I should really do is come up with a Horrible Sed Invocation that just extracts this data from the kernel source so I can regression test, but I'm not up for it right now. (In part because grep -ho 'register_filesystem[(][&][^)]*)' -r * | sort -u | wc -l says there's 97 of them, and in part because the first one grep finds is in arch/s390/hypfs/inode.c and I really can't bring myself to care about that one at the moment. And because grep 'static struct file_system_type .*_fs_type = {' -r * | wc returns 83 hits rather than 97 meaning this is NOT quite regular enough to make it easy.)
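
The easy half of that Horrible Sed Invocation is the exported constants themselves; pairing them with the strings register_filesystem() actually sees is the irregular part. A sketch of just the easy half, against the exported header:

  # dump NAME VALUE pairs for every hex *_SUPER_MAGIC in the exported header
  sed -n 's/#define[[:space:]]*\([A-Z0-9_]*\)_SUPER_MAGIC[[:space:]]*\(0x[0-9a-fA-F]*\).*/\1 \2/p' \
    include/uapi/linux/magic.h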


April 26, 2023

Fade finally tried one of the cans of The Dintiest Moore I left behind (I asked her to order a flat while I was there), and is a fan. It's the only american product I've encountered that makes serious use of Demi-Glace. (I don't know what non-demi glace would look like. Full glace? Ask the french. Highly boiled cow.)

Watched a frustrating history of gasoline video, which both had good historical information and repeated debunked lies out of old industry press releases verbatim.

A hundred years ago, Standard Oil worked out that mixing about 10% ethanol into gasoline prevents engine knock. All the lead in tetraethyl lead EVER did was make it PATENTABLE, because ethanol (which is the kind of alcohol humans have been drinking for thousands of years) already existed. The lead served no other function in the mixture EXCEPT to make it patentable. Tetraethyl lead is four ethyl groups connected to an atom of lead (resulting in a molecule shaped like a swastika), and when you heat it those four ethyl fragments come off to do the same anti-knock job the ethanol would have done, plus a free radical of lead which goes out the tailpipe. It otherwise behaves EXACTLY like mixing ethanol into the gasoline would (which was the goal of developing the compound), and when it was finally restricted by the EPA they replaced it in gasoline with pure ethanol. Old engines that COULD use leaded gasoline (because they didn't have a catalytic converter, which the lead binds to, covering over the catalyst surfaces that otherwise break down incomplete combustion products like carbon monoxide and nitrogen oxides and so on), all those old engines worked JUST FINE with "unleaded" gasoline, and people only thought the stuff with lead was "better" because of years of advertising lying to them and causing placebo effect performance evaluation.

The airborne lead also made people exposed to it measurably more stupid, which is combining badly with senility in the current Boomer generation as age-related neurological degeneration overcomes their ability to compensate for a lifetime of nerve damage from massive pediatric and chronic lead exposure. (This is why everyone fled the cities to the suburbs: they moved upwind so they could breathe! But it was only RELATIVELY better, the air of the ENTIRE PLANET was poisoned; airborne lead was like acid rain and the CFCs that caused the antarctic ozone hole.)

Keep in mind that organic lead compounds are generally even worse than metallic lead, because the human body is better at absorbing organic compounds and bringing them inside cells. So both tetraethyl lead itself and the free lead radicals going out the tailpipe in a cloud of superheated moist carbon monoxide and so on... that may have poisoned the Boomers WAY more than the largely inert residue it's broken down into 30+ years later. Some compounds are worse than others: the movie Erin Brockovich talks about hexavalent chromium being WAY MORE TOXIC than other chromium compounds, and the research chemist Karen Wetterhahn was killed by a couple drops of dimethylmercury poisoning her through her glove. The leaded gasoline profiteers were intentionally putting lead into volatile organic compounds that people would inhale, and the neurological damage the Boomers suffered from this is manifesting VERY STRONGLY in their senior years.

Seriously, I wrote about this at length, with citations to multiple articles about it. Water samples taken in the middle of the pacific ocean had 20 times as much lead near the surface as the same location a few hundred feet down. Blood lead levels were SIX HUNDRED TIMES higher than samples from ancient egyptian mummies, and children absorbed 5 times as much as adults did. The Boomers were the first generation to grow up surrounded by cars, and it HURT THEM BADLY. In their 20s they could mostly compensate. But as they slowed down in their 40s the brain damage really started to show, and now that they're turning 70 two thirds of them are losing all touch with reality. This is not a case of oligarchs being better at manipulating people than the Railroad Robber Barons of the Gilded Age of the late 1800s, this is a population of lead poisoned vegetables ripe for elder abuse. Ten years ago they were falling for nigerian prince email spam, and now it's fascists finding them useful political cannon fodder. If even the rich and famous regularly suffer from elder abuse, imagine what the wider population of brain damaged Boomers is undergoing. Boomerdom going full nazi is because they literally have brain damage, which means our best chance to pull out of it and clean up afterwards is to outlive them.

Back to the frustrating video: when he later goes on to talk about "oxygenates" like ETHANOL... he does not connect the dots. This was not a new discovery. Thomas Midgley and his bosses understood this JUST FINE a hundred years ago. They chose to poison LITERALLY BILLIONS OF PEOPLE around the world entirely for profit. And then when the oil industry stopped needing "the Ethyl Institute", the think tank reorganized itself into The Tobacco Institute to defend poisoning OTHER people for profit. And when that ran out, they reorganized into a bunch of global warming denialist think tanks to continue to kill people for profit.

Billionaires love to profit from fascists, and gerontocracy collapses into fascism, and we're suffering from both right now. On the gerontocracy thing: Hitler came to power in Germany because the previous President of Germany, 86 year old World War I veteran Paul Hindenburg, made him Chancellor in 1933 to shut him up (ahem: in hopes sharing power would appease him). Hindenburg was then manipulated into signing an emergency declaration ONE MONTH LATER giving Hitler's edicts the force of law, not subject to judicial review for the duration of an emergency that lasted until Hitler said it was over. A year and a half later Hindenburg died, at which point Hitler appointed himself president AND chancellor. Hindenburg was the same age as Dianne Feinstein (who is still in the senate), 3 years older than Nancy Pelosi (who is still in congress), and the same age Biden would be at the end of the second term he just announced he's running for. At least all those guys are OLDER than the pediatric (but not chronic) lead exposure from gasoline.

Oh good grief, now the guy in the video is on about ethanol coming from plants that absorb carbon dioxide: STOP IT. All that matters is whether it's fossil carbon or not. Plants taking carbon out of the atmosphere for SIX MONTHS before it goes right back into the atmosphere does not change the amount of carbon in the atmosphere in any meaningful way. Mining operations that take carbon that's been underground for millions of years and release it into the atmosphere, THAT is what permanently increases atmospheric carbon. I do not care about rearranging deck chairs on the titanic, either you're mining fossil carbon or you aren't. (The problem with "carbon sequestration" is finding someplace to put it. A trillion dollar industry digging up carbon from miles underground is kinda hard to run in reverse at the DESIGN level...)

Reading press releases is not research.

Sigh, archive.org decided to commit seppuku during the pandemic (let's aggro every major publisher by putting their books online for free!) so I should definitely mirror the institutional memory post in my own computer history archive before it goes away. (Yes IP law is stupid the same way car-centric cities are stupid, but running out into traffic is not the answer.)


April 25, 2023

Finally got the turtle board running the 6.3 kernel and current toybox (increasing my kernel patch stack to 10 patches in the process), and... there are bugs. For some reason, ctrl-C doesn't work in the console which means oneit isn't doing the switch from /dev/console to /dev/ttyS0 (well, ttyUL0 there) properly. Another problem is that "ps" produces no output, even though I can cat files out of /proc and see the raw data it should be transforming into output.

Alas, I still don't have a proper nommu test system set up under qemu, and sneakernetting sd cards over to the turtle board for compile/install/test cycles is... really hard on the fingernails. I burned out (gummed up?) one SD card adapter already and bought a new one that's REALLY TIGHT, and have been slowly chipping bits of plastic off the ridge at the end getting the sd card back out. (The turtle board itself does the push-to-click thing but the laptop end uses a microsd-to-sd adapter, unless I want to dig up a USB adapter which is worse. I've already trimmed my fingernail to be less pointy in hopes of chipping out LESS plastic, but it's a question of degree.)

Sigh, at some point I need to do this dance with QEMU's virtual cortex-m board so I have a nommu test environment that runs under qemu, which should make regression testing this a lot easier. The problem is I don't have a "what success looks like" reference version there. Maybe I can beat one out of buildroot? (Or make puppy eyes at Geert Uytterhoeven about coldfire, that's a nommu target qemu theoretically supports as well, although I recall getting a kernel/board config to match with nontrivial amounts of RAM and useful peripheral devices didn't line up last I checked. Sigh, I should learn to modify QEMU, but just haven't got the spoons.)


April 24, 2023

The magic to stop vim from intercepting the mouse, thus preventing the terminal from letting me copy and paste text between a screen session at the far end of ssh and a local window, is the colon command "set mouse=" with nothing after it. There may have been a small rant.


April 23, 2023

I'm not just merging the j-core turtle board config into mkroot, I'm also cleaning up mkroot in general in preparation for cutting a toybox release, and testing the 6.3 kernel. Of course there's kernel config weirdness. Kernel commit 3508aae9b561 memorializes a lot of config changes back around v5.8 that I wasn't paying much attention to at the time. IOSCHED_CFQ became IOSCHED_BFQ, IOSCHED_DEADLINE seems to have replaced the NOP one (always configured in), and MMC_BLOCK_BOUNCE went away because you can't switch off the bounce buffers anymore. MTD_M25P80 got merged into MTD_SPI_NOR.

Dirty trick: I can detect NAME=VALUE in the mkroot microconfig format and automatically insert lines other than =y or =m without needing the separate KERNEL_CONFIG mechanism... Except that the value can in theory have a comma in it. (None of the ones I'm using yet do, but they CAN.) Hmmm, I suppose I can come up with an escape mechanism for the comma? And then NOT have an obvious example of it in the file. Hmmm... The alternative is keeping the second mechanism for passing in raw lines despite nothing in the file currently using it. Or waiting for somebody to complain, which... isn't really better here because said complaint is likely to turn into "oh I can't use this" rather than "I'd better report this to the maintainer". Hmmm...

(I can backslash escape quotes and spaces, but can't backslash escape commas because the escape gets eaten before that parsing happens. I could transpose it with another character but that's black magic. I could say an assignment has to be the last thing on the line so it eats commas but I've already got multiple assignments in one config. Hmmm...)


April 22, 2023

And the air conditioner service guy back in Austin found mold in the vents. So we have to make an appointment with a Mold Remediation Specialist. Great. Well, that explains why I'd feel so tired five minutes after getting home, and had so much trouble getting a good working environment there and preferred to do all my work out at a fast food table or at the university.

(This is exactly why we had very expensive specialists come after every flood with HUGE BLOWERS and refrigerator-sized dehumidifiers drilling holes and spraying gallons of chemicals into the walls: DID NOT WANT MOLD. Didn't really think of the air conditioner vents, where condensation is kinda normal. What's in there for them to eat, anyway? Dust, I guess...)

Anyway, I'm here in tokyo, where the mold smells completely different. (The clothes I left hanging to dry in the apartment needed some serious re-washing.)


April 21, 2023

It's so easy to just spend the ENTIRE DAY in an APA hotel room, and ignore the outside world. I shouldn't, because it's Tokyo out there and I really like tokyo, but it's SO QUIET. (APA is apparently the middle three letters of Japan, at least on their posters... Which is weird because here it's Nippon. Medieval Dutch and Portuguese traders asked OTHER countries what those islands over there were called and "Japan" seems to have emerged via consensus from multiple languages playing telephone, and then the same insane map makers who named two whole continents after Amerigo Vespucci went sure, "Japan", sounds great).

I figured out why Jeff can't stand them, or the windowless Hello Office: the pandemic gave him claustrophobia. Being in enclosed spaces too long gradually increases his stress levels and he needs to go OUT somewhere. I can relate, but am personally experiencing the opposite here. Let me work!

The main limiting factor is Jeff calling me up and pulling me in to his projects, but he is paying for the trip.


April 20, 2023

The old j-core ethernet driver was just too messy to submit to mainline. It's not secret, but Jeff outsourced it to some cheap Russian programmer (the lowest bidder) years before we ever met and only like 1/3 of it is actually relevant. It's got all sorts of debris from IEEE time synchronization and such that were never completed. We should really write a new one, but never got around to it.

That said, the last time it got forward-ported was 5.8, and we'd like to use it on current (6.3) kernels, and I bisected the FIRST build break to commit adeef3e32146, which made a field const and added a gratuitous new API to change it. There's a bunch of commits (bb52aff3e321, 0f98d7e47843, 9a962aedd30f) converting drivers to the new API, so it wasn't too hard to fix it up. The other breakage (b48b89f9c189) removed an argument from a function, and was easy to fix up.


April 19, 2023

Bisected the "Turtle works now" bug to commit 5d1d527cd905, which was a rewrite of the RCU plumbing for the networking code that starts "Using rwlock in networking code is extremely risky..." So yeah, I'm willing leave that part to the professionals. The symptom they saw was soft lockups, it fixed our boot hang, calling it good and moving on.

I have discovered LaserPig on Youtube, who answers the question "What if Sheogorath, Daedric prince of Madness from Skyrim, did youtube videos about the war in Ukraine in character as an extremely drunk farm animal with strong opinions about military history and equipment". I discovered him via a team up with the "oh god you've reinvented trains again" guy who keeps photoshopping Elon Musk into a clown outfit.


April 18, 2023

Trying to have mkroot more gracefully straddle the patched vs unpatched kernel issues, and also get the init script to work nicely in both QEMU and a chroot/container. Added test -T to check if stdin is open, test already has -t but I don't care if it's a tty because a chroot with redirected stdin/stdout (or piped through something) is fine and does not need to be replaced with /dev/console.
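(A quick sketch of the distinction, assuming the obvious implementation rather than quoting the actual commit: -t is isatty(), but "is this fd open at all" just needs some harmless fcntl() to not fail on it.)

#include <unistd.h>
#include <fcntl.h>

// test -t FD: file descriptor is open AND points at a terminal.
int fd_is_tty(int fd)
{
  return isatty(fd);
}

// test -T FD (sketch): open at all? F_GETFD fails on a closed descriptor.
int fd_is_open(int fd)
{
  return fcntl(fd, F_GETFD) != -1;
}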

Sat down to figure out why the current vanilla kernel broke on turtle, and... it's fixed? It smelled like an alignment issue (unaligned access), and maybe it got perturbed so it's aligned again? (Or else there was a bug that hit somebody else and they fixed it?) Either way, that means the revert commit in Rich's j-core patches is no longer needed although I'm still gonna track down what fixed it so I know. (If bad alignment got perturbed into place again, it'll be back.)

And now that there's a new arch/sh kernel maintainer I'm looking at those to see what's still relevant. Vladimir Murzin's commit was merged into vanilla. There's one adding extra percpu memory... why? Rich does not actually provide descriptions with his patches, so I have no idea what actual PROBLEM he was trying to fix, and the kernel's attempts to describe this plumbing are not enlightening.

Ok, generic commits that still apply: 4c7333b0fb9e, 53ac9fc75ae0, 262e1e5884da and could maybe go upstream as-is. Commit 155d2abffb8b is jcore-specific (the clock thing), I think it's generic-ish and could go into the vanilla tree? (Need to test that the turtle board as is still works with it.) The ethernet is 186e1d80a89b and 666583fa6d5d, gratuitously split in two for no obvious reason.

I'm test building on my turtle board, by repeating:

for i in ../toybox/000[34578]*.patch; do echo $i; patch -p1 -i $i || break; done
sed -Eis '/select HAVE_(STACK_VALIDATION|OBJTOOL)[^_]/d' arch/x86/Kconfig
patch -p1 -i ../linux-sh/0001-percpu-km-ensure-it-is-used-with-NOMMU-either-UP-or-.patch
patch -p1 -i ../linux-sh/0001-revert-790eb67374-to-unbreak-j2.patch
mkroot/mkroot.sh CROSS=sh2eb LINUX=~/linux/github
sudo bash -c 'mount /dev/mmcblk0p1 /mnt && cp root/sh2eb/linux-kernel /mnt/vmlinux && umount /mnt'
sudo microcom -s 115200 /dev/ttyACM0

Bash command line history comes in handy there: cursor up a few times and hit enter, take out the sd card, put it in the holder, put it in the laptop, run the thing, take it out of the holder, ka-click it into the board, plug in usb, frown at boot messages, rinse repeat.

Bisecting stuff is awkward (why the above [34578] skips some numbers and a couple patches are broken out as individual lines). Much annoyance because git insists that old is good and new is bad. You're never searching for where something got FIXED, only for where a bug was introduced. Therefore, to find the commit where the turtle board started working again (without reverting commit 790eb67374 in the patch above, that's the "it just started working again" thing) I had to call the one that does not boot "good" and the one that boots to a shell prompt "bad". Because git.

For extra fun, I have to build each one without the revert first to see if I get output, and then build it again with the revert to make sure I DO get output and it's not a different bug. So the cycles are a bit slow.

And mkroot is set up to always do a full build. I can do incremental builds out of tree, but then I have to hack the config file to point to the initramfs directory via absolute path and I dowanna.


April 17, 2023

The hotel rooms in Japan are lovely. Jeff says they drive him nuts, but they're giving me something I haven't had nearly enough of: quiet well-lit isolation with a desk and an outlet and internet access where I can get work done without interruption. (Especially now I don't have to be out of them by 10am, at least 3 days out of 4.) A 5 minute walk away there's tea the outright weird way I grew up drinking it (cold, sweet, with milk: confusing BOTH sides of the atlantic), and cheap tasty rice triangles (onigiri) which I finally figured out the intended way to open so the seaweed goes around the rice. (There's a pull tab that peels away to split the packaging with a plastic thread down the middle, and then you pull it equally off both sides so the seaweed stays in place. Gets seaweed crumbs on the desk, but otherwise works great. The point is the seaweed and rice are separate until you open them so the seaweed stays dry and crispy.)

I hadn't actually _installed_ the qemu targets I rebuilt back at the end of march, and now that I'm trying to test them "mips" still isn't working. And I don't remember what specifically I built (I can see the git log but it doesn't help), so I think I need to pull and rebuild just mips, which is probably still "./configure --target-list=mips-softmmu"? (The QEMU devs have a terrible habit of breaking their API for no obvious reason.)

I locally checked in the "move mkroot to its own directory" stuff and pulled it into my main tree, but haven't pushed yet. I should add a "Hey! This moved!" stub when you try to run the old one, but the #!/bin/echo command line gets the name of the script you're currently running as an extra argument, and "This script moved to mkroot/mkroot.sh scripts/mkroot.sh" is... not clear.

I need ldd to fix up mkroot/root/dynamic, which runs after populating the airlock. Alas Elliott strenuously and repeatedly objected to toybox containing an ldd capable of running that loop, because it wouldn't invoke glibc's dynamic linker to find where something is currently loaded into memory when you haven't loaded it into memory. How a cross compilation running ldd on a mips binary is supposed to tell you where that library is currently loaded into an x86-64 system, I couldn't tell you, but the FSF keeps making their binaries fatter and fatter. The one in uclibc never did this.

Back in 2020 a third article about the greying of Linux came out, but it's already fallen off the web and you have to fish it out of archive.org because "Linux has gone the way of Unix, maintained by crotchety greying grognards who scoff in all directions outside their insular little niche" isn't really NEWS anymore.


April 16, 2023

It's one of those days where I'm skittling along a giant dependency chain doing twenty minutes work on one thing and then going "but first I need to do X" and getting five things deep before I go "what can I actually FINISH AND CHECK IN RIGHT NOW".

The most recent email with the guy who needs me to update scripts/root/dynamic wound up with him being able to use the prebuilt musl toolchains I provided (he was saying Linux= instead of LINUX= but it's case sensitive), but I should still poke at the "dynamic" target because mkroot shouldn't REQUIRE musl, which led me to asking whether static linking in bionic is working yet (mkroot didn't used to be able to use it because of the "segfault with no stdin" bug hitting PID 1, which got fixed upstream but hadn't made it into the NDK yet, and there's a new NDK (r25c) so I downloaded that and extracted it and went "huh, creating the cc symlink I've been doing works but seems silly because there's no OTHER tools prefixed like that in that directory anymore, where did they go? There are a bunch prefixed with llvm- except there's no llvm-ld in there...") and so on down a rathole I've also parked and stepped away from because NOT RIGHT NOW.

But I tried to build with that in a fresh directory and defconfig of course barfed with bionic because it hasn't got the shadow password plumbing, except I redid lib/password.c and friends in my tree (the new one doesn't USE the shadow.h nonsense awkwardly bolted alongside the original user/group stuff by shadow-utils back in the 1990s) and I really need to test and check that whole rewrite in, except it's big and intrusive so copying it to a new fresh directory for proper testing required some investigation: "git diff lib" says that lib.h, password.c and pending.h (which I deleted as part of this work) are the changed files to marshal over, but the new password.c has three functions (get_salt, read_password, update_password) which grep says are used by: passwd.c, su.c, login.c, mkpasswd.c, chsh.c, groupadd.c, groupdel.c, sulogin.c, useradd.c, and userdel.c. And in my big dirty working tree the changed files are passwd.c, mkpasswd.c, chsh.c, groupadd.c, and groupdel.c, so that's what I should copy to the new tree and try to build.

Except to test this stuff I need a mkroot build (not letting it write to my /etc directory as root just yet, thanks), and I ALSO have a toybox directory where I'm moving mkroot out of scripts/ and into its own mkroot/ subdirectory (where I can give it its own README), and there are two edge cases that I'm not sure whether I should move: 1) mcm-buildall.sh and 2) record-commands.

Design-wise scripts/mcm-buildall.sh remains a rough edge because it populates the ccc/ directory at the top level, not under mkroot/. The problem is once again "lifetime rules" (you don't rebuild the toolchains every time you rebuild mkroot). So... does it stay in scripts/ or does it move to mkroot/ with mkroot.sh and test_mkroot.sh and the scripts/root directory? It's not really part of toybox, it's an important dependency for mkroot (CROSS= there is what expects the ccc/ directory), and mkroot is what has the plumbing to download external packages (via mkroot/root/plumbing) so it kind of _does_ need to be in there... But if it IS in there then the README is hard to write, because the logical sequence of scripts is then 1) cccbuild.sh, 2) mkroot.sh, 3) test_mkroot.sh. But 99% of the time, you don't RUN cccbuild.sh. Heck, most newbies will probably download binary toolchains because it's a pain.

The other thing is I want to rewrite mcm-buildall.sh so it doesn't use Rich's musl-cross-make repository anymore and is its own standalone cccbuild.sh instead, because Rich doesn't reliably maintain musl-cross-make (the last commit to it was just over a year ago), and it's really not helping much anyway. The Linux From Scratch partial build script I posted to the toybox list last month builds a gcc variant without jumping through that many hoops, and I'm leaning towards just doing my own build directly rather than working out how to feed configuration stuff through Rich's plumbing to the gcc build. I've already added a couple of my own patches to his that he won't take, and have a couple more queued up that I poked him about but he ignored. (That said, I believe he and his family are still touring Indonesia? I type this from Tokyo, can't throw glass houses at anybody, but I try to stay in touch. He's been insufficiently communicado for a while now.)

And then there's the whole "llvm toolchains" can of worms I need to reopen at some point, which musl-cross-make is no help at all about... I suppose the pending rewrite is a good excuse to leave the old one in scripts/ for now?

ANYWAY, I'm trying to write up the new README, starting from the ancient README back when it was a standalone project, and the FAQ entry (which is another thing I need to update before checking in the move; I should probably leave a symlink from scripts/mkroot.sh to ../mkroot/mkroot.sh in the tree).

Oh hey, today _is_ the every-fourth-day that they clean the room. When I asked the guy at the front desk what time I had to be out by, he said it was tomorrow.


April 15, 2023

And lo, I have my laptop available again (yay adapter), a quiet hotel room (APA is now only cleaning the rooms every 3 days so I can stay in it all day if I like), and rather a largeish todo backlog. Let's see:

Upgrade test suite so gentoo can run it.
  Request filesystem type, umount -l.
  ldd chroot https://github.com/landley/toybox/commit/e70126eabef8
Finish lspci -x fallout.
  Check compression? https://github.com/landley/toybox/issues/386
  http://lists.landley.net/pipermail/toybox-landley.net/2023-April/029520.html
Finish cgroup stat support.
  https://github.com/landley/toybox/issues/423
Yifan Hong's continuing tar weirdness:
  https://android-review.googlesource.com/c/2536710
Peter Maydell qemu Malta patch?
Tom Lisjac (and previous guy) want scripts/root/dynamic
  https://github.com/landley/toybox/issues/418
David Legault, fold tests. (Promote fold?)
  https://github.com/landley/toybox/issues/424
vmstat for zhmars
  https://github.com/landley/toybox/issues/422
sizeof(toybuf)
  https://en.cppreference.com/w/c/language/_Alignas
fix sh2eb mkroot build (toolchain and kernel)
gzip --rsyncable
  implement deflate, implement rsync...
Ongoing cleanup of mdev.c started on plane due to /sys/block poke.
  http://lists.landley.net/pipermail/toybox-landley.net/2023-April/029525.html
Finish the cp -s work so I can do install -T
Try to beat a multi-console thing out of mkroot+qemu to test oneit change
  http://lists.landley.net/pipermail/toybox-landley.net/2023-April/029531.html

Pretty sure I've missed multiple things there. Plus I _was_ planning on cutting a release before visiting Tokyo. And there's the Linux From Scratch automation script so I can go back down the aboriginal path of making a self-hosting toybox environment...


April 14, 2023

Ah right, there are no three prong outlets in Tokyo. And I brought a three prong laptop charger. That's inconvenient. My plan to program all morning until the sun came up (what with being waaaaay off this timezone in my sleep schedule) hit a bit of a snag there.

Met with Jeff in his office, unboxed, disassembled and reassembled the oscilloscope, talked about his battery project, went out to dinner with Mike and some of Mike's friends in Shibuya where we went to a chinese-run restaurant that allows smoking indoors, where I found out that after a few years of not trying to eat while breathing cigarette smoke I've lost my tolerance for it. (As in "mouthful of food and lungful of air combines to convince my brain I've got a mouthful of cigarette ash, and forcing myself to swallow triggers a nausea reaction that lasts all night." That was not fun.)


April 13, 2023

Air travel moved the clock forward 12 hours and more or less ate today. Went to bed at 8pm local time anyway, which was something like 5am relative to where I got up this(?) morning, after getting maybe an hour of sleep on the plane. (Sitting bolt upright. Horrible neck cramp.)

But at least I have delivered the giant oscilloscope box to Jeff, who dumped it in the office. Tomorrow I need to reclaim the giant pile of laundry and books and such I left in the apartment I couldn't get back to during the pandemic.


April 12, 2023

Onna plane. Got up at 5am to go to the airport. Flying from Minneapolis to Toronto (which is the wrong direction?) and then Toronto to Narita airport in Tokyo. Between the layover and the going the wrong way part, it's like 17 hours of travel before I even get to customs at the far end.

It's a lot easier to get programming done on a plane that ISN'T 100% full. Getting up at 5am after finally adjusting back to a day schedule doesn't help either. I had grand plans for the 14 hour uninterrupted block, but don't have the focus.

Forgot to eat this morning (caffeine yes, food no), was quite appreciative of the first meal on the international flight at like 1pm minneapolis time. That may be a contributing factor to the lack of focus...


April 11, 2023

Huh. Given the way adler32 works, if you're just looking for a run of zeroes at the bottom and it's 16 bits or less... you don't need the whole algorithm. It's just "add up the bytes modulo the largest 16 bit prime" (65521).
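(A sketch of that observation: the full checksum is (s2<<16)|s1 and s1 is always less than 65521, so checking up to 16 bottom bits never involves the s2 half at all.)

// Just the low half of adler32: enough to test "bottom N bits zero", N <= 16.
unsigned adler_low(unsigned char *data, long len)
{
  unsigned s1 = 1;  // adler32 convention: the additive half starts at 1
  long i;

  for (i = 0; i < len; i++) s1 = (s1 + data[i]) % 65521;

  return s1;
}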

That really seems unreliable? I mean... ok, fast. But "runs of zeroes" are legitimately a thing? If you compress all zeroes it's just gonna reset every minimum window size (4k)?

I still want to figure out how to do the rolling adler32 of the top part. I KNOW I worked this out before, my blog says I did it in 2001 and again in 2013 and it would be nice if I'd actually DONE it back then rather than restarting every 10 years.
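(For reference, the textbook rolling update from the rsync paper, adapted to adler32's modulus and leading 1; presumably some variant of this is what I keep rederiving. The +MOD terms just keep the unsigned subtractions from wrapping.)

#define MOD 65521

// Slide a W-byte window one byte forward: drop "out", append "in".
// s1/s2 are the usual adler32 halves, already reduced mod MOD.
void adler_roll(unsigned *s1, unsigned *s2, unsigned W, unsigned char out,
  unsigned char in)
{
  *s1 = (*s1 + MOD - out + in) % MOD;
  *s2 = (*s2 + MOD - (W*out)%MOD + *s1 + MOD - 1) % MOD;
}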

Of course today's interrupt is updating the filesystem type detection list, which is tricksy because the kernel isn't consistent. I already have one more small patch (basically a repeat of the v850 patch one) to send to lkml, but they'll just ignore it. (I need to reply to Andrew Morton, but "you guys no longer take obvious one line fixes" is hard to say POLITELY.)

[Editorial, April 15th as I'm fixing this up to post and replacing the [LINK] with an actual link... WOW Google search is imploding fast. Googling for "linux landley v850 patch" does not find that patch, nor does adding "elf" before patch. Adding "remove" before v850 finally found one copy of it in mail-archive.org, which is not the kernel's own lore.kernel.org/lkml nor is it the iu.edu one that's been there since 1995, nor did it find a copy in any of the archives in the vger list for linux-kernel. Google search is blind to all of those. I got the above link out of my preferred archive by checking the date on the post in the one copy Google DID eventually find after all those retries, and then going to lkml.iu.edu and manually navigating there from the top down. Remember when it was easier to google for stuff than bookmark things? Not anymore...]

To avoid preparing for my flight I've been stress baking, using up the half-finished ingredients in the fridge (of types Fade doesn't use) to produce food she'll eat. She tends to make big batches of what she calls "kibble" once a week and put it into individual plastic tubs, and then have the same thing for the majority of her lunches and dinners until it runs out. Generally "pasta or rice with stuff in it". I'm leaving her such a cheese pasta tomato casserole sort of thing, and a chicken rice dish, and a large pile of steamed green beans, and a meat pie.

The household's standard meat pie recipe (which I learned from Fade but I'm the one who always cooks it now) is cook and drain one pound of ground beef, add a can of condensed cream of mushroom soup (as-is), a can of sweet baby peas (drained), a significant amount (most of a pound?) of shredded cheese, a dozen or so shakes of Penzey's "california seasoned pepper", stir it all together and decant into pie crust, bake at 375 for half an hour. Her pie tins are smaller than the ones I use, so trimming the extra off the bottom pie crust leaves enough for stripes of pie crust across the top, which I can bridge with torn up cheddar slices to get two pies from one pair of pie crusts. (Which is good because premade rolls of pie crust are like $5 a box now.)


April 10, 2023

And Jeff got back to me about the Tokyo trip with less than 36 hours before the plane takes off, because of course he did. Ok, the long-delayed trip to target for 2 more pairs of pants and a new pair of shoes needs to happen tomorrow, because Tokyo hasn't got anything in gaijin sizes. (Hopefully I can get new glasses in Tokyo the same place I got glasses last time, they're much better than you get through Zenni. I think it was somewhere in a Tokyu Hands, but that doesn't narrow it down that much. They're sort of vertical shopping malls, and there's at least 3 of them we went to in Akihabara and Asakusa and possibly Shibuya?)

Sitting down to actually implement gzip --rsyncable, I'm hitting the problem that the USE I'm making of the zlib stuff is "pass off an fd and it returns when done", meaning my code doesn't get to read the data and partition it. I could do a wrapper that reads the data and passes it along, and probably will eventually, but that seems kinda silly?


April 9, 2023

Weekend. Hung out at my Sister's, saying hi to the niecephews.

The one of the four that maintains their original gender (despite whatever their father's new wife does to them every week that they refuse to talk about but are very unhappy about) got screwed over by his father in a DIFFERENT way, apparently if you've ever gone for mental health counseling even once, the navy's nuclear submarine training program will happily give you a "waiver" to get through boot camp (because they're SO not making their recruitment numbers), but will then kick you out right afterwards even if you come in 6th in your class (because you volunteered to drop a spot so somebody else who was coming in as an E1 could get promoted to E2 by being in the top 5).

Personally, I'm not a fan of career paths which you aren't allowed to quit that could order me into combat when we're not at war, especially when I know multiple people who wound up permanently disabled in military "incidents" that weren't even combat related. (Shinga got crippled in a training accident, and spent the next decade plus having to deal with VA underfunding. Remember my "apprentice" Nick from 10 years ago? Her dad got poisoned working near a burn pit in Iraq, degenerative neurological something or other, I got to watch him get worse every time I visited...) But I also didn't grow up hand-to-mouth poor. (I've done what I can to help, but it's intermittent and from far away. Kris could never move out of Minnesota without losing custody of the kids to their father's new wife...)

Honestly, put Jon Stewart in charge of it. (The six minute video in there counts as "nailing an interview" to me...)


April 7, 2023

We switched the household slack to Discord. Let's see how that goes. I have so many "send notes to self" entries in my DM-to-me slack channel, which was a scratchpad I could easily access on both my phone and laptop, and now I'm laboriously copying the to-laptop ones over by hand, having lost access to that scratchpad. Gotta do that before deleting the slack accounts and uninstalling the app. (Alas, the phone side doesn't have a selection option that can highlight more than one entry. On the web side I could mouse drag and scroll and grab multiple pages of stuff into a text file at once, but the android UI doesn't have anything similar. Press-and-hold to highlight a single entry. No obvious shift-click to highlight another entry without deselecting the first. And so, I laboriously type one entry at a time into the keyboard. I'm back to February...)

Doing gzip --rsyncable kinda implies doing rsync. According to the rsync wikipedia[citation needed] page the checksum in question is adler32, which seems simple enough, although I'm squinting at the modulus: I'm pretty sure that can happen at the END as long as the input length is less than 256? Sigh, wikipedia[citation needed] keeps saying "look at the zlib source code to see a more optimized version" rather than just saying WHAT THE MORE OPTIMIZED VERSION IS. This is a five line algorithm! Obviously it's moving the modulus to the end. Alright, let's do a for loop here to see where the overflow is... If the starting input is already ffffffff and you add ff each time you'll overflow a 32 bit counter after... 5552 entries. So page sized inputs are fine. I can add a comment.
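(So the "more optimized version" they won't spell out is presumably just deferring the modulus, something like this sketch; 5552 is the same constant zlib calls NMAX.)

// Both sums stay under 32 bits for at least 5552 input bytes, so take the
// expensive % 65521 once per chunk instead of once per byte.
unsigned adler32_chunked(unsigned char *data, long len)
{
  unsigned s1 = 1, s2 = 0;

  while (len) {
    long i, chunk = (len > 5552) ? 5552 : len;

    for (i = 0; i < chunk; i++) {
      s1 += data[i];
      s2 += s1;
    }
    s1 %= 65521;
    s2 %= 65521;
    data += chunk;
    len -= chunk;
  }

  return (s2<<16)|s1;
}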

Ok, so the theory here is you do a running checksum on the input, and when the bottom X bits are all zero, you reset the deflate stream. (When Hayase Nagotoro invented blockchain there was a lot less originality than I thought: the post proposing --rsyncable for debian came out YEARS earlier.) Since deflate is designed to work on concatenated archives, I don't even really need to communicate with the encoder, this is a "close and reopen, append results together" situation. Probably you want some minimum amount of input before checking the results, and maybe initialize the CRC to something other than 0 so a run of zeroes doesn't leave it zero? (Or does the "minimum amount of input test" cover that case?)

The next question is "how many bits of zeroes" and "what's the minimum block size", and the original paper isn't even using adler32 like rsync is so I don't want to take its answers? Unfortunately nobody seems to actually document what gzip --rsyncable is actually doing here, let alone how many bits it considers worth resetting for in its --rsyncable. I hate looking at gnu crap both for licensing reasons AND because it's always TRULY HORRIBLE CODE, but I'd like to be at least somewhat compatible? And in the absence of ANY sort of documentation, hold my nose and see what's publicly available on github's web view... Looks like they're using 4096. And they're using MODULUS on a POWER OF TWO to check for the zeroes.
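(Reconstructing from that description rather than quoting their code, the trigger works out to something like this sketch, with ring[] starting zeroed:)

#define RSYNC_WIN 4096

// Rolling sum of the last RSYNC_WIN bytes: returns 1 when the deflate
// stream should be flushed and restarted at this boundary.
int rsyncable_step(unsigned char c, unsigned *sum, unsigned char *ring,
  unsigned long *pos)
{
  *sum += c - ring[*pos % RSYNC_WIN];  // append new byte, drop oldest
  ring[*pos % RSYNC_WIN] = c;

  // modulus on a power of two: sum % 4096 is the same test as (sum & 4095)
  return ++*pos >= RSYNC_WIN && !(*sum % RSYNC_WIN);
}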

That's just sad. I need to step away from the keyboard for a long walk.


April 6, 2023

The toybox test suite is a bunch of shell scripts in tests/*.test (one for each command name) that get run by scripts/test.sh (which calls scripts/runtest.sh). The actual tests are shell functions that look like:

testing "name" "command line" "expected output" "file input" "stdin input"

I.E. each test has five arguments: 1) the name to print when running the test, 2) the command line to run for the test, 3) what the test is expected to produce on standard output, 4) what to write to a file named "input", and 5) the input to pipe into the command line's stdin.

There's some complexity: each test gets run in an empty directory (generated/testdir/testdir) with the $PATH set up so it's testing the right command(s). Arguments 3, 4, and 5 are run through echo -e to resolve escapes, and if there's a newline at the end you have to explicitly state it. (There's almost always \n on argument 3.) If argument 4 is an empty "" string then no input file is created (and "input" is deleted between tests) so it's not there messing up ls output and so on. If any command fails to produce the expected output, the script exits (unless the environment variable $VERBOSE contains the string "all" somewhere in it) so later tests don't even get run. But that's the basic idea of the test suite.

There's a few more corner cases, such as the checks that conditionally skip tests (shell functions like "toyonly" and "optional" and "skipnot", which set the $SKIP environment variable), and a whole second set of testing apparatus providing the "txpect" function (txpect NAME COMMAND [I/O/E/Xstring]...) that works like 'expect' listing a series of inputs to stdin and expected outputs (on both stdout and stderr) and eventually an expected exit value, ala:

PS1='$ ' txpect 'shell hello' 'bash --norc --noprofile -i' E$'$ ' I$'echo hello\n' O$'hello\n' E$'$ ' I$'exit 3\n' X3

And someday maybe I need to figure out how to hook that up to pty master/slave plumbing (or do they call it dom/sub now? Hey, that's a consensual relationship...) so I can query cursor position in a virtual screen (testing stuff like "top" in an automated fashion is REALLY nonobvious). But implementing "expect" in pure bash was hard enough...

Possibly the most complex part of all this, from my perspective, is that Android doesn't use my scripts/test.sh, it just uses scripts/runtest.sh. All the shell functions are defined in runtest.sh, but the test.sh script is the one "make tests" and "make test_sed" call to set up the generated/testing/testing directory and work out whether we're testing a single toybox command (calling scripts/single.sh to build it and install it into generated/testing if so), all the toybox commands (calling scripts/install.sh to put all of them into generated/testing so they're all in the $PATH at once), or running the tests against the host commands (which is testing the tests themselves, not testing toybox's command implementations: I haven't proved much if I pass tests I wrote but nothing ELSE passes those tests. Alas the host is a moving target and each time I upgrade devuan some tests that used to pass start failing because their output changed, there's some regex fuzzing I can do but it's a red queen's race...) I'm never entirely sure what will and won't break android's testing when I fiddle with my test plumbing, but Elliott pokes me when I do and I can fix it after the fact.

So anyway, I recently added "ls --sort" which needs tests, and first I was converting the existing ls.test from "testing" to "testcmd", which is another wrapper in scripts/runtest.sh that supplies the name of the command being tested so you don't need to start each command line string with the same command. (Not just to eliminate redundancy, but so it can force testing the _toybox_ command instead of shell builtins and alias trickery by providing absolute path to command as necessary. Otherwise testing "echo" under bash isn't testing toybox, it's testing bash.) I didn't want to change the base "testing" function to do that because sometimes you want to be explicit, ala "VAR=VALUE $COMMAND --blah" or "for i in a b c; do $COMMAND $i; done", and besides: switching them over requires editing the command string to remove the command name, which gives me an excuse to review tests using my standard lazy approach. (Same general idea as the college study advice that taking notes helps because when you write it down you remember it.) But converting entire test files from "testing" to "testcmd" is generally good because the result is shorter and less redundant, and often avoids wordwrapping. In THEORY a nice low-brain activity I can do when I'm not feeling up to much... (Except that when I do review, I tend to find stuff and go off on tangents. It's basically horror movie logic here, there WILL be something. But I'm getting ahead of myself...)

Another entry in the "there's some complexity" pile above is that 1) the name of the command being tested is prepended to the name of the test, so you don't have to repeat it each time, 2) if the first argument to testing is an empty string, then the second argument gets used as the name of the test. So if I say testing "" "-R" ".." ".." ".." it'll go "PASS: ls -R" in the output. (Or FAIL: or SKIP: depending on what happened. They're all the same number of characters so the output lines up either way. And when going to a tty, it's color-coded.)

I converted "testing" to always prepend the command name because I never want it to NOT do that (or at least couldn't think of any use cases), but when I combined left the first argument of testcmd blank I wound up with output like "PASS: ls ls -R" (or worse, PASS: ls /big/long/path/to/ls -R" because they were BOTH adding it, in a way that was nontrivial to untangle.

So that's where I went off on a tangent and parked the "add ls --sort tests" todo item last time. And this was AFTER my previous excursion into fixing the plumbing ls was using (so the tests actually ran in an EMPTY directory, so ls didn't keep having to duck into a subdirectory to avoid showing debris; this is the downside of me saying "I can always use more tests" when people ask me "how can I help", Divya meant well but left me with some technical debt to shovel out. The real problem is I'm a perfectionist acting like I'm making Faberge eggs, but if I'm not doing a BETTER job than what's already there why bother? I mean ok, "licensing", but that's not sufficient reason by itself...)

ANYWAY, with all THAT sorted, I converted the ls tests from "testing" to "testcmd" and now I'm looking at a few of them I noticed are kinda weird. The -N test was actually testing -q, which means back when it went in I didn't review it enough, even though I found one issue right off (which seems obvious: -q isn't the default and -N switches off -b but not -q, so you have to be able to switch -q on first to tell?). And now that I'm going back to try to PROPERLY test it (turns off -b but not -q, because gnu/dammit of course), I hit:

$ ls --show-control-chars $'hello \rworld'
'hello '$'\r''world'
$ ls --show-control-chars $'hello \rworld' | cat
world

What is this "shell escaping but only to tty" nonsense? Does THAT have a new command line option? I actively do not want to implement it, because it's STUPID. What is the POINT of doing SHELL ESCAPING ONLY TO TERMINAL OUTPUT? And if you're goint to do a $'' wrapper why not just have that be around the whole thing? Why have MORE THAN ONE QUOTE CONTEXT IN THE SAME OUTPUT? What is WRONG WITH THESE PEOPLE?

I miss the days where the gnu/dammit clowns failed to add -j support to tar for 5 years because the gnu development had completely stopped. Everybody just had a standard patch they added and it was all good. That was the period during which the gnu tools became actually popular in Linux, when they WEREN'T CONSTANTLY BREAKING NEW STUFF.


April 5, 2023

And I rebooted my laptop, losing all my open windows. Not because of any hardware or OS thing this time, but because I was working out test plumbing (to fix gentoo's inability to reliably run the toybox test suite when they build the package, by letting tests request which filesystem they run under), and I did "mount blah.img sub && cd sub && umount -l ." and SOMEHOW instead of unmounting the new loopback filesystem the debian host umount command unmounted my /home partition out from under every running desktop process. So that's nice.

That's a "press the power button and hold it down until it does the unclean override sudden power down" thing. THIS is why I like to test this sort of infrastructure in qemu instances. And of course thunderbird doesn't retain emails in the process of being composed the way kmail did, nor do my 8 gazillion terminal windows+tabs restore their state to show what I was in the middle of...

The design reason I was doing that is a test should be able to go "force_filesystem ext234" at the start and if "stat -fc%T ." says that's not what we're currently using then it should dd if=/dev/zero up an image (because loopback mount can't use sparse files so truncate -s won't work here, although it's transparent to an emulator so qemu can eat one just fine), mke2fs it, loopback mount it in a directory, cd into that directory, delete the loopback file and lazy umount the directory (so both the file and the mount point get freed when the last process using them exits; the mount pins the file's inode, and our test process has the mount pinned as its cwd, but no matter _how_ the test exits it can't leave the mount lying around on the host afterwards), and then the rest of the test proceeds as normal and then does a cd out of the directory as part of normal cleanup.

Except of course the gnu/dammit stat command says "ext2/ext3" instead of a proper driver name. Gotta add filters to the parsing because they got "clever" in an actively harmful way. Toddlers "helping" in the kitchen, minus the learning part.

And now that I've rebooted, chromium stopped working with slack, which is now a fullscreen "you can switch to a supported browser or you can install our data mining app but otherwise fuck you" page. I asked on the #devuan channel which says it's a #debian issue because chromium 90.0.4430.212 is the newest version in the "oldstable" category (which devuan beowulf lines up with), so now I'm asking on the #debian channel. There is a "backports" repository, but this package isn't in it.

I've heard that chromium is near-impossible to compile from source, so I'm not TOO surprised that debian can't get it to build in older environments. The standard Google problem of their code both having a zillion dependencies and being completely unportable to the point they care about the specific dot-release of each dependency. Sigh. (And hermetic builds are technically a move to be LESS accepting of variations in build environment. You deploy the one true build environment to target, because building in an emulator on a provided image is slow. I care very much about reproducibility from first principles. This does not appear to be a common viewpoint.)

But needing to do a major OS version update to regain access to my household slack on my laptop? Sigh. I might need to start caring about firefox again. (It's a "household slack" with Fade and Fuzzy, but half of what I use slack for is cut-and-paste of URLs to my phone, and running another OS in a vm to run chromium in there means I'd have to get cut and paste working in kvm, which... enough ratholes for one day, thanks.)


April 4, 2023

Had to go to the hospital to have a piece of glass professionally removed from my foot. Not my most productive day otherwise.


April 3, 2023

One of those "spent all day trying to get in the headspace to do productive work, and didn't" days. Cooking and cleaning and generally being Fade's housewife worked out ok. And I did actually invoice the middleman. (Yay!)

(Why does avoidance productivity either put me into DO ALL THE THINGS mode or else completely stop all work, with no middle ground? This wasn't even tax paperwork, it was "resubmit an invoice". Which yes I had a bad bureaucratic experience with 3 months ago, but seriously...)


April 2, 2023

The mkroot dynamic build (which there's a waiting user for) SEEMS simple, but the current script is using a "cc --print-search-dirs | xargs cp -a $TARGET" approach that winds up populating the target with over a dozen gigabytes of crap which will NEVER fit in a ramfs, and is big enough that repeatedly doing that build seems likely to noticeably shorten the life of my laptop's SSD. (And that's after I fixed the "it's copying symlinks" problem that cropped up in an OS version upgrade; before the fix it wasn't that big, but it didn't reliably work either.)

My first stab at cleaning that up was "copy everything to target then sort the hashes, use them to compare files and hardlink together what's identical". Which cuts the space in half but the result is still multiple gigabytes and doesn't reduce the disk thrashing at all (the files still get copied before being discarded).

Now I want to dig up my old "run ldd on each file on target to get a list of libraries actually in use and copy just those into the new chroot" approach that I had a bash script for even back in the busybox days, code which I recently removed from toybox in theory because mkroot superseded it, and in PRACTICE because I need ldd in the $PATH to make that work, and when I mentioned my desire to add that to toybox Elliott had kittens. (I still don't understand WHY. He doesn't have to enable it for android, but it's a thing I personally have an immediate use case for. Copy this and the library files it needs, and the library files THAT needs. Do it recursively but skip ones already present on target, which prevents endless loops. My first script to do that was in 2001, back when I first put together a tiny boot image with binaries harvested from the distro I was running. I want to say... Red Hat 6?)


April 1, 2023

Sigh, need to invoice the middleman. I can do it on monday. (It's not EXACTLY rejection sensitive dysphoria, "submitted paperwork that got bounced because weird politics, reluctant to do it again" is at least in PART me wondering "do they want to hold on to the money because their finances are dire enough that having it in their bank account reassures THEM, and if so will they actually pass it on once the arbitrary limits they invented are satisfied?" A bank limiting withdrawals to $3000/day is not a healthy bank. Their behavior is NOT A GOOD SIGN, and I'm reluctant to put pressure on the broken thing and see if it hurts... but I'm trying not to invent an unnecessary crisis here either, and "I last got paid in October and we just sent more money to the IRS than I've ever paid for a car" is ticking audibly...)

I suppose the flood of "april fool's" nonsense online shows that people are feeling better? It went away pretty much entirely during the Rump administration because geriatric fascists shouting "fake news" to discredit their opposition by loudly and repeatedly asserting that anything they didn't like simply couldn't be true (the previous nazis called this the "big lie") made anybody ELSE not scrupulously telling the truth and fact checking everything they could... kinda inadvisable. It's still really annoying, and seldom even slightly funny. Oh well.

Going through the github requests trying to find simple things to close, but I've just been overcomplicating stuff recently.

For example somebody requested the "shuf" command yesterday, which I added, but I spent a while arguing with myself over whether it should use random(), lrand48(), or getrandom(). In theory the randomest is getrandom() but each call consumes kernel entropy, which seems overkill for something like this? And it's _wasting_ entropy because it returns whole bytes and I then have to chop it down to just what I need, the common case of which is what, 500 entries?

Initializing a prng from a proper entropy source is a classic middle ground for a reason, and the _easy_ way to chop a randomness source down to a specific integer range is modulus, which will introduce bias unless the modulus is MUCH smaller than the range of the random number (so the uneven coverage of the last wraparound is statistically insignificant: a random byte % 100 produces each of 0-55 three ways but each of 56-99 only two, while % 10 is very close to uniform).

In the end I just went with srandom(millitime()) and then random()%count which is good enough. (And the trick to make it efficient is lines[ll] = lines[--TT.count] because if you don't care what order the not-yet-used entries are in, swapping the last one down into the hole you just left avoids the memmove() you'd do to close the hole while keeping them in order, or any sort of usage bitmap nonsense.)
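(A sketch of that loop, not shuf.c verbatim:)

#include <stdio.h>
#include <stdlib.h>

// Print lines[] in random order: pick a random survivor, print it, then
// drop the last unused entry into the hole instead of memmove()ing.
void shuffle_print(char **lines, long count)
{
  while (count) {
    long ll = random() % count;

    puts(lines[ll]);
    lines[ll] = lines[--count];
  }
}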


March 31, 2023

What?

FAIL: chmod 750 dir 640 file
echo -ne '' | chmod 750 dir 640 file &&
  ls -ld 640 dir file | cut -d' ' -f 1 | cut -d. -f 1
--- expected	2023-04-01 02:57:10.424197685 -0500
+++ actual	2023-04-01 02:57:10.428197685 -0500
@@ -1,3 +1,3 @@
--rwxr-x---
 drwxr-x---
 -rwxr-x---
+-rwxr-x---

Sigh, I broke ls with the new --sort stuff, because when I reused the -A and -d flags I didn't UNSET them from the base set, so ls -d no longer produces output in the same order. Oops.

Of course I left myself a todo about this: a pet peeve of mine is that --longopts without a corresponding short option are un-unixy, and the new short options I defined for the new --sort types that didn't already have any were -! and -? (except ? is a wildcard and CAN occasionally misbehave, a pet peeve of mine with "qemu-system-mips -M ?" to list available machines is you need to quote the ? if there's a single character file in your current directory; the reason the magic . and .. files don't count here is wildcards won't match hidden files unless the first character is an explicit period... Ahem, anyway maybe -~ would be a better option since tilde is only special to the shell as the _first_ character of an argument, and ~ means approximately anyway so case insensitive shouldn't be TOO hard to remember.)

Using punctuation like that means I'm MUCH less likely to conflict with existing or future gnu nonsense. (The cut -DF support STILL isn't upstream in coreutils, last I checked. I should poke them again...)

So I need to grab the extended argument parsing plumbing I added WAY WAY BACK while working on mkdosfs, which wanted -@ to set the offset and I added a whole mess of lib/args.c and scripts/mkflags.c plumbing to allow that. Which I checked in and tested and everything, but the only user of it is still out of tree in my local pending directory because I got distracted and still haven't finished mkfs.vfat. So, how does it work:

Take your ascii character value (@ is hex 40), set the high bit to turn it into "high ascii", turn that into a good old K&R C octal escape circa 1976, and include the octal escape in the option string: for -@ it's "\300". The FLAG macro you get is FLAG_X followed by two hex digits, in this case FLAG_X40 which means FLAG(X40) should work.
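(Worked example, with a hypothetical option string since mkdosfs still isn't checked in:)

// '@' is hex 40, setting the high bit gives hex C0 = octal 300, so the
// option string takes "-@ NUMBER" via:
//
//   NEWTOY(mkdosfs, "\300#", TOYFLAG_SBIN)
//
// and the generated macro tacks the hex digits onto FLAG_X:

if (FLAG(X40)) printf("-@ was on the command line\n");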


March 30, 2023

I didn't get a haircut before I left Austin, and Fade's suggestion is next to her office, which is a half-hour walk each way, but I could use the exercise. That and visiting the Tiny Target next door ate the whole morning.

Fade pointed me at a lovely little study room off the side of one of the light courts in the apartment, which would be perfect for recording tutorial videos. It's also a nice place to get away from Endlessly Barking Dog.

I want to write up a quick mkroot explanation for the qemu guys (who are testing with my mips and mipsel mkroot images, yay!) but alas, it's not quick and simple. It SHOULD BE, but I'm having Pascal's Apology again. Which is why I need to record a video tutorial for this. (Hearing it out loud helps me get a written version to be concise and intelligible too.)

Whole bunch of work, that.


March 29, 2023

Recovery from travel.

A QEMU thread has me rebuilding all the qemu targets again, which is a bit of a time sink.

It's kind of hilarious that Ubuntu doubling down on Snap and SuSE doing a flatpak distro came out the same day. Snap is Ubuntu's proprietary version of flatpak the same way Ubuntu had the upstart init system, unity desktop, mir 3d compositor thingy... Ubuntu is run by a white male billionaire from south africa, not a whole lot of "listening" or "following" going on there. It's a real pity that their move to replace /bin/sh with the Defective Annoying SHell was swallowed by Debian, but Debian also switched to systemd which was approximately as stupid. Being at the mercy of a billionaire's whims can't be comfortable. (Debian has an unfortunate history of FSF-adjacency, which means its development got so flamewar constipated the project almost died with many years between "debian stale" releases, and Canonical hired at least one full-time developer to shovel out the mess on the engineering side of _debian_ (not ubuntu) because the open source project he'd overlaid his proprietary project on going under would have been embarrassing. This was also the period where fleeing Debian developers squashed Gentoo, which that distro never really recovered from...)


March 28, 2023

Travel day. Onna Airplane.

I'm carting two pieces of checked luggage, the first of which is a suitcase inside a suitcase so I can fill one with Japan Loot for the return trip. I've probably missed Milk Seafood Ramen season (to Fuzzy's great disappointment), but I also left a bunch of clothes and books and stuff in the apartment when I left (meaning to return) over the pandemic, and it got packed away into storage and I should reclaim it.

The second piece of luggage is the GIANT CARDBOARD BOX with the oscilloscope Jeff sent me in the middle of the pandemic. Apparently the stuff they made 50 years ago is way better than the stuff you can get today, because nobody makes analogue waveform storage anymore and the digital equivalents are hundreds of thousands of dollars IF you can find something sufficiently high resolution. So when one comes available at a good price (usually because the people who knew how to operate it retired or died, and the inheritors don't value something they can't use), he snatches them up. This sort of thing can record signals for DDR3 and USB3 busses. We've done LPDDR2 and USB2 already because the cheap digital stuff can keep up with that, but anything faster gets expensive rapidly to see what's actually going across the wire.

Analog storage is the same general idea as a mercury delay line: giant capacitor that reproduces the wiggles of the input in its output, then you can loop it back on itself to retain the signal for a while. It is INSANELY high accuracy, calibrated with NASA-style equipment that sadly doesn't exist anymore. The downside of the analog stuff is A) the stored signal only lasts a few minutes, B) there's a hard cap on the SIZE of the capture because the delay between input and output is fixed and you can't record more than that at a time.

(When Scotty stored himself in a transporter pattern buffer for decades, the technobabble description was a bit like this. And in the Dr. Who episode Timelash, the 6th doctor used a McGuffin based on this principle to hit the bad guy with his own zap gun. Modulo "no, it's still totally murder when you're counting down like that while pointing the output at him, you could have just turned it to face the wall; there might be a limit to self defense when the dude is literally begging", but that's the kind of writing Colin Baker suffered under. At least he never had to deal with Yellow Kangs or the Kandy Man.)

Anyway, giant box under my desk in the bedroom became giant box in bedroom closet which is now giant box in Fade's apartment, which I hope to convert to giant box somewhere in japan that is no longer my problem. MAILING it to japan would cost hundreds of dollars (more than Jeff paid for it, that's why the seller only offered US shipping, not even to Canada), but it's just under the weight cap for checked luggage, and they don't charge extra for it being _bulky_. (The weight limit is a health-and-safety thing, maximum weight workers can be expected to individually lift between conveyor belts lots of times per day. Anything heavier than that requires two people to lift for liability reasons, and thus special labeling and handling procedures, and generally gums up the works trying to load and unload the plane quickly.)

No idea when Jeff plans to fly me to Japan, but hanging out with Fade until then. Disposing of Giant Box is a nonzero portion of the reason I agreed to the Japan trip. (I also really LIKE tokyo, and it would be nice if the stuff I worked on for years actually got launched out into the world, although toybox comes first these days thanks to Google.)


March 27, 2023

Bunch of errands today. Four bus rides.

I've meant to switch credit unions for years now, and as long as I was going to be down at UT after 9AM anyway... I'm seldom still there when I walk because I head back around sunup or I get all hot and sweaty, plus I can't see my phone display in full sunlight. And then I needed to go from UT up to The Domain to close the old credit union account, because that's the closest remaining Amplify location. University of Texas credit union has 2 locations and 2 ATMs within a half hour walk of my house, and two more at the university from a bus that picks up within sight of my driveway. The closest remaining "Amplify used to be the IBM Texas Employees Federal Credit Union but renamed itself" location is eight miles away, a two bus minimum each way or something like a $40 round trip on lyft WITHOUT surge pricing.

Fresh full backup of my laptop to USB drive. This SSD is old enough I'm occasionally checking dmesg to see if it's started to get unhappy about stuff. (Shouldn't, but I can be hard on things...)

Huh, corner case in the toybox test suite. So the general theory of toybox tests is a file full of testing 'name' 'cmdline' 'result' 'infile' 'stdin' lines (each is a call to a bash function) where the first argument's the name of the test to print on the PASS: line, the second argument's what to run, the third is the stdout output to expect, the fourth is data to write into a file named "input" (which only gets created when that's not blank), and the last is what to feed into the command's stdin.

Three complications to this: 1) The 'name' has the name of the command being tested automatically prepended to it so you don't have to repeat it each time, 2) there's a wrapper function testcmd which inserts the name of the command we're testing into the start of the 'cmdline' argument so we don't have to repeat it (and it makes sure we call it out of $PATH instead of a bash builtin by providing the absolute path when necessary), and 3) if the 'name' argument is blank it uses 'cmdline'.

The problem is that if you leave 'name' blank in testcmd it prepends the command name TWICE. Once when testing() prints the PASS/FAIL/SKIP line, and once in the testcmd() wrapper.


March 26, 2023

Got tired of waiting for Jeff to actually schedule a trip, and got a plane ticket to visit Fade up in minneapolis. (If I'm flying to tokyo, it should be from there.)

This means I have SO MUCH TO DO before then. Laundry! Fresh full backup of my laptop! Toybox todo items I should flush up to github... And it means I should NOT walk to the table tonight, because then I won't get anything done during daylight hours tomorrow because sleep schedule. Alas...

Fuzzy's birthday was on the 20th and we ordered her an Oculus 2 so she could play beatsaber, and it does not work. So we're returning it, which means I need to drop the return box off at the Amazon lockers in Gregory Gym (the building with two first names), and I thought that was my excuse to do my 4 mile nightly walk watching anime on my phone despite the earlier "I shouldn't do that for schedule reasons"... but the building doesn't open until 9am. (I'm currently on a night schedule. The flight tuesday's noon-ish. Gotta impedance match between now and then.)

If I'm planning to be at the university during daylight hours I should get a new credit union account at the UT credit union on Guadalupe. Which means I should also close down my old Amplify account (which used to be IBM Texas Employees Federal Credit Union before they moved entirely out of Austin up into the northern suburbs. The closest location left is in <snootiness>The Domain</snootiness>, which is 8 miles from my house, an hour away by bus or bicycle. All their closer locations closed years ago.)

I deleted the Google Maps app off my phone screen back when it turned into all advertising all the time and stopped showing me black-owned businesses (such as the haircut place I regularly go to in Hancock Center) even when I zoom in all the way, but sometimes I still need to see how far it is from point A to point B and what bus to take (and/or when things open, which it's never been quite right about since the pandemic), and when I do that I'm using the web version on my phone. Here's the SERIES of bugs I just hit in Google Maps' web version: enter the two addresses, hit the arrow on the keyboard to actually search and... it doesn't do anything. Plus it's scrolled itself to the right in a way that won't let me scroll back left so I can see the start of what's written on the page. And when I rotate it from landscape to portrait mode in hopes it resets itself... it loses track of the addresses I entered to ask directions about. It loses track of the location I was looking at, and instead resets itself all the way back to zoomed out full city view. That part's trivially reproducible, does it every time. Ask directions, type in the first address, rotate the phone, and the page undergoes a hard reset losing all context. Bravo Google. Your own browser in YOUR PHONE can't handle your website. That's... *chef's kiss*.

Anyway, from UT to The Domain is one bus (the 803). Yay. I should do that. (I don't want to give Patreon and such the banking info for the household account. I'm still paranoid about combining "money" with "internet".)


March 25, 2023

I got the ls --sort stuff checked in but not properly tested. Confirmed it didn't cause any obvious regressions in the test suite, but then got distracted by the whole Microsoft Github clusterfsckery trying to check it in. Had to delete the man-in-the-middle key four times before it stopped complaining. (IPv6 is not fun.)

Hmmm, tests/ls.test is ugly. Each test is bracketed with "cd lstest && $TEST && cd .." because otherwise the "expected" and "actual" files wind up in the current directory listing, and hence the output of most tests. The first is the output the test is expected to generate (argument 3 to testing()) and the second is the file output is currently redirected to; they're files so we can diff them and naturally get useful labels on the results. There's a fourth file, "input", but these days that's only created when testing() argument 4 isn't blank.

I suppose I could move them up a directory level? Because the action's taking place in generated/testdir/testdir, with the first "testdir" being where temporary binaries we're testing live. Since none of them are called "expected" and "actual" it shouldn't conflict if I use it as a work directory. (Modulo whatever Android's doing to use this test infrastructure, I THINK it should be ok? They use my scripts/runtest.sh but not my scripts/testing.sh which sets this up... Sigh, I should poke Elliott, shouldn't I?)

Walked to the bat bridge instead of UT, 25k steps total instead of just 10k, but my back was killing me when I sat down on the couch in Jester Center and I didn't get anything done. (They're kind of terrible faux leather couches on the second floor, mostly there for show I think, and my lower back's been unhappy since I slept on it wrong a few days ago, like a crick in the neck but older and more decrepit. I _really_ don't want this to become chronic because it wouldn't just suck, it would be CLICHE. The difference between being 15 and being 50 is problems resolving in about 8 minutes vs problems resolving in about 8 days. Lots easier to fall behind on cumulative wear when it's not clearing itself nearly as fast as it used to.)


March 24, 2023

Finally dug up an old-style micro-USB cable that WASN'T a charger cable but actually did data, so I can see the serial output on the turtle board. It works fine once I got a cable, but the linux-kernel I built and released last time does not work at all. (No output to serial once the bootloader hands off to it.) The one the sdcard had on it was linux-5.10 (dunno if I tested something newer since, that's just the reference version I know works), so there's some bisecting to do.

Huh, the musl-cross-make toolchain rebuild I did with gcc 11.3 earlier this month didn't build the sh2eb cross compiler because libgcc/unwind-pe.h had an error: '_Unwind_gnu_Find_got' was not declared in this scope which... I mean clearly it's a gcc bug, but what exactly broke? (It built sh4. Is this a nommu thing?) How do I track that down... What I _want_ to do is bisect it in the git repository, which is tricksy. It's slow to build gcc at the best of times, and mcm with my wrapper script doesn't do partial compiles.

I'm kinda tempted to compare the Linux From Scratch chapter 5+6 build script with musl-cross-make and just do a toolchain build script. If I have to fish out my own patches to make the build work _anyway_... I did that in aboriginal linux, this time it should probably be a proper project all on its own.

That's already 2 nested tangents from what I'm TRYING to do.


March 23, 2023

Got the LFS chapter 5+6 script building to the end. No idea if the result's actually useful yet, haven't done the chroot and started the second script. For some reason following the current LFS instructions, half the new commands _aren't_ in the /tools directory? They're in the normal paths. What's the point of the airlock step if you do that? I has a CONFUSED...

Alas, my initial naive attempts to run record-commands to get a log of the host commands called for this build script... did not work. I need to update scripts/record-commands until it works right out of the box even when I haven't looked at it in 6 months and don't remember how I'm "supposed" to use it. (For one thing, it calls scripts/single.sh to build the log wrapper. It should check if "toybox" is already in the $PATH and symlink logwrapper to that if so, and only try to build it if it can't. Otherwise, you can't use it from anywhere OTHER than the toybox directory...)

I also have a github bug report from somebody who did scripts/mkroot.sh and then couldn't "ping" anything because glibc is crap at static linking. Um, yeah. That's why I added a "dynamic" script, but I've updated devuan since the last time I poked at that and now it's copying a bunch of symlinks into the target, including absolute paths outside the chroot. Unfortunately, when I add a -L to the cp -a the result is 1.7 gigabytes of usr/lib space because glibc is an insane pig, so I need to hardlink them back together to get the size down to a dull roar.


March 22, 2023

Back at the table again (I've missed this), putting together a Linux From Scratch 11.3 build script, so I can do the old trick of substituting in toybox commands one at a time and comparing the output to make sure nothing changed. (I should probably diff the config.log as well. To get consistent results I should do single processor builds, but I'm having the script make -j $(nproc) and then I can just "taskset 1" to force that single threaded later.)

Jeff thinks he might wind up flying me to tokyo on monday, but the hard part is working out hotels. It's cherry blossom viewing season there, which coincided with spring break in the states, and it's the first time in 3 years Japan's been open for tourists. The hotel room shortage has not eased up at all yet. Still a big staff shortage. They've announced plans to allow more foreign workers, but it apparently hasn't manifested results yet...


March 21, 2023

At the table, with a can of checkerboard tea. It's been a while. (Ok, I'm at one of the tables NEXT to the original one, working on battery because the outlet's blocked off, and ignoring the construction fencing. But still: same porch, same lighting, same comfortable seating.)

Poking at dd.c because I had the tab open, and... ok, that's a kind of painful use of TAGGED_ARRAY. There's nothing BUT the strings and the position indicator for the strings. This makes me sad. There's gotta be a better way to do that. I'm not sure what that better way IS, but this is ugly...

And now distracted by the half finished ls --sort plumbing, which I have now finished and the result compiled and failed the very first test in "make test_ls". Great.


March 20, 2023

I have done something to my back while sleeping. It's like a crick in my neck, except lower back, and I'm on something like day 3 of this. Reeeeally hoping it doesn't go chronic.

There's an i2c bug report on github that's been... badly explained repeatedly. I think the submitter doesn't have english as their first language, and I have no i2c domain expertise, nor do I have a test environment, which is why I haven't done the normal level of cleanup on this command, which ALSO means I haven't done as much review.

Because writing code is easier than reading code, I tend to rewrite as I go to utilize my far-more-practiced writing code muscles to help with the reading. Yes I know it's a bad habit, and sometimes I throw away the result because it's just marking stuff up in red pen, but that seems a waste with toybox? If I'm gonna clean up the code and think the result is an improvement, I want to check it in, but I can't test for stupid thinko/typo regressions if I don't have a test environment, and ANY change can theoretically introduce a regression. I've borked semicolons or bracket nesting levels in code refactoring before (back in my tinycc fork), and the result compiled but subtly misbehaved. Gotta test. CAN'T test. It's a problem. I've USED i2c tools on various boards over the years, but it was all at contracts where I left the hardware behind with the job. My laptop hasn't got it. I don't THINK the turtle board does either but when I just tried to boot it up I didn't get serial console... it uses the old pre-USB-C cables and I think this one might just be a charger cable, not data? (Why do they DO that?)

I'm poking at qemu to see if that has a good test environment for i2c somewhere, but none of the ones I built did because the kernel hasn't got CONFIG_I2C enabled, and when I switched that on (and CONFIG_I2C_CHARDEV because that's not enabled by the first thing for some reason), then there's DRIVERS: I2C_SCMI, I2C_CBUS_GPIO, I2C_GPIO, I2C_OCORES, I2C_PCA_PLATFORM, I2C_SIMTEC, I2C_XILINX, I2C_MLXCPLD, I2C_VIRTIO... Plus whatever I2C_HELPER_AUTO is for... protocols? I suppose I could just switch it ALL on and see if any of the QEMU board emulations bind to something? Whatever I come up with should probably be added to scripts/root/tests so I can build regression test systems that do this automatically, but first I need to make it work _once_.


March 19, 2023

You can sing "closing tabs" to "closing time".

Trying to collect old superh patches for Glaubitz (the new arch/sh maintainer in Linux), but... there's so much old debris here and I have no idea what's still relevant. I collected lots of groups of 4 or 5 patches at a time and sent them to Rich when he was nominal maintainer, most of which never got applied, but I didn't exactly archive them again afterwards. (Checking back email in my sent box is one of the avenues of investigation here...)

Hah, scp-ing my blog file and corresponding rss file up to the website takes LESS THAN A SECOND with the new router. The old one took long enough I usually tabbed away and came back.

I keep meaning to find a way to post these blog entries to mastodon so people can reply there. This thing is a text file I edit with vi and periodically rsync, with a python script that generates an rss feed based on the lines that start each entry being regular enough (mostly thanks to cut and paste) that the text parsing to chop stuff out and plonk it into wrappers is pretty simple. But there's no WAY I'm turning that into an activitypub feed any time soon.

Mastodon can provide an rss feed, but won't let you FOLLOW an rss feed. Or easily convert an rss feed into mastodon posts at some known @user@server account. (If you google for it there's dozens of weird little projects on github or websites to do-it-as-a-service that seek to address this, but no real winner emerges, and Google's search ranking to indicate which ones to look at first has deteriorated into uselessness over the past few months. My wife regularly complains about google becoming useless and she's not a techie.)


March 18, 2023

Blah, I need network block device tests, which is fiddly both because it's a client/server thing requiring root access AND kernel support for the /dev nodes, and because the server and the client test against each other: "make test_nbd_client" would build just the client and then try to grab the server out of the $PATH, which most likely isn't there. As with the tar --xform stuff needing toybox sed, the test is looking at a _combination_ of toybox commands, which... the current test suite isn't really set up to do. (Well, "make tests" that tests ALL of toybox at once can, but not in a more granular fashion.)

I can have the nbd-client test check that nbd-server is there and fail to run if it isn't, but... the tests are mostly the same on both sides? Sigh, what are the tests:

  1. nbd-client can mount nbd-server device on loopback, read a 4k block from it, write a 4k block to it, flush, and exit.
  2. nbd-server -r: can export a read-only file, client can mount it read only and read from it, client can't mount it read/write. (Or does it fall back to read-only like iso9660?)
  3. nbd-client without -b does default to 4k, and -b 1024 is a different block size. (Different ext2 filesystem mounts care about underlying block size for reasons I'm not entirely clear on, but it gives a failure case to check).

Hmmm, so far most (all?) of the toybox servers are inetd style. I should probably find some way to indicate that in the help text? Ok, sntp isn't because that's a UDP protocol, and figuring out when a UDP transaction is _finished_ is AI-complete. By which I mean "C3P0 could do it, but I wouldn't trust chatgpt near it". There's one of those P=NP things going on with this AI nonsense, where closing the gap is likely to take multiple lifetimes if it can be done.

Possibly I need more lib/net.c code to do a server wrap thing that takes a callback function? Except then my httpd and nbd_server need more command line arguments to indicate the server and port to bind to, which is a UI issue. Hmmm, I need to revisit httpd anyway to add the rest of cgi support. And nbd_server already says "ala inetd" which is funky since I don't have an inetd in toybox.
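
Something like this, maybe (a minimal sketch with hypothetical names, assuming plain blocking TCP and fork-per-connection; the real thing would also need nommu/vfork handling and zombie reaping):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// hypothetical lib/net.c helper: accept loop that forks a child per
// connection with stdin/stdout pointing at the socket, inetd style
void server_loop(int port, void (*handle)(void))
{
  struct sockaddr_in sa = {.sin_family = AF_INET, .sin_port = htons(port)};
  int one = 1, fd = socket(AF_INET, SOCK_STREAM, 0), cfd;

  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
  bind(fd, (void *)&sa, sizeof(sa));
  listen(fd, 16);
  for (;;) {
    if ((cfd = accept(fd, 0, 0)) < 0) continue;
    if (!fork()) {
      dup2(cfd, 0);
      dup2(cfd, 1);
      close(cfd);
      close(fd);
      handle();  // callback reads stdin/writes stdout like under inetd
      _exit(0);
    }
    close(cfd);  // parent keeps listening
  }
}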

Long ago the samsung guys contributed tcpsvd to pending, which doesn't share any code with netcat. It does do a number of things netcat doesn't: limits on simultaneous connections, sets a bunch of environment variables... it also doesn't support nommu (which netcat server mode does), and combining vfork() with the -h option to look up remote connections (which can take an arbitrarily long time) does NOT sound like fun. Um, wouldn't the -b N thing be rendered irrelevant by kernel syncookie support? It's been YEARS since I've looked at that, where does -b get used... no FLAG_b and it's not TT.b it's... Sigh, count the arguments: TT.bn. Am I going to have to clean this thing up just to properly EVALUATE it? Grumble grumble... I really dislike duplicate infrastructure, but at the same time netcat doesn't track multiple children. Plus this one hasn't got the "cat" part, it's always setting up filehandles and leaving the reading and writing of them to a child process.

Hmmm... I suppose I could clean it up and potentially merge them _later_? They have a hand-rolled hash table implementation. It's doing an error_exit() on recvfrom() errors. Is there a UDP packet you can send that DOSes the server? (I remember TCP out of band data, but not UDP?) Why is it using sigemptyset/sigsuspend instead of just pause()? Does tcpsvd MEAN to write a trailing nul byte on the message part of -C COUNT:MESSAGE or is this an accident? (What do other implementations do? Is there a spec? Sigh, break down and look at what busybox does: no they do not have a trailing NUL byte, and they use nonblocking send() instead of write(), which seems kind of important. Although I could probably fcntl(F_GETFL/F_SETFL) to set O_NONBLOCK, but why when send() exists? I could also check MTU length vs the message, but again... simple thing.)
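
(For reference, the two approaches look something like this; just a sketch, not the actual busybox or toybox code:)

#include <fcntl.h>
#include <sys/socket.h>

// per-call: only this send() returns -1/EAGAIN instead of blocking
void send_once(int fd, char *msg, int len)
{
  send(fd, msg, len, MSG_DONTWAIT);
}

// per-fd: O_NONBLOCK makes every later read()/write() on fd nonblocking
void set_nonblock(int fd)
{
  fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
}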

Oh this looks like a long cleanup. And learning domain expertise. Why am I opening another can of worms when I'm trying to CLOSE TABS again?


March 17, 2023

Got the new toolchains built with gcc 11.2, the patch worked and I should poke dalias about merging it into musl-cross-make. (It's a backport, this should not be controversial to upstream? But then I felt that about the kernel patches. Oh well, it's on the #musl backscroll, maybe he'll notice...)

Built scripts/mkroot.sh CROSS=allnonstop LINUX=~/linux/github followed by scripts/test_mkroot.sh and everything except sh4 and the "no kernel" targets (armv4l armv7m microblaze mips64) passed, and the sh4 problem is the qemu+kernel clock issue (that emulated board isn't getting a battery backed up clock, and I ran the test without my laptop connected to the net so it can't set the clock from NTP).

So the new toolchain's working as well as the old I guess? More warnings, such as 'sprintf' argument 4 may overlap destination object 'ifs' in sh.c which... Ok, I can see an "even more optimized" version getting that wrong and I should maybe switch that to memmove() but first I should refresh my "what data is in which variable" mental working state which implies I should have a LOT more comments here (and possibly rename some variables) but reading through this code I did a couple quick simplifications but NO I have like FIVE DIRTY VERSIONS OF THIS FILE to collate already (I was working on this at Fade's last month, where was that... I _just_ dirtied the toybox/toybox file which had previously been clean, where's the recent... not in clean, not in kleen... it's in kl2). Ahem: NOT NOW...

Yay, somebody who seems to know this i2c stuff finally piped up on the confusing bug there. I still haven't got a test environment, and "get a raspberry pi working" is not ideal there. (I've been meaning to do that forever: their bootloader needs horrible proprietary blobs to bring the system up, the hdmi+keyboard setup in front of the TV is awkward and the connections buried, and I haven't got a hdmi monitor for the desk in the bedroom (been trying to get out to Discount Electronics to buy one for months but they moved 5 miles further away, up near where Fry's used to be), and the only non-broken pi case I have is in use on my turtle board.) Sigh, I should sit down and do it anyway. So many tangents.

Speaking of tangents, the recent "cpio -i extra garbage arguments" thing really SHOULD have them be extract filters, and opening cpio I see I have that "cpio skip NUL" test still not passing on the host, and a TODO about hardlink support since that's what the TRAILER!!! entry actually flushes (the cached hardlink detection), which means I really should try to get the other mother to sew buttons onto the hardlinks in a test directory to confirm what the output looks like, and then confirm the kernel's consuming it the same way AND poke the people who were talking about adding xattr support to initramfs...

Sigh. Pull a thread in this jenga tower...

One of my early posts to mastodon was reminiscing about how the INTENDED use of a Tardis in Dr. Who seems to be for very long-lived Time Lords to bog off to deep space or some deserted beach for a few years while they catch up on STUFF, and then return 5 minutes after they left, actually caught up on all their reading and browser tabs and todo lists, without the society around them moving on, so they haven't missed anything or accumulated new todo items while they were gone. And the Doctor got in trouble because using one to go to a planet and interact with people was Doing It Wrong.

Yeah, 500 years between regenerations (both because the second doctor said he was about 450 years old and the eleventh lasted about that long in his little exile town), the ability to pause the world for a decade at a time in a nice quiet workshop area with kitchens and libraries and swimming pools and long corridors to walk down... I can definitely see the appeal.


March 16, 2023

The coreutils guys have got their knickers in a twist about new gcc releases breaking the builds of existing packages again, and rather than go "our code didn't change, yours did, this is your bug", they're capitulating because gnu. And providing horrible emacs examples.

Anyway, I should probably try newer gcc so I'm at least not surprised and can have -fno-stupid-thing workarounds prepared for fresh compiler bugs from C++ loons? The current musl-cross-make git version has gcc 11.2.0 as its newest toolchain... And it broke. The new version can't even do a canadian cross:

from ../../../../../src_gcc/libstdc++-v3/src/c++17/floating_to_chars.cc:31:
build/i686-linux-musl/i686-linux-musl/obj_gcc/i686-linux-musl/libstdc++-v3/include/fenv.h:58:11: error: 'fenv_t' has not been declared in '::'

The line in question is "using ::fenv_t;" which can't possibly be a good idea.

The fix is to tell the libstdc++ build not to include the standard C++ headers in its search path. No really! Adds a compile flag. (And according to heat on the #musl irc channel, that's what got merged upstream.)

No wonder each new release breaks. It failed to build itself with itself, and THIS SHIPPED.


March 15, 2023

Weekly call with the J-core engineering team. Still no word about actually going to Tokyo. The tourists are back, it's sakura season. You'd think the overflow of hotel rooms from the olympics would mean they aren't all full, but having plenty of ROOMS does not mean having plenty of STAFF to service those rooms, and everybody got laid off during the pandemic. Japan does not have extra people in general these days (under the age of 60, anyway), and the covid restrictions allow tourists back but not yet foreign workers to run cash registers. It's apparently a problem, but there are worse problems. (The "nobody has any money, everything's going out of business" problem has at least been arrested by the return of the hordes of tourists. Although a lot of individual shops didn't survive.)

The new router arrived, and we eventually got it set up without installing any apps. It is SO much faster than the little white circle from Google (and the signal strength bar is green rather than yellow when my laptop's on the desk in the bedroom), although we haven't gone all "office space printer" and smashed the google circle with a hammer yet because we're giving it a few days.

The fiber connection itself is actually quite nice: the router SUCKED. The _service_ is mixed: why can't we get a static IP for less than twice what we're paying for the connection now? It's LITERALLY THE SAME SERVICE with a trivial config tweak. Wasn't the whole point of IPv6 that even if you can't get a stable IPv4, everyone everywhere could have a stable IPv6? But no, they want to capitalism at us.


March 14, 2023

Downloaded a fresh LFS book, and the magic all-in-one source tarball which should probably be better documented, and I should automate another build and then try to get mkroot to do it. I can insert a toybox dir at the start of the $PATH and switch over commands one by one, just like I did with busybox back in the day.

Alas I'm not feeling inspired, because I have too many open tabs. Closing tabs tends to be hard because they're all only still open if I didn't get them closed last time I sat down at it. But starting anything NEW just makes it worse. And it's deep into "if I work on anything specific I'm not doing anything ELSE" territory. Generally a sign I'm still undervolt. (The cedar pollen is not helping.)


March 13, 2023

Still not feeling great, but I should do stuff.

I reached the point of editing and uploading blog posts where the entire entry for Feb 22 is "Oh god, kernel people." I know exactly what that's about but... really don't WANT to expand it? For the same reason I stopped replying to the kernel threads. Can I just use old kernels? I want them to stop breaking stuff that USED to work.


March 12, 2023

Sore throat, couldn't sleep. Spent most of the day huddled on the couch.

Tried to watch the "campfire cooking" isekai with Fuzzy, which Did Not Work because of the stupid Google router continuing to die. (I tried associating my phone with Google's router to save bandwidth for like five minutes when I got back, and then undid it again because even when T-mobile is throttling me for going over my 50 gigabyte monthly quota it's STILL WAY FASTER THAN THAT STUPID ROUTER.)

This was finally enough for us to break down and get a new router. (It was $50 cheaper to overnight a netgear from Amazon than to buy the exact same router at the Best Buy a fifteen minute walk from here.) So far it looks like it needs an app installed on somebody's phone to set it up (the card in the box says what app to install, or gives a URL to talk to a support being; no other instructions), so we haven't actually swapped it in yet, but there SHOULD be a way to talk to it directly...


March 11, 2023

Sore throat. Kind of lurgy-ish. Trying to figure out if this is allergies or dryness or microorganisms. Possibly it's a team effort. And, of course, I'm old.

Jeff got his contract signed, which means I may be heading back to Tokyo to help him organize the giant archive of stuff we did so it can get spliced together into a new product. Historically speaking, I can do toybox stuff from tokyo MORE easily than from Austin (he hates Apa hotel rooms, I find them just about my platonic ideal of a work environment, with a conbini downstairs for lunch rice balls), so...


March 10, 2023

We rebooted the Google Fiber router yesterday because it had become unusable again. Today it's already bad enough that reloading the household slack tab (after a "pkill -f renderer" because chrome was taking up too much memory again) did the ?cdn_fallback=1 thing, then added ?force_cold_boot=1 for the third attempt, and then timed out saying it couldn't contact slack.

I don't mind google.com taking 7 seconds to load nearly as much as I mind being completely unable to use some sites, or thunderbird pausing for ~3 seconds between each email it downloads via pop3 (meaning a 400 message download takes over 10 minutes, so downloading my ~1500 daily messages is a background task that takes over half an hour).

Capitalism's really BIG failure is externalities. Engineers should be forced to dogfood their own products. I want THIS router put on the desk of the person who designed it, with all their traffic going through it, and to be forbidden from rebooting it for a week.

And yes, I'm happy to dogfood toybox. The main reason I don't already is I want a feel for what the other versions do so I can make toybox roughly match it. (When your frame of reference is your own output, it's really easy to spiral off into the weeds.)

Much wrangling with cpio, trying to fix three different issues. Got two of them fixed, calling it good enough since the third isn't a regression and nobody's waiting for it. (That's the "TEST_HOST fails, when did that start?" one. Moving targets...)

Pondering (st.st_mode&S_IFMT) == (mode&S_IFMT) and wondering if the compiler is smart enough to turn that into !((mode1^mode2)&S_IFMT) or if that's even a win. (3 operations vs 3 operations, although ! is only an operation sometimes? It could also go r1 = S_IFMT; r2 &= r1; r3 &= r1; branch-not-equal r1,r2 or some such.) The repeated constant is PROBABLY something the compiler can handle for me, I don't need to go "that's redundant, I could rephrase it in a way it's not stated twice" and then ponder whether or not that's actually an improvement.
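
(Spelled out, for anyone playing along at home; trivial sketch with generic names:)

#include <sys/stat.h>

// two ANDs and a compare...
int same_type1(mode_t a, mode_t b)
{
  return (a&S_IFMT) == (b&S_IFMT);
}

// ...versus an XOR, an AND, and a logical not
int same_type2(mode_t a, mode_t b)
{
  return !((a^b)&S_IFMT);
}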

Ahem: premature optimization. Back away slowly.


March 9, 2023

Ok, I _think_ for the help fixes: "toybox --help COMMAND" should print Elliott's advertising line and "toybox help command" should not, and "toybox --help" is equivalent to "toybox --help toybox", but "toybox help" is equivalent to "toybox help help".

This is all UI stuff, so there isn't a right answer, but I'm trying to come up with an answer that makes sense for me without obviously disappointing anybody else.


March 8, 2023

I have 8 zillion accumulated 80/20 patches where I've done most of the work and then hit "does this cover all the cases, what ARE all the cases, and what are all the test cases I need to put this through to prove that" and I can't quite work that part out. I'd very much LIKE to check all this stuff in, but making sure it's _right_ is hard.

The sad part is I keep trying to grab low-hanging fruit, finding out the thing is not low hanging fruit, parking it at a good "almost finished but not feeling up to finishing just now" parking spot, grabbing OTHER presumably low hanging fruit, and then coming back a couple weeks later and having to reconstruct my mental state from scratch.

The external bug reports are actually easier to field because somebody else is waiting for me to finish and I can tell whether or not I've fixed their test case.


March 7, 2023

I'd like to have a nommu test system that runs under qemu, and "coldfire" (an m68k variant) is the oldest of the lot. The problem I had back under aboriginal linux is none of the nommu board emulations had the complete set of hardware devices I wanted (256 megs RAM, battery backed up clock, serial I/O, two block devices, network card), but most things have a serial console and I can fake the clock with sntp or an environment variable, and if I have a network card I can use network block devices. It's not ideal, but it's _something_. Alas I can't use swap on nommu so a board with only 64 megs ram isn't running modern gcc on anything complicated.

Alas qemu is terrible about labeling its boards (it's getting better, but there's no docs/system/m68k yet), I can go "qemu-system-m68k -M ?" and I THINK the first two coldfire boards there (as opposed to the with-mmu ones that Linux inexplicably won't let me build a nommu kernel for) are an5206 (Arnewsh 5206) and mcf5208evb, the latter of which is the default board. As far as I can tell (from reading through hw/m68k/an5206.c and hw/m68k/mcf5206.c) the 5206 has 128 megs ram but no hardware except a serial port? The 5208 has one network card, which is at least something.

So, back to the linux source: arch/m68k/configs has a file m5208evb_defconfig so let's build that and see if I can feed it to qemu-system-m68k -nographic -no-reboot -kernel vmlinux and hey: boot messages! Panicking because no initramfs. And -no-reboot is ignored, implying this board doesn't know how to reboot or power off which is... sigh.

Memory goes from 40000000-41ffffff which... echo $((0x1ffffff)) is 32 megs ram. That's a bit squished. And it ignores qemu's -m option to try to give it more, which beats the cortex-m boards that were erroring out when you gave it any -m value other than the default. (QEMU may be undocumented, but at least its behavior is inconsistent.)

What else is in these boot messages: ttyS0 is the "mcfuart" driver. A dozen TCP/IP layer boot messages about hash table initialization and such but no line about the actual network card initializing itself. (Doesn't mean it didn't, which messages happen at which printk verbosity level is kinda potluck in embedded board drivers.) Oooh, mtd probe address, we've got a Memory Technology Device which means flash chip. Data storage onna block device, which QEMU might be able to stick a host file under. /dev/mtdblock0 which the "initramfs didn't work" root= fallback logic tried to mount as ext2... because apparently the default kernel command line (from qemu? built into the kernel?) is root=/dev/mtdblock0 and WHY does it bother saying /dev/ there? Honestly, what's the alternative?

Ok, I got a kernel to boot and spit out messages to serial port, which means I MIGHT be able to get an initramfs to boot to a shell prompt with serial console, even if I don't have any other I/O devices working yet. Assuming I can figure out how to get musl to...

Ah, darn it. I did this a year ago. And why did google not find musl's official web mirror on openwall? Google searches are getting RAPIDLY less useful, it's very annoying. I manually navigated to the right place but for some reason Google can't find that. Do THEY have a borked robots.txt? No, looks sensible. This is just Google increasingly sucking. I hope they recover.

Anyway, yeah, that's why I didn't do this earlier. Puppy eyes at Rich time again, I guess?


March 5, 2023

Took ADHD meds _and_ a store brand zyrtec AND a prophylactic ibuprofen this morning, just for good measure. Actually able to concentrate for once, at least so far.

And Elliott's having build trouble on mac, which... how slow is it to launch executables on mac? Is it just a homebrew thing, or are all mac binaries latency spike city? And yes, I should have realized old version of bash without "wait -n" isn't just a centos thing, it's also a mac thing. So my centos hack is insufficient if you care about the mac build being well-supported, which Elliott does.

Checked in the fixes for the warnings from yesterday.

Grrr, tests/files/* is design-level wrong, but it would take a largeish rewrite to make it right. I need generally better organization for "not the actual toybox source" files: scripts/make.sh and scripts/mcm-buildall.sh and scripts/mkroot.sh and scripts/root are all slightly different categories.

Cycling back to the "help" redo...


March 4, 2023

Dear compiler loons:

toys/posix/ls.c:393:16: warning: too many arguments for format [-Wformat-extra-args]
printf(" "+FLAG(m), 0); // shut up the stupid compiler

But if I yank it, llvm goes:

toys/posix/ls.c:393:16: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
printf(" "+FLAG(m));

So one compiler warns if you give it an extra argument, the other warns if you DON'T give it an extra argument, and in NEITHER case is it an ACTUAL PROBLEM. (Sigh, switching it to xputsn() but still. This is unsuppressable false positive noise. Stop it.)
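
(For anyone who hasn't seen the idiom, the format string is doing pointer arithmetic; a sketch of what the xputsn() version boils down to:)

#include <stdio.h>

// " " is the two byte array {' ', 0}: adding 0 or 1 points at either
// the one-space string or the empty string at its NUL terminator
void maybe_space(int flag)
{
  fputs(" "+flag, stdout);
}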

Meanwhile, gcc is also going:

toys/posix/cat.c:31:32: warning: the omitted middle operand in ?: will always be 'true', suggest explicit middle operand [-Wparentheses]
int i, len, size = FLAG(u) ? : sizeof(toybuf);

It's not "true", it's a constant 1. Guaranteed by C99. I WANT it to be a 1. When the flag returns 0, I want to replace it with sizeof(). That's what that code is DOING. (This warning showed up when I added the !! because previously it was (integer&mask) which coincidentally was 1 but gcc wasn't treating "1" and "true" as different because THEY ARE NOT DIFFERENT IN C, THAT IS A C++ THING AND C IS NOT C++.

On the bright side, flag position is less important, so less of a lurking land mine. The cost is that gcc's rapacious stupidity triggers on more irrelevant crap. I'm sad I can't just compile with old toolchain versions from Before The Stupid, but I did that in aboriginal and there was a limit.

Garrett (the uclibc++ guy I worked with at timesys way back) drove through Austin and met me for lunch, and we wound up talking for 5 hours. (Rudy's no longer has the reasonably sized reusable plastic cups, it's styrofoam now. Oh well, I've still got like 10 of the old ones.)

Too tired to do more programming after that, although I'm not sure how much was the truly insane quantity of cedar pollen in the air today. Yesterday's apocalypse du jour dropped the temperature 30 degrees, which always wakes up the cedar trees this time of year and gets them bukakkeing their needles off. The recent apocalii also left us with a large pile of broken branches out front, between the ice storm and the tornado warning, and it's a race between mail-ordering a hatchet to make firewood and municipal brush collection to see who gets them first.


March 3, 2023

How limp did I go after getting back home? I was 3 days behind on reading my webcomics.

Sigh, I was so _amazingly_ spoiled by the speed of Fade's internet connection. I'm back here with Google Fiber and pages are taking 30 seconds to load, and email is downloading at one message every 3 seconds. (In batches of 400. It takes a bit.)

Finally applied Elliott's pending patch. (Saw it in the web archive yesterday but hadn't downloaded enough email to grab a local copy, yesterday I took my laptop to Wendy's and HEB and neither offered net access. Phone tethering drains the laptop battery fast, and the radio signal situation in Hancock center is appalling: it can't see my WIFI access point if I lay the phone on the keyboard, and around the Corpse of Sears (which Wendy's is across the parking lot from) my bluetooth headphones need my phone within 6 inches of my left ear to avoid dropouts. Plus t-mobile did the "you have used 48gb of your 50gb gratuitous metering quota before we throttle the hell out of you" ping in the airport, and doesn't reset until the 5th...)

So FLAG(x) now uses !! to force the return value to 0 or 1, which gets optimized away when it's used as a logic value. Audited all the users to remove a bunch of existing VALUE*!!FLAG(x) that are now redundant, and removed several subtle dependencies on a flag having a specific value along the way (some of which were commented, some weren't, including at least one subtle bug introduced by a commit that moved flags). There's still several VALUE*!FLAG(x) which now turns into VALUE*!!!(x&y) but the extra ! also get optimized out.
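
(The macro change itself is tiny; a standalone mock of the idea rather than the exact generated-header source:)

unsigned optflags = 8;  // pretend the x flag landed on bit 3
#define FLAG_x 8
// before: returns whatever bit the flag happens to occupy (here 8)
#define FLAG_OLD(f) (optflags & FLAG_##f)
// after: !! collapses any set bit to exactly 1, and the optimizer
// discards it again wherever the result is only used as a truth value
#define FLAG(f) (!!(optflags & FLAG_##f))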

Whole lot of little style fixes as long as I was doing a review pass, spaces around the = in assignments, removing inconsistently used parentheses, str = FLAG(x) ? "" : "K" becoming str = "K"+FLAG(x), etc. A few cases of "FLAG(x) ? TT.x : other" becoming "TT.x ? : other" which is actually subtle: sometimes you check the flag to see if it was set because it's an argument that only takes collated arguments, so --blah=abc sets TT.blah to "abc" but --blah leaves it NULL. But I checked that this wasn't the case, and switched to the "test only one value and hopefully it's still in a register" version. (That said, I _kept_ one in patch.c because TT.p is numeric and could legitimately be -p 0 which is different behavior from not saying -p, so we need to check the flag not just the value.)
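
(The patch.c case in miniature; a standalone mock with illustrative values, where -1 is a hypothetical "not specified" sentinel:)

unsigned optflags;
#define FLAG_p 1
#define FLAG(f) (!!(optflags & FLAG_##f))
struct { long p; } TT;

// "-p 0" sets the flag bit but leaves TT.p zero, so testing TT.p
// alone can't tell "-p 0" apart from -p never having been passed
long strip_level(void)
{
  return FLAG(p) ? TT.p : -1;
}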

Whole lot of other "verification" that each VALUE*FLAG(x) was _previously_ using the rightmost flag (i.e. value 1), and not a hidden *4 or something. (The one case where it was had a comment.)

While I was there, I normalized todo and Todo to TODO so it's easier to grep for. (Can't just grep -i because "todo" shows up in comments and at least one local variable name.)

This wasn't (intended as) a micro-optimization to shave a few bytes off the code, this was "remove some conceptual land mines", but I did run bloatcheck a few times in hopes it wasn't making the result noticeably larger.

Oh goddess, this chunk of tar.c:

do {
  TT.warn = 1;
  ii = FLAG(h) ? DIRTREE_SYMFOLLOW : 0;
  if (FLAG(sort)|FLAG(s)) ii |= DIRTREE_BREADTH;
  dirtree_flagread(dl->data, FLAG(h) ? DIRTREE_SYMFOLLOW : 0, add_to_tar);
} while (TT.incl != (dl = dl->next));

Is assigning to ii but not USING it, the argument to dirtree_flagread() recalculates one of the flags and leaves the other zero. How is this passing the test suite? Would fixing it _break_ the tests?

Fixing it does not break the existing test suite. I'm gonna fix it and see who (if anyone) complains? (I think it might only affect sorting at the top level, which might not be a thing since even when the top level is a directory that's one entry. I need to think through it and come up with a test, which I dowanna do now because this is big and I want to get it CHECKED IN.)
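
(The obvious fix, for the record: pass the computed flags through. Per the above, this doesn't break the existing test suite:)

do {
  TT.warn = 1;
  ii = FLAG(h) ? DIRTREE_SYMFOLLOW : 0;
  if (FLAG(sort)|FLAG(s)) ii |= DIRTREE_BREADTH;
  dirtree_flagread(dl->data, ii, add_to_tar);  // use ii instead of recalculating
} while (TT.incl != (dl = dl->next));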

I have SO much half-finished crap in my tree I need to FINISH and FLUSH. The recent help plumbing changes aren't quite done yet. My most recent bout of shell work. The lib/passwd.c rewrite. I get a bunch done but don't make it over the hump so it accumulates instead of reducing. Need to CLOSE TABS...

And I think even tar --sort isn't going to sort the command line arguments? Processed in the order provided, the CONTENTS get sorted. Which still gives us the stable ordering, which is what they were after... Ahem, NOT FOLLOWING THE TANGENT RIGHT NOW.


March 2, 2023

Still kinda collapsed. I have pending email from 2 people to reply to, and spent most of the day not doing it. So many tabs to close...

Alright, the design issue with the --help output is when should it have the toybox summary line? Going with my most recent release binary, it looks like "toybox --help ls" prints it but "toybox help ls" does not? I can work with that...

Sigh, there's a lot of THINGY*!!FLAG(x) and Elliott's most recent patch also modified code that assumes FLAG(x) is producing 1, which is an artifact of position. (There's a comment about making sure the flag is in the right place in the optstr. That's... more brittle than I like.)

Possibly the FLAG() macros should have the !! built in? I should check whether the optimizer is smart enough to produce the same code. (No, I am not going to start using the "boolean" type.) Time to dig out make baseline and make bloatcheck! Which don't quite work here because changing toys.h at the top level doesn't get dependency checked and cause a rebuild, and "make clean" deletes the baseline out of generated/unstripped. Workaround: rm -rf generated/obj before make bloatcheck.

It's not _quite_ the same output. (With gcc, anyway.) In do_sha3sum() it's because we care about the flag position, which should be masking instead of using the FLAG() macro anyway. In do_gzip() it's because we're passing the value to a function which does not appear to be being inlined so even though it's only ever being used as a logic value the status doesn't propagate far enough. In cp_main() FLAG(f) and FLAG(n) are being assigned to local variables which are then used as logic values... which shouldn't make a difference to code generation, but does? Ha! And when I yank those local variables and just use the FLAG() macros directly, it shrinks 34 bytes! In touch_main() it's another "we care about flag position" thing saving 3 bytes: I'll live, and cpio_main() is another "flag is 1" with a comment, and also assigning FLAG(t) to a variable which only cares that it's nonzero but the variable's incremented a couple times later (to make it nonzero) so... take the hit. In cksum_main() FLAG(L) is passed as a function argument, so "zero or nonzero" must become "0 or 1" (and crc_init() is in lib/ so I don't expect it to be inlined across compilation units). Still kinda surprised su_main() isn't in pending because that whole subsystem is still unfinished, but reset_env() is taking FLAG(l) as an argument which lives in lib/ so isn't inlined so doesn't see it's being used as true/false. In pidof.c print_pid() is returning FLAG(s) and that function isn't being inlined because the function pointer is passed to names_to_pid(). Ha: nl_main() was doing another "depend on the flag being in position 1" but did NOT have a comment about it... and there's about 5 more of those. Sigh.

Huh, patch -R looks broken: apply_one_hunk() did reverse = FLAG(R) and then part of the "allow fuzz" test was c=="-+"[reverse] which means it depended on FLAG(R) being 1, but when Elliott added -s in commit 6f6b7614e463 I didn't catch that he put it at the end and moved R to 2 meaning in the reverse case it'll be comparing against the NUL terminator instead of the '-'. And we don't have a test for autodetecting fuzz. So adding the !! would actually _fix_ this.
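
(The failure mode in miniature; standalone sketch with the flag values written out:)

#include <stdio.h>

int main(void)
{
  char c = '+';
  int reverse = 2;  // what positional FLAG(R) returned after -s moved it

  // "-+" is the three byte array {'-','+',0}: index 2 hits the NUL
  // terminator, so the check compares against '\0' and never matches
  printf("broken: %d\n", c == "-+"[reverse]);
  printf("fixed:  %d\n", c == "-+"[!!reverse]);  // !! restores index 1
  return 0;
}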

Alright, I think I want to audit all the FLAG() uses in toys/*/*.c because there's a lot of !! I can now remove, and I should be consistent about not parenthesizing VAL*FLAG()|VAL*FLAG() because * is higher priority than |. It's a pity there's no "make test_dmesg" to make sure I didn't break that. I expect this is gonna come up a lot in a treewide audit...


March 1, 2023

Day after travel. Collapsed.


February 28, 2023

Flying back to Austin.


February 27, 2023

Oh hey, another email in my inbox this morning about what somebody thinks I SHOULD be doing instead of what I am doing. (Watched a good video on "autistic inertia".) I've mentioned before that I work based on momentum, and there you go. Having something Looming can either be extremely motivating (avoidance productivity: I will do SO much cat waxing to "virtuously" avoid the Looming Thing), or extremely demotivating (loss of momentum and traction because I can't muster the executive function to address picking up that piece of paper, it just won't budge).

Anyway, ignoring the "linux-kernel community is so broken" pile, the question du jour in my email was:

what is the official toybox opinion on rust being added to toybox?

And "My gut reaction is "Oh goddess not again" and I've been actively ignoring it?" was too short, so Pascal's Apology kicked in, and I replied:

Define "added"? I'm not putting a rust compiler in toybox, if that's what you mean?

If you mean "should I implement some commands in Rust and some in C", having a single simple context everything is done in the same way is part of toybox's design goals? Early in Toybox development the build needed Python, and I cleaned out that build dependency so it's all C and bash, and I'm implementing my own bash compatible shell so toybox builds under toybox. Early on I even had some commands implemented as shell scripts, and I wound up removing them again and doing them in C even though I planned to ship a shell interpreter, because I wanted the whole thing to be a single file with no external dependencies which you could statically link and drop into an empty chroot directory and have it just work.

If you mean "rewrite the whole project from scratch in a different language", long ago I was thinking of rewriting the whole of toybox in Lua but the problem I hit is that Lua doesn't ship with a standard set of posix bindings so I had to install something like 7 different prerequisite packages just to manage things like "wget", let alone implementing "mount" or "ifconfig", and if I had to implement/ship my own new Lua bindings written in C (and cross compile those to every supported target architecture) I might as well just do everything in C. (Which is a pity, Lua was quiet elegant, but their deployment strategy was too minimalist to be usable on its own.)

If you mean how would Rust affect my variant of countering trusting trust then having the project be in multiple languages again kinda defeats the purpose of a minimal installable base capable of reverse engineered binary auditing.

If you mean coming up with a replacement tiny system written in a single language that's both learnable the way minix and xv6 are _and_ scales up to actual load bearing deployment in real world usage (the way Linux 0.95 through about 2.2 did)... I'm still trying to make that work in _C_ (well a non-GPL one, I had it working with busybox but the insane FSF poisoned that well so thoroughly with GPLv3 around 2007 I wound up starting over.) I'm told the Rust compiler is now written in Rust and dunno what its system call binding approach is, but I still await a Rust kernel that actually ships in a product. (Even a vxworks level of kernel: I decided to wait for that when I saw that blonde lady's Rust talk at linux.conf.au in 2017 and I'm still waiting. Heck, even something as silly as Fuchsia, just something somebody somewhere actually used for something in a non-demonstration manner. There's a dozen different My Little Kernel variants people have done, but nobody actually seems to do real work in Rust? It's all either reimplementing stuff that already exists because "Ew Icky C", or "here's how we're going to change the language governing bureaucracy" and "here's how we're going to add yet more complexity to the language" and it's very tiring...)

Show me a serious attempt at a system that rebuilds itself under itself from source code, all written entirely in Rust with no C anywhere, and I might start to care? ADDING Rust on top of existing complexity is just more xkcd standards layering. (Yes, you have garbage collection and bounds checking like Java did in the 1990s. Yes you have native compilation to binaries like Java did with IBM's Java Native Compiler back in the 1990s. Yes you have a Big Marketing push and drive to rewrite everything in this one language like Java did in the 1990s. Yes you have a strong argument that C++ is a terrible language like Java did in the 1990s, which is not the same as C being a terrible language but try telling any C++ developer that. From a safe distance. Bring popcorn.)

If you mean the "Rust is inevitable, the same way Hillary Clinton was in 2008 and again in 2016", I note I've lived through the following:

I learned C in 1989, spent about 1992-1995 doing C++, and then was all in on Java as my main programming language from 1996 until about 2000, caught the Python 1.x->2.x transition and then bowed out again when staying on 2.x actively offended the 3.x developers... The C I learned way back when remains relevant. If I tried to write new code in the ~1995 version of any of those other languages it wouldn't build in modern environments.

I was part of the "rewrite everything everywhere in Java" crowd for about 5 years. My bug report was the reason the ability to truncate a file was added to Java 1.2. I worked on IBM's port of JavaOS to the PowerPC in 1997, taught Java at the local community college in 1998 and 1999, designed a hard realtime garbage collector... It was really exciting. (I wrote a little about how that ended in my blog.)

Any time someone goes "why aren't you using Rust" as an accusation, I treat it the exact same way as the C++ and Java people doing that before them. I had 20 years of Windows people asking why I didn't do windows (smoothly transitioning from rejecting OS/2 to rejecting Linux). I don't care if "everybody's doing it", I've never had a Facebook account either. It's not _my_ job to "be convinced". Lua had "here's cool stuff Lua does better", which appealed to me enough to take a look. I have yet to see arguments in _favor_ of rust, they've all been _against_ C. "C bad, icky and dangerous, we blame you for perpetuating it, you must stop now". No thanks.

A big reason I keep coming back to C is I can stay 10 years behind on the standards without a problem. Heck, I can still compile K&R stuff from 1978 if I really need to. The main deficiency of ANSI C from 1989 is that the first 64 bit processor came out in 1991 so the 64 bit "long long" type was a widely implemented compiler extension that worked its way into the standard later. I only moved toybox from C99 to C11 recently because of like ~3 minor convenience features (typecast array/struct literals, the "has_include" macro, and an alternate "inline" syntax that let us work around an llvm bug that's probably since been fixed).

Rust still hasn't settled down and decided to be nearly that stable: from a distance it looked to me like the first decade or so of the language was just WILD THRASHING leaving the language unrecognizable 5 years later, and now it sort of knows what it is, but still changes?

Has anybody made Rust work on a nommu system? Or only XIP from read only storage with 256k of sram? (Which Linux has been made to do, for example. Good luck pulling that off with garbage collection...) If not, your argument is "we'll still need C, but just less of it, so a smaller pool of people will have less expertise and age out without replacement". That's kind of Tesla's version of the self driving car argument: 99% of the time it'll drive for you just fine, and the remaining 1% it will crash and/or kill pedestrians and we're calling that the driver's fault but the driver won't be paying attention and may be way out of practice assuming they ever knew how to drive in the first place. How this is supposed to be a net improvement, I couldn't tell you.

Is there a Rust version of tinycc? What's the smallest, simplest Rust compiler out there? (Tinycc could happen because the language wasn't a moving target. If I decided to pick it back up and bang on it again the old stuff I did is still theoretically relevant. Is even a 5 year old version of Rust still relevant?)

If you want to implement commands in rust yourself, you can stick them in the $PATH and it should just work. Is there an obvious reason this should have anything to do with toybox? The "start over and rewrite everything in rust" approach like I was poking at doing with Lua would mean getting all four packages written in Rust. And preferably a stable version of Rust where a newbie could grab an existing system deployed 10 years ago and not touched since then, fire up the old build, reproduce it, understand it, and be able to modify it. As far as I can tell, this isn't a thing the Rust community _wants_, let alone is actively trying to achieve.

Sigh, I haven't got anything _against_ Rust, any more than against Ruby or PHP or Lisp or Prolog. I just don't care. Nor was I _offended_ by the people submitting forth and lisp interpreters (yes, plural) to toybox over the years. (In the absence of toysh, people have decided it needs a programming language.) I understand this guy's interest, and would like to politely decline... except I DO have something against projects like systemd that don't give me a graceful option not to participate, and the push to rewrite the linux kernel in rust without forking it is exhausting in the same way the build requiring perl was exhausting.

This guy didn't exactly knock on my door with a rust version of The Watchtower to tell me the good news about our new savior, but... I'm not getting "live and let live" vibes from this community either.

(I have a youtube video bookmarked, which claims to explain Rust in an hour. It's on my giant to-watch heap. I'm not AGAINST Rust. I just... still don't see the point?)


February 26, 2023

Fiddling with toybox help plumbing. Kinda spiraled.

So "toybox --help toybox" wasn't producing output, because of fallout from changes to prevent "toybox toybox toybox" stacking arbitrarily deep (and blowing the stack now that Linux doesn't necessarily enforce environment size limits even on mmu systems). So I started poking at that, but the show_help() flags API did the old "this argument was a yes/no boolean, then it grew a second bit, then it grew a third bit, and now it needs #defines" thing that I hadn't cleaned up yet. And while I'm there, "help -au" should print the usage lines for all commands, but calling help as a shell builtin does unique filtering so what happens when you "help -u" on the builtin? And the "See:" logic isn't filtering right as a builtin (redundant lines). And this whole "Toybox 0.8.9 multicall binary (see https://landley.net/toybox)" line at the start (which wasn't my idea, but then calling Linux "linux" wasn't Linus's idea either) should only be output SOME of the time and when is that some?

I keep trying to do quick fixes that wind up touching a half-dozen different files and leave off unfinished after hours of work and then it just ADDS TO THE MESS.


February 25, 2023

Flying back to Austin on tuesday. Not up for programming stuff today. Reading fanfic on AO3 instead.

Some months back I posted an observation about the Tardis to mastodon, which is why I want one. Just catch up on everything and come back when you're feeling up to it.

I wrote up an email reply which is a bit rambling and off topic for the toybox list (see "not up for" above, combined with pascal's apology for writing a long letter, substituting "spoons" for "time") so here it is instead. The context is that Michael Kerrisk, the man-pages maintainer, retired and handed the project off to a new guy, and didn't properly announce it (quietly added a co-maintainer to the git repo and then ghosted everybody), and now that we've finally figured out what HAPPENED we're trying to adjust.

On 2/24/23 11:46, enh wrote:

> > Possibly the new maintainer needs to poke Konstantin to get access to update the
> > directory, and then put stuff under the actual kernel.org page? (Or you could
> > put some under an android.org location? Either way they'd be up to date with the
> > repo instead of a couple years behind...)
>
> yeah, that's one of the options... generate the html and stick it on one of the
> android-specific sites, but that seems a bit odd (people are already confused by
> places where the man pages are actually only talking about glibc; hosting them
> on an android site would only make that worse) and there are already a lot of
> links to man7.org out there in the wild, that it would be
> unfortunate to see go stale. (though if no-one has access to man7.org
> any more, there's nothing we can do about that anyway.)

The downside of depending on individuals is you're inconvenienced when they cycle out. The downside of depending on organizations is they're all just a bunch of individuals who get together and collectively pretend, so things go just as pear-shaped when the people actually doing the work well leave without a proper handoff to someone else who will actually do the work well, but you tend not to notice as fast (before _or_ after: see the Linux Foundation's consumption of the Free Standards Group and thus the Linux Standard Base). This lack of warning isn't necessarily an improvement.

Ahem: man7.org was offered as a community resource but is actually Michael Kerrisk's personal page and he is not handing it off to the next guy. (The maintainer of landley.net does not get to throw stones here, although all the toybox.net variants are camped by people who want thousands of dollars.)

The responsibility for the man-pages git repository was handed off (resulting in the repo effectively moving to a new URL which nobody seems to really care about), but not the website or the release announcement email list. (Haven't gotten one since, if it's still having releases?) If it's a good idea for the project to move to more of a "package deal" where there's a repository+website+mailing list that can be passed to a new maintainer as a group, that's sort of a design issue.

Jeff Dionne set up the original uclinux project, which I believe busybox.net was modeled on. After Lineo ended in the dot-com crash and the kernel parts got merged upstream, Erik Andersen kept the busybox+uclibc subset of uclinux going as a personal project. He handed busybox.net off to me in 2005 by giving me a login to the server (it moved from the DSL line in his basement to osuosl, but Erik still pays for the domain renewals). When buildroot forked off of uclibc, I'm the one who abused my root login on the shared server image to create a new mailing list and kicked the buildroot traffic off to the new list. (Alas, too late to save uClibc.) Buildroot has since separated itself the rest of the way from uclibc (its own VM with its own domain), so it's not inconvenienced by shared infrastructure going down (as has happened a few times what with uclibc being dead and all), which also means that handing over the keys to a new maintainer is a thing that buildroot could potentially do if necessary.

Sigh, somebody should write up a non-stream-of-consciousness "handing over the keys of an open source project to a new maintainer" document. Do you even have a manifest of what all the project's resources ARE for something as big as Android? Not that Google's ever going to hand off Android. I remember when Red Hat set up Fedora, which pretended to be independent until Red Hat finally admitted it was just Red Hat Enterprise Rawhide. (So an independent CentOS emerged... and Red Hat bought it.) Anyway, the point is when people/management change, the project's gonna wobble no matter what the corporate structure says, because it's people who do things and know things and remember things.

> > Let's see, how hard is it to produce html output from this git repo... it's got
> > a top level Makefile to do exactly that as its default target, but it wants a
> > package called "man2html". And installing that on my laptop installed apache
> > which LAUNCHED AN INSTANCE ON LOOPBACK. Why on EARTH would... that's just sad.
> >
> > But ok, I can uninstall it again after building... looks like it populated
> > tmp/html with files? No top level index. Let's see, the first file under "man3"
> > is __after_morecore_hook.3.html which seems to be a synonym for malloc_hook (not
> > symlinks or hardlinks, just redundantly generated files). The "Return to main
> > contents" link goes to file:///cgi-bin/man/man2html which does not exist. The
> > #include link goes to file:///usr/include/malloc.h which ain't gonna
> > work on a web server either...
> >
> > Looks like there's the start of something workable here, but it needs a bit of
> > shoveling? (Or at least digging into how to configure it?)
>
> yeah, and one problem with being part of a large bureaucracy is that the docs
> folks and the branding folks will all want a say in making it look "right" if
> it's on an android site!

My first really well-paid consulting gig was working at a dot-com that was managing a rewrite of IBM's mainframe pricing and sales system. Various departments within IBM had wrestled for control of the project so extensively that upper management had outsourced our bit of it so NONE of them had it. Taken the ball away and given it to someone else entirely so they'd stop fighting.

Which meant my job was to be on an 8am conference call with IBM Europe (Böblingen, Germany: initial deployment) and IBM USA (Poughkeepsie and Dallas, one did frontend, one did backend), a 6pm conference call with IBM USA and IBM Australia (Worldwide Integration and Test, it was _not_ in didjabringabeeralong because that's a Discworld reference but don't ask me what city it WAS in, somewhere that was simultaneously under water and on fire at one point but that came later), and when I needed Australia and Europe to talk to each other that was a 3am call and I slept under my desk AND BILLED FOR THE TIME. (The dot-com manager told me to.) I don't think I authored a line of code (for them) the entire contract; the _technical_ part of my job was matching up defect reports that told us to do one thing and defect reports that told us to do the exact opposite (or explicitly NOT to do that thing) and bringing up pairs of them in the meeting.

Somebody eventually explained to me that a specific manager in Dallas (Ken somebody?) had figured out how to get promoted by sabotaging projects: during the design phase he demanded to know why implementation hadn't started yet, then when they started implementing an unfinished design he'd demand to know why it wasn't being tested yet... The answer was always "because we're not ready" but he'd make a stink and get it started and the reputation was Ken Got Things Done. It wasn't happening before he made it happen. It all collapsed into chaos the moment he left, but that just showed how vital he'd been, didn't it?

So this project had fundamental design changes coming in regularly, requiring not just multiple complete rewrites a couple years into the project, but constant changes to the test plan. ("Why can we never get real database data to test with?" "It's their strictest trade secrets." "What is this system for anyway?" "Pricing 360 mainframes." "How much do those usually cost?" "That's not how it works, the salesman figures out how much the customer is able to pay, and then they produce an invoice that adds up to that amount." "So this whole system is a giant bullshit generator that emits nonsense to produce a predetermined result?" "The invoice has to be reproducible and comply with a bunch of legal and regulatory clearance issues, you have to word things right for the technology to be exportable to various jurisdictions..." "You didn't answer my question." "No I did not.")

Eventually the Australians did a slimy, clever political thing to extricate themselves from this clusterfsck death march, by declaring that one of the endless thrashing "release candidates" they'd been given had PASSED THE TESTS and they certified it as deployable, closed out their budget, and reassigned the testing staff to other projects, scattering them to the winds. Completely ignoring the fact that the testing they were doing was useless (something nobody could call them on, because nobody could, for political reasons, admit it to be true in so many words). They took a random passing snapshot in time of the vague contradictory specifications they'd been given and ran the Red Queen's race fast enough to catch up just long enough to call Bingo. They'd been given an impossible job and claimed to have done it, because ignoring the "it's just busy work until we're ready for you" nature of the thing and instead declaring victory meant they could STOP DOING IT. Which immediately clogged up the pipeline leading to them, because NOTHING MORE COULD BE TESTED, an existential constipation crisis leading to ALL THE PHONE CALLS.

That's about when my 6 months were up, at which point the consulting company all this had been outsourced to offered me a 50% raise to just STAY AND BE ON THE CALLS... and I just couldn't. I couldn't put into words WHY; this was almost 15 years before David Graeber wrote his first article on "Bullshit Jobs". But at the start of the contract, the existing employee who'd been doing it had used all his accumulated vacation time AND some family emergency under the Family and Medical Leave Act to take a solid two-month sabbatical, forcing them to reassign the project to ANYONE ELSE BUT HIM. They'd thrown money at a passing junior dev to just Sit In The Chair And Be On The Calls, and he'd left me a pile of useless printouts to "get up to speed" with. There WAS no documentation. The job was babysitting, and the burnout was just insane if you didn't understand that and tried to actually accomplish anything ever. I found myself physically unable to just shut up and take the money longer than I'd already done.

Anyway, tl;dr there are sometimes political advantages to having something live outside an organization.

> > > nope... that's still
> > > https://thephd.dev/c-undefined-behavior-and-the-sledgehammer-guideline
> >
> > I apply the sledgehammer to the compiler. (Push back against the abuser causing
> > the damage, don't make the victims endlessly escalate ever-changing "compliance"
> > that's never good enough. Danegeld encourages the dane.)
>
> read the link (or listen to what i've been telling you for the best part of a
> decade) --- the problem is that the compiler folks don't believe we're their-
> customer. they don't care about "is it useful?", they care about "microbenchmark
> line goes up?". or, in your analogy "the law is currently on the abuser's side".

Oh sure. I know. Doesn't mean I'm going to stop fighting. (There's a reason I was poking at tinycc/qcc.)

Steven Universe's "that's why we can't fight them", "that's why we have to fight them" line works in here somewhere.


February 24, 2023

Got an email from Andrew Morton which would have been great if it was the first one, but after Thomas Gleixner's repeated replies ignoring the code and talking about bureaucracy (I actually MET him, and later recommended his company to Taylor Simpson at Qualcomm for handling Hexagon's kernel patch review and upstreaming, nice guy back in the day...), and that Japanese guy going "we voted on this stupid unnecessary API so adding code that renders it irrelevant would highlight how stupid it had been and embarrass us all"...

I'm trying to scrape up the politeness to answer Andrew's questions in a constructive manner. Rather than an honest one. "Who do I expect to merge this?" Nobody. I do not expect the kernel clique to be functional enough in 2023 to merge external contributions from individuals. All of this code was submitted to the list before, and ignored. This is a roundup for people outside the kernel. If people build their own kernels, this can add to their patch stack. If lawyers give me guff I go "look, I submitted it to them, they chose not to merge it for their own reasons". But linux-kernel being a functional place to discuss patches? That's LONG gone.

But I can't just SAY that. It's not USEFUL. I'm not sure what would be, and dealing with them makes me SO TIRED. (Andrew is being polite and constructive! I should do the same! I really should. I'm just out of spoons for kernel "community".)


February 23, 2023

Many moons ago I was trying to add cortex-m support to mkroot but seem to have lost my notes. (I want a qemu nommu target so I can more easily test nommu support without copying stuff to my turtle board. Yeah, I can tell toybox to enable nommu support anywhere and use the nommu codepaths, but that doesn't prove nothing LEAKED, and that the result actually WORKS on a nommu system.)

My old blog entry from the time just says I was working on it, but doesn't provide useful context like what QEMU board or Linux defconfig I was trying to make work. So we start over.

According to qemu-system-arm -M ? | grep '[-]M' the list of QEMU Cortex-M boards includes "stellaris" (64k sram), BBC microbit (no obvious Linux target), and two stm32 boards (vldiscovery and netduino), neither of which implements ethernet or block devices. That leaves mps2.

The mps2-an500 and mps2-an511 each have 16 megs DRAM, and qemu's hw/arm/mps2.c has a gratuitous explicit test for an -m trying to increase it and then refusing to do so: if (machine->ram_size != mc->default_ram_size) error_report("Invalid RAM size, should be %s", mc->default_ram_size); (Which seems silly, there's space in the mapping? Oh well...)

Linux has an mps2_defconfig build. I need a kernel config, QEMU board emulation, and compiler that all agree on the target, where "compiler" includes both gcc tuple and musl support. I have a static PIE toolchain for armv7m (I.E. Thumb 2, I.E. Cortex-M); I'd like an fdpic toolchain but haven't made that work yet because support hadn't been merged upstream yet last I checked. (gcc, binutils, and linux all need it, not sure about musl?)

Standard ELF has absolute memory addresses hardwired into it, which means you could only run at most one instance of each ELF binary on a nommu system (it's kind of the same problem a.out had with shared libraries: in practice the ELF loader just isn't allowed on nommu). Position Independent Executables (PIE) use relocatable Position Independent Code (relative addresses from a base pointer kept in a register), which is basically building your executables the same way you build your shared libraries, so they can be loaded anywhere in memory. It's slightly less efficient but the security nuts love it because exploit shellcode hasn't got known absolute addresses to use on the target system. FDPIC takes that concept and expands it to make all four of the standard ELF segments (text, data, rodata, bss) independently relocatable, which means your program doesn't require one big contiguous chunk of memory to fit into, but can instead fit into four smaller chunks (which is very useful on nommu systems, where memory tends to get fragmented over time), AND it means the read-only segments can be shared between program instances (five copies of bash can all use the same text and rodata, each one just needs their own data, bss, stack, and heap), but the downside is you need 4 registers to store the 4 base pointers (or have your base pointer point to an array of 4 pointers with an extra dereference on most memory accesses). But that's ALSO something the security guys like because foreign exploit shellcode can't even know where rodata is relative to text in a given running binary, it's even fiddlier to exploit.
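
A conceptual sketch of what the loader ends up tracking (made-up structures, NOT the kernel's actual elf-fdpic types): one independently-placed base per segment instead of a single load offset:

  // conceptual only: four small holes instead of one big contiguous one,
  // with every code-to-data reference going through a per-segment base
  struct loadseg { void *base; unsigned long size; };
  struct loadmap { struct loadseg text, rodata, data, bss; };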

You'd think the FDPIC loader would be the standard one by now (since it can handle normal ELF binaries just fine: FDPIC is ELF with an extra flag in the header, it has the OPTION to make the segments non-contiguous but not the obligation to do so.) But as with the ext2/ext3/ext4 drivers the kernel guys went "no, fork it and have a completely separate file that will get out of sync with the other one", and then years later it's a mess...


February 22, 2023

Oh god, kernel people.

[That's all I wrote for this entry at the time. It's now March 19 and I haven't edited and uploaded past this entry yet because I just don't have the emotional energy to deal with that toxic waste dump, but we're coming up on a month behind, so here goes:]

Thomas Gleixner replied, ignoring the actual code parts and instead having a multi-part exchange entirely about the bureaucracy, where I didn't cc: the right people (I cc'd who get_maintainer.pl said!) and my subject line was wrong and UGH, my DESCRIPTION, and also there's something in some thousand line documentation file I missed but he literally won't specify what it was because the onus is on me to FIGURE IT OUT. And he's also arguing that if a dependency is EVER needed then it's ALWAYS needed, so my patch to be able to build without objtool is conceptually wrong because SOME configurations need that dependency, therefore EVER building without it is a crazy thing to want to do.

Meanwhile, Masahiro Yamada is literally saying that my patch to try the "cc" name before falling back to "gcc" and thus autodetecting llvm in both native and cross compilers (with no other behavior change I am aware of) can't go in because, and I quote: "In the discussion in the past, we decided to go with LLVM=1 switch rather than 'cc'. We do not need both." (With a link to the previous vote.) This was his REPLY to me pointing out that the name "gcc" is like "gawk" and "gmake" (and "gsed" on macos homebrew) and that just about everything else uses the generic name where possible. What's his logic here, "we voted, therefore the topic cannot be revisited"? I just...

So tired.


February 21, 2023

Got my patch series posted to linux-kernel. The oldest patch in that series was first submitted over 15 years ago, albeit in a different form then. Another one fixes a minor bug I myself introduced 10 years ago, which nobody else has bothered to fix since even when I pointed it out to them.

If you're wondering why I'm tired of dealing with the kernel clique...


February 19, 2023

Dear bash, up yours:

$ bash -c $'cat << EOF\nthingy'
bash: line 1: warning: here-document at line 0 delimited by end-of-file (wanted `EOF')
thingy
$ echo $?
0

I'm trying to test ERROR PATHS to make sure they exit gracefully instead of throwing ASAN allocation errors, and you have WARNINGS? The shell has errors and the shell has success, having WARNINGS is new territory. (Still exits with 0... I do not have a syntax_warn() function.) Comparing with the Defective Annoying SHell... that accepts it without a warning and also exits 0. Fine, change the error path to be a... strange sort of success? (Dash does not append a newline to thingy, bash does. Dash not doing it strongly argues in favor of doing it, so always newline it is.)


February 18, 2023

Going through the HERE document parsing logic, I hit a "this can't work" bit (comparing two pointers only one of which gets incremented in the loop), and tried a simple ./sh -c $'cat << EOF\nhello\nEOF\n' test and sure enough it didn't (never recognizes EOF), and tried to find the last place it did and gave up in 2021...

That can't be right. There's a regression test suite. I know it doesn't make it through all the tests, but... I had this working at one point.

Sigh, symptom of swap thrashing: sh is a big command that requires a lot of focus and I've had to do it in small increments with all the other demands. Now that I'm focusing way more on toybox, there's still lots of bug reports that spawn off tangents that SEEM quick but aren't. (I spent a couple days basically SCOPING mkisofs. I need to cycle back to diff. I still haven't set up a test environment to check in the lib/passwd.c rewrite that's actually buildable and testable with the bionic NDK...)

I've fixed a lot of sh bugs that were in front of me, which broke other stuff and I either hadn't made it through the test suite (because of the expected failures from still missing features) or the relevant test isn't in the test suite yet. So I need to grind away at fixing stuff for a sadly large, hopefully uninterrupted block of time.

I miss the 36 hour programming sessions of my youth. These days I look up after 4 and need a long walk...


February 17, 2023

Sed bug report came in while I was poking at the shell double free, and of course the sed thing is another object lifetime rule issue, introduced by the sed speedups which added extra cacheing. Got it sorted I think?

The double free is in an exit path, where the cleanup does not match the assumptions. The HERE document logic adds the EOF marker to the end of the ARG list: not a COPY of the marker, the actual pointer to the original string we parsed earlier. The sh_pipeline variables "count" and "here" let us know we're in HERE document accumulation mode so each time parse_line() gets called it moves the marker and discards it when matched, but the cleanup function called by the exit path isn't looking at that.

There's also something I called "bridge segments" where additional commands that do NOT have HERE documents attached to them get parsed before the line continuation logic fetches the body of the HERE document(s), ala:

$ cat<<EOF; echo hello
potato
EOF
potato
hello

In that case the pipeline segment "echo hello" parses into would get marked as a bridge (its ->count set to -1) so the parse_line() entry path knows to back up through it and look for uncompleted HERE document segments. Once they're completed it works its way forward unmarking completed segments until it can either return "we have a complete thought, you can execute it now" or finds another reason to ask for line continuation (being in the middle of a for loop or if statement, for example).

The PROBLEM is that when you DON'T complete the HERE document, that extra entry indicating what EOF string we're looking for shouldn't be freed, because it exists earlier in the pipeline (in whatever statement had the redirect) so if you free it in both places... double free.

Alas while fiddling with this I found MORE wrong cases. For example, if the redirect ISN'T attached to a statement, it gets freed early (when the NOP statement is freed) and thus the HERE document can't be concluded for a different reason, ala bash does:

$ <<EOF; echo hello
potato
EOF
hello

And toysh can't handle that yet because free("EOF") happens after parsing the first line and then the HERE document fetching use-after-frees it.

I think I need to just xstrdup() it. Premature optimization strikes again, "I don't need to copy this, the original's lifetime is longer than HERE document parsing by definition"... "yes I do to be CONSISTENT".
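
Concretely, something like this one-line change (a sketch; the field names are approximate, guessed from how the arg lists are described above): the HERE document bookkeeping entry gets its own copy of the delimiter, so the statement that owns the original and the cleanup path can each free what they hold without coordinating.

  // was: arg->v[arg->c] = eof;      sharing the statement's pointer
  arg->v[arg->c] = xstrdup(eof);  // private copy: lifetimes decouple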

Sometimes "progress" is just adding yet more tests the existing code doesn't pass.


February 16, 2023

I've been meaning to post my patch stack to linux-kernel for weeks (not because I think they'll merge it but so it's not my fault that they haven't), and hey: Linus did an -rc8 so this isn't the merge window week. Yay extra time, but I sat down to do mkroot builds of 6.2 anyway and...I broke the shell. Darn it. One of those fixes for Eric Roshan-Eisner's fuzzing bugs introduced a strcmp(ex, blah) without a test for ex being null, and running mkroot's init script triggers that codepath and segfaults. Stupid thinko, but have I really not tested mkroot in a month? Sigh.
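
(For the record, the shape of the fix, sketched with the names from the description above, do_thing() standing in for whatever the codepath actually did: strcmp() with a NULL argument segfaults, so test the pointer first.)

  // was: if (!strcmp(ex, blah)) do_thing();   segfaults when ex is NULL
  if (ex && !strcmp(ex, blah)) do_thing();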

Oddly enough, I'd already hit this and fixed it up in the shell work I did yesterday, but getting that to a good stopping point so I can check it in is tricksy. (It started with an attempt to add the read builtin and there's a lot of half-finished debris lying around the tree.)

Went to Walgreens early this morning and bought earplugs. Much less painful work experience. (I am not a dog person. Never developed the skillset. If I lock Adverb in the bedroom when he's not alone in the apartment, he claws at the door endlessly and will damage it. If I let him out into the center room (combination living room and kitchen), he barks at the front door for maybe ninety seconds every time somebody else in the apartment complex walks through the hall. Fade sits on the bed with her laptop and closes the door, and Adverb thinks that's the correct way to be home and keeps trying to lure me there, but I'm used to a table and chair and this room has better lighting.)


February 15, 2023

If you're wondering how my day is going, my attempt to add a shell "read" builtin has diverged into reverse engineering my ${variable} expansion code to figure out what all the corner cases are which led to reading the relevant part of the bash man page which led to me restarting the bash man page from the beginning which led to redoing sh_main() flag parsing and adding tests for sh -cs "arg" thingy vs sh -c "arg" thingy which led to me changing the logic so -c "arg" aren't an arg.c colon attachment (because they aren't in bash: it reinterprets the first argument as a command instead of a shell script but sh -c -s "echo hello" prints "hello" instead of trying to run -s and yes I need a test for it) which circled back to me trying to get all the existing tests to run under ASAN which means tracking down why sh -c '<<0;echo hello' was faulting which is because TT.ff->pl = xrealloc(TT.ff->pl) sometimes ALSO needs to update TT.ff->pl->end and now I'm trying to work out when that's true. (The only realloc of an existing pipeline segment is when attaching HERE documents to one, which expands the arg[] array at the end, but I need to update ALL the pointers.) And then once I added a loop to check all the pl->end in the pipeline and update it if necessary (which SHOULD happen before function bodies get moved so it should all be in the one doubly linked list), that revealed a double free error I need to track down.
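
The pattern under all this, boiled down to a generic sketch (nothing toysh-specific): pointers aimed INTO a realloc()ed block have to be rebased via offsets saved before the call, because the block can move.

  #include <stdlib.h>

  // grow a buffer while keeping an interior pointer valid
  void *grow(void *block, size_t newlen, char **interior)
  {
    size_t off = *interior - (char *)block;  // offset survives the move

    block = realloc(block, newlen);
    if (block) *interior = (char *)block + off;  // rebase into new block

    return block;
  }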

None of this was what I planned to do next, but with Android in feature freeze it seems like a good time to make a dive back into shell stuff...

Adverb has been barking continuously throughout this. Fade's dog is unhappy when Fade isn't here, and expresses it when he's not alone. (If he barks at the front door long enough, clearly I will bring Fade back. It's worked every day so far, after enough hours. I have headphones, but need earplugs. I have escaped the clingiest cat to visit a neurotic dog.)


February 12, 2023

I've taken a break from caffeine here at Fade's, which has resulted in some very long naps. As in more than one unexpected 8 hour nap. Not the most productive, but eh, it's a weekend...

Gentoo's "make tests" is failing on du because overlayfs lies. My first instinct was to mount a tmpfs when run as root, ala if [ $(id -u) -eq ]; then mount -t tmpfs tmpfs .; cd "$PWD"; umount -l .; fi (the lazy unmount means it's still there on the current directory while we're in it, but automatically unmounts as soon as we cd out or exit the process).

Unfortunately, the results from tmpfs are very different from the ext4 I developed it on: mkdir allocates a 4k block up front on ext4 but in tmpfs directories are always size zero (because the dentry cache doesn't take up space in the page cache). And I can't convert the tests to what tmpfs produces unless I'm going to _require_ it to run under tmpfs, which you can't do as a normal user. I think I need to do:

  dd if=/dev/zero of=ext2.img bs=1M count=1 status=none
  mke2fs -F -b 4096 ext2.img
  mount ext2.img .
  rm ext2.img
  cd "$PWD"
  umount -l .

Which should get me a filesystem that behaves like the one I'm developing on. (How does "dd" manage to get "unix" so wrong? Success is supposed to be silent so your pipeline isn't full of trash, but with dd you have to say status=none to get that... I'm blaming IBM, they got ebcdic in there somehow. I'd use "truncate -s" but you can't loopback mount a sparse file...)


February 11, 2023

Attempting to close tabs: the gentoo locale thing should be fixable by having it try C.UTF-8 (which macos hasn't got) before en_US.UTF-8 (which gentoo hasn't got). My reading of "man 7 locale" says it should try C.utf8 in its search path (feed it the "official" name and it tries four different variants: upper and lowercase, with and without dash)... gentoo still didn't work. I tried to run it under strace to see why, but "emerge strace" doesn't work on last Sunday's LiveCD because /etc/portage/make.profile is a broken symlink. Emailed a "huh?" at Patrick Lauer...

Oh goddess, whatever Horrible Gnome Thing gentoo's livecd is using as its terminal (or is it a Horrible KDE Thing?) is FLASHING the broken symlink at me. Causing KVM to gratuitously eat CPU doing perpetual screen updates just so the display can cause ADDITIONAL EYESTRAIN. That manages to be counterproductive on multiple levels. (And I haven't dug into figuring out how to make the background be actually black instead of dark grey, because they decided "less contrast, that'll help".) Cleared the terminal and my CPU usage graph no longer looks like a heart monitor.


February 10, 2023

Onna plane. Heading to Minneapolis, visiting Fade until the end of the month. (Flying back on the 28th, which is as far as February goes this year.)

Haven't blogged for the past few days, felt under the weather ever since the ice storm. (It _really_ threw off my sleep schedule.) Made a few notes about "huh, I should blog about that" and then didn't. (Sigh, I should backfill but mostly the things I thought about blogging were when I wasn't in front of the computer, so said notes would be in Austin and I'm onna plane.)

What did I do: aggroed the bash maintainer into a coreutils thread. (Still subscribed because cut -DF still hasn't been merged or rejected.) The arch/sh maintainership transfer is still up in the air. Started researching mkisofs. Did NOT post my kernel patch stack to lkml yet.

On the toysh "read" builtin front, bash's behavior is subtle: read -p hello > /dev/null doesn't work because the prompt is output to stderr not stdout (justs like the $ prompts). If I go read "" it exits with an error immediately (because "" is not a variable name it can assign to), but if I read potato "" it reads a line of data, splits it, assigns the first part to potato, and THEN exits with the error. I don't understand why it only checks the FIRST value for validity before reading input? (Why check it at all before reading if you're not going to check the rest...)

$ read -p % ""
bash: read: `': not a valid identifier
$ read -p % potato ""
%one two three
bash: read: `': not a valid identifier
$ echo $potato
one
$ read -p % potato ""; echo $potato
%blat
bash: read: `': not a valid identifier
blat
$

First time it doesn't even output the prompt, the third read shows it's not a syntax error (just a normal error exit). So that's good to know. I should add tests...

And then of course, after all that bashing my head against input granularity, sitting down to write "read" I'm hitting OUTPUT granularity. Namely: you can list multiple variable names on the read command line and it does IFS splitting to put a word in each argument the way it does for $1 $2 $3 etc for commands and functions... but if there are fewer variables than words it STOPS splitting early, and puts the rest of the string into the last variable, not having consumed the remainder's $IFS characters. Meaning read A B <<< "a b c" will preserve a run of multiple spaces or whatever space/tab/space combo was between "b c" when assigning to $B. Which is NOT the "split and glue back together with the first $IFS character" logic of "$*" nor the "glue back together with specifically space regardless of what IFS says" behavior I implemented SEMI_IFS for in "eval" and "case"...

The problem is, my function that does all this work is expand_arg_nobrace() which is already taking six arguments, the last two of which are usually zero. I'm reluctant to add a third "usually zero" argument, especially since the last one that's currently there is "long *measure" which seems like it could be repurposed, but what it currently does is "set it to a character to search for a bit like $IFS but this one's a hard stop where you write the offset at which you found this character into *measure and return early", which is used to reliably find the semicolons in ((math;moremath;evenmoremath)) regardless of quoting and ${thingy#$((blah))} nesting levels. Totally different from "set NO_SPLIT in flags after argument 3".

(I also hate $IFS as a concept, and spent months wrapping my head around the details of what does and doesn't become a separate argument with "" and ""$EMPTY and """$*" when there are no arguments, and how x() { echo $#;}; x """" should print 1 not 2... and looking back through this code I remember that there ARE a bunch of special cases but not WHAT they all were, which is why I made so many tests/sh.test cases for it, and I dowanna touch this forest of nested horror that laboriously jenga-style made them all work, but I have to find exactly the right place to drop in a state change with no state inappropriately crossing the change point... and I dowanna.)

Setting *measure to a negative number is uncomfortably magic.

Adding an IFS flag to change the meaning of *measure would let me avoid changing all the callers to add another zero, but it has a naming problem: the common prefix of almost all the existing flags is NO_ as in NO_SPLIT and NO_IFS to disable something expand_arg() would otherwise be doing. (Which isn't great either, but EXPAND_NO_SPLIT is too long when you're or-ing together five of them). I already violated that with SEMI_IFS and dowanna do so again or I've just got a bunch of random #defines floating around the code.

I made a quick stab at adding an expand_arg_nobrace() wrapper calling expand_arg_nobrace_raw(). After all the original API is expand_arg() which handles ab{c,d} processing and then passes on to expand_arg_nobrace(). But two of the calls ending in double zeroes are recursive calls within expand_arg_nobrace() itself, and I'd need to provide a function prototype (with seven complex arguments to keep in sync if anything changes) to let those two call each other, which is exactly the kind of nonsense I'm trying to avoid with the ever-widening API on this sucker as I find new corner cases.


February 9, 2023

Of course make tests breaks on gentoo, why wouldn't it?


February 7, 2023

Fixed tar yet again. Here's hoping it sticks this time.

I am now researching mkisofs implementation. (I actually made the mythical "bootable hard drive image" one of the pages said they can't find an example of, back in the yellowbox days. Took some fiddling to get the machine's BIOS to accept it, what with all the legacy hard drive types. Probably why it didn't get used as widely as "floppy image", which had a lot less variants.)


February 6, 2023

I'm amused by Hyrum's Law. (It's the API version of "with enough eyeballs all bugs are shallow". With enough users, all observable behaviors of your system become "the API" and changing it breaks somebody. That's why my spec for toysh is "what bash does" and then run a bunch of existing scripts through it to see what breaks.)

While emailing somebody I checked to see if I'm still in the first page of Google results for "patch penguin", and the answer is "no, but creepy".

The minor discomfort is Google search no longer produces a paged interface, it's one of those perpetual scroll things that loads more as you scroll down. I didn't ask for this and actively don't want it, but they wanna be fancy javascript nonsense. (If I switch off javascript for google.com will I get pages back?)

The MAJOR discomfort is I scrolled down something like a hundred entries and it's ALL ADVERTISEMENTS. Every entry is a product and the google summary gives a price in dollars at the bottom, and half of them say "in stock". And it's a special line that's a slightly different shade of grey than the other lines: Google has a "product" category in the search and is showing me almost entirely products. I don't want products. I confirmed I had NOT selected the "shopping" tab, but 2023 Google weights shopping pretty much to the exclusion of all else. I can't EXCLUDE "shopping" from my search, because they don't want me to and I'm "the product not the customer"...

(Um, since Google is apparently determined to become useless now: the Charged Vacuum Emboitment mentioned above was space technobabble the Tardis passed through in the 4th doctor episode Full Circle to wind up in "E-space" instead of the normal universe. "Emboitment" is apparently a mangled french word meaning something like "to put in a box". All TLAs have bad collisions in the modern world, and my brain tends to lock onto the one I encountered first. Mitre is as far as I can tell an NSA front organization, so I guess it's nice the US government is collecting and publishing security vulnerabilities, but I'm always confused when something I do is considered important enough to mention? But I guess I should finish the httpd Common Gateway Interface functionality.)


February 5, 2023

Wait... really? There's a toybox CVE for httpd? (Yeah I remember fixing that bug, but was it really worth a Charged Vacuum Emboitment?)

So I came up with an fpfix() function that does the fseek(ftell(fp)) thing (and should PROBABLY also do the fcntl(O_DIRECT) thing with maybe a stat() determining which is appropriate), and I inserted a call to it in both save_redirect() and unredirect() doing if (fd<3) fpfix((FILE *[]){stdin,stdout,stderr}[fd]); and then ripped it back out again because... that's not right. The extra syscalls are expensive if they'll happen a lot, so I want to make sure they happen at only the necessary places. (Yes, it's lifetime rules again. No, garbage collection wouldn't help. Which made me start wondering how rust or go intend to apply to nommu systems until I got a headache and had to walk away for a bit.)
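
For reference, the fpfix() in question is basically this (a sketch; whether a given libc's fseek() actually bothers to lseek() the underlying fd is the question from the February 3rd entry below):

  #include <stdio.h>

  // resync: seek to where the FILE * thinks it is, discarding readahead
  // so the underlying file descriptor's offset matches the logical one
  static void fpfix(FILE *fp)
  {
    long pos = ftell(fp);

    if (pos != -1) fseek(fp, pos, SEEK_SET);
  }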

I'm 95% certain we ONLY care about "fixing" stdin, because that's what uses getline(). For everything else toysh is using file descriptors, so our stdout and stderr global FILE * instances should never _get_ out of sync if we just avoid ever using them. (Is THIS why each dprintf() call on glibc does a gratuitous lseek(fd, 0, SEEK_CUR) before doing a write() of the appropriate data? It's mildly annoying that dprintf() on glibc has such noisy strace output, and you'd think that fileno() would do it too if so, but no...)

I can only think of two actual stdin consumers in toysh: get_next_line() and the "read" builtin, which can each eat extra data because of FILE * readahead, and then when we run child processes those can inherit a gap. So there are three cases in need of potential adjustment, but the further complication is there are two TYPES of adjustment: seekable file descriptors can get fixed up with seek after the fact, but if it's a pipe we want to set O_DIRECT preferably before the producer _writes_ data into it (because once the pipe buffer's collated we've lost the blocking information).

So toysh needs to fixup each pipe() it creates, and _maybe_ sh_main() should fixup the stdin we inherit? Hmmm, what about "read < /dev/tty"? That says we SHOULD set O_DIRECT on nonseekable save_redirect() input? (Or maybe expand_redirect() should do it when opening the redirect file? Grrr...) I really want an elegant design chokepoint everything has to go through rather than trying to whack-a-mole every entrance and exit. Three consumers of the data, two types of fixup, SHOULD be six total cases, but pipe() vs < /dev/tty isn't in that paradigm.

Ok, toysh needs to O_DIRECT incoming pipe inputs as soon as possible (so sh_main() and expand_redir()), and also set that flag on outgoing pipes at creation time before we write anything to them. The seekable kind may need to be set back to the right place when we're done reading them, which does NOT belong in get_next_line() but instead should go at the start of run_line() so multiline reads get optimized (line continuations don't have to re-read the input, so scripts can load chunks), and also on the exit path of each read builtin (because we assume we're going to run at least one command on what we read).

Alright, that SEEMS to make sense...

I'm trying to read through the musl source to see what its getline() block read size is... it really looks like that's doing single byte reads too? src/stdio/getdelim.c is repeatedly calling getc_unlocked(f) and getc_unlocked.c is this strange little wrapper function doing int (getc_unlocked)(FILE *f) { return getc_unlocked(f); } which is explained by src/internal/stdio_impl.h which has #define getc_unlocked(f) ( ((f)->rpos < (f)->rend) ? *(f)->rpos++ : __uflow((f)) ) (and thus the parentheses around (getc_unlocked) isn't some weird function pointer syntax, it's so the symbol explicitly has no arguments and thus the macro preprocessor doesn't recognize it as the macro defined to take arguments... and then the body DOES expand to that macro. Me, I would have PUT A COMMENT THERE.) Anyway, this __uflow(f) is in src/stdio/__uflow.c (yes with two underscores on the filename) which is basically doing f->read(f, &c, 1) except... that read() function pointer takes a FILE * as its first argument, not a file descriptor. Where is the function pointer set? Well one of them is the function __stdio_read() which... is doing crazy things with an iovec that I am NOT puzzling through right now ("len - !!f->buf_size" again needs a COMMENT) but it looks like it might be reading buf_size, whatever that is.

I no longer care about the numbers. (If I need to know I can run a test program under strace.) I very vaguely remember from years ago it was 512 in at least some cases? Anyway yes, it can maybe read ahead with block size big enough to reasonably amortize the system call overhead. And thus needs some serious unget to pass the file descriptor to other users. No, I am not trying to look at bionic just now, not after that.


February 4, 2023

Oh goddess fsetpos() is a stupid API, isn't it? The classic ftell() returns long which is signed 32 bits on 32 bit systems, and files are bigger than that these days, but instead of doing some sort of lftell() which returns long long (and an lfseek that accepts it) they invented a new gratuitous fpos_t type which they pretend isn't just a typedef for "long long", and then created two new libc functions with completely unrelated names: int fgetpos(FILE *fp, fpos_t *pos) and int fsetpos(FILE *fp, const fpos_t *pos), both of which are FUCKING STUPID.

WHY does fsetpos() take a POINTER to pos? If you just passed it the value, you wouldn't need to say "const" would you? Yes the get function that WRITES the value is taking a pointer, because they decided these need to return zero or nonzero to indicate error instead of returning -1 when there's an error like the previous one did (since that's not a valid file position), which is itself stupid. (The old way was smarter.) But the set function has ZERO REASON for its pos argument to be a pointer. Feed it the value, then you don't need to annotate it with "restrict" or "auto" or "static" or anything because IT IS A NORMAL ARGUMENT. (Symmetry is not an argument here, the functions DO DIFFERENT THINGS. You don't printf("%d", &i) because %n can write to i and thus needs a pointer, therefore the arguments should ALL be pointers. That would be INSANE.)
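
To make the complaint concrete (stock C, nothing hypothetical here):

  #include <stdio.h>

  void demo(FILE *fp)
  {
    fpos_t pos;

    fgetpos(fp, &pos);  // writes through the pointer, so fine
    fsetpos(fp, &pos);  // only ever reads it: could have been by value
  }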

The C++ clowns who took over C development make me sad. Ken and Dennis and Doug McIlroy and Brian Kernighan were very smart. The people they handed off to... not so much. (I did NOT point out that gnu would have made rm -rf be "filesystem-modifier remove --no-prompt --recurse-into-dirs-newer-than=all --ignore-read-only", and that unix was all about individual commands that "do one thing and do it well" and connecting commands with pipes instead of "git subcommand" or "ip subcommand" or "systemd subcommand" or...)

Simple systems survive. Increasing complexity eventually collapses under its own weight. Alas, "this too shall pass" does not usually do so on timescales I get to personally benefit from. There are a lot of "marsupial rat" versions of unix out there (including the 8 zillion posix RTOS variants) because it _works_. Linux wandering away from unix says bad things about LINUX, not about unix.

You can get a full understanding of a unix RTOS in a couple years, although xv6 sadly has the minix problem. (Ken Thompson taught his working Unix system to a generation of grad students who created BSD from it, but ivory tower academics zealously guard their abstract teaching tools from being fouled by any feedback from real world use: patches decidedly unwelcome.)

Which is odd because a complete course on something like vxworks could easily happen in high school, it's CLEVER but not that big and not that complicated, and it's a multitasking posix system with the standard bells and whistles. (NFS over USB? Out of the box, and fits comfortably in 2 megabytes...) Not remotely unique either, that one's just 36 years old and still going so it's easy to talk about. You'd think Linux would have knocked out all the proprietary unixes, but Linux is a PIG that hasn't fit comfortably in 2 megabytes RAM since the 1990s.

Yes it's entirely possible to come up with a brand new replacement paradigm, but it would have to be equally simple and elegant to persist nearly as long. Java/JavaOS tried 20 years ago (back when I taught classes in it at the local community college), but it was an uphill battle even before Sun trashed that quite thoroughly. And then oratroll happened: the other problem with Java was IP entanglements. Technology advances when patents expire, not when they're granted. Unix escaped AT&T early and laboriously purged itself of lingering corporate taint in the early 90's. Anything trying to replace unix has to reckon with late stage capitalism's relentless embrace-extend-extinguish clearcutting and strip mining. The settlers come in and find a carefully curated land with a bounty of buffalo and passenger pigeons and american chestnuts, and all of it's dead and gone within a few decades. The descendants of britain's imperial capitalism do the same thing to any resource that can't defend itself from rapacious unsustainable exploitation as they did to their own people before metastasizing into a global empire, and they are 100% convinced that ideas are property. The livejournal->myspace->twitter->mastodon cycle is about communities as property being embraced, extended, and extinguished, their members fleeing to a new territory the would-be owners haven't conquered yet. France solved this problem with guillotines.

As SCO proved, there's no money in suing modern Unix. (The Mormon activist behind the lawsuit still managed to take advantage of Novell's founder's descent into alzheimer's to elder abuse away all his money and use it to make the handmaid's tale a reality, eventually achieving success under the Trump administration, a misogyny the octogenarian democrats are happily complicit in sustaining to this day.)

Yes this is a cultural thing, the native americans who were here for 36,000 years before the white man came terraformed the place to be full of food you'd just reach out and pick. They modified their environment to make hunting and gathering _easy_, and were also a lot cleaner than europeans. (The ubiquitous "road dust" that medieval europeans brushed off their cloaks was powdered horse manure, which is a health hazard even with modern sanitation, and don't get me started on the cows and pigs and chickens and it somehow managed to be even worse in the cities...) The highly contagious European settlers who came here and killed almost everyone they met (Start watching this charlie brown thanksgiving episode at 18 minutes and 10 seconds, it's educational) didn't realize they were wandering through the equivalent of Kew Gardens, they thought it was wild and that nobody needed to maintain it, and smashed up enormous salmon runs and screwed up controlled burns and just made a mess of the place. Capitalism has ALWAYS been unsustainable. It's just that "expanding until you eat the whole world" was a viable strategy until quite recently, when capitalism predictably ran out of world.

This is why the GOP wants to ban "critical race theory", by the way. When even 1960's Charlie Brown episodes go "we took this land by literal genocide"... the German nazi party literally sent study teams to america in the 1930s to learn how to codify racism in law and get away with mass murder, in response to which president Roosevelt put Japanese americans into american concentration camps, which they could only escape by joining the army to fight in the war. Today we call "plantation owners" billionaires. Might want to maintain some awareness of this general cultural context.


February 3, 2023

Darn it, fseek() is underspecified. If I lseek() on a file descriptor I know what happens, and what error conditions to check for if the fd isn't seekable. But if I fseek() back a few bytes, is it doing an lseek() on the underlying file descriptor or just adjusting the buffer in the FILE * object? If I fseek() on something that isn't seekable does it cause a problem for future reads?

I just fixed head.c, but toysh's read builtin also needs to put back extra data it read for the corresponding test to work right, and lseek(fileno(FILE)) would leave the FILE * readahead buffer with leftover trash in it, so in THEORY I want to do fseek() but in practice I dunno how much I can trust it? (More debris from the C specification people pretending file descriptors don't exist so they don't need to interact with them, and posix refusing to go far enough specifying the interaction.) Honestly, "fseek() shall fail... IF the call to fseek() causes an underlying lseek() and [error happens]" because calling fseek() is by no means guaranteed to cause an actual lseek() to update system status. (Grr, do an fseek() AND lseek(fileno(FILE)) maybe? I'm not convinced this is BETTER than just doing single byte reads of the input so we never get ahead...)

Sigh, time to read multiple libc implementations...

Ok, from musl and bionic it LOOKS like fseek() is generally implemented as a wrapper around lseek that flushes and drops the FILE * internal buffer data when the seek works, and the ambivalence about whether or not it actually does that is because fmemopen() and friends exist, so some FILE * objects AREN'T a wrapper around a file descriptor. And those are weird, but I don't have to care about them here.

Ha! If I feed the O_DIRECT flag to pipe(2) then in THEORY that prevents multiple writes from being collated in the pipe buffer, meaning "while true; do echo $((++x)); done | while read i; do echo $i; done" shouldn't skip any numbers even if it creates and destroys a separate FILE * each time through. (Which it still shouldn't for stdin/out/err, but I need to throw in whatever the read equivalent of a fflush() is each time we redirect stdin.)
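
Sketched in C (the function name's made up, and per the grumbling two paragraphs down glibc needs its magic #define to admit these symbols exist):

  #define _GNU_SOURCE  // glibc hides pipe2() and O_DIRECT without this
  #include <fcntl.h>
  #include <unistd.h>

  // packet mode (Linux 3.4+): each write() stays a discrete packet
  // instead of being collated with its neighbors in the pipe buffer
  int make_packet_pipe(int fds[2])
  {
    return pipe2(fds, O_DIRECT);
  }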

Hmmm. There's a gratuitous artificial limitation on fcntl(F_GETFD/F_SETFD) which ONLY lets it change FD_CLOEXEC and NOTHING ELSE. Why even have the API then?

Wow, glibc is truly craptacular. If I go over to my freebsd-13 image and include unistd.h and fcntl.h and do pipe2(fds, O_DIRECT); it works fine. And it works fine built with musl-libc too. In bionic, they have O_DIRECT but not pipe2 because their unistd.h has an inexplicable #ifdef IA_IA_STALLMAN_FTAGH around the prototype. (And I still haven't figured out how to #ifdef for the presence of a function prototype.) But if I do that on glibc it complains about pipe2 _and_ O_DIRECT both failing to be exported from the header files I included without #defining about how RMS sleeps in R'lyeh. Guys: pipe2() was introduced in 2008 and O_DIRECT has been in Linux for more than 20 years (and grew its pipe2 meaning in Linux 3.4, released May 2012); it is a Linux system call, not a gnu thing.

Linux is not and never has been part of the gnu project, and RMS explicitly objected to the existence of Linux before he switched to trying to take credit for it, and yes his explanation at that link is a big lie because Linux forked off minix not gnu, which is why the early development was all done on comp.os.minix and he had a famous design argument with Minix's creator (when said professor returned from summer break) who kicked him off minix's usenet newsgroup and made him start his own mailing list. I collected some interesting posts from the first couple years on my history mirror: note the COMPLETE lack of Stallman or FSF participation in any of it, and if you boot 0.0.1 under an emulator, the userspace ain't gnu either. Stallman was 100% talking out of his ass: Linux was inspired by (and developed under) Minix with the help of printed SunOS manuals in Torvalds' university library, and it incorporated a bunch of the BSD work going on at the time. The gnu project was one of MANY unix clones happening in the wake of the 1983 Apple vs Franklin decision extending copyright to cover binaries and inspiring AT&T to try to close and commercialize Unix after 15 years of de facto open source development (and the FIRST full Unix clone shipped in 1980). By the time Linux happened, the GNU "project" had been spinning its wheels for eight years. When Linus's 1991 announcement said it WOULDN'T be like gnu, he was MOCKING WIDELY KNOWN VAPORWARE, like a game developer referencing Duke Nukem Forever or Daikatana.

Anyway, the point is the glibc developers have had PLENTY OF TIME to get these symbols into the darn userspace headers, and the only reason they haven't is the same reason Stallman tries to take credit for Linux, which has led to bad blood in both directions. (Stallman also tries to take credit for the existence of FreeBSD, but they just point and laugh at him. He had nothing to do with Wikipedia or project gutenberg either. The term "Freeware" was invented by Andrew Fluegelman years before Stallman's GNU announcement. Magazines like Compute!'s Gazette had BASIC listings in the back every month dating back to the 1970s. Dude can shut up and sit down already, that sexist privileged white male Boomer has Elon Musk levels of taking credit for other people's work going on, and needs to just stop.)

Aha! There's a SECOND fcntl(F_GETFL/F_SETFL) API which CAN toggle O_DIRECT. That's just _sad_, but sure. Assuming I can reliably beat a definition of O_DIRECT out of the headers, which I can't really #ifdef/#define myself because it varies by architecture. But I can get that from everything except glibc, and maybe I just don't care about it working with glibc? There's only so persistently stupid you get to be before I leave you behind. Define it to zero when glibc's broken headers don't provide it and let the call drop out: you get unreliable behavior due to a libc bug. I will not, ever, define stallman because my code is not part of the gnu project. One of its many goals is to provide an antidote to gnu.
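
So the plan's something like this (a sketch with a made-up function name, including the define-to-zero fallback just described, and pointedly NOT #defining the gnu thing):

  #include <fcntl.h>

  #ifndef O_DIRECT
  #define O_DIRECT 0  // header didn't provide: call drops out, bug is libc's
  #endif

  // flip an existing pipe into packet mode via the F_SETFL side door
  int set_packet_mode(int fd)
  {
    int flags = fcntl(fd, F_GETFL);

    return (flags == -1) ? -1 : fcntl(fd, F_SETFL, flags | O_DIRECT);
  }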

Huh, it's surprisingly easy to get derailed into half an hour of closing tabs. Something like a hundred accumulated open terminal windows in desktop 7 (email) which are mostly just "type exit, hit enter" in each one because it's some man page I was looking at or command line tests I can confirm I finished with (or "pulldown->move to another workspace" and send off to desktop 2 (toybox) or 6 (linux/qemu/mkroot, and my kvm instance running freebsd hangs out there too), a bunch of "last thing here was pushing to git" or git show $HASH, or running some simple command like "pkill -f renderer" or df /mnt (shows me what if anything is currently mounted on it) or doing math with $((123*456)), or grepping for a symbol in /usr/include or the output of something like "aptitude search thingy" (an apt-get wrapper with better syntax) where I recognize and can discard the results but switched away from that window once I had my answer. When vi is editing a file, exiting out and doing a git diff shows me whether I was browsing or actually made changes.

And lots and LOTS of "vi was editing a file and then got killed" because when you fire up vim on a file that's already being edited, it tells you the PID of the old vim instance but doesn't have an obvious way to just kill the old one and let you inherit the editing session. Instead you have to "kill PID" manually if it's still running (or search around to try to find the tab, but good luck with that), then :recover and if the file's changed write it out under a new name to see if the changes are interesting, then rm the temp file and the .file.swp and THEN you can go back and edit it normally. Wheee... If I'm feeling posh I can even go collate windows that got moved to the proper desktops (you can not only drag and reorganize tabs within a window, on xfce you can drag and drop them between terminal windows. If you haven't got a tab, open a new tab to force the tab bar to show up, then exit the new tab when it's the last one in the window.)

Heh, here's the directory where I was re-ripping some CDs (usb DVD drive still works, cdparanoia still works, most of the CDs are still in the right cases) and hitting them with flac to scp up to my website so I could download them to my phone. (Long ago I had big youtube music playlists, but youtube became 100% useless without paying. Not just two ads between each song, but interrupting longer songs in the middle to play ads. Digging out old CDs and mp3 collections it is...) Pretty sure I can rm *.wav in there, I could zap the .flac files too but eh, I'm not short of space just now. (2 terabyte ssd covers a multitude of sins. Or at least allows them to quietly accumulate.)

Here's the window where I downloaded and filed my twitter archives (both for my original account, which I then deleted, and the backup account Fade made me during all those years I refused to give @jack my phone number, which I still have but haven't posted to even once since making that archive because downloading a fresh archive wants to do 2FA through Fade's phone in Minneapolis, which is just not worth it). (I check a couple individual feeds there about as often as I remember to check Charles Stross' blog or Seanan McGuire's Tumblr. I don't have an account on either site...)

That's the EASY part of tidying one's desktop, of course. Browser tabs have gone beyond the timesink event horizon. Chrome remembering them between restarts is both a blessing and a curse, but at least "pkill -f renderer" keeps the memory usage down to a dull roar. It would be nice if it could save each inactive tab to a zip file under .chrome somewhere so that tab didn't have to reload from the website as if freshly opened whenever I get back to it, but hey. I've learned I basically never look at bookmarks again, and I _do_ periodically revisit and finish/cull old browser tabs. Not as fast as they accumulate, but still...


February 2, 2023

The ice storm has REALLY screwed up my sleep schedule. Woozy. (Couldn't work, couldn't go out, the lights were off all day, and it was stressful.) My internal clock is flashing 12, doing the whole "Too tired to focus but lying down does not result in sleep" thing...

It's hard for me to get worked up about "yoda conditions" when it's THE SAME COMPARISON. 1 == x and x == 1 are equivalent, but the one on the left can't be mistaken for/typoed into an assignment. "Correcting" everything to the one on the right because it's not "mentally comfortable" is something I'm having trouble sympathizing with? (My mental arithmetic apparently does not have "handedness". This is a thing the language has always allowed you to do, and there is a reason to do it and zero reason to do the other one. Arguing "but it's not a _strong_ reason to do it" vs having literally zero reason other than aesthetic preference... Sigh.)
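
For anyone who hasn't hit the typo in question, a minimal standalone illustration (not from any real codebase):

  #include <stdio.h>

  int main(void)
  {
    int x = 2;

    if (x == 1) puts("equal");    // intended comparison
    if (1 == x) puts("equal");    // same comparison, "yoda" order
    // if (x = 1) puts("equal");  // the typo: assigns 1 to x, always true
    // if (1 = x) puts("equal");  // same typo flipped: refuses to compile

    return 0;
  }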

Darn it, my clever "while read" combo hack in toysh has a problem.

So getline is a glacial CPU-eating slog without caching, and FILE * is the caching layer ANSI C decided to provide back in the day and (eventually) implement getline() on top of, and if you're just reading from stdin then the "read" builtin can use the stdin global constant (as get_next_line() is currently doing), and my THEORY was that for anything else (either read -u or read < source) I could fdopen() a FILE * object and cache it in the struct sh_blockstack instance for the enclosing loop (adding a field to the control flow data), and thus not lose cached readahead buffer data by destroying and recreating the FILE * wrapper each time the read command ran and exited.

BUT: read -u $VARIABLE is not guaranteed to be the SAME filehandle each time through the loop. I guess I can call fileno() on the FILE * and compare the fd we're trying to operate on, and tear down the old one and replace it when they change it?
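
Something like this hypothetical helper (not actual toysh code; it's the fileno() comparison idea above, and the fclose() tearing down the old fd is itself one of the lifetime wrinkles):

  #include <stdio.h>

  // Cache one FILE * per enclosing loop, rebuilt when read -u hands us a
  // different fd than the one the cached wrapper was created around.
  FILE *loop_file(FILE **cache, int fd)
  {
    if (*cache && fileno(*cache) != fd) {
      fclose(*cache);  // also closes the old fd, which may itself be wrong
      *cache = 0;
    }
    if (!*cache) *cache = fdopen(fd, "r");

    return *cache;
  }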

while read -u 37 i; do for x in {1..10}; do read -u 37 j k l; echo $i $j $k $l; done; done

I can come up with a bunch of test cases I don't care about OPTIMIZING, but I'd prefer they didn't actively break. (But why would anyone do that? "for i in a b c d; do read a b c < $i; do_stuff; done" could happen. Hmmm, but then it's doing an open/close on the file object in the read context, so caching the FILE * object in the flow control would be wrong. Grrr. Lifetime rules!)

Hmmm... alright, there are two cases here: read from a tty and read from a file. In the tty case, the input (should) come in chunked so the block reads are short and shouldn't readahead much anyway. (If you've ever typed stuff before a shell was ready and the input got lost... that. Password prompts are notorious for it, but it happens elsewhere.)

The other case is "while read... < file.txt" where it will very much read all the way ahead, and if you ever discard extra buffer you deterministically lose bits of the file. Which says (oh goddess) I need a reference counted cache of FILE * wrappers for file descriptors >2 (stdin, stdout, stderr have persistent globals) but bump the reference increment/decrement to the enclosing loop block object (if any), which STILL won't work with "while read x; do command that also reads input $x; done < file.txt" because the FILE * will read ahead and then pass the filehandle to the command which starts reading after whatever the FILE * ate.

$ while read i; do echo =$i; head -n 1; done <<< $'one\ntwo\nthree\nfour\nfive'
=one
two
=three
four
=five

How. HOW? Is it doing single byte reads from input?

$ echo -e 'one\ntwo\nthree\nfour\nfive' | while read i; do echo =$i; head -n 1; done
=one
two

Ah. It gets it right when the input is seekable. Of course.

$ while read i; do echo =$i; toybox head -n 1; done <<< $'one\ntwo\nthree\nfour\nfive'
=one
two

And it's at least partly "head" doing extra work, and toybox is getting it wrong. (New test!)

RIGHT.
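
The "extra work" is presumably the classic cooperating-reader trick, sketched here under that assumption: block reads plus a backwards lseek() on seekable input, single byte reads when the input is a pipe or tty so nothing past the newline ever leaves the kernel:

  #include <unistd.h>

  // Read one line from fd without consuming anything past the newline.
  ssize_t read_line(int fd, char *buf, size_t max)
  {
    ssize_t got, i = 0;

    if (lseek(fd, 0, SEEK_CUR) != -1) {         // seekable: read big, rewind
      if ((got = read(fd, buf, max)) <= 0) return got;
      while (i < got) if (buf[i++] == '\n') break;
      if (i < got) lseek(fd, i-got, SEEK_CUR);  // put back the unused bytes

      return i;
    }
    while (i < (ssize_t)max && read(fd, buf+i, 1) == 1)  // pipe: 1 byte reads
      if (buf[i++] == '\n') break;

    return i;
  }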

And this says that FILE * is generically borked in the presence of fork/exec _anyway_, because the inheritor of our fd 0 won't see the data read ahead into the FILE *stdin buffer. I'm more familiar with this problem as it relates to stdout flushing, because glibc's gotten that very wrong before, and that was just trying to make flush on exit() reliable, let alone exec without exit.

The two big problems in computer science REMAIN naming things, cache invalidation, and off by one errors.


February 1, 2023

For my birthday, an ice storm knocked out the power from before I woke up in the morning until sometime after 10pm. I had some battery in my laptop, but didn't fire it up because if it drains all the way I lose all my open windows, and with more freezing rain predicted tonight I didn't know if power would be restored before thursday. (Plus our geriatric cat's heating pad was off, so the cat sat on me for hours instead.)

Luckily it got just cold enough to sleet instead of more freezing rain. None of the trees that could have collapsed on my house did, although two on the block dropped some quite big chunks, and one such tree has drooped significantly and is resting half its branches on our roof, but in a bend-not-break sort of way. (One around the corner has bent basically in half and is resting its branches on the _ground_, which I find impressive. Pecans are survivors.)

So yeah, not a productive day, but way better than it could have been. No flood damage, no hurricane scouring the paint off a corner of the house...

Sigh. The very nice glasses I got in Japan shortly before the pandemic are finally wearing out. The lenses were outright scratchproof for a good three years, but the coating's weathered enough they're starting to scratch. They've been WAY more durable than anything I got from Zenni, and I dunno whether Zenni's still functional at all with that whole "outsource to china" strategy meets china's covid lockdowns, the container pileup, and now wolf warrior diplomacy and reshoring? (I didn't get my prescription checked in Japan and instead handed them an old pair of glasses to copy the prescription from, and I've passed them off as "reading glasses" ever since. That was intentional: I'm not driving so I care more about reading up close for long periods, and glasses that focus more naturally at that length cause less eyestrain.)

I _have_ newer/stronger glasses somewhere, but about 5 years ago I worked out that my eyes are adjusting to my normal usage patterns (staring at up-close things for hours at a time), and the whole reason my vision sucks is years of a correct-and-adapt cycle I probably could have just avoided if I hadn't been reading comic books all morning before the school eye test back on Kwaj. I'd never needed glasses before, but the roofline was a touch blurry... because my eyes took a couple hours to swing back to looking at far away stuff. I'm a lot older so it takes my eyes a lot longer to move their overton window, but even today it still happens: if I stop wearing glasses for 8 hours or so, far away things are WAY sharper when I finally do put them back on. I just... hardly ever do that? No phone, no lights, no motorcars, not a single luxury... Sometimes I take them off on long walks to the table while listening to podcasts, but that's about it.


January 31, 2023

Honestly, WHY does qemu keep gratuitously changing its user interfaces? Once again the old one was simple and straightforward, the new one is insane, and removing the old simple API serves no obvious purpose. They broke tcp forwarding, they broke -hda, they broke -bootp... Stoppit.


January 30, 2023

It occurs to me I can test the lib/passwd.c rewrite under a debootstrap chroot instead of waiting for mkroot, because it's just twiddling files rather than poking at syscalls or /proc the way route and insmod do to actually change the host kernel's system state.

In theory, it's "debootstrap beowulf beowulf" (for devuan anyway) and then when that's finished copy a stripped down version of mkroot's "init" script in there and sudo env -i USER=root TERM=linux SHELL=/bin/bash LANG=$LANG PATH=/bin:/sbin:/usr/bin:/usr/sbin unshare -Cimnpuf chroot beowulf /init and... in PRACTICE it's being stroppy. I dealt with this for Jeff some months back, but apparently didn't blog about it enough, and can't find my notes? Hmmm... I remember tracking down a weird bug involving accidentally running the Defective Annoying SHell instead of bash, hence the SHELL= export there, and that's the kind of thing I WOULD have blogged about, but no?

I might have tweeted about it, in which case it's lost to history because of the muskrat's midlife crisis. (For his quarter life crisis he bought a company that makes shiny red sports cars. The bald Amazon billionaire bought a newspaper, the south african emerald brat tried to pretend he wasn't copying him by instead buying the latest iteration of aol/livejournal/myspace. Because SpaceX clearly isn't in a dick measuring contest with Blue Origin. A company named after the X-prize, which he lost -- Paul Allen sponsored Burt Rutan to win -- is clearly NOT about competition and ego, it's an entirely original thing that emerged fully formed from his very large brain, which is in no way a cry for help.)


January 29, 2023

Alright, FIX WHAT'S THERE in dirtree. BREADTH traversal means dirtree_recurse() needs to iterate through the child list of stored entries (if any), which calls handle_callback() which frees the node when the callback didn't return DIRTREE_SAVE. The problem is, we're recursing through that list and free(node) doesn't remove it from the list. We're only told AFTERWARDS whether or not it saved it (did handle_callback return a pointer or NULL). So I need to fetch the next entry _before_ calling handle_callback so we can iterate without read-after-free list traversal, but I need to update and advance the saved-node pointer _after_ calling handle_callback, making sure it always points to valid memory.
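
In code, that fetch-next-before-callback shape is roughly this (a sketch: toybox's real struct dirtree has more fields, and the real dirtree_handle_callback() signature is described below):

  struct dirtree { struct dirtree *next, *child; };  // simplified stand-in

  // Frees the node unless the callback said DIRTREE_SAVE; returns the node
  // if kept, NULL if freed (matching the description above).
  struct dirtree *handle_callback(struct dirtree *node,
                                  int (*callback)(struct dirtree *));

  void iterate_children(struct dirtree *parent,
                        int (*callback)(struct dirtree *))
  {
    struct dirtree **prev = &parent->child, *node = parent->child, *next;

    while (node) {
      next = node->next;              // grab BEFORE the callback can free node
      if (handle_callback(node, callback)) prev = &node->next;  // kept
      else *prev = next;              // freed: unlink without touching node
      node = next;
    }
  }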

Dear C++ developers who have hijacked gcc development:

In file included from ./toys.h:69,
                 from lib/dirtree.c:6:
lib/dirtree.c: In function 'dirtree_recurse':
./lib/lib.h:71:35: error: label 'done' used but not defined
 #define DIRTREE_ABORTVAL ((struct dirtree *)1)
                                   ^~~~~~~
lib/dirtree.c:174:21: note: in expansion of macro 'DIRTREE_ABORTVAL'
     else if (new == DIRTREE_ABORTVAL) goto done;
                     ^~~~~~~~~~~~~~~~
lib/dirtree.c:154:18: warning: unused variable 'entry' [-Wunused-variable]
   struct dirent *entry;

Bravo on the warning and error message generation. Exactly what I would expect from people who think C++ is a good idea. (And yes, that is a single processor build with no output interleaving. I double-checked. And yes, those were the first output messages before it had a chance to get itself good and confused, which it did and complained just as uselessly for quite a while after that. For the record, I had an extra } on line 177, a few lines AFTER all that nonsense. The compiler was no help whatsoever in finding it.)

Ok, got tar --sort checked in. It uses -s as its short option which is a bit questionable (as far as I can tell the gnu/dammit one has -s produce the behavior it was already _doing_ for extract and throws an error if you try to use it with create: bravo guys), and my --sort can take an optional =thingy argument for compatibility but only implements sort by name. (Again, there's no "rm -r --switch-off-r" so --sort=none seems useless, and --sort=inode is a micro-optimization for 1980s vax systems without disk cache? It claims a performance improvement but extract ain't gonna care (it's not USING the old inodes) and create has to read all the directory entries in order and then do a second pass to open them when it sorts ANYTHING, and then using inode number as a proxy for disk layout is optimizing seek time on uncached spinning disks, which also assumes they're regularly defragmented in a way that doesn't get the file locations out of sync with the inodes AND that the disk was basically empty when all the files were created so the on-disk file locations correspond to the inode numbers, AND a filesystem that's allocating inodes sequentially instead of using them as hash values... seriously, this was a marginal idea in 1989, and trying to do it on a VM using virtfs to talk to a host storing data in btrfs is just NONSENSE.)

The request was just for generating stable tarballs. I'm a little "eh" about mine vs gnu/dammit producing different output because I'm using strcmp() and the FSF loons are probably listening to the locale information and doing the same "upper case sorts mixed in with lowercase" nonsense that forces everybody to go LC_ALL=c before calling 'sort' out of the host path, but I can't control that and "stable produced with the same tool" is presumably the goal here.
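
The difference comes down to which comparator sorts the names, i.e. (illustrative, not either implementation's actual code):

  #include <string.h>

  // Bytewise locale-independent order, what strcmp() gives you:
  int cmp_bytes(const void *a, const void *b)
  {
    return strcmp(*(char *const *)a, *(char *const *)b);
  }

  // LC_COLLATE-honoring order, which interleaves upper and lower case in
  // most locales (hence everybody's LC_ALL=c workaround for host sort):
  int cmp_locale(const void *a, const void *b)
  {
    return strcoll(*(char *const *)a, *(char *const *)b);
  }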

Yes, the test I added for --sort is not using "professional" names. No, I'm not cleaning it up to look presentable. Possibly I should have left sed as it was and let the culture catch back up...


January 28, 2023

Grrr, the design of dirtree.c isn't right. And I've known it isn't right, but it's hard to GET right. There are THREE interlocking functions (dirtree_add_node(), dirtree_recurse(), dirtree_handle_callback()), plus a fourth wrapper function dirtree_read() you generally start out by calling, and that's way too complicated.

The job of dirtree_add_node() is to stat a directory entry and populate a struct dirtree instance from it, which is fine. That's good granularity. That's the only one of the lot that ISN'T crazy, although possibly that assumption is what needs to change to let me fix everything...

When each dirtree instance gets created a callback function can happen, with behavior that happens in response to that callback's return code. That's what dirtree_handle_callback() does: you feed it a dirtree instance and the callback function, and it calls one on the other and responds to its return code. Possibly dirtree_add_node() could just take the callback as another argument... except what I was trying to avoid was recursing into subdirectories causing the function to recurse too. I don't want NOMMU systems with tiny unexpandable stacks to have unnecessarily limited directory traversal depth. Although I don't think I've got that right NOW either, so...

The dirtree_recurse() function handles recursion into subdirectories. Badly. Right now it opens a filehandle at each level to use the openat() family of functions, meaning directory traversal depth is limited by number of filehandles a process can open simultaneously. Instead I need to traverse ".." from the directory I'm in to get back to the parent directory, and then compare the saved dev/ino pair in the cached stat structure to see if that's the same node, and if not traverse back down from the top again. (And if THAT doesn't work, prune the traversal. That's "mv a subdir while archiving" levels of Don't Do That. SECDED memory falls back to DETECTING an error it can't correct, quite possibly this is xexit() time.)
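
The ".." ascent with the dev/ino sanity check could look something like this (a hypothetical helper, not current dirtree.c code):

  #include <fcntl.h>
  #include <sys/stat.h>
  #include <unistd.h>

  // Climb to the parent directory and confirm it's the one we descended
  // from; -1 means the tree moved out from under us, and the caller either
  // re-walks from the top or prunes the traversal.
  int ascend(int dirfd, struct stat *saved_parent)
  {
    struct stat st;
    int fd = openat(dirfd, "..", O_RDONLY|O_DIRECTORY|O_CLOEXEC);

    if (fd == -1) return -1;
    if (fstat(fd, &st) || st.st_dev != saved_parent->st_dev
        || st.st_ino != saved_parent->st_ino)
    {
      close(fd);

      return -1;
    }

    return fd;
  }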

The linked list of dirtree structures is less of a problem than the recursion stack depth because a linked list doesn't have to be contiguous, you can fragment that allocation all you want.

Sigh, the real outlier here is ls.c. Everything else just calls dirtree_flagread() and gets callbacks, but ls micromanages the traversal because it had weird sequencing requirements. So I need to refamiliarize myself with the ls weirdness to make sure a new cleaner dirtree implementation could provide the callbacks it needs (quite possibly it _is_ the new DIRTREE_BREADTH semantics) so I can stop exporting dirtree_recurse().

Grrr, but Elliott pinged me about a new android code freeze and I wanna get him --sort before that goes in. I should debug what's THERE instead of redesigning it, but it's REALLY hard to get the object lifetimes right with multiple functions passing stuff off between them in a loop like it is now.

I think I need two functions: dirtree_add_node() and dirtree_read() that does all the callback handling by non-recursively traversing the tree (adding/removing nodes as it goes if/when the callback says to). Hmmm, but what would the arguments be? There isn't a global "tree" object that can hold things like "flags", and I want to be able to traverse on a path _or_ under an existing struct dirtree *node... Maybe dirtree_read(char *path, int flags, function *callback) which is a wrapper for dirtree_traverse(dirtree_add_node(char *name, int flags), int flags, function *callback)... except the reason dirtree_add_node() needs the parent pointer is for parentfd due to the openat() stuff, that's why the caller can't just set it after it returns. Right...

Fiddly. Hmmm...

When I'm done all this plumbing SHOULD look so simple that it's all obvious and trivial and seems like I didn't do anything. Getting there is usually a flaming pain, and a lot of the times I DON'T and have to ship something overcomplicated, which says to ME that I'm not very good at this. Alas, the reason I _don't_ have impostor syndrome is the rest of the industry turns out, on average, to be even worse at it than me.


January 27, 2023

Trying to debug tar --sort and it's being stroppy. I'm not sure I've got the design right, which is sad for something so seemingly simple?

Sort of regretting having implemented --no-ignore-case. It's the default, just don't specify it when you don't mean it? I didn't have sort check it, and am going "eh...". (The extra code to check it is bad. Having it and NOT checking it here is bad. Grrr. NOT PICKING AT IT. I haven't figured out how to make lib/args.c gracefully handle this category and I'm trying NOT to go down a rathole of spending 3 days on the design of something relatively unimportant. Not a fan of --longopts at the best of times, and having extra options to put the behavior BACK to the default... rm -r does not have a turn-off-r-again option because it DOES NOT NEED TO.)

The gnu/dammit clowns are BAD AT UNIX. Stallman only cloned unix after ITS died because his community had collapsed under him and he wanted to hijack an existing userbase; he hated and fought unix until he was forced by circumstance to join, and was an outsider who never properly understood WHY it worked.

The old history writeup I did on this years ago didn't even MENTION Digital Equipment Corporation's Project Jupiter, which was the proposed successor to their 6-bit mainframes (the PDP-6 and PDP-10). The Jupiter prototype system was used to render part of the graphics in the 1982 disney movie Tron, but DEC pulled the plug on development in April 1983, and THAT's what caused Stallman to give up on ITS and start over cloning Unix. He'd backed the wrong horse, the hardware platform he'd inherited (after everybody else who worked on it graduated and moved on with their lives, he stuck around as a perpetual college student) died out from under it, and NOBODY ELSE CARED. He was forced to move because the university was going to unplug the old hardware and throw it away. This wasn't a decision, this was a forced REACTION. RMS was always a conservative reactionary working to prevent change, who took the smallest steps possible each time the legacy position he defended became untenable. As with all ultra-conservatives, he mistakes this for "visionary thinking" and talks himself up, but it's the same "looking back to a largely imaginary golden age" you see so much of from any other privileged old fogey complaining about kids these days.

Stallman couldn't even predict the obvious near future: 6 bit systems inevitably lost to 8 bit systems as memory got cheaper because the whole POINT had been that you could fit a third more text in a given amount of memory using 6 bits per symbol instead of 8... with glaringly obvious limitations. With only 64 combinations you just couldn't fit everything: 26 upper case characters, 26 lower case characters, and 10 digits left only TWO symbols for space and newline -- you couldn't even end sentences with a period. If you wanted ANY punctuation, you had to sacrifice digits, or make everything uppercase, and different compromises meant incompatible encodings.

The first 7 bit ASCII standard was published in 1963. With twice as many symbols there was no need to compromise -- after upper, lower, and digits half the space was still available for punctuation and control characters -- so every 8-bit system could use a compatible encoding for all documents. Gordon Moore's article describing Moore's Law was published in 1965, predicting exponential increases in memory availability for the foreseeable future. Clinging to a 6-bit system almost 20 years later (after all his classmates had already abandoned it) was head-in-the-sand levels of stubbornness on Stallman's part.

DEC had introduced its first system with 8-bit bytes (the 16-bit PDP-11) in 1970, 13 years before canceling Jupiter, and its 32-bit successor the VAX came out in 1977. In DEC's entire history it only ever sold about 700 of its 36-bit PDP-10 mainframe systems. It sold almost a _thousand_ times as many PDP-11s, and shipped a dual-processor VAX the year before canceling Jupiter.

Stallman is the exact opposite of "visionary". He's just another classically educated white male with decades of practice retroactively justifying what he's already decided to do by constructing a convincing shell of logic around his emotional motivations, and it is just as exhausting dealing with his fanboys as it is dealing with the fanboys of muskrat or jordache peterman or the ex-Resident or any of the others.

Jeff's flying back to Japan. I am jealous. But Fade made a flight reservation for me to visit her from Feb 10 to 22, so that's nice. (Her dorm apartment thingy still has the second room empty and locked, so it doesn't bother anybody if I stay more than a couple days.)


January 26, 2023

Last year I ordered a cthulamp for the desk in the bedroom (one of them "five positionable metal tentacles with a lampshade at the end of each" deals), but couldn't figure out how to assemble it properly and then wound up flying off to Fade's and finishing the contract from there. Took another stab at assembling it today and figured out what I got wrong this time (the little plastic not-washer thing with the raised inner bit was both on the wrong side of the shade AND rotated 180 degrees, so it fit perfectly but then the light bulb didn't), and WOW that desk is a nicer workspace with 5 more LED bulbs right next to it.

Finished and checked in --wildcards. Needs more tests in the test suite, but it didn't cause obvious regressions and should be enough to unblock the android kernel guys?

Implementing tar --sort next.

I tried Chloe Ting's "5 minute warmup" video.

Made it to the end this time.

Everything hurts.

(It wasn't even one of her proper EXERCISE videos. I did the WARMUP and am still in pain an hour later. It turns out slowly walking 4 miles a night 3 or 4 times a week does not exercise a wide variety of muscle groups.)


January 25, 2023

Elliott emailed me asking for a bug report if I could reproduce the adb compatibility issue, because he says the policy is the developer kit should be backwards compatible all the way back to kit kat, including ADB working. I apologized and acknowledged it's been a while since I've tried the distro version of ADB. (For file transfer I scp files to my webserver so my phone can download them, and attach stuff to myself in slack going the other way. I installed an ssh app on my phone but haven't bothered to use it in forever.)

Back when I was running Devuan Ascii, _many_ things out of the repo didn't work (llvm was too old for the packages I was trying to build, ninja was too old, I finally upgraded to Beowulf because building qemu from source demanded a newer version of python 3...) The adb in Ascii having been broken probably wasn't surprising. I got in the habit of downloading a new version of the android tools rather than trying the distro version, and haven't checked if I still NEED to in a while...

My current phone's a Pixel 3a that end-of-lifed on Android 12 (the system->update menu has a big "regular updates have ended for this device" banner, with the last one 10 months ago), so isn't exactly a moving target anymore anyway. (Yeah, I should upgrade my laptop to Devuan Chimaera, but nothing major's broken yet that I've noticed?)

At a guess, debian breaking adb is like debian breaking qemu: I always build that from source because debian's distro version never works. Even when the theoretically exact same release built from source via "./configure; make; make install" works fine.

Alright, where did I leave off with wildcards: --{no-,}wildcards{-match-slash,} --{no-,}anchored --{no-,}ignore-case, and this is why I got so distracted by trying to automate the no- prefix in the plumbing. Right, just explicitly spell out all 8 flags for now and clean it up later. What are the USERS: inclusion vs exclusion, creation vs extraction, command line arguments vs recursively encountered arguments: that's 8 combinations. No, 16 with and without case sensitivity. (This is assuming extract and test behave the same.) Each of those can have wildcards default to enabled or disabled: case sensitivity is the global default, exclusion defaults to wildcards no-anchored match-slash. Not everything can be enabled in every position, for example --wildcards does not affect command line arguments when creating an archive. (That's one of the tests I wrote back in October.)
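
As a sketch of how those knobs could map onto fnmatch() (illustrative only: tar_match() is a made-up name and toybox's filter() plumbing differs):

  #define _GNU_SOURCE    // FNM_CASEFOLD is an extension in glibc's headers
  #include <fnmatch.h>
  #include <string.h>

  // --wildcards-match-slash = let '*' eat '/' (FNM_PATHNAME off),
  // --no-anchored = also try matching after each '/',
  // --ignore-case = FNM_CASEFOLD.
  int tar_match(char *pattern, char *name, int anchored, int match_slash,
                int ignore_case)
  {
    int flags = (match_slash ? 0 : FNM_PATHNAME)
              | (ignore_case ? FNM_CASEFOLD : 0);

    if (!fnmatch(pattern, name, flags)) return 1;
    if (!anchored) while ((name = strchr(name, '/')))
      if (!fnmatch(pattern, ++name, flags)) return 1;

    return 0;
  }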

I'm also annoyed at --show-transformed-names and --show-stored-names because it should just pick one. I'm also reminded that --verbatim-files-from exists and I think that's what I'm doing already? (Need to compare with busybox...)

Sigh, it's so easy to find -K and -N and go "I could implement that" but nobody's ASKED for it and if you go down that road even ignoring crap like -n (not implementing multiple codepaths to do the same thing, thanks) and --sparse-version there's gratuitous complication like --owner-map (not the same as --group-map) and the $TAPE environment variable and twelve --exclude variants that really could be done via "find" ("find -print0 | xargs -0" covers a multitude of sins, fairly portably) and then just nuts stuff like --hard-dereference that... what's the alternative? Linux doesn't let you hardlink directories, and a file with more than one hardlink is A FILE. Would --ignore-command-error apply to the compressor or just programmatic output streams?

Busybox NOT implementing stuff for a long time is a useful data point: they got a couple decades of people poking them and going "I need this". If it didn't happen (strongly enough for them to react), that's informative.

Except I got asked (on github somewhere) to support appending: -r and -u and maybe -A? (Which appends one existing archive to another, which you don't need tar for...? I mean, it cuts off the trailing NUL blocks I guess. There's an -i option which... I don't know why that always being on would be a bad thing? Probably some historical reason...)

The existence of "lzip", "lzop", and "lzma" makes me tired. None of which are "xz". (It's like being back in the days of arj and zoo.)

Ahem: ok, back up to the motivating use case: tar --directory={intermediates_dir} --wildcards --xform='s#^.+/##x' -xf {base_modules_archive} '*.ko'

Oh yes, and with gnu/dammit tar --wildcards affects patterns AFTER it but not before it in the command line. Sequencing! Right.

Ok, wildcards can be switched on for extract but NOT for create because creation isn't doing a directory search but is opening (and recursing into) specific command line thingies so there's no comparison being done: there's no readdir() in that codepath, the open(argv[x]) either succeeds or fails. Comparisons are done for creation exclusion (while recursing?), extraction inclusion, extraction exclusion... which corresponds to toybox tar's 3 existing calls to filter() with add_to_tar() calling filter(TT.excl), and then unpack_tar() doing both filter(TT.incl) and then filter(TT.excl). Both TT.excl calls should default to --no-anchor --wildcards-match-slash but the TT.incl call shouldn't (but currently does because I only implemented one filter behavior). The man page implies incl should default to --anchored --no-wildcards --no-wildcards-match-slash...

Sigh, I can just compare my argument with the global variable to distinguish the two cases, and set the defaults that way. It's ugly, but having the caller (redundantly!) specify the defaults is also ugly, and having an extra argument to distinguish the modes when I can just test for it... Wanna get this finished and move on to the next thing.


January 24, 2023

It's been a while since I've had a significant visual migraine.

The experience is not raising any positive nostalgia.

Not a productive evening.


January 23, 2023

Checked in [LINK] the probably correct but not actually tested DIRTREE_BREADTH code (which at least didn't cause regressions in the test suite) this morning, but haven't used it to implement tar --sort yet because I still have 2/3 of --wildcards in my tree. Which is actually a half-dozen options because there's --no-wildcards-match-slash and so on.

Urgh, why is tar.c using a bare constant instead of FNM_LEADING_DIR? I did not leave myself a comment about WHICH build environment barfed on this. The fnmatch.h header is in posix but this particular constant isn't. It's unsurprisingly in glibc, it's in bionic (which says it got it from openbsd), it's in musl. Boot up freebsd-13 under kvm... that's got it too. And Zach got me a mac login... it's there as well.

Ok, is it a 7 year time horizon thing? The date on the line according to git annotate is 4 years ago, so most likely 7 years has expired by now if that was the case? (It's not a kernel thing, it's a libc thing. Annotate on musl's fnmatch.h says it's from 2011, that's a full dozen years ago.) Eh, try swapping the macro in for the constant and see who complains...

Oh wow. It's glibc that complains. It wants #define ALL_HAIL_STALLMAN to provide the constants, but on Bionic and FreeBSD and MacOS they're just there without magic #defines. And it's the same constant value everywhere. Right, #ifndef in portability.h time, maybe posix will catch up somewhere around 2040...
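
So the workaround is the usual portability.h dance (assuming the constant really is 8 everywhere it exists, which it is in glibc, musl, and bionic; glibc just hides it behind _GNU_SOURCE):

  #include <fnmatch.h>

  #ifndef FNM_LEADING_DIR
  #define FNM_LEADING_DIR 8   // match ignores any trailing "/stuff" in name
  #endif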

Yay, dreamhost fixed it. My two posts about it to the list didn't wind up in the web archive and I was all ready to take up my sword again... but it's because I sent the message and the reply to "lists@landley.net" which is not a real address. Hopefully google and archive.org will start populating again at some point.


January 22, 2023

That tar --xform test failure which only happens on musl is because musl still doesn't have regexec(REG_STARTEND). So it's just a new manifestation of a known failure, eating another round of debugging time because 10 years ago Rich explicitly refused to implement something even the BSDs have.
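
For context, what REG_STARTEND buys you (a glibc/BSD extension: pmatch[0] is an input span, so you can match inside a larger buffer without copying it out or poking a temporary NUL into it):

  #include <regex.h>

  // Returns 0 if preg matches within buf[start..end), like regexec() on a
  // NUL-terminated copy of that span but without making the copy.
  int match_span(regex_t *preg, char *buf, int start, int end)
  {
    regmatch_t m = { .rm_so = start, .rm_eo = end };

    return regexec(preg, buf, 1, &m, REG_STARTEND);
  }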

Sigh. I'm eventually either going to have to fork musl or drop support for it. I should just switch that date test back on. There are multiple "yup, musl and musl only is broken, this even works on BSD" cases already. The test suite needs a MUSL_IS_BROKEN flag on tests, or something...

A tech writer recently boggled at the pointless "undefined behavior" in C compilers written by C++ developers. And here's a rant I edited out of a post to lkml:

The C language is simple. The programs you write aren't, but the LANGUAGE is. C combines the flexibility of assembly language with the power of assembly language: it's basically a portable assembly language, with just enough abstraction between the programmer and what the hardware is actually doing that porting from x86 to arm isn't a complete rewrite. You manually allocate and free all resources (memory, files, mappings) and all sorts of stuff like endianness, alignment, and word size is directly whatever the hardware does. In C, single stepping through the resulting assembly and matching it up with what your code does isn't that unusual. I've gone looking at /proc/self/maps on a sigstop'd binary and objdump -d on the elf executable to figure out where it got to, and in C you _can_ do that.

C++... isn't that. The language is DESIGNED to hide implementation details, all that stuff about encapsulation and get/set methods and private and protected and friend and so on is about hiding stuff from the programmer. Then when implementation details leak through anyway, they try to fix everything by adding more layers (ala "boost") on top of a broken base, but that's like adding floors to a skyscraper to escape a cracked foundation. It's still static typing with static allocation (they're insanely proud of tying stuff to local variable lifetimes and claiming that's somehow equivalent to garbage collection) and it's GOING to leak implementation details left and right, so they have buckets of magic "don't do that" prohibitions which they cargo cult program off of. Most of C++ is learning what NOT to do with it.

C was simple, so C++ developers hijacked compiler development and have worked very hard for the past 15 years to fill C with hidden land mines so it can't be obviously better than C++.

C is a good language for what it does. C++ is a terrible language. The C++ developers have worked tirelessly to make C and C++ smell identical, and as a result there's a big push to replace BOTH with Rust/Go/Swift and throw the C baby out with the C++ bathwater.

Haven't heard back from dreamhost, so I've submitted ANOTHER support request:

http://lists.landley.net/robots.txt prevents Google from indexing http://lists.landley.net/pipermail/toybox-landley.net/

I did not put http://lists.landley.net/robots.txt there and cannot delete it.

The contents of http://lists.landley.net/robots.txt are:

User-agent: *
Disallow: /

Would you please delete this file, or change it to allow Google to index the site? I do not have access to it.

Here's hoping THAT is explicit enough for them to actually do something about it. Third time's the charm?


January 21, 2023

Properly reported the qemu-mips breakage. That list may be corporate, but it's not the wretched hive of scum and villainy linux-kernel's turned into, so maybe... (Yay, there is a patch, and it Worked For Me.)

So what DIRTREE_BREADTH _should_ look like is something like...

  1. The initial callback (which always happens) returns BREADTH, and the calling function populates the ->child list one level down the same way DIRTREE_SAVE would.
  2. The second callback has ->again set to DIRTREE_BREADTH, which lets you sort the children. When this one returns, it recurses into those children unless you returned DIRTREE_ABORT. This recursion frees each child if its initial callback didn't return DIRTREE_SAVE.
  3. The DIRTREE_AGAIN callback is handled normally, although the children were already freed if not SAVEd.

Hmmm, instead of checking for DIRTREE_BREADTH a lot the "populate children" loop should just pass a NULL callback while accumulating children... Sigh, I need to stress test DIRTREE_ABORT to make sure A) it returns from anywhere, B) it doesn't leak memory. Except most of my actual users don't choose the abort path, they continue on despite errors: tar, rm, cp...
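
Under those semantics a --sort style callback might look like this (hypothetical sketch using the flags above; sort_children() is an imaginary helper, and the design is still in flux):

  // (toys.h provides struct dirtree and the DIRTREE_* flags)
  static int sort_callback(struct dirtree *node)
  {
    // Initial callback: ask the plumbing to populate ->child first.
    if (!node->again) return DIRTREE_BREADTH;

    // Second callback: children populated but not yet descended into, so
    // reorder them; returning without DIRTREE_ABORT then recurses in the
    // new order.
    if (node->again & DIRTREE_BREADTH) sort_children(node);

    return 0;
  }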


January 20, 2023

We have a dishwasher again! Exact same type as last time, so it looks like nothing has changed but so much work went into this. (Ah, that old story.) The install guy set it doing an empty practice run first, but then we have so many dishes to wash...

Jeff is trying to set up an sh4 development environment so he can come up with mmu patches and send them to linux-kernel, and I've been feeding him the trail of breadcrumbs I've laid out with mdm-buildall and mkroot and so on, but even using my prebuilt binary system image tarball the network didn't work for him, and that's because I'm using an older qemu version than he is.

Building QEMU from source recently broke network support for all platforms by splitting it out into a separate package your distro has to install for you. Because obviously the ability to talk to the network is not a standard thing a VM would want to do. This now requires "libslirp". There's an existing slirp package, for the serial line internet protocols slip and ppp, which has nothing to do with libslirp that I can tell. Luckily devuan has a "beowulf-backports" repository alongside all the others, which I can add (why didn't the OS install do that?) to get this libslirp-dev package. I'm still annoyed the IBM mainframe guys who took over QEMU development when kvm displaced xen as Linux's standard "hypervisor" are suddenly demanding it, but at least I can get Jeff unblocked now.

Mainframe punched card culture should not be allowed to turn functional software into bloated "enterprise" crap: qemu-system-arm64 (ahem, I mean qemu-system-aarrcchh6644) is A HUNDRED AND TWENTY FIVE MEGABYTES. Dynamically linked! That can't be right. You can tell Fabrice Bellard moved on long ago, and was replaced by a committee.

And test_mkroot.sh says mips is still broken... because the ethernet hardware isn't binding even WITH the library installed. And that's... because an endianness "fix" broke big endian for pretty much the entire PCI bus. Sigh. Vent about it all and move on...

Ok, tangent du jour beaten back down, let's circle back to the toybox design issue I'm frowning at. What notes did I leave myself:

why are recurse and handle_callback split?
  dirtree_add_node(): clear design, yay
    - maybe add callback as argument to dirtree_add_node()?
  dirtree_handle_callback: 

stages:
  fetch dir, initial callback: returns DIRTREE_BREADTH
    fetch children, via recurse with BREADTH.
      problem: closed fd already? (don't close for BREADTH)
    breadth callback: returns DIRTREE_RECURSE
      traverse children now
        call handle_callback on each?

Which means: DIRTREE_BREADTH isn't that hard to implement, but the existing code has three functions that really seem like they shouldn't be split that way?

  • dirtree_add_node(dirtree *parent, char *name, int flags) - creates a struct dirtree from a file. Handles the flags FOLLOW, STATLESS, and SHUTUP. Returns a new node with ->parent connected but not ->child.

  • dirtree_handle_callback(dirtree *new, function *callback) - calls callback(new) and handles the return value: flags RECURSE, COMEAGAIN, SAVE, and ABORT. (And I'm trying to add BREADTH here.)

  • dirtree_recurse(dirtree *node, function *callback, int dirfd, int flags) - most of the plumbing.

One sharp edge is that handle_callback() is opening the dirfd for recurse, but then recurse is closing it, which is NOT a happy lifetime rule.

I think the reason for all this tangle in the first place is I was trying to recurse the data structure without making the FUNCTIONS recurse, so it didn't eat an unbounded amount of stack when descending into a tree of unbounded depth? (Especially nasty on nommu.) Except that pretty much means having all three of them be a single function, because otherwise they're calling back and forth between each other. Or having one function that calls the others in a loop, which isn't what it's currently doing.

In any case, "implement breadth first search" and "reorganize this to not be designed wrong" really need to be two different passes, otherwise I'm here for a while...


January 19, 2023

Ha! The dirtree.c plumbing shouldn't have separate DTA_BLAH flags for the "again" field to distinguish different types of callbacks, it should reuse the existing DIRTREE_COMEAGAIN, DIRTREE_STATLESS, and DIRTREE_BREADTH bits. (The "again" field is a char so can only hold the first flags, but I can reorder the DIRTREE flag list as necessary so the ones that cause callbacks are all at the start. Nobody else cares which flag is which, that's why there's macros.) This way, the again bits are the same as the reason for the callback: no flags is the initial "we found and populated a struct" callback you always get when callback isn't NULL, then BREADTH is "finished populating a directory with implicit DIRTREE_SAVE but did not descend into it yet, so now would be a good time to sort the children", and then the COMEAGAIN call would be the final call on the way out of the directory after handling all children. (STATLESS doesn't cause a separate callback, but is set on any callback when stat isn't valid.)

I should rename DIRTREE_COMEAGAIN to just DIRTREE_AGAIN (it was a Simpsons reference), but my tree's too dirty for comfort, need to check other stuff in first.

For BREADTH child callbacks are deferred until traversal: if the initial no-flags callback on the directory returns DIRTREE_BREADTH the plumbing should populate all the child structures without making any callbacks on them yet, then it does a callback on the same dir again with DIRTREE_BREADTH, then traverses the child list doing normal callbacks but freeing all the non-dir children after each callback returns, and then traverses the now-shortened list again handling the directories it needs to descend into...

Hmmm, that's not what gnu/dammit tar is doing, though. It's populating and sorting the list, then traversing it but descending into each directory as it encounters it in the traversal. Which isn't a true breadth-first search, it has ELEMENTS of breadth-first but... Ok, the return codes from the callback functions need to control order. Maybe if the DIRTREE_BREADTH callback returns DIRTREE_RECURSE then we descend into it now, and if not we do the second pass thing? Hmmm. I've got DIRTREE_SAVE, DIRTREE_RECURSE, and DIRTREE_BREADTH, and can return a chord of any of them to mean what I need it to, the question is what's the most obvious way to signal what I need it to do? What ARE the use cases?

This needs some pacing and staring into the distance....


January 18, 2023

Sitting at HEB with a stack of beverages I just bought (refill on blueberry energy cylinders, the checkerboard teas are back in stock, and there was a good coconut water coupon today)... but no snacks.

I miss japanese grocery stores and conbini. The conbini of course had rice balls and steamed buns and even microwaveable hamburgers if you wanted to get serious. The grocery store near the office had lovely little 100 yen sandwiches, which were just two pieces of cheap white bread with some filling (I usually got the strawberry jam or tuna varieties), crimped in some variant of a panini press that cut off the crusts and sealed the edges, and then presumably run through a nuclear reactor to sterilize them so they have multi-week shelf life. (Like mythbusters did to sterilize those tortilla chips in the "double dipping" episode: a conveyor belt moves the product past a strong radiation source, basically a non-heating microwave that kills all the bacteria with a few seconds of intense gamma radiation. The expiration date on the package is when the sandwich dries out slightly and is less tasty, I never had one actually go bad.) We could totally do that here in the states, we just don't: some variant of laws, culture, inclination, and capitalism optimizing for profit over all else.

Ok, tar --sort needs DIRTREE_BREADTH to do breadth first search. I could instead do DIRTREE_SAVE to populate the whole tree up front, then sort the whole tree, and then traverse the resulting whole tree, but don't want to because A) directories changing out from under us are less icky if you do it all in one pass, B) I've already got the openat() directory-specific filehandles for local access (I can open "file in this directory") in that initial pass. A second traversal has to either re-establish the openat() filehandles, or create/open big/long/path and potentially hit PATH_MAX issues. Since I don't have existing plumbing to do either of those yet, as long as I have to write new plumbing ANYWAY I might as well implement the DIRTREE_BREADTH stuff I have some existing design stubs for.

DIRTREE_BREADTH brings up the DIRTREE_COMEAGAIN callback semantics: to enforce a specific traversal order I need to sort each directory's contents before descending into it. I reserved a DIRTREE_BREADTH flag back at the start but never implemented it, and I now have _three_ users of this plumbing that I'm aware of (ls, find, tar) so sounds like time to implement it. (Whether or not I poke ls.c with a stick afterwards remains an open question.)

Looking at find -depth is... sigh. The toybox find help text describes -depth as "ignore contents of dir" and the debian man page describes -depth as "Process each directory's contents before the directory itself" and I don't remember if posix even has -depth and I probably need to spend an hour or two on this rathole, but I haven't got spare cycles for it right now. (And I've already REVIEWED this one multiple times, so 99% likely I wouldn't be fixing the code but just updating my memory of it.) Anyway, -depth existing implies that _without_ it, find is doing a breadth first search... which it demonstrably isn't in simple testing. Ok, find is NOT doing breadth first search. I thought it had an option for this, but no. It has an option to tell it what order to _act_ on what it's traversing, but it still descends into each directory it encounters when it encounters it.

The ls.c code is taking manual control of the traversal by having the callback return DIRTREE_SAVE without DIRTREE_RECURSE so the traversal populates a directory's children, then it converts the linked list to an array, sorts the array, uses the array to re-link the list objects in the right order, then it iterates over the sorted list and calls dirtree_recurse() again on each directory entry.

So I want dirtree_recurse to assemble the list, call a sort callback on the directory that can reorder the children, and then traverse them and descend. Which is a different callback from the current DIRTREE_COMEAGAIN callback? Do I need a third dirtree->again flag value? It's got 1 (callback on directory after processing all contents) and 2 (DIRTREE_STATLESS returning a file we can't stat), which are set/used as constants without macros defined for them. A third means macros, what would... DTA_AGAIN and DTA_STATLESS maybe?

Hmmm... but IS this callback a different one than DIRTREE_COMEAGAIN? It sounds like DIRTREE_BREADTH means: 1) DIRTREE_SAVE a linked list of a directory's children without recursing, 2) call the DIRTREE_COMEAGAIN callback on the directory, 3) traverse the saved list... doing what exactly? When are these freed? If we free them in the step 3 traversal, how do they ever get used?

Ok, I think I do want a third flag: DTA_DIRPOP lets you sort a directory after it's populated, and then we call with DTA_AGAIN on each entry right before we free it. Except the find -depth question comes in: does the directory count as occurring before or after its contents? That's a question for the sort function... ah, ok: while traversing the list, do a DTA_DIRPOP call before descending into it, DTA_DIRPOP|DTA_AGAIN after populating it, and then DTA_AGAIN without DTA_DIRPOP before freeing it. Silly, but it gives the callback multiple bites at the apple while still having generic infrastructure actually do the traversal.

And this is basically a wrapper function before the existing add_to_tar() dirtree callback that checks the flags and does sorting stuff as necessary, but otherwise calls the other callback. And you only insert the second callback when doing --sort. Ok, that seems feasible?

Implementing is easy, figuring out WHAT to implement is hard.

Darn it, one of the commands that came up in need of tweaking when I change dirtree semantics is chgrp... which was never converted to FLAG() macros. But chgrp.tests needs root to run meaning I want to run it under mkroot and that whole BRANCH of development is... several pops down the stack.

My _development_ plan has circular dependencies. Gordian knot cutting time, let's do it "wrong" for a bit just to clear some things...


January 17, 2023

My sleep schedule has been creeping forward towards my usual "walk to UT well after dark and spend the wee hours at the university with laptop", but I got woken up at the crack of dawn by sirens, flashy lights, and engine sounds right outside my window because the big house on the corner caught fire, and between something like 7 fire trucks and the police blocking off the street at both ends it was Very Definitely A Thing even from bed. I got up to make sure there wasn't incoming danger to us, and then I was up...

Kind of out of it all day as a result. Got a nap later, but "5 hours then FORCED UP" is something I may be too old to handle gracefully...

1pm call with Jeff to go over the Linux arch/sh patches, and the mmu change that apparently motivated the latest round of dickishness.

Elliott wants --sort=name, so looking at that. The man page has a short -s right next to it, which... "sort names to extract to match archive". What does that _do_ exactly? I'm already going through the archive in the order the names in the archive occur. There's not much alternative with tar. You can pass a bunch of match filters on the command line, but it's going to encounter them in the archive it's extracting, and thus extract them, in the order they occur in the archive. Tar != zip, it's not set up to randomly seek around, especially when it's compressed.

Sigh, my tar tree still has 2/3 of a --wildcards implementation in it, and does not currently even compile. Plus a bunch of test suite tests the host passes but my version doesn't. Need to finish that or back it out...

And when I do full tests against the musl build, tar is failing the "xform trailing slash special case". Which I don't notice when it's skipping the xform tests because it's using non-toybox sed (as happens on "make test_tar" unless I do special $PATH setup), and which I don't notice when testing a full glibc build because it works there. 95% likely it's musl's regex implementation, but... what specifically is diverging?

I would have an easier time with this if I remembered exactly what the "xform trailing slash special case" IS. October wasn't that long ago, but I checked this in as part of a large lump after days of work and there were a bunch of tests? It's searching for "^.+/" which... ^ is start of string, . is single character wildcard, + is * except "one or more" instead of "zero or more", and then / is presumably a literal / except it says "special case" here... Sigh, was this in the tar manual?

The example at the very end of that page is about specifying multiple sed transforms on the same command line, the first of which is NOT TERMINATED PROPERLY. (I.E. --transform='s,/usr/var,/var/' is missing a comma at the end.) And they repeat it twice the same way. Is this a doc mistake they cut and pasted, or does their implementation accept that? I'm afraid to check, and have NO idea how to deal with it if their implementation DOES allow it but normal sed doesn't. Maybe circle back to --xform after implementing the new stuff...


January 16, 2023

Ok, here's how I could cheat on the toysh "read" builtin: the case I care about optimizing is "while read", and the "while/do/done" block has an entry/exit lifespan. I can have the "while" cooperate with "read" to cache a FILE object. The read has to save it because "-u fd" is a read argument, but the while gives it someplace TO save it with a longer lifespan than the individual read call, and passing out of the "done" lets us know when to free the FILE *. Hmmm, I could store it in sh_blockstack's char *fvar with an evil typecast, that's not used by while... I'm dubious. Need to pace about it more. Probably best to implement just the slow path first. (There are SO many read options... timeout, length with and without terminator, -s puts the terminal in raw mode... I'm gonna need to go back and implement array variable support in everything at some point? How do I stage this sanely...)

Oh hey, Greg KH is _also_ yanking most of the classic graphics drivers from linux-kernel. It REALLY sounds like linux-kernel development is collapsing and they're throwing code overboard as fast as they can. I hope that's NOT the case, I really thought we had another 5 to 10 years before that happened, but if Linus has decided to retire early because his daughters are all off to college... Let's see, his and his three daughters' birthdays are the easter egg values in "man 2 reboot" (LINUX_REBOOT_MAGIC1 and the MAGIC2 variants), which are:

$ printf '%x\n' 672274793 85072278 369367448 537993216
28121969
5121996
16041998
20112000

So Linus is 53 (december 28, 1969) and his _youngest_ daughter is 22. Yeah, he's probably recently become an empty nester, and may be "quiet quitting" to go do other things with his life. And Greg has been waiting DECADES for the opportunity to do to Linux what Elon Musk is doing to twitter. Like an alcoholic buying a distillery. Sigh.

My annoyance with current linux kernel development is "Stop breaking stuff. Can the things that used to work still work?" And the reason we CAN'T have a stable kernel that doesn't shed features is... Greg Kroah-Hartman! Who many years ago proudly wrote a document named stable-api-nonsense about how the concept of Linux EVEN HAVING a stable driver API, so you could keep portable drivers between versions the way Windows did for many years... Greg said that's a crazy idea that Linux would never ever do. Userspace can still run a static binary from 1996, the kernel can't load a module from 9 months ago. Partly because GPL, and partly because Linux MUST be free to completely rewrite all the plumbing every 18 months to gain half a percent performance improvement with worse latency spikes. And now Greg's deleting a bunch of working drivers that are too hard to maintain under his insane regime. Wheee...

Sigh. Speaking of spiraling narcissists, did you know that Elon Musk got the idea of going to mars from a science fiction book the Nazi rocket scientist Wernher von Braun wrote in 1949, in which the emperor of Mars was named "Elon"? Back in the 1950s the reason Musk's grandparents gave for leaving canada for apartheid south africa was they perceived a "moral decline" in Canada (Wikipedia says "Most of the recorded student deaths at residential schools took place before the 1950s" so Musk's grandparents left Canada about when the mass kidnapping and murder of native children declined, and instead they traveled halfway across the world to participate in Apartheid). So there's a nonzero chance Musk was named after that character in the 1949 German book, since his family was VERY familiar with a wide range of otherwise obscure nazi materials. So of course various Musk fans are now going "famous rocket scientist predicted Elon would be emperor of mars!" and I'm going "you have the causality here exactly backwards". Why do people keep thinking the man's ever had an original idea? That's NEVER been how he works...

My grandfather also interacted with Von Braun, they worked together on the Apollo program. (My parents met on the apollo program, because my father dated his boss's daughter.) The story grampa told me was that Von Braun's most important contribution to the US space program was statistical failure analysis. Grampa never mentioned the NSA until he got _really_ low while my mother was dying of cancer in the other room, shortly after my grandmother had died of her chronic lung problems (emphysema and eventually lung cancer, from years of smoking back before I was born). They'd had three kids, I never met Uncle Bo who volunteered to fight in vietnam over Grampa's strenuous objections and died there when his helicopter was shot down. Grandpa was now outliving a second kid and not taking it well, and started by complaining about how his hearing was shot and his big interest that got him into electronics had been crystal radios and audio. He was telling me how the allies recorded sound on magnetized metal wire but it got cross-talk when spooled and you couldn't splice it if it broke or got a kink, but they captured desk-sized audio reel to reel tape recorders from nazi bunkers which were a MUCH better design: built-in insulation between the magnetic layers in the spool and the tape could be cut cleanly and attached together with scotch tape on the back, and some of the GIs shipped a couple of them to the states where Bing Crosby paid to have them reverse engineered (and vastly simplified) so he could ship tape reels around to radio stations instead of constantly flying to give live performances, and this became the company "Ampex". Grandpa also told me how he did cryptography during the war creating one time pad vinyl records of "music off the spheres" radiotelescope recordings of ionized particles hitting the upper atmosphere sped up to be good random static which completely drowned out the voice unless you had the exact same record to do noise cancelling on the other side (stacks of these records were carried across the atlantic via courier, each one smashed and the pieces melted after one use). Churchill and FDR used these to securely talk to each other live over transatlantic cable, and this proceeded naturally to grampa venting about being blackmailed into joining (what became) the NSA after the war because they were going to draft him and put him on the front line in Korea if he didn't "volunteer", and then not being able to get out for decades until some idiot almost got him killed in Iraq in the 1980s by trying to hand off intelligence to him in his hotel room while he was there as a technical expert for General Electric upgrading (and bugging) the Iraqi phone system. (Apparently the various spy services are the best technical recruiters, finding you companies to work at. Well, they were decades ago, anyway. My take-away was "don't get any of that crap on you, you'll never get out again", and I learned it from my father's simple defense contracting.)

Oh hey, Dreamhost replied. They escalated to somebody who DID NOT BOTHER TO READ MY SUPPORT REQUEST. Not even the subject line, which reads "Re: The robots.txt you put on lists.landley.net (which you won't let me log into) blocks google."

On 1/15/23 23:48, DreamHost Customer Support Team wrote:

> Hello,

> Thank you for contacting DreamHost Support! My name is XXX and I'd be happy to assist you with your concerns.

> With regards to the discussion list service, the last time this service was touched was last year in July when we had a maintenance on it to where we upgraded the services to new hardware. This didn't change much of how the service functions, though, as we're still running the same Mailman version as before under 2.1.39.

The robots.txt file is not technically part of mailman. Mailman runs in a web server, and that web server is serving the robots.txt file.

> About the http://lists.landley.net/listinfo.cgi page, that page has been disabled for a long time now.

I noticed. I complained about it at the time.

> The list overview page for the discussion list service was disabled over 5 years ago, actually.

Yes, as I told you in my last email. Closer to ten, really: https://landley.net/notes-2014.html#20-12-2014

> So, that page posted the "The list overview page has been disabled temporarily" message for a very long time now.

What does the word "temporarily" mean here?

> Unfortunately, that cannot be edited, but you already have your list archives set to public, so they can all be accessed here: http://lists.landley.net/listinfo.cgi/toybox-landley.net

Yes, I know, they are linked from https://landley.net/toybox on the left. But if people go to the top level lists.landley.net page, they do not get a list of available lists, and every couple months (for many years) people ask me why, and I tell them "because Dreamhost is bad at this".

For comparison, if I go to http://lists.busybox.net I don't need to remember the exact URL of the list I want to look at, because there is a navigation link right to it. That is not true of the toybox project, and I can't fix it, and my stock response to everyone who asks is "because Dreamhost is bad at this". Your service makes my open source project look bad to the point it's a FAQ.

The top level index page is especially nice if I'm sitting down at a different machine than I'm normally at and using a standard web browser to see if there are new posts, because remembering the full URL with the "dash between toybox and landley but the dot between landley and net and also a dot between listinfo and cgi"... that's tricky to do from memory.

> Since it's public, clicking the "Toybox Archives" link will open up the archives for that list for anyone that finds it.

I know how mailing lists work. I use them. If you looked at the mailing list in question you'd see I last posted to it on thursday. The "enh" posts are from Elliott Hughes, the maintainer of the Android base operating system for Google. He's the second most active poster to the list in question. I used to have other mailing lists for other projects, but they ended or moved off dreamhost "because Dreamhost is bad at this".

> As for the robot.txt file,

It's robots.txt.

> your 'lists.landley.net' sub-domain for the list does not use a robot.txt file.

Because it's robots.txt, as defined in the IETF RFC documents: https://www.rfc-editor.org/rfc/rfc9309.html

Point a web browser at: http://lists.landley.net/robots.txt

Do you see the file there? The file is wrong. The result returned by fetching that URL (which I CUT AND PASTED INTO MY LAST MESSAGE TO YOU) prevents Google from indexing the site. I do not have control over this file, for the same reason I had no control over the "temporarily disabled" message. It is a thing Dreamhost did on a server I do not have direct access to.

> In fact, on the mailman server, the services are not actually under the list sub-domain. That's just the sub-domain that all of your lists are managed under.

Do you see the "which you won't let me log into" up in the subject: line of this email, from my original support request?

In the message you are replying to I explained that "landley.net" and "lists.landley.net" are on different servers and I don't have access to the lists.landley.net one to fix what's wrong with it. You are repeating my problem statement back at me.

> But, on the mailman server, each list has its own set of configurations and files. For example, the stuff for the 'toybox' list is under the 'toybox-landley.net' location on the mailman server and has no robots.txt file.

When you fetch the URL from the server, there _is_ a robots.txt file. (Which you spelled properly this time.) The text "temporarily disabled" probably wasn't in the toybox-landley.net subdirectory either. The mailman install has templates for shared infrastructure.

This implies that it's a global setting, and you have thus blocked google search on EVERY mailman domain that Dreamhost serves. (Which I suspected was the case but don't know what other server pages to look at to confirm it.)

> It's just a sub-domain DNS record that points to the list server for where the list is managed.

Yes, I know. I managed my own DNS for the first few years you hosted my site, until I took advantage of your free domain renewals as part of the bundle.

I'm sure there was a little "yes I am experienced at web stuff" radio button selector when I submitted this help request? It did not have an "I ran my own apache instance for about 10 years, have also configured nginx, and even wrote my own simple web server from scratch in three different languages" option, but still. (The httpd implementation I wrote last year is at https://github.com/landley/toybox/blob/master/toys/net/httpd.c because I needed something to test the new wget implementation with, so I did a simple little inetd version. Haven't wired up CGI support yet but it's got about 1/3 of the plumbing for it in already.)

The problem isn't that I don't know what's wrong, it's that I do not have access to fix it. I thought I'd explained this already, but I can repeat it.

I can SEE the robots.txt file. So can google. It is there. It should not be.

> And lastly, I'm afraid that our list services are not configured to run through HTTPS and there are no plans on getting that updated at this time, unfortunately.

Yes, I know. But that isn't _fresh_ breakage, so I'm living with it as part of the general "dreamhost is bad at this" Frequently Asked Question.

But Google _could_ find my mailing list entries a year or so back, and can't now, so Dreamhost adding a bad robots.txt is fresh breakage. (Dunno how long the google cache takes to time out when a new deny shows up?)

Given that the project I'm maintaining through that mailing list is Google's command line utilities for Android (I.E. their implementation of ls/cat/set etc as described in https://lwn.net/Articles/629362/ ) that's especially embarrassing.

> This would be quite the project as it would require an upgrade of Mailman, likely to version 3, which is quite different from version 2. So, the list admin page can only be accessed through HTTP. I'm very sorry about that.

Eh, I'm used to it.

I don't _think_ Android has entirely dropped support for non-encrypted URLs yet, only for certain api categories. (Which sadly broke my podcast player: upgrading to Android 12 meant it could no longer load http:// podcast files, only https.) I think you still have a couple more years before your mailing list infrastructure becomes entirely inaccessible from phones: https://transparencyreport.google.com/https/overview?hl=en

That uptick to 100% in the chart when Android 13 came out is a bit worrying, but I haven't bought a new phone in a few years and mine is only supported through 12. _I_ can still access it. (And from my Linux laptop, of course. No idea if random windows or mac users still can though. Safari's policy and chrome's policy and mozilla's policy don't advance in lockstep, but I hear rumblings.)

Most websites have put mandatory http->https forwarding in place where accessing http just gets you a 301 redirect to https for _years_ now. Try explicitly going to "http://cnn.com" or "http://google.com" in your browser, it will load the secure page. It can't _not_ do so.

The rise of "Let's Encrypt" (nine years ago according to https://lwn.net/Articles/621945/ ) was what finally let people start deprecating the old protocol in clients, because sites no longer have to pay for a certificate, so even third world organizations running a solar powered raspberry pi on their cell phone towers can afford https now.

> I hope that helps to clear things up.

No, it doesn't. The robots.txt file excluding * for / still needs to be removed so Google can index my mailing list posts like it used to do.

> Please contact us back at any time should you have any questions or concerns at all. We're here to help!

The concern I expressed in the subject line is still not fixed.

I'd guess they did this because they didn't have any other way to manage server load, and their servers are underprovisioned. I suppose if they're truly this incompetent and have no other solution, I can set up a cron job to scrape the lists.landley.net web archive and mirror it under landley.net? It's EXTREMELY SILLY to do so, but I can just add that to the FAQ I guess?


January 15, 2023

Oh hey, Greg Kroah-Hartman is also removing the RNDIS driver from Linux, which is how Android does USB tethering. I wonder when Linus stopped being maintainer? The glory hound's been trying to drag the spotlight onto himself for decades now, but used to get told "no" a lot for hopefully obvious reasons. Honestly, he's half the reason I don't post to lkml anymore. Al Viro was less abrasive: I'll take honest disdain over two-faced self-aggrandizing politics any day.

I have some domain expertise with USB ethernet: a couple years back Jeff and I implemented CDC-ACM USB ethernet hardware for the Turtle boards, which could talk to Linux and MacOS but not Windows, because Windows doesn't support CDC-ACM. It's a reference implementation from a standards body, but does NOT have a generic Windows driver because Microsoft wants money from each hardware vendor to be "supported". To test it we got a beta driver from somebody that made it work for half an hour at a time (before you had to unplug it and replug it, because the driver was an "evaluation" version that timed out), but Microsoft charged $30k to sign a driver for Windows, and each signature is specific to a vendor ID and model number. Microsoft chose to have no generic driver for the protocol, only drivers for specific devices, so each hardware vendor had to pay Microsoft $30k each time they needed to update their driver. (They claim they eliminated unsigned drivers for "security", but it's a profit center for them.)

Everybody Jeff talked to suggested we implement the RNDIS protocol instead, which is something Microsoft invented but both Mac and Linux support out of the box, and that one DOES have a generic driver in Windows that doesn't require $30k periodically sent to Microsoft. Switching our hardware to RNDIS didn't look hard, we just hadn't done the research to make sure there weren't any lurking patents. (PROBABLY not? https://web.archive.org/web/20120222000514/http://msdn.microsoft.com/en-us/library/windows/hardware/gg463298.aspx says "updated 2009" and "assumes the reader is familiar with Remote NDIS Specification; Universal Serial Bus Specification, the Revision 1.1", but that document has been carefully scrubbed off the internet; the oldest I can find is 5.0. Because implementing against the old version is a prior art defense, the old version is yanked.)

The protocol was all in the FPGA bitstream; the actual USB chip we'd wired to the FPGA pins was just a fancy transceiver that didn't even know about packets, and USB 2.x "bulk" protocols are all the same packet transfers with different header info. We never got around to prototyping it: we ran out of time shortly after we got the CDC-ACM version working (including our own TERRIBLE userspace driver that just spun sending data to/from a memory mapped I/O interface into the kernel's "raw packet" plumbing; improving THAT was our next todo item, but the benchtop prototype was 2x SMP so the driver eating a processor affected power consumption but not performance). Jeff and I both flew out of Tokyo, and a year and change into the pandemic the funding for that project ran out, so it got mothballed without doing a proper production run, and we just didn't get back to it. But using RNDIS was the easy fix, and it's what everybody ELSE in the industry did, including Android's USB tethering.

Now Greg KH seems to be saying "we're losing features left and right, our collapsing development team can't maintain the stuff we've already got, so let's flex OUR market muscle to out-influence Microsoft". Or something?

I suspect Android's response will be "USB tethering is no longer supported on desktop Linux then, oh well, here's a Linux driver for RNDIS if you want to make it work". I haven't asked Elliott, but I remember when USB file transfer between my Linux laptop and android phone used to be really simple... and then it was replaced by some Microsoft protocol I could theoretically install an elaborate Gnome program for which never worked. (Or I could install the Android Development Kit, enable the debug menu in my phone, and use ADB file transfer from the command line. I've had to download a new copy of the android tools from their website every time I've needed to get that to work, because version skew.) Linux on the Desktop is not a commercially significant target market, we get _courtesy_ at best.

Even years from now, it would still be WAY easier for the J-core guys to ship an out-of-tree Linux kernel module than to externally add a driver to Windows without paying Microsoft $30k annually-ish. Stuff like the Steam Deck could 100% use an out of tree driver if they needed to. Greg is making vanilla Linux development smaller, but who's really surprised? He was the author of the kernel's "Code of Conflict" after all, and Linus was the one who apologized on behalf of the community and very publicly went to therapy to dig the community even a little way out of that hole, not Greg. The aging development community was emitting distress signals in 2013, and again in 2017, and now it's 2023...

(Yes I know Greg wrote "Android has had this disabled for many years so there should not be any real systems that still need this." My phone's running Android 12, I just tethered to check, and dmesg said "rndis_host 3-1.2:1.0 usb0: unregister 'rndis_host' usb-0000:00:1a.0-1.2, RNDIS device". Oh, and hey, there's a more convenient way to configure it than I've been doing. I honestly don't know if Greg is clueless or lying, but does it matter? He is a Confidently Wrong White Male.)

USB 2.0 shipped in 2000, so it's fairly recently gone out of patent (hence predictable badmouthing from for-profit manufacturers TERRIFIED of commodity competition from cheap generic hardware; the instant anything becomes available for open royalty-free implementation, it MUST BE DESTROYED). As I said above, the oldest RNDIS documentation I could find says "updated 2009" (not authored, updated: it's older than that) and refers readers to revision 1.1 of a spec that has been carefully scrubbed off the internet, because implementing against the old version is a prior art defense. It is entirely possible that RNDIS recently DID go out of patent... and thus must be destroyed. How that idea made it from one of the Linux Foundation's largest contributors to one of the Linux Foundation's most prominent employees, I couldn't speculate, but he's sure confident about it.

RNDIS isn't tied to a specific USB generation (it's a packet protocol going across a transport), but USB 2.0 should be out of patent now (the spec is dated April 27, 2000) and that chugs along around 40 megabytes per second, which is still a quite useful modern data rate: over 20 parallel 4K HD netflix streams, over two gigabytes per minute, just under 7 hours per terabyte. It's about 1/3 the _theoretical_ max rate of gigabit ethernet (which I never get), and we were implementing it full speed on hardware running at... 60mhz I think? Either 4 bit or 8 bit parallel bus into and out of the chip, moving multiple bits per clock. A USB-powered device talking USB-2.0 RNDIS ethernet isn't hard to implement. Our CDC-ACM implementation fit in an ICE-40 with space left over.

I'm grinding through some of those email files from yesterday, trying to identify all the patches sent to the linux-sh list (grep '^+++ ' seems a reasonable first pass there once they're in individual files), but thunderbird saved all the files with the current date so it's not easy to filter for relevance. So I'm doing for i in sub/*; do toybox touch -d @$(date -d "$(toybox dos2unix < "$i" | sed -n 's/^Date: [ \t]*//p;T;q')" +%s) "$i" || break; done (as you do, yes gnu/dammit date gets unhappy with \r on the end of a date string, apparently), and I get an error message:

date: invalid date ‘Mon Sep 29 01:50:05 2014 +0200’

And I'm going... wha? Cut and paste that string to toybox date and... yes, it fails too. First guess: click back in xfce's little calendar widget, September 29, 2014 was a... Sunday. Seriously? Sigh. Ok, FINE. Oddly, that date's not from the headers, it's from an inline patch, which means... how is the T;q on my sed not triggering? (Back before I added that, date was complaining that multiple concatenated dates with \n were not a valid date...)

Ah, my sed is wrong. It expects a space after the date and that message has a tab in the headers, so it continued on and pulled one from a "git am"-ish patch in the body of the message. Ok, fix that and check that they all convert... yup, now they do.

Huh. You know, _technically_ netcat UDP server mode could handle one packet and then go back into "listening for new connection" mode, which would solve the "locks itself to a specific source port" issue. That wouldn't work for child processes: the reason it's handling UDP packets the same way as TCP connections is so we can pass off stdin/stdout filehandles to child processes. Which is where the "no notification of when a connection with no flow control closes" problem comes from: we'd need some sort of keepalive packet and there's no obvious place to insert that (if the kernel hasn't got a flag, we'd need a Gratuitous Forwarder Process of some kind). The reason I didn't do that before is I don't want two codepaths implementing the same thing. Really, my use case here for interactive mode is "Linux net console". Does that send from a consistent source port even across reboots? Hmmm...

At this point I honestly expect healthcare.gov to KEEP sending me emails after today: "You missed the deadline, it was yesterday! How could you!" Yes I did try Obamacare one year, but at the moment I have the classic "VERY NICE health insurance through spouse's work" arrangement, in this case Fade's graduate program through the end of the summer, and then maybe we'll do that Common Object Request Broker Architecture thing to extend it a bit if she hasn't found a job yet, at which point it's _next_ healthcare.gov enrollment period. Alas there's no obvious way to tell obamacare's automated system that A) I'm currently good, B) you are basically useless in Texas because republican assholes bounced the subsidies and sabotaged implementation, C) I schedule doctor's appointments when I visit my wife up in Minneapolis because the hospitals _here_ are collapsing unless all you need is a $100 visit to a nurse practitioner in a strip mall to get regulatory permission to purchase pills from a pharmacy, which are all over the place now. (How much of that collapse was covid and how much was foretold in legend and song is an open question. To answer the follow-up questions: 1) Yes it's intentional, 2) if you don't work for a billionaire-owned company, getting a UTI costs more than a car and potentially more than a house, so you will put up with ANYTHING to keep your job and they have less competition from small businesses and independent contractors. Guillotine the billionaires.)


January 14, 2023

I wonder if there's some way to get mastodon to do the green check mark thing? If you view source, I've had the link up top for a while now, with the magic rel="me" thing that's apparently an important part of it, but it just doesn't register? (I was reminded by updating the page links for the new year...)

Always odd when I get a request to do a thing I'm in the middle of doing. Yay, I'm on the right track? Not quite sure how to reply... "Um... yeah."

While Fade was here, heading out to poke at my laptop usually meant I'd use like a quarter of my battery then head back. Getting the low battery warning comes as a surprise after a few weeks of Not Doing That.

Dreamhost forwarded my support request to a higher level tech. That's nice. Unlikely to hear back before Monday.

I am once again impressed by how broken Thunderbird is. This needs some context. So Rich Felker theoretically maintains Linux's arch/sh, but he hasn't updated his linux-sh git repo in over a year, and Christoph Hellwig unilaterally decided to delete Linux support for the architecture despite plenty of people still using it, an active debian port, and so on. He didn't just suggest it, he posted a 22 patch series to remove it. (The charitable explanation is he's doing a "don't ask questions, post errors" thing and putting the onus on US to object loudly enough.) Of course Greg KH immediately jumped up and went "I am deciding" because he's like that, but in THEORY Linus still has the final say here, and has not weighed in last I checked? And of course the motivations for the removal are contradictory: the primary complaint is it hasn't been converted to device tree (which is true of a lot of stuff), so the reply is to be sure to remove the stuff that IS using device tree. Thanks ever so much.

The guy who maintains the Debian fork has tentatively volunteered to become the new maintainer, and one thing he'd need is all the patches that Rich chronically hasn't applied for years now. (Jeff informed me of this, and has volunteered to help the new guy, but will NOT say so on the list, and I quote: "Not going to engage with LKM toxicity in any way, got permanently away from that way back in 2002." So I connected them in private email, and am very tired of doing that. But I still haven't posted this set to the kernel list myself, so can't exactly blame him?) So a useful concrete thing I can do is grab the accumulated linux-sh patches that have gone by on the list. So I'm giving it a go.

The first problem is gmail is crazy, and only ever keeps ONE copy of a message when I'm sent multiple copies with different headers, which means when I get emails cc'd to linux-kernel and linux-sh I only get one copy, and which list-id it has is semi-random. (Usually linux-sh, because that server has fewer subscribers so sends my copy out faster, but not reliably.) In each mbox in which I _don't_ get a copy, reply threads get broken, and if I ever wanted to put together a canonical toybox history mbox file (and start a quest chain to eventually insert it into Dreamhost's web archive to fix the gaps) I'd have to check my toybox folder AND my inbox AND my sent mail (because I don't get my OWN messages sent back to me through the list either). But that's not FRESH stupid.

So I've done a search on my linux-kernel folder in thunderbird for messages with linux-sh in the "to or cc" field, which defaulted to searching subfolders, but ok. Some of those subfolders are architectures or subsystems I follow (linux-hexagon and such; linux-sh is a separate top level folder in my layout because I checked it regularly), but most of those subfolders are previous years of linux-kernel that I moved messages out to, because thunderbird not only melts if you try to open a folder with a few hundred thousand messages in it, but email delivery slows down because the filters appending email TO those large mbox files somehow scale with the size of the mbox file they're appending to, and having linux-kernel as a regular destination gets noticeably slow every 3 months or so, and email fetch CRAWLS after 9 months without reorganization. So I have to periodically do maintenance to keep thunderbird running by moving messages into yearly folders, to fight off whatever memory eating N^2 nonsense is in thunderbird's algorithms (a name and an offset in an mbox file doesn't seem like THAT big a struct, but it is in C++). Thunderbird's "click then shift click to highlight a bunch of messages, right click move to other folder" plumbing ALSO scales badly with lots of messages (the "swap thrash" threshold is somewhere around 40k messages, which is much faster with an SSD but really not good for it, and the actual OOM killer kicks in somewhere in the 60k-90k range. There's something N^2 in their algorithm maybe? Yes, the right click menu popping up even with 20k messages selected can take 30 seconds; it's a chore). But again, that's not FRESH stupid.

Thunderbird's search results window presents a list of messages but doesn't let me right click and DO anything to them. (I can double click one at a time to open in a new window, but that's not what I want here.) Instead I have to create a virtual "search subfolder", which has a pink icon and populates itself slowly as it re-performs the search (of each subfolder) each time you go into it, but otherwise seems to act as a regular folder. Fine. And after it had stopped adding messages, clicking on the last message in the list pegged the CPU for 45 seconds before it showed me its contents. FINE. So eventually I manage to highlight all the messages in the pink folder, right click and get a menu, tell it to "save as"... and the resulting destination pop-up doesn't give the option of making a new mbox file, it wants to save them as individual messages. Ok. So I give it an empty folder to do so in, and...

Here's the FRESH stupid: a thousand empty files show up at once, with no contents yet; 30 seconds later the contents fill out in the filesystem, but I also get a pop-up saying "couldn't save file". Because it tried to open all the files it was writing IN PARALLEL and ran out of filehandles. (Or maybe loop to open them all, loop to write them all, loop to close them all? Why would anyone do that? The default ulimit -n is 1024, and the default HARD ulimit is 4096 filehandles per process without requiring root access to increase. Don't DO that.)

Remember how I said Mozilla was not a real open source development organization? They are BAD AT THIS. So is the Free Software Foundation, the Linux Foundation, and even Red Hat. Capitalism mixes badly with open source development, even when it's a nominal foundation claiming to shield developers from capitalism. Red Hat inflicted systemd on the world for profit (we're not allowed to opt out), the FSF zealots became as bad as what they fought, and the Linux Foundation and Mozilla did that weird 501c6 trade association thing (a for-profit nonprofit tax dodge) where they're endlessly fundraising to provide exclusive members-only benefits.


January 13, 2023

Fade is on the airplane (as is This Dog).

Sigh, I do not have time for kernel shenanigans, but I guess I need to make time. Grrr. (There's a maintained debian port, Hellwig. Stop it. I didn't even post the bc removal patch to the list! I should do so, every release from here on...)

And Wednesday's question has been answered: the reason the middleman couldn't manage to pay my invoice THIS time is because "our policy is to pay invoices in arrears rather than in advance". Which is news to me because I was previously doing quarterly invoices (not risking TOO much of the money to a single transaction) and they paid Q4 in october. This time I invoiced for 2 quarters at once (hoping not to go through a multi-week debugging/negotiation process QUITE as often), and... Sigh. (They have literally one job. This is the fourth time it has not gone smoothly.)

The Executive Director of the middleman went on to suggest "if there is a strong reason for invoicing in advance, please do let us know, and we may be able to make specific arrangements for this — such a binding you to an agreement as a consultant."

I replied:

I invoiced for 2 quarters this time largely because each of the previous 3 invoices had some sort of multi-week issue. I honestly did not expect this one to go through smoothly either, but was at least hoping to deal with the problem less often. (I left a good chunk of the sponsorship money in there because I'm still not ENTIRELY convinced it won't vanish in transit again and maybe not come back this time.)

Now that I know the fourth roadblock is bureaucracy, let me know when and how much I'm allowed to invoice for to conform with your procedures, and I'll do that then. I'm assuming invoicing for Q1 would still be paying me in advance, so... March? (In previous quarters I got paid for the quarter we were in, but now that I'm aware of "policy" I'm assuming that no longer works either. Can I invoice for Q1 now and get paid March 1, or do I have to wait to submit and approve the invoice?)

As for whether I'm a flight risk, I've been working on https://github.com/landley/toybox/commits/master for 16 years (ala https://github.com/landley/toybox/commit/13bab2f09e91) which is longer than github's existed (hence https://landley.net/hg/toybox). Every commit in that repo was applied to the tree by me, and I personally authored... grep says 3642 of them. Even the current mailing list goes back to 2011 (http://lists.landley.net/pipermail/toybox-landley.net/) and dreamhost is terrible at mailing lists (https://landley.net/dreamhost.txt and https://landley.net/dreamhost2.txt and no I don't know where the threading info went back at http://lists.landley.net/pipermail/toybox-landley.net/2013-August/thread.html but after https://landley.net/toybox/#12-21-2015 and https://landley.net/toybox/#23-07-2022 and such I'm not asking).

Most of that time toybox was a hobby project I did around various day jobs (https://landley.net/resume.html). Google decided to use toybox in late 2014 and I kept working on it as a hobby for another 7 years. I am very grateful to them sponsoring me to focus on this project, and have said so publicly multiple times including in the release notes (https://landley.net/toybox/#12-08-2022). Disclosure: before the sponsorship I did get the Google Open Source Award twice, which came with a $200 gift card each time.

I suppose I could always get hit by a bus or have a stroke or something, but I'm not sure how signing a contract with you would affect that?

How WOULD the middleman perform oversight? Do they have any idea what success looks like? The only other guy who gets cc'd on this sort of thing is Elliott, and even I can't reliably find stuff like that again 6 months later. (Would they assign somebody to read my blog? Would that HELP?) Eh, KLOCs I suppose. Judge the value of a car by the weight of metal added to its construction...

While googling for a link while writing the above, I noticed that lists.landley.net is no longer visible via google at all, and traced it to Dreamhost adding a robots.txt blocking... everything. I didn't change anything, and don't have ACCESS to change anything (remember: it's a shared server and they don't let me log in directly, everything happens through a web panel). I have opened a support ticket.

Oh goddess:

Subject: The robots.txt you put on lists.landley.net (which you won't let me log into) blocks google.

Hello Rob,

Thank you for contacting the DreamHost support team, I'm sorry you're having this issue, I will be happy to help. After checking your site under landley.net, I was not able to find the robots.txt you've mentioned, so to check the rules and offer you solutions. Have you deleted the file to prevent blocking Google crawling your site?

Please, have a look at our article on how to create a robots.txt file that is convenient for you https://help.dreamhost.com/hc/en-us/articles/216105077

I hope this troubleshooting and information was useful for you. Please, don't hesitate to contact back the support team in case you need it.

They didn't even read the TITLE of my support request, did they?

My reply:

> After checking your site under landley.net, I was not able to find the robots.txt you've mentioned,

Because lists.landley.net is not the same web server as landley.net. Your mailing lists run on a different (shared) server which I don't have direct access to, and which I can only interact with through your web panel.

Your server has been mildly broken for years, such as refusing to give a list of available mailing lists under https://lists.landley.net (which has been "temporarily disabled" for over a decade).

But sometime in the past year or so the robots.txt on lists.landley.net (which is not landley.net) changed, so that:

https://www.google.com/search?q=site%3Alists.landley.net

Says "no information available on this page", and when I click "learn why" under that it goes to:

https://support.google.com/webmasters/answer/7489871?hl=en#zippy=%2Cthis-is-my-site%2Cthe-page-is-blocked-by-robotstxt

> Have you deleted the file to prevent blocking Google crawling your site?

I would love to get access to lists.landley.net to fix stuff there, but the lack of that has been a persistent issue dealing with you for some years now:

https://landley.net/dreamhost.txt
https://landley.net/dreamhost2.txt
https://landley.net/toybox/#23-07-2022

I haven't even bothered to ask where the thread information for older months went:

http://lists.landley.net/pipermail/toybox-landley.net/2013-August/thread.html

(It used to be able to indent those, but not anymore.) But far and away the BIGGEST problem with lists.landley.net is you can't access it via https but only http, which means mailing list administration sends a plaintext password across the internet for every page load. (Because the Let's Encrypt certificate for landley.net isn't available to the shared lists.landley.net server.)

> Please, have a look at our article on how to create a robots.txt file that is convenient for you https://help.dreamhost.com/hc/en-us/articles/216105077

I know what a robots.txt file is. But I do not have access to change any of the files at https://lists.landley.net. I can only ssh into the server that provides landley.net (ala www.landley.net) because the different domain name resolves to a different host.

> I hope this troubleshooting and information was useful for you.

Not really. Here is the issue:

https://landley.net/robots.txt - 404 error
https://lists.landley.net/robots.txt - ERR_CONNECTION_REFUSED
http://lists.landley.net/robots.txt

  User-agent: *
  Disallow: /

Meaning Google cannot index the site. It USED to index the site, but it stopped sometime during the pandemic, because of you.

I hate having to explain people's own systems to them. It's embarrassing for both of us. I also dislike having to reenact the Dead Parrot Skit. ("If you want to get anything done in this country you've got to complain until you're blue in the mouth.") I feel there should be a better way, but I'm not good enough to find it.

Walked to the table for the first time in a while. (I was hanging out with Fade in the evenings, and mostly staying on a day schedule with her.) Pulled up the list of pending shell items... and then spent the evening editing old blog entries since new year's.

My blog plumbing (such as it is) has a slight year wrapping issue: I switch to a new filename for 2023. The rss feed generator takes the html file as input and emits the most recent 30 entries in rss format, using the stylized start-of-new-day html lines to split entries. Which means if the December entries aren't appended to the new year's file they'll prematurely vanish from the RSS feed, but when I DO append them I keep forgetting to delete them, and I think some previous years might STILL have the previous December at the end?

There's always a temptation to cheat and not edit/publish January for a week or two, so that when the RSS feed updates everybody's had plenty of time to notice the old stuff, and anybody new checking it won't see a questionably short list. Not that I need MORE incentive to procrastinate about a large time sink...


January 12, 2023

Fade flies home tomorrow, mostly spending time with her.

The ls.c stuff (fallout from this) is harder than it needs to be because ls --sort has a lot of long options (and I added a couple more). The current help text looks like:

-c  ctime      -r  reverse  -S  size     -t  time   -u  atime   -U  none
-X  extension  -?  nocase   -!  dirfirst

And I went "well of course that should be comma separated values with fallback sorts, just like sort itself does!" and that's... tricksy. Each of those can be specified as a short option (which doesn't save the order it saw them in), and you can presumably mix short and long options, and I dowanna re-parse the list each time because that feels slow but I don't have a good format to put it in?

Eh, the data format's not hard: an array of char holding the offset of the flag value for each sort type. Break the comparison out into its own function and feed it either toys.optflags or 1<<sort[i] in a loop. If I ensure flag 0 isn't interesting (it's currently -w, not a sort option) then it's even a null terminated string (of mostly low ascii values, but still). But the design and user interface parts are still funky: the longopts would accumulate as fallback sorts, and the single character sort flags should switch each other off? No, it's more complicated than that: you can do ls -ltr, which is reversed mtime, so they DO chord at least sometimes... Actually, "reverse" is specifically weird. Sticky. It should ALWAYS go last, because otherwise it has no effect.

No, the really WEIRD chording is -cut with or without -l. (I was working on this before, I know I was, and I document COMPULSIVELY, but it's not in the blog. Is it on the mailing list? One of the myriad github pages? A commit comment? Did I make the mistake of typing it at someone in IRC and setting the little mental "it's been written up!" flag? Who knows...)

(Once upon a time the #uclibc channel on freenode, where all the busybox developers hung out back in the day, was logged to a web page, which I believe went down when the guy hosting it went through a bad divorce, and in any case freenode got bought by a korean billionaire who did to it what the muskrat is doing to twitter. I still sometimes think "written in irc" means "I can find it again later", but have mostly trained myself back out of that these days.)

Anyway, the issue is that the ls man page (and behavior) is nuts:

-c     with -lt: sort by, and show, ctime (time of last modification of
       file status information); with -l: show ctime and sort by  name;
       otherwise: sort by ctime, newest first

So -l disables -c's sorting behavior and you add -t to get it back. Same for -u. That's horrible historical nonsense and I need to make it work, but where do --sort ctime and --sort atime fit into this mess?

As always "how to implement" is easy and "what to implement" is hard.

Sigh: [-Cxm1][-Cxml][-Cxmo][-Cxmg][-cu][-ftS][-HL][-Nqb] is tangly. But lib/args.c hasn't got a [-Cxm[1log]] syntax and there haven't been other callers for it.


January 11, 2023

I have invoiced the middleman! Let's see how it fails to work THIS time.

Staring at the bugs shell fuzzing found. And ls.c. And the shell "read" command. And the shell "command" command, because command -v is where scripts/portability.sh barfs trying to run the test suite under toysh.

Not really making a lot of progress on any of them, but looking. Oh, and I should read that git format documentation...


January 10, 2023

Finally cut a toybox release. And then updated the date in news.html to the correct day AFTER cutting the release. (The tarball and tagged commit still say the 8th. Always something.)


January 9, 2023

I have reached the "revert and rip stuff out" stage of release prep: if it's not ready, punt it to later.

Disabled the date test in "make tests" again: not shipping a fresh toolchain this time. Put bc and gcc back in the airlock because not demanding people build a patched linux this time either. More FAQ updates. I can't get the ls.c work in this time...

Right wing loons are flipping out about a possible ban on gas stoves, which means Fuzzy has been involved in an argument online where somebody insisted it was impossible to make proper custard on induction, so we have a big pot of custard now. It's lovely. Peejee has a want. (Cat, YOU may be spry and feisty but your kidneys are 19.)

Peejee had custard.


January 8, 2023

Found a problem with "make sh_tests" where some of the early tests weren't testing the right shell. There's a context switch, before which you can do "sh -c 'test' arguments", after which it switches to running all the tests through sh -c _for_ you, to ensure it's all being tested in toysh rather than being "eval" under whichever shell the test suite is itself running in. (On Debian, bash. In Android's case, mksh.) You can manually wrap tests before that yourself, but I found a set of tests before the switch that weren't wrapped, and moved them after the switch so they happen in the proper context... and some fail. Now I've gotta fix unrelated stuff before running the test suite gets me back to seeing the failures I was debugging before. Progress of a sort, I suppose. But it puts us firmly into "punt this whole mess until AFTER next release" territory.

Oh hey, right after I tried to pivot AWAY from working on toysh, Eric Roshan-Eisner ran a fuzzer on toysh and found several ways to segfault it. Fixed some low hanging fruit, punting the rest until (say it with me) after the release.


January 7, 2023

I kiiiinda wanted to get "make test_sh" passing its test suite this release. Not happening, but I have multiple dirty sh.c forks I'm trying to check in and delete. The release is the time to finish unfinished things and clean up what you can.

The next "make test_sh" failure was a simple fix [editorial note: while trying to add the link to the blog I realized I'd checked it in to the WRONG TREE: pushed now] but the next thing that test tried to do is call the shell builtin "read"... which I haven't implemented yet. Taking a stab at it now, but there's a design problem: it does line reads but lets read -u substitute a different file descriptor. Hmmm...

Strings are hard, and that includes efficiently reading lines of input. This is why I had get_line() all those years: byte-at-a-time is slow and CPU intensive, but larger reads inevitably overshoot, and you can't ungetc() to an fd. (Well you _can_ but only to a seekable fd, which does not include pipes or tty devices.) This is why the ANSI/ISO C committee invented the FILE * back in the 1980s: somewhere to store a buffer of data you block read and save the extra for next time. But shells don't USE file pointers, they use file descriptors, both for redirect and when spawning child processes.

This isn't AS bad because pipe and tty devices return short reads with the data they've got, so when a typing human is providing input MOST of the time the computer will respond to the enter key before you press the next key. And piping between programs, each printf() turns into a separate write() system call which sends a batch of data through the pipe, and if the read() at the far end receives that data before more gets sent (and concatenated in the pipe buffer) then it hasn't read ahead part of the next line there either. But if you DO type fast (or something like "zcat file.gz | while read i" happens) then the read gets extra characters that go on the next line, but the read returns and the next read happens only knowing the file descriptor. (If you're wondering why you see "echo $X && sleep .1 && echo $Y && sleep .1" in some places in the test suite... generally this sort of thing. Even that's not ENTIRELY deterministic if the system's heavily loaded enough.)

This same problem would also screw up trying to provide input to a command, such as echo 'fdisk blah.img\nn\np\n1\n\n\nw' | sh, because the FILE * stdin used to read the fdisk line will read ahead to the rest of the data in the input block, which is then NOT provided by file descriptor 0 to the fdisk child process, because it was already delivered and is waiting in an unrelated buffer. (I bothered the Posix and C99 guys about querying how many bytes of data are waiting in the buffer so I could fread() them back OUT and pass them along, about like my tar.c does when autodetecting compression type from pipe input. You read data and then need to DO something with it; you can't put it back into the file descriptor.)

(If you COULD unget data into a read-only file descriptor, that would be a security issue. Unix semantics generally do make sense when you dig far enough, because everybody's had 50 years to complain and usually would have fixed it by now if it was actually wrong.)

All this reminds me I'm ALREADY mixing FILE * and stdin in sh.c, because get_next_line() takes a FILE * argument, but that's always been a placeholder function. I need to implement interactive line editing and command history, which I should tackle soon, but it's not a _small_ can of worms to open and I want to get the shell more load bearing before putting a big neon "welcome" sign out front. But the interactive stuff should use the input batching trick I introduced to busybox vi years ago to figure out what is and isn't an ANSI escape sequence, which I already implemented in lib/tty.c scan_key_getsize() and is STRONGER reliance on input batching. (It would be lovely if I could fcntl() a pipe to NOT collate data written to it in its buffers, but this seems to be another "optimization" I can't turn off. It would also make testing dd SO much easier if I could do that...) Anyway, scan_key_getsize() is always doing 1 byte reads to assemble the sequence from a file descriptor without overshooting, because "interactive keyboard stuff" really should not be a big CPU/battery drain on modern systems. (He says, knowing that "top" is a giant cpu hog that really needs some sustained frowning to try to be less expensive. I dunno if it's all the /proc data parsing or the display utf8 fontmetrics or what, but something in there needs more work.)

I suppose I could try to lseek() on the input, and do block reads when I can and single byte reads if I can't? The problem is the slow path is the common case. I don't want zcat file | while read i to be an unnecessarily slow CPU hog, and the FAST way is using getline() through a FILE * (or writing my own equivalent; generally not an improvement). Which doesn't work for -u, and if I wrapped that in a FILE * where would I save the cache struct between "read" command calls? How do I know when it's done and I can free it? Can of worms. Redirect and FILE *stdin aren't going to play nice together, but what's the BETTER way?

Sigh, I'm not entirely sure what the corner cases here ARE. Coming up with test cases demonstrating it causing problems is a headache all its own. And some of those corner cases I'm pretty sure bash suffers from and their answer is 1/10th second sleeps.

I don't WANT two codepaths, one for stdin and one for -u other than 0. That's just creepy.


January 6, 2023

Elliott didn't like xmemcmp() so it's smemcmp() now. (Yeah, I know he's right, just... trying to get a release out!) Bugfix from somebody at google for sh.c (yay, people are looking at it). FAQ update...

The new dishwasher is not arriving today, supply chains are failing to supply, or possibly chain. New estimate is the 20th. Fuzzy is very tired of doing dishes by hand, so we have purchased paper plates, cups, bowls, and plastic utensils.


January 5, 2023

Day 2 of Actual Release Prep, which _seems_ like it's mostly just creating a big page of release notes but is actually "go through each commit since the last release and re-examine them", which is second pass review (more design than code per se) and a MASSIVE tangent generator. It always winds up taking multiple days to actually get a release out, and that's AFTER I've done a full mkroot build-and-test recently on the kernel version I plan to release with, using the toolchain I plan to release with. (I.E. no blocker bugs.)

The toolchain issue's a bit wonky this time, because llvm version skew broke my ability to rebuild the hexagon toolchain with the script that worked last time, but I need to rebuild musl to include the patch for the issue Rich refused to fix but which otherwise breaks the date test that runs on glibc and bionic but not musl. (I enabled the date and gunzip tests in "make tests" because they worked fine on glibc and bionic, and only after I'd checked it in realized that date was still disabled because the musl bug was a pending todo item.)

It's a one line fix to musl, but Rich won't do it because Purity or something, and after many years fruitlessly arguing with Rich I just put the simpler workarounds in my code, and otherwise patch musl in the toolchains I build. I wave a patch at Rich once so it's not MY fault when he says no, same general theory as linux-kernel. (Except there I tend to do second, third, even fourth versions when I get engagement. Less so when they're ignored.)

Speaking of which, my musl patches are still inline in scripts/mcm-buildall.sh but I've moved the kernel patches from scripts/mkroot.sh out to a separate repository. I should philosophically collate my patch design approach at some point, but I'm not holding up the release for it now.

In general broken out patches are better in case other projects want to pick them up. Squashfs was widely used out of tree for many years before lkml deigned to notice, and some of the Android stuff still is, I think? Back on Aboriginal I had a patches dir with linux-*.patch in it. For this release I'm probably putting 0001-*.patch files for the kernel in the mkroot binaries release dir, because "apply these to your arbitrary kernel version" seems easier than collating two otherwise unrelated kernel trees via git. But how much of that is what I'm used to vs what other people are used to? (I mean I HAVE a branch published on github, but I have to redo it each release I do a musl thingy on, and then can't delete the old ones if they're load bearing, which is non-collated cruft I dowanna encourage/accumulate. "Probably gonna delete this later" is not a good ongoing policy.)


January 4, 2023

Rant cut and pasted from the list to the blog:

I'm not a fan of over-optimizing compilers. My commodore 64 had a single 1mhz 8 bit processor with 38911 basic bytes free, and it was usable. I'm typing this on a quad processor 2.7 ghz 64 bit processor laptop with 16 gigabytes of ram, and this thing is completely obsolete (as in this model was discontinued 9 years ago: they were surplussed cheap and I bought four of them to have spares).

Performance improvements have come almost entirely from the hardware, not the compiler. The fanciest compiler in the world today, targeting a vintage Compaq Deskpro 386, would lose handily to tinycc on a first generation raspberry pi. Hardware doubled performance roughly annually (cpu was 18 months, but memory and storage and stuff increased in parallel) and each major compiler rewrite would be what, 3% faster? The hardware upgrades seldom broke software (rowhammer and specdown meant the hardware didn't work as advertised, but that's an obvious bug we worked around, not "everything intentionally works different now, adjust your programs"). Every major gcc upgrade had some package it won't build right anymore, and the gcc devs say we shouldn't EXPECT it to.

Part of this attitude is fallout from the compiler guys back around 1990 making such a big deal about the move from CISC to RISC needing instruction reordering and populating branch delay slots and only their HEROIC EFFORTS could make proper use of that hardware... and then we mostly stayed on CISC anyway (yes including arm) and the chips grew an instruction translation pipeline with reordering and branch prediction.

I'm aware this is a minority view and I'm "behind the times", but if I wanted the tools to be more clever than necessary I'd be up in javascript land writing code that runs in web browsers, or similar.

This difference of viewpoint between myself and people maintaining compilers in C++ keeps cropping up, and I have yet to see a convincing argument in favor of their side. They're going to break it ANYWAY.

I'm currently editing the December 28 blog entry about tidying up the html help text generator, and I realized a corner case I hadn't handled: nbd-client says "see nbd_client" which doesn't exist. (Public dash version vs private underscore version because C symbol generation.) Sigh. Ok, fix the help text generator AGAIN...

I keep nibbling at the release, but... time to start writing release notes. Ok, git log 0.8.8..HEAD and... there have been a few commits, haven't there? Lot to go through. But first, the hardest part: picking a Hitchhiker's quote I haven't already used.


January 3, 2023

Working to make ASAN=1 make test_sh pass, which is whack-a-mole. The address sanitizer's a net positive, but it's a close thing at times. (Gimme a stack trace of where the problem OCCURRED, not just where the closest hunk of memory was historically allocated.)

Refrigerator dude stopped by and vacuumed the refrigerator coils, which were COVERED with cat hair. Showed me how to do it myself, not that I own a vacuum cleaner. (Tile floors. Big flood back in 2014. The only carpet in the house these days is throw rugs we take outside and do a sort of bullfighter thing with.)

The outsourced washing machine guy called and said the symptoms I'm seeing on the model I have means the circuit board's almost certainly fried, probably had water leak onto it, which with labor is something like $800 to replace (Bosch is reliable, but not repairable), and getting a new dishwasher of the same model and having the old one hauled away is basically the same price, so there's not much point him coming out to look at it and billing us for not being able to help. (Professional repair dude, no shortage of work.) Thanked him and Fade ordered a new one from the people we got the replacement washer and dryer through, they think it'll be here Friday.

Yes, I am aware I did the refrigerator thing because "they're already coming" and then it was two separate servicebeings. There was meta-upsell there, apparently. Unlikely to use Sears again, which is convenient since they only barely seem to still exist as a kind of temp agency.


January 2, 2023

Finishing up pending toysh fixes. I was redoing math stuff and it's always more work to figure out what I was doing when I leave myself half-finished commits in the tree. I can see what the code there is doing, but have to work out what I MEANT it to do and examine how much I got done to figure out what I left out. The design work is as always the tricksy part: is there anything I didn't think of at the time, or thought of but didn't implement, or thought of then but am not remembering now? (There's no such thing as sufficiently copious design notes that do NOT include a finished implementation. Not that the implementation by itself is always sufficient either, which is why there's code comments and commit comments and blog entries...)

Not gonna manage to merge expr.c in with $((math)) this release, and I'm not 100% sure it's doable (well, worth doing) at all. And the big missing piece seems to be a floating point version of this same code. Python seems to do arbitrary precision math: 1<<256 and 1/1.7 resolve but 1.7<<256 is an error. Multiple codepaths!

Eh, punt for now. Close the tab, move on to the next...

The dishwasher died. As in the power button does nothing, acts like it's not plugged in but the outlet works when we plug in other stuff? (RIGHT at new year's. IS there such a thing as a Y2023 bug?)

Despite Sears having died years ago, Google Maps has a number for "sears appliance services" in hancock center. (In the middle of the parking lot on the map.) And when I called it I got... what's probably an indian call center, but sure. They have our file from when we bought the thing. And want $150 for a service technician to come look at it. Hmmm...

I'm not entirely sure how they upsold me on having my refrigerator serviced, but it was just an extra $50 and nobody's looked at it in ~5 years and I'd rather it didn't go out, so sure. Why not. (Fade thought we might as well, anyway. Long as the dude's already making the trip...)


January 1, 2023

Happy new year! The first in a while that isn't "the year of hindsight", "last year won", or "also that year". We are finally "2020 free", or at least experiencing a diet version thereof.

Grrr. The recent xmemcmp() changes in file.c left me with an open tab where I WANT to replace a bunch of memcmp(x, "string", 8) with #define MEMCMP(x, y) xmemcmp(x, y, sizeof(y)) so you don't need to specify the length, but unfortunately it won't quite work. Yes, sizeof() treats a string constant as an array and thus gives you the allocation size, including the null terminator. But some of the comparisons in file.c are checking the NULL terminator, and some aren't. Having two #defines for the two different cases pushes this out of net-positive mental complexity savings territory: subtle enough that the NEW thing becomes a sharp edge you can cut yourself on. The other way is redundant/tedious but very explicit.

While reviewing them I did find a memcmp(s+28, "BBCD\0", 5) so once again no review is ever COMPLETELY wasted...

Maybe I should rename _mkroot_ to "dorodango"...


Back to 2022