
July 29, 2016

Monday's doctor visit gave me a referral to Austin Retina Associates to get my eyes as thoroughly checked as is available, and they use REALLY powerful dilator drops that didn't wear off until late afternoon. (At which point I still had eyestrain.) So not the world's most productive programming day.

On the bright side, the preliminary diagnosis was "you're really nearsighted, that can have a lot of side effects, your retinas look fine". (The nice lady reading the paperwork mentioning my allergy to lidocaine _after_ she'd put all the drops in my eyes was... interesting. But I seem to have come through it without another trip to the emergency room, so yay? Difference between two drops and a mouthful of the stuff, I guess...)

Stopped by the recruiter place to sign paperwork (I guess I'm going to San Diego next week), and while we were there Fade, Fuzzy and I hit the Lakeline mall. (No, there isn't an oxford comma in that.) Ever since Highland closed there hasn't been a mall close to us, and as long as I had to go somewhere to sign paperwork anyway, we had a Family Outing. (And then we go to three different states for 6 months to recover. Oh well.)

Still trying to salvage the existing toybox work directory. This may mean ignoring my email for a week or so until I've got the mess down to a dull roar. (Shove all the metaphors in the blender and hit the "Frappe" button, it's that kind of day.)


July 28, 2016

Somebody contributed a stylesheet update to the j-core website, which looks very nice (it's not up yet) but has some hiccups such as the text overlapping if you shrink the screen size. I'm trying very hard not to do the "must understand everything or I can't maintain it!" thing, because I'm not a webmaster and hope to hand such admin duties off to somebody professional if we ever have staff for it.

(Besides, my concern about toybox is wanting to maintain this code for a couple decades without it turning into a ball of scar tissue. The website could get completely redone from scratch next year. Probably won't, but could!)

While I was fiddling with this, I found that my login credentials for j-core.org didn't work. And Jeff couldn't log in either. So we started digging to figure out which server this was on and found three different VM hosts set up by somebody who isn't with the company anymore, and it turned into a Large Digging Expedition before we got it working again. (Not "sorted", just "working again". We have another todo item now. That whole server room is moving to a new office in a month or two, so we need to deal with this anyway...)

So that was the unexpected timesink du jour, eating half a day. As usual.


July 27, 2016

Huh. Nevermind, job offer came through. Decision time.

I've hit another one of those "juggling so many balls I spend all my time swap-thrashing" bits, where I've been interrupted and told "no, what you really need to be working on is THIS part right now" so many times that I've got a pile of half-finished things I can't check in, and every time I sit down to program I spend half an hour figuring out where I _was_ on piece du jour. It's gotten silly, as in:

toys/pending/tftp.c   |  41 ++----
toys/pending/mke2fs.c |  42 ++----
toys/posix/date.c     |  48 +++----
toys/pending/more.c   |  56 +++------
toys/pending/expr.c   | 110 ++++++++++++------
toys/posix/grep.c     | 214 ++++++++++++++++++++++++-----------
toys/pending/dd.c     | 261 +++++++++++++++---------------------------
toys/pending/wget.c   | 281 ++++++++++++++++++++++------------------------

And so on for 54 modified files in the tree. To be honest, what I usually do when it gets this overwhelming is rename that directory to some variant of ".bak", check out a clean copy, start over from scratch and make a todo item to try to salvage work and notes and such from the old directory when I get a chance. (Revisiting the old directory has never yet happened before it got so stale it wasn't POSSIBLY still relevant, but I live in hope.)

But I don't want to do that to dd _again_. This would be the second cleanup of dd that got buried by swap thrashing. (The dd spec is so insanely complicated people have tried to convince me it was a practical joke that posix approved.)

I was trying to get grep and dd cleaned up and promoted for Elliott's toolbox replacement list during the Android O merge window. (Apparently they're calling N "Nougat" instead of "Nutella"? Sad.) The grep --color change turns out to be giant and intrusive (I have to redo the output path, and I found ~3 new bugs in it while I was there), and along the way I hit my "make grep work with embedded NUL bytes in the input" (so "grep ld-linux /bin/ls" can tell you if it's a glibc binary or not) todo item, which involves factoring code out of sed, and it basically turned into rewriting more than half of grep.

Recently somebody's been telling me that rewriting the test suite to use his preferred variant of fortify/valgrind/libmudflap/coverity is more important than anything else I was doing, and that me wanting to fix an expr bug a different way than his patch but being too busy to do so promptly is a major failing on my part. (Yes, it's still in pending, but it's a crisis!) Of course I already _have_ a giant todo list for the test suite: turning all the notes-to-self into actual regression tests, going through posix and man pages and each command's help text and making sure every verifiable assertion is verified, making sure each test either passes TEST_HOST or is annotated with SKIP_HOST (and yes designing tests that are vague enough to pass multiple implementations but strict enough to actually prove something is a hard needle to thread), working out how to reproducibly test stuff like ifconfig and top (under qemu)... I expect that to eat _months_. Without his "install this other toolchain, then..." tests.

Somebody else has been trying to write vi and less (no not the boxes guy, a new one), with deeply baked in "curses" design assumptions (double buffered screen output, of a fixed size despite utf8/unicode having arbitrary combining characters, trying to do optimized screen redraws despite 9600bps being the ultra-low end of terminal speed these days)... I started exploring this space with hexedit.c, and then worked out cursor input with test_scankey, then did top with appropriate measuring and escaping of raw terminal output, and already demonstrated to my satisfaction that this entire design approach is unnecessary.

The point at which I stopped reading _his_ code was when I'd convinced him to stop trying to do optimized redraws, but his next patch still copied everything into the double buffer. The buffer wasn't ACCOMPLISHING anything anymore, but it was still allocated and used because that's how you do it. (Getting ideas _out_ of people's heads is often the hard part.)

Of course this is me being a bad maintainer. I should either accept implementations I don't like because code that exists beats code that doesn't exist, or I should spend all my time trying to coax good code out of external contributors, even if writing it myself would be several times faster. (The easiest way to explain what I want is to write an implementation and go "this". I tried doing elaborate writeups on the cleanup page explaining why I did what I did, but it doesn't seem to have helped...)

Part of the reason less is tricky is that if you want to avoid blocking on either of the input sources, you need poll or select, meaning my todo item of factoring out the poll loop from netcat and putting it in lib/net.c has to happen first in sequence, and netcat has some pending todo items of its own (I never quite made it work properly with nommu, and I'd like a generic getaddrinfo() wrapper in lib but have never quite figured out what it should look like...)

Speaking of global changes, I've locally changed xopen() and xcreate() to never return stdin/stdout/stderr because dd had a wrapper that did this and I decided that was a good idea. I have several places in various commands that either check or cope with the possibility of the command being launched with stdin closed and the next open turning into that, but those were either bespoke tests or just "being clever" thinking through what would happen if it did and making sure it didn't cause a problem. In lib/xwrap.c function xpopen_both() has multiple comments about it, but what if other cases care but don't have a comment? So I need to audit all the users of *open() or xcreate() (and really "connect()" and similar that can return filehandles, I remember caring about this when doing netcat). And of course sometimes you WANT to open stdin/stdout/stderr, so I added xcreate_stdio(), xopen_stdio(), and xopenro() and need to swap those in as appropriate.
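A minimal sketch of the idea (function name hypothetical, not the actual lib/xwrap.c code): if open() lands on fd 0-2 because the process was launched with a standard filehandle closed, move the result up out of the way:

#include <fcntl.h>
#include <unistd.h>

int open_notstdio(char *path, int flags, int mode)
{
  int fd = open(path, flags, mode);

  if (fd > -1 && fd < 3) {
    // duplicate to the first free filehandle >= 3, then close the low
    // one so later code can't mistake this data file for stdin/stdout
    int fd2 = fcntl(fd, F_DUPFD, 3);

    close(fd);
    fd = fd2;
  }

  return fd;
}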

But I can't check in the results of a global pass on the tree with 54 dirty files in the tree (many of them one line changes reminding me to Do This Thing when I get around to it).

Before all _that_ I was doing a wget implementation, because Lipi Lee sent me one that didn't work. (When I tried to fetch a static file with it, there were unrelated numbers embedded in the text.) So I started studying THAT (I saw a nice presentation on it at last year's Texas LinuxFest), but I got interrupted partway through and would have to restart the research because it's not fresh in my head anymore...

So yeah, big less rewrite, big dd rewrite, big expr rewrite, changes to more and mke2fs and date, I see I was finally teaching rm to do the infinite recursion thing which posix requires (dirtree currently keeps an open filehandle to each parent directory as it descends, default ulimit is 1024 filehandles per process)...

I was _trying_ to open the sh can of worms last time I had a cleanish tree, but that's not what everybody else wanted me to work on.


July 26, 2016

The "job offer" in San Diego was apparently more an offer of an offer, I.E. the recruiter said they wanted me and were putting together the paperwork, then they were interviewing "just one more candidate they'd promised to talk to", then two days of "they'll have an offer by the end of tomorrow"... Yeah, I know this dance. It's the not happening dance.

Given that I was pretty ambivalent about it anyway (work's scraped together enough cash that I won't miss a mortgage payment before they get their proper funding, they're offering me stock (which is a nice gesture), and I went ahead and did Banking Things to get reserve money anyway)...

$DAYJOB was willing to let me go down to half time for 6 months (since I'm already on half pay anyway, it seemed fair, although it might have impacted how much stock I got), meaning if I'd taken this job I'd have been working 60 hour weeks for the next six months (full time day job plus evenings and weekends telecommuting) but would have gone back to full time pay PLUS $25/hour PLUS half my old salary on top of that... Definitely making up for the unpaid couple of months there and replacing a lot of the money the flooding cost us.

But if it's not happening, that's fine too?

I'm ambivalent about startup stock. In publicly traded companies it's great, in privately held companies there's nobody to sell it to so what it's "worth" is arbitrary. Small company stock has never turned into money for me in my entire programming career, but it's a nice gesture. And who knows, this time maybe it will? (Jeff made out well with his previous company, that's how he got the money for his equity stake in this one.)

Sigh. I've never been motivated by money, but I am grudgingly motivated by a _lack_ of money. My reply to money becoming an issue is to try to spray the household down with cash until the problem goes away again (which can have some serious latency before it kicks in). Mostly I've chosen my career based on work I consider interesting (which is why I've never done Windows), and back when I was single I did a lot of "work 6 months, take 4 months off" things, which is where I got a lot of open source development done back before it was an employable skill. I'm aware I'm incredibly lucky that things I like to do are things people tend to pay for, but that's also why I'm the member of the household who has to stay employed. Both of them working full-time wouldn't make up for the loss of half my salary. (It's also why I'm the one there's disability and life insurance on.)

In this case, money-earning-strategies went into play before lack-of-money receded from immediate crisis to annoyance. Those strategies sort of started to pay off about when I stopped pushing them quite so hard, and if it winds up going over the finish line into "much money for 6 months" land, sure I'd take it. Maybe try to pay off a chunk of the mortgage on the Very Nice House I bought for a pretty girl I like (and which makes another pretty girl I like quite happy, plus there are way too many cats for us to get an apartment anyway).

But if it doesn't happen, that's more time to work on j-core and toybox, and I retain a stronger relationship with a company I seriously enjoy working for when it's actually functioning as a place of employment, and which is likely to be back to issuing checks on time for the full amount sometime in the foreseeable future, so I can stop worrying about money again.

Six of one...

Oh, fun trick about recruiters: if you don't send them regular follow-up emails they forget you exist. Almost always. Doesn't matter what stage of the process you're in, just replying to what they send you isn't enough. (This seems to be the case with just about any potential business relationship. And several conventions.)

You can tell I haven't been seriously job searching since work started half-paying me again, because I've stopped proactively emailing all the OTHER recruiters asking how it's going, which means I've stopped hearing back from them. And I haven't replied to the last few cold-call emails out of the blue to put new irons in the fire.

In my experience it takes a couple dozen leads to turn into an actual job, which is why when I'm seriously searching I continue to engage with recruiters right up until I get an offer I'm willing to accept. Yes, it's occasionally resulted in hard decisions between multiple offers, and I feel BAD about turning down offers (like I led them on). But right now I'm on the other end of that. The company in question was interviewing other people after the recruiter who was trying to earn a commission from placing me there was convinced they were going to offer me the job (and thus him a check) the next day. (I suspect this recruiter is new at it because he continued to be convinced it would happen the next day for like 3 days. Either that or it's a sales thing. I was never good at sales.)

I sort of know what it takes TO be good at sales. I could do a push to try to CONVINCE the San Diego people that I'm their guy (bypassing the recruiter, although legally he'd still get his cut if they bit). Heck, I'm pretty sure I could have made the Linux Plumbers' FPGA panel happen (more than 50% chance anyway) just by fighting for it. But when I figure out that somebody doesn't actually _want_ me, I tend to lose interest. There's just so much else I could be doing with my time and energy.

Current $DAYJOB? They want me. It's a nice feeling.

(P.S. Sometimes "not being wanted" is something you have to overcome. I asked Google's Director of Open Source back in 2013 if I could interest him in my Toybox project and he said no. So I went away and did another couple years of work on it to make it a better project, and eventually other people got interested and the code went in a different way, because the goal wasn't to impress Google but to turn smartphones into development workstations so when smartphones kick the PC up into the server space (and it becomes expensive big iron because the unit volume diminished), we're not stuck with the passive read-only future Apple wants to perpetually charge us for access to. Towards that end, I was happy to toil in obscurity for years and just do the work. Ok, and occasionally shout down zealots whose overly specific vision of Free Software circa 1979 utterly failed to address modern reality, but they were less relevant to the outcome than SCO or Oratroll, let alone Apple.)


July 25, 2016

Saw the doctor, who said 90% of all tonsillitis is viral so there's nothing they can do about it, and that my symptoms sound a bit like mono, which she would need to draw blood to confirm (yes, I still have a phobia of needles), and if I DID have that there's nothing they could do about it and _that_ has to run its course too. (If I'm still taking sudden 6 hour naps next week, it's probably mono. Glee.) And I got a referral to an ophthalmologist for the eye thing, so I should go make another appointment.

My replacement power adapter I bought in Akihabara for 980 yen finally died last night. It's been fiddly for a few weeks, with the cable needing to be in the right position (downside of carrying it everywhere in my messenger bag), but it finally wouldn't charge at all, and of course I found out with the battery almost exhausted, so it died before I dug up another one and I lost all my open windows again. I wonder what I was working on? (Well, more than the top 5 things anyway...) On the bright side, one of my previous netbooks used a compatible adapter, so I didn't have to go out and buy something.

Circling back to the dd cleanup: how do you test this? Half the point of this command is micromanaging blocking granularity (for some reason), but how do you detect that?

Answer: you make a tool for it. Sigh. (The existing 90 lines of dd test cases contributed by a third party don't ever check this, of course.)

The problem is if I add a toys/other/count.c or toys/examples/test_blocks.c it won't reliably be in the $PATH, because standalone builds only put a single toybox command in the $PATH and use the host path for the rest of it. (That's a design decision, I don't want the cat test to fail because I changed sed. There's a 'test everything' mode ala 'make tests', and a test one thing mode ala make test_dd. When testing one thing, I just test that one thing. Also, I could "TEST_HOST=1 make test_dd" which tests the host version and doesn't even build toybox.)

So what I probably need to do is make a simple test command as a "here" document and compile it as part of the dd test. That's moderately horrible, but doable if I can assume the test system has "cc". (Can I assume that? Dunno.)
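Something along these lines, maybe (a hypothetical sketch, not the existing count.c): a tiny reporter that prints the size of each read() it receives, so the test can see what blocking granularity actually came through:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
  char buf[65536];
  long total = 0;
  ssize_t len;

  while ((len = read(0, buf, sizeof(buf))) > 0) {
    fprintf(stderr, "read=%zd\n", len);  // one line per read() call
    total += len;
  }
  printf("total=%ld\n", total);

  return 0;
}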

The OTHER problem is that Linux pipes are buffered, so data output into them or input from them doesn't retain blocking. (Nagle on steroids.) So I tried mkfifo, and in window one:

$ dd if=/dev/zero of=fifo bs=128 count=3
3+0 records in
3+0 records out
384 bytes (384 B) copied, 0.000706315 s, 544 kB/s

Window two:

$ ./count < fifo
in=384
out=384

Buffered again. There's a command called "stdbuf" in coreutils but apparently that sticks an LD_PRELOAD library in via the dynamic linker to adjust the stdio.h buffering in FILE objects (not an ioctl called on pipes).

There's a command socat (not installed by default) or unbuffer (ditto) that can allocate a pty, and those (maybe) aren't buffered? Except when I sat down to write a forkpty() user, I wound up basically needing the netcat infrastructure to do poll/select on the input and output. (Otherwise it's just about guaranteed to block waiting for itself.)
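For reference, a minimal sketch of the poll loop such a tool ends up needing, assuming forkpty() has already handed back the pty master filehandle (error handling mostly elided):

#include <poll.h>
#include <unistd.h>

// pump data between our stdin/stdout and the pty master so neither
// side blocks waiting for the other
void pump(int ptyfd)
{
  struct pollfd fds[2] = {{0, POLLIN, 0}, {ptyfd, POLLIN, 0}};
  char buf[4096];
  ssize_t len;

  for (;;) {
    if (poll(fds, 2, -1) < 1) break;
    if (fds[0].revents & POLLIN) {
      if ((len = read(0, buf, sizeof(buf))) < 1) break;
      write(ptyfd, buf, len);   // feed the child's terminal input
    }
    if (fds[1].revents & POLLIN) {
      if ((len = read(ptyfd, buf, sizeof(buf))) < 1) break;
      write(1, buf, len);       // relay the child's terminal output
    }
    if ((fds[0].revents | fds[1].revents) & (POLLERR|POLLHUP)) break;
  }
}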

Technically I could also run it under strace and try to grep the strace output for "read" and "write" commands, but _EW_. But that's how I've been testing it by hand, and thus it's yet _another_ command that's reasonably straightforward to test by hand but not via automation. (Like top and ifconfig and...)

(And no, the answer to _this_ problem isn't to enable llvm's overflow checker either. Despite the endless thread on the list.)


July 24, 2016

My attempt to add --color to grep has forkbombed into a large set of changes, many of which are theoretically unrelated to what I'm trying to do at the moment, but they're hard to fix separately because there's no point fixing old infrastructure I'm redoing anyway.

I checked in a bunch of new tests that I need to fix, but it's not complete. The most recent 'huh, I should add a test case' is that the grep man page says grep -m NUM stops reading after matching NUM many lines, but "grep -C 2 -m 5 toybox README" shows 2 trailing context lines after the last match, so it doesn't stop READING, it just stops MATCHING. Wheee...

And I think I've confirmed that order of -e -f entries doesn't matter for grep (unlike sed) because:

$ echo "abcdefghijklmnopqrstuvwxyz" | grep -o -e 'de' -e 'defgh' -e 'fghijklmn' -e 'lmnopqrst'
defgh
lmnopqrst

It won't start a new match within an existing match, but it takes the longest match at each starting position regardless of pattern order. (Again, I need a test. And yes, this is what I've been doing during the endless "but automatic checker du jour is what finds the truly _important_ bugs" thread.)

So anyway, the big design issue that's been plaguing grep.c for a while is how to handle multiple patterns. What I've been doing is gluing them together into a single "pattern|pattern|pattern" regex, which has annoyed Rich from day 1 but has the theoretical advantage of being significantly faster on long strings. (There's no libc regexec() variant that takes an array, so I'd have to search the string multiple times, which is O(N) slower than doing a single pass just from the memory bus traffic (at least for any line longer than the processor's dcache).)
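Roughly like this (a sketch assuming -E style ERE, where | is alternation; BRE would need the GNU \| extension instead):

#include <regex.h>
#include <stdlib.h>
#include <string.h>

// join all the -e patterns with | and compile the result once
int glue_patterns(regex_t *reg, char **pats, int count, int cflags)
{
  size_t len = 1;
  char *big, *s;
  int i, rc;

  for (i = 0; i < count; i++) len += strlen(pats[i])+1;
  big = s = malloc(len);
  *big = 0;
  for (i = 0; i < count; i++) {
    if (i) *s++ = '|';
    strcpy(s, pats[i]);
    s += strlen(pats[i]);
  }
  rc = regcomp(reg, big, cflags|REG_EXTENDED);
  free(big);

  return rc;  // 0 on success, same as regcomp()
}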

But somebody (Rich? Can't find the email right now) pointed out:

$ echo abcabcdefghighi | grep -o '\(.*\)\1'
abcabc
ghighi

And multiple collated parentheticals would in theory get the backslash numbering wrong. (That test is taken more or less from the posix BRE description, by the way.) Except when I tried it, I only got the first abcabc from the above test (no matches at all after, it seems to have aborted?), and adding an -e '\(123\)' before the second pattern didn't affect the output: it looks like the \1 is local between | at least with glibc?

That "at least with glibc" leans towards redoing this with separate regexes though. There's bionic, musl, and glibc that should all work, and maybe others although uClibc is toast at this point.

I think I've come up with a semi-sane way of avoiding _too_ much repeated work with the list version: save the last match context for each regex in the list, and only redo ones that aren't "no match" or past the end of the most recent match.
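A sketch of that caching scheme (names hypothetical): each compiled pattern remembers its last regexec() result and only gets re-run once the search position moves past the cached match.

#include <regex.h>

struct reg {
  struct reg *next;
  regex_t r;
  regmatch_t m;   // cached match, offsets relative to start of line
  int rc;         // cached regexec() return value
};

// find the leftmost-longest match across all patterns at offset >= start;
// assumes each node starts with rc = 0 and m.rm_so = m.rm_eo = -1
int multi_match(struct reg *list, char *line, regoff_t start, regmatch_t *best)
{
  struct reg *r;
  int found = 0;

  for (r = list; r; r = r->next) {
    // REG_NOMATCH from an earlier position means no match further right
    // either, so that cached result stays valid forever
    if (r->rc == REG_NOMATCH) continue;
    if (r->m.rm_so < start) {
      r->rc = regexec(&r->r, line+start, 1, &r->m, start ? REG_NOTBOL : 0);
      if (r->rc) continue;
      r->m.rm_so += start, r->m.rm_eo += start;  // rebase to line start
    }
    if (!found || r->m.rm_so < best->rm_so
        || (r->m.rm_so == best->rm_so && r->m.rm_eo > best->rm_eo))
      *best = r->m, found = 1;
  }

  return found;
}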

Oh, and I need to lift ghostwheel() (I mean regex_null()) out of toys/*/sed.c and put it in lib and then use it here and add tests to match strings after null bytes, because on ubuntu:

$ grep -o -a '.*ld-linux.*' /bin/ls

shows me the dynamic linker, albeit surrounded by garbage, but you'll note that garbage ends at NUL bytes in both directions! No it doesn't. Sigh, I don't understand the criteria for this, grep without -o is giving me the whole line between newlines but add -o and it's _much_ less data... seriously, what is the difference between -o and non-o in that? (I ran the -o output into hexdump -C and still don't understand why the .* is stopping where it is.)

Anyway, the _point_ is it should be useful to do stuff like:

$ grep -qo 'ld-linux-x86-64' /bin/ls && echo yes
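What lifting that helper into lib/ might look like is roughly this (a sketch; REG_STARTEND is a non-POSIX extension, but it's widely implemented, glibc and musl both have it):

#include <regex.h>

int regexec0(regex_t *preg, char *string, long len, int nmatch,
             regmatch_t *pmatch, int eflags)
{
  regmatch_t m[1];

  if (!nmatch) nmatch = 1, pmatch = m;
  // REG_STARTEND takes the search bounds from pmatch[0] instead of
  // stopping at the first NUL terminator
  pmatch[0].rm_so = 0;
  pmatch[0].rm_eo = len;

  return regexec(preg, string, nmatch, pmatch, eflags|REG_STARTEND);
}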

July 20, 2016

The downside of having studied medicine in college (even if I didn't wind up pursuing it because needles) is I have SO much material to be a hypochondriac about. So I'm either freaking out, or overcompensating in the other direction.

I've recently tried to cut aspartame out of my diet, because that's my current best guess at what's been causing the optic nerve swelling. (My initial symptoms started years ago when I was drinking those lemon rockstars, which are full of artificial sweeteners.) Of course this means I've switched to making tea with splenda (because diabetes wouldn't be a lot of fun either), but things seem on the whole to have gotten better since I did that?

Except the OTHER symptom of optic neuritis is flashing in the blind spot, where the optic nerve goes into the retina, and I've been having lots of that even when the edges of my eyes aren't doing it. (It started a few years ago, but very intermittently. Now it happens 3 or 4 times after I lie down in bed each night, and I notice it walking to and from the grocery store at night. It's CREEPY.) (Oh, and there's a condition where the pressure in your cerebro-spinal fluid is too high, and one of the symptoms of THAT is similar to optic neuritis but slightly different, although if left untreated it can damage your retina too. Never a shortage of material...)

And in the last few weeks I've been noticing the blind spot in my left eye a lot more. It seems bigger than the one in my right eye, and I can fairly easily position it to occlude things on an otherwise solid color surface, which is a lot harder to do with my right eye. (Creepiest thing was staring at the horizon while biking on a sunny day and watching a bus vanish into it and come out the other side. Of course I was on the right side of the road so couldn't easily compare with the other eye, but dude: creepy.)

The problem is that there's so many things wrong with my eyes (nearsighted, astigmatism, full of floaters, developing cataracts, visual migraines, and now the nerve swelling), which stack negatively, that I tend to get hyper-vigilant about "is this a new problem? Will it be progressive and become a Major Thing in 5 or 10 years?" So far the ONE thing that's been in good shape is my retinas, and I'd really like to keep them.

I had my eyes dilated and my retinas examined by a professional a few months ago, along with field-of-vision tests, and she said they were fine. I'm _supposed_ to have a blind spot in each eye, possibly there's a cataract that my brain has decided to edit out along with the original blind spot? No idea. Making a doctor's appointment to have it looked at again. (The weird thing is the migraine stabbiness is usually behind my RIGHT eye and this is my LEFT eye having a New Visual Thing.)

Well, I actually made the doctor's appointment to figure out why my throat seems so swollen (my hypochondriac streak chiming in "CANCER!" and being dutifully ignored), in a way that's really uncomfortable (especially at night) but doesn't seem to block my airway, and then I found out Fade's friend Stina has tonsillitis and I went "oh, that would explain it, yes". Seems to be clearing up on its own. But as long as I've got the appointment anyway, I can A) have them confirm the diagnosis, B) have them frown professionally at my eyes.


July 18, 2016

One of the recruiters got back to me with a job offer in San Diego, a 6 month gig paying $25/hour more than this job did back before it went half pay.

The logistics would be weird with Fade going up to Minnesota at the end of next month (figuring out how to get the car everywhere it needs to go is fun), but digging back out of the financial hole left by the flooding plus a couple months of de facto unemployment is really attractive.

Honestly not sure how this'll work out. I _like_ my job, but if you don't get paid it's not a job and I do have a family to support. Hmmm.


July 16, 2016

The recent option parsing changes broke the multiplexer, so "./ls -l" worked built standalone, but "./toybox ls -l" didn't.

The problem I was trying to solve is that if the base command is disabled but an OLDTOY() for it is enabled, the base command's option string was a null pointer, so the NEWTOY() option parsing couldn't work. The NEW problem is that the multiplexer itself has an option string that SHOULD always be a null pointer (thus skipping the call to lib/args.c entirely), and now isn't, so the "toybox" command is attempting to parse the -l in "ls -l" above and rejecting it as an unknown option.

Feeding 0 or NULL to optflags in NEWTOY() macros indicates that the option parsing should be disabled. This lets lib/args.c drop out (be eliminated by the compiler as dead code) when no commands are actually using option parsing.

But over the years option parsing has gotten more complicated, and we don't use the macro's argument directly to create the toy_list[] array at the top of main.c anymore. Instead we wash everything through scripts/mkflags.c which produces #define OPTSTR_command macros for each command with an adjusted version of the string, replacing the dropped options (disabled by USE_CONFIGSYMBOL() macros) with ascii 1 (I.E. Ctrl-A) spacers so the FLAG_ macro values line up now that they don't move around when things are disabled (because you can't FORCE_FLAGS on for shared infrastructure unless you leave a gap).
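For a concrete (lightly simplified) example using patch, which comes up again below: with CONFIG_TOYBOX_DEBUG switched off, a source line like

NEWTOY(patch, "(dry-run)"USE_TOYBOX_DEBUG("x")"d:ulp#i:R", TOYFLAG_USR|TOYFLAG_BIN)

washes through scripts/mkflags.c into something like

#define OPTSTR_patch "(dry-run)\001d:ulp#i:R"

with the \001 holding x's bit position open so FLAG_d and friends don't move.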

To generate this stuff, scripts/make.sh runs each NEWTOY() line through the C preprocessor (cc -E) twice, once with a config.h produced from the current config and once with allyesconfig. It then glues those together into a set of lines that look like:

commandname "thisconfig optstr" "allyesconfig optstr"

The quotes are because the strings can have spaces in them (which lib/args.c documents as meaning a command that takes an argument MUST have a space after it, because "kill -stop" was being interpreted as "kill -s top", see commit b081ce989991).

The above lines are the input scripts/mkflags.c uses to produce the various FLAG macros for enabled/disabled stuff, examining the differences between defconfig and allyesconfig to do its job.

Unfortunately mkflags.c uses this fscanf to read each line:

bit = fscanf(stdin, "%255s \"%1023[^\"]\" \"%1023[^\"]\"\n", command, flags, allflags);

And if it gets a 0 or NULL the quotes around the argument won't match, and if it even gets "" with nothing in it the line fails to match because the %1023[^\"] must match at least one character.

So what scripts/make.sh did was hit it with sed (see comment starting on line 180) to turn unquoted inputs (0 or NULL) into " ", and then it stripped leading spaces to get back to a "" input, which it would check for and output as an unquoted 0.

The recent change checked if our "thisconfig optstr" was " " (I.E. entirely disabled), and if so used our allyesconfig optstr instead, which did not get further processing (such as skipping leading " " and then seeing if the result was an empty string that needs to be turned into a null pointer to skip option parsing).

This change broke the "toybox" command, which has a null option string so it doesn't parse the command line options of the command it's handing off to. There's a REASON some commands don't go through option parsing at all (and it should drop out for standalone commands that aren't using it, saving a few kilobytes).

So SOMETIMES the option string needs to be NULL, and sometimes it shouldn't be. And figuring out when to do each is tricky.

Way back when, it was NULL too often. An oldtoy doesn't have its own config string, because it has the same options/flags as another command. However, the wrapping config option (the USE() macro around the entire NEWTOY) is different for the oldtoy, because it has to be to make standalone builds of single commands work. (The build infrastructure sets the CONFIG option for the command it's building, so if the OLDTOY is under a different wrapper it won't get enabled. If it DID enable a different macro, then the NEWTOY() would get enabled and could show up in the command list first, and single builds skip the whole command search stuff and just hardwire that they're the first element in the array, and the things checking the command name to change behavior need to see the right name even built standalone...)

Again, as far as the data passed in to mkflags is concerned, the command is disabled. This means its selected flags list is " ". But I can detect that thisconfig and allyesconfig don't match, and the fix was to use the allyesconfig string when they differ. In the case where the USE() macro around a NEWTOY is disabled, it doesn't matter what optstr we emit because the entry won't wind up in the command list.

That was the most recent change: use allyesconfig, skip leading spaces, and if it's an empty string output 0 to skip option parsing. But the downside of this all-or-nothing situation is that we never see any USE macros in the middle of the option string an OLDTOY is borrowing from a NEWTOY. When one command uses another command's option string under a different USE macro, 1) you have to FORCE_FLAGS to get any of the FLAG macros switched on in the OLDTOY (which switches on _all_ of them), 2) we don't get a partial string to produce an elided CTRL-A version from, it's flattened to " " by the disabled outer USE() macro before mkflags gets it.

So right now OLDTOY isn't compatible with USE macros in option strings, and fixing that would be such an intrusive change I'm not sure where to start.


July 12, 2016

Yay, $DAYJOB claims to have given me half a paycheck! It hasn't shown up in the bank account yet, but presumably progress. If they do that again before the end of the month we can pay the mortgage without cashing out something tax-deferred to do it.

They claim they can now keep doing this until the October funding round. They're a little unclear on whether they'll give me the money for the back paychecks or stock in lieu when said funding round happens, but we'll burn that bridge when we come to it, I expect. (If they promised to pay me all the back pay in October, I'd tell the recruiters to stop looking. But... they haven't. I asked for clarification and heard nothing back. And stock _instead_ of salary, when the company has money again, would be disturbing.)

Elliott wants grep --color support, listed it as the reason he hasn't switched over from the other grep yet, and it turns out to be kinda hard to implement. When --color is on, you're basically doing a grep -o all the time, only instead of selectively displaying, you're selectively coloring. This requires a rewrite of the outline() function, which is already kinda complicated.

One way I could do it is add a linked list of ranges. Or put an array of ranges in the space leftover in toybuf (after regex_t gets taken out of the start, which is 64 bytes on x86-64 glibc, of 4k toybuf size). The failure mode is not coloring matches...

Except I'd want to join the -o logic with the coloring logic, and that's a bigger failure if we run out of space. 2*sizeof(int) is 8 so that's around 500 hits, is it a realistic limit? For display purposes no, for correctness of output purposes any arbitrary limit makes me uncomfortable. Which brings us back to the linked list, which is impolite to NOMMU but should still work there.

The outline() function exists because each line of output can have a lot of prefix data: with -H (or multiple search files) you output name: (except that colon is a - for context lines). For -n you output the line number. For -b you output the byte offset. And only THEN do you output the actual match. For "grep -Hnbo toybox README" it outputs three prefixes for each partial match.

Unfortunately, the actual line data being printed out diverges quite a bit, one of the callers wants to print a number, others print part of a line and others the whole line. And now --color adds a new type where a list of ranges gets colored differently, so I either need to assemble a large string from chunks or create a linked list of ranges to feed that data into outline().

The alternative is to make a prefix() function that spits out just the prefix data, and then write the rest of the line myself at each call site. There's some unavoidable duplication with that approach, but one of the callers wants to output a number instead of a string (color coded as white line data, not green prefix numbers), and setting up a linked list of ranges for the context/match/context/match lines is probably more plumbing than just moving the work to the five callers.

Half the downside of the prefix() approach is FLAG_Z changing the output line terminators from newline to null, each of those five split out print statements needs to end with "%c" and add TT.outdelim to the argument list. Or add an xputc(TT.outdelim) line.

Still probably better than the linked list approach.

The other restructuring part is that --color basically triggers the FLAG_o logic, but outputting skipped parts of the line instead of a newline and a fresh prefix, and then flushing the rest of the line at the end before outputting the newline.
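Either way the inner walk looks about the same (a sketch, names hypothetical): emit the skipped chunk plainly and the match in color, which is exactly the -o walk with printing swapped in for skipping.

#include <stdio.h>

struct range {
  struct range *next;
  int start, end;   // byte offsets of one match within the line
};

void color_line(char *line, int len, struct range *list)
{
  int pos = 0;

  for (; list; list = list->next) {
    printf("%.*s", list->start-pos, line+pos);          // skipped chunk
    printf("\033[1;31m%.*s\033[0m",                     // match, in color
           list->end-list->start, line+list->start);
    pos = list->end;
  }
  printf("%.*s", len-pos, line+pos);                    // rest of line
}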


July 11, 2016

At Texas LinuxFest I met Steve French again (who I had dinner with at a previous TXLF but couldn't remember his _name_ until I saw his talk this year and went "that guy, right") who walked me through the fs/cifs/smb2pdu.c file in the kernel source that implements the packet sending and receiving for the SMB protocol. I'm pretty sure I can make a quick and dirty posix-mode smb server for toybox just from that, which would be useful.

I also saw Kir Kolyshkin's talk. (The coolest guy at Parallels during my brief stint working with them. He was there with a co-worker who I THINK I remember from my visits to Moscow long ago, but didn't remember the name of and was too embarrassed to ask. In my defense it was after my second talk and I was totally fried.) I'd asked him questions before (at the one and only Linux Plumbers conference I attended) that I hadn't been able to follow up on my own, so this time I asked him about container stuff (I've got unshare and nsenter, but that's only half of containers; I need an init handoff program and something to poke cgroups).

Kir pointed me at a talk Jérôme Petazzoni gave on "what are containers made from", which Kir said showed how to set up a container from scratch manually from the command line. Unfortunately, 75% of the talk was theory (most of which I already knew), and by the time he opened a command line to start showing how to set up a container (at the 41 minute 30 second mark) there were only 11 minutes left in the talk. He made it through unshare, pivot_root, cleaning out the old mount points and mounting a new /proc, then started to show how to create a veth device on the host and move it into the new container's network namespace but couldn't get it to work and ran out of time to debug it.

There's still useful information here, but it was mostly stuff I already knew. I even already had one way to set up networking, and the practical part of the talk didn't have time to even touch cgroups. I'm pretty sure the last ten minutes would have been exactly what I needed to know... if the talk had been ten minutes longer.


July 10, 2016

I've got a toybox design issue I've never been able to solve, which manifests as "truncate -s 8g" works on 64 bit, and doesn't work on 32 bit systems.

The toybox argument parsing logic initializes the GLOBALS() block with an array of longs, relying on the LP64 standard to have "long" and "pointer" be the same size. The downside of this is that there are subtle behavior changes between 32 and 64 bit argument parsing, specifically it's really hard to feed 64 bit integer arguments into 32 bit programs, but 64 bit ones handle them fine. (Hence the truncate -s 8g example above.)

I've been thinking for a while that I should somehow convert the globals array to always be 64 bit, and that atolx_range() should really be atollx_range(), I.E. 64 bit all the time. Unfortunately this is a really intrusive change (need to change every command's GLOBALS block), and it's also darn AWKWARD because I need to convert all the "long" arguments to a guaranteed 64 bit type (which long long technically ISN'T, it could be 128 bit! Somewhere, someday, the spec doesn't require it NOT to be.) And worse, I need padding between pointers on 32 bit, so maybe I need some kind of PTR_PAD() macro? I dunno what that would look like, and it would be a NOP on 64 bit.
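A quick demonstration of the 32-bit limitation described above: the option slot is a long, and 8g doesn't fit in 32 bits.

#include <stdio.h>

int main(void)
{
  long long want = 8LL << 30;  // 8g = 8589934592, what -s 8g should mean
  long slot = (long) want;     // what a 32 bit GLOBALS slot can hold

  printf("want=%lld slot=%ld\n", want, slot);
  // LP64 prints both as 8589934592; ILP32 truncates slot (typically to 0)

  return 0;
}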

I've largely just been letting 32 bit be subtly limited (ala truncate above) and letting everything move gradually to 64 bit, but... 32 bit is likely to be around in the embedded space for a while? I dunno what Android's plans here are...

I just bumped into this again because toys/pending/dd.c has its own strsuftoll() primarily because of the long long vs long thing. (And atolx_range doesn't support 'w' suffix, which isn't in posix anyway.) I can convert that to use atolx_range() and have it crunch down to 32 bits on 32 bit, or I could implement an atollx_range() but having 2 in parallel is annoying, or I could convert the existing one to long long which is intrusive but less intrusive than changing the globals array...

I've defaulted to "no action" on this for a while now because 64 bit should eventually take over and render it moot, but I'm open to suggestions.

(Another thing I could do is add a new integer type (or convert the existing one) that consumes two slots in the globals array on 32 bit, but not on 64 bit, which... ew? But it would let you say uint64_t in the globals to receive the argument...)


July 9, 2016

Gave my Building Android on Android talk this morning. I couldn't get my netbook to talk to the projector yesterday except in text mode, and today I wanted to show people lots of web pages (wound up borrowing a windows laptop to do it), because ubuntu found a new way to fail!

The talk before mine ran long, so despite showing up early I didn't get to even try to associate my netbook with the projector until 5 minutes into my scheduled start time. Ubuntu 12.04 could autodetect projector resolutions when I plugged in external VGA, and would change my laptop's resolution to match. Ubuntu 14.04 lost this ability, function-F5 now pulls up a "select output type" menu rather than toggling through the available options, and doing this suppresses the resolution change somehow. So I pulled up the display menu myself and guessed that the projector might like 1024x768, which it didn't (the magic value was 1280x1024), but while hitting function-F5 a lot I accidentally hit the function-F7 key, which did NOT pull up a menu, instead it disabled my touchpad. So suddenly my mouse pointer stops working and I have no idea how to get it back, and I'm now 7 or 8 minutes into my talk and can't get output on the screen. (Also, disabling the touchpad disabled the _keyboard_ for some reason. Only in X11, it still worked fine on the ctrl-alt-F1 console so I knew it was a stupid Linux software problem, not a sudden loose wire in the box.)

So there I was fiddling with a frozen mouse cursor and a display paying no attention to what I type, but the clock still updates and when I hit Function-F5 it pulls up a display resolution changing menu which I CANNOT INTERACT WITH (because it's ignoring the mouse and keyboard). I can ctrl-alt-F1 and type at the text console all I want though, and hilariously THAT output shows up on the projector, which is how I'd done my first talk (The Prototype and the Fan Club, the previous day). But this one was all web pages I wanted to show people in order...

I borrowed somebody else's machine to give the actual talk, somewhat extemporaneously, pulling up web pages as I thought of them rather than my carefully curated list with the highlighted sections. (Did you know chromium remembers copy-and-paste style mouse selections per-tab and retains them when you switch tabs? Very useful...) Anyway, after a less coherent talk than I meant to give (again, a txlf tradition at this point), I went back to trying to fix my netbook.

At this point I realized Ubuntu was remembering this problem (which I at first blamed on the 1024x768 setting confusing X11 somehow) across reboots. So no, rebooting my machine doesn't fix it either. And renaming /etc/X11 to /etc/X11.bak didn't make it stop doing this either, nor did deleting or renaming every ls -alotr file in my home directory dated today. I STILL haven't figured out where Ubuntu is saving the currently selected screen resolution, but wherever it is it's REALLY NON-OBVIOUS.

Ubuntu has escaped its unix background enough to hide a config setting somewhere I can't find it from the command line, but NOT enough that doing anything even slightly nonstandard with its GUI doesn't outright brick the box. The Aristocrats! Linux on the Desktop!

(I eventually just asked a room full of linux geeks after the next panel if anybody there was an X11 expert, and somebody loaned me a mouse and asked if my touchpad had a kill switch. How long it would have taken me to guess on my own, I have no idea...)


July 7, 2016

Prepping talk material for Texas LinuxFest. I'm doing "The Prototype And The Fan Club" again (which I did at Flourish in Chicago years ago, and although I'm quite proud of the talk I gave, the video missed the start of it and the audio quality is terrible). And I'm doing "Building Android On Android".

I'm also juggling about four different recruiters, who are setting up phone interviews with various potential employers. $DAYJOB has managed to scrape up a little money from a Canadian government program, and is working out how much they can pay us between now and October.

Back when this first happened I told Jen (my immediate supervisor) that if this wasn't resolved by my Texas Linuxfest talks, I'd use this whole mess with 3M trying to destroy my employer as an anecdote in a talk, being explicit about WHO was patenting our technology and then trying to put us out of business so we couldn't resist the patents. But now that there's a nonzero chance work might pay me enough of my nominal salary to keep up with my mortgage without having to cash out anything tax-deferred... Sigh. Alright, pass this time: they say they have money in the bank and I believe them. None of what's happened is Jen or Jeff's fault (they haven't been paid in as long as I have, and all their money is tied up in the company; it all happened because other people didn't do what they'd told THEM they would do, and they found out at the last minute with no chance to do anything about it before things hit the fan).

I'm still talking to recruiters, though...


July 2, 2016

Heard back from Michael Kerrisk about utimensat with NULL, who said:

Okay -- I added the following text to the man page:

Note, however, that the glibc wrapper for utimensat() disallows passing NULL as the value for file: the wrapper function returns the error EINVAL in this case.

So the "fix" is to change the documentation to commemorate glibc's breakage. Sigh.


June 29, 2016

What on earth does "touch -" do in the ubuntu version? I ran it under strace, and as far as I can tell it does nothing. (But successfully! Returns 0.)

Ah, here it is:

utimensat(1, NULL, NULL, 0) = 0

And yes, the linux kernel special cases a NULL path argument to make the utimensat syscall implement futimens() on the first argument instead. So touch - > filename would touch the filename. (Rich's example was "(while true; do sleep 3; touch -) > filename" which updates the timestamp of filename every 3 seconds.)

But if you feed a zero to the second argument of glibc's utimensat() wrapper it flips out, with BOTH a "__THROW __nonnull (2)" header guard in sys/stat.h _and_ the wrapper function checks for null and doesn't call the system call in that case.

The SAD part is the Linux kernel extension to do this was added in 2007 (commit 1c710c896eb4) by Ulrich Dr. Pepper, who was the glibc maintainer at the time (before he left to work for The Bank of Evil). So glibc filtering out the ability to do what the kernel lets it do is EXTRA INSANE.

So I need to figure out if I should have an "if this then futimens() else utimensat()" when both cases boil down to an actual call to utimensat anyway, or if I should let gcc throw a warning (and fail to work) when built against glibc and point all bug reports at glibc (works fine in musl and bionic!), or just ignore the "-" special case that isn't in posix anyway.
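The first option is less bad than it sounds, since both branches are one line (a sketch; the wrapper name is made up):

#include <fcntl.h>
#include <sys/stat.h>

int touch_fd_or_path(int dirfd, char *path, struct timespec ts[2])
{
  // glibc's utimensat() wrapper returns EINVAL for a NULL path even
  // though the raw syscall accepts it, so route that case through
  // futimens(), which ends up at the same syscall anyway
  if (!path) return futimens(dirfd, ts);

  return utimensat(dirfd, path, ts, 0);
}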


June 28, 2016

My todo heap has grown lots of spiky bits over the past month, and I'm going through and mowing them down.

For example, I've been reading the Busybox mailing list, albeit mostly lurking, just to see if people there suggest things that might be relevant to toybox. A few weeks back somebody complained that md5sum -c emptyfile didn't produce an error message, and I hadn't bothered to implement -c in toybox but apparently somebody was using it, so I started an implementation, got distracted by the fact that I need a generic looplines() but the one I did in sed.c isn't generic in the right way, and threw the half-finished thing on the todo heap, which means that md5sum (and thus sha1sum) in my tree hasn't compiled in weeks.

Today I finished that off, and even added some simple tests to the test suite, although my error messages and ubuntu's error messages do not match and I'm not testing that. (There's "file not found". There's "improperly formatted line", which yes includes blank lines, but I count "wrong length of hash" as a hash match failure, not a misformatted line.) I did test that I handle filenames with spaces in them properly. (Parse to and eat the FIRST group of whitespace characters, but keep all of the filename after that, although you can't have filenames with newlines in them. Yes, I could do something with backslashes, but I'm not going there.)
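That parsing rule fits in a few lines (a sketch, not the actual implementation): hash is everything before the first whitespace run, filename is everything after it.

#include <string.h>

// split "hash  file name" in place; returns the hash (or NULL if the
// line is malformed) and stores the filename through *file
char *split_hashline(char *line, char **file)
{
  char *s = line + strcspn(line, " \t");   // end of the hash field

  if (!*s || *s == '\n') return 0;         // no whitespace: malformed
  *s++ = 0;                                // terminate the hash
  s += strspn(s, " \t");                   // eat the whole whitespace run
  s[strcspn(s, "\n")] = 0;                 // strip trailing newline
  if (!*s) return 0;                       // no filename: malformed
  *file = s;

  return line;
}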


June 27, 2016

I need to set up an Android development environment, so I cleaned 30 gigs off my netbook, fired up the AOSP build requirements page, and... they want 150 gigs. That's half the drive in my netbook, and getting a new disk isn't happening before money starts coming in again.

(Earlier, the reason I couldn't set up AOSP on my netbook was because it was running Ubuntu 12.04 when they'd moved on to 14.04. Sigh. Always something.)


June 25, 2016

Testing continues to be a weird concept. For example, the busybox guys just had a message about a corner case of "md5sum -c" and I checked and I never bothered to implement -c, but somebody's apparently using it, so I might as well.

The corner case was that -c with an empty file should produce an error message. (Why? Because the other one does, and somebody cared enough to submit a bug report. Ok then?)

My md5sum and sha1sum implementations are merged, and even though both hashes are borderline obsolete these days I need to add sha3 and so on, and presumably they'll share at least some of the infrastructure. And as long as I'm adding it to the code, I might as well add it to the test suite. This is where things get tricky: what do I test?

If I'd implemented it before this, I wouldn't have tested -c on an empty file because it wouldn't have occurred to me that it should error. But if I was in a hurry and half-read the bug report I might misunderstand that operating in -c mode should error out on an empty DATA file. (The md5sum of nothing is d41d8cd98f00b204e9800998ecf8427e and it's a valid thing to do.) So should I test an empty file to make sure that it does NOT error out?

The point of the test suite is only partly to show what coverage I've checked, it's mostly to catch future regressions. Is that a likely future regression? Should I test it separately for md5sum and sha1sum (even though it's the same codepath)?

The md5sum man page is kinda useless when it comes to -c, you have to intuit that it accepts the same input that it outputs, I.E. hash, whitespace, filename. Except it outputs two spaces, does it accept one or three? Is it good with spaces _in_ the name? Can the hash have whitespace before it? My implementation doesn't have to check the hash length because hashes of the wrong length will never match, but should I TEST for that? Does it have its own error message? I wonder what -c does with blank lines in input? The man page is no help, gotta try it...


June 24, 2016

A long time ago yesterday's flag parsing nonsense was done in python, but I redid it in C because having python as a build dependency is bad. The smallest Linux system capable of rebuilding itself under itself SHOULD be 4 packages: kernel, libc, compiler, toybox. Reality doesn't always quite line up, but stuff like python and perl has NO place in that tight inner loop.

And yes, I intend to inflict this upon Android. They have "ninja" instead of make, and their "repo" is built on git. There's a large handwave in the previous paragraph because I need to add "make" to toybox (not just posix make but the gmake extensions packages actually use). I'm less certain about adding "ninja", so maybe that'll have to be an external package. How much of git I implement is a fuzzy line, but enough to download source from a server, check the code out, and maybe git bisect it. Checkins involve merges which are Not My Area...


June 23, 2016

Russell got a new job today. The lack of money is starting to make people flake off the company. (He was the most recent hire, so it's not surprising he'd go first, but still.) I just submitted my third two-week invoice since the last one was paid.

Toybox! At the suggestion of slashbeast on irc (I've forgotten his actual name but he's a longstanding contributor to aboriginal and I think toybox), I added patch --dry-run and -d, and the option parsing logic broke. The symptom is that --dry-run is acting as a synonym for the debug option -x, and is triggering this behavior even when CONFIG_TOYBOX_DEBUG is switched off. (Which shouldn't be possible, but what else is new?)

The problem seems to be a bare longopt after an elided short opt gets marked wrong, I.E. "(dry-run)"USE_TOYBOX_DEBUG("x")"d:ulp..." becomes "(dry-run)^Ad:ulp..." with that ^A being an ascii 1 used to signal "skip this flag value". That way FORCE_FLAGS can switch flag values back _on_, because there's a gap left for them, but without FORCE_FLAGS those FLAG_ macros become 0 and code that does if (toys.optflags&0) can be dead code eliminated.

(Yes, this code has gotten more complicated over time. :)

So, more context: each command has a NEWTOY() or OLDTOY() macro that provides an option string as its second-ish argument, and said option string can have USE_BLAH() macros in it that drop out when the corresponding config option is disabled, thus preventing those flags from being recognized in that configuration. (Since C concatenates adjacent string literals, "one" MACRO("two") "three" becomes either "onetwothree" or "onethree" depending on whether MACRO() resolves to its argument or nothing. Those macros live in generated/config.h which is created from .config by a giant sed invocation in scripts/make.sh during the build.)

We then wash the option string through a binary called mkflags (built from scripts/mkflags.c) which outputs lines that look like "#define FLAG_x (1<<42)" so the flags can move around when you insert or delete options from the string, and the code using them doesn't have to care.

By default the FLAG_ macros for the disabled flags are #defined to 0, so the FLAG_x option to patch should be 0 and all the if (test) fprintf(stderr, "debug gorp"); checks should just drop out thanks to the C compiler's dead code elimination when the if test is a constant zero. But they're not dropping out because FLAG_x is generated as (1<<6) (I.E. enabled) and FLAG_dry_run is (FORCED_FLAG<<7) (I.E. disabled). Which is the bug du jour.

Backing up: HOW exactly do we output zero for flags that have been removed from their string? If the flag isn't THERE, how do we know to output anything for it? And what does the FORCED_FLAG mean?

Second question first: when you implement more than one command in the same C file, so they can share infrastructure outside of lib/, you have the problem that any code under toys/ is in exactly one flag context at any given time. So mv and cp are both in toys/posix/cp.c, but most of the common code is in cp's flag context, and when you build mv when cp is disabled that clever "flag macros are zero, code drops out" stuff needs to be disabled so mv can translate its flags to cp's context before calling the code it shares with cp. Thus FORCE_FLAGS: a macro you can #define to switch the disabled flags back on. This means the enabled macros are 1<<position and the disabled ones are FORCED_FLAG<<position, where FORCED_FLAG is normally 0 (so the dead code elimination still works) but becomes 1 when FORCE_FLAGS is defined.

And now to answer the first question: what the build does is parse the optstr twice, once in the current config context and once in allyesconfig context with everything enabled. Then it feeds both strings to mkflags (built from scripts/mkflags.c), which parses each one into a linked list and then traverses both lists in parallel (function main() starting around line 176): aflist is the "all flags" list, I.E. the allyesconfig version, and flist is the "just flags" list, the version for the current config.

The loop traversing aflist assumes that flist is a struct subset of aflist, so if you traverse aflist the next flist entry either matches or it doesn't. If it matches, you advance both and treat this as an enabled flag. If it doesn't, you advance aflist and treat it as a disabled (unless forced) flag. You never NOT advance aflist, your decision is whether or not to advance flist too.

Inside the loop are two cases, the first if it's a bare longopt, and the second if it isn't. A bare longopt is a --long-option that doesn't have a corresponding short option. The syntax is "letter followed by attributes" (generally punctuation), and the way you specify a long option is to put it in (parentheses). So "d:(dinosaur)" is -d "string argument", accepting --dinosaur "string argument". (Also "--dinosaur=string" and "-dstring".) The ":" means string argument, and (dinosaur) means this is the long option synonymous with that short option.

A "bare longopt" isn't after a letter. This means it has to come first in the string, before any letter options. (Otherwise you'd just be having two --longopts for the same short opt. I think that works? Hasn't come up recently, don't remember if I implemented it. But that's what the syntax would mean.)

Fun thing about bare longopts: only they get a FLAG_ macro with their long name. If there's a short opt, the FLAG_ macro uses the short opt, and you just write the code using the short FLAG_ macro. But if there's a bare longopt, you need to have FLAG_dry_run (and yes you need a pass to squash other punctuation into _ because you can't have a minus in a macro name). That's the main reason we need different code to output flag macros for bare longopts vs regular short options: the macros it's producing are named differently.

Since 90% of the option parsing code thinks in terms of short options (and treats longopts as just synonyms for them), bare longopts are a bit of a side codepath. But they --happen so they need to work.
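Putting that together, a hypothetical sketch of the loop's shape (NOT the real mkflags.c code, and emit_flag() is a made-up helper):

#include <string.h>

struct flag {struct flag *next; char *command; char *lopt;};

void emit_flag(struct flag *f, int enabled); // made up: writes the #define line

void traverse(struct flag *aflist, struct flag *flist)
{
  for (; aflist; aflist = aflist->next) {
    int match;

    // bare longopt entries have ->lopt but no ->command
    if (!aflist->command) match = flist && !flist->command && flist->lopt
      && !strcmp(aflist->lopt, flist->lopt);
    else match = flist && flist->command
      && *aflist->command == *flist->command;

    emit_flag(aflist, match);       // 1<<bit when enabled, FORCED_FLAG<<bit when not
    if (match) flist = flist->next; // the only decision: advance flist or don't
  }
}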

One other hiccup: the option parsing logic in lib/args.c traverses an option string, but if we're doing FORCE_FLAGS it can't be the option string with the letters removed by the USE() macros. It has to know to _skip_ flag values for the disabled options in order to get the bitshift indexes right. So we need to output another macro into generated/flags.h, in this case #define OPTSTR_patch, with an adjusted string for the parsing logic. We take out removed options (and their trailing punctuation) and replace them with one ascii 1 byte per disabled option, just so the option parsing logic knows to skip over that bit position.
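For patch in this config that works out to something like:

#define OPTSTR_patch "(dry-run)\001d:ulp#i:R"

with the \001 holding the bit position where the USE_TOYBOX_DEBUG() "x" used to be.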

So, the build has a big sed invocation in scripts/make.sh that extracts each optstr in the two config contexts (current config and allyesconfig), and feeds it to mkflags. In this case, it's being passed (/me sticks some printfs into the code)...

command=patch, flags=(dry-run)d:ulp#i:R, allflags=(dry-run)xd:ulp#i:R

From which mkflags is producing (among other things)

#define FLAG_x (1<<6)
#define FLAG_dry_run (FORCED_FLAG<<7)

But it should be producing

#define FLAG_x (FORCED_FLAG<<6)
#define FLAG_dry_run (1<<7)

The problem is that in scripts/mkflags.c, in main() line 176 we loop through aflist and test if it's a long opt or a short opt, and in the short opt case one test is wrong:

- if (flist && (!flist->command || *aflist->command == *flist->command))
+ if (flist && flist->command && *aflist->command == *flist->command) {

Why that's wrong: for bare longopts, flist->lopt is set and flist->command is null. We're traversing two lists, one of which has been slightly polluted (*aflist->command is ^A for disabled commands, but that's ok because *flist->command should never be ^A so they don't match), and the other of which skips disabled commands. We traverse these checking if *aflist->command == *flist->command and writing the enabled FLAG_ macro if they match and the disabled FLAG_ macro if they don't.

The complication is that when they _don't_ match, one might be a bare longopt and the other a normal single letter command, so flist->command might be NULL. So if we just compare them it could segfault. So we need to first check if flist->command is NULL.

The problem above is we treated comparing a bare longopt with a non-bare longopt as a _match_, when it's never a match. That's why the output was wrong. (The next one was damaged because since this "matched" it consumed the bare longopt off of flist, and then next time around compared the aflist copy against a null pointer, which wasn't a match. When flist runs out before aflist, everything left is not a match.)

Next time, fixing the help text parsing logic!

(P.S. A long time ago this nonsense was done in python, but I redid it in C because having python as a build dependency is bad. The smallest Linux system capable of rebuilding itself under itself SHOULD be 4 packages: kernel, libc, compiler, toybox. Reality doesn't always quite line up, but stuff like python and perl has NO place in that tight inner loop. And yes, I intend to inflict this upon Android. They have "ninja" instead of make.)


June 22, 2016

All week working on an SDK for $DAYJOB. (Alpine Linux under virtualbox, even Rich couldn't get x11 working in alpine under vb.)

I really want to work on toybox stuff, but this is on a deadline and I'd really really like this job to turn back into money. It was an awesome job for over a year, and they owe me almost 6 weeks pay at this point. I'm aware of the fallacy involved in the second part, yes, but the first part bought some leeway. Ok, let's face it, I'm still here because I follow Jeff around like a puppy given a chance, and that's worth some leeway. He created uclinux, which means he basically invented the embedded Linux space. Senpai didn't just notice me, he hired me. I'm all for it, I just want the checks to start getting signed again.


June 18, 2016

I've done multiple cleanup passes on netstat, which people on the list tell me is obsolete because nobody wanted to clean up the code. Well I'm cleaning it up now! (In theory I am cleaning up a fresh implementation of it. This had BETTER be the case because toybox is actually public domain, not BSD licensed, so copying stuff out of netbsd has to get ripped back OUT again and redone when I find it. And yes, I've done it, and this is part of the reason the "pending" directory exists: so I can do that sort of review before code winds up in defconfig and distributed binaries.)

I started down the netstat cleanup rathole because Elliott sent me a small cleanup patch that was cleaning it up the wrong way, and it's turning into a full promotion pass. Not fully documented to cleanup.html example levels, but I've been busy with other stuff. Which is a pity, because there are a lot of nice tricks here like the way 19 lines of ss_inode() became a single line of code at the call site. And as usual a lot of cleanups I feel bad about taking credit for, because get_strtou() had no reason to exist in the first place so yes removing it is a good cleanup but "I shrank the code 50%" is less impressive when there were functions that nothing ever _called_. Still, other people should be able to do this, that's the point of documenting the cleanup. "My cleanups wouldn't have made so much difference if the original code wasn't so bad" is not something I should feel like a fraud over. (Impostor syndrome scales!)

And a lot of what I'm doing is digging into kernel code to see if the data sources we're reading can ever _produce_ the fields they're trying to parse. As usual a lot of them seem to have gone away in the 2.2 timeframe, or even earlier. I kept the [0000]:inode thing because I can't find where it went _away_, which means maybe I'm not finding what generates it today? (What does it _mean_? There's a socket:[inode] which still happens, but [0000]:inode would have been what?)


June 16, 2016

End of the four day "j-core engineering summit", and the first day that all our engineers could get together and talk to each _other_ rather than talking to Big Customer Who Hasn't Paid Us In Nine Months. (First day was prep for meeting with them, second and third days were meeting with them, and finally we get a day afterwards to do Everything Else.)

Alas, one day was not nearly enough time, and we focused on things to sell to other customers rather than the giant todo list of open source stuff. Everybody's exhausted from smiling at people who have spent most of a year trying to put us out of business. I still think their strategy is to bankrupt us so we stop challenging the patents they filed on our IP last year, but it's not my call.


June 15, 2016

Gave up on the glibc bug when I found out they have a bug report on this, added to their bugtracker in January 2015, marked as "critical", and ignored for a YEAR AND A HALF. Meanwhile Rich suggested that setting a precision of -1 is explicitly specified in posix as "as if precision was omitted", and I tried it and that doesn't trigger this bug. So I switched to that.

In theory I'm sitting in Endless Meetings With Large Customer Who Will Never Pay Us, as they explain how they made zero progress during the nine months we weren't working with them (because their legal department patented our stuff, so we stopped handing them new stuff), but it's not the engineers' fault that they work with evil clowns. It IS kind of hard to have much interest in reviving a relationship with people who got nothing done without us, though. What exactly do they bring to the table? (Oh right: money. Except they haven't paid us a dime all year, they're still doing the "smile while twisting the knife" thing.)


June 14, 2016

If you ever need to bisect git://sourceware.org/git/glibc.git the repo build is:

$ cd ~/linux; make headers_install INSTALL_HDR_PATH=~/glibc/headers

$ (rm -rf build && mkdir build && cd build && ../glibc/configure --prefix=$PWD/../install --host=x86_64-unknown-linux --disable-profile --enable-kernel=2.6.32 --with-headers=/home/landley/glibc/headers/include libc_cv_forced_unwind=yes libc_cv_ctors_header=yes libc_cv_c_cleanup=yes && make -j 2 && make install)

Then the test I'm doing is:

$ cat test.c
#include <stdio.h>
#include <limits.h>

int main(int argc, char *argv[])
{
  printf("test=[%.*s]", INT_MAX, "hello");
  printf("\n");
}

$ gcc -nostdinc -isystem install/include -isystem headers/include -isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/include -isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed -nostdlib -static -Linstall/lib -L /usr/lib/gcc/x86_64-linux-gnu/4.8 install/lib/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.8/crtbeginT.o install/lib/crt1.o test.c -Wl,--start-group,--as-needed -lgcc -lgcc_eh -lc -Wl,--no-as-needed,--end-group /usr/lib/gcc/x86_64-linux-gnu/4.8/crtend.o install/lib/crtn.o

$ ./a.out

And yes, that's quite a pain to bisect where a glibc bug was introduced.
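(In theory "git bisect run" automates the repetition; a sketch, with the script name made up:

$ git bisect start $BAD_COMMIT $GOOD_COMMIT
$ git bisect run ./bisect-test.sh

where bisect-test.sh redoes the configure/make/gcc dance above and ends with './a.out | grep -q "test=\[hello\]"', so it exits 0 on good commits and 1 on bad ones, which is the convention bisect run expects.)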


June 13, 2016

J-core engineering all hands meeting all day here in Austin (at a hotel up in the Arboretum). Alas, we don't have time to talk about actual jcore stuff, instead we're preparing for big presentation to Large Customer Who Has Not Given Us Money This Year And Yet Is Still Considered A Customer.

(The reason so many instances of j-core have a dash in them is there's a dead one-person database project taking the github account, and there's a music genre that's taken most of the occurrences on twitter.)


June 12, 2016

More poking at netstat.

As far as I can tell, get_strtou() is a big pointless reimplementation of strtol(), and get_pid_name() doesn't use llist.c. show_ipv4() parses data from proc including a port number and then calls get_servnam() which calls htons() on that port. We're endian switching data we got from /proc so we can call getservbyport()? (Yes, the definition of getservbyport() says network byte order rather than host endianness. Because that makes sense. Sigh.)
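The API really does want that; minimal demonstration:

#include <stdio.h>
#include <netdb.h>
#include <arpa/inet.h>

int main(void)
{
  // getservbyport() takes the port in network byte order, hence the htons()
  struct servent *se = getservbyport(htons(80), "tcp");

  if (se) printf("%s\n", se->s_name); // prints "http"

  return 0;
}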

scan_pid() is reading the pid's cmdline (via absolute path despite traversing a dirtree), searching for the first space, and sticking a null in there with the comment "/bin/netstat -ntp" -> "/bin/netstat" which is nuts because cmdline has embedded NUL bytes.

Half the plumbing in this thing implements -p, and right at the center of it is a check for "[0000]:" names that, as far as I can tell, the kernel isn't producing. But it's hard to prove a negative...


June 11, 2016

Lots of poking at netstat posted to the list instead of here. For a moment I was worried that somebody had copied somebody else's incompatibly licensed code again, because it was producing insane output that didn't match any kernel shipped in the past decade, but it turns out the man page has old data telling you to do that, so yes, it could reasonably have been done in a fresh implementation.

Almost half the plumbing in netstat implements -p, and right in the center of it is ss_inode(), which parses a /proc/pid/fd/* symlink to extract inode data out of either "socket:[123]" or "[0000]:123" format strings.

And on my box, this produces a bunch of hits:

ls -l /proc/*/fd 2>/dev/null | grep '[[]' | grep socket

But this:

ls -l /proc/*/fd 2>/dev/null | grep '[[]' | grep 0000

Not one hit. So dig into the kernel again to see under what circumstances (if any) this data format would be generated... I went down a couple dead ends trying to find it (the fs/proc/fd.c code is a bit hard to grind through, calling function pointer callbacks registered from who knows where), but eventually tried:

grep '"socket:' */*.c */*/*.c

Found net/socket.c where the socket: output comes from. I don't see a [%d]:%d variant anywhere.

Proving a negative is hard. Normally I try to find where something _does_ exist in a historical kernel, and then find the commit where it was removed. But this? I have no idea what it referred to in the first place, or when it might have been relevant. (On solaris in 1987? What?)


June 8, 2016

I'm STILL tweaking toybox ps. Largely because Android keeps hitting issues.

I currently have these output fields:

  • ARGS Command line (argv[] -path)
  • CMD COMM, or ARGS with -f
  • CMDLINE Command line (argv[])
  • COMM Original command name (stat[2])
  • COMMAND Command name (/proc/$PID/exe)
  • NAME Command name (COMMAND -path)
  • TNAME Thread name (argv[0] of $PID)

Of these, ARGS, CMD, COMM, and COMMAND are all mentioned by posix, but not "specified" because its definitions are incomplete and inconsistent bordering on incoherent.

I added NAME, TNAME, and CMDLINE. Now I want to rename TNAME to NAME, but I want to preserve the "/proc/$PID/exe -path" output option (which seems useful).

I want to move TNAME to NAME, and move the old NAME to CMD, but then the default output's CMD shows /proc/self/exe minus -path, which means it says "vim.tiny" instead of "vi", which doesn't match ubuntu ps's output.

I can switch it to use COMM instead (which is what ps was actually showing before), but if I'm showing COMM I want to _say_ COMM, and posix says it should be CMD. But in posix CMD isn't defined as an actual _field_, just an alias for two other fields, which isn't how toybox does stuff. (In toybox ps, the header fields it displays and the -o FIELD names you feed in to request that output are the same.)

Plus "CMD" is naturally a short way of saying "COMMAND". (COMM also is, but COMM has a stronger posix definition that the current implementation more or less works with.)

So I want to move the old NAME to CMD, but the problem is then the ps output of ubuntu's version and toybox's version don't _quite_ match. For example, both are "short name", but on ubuntu 14.04 /proc/self/exe says vim.tiny while ubuntu's vim.tiny sets its stat[2] to "vi".

Once again, implementing is easy, figuring out what to do is hard.


June 3, 2016

I got the simple root filesystem build script I wrote last month pushed to github. Doing that turns out to be kind of tricky.

Github won't let you re-use the same ssh key for multiple accounts, it says "key already in use" when you try to upload it to another login. So I made a fresh key in .ssh/j-core and tried to use that for my push, which is slightly awkward because you can't feed extra options to ssh through git's command line. Instead you have to export GIT_SSH as a path to an alternate ssh binary (or locally modify your $PATH so it finds another ssh first when running this push script), and then have that be a wrapper script doing 'ssh -i /blah "$@"'.

So I did that, and github STILL said "permission denied to landley" on the push!

My first question was where did it get "landley" from? So I stuck an echo in front of the ssh wrapper line to see what command line it was actually sending, which turns out to be "ssh -i /path/to/key git@github.com git-receive-pack /j-core/mkroot", and I tried that and it spit back "ERROR: Permission to j-core/mkroot.git denied to landley." so it's _github_ saying "landley" (not git) which is information they shouldn't _have_, let alone care about. Information is leaking from my system to theirs through ssh, and I remembered that ssh marshals environment variables across to the new environment, so I tried prepending "env -i" before ssh and _then_ it worked just fine.
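The working setup boiled down to something like this (paths and remote name illustrative):

$ cat ~/bin/gitssh
#!/bin/bash
exec env -i ssh -i ~/.ssh/j-core "$@"

$ GIT_SSH=~/bin/gitssh git push origin master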

Moral of the story: github is going to great lengths to try to track users and shoehorn us into the usage patterns it expects. Corollary to the moral: I found a big enough rock to make it stop WITHOUT invoking containers or VMs this time. (I will happily modify dropbear and git source before I bow to whatever strange usage pattern they expect me to do here because they are WRONG and I'm not gonna.)


June 2, 2016

Found out why I didn't get paid on Friday. All contractor payments have been held because our largest customer hasn't paid us for 9 months. There's another round of funding lined up for October, but getting to that is going to be interesting.

The company's been fighting with said customer over IP rights, because we objected when they patented stuff we'd invented. It looks to me like they want to end the IP fight by putting us out of business, then they can grab all our ideas and try to take our product to market without us. (Their management seems to believe that engineers are fungible, they can't tell the difference between some of our ideas and all of our ideas because they don't understand how our technology works, and they assume that everything they don't understand must be trivial. It's... not a good combination.)

I'm still working at $DAYJOB, but have already returned two recruiters' calls today and taken the large print "I am not currently looking for a job." line off my resume. Maybe they'll start paying me again before I find another job, maybe they won't. I really really like the people and technology I work with, but I'm supporting two other people (and four cats and a dog) in a very nice house these days, and can't afford to ride startups down like I used to.

This really really sucks. If the company goes under there's nothing I can do about it other than hold a very public grudge against Large Customer forever. I'm not naming names yet, but I need anecdotes for my Texas LinuxFest talks anyway, and I should know by then how this turns out.


June 1, 2016

Alright, it's been a MONTH since I should have put out a toybox release. I meant to before the Japan trip but didn't get around to it, and have been busy since, but it's time.

I set an Aboriginal Linux all-architectures build going with a snapshot of what I've got to get binaries, and I've started going through the git repo to do release notes. I keep thinking I haven't done that much, but there are a lot of commits to collate into release notes...


May 28, 2016

Reading the wget man page is weird: it starts by explaining to you how unix options work. As in "wget -drc" is equivalent to "wget -d -r -c", and -- prevents anything after it from being interpreted as an option, and so on. And then it says:

The options that accept comma-separated lists all respect the convention that specifying an empty list clears its value.

And I went, "This is a convention?" So I tried ps, which considers -o "" an error, and mount, which considers -o "" to add nothing to the existing option list (but doesn't blank it).

Anyway, I'm _not_ having a ".wgetrc" file, not doing the -e option to execute random commands (pipelines exist, and I've needed to make a "circle" command to route a pipeline's output back to its input for over a decade).

As for progress indicator, when I submitted my old "count" program to busybox years ago it got bikeshedded into pipeline-progress, and sometime later "pv" showed up.


May 25, 2016

Todo list critical mass is a funny thing. I've had wget on my todo list forever, and last year Isaac Dunham mentioned he was doing a wget, but never sent me a patch.

A month or so back I finally got a wget submission, from Lipi Lee, which is... not up to my standards: it's full of hardwired array sizes (which generally aren't bounds checked), it doesn't escape spaces in URL names (the GET syntax cares), and so on.

I threw it in pending, then I got a SECOND submission adding https support via a command line stunnel variant, which Isaac mentioned in his submission, so I started looking at cleaning up the first submission... and wound up rewriting it from scratch. (As you do.)

I'm doing wget because busybox had wget years ago and never implemented curl, and seemed to get away with it. There's no standard for wget OR curl, so six of one...

To test it, I fired up netcat to act as a sad little server, which didn't work because I was using netcat -l /bin/cat, which immediately backgrounded since it was launching a command with stdin/stdout redirected to that command. What I wanted was netcat -l with no command. And here I'm thinking if _I_ can't get this right and I wrote it, how does anybody else have a chance, I really need to upgrade the help. But haven't figured out what it should say yet.

Going back to wget, I implemented the URL escaping behavior (modulo the question of whether domain names can have low ascii or % characters in them, I'm escaping the URL _before_ parsing it and I THINK that's ok...) and then I got to the User-Agent field, and obviously I put "toybox" there but... "toybox wget"? Version number? How would a command other than the multiplexer get access to the version number anyway? (Answer: it wouldn't and I'd need to redo the build a bit to make that available, which re-raises the question of where the version number should live when git describe doesn't return anything, because tarball builds _are_ supported and git is NOT a build prerequisite...)

Tangents.


May 22, 2016

Toybox development is the same "constant series of cleanup passes" thing I used to do with busybox, or at least it should be that way. This combines strangely with the "push to 1.0" I'm doing.

I recently added bufgetgrgid() and bufgetpwuid() to lib/ because it turns out libc does _not_ cache any lookup values from /etc/passwd and /etc/group, not even the one it most recently looked up, so things like ps and ls constantly re-parse /etc/passwd and /etc/group, over and over, even if there's just one user being displayed.
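The caching idea is nothing fancy, something along these lines (a toy sketch, not the actual lib/ code):

#include <pwd.h>
#include <stdlib.h>
#include <string.h>

struct pwcache {struct pwcache *next; uid_t uid; char *name;};
static struct pwcache *pwtop;

// Return the username for uid, hitting getpwuid() only on a cache miss.
char *cached_username(uid_t uid)
{
  struct pwcache *c;
  struct passwd *pw;

  for (c = pwtop; c; c = c->next) if (c->uid == uid) return c->name;
  c = malloc(sizeof(*c));
  pw = getpwuid(uid);
  c->uid = uid;
  c->name = strdup(pw ? pw->pw_name : "?");
  c->next = pwtop;
  pwtop = c;

  return c->name;
}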

I noticed this cursoring left and right through the "top" fields, which should change the sort order and redisplay, but not re-fetch the data from /proc faster than normal. Cursoring into the username field was _really_ slow, which implied that a significant chunk of the processing time was loading the names. (Of course only sort by name needs to load _all_ the names, the rest only do so when displaying.) And I confirmed it: once I had the cache implemented, cursoring around different sort types was smooth again, but the cpu time of top doing normal refreshes didn't go significantly down.

So the CPU time of top is from the get_ps() code, not the show_ps() code. I know because cursoring around to do a lot of displaying, now that the cache is in there, doesn't hugely affect top's CPU usage even though it's displaying 5 or 6 times per refresh.

As far as I can tell, the get_ps() overhead is mostly... proc. How does the other top implementation manage to do it in less CPU? No idea.


May 18, 2016

Dreamhost's "Let's Encrypt" integration actually seems to have worked, and https is working on the site now.

I did a much simpler root filesystem build script that downloads and verifies packages (but doesn't patch them), populates a root directory, and installs toybox+busybox+dropbear into it. Still not sure what Aboriginal Linux should look like going forward, but there's one way it could be simpler.


May 17, 2016

Starting in 1980, MIT had a famous introductory computer programming course series that tried to teach people how computers worked, hardware and software, from the ground up.

In 1997 they stopped teaching this course because they decided building a system from the ground up was no longer relevant. So they gave up, and decided instead to teach students how to poke at black boxes to get behavior out of them, without ever trying to understand how the hardware and software actually worked internally.

I think that's wrong.

On the software side, I did Aboriginal Linux because I wanted the smallest Linux system capable of rebuilding itself under itself, so you could understand what all the pieces do. On the hardware side, J-core lets us do the same thing with a processor.

The systems that nobody understands will bit rot and go away as the people who created them cease being available.


May 15, 2016

I've been useless and mostly sleeping for 4 days now. Combining a stomach bug, jetlag, and going cold turkey off caffeine is a recipe for sleeping 16 hours a day. (Either that or I've managed to contract mono.)

I have Giant TODO List Of Doom to work through. (As always, the trip to Japan made my todo list _longer_, despite getting lots of stuff done.) And I'm just not up to doing it in more than about half-hour bursts. Blah.

Going off caffeine is mostly because my eyes are in terrible shape and caffeine seems to make it worse. I should find a proper eye doctor and get a proper diagnosis, but they keep saying they can't find anything. (Other than the floaters, cataracts, myopia, and visual migraines, none of which they can do anything about.) But it's hard to see (harder in the mornings) and there's all sorts of weird flashes.


May 12, 2016

Regained a day going back across the international dateline, so what was the 12th became the 11th again and now it's the 12th again and it's hard to figure out whether this is a new blog entry or a continuation of the previous one? Return layover in the SF airport, I slept for half of it because I couldn't do ANYTHING useful on the united international flight with every seat filled, in the backmost row where the seat doesn't recline (but the one in front of me did and was), with a large man in the next seat spilling over into mine, and epic sleep deprivation already going on but no way to sleep where I was.

So I mostly watched stuff on the seatback entertainment system, which is actually a strange torture device United devised to annoy its passengers. The captain or one of the stewardesses would blather something irrelevant every 15 minutes (such as turning on the fasten seatbelt sign 8 times during the flight, or telling us things that map screen on the seatback had been showing us live all along), which paused the video with a pop-up about "your entertainment will resume after you listen to this important message". Not once was it an important message. The intercom would click on and have fifteen seconds of hum before they actually spoke, then another dozen seconds of hum afterwards (and often they'd blather more irrelevant drivel after a long pause). And then the thing would resume for 5 seconds before it started all over again in Japanese. And then if you tried to back up the video, you hit the fact that the search controls in the video don't work. (You can view forward/back at up to 16x speed, and then when you hit "play" it resumes either at the start of the minute or the start of the scene, which could be 8 or 9 minutes earlier. Seemingly randomly.)

United! They're cheap right now for a reason.

Still, Deadpool was entertaining. A movie where "irreverent" is actually an accurate description, and it's the most perfect comic book movie casting since Robert Downey Jr. was Tony Stark, except this time it really seems like they hired Deadpool to play Ryan Reynolds. Not sure how. (Also not sure how it was that _obvious_ given I wasn't a fan of the character before this movie, yet here we are.)

Tried to watch several other movies, but mostly made it about 10 minutes in and stopped it again. (I don't understand why the Alice in Wonderland movie starring Jack Sparrow got made. Or at least not from the first 10 minutes where Alice is dancing with some chinless person surrounded by horrible people. I'm sure it's important setup, maybe I was too sleep deprived to appreciate it, but entertainment utterly failed to happen. I'm told they're doing a sequel. Ok then.)

I slept through half the 6 hour layover in SFO because the concrete floor here is SO much more comfortable than anything involved with United. But now I've got a couple hours to try to get a little work done.

The loopfiles stuff still needs a design update so the callbacks aren't constantly calling openfd(). Let's see what's using that:

md5sum: doesn't care (one read, auto-close), base64: one read, auto-close, blkid: one read, auto-close...

There's an inconsistency: md5sum and blkid continue bad reads (read, test error_msg), but base64 aborts (xread). Meanwhile md5sum uses read, blkid uses readall. The blkid callback doesn't care about fd vs fp, but the direct call of the same function does (checking errno=ENOMEDIUM). bzcat is convertible but sort of prefers fd? dos2unix cares because it's using copy_tempfile() which is fd. fsync cares because it's doing ioctl() stuff...

Sigh, converting this to something coherent looks like an enormous timesink. I _think_ what I should do is have loopfiles always use a FILE * and then things that need fileno() can trivially get it, and that avoids the main problem which is once you do fdopen() you can't dispose of that FILE * object _without_ closing that fd. (It's a one way transition.)
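The trap in miniature:

#include <stdio.h>

int main(void)
{
  FILE *fp = fdopen(0, "r"); // wrap an fd (stdin here) in a FILE object
  int fd = fileno(fp);       // getting the fd back out is trivial

  fclose(fp);                // but freeing the FILE closes the fd too, no way around it

  return 0;
}

So starting from a FILE * costs the fd users nothing, while bolting a FILE * on later creates a cleanup problem.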

Speaking of enormous timesinks, time to get on the next airplane.


May 11, 2016

Sick most of last night, got about 3 hours of sleep. Not _quite_ food poisoning this time, in that I didn't throw up. Some kind of stomach bug. Alas, I couldn't take a sick day because I had to check out of my hotel, and the airplane home leaves shortly after midnight.

Wound up kinda useless all day.

The Turtle boards arrived! They boot Linux! Ethernet works. The HDMI connector works! The USB hub doesn't work (although it's apparently a simple fix if you have a soldering iron, an active hi/low got swapped in the overcurrent protection). I need to take Martin's wiki material and do a public version for j-core.org.

One of the people who emailed to express interest in our proposed Open hardware track at Linux Plumber's mentioned that 2 days after our proposal was submitted, an "FPGA development" track was submitted and the two have been splitting the proposed audience and no WONDER it's been 4 months since we submitted it and there's just crickets chirping. (I emailed to go "do we still have a chance" last month and got a one line reply back boiling down to "maybe", and that's the only communication with them in 2 months.)

So I emailed them to put the proposal out of its misery. If we really want to do something like that we could do it at ELC, where the CFP doesn't close for weeks yet and then they give you a reply back in a month, and that's WITHOUT having an entire page on your responsibilities doing their convention organizing job for them. (If you're going to ask for a significant resource investment from us, don't leave us hanging for 4 months about whether our proposal is actually accepted. I have other things to do.)


May 10, 2016

I've spent the past few days trying to get the VHDL git repository conversion finished, and it's HARD. The information mercurial has exported is nuts, and git provides no error messages when something doesn't meet its expectations. Between the two of these, it's slow going.

I threw up my hands and asked Rich for help, and he confirmed that the merge info mercurial is exporting does not _remotely_ match reality. Unfortunately, he's busy trying to make gdb work for j-core, so I don't want to distract him with my problems.

I think what I have to do is create synthetic merge commits. Not entirely sure how.
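(The plumbing for that at least exists: git commit-tree accepts multiple -p parents, so something like

$ git commit-tree abc123^{tree} -p parent1 -p parent2 -m "synthetic merge"

with the hashes filled in manufactures a merge commit by hand. The hard part is figuring out which parents mercurial actually _meant_.)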


May 9, 2016

Busy week. Have not kept the blog up to date.

So we have a J-core roadmap with a 64 bit strategy now, and Jeff's also been working on scaling SMALLER to something we'll probably call J1. The basic idea is to rip the cache, prefetch unit, and multiplier out of j2 and try to fit it into the 8k version of the ICE 40 FPGA, which has 32k of sram. If we can make that work (and it looks doable if we can just get a VHDL toolchain targeting it properly), then we can move jcore down into the Arduino space in a _truly_ tiny chip.

To make this work, Jeff's been poking at nvc which is a simulator like GHDL but written in C, and seems capable of producing output that yosys could consume to produce a bitstream for the ICE40. Jeff stripped down J2 into something tiny enough to fit, and then nvc couldn't simulate it, so he sent the tarball of VHDL source as a giant test case to the nvc maintainer, who's been fixing bugs ever since.

Apparently you can replace a hardware multiplier with bit shifts and adding, which means a 32 bit multiplier becomes a 33 clock cycle microcoded instruction, which is horrible from a performance perspective but small in terms of transistor count. However, the really _fun_ bit is that the original SuperH instruction set (SH1) had a much smaller multiplier, which becomes something like a 9 clock cycle microcoded instruction. Hence this stripped down thing being J1; we can tell the compiler to output original sh code for it, not even sh2.
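The software version of the trick, for reference (one add per bit, which is where the cycle count comes from):

// shift-and-add multiply: test each bit of b, accumulating shifted copies of a
unsigned mul(unsigned a, unsigned b)
{
  unsigned result = 0;

  while (b) {
    if (b&1) result += a;
    a <<= 1;
    b >>= 1;
  }

  return result;
}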

No, it wouldn't run Linux. Although if you hook up 256k of ram you can sort of run nommu Linux out of ROM, according to a presentation at ELC last year (video, slides).

Jeff says he's done that, but he'd rather port the Arduino GUI to connect to a jcore toolchain and hook into that ecosystem instead, which wouldn't involve running Linux on the result at all. Still pretty exciting, if we can make it work...


May 3, 2016

Pinging the computer science departments of various women's universities around Tokyo: if we're gonna hire another couple developers we have to train in our stuff _anyway_, we can correct the project's gender balance while we're at it.

Speaking of which, a message on the toybox list asked me to add contribution guidelines to the README, and while I was there I dithered for a long time about adding more text to the "code of conduct" bit, and then committed an update with great trepidation. (The original was intentionally extremely terse because any attempt to address issues affecting 51% of the population gets complaints from much smaller groups insisting it should be all about them instead, how dare I, etc. Since they can only be helped by perfect people, and would rather I do nothing than do my usual half-assed job, I compromise by doing nothing for them and getting on with what I originally planned. I'm aware this makes me a horrible person in their eyes, and I'm ok with that.)

The patreon news post situation has gotten to the "sorry, sorry, sorry!" stage where I'm avoiding looking at it because I feel bad. It's a self-imposed goal to write monthly updates and I've been SUCKING at it. (That effort gets sublimated into updating this blog, so that's something.)

And I need to do a monthly writeup of the j-core.org traffic. There's been a lot of news (traveling to tokyo always flushes the todo list), and it should be collated.


May 2, 2016

Texas LinuxFest wants talk proposals by the 5th, LinuxCon Japan wants them by the 6th, and the Open Hardware Summit wants them by July 1st. I still haven't heard if the Linux Plumber's conference wants the open hardware track, but I'm more or less assuming they don't since I've heard nothing for two months. (I emailed last month and got back a one line reply that the organizer was "cautiously optimistic" we might still make it. Nothing for a month before that, nothing for a month since. For something we're proposing to invest significant resources in, and have been waiting to hear back about since January. Wheee.)


April 30, 2016

I've implemented a lot more -o fields in ps than the documentation actually lists (mostly needed for things like top and iotop), so I'm trying to go back and fill them in. And I'm at -o PR vs -o PRI, which gets into what the "priority" value the kernel exports means.

The ps value is exported from fs/proc/array.c function do_task_stat, which calls task_prio() and prints that value as is. That function lives in kernel/sched/core.c where it's returning the process's p->prio - MAX_RT_PRIO, and the comment on the function says:

Return: The priority value as seen by users in /proc. RT tasks are offset by -200. Normal tasks are centered around 0, value goes from -16 to +15.

Except that MAX_RT_PRIO is defined in include/linux/sched/prio.h as MAX_USER_RT_PRIO and that (in the same file) is #defined to 100.
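(Worked example, assuming I'm reading the headers right: a normal task at nice 0 has p->prio = 100+20+0 = 120, so task_prio() returns 120-100 = 20 and ps shows PRI 20. The whole nice range -20..19 maps to 0..39, which is neither "centered around 0" nor "-16 to +15", so the comment and the code disagree.)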


April 28, 2016

I miss kmail. Pity they glued it to a dead desktop, and then glued _that_ to an rss reader and 12 other things in a giant inseparable hairball. Oh well. Microsoft does bundling, kde and gnome do bundling. The GPL is itself an attempt to use copyright to do bundling.

Not really a fan of bundling.

I forgot to plug in my netbook yesterday, and the battery died while I was out at dinner. Died as in "I lost all my open windows/tabs again". So I had to relaunch chromium, which brought up the pervasive breakage in current chromium. Specifically, when I relaunch it and it connects to the network it tries to reload all its tabs, using gigabytes of memory and pegging the CPU in perpetuity as javascript in background tabs endlessly cycles animations I can't see and contacts servers to refresh ads I'm not watching and probably mines bitcoins for all I know.

The 12.04 fix was to fire up "top", find the tab processes eating nonzero cpu, and kill them. Repeat until no tabs were using CPU. I even had a script to do this (top -n 1 | screen scrape the output). But the new improved chromium damages its process environments so the names are truncated and I can't distinguish which tabs I can safely kill, and which will take down the whole chromium instance (closing all the windows). And then on the relaunch, the tabs I did manage to kill before it went down try to reload as soon as I connect to the network.

I tried connect/disconnect (wait) connect/disconnect (wait) to starve the tabs, but plenty still manage to load (and use CPU perpetually), and killing them is russian roulette and after a half-dozen accidental forced restarts of chromium, I figured out what I SHOULD have done.

As root, "while true; echo nameserver 127.0.0.1 > /etc/resolv.conf; sleep 1; done". Leave that running (dhcpcd will periodically overwrite it, this puts it back). Launch chromium and let all the tabs fail their DNS lookups. That DOESN'T get retried every time you connect to the net.

Another thing you can do is use toybox top instead of busybox top, which shows the "chro" processes and can sort by memory usage (cursor right so the memory column is selected, that's what you sort by). Killing one or two memory hogs can free gigabytes of ram, and it usually takes 20 or so kills before you accidentally hit a critical process that takes down the whole of chromium.

So I'm slowly adapting to my new Linux desktop version, and working around the "lateral progress" that desktop Linux is known for. Every time I upgrade, I have to come up with a new set of workarounds, and this is why Linux on the desktop isn't any more popular today than it was 10 years ago. They keep breaking stuff that used to work and calling it progress.

I call it "lateral progress", and you get it on Android in spades, where it's called "cloud rot"...


April 27, 2016

Still in Japan, coming up on Golden Week, which keeps getting described in english for some reason.

As far as I can tell, English is to Tokyo what Latin and Greek were to english a century or two back. You sprinkle in phrases in that language to show you had an expensive education, but said phrases sometimes make very little sense in the original language or out of context.

Had a discussion with Jeff and Jen about testing at dinner tonight (everybody went out for pizza), and I think I realize one of the disconnects I had with Jen (who is big into testing).

To me "complete testing of all the behavior" includes making sure the thing can emit all the error messages you think it can emit. (Can I trigger all the failure cases I think I'm catching? It's a codepath, what happens when we go along that codepath...)

That means testing is complete when I'm triggering all the positive _and_negative_ behavior I've implemented. If I didn't think to implement something because I didn't think of all possible inputs and some unexpected input causes weirdness when this environment variable contains this value while it's run with stdin closed so the first open becomes the new stdin and stdout has O_NONBLOCK set on it and we inherited umask 777 and were creating a filename with an invalid utf8 sequence on a filesystem where the driver cares and then we have an unexpected filename collision due to case insensitivity and our cwd has been overmounted by another filesystem so "." and /path/to/cwd give different results and our parent process set SIGCHLD to SIG_IGN which we inherited across exec (POSIX!) and selinux made setuid() fail for a sudo root process trying to _drop_ permissions and writing to the middle of a file failed because it was sparse but the disk filled up and another file was on NFS where close() can fail (and leaving a deleted file open causes rmdir() on the enclosing directory to fail) we've run out of systemwide filehandles right before the OOM killer takes us out...

I'm aware there's generally stuff I didn't think of. Testing is complete when it covers everything I _can_ think of. More or less by definition. And then the real world breaks it, every time, and you add more tests as you fix each bug.


April 26, 2016

Highly productive day (somewhat confused by the international dateline). But Jeff keeps thinking it's wednesday so I don't feel so bad.

Among other things, I sat down with Niishi-san and asked about the memory controller, and wrote up the result on the j-core mailing list. I should put it on its own page, but despite the design walkthrough talk, actual VHDL source documentation probably comes after finishing the git repo conversion.


April 25, 2016

Lost a day to the international dateline (there's a REASON the future begins in japan, it's already tuesday here).


April 24, 2016

9 hour layover in the San Francisco airport. Trying to close windows so I can charge the second battery, but there's a todo list critical mass where working through todo items opens new tabs faster than you close them and I am _so_ far past that event horizon....

Got a couple talk proposals sent to Linuxcon, which is in Toronto this year, about a 45 minute drive from SEI's canada office. Kind of a shame NOT to propose something for that, since I've never been to their canada office. (My only visits to canada were Ottawa Linux Symposium. When it was in Ottawa, before moving out of Ottawa did to it what moving out of Atlanta did to Atlanta Linux Symposium. If your city is in the event's name, don't move out of that city. It won't end well.)

I'd hoped to at least flush through toybox bug reports and get a release out, and I did get the bzip segfault fixed (it was a bad error message, the error case was detected and the attempt to report it was segfaulting), and the latest find bug, and the start of ps thread support. And I got some basic test infrastructure for toysh in (ability to run a test under a different shell than sh.test is running under).

The loopfiles stuff needs a design update so the callbacks aren't constantly calling openfd(). Not quite sure what that should look like yet.


April 20, 2016

The long-delayed trip to Japan has finally been scheduled! I fly out on the 24th (at 7:30 am!), and get back on something like the 14th.

I was trying to get a toybox release ready by the end of the month. Now I need to get it ready in 4 days.

Hmmm. I need to get serious about the toybox release. That said, I have a 10 HOUR LAYOVER in San Francisco on the way out. (Do I know anybody in San Francisco?)

Poking at the shell some more to try to get what I'm doing to a good stopping point. I'm not sure how the signal handlers should work with nofork commands, specifically if I ctrl-C while it's doing a malloc, it could theoretically interrupt between the malloc allocating the data and the assignment of the return value. That's a memory leak. I don't know how to make it NOT be a memory leak other than never allocating memory. (It could do the same with filehandles but I can presumably clean up after that because I can check what filehandles are in use with /proc/self/fd, or even by hand if necessary. I don't think any of the nofork commands are doing mmap but that's at least theoretically recoverable too.)

If you hit ctrl-C at a bash prompt and then "echo $?" it says 130, which is 128+2. But when I had toysh exit with 130 from the error handler it said 2 instead, and after elaborate head scratching it's because bash (or maybe the syscall?) is doing an &127 on it. Ok then, reasonable behavior according to posix. So it's locally preserved, but not from child processes.


April 19, 2016

Once upon a time (back at Quest Multimedia in 1998) I wrote an arithmetic expression parser in java that worked fast enough on a 75 mhz 486 DX to not just evaluate each point of a line but smoothly animate it as the equation changed.

This was, alas, almost 20 years ago. But it was a pretty standard "two stacks, one for arguments, one for operators" approach I think I read about in a book once. Implementing my own in toybox was never something I dreaded doing, just something I never got around to because there's so much else to do. That's why looking at the contributed "expr.c" in toybox, even with layers of cleanup from somebody at Google, I really want to throw it out and start over again.
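The two stacks approach fits in a page. A toy version handling just + - * / on integers with no whitespace (emphatically NOT what's in toybox, just the shape of the algorithm):

#include <stdio.h>
#include <stdlib.h>

static long args[64];
static char ops[64];
static int nargs, nops;

// + and - bind looser than * and /
static int prec(char op)
{
  return (op=='+' || op=='-') ? 1 : 2;
}

// pop one operator and two arguments, push the result
static void reduce(void)
{
  long b = args[--nargs], a = args[--nargs];
  char op = ops[--nops];

  if (op=='+') a += b;
  else if (op=='-') a -= b;
  else if (op=='*') a *= b;
  else a /= b; // toy code: division by zero is the caller's problem

  args[nargs++] = a;
}

int main(int argc, char *argv[])
{
  char *s = argv[1];

  if (argc != 2) return 1;
  while (*s) {
    if (*s>='0' && *s<='9') args[nargs++] = strtol(s, &s, 10);
    else {
      // flush pending operators of equal or higher precedence, then push
      while (nops && prec(ops[nops-1]) >= prec(*s)) reduce();
      ops[nops++] = *s++;
    }
  }
  while (nops) reduce();
  printf("%ld\n", args[0]);

  return 0;
}

So ./a.out '1+2*3' prints 7 and ./a.out '8-2-2' prints 4, because the equal-precedence flush keeps evaluation left to right.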

The big thing I _really_ want to do is have the same code handle expr and shell arithmetic expansion (I.E. $((1+2)) evaluation). The problem is $(( )) treats unknown strings as environment variables (force converted to int, or 0 if their value isn't representable as an integer), and expr treats unknown strings as strings in certain circumstances.

In expr, | and & can have string arguments, = > >= < <= and != can have string arguments, : always works on string arguments, and arguments by themselves are strings.

In $(( )) there are separate logical and boolean operators, = is an assignment not a comparison, and there are whole categories of operators (prefix and postfix, bit shift, the ? : conditional thing...) that expr doesn't do. And it's not like either can be extended to do the other's thing: there are several obvious conflicts in the functionality: ":" as regex vs "? :" as conditional, "=" as comparison vs "=" as assignment, "|" as comparison vs "|" as boolean, and whether "BLAH" is a string or the contents of $BLAH coerced to integer type.

The other bit is that expr operates off of separated arguments and $(( )) takes a string. Parsing a string into separate arguments isn't that big of a deal, but "expr 1+2" returns "1+2" because it's a string argument.

That said, they can be extended _most_ of the way towards each other, and it's easy enough to have a loop grab the next token out of a string and feed it start+length to a "deal with it" function, and another loop traverse an array to call the same function with strlen().

My big question is should the common plumbing have a mode flag, or should it operate off of two different operator tables? I'm leaning towards mode flag because the enum gluing together the table and the users of the old enum table is kinda unpleasant. (Yeah I did the TAGGED_ARRAY stuff for that case over in ps, but the natural names for these entries are things like + that don't work in a symbol name. Also, the TAGGED_ARRAY stuff made an enum of _index_ positions, and this has the values stored in the table. Yeah the tagged array stores strings in the table, but ps needed that. Pretty sure I could make it not store the strings when they're not actually needed, maybe SPARSE_TAGGED_ARRAY or some such...)

Another wrinkle is indicating precedence grouping. I added a bunch of tests to expr.test to show that * and / being the same priority (and thus happening in the order presented) matters (especially with integers), so I can't just use a simple order but have to indicate demarcations. But I can put NULL entries in the table to indicate each group end.

What I want is a SPARSE_TAGGED_ARRAY of operator strings, sorted in priority order from "(" to "|=", with NULL pointers at each priority group change. That way flush checking can search from the start of the array to the NULL after this operator's index position. That makes the "prec" and "op" fields of the current table go away, leaving just "sig", which indicates whether this operator takes strings or integers on each side, but this is a property of a priority _group_, not an individual entry, so it should probably be its own array and not in this array.


April 18, 2016

Guess what the reason for the numato flashing problems was? Go on, guess.

Hands up everybody who picked Ubuntu being flaming stupid.

Seriously, they've got a daemon that sends random AT commands to any newly attached serial port, because clearly nothing says it's 2016 like assuming all serial ports have a modem attached supporting the patented (no really) Hayes AT command set. And yes, hayes press releases used to have "+++ATH" in their header fields, in case you're wondering why base64 encoding message contents got so popular...


April 17, 2016

Elliott indicated that mksh might be displacable, so I should iron while everybody's on strike due to overheating, or something like that.

So I'm reading the posix shell spec! And it's crap! Seriously, it says you should have builtins like "alias", "cd", "bg", "getopts", "read", and "umask" available in the $PATH so you can exec them and have env call them. What would "env cd" _mean_? Don't alias and read basically set environment variables in the current process environment space?

I'd ask the Posix list, but you know, Jorg.

The posix standard doesn't include any escapes in PS1 except "!" (not backslash escaped, just by itself, with !! being literal !). Anyway, attempting to do the PS1 escapes from the bash man page, which brings up the question of what \[ and \] actually _do_. The man page doesn't obviously say, but google found an explanation: they're a hack working around the fact bash doesn't query its terminal cursor position. So let's just ignore them and do it right.

(Meanwhile, Elliott's also expressed interest in mke2fs, an mtools replacement, and thread support for ps. I should do these things. I'm also working through the new bunzip2 segfault test case John Regehr sent me, and Andy Chu's long todo list of testing stuff, and I should get expr, lsof, and file promoted out of pending for the upcoming release.)

Wanna do a release at the end of the month. It's already looking kinda... squishy, as deadlines go... Let's see how close I can get!


April 16, 2016

The darn numato flash tool is breaking for people, because the python3-serial package is buggy and introduces noise when you open a connection. Jeff suggested I write a new one in C, which seems like overkill until you ponder "what I'd be debugging isn't Numato's script, it's python 3".

I admit removing the python 3 dependency is almost worth a C rewrite. It's not quite to "kill it with fire" levels of perl dependencies, but python 3 was not python 2. The motto of Python 2 was "there should be one obvious way to do it", and the very existence of Python 3 contradicts that.


April 15, 2016

The j-core list is up!

There has GOT to be a better sysadmin in the company than me. Oh well, got there eventually. I still need to set up DKIM or SPF or some such, but let's see how trigger happy gmail gets on the spam filtering first.


April 13, 2016

The reason buildroot's qemu-sh4 defconfig isn't working is a recent commit changed the default kernel type from zImage to uImage. (The first is a default output type qemu knows how to boot, the second is the weird packaging u-boot and only u-boot needs, for no obvious reason.) So they know about it now, and are fixing it upstream.

People don't always believe me when I say I break everything, but seriously: most things don't work for me on the first try. This is why the "continuous integration" proposals replacing releases with the expectation that the git commit du jour must always be as stable as a release would be... I am not a fan. I don't care what your test suite looks like, it will break for me.


April 11, 2016

Since the reinstall of Ubuntu 14.04 I keep finding missing packages: they don't seem to offer openoffice anymore (but libreoffice works), and mame gets more annoying each release (stoppit with the full screening and the mouse grabs, more --stop-doing-that arguments every time...)

And I apparently don't have a proper qemu install on here (x86 is there, sh4 isn't; upstream breaks this up into multiple packages for no apparent reason). I usually build it from source because of that, and the git repo on this box apparently hasn't been updated since October 2014. So I did a git pull, then make clean, in that order. The clean complained that "config-host.mak is out-of-date, running configure" (boggle), and then died complaining that pixman isn't installed.

This is "make clean" saying this. Thanks QEMU!

Making sure configure enables virtfs is another fun one (it switches it off by default, and when you force it on the complaint is you haven't got libcap-devel and libattr-devel, which are the Pointy Hair Linux names for libcap-dev and libattr1-dev). And of course after ./configure is happy, make immediately dies because autoreconf isn't there (of COURSE configure didn't check for it)...

autoreconf: configure.ac: not using Libtool
autoreconf: running: /usr/bin/autoconf
configure.ac:75: error: possibly undefined macro: AC_PROG_LIBTOOL

Seriously? There is NO EXCUSE FOR LIBTOOL ON LINUX. EVER. They notice it's not there, and then break because they try to use it anyway. That's just SPECIAL.

I could just about put up with having autoconf persistently installed on the box, but libtool gets uninstalled after each use, which means I wouldn't be able to build qemu du jour unless this gets fixed. Hmmm... The problem is pixman, possibly last time I just installed pixman-dev rather than initializing the git subrepo? If it keeps the dependencies down to a dull roar...

How do you delete a git subrepo out of the project again? One of those "if you don't already know how to do it, there's no obvious way to look it up" things that are so prevalent with git. Time to try some google-fu... The answer is "git submodule deinit -f pixman". Clearly, I should have just known that intuitively.

After two minutes of watching "make clean" cd into each subdirectory and GENERATE HEADERS just so it could delete them again (before I killed it and did a "git clean -fdx" which took about 3 seconds), I started wondering what pixman actually _does_? Yes, I can see that the description says it's a pixel manipulation library for X11, the question is what does X11 already do that does NOT count as pixel manipulation? Oh well...


April 10, 2016

FINALLY figured out how to enable xbithack on j-core.org which means I can do a nav bar with a news page.

And the j-core design walkthrough slides are up.


April 9, 2016

Drilling through back email, various broken link notifications for j-core.org and nommu.org and the toybox readme. That means people are reading my stuff. Woo! (And fixed, in all three cases. Still haven't got the presentation slides up because Jeff has the current version of those, mine are stale.)

Another package I forgot to install after the upgrade: VLC. Needs to download 21.6 megabytes of archives, and then probably another package with the various codecs so I can actually watch real videos.

No, it apparently installed all the codecs. Query: why does a VLC video display window have a minimum width it'll let me drag to, but not a minimum HEIGHT? The video will continue to scale down to really tiny if I reduce the height (the aspect ratio stays the same), but whoever wrote the window sizing code artificially limited it for some reason. Clearly because they know better than mere users.

Didn't install mame. Didn't install openoffice. (But they don't package that anymore, it's libreoffice only... ok then? It still gives me an "soffice" binary.)


April 8, 2016

I normally run a webserver on loopback, and it's not working. I installed apache, but apache's config file syntax du jour doesn't like my old loopback.conf file: It's saying "Forbidden: you don't have permission to access / on this server." According to /var/log/apache2/error.log the reason is "[authz_core:error] [pid 13161:tid 139812817176320] [client 127.0.0.1:46908] AH01630: client denied by server configuration: /home/landley/www/favicon.ico, referer: http://127.0.0.1/" which is USELESS for debugging why it's complaining.
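
(Writing it down since the log was no help: my guess is this is apache 2.4's access control syntax change biting the old 2.2-era config, so the Directory section in loopback.conf would need the new incantation:)

# apache 2.2 style, which 2.4's authz_core rejects:
Order allow,deny
Allow from all

# apache 2.4 equivalent:
Require all granted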

Solution: I've meant to write an httpd in toybox. Now seems like a good time to do that. Apache has complicated itself into uselessness. The entire Apache2 rewrite was about becoming multi-threaded instead of multi-process to scale better on windows; it never had anything to do with Linux and was in fact WORSE for Linux (that's why nobody ever wanted to upgrade off Apache 1.3), but they shoved it down our throats anyway.

So yeah, abandon apache and write a simple replacement. Not much harder than netcat, and I've got the start of a wget that can pipe through an external program to do ssh. This is how my todo list shuffles around...


April 7, 2016

CELF/ELC. Very convention, much wow.

Sitting in my hotel room until it's time to airplane again, and trying to finish up the Linux From Scratch 7.8 build for Aboriginal Linux. There are several ways to do it: should I build all the packages with the toolchain I provide, or build a new toolchain with the old toolchain? If I do build a toolchain with current gcc/binutils (and gmp, mpfr, and mpc because that mess leaks complexity like a drunk fratboy leaks beer), should I build it against glibc or against musl-libc?

Since building glibc requires _perl_, there are a certain number of packages that would need building under the original toolchain just to get to the point of replacing them, so I started by building a couple dozen packages with my old toolchain and confirming they build fine.

The glibc build also explicitly tests for binutils 2.22 or newer, and ./configure sits down in the mud and throws a tantrum under 2.17. So I tried building/installing binutils 2.25, and although it compiled fine the result went "ld: cannot open linker script file elf_x86_64: No such file or directory" so there are some path mismatches in here.

Also pondering upgrading bash in aboriginal to the last GPLv2 release, which looks like 3.2. It... really doesn't want to compile. Hmmm...


April 6, 2016

We gave a talk! It was fun. Video may go up someday. I am exhausted.

UT has finally officially rejected Fade (because even though she got her undergraduate degree from Occidental College in california, she did some of her prep work burning through graduate prerequisites for _this_ degree at UT, and continuing on there would be incest or something. I don't understand it in the slightest).

Meanwhile, the University of Minnesota actively wants Fade, offering her a scholarship with stipend and everything MONTHS ago (while UT dithered), so she's moving into student housing up there, and probably taking Adverb with her. (If Adverb's too barky for an apartment, I can drive up and swap him out for Peejee, who has the advantage of not being a dog.)

Fuzzy and I are staying in the house in Austin (every time I've moved out of Austin I moved _back_ about 18 months later, so we might as well keep Very Nice House with 4% fixed interest mortgage), but this gives me an excuse to visit my sister up near Minneapolis for larger blocks of time.


April 5, 2016

Sitting in Khem Raj's LLVM panel at CELF, I downloaded the llvm 3.8.0 packages and tried to compile llvm:

The LLVM project has deprecated building with configure & make. The autoconf-based makefile build system will be removed in the 3.9 release.

Please migrate to the CMake-based build system. For more information see: http://llvm.org/docs/CMake.html

Oh look, another random dependency. This just FILLS me with reassurance. (On top of the gcc 4.7 requirement, and the fact that their linker doesn't work well enough for THEM to use it most of the time. And the fact that openembedded's "list of packages that won't build with cmake" includes _CMAKE_. And the fact that I left the compile running for HOURS after the panel and it's STILL NOT DONE, and that's just llvm, there's like 5 more packages in this chain...)

Looks like I need to maintain my old lastgplv2 toolchain a while longer. And get serious about QCC after toybox's 1.0 release. (Which needs a cfront implementation... this is not going to be fun.)


April 4, 2016

Back at CELF! Or ELC, as they're calling it this year. Fun convention, Rich Felker is here, as is Jeff Dionne although he hasn't made it to the actual convention yet. (He's holed up in the hotel restaurant being a CEO on the phone, and working on slides for our talk wednesday.)

Saw the Openembedded talk, the kernelci talk, the yocto vs buildroot one, and one on reducing android's memory footprint.

While I was talking with Rich in the hallway, Linus passed by and I said hi (the actual quote was "So you _didn't_ beam out after your keynote"), and he actually stopped and talked to us, which I did not expect. This would totally be a "Sempai noticed me!" moment if I hadn't been doing this for something like 18 years now, but it was nice to finally talk to him in person.

Mostly I introduced Linus to Rich ("meet your new superh architecture maintainer") and told him about the j-core stuff. He reminisced about transmeta a bit and said he wouldn't have time to use a numato board if we gave him one. (We brought a dozen to the convention to give away.) He also gave Rich permission to blow away anything in arch/sh he wants to, because Linus was this close to removing the thing during its orphan period and just didn't get around to it. (We want to keep the old stuff anyway because it shows prior art, and a lot of Japanese engineers who poured their careers into this technology are glad somebody's picked it up and run with it even if their companies don't care. We're "respecting the spirit of superh".)

So that was fun. Now back to the hotel to prepare for our talk.


April 3, 2016

Another travel day, plane from Chicago (where I spoke at Flourish) to San Diego for ELC (which used to be CELF).

I'd hoped to hang out with Jeff and Rich, but my luggage wound up in Burbank for some reason, so I hung out at the airport until 7pm when it showed up in San Diego. (Southwest gave me a $50 travel voucher for the inconvenience, which was nice of them.)

Got some more grinding in on the repository conversion, but I'm still working my way through 2012.


April 2, 2016

I would appear to be allergic to chicago. Constant sneezing and my skin itches. I wonder why?

Both talks went well, the outline of the Three Waves talk is up, the second was basically a walkthrough of what Aboriginal Linux does under the covers (assuming you don't want to use my scripts and instead build your own busybox/uClibc or toybox/musl initmpfs by hand). I look forward to the videos going up, presumably on their youtube channel.

I also attended a talk about Software Radio, but it was a lot more introductory than I was looking for (if I didn't think software radio was a good idea, I wouldn't have gone to the talk?) and half of it was the presenter showing us a video of a more interesting talk on software radio that he didn't give us the URL to.

Went to a very nice talk about how the distribution sausage is made which I'd like to re-watch when the video goes up. It's Red Hat focused, but the guy giving it knew his stuff. Talked with him a bit afterwards about how I want to bootstrap distributions from essentially a Linux From Scratch chroot, and the troubles I had with Gentoo's INSANE annotation of every single package with every architecture it had been tested on (adding a new target requires touching every file in the entire tree, which is why the hexagon guys told portage that hexagon was a variant of x86 to get it to work, and then jettisoned portage for a linux from scratch approach without ever trying to push anything upstream into gentoo). So now I'm hoping rpm and dpkg are less stupid.

He then explained how the situation was worse than I knew for Red Hat, because not only will Fedora 24 not build under Fedora 22, it won't build under Fedora 24 either. It builds under a modified fedora 23, and once they get all ~16,000 packages to build once, they move on. And attempting to do a verification step of rebuilding under the result and fixing anything that broke would be too much work, they haven't got the resources.

So that's nice. I guess I should go back to looking at Debian, since they actually seem to care about this.


April 1, 2016

Today's the first day of Flourish, although both my talks are tomorrow. I flew to Chicago yesterday, and am staying with Beth Eicher, an old friend and coworker from Timesys who's one of the driving forces behind Ohio LinuxFest. (She has a kid now, who is ambulatory but prevocal.)

The bus we took to Flourish is the Jackson Park Express, which she pointed out Weird Al did a song about, specifically mentioning her stop. (Since she's been personally serenaded at two Weird Al concerts, she's decided that song is about her. I hadn't actually heard that one yet because the past two albums I bought the CD but don't actually have a convenient CD player hooked up to anything anymore, so I just listened to the songs on youtube. Apparently I missed a few.)

Yup, it's a weird al love song.

Flourish is a bit sparsely attended this year, but it's rebuilding after a gap year. That's the downside of student run conventions: the previous group that runs it graduates and there tends to be some rebuilding. Aggiecon tends to go in 4 year cycles as the new group who takes over flounders for a bit, learns to do it well, and then graduates together. You'd think it would involve smoother handoffs and gradually onboarding junior members, but it never seems to work out that way. The people who know how to run it do so until they're no longer available, and then newbies are thrust into the spotlight...

Penguicon tried to cycle the concom out and hand off to new people every year for the first couple years, that's why year 3 almost didn't happen. And then when they stopped that we got Mr. Penguicon trying to build his personal identity around it, which was


March 29, 2016

Tried to tether xubuntu 14.04 to my phone, spent ten minutes wrestling with the fact that any access point needing a password fails because it won't pop up a prompt (the hover text says it needs a password but it won't ASK for one), and when I went into settings->network connections manually it wouldn't let me edit anything (all the fields were greyed out). I thought maybe it had created a corrupted entry so I deleted the entry for my phone's wifi, but it recreated it corrupted. The trick turned out to be to delete it, then create a new wireless network WITHOUT trying to associate with it, and THEN I could give it a wifi password.
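
(In theory network-manager's command line tool can do this without the applet; I didn't try it at the time, but if I'm remembering the syntax right it's something like the following, with made-up network name and password:)

nmcli dev wifi list
nmcli dev wifi connect MyPhoneHotspot password hunter2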

Seriously, this is craptacularly bad. Anyway, it took me ten minutes to get it to work, after which I couldn't remember what I wanted to Google.

Linux on the Desktop!

Meanwhile, the advantage of chromium over firefox was always that I could kill tabs. But the chrome web browser (chromium-browser) is broken in xubuntu 14.04. It truncates its command lines (the process is writing a NUL byte into its environment space so all the children of the --zygote process show their command line as "/usr/lib/chromium-browser/chro"). The problem is, when chromium restarts itself and reloads all its existing tabs, it eats insane amounts of CPU and memory. (I have hundreds of tabs in a dozen or so windows. I've learned from experience that stuff I merely bookmark never gets looked at again, the firehose of new data doesn't STOP. But I'll periodically go through and harvest old tabs, and do the implicit todo items in them.)

So if I let my old "./whackium.sh" script run to completion, calling top -b and working out which chromium tabs are eating cpu long after they should have stopped, and killing them all, it eventually kills chromium. The whole thing, not just individual tabs. Because they destroyed the ability to distinguish between the child processes, thus defeating the whole purpose of chrome in the first place.
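
(The guts of whackium were basically a batch-mode top pass feeding kill; a from-memory sketch of the idea, not the actual script, with %CPU being field 9 of top's batch output and the 10% threshold pulled out of the air:)

top -bn1 | awk '/chromium/ && $9>10 {print $1}' | xargs -r kill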


March 28, 2016

In my previous installation checklist I had to sudo ln -sf vimrc /etc/vim/vimrc.tiny and I still need to do that, because somebody at ubuntu hates vi and tries to sabotage it every single install. I don't know why.

I need to "sudo apt-get install aptitude" and then "sudo aptitude install chromium-browser mercurial subversion git-core pdftk ncurses-dev xmlto libsdl-dev apache2 xfce4-clipman-plugin g++". I didn't install flashplugin-nonfree libreoffice-gtk yet.

Disable that horrible "light locker" thing (if you don't know what the screensaver is called, you'll never find it). Make the power manager always show the icon. Set a root password...

The xubuntu terminals are SORT of white on black, except it's grey on a darker grey which is a STUPID source of eyestrain. You used to fix this by deleting terminalrc under /etc/xdg, which has now moved into an "xfce4" subdirectory for some reason, but deleting it didn't set the background to proper black so I went into edit->preferences and changed the color myself.
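
(Noting the relevant lines for next time so I can skip the GUI: the file ends up at ~/.config/xfce4/terminal/terminalrc once you've touched the preferences, and if I'm remembering the key names right the colors are:)

[Configuration]
ColorForeground=#ffffff
ColorBackground=#000000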

Eventually I should be able to move the mail over, but there's a lot of other random fixups to do first. (Copying my home directory with its random dotfiles fixed some of it, but not all. Ordinarily I DON'T do that, and force a proper re-setup so I can document what it all was and make sure everything's reproducible, but I just want to get this working again so I can do things before Flourish and ELC. I didn't expect it to eat 3 days, and can't afford for it to eat MORE than 3 days.)


March 26, 2016

The giant rsync finally finished. (I think I was rsyncing over a backup of the mac's virtualbox linux partition, not the most recent netbook rsync.) Then I let xubuntu update itself to 14.04, and when it rebooted the networkmangler icon had vanished so I can't select which wireless network to log into. (Bravo, ubuntu. I knew better, and I tried an upgrade anyway. It has NEVER worked for me in the entire history of ubuntu.)

So, time for a clean install. Had an alarming moment looking at xubuntu.com (which is an apache fresh install page), but it's xubuntu.org. (Right, I knew that, it's just been a while.) I downloaded the iso (first attempt had some sort of permission problem that firefox won't tell me anything about, and why is my backup machine running firefox? Eh, retried and it worked.) Then usb-creator-gtk wasn't installed, but easy enough to add it. Then the usb stick had a GPT partition that usb-creator-gtk didn't know how to handle (and fdisk doesn't know how to delete, but "cat /dev/zero > /dev/sdb" does)...

It IS possible to install over an existing partition without reformatting it if you manually mount the partition and delete everything but /home, then click "do something else" at the bottom of the selector thing (really, that's what it's called) and then double click on the partition, manually select the same filesystem type in the pulldown, and THEN it lets you not select "format" while assigning a mount point.

Unfortunately, the resulting install STILL doesn't have networkmangler launching. (All I kept was the /home directory, I guess that's screwing it up?) Luckily other people have had this problem, and after a bit of googling I found a page that said I need to run nm-applet under the "dbus-launch" wrapper, and edit the Exec= line in the file /etc/xdg/autostart/nm-applet.desktop to add that wrapper. (If keeping /home broke it, how does modifying /etc fix it? Is this just plain broken?)
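
(Concretely, the Exec= line in /etc/xdg/autostart/nm-applet.desktop goes from the first form to the second:)

Exec=nm-applet
Exec=dbus-launch nm-applet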

So yeah, if you're wondering why Linux on the desktop never happened, and why I don't consider Android on the Desktop a step backwards from this nonsense, I'm trying to use Ubuntu 14.04.4. That's 4 bugfix-only dot releases after a Long Term Stable release shipped, and this is the kind of hoops you have to jump through to do esoteric things like "connect to the wireless access point in the next room".

Meanwhile, the box makes an annoyingly loud pc speaker beep every time I accidentally plug/unplug the power cord (it's a replacement adapter so it jostles out easily), which is actually done by the BIOS (system management mode or some such nonsense), so I can't STOP the beeping, but in 12.04 I had the volume turned way down. How to replicate that? Let's see... Pavucontrol was useless (didn't have a knob for this), I installed aumix but it had even fewer controls (none for the pc speaker beep). I had this working in 12.04 so I know there IS a way... I tried fiddling with the "pcspkr" module but that's not it. (It wasn't installed and shouldn't be.)

Ah, I finally stopped looking for ubuntu pages on this and looked for LINUX pages on this, which led me to alsamixer which (when you hit F6 and select the bottom audio connection) pulls up a menu of 6 audio controls, the rightmost of which is called "beep" and controls the thing I want to control. I have no idea if it'll persist over a reboot... ah, "sudo alsactl store" makes it persistent.

Linux: smell the usability!


March 25, 2016

Yesterday I thought "Three day weekend coming up (celebrating the release of the movie Ishtar), I should probably upgrade my netbook to 14.04 so I can read my email on the machine I actually carry with me when I head off to Flourish." I wanted to actually do some programming this weekend, so I decided to upgrade instead of reinstall.

Since I don't trust the upgrade not to eat all my files, I decided to do a full backup of everything, through the network. The server already has the individual directories I particularly care about rsynced and periodically tarballed (with old tarballs on various USB disks), but it's been a while since I did the "everything under /home" rsync.

I should have plugged in a cat 5 cable. Even an rsync over an older incremental backup has taken a day already. Not sure when it'll finish, rsync isn't exactly known for a global progress indicator. (Maybe I could add that to the toybox rsync someday? Hmmm...)
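
(Note to self: sufficiently new rsync, 3.1 or so, apparently did grow a whole-transfer progress mode, which would have helped here. Hypothetical paths:)

rsync -a --info=progress2 /home/ backupserver:/backups/home/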


March 23, 2016

As long as I've opened the can of worms that is sed again (because Debian did something stupid): emit() can return an error when it can't write data to the output. One codepath prints an error message, another doesn't. Most callers don't check its return code, two do. One of those two does an error_exit(), which may not be the right thing if you're doing -i? I don't know.

The hard part's almost always working out what the right behavior _is_, not implementing it. Posix isn't close to detailed enough to notice this stuff, and I can't ask the austin group list because they're thoroughly Jorged.

Oh well. Throw it on the todo list.


March 16, 2016

The reason Android expects to stabilize at an installed base of 6 billion phones: 1.3 billion people have no access to electricity. So basically they expect everybody in the world who can get an android phone will.


March 15, 2016

One of the issues with vi and cp and such is figuring out when they should overwrite the existing file in-place, vs when they should copy data to an adjacent file then mv it over the original.

The first one has the downside that symlinks and hardlinks can modify multiple copies of the same file, and sometimes you don't want to. (For example, the bunzip2 install overwrites an existing /bin/bunzip2 in place, and if that's a symlink to busybox it'll brick the system. Alas "cp -f" doesn't help because it tries to overwrite first and only deletes the file if the first attempt fails. This is why toybox binaries are chmod read-only: so at least cp -f won't stomp the shared file in-place.)

A downside of the second one is that creating another file in the same directory doesn't guarantee you're on the same filesystem, because we have bind mounts now. In fact when I'm developing an Aboriginal Linux build control image (either using more/chroot-splice.sh or more/ ../control-images/build/lfs-bootstrap), I cp read-only files out of /mnt to /tmp and then --bind mount the /tmp version over the /mnt version. That gives me a writeable file, but renaming it doesn't work because the filesystem it's on is read only. Similarly some scripts do "cp file.txt /dev/ttyS0" instead of cat, because historically that's worked; it expects overwrite in-place.

I implemented "create another and mv" logic for patch, both because inserting text in an existing file is awkward (you have to read and rewrite all the data after that point, and you corrupt the file with data loss if interrupted partway), and because you should be able to leave the file unmodified if a patch hunk fails. I then factored it out into lib.c and polished it for sed -i.
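
(In shell terms it's the classic sibling-tempfile-and-rename dance; the point is the mv is atomic, so an interrupted edit leaves the original intact. A sketch with made-up filenames, and note it breaks hardlinks, which is sometimes exactly what you want:)

tmp=$(mktemp file.txt.XXXXXX) &&
sed 's/old/new/g' file.txt > "$tmp" &&
mv "$tmp" file.txt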

But for editing arbitrarily LARGE files with vi, I want to be able to mmap the file and read the data in place. Fine: allocate an array of lines that have starting offset and length of each line within the file. (Do we want to limit ourselves to 4 gigs offset, or 4 gigs line length? If not that's 16 bytes overhead per line, possibly more if we do some type of tree structure for the indexes instead of a giant array that's a big memcpy to insert into the middle of... Eh, start with an array and wait for bottlenecks to present themselves.)

I don't want edits to go live until you save, so modified lines need to be malloced and saved locally until written. The historical behavior of vim's ":w" is to overwrite existing files, updating hardlinks. (Probably truncate and write? I should strace it.) That also says that the offset should be a "long" so I can store a pointer or offset in it, but I'd need a way to indicate which...

And then there's the problem that if somebody else modifies the file while we're editing it, our mmap changes and our offset/length indexes become invalid. Which says we need to read everything into malloc memory, which is what I was trying to avoid...

Sigh. Implementing: easy. Figuring out what the correct behavior should be: hard.


March 14, 2016

Today I wrote up (in email) why you get filesystem corruption when you use a writeable non-flash filesystem on flash, so I might as well copy it here.

Conventional filesystems are based on the assumption that blocks the filesystem didn't write to won't change, and the standard block size on Linux has been 4096 bytes for many years. [1] Hard drives used 512 byte blocks, and the newer ones use 4096 byte blocks, so you could update the underlying storage with fine granularity, and parts you weren't writing to stayed the same.

But flash erase blocks are enormous by conventional storage device standards, I've seen anywhere from 128k to 2 megabytes. If the flash hardware is interrupted between the block erase and the corresponding block write (power loss or reset both do this), then the contents of the entire erase block are lost. Meaning you could lose a megabyte of data on each _SIDE_ of the area you wrote, which can knock out entire directories and allocation tables or even take out your superblock. Blowing a 1 megabyte hole in a conventional filesystem tends to render it unmountable, and filesystems designed for use on conventional hard drives don't know this is an option.

FAT is especially vulnerable to this: the file allocation table is an array of block pointers all next to each other at the start of the partition. A single failure to rewrite the data after erasing an erase block will take out the entire FAT and trash the whole filesystem unrecoverably.

It's a small race window, but the results are catastrophic.

This is why there are "log-structured" filesystems designed specifically for flash, which cycle through all the available erase blocks and make a tree pointing back to the data that's still valid in the previous ones. Linux has several implementations of this concept.

This technique is sometimes confused with "journaling", because it provides many of the same benefits, but it's implemented differently. Log filesystems are organized into an array of erase blocks. To format one, you have to know the flash erase block size, and the blocks must be aligned to the start of an erase block. Because of this you usually _can't_ use them on a non-flash device: the filesystem driver will try to query the flash hardware to determine the erase block size, and if that fails it doesn't know how to arrange itself. They're designed ONLY to work on flash.

In operation, they cycle through all the available erase blocks and make a tree pointing back to the data that's still valid on the previous ones. Each new erase block contains both new data and any existing data collated out of the oldest block in the filesystem, I.E. the one which will be overwritten next. If there are free erase blocks the filesystem can just write new data (often leaving most of that erase block blank) without deleting an old block. If there are sparsely used erase blocks it copies the data from the oldest one to a new one and adds its new data to the extra space.

When a log-structured filesystem is near full, writes get slower because it has to cycle through a lot of blocks to find enough free space, copying the oldest data to the new one and collating the free space until it has enough space to write the new data. (The smarter ones can skip entirely full blocks and just replace blocks that had some free space in them.)

Mounting them can also be a bit slow because the driver has to read the signature at the start of each erase block to figure out which one has the newest timestamp, I.E. the one that contains the current root of the tree.

The advantage of doing this (other than automatic wear-leveling) is that if writing is interrupted after an erase, the single erase block that got trashed can be ignored (each erase block is checksummed, detecting invalid data is easy). The previous block still has a root node describing the contents of the filesystem as it was before the last attempted write, and the oldest block never gets trashed until after the newest block is written. (That means it always needs one free block between the oldest block still in use and the newest block, to accommodate these failures. So you're never erasing a block that still contains valid data, the data had to be copied out to a new block first.)

Note: read-only filesystems don't have this problem. You can stick a squashfs or read only ext2 image in flash and it's fine, because it never erases blocks so the granularity difference between what the filesystem was designed to expect and what the hardware actually does never comes up. It's only when _writing_ to flash that you need a filesystem designed for flash to avoid data corruption.

[1] It used to be 1024 bytes, but the longest an individual file could be on ext2 with 1024 byte blocks is 16 gigs, and the largest with 4096 blocks is 4 terabytes, so everybody switched years ago. (Because it uses a 3 level tree to store metadata and each level can hold more branches in a 4096 byte block than a 1024 byte block, that's why the difference is so big.)
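
(The arithmetic, since I had to work it out: block pointers are 4 bytes, so a block holds blocksize/4 of them, and three levels of indirection cover (blocksize/4)^3 data blocks, times the block size. Ignoring the dozen direct blocks and the single/double indirect trees:)

$ echo $(( (1024/4)**3 * 1024 / 2**30 )) gigs, $(( (4096/4)**3 * 4096 / 2**40 )) terabytes
16 gigs, 4 terabytes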

A follow-up question: can we use log structured filesystems on SD/MMC cards? The real question seems to be "can you disable the FTL (Flash Translation Layer) and enable MTD (Memory Technology Device) mode". To quote Free Electrons:

Two types of NAND flash storage are available today. The first type emulates a standard block interface, and contains a hardware "Flash Translation Layer" that takes care of erasing blocks, implementing wear leveling and managing bad blocks. This corresponds to USB flash drives, media cards, embedded MMC (eMMC) and Solid State Disks (SSD). The operating system has no control on the way flash sectors are managed, because it only sees an emulated block device. This is useful to reduce software complexity on the OS side. However, hardware makers usually keep their Flash Translation Layer algorithms secret. This leaves no way for system developers to verify and tune these algorithms, and I heard multiple voices in the Free Software community suspecting that these trade secrets were a way to hide poor implementations. For example, I was told that some flash media implemented wear leveling on 16 MB sectors, instead of using the whole storage space. This can make it very easy to break a flash device.

If you can figure out what erase block size your sd card is using, you can theoretically use block2mtd, at least according to the Raspberry PI and Debian guys.

That takes a block device and adds manually supplied flash erase block information. This only reclaims reliability if the FTL implementation, when receiving an aligned erase block sized write, won't break it up and do silly things with it behind the scenes. (Depends on your sd card vendor, apparently? How do you tell you've fixed it except by yanking the power a zillion times? Even some "senior embedded engineers" gloss over these issues because injecting failures at the operating system level won't trigger this: the FTL chip will automatically follow each erase with a write replacing its contents behind the scenes unless it loses power (or gets hard reset) at the wrong time. Simple tests in the lab won't hit this issue.)

According to block2mtd.c in the kernel source, you either insmod block2mtd from initramfs with "block2mtd=<dev>[,<erase_size>]" or else write to /sys/module/block2mtd/parameters/block2mtd for the static version. In theory you should be able to provide this on the kernel command line too, but I'm not sure they wired that up in this module? (I'd have to examine further...)
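
(Illustrative usage, with made-up device name and erase size, assuming the kernel has block2mtd and a flash filesystem like jffs2 built in; check what your actual card does before believing the 128k:)

modprobe block2mtd block2mtd=/dev/mmcblk0p2,128KiB
mount -t jffs2 /dev/mtdblock0 /mnt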

Another alternative is to make sure your partitions are aligned to a nice big power of 2 size (so whatever your erase block is, you're not crossing them) and be prepared to lose the writeable partition. If you're logging to FAT, and the FAT partition is toast when you try to mount it, reformat the thing. That way the system can at least boot and start logging new data.

Also, you only lose data while writing it; once it's written it should be safe. So when doing firmware updates you can have two partitions, write the update over the "old" one, and then have your boot software do the same "check the partitions, see which has a valid checksum and the newest date stamp, use that" dance, and you should always have one valid even if the other is toast. (This is a common approach in the embedded world, it reduces bricking the device on update.)


March 13, 2016

Long thread on the list about the toybox test suite. To me it sounds like Android has a hammer, so everything looks like a nail, but I admit testing is not an area of expertise of mine. Along the way, I seem to have posted a lot of "it's on the todo list!" that weren't in the recent post about that.

Poking at ls and trying to fix the -q and -v stuff. Posix describes ls -q with "Force each instance of non-printable filename characters and <tab> characters to be written as the <question-mark> ( '?' ) character. Implementations may provide this option by default if the output is to a terminal device." So what happens if the username has a tab in it? And presumably "character" is "utf8 wide character" now, right?

I'd ask the posix list about that, but Schilling. So I have to decide the right thing for myself and ignore the broken standards committee.

Meanwhile, I redid ls to use crunch_string() with its own escape for -qb, which brings up a problem: there are three failure cases needing escapes, low ascii (0-31), invalid sequences, and unmapped unicode points. The problem is the escape function gets passed a "wide character" (I.E. a decoded int), and then I need to dismantle that back to the raw bytes for -b byte escapes. I _think_ wcrtomb() will do it, but the docs are unclear? (Will the wc->bytes always exactly undo the bytes->wc transform? Is this guaranteed symmetrical even when it's an unknown unicode point?)

Another problem is that for -b I need to escape spaces, which are a valid printable character the escape isn't currently called for. And if the escape is sticking a backslash before characters, I need to escape backslashes. So I need to add an argument to crunch_str() to tell it what printable characters to pass through to the escape function.

(My brain's still screwed up by the cold and lack of caffeine, I'm just chipping away at the darn coal face anyway. Somewhat ineffectively, but eh.)


March 12, 2016

The reason "make test_mv" is failing is that mv depends on cp, so the single build of "mv" has cp in there also, and since it's first in the table mv always thinks it's cp, and thus acts like cp. (The multiplexer code gets yanked but the command table isn't length 1, so only the first entry in the table is noticed.)

The problem is cp is implemented in two chunks, the posix cp functions from the 1970's and CP_MORE which adds the giant stack of non-posix options (-adlnrsvF) you need to implement a modern cp.

In general I've been leaning more and more towards removing per-command options. There should be one obvious way each toybox command behaves, having to check a config file to see which variant you built is silly. When I started toybox I had a lot of busybox influence that took a while to clear out: toysh had a dozen menuconfig options before I even implemented environment variables. It doesn't anymore; now you select which commands you want in menuconfig, but other than a few global options there aren't any that change the behavior of a command, just whether or not to include it. I started seriously cleaning the sub-options out when writing the outline for my 2015 ELC talk. (The old story: you start to write documentation and change the code rather than documenting what it currently does.)

My recent experience with the posix committee (laws, sausages, and standards: do not watch them being made) has tipped me over the edge in removing the CP_MORE and MV_MORE config options. Toybox cp and mv implement a lot more options than Posix mentions, and I'm ok with that. You can't select to make them _not_ do it in the name of compliance with a dead standards committee.


March 11, 2016

I composed the following reply to this message but didn't send it to the list. I suppose my blog is the right place for it.

On 03/11/2016 06:04 AM, Joerg Schilling wrote:

> ...

No.

Not wanting to interact with the guy who kept a memorial to decade-old Linux bugs in cdrecord's README as a reason Solaris was better and everybody should use it instead of Linux is _exactly_ why I didn't post any of this here when I maintained busybox, or since toybox was accepted as the Android standard command line implementation going forward.

Him, specifically, and he's STILL doing it:

> Note that your observation about Linux is not complete, there
> are other strange things in the Linux history. After 2001
> (4 years after a related POSIX proposal was withdrawn), Linux
> started to implement that POSIX proposal for ACLs and extended
> arrtibutes. Other platforms at that time did already decide to
> implement the NTFS ACLS that now have been standardized
> together with NFSv4.

15 years ago this happened! See how evil they are!

I knew better, and I posted anyway. I'll stick to Posix-2008 as my frame of reference to diverge from and stop bothering you guys.

Rob

I suppose another objection would be that their mailing list software doesn't make it obvious how to get a link to the thread a message is in (in this case, here).

The reason I haven't been engaging with the posix mailing list is one of its most prolific contributors is Joerg "Linux Sux Solaris Forever" Schilling, who I don't want to get on me. I'm getting ready to throw out the Posix baby with the Schilling bathwater and just treat them the same way I do LSB and the man pages. They document one way of doing it, but not necessarily the only one. Posix is of historical interest, but has sadly fallen into disrepair in modern times.

The posix committee is dead, time to move on.


March 10, 2016

Phone call with Jeff to try to work out our ELC talks. Lots of good material about why open hardware in general, and why j-core specifically. Now I just need to write it all up.

I also need to contact the various tokyo women's universities to see which have a comp-sci department, and try to arrange visits so we can try to hire graduates and/or try to interest them in computer hardware courses based around J-core to hire _next_ year's graduates.


March 8, 2016

I've had a cold all week. Not combining well with the lack of caffeine. I am getting NOTHING done. Oh well, recovery time...

I'm trying to fix rm -r, which should be able to handle infinite-ish recursion depth (as described in the recent giant todo list). I've gotten a basic cleanup pass done on the dirtree stuff, but just can't focus well enough to design the new stuff with any confidence in the result.


March 6, 2016

Still working out what file.c should do, but at least I got another update checked in.


March 5, 2016

There's an article (and here's an interview with the author) that says single women now outnumber married women in the US, and the trend line's pretty clear going forward. In 1960, 60% of women ages 18-29 were married, today it's 20%. The median age of first marriage for women wandered from 20 to 22 between 1890 and 1980, then went up to 23 in 1990, and now it's 27.

Fade and I got married in 2007 so she could get on my health insurance, after happily living together unmarried for years. Since Obamacare happened we've been getting individual plans, so we're still married but our particular catalyst for going through with it has gone away.

In related news, Fade's girlfriend broke up with her today (amicably, and they're still friends; long distance relationships are hard). I hope she finds another one in Minnesota.

Backing up: Fade hasn't heard back from the University of Texas yet, but she applied to a few other grad schools and got accepted by the universities of Michigan and Minnesota, and Minnesota's offering her a scholarship with a living stipend. Fuzzy is strongly against moving (she loves the house and her garden), and in the past 20 years I've moved out of Austin three times and moved back again each time, so I would prefer to keep the house even if we spend a few years up north again. I highly doubt we'd find this nice a house as convenient to 24 hour grocery stores and a large university and so on that we could afford. (We could only afford it because of the 4% fixed interest rate on the mortgage; if we sell we lose that.) But Fade's run out of graduate school prep and needs to actually start her Doctorate if she's going to get one.

We're not sure how this is going to work out, but I might wind up spending 6 months up north with Fade (near my sister and the nicephews), and 6 months back here in Austin, with Fuzzy taking care of the house (and the dog and cats) when I'm not there. (Minus however much time I spend in Japan, of course...)


March 4, 2016

During my most recent trip to Arkansas, Nick pointed out that some of my vision weirdness sounds like optic nerve swelling. (The official name for which is a latin phrase I'm not remembering at the moment which literally means "optic nerve swelling", but sounds much more ominous.)

This can cause flashing at the sides of your vision when you rapidly move your eyes side to side, as the irritated nerves signal through the only channel they've got. One of the things that can cause this: bad/chronic sinus issues causing adjacent swelling of things near the sinuses.

This explains much! It would be nice if one of the various doctors I've been to over the years could have mentioned this, but given that the American Medical Association decided back in the 1970's to raise doctors' salaries by restricting the supply, I.E. engineering a shortage of doctors by enforcing medical school graduation quotas (for example the new Dell Medical School at UT is starting with a class of 50 students for the whole school), these days you get like 10 minutes with a doctor, and you're one of 40 patients she sees that day, so only the most immediately obvious stuff gets noticed before they're off to the next patient. (And when we started importing foreign doctors, the AMA cartel moved to limit their visas. Of course obamacare does nothing to address this because challenging the power of for-profit entities that have cornered the market is not Obama's thing, instead it's about channeling money through for-profit insurers to pay the higher fees for any share at all of the smaller pie.)

Anyway, I know from past experience that caffeine has been making the flashing worse, but couldn't understand why. Given that it seems to be sinus swelling screwing up nerves that makes sense, so I went off caffeine and started taking store-brand zyrtec daily until cedar pollen season's up.

I really, really miss caffeine. My productivity has been noticeably reduced by its absence. But still being able to see next decade would be really nice.


March 3, 2016

I wrote a big long reply to the mailing list that probably should have been a blog post, so copying it here for posterity. (I.E. in case Dreamhost's darn web archive eats itself again.)

On 03/01/2016 09:18 PM, enh wrote:
> No worries. Is it easier to keep track of things if I use github pull
> requests?

Not really. That's not the problem.

This week I fixed the bzcat integer overflow John Regehr reported, and the base64 wrap failure, dealt with Mike's heads up on glibc breaking the makedev includes, fixed three different build environment breaks (prlimit probe and MS_RELATIME define for uClibc, finit_module probe for ubuntu 12.04...) and redid the test suite so it's consistently printing the command name at the start of each test and then factored that out so the test infrastructure was doing it.

Right now I'm trying to figure out how to redo lib/dirtree.c so that rm works with infinite depth. (I have a test in the test suite, but haven't checked it in yet.) Posix requires infinite recursion depth out of rm, but if dirtree closes the parent's filehandle and tries to reacquire it later (via openat("..") and drilling down again from the top if that doesn't match up, and keeping symlink parent filehandles open because reacquiring those is nuts at the best of times), then you have to do breadth first search because dirclose() loses your place and order isn't guaranteed, so I need a mode that eats filehandles and a mode that eats memory.

I haven't written the dirtree code but I've written the start of the test case for it:

+# Create insanely long dir beyond PATH_MAX, then rm -rf it
+
+// Create directory string 1024 entries deep (and half of PATH_MAX),
+// then use it to create an 8192 entry directory chain. Note that this
+// "mv a chain under another chain" technique means you can't even enforce
+// a PATH_MAX length limit with a mkdir check, the limit can be violated
+// afterwards, so rm -r _must_ be able to clean up.)
+//
+// This one is _excessively_ long to also violate filehandle limits,
+// so a naive dirtree openat() implementation keeping a filehandle to each
+// parent directory would _also_ exhaust limits (ulimit -Hn = 4096).
+// (But hopefully not so long we run out of inodes creating it.)
+X='a/'
+for i in 1 2 3 4 5; do X=$X$X$X$X; done
+for i in 1 2 3 4 5 6 7 8
+do
+ mkdir -p "$TOPDIR/$i/$X" &&
+ mv "$TOPDIR/$i" . &&
+ cd "$i/$X" &&
+ continue
+ i=
+ break
+done
+if [ -z "$i" ]
+then
+ break 2>/dev/null
+ exit 1
+fi

I'm trying to get NFS working with toybox mount. (You'll notice the recent "mount doesn't pass through -o leftover string data" fix.) I documented a simple NFS test environment years ago, ala http://landley.livejournal.com/49382.html (in qemu you mount 10.0.2.2 rather than 127.0.0.1) but busybox was passing through the old binary config blob format instead of the string config format, and there's no reason the string version SHOULDN'T work but I haven't made it do so yet so I'm sticking printk()s in the kernel I'm running under qemu to see why.

The need for file to detect cpio files got me looking at cpio again, reminded me of my pending "add xattr support to cpio/initramfs" half-finished patches, and I also noticed that it wasn't checking the cpio magic (fixed that, it accidentally got checked in with the makedev header stuff, although the major()/minor() rename in lsof didn't because of pending lsof cleanup I didn't want to check in yet).

My pending cleanup of lsof needs to address the fact it takes 18 seconds to produce its first line of output when run with no arguments on my netbook (the ubuntu version takes 0.7). That's a largeish thing. Plus this looks like it could share the ps /proc parsing infrastructure, which needs to be genericized into /lib. (I did some but not all of that refactoring work in the last interminable round of ps meddling; the common code can't access TT.* or FLAGS_* because that's all command-specific. Doing that would ALSO let me break out ps/pkill/top into separate command files.)

I was using the bash 2.05b limitations (toybox not rebuilding under it because jobs -p wasn't implemented yet) as an excuse to reopen the toysh can of worms, which I'm tempted to continue because digging into it has reminded me that the research I did circa 2006 is full of things I no longer remember. (Plus the busybox ash/hush split is sad, and aboriginal linux needing to use two shells simultaneously goes against the entire point of the project. So I need a proper bash replacement that works on nommu, and that means I write one.) But for big things like this I need to devote large blocks of time, because chipping away at them produces no progress. (I spend all my time figuring out where I left off and why.)

I have a build break in my local sed.c to remind me to add the "sed,+NUMBER" range extension feature and "s///NUMBER" extensions.

I have another build break in netcat to remind me to:

A) finish factoring out the xconnect() and xpoll() stuff into lib/net.c, taking into account the ipv6 name lookup stuff ala:

egrep "(->|[.])ai_" toys/*/*.c

Not to mention also converting:

grep gethostby toys/*/*.c

B) Add the UDP support for:

http://sarah.thesharps.us/2009/02/22/debugging-with-printks-over-netconsole/

C) Figure out whether I should merge this with tcpsvd.c and/or telnet.c, or if factoring out the common code into lib/ is enough. (Also, does the tail -f code work into that merge or stay separate? There's a whole infrastructure design rat's nest I need to do a lot of pacing and staring off into space about there. It's on the todo list.)

I have xdaemonize_nofork() in lib/portability.c to remind me to do a second pass of nommu support over everything. (That's ANOTHER thing to fix in netcat.c.)

scripts/install.sh has a one line patch not checked in to remind me of THIS issue I need to deal with:

+ # todo: install? --remove-destination? (Will install stomp symlinks?)
  [ "$1" == "--force" ] && DO_FORCE="-f"

Remember the help.c rewrite to more intelligently shuffle and recombine help text? That's pending too.

In tests/expr.test I have:

+# expr +1
+# expr 2+1
+# expr 2 + 1
+# expr 2 +1
+# expr X * 2
+# expr X + 2

Which is a year-old reminder of this and that's just a side issue, the real reason expr needs a rewrite is priority grouping ala this.

In tests/printf.test i have:

+# The posix spec explicitly specifies inconsistent behavior,
+# so treating the \0066 in %b like the \0066 not in %b is wrong
+# because posix.
+testing "printf posix inconsistency" "$PRINTF '\\0066-%b' '\\0066'" \
+ "\x066-6" "" ""

Which isn't checked in yet so "git diff" shows it and thus I remember it. (I hope to get to the point where just HAVING a failing test in the test suite reminds me of an issue to fix, but I tried to do a triage pass on the test suite last month to split out "contributed test needs to go in 'pending' even though I don't have a category for that because it's not remotely testing the right things", from "tests are valid but incomplete" from "there is no test for this command at all" from "these tests need to run as root in a controlled environment and the last time I sat down to make an aboriginal linux test environment run under qemu I tried to test 'ps' and couldn't figure out how to get output that wasn't hugely subject to kernel version skew then got distracted" from "I went through every #(%(&# line of the posix spec for this command and I've covered every corner case and this one's DONE"...

(Sheesh, the test suite could eat multiple months all on its own...)

I have another note in tests/test.test that "test" is a shell builtin so this test suite never actually tested our version, and that I need to add a way to detect this. (VERBOSE=false to symlink the binary to /bin/false maybe? No, VERBOSE=hello to substitute the "hello world" command out of toys/examples which is _guaranteed_ to produce the wrong output, if not the wrong exit code... :)

I was partway through doing vi and less (completing the lib/interestingtimes.c and lib/linestack.c stuff), and I really need to get back to that before the "I forgot the details of how shells should work, need to reread everything again" problem sets in on that. (I was halfway through a dd.c rewrite once, tabled because the comma_iterate() infrastructure wasn't there yet and I knew I'd need common code for mount -o and ps -p and so on. That infrastructure's still not exactly DONE, but I've forgotten the details about dd and have to relearn them again at this point anyway.)

The recent wget submission from Lipi Lee (the second patch to which didn't apply, last hunk failed and I'm not sure why, but it brought up that my patch.c not only doesn't handle the git rename and permissions extensions which I need to add, but it doesn't even handle that "\ no newline at end of file" message if it comes before the last line of the diff. I note the failure was "git am" refusing to accept it, neither did ubuntu's patch, it was broken for some reason anyway, I just started debugging it in mine because I can get better debug output from my code with CONFIG_DEBUG and -x. Maybe I should make -x a default option, "why this patch didn't apply", but the output's way too verbose...)

Oh, speaking of patch, it's the main user of the "data is char * instead of void *" feature of double_list, and I keep meaning to go look if double_list would more naturally be something else, and if so how to modify patch.c to deal with it. Lots of OTHER things cast away the char * when void * wouldn't need a cast...

Anyway, the problem with wget itself (modulo cleanup and adding support for cookies and redirects and logins and a progress indicator and making sure it handles all the headers from here and...)

The REAL problem is I need to make it understand https:// and shell out to that openssh command line utility Isaac Dunham pointed me at last year that's better than stunnel. AND I need to make it handle ftp:// and combine that with the eventual ftpget/ftpput/ftpd commands (which is why I wasn't opening this can of worms yet, I eventually want an httpd that can do enough CGI support to run ph7).

Also, Rich Felker recently fixed strace for nommu, and I have that bookmarked along with this. (Given that I worked to port strace to hexagon in 2010 I'm reasonably familiar with the underlying mechanisms and could probably implement a really basic one for toybox in about a week, if I had a spare week.)

It's on the todo list. So is collating the unescape logic between printf.c, echo.c, sed.c, and sh.c and maybe adding a new wrapper layer to handle hex and/or octal escapes. (How that works in with the printf.test above, I don't know yet.)

Another nommu issue is that "did exec work or not" is difficult to determine, I need to fix xopen() to pass the "did it exec" failure back through pipe[3] to make vfork() be able to detect inability to exec as described here.

I still haven't gotten all the singleconfig commands working, for example "make ftpput" fails because the command is an OLDTOY() alias of ftpget and the singleconfig infrastructure isn't smart enough to work out what to do to build it. The config is wrong, the help text logic is wrong (unless I already fixed the help text logic part, I'd have to check. I was working on it and got distracted by emergency du jour.)

I need to figure out if readfile() should trim the \n and if so audit/change all the callers (and I thought I'd allowed a chomp() variant into lib but couldn't find it when I went to look.)

I'd like to figure out why date is failing with weird error messages:

$ sudo ./toybox date -D %s 1453236324
date: bad date '1453236324'; Tue February 53 23:63:00 CST 2024 != Wed Mar 26 01:03:00 CDT 2025

Ubuntu's wc -mc shows both, it would be nice if ours could.

This is not a complete list. This is me glancing around at my todo heaps and remembering a thing or going "why have I got that browser tab open", "what's in THIS directory's todo.txt... ah 'collate todo.txt files out of these other three directories', of course..." I have TODO: comments at the top of a bunch of toybox files I haven't looked at in ages. That's not even counting tackling the rest of the pending directory or lib/pending.h, this is just the top couple layers of my todo list. The stuff I've been working on _recently_.

And all that's just toybox. I need to do an Aboriginal Linux release soon, so I'm trying to figure out how to fix my old version of binutils, which the 4.4 kernel broke on arm and on mips. (And on superh but Rich is doing a workaround for that. The other two I've root caused but not fixed yet. The core issue is nobody but me regression tests on the last GPLv2 release of binutils anymore, so I have to patch either their projects or binutils, and the patches to binutils are wandering into "add entire new features" territory.)

I'm trying to finish the Linux From Scratch build control image upgrade from LFS 6.8 to LFS 7.8 (a multi-year gap and more or less a complete redo, which is where the perl sed fix came from).

Luckily the recent fix for the chromeos guys also fixed the "toybox doesn't build under bash 2.05b in aboriginal" problem. The other uClibc regressions I checked in fixes for recently.

I'm trying to convert the j-core mercurial repository (which has a half-dozen subrepos) to a single unified git repository. I figured out that the previous blocker was that git "rename" patches weren't getting their paths rewritten consistently and git was dying with an error message so misleading it didn't even blame the right _line_. (Found it out by removing chunks of patch at the end until I found the specific line that made the error manifest.)

My ELC talk was accepted (well, one of them), but it's really a talk that my boss Jeff Dionne knows 10x more about, but he has even less time than I do, so we somehow have to coordinate on putting together a slide deck and co-presenting by April 4.

I've been invited back to talk at Flourish, I have a half-dozen potential talks I could give there and need to select/prepare a subset for them. That's April 1 and 2, I plan to fly up to Chicago on the 31st, be there April 1 and 2, fly to San Diego on the 3rd, be at ELC, probably be there for a while longer due to $DAYJOB, and then fly back to Austin. (There's presumably a trip to Japan at some point but that's sometime after this batch of travel.)

I'm trying to put together an open hardware microconference at plumber's which involves asking people if they _would_ want to go (and possibly speak), assuming it makes and without explicit promise of travel budget because we dunno who will get what yet.

I need to finish migrating nommu.org/jcore to j-core.org, which has turned into a complete rewrite of the site. I need to install the VHDL development tools from a different tarball onto a different machine (updating the install instructions) and then build the current version from the converted git repository to post current binaries into a downloads directory that does not yet exist.

Oh, and I need to write a VHDL tutorial, which would involve me learning VHDL first.

I had a lot more todo items written down on a piece of paper, but I lost it.

> Also, did you say a few months back that you'd started on 'ioctl'? If
> so, do you want to check that in? If not, correct my misapprehension
> and I'll do it at some point. (For that and prctl I'm also slowed
> down by the fact that there's much to dislike about the existing
> commands...)

Oh right, that. I should go do that.

Rob

P.S. The email notification for https://github.com/landley/toybox/pull/26.patch came in since I hit "reply" to this message.

P.P.S. Sorry I haven't set up a proper Android test environment. Dismantling AOSP is something I really look forward to, and do NOT have the desk space for at present...

P.P.P.S. Same for Tizen and chromeos, and I need to resubmit the toybox addition patch to buildroot...

P.P.P.P.S. Oh just hit send already.


March 3, 2016

My ELC talk got approved (J-core design walkthrough), but I'm co-presenting that one with Jeff Dionne and need help putting together the presentation. (I'm presenting stuff there I don't actually KNOW yet, although Geoff Salmon can help me come up to speed if Jeff is too busy. And Jeff can do it off the top of his head.)

Flourish contacted me and asked me to speak again, and work seems willing to send me there. It's right before ELC, but if I fly up to Chicago on the 31st and fly from there to San Diego on the 3rd it works out.

I have a bunch of topics I could do at Flourish:

  • The turtles all the way talk on j-core/open hardware. (Work's sponsoring the trip, practice for ELC, there's new material, it's of general interest anyway. I could do the J-core tutorial too, maybe work would spring for a dozen Numato boards again and I could set people up with them?)

  • The prototype and the fan club (again), the talk I gave there 5 years ago, which never quite got properly recorded. I keep wanting to refer people to it, and the material's still relevant, so...

  • The rise and fall of copyleft. Important talk on the widespread return to public domain licensing, I gave a halfway decent version at OLS in 2013 but only the audio was recorded and I didn't retain the list of web pages I showed (instead of slides I gave primary references), I've wanted to do a proper one ever since. Maybe "Copyleft vs the Public Domain" is a better title?

  • The three waves: Hobbyists, Employees, and Bureaucrats (oh my). I did a series on this for The Motley Fool and there's a BUNCH of new material (among other things it explains the existence of Google Alphabet.)

  • Android as a self-hosting build environment. The PC is now Big Iron, but Apple's read-only iPad future isn't appealing. Android on the workstation is necessary to replace the PC with smartphones, the sooner we do this the less scar tissue we'll have to overcome (preferably before the FBI/NSA forces phone vendors to lock us out of our own devices).

  • Embedded Linux From Scratch. How do you make the tiniest system capable of rebuilding itself under itself from source code, then bootstrapping itself up to arbitrary complexity?

Or I could do a toybox walkthrough, or come up with something entirely new...


March 1, 2016

Not to be outdone, a kernel upgrade also broke the sh2 target, although in this case it's not 4.4 but more like 4.6. Rich's git tree has the new cmpxchg instruction, which my binutils 2.17 doesn't understand the mnemonic for.

Easy enough fix, but that's 3 architectures that dowanna build with the toolchain that 4.3 built with. Disturbing trend.


February 25, 2016

Oops. Trying to make NFS work with the toybox mount command, which is a giant pain, and I remember why I switched NFS support off in Aboriginal Linux's baseconfig: all I added was CONFIG_NFS_FS=y, CONFIG_NFS_V2=y, and CONFIG_NFS_V3=y, and the infrastructure it pulls in DOUBLES the i686 kernel compile time!

Anyway, I found part of the problem: The mount command wasn't passing through the string options. It was parsing out the mount flags, but the rest of the string was carefully assembled and then a NULL got passed to the kernel instead.

That'll screw things up, yes. :)
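For context, those string options ride along as the last argument of the mount(2) syscall; here's a minimal sketch of the shape of the call (not the actual toybox code, names are mine):

#include <sys/mount.h>

/* Sketch: after words like "ro" get parsed into MS_RDONLY etc, whatever's
   left over ("port=9999,mountport=9999,nolock,v3,udp"...) is supposed to
   go through as the data argument. The bug passed NULL here instead. */
int mount_with_opts(char *dev, char *dir, char *type,
  unsigned long flags, char *leftover)
{
  return mount(dev, dir, type, flags, leftover);
}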

Anyway, I wrote down the test environment setup many moons ago, just build the old Userspace NFSv3 server and run it ala:

echo "$PWD (no_root_squash,insecure)" > blah.cfg
./unfsd -d -s -p -e $PWD/blah.cfg -l 127.0.0.1 -m 9999 -n 9999

Then in theory, inside the kernel I do:

mkdir blah
sudo mount -t nfs -o ro,port=9999,mountport=9999,nolock,v3,udp \
  10.0.2.2:/home/landley/test blah
ls -l blah

(Where "/home/landley/test" is whatever the $PWD above was.)

Of course, it doesn't work even with the first mount problem fixed, so I'm adding printks to the kernel to find out why...


February 20, 2016

Paul McKenney pinged me about the Open Hardware Microconference I'm trying to put together at LPC in November, and asked who else I'd invited.

Nobody so far, because I didn't know if it would make, but it's easy to find ideas.

We _need_ a RISC-V person to have any sort of reasonable coverage, and it would be nice to ask the people who tried to clone arm why they stopped at armv2 (answer is almost certainly intellectual property law, but even the Thumb instruction set was announced in 1995...)

Anyway, I'm off to shake the tree and see who's interested. (I know the Qualcomm Hexagon process architect and Linux kernel maintainer socially, at least a little, but dunno if their employer approves of open hardware or how interested they'd be...)


February 19, 2016

Got a tweet out of the blue from an old friend (ex-friend?) whose sudden religious conversion a couple years ago I never understood. According to her twitter stream she apparently just had surgery to remove her Thetans or ritual circumcision or something, and has now officially been born again as the person she always really was, and is sending out "you didn't believe in me, look at me now" notices? (I think?)

My attempts to understand this new religion when it first happened deeply hurt my ex-friend's feelings (I was supposed to just smile and nod), and since she was already systematically cutting off contact with everybody from her old life at the time and declaring us all evil anyway, I just shut up and let that happen. I was under the impression afterwards she was pretending I never existed, but now random contact? Confusing.

The tweet didn't show up in my stream, but when I saw Fade's reply (it was aimed at both of us. I wonder if I used the "mute" feature a year or more ago, or if twitter's being weird?) I read back a bit in her twitter stream, and her new religion seems to be working out for her, so yay? I'm glad she's happy?

The odd part is I saw her father Dale yesterday, because Nick knows him and visited him at work twice during the apartment hunt. In fact Dale recommended the real estate office (down the hall from his day job) where Nick found the new apartment.

My ex-friend's father is one of the people she cut off contact with, and when he asked me how she was doing, I honestly said I had no idea. A mutual friend of Nick's and Dale's died recently, someone they said my ex-friend knew growing up (and the reason Nick's move to Austin got aborted, so I had to return the cat). So Dale emailed his kid for the first time in forever, and apparently the response was, in its entirety, "Do not ever contact me again." That's about the point I stopped reading my ex-friend's twitter stream: when she started talking about "emotional blackmail" from relatives trying to "drag her back in", I decided I was caught up enough.

Since Dale complied and this was before my most recent trip to Arkansas, presumably this tweet is just weird timing? I admit hearing the story of the "six word reply" as Nick put it contributed to my decision not to renew contact. I haven't got the social skills to navigate minefields. It was made clear to me way back when that I was not helping matters, so I stopped. I only tried to understand in the first place because this was a friend of many years, otherwise it's none of my business. If her new religion is making her happy, have fun with it. I don't understand the guy who made TempleOS either, but I respect the amount of work that went into it. They can go off and do their thing, and leave me out of it.


February 18, 2016

Still in Arkansas, but Nick has an apartment! (I'd link to Nick's coverage of the week but that's over on Faceboot and I've never had an account there. I keep meaning to introduce Garrett and Nick, but it always boils down to them both being on Facepalm and me not being. I have twitter and a phone, I didn't have an AOL account the first time around either.)

The Android guys just sent me a mount fix that I think is the same issue the Cyanogenmod guys put in their tree a while back. Since going through the cyanogenmod tree is still on my todo list, and they never actually submitted their fixes to me (or informed me of their tree's existence, I found it on a google search for something else)... yeah.

I already fixed one of their issues (install.c no longer #includes toys.h so it doesn't use LSB headers that aren't available when cross-compiling from MacOSX) in a way they'll probably never notice. They applied a patch to toys.h to #ifdef out the LSB headers, I fixed it a different way, no idea if they'll ever notice or if they'll carry a presumably now-unnecessary patch forever. *shrug*.

The downside of the fixes from the android guys is they don't regression test on uClibc or older distros like ubuntu 12.04. Accumulating build breaks I need to clean out...

Today's example: Elliott's insmod patch hooking up "insmod -" is very nice, except that finit_module isn't available on kernels before 3.8. Ubuntu 12.04 uses 3.2 plus 98 security patches. So that's a build break on my netbook.
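The usual way to stay buildable there is to guard the new syscall at compile time and fall back to init_module(2) at runtime; a sketch of the idea (not Elliott's actual patch, helper name is mine):

#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch: finit_module(2) showed up in Linux 3.8, so old headers won't
   define SYS_finit_module and old kernels return ENOSYS. Fall back to
   init_module(2), which needs the file already read into a buffer. */
int load_module(int fd, char *buf, long len, char *opts)
{
#ifdef SYS_finit_module
  if (!syscall(SYS_finit_module, fd, opts, 0)) return 0;
  if (errno != ENOSYS) return -1;
#endif
  return syscall(SYS_init_module, buf, len, opts) ? -1 : 0;
}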


February 17, 2016

Still in Arkansas.

Finally got the darn perl build fixed. Here's a quote of the commit message:

The perl build's attempt to escape spaces and such in LD_LIBRARY_PATH is _SAD_. It uses a sed expression that assumes you can escape - to use it as a literal (you can't, it has to be first or last char of the range), and assumes you have to escape delimiters in sed [] context (you don't), and/or that non-printf escapes become the literal character (they don't, the backslash is preserved as a literal), meaning it winds up doing "s/[\-\]//" which is a length 1 range, which is officially undefined behavior according to posix, and regcomp errors out.

But if we don't accept it (like other implementations do) the perl build breaks. So collapse [A-A] into just [A].

Testcase taken from perl 5.22.0 file Makefile.SH line 8.

<jazzhands>Perl!</jazzhands>

On the bright side, this unblocks the Linux From Scratch build, I think? It would be nice to ship with that again next release. If I ever get the darn toolchain building 4.4...


February 16, 2016

Still in Arkansas.

Nope, I plead the fifth, although the ability to look at text files as text files remains kinda important. (As long as they still accept text files and their markup doesn't stop them from working as textfiles, life is good. But if I ever have a patch bounced because my change to Documentation/blah.txt screws up magic markup, I will not be resubmitting it.)

Elliott contributed a nice file.c (not posix compliant, but posix can go hang on this one, its spec for file is uselessish; they removed cpio from the command list but not from file?). I'm doing a cleanup pass on it, and there is SO much bikeshedding on the list. (I declined bikeshedding on the kernel list, it followed me home.)

The main argument about file.c isn't about posix, but about whether or not it should match what the de-facto standard implementation does. The problem is, the de-facto standard is both inconsistent and crazy. (It's not a gnu program, so it could be way way worse, but still.)

Posix has a file standard which suffers from Posix failure mode #1: it doesn't include enough detail to be useful. (Posix failure mode #2 is not noticing the 1970's are over yet. They still maintain the standard itself in SCCS. No really. To this day. I don't know why.) So complying with posix here is reasonably easy and completely irrelevant, although Elliott's first version largely didn't bother.

I was thinking of producing mime type output, which at least has a standard, but most of the argument so far has been about describing ELF binaries, and if you do that it just says "application/x-executable" for EVERYTHING, which brings us back to the posix failure mode.


February 15, 2016

Upgrading Aboriginal Linux to the 4.4 kernel and the mips build is breaking, because they enabled VDSO, which requires linker features added in binutils 2.23. So they have a test, scripts/ld-version.sh, which dies with busybox awk saying the regex has an unbalanced ")" (apparently that awk does extended regular expressions and the script expects basic, or something?). Turning that into [?], the build then fails with a linker error.

Digging deeper, here's why:

$ ld --version
GNU ld (GNU Binutils for Ubuntu) 2.22
...
$ ld --version | scripts/ld-version.sh
22200000
$ mips-ld --version
GNU ld (GNU Binutils) 2.17.50.20070703
...
$ mips-ld --version | scripts/ld-version.sh
2029270300

Obviously, the ten digit number is larger than the eight digit number, so it assumes binutils 2.17 from July 2007 must be new enough. (That's using ubuntu 12.04's host awk there, can't blame busybox for that.)


February 14, 2016

Quiet day in a hotel with sick kitten and netbook.

The toybox test suite is in horrible shape, and my poking at various loose ends snagged some ratholes. Lots of commands don't have any tests, but the recent makefile update to list available commands makes that easier. (I tweaked it some more today: now there's "make list_working" to show working commands, "make list_pending" to show pending commands, and "make list" to show both sorted together.)

Most (but not all) "testing" macros start with the command name in the description field. I thought I'd remove the repetition and have the infrastructure do that, but it turns out not ALL of them do, so I made a giant evil sed invocation:

for i in tests/*.test
do
  X="$(echo $i | sed 's@tests/\(.*\)[.]test@\1@')"
  Y="$(grep "testing[ \t]" $i | sed "/testing [\"']$X/d" | \
    grep -v 'testing "name"' | grep -v "testing ['\"]['\"]")"
  [ ! -z "$Y" ] && echo $X && echo "$Y"
done

and have been slowly regularizing them so that I can use a variant of that to _remove_ them all at once.

This means I'm looking in various tests/command.test files, and finding some things we aren't testing well, such as the fact that according to posix, "rm -r dir" should handle infinite directory depth. Currently toybox rm is limited by open filehandles (1024 or 4096 depending on ulimit), which is longer than PATH_MAX but not infinite. I have a design worked out to make that work properly (close parentfd for non-symlink traversals, open ".." to reclaim it and if stat doesn't match drill down from last saved fd (root or last symlink) and if the saved stat information diverges from the current stat info, error out because somebody did a "mv" while we're traversing the tree and that's cheating).
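A sketch of the "reclaim the parent" step (hypothetical helper name; the real thing would live in the dirtree code):

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch: after closing parentfd to free up a filehandle, climb back up
   through ".." and check it's still the same directory. If the dev/inode
   pair changed, somebody did a "mv" mid-traversal, which is cheating. */
int reclaim_parent(int dirfd, struct stat *saved)
{
  struct stat st;
  int fd = openat(dirfd, "..", O_RDONLY|O_DIRECTORY);

  if (fd != -1 && !fstat(fd, &st) && st.st_dev == saved->st_dev
      && st.st_ino == saved->st_ino) return fd;

  if (fd != -1) close(fd);
  return -1;
}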

Alas, I haven't _implemented_ this yet. But I now have a test creating an 8192 entry directory chain and trying to rm -r it. (Fun thing: mkdir -p on the host fails with an argument greater than PATH_MAX, and thus "scripts/test.sh rm" fails to create the dir because single command tests use host commands for the rest of the script, but if you make a 2048 byte string of 1024 directory entries and loop doing a mkdir, cd, mkdir, you can drill down past the limit. In fact if you mkdir the paths in parallel and do "cd longpath; mv ~/nextbit .; cd longpath;" to assemble the chain, the OS would have to traverse the contents of each directory to _enforce_ path length limits. And even then you can mount a filesystem that's _already_ got one too long, which is why rm has to just cope with it.)


February 13, 2016

Spent the day driving to Nick's parents' house to return Moose, the sick kitten we've been taking care of since December. (Moose has a "liver shunt" which means he's on a special diet and requires medication before each feeding, which does not combine well with a house full of free-fed cats and Adverb.)

Nick's plans to move to Austin fell through, so I'm back here to deliver cat and help with apartment hunting in Little Rock. But hey, driver's license again...


February 12, 2016

Geoff Salmon hugely improved the Numato j-core bitstream, enabling d-cache and i-cache (there was room in an lx9!), switching over to the new ddr controller, and upping the clock speed from 32mhz to 50mhz. The result is _noticeably_ snappier, and I really really really need to get the VHDL repository conversion finished and posted.

But today, I'm fighting with the perl build in LFS 7.8. The package sequencing LFS gives is nuts, if you try to build chapter 5 _or_ chapter 6 starting from Aboriginal Linux's 7 package build, you hit fun things like coreutils needing automake which needs perl which needs who knows what.

The problem with LFS is chapter 5 has too many packages (because glibc is a giant pig and needs perl to build). Building chapter 5 assumes the moon, and building chapter 6 assumes all of chapter 5 (including perl).

So I'm back to coming up with my own sequence, where we reverse the umbilical from the lunar module to get the extra 4 amps necessary to build perl.

The perl build is weird. We explicitly tell it _not_ to use its own built-in forked copies of zlib and libbzip2, for reasons I'm not entirely clear on. (Making them prerequisites to perl because they're... trying to optimize... perl? Really? It's _perl_. Not even the one that knows swordfighting, one of the useless homeworld protocol droid ones C3P0 evolved into since the "long time ago" bit.)

Where was I?

Right: musl had a bug (#ifndef __cplusplus; then clearly we're C11) which was easy to work around. Then toybox sed had a bug that was hard to track down, but I eventually got it to "s/[A-A]//" is an error if a length 1 range [A-A] ever makes it to regcomp. (Why? Because posix says that's undefined behavior. Why it's undefined when it's clearly a synonym for [A] I have no idea.)

The path to figuring this out was long and elaborate because the regex was trying to escape spaces and such in $PWD, and they were doing so via backquotes so THING=`sed stuff` was setting the variable to an empty string, which became "LD_LIBRARY_PATH= ./perl" and with glibc that found libperl.so in the current directory (because posix says empty path segments are synonymous with . thus PATH="/blah::/thing" means check the current directory between /blah and /thing, and musl didn't implement that because it's a bad idea security-wise.)

But digging further, LD_LIBRARY_PATH _should_ have been /home/perl, the sed expression was failing when used with toybox sed, and the failure was silently ignored due to backquotes not propagating errors back to the caller. (You _just_ get the output to stdout; output to stderr goes to stderr, where you don't see it amongst the rest of the perl config messages.)

The problem is the regex was written by somebody who thinks "s/[_\-\/]//" makes _sense_. Instead it has 2 mistakes: 1) you can't escape - in [], if you want to match a literal - put it at the beginning or end of the list, 2) square bracket context trumps s/// context, the delimiter ending the range doesn't count in square brackets so you don't need to escape it. So regex was seeing \-\ which is the length 1 range I referred to earlier (and yes that means it _won't_ match - but the range is actually inverted so extra things get escaped and the shell doesn't care), and regcomp threw an error, meaning sed exited with an error message. Apparently the gnu/dammit sed is collapsing that range to "s/[_[\]/]//" before calling regcomp, so I should too.
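If you're curious which behavior your libc gives you, a quick probe (the C string escaping means regcomp sees the same [\-\] the perl build generates):

#include <regex.h>
#include <stdio.h>

/* regcomp() sees "[\-\]", a bracket range from backslash to backslash.
   Posix calls the length 1 range undefined, so some libcs accept it
   (collapsing it to one character) and others error out. */
int main(void)
{
  regex_t re;
  char msg[128];
  int rc = regcomp(&re, "[\\-\\]", 0);

  if (rc) regerror(rc, &re, msg, sizeof(msg));
  printf("regcomp says: %s\n", rc ? msg : "accepted");

  return 0;
}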

Gotta drilldown through the incidental breakage to find the actual problem needing to be fixed. That's perl for you.


February 10, 2016

Elliott pointed out that the updated grep doesn't work on Ubuntu 14.04 because sometime after Ubuntu 12.04, glibc acquired a bug where printf("%.*s", INT_MAX, string) only prints one character. Since I tested on Ubuntu 12.04 (which doesn't have the bug) and musl (which doesn't have the bug), and he tested on bionic (which doesn't have the bug), we didn't immediately notice.
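Here's a minimal reproduction of that bug:

#include <limits.h>
#include <stdio.h>

/* An affected glibc prints just "h" here; everything else prints "hello". */
int main(void)
{
  printf("%.*s\n", INT_MAX, "hello");

  return 0;
}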

This is a category of bug where I suppose I should stick in a workaround, but remember to yank it OUT again once I no longer care about glibc being broken (by whatever path that occurs).

Meanwhile, toybox now warns if it's building code out of pending. I should


February 9, 2016

Implemented ulimit, which was so much fun. The bash version SUCKS.

No really: bash's ulimit documented -b but didn't implement it, had a -x based on an RLIMIT_LOCKS feature the Linux kernel removed in 2003 (so it hasn't worked in 13 years), used 1024 byte units for -f when posix explicitly said 512, and then it used 512 byte units for -p which was displaying a hardwired value that changed in 2010 (linux commit 35f3d14dbbc5) so it's been wrong for over 5 years. Linux grew a very nice RLIMIT_RTTIME feature back in 2008 (linux commit 8808117ca571) that ulimit never bothered to hook up (I made it -R, it limits realtime tasks to X microseconds before they have to block or get a kill signal)...

And of course Linux grew a "prlimit" syscall ages ago (2.6.36 in 2010) but bash's ulimit doesn't use it. So I added a -P to specify the pid to ulimit on, and made it default to getppid().
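A sketch of the prlimit(2) call behind that (the -P flag is toybox's, the syscall shape is standard; glibc wants _GNU_SOURCE for the wrapper):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Sketch: prlimit(2) reads and/or sets limits on an arbitrary pid, so a
   ulimit command can target the parent shell via getppid() instead of
   only ever adjusting its own (about to exit) process. */
int main(void)
{
  struct rlimit rl;

  if (!prlimit(getppid(), RLIMIT_NOFILE, 0, &rl))
    printf("parent's open file limit: %llu\n", (unsigned long long)rl.rlim_cur);

  return 0;
}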


February 6, 2016

Kitten-related drama last night, almost drove to Arkansas because of it. Probably doing so this coming weekend instead.

I finally bothered to look up the makefile syntax to include a file but not mind if it's not there (it's "-include filename"), and made a generated/Singularity.mk file with targets for each command name ala "make ls" (each of which just calls "scripts/single.sh ls"), plus a "make list" to show them all, and a clean:: target adding them to "make clean". (It builds them in the top directory, but that's where toybox goes, so... The "make change" target puts them in a subdir already.)

I had to filter out the "help" and "install" commands, which already had other meanings, but my real problem is that the generated directory is blown away by "make clean", so if you do a "make defconfig; make clean; make ls" it doesn't know make ls.

Right now the .config file is at the top level directory because kconfig is an existing de-facto standard I'm borrowing (linux, busybox, uclibc, buildroot...) and there are like five variants of it that can be crapped into the top directory by that stuff (why does it make a .config.old? Your guess is as good as mine) but they're all hidden files. I could make this be another hidden file at the top level directory (ala .git) or I could teach make clean to leave one file in generated. Not sure which is uglier.


February 5, 2016

ELC call for presentations submission deadline is today, so here's the ideas I've got off the top of my head:

First I'm ruling out resubmissions of existing talks: the rise and fall of copyleft one, turtles all the way, toybox status update... I've done that. People haven't heard them, and there may not be good recordings of some, but I could do podcasts if I really wanted to. (The blog that nobody reads could grow a youtube channel nobody watches.)

(Heck, I could redo the prototype and the fan club, or repackage the three waves stuff as "Hobbyists, Employees, and Bureaucrats (oh my!)" updated to explain what Google's "Alphabet" project is actually for, why the kernel summit's "call for hobbyists" flopped, understanding why the Linux Foundation ending individual memberships was probably inevitable, maybe describe it as a synthesis of the mythical man-month and the innovator's dilemma, drawing heavily from the computer history book "accidental empires"... But convincing people that the talk is worth listening to without GIVING the talk is a chicken and egg problem, and I'm not up for it.)

So here's a couple talks I could reasonably summarize at the last minute:

Android on the Workstation - From Aboriginal Linux to AOSP, the hairball and selfhost sections, and the fact that independent iOS devs beat Android to self-hosting, and that recent android on the desktop article wasn't a Google initiative, just somebody trying it.

Portability in 2016 (I.E. "Ze autoconf, it does nothing!"). Hmmm, I should note that bell labs' original unix was a reaction against multics, and linux was a reaction against minix, and have a section on glib being terrible...

Beyond that I did three variants of jcore stuff, mostly in hopes of getting Jeff Dionne to attend. An Open Hardware BOF, a jcore processor design walkthrough (which I'd need Jeff or Geoff to do), and a redone Turtles talk with Rich going into reviving the toolchain and the kernel and nommu userspace and so on. (Still starting with how great it is that patents expire, but this time talking about what we've DONE, not what we plan to do. Using the build to emit specsheets, GHDL simulator, FPGA bitstreams, and ASIC masks all from the same BSD licensed VHDL source on github. SMP, DSP, EIEIO.)

I'd also love to have arranged a libc round table with Elliott Hughes (Bionic maintainer) and Rich Felker (musl-libc and linux superh maintainer), plus maybe the buildroot and openembedded maintainers. Alas, they moved the darn conference from San Jose (where Elliott lives) to San Diego (a 7 hour drive away), so no.


February 4, 2016

My local copy of netcat hasn't compiled in a while because it's halfway converted to use xconnect() out of lib/net.c. The reason I stopped halfway is this is a hard problem, because the ipv4 to ipv6 transition was handled TERRIBLY.

Why didn't they make the ipv6 stuff handle IPv4? If an ipv6 address with all but the bottom 32 bits zeroed was an ipv4 address, userspace could switch exclusively to the ipv6 api and not have to care. But no, 20 years into this "transition" every userspace program that wants to support ipv6 without _dropping_ support for ipv4 (I.E. the entire english speaking internet) still needs duplicate codepaths.

The network stack always used variants of an "addr" structure, which has a bunch of flags specifying what type it is, and then different structures depending on what those flags (in the first few fields, common between different structures) say.

Did I mention that the Berkeley guys, who gave us this network stack, also invented "vi"? Yeah. Wanna know why BSD hasn't taken over the world? There's a hint. (Not that AT&T's "streams" was any better. Neither group that inherited Bell Labs' work had a worthy successor until Linux, and I explicitly include the minix and gnu projects in that. The original Bell Labs Unix combined pragmatism with taste in a way that's hard to do, let alone sustain. They were reacting _against_ Multics in the same way Torvalds was reacting against Minix and Gnu.)

In terms of a server, we've got socket(), bind(), connect(), and listen(). For name resolution, getaddrinfo() replaces gethostbyname() and getnameinfo() replaces gethostbyaddr().
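The way to avoid duplicate codepaths in userspace is to ask getaddrinfo() for both families and loop over whatever comes back; a sketch of the xconnect() shape (not the actual lib/net.c code):

#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch: AF_UNSPEC returns ipv4 and ipv6 candidates in one list, so a
   single loop tries each until one connects. */
int xconnect_sketch(char *host, char *port)
{
  struct addrinfo hints = {0}, *ai, *p;
  int fd = -1;

  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  if (getaddrinfo(host, port, &hints, &ai)) return -1;
  for (p = ai; p; p = p->ai_next) {
    fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
    if (fd != -1 && !connect(fd, p->ai_addr, p->ai_addrlen)) break;
    if (fd != -1) close(fd);
    fd = -1;
  }
  freeaddrinfo(ai);

  return fd;
}
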

February 1, 2016

Birthday lunch at Dead Lobster with Fade and Fuzzy, and then when I got home Fuzzy had a three layer parfait style birthday cake ready, with a layer of hard chocolate on top.

Putting together a Linux From Scratch 7.8 build control image to test the pending toybox release. There is sooo much subtle breakage in this thing. The groups file it says to create has a systemd user in it (for the non-systemd lfs). The glibc build says the test suite is critical and not to be skipped under any circumstances, but it's also expected to fail in multiple ways. The pkg-config source bundles its own copy of the glib source, but when autoconf detects that glib isn't installed on the host it DOESN'T USE IT, instead telling you you can add --oh-get-on-with-it to the configure command line to tell it to do the obvious thing it refuses to do on its own for no apparent reason.

I asked about that last one on twitter and was told A) openbsd wrote its own in perl (not an improvement), B) there's a fork that removes glib from the thing, but LFS doesn't use it.

Sigh.

There's an attr package which won't work with gettext-stub because it needs some command line thing (misericorde or some such) installed by gettext. So I built gettext, which took longer to build than glibc, and then it broke because of the __BEGIN_DECLS thing, so I shoved the appropriate #defines into musl's features.h (since my build script was already adding a #define __MUSL__ to that, yes I know the _right_ fix is to add a sys/cdefs.h) and rebuilding toolchain...

I suspect I can cut a toybox release at this point though. Nothing's broken on current toybox yet, and if something does break I can throw a patch in the next aboriginal release when I check the fix into toybox git.


January 31, 2016

Got email from Flourish (conference I spoke at in Chicago in 2010), which is scaling back up and wants me to speak there again. I told 'em sure if they can pay to fly me there. (Eh, maybe work would be willing to sponsor a trip if they can't?)

Also got email this morning from somebody asking me to finish the Aboriginal Linux "history" page. So many todo items...

I did hear back from the O'Reilly conference: all three of my talk proposals were turned down, although one they were interested in promoting through "other channels" (whatever that means). (A while back I got an email asking if I could do the jcore tutorial in a smaller timeslot, and I said yes, but they turned it down anyway. *shrug* Not my normal stomping grounds, only applied because they're in Austin this year. Not going if I have to pay my way in.)

Still no word on a possible next Japan trip.


January 30, 2016

Now that I'm temporarily freed from the tyranny of the todo list (toybox is in release freeze, just testing and bugfixing checked in at the moment), of course I have a giant surge of things I want to do to it.

It occurs to me that my struggles with the hash table code are because I'm trying to make it too generic. What I specifically need is an inode hash, so tar and gene2fs and mkisofs and rsync and such can detect hardlinks. (In theory I can make hardlinks in a fatfs too. If it's read only. Technically that's filesystem corruption in that context, though. :)

This means I don't have to handle the delete case, and am doing fixed size "long->pointer" associations. (I'm tempted to say "if you try to deal with a filesystem with more than 4 billion inodes on a 32 bit system, get a 64 bit system." As long as the failure mode is just not detecting hardlinks, I'm ok with this.)

So what I can do is have a 4k block of struct {long inode; void *data}; (on 64 bits that's 16 bytes, so ~256 entries per 4k block), do insertion sort into that, and then when one fills up split it into two child blocks of 128 entries each... except how to balance the sucker so it doesn't degrade to a linked list? Gets us back into tree territory, but what I want to do here is avoid having 100,000 individual tiny allocations in the tree, each with its own pair of child pointers. Batching data this small makes sense, but the existing tree descriptions aren't talking about how to balance trees of _ranges_. (Probably haven't googled the right keyword yet, this isn't a new problem. Shouldn't be that hard to work out from first principles either, but the question with each algorithm is "what are the pathological cases".)

My normal approach of doing something simple and waiting for the real world to overwhelm it doesn't apply here because tarring up a terabyte filesystem is a common-ish case these days. Except... the filesystem tells me the link count, so I only have to remember the ones with a hardlink count >1. So insertion sort in a realloc()ed array wouldn't actually be a huge deal. Hmmm...
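A sketch of that fallback (names are mine, error handling elided):

#include <stdlib.h>
#include <string.h>

struct ino_map {long ino; void *data;};

/* Sketch: sorted growable array of inode->data pairs, only fed inodes
   with link count >1. Returns the stored data if we've seen this inode
   before, 0 if this call just added it. */
void *seen_inode(struct ino_map **map, long *len, long ino, void *data)
{
  struct ino_map *m = *map;
  long bottom = 0, top = *len, i;

  while (bottom < top) {
    i = (bottom+top)/2;
    if (m[i].ino == ino) return m[i].data;
    if (m[i].ino < ino) bottom = i+1;
    else top = i;
  }
  if (!(*len&255)) *map = m = realloc(m, (*len+256)*sizeof(*m));
  memmove(m+bottom+1, m+bottom, ((*len)++ - bottom)*sizeof(*m));
  m[bottom].ino = ino;
  m[bottom].data = data;

  return 0;
}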


January 28, 2016

Tidying up pgrep/pkill for the release, implementing -o and -n there, and I noticed pgrep -f is only checking the command line that fit in toybuf (ballpark 2-3k after the slot table and other strings loaded), and /proc/$PID/cmdline can be essentially unlimited length now.

Hmmm, when we do an ANSI size probe, scankey doesn't return an indication that it received it. (It saves the new size, but doesn't return a notification that we need to adjust.) Ah, we should send sigwinch to ourselves, and install a handler for it.

Redoing the Linux From Scratch build. It's been long enough that this is basically "start over from scratch", but I already did it before (and did BLFS once for qualcomm, although I didn't get to keep that code because you couldn't open source anything you write on the clock there).

That probably _is_ holding up a toybox release because my checklist includes running the new toybox through the full LFS build, and that hasn't worked since the switch to musl, so...


January 27, 2016

Got an email from the guy who owns the toybox.net domain asking for half his previous price and saying he's contacted all the other people who showed interest earlier, doing an "act now, supplies are running out" thing. Um... no? (It would have been nice to have, but he said no and I went on to other things.)

Finally got top to a point I'm happy to include it in a release. The headers are outputting the process and CPU and memory info. It's still using more cpu time than the other top (2% vs 1%, about 50 milliseconds per redraw on this 5 year old netbook that was ultra-low-end when I got it). I suspect a lot of that is glibc's utf8 fontmetrics code being APPALLINGLY slow, possibly I should hardwire the utf8->wide char stuff into a lib function. (Meh, shouldn't suck in musl or bionic, I think? Haven't benched it there yet.)

Darn it, benched musl. It's 3% there.

Right, it's PROBABLY the sort stuff that does string compares of otherwise numeric fields in the fallback sort. I could annotate those better to indicate which ones actually need it. (It could also be that we load all the data and then do generic processing of it, so it could be thrashing more data through the CPU cache.)

Eh, not holding up the release for this though.


January 26, 2016

Ah, the kitten _did_ break something when he pushed my netbook off the counter onto tile. The case for the nice new battery is cracked and a piece broke off on one end, revealing a flimsy looking electrical contact and the end of what looks like a fairly standard A battery. (Well they said it was a 9 cell...)

This is what duct tape is for. (Although Fade suggested electrical tape instead, and that seems to work quite well.)

Actually trying to implement the header stuff was painful enough I ripped the parsing out and just hardwired the fields in question. I had to read a new file to get the memory info anyway, and the "number of different values" vs "sum of values" vs "count of entries" vs "highest" vs "lowest" vs "number with this specific value"... it was implementing custom code for the majority of outputs anyway.

I did make the iotop header output the summable fields in the order they appear in -o, so if you feed it a different -k that should adapt.


January 24, 2016

The last bit of "top" I'd like to get in before a release is the header stuff that shows total memory and running processes and all that. In theory, %PID style escapes can use all the existing ps -o FIELDs, but that's per-process data which needs to be collated, and there's more than one way to collate it, plus sometimes it needs to be filtered, or even outright replaced.

Let's look at each header line in ubuntu's top:

top - 13:52:01 up 2 days, 23:15, 1 user, load average: 0.23, 0.42, 0.76

Uptime isn't an existing per-process field, it would be a new escape to show a global field.

The user count could be %USER counting number of unique entries. The best way I currently have to calculate unique entries is to sort by this field and then iterate counting transitions, so this is re-sorting the proc data list every escape, although I'm told qsort() got tweaked so feeding it an already-sorted array is no longer its pathological n^2 worst case. It feels expensive, but the common case is what, 300 processes? Hmmm...
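The sort-and-count-transitions approach looks something like this (a sketch, names are mine):

#include <stdlib.h>
#include <string.h>

static int cmpstr(const void *a, const void *b)
{
  return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Sketch: sort the field values, then every place adjacent entries differ
   is one more unique value. */
long count_unique(char **vals, long count)
{
  long i, unique = !!count;

  qsort(vals, count, sizeof(*vals), cmpstr);
  for (i = 1; i < count; i++) if (strcmp(vals[i-1], vals[i])) unique++;

  return unique;
}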

At some point I need to add something hash-like to at least track seen entries for hardlinks, and doing an insertion sort into an array is N^2. I'm tempted to do a self-balancing tree, but trees have data and two child node pointers so you're tripling the size of the data storage even without each one being a separate allocation...

(Even though I wrote the red/black tree documentation in the kernel Documentation directory, I'd still have to look up or work out how to actually implement one. The Linux Foundation guys asked me to write a red-black tree explainer for linux/Documentation when I applied for the documentation fellowship back in 2007. I complained that there was already a perfectly good lwn.net article on it, and a Wikipedia[citation needed] entry, but they said Documentation should have one, so I read those two and wrote one, citing both of them. But both handwaved how to do the actual node balancing decision, so I referred to them. The big problem with that entire position is they didn't have a clue what they wanted me to actually _do_; stating a problem is not the same as coming up with a plan to solve it.)

Really my problem is I don't want to go off on ANOTHER tangent implementing infrastructure right now. And for the hardlinks problem I can do insertion sort into an array for now, which is efficient up through about 4k block size, then maybe do something that splits nodes and makes a tree out of them later to scale past that point.

As for the rest of the line, I honestly don't care about load average (it's trivial to fetch out of /proc/loadavg but I've personally never made any sense of those numbers, wait for somebody to complain). I note that _all_ busybox shows of this line is loadavg.

Tasks: 288 total, 2 running, 285 sleeping, 0 stopped, 1 zombie

%PID could also use number of unique entries for the field, so same code as %USER, but in the PID case we know they're all unique so that's just an expensive way of counting total number of entries. So... special case would save CPU time at the expense of codesize, but it's a small one, PID is the first field and all we have to do is skip the sort, so if (which) qsort().

Except... if we do "-u root" filtering, do we show the number of MATCHING processes or the number of TOTAL processes? And while we're at it, we don't have to count %PID at all because TT.kcount has it, although that's still matching not total.

Showing "running processes" is %S=R, or really %S=R= so the parser can tell where the match ends. (That's a string compare, but eh.) Sleeping is %S=S=, stopped is %S=T=, Zombie is %S=Z=... easy enough.

Cpu(s): 38.3%us, 7.4%sy, 1.5%ni, 50.9%id, 1.7%wa, 0.0%hi, 0.2%si, 0.0%st

Three problems here. The first is "the htop problem", ala how many CPUs worth of data are we showing? If we aggregate processes we also aggregate CPUs, although "400% user" on a quad processor system is 100% usage of all 4 processors so do we divide by processors or not? Maybe Cpu(4): 400.0%us? Except a 32-way machine expands a length 5 field into a length 6 field if we do that...

Second: I'm not currently tracking/displaying these individually, just cumulatively. So I have to add more typos[] fields and possibly more slot[] entries.

Third: the easy way to program this is to iterate over the list for each field. The efficient way to program this with regards to cache fetch behavior is to create an array of display fields and populate them all in one pass. I can't do that for SORTED fields (where I care about unique entries), but when adding up totals it makes sense. Then again, "when in doubt use brute force" says to do the simple to program version and wait for somebody to complain. (Either way counting is way faster than sorting, and doing an optimization only for the already-faster cases seems sillyish?)

Mem: 7899148k total, 7770796k used, 128352k free, 281016k buffers
Swap: 3909628k total, 286084k used, 3623544k free, 2338096k cached

The last two lines are basically fetched out of /proc/meminfo. As with loadavg, this really doesn't have anything to do with existing ps info, it's not per-process it's global (per-container).

These are padded to match up, so I genericized the printf format scanning stuff out of seq -f and did "%9.10KMEM" that can turn into %9.10d, which handles the padding but not the fields. I have human_readable(), but this is forced to kilobyte output (ala iotop -k but per-field), which raises the fun case of terabyte memory systems needing more columns so if this was hardwired output I'd just measure the "total" and use that length for the rest of the fields on the line, but that's not the approach I've been taking here...

(Other fun thing: how to describe this mess in the help text! Pretty much provide the default header format as an example at the end?)


January 23, 2016

I've mentioned that Linux on the Desktop ain't happening, and included the material in several talks. The most recent update on this front is that, something like a decade into attempting to upgrade our infrastructure, an attempt to redo x11 to use 3D graphics is as likely to work with a random application as linux is to install on a random laptop purchased without researching it first.

(Did I mention that Stu gave back the new netbook I got last year (the one I stuck the SSD in) because he couldn't get linux to work with it either? It installed; touchpad won't work and it doesn't suspend right, don't remember what else was wrong with it.)

Speaking of conference talks, I haven't heard back from the O'Reilly folks, which I assume means all four of my talk proposals were turned down. (Meh, never spoke there before. Only submitted because they're in Austin this year. Given I've never been to SxSW and didn't visit the recent Worldcon an hour away in San Antonio...)

The ELC deadline was pushed back to February 5th, I should submit something to them soonish. Not sure what. And there's a call for tracks at plumber's that I waved at Jeff about doing an Open Hardware thing and he thought that was a good idea. Not sure who should run it though.


January 22, 2016

I looked at adding const to toy_list[] to force it into the read-only data segment (nice on nommu systems with no copy on write), but this gave a screen or so of "expected pointer and got exact same pointer with const on it!" warnings, so I reverted it. I'm willing to put const in two places to change a data type, I refuse to clean up after it with dozens of (void *) typecasts to whap the compiler on the nose until it stops begging at the table. (Hint: string constants are in the read only data segment. If you try to modify one it segfaults. That's the behavior I want. Static checkers generate false positives, annotating every single false positive with "no it's ok" is busy work.)

And yes, typecasting to (void *) is the correct way to signal "the compiler's pointer type-checking is wrong here, and I am explicitly disabling it". Typecasting to some _other_ type implies it actually means something.

In theory there's a less-portable (attribute)(_thingy_) that could force it into the read-only segment without the type pollution. Seems silly, though.

Digging through a zillion bug reports on the toybox and aboriginal mailing lists. Elliott says running mount as a normal user (without the suid bit set) causes a segfault in android's build, because error_msg() tries to print the command name and gets a null pointer dereference. Except when I try "./toybox mount" it exits immediately because toy_exec() is returning to force a re-exec to reacquire suid after an earlier command dropped it. But that's not what's happening here: the "earlier command" is STAYROOT but never got root to begin with, and since it's not the first command this check comes before the toy_init() check due to reentering through toy_exec(). Hmmm...

I want debug output for a misconfiguration, at least when CFG_TOYBOX_DEBUG is enabled. I think to get that I have to record when the process had root permissions but dropped them; that seems to be the only way we know when to get them back.

Well, I _could_ stat /proc/self/exe and see if it has the suid bit. Not that /proc is always mounted in a chroot. I was tempted to create a function that just returns a mode so you don't have to declare a stat struct each time, but given that there's stat, lstat, fstat, fstatat(), and xstat()... not worth creating that many functions. Plus you test if (!stat) but 0 is a valid mode so the test would have to be if (-1 != getmode("file")) and... eh, not quite worth it?

Hmmm, draw_str() needs to know the current x position to pad tabs out right. Well, it doesn't when it's escaping them. :)

But the more and vi logic needs to know the current position to handle tabs.


January 21, 2016

And lo, I hath waged battle with the DMV and emerged victorious! (Ok, spent 10 minutes with a seriously cute driving instructor proving that I can still parallel park. No euphemisms were harmed in the making of that sentence.) My new driver's license photo is immensely silly, at least the black and white version they gave me on a piece of paper. (New card mailing in 2 weeks.)

Did a big cleanup pass on ps/top/iotop/pgrep/pkill removing the last batch of pervasive magic constants. (I introduced an enum. I am not fond of enums, but sometimes you just gotta enum. Reinventing it with a stripped down version of TAGGED_ARRAY was the coward's way out, it was alas the proper tool for the job.)

This is the largest chunk of the work necessary to factor the shared ps logic out into lib/ but not quite all of it. For example, TT.bits is a bitfield of seen types set in parse_ko, and get_ps() uses that to figure out which files it needs to open. (Always /proc/$PID/stat, the rest are negotiable.) Then again, if that TAGGED_ARRAY moves to lib as well... I need to figure out how the translation list should work. (ps_main() has a bunch of "-o NICE" is actually "-o NI" translations; possibly those should just become full-fledged typos[] entries.)

And I still have to figure out what to do about the headers. I want to let top and iotop use the same -o and -k as ps to define what output fields to show, but making the headers similarly configurable is nonobvious. I could use some kind of %FIELD escape except the header fields currently being shown are totals for all processes... Hmmm, I suppose the header display logic could traverse the array and add up the appropriate data. Well, for the numeric fields anyway. For the string fields it gets less obvious; something like "user" becomes number of different entries. I may need a second layer of annotation on the slot indexes.

Right now it does | 64 to indicate string fields to ksort(). Are there any such types where the correct response ISN'T count how many different types we see? Hmmm... doesn't look like it. There's a few (like "F" and "S") where sticking them in the header doesn't make any sense.

As for adding up the numeric ones... for memory usage add it up, for STIME you want oldest, CPU and PGID and UID and so on are number of different entries (like string), somebody probably wants average...

Maybe instead of %CPU I could have $CPU and #CPU and @CPU for different display types? Hmmm...


January 20, 2016

Updated our health insurance today. Went back to the place with the "Obamacare" sign out front and downshifted from a "gold plan" to a "pyrite" because it had gone up over $100/month for slightly worse coverage, as of January 1st.

I am really rooting for Bernie to move the overton window towards single payer. Yes, he's talking about a ridiculous idea, and his job is to keep bringing it up until the joke wears off. Then it's no longer a ridiculous idea. Alas, this is basically what Trump and the mad hatter's tea party crowd are doing in the other direction with a new crusade against muslims. We should have somebody pushing back the other way. Hillary triangulates to whatever the center currently is, and with the tea partiers moving the far right fringe out into indian country, the "split the difference" center moves further right every year and she happily trundles along with them. That's not helpful.

We need to deploy the GOOD kind of crazy, the so crazy it just might work kind, to counterbalance. I'm aware Krugman is running the numbers and tsk-tsk-ing at Bernie, which makes me think of Han Solo's "Never tell me the odds." JFK's moon shot wasn't proposing something rational, FDR's new deal wasn't a careful measured response to anything. Obama mothballed and outsourced Nasa, I'm not excited about following him up with a lady who wants to put Snowden in Guantanamo, and whose campaign premise is "inevitability" just like it was last time when she lost to Obama. (All the journalists using "Titanic" to describe her campaign know exactly what they're doing.)

Yes I want a woman to be president, but Elizabeth Warren isn't running. Sarah Palin is also female, not interested in voting for her either.

I got top checked in! Working on polishing stuff now. The way ubuntu's top selects which fields to display is the "f" and "o" keys, which pull up modal selection dialogs with a list of fields and a letter key assigned to each. While I could do that, I'd prefer to have the same -o and -k options that ps takes available to top. (Why do it _differently_ in different commands?)

This would mean you'd have to exit and re-run top in order to change the display, but I think I'm ok with that. The question is, what would everyone else think?

I should ask on the list....


January 19, 2016

Started on the "top" command. In theory fishing things out of pending and plugging them into defconfig is straightforward, even for a complete rewrite of the command (as in this case, using the "ps" infrastructure and sharing most of its code with iotop). In practice, Android is using several commands out of pending, including pgrep, pkill, and top, which are three that share infrastructure with ps.

What I really need to do at some point is break the shared infrastructure out of ps and move it to lib/procps.c, but that's cleanup after I implement all the things using the shared infrastructure.

I need to update the www/code.html page too. And possibly break it up into www/code/args.html, www/code/linestack.html, www/code/main.html, and so on.

Ah-hah! The reason for the netbook battery being loose is the little locking latch hadn't quite engaged. Push it in REALLY HARD and it makes an alarming click that means either "you just broke something" or "it's latched". (I noticed this when it fell out, which did not seem a normal thing to happen to laptop batteries.)


January 18, 2016

Visited my new doctor to have her look at my wrist, which is currently exhibiting BOTH failure modes. (The "mosquito bite doing runaway swelling/scarring until I hit it with steroid cream", and "that tiny discolored bit under my wristwatch which I put steroid cream on until it grew to cover my entire wrist and the medispring lady shook her head at me for putting steroid cream on a fungal infection, since that suppresses the immune system". Yeah, pilot error there.)

This time I got a cream with both a fungicide and a steroid in it. (Ok, oversimplification: I had some of this stuff before but right after the tube ran out I got two mosquito bites right next to each other on the troubled wrist, during one of Austin's gaps that make our 6 weeks of annual winter non-consecutive, and each developed a different problem.)

Remind me to implement an rss filter for the "<span id=health>" and "<span id=programming>" tags, so you can select which content you care about. I guess this paragraph goes under programming.

Speaking of which, I got ps and iotop doing proper UTF8 fontmetrics with all three types of escapes. In the process I found a bash bug where putting three DIFFERENT kinds of invalid UTF8 sequences into bash's command line history, one right after the other, made cursor up/down measure wrong and advance two places to the right each time. (You could call it an xfce terminal bug, but bash can detect/escape these and toysh needs to, so: bash bug.)

Did you know the ansi escape "reverse" (esc [ 7 m) is _modal_? I thought that since its job is to swap foreground and background color, you'd just do it _again_ to switch it back, but no... (Why not? Because that would be too obvious.) Instead you need a 27m to switch it back off (or 0m which is what I was doing everywhere else, but since I dunno what foreground/background colors you're currently using I want to preserve that state, so...)
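To demonstrate (standard ANSI/ECMA-48 sequences):

#include <stdio.h>

/* 7m switches reverse video on, 27m switches it (and only it) back off,
   leaving the current foreground/background colors alone. 0m would also
   turn it off, but resets ALL attributes. */
int main(void)
{
  printf("normal \033[7mreversed\033[27m normal again\n");

  return 0;
}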

The people I want to smack for being stupid here presumably implemented this crap in the 1970's, and are probably dead now. (Intel extended their first processor to 8 bits for a "glass tty" contract, which lets us know _when_ these suckers were new and shiny and competitive: when the 8008 came out. Let's see, I learned that from this interview with the design engineer who created the 4004.) (I also have interviews with the 4004's layout engineer, who felt he didn't get enough credit and left Intel to form Zilog and create the Z80 (Zilog was the 8-bit era's AMD, making compatible clones of Intel's chips, the Z80 ran 8080 programs). And I have an interview with the customer Intel created the 4004 for. And the above design engineer's manager was none other than the Moore's Law Guy.)

Where was I? Oh right, the Ted Hoff interview says the Computer Terminals Corporation contract that led to the 8008 (the 8080 was a process refresh of the 8008 with some improvements stuck in) was negotiated in December 1969. So yeah, 1970's.

(That was "<span id=comphist>".)

Anyway, toybox: I should probably make ls use this too. (If it's doing --color output, it can escape invalid UTF8 sequences.) And I can finally do a proper "top" implementation, and this was a big blocker for vi, and for shell command editing/history.

What else needs this? Easy way to check: what's querying terminal_size()? Don't care about anything in pending yet, got hexedit, ls, and ps... vmstat just outputs numbers... sed? Why is sed querying... because the "l" command wraps output to terminal width using backslashes. Yeah, that needs... huh. Should that let invalid utf8 sequences through, or escape them?

I _think_ it should escape them, on the theory it's already abusing the output (escaping \a and \v and so on). So yeah, need to update sed for utf8 fontmetrics. Wheee!

Doing it _right_ is often nonobvious. I feel sorry for the guy who did "boxes", but _this_ was always going to be the hard part. Changing all the places in the code that need each change, and dealing with the ramifications of all the edge cases.

I reeeeeally need to do a test suite fill-out pass. But when I start that I probably won't get any new functionality implemented for months, so that's a near-1.0 thing. Pending first...

Oh, and I got pgrep and pkill implemented (as long as I was poking at ps, pgrep and pkill use the same infrastructure. Well done _right_ they do, all that -u filtering and such). And I made it so that pkill DOES NOT MATCH ITSELF, and made pgrep do that as well. I'm not sure if this is the right behavior, since the pgrep man page doesn't mention this and posix doesn't mention pgrep, but when testing "pkill -fl stop renderer" and having pkill suspend itself before it had hit half the chromium tabs... Yeah, that's a thing it should not do.


January 17, 2016

Oh bravo Google. My netbook battery jostled in the bag, and when I restarted chromium and hit "restore" there was no net connection. I just connected to HEB wifi, and it tried to reload all the tabs it couldn't get a net connection for before (several hundred of them). Every one redirected to the HEB login page, which does not store the original destination.

I have been annoyed by chromium's reload behavior from day one, I normally "pkill -f renderer" right after a reload to make it stop (main reason for not using firefox: I can't kill individual cpu-hog or memory-hog tabs findable with "top"), although a blanket kill like that discards the cached ones too (which is inconvenient but beats the alternative). I forgot this time, and chromium was epically stupid because Google Knows What's Best For You And Won't Let The Behavior Be Disabled (tm). (Just like gmail's filtering!).

The most STUPID part of chromium's bad design is that the "back" button doesn't remember the location past the redirect. Why not? Because Google knows what you REALLY want, and it's "to lose your place".

Of course when "reload everything" does work, it swap thrashes the machine to death (way too much active memory at once) and uses up the 10 monthly new york times logins reloading pages I already loaded (mostly from twitter links, which doesn't count against my total) and so on. Really, it has LOTS of bad effects. But losing what the tabs POINTED to (todo list items mainly) is the most annoying part.

I suppose it's another argument for https everywhere, those turn into a "your connection is being hijacked!" page that retains the original URL (which is all I want). Defanging stupid chrome browser breakage.


January 16, 2016

I really like the new netbook battery except that it sometimes jostles loose while the netbook is banged around in the bag during suspend (not a problem I've seen when it's out and in use, seems to be because it's physically large and awkward and catches in the bag a lot), and since Acer doesn't seem to put a capacitor in the thing even a short unplug causes it to power down, forcing a reboot and closing all my tabs. (Then again, multiple hours of use vs 45 minutes... I'm getting used to it.)

So, toybox utf8 support. The "standard" unprintable character escaping is for low ascii characters (0-31) to be "^X", if mbrtowc() fails (invalid utf8 sequence) show "<XX>" with the byte in hexadecimal, and if it converts but isn't iswprint() (generally noticed because wcwidth() is negative), print "U+XXXX".

This isn't the only escape regime; hexedit uses its own and less generally does different stuff. That said, I'm leaning towards coercing as many as possible to that because it covers all the cases and is reasonably generic.

One hiccup of distinguishing the three types is that my callback to actually print characters takes a wchar_t, and for an invalid utf8 sequence I haven't _got_ a wchar_t. That said, I can take the literal value (first byte thereof, anyway; an invalid sequence doesn't say how LONG the invalid sequence is). Luckily, wchar_t is an int (on both gcc and clang; "cc -E -dM - < /dev/null | grep WCHAR_TYPE" says so), so I can feed in negative values.
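
To make that concrete, here's a toy standalone version of the escape regime (the crunch_one() name and structure are mine, not the actual toybox code; assumes a utf8 locale):

#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

// print an escaped version of the next character, return bytes consumed
int crunch_one(char *s, size_t len)
{
  mbstate_t st;
  wchar_t wc;
  int bytes;

  memset(&st, 0, sizeof(st));
  bytes = mbrtowc(&wc, s, len, &st);
  if (bytes < 0) {
    // invalid sequence: we don't know how long it is, so eat one byte
    printf("<%02X>", *s & 255);
    return 1;
  }
  if (!bytes) bytes = 1; // mbrtowc() returns 0 for NUL
  if (wc < 32) printf("^%c", '@'+wc); // low ascii control character
  else if (!iswprint(wc)) printf("U+%04X", (unsigned)wc); // unprintable
  else printf("%.*s", bytes, s); // printable: pass through untouched
  return bytes;
}

int main(void)
{
  char *s = "a\tb\xc0\x80z";
  size_t len = strlen(s);

  setlocale(LC_ALL, "");
  while (len) {
    int n = crunch_one(s, len);

    s += n;
    len -= n;
  }
  printf("\n");
  return 0;
}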


January 14, 2016

New netbook battery arrived last night, upgrading from a 3 cell to a 9 cell. My netbook is now wearing high heels, although the screen not opening more than 90 degrees is far more of a distraction. (I could probably file some plastic off the back edge of the screen to fix that, but let's see how big a deal it is in practice.)

I bisected binutils git last night (shudder) to find where the arm linker behavior changed, and the commit added a test to a chunk of code that my version doesn't even have, so not necessarily hugely useful. (Technically that was Red Hat code, not FSF code, but I feel dirty nonetheless.)

Circling back to find.c, I left myself a cryptic note I don't immediately understand. This sort of thing happens a lot. (Mostly I was sticking english into the code to force a build break where I left off, so I didn't forget to finish fixing a problem I don't have a test for in the test suite yet.)

The actual bug I was tracking down is the "find -execdir +" behavior getting the sequencing wrong so that each new directory it descends into _starts_ with the name of the directory. It should add the directory name to the parent's list _then_ descend to the new directory with a new list.

Given that, the comment makes more sense: move the directory "push" down after the append. Except there's conditional logic in there: the -execdir may not always be enabled, but we still descend/traverse, and the push/pop behavior has to match to avoid memory leaks.
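
In other words (hypothetical structures, my illustration rather than the actual find.c/dirtree code), the ordering should be:

#include <stdlib.h>
#include <string.h>

struct dir_list {
  struct dir_list *prev; // the enclosing directory's pending list
  int count;
  char **names;
};

// append a name to a directory's pending -execdir list
void add_name(struct dir_list *dl, char *name)
{
  dl->names = realloc(dl->names, (dl->count+1)*sizeof(char *));
  dl->names[dl->count++] = strdup(name);
}

// descending into a subdirectory: the PARENT's list gets the entry
// first, THEN we push a fresh empty list for the new directory
struct dir_list *push_dir(struct dir_list *stack, char *name)
{
  struct dir_list *new = calloc(1, sizeof(struct dir_list));

  add_name(stack, name);
  new->prev = stack;
  return new;
}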


January 13, 2016

Since Linux 4.3 was such an easy upgrade, I thought I'd try a quick 4.4 build to see how it went. Arm and Mips both broke.

The ARM problem is funky, I bisected it to a commit that's basically doing this:

armv5l-ar rcs virt/lib/built-in.o
armv5l-ld -EL -r -o virt/built-in.o virt/lib/built-in.o

Then when virt/built-in.o is linked into vmlinux, it's got zero FLAGS so it's EABI version 0, and the vmlinux is EABI version 4. The problem is when the linker reads an empty *.a archive to produce an empty *.o file, it has no ELF objects to copy values from, so initializes the ELF header fields to zero. And then the next link catches a type conflict and breaks.

It isn't just my 2.17 toolchain doing this: I have Code Sourcery's arm2009q1 toolchain lying around (2.19.51) and it's zeroing the flags in the .o file too... ah, but the final link isn't failing. So that answers the "how did they not notice obvious breakage" question; it affects 2.17 but 2.19 has a workaround.
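
If you want to eyeball a toolchain's output yourself, "readelf -h file.o" shows the flags field, or a few lines of C read it directly (quick hack, no ELF validation; EF_ARM_EABIMASK is the top byte of e_flags):

#include <elf.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  Elf32_Ehdr hdr;
  FILE *fp = argc>1 ? fopen(argv[1], "rb") : 0;

  if (!fp || fread(&hdr, sizeof(hdr), 1, fp) != 1) return 1;
  printf("e_flags=%08x EABI version %d\n", hdr.e_flags,
    (int)((hdr.e_flags & EF_ARM_EABIMASK) >> 24));
  return 0;
}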

FYI you can build Aboriginal Linux's kernel version out of tree ala:

cd ~/linux
git clean -fdx && git checkout -f
for i in ~/aboriginal/sources/patches/linux-*.patch; do patch -p1 -i $i || break; done
make ARCH=arm allnoconfig KCONFIG_ALLCONFIG=<(cd ~/aboriginal; more/test.sh armv5l getconfig linux)
make ARCH=arm CROSS_COMPILE=~/aboriginal/build/cross-compiler-armv5l/bin/armv5l- -j 8

Substituting different CROSS_COMPILE= paths for different compilers to test, and dropping the || break if you don't care about patch version skew. (In the arm case it's needed to plug more processor types into arm "versatile".)


January 12, 2016

Jeff asked about whether $DAYJOB should migrate to git or not (something we've been talking about behind the scenes for a while), and rather than write a long argument about it I swap-thrashed over to the jcore repository git conversion stuff. (The best argument is having already done the work and being able to go "here, try this".)

We're mercurial internally, which is great except that mercurial is dying. The git userbase is big enough that users and projects are moving from mercurial to git, and it's snowballing at this point.

The real argument for _us_ is that if we want jcore to work as an open source project with contributors outside of $DAYJOB, a git repo is way better than a mercurial repo. But we need to dogfood that external repo, having our internal hardware extensions be a subdirectory (subrepo?) under the rest of the build, and the rest just using the external repo and putting all our non-proprietary commits straight into that.

A mercurial/git straddle there is possible, but painful. So we might as well move to not require new hires to know two systems.

The reason the mercurial conversion is hard is the code was originally developed as a half-dozen separate repos, and I'm trying to merge them into one big history. This is entirely on the mercurial side, coming up with one clean mercurial repo I can then convert to git. (A sign of mercurial's decay is that there are no mercurial tools to do this, so I'm writing shell and python scripts.)


January 11, 2016

Did a quick Aboriginal Linux release with the 4.3 kernel, since it applied more or less cleanly. No other changes, just trying to catch up now that 4.4 is out.

Pondered cutting a toybox release to go along with it, but almost all the work that's been done is infrastructure, not commands. There are a lot of commands that are now low hanging fruit, but I haven't actually done them yet. (Started on pkill while I was there, since the ps work makes it low hanging now. Yes, it's starting over rather than using the pending one, but I need -u to be comma separated lists and I want to enable multiple sort criteria at once, and...)

I still need to make new build control images that work with musl.


January 10, 2016

Rich Felker threw his hat into the ring for sh maintainership, to general acclaim.

What's been more acrimonious is the accompanying patch to remove the non-superh noise that Renesas has been dumping on the list ever since they moved to arm.

There are people arguing in favor of squatters' rights, saying that in the past year or two (of a list more than a decade old) noise has taken over, therefore noise has a right to be there. Personally, I am not impressed by these arguments:

> I know it's not a perfect comparison... but assume the
> original x86 Linux mailing list was called "linux-i386".

It was called "comp.os.minix".

Right now I'm holding off until the first patch gets in, then we can reopen this bikeshed.

Meanwhile I dug up my previous phone (Samsung Galaxy S) but it won't upgrade itself to Marshmallow. (It had one upgrade when connected to wireless, a bugfix dot release, then nothing.) I can try shoehorning cyanogenmod on it, but the instructions online are... awkward and bricktastic.

I'm reminded of this because I accidentally brought up the bootloader menu on my Nexus 5 trying to charge the thing. The darn mini-A cables they use warp so easily, as do the sockets, that it needs very careful positioning to actually make electrical contact. Even when I take the case off, it keeps spitting the cable back out unless I hold it in at the right angle (about 5 degrees off from straight). I tried jamming it in extra-hard (not like I'll bend it worse at this point) and brought up the bootloader menu. (Similar to finding the virtualbox window properly fullscreened when I came back to my mac to find Peejee curled up on the keyboard. No idea how she did it, and it didn't last past switching out and back in, and she brought up around 50 "really delete all your email Y/N" windows in the process.)

Said bootloader is, of course, "locked", and "secure boot" enabled. This is not YOUR phone, this phone belongs to Google, the NSA, and whatever organized crime group bought the NSA leaks from people less patriotic than Snowden. (You sell secrets to people who KEEP them secret, you don't wind up in exile. I expect organized crime takes care of its own way better than the american people do. Seriously, if it gets collected and retained, it will leak. Only question is when.)

And, next trip to Japan looks like it'll probably be in late February. I have a ginormous todo list before then, so what am I doing? Toybox.


January 9, 2016

And lo, the new chest freezer has arrived!

I'm behind on my monthly patreon reports. In part because at the end of November I hadn't _done_ anything (lots of grinding but not much in the way of results yet), and now I'm trying to do all the things at once again...

My editing and formatting of last year's blog entries continues to be way behind. I spent an entire day editing stuff and advanced it by a month, so there's another week's worth of stuff to do. Alas, the way I wrote blog entries... well, I talked about that last week. Sequencing makes blockers, even when I have a lot of later stuff already written up. Just another thing I need to sit down and grind through.

I am, however, cheating by taking the year break as an opportunity to upload notes-2016.html before finishing the second half of 2015. I haven't moved the notes.html symlink or switched the rss generator (the rss feed is also in strict chronological order). I can post about it in a patreon update, and everybody else... I should catch up eventually. :)

In the absence of an rss feed I might be a little more lax about posting half-finished entries and finishing them later. It's similar to rebasing git commits that haven't been pushed yet: once somebody else might have pulled it, you're stuck with it. If somebody's rss feed already read the entry, even minor changes might cause duplicates...


January 8, 2016

Netbook rebooted again last night, due to me holding down the button after fifteen minutes of it updating the cursor position ONCE, due to swap thrashing. (Too many open chrome tabs.) Lost all my open windows again.

Implementing the cursor left/right sort stuff for iotop. I'm doing iotop before top both because there are fewer expectations about its behavior (largely due to not being nearly as widely used as top, due to the default implementation being a python program that wants to run as root), and because there isn't an existing one in the tree, so I'm not being rude by throwing out an existing implementation and starting over. (Of course once I've got iotop working, top shares probably 90% of the code and possibly can even share a main() function...)

I really should have named this project dorodango, but apparently somebody's camping that domain name. (I emailed the guy who's got toybox.net, which displays a PHP error message, but he wants "over $2000" for the name. I can't say having my projects being on my personal domain site indirectly boosting my resume's google search rank is entirely a bad thing for me personally, although if I really cared about that I'd do the minimal pandering required to avoid ranking penalties. Plus, I really like my current job so put a big "not looking" notice at the top. So really, six of one...)


January 7, 2016

Got an Aboriginal Linux release out, 1.4.4 using the 4.2 kernel. Thanks to the -rc8 I beat the 4.4 kernel, meaning I'm technically only one release behind at the moment!

And I FINALLY got most of the architectures switched over to musl, the stragglers are armv4l, m68k, mips64, sh2eb, and sparc. Musl hasn't got support for m68k and sparc yet (or the half-finished alpha and s390x ports I was working on), armv4l is still oabi (because eabi requires thumb instructions, although there's apparently a linker workaround), mips64 is broken even under uClibc so I should fix that before migrating, and I wanted a clean sh2eb reference version for one release before switching that over (and opening the can of worms that is static PIE on gcc 4.2).

I need to do build control images, but that isn't necessarily part of the release, so...


January 5, 2016

The 11 months of mailing list archives Dreamhost deleted are still gone. Not surprising, since the weeks of archives they never properly archived LAST christmas are still gone too.

Elliott Hughes sent a patch to convert various perror_msg(x) calls to perror_msg("%s", x), and each one was already a constant but their static checker can't tell, and they have mandatory -Werror on the false positive. This only comes up in the CONFIG_TOYBOX_DEBUG case because I wasn't enabling the __attribute__(this uses printf format arguments) thingy except when DEBUG was enabled precisely _because_ it produces false positives if you have 'char *s = x ? "blah" : "blah"; printf(s);'.

We went back and forth a bit until I decided the right fix is to have "error_msg_raw(char *msg);", except I need three of them. (error_msg, perror_msg, and error_exit. I haven't wrapped help_exit yet because there isn't a user.)
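
The wrappers themselves are one-liners, roughly like this (a sketch of the idea; assumes toybox's toys.h declares the underlying printf-style versions):

#include "toys.h" // error_msg(), perror_msg(), error_exit()

void error_msg_raw(char *msg)
{
  error_msg("%s", msg);
}

void perror_msg_raw(char *msg)
{
  perror_msg("%s", msg);
}

void error_exit_raw(char *msg)
{
  error_exit("%s", msg);
}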

Meanwhile, I found out that cyanogenmod switched from busybox to toybox in November, and don't seem to have mentioned it to me? (Possibly it fell through the cracks.) They have a github repo with lots of local changes.

This raises an interesting point with regard to public domain licensing: busybox can pull patches out of any repo it sees, without asking, because the result has to be GPL. (Of course there's random non-GPL crap glued into GPL projects all the time, even Linux removed some ancient AT&T malloc code from the itanic arch when SCO trolled them, but the _assumption_ is it's GPL if they were competent.)

Toybox can't pull changes out of random repos, because those changes could be licensed any which way. (There may be a de facto assumption the license hasn't changed, but nothing requires it.) So we have to ask, or wait for submissions, or re-implement.

I expect I'll re-implement.


January 4, 2016

FINALLY figured out what was wrong with Aboriginal Linux, and of course it was retroactively obvious. A mixture of pilot error and "cross compiling is hard".

The second stage cross-compiler-$ARCH.tar.gz builds are statically linked, have thread support enabled, and various other cleanups that let you extract them in an arbitrary directory and run them from there. They are portable in a way that simple-cross-compiler-$ARCH.tar.gz are not.

You don't need a second stage cross compiler to build aboriginal linux (the reason to make them is you can tar them up and distribute them), but it'll use it (instead of simple-cross-compiler) if it's there. The more/buildall.sh script that builds all targets in parallel creates both cross compilers, but ./build.sh will only do so if you set the CROSS_COMPILER_HOST environment variable to one of the recognized targets. (The cross-compiler.sh script is a small wrapper around native-compiler.sh, it's basically the same infrastructure doing a modified "canadian cross" both times.)

Making these portable cross compilers involves rebuilding the cross compiler packages using both the earlier simple-cross-compiler-$ARCH target toolchain (for the target code like libgcc), and a host toolchain to build the parts that run on the host (like the gcc executable itself). Because Ulrich Drepper broke static linking in glibc (intentionally, over a period of years, because he hated the concept), I can't reliably use ubuntu's toolchain for this, and instead use the simple-cross-compiler-i686 toolchain I built earlier. If that code is statically linked, it should run on both 32-bit and 64-bit x86 targets. (The 32 bit code is actually very slightly faster because the smaller pointer sizes thrash the cache less, which is why Linux added the x32 target.)

When I switched the i686 target from uClibc to musl-libc, all the other targets still built fine... until I enabled the second stage cross compiler. So ./build.sh sh2eb worked because it skipped cross-compiler.sh, but more/buildall.sh set CROSS_COMPILER_HOST=i686 and then the build broke trying to build elf2flt with a musl toolchain (duplicate symbol definition due to a stupid autoconf test putting -lc before the FSF's -libjustinbieberty).

What took forever to figure out is a clean build.sh worked, a dirty build.sh didn't (used cross-compiler.sh if it was there in build/), and more/buildall.sh always triggered the failure after the musl switch. Bisecting when you're not actually running the same test each time is a frustrating experience.

(Eventually I gave up on bisecting and sat down and traced the bug through the build until I figured out what was failing and why and where it came from. "What changed" is easier to debug than reverse engineering code when you don't understand how it ever worked in the first place. Or "if" it ever worked, since sometimes the difference is it decided to use a different file or include a different #ifdef section that was never compiled at all in the working version. In this case, digging through to figure out what specifically the code was complaining about finally let me figure out what HAD changed: musl in cross-compiler-i686.)


January 3, 2016

Continuing to poke at find. The -depth option (to trigger depth first search) was recursing before loop checking, so endless loops (with -L) wouldn't be detected in that case. Oops.

I go back and forth on various design issues. Supporting multiple -execdir on the same find command line is one of them: some days I'd go "just detect the second and error out saying it's unsupported", but I'm already halfway through implementing support for it and it's not that bad. (It's _obscure_ but there's a strong element of "do it RIGHT" in toybox. What doing it right entails is the wibbly bit: is _not_ doing it at all the right thing?)

I've got it to the point where "find toys -execdir echo '{}' +" is _sort_ of working. As in it's pushing a new context onto the stack _before_ saving the current directory, so each exec list starts with the enclosing directory name. Oops.

My "natural addition point" for the list push yesterday is elegant and simple and (unfortunately) wrong. And -depth is still tangled up in it, because in that case the whole directory processing needs to happen from COMEAGAIN. I think that -depth is actually currently wrong, not processing directories at all (because it returns early to change processing sequence, but comeagain isn't doing the extra processing in that case).

And all of this makes me really want a worthwhile test suite already filled out so I can at least spot introduced regressions. And of course poking at that reveals that the ubuntu version of "find toys -execdir echo {} + -execdir ls -C {} +" _isn't_ doing the collation thing, it's calling each one individually with a single argument (so execdir with "+" acts like execdir with ";"), which means building the test suite by testing the host version before I've got the corresponding target stuff in testable shape... is a little harder.

I hate having a test suite that only tests ONE version of the code, because how do I know if the tests are right or just confirming that the code does what it does? Catches regressions, but doesn't prove anything _else_. But when I start getting into "ubuntu does this wrong, busybox does this wrong"... I guess if nobody's noticed it can't really _matter_? But it's still not RIGHT. Grrr.

Anyway, this is why I added SKIP_HOST to tests that the ubuntu version is expected to fail.


January 2, 2016

Poking at toybox find.c again. The -depth and -xdev options aren't affected by parentheses, they're global. I wonder if there's an easy way to note that in the --help output? (Terse vs thorough, the eternal struggle.)

Isabella Parakiss reported (and fixed) the same bug Daniel K. Levy reported back in September. It took a while for me to confirm this because Dreamhost's list archive is STILL down and what I had bookmarked was a link to the web archive. (Archive.org snapshotted the september index page, but not the individual posts.)

Anyway, dug through and confirmed that, cherry-picked out the actual fix (and the "linux environment space hasn't been limited to 128k/process since 2007" fix), and a pending "we're checking S_ISDIR(dev) instead of S_ISDIR(mode)" fix that I thought I'd already checked in? Right...

Anyway, the remaining big find hairball is making -execdir + and friends work, which is where I left off last time I was looking at this.

So -exec, -ok, -execdir, and -okdir all collect a list of names and arguments to call exec() on. The -ok variants prompt, the -exec ones just call. The dir variants collate them by directory and fchdir() into each dir; the others use paths from the top directory and do the exec from there.

All of these can be terminated with either ";" or "+", the first calls exec on each file as it's discovered, the second collates a bunch together and makes one exec call on a pile of them. In the + context the collating is different between dir (flushing as it exits each directory) and non-dir (collecting a giant pile of absolute paths).


January 1, 2016

Happy new year.

I put "update the darn blog" as a patreon goal, and even though we never got near it (not that I've been exactly shouting about my patreon from the rooftops) I've updated my text file whenever I remember. Unfortunately, when I trail off in the middle of a thought I do the next day's entry and then have to go _back_ because I can't post them out of order (rss feed, for one thing; policy of not editing entries after they go up for another). And when there are LARGE entries like "I'm sad my friend's gone nuts" that I need to write up more than the first 25% of, or things where "there's a link that goes here but I'm not on the machine I saved that on...", or it's a half-hour's research to finish this paragraph and I haven't got time right now...

Well they can add up. Plus I'm doing all the tags by hand, and I have to check the thing for typos and unfinished paragraphs and forgetting the closing quotes in an <a href=> so it eats the next several sentences...

Anyway, I was hoping the patreon would guilt me into it, but I guess I should tap the amount down. (The patreon totals recently got adjusted to be take-home pay after fees anyway, so everybody's goals should shift since then.)

I'm conflicted about patreon. On the one hand, it would be amazing if my open source work did become my full-time job. On the other, I _like_ my $DAYJOB at a company that's redoing the electrical grid to make solar and wind mainstream viable (Problem: legacy electrical grid designed for centralized generation, if you feed nontrivial amounts of electricity back in at the edges people's light bulbs start exploding. Solution:)

And the WAY they're doing it is awesome, creating open hardware by cloning architectures all the patents have expired on, releasing the resulting hardware source files under a BSD license, and writing up both how to test and modify them on FPGAs and how to negotiate with fabs to burn small runs of your own wafers to make your own SOC. (Right now the smallest run it's worth doing is 6 wafers, resulting in around 36,000 chips with our little SOC, which would cost around $50k for a 150 nanometer process. Kickstarter regularly funds projects dozens of times more expensive than that.)

The founder of the company is the founder of uClinux; my entire open source career has basically been following in that guy's wake and I get to hang out with him in person for weeks at a time. They fly me to japan to do this. He walked me through how nommu should work. I got to drag my friend Rich Felker into this and get HIM doing important work on their dime too.

I LIKE this job.

But... turning android into a self-hosting development environment is _important_. I found the place to stick the lever to shift the entire course of the computer industry and I'm PULLING AS HARD AS I CAN. And it's time-critical and NEEDS to happen and I can help it happen the right way and... I gotta do a systemd replacement init. I need to do proper container infrastructure. I need to clone a non-GPL git for Android's repo. I need to de-hairball AOSP. Clearly Google isn't going to hire me to do this (we've been through this. We've ESTABLISHED this), but they're LISTENING. They're taking my code and contributing to my code and I need to ramp development UP and get it all DONE and...

I'd like to convince $DAYJOB to hire more people to let me focus on the bits where their interests and android's overlap, but we're at that ironic stage where we're too busy to hire more people. (Money they have, engineering bandwidth to bring new hires up to speed, not so much. Part of the reason I've been quiet about the open source nommu/jcore stuff, apart from being so busy with everything else, is our first big push got an interested person who wanted to come work for us and... we dropped him on the floor. We were too busy to follow-up on hiring desperately needed people. We also had a consultant in Japan who we were too busy to properly bring up to speed to the point we could use her...)

I keep thinking I should really focus on fixing _THAT_, but I'm not management and don't want to be. (I'll team lead your ear off but hiring and firing decisions are pure Bistromath to me.)


Back to 2015