Rob's Blog



December 31, 2013

Waited half an hour for the bus, it didn't come. Fade was off helping a friend move, so I walked home. It took four hours, cell phone battery died an hour into it. Home now, feeling a bit stiff. Still, only a couple coughing fits the whole way, so I think I'm getting better.

Huddling on the couch with Fade, watching MST3K mock a movie I saw first run when I was 6 years old at the Richardson outdoor theatre on Kwajalein. ("Laserblast": it's really bad). Camine's in her room doing her own thing on her computer. My friend Sally's in town but we missed each other due to the long walk and the dead phone battery.

Happy new year.


December 27, 2013

The toybox cleanup page is chugging along. I completed the ifconfig cleanup weeks ago but haven't quite completed the _writeup_, because I missed bits going along and have to backfill working out what I was thinking.

Part of the reason I haven't been blogging as much is my large technical digressions go into that sort of thing now, eventually posted to the toybox mailing list instead of here.

The main reason is, of course, Day Job eating my life so I don't _do_ as much of other stuff as I used to. The Cray job had a 90 second commute between my tiny apartment and the office building next door. When the busses align this one's more like an hour and a half each way, and when they don't it's taken twice that. The majority of my open source programming time these days has been _on_ said busses, but it's two busses each way with a random wait before each (Cap Metro: our schedule is merely a suggestion), and if I start programming instead of standing at the edge of the road the bus will _pass_ me without stopping... Yeah. Haven't quite wanted to pay $1600 to make Stafford, Texas go back to not mattering, but it's getting there...

I was cutting out the shorter of the two bus trips with biking and getting some actual exercise until I got this darn cold. Which turned into bronchitis, so I cough if I do anything that makes me breathe too hard. Fade's been driving me to work several days, but doesn't like driving me back so I don't have my bike with me and thus walk (slowly) the bits where the bus doesn't come after a half hour of waiting.

Anyway: cleanup. Yay cleanup. Working on NUL terminator (-z) support for grep (to work with find -print0 and xargs -0), and then I've got two other pending grep requests: make it work with embedded NUL bytes (for grepping strings out of executables) and add -ABC context support. And then maybe I can get back to the long-delayed find cleanup. (Little things, like "find should take multiple directories to search and the submitted one doesn't".)
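For reference, the -z input side mostly boils down to swapping the record terminator. A minimal sketch of the idea (my guess at the shape of it, using getdelim() with a NUL delimiter; not actual toybox code):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  char *line = 0;
  size_t size = 0;
  ssize_t len;

  // Same loop you'd write for newline input, but the record terminator
  // is '\0', so filenames from "find -print0" pass through unmangled.
  while ((len = getdelim(&line, &size, 0, stdin)) > 0)
    printf("%zd byte record: %s\n", len, line);

  free(line);

  return 0;
}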


December 26, 2013

I really need to do an arbitrary precision math library for toybox, because thanks to Peter "gratuitous complication" Anvin, the kernel requires bc to build. (If your change to the kernel requires the Linux From Scratch developers to change their code for no obvious benefit, you may be doing it wrong.)

I've been reading through libtommath on and off, which is the public domain implementation dropbear uses, but there's no obvious documentation and I really don't know where to _start_. The website is useless, there's no documentation in the source, the demo code is a giant block of unexplained stuff, the main header file spends the first dozen screens of text not trusting LP64 or c99 and adding workarounds for building the code as C++... (I go out of my way to break C++, it is not C and people who think it is need a session with a clue-by-four.)

I could trivially do my own addition, subtraction, multiplication, and even division on arrays. (Said division would basically be long division and thus really slow, but I don't hugely care.) But... exponentiation? Roots? Trig functions? Not a clue how to do those, and bc's got 'em. (Well, the trig stuff can be implemented as infinite series, but _ouch_...)
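For what it's worth, the "math on arrays" part really is trivial. An addition sketch (assuming little-endian arrays of uint32_t limbs of equal length; not code from any actual library):

#include <stdint.h>

// dest = a + b, returns the final carry. Each limb sum can overflow
// 32 bits, so do the math in 64 bits and shift the carry down.
uint32_t bigadd(uint32_t *dest, uint32_t *a, uint32_t *b, int len)
{
  uint64_t carry = 0;
  int i;

  for (i = 0; i < len; i++) {
    carry += (uint64_t)a[i] + b[i];
    dest[i] = carry;
    carry >>= 32;
  }

  return carry;
}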

Possibly I can do a truly half-assed stub version that just uses double for everything under the covers and swap in something less horrible later, and that would let me get the actual bc parts written. But this is a large lump of domain expertise I haven't got, and with Day Job eating my life to pay for Big House, I can't wander off to a corner for two solid weeks worrying about nothing else until I've puzzled through it. And it's not the kind of thing I'm any good at picking up in 15 minute increments.


December 25, 2013

Nice christmas morning with Fade and Camine. We got stuff. We had food. (Some of which was part of the stuff.) Hanging out with good people. It was quite pleasant all around.

The Dog is having The Best Christmas Ever, what with the multiple new plush toys and the bag of dog cookies. Peejee got a handful of catnip on her Giant Cat Tree and is still there rolling in it an hour later. George remains zen about everything, I think it's her way of ignoring how fluffy she is. (She became a longhair at adolescence and has never forgiven the universe for this.)

Our christmas tree is more or less the "before" version from the charlie brown special, but hey. It survived to the day. (A little one foot potted thing we can presumably plant in the backyard if it survives the day.) Apparently, _not_ watering them is important. It's a tree-shaped cactus.


December 24, 2013

I get christmas off. New Year's too. Took new year's eve as one of those musical rotating floating holiday thingies that aren't quite vacation time and aren't quite sick time. Working the rest of the weekdays, of course. (Yes, even christmas eve. Starting work in October doesn't give me a lot of accumulated vacation.)

I haven't touched aboriginal linux in weeks. I have largeish plans for it, but people are waiting for the toybox stuff, and I should really attack the kernel documentation pile, and this is all fighting over the 20% of my time I'm not spending at work, commuting to and from work, or too exhausted to do anything but watch Columbo or play Skyrim.

Pending aboriginal plans include:

I'm not currently working on any of that. No time/energy. :(

Aside: the gentoo guys still insist that annotating every single package in the tree with every architecture it's allowed to work on is a good thing. Swapping C libraries? No sweat. Swapping compiler versions from gcc to llvm? Piece of cake. Rehosting the entire thing on a bsd kernel instead of Linux? Could be done. Moving from sh4 to hexagon? My GOD man are you INSANE, this requires months of clean room laboratory analysis in full bunny suits by a team of DOZENS of qualified certified professionals!

Yeah, not seeing it. Things break, you fix them. Gentoo's paranoid approach to architecture support is nuts. They DON'T annotate every package with c library, compiler, or kernel it's allowed to run on. Processor type is not more intrusive than those.


December 18, 2013

The rsync description says it uses a three level search, the first of which is a 16 bit hash table, the second is a check of the sorted list of hashes, and the third is the md4sum. Except since rsync 3.0 it's md5sum, and that was 2008 so I expect I can just do that if I can work out the version flags. (I've already got md5sum code, md4 is a bit... less documented than it used to be?)

I expect the 16 bit hash is just (chk&0xFFFF)^(chk>>16), with the entry pointing to the first position in the sorted table. The table is an array of checksum+md5sum entries. I probably don't actually need to store the position in the file because you can work it out from the table (array of block hashes, blocks in order)... No, I need original position because the table is sorted, and that might as well be bytes. So a u64, a u32, and a char[16]. 28 bytes per entry, might pad to 32 but that's not my problem.
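In C terms, my guess at the above looks something like this (the names are mine, not rsync's actual structures):

#include <stdint.h>

// One entry per block: 8+4+16 = 28 bytes (compilers may pad to 32).
struct blockent {
  uint64_t pos;     // original byte position of the block in the file
  uint32_t chk;     // 32 bit rolling checksum
  char sum[16];     // md5 (md4 before rsync 3.0) to confirm a match
};

// Level one: fold the rolling checksum to a 16 bit table index; each
// table slot holds the first matching position in the sorted array.
static inline unsigned hash16(uint32_t chk)
{
  return (chk & 0xFFFF) ^ (chk >> 16);
}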

So not that hard to implement. The funky bit's likely to be working out the protocol that rsync is sending across the wire and expects back. (Only three sets of data though, per file anyway. Then I can work out that whole incremental directory thing...)

No I'm not looking at rsync's code, they went gplv3 (because it's a subsidiary project of samba, and samba was the one big non-gnu project dumb enough to buy into the FSF's power play). That means it's not open source, it's now Libertarian Software, and I'm not getting any of that on me.


December 17, 2013

I didn't make much progress on rsync last weekend, but I did get thunderbird installed on New Laptop and ssh -X to it from the netbook to download the week's worth of email since Balsa once again found an email it couldn't download or get past.

Now I'm going through the backlog of emails balsa already downloaded, and slowly setting up mail filters to break the thunderbird-side stuff up into folders. (It can filter while downloading. Balsa claimed to have this capability, but it never worked.)

Really hoping thunderbird scales better with pop downloads than it did with imap downloads...


December 14, 2013

Finally starting to feel slightly better. Wow this was a tenacious cold.

So I recently tried to run buildroot under aboriginal, on the theory that its defconfig looks like it's building packages for an arbitraryish host now. And it died almost immediately needing rsync. Simple enough: wget rsync, configure, make...

And the rsync build died needing perl.

Right, new project for the weekend: implement rsync in toybox.

I dug into rsync back around 2001 and implemented the sliding checksum thing in python, but it was 1/10th the speed of the native C code so I didn't pursue it very far. Digging back in... md4 is rfc 1320, and the running hash is described in gratuitous mathematical terminology here, plus tridge's thesis is online...
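From what I remember of the 2001 python version plus the thesis, the sliding checksum works out to something like this (a sketch of my understanding, not rsync's actual code):

#include <stdint.h>

// Weak checksum of a whole block: a = sum of the bytes, b = sum of the
// running totals (so early bytes count more), each kept to 16 bits.
uint32_t weaksum(unsigned char *p, int len)
{
  uint32_t a = 0, b = 0;
  int i;

  for (i = 0; i < len; i++) {
    a += p[i];
    b += a;
  }

  return (a & 0xFFFF) | ((b & 0xFFFF) << 16);
}

// Slide the window one byte (drop "out", add "in") without resumming,
// which is what makes checking a match at every byte offset affordable.
uint32_t roll(uint32_t sum, int len, unsigned char out, unsigned char in)
{
  uint32_t a = (sum & 0xFFFF) - out + in;
  uint32_t b = (sum >> 16) - len * out + a;

  return (a & 0xFFFF) | ((b & 0xFFFF) << 16);
}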

(Yes, "rewrite project from scratch" is a reasonable response to finding a build gratuitously requiring perl.)


December 10, 2013

Day 9 of the sick. Camine still has it too.

This is remarkably persistent. I took last tuesday as a sick day, telecommuted wednesday, had Fade drive me at least partway to work thursday and friday (not up for biking), huddled on the couch all day saturday and sunday playing skyrim... and was still sick yesterday. (Three energy drinks will get me through a work day, but I'm still not exactly 100%.)

The coughing is the most annoying part. When I lie down, I cough lots more, which makes sleeping difficult. Also my lymph nodes are swollen enough that shaving is painful.


December 9, 2013

My netbook finally needed a reboot today. First time since June. I drove it deep enough into swap that an hour later it still hadn't let me move the mouse pointer. I did this by right clicking "open in background tab" on three different links in chrome, and then, when it spent five minutes thrashing, hitting ctrl-alt-F1 to try to get a text console so I could do the "ps ax | grep flash; kill flashpid" dance. Unfortunately, these days that's no longer handled by the kernel but instead by X11 going through the whole desktop stack (including the gnome crap that xfce pulls in for no apparent reason), meaning it just added MORE memory pressure, and the poor little netbook with only eight gigabytes of ram went catatonic with swapping.

You'd think the out of memory killer would trigger during this, but no, it hadn't run out of SWAP. The auto-partitioning when I installed this thing gave it 4 gigs of swap, and it can churn through that for days before deciding it's out of memory and has to kill processes. Freezing and being unresponsive for hours is much better than killing _processes_.

I don't think I'm going to miss current Linux userspace when smartphones leave it behind. It used to be happy in 16 megs of ram; that was a huge box. Now it's cramped in 7,714 megs of ram. Progress!

Anyway, that giant pile of open windows and half-written email replies and such that I'd lose if I rebooted the machine? Moot point now.

Also, a couple days ago balsa hit yet another of those email messages it crashes trying to download, and rather than resurrect my manual (python) workaround, I've been meaning to switch back to Thunderbird anyway once I'd caught up on the stuff balsa already downloaded (and finished/sent the dozens of reply windows I had open, until said reboot).

Being able to run the mail filters (written in python because the built-in ones don't work, and only usable when you close and re-open balsa, which discards pending reply windows) makes that a bit easier now...


December 2, 2013

I has a sick. I'm blaming Oklahoma. (Camine's already had it for a couple days and she stayed here, but my point stands!)


November 27, 2013

Long drive up to thanksgiving and the family reunion for my grandmother's 90th birthday. Fade's driving all of it because I still haven't dealt with the "my insurance card fell under the passenger seat, and you want me to drive to Houston instead of Faxing it to you?" ticket from 2008. (License finally expired in February, but I was in Minnesota sans car for 5 months after that.)

Using the drive to tackle the giant pile of email I've fallen behind on. (Maybe once I catch up I can switch from balsa back to thunderbird. It's still crap, but I have a frame of reference for how much _worse_ an email program can be, and I think using pop instead of imap might beat the download stupidity into submission. Maybe.) All I know is departure this morning was delayed an hour and a half because switching out of threaded mode to flat mode before downloading email took that long. (Inbox has 86,000 unread messages, of which about 16,000 are relevant and the rest is due to skipping. This isn't much, but the idiots put O(N) or greater algorithms in the code. Even once the threading worked its way through, it took about 3 seconds to add each message to the inbox, so just downloading a day's worth of email is about half an hour. You wonder _why_ I'm so far behind; this feeds on itself...)

So, stuff from email I should remember when I have net again: util-linux apparently has a "nsenter" command now (fork child, have child do setns(), then child clone(CLONE_PARENT) to create grandchild in right namespace). There is video of Linus's most recent keynote. Bernhard the uClibc maintainer mentioned his way of doing "miniconfig" (which may not be the same miniconfig I use). /proc/meminfo is growing a new MemAvailable: entry so we can stop adding up MemFree and Cached (which doesn't work because tmpfs is cached but not freeable, doesn't count freeable slab, and so on). And there was a fun post from Christoph Hellwig around 11/8 about XFS maintainership (I should reference that if I redo the prototype and the fan club talk)...
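If I'm reading the nsenter description right, the core of the trick is roughly this (a simplified sketch using fork() where util-linux reportedly uses clone(CLONE_PARENT) so the grandchild stays a child of the original process; not their actual code):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

// Join the namespace behind e.g. "/proc/1234/ns/pid", then fork: for
// pid namespaces, setns() only affects children created afterwards.
int run_in_ns(char *nspath, char **cmd)
{
  pid_t pid;
  int fd = open(nspath, O_RDONLY);

  if (fd == -1 || setns(fd, 0)) {
    perror("setns");
    return 1;
  }
  if (!(pid = fork())) {     // child starts life in the new namespace
    execvp(*cmd, cmd);
    _exit(127);
  }
  waitpid(pid, 0, 0);

  return 0;
}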

Still a couple weeks behind. Oh well, there's always the drive back.


November 23, 2013

Gah. Had my first "scintillating scotoma" migraine vision screwup since the one in St. Petersburg. Triggered, as far as I can tell, by the "lemon dishwater" flavor diet Rockstar energy drink. (Which always caused flashy things in my vision if I drank them too fast; I'd forgotten why I stopped drinking them and now I remember.)

Happened at night, so much less television static and much more "I can't see my hand when I hold it here, my brain's editing out a large part of my vision that's not working right". Turned into "headache on the other side from where it usually is" about the time I went to bed, back to normalish by morning. (Writing this the next day but dating the entry when it happened.)

I no longer seem to have the fortitude for energy drinks. I'm also curious what random snake oil thing they put in that one that I react to. ("Milk thistle" maybe? I keep confusing that with milkweed, which I _know_ is poisonous...)


November 19, 2013

New toybox and aboriginal releases. Email remains unmanageable.


November 17, 2013

Back in 2007, the new kubuntu version I installed had a broken knetworkmanager that couldn't accurately track whether the wireless connection was up, and thus things like mozilla that paid attention to what dbus had to say about network state (_why_?) refused to use the active network connection and forced themselves into offline mode. (Mozilla had a pulldown menu option to manually override this, konqueror did not.) I filed a bug, which got merged to another bug, which was "high priority" until it got marked invalid. Because that's how bug reports naturally go: this is an EXTREMELY IMPORTANT problem and we decide it's NOT REAL. Ok then...

Along the way, I got involuntarily added to the notification list of a bug titled "notification bar needs to look slicker", which had nothing to do with the bug I filed (which was about a functional issue), and I've been getting periodic email notifications about it ever since.

For six years now.

Three guesses why I no longer use KDE?


November 7, 2013

Heh. Wikipedia (the world's largest stash of anecdotal evidence) has an article on toybox that is... um...

The launch of the BusyBox Replacement Project in 2012 by Sony engineer Tim Bird (and the hiring of Toybox's maintainer Rob Landley to work on that project) was controversial, as it was viewed by BusyBox's maintainers as an attempt to avoid BusyBox's GPL license terms rather than a desire for technical change.

It says Tim Bird hired me to work on Sony's android stuff. That never happened. It says things about the busybox maintainers (of which I _was_ one) that are equally untrue.

Tim did propose funding work on a pair of related projects (android mainlining to get android kernel patches into the vanilla kernel, and "bentobox" to extend android's toolbox and bionic into something usable), and had me in mind as a potential candidate for bentobox contract work. But I was busy with a day job at polycom when he pinged me about it.

So Tim made a wiki entry saying "maybe this would be a good thing to do" and asked the celinuxforum list for opinions, which aggroed an FSF zealot who wrote a very misleading blog entry claiming a conspiracy on the part of Tim's employer, ala "How DARE you take away busybox as a weapon the FSF can use to install compliance officers submitting quarterly reports to the FSF!" Which I boggled at, because dude...

Once the FSF zealots started freaking out about Tim's wiki page with the project proposal (dude, CELF does a bunch of those every year), Tim didn't want to spend political capital fighting off the community auto-immune reaction that is the FSF.

But I'd only stopped working on toybox because I didn't see a niche for it: being incrementally better than busybox wouldn't replace a project with a 10 year headstart. (I even considered rewriting it in Lua because I enjoy banging on this kind of thing, but Lua's lack of a standard posix binding library meant you'd have to install a half dozen lua packages as prerequisites, and that sort of defeats the purpose.)

BusyBox was always free to use my code. Even at the height of toybox development I still wandered through to fix the occasional bug. Heck, I encouraged the busybox developers to see it as an independent project rather than a "reimplementation". I myself repeatedly submitted toybox code and design ideas to busybox so the larger userbase could benefit from it. I even sent whole new commands to busybox while toybox was completely moribund. I met Denys in person at CELF in 2010 and sat down and _explained_ toybox to him (which is why busybox acquired the ability to add a command in just one file instead of touching five files scattered throughout the tree).

No seriously, my point working on toybox (like busybox before it) was always _better_code_. The reason I didn't go back to work on BusyBox full time is it was worse code. (The wikipedia guys explicitly disbelieve this, but oh well.)

(On an unrelated note, I sometimes regret not calling toybox dorodango. Missed opportunity, I didn't think of it until way too late, but it's a better name for what I'm trying to accomplish. It's all about the polish.)

Anyway, once I noticed that a different license would give toybox a niche I happily got back to work on it. Tim went off to focus on the android mainlining half of his plans that _hadn't_ aggroed the FSF zealots, and we went our separate ways...

Not exactly the story Wikipedia's telling. :)

Oh, and the "busybox maintainers saw it as" nonsense? I _was_ a busybox maintainer. I appointed the current maintainer Denys, and linked to a dozen interactions with him _about_ toybox, above. (He emailed me questions about licensing last month.) Denys' immediate response to being told about the furor over the biased framing of the FSF zealot?

Some of the arguments from the "other side" found in that thread make sense. We are possibly a bit too aggressive when we try to force people to comply with GPL on other projects too, not only on bbox.

Meanwhile I'm unaware of Erik Andersen having said anything about Busybox in the past few years; he's wandered entirely away from the project. And between the three of us, that covers the past 15 years.

No, the people who freaked out never had anything to do with BusyBox in the first place. I've found in general that the more coding you do, the less time or interest you have for licensing issues. If you're good at coding you can always write more. And defending the existing stuff's kind of silly when you periodically throw it out and rewrite it anyway. (Those who can't code, litigate.)

Sigh. You'd think this would all be covered by now, but apparently not.


November 3, 2013

Hey, new kernel is out. 3.12. Meaning I need to cut an aboriginal release, meaning I need to cut a toybox release for it to use. Meaning I should really catch up on email, or at least go through the weblog and dig out the public list postings to reply to.

Lemme see if I can do that before -rc1 ships...


November 2, 2013

Sigh. I am sixteen _thousand_ emails behind. (If you email me, don't expect a reply for about two weeks.) How did I get so far behind? Like this:

Earlier this year I proposed doing a kickstarter/indiegogo to see if anybody wanted to sponsor toybox work. I know at least two companies have assigned engineers to work on the project, but I wasn't one of them. My spare time goes into it, but only after competing with many other things. (I've felt bad about falling behind on the review/cleanup of the stream of commands going into "pending", but not bad enough to not spend time with my family.)

But when I posted the crowdfunding idea to the toybox list, of making _me_ one of the engineers assigned to the project, with proposed text for the campaign video and the most lowball budget I could manage, it got zero replies. Nobody cared. And that means I don't have to feel guilty: I offered to take a pay cut so they could have more of my time, and they didn't want it.

With a real house and a wife in college, looking for a long-term stable position here in Austin that allowed me to make 5 year plans was the obvious next move.

I don't like job hunting, and it's even MORE stressful when you're not looking for a 6 month contract where you can smile and be professional through anything because you know when it ENDS, but for something you intend to commit a significant portion of your life to. (I tried this with TimeSys and Polycom, and both of those changed out from under me after I started, so that at least half the team I was hired into quit before I did in both places... I didn't want to repeat that, so I had to be MORE careful picking.)

Which meant I retreated into a bit of a shell while job hunting, where I did what was necessary to advance the job search each day, and then bogged off to play skyrim. This meant I fell behind on email, and didn't do a whole lot of cleanup on the toybox pending stuff. (Or the aboriginal cleanup like making initramfs the default and the native toolchain work in a mount point under that.)

Monday October 21st I started my new job at pace.com's Austin branch, and lost a couple weeks to the usual ramp-up/cliff-climbing. And now I've resurfaced from that... to 16,000 back emails.

Balsa remains broken crap. I can only run the external mail sorting filters I wrote when I exit balsa and restart it, and I have around 30 half-finished reply windows open; of course Balsa won't remember any of those across a restart, what do I think this is, kmail?

Up in Minnesota I didn't mind so much; quiet evenings, nothing much else to do but catch up on email. The giant unsorted inbox between restarts forced me to skim all the linux-kernel and qemu-devel messages as they went by; I've actually read linux-kernel this year for the first time since kernel-traffic went away. (Proper summaries let you skip sections without losing the plot, and thus catch up if you fall behind.) But now I spend time with my family in the evenings and run errands for friends in town. An hour and change of email per day means 8 solid hours of catch-up if you're a week behind, and 16 if you're two weeks behind. And that's time _not_ spent working on toybox or aboriginal...

I've collected 41 linux-kernel Documentation patches, of which I'd guess half already went upstream through different channels (although not necessarily in Linus's tree yet), or otherwise shouldn't be sent. I need to deal with that, since I'm nominally the linux-kernel Documentation maintainer. (I should send a patch to make that say "JANITOR" in the MAINTAINERS file. When the kernel guys go on about "everybody working on linux these days is paid to do so" I laugh. And these days I don't particularly participate in the conversation, because I'm reading two week old messages to scoop up Documentation patches that fell through the cracks...)

When 3.12 shows up on kernel.org I need to cut another Aboriginal release: I still haven't rewritten ccwrap.c so I can rebase on musl (and I haven't updated the musl wiki current events page since _July_), I still haven't finished adding the alpha target even though I _think_ I have all the pieces now. I worked out why 9P broke: the kernel guys were stupid. (That's another rant.) The easy way to fix it is to yank virtio support (a bad patch added to that is what broke TCP/IP support), but that means getting the 9p server in toybox working and I haven't had time to work on _that_ since Minnesota...

And of course buried within that giant pile of unsorted email are a dozen or so toybox messages (according to the web archive, but I can't reply to them from the web archive). New commands to add to the giant pending heap, when I haven't even managed to catch up on the cleanup page describing the cleanups I've already done. (I'm 90% of the way through ifconfig. One of my open windows is the musl-libc test page with all the ipv6 address parsing tests, and that plus the ipv6 wikipedia page should be all I need to finish the ipv6 bits of ifconfig and promote it out of pending. But I'm also halfway through a tr implementation, which is the 2nd most used remaining busybox command in the aboriginal build, I'm also halfway through cleaning up nl and find, and I need to update the roadmap and status pages...)

But at least I _officially_ know that the toybox users are ok with the scraps of hobby time I can send them. I have permission to fall behind on that project.

I still feel bad about it. Oh well, can't have everything...


November 1, 2013

Got an email a while back:

Rob,

I have developed a way to allow users to install and use GNU/Linux rootfs on Android without requiring root permissions. I thought you'd be interested to know that I chose Aboriginal as one of the first examples, since it is quite useful and lightweight. My other example is Debian (much much larger, but has apt-get etc).

https://play.google.com/store/apps/details?id=champion.gnuroot

Thanks for all you do and have done!

Corbin Champion

It's... ungrateful of me to hold the "GNU/Linux" in the name against him. One of my original motivations for what became Aboriginal Linux, back in 2003 or so, was replacing all the gnu packages in Linux From Scratch with alternative implementations like busybox and uClibc to come up with a Linux system that DIDN'T have any gnu stuff in it. Something that was clearly NOT "GNU/Linux", which I could confront stallman with to see what twisted rationalization he came up with to still claim credit for it anyway.

But the cool thing about Corbin's project is it might let me boot a Linux chroot on a vanilla android system, presumably without requiring root access. Given my focus these days, that's really useful. I should give it a try...


October 31, 2013

Next Halloween, I should find a halloween party or something ahead of time. Or at least remember to dress up. I got to work today and was surprised by small children being escorted around the building in costume.

Oh well, at least I got the woodchuck mailed out in time...


October 28, 2013

The following is a reply I composed to David Wheeler on the posix mailing list (the "austin group" list coming up with the successor to Posix-2008/SUSv4), and then didn't send because it turned into a big rant that wouldn't help there. Posting it here instead, where it won't bother people who can't do anything about the issues raised in it.

(The context is that David wants to extend posix's "make" description to be less completely useless and out of touch, and he's getting pushback. I pointed out I need to do a "make" for either toybox or qcc as part of making android self hosting, and posix make wasn't good enough to actually build linux from scratch...)

On 10/26/2013 03:23:07 PM, David A. Wheeler wrote:
> On Sat, 26 Oct 2013 11:18:21 -0500, Rob Landley wrote:
> > If you like, I can try to document what I add on top of that (and what
> > packages need it), and possibly I can make a config entry in toybox to
> > do "posix only" mode so others can also watch package builds break?
> 
> I'd be interested!  However, different people depend
> on things, and it'd be easy to argue that the list just applies to you.

Of course. Just as http://landley.net/toybox/roadmap.html only applies to me.

This is not my first time at this. I extended busybox to build linux from scratch and beyond linux from scratch; about 90 packages at the core of linux distros. If this meant "sed -ir", so be it. There were multiple common Linux packages that didn't build without it. (I'll maintain a small local patch to remove a feature from one package, occasionally from two. Not from three.)

Now I'm writing toybox which means I'm doing it all over again (with cleaner code and a license android can actually use), and I really don't care _what_ posix says about a lot of things. Posix is something I try to document my deviations from, and the large gaps it's silent on (most recently http://landley.net/notes-2013#22-10-2013). What I care about is making it _work_.

What does posix document? AIX is dying (a new co-worker who started at my job this week was hired away from what's left of the AIX department here in Austin, and even _he_ says AIX is toast; remaining support just transferred to India). Solaris was bought by Oracle, which is doing the whole "dying business models explode into a cloud of intellectual property litigation" thing (killed by in-memory databases) _and_ it's second banana to Oracle's Red Hat clone there. After the previous bloodbath that killed Irix and HPUX and such, that leaves what?

I'm paying attention to:

1) Linux/Android
1A) (I'm aware Cygwin and mingw exist, but can't say I really _care_.)
2) FreeBSD and friends.
2A) MacOSX/iOS and the rumored Darwin.
3) Everything else.

With the smartphone kicking the PC up into the server space (or at least a big NSA data center in Utah called "the cloud") the way the PC did to the mainframe and minicomputer before it (plug a smartphone into a USB3 hub, add keyboard, mouse, and USB->HDMI adapter; congratulations, it's a workstation. And this is ignoring the way successful tablets are all big phones, not small PCs), the dominant phone operating systems already have more unit volume than the entire PC space combined. (Android's passed a billion units and should do so _annually_ next year. As soon as it outgrows Dalvik, i.e. this generation's ROM basic, the desktop's a relic. Over on the iOS side the current iPhone has a 64 bit processor, bluetooth keyboards, and "airplay" to HDTV. Plus Apple's the one sponsoring LLVM. Yeah yeah, the PC will never challenge the dominance of the Vax, how'd that work out?)

So musl targeting Android (under a license that doesn't violate android's "no GPL in userspace" policy) makes it potentially important. If samsung installs musl in their android phones by default it would get literally _billions_ of installs before the decade is out. Posix influencing musl could provide great leverage. Posix influencing the next iteration of Illumos, a platform three people and a duck use? Not so much.

> Could you do a more general survey of "what nonstandard
> make extensions are most popularly depended on",
> including software with a variety of licenses?

I care about a non-gnu Make implementation building (roughly in this order):

1) Linux from Scratch.
2) Beyond Linux From Scratch.
3) The Android Open Source Project's giant horrible repo hairball.
4) bootstrapping debian/ubuntu/fedora/suse/gentoo/slackware/etc from source.
5) PCBsd
6) MacOS.

The first two because they're really easy, the third because it's the point of the exercise (make Android self-hosting before iPhone does; the PC took off once it didn't need to be cross compiled from a PDP-10. Once owning a PC was all you needed to develop for a PC all the way up and down the stack, it got an explosion of new software, and letting Apple become the next Microsoft controlling hardware _and_ software would suck).

The fourth is assisting in legacy migration: software people already wrote for Linux over the past 20 years is currently still of use, and it's low-hanging fruit to support it if you're extending android towards that anyway. (Maybe in a container. I expect posix to notice the existence of containers sometime after it notices the existence of mount points, the deflate algorithm, the passwd command...)

The fifth because I couldn't get Free, Net, or OpenBSD to install into a qemu image, and when I asked Kirk McKusick (he was keynoting Ohio LinuxFest) he said try PCBsd, and that one installed. (Couldn't make it _not_ use ZFS, but I can ignore it for now.) So there's a BSD to try. Android has a "no GPL in userspace" policy (if you add GPL or LGPL software that isn't grandfathered in like the kernel, you violate the Android trademark guidelines and can't call the result "Android" in your advertising), so here's a big pile of non-GPL code and developers to pull from. Or at least the parts Apple hasn't already swallowed whole back when they hired Jordan Hubbard and such to do MacOSX...

The last is because I can't ignore them. Partly it's giving that pile of users a migration path outside their walled garden, partly it's hedging my bets in case Apple does better without Steve Jobs than it did last time around, but mostly I can't ignore http://tech.fortune.cnn.com/2013/02/06/apple-samsung-profit-share/ . The iPhone was first mover in the smartphone space, still growing and profitable. Hasn't tanked yet, and Darwin's sort of nominally open source in the right light with a tailwind. Ish.

But if I didn't even list Oracle's fork of Red Hat Enterprise in my linux distro list, why the heck would I care about either of the people still using their post-closure Solaris? And what else _is_ there anymore? AIX? Did you notice how IBM just killed AIX support in their cloud offering due to lack of interest? (http://www.informationweek.com/cloud-computing/infrastructure/ibm-shifts-smartcloud-customers-to-softl/240163407) The only mention of AIX in IBM's most recent annual report was the phrase "AIX and Dynix" occurring once on page 111 in relation to the SCO trial. Speaking of which, Ralph Yarro's kamikaze against Linux killed off Netware and Unixware, which were the volume players.

Companies are paying billions of dollars annually to keep existing legacy systems running, but upgrading anything would just introduce risk, so how do new standards versions apply to them? Yes, there's a lot of _money_ getting thrown at keeping the old steam-powered stuff working, but what they're paying for is the privilege of _not_ upgrading it:

http://www.networkworld.com/news/2013/081913-unix-272728.html

The posix certification website is still talking about posix 2003 as something to aspire to:

http://get.posixcertified.ieee.org/
http://get.posixcertified.ieee.org/register.html
http://www.opengroup.org/certification/idx/unix.html (links to susv3, not v4)

Meanwhile, the smartphone market (where a linux fork's fighting a bsd fork) is now shipping a billion units annually:

http://www.forbes.com/sites/connieguglielmo/2013/09/04/smartphone-shipments-to-top-1-billion-units-android-ios-to-dominate-through-2017/

Everything else is noise. It might be a source of interesting ideas, but the 86open project (trying to come up with a standard binary format for x86 unix) gave up and said "it's Linux" back in the 90's:

https://web.archive.org/web/20000816002148/http://www.telly.org/86open/

At the time, the top players were:

https://web.archive.org/web/20010222223734/http://www.sdtimes.com/news/017/special1.htm
Unix Revenue: 38% Solaris, 24% HP-UX, 16% AIX
Unix Volume: 40% SCO, 22% Solaris, 11% HP-UX

Even ignoring SCO and Sun co-sponsoring lxrun to run linux binaries, and AIX 5L with Linux support, what's left of all of those combined ship fewer units per year than Android ships each _minute_.

And the niche players? I did a 6 month contract at Cray this year, up in St. Paul. (Hi Rose and Andrew!) They had me porting some of their driver code from SuSE Linux (the basis of the "Cray Linux Environment" that runs all their stuff these days) to Red Hat Enterprise Linux (because a third party only supports Lustre network filesystem server on that). They ported their legacy customers to Linux years ago.

So really, what are you standardizing? What are the criteria? I'd honestly like to know, because I've been subscribed to this list for months and haven't figured it out from context.

> I understand if you don't have the time (I don't), but if someone would
> be willing to do that, the results would be especially compelling.

I'm interested in having multiple compatible implementations of things that allow you to swap out parts, ala openssh/dropbear, glibc/musl, gcc/llvm. Posix's usefulness for this varies greatly. It was no use to dropbear, and implementing what posix said got musl about halfway. (In fact uClibc had to #define __GLIBC__ to get programs to compile, and musl/sabotage are jumping through hoops and pushing buckets of patches upstream to various packages to have the luxury NOT to do that.) Over in toybox, I have a list of packages I need to build linux from scratch (my record-commands.sh wrapper instruments the $PATH with a wrapper that lets me know every binary that was called, and with what command line, during the entire build) and the posix command list also gets me maybe halfway:

http://landley.net/toybox/roadmap.html
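(The wrapper trick is simple enough to sketch. Hypothetical WRAPLOG / WRAPPED_PATH variable names; this is the general shape, not the actual record-commands.sh implementation:)

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Installed as $PATH entries named after each real command (symlinks
// to this binary): append our argv to a log, then hand off to the
// real command found in the saved original $PATH.
int main(int argc, char *argv[])
{
  char *logname = getenv("WRAPLOG"), *origpath = getenv("WRAPPED_PATH");
  FILE *fp = logname ? fopen(logname, "a") : 0;
  int i;

  if (fp) {
    // One line per invocation: command name plus full command line.
    for (i = 0; i < argc; i++) fprintf(fp, "%s%c", argv[i], " \n"[i == argc-1]);
    fclose(fp);
  }
  if (origpath) setenv("PATH", origpath, 1);
  execvp(basename(argv[0]), argv);

  return 127;
}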

So yes, my observations would be my observations. I'm a child of the 8-bit world, and I see Linux the same way I saw OS/2: that thing we use until it's time to move to the next thing (apparently android; the "compile to a mix of html5 and javascript... no not applets or flash this time it's different" story doesn't quite gel for me yet).

I'm all for standards when they actually help, but really they're just a form of documentation that provides a frame of reference to diverge from. Posix defining a useless "make" command isn't much different than it defining "pax" instead of "tar" or "zip" or something anybody actually uses. When RPM based their package management on cpio and Linux based its initramfs on cpio, posix removed cpio from the standard and cpio usage went _up_. You still define sccs and ed. Nobody cares.

If you want to provide pragmatically useful documentation allowing projects like toybox/busybox/toolbox and musl/bionic/glibc to agree on not just a compatible but _usable_ baseline, I'm all for it. But I started programming on a Commodore 64: it's dead. So are the Amiga, Dos with Desqview, and OS/2. Yes I did SunOS code in college but I also took a course in Cobol and another in assembly language for a hewlett packard minicomputer with BCD and a "Zero and Add Packed" instruction, the last semester before they unplugged it and sold it for scrap metal. So what?

The corollary to moore's law is that 50% of your programming knowledge is obsolete every 18 months. The great thing about unix is it's mostly the same 50% cycling out over and over, but not entirely.

I want to prevent systemd from spreading into android. Posix is of no help. I want to promote use of containers instead of selinux. Posix is of no help. A simple 3D binding qemu can actually emulate? Promoting the sane 9p network filesystem protocol over the insane nfs and samba? No help there either.

I'm still having trouble understanding the worldview of this list.

Rob


October 26, 2013

In a previous post I described how the GOP's "southern strategy" poisoned the party. In the short term, bulking out their numbers by recruiting the racist dregs of the confederacy gave them the votes to win elections, but the new recruits were idiots, and voted for idiots. They were useful cannon fodder, but none of the new activists who joined the party in response to speeches about "welfare queens" and "Willie Horton" could be allowed to advance through the ranks to any sort of position of authority.

This is why Strom Thurmond, elected as a democrat in the 1940's on a segregationist agenda, switched allegiance to the GOP. When LBJ repudiated the racists, they flocked to the new banner. The confederacy remains a coherent political force to this day, they just keep the bedsheets and cross burnings to private invitation-only events.

The southern strategy started in earnest under Nixon, and the Nixon aides were the last generation of pure plutocrats predating the "we are special despite all the evidence, the world would collapse without us just because" recruiting campaign. Obviously Nixon himself thought he was above the law (hence Watergate), but nobody younger than that could even _pretend_ well enough to function if they were allowed to advance to any position of power within the party. The existing party members held on to power for decades through seniority and established authority, preventing the hordes of morons they'd collected from setting the agenda, but as the decades marched on time wore them down.

By the time of the first Bush presidency, the people in charge of the GOP were tired old men set in their ways, and the end of the cold war left them adrift. Reagan's strategy of recasting the cold war in financial terms and outspending the russians bankrupted the soviets and led to Perestroika, but the first Bush ran up more deficit in 4 years than Reagan had in 8, even though the berlin wall fell halfway through his term. After that the GOP old guard couldn't adjust their thinking to the new world where Clinton made peace in Ireland and got the head of the PLO to shake hands with the president of Israel; all they could offer was blanket opposition leading to Gingrich's government shutdown. They couldn't cope with being the only superpower, no "us vs them" mentality to hide their plutocratic self-enrichment agenda behind by rallying together against a common enemy. (Remember, all that accumulated deficit was money spent on something, did you ever wonder who cashed the checks? These days "graft" is called a campaign contribution, the cushy government jobs offered in Lincoln's day are now awarded as contracts, and "old fogey" has been renamed "conservative".)

When Bush Sr. appointees on the supreme court overrode the results of the 2000 election and appointed Dick Cheney president, these tired old men had a serious problem. The surviving members, Cheney and Rumsfeld and so on were in their 70's. They could promote their own children, and that's where they got an alcoholic C student who put the "duh" in W to stand under the spotlight while they ran the show. But what to _do_ with that power? They'd inherited peace and prosperity, the strongest economy in generations and no big well-defined enemy. The prosperity was sacrificed in an attempt to justify a further tax cut for the rich, and in his first major speech to congress their puppet king said "The growing surplus exists because taxes are too high, and government is charging more than it needs. The people of America have been overcharged, and, on their behalf, I'm here asking for a refund."

(No really, there's video of him saying this. W's prepared text presented the idea of a federal budget surplus as immoral exploitation of the american people, and intentionally ran the country back into heavy debt because that's what his father did, and Reagan before him.)

They tried to pick a fight with china, but our largest trading partner refused to be provoked. "Communists are the enemy" didn't work as an idea anymore, but they couldn't get past it until their own incompetence dropped a new enemy in their lap (one originally enraged by Bush Sr's handling of the first gulf war, an enemy that clinton had easily controlled. The GOP had of course condemned clinton's efforts against this nuisance as a distraction from the GOP attempts to impeach him over having an office affair with an intern. And then as soon as the GOP was back in office, their sheer incompetence let this nuisance get away with literally repeating an earlier attack. The enemy hadn't changed, only the defenders had).

Still, reclaiming the "us vs them" mentality let the tired old men in charge of the GOP coast for another decade on their old cold war playbook, but they weren't good at it. They needed more. More money (hence the spiraling debt), more information (hence the out of control spying), more power (Patriot Act, Guantanamo, waterboarding, assassination of US citizens without trial)... and it still wasn't enough for them to do even a mediocre job. Hurricane Katrina caught them completely by surprise. They were tired and old and slow and just no good at this anymore.

The Bush Jr. administration was Nixon's last hurrah. The GOP old guard is finally crumbling from sheer age and exhaustion, and letting the racist core of the party peek through. Their last presidential primary really was a who's who of the party: Santorum and Bachmann brought the racist crazy, Perry brought the libertarian entitlement, Gingrich was the dregs of the old guard, and they finally settled on another W-like "son of the old guard" plutocrat who would say anything to get elected but was so out of touch he couldn't keep his mouth shut about "the 47 percent".

And now the party's being dragged around by the nose by Ted Cruz, following in the footsteps of earlier GOP senator Joseph McCarthy. Cruz is a canadian hispanic who remade himself as an american white supremacist (while insisting he could become president even though he was born in Canada, because that's totally different from claiming that Obama's Kansas-born mother might have visited another country; Canada is full of WHITE people, so that obviously doesn't count).

The future of the GOP is people like Cruz. The dregs of the tired old guard ala John Boehner are now reduced to nothing but delaying tactics. More of the Nixon crowd retire or die every year. And in their place, the fruits of the southern strategy.

I don't know why more analysts don't point this out explicitly. The whole Balance as Bias thing means treating the two parties as equals doesn't _work_ anymore. There is a clear story about how one party poisoned itself and is now insane. This story is not being told or analyzed.


October 25, 2013

Got a note from Rich Felker (the musl libc maintainer) about a symbol visibility bug in gcc, where building the compiler with --disable-shared is broken these days. Except it's not with my toolchain, for reasons I'm not sure I understand yet.

Apparently, on a 32 bit platform where long long division has to suck in libgcc code to do it, if you have a main.c with:

#include <stdio.h>

extern long long foo(long long);

int main()
{
  printf("%lld\n", foo(100)/10);
}

And a lib1.c with:

long long foo(long long x)
{
  return x/10;
}

And you build it like:

gcc -O2 -shared -o libfoo.so lib1.c
gcc -O2 main.c ./libfoo.so
./a.out

That works. But then if you change the /10 to a /16 and rebuild the .so without rebuilding the a.out, running the a.out "fails with symbol errors" he says.

So I fired up aboriginal, built a root-filesystem-i686, chrooted into it, downloaded aboriginal again and built a simple-cross-compiler-i686, used that to build the above test stuff... and it all worked fine.

Possibly the multiple large hammers I hit libgcc.a and libgcc_eh.a with fixed this. Or maybe the version of gcc I'm using is before they introduced this bug. Or maybe I'm just not reproducing it right?

Odd.


October 24, 2013

Oops. The linux-arm-qemuirq.patch I added to aboriginal yesterday is aimed at the 3.12 kernel, and of course I haven't modified download.sh to fetch 3.12 yet because that hasn't _shipped_ yet.

Ordinarily I avoid checking in such patches before the kernel they apply to is out. (I used to have -alt infrastructure, but it was way more trouble than it was worth. I should do branches, but also: more trouble than it's worth. I dislike the way they put gratuitous merge commits in the history when there's no reason for it.)

So yeah, the aboriginal repository is broken until 3.12 ships. Sorry about that. Just delete the patch and it builds fine.


October 23, 2013

Trying out the -rc6 kernel, and commit f9b71fef12f0 broke arm IRQ routing _AGAIN_. And then 829f9fedee30 and 99f2b130370b were applied on top of _that_ to fix it.

Guys: this isn't funny anymore.

Sigh. The patch to fix it is simpler (down to two lines), but you'd think at some point they would RUN OUT OF GRATUITOUS CHANGES TO MAKE TO DECADE-OLD HARDWARE. They keep applying patch _series_ on top of other patch _series_ all still trying to finish this one tiny change they apparently still haven't got right, and breaking my fixes in the process...

Sigh.


October 22, 2013

Rewriting the "nl" command; posix as usual doesn't cover everything. For example, -l works in -ba and -bp modes (not just -bt, haven't tried -bn yet), and the consecutive blank line count persists across files, just like the numbering does. For non-matched lines the "separator" isn't printed but the line is still indented, so the indent is spaces only without the separator. And the default \t separator is one character, so it's converted to one space, so if you do "-w 2 -bpand" the left edge is a bit ragged.

Oh, and for bonus fun, given the way posix says to implement stuff ("%6d" and friends), if you do "nl -w 2 -v 42" you still get a space at the beginning. Even with -w 1, there's always an extra space at the beginning. Sigh.

*shrug* Adding it all to the test suite. Hitting a few things where the gnu/dammit version doesn't do the same thing and not sure I want to add tests for that. (What's "correct" behavior there?)

For example: the nl submission from Strake had a -E option, I wonder where that came from? Busybox doesn't have it, the gnu/dammit one doesn't have it... Odd. (It makes sense, it enables extended regular expression syntax for the -bpREGEX mode, but if I add a test for it non-toybox versions will fail the test suite, so the test proves what exactly?)


October 21, 2013

Setting up a new xubuntu LTS system, time to renew the checklist. This is xubuntu 12.04. If you're stuck with the Metro desktop in stock ubuntu, the first thing to do is KILL UNITY WITH FIRE and install xfce and remove the broken scrollbar residue with a claw-hammer:

Then we do:


October 20, 2013

New day job starts tomorrow, full-time permanent position with benefits. The commute's an hour each way, so I'm unlikely to have much time to do open source stuff. (The Cray job was literally next door to my apartment, the commute was about 300 feet.)

I'd thought about doing a crowdsourcing campaign, a 90 day thing starting this month and ending in the new year so that if by some fluke I did get a year's worth of money to work on stuff, it would be in a new tax year. But when I posted about the idea it got zero replies. (I know there are companies using this stuff and having their own engineers work on it, but I'm not one of them. Oh well.)


October 16, 2013

How creepy are makefiles? Let me count the ways...

PHONY =
.PHONY: $(PHONY)
...
PHONY += $(BLAH)

The .PHONY target winds up depending on the contents of $(BLAH). For this to happen, the rule's prerequisite list must be expanded after the second assignment, but the second assignment must occur after the first one. It's not _just_ mixing imperative and declarative code, it has subtle, complex, and non-obvious sequencing rules for the declarative bits too.


October 15, 2013

Cringely just wrote about moore's law and the dotcom crash, and he got the reason for the crash wrong. I had just finished writing Motley Fool columns at the time, and sent in one more to my editors about it, which they never used.

In the late 90's the tech market got softened up by three things:

Y2K caused a burst of last-minute panic spending that everybody projected as continuing into the future. In reality, spending the next year was _down_ because everybody blew their budgets in one giant surge, and synchronized their cyclical upgrades so they didn't need to spend any more money on infrastructure for about 3 years. Tech vendors plotted the growth curve forward and quadrupled their staff to deal with demand that never materialized, then had to lay everybody off again.

The rise and fall in customer demand remained somewhat synchronized for the next couple upgrade cycles, with bursts in 2002-2003 (meaning XP "took off" around service pack 2), and again in 2006ish (meaning Vista shipped in a trough between upgrade cycles, whose customers couldn't wait around for it and then didn't need it).

Second, Microsoft got added to the Dow Jones Industrial Average right before it crashed and flatlined, due to copying Cisco's "pooling" method of acquisitions and triggering an SEC rule that forbade it from trading in its own stock for 6 months to offset the dilution caused by its stock option income tax benefit. Translation: MSFT stock fell 50% in 6 months because an accounting trick blew up in their face, and by the time they got it back under control their "aura of invulnerability" had popped (microsoft stock never goes down! Oh wait...) and the market had lost confidence in the stock as a safe institutional asset to invest things like pension funds in.

Third, the 2000 election turned into a giant unresolved mess for a month longer than expected, which screwed up "discounted cash flow analysis". DCFA is a price measuring technique most of the big investors use where they treat an investment as a series of future payments (such as a corporation's quarterly per-share earnings), each of which is converted ("discounted") into the amount you'd have to stick in a savings account today at a known interest rate to have that much in the future. So the farther in the future it is, the less it's worth now, and at some arbitrary horizon you stop counting because the future is uncertain, so that money's not "real". What interest rate do you use for discounting, and how far into the future do you look? The first is "what you can get by investing elsewhere" (average of the rest of your portfolio's historical performance), and the second is based on how predictable the income is. The hundred year old Coca-Cola company has a high price to earnings ratio because investors trust it to stay in business forever, so they include lots of future payments in what they're willing to pay for it today.

DCFA is the mechanism by which uncertainty about the future lowers stock prices: the less certain the future is, the fewer future payments investors include in their math to determine what they're willing to pay for a given asset. And the month of "hanging chads" in 2000 cratered the stock market.
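As a toy illustration of how the horizon drives price (hypothetical numbers, nothing from the actual column; link with -lm):

#include <math.h>
#include <stdio.h>

// Each future payment is "discounted": worth payment/(1+rate)^t today.
// Sum them out to the horizon; beyond that the future isn't "real".
double dcf(double payment, double rate, int horizon)
{
  double total = 0;
  int t;

  for (t = 1; t <= horizon; t++) total += payment/pow(1+rate, t);

  return total;
}

int main(void)
{
  // Identical $1/year payments: trusting a company for 30 years makes
  // it worth about three times as much today as only trusting it for 5.
  printf("5 years: %.2f  30 years: %.2f\n", dcf(1, .07, 5), dcf(1, .07, 30));

  return 0;
}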

But so far, all of that was just wall street noise. None of it affected the larger economy. This softened up the dot-com boom, but isn't what broke it. What broke it was The Shrub being an idiot.

When the Bush I appointees on the supreme court voted in a solid block to override the results of the election, suddenly W and Cheney were big news, and the news crews flooding out to film them found them both on tour giving stump speeches about Imminent Financial Doom. Why? To justify their giant tax cut for rich people and the corporations who'd paid for their campaign. (Remember, El Shrubbo had raised more money in the _primary_ than anyone ever had before.) Of course this tax cut would be massively inflationary in the full-employment, supply constrained economy they'd inherited from Clinton, and thus congress wasn't willing to pass it. So Bush and Cheney gave a series of speeches insisting there was an upcoming recession nobody else had heard about, and only a giant tax cut for the rich could possibly ward it off.

This was news to everybody else, but thanks to the supreme court they were all over the media. Nobody else had heard of an upcoming recession, but their repeated assertions of doom and gloom (any day now, sufficient to justify a massive tax cut for the ultra-rich) were the top story.

With talk of recession suddenly all over the news, corporate decision makers started belt tightening to prepare for it, just in case these clowns knew something they didn't. Of course you couldn't lay anybody off just then, with unemployment so low people were impossible to replace. And they couldn't cut inventories after a decade of "just in time" delivery (corporate fad du jour) optimizing supply chains until this morning's parts deliveries shipped out as completed items at the end of the day; there were no inventories to cut! And they'd learned that slashing R&D budgets was horribly disruptive, with a one month shutdown costing you a year of lost productivity as research teams broke up and long-running experiments got cancelled partway through...

But advertising could be cut. You could flip it on and off like a switch, just don't run your ads this month. So _everybody_ cut advertising, all at once.

I was writing columns for The Motley Fool at the time. 80% of their income was from advertising, and between December 2000 and January 2001, their advertising income fell 50%. Sudden, sharp, stop.

The problem is, the web economy is a publishing business. Television is based on advertising, radio is based on advertising, magazines and newspapers were based on advertising. Fish and chips came wrapped in newspapers because buying newspapers was _cheaper_ than buying the same paper blank. The cover price of magazines just defrays the cost of printing and distribution a bit; they're lucky to break even on that. All the money came from advertising. What websites did was figure out how to publish magazines and newsletters (and later radio and television) without the costs of printing and distribution.

Today the web is still publishing, the sequel to the printing press. E-readers replace books, webcomics replace comic books and strips, even amazon is basically the sears catalog. Mail-order without the mail part.

It wasn't just the dot com companies hurt when Bush and Cheney cratered the advertising market. Most of the broadcast TV networks spent 2001 in the red, and switched en masse to "reality shows" because they were cheaper to produce. Ancient established magazines like Ms Magazine (from the early 1970's) and McCall's (over 100 years old) went out of business.

But the dot-com businesses were young and vulnerable, and got hit hard. About 1/3 of the dot com businesses never had a business model in the first place (pets.com and Dr. Koop.com come to mind), and deserved to die eventually... but not _all_at_once. The synchronized collapse did damage that "this week's bankruptcy" wouldn't have.

Another 1/3 had the potential to grow into something real, but were overextended due to Y2K or simply attempting the "Amazon vs Ben and Jerry's" land grab: become king of the mountain before anyone else can. Given a meteor strike extinction event, they never had a chance.

Even the 1/3 of the dot com businesses that were well run and profitable, with little or no debt, got hit hard. When fool.com's revenue declined, they immediately brought expenses back in line with revenues by laying off half their staff. Being well run with no debt just meant they _survived_. (I'd left a month earlier when writing for them stopped being fun because my bosses were suddenly stressed and unhappy. I had a day job, writing for them was just a well-paid hobby, and they started to make cost cutting noises along with everybody else bracing for Mr. Bush's promised recession, even if it took a couple months for the damage to show up in their invoices. I didn't know what was up, but I didn't want any part of it either.)

My friend Eric Raymond was on the board of directors of VA Linux, another company in the top 1/3. They'd had a record-breaking IPO, solid financials, and when Bush and Cheney did their thing 3 of their 5 largest customers went out of business in a 2 week period... and that was just the tip of the iceberg. Those companies went under owing VA money, due to (then industry standard) 90 day credit terms where they received hardware and had 3 months to pay it off. Suddenly VA's invoices were tied up in bankruptcy court, where they might see pennies on the dollar years from now if they were lucky, but VA still had to pay its own suppliers for the parts.

Worst of all, the hardware VA didn't get back was sold at auction (again for pennies on the dollar), so now the hardware market was flooded with VA's own brand new equipment. The remaining customers didn't have to buy from VA, they could get brand new VA rackmount servers much cheaper at all these bankruptcy auctions. Nobody had to buy ANY new hardware with all this brand new used stuff everywhere at fire sale rates.

Faced with unpaid invoices, far fewer customers, and competition from its own hardware being sold cheaply at auction, VA exited the hardware business to focus on Sourceforge. They just barely survived, by becoming much smaller and changing their business model entirely.

That same year, Dell laid off 17,000 people in Austin (headline news in the Austin American-Statesman), and Intel and AMD both idled their fabs until the piles of accumulating microchip inventory could be sold off. They weren't small and agile enough to switch industries, they had to ride it out. The shockwave that unseated VA hit them hard as well, and from there spread out into the economy to become a general recession.

Bush and Cheney of course responded by trying to find a foreign policy issue to distract everybody from the immediate mess they'd made domestically. Having screwed up the prosperity Clinton left them, they went on to screw up the peace he'd engineered (getting the IRA to stop bombing Ireland and arranging the handshake between the head of the PLO and Israel). They started by picking a fight with China over a spyplane, but China wasn't interested. And of course their own basic incompetence meant that the same terrorist who had set off a bomb in the world trade center parking lot in 1993(?), who Clinton sent cruise missiles after in 1998, and who was stopped at the border during the millennium celebrations... this easily suppressed clown went on to succeed big time under Bush's watch.

The "9/11 truther" morons seem to have forgotten that the Katrina aftermath was the exact same incompetence as the run-up to 9/11. "No, they really are that stupid." It's hard to believe, but the dot-com bust was just more Bush/Cheney stupidity, easily understandable by people who aren't them.

Which is why the GOP has such elaborate PR machinery to blame everything they do on someone else.


October 9, 2013

Actually building linux from scratch's HTML book from source is annoying. It's got a makefile but "make" just validates the xml and doesn't actually produce any output. I have the prerequisites installed for the kernel's "make htmldocs" but in addition it wants some package called "tidy". There's no "make help", but a quick glance at the contents of the file shows a "nochunks" target, and when you specify that it craps files into your home directory. It has three different variables specifying it two different ways (via ~ and $HOME) so to get it to build in _this_ directory you have to go:

make BASEDIR=$PWD/lfs-book DUMPDIR=$PWD/lfs-book RENDERTMP=$PWD/tmp nochunks V=1

Of course there's no "make clean" either...


October 8, 2013

For a long time I've used 12 hex digits of git commit IDs when cut and pasting, because 8 seemed too likely to collide due to the 'birthday problem' and 12 was the next logical stop (as evidenced by mac addresses). (A group of 23 people will have a pair with the same birthday more often than not, because the second roll of a 365 sided die has to match 1 number but the 3rd roll can match 2 numbers, the 4th roll can match 3 numbers, and by the 23rd roll you've passed 50% cumulative likelihood of a collision so far.)
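Quick sanity check of the hash math in C (the object count is an invented round number for a busy repository; build with -lm):

#include <stdio.h>
#include <math.h>

// Birthday problem applied to hash prefixes: odds that n random commit
// IDs collide somewhere in their first k hex digits (16^k buckets).
int main(void)
{
  double n = 4000000; // invented guess at a big repo's object count
  int k;

  for (k = 8; k <= 16; k += 4) {
    // standard approximation: P(collision) = 1 - e^(-n(n-1)/2m)
    double p = 1 - exp(-n * (n - 1) / (2 * pow(16, k)));
    printf("%2d hex digits: %.2f%% collision odds\n", k, 100 * p);
  }
  return 0;
}

With those numbers, 8 digits is essentially a guaranteed collision, 12 digits is down around a few percent, and 16 digits rounds to zero.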

Linus Torvalds just made it policy, for exactly that reason.

No real point, just amused. And yes, still really behind on my email.


October 7, 2013

You know how git automatically starts garbage collection at random times? It doesn't emit a warning message saying you've gone too long and should garbage collect, no. It gratuitously starts a 10 minute operation (on rotating media with a repo the size of linux kernel) at random times when the user is trying to do something.

This is what Linux distros did back in 1998 with fsck. Oh, you didn't want to fsck this reboot? Too bad. If you interrupt it, your system is hosed. Have a nice day.

The git user interface is so horrible it's reintroducing user interface mistakes from literally the previous century.


October 6, 2013

So blkid came in, yet another command that's not in the roadmap. Sigh. Cleaned it up as best I could. (Forgot to change my silly names back to professional names before checkin, I tend to use insane variable and label names during development, but "murderfs" has to be reported as "reiserfs" for external compatibility reasons. Why, I'm not sure, but apparently somebody's still using it.)

Trying to work out the list of block based filesystems in the kernel tree. (Don't care about fuse-only filesystems, or things that aren't merged into the tree.) In theory linux/fs has one directory per filesystem; in practice lots of them are sysfs or nfs or hugetlbfs, things that aren't block backed filesystems.

Searching the tree for sb_read gets some filesystems, but not cramfs. That uses read_mapping... but that doesn't get btrfs. I've been staring at btrfs/inode.c and I still have no CLUE what read functions it's calling through its numerous horrible layers of indirection and cacheing.

In theory the filesystem registration function has to indicate whether or not it's block backed so /proc/filesystems knows to say "nodev". Ah, they have FS_REQUIRES_DEV in their flags. Right, that's easy enough to search for.
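And the result is visible from userspace anyway; here's a quick sketch (plain C, nothing toybox-specific) that lists just the block-backed filesystems by filtering out the "nodev" lines:

#include <stdio.h>
#include <string.h>

// /proc/filesystems prefixes every entry registered without
// FS_REQUIRES_DEV with "nodev", so keep the lines that lack it.
int main(void)
{
  char line[256];
  FILE *fp = fopen("/proc/filesystems", "r");

  if (!fp) return 1;
  while (fgets(line, sizeof(line), fp))
    if (strncmp(line, "nodev", 5)) fputs(line, stdout);
  fclose(fp);
  return 0;
}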


October 3, 2013

Accepted a job offer yesterday, today filled out paperwork for a couple weeks of background checks before I can start.

Trying to finish stuff up before the new thing eats all my time. A bit wistful that my proposed crowdfunding campaign got zero replies when posted to the list, but oh well. It's always been a volunteer thing, why would that change? (Of course, the kernel guys boggle at the very idea that anybody still works on open source stuff without being paid for it, because they really are that clueless. Oh well, looks like a self-correcting problem in about 20 more years.)

No, I'd just like it if all my toybox time wasn't getting sucked away into an endless stream of command submissions that ARE NOT ON THE 1.0 ROADMAP. Or at least ones that don't open huge can-o-worms issues like "this is a pair of commands that are usually part of a suite of a dozen commands, does merging this commit me to merging the rest?" (In this case, tcpsrv and udpsrv, part of Denys's preferred nonstandard init suite, which is not what android does, not what system V traditionally did, not systemd, not upstart, and of course not oneit.)

That sort of meta-issue is making it... really hard to care about the project just now. (Part of that's the job hunt talking, of course. Job hunting always makes me want to huddle and play video games the rest of the time. Switching from "decision making without enough information" to "looming uncertainty via distant third party scrutiny" ought to clear that right up, no?)


September 28, 2013

The sparc build is unstable. I think it's a qemu issue, but I'm not sure.

The problem is that if the host system is doing anything else, the sparc build tends to hang. The most recent time it hung, it dumped voluminous crap:

checking for signbit compiler built-ins... yes
esp: esp0: Aborting command [fa181e60:2a]
esp: esp0: Current command [fa181280:35]
esp: esp0: Queued command [fa181e60:2a]
esp: esp0: Active command [fa181280:35]
esp: esp0: Dumping command log
esp: esp0: ent[30] CMD val[01] sreg[83] seqreg[04] sreg2[00] ireg[10] ss[00] event[0d]
esp: esp0: ent[31] CMD val[11] sreg[83] seqreg[04] sreg2[00] ireg[10] ss[00] event[0d]
esp: esp0: ent[0] EVENT val[0b] sreg[83] seqreg[04] sreg2[00] ireg[10] ss[00] event[0d]
...
esp: esp0: ent[28] CMD val[90] sreg[82] seqreg[04] sreg2[00] ireg[10] ss[00] event[01]

esp: esp0: ent[29] EVENT val[02] sreg[82] seqreg[04] sreg2[00] ireg[10] ss[00] event[01]
esp: esp0: Aborting command [fa181280:35]
esp: esp0: Current command [fa181280:35]
...

Rinse, repeat.

It actually seemed to recover and continue with the ./configure eventually. (Building gmp I think.) But I stopped it because: dude.

Oddly enough, that message isn't from qemu, it's from the Linux drivers/scsi/esp_scsi.c. Combined with the hangs (qemu eating 100% but the sparc build no longer even listening to the keyboard and no progress if left overnight), this sounds like some kind of IRQ handling problem. Which is probably a qemu issue?

Sigh. I haven't got a "known good" to compare against. And this isn't a right-or-wrong situation, this is "kernel does not match qemu", and I haven't got real hardware to test on.

Guess I try old qemu versions and old kernel versions and see if either feels like behaving? I think the fixes I did that made it work were all userspace stuff, so the root filesystem should... hmmm.


September 27, 2013

Linux on the desktop really sucks. Here's "reason du jour".

I bought a laptop from a company called "system76" that has Linux preinstalled. It's an obscure little web company that's probably two guys in a garage somewhere, but Linux is about as well supported on this machine as you can possibly expect. This laptop has a VERY LOUD FAN when I do something CPU intensive (even just playing FTL, which is using half of one CPU on an 8-way box) and it's plugged into wall current. But when it's operating off _battery_, it makes no noise at all. This is apparently because it clocks the processors down from 2400 to 1700mhz, but the performance is still pretty reasonable, so I'd like it to do that, and not spin up the very loud fans, when it's on wall current too.

I've spent over an hour researching this and can't figure out how to do it. I've looked at xubuntu's battery icon and the power saving preferences under there. I set the cpufreq scaling_governor to "conservative" and to "powersave". I've echoed 1 to /sys/devices/system/cpu/cpufreq/ondemand/powersave_bias, and to ignore_nice_load.

There's a scaling_setspeed but it says "unsupported" (even though scaling_min_freq and scaling_max_freq exist and have values) unless I set the scaling_governor to "userspace", whereupon it lets me spin the knob but it's not hooked _up_ to anything. Setspeed shows a value of 1200mhz (the minimum in the above range), but /proc/cpuinfo still says 2400mhz when on wall current (fans roaring) and 1700 when on battery (quiet). Writing 1700000 to it (the cpufreq sysfs files are in kHz, hence the extra zeroes) causes it to remember this value, but it has no effect on how fast the processor is going.

Googling is of course useless, both because every distro has its own userspace tool for this (each with its own wiki; I've read the ones for debian, gentoo, arch, and fedora, and _tried_ to find ubuntu ones but just found users talking to each other in their message boards), and because the low-level knobs those tools actually use to tell the kernel what to do have been rewritten from scratch something like five times in Linux. These days it's all processor-specific weirdness where even different intel chipsets have completely different mechanisms for doing the exact same thing, because Intel has the resources to do it over from scratch repeatedly. And even reading up on acpi, speedstep, boost, and all that crap... it just doesn't help. The vendors contribute code which is utterly specific to a single chipset and doesn't work like any other chip, so when you try to tell it "don't change behavior between being plugged into the wall and on battery, work like you're on battery all the time", there's no one generic place to say it.

I'm not an accountant or nurse or garage mechanic just trying to use a computer. I'm an embedded Linux developer who _assembles_ systems like these at a very low level. And I can't make this darn thing BE QUIET. So I'm burning out my battery by having it always unplugged when I use it and going back to my netbook while it's charging.

Note that laptops have been outselling desktops since 2005. (They surpassed them in revenue in 2004, in unit volume the following year). So these days, the rectangular corporate workstation Linux is designed around, which can be loud because it's under the desk and not near the user's face, is a rounding error in the larger scheme of things. The iMac was 15 years ago guys. The majority of PC _users_ have been on laptops and netbooks for about a decade (and this is ignoring phones and tablets to focus on the historical PC niche). And in this niche, Linux as a desktop OS is missing obvious stuff like some way to tell the fans to shut up. The people who designed the driver for this chip are a full decade behind the curve.

With all due respect to 7-up, "Linux on the desktop: Never happened. Never will."


September 26, 2013

Patrick at gentoo has said for a long time that patch 2.7.x doesn't work with Aboriginal's linux kernel patch stack, but I couldn't reproduce the issue. He gave me a login to his server, and it didn't happen there. Then I found out the login he gave me has a "~/local/bin" directory full of busybox commands (I don't think I put those there), so of course it didn't happen there.

So I built 2.7.1 from source, and reproduced the failure. And I went through the deeply horrible process of bisecting the gnu patch git repository (just _try_ googling for that; turns out it's on savannah) where the build changed from running "autowank.sh" to do the horrible autoconf and automake stages to doing "./bootcrap" to do the same, except that at certain points you have to redirect stdin from /dev/null or else it hangs, and obviously they can't just depend on gnugnugnudammitlib as a prerequisite shared library but must do "git heroin-addict inject" to copy the gnugnugnudammitlib into a subdirectory of the build...

I may not be _entirely_ happy having to get gnu all over me.

Anyway, I finally bisected it down to this innocent looking commit which copies yet more Linux features into gnu stuff (in this case teaching it git file rename syntax), which is what broke this patch's ability to apply to the 3.11 kernel.

For a while I thought it was leaking state between hunks (the patch introduced a new variable that didn't seem to be getting cleared, the hunk at the end modifying the file it's complaining about clearly isn't deleting said file), but what's actually confusing it is:

diff --git a/kernel/Makefile b/kernel/Makefile
index eceac38..f00be6d 100644
--- /dev/null
+++ linux/kernel/mktimeconst.c

Those first two lines are garbage left over from a previous version of the patch that got copied along with the description at the top when the patch was rediffed. They _were_ purely decorative, until the gnu guys decided to copy new features from git and did it wrong.

I might cut them a little more slack if Stallman didn't describe himself as 'Principal developer of the operating system often inaccurately called "Linux"'. Obviously, since Stallman thinks he's Linus Torvalds (or possibly Napoleon), he should totally know this stuff already...

Anyway, it only ever hit the EXTRACT_ALL=1 ./download.sh case (which extracts all the packages using the host toolset, before host-tools builds busybox patch and tar). However, more/buildall.sh does this when building all architectures (to avoid duplicate work and possible race conditions with each target extracting and patching source tarballs in parallel as needed to populate the package cache). And since most users other than me just build individual targets rather than all of them at once, it doesn't come up much.

Anyway, fixed now...


September 22, 2013

Catching up on my email somebody on busybox was complaining about an ls bug and I checked to make sure toybox didn't hit it, and the bug report sort of implies we would, but not in a way that matters. It's basically "the libc you're using isn't compatible with the system's uid/gid tracking". Busybox was built against either uClibc or glibc on an android system which doesn't have /etc/passwd (but instead does some horrible windows registry database thing stored in magic extended attributes). Or maybe it was an older android version that doesn't track this info at all (or stubs out the standard functions to query them). In any case: build environment problem, not busybox problem.

But that's not what creeped me out. This is what creeped me out. In busybox ls.c, the usernames are printed by this code:

column += printf("%-8.8s %-8.8s ", get_cached_username(dn->dn_uid), get_cached_groupname(dn->dn_gid));

A cacheing wrapper around getpwuid(uid). Busybox is now implementing its own cacheing layer on top of what libc provides, for something that isn't remotely performance-critical.
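For reference, the shape of such a wrapper is roughly this (a hypothetical sketch of the idea, not busybox's actual code):

#include <pwd.h>
#include <stdio.h>

// Hypothetical uid->name cache so repeated lookups skip libc entirely.
static struct { uid_t uid; char name[16]; } cache[16];
static int cached;

char *get_cached_username(uid_t uid)
{
  struct passwd *pw;
  int i;

  for (i = 0; i < cached; i++)
    if (cache[i].uid == uid) return cache[i].name;
  if (!(pw = getpwuid(uid))) return "?";
  if (cached < 16) {
    cache[cached].uid = uid;
    snprintf(cache[cached].name, sizeof(cache[0].name), "%s", pw->pw_name);
    return cache[cached++].name;
  }
  return pw->pw_name;
}

All that extra machinery, to avoid re-asking libc a question it already answers quickly.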

I no longer understand the design goals of busybox.


September 18, 2013

Cut a toybox release, and the release notes included a very long list of commands that went into pending. So I thought I'd look at toys/pending/*.c sorted by size and take a "shortest job first" approach to getting a few of these cleaned up.

The smallest is watch.c, which is problematic because it needs the same interactive line parsing logic as less and vi and such. You can't let it output endless data or it'll scroll off the screen and take who knows how long to finish. You also can't wait forever for it to terminate or the redisplay timing means nothing. So it's basically a reskinned less, and we haven't got less yet. (Or shell command line history, or... there's a missing bit of plumbing here that's been a todo item forever.)

Next up: logger. Except that one of the first lines of logger is an #include to get logger_lookup(), which lives in syslogd. That's... not right. So far all shared code (even between just two commands) lives in lib, unless the commands can be shoehorned into living in the same file with one being an OLDTOY of the other, ala cp/mv, md5sum/sha1sum, chown/chgrp, id/groups... I'm not sure if the right fix is to put logger and syslogd in the same command file, or to move this code into lib, but having one command file suck code out of another is not maintainable.

Next up is nl. Finally, a command that doesn't have larger design issues outside cleaning up the contents of the file.

Ok, the help text needs redoing. I recall now from my initial read of nl that it's a horrible archaic command from the days when non-postscript typesetting seemed like a good idea; its numbering has "head/body/footer" state that makes no sense after about 1987. (This command is basically another catv mode.) Ok, the subset of flags chosen to implement seems saneish, compared to the legacy crap posix is dictating.

Globals outside the globals block. No thanks. A hardcoded length 5 fmt[] when you can specify the length from the command line, that should end well. And if you go -n '' then technically it'll access unallocated memory looking at n[1]...


September 15, 2013

The PC-BSD guys were there at Ohio LinuxFest (Kirk McKusick was closing keynote speaker), so I thought I'd give installing BSD under QEMU another shot. It would be nice to have an image to test in. They said that PC-BSD was easier to install under qemu than free, open, or net (all of which I've tried and failed to get working).

Problem one: my 8 gig image is less than the 50 gigabytes they recommend, so the install may fail. (This is a pop-up.) I haven't got 50 free gigs on the netbook I've brought with me, and don't value this image 50 gigs worth. Apparently they can run from a 3.6 gig iso image, but not install into more than twice that. And being installed under kvm never occurred to them.

Problem two: when I got to the disk partitioning thing, the readout of what it's about to do says "FileSystem: ZFS". I don't want evil sun/oratroll crap, so I want it to use the traditional UFS. And... I can't change this. Selecting basic or advanced partitioning, neither lets me change filesystem type. (I have a checkbox to force zfs 4k block size and set zfs pool name, whatever that means, but the assumption that YOU WILL USE ZFS goes bone deep in this installer. I'd have to pull up the command line tool to change it, and this is BSD so I don't want to go there as my first experience.)

Why did it TELL me this if it wouldn't let me CHANGE it? To rub my nose in their new motto, "THIS IS BSD, YOU HAVE NO CHOICE"?

This was approximately the point at which I lost interest again, but I can click "next" and let it do its thing in the background and MAYBE care when it's done.


September 13, 2013

Sorry the blogging's getting so sparse, my work environment's split between two machines (the old netbook which still has the Horrible Email Client on it, and the new much faster machine I refuse to install anything as broken as Balsa on). The master of the website is now on the new machine, meaning when I'm on the old one and feel like blogging it goes into a separate file to be collated later. Meaning actually posting it goes on the todo list...


September 12, 2013

Working on toybox and aboriginal releases (nothing special, just 3.11 kernel is out, merge window closing, time to flush the queue). My initmpfs patches went in, and Al Viro immediately found something wrong with them. (As foretold in legend and song.)

Now getting on a plane to speak at Ohio LinuxFest on Saturday (the rise and fall of copyleft), and hopefully hang out with the musl maintainer (who's giving his talk Friday).


September 5, 2013

Poking at the email backlog, I'm up to Jacek Bukarewicz's three part opus on mkdir, chown, and env. Specifically, I'm most of the way through addressing his mkdir issues, and when updating the test suite, I hit a really WEIRD one where the permissions were wrong... but it wouldn't reproduce from the command line. It worked fine for manual invocation, but _not_ from the test script.

Hours of head scratching later and laborious in-situ reproduction and instrumentation, the bug turns out to be before the mkdir -p -m even gets called. It's in the rm line _before_ that, where a previous test created a directory "one" then did an "rm -rf one" which exited successfully but didn't actually remove it.

I admit I haven't tested rm -rf as thoroughly as I'd like, because it's a command that can very easily eat your system if you do it wrong. (Just a cp -R that traversed .. did plenty of damage, thanks.)

Once again, strace is your friend when trying to determine what a program is actually DOING.


September 2, 2013

I implemented scripts/single.sh in toybox, which builds a single command without the multiplexer overhead. They're mostly in the 10-15k range on x86_64, so I checked in the /bin directory for files smaller than 15k, filtered out the ones that weren't ELF binaries, and did manage to find a few that are smaller on ubuntu than in the toybox version.

For example, /bin/chvt is 10480 bytes, and toybox single chvt is 14632 bytes. The reason is the toybox one is sucking in the argument parsing logic, because it takes an argument. It's not a flag, but there is one argument and the optarg stuff checks that there's one and only one argument supplied. Which means it pulls in gotflag (690 bytes), parse_optflaglist (731 bytes), and get_optflags (762 bytes), for a total of 2183 bytes, which gets rounded up to about 4.5k in the binary.

In theory, I could make a libtoybox.so but in practice? The point of standalone mode is to be standalone. If you need a shared library, you can just use the darn multiplexer.

So yeah, I'm ok with losing on size to the smallest commands because there's a tension between "being simple" and "sharing common code" and that tension sticks out in extreme corner cases. Building standalone is the extremest corner case for this tension. I could implement special case logic for this, but that shoots "being simple" in the face, so no. I'm aware of it, and it's 4k on a 64 bit platform.

(Yes, this is what optimizing "simple" over "size" looks like. Busybox did a shared library for this, filled the code with #ifdefs and funky __attribute__(thingy) macros, micromanaged ELF section assignments, and so on. I'm not doing any of that, I'm just trying to write slightly clever but otherwise straightforward C code.)


September 1, 2013

Didn't go to worldcon. When it came down to it, I couldn't quite work up the interest after my experience with their concom mailing list. (Apparently worldcon is one of those "never see how the sausage is made" things.)

More umount pokery: the -r option requires parsing /etc/mtab to get the old flags, kinda implying that umount is an OLDTOY() of mount. But I should be able to umount something that isn't in /proc/mounts (such as when /proc isn't mounted), and I'd rather not have two codepaths. Another bit that's shared with mount is the escape parsing for /proc/mounts strings (\040 is space).

The format of the -v option isn't documented, it looks like "$DIR has been unmounted". I'm guessing -a just does that for each one. Think I'll drop the "has been"; the command name is "umount" so "umounted" doesn't require an enormous leap on the part of non-english speakers as an untranslated string. (And -v shows the unescaped string so probably not a lot of parsing going on with the output.) Ah, and remounting read only with -r has a different message. Of course it does.

Trying to work out an arrangement where the "umount one two three" and "umount -a" codepaths aren't implementing the same thing twice. The first has to fall back to the second because -r iterates through existing mount points (to fetch the flags), but both have -v...

Actually a readonly remount doesn't care about existing flags, because MS_RMT_MASK is (MS_RDONLY|MS_SYNCHRONOUS|MS_MANDLOCK|MS_I_VERSION), and none of the other three matter if the filesystem is readonly.
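So the -r fallback boils down to something like this sketch (assuming stock mount(2) semantics, error handling elided):

#include <sys/mount.h>

// umount -r: if the unmount fails, fall back to remounting read-only.
// MS_RMT_MASK means no other flag survives a ro remount anyway, so we
// don't need the old flags from /etc/mtab just for this.
int umount_or_ro(char *dir)
{
  if (!umount(dir)) return 0;
  return mount("", dir, "", MS_REMOUNT | MS_RDONLY, "");
}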


August 31, 2013

The umount command's -t option is funky. It selects types for -a to trigger on, and can specify "no" at the beginning of a type, so you can do "umount -at proc" or "umount -at noproc", and you can specify comma separated multiple entries ("umount -a -t noproc,nosys"). So... what happens when you mix positive and negative entries? "umount -a -t proc,nosysfs"? Do you get everything but sysfs (so proc is redundant), or do you get just proc (so nosysfs is redundant), or an error message?
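For what it's worth, one plausible interpretation looks like this hypothetical matcher (strtok chews up the list in place):

#include <string.h>

// Match fstype against a -t list mixing "type" and "notype" entries:
// "no" entries exclude, and if any positive entry exists the type must
// be explicitly listed. An all-negative list matches everything else.
int type_matches(char *list, char *type)
{
  char *s;
  int positive = 0, match = 0;

  for (s = strtok(list, ","); s; s = strtok(0, ",")) {
    if (!strncmp(s, "no", 2)) {
      if (!strcmp(s + 2, type)) return 0;
    } else {
      positive = 1;
      if (!strcmp(s, type)) match = 1;
    }
  }
  return positive ? match : 1;
}

Under that reading, "proc,nosysfs" gives you just proc (and the nosysfs is redundant). But that's one guess among several.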

All these corner cases, still no specification. (I didn't implement -t in busybox umount, so this didn't come up there, but it's in the man page I'm looking at now, so...)


August 30, 2013

Not at worldcon because my driver's license is expired, Fade's too busy with homework to go, and Camine's never really gotten comfortable driving. Probably take a greyhound bus for the last day of it, but I'm not _hugely_ put out about missing it after my experience on the worldcon mailing list.

I signed up back in 2011 to help out at this year's con suite, and only stomached about a month of the mailing list before fleeing in terror. Let's just say I experienced the greying of fandom firsthand.

On said list, I mentioned I bought the complete set of panel recordings the last time worldcon was in San Antonio (1997?) and putting them up on the website as a weekly podcast might be fun. The reaction was sheer _horror_. How dare those official recordings still exist? That's incriminating evidence of public performances in rooms full of dozens of random strangers which the convention officially recorded and sold tapes of, and which I legally bought. Of _course_ worldcon itself would have no right to do anything with them, and the sheer idea that such a thing might wind up on the _internet_... unprecedented!

This is when I started googling names of the various participants in the discussion, finding out how old they were, and working out that if you INCLUDED ME the average age of the people in the discussion was something like 59.

Also, there isn't enough insurance in the world to do liquid nitrogen ice cream at worldcon. Nevermind that I did it in the con suite at A-Kon in 2004 (9400 attendees) and 2005 (10,700 attendees), WorldCon is less than half that size so a much bigger lawsuit target, or something. Fear the unknown!

The sad part is to realize that worldcon is _tiny_. It peaked at a little over 8000 people in 1984, and hasn't made it back near that since. It managed 5700 attendees in 2006 but hasn't broken 4000 since, and only got 2100 attendees in 2010. This isn't a "literary con vs other type of con" thing, Neil Gaiman's had _signing_lines_ the size of the 2010 Worldcon. This is "worldcon itself used to be a bigger deal, and has stagnated".

This year Worldcon is scheduled opposite Pax prime (70,000 people) and DragonCon (52,000), all three on the same weekend. The other two are _each_ an order of magnitude larger than worldcon, and the "we've always done it this way, get off my lawn" worldcon concom is almost certainly why. This weekend isn't unique: GenCon (49,000), Otakon (34,000), and of course the various Comicons (130,000 for the San Diego one alone and it keeps trying to grow bigger but they've maxed the largest facility available). Heck, the Texas Book Festival expected "upwards of 40,000" visitors in 2011, if that's not "literary" I dunno what is. You want "big" and "literary", the Frankfurt Book Fair had 286,000 visitors. The American Library Association's annual conference draws 25,000 people. John Green, YA author ("The Fault in our Stars") co-founded VidCon with his brother Hank in 2010. This year, it had 12,000 attendees (and stopped there because it sold out).

There's no lack of excitement about books or about science fiction or fantasy, there's a lack of excitement about _Worldcon_.

You can call 2010 (the most recent year wikipedia has attendance figures available for) an anomaly because it was in Australia. Worldcon has to travel around the world, doesn't it? Well, actually, no: 24 of the first 27 worldcons were in the US, the exceptions were Canada (#6) and the UK (15 and 23), and in the 70's and 80's 14 of the 20 were in the US. But 2008 was in Colorado and had a little under 3800 people. For comparison, LeakyCon (a harry potter convention) capped registration at 4000 and sold out immediately. But of course, Harry Potter isn't "literary". What Neil Gaiman does isn't literary. What John Green does isn't literary. (Comicon is evil for having so much media content, but the Worldcon membership keeps awarding hugos to Dr. Who episodes and Girl Genius graphic novels. Worldcon 2007 was _in_Japan_ but anime and manga are off topic. Right.)

Oh, remember that local Minnesota convention my sister took me to recently where I wound up helping in the con suite? It had 6789 people this year, bigger than any worldcon since 1989 (when Andre Norton was GoH).

Sigh. Still planning to hit it at least one day because several people I follow on twitter have come to the next city over from me, might as well say hi. But worldcon itself? It's not really a big deal anymore.


August 28, 2013

Working on two things in toybox, trying to close down for a release before linux 3.11 comes out. (-rc7 has dropped, so it's That Time).

First, I put in about half of support for building single commands in toybox without the multiplexer or the command caring what its executable name is, and I should finish that. For the moment I'm probably just doing a "scripts/single.sh commandname" and then people can loop over that with the output of "./toybox". Might hook it into the make plumbing later, but that's harder than the rest of the implementation combined because make sucks so profoundly at the conceptual level.

Second, Rich Felker suggested --color=auto for ls, since the old version does that, and the argument parsing semantics are CRAZY. What I've implemented (but apparently never debugged for standalone longopts) is that if "--color" takes an argument, then you can say "ls --color=auto file" or "ls --color auto file" and both would parse "auto" as the argument to --color.

But "ls --color file" will eat "file" as an argument to --color. And if you just go "ls --color" it'll complain there was no argument. There's no current way to make the argument _optional_.

I think the correct semantics are that you can have an argument with = but not with a space. And that brings up the corner case of "ls --color= file" which... I _think_ should note that you did specify an argument so the pointer is to "" rather than NULL? Because you can have zero length arguments so with a conventional --longopt taking an unconditional argument you could go "thingy --longopt '' walrus" and the argument to --longopt would be "", with walrus an optarg.
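In code terms, the semantics I'm describing boil down to something like this (a hypothetical sketch, not toybox's actual lib/args.c):

#include <string.h>

// Match a longopt with an *optional* argument: consumed only via =,
// never from the next argv entry. Returns 1 on match; *arg is the
// argument (possibly pointing at ""), or NULL if no = was given.
int match_longopt(char *opt, char *name, char **arg)
{
  int len = strlen(name);

  *arg = 0;
  if (strncmp(opt, name, len)) return 0;
  if (opt[len] == '=') *arg = opt + len + 1; // "--color=" gives ""
  else if (opt[len]) return 0;               // "--colorful" != "--color"
  return 1;
}

So "ls --color=auto file" parses auto, "ls --color file" leaves file alone, and "ls --color=" hands the command an empty string it can tell apart from NULL.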

(Yes, I go down corner case ratholes implementing new features ALL THE TIME. I don't want the resulting code to be big, but I want it to be RIGHT.)

Which brings up the next obvious bit: the test suite should have a test for the hello command, and it should exercise a zillion option parsingy corner cases. (Since hello has a bunch of option parsing examples in it.) Which is a bit tricky because hello is default n...

Really, there are a couple of "toybox" commands, which aren't exactly "other". There's help, hello, and the toybox multiplexer itself. Is that enough for a toybox category? Should those commands go in the toybox menu that has the debug switches in it?

Design questions don't always have an obviously correct answer...


August 27, 2013

Poking at toybox "build single" mode. Building an "allnoconfig" toybox produces a 10k dynamically linked binary, which is odd. I thought "typical glibc bloat, let's build against musl"... and that drops it to 9984 bytes. Still almost 10k.

Statically linked it's 22k, sucking in multi-kilobyte implementations of malloc, free, printf_core, errmsg, and so on. But that's... sort of understandable? Not ideal, but perror_exit() is calling malloc, free, printf, and strerror, and if fflush(NULL) in the global exit path fails, it reports the error. I can try to chop that out, but _my_ part of that code is only a few dozen bytes.

But why is the dynamic version so big?

According to nm --size-sort on toybox_unstripped, looking at the T entries, converting from hex to decimal and adding them up, there's 923 bytes of code on x86-64. The argument parsing logic conditionally drops out, and is not there when it's not needed. There are several other "B" entries like 4k of "toybuf" and 1k of "this", but those are bss (start zeroed, empty allocations that should not be in the executable). Minus all that, the biggest remaining item is 32 bytes of toy_list.

Are the ELF tables really that big?

Hmmm...

$ echo "int main() {return 0;}" | gcc -x c - -o false -Os -fno-asynchronous-unwind-tables -s
$ ls -l false
-rwxrwxr-x 1 landley landley 6240 Aug 29 01:48 false

So a file stripped as far as I know how and doing absolutely _nothing_ is still 6k. So 10k for my 1k of code... yeah, I can see that.

Sigh. I can dig up sstrip again, but the result is not portable. There really should be a way to do this with objdump or some such...


August 26, 2013

Mostly set up on my new laptop. Keyboard's still extensively annoying, but I'm slowly getting used to it. Weighs four times what the netbook did.

Still no email client. I'm not reinstalling balsa because it's too dumb to live, probably go back to thunderbird. It turns out Linux has a shortage of minimally usable graphical email clients (although it's plenty if you elide the minimally usable part). Who knew?

Ok, blog entry done on the new machine, let's see if ./wwwsend.sh is going to do something horrible...


August 20, 2013

Poking at nbd_client.c trying to finish the cleanup and move it out of pending, and I don't seem to have documented my nbd server setup command line back in 2010?

I have nbd-server 3.3 lying around, configured and built, but running it... is horrible. Its help text says that I should do "nbd-server ip:port file" except that when I do that it says that naming a file to serve on the command line is deprecated. It calls this a warning, but exits. Looking into the code... there's no implementation. It's not just deprecated, it's been REMOVED, but it still calls it a warning.

Great, so it's got a -o option to write a config file based on what you feed it on the command line. So append -o to the command line, and it says -o needs a filename. Ok, -o blah and it outputs a windows-style file (to stdout) with a [blah] section with several things under it. Ok, run it with that as its config file... and it complains it's got no [generic] section. Rename [blah] to [generic]... and it says it defines no exports! (It has an exportname= but that apparently doesn't count?)

Right. Screw this. Start reading the kernel's drivers/block/nbd.c, figure out the protocol, and write my own darn server because it's easier than configuring the broken one that's there. (Dug down to nbd_read_stat() so far, where sock_xmit() is used to receive packets. Sigh.)


August 16, 2013

Swap thrashing a bit. Doing a little bit here and a little bit there in a way that advances things without adding up to a release or upload, but oh well.

I've also been playing rather a lot of skyrim. :)

Doctor's appointment on monday to see why my right eye gets unhappy when I lie down. Difficulty finding a comfortable position to sleep in, and corresponding reluctance to go to bed basically ever, isn't exactly helping concentration...


August 9, 2013

I find it hilarious that the guy who freaked out about toybox is the same guy working to forbid people with root access from running arbitrary code on Linux in the name of humoring Microsoft's attempts to lock down the PC same as the xbox.

(Lose sight of your goals, redouble your efforts! Definition of a zealot.)


August 7, 2013

Blah. Netbook failed to suspend properly again. So what was I working on? Off the top of my head:

Mostly caught up with my email, only about 900 messages behind. (Plus the stuff I haven't downloaded yet today.)

In toybox I'm cleaning up grep to catch up with the test suite, I'm partway through ping, and I need to do sed. And I have pending mount/umount stuff. Plus I finally broke down and started the config2help.c I've needed to do forever to get rid of the python build dependency.

In aboriginal I got access to a new server from patrick@gentoo and building i686-LFS as a smoketest in the new environment died in m4. (Did it twice, and it reproduced.) I'm pretty sure it worked for at least x86, so I need to track down what's up there. I'm also halfway through the ccwrap rewrite to let me switch to musl.

Kernel Documentation: I finally sent a MAINTAINERS patch to exclude the translations and the device tree bindings, and got pushback on patch format (of course) and a question from the Korean guy as to who _does_ maintain that then. (Pointed him to the message where I argued against including translations back in 2007, on the theory that most kernel developers can't tell a Korean HOWTO patch from a bunch of swearing. Greg KH merged it anyway, so I forwarded the guy to him.) Since the kernel.org guys still have no interest in restoring rsync I need to make a script to do rsync through kup (in a horribly inefficient way) to update kernel.org/doc.

Oh right, I need to get the indiegogo up...


August 6, 2013

A while back I talked about how US culture would be a lot more like Canada's if the Jamestown colony hadn't introduced Malaria to the new world, leading the agricultural south to bulk import malaria resistant people from west africa (as slaves when they wouldn't come willingly in large enough numbers), scarring the south's culture in a dozen obvious and non-obvious ways.

The Civil War didn't fix the south: their bent religion that justified having sex with slaves and then keeping/selling your own sons and daughters as property persisted (no, Thomas Jefferson wasn't the only one to do this, they _all_ did it). The cultivated ignorance (don't ask too many questions about why your half-brother is legally a thing) and anti-intellectualism (who needs labor saving devices when it's slave labor) persists. From the anti-abortion movement to "men's rights", the mindset that "I am so special other people should belong to me as property" pervades everything the south does.

After the emancipation proclamation, the "solid south" voted anti-Lincoln for a hundred years, until President Johnson signed the civil rights act in the wake of Martin Luther King Jr.'s assassination (and President Kennedy's). Then the ex-confederacy switched polarity and voted solid GOP, and still do to this day. The democrats ran a Georgia peanut farmer and an Arkansas governor to eke out tiny margins of victory, but otherwise it was all GOP presidents for decades.

But the problem with explicit appeals to ignorance and racism (from Reagan's "welfare queens" to the Elder Bush's Willie Horton ad) is that the people who vote for you elect people like themselves, and they climb up the ranks of the party gaining seniority until they're in charge. The Goldwater vs Rockefeller fight for the direction of the party was about short term gain vs long term survival.

This is the context in which to read Paul Krugman's monday column. Faced with a black president, a party built on Confederate racism (and shored up with every other single-issue voting bloc with a Big Red Button That Stops Thought And Compels Obedience) has stuck its fingers in its ears and started humming to itself.

At this point, it really looks like all we can do politically is wait for the Racist Grandpa hordes to die off and hope there's a country left afterwards. Luckily, it's happening fairly fast.


August 2, 2013

Reading John Scalzi's blog about an agnostic reading the bible, I hit the old "Here and now is all we have" thing. I.E. if you don't believe in God you don't believe in an afterlife.

Atheists annoy me (it's a religion the way zero is a number), so here's a null hypothesis for there _being_ an afterlife, without a God: retroactive reincarnation.

Reincarnation is a possibility we have some evidence for: there's a lot of other people around, why are you yourself instead of them? But standard reincarnation has the problem that the population increases and decreases, so the number of consciousnesses changes, so the theory needs somewhere for the extras to come from and go to. It requires additional infrastructure to explain it.

Retroactive reincarnation asks what would the world look like if everyone got reincarnated in the past, perhaps as the next person born the same second they were? (Or in any arbitrary order, but usually starting your new life before the previous one ended so they overlap a lot.)

Not remembering previous lives would be necessary to avoid spoilers, although there might be occasional deja vu. Everybody in the world could think they were Abraham Lincoln in a past life and be simultaneously right. Justice would be built in without external rewards and punishments simply because Hitler eventually has to live the lives of all his victims (either before or after), so "do unto others" is still solid advice even without heaven or hell. One consciousness could eventually be _everybody_, a bit like stitching a pattern in needlepoint with only one thread. With this theory there's no more "come from" and "go to" for consciousness than for the universe (both we have to explain one of), just a lot of recycling.

Of course the militantly atheist response is that there's no such thing as consciousness, passing the Turing Test is 100% the same thing as actually _being_ conscious. If you can't measure it then it doesn't exist, all experience is illusion, etc. Somehow you didn't need God to be born once, but you _definitely_ would to be born again. (And you wonder why I call atheism a religion? They have no evidence about what happens to consciousness, but they're sure of it.)

Partly we tend to confuse consciousness with identity. That if you drink too much and don't remember it in the morning, you didn't really experience anything. Extend that logic to include the fact everybody dies then you can't be experiencing anything right now, can you? So if there _is_ consciousness, (and those of us experiencing it go "um, yeah, I'm here alright"), then it's not the same as identity. If you lose your identity when you die, is that the end of your consciousness? Most of us have experienced consciousness jumping _forward_ through certain drugs, trauma, or just really deep sleep where no time seems to have passed, can it jump backward at death?

I'm agnostic. I don't know what happens either. But retroactive reincarnation gives me a baseline hypothesis to test other ideas against: from valkyries on flying horses taking you off to a feast hall to the ancient greek underworld in a cave. Does this hypothesis make _more_ sense and require _fewer_ assumptions? Does it more closely match my experience, or is there a better explanation still out there?

The answer to Pascal's Wager is that the downside to believing in nonsense is it gives temporal power to con artists who use it to enrich themselves, subjugate others (women, minorities), wage crusade or jihad against the infidel, and so on. You enslave yourself to bad ideas out of fear, because long ago Gods walked the earth and helped Paris steal Helen away to Troy and we have it written down in a book so it must be true.

The assumption in Pascal's wager is that the life you're living here now is worthless because it will end. But if it's all we have, making this world worse in exchange for nothing is its own downside. And if our next life is back here ignorantly experiencing the consequences of our actions, and we waste each life over and over again for pie in the sky that's a lie, that's not just a downside, it would be the biggest shame in history. We could create our own hell on earth and live in it billions of times over, endlessly mortifying the flesh in the name of religion and never seeing a next world, just this one again with fresh eyes and fresh ignorance to exploit. That's the potential downside to Pascal's wager, it makes us torture ourselves enforcing random guesses about dress and diet and behavior to appease a hallucination. Nothing bad ever happened when you wore your lucky socks, so Thou Must Wear Them All The Time, Forevermore.

(I've talked about this before...)


August 1, 2013

On a plane back to Austin. Cleaning up grep. As you do.


July 31, 2013

The gnu man page for grep is surprisingly crappy. Documenting the -s option, it notes it's in posix but wasn't in 7th edition unix (from about 1976) and thus portable shell scripts should avoid it.

This is the current man page as of 2013. That's the FSF for you.


July 30, 2013

I stopped drinking caffeine when I moved out of my apartment, the day after the cray job ended (the 20th). I haven't quite gone cold turkey because I had a bottle of diet mountain dew right before the Google interview on Tuesday, and then had to cut the interview short due to intestinal distress. (At least I finally got an answer from them after all these years: No, Dublin would not like to hire me to be an SRE. Having traded up to a much nicer house in Austin since last time, this isn't a huge disappointment, but when a google recruiter pinged me a couple weeks back about SRE in Dublin I went "sure, I'll interview" just to see what would happen. I told my sister I expected it had about a 15% chance of actually happening, and I was apparently right. Not really as enthused about SRE as I was 3 years ago, to be honest. At the time it sounded like a learning experience. Now it just sounds like maintenance.)

Anyway, I've been out of it all week doing caffeine detox. I'd probably have managed more napping without the bouncing 5 year old with the sharp knees who is SURE that if she's up, I should be up. (And without the 10 year old who won't sleep in an empty room staying up until 3am with me going "Are you ready to come to bed yet? You don't need to sleep on the couch in the living room, come sleep on the futon in my room.")

Got a little programming done in spite of this, but not much. I started this blog meaning to mention some of it, but honestly don't remember what it is. (About half of ping. About half of the ccwrap rewrite... Trying not to be a jerk online or outright incoherent in emails. Probably failing. Oh well. It generally takes about two weeks for the worst of it to work through.)


July 28, 2013

Implementing ping, and it's funky. (No surprises there.) The IPv6 infrastructure does not cleanly extend IPv4, instead they implemented a whole parallel set of headers and structures and constants and calls to do the exact same thing in a new context, which seems deeply stupid.

Take "struct icmphdr" in netinet/ip_icmp.h and "struct icmp6_hdr" in netinet/icmp6.h. They're both 8 bytes long, both start with byte type, byte code, short cksum, and then a 4 byte union that could mean various things and can be left zeroed. Seems easy to have one structure handle both use cases, but that's not what they did.
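They could have shared a structure; the common 8 bytes look like this (a hypothetical unified layout, not anything libc actually provides):

#include <stdint.h>

// The first 8 bytes of both struct icmphdr and struct icmp6_hdr,
// specialized for echo requests, which is all ping needs:
struct icmp_any {
  uint8_t type;   // ICMP_ECHO (8) for v4, ICMP6_ECHO_REQUEST (128) for v6
  uint8_t code;   // 0 for echo requests in both protocols
  uint16_t cksum; // v4: sender computes; v6: kernel fills in on raw sockets
  uint16_t id;    // echo identifier
  uint16_t seq;   // echo sequence number
};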


July 27, 2013

Toybox got discussed on ycombinator again, and the argument there is... odd.

Rather a lot of people showed up to defend The GPL. None of them confronted my assertion in the video that there's no such thing as "The GPL" anymore. If there was, I'd probably still be using it, I used to be quite a fan. But Linux is GPL, and Samba is GPL, and the two can't share code. Linux is GPLv2 (only), samba is GPLv3 (or later), so even though both are GPL these programs that implement two ends of the same protocol cannot use the same code to do so. (And a program licensed "GPLv2 or later" couldn't accept code copied from either one, so throwing that into the mix just makes the problem worse, it's a _third_ license state to track.) This means ever since GPLv3 came out, "The GPL" is _broken_. It's become a barrier to code re-use.

That is why copyleft has a problem. Because it's fragmented into incompatible factions that make it a barrier to code re-use, due to the FSF pulling a Darth Vader and "altering the bargain" six years after the largest group of GPL users drew a line in the sand saying they wouldn't accept that. The FSF played chicken with the Linux community, and the resulting crash shattered The GPL. I miss it, but the damage is done.

Really, I covered all this in the talk and slides I linked from the toybox page...


July 25, 2013

The GOP's strategy of moving the overton window seems to have hit natural limits. The question is how long it'll take Obama to notice (let alone Nancy Pelosi or Harry "dishrag" Reid). I'm guessing "never".

It's sad; under Clinton people were expressing concern about the world's "only remaining superpower" overwhelming everything, then the GOP took over in 2001 and became "the party of no" under the next guy, and now people are writing articles about the end of the american empire a dozen years later. I still think the real story is the decline and fall of the Baby Boomers, clinging to power on the way down and doing their best to take it with them...


July 24, 2013

Writing a new ccwrap for aboriginal, because the old one doesn't support musl, is not a good basis for qcc or replacing distcc, and is really a giant mass of scar tissue. But the proximal cause of moving this to the top of the todo list is attempting to support Alpha, where I hit -nodefaultlibs not working right... Except now that I look at the code under -n, it's got -nodefaultlibs in it. Huh.

Oh well, the rewrite's in progress and musl support's an important strategic goal, so finish the rewrite, debug against the previously working targets (through the LFS build), and then go back and work out why Alpha didn't work.


July 23, 2013

Out of the apartment, stuff packed, keys handed in. Spending the week at my sister's, mostly getting bounced upon by a 5 year old with very sharp knees. (Amazingly time consuming, multiple small children.)

The server lists.landley.net is hosted on filled its disk; fixed now, but the toybox and aboriginal mailing lists were bouncing yesterday. Run your own server and things break and you get buried with trivia fixing them. Use hosted servers and things break and you spend your time filling out support requests. Not quite a wash, but not the one-sided thing I'd hoped for either.

Poking at the ccwrap rewrite so I can get musl working. (I'm up to 'M' in the command line arguments list.)

About 4000 emails behind. Working on it...


July 16, 2013

Last week at work in day job. The stress and uncertainty and pending major change are making me very tired.

I'm trying to factor out common code from toybox sleep.c so I can implement timeout.c (apparently there is one now), and I'm hitting the epic stupidity of linux timekeeping: time_t has seconds granularity, struct timespec (used by nanosleep) has nanosecond granularity, and struct timeval (used by setitimer, to send a scheduled signal that can wake up waitpid() on a child) has microsecond granularity.

PICK ONE, GUYS! The system should not have timespec _and_ timeval. It just shouldn't. I don't care what terrible historical reasons you have; FIX IT.

(Yes, instead of setitimer I could use the posix timer API: the example code for that in the man page is 110 lines.)

The really annoying part of these structs is that timeval is two longs, and timespec is a time_t and a long. And time_t isn't guaranteed to be long (it can be long long on 32 bit platforms). So I can't even have the same code handle both structures and just adjust based on a flag; it needs two structure definitions.

Well, _one_ of the fields is consistently a long, so I could take a long * argument to set the fractional bit, and return the seconds (as a long, letting the return value assignment in the caller convert it). And actually that could encapsulate all the "to float or not to float" stuff...

Grumble. Ok, that's probably the way to do it...
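
Something along these lines, as a sketch (a hypothetical helper, not actual toybox code):

#include <stdlib.h>

// Sketch: parse "seconds[.fraction]", returning whole seconds and writing
// the fraction out through a pointer, scaled to 'digits' decimal places
// (9 for timespec nanoseconds, 6 for timeval microseconds).
long parse_time(char *arg, long digits, long *fraction)
{
  long seconds = strtol(arg, &arg, 10);

  if (fraction) {
    *fraction = 0;
    if (*arg == '.') {
      arg++;
      // consume up to 'digits' decimal digits, scaling as we go
      while (digits--) {
        *fraction *= 10;
        if (*arg >= '0' && *arg <= '9') *fraction += *arg++ - '0';
      }
    }
  }

  return seconds;
}

Then sleep's nanosleep() path would be ts.tv_sec = parse_time(arg, 9, &ts.tv_nsec); and the setitimer() path asks for 6 digits instead, with the return value assignment eating any long/long long mismatch.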

Spoke to Adrienne on the phone. Tryn's feeling better, which is good to know.


July 15, 2013

Networkmanager has inexplicably decided to grey out the Mears Park WiFi entry. No explanation, it's just a different color in the list and you can't click on it.

I've tried a half-dozen things to make it stop doing this: rebooting the netbook, rebooting the router (it's in the cabinet under the TV in the breakroom), killing and restarting the daemon, unloading and reloading the wireless module, looking at various config files, googling for answers... and it's still doing it. No explanation. It shows me the access point, and refuses to use it.

It did this yesterday, and I got it to work by telling it to connect to a hidden wireless network "Mears Park WiFi", and then when _that_ failed the probe found a "Mears Park WiFi 1" which worked. (Forcing it off the dead cached/leaked entry.) But that's not working today. And deleting every stored association in the list still didn't fix it.

Various sites opined that it was a bad entry in a config file (which is empty here), or the keyring not being unlocked with a password prompt. (Which brings up the "keyring is stupid" problem: I log in with a password as my user. If that's not enough, doing it again in a parallel system isn't going to help. Forcing another program to prompt for the password made no difference.)

I ran nm-applet under strace to see what config file it was reading from, but of course it's reading from dbus, not from a config file. I'm currently doing a find on the whole filesystem looking for a file containing the text "Mears Park WiFi" but this is such a Daily Mail/Lennart Pottering design that it's probably storing the information in extended attributes as an rc4 hash in a binary blob. Because preventing anyone else from being able to understand it means nobody else touches his toys and only he can do the magic things necessary to remain employed, or something.

Unfortunately I never worked out how to properly provide wifi passwords from the command line. I can associate with an unencrypted network, but there aren't many of those anymore. There's an "iwconfig gratuitouslyrenamedeth0 key s:password" syntax, but it doesn't work. The man page talks about "key slots" and says that "passphrase is not supported". (Given that wireless internet has been available for about 15 years, you'd think they'd have caught up, but no.)

It's possible that there is a real problem and the router's got dust in it or something, but the problem is networkmangler WON'T EXPLAIN. It just greys out the entry. I can see it, but not select it. No error message, no override, no way to request additional information...

(Wound up having lunch at the bagel place and using their wifi.)


July 14, 2013

Just one day of drinking sprite zeroes instead of diet mountain dew and I'm out of it and headachey. Yeah, I've left caffeine detox a bit long. (Turns out my earlier attempt with the diet orange soda was foiled by the fact sunkist is caffeinated.)

Following up on recent developments in qemu-system-arm (for a value of "recent" indicating when I noticed), I've been digging out my old DEC Alpha configuration for Aboriginal Linux. I never bothered to check it in because at best it was like m68k: I could build something, but not test it due to lack of a working emulator. I got m68k in when a user tested it on a different emulator (aranym) and I knew that at least the toolchain and root filesystem were working. Plus Laurent Vivier's been poking at adding m68k support to qemu, although it doesn't seem to have progressed since the last time I looked at it (2011).

Richard Henderson's doing similar additions to qemu-alpha, but his are checked in, and he has a kernel tree he's been testing with. The branch "axp-qemu-4" actually seems up to date with current git, so there's something to play with... if I can get my toolchain and userspace building again.

Last time I did that was 2009, under uClibc 0.9.31. I dug up the old config and converted it to the current syntax, and... there are issues.

Over in uClibc, wrongbot modified xstatconv and obviously never tested it on alpha: it tries to use nanosecond data fields out of a structure that hasn't got 'em. However, this structure only gets used when you don't have 64 bit stat (which alpha does), so it's just some missing #ifdefs. Patch one.
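
A sketch of the shape of that kind of fix (the guard macro and helper here are hypothetical, not the actual uClibc patch):

#include <sys/stat.h>

// Sketch: only copy nanosecond timestamps on targets whose kernel stat
// structure actually has the fields; zero them out elsewhere (like alpha).
static void set_nsec(struct stat *st, long atime_ns, long mtime_ns)
{
#ifdef KERNEL_STAT_HAVE_NSEC  /* hypothetical guard */
  st->st_atim.tv_nsec = atime_ns;
  st->st_mtim.tv_nsec = mtime_ns;
#else
  st->st_atim.tv_nsec = st->st_mtim.tv_nsec = 0;
#endif
}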

Next, the Alpha kernel doesn't export "umount2", it instead exports umount with the umount2 semantics. (Why umount2 semantics? Because this is common vfs infrastructure, the syscall numbers and names are the target-specific part.) Since umount2 is user visible and has a man page, and busybox is calling it directly, this causes a problem. So that's another patch.
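
The shim itself is tiny; a sketch of the general shape (this is not the actual uClibc patch, just the idea):

#include <sys/syscall.h>
#include <unistd.h>

// Sketch: on alpha the two-argument umount lives at the __NR_umount
// slot, so provide the user-visible umount2() name on top of it.
#ifdef __alpha__
int umount2(const char *target, int flags)
{
  return syscall(__NR_umount, target, flags);
}
#endif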

Next, building the native toolchain breaks compiling libgcc_s.so, because: "alpha-unknown-linux/bin/ld: simple-cross-compiler-alpha/lib/libc.a(__uClibc_main.o): gp-relative relocation against dynamic symbol __fini_array_start" (and a dozen more, all __uClibc_main.o calling out to __preinit_array_start and similar).

At first glance this seems like a missing -fPIC, but the deeper question is "why would libgcc_s.so include __uClibc_main.o at all?" So digging more, it looks like there might be a subtle ccwrap bug. The horrible compilation line looks like:

alpha-cc -O2 -O2 -g -O2 -mieee -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fPIC -mieee -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -shared -nodefaultlibs -Wl,--soname=libgcc_s.so.1 -Wl,--version-script=libgcc/./libgcc.map -o ./libgcc_s.so.1.tmp libgcc/*.o -lc

And the problem is the wrapper I inherited from uClibc through Timesys and then have spent several years banging on... doesn't understand -nodefaultlibs. It understands -nostdlib, but that's essentially -nostartfiles plus -nodefaultlibs. And the wrapper doesn't intercept the more granular pair and properly switch off the parts for each, so random crap's getting thrown in the linker line.
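
So the new wrapper needs to track the two halves separately; a sketch of the kind of logic involved (function and variable names invented here):

#include <string.h>

// Sketch: -nostdlib is shorthand for -nostartfiles plus -nodefaultlibs,
// so scan for all three and record which chunks of the implicit link
// line survive.
void scan_link_flags(int argc, char *argv[], int *startfiles, int *defaultlibs)
{
  int i;

  *startfiles = *defaultlibs = 1;
  for (i = 1; i < argc; i++) {
    if (!strcmp(argv[i], "-nostdlib")) *startfiles = *defaultlibs = 0;
    else if (!strcmp(argv[i], "-nostartfiles")) *startfiles = 0;
    else if (!strcmp(argv[i], "-nodefaultlibs")) *defaultlibs = 0;
  }
  // Caller emits crt*.o only if *startfiles, and -lc/-lgcc only if
  // *defaultlibs, so libgcc_s.so stops inheriting __uClibc_main.o.
}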

I'm in the process of writing a new wrapper to do musl and avoid the giant legacy mass of scar tissue the old one's turned into, but it's not usable yet. And is likely to require rather a lot of debugging.


July 13, 2013

Oops. The mercurial "import -f" command applies a patch with preexisting metadata directly to the repository. Except it doesn't, it applies it to your local tree and then commits the current state of the whole tree. Meaning if you have changes to any _other_ files, it checks them in. The set of files touched by the patch are _not_ checked when doing the commit, because obviously that would make too much sense.

I had like 5 unrelated sets of changes in my tree, and they all went in a few commits back with metadata ironically saying the patch just did one small thing. (The export did: the import didn't.) And of course mercurial has a policy that one rollback is magic and special but more than one rollback is insane and unnatural, so I can't dig back to the earlier state.


July 12, 2013

New computer! Yay!

It's from System 76, their low-end notebook upgraded as far as it'll go: 16 gigs ram, terabyte of disk, quad-core i5 hyper-threading to 8, and it comes with Ubuntu preinstalled.

First observation: physically larger than the Aspire Ones I've been using.

Second observation: _wow_ the "unity" desktop in ubuntu is unusable crap. I mean wow. It manages to be worse than Gnome. This is "windows 8 metro" levels of dysfunction here. It's a tablet interface with no apps.

I spent three minutes trying to figure out how to trigger _either_ the package installer _or_ the terminal. Eventually I typed "terminal" into the search your system thing and launched one that way, and did an apt-get install aptitude, then aptitude install xubuntu-desktop. Then three logouts and a reboot trying to figure out how to make it STOP GIVING ME UNITY. (Solution: the tiny ubuntu logo, click that and it'll list the other desktops. There's both an xfce and a xubuntu option: that's not a good sign.)

While I'm at it, give the root user a password, because ubuntu never does that.

Third observation: this is not ubuntu LTS. This is one of those random 6 month betas that Shuttleworth declared he was discontinuing in favor of random nightly potluck snapshots (because those are never unstable).

The xubuntu software updater hangs for 60 seconds after I press the "install updates" button with no acknowledgement that it's doing anything: a random example of not being ready for primetime.

They screwed up scrollbars. The scrollbar is a TINY little line on the right edge of the window now, and instead of the up arrow and down arrow at the edges of it there's now a pop-up next to it when you mouse hover over it, which you then have to navigate over (without going so far away it vanishes) to access the scroll up and down buttons. This pop-up also makes it really hard to grab the bottom right corner of a window to resize it, because it gets in the way. It's fiddly and evil and I hate it and I'm annoyed they broke the working version. Does it stop doing this if I switch to "xfce" instead of xubuntu? No, it does not. Darn it, how do I MAKE THIS STOP?

Ok, in the settings editor, "appearance", I opened a terminal window to see if any of these themes fixed the scroll bar (they don't), and every time I switch theme the terminal window gets one line shorter. Switching between "ambiance" and "radiance" (which as far as I can tell are identical) you can really see it in action. (They shipped with this.) Raleigh is the only one that has an actual black terminal window background, but then the text on that black background is too dark to read easily. Who designed this? Obviously somebody who was never going to use it in direct sunlight.

Ah, "sudo apt-get purge 'scrollbar*'" removes the horrible ubuntu overlay package that breaks the gui. Had to log out and log back in for it to take effect, but I have actual scrollbars again! Looks like "albatross" was about the theme I was using...

Yet more "lateral progress". Do not want.


July 11, 2013

Netbook failed to suspend again. Third time since I installed the binary video driver for Steam. Another round of Balsa replies lost...

This time

At least I got "eject" applied, and the first round of cleanups to it applied and describe on the list before it did that.


July 7, 2013

Caught up on email. Got back home and sent almost two dozen accumulated email replies.

Sat down and got the arm target working with both new QEMU and old (1.2 anyway) QEMU, by basically patching the kernel to hit QEMU with a rock until it stopped being stupid. That was the last todo item, cut an Aboriginal Linux release, uploading it now.

Back to debugging tail -f.

It was a good convention.


July 6, 2013

And the Caribou Coffee around the corner can't connect to port 25 anywhere either. The scourge of infected windows machines spewing spam bites again.

(Yes, I run my own mail client downloading local copies and composing replies. If I didn't, I couldn't have read the email and composed the outgoing replies I'm having trouble trying to send, because I wouldn't have had net most of the time I was doing it.)

Sat in a corner with my netbook trying the lobby wireless (which can't send email either; same problem I had at Adrienne's, the ISP around here is blocking outbound non-webmail email, presumably due to windows machines spewing spam). This put me near the volunteers table, and they did a stand up and yell if anybody wants to volunteer for con suite, they need people to help out in con suite...

Six hours later, I have a t-shirt. (Well what did you EXPECT?)

The dealer's room setup I did for Penguicon was heavily inspired by the Minnesota Munching Movement (because my sister's been working Convergence since back before Minicon decided to ossify into something smaller than 4th Street Fantasy, and everybody squeezed out by their narrowing of interests went and started a new con. Which was _not_ prepared for the 4th of July to fall on a Thursday, with 1/3 more people than last year deciding to take all 4 days off to go to the con. They had over 5000 people in line for preregistration on Thursday, AT THE SAME TIME. The line filled up the first floor, went up some stairs, and filled up most of the second floor; at its peak it was around 7 and 1/2 hours to get through it. Estimates of total attendance this year were "Over Niiiine Thousand!!!" and yes, of course it was pronounced like that.)

Anyway, Minnesota Munching Movement. Got to see it up close this time: serving rice to the masses. (Six GIANT rice cookers in rotation, doing brown and white rice. They had something like 50 of the huge sling-over-your-shoulder rice bags, and they went through them.) A similar setup for soup, except they froze giant plastic bags of it and thawed them in the room, then in a plastic tub of water, then in dutch ovens half full of water, and finally cut the bags to pour the soup into giant crock-pots to bring it to 140 degrees and then serve from. A couple hours of my volunteering was dish duty, scrubbing the crock-pot pots, lids, and ladles so they could reinstantiate them with soup. (Plus the peanut butter and jelly station, but that wasn't generally a bottleneck.)

Couple panels as well. PZ Myers and Bug Girl were on a panel, but the room was a solid wall of backs when I tried to go in. (The stage beyond standing room only: that was full too.) Saw the "Asperger's in the character of Sheldon in The Big Bang Theory" panel; all four of the panelists had Asperger's and were describing the sensory issues I remember from childhood (socks have WRINKLES and even if you get them perfectly straight the seam over the toes shifts around, and I can smell celery in the next room, and...) Outgrew that, a mild case of Tourette's, and "hyperactivity" that would these days be described as ADHD. Yes, I'm aware "outgrowing" that isn't normal, but it's what I did.

Also went to a Wreck-It Ralph panel. (The guy dressed as Turbo wasn't there, but more than one Vanellope von Schweetz.) Didn't get to any of the Doctor Who ones, which was a bit sad (theme this year was "british invasion", they had more than one Tardis and several Daleks). In the dealer's room I almost bought a solid blue Jayne Hat that said "Police Box" on it. (Or the Police Box scarf.) But unlikely to be here for the appropriate weather.

I have no idea who the guests were. I saw Elizabeth Bear in the hall but I think she's a local (was at 4th street too). And I saw her boyfriend Scott Lynch in the con suite. (Asked him if _he_ remembered the "Ash Nazh Thurbatook" chant, because I got a bit overheated scrubbing soup pots in a steamy bathroom and got "Ash Nazhg Thrubatook, Ash Nazhg Gimbatul, bananaphone!" stuck in my head, and yes to that tune, and if you can't remember the rest of the words it's not going to get unstuck, is it? And yes, technically that is "ring ring ring ring, ring ring ring ring, bananaphone". No, he didn't remember it either.) And I didn't find filk.

A fun con. I didn't fully enjoy it because I wasn't fully recovered from 4th street (which I didn't even officially _attend_, but my day job's been taking up enough time and energy that I haven't got huge reserves), but I could see going back next year and giving it a proper go.


July 5, 2013

It would be fairly anti-social of me to just ignore the toybox mailing list and concentrate on writing new commands instead, so I don't: most of my toybox time goes to reading other people's submissions. Which means people go "oh, you're not writing any new commands because you're spending all your time very slowly trying to parse and understand other people's code; here let me write more commands and send them to you, so you have even more code to read". I don't expect to catch up before 1.0.

Some new command submissions are great. (I'd forgotten that "paste" was even in posix.) But some... I know how to do pgrep. The fact I haven't done it yet is because I probably want it to share code with ps and top, and possibly with "grep". Not because it's hard to write, but because I need to study the optimal way to write it, and to do _that_ I need to clear my plate of all this pending stuff and fixups.

I am getting some cleanup submissions now, which is awesome. But I've just gotten more than one new command that's maybe an afternoon to write but several days to clean up. The problem is that reading code is harder than writing code, and for a lot of commands it's much slower for me to clean up somebody else's submission than to ignore it and write a new one from scratch. And I'm less confident that I fully understand the behavior of the result. For example, I'm trying to figure out what's wrong with "tail", which started life as code I didn't write, and apparently I missed a curve somewhere when cleaning it up and extending it to do what I needed.

It looks like the real problem with "tail" is I got distracted halfway through the cleanup by the zillion other things I need to do on this project, and forgot where I left off. And the end result is a command I thought was done but is instead segfaulting, and I have to work out the logic so I can fix it _and_ re-triage it to make sure I haven't missed anything else I meant to do to it.

(I'm trying to separate "this is the fun bit of programming, dowanna do the not fun bits" from "this takes ten times as long and I'm not convinced the result is any better". Really what trumps it all is "but if I do it this way other people are more likely to use the result", and so I do it the slow way and try hard not to produce a lower quality result. But people keep writing commands I was looking forward to doing, and sending them to me, and that's why I never get around to the next one. And when I look at their code I have all these questions about why they did that, and they're not here to ask...)

But first, I need to figure out what I didn't understand when converting tail from not-my-code into something I could (theoretically) support. It's segfaulting. It should not be segfaulting...


July 4, 2013

My sister's taken me to another convention for the 4 day weekend. The convention itself has no internet access, and the hotel we're staying at down the road has internet but blocks sending email.

So another weekend to fall further behind...


July 4, 2013

Why did I give up on KDE in 2008? Here's one random data point. Back in 2007, when knetworkmanager didn't work (couldn't connect to a wireless network), I either filed or commented on a bug. That bug got marked as a duplicate of this bug, which is about cosmetic issues instead of the functionality of network association. And six years later, I'm STILL getting email about that bug, because it's still open.

At this point it's in the "spam I don't get quite often enough to set up a filter rule for" category.


July 1, 2013

3.10 dropped over the weekend. Did a test build of i686 target with the actual final kernel and the probably-final toybox tarball, and it built LFS in the chroot.

Triaging targets: mips, powerpc, sh4, x86, x86_64 all seem happy. I still need to find a way to get ARM to work, although "just run it under qemu 1.2 because everything more recent is broken" may be it for now. It really _is_ a qemu problem.

I might also want to bang on sparc, because qemu 1.2 does run-emulator but not native-build.sh (kernel command line too long), and the OpenBIOS upgrade that supposedly fixed that is _also_ what introduced the hang that prevents userspace from starting around interrupt enable. "Use qemu 1.2" isn't a _regression_ though...

Meh, need to cut a toybox release, which means I need to write up release notes and update the web page, check the status.html and roadmap.html...

Ok, toybox 0.4.5 is out. Now to drop it in Aboriginal and rebuild everything...

Tried to catch up on email over the weekend (still behind from last weekend's power outage), but hit several patches of "must actually read this entire thread" in my skimming. (Kernel documentation: I don't do nearly enough but what I do manage is quite time consuming.)


June 30, 2013

And the Balsa email program remains crap, losing all my half-finished replies when Ubuntu crashed. No idea who I was replying to: if I owe you email and you didn't get it, ping me again. (Chrome remembers tabs, but balsa doesn't remember anything.)


June 29, 2013

Catching up on email. I usually do that on weekends, but last time due to the power outage I fell further behind instead. Currently four days behind.

Got the updated initmpfs patches posted. Got the sh4 kernel running under qemu again (chopping out yet another access to port 0x18). Fixed (another) toybox ls bug segfaulting column view when the terminal size is zero.


June 20, 2013

Power out at the hotel.

The sparc target isn't working in the new qemu. It has nothing to do with the kernel upgrade, it's the qemu upgrade switching in a new openbios (commit 467b34689d27) that makes it hang at the end of kernel boot, around where it tries to initialize interrupts. Need to poke the qemu guys and go "huh?"


June 19, 2013

One of the design edges in Aboriginal Linux is that it uses release tarballs by default, with sha1sums for each tarball, so checking in an upgrade to the new release requires the release tarball to be there in the repository. (I could check in a temp version with a url out of the mirror, but that's ugly.)

So I tend to have pending changes for each new kernel which I can't check in until the kernel's there, because the patches don't apply cleanly to the old kernel. Ideally I like to start fiddling around -rc3 and try to get all the targets working again, but then I have stuff I can't check in for a long time. When I'm lazy, I hold off on trying to get the new kernel working until it's released (because then I can check in the change to downloads.sh right off and then check in each patch as I tweak it), but then I find stuff broken in the new kernel I could have reported and had them fix during the development cycle.

In the long run, I'd like to get the patch count down so this is less of an issue. That was part of the motivation behind pushing the perl patches upstream, and now that blockage is done I should worry about the smaller ones again. The next big pain is the arm versatile patches, which change the kconfig stuff so you can plug armv4tl and such into a versatile board (which qemu can emulate, and is the easy way to get an arm board with a PCI bus). Real hardware doesn't do that, but as the recent interrupt changes show, nobody's been using real arm versatile hardware (with recent kernels) in some time...

I also have 3 (count them, three) patches for sh4, but I also have to patch qemu to get that to work so I may be the _only_ one particularly testing that.


June 18, 2013

Implemented switch_root for toybox, because it was there.

The posix spec for "tr" is horrible.

It says "When the array specified by string2 is shorter that the one specified by string1, the results are unspecified". Right up front, they plead the 5th about what this command should do.

The -c and -C options put "the complement of the values specified by string1" in the array. Nowhere does posix say what they mean by "complement". I thought they meant bit reversal (ala ~c) until I played with the installed version: They mean everything _not_ specified. Except you can't just do the obvious thing and reverse the test, no, it adds every possible value that ISN'T this, IN ORDER. Does that mean the crazy horrible thing you think it means?

$ echo | tr -c a abcdefghijklmnopqrstuvwxyz
k

Yes. Yes it does. Starting with a NUL byte as the first thing in the array.

Does that mean every possible unicode value? Presumably. (How else do -c and -C differ?)

Now let's back up: if you put every character you didn't specify into str1, including all the UTF8 stuff (or even just high ascii), str2 is pretty likely to wind up shorter than str1. Which it says triggers undefined behavior. Presumably this includes high ascii and everything. The host version on ubuntu uses the last character of str2 for the extra matches when str2 is shorter, but posix doesn't guarantee that.
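
A straightforward reading of all that works out to something like this sketch (bytes only, punting on the unicode question; the function name is mine, and the last-character padding copies what ubuntu's tr does rather than anything posix promises):

#include <string.h>

// Sketch: expand "tr -c str1 str2" into a 256-entry byte map. Every byte
// NOT in str1 gets complemented, starting with NUL, in order; when str2
// runs out, reuse its last byte (the ubuntu behavior).
void build_complement_map(char *str1, char *str2, char map[256])
{
  size_t len2 = strlen(str2), out = 0;
  int i;

  if (!len2) return;
  for (i = 0; i < 256; i++) {
    map[i] = i;                          // default: byte maps to itself
    if (i && strchr(str1, i)) continue;  // in str1: not part of complement
    map[i] = str2[out < len2 ? out : len2 - 1];
    out++;
  }
}

Feed a newline (byte 10) through that map with the example's arguments and you land on str2[10], which is exactly the 'k' it printed.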

Then there's [=equiv=] which is not the same as [:class:]. The spec specifies twelve classes but does not give one example of equiv, and refers to equiv as an "equivalence class" using BOTH keywords for it, just to be confusing. Luckily it says it's defined by the current setting of LC_COLLATE which means it's probably not my problem? (I'm supporting unicode, but not locales.)

As far as I can tell, the difference between -C and -c is that the capital letter one is unicode-aware. I think? Except... not?

Then there's this little gem:

Because the order in which characters specified by character class expressions or equivalence class expressions is undefined, such expressions should only be used if the intent is to map several characters into one. An exception is case conversion, as described previously.

Except it doesn't say how _many_ characters are in said expressions, and it says that if str2 is shorter behavior is undefined, so how DO you map many characters to one obeying the standard? If I go "tr [:space:] x" the standard says THE RESULTS ARE UNSPECIFIED. It gives an [x*n] syntax for repeating a character, but doesn't say how big n should be and if you're supporting unicode 256 isn't necessarily enough... Ah, [x*] omitting n says to fill str2 to the length of str1. Right. And then an application note on BSD vs System V: I think after 30 years we can PICK one, eh? Especially if the alternative is undefined behavior. (Posix is afraid to acknowledge losers, even when it's obvious. And of course the n in [x*n] can be octal if it has a leading zero, because that was important when Posix-2008 came out.)

I looked up equivalence classes in the man page, and it said:

[=CHAR=]
    all characters which are equivalent to CHAR

Real helpful guys. Thanks.

Digging some more: it's a literal escape, so you can match "[" or "\" or "-". Why couldn't they just SAY that?

The spec says that an unfinished range ("a-") is undefined behavior, and the gcc build depends on it (using it to convert - to _ at the end of the set).

Craptacular. Truly.


June 16, 2013

I actually got some time to myself this weekend (visited my sister while the kids were at their father's and she was at work for part of it), so I banged out split.c, one of the remaining low-hanging-fruit commands that aboriginal linux (actually linux from scratch) needs to build. Got it finished, tested, and checked in, and added tests for it to the test suite.

There are now 42 busybox commands left in build/hosts, although several are duplicates or near-duplicates ("[ [[ test", "ash sh", and so on...) plus the busybox multiplexer. Meanwhile toybox (current hg snapshot) implements 110 commands in defconfig.

The _hard_ ones left in the busybox-aboriginal list are awk, vi, and sh. (And really, vi's more fiddly than hard.) And I'm still patching out bc so it's not on the list, but should be...

Once I get aboriginal and Linux From Scratch sorted, I need to implement commands that are unique to toolbox, and then build the Android Open Source Project natively under Aboriginal, which is going to require yet more commands on the host. (Some of which, like git, are just external packages you build and install like with LFS bootstrapping. But there are bound to be some missing bits and corner cases revealed by that...)

And _then_ what I do is work out what to install on a stock android system to have a build environment that can build AOSP. (Add musl, add toybox, add git, add repo, add compiler and linker, run giant-horrible-build-command...)

Actually, interesting point: git is gplv2. Meaning either somebody somewhere needs to clone git (giant crawling horror that it is), or somebody needs a repo replacement that can wget tarballs instead of git clone stuff, or they need to do another exception for it. Amusingly, mercurial would probably take over the world from git if it wasn't also gpl (especially since it can import and export data from git repositories), but it is, so it's relegated to a historical footnote. Oh well.

But all that has to wait: back to work tomorrow.


June 12, 2013

And got another evening to myself.

Dusted off aboriginal and tried it with -rc5. The _only_ targets that worked were the x86 ones. (Sigh.)

Bisected mips and reported upstream, arm is still a qemu problem, sh4 has a qemu issue _and_ something with the kernel. In theory sparc is fixed now in qemu, but it hangs after "freeing kernel memory"...

Ok, let's bisect sh4: 3.8 worked (with appropriately patched qemu), 3.10-rc5 doesn't, so... [grind grind grind...] commit 4f73bc4dd3e8 added CONFIG_TTY, which defaults to Y when hidden, and is revealed by CONFIG_EXPERT. Which hits sh4 because that target's kconfig forces on CONFIG_EXPERT because they're "special".

Wheee. Coming up on 3am, definitely bedtime.


June 7, 2013

The Minneapolis airport to St. Pete shuttle (different city than St. Paul) gives me about an hour of time with my netbook. It's also a $65 round trip that only takes me halfway to my sister in New Ulm, she still has to drive 45 minutes each way to pick me up. But eh. Beats two hours each way.

Catching up on my email, which is a largeish time commitment. The python script I wrote to sort everything into folders only runs when balsa restarts (because balsa gets confused when I modify the mbox files while it's running and balsa's internal mail sorting facilities, at least in the version in Ubuntu LTS, do not work at all). Since I tend to have multiple reply windows open as pending todo notices (often "finish this email when I have net access and can look something up and/or include a link"), and balsa can't remember those between restarts the way kmail could, I've fallen into the habit of reading the big unsorted inbox. On the bright side, this means I'm at least skimming linux-kernel and qemu-devel and such promptly, since it's all mixed together and I scroll past a giant unthreaded list of everything in the order it downloaded.

The downside is it takes a little over an hour a day, based on the fact I just ate a 4 hour block (since I missed the noon shuttle and the next one's not until 4pm) catching up on half a week. (I also scan my actual addressed-to-me inbox when I _do_ get to restart balsa and run the sorting script, to see if I missed anything. Half the time this means "ubuntu crashed and I had to reboot my netbook".) I also check the web archives for toybox and aboriginal to see if I have upcoming messages there I need to dig towards.

Yeah, I need to switch to a new mail client. Thunderbird is dead to me, I'm not going back to kmail, and balsa is crap. The text-only clients don't let me _have_ 15 half-finished replies open in different composition windows. That leaves...?


June 6, 2013

This ifconfig design pondering (thinking out loud, really) is the sort of thing I'd ordinarily post to this blog, but I'm trying to keep it all in one place (and that's the mailing list) so I can link it from the cleanup index page.

Not a huge amount of design work I can do in twenty minute chunks, but...


June 5, 2013

Heard back from the indiegogo tech support guys (yeah, behind on email):

Indiegogo is firmly committed to the safety and security of your online transactions. We work with PayPal Payments Pro to make every effort and use the latest technology to ensure the security of transactions. Safeguards are in place secure the information we collect online, prevent unauthorized access or disclosure, maintain data accuracy and ensure the appropriate use of information. Our powerful Secure Sockets Layer (SSL) encryption technology protects sensitive information from unauthorized parties. SSL encrypts credit card information and address information before it travels over the internet. In addition, credit card numbers and addresses are never stored on our database or servers.

Bottom line, your financial information is safe when you contribute through our site.

My reply:

It's a nice form letter, but it didn't answer my question.

When you log in, your username and password are entered into an unencrypted main page. (It was fetched via http, not https. If I go to https://indiegogo.com it _redirects_ to http://indiegogo.com and strips off the encryption.)

This opens the possibility of injection attacks: the javascript the page runs was transmitted to me in plaintext and techniques exist (connection hijacking) to substitute different contents in an unencrypted page. This modified login page could silently submit a copy of my login credentials to a third party.

If I'm running a campaign raising tens of thousands of dollars, that's a tempting target for theft. With my login credentials, they could change the mailing address, email, bank to deposit the money into...

Do you understand the problem here? You're essentially opening yourselves to a phishing attack on your own website by _not_ encrypting the login page.

Rob


June 4, 2013

I did the initial writeup for an indiegogo campaign and posted it to the toybox list.

This isn't really aimed at end users. (Although I'm happy to take your money; I've learned from Howard Tayler and Amanda Palmer and a dozen years of being a consultant that being embarrassed about taking money is not useful.) But this is really aimed at the various companies I'm told would happily fund something like this. Open source developers are already contributing their time to the project, for which I am thankful, and my "verbose cleanup" series is really trying to take better advantage of their efforts.

Part of this campaign is calling the bluff of the armchair FSF types who INSIST I must (already!) be getting _paid_ for this. I don't have a youtube account to reply to the comment on the video about Qualcomm, which I haven't worked for since 2010 and haven't received a dime from since then. (Of course if they want to sponsor the indiegogo campaign, I'm all for it, but they're not an existing user of Toybox that I know of. They don't ship android devices, they ship components that get used to _make_ android devices, so their linux/android development is all down at the driver and toolchain level, not so much in userspace. Then again I'm trying to set it up to be anonymous, so if they were to put money into it, I don't actually have to know. But I can think of some more likely candidates...)

If the campaign _doesn't_ fund, even at the minimal 3 month level, then all these GPLv3 advocates can SHUT UP. If it funds, yay: I'm working on this stuff anyway, I honestly believe everything I said in the pitch text and am working towards it anyway. This would just give me more time to do so.

The reason the campaign's not up yet is because indiegogo's main page redirects https: links to http: so the dialog you enter your login and password into comes unencrypted over the wire, so on an open wireless connection (my only internet at present) anybody could inject different javascript code into that page to send a copy of the credentials to an arbitrary third party.

I've contacted indiegogo tech support about this. Everything in their FAQ is aimed at securing credit card info for people donating to campaigns, nothing about a campaign's own account being hijacked and different deposit information substituted to steal the money raised...

(Yes, I worry about this sort of thing. I'm trying to write toybox as secure code, you get into a mindset of seeing leaks and exploits...)


June 3, 2013

Started a cleanup page for toybox, tracking the series. I appear to have missed documenting a number of them. Oops. More todo items.


June 2, 2013

I finally indexed the Why ToyBox talk. The following links jump to specific topics in the video. (Sorry about the ads, it's The Linux Foundation.)


June 1, 2013

First weekend to myself in a while, and I decided to take a break from caffeine. I can't do a full detox (which takes a month), but I haven't had any caffeine since I got out of work. I'm on my sixth diet orange soda, with three naps so far today.

I'd hoped to get an indiegogo up today. Maybe tomorrow. On the bright side, I finished cleaning up the "stat" command, am making more progress on "ifconfig", kicked off a build of all targets with current linux git (3.10-rc4 or so) so I can debug that, and have started poking at the Linux From Scratch 7.3 upgrade.

I was also hoping to get 1000 calories of exercise down in the gym, but only did 250. (While watching "Stephen Fry in America". Which is nice, but it's no "Around The World in 80 Days with Michael Palin".)

Spent some of last night catching up on The Daily Show, now catching up on Rachel Maddow. Still way behind on both, but not as far behind.

Yay time off. I go nap more.


May 28, 2013

Returning from a long weekend in Austin, getting to hang out at home with Fade and Camine. (And Pixel, Peejee, and George.) It was marvelous, I look forward to a more sustained presence there in a couple months.

Now, I'm in Louisville Kentucky, which was not on the agenda. Austin to Chicago to Minneapolis, except as Dave Barry noted, earwax on the radar at O'Hare backs up the whole country. They're having storms, so we got diverted to some other airport, which filled up, so we wound up here. The airport is closed, all the planes left already, no open restaurants. (But they do have electrical outlets, so I'm charging my laptop and phone.)

Oh, and no internet. Instead they have Boingo branded lack of internet. (I think this is the Donkey Kong Jr. version of Bonzi Buddy? Southwest offered internet access while we were airborne that's cheaper than this thing, and I didn't buy that either.)

At least my connecting flight in Chicago can't leave either, due to the same storm. It was originally supposed to depart at 9:55 and get me in at 11:15. Now they say the next status update here in Kentucky (not when we get to leave but when they next guess how long we might be stranded) is 11pm.

Finally got in at 2am. Whee.


May 24, 2013

Any sort of radio transmission from your electronic devices is prohibited. Feel free to connect to the in-flight wifi. Southwest just said this.


May 19, 2013

So PC sales are collapsing which means the PC to smartphone transition I gave a talk about is underway. And _that_ means my toybox project has become time critical.

So I'm working out the text of an indiegogo campaign, to see if any of the companies using toybox might want to sponsor me to work on it. (Kickstarter is product-oriented, it gets confused if you're not mailing people stuff. Indiegogo is better at "sponsor me to spend time on X", that's the one The Oatmeal used for the Tesla Museum and Operation Bear Love.)

The hard part is that I'm no longer a recent college graduate living in an efficiency and eating ramen. I have a real house with real mortgage and real plumbing and pest control expenses, supporting a family with 2.4 cats and a wife in grad school, I'm old enough that health insurance isn't really optional anymore (and even with it I've spent several thousand dollars this year on optometry and dental work). Plus when I'm not working for an official employer I pay self-employment taxes (both halves of social security and medicare), and of course the crowdfunding site takes its cut...

A recentish LinuxFest Northwest talk gave a low-end estimate of $75k/year for an experienced full-time developer who _wants_ to work on an open source project, which is in the right ballpark for me. Add in Indiegogo's cut and that I should probably get a functioning build server again (and something better than a $250 netbook to code on) and that makes it $80k for a year.

Would anybody fund that? I have no idea. But right now, I feel guilty that I'm not keeping up with the review and cleanup necessary to merge the contributions coming in (I have now done 16 cleanup passes on ifconfig and I'm maybe 2/3 done), let alone the giant todo list of stuff I want to write. I'm down to maybe 5 hours a week on toybox, and it's showing.

If I go "ok, sponsor me to work on this" and nobody does, at least I don't have to feel guilty about going as fast as I can while a completely unrelated day job eats all my time and energy. (I'll still work on it either way, the question is how much time I have.)


May 18, 2013

Testing 3.10-rc1 in aboriginal. Found the uClibc build breakage. Next up: commit 2535e0d723e4 is the reason the init script wasn't running: support for #! scripts is now a config symbol (BINFMT_SCRIPT) I need to switch on in baseconfig-linux.

The perl removal patches are finally upstream! But Peter Anvin's quest to unnecessarily complicate the kernel build continues, he replaced one with "bc" which is a turing-complete programming language that's so seldom used busybox hasn't got it. His patch broke Linux From Scratch (they had to add bc to chapter 5; in the past 12 years it's never been there). I have no idea _why_ he did this since the C implementation I contributed was actually smaller and simpler than what he wrote. Oh well.


May 17, 2013

Yay! Finally got the first chunk of the giant backport checked in at work. I'm not blocking the rest of the project!

Well, except that when testing in their filesystem, the gnu/dammit version of ls checks extended attributes, and that's a codepath I hadn't tested, which still panics. (Which I can't switch completely off because Red Hat Enterprise hardwires on selinux.) But I can deal with that on monday.

Meanwhile, I get an actual weekend. For the first time this month. (Ok, tomorrow I'm heading to Adrienne's to help her pack so she can move back south.)

Shutting down the various gazillion open windows so I can reboot my netbook, and assembling The Todo List Of Doom. (I leave a tab open with some unfinished thing to remind me to do it. When I'm busy, I accumulate a looooot of unclosed tabs...)


May 14, 2013

Eye doctor appointment this morning. Follow-up from my emergency room adventure last month. They dilated my eyes, examined my retina, confirmed there wasn't retinal detachment or blocked blood vessels or anything obviously physically wrong with my eyes.

Just like the emergency room people did, only moreso. The doctor explicitly told me she couldn't tell me about anything neurological, because that's not her area.

Sigh. My hypochondriac streak has fixated on the word "cancer" (which I encounter ~20 times/day on twitter, in ads, news about Iain Banks and Angelina Jolie, people on my feed doing fundraisers against it...), but this isn't reassuring about the "natta tumah" thing. Oh well. At least I got a new glasses prescription; they estimate it'll be ready in about a week.

Oh, and the ambulance bill for being taken two blocks arrived. $1600. Of which the insurance company covers nothing, but they've deigned to apply $400 of it to our deductible! (They don't have to apply the expenses we incur to our deductible, _this_ is an area we have to argue with them about. So of course Obama ignored single payer in favor of more middlemen from the insurance industry involved in our health decisions. Wheee...)

If I'd known what was going on, I'd have gone home. If I'd been thinking clearly I'd have called a taxi instead of an ambulance. If I'd known what my insurance would be like I'd have said "I'm paying cash"...


May 11, 2013

Thought I might get a couple hours on toybox or aboriginal while the niecephews were at the pool, but Sam doesn't turn 6 until October and in order for her to swim, I have to swim. (Cue Captain Picard shouting "There are four lifeguards".)

On the bright side, in dayjob-land the filesystem I'm backporting through 4 years of kernel development (Pointy Hair Linux is still using a 2.6.32 kernel) now compiles! And mounts! And segfaults when you list the directory contents!

So that's something.


May 6, 2013

Day job eating life right now. No brain left for anything else.

(Caught up on Dr. Who up through the C.S. Lewis tribute Christmas special though, via the netflix app on my phone. Sadly, this required less brain than it should have, but then I still hold it to the standards of the period when Douglas Adams was script editor.)

(Also still vaguely annoyed that the picking at early canon Moffat's doing is completely ignoring Susan. The Doctor did not leave Gallifrey alone, he took someone away and hid her on a planet he's still guarding. The one time she returned to Gallifrey (in The Five Doctors) she pretended to be just another companion and left quietly with him; it was her homeworld and she said _nothing_. Anybody else see a red flag in that? Moffat could trivially say Susan founded the time agency Jack Harkness came from, but no, we get River Mary Sue Song...)

Ahem.


May 3, 2013

I need to extend the option parsing infrastructure to include FLAG_longopt #defines for each long option. Doing this in shell script with sed is horrible, so I'm pondering writing some C to include lib/args.c and leverage the existing C option parsing infrastructure to output its own macros.

Problem: what to do about flags that are configured out? Right now the shell script is #defining those to 0, which makes if (toys.optflags & FLAG_x) become if (0) and drop out. If I replace the shell script code doing that, the C code doesn't naturally do that either. Since USE_BLAH() macros are done by the preprocessor, it's actually really hard to get C code to deal with both at the same time.
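
In other words the generated output needs to look something like this for each flag (a sketch; the config symbol name is hypothetical):

// Sketch: when the option is configured out, its flag becomes the
// constant 0, so "if (toys.optflags & FLAG_x)" compiles to "if (0)"
// and the optimizer deletes the whole block. Getting plain C (rather
// than sed) to emit the #else half is the hard part.
#ifdef CFG_COMMAND_X
#define FLAG_x (1<<2)
#else
#define FLAG_x 0
#endif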

Which means I need to build the C executable twice, with two different preprocessor configurations, and feed output from the first one into the second one...


May 2, 2013

Sigh. Caught up on email before going to bed, and there's a mount.c submission, from someone using toybox in a product and who needs mount.

I've had an unfinished 100 lines of mount sitting around for months, with no progress on it. Instead, I've been spending what little coding time I can get doing things like cleaning up ifconfig.c (which I'm maybe halfway through). If you got the impression that cleaning up giant external submissions like that is slower than writing new code from scratch, you're not alone.

The new mount submission is over 1400 lines long. The one I wrote for busybox was 600 lines. This is a command I've already written before. It would be faster for me to ignore it and finish mine.

But I haven't had time to do that. The fair thing to do would be to merge this, discard the work I've done, and clean up this thing. Except I haven't had time to finish cleaning up the last giant code dump yet. Or the half-dozen before that...

Instead, I spent today backporting vanilla kernel.org code from an obsolete 3.0 kernel to an obsolete 2.6.32 kernel, because we need 9p working in centos 6.4 and it only supports 9p2000.u not 9p2000.l like SLES does. Nobody will care in 5 years, but we can't wait for Centos 7.0 to ship, and until then Red Hat will continue to ship a kernel that's already seventeen releases out of date.

I'd love to take a year off and work on toybox full time, but I can't afford to. I got married, we moved to a real house with a real car, and she's getting a doctorate. That lifestyle comes with expenses...

I hate blocking other people. I also can't help but be envious that they get to work on toybox at least somewhat in the context of their jobs, and I don't...


May 1, 2013

Monday: two very attractive women sunbathing outside in bikinis. (Dunno if they've been doing this and I only noticed because I took the day off and the break room with internet access is surrounded by windows.)

Today, it is _snowing_. (Not sticking on the ground; it's been wet, grey, and overcast all day, and I'm not the only one who noticed when it switched to flakes after sunset.)

Jonathan Coulton's "First of May" song might possibly have presented a very slightly idealized take on the situation, is what I'm trying to say here.


April 29, 2013

Taking the day off from work. The 3.9 kernel shipped and I need to catch up with open source stuff.

Plugged 3.9 into the aboriginal build, tweaked the patches, and built the quick and dirty version of all the targets, ala:

NO_NATIVE_COMPILER=1 more/for-each-target.sh './build.sh $TARGET'

That's a small enough subset my netbook can churn it out in a finite amount of time, and: arm works, mips works, sparc sort of works ("ls -l /dev" hung once, but ran on the reboot? Hmm...), and the x86 targets work.

That leaves powerpc, sh4, and mips64. (And m68k, but qemu still doesn't support that well enough to test and I haven't bothered to set aranym back up yet.) Of those, sh4 dies before producing any output, quite possibly a qemu issue.

I also need to get initmpfs ready to resubmit. There are conceptually three parts: base initmpfs support, rootrdflags=, and the ability to configure out the root= fallback code.

But mostly, spent the day banging on toybox...


April 28, 2013

Managed to screw up my local toybox repository again today. (Doing "hg import -f" will suck up changes to your working store, even in very different parts of a file that only has a one line modification. You'd _think_ it would bypass it and just apply changes to the database, but no. And if you do two of them at once, "hg rollback" won't undo the previous one; it's now a permanent part of that repository, because Matt believes that rollback should only do one level, otherwise demons fly out of your nose.)

(The way to fix it is to clone the copy on the website into a new directory, do the hg import -f into _that_ copy, and then substitute the .hg directory out of that for the one in the old repository. Or at least that's what I did. So far no more nasal demons than usual.)

Meanwhile, I heard back from the first of the tinycc developers about qcc: Daniel Glockner read my old blog entry on the qcc triage todo list and he does _not_ want any of his code used under BSD license terms, so that's 295, 306, and 307 to remove from tinycc. (He added the ARM support, but the point is to replace all the code generation with qemu's TCG anyway, so not a big loss. The ELF plumbing and predefined #ifdefs need to be redone at some point, but it needs to be genericized with a table for all supported architectures.)

But that's after toybox 1.0, and presumably after I switch Aboriginal to musl, toybox, and llvm. And do my own distcc implementation that doesn't assume falling back to compile on the host is something to do on a regular basis.


April 26, 2013

Behind on everything, and this weekend I visit the niecephews again. In theory the 3.9 kernel drops this weekend, no idea when I'll get to catch up...


April 25, 2013

Why is distcc making zero speed difference building on my netbook? I kick off two parallel builds in two different windows and they advance at exactly the same rate. What, some kind of strange cache effects? Did I break distcc again in a way that it's totally refusing to tell me about because visibility into what distcc is doing has always sucked mightily? Has the emulated network become a bottleneck that's coincidentally slowing it down to EXACTLY the speed of the non-distcc version?

Sigh. I need to write a distcc replacement. Something that actually _forces_ compiles to be distributed and _never_ falls back to a local build, rather than invisibly deciding not even to try to distribute it for inscrutable reasons.

But not today...


April 24, 2013

I'm disappointed that our "Democratic" president is to the right of both Richard Nixon and Ronald Reagan, but until the baby boomers stop being 24% of the population I can't see it improving.

I respect people searching for the truth, but have a problem with people who claim to have found it. For one thing, the truth changes: we fought World War II against Germany and Japan, which are now our allies. Statements like "Our problem is X", "We must do X", "It's useful to know X", "The highest moral value is X"... they have a shelf life.

Let's take the current universal moral freakout: Traditional Marriage. Before you get all bothered about the fact that the only people who actually seem to want to get married anymore are gay, remember that the institution of marriage predates paternity testing, contraceptives, the industrial revolution, electricity, literacy, and most women _not_ dying in childbirth before the age of thirty. "Till Death Do Us Part" averaged less than 20 years before antibiotics, vaccines, blood transfusions, ambulances, defibrillators, refrigerated and canned food, epipens and benadryl for anaphylactic shock, tylenol to bring down fever, inhalers for asthma, surgery for appendicitis, snakebite antivenom, splints and plaster casts so a broken leg doesn't leave you lame for life, prescription eyewear to prevent "legally blind" from being effective reality, and so on. People in developed countries don't die of heatstroke or freeze to death much anymore, are seldom eaten by wolves, and can generally avoid scurvy, beriberi, or going blind due to vitamin A deficiency. Rats, roaches, mosquitoes, fleas, and worms are things we actually notice rather than just expect about our persons.

Indoor plumbing all by itself is a huge deal: nobody in my neighborhood died of dehydration or cholera this year. I can't currently smell urine or feces. I can't smell myself either, and despite the snow outside did not risk hypothermia in washing any portion of my body.

The point is: this is a recent development. Men used to die of random banditry when they didn't get sucked en masse into a pointless border war or flattened by Genghis Khan or Xerxes or Napoleon or Stalin. (Not saying we've solved that last one, but nuclear weapons have at least kept the scale down: US casualties in Vietnam were about 1/5th of the number of Armenians killed by the Ottoman Empire during World War I.) Women died in childbirth _a_lot_. And then a plague would come through and kill yet another quarter of the population. "Till death do us part" was in the ceremony because getting remarried after your spouse died (and then having more kids) was normal: fairy tales full of stepmothers, stepfathers, and orphans aren't just common because it's a useful plot element, they're common because it was a common condition.

So saying "How DARE you question the rites of marriage handed down to the ancient greeks by Zeus himself! People divorcing after twenty years rather than dying after fifteen is clearly a sign of moral decline, Odin will strike us all with thunderbolts!" Yeah, not buying it. Used to serve a purpose. But we've moved on. Religion gave us dietary laws before we understood nutrition, parasitology, crop rotation, germ theory...

Now that we know what causes malaria, have public schools providing minimal day-care, can actually plan pregnancies and tell who the father is after the fact if we want to, and aren't generally inheriting a plot of land for generations to subsistence farm on (often as a peasant under a local warlord with a big knife), lots of people find marriage less useful. We've found new reasons to keep at it: tax advantages, joint bank accounts, hospital visits, and so on. But the real reason is cultural inertia, and tacking extensions on to a ceremony dating back to the stone age based around buying a woman from her father with sheep? Legitimately questionable if this is the best way to go about meeting society's current needs.

I also note that denialism is not searching for the truth, it's another way of being certain of an existing answer. Specifically it's saying "I know that this is wrong" and then changing your theory about how or why each time your old theory gets debunked. It's just another way of NOT questioning your existing beliefs to see if the world's moved on without you, as the world tends to do.

This is why science makes predictions. Waking up to find your swimming pool empty and guessing that invisible pixies drained it is a very different statement than saying "tonight, invisible pixies will drain the swimming pool" while it's still full. You can then wait and see what happens, and the statement is capable of being proven wrong. You have a way of noticing when the statement is not true, and that's the heart of science. If you only guess about what you already know, you can't tell when you're wrong.

So this is my problem with the baby boomers. If you search for the truth, there's a danger of thinking you've found it, at which point you stop questioning your beliefs, the world changes out from under you, and you slip into denialism about challenges to your existing belief. (This is without even getting into "smartphones have rendered wristwatches irrelevant for anyone under the age of 30, and you're telling me how great vinyl sounds" levels of inertia, but just sticking to the big important stuff.)


April 23, 2013

If VLC has focus and you type "make", you 1) mute the audio, 2) screw up the aspect ratio, 3) screw up the audio delay, 4) pause the playback by advancing one frame.

Typing "make" in the window again does not fix it.


April 22, 2013

My phone has bluetooth, but apparently the OS upgrade made the bluetooth file browser go away? Or did I only have that in the old Nexus One and not the Galaxy S?

Either way, it goes "Bluetooth! I can use that to have the laptop display sound, because its speakers are even worse than the phone's!" And that's about it.

Bravo, Google. Bravo.

(Ah, I have to download an app to do what I thought would obviously be built-in functionality. Hmmm... I wonder if this is enough to do podcasts now? Got a recording app, got something to get the files off the device, I need to dig up a screencasty thing...)


April 21, 2013

Passing comment from someone that Bill Gates dropped out of college so why can't they?

Apart from the fact his father was a lawyer and his mother was on the board of directors of the Red Cross. William H Gates III ("Trey" to his friends) really didn't have to worry about starving to death if he dropped out of the college his parents were paying for.

Just a thought. There's a _reason_ that Steve Jobs needed venture capital funding to launch Apple (costing him control of his company in 1984), and "Trey" didn't to launch Microsoft (he still owned over 40% of all Microsoft stock well into the 90's).

Yes, this sort of detail is important. As we saw when Jobs came back, when the two went head to head Jobs walked all over Gates. Repeatedly. But when Jobs said "Apple II is dead, Macintosh is the future" in 1984 (after the Lisa flopped), his board of directors rebelled and forbade him to divert resources away from the company's cash cow to invest in a sequel to the Lisa, eventually taking all authority away from him when he didn't listen. Gates's pet engineers didn't even BADLY clone the Macintosh until 1990 (and then only because an engineer working on his own initiative surprised them and forced them to change direction), and didn't catch up until 1995. Apple squandered a ten year head start and came crawling back to Jobs, who handed over the Macintosh sequel he'd finished the previous decade, and renamed it the iMac. Meanwhile Jobs had bought Industrial Light and Magic's digital effects arm when George Lucas sold it cheap after the original Tron flopped in theatres, renamed it "Pixar", and turned himself into a Hollywood movie mogul who eventually wound up the largest shareholder in Disney.

Of such details was the computer industry forged...


April 20, 2013

Whenever I reboot my netbook I lose buckets of state: open command line windows with half-edited files, todo lists, just directories that remind me "oh, this thing you were doing". Balsa won't save its state so I lose windows where I hit "reply" on a message but haven't composed and sent it yet...

Unfortunately, during one of the "apply package updates", Ubuntu swapped out its upstream crypto certificates, deleted the old ones, and needs to reboot to use the new ones. (I think that's what happened.) So I can't do any more package upgrades until I reboot, and that includes installing "debootstrap" and "debuild" so I can once again dink at bootstrapping debian under Aboriginal.

(Well, I could tell it to install packages from untrusted sources, but if the reboot doesn't fix that it's reinstall time.)


April 19, 2013

Huh. If you google for "toybox linux" the first hit is the toybox about page and the second's the news page... but it says "toybox - Rob Landley". Which is odd, because the <title> tag just says toybox. Where is it getting that? (The top level landley.net page's title is "Hello world!". The only mentions of my name except in release notes are on the license page...)

Sigh. Google is trying to help out in the kitchen by dropping an egg on your shoes and getting flour everywhere. Google is helpful in ways you didn't ask it to be! How do I pat google on the head and try to distract it so it stops being helpful...


April 18, 2013

And _now_ Texas LinuxFest gets back to me, asking if I can do my proposed "why the GPL is dying" as a lightning talk instead of a full slot.

Alas, when they didn't get back to me by their own deadline (the 15th) I booked tickets to visit my family in Austin the previous weekend, where I get that Monday off anyway and can spend all the time at home.

No biggie, Eben Moglen just covered the topic, and I already did three minutes on it in the GPL section of my ELC talk. (Yes, you can link to a starting time in a youtube video. :)


April 17, 2013

I mentioned earlier how I needed to teach people how to clean up code to my standards, and I've been trying to do that. I've been cleaning up pending commands in stages, doing one evening's worth of work and checking it in, then posting a message explaining why I did that.

I started by explaining the start of ifconfig cleanup. Each post links to the mercurial commit so you can see the diff, and describes what the hunks of the diff do. I described what happened in commit 843, then commit 844, then the next one that touched ifconfig was commit 852, and so on.

The first cleanup I checked in in stages was uuencode. I went back and described those stages in three posts (one two three).

My descriptions since then are tagged with [CLEANUP] in the title, so if you want to follow along in the archive, those are the posts to read. (Date view may be the easiest way.)


April 16, 2013

Sleep. Sleep would be good. I should try that sometime.


April 10, 2013

Somebody asked if I'm still working with funtoo, and I answered negatively. But perhaps I should give more context.

I started trying to bootstrap Gentoo for Aboriginal Linux years ago, but unfortunately the Gentoo project was badly damaged by Debian refugees landing on it around 2006.

The Gentoo community got squashed by a flood of Debian developers fleeing their project's paralysis during the "debian stale" years (3.0 in 2002, 4.0 in 2007) and brought the acrimony with them. When Ubuntu bled off the pragmatists to a new fork, the FSF zealot idealists left behind fought each other to a standstill (pearl-clutching about iceweasel and such), the project nearly constipated itself to death, and a subset of its developers went "hey, gentoo's doing actual engineering without endless flamewars... well it was before _we_ got here, now it's another perpetual flamefest. Heh. Go figure." They bogged off after a few years (Ubuntu started sponsoring permanent Debian developer positions to get its parent project unclogged) but Gentoo development never really recovered because Debian snuffed out the sense of community it used to have.

I was hoping Funtoo could provide the condensation nuclei for a new community, but it didn't work out. (For me. Your mileage may vary.)

On a technical level, I've poked at gentoo bootstrapping a number of times and found some deep technical problems with it. The immediate problem was that nobody in the surviving community really understood Stage 1 anymore, or was willing to explain it if they did. They'd reverted to a "gentoo builds under gentoo" model where setting up portage and a gentoo base build environment was black magic. What documentation they did have assumed you already knew what you were trying to look up.

But when you dug under that, every gentoo ebuild file explicitly lists (in its KEYWORDS= variable) every architecture that package supports. Meaning if you want to support a new target (as the Qualcomm Hexagon guys were doing), you have to touch every single ebuild file in the entire tree, to add your new architecture to all of them. (Or, as they did, just make the x86 architecture an ancestor of your non-x86 architecture. Which is cheating, but the build system was too broken for any clean solution other than a complete rewrite.)

This completely unnecessary design assumption runs directly counter to what I'm trying to get Aboriginal bootstrapping to do. I'm building packages for whatever the current host architecture happens to be. I want the build to be architecture-agnostic, but just resolving the portage manifest means following a symlink to an architecture definition file that #includes a bunch of sub-architecture definition files that eventually get around to including individual ebuild availability lists. They make a big deal about having a top level portage configuration file in /etc that you specify your architecture tuple and such in, but this turns out to be a thin layer of tuning on top of baked-in assumptions throughout the portage tree.

Their entire build is designed around NOT letting me use it in any sort of flexible manner. Even _more_ so than rpm or dpkg based repositories where building from source wasn't a priority so they didn't put so much effort into imposing One True Way to do it.

By the way, Google's Chrome OS (the not-android thing they have fighting a civil war in-house with the android guys) is Portage based. As outsourced to Canonical and then brought back in-house. I couldn't wrap my head around it when I tried, it was kind of horrid. But the reason it was horrid is forking gentoo and doing your own portage-based build with different packages was hugely non-obvious. (So they did a preprocessing layer that... let's just say the One True Way got defeated. By brute force leaving piles of debris everywhere.)

So yeah, I had high hopes for Funtoo throwing all that out and starting over, and Daniel Robbins was working on it and I was trying to help until the irc incident. Except the "throwing it out and starting over" never quite happened because their community wasn't big enough, so they decided to leverage the existing portage tree from gentoo. And then never got away from it, that I saw. Oh well. (Perhaps they've made great progress since I stopped paying attention, and I just hadn't heard.)


April 6, 2013

A longish recounting of Friday's adventure. Feel free to ignore if you don't want to hear about my aches and pains, but I had a stressful thing and need to vent.

Living in the habitrails of St Paul, everything's right next to each other. My commute to work is slightly longer vertically than horizontally, and my dentist is at the end of the hall going the other way.

Around eleven I had dental work, an hour and a half of prep work (cleaning and fluoride varnish and stuff) for all the cavity filling they have to do over the next few weeks. This involved an anesthetic mouth rinse that didn't anesthetize as much as it should have, but then I have a history of weird reactions to anesthetics. (Nitrous oxide makes me feel like my entire body's being electrocuted, for example.)

This ended shortly before my 1pm meeting, and I still had a hard time talking in said meeting, then headed to the convenience store for some bananas and an energy drink (I'd missed lunch).

I stopped by my apartment's break room on the way back from the convenience store to check my email, so I was looking at black text on a white screen when my vision greyed out for a moment (like I'd stood up and wasn't getting enough blood to my brain), and when it came back a blob-shaped quarter of the screen was just white with no text on it. It moved when I changed where I was looking, and it was there symmetrically in both eyes, so I knew it was a brain thing and not an eye thing. (And that it was basically "I can't see this bit but my brain is treating it like the blind spot everybody has in each eye and editing out the bit I can't see", stretching the edges and doing a flood fill or something.) Deeply freaky; I stood up to look out the window and see what something well lit but non-computer would look like, and just walking around a bit made it clear up.

Even though the problem cleared itself up in maybe 30 seconds, I went next door and asked the dentist's office where the nearest hospital was. They showed me a building three blocks away out the window, and when I said I wanted to walk rather than take a cab one of them escorted me to the parking lot. (The habitrails do not go to the hospital, it involves going outside.)

I walked about a block, and the area of my vision that had been glitched before started to re-glitch, only this time it was a sort of crosshatch texture overlaid on what I was seeing, which got more opaque and expanded off to the left. Even though walking fixed it last time, this time it seemed like walking was making it worse. (I was thinking "is this a blood vessel is blocked kind of stroke, or a blood vessel is leaking kind of stroke".) At this point, I stopped under an intersection street sign and called 911. A thousand dollars to go two blocks vs chance of permanent blindness: I'm paying the money.

While waiting for the ambulance to show up (fire truck showed up first for some reason), the visual distortion got worse, growing into a giant crescent shape taking up the left 2/3 of my vision in both eyes. It was bad enough I couldn't read anything (not even the really big letters in the advertisement on the window across the street) except out of the right side of my eyes. And here I am thinking "if I can't read, I can't work", and that if it was a blocked blood vessel they could use the clot busting drugs any time in the next hour, and if it was the rupture sort of stroke lowering my blood pressure might slow the advance while I still had some vision left so I was sitting down and trying to stay as calm as possible.

So the ambulance shows up and starts asking me which hospital I want to go to. What on earth kind of question is this to someone who THINKS HE'S HAVING A STROKE? I told them I'm from out of town, don't know my options, that I'm aware of the research about hospitals in the same city costing ten times as much for the same thing but I am not an informed consumer here and just take me to the nearest one that can deal with the symptoms I'm describing please. (I might have been babbling at this point but _dude_.)

So we get to a hospital, and they wheel my gurney into a hallway. By this point, my vision has cleared up a bit, meaning I'm afraid to move because walking made it worse and lying still made it clear up and I had no idea what was going on and if this might just be a precursor to a sudden BIG stroke or what. Yet another person asks me what's going on and I describe my symptoms, trying not to use words like "fovea centralis" because that's their domain and not mine and I'm HAPPY to leave this to the professionals.

At this point, they do a bit of triage. The ambulance guys had checked my blood sugar and blood pressure (both fine), but a nurse came and checked my blood pressure again. (Yes, if I'm having a leaking blood vessel in my brain by all means constrict my arm to force more blood to my brain, thanks. I guess the ambulance guys didn't pass on any data?) Somebody did the "can you track my finger" test, which I probably could have done even if I'd still had just the 1/3 vision while awaiting the ambulance so I'm not sure what it proved. (Not completely blind yet, nope.) A receptionist came and took my driver's license and insurance card. And that was it: I sat there.

I spent the next half hour in that hallway while they tried to clear a room for me to go into. In the first room they told me to change into a hospital gown, then gave me three very small blankets because it was really cold in the gown. (Still Minnesota.) Then they moved me to a second room, because there was some unspecified thing wrong with the first one. (During this move my glasses got lost for about half an hour.)

After some more waiting in the new room, a nurse and then doctor finally showed up to actually look at me. (I'd been there about an hour at this point.) Since I was having a neurological problem that presented symmetrically in both eyes, they looked at my eyes. Checked for retinal detachment (none, good to know), later wheeled in a machine to check for glaucoma (none). Wanted very much to do the eye chart test with me but couldn't until my glasses turned up.

An early test the doctor did involved shining a bright light in my eyes, and the afterimages didn't want to go away, and after a couple minutes they started flickering and wobbling in a fairly disconcerting way and I went "oh no don't start again" and the doctor wanted to know why, and I said "I hope it's not doing it again". So she left. A minute or so later I hit the nurse call button because it was doing it again, but this time spreading out to the _right_ side of my vision. Luckily the left side had recovered to be nice and sharp and clear and high resolution again, so I wasn't as panicky as I could be because there was a reasonable chance this would recover the same way eventually. But I was trying to explain to the doctor that I could hold my fist up and not see it, and she kept checking the edges of my vision where I could start seeing her wiggling fingers, and I'm trying to explain "there's a large hole in my vision but it's not AT the edge, I can see around the side of it, the test you're doing is not helping" but she kept performing the useless test anyway. (That pretty much sums up the entire visit, actually.)

Sometime after that the nurse showed up with the glaucoma test machine, and around then I found my glasses (they got mixed up in the blankets) so they could do the eye test. (Reading the eye chart with the "E" at the top, which I try not to memorize, but reading the same chart with the second eye is just SILLY.) During all this, the vision on my right side gradually cleared up the same way the left side had. Still no MRI or CAT scan in the offing looking at the bit going funky, they kept investigating my eyes, which even I could tell were clearly not the source of the problem.

At about this point I went from panicked to bored, and started trying to diagnose myself. I knew before I arrived that it was a brain problem not an eye problem (I have two different eyes and this was presenting exactly the same way in both at the same time), so every single test they'd done so far was useless. And it was seeming less like a stroke because strokes generally don't move around like that, the damaged blood vessel corresponds to a specific physical area of the brain. And this was all over the place, and then cleared up again fairly quickly and completely (again, not stroke-like. A blocked blood vessel can unblock, _maybe_ a clot was moving around and re-lodging somewhere else, but... not likely.)

About this time I started to get a headache. It was just like the headaches I've had on the right side of my head for years now (which I've been blaming on sinus troubles), except this one felt like something was stabbing the back of _left_ eyeball, which was new.

This made me remember my friend Adrienne's tweets about having a "scintillating" something visual effect preceding her migraines. She had the scintillating whatsis once and had to pull over while driving. So I used my phone to look it up and the wikipedia picture actually looked a bit like what I'd been having. Which came as something of a relief at that point, because I'd BEEN THERE FOR THREE HOURS already, and they hadn't even run any tests that would be relevant to anything going on in my brain, and even if they started at this point if I HAD been having a stroke it would probably be too late to avoid permanent damage. Strokes are one of those "every minute counts" things, I headed straight for the hospital when I noticed something wrong (walking three blocks is faster than waiting for a taxi or ambulance), I called an ambulance to go two blocks rather than exacerbate a problem that walking seemed to be making worse, and then when I got to the hospital I sat for three hours while they twiddled their thumbs and tested things I'd been able to rule out before heading for the hospital in the first place.

So at this point it didn't seem like a stroke anymore, and even if it had been, it would be too late to do much about it. (Stroke, like heart attack, is a "fix it before cells die" thing. Waiting three hours is bad. Although cell death through resource exhaustion is different from autolysis, so no idea what the actual time limits are.) The hospital itself was being completely useless, so I got back in my clothes and walked out to the receptionist to ask how I check myself out.

This made them get a senior doctor to come talk to me. He agreed that scintillating scotoma sounded like a reasonable interpretation of my symptoms. He said he'd like to run an MRI anyway, but agreed it didn't seem like a stroke. After three hours of stress I just wanted to go home, so he suggested I get a second opinion from a follow-up physician and see if they thought I should get an MRI. And he scheduled me a follow-up not with a neurologist, but with an ophthalmologist.

Sigh. IT IS NOT AN EYE PROBLEM. IT IS A BRAIN... right.

Adrienne picked me up from the hospital, and I slept on her spare bed for ten hours longer than I'd planned (we agreed I should be observed rather than alone, and I was _TIRED_), and the next day we went to visit my sister like I'd been planning to do before all this. I lost half a day of work, but as far as I'm concerned that was a marvelous outcome because I DIDN'T GO BLIND. Yay not going blind.

I am disappointed in the hospital. I have lots of random trivia passing for knowledge, just enough to be dangerous in all sorts of areas, but I am NOT a domain expert outside of some small niches and have great respect for domain experts. I'm often the guy you call to figure out where a computer problem lives so you can call in a specialist to actually fix it. Having to diagnose myself in a medical context is CREEPY and WRONG. The fact that I could was just lucky. The hospital being so woefully understaffed that if something had been going seriously wrong I'd have been screwed was not reassuring.


April 5, 2013

That was a far more interesting evening than I expected.

I had a thing called a "scintillating scotoma", which is a visual glitch that can precede a migraine, and makes you think you're going blind because half your visual field looks like television static viewed through plastic wrap. But it's not a stroke, just blood vessels dilating wrong, and thus transient. Thank goodness. (I had two of 'em, each lasting about half an hour, with an hour gap in between.)

It took about 3 hours to diagnose this. (Hospitals around here are _really_ understaffed.) Never had a migraine before, I'm guessing it was a reaction to the dental anesthetic but honestly it could be anything.


April 2, 2013

Got an Aboriginal Linux release out.

It was qemu. 1.2.0 built all the targets reliably. Sigh. Had to go back _two_ release versions to make the funky intermittent bug go away.


March 31, 2013

Sigh. Conflicted. On the one hand, I love having contributors to toybox. I am _grateful_ that people are interested enough in the project to contribute to it. On the other hand, my past week's hobby programming time (what little time and energy I have left over after work and life, plus the demands of Aboriginal Linux, kernel documentation, keeping vaguely informed about projects like qemu, musl, linux from scratch...) went to cleaning up the uuencode/uudecode submissions from Erich Plondke.

Erich is a good coder. (I met him working at Qualcomm, he's the lead architect of the Hexagon chip.) He's the kind of developer any project is lucky to attract even passing attention from. But I still took the submitted uuencode from 116 lines (2743 bytes) in 7 functions to 67 lines (1440 bytes) in 1 function, and uudecode from 175 lines (4534 bytes) in 9 functions to 107 lines (2300 bytes) in 1 function. I know I'm _not_ a better coder than him, so the obvious answer is that he simply didn't care about that sort of thing.

Of course the question is "should I care". The submitted uuencode and uudecode worked fine, and I've saved what, a single digit number of kilobytes in the generated binary? (I haven't checked.) I _think_ my version is easier to audit simply because there's less of it, but the previous one wasn't hard to audit either. The obvious question is "was spending a week on that worth it?"

But the reason I'm doing toybox is to do a _better_ job than what's there. I want to produce the best implementation of each of these commands that I can. So how much of my definition of "best" is an illusion?

I just got an ifconfig submission, originally touching something like eight files (adding half of them to lib, two under toys, and touching some top-level headers). I spent the time to glue it together into one big file and threw it in pending, but I don't even know where to _start_ cleaning it up. Just like I haven't even started cleaning up xzcat. (It's not that I can't, it's that it'll take a couple days just to _triage_ this properly.)

But... presumably it works as-is? I could just... leave it? Cleaning up two small, simple, already fairly clean commands just took a week. Not a week of actual programming time, but a week of the time I had to work on it. I haven't gotten back to "mount" (which I've been meaning to find a week to knock out since last June), because every time I sit down there are other things to do. I have two contributed syslog implementations and that's an _easy_ one once I figure out the design issue (nothing specifies the log levels, are they actually stable on Linux or can I harvest 'em from the header during configure or what? I got the signal names dealt with...) The find submission is an easy cleanup. Somewhere I have links to clean gzip/gunzip implementations I should integrate. Then I've got to implement bc to clean up after Peter Anvin's ongoing quest to complicate the Linux kernel build. I was working on test, _and_ got a submitted test that's probably only 3 times the size of the one I'd write myself...

Sigh. I'd like to delegate all this to somebody smarter than me. I know they're out there. I'd love for somebody else who's a better coder than I am to submit things that I honestly can't figure out how to improve because they're THAT GOOD. I don't want to discourage contributors. I don't want to be a bottleneck in development. I just spent a week taking code that worked fine and making changes to it most people will never even notice.

But I notice. After I left busybox, Denys dismissed this sort of thing as a difference in coding style, and maybe that's all it is. I thought I was talking about the "ifdef considered harmful" paper and the linux-kernel coding style guide about avoiding ifdefs in the C code, and that doing #if ENABLE defeats the purpose of the ENABLE macros I introduced. I guess the main difference between him and me is I thought that was important, or at least interesting. But what if all I'm doing is making the code aesthetically _pretty_ to exactly one person and nobody else?

But I can't _not_ do it and still have any sense of direction for the project. I guess if I'm wasting my time, it's mine to waste. The FSF zealots made sure I wouldn't get paid for any of this, and thus be able to afford to spend more time on it instead of working around a day job doing unrelated stuff. If the result's an idiosyncratic art project: oh well.


March 28, 2013

One of my perl removal patches replaced the kernel's timeconst.pl with a C implementation, using the same makefile infrastructure that generates CRC tables. The patch series has been doing that since 2010. Before that it replaced it with a shell script.

So of course once I finally started getting attention for the perl removal patches, H. Peter Anvin replaced his perl script with a "bc" implementation, because his life's goal is to insert extraneous dependencies into the kernel build and solving this problem with C just wouldn't _do_.

Nothing else uses bc. When I say "nothing" I mean the Linux From Scratch guys had to add "bc" to build the new kernel, because the rest of the LFS packages never once used bc for anything. Busybox is 15 years old and nobody's ever SUGGESTED it implement bc, that I'm aware of.

But technically it's posix. A turing complete math language that supports arbitrary precision fractional exponentiation is the _perfect_ thing to do a couple 64 bit divisions with. I mean it's such an obvious choice.

So now I'm trying to figure out if I should keep patching Peter Anvin's insane overcomplexity out, or implement bc in toybox. Hmmm...

Alas, doing a 64 bit math version of bc would solve the problem and probably nobody would ever notice the difference, but the _point_ of bc is to do arbitrary precision math. I know how to do it for the four basic math operations but implementing "raised to the power of .3176252" I need to look up how to do. (Wikipedia[citation needed] probably has an opinion. Might even be right, if I catch it at a lucky moment in the edit wars.)
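
For the record, the usual trick is to reduce it to exp and ln: x^y is e^(y*ln(x)), which is also how you do fractional powers in bc itself (the -l library provides e() and l(), implemented as series approximations). A throwaway double precision illustration of the identity, not the arbitrary precision version (that part, carrying the series out to the requested scale, is the actual work):

#include <math.h>
#include <stdio.h>

// Build with -lm. Both lines print the same number; an arbitrary precision
// version implements exp() and log() as series carried out to the requested
// scale, which is where all the effort goes.
int main(void)
{
  double x = 2.0, y = .3176252;

  printf("%.15f\n", pow(x, y));
  printf("%.15f\n", exp(y * log(x)));

  return 0;
}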


March 27, 2013

Slowly cleaning up Erich Plondke's uuencode/uudecode submissions, and I hit a weird one. In uuencode, the "historical" algorithm says to basically chop the input into 6 bit chunks and add 0x20 to each one, so you get a character from 32 to 95. Except that ascii 32 is space, so the result is full of significant whitespace, which everything along the way breaks.

In reality, everybody adds 0x40 to the space value to get 96 (backtick), so the output range is 33 to 96, and then you & the result with 0x3F during processing (which actually happens anyway internally as part of the mask and shift, so you get it for free). But... that's not what posix says.
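
Since it's easier to read than to describe, here's a quick sketch of the conventional encoding (my illustration, not the toybox code):

#include <stdio.h>

// Encode a 6 bit value as a printable character: 1-63 become 33-95, and 0
// becomes 96 (backtick) instead of 32 (space). Decode is (c-0x20)&0x3F,
// which maps both space and backtick back to 0.
char enc(int x)
{
  x &= 0x3F;

  return x ? x + 0x20 : 0x60;
}

// Encode 3 input bytes as 4 output characters.
void encode3(unsigned char *in, char *out)
{
  out[0] = enc(in[0] >> 2);
  out[1] = enc((in[0] << 4) | (in[1] >> 4));
  out[2] = enc((in[1] << 2) | (in[2] >> 6));
  out[3] = enc(in[2]);
}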

I'm growing increasingly disappointed with posix. Some old farts on the Austin mailing list are going "the C locale can't possibly support unicode, that's blasphemy!" despite the C locale currently supporting unicode, they just hadn't noticed. They keep talking about "certified unix", like that means something in the 21st century (Irix! UnixWare! Ultrix!). And implying that Ken Thompson inventing something and Linus Torvalds calling it the only sane solution to the problem doesn't mean anything, because these old farts know better. Odd.


March 25, 2013

Ok, things that could be causing this weird intermittent aboriginal failure:

  • qemu upgrade
  • toybox upgrade
  • kernel upgrade
  • busybox upgrade

It's probably _not_ the binutils upgrade because that was a couple releases back (first in 1.2.1, and 1.2.2 has been out since then). Admittedly I haven't been as diligent about redoing the full LFS build since my server died (dinky little netbook takes about a day per target to natively build Linux From Scratch under qemu), and this is an intermittent problem, but I still think I'd have noticed before now. Besides, it's mostly breaking during compiles and not links.

Roll back to qemu 1.4.0 release version and try armv5l (fails most reliably), and that reproduced the failure. Try going back to 1.3.0...

chmod 755 ../../lib/auto/POSIX/POSIX.so
cp POSIX.bs ../../lib/auto/POSIX/POSIX.bs
chmod 644 ../../lib/auto/POSIX/POSIX.bs
make[1]: *** [all] Segmentation fault
make[1]: Leaving directory `/home/perl/ext/POSIX'
Unsuccessful make(ext/POSIX): code=512 at make_ext.pl line 449.
make: *** [lib/auto/POSIX/POSIX.so] Error 25

So I'm guessing it's _not_ a qemu issue then? (Well, 1.3.0 was December, so maybe I wouldn't have noticed, let's try 1.2...)


March 24, 2013

The i586 and i686 LFS builds finished, but the i486 tar build failed with:

  GEN    wchar.h
  GEN    wctype.h
make  all-recursive
make[3]: Entering directory `/home/tar/gnu'
make[4]: Entering directory `/home/tar/gnu'
  CC     areadlink.o
distcc[19717] ERROR: compile areadlink.c on 10.0.2.2:9243/1 failed
distcc[19717] (dcc_build_somewhere) Warning: remote compilation of 'areadlink.c' failed, retrying locally
distcc[19717] Warning: failed to distribute areadlink.c to 10.0.2.2:9243/1, running locally instead
In file included from areadlink.c:32:
./stdlib.h:71: error: redefinition of 'struct random_data'
distcc[19717] ERROR: compile areadlink.c on localhost failed
make[4]: *** [areadlink.o] Error 1

It actually failed _twice_ (once via distcc and again locally), so it looks like the preprocessor consistently produced bad output? The i586 and i686 builds don't have the phrase "remote compilation" in their outputs, so it's not a transient problem. (Thank goodness; those suck to debug.)

Except... it _is_ a transient problem, in that I re-ran the i486 build and it completed, all the way through vim (last package). Same target, same software, different behavior.

Uninitialized variable? Weeeird.


March 23, 2013

In my toybox talk at CELF I mentioned that containers aren't quite ripe yet. The example I usually give of the tricky corner cases is that writing to drop_caches under /proc in a container shouldn't cause a systemwide latency spike, but that was actually an issue from 2010.

So far this month the container guys added a new feature that opened up a security hole where creating a container let you crack root on the host (due to a bad flag combination that let your container modify mount points in a shared filesystem, so bind mount /etc into the container where you're root and then bind mount /etc/passwd in there and su root on the host.) And they managed to create cross-linked directories in /proc which led to a fascinating post explaining why you can't hardlink directories (because lock ordering is based on ancestry, different path traversal to get to the same dentry means different lock order).

In other news, it looks like capability bits may finally be collapsing under their own weight, although sanity rather than doubling down is probably too much to hope for. (Security is hard, but bureaucracy is not the same as security.)


March 22, 2013

Sigh. Remember the weird intermittent failures I was getting in the native builds, not in any one package but randomly all over the place? And yet the package would build if I rebooted and tried it again, and a chroot build ran to completion three times in a row?

Bit early to be sure, but it really looks like it was a qemu bug.


March 21, 2013

On twitter somebody asked my opinion about an article titled Ramdisk Versus Ramfs - Memory usage issues, and my response was "That's a fairly extensive yet misguided analysis from someone who has no clue how the system works."

There are four types of filesystems: block, pipe, ram, and synthetic. These are, respectively, filesystems backed by a block device, filesystems backed by a protocol written over a pipe to some other process, filesystems that store their contents in ram, and filesystems that make up their contents at runtime.

A ramdisk is a block device that stores its contents in a fixed-size chunk of memory. Creating a ramdisk and mounting a ramdisk are two different things. Like all block devices, in order to be mounted a ramdisk has to be formatted with a filesystem and interpreted by a filesystem driver, which reads from it by copying data into the disk cache and writes to it by copying disk cache data back to the block device.

What ramfs does is it mounts the disk cache itself as a filesystem, storing its directory entries in the dentry cache and the file contents in the page cache. The disk cache memory is the _only_ copy of the data, which stays pinned in RAM with no place to go. (There's no "backing store" to flush things to when the system tries to free up memory, so it never expires. The ramfs derivative tmpfs is an extension that can use the swap partition if you've got one, like any other swappable memory.) So when you write files into ramfs it allocates more memory, and when you delete them it frees that memory, and this is mostly just a clever re-use of existing VFS infrastructure that every other filesystem needs anyway so there's very little overhead.

This ramdisk approach is less efficient because the ram block device uses a fixed-size chunk of memory that doesn't grow or shrink based on usage, plus wasted space due to formatting, then it needs two copies of anything you actually use (the copy in the block device and the copy in the disk cache), and to top it all off the filesystem driver you mount it with is a glue layer to parse the format and copy the data into and out of the page cache.

Once upon a time ramdisks were the only option, but now they're essentially obsolete. (If you want to make a filesystem image, use the loopback driver which treats a normal file as a block device.)

So that was my objection to the article: it was written without understanding that not all filesystems are block-device backed. There's no block device behind a ramfs any more than there's a block device behind /proc or /sys. (Mounting an ext2 partition on a hard drive requires two drivers: a block device driver such as SATA or USB to provide the block device, and a filesystem driver such as ext2 or ext4 to parse the format. When you mount sysfs the sysfs driver makes up the contents of the filesystem as it goes along, and the "device" field of the mount syscall is ignored.)
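
You can poke at that last bit from userspace: for ramfs and friends the "device" argument to mount(2) is just a label. A trivial demonstration (needs root, and assumes /mnt exists):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
  // No block device anywhere: the first argument is just a label that
  // shows up in /proc/mounts. tmpfs and the synthetic filesystems work
  // the same way.
  if (mount("none", "/mnt", "ramfs", 0, NULL)) perror("mount");

  return 0;
}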


March 20, 2013

My talk from CELF/ELC went up. I'm fairly proud of this one. It's about how the smartphone is replacing the personal computer (the way the PC replaced minicomputers and mainframes before it), what that means, and what I'm trying to do about it. (The outline's here.)


March 19, 2013

Sore throat, runny nose. Yeah, I had multiple children crawling on me for 3 days.

Now I'm having weird problems where the native build is breaking, but not in the same _place_. I run it twice and get two different breaks. I _think_ what's happening is suspending my netbook while a build is happening confuses qemu, or possibly distcc. Unfortunately, my netbook takes more than 12 hours to build LFS under qemu, and it's hard to leave it unperturbed that long when it's my primary machine and I have to suspend it to move it. I suppose I could disable distcc but then it would take even longer to build. (Well, except the perl build, which is probably about half of it: running native perl code to produce more perl code gets no distcc acceleration at all.)

I upgraded toybox, busybox, and the kernel. There hasn't been a uClibc release this year and the toolchain's same as last release. So what's going on? Hmmm...


March 18, 2013

Snowed in at my sister's, but three niecephews are in school and the one who has a snow day is asleep on the couch.

Hardwired the darn irq to 59. There's a point where working out what the math is _supposed_ to be and why it used to work out one way but doesn't now is just too many compile and test cycles for this netbook.

Two different people submitted uuencode implementations at the same time. An embarrassment of riches. The pending directory is piling up a bit, I need to do all that review and polishing. My current head-scratcher is "who", the last "default n" thing outside the pending directory, which works fine for what it does but supports no options at present and posix says it should do "-abdHlmpqrstTu". Do I sit down and implement all of that before changing it to default y? Do I move it to "pending"? Do I document a variance from posix that we're not in the minicomputer world anymore so having multiple people logged into the same container is now fairly rare? (That's about the same logic by which I didn't bother with posix "talk", "mesg", and "write".)

And really, looking at the who spec, what's this "who am i" thing? There's a whoami command (already in) but when I do "am i" as arguments to Ubuntu's who I get no output. The "who -r" option prints the runlevel of init, which makes no sense for any init except the system V one. (Note: posix DOES NOT SPECIFY INIT.) "who -t" shows the last change to the system time clock, something Linux doesn't track (and on ubuntu shows nothing). "who -l" says to show lines on which the system is waiting for someone to log in, I.E. the serial terminals (modem pool) attached to this minicomputer. That's a no. "who -d" shows "processes that have expired and not been respawned by the init system process" which again requires a bit of knowledge of things like upstart. (Apparently my system has one, it's "pts/33 2013-03-07 20:56 13206 id=s/33 term=0 exit=0" and I have no CLUE what that means. I have no pid 13206 at present, not even a zombie.)

Ok, so who -lmpt don't produce any output on ubuntu, -bHq seem vaguely useful, -s is the default, -a is a chord of other options, -dr require knowledge of init, and -Tu produce the same info (both about tty7 which is the x11 process, and we have an "uptime" command).

Out of curiosity, I checked busybox who. The only option it has is "-a" which shows all the obsolete ctrl-alt-f1 style /dev/tty# text consoles, and pts33. (I have a dozen terminal windows open in six desktops each with buckets of tabs, there are WAAAAAAAY more ptys in use than that. Ok, what IS it with pts/33? Groveling around in /proc/[0-9]*/fd to see what points to that, it's the shell running a vi instance editing todo.txt. I note that I'm editing THIS file in vi and it's not showing up. What the...?)
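
(For reference, the core of a minimal who is just walking utmp via the posix utmpx functions, something like the sketch below. Which also explains the pts/33 mystery: a terminal only shows up if whatever spawned it bothered to write a utmp record, and apparently almost nothing does anymore.)

#include <stdio.h>
#include <utmpx.h>

int main(void)
{
  struct utmpx *u;

  setutxent();
  // USER_PROCESS entries are the logged-in users; everything else in the
  // file (boot records, init entries, dead processes) is what the weird
  // option flags report on.
  while ((u = getutxent()))
    if (u->ut_type == USER_PROCESS) printf("%s\t%s\n", u->ut_user, u->ut_line);
  endutxent();

  return 0;
}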

Yeah, switching it to "default y" with a comment.

And with school letting out early (cancelled), the horde returns...


March 16, 2013

Arm interrupt routing: it's even more broken than that.

Let's jump back to the 3.6 kernel release, where I was reverting all the irq routing stuff back to what it looked like in 3.3 or so, and both scsi and the ethernet controller worked. In that context, we got the following boot messages:

PCI: enabling device 0000:00:0d.0 (0100 -> 0103)
sym0: <895a> rev 0x0 at pci 0000:00:0d.0 irq 27
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
...
8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
PCI: enabling device 0000:00:0c.0 (0100 -> 0103)
8139cp 0000:00:0c.0 eth0: RTL-8139C+ at 0xc8874400, 52:54:00:12:34:56, IRQ 27

So both the scsi and ethernet were sharing IRQ 27.

After they moved the IRQ controller start from 0 to 32 (for no obvious reason), the corresponding IRQ is now 27+32=59. And this is the one that works.

Here are the new boot messages after the fix I checked in yesterday:

PCI: enabling device 0000:00:0d.0 (0100 -> 0103)
sym0: <895a> rev 0x0 at pci 0000:00:0d.0 irq 59
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
...
8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
PCI: enabling device 0000:00:0c.0 (0100 -> 0103)
8139cp 0000:00:0c.0 eth0: RTL-8139C+ at 0xd0874400, 52:54:00:12:34:56, IRQ 62

Spot anything? Like the fact that 62 != 59? And thus, when I run the native Linux From Scratch build and it tries to distribute compilation over distcc, we get:

=== zlib(2 of 48)
Checking for gcc...
irq 59: nobody cared (try booting with the "irqpoll" option)
Backtrace: 
[<c00113e8>] (dump_backtrace+0x0/0x110) from [<c001152c>] (dump_stack+0x18/0x1c)
 r6:00000000 r5:c02eabdc r4:c02eabdc
[<c0011514>] (dump_stack+0x0/0x1c) from [<c004d384>] (__report_bad_irq+0x28/0xb0)
... (mondo useless stack dump that just says the IRQ came from hardware)
Disabling IRQ #59
... (more stack dump)
8139cp 0000:00:0c.0 eth0: Transmit timeout, status  d   2b    5 80ff
8139cp 0000:00:0c.0 eth0: Transmit timeout, status  d   2b    5 80ff
8139cp 0000:00:0c.0 eth0: Transmit timeout, status  d   2b    5 80ff

Oddly, QEMU's scsi device still seems to be usable so at a guess they didn't actually disable IRQ 59? Dunno...

So let's annotate and see:

slot=12 pin=1 irq=62
slot=13 pin=1 irq=59


March 15, 2013

The arm breakage: arch/arm/mach-versatile/pci.c function versatile_map_irq() is failing to add 32 after the irq start moved. That's the same line they broke with the "swizzle" stuff that I've been reverting for several releases now. I checked the current kernel to see what the un-reverted line looks like, and... commit e3e92a7be693 fiddled with it again.

commit e3e92a7be6936dff1de80e66b0b683d54e9e02d8
Author: Linus Walleij <linus.walleij@linaro.org>
Date:   Mon Jan 28 21:58:22 2013 +0100

    ARM: 7635/1: versatile: fix the PCI IRQ regression
    
    The PCI IRQs were regressing due to two things:
    
    - The PCI glue layer was using an hard-coded IRQ 27 offset.
      This caused the immediate regression.
    
    - The SIC IRQ mask was inverted (i.e. a bit was indeed set to
      one for each valid IRQ on the SIC, but accidentally inverted
      in the init call). This has been around forever, but we have
      been saved by some other forgiving code that would reserve
      IRQ descriptors in this range, as the versatile is
      non-sparse.
    
    When the IRQs were bumped up 32 steps so as to avoid using IRQ
    zero and avoid touching the 16 legacy IRQs, things broke.

Yay. Where were you when the darn swizzle breakage happened 6 months ago?

Except, and this is hilarious, IT'S STILL BROKEN. Because now they're adding 64 instead of 32, so it's trying to allocate IRQ 92 which is once again way out of scope.

The SCSI controller's IRQ used to be "hardwired 27". That's what QEMU implemented, that's what worked for years. Then the kernel guys added this "swizzle" nonsense which did the math wrong and tried to allocate irq 28, which wasn't there. Now they've added 32 to the IRQ controller start and 64 to the mapping, so they're trying to allocate irq 92, which isn't even close. (Correct answer: 32+27+((slot+pin)&3) = 59. Except it's probably more like (32*(slot+1))+27+(pin&3) if you want to support both interrupt controllers but I DON'T CARE at this point because qemu doesn't emulate that and obviously nobody's got real hardware to test this on anymore.)

Note that the "swizzle" breakage (commit 1bc39ac5dab2) was really stupid. The heart of it was adding a new 'swizzle' function, and changing map_irq() like so:

-       irq = 27 + ((slot + pin - 1) & 3);
+       irq = 27 + ((slot - 24 + pin - 1) & 3);

Stop and ponder that new line for a while. They're subtracting 24, and then anding the result with 3. So the change to the line was actually a NOP, and it was the addition of an unnecessary "swizzle" function (whatever that is) adjusting the result that broke it.
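
If that's not instantly obvious: 24 is a multiple of 4, so subtracting it can never change the bottom two bits. Here's a brute force check if you don't trust the arithmetic:

#include <assert.h>

int main(void)
{
  // 24 & 3 == 0, so the -24 vanishes under the mask: the changed line
  // computed the same value as the old one for every slot and pin.
  for (int slot = 0; slot < 256; slot++)
    for (int pin = 1; pin <= 4; pin++)
      assert(((slot + pin - 1) & 3) == ((slot - 24 + pin - 1) & 3));

  return 0;
}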

They followed up the swizzle breakage with moving the IRQ controller start but not the map_irq(), so it was still requesting 28 but now that was outside the IRQ controller range. And then to fix it, they made map_irq() add 64 (not 32). That's three consecutive failures of simple arithmetic, which they never checked the result of even though qemu supports it and worked for years before they touched it. Their most recent patch does not even match its own description of the problem.

Welcome to embedded Linux development.


March 14, 2013

Years ago (late 90's? Early 2000's?) I read an article about a researcher pondering the general lack of nutrition in eucalyptus leaves, who worked out what a balanced diet for a creature with a Koala's general physical characteristics would actually _be_, and force fed a koala that diet (they won't voluntarily _not_ eat eucalyptus) for a few weeks to see what would happen.

The result was basically something out of a horror movie: a hyperactive, aggressive, extremely strong, territorial, vicious predator with nasty claws and a tendency to leap at anything that moved and several things that didn't. The researcher's conclusion was that koalas are a race of junkies, and that long ago a species of animals as vicious as anything else in Australia got stoned eating eucalyptus leaves and was willing to live perpetually malnourished to stay that way, and after a lot of additional evolution to cope with being permanently stoned, if you remove the drugs they go psychotic.

I keep being reminded of this by @speedysays on twitter (the author avatar of A Girl and Her Fed is one of her characters, a vicious talking koala), but I can't find that article anymore. It was in the online version of some australian newspaper, I bookmarked it in the browser I had at the time (might have been back when I worked at IBM, which would be 1998?), but that's long gone. Google can't find it, I'm not quite sure what to search for. It's _probably_ in archive.org somewhere, but unless I had the exact link...

Another article I read (more recently, I think, mid to late 2000's? It was after I bought the condo; I remember WHERE I was when I read it) was a scientific article about the possible origin of life on earth being undersea zeolite deposits near volcanic vents. Zeolite is a mineral that naturally has cavities about the size and shape of living cells, and all living cells have a "membrane charge" that powers a lot of their internal processes (like a battery), and one of the functions of ATP and such is to replenish this charge. Zeolite deposits near volcanic vents would naturally attract all sorts of chemicals out of the water (they're natural filters), and with the right chemicals or thermal gradients in the water they would accumulate ion differentials. The point was you could get something pretty darn close to the inside of prokaryotic cells just from what's lying around in such an environment, which could send virus or spore-like bits of itself to adjacent zeolite pockets, and become quite complex before having to evolve a membrane to create its own shell to take with it into the larger world.

I was reminded of this by another random article about how the largest ecosystem on earth is bacteria miles under the ocean floor exploiting chemical reactions between rocks and ocean water (this exhausts the chemicals in the rocks but plate tectonics provides a constant fresh supply as the old stuff gets subducted and seafloor spreading replaces it).

The obvious conclusion is "oh, the chemicals in the charged zeolite pockets didn't need to evolve a membrane to spread, they just had to learn to carve out new pockets in other types of rock, membranes probably came much later". Given what we know from the fossil record, life arose on the planet 2 billion years before we got anything multicellular, and the nature of fossils is we don't have things like cell membranes recorded because they don't really fossilize. If the same set of chemicals "lived in caves", how would we tell? Especially if it wasn't something we were previously looking for...

I can't find this article either, but this one I can at least dig towards, via wikipedia. It seems I'm remembering some of the original coverage of Mike Russell's work, except he was using the word "Olivine" and I remember the word "Zeolite". (Apparently very similar minerals...) Possibly the article I read was something like this that was up free for a while and has gone behind a paywall now? I can find other interviews with the guy, but not the article I read. (If it was him.)

Still, I can at least find enough to prove I wasn't imagining it, but only after digging enough to find better search terms. I can't find any hint of the koala experiment, only notes that the koala population is crashing, and you can't get permits to do much of anything with them these days, you have to release them into the wild if they're healthy enough to possibly survive there.

It'll be a moot point once they're extinct, and like bananas, I'll miss them once they're gone. (Then again I apparently already missed the really good bananas. Somebody really ought to tell the Mythbusters that their "can you slip on a banana peel" episode missed the _point_ and that the bananas in Laurel and Hardy and old Bugs Bunny cartoons were a different species that became commercially extinct in the 1950's).

I remember reading a great article on the history of bananas. Probably excerpts from either "Banana: The Fate of the Fruit That Changed the World" or "Bananas: How the United Fruit Company Shaped the World" (yes, there are multiple books on the history of the banana), about how railroads were driven through Central America to facilitate mining, and how the mining wasn't profitable but the bananas planted along the tracks to feed the workers _were_, and how during the Great Depression bulk banana imports provided cheap meals in their own wrappers which people littered the sidewalks with, since banana peels are biodegradable in small quantities but became a serious trash problem (hundreds on any given street during their initial popularity, before litter laws).

Of course I can't find that article anymore either. Too bad, it was fascinating.


March 13, 2013

Putting together a toybox 0.4.4 release. There's 8 gazillion half-finished things but I haven't had time to finish them. Aboriginal's overdue for a release of its own using the 3.8 kernel, and needs the cp bugfixes to work. So, sequencing...

Otherwise, head down at work trying to get a project done, eating all my time and energy right now.


March 9, 2013

The uClibc O_NOFOLLOW thing was why cp didn't work on powerpc. The qemu arm board breakage is fixable by telling the kernel to add 32 to the IRQ.

The device is reporting it's on IRQ 27, which was correct when the IRQ controller's range started at 0. Now that the range starts at 32, it has to be adjusted. I _think_ this should happen on the kernel side, since moving the starting IRQ was a mapping thing the kernel did, and is not a hardware thing? But possibly the IRQ controller should report adjusted numbers when queried? I'd much rather fix the kernel than qemu here, because I ship the kernel and don't ship qemu.

So let's figure out where that 27 came from. In the failure case the error message was "request irq 27 failure", and grepping for "request irq" pulled up a printk in the function sym_attach() in drivers/scsi/sym53c8xx_2/sym_glue.c pulling it out of pdev->irq. Doing an "if (pdev->irq == 27) pdev->irq += 32;" at the start of the function was the quick and dirty fix that got me to a shell prompt, so that's how I know that I guessed right about the problem. (I knew about shifting the IRQ space by 32 because that's what the patch I bisected it to earlier did; this is requesting an irq outside the controller's IRQ range, which immediately fails. So the value being an offset into the range rather than an absolute value is a reasonable guess about what the current right value is.)

Where does this get called from? Let's add a dump_stack() at the start of the function and run it again, and that says sym2_probe() (actually sym_attach() but it got inlined). That's sticking pdev into a structure and then fishing it back out again, but it's 27 at the start of that function. Next is local_pci_probe() which lives in drivers/pci/pci-driver.c and is a fairly uninteresting wrapper. That's called from pci_device_probe(), and there we have an interesting digression: a struct device * gets converted to a struct pci_dev * by calling to_pci_dev(). Where does that function live? Oh, it's just a #define in include/linux/pci.h doing a container_of() to get an existing enclosing struct that this is a member within.
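
(From memory, the whole macro is something like:

#define to_pci_dev(n) container_of(n, struct pci_dev, dev)

I.E. subtract the offset of the dev member from the pointer and you get the struct pci_dev it's embedded in. No lookup, just pointer math.)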

Ok, so back to the stack trace: that call was from driver_probe_device() which came from __driver_attach which came from bus_for_each_dev() and since the problem we have here is the IRQ controller for the bus got shifted, let's see what bus_for_each_dev has...

And out of time for the moment.


March 7, 2013

Dear uClibc developers: Requiring Linux people to #define GNU_DAMMIT in order to get posix-2008 constants is not cool. I refer to O_NOFOLLOW and friends, which vary by architecture, and which you have bits/fcntl.h variants of for each one ala (from powerpc):

#ifdef __USE_GNU
# define O_DIRECT       0400000 /* Direct disk access.  */
# define O_DIRECTORY     040000 /* Must be a directory.  */
# define O_NOFOLLOW     0100000 /* Do not follow links.  */
# define O_NOATIME      01000000 /* Do not set atime.  */
# define O_CLOEXEC      02000000 /* Set close_on_exec.  */
#endif

This is from the last release version, dated May 2012. That's years after Posix-2008, so the feature test macro is a bad idea _and_ a standards violation. Plus it was never a "gnu extension", it was a Linux kernel addition. The Linux kernel is not and never was part of the GNU project; that's why Linux succeeded where Gnu failed.
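
So every Linux program that wants these constants has to start with something like this (a sketch; the macro uClibc actually wants is spelled _GNU_SOURCE):

#define _GNU_SOURCE /* feature test macro tax for posix-2008 constants */
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
  int fd = open("somelink", O_RDONLY | O_NOFOLLOW);

  if (fd == -1) perror("somelink");
  return 0;
}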


March 6, 2013

Thanks to Twitter's continuing policy of destroying anything its users like (currently tweetdeck), I have a tumblr now. No idea what to do with it, but there it is.


March 4, 2013

Work could use the old inittmpfs todo item I've had for years (initramfs as tmpfs instead of ramfs). Unfortunately, the last time I brought it up, Peter Anvin cast fear, uncertainty, and doubt upon the idea without actually explaining what was wrong with it.

I should submit a patch on "don't ask questions, post errors" principles and see if I can make them explain why it shouldn't go in. (If it's a specific issue, it should be fixable.)

Let's see, working back from the cpio extract in init hits some magic initcall section thing, so grep -r '"rootfs"' linux (I.E. look for rootfs _in_quotes_, since that has to show up in /proc/mounts), and wow a lot of stuff cares about that, but the important one is fs/ramfs/inode.c has rootfs_mount() which calls mount_nodev() with ramfs_fill_super. Ok, the same search for '"tmpfs"' pulls up mm/shmem.c which has shmem_fill_super. In theory I can swap the two (with appropriate #ifdefs) and see what happens...
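
I.E. the experiment is roughly this shape (a hypothetical, untested sketch against fs/ramfs/inode.c):

static struct dentry *rootfs_mount(struct file_system_type *fs_type,
        int flags, const char *dev_name, void *data)
{
#ifdef CONFIG_TMPFS
        /* the experiment: back rootfs with tmpfs's fill_super */
        return mount_nodev(fs_type, flags, data, shmem_fill_super);
#else
        return mount_nodev(fs_type, flags, data, ramfs_fill_super);
#endif
}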


March 3, 2013

Weekend with the niecephews and Adrienne. Exhausting. Think I caught something.


February 28, 2013

Andrew Morton (the #2 guy in Linux's lack-of-hierarchy) asked why I'm bothering with perl removal. So I told him. There's a certain amount of tl;dr in that.

I'm also collecting other people's public comments about this because of the five people who responded to my submission, only one is actually an existing embedded developer who's been poking me about this on freenode and twitter and such. Two of the respondents (Andrew and Sam Ravnborg) are existing kernel guys who piped up this time to ask "why are you doing this" (although Sam previously acked one of the patches). The other two are supportive but have no experience with the issue.

Of course I get support from lots of nice people: Alan Cox, Jon Masters, and David Anders and it's _great_, but this doesn't translate to acks on the patch. (And yes, that Alan Cox post is over 4 years old. And the patches still aren't in.)

Sigh, all together: sing.

UPDATE: YAY! Andrew added two of the patches to his tree, and Michal Marek the other!

I fall over now. (Dentist appointment early in the morning anyway.)


February 27, 2013

Ok, the perl removal patches are reposted for 3.9-rc1, which means you need to "git pull" in order to apply them. For once I didn't post the 3.8 versions but the current, applies to git at the time I posted, just retested, cc'd everybody and their dog, my darn python script didn't eat the "from" line so my name showed up right, if this doesn't get applied I dunno why version of the patches.

If anybody who cares about this topic would like to test/review/ack them, now would be a good time.


February 26, 2013

A year ago, a certain FSF zealot made a big enough stink to ensure that I couldn't get any sponsorships to work on my open source hobby project full-time. I've been continuing to work on it anyway (as I have for years, and on busybox before that), but not nearly at the rate I could go if I didn't have to work it in (along with my other open source work) around a day job's demands on my time and energy.

So even though I'm not big into schadenfreude, I can't help but follow this thread with a certain... satisfaction. It's the same zealot, having his head handed to him _on_ precisely the same zealotry, repeatedly, by Linus Torvalds himself.

I should add Linus's microemacs to toybox. On general principles. (And hey, I gotta do a vi anyway because posix says so...)


February 25, 2013

Hang on, busybox maintainer Denys Vlasenko just said that the upstream xz compression code comes from a repository which has the following license statement:

Licensing of XZ Embedded
========================

    All the files in this package have been written by Lasse Collin
    and/or Igor Pavlov. All these files have been put into the
    public domain. You can do whatever you want with these files.

    As usual, this software is provided "as is", without any warranty.

So the people saying that "these files have been put into the public domain" is not a sufficient license statement are happy when Busybox incorporates that code and slaps GPLv2 on it?

How does that make any sense?


February 23, 2013

Travel day to go home. Sitting at the airport, sad because my netbook battery died overnight and I lost all my open windows. (The acer bios has this "feature" where if the power goes to 5% the suspended system wakes up, presumably so windows can suspend to disk. Ubuntu 12.04 does not suspend to disk in this instance, it just sits there rapidly draining the dregs of the battery, even with the lid closed. There is no way to tell the bios NOT to do this.)

Right, one of the people who attended my talk pointed out that "allyesconfig" toybox doesn't build. Oops. I've checked in some stubs like "sed" that I've never gotten around to finishing, and they've bit-rotted a bit it seems.

Let's see, mke2fs, mount, sed, stat, and umount all barf. I haven't checked mount and umount in yet, so I need to fix the other three so they at least compile.


February 22, 2013

Finally finished listening to Paul Krugman's End This Depression Now author talk, which was full of brilliant insights (7 minutes 40 seconds: treating the economic crisis as a morality play is wrong because the people suffering are not the people who sinned).

One of the things he said is that if you look at how the population of Canada is clustered along its southern border, "Canada is closer to the US than to itself". What this made me realize is that Canada is what the US would have looked like without slavery.

The book 1493 details (among other things) how the Jamestown colony introduced malaria to the new world (where it persisted until DDT finally eradicated it in the 1940's). Within a century it had spread up and down the east coast, but the mosquitoes that spread it lived much longer in the south than in the cold northern winters. Malaria made the southern half of the country almost uninhabitable, to the point where the practice of "seasoning" arose: importing the european poor to work as servants in america meant giving them nothing to do for the first year, because half of them wouldn't survive the malaria they'd inevitably contract.

A 50% mortality rate within the first year made "getting good help" from europe very difficult (and extremely expensive), so plantation owners looked around for people with natural resistance to malaria. They found them where the disease originated; the native people of West Africa have evolved significant resistance to malaria. (In fact the gene that gives you Sickle Cell Anemia if you have two copies gives you near-immunity to malaria if you have one copy. That's why it's persisted in the population: the 1/4 of the population with two sickle-cell genes dies of sickle cell, the 1/4 with two "good" genes dies of malaria, the half with one copy of each gene survive. The massive death is mostly written down as "infant mortality", so they have lots of babies to make up for it. If you were wondering why religious people are so against teaching evolution: suppressing it is easier than trying to explain how a just and loving god would allow it.)
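
(Pedantic footnote on the arithmetic: those fractions are the ratios for children of two carriers, each with one normal gene A and one sickle gene S:

$$\mathrm{AS} \times \mathrm{AS} \to \tfrac{1}{4}\,\mathrm{AA} : \tfrac{1}{2}\,\mathrm{AS} : \tfrac{1}{4}\,\mathrm{SS}$$

With malaria culling the AAs and sickle cell culling the SSs, the AS carriers keep repopulating both, so the gene never leaves.)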

Of course west Africans had no cultural affinity with europeans, so getting them to voluntarily move to north america in large numbers wasn't feasible, so we sent ships over there to kidnap and enslave them. First in caribbean island plantations (even harder-hit by malaria), and then on the mainland.

Since 1776 our politics has been dominated by a south full of ignorant bigots. Southern plantation owners were waited on hand and foot, and the rest of the south's population never needed to invent labor saving devices because slaves did the labor (agricultural and domestic). Even those who didn't own slaves grew up conditioned to treat other human beings as not just inherently inferior but as property.

Stop and think about the common practice of fathering children with female slaves, and then owning (even selling) your own children. This was ubiquitous, considered "improving the breeding stock". (Founding father Thomas Jefferson did this, and refused to free even the subset of slaves he'd fathered, not even in his will after his death.) This requires epic hypocrisy and dissociation, so of course they turned to religion to provide it.

White southerners bent their religion around justifying slavery (twisting the cain and abel story to somehow say that whites _weren't_ descended from cain, or at least not as much as blacks), leading to the subtext of Southern Baptist and Evangelical denominations: "we are superior, the chosen people, justified in dominating all others because they aren't really human". Because their superiority was inherent there was no need for them to learn or improve themselves, and the idea of anyone _else_ doing so threatened their superiority, which was against god's will. Ignorance became a virtue: all you needed to know was in the bible, and if you started questioning whether slaves are people (or anything else) it needed to be whipped out of you. The industrial revolution passed them by, because the chosen people selling King Cotton didn't want any part of the learning required for it, because learning means questioning and we can't have that.

This _poisoned_ the culture of the south, and to this day they "cling to guns and religion" and insist "the south shall rise again" (because that worked out so well last time). An entire area of the country is culturally scarred.

The civil war didn't start this (bloody kansas) and it didn't end it either (jim crow). Since the first Republican president issued the emancipation proclamation, the bigot vote went to the democrats for the next century (the original "solid south"). This polarity reversed when a Democratic president (Lyndon B. Johnson, Kennedy's vice president who became president when JFK was assassinated) overcame this legacy and in the wake of Martin Luther King Jr.'s assassination signed the Civil Rights Act. This alienated the racist vote, and Richard Nixon got elected with his "southern strategy" of coded appeals to the south's ubiquitous racism. The "Rockefeller vs Goldwater" divide was about explicitly rebasing the party on southern racism, and although Goldwater lost his election he won the battle to reshape his party's identity. This turned the "solid south" into a GOP stronghold that elected Reagan, his VP Bush, and Bush's son. But more to the point, it elected all their congressmen and senators. The GOP became the party of rural ignorance, with subtle but pervasive racism as a core plank of the Republican party platform. A few plutocrats at the top steered a vast horde of ignorant racists, whose racism was the central unifying idea allowing them to be gathered and led. (The party would add a few more coalitions of people whose hot button issues allowed them to be led around by the nose: gun nuts, anti-abortionists, libertarians, and so on. People with a red flag that stopped all thought on any other topic. But by far racists were the largest group.)

Eventually the Democrats called them on it by running a black president, and when he got elected the GOP melted down into gibbering denial. They became "the party of NO" filibustering every bill (including ones they sponsored which he then supported), spinning insane conspiracy theories about how he couldn't possibly have actually gotten elected and must somehow not be real (birtherism), dedicating their entire agenda to nothing but preventing his reelection (triggering and then forcing the country to remain in an economic depression to make him look like a failure), willing to let the country default on its debt payments rather than vote for anything he'd sign...

This was far beyond rational hatred, after 40 years the racism of the party's base had seeped into the party's representatives. The ignorant racist bigots the party appealed to elected ignorant racist bigots to represent them, who went full "protecting our precious bodily fluids" nuts when confronted with a black president.

The plutocrats have been "riding a tiger". The goals of the billionaires who fund party activities are to reduce taxes on the wealthy (from 91% in 1963 to 28% under Reagan, 15% on "capital gains" income from owning things instead of working, and many large corporations can avoid taxes entirely), reduce government regulation that prevents for-profit corporations from exploiting "caveat emptor" to the fullest, and to increase the amount of influence money buys in politics (see "Citizens United"). But to achieve these goals, they need to rile up their audience and feed them the occasional win in areas the plutocrats don't care about.

The point of all this is that Canada was too far north for malaria to survive, so it had no need of a west african labor force resistant to the disease, and it had separate political representation where southern slaveowners didn't get to shape national politics.

So all this "conservative" mess with ultra-rich bastards cornering the market and treating the poor as inhuman because their culture says they should have slaves, and their religion says that other people deserve to be in bondange... Canda shows us what the USA itself might have looked like without the stain of slavery. Between federal government and population mobility, even the northern half of the USA is tainted with signifcant amounts of the south's legacy. Canada lives on the same continent, comes from the same "northern european dominator party mix"... they're what we might have been if we'd let the conferderacy become another Mexico and washed our hands of the slaveholders.

(Mexico is a different story, mostly settled by the spanish instead of the english. Spain has its own racist past, evicting the muslim Moors and then turning the inquisition on themselves when they ran out of rich foreigners to kill and take their stuff.)

If you hear about republicans switching from Rockefeller to Goldwater, that's what it means: the explicit rebasing of the party on southern racism.


February 21, 2013

Ah. If we're up against the zero lower bound in short term interest rates, why aren't long term interest rates at zero too? Because we have about 2% annual inflation. So a zero short term interest rate should translate to about a 2% long term bond yield, and if long term bond yields are below the rate of inflation (which they are), people are willing to lose money to hold these bonds, they're just losing _less_ than if they were holding cash.
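
(Concrete version with made-up round numbers: the real return is nominal yield minus inflation,

$$r = i - \pi, \qquad 1.5\% - 2\% = -0.5\% \;\text{(bond)} \quad \text{vs} \quad 0\% - 2\% = -2\% \;\text{(cash)}$$

so the bond holder still loses, just less.)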

Right, that makes sense. (Took me a while to work that out. Listening to a Paul Krugman interview on youtube on my phone remains educational...)

On the CELF front: talk went well. The room seemed full, although it's hard to tell with a light in your face and there was only one question from the audience. My brain is now completely fried, of course, hence blogging about other things.


February 20, 2013

Giving my talk at CELF tomorrow. (Whatever they're calling it this year.) I never got around to doing slides, but I have my outline up.

The talk description is actually a small part of what I'm covering. The interesting part isn't what I'm doing, it's _why_...


February 19, 2013

California is 2 timezones west of minnesota. My flight on southwest was a bit over 2 hours late. It was the latest one I could get so I didn't have to leave work earlier than necessary. The BART trains stop running at midnight. (Luckily, the airport has shuttlecraft.)

Got to bed at 1am here, which is 3am where I got up, and I got up at 7am. First scheduled event tomorrow is 9am. May be a bit zombie-ish in the morning.


February 14, 2013

Blah: todo list overflow. What am I working on again?

  • Aboriginal release: kernel upgrade broke arm versatile, and possibly other targets. (powerpc not working, "zlib: is a directory", is that toybox?)
  • Aboriginal control images: lfs-bootstrap needs upgrading to current version, install libc/toolchain, use package-list annotations to build but not install stuff (I.E. use busybox/toybox to build everything, but keep busybox/toybox).
  • Aboriginal: tarballs should not get deleted when source control directory's there.
  • Convert aboriginal to use musl (ccwrap rewrite).
  • FAQ for toybox, port all the general development stuff I wrote for busybox way back when, maybe add some prototype and the fan club talk points. (git log of busybox-www to see what's mine: "git blame 95718b309169 -- docs/busybox.net/FAQ.html" vs "git blame ef614ecca61c -- docs/busybox.net/FAQ.html".)
  • Slides for CELF: finish triage of 9base, s6, figure out which bullet points are within scope.
  • Figure out what klibc's "resume" utility is for. (Documentation/power/userland-swsusp.txt)
  • Podcasts to use some of the extra bullet points. (Note: phone USB adapter broken, how do I get data off? Can netbook record?)
  • Updating aboriginal about.html, architectures.html (runaway computer history).
  • Toybox: mount/umount, test, nbd-client, p9d, arp, ifenslave, ar, printf
  • Toybox "make test" has failures, fix it.
  • Collate todo lists: ~/todo.txt, ~/toybox/todo.txt, ~/slushy.txt, ~/www/toybox/todos/todo.txt
  • Watch the linux.conf.au videos (why are the mp4 versions glitching in vlc?)
  • Test toybox/aboriginal build under ubuntu 8.04 image.
  • Kernel Documentation stuff: get kernel.org account back, update kernel.org/doc, fix 00-INDEX files, fix htmldocs, triage menuconfig, deal with patch backlog, set up git tree to feed into linux-next...

February 13, 2013

Doing a toybox FAQ, which is as much about general open source issues as it is about toybox.

The trigger was Dave Jones tweeting about an issue that made me want to link him to an old screed I wrote on the topic, except that since then Denys has dropped a large busybox-specific digression into the middle of it which makes it much less useful as a general-purpose answer to why open source developers ask you to upgrade when diagnosing a bug.

Let's see, the last version of the FAQ I checked in was... Sigh. Long before someone who shall remain nameless split the website out into a repository that's mentioned nowhere on the website itself (lovely)... Aha, back in the main busybox git:

git blame 95718b309169 -- docs/busybox.net/FAQ.html

And just to be sure, the last version before the stuff I wrote (not counting the occasional change to the html markup) was:

git blame ef614ecca61c -- docs/busybox.net/FAQ.html

Ok, stuff I wrote (and can thus re-use) identified! (Well, unless I start to care about this sort of thing, which I currently don't.)

But I do wince at anyone who can say "advance freedom" with a straight face ("Citizen, you have nothing to hide from the happiness patrol!") and still refer to "the GPL" as if there is such a thing, now that the Linux smb filesystem driver and the Samba server can't share code even though they implement two ends of the same protocol. There is no "the" GPL anymore, there are multiple incompatible GPLs with the FSF and Linux developers on opposite sides. Have fun with your factional infighting, I'll be over here releasing BSD/MIT licensed stuff as close to the public domain as lawyers still allow to exist. (The universal receiver is gone, so I've switched to universal donor in my quest for a simple easily understood legal position.)

Speaking of lawyers allowing things to exist: DARN IT! BRADLEY! Look, the point of that ENTIRE RANT was that I hate what the FSF did to Mepis and the busybox developers SHOULD NEVER SUE SOMEBODY JUST BECAUSE THEY USED UNMODIFIED VANILLA SOURCE AND DIDN'T BOTHER TO MIRROR THE TARBALL OF SOURCE THEY NEVER MODIFIED. I want them to let us know it's vanilla, not feel an obligation to mirror stale versions of stuff we've got hosted on osuosl, plus archive.org has it, plus you can fish it out of the git repository on every developer's machine!

He removed that. He's reserving the right to sue people into giving us our own binary-identical source tarball back, plus a giant payment (in the ballpark of the year of full-time minimum wage mentioned in last night's state of the union) he squeezes out of each one for legal fees for the privilege (and that's if they don't fight back).

I _object_ to this. I can't stop it, but I tried, and I'm very sorry it still happens. If you get sued, the above commit has the language that WAS up on the website until he removed it.

Sigh. Salvage what I can and move on...


February 12, 2013

Blah. The kernel broke QEMU's arm versatile board again. I _tested_ this, and it broke since then? Sigh...

The arm guys keep poking at the versatile board going "no, that can't be right" and making random changes. Except what qemu has emulated for the past few years is what the kernel was doing, not what random hardware nobody has anymore was doing, so they just break it. Over and over.

So the SCSI device isn't working. Bisect goes into a long range of commits that produces no output. Bisecting towards the end of that converges on commit f5565295892e, which moved the IRQ start to 32. The SCSI driver tries to grab interrupt 27, which is denied. (Sigh.) Moving the IRQ start down to 16 lets the driver bind, but then it times out awaiting a response from the device. Did they finally correct that "swizzle" thing I've been reverting for ages? No, comment that patch hunk out and it grabs irq 28 instead, and even with an IRQ start of 16 it still doesn't work.
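
(The mechanics here are just the standard bisect loop; these endpoints are illustrative rather than the exact ones I used:

git bisect start
git bisect bad HEAD      # SCSI device broken here
git bisect good v3.7     # assumed last known good point
# each step: build, boot under qemu-system-arm, then report
git bisect good          # or "git bisect bad"...
git bisect skip          # ...or this, when a commit produces no output at all

...until it converges on a single commit.)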

Ok, bisect to where the dead zone _started_ and it converges on 07c9249f1fa, which is "use irq_domain_add_simple()" and changes how device tree IRQ parsing is done. That change prevents the serial console from producing output. The commit before that works, and the boot messages say... Hmmm. Both say "sym0: <895a> rev 0x0 at pci 0000:00:0d.0 irq 27" but one works and one doesn't.

The one that _works_ says "sym0: unknown interrupt(s) ignored. ISTAT=0x5 DSTAT=0x80 SIST=0x0" and then starts enumerating (virtual) hard drives. And then later it says the 8139cp device is at pci 0c.0, but on the same IRQ 27.

The failing one says "SCSI 0:0:0:0: ABORT operation started" and never recovers from that. That's _with_ the starting range adjusted to where irq 27 is an option and the driver binds.

Hmmm... It's getting the right interrupt number (at least with that "swizzle" nonsense reverted), but not the right... routing? Some sort of enable transaction? What's missing here at the hardware level? I may need to instrument qemu...

Hmmm, is it the _offset_ into the interrupt block that's important? So it needs to bind to 32+27? (Which is _not_ what the device tree is telling it, apparently...)


February 11, 2013

Minor logistical fiddliness with Aboriginal Linux: the build system uses release tarballs (which I mirror locally so the build can download them even if they disappear off the original site).

The perl removal patches change slightly every few releases, usually due to something utterly trivial. (Right now the -next tree has somebody adding or removing a "restrict" keyword in the perl, I forget which. It makes zero difference in the replacement code, but the way patch works I have to match the file being removed exactly, so I have to update the patch just to wind up with no change after it's applied. Sometimes they add an extra field to the regexes. That sort of thing.)

So the patches match a tarball version, and the tarballs aren't available until the final release gets made. Meaning... if I check in the new patches without updating download.sh to point to one of the -rc tarballs, I break the build. But I don't want to clutter my mirror with release candidate tarballs. Locally I'm testing against git snapshot du jour, but I'm not going to check that _in_. And things break all the time, right up until the very end. (Right now I'm bisecting arm breakage between 3.7-rc5 and 3.7-rc7, somewhere in there the root device on the versatilepb went away. Might just be a config tweak but I need to track it down to see.)

Which means the perl removal patch updates tend to get posted once I can check them in, which is _after_ the new release gets posted. I.E. during the merge window. (Often a few days into it because cutting a release on this netbook takes days to build everything, and that's assuming it works right the first time.) The kernel guys prefer patches to get posted _before_ the merge window.

I doubt this is the only reason they've ignored them for years, but it makes it easier...


February 9, 2013

Oh no. I'm looking at autoconf output again.

After upgrading the kernel, busybox, and toybox, Linux From Scratch isn't building to completion because the sed build is hanging. Digging into it, some random script Ulrich Drpepper wrote in 1995 is called with one argument, and loops endlessly doing a "shift" and comparing the second argument... which is blank and there are no more arguments. So it spins forever, and the command line it's called with is hardwired not just in the makefile, but the makefile template automake uses to generate that makefile.

I.E. the only reason the sed build _doesn't_ hang is this script is never called normally. But for some reason it's getting called now, so I have to dig in and find out what changed. So I'm diffing it against the build on the host (which works fine, because Drpepper's script is never called), and even looking at just the differences it's still full of crap like:

checking mcheck.h usability... yes
checking mcheck.h presence... yes
checking for mcheck.h... yes

Three checks for the same header, all testing pretty much the same thing. (Should we include this header: yes/no. Incredibly, pointlessly verbose and it does it for a dozen or more header files.)

Then there's this test:

-checking for a thread-safe mkdir -p... /bin/mkdir -p
-checking for a thread-safe mkdir -p... /bin/mkdir -p
+checking for a thread-safe mkdir -p... build-aux/install-sh -c -d
+checking for a thread-safe mkdir -p... build-aux/install-sh -c -d

The minus lines are test on host, the plus lines are same test on target. I dunno why it's run twice, it happens again earlier too (so this test happens at least three times, but this is the same test run twice in a row). What _is_ the test? This:

case `"$as_dir/$ac_prog$ac_exec_ext" --version 2>&1` in #(
  'mkdir (GNU coreutils) '* | \
  'mkdir (coreutils) '* | \
  'mkdir (fileutils) '4.1*)
     ac_cv_path_mkdir=$as_dir/$ac_prog$ac_exec_ext
     break 3;;

Translation: run "mkdir --version" and check if the output identifies itself with one of three specific strings it recognizes. If it's not one of those three versions, don't use mkdir -p but instead use a shell script in the sed source to do mkdir -p. What does it mean by "thread safe"? I have no idea, the test doesn't specify. So no matter how busybox or toybox behaves, unless we implement a "--version" and pretend to be some other package, the sed build won't use us.

(The fix is, of course, to build in our own sed and let this package die. In fact this package's maintainer resigned recently because the FSF is crazy.)

But none of this is why the build is hanging. The build is trying to create stamp.vti, which version.texi depends on, which sed.info depends on... Ah. I never implemented -s in toybox cp. It accepts the flag, but "cp -sfR" is the same as "cp -fR", which doesn't preserve date stamps, so sed has decided that "configure" is newer than "sed.info" and thus it needs to rebuild a file that it can't rebuild.

I hate make. Ok, easy enough fix: implement cp -s in toybox.
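
(The symlink case itself is nearly trivial; a minimal sketch of the idea, not toybox's actual code:

#include <stdio.h>
#include <unistd.h>

/* cp -s: instead of copying data, plant a symlink at the destination
   pointing back at the source path. */
int cp_s(const char *src, const char *dst)
{
  if (symlink(src, dst)) {
    perror(dst);
    return 1;
  }
  return 0;
}

int main(int argc, char *argv[])
{
  return argc == 3 ? cp_s(argv[1], argv[2]) : 1;
}

The fiddly part is wiring that into the recursive -R plumbing so everything else keeps working.)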


February 8, 2013

I generally ignore my Android phone's "Upgrade! Now! Full-screen-popups until you yield to this!" blather because I'd done it exactly once, and the _only_ thing that did was break tethering. (Sprint installed an oh-no-you-don't patch, disguised as a standalone upgrade. First one in the queue. So it worked fine _until_ I upgraded.)

Since then my phone gradually bit-rotted (and probably got who knows what viruses since it had no security patches) until it lost the ability to mount as a USB stick to get photos and video off. It still _charges_, but wouldn't present the "turn on USB storage" dialog. So I gave in and upgraded in hopes it would fix this.

Note: upgrading android has never fixed the specific thing that was so broken I gave in and upgraded. And this is no exception: plugging in as USB still doesn't let me copy data off anymore. Maybe the USB connector got damaged, or the cable went bad: I dunno, it won't tell me.

But it did fix the "app store", so I could upgrade the apps. The netflix client stopped working months ago, so I told it to upgrade everything. Including tweetcaster, which was now unusable because the Java Swing "look and feel" plugin got upgraded out from under it and now there were two layers of un-dismissable menu bars always there vertically along the left side of the screen (!?!?) taking up about 2/3 of the window, and there was so little room that it couldn't display one whole tweet (one letter at a time vertically down the right edge, with the occasional "and" or "of" collated).

So I had to update tweetcaster too, and WOW the new tweetcaster sucks. Have the developers ever tried to use it?

Ok, dismissing the fact that every time I go to a new part of the thing I've been using for years it gives me a full-screen pop-up about what's changed which I dismiss without looking at because it's incredibly rude. It only does that once for every single page and mode in the entire program of which there are far more than there should be.

Half of what I use twitter for is opening links. Android has a built-in web browser, but the tweetcaster guys decided that calling an external app worked too well, so they did a built-in sucky modal browser. There's no way to turn this off in "settings" (I looked _hard_). The browser does have an "open in browser" option in the menu which calls the external one, but if you do that tweetcaster will crash when you return and lose your place in the twitter stream. (Every time it boots it jumps to the most recent tweet, not the last one read. In fact it loses all that history, and you have to reload and hope twitter feels like letting you scroll that far back. This is because remembering 1024 or so 256-byte-ish data packets (140 chars plus metadata) would BLOW TWEETCASTER'S TINY LITTLE MIND. 256k of data? THAT'S INCONCEIVABLE!)

You can't pull up the browser menu before tweetcaster's built-in modal browser finishes loading, or it will crash. If you try to load a page with funky javascript (such as any links to the new york times) in the built-in modal browser, it will crash. You can't ever press the "back" button from tweetcaster (instead of the home button) or it will exit and lose all context (I.E. crash). To exit the built-in modal browser, you must press the back button (EXACTLY ONCE).

The "tap on a tweet" menu used to have maybe 3 entries. It's now 9 plus tweet-specific entries, so big you usually have to scroll the screen even though they made the entries narrower and harder to hit. None of these entries are "copy link to clipboard" (or "open in the real browser"), you can only copy the entire tweet to the clipboard, go into the browser, and edit it there. You can also let the tweet load in the built-in browser (which is modal so tweetcaster can do NOTHING until you finish with that page and go back, but only hit back _once_ or it'll lose your tweets), and from the built-in modal browser you can copy just the link. (Except not the expanded link, the t.co URL shortener version that gives you no info about what it points to.)

The Android built-in browser is less useful too. There used to be a "+" button to open a new tab, then hold over the URL part until a "paste" option came up. Now you have to hit the "tabs" button (once you figure out that's what that squiggle means, and it's only there when the URL bar is being displayed, so you have to scroll up to make it appear) to see your tabs in funky macintosh-style hover windows, and from _there_ you can hit a plus. This gives you a new page that auto-loads google. Hold on the URL bar brings you into edit mode. Delete that URL to get an empty URL bar and then press and hold again and THAT gives you a paste option. Now carefully delete the surplus tweet context that tweetcaster put around the URL you want, and note that the right edge you need to click at so you can remove that last character (because there's no delete to the right of the cursor "del" button, only a delete to the left of the cursor "backspace" button, and no cursor keys, and deleting text never frees up space at the right edge of the URL bar unless the URL is too short to fill up the bar) is exactly TWO PIXELS away from the "delete entire contents of the URL bar with no confirmation or undo" button.

This is just one example. Just about everything that used to be one click in the old android is now two or more, on the same tiny cramped screen. The average number of clicks to do anything has doubled, and they call this an advance. (Oh, and instead of the new page appearing, there's an animated swoop adding a quarter second delay between each page, so if you mentally macro redundant clicks together and your thumb goes too fast: too bad, button's not there yet. Of _course_ there's no way to turn this gratuitous bling off, not that I've found yet. The stupid "collapse screen to a vertical white line as if it was a 1950's television when you hit the sleep button" is the most annoying so far, it gets old after about the third time, by the 100th its developer is your mortal enemy.)

Yet another regular installment of lateral progress. It's not an upgrade if the new software can't do what the old one could, or can't NOT do what the old one didn't make you do. (Proper open source development can sometimes address this, although not always, as Gnome 3 and KDE 4 demonstrated. But at least people could fork the old ones, or provide alternative projects. But android isn't open source: it's regularly updated abandonware. Periodically releasing stale source code into the wild is not the same as visibility and input into its development. Android allows no access to the actual design or implementation process, you first hear about it long after the fact and take it or leave it with no say in it. If you have comments... who cares about THAT old version, they're already several months into the one that will replace it and there are BIG CHANGES in store, just wait and see!)

What I really want is a fresh security-patched version of the old install. Not something the vendor will ever provide, even though I keep sending Sprint a monthly fee that includes paying for this phone on an installment plan (through October).

Just called them to confirm the "through October" thing, and ask how much extra they'd charge me per month to remove the tethering restraint. They say they no longer OFFER that option, and instead tried to sell me a "wireless mobile hotspot".

Up yours, sprint. I know what the phone can do. Even if I wasn't an embedded Linux developer, I _did_ it for a month before you disabled it, and the menu entries to do it are still there (it's just the cellular signal strength instantly drops from 4 bars to 0 when you enable that, and jumps right back when you switch it off).

I'm going to go watch netflix videos on my phone. I may leave some long ones running while I do other things, for pleasant background noise...


February 5, 2013

I just spent the entire evening making my toybox todo heap even longer. It now includes analysis of what's in klibc, sash, sbase, beastiebox, and nash. Tomorrow I need to look at s6. And embutils, elkscmd, 9base, dracut...

I don't intend to use any of their code, but I want to see what list of commands they consider important.


February 4, 2013

Remember when Standard and Poor's (the S&P 500 people) downgraded US treasury debt?

Remember when Wall Street went all-in behind Romney and told Obama to suck it?

Remember when Obama's second term started with the federal government suing Standard and Poor's for their incompetent ratings being a big factor in the mortgage crisis that trashed the economy?

Actions have consequences. Payback's one of them schadenfreude things.


February 3, 2013

So somebody requested arp, and it's reasonably self-contained as network things go, in theory. In practice: the arp man page is horrible.

Writing a new command: not so hard. Figuring out what the old one can be told to do, and how to tell it to do it: hard. Oh well, start with the subset I can work out and wait for people to complain, I suppose.


February 1, 2013

For my birthday, my sister took me and three of the niecephews to Ikea, where I bought them clearance leopards (as you do) and got myself a $20 desk. It is much less well-built than the $20 desk I got at Ikea a few years back (the plastic fasteners used to be metal), but given sweden's exchange rate I'm surprised they can still remain price competitive at all.

I miss those damn toffee thingies. The minnesota store doesn't have the same range of swedish candy, maybe they stopped importing it. (Yeah, I could buy them from amazon, but it seems silly somehow when I'm _at_ an Ikea and they're resolutely not carrying them.)


January 31, 2013

I'm vaguely amused at the zombie tinycc project, which is trying hard to get a release out after only 3 years of stagnation.

Lemme rephrase that: in early December two different people pestered the maintainer about having a release and he replied (and I quote): "Well, release would be fine for me. Honestly, even better if someone else could do it." (Yes, this is the "maintainer" who opened a "mob branch" anybody could check into, so he didn't even have to review commits.)

So two months later they're still talking about it, and the people who are not him have done some actual work. Not with an eye towards building the kernel or busybox or anything like that. Or building any specific packages at all, that I can tell. Just having a release to have one, it seems. (I haven't been reading too closely, but none of the reports were "I tried to build package X and this broke", because they're just not there yet. Instead they've got threads arguing about how you do a stack trace.)

This burst of effort means that they're scrubbing the tree and finally noticing that they've hardwired gcc into their makefiles, and there's still plenty of code in there they don't understand and are afraid to touch. Most recently, the maintainer suggested doing nothing for a week. (Why? No idea.)

Still, they got the website link to their web archive updated. That only took one year.

This is what they chose instead of my fork, and they're welcome to it. (Actually I'm not bitter, just amused. I've got my hands full with toybox at the moment, and when I do get back around to qcc I've got permission to license Fabrice's original code BSD so I've got triage work to do even on _my_ fork. I honestly couldn't use anything out of that tcc branch anyway, even if it did look interesting. I'm just sad a good codebase got a maintainer who openly disrespects it.)


January 30, 2013

Finally got a chance to fix the toybox cp command so it doesn't break the Aboriginal Linux build. Which means I need to cut a release, and that's currently the only thing in it, so I'm shaking the tree for low-hanging fruit which would be good to stick into a release. I'm partway through a gazillion things (where "partway" sometimes means I did the research to see what's involved in implementing it, but haven't started yet). The time command's trivial, at least if all you want is posix -p mode. Groups is just id -Gn... and that doesn't work right for any users but the current one. Right.

Sigh. It's hard to come up with automated tests for things that require root access and care about how your system is configured. There's no point in creating tests that only pass on my system and which nobody else can reproduce, and I refuse to make them distro-specific either. Maybe I need to put together an aboriginal linux test chroot.

Amusingly I was pondering putting out an out of sequence Aboriginal release because I had a good stopping point before the kernel was ready. (I actually tested -rc3 for once.) But I've been so busy with other things the kernel's catching up...


January 29, 2013

Paul Krugman was just on a podcast, which is great if you've been following him, or read his books, but there's some backstory if you haven't. Let's summarize:

Capitalism is about supply and demand: people sell things, other people buy them. Two major categories of things can go wrong in a capitalist economy: supply problems and demand problems. The great depression was a demand problem (1929 stock market crash), 1970's stagflation was a supply problem (OPEC oil embargo). Since 2008 we're having a demand problem again, but the current generation of leaders weren't alive during the great depression so they've been _treating_ it as a supply problem, sometimes described as "pushing on a rope".

Demand problems can be triggered when a commodity price bubble bursts leaving consumers with widespread debt. The 1929 stock market crash wiped out retirement portfolios and left excess margin loans, in 2008 housing prices collapsed and left underwater mortgages. Consumers diverted their income into paying down the debt backlog, didn't buy as many goods and services out in the economy, inventories piled up, companies laid off workers, unemployment shot up, unemployed people didn't buy stuff which reduced demand further, downward spiral into depression.

The main knob the government uses to control the economy is the interest rate at which the federal reserve loans money to banks. Lowering interest rates makes existing debt less painful (lower monthly payments) and encourages people to borrow more money, increasing demand. Higher interest rates do the opposite, reducing demand if supply can't keep up (thus avoiding inflation).

The 2008 crash was similar to the 1990's savings and loan crisis that took out BCCI under the first President Bush (because conservatives don't learn from experience and thus didn't actually fix anything). This time the crash was big enough (massive wave of foreclosures plus collapse of much bigger companies like Lehman Brothers, Arthur Andersen, and AIG) that even lowering interest rates all the way to zero wasn't enough to restore sufficient demand to keep the economy going, hence the rise in unemployment and unsold inventory.

Since you can't lower interest rates _below_ zero (except by raising inflation, which offends rich people whose fortunes erode at the same rate the debt backlog does), and since keeping the interest rates at zero isn't enough to restore the necessary level of demand to keep the economy operating, interest rates get stuck at zero in what is called "the liquidity trap".

This doesn't mean money vanished, it's just not moving. It's collected in the hands of rich people who sit on it. The same companies that have piles of unsold inventory also have large bank accounts, as do the owners of those companies, but they've become risk-averse, afraid to spend what they can't replace. Civilian consumers don't have enough money, so they can't buy things, pay their bills, or employ each other.

The fix is to increase demand, meaning somebody has to spend more money. But the economy is a closed system: all income is money somebody else spent, so if everybody cuts spending at once our incomes have to go down due to sheer math. There are only four sources of spending:

  • Individuals, who are unemployed and in debt, so spend less.

  • Businesses, which see demand down and inventories piling up, so cut expenses.

  • Foreigners, but exports are already driving the most successful other economies (china and germany), and our crash triggered a crisis in europe (the euro is fundamentally flawed), so it's hard to find anybody to sell to.

  • The government, which can print money (so it _can't_ run out), and currently borrows it at 0% interest.

The GOP morons screaming about the federal debt (which, relative to the size of the economy, is still lower than it was in world war II) want to cut off the last remaining source of spending. In most of Europe conservative politicians have succeeded in imposing massive "austerity", contracting government spending and contracting the countries' economies with it, causing massive unemployment, poverty, riots...

An aside on Europe's problems: the new currency, the Euro, is controlled by Germany, which is driven by exports. Germany sells more to other countries than they buy from them, treating the rest of europe as a captive market. Before the single currency, this persistent imbalance would eventually have priced their goods out of the market by driving up the Deutschmark vs the Lira; instead, forcing them onto a common currency with a persistent trade imbalance has impoverished Germany's customers until they can't afford to import more _and_ the imports are still cheaper than what they can produce domestically. German politicians have literally suggested that the other countries in Europe could also redesign their economies around exports, and everybody could sell more than they buy. Presumably the surplus would be sold to Mars. During World Wars I and II Germany tried to take over Europe with tanks, in the 1990's they took over Europe with banks, yet the persistent failure of the rest of the world to be Germany continues to baffle them.

Back in the USA, the main reason 2008 wasn't like 1929 (and didn't trigger a full repeat of the great depression) is that the federal government stepped in to spend a lot of money when nobody else would, and kept the system afloat. A lot of it was automatic "stabilizers" such as unemployment and medicaid. The surge of Census related temp jobs was also well timed, and Obama did beat a modest stimulus package out of congress before the GOP opposition could organize itself into "the party of no". This was enough to avoid waves of evictions forcing homeless people into shantytowns ("Hoovervilles"), but not enough to deal with the backlog of consumer debt in a timely manner. 5 years on, the economy is still depressed.

In a depression like this, the Federal government is the only organization with the freedom to spend, providing income for the indebted masses to pay down that debt until they can start spending money on each other again. (We could also just cancel the debt, but the rich who lent out the money would rather see the country destroyed than their personal wealth threatened, and billionaires own the GOP outright.)

FDR dug us out of the great depression with the new deal. The same GOP morons opposed him back then, and kept him from doing enough until the run up to World War II scared them into line. Our economy recovered fully in 1940 and 1941 (december 7th 1941 was after 2 full years of spending to prepare for the war, via lend-lease and such). Unemployment went down when the government gave people jobs, and we also invested in infrastructure we're still using 75 years later (yes, it's wearing out).


January 26, 2013

If you follow the GOP, you might get the impression that plutocrats actually want poor people to suffer, as a goal in itself. And you'd get that impression because it's true. It's actually a fairly common thread throughout history, the end of slavery and indentured servitude is why "you can't get good help these days". Without starving illiterate peasants, the giant manors of Gosford Park or Downton Abbey can't afford staff. Education ended country manors just as emancipation ended plantations.

The thing is, billionaires don't hire other billionaires to do menial tasks; they can't afford to. You need poor people to be rich relative to. Today's minimum-wage burger-flipper has access to prepaid cell phones, antibiotics and insecticides, fruits from the other side of the planet, meat every day if they want it (chicken's cheap and the McDouble is technically beef), the ability to keep food fresh for months or years, air conditioning, access to police and fire services, learning (libraries/Wikipedia/TED talks/Khan Academy)... We take for granted wealth that kings didn't have a couple centuries back, but a minimum wage burger flipper is not rich.

Being rich is meaningless unless other people are poor, because rich people hire less rich people to work for them. That's what money is all about: you don't pay farms or mines or factories, you pay people to build and run them. Money always ultimately goes to people, not things.

Rich people's ability to hire lots of poor people is what makes them rich. Cheap labor is the end goal of management everywhere, paying less for what you buy and being paid more for what you sell is the heart of business. The ultimate expression of this is slavery, which is as old as recorded history: buy a laborer and own them outright.

Explicit slavery is out of fashion in the USA (but ask child brides sold for dowries how the rest of the world's doing), as is the Jim Crow replacement system of the "company store" where workers were perpetually in debt to their employers and paid with "scrip" instead of cash (non-portable coupons worthless anywhere else) so they couldn't afford to leave. Economic slavery is the end goal of the upper management of most corporations, if you're lucky you get "golden handcuffs". (This is not the same as job security: slaves were sold all the time.)

The people running these systems haven't changed. Plantation owners are now CEOs, and engineer stuff like H1-B visas (leave this job and you're deported, try for citizenship on your own time). They would LOVE to reintroduce slavery or jim crow, they just want to do so out of sight, with sweatshops in India or the Philippines where they can get away with it.

Yes, plutocrats want poor people to suffer, even if they don't admit it to themselves. It's inherent in what they do. If everybody is rich, nobody is rich. The ultra-rich aren't trying to expand the pie so _everybody_ gets a bigger slice, they're playing a zero sum game where they can't win unless they beat somebody else.

For the GOP to win, others must lose. Opposition to health care isn't about the cost any more than fighting wars or union-busting. It's about keeping the uppity poor in line so their labor remains cheap.


January 25, 2013

Hmmm, there's a shell syntax for assigning values to variables if they're not currently set: "${varname:=value}". I'm pondering using it to replace my export_if_blank shell function in aboriginal, but haven't done so because it's not really a cleanup. It looks like evaluating a variable but is actually modifying its contents. It sets locally rather than exporting so I'd need to list the variable name twice (once in the ${} and again in an export statement; exporting is necessary for child processes to inherit the value). And it needs a command context to be evaluated in; you can cheat and use the ":" command (which is basically a synonym for "true"), but you've still got to have that at the start of every line with one of these on it. Which is subtle. I don't like unnecessary subtlety in programs.
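
For reference, what it looks like in use:

# assign only if unset/empty; ":" is a no-op command that gives
# the expansion a command context to be evaluated in
: "${PREFIX:=/usr/local}"
export PREFIX   # still needed for child processes to inherit it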

So "new tool, very nice, doesn't help".


January 24, 2013

Ah, balance disease. Even in the UK, they can't cover something obvious without finding some crazy loon that disagrees with it and letting them bark into a microphone for a few seconds to show that nothing is ever actually true or false (after all, gravity is a social construct). Spineless twerps: "The united states government just did this thing". Full stop. Was that so hard?

I'm enjoying working at Cray. Learning all sorts of stuff I didn't expect to. We had a brown bag on ftrace the other day, and even though I sat through the CELF keynote on it, it never quite "clicked" the way it did in that brown bag. (I need to write a good intro. I should work out how to screencast videos, actually.) Plus in shell scripting, "exec" with just redirects but no new command name can redirect stdin and such persistently for the current script, without resorting to elaborate parentheticals. (It's implicit in the susv4 description of exec, but easy to miss.)
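
I.E. something like:

exec < input.txt   # stdin for every subsequent command in this script
exec 2>> /tmp/log  # ditto stderr: no subshell, no parentheses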

Fun Ubuntu 12.04 bug: if you close the laptop lid while switched to a text console, the suspend process gets a "not authorized" which nicely pops up on the desktop (when you switch back to it), but leaves the thing running with the lid closed.


January 23, 2013

Oh right, klibc has executables in it. I should add that to the toybox roadmap (yet another use case to replace). Let's see, do a quick "echo $(for i in $(find . -type f); do file $i | grep -q executable && basename $i; done | grep -v '[.]g$' | sort -u)", filter out the *.so entries and the *.shared duplicates of static commands, and...

cat chroot cpio dd dmesg false fixdep fstype gunzip gzip halt ipconfig kill kinit ln losetup ls minips mkdir mkfifo mknodes mksyntax mount mv nfsmount nuke pivot_root poweroff readlink reboot resume run-init sh sha1hash sleep sync true umount uname zcat

Huh, no switch_root? Weird... Oh right, klibc called it run-init and I couldn't put a dash in a busybox command name so as long as I was renaming it anyway I called it switch_root as an analogy to pivot_root. Right. (My computer history research taught me the importance of tracking sources, so I'm usually very good at remembering where things came from... unless it was from me. Anything I could independently reproduce can't be that hard, QED.)

Ok, what does toybox _not_ currently have? (They call sha1sum "sha1hash", and "nuke" is "rm -rf --"... What the heck is mknodes? It... generates a .c source file? What? Ah, it's part of the build infrastructure for dash: mkinit, mksyntax, mknodes, mksignames all "hostprogs-y". And fixdep is another one.)

Ok, looks like:

cpio dd fstype gunzip gzip halt ipconfig kinit minips mount mv nfsmount pivot_root poweroff reboot resume sh umount zcat

Not a bad list. I'm partway through mount/umount already (which should include nfs and cifs support, and fstype is related), init needs doing (which includes halt, poweroff, and reboot). I did a gzip implementation in java back in the 90's (back before 1.1 added it to the standard library), so that's gzip, gunzip, and zcat I'm comfortable doing. I have plans for cpio, dd, mv, pivot_root, and ps. Which leaves ipconfig, resume, and sh, two of which I know about, and "resume" is... a strange tool that reads data from a swap partition and passes it to the kernel? I thought the kernel did that built-in? Weird.

Ok, when I get an hour I need to add all that to the roadmap properly. Ooh, and this (ENOENT is not an error when cleaning out initramfs). And if mounts exist in the old directory and there's a mount point for them in the new directory, --move the mount (the klibc code has special case handling of /dev, /proc, and /sys; sigh).
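
(The --move part is just mount() with MS_MOVE, ala this sketch with hypothetical paths:

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
  /* relocate an existing mount under the new root instead of unmounting it */
  if (mount("/dev", "/newroot/dev", NULL, MS_MOVE, NULL)) perror("move /dev");
  return 0;
}

Same idea for /proc and /sys.)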


January 22, 2013

Things I've learned this morning, in the approximate order I learned them:

If you miss your stop on a minneapolis bus, it doesn't turn around at the end. It kicks you off in the middle of nowhere and goes back to the depot.

The google maps client sprint put on my phone does not do bus routes. The web version is unusable on phones due to Clever Javascript.

Google maps can't find a GPS signal in the habitrails. Navigating the habitrails is an acquired skill, and they inexplicably dead end a lot. Stairs down from them don't always clearly label the street level vs sub-basements.

If you ask google maps where two bus routes intersect, it shows you a "transit center" that is a building where bus employees work. No busses stop there.

The sportsball arena has big "transit centers" that people will helpfully direct you to. There is no mention of busses in them. They are deserted. If you go outside to see if busses are mentioned there, the door will lock behind you. This puts you on route 94, a large multilane highway. If you go along it, there are fences to prevent you from getting back out. Sprint does not get signal here. The sun does not reach here. The roadside contains a surprising amount of abandoned luggage. When a chunk of ice gets in your shoe it can be surprisingly stabby. Getting the shoe back on wearing gloves and balancing on one foot is, eventually, possible. This does not mean one has found and removed the piece of ice when it stuck to the bottom of the sock. Getting the shoe back on a second time is more difficult. It is possible not to be sure if one is laughing or crying, even while doing it. It is possible to be cold enough that getting angry just doesn't happen somehow. This does not stop one from developing a general loathing of the city of minneapolis.

The trick to getting out of route 94 on foot is to go back to the sportsball arena and find the one-foot-wide staircase up the side of the fifty foot concrete wall.

When a helpful person points and says "that's the bus you want", they don't mean that specific bus. That specific bus will sit there, locked, running, with nobody in it, for half an hour. They mean go a block away and wait at the stop for another bus with the same number.

Cans of energy drinks freeze and explode, but do so fairly silently in plastic bags, which then leak slowly down your leg and on your shoe, but if it's cold enough you won't notice, and it freezes there anyway. You notice when you get back in the warm and it starts to melt and drip.

The trip to St. Paul on the 61 bus goes through a surprising number of plowed fields. It has a scrolling LED display that says "keep conversations respectful by using appropriate volume and language".


January 21, 2013

First day of work. Brain filled up around 11am. Many accounts set up. Got a lecture about product architecture. Introduced to many people I couldn't pick out of a police lineup at this point. There were multiple wikis. Got a login to the build machine, got code checked out (from svn) and compiled (not through the official build thing, just by typing "make"), and did a big diff of the local tree vs upstream: 1.9 megs before I start filtering. Tomorrow the emulator guy should be in for me to try to boot a thing with a printk() in it so I can tell it's running my code. Printed out more stuff to read overnight.

Found a broom closet so close to work the commute is farther vertically than horizontally. Of the chunk of paycheck peeled off "per diem" it's more than I get in a week but less than two, and I can't argue with the location. Might be able to move in on the 28th. Staying at Kris's friend's place until then, commuting via bus.

Explored the habitrails of downtown St. Paul a bit: found a local branch of the bank Fade uses and a place that sells familiar energy drinks. Eventually made it to 6th and Wabbajack where something or other eventually got turned into the right bus.

Lost one pair of gloves already, bought a replacement. These are electrostatic gloves that work with phone touchscreens! Modern freezing your face off technology. (The high today was -3 Fahrenheit.)


January 19, 2013

In Minnesota, at my sister's. Woke up at 5:30. My sister woke up at 6:30. (I'm totally a night person but can get up in "the middle of the night" and then power through sunrise if I got enough sleep.)

Apparently Minneapolis has a vacancy rate of 2%, so finding a place to stay for 6 months is challenging. My sister lives over an hour from work so apartment hunting from here is a bit awkward, but this is why the internet was invented.

Niecephews are _exhausting_. (Four of them, age range: 5-13.) Having the 5 year old and the 7 year old hang off me at the same time is something my back did not enjoy.


January 18, 2013

And lo, I get on a plane to Minnesota in a couple hours, off on a 6 month contract working at Cray in St. Paul.

I cut a toybox release. It built the i686 target of aboriginal with cp and readlink enabled (and the corresponding busybox versions switched off), so that's probably a good stopping point. There's losetup in there too and several nice third party contributions and general cleanups.

I've been on the fence about doing an interstitial aboriginal release (the ls bug is embarrassing) but the deciding factor is now simply that my poor netbook can't grind out a dozen targets in less than a full day, and I haven't got that much time left. The kernel's up to -rc4, I should have non-work internet again by the time that ships. (Since I've updated the perl patches before the merge window for once, I should post them to the list, but balsa whitespace-damages everything and my python script to send stuff is still getting the header info subtly wrong. More todo items...)

My talk proposal for CELF ELC (on command selection in toybox) got accepted, so I should be in San Francisco in late February. First CELF I've attended since the Linux Foundation katamari'd over it, and I'm not attending the other stuff they've bundled with it, but the kernel.org guys are still nuts about key signing (because physical proximity proves you're harmless, that's why large men in dark alleys are so reassuring), and I should really deal with that. Besides, the hallway track is generally fun.


January 16, 2013

With a bit more time to think about it, the mknod case of cp -p isn't as horrible as I thought, because I can just create the sucker with the right permissions in the first place. For files the suid bit goes away if you modify the contents of the file, but with mknod I'm not modifying the contents. (Having the suid bit on a device node is kinda strange in the first place, but it's not my place to judge.) So the initial mask is pretty much right, modulo setting umask(0) for -p.

For directories I already make sure the sucker's writeable or I can't create new entries in it. This is another facet of the same atomicity problem: for a file O_RDWR|O_CREAT gives me a writeable filehandle even if the permission bits of the file it creates don't have the write bit. (This doesn't work on an NFS filesystem, which is NFS's problem not mine. Don't use NFS, it sucks in many subtle ways.) Creating a directory doesn't give me a filehandle, and if I do have a filehandle to a directory it's just a reference, permission checks get redone each transaction. (I think.)
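
For the file case, "create it with the right permissions in the first place" boils down to something like this (a condensed sketch with invented names like dst, assuming the source's stat results are already sitting in st; not the actual toybox code):

umask(0);  // for -p: don't let the umask strip bits at creation time
// O_CREAT hands back a writeable filehandle even when the new file's mode
// lacks the write bit, so the final permissions go on in the same atomic step:
int fd = open(dst, O_CREAT|O_EXCL|O_RDWR, st.st_mode & 07777);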

Awful lot of fiddliness for 220 lines of code, isn't it? If you were wondering why I held off doing it for so long, I didn't want to leave it half-finished again. I want to get it _right_.

Still gotta do -sHL flags, re-read the posix spec to see what fun corner cases I might have missed, and plug it into aboriginal to find real-world breakage.


January 15, 2013

So fchmodat(AT_SYMLINK_NOFOLLOW) always fails, which is a glibc bug that uClibc blindly copied. That took a while to track down.

Context: I finally reached the end of a longish rathole making "mknod b blah 0 5 && cp -rp blah blah2" work in toybox cp.

Last time I mentioned I had to fork the -p logic so it could handle cases where we could and couldn't get a filehandle. The ones where we naturally already have a filehandle to the entity we're trying to set permissions, ownership, and timestamps on are nice and secure: there's no gap between fiddling with the object and fixing up its permissions where somebody could swap in a different object.

But mknod() doesn't give us a filehandle to the node it just created. That's ok, mkdir() doesn't either, in that case we can just open it on the next line and perform a longish dance of verifying that we didn't follow a symlink and that what we have now is a directory. Except that if mknod created a node that refers to a device for which no driver is currently loaded, we can't open it. (Even with 0 in place of O_RDONLY because O_RDONLY already _is_ 0. You can't _not_ request read access.)

Well, that's not quite true: 2.6.39 added O_PATH which gives you a filehandle for use with fchdir() and friends. It's not in Ubuntu 12.04 libc headers yet, but I can look up the value and #define it myself, and then open() accepts it. But if I try to set permissions through the resulting filehandle with fchmodat() it returns "Bad file descriptor". (It doesn't mind opening something that isn't a directory, but then there's _nothing_ you can do with the result.)
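
For reference, the "look up the value and #define it myself" workaround looks something like this (010000000 octal, i.e. 0x200000, is the value on x86; that it matches your target arch is an assumption, check your kernel headers):

#ifndef O_PATH
#define O_PATH 010000000  // in the kernel's headers, not Ubuntu 12.04's libc yet
#endif

int fd = open(path, O_PATH|O_NOFOLLOW);  // usable with fchdir() and friends, not much else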

So rather than follow the mknod with an open(O_NOFOLLOW) I set the filehandle to AT_FDCWD and down in the -p logic I test for that and use fchownat(), utimensat(), and fchmodat() in that case (which modify a name relative to a directory filehandle), and in the other case use the more secure fchown(), futimens(), and fchmod(). I'm tempted to collapse the two forks together and just always use the lookup-by-name versions, but the original three functions I was using DON'T re-look up the entity by name, so there's no race window where somebody can swap in a different one. If I still have the filehandle to the file or directory I was just operating on, from a security perspective I want to use those original ones wherever possible and only fall back to the other three where that isn't an option.
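
Condensed, the two forks look something like this (invented variable names, and times standing in for a struct timespec[2] built from the source's stat; a sketch, not the cp code as written). fd == AT_FDCWD is the "we never got a filehandle" signal:

if (fd == AT_FDCWD) {
  // by-name versions: the re-lookup is a race window, but it's all we've got
  fchownat(dirfd, name, st.st_uid, st.st_gid, AT_SYMLINK_NOFOLLOW);
  utimensat(dirfd, name, times, AT_SYMLINK_NOFOLLOW);
  fchmodat(dirfd, name, st.st_mode & 07777, 0);  // AT_SYMLINK_NOFOLLOW fails, see below
} else {
  // filehandle versions: no re-lookup, no race
  fchown(fd, st.st_uid, st.st_gid);
  futimens(fd, times);
  fchmod(fd, st.st_mode & 07777);
}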

All that I figured out a couple days ago, what ate a lot of debugging time since is the AT_SYMLINK_NOFOLLOW flag, which all three of the new permissions fiddling functions mention in the man pages. That flag tells it not to follow symlinks when modifying this entry, which is the behavior I want: symlinks are already handled further up in the function and -p doesn't apply to them because they don't have their own permission bits. (Hang on, they've got their own ownership and date stamps, so 2/3 of -p does apply. Right, another todo item.)

Anyway, the "race condition" I'm concerned about for cp mknod -p is a malicious user finding some cron job or something where root is doing a cp -a on something that has the suid bit set, setting up an inotify on the destination directory, and right when the resulting file is created deleting it and replacing it with a symlink. If cp then sets the suid bit on that symlink, it can give the user a suid executable with arbitrary contents, and congratulations you've cracked root on the system.

And, as I said at the top of this entry, fchmodat(AT_SYMLINK_NOFOLLOW) always fails. It doesn't even make a syscall, it tests for the flag in libc and throws a temper tantrum if it finds it. (Wrongbot's uClibc code even tests and fails twice: if any other flag is set, fail. If this flag is set, fail. Brilliant.)

They're doing that because there's no syscall, and they can't be bothered to come up with a workaround.

There's also no obvious way to get a filehandle to a symlink, so that _must_ operate by name.

I did an strace on the gnu/dammit version of cp to see what syscalls it uses, and it's doing lchown() with a path from the current directory. That's even _less_ secure, somebody could replace one of the intermediate directories with a symlink.

What I want is for open(O_PATH) to give me a filehandle to a directory entry that lets me modify the metadata but not the contents, and which can open a filehandle to a symlink rather than one the symlink points to. (This way, I don't need two -p codepaths, one that securely operates on filehandles and the other that _insecurely_ operates on names.)

Failing that, I want fchmodat() to implement AT_SYMLINK_NOFOLLOW.

Given that I've got neither of those, I'm going to have to come up with some horrible workaround. Grrr.


January 14, 2013

I'm continually frustrated by the way people keep calling money an illusion. It's a promise of value. This promise is an "illusion" the same way a signed contract is, or laws are, or shares of stock in a corporation. The difference between a promise and an illusion is that promises are honored and enforced by somebody specific.

Society is built on promises. The bill of rights is a list of promises. Citizenship is a promise. The only reason you can walk around safely is everybody else has promised not to kill you, and if they break that promise there are people promised to defend you a phone call away (police), and people promised to defend THEM (swat teams, national guard, army).

Yes, printing money is making new promises. But so is borrowing money: when you borrow money from a bank, the bank treats your promise to pay the money back as an asset. Every time you use a credit card, you're giving the bank a fresh promise to pay the money back, and the bank records that new promise in its books AS A TYPE OF MONEY. Larger transactions involve a web of interconnected promises: taking out a mortgage loan promises to give them your house if you don't pay (and promises that a third party sheriff will come and evict you from that house on the bank's behalf if necessary). Any of these promises can be broken, and the point of courts and judges and juries is to resolve broken promises without trial by combat or lynching.

I've previously called money just debt with a good makeup artist, but actually it's a slightly different type of promise: a mortgage is secured by collateral and that web of enforcement, while using a credit card is just the raw promise that you'll pay it back.

The government prints money for the same reason the government passes laws: it's the organization we've delegated the authority to make really BIG promises to, and which is in charge of enforcing those promises. They print money the same way they issue laws. Sometimes it's a bad idea, but sometimes NOT doing it is a bad idea. (There are people who think every law is a bad idea. If they really think so they should boycott dollars, which are backed by laws saying they're legal tender.)

The government can collect taxes as an alternative to printing money, and the reason to do that is to prevent inflation. Except that right now the economy has a shortage of money in circulation, and because nobody is willing to take a pay cut this manifests as high unemployment instead of massive deflation. (Instead of prices going down 10%, everybody who has a job gets zero pay raises for five years while ten percent of the people have no job for all that time. This is called "downward nominal wage rigidity".) A couple years of 5% inflation would actually fix what ails the economy. (At the expense of making rich people poorer, and we have the best congress money can buy, so they've been fighting to keep it broken because fixing _other_ people's problems doesn't interest rich people who don't need a job.)

Promises are vitally important to a functioning society. Taxes are backed by a promise to confiscate and imprison if you don't pay. Stop signs and traffic lights and lines painted on roads are backed by a promise that police will arrest you if you don't obey them. Writing a check is a promise that there's money in the account. When you deposit money in the bank, the bank is promising to give it back.

People who don't understand this think commodity money is different. "Gold isn't a promise!" Stick that person on a desert island with all the gold they can eat and come back in a year. Commodity money was always a promise of value. People who think having gold is somehow better than having dollar bills miss the point that neither is useful unless people are willing to exchange them for things you want at a future date. If people stop being willing to exchange them (as happened to all sorts of commodity money: cocoa beans in South America, cowrie shells in the Marshall Islands) they lose their value.

Using metals as commodity money is no different; look at the wildly fluctuating price of silver: in 1840 it was $1.29/oz, but extensive silver mining drove down the price to 64 cents by 1900 and 34 cents by 1940. Inflation brought that back up to $1.63/oz by 1970, and then commodity speculators drove it up to $16.39/oz in 1980... before crashing back to $4.06/oz in 1990. The value of silver has historically been as volatile as beanie babies and tickle-me-elmo dolls.

The gold bugs try desperately to pretend that silver was always secondary to gold... except that Rome demanded tribute in talents of silver, the coin of Athens (the drachma, nicknamed the "owl") was 4.3 grams of silver from the mines at Laurion, the bible has Judas paid with thirty pieces of silver, "cross my palm with silver", and so on. In Greco-Roman times, gold was secondary to silver. (Sure gold was rarer and more valuable, but platinum was rarer and more valuable still, and who actually pays for anything with platinum bars? It doesn't get USED for anything outside of D&D campaigns and the recent attempt to exploit a legal loophole.)

The massive amounts of silver and gold brought back from the Americas changed gold's secondary status, because there was now enough of it to encounter regularly. The huge influx of precious metals also caused massive inflation throughout Europe, which was GOOD. The increase in the money supply let the European economy expand without immediately going into the same kind of depression we're in now (a liquidity crisis: not enough money to let all the people buy all the things, so you have unemployed people sitting next to unsold things and no way to connect the two). When people got used to an expanding economy they invented paper money to keep the economy expanding after the extra gold supply dried up; a bigger economy needs more money or it slides into depression.

None of this is new, but it's worked so well for so long we forgot the difference between an aqueduct and a river. That we invented mechanisms to scale things beyond what "just happens naturally", and that those mechanisms need to be _operated_. The same idiots screaming that the melting glaciers are just happening without a cause, that evolution is a lie (and thus it's ok to use 80% of all antibiotics in cattle feed because antibiotic resistance is the will of Zeus instead of a trivial example of evolution in action), screaming to keep the government out of their medicare...

They're in control of the house of representatives and filibustering the senate, and they're trying to cut spending in a depressed economy. They're literally arguing for an end to unemployment benefits when there are three job seekers for every position nationwide, so it mathematically CAN'T matter how motivated the job seekers are: there's still more pegs than holes.

It's really, measurably stupid. There are REASONS they are wrong, if you just bother to understand the system, which they don't because they think they already know everything. But driving a car does not make you a mechanic. Getting rich doesn't teach you how the economy works any more than a farmer whose crops get enough rain learns how to predict (let alone control) the weather. My first year in Austin we had a bumper crop of crickets an inch deep, if there was such a thing as cricket farming somebody would have been rich, and would forever after insist they had earned it.

The "end the fed, printing money is a sin" idiots are the exact same guys who want to drown the government in a bathtub (no new taxes, no new laws), and who insist everyone carry a gun because the promise of protection offered by the police can't possibly be trusted. These people do not believe anyone else's promises, because their own are easily broken. They're dishonest bastards whose word is worthless, so of course they don't trust anybody else, which means they can only function in society with the aid of massive piles of cash to cover their perpetual lapses. They either wind up rich (theft is profitable) or in jail. Society doesn't (can't) work the way they think it does, and when they try to change it to fit their prejudices they destroy whatever portion of it they've managed to steal. (They're great at figuring out what they can get away with. They've got the House due to gerrymandering, is that theft? Losing the popular vote but gaining the electoral vote is their stock in trade, gaming the system is what they do. Because they don't trust promises, their own or anyone else's.)

The whole trillion dollar coin thing was just an end-run around these morons, who've worked a contradiction into the law. The government promised to spend a certain amount of money (the budget congress passed), and the government promised not to tax, borrow, or print enough money to spend the amount in that budget. Paradox. The promise breakers are trying to force the government to break its own promises, to show that all those OTHER promises it's issued (from laws to dollar bills) are as worthless as those from the GOP. The platinum coin loophole let the feds print enough money to keep their promises without hoping the crazies had a moment of clarity.

For a while there, people thought Obama had a spine. In reality, the stiffening effect was a little blue pill Obama takes to sustain elections, and his spine goes all floppy again afterwards.


January 13, 2013

Back dinking at toybox cp: the mknod should be using st_rdev (device ID of special file) instead of st_dev (device ID of containing filesystem). Oops.
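
The fix is a one-liner; sketched as a hypothetical standalone helper (not the actual toybox code):

#include <sys/stat.h>
#include <stdio.h>

int copy_node(char *src, char *dst)
{
  struct stat st;

  if (lstat(src, &st)) return 1;
  // st_dev is the filesystem the node lives ON, st_rdev is the device the
  // node refers TO; mknod() wants the latter.
  if (mknod(dst, st.st_mode, st.st_rdev)) {
    perror("mknod");
    return 1;
  }
  return 0;
}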

Unfortunately, this doesn't fix the case where we just did a mknod for a major/minor pair with no associated driver, so any attempt to open it fails, so fchown() has no filehandle to operate on to change ownership. The reason is that O_RDONLY is 0, so you can't _not_ request read permission when opening a file. (Hello PDP-11 unix from 1970.) What I need is some kind of O_NOTHING bit that _just_ gives me a filehandle I can manipulate metadata through, but not contents. Some discussion wandered by on linux-kernel about this, but it's not in the open(2) man page of current ubuntu LTS.

For most things, implementing -p just means keeping the original filehandle open and doing some extra stuff on it at the end. This avoids race conditions where the file you copied data to or the directory you created files in gets replaced by another program between your first and second actions on it. (Which could be a security problem. Assume there's a suborned flash plugin running, but it shouldn't be able to crack root by dropping a symlink somewhere root's about to flip the suid bit. And this is why the multi-user nature of minicomputer Unix is still relevant on phones, it means your web browser can't install keylogging device drivers.)

Alas, mknodat() doesn't return a filehandle, so I have to go back and look it up by name to do further operations on it, with a gap that allows a different filesystem object to be substituted. I can feed all three functions AT_SYMLINK_NOFOLLOW, but I'm still uncomfortable.

I'm also uncomfortable with having two codepaths that do the same thing. If I'm going to have fchownat() instead of fchown() shouldn't I force everything through it? (Except mknod is rare, and the filehandle version is more straightforward for the common case of files and directories. And I still need the equivalent of the filehandle anyway to signal whether or _not_ to perform the modifications because you don't for hardlinks and symlinks. Grrr.)


January 12, 2013

Sigh. The balsa email client is really frustrating.

Right click and copy link doesn't work, so I cut around the link and copy that instead. (Why that works and right click copy link puts nothing in the clipboard, no idea.) Email filtering doesn't work (the rules never trigger), so I wrote a python wrapper that runs each time balsa exits and chops up the messages in the inbox mbox, shuffling them to the right folder.

This means the filtering only happens when I exit balsa, which I can't do if I have pending reply windows up, so I tend to read the raw unfiltered inbox a lot. (This means I'm reading a lot more of linux-kernel. It also means I'm days behind on email pretty often.)

So I'm seeing messages containing patches, which I need to collect and deal with later. Cut and paste converts tabs to spaces, but worse, the balsa developers didn't bother to implement window scrolling during cut and paste, so I can only cut what's currently showing on the screen, and the patch never fits. So I want to right click and save the message: the balsa guys didn't implement that. So I want to right click and copy the message to a different folder (I can create an empty one to collect them, then forward later. Note that forwarding messages _as_attachments_ does whitespace damage). But the balsa developers didn't implement "copy message" either. (Move yes, copy no.) View source: same scroll problem, and it gives me the raw mime with = escapes and wordwrap breakage.

I eventually found out how to get it to do what I want, double click on the message in the message list to pop it up in a window, then "message" (pulldown menu #5), then save current part.

Ubuntu LTS hasn't updated balsa since it shipped, and building it from source still requires installing dozens of packages beyond my patience for dealing with gratuitous dependencies. (I don't want a spell checker. Three spell checker packages and it still wants more: no.)

Sigh. I should restart my search for a viable email client, but trying to find reasonable user interface stuff on linux is an enormous production. (So instead I've written a python script to send email in a way that won't whitespace damage patches, although I've still got to fix the "to" addressing logic.)


January 11, 2013

I am not a fan of SELinux and friends. Maybe I'm being too hard on them, but I doubt it.


January 10, 2013

Heh. According to the history of rome podcast I'm listening to, "senate", "senior", and "senile" come from the same root word: senate was "council of old men". Political offices in Rome were unpaid, thus "office" and "honor" come from the same root word. Running the government was a thing rich people did in their spare time.

"Release early release often" is good advice, and I want to do that with Toybox. When I checked in cp, I started poking at a release.

My process for cutting a toybox release is to stick it into aboriginal linux and build all the targets. That's how I get the static binaries. It's also a good smoketest that it _does_ build with the new toybox commands replacing yet more of busybox.

Alas, cp didn't. Or rather it _almost_ did, but the gcc source tarball keeps re-extracting itself every time it's set up, which screws up parallel builds (because they try to rm/extract/patch the same thing at the same time; this is why "EXTRACT_ALL=1 ./download.sh" can do it all up front before a parallel build, instead of the lazy binding variety build.sh defaults to.)

The problem boils down to "cp -lf", because -f only applies to the IS_REG() file copy, and -l needs the same "it didn't work, delete the destination and try again" -f behavior to avoid cp returning an error code.

So I need to shuffle the code some more. The _easy_ thing I could do is stick in another goto. (Right now the -p logic to adjust the file's metadata after creating it has a goto so the directory creation and file creation can share the same code.) The -f retry stuff either needs to be duplicated or it needs to be able to jump _back_ to retry. With a counter variable so it doesn't potentially get stuck in a loop. Sigh. (My main objection to adding a big loop around everything is I have to reindent it all and it makes massive diff noise due to the whitespace change. Oh well.)

Ok, what are the cases here: directory, hardlink, symlink, mknod (char, block, fifo, socket), file. Hardlink and symlink don't need -p, but the mknod variants do and currently aren't getting that right.

The -f unlink/retry logic is needed for all the cases, but only for failures to update the _target_, but not for failures to read the _source_. So copying a directory without -r, readlinkat() failing, openat() failing mean don't unlink/retry. (And, of course, not having -f.)
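
For the hardlink case, the retry shape I'm talking about would look something like this (FLAG_f standing in for the parsed -f bit; a sketch of the idea, not the code as checked in). The counter guarantees it can't loop forever:

int tries = 2;

while (link(src, dst)) {
  // retry at most once, only for -f, and only if unlinking the target worked
  if (!--tries || !(flags & FLAG_f) || unlink(dst)) {
    perror_msg("bad '%s'", dst);
    break;
  }
}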

Sigh. Went through the work to recombobulate everything, and now it's giving me the 'src' is 'dest' complaint for a bunch of files. (And a different test with "sudo cp -rp /dev/null blah" is giving me a "no such device or address" error attempting to open the file for -p after the mknod.) More digging to do...


January 9, 2013

Listening to Terry Gross interview Lemony Snicket (who sounds remarkably like Wyatt Cenac on the daily show). He's saying he couldn't explain to his own kid (age 9) why the Beatles called themselves that. Terry said it might have been an homage to Buddy Holly's band "The Crickets".

I thought it was a pun on "beat", as in the beat goes on. Hence the spelling "The Beatles" instead of beetles. (But nobody ever explained that to me, I just guessed for myself...)

So I continue to hate git. If your repository is modified (such as having patches applied), "git show" appends a diff, which craps lots of text into build/MANIFEST when all it's trying to show is the version number. This broke my manifest generator in sources/functions.sh, so I go to fix it.

In mercurial if I don't remember how to ask for the current commit hash (and nothing else), I go "hg help" (83 lines), look at the list of commands, see "hg identify", do "hg help identify" (29 lines), and find out I can do hg identify -i or -n depending on whether I want the hash or commit number in the local repository. (The number is just the count of commits in the repository when this one was added, and since it's locally unique, a simple increasing number that's easy to understand, and my copy's the master for my projects, I use that to identify versions.)

Trying to ask git how to display the current commit hash, and NOT crap 12 random other things out so I have to dig what I want out with sed, is a PROJECT. Starting with "git help" just shows a subset of the commands, you need "git help git" (not the same output, and 901 lines) to get the full list. Spending several minutes reading that, I still have no clue where to go next. I tried "man git show" (416 lines), and that wasn't it. I vaguely recalled "git display" which doesn't exist, and dug through my old blog entries to find git describe. So "git help describe" (153 lines) seems to say it can't do it. (The closest is "git describe --exact-match" which spits out an error message if there's no tag for this commit.)

Eventually I tried beating something out of "git log", and on line 789 of the man page it finally told me enough to figure out "git log -1 --format=%H".

This took more than 15 minutes, and the answer was found more or less by trial and error. Compare that to thirty seconds or so mercurial took to figure out the same thing. This is why I hate git.


January 8, 2013

Heh. Adobe recently released a 10 year old version of photoshop as freeware so they could shut down the old license server, except they say you should only use it if you bought a copy.

This reminds me of the way pirating your own product is a time-honored tradition within the software industry, going all the way back to the creation of proprietary software by the Apple vs Franklin lawsuit in 1983. Pirating your own stuff is a way of both attracting new users and suppressing competition, plus any product that _isn't_ pirated is viewed as a dud because obviously nobody wants it unless forced by their employer. (If nobody _bothered_ to crack it, it must be really bad.)

This is something almost everybody in the industry does, they just won't admit to it (except off the record at tech conferences). I'm not talking about junior engineers sneaking out copies, but the actual owner of the intellectual property uploading their crown jewels to bulletin board systems (or these days, bittorrent) as an intentional part of their business strategy. Years ago an ex-Microsoft engineer told me he watched Steve Ballmer personally upload "cracked" versions of the chinese/korean/indian translation of a new Windows release to various pirate sites because he would rather they run a pirated version of Windows than a non-windows OS. Far easier to convince users of pirated versions to "go legit" next upgrade than to convince Linux (or OS/2, or BeOS, or...) users to pay to switch to a different OS. (Of course Microsoft was founded by a law school dropout whose father is a lawyer, so they sue reflexively. The purpose of Microsoft's "Business Software Alliance" sock puppet was to shake down people who are already using the software. If nobody bothers to pirate it, there's nothing to sue over. Of course this doesn't always work...)

So if you were wondering about the l33t hacking skillz behind all that "zero day warez" stuff... yes there are people who can do that, but their services generally aren't required.

Personally, I prefer the honesty of the open source guys. We're also giving it away for free, the difference is we happily admit it. :)

(And yes, this is the same kind of thing that got Viacom in trouble when they sued youtube... over video Viacom's own employees had uploaded to youtube as part of their jobs. Exact same kind of self-piracy is near-ubiquitous in the software industry. With the exact same kind of secrecy, because they don't want to give up the right to sue.)


January 7, 2013

Finally got cp checked in. Lotsa fun fiddly corner cases, probably still plenty more I haven't tested yet, but it's to the point where it's worth testing an Aboriginal Linux build against it.

Actually I haven't implemented the mknod() or mkfifo() bits. (Right now it handles those as cat > dest, which isn't going to work particularly well for /dev/zero.) Hmmm, when does that kick in? (For cp -a, probably for cp -p too?) According to "man 2 stat" a file can be regular, extra crispy, rotisserie style... Ahem. Regular, directory, char, block, fifo, symlink or socket. (That's unix domain sockets, "man 7 unix". I have no idea how cp is supposed to handle domain sockets. I wonder if the spec says...?)


January 4, 2013

I submitted three talk proposals to CELF (well, two talks and a BOF). One was "debugging your way out of a paper bag" which is just a bunch of war stories about debugging my way into the kernel and back again (and what happened after) with an eye towards never being intimidated by any problem you can reliably reproduce. (Cue Sark: "gdb can't help you now, my little program".) Another was about my experience rewriting the Linux command line from scratch twice (once in busybox and again in toybox: what a Linux system actually _needs_, the command selection criteria, what the actual standards are and why they're not good enough, etc). The third is a BOF about helping Android replace the PC, which is 5 minutes on my mainframe->minicomputer->micro/personal computer->smartphone rant, the need to be self-hosting, and how to get there. Then the rest of the time is open to the floor (and unlike the darn compiler BOF it would _NOT_ turn into me rambling for an hour while people ask me questions, although if Rich Felker showed up I'd totally hand the floor to him).

I doubt any of them will be approved because the Linux Foundation annexed the Consumer Electronics Linux Forum and all its functions now belong to the Master Control Program and together they are complete and so on, and those guys do not respond well to hobbyists. (I'm not sure they know what a hobbyist is; they wanted to know what company I represented in the application form.) But it seemed worth a try.

So I was really paranoid about testing the "rm" command, but testing "cp -r" turned out to be the dangerous one. Forgetting to discard "." and ".." while traversing: bad. Forgetting to add the check that source file and target file aren't the same device+inode: bad. (So it opens src/../blah and dest/../blah which are the same file, the latter with O_TRUNC, reads nothing from it, writes nothing to it, closes the now zero-length file.)

Having fairly recent backups: good. Still took a couple days to work through what got damaged and what needed to be restored.

Right, moving on from that: I noticed that cp contained a repeated idiom "if (!(x=strrchr(blah, '/'))) x=blah; else x++;" which is x=basename(blah). I started writing my own basename() because the man page said the libc one could modify its argument and I couldn't figure out WHY, but while writing it I started adding the code to truncate trailing slashes and went "oh, that's why". Which is a case that the existing repeated code gets wrong, and the test suite doesn't check for ("mkdir one two; ln -s one/ two" and ln complains that two/ exists.)
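
A sketch of the trailing-slash handling that made the penny drop (a hypothetical helper, not the one I checked in; truncating in place is exactly why libc's basename() is allowed to modify its argument):

#include <string.h>

char *base_name(char *path)
{
  char *s = path + strlen(path);

  while (s > path && s[-1] == '/') s--;  // back over trailing slashes
  *s = 0;                                // truncate them: this modifies the argument
  while (s > path && s[-1] != '/') s--;  // back up to the last path separator
  return s;
}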

So going through the tree and fixing that, and inserting some extra tests in the test suite. While doing that, I wound up cleaning up rmdir -p to deal with trailing and repeated slashes, and spent the longest time trying to find a reversed test. (Isn't that always the way: I knew what I meant, not what I wrote, so I had to go through the whole song and dance of narrowing it down to THIS line is where it deviated from my expectations and why is... because there's a ! missing. Right.)

Meanwhile the -rc2 kernel shipped a couple days ago and that's about the point where I should start paying attention. Alas, my netbook is not much of a build machine and I goodwilled securitybreach (nowhere to set it up, it's LOUD, turns any closet into a sauna, and turning it on only when you need it is no fun because it takes several minutes to boot up with all the Dell server BIOS crap). An equivalent replacement system is about $600 these days (hyper-threaded 4-way with 32 gigs ram and a terabyte of disk), I might get one in Minnesota.


January 3, 2013

People are talking about the debt ceiling and how Obama is too spineless to print money (via the platinum coin loophole) to render the nihilist GOP irrelevant. So here's another thing Obama's too spineless to do.

If we hit the debt ceiling, state payments should go to those who voted to honor the government's obligations (i.e. raise the debt ceiling to tax, print, or borrow money they've already passed laws obligating the government to spend). Divide each state's payment by the number of representatives in the blocking body and award one share for each yes vote. So if a state has 7 representatives in the house and 3 voted yes, they get 3/7 of the money they'd otherwise be entitled to. This includes social security checks, medicaid, VA benefits, and all military contracts payable to entities in that state (which are collectively the majority of the budget).

Note this amount will be payable from tax revenues, because the percentage of "yes" votes that goes over the amount payable from tax is enough to raise the debt ceiling. It impartially focuses the default on those who voted to default.


January 2, 2013

WOW I've been more productive when I can actually _see_.

When you cp -R the destination could technically have a symlink where you expect a directory, and thus follow it to copy into areas it shouldn't. This is pretty clearly pilot error, though.

I need to have perror_msg() set toys.exitval = 1 and do a cleanup pass removing all the resulting unnecessary manual exit value setting. Ok, switch gears and deal with that...

Trimming some error messages while I'm here. The point of perror_msg("blah") is that it prints "commandname: blah: syserror output". So instead of realpath going "cannot access '%s'", if it just says "%s" you get "realpath: input/file/name: Discombobulated Inode" or whatever the problem you had was. The _big_ advantage of this is there's nothing to translate to other languages. The command name's mostly posix, and the error messages come from libc (which should be locale appropriate for us).
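
Concretely, the difference (assuming perror_msg() prepends the command name and appends the errno text, which is what's described above):

perror_msg("cannot access '%s'", name); // realpath: cannot access 'input/file/name': No such file or directory
perror_msg("%s", name);                 // realpath: input/file/name: No such file or directory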

Hmmm, there's a repeated idiom (head, cat, wc, cksum, dos2unix...) of:

for(;;) {
  ssize_t len = read(fd, buf, sizeof(buf));

  if (len < 0) error_msg("read error");
  if (len < 1) break;
  // do stuff with the len bytes in buf
}

Not quite sure how to factor that out into a library function. (Ok, I could trivially make a MACRO but that's not the point. That doesn't eliminate duplicate code, it just hides it.)
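
For instance (a hypothetical shape, exploring the idea rather than proposing it): move the loop into lib/ and hand each block to a callback, so the logic exists exactly once:

static void for_blocks(int fd, char *buf, size_t size,
                       void (*handle)(char *data, ssize_t len))
{
  for(;;) {
    ssize_t len = read(fd, buf, size);

    if (len < 0) error_msg("read error");
    if (len < 1) break;
    handle(buf, len);
  }
}

Whether a function pointer per caller is actually an improvement over a five-line loop is the open question.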

Sigh. Part of the reason I'm doing a manual pass on all the "find toys -name '*.c' | xargs grep -A 3 error_msg" output is to find possible cases where error_msg _shouldn't_ set the error return code. I found one: in passwd.c it's doing error_msg("Success") and there is a REASON that command is in the "pending todo items" category in status.html. Need a big cleanup pass on that...

And now cp -p has to care about nanoseconds. (Standards are moving targets when you mothball a project for a few years...)

Memo: when writing a new cp implementation, the first time you test cp -r is _not_ the best time to find out A) you forgot to filter out "." and ".." from the directory traversal, B) you forgot to check the source/target files for dev/inode equivalence before opening O_TRUNC and copying the contents.

Sigh. This is why I have backups...


January 1, 2013

I'm not ready for it to be a different year yet. As usual.

Digging through the clutter in the guest room and moving bits to storage, I found my previous set of glasses in a box, and... wow. It's _so_much_better_ than my current set, even with the damaged lenses. (Apparently the lens coating was soluble in bug spray.) Distance viewing isn't the greatest, but up close I can FOCUS ON THE SCREEN.

I had no idea how much my ability to concentrate was screwed up by having to reread each line three times at enormous font size to see it, and getting massive eyestrain every ten minutes. (Well, I suspected, but _dude_.)

Right, cp. Dig dig... (I feel like Agatha Heterodyne dealing with Gilgamesh Wulfenbach's falling machine: that can go, that can go...) I need to redesign xreadlink() because I mostly need readlinkat() instead. Really it should do the "feed it NULL and it allocates its own damn buffer" trick.
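
The allocate-your-own-buffer trick for readlinkat() looks something like this (a hypothetical helper, assuming toybox-style xmalloc()/xrealloc() that die on failure):

#include <stdlib.h>
#include <unistd.h>

char *readlinkat_alloc(int dirfd, char *name)
{
  size_t size = 64;
  char *buf = xmalloc(size);

  for (;;) {
    ssize_t len = readlinkat(dirfd, name, buf, size);

    if (len < 0) {
      free(buf);
      return 0;
    }
    if ((size_t)len < size) {  // it fit, with room left for the NUL terminator
      buf[len] = 0;
      return buf;
    }
    size *= 2;  // possibly truncated: grow the buffer and retry
    buf = xrealloc(buf, size);
  }
}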

All the xblah() wrappers exit the program if an error that should never happen actually happens. In the case of memory allocation, malloc() should never return NULL because all it does is allocate virtual address space which gets asynchronously populated later as we actually dirty the memory and generate page faults.
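
The wrapper itself is tiny; a minimal sketch of the idea, assuming toybox's error_exit():

#include <stdlib.h>

void *xmalloc(size_t size)
{
  void *ret = malloc(size);

  if (!ret) error_exit("xmalloc");  // die loudly instead of making every caller check
  return ret;
}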

The page fault handler can reclaim physical pages from other users and write them to disk, and use the freed up page to satisfy the allocation. (This is called "page stealing". You give it back when the other process needs it, by stealing it back from _this_ process. Or get another one from somebody else. Or if you're lucky other allocations will be freed by then and there will be actually unused memory. And yes, you can steal pages from elsewhere in the same process. Happens all the time.)

If a page was allocated via mmap() on a file, the page can be written back to that file (or simply discarded if it hasn't changed since it was read from the file, this is why there's a "dirty bit" in the page tables). If it's an "anonymous" mapping (no associated file, technically mmap(NULL)) the page fault handler can write the page out to swap space. Of course what the fault handler _does_ is suspend the faulting program, schedule a DMA transfer, set some _other_ program running, and then when the DMA finishes it sends an interrupt to notify the fault handler the write is done and it can go reuse that page now, so it hooks it up to the faulting process's page tables, either zeroes it out or schedules a _second_ read to pull in the appropriate contents from mmap(file) or swap, and then when _that's_ done it unblocks the program and allows the attempt to access the now-restored page to continue. So you're overlapping processing with I/O wherever possible.

Anyway, the point is you're not really out of memory until you exhaust physical memory AND swap space AND all the mmap() pages. And of course disk cache, which is another big user of physical memory (usually about 1/3 of it): if you allocate all your pages for programs, disk access will slow to a crawl. Every time you open /var/log/wtmp it has to read through the root directory to find "var", go open the file containing the contents, read through _that_ to find "log", go open the file containing _its_ contents, read through to find "wtmp", look up where _that_ stores its data... We're talking dozens of disk accesses to track down where 1000 bytes actually lives on disk. You don't notice because it caches all that after the first access.

It turns out that if you pin the system against a wall with allocations it starts thrashing. Long before it actually deadlocks itself (gets into a position where there are NO pages left to satisfy any allocations, and every running process has the next page it needs swapped out and everybody's blocked waiting on everybody else), the system runs out of good decisions to make when it needs to steal a page. So instead of stealing a page that won't be used for a while, it steals a page that the process will need soon. So every time the process is unblocked it's almost immediately re-blocked needing to swap in the next page, and the entire system slows to a crawl spending all of its time swapping out pages that immediately get swapped back in.

A badly thrashing system slows to a crawl, to the point where tasks that ordinarily take a fraction of a second take minutes to complete. If work is coming in externally (such as to a web server), every request starts timing out. A server that gets into this state may never clear itself without external intervention because instead of handling 10,000 requests a second it's handling twelve (and those twelve are erroring out and aborting but it takes 5 minutes of grinding for the system to come to that conclusion).

Note that thrashing means a _transient_ load spike will take a server down and keep it down. If your normal 500 requests per second suddenly spikes to 20,000 requests per second, and then goes back to normal after a minute or so... the server stays down. Because if it's only retiring 12 requests per second, the normal 500/second is still well above what it can handle.

This is why a good web server will respond to load spikes by dropping connections early. It'll back down a bit from the edge and cap its load at maybe 8000 simultaneous transactions it can handle safely and everybody else gets a static "sorry, try again" page (or just a dropped connection with no data). If they hit retry, they'll probably be in the 8000 this time around (or the time after that), and get their page in the same 1/2 second it takes everybody else. This means that transient load spikes clear immediately, and even during the spike people pounding "retry" aren't actually making things worse. And nobody's waiting 3 minutes to see _if_ the page could load.

The linux OOM killer is similar: if the system deadlocks, kill processes until you've freed up enough memory for the survivors to continue. Unfortunately, this is a nasty problem. Ignoring for the moment the decision of WHICH processes to kill (which ain't easy), how do you tell when the system's deadlocked? It goes through thrashing first, and thrashing can literally take DAYS to resolve into an actual deadlock. I rememer the 2.4.7 kernel with a memory manager prone to thrashing when I opened too many browser windows. More than once I left a system thrashing when I went to lunch (no more requests, just finish the tasks you're already processing), and when I got back sometimes it had recovered, sometimes it had deadlocked, and more than once it was STILL THRASHING. The difference from a user's perspective between thrashing and deadlock isn't that big: "I move the mouse and cursor doesn't respond for 30 seconds" is equally useless from both causes.

This _infuriates_ the mathematicians, of course. People who want perfect behavior from the system, and believe that intentionally discarding a user request is just offensive. If you're wondering where the phrase "the perfect is the enemy of the good" comes from, it's these guys.

These are the same people who insist that the return value of malloc() should tell you whether or not there was enough memory. The idea is to leave an "emergency pool" of memory idle, only used in emergency situations... except defining an "emergency situation" is a subset of "figuring out when we're thrashing", and every time you try to work out how big the pool has to be to guarantee you can recover from thrashing you wind up doubling the amount of memory in the system and leaving half of it idle and STILL not being sure it's enough...

Why so much? Because memory is shared, because clean memory can be dirtied, because when a gigabyte-sized pig like firefox or openoffice calls fork() the new copy might dirty all its memory, but most of the time it will just exec() a new program and discard its existing mappings... and you can't tell. Predicting a program's future memory access patterns is PREDICTING THE FUTURE.

The relationship between virtual and physical address space is illusory. The easiest way for a 32 bit system to run out of virtual address space is to attempt to mmap() large files, so when you _do_ run out it's not because you ran out of physical memory. At the other end, running 1000 copies of "bash" involves each process mapping the bash executable into its address space, and the way dynamic linking works (self-modifying-code!) every page of that mapping _could_ be dirtied. The operating system can't tell. The bash executable is about a megabyte, and bash forks a new short-lived copy of itself every time you use parentheses or pipes. In the real world, how does vetoing process creation halfway through a shell script (because the number of _potential_ writeable pages in the bash executable mapping goes over the physical memory installed in the system) differ from the OOM killer? By the time you ask how many pages of buffer two ends of a pipeline might have in flight between them, how much disk cache a process is allowed to dirty writing files to disk before you block the process and let the buffer drain... the ivory tower burns to the ground before you even get to the _hard_ questions.

So let's get back to "checking for a NULL return from malloc and attempting to actually _recover_ is useless". If this ever actually happens, the system is probably hosed to the point of needing a reboot. Your recovery code will never get any non-artificial testing. All the page fault handler can do is send the program a signal to kill it (the program has moved on by the time the system _actually_ runs out of memory), so the point of the wrapper is to make it so nobody _else_ ever has to check. If it triggers, it's like disk errors coming back from rotating media: the system needs medical attention.

(Note: disk cache means actual problems writing the files to disk often happen after the file is closed and the process has exited. The write() just put data into disk cache, if the program already exited since then there's nobody TO notify of an error. And the _reason_ we do this is waiting around for the results is orders of magnitude slower. The chef does not stand next to the diners to see how big the resulting tip is before starting to prepare the next meal, real life doesn't WORK that way.)

(Yes, nommu systems have no virtual address space, and their allocations can fail due to fragmentation instead of just resource exhaustion. But again, "kill the program if this happens" is a pretty reasonable reaction.)


December 30, 2012

Ok, got losetup done-ish. (Needs more of a test suite, but various things worked from the command line.)

Next up is umount (which uses the losetup code, and is smaller/simpler than mount) but for a change of pace I think I should clean up cp first.

Which gets me back to where I left off in cp: figuring out what counts as an illegal recursive copy. The gnu/dammit version vetoes "cp .. ." but allows "cp ../.. ." and I'm not sure _why_. The case I'm trying to avoid is "mkdir -p temp/temp/temp/temp; cd temp/temp; cp -R ../../temp ." and it...

Duh, all you have to do is stat the source, remember that dev/inode pair, and if you see it again you've gone recursive. Ok, that's actually clever. (No way the FSF came up with it.) Here I was mucking about with creating absolute paths...
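
The trick, sketched with invented names (presumably it'd get wired into the dirtree callback):

#include <sys/stat.h>

static struct stat top;  // filled in via lstat(source, &top) before the copy starts

static int went_recursive(struct stat *st)
{
  // same dev/inode pair means the traversal wandered back into the source
  return st->st_dev == top.st_dev && st->st_ino == top.st_ino;
}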

I note that I try not to look at gnu source code if I can help it (it's not so much a license contamination thing as a "life's too short to spend my hobby time trying not to vomit"), but I do run tests against it, up to and including strace.


December 27, 2012

The losetup implementation is _so_ much easier now that I've figured out how to properly encapsulate the code. And the losetup command itself is _so_ imprecisely specified: losetup -d can take multiple command line arguments (devices to deallocate). I should also test "losetup -da" and "losetup -dj filename".

Let's see: display current status when device given on command line (error if device isn't associated), display all devices with -a (silently skip non-associated devices), display all devices associated with file -j (is it an error if there aren't any? No, it is not.)

Associate one device. Associate device with offset. Associate with size limit. Change size of file and recheck size with losetup -c. Find first device with -f...

I haven't yet made -d actually _delete_ loop devices because I'm not sure how to tell how many precreated devices there are. (There's a /sys/module/loop/parameters/max_loop but it's 0 on my host and that's got 8 precreated loop devices.)


December 26, 2012

Figured out how I should probably handle losetup (run the losetup main() function from mount and have it leave the found device at the start of toybuf; avoids more than one file needing to mess with linux/loop.h directly _or_ parsing TT from the wrong context).

Didn't get to play with it today because people sent me bug reports. Lots, and lots of bug reports. Here's the first half of the one I'm currently debugging...


December 25, 2012

So "mount -o loop" needs to do losetup -f (which is best implemented with /dev/loop-control these days), which basically means it needs most of the guts of losetup.c. If I move said guts out into lib/lib.c they depend on the contents of linux/loop.c and I don't want lib.c to depend on linux/*. (Even if I make a compile time probe, there's either #ifdefs in the C code or conditionals in the build script, and so far I've avoided both. All that sort of thing belongs in individual commands that already have config symbols compile time probes can switch off. (These config symbols work on a simple generic rule: name of C file under toys/* matches name of config symbol.) The lib stuff is just lib/*.c which gets pared down by the linker discarding unused symbols.)

Even if I bit the bullet and did factor it out, a theoretical loopback_setup() function cares about the layout of the GLOBALS block. I can sort of abstract out the arguments into char **device (which could start NULL and be set to the allocated "/dev/loopX" string for -f) and char *file (which could be NULL indicating we don't associate but instead display the results of an existing association, although it needs a flag value to indicate whether an existing but unassociated loop device is an error, since it isn't for "losetup -a" but is for "losetup /dev/loopX", but I can just use (char *)1).

But the problem is the function needs more than that. It needs to know whether to open stuff read only or read/write (flag -r). It needs to know the device and inode of a comparison function for -j searches. So I can get it down to about 5 arguments... Although signalling we're NOT doing -j by setting dev and ino both zero is dubious, since lanana implies that NFS breaks everything, as usual. If I can't check toys.optflags the argument values get uncomfortably magic. (And I can't check losetup's optflags from mount, they're generated values and mount hasn't got losetup's macros defined in its namespace.)

Sequencing-wise, the -j stuff is in the middle between finding a device and binding or reporting the device, you filter out whether or not the device is one you want after you open it. To split that out I have to either provide a callback function (not an improvement) or split the one guts-of-losetup function in half with the first part returning a loop_info64 pointer out of linux/loop.h, meaning the linux header would have to be #included in multiple places because the users of the function need it.

Ick.

It really looks like what I'm going to have to do is have mount xexec() losetup and parse the output through a pipe, which is almost as disgusting as the other alternatives.

(The reason this wasn't a problem from the busybox version is it shared less code and was less clean. I didn't have generic infrastructure doing option parsing, I didn't care about #including linux/*.h from anywhere I felt like it, I threw #ifdefs around with abandon, and there was no POSSIBLE way to make the makefile I inherited any uglier than they already were. Making it work isn't a problem, making it _pretty_ is the problem. It takes an awful lot of work to get a simple result that looks obvious.)


December 24, 2012

Poking at losetup again. The tricky bit is I want to use the code as a standalone command, and reuse the code in mount and umount.

For mount I need essentially losetup -f, but it has to communicate back to the parent program which device it grabbed. This is fiddly because the option flags specifying -f and the global block are command-specific. If I write the functions the obvious way they're not portable, and if I write them to be portable (marshalling buckets of arguments into and out of two function arguments when they're already there in the global block) there are only two users.

One idiosyncrasy of /dev/loop-control is that the naive approach (check /dev/loop0 then loop1 and stop at the first one that isn't there) no longer works, because the ability to create and delete devices means the set active at any given time can be sparse. So I need to list the contents of /dev and parse the loop[0-9]* entries, so dirtree and a callback. Which is why data needs to be in the global block, because the callback isn't set up to pass things like FLAG_f. (I've got existing data structures for per-node context and global context, and adding a third layer is just awkward.)

I need to do a test suite, which requires root access to work. Unfortunately, while it's easy to come up with tests:

losetup -f -s
losetup -f file
losetup -fs file
losetup -af (fail!)
losetup -j file -s
losetup -j file file (fail!)

It's much harder to make the results reproducible. The output of querying a device includes dev and inode numbers that aren't reproducible, the paths of the associated device are absolute (and thus include the directory you ran the test in), and the order that losetup -a finds devices when it's doing a directory scan is kind of arbitrary (in my tests, it finds them in the reverse of the order the devices were created).

Also, losetup -f is inherently racy. It finds or creates a device, then tries to use it as a second step, and another instance could allocate the device in between those. I'm trying to figure out if this should report an error or if there should be retry logic in there...
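
The window is between those two steps; a sketch of the retry variant, using the /dev/loop-control ioctls out of linux/loop.h (error handling abbreviated):

#include <fcntl.h>
#include <linux/loop.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

// Bind filefd to a free loop device, retrying if another instance
// grabs the device between allocation and LOOP_SET_FD.
int grab_free_loop(int filefd)
{
  int ctl = open("/dev/loop-control", O_RDWR), n, fd = -1;
  char buf[32];

  while (ctl >= 0) {
    if ((n = ioctl(ctl, LOOP_CTL_GET_FREE)) < 0) break;
    sprintf(buf, "/dev/loop%d", n);
    if ((fd = open(buf, O_RDWR)) < 0) break;
    if (!ioctl(fd, LOOP_SET_FD, filefd)) break; // won the race
    close(fd); // lost the race (EBUSY): somebody else bound it, go around again
    fd = -1;
  }
  if (ctl >= 0) close(ctl);

  return fd;
}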

Maybe I should break down and have the losetup device scan sort the devices before trying to look at them, but this widens a similar race window with the ability to create/remove devices.

Sigh. If I wanted to do a half-assed job I'd be done by now. It's being stroppy at the design level.


December 23, 2012

How does one test a dog for vampirism? Apparently garlic is bad for dogs anyway, so that's the same problem as the "if you drive a stake through its heart, it dies" test: not really a good way to determine whether something is or isn't vampiric in nature.

Very time consuming dog. Very energy consuming dog. Very separation anxiety sleep deprivation dog.


December 21, 2012

Why does nobody in Washington understand basic economics? This "fiscal cliff" nonsense _can't_ raise interest rates, because that's not how interest works.

Interest rates are the return on investment you can get when you loan out money. When the economy sucks, interest rates go down because nobody is making any money, so nobody can afford to make the payments if they borrow more money at high rates. (The people willing to do so don't qualify for the loan. Yes, lenders can always get higher rates by accepting more risk, but beyond a certain point you're just gambling and lose more money to defaults.)

The current economy is stuck in a type of stall we haven't seen since the 1930's: nobody's making any money because nobody's spending any money, and nobody's spending money because nobody's making any money. Living off savings is terrifying if you can't replace them. This problem is easy enough to fix with something like FDR's "new deal" where the entity that can print money (and thus can never run out) buys a bunch of everything to push the economy back up to speed. But we've got a really really BIG economy and it takes a LOT of spending to get it unstuck, and the "stimulus package" of 2009 was maybe a third of what we needed. (Enough to turn "hoovervilles" and the "bonus army" into "occupy wall street".)

Ordinarily when demand goes down below where we need it to be to avoid layoffs, the federal reserve will offer to loan money at even lower interest rates until people are willing to borrow again (sometimes just to refinance their existing debts and lower their monthly minimum payments, thus freeing up new money to spend each month on goods and services). Unfortunately if you get a big enough shock the interest rate you'd need to offer to get monthly spending back up to a rate capable of keeping everybody employed is BELOW 0%, and the federal reserve can't offer that. And because this knob normally works so well at controlling the amount of monthly spending people do, the feds no longer have a backup plan for what to do when the knob hits its end stop. (Asking them to pull out the techniques FDR used in the 1930's is like asking people to pull out candles during a blackout: how quaint, how uncivilized, we don't DO that anymore...)

Unfortunately the old fogies stuck "fighting the last war" are treating this as a supply-side crisis ala the OPEC oil embargo of the 1970's, so they're busy giving water to a drowning man and blocking any attempts to drain the swamp because they cannot CONCEIVE of a problem where customers simply don't buy products producers are selling. The problem MUST be at the producer end, It's not like customers have any CHOICE in the matter, they're just sheep behaving mathematically without volition, right? Only business owners are actually _people_...

Rich people looking for places to park their savings DESPERATELY want rates to go up because right now they're losing to inflation. They don't understand why rates are so low; they're convinced it's a conspiracy on the part of the federal reserve to punish rich people and prevent compound interest from making them richer. They've invented a "bond vigilante" fantasy whereby any day now rates will MAGICALLY go up, and suddenly their vast fortune will be earning 3%, 4%, 5% above inflation instead of losing money to inflation every year. How will interest rates go up without any increase in people's ability to qualify for new loans or make additional monthly payments? Well, they just HAVE to. Because the alternative would be unthinkable!

Of course that's not how it works, but it turns out that people who made their money by modern white collar piracy ("leveraged buyouts") don't have to understand how the economy works any more than sports players who win via steroids (and retire at 30 with epic health problems) really understand anatomy, biochemistry, neurology... An olympic medal does not qualify you to perform surgery.

Speaking of inflation: it turns out the federal reserve could fake negative interest rates by raising the rate of inflation, because 5% inflation and 1% interest is essentially -4% interest. Go ahead and borrow money: by the time you have to pay it back it won't be worth as much anyway. In fact your existing debts get slowly eroded and less troublesome. But rich people HATE inflation eroding their existing fortune, and will fight to the death to stop this from happening.

P.S. a leveraged buyout is where you borrow a bunch of money to buy a profitable company, often using that company's assets as collateral. (Just like when you get a mortgage on a house, the house you're buying is the collateral.) Once you're in charge not only can you drain the company's bank accounts and pocket the money, but you can transfer your loans into the company's name so the debts you ran up buying the company are no longer your problem. If you haven't maxed out the company's credit rating yet you can have the company borrow MORE money (and pocket it). Next, chip off any large assets (buildings, a profitable division) and sell them, pocketing that money. Laying off employees can reduce expenses and allow the company to qualify for more loans. Rewrite the employee contracts so any retirement benefits are no longer based on savings and will be paid from future revenues, so you can pocket any existing pension fund. When there's nothing left to loot, sell the desiccated husk of the company and pick a new target.

This is how people like Mitt Romney made their money. Yes, it starts with the ability to borrow millions of dollars, which is easier to do if your daddy used to be the governor of Michigan. If this sounds utterly evil, you obviously don't understand the realities of business where corporations are people and employees are "resources, comma, human".

That said, stealing the Mona Lisa from the Louvre still doesn't make you Leonardo Da Vinci. Obtaining is not making. This is why the correct response to calling rich people "job creators" is to point and laugh.


December 20, 2012

Downloaded the new PC BSD 9.1 release and fired it up under qemu 1.3.0. It hung endlessly on something like PCI bus scanning, with and without ACPI, so I told it to boot in "safe mode" (what is this, windows? That would run under qemu...) and it panicked saying it couldn't find a time source. So much for this year's interest in BSD.

New dog is very time consuming dog.


December 19, 2012

I uninstalled my irc client after someone on there insisted that the http://landley.net/aboriginal/bin directory (which contains nothing but symlinks into aboriginal/downloads) was confusing them, and that I needed to remove it so their brain could cope. (This was the same person who said I should grab aboriginal.org or ab.org despite both of those already being taken: the whois command is a thing that exists.) Really: I have other things to do with my time.

Part of my short temper is due to my normal tendency to switch to a night schedule and Fade's morning-person tendencies wanting me to be on the same schedule as her, so when I do shift to a night schedule these days she wakes me up every couple hours to see if I want to get up now, and then I'm groggy but not sleepy all night.

Plus still recovering from injuries, which the tetanus shot more or less qualifies as at this point. (My arm is swollen, and the irritation is somehow maintaining the outlines of the band-aid days later.)

Fade and I are getting a dog. The cats are just going to be _thrilled_. We've spent about 4 hours a day all week looking at various dogs, and have gone through a half dozen where we decided "ok, we want this dog" and then either it's adopted out from under us (we were told they couldn't be reserved before pickup, except that the one Fade wanted most was reserved when we came to pick it up), or "we forgot to mention that this dog was part of a bonded pair that can't be separated" or "now, about this dog's medical problems..." But Fade really wants a dog, so we keep at it. Amazingly time consuming, dog hunting.

I need to finish filling out paperwork for the job I start a little over a month from now up in Minnesota. It's a six month contract: as much fun as toybox is, I'm still paying a mortgage on a place three times bigger than I used to live in, and putting my wife through college. (I'd pondered doing a kickstarter or something to see if anybody wanted to sponsor some full-time toybox work, but Fade wasn't enthused about the idea.)

I need to repost the perl removal patches. Even though it's the merge window, and I've posted them to the list a half-dozen times over the past three years, and they don't actually change the generated files, I should probably try to feed them through the linux-next tree, because the kernel development clique is ossifying a bit in its old age, developing ever-more layers of procedure and ritual. I downloaded the linux-next tree and read a bit of the wiki, but so far there's nothing about actually submitting patches to it. Possibly it's in Documentation somewhere...

Friend visiting from out of town this weekend. (She used to run Mensa games night before retiring to Maryland).


December 18, 2012

I have a needle-phobia. Today, I got a tetanus shot. That was pretty much my day.

I put it off as long as possible but after the weekend's incident with the wire hoop and the picture of bleeding I posted to twitter with the "do not click on this link" warning... it's a piece of metal lying on a neighbor's lawn, out there long enough to corrode a bit despite being some variant of stainless steel. And the last time I _might_ have had a tetanus shot was 9 years ago.

Then Fade took me to look at dogs, I got home, played skyrim, and fell asleep on the couch until almost midnight. I have a bunch of things I _should_ do, but really wasn't up to any of them.


December 17, 2012

If anybody cares about the patches removing perl from the linux kernel build, I just posted 3.7 versions to the mailing list: 0/3, 1/3, 2/3, and 3/3.

My direct mail sending script mangled them slightly (the archive sort of has a long name for me, but not quite?), and I'm still waiting for the list to send copies back to me so I can see how they came through, but I did the canonical patch format with diffstat, sent them to the get_maintainers cc: list, and it's at least not whitespace damaged (unlike Balsa). With several days of merge window left.

Just like the last half-dozen times...


December 16, 2012

I got a toybox release out, and an Aboriginal Linux release out.

And I tripped over a wire hoop in the neighbor's yard and re-injured my darn foot. Much bleeding. Really annoyed. Probably need a tetanus shot.


December 14, 2012

My giant build finally completed sometime after midnight (it takes more than a full day to build all targets on this netbook, and that doesn't include the native compiles). And the 3.7 kernel broke arm, mips, and i686. That's above average collateral damage for a single package upgrade.

I dealt with i686 day before yesterday. Bisecting arm gives me commit 387798b37c8d which added multiplatform support and changed the Arm platform type default from ARCH_VERSATILE to ARCH_MULTIPLATFORM. Ok, add the explicit config symbol to LINUX_CONFIG in the arm target files... and it builds and boots. Right, rebuild the kernel on all those targets...
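
In other words, one line back in each arm target's LINUX_CONFIG block (a sketch; the symbol is the one the bisect pointed at):

CONFIG_ARCH_VERSATILE=y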

I have GOT to get a faster laptop. Or at least a server I can connect to and knock out some fast compiles just to show there aren't any OTHER problems...

Let's look at mips:

arch/mips/lib/delay.c:24:5: warning: "__SIZEOF_LONG__" is not defined

Lovely. This is another toolchain version thing, isn't it? Do a quick "gcc -dM -E - < /dev/null | grep SIZEOF" on both toolchains and... yes. Yes it is.

Ok, looks like it's time to update the kernel's README and Documentation/Changes, because gcc 3.2 and ld 2.12 ain't gonna work no more. I'm having to patch gcc 4.2.1 and binutils 2.17 to get them to build this sucker, and this is no longer "the sh4 maintainer is an asshole" territory: this is two different architectures breaking even the toolchains I test (much newer than the documented requirements) in the same release.
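
The macro itself is easy enough to fake on old compilers; something like this shim could be patched in (a sketch, assuming 32 vs 64 bit longs are the only cases that matter here):

#ifndef __SIZEOF_LONG__
#ifdef __LP64__
#define __SIZEOF_LONG__ 8
#else
#define __SIZEOF_LONG__ 4
#endif
#endif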

On the bright side, fix that and it seems to be working again.

Gotta test the native builds. Gotta cut a toybox release. Gotta send the perl removal patches upstream (possibly into linux-next). I should check the armv4teb target to see if I can finish it in a reasonable amount of time. I should see if powerpc-440 can actually work with qemu's new -M bamboo board emulation. I should dig up the qemu-m68k branch and make puppy eyes at Laurent...

But first thing after cutting this release: get back to the ccwrap rewrite so I can switch to musl.


December 13, 2012

I need to send the perl removal patches upstream again, and deal with a backlog of documentation patches I've tagged, but Balsa is crap. There is NO WAY to get it to avoid whitespace damaging patches. Even forwarding a message AS AN ATTACHMENT did whitespace damage.

I checked the kernel's Documentation/email-clients.txt and it doesn't mention Balsa (unsurprising) and specifically says that the gmail web front end does not and cannot be made to work. (Longish list of reasons, including it converts tabs to spaces, period.)

Meanwhile, I've got the list of marked patches in balsa that I need to extract _from_ balsa. Right click on the message and... there's no "save as" option. Great. None of the icons available when I've got a message selected do it. The file, edit, view, mailbox pulldown menus have nothing. If I "view source" on a message and cut and paste from that window, it turns tabs into "\t". (What were they smoking?) I tried creating a new mbox file to copy the messages to (undoing the mime encoding would be a pain, but it's something), but right click has no copy! It has move, but there's no option to leave the original message in place instead of marking it deleted. (What is this, some kind of DRM enforcement? There can be only one copy?)

I eventually found something usable under the Message pulldown menu, called "Save current part", which can deal with my flagged messages. But in terms of sending _out_ new messages containing non-whitespace-damaged patches, Balsa simply can't do it.

So for a third time, I'm writing python code to fling mail around, using the builtin packages in the python standard library that do this stuff with only a couple lines of code from me, the meat of which is:

import getpass, smtplib, sys

# sender is hardwired, recipients and subject come in on the command line,
# and the message body is redirected in on stdin (see below).
sender = "me@example.com"            # hardwired to my address in the real script
recipient, subject = sys.argv[1], sys.argv[2]
password = getpass.getpass()         # fetch the password however you like

recipient = recipient.split(",")

# First recipient goes in To:, any others in Cc:.
headers = ["From: " + sender, "Subject: " + subject, "To: " + recipient[0]]
if len(recipient)>1: headers.append("Cc: " + ",".join(recipient[1:]).lstrip())
headers.extend(["MIME-Version: 1.0", "Content-Type: text/html", "", ""])
headers.extend(sys.stdin.read().split("\n"))
body = "\r\n".join(headers)

session = smtplib.SMTP("smtp.gmail.com", 587)
print session.ehlo()
print session.starttls()
print session.ehlo()
print session.login(sender, password)
print session.sendmail(sender, recipient, body)

So I hardwire sender to my email address, pass a comma separated list of recipients and a subject string on the command line, and redirect a file to stdin containing the body of the message.

This means I have now written a POP receiver, an SMTP sender, and an mbox filter in python, because in each case it was EASIER THAN FIXING BALSA. If I could decide whether to pursue the gtk or qt bindings (or some other gui library), I'd just write a front end and be done with it. (I can compose messages in "mousepad".)

But I don't _want_ to write an email client. I just want to _use_ an email client. One that isn't crazy. I have other things to do...


December 12, 2012

So 3.7 added "static __initconst u64 p6_hw_cache_event_ids" to arch/x86/kernel/cpu/perf_event_p6.c and it's breaking my i686 toolchain. What the heck is __initconst? It's defined in include/linux/init.h as "#define __initconst __constsection(.init.rodata)" and right below that is an #ifdef CONFIG_BROKEN_RODATA for toolchains that don't handle this. Which is only set for parisc right now, but apparently applies to anything still using gcc 4.2.

One way to fix this is to default BROKEN_RODATA to y (which works), but I don't want to maintain yet another patch against the kernel that has no chance of going upstream. Instead I should probably figure out how to patch gcc. I've been meaning to do an upgrade like the binutils one, where I move to the gcc repository commit right before they went GPLv3 and then fix whatever's broken in that random snapshot, on the theory this might provide ARMv7 support. That would be a good target to support. (The new 64-bit ARMv8 will definitely require a non-gcc toolchain.)
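
The kernel-side workaround really is that small, something like this wherever the symbol is declared in Kconfig (a sketch of the patch I don't want to maintain):

config BROKEN_RODATA
	bool
	default y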

Unfortunately, the gcc repository is crap. As far as I can tell the project is _still_ maintained in subversion and they just mirror it in git, and there are no tags, or even obvious commit comments announcing releases. I have yet to figure out where the release I'm _using_ is. A git log on the COPYING3 file found the commit that introduced that, and fairly extensive grepping of the commit before that (c473c074663de) didn't find any references to GPLv3 (or "[Vv]ersion[ \t]*3" that's actually about the license instead of libstdc++, or several other variants...) However this commit requires MPFR and GMP to build meaning it's off into 4.3 territory, and according to my notes 4.2.2 was GPLv3, so it looks like tags weren't the only history that got lost in the repository conversion. Sigh.

And then I installed mpfr and gmp on the host just to see what would happen, and the build broke in a hilarious way:

In file included from /usr/include/stdio.h:28,
from ../../.././libgcc/../gcc/tsystem.h:90,
from ../../.././libgcc/../gcc/libgcc2.c:33:
/usr/include/features.h:324:26: error: bits/predefs.h: No such file or directory
/usr/include/features.h:357:25: error: sys/cdefs.h: No such file or directory
/usr/include/features.h:389:23: error: gnu/stubs.h: No such file or directory

Translation: they did the -nostdinc thing with a lot of -I and prevented the standard headers from finding themselves, because gcc is _special_. It can't cope with building like a normal program, no, it has to micromanage the host compiler. (Even though all it DOES is parse input files and produce output files which is NOT HARD. Except for the FSF.)

Broke down and just did the kernel workaround for the moment, which got i686 building. Set a buildall.sh going to see what other targets break...


December 11, 2012

The 3.7 kernel dropped last night, so today is patch update day. My hacks to the arm "versatile" board for better qemu support don't apply cleanly anymore. (The reversion of the IRQ changes that break qemu still applies cleanly, but all the menuconfig symbol stuff to stick different processor types in the same board had context change around them.) I don't need to apply the ext4 stability fix since that's upstream.

Tryn's old "make BOOT_RAW a selectable menuconfig option" had another context change but I yanked it rather than rediff it because I'm not actually using it. (Possibly I should give that one more submission. The real value there is the help text...)

And then there's the perl removal patches. That's not just version skew, upstream had several changes: different #ifdef guards in the generated headers, and the UAPI changes finally went upstream so the kernel headers that get exported to userspace are now split out and kept in a different directory. Instead of chopping out "ifdef KERNEL" blocks while exporting them, the kernel's private headers #include the uapi versions. (They still need a pass to remove __user annotations and put underlines around __asm__ and so on. The "don't use u8 as a type in anything userspace sees" appears to be a coding convention rather than something the scripts clean out, another reason to separate the files rather than have different coding conventions inside and outside special #ifdefs.)

With stuff this fiddly, the best way to see what's changed and make sure you don't miss anything is to "git log" the old perl files and "git show" each commit that touched them so you see the patch and explanation, then make the corresponding changes to the shell script version. When I wrote the shell script I sat down and worked through everything it was doing and diffed the resulting generated files and it took _days_. But now that I've got equivalent behavior, I just want to see what new things showed up.

Which brings us to the new requirements, such as removing the _UAPI prefixes from the #ifdef guards preventing multiple inclusion in these files. Why do they need to do that? Git commit 56c176c9cac9 explains:

Strip the _UAPI prefix from header guards during header installation so that any userspace dependencies aren't affected. glibc, for example, checks for linux/types.h, linux/kernel.h, linux/compiler.h and linux/list.h by their guards - though the last two aren't actually exported.

I.E. "FSF code is buggy crap full of brittle assumptions about internal implementation details of the headers linux exports to userspace, and if we change those magic details glibc breaks, so work around this bug." The description talks about glibc, but the example breakage they cut and pasted was libtool. (I note that _one_ of the things the header export script has done for years is chop out all references to linux/compiler.h. But glibc and libtool are explicitly looking for it.)

The FSF wants _desperately_ to be the microsoft of the open source world, and they seem to think the way to get there is to produce code as bad as Redmond excretes. Hence the second paragraph of Documentation/CodingStyle in the Linux kernel, which Linus wrote in the very first version of that file back in 1996:

First off, I'd suggest printing out a copy of the GNU coding standards, and NOT read it. Burn them, it's a great symbolic gesture.

And you wonder why I'm following the #musl channel on freenode (after years of trying to use uClibc)?

Anyway, with the perl removal patches updated, now it's time to try a test build, and generally i686 is safest:

CC arch/x86/kernel/cpu/perf_event_p6.o
arch/x86/kernel/cpu/perf_event_p6.c:22: error: p6_hw_cache_event_ids causes a section type conflict
make[3]: *** [arch/x86/kernel/cpu/perf_event_p6.o] Error 1

But not in this instance. Right, what's going on here... it's complaining either about building with gcc 4.2 or 32-bit, because building the host version with the same .config (64-bit, gcc 4.6) happily proceeds. And 3.6 happily built this file. Ok, git bisect time...

Wow. A clean bisect (when's the last time that happened?) to commit e09df47885d7 which I'll have to look at in the morning because it's 5am.


December 10, 2012

Various people are surprised that when 1/1000th of the population skims off about half the wealth, everybody else is poorer. Sigh.

Did nobody pay attention to how capitalism _works_? That all profit is inefficiency in the market (it means there wasn't a competitor selling at closer to cost), and the way rich people get rich is generally by some variant of "cornering the market", I.E. fencing out competitors and selling to a captive audience. Whether it's a natural winner-take-all niche due to economies of scale or an artificial one due to patenting algebra, sustained profits require what Warren Buffet called "a moat around the business".

This is magnified by compound interest, the fact that earning 10% interest on a billion dollars gives you 100 million dollars for sitting at home doing nothing (and the historical rate of return of the US stock market over the past century averages out to a bit over 10% annually, including both world wars and the great depression). This is why the rich get richer, and if the system doesn't balance itself out you wind up with the French Aristocracy saying that if the starving peasants outside have no bread "let them eat cake" instead.

For a long time what the US did to counter this was tax the hell out of the rich, both to keep their share of the pie from crowding out everyone else and to pay for things like interstate highways, public schools, and tracking down Typhoid Mary.

Fifty years ago the top tax rate was 91% on individuals and 52% on corporations, and we used that money to put people on the moon and invent the transistor. This is why we didn't have to worry about "Citizens United" because the rich didn't have more money than the rest of us combined. The rest of us got together and voted to tax them into submission. Not just to raise revenue but to keep society balanced.

In 1964 President Johnson lowered the top tax rate from 91% to 70%, but it was really Ronald Reagan who screwed everything up: In 1981 President Reagan lowered the top tax rate from 70% to 50%, and then in 1986 lowered it again from 50% to 28% (and also raised the _bottom_ tax rate from 11% to 15%: yes, he took from the poor to give to the rich). A quarter century of compound interest later, concentrating more and more wealth into fewer and fewer hands, and "the 1%" own the GOP outright.

Of course the math of Reagan's tax plan didn't _work_, and our enormous national debt is the result, as is lingering economic weakness. The whole "oh noes, Japan is eating our lunch, now India is eating our lunch, now China is eating our lunch" mess is because Ronald Reagan and two Bushes screwed up a good thing. These days most of the money the US economy churns out is skimmed off by well-connected parasites. It no longer goes into fixing crumbling bridges, upgrading our ancient and decrepit national electrical grid, or putting a fiber optic cable to every home (like South Korea did a decade ago), or any of the other important things we "can't afford" to do.

(The other way the US dealt with monopoly profits is by breaking up monopolies using the Sherman Antitrust act. They broke up "Standard Oil" and as a result 4 of the 5 largest companies today are oil companies. They _didn't_ break up Ma Bell (their 1957 action resulted in a consent decree allowing them to continue but not expand outside the phone business) and the resulting company stagnated so badly over the next quarter century it changed its mind and allowed itself to be broken up in 1984, the breakup giving rise to cell phones and turning modems from a shameful semi-illegal abuse of their phone network (hooking up unofficial equipment to the phone lines, for shame!) into ubiquitous home internet access. Unfortunately, the Party of Reagan also gutted sherman antitrust enforcement, so for example the 1995 and 1998 actions against Microsoft came to nothing.)

Add it all up and the weakness of the US economy, where parasites have sucked out half the blood and wonder why the beast's health is failing, is starting to hurt the rich. They keep treating near-zero interest rates as a plot to prevent them from continuing to compound their wealth. But loaning out money through credit cards doesn't work when the cards are maxed out because the cardholder is unemployed. Loaning out money in home mortgages doesn't work when a wave of repossessions has trashed everybody's equity and they can't afford the down payment on a new one.

Low interest rates won't make poor people living from paycheck to paycheck borrow more if they have mountains of existing debt, all they'll do is refinance. Even if they want to borrow more, the first thing they'll do with the money is pay off their existing high interest loans, lowering their overall interest rate without increasing their overall level of debt. To avoid that, the rich can't let them "qualify" for new loans; even tightly controlled store credit cards that can ONLY be used for new spending mean they might put the groceries on that and use the grocery money to pay down the other high rate credit cards.

The problem rich people have trying to park their money in a depressed economy is that compound interest isn't a mathematical abstraction, it's loaning or investing money in people who create new value by doing work. If the people aren't working, value doesn't get created: you can't take what doesn't exist. If the people's work earns less and less money, they can't buy stuff, and your big business has a shortage of buyers to sell stuff to.

Sometimes I ponder the distance between Occupy Wall Street and the French Revolution and try to work out how much pain this country would actually need before declaring billionaires a game species. But mostly I console myself with the knowledge that the people who screwed this stuff up, and the people who profited from it, are in their 80's now. They and the baby boomers behind them will all die soon, and then a new generation gets a crack at it. Preferably with at least a 50% inheritance tax.


December 9, 2012

Got in a long walk today for the first time since I hurt my foot: walked to Dragon's Lair and back. Got to see Randy and several other people (several of whom I was not prepared to meet but said hi anyway).

Stopped at The Donald's along the way, and got some programming in!

Ok, todo items, otherwise known as "procrastinating about the linux 3.7 update". Test current toybox in aboriginal, fix aboriginal's native-build.sh so it actually finds netcat after the busybox->toybox switch, check baseconfig-busybox for more commands toybox provides that I can switch off in busybox (looks like cut, rm, touch, hostname, and switch_root).

Try the lfs-bootstrap build: it breaks in m4, hanging on an rm prompt about deleting a read-only file without -f, which is odd because the previous rm didn't and all the rm instances in configure look like they have -f. Try a chroot splice version so I can more easily track down where that's called from, and chroot-splice is saying that the read only bind mount is writeable. And an strace on mount shows it's passing through both the bind and ro flags, and the result is still writeable. Is that a regression in Ubuntu 12.04's kernel... no, apparently the read only attribute can only be applied on a remount, not on the initial bind mount, which is CRAZY. (Easily fixed, but still crazy of the kernel to require.)
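
In mount(2) terms, the flag is silently ignored on the initial bind and only takes effect on a second remount pass, something like this (a sketch):

#include <sys/mount.h>

int bind_ro(char *src, char *dst)
{
  // MS_RDONLY combined with the initial MS_BIND is silently ignored...
  if (mount(src, dst, "", MS_BIND, "")) return -1;

  // ...so read-only has to be applied as a remount of the bind.
  return mount("", dst, "", MS_REMOUNT|MS_BIND|MS_RDONLY, "");
}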

Ok, the delete is actually from build-one-package.sh in the control image bootstrap infrastructure, which is deleting config.guess and substituting a tiny "gcc -dumpmachine" instead (why config.guess doesn't do that itself...).

This call to rm doesn't have a -f on it, which is fine because it's a symlink (to a read-only bind mount, but the symlink itself is in a writeable directory). The problem seems to be that faccessat() is not honoring AT_SYMLINK_NOFOLLOW. In fact, strace says:

faccessat(AT_FDCWD, "build-aux/config.guess", W_OK) = -1 EROFS (Read-only file system)

And that's not even showing the fourth argument to the syscall. Is there a kernel version limitation? Let's see, cd to linux source, 'find . -name "*.c" | xargs grep sys_faccessat' and it's in fs/open.c and... I don't even have to do a git annotate, it only has 3 arguments. So either the man page is wrong or libc implements wrapper glue that uClibc is getting wrong.
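
Meaning a wrapper that honors AT_SYMLINK_NOFOLLOW has to fake it on top of the 3-argument syscall, something like this sketch (hypothetical, not uClibc's actual code):

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int faccessat_nofollow(int dirfd, char *path, int mode)
{
  struct stat st;

  if (fstatat(dirfd, path, &st, AT_SYMLINK_NOFOLLOW)) return -1;

  // Symlinks are always chmod 777 on linux, so there's nothing to check.
  if (S_ISLNK(st.st_mode)) return 0;

  return faccessat(dirfd, path, mode, 0);
}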

Sigh. I can check the link status in the stat info dirtree is already giving me (symlinks are always chmod 777 in linux), the problem is what if the directory it's in is read only? faccessat() should tell me if the path to here doesn't let me fiddle with it. Then again, if that is the case I can't delete it anyway so prompting is kinda moot...


December 8, 2012

And ubuntu crashed, for the first time in a while. (It was X11: chrome tabs kept freezing, then _any_ chrome tab would freeze after a few seconds, then the whole of X froze so badly the mouse pointer wouldn't move and ctrl-alt-F1 wouldn't give me the text console. Remember, the system is SO MUCH MORE STABLE if your only way of interacting with it is through a userspace process.)

This means my 8 desktops full of open windows all went bye-bye. Probably a sign I was swap-thrashing, but it's back to my todo lists to try to figure out what I was working on...

Stopped by Dragon's Lair's "webcomics rampage" thing shortly after lunchtime, but apparently they don't start until 6pm Saturday this year. Ok...


December 7, 2012

Saying "Since the server breakin we've deployed SELINUX" roughly translates to "Since that boat sank we've sprayed WD-40 on everything". Not helping. Really, not helping.

(See also: thinking WD-40 will turn a boat into a submarine, arguing about the merits of WD-40 vs scotchguard for stopping a dripping faucet, allowing people who sell undercoating and extended warranties to design IT security "solutions"...)


December 6, 2012

Sometimes, posix is so nuts _nobody_ implements it properly.

The posix rm spec, section 2, says how to handle interactive prompts for recursive deletion of directories. Section 2(b) says to prompt before descending into a directory, and 2(d) says to prompt before deleting the now empty directory.

This is not what the gnu/dammit implementation of rm does:

$ mkdir -p temp/sub
$ rm -ri temp
rm: descend into directory `temp'? y
rm: remove directory `temp/sub'? y
rm: remove directory `temp'? y

It only prompts before descending into a non-empty directory. But the spec doesn't say anything about the directory being empty, it says you prompt for 2(b) and you prompt again for 2(d).

Also, section 4 is just awkward. The bits dealing with directories should be 2(e) (because you can't get there unless you made it through 2(a)), and the bits dealing with files should be 3(b).

Oh, and posix requires infinite directory traversal depth (even though filesystems have a finite number of inodes), and explicitly says you can't keep one filehandle open per directory level. This means that A) you have to traverse ".." to get out of the directory you're in, pretty much guaranteeing that two parallel rm -rf instances on the same tree cannot both complete, B) you have to consume an unbounded amount of memory caching the directory contents because you can't keep the filehandle open and restarting directory traversal with -i would re-query about files you already said "n" to at the prompt.

Somebody really didn't think this one through, but even _trying_ to make it compliant means I have to start over and write a lot of bespoke code that only applies to rm. Not sure it's worth it. (I'll probably break down and do it, but I'm going to sulk a bit first and call the standards guys names.)


December 5, 2012

Updated the dirtree infrastructure to feed parent to dirtree_add_node() so it can print the full path to errors. The _other_ thing I need to work out how to do is notify a parent node that one of the child nodes had an error.

I'm going to have to have multiple negative values in parent->data for a COMEAGAIN callback, aren't I?

Oh well, could be worse. The two uses for that field are dirfd for directories and length for symlinks. (Should be zero for normal files.) Symlink length should never be negative and the only negative fd is AT_FDCWD (which is -100, and that's hardwired into the linux ABI at this point).

No, that doesn't work because I can't just reach out and set parent->data at failure time because it's using that filehandle to iterate through a directory and there may be more valid entries after the failing one. So I'd have to defer setting it, which means I need another place to store it which means it should just _stay_ there. I'm going to have to allocate another variable in struct dirtree, aren't I? (I keep being tempted to overload fields in struct stat, but they're just not well defined enough.)
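
I.E. something like this (a sketch of the direction; the existing field layout here is approximate, not copied from lib/lib.h):

struct dirtree {
  struct dirtree *next, *parent, *child;
  long extra;   // the new slot: deferred error status, whatever callers need
  long data;    // dirfd for directories, length for symlinks, 0 for files
  struct stat st;
  char *symlink;
  char name[];
};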

Actually getting the functionality right took a couple hours. Getting error handling/reporting right is coming up on day 3.


December 4, 2012

I wonder why "rm" cares about the files it's deleting being read-only? The "unlink" command doesn't. Oh well.

Yeah, finally working on The Most Dangerous Command, which I've held off on not because it's hard but because it's most likely to screw up my system if I get it wrong.

The yesno() stuff is wrong: it's checking stdin, stdout, and stderr (in order) to find a tty and using the first one it finds, meaning it can try to write a prompt to stdin. I can special case my way past this, but in general working out when "yes y | thingy" should feed answers to yesno and when "zcat file | tar" should bypass stdin and bother the tty... I think the caller has to specify the behavior it wants. Gonna have to rewrite that function, but I probably need more than 2 users to work out the right semantics.
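
A sketch of what "caller specifies the behavior" might look like (hypothetical signature, not the current lib function):

#include <stdio.h>

// def: default answer. usetty: bypass stdin/stdout and bother the
// controlling terminal directly (the "zcat file | tar" case).
int yesno(char *prompt, int def, int usetty)
{
  FILE *in = stdin, *out = stdout;
  char buf[16];

  if (usetty) {
    FILE *tty = fopen("/dev/tty", "r+");

    if (tty) in = out = tty;
  }
  fprintf(out, "%s (%c/%c): ", prompt, def ? 'Y' : 'y', def ? 'n' : 'N');
  fflush(out);
  if (fgets(buf, sizeof(buf), in)) {
    if (*buf == 'y' || *buf == 'Y') def = 1;
    else if (*buf == 'n' || *buf == 'N') def = 0;
  }
  if (in != stdin) fclose(in);

  return def;
}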

The ubuntu man page for rm has a --no-preserve-root option, which is a technical violation of posix but probably a good idea anyway. Except that longopts with no corresponding short opts are kinda tricky. (I _sort_ of have support for them: you can put parentheticals before all the other options and it'll parse them and set a flag. But there's no FLAG_ macro for those, and teaching the shell script to do that sounds painful; it would wind up being something like FLAG_no_preserve_root anyway.)

I could trivially just ignore "/" (and have the error message be "rm /. if you mean it"), but posix doesn't require it and it conflicts with simplicity of implementation. If you log in as root and "cd /; rm -rf *" or "rm -rf /*" you're equally screwed without hitting the special case. Doing a realpath check for "/" might catch a couple more ".." than you expected, but how is taking out "/home" by accident much better?

Another fun little piece of special case behavior:

mkdir -p sub/other
touch sub/other/file
chmod 400 sub/other
rm -rf sub

This complains about being unable to delete "sub/other/file". It does _not_ complain about being unable to delete sub/other, or sub, so error reporting is suppressed as it works its way back up the tree. (But only for parent directories, not siblings and aunts and such.)

My first pass at the code complained about being unable to open ".", "..", and "file", with no paths. Even though . and .. are discarded by the callback what it was complaining about is inability to stat and thus generate a dirtree node, because the directory has r but not x, meaning you can list the contents but not stat the files in it. (How is that useful, you say? No clue.) The dirtree infrastructure notes the stat failure and doesn't construct a node, thus can't call the callback. I can trivially filter out . and .. from error reporting, but giving a path to file means calling dirtree_path() which means having a node to call it on.

I think what I need to do is move the error handling to the callback, which means have it make a node with zeroed stat info. I can probably detect that by looking at st_nlink which should never be zero, but occasionally is. Sigh, inband signalling vs wasting space in every node. And either way I need to go fix the existing users in ls and chgrp and such... Not liking this. What's the alternative? Making generic infrastructure too clever is unpleasant, but having it display the full path to a file it couldn't access is probably the right thing...

Also, really hard to make a test suite that captures the error output in a way the test passes on multiple implementations producing slightly different error messages. Maybe I can just count the number of lines or wash stderr through grep or something...


December 3, 2012

The losetup command is more complicated now than when I wrote one for busybox, mostly because the kernel keeps growing new features. You can set the _length_ of an association now, not just the starting offset. It's got a "capacity check" thing that updates the loopback device to match a changed file size (which is _not_ the same plumbing as setting the length of the association because the people writing this aren't big into code reuse). You can iterate through all loop devices associated with a given file...

The fun one is /dev/loop-control, meaning loop devices come and go now, so the -a, -f, -j, -d, and possibly even the basic association options are more complicated, in a way I'm not sure how to avoid race conditions for, and only on kernels after July of last year. (xubuntu 12.04 has this, but 10.04 didn't.) And looking at the log of drivers/block/loop.c in the kernel, there's a LOT of activity going on in what you'd think would be an ancient stable system: partition support (tweaked August 2011) with LO_FLAGS_PARTSCAN (no support in the userspace losetup xubuntu's using), and of course the kernel parameter to set the number of pregenerated loop devices (potentially to zero, so you _must_ use loop-control to request new ones).
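
The moving parts, all out of linux/loop.h (a sketch just listing the ioctls, error handling omitted):

#include <linux/loop.h>
#include <sys/ioctl.h>

void new_loop_toys(int fd, int ctl, int n, struct loop_info64 *info)
{
  ioctl(fd, LOOP_SET_STATUS64, info); // offset AND length (lo_sizelimit) now
  ioctl(fd, LOOP_SET_CAPACITY, 0);    // separate "backing file grew" resync
  ioctl(ctl, LOOP_CTL_GET_FREE);      // /dev/loop-control: find or create...
  ioctl(ctl, LOOP_CTL_REMOVE, n);     // ...and delete devices on the fly
}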

It's a bit like moving from the old static PTY devices to /dev/pts, except not quite cleanly separated.


December 2, 2012

One of the musl guys expressed interest in a big endian arm target for aboriginal, so I'm taking a stab at that. It's... tetchy.

Ok, the old armv4eb target built something, but qemu couldn't boot it. The main problem here is it's oabi, which is obsolete. Nothing uses OABI anymore, and EABI requires thumb extensions, so we need to bump up a notch in processor version to support this.

New target config, armv4teb, based on an unholy union of armv4tl and armv4eb. Diff the two and see what changes armv4tl needs. The gcc/binutils tuple is derived from the filename, so that should be ok. In the uClibc config there's an "ARCH_WANTS_BIG_ENDIAN", set that.

Next problem: the kernel doesn't want the versatile board to be big endian. There's some big endian plumbing support but the key is declaring an ARCH_SUPPORTS_BIG_ENDIAN symbol that currently only the ancient mach-ixp4xx declares, more or less what the armv4eb oabi stuff from before was aimed at. (Apparently you can declare the _existence_ of symbols inside conditional blocks testing on symbols which may be modified at runtime. Wheee.) This big endian plumbing has further derived symbols for armv6 and armv7 processors, but nothing sets it. Leaked infrastructure to support out of tree boards, looks like. The plumbing's there but no board definition uses it.

So, patch the kernel kconfig so the versatile board is even MORE versatile, and then set CONFIG_CPU_BIG_ENDIAN. Now try to build it...
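
The patch boils down to letting the versatile board select the symbol, roughly (a sketch against the board's Kconfig entry, not the actual diff):

config ARCH_VERSATILE
	select ARCH_SUPPORTS_BIG_ENDIAN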

And the sanity test at the end of simple-cross-compiler fails because libc.so is big endian but the compiler is trying to build a little endian hello world. And it's doing that because uClibc feeds a CFLAG to force big endian but the default in the compiler is still little endian. Why is the default wrong? Dig, dig... gcc's libbfd is testing the host compiler to set the target endianness. That's just SAD. Ok, in the target config "export ac_cv_c_bigendian=yes" and try again... and that test is coming out right but it made no difference to the smoke test.

Right, the armv4eb test bit-rotted in current releases but it worked in 1.1.1 so check that out and build a working big endian arm oabi toolchain to compare against... Right, got something that can run hello world under qemu-armeb. Now to look at its build logs.

This is gonna take a while.


Back to 2012