Reply to "Things You Should Never Do, Part I
By Rob Landley
So I'm reading through Joel Spolsky's old articles, currently on
"Things You Should Never Do, Part I", and thinking about it in the context of
Open Source. His thesis is that throwing out your old codebase and starting
over is always a bad idea because A) years of debugging and adapting to strange
real-world cases are buried in the old code, and recorded nowhere else,
B) your corporation can't afford any delay if it's to stay ahead of competitors.
His main example is Netscape's painful journey from 4.0 to 6.0.
This is mostly good advice, and the reason the Linux kernel developers
tend to evolve old code into new code when they rewrite it via the
"trail of breadcrumbs" approach. Joel's also right that lots of knowledge
tends to get baked into code, and attempts to clean it up have to be careful
to salvage that old knowledge. But "never do" doesn't quite hold true for open
source for a number of reasons.
First of all, many other things doomed Netscape, the largest being their
decision to stop giving their browser away for free, and instead start
charging for it as a product. By the time Netscape reversed this disastrous
decision, IE had a significant foothold (about 30% market share) and leveraged
its Windows bundling the rest of the way.
Secondly, open source works significantly differently than the proprietary
environment Joel's describing. He warns that centralized organizations
managing finite resources shouldn't take a course of action that comes
naturally to open source development.
Open source and proprietary development are very different things.
Fundamentally, proprietary development is centralized and open source is
decentralized. Proprietary development is about work assigned to the
developers who do it. Open source is a bunch of submissions into an editorial
slush pile. Proprietary development is a manufacturing model, open source
is a publishing model. Proprietary development is aimed at capturing the most
value from each line of code written, and open source is designed around
discarding most of the slush pile to fight off Sturgeon's Law. The basic
assumptions are very different.
A few specifics:
- Back in 2000 when this article was written, Spolsky had apparently never
encountered a real source control system. (Not surprising, considering his
history at Microsoft.) The "annotate" command is your friend, and if the
checkin comment doesn't say why a change you're thinking of removing was made,
the developer in question needs to be pummeled. In open source multiple
developers have to communicate primarily through channels that leave a record
which can be archived: mailing lists, source control, and loggable IRC.
It should be possible to go back and see _why_ the code got so ugly.
If it isn't, digging out and documenting the strange corner cases is
a valuable contribution in and of itself.
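For instance, here's roughly what that archaeology looks like with a modern
tool like git (just a sketch; the file name and line number are made up):

    /* Sketch: ask "annotate" (spelled "blame" in git) which commit last
     * touched a given line; that commit's checkin comment should say why
     * the change was made.  The path and line number are hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        char buf[1024];
        FILE *p = popen("git blame -L 42,42 drivers/block/floppy.c", "r");
        if (!p)
            return 1;
        while (fgets(buf, sizeof(buf), p))
            fputs(buf, stdout);   /* the first field is the commit id */
        pclose(p);
        /* then read the "why" with: git log -1 <that commit id> */
        return 0;
    }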
- In open source the original developers tend to hang around a long
time. Proprietary software developers leave projects when they switch
companies. They _can't_ work on the old codebase anymore, and usually become
permanently unavailable for future consultation. But in the open source world,
Linus is still around 17 years later (as are dozens of others from as early as
1992). When you find something strange in their code, if you can't find
an explanation in the public discussion archives, you can ask them.
- Yes, code is harder to read than to write. Spolsky mentions refactoring,
making code easier to read without changing what it does. Open source
development spends lots of time doing this, because people reading and
understanding the code is an important part of an open source project
remaining viable and well-maintained. But part of refactoring is making what
the code is doing obvious. If developers can't write a new one that works just
as well as the old one, then they don't fully understand how the old one works,
which is a bad thing. Writing your own version purely as a learning exercise
can be useful, and the result is indeed sometimes better than what you started
with.
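To make that concrete, here's a trivial, made-up illustration: two functions
that return the same answer for every input, so a refactor can swap one for
the other without changing behavior, and the rewrite doubles as a check that
you really understood the original.

    #include <stdio.h>

    /* Before: works, but the trick is easy to break while "cleaning up". */
    static int f(unsigned x)
    {
        int n = 0;
        while (x) {
            x &= x - 1;   /* clears the lowest set bit */
            n++;
        }
        return n;
    }

    /* After: same result, but the name and structure say what it's for. */
    static int count_set_bits(unsigned x)
    {
        int count = 0;
        for (unsigned bit = 1; bit; bit <<= 1)
            if (x & bit)
                count++;
        return count;
    }

    int main(void)
    {
        /* Test the new code against the old, side by side. */
        for (unsigned x = 0; x < 1000; x++) {
            if (f(x) != count_set_bits(x)) {
                printf("mismatch at %u\n", x);
                return 1;
            }
        }
        printf("new version matches the old one\n");
        return 0;
    }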
- Open source development is designed to get sample code into the hands
of a huge number of people, and rapidly incorporate feedback. Even Microsoft
hasn't got 100,000 different machines to regression test each release of their
software on, but that's normal for the release of a large open source
project. The only meaningful test for a piece of software is real-world
usage.
- The old software doesn't go away, and remains an option during the
transitional period. If Apache 1.x or Linux 2.4 remains popular, the new
projects aren't significantly inconvenienced. The old stuff can even be
handed off to a new maintainer for maintenance releases while the
development team focuses on a new codebase. If the new fork goes down
a seriously blind alley (as gcc development did after 2.7), others can
pick up the torch from the older version that worked for them. Forking
is not always bad, and multiple competing implementations are fairly normal
in the open source world.
- Some old code is unmaintainable crap which stagnates because
nobody can make changes to it anymore. OpenOffice is struggling with this,
and XMMS was abandoned to it. Microsoft's own Internet Explorer
stagnated for years (not even bothering to add tabbed browsing). Meanwhile
Mozilla forked off Galeon (throwing away most of the old codebase), then
Galeon forked off Firefox (throwing away most of the rest). One more such
fork and the Mozilla codebase may finally yield a decent browser. There are
costs to backing up and starting over, but Firefox is the most significant
competitive threat IE has seen in years. (Me, I use Konqueror.)
- Many of Joel's examples of old code nobody understands anymore are
things that stop being useful. "This makes it work with floppy drives", and
"this works around a bug in Windows 95", are indeed the kind of things you
eventually want to clean out of the codebase when they cease being relevant.
Complexity is a cost, and it's wasted unless it buys you something.
Every large system has a "complexity budget", and reducing complexity in one
place lets you spend it elsewhere. Also, old features that aren't tracked
and regression tested tend to bit-rot; they'll break eventually anyway if
you don't know why they're there.
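A made-up example of the kind of special case that only earns its keep while
the condition it handles still exists (the names here are invented, not from
any real driver):

    #include <stdio.h>

    #define BLOCK_SIZE 4096

    /* Hypothetical special case: suppose old floppy hardware needs
     * 512-byte transfers, so larger requests get split up here.  The day
     * floppy support is dropped, this branch is complexity that buys
     * nothing, and since nothing regression tests it anymore it has
     * probably already bit-rotted. */
    static unsigned transfer_chunk_size(int is_floppy)
    {
        if (is_floppy)
            return 512;        /* workaround: remove along with the feature */
        return BLOCK_SIZE;
    }

    int main(void)
    {
        printf("disk: %u bytes, floppy: %u bytes\n",
               transfer_chunk_size(0), transfer_chunk_size(1));
        return 0;
    }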
- The reason for "write one to throw away" is that a developer doesn't fully
understand a problem until they've solved it. In the proprietary world,
developers seldom have time to rewrite their own code the way it _should_
have been once they've gotten it working, but in the open source world this is
common practice. And there's more to it than just refactoring: there's
changing the basic approach or the overall organization.
- The modular nature of Linux means you can swap out a component for an
equivalent component, and pick and choose the implementations you like.
The guy writing Dropbear doesn't need to interact with the OpenSSH developers,
despite the new project being a drop-in replacement for the old. Similarly,
the Linux developers swapped out the old scheduler implementation for a
new O(1) implementation during the 2.4 series, and regularly reimplement
everything from the system call implementation (sysenter) to the block
layer. They carefully test the new and old code side by side, and
the code undergoes extensive review by the people who wrote and maintained
the old code. But old code is by no means sacred.
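Here's a minimal sketch of the pattern that makes this swapping possible (the
names are invented, loosely in the style of the kernel's "ops" structures):
callers only see a table of function pointers, so an old and a new
implementation can be tested side by side and exchanged without touching the
rest of the code.

    #include <stdio.h>

    struct sched_ops {
        const char *name;
        int (*pick_next)(int nr_tasks);   /* which task runs next */
    };

    /* Old implementation: always runs the highest-numbered task. */
    static int old_pick_next(int nr_tasks)
    {
        return nr_tasks ? nr_tasks - 1 : -1;
    }

    /* New implementation: same interface, different policy inside. */
    static int new_pick_next(int nr_tasks)
    {
        return nr_tasks ? 0 : -1;
    }

    static const struct sched_ops old_sched = { "old", old_pick_next };
    static const struct sched_ops new_sched = { "new", new_pick_next };

    int main(void)
    {
        /* The rest of the system holds a pointer to the ops table and
         * never cares which implementation is behind it. */
        const struct sched_ops *sched = &old_sched;
        printf("%s scheduler picks task %d\n", sched->name, sched->pick_next(3));

        sched = &new_sched;   /* swap the component; callers don't change */
        printf("%s scheduler picks task %d\n", sched->name, sched->pick_next(3));
        return 0;
    }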
- Reimplementing an existing project can indeed result in smaller,
cleaner code. This often occurs in the embedded world with projects such as
BusyBox, uClibc, the aforementioned Dropbear... It's much easier to add
features to a small and simple project than to strip down a large and complex
project to fit in a small space. Clayton Christensen's bestselling book
The Innovator's Dilemma detailed how technologies can increase in capability
faster than the needs of the market, leaving themselves vulnerable to cheap
"disruptive technologies" that start at the low end and attack upwards.
The fact that reimplementing has costs doesn't mean it has no advantages.