Reply to "Things You Should Never Do, Part I
By Rob Landley
So I'm reading through Joel Spolsky's old articles, currently on
"Things You Should Never Do, Part I", and thinking about it in the context of
Open Source. His thesis is that throwing out your old codebase and starting
over is always a bad idea because A) years of debugging and adapting to strange
real-world cases are buried in the old code, and recorded nowhere else,
B) your corporation can't afford any delay if it's to stay ahead of competitors.
His main example is Netscape's painful journey from 4.0 to 6.0.
This is mostly good advice, and the reason the Linux kernel developers
tend to evolve old code into new code when they rewrite it via the
"trail of breadcrumbs" approach. Joel's also right that lots of knowledge
tends to get baked into code, and attempts to clean it up have to be careful
to salvage that old knowledge. But "never do" doesn't quite hold true for open
source for a number of reasons.
First of all, many other things doomed Netscape, the largest being their
decision to stop giving their browser away for free, and instead start
charging for it as a product. By the time Netscape reversed this disastrous
decision, IE had a significant foothold (about 30% market share) and leveraged
its Windows bundling the rest of the way.
Secondly, open source works significantly differently than the proprietary
environment Joel's describing. He warns that centralized organizations
managing finite resources shouldn't take a course of action that comes
naturally to open source development.
Open source and proprietary development are very different things.
Fundamentally, proprietary development is centralized and open source is
decentralized. Proprietary development is about work assigned to the
developers who do it. Open source is a bunch of submissions into an editorial
slush pile. Proprietary development is a manufacturing model, open source
is a publishing model. Proprietary development is aimed at capturing the most
value from each line of code written, and open source is designed around
discarding most of the slush pile to fight off Sturgeon's Law. The basic
assumptions are very different.
A few specifics:
- Back in 2000 when this article was written, Spolsky had apparently never
encountered a real source control system. (Not surprising, considering his
history at Microsoft.) The "annotate" command is your friend, and if the
checkin comment doesn't say why a change you're thinking of removing was made,
the developer in question needs to be pummeled. In open source multiple
developers have to communicate primarily through channels that leave a record
which can be archived: mailing lists, source control, and loggable IRC.
It should be possible to go back and see _why_ the code got so ugly.
If it isn't, digging out and documenting the strange corner cases is
a valuable contribution in and of itself.
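For instance, here's roughly what that archaeology looks like with a modern
tool like git (just a sketch; the file name and line number are made up):

    /* Sketch: ask "annotate" (spelled "blame" in git) which commit last
     * touched a given line; that commit's checkin comment should say why
     * the change was made.  The path and line number are hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        char buf[1024];
        FILE *p = popen("git blame -L 42,42 drivers/block/floppy.c", "r");
        if (!p)
            return 1;
        while (fgets(buf, sizeof(buf), p))
            fputs(buf, stdout);   /* the first field is the commit id */
        pclose(p);
        /* then read the "why" with: git log -1 <that commit id> */
        return 0;
    }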
- In open source the original developers tend to hang around a long
time. Proprietary software developers leave projects when they switch
companies. They _can't_ work on the old codebase anymore, and usually become
permanently unavailable for future consultation. But in the open source world,
Linus is still around 17 years later (as are dozens of others from as early as
1992). When you find something strange in their code, if you can't find
an explanation in the public discussion archives, you can ask them.
- Yes, code is harder to read than to write. Spolsky mentions refactoring,
making code easier to read without changing what it does. Open source
development spends lots of time doing this, because people reading and
understanding the code is an important part of an open source project
remaining viable and well-maintained. But part of refactoring is making what
the code is doing obvious. If developers can't write a new one that works just
as well as the old one, then they don't fully understand how the old one works,
which is a bad thing. Writing your own version purely as a learning exercise
can be useful, and the result is indeed sometimes better than what you started
with.
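To make that concrete, here's a trivial, made-up illustration: two functions
that return the same answer for every input, so a refactor can swap one for
the other without changing behavior, and the rewrite doubles as a check that
you really understood the original.

    #include <stdio.h>

    /* Before: works, but the trick is easy to break while "cleaning up". */
    static int f(unsigned x)
    {
        int n = 0;
        while (x) {
            x &= x - 1;   /* clears the lowest set bit */
            n++;
        }
        return n;
    }

    /* After: same result, but the name and structure say what it's for. */
    static int count_set_bits(unsigned x)
    {
        int count = 0;
        for (unsigned bit = 1; bit; bit <<= 1)
            if (x & bit)
                count++;
        return count;
    }

    int main(void)
    {
        /* Test the new code against the old, side by side. */
        for (unsigned x = 0; x < 1000; x++) {
            if (f(x) != count_set_bits(x)) {
                printf("mismatch at %u\n", x);
                return 1;
            }
        }
        printf("new version matches the old one\n");
        return 0;
    }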
- Open source development is designed to get sample code into the hands
of a huge number of people, and rapidly incorporate feedback. Even Microsoft
hasn't got 100,000 different machines to regression test each release of their
software on, but that's normal for the release of a large open source
project. The only meaningful test for a piece of software is real-world
usage.
- The old software doesn't go away, and remains an option during the
transitional period. If Apache 1.x or Linux 2.4 remains popular, the new
projects aren't significantly inconvenienced. The old stuff can even be
handed off to a new maintainer for maintenance releases while the
development team focuses on a new codebase. If the new fork goes down
a seriously blind alley (as gcc development did after 2.7), others can
pick up the torch from the older version that worked for them. Forking
is not always bad, and multiple competing implementations are fairly normal
in the open source world.
- Some old code is unmaintainable crap which stagnates because
nobody can make changes to it anymore. OpenOffice is struggling with this,
and XMMS was abandoned to it. Microsoft's own Internet Explorer
stagnated for years (not even bothering to add tabbed browsing). Meanwhile
Mozilla forked off Galeon (throwing away most of the old codebase), then
Galeon forked off Firefox (throwing away most of the rest). One more such
fork and the Mozilla codebase may finally yield a decent browser. There are
costs to backing up and starting over, but Firefox is the most significant
competitive threat IE has seen in years. (Me, I use Konqueror.)
- Many of Joel's examples of old code nobody understands anymore are
things that stop being useful. "This makes it work with floppy drives", and
"this works around a bug in Windows 95", are indeed the kind of things you
eventually want to clean out of the codebase when they cease being relevant.
Complexity is a cost, and it's wasted unless it buys you something.
Every large system has a "complexity budget", and reducing complexity in one
place lets you spend it elsewhere. Also, old features that aren't tracked
and regression tested tend to bit-rot; they'll break eventually anyway if
you don't know why they're there.
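A made-up example of the kind of special case that only earns its keep while
the condition it handles still exists (the names here are invented, not from
any real driver):

    #include <stdio.h>

    #define BLOCK_SIZE 4096

    /* Hypothetical special case: suppose old floppy hardware needs
     * 512-byte transfers, so larger requests get split up here.  The day
     * floppy support is dropped, this branch is complexity that buys
     * nothing, and since nothing regression tests it anymore it has
     * probably already bit-rotted. */
    static unsigned transfer_chunk_size(int is_floppy)
    {
        if (is_floppy)
            return 512;        /* workaround: remove along with the feature */
        return BLOCK_SIZE;
    }

    int main(void)
    {
        printf("disk: %u bytes, floppy: %u bytes\n",
               transfer_chunk_size(0), transfer_chunk_size(1));
        return 0;
    }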
- The reason for "write one to throw away" is that a developer doesn't fully
understand a problem until they've solved it. In the proprietary world,
developers seldom have time to rewrite their own code the way it _should_
have been once they've gotten it working, but in the open source world this is
common practice. And there's more to it than just refactoring: there's
changing the basic approach or the overall organization.
- The modular nature of Linux means you can swap out a component for an
equivalent component, and pick and choose the implementations you like.
The guy writing Dropbear doesn't need to interact with the OpenSSH developers,
despite the new project being a drop-in replacement for the old. Similarly,
the Linux developers swapped out the old scheduler implementation for a
new O(1) implementation during the 2.4 series, and regularly reimplement
everything from the system call implementation (sysenter) to the block
layer. They carefully test the new and old code side by side, and
the code undergoes extensive review by the people who wrote and maintained
the old code. But old code is by no means sacred.
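Here's a minimal sketch of the pattern that makes this swapping possible (the
names are invented, loosely in the style of the kernel's "ops" structures):
callers only see a table of function pointers, so an old and a new
implementation can be tested side by side and exchanged without touching the
rest of the code.

    #include <stdio.h>

    struct sched_ops {
        const char *name;
        int (*pick_next)(int nr_tasks);   /* which task runs next */
    };

    /* Old implementation: always runs the highest-numbered task. */
    static int old_pick_next(int nr_tasks)
    {
        return nr_tasks ? nr_tasks - 1 : -1;
    }

    /* New implementation: same interface, different policy inside. */
    static int new_pick_next(int nr_tasks)
    {
        return nr_tasks ? 0 : -1;
    }

    static const struct sched_ops old_sched = { "old", old_pick_next };
    static const struct sched_ops new_sched = { "new", new_pick_next };

    int main(void)
    {
        /* The rest of the system holds a pointer to the ops table and
         * never cares which implementation is behind it. */
        const struct sched_ops *sched = &old_sched;
        printf("%s scheduler picks task %d\n", sched->name, sched->pick_next(3));

        sched = &new_sched;   /* swap the component; callers don't change */
        printf("%s scheduler picks task %d\n", sched->name, sched->pick_next(3));
        return 0;
    }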
- Reimplementing an existing project can indeed result in smaller,
cleaner code. This often occurs in the embedded world with projects such as
BusyBox, uClibc, the aforementioned Dropbear... It's much easier to add
features to a small and simple project than to strip down a large and complex
project to fit in a small space. Clayton Christensen's bestselling book
The Innovator's Dilemma detailed how technologies can increase in capability
faster than the needs of the market, leaving themselves vulnerable to cheap
"disruptive technologies" that start at the low end and attack upwards.
The fact that reimplementing has costs doesn't mean it has no advantages.