Guest Essay

Why Linux isn't on the desktop yet

By Rob Landley

The answer to the title of this article is a single sentence, but you'll have to read the whole article to understand it. The Linux community has an amazing blind spot, and I'd like to rant about it a bit.

I keep bumping into programmers who think some program or other is needed to change the world. They're wrong. "Linux just needs this one program and then we'll be ready!" they cry. I generally want to slap these people until they snap out of it (which is kind of hard to do through an internet connection). They are making a fundamentally wrong assumption. It's not about programs. It's about data.

Let me repeat that. Data formats are important. Programs are not.

Think about email for a second. All sorts of different programs can send and receive email: Pine, Elm, KMail, Outlook, Lotus Notes... There are hundreds of them. I don't care what program you use to send email to me; it might have been some CGI script on a web page somewhere (Hotmail, if you can stomach it), or the built-in email program in Netscape. You shouldn't care what I'm using to read your email, either. As long as your email program is sending mine ASCII text through an SMTP server, I can download mail from a POP or IMAP server, and we can both understand MIME format file attachments, we're pretty much covered. (Some people believe in throwing in HTML support, which is a bit like using colored paper and glitter ink to improve your resume, but it takes all kinds.)

All these email data formats are well documented, and writing a new email program that implements the standards is a learning exercise for high school students. Any experienced geek could bang one out over a three day weekend. A real hacker could do it in a matter of hours.
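To make the point concrete, here is a minimal sketch of one side of that exchange, using nothing but Python's standard email package. The function names (build_message, read_message) and the example.org addresses are made up for illustration; nothing here is any particular mail program's code. The writer and the reader share no code beyond the library and know nothing about each other. All they agree on is the documented MIME format, and that is enough.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message(sender, recipient, subject, body, attachment_name, attachment_bytes):
    """Play the role of 'your' mail program: emit a standard MIME message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)  # becomes the text/plain part
    msg.add_attachment(
        attachment_bytes,
        maintype="application",
        subtype="octet-stream",
        filename=attachment_name,
    )
    return msg.as_bytes()  # the bytes that would travel over SMTP


def read_message(raw):
    """Play the role of 'my' mail program: understand any compliant message."""
    msg = message_from_bytes(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    attachments = {
        part.get_filename(): part.get_content() for part in msg.iter_attachments()
    }
    return msg["Subject"], body, attachments
```

Swap either function for a completely different program written in a different language and the other one never notices, so long as the bytes in between follow the standard.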

Now think about web browsers. Once upon a time, the only web browser worth mentioning was Netscape. (It wasn't the first, but it took over and became the standard.) And Netscape added all sorts of fun little proprietary features, like the "blink" tag, that only it could support. Then Microsoft licensed Spyglass's browser, called it "Internet Explorer", reverse engineered Netscape's new features, and added a bunch of deliberate new incompatibilities (to which Netscape responded in kind), to the point where web servers sent different pages to different browsers until each side could reverse engineer the other's features. Yet the people actually WRITING all the web pages wanted a single common standard (because unnecessary work is no fun), and mostly used only the features that were supported by both browsers anyway. That was about four years ago.

These days you can use Netscape, Explorer, Konqueror, Galeon, and Mozilla. In text mode there's Lynx (and Links), and for the undemanding oddball there's Grail or HotJava. The web server your browser talks to might be Apache, might be IIS, might be Tux, might be iPlanet, might be Zeus, or it might be none of those. You really shouldn't have to care.

The web is still maturing, but as long as your web browser can download from an HTTP server and render HTML 3.0 documents, it can do basic web browsing. Modern use of the web requires cascading style sheets and JavaScript. Things like online banking and stock trading require encryption (HTTPS) and the ability to handle HTML forms. (A few misguided sites won't work without a Flash plugin, and a couple that really haven't been paying attention might still try to use Java, but those can usually be safely ignored.)

The web isn't as mature as email, but it's getting there. Any web page that tries to determine whether it's running on Netscape or Explorer is rapidly becoming obsolete. AOL just switched millions of CompuServe users to Gecko, as a pilot program before switching over ten million more in the future. A diversity of clients keeps the data publishers standards compliant, and relegates implementation idiosyncrasies to the category of "bug" where they belong. As long as your browser's recent enough to understand all the data formats it's expected to send and receive, it really shouldn't matter what actual program you're using.

ALL computing is like this. You start with a unique program like Napster, and a few years later there are a hundred compatible implementations. The original implementation falls away, and what you're left with is a specification for a set of formats in which data can be stored and transported. This is how any computing niche matures. CompuServe's original GIF viewer is long forgotten, PKZIP gave way to WinZip and Info-ZIP and the generic "deflate" algorithm, SSH was cloned by OpenSSH, IBM's original mainframe databases begat SQL and a dozen databases that speak it... Even long-lived programs like BIND and Sendmail are far less important than the standards they implement. Code gives way to fresh implementations, on new platforms, in new languages. Dennis Ritchie's original C compiler may be a museum piece, but the C language is the dominant programming language in the world today: the progenitor of C++, Java, and JavaScript, and the implementation language for the Python and Perl interpreters, Linux, and even Windows. (Hardware is no different: an AMD Athlon can run software written for a 20 year old Intel 8086.)

To replace a format with a better one, you first support the old format (the way Explorer supported Netscape's extensions to HTML, or the way OpenSSH supported SSH version 1), and then introduce the new one as an option once enough people are using your new program to handle the old format. If you want to be successful, you encourage as many other programs as possible to use your new format as well, which is why open source is so successful at this sort of thing. Dropping out old cruft is hard, which is why Windows XP can still run 20 year old DOS programs. (In a nicely modular system you can put the old compatibility mode in an optional module, but you still need to have it available or nobody will take the new stuff seriously. Or you can wait for the old one to diminish to a niche market: we're just now finally getting rid of the ISA bus in PC hardware after almost ten years of ISA and PCI coexisting in your computer.)
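That migration strategy is simple enough to sketch in a few lines of Python. Everything here is hypothetical: the "legacy" format (plain "key: value" lines), the "new" format (JSON), and the function names are invented for illustration and do not describe any real word processor. The shape is what matters: the reader accepts both formats, and the writer keeps emitting the old one unless asked for the new one.

```python
import json


def load_document(raw):
    """Read either format: the new JSON one, or the legacy 'key: value' lines."""
    if raw.lstrip().startswith("{"):
        return json.loads(raw)  # new format
    doc = {}
    for line in raw.splitlines():  # legacy format
        if ":" in line:
            key, value = line.split(":", 1)
            doc[key.strip()] = value.strip()
    return doc


def save_document(doc, legacy=True):
    """Write the old format by default; the new format is strictly opt-in."""
    if legacy:
        return "\n".join(f"{key}: {value}" for key, value in doc.items()) + "\n"
    return json.dumps(doc, sort_keys=True)
```

Only once most of the programs out there can read the new format does it make sense to flip that default, and even then the legacy reader has to stay around.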

If you accept that computing is basically just shoveling data around, then you can express your computing needs in terms of what protocols you need support for. Linux is successful on the server because most network protocols are rigidly standardized, and we have multiple compliant implementations for all of those. Programmers also like Linux because source code is ASCII text (which a hundred different programs can read, edit, and write), and the languages themselves are standardized so compliant compilers and interpreters aren't hard to find or write.

The desktop is no different. Interface issues are helpful, but they're just frosting. Millions of people put up with DOS and Windows 3.1 for years. Eye candy is NOT the problem. Data formats are. Most "features" are just the ability to manipulate your data: without support for the data format in question it's a moot point, and WITH that support the feature is generally pretty easy for some programmer to add.

If you're going to use Linux on the desktop, you may need email (SMTP), web surfing (HTML), and spreadsheets (.xls files), all of which Linux can now do. Transferring files with FTP or Samba we handle just fine, and a graphical filesystem browser is just an interface to a hierarchical filesystem (something Unix actually invented 30 years ago. Yes, we have subdirectories. Go us!)

Now comes the one sentence I told you about, explaining why Linux is nowhere to be found on your average end-user non-geek desktop: The worldwide established standard data format for exchanging word processing documents is Microsoft Word files, and no Linux distribution I know of comes with an open source program that can handle them.

Until Linux can read, edit, and write *.DOC files, it can't interact with the millions of word processing documents sent around the world each day. Until Linux can handle this data format, it effectively can't word process.

The existence of Linux word processors like KWord and AbiWord which can't read and write Word files (although they're working on it) is like having a web browser that can't handle HTML. It makes us look actively silly. It simply doesn't matter what other data formats these word processors CAN handle if all the documents they'll be asked to read and send are in a specific format that they just can't do.

It doesn't matter how technically pathetic the Word file format is: Linux can do Windows file sharing with Samba, and that protocol is technically extremely pathetic (ask any member of the Samba team), but it's here and it works and as such is not a roadblock to Linux.

Linux has had closed-source Word support for a while through StarOffice (which is imperfect, but enough to limp by), but it wasn't bundled with Linux distributions and wasn't an open source program people tended to install and use. (We've all had too many closed-source programs fold up and go away after we became dependent on them to want to become dependent on another. A closed source program cannot be truly viewed as part of Linux.)

Now with OpenOffice, Linux is finally starting to get Word support, and this is literally the last roadblock to Linux on the desktop. (If you can think of another, tell me.) Word files are the last major data format Linux doesn't properly support. (And it is a nasty one, which is much more difficult to write properly than to read, which is the real reason newer versions of Word can't write files readable by older versions.)

There are other data formats Linux supports, or that it would be nice to support. Various niche markets have their own data format needs. We have multimedia software that can read and write GIFs, JPEGs, MPEGs, and MP3 files. Linux is actually doing okay in video editing and compositing (thanks in part to SGI). Print publishing uses Photoshop files (and Quark files), which we mostly don't handle, but companies like Adobe don't seem to mind porting stuff to Linux if there's money in it. And this isn't keeping us out of the mainstream because it isn't in the mainstream: people have been doing this on Macintoshes for years.

A desktop user may need presentation software, so it would be nice if we could read and write PowerPoint slides, but since most people write their own presentations and give them in person the format isn't as important there. (The lack of PowerPoint is a nuisance, but not a killer, because PowerPoint isn't used much as a data storage and transport format. It's mostly just a way of storing temporary files until you can do your slideshow. PDF, PostScript, and GIF/JPEG in HTML pages are fairly popular alternatives anyway, all of which Linux has good support for.)

But the lack of Word support is what's killing us on the desktop.

Many programmers have been so disgusted with the Word file format (which involves dumping Word's run-time data structures to disk, which is just evil) that they're determined to replace it with something better, rather than support it. I've spoken to many programmers who are trying to write such a wonderful word processor that everybody will use it, so their new format replaces Word files without their program ever having to support Word files.

Final exam. Spot the deluded fantasy:

A) "I don't need to support HTML, I'll write a BETTER web browser using my own incompatible data format, and everybody will switch because it's technically just so much better."

B) "I don't need to support SMTP, I'll write a BETTER email program using my own data format, and everybody will switch because it's technically just so much better."

C) "I don't need to support Microsoft Word files, I'll write a BETTER word processor using my own data format, and everybody will switch because it's just so much better."

The correct answer? All of the above.
