MIRRORDIR - A PROGRAM FOR SECURE ENCRYPTION AND MIRRORING
---------------------------------------------------------

HISTORY
-------

Mirrordir was originally devised in February 1998 as a way to build
redundant systems - that is, high availability systems. It was thought
that a program that would periodically mirror a file system to a
separate physical device would allow for a system to be easily restored
in the event of a drive failure. Although scripts could be written to
duplicate a directory hierarchy, no dedicated package existed that would
perform this operation optimally (minimal set of changes), and ensure
meticulous mirroring of all types of files.

Within a week a working alpha version was available, which later grew
with a rich set of modes and features. A month later a port of the
Virtual File System layer of the Midnight Commander file manager was
made and incorporated into Mirrordir. The VFS layer proved extremely
useful for Obsidian Systems' (the author's employer) business, and
allowed many of the programming hours to be billed.

In October 1998, a secure socket layer was added in. This later evolved
into the `libdiffie' library - a generic library for adding secure
socket encryption to a network utility. The library transformed the VFS
`mc://' connections into secure connections.

At this point Mirrordir lacked the important feature of programmability.
Users should be able to customize the files to include or exclude, using
a programming language. This is especially important when doing large
complex FTP mirrors. A trivial C interpreter was designed to fulfill
this need.

The following month `pslogin' was added based on rsh, using the secure
layer. Support for compressed (as well as encrypted) sockets was added
using the zlib library in January 1999. In February the `forward'
program was added to forward arbitrary TCP ports over the secure socket
layer.

Recently a WIN32 port of Mirrordir was made, supporting all the file
transfer and encryption features. Pslogin however is not yet ported.

The opportunity for Mirrordir to be used explicitly for a high
availability system came about in April 1999. It has been mirroring
between two servers every six hours for the past few months without
problems.

Mirrordir is part of the Debian distribution.


OVERVIEW
--------

(The terminology `control' and `mirror' is used to indicate the existing
and backup directories respectively.)

Although transfer of ordinary files is a simple matter, an algorithm
that duplicates directories in every detail becomes more complicated,
compounded with verification of the age of the file (possibly across a
timezone) to decide if it has been modified compared to the control.
Consider for instance the problem of duplicating the access and modified
times of a directory: the program must reset its access and modified
time, but only AFTER the directory has been read. Further, the control
access times should also be duplicated by resetting them after exiting
each directory - the idea being that mirroring is done with the minimum
possible intrusion.

The problem of changing file stat information on the mirror side also
needs to be tackled efficiently. For each directory listing, all files
on the mirror side that do not exist in the control must be removed. If
a directory exists it must be left untouched, while files must only be
removed if their type (i.e. ordinary versus device file) differs between
the control and mirror trees.

Finally, hardlinks need to be correctly located and duplicated. A
hardlink table is set up internally for each multiply referenced inode,
so as to allow proper tracking of hardlinks - even hardlinks to device
files and symlinks.
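
A hardlink table of this kind might look like the following sketch. The
structure and function names are hypothetical, and a linked list keyed
on device and inode number is an assumption made for brevity, not a
description of Mirrordir's internal table.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* One entry per multiply referenced inode seen on the control side. */
struct link_entry {
    dev_t dev;
    ino_t ino;
    char *first_path;           /* mirror path of the first copy made */
    struct link_entry *next;
};

/* Return the mirror path already created for (dev, ino), or NULL after
   recording `path' as the first copy.  NULL is also returned if memory
   runs out, in which case the caller would just copy the file. */
const char *link_lookup_or_add (struct link_entry **table, dev_t dev,
                                ino_t ino, const char *path)
{
    struct link_entry *e;

    for (e = *table; e; e = e->next)
        if (e->dev == dev && e->ino == ino)
            return e->first_path;       /* caller should link() to this */
    e = malloc (sizeof (*e));
    if (!e)
        return NULL;
    e->first_path = malloc (strlen (path) + 1);
    if (!e->first_path) {
        free (e);
        return NULL;
    }
    strcpy (e->first_path, path);
    e->dev = dev;
    e->ino = ino;
    e->next = *table;
    *table = e;
    return NULL;
}
```

The caller would consult the table only for entries whose st_nlink is
greater than one, and call link() instead of copying when a path comes
back.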

Recursion works in an obvious way - for each directory entry in the
control, an entry in the mirror is located. If it exists and is a
directory, then it is recursed into. If it is a file, then appropriate
duplication is done - such as setting the permissions if they differ or
changing the ownership, or creating a device file. Only the data of
newer files is copied. Files remaining after the comparison that do not
exist on the control side are removed at the end of each listing.

Mirrordir also supports straight recursive copying of files/directories
as a more rigorous version of the cp program.

Mirrordir is written entirely in ANSI C.


VFS LAYER
---------

The VFS (Virtual File System) layer essentially just supports
`mc://user@host:port/' and `ftp://user@host:port/' type directory
duplication/transfer in addition to local ones.

The handling of FTP connections makes Mirrordir a mirroring tool on par
with Mirror (a dedicated FTP mirroring tool), though the VFS layer has
some peculiarities that other mirroring programs might not. For
instance, file access is transparent from the programmer's point of view:
mc_open() works just like open() and so forth, allowing recursive
uploading as well as downloading. An FTP front end tends to list
directory contents raw; VFS however parses the FTP LS output into the
Unix stat structure, allowing transparent use of the (mc_)stat() system
call with efficient caching of stat data.

You can hence mirror any type of file (device, socket etc.) over ftp.

mc:// connections have more power, in that access and modified times can
be set (which is not possible with FTP). These connections require that
the secure-mcserv program be running on the host. mc:// connections are
inefficient because files are transferred in chunks that each require a
handshake using the VFS's simplistic command protocol. Network latency
hence governs the transfer speed. Downloading is more efficient because
of some caching mechanisms that are implemented.

In general, it is better to use downloading than uploading where
possible, and then to use FTP transfers for speed and mc:// transfers
for precision (in the latter case, such as mirroring a complete root
file system).


MIRRORDIR INTEGRITY CHECK
-------------------------

The `make check' build target runs a script which attempts to mirror
every possible combination of file system change between control and
mirror. This provides absolute certainty that Mirrordir is producing an
identical file system.


C INTERPRETER
-------------

The pseudo C interpreter is an extravagant solution to the problem of
adding some simple programmability. It is 50kB of code and is easily
portable to any application. It supports long integers, strings and huge
integers (arbitrary precision signed integers needed for cryptography).
It takes some huge liberties in its implementation, but is extremely
fast as it tries to have the lowest possible interpretive overhead. It
does not support functions.

The C interpreter API begins with adding custom C functions using

    int parser_add_operator (Operator * o);

`regexp()' is an example of a function Mirrordir adds so that users can
have the benefit of regular expression comparisons on file names. Plenty
of examples of custom functions are isolated in the file functions.c in
the source distribution.

The initialization function should then be called:

    void parser_init (void);

and then the particular code you want to use is compiled:

    void *parser_compile (char *text, Value * heap);

where `text' is the pseudo C code, and `heap' is an array of Value's -
space for any variables that `text' may declare - which will then be
accessible to the calling program in the order that they were declared
within `text'.

The pointer returned is the code compiled into an architecture dependent
byte code. The internal format is basically the Reverse Polish form of
the code, pushed onto a stack, and having direct pointers to functions
that perform each operation. This makes for extremely fast execution of
the byte code.
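
To show why this form is fast, here is a toy evaluator in the same
spirit. It is NOT the Mirrordir byte code: the cell layout, the
operation names and the fixed size stack are all assumptions made for
illustration. The point is only that the inner loop calls through a
pointer and decodes nothing.

```c
struct machine {
    long stack[64];             /* no overflow checking in this sketch */
    int top;
};

typedef void (*op_fn) (struct machine * m, long operand);

/* One cell of compiled code: a direct pointer to the routine that
   performs the operation, plus an operand used only by op_push. */
struct cell {
    op_fn fn;
    long operand;
};

void op_push (struct machine *m, long v)
{
    m->stack[m->top++] = v;
}

void op_add (struct machine *m, long v)
{
    (void) v;
    m->top--;
    m->stack[m->top - 1] += m->stack[m->top];
}

void op_mul (struct machine *m, long v)
{
    (void) v;
    m->top--;
    m->stack[m->top - 1] *= m->stack[m->top];
}

/* Run `n' cells of Reverse Polish code and return the top of the stack. */
long run (const struct cell *code, int n)
{
    struct machine m;
    int i;

    m.top = 0;
    for (i = 0; i < n; i++)
        code[i].fn (&m, code[i].operand);       /* no opcode decoding */
    return m.top > 0 ? m.stack[m.top - 1] : 0;
}
```

The expression (2 + 3) * 4 would be compiled to the five cells push 2,
push 3, add, push 4, mul.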

To execute the code, simply call,

    int parser_evaluate (void *s, void *user_data);

where `s' is the byte code returned from parser_compile, and `user_data'
is optional data that the programmer might want to pass to any functions
defined with parser_add_operator.

The result returned from parser_evaluate is just the value of the
`return' statement within `text', or zero if the interpreter fell out of
the bottom of the program.

Within the pseudo-C itself, all declared variables are initialized to
zero, but are not re-initialized between calls, hence variable values
carry over between successive calls to parser_evaluate().

Finally,

    void parser_free (void *s, Value * heap);

    void parser_shut (void);

may be called for cleaning up. The interpreter should exit with no
memory leaks.

The C interpreter performs two functions within Mirrordir: firstly it
allows programmability of the type of files you would like to include or
exclude when mirroring data. Secondly it interprets the cryptography
scripts to keep the cryptography code separate from the Mirrordir
distribution.

Implementing key exchange and signature algorithms with an interpreted
language has a further advantage: it is easy to see what the algorithm
actually does. It is easier to mentally verify the security of a fifty
line script than to wade through original C source code.


CRYPTOGRAPHY
------------

The secure socket layer provides wrappers for the connect(), accept()
and associated network functions. In theory, any socket program can just
be recompiled after including the diffie-socket.h and z-socket.h headers
and then become a `secure' version of that program. In practice this
will probably not work, and the secure-mcserv.c, forward.c and
mirrordir.c sources should be used as examples of exactly what to do.

The accept() and connect() calls of the socket layer each require
scripts to be present in /etc/ssocket/. If these are not present, then
they are automatically transferred by FTP from encrypt.obsidian.co.za -
where there are no relevant export regulations (there are obvious
security issues involved with this). The scripts generate and write
public and private keys to /etc/ssocket/public/ and
/etc/ssocket/private/ as needed. Keys are stored with one key per file
in raw binary form with the first two bytes dictating the length of the
key in bytes. (See huge_as_binary() called by huge_write()). This is a
more practical alternative to key database files, obviating the need for key
management utilities. On Linux, random numbers are generated by reading
the /dev/urandom device. On other systems the random number generation
is probably not secure, since it relies on an MD5 hash of the time,
process ID and modified and access times of some top level directories.
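
The key file layout described above could be read and written as in
the sketch below. The function names are hypothetical (the real
routines are huge_as_binary() and huge_write()), and big-endian order
for the two length bytes is an assumption: the text does not state the
byte order.

```c
#include <stdio.h>

/* Write `len' key bytes preceded by a two byte length.  Big-endian
   order for the length is an assumption of this sketch. */
int key_write (FILE * f, const unsigned char *key, unsigned int len)
{
    if (len > 0xFFFF)
        return -1;
    if (fputc ((int) ((len >> 8) & 0xFF), f) == EOF
        || fputc ((int) (len & 0xFF), f) == EOF)
        return -1;
    return fwrite (key, 1, len, f) == len ? 0 : -1;
}

/* Read a key into `key' (capacity `max' bytes).
   Returns the key length in bytes, or -1 on error. */
long key_read (FILE * f, unsigned char *key, unsigned int max)
{
    unsigned int len;
    int hi, lo;

    hi = fgetc (f);
    if (hi == EOF)
        return -1;
    lo = fgetc (f);
    if (lo == EOF)
        return -1;
    len = ((unsigned int) hi << 8) | (unsigned int) lo;
    if (len > max || fread (key, 1, len, f) != len)
        return -1;
    return (long) len;
}
```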

The algorithm is a classic Diffie-Hellman key exchange with signature
verification of the host, and symmetric key stream cipher. It is
detailed as follows:

First the client sends a magic number to the server to tell it that it
wants encryption, if this is not sent, then a plain text connection is
assumed. Then it calculates X = g^x mod p, where x is random, and sends
X. The server receives X, calculates Y = g^y mod p, where y is random,
and sends back a reply magic number and then Y. Now both calculate k =
g^(xy) mod p (client calculates k = Y^x mod p and server X^y mod p).
That concludes the classic Diffie-Hellman key exchange.
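
The exchange can be followed with toy numbers in plain C. This sketch
uses 64 bit integers instead of the huge number library, so it is only
good for moduli below 2^32; the function names are hypothetical.

```c
#include <stdint.h>

/* base^exp mod m by square and multiply.  m must be below 2^32 so
   that the intermediate products fit in 64 bits. */
uint64_t powmod (uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;

    base %= m;
    while (exp) {
        if (exp & 1)
            result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

/* X = g^x mod p for the client, Y = g^y mod p for the server */
uint64_t dh_public (uint64_t g, uint64_t secret, uint64_t p)
{
    return powmod (g, secret, p);
}

/* k = Y^x mod p on the client, k = X^y mod p on the server */
uint64_t dh_shared (uint64_t their_public, uint64_t my_secret, uint64_t p)
{
    return powmod (their_public, my_secret, p);
}
```

With p = 23, g = 5, x = 6 and y = 15, the client sends X = 8, the
server sends Y = 19, and both sides arrive at k = 2.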

Encryption is now turned on with the key k - that is, all further talk
is over a symmetric stream cipher (explained below). Actually, two
streams are set up, one for the server->client pipe and one for
client->server pipe, using the most and least significant halves of k
respectively. The actual script lines are:

    initarcrd (c, l / 2);

    initarcwr (c + l / 2, l / 2);

Each of these invokes a separate script (see below) to initialize the
stream cipher before continuing. The server then looks in /etc/ssocket
and gets its public, y, and private, x, signature keys. (If it can't
find them then it calculates y = g^x mod p, where x is random and is the
private key, and y is the public key, and stores x and y in the
appropriate files. This would happen if the program is being run for the
first time ever.) Then it sends y (the public key) to the client.

The client receives y and checks in the user's ~/.ssocket/ directory for
a file with the same name as the host's IP address, to verify the public
key against the file, and hence that the server is who it last claimed
to be. If the file does not exist then it creates it (i.e. a first time
connection to the server), and warns the user of the possibility of a
breach. The server then signs the encryption key k with private key x to
produce r = k g^(-v) mod p, s = v - r x mod q, where v is random, and q
= (p - 1) / 2 is also prime. r and s represent the signed k and are
sent to the client. The client receives r and s and calculates m = g^s
y^(r mod q) r mod p. m must equal k as verification of the authenticity
of the server. The signature exchange is there to exclude the
possibility of a man-in-the-middle attack.
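
A worked toy instance of the signature step follows. The sketch takes
r = k g^(-v) mod p, the form for which the recovery equation m = g^s
y^(r mod q) r mod p returns k, and assumes that g has order q. It uses
64 bit integers rather than the huge number library, and all function
names are hypothetical.

```c
#include <stdint.h>

/* base^exp mod m; m must be below 2^32 */
uint64_t powmod (uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;

    base %= m;
    while (exp) {
        if (exp & 1)
            result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

struct signature {
    uint64_t r, s;
};

/* Sign k with private key x and per-signature random v. */
struct signature pnew_sign (uint64_t k, uint64_t x, uint64_t v,
                            uint64_t g, uint64_t p, uint64_t q)
{
    struct signature sig;

    /* g has order q, so g^(-v) = g^(q - v) */
    sig.r = k % p * powmod (g, q - v % q, p) % p;
    /* s = v - (r mod q) x mod q, kept non-negative */
    sig.s = (v % q + q - (sig.r % q) * (x % q) % q) % q;
    return sig;
}

/* Recover m = g^s y^(r mod q) r mod p; the client compares m with k. */
uint64_t pnew_recover (struct signature sig, uint64_t y,
                       uint64_t g, uint64_t p, uint64_t q)
{
    uint64_t m = powmod (g, sig.s, p);

    m = m * powmod (y, sig.r % q, p) % p;
    return m * (sig.r % p) % p;
}
```

With p = 23, q = 11, g = 2, x = 7 (so y = 13), k = 9 and v = 5, the
server sends r = 1 and s = 9, and the client recovers m = 9.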

Normal TCP traffic then ensues with encryption still enabled. The first
traffic would probably verify the authenticity of the client, and since
all traffic is now encrypted, the client should now have no qualms about
sending his password in the `plain-text' such as with a normal telnet
login.

The requirement of an exportable program left only Diffie-Hellman as the
choice of key exchange, to avoid patent infringements. The signature
scheme is called the p-NEW signature scheme from Bruce Schneier's
`Applied Cryptography'. The requirement of a stream cipher that was
simple enough to be executed by an interpreted script and still operate
with reasonable speed, left only the ARC stream cipher which is
described in the same reference. The modified version of ARC used here
is a twelve bit version, implemented in the files
/etc/ssocket/arcinit.cs and /etc/ssocket/arcencrypt.cs

The stream cipher need not be executed by scripts if you are using an
international version of Mirrordir. In this case, the C routine is
compiled into the source, making this an extremely fast encrypted
connection. Unfortunately the speed of the compiled in cipher could not
be properly measured because it was negligible compared to the network
latency on the test computer - ARC is an order of magnitude faster than
plain DES, faster here because it is twelve bit. For the interpreted
script, the cipher managed 40kB per second (PII300) which is just
adequate for an X server connection, and more than adequate for any
terminal connection. Note that the `International' and `US' versions of
Mirrordir are completely inter-operable.

The key size for the stream cipher is obviously half the size of the
Diffie-Hellman key - which is enormous by any standards of symmetric key
length. The Diffie-Hellman key can be set on the command-line to any one
of 512, 768, 1024 or 1536 bits, and defaults to 512 bits.


HUGE NUMBER LIBRARY
-------------------

The arbitrary precision code was taken from the Python sources. At 25kB
this code is the most succinct library that could be found, and was
reworked to compile independently of the Python sources. An interesting
modification was to redefine Python's idea of a `digit' from 15 bits to
31 bits and make use of gcc's long long integer type of 64 bits. This
gives a 7/4 speed increase for powmod operations. An extract from
huge-number.h is,

    /* comparison */
    int huge_compare (Huge * a, Huge * b);
    int huge_nonzero (Huge * v);

    /* arithmetic */
    Huge *huge_add (Huge * a, Huge * b);
    Huge *huge_sub (Huge * a, Huge * b);
    Huge *huge_mul (Huge * a, Huge * b);
    Huge *huge_div (Huge * v, Huge * w);
    Huge *huge_mod (Huge * v, Huge * w);
    Huge *huge_divmod (Huge * v, Huge * w, Huge ** remainder /* may be null */ );
    Huge *huge_invert (Huge * v);

This is a useful piece of clean code for anyone who wants to add
arbitrary precision support to a program without resorting to linking
with the GNU mp library.
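
The 31 bit digit idea can be illustrated with a single-digit multiply.
This is a sketch written for this document, not an extract from
huge-number.c; the names are hypothetical and uint64_t stands in for
gcc's long long.

```c
#include <stdint.h>

#define DIGIT_BITS 31
#define DIGIT_MASK ((uint64_t) 0x7FFFFFFF)

/* Multiply the n digit number `a' (least significant digit first, each
   digit below 2^31) by the single digit `b', storing n + 1 digits in
   `out'. */
void digits_mul1 (const uint32_t *a, int n, uint32_t b, uint32_t *out)
{
    uint64_t carry = 0;
    int i;

    for (i = 0; i < n; i++) {
        /* (2^31 - 1)^2 plus a carry below 2^31 still fits in 64 bits */
        carry += (uint64_t) a[i] * b;
        out[i] = (uint32_t) (carry & DIGIT_MASK);
        carry >>= DIGIT_BITS;
    }
    out[n] = (uint32_t) carry;
}
```

Each pass through the loop handles 31 bits of the operand where a 15
bit digit would handle 15, which is where the saving comes from.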


12 BIT ARC
----------

Allegedly, this algorithm used 8*8 S boxes. Bruce Schneier comments that
there is no reason why this cannot be extended to 16*16 S boxes which
will require 16 x 2^16 bits of storage - 128kB. In between is the
inconvenient 12*12 arrangement requiring 2^12 words or 8kB. It's
inconvenient because the algorithm must cope with half bytes when
xor'ing the stream (it encrypts three nibbles at a time). At a minimum
key size of 256 bits, this part of the encryption package appears to
have no security issues.
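
A twelve bit variant might be sketched as below. This follows the
widely published ARC structure with every index reduced modulo 2^12;
it is NOT a transcription of arcinit.cs or arcencrypt.cs. In
particular, feeding the key in as 12 bit words and handing data over
as one 12 bit word per array element are assumptions of this sketch.

```c
#include <stdint.h>

#define ARC_BITS 12
#define ARC_SIZE (1u << ARC_BITS)       /* 4096 entries */
#define ARC_MASK (ARC_SIZE - 1)

struct arc12 {
    uint16_t s[ARC_SIZE];       /* 2^12 words of 16 bits: 8kB */
    unsigned int i, j;
};

/* Key schedule; keylen is in 12 bit words and must be at least 1. */
void arc12_init (struct arc12 *a, const uint16_t *key, unsigned int keylen)
{
    unsigned int i, j = 0;
    uint16_t t;

    for (i = 0; i < ARC_SIZE; i++)
        a->s[i] = (uint16_t) i;
    for (i = 0; i < ARC_SIZE; i++) {
        j = (j + a->s[i] + (key[i % keylen] & ARC_MASK)) & ARC_MASK;
        t = a->s[i];
        a->s[i] = a->s[j];
        a->s[j] = t;
    }
    a->i = a->j = 0;
}

/* Next 12 bits (three nibbles) of key stream. */
uint16_t arc12_next (struct arc12 *a)
{
    uint16_t t;

    a->i = (a->i + 1) & ARC_MASK;
    a->j = (a->j + a->s[a->i]) & ARC_MASK;
    t = a->s[a->i];
    a->s[a->i] = a->s[a->j];
    a->s[a->j] = t;
    return a->s[(a->s[a->i] + a->s[a->j]) & ARC_MASK];
}

/* XOR `n' 12 bit words in place.  Running the same call on a second
   state initialized with the same key undoes it. */
void arc12_crypt (struct arc12 *a, uint16_t *words, unsigned int n)
{
    unsigned int k;

    for (k = 0; k < n; k++)
        words[k] = (uint16_t) ((words[k] ^ arc12_next (a)) & ARC_MASK);
}
```

A real byte stream would additionally have to be packed into and out
of 12 bit words, which is the half byte handling mentioned above.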

It is disappointing that the uninformed will associate security with
catch terms like DES. It should be noted that a stream cipher like this
one is close to `one time pad' encryption. I.e. it is really the most
secure type of encryption possible, and is really the ideal choice for a
TCP stream.


SECURITY
--------

At the moment, Mirrordir has undergone no external scrutiny for the
security of the algorithm or implementation. A particularly vulnerable
point is having scripts downloaded in the plain-text from a single
server.

The idea of Mirrordir is exportability - allowing the user to
use strong cryptography out of a stock US Unix distribution without
having to download or learn about `International Versions' of a package
- hence any use of PGP or other such package to sign the scripts would
still require the user to customize their system, and would defeat the
purpose of the package.

One solution is to distribute the scripts to as many servers as possible
and have mirrordir verify scripts from separate random servers. As these
scripts become wide-spread public knowledge, an attack (of the sort to
try and surreptitiously modify the scripts) would become more difficult.


FUTURE DIRECTIONS
-----------------

At some point it is hoped that the security features of the Mirrordir
VFS be incorporated back into the Midnight Commander. It is hoped to be
able to execute remote shell commands as well as transfer files over a
secure link, for example in the left panel, while having a different
system on the right panel.


Authors: Paul Sheer, currently an employee of Obsidian Systems

Contact: Paul Sheer <psheer@obsidian.co.za>

Address: 8 Wingate Court, Gibson Rd, Kenilworth, 7708, Cape Town, South Africa

Phone: +27 11 761 7224 or +27 83 604 0615

Fax: (by special arrangement)

Email: psheer@obsidian.co.za

URL: http://www.obsidian.co.za/psheer/

Title: COOLEDIT - TEXT EDITOR AND INTEGRATED DEVELOPMENT ENVIRONMENT

Needs: Pentium 200 or greater with 2 MB graphics card and Video projector capable of 1024x768

Resume: No previous conference talks.

Abstract:

	Cooledit is a full featured text editor for the X Window System.
	It is built around its own widget library written directly in
	XLib. It has a builtin Python interpreter for macro programming,
	syntax highlighting for many programming languages, a
	comprehensive interface to gdb, and generic interfaces to
	compilers and text processing utilities. It has an elegant 3D
	multiple window interface, and is light and fast.

COOLEDIT - TEXT EDITOR AND INTEGRATED DEVELOPMENT ENVIRONMENT
-------------------------------------------------------------

INTRODUCTION
------------

In 1996 work began on an internal editor for the Midnight Commander to
mimic and extend the editor of the Norton Commander DOS file manager. At
the same time, the author's thesis project led to writing a widget
library called Coolwidgets for X (the stereo-0.2 package available on
the sunsite). This evolved into a text editor that had a terminal
interface via the Midnight Commander and an X interface via Coolwidgets.

Today, Cooledit has matured into a refined and sophisticated programming
environment. Important to Cooledit's development was that it was written
mostly using itself, and has hence seen more work in its own creation
than by any other single need. This has important implications (besides
the obvious existential ones) and will be discussed with reference to
general GUI design.

Cooledit comes with several other utilities written using the
Coolwidgets library. It is written entirely in ANSI C.


COOLEDIT AND COOLWIDGET ARCHITECTURE
------------------------------------

It should be stated at the outset that Coolwidgets is poorly designed.
Although extensive and repeated overhauls have been made to the source,
and although most of the individual functions are cleanly written,
Coolwidgets is by and large a `hack'. It lacks proper modularity and
extensibility - an important lesson in program design.

Coolwidgets was an attempt to create a widget library that worked in
parallel with XLib, the idea being that the programmer could have the
low level control of X, combined with the high level shortcuts of a
ready library. When this became messy, it leaned toward a slightly
object oriented design. It is successful in that it is the minimalist
widget library for its requirements - that is, it is probably the
smallest piece of code possible for what it has to do.

How Coolwidgets actually works is best described with a `hello world'
example:

/* hello.c - simple example usage of the coolwidget X API */
#include <string.h>             /* memset, strcmp */
#include <coolwidget.h>

int main (int argc, char **argv)
{
    Window win;
    CInitData cooledit_startup;
    CEvent cwevent;
    int y;

/* initialize the library */
    memset (&cooledit_startup, 0, sizeof (cooledit_startup));
    cooledit_startup.name = argv[0];    /* won't bother with other init's like geom and display */
    cooledit_startup.font = "-*-helvetica-bold-r-*--14-*-*-*-p-*-iso8859-1";
    CInitialise (&cooledit_startup);

/* create main window */
    win = CDrawMainWindow ("hello", "Hello");

    CGetHintPos (0, &y);        /* y position where to start drawing */
    CDrawText ("hellotext", win, 0, y, " Hello World ");        /* the label "hellotext" may be used to identify the widget later */
    CCentre ("hellotext");      /* we want the text centred from left to right */
    CGetHintPos (0, &y);        /* get the next y position below the last widget drawn */
    CDrawButton ("done", win, 0, y, AUTO_SIZE, " Done ");       /* draw a button... */
    CCentre ("done");           /* ...centred */
    CSetSizeHintPos ("hello");  /* set the window size to just fit the widgets */

    CFocus (CIdent ("done"));
/* show the window */
    CMapDialog ("hello");

/* Run the application. */
    do {
        CNextEvent (0, &cwevent);
        if (cwevent.type == QuitApplication)    /* pressed WM's "close" button */
            break;
    } while (strcmp (cwevent.ident, "done"));

/* close connection to the X display */
    CShutdown ();
    return 0;
}

Coolwidgets widgets are stored internally and referenced by an
identifier string. The string must be unique within the scope of the
application. This identifier approach was appealing because the
programmer need not declare every widget created, and can easily
globally reference any widget - it has no other real advantages though.

The workhorse of the application is CNextEvent() - a wrapper around
XNextEvent that does things like handle Expose events, key-press
translation and widget call-backs. Using CNextEvent(), Coolwidgets can
be used in a straight C fashion like above, or in callback (a la
object-oriented) fashion with the CAddCallback() function. In this way
it is quite versatile.

A widget is defined by a 512 byte structure. There is only one structure
for all the different types of widgets - fields were just added as
needed, hence most of the fields are not used (the memory waste incurred
by this is not significant). All the widgets are allocated into an array
within the library.
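
The combination of one structure for all widget kinds, an array of
them, and lookup by identifier string can be sketched as follows. The
field names, sizes and functions here are hypothetical and far
smaller than the real 512 byte structure.

```c
#include <string.h>

#define MAX_WIDGETS 64
#define IDENT_LEN 32

/* One structure for every kind of widget: fields that a given kind
   does not need are simply left unused. */
struct widget {
    char ident[IDENT_LEN + 1];  /* empty string marks a free slot */
    int kind;
    int x, y, width, height;
    const char *label;          /* used by text and button widgets */
    long cursor;                /* used by entry and editor widgets */
};

static struct widget widgets[MAX_WIDGETS];

/* Find a widget by its identifier, or return NULL. */
struct widget *widget_find (const char *ident)
{
    int i;

    if (!ident[0])
        return NULL;
    for (i = 0; i < MAX_WIDGETS; i++)
        if (strcmp (widgets[i].ident, ident) == 0)
            return &widgets[i];
    return NULL;
}

/* Claim a free slot.  Returns NULL if the identifier is empty, too
   long, already in use, or if the array is full. */
struct widget *widget_create (const char *ident, int kind)
{
    int i;

    if (!ident[0] || strlen (ident) > IDENT_LEN || widget_find (ident))
        return NULL;
    for (i = 0; i < MAX_WIDGETS; i++)
        if (!widgets[i].ident[0]) {
            memset (&widgets[i], 0, sizeof (widgets[i]));
            strcpy (widgets[i].ident, ident);
            widgets[i].kind = kind;
            return &widgets[i];
        }
    return NULL;
}
```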

The editor itself is just an additional widget of the library, in the
same way as it is under the Midnight Commander. The Cooledit application
contains the Python interpreter, shell and gdb interfaces, and manages
the multiple window interface.

The library has some distinguishing features:

    - Additional events may be received by CNextEvent besides the usual
	X events. QuitApplication above is an example (wm close).
	AlarmEvent and EditorCommand are others. These provide additional
	functionality.

    - Exposes are amalgamated into larger exposes for efficiency.
	Applications need not do this themselves and can rely on received
	expose areas being optimal.

    - Key presses are translated into editor commands before reaching the
	application. These are high level commands like Cut and Paste.
	Hence any application written with Coolwidgets will have the
	same key-bindings. Key-bindings are consistent through all
	widgets. Cooledit also implements a way to redefine keys globally,
	by applying a key translator callback at a low level.

    - Dialogs and menus dynamically assign hotkeys (underlined letters)
	from heuristics. This means that international languages will have
	hotkeys without translators having to explicitly work out a hotkey
	arrangement. Alt-<letter> combinations are an intrinsic part of
	Coolwidgets.

    - All entry widgets have a history. At the moment the application has
	to take the trouble to load and save this history, but if it does,
	then the user has the benefit of persistent storage of every
	entry ever made.
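
As an example of the hotkey point above, the simplest possible
heuristic is "first letter not already taken". The sketch below
implements just that; the actual heuristics in Coolwidgets are not
described here and may well be more elaborate. The function name is
hypothetical.

```c
#include <ctype.h>

/* Choose a hotkey for each label: the first letter of the label that
   has not been taken by an earlier label.  hotkey_pos[i] receives the
   index of the chosen character within labels[i], or -1 if every
   letter of that label is already taken. */
void assign_hotkeys (const char **labels, int n, int *hotkey_pos)
{
    int used[256] = { 0 };
    int i, k;

    for (i = 0; i < n; i++) {
        hotkey_pos[i] = -1;
        for (k = 0; labels[i][k]; k++) {
            int c = tolower ((unsigned char) labels[i][k]);
            if (isalpha (c) && !used[c]) {
                used[c] = 1;
                hotkey_pos[i] = k;      /* this letter gets underlined */
                break;
            }
        }
    }
}
```

Given the labels "Save", "Save as" and "Search", this picks S, a and e
respectively, whatever language the labels are in.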


DESIGNING AN INTERFACE
----------------------

It is noteworthy that the author had extensive use of a myriad of text
editors before beginning this project. Each of them had a set of nifty
features along with just as many irritating ones. Cooledit attempts to
be the combination of the best behavior from all these editors.

Although logic plays an important role when creating an interface, it
cannot completely anticipate the psychology of users' reactions. A user
will tend to avoid using some feature that they find irritating (or
because they are used to doing it differently in another program). The
user will try a combination of other features (or even another program)
to achieve the same result. They may get used to doing things a
different way, and will expect their own approach to be ergonomic, even
if it conflicts with the intention of the developer.

A user's use of an application will not necessarily converge on the
minimal set of key strokes just because an exhaustive combination of
features is present. How users will end up using an application is
highly unpredictable.

The solution is two fold:

    1.  The developers should have extensive use of many other similar
	applications. This does not just mean looking at what features
	an application has and how they are invoked, but also using those
	features extensively until the operations become spontaneous.
	Only at this point is the application fully evaluated.

    2.  Create many different ways of doing the same thing.

In the first case, a universal guideline is this: you cannot know how
you are going to feel about a set of key strokes after you have used
them a thousand times, until you have actually used them a thousand
times. Be empirical.

A list of editors that were tested is:
    - Borland C IDE
    - jed (Unix terminal editor)
    - ncedit.exe (internal editor of the Norton Commander)
    - ne.com (Norton Editor for DOS)
    - notepad.exe (Windows 3.1 editor)
    - tvedit.exe (Turbo Vision Borland editor class for DOS)

Each of these was used to write many thousands of lines of code. Others
were used, but not as intensively.

A grievous omission was the failure to test Emacs. At the time, however,
Emacs was thought to not make full use of the potential of the X
keyboard standard and, like Vi, was not considered in the same spirit of
user friendliness and ergonomics. Cooledit wanted to eliminate double
key combinations as much as possible, as well as eliminate any sort of
learning curve for novice computer users.

In the second case, it is desirable to implement three separate methods
of invoking a feature. First, a mouse can be used to pull the
appropriate menu. Second, a hotkey should be supported, and finally,
keys can be used to navigate to the menu and manually invoke the menu
item. Development has aimed to make Cooledit completely independent of
mouse operations, allowing it to retain the speed of a terminal editor.
Having three methods of actuating a function inherently supports the
user's learning curve toward fluent use of the application. It is felt
that all GUIs should intrinsically support this.

For interest, Cooledit development began under jed. As soon as it was
complete enough to save an edit buffer, jed was discarded. Thereafter,
Cooledit was used to write itself. In this way it receives ongoing and
intensive testing.

Cooledit's interface has evolved only out of an enormous number of hours
of testing and fine tuning.


EDIT BUFFER DESIGN
------------------

Most editors use a single linear memory block for the edit buffer.
Cooledit however has implemented a buffer array of 64k blocks. The exact
details of this are explained in the sources. This buffer array system
could allow data to be saved to swap files when very large files are
being edited, and allow for a 16 bit implementation of the editor.
However, this was never implemented, and hence the buffer system remains
an odd implementation.
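
The addressing scheme reduces to a shift and a mask. The sketch below
is a deliberately simplified, append-only illustration with
hypothetical names; the real buffer (see the sources) also has to
handle insertion at the cursor.

```c
#include <stdlib.h>

#define BLOCK_BITS 16
#define BLOCK_SIZE (1L << BLOCK_BITS)   /* 64k */
#define BLOCK_MASK (BLOCK_SIZE - 1)
#define MAX_BLOCKS 1024

struct block_buffer {
    unsigned char *blocks[MAX_BLOCKS];  /* allocated on demand */
    long size;
};

/* Append one byte, allocating a new 64k block when needed. */
int buffer_append (struct block_buffer *b, unsigned char c)
{
    long n = b->size >> BLOCK_BITS;

    if (n >= MAX_BLOCKS)
        return -1;
    if (!b->blocks[n]) {
        b->blocks[n] = malloc (BLOCK_SIZE);
        if (!b->blocks[n])
            return -1;
    }
    b->blocks[n][b->size & BLOCK_MASK] = c;
    b->size++;
    return 0;
}

/* Byte at offset i, or -1 if i is out of range.  The high bits select
   the block and the low 16 bits the offset within it. */
int buffer_get_byte (const struct block_buffer *b, long i)
{
    if (i < 0 || i >= b->size)
        return -1;
    return b->blocks[i >> BLOCK_BITS][i & BLOCK_MASK];
}
```

Since no offset within a block exceeds 16 bits, each block could in
principle be paged out or addressed by a 16 bit program independently
of the others.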

Low level access to the buffer occurs solely through six functions:

    edit_get_byte - to retrieve a character.

    edit_insert - insert at the cursor and move one place.

    edit_insert_ahead - insert at the cursor.

    edit_delete - delete ahead.

    edit_backspace - delete backward.

    edit_cursor_move - move the cursor an integer number of places.

These functions record each modification into a wrapping history - the
undo stack. Hence each action taken on the buffer is recorded. This
allows for the key-for-key undo feature of Cooledit. The undo stack can
be set to an arbitrary size.
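
A wrapping history of this kind can be sketched as a ring that
overwrites its oldest entry when full. The names and the encoding of
an action as a single long are assumptions for illustration only.

```c
#define UNDO_SIZE 8             /* arbitrary size for this sketch */

struct undo_ring {
    long action[UNDO_SIZE];
    unsigned long head;         /* total number of actions pushed, less pops */
    unsigned long count;        /* actions currently available to undo */
};

/* Record an action, overwriting the oldest one once the ring is full. */
void undo_push (struct undo_ring *u, long action)
{
    u->action[u->head++ % UNDO_SIZE] = action;
    if (u->count < UNDO_SIZE)
        u->count++;
}

/* Fetch the most recent action.  Returns 0 on success, or -1 if the
   history is exhausted. */
int undo_pop (struct undo_ring *u, long *action)
{
    if (!u->count)
        return -1;
    u->count--;
    *action = u->action[--u->head % UNDO_SIZE];
    return 0;
}
```

Each low level buffer function would push the inverse of what it just
did, so that popping and replaying entries undoes the edit key for
key.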

The buffer is eight bit clean and null transparent so that binary files
can be edited flawlessly. An important consideration is that no changes
are made to the buffer unless it is explicitly modified by the user.
Hence loading a file, moving the cursor and saving it again, leaves it
completely unchanged, even if it is a binary file. (Some DOS editors do
not have this grace.)


DISPLAY OPTIMIZATION
--------------------

Another reason for the development of Cooledit was that other editors
were not display optimized. Any application that uses a canvas area
should allow for use of non-accelerated graphics displays. This requires
that only the minimal surface area is redrawn with each key press. This
has become less important today, as accelerated hardware becomes
cheaper. Cooledit also supports 16 color displays extremely well.

To demonstrate this, Cooledit has the hidden key combination
Ctrl-Alt-Shift-~ which blanks the display in red. It will be noticed
that moving and scrolling redraws only the minimal amount of area.

A static cache is used to perform this optimization. It holds a
row/column array of characters and their respective foreground and
background colorizations. Redraws are compared against the cache similar
to the way that xterms do.
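
The comparison against the cache might look like the following
sketch. The dimensions, names and the draw_cell stand-in are
hypothetical; in the editor the drawing step would be an X call.

```c
#define ROWS 4
#define COLS 16

struct screen_cell {
    unsigned char ch, fg, bg;
};

/* What is believed to be on the display right now. */
static struct screen_cell cache[ROWS][COLS];

/* Stand-in for the (comparatively expensive) X drawing request. */
static void draw_cell (int row, int col, struct screen_cell c)
{
    (void) row;
    (void) col;
    (void) c;
}

/* Bring one row of the display up to date.  Only cells that differ
   from the cache are drawn.  Returns the number of cells drawn. */
int redraw_row (int row, const struct screen_cell *wanted)
{
    int col, drawn = 0;

    for (col = 0; col < COLS; col++) {
        struct screen_cell *old = &cache[row][col];

        if (old->ch == wanted[col].ch && old->fg == wanted[col].fg
            && old->bg == wanted[col].bg)
            continue;           /* already on screen: costs CPU, not X traffic */
        *old = wanted[col];
        draw_cell (row, col, wanted[col]);
        drawn++;
    }
    return drawn;
}
```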

The result of this is that Cooledit uses more CPU than most other
editors for display. On the other hand, X CPU usage is very much lower.


GDB AND SHELL INTERACTION
-------------------------

The Coolwidget library properly monitors jobs in the way of a shell. It
has the facility to add call-backs to file descriptors, watching its X
connection at the same time. The entire application centres around a
single select() statement.

Builtin is a utility function called triple_pipe_open used for any kind
of process interaction. The function forks a process analogous to popen,
but allows reading from stdout and stderr and writing to stdin. This
function is a straightforward use of process forking and file descriptor
manipulation but is worth noting because novice C programmers often
require this functionality, but are too inexperienced to implement it
themselves or to dig for existing implementations. (Details for those
wishing to use the code may be found in the sources).
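
For readers who want the general shape without digging, a minimal
sketch of the technique follows. It is written for this document using
only POSIX calls; the name open_three_pipes and its signature are
hypothetical and do not match triple_pipe_open in the Cooledit
sources. It assumes descriptors 0, 1 and 2 are already open in the
parent.

```c
#include <sys/types.h>
#include <unistd.h>

static void close_pair (int fd[2])
{
    close (fd[0]);
    close (fd[1]);
}

/* Run argv[0] with its stdin, stdout and stderr connected to pipes.
   On success returns the child's pid and stores the parent's ends:
   write to *in_fd, read from *out_fd and *err_fd.  Returns -1 on
   failure. */
pid_t open_three_pipes (char *const argv[], int *in_fd, int *out_fd,
                        int *err_fd)
{
    int in[2], out[2], err[2];
    pid_t pid;

    if (pipe (in) != 0)
        return -1;
    if (pipe (out) != 0) {
        close_pair (in);
        return -1;
    }
    if (pipe (err) != 0) {
        close_pair (in);
        close_pair (out);
        return -1;
    }
    pid = fork ();
    if (pid == 0) {
        /* child: wire the pipe ends onto descriptors 0, 1 and 2 */
        dup2 (in[0], 0);
        dup2 (out[1], 1);
        dup2 (err[1], 2);
        close_pair (in);
        close_pair (out);
        close_pair (err);
        execvp (argv[0], argv);
        _exit (127);            /* exec failed */
    }
    /* parent: drop the child's ends (and ours too if fork failed) */
    close (in[0]);
    close (out[1]);
    close (err[1]);
    if (pid < 0) {
        close (in[1]);
        close (out[0]);
        close (err[0]);
        return -1;
    }
    *in_fd = in[1];
    *out_fd = out[0];
    *err_fd = err[0];
    return pid;
}
```

The caller is expected to close the three descriptors and waitpid()
for the child when done. An event driven program would hand the
descriptors to select() rather than read them directly.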

The interface to gdb uses Coolwidgets' call-back mechanism. It
operates completely asynchronously - commands sent to gdb are queued for
writing so as not to interfere with normal editor operations, and each
queued command is paired with a response. The gdb interface tries to be
as transparent as possible, giving the impression of a builtin debugger.
It implements sufficient features that a user should rarely have to type
in a command manually.
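
The pairing of queued commands with responses might look roughly like
this. It is a sketch under stated assumptions: CommandQueue, submit and
reply_received are hypothetical names, and the policy of keeping only
one command in flight at a time is a guess rather than something taken
from the Cooledit sources.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple

Pending = Tuple[str, Callable[[str], None]]


class CommandQueue:
    """Hypothetical queue pairing each debugger command with its reply."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send                      # writes one command to gdb
        self.pending: Deque[Pending] = deque()
        self.in_flight: Optional[Pending] = None

    def submit(self, command: str, on_reply: Callable[[str], None]) -> None:
        """Queue a command; returns at once so the editor is never blocked."""
        self.pending.append((command, on_reply))
        self._pump()

    def _pump(self) -> None:
        # Assumption: only one command is outstanding at any time.
        if self.in_flight is None and self.pending:
            self.in_flight = self.pending.popleft()
            self.send(self.in_flight[0])

    def reply_received(self, text: str) -> None:
        """Called from the event loop when a complete reply has arrived."""
        if self.in_flight is None:
            return
        _, on_reply = self.in_flight
        self.in_flight = None
        on_reply(text)
        self._pump()
```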

The debugger allows program output to be displayed in an xterm. Gdb has
the command-line option to set the tty to use for program output.
However there is no reliable method of returning the tty name of the
xterm back to the debugger. A small program, ttyname_stop, is installed
with Cooledit to solve this problem. Xterm is run with `-e
ttyname_stop'. Ttyname_stop writes its pid and the name of its
controlling terminal (obtained from the ttyname() library function) to a
temporary pipe file created in the user's home directory, which the
debugger then reads. Ttyname_stop thereupon pauses indefinitely, and
program output is neatly sent to
the xterm. To close the xterm, the debugger merely needs to send SIGTERM
to ttyname_stop's pid.

Having a debugger and editor combined is most convenient for the user.
With the advent of this feature, Cooledit brings the convenience of the
famed Borland interfaces to Unix.


SYNTAX HIGHLIGHTING
-------------------

Conventional syntax highlighting buffers a color for each character (or
group of characters), so that particular regions of the edit buffer are
painted from `memory'. The cached colors are updated when modifications
are made to the text. However, the syntax highlighting used here looks
up the color of the text from its context on-the-fly, that is, as it is
being drawn. Color information for a character is NOT buffered anywhere.

To do this quickly requires a carefully optimized algorithm. It works by
stepping forward through the text and switching modes depending on
whether it has found the boundary of a word or a different keyword set.
Speed is assisted by caching the first letter of each keyword.
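
The following Python sketch illustrates the general technique, not
Cooledit's actual C algorithm. The keyword list, the single C comment
context and the function colorize() are assumptions chosen for the
example. Colors are computed in one forward pass and nothing is stored
per character between calls.

```python
from typing import Dict, List, Tuple

# Hypothetical rule set; the real rules come from the syntax definitions.
KEYWORDS = ("if", "int", "return", "while")

# Index keywords by first letter so most characters are rejected at once.
BY_FIRST: Dict[str, List[str]] = {}
for _word in KEYWORDS:
    BY_FIRST.setdefault(_word[0], []).append(_word)


def is_word_char(ch: str) -> bool:
    return ch.isalnum() or ch == "_"


def colorize(text: str) -> List[Tuple[str, str]]:
    """Return (chunk, color) runs found by stepping forward through text."""
    runs: List[Tuple[str, str]] = []

    def emit(chunk: str, color: str) -> None:
        # Merge with the previous run when the color has not changed.
        if runs and runs[-1][1] == color:
            runs[-1] = (runs[-1][0] + chunk, color)
        else:
            runs.append((chunk, color))

    i, n = 0, len(text)
    while i < n:
        # Context switch: a comment runs to its terminator, across lines.
        if text.startswith("/*", i):
            close = text.find("*/", i + 2)
            end = n if close < 0 else close + 2
            emit(text[i:end], "comment")
            i = end
            continue
        matched = ""
        at_boundary = i == 0 or not is_word_char(text[i - 1])
        if at_boundary:
            for word in BY_FIRST.get(text[i], ()):
                end = i + len(word)
                if text.startswith(word, i) and (
                        end == n or not is_word_char(text[end])):
                    matched = word
                    break
        if matched:
            emit(matched, "keyword")
            i += len(matched)
        else:
            emit(text[i], "normal")
            i += 1
    return runs
```

Because the comment context is just another mode of the same forward
scan, a comment that spans several lines needs no special handling.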

Highlighting text in this way is a novel approach, but has certain
limitations - it would be too slow to support regular expressions. It
also does not support case insensitive keywords. It has the advantage
that there is never any highlighting `lag'. The algorithm is also
transparent to line breaks. Hence, for example, a C style quote or
comment can cross many lines and will be highlighted correctly.

The code parses most program text at over 300kB per second on a PII300.
Screen refreshes are therefore instantaneous in most circumstances.


ON-THE-FLY SPELL CHECKING
-------------------------

It is interesting that this is an extremely easy feature to add. Ispell
allows for interaction with other programs using its `-a' option. The
author of ispell may have intended for this to be used from within
dialogs, say from a `Spellcheck' menu option. However, on a fast enough
system, there is no reason why words cannot be continuously fed to
ispell. It is in fact so easy to implement that the author of Cooledit
encourages the developers of every other interactive application that
processes text in any way whatsoever to add this feature.
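
To show how little is involved, here is a sketch of a parser for the
reply lines that `ispell -a' produces. It assumes the line formats
described in the ispell manual page (`*', `+' and `-' for accepted
words, `&' and `?' for misses with suggestions, `#' for misses without
any). The function name parse_ispell_line is hypothetical.

```python
from typing import List, Tuple


def parse_ispell_line(line: str) -> Tuple[bool, str, List[str]]:
    """Return (is_misspelled, word, suggestions) for one reply line."""
    line = line.rstrip("\n")
    # Blank separator, version banner ("@(#) ...") or an accepted word.
    if not line or line[0] in "*+-@":
        return (False, "", [])
    if line[0] in "&?":
        # Format: "& original count offset: miss, miss, ..."
        head, _, tail = line.partition(":")
        word = head.split()[1]
        suggestions = [s.strip() for s in tail.split(",") if s.strip()]
        return (True, word, suggestions)
    if line[0] == "#":
        # Format: "# original offset" (no suggestions available)
        return (True, line.split()[1], [])
    return (False, "", [])
```

An editor would write each word to the stdin of a running `ispell -a'
process and pass the reply lines through a function of this kind.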

Spell-checking within program comments is also implemented. The
spell-check code merely checks whether the `spellcheck' option has been
enabled for the particular syntax highlighting context. Outside of, say,
comments and string constants, spell-checking is disabled.

The main problem with spell checking was how misspelled words should be
underlined. Either proper text styles (as used by other editors) needed
to be implemented, or else Cooledit had to use the existing syntax code.
The latter approach was considered more expedient.
Misspelled words are dynamically added to the list of keywords in the
syntax rule set, but are underlined instead of colorized. This has the
interesting effect that a misspelled word will be highlighted everywhere
in the buffer so long as your cursor has passed over it at least once.
To prevent an excess of keywords (this would slow down the editor)
keywords are removed from the rule set after aging one minute - which
has the benefit of preventing the display from becoming cluttered with
too many misspelled words.
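
The aging of these temporary keywords can be sketched as follows. The
class AgingWords and its methods are hypothetical, and whether seeing a
word again restarts its one-minute timer is an assumption of this
sketch.

```python
import time
from typing import Callable, Dict, List


class AgingWords:
    """Hypothetical set of misspelled words that expire after max_age."""

    def __init__(self, max_age: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age = max_age
        self.clock = clock
        self.added: Dict[str, float] = {}

    def add(self, word: str) -> None:
        # Assumption: adding a word again restarts its timer.
        self.added[word] = self.clock()

    def expire(self) -> List[str]:
        """Drop and return every word older than max_age seconds."""
        now = self.clock()
        old = [w for w, t in self.added.items() if now - t >= self.max_age]
        for w in old:
            del self.added[w]
        return old

    def __contains__(self, word: str) -> bool:
        return word in self.added
```

Calling expire() periodically, for instance from the main event loop,
keeps the keyword list short.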

The one deficiency of Cooledit's ispell interaction is that it is not
asynchronous. It has not been tested with very large ispell dictionaries
or on slow machines, so this may be an area worth optimizing.

(On-the-fly spell checking was defiantly added after the author read a
quote by a prominent figure in the computing world, that Unix systems
did not have spell-check-as-you-type.)


BUILTIN PYTHON INTERPRETER
--------------------------

Python features standard procedures for building itself into high level
applications, and creating interfaces to C functions. Cooledit can be
built without the Python interpreter, but then lacks any Python
extensions - this is mostly for non-Linux systems, where administrators
do not wish to install the large Python sources.

Besides basic editor operations, wrappers were created to allow users to
access some of the Coolwidget library. Dialog boxes can be created from
within Python in the same style as the rest of the application. In this
way, Cooledit can actually be used to create simple GUI applications.

Users can also add items to, or remove items from, any of the existing
menus, and can hence program any kind of customization.

The Python interpreter is initialized on startup, and runs the scripts
lib/cooledit/global.py and ~/.cedit/global.py. One of these scripts must
define a function type_change(s), which is run whenever a file is opened
or its `type' (meaning the syntax highlighting rule set being used) is
changed. `s' is the string you would normally see displayed on the left
of the editor window which describes the programming language: it is
taken from the syntax definitions. The function can use this to decide
what new utilities to make available.

The C extensions currently look like this:

    def type_change(s):
        menu("Util")    # clear the Util menu

        # add new stuff to the Util menu:
        if s == "C/C++ Program":
            menu("Util", "for(;;) {", "c_generic('for (;;) {', 5)")
            menu("Util", "while() {", "c_generic('while () {', 7)")
            menu("Util", "do {", "c_do_while()")
            menu("Util", "switch() {", "c_generic('switch () {', 8)")
            menu("Util", "case:", "c_case()")
            menu("Util", "if() {", "c_generic('if () {', 4)")
            menu("Util", "main() {", "c_main()")
            menu("Util", "#include ", "c_include()")
            menu("Util", "printf();", "c_printf()")

Here a function like c_printf() is defined earlier in the same script.
The C example shows how utilities may typically be coded, and serves as
a tutorial.
Basic operations like moving through the buffer, editing the buffer, and
returning status information are provided. The Python wrappers make
liberal use of Python's optional argument feature. For example, the
get_line() function returns the current line when given no arguments, a
single line when given one argument, and a range of lines when given two.
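
The optional-argument style can be mimicked in plain Python as shown
here. This is a stand-in, not the real wrapper: BUFFER and CURRENT are
hypothetical substitutes for the editor state, and the zero-based
numbering and inclusive range are assumptions.

```python
from typing import List, Optional, Union

# Hypothetical stand-ins for the editor buffer and the cursor line.
BUFFER = ["first", "second", "third", "fourth"]
CURRENT = 1  # zero-based index of the line holding the cursor


def get_line(start: Optional[int] = None,
             end: Optional[int] = None) -> Union[str, List[str]]:
    """Return the current line, one given line, or an inclusive range."""
    if start is None:
        return BUFFER[CURRENT]
    if end is None:
        return BUFFER[start]
    return BUFFER[start:end + 1]
```

One function thus covers three common cases, which keeps the set of
names a script author must remember small.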


FUTURE DIRECTIONS
-----------------

The potential that Python gives Cooledit is enormous. It is hoped that
users will contribute Python utilities just as they did syntax rule
sets. The Python interpreter will allow Cooledit development to subside
somewhat. The C side of Cooledit is considered substantially complete.

Some users have asked for a Gtk interface to Cooledit. A full Gtk
interface is improbable because Cooledit relies so heavily on the
Coolwidget library. The Midnight Commander for Gnome does however
implement a minimal version of Cooledit under Gtk, and it is hoped that
this will be extended to offer more features.