MIRRORDIR - A PROGRAM FOR SECURE ENCRYPTION AND MIRRORING
---------------------------------------------------------

HISTORY
-------

Mirrordir was originally devised in February 1998 as a way to build redundant, that is high availability, systems. It was thought that a program that periodically mirrors a file system to a separate physical device would allow a system to be easily restored in the event of a drive failure. Although scripts could be written to duplicate a directory hierarchy, no dedicated package existed that would perform this operation optimally (with a minimal set of changes) and ensure meticulous mirroring of all types of files.

Within a week a working alpha version was available, which later grew a rich set of modes and features. A month later the Virtual File System layer of the Midnight Commander file manager was ported and incorporated into Mirrordir. The VFS layer proved extremely useful for the business of Obsidian Systems (the author's employer), and allowed many of the programming hours to be billed.

In October 1998, a secure socket layer was added. This later evolved into the `libdiffie' library - a generic library for adding secure socket encryption to a network utility. The library transformed the VFS `mc://' connections into secure connections.

At this point Mirrordir lacked the important feature of programmability. Users should be able to customize the files to include or exclude, using a programming language. This is especially important when doing large complex FTP mirrors. A trivial C interpreter was designed to fulfill this need.

The following month `pslogin' was added, based on rsh and using the secure layer. Support for compressed (as well as encrypted) sockets was added using the zlib library in January 1999. In February the `forward' program was added to forward arbitrary TCP ports over the secure socket layer.
Recently a WIN32 port of Mirrordir was made, supporting all the file transfer and encryption features. Pslogin, however, is not yet ported.

The opportunity for Mirrordir to be used explicitly for a high availability system came about in April 1999. It has been mirroring between two servers every six hours for the past few months without problems.

Mirrordir is part of the Debian distribution.

OVERVIEW
--------

(The terminology `control' and `mirror' is used to indicate the existing and backup directories respectively.)

Although transferring ordinary files is a simple matter, an algorithm that duplicates directories in every detail is more complicated, and is compounded by the need to verify the age of each file (possibly across a timezone) to decide whether it has been modified compared to the control. Consider for instance the problem of duplicating the access and modified times of a directory: the program must reset its access and modified times, but only AFTER the directory has been read. Further, the control access times should also be preserved by resetting them after exiting each directory - the idea being that mirroring is done with the minimum possible intrusion.

The problem of changing file stat information on the mirror side also needs to be tackled efficiently. For each directory listing, all files on the mirror side that do not exist in the control must be removed. If a directory exists it must be left untouched, while files must only be removed if their type (i.e. ordinary versus device file) differs between the control and mirror trees.

Finally, hardlinks need to be correctly located and duplicated. A hardlink table is set up internally for each multiply referenced inode, so as to allow proper tracking of hardlinks - even hardlinks to device files and symlinks.

Recursion works in an obvious way - for each directory entry in the control, an entry in the mirror is located. If it exists and is a directory, then it is recursed into.
If it is a file, then appropriate duplication is done - such as setting the permissions if they differ, changing the ownership, or creating a device file. Only the data of newer files is copied. Files remaining after the comparison that do not exist on the control side are removed at the end of each listing.

Mirrordir also supports straight recursive copying of files/directories as a more rigorous version of the cp program.

Mirrordir is written entirely in ANSI C.

VFS LAYER
---------

The VFS (Virtual File System) layer essentially just supports `mc://user@host:port/' and `ftp://user@host:port/' type directory duplication/transfer in addition to local ones. The handling of FTP connections makes Mirrordir a mirroring tool on par with Mirror (a dedicated ftp mirroring tool). The VFS layer does, however, have some features that other mirroring programs might lack. For instance, file access is transparent from the programmer's point of view: mc_open() works just like open() and so forth, allowing recursive uploading as well as downloading. An ftp front end tends to list directory contents raw; VFS however parses the FTP LS output into the Unix stat structure, allowing transparent use of the (mc_)stat() system call with efficient caching of stat data. You can hence mirror any type of file (device, socket etc.) over ftp.

mc:// connections have more power, in that access and modified times can be set (which is not possible with FTP). These connections require that the secure-mcserv program be running on the host.

mc:// connections are inefficient because files are transferred in chunks that each require a handshake using the VFS's simplistic command protocol. Network latency hence governs the transfer speed. Downloading is more efficient because of some caching mechanisms that are implemented.
In general, it is better to download than to upload where possible, and to use FTP transfers for speed and mc:// transfers for precision (the latter, for example, when mirroring a complete root file system).

MIRRORDIR INTEGRITY CHECK
-------------------------

The `make check' build target runs a script which attempts to mirror every possible combination of file system change between control and mirror. This gives a high degree of confidence that Mirrordir is producing an identical file system.

C INTERPRETER
-------------

The pseudo C interpreter is an extravagant solution to the problem of adding some simple programmability. It is 50kB of code and is easily portable to any application. It supports long integers, strings and huge integers (arbitrary precision signed integers needed for cryptography). It takes some huge liberties in its implementation, but is extremely fast as it tries to have the lowest possible interpretive overhead. It does not support function definitions within the interpreted code; functions are instead supplied by the host program.

Use of the C interpreter API begins with adding custom C functions using

    int parser_add_operator (Operator * o);

`regexp()' is an example of a function Mirrordir adds so that users can have the benefit of regular expression comparisons on file names. Plenty of examples of custom functions are isolated in the file functions.c in the source distribution. The initialization function should then be called:

    void parser_init (void);

and then the particular code you want to use is compiled:

    void *parser_compile (char *text, Value * heap);

where `text' is the pseudo C code, and `heap' is an array of Values - space for any variables that `text' may declare - which will then be accessible to the calling program in the order that they were declared within `text'. The pointer returned is the code compiled into an architecture dependent byte code. The internal format is basically the Reverse Polish form of the code, pushed onto a stack, and having direct pointers to functions that perform each operation.
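
The general idea of Reverse Polish code carrying direct function pointers can be sketched as follows. This is only an illustration of the technique, not Mirrordir's actual byte code format: the names `Op', `Machine', `rpn_run' and `rpn_example' are hypothetical, and only long integers are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not taken from the Mirrordir sources. */
#define STACK_MAX 64

typedef struct {
    long stack[STACK_MAX];
    int sp;                     /* number of values currently on the stack */
} Machine;

typedef void (*OpFn) (Machine * m, long arg);

typedef struct {
    OpFn fn;                    /* direct pointer to the routine for this step */
    long arg;                   /* immediate operand, used only by op_push */
} Op;

static void op_push (Machine * m, long arg)
{
    assert (m->sp < STACK_MAX);
    m->stack[m->sp++] = arg;
}

static void op_add (Machine * m, long arg)
{
    (void) arg;
    assert (m->sp >= 2);
    m->sp--;
    m->stack[m->sp - 1] += m->stack[m->sp];
}

static void op_mul (Machine * m, long arg)
{
    (void) arg;
    assert (m->sp >= 2);
    m->sp--;
    m->stack[m->sp - 1] *= m->stack[m->sp];
}

/* Execute the compiled sequence: no decoding step, just indirect calls. */
long rpn_run (const Op * code, size_t n)
{
    Machine m;
    size_t i;
    m.sp = 0;
    for (i = 0; i < n; i++)
        code[i].fn (&m, code[i].arg);
    return m.sp > 0 ? m.stack[m.sp - 1] : 0;
}

/* (2 + 3) * 4 in Reverse Polish form is: 2 3 + 4 * */
long rpn_example (void)
{
    static const Op code[] = {
        {op_push, 2}, {op_push, 3}, {op_add, 0}, {op_push, 4}, {op_mul, 0}
    };
    return rpn_run (code, sizeof (code) / sizeof (code[0]));
}
```

Because each step is a single indirect call, the execution loop has no opcode decoding at all, which is where the low interpretive overhead of such a format would come from.
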
Storing direct function pointers in the byte code makes for extremely fast execution. To execute the code, simply call

    int parser_evaluate (void *s, void *user_data);

where `s' is the byte code returned from parser_compile, and `user_data' is optional data that the programmer might want to pass to any functions defined with parser_add_operator. The result returned from parser_evaluate is just the value of the `return' statement within `text', or zero if the interpreter fell out of the bottom of the program. Within the pseudo-C itself, all declared variables are initialized to zero, but are not re-initialized between calls, hence variable values carry over between successive calls to parser_evaluate(). Finally,

    void parser_free (void *s, Value * heap);
    void parser_shut (void);

may be called for cleaning up. The interpreter should exit with no memory leaks.

The C interpreter performs two functions within Mirrordir: firstly, it allows the user to program which types of files to include or exclude when mirroring data. Secondly, it interprets the cryptography scripts, to keep the cryptography code separate from the Mirrordir distribution.

Implementing key exchange and signature algorithms with an interpreted language has a further advantage: it is easy to see what the algorithm actually does. It is easier to mentally verify the security of a fifty line script than to wade through original C source code.

CRYPTOGRAPHY
------------

The secure socket layer provides wrappers for connect(), accept() and the associated network functions. In theory, any socket program can just be recompiled after including the diffie-socket.h and z-socket.h headers and then become a `secure' version of that program. In practice this will probably not work, and the secure-mcserv.c, forward.c and mirrordir.c sources should be used as examples of exactly what to do.

The accept() and connect() calls of the socket layer each require scripts to be present in /etc/ssocket/.
If these are not present, then they are automatically transferred by FTP from encrypt.obsidian.co.za - where there are no relevant export regulations (there are obvious security issues involved with this). The scripts generate and write public and private keys to /etc/ssocket/public/ and /etc/ssocket/private/ as needed. Keys are stored with one key per file in raw binary form, with the first two bytes dictating the length of the key in bytes. (See huge_as_binary() called by huge_write().) This is a more practical alternative to key database files, obviating the need for key management utilities.

On Linux, random numbers are generated by reading the /dev/urandom device. On other systems the random number generation is probably not secure, since it relies on an MD5 hash of the time, process ID and modified and access times of some top level directories.

The algorithm is a classic Diffie-Hellman key exchange with signature verification of the host, and a symmetric key stream cipher. It is detailed as follows:

First the client sends a magic number to the server to tell it that it wants encryption; if this is not sent, then a plain text connection is assumed. Then it calculates X = g^x mod p, where x is random, and sends X. The server receives X, calculates Y = g^y mod p, where y is random, and sends back a reply magic number and then Y. Now both calculate k = g^(xy) mod p (the client calculates k = Y^x mod p and the server k = X^y mod p). That concludes the classic Diffie-Hellman key exchange.

Encryption is now turned on with the key k - that is, all further talk is over a symmetric stream cipher (explained below). Actually, two streams are set up, one for the server->client pipe and one for the client->server pipe, using the most and least significant halves of k respectively. The actual script lines are:

    initarcrd (c, l / 2);
    initarcwr (c + l / 2, l / 2);

which invoke a separate script (see below) to initialize the stream cipher before continuing.
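
The arithmetic of the exchange can be checked with deliberately tiny numbers. The following is a minimal sketch, not Mirrordir's code: it uses 64 bit machine integers instead of the huge number library, and the names `powmod', `DhDemo' and `dh_demo' are hypothetical. Real keys are hundreds of bits long, and x and y would be random.

```c
#include <assert.h>
#include <stdint.h>

/* Modular exponentiation by square-and-multiply.
   The modulus is limited to 32 bits so that products fit in 64 bits. */
uint64_t powmod (uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result;
    assert (mod > 0 && mod < ((uint64_t) 1 << 32));
    result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

typedef struct {
    uint64_t X;                 /* sent by the client: g^x mod p */
    uint64_t Y;                 /* sent by the server: g^y mod p */
    uint64_t k_client;          /* client's view of the key: Y^x mod p */
    uint64_t k_server;          /* server's view of the key: X^y mod p */
} DhDemo;

/* Run both sides of the exchange in one process, for illustration only. */
DhDemo dh_demo (uint64_t p, uint64_t g, uint64_t x, uint64_t y)
{
    DhDemo d;
    d.X = powmod (g, x, p);
    d.Y = powmod (g, y, p);
    d.k_client = powmod (d.Y, x, p);
    d.k_server = powmod (d.X, y, p);
    return d;
}
```

With p = 23, g = 5, x = 6 and y = 15, the client sends X = 8, the server replies with Y = 19, and both sides arrive at k = 2 without k ever crossing the wire.
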
The server then looks in /etc/ssocket and gets its public, y, and private, x, signature keys. If it cannot find them (this would happen if the program is being run for the first time ever), then it calculates y = g^x mod p, where x is random and is the private key, and y is the public key, and stores x and y in the appropriate files. Then it sends y (the public key) to the client.

The client receives y and checks in the user's ~/.ssocket/ directory for a file with the same name as the host's IP address, to verify the public key against the file, and hence that the server is who it last claimed to be. If the file does not exist (i.e. a first time connection to the server), then it creates it and warns the user of the possibility of a breach.

The server then signs the encryption key k with private key x to produce r = k g^(-v) mod p and s = v - r x mod q, where v is random, and q = (p - 1) / 2 is also prime. r and s represent the signed k and are sent to the client. The client receives r and s and calculates m = g^s y^(r mod q) r mod p. Working through the exponents, g^s y^(r mod q) reduces to g^v, which cancels the g^(-v) factor in r, so m must equal k as verification of the authenticity of the server. The signature exchange is there to exclude the possibility of a man-in-the-middle attack.

Normal TCP traffic then ensues with encryption still enabled. The first traffic would probably verify the authenticity of the client, and since all traffic is now encrypted, the client should have no qualms about sending a password in `plain-text', as with a normal telnet login.

The requirement of an exportable program left only Diffie-Hellman as the choice of key exchange, to avoid patent infringements. The signature scheme is called the p-NEW signature scheme in Bruce Schneier's `Applied Cryptography'. The requirement of a stream cipher that was simple enough to be executed by an interpreted script and still operate with reasonable speed left only the ARC stream cipher, which is described in the same reference.
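
For readers unfamiliar with ARC, the following is a rough sketch of how the alleged algorithm might be widened from 8 bit to 12 bit table entries, which is the variant Mirrordir uses (see the sections that follow). It is not the code in arcinit.cs or arcencrypt.cs. The names `Arc12', `arc12_init' and `arc12_crypt' are hypothetical, and both the byte-wise key schedule and the packing of two 12 bit outputs into three bytes are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARC12_SIZE 4096         /* 2^12 table entries, each a 12 bit value */
#define ARC12_MASK 0x0FFF

typedef struct {
    uint16_t s[ARC12_SIZE];     /* 4096 two-byte words: 8kB of state */
    unsigned i, j;
} Arc12;

/* Key schedule: same shape as the 8 bit cipher, with indices taken mod 4096.
   Consuming the key one byte at a time is an assumption of this sketch. */
void arc12_init (Arc12 * c, const unsigned char *key, size_t keylen)
{
    unsigned i, j = 0;
    assert (keylen > 0);
    for (i = 0; i < ARC12_SIZE; i++)
        c->s[i] = (uint16_t) i;
    for (i = 0; i < ARC12_SIZE; i++) {
        uint16_t t;
        j = (j + c->s[i] + key[i % keylen]) & ARC12_MASK;
        t = c->s[i];
        c->s[i] = c->s[j];
        c->s[j] = t;
    }
    c->i = c->j = 0;
}

/* Produce the next 12 bits of key stream. */
static unsigned arc12_next (Arc12 * c)
{
    uint16_t t;
    c->i = (c->i + 1) & ARC12_MASK;
    c->j = (c->j + c->s[c->i]) & ARC12_MASK;
    t = c->s[c->i];
    c->s[c->i] = c->s[c->j];
    c->s[c->j] = t;
    return c->s[(c->s[c->i] + c->s[c->j]) & ARC12_MASK];
}

/* XOR buf in place with the key stream. Two 12 bit outputs cover three
   bytes; any unused key stream bits at the end of a call are discarded, so
   both ends must process the data in identically sized chunks. */
void arc12_crypt (Arc12 * c, unsigned char *buf, size_t len)
{
    size_t n = 0;
    while (n < len) {
        unsigned long hi = arc12_next (c);
        unsigned long lo = arc12_next (c);
        unsigned long ks = (hi << 12) | lo;     /* 24 key stream bits */
        int shift;
        for (shift = 16; shift >= 0 && n < len; shift -= 8)
            buf[n++] ^= (unsigned char) ((ks >> shift) & 0xFF);
    }
}
```

Since encryption is a plain XOR with the key stream, decryption is the same call made on a cipher state initialized with the same key.
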
The modified version of ARC used here is a twelve bit version, implemented in the files /etc/ssocket/arcinit.cs and /etc/ssocket/arcencrypt.cs.

The stream cipher need not be executed by scripts if you are using an international version of Mirrordir. In this case, the C routine is compiled into the source, making this an extremely fast encrypted connection. Unfortunately the speed of the compiled-in cipher could not be properly measured because its cost was negligible compared to the network latency on the test computer - ARC is an order of magnitude faster than plain DES, and faster still here because it is twelve bit. For the interpreted script, the cipher managed 40kB per second (PII300), which is just adequate for an X server connection, and more than adequate for any terminal connection.

Note that the `International' and `US' versions of Mirrordir are completely inter-operable.

The key size for the stream cipher is half the size of the Diffie-Hellman key - which is enormous by any standards of symmetric key length. The Diffie-Hellman key size can be set on the command-line to any one of 512, 768, 1024 or 1536 bits, and defaults to 512 bits.

HUGE NUMBER LIBRARY
-------------------

The arbitrary precision code was taken from the Python sources. At 25kB this code is the most succinct library that could be found, and was reworked to compile independently of the Python sources. An interesting modification was to redefine Python's idea of a `digit' from 15 bits to 31 bits and make use of gcc's long long integer type of 64 bits. This gives a 7/4 speed increase for powmod operations.
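
To show why a 64 bit intermediate type allows 31 bit digits, here is a minimal sketch of schoolbook multiplication on such digits. It is not the code from the huge number library: the function name `digits_mul' and the bare array representation (least significant digit first) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIGIT_BITS 31
#define DIGIT_MASK ((uint32_t) 0x7FFFFFFF)

/* out[0 .. na+nb-1] = a * b, all stored least significant digit first.
   Every input digit must be below 2^31. */
void digits_mul (const uint32_t * a, size_t na,
                 const uint32_t * b, size_t nb, uint32_t * out)
{
    size_t i, j;
    for (i = 0; i < na + nb; i++)
        out[i] = 0;
    for (i = 0; i < na; i++) {
        uint64_t carry = 0;
        assert (a[i] <= DIGIT_MASK);
        for (j = 0; j < nb; j++) {
            /* (2^31-1)^2 + 2*(2^31-1) = 2^62 - 1, so t cannot overflow
               a 64 bit integer and the new carry stays below 2^31 */
            uint64_t t = (uint64_t) a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint32_t) (t & DIGIT_MASK);
            carry = t >> DIGIT_BITS;
        }
        out[i + nb] = (uint32_t) carry;
    }
}
```

For example, multiplying the single digit 2^31 - 1 by itself gives 2^62 - 2^32 + 1, which comes out as the two digits 1 (low) and 2^31 - 2 (high). A 31 bit digit needs roughly half as many inner-loop steps per operand as a 15 bit digit, which is the source of the speedup, provided the compiler offers a 64 bit type for the product.
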
An extract from huge-number.h is,

    /* comparison */
    int huge_compare (Huge * a, Huge * b);
    int huge_nonzero (Huge * v);

    /* arithmetic */
    Huge *huge_add (Huge * a, Huge * b);
    Huge *huge_sub (Huge * a, Huge * b);
    Huge *huge_mul (Huge * a, Huge * b);
    Huge *huge_div (Huge * v, Huge * w);
    Huge *huge_mod (Huge * v, Huge * w);
    Huge *huge_divmod (Huge * v, Huge * w, Huge ** remainder /* may be null */ );
    Huge *huge_invert (Huge * v);

This is a useful piece of clean code for anyone who wants to add arbitrary precision support to a program without resorting to linking with the GNU mp library.

12 BIT ARC
----------

Allegedly, the original algorithm uses 8*8 S boxes. Bruce Schneier comments that there is no reason why this cannot be extended to 16*16 S boxes, which would require 16 x 2^16 bits of storage - 128kB. In between is the inconvenient 12*12 arrangement requiring 2^12 words or 8kB. It is inconvenient because the algorithm must cope with half bytes when xor'ing the stream (it encrypts three nibbles at a time).

At a minimum key size of 256 bits, this part of the encryption package appears to have no security issues. It is disappointing that the uninformed will associate security with catch terms like DES.

It should be noted that a stream cipher like this one works in the same way as `one time pad' encryption (the data is XORed with a key stream), although its security rests on the key stream generator rather than on a truly random pad. It is a natural choice for a TCP stream.

SECURITY
--------

At the moment, Mirrordir has undergone no external scrutiny of the security of the algorithm or implementation. A particularly vulnerable point is having scripts downloaded in plain-text from a single server.
The idea of Mirrordir is exportability - allowing the user to use strong cryptography out of a stock US Unix distribution without having to download or learn about `International Versions' of a package - hence any use of PGP or another such package to sign the scripts would still require users to customize their system, and would defeat the purpose of the package. One solution is to distribute the scripts to as many servers as possible and have mirrordir verify scripts from separate random servers. As these scripts become wide-spread public knowledge, an attack (of the sort that tries to surreptitiously modify the scripts) would become more difficult.

FUTURE DIRECTIONS
-----------------

At some point it is hoped that the security features of the Mirrordir VFS will be incorporated back into the Midnight Commander. It is hoped to be able to execute remote shell commands as well as transfer files over a secure link, for example in the left panel, while having a different system on the right panel.

Authors: Paul Sheer, currently an employee of Obsidian Systems
Contact: Paul Sheer
Address: 8 Wingate Court, Gibson Rd, Kenilworth, 7708, Cape Town, South Africa
Phone: +27 11 761 7224 or +27 83 604 0615
Fax: (by special arrangement)
Email: psheer@obsidian.co.za
URL: http://www.obsidian.co.za/psheer/

Title: COOLEDIT - TEXT EDITOR AND INTEGRATED DEVELOPMENT ENVIRONMENT
Needs: Pentium 200 or greater with 2 MB graphics card and Video projector capable of 1024x768
Resume: No previous conference talks.
Abstract: Cooledit is a full featured text editor for the X Window System. It is built around its own widget library written directly in XLib. It has a builtin Python interpreter for macro programming, syntax highlighting for many programming languages, a comprehensive interface to gdb, and generic interfaces to compilers and text processing utilities. It has an elegant 3D multiple window interface, and is light and fast.

COOLEDIT - TEXT EDITOR AND INTEGRATED DEVELOPMENT ENVIRONMENT
-------------------------------------------------------------

INTRODUCTION
------------

In 1996 work began on an internal editor for the Midnight Commander to mimic and extend the editor of the Norton Commander DOS file manager. At the same time, the author's thesis project led to writing a widget library called Coolwidgets for X (the stereo-0.2 package available on the sunsite). This evolved into a text editor that had a terminal interface via the Midnight Commander and an X interface via Coolwidgets. Today, Cooledit has matured into a refined and sophisticated programming environment.

Important to Cooledit's development was that it was written mostly using itself, and has hence been used more in its own creation than for any other single purpose. This has important implications (besides the obvious existential ones) which will be discussed with reference to general GUI design.

Cooledit comes with several other utilities written using the Coolwidgets library. It is written entirely in ANSI C.

COOLEDIT AND COOLWIDGET ARCHITECTURE
------------------------------------

It should be stated at the outset that Coolwidgets is poorly designed. Although extensive and repeated overhauls have been made to the source, and although most of the individual functions are cleanly written, Coolwidgets is by and large a `hack'. It lacks proper modularity and extensibility - an important lesson in program design.

Coolwidgets was an attempt to create a widget library that worked in parallel with XLib, the idea being that the programmer could have the low level control of X, combined with the high level shortcuts of a ready library. When this became messy, it leaned toward a slightly object oriented design. It is successful in that it is the minimalist widget library for its requirements - that is, it is probably the smallest piece of code possible for what it has to do.
How Coolwidgets actually works is best described with a `hello world' example:

    /* hello.c - simple example usage of the coolwidget X API */
    #include <string.h>         /* memset, strcmp */
    #include <coolwidget.h>     /* header name lost in extraction; assumed */

    int main (int argc, char **argv)
    {
        Window win;
        CInitData cooledit_startup;
        CEvent cwevent;
        int y;

        /* initialize the library */
        memset (&cooledit_startup, 0, sizeof (cooledit_startup));
        cooledit_startup.name = argv[0];
        /* won't bother with other init's like geom and display */
        cooledit_startup.font = "-*-helvetica-bold-r-*--14-*-*-*-p-*-iso8859-1";
        CInitialise (&cooledit_startup);

        /* create main window */
        win = CDrawMainWindow ("hello", "Hello");
        CGetHintPos (0, &y);    /* y position where to start drawing */
        CDrawText ("hellotext", win, 0, y, " Hello World ");
        /* the label "hellotext" may be used to identify the widget later */
        CCentre ("hellotext");  /* we want the text centred from left to right */
        CGetHintPos (0, &y);    /* get the next y position below the last widget drawn */
        CDrawButton ("done", win, 0, y, AUTO_SIZE, " Done ");   /* draw a button... */
        CCentre ("done");       /* ...centred */
        CSetSizeHintPos ("hello");      /* set the window size to just fit the widgets */
        CFocus (CIdent ("done"));

        /* show the window */
        CMapDialog ("hello");

        /* Run the application. */
        do {
            CNextEvent (0, &cwevent);
            if (cwevent.type == QuitApplication)    /* pressed WM's "close" button */
                break;
        } while (strcmp (cwevent.ident, "done"));

        /* close connection to the X display */
        CShutdown ();
        return 0;
    }

Coolwidgets widgets are stored internally and referenced by an identifier string. The string must be unique within the scope of the application. This identifier approach was appealing because the programmer need not declare every widget created, and can easily reference any widget globally - it has no other real advantages though.

The workhorse of the application is CNextEvent() - a wrapper around XNextEvent that does things like handle Expose events, key-press translation and widget call-backs.
Using CNextEvent(), Coolwidgets can be used in a straight C fashion like above, or in callback (a la object-oriented) fashion with the CAddCallback() function. In this way it is quite versatile.

A widget is defined by a 512 byte structure. There is only one structure for all the different types of widgets - fields were just added as needed, hence most of the fields are not used by any given widget (the memory waste incurred by this is not significant). All the widgets are allocated into an array within the library.

The editor itself is just an additional widget of the library, in the same way as it is under the Midnight Commander. The Cooledit application contains the Python interpreter, shell and gdb interfaces, and manages the multiple window interface.

The library has some distinguishing features:

- Additional events may be received by CNextEvent besides the usual X events. QuitApplication above is an example (wm close). AlarmEvent and EditorCommand are others. These provide additional functionality.

- Exposes are amalgamated into larger exposes for efficiency. Applications need not do this themselves and can rely on received expose areas being optimal.

- Key presses are translated into editor commands before reaching the application. These are high level commands like Cut and Paste. Hence any application written with Coolwidgets will have the same key-bindings. Key-bindings are consistent through all widgets. Cooledit also implements a way to redefine keys globally, by applying a key translator callback at a low level.

- Dialogs and menus dynamically assign hotkeys (underlined letters) from heuristics. This means that international languages will have hotkeys without translators having to explicitly work out a hotkey arrangement. Alt- combinations are an intrinsic part of Coolwidgets.

- All entry widgets have a history.
  At the moment the application has to take the trouble to load and save this history, but if it does, then the user has the benefit of persistent storage of every entry ever made.

DESIGNING AN INTERFACE
----------------------

It is noteworthy that the author had made extensive use of a myriad of text editors before beginning this project. Each of them had a set of nifty features along with just as many irritating ones. Cooledit attempts to combine the best behavior from all these editors.

Although logic plays an important role when creating an interface, it cannot completely anticipate the psychology of users' reactions. A user will tend to avoid using some feature that they find irritating (or because they are used to doing it differently in another program). The user will try a combination of other features (or even another program) to achieve the same result. They may get used to doing things a different way, and will expect their own approach to be ergonomic, even if it conflicts with the intention of the developer. A user's use of an application will not necessarily converge on the minimal set of key strokes just because an exhaustive combination of features is present. How users will end up using an application is highly unpredictable.

The solution is twofold:

1. The developers should have extensive use of many other similar applications. This does not just mean looking at what features an application has and how they are invoked, but also using those features extensively until the operations become spontaneous. Only at this point is the application fully evaluated.

2. Create many different ways of doing the same thing.

In the first case, a universal guideline is this: you cannot know how you are going to feel about a set of key strokes after you have used them a thousand times, until you have actually used them a thousand times. Be empirical.
A list of editors that were tested is:

- Borland C IDE
- jed (Unix terminal editor)
- ncedit.exe (internal editor of the Norton Commander)
- ne.com (Norton Editor for DOS)
- notepad.exe (Windows 3.1 editor)
- tvedit.exe (Turbo Vision Borland editor class for DOS)

Each of these was used to write many thousands of lines of code. Others were used, but not as intensively. A grievous omission was the failure to test Emacs; however, at the time, Emacs was thought not to make full use of the potential of the X keyboard standard and, like Vi, was not considered to be in the same spirit of user friendliness and ergonomics. Cooledit aimed to eliminate double key combinations as much as possible, as well as to eliminate any sort of learning curve for novice computer users.

In the second case, it is desirable to implement three separate methods of invoking a feature. First, a mouse can be used to pull down the appropriate menu. Second, a hotkey should be supported, and finally, keys can be used to navigate to the menu and manually invoke the menu item. Development has aimed to make Cooledit completely independent of mouse operations, allowing it to retain the speed of a terminal editor.

Having three methods of actuating a function inherently supports the user's learning curve toward fluent use of the application. It is felt that all GUIs should intrinsically support this.

For interest, Cooledit development began under jed. As soon as it was complete enough to save an edit buffer, jed was discarded. Thereafter, Cooledit was used to write itself. In this way it receives ongoing and intensive testing.

Cooledit's interface has evolved only out of an enormous number of hours of testing and fine tuning.

EDIT BUFFER DESIGN
------------------

Most editors use a single linear memory block for the edit buffer. Cooledit however has implemented a buffer array of 64k blocks. The exact details of this are explained in the sources.
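
The basic indexing idea behind an array of 64k blocks can be sketched as follows. This is a simplified illustration under stated assumptions, not Cooledit's actual layout (for which the sources remain the reference): the names `BlockBuf', `blockbuf_append' and `blockbuf_get_byte' are hypothetical, and the sketch only supports appending at the end.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_BITS 16
#define BLOCK_SIZE (1L << BLOCK_BITS)   /* 64k bytes per block */
#define BLOCK_MASK (BLOCK_SIZE - 1)
#define MAX_BLOCKS 1024                 /* arbitrary limit for this sketch */

typedef struct {
    unsigned char *blocks[MAX_BLOCKS];  /* each block is allocated on demand */
    long size;                          /* total number of bytes stored */
} BlockBuf;

void blockbuf_init (BlockBuf * b)
{
    int i;
    for (i = 0; i < MAX_BLOCKS; i++)
        b->blocks[i] = NULL;
    b->size = 0;
}

/* Append one byte; returns 0 on success, -1 if out of space or memory. */
int blockbuf_append (BlockBuf * b, unsigned char c)
{
    long n = b->size >> BLOCK_BITS;     /* which block the byte lands in */
    if (n >= MAX_BLOCKS)
        return -1;
    if (b->blocks[n] == NULL) {
        b->blocks[n] = malloc (BLOCK_SIZE);
        if (b->blocks[n] == NULL)
            return -1;
    }
    b->blocks[n][b->size & BLOCK_MASK] = c;
    b->size++;
    return 0;
}

/* Fetch the byte at offset i, or -1 if i is out of range. The high bits of
   the offset select the block and the low 16 bits the byte within it. */
int blockbuf_get_byte (const BlockBuf * b, long i)
{
    if (i < 0 || i >= b->size)
        return -1;
    return b->blocks[i >> BLOCK_BITS][i & BLOCK_MASK];
}

void blockbuf_free (BlockBuf * b)
{
    int i;
    for (i = 0; i < MAX_BLOCKS; i++)
        free (b->blocks[i]);
    blockbuf_init (b);
}
```

Because no single allocation exceeds 64k and the offset within a block fits in 16 bits, individual blocks could in principle be paged out or handled by a 16 bit compiler.
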
This buffer array system could allow data to be saved to swap files when very large files are being edited, and allow for a 16 bit implementation of the editor. However, this was never implemented, and hence the buffer system remains an odd implementation.

Low level access to the buffer occurs solely through six functions:

    edit_get_byte     - retrieve a character.
    edit_insert       - insert at the cursor and move one place.
    edit_insert_ahead - insert at the cursor.
    edit_delete       - delete ahead.
    edit_backspace    - delete backward.
    edit_cursor_move  - move the cursor an integer number of places.

These functions record each modification into a wrapping history - the undo stack. Hence each action taken on the buffer is recorded. This allows for the key-for-key undo feature of Cooledit. The undo stack can be set to an arbitrary size.

The buffer is eight bit clean and null transparent so that binary files can be edited flawlessly. An important consideration is that no changes are made to the buffer unless it is explicitly modified by the user. Hence loading a file, moving the cursor and saving it again leaves it completely unchanged, even if it is a binary file. (Some DOS editors do not have this grace.)

DISPLAY OPTIMIZATION
--------------------

Another reason for the development of Cooledit was that other editors were not display optimized. Any application that uses a canvas area should allow for use of non-accelerated graphics displays. This requires that only the minimal surface area is redrawn with each key press. This has become less important today, as accelerated hardware becomes cheaper. Cooledit also supports 16 color displays extremely well.

To demonstrate this, Cooledit has the hidden key combination Ctrl-Alt-Shift-~ which blanks the display in red. It will be noticed that moving and scrolling redraws only the minimal amount of area.

A static cache is used to perform this optimization.
It holds a row/column array of characters and their respective foreground and background colorizations. Redraws are compared against the cache, similar to the way that xterms do. The result of this is that Cooledit uses more CPU than most other editors for display. On the other hand, X CPU usage is very much lower.

GDB AND SHELL INTERACTION
-------------------------

The Coolwidget library properly monitors jobs in the way of a shell. It has the facility to add call-backs to file descriptors, watching its X connection at the same time. The entire application centres around a single select() statement.

Built in is a utility function called triple_pipe_open, used for any kind of process interaction. The function forks a process analogously to popen, but allows reading from stdout and stderr and writing to stdin. This function is a straightforward use of process forking and file descriptor manipulation, but is worth noting because novice C programmers often require this functionality, yet are too inexperienced to implement it themselves or to dig for existing implementations. (Details for those wishing to use the code may be found in the sources.)

The interface to gdb uses Coolwidgets' call-back mechanism. It operates completely asynchronously - commands sent to gdb are queued for writing so as not to interfere with normal editor operations, and each queued command is paired with a response. The gdb interface tries to be as transparent as possible, giving the impression of a builtin debugger. It implements sufficient features that a user should rarely have to type in a command manually.

The debugger allows program output to be displayed in an xterm. Gdb has a command-line option to set the tty to use for program output. However there is no reliable method of returning the tty name of the xterm back to the debugger. A small program, ttyname_stop, is installed with Cooledit to solve this problem. Xterm is run with `-e ttyname_stop'.
Ttyname_stop prints its pid and controlling terminal (from the ttyname() function) to a temporary pipe file created in the user's home directory (which is then read by the debugger), whereupon ttyname_stop pauses indefinitely. Program output is then neatly sent to the xterm. To close the xterm, the debugger merely needs to send SIGTERM to ttyname_stop's pid.

Having a debugger and editor combined is most convenient for the user. With the advent of this feature, Cooledit brings the convenience of the famed Borland interfaces to Unix.

SYNTAX HIGHLIGHTING
-------------------

Conventional syntax highlighting buffers a color for each character (or group of characters), so that particular regions of the edit buffer are painted from `memory'. The cached colors are updated when modifications are made to the text.

However, the syntax highlighting used here looks up the color of the text from its context on-the-fly, that is, as it is being drawn. Color information for a character is NOT buffered anywhere. To do this quickly requires a carefully optimized algorithm. It works by stepping forward through the text and switching modes depending on whether it has found the boundary of a word or a different keyword set. Speed is assisted by caching the first letter of each keyword.

Highlighting text in this way is a novel approach, but has certain limitations - it would be too slow to support regular expressions. It also does not support case insensitive keywords. It has the advantage that there is never any highlighting `lag'. The algorithm is also transparent to line breaks. Hence, for example, a C style quote or comment can cross many lines and will be highlighted correctly.

The code parses most program text at over 300kB per second on a PII300. Screen refreshes are therefore instantaneous in most circumstances.

ON-THE-FLY SPELL CHECKING
-------------------------

It is interesting that this is an extremely easy feature to add.
Ispell allows for interaction with other programs using its `-a' option. The author of ispell may have intended for this to be used from within dialogs, say from a `Spellcheck' menu option. However, on a fast enough system, there is no reason why words cannot be continuously fed to ispell. It is in fact so easy to implement that the author encourages every other interactive application that processes text in any way whatsoever to add this feature.

Spell-checking within program comments is also implemented. The spell-check code merely checks whether the `spellcheck' option has been enabled for the particular syntax highlighting context. Outside of, say, comments and string constants, spell-checking is disabled.

The main problem with spell checking had most to do with how misspelled words were underlined. Either proper text styles (used by other editors) needed to be implemented, or else Cooledit had to use the existing syntax code. The latter approach was considered more expedient. Misspelled words are dynamically added to the list of keywords in the syntax rule set, but are underlined instead of colorized. This has the interesting effect that a misspelled word will be highlighted everywhere in the buffer so long as your cursor has passed over it at least once. To prevent an excess of keywords (which would slow down the editor), keywords are removed from the rule set after aging one minute - which has the benefit of preventing the display from becoming cluttered with too many misspelled words.

The one deficiency of Cooledit's ispell interaction is that it is not asynchronous. It has not been tested with very large ispell dictionaries or on slow machines, so this may be an area worth optimizing.

(On-the-fly spell checking was defiantly added after the author read a quote by a prominent figure in the computing world, that Unix systems did not have spell-check-as-you-type.)
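
To give an idea of how little is involved in talking to ispell through its `-a' option, here is a minimal sketch in Python. It is not Cooledit's code (which is written in C); the function names parse_ispell_reply and check_words are hypothetical. It assumes the reply format described in the ispell manual page: one reply line per word, where `*', `+' and `-' mean the word was accepted, `&' and `?' carry a list of guesses after a colon, and `#' means no guesses were found. It also assumes every word passed in is a single alphabetic token, so that ispell answers each input line with exactly one reply. Like the interaction described above, the sketch is synchronous.

```python
import subprocess


def parse_ispell_reply(line):
    """Classify one non-blank reply line from `ispell -a'.

    Returns (ok, suggestions): ok is True if the word was accepted,
    suggestions is a (possibly empty) list of guesses.
    """
    flag = line[:1]
    if flag in ("*", "+", "-"):
        # Found in the dictionary, via a root word, or as a compound.
        return True, []
    if flag in ("&", "?"):
        # Format: "& word count offset: guess1, guess2, ..."
        _, _, guesses = line.partition(": ")
        return False, [g.strip() for g in guesses.split(",") if g.strip()]
    if flag == "#":
        # Not found, and ispell has nothing to suggest.
        return False, []
    raise ValueError("unrecognised ispell reply: %r" % line)


def check_words(words, program="ispell"):
    """Feed words to `ispell -a' and return {word: (ok, suggestions)}.

    Assumes each word is one alphabetic token (see the note above).
    """
    # A leading `^' stops ispell treating the line as a command.
    text = "".join("^%s\n" % w for w in words)
    result = subprocess.run([program, "-a"], input=text,
                            capture_output=True, text=True, check=True)
    lines = result.stdout.splitlines()[1:]      # skip the version banner
    replies = [parse_ispell_reply(l) for l in lines if l]
    return dict(zip(words, replies))
```

On a system with ispell installed, check_words(["hello", "wrold"]) would return a dictionary whose values say whether each word was accepted and which replacements ispell proposes. An editor would keep the ispell process running rather than starting it for every batch of words, as this sketch does for simplicity.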

BUILTIN PYTHON INTERPRETER
--------------------------

Python features standard procedures for building itself into high level applications, and for creating interfaces to C functions. Cooledit can be built without the Python interpreter, but then lacks any Python extensions - this is mostly for non-Linux systems, where administrators do not wish to install the large Python sources.

Besides basic editor operations, wrappers were created to allow users to access some of the Coolwidget library. Dialog boxes can be created from within Python in the same style as the rest of the application. In this way, Cooledit can actually be used to create simple GUI applications. Users also have the benefit of being able to add to or remove from any of the existing menus, and can hence program any kind of customization.

The Python interpreter is initialized on startup, and runs the scripts lib/cooledit/global.py and ~/.cedit/global.py. One of these scripts must define a function type_change(s), which is run whenever a file is opened or its `type' (meaning the syntax highlighting rule set being used) is changed. `s' is the string you would normally see displayed on the left of the editor window which describes the programming language: it is taken from the syntax definitions. The function can use this to decide what new utilities to make available. The C extensions currently look like this:

    def type_change(s):
        menu ("Util")           # clear the Util menu
        # add new stuff to the util menu:
        if s == "C/C++ Program":
            menu ("Util", "for(;;) {", "c_generic('for (;;) {', 5)")
            menu ("Util", "while() {", "c_generic('while () {', 7)")
            menu ("Util", "do {", "c_do_while()")
            menu ("Util", "switch() {", "c_generic('switch () {', 8)")
            menu ("Util", "case:", "c_case()")
            menu ("Util", "if() {", "c_generic('if () {', 4)")
            menu ("Util", "main() {", "c_main()")
            menu ("Util", "#include ", "c_include()")
            menu ("Util", "printf();", "c_printf()")

Functions such as c_printf() are defined further above in the script.
The C example shows how utilities may typically be coded, and serves as a tutorial.

Basic operations like moving through the buffer, editing the buffer, and returning status information are provided. The Python wrappers make liberal use of Python's optional argument feature. For example, the get_line() function returns the current line when called with no arguments, a single line with one argument, and a range of lines with two arguments.

FUTURE DIRECTIONS
-----------------

The potential that Python gives Cooledit is enormous. It is hoped that users will contribute Python utilities just as they have contributed syntax rule sets. The Python interpreter will allow Cooledit development to subside somewhat. The C side of Cooledit is considered substantially complete.

Some users have asked for a Gtk interface to Cooledit. A full Gtk interface is improbable because of the heavy reliance Cooledit makes on the Coolwidget library. The Midnight Commander for Gnome does however implement a minimal version of Cooledit under Gtk, and it is hoped that this will be extended to offer more features.