NOTES ON "i"

By Howard Trickey (howard@research.bell-labs.com)

"i" is an unfinished Web browser for Plan 9. This note describes the current state (as of June 2000), what I think needs to be done, and some description of how it works.

HISTORY

"i" started life as charon, the web browser I wrote for Inferno. I converted the Limbo source to C using a program based on the Limbo compiler (it walked the parse tree, emitting plausible C as it went), but there was much manual work needed afterwards because I wanted it to look like good C code as I would have written it from scratch, rather than something that used a Limbo emulation library all over the place.

TODO

There are three categories of things that need to be done: (1) fix memory allocation (it is never freed right now); (2) fix layout and protocol bugs; (3) add functionality. I advise doing them in that order. Perhaps, rather than fixing the protocol bugs, a new protocol should be designed with the needs of acme integration in mind.

MEMORY ALLOCATION

The biggest problem in converting to C involves memory management. The obvious but tedious way is to write freeing code, put in reference counts where necessary, and be sure to free exactly and wherever necessary. I decided to try something different. The state inside "i" is in two categories: long-term state, and per-page state (where "page" means what you get if you load some URL; maybe an HTML document, maybe a picture, maybe a frame containing several HTML documents). Most of the state is of the per-page kind, so I decided on this strategy:

Most memory allocation is done using an "emalloc" that just takes hunks of memory out of a per-process pool, with the intention that one will not free these pieces individually, but rather, will free the whole pool in bulk at various infrequent points (e.g., when one goes to a new URL).

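The pool strategy above can be illustrated with a minimal sketch. This is not the allocator in "i"; the names Pool, Chunk, poolalloc, poolfreeall, CHUNKSIZE and ALIGN are all hypothetical, and the sketch assumes that malloc returns memory aligned to at least ALIGN bytes. Pieces are carved out of large chunks and never freed individually; the whole pool is released in one call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; a sketch, not the code in "i". */
enum { CHUNKSIZE = 64*1024, ALIGN = 16 };

typedef struct Chunk Chunk;
struct Chunk {
	Chunk *next;	/* older chunks in the same pool */
	size_t used;	/* bytes handed out from this chunk */
	size_t cap;	/* usable bytes following the header */
};

typedef struct Pool Pool;
struct Pool {
	Chunk *chunks;	/* most recently allocated chunk first */
	size_t nchunks;
};

/* Round n up to a multiple of ALIGN. */
static size_t
alignup(size_t n)
{
	return (n + ALIGN-1) & ~(size_t)(ALIGN-1);
}

/* Take n zeroed bytes from the pool; exit if memory runs out. */
void*
poolalloc(Pool *p, size_t n)
{
	Chunk *c;
	size_t cap;
	void *v;

	assert(p != NULL);
	n = alignup(n ? n : 1);
	c = p->chunks;
	if(c == NULL || c->cap - c->used < n){
		/* current chunk cannot hold n bytes: start a new one */
		cap = n > CHUNKSIZE ? n : CHUNKSIZE;
		c = malloc(alignup(sizeof *c) + cap);
		if(c == NULL){
			fprintf(stderr, "poolalloc: out of memory\n");
			exit(1);
		}
		c->used = 0;
		c->cap = cap;
		c->next = p->chunks;
		p->chunks = c;
		p->nchunks++;
	}
	v = (unsigned char*)c + alignup(sizeof *c) + c->used;
	c->used += n;
	memset(v, 0, n);
	return v;
}

/* Release every chunk at once, e.g. when going to a new URL. */
void
poolfreeall(Pool *p)
{
	Chunk *c, *next;

	for(c = p->chunks; c != NULL; c = next){
		next = c->next;
		free(c);
	}
	p->chunks = NULL;
	p->nchunks = 0;
}
```

A caller would keep one Pool for per-page state, allocate from it freely while loading a page, and call poolfreeall when the page is abandoned. The sketch wastes the tail of a chunk whenever a request does not fit; a real allocator might search older chunks first.
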
The main problem with this strategy is figuring out how to deal with the fact that sometimes one wants to allocate long-term state from within a process that is mostly allocating short-term state; or, one wants to convert short-term state into long-term state for various reasons (e.g., caching). My design for handling this was to have a special purpose "garbage collector"; really, "live stuff collector", that knows about the structure of all my datatypes and the roots of live data, and is able to walk to collect and copy stuff from short-term to long-term pool.

Unfortunately, I did not finish all the work needed for this design. The current code does the allocation from pools, but not the short-term to long-term copy, so I just never free anything (this must be fixed!). The file gc.c contains code for doing the walking, based on hand-built data structure tables, but I'm not sure the tables are completely up to date. And I didn't do the work to record the roots of live data that need to be transferred.

I would like to see my design carried through, because I think it would have advantages in speed and robustness (less easy to make programming errors re memory) once properly in place. But it may turn out that I am wrong, and that a more traditional keep-track-of-and-free approach is needed.

BUG FIXES NEEDED

1. Layout bugs. You don't have to use "i" very much before you run across layout bugs. There seem to be a number related to table layout. Table layout is a big pain, for several reasons: (a) the sizing algorithm is complicated; (b) it gets more complicated when you want to be efficient and avoid the need to redo size calculations over & over for subtables; (c) the extant browsers behave unexpectedly and differently, especially in the face of table specifications that are impossible to fulfil (e.g., because the sum of specified column widths and padding of some subpiece is bigger than the specified size of some enclosing column).
As distasteful as it is, one has to emulate the behavior of, say, IE5, whether it is "correct" or not, or else many pages will look nonsensical.

2. Internal protocol bugs. There is an internal protocol (expected sequence of calls, callbacks) among the pieces of "i". My design was intended to be such that there would be no need for "killing" any processes or threads, but rather, an orderly system of messages over channels combined with flag setting, so that each thread and process would know when to exit and/or when any threads and processes it controlled exited. There is one exception to that: if a "network fetching" process is stuck in a system call waiting for some network response, it may have to be "killed" (sent a note) to bump it out of that system call, but after that it is supposed to exit cleanly.

In charon, I used a system of precipitous kills to stop the processing of a page if the user hit "stop" or some other URL link. This was never very satisfactory, and seemed even less so in the C/Plan 9 world, because of the difficulties of stopping deadlocks due to channel communication where one half goes away at random times. So "i" went to the "orderly die" system alluded to above. Unfortunately, it is very delicate to get exactly right, and if wrong, leads to a deadlock situation. There are occasions where that happens with "i" right now, showing that there are bugs in my design with respect to this protocol.

They will probably be hard to track down and fix. Probably it would be best to attempt to document how and why the design is supposed to work. I made an attempt to do that with a Promela model of the communication, in /usr/howard/i/i.promela, but I don't remember how up-to-date it is.

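The "orderly die" idea can be sketched in a few lines. What follows is only an illustration, not the protocol in "i": it substitutes POSIX threads for Plan 9 threads and processes, builds a one-slot channel from a mutex and condition variable, leaves out the flag setting, and every name in it (Chan, Worker, MsgWork, MsgStop, MsgDone, runorderly) is hypothetical. The controller never kills the worker; it sends a stop message, and the worker acknowledges and then exits by itself.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names throughout; a sketch, not the protocol in "i". */
enum { MsgWork, MsgStop, MsgDone };

typedef struct Chan Chan;
struct Chan {
	pthread_mutex_t lk;
	pthread_cond_t cv;
	int full;	/* 1 while msg holds an unreceived message */
	int msg;
};

static void
chaninit(Chan *c)
{
	pthread_mutex_init(&c->lk, NULL);
	pthread_cond_init(&c->cv, NULL);
	c->full = 0;
}

static void
chanfree(Chan *c)
{
	pthread_cond_destroy(&c->cv);
	pthread_mutex_destroy(&c->lk);
}

/* Block until the slot is empty, then deposit msg. */
static void
chansend(Chan *c, int msg)
{
	pthread_mutex_lock(&c->lk);
	while(c->full)
		pthread_cond_wait(&c->cv, &c->lk);
	c->msg = msg;
	c->full = 1;
	pthread_cond_broadcast(&c->cv);
	pthread_mutex_unlock(&c->lk);
}

/* Block until a message is present, then take it. */
static int
chanrecv(Chan *c)
{
	int msg;

	pthread_mutex_lock(&c->lk);
	while(!c->full)
		pthread_cond_wait(&c->cv, &c->lk);
	msg = c->msg;
	c->full = 0;
	pthread_cond_broadcast(&c->cv);
	pthread_mutex_unlock(&c->lk);
	return msg;
}

typedef struct Worker Worker;
struct Worker {
	Chan in;	/* controller to worker */
	Chan out;	/* worker to controller */
	int ndone;	/* work items handled; controller reads it after MsgDone */
};

static void*
workermain(void *v)
{
	Worker *w = v;

	for(;;){
		switch(chanrecv(&w->in)){
		case MsgWork:
			w->ndone++;
			break;
		case MsgStop:
			/* acknowledge, then exit of our own accord */
			chansend(&w->out, MsgDone);
			return NULL;
		}
	}
}

/* Send nwork items, ask the worker to stop, wait for its acknowledgement. */
int
runorderly(int nwork)
{
	Worker w;
	pthread_t t;
	int i, reply;

	chaninit(&w.in);
	chaninit(&w.out);
	w.ndone = 0;
	if(pthread_create(&t, NULL, workermain, &w) != 0)
		return -1;
	for(i = 0; i < nwork; i++)
		chansend(&w.in, MsgWork);
	chansend(&w.in, MsgStop);
	reply = chanrecv(&w.out);
	pthread_join(t, NULL);
	assert(reply == MsgDone);
	chanfree(&w.in);
	chanfree(&w.out);
	return w.ndone;
}
```

Because the stop request travels on the same channel as the work, the worker sees it only after all earlier messages, so neither side is left blocked on a channel whose other half has vanished. The delicate cases mentioned above arise once there are several channels and several parties, which this sketch does not attempt to model.
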
One thing to bear in mind is that sometimes the reason the design is supposed to work has to do with what are threads and what are processes --- things that are in the same process, but different threads, need less interlocking of access to global variables, and there are other assumptions that can be made about which messages are possible at which times.

FUNCTIONALITY NEEDED

The following things are still needed to make "i" a usable web browser, in my estimated order of importance.

1. Animated gifs. The work is done to accumulate the frames, but not to display them in sequence (the framework for doing so is there, however, as adopted from charon).

2. Javascript (& Document Object Model). This is an enormous amount of work, and should probably be done last, even though it is, unfortunately, quite important. Sean Dorward (sean@research.bell-labs.com) wrote a Javascript interpreter in Limbo, and I was using that in charon, but that was only the tip of the iceberg. Getting a match between charon's internal structure and the document & browser models assumed by extant web pages proved daunting. When I looked at it, many web pages checked to see whether the browser was Internet Explorer (version 3 or 4) or Netscape Navigator (version 3 or 4), and did up to four different versions of code to handle each possibility --- and if you were something else, forget it, they just ignored you. So I tried to emulate Netscape 3, none too successfully. One of the problems was the assumption that there could be many top-level windows, which charon didn't have because it was aimed at webphones, among other things. If the "i" engine gets integrated into acme, perhaps there will be a cleaner fit. And maybe the browser landscape is cleaner now: just aiming to emulate IE5 should be sufficient (if that is possible!).
Unfortunately, the later-version browsers require renderers that are more complicated than the current charon/"i" one: in particular, there is a need for absolute and/or relative positioning of elements (but then, none of the commercial browsers do this very well and they all disagree with each other, so maybe this feature is not often used).

The files jscript.c and script.h in this directory are the raw output of my Limbo-to-C converter on the charon-side part of the charon javascript implementation. They are probably useless. The majority of the javascript code, the interpreter, has not even been run through the converter. I could help with this, if desired.

3. Support for "basic authorization" (code is there, except ability to make popup dialogs, which is partially there but needs some support in gui.c).

4. Implement "Save As".

5. SSL support (for https: URLs). My charon browser had an implementation of the client side of SSL/2, which was sufficient for this purpose. I ported much of the code to C, but didn't finish. Paul Glick (pg@research.bell-labs.com) finished it up, so it should just be a matter of integrating that code in.

6. HTTP version 1.1, with persistent connections. There was an attempt in the code to get ready to do this, but it wasn't completed. I don't know how important this is. I have run across one site that refused to do HTTP version 1.0 any more.

7. CSS (Cascading Style Sheet) support. Would be nice. Goes along with the Javascript/Document Object Model thing.

8. XML/XSL support. I would like to see this too, but this is very far out there. It does seem to be the coming thing.