DESIGN OF "i"
By Howard Trickey (howard@research.bell-labs.com)

Most of the data structures used in "i" are declared in i.h. That file has a number of sections, headed by comments (// URL, //STRINTTAB, //IUTILS, etc.) giving the name of the c file that implements the corresponding routines.

The c files and their purposes are:

assert.c - for aborting when "this can't happen"
build.c - parses HTML and converts to "build" format
event.c - construction and debug print for "events"
file.c - fetch code specific to FILE: protocol
ftp.c - fetch code specific to FTP: protocol
gc.c - tables and code to allow walking of data structures by a general "live data" collector
gui.c - code to deal with window system: finding and maintaining the windows that are drawn in, reading the keyboard and mouse and delivering those results as events to the main program
http.c - fetch code specific to HTTP: protocol
i.c - main program: main event loop; code for placing and dealing with the main user interaction elements; code for initiating a "goto" of a new url (by calling layout()); code to maintain history
icons.c - arrays containing user-interface icons
img.c - converts jpg and gif files to draw images, including dithering (if needed)
iutils.c - many routines needed by more than one of the other modules; image cache code; user options (configuration) code; common code to deal with getting entities via the various protocols (only a little bit of custom code needs to live in the separate files file.c, ftp.c, http.c)
layout.c - lays out one page, using iutils to fetch "sources" (html, images), then using build and/or img routines to get them into internal form, then laying out in the available space. Also has code for drawing each of the user interface "controls" (entry boxes, combo boxes, etc.)
lex.c - lexical analysis of html, used by build
strinttab.c - string-to-int lookup routine
transport.c - tables giving pointers to implementations of various parts of the entity fetching logic needed for each of the transport mechanisms (file, ftp, http)
url.c - implements a "url" datatype, with various utilities to operate on them
utils.c - memory, list, and string utilities

STARTUP

When "i" starts up, it first calls meminit(), to set things up so that emalloc() and emallocz() will allocate out of a big "main group" pool. This is the pool that should hold long-lasting data. In general, emalloc() and emallocz() allocate out of a pool specifically dedicated to the current process.

Next, "i" calls initdraw() to initialize Plan 9's graphics. Then it calls iutilsinit(), which initializes all the other "modules" (each, by convention, has an xxxinit() function), and processes the command line and the user's configuration file (/usr/username/lib/iconfig) for the startup url and various user-configurable options.

Next, "i" calls guiinit(), which finds the containing rio window and divides that space into three regions: "control" (top bar, with buttons, etc.), "main" (where the page is drawn), and "prog" (where the progress bar goes). It also starts up two processes to read the mouse and keyboard and pass on mouse and keyboard "events" along an evchan channel (those events are processed in the main event loop, see below).

Next, "i" creates a "netget" thread (see netget in iutils.c). Netget acts as a centralized control for fetching all of the entities needed to display a page. It communicates with the rest of "i" via channels. Netget runs forever.

Next, "i" creates a "go" thread, whose purpose is to sit around waiting for commands to "go" somewhere, and act on them by calling layout. The "go" thread runs forever.

Finally, "i" enters a main loop, where it continually reads the event channel and acts on those events.
Many of the events can result in a new place to "go" once they have been acted on (e.g., a mouse event might be a click over a link). So, after each event, we check to see if there is a new place to go, and if so, we abort the current "go" and then send a new one along the go channel, to be acted upon by the "go" thread.

"i" exits when a quit event is received.

TYPICAL "GO" PROCESSING

Drawing is done inside a "Frame" (see i.h). The whole main drawing window is a Frame, and if the HTML document is a frameset, then the main Frame will contain kid Frames. The first job of "i" is to find out which Frame a given "go specification" refers to; it then calls get() to actually fetch and render the contents of that Frame. get() will be called recursively if the contents are a frameset. get() also deals with redirections, authorization challenges, and notifying the user of fetch errors if the main entity for a page cannot be fetched.

Fetching of entities (html, images, eventually things like scripts and style sheets) is organized around the concept of a ByteSource (see i.h). This is a structure with one big buffer that will eventually contain all of the bytes of the entity, and two indices: edata is set by the producer (transport mechanism) to say how much of the buffer is valid; lim is set by the consumer (lexer or image converter) to say how much has been consumed so far. This design (of one big buffer) was the result of several iterations, and seems better than a linked-list or partial-buffer approach. The ByteSource also has a Header structure that contains information that the transport mechanism knows from its initial handshake with the other end.

The get() routine starts its fetching process by packing up (as a ReqInfo) all of the data needed to fetch the entity, and calling startreq() with that as argument. The result is a ByteSource, and if its err field is not set, the netget thread is busy trying to fill it.
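
The single-buffer ByteSource described above can be illustrated with a small sketch in portable C. This is not the declaration from i.h: the names edata, lim, and err follow the text, but the data and cap fields, the helper names bs_produce(), bs_avail(), and bs_consume(), and the use of realloc() in place of emalloc() are assumptions made for illustration. The Header and all channel notification are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, pared-down ByteSource: one growing buffer, two indices. */
typedef struct ByteSource {
    unsigned char *data;  /* the one big buffer (assumed field name) */
    size_t cap;           /* allocated size (assumed field) */
    size_t edata;         /* set by producer: bytes [0, edata) are valid */
    size_t lim;           /* set by consumer: bytes [0, lim) are consumed */
    const char *err;      /* non-NULL once the fetch has failed */
} ByteSource;

/* Producer side: append n bytes, growing the buffer when needed.
 * Returns 0 on success, -1 (and sets err) if memory runs out. */
static int
bs_produce(ByteSource *bs, const void *p, size_t n)
{
    if (n == 0)
        return 0;
    if (bs->edata + n > bs->cap) {
        size_t ncap = bs->cap ? bs->cap : 64;
        unsigned char *nd;

        while (ncap < bs->edata + n)
            ncap *= 2;
        nd = realloc(bs->data, ncap);
        if (nd == NULL) {
            bs->err = "out of memory";
            return -1;
        }
        bs->data = nd;
        bs->cap = ncap;
    }
    memcpy(bs->data + bs->edata, p, n);
    bs->edata += n;
    return 0;
}

/* Consumer side: how many valid bytes have not been consumed yet. */
static size_t
bs_avail(const ByteSource *bs)
{
    assert(bs->lim <= bs->edata);
    return bs->edata - bs->lim;
}

/* Consumer side: mark up to n bytes as consumed. Returns a pointer to
 * the first of them and stores the actual count in *got. The pointer
 * is only good until the next bs_produce(), which may move the buffer. */
static const unsigned char *
bs_consume(ByteSource *bs, size_t n, size_t *got)
{
    size_t avail = bs_avail(bs);
    const unsigned char *p = bs->data ? bs->data + bs->lim : NULL;

    if (n > avail)
        n = avail;
    bs->lim += n;
    *got = n;
    return p;
}
```

Because the consumed bytes are never discarded, a consumer that needs to back up (a lexer that has seen only part of a tag, say) can do so by simply not advancing lim, which is one plausible reason a single buffer is easier to work with than a list of partial buffers.
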
Next, get() calls waitreq(), which waits for notification of a change in any active ByteSource. As there is only one, it should be the one just returned by startreq(). Either an error has occurred or the Header is now filled, and get() can decide whether to handle a challenge, follow a redirection, show an error, or, if all is OK, start layout.

If all is OK, it calls layout() with the Frame and ByteSource as arguments. layout() is organized around processing a list of "Source"s. The list is initialized with the passed-in ByteSource, and as processing continues, things (images) get added to it. Each source has a type and a ByteSource. The main loop of layout goes until all the sources are "done" (either processed completely, or some error happened; in either case, the ByteSource resources are cleaned up). The loop waits for some ByteSource to have a state change, and acts on that change. For instance, if an HTML source gets more data, we call getitems() (in build.c) to convert whatever it can into the internal form used by the layout engine. If an image source gets more data, the image converter is called (currently the converter only does something if all of the data is there, but the structure is there for incremental handling).

The internal form of an HTML document is a list of Items (see i.h). An Item is actually a kind of variant data type, simulated in C. There are a number of common data fields (defined in the Item structure itself), but then there is a tag, and depending on the value of the tag, the Item can be cast to one of the particular Item types (Itext, etc.) to get the remaining fields. This structure came from Charon, where space was at a premium. I think it is still good, to save on the number of memory allocations needed, but perhaps it is a little less necessary in the Plan 9 world. The items have generic "state" bits that come from the layout needs dictated by the HTML spec and conventions.
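
The variant-by-cast arrangement for Items can be sketched as follows. Only the names Item, Itext, and IFbrk come from this document; the tag names, the field names, the Iimage type, the constructor names, and the numeric value of IFbrk are assumptions for illustration, and the real declarations in i.h may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Tag values: names and numbering are assumed for this sketch. */
enum { Itexttag, Iimagetag };

/* Generic state bits; the value of IFbrk is assumed. */
enum { IFbrk = 1 << 0 };

/* Fields common to every kind of item. */
typedef struct Item Item;
struct Item {
    Item *next;                 /* items form a list */
    int width, height, ascent;  /* filled in when the item is measured */
    unsigned state;             /* generic state bits such as IFbrk */
    int tag;                    /* says which variant this really is */
};

/* Each variant starts with the common Item, so a pointer to the
 * variant and a pointer to its Item are interchangeable by cast. */
typedef struct Itext {
    Item item;
    const char *s;              /* the text (assumed field) */
} Itext;

typedef struct Iimage {
    Item item;
    const char *src;            /* image url (assumed field) */
} Iimage;

/* One allocation holds both the common and the variant fields. */
static Item *
newitext(const char *s, unsigned state)
{
    Itext *t = calloc(1, sizeof *t);

    if (t == NULL)
        return NULL;
    t->item.tag = Itexttag;
    t->item.state = state;
    t->s = s;
    return &t->item;
}

static Item *
newiimage(const char *src)
{
    Iimage *im = calloc(1, sizeof *im);

    if (im == NULL)
        return NULL;
    im->item.tag = Iimagetag;
    im->src = src;
    return &im->item;
}

/* Check the tag before casting; returns NULL for non-text items. */
static const char *
itemtext(const Item *it)
{
    assert(it != NULL);
    if (it->tag != Itexttag)
        return NULL;
    return ((const Itext *)it)->s;
}
```

Code that only needs the common fields walks the list through Item pointers and never looks at the variant part. The state field is where the generic layout bits live.
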
For instance, if the IFbrk state bit is set on an item, there is a forced line break before that item.

The appenditems() routine in layout.c is the main routine for taking the Item list from build() and using it to place items where they belong on the page. The representation of how the page should look is kept in a "Lay" structure (see i.h), which organizes the items into a doubly-linked list of "Line" structures. Each line has a list of Items (the Items from parsing are distributed into lines). Before adding items to lines, the measure() routine does as good a job as it can at measuring the height, width and ascent (amount above baseline) of the items.

The heart of the layout algorithm is the fixlinegeom() routine. Its main job is to break long lines and, as a serious complication, to deal with images and tables that are supposed to "float" on the left or right margins. The floats are kept in a list, and we have to determine when it is time to put a float on the margin, and how that affects the current line width for breaking purposes.

The other complicated part of layout has to do with table layout. The sizetable() routine does much of the work. It needs to make sublayouts of its own, and thus calls additems inside another Lay structure.

The final part of layout is to actually draw the lines. The drawall() routine in layout.c does this. It is pretty straightforward, just keeping track of the current position and walking the lines, using the calculated dimensions to update the current position and drawing the items using code specific to each type. The most complicated things to draw are tables (which use the same code recursively) and form fields, which are drawn by drawctl(). That routine is complicated mainly because it draws all of the borders, buttons, comboboxes, etc., using primitive draw operations. We specifically avoid heavyweight "embedded windows".
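
To illustrate the line-breaking part of layout described earlier, here is a deliberately simplified sketch in portable C. It is not fixlinegeom(): it ignores floats, tables, and vertical geometry, and just distributes already-measured items into lines by a greedy rule, honoring the IFbrk forced-break bit. The pared-down Item, the function name breaklines(), and the value of IFbrk are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { IFbrk = 1 << 0 };  /* forced break before the item; value assumed */

/* Pared-down stand-in for a measured item. */
typedef struct Item {
    int width;
    unsigned state;
} Item;

/* Greedy line breaking: line[i] receives the 0-based index of the line
 * that items[i] lands on. A new line is started when an item has IFbrk
 * set, or when it would not fit in what is left of the current line.
 * An item wider than linewidth gets a line to itself.
 * Returns the number of lines used (0 if there are no items). */
static int
breaklines(const Item *items, size_t n, int linewidth, int *line)
{
    int cur = 0;   /* index of the line being filled */
    int used = 0;  /* width already taken on that line */
    size_t i;

    assert(linewidth > 0);
    for (i = 0; i < n; i++) {
        int forced = (items[i].state & IFbrk) != 0;
        int overflow = used > 0 && used + items[i].width > linewidth;

        if (i > 0 && (forced || overflow)) {
            cur++;
            used = 0;
        }
        line[i] = cur;
        used += items[i].width;
    }
    return n > 0 ? cur + 1 : 0;
}
```

Handling floats would mean that linewidth is no longer a constant: each float placed on a margin narrows the space available to the lines beside it, which is the complication the text attributes to fixlinegeom().
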