INSTALLATION The file chem invokes chem.awk, which is where the dirty work gets done. chem.awk tells pic to include a copy of chem.macros; you will need to change a pathname on the 2nd line of chem.awk. The file INSTALLATION can be used as a guide. There are several test files called *.p. INTRODUCTION "chem" is yet another preprocessor like eqn, pic, etc., this time for producing chemical structure diagrams. Today's version is best suited for organic chemistry (bonds, rings) and it's excruciatingly slow sometimes. Who knows what the future may hold. In a style reminiscent of eqn and pic, diagrams are written in a special language and occur in a document surrounded by lines beginning .cstart and .cend (in the first column, naturally). Anything outside these is copied through intact; whatever is between .cstart and .cend is converted into pic commands to draw the diagram. So as a bare minimum, .cstart CH3 bond CH3 .cend prints two CH3 groups with a bond between them. To actually print this, you must run chem, pic, troff, and your output filter on whatever file contains the input: chem [file...] | pic | troff ... | ... THE LANGUAGE The chem input language is rather small. It provides bonds of several styles, moieties (e.g., C, NH3, ...), rings of several styles, and a way to glue them together as desired. In addition, since chem is a pic preprocessor, it's possible to include pic statements in the middle of a diagram to draw things not provided for by chem itself. Bonds: bond [direction] [length n] [from Name | picstuff] draws a single bond in direction from nearest corner of Name "bond" can also be double bond, front bond, back bond, etc. (We'll get back to "Name" in a minute.) "direction" is the angle in degrees (0 up, positive clockwise) or a direction word like up, down, sw (= southwest), etc. If no direction is specified, the bond goes in the current direction (usually that of the last bond). Normally the bond begins at the last object placed; this can be changed by naming a "from" place. For instance, to make a simple alkyl chain: CH3 bond (this one goes right from the CH3) C (at the right end of the bond) double bond up (from the C) O (at the end of the double bond) bond right from C CH3 A length in inches may be specified to override the default length. Other pic commands can be tacked on to the end of a bond command, to created dotted or dashed bonds or to specify a "to" place. Names: In the alkyl chain above, notice that the carbon atom C was used both to draw something and as the name for a place. A moiety always defines a name for a place; you can use your own names for places instead, and indeed, for rings you will have to. A name is just Name: ... "Name" is often the name of a moiety like CH3, but it needn't be. Any name that begins with a capital letter and contains only letters and numbers is ok: First: bond bond 30 from First draws something like / ____/ Moieties: A moiety is a string of characters beginning with a capital letter, such as N(C2H5)2. Numbers are converted to subscripts (unless they appear to be fractional values, as in N2.5H). The moiety names itself after special characters have been stripped out: N(C2H5)2) has the name NC2H52. BP is a special "branch point" (i.e., line crossing) that doesn't print. Normally a moiety is placed right after the last thing mentioned, but it may be positioned by pic-like commands, e.g., CH3 at C + (0.5,0.5) Text within quotes "..." is treated more or less like a moiety except that no changes are made to the quoted part. Rings: There are lots of rings, but only 5 and 6-sided rings get much support. "ring" by itself is a 6-sided ring; "benzene" is the benzene ring with a circle inside. "aromatic" puts a circle into any kind of ring. ring [pointing up|right|left|down] [aromatic] [put Mol at n] [double i,j k,l ...] [picstuff] The vertices of a ring are numbered 1,2,... from the vertex that points in the natural compass direction. So for a hexagonal ring with the point at the top, the top vertex is 1, while if the ring has a point at the east side, that is vertex 1. This is expressed as R1: ring pointing up R2: ring pointing right The ring vertices are named .V1 .. .Vn, with .V1 in the pointing direction. So the corners of R1 are R1.V1 (the "top"), R1.V2, R1.V3, R1.V4 (the "bottom"), etc., whereas for R2, R2.V1 is the rightmost vertex and R2.V4 the leftmost. These vertex names are used for connecting bonds or other rings. For example: R1: benzene pointing right R2: benzene pointing right with .V6 at R1.V2 creates two benzene rings connected along a side. Interior double bonds are specified as "double n1,n2 n3,n4 ..."; each number pair adds an interior bond. So the alternate form of a benzene ring is ring double 1,2 3,4 5,6 Heterocycles (rings with something other than carbon at a vertex) are written as "put X at V", as in R: ring put N at 1 put O at 2 In this heterocycle, R.N and R.O become synonyms for R.V1 and R.V2. There are two 5-sided rings. "ring5" is pentagonal with a side that matches the 6-sided ring; it has four natural directions. A "flatring" is a 5-sided ring created by chopping one corner of a 6-sided ring so that it exactly matches the 6-sided rings. The description of a ring has to fit on a single line. Miscellaneous: The specific construction bond... ; moiety (spaces matter!) is equivalent to bond moiety Otherwise, each item has to be on a separate line (and only one line). A period "." in column 1 signals a troff command, which is copied through as is. A line whose first non-blank character is a # is treated as a comment. A line whose first word is "pic" is copied through as is after the "pic" has been removed. The command size n scales the diagram so it looks plausible at point size n (default is 10 point). Anything else is assumed to be pic and is copied through with a label. WISH LIST It's too slow (but it's too early in the game to optimize). Error checking is minimal; errors are usually detected and reported in an oblique fashion by pic. There's no library or file inclusion mechanism, and there's no shorthand for repetitive structures. The extension mechanism is to create pic macros, but these are tricky to get right and don't have all the properties of built-in objects. There's no in-line chemistry yet (e.g., analogous to the $...$ construct of eqn). There is no way to control entry point for bonds on groups. Normally a bond connects to the carbon atom if entering from the top or bottom and otherwise to the nearest corner. Bonds from substituted atoms on heterocycles do not join at the proper place without adding a bit of pic. There is no decent primitive for brackets. Text (quoted strings) doesn't work very well. A squiggle bond is needed. COMPLAINTS If something doesn't work, or if you can see a way to make something better, let us know. jon bentley lynn jelinski brian kernighan Benjamin Newman (Plan 9 porter)