.so tmac.ppt .de X1 .br .ne 2.5i .mk .P1 .ps 20 .vs 22 .. .de X2 .P2 .rt .P1 .in +3i .ps 20 .vs 22 .. .de X3 .P2 .. .TL .sp -2 \f(CWrscc\fP: an extensible compiler .AU .nf Russ Cox Frans Kaashoek Eddie Kohler .sp 2 .ft R PDOS Group Meeting .br January 24, 2006 .sp 2 .CW http://swtch.com/~rsc/talks/ ... .SL "Problem (big picture) Programs are buggy. Because programming languages don't work for you. You work for them. How do you resolve mismatch between programming task and language? .SL "Outline Problem \- 5 slides Specific Extensions \- 10 slides How to Implement \f(CWrscc\fP \- 10 slides \- parser \- internal representation Please interrupt with questions. .SL "Good Languages? .nf Some languages make some bugs easier to avoid / find. \- good abstractions \- type systems Never good enough. People extend languages any way they can. \- C preprocessor (easy, but rarely good) \- custom preprocessors (good, but rarely easy) \- custom checkers .SL "Useful Preprocessors .nf Click (Kohler) \- language to describe module properties, router configs Tame (Krohn) \- more convenient syntax for libasync \- (Even \f2I\fP can write tame programs!) Ace (Gosling) \- syntax-based preprocessor, used for bitblt QT Rewriter - used for simpler event syntax in QT library .SL "Useful Checkers .nf Magik (Engler) \- checking of flow-sensitive conditions, using xgcc \- more recent work about guessing things to check Uno (Holzmann) \- another static checker, actually available. Sparse (Torvalds) \- add annotations to Linux source for lock/unlock pairing, kernel/user address spaces .SL "Implementation Techniques .nf Start from scratch \- lots and lots of work, requires knowing about compilers Edit an existing compiler \- lots of work, requires knowing about compilers, requires \fIknowing\fP that particular compiler .SL "Problem (recap) .nf Programs are buggy. Because languages aren't good for programming. Lots of good can come from tailoring the language. But it's far too much work for most programmers. How can we lower the bar? .SL "Solution: Extensible Compiler .nf Extensible compiler \- let programmers add extensions (\f(CW.so\fP files) to the compiler during compilation. \- extensions are separate from programs that use them (think media-player codecs) \- extensions add new language features, new checking \- make it easy to write these extensions \- hide compiler when possible \- make compiler easy to understand .SL "Solution: Extensible compiler .PE Extensible compiler, without extensions: .PS X1: box wid 1.5 ht 0.35 "\f(CWrscc\fP" SRC: box invis wid 2 "C program" with .e at X1.w-(1,0) OBJ: box invis wid 2 "executable" with .w at X1.e+(1,0) arrow from SRC.e to X1.w-(.2,0) arrow from X1.e+(.2,0) to OBJ.w SRC2: box invis wid 2 "" "" with .n at SRC-(1,0) .PE Extensible compiler, with one extension: .sp .PS X1: box wid 1.5 ht 0.35 "\f(CWrscc\fP" SRC: box invis wid 2 "program" "using" "extension" with .e at X1.w-(1,0) OBJ: box invis wid 2 "executable" with .w at X1.e+(1,0) arrow from SRC.e to X1.w-(.2,0) arrow from X1.e+(.2,0) to OBJ.w SRC2: box invis wid 2 "extension" "source" with .n at SRC.s-(1,1) X2: box wid 1.5 ht 0.35 "\f(CWrscc\fP" with .w at SRC2.e+(1,0) OBJ2: box fill 0.9 wid 1.5 ht 0.35 "\f(CWext.so\fP" at X1.se-(.25,.1) arrow from SRC2.e to X2.w-(.2,0) spline -> from X2.e+(.2,0) to OBJ2.s-(0,1) then to OBJ2.s .PE .SL "Expected Uses .nf Extensions are usually small (and can start small). Programmers write compiler extensions while writing the programs that use them. Project leaders write compiler extensions to make big project easier for others. Net work should be less than not using extensions. Make possible the language changes mentioned earlier. .SL "Example Extension: Alef .LP .X1 .in 0 .in -0.5i void primetask(chan(int) c) { chan(int) nc; int i, p; p = <-c; print("%d\en", p); alloc nc; task primetask(nc); for(;;){ i = <-c; if(i%p) nc <-= i; } } Alef original .X2 .in -4i void primethread(void *arg) { Channel *c, *nc; int i, p; c = arg; p = recvul(c); print("%d\en", p); nc = chancreate(sizeof(ulong), 0); threadcreate(primethread, nc, 1024); for(;;){ i = recvul(c); if(i%p) sendul(nc, i); } } C transliteration .X3 .SL "Example Extension: Alef .X1 Alef chan(int) nc; alloc nc; task primetask(nc); nc <-= i; .X2 C + libthread Channel *nc; nc = chancreate(sizeof(ulong), 0); threadcreate(primethread, nc, 1024); sendul(nc, i); .X3 Benefits of Alef: \- more concise, more readable. \- type-checked channels. \- good thread creation syntax. \- good communication syntax (\f(CWalt\fP) Alef is dead. \- compiler author moved on. \- those left behind switched to C. gave up the benefits for ease of maintenance .SL "Example Extension: Tame .X1 .ta +2n +2n +2n TAME(static void try_connect(str h, int port)) { VARS { int fd; } BLOCK{ tcpconnect(h, port, @(fd)); } if(fd < 0) warn << h << "\en"; exit(0); } Tame TCP client. .X2 static void cb1(str, int); static void try_connect(str h, int port) { tcpconnect(h, port, wrap(cb1, h)); } static void cb1(str h, int fd) { if(fd < 0) warn << h << "\en"; exit(0); } Equivalent libasync .X3 .SL "Example Extension: Tame .X1 .ta +2n +2n +2n TAME(static void try_connect(str h, int port)) { VARS { int fd; } BLOCK{ tcpconnect(h, port, @(fd)); } if(fd < 0) warn << h << "\en"; exit(0); } Tame TCP client. .X2 static void cb1(str h, int fd) { if(fd < 0) warn << h << "\en"; exit(0); } static void try_connect(str h, int port) { tcpconnect(h, port, wrap(cb1, h)); } Libasync in the wild. .X3 .SL "Example Extension: Tame .X1 VARS { int fd; } BLOCK{ tcpconnect(h, port, @(fd)); } exit(fd); .X2 static void cb1(void) {exit(fd);} tcpconnect(port, wrap(cb1)); .X3 Benefits of Tame: \- more concise, more readable \- does not encourage upside-down programming \- easy to handle multiple parallel threads-of-control \- can write structured programs again (\f(CWif\fP, \f(CWfor\fP, ...) .SL "Example Extension: Plan 9 Kernel Checker .nf Plan 9 kernel has many conventions to remember: (So does any large program.) \- lock/unlock pairing \- no sleeping in interrupt handlers \- interrupt handlers must use ilocks \- exception stack for error handling \- kernel/user pointers \- no dereferencing user pointers in interrupt handlers \- ... Many are checked at run-time. No fundamental reason not to check at compile time. .SL "Example Extension: Plan 9 Kernel Checker .nf Add attributes to document and enforce conventions. .P1 .ps 20 .vs 22 void *ialloc(int) attribute(nosleep); void *malloc(int) attribute(nosleep); void print(char*, ...); void* ialloc(int n) { void *v; v = malloc(n); return v; } .P2 .SL "Example Extension: Plan 9 Kernel Checker .nf Add attributes to document and enforce conventions. .P1 .ps 20 .vs 22 void *ialloc(int) attribute(nosleep); void *malloc(int) attribute(nosleep); void print(char*, ...); void* ialloc(int n) { void *v; v = malloc(n); .RGB 0.75 0 0 if(v == nil) print("ialloc returning nil\en"); .CO black return v; } .P2 .ps .L1 Compiler can tell programmer he made a mistake. .SL "\f(CWrscc\fP overview .nf Front end only \- compile to low-level C and invoke \f(CWgcc\fP Written using itself Extensible LR(1) parser Extensible grammar notation Destructuring syntax Restructuring syntax Other extensions as convenient .SL "Extensible LR(1) parser .nf Shift-reduce parsers run an NFA on input. \- NFA says shift or reduce at each step Parse \f(CWx * x + x x * x + x shift x * x + x shift x * x + x reduce x => x*x x + x shift x + x shift x + x reduce x => x+x x .ft .SL "Extensible LR(1) parser .nf Shift-reduce parsers run an NFA on input. \- NFA says shift or reduce at each step \- NFA corresponds exactly to grammar rules \f(CWYacc\fP precomputes, hard-codes equivalent DFA. New parser for rscc: \- run determinization on the fly, caching result. \- if grammar changes, throw away DFA built so far. \- normal C library interface .P1 yynewsym(g, "\e"**\e"", 0); yylexnewsym(g, "\e"**\e""); yyrulestr(g, yyaction, "expr", "expr", "\e"**\e"", "expr", 0); .P2 .SL "Grammar Syntax .nf Use an extension to present a better parser interface. .P1 yacc { yaccsym right "**" %prec "*"; expr: expr "**" expr; } .P2 .L1 Compiles into appropriate function calls. Provides default \f(CWyyaction\fP rule that builds an abstract syntax tree node. .SL "Destructuring Syntax .nf Need to make abstract syntax tree nodes accessible. Don't want extension writer to need to know internal representation. Provide matching operator \f(CW~\fP for destructuring syntax. .P1 Node *a, *b, n; if(n ~ `expr{ \ea**\eb }) warn("using **: base=%N exponent=%N", a, b); .P2 .SL "Restructuring Syntax .nf Same syntax as destructuring, but used as expression instead of next to \f(CW~\fP. .P1 Node *n, *a, *b; if(n ~ `expr{ \ea**\eb }) n = `expr{ pow(\ea, \eb) }; .P2 .L1 The \f(CWa\fP and \f(CWb\fP are substituted as units, not as token streams. .P1 int x; Node *n, *nn; x = 2; n = `expr{ 1+\ex }; nn = `expr{ \en*3 }; .P2 .SL "Extensible Functions Linked list of function pointers implementing a function. Each function can handle a call itself, with or without invoking next function in the list. Syntax codifies the idiom. .P1 extend Node* compile(Node *n) { Node *a, *b; if(n ~ `expr{ \ea**\eb }) return compile(`expr{ pow(\ea, \eb) }); return default; } .P2 .SL "Extensible data structures Same idea as functions; not sure of implementation. .P1 extend struct Node { int extra; int other; int field; }; .P2 .SL "Other extensions: iterators .P1 static int start(State *z, Rule *r); static int next(State *z, Sym **s); static int end(State*); iterator(start, next, end); Rule *r; Sym *s; for(s in r) something(s); .P2 .SL "Related Work .nf Lots of macro systems \- can't write checkers with those. \- often run into scoping problems Other extensible languages/compilers \- require programmers to know too much about internals \- not easy enough to write extensions .SL "Status \f(CWRscc\fP exists. \- uses extensible grammar, yacc syntax \- provides its own grammar action functions \- uses restructuring, not destructuring \- uses extensible functions heavily \- no extensible data structures yet \- uses iterators heavily \- know how to switch over to implicit actions and destructuring \- need to find ways to hide other aspects of compiler internals (types, flow control, ...) \- submit to OSDI \- always looking for more target uses