$web/etc/handler
This mechanism defines the handling of CGI files, SSI (Server Side Include) and the auto-indexing service for specific directories.
An HTTP URL has the form
    http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
In the URL
    http://host:port/path/to/document?query
abs_path is /path/to/document, and the document is /doc/path/to/document in httpd space. In the URL
    http://host:port/~alice/path/to/document?query
abs_path is /~alice/path/to/document, and the document is again /doc/path/to/document, this time in the document space of user alice.
We call /path/to/document the request path, and denote it by $request.
If the request path ends with “/”, Pegasus internally appends index.html. We call the resulting path the effective request path.
This does not mean that the two URLs
    http://host/path/to/foo/
    http://host/path/to/foo/index.html
are always equivalent: they can differ when /path/to/foo is not a directory (a file, or non-existent).
Fig.1 shows an example of a handler (the one used on http://plan9.aichi-u.ac.jp).
    # path                mimetype    hctl  execpath       arg ...
    /netlib/*/index.html  text/html   0     /bin/ftp2html
    /printenv/*           text/plain  0     /bin/printenv  $target
    *.http                -           1     $target
    *.cgi                 text/html   +     $target
    *.html                text/html   0     $target
Fig.1: CGI handler of Pegasus.
The first field is a path pattern, the second is the default MIME type, the third is the level of control the script has over the HTTP header, and the fourth is the path to the script. The fourth field may be followed by arguments to the script.
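A handler file like Fig.1 can be read by splitting each line on white space. The Python sketch below assumes that fields never contain spaces and that a line whose first field starts with “#” is a comment. The names HandlerEntry and parse_handler are hypothetical and are not part of Pegasus.

```python
from typing import List, NamedTuple

class HandlerEntry(NamedTuple):
    pattern: str      # path pattern, compared with the effective request path
    mimetype: str     # default Content-Type; "-" means the script must set it
    hctl: str         # "0", "1", "+" (or "*")
    execpath: str     # program that handles the request (may be $target)
    args: List[str]   # optional arguments after the 4th field

def parse_handler(text: str) -> List[HandlerEntry]:
    """Parse handler lines into entries, keeping their order."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and comment lines such as "# path mimetype ..."
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) < 4:
            raise ValueError("too few fields: %r" % line)
        pattern, mimetype, hctl, execpath, *args = fields
        entries.append(HandlerEntry(pattern, mimetype, hctl, execpath, args))
    return entries
```

Keeping the entries in file order matters, because the lookup stops at the first matching pattern.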
Path patterns are compared with the effective request path.
The comparison proceeds from the top line downward and stops at the first pattern that matches.
In a path pattern, the directory separator “/” is not special (therefore this pattern matching differs from that of the shell). There is one exception: the pattern “/*/” also matches “/”. Therefore the pattern
    /netlib/*/README
matches /netlib/README as well as, for example, /netlib/cmd/rit/README.
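The matching rules above (first match wins, “*” also crosses “/”, “/*/” may also match a single “/”) can be sketched in Python by translating each pattern into a regular expression. This is an illustration that assumes “*” is the only special character. pattern_to_regex and find_handler are hypothetical names.

```python
import re

def pattern_to_regex(pattern):
    """Translate a handler path pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("/*/", i):
            # "/*/" matches "/anything/" and also a single "/".
            parts.append("/(?:.*/)?")
            i += 3
        elif pattern[i] == "*":
            # "*" matches any string, including "/" (unlike shell globbing).
            parts.append(".*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def find_handler(table, path):
    """Return the first entry whose pattern matches the whole path, or None."""
    for entry in table:
        if pattern_to_regex(entry[0]).fullmatch(path):
            return entry
    return None

FIG1 = [
    ("/netlib/*/index.html", "text/html", "0", "/bin/ftp2html"),
    ("/printenv/*", "text/plain", "0", "/bin/printenv", "$target"),
    ("*.http", "-", "1", "$target"),
    ("*.cgi", "text/html", "+", "$target"),
    ("*.html", "text/html", "0", "$target"),
]

if __name__ == "__main__":
    print(find_handler(FIG1, "/netlib/index.html")[3])   # /bin/ftp2html
    print(find_handler(FIG1, "/printenv/a.html")[3])     # /bin/printenv
```

The path /printenv/a.html also matches “*.html”. It goes to /bin/printenv only because that line comes first.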
The second field gives the default value of the HTTP header “Content-Type”. If the field is “-”, the script must set the header.
The third field, named “hctl”, takes the values “1”, “+” and “0”, which give the level of control the script has over the HTTP headers:
- 1: full control by the script
- +: partial control by the script
- 0: no control by the script
If “1” is specified, the script is responsible for writing the entire HTTP header; such a script is called a non-parsed header (NPH) script in CGI/1.1. The HTTP headers must be separated from the HTML document by a single blank line, that is, a line that contains only “\n”.
    <!DOCTYPE html>
    <html>
    ...
    </html>
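Under hctl “1” the script writes everything, including the status line. Here is a minimal Python sketch of such an NPH script. It assumes an HTTP/1.1 status line and CRLF line ends, as HTTP specifies. nph_response is a hypothetical helper name.

```python
import sys

def nph_response(body, status="200 OK", content_type="text/html"):
    """Build a complete response for an hctl "1" (non-parsed header) script."""
    data = body.encode("utf-8")
    head = (
        "HTTP/1.1 %s\r\n"
        "Content-Type: %s\r\n"
        "Content-Length: %d\r\n"
        "\r\n"              # the blank line that ends the header
    ) % (status, content_type, len(data))
    return head.encode("ascii") + data

if __name__ == "__main__":
    page = "<!DOCTYPE html>\n<html>\n...\n</html>\n"
    sys.stdout.buffer.write(nph_response(page))
```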
In Fig.1, the third line, which starts with /printenv, invokes the script below.
    #!/bin/rc
    rfork e
    echo 'ARGUMENT'
    for(x in $*)
        echo $x
    echo
    echo 'ENVIRONMENT'
    for(x in `{ls -p /env}){
        if(test -r /env/$x)
            echo $x `{cat /env/$x}
    }
Fig.2: /bin/printenv
This script may be useful when writing CGI scripts under Pegasus, because it shows the arguments and the environment that a script receives.
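For readers who write CGI programs in Python, a rough counterpart of Fig.2 is sketched below. It assumes that the Python interpreter exposes the Plan 9 environment through os.environ. The function name printenv_report is hypothetical. Like the /printenv line in Fig.1, it is meant for mimetype text/plain with hctl “0”, so it prints no header.

```python
import os
import sys

def printenv_report(args, env):
    """Return the text Fig.2 prints: arguments, a blank line, then the environment."""
    lines = ["ARGUMENT"]
    lines.extend(args)
    lines.append("")
    lines.append("ENVIRONMENT")
    for name in sorted(env):
        lines.append("%s %s" % (name, env[name]))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # hctl "0" with text/plain: no header is printed; Pegasus supplies it.
    sys.stdout.write(printenv_report(sys.argv[1:], os.environ))
```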
If “+” is specified, the script may emit HTTP headers in compliance with CGI/1.1. The typical output is
    Content-Type: text/html
    Status: 200 OK

    <!DOCTYPE html>
    <html>
    ...
    </html>
Fig.3: script example.
Note that:
- hctl="+" with mimetype="-" corresponds to Apache CGI. However, they are not the same: in this case Pegasus CGI is considerably more capable than Apache CGI.
- HTTP specifies “\r\n” as the end of a header line, but Unix-style “\n” is also OK, because Pegasus takes care of the separators.
- If “Content-Type” is absent, the default value in the mimetype field is supplied.
- Pegasus supplies “Content-Length”. If “Content-Length” is present in the script output, the value is ignored and replaced by a computed one.
- For Fig.3 with hctl="+", the Content-Type line is redundant if mimetype="text/html", but it is required if mimetype="-".
Another example is shown below.
    Set-Cookie: cookie=something; expires=Sun, 6-Aug-2006 11:43:57 GMT; domain=ar.aichi-u.ac.jp; path=/test4; secure

    <html>
    <head>
    <title>Cookie sample</title>
    </head>
    <body>
    ...
    </body>
    </html>
The reserved word $target in or after the 4th field denotes the absolute path (in httpd space) of the requested document; that is, $target is the effective request path prefixed with “/doc”.
The 4th field is the path to the executable program that handles the request. Note that $target in the 4th field means that the document at the effective request path is itself the executable program.
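The substitution of $target can be pictured with the small Python sketch below. It assumes that $target is replaced only when it forms a whole field. build_command is a hypothetical name.

```python
def build_command(execpath, args, effective_path):
    """Expand $target in the 4th and later fields of a handler line."""
    target = "/doc" + effective_path   # $target = "/doc" + effective request path
    return [target if word == "$target" else word for word in [execpath] + list(args)]

if __name__ == "__main__":
    print(build_command("$target", [], "/a/b.cgi"))          # ['/doc/a/b.cgi']
    print(build_command("/bin/printenv", ["$target"], "/printenv/x"))
```

With the *.cgi line of Fig.1, the requested file itself becomes the program. With the /printenv line, $target is only an argument passed to /bin/printenv.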
The second line of Fig.1, which begins with /netlib, is for
    http://plan9.aichi-u.ac.jp/netlib/
Other servers such as Apache have an option to show a directory index if index.html is absent. ftp2html also does this, but it does much more: if a README file is present, its content is shown, and if an INDEX file is present, its content is shown with appropriate tags attached to the index labels.
The symbol “*” was introduced to the third field of the handler for scripts that must handle all methods (Pegasus 2.4).
For example, with the following configuration
    /dav    -  *  /bin/foo
    /dav/*  -  *  /bin/foo
/bin/foo handles every request for http://host/dav and the paths below it, whatever the method.
The meaning of the symbol “*” is the same as “+”, except that the scripts must handle all methods.
The meanings of the other symbols (“0”, “1” and “+”) are unchanged. Only requests with the HEAD, GET and POST methods go to these scripts. Other requests are handled by Pegasus and rejected, except for OPTIONS. You need not handle the HEAD method in these scripts, because Pegasus handles it.
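The dispatch rule described above can be summarized in a few lines of Python. This only restates the text as a hypothetical function route. It says nothing about how Pegasus builds the rejection or the OPTIONS response.

```python
LIMITED_METHODS = ("HEAD", "GET", "POST")

def route(method, hctl):
    """Return who handles a request: "script", "pegasus" or "reject"."""
    if hctl == "*":
        return "script"      # the script must handle every method itself
    if method in LIMITED_METHODS:
        return "script"      # per the text, scripts need not treat HEAD specially
    if method == "OPTIONS":
        return "pegasus"     # answered by Pegasus itself
    return "reject"          # e.g. PUT or DELETE with hctl 0, 1 or +
```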
In summary, the symbols in the third field differ as listed in the following table.
CGI type | limited methods (HEAD, GET, POST) | all methods |
---|---|---|
simple CGI | 0 | |
CGI/1.1 | + | * |
non-parsed CGI | 1 | |
Dot files (files whose names begin with “.”) have been specified as “accessible only via CGI”. The symbol “*” in the third field of the handler also means that dot files are accepted.
Access to dot files from Mac OS X clients is annoying and makes the client respond sluggishly. How can such access be prevented? You will find some tips on the topic at the following URI:
http://lists.apple.com/archives/Spotlight-dev/2006/Jun/msg00008.html
I don't know how to prevent access to resource forks (files whose names begin with “._”).
A special file “...” is used internally to compute the Content-Length of the CGI output.
You need not compute Content-Length in your CGI program for HTTP/1.1.
    if(! ~ $request */){
        echo X-CGI-Pass: /doc$request
        echo
        exit
    }

    X-CGI-Pass: /baz
If “/baz” is equal to $target, you may omit the name:
    X-CGI-Pass:
If “text/html” is specified for mimetype and the hctl value is “0”, the format of the CGI file is:
    <!DOCTYPE html>
    <html>
    ...
    </html>
That is, do not start with “Content-Type:” as Apache requires:
    Content-Type: text/html

    <!DOCTYPE html>
    <html>
    ...
    </html>
Apache-type CGI is also supported. The line for the suffix “.cgi” in Fig.1 configures CGI/1.1 for such files.
If “text/html” is specified for “mimetype”, Pegasus automatically sends the HTTP headers to the client. The response status then follows these rules:
- “200 OK” is sent if the exit status is not given.
- On failure, Pegasus sends “500 Internal Error” and closes the connection.
- The connection can be kept or closed by writing “keep” or “close” after “#” in the exit status, for example:
    exit '403 Forbidden # keep'
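One way to read the exit-status convention above is sketched in Python below. The text before “#” is the status, and the word after it chooses the connection handling. This is a hypothetical reading for illustration, and parse_exit_status is not a Pegasus function. Pegasus's actual parsing may differ.

```python
def parse_exit_status(status):
    """Split an exit status such as "403 Forbidden # keep" into its parts."""
    text, _, option = status.partition("#")
    text = text.strip()
    option = option.strip() or None
    if option not in (None, "keep", "close"):
        raise ValueError("unknown connection option: %r" % option)
    # An empty exit status means success.
    return (text or "200 OK", option)
```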
Both stdout and stderr are passed to the client.
CGI programs receive the following environment variables:
    AUTH_TYPE
    CONTENT_LENGTH
    CONTENT_TYPE
    GATEWAY_INTERFACE
    PATH_INFO
    PATH_TRANSLATED
    QS_name          # name is the name part in QUERY_STRING (see Note 1)
    QUERY_STRING
    REMOTE_ADDR
    REMOTE_HOST
    REMOTE_USER
    REQUEST_METHOD
    REQUEST_URI
    REQUEST_USER
    SCRIPT_NAME
    SERVER_NAME
    SERVER_PORT
    SERVER_PROTOCOL
    SERVER_SOFTWARE

    HTTP_URI
    HTTP_SCHEME
    HTTP_HOST
    HTTP_REFERER
    HTTP_USER_AGENT

    HTTP_HEADER      # generated from each request header key (see Note 4)
Additionally we have
    request   # requested path (see Note 2)
    home      # /doc
    query     # same as QUERY_STRING
    target    # requested path from document root (see Note 3)
    name      # basename of target
    hpid      # pid of httpd that invoked the current script
Note 1: For example, if QUERY_STRING is
    members&children&name=alice&age=16
then the following variables are set (in rc notation):
    QS_=(members children)
    QS_name=alice
    QS_age=16
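The Note 1 rule can be sketched in Python as below. Some details are guesses:
- values are kept as they appear, with no URL decoding;
- a later duplicate name overwrites an earlier one;
- the rc list QS_ is represented as a Python list.

query_to_env is a hypothetical name.

```python
def query_to_env(query):
    """Split QUERY_STRING into QS_ variables as in the Note 1 example."""
    env = {"QS_": []}                  # items without "=" form the list QS_
    for item in query.split("&"):
        if not item:
            continue
        name, eq, value = item.partition("=")
        if eq:
            if not name:
                continue               # "=value" has no name; skipped here
            env["QS_" + name] = value  # "name=value" becomes QS_name
        else:
            env["QS_"].append(name)
    return env
```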
Note 2: The path in request might end with “/” if it is a directory. On the other hand, target is the file that is effectively requested. In rc notation, target is:
    target = $request              # request to a file
    target = $request/index.html   # request to a directory
Note 3: The name “target” for the environment variable is confusing, because the same name is used in the handler with a different meaning. Therefore this name should become obsolete in the future.
Note 4: Environment variables starting with “HTTP_” are generated from the key:value pairs in the HTTP request header. Keys are case-insensitive. The current RFC states that a key may consist of any printable ASCII characters except “:”. However, allowing special characters is a potential risk when handling incoming requests. Note that all keys currently registered with IANA consist only of alphanumerics and “-”. Therefore, when generating environment variables, Pegasus-2.9 accepts only keys of the IANA form, converts them to uppercase and, in addition, converts “-” to “_”. The latter translation makes the keys easy to handle in shell scripts. This conversion rule may change in the future.
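The Pegasus-2.9 conversion rule for request header keys can be sketched in Python like this. How duplicate headers are treated is an assumption: here the last one wins. header_env_name and headers_to_env are hypothetical names.

```python
import re

# Keys of the IANA form: letters, digits and "-" only.
IANA_KEY = re.compile(r"[A-Za-z0-9-]+")

def header_env_name(key):
    """Return the HTTP_ variable name for a header key, or None if rejected."""
    if not IANA_KEY.fullmatch(key):
        return None
    return "HTTP_" + key.upper().replace("-", "_")

def headers_to_env(headers):
    """Convert (key, value) pairs from the request header into HTTP_ variables."""
    env = {}
    for key, value in headers:
        name = header_env_name(key)
        if name is not None:
            env[name] = value
    return env
```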
The current working directory of an invoked CGI program is the directory where the target is located.
Other environment variables might be discarded or renamed in the future.
The request processing of Pegasus is roughly as follows (rc-like pseudocode):
    erpath=$request
    if(test -d $erpath){
        if(! ~ $erpath */){
            redirect $erpath/
            # which means we begin from the first by substituting
            # request=$erpath/
        }
    }
    if(~ $erpath */)
        erpath=$erpath/index.html
    access_check $erpath
    handler $erpath
    send /doc$erpath
The handler's first field is compared with $erpath.
$target in the handler is /doc$erpath.
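The same flow can be sketched in Python. The sketch checks directories under a configurable root, an assumption standing in for /doc. It leaves out access_check and the handler lookup. plan_request is a hypothetical name.

```python
import os
import tempfile

def plan_request(request, docroot="/doc"):
    """Decide between redirect and serve, following the pseudocode above."""
    erpath = request
    if os.path.isdir(docroot + erpath) and not erpath.endswith("/"):
        # A directory requested without a trailing "/" is redirected to "<path>/".
        return ("redirect", erpath + "/")
    if erpath.endswith("/"):
        erpath += "index.html"                     # effective request path
    # access_check and handler selection would happen here (omitted).
    return ("serve", erpath, docroot + erpath)     # erpath and $target

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "foo"))
        print(plan_request("/foo", root))          # ('redirect', '/foo/')
        print(plan_request("/foo/", root)[1])      # /foo/index.html
```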
Consider the requests
    http://host/foo/?bar
    http://host/~alice/foo/?bar
The environment variables are set as follows.
variable | request to host document | request to user's document | decoded? | specified by |
---|---|---|---|---|
HTTP URI | http://host/foo/?bar | http://host/~alice/foo/?bar | | HTTP/1.1 |
$HTTP_SCHEME | http | http | NO | Pegasus |
$HTTP_HOST | host | host | NO | Apache |
$REQUEST_URI | /foo/?bar | /~alice/foo/?bar | NO | Apache |
$REQUEST_USER | | alice | YES | Pegasus |
$PATH_INFO | / | / | YES | CGI/1.1 |
$PATH_TRANSLATED | /doc/ | /doc/ | YES | CGI/1.1 |
$SCRIPT_NAME | /foo | /~alice/foo | YES | CGI/1.1 |
$QUERY_STRING | bar | bar | NO | Apache |
$request | /foo/ | /foo/ | YES | Pegasus |
Now suppose the handler contains
    /foo/* - + /bin/baz
and the request is
    http://host/foo/?bar
What values should these environment variables have? The answer is unclear.
The CGI/1.1 specification says that the concatenation $SCRIPT_NAME$PATH_INFO must be the decoded path part of the URI.
Therefore the values are assigned as shown below.
variable | request to host document | request to user's document | decoded? | specified by |
---|---|---|---|---|
HTTP URI | http://host/foo/?bar | http://host/~alice/foo/?bar | | HTTP/1.1 |
$PATH_INFO | /foo/ | /~alice/foo/ | YES | CGI/1.1 |
$PATH_TRANSLATED | /doc/foo/ | /doc/foo/ | YES | CGI/1.1 |
$SCRIPT_NAME | (empty) | (empty) | YES | CGI/1.1 |
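The rule that $SCRIPT_NAME$PATH_INFO equals the decoded URI path can be checked with the Python sketch below. Given the part of the path that names the script, it reproduces both tables. That part is empty when the script comes from the handler, as with /bin/baz. The sketch assumes that “~user” appears only as the first path element and that /doc is the document root. cgi_path_vars is a hypothetical name.

```python
from urllib.parse import unquote, urlsplit

def cgi_path_vars(uri, script_name=""):
    """Compute path-related CGI variables so that
    SCRIPT_NAME + PATH_INFO equals the decoded path of the URI."""
    parts = urlsplit(uri)
    path = unquote(parts.path)                       # decoded abs_path
    if not path.startswith(script_name):
        raise ValueError("SCRIPT_NAME must be a prefix of the decoded path")
    path_info = path[len(script_name):]
    user = ""
    if path.startswith("/~"):
        user = path[2:].split("/", 1)[0]             # "/~alice/foo/" -> "alice"
    prefix = "/~" + user
    # $request and PATH_TRANSLATED do not contain the "/~user" part.
    request = (path[len(prefix):] or "/") if user else path
    translated = path_info
    if user and translated.startswith(prefix):
        translated = translated[len(prefix):] or "/"
    return {
        "REQUEST_URI": parts.path + ("?" + parts.query if parts.query else ""),
        "REQUEST_USER": user,
        "SCRIPT_NAME": script_name,
        "PATH_INFO": path_info,
        "PATH_TRANSLATED": "/doc" + translated,
        "QUERY_STRING": parts.query,
        "request": request,
    }
```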
The Content-Length of the request data is checked by the server when it receives the data. The server then passes the data to the CGI program through stdin.
A timeout is defined to prevent buggy programs from waiting for data too long. The value can be specified in /sys/lib/httpd.conf. The default is 5 seconds; I think this is enough, because the data is already held by the server.
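A Python CGI program can read the request data by taking exactly CONTENT_LENGTH bytes from stdin, as sketched below. The stream and environ parameters exist only so that the function can be tried without a server. read_body is a hypothetical name.

```python
import io
import os
import sys

def read_body(stream=None, environ=None):
    """Read exactly CONTENT_LENGTH bytes of request data from stdin."""
    if stream is None:
        stream = sys.stdin.buffer
    if environ is None:
        environ = os.environ
    length = int(environ.get("CONTENT_LENGTH") or 0)
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("request data shorter than CONTENT_LENGTH")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

if __name__ == "__main__":
    demo = io.BytesIO(b"name=alice&age=16")
    print(read_body(demo, {"CONTENT_LENGTH": "10"}))   # b'name=alice'
```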
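A CGI program that needs more time, for example for a heavy task, can change the timeout of the Pegasus process serving it. The environment variable “hpid” is introduced for this purpose; it holds the pid of the Pegasus process in service. In rc:
    echo -n timeout 180 > /proc/$hpid/note
The same in Python:
    import os

    def settimeout(hpid, n):
        # Ask the Pegasus process (pid hpid) to use a timeout of n seconds.
        note = "/proc/%d/note" % hpid
        try:
            with open(note, "w") as f:
                f.write("timeout %d" % n)
        except (IOError, OSError):
            print("unable to open %s" % note)
            print("timeout is not set")

    hpid = int(os.environ.get("hpid") or "0")
    if hpid:
        settimeout(hpid, 180)
    # continues heavily loaded tasks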