Execution Handler

Location

	$web/etc/handler

Description

An execution handler is a program that processes files whose paths match a specific pattern. The user defines the relations between path patterns and handlers. This mechanism is used to define the form of CGI files, SSI (Server Side Include) and the auto-indexing service for specific directories.

Configuration

Relations between path patterns and handlers are defined in $web/etc/handler. The following is the content of my configuration (http://plan9.aichi-u.ac.jp).
	# path      mimetype    unused    execpath arg ...
	/netlib/*/index.html text/html 0 /bin/ftp2html
	*.http         -         0       $target
	*.html      text/html    1       $target
	*.dx_html   text/html    0       /bin/dx $target
The first field is a path pattern and the fourth field is the path to the handler. The second field is the MIME type of the response, and the third field is currently unused.
Path patterns are compared with the requested path starting from the top line. Comparison stops at the first pattern that matches the requested path.
Arguments to the program may follow the fourth field.

In a path pattern, the directory separator "/" is not special. (This pattern matching is therefore not the same as that of the shell.) There is one exception: the pattern "/*/" also matches "/". Therefore the pattern

	/netlib/*/index.html
matches /netlib/index.html as well as /netlib/cmd/backup/index.html, for example.
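
The matching rules described above can be sketched in Python. This is only an illustration under stated assumptions, not the code Pegasus uses: the helper names pattern_to_regex, parse_table and find_handler are hypothetical, "*" is assumed to be the only wildcard, and lines beginning with "#" are assumed to be comments.

```python
import re

# The handler table from the configuration shown earlier.
HANDLER_TABLE = """\
# path      mimetype    unused    execpath arg ...
/netlib/*/index.html text/html 0 /bin/ftp2html
*.http         -         0       $target
*.html      text/html    1       $target
*.dx_html   text/html    0       /bin/dx $target
"""

def pattern_to_regex(pattern):
    """Translate a path pattern into a compiled regular expression.

    "*" matches any run of characters, including "/".
    "/*/" additionally matches a single "/".
    """
    pieces = []
    for part in pattern.split("/*/"):
        # Inside each part, a lone "*" becomes ".*"; the rest is literal.
        pieces.append(".*".join(re.escape(p) for p in part.split("*")))
    # "/*/" matches either "/" alone or "/<anything>/".
    return re.compile("/(?:.*/)?".join(pieces))

def parse_table(text):
    """Return a list of (pattern, mimetype, command fields) tuples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        fields = line.split()
        # fields[2] is the unused third field and is dropped.
        rules.append((fields[0], fields[1], fields[3:]))
    return rules

def find_handler(rules, path):
    """Return the first rule whose pattern matches the whole path, or None."""
    for pattern, mimetype, command in rules:
        if pattern_to_regex(pattern).fullmatch(path):
            return pattern, mimetype, command
    return None

if __name__ == "__main__":
    rules = parse_table(HANDLER_TABLE)
    for path in ("/netlib/index.html", "/netlib/cmd/backup/index.html", "/a.dx_html"):
        print(path, "->", find_handler(rules, path))
```

Because comparison stops at the first match, the order of lines in the table matters: a request under /netlib/ for index.html reaches /bin/ftp2html before the more general *.html rule is tried.
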
The second field gives the "Content-Type" of the HTTP header. If the field is "-", the handler must send all HTTP headers itself.
The third field is not used. It was formerly used for ramfs, but it became obsolete because ramfs is always provided to CGI programs in the current version.

The fourth field is the path to the handler. The handler must be an executable.
$target in or after the fourth field denotes the absolute path to the requested document. Note that $target in the fourth field means that the requested file is itself an executable program.

Clients can add arguments for the CGI program to their requests to Pegasus. These arguments are passed automatically and need not be described in the handler file.
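
As a rough sketch of how a handler line might turn into a command, the Python below substitutes $target and adds the client's arguments. The function name build_command and the example paths are hypothetical, and appending the client arguments at the end of the command is an assumption; the text above only says that they are added automatically.

```python
def build_command(handler_fields, target, client_args=()):
    """Build the argument vector for a handler.

    handler_fields: the fourth and later fields of a handler line.
    target: absolute path to the requested document.
    client_args: arguments supplied by the client (placement at the
                 end is an assumption made for this sketch).
    """
    argv = [target if field == "$target" else field for field in handler_fields]
    argv.extend(client_args)
    return argv

if __name__ == "__main__":
    # "/bin/dx $target": the handler is /bin/dx, the document is its argument.
    print(build_command(["/bin/dx", "$target"], "/doc/a.dx_html"))
    # "$target" alone: the requested file is itself the executable.
    print(build_command(["$target"], "/doc/cgi/search.http", ["alice", "bob"]))
```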

/bin/ftp2html in this example is a program that is used in

	http://plan9.aichi-u.ac.jp/netlib/
to handle my FTP directories. Other servers such as Apache have an option to show a directory index if index.html is absent. ftp2html does this too, but does much more: if a README file is present its content is shown, and if an INDEX file is present its content is shown with an appropriate action tag on the index label.
/bin/dx is a tool designed by the author for use with SSI.

Comparison with Apache CGI

If text/html is specified for mimetype, the format of the CGI output is:
<html>
...
</html>
That is, do not start with "Content-Type" as Apache does:
Content-Type: text/html

<html>
...
</html>

Error handling in CGI program

Pegasus can now easily pass a response code to the client, so you might not need "-" for mimetype.
When text/html is specified for mimetype, Pegasus automatically sends the HTTP headers to the client. The response header then follows this rule:
This rule seems to work well; however, the connection can also be controlled directly by specifying "keep" or "close" after "#":
	exit '403 Forbidden # keep'
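
The shape of such an exit string can be illustrated with a small Python parser. This is a sketch of the format only, under the assumption that the string is "code reason", optionally followed by "#" and a connection directive; how Pegasus itself parses the string is not described here, and the name parse_exit_status is hypothetical.

```python
def parse_exit_status(status):
    """Split an exit string such as '403 Forbidden # keep'.

    Returns (code, reason, directive), where directive is "keep",
    "close", or None when no "#" part is present.
    """
    response, _, directive = status.partition("#")
    code, _, reason = response.strip().partition(" ")
    return int(code), reason.strip(), directive.strip() or None

if __name__ == "__main__":
    print(parse_exit_status("403 Forbidden # keep"))
    print(parse_exit_status("404 Not Found"))
```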

Both stdout and stderr are passed to the client.

Relation to URI

The params component of HTTP/1.0 and HTTP/1.1 applies to Pegasus. According to the RFC specification the format is:
	path;params?query
	params = param[;params]
Some traditional web servers neglect params and pass the query to CGI as the argument. Pegasus rejects this traditional approach and accepts params as the arguments that should be passed to CGI. On the other hand, Pegasus takes no part in interpreting the query and passes it to CGI as an environment variable without translation.
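
The path;params?query format can be taken apart as in the following Python sketch. The function name split_request_uri and the example URI are hypothetical, and the code only illustrates the division of roles described above: params become the arguments, while the query is left untouched.

```python
def split_request_uri(uri):
    """Split 'path;params?query' into (path, list of params, query).

    The query is returned exactly as received, without any decoding,
    since interpreting it is left to the CGI program.
    """
    before_query, _, query = uri.partition("?")
    path, _, params = before_query.partition(";")
    param_list = params.split(";") if params else []
    return path, param_list, query

if __name__ == "__main__":
    path, args, query = split_request_uri("/cgi/search.http;alice;bob?lang=en&n=10")
    print(path)   # the requested path
    print(args)   # candidates for CGI arguments
    print(query)  # would be passed as QUERY_STRING
```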

Name space of CGI

The name space for CGI programs can be modified using
	/etc/namespace_80
where 80 is the port number.

Environment variable of CGI

Pegasus has many environment variables. However, most of them are only experimental. The stable variables are the following:
	GATEWAY_INTERFACE
	SERVER_NAME
	SERVER_PORT
	SERVER_SOFTWARE
	SERVER_PROTOCOL
	REQUEST_METHOD
	REMOTE_ADDR
	QUERY_STRING
	HTTP_HEADER
	HTTP_HOST
	HTTP_REFERER
	HTTP_USER_AGENT
	REQUEST_PATH	# requested path (see Note)
	REQUEST_URI	# requested path (see Note)
	home		# /doc
	query		# same as QUERY_STRING
	target		# requested path in service space
	name		# basename of target
	cputype		# 386
	objtype		# 386
	date		# date such as "{Mon, 04 Mar 2002 07:32:40 GMT}"
Note: the path in REQUEST_URI might end with "/" if it is a directory. REQUEST_PATH, on the other hand, is the file that is effectively requested. In rc notation, target is expressed as:
	target = /doc$REQUEST_PATH
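
The relation between REQUEST_PATH, home, target and name can be written out as a short Python sketch. The function name derive_variables is hypothetical, and the default home of /doc is taken from the variable list above.

```python
import posixpath

def derive_variables(request_path, home="/doc"):
    """Compute target and name from REQUEST_PATH, as in the rc expression
    target = /doc$REQUEST_PATH (plain string concatenation)."""
    target = home + request_path
    return {
        "home": home,
        "target": target,                    # requested path in service space
        "name": posixpath.basename(target),  # basename of target
    }

if __name__ == "__main__":
    print(derive_variables("/netlib/index.html"))
```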

Other environment variables might be discarded or renamed in the future.

Handling of POST data

POSTed data is first received by the server from the client, and the server checks Content-Length while receiving it. The server then passes the data to the CGI program on stdin.
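
From the CGI program's side, reading the POSTed data might look like the Python sketch below. It assumes that stdin reaches end-of-file once the server has delivered the data, and that the body is URL-encoded form data; neither point is stated above. The names read_post_body and parse_form are hypothetical.

```python
import io
import sys
from urllib.parse import parse_qs

def read_post_body(stream=None):
    """Read the whole POST body from stdin (or a given binary stream).

    Assumes the stream ends after the data, so reading to EOF is safe.
    """
    if stream is None:
        stream = sys.stdin.buffer
    return stream.read()

def parse_form(body):
    """Decode a URL-encoded form body into a dict of value lists."""
    return parse_qs(body.decode("ascii"))

if __name__ == "__main__":
    # Stand-in for stdin so the sketch can be run without a server.
    fake_stdin = io.BytesIO(b"name=alice&lang=en")
    print(parse_form(read_post_body(fake_stdin)))
```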

CGI timeout

The timeout prevents buggy programs from waiting too long for data. The value can be specified by an httpd option or in /sys/lib/httpd.conf. The default is 5 seconds. I think this value is sufficient because the data is already held by the server.
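
The idea of a CGI timeout can be illustrated with Python's subprocess module. This is a generic sketch of the technique, not a description of how Pegasus implements it: the function name run_cgi is hypothetical, and merging stderr into stdout simply mirrors the statement that both are passed to the client.

```python
import subprocess
import sys

def run_cgi(argv, post_data=b"", timeout=5.0):
    """Run a CGI command with post_data on stdin.

    Returns (timed_out, output). The child is killed if it does not
    finish within `timeout` seconds.
    """
    try:
        done = subprocess.run(
            argv,
            input=post_data,           # data already held by the server
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # stdout and stderr go to the client together
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return True, b""
    return False, done.stdout

if __name__ == "__main__":
    # A well-behaved program that echoes its input in upper case.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_cgi(echo, b"abc"))
    # A program that hangs is cut off by the timeout.
    hang = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_cgi(hang, timeout=0.5))
```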