CGI Handler

This mechanism defines how CGI files, SSI (Server Side Includes), and the auto-indexing service are handled for specific directories.

Path in URI

We call the /doc/path/to/document request path, and denote the path by $request.
If the request path ends with “/”, Pegasus internally appends index.html. We call the resulting path the effective request path.
This does not mean two URL

Configuration

The first field is a path pattern, the second field is the default MIME type, the third field is the level of control the script has over the HTTP headers, and the 4th field is the path to a script. The 4th field may be followed by arguments to the script.
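
As a rough illustration of this field layout, the following Python sketch splits one handler line into its parts. It assumes the fields are separated by whitespace; the sample line and the names Handler and parse_handler_line are hypothetical and are not taken from Pegasus.

```python
from typing import List, NamedTuple


class Handler(NamedTuple):
    pattern: str      # 1st field: path pattern
    mimetype: str     # 2nd field: default Content-Type, or "-"
    hctl: str         # 3rd field: header control level
    script: str       # 4th field: path to the script
    args: List[str]   # anything that follows the 4th field


def parse_handler_line(line: str) -> Handler:
    """Split one handler line, assuming whitespace-separated fields."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("a handler line needs at least four fields")
    return Handler(fields[0], fields[1], fields[2], fields[3], fields[4:])


# Hypothetical line, for illustration only.
h = parse_handler_line("/printenv text/plain 0 /bin/printenv -v")
print(h.pattern, h.mimetype, h.hctl, h.script, h.args)
```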

Path patterns are compared with the effective request path.
The comparison proceeds from the top line downward and stops at the first pattern that matches.
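
The first-match rule can be sketched in Python as follows. This is only an approximation: Python's fnmatchcase stands in for the Pegasus matcher (its “*” can span “/”), the special rule for “/*/” is not modeled, and the patterns and script paths in HANDLERS are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence, Tuple

# (pattern, script) pairs in file order; all values here are hypothetical.
HANDLERS: Sequence[Tuple[str, str]] = [
    ("/printenv", "/bin/printenv"),
    ("*.cgi", "$target"),
    ("*/index.html", "/bin/ftp2html"),
]


def find_handler(erpath: str,
                 handlers: Sequence[Tuple[str, str]] = HANDLERS
                 ) -> Optional[Tuple[str, str]]:
    """Return the first handler whose pattern matches the effective request path."""
    for pattern, script in handlers:
        # fnmatchcase gives "/" no special treatment, so "*" can span directories.
        if fnmatchcase(erpath, pattern):
            return pattern, script
    return None  # no handler applies to this path
```

Because the search stops at the first match, more specific patterns should come before more general ones.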

In a path pattern, the directory separator “/” is not special. (Therefore this pattern matching is not the same as that of the shell.) There is one exception: we have a rule that the pattern “/*/” matches “/”. Therefore the pattern

The second field gives the default value of the HTTP header “Content-Type”. If the field is “-”, the script must set the header.

The third field, named “hctl”, takes the values ‘1’, ‘+’, and ‘0’, which set the level of control the script has over the HTTP headers; the meanings are

In Fig.2, the third line starting with /printenv is combined with the script below.

If ‘+’ is specified, the script may emit HTTP headers in compliance with CGI/1.1. The typical output style is

NB: Pegasus has a bug in the default mimetype for hctl="+".
That is, Fig.3 is wrong if mimetype="text/html", but OK if mimetype="-".
This script should be OK even if mimetype="text/html".
This bug will be fixed in the next release (Pegasus 2.8a).

The reserved word $target in or after the 4th field denotes the absolute path (in httpd space) to the requested document. That is, $target is the effective request path prefixed with “/doc/”.

The 4th field is the path to an executable program that handles the request. Note that $target in the 4th field means that the effective request path is itself an executable program.
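
A minimal Python sketch of how $target could be derived and substituted, under two assumptions: the request path always begins with “/”, and $target appears as a whole word. The function names are hypothetical.

```python
from typing import List


def effective_request_path(request: str) -> str:
    """Append index.html when the request path ends with "/"."""
    return request + "index.html" if request.endswith("/") else request


def handler_target(request: str) -> str:
    """Path of the requested document in httpd space."""
    # Assumes the request path starts with "/", so no doubled slash appears.
    return "/doc" + effective_request_path(request)


def expand_command(fields: List[str], request: str) -> List[str]:
    """Replace the reserved word $target in the 4th and later fields."""
    target = handler_target(request)
    return [target if f == "$target" else f for f in fields]


# A script that receives the document as an argument (hypothetical path).
print(expand_command(["/bin/ftp2html", "$target"], "/pub/"))
# $target as the 4th field itself: the requested file is the program.
print(expand_command(["$target"], "/a/x.cgi"))
```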

Note 1: In the old days, the directory was used for FTP service.

Other servers such as Apache have an option to show a directory index if index.html is absent. ftp2html also does this, but does much more: if a README file is present, its content is shown, and if an INDEX file is present, its content is shown with an appropriate action tag to the index label.

Enhanced control “*”

The meaning of the symbol “*” is the same as that of “+”, except that the script must handle all methods.

The meanings of the other symbols (“0”, “1” and “+”) are unchanged. Only requests with the HEAD, GET and POST methods go to these scripts. Other requests are handled by Pegasus and are rejected, except for OPTIONS. You need not handle the HEAD method in these scripts, because that request is handled by Pegasus.

In summary, the differences in meaning among the symbols in the third field are listed in the following table.

Files that begin with “.”

* otherwise Finder of Mac/OSX client becomes somewhat unstable.

Table 1: hctl symbols in handler
method	limited method	all method
simple cgi	`0`
cgi/1.1	`+`	`*`
non-parsed cgi	`1`
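
Table 1 can be read as a small lookup, sketched here in Python. The names HCTL and goes_to_handler are hypothetical. HEAD is counted as a limited method because the text lists it among the methods routed to these handlers, even though Pegasus answers HEAD itself.

```python
LIMITED_METHODS = {"HEAD", "GET", "POST"}

# hctl symbol -> (CGI style, whether the script receives every method)
HCTL = {
    "0": ("simple cgi", False),
    "+": ("cgi/1.1", False),
    "*": ("cgi/1.1", True),
    "1": ("non-parsed cgi", False),
}


def goes_to_handler(hctl: str, method: str) -> bool:
    """True if a request with this method is routed to the handler."""
    _style, all_methods = HCTL[hctl]
    return all_methods or method in LIMITED_METHODS


print(goes_to_handler("+", "PUT"), goes_to_handler("*", "PUT"))
```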

I don't know how to prevent accessing resource forks (files that begin with “._”).

Ramfs

A special file “...” is used internally to compute the Content-Length of the CGI output.
You need not compute Content-Length in your CGI program for HTTP/1.1.

X-CGI-Pass

example

The specification

Comparison with Apache CGI

Apache-style CGI is also supported. The entry for the suffix “cgi” in Fig.1 configures CGI/1.1 for such files.

Error handling in CGI program

If “text/html” is specified for “mimetype”, Pegasus automatically sends HTML headers to the client. The response header then follows this rule:

ENVIRONMENT VARIABLES

Note 1: The query string is automatically decoded by the httpd. For example, the query

members&children&name=alice&age=16

produces environment variables:

QS_=(members children)
QS_name=alice
QS_age=16

The prefix “QS_” is added for safety.
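
The decoding in Note 1 can be approximated in Python as below. This is a sketch, not the Pegasus code: it assumes percent-decoding with urllib.parse.unquote, collects items without “=” into the list-valued variable QS_, and lets a repeated key overwrite the earlier value. The function name decode_query is hypothetical.

```python
from typing import Dict, List, Union
from urllib.parse import unquote


def decode_query(query: str) -> Dict[str, Union[str, List[str]]]:
    """Turn a raw query string into QS_-prefixed variables."""
    env: Dict[str, Union[str, List[str]]] = {}
    bare: List[str] = []  # items that carry no "=", in order of appearance
    for item in query.split("&"):
        if not item:
            continue  # skip empty items such as those produced by "&&"
        if "=" in item:
            key, value = item.split("=", 1)
            env["QS_" + unquote(key)] = unquote(value)
        else:
            bare.append(unquote(item))
    if bare:
        env["QS_"] = bare  # corresponds to the rc list QS_=(...)
    return env


print(decode_query("members&children&name=alice&age=16"))
```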

Note 2: The request path may end with “/” if it is a directory. The target, on the other hand, is the file that is effectively requested. In rc notation, target is expressed as follows.

target = $request		# request to a file
target = $request/index.html	# request to a directory

Note 3: The name “target” as an environment variable is confusing because the same name is used in the handler with a different meaning. Therefore this name should be made obsolete in the future.
Note 4: Environment variables starting with “HTTP_” are generated from the key:val pairs in the HTTP request header. Keys are case-insensitive. The current RFC states that a key may consist of any printable ASCII characters except “:”. However, allowing special characters carries a potential risk in handling incoming requests. Note that all keys currently registered with IANA consist only of alphanumerics and ‘-’. Therefore, in generating environment variables, Pegasus-2.9 allows only keys of the IANA form, converts them to uppercase and, in addition, converts ‘-’ to ‘_’. The latter translation makes keys easy to handle in shell scripts. This conversion rule may or may not change in the future.
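
The conversion in Note 4 can be sketched in Python as follows. The function name http_env_name is hypothetical, and returning None for a rejected key is an assumption; the text does not say what Pegasus does with keys outside the IANA form.

```python
import re
from typing import Optional

# Keys of "IANA form": only alphanumerics and "-".
_IANA_FORM = re.compile(r"[A-Za-z0-9-]+")


def http_env_name(key: str) -> Optional[str]:
    """Map a request header key to its HTTP_ environment variable name."""
    if not _IANA_FORM.fullmatch(key):
        return None  # key is not of IANA form; no variable is generated
    # Uppercase the key and turn "-" into "_" for easy use in shell scripts.
    return "HTTP_" + key.upper().replace("-", "_")


print(http_env_name("User-Agent"))
print(http_env_name("X_Custom"))
```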

The current working directory of the invoked CGI program is the directory where the target is located.

INTERNAL FLOW

The handler's first field is compared with $erpath.
$target in handler is /doc/$erpath.

CGI/1.1

If script name is in URI

If script name is not in URI

Table 2: URI related environment variables of Pegasus
	request to host document	request to user's document	decoded?	specified by
HTTP URI	http://host/foo/?bar	http://host/~alice/foo/?bar		HTTP/1.1
$HTTP_SCHEME	http	http	NO	Pegasus
$HTTP_HOST	host	host	NO	Apache
$REQUEST_URI	/foo/?bar	/~alice/foo/?bar	NO	Apache
$REQUEST_USER		alice	YES	Pegasus
$PATH_INFO	/	/	YES	CGI/1.1
$PATH_TRANSLATED	/doc/	/doc/	YES	CGI/1.1
$SCRIPT_NAME	/foo	/~alice/foo	YES	CGI/1.1
$QUERY_STRING	bar	bar	NO	Apache
$request	/foo/	/foo/	YES	Pegasus

What, then, should the values of these environment variables be? The answer is unclear.
The CGI/1.1 specification says that the concatenation $SCRIPT_NAME$PATH_INFO must be the decoded path part of the URI.
Therefore these values are assigned as shown below.

Handling of POST data

CGI TIMEOUT

Global Setting

Table 3: some variables in case that script name is not in URI.
	request to host document	request to user's document	decoded?	specified by
HTTP URI	http://host/foo/?bar	http://host/~alice/foo/?bar		HTTP/1.1
$PATH_INFO	/foo/	/~alice/foo/	YES	CGI/1.1
$PATH_TRANSLATED	/doc/foo/	/doc/foo/	YES	CGI/1.1
$SCRIPT_NAME			YES	CGI/1.1
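
The CGI/1.1 rule that $SCRIPT_NAME$PATH_INFO equals the decoded path part of the URI can be checked against the rows of Tables 2 and 3 with a short Python sketch. The function name cgi_path_vars is hypothetical, and the caller is assumed to know the script name already (empty when the script name is not in the URI).

```python
from typing import Tuple
from urllib.parse import unquote, urlsplit


def cgi_path_vars(uri: str, script_name: str) -> Tuple[str, str]:
    """Return ($SCRIPT_NAME, $PATH_INFO) for a URI and a decoded script name."""
    path = unquote(urlsplit(uri).path)  # decoded path part, query removed
    if not path.startswith(script_name):
        raise ValueError("script name is not a prefix of the request path")
    return script_name, path[len(script_name):]


# Script name in the URI (Table 2) and not in the URI (Table 3).
print(cgi_path_vars("http://host/~alice/foo/?bar", "/~alice/foo"))
print(cgi_path_vars("http://host/~alice/foo/?bar", ""))
```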

A timeout is defined to prevent buggy programs from waiting too long for data. The value can be specified in /sys/lib/httpd.conf. The default is 5 seconds. I think this value is sufficient because the data is already held by the server.

For Each CGI

An environment variable “hpid” is introduced for this purpose.
The “pid” is that of the Pegasus process in service.