
Pegasus 2.4


2007/02/23
2007/01/27
2007/01/21
2007/01/20

Bug fixes and new features.
The source is available at http://plan9.aichi-u.ac.jp/netlib/

This page describes only the features that are new since the previous version. Basic features are unchanged. Please see the Pegasus 2.2 and Pegasus 2.3 documents for details.

For those who are new to Pegasus

For those who are new to Pegasus, the installation instructions for Pegasus 2.4 are here.

The concept of Pegasus is explained in the document Pegasus — another httpd for Plan 9 —, and the ver.2.2 documents are available elsewhere. Most of them remain unchanged. If you are an advanced user, you need to know the changes made after Pegasus 2.2.

Enhanced handler to support WebDAV CGI script

What was the problem?

The specification of the execution handler ("$web/etc/handler") was insufficient to support CGI for WebDAV.
A WebDAV script must handle all methods: not only GET, HEAD and POST but also OPTIONS, PROPFIND, PUT, DELETE, etc.

New symbol "*" in handler

Therefore a new symbol "*" is introduced in the third field of the handler for scripts that must handle all methods. Thus, the following configuration
	/dav	-	*	/bin/foo
	/dav/*	-	*	/bin/foo
enables "/bin/foo" to handle all requests whose URI begins with:
	http://host/dav
where "host" is your host domain name.

The symbol "*" means the same as "+", except that the script must handle all methods.

Regular symbols "0", "1" and "+"

The meanings of the other symbols ("0", "1" and "+") are unchanged. Only requests with the HEAD, GET and POST methods go to these scripts. Other requests are handled by Pegasus and are rejected, except for OPTIONS. You need not handle the HEAD method in these scripts, because that request is handled by Pegasus.

In summary, the differences in meaning among the symbols in the third field are listed in the following table.

                  limited method    all method
simple cgi        0
cgi/1.1           +                 *
non-parsed cgi    1
where "simple cgi" means that the CGI cannot control HTTP headers, and "limited method" means that only the GET, HEAD and POST methods are handled by the CGI.
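
The routing rule described above can be sketched in a few lines of Python. This is only an illustration of the table, not code from Pegasus: the function name "dispatch" and the return labels "script", "pegasus" and "rejected" are hypothetical.

```python
# Hypothetical sketch of the third-field rule; names are illustrative only.
LIMITED_METHODS = {"GET", "HEAD", "POST"}
KNOWN_SYMBOLS = {"0", "1", "+", "*"}


def dispatch(symbol: str, method: str) -> str:
    """Say who deals with a request for a handler line carrying `symbol`."""
    if symbol not in KNOWN_SYMBOLS:
        raise ValueError(f"unknown handler symbol: {symbol!r}")
    method = method.upper()
    if symbol == "*":
        return "script"      # the script must handle every method itself
    if method in LIMITED_METHODS:
        return "script"      # HEAD needs no special code in the script
    if method == "OPTIONS":
        return "pegasus"     # answered by the server, not rejected
    return "rejected"        # e.g. PROPFIND, PUT, DELETE


if __name__ == "__main__":
    for sym, meth in [("*", "PROPFIND"), ("+", "PROPFIND"), ("0", "OPTIONS")]:
        print(sym, meth, dispatch(sym, meth))
```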

Files that begin with "."

Dot files (files whose names begin with ".") have been specified as "accessible only via CGI".
Now this restriction applies only to the GET, HEAD and POST methods.
WebDAV must be able to handle all files, including dot files*. Therefore "*" in the third field of the handler also means that dot files are accepted.

* Otherwise the Finder of a Mac/OSX client becomes somewhat unstable.

Access to dot files from a Mac/OSX client is annoying and makes the client respond sluggishly. How can this access be prevented? You will find some tips on the topic in the URIs below.

I don't know how to prevent access to resource forks (files whose names begin with "._").

WebDAV script

A WebDAV script is bundled. Its name is "webdav". It is a modified version of "webdav.cgi" written by Yuno, and it is written in Perl. There are several versions of Perl for Plan 9. Perl version 5.004 does not work with webdav, but version 5.8.0 does. You will find the latter version at sources.bell-labs.com.

The script works for Mac/OSX, WinXP and optionally for Win2000. I have not yet tested other types of clients.
The Win2000 client requires dirty code in Pegasus. Therefore Win2000 support is enabled only by the compiler option

	-DWIN2000DAV
Pegasus compiled with this option loses some of the simple logical structure described in this document.

You can use Digest authentication. Of course, you may turn authentication off for debugging purposes.

Directory for log files.

The script "webdav" requires the directory
	$web/log
where "$web" is the httpd root under which "webdav" is serving.
The log files "webdav.log" and "webdav.dbg" will be created in this directory.

Usage of the script

When mounting a directory on the server, the client is prompted to enter a URI such as
	http://host/bar
This request does not mean that the directory
	$web/doc/bar
on the server is where your files are located, where "$web/doc" is the document root. The reason is
These considerations suggest that you would be happier if you could specify a real target other than "/doc/bar". Let the target be "/baz". Its value will be "/doc" in the case of home page maintenance. But generally speaking, it need not begin with "/doc".

Then the configuration of "$web/etc/handler" will be

	/bar	-	*	/bin/webdav /bar /baz
	/bar/*	-	*	/bin/webdav /bar /baz
Why is "/bar" given as the first argument of "/bin/webdav"?
Scripts do not know the first field of the handler. Therefore "/bar" is required so that the script can map an incoming request such as
	http://host/bar/quux
to the request "/baz/quux".
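
The mapping that the two arguments make possible can be sketched as follows. This is a minimal Python illustration, not the logic of the bundled Perl script; the function name "map_request_path" is hypothetical, and the handling of a trailing "/" is an assumption.

```python
# Hypothetical sketch: how a script could use its two arguments
# (URI prefix "/bar" and real target "/baz") to locate the file.
def map_request_path(request_path: str, prefix: str, target: str) -> str:
    """Map a request path under `prefix` to the matching path under `target`."""
    prefix = prefix.rstrip("/")
    target = target.rstrip("/")
    if request_path == prefix:
        return target or "/"
    if request_path.startswith(prefix + "/"):
        # Keep everything after the prefix, including the leading "/".
        return target + request_path[len(prefix):]
    raise ValueError(f"{request_path!r} is not under {prefix!r}")


if __name__ == "__main__":
    print(map_request_path("/bar/quux", "/bar", "/baz"))  # /baz/quux
```

Comparing against prefix + "/" rather than the bare prefix keeps a request such as "/barn" from being mistaken for one under "/bar".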

The script also works for a regular user's URI such as

	http://host/~alice/bar
and it works over https as well.

NB: The directory "$web/doc/bar" must not exist if WinXP is to be supported.
WinXP sends the request:
	PROPFIND /bar HTTP/1.1
and Pegasus redirects the path to "/bar/" if the directory exists.
However, WinXP does not react to that response and fails to establish the connection.

Creating web folder in WinXP

Due to a bug in WinXP, you need a trick when creating a web folder: a "?" mark at the end of the URI. An example is shown below.
	http://host/dav?

X-CGI-Pass

An extended CGI header "X-CGI-Pass" is added.
The header is really useful for scripts because it enables them to pass the request back to the host server.
Writing code to answer a GET request is bothersome. Why must we write that code? Servers already have the ability to answer the request!

The specification

The CGI header
	X-CGI-Pass: /baz
is a directive telling the server to process that file instead of the requested file, where "/baz" is an absolute path name in the httpd name space.

If "/baz" is equal to the requested file, you can omit the name:

	X-CGI-Pass:
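
As a rough illustration, a script could build the two forms of the header as below. The sketch is in Python for brevity and is not taken from the bundled script; the function name "cgi_pass_header" is hypothetical, and ending the header block with an empty line follows general CGI practice, which is assumed here rather than stated in this document.

```python
# Hypothetical sketch: build a CGI header block that hands the request
# back to the server instead of producing a body in the script.
from typing import Optional


def cgi_pass_header(path: Optional[str] = None) -> str:
    """Return the header block; `path=None` means the requested file itself."""
    if path is None:
        return "X-CGI-Pass:\n\n"
    if not path.startswith("/"):
        raise ValueError("expected an absolute path in the httpd name space")
    return f"X-CGI-Pass: {path}\n\n"


if __name__ == "__main__":
    # A script answering GET would simply print the directive and exit.
    print(cgi_pass_header("/baz"), end="")
```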

Environment variables

Introduced variables and discarded variables

The following environment variables are introduced,
where a "$" mark is prepended to each name.

The Pegasus-specific $REQUEST_PATH is discarded because it was the same as $target and because the name is misleading.
The semantics of $REQUEST_URI is changed to be the same as that of Apache. The variable in old scripts can be replaced by the new variable $request.

In summary, existing users of Pegasus must replace these two environment variables in their scripts with the new ones:

	$REQUEST_PATH    ->    $target
	$REQUEST_URI     ->    $request
Some tools are provided to find the files that contain these two variables in a pile of web documents.

URI related variables

Let "foo" be an executable file. The values of the related variables are shown for the following requests:
	http://host/foo/?bar
and
	http://host/~alice/foo/?bar
The URI related environment variables are listed below with examples.

Table 1. URI related environment variables of Pegasus 2.4
variable          request to host document  request to user's document   decoded?  specified by
HTTP URI          http://host/foo/?bar      http://host/~alice/foo/?bar            HTTP/1.1
$HTTP_SCHEME      http                      http                         NO        Pegasus
$HTTP_HOST        host                      host                         NO        Apache
$REQUEST_URI      /foo/?bar                 /~alice/foo/?bar             NO        Apache
$REQUEST_USER                               alice                        YES       Pegasus
$PATH_INFO        /                         /                            YES       CGI/1.1
$PATH_TRANSLATED  /doc/                     /doc/                        YES       CGI/1.1
$SCRIPT_NAME      /foo                      /~alice/foo                  YES       CGI/1.1
$QUERY_STRING     bar                       bar                          NO        Apache
$target           /foo/index.html           /foo/index.html              YES       Pegasus
$request          /foo/                     /foo/                        YES       Pegasus
NB:

What if script name is not in URI?

The CGI handler, or execution handler, of Pegasus is powerful. For example, we can configure it like this:
	/foo/*	-	+	/bin/baz
This means that all requests to paths that begin with "/foo/" go to the script "/bin/baz". Note that the CGI/1.1 specification assumes only the case in which the script name is in the URI. The environment variables $PATH_INFO, $PATH_TRANSLATED and $SCRIPT_NAME are defined on this assumption.
In contrast, the request to Pegasus
	http://host/foo/?bar
does not mean that "foo" is a script, nor even that "foo" exists.

Then what should the values of these environment variables be? The answer is unclear.
The CGI/1.1 specification says that the concatenation $SCRIPT_NAME$PATH_INFO must be the decoded path part of the URI.
So these values are simply assigned as shown below.

Table 2. $PATH_INFO, $PATH_TRANSLATED and $SCRIPT_NAME in case that script name is not in URI.
variable          request to host document  request to user's document   decoded?  specified by
HTTP URI          http://host/foo/?bar      http://host/~alice/foo/?bar            HTTP/1.1
$PATH_INFO        /foo/                     /~alice/foo/                 YES       CGI/1.1
$PATH_TRANSLATED  /doc/foo/                 /doc/foo/                    YES       CGI/1.1
$SCRIPT_NAME      (empty)                   (empty)                      YES       CGI/1.1
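
The assignment in Table 2 can be reproduced with a short Python sketch. It only mirrors the values shown in the table and is not the Pegasus implementation: the function name "cgi_path_variables", the default "/doc" document root, and the rule for dropping a leading "/~user" component are assumptions read off the two example columns.

```python
# Hypothetical sketch reproducing Table 2 for the case where the
# script name does not appear in the URI.
from urllib.parse import unquote, urlsplit


def cgi_path_variables(url: str, docroot: str = "/doc") -> dict:
    """Derive the path-related CGI variables from a request URL."""
    parts = urlsplit(url)
    path = unquote(parts.path)              # decoded path part of the URI
    local = path
    if path.startswith("/~"):
        # Drop the "/~user" component, as the user's-document column suggests.
        _user, sep, tail = path[2:].partition("/")
        local = sep + tail if sep else "/"
    return {
        "SCRIPT_NAME": "",                  # no script name in the URI
        "PATH_INFO": path,                  # so PATH_INFO is the whole path
        "PATH_TRANSLATED": docroot + local,
        "QUERY_STRING": parts.query,        # left undecoded
    }


if __name__ == "__main__":
    for url in ("http://host/foo/?bar", "http://host/~alice/foo/?bar"):
        print(cgi_path_variables(url))
```

With an empty $SCRIPT_NAME, the concatenation $SCRIPT_NAME$PATH_INFO is still the decoded path part of the URI, which is the property the text above relies on.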

Authentication

Introduction

To begin this section, I will review the format of "$web/etc/passwd" in the previous version of Pegasus.
Pegasus supports both the Basic and Digest authentication schemes. The password file is "$web/etc/passwd". An example follows:
alice	 5c55d71b4c47d141072cf0540c046d07   /foo
alice    72e8979b4e26d67fe4920e3fbd2ebffb   /bar alice@hera
You will observe two types of lines: a line that consists of three fields, and a line that consists of four fields.
The former is the old format, for Basic authentication only, and the latter is the new format, for both Basic and Digest.
In both formats, the second field is an MD5 sum that is derived as follows:
	echo -n blackcat | md5sum
for the old format, and
	echo -n alice:alice@hera:blackcat | md5sum
for the new format. In these examples, "blackcat" is the user's password and "alice@hera" is the authentication realm.
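
The same two sums can be computed without a shell, for example with Python's standard hashlib module. This is a convenience sketch only; the function names are hypothetical, and UTF-8 encoding of the input is an assumption.

```python
# Hypothetical helpers computing the second field of "$web/etc/passwd".
import hashlib


def old_format_digest(password: str) -> str:
    """MD5 of the bare password (old format, Basic only)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def new_format_digest(user: str, realm: str, password: str) -> str:
    """MD5 of "user:realm:password" (new format, Basic and Digest)."""
    text = f"{user}:{realm}:{password}"
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(old_format_digest("blackcat"))
    print(new_format_digest("alice", "alice@hera", "blackcat"))
```

The new-format string "user:realm:password" has the same shape as the A1 value used by HTTP Digest authentication, which is why one stored sum can serve both schemes.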

Multiple user names in "$web/etc/passwd".

Only a single user name has been allowed for a protected directory. However, Windows does not allow a user name such as "alice" for WebDAV. Alice is forced to choose one of the following two user names:
	alice@host
	host\alice
where "host" is the domain name or the IP address of the host.

One solution is to restrict user names to one of the Windows formats.
The shortcoming is that Mac/OSX users are then forced to enter a user name in the Windows format.

Another solution is to allow multiple user names for a single path name.
An example is shown below.

alice    	...   /foo alice@hera
alice@host 	...   /foo alice@hera
where "..." is an MD5 sum.

Pegasus 2.4 allows multiple user names for a single path name.

Mixture of Basic and Digest authentication

Now the old and the new formats may both be given for a single path, as shown below.
alice    ...   /foo
alice    ...   /foo realm
where "..." is an MD5 digest, "/foo" is the path to be protected, and "realm" is the authentication realm. The first line is in the old format, for Basic authentication only, and the second is in the new format, for both Basic and Digest authentication.

New keywords, Basic and Digest, in the fifth field

In the new format, the actual scheme is determined by negotiation with the client. However, this ambiguity might cause problems in some situations. Therefore new keywords, Basic and Digest, are introduced. Usage examples follow:
alice    ...   /foo realm Basic
alice    ...   /foo realm Digest
These keywords are case insensitive. The actual implementation looks only at the first letter, "B" or "D".
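
The three line formats can be told apart by field count, as in the following Python sketch. It is an illustration under assumptions, not the Pegasus parser: the names "PasswdEntry" and "parse_passwd_line" are hypothetical, fields are assumed to be separated by white space, and a realm is assumed to contain no white space.

```python
# Hypothetical sketch: classify one line of "$web/etc/passwd".
from typing import NamedTuple, Optional


class PasswdEntry(NamedTuple):
    user: str
    digest: str
    path: str
    realm: Optional[str]
    schemes: frozenset


def parse_passwd_line(line: str) -> PasswdEntry:
    fields = line.split()
    if not 3 <= len(fields) <= 5:
        raise ValueError(f"expected 3 to 5 fields, got {len(fields)}")
    user, digest, path = fields[:3]
    if len(fields) == 3:
        # Old format: Basic authentication only, no realm.
        return PasswdEntry(user, digest, path, None, frozenset({"Basic"}))
    realm = fields[3]
    if len(fields) == 4:
        # New format without keyword: scheme negotiated with the client.
        schemes = frozenset({"Basic", "Digest"})
    else:
        # Only the first letter of the keyword is examined.
        first = fields[4][:1].upper()
        if first == "B":
            schemes = frozenset({"Basic"})
        elif first == "D":
            schemes = frozenset({"Digest"})
        else:
            raise ValueError(f"unknown keyword: {fields[4]!r}")
    return PasswdEntry(user, digest, path, realm, schemes)


if __name__ == "__main__":
    print(parse_passwd_line("alice ... /foo realm digest"))
```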

Parameter "allowbasic" is gone

The parameter "allowbasic" in "/sys/lib/httpd.conf" is gone. Now Basic authentication is always permitted for all users. Users are encouraged to use one of the following two forms.
alice    ...   /foo realm
or
alice    ...   /foo realm Digest

Changes in /sys/lib/httpd.conf

allowbasic

The parameter "allowbasic" is gone, as mentioned in the previous section.

maxpost

What was the problem?

The data size of the POST method is restricted. The old format was
# maxpost	10	# maximum size of post data (in unit of MB)
To support WebDAV, the limit now applies to any method that has content. With WebDAV, we will want to upload large data. However, allowing unauthenticated users to upload large data might cause another problem.

The solution

Now the limit is set separately for two types of users: authenticated and non-authenticated. The current default configuration is shown below.
# maxpost1	10	# maximum size of post data (in unit of MB) for unauthorized
# maxpost2	100	# maximum size of post data (in unit of MB) for authorized
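
How the two limits might be applied to an incoming request body is sketched below in Python. The function names are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption; this document does not say how Pegasus converts the unit.

```python
# Hypothetical sketch of choosing the size limit by authentication state.
MB = 1024 * 1024  # assumption: the config unit "MB" is 2**20 bytes


def body_limit_bytes(authenticated: bool, maxpost1: int = 10, maxpost2: int = 100) -> int:
    """Return the body size limit in bytes for this kind of user."""
    return (maxpost2 if authenticated else maxpost1) * MB


def accept_body(content_length: int, authenticated: bool) -> bool:
    """True if a body of `content_length` bytes is within the limit."""
    return 0 <= content_length <= body_limit_bytes(authenticated)


if __name__ == "__main__":
    print(accept_body(50 * MB, authenticated=False))
    print(accept_body(50 * MB, authenticated=True))
```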

NB: Pegasus currently holds received data in memory. Therefore we cannot assign a sufficiently large value to maxpost2. The problem should be fixed in the near future.