
Concept


2002/08/26 update
2002/04/02 update
2002/02/12

Encapsulation of name space

Generally speaking, the `server root' of a web server has no essential meaning.
For example, ServerRoot in the httpd.conf of the Apache web server is merely the directory in which log files and some configuration files are located.

The `server root' of a traditional web server does nothing to restrict access to the name space that the server is serving.
The problem becomes clear if we run CGI programs on the server:
all the files in the system are visible to the CGI programs.
This is potentially a serious security problem.
For this reason, users' CGI programs are either prohibited or placed under the control of the system administrator.

Fig.1 illustrates the relation among three name spaces: console space, service space, and document space.
Console space is the set of files that can be seen from the console, which is also the set of all files on the system. (Console space is shown by the bold rectangle.)
Service space is the set of files that the web server serves to clients. It is also the set of files that can be accessed by CGI programs.
In a traditional web server, service space is exactly equal to console space.
Document space is a set of documents that make up web pages.
The document space of alice is the set of documents that can be accessed using the URI /~alice/.
The document spaces of all users lie in the service space, and therefore they can be accessed equally by any CGI program.

The first essential progress was made by the httpd of Plan9 second edition.
The server could encapsulate service space under the server root (Fig.2).
The technique relied on a special ability of Plan9: per-process name spaces.
The server root became "/" in the name space seen by the CGI programs of this server. That is, CGI programs were confined to the name space specified by the server root.
Fig.2 illustrates the encapsulation. The gray area is the set of files outside service space; this area is essentially hidden from CGI programs.
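
The difference between Fig.1 and Fig.2 can be sketched with a toy model. This is not how Plan9 implements per-process name spaces; it only shows which names remain visible to a CGI program once the server root becomes "/". The function name and the sample paths are assumptions made for illustration.

```python
import posixpath

def service_space(console_files, server_root):
    """Return the names a CGI program can see when server_root becomes "/".

    console_files: iterable of absolute paths in console space.
    server_root:   absolute path of the server root (hypothetical here).
    """
    root = posixpath.normpath(server_root)
    prefix = root.rstrip("/") + "/"
    visible = set()
    for path in console_files:
        path = posixpath.normpath(path)
        if path.startswith(prefix):
            # The file is renamed relative to the new root.
            # Files outside the root have no name at all in service space.
            visible.add("/" + path[len(prefix):])
    return visible

# Hypothetical console space.
console = [
    "/adm/users",
    "/usr/alice/lib/profile",
    "/usr/web/doc/index.html",
    "/usr/web/bin/rc",
]

# Fig.2: service space encapsulated under the server root.
seen_by_cgi = service_space(console, "/usr/web")

# Fig.1: a traditional server, where service space equals console space.
seen_by_traditional_cgi = service_space(console, "/")
```

In the encapsulated case, /adm/users and the files of alice simply do not exist for the CGI program, so no access control on them is needed.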

However, the httpd of Plan9 2ed was based on HTTP/1.0 and handled the CGI environment poorly:
the location of CGI programs was fixed and the POST method was not supported.
Charles Forsyth proposed another design with his server (York httpd), which fixed these problems. (I therefore used his httpd for a long time, though it was based on HTTP/1.0.)

The httpd of Plan9 3ed supported HTTP/1.1, but made no advance in the CGI environment.

I reconstructed the httpd of Plan9 3ed to introduce a new design and named the server Pegasus (1 Jan. 2002; this is the year of the horse in Chinese astrology).
The standard Plan9 httpd now supports TLS. Pegasus 1.2 also supports TLS and is built on the library of Plan9 4ed.

Separated name space based on document management

A server (some.dom.com) may be accessed in various ways:
	http://some.dom.com/pathname	# main document
	http://some.dom.com/~alice/pathname	# user's document
	http://other.dome.com/pathname	# a document of virtual host (different IP)
	http://virtual.dom.com/pathname	# a document of virtual host (same IP)
Generally speaking, these documents are managed and administered by different people.

One of the problems of traditional web servers (including that of Plan9) is that the service space is shared among these people (see Fig.1 and Fig.2).
(Note that the problem is similar to that of the address space of early personal computers, which did not support logical addresses.)
This means that a CGI program belonging to one person can look at the documents of other people. There is therefore a potential for interference among the people who have documents on the web server.

Pegasus fixed this problem.

All the web documents of a person have a single document root, /doc, in service space. That is, service space contains the document root of only one person at a time. (See Fig.3a and 3b.)
Fig.3a illustrates the relation between service space and document space when the server is accessed by /~alice.
If the server is accessed by /~bob, the relation changes to that of Fig.3b.
This means that every CGI program is isolated from the documents of other people.
CGI programs cannot access those files, not because of the access control of each file but because of the scope of the name space.
There should be no need to access the files of other people via CGI. Therefore this server design should work without problems.
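
The switch between Fig.3a and Fig.3b can be sketched as follows. This is a toy model, not the code of Pegasus: the function names, the owner label "main", and the directories in DOC_DIRS are all assumptions made for illustration. Only the document root /doc and the URI forms come from the text above.

```python
# Hypothetical console-space directories that would be bound onto /doc.
DOC_DIRS = {
    "main": "/usr/web/doc",
    "alice": "/usr/alice/web",
    "bob": "/usr/bob/web",
}

def document_owner(host, uri, main_host="some.dom.com"):
    """Decide whose document space a request belongs to.

    Returns (owner, path), where path is the name of the requested
    document under /doc in that owner's service space.
    """
    if uri.startswith("/~"):
        # /~alice/pathname -> owner alice, /doc/pathname
        user, _, rest = uri[2:].partition("/")
        return user, "/doc/" + rest
    if host != main_host:
        # A virtual host is treated here as an owner of its own.
        return host, "/doc/" + uri.lstrip("/")
    return "main", "/doc/" + uri.lstrip("/")

def console_path(host, uri):
    """Map a request to the console-space file behind /doc (illustration only)."""
    owner, path = document_owner(host, uri)
    return DOC_DIRS[owner] + path[len("/doc"):]

alice_view = document_owner("some.dom.com", "/~alice/index.html")
bob_view = document_owner("some.dom.com", "/~bob/index.html")
```

Both requests see the same name, /doc/index.html, but the name refers to different files. A CGI program started for /~alice has no name at all for the documents of bob.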

Currently only Pegasus has this ability.

Protection of files from wrong access

There may be files that should be protected from improper access, direct or indirect, by users of the server. A typical indirect access comes from the CGI programs of other people.
A user alice might therefore want her file `data' to be accessible only by herself and by her CGI.

Pegasus resolves this problem.

1. Run the server as user web.
(The user web is not a real user, so it does not need to own any files.)
2. Add web to /adm/users as a member of the group alice:

	alice:alice:web
3. alice sets the access permissions of data to:
	-lrw-rw---- alice alice .... data
(in the case of `write' access), or as an append-only file:
	a-rw-rw---- alice alice .... data
which may be a safer way to avoid losing her data.

Note: other files should be set as:

	--rw-r--r-- alice alice .... data
(if the file may be read by anyone.)
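
The effect of these settings can be sketched with a simplified owner/group/other check. This is an assumption made for illustration, not the exact rule of the Plan9 file server, and it ignores the exclusive-use and append-only flags. The group table is a plain dictionary rather than a parsed /adm/users file, and the user bob is hypothetical.

```python
def allowed(user, want, owner, group, mode, members):
    """Simplified permission check (illustration only).

    want:    'r' or 'w'
    mode:    octal permission bits, e.g. 0o660 for rw-rw----
    members: dict mapping a group name to the set of its members
    """
    bit = {"r": 4, "w": 2}[want]
    if user == owner:
        perms = (mode >> 6) & 7      # owner bits
    elif user in members.get(group, set()):
        perms = (mode >> 3) & 7      # group bits
    else:
        perms = mode & 7             # bits for everyone else
    return bool(perms & bit)

# web is a member of the group alice, as in step 2.
members = {"alice": {"alice", "web"}}

# data is rw-rw----: the server (user web) may write it, bob may not read it.
web_can_write_data = allowed("web", "w", "alice", "alice", 0o660, members)
bob_can_read_data = allowed("bob", "r", "alice", "alice", 0o660, members)

# An ordinary file is rw-r--r--: readable by anyone, writable only by alice.
web_can_write_other = allowed("web", "w", "alice", "alice", 0o644, members)
```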

Why is the problem of access protection solved so simply?
Because the service space of Pegasus is encapsulated per user.
Unix resolves this problem using a CGI wrapper (for example, see http://download.sourceforge.net/cgiwrap).
That is, the CGI wrapper is installed setuid root and httpd is forced to run CGI programs only via the wrapper.

Comparing the two methods, we can conclude that:
1. The Pegasus method is safer than the CGI wrapper, because under the CGI wrapper all of a user's files are endangered if the user writes a faulty CGI. Under Pegasus, on the other hand, only the files that grant write access to `web' are endangered.
2. The Pegasus method is much easier to administer. There is almost nothing to administer; the only thing to do is to run Pegasus as user `web'.

Virtual document environment with high freedom

The current goal of Pegasus is to offer CGI to users with both high security and high freedom.

Under the standard configuration, a file is a CGI program of Pegasus if:

  1. the execute flag is set for `others'
  2. the file suffix is http or html
  3. it is located in document space
  4. its name does not start with a period

The NCSA and Apache servers also have an option that permits users to place CGI files in document space. However, if safety is required, they prohibit this and force users to place all CGI programs in the directory /cgi-bin/.
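
The four conditions of the standard configuration can be written as a single predicate. This is a minimal sketch and not the code of Pegasus; the function name is hypothetical, and the document root /doc is taken from the description of service space above.

```python
import posixpath

def is_cgi(path, mode, doc_root="/doc"):
    """Return True if a file meets the four conditions listed above.

    path: absolute path of the file in service space
    mode: octal permission bits of the file, e.g. 0o755
    """
    path = posixpath.normpath(path)
    name = posixpath.basename(path)
    suffix = name.rsplit(".", 1)[-1] if "." in name else ""
    return (
        bool(mode & 0o001)                    # 1. execute flag set for `others'
        and suffix in ("http", "html")        # 2. suffix is http or html
        and path.startswith(doc_root + "/")   # 3. located in document space
        and not name.startswith(".")          # 4. name does not start with a period
    )

executable_page = is_cgi("/doc/app/form.html", 0o755)
plain_page = is_cgi("/doc/app/form.html", 0o644)
```

An ordinary HTML page fails the first condition, so it is served as a static document, while the same file with the execute flag set for `others' is run as a CGI program.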

Pegasus realizes virtual documents using execution handlers.
A so-called CGI file of Pegasus is one special configuration of an execution handler.
The term `execution handler' may need some explanation, because it may be original to Pegasus.
An `execution handler' is a program that processes files requested by clients.
The user defines the relation between a path pattern of the request and the program that processes it. (The definition is written in /etc/handler in service space.) We call the program the `handler' of the file. If the requested file is the same as its handler, the file is a traditional CGI file.
A special handler can be assigned to files with a special suffix. Thus we can introduce Server Side Include using an execution handler.
A special handler can also be assigned to specific directories. Thus we can introduce an auto-indexing mechanism for the directories of an FTP service.
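
The idea of mapping path patterns to handlers can be sketched as below. The real syntax of /etc/handler is not described here, so the table is an in-memory list with shell-style patterns, which is an assumption; the handler programs /bin/ssi and /bin/autoindex, the suffix .shtml, and the directory /ftp are hypothetical names chosen for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical handler table: (path pattern, handler program).
# The first matching pattern wins.
HANDLERS = [
    ("*.shtml", "/bin/ssi"),        # special suffix -> Server Side Include
    ("/ftp/*", "/bin/autoindex"),   # specific directory -> auto-indexing
]

def find_handler(request_path, is_cgi_file=False):
    """Return the program that should process request_path, or None.

    is_cgi_file: True if the requested file itself qualifies as a CGI file.
    """
    for pattern, handler in HANDLERS:
        if fnmatchcase(request_path, pattern):
            return handler
    # A file that is its own handler is a traditional CGI file.
    return request_path if is_cgi_file else None

ssi_handler = find_handler("/doc/page.shtml")
index_handler = find_handler("/ftp/pub/")
cgi_handler = find_handler("/doc/form.html", is_cgi_file=True)
```

Because the table lives in service space, each user could keep a table of their own without involving the system administrator.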

`Execution handlers' may be applied to a wide range of applications, and I would like to emphasize that they are completely controlled by users (not by the system administrator).