ASN.1 and BER: assembly and parsing

The ber.c module provides a library of utilities to manipulate BER-encoded information.

At the core of the module is a simple recursive structure:

	typedef struct berObj BerObj;
	struct berObj {
		int tag, len, size;
		char *buf;
		BerObj *next, *down;
	};

which is used both to construct new BER records and to parse existing ones.

In the case of parsing BER structures, one starts with a binary string (an octet string, in the terminology of ITU-T Recommendation X.690) which is assimilated into the buffer of the outermost structure.  At this level there are no siblings; any progeny is represented by a single child (down) and its siblings in a linked list (down->next, etc.).  We do not store an indication that an entry is composite (constructed): the existence of a non-nil "down" link suffices in this role.
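
A minimal sketch of such a parse, not the ber.c code: it assumes single-octet identifiers (tag numbers below 31) and definite lengths of at most three length octets, and it is called sketch_parse (a hypothetical name) to keep it apart from ber_parse().  In keeping with the description above, the constructed bit is not stored (tag keeps class and number), and every buf points into the caller's string rather than to a copy.

```c
#include <stdlib.h>

typedef unsigned char uchar;

/* Same layout as the BerObj of ber.c, repeated so the sketch stands alone. */
typedef struct berObj BerObj;
struct berObj {
	int tag, len, size;
	char *buf;
	BerObj *next, *down;
};

/*
 * Hypothetical parser for [s, se): returns the list of sibling objects
 * found there, or NULL on malformed input.  Limits: single-octet
 * identifiers, definite lengths of at most three length octets.
 * Objects allocated before an error are leaked, for brevity.
 */
static BerObj *sketch_parse(uchar *s, uchar *se)
{
	BerObj *head = NULL, **tail = &head;

	while (s < se) {
		int id, len, n;
		BerObj *o;

		if (se - s < 2)
			return NULL;
		id = *s++;
		if ((id & 0x1f) == 0x1f)	/* high-tag-number form: not handled */
			return NULL;
		len = *s++;
		if (len & 0x80) {		/* long form: low 7 bits count length octets */
			n = len & 0x7f;
			if (n == 0 || n > 3 || se - s < n)	/* n == 0 is indefinite */
				return NULL;
			for (len = 0; n > 0; n--)
				len = (len << 8) | *s++;
		}
		if (se - s < len)
			return NULL;
		if ((o = calloc(1, sizeof *o)) == NULL)
			return NULL;
		o->tag = id & ~0x20;	/* class + number; "down" marks constructed */
		o->len = len;
		o->buf = (char *)s;	/* points into the input: no copy */
		if ((id & 0x20) && len > 0 && (o->down = sketch_parse(s, s + len)) == NULL)
			return NULL;
		*tail = o;
		tail = &o->next;
		s += len;
	}
	return head;
}
```

For a well-formed single record the returned list has exactly one element, matching the "no siblings at the outermost level" rule.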

The output of the parse procedure is then a tree of BER objects, each level a linked list of siblings, in which the inner buffer pointers connect each object with the portion of the outer string that comprises its actual value.  For the time being, it is left to the user to interpret the tag and transform the value into some internal representation.  A single function could probably be provided to deal with the fundamental (primitive) types.
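
One shape the "single function" for primitive types might take, sketched here for INTEGER and BOOLEAN contents only.  The names sketch_int and sketch_bool are hypothetical; the decoding rules are those of X.690 (INTEGER contents are big-endian two's complement; any non-zero BOOLEAN octet means TRUE).

```c
typedef unsigned char uchar;

/*
 * Hypothetical: decode INTEGER contents (big-endian two's complement).
 * Returns 0 and stores the value, or -1 if empty or too long for a long.
 */
static int sketch_int(const uchar *v, int len, long *out)
{
	long r;
	int i;

	if (len < 1 || len > (int)sizeof(long))
		return -1;
	r = (v[0] & 0x80) ? -1 : 0;	/* sign-extend from the first octet */
	for (i = 0; i < len; i++)
		r = r * 256 + v[i];	/* arithmetic, so no shifts of negative values */
	*out = r;
	return 0;
}

/* Hypothetical: decode BOOLEAN contents, which are exactly one octet. */
static int sketch_bool(const uchar *v, int len, int *out)
{
	if (len != 1)
		return -1;
	*out = v[0] != 0;
	return 0;
}
```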

For my own convenience:

	0	Reserved for use by the encoding rules
	1	Boolean type
	2	Integer type
	3	Bitstring type (first content octet gives the number of unused bits in the last octet)
	4	Octetstring type
	5	Null type (always empty)
	6	Object identifier type
	7	Object descriptor type
	8	External type and Instance-of type
	9	Real type
	10	Enumerated type
	11	Embedded-pdv type
	12	UTF8String type
	13	Relative object identifier type
	14-15	Reserved for future extensions
	16	Sequence and Sequence-of type
	17	Set and Set-of type
	18-22, 25-30	Character string types
	23, 24	Time types
	31-..	Reserved for addenda
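
The numbers above are tag numbers.  On the wire they share the first identifier octet with two class bits and the constructed bit, and the value 31 in the low five bits announces the multi-octet (high-tag-number) form, per X.690.  A sketch of identifier decoding under those rules; Ident and decode_ident are illustrative names, not part of ber.c.

```c
#include <stddef.h>

typedef unsigned char uchar;

/* Hypothetical decoded identifier. */
typedef struct {
	int cls;		/* 0 universal, 1 application, 2 context-specific, 3 private */
	int constructed;	/* bit 6 of the first octet */
	unsigned long number;	/* tag number, e.g. 16 for SEQUENCE */
} Ident;

/*
 * Decode the identifier octets at p (n octets available).
 * Returns the number of octets consumed, or 0 on malformed input.
 */
static size_t decode_ident(const uchar *p, size_t n, Ident *id)
{
	size_t i;

	if (n < 1)
		return 0;
	id->cls = p[0] >> 6;
	id->constructed = (p[0] >> 5) & 1;
	id->number = p[0] & 0x1f;
	if (id->number != 0x1f)		/* low-tag-number form: tags 0-30 */
		return 1;
	/* high-tag-number form: base-128 digits, bit 8 set on all but the last */
	id->number = 0;
	for (i = 1; i < n; i++) {
		if (id->number > (~0UL >> 7))	/* next shift would overflow */
			return 0;
		id->number = (id->number << 7) | (p[i] & 0x7f);
		if (!(p[i] & 0x80))
			return i + 1;
	}
	return 0;	/* ran out of input */
}
```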

This format fails to handle very large objects, whereas the design of BER allows the input string to be processed sequentially.  An iterator without large memory requirements would be useful for parsers running in small memory.  This may well be added in the foreseeable future.

	For the time being, we'll give priority to a double check:
	(a) that we can allocate a sufficiently large buffer for the
	input data and (b) that we do not exceed an arbitrary size
	laid down by the user.  Currently, no such test is present.
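
For check (b), the natural place is the length decoder, where an oversized value can be refused before anything is allocated.  A sketch, with decode_len and its limit parameter as hypothetical names; indefinite lengths are rejected, in line with the library's preference for definite lengths.

```c
#include <stddef.h>

typedef unsigned char uchar;

/*
 * Hypothetical length decoder.  p points at the length octets, n says
 * how many are available.  Returns the octets consumed and stores the
 * length in *len, or returns 0 for truncated input, the indefinite
 * form, or a length above the caller's limit.
 */
static size_t decode_len(const uchar *p, size_t n, size_t limit, size_t *len)
{
	size_t i, k, v;

	if (n < 1)
		return 0;
	if (p[0] < 0x80) {		/* short form: 0-127 */
		v = p[0];
		k = 1;
	} else {			/* long form: low 7 bits count the octets */
		k = p[0] & 0x7f;
		if (k == 0 || k > sizeof(size_t) || n < k + 1)
			return 0;	/* k == 0 is the indefinite form */
		for (v = 0, i = 1; i <= k; i++)
			v = (v << 8) | p[i];
		k++;			/* include the initial octet */
	}
	if (v > limit)			/* check (b): user-imposed ceiling */
		return 0;
	*len = v;
	return k;
}
```

Check (a) then reduces to testing the result of the allocation sized from *len.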

The assembly of BER objects separates quite naturally into two distinct operations: constructing primitive objects and composing them into constructed ones.  Primitive objects can be built very simply by a generic operation or, more suitably, by a small set of specialised functions that differ only in minor details.
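
As an example of a specialised constructor, a sketch that encodes a C long as a complete INTEGER (identifier, length, contents).  These notes don't say whether ber_int2() and ber_int4() emit fixed-width or minimal contents; this hypothetical enc_int produces the minimal two's-complement form, since X.690 forbids a redundant leading 0x00 or 0xff octet in INTEGER contents.

```c
typedef unsigned char uchar;

/*
 * Hypothetical encoder for a complete INTEGER TLV (tag 2, short-form
 * length).  out must hold at least sizeof(long) + 2 octets.  Leading
 * octets are dropped while doing so leaves the sign bit unchanged.
 * Returns the number of octets written.
 */
static int enc_int(long v, uchar *out)
{
	uchar tmp[sizeof(long)];
	int n = sizeof(long), i, start = 0;
	unsigned long u = (unsigned long)v;	/* well-defined modular conversion */

	for (i = n - 1; i >= 0; i--) {		/* big-endian octets of v */
		tmp[i] = u & 0xff;
		u >>= 8;
	}
	while (start < n - 1 &&
	       ((tmp[start] == 0x00 && !(tmp[start + 1] & 0x80)) ||
	        (tmp[start] == 0xff && (tmp[start + 1] & 0x80))))
		start++;
	out[0] = 0x02;
	out[1] = (uchar)(n - start);
	for (i = start; i < n; i++)
		out[2 + i - start] = tmp[i];
	return 2 + n - start;
}
```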

This is the public interface for object construction:

	// object creation
	BerObj *ber_simple (int tag, uchar *s, int len);  // generic assembly
	BerObj *ber_int2 (int);                           // 2-octet integer
	BerObj *ber_int4 (long);                          // 4-octet integer
	BerObj *ber_intarb (char *s, int len);            // arbitrary-length (integer?)
	BerObj *ber_bool (int);                           // boolean
	BerObj *ber_ostr (uchar *, int);                  // length in octets
	BerObj *ber_bstr (uchar *, int);                  // length in bits
	BerObj *ber_objid (char *);                       // object ID, NUL terminated, period delimited
	BerObj *ber_init (int);                           // object initialisation
	BerObj *ber_attach (BerObj *, BerObj *);          // object composition
	int ber_seal (BerObj *);                          // object sealing

	BerObj *ber_parse (uchar *s, uchar *se);          // object deconstruction
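
ber_objid() takes a NUL-terminated, period-delimited string.  A sketch of the conversion such a function has to perform: per X.690 the first two arcs are combined as 40*X+Y, and every subidentifier is written in base 128 with bit 8 set on all but its last octet.  enc_oid and put_arc are hypothetical names, and only the contents octets are produced.

```c
#include <limits.h>
#include <stdlib.h>

typedef unsigned char uchar;

/* Append v in base 128, high group first, bit 8 set on all but the last octet. */
static int put_arc(unsigned long v, uchar *out, int pos, int max)
{
	uchar tmp[(sizeof v * CHAR_BIT + 6) / 7];
	int n = 0;

	do {
		tmp[n++] = v & 0x7f;
		v >>= 7;
	} while (v > 0);
	if (pos + n > max)
		return -1;
	while (n-- > 0)
		out[pos++] = tmp[n] | (n > 0 ? 0x80 : 0);
	return pos;
}

/*
 * Hypothetical encoder: dotted string -> OBJECT IDENTIFIER contents
 * octets (no identifier or length).  Returns the number of octets
 * written to out, or -1 for malformed input or a short buffer.
 */
static int enc_oid(const char *s, uchar *out, int max)
{
	unsigned long arc[64];
	int narc = 0, pos, i;
	char *end;

	for (;;) {
		if (narc == 64 || *s < '0' || *s > '9')
			return -1;
		arc[narc++] = strtoul(s, &end, 10);
		if (*end == '\0')
			break;
		if (*end != '.')
			return -1;
		s = end + 1;
	}
	if (narc < 2 || arc[0] > 2 || (arc[0] < 2 && arc[1] > 39) ||
	    arc[1] > ULONG_MAX - 80)
		return -1;
	/* the first two arcs share one subidentifier: 40 * X + Y */
	pos = put_arc(arc[0] * 40 + arc[1], out, 0, max);
	for (i = 2; i < narc && pos >= 0; i++)
		pos = put_arc(arc[i], out, pos, max);
	return pos;
}
```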

The generic assembly of a primitive BER object is mirrored in the
operation used to attach a new child to an existing constructed
object.  The child object is linked to the outer (parent) object at
the "down" link, or appended to the linked list that starts with
"down" and proceeds with "down->next", "down->next->next", etc.

Because the parent object is exclusively a container, assembly is
practically complete once the last child is attached to it (or to one
of its own children).  However, whereas primitive objects are finalised
at creation, constructed objects cannot be finalised until their total
length has been computed.  This assumes that one prefers definite to
indefinite lengths; our library operates under this assumption,
although a simpler procedure to produce BER objects with
indefinite-length entries ought to be available.  The "seal" operation
provides the finalisation.  Unsealed BER objects contain tag, length
and buffer fields, as well as a size field that specifies the capacity
of the buffer.  Sealing recursively descends the hierarchy and
reallocates the buffer to make room for a header holding the shortest
representations of the tag and length, followed by the value.  The
size field is then set to -1 to indicate that the object is sealed.
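
The work that sealing has to do can be sketched without the in-place reallocation: measure each subtree bottom-up, then write identifiers, shortest-form lengths and values into one buffer.  This two-pass variant is a substitute technique, not what ber_seal() does.  It assumes tag holds the identifier octet without the constructed bit (as in a parse that drops it), handles low tag numbers only and ignores int overflow; content_len, write_obj and encode_tree are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned char uchar;

typedef struct berObj BerObj;
struct berObj {
	int tag, len, size;
	char *buf;
	BerObj *next, *down;
};

/* Octets taken by a definite length in its shortest form. */
static int lenlen(int len)
{
	int n = 1;

	if (len < 0x80)
		return 1;		/* short form */
	while (len > 0) {		/* long form: count octet + value octets */
		n++;
		len >>= 8;
	}
	return n;
}

/* Contents length: own value for a primitive, children's encodings otherwise. */
static int content_len(BerObj *o)
{
	BerObj *c;
	int k, n = 0;

	if (o->down == NULL)
		return o->len;
	for (c = o->down; c != NULL; c = c->next) {
		k = content_len(c);
		n += 1 + lenlen(k) + k;	/* identifier + length + contents */
	}
	return n;
}

/* Write o (identifier, length, contents) at out; returns octets written. */
static int write_obj(BerObj *o, uchar *out)
{
	int k = content_len(o), ll = lenlen(k), n = 0, i;
	BerObj *c;

	out[n++] = o->tag | (o->down != NULL ? 0x20 : 0);	/* constructed bit */
	if (ll == 1)
		out[n++] = k;
	else {
		out[n++] = 0x80 | (ll - 1);
		for (i = ll - 2; i >= 0; i--)
			out[n++] = (k >> (8 * i)) & 0xff;
	}
	if (o->down == NULL) {
		if (k > 0)
			memcpy(out + n, o->buf, k);
		n += k;
	} else
		for (c = o->down; c != NULL; c = c->next)
			n += write_obj(c, out + n);
	return n;
}

/* Encode the whole tree into one new buffer; *n receives its length. */
static uchar *encode_tree(BerObj *o, int *n)
{
	int k = content_len(o);
	uchar *p;

	*n = 1 + lenlen(k) + k;
	if ((p = malloc(*n)) != NULL)
		write_obj(o, p);
	return p;
}
```

In the resulting buffer each child's value is a contiguous slice of the outer encoding, which is the canonical representation described further on.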

	What's missing here is a clarification as to which buffer is
	reallocated and where the header is added (given BerObj "A",
	it is the outermost buffer that is being manipulated by
	ber_seal(&A), by assimilating its progeny's buffers into it -
	I need to confirm this from the code).  It is amusing that I
	managed to code the procedure without quite as clear an idea
	of its operation as is necessary to document it.  The
	ambiguity below stems from precisely this lack of clarity.

	Note for example that the attachment of a new component to an
	object is restricted to unsealed objects.  More accurately, it
	is essential for the attachment procedure to establish whether
	the object is already sealed.  Whether the component is then
	added or not, if necessary by first unsealing the object, is
	an API decision.
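
A sketch of one such API decision, namely refusing: it assumes, as these notes do, that size == -1 marks a sealed object.  sketch_attach is a hypothetical stand-in for ber_attach(), whose return convention isn't stated here; this one returns the parent, or NULL when it refuses.

```c
#include <stddef.h>

typedef struct berObj BerObj;
struct berObj {
	int tag, len, size;
	char *buf;
	BerObj *next, *down;
};

/*
 * Hypothetical attach: append exactly one child at the end of the
 * parent's "down" list.  Refuses when an argument is missing or the
 * parent is already sealed (size == -1).
 */
static BerObj *sketch_attach(BerObj *parent, BerObj *child)
{
	BerObj **pp;

	if (parent == NULL || child == NULL || parent->size == -1)
		return NULL;
	for (pp = &parent->down; *pp != NULL; pp = &(*pp)->next)
		;			/* walk to the end of the sibling list */
	*pp = child;
	child->next = NULL;
	return parent;
}
```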

	Having studied the implementation of ber_seal() in some depth,
	I'd hazard that we want some alterations.  There seems little
	reason to report failure when the object is already sealed;
	failure should be reserved for operations on the object that
	genuinely fail (most notably memory (re)allocation).  A
	previously sealed object (identified by a size field set to
	-1) could return a value corresponding to its length and be
	used unaltered in the sealing process.  There is no obvious
	reason not to treat a seal operation on a previously sealed
	object as an identity function.

	By the same token, a zero size might be preferable to a -1
	value in the sealed object.  If the object has size zero
	_and_ buffer length zero, then it serves no purpose.  In fact,
	it is unusable.

	*** Not in fact true at all.  The NULL type of component (tag
	= 5) is always empty, its size _and_ length are zero ***

	If the length (the amount of buffer space used by the
	representation of the object component) is non-zero, then a
	zero "size" suffices to identify a sealed component.  A
	negative size (-1) could, if convenient, represent an invalid
	object representation, which would clearly need to be
	propagated back to the outermost layer.

	Note that in this new representation of the sealed object,
	ber_seal() never returns a zero length.  If this condition
	were detected, it ought to result in flagging the object as
	invalid.

	To be able to retain a zero "size" indicator for a sealed
	object, we need to consider the matching length.  If it is
	zero, the NULL component is both sealed and unsealed (more
	precisely, there is no meaningful distinction).  If the
	length is not zero, the object can be presumed to be sealed
	and its length returned.  The test mentioned earlier, which
	would reject objects with zero length _and_ size, could now
	be enhanced to check that the tag is 5, but this is largely
	superfluous.

In the canonical representation of a BER object, the inner buffers do
not represent individual memory areas; instead, they point to the
beginning of the portion of the outermost buffer that holds the value
of the object they are connected to.  When sealing an object, the
present procedure releases the inner buffer space after copying the
value to the outer buffer.  Because it is at times ambiguous which
representation is in use, the release code is surrounded by
compile-time conditionals until this ambiguity has been resolved.

	The clarification still needed will remove this ambiguity.  At
	the core of the problem is whether the present object
	representation implicitly discriminates between the canonical
	object and the internal representation, or whether an
	additional discriminator is needed to tell these two
	possibilities apart.  Settling this requires no more than a
	careful investigation of the representation of an object and a
	better understanding of the differences between its canonical
	and working formats.

The rest of the public interface consists of utilities and formatting
functions:

	// utilities
	void ber_print (BerObj *, int);
	void ber_objs (BerObj *, int);
	void ber_free (BerObj *);
	// formatting functions
	int ber_fmtint (Fmt *);     // integers         %I
	int ber_fmtoid (Fmt *);     // OIDs             %O
	int ber_fmtdate (Fmt *);    // validity dates   %T
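
The %O verb has to turn OID contents back into dotted decimal.  A sketch of that conversion, independent of the Plan 9 Fmt machinery; oid_to_text is a hypothetical name, and every arc is assumed to fit an unsigned long.

```c
#include <stdio.h>

typedef unsigned char uchar;

/*
 * Hypothetical helper behind a %O-style verb: format the contents
 * octets of an OBJECT IDENTIFIER as dotted decimal.  Returns the
 * string length, or -1 for malformed input or an undersized buffer.
 */
static int oid_to_text(const uchar *v, int len, char *out, size_t max)
{
	unsigned long arc = 0;
	size_t pos = 0;
	int i, first = 1, w;

	if (len < 1 || (v[len - 1] & 0x80))	/* last octet must end an arc */
		return -1;
	for (i = 0; i < len; i++) {
		if (arc > (~0UL >> 7))
			return -1;
		arc = (arc << 7) | (v[i] & 0x7f);
		if (v[i] & 0x80)
			continue;		/* more octets in this arc */
		if (first) {		/* first subidentifier packs 40 * X + Y */
			unsigned long x = arc < 80 ? arc / 40 : 2;
			w = snprintf(out + pos, max - pos, "%lu.%lu", x, arc - 40 * x);
			first = 0;
		} else
			w = snprintf(out + pos, max - pos, ".%lu", arc);
		if (w < 0 || (size_t)w >= max - pos)
			return -1;
		pos += w;
		arc = 0;
	}
	return (int)pos;
}
```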

A few OIDs are in order.  Maybe we can decode them from the data entry as we go...

Critical operations to be addressed in the near future, that is to say, immediately:

1. Figuring out how to extend upas/fs to extract the S/MIME certificate (it shouldn't be too hard to identify, but we need to express it as a filesystem of some complexity) while at the same time allowing for PEM, PGP and anything else that may choose to follow that format.

2. Laying out the format of a PEM or S/MIME certificate as an internal value, be it for manipulation or for presentation as a filesystem (member).  This need not be too dissimilar from the output of ber_print(), but it needs to be produced on demand.  In addition, we need to resolve the few discrepancies in the internal format that are still outstanding, and the threading of OIDs should be documented, both for future use and to ensure that it is not self-contradictory.

--

Seemingly, we should start by transforming "b" into a fileserver, presenting the certificate as a hierarchy of files and directories.  Since we do not yet understand well exactly which components go into the certificate, this is non-trivial.  In my opinion, we should construct the filesystem from the OIDs (using names rather than numbers, but leaving open the option to switch; in fact, the fileserver can provide the translation as a startup option).  We do need to instruct the fileserver as to how we propose to lay out the certificate, which ought to obey some standards such as the S/MIME RFC.

The question is what shape the fileserver should take.  Seemingly, we want objects to be the elements of the fileserver, with files containing possibly different views of the object properties (attributes), accessible by name.  I have no aversion to a flat space in which object IDs, however weird looking, live side by side as directories.  It is a problem, though, when multiple objects with the same OID have to coexist.  Clearly an OID is in fact a class, not an ID, and singling out individual objects requires a different way to describe them uniquely.

Now back to our fileserver.  What do we model by navigating the fileserver hierarchy?  Specifically, our present requirement is to make it easy for an extension to upas/fs and acme/mail/Mail to deal with S/MIME messages.  At a trivial level, this entails signing and/or encrypting a message (the RFC makes it clear that an S/MIME document is an outer encoding that consists exclusively of a message - which in turn could have any degree of complexity - and its "certificate") and producing the proper S/MIME document or, on the other side, checking the validity of the document and/or decrypting it with the assistance of the "certificate".  A prerequisite is the ability to extract the sender's own certificate from the S/MIME attachment as well as to track its certification and revocation path to ensure its validity or alert the user to its invalidity.

Our present efforts give us a parsed object of type "signedData" and we're inclined to attach this to the filesystem hierarchy served by upas/fs for the MIME message this is part of, probably replacing its binary representation with something considerably more useful.

All along we need to keep sight of the fact that the 


-- Another file server should map OIDs to understandable descriptions and vice versa.  Such a file server would, on the one hand, have each OID as a node, with a name, if applicable, up to that node (some names apply to the value of the node, some to the entire hierarchy up to and including the node; examples would make this considerably clearer), so that one can navigate to a valid OID and identify it.  Equally, there should be a namespace in which specific OID names can be translated to full OIDs or to the element within the hierarchy to which they belong.  Again, examples ought to make this considerably clearer.
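
A small sketch of the two directions of the mapping, plus the "where in the hierarchy" query as a longest whole-arc prefix match.  The table holds a handful of entries taken from the OID list in these notes; oidtab, oid_byname, oid_bynum and oid_ancestor are hypothetical names, and a file server would serve the same queries from a much larger database.

```c
#include <stddef.h>
#include <string.h>

/* A handful of entries copied from the OID list in these notes. */
static const struct {
	const char *name;
	const char *oid;
} oidtab[] = {
	{ "id-at", "2.5.4" },
	{ "id-at-commonName", "2.5.4.3" },
	{ "id-at-countryName", "2.5.4.6" },
	{ "id-ce", "2.5.29" },
	{ "id-ce-keyUsage", "2.5.29.15" },
	{ "id-ce-basicConstraints", "2.5.29.19" },
	{ "pkcs-1", "1.2.840.113549.1.1" },
	{ "rsaEncryption", "1.2.840.113549.1.1.1" },
	{ "sha1WithRSAEncryption", "1.2.840.113549.1.1.5" },
};
#define NOID (sizeof oidtab / sizeof oidtab[0])

/* Name -> dotted OID, or NULL if the name is unknown. */
static const char *oid_byname(const char *name)
{
	size_t i;

	for (i = 0; i < NOID; i++)
		if (strcmp(oidtab[i].name, name) == 0)
			return oidtab[i].oid;
	return NULL;
}

/* Dotted OID -> name, or NULL if this exact OID is not registered. */
static const char *oid_bynum(const char *oid)
{
	size_t i;

	for (i = 0; i < NOID; i++)
		if (strcmp(oidtab[i].oid, oid) == 0)
			return oidtab[i].name;
	return NULL;
}

/*
 * Name of the longest registered arc that contains oid (whole arcs
 * only, so "2.5.291" is not under "2.5.29"), or NULL.
 */
static const char *oid_ancestor(const char *oid)
{
	const char *best = NULL;
	size_t i, n, bestlen = 0;

	for (i = 0; i < NOID; i++) {
		n = strlen(oidtab[i].oid);
		if (n > bestlen && strncmp(oid, oidtab[i].oid, n) == 0 &&
		    (oid[n] == '\0' || oid[n] == '.')) {
			best = oidtab[i].name;
			bestlen = n;
		}
	}
	return best;
}
```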


PKIX1Explicit88:  -- 1.3.6.1.5.5.7.0.1
	{ iso(1) identified-organization(3) dod(6)
	internet(1) security(5) mechanisms(5) pkix(7) id-mod(0)
	id-pkix1-explicit-88(1) }

id-pkix  OBJECT IDENTIFIER  ::= -- 1.3.6.1.5.5.7
         { iso(1) identified-organization(3) dod(6) internet(1)
                    security(5) mechanisms(5) pkix(7) }

id-pe OBJECT IDENTIFIER  ::=  { id-pkix 1 } -- 1.3.6.1.5.5.7.1
        -- arc for private certificate extensions
id-qt OBJECT IDENTIFIER ::= { id-pkix 2 } -- 1.3.6.1.5.5.7.2
        -- arc for policy qualifier types
id-kp OBJECT IDENTIFIER ::= { id-pkix 3 } -- 1.3.6.1.5.5.7.3
        -- arc for extended key purpose OIDS
id-ad OBJECT IDENTIFIER ::= { id-pkix 48 } -- 1.3.6.1.5.5.7.48
        -- arc for access descriptors
id-qt-cps      OBJECT IDENTIFIER ::=  { id-qt 1 } -- 1.3.6.1.5.5.7.2.1
        -- OID for CPS qualifier
id-qt-unotice  OBJECT IDENTIFIER ::=  { id-qt 2 } -- 1.3.6.1.5.5.7.2.2
        -- OID for user notice qualifier

id-ad-ocsp      OBJECT IDENTIFIER ::= { id-ad 1 } -- 1.3.6.1.5.5.7.48.1
id-ad-caIssuers OBJECT IDENTIFIER ::= { id-ad 2 } -- 1.3.6.1.5.5.7.48.2

id-at           OBJECT IDENTIFIER ::= {joint-iso-ccitt(2) ds(5) 4} -- 2.5.4
id-at-name              AttributeType   ::=     {id-at 41} -- 2.5.4.41
id-at-commonName        AttributeType   ::=     {id-at 3} -- 2.5.4.3
id-at-surname           AttributeType   ::=     {id-at 4} -- 2.5.4.4
id-at-countryName       AttributeType   ::=     {id-at 6} -- 2.5.4.6
id-at-stateOrProvinceName       AttributeType   ::=     {id-at 8} -- 2.5.4.8
id-at-organizationName          AttributeType   ::=     {id-at 10} -- 2.5.4.10
id-at-organizationalUnitName    AttributeType   ::=     {id-at 11} -- 2.5.4.11
id-at-title     AttributeType   ::=     {id-at 12} -- 2.5.4.12
id-at-givenName         AttributeType   ::=     {id-at 42} -- 2.5.4.42
id-at-initials          AttributeType   ::=     {id-at 43} -- 2.5.4.43
id-at-generationQualifier       AttributeType   ::=     {id-at 44} -- 2.5.4.44
id-at-dnQualifier       AttributeType   ::=     {id-at 46} -- 2.5.4.46

pkcs-9 OBJECT IDENTIFIER ::=	-- 1.2.840.113549.1.9
       { iso(1) member-body(2) us(840) rsadsi(113549) pkcs(1) 9 }

emailAddress AttributeType      ::= { pkcs-9 1 } -- 1.2.840.113549.1.9.1

pkcs-1 OBJECT IDENTIFIER ::= {	-- 1.2.840.113549.1.1
     iso(1) member-body(2) us(840) rsadsi(113549) pkcs(1) 1 }

rsaEncryption OBJECT IDENTIFIER ::=  { pkcs-1 1 }	-- 1.2.840.113549.1.1.1

md2WithRSAEncryption OBJECT IDENTIFIER  ::=  { pkcs-1 2 }	-- 1.2.840.113549.1.1.2

md5WithRSAEncryption OBJECT IDENTIFIER  ::=  { pkcs-1 4 }	-- 1.2.840.113549.1.1.4

sha1WithRSAEncryption OBJECT IDENTIFIER  ::=  { pkcs-1 5 }	-- 1.2.840.113549.1.1.5

id-dsa-with-sha1 OBJECT IDENTIFIER ::=  { -- 1.2.840.10040.4.3
     iso(1) member-body(2) us(840) x9-57(10040) x9algorithm(4) 3 }

dhpublicnumber OBJECT IDENTIFIER ::= {	-- 1.2.840.10046.2.1
     iso(1) member-body(2) us(840) ansi-x942(10046) number-type(2) 1 }
id-dsa OBJECT IDENTIFIER ::= {	-- 1.2.840.10040.4.1
     iso(1) member-body(2) us(840) x9-57(10040) x9algorithm(4) 1 }

id-ce OBJECT IDENTIFIER  ::=  {joint-iso-ccitt(2) ds(5) 29} -- 2.5.29

id-ce-subjectDirectoryAttributes OBJECT IDENTIFIER ::=  { id-ce 9 }

id-ce-subjectKeyIdentifier OBJECT IDENTIFIER ::=  { id-ce 14 } -- 2.5.29.14

id-ce-keyUsage OBJECT IDENTIFIER ::=  { id-ce 15 } -- 2.5.29.15

id-ce-subjectAltName OBJECT IDENTIFIER ::=  { id-ce 17 } -- 2.5.29.17

id-ce-issuerAltName OBJECT IDENTIFIER ::=  { id-ce 18 }

id-ce-basicConstraints OBJECT IDENTIFIER ::=  { id-ce 19 }

id-ce-cRLNumber OBJECT IDENTIFIER ::= { id-ce 20 }

id-ce-cRLReasons OBJECT IDENTIFIER ::= { id-ce 21 }

id-ce-holdInstructionCode OBJECT IDENTIFIER ::= { id-ce 23 }

id-ce-invalidityDate OBJECT IDENTIFIER ::= { id-ce 24 }

id-ce-deltaCRLIndicator OBJECT IDENTIFIER ::= { id-ce 27 }

id-ce-certificateIssuer OBJECT IDENTIFIER ::= { id-ce 29 }

id-ce-nameConstraints OBJECT IDENTIFIER ::=  { id-ce 30 }

id-ce-cRLDistributionPoints     OBJECT IDENTIFIER  ::=  {id-ce 31}

id-ce-certificatePolicies OBJECT IDENTIFIER ::=  { id-ce 32 } -- 2.5.29.32

id-ce-policyMappings OBJECT IDENTIFIER ::=  { id-ce 33 } -- 2.5.29.33

id-ce-authorityKeyIdentifier OBJECT IDENTIFIER ::=  { id-ce 35 } -- 2.5.29.35

id-ce-policyConstraints OBJECT IDENTIFIER ::=  { id-ce 36 }

id-ce-extKeyUsage OBJECT IDENTIFIER ::= {id-ce 37} -- 2.5.29.37

id-pe-authorityInfoAccess OBJECT IDENTIFIER ::= { id-pe 1 } -- 1.3.6.1.5.5.7.1.1

id-kp-serverAuth      OBJECT IDENTIFIER ::= { id-kp 1 } -- 1.3.6.1.5.5.7.3.1
id-kp-clientAuth      OBJECT IDENTIFIER ::= { id-kp 2 }
id-kp-codeSigning     OBJECT IDENTIFIER ::= { id-kp 3 }
id-kp-emailProtection OBJECT IDENTIFIER ::= { id-kp 4 }
id-kp-ipsecEndSystem  OBJECT IDENTIFIER ::= { id-kp 5 }
id-kp-ipsecTunnel     OBJECT IDENTIFIER ::= { id-kp 6 }
id-kp-ipsecUser       OBJECT IDENTIFIER ::= { id-kp 7 }
id-kp-timeStamping    OBJECT IDENTIFIER ::= { id-kp 8 } -- 1.3.6.1.5.5.7.3.8

holdInstruction OBJECT IDENTIFIER ::=	-- 2.2.840.10040.2
          {joint-iso-itu-t(2) member-body(2) us(840) x9cm(10040) 2}

-- ANSI X9 holdinstructions referenced by this standard
id-holdinstruction-none OBJECT IDENTIFIER  ::=
                {holdInstruction 1} -- deprecated 2.2.840.10040.2.1
id-holdinstruction-callissuer OBJECT IDENTIFIER ::=
                {holdInstruction 2} -- deprecated 2.2.840.10040.2.2
id-holdinstruction-reject OBJECT IDENTIFIER ::=
                {holdInstruction 3} -- deprecated 2.2.840.10040.2.3


PKIX1Explicit93 {	-- 1.3.6.1.5.5.7.0.3
	iso(1) identified-organization(3) dod(6) internet(1)
	security(5) mechanisms(5) pkix(7) id-mod(0) id-pkix1-explicit-93(3) }

id-pkix  OBJECT IDENTIFIER  ::=	-- 1.3.6.1.5.5.7
         { iso(1) identified-organization(3) dod(6) internet(1)
                    security(5) mechanisms(5) pkix(7) }

--

The idea is that a certificate ought to be represented by a template that is filled from the ASN.1 object, subject to syntax and semantic checks that would preferably occur within the template itself.  In other words, we parse a certificate into a decomposed object, then we descend the template, filling the various elements as they are required.

The various checks are performed by subjecting the entries to validation through functions specific to the particular entry.  These are included in the template.  This is analogous to the current approach of including object-specific manipulation procedures in our local object tables.
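
A sketch of the template idea, each entry pairing the expected tag with a validation function.  It uses AlgorithmIdentifier as the example (an OBJECT IDENTIFIER followed, for rsaEncryption, by NULL parameters), ignores OPTIONAL fields, and works on a flat list of already-parsed elements; Elem, TmplEntry, algid_tmpl and apply_template are hypothetical names.

```c
#include <stddef.h>

typedef unsigned char uchar;

/* A parsed element as a parser might hand it over (hypothetical). */
typedef struct {
	int tag;
	const uchar *v;
	int len;
} Elem;

/* Template entry: field name (for error reports), expected tag, validator. */
typedef struct {
	const char *field;
	int tag;
	int (*check)(const uchar *v, int len);	/* non-zero = acceptable */
} TmplEntry;

/* OBJECT IDENTIFIER contents: non-empty, last octet ends a subidentifier. */
static int chk_oid(const uchar *v, int len)
{
	return len > 0 && !(v[len - 1] & 0x80);
}

/* NULL contents are always empty. */
static int chk_null(const uchar *v, int len)
{
	(void)v;
	return len == 0;
}

/* AlgorithmIdentifier as used with rsaEncryption (parameters NULL). */
static const TmplEntry algid_tmpl[] = {
	{ "algorithm", 6, chk_oid },
	{ "parameters", 5, chk_null },
};

/*
 * Walk template and elements in step.  Returns -1 if every element
 * matches, otherwise the index of the first entry that fails (a
 * missing element counts as a failure; surplus elements fail at nt).
 */
static int apply_template(const TmplEntry *t, int nt, const Elem *e, int ne)
{
	int i;

	for (i = 0; i < nt; i++)
		if (i >= ne || e[i].tag != t[i].tag || !t[i].check(e[i].v, e[i].len))
			return i;
	return ne == nt ? -1 : nt;
}
```

The index returned on failure, together with the field name, is what an error report would need.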

Neither of the above approaches has been put to the test yet.

Taking a more structured approach to the problem, we could consider object analysis to consist of submitting the object to a procedure with an indication of the desired result (the choice of procedure would determine the type of result as well as the required inputs) and letting the procedure itself establish how the result is obtained.  An object-oriented approach would instead associate the various methods with the class of object involved.  The latter approach seems more appropriate and more readily extended.

Our immediate need is twofold.  On the one hand, we want to accept an X.509 certificate and extract from it some important properties, some of which will be used for further processing.  On the other, we want to construct X.509 certificates for our own use in submitting to procedures analogous to our own.  In the process, we also wish to understand more clearly the specifications of X.509 certificates whose theoretical expression is somewhat too dense to grasp.

Somewhere in between there is also the need to build a utility that assembles a database of ASN.1 Object Identifiers (OIDs) which in turn can be used in various phases of our processing of X.509 certificates (and other, related operations).

Taking into consideration the facilities provided by Plan 9 (factotum, file server functionality, etc.) is an important aspect of the design.  Ultimately, we'll probably use an external LDAP service to maintain the certificates so that they are readily available to heterogeneous systems, but the asymptotic objective is to use file services as the foundation for all X.509 and Directory-related activities.

In particular, factotum must be the preferred approach to the exchange of security information.

Tue Jan  3 08:22:49 SAT 2006 - Notes inspired by Peter Gutmann

1.	LDAP is critical to the management of certificates and
	certificate chains in a consistent fashion.  Therefore, the
	first implementation step must be to produce an LDAP interface
	(client) for Plan 9.  Eventually, it may be preferable to have
	the LDAP server as a Plan 9 application (ideally combining the
	file server paradigm with the BFS "database" concept to
	expedite on-disk access), but initially an LDAP client goes a
	long way toward solving some problems.