ASN.1 and BER: assembly and parsing

The ber.c module provides a library of utilities to manipulate BER-encoded information. At the core of the module is a simple recursive structure:

	typedef struct berObj BerObj;
	struct berObj {
		int tag, len, size;
		char *buf;
		BerObj *next, *down;
	};

which is used both to construct new BER records and to parse existing ones.

In the case of parsing BER structures, one starts with a binary string (octet string, in the terminology of ITU-T Recommendation X.690) which is assimilated into the buffer in the outermost structure. At this level no siblings apply; any progeny is represented by a single child (down) and its siblings in a linked list (down->next, etc.). We do not store the indication that this is a composite entry: the existence of a non-nil "down" link suffices in this role.

The output from the parse procedure is then a linked list of BER objects in which inner buffer pointers connect each object with the portion of the outer string that comprises the actual value. It is left to the user, presently, to interpret the tag and transform the value to some internal representation. It is likely that a single function could be provided to deal with the fundamental (primitive) types.

For my own convenience:

	0		Reserved for use by the encoding rules
	1		Boolean type
	2		Integer type
	3		Bitstring type (number of unused bits in first octet)
	4		Octetstring type
	5		Null type (always empty)
	6		Object identifier type
	7		Object descriptor type
	8		External type and Instance-of type
	9		Real type
	10		Enumerated type
	11		Embedded-pdv type
	12-15		Reserved for future extensions
	16		Sequence and Sequence-of type
	17		Set and Set-of type
	18-22, 25-30	Character string types
	23, 24		Time types
	31-..		Reserved for addenda

This format fails to handle really large objects, whereas the design of BER allows for sequential handling of the input string. An iterator that is not encumbered by large memory requirements would be useful for small memory parsers.
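Whether the whole string is parsed at once or visited by an iterator, the first step is to decode the identifier and length octets in front of each value. The following is a minimal sketch of that step in portable C, not the code in ber.c: the BerHdr type and the hdr_decode() name are hypothetical, and Plan 9's uchar is spelled out as unsigned char so that the fragment compiles anywhere.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical result type; ber.c keeps tag and len in BerObj instead. */
typedef struct BerHdr {
	int cls;		/* tag class, 0 (universal) to 3 (private) */
	int constructed;	/* non-zero for a composite entry */
	long tag;		/* tag number */
	long len;		/* content length, -1 for the indefinite form */
} BerHdr;

/*
 * Decode one identifier/length header from the octets s..se.
 * Returns the number of header octets consumed, or -1 if the input
 * is truncated or a field does not fit in a long.
 */
int
hdr_decode(const unsigned char *s, const unsigned char *se, BerHdr *h)
{
	const unsigned char *p = s;
	int n;

	assert(s != NULL && se != NULL && h != NULL);
	if (p >= se)
		return -1;
	h->cls = *p >> 6;
	h->constructed = (*p >> 5) & 1;
	h->tag = *p++ & 0x1f;
	if (h->tag == 0x1f) {		/* high tag number form: base-128 digits */
		h->tag = 0;
		do {
			if (p >= se || h->tag > (LONG_MAX >> 7))
				return -1;
			h->tag = (h->tag << 7) | (*p & 0x7f);
		} while (*p++ & 0x80);
	}
	if (p >= se)
		return -1;
	if (*p < 0x80) {		/* short form: length in one octet */
		h->len = *p++;
	} else if (*p == 0x80) {	/* indefinite form */
		h->len = -1;
		p++;
	} else {			/* long form: count, then big-endian length */
		n = *p++ & 0x7f;
		if (n > (int)sizeof(long) - 1 || se - p < n)
			return -1;
		h->len = 0;
		while (n-- > 0)
			h->len = (h->len << 8) | *p++;
	}
	return (int)(p - s);
}
```

For the octets 30 82 01 00 the sketch reports a constructed universal tag 16 (Sequence) with a content length of 256 and a header of 4 octets. Because it never looks past the header, a routine of this kind could serve an incremental parser as well as one that holds the whole string in memory.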
Such an iterator may well be added in the foreseeable future. For the time being, we'll give priority to the double check that (a) we can allocate a sufficiently large buffer for the input data and (b) we do not exceed an arbitrary size laid down by the user. Currently, no such test is present.

The assembly of BER objects separates pretty naturally into two distinct operations. Primitive objects can be constructed very simply by a generic operation or, more suitably, by a small set of specialised functions that differ only in small details. This is the public interface for object construction:

	// object creation
	BerObj *ber_simple (int tag, uchar *s, int len);	// generic assembly
	BerObj *ber_int2 (int);				// 2-octet integer
	BerObj *ber_int4 (long);			// 4-octet integer
	BerObj *ber_intarb (char *s, int len);		// arbitrary-length (integer?)
	BerObj *ber_bool (int);				// boolean
	BerObj *ber_ostr (uchar *, int);		// length is octets
	BerObj *ber_bstr (uchar *, int);		// length is bits
	BerObj *ber_objid (char *);			// object ID, NUL terminated, period delimited

	BerObj *ber_init (int);				// object initialisation
	BerObj *ber_attach (BerObj *, BerObj *);	// object composition
	int ber_seal (BerObj *);			// object sealing

	BerObj *ber_parse (uchar *s, uchar *se);	// object deconstruction

The generic assembly of a primitive BER object is mirrored in the operation used to attach a new child to an existing constructed object. The child object is linked to the outer (parent) object at the "down" link, or appended to the linked list that starts with "down" and proceeds with "down->next", "down->next->next", etc. Because the parent object is exclusively a container, assembly is practically complete once the last child is attached to it (or to one of its own children).
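The attachment step just described can be sketched as follows. This is an illustration in portable C, not the ber_attach() in ber.c: the name attach_sketch() is hypothetical, the return value (the parent, or NULL on a bad argument) is an assumption, and the seal state of the parent is ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef struct berObj BerObj;
struct berObj {
	int tag, len, size;
	char *buf;
	BerObj *next, *down;
};

/*
 * Append child to the end of parent's list of children:
 * parent->down if the list is empty, otherwise the "next"
 * link of the last sibling.
 */
BerObj *
attach_sketch(BerObj *parent, BerObj *child)
{
	BerObj **pp;

	if (parent == NULL || child == NULL)
		return NULL;
	assert(child != parent);	/* an object cannot contain itself */
	/* walk the links until the first empty one */
	for (pp = &parent->down; *pp != NULL; pp = &(*pp)->next)
		;
	*pp = child;
	return parent;
}
```

Walking a pointer to the link rather than the objects themselves removes the special case for the first child.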
However, whereas primitive objects are finalised at creation, constructed objects cannot be finalised until their total length has been computed, and the "seal" operation provides this ability. This assumes that one prefers definite to indefinite lengths, and our library operates under this assumption; a simpler procedure to produce BER objects with indefinite length entries ought to be available.

Unsealed BER objects contain tag, length and buffer fields, as well as a size field that specifies the capacity of the buffer. Sealing recursively (and intelligently) descends the hierarchy and reallocates the buffer to allow for the header, which holds optimal representations of the tag and length, as well as the value. The size field is then set to -1 to indicate that the object is sealed.

What's missing here is a clarification as to which buffer is reallocated and where the header is added (given BerObj "A", it is the outermost buffer that is being manipulated by ber_seal(&A), by assimilating its progeny's buffers into it - I need to confirm this from the code). It is amusing that I managed to code the procedure without quite as clear an idea of its operation as is necessary to document it. The ambiguity below stems from precisely this lack of clarity.

Note for example that the attachment of a new component to an object is restricted to unsealed objects. More accurately, it is essential for the attachment procedure to establish whether the object is already sealed. Whether the component is then added or not, if necessary by first unsealing the object, is an API decision.

Having studied to some depth the implementation of ber_seal(), I'd hazard that we want some alterations. There seems little reason to report failure in the same way whether the object is already sealed or some other operation on the object fails (most notably memory (re)allocation).
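Returning for a moment to the header that sealing prepends: for definite lengths, the "optimal" length field is the shortest one X.690 allows. The following is a minimal sketch of that encoding in portable C; len_encode() is a hypothetical helper, not a function of ber.c, and the caller is assumed to supply a buffer of at least 1 + sizeof(unsigned long) octets.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Write the shortest definite-length encoding of len into out
 * and return the number of octets written.
 */
size_t
len_encode(unsigned long len, unsigned char *out)
{
	size_t n, i;
	unsigned long t;

	assert(out != NULL);
	if (len < 0x80) {		/* short form: the length itself */
		out[0] = (unsigned char)len;
		return 1;
	}
	for (n = 0, t = len; t != 0; t >>= 8)	/* count significant octets */
		n++;
	out[0] = (unsigned char)(0x80 | n);	/* long form: 0x80 + count */
	for (i = 0; i < n; i++)			/* then the length, big-endian */
		out[1 + i] = (unsigned char)(len >> (8 * (n - 1 - i)));
	return 1 + n;
}
```

A length of 5 is encoded as the single octet 05, 128 as 81 80, and 256 as 82 01 00. A constructed object can only produce this field once the encoded sizes of all its children are known, which is why sealing has to work from the leaves outwards.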
A previously sealed object (identified by a size field set to -1) could trigger a return value corresponding to its length and be used unaltered in the sealing process. There is no obvious reason not to treat a seal operation on a previously sealed object as an identity function.

By the same token, it is not clear that a zero size would not be preferable to a -1 value in the sealed object. If the object has size zero _and_ buffer length zero, then it serves no purpose. In fact, it is unusable.

*** Not in fact true at all. The NULL type of component (tag = 5) is always empty, its size _and_ length are zero ***

If the length (the amount of buffer space used by the representation of the object component) is non-zero, then a zero "size" suffices to identify a sealed component. A negative size (-1) could, if convenient, represent an invalid object representation, which would clearly need to be propagated back to the outermost layer. Note that in this new representation of the sealed object, ber_seal() never returns a zero length. If this condition were to be tested, it ought to result in flagging the object as invalid.

To be able to retain a zero "size" indicator for a sealed object, we need to consider the matching length. If it is zero, the NULL component is both sealed and unsealed (to be more clear, there is no significant distinction). If the length is not zero, the object can be presumed to be sealed and its length returned. The test mentioned earlier that would reject objects with zero length _and_ size could now be enhanced to check that the tag is "5", but this is largely superfluous.

In the canonical representation of a BER object, the inner buffers do not represent individual memory areas; instead, they point to the beginning of the portion of the outermost buffer area that represents the particular value of the object they are connected to.
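The proposed meaning of the size and length fields can be summarised as a small classifier. This is a sketch of the proposal discussed above, not of the current code (which marks a sealed object with size -1); the ObjState names and obj_state() are hypothetical, and treating a negative length as invalid is an added assumption.

```c
/* Hypothetical states under the proposed convention. */
typedef enum {
	OBJ_INVALID,	/* negative size: invalid representation */
	OBJ_EMPTY,	/* size and length both zero, e.g. the NULL type */
	OBJ_SEALED,	/* zero size, non-zero length */
	OBJ_UNSEALED	/* positive size: buffer capacity still tracked */
} ObjState;

/* Classify an object from its len and size fields alone. */
ObjState
obj_state(int len, int size)
{
	if (size < 0 || len < 0)
		return OBJ_INVALID;
	if (size == 0)
		return len == 0 ? OBJ_EMPTY : OBJ_SEALED;
	return OBJ_UNSEALED;
}
```

Under this scheme a seal operation on an OBJ_SEALED object could simply return its length, and OBJ_INVALID would be propagated to the outermost caller.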
In sealing an object, the present procedure releases the inner buffer space after copying the value to the outer buffer. It is ambiguous at times which representation is being used, therefore we surround the release code with compile time conditionals until this ambiguity has been resolved. The clarification still needed will remove this ambiguity.

At the core of the problem is whether the present object representation implicitly discriminates between the canonical object and the internal representation or whether an additional discriminator needs to be added to distinguish between these two possibilities. What is required is in fact no more than a careful investigation of the representation of an object and a better understanding of the differences between its canonical and working formats.

	// utilities
	void ber_print (BerObj *, int);
	void ber_objs (BerObj *, int);
	void ber_free (BerObj *);

	// formatting functions
	int ber_fmtint (Fmt *);		// integers		// %I
	int ber_fmtoid (Fmt *);		// OIDs			// %O
	int ber_fmtdate (Fmt *);	// Validity dates	// %T

A few OIDs are in order. Maybe we can decode them from the data entry as we go...

Critical operations to be addressed in the near future, that is to say, immediately:

1. Figuring out how to extend upas/fs to extract the S/MIME certificate (it shouldn't be so hard to identify it, but we need to express it as a filesystem of some complexity) while at the same time allowing for PEM and PGP and anything else that may choose to follow that format.

2. Laying out the format of a PEM or S/MIME certificate as an internal value, be it for manipulation or to present it as a filesystem (member). This need not be too dissimilar from the output of ber_print(), but it needs to appear on demand.

In addition, we need to resolve the few discrepancies in the internal format that are still outstanding, and the threading of OIDs should be documented both for future use and to ensure that it is not self-contradictory.
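Decoding OIDs "as we go" amounts to turning the content octets of a tag 6 value into the familiar dotted form. The following is a minimal sketch in portable C under the X.690 rules; oid_decode() is a hypothetical name and is not claimed to match ber_fmtoid() or any other function in ber.c.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Convert the n content octets at p into a dotted string in out.
 * Returns the string length, or -1 if the encoding is truncated,
 * a component overflows, or out (of outsz bytes) is too small.
 */
int
oid_decode(const unsigned char *p, size_t n, char *out, size_t outsz)
{
	unsigned long v = 0, a;
	size_t i, used = 0;
	int first = 1, w;

	assert(p != NULL && out != NULL);
	if (n == 0 || (p[n - 1] & 0x80))	/* last octet must end a component */
		return -1;
	for (i = 0; i < n; i++) {
		if (v > (ULONG_MAX >> 7))
			return -1;
		v = (v << 7) | (p[i] & 0x7f);	/* base-128, high bit = "more" */
		if (p[i] & 0x80)
			continue;
		if (first) {
			/* first component packs the two leading arcs as 40*a + b */
			a = v < 80 ? v / 40 : 2;
			w = snprintf(out + used, outsz - used, "%lu.%lu", a, v - 40 * a);
			first = 0;
		} else
			w = snprintf(out + used, outsz - used, ".%lu", v);
		if (w < 0 || (size_t)w >= outsz - used)
			return -1;
		used += (size_t)w;
		v = 0;
	}
	return (int)used;
}
```

The octets 2A 86 48 86 F7 0D 01 01 01 decode to 1.2.840.113549.1.1.1, which is rsaEncryption. Translating such a string to a name is a separate table lookup.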
--

Seemingly, we should start by transforming "b" to a fileserver, presenting the certificate as a hierarchy of files and directories. Since I do not understand too well exactly which components go into the certificate, this is non-trivial. In my opinion, we should construct the filesystem from the OIDs (use names rather than numbers, but leave the option open to switch - in fact the fileserver can provide the translation as a startup option). We do need to instruct the fileserver as to how we propose to lay out the certificate, which ought to obey some standards such as the S/MIME RFC.

The question is what shape the fileserver should take. Seemingly, we want objects to be the elements of the fileserver, with files containing possibly different views of the object properties (attributes), accessible by name. I have no aversion to a flat space in which object IDs, however weird looking, live side by side as directories. It is a problem when multiple objects with the same OID have to coexist. It is clear that an OID is in fact a class, not an ID, and that to single out objects there has to be a different way to describe them uniquely.

Now back to our fileserver. What do we model by navigating the fileserver hierarchy? Specifically, our present requirement is to make it easy for an extension to upas/fs and acme/mail/Mail to deal with S/MIME messages. At a trivial level, this entails signing and/or encrypting a message (the RFC makes it clear that an S/MIME document is an outer encoding that consists exclusively of a message - which in turn could have any degree of complexity - and its "certificate") and producing the proper S/MIME document or, on the other side, checking the validity of the document and/or decrypting it with the assistance of the "certificate". A prerequisite is the ability to extract the sender's own certificate from the S/MIME attachment as well as to track its certification and revocation path to ensure its validity or alert the user to its invalidity.
Our present efforts give us a parsed object of type "signedData" and we're inclined to attach this to the filesystem hierarchy served by upas/fs for the MIME message this is part of, probably replacing its binary representation with something considerably more useful. All along we need to keep sight of the fact that the

--

Another file server should map OIDs to an understandable description and vice versa. Such a file server would on one side have each OID as a node, with a name, if applicable, up to that node (some names apply to the value of the node, some names apply to the entire hierarchy up to and including the node; examples would make this considerably clearer) so that one can navigate to a valid OID and identify it, but there should equally be a namespace in which specific OID names can be translated to full OIDs or to the given element within the hierarchy to which it belongs. Again, examples ought to make this considerably clearer.

	PKIX1Explicit88:	-- 1.3.6.1.5.5.7.0.1
		{ iso(1) identified-organization(3) dod(6) internet(1) security(5) mechanisms(5) pkix(7) id-mod(0) id-pkix1-explicit-88(1) }

	id-pkix OBJECT IDENTIFIER ::=	-- 1.3.6.1.5.5.7
		{ iso(1) identified-organization(3) dod(6) internet(1) security(5) mechanisms(5) pkix(7) }

	id-pe OBJECT IDENTIFIER ::= { id-pkix 1 }	-- 1.3.6.1.5.5.7.1
		-- arc for private certificate extensions
	id-qt OBJECT IDENTIFIER ::= { id-pkix 2 }	-- 1.3.6.1.5.5.7.2
		-- arc for policy qualifier types
	id-kp OBJECT IDENTIFIER ::= { id-pkix 3 }	-- 1.3.6.1.5.5.7.3
		-- arc for extended key purpose OIDS
	id-ad OBJECT IDENTIFIER ::= { id-pkix 48 }	-- 1.3.6.1.5.5.7.48
		-- arc for access descriptors

	id-qt-cps OBJECT IDENTIFIER ::= { id-qt 1 }	-- 1.3.6.1.5.5.7.2.1
		-- OID for CPS qualifier
	id-qt-unotice OBJECT IDENTIFIER ::= { id-qt 2 }	-- 1.3.6.1.5.5.7.2.2
		-- OID for user notice qualifier

	id-ad-ocsp OBJECT IDENTIFIER ::= { id-ad 1 }		-- 1.3.6.1.5.5.7.48.1
	id-ad-caIssuers OBJECT IDENTIFIER ::= { id-ad 2 }	-- 1.3.6.1.5.5.7.48.2

	id-at OBJECT IDENTIFIER ::= {joint-iso-ccitt(2) ds(5) 4}	-- 2.5.4
	id-at-name AttributeType ::= {id-at 41}			-- 2.5.4.41
	id-at-commonName AttributeType ::= {id-at 3}		-- 2.5.4.3
	id-at-surname AttributeType ::= {id-at 4}		-- 2.5.4.4
	id-at-countryName AttributeType ::= {id-at 6}		-- 2.5.4.6
	id-at-stateOrProvinceName AttributeType ::= {id-at 8}	-- 2.5.4.8
	id-at-organizationName AttributeType ::= {id-at 10}	-- 2.5.4.10
	id-at-organizationalUnitName AttributeType ::= {id-at 11}	-- 2.5.4.11
	id-at-title AttributeType ::= {id-at 12}		-- 2.5.4.12
	id-at-givenName AttributeType ::= {id-at 42}		-- 2.5.4.42
	id-at-initials AttributeType ::= {id-at 43}		-- 2.5.4.43
	id-at-generationQualifier AttributeType ::= {id-at 44}	-- 2.5.4.44
	id-at-dnQualifier AttributeType ::= {id-at 46}		-- 2.5.4.46

	pkcs-9 OBJECT IDENTIFIER ::=	-- 1.2.840.113549.1.9
		{ iso(1) member-body(2) us(840) rsadsi(113549) pkcs(1) 9 }
	emailAddress AttributeType ::= { pkcs-9 1 }	-- 1.2.840.113549.1.9.1

	pkcs-1 OBJECT IDENTIFIER ::= {	-- 1.2.840.113549.1.1
		iso(1) member-body(2) us(840) rsadsi(113549) pkcs(1) 1 }
	rsaEncryption OBJECT IDENTIFIER ::= { pkcs-1 1 }		-- 1.2.840.113549.1.1.1
	md2WithRSAEncryption OBJECT IDENTIFIER ::= { pkcs-1 2 }		-- 1.2.840.113549.1.1.2
	md5WithRSAEncryption OBJECT IDENTIFIER ::= { pkcs-1 4 }		-- 1.2.840.113549.1.1.4
	sha1WithRSAEncryption OBJECT IDENTIFIER ::= { pkcs-1 5 }	-- 1.2.840.113549.1.1.5

	id-dsa-with-sha1 OBJECT IDENTIFIER ::= {	-- 1.2.840.10040.4.3
		iso(1) member-body(2) us(840) x9-57 (10040) x9algorithm(4) 3 }
	dhpublicnumber OBJECT IDENTIFIER ::= {		-- 1.2.840.10046.2.1
		iso(1) member-body(2) us(840) ansi-x942(10046) number-type(2) 1 }
	id-dsa OBJECT IDENTIFIER ::= {			-- 1.2.840.10040.4.1
		iso(1) member-body(2) us(840) x9-57(10040) x9algorithm(4) 1 }

	id-ce OBJECT IDENTIFIER ::= {joint-iso-ccitt(2) ds(5) 29}	-- 2.5.29
	id-ce-subjectDirectoryAttributes OBJECT IDENTIFIER ::= { id-ce 9 }
	id-ce-subjectKeyIdentifier OBJECT IDENTIFIER ::= { id-ce 14 }	-- 2.5.29.14
	id-ce-keyUsage OBJECT IDENTIFIER ::= { id-ce 15 }		-- 2.5.29.15
	id-ce-subjectAltName OBJECT IDENTIFIER ::= { id-ce 17 }		-- 2.5.29.17
	id-ce-issuerAltName OBJECT IDENTIFIER ::= { id-ce 18 }
	id-ce-basicConstraints OBJECT IDENTIFIER ::= { id-ce 19 }
	id-ce-cRLNumber OBJECT IDENTIFIER ::= { id-ce 20 }
	id-ce-cRLReasons OBJECT IDENTIFIER ::= { id-ce 21 }
	id-ce-holdInstructionCode OBJECT IDENTIFIER ::= { id-ce 23 }
	id-ce-invalidityDate OBJECT IDENTIFIER ::= { id-ce 24 }
	id-ce-deltaCRLIndicator OBJECT IDENTIFIER ::= { id-ce 27 }
	id-ce-certificateIssuer OBJECT IDENTIFIER ::= { id-ce 29 }
	id-ce-nameConstraints OBJECT IDENTIFIER ::= { id-ce 30 }
	id-ce-cRLDistributionPoints OBJECT IDENTIFIER ::= {id-ce 31}
	id-ce-certificatePolicies OBJECT IDENTIFIER ::= { id-ce 32 }	-- 2.5.29.32
	id-ce-policyMappings OBJECT IDENTIFIER ::= { id-ce 33 }		-- 2.5.29.33
	id-ce-authorityKeyIdentifier OBJECT IDENTIFIER ::= { id-ce 35 }	-- 2.5.29.35
	id-ce-policyConstraints OBJECT IDENTIFIER ::= { id-ce 36 }
	id-ce-extKeyUsage OBJECT IDENTIFIER ::= {id-ce 37}		-- 2.5.29.37

	id-pe-authorityInfoAccess OBJECT IDENTIFIER ::= { id-pe 1 }	-- 1.3.6.1.5.5.7.1.1

	id-kp-serverAuth OBJECT IDENTIFIER ::= { id-kp 1 }	-- 1.3.6.1.5.5.7.3.1
	id-kp-clientAuth OBJECT IDENTIFIER ::= { id-kp 2 }
	id-kp-codeSigning OBJECT IDENTIFIER ::= { id-kp 3 }
	id-kp-emailProtection OBJECT IDENTIFIER ::= { id-kp 4 }
	id-kp-ipsecEndSystem OBJECT IDENTIFIER ::= { id-kp 5 }
	id-kp-ipsecTunnel OBJECT IDENTIFIER ::= { id-kp 6 }
	id-kp-ipsecUser OBJECT IDENTIFIER ::= { id-kp 7 }
	id-kp-timeStamping OBJECT IDENTIFIER ::= { id-kp 8 }	-- 1.3.6.1.5.5.7.3.8

	holdInstruction OBJECT IDENTIFIER ::=	-- 2.2.840.10040.2
		{joint-iso-itu-t(2) member-body(2) us(840) x9cm(10040) 2}
		-- ANSI X9 holdinstructions referenced by this standard
	id-holdinstruction-none OBJECT IDENTIFIER ::= {holdInstruction 1}	-- deprecated 2.2.840.10040.2.1
	id-holdinstruction-callissuer OBJECT IDENTIFIER ::= {holdInstruction 2}	-- deprecated 2.2.840.10040.2.2
	id-holdinstruction-reject OBJECT IDENTIFIER ::= {holdInstruction 3}	-- deprecated 2.2.840.10040.2.3

	PKIX1Explicit93 {	-- 1.3.6.1.5.5.7.0.3
		iso(1) identified-organization(3) dod(6) internet(1) security(5) mechanisms(5) pkix(7) id-mod(0) id-pkix1-explicit-93(3)}

	id-pkix OBJECT IDENTIFIER ::=	-- 1.3.6.1.5.5.7
		{ iso(1) identified-organization(3) dod(6) internet(1) security(5) mechanisms(5) pkix(7) }

--

The idea is that a certificate ought to be represented by a template that is filled from the ASN.1 object, subject to syntax and semantic checks that would preferably occur within the template itself. In other words, we parse a certificate into a decomposed object, then we descend the template, filling the various elements as they are required. The various checks are performed by subjecting the entries to validation through functions specific to the particular entry. These are included in the template.

This is analogous to the current approach of including object-specific manipulation procedures in our local object tables. Neither of the above approaches has been put to the test yet.

Taking a more structured approach to the problem, we could consider object analysis to consist of submitting the object to a procedure with an indication of the desired result (the choice of procedure would determine the type of result as well as the required inputs) and letting the procedure itself establish how the result is obtained. An object-oriented approach would instead associate the various option methods with the class of object involved. The latter approach seems more appropriate and more readily extended.

Our immediate need is twofold. On the one hand, we want to accept an X.509 certificate and extract from it some important properties, some of which will be used for further processing. On the other, we want to construct X.509 certificates for our own use in submitting to procedures analogous to our own. In the process, we also wish to understand more clearly the specifications of X.509 certificates, whose theoretical expression is somewhat too dense to grasp.
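The template idea described above (entries filled from the parsed object, each carrying its own validation function) might look roughly like this. It is a sketch of one possible shape, in portable C, and has not been tested against real certificates: the Slot type, tmpl_fill() and the nonempty() validator are hypothetical, and only a flat sequence of mandatory entries is handled (no nesting, no optional fields).

```c
#include <assert.h>
#include <stddef.h>

typedef struct berObj BerObj;
struct berObj {
	int tag, len, size;
	char *buf;
	BerObj *next, *down;
};

/* One template entry: what is expected and how to check it. */
typedef struct Slot Slot;
struct Slot {
	const char *name;		/* label for the entry (illustrative) */
	int tag;			/* expected tag */
	int (*check)(const BerObj *);	/* entry-specific validation, may be NULL */
	const BerObj *val;		/* filled in by tmpl_fill() */
};

/* Example validator: the value must not be empty. */
static int
nonempty(const BerObj *o)
{
	return o->len > 0;
}

/*
 * Walk the children of seq in step with the nt template entries.
 * Returns the number of entries filled, or -1 at the first entry
 * that is missing, has the wrong tag, or fails its check.
 * Children beyond the end of the template are ignored.
 */
int
tmpl_fill(Slot *t, size_t nt, const BerObj *seq)
{
	const BerObj *o;
	size_t i;

	assert(t != NULL && seq != NULL);
	for (i = 0, o = seq->down; i < nt; i++, o = o->next) {
		if (o == NULL || o->tag != t[i].tag)
			return -1;
		if (t[i].check != NULL && !t[i].check(o))
			return -1;
		t[i].val = o;
	}
	return (int)i;
}
```

A template for a sequence holding an integer followed by an octet string would be declared as `Slot t[] = {{"serial", 2, nonempty, NULL}, {"payload", 4, NULL, NULL}};` and filled with `tmpl_fill(t, 2, seq)`. An object-oriented variant would hang the same function pointers on a per-class table rather than on each template entry.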
Somewhere in between there is also the need to build a utility that assembles a database of ASN.1 Object Identifiers (OIDs), which in turn can be used in various phases of our processing of X.509 certificates (and other, related operations).

Taking into consideration the facilities provided by Plan 9 (factotum, file server functionality, etc.) is an important aspect of the design. Ultimately, we'll probably use an externally based LDAP service to maintain the certificates so that they are readily available to heterogeneous systems, but the asymptotic objective is to use file services as the foundations for all X.509 and Directory-related activities. In particular, factotum must be the preferred approach to the exchange of security information.

Tue Jan 3 08:22:49 SAT 2006 - Notes inspired by Peter Gutmann

1. LDAP is critical to the management of certificates and certificate chains in a consistent fashion. Therefore, the first implementation step must be to produce an LDAP interface (client) for Plan 9. Eventually, it may be preferable to have the LDAP server as a Plan 9 application (ideally combining the file server paradigm with the BFS "database" concept to expedite on-disk access), but initially an LDAP client goes a long way towards solving some problems.