DtdToHaskell tool


DtdToHaskell is a tool for translating any valid XML DTD into equivalent Haskell types; the module Text.XML.HaXml.Xml2Haskell provides the accompanying class framework. This allows you to generate, edit, and transform documents as normal typed values in your programs, and to read and write them as human-readable XML documents.

Usage: DtdToHaskell [dtdfile [outfile]]
(Missing file arguments or dashes (-) indicate stdin or stdout respectively.)

The program reads and parses a DTD from dtdfile (which may be either just a DTD, or a full XML document containing an internal DTD). It generates into outfile a Haskell module containing a collection of type definitions plus some class instance declarations for I/O.

In order to use the resulting module, you need to import it, and also to import Text.XML.HaXml.Xml2Haskell. To read and write XML files as values of the declared types, use some of the following convenience functions:

    readXml   :: XmlContent a => String -> Maybe a
    showXml   :: XmlContent a => a -> String

    hGetXml   :: XmlContent a => Handle -> IO a
    hPutXml   :: XmlContent a => Handle -> a -> IO ()

    fReadXml  :: XmlContent a => FilePath -> IO a
    fWriteXml :: XmlContent a => FilePath -> a -> IO ()
When using these functions, remember to resolve the overloading in one of the usual ways: by implicit context at the point of use, by an explicit type signature on the value, by passing the value as an argument to a function with an explicit signature, by using `asTypeOf`, and so on. Note also the similarity between these signatures and those provided by the Haskell2Xml library.
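
The overloading issue is the same one that arises with the Prelude's read, whose result type is also chosen by the caller. The following is a minimal sketch of three of the resolution techniques, using read purely as a stand-in for readXml so that it compiles without HaXml installed; the helper double is hypothetical.

```haskell
module Main where

-- A hypothetical helper with an explicit signature. Passing an overloaded
-- value to it fixes that value's type to Int.
double :: Int -> Int
double n = n * 2

main :: IO ()
main = do
  -- 1. Explicit type signature on the value.
  let a = read "21" :: Int
  -- 2. Use the value as an argument to a function with an explicit signature.
  let b = double (read "21")
  -- 3. Borrow the type of an existing value with `asTypeOf`.
  let c = read "21" `asTypeOf` a
  print (a, b, c)
```

With a generated module, the same pattern would apply to the result of readXml or fReadXml, with one of your DTD-derived types in place of Int.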

You will need to study the automatically-generated type declarations to write your own transformation scripts - most things are pretty obvious parallels to the DTD structure.
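
To give a feel for what "parallel to the DTD structure" can mean, here is a hand-written sketch, not actual DtdToHaskell output. The DTD fragment, the type names, and both functions are hypothetical, and the real generated declarations may differ in naming and detail.

```haskell
module Main where

-- Hypothetical DTD fragment (illustration only):
--   <!ELEMENT album  (title, artist, track*)>
--   <!ELEMENT title  (#PCDATA)>
--   <!ELEMENT artist (#PCDATA)>
--   <!ELEMENT track  (#PCDATA)>

-- Hand-written types mirroring that structure: a sequence becomes a
-- constructor with several arguments, and a starred item becomes a list.
data Album     = Album Title Artist [Track] deriving (Eq, Show)
newtype Title  = Title String               deriving (Eq, Show)
newtype Artist = Artist String              deriving (Eq, Show)
newtype Track  = Track String               deriving (Eq, Show)

-- A transformation script is then an ordinary function over typed values.
trackNames :: Album -> [String]
trackNames (Album _ _ tracks) = [name | Track name <- tracks]

renameArtist :: String -> Album -> Album
renameArtist new (Album title _ tracks) = Album title (Artist new) tracks

main :: IO ()
main = do
  let album = Album (Title "Kind of Blue") (Artist "M. Davis")
                    [Track "So What", Track "Freddie Freeloader"]
  print (trackNames album)
  print (renameArtist "Miles Davis" album)
```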

Limitations
The generated Haskell contains references to types like OneOf3 where there is a choice between n (in this case 3) different tags. Currently, the module Text.XML.HaXml.OneOfN defines these types up to n=20. If your DTD requires larger choices, then use the tool MkOneOf to generate the extra size or range of sizes you need.
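
As an illustration of how such a choice type is used, the sketch below defines a three-way sum locally so that it compiles on its own. The constructor names and the element types Para, Image, and Table are assumptions made for this example; check Text.XML.HaXml.OneOfN for the actual definitions.

```haskell
module Main where

-- Local stand-in for a three-way choice type (assumed shape).
data OneOf3 a b c = OneOf3 a | TwoOf3 b | ThreeOf3 c
  deriving (Eq, Show)

-- Hypothetical element types for a content model such as
-- (para | image | table).
newtype Para  = Para String      deriving (Eq, Show)
newtype Image = Image String     deriving (Eq, Show)
newtype Table = Table [[String]] deriving (Eq, Show)

type Block = OneOf3 Para Image Table

-- Pattern matching picks out which alternative is present.
describe :: Block -> String
describe (OneOf3 (Para text))    = "paragraph: " ++ text
describe (TwoOf3 (Image src))    = "image: " ++ src
describe (ThreeOf3 (Table rows)) = "table with " ++ show (length rows) ++ " rows"

main :: IO ()
main = mapM_ (putStrLn . describe)
  [ OneOf3 (Para "hello")
  , TwoOf3 (Image "logo.png")
  , ThreeOf3 (Table [["a", "b"], ["c", "d"]])
  ]
```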

We mangle tag names and attribute names to ensure that they have the correct lexical form in Haskell, but this means that (for instance) we can't distinguish Myname and myname, which are different names in XML but translate to overlapping types in Haskell (and hence probably won't compile).
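
The collision can be reproduced with a toy mangling function. The rule below (capitalise the first character, replace anything that is not alphanumeric with an underscore) is only an assumption for illustration and is not necessarily the rule DtdToHaskell applies.

```haskell
module Main where

import Data.Char (isAlphaNum, toUpper)

-- Assumed mangling rule, for illustration only: Haskell type names must
-- start with an upper-case letter, so the first character is capitalised
-- and characters that are not alphanumeric are replaced by '_'.
mangleTypeName :: String -> String
mangleTypeName []       = []
mangleTypeName (c : cs) = toUpper c : map fix cs
  where
    fix x
      | isAlphaNum x = x
      | otherwise    = '_'

main :: IO ()
main = do
  -- Two distinct XML names end up as the same Haskell type name.
  putStrLn (mangleTypeName "Myname")
  putStrLn (mangleTypeName "myname")
  print (mangleTypeName "Myname" == mangleTypeName "myname")
```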

Attribute names translate into named fields. Because Haskell doesn't allow different types in the same module to share a field name, the code generated for a DTD that uses the same attribute name on different tags would crash and burn at compile time. We have fixed this by incorporating the tag name into the field name in addition to the attribute name, e.g. tagAttr instead of just attr. Uglier, but more portable.
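
A small sketch of the tagAttr idea follows. The elements section and figure, their attributes, and the helper fieldName are all hypothetical; the exact capitalisation the tool uses is an assumption here.

```haskell
module Main where

import Data.Char (toLower, toUpper)

-- Hypothetical elements <section id="..."> and <figure id="...">.
-- With a bare field called `id` in both records, standard Haskell would
-- reject the module; prefixing the tag name keeps the fields distinct.
data Section = Section { sectionId :: String, sectionTitle :: String }
  deriving (Eq, Show)

data Figure = Figure { figureId :: String, figureSrc :: String }
  deriving (Eq, Show)

-- Assumed naming scheme: lower-case tag name followed by the capitalised
-- attribute name.
fieldName :: String -> String -> String
fieldName tag attr = lowerFirst tag ++ upperFirst attr
  where
    lowerFirst (c : cs) = toLower c : cs
    lowerFirst []       = []
    upperFirst (c : cs) = toUpper c : cs
    upperFirst []       = []

main :: IO ()
main = do
  putStrLn (fieldName "section" "id")
  putStrLn (fieldName "figure" "id")
  putStrLn (sectionId (Section "s1" "Introduction"))
  putStrLn (figureId (Figure "f1" "plot.png"))
```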

XML namespaces. Currently, we just mangle the namespace identifier into any tag name which uses it. Probably the right way to do it is to regard the namespace as a separate imported module, and hence translate the namespace prefix into a module qualifier. Does this sound about right? (It isn't implemented yet.)

External subset. Since HaXml release 1.00, we support the XML DTD external subset. This means we can read and parse a whole bunch of files as part of the same DTD, and we respect INCLUDE and IGNORE conditional sections.

There are some fringe parts of the DTD we are not entirely sure about - Tokenised Types and Notation Types. In particular, there is no validity checking of these external references. If you find a problem, mail us: Malcolm.Wallace@cs.york.ac.uk