Warning! The development versions (1.14 upwards) significantly change the API of some modules! They may be incomplete, inconsistent, and liable to change before the next release! Do not expect code written against an earlier API to be compatible! DtdToHaskell has only recently been fixed to work with the new APIs!
Notes for migrating code from the 1.13 version of HaXml to the development version.
HaXml is a collection of utilities for parsing, filtering, transforming, and generating XML documents using Haskell. Its basic facilities include:
For processing XML documents, the following components are also provided:
Detailed documentation of the HaXml APIs is generated automatically by Haddock directly from the source code. Documentation for the previous (stable) version, HaXml-1.13.2, is also available.
An introduction to HaXml for people who know more about XML than about Haskell can be found at IBM DeveloperWorks. Please note that the DeveloperWorks article was based on an older version of HaXml. If you try to use the examples given there, you will need a couple of minor but important edits, given as a diff patch here.
Koen Roelandt has written a more recent tutorial about using HaXml to clean up some ugly HTML pages. http://www.krowland.net/tutorials/haxml_tutorial.html
A paper describing and comparing the generic Combinators with the typed representation (DtdToHaskell/XmlContent) is available here: (12 pages of double-column A4)
Some additional info about using the various facilities is here:
Known problems:
Development versions:
HaXml-1.19.1, release date 2007.11.01
By HTTP:
.tar.gz,
.zip.
By FTP:
ftp://ftp.cs.york.ac.uk/pub/haskell/HaXml/
Ongoing development:
The development version of HaXml is also available through
    darcs get http://www.cs.york.ac.uk/fp/darcs/HaXml
Older versions:
Stable version: for 1.13.2 see
http://haskell.org/HaXml/
By FTP:
ftp://ftp.cs.york.ac.uk/pub/haskell/HaXml/
FreeBSD port:
http://freshports.org/textproc/haxml/
To install HaXml, you must have a Haskell compiler: ghc-6.2 or later, and/or nhc98-1.16/hmake-3.06 or later, and/or Hugs98 (Sept 2003) or later. You must also download and install the polyparse package first, as a prerequisite.
Then, for more recent compilers, use the standard Cabal method of installation:
    runhaskell Setup.hs configure [--prefix=...] [--buildwith=...]
    runhaskell Setup.hs build
    runhaskell Setup.hs install
For older compilers, use:
    ./configure [--prefix=...] [--buildwith=...]
    make
    make install
to configure, build, and install HaXml as a package for your compiler(s). You need write permission on the library installation directories of your compiler(s). Afterwards, to gain access to the HaXml libraries, you only need to add the option -package HaXml to your compiler command line (no option required for Hugs). Various stand-alone tools are also built - DtdToHaskell, Xtract, Validate, MkOneOf - and copied to the final installation location specified by the --prefix=... option to configure.
To build/install on a Windows system without the Cygwin shell and utilities, you can avoid the configure/make steps by simply using the minimal Build.bat script. Edit it first for the location of your compiler etc.
Version 1.19.1 fixes a build error in 1.19. Version 1.19 improved the lazy XML parsing, and fixed some space leaks in the XtractLazy tool.
Version 1.18 pulled out the parser combinator libraries as a separate package (called polyparse), which must now be downloaded and installed before installing HaXml.
Version 1.17 essentially just fixes compatibility with ghc-6.6. However, it also includes a lazier pretty-printer to use in conjunction with the lazy parser, to avoid running out of memory on large datasets.
Version 1.16 adds laziness to the parser combinator libraries, such that they can start to return partial results before a whole entity has been parsed. Partial is also used in the sense that the returned value can contain bottom - an error which gets thrown as an exception when you try to explore the inner regions of the value. In terms of XML, it means you get an element back as soon as its start-tag has been consumed, but if there are parse errors later on, BOOM. However, if there are no errors, it does mean that your processing will be (a) faster and (b) less memory hungry. Another cool thing is that, even in the presence of errors, you still might get enough output to satisfy your processing task before the error is noticed.
Use Text.XML.HaXml.ParseLazy and Text.XML.HaXml.Html.ParseLazy to try it out. There are also lazy versions of the supplied demo programs: CanonicaliseLazy and XtractLazy.
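The trade-off described for 1.16 can be illustrated with a small, self-contained sketch. It does not use HaXml at all: Token, Elem, element, children and the other names below are hypothetical, invented only to show how a lazily built result can be explored up to the point where a parse error lies hidden in it.

```haskell
-- Toy illustration only: every name here is hypothetical for this sketch;
-- none of it is HaXml's parser or HaXml's data types.
import Control.Exception (ErrorCall, evaluate, try)

-- A pre-tokenised input: start-tags and end-tags only.
data Token = Open String | Close String

-- A much-simplified element: a tag name and its child elements.
data Elem = Elem String [Elem]

name :: Elem -> String
name (Elem n _) = n

kidsOf :: Elem -> [Elem]
kidsOf (Elem _ ks) = ks

-- Returns the element as soon as its start-tag is seen. The children are
-- bound lazily in the where clause, so they are parsed only on demand.
element :: [Token] -> (Elem, [Token])
element (Open n : ts) = (Elem n kids, rest)
  where (kids, rest) = children n ts
element _ = error "expected a start-tag"

-- Parses children up to the matching end-tag. A parse error becomes a
-- bottom value buried inside the result, raised only when forced.
children :: String -> [Token] -> ([Elem], [Token])
children n (Close m : ts)
  | n == m    = ([], ts)
  | otherwise = error ("mismatched end-tag: " ++ m)
children n ts@(Open _ : _) = (k : ks, rest')
  where (k, rest)   = element ts
        (ks, rest') = children n rest
children n [] = error ("missing end-tag for " ++ n)

-- Counting every element forces the whole tree.
countAll :: Elem -> Int
countAll (Elem _ ks) = 1 + sum (map countAll ks)

-- Forces the whole tree and catches any hidden parse error.
tryCount :: Elem -> IO (Either ErrorCall Int)
tryCount = try . evaluate . countAll

-- The last end-tag does not match its start-tag.
badDoc :: Elem
badDoc = fst (element [Open "doc", Open "a", Close "a", Open "b", Close "oops"])
```

In this sketch, name badDoc and the names of the first two children of badDoc can be inspected without any error, whereas tryCount badDoc returns a Left value, because forcing the whole tree reaches the mismatched end-tag.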
Version 1.15 is essentially 1.14 with some bugfixes, and some new functionality, especially in the parser combinator libraries. DrIFT now supports deriving the XmlContent class, and DtdToHaskell now also derives the XmlContent class, in addition to determining a collection of Haskell datatypes equivalent to a given DTD.
Error messages from parsing are much improved in 1.15 - they should locate any error far more specifically and accurately. Let me know about examples which do not report correctly.
Prior to 1.14, there were two separate classes, Xml2Haskell and Haskell2Xml. They are now combined into the single class XmlContent. Make sure you get a recent version of DrIFT if you want to derive this class from Haskell datatypes - the included version of DtdToHaskell has not yet been updated for deriving the class the other way, from an XML DTD.
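As a rough sketch of why a single class for both directions can be convenient, consider the toy class below. It is not HaXml's XmlContent: ToyXml, ToyContent, toToy, fromToy and Person are hypothetical names, and the real class methods and types are not reproduced here. The only point is that, with both directions in one class, a round trip needs just one constraint.

```haskell
-- Toy illustration only: these names are hypothetical and do not
-- reproduce the signatures of HaXml's XmlContent class.
data ToyXml = ToyElem String [ToyXml] | ToyText String
  deriving (Eq, Show)

-- One class covering both directions, instead of two separate classes.
class ToyContent a where
  toToy   :: a -> ToyXml          -- Haskell value to XML-like tree
  fromToy :: ToyXml -> Maybe a    -- XML-like tree back to a Haskell value

data Person = Person String String
  deriving (Eq, Show)

instance ToyContent Person where
  toToy (Person n c) =
    ToyElem "person" [ToyElem "name" [ToyText n], ToyElem "city" [ToyText c]]
  fromToy (ToyElem "person" [ToyElem "name" [ToyText n], ToyElem "city" [ToyText c]]) =
    Just (Person n c)
  fromToy _ = Nothing

-- A round trip needs only the single ToyContent constraint.
roundTrip :: ToyContent a => a -> Maybe a
roundTrip = fromToy . toToy
```

With two separate classes, roundTrip would need both constraints, and nothing would tie the two instances for a given type together.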
Version 1.14 also contains a new SAX-like stream parser.
A while back, Graham Klyne extended the 1.12 version of HaXml significantly, in particular to ensure that the parser passes a large XML acceptance test suite, and to deal more correctly with Unicode, namespaces, and parameter entity expansion. His modifications will eventually be merged back in to the main CVS tree, but in the meantime, you can get his version here: http://www.ninebynine.org/Software/HaskellUtils/
The previous stable version (1.13) had the following features and fixes:
We are interested in hearing your feedback on these XML facilities - suggestions for improvements, comments, criticisms, bug reports. Please mail
Development of these XML libraries was originally funded by Canon Research Europe Ltd. Subsequent maintenance and development have been partially supported by the EPSRC and the University of York.
Licence: The library is Free and Open Source Software, i.e., the bits we wrote are copyright to us, but freely licensed for your use, modification, and re-distribution, provided you don't restrict anyone else's use of it. The HaXml library is distributed under the GNU Lesser General Public Licence (LGPL) - see file LICENCE-LGPL for more details. We allow one special exception to the LGPL - see COPYRIGHT. The HaXml tools are distributed under the GNU General Public Licence (GPL) - see LICENCE-GPL. (If you don't like any of these licensing conditions, please contact us to discuss your requirements.)